cs.LG @ 2025-06-13: 2115
-
00 06-12 (4) Rethinking Losses for Diffusion Bridge Samplers Rückdenken von Verlusten für Diffusionsbrücken-Probenehmer 重新思考扩散桥采样器的损失 2506.10982v1 -
01 06-12 Fine-Grained Perturbation Guidance via Attention Head Selection Feinkörnige Störungsführung über Aufmerksamkeitskopfauswahl 通过 “ 关注负责人甄选 “ 指导 2506.10978v1 -
02 06-12 AutoMind: Adaptive Knowledgeable Agent for Automated Data Science AutoMind: Adaptives Knowledgeable Agent für automatisierte Datenwissenschaft 自动Mind:自动数据科学适应性知识代理 2506.10974v1 -
03 06-12 Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning Prinzipierte Ansätze zur Erweiterung neuraler Architekturen auf Funktionsräume für das Bediener-Lernen 将神经结构扩展至功能空间以便操作员学习的原则办法 2506.10973v1 -
04 06-12 Farseer: A Refined Scaling Law in Large Language Models Farseer: Ein verfeinertes Skalierungsgesetz in großen Sprachmodellen Farseer:大语言模式中经过完善的扩大法 2506.10972v1 -
05 06-12 What Exactly Does Guidance Do in Masked Discrete Diffusion Models Was genau tut Anleitung in maskierten diskreten Diffusionsmodellen? 在隐蔽的分辨扩散模型中,指导到底做什么? 2506.10971v1 -
06 06-12 SoK: Watermarking for AI-Generated Content SoK: Wasserzeichen für KI-generierten Inhalt SoK: 用于 AI 生成内容的水标志 2411.18479v3 -
07 06-12 SpectralAR: Spectral Autoregressive Visual Generation SpectralAR: Spectral Autoregressive Visual Generation 光谱AR:光谱自动递减视觉生成 2506.10962v1 -
08 06-12 ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark ChinesischHarm-Bench: Ein chinesischer schädlicher Content Detection Benchmark 中中汉禁区:中国有害内容检测基准 2506.10960v1 -
09 06-12 Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods In-Context-Lernen über strukturierte Manifolds verstehen: Aufmerksamkeit auf Kernel-Methoden überbrücken 理解关于结构式操纵的内文学习:弥合对核心方法的注意 2506.10959v1 -
10 06-12 AssistanceZero: Scalably Solving Assistance Games AssistanceZero: Skalably Solving Assistance Games 援助零:可升级解决援助运动会 2504.07091v2 -
11 06-12 ReGuidance: A Simple Diffusion Wrapper for Boosting Sample Quality on Hard Inverse Problems ReGuidance: Ein einfacher Diffusions-Wrapper zur Steigerung der Probenqualität bei harten inversen Problemen 重新指导:一个简单的传播包包,用于提高反向问题样本质量 2506.10955v1 -
12 06-12 Build the web for agents, not agents for the web Erstellen Sie das Web für Agenten, nicht Agenten für das Web 为代理者而不是网络代理者建立网络 2506.10953v1 -
13 06-12 Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training Domain2Vec: Vectorizing Datasets, um die optimale Datenmischung ohne Training zu finden 域2Vec: 将数据集矢量化,以查找未经过培训的最佳数据混合体 2506.10952v1 -
14 06-12 Execution Guided Line-by-Line Code Generation Ausführen Geführte Zeilen-für-Linien-Code-Generierung 逐行逐行制导 2506.10948v1 -
15 06-12 Spurious Rewards: Rethinking Training Signals in RLVR Spurious Rewards: Rethinking Trainingssignale in RLVR 净利奖:在RLVR重新思考培训信号 2506.10947v1 -
16 06-12 GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models GUARD: Geführtes Verlernen und Behalten über Datenattribution für große Sprachmodelle GUARD:通过数据归因实现大语言模型的引导式遗忘与保留 2506.10946v1 -
17 06-12 Coupled reaction and diffusion governing interface evolution in solid-state batteries Gekoppelte Reaktion und Diffusion, die die Schnittstellenentwicklung in Festkörperbatterien regeln 关于固体状态电池界面演变的组合反应与扩散 2506.10944v1 -
18 06-12 TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems TimberStrike: Datensatz-Rekonstruktion Angriff Enthüllen der Privatsphäre Leckage in Federated Tree-Based Systems 木材三角:联邦树基系统中数据集重建攻击清除隐私渗漏 2506.07605v2 -
19 06-12 Self-Adapting Language Models Selbstanpassende Sprachmodelle 自我改进语言模式 2506.10943v1 -
20 06-12 VINCIE: Unlocking In-context Image Editing from Video VINCIE: Im Kontext Bildbearbeitung von Video entsperren VINCIE: 从视频中解锁 Incontext 图像编辑 2506.10941v1 -
21 06-12 Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction Entwicklung eines leistungsfähigen Rahmens für die Sprachemotionserkennung unter naturalistischen Bedingungen Herausforderung für die emotionale Attributvorhersage 制定一个高绩效框架,以便在自然条件下对情感属性预测的自然条件挑战中承认言论情感 2506.10930v1 -
22 06-12 Improving LLM Safety Alignment with Dual-Objective Optimization Verbesserung der LLM-Sicherheitsausrichtung mit Dual-Ziel-Optimierung 提高LLM安全一致性,实现双目标优化 2503.03710v2 -
23 06-12 On feature selection in double-imbalanced data settings: a Random Forest approach Zur Feature-Auswahl in doppelt unausgeglichenen Dateneinstellungen: ein Random Forest-Ansatz 双重不平衡数据设置中的特征选择:随机森林办法 2506.10929v1 -
24 06-12 Investigating the Relationship Between Physical Activity and Tailored Behavior Change Messaging: Connecting Contextual Bandit with Large Language Models Untersuchung der Beziehung zwischen körperlicher Aktivität und maßgeschneidertem Verhalten Veränderung Messaging: Verknüpfung von Kontextbandit mit großen Sprachmodellen 调查身体活动与因地制宜的行为变化通信之间的关系:将环境强盗与大语言模式联系起来 2506.07275v2 -
25 06-12 Robustly Improving LLM Fairness in Realistic Settings via Interpretability Robuste Verbesserung der LLM Fairness in realistischen Einstellungen durch Interpretierbarkeit 通过可解释性在现实环境中强有力地提高LLM公平性 2506.10922v1 -
26 06-12 Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization Dekomponieren von MLP-Aktivierungen in Interpretierbare Funktionen über semi-Nonnegative Matrix-Fabrikisierung 通过半氮基矩阵化系数化,将劳动和生产部的分解活动转化为可解释性特征 2506.10920v1 -
27 06-12 Sequential-Parallel Duality in Prefix Scannable Models Sequential-Parallel-Dualität in voreingestellten Scannable-Modellen 前缀扫描可扫描模型中的序列- Parallel 品质 2506.10918v1 -
28 06-12 M4V: Multi-Modal Mamba for Text-to-Video Generation M4V: Multi-Modal Mamba für die Text-zu-Video-Generierung M4V: 用于文本到视频一代的多式Mamba 2506.10915v1 -
29 06-12 Foundation Models for Causal Inference via Prior-Data Fitted Networks Gründungsmodelle für die Kausalableitung über Prior-Data Fitted Networks 通过先前数据适合型网络进行因果关系推断的模型基础 2506.10914v1 -
30 06-12 NoLoCo: No-all-reduce Low Communication Training Method for Large Models NoLoCo: Nein-alles-Reduce Low Communication Training Method for Large Models NoLOCO: 大型模型的低通信培训方法 2506.10911v1 -
31 06-12 Probably Approximately Correct Labels Wahrscheinlich ungefähr korrekte Etiketten 可能是近似正确标签 2506.10908v1 -
32 06-12 Demystifying Spectral Feature Learning for Instrumental Variable Regression Entmystifizierung des spektralen Merkmalslernens für die Instrumentvariablenregression 揭开工具变量回归中谱特征学习的神秘面纱 2506.10899v1 -
33 06-12 The Diffusion Duality Die Diffusionsdualität 传播质量 2506.10892v1 -
34 06-12 PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play PLAY2PROMPT: Zero-shot Tool Instruction Optimierung für LLM Agenten über Tool Play PLAY2PROMOPT: 通过工具游戏优化LLM代理器的零射工具指令 2503.14432v2 -
35 06-12 Lattice Climber Attack: Adversarial attacks for randomized mixtures of classifiers Lattice Climber Attack: Adversariale Angriffe für randomisierte Mischungen von Klassifikatoren Lattice 攀爬者攻击: 随机混合分类器的反向攻击 2506.10888v1 -
36 06-12 Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers Verallgemeinerung oder Halluzination? Verstehen von Out-of-Context-Reasoning in Transformers 通化还是幻觉? 理解变异器的逻辑外原因 2506.10887v1 -
37 06-12 Three iterations of $(d-1)$-WL test distinguish non isometric clouds of $d$-dimensional points Drei Iterationen von $(d-1)$-WL-Test unterscheiden nicht-isometrische Wolken von $d$-dimensionalen Punkten $(d-1)$-WL 测试的三次迭代可区分 $d$ 维点的非等距点云 2303.12853v4 -
38 06-12 A Goemans-Williamson type algorithm for identifying subcohorts in clinical trials Ein Goemans-Williamson-Algorithmus zur Identifizierung von Subkohorten in klinischen Studien Goemans- Williamson 型算法,用于在临床试验中识别亚科 2506.10879v1 -
39 06-12 Data-Driven Prediction of Dynamic Interactions Between Robot Appendage and Granular Material Datengestützte Vorhersage dynamischer Interaktionen zwischen Roboter-Anhang und Granular Material 机器人附加和颗粒材料之间动态相互作用的数据驱动预测 2506.10875v1 -
40 06-12 The Gittins Index: A Design Principle for Decision-Making Under Uncertainty Der Gittins Index: Ein Design-Prinzip für Entscheidungsfindung unter Unsicherheit Gittins指数:不确定性下决策的设计原则 2506.10872v1 -
41 06-12 Unveiling the Role of Randomization in Multiclass Adversarial Classification: Insights from Graph Theory Enthüllen der Rolle der Randomisierung in der Mehrklassen-Adversarial-Klassifikation: Einsichten aus der Graphentheorie 打破随机化在多级反对向分类中的作用:图表理论的透视 2503.14299v2 -
42 06-12 Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization Lebensfähigkeit künftiger Maßnahmen: Robuste Sicherheit im Stärkungslernen durch Entropieregularisierung 未来行动的可行性:通过英信正规化加强学习的有力安全 2506.10871v1 -
43 06-12 mLaSDI: Multi-stage latent space dynamics identification mLaSDI: Mehrstufige latente Raumdynamik-Identifikation mLASDI:多阶段潜在空间动态识别 2506.09207v2 -
44 06-12 Latent Action Learning Requires Supervision in the Presence of Distractors Latent Action Learning erfordert Überwachung in der Gegenwart von Distraktoren 长期行动学习要求对有减压器的存在进行监督 2502.00379v5 -
45 06-12 Multi-group Uncertainty Quantification for Long-form Text Generation Multi-Gruppen-Unsicherheits-Quantifizierung für langformige Textgenerierung 长式文本生成的不确定性量化 2407.21057v2 -
46 06-12 Breaking Distortion-free Watermarks in Large Language Models Breaking Distortion-free Watermarks in großen Sprachmodellen 破解大语言模型中的无失真水印 2502.18608v2 -
47 06-12 Debiasing Watermarks for Large Language Models via Maximal Coupling Debiasing Wasserzeichen für große Sprachmodelle über Maximal Coupling 通过Maximal Coupling为大语言模型减少对水标记的偏差 2411.11203v2 -
48 06-12 Sample Complexity and Representation Ability of Test-time Scaling Paradigms Probenkomplexität und Darstellungsfähigkeit von Testzeit-Skalierparadigmen 试验时间缩放模型的复杂性和代表性 2506.05295v2 -
49 06-12 OmniFluids: Unified Physics Pre-trained Modeling of Fluid Dynamics OmniFluids: Unified Physics Pre-trained Modeling of Fluid Dynamics OmniFluids: 流体动力学统一物理预培训模拟模型 2506.10862v1 -
50 06-12 Energy-Efficient Deep Learning for Traffic Classification on Microcontrollers Energieeffizientes Deep Learning für die Verkehrsklassifizierung an Mikrocontrollern 关于微控制器交通分类的节能深层学习 2506.10851v1 -
51 06-12 GraphThought: Graph Combinatorial Optimization with Thought Generation GraphThought: Graph Combinatorial Optimization mit Thought Generation GraphThought: 结合思维生成的图组合优化 2502.11607v2 -
52 06-12 Accelerating Diffusion Large Language Models with SlowFast: The Three Golden Principles Beschleunigen von Diffusion Großsprachige Modelle mit SlowFast: Die drei goldenen Prinzipien 以慢速加速传播大语言模式:三大金原则 2506.10848v1 -
53 06-12 Advanced fraud detection using machine learning models: enhancing financial transaction security Fortgeschrittene Betrugserkennung mit maschinellen Lernmodellen: Erhöhung der finanziellen Transaktionssicherheit 利用机器学习模式发现高级欺诈:加强金融交易安全 2506.10842v1 -
54 06-12 Simplicity bias and optimization threshold in two-layer ReLU networks Einfachheits-Bias und Optimierungsschwelle in zweischichtigen ReLU-Netzwerken 双层ReLU网络中的简化偏差和优化阈值 2410.02348v2 -
55 06-12 Efficiency Robustness of Dynamic Deep Learning Systems Effizienz Robustheit dynamischer Deep Learning Systeme 动态深深学习系统的效率强强 2506.10831v1 -
56 06-12 On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves Auf der Geometrie der Receiver-Betriebscharakteristik und Precision-Recall-Kurven 收件者操作特征和精确召回曲线几何 2504.02169v2 -
57 06-12 Learning richness modulates equality reasoning in neural networks Lernreichtum moduliert Gleichstellungsabwägungen in neuronalen Netzwerken 学习丰富 适应神经网络的平等推理 2503.09781v2 -
58 06-12 Empirical and computer-aided robustness analysis of long-step and accelerated methods in smooth convex optimization Empirische und computergestützte Robustheitsanalyse von langen und beschleunigten Methoden in reibungsloser konvexer Optimierung 经验分析和计算机辅助的稳健性分析,分析如何以长步骤和加速方法顺利优化顺流优化 2506.09730v2 -
59 06-12 General targeted machine learning for modern causal mediation analysis Allgemeines gezieltes maschinelles Lernen für moderne kausale Mediationsanalyse 现代因果调解分析通用定向机器学习 2408.14620v2 -
60 06-12 Detecting High-Stakes Interactions with Activation Probes Ermitteln von High-Stakes-Interaktionen mit Aktivierungssonden 利用激活探针检测高风险交互 2506.10805v1 -
61 06-12 Near-Optimal Algorithms for Constrained k-Center Clustering with Instance-level Background Knowledge Nahezu optimale Algorithmen für eingeschränktes k-Center-Clustering mit Instance-Level-Hintergrundwissen 具有实例级背景知识的约束 k-Center 聚类的近似最优算法 2401.12533v4 -
62 06-12 Dense Associative Memory with Epanechnikov Energy Dichtes assoziatives Gedächtnis mit Epanechnikov Energie Epanechnikov能源的密集联合记忆 2506.10801v1 -
63 06-12 Divide-Fuse-Conquer: Eliciting “Aha Moments” in Multi-Scenario Games Divide-Fuse-Conquer: Eliciting “Aha Momente” in Multi-Szenario-Spiele 分裂-裂变:在多种场景运动会中激发“哈动力” 2505.16401v4 -
64 06-12 Adaptive Federated LoRA in Heterogeneous Wireless Networks with Independent Sampling Adaptives Federated LoRA in heterogenen drahtlosen Netzwerken mit unabhängiger Probenahme 具有独立抽样调查的多源无线网络中的联邦适应性 2505.23555v2 -
65 06-12 Mimicking Human Intuition: Cognitive Belief-Driven Reinforcement Learning Mimikking Human Intuition: Kognitives Glauben-getriebenes Stärkungslernen 模拟人类感知:认知信仰 – – 强化学习 2410.01739v3 -
66 06-12 Incentivizing Quality Text Generation via Statistical Contracts Förderung der Erzeugung von Qualitätstexten durch statistische Verträge 通过统计合同鼓励质量制文本 2406.11118v2 -
67 06-12 SlotPi: Physics-informed Object-centric Reasoning Models SlotPi: Physik-informierte objektzentrierte Vernunftmodelle SlotPi:物理学-知情的物体中心理由模型 2506.10778v1 -
68 06-12 Monotone Classification with Relative Approximations Monotone Klassifikation mit relativen Approximationen 具有相对近似的单调分类 2506.10775v1 -
69 06-12 Skillful joint probabilistic weather forecasting from marginals Qualifizierte gemeinsame probabilistische Wettervorhersage aus Randverteilungen 基于边缘分布的高技巧联合概率天气预报 2506.10772v1 -
70 06-12 OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems OPT-BENCH: LLM Agent auf großräumigen Suchräumen bewerten Optimierungsprobleme OOP-BENCH:评估大型搜索空间优化问题LLM代理 2506.10764v1 -
71 06-12 Great Models Think Alike and this Undermines AI Oversight Große Modelle denken ähnlich und dies unterminiert AI Oversight 优秀的模型想法相似,这会削弱AI监督 2502.04313v2 -
72 06-12 Persistent Topological Features in Large Language Models Persistente Topologische Features in großen Sprachmodellen 大语言模型中的持久拓扑特征 2410.11042v2 -
73 06-12 Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering Neural bei ArchEHR-QA 2025: Agentische Prompt-Optimierung für evidenzgerundete klinische Fragen ArchEHR-QA 2025:证据四舍五入临床问题解答的代理快速优化 2506.10751v1 -
74 06-12 Graphical Transformation Models Graphische Transformationsmodelle 图形转换模型 2503.17845v3 -
75 06-12 A User’s Guide to Sampling Strategies for Sliced Optimal Transport Leitfaden für Anwender zur Probenahme von Strategien für den optimalen Transport 剪切最佳运输抽样战略用户指南 2502.02275v4 -
76 06-12 Hybrid Bernstein Normalizing Flows for Flexible Multivariate Density Regression with Interpretable Marginals Hybride Bernstein-Normalisierungsströmungen für flexible Multivariate Dichteregression mit interpretierbaren Marginalen 具有可解释边际作用的灵活多变密度递减的 伯恩斯坦混合 伯恩斯坦 2505.14164v2 -
77 06-12 BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery BioNeMo Framework: eine modulare, leistungsstarke Bibliothek für die Entwicklung von KI-Modellen in der Wirkstoffentdeckung BioNemo框架:一个模块化、高绩效图书馆,用于在药物发现方面开发AI模型 2411.10548v3 -
78 06-12 PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models PREMISE: Skalierbare und strategische Prompt-Optimierung für effiziente mathematische Reasoning in großen Modellen PREMISE:大规模模型中高效数学理由的可扩展和战略快速优化 2506.10716v1 -
79 06-12 Testing Generalizability in Causal Inference Prüfung der Verallgemeinerbarkeit bei Kausalzusammenhang 因果关系中的通用性 2411.03021v2 -
80 06-12 Continual Hyperbolic Learning of Instances and Classes Kontinuierliches Hyperbolisches Lernen von Instanzen und Klassen 不断对各种事件和课程进行双曲学习 2506.10710v1 -
81 06-12 Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models Anreize für eine fortgeschrittene Instruktions-Folge von großen Sprachmodellen 为采用大语言模式的高级指示提供激励理由 2506.01413v2 -
82 06-12 ConTextTab: A Semantics-Aware Tabular In-Context Learner ConTextTab: Ein Semantik-Bewusst Tabular In-Context Learner ConTextTab : 语义学软件制表器 2506.10707v1 -
83 06-12 Preserving Task-Relevant Information Under Linear Concept Removal Task-Relevante Informationen unter linearen Konzeptentfernung speichern 在线性概念删除下保留任务相关信息 2506.10703v1 -
84 06-12 SNR and Resource Adaptive Deep JSCC for Distributed IoT Image Classification SNR und Resource Adaptive Deep JSCC für verteilte IoT-Bildklassifikation SNR与资源适应性深部JSC,用于分布的IoT图像分类 2506.10699v1 -
85 06-12 A hierarchical approach for assessing the vulnerability of tree-based classification models to membership inference attack Ein hierarchischer Ansatz zur Bewertung der Verwundbarkeit baumbasierter Klassifikationsmodelle für den Schlussfolgerungsangriff der Mitgliedschaft 评估树树分类模式易受成员推推推攻击的脆弱性的等级办法 2502.09396v2 -
86 06-12 FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems FedRAG: Ein Rahmen für Systeme der Feinsteuerung von Retrieval-Augmented Generation FFRAG: 微调取回系统框架 2506.09200v2 -
87 06-12 Scalable unsupervised feature selection via weight stability Skalierbare unüberwachte Feature-Auswahl über Gewichtsstabilität 通过重量稳定性进行可缩放的、不受监督的特征选择 2506.06114v2 -
88 06-12 STOAT: Spatial-Temporal Probabilistic Causal Inference Network STOAT: Raum-Temporal Probabilistic Causal Inference Network STOAT: 空间-临时概率因果关系网络 2506.09544v2 -
89 06-12 Learning hidden cascades via classification Versteckte Kaskaden durch Klassifizierung lernen 通过分类学习隐藏的级联 2505.11228v2 -
90 06-12 Large Language Models for Detection of Life-Threatening Texts Große Sprachmodelle zur Erkennung lebensbedrohlicher Texte 探测生命威胁文字的长语言大语言模型 2506.10687v1 -
91 06-12 Croppable Knowledge Graph Embedding Croppable Knowledge Graph Embedding 可裁剪知识图嵌入 2407.02779v2 -
92 06-12 Implicit Language Models are RNNs: Balancing Parallelization and Expressivity Implizite Sprachmodelle sind RNNs: Ausgleichende Parallelisierung und Expressivität 隐含语言模式是RNNs:平衡平行和表达。 2502.07827v3 -
93 06-12 SR-Reward: Taking The Path More Traveled SR-Reward: Den Weg mehr beschreiten SR-奖励:走上更多旅行的道路 2501.02330v3 -
94 06-12 Saturation Self-Organizing Map Karte zur Selbstorganisation der Sättigung 饱和自组织地图 2506.10680v1 -
95 06-12 VeriContaminated: Assessing LLM-Driven Verilog Coding for Data Contamination VeriContaminiert: Bewertung von LLM-getriebener Verilog Coding für Datenkontamination 受污染:评估数据污染的LLM-Driven Verilog编码 2503.13572v3 -
96 06-12 Practical Improvements of A/B Testing with Off-Policy Estimation Praktische Verbesserungen der A/B-Prüfung mit Off-Policy-Abschätzung 实际改进A/B测试和非政策性估算的非政策性A/B测试 2506.10677v1 -
97 06-12 RmGPT: A Foundation Model with Generative Pre-trained Transformer for Fault Diagnosis and Prognosis in Rotating Machinery RmGPT: Ein Fundamentmodell mit generativem vortrainiertem Transformer für Fehlerdiagnose und Prognose in rotierenden Maschinen RMGPT: 轮调机械系统失灵诊断和预预报的预培训前生成变形器基金会模型 2409.17604v2 -
98 06-12 Automated Generation of Precedence Graphs in Digital Value Chains for Automotive Production Automatisierte Generierung von Precedence Graphen in digitalen Wertschöpfungsketten für die Automobilproduktion 自动生成汽车生产数字价值链中的优先性图 2504.19835v2 -
99 06-12 Diffusion-Free Graph Generation with Next-Scale Prediction Diffusionsfreie Graphengenerierung mit Next-Scale-Vorhersage 下期期期预测的无传播图制作 2503.23612v2 -
100 06-12 Logarithmic Smoothing for Adaptive PAC-Bayesian Off-Policy Learning Logarithmische Glättung für adaptives PAC-Bayesisches Off-Policy-Lernen PAC-Bayesian适应性PAC-Bayesian离政策学习对数平滑 2506.10664v1 -
101 06-12 Pushing the Limits of Extreme Weather: Constructing Extreme Heatwave Storylines with Differentiable Climate Models Die Grenzen des extremen Wetters sprengen: Extreme Hitzewellen-Geschichten mit differenzierbaren Klimamodellen konstruieren 推推极端天气的极限:用不同的气候模型构建极端热浪线 2506.10660v1 -
102 06-12 CoRT: Code-integrated Reasoning within Thinking CoRT: Code-integrierte Vernunft im Denken CORT: 思考中守则综合理由 2506.09820v2 -
103 06-12 TSFM-Bench: A Comprehensive and Unified Benchmark of Foundation Models for Time Series Forecasting TSFM-Bench: Ein umfassender und einheitlicher Benchmark von Basismodellen für die Zeitreihenprognose TSFM-Bench:时间序列预测基础模型的全面统一基准 2410.11802v6 -
104 06-12 Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference Graph-dependent regret Bounds in Multi-Armed Bandits mit Interferenz 多武装有干扰力的多武装强力土匪的图形- 依赖性 Regret Regret 界圈 2503.07555v2 -
105 06-12 Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes Robuste, unüberwachte Anpassung eines Spracherkennungsgeräts mit Entropie-Minimierungs- und Lautsprechercodes 使用磁最小化和演讲人守则的演讲者演讲者 2506.10653v1 -
106 06-12 Data Shifts Hurt CoT: A Theoretical Study Data Shifts Hurt CoT: Eine theoretische Studie Data Shifts Hurt CoT:理论研究 2506.10647v1 -
107 06-12 Flexible Tails for Normalizing Flows Flexible Endlagen für Normalisierung von Strömungen 流动正常化的灵活反面 2406.16971v2 -
108 06-12 Structure and asymptotic preserving deep neural surrogates for uncertainty quantification in multiscale kinetic equations Struktur und asymptotische Erhaltung tiefer neuronaler Surrogate zur Unsicherheitsquantifizierung in multiskaligen kinetischen Gleichungen 用于多尺度动动能方程中不确定性量化的深海神经代代体 2506.10636v1 -
109 06-12 Deploying Open-Source Large Language Models: A performance Analysis Bereitstellung von Open-Source-Großsprachenmodellen: Eine Leistungsanalyse 部署开放源码大语言模式:业绩分析 2409.14887v4 -
110 06-12 Hessian Geometry of Latent Space in Generative Models Hessische Geometrie des Latentenraums in generativen Modellen 创创模型中的深层空间的赫西德几何学 2506.10632v1 -
111 06-12 Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs Zeitreihenprognosen als Begründung: Ein langsam denkender Ansatz mit verstärkten LLMs 以时间序列预测作为理由:与强化的LLMs采取慢思考方式 2506.10630v1 -
112 06-12 Task Adaptation from Skills: Information Geometry, Disentanglement, and New Objectives for Unsupervised Reinforcement Learning Aufgabenanpassung aus Kompetenzen: Informationsgeometrie, Entflechtung und neue Ziele für unbeaufsichtigtes Weiterbildungslernen 适应技能的任务:信息几何、分离和未受监督的加强学习新目标 2506.10629v1 -
113 06-12 Leveraging Low-rank Factorizations of Conditional Correlation Matrices in Graph Learning Leveraging Low-rank Factorizations of Conditional Correlation Matrices in Graph Learning 在图表学习中利用条件性关联矩阵的低级别定级 2506.10628v1 -
114 06-12 Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation Pretraining Generative Flow Networks mit kostengünstigen Belohnungen für die molekulare Graphengenerierung 为分子图生成建立具有不昂贵奖励的 预培训产生流动网络 2503.06337v4 -
115 06-12 SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis SDialog: Ein Python-Toolkit für die Synthetische Dialog-Generierung und -Analyse Sidialog:合成对话生成和分析的Python工具包 2506.10622v1 -
116 06-12 Assessing the Resilience of Automotive Intrusion Detection Systems to Adversarial Manipulation Bewertung der Resilienz von Intrusions-Detektionssystemen für die Adversarialmanipulation 评估汽车入侵探测系统对冲操纵的复原力 2506.10620v1 -
117 06-12 Universal Neural Optimal Transport Universeller neuronaler optimaler Transport 通用神经最优传输 2212.00133v6 -
118 06-12 Deep Learning-Based Digitization of Overlapping ECG Images with Open-Source Python Code Deep Learning-based Digitalisierung von überlappenden EKG-Bildern mit Open-Source-Python-Code 使用开放源码的 ECG 重叠图像的深学习数字化 2506.10617v1 -
119 06-12 Non-stationary Online Learning for Curved Losses: Improved Dynamic Regret via Mixability Non-stationäres Online-Lernen für geschwungene Verluste: Verbessertes dynamisches Regret durch Mischbarkeit 非静止的曲线损失在线学习:通过可混合性改进动态遗憾 2506.10616v1 -
120 06-12 Data Driven Diagnosis for Large Cyber-Physical-Systems with Minimal Prior Information Datenangetriebene Diagnose für große Cyber-Physical-Systeme mit minimalen Vorinformationen 具有最起码原始信息的大型网络物理系统数据驱动诊断 2506.10613v1 -
121 06-12 Density Ratio Estimation with Conditional Probability Paths Schätzung des Dichte-Verhältnisses mit bedingten Wahrscheinlichkeitspfaden 以有条件概率路径估算密度比率 2502.02300v3 -
122 06-12 Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models Inferenz-Zeit-Dekomposition von Aktivierungen (ITDA): Ein skalierbarer Ansatz zur Interpretation großer Sprachmodelle 触发(ITDA):解释大语言模式的可扩展办法 2505.17769v2 -
123 06-12 Distortion-Aware Brushing for Reliable Cluster Analysis in Multidimensional Projections Distortion-Aware Bürsten für zuverlässige Clusteranalyse in mehrdimensionalen Projektionen 为多层面预测中的可靠集集分析 2201.06379v2 -
124 06-12 TransMLA: Multi-Head Latent Attention Is All You Need TransMLA: Mehrkopf-Latent-Achtung ist alles, was Sie brauchen TransMLA: 多党领袖关注是你们需要的 2502.07864v5 -
125 06-12 ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization ConfPO: Ausnutzen des politischen Modells Vertrauen für kritische Token-Auswahl in Präferenz-Optimierung 召集:利用政策模范信心在优先最佳化中选择关键物优选标准 2506.08712v2 -
126 06-12 Generative Uncertainty in Diffusion Models Generative Unsicherheit bei Diffusionsmodellen 扩散模型中产生的不确定性 2502.20946v2 -
127 06-12 Size-adaptive Hypothesis Testing for Fairness Size-adaptive Hypothesenprüfung auf Fairness 公平性假设测试 2506.10586v1 -
128 06-12 MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark MAYA: Inkonsistenzen in generativem Passwort ansprechen Raten über einen einheitlichen Benchmark MAYA:通过统一基准解决生成密码猜猜不一致问题 2504.16651v2 -
129 06-12 Bandit Convex Optimisation Bandit Convex Optimierung Bandit Convex 优化优化 2402.06535v4 -
130 06-12 Graph Neural Networks for Automatic Addition of Optimizing Components in Printed Circuit Board Schematics Graphische Neuralnetze zur automatischen Ergänzung von Optimierungskomponenten in gedruckten Schaltplanen 打印电路板图示板自动增加优化部件的神经网络 2506.10577v1 -
131 06-12 Evolutionary Prediction Games Evolutionäre Vorhersage Spiele 演进预测运动会 2503.03401v2 -
132 06-12 Identifiability Challenges in Sparse Linear Ordinary Differential Equations Identifizierbarkeitsherausforderungen in Sparse Lineare Standarddifferenzgleichungen 分散线性普通差别等分法的可识别性挑战 2506.09816v2 -
133 06-12 Box-Constrained Softmax Function and Its Application for Post-Hoc Calibration Box-Constrained Softmax-Funktion und seine Anwendung für die Post-Hoc-Kalibrierung Box- Contract Softmax 函数及其热后校准应用程序 2506.10572v1 -
134 06-12 On the role of non-linear latent features in bipartite generative neural networks Zur Rolle nichtlinearer latenter Merkmale in bipartitischen generativen neuronalen Netzwerken 关于非线性潜伏特征在双边基因神经网络中的作用 2506.10552v1 -
135 06-12 Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods Kostenlose Bewertung von Record-Level-Datensicherheitsrisiken durch Artifact-basierte Methoden 采用人工方法进行自由记录隐私风险评估 2411.05743v3 -
136 06-12 Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models Obliviate: Effiziente Unvergesslichkeit für den Schutz geistigen Eigentums in großen Sprachmodellen 默认:在大语言模式中有效统一保护知识产权 2502.15010v2 -
137 06-12 Exploring Performance-Complexity Trade-Offs in Sound Event Detection Models Erforschung von Leistungs-Komplexitäts-Trade-Offs in Sound Event Detection-Modellen 探索健全事件探测模型中的性能-复杂交易-便利交易 2503.11373v2 -
138 06-12 Data-driven Day Ahead Market Prices Forecasting: A Focus on Short Training Set Windows datengetriebener Tag voraus Marktpreise Prognose: Ein Fokus auf Kurztraining-Set Windows 头期市场价格预测:关注短期培训成套窗口 2506.10536v1 -
139 06-12 Equivariant Neural Diffusion for Molecule Generation Equivariante Neuraldiffusion für die Molekülgeneration 分子代的等量神经元传播 2506.10532v1 -
140 06-12 Decision Making under the Exponential Family: Distributionally Robust Optimisation with Bayesian Ambiguity Sets Entscheidungsfindung unter der Exponential-Familie: Verteilungsstarke Optimierung mit Bayesian Ambiguity Sets 根据 “ 名人家庭 “ 下的决策:与贝ysian 暗形组的分布式强力优化 2411.16829v2 -
141 06-12 Learning in Budgeted Auctions with Spacing Objectives Lernen in budgetierten Auktionen mit unterschiedlichen Zielen 具有间距目标的预算拍卖中的学习 2411.04843v2 -
142 06-12 QuXAI: Explainers for Hybrid Quantum Machine Learning Models QuXAI: Erklärer für Hybrid-Quantum-Maschinen-Lernmodelle QuXAI:混合量子机器学习模型的解释器 2505.10167v3 -
143 06-12 Macro Graph of Experts for Billion-Scale Multi-Task Recommendation Makrograph der Experten für Milliarden-Scale Multi-Task-Empfehlung 10亿多任务建议十亿多阶段专家宏观图 2506.10520v1 -
144 06-12 A Weighted Loss Approach to Robust Federated Learning under Data Heterogeneity Ein gewichtiger Verlust Ansatz für robustes Federated Learning unter Daten Heterogenität 数据异质下联邦强力学习的加权损失方法 2506.09824v2 -
145 06-12 Subgraph Gaussian Embedding Contrast for Self-Supervised Graph Representation Learning Subgraph Gaussian Einbettungskontrast für selbstüberwachtes Graphen-Darstellungslernen 自支持图表代表制学习的 Subgraph Gaussian 嵌入式对比对比度 2505.23529v2 -
146 06-12 Differentially private and decentralized randomized power method Differentiell private und dezentrale randomisierte Potenzmethode 差分隐私的去中心化随机幂方法 2411.01931v3 -
147 06-12 Large Scale Multi-Task Bayesian Optimization with Large Language Models Große Multi-Task Bayesian Optimierung mit großen Sprachmodellen 使用大语言模式的大型多语类多语类多语类多语类最佳优化 2503.08131v2 -
148 06-12 Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation Qualität über Menge: Steigerung der Dateneffizienz durch ensembled multimodale Daten-Curation 质量高于数量:通过集合多模式数据曲线提高数据效率 2502.08211v2 -
149 06-12 A Crack in the Bark: Leveraging Public Knowledge to Remove Tree-Ring Watermarks Ein Riss in der Bark: Benutzt öffentliches Wissen, um Baum-Ring-Wasserzeichen zu entfernen 瓶中的裂缝:利用公共知识来清除树状水印 2506.10502v1 -
150 06-12 BugGen: A Self-Correcting Multi-Agent LLM Pipeline for Realistic RTL Bug Synthesis BugGen: Eine selbstkorrigierende LLM-Pipeline für eine realistische RTL-Bug-Synthese BugGen: 现实的 RTL 错误合成自更正多 Agency LLM 管道 2506.10501v1 -
151 06-12 Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG Lag nicht, RAG: Training-freie Adversarial Detection mit RAG 不要拉格,RAG:使用RAG进行无训练的反向探测 2504.04858v2 -
152 06-12 SHORE: A Long-term User Lifetime Value Prediction Model in Digital Games SHORE: Ein langfristiges User Lifetime Value Prediction Model in digitalen Spielen SHORE: 数字游戏中长期用户终身价值预测模型 2506.10487v1 -
153 06-12 A Conjoint Graph Representation Learning Framework for Hypertension Comorbidity Risk Prediction Ein konjoint Graph Representation Learning Framework für Hypertonie Comorbidity Risk Prediction 超高频风险预测联合图表代表学习框架 2505.05094v2 -
154 06-12 Zero-Shot Offline Imitation Learning via Optimal Transport Zero-Shot Offline-Imitation Lernen über optimalen Transport 通过最佳交通方式进行零热离线模拟学习 2410.08751v3 -
155 06-12 Engagement-Driven Content Generation with Large Language Models Engagement-getriebene Content-Generierung mit großen Sprachmodellen 具有大语言模式的 “ 互动互动 “ 式内容生成 2411.13187v5 -
156 06-12 Prediction of steady states in a marine ecosystem model by a machine learning technique Vorhersage von stetigen Zuständen in einem marinen Ökosystemmodell durch eine Maschinelle Lerntechnik 通过机械学习技术预测海洋生态系统模型中稳定的状态 2506.10475v1 -
157 06-12 PASCO (PArallel Structured COarsening): an overlay to speed up graph clustering algorithms PASCO (PArallel Structured COarsening): ein Overlay, um Graph Clustering Algorithmen zu beschleunigen PASCO(PARALLEL 结构化COseting):加速图形组合算法的重叠 2412.13592v2 -
158 06-12 Glimpse: Generalized Locality for Scalable and Robust CT Glimpse: Generalisierte Lokalität für skalierbare und robuste CT Glumpse: 可缩放和强力CT的通用位置 2401.00816v3 -
159 06-12 Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts Auf dem Weg zur robusten multimodalen Emotionserkennung unter fehlenden Modalitäten und Verteilungsverschiebungen 争取在缺失模式和分销转移模式下强有力地承认多模式情感 2506.10452v1 -
160 06-12 Extending AALpy with Passive Learning: A Generalized State-Merging Approach AALpy mit passivem Lernen erweitern: Ein generalisierter Staats-Merging-Ansatz 扩大被动学习的AALPY:普遍化的国家-老龄化办法 2506.06333v2 -
161 06-12 MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices MNN-LLM: Eine generische Inferenz-Engine für schnelle großsprachliche Modellbereitstellung auf mobilen Geräten MNN-LLM:移动设备快速大语言模型部署通用推断引擎 2506.10443v1 -
162 06-12 Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts Experten-Rennen: Eine flexible Routing-Strategie für Scaling Diffusion Transformer mit Mixture von Experten 专家种族:与专家混合的扩大传播变异器灵活机动战略 2503.16057v3 -
163 06-12 MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning MasHost baut alles: Autonomes Multi-Agenten-System, das durch Verstärkungslernen gesteuert wird 以强化学习为导向的多机构自治系统 2506.08507v2 -
164 06-12 From Features to Graphs: Exploring Graph Structures and Pairwise Interactions via GNNs Von Features zu Graphen: Erforschen von Graphenstrukturen und paarweise Interaktionen über GNNs 从地貌到图表:通过GNNs探索图表结构和对称互动 2502.13471v2 -
165 06-12 System Identification Using Kolmogorov-Arnold Networks: A Case Study on Buck Converters Systemidentifizierung mit Kolmogorov-Arnold Networks: Eine Fallstudie zu Buck Convertern 使用Kolmogorov-Arnold网络的系统识别系统:关于巴克转换器的案例研究 2506.10434v1 -
166 06-12 Measuring Semantic Information Production in Generative Diffusion Models Messung der semantischen Informationsproduktion in generativen Diffusionsmodellen 生物传播模型中计量语义信息生产 2506.10433v1 -
167 06-12 CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning CheMatAgent: Verbesserung von LLMs für Chemie und Materialwissenschaft durch baumsuchebasiertes Tool Learning CheMatAgent:通过植树搜索工具学习加强化学和材料科学LLMs 2506.07551v2 -
168 06-12 Multi-dimensional Autoscaling of Processing Services: A Comparison of Agent-based Methods Mehrdimensionale Autoskalierung von Verarbeitungsdienstleistungen: Ein Vergleich von agentenbasierten Methoden 处理服务多维多维自动升级:以代理为基础的方法比较 2506.10420v1 -
169 06-12 Data-Driven Soil Organic Carbon Sampling: Integrating Spectral Clustering with Conditioned Latin Hypercube Optimization Datengetriebene Bodenorganische Kohlenstoff-Probenahme: Integration von Spektral-Clustering mit bedingter lateinischer Hyperkube-Optimierung 数据驱动土壤有机碳取样:将光谱集成与有条件的拉丁超立方体优化结合 2506.10419v1 -
170 06-12 Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation Verbesserter Algorithmus für tiefes aktives Lernen unter Unwucht durch Optimale Trennung 通过最佳离职,在平衡下改善深层积极学习的比值 2312.09196v4 -
171 06-12 Towards Graph-Based Privacy-Preserving Federated Learning: ModelNet – A ResNet-based Model Classification Dataset Auf dem Weg zu einem graphisch basierten Datenschutz-Erhaltungs-Federated Learning: ModelNet – Ein resNet-basierter Modellklassifikations-Datensatz 走向基于图表的隐私-保留联邦学习:模型网 – – 以ResNet为基础的分类模型数据集 2506.00476v2 -
172 06-12 Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series Zeit-IMM: Ein Datensatz und Benchmark für irreguläre multimodale Multivariate Zeitreihen 时间-IMM:非正常多式联运多变时间序列的数据集和基准 2506.10412v1 -
173 06-12 DeepGDel: Deep Learning-based Gene Deletion Prediction Framework for Growth-Coupled Production in Genome-Scale Metabolic Models DeepGDel: Deep Learning-basierte Gene Deletion Prediction Framework für wachstumsverbundene Produktion in Genom-Scale Metabolic-Modellen 深层GDel:在基因组-规模元元模型中实现增长和混合生产以深学习为基础的基因删除预测框架 2504.06316v2 -
174 06-12 IoTGeM: Generalizable Models for Behaviour-Based IoT Attack Detection IoTGeM: Generalisierbare Modelle für die IoT-Angriffserkennung auf Verhaltensbasis IoTGeM:基于行为的IoT攻击探测通用模型 2401.01343v2 -
175 06-12 An energy-efficient learning solution for the Agile Earth Observation Satellite Scheduling Problem Eine energieeffiziente Lernlösung für das agile Erdbeobachtungssatelliten-Scheduling-Problem Agile地球观测卫星排程问题的一个节能学习解决方案 2503.04803v2 -
176 06-12 PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier PAG: Multi-Turn verstärkt LLM Selbstkorrektion mit Politik als Generativer Prüfer PAG: 多发强化LLM自我校正,政策作为产生验证 2506.10406v1 -
177 06-12 Generative Algorithms for Wildfire Progression Reconstruction from Multi-Modal Satellite Active Fire Measurements and Terrain Height Generative Algorithmen für Wildfire-Progressions-Rekonstruktion durch Multi-Modal Satellite-Aktivfeuer-Messungen und Geländehöhe 从多模式卫星活性火灾测量和地面高度生成的野火进量重建 2506.10404v1 -
178 06-12 Time To Impeach LLM-as-a-Judge: Programs are the Future of Evaluation Time To Impeach LLM-as-a-Richter: Programme sind die Zukunft der Evaluation 弹劾LLM作为法官的时间:方案就是评价的未来 2506.10403v1 -
179 06-12 RsGCN: Rescaling Enhances Generalization of GCNs for Solving Scalable Traveling Salesman Problems RsGCN: Rescaling verbessert die Generalisierung von GCNs zur Lösung skalierbarer reisender Salesman-Probleme RsGCN: 提高全球氯化萘的通用化,以解决可缩放旅行销售员问题 2506.00533v2 -
180 06-12 TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning TooBadRL: Trigger-Optimierung zur Steigerung der Effektivität von Backdoor-Angriffen auf Deep Verstärkung Learning TooBadRL:在深强化学习中提高后门袭击效果的触发最佳优化 2506.09562v2 -
181 06-12 EQA-RM: A Generative Embodied Reward Model with Test-time Scaling EQA-RM: Generatives Embodied-Reward-Modell mit Testzeitskalierung EQA-RM:带有测试时缩放的生成式具身奖励模型 2506.10389v1 -
182 06-12 Measuring Representational Shifts in Continual Learning: A Linear Transformation Perspective Messen von Repräsentationsverschiebungen im kontinuierlichen Lernen: Eine lineare Transformationsperspektive 持续学习中的衡量代表性变化:线性转变前景 2505.20970v3 -
183 06-12 Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning Hierarchische Latentenfähigkeiten von Sprachmodellen über das kausale Repräsentationslernen entdecken 通过因果代表制学习发现语言模式的分级本端能力 2506.10378v1 -
184 06-12 A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Ein minimalistischer Ansatz zur LLM-Vernunft: von der Abstoßung zur Verstärkung 从拒绝抽样到强化 2504.11343v2 -
185 06-12 Revisiting Transformers with Insights from Image Filtering Übertragen von Transformern mit Einsichten aus der Bildfilterung 以图像过滤透视的洞视重新审视变形器 2506.10371v1 -
186 06-12 Can We Infer Confidential Properties of Training Data from LLMs? Können wir vertrauliche Eigenschaften von Trainingsdaten von LLMs ableiten? 我们能否从LLMS中推断培训数据的机密性? 2506.10364v1 -
187 06-12 Permutation-Based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data Permutationsbasierter Rank-Test in der Gegenwart von Diskretisierung und Anwendung in der kausalen Entdeckung mit gemischten Daten 在与混合数据相混合的Causal Discovery中存在分化和应用 2501.18990v2 -
188 06-12 Advanced deep architecture pruning using single filter performance Fortschrittliche Tiefenarchitektur-Beschneidung mit Single-Filter-Performance 使用单一过滤性能进行高级深高级结构修剪 2501.12880v2 -
189 06-12 Dimension-Independent Kernel ε-Covers Dimensionsunabhängiger Kernel ε-Covers 独立的内核 - 中央 2306.16516v2 -
190 06-12 Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success Demonstration von Multi-Suction Item Picking at Scale über Multi-Modal Lernen von Pick-Erfolg 通过拣选成功率的多模态学习演示大规模多吸盘物品拣选 2506.10359v1 -
191 06-12 TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree TreeLoRA: Effizientes kontinuierliches Lernen über schichtweise LoRAs, geleitet von einem Hierarchischen Gradienten-Similaritätsbaum TreeLoRA:通过逐层 LoRA 进行高效持续学习,由层次梯度相似性树指导 2506.10355v1 -
192 06-12 History-Aware Neural Operator: Robust Data-Driven Constitutive Modeling of Path-Dependent Materials History-Aware Neuronal Operator: Robuste datengetriebene konstitutive Modellierung von pfadabhängigen Materialien 历史软件神经操作器:路径依赖材料的强有力数据驱动建模模型 2506.10352v1 -
193 06-12 PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications PhysNav-DG: Ein neuartiges adaptives Framework für robuste VLM-Sensor-Fusion in Navigationsanwendungen PhysNav-DG:导航应用中的强力VLM-传感器融合新适应框架 2505.01881v2 -
194 06-12 Enhancing Glass Defect Detection with Diffusion Models: Addressing Imbalanced Datasets in Manufacturing Quality Control Verbesserung der Glasdefekterkennung mit Diffusionsmodellen: Adressierung unausgewogener Datensätze in der Fertigungsqualitätskontrolle 利用传播模型加强玻璃破损检测:在制造业质量控制中解决数据集不平衡问题 2505.03134v2 -
195 06-12 Decoding for Punctured Convolutional and Turbo Codes: A Deep Learning Solution for Protocols Compliance Dekodierung für punktierte Konvolutions- und Turbo-Codes: Eine Deep-Learning-Lösung für Protokolle Compliance 钻穿的革命代码和涡轮代码解码:遵守议定书的深学习解决方案 2502.15475v2 -
196 06-12 PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation PhysioWave: Multi-Scale Wavelet-Transformer für Physiologische Signaldarstellung PhysioWave: 生理信号代表的多阶段波盘转换器 2506.10351v1 -
197 06-12 LightKG: Efficient Knowledge-Aware Recommendations with Simplified GNN Architecture LightKG: Effiziente wissensbasierte Empfehlungen mit vereinfachter GNN-Architektur LightKG: 具有简化 GNN 架构的高效知识软件建议 2506.10347v1 -
198 06-12 Technical Report with Proofs for A Full Picture in Conformance Checking: Efficiently Summarizing All Optimal Alignments Technischer Bericht mit Beweisen für ein vollständiges Bild in Übereinstimmungsprüfung: Alle optimalen Ausrichtungen effizient zusammenfassen 技术报告,附有符合要求检查中完整图片的校对证明:高效地对齐所有最佳对齐 2506.10345v1 -
199 06-12 Provably Learning from Language Feedback Beweisbar aus Sprachfeedback lernen 可证明地从语言反馈中学习 2506.10341v1 -
200 06-12 Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Towards Privacy-Preserving Distributed Intelligence in AR/VR/MR Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Auf dem Weg zu datenschutzerhaltender verteilter Intelligenz in AR/VR/MR 面向下一代扩展现实系统的多模态多任务联邦基础模型:迈向AR/VR/MR中的隐私保护分布式智能 2506.05683v3 -
201 06-12 On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention On-the-Fly Adaptive Destillation von Transformer auf Dual-State Lineare Aufmerksamkeit 向两国线性注意对变压器进行实时调整性蒸馏 2506.09316v2 -
202 06-12 Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs Amulet: Neuausrichtung während der Testzeit für Personalisierte Präferenzanpassung von LLMs Amulet:在测试时重新对齐,以适应LLM的个性化偏好 2502.19148v3 -
203 06-12 Air in Your Neighborhood: Fine-Grained AQI Forecasting Using Mobile Sensor Data Luft in Ihrer Nachbarschaft: Feinkörnige AQI-Prognose mit mobilen Sensordaten 利用移动感应器数据进行精密的AQI预报 2506.10332v1 -
204 06-12 CompilerDream: Learning a Compiler World Model for General Code Optimization CompilerDream: Lernen eines Compiler-Weltmodells für die allgemeine Code-Optimierung 汇编者:学习编纂者世界通用守则优化模式 2404.16077v3 -
205 06-12 Towards Scalable SOAP Note Generation: A Weakly Supervised Multimodal Framework Auf dem Weg zu einer skalierbaren SOAP-Note Generation: Ein schwach überwachter multimodaler Rahmen 向可缩放的 SOAP 笔记本生成方向迈进: 受微弱监督的多模式框架 2506.10328v1 -
206 06-12 A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon Ein Benchmark für die Verallgemeinerung unterschiedlicher Teamstrategien im wettbewerbsfähigen Pokémon 普凯蒙竞争中全面推广不同团队战略的基准 2506.10326v1 -
207 06-12 SWDL: Stratum-Wise Difference Learning with Deep Laplacian Pyramid for Semi-Supervised 3D Intracranial Hemorrhage Segmentation SWDL: Stratum-Wise Difference Learning mit tiefer Laplacian Pyramide für halbüberwachte 3D-Intrakranialblutungssegmentierung SWDL: 基于深度拉普拉斯金字塔的分层差异学习,用于半监督3D颅内出血分割 2506.10325v1 -
208 06-12 DiffUMI: Training-Free Universal Model Inversion via Unconditional Diffusion for Face Recognition DiffUMI: Schulungsfreie universelle Modellinversion über Bedingungslose Diffusion für die Gesichtserkennung DiffUMI:通过无条件传播面相承认,无培训通用模式翻版 2504.18015v2 -
209 06-12 DeePoly: A High-Order Accuracy Scientific Machine Learning Framework for Function Approximation and Solving PDE DeePoly: Ein High-Order Accuracy Scientific Machine Learning Framework für die Funktionsannäherung und Lösung von PDE DeePoly: 用于函数逼近和求解PDE的高阶精度科学机器学习框架 2506.04613v2 -
210 06-12 Policy-Based Trajectory Clustering in Offline Reinforcement Learning Policy-based Trajektorie Clustering im Offline-Verstärkungslernen 以政策为基础的离线强化学习轨迹集群 2506.09202v2 -
211 06-12 PerfTracker: Online Performance Troubleshooting for Large-scale Model Training in Production PerfTracker: Online-Performance-Fehlersuche für großformatige Modellschulungen in der Produktion PerfTracker:大规模生产示范培训在线绩效问题解决 2506.08528v3 -
212 06-12 Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting Aufwärtsgewichtung einfache Proben in Feinabstimmung Mitigates Vergessen 高重量加重量易加重量样本 2502.02797v2 -
213 06-12 PyLO: Towards Accessible Learned Optimizers in PyTorch PyLO: Auf dem Weg zu zugänglich gelernten Optimierern in PyTorch PyLO:在皮托尔奇努力实现无障碍、可获取的优化 2506.10315v1 -
214 06-12 Detecting Sockpuppetry on Wikipedia Using Meta-Learning Sockepuppetry auf Wikipedia erkennen Mit Meta-Learning 在维基百科上用元学习探测袜子布料 2506.10314v1 -
215 06-12 Collaborative Min-Max Regret in Grouped Multi-Armed Bandits Kollaboratives Min-Max-Regret in gruppierten Multi-Armed Bandits 分组多臂老虎机中的协作最小最大遗憾 2506.10313v1 -
216 06-12 Agnostic Smoothed Online Learning without Knowledge of the Base Measure Agnostisches geglättetes Online-Lernen ohne Kenntnis des Basismaßes 缺乏基础措施知识的Agnostic平滑在线学习 2410.05124v3 -
217 06-12 Self-learning signal classifier for decameter coherent scatter radars Selbstlernender Signalklassifikator für Dekameter-Konsistenzstreuradar 10度连续散射雷达自学信号分类器 2506.10305v1 -
218 06-12 The Alignment Trap: Complexity Barriers Die Alignment-Falle: Komplexitätsbarrieren 协调陷阱:复杂障碍 2506.10304v1 -
219 06-12 Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training ReLU-Netzwerke zusammenführen, um bei der Initialisierung und während des Trainings viele lineare Regionen exponieren zu lassen 在初始化和培训期间,将ReLU网络强制推至展示极多线性线性区域 2311.18022v7 -
220 06-12 Distributionally-Constrained Adversaries in Online Learning Verteilt bedingte Widersacher im Online-Lernen 在线学习中经分发的、经过培训的在线学习论文 2506.10293v1 -
221 06-12 Heterogeneous Multi-Agent Reinforcement Learning for Distributed Channel Access in WLANs Heterogenes Multi-Agenten-Verstärkungslernen für verteilten Kanalzugang in WLANs 不同多机构强化学习,用于在局域网中分配的频道接入 2412.14218v2 -
222 06-12 Context Is Not Comprehension Kontext ist nicht verständlich 背景不令人理解 2506.04907v4 -
223 06-12 ClusterUCB: Efficient Gradient-Based Data Selection for Targeted Fine-Tuning of LLMs ClusterUCB: Effiziente Gradient-basierte Datenauswahl für gezielte Feinabstimmung von LLMs ClusterUCB: 基于梯度的高效数据选择,用于LLM的定向微调 2506.10288v1 -
224 06-12 Graph-MLLM: Harnessing Multimodal Large Language Models for Multimodal Graph Learning Graph-MLLM: Multimodale große Sprachmodelle für multimodales Graphenlernen nutzen 图图-MLLM:利用多模式多模式大语言模型进行多模式图表学习 2506.10282v1 -
225 06-12 Noise Balance and Stationary Distribution of Stochastic Gradient Descent Geräuschausgleich und Stationäre Verteilung des stochastischen Gradientenabstiegs 沙粒渐变后裔的噪音平衡和固定分布 2308.06671v2 -
226 06-12 Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty Leichtes Datenset Pruning ohne vollständiges Training über Beispiel Schwierigkeit und Vorhersage Ungewissheit 轻量数据集不通过实例困难和预测不确定性进行充分培训 2502.06905v3 -
227 06-12 VQC-MLPNet: An Unconventional Hybrid Quantum-Classical Architecture for Scalable and Robust Quantum Machine Learning VQC-MLPNet: Eine unkonventionelle hybride Quantum-Klassische Architektur für skalierbares und robustes Quantum Machine Learning VQC-MLPNet:用于可缩放和强力量量机器学习的非常规混合量子分类结构 2506.10275v1 -
228 06-12 TimeBridge: Better Diffusion Prior Design with Bridge Models for Time Series Generation TimeBridge: Besseres Diffusions-Vorentwurf mit Bridge-Modellen für die Zeitreihenerstellung TimeBridge: 更好的传播前设计,具有时间序列生成的桥梁模型 2408.06672v2 -
229 06-12 GD doesn’t make the cut: Three ways that non-differentiability affects neural network training GD macht nicht den Schnitt: Drei Wege, dass Nicht-Unterschiedlichkeit das neuronale Netzwerk-Training beeinflusst GD没有做出削减: 三种无差异的方式影响神经网络培训 2401.08426v9 -
230 06-12 Predicting function of evolutionarily implausible DNA sequences Vorhersagefunktion evolutionär unglaubwürdiger DNA-Sequenzen 进化性DNA序列的预测功能 2506.10271v1 -
231 06-12 Interior-Point Vanishing Problem in Semidefinite Relaxations for Neural Network Verification Interior-Point Vanishing Problem in semidefinite Entspannung für Neural Network Verifikation 神经网络核查半无限期放松措施中的内部-点消失问题 2506.10269v1 -
232 06-12 Do Language Models Have Bayesian Brains? Distinguishing Stochastic and Deterministic Decision Patterns within Large Language Models Haben Sprachmodelle Bayesische Gehirne? Beeindruckende stochastische und deterministische Entscheidungsmuster innerhalb großer Sprachmodelle 语言模式是否具有贝耶斯人脑? 区分大语言模式中的斯托卡和决定性决定模式 2506.10268v1 -
233 06-12 Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update Heavy-Tailed Linear Bandits: Huber-Regression mit One-Pass-Update 重困线性强盗:用单页更新的 “ Huber Refer “ 后退 2503.00419v2 -
234 06-12 Meta-learning Representations for Learning from Multiple Annotators Meta-Learning-Darstellungen zum Lernen von mehreren Annotatoren 从多个宣传员那里学习的元学习代表 2506.10259v1 -
235 06-12 Elucidating the Design Space of Multimodal Protein Language Models Erklärung des Gestaltungsraums multimodaler Protein-Sprachmodelle 说明多模式蛋白质语言模型的设计空间 2504.11454v3 -
236 06-12 OmniSage: Large Scale, Multi-Entity Heterogeneous Graph Representation Learning OmniSage: Large Scale, Multi-Entity Heterogenes Graph Representation Learning 总括:大规模、多实体异质图代表制学习 2504.17811v3 -
237 06-12 Qronos: Correcting the Past by Shaping the Future… in Post-Training Quantization Qronos: Die Vergangenheit korrigieren, indem sie die Zukunft gestalten… in der Post-Training-Quantisierung Qronos:在培训后量化中通过塑造未来来纠正过去 2505.11695v2 -
238 06-11 (3) Worth Their Weight: Randomized and Regularized Block Kaczmarz Algorithms without Preprocessing Gewicht wert: Randomisierte und Regularisierte Blockkaczmarz-Algorithmen ohne Vorverarbeitung 重量:无预处理的随机和正规化卡茨马尔兹理算区块 2502.00882v2 -
239 06-11 A new type of federated clustering: A non-model-sharing approach Eine neue Art von föderierten Clustern: Ein nicht-Modell-Sharing-Ansatz 新型的联盟联合集群:非模式共享办法 2506.10244v1 -
240 06-11 PointNet with KAN versus PointNet with MLP for 3D Classification and Segmentation of Point Sets PointNet mit KAN versus PointNet mit MLP für 3D Klassifizierung und Segmentierung von Punktsätzen 与 KAN 和 PointNet 和 MLP 相对的点网点网点,用于 3D 的点集分类和分划 2410.10084v3 -
241 06-11 Function-Guided Conditional Generation Using Protein Language Models with Adapters Funktionsgeführte Conditional Generation Verwendung von Protein Language Models mit Adaptern 使用配有适应器的蛋白语言模型的 函数引导有条件生成 2410.03634v2 -
242 06-11 Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods Prompt Angriffe Enthüllen Oberflächliche Wissensentfernung in unlearning Methoden 以不学习方法清除知识 2506.10236v1 -
243 06-11 LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation LaMAGIC2: Erweiterte Schaltungsformulierungen für sprachmodellbasierte analoge Topologie-Generierung LaMAGIC2:语言模拟模拟模拟地形生成的先进电路配制 2506.10235v1 -
244 06-11 Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices Große Sprachmodelle sichern: Bedrohungen, Schwachstellen und verantwortungsvolle Praktiken 保障大语言模式:威胁、脆弱性和负责任做法 2403.12503v2 -
245 06-11 ScoreMix: Improving Face Recognition via Score Composition in Diffusion Generators ScoreMix: Verbesserung der Gesichtserkennung über Score-Komposition in Diffusionsgeneratoren 分数组合:通过传播发电机的分分数构成改善面对面的承认 2506.10226v1 -
246 06-11 Meta-learning Optimizers for Communication-Efficient Learning Meta-Learning-Optimierer für Kommunikations-Effizientes Lernen 通信-高效学习的元学习优化工具 2312.02204v2 -
247 06-11 Neural Networks Generalize on Low Complexity Data Neurale Netzwerke verallgemeinern auf Daten mit geringer Komplexität 通用低复杂度数据神经网络 2409.12446v3 -
248 06-11 The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks Die Auswirkungen von Feature Scaling im maschinellen Lernen: Auswirkungen auf Regressions- und Klassifizierungsaufgaben 机械学习中的特质增强效果:对倒退和分类任务的影响 2506.08274v2 -
249 06-11 SALAD: Systematic Assessment of Machine Unlearing on LLM-Aided Hardware Design SALAD: Systematische Bewertung der Maschinenentleerung auf LLM-Aided Hardware Design SALAD: 系统评估LLM辅助硬件设计中的机器渗漏 2506.02089v2 -
250 06-11 Cross-Learning Between ECG and PCG: Exploring Common and Exclusive Characteristics of Bimodal Electromechanical Cardiac Waveforms Cross-Learning zwischen EKG und PCG: Erforschung gemeinsamer und exklusiver Eigenschaften bimodaler elektromechanischer Herzwellenformen ECG和PCG之间的交叉学习:探索双式电动机械心脏病波形的常见和独有特点 2506.10212v1 -
251 06-11 Failure Modes of LLMs for Causal Reasoning on Narratives Failure Modes von LLMs für die ursächliche Begründung von Narrativen 以叙述为由解释原因的LLMs失败模式 2410.23884v4 -
252 06-11 SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models SealQA: Anhebung der Messlatte für die Vernunft in Search-Augmented Language Models SealQA: 提高搜索增强语言模式中的原因栏 2506.01062v2 -
253 06-11 ETS: Efficient Tree Search for Inference-Time Scaling ETS: Effiziente Baumsuche nach Inferenz-Zeitskalierung ETS: 高效树搜索推断时间缩放 2502.13575v2 -
254 06-11 Towards Fair Representation: Clustering and Consensus Auf dem Weg zu einer fairen Vertretung: Clustering und Konsens 实现公平代表性:集群和共识 2506.08673v2 -
255 06-11 AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent AWP: Activation-Aware Weight Pruning und Quantisierung mit projiziertem Gradient Descent AWP: 与预测的渐变人种相比,活性-敏量 体重 稳重和量化 2506.10205v1 -
256 06-11 VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use VTool-R1: VLMs lernen mit Bildern zu denken, indem sie mehr über multimodale Werkzeugnutzung lernen VTool-R1:VLMs通过多模式工具使用强化学习学习如何用图像思考 2505.19255v3 -
257 06-11 Prompt Variability Effects On LLM Code Generation Veränderliche Auswirkungen auf die LLM-Code-Generierung 对LLM 代码生成的迅速易变性效应 2506.10204v1 -
258 06-11 From Neural Representations to Interpretable Logic Rules Von neuralen Vertretungen zu interpretierbaren Logischen Regeln 从神经代表到可解释逻辑规则 2501.08281v2 -
259 06-11 Learning Geometric Invariant Features for Classification of Vector Polygons with Graph Message-passing Neural Network Lernen Geometrische Invariante Funktionen zur Klassifizierung von Vektor-Polygonen mit Graph Message-passing Neural Network 具有图形信件通过神经网络的矢量多边形分类 2407.04334v2 -
260 06-11 Privacy-aware Berrut Approximated Coded Computing for Federated Learning Privacy-aware Berrut Approximated Coded Computing für Federated Learning 面向联邦学习的隐私感知 Berrut 近似编码计算 2405.01704v3 -
261 06-11 DynaSubVAE: Adaptive Subgrouping for Scalable and Robust OOD Detection DynaSubVAE: Adaptive Subgruppierung für skalierbare und robuste OOD-Erkennung DynaSubVAE:可缩放和强力OOD探测适应性分组 2506.10200v1 -
262 06-11 Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance Objekt-Halluzination in großen Visions-Sprachen-Modellen durch Bild-umrundete Anleitung abmildern 通过图像四舍五入指南在大型视觉语言模型中减轻物体幻觉 2402.08680v2 -
263 06-11 Exploring Topological and Localization Phenomena in SSH Chains under Generalized AAH Modulation: A Computational Approach Erforschung topologischer und Lokalisierungsphänomene in SSH-Ketten unter generalisierter AAH-Modulation: Ein rechnerischer Ansatz 根据普遍AAAH变动:一种计算方法,在SSH链链中探索地形学和本地化现象 2506.10195v1 -
264 06-11 Multi-task Representation Learning for Mixed Integer Linear Programming Multi-Task-Darstellungs-Lernen für gemischte Integer-Linear-Programmierung 混合整数线性线性方案规划多任务代表学习 2412.14409v2 -
265 06-11 Improving Oral Cancer Outcomes Through Machine Learning and Dimensionality Reduction Verbesserung der Ergebnisse von Oralkrebs durch maschinelles Lernen und Reduzierung der Dimensionalität 通过机器学习和减少分量来改善口腔癌结果 2506.10189v1 -
266 06-11 Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce Zukunft der Arbeit mit KI-Agenten: Auditing Automation und Augmentation Potential in der gesamten US-Arbeitskraft 未来与AI代理机构的合作:全美国员工队伍的审计自动化和增强潜力 2506.06576v2 -
267 06-11 Balans: Multi-Armed Bandits-based Adaptive Large Neighborhood Search for Mixed-Integer Programming Problem Balans: Multi-Armed Bandits-basierte adaptive Großnachbarschaft Suche nach gemischt-integer-Programmierungsproblem Balans:多武装强盗基于适应性的大型邻里搜索混合内插方案拟订问题 2412.14382v2 -
268 06-11 Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment Skalierbare nicht-equivariante 3D-Molekül-Erzeugung über Rotational Alignment 通过旋转对齐可缩放的非等式 3D 分子生成 2506.10186v1 -
269 06-11 Optimizing Genetic Algorithms with Multilayer Perceptron Networks for Enhancing TinyFace Recognition Optimierung genetischer Algorithmen mit mehrschichtigen Perceptron-Netzwerken zur Verbesserung der TinyFace-Erkennung 利用增强微面识别的多层倍感人网络优化遗传算法 2506.10184v1 -
270 06-11 A Comparative Study of Machine Learning Techniques for Early Prediction of Diabetes Eine vergleichende Studie über Techniken des maschinellen Lernens zur Frühvorhersage von Diabetes 早期预测糖尿病的机器学习技术比较研究 2506.10180v1 -
271 06-11 Geometric Regularity in Deterministic Sampling of Diffusion-based Generative Models Geometrische Regelmäßigkeit bei der Deterministischen Probenahme von Diffusions-basierten Generativen Modellen 以传播为基础的创造模型的确定性抽样的几何规律性 2506.10177v1 -
272 06-11 SPARKE: Scalable Prompt-Aware Diversity Guidance in Diffusion Models via RKE Score SPARKE: Skalierbares Prompt-Aware Diversity Guidance in Diffusionsmodellen über RKE-Score SPARKE:通过RKE分数在传播模型中提供可缩放的即时快速软件多样性指导 2506.10173v1 -
273 06-11 Learning a Gaussian Mixture for Sparsity Regularization in Inverse Problems Eine Gaußsche Mischung für Sparsity-Regularisierung in inversen Problemen lernen 学习高斯混合模型,用于逆问题中的稀疏正则化 2401.16612v2 -
274 06-11 Momentum Multi-Marginal Schrödinger Bridge Matching Momentum Multi-Marginal Schrödinger Brücke passend 多波盘桥配桥 2506.10168v1 -
275 06-11 Wasserstein Barycenter Soft Actor-Critic Wasserstein Barycenter Soft-Actor-Kritik 瓦塞斯泰因·巴里研究中心 2506.10167v1 -
276 06-11 The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset Der PNPL-Wettbewerb 2025: Spracherkennung und Phoneme-Klassifizierung im LibriBrain-Datensatz 2025年PNPL竞争:Libribrain数据集中的语音探测和电话分类 2506.10165v1 -
277 06-11 Probabilistic Variational Contrastive Learning Probabilistisches Variationelles Kontrastives Lernen 概率概率变化差异性差异性学习 2506.10159v1 -
278 06-11 Measuring Corporate Human Capital Disclosures: Lexicon, Data, Code, and Research Opportunities Corporate Human Capital Disclosures messen: Lexikon, Daten, Code und Forschungsmöglichkeiten 衡量公司人力资本披露:词汇、数据、代码和研究机会 2506.10155v1 -
279 06-11 Analyzing Emotions in Bangla Social Media Comments Using Machine Learning and LIME Analysieren von Emotionen in Bangla Social Media Kommentare mit maschinellem Lernen und LIME 利用机器学习和LIME分析孟加拉社会媒体评论中的情感 2506.10154v1 -
280 06-11 Attention on flow control: transformer-based reinforcement learning for lift regulation in highly disturbed flows Aufmerksamkeit auf Strömungssteuerung: Transformer-basiertes Bewehrungslernen zur Heberegulierung in stark gestörten Strömungen 注意流动控制:以变压器为基础的增强力学习如何在高度动荡的流量中进行升降管制 2506.10153v1 -
281 06-11 Balanced Hyperbolic Embeddings Are Natural Out-of-Distribution Detectors Ausgewogene hyperbolische Einbettungen sind natürliche Out-of-Distribution-Detektoren 平衡的双曲嵌入器是天然流散探测器 2506.10146v1 -
282 06-11 Physiological-Model-Based Neural Network for Heart Rate Estimation during Daily Physical Activities Physiologisch-Modellbasiertes Neuronales Netzwerk zur Schätzung der Herzfrequenz während täglicher körperlicher Aktivitäten 日常体育活动中心电率估计的生理-基于模型的神经网络 2506.10144v1 -
283 06-11 Diffusion prior as a direct regularization term for FWI Diffusion vor der direkten Regularisierung für FWI 之前作为家庭福利基金会直接正规化期的传播 2506.10141v1 -
284 06-11 Survival Analysis as Imprecise Classification with Trainable Kernels Überlebensanalyse als ungenaue Klassifizierung mit lernfähigen Kernen 进行生存分析,将生存分析作为可训练内核的不精确分类 2506.10140v1 -
285 06-11 Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban Lernte Suche interpretieren: Ein Übergangsmodell und eine Wertfunktion in einer RNN zu finden, die Sokoban spielt 解释学识搜索:在播放 Sokoban 的 RNN 中找到过渡模式和价值函数 2506.10138v1 -
286 06-11 Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning Selbstprädiktive Darstellungen für die kombinatorische Generalisierung im Verhaltensklonen 行为克隆综合通用的自我预测代表 2506.10137v1 -
287 06-11 Provable Sim-to-Real Transfer via Offline Domain Randomization Provable Sim-to-Real-Übertragung über Offline-Domain Randomisierung 通过离线网域随机转接 2506.10133v1 -
288 06-11 ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs ViCrit: Eine überprüfbare Verstärkungs-Lern-Proxy-Aufgabe für visuelle Wahrnehmung in VLMs 虚拟: VLM 视觉感知的可核查强化学习代用任务 2506.10128v1 -
289 06-11 Generative Modeling with Diffusion Generative Modellierung mit Diffusion 利用传播生成模型 2412.10948v2 -
290 06-11 Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms Treffen Sie mich am Arm: Das kooperative Multi-Armed Bandits Problem mit geteilten Armen 与我在 “ 武器:合作的多武装强盗与可分享武器问题 “ 聚会 2506.10127v1 -
291 06-11 Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds’ Annotated Imagery Peer-Ranked Precision: Erstellung eines Basisdatensatzes für feinfühlige Visionsmodelle aus DataSeeds kommentierten Bildern 同行式精确度:从数据种子附加图象中为精美的愿景模型创建基础数据集 2506.05673v3 -
292 06-11 GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments GRAIL: Ein Benchmark für GRAPH ActIve Learning in dynamischen Sensing-Umgebungen GRAIL: 动态遥感环境中GRaph Active学习基准 2506.10120v1 -
293 06-11 Detecção da Psoríase Utilizando Visão Computacional: Uma Abordagem Comparativa Entre CNNs e Vision Transformers Detecção da Psoríase Utilizando Visão Computacional: Uma Abordagem Comparativa Entre CNNs e Vision Transformers 利用计算机视觉检测银屑病:CNN 与 Vision Transformer 的比较方法 2506.10119v1 -
294 06-11 NnD: Diffusion-based Generation of Physically-Nonnegative Objects NnD: Diffusionsbasierte Generierung physikalisch-nonnegativer Objekte NnD: 基于扩散的物理非负对象生成 2506.10112v1 -
295 06-11 AI5GTest: AI-Driven Specification-Aware Automated Testing and Validation of 5G O-RAN Components AI5GTest: AI-Driven Specification-Aware Automatisierte Prüfung und Validierung von 5G O-RAN Komponenten AI5GTest: AI驱动、规范感知的5G O-RAN组件自动化测试与验证 2506.10111v1 -
296 06-11 A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning Ein einheitliches Framework, um Symmetrie im maschinellen Lernen zu stärken, zu entdecken und zu fördern 实施、发现和促进机器学习对称的统一框架 2311.00212v3 -
297 06-11 Learning to Collaborate Over Graphs: A Selective Federated Multi-Task Learning Approach Lernen, über Graphen zu kollaborieren: Ein selektiver, federführender Lernansatz mit mehreren Aufgaben 学习 “ 合作翻图:选择性联邦多任务学习方法 “ 2506.10102v1 -
298 06-11 VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis VISTA: Vision-Language-Schlussfolgerung für eine trainingsfreie Analyse der Stock-Zeitreihen VISTA:无培训-库存无培训-时间-系列分析的远景-语言推断 2505.18570v3 -
299 06-11 Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes Grundlegende Grenzen des Lernens Hochdimensionale Vereinfachungen in Noisy Regimes 吵闹制度中学习高维简单主义的基本限制 2506.10101v1 -
300 06-11 Token-Efficient RL for LLM Reasoning Token-Efficient RL für LLM-Reasoning 用于LLM推理的 Token-Efficient RL 2504.20834v4 -
301 06-11 Estimating the Joint Probability of Scenario Parameters with Gaussian Mixture Copula Models Schätzung der gemeinsamen Wahrscheinlichkeit von Szenario-Parametern mit Gaussian Mixture Copula-Modellen 与高山混合混合胶合合构件模型共同估计设想参数的 联合概率 2506.10098v1 -
302 06-11 Unsupervised Deep Clustering of MNIST with Triplet-Enhanced Convolutional Autoencoders Unüberwachtes Deep Clustering von MNIST mit Triplet-Enhanced Convolutional Autoencodern 在无人监督下,MNIST与三维强化的革命自动自动编码器一起深入集群 2506.10094v1 -
303 06-11 Efficient kernelized bandit algorithms via exploration distributions Effiziente kernisierte Bandit-Algorithmen über Explorations-Distributionen 通过勘探分布分配进行高效率的内脏化土匪算法 2506.10091v1 -
304 06-11 Assortment Optimization for Patient-Provider Matching Sortimentsoptimierung für Patient-Provider-Matching 患者-供应者匹配的组群优化 2502.10353v2 -
305 06-11 Optimizing Latent Dimension Allocation in Hierarchical VAEs: Balancing Attenuation and Information Retention for OOD Detection Optimierung der Latent Dimension Allocation in Hierarchical VAEs: Abgleichende Dämpfung und Informationsretention für OOD Detection 优化在等级性VAEs中的中局量分配:平衡减速和信息保留以探测OOD 2506.10089v1 -
306 06-11 M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Predictive Embedding Architecture M3-JEPA: Multimodale Ausrichtung über Multi-Gate MoE auf Basis der Joint-Predictive Embedding Architecture M3-JEPA:通过基于联合预防嵌入结构的多门教育部实现多式联运协调 2409.05929v5 -
307 06-11 Patient-Specific Deep Reinforcement Learning for Automatic Replanning in Head-and-Neck Cancer Proton Therapy Patientenspezifisches Tiefen-Verstärkungs-Lernen zur automatischen Neuplanung in der Kopf-Neck-Krebs-Protonen-Therapie 在头部和颈部癌症质子治疗中进行自动再规划的 具体病人深强化学习 2506.10073v1 -
308 06-11 Textual Bayes: Quantifying Uncertainty in LLM-Based Systems Textbuchten: Quantifizierung der Unsicherheit in LLM-basierten Systemen 以LLM为基础的系统中的量化不确定性 2506.10060v1 -
309 06-11 DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos DGS-LRM: 实时变形 3D Gaussian 重建独视视频 2506.09997v1 -
310 06-11 Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling Flipping gegen alle Widrigkeiten: LLM-Münzwurf-Bias durch verbalisiertes Rejection Sampling reduzieren 逆势掷币:通过言语化拒绝采样减少LLM抛硬币偏差 2506.09998v1 -
311 06-11 EVINET: Towards Open-World Graph Learning via Evidential Reasoning Network EVINET: Auf dem Weg zum Open-World Graph Learning über Evidential Reasoning Network EVINET:通过证明理由网络走向开放世界图表学习 2506.07288v2 -
312 06-11 Text-Aware Image Restoration with Diffusion Models Text-Aware Bildrestaurierung mit Diffusionsmodellen 带有传播模型的文本软件图像恢复 2506.09993v1 -
313 06-11 Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Multiverse: Ihre Sprachmodelle entscheiden geheim, wie die Generation parallelisiert und zusammengeführt werden kann 多种语言:您的语言模型秘密决定如何平行和合并一代 2506.09991v1 -
314 06-11 Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation Chain-of-Action: Trajektorie Autoregressive Modellierung für die Robotermanipulation 操作链:机器人操纵的轨迹自动递减模型模型 2506.09990v1 -
315 06-11 Spectral Image Tokenizer Spektraler Bild-Tokenizer 频谱图像调试器 2412.09607v2 -
316 06-11 EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits EditInspector: Ein Benchmark für die Auswertung von textgeführten Bildbearbeitungen 编辑监视器:文本指导图像编辑评价基准 2506.09988v1 -
317 06-11 Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs Omni-DPO: Ein Dual-Perspektiven-Paradigma für dynamisches Preference-Lernen von LLMs Omni-DPO:一个双双面范式,以动态方式学习LLMs 2506.10054v1 -
318 06-11 A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs Ein Shortcut-aware-Video-QA-Benchmark für physisches Verständnis über minimale Video-Paare 通过最小视频对等器进行身体理解的捷径视距视频QA基准 2506.09987v1 -
319 06-11 V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning V-JEPA 2: Selbstüberwachte Videomodelle ermöglichen Verständnis, Vorhersage und Planung V-JEPA2:自我监督的视频模型,促进理解、预测和规划 2506.09985v1 -
320 06-11 Trustworthy AI: Safety, Bias, and Privacy – A Survey Vertrauenswürdige KI: Sicherheit, Bias und Privatsphäre – Eine Umfrage 值得信赖的大赦国际:安全、比亚斯和隐私 – – 调查 2502.10450v2 -
321 06-11 Resa: Transparent Reasoning Models via SAEs Resa: Transparente Begründungsmodelle über SAE Resa:通过SAEs建立透明说明理由模型 2506.09967v1 -
322 06-11 Gradient Aligned Regression via Pairwise Losses Gradient Aligned Regression über Paarweise Verluste 通过对等损失逐步走向一致后退 2402.06104v6 -
323 06-11 Griffin: Towards a Graph-Centric Relational Database Foundation Model Griffin: Auf dem Weg zu einem Graph-Centric Relational Database Foundation Modell 格里芬:走向图形中心关系数据库基础模型 2505.05568v2 -
324 06-11 Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models Provable Benefits of Unüberwachte Pre-Training and Transfer Learning via Single-Index Models 通过单一指标模式进行的未受监督的培训和转让前学习的惠益 2502.16849v2 -
325 06-11 Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy Kvasir-VQA-x1: Ein multimodaler Datensatz für medizinisches Reasoning und robustes MedVQA in der gastrointestinalen Endoskopie Kvasir-VQA-x1:一个用于医疗推理和胃肠内镜检查中鲁棒MedVQA的多模态数据集 2506.09958v1 -
326 06-11 Canonical Latent Representations in Conditional Diffusion Models Kanonische Latentendarstellungen in bedingten Diffusionsmodellen 有条件扩散模型中卡尼卡内端表示器 2506.09955v1 -
327 06-11 Scalable First-order Method for Certifying Optimal k-Sparse GLMs Skalierbare Methode erster Ordnung zur Zertifizierung optimaler k-Sparse-GLMs 用于认证最优 k 稀疏 GLM 的可扩展一阶方法 2502.09502v3 -
328 06-11 7B Fully Open Source Moxin-LLM/VLM – From Pretraining to GRPO-based Reinforcement Learning Enhancement 7B Vollständig offene Quelle Moxin-LLM/VLM – Vom Vortraining zur GRPO-basierten Verstärkungs-Lernverbesserung 7B 完全开放源代码 Moxin-LLM/VLM – 从预训练到基于GRPO的强化学习增强 2412.06845v5 -
329 06-11 Curvature Tuning: Provable Training-free Model Steering From a Single Parameter Curvature Tuning: Vorstellbare trainingsfreie Modelllenkung aus einem einzigen Parameter 曲线图示:从单一参数看可实现的无培训示范指导 2502.07783v4 -
330 06-11 The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability Die Komplexität der Online-Strategischen Entscheidungsfindung mit Informationen Asymmetrie und Wissenstransportierbarkeit 在线战略决策与信息不对称和知识可传输性 2506.09940v1 -
331 06-11 Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order Leveraging Coordinate Momentum in SignSGD und Muon: Speicheroptimierte Null-Order 利用 SignSGD 和 Muon 中的调控坐标动态: 内存适应的零点 2506.04430v2 -
332 06-11 Limits of nonlinear and dispersive fiber propagation for an optical fiber-based extreme learning machine Grenzen der nichtlinearen und dispersiven Faserausbreitung für eine optische Faser-basierte extreme Lernmaschine 光纤极端学习机器的非线性和分散性纤维传播限制 2503.03649v3 -
333 06-11 CaLMQA: Exploring culturally specific long-form question answering across 23 languages CaLMQA: Erforschen von kulturell spezifischen Langzeitfragen in 23 Sprachen CALMQA:探讨以23种语言回答文化上特有的长式问题 2406.17761v3 -
334 06-11 Generate-then-Verify: Reconstructing Data from Limited Published Statistics Generate-then-Verify: Rekonstruieren von Daten aus begrenzten veröffentlichten Statistiken 先生成后验证:从有限的已发布统计数据中重建数据 2504.21199v2 -
335 06-11 Discovering Forbidden Topics in Language Models Verbotene Themen in Sprachmodellen entdecken 发现语言模型中的禁止专题 2505.17441v3 -
336 06-11 RNE: a plug-and-play framework for diffusion density estimation and inference-time control RNE: ein Plug-and-Play-Framework für Diffusionsdichteschätzung und Inferenzzeitkontrolle RNE:用于传播密度估计和推断时间控制的一个插件和操作框架 2506.05668v2 -
337 06-11 Bayesian Probabilistic Matrix Factorization Bayesische probabilistische Matrix-Fabrikisierung Bayesian 概率矩阵 2506.09928v1 -
338 06-11 Mamba time series forecasting with uncertainty quantification Mamba-Zeitreihenvorhersage mit Unsicherheitsquantifizierung 带有不确定性量化的Mamba时间序列预测 2503.10873v2 -
339 06-11 Dynamic Diffusion Schrödinger Bridge in Astrophysical Observational Inversions Dynamische Diffusion Schrödinger Brücke in astrophysikalischen Beobachtungsinversionen 天体物理观测反演中的动态扩散Schrödinger桥 2506.08065v2 -
340 06-11 Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning Apollo: A Posteriori Label-Only Mitgliedschaft Inferenz Angriff auf Maschine Unlearning 阿波罗:一个后选标签 - 唯一的会员 推论攻击 机器退出学习 2506.09923v1 -
341 06-11 Logits-Based Finetuning Logits-based Finetuning 基于登录的微调 2505.24461v2 -
342 06-11 Optimal Noise Reduction in Dense Mixed-Membership Stochastic Block Models under Diverging Spiked Eigenvalues Condition Optimale Lärmreduzierung in Dichte Mischmitgliedschaft Stochastic Block Models unter Diverging Spiked Eigenvalues Zustand 在四分五裂的斯派派值条件下,减少高密度混合混合成员制碎块模型的最佳噪音 2307.14530v3 -
343 06-11 Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity Adam nutzt die $\ell_\infty$-Geometrie der Loss-Landschaft durch koordinatenweise Adaptivität Adam 通过逐坐标自适应性利用损失地形的 $\ell_\infty$ 几何结构 2410.08198v3 -
344 06-11 Temperature Optimization for Bayesian Deep Learning Temperaturoptimierung für Bayesian Deep Learning Bayesian深层学习的温度优化 2410.05757v2 -
345 06-11 PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants PersonaLens: Ein Benchmark für die Personalisierungsbewertung in Conversational AI Assistants 人际关系:对话性AI助理个人化评价基准 2506.09902v1 -
346 06-11 “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) “Was sind meine Optionen?”: Erklären von RL-Agenten mit unterschiedlichen nahezu optimalen Alternativen (erweitert) “我有什么选择?”:解释具有多种近最佳替代方法的RL代理(扩展) 2506.09901v1 -
347 06-11 The Remarkable Robustness of LLMs: Stages of Inference? Die bemerkenswerte Robustheit von LLMs: Stufen der Schlussfolgerung? LLMS的显著威力:推论阶段? 2406.19384v2 -
348 06-11 A look at adversarial attacks on radio waveforms from discrete latent space Ein Blick auf feindliche Angriffe auf Radiowellenformen aus diskretem latenten Raum 审视对来自离散潜空空间的无线电波形的对抗性攻击 2506.09896v1 -
349 06-11 Causal Climate Emulation with Bayesian Filtering Causal Klima Emulation mit Bayesian Filtering 以贝耶斯过滤器模拟因果气候 2506.09891v1 -
350 06-11 Learning single-index models via harmonic decomposition Ein-Index-Modelle über harmonische Zersetzung lernen 通过口腔分解学习单指数模型 2506.09887v1 -
351 06-11 OmniJet-$α_C$: Learning point cloud calorimeter simulations using generative transformers OmniJet-$α_C$: Simulationen von Punktwolkenkalorimetern mit generativen Transformern lernen OmniJet-$α_C$:使用生成式Transformer学习点云量能器模拟 2501.05534v2 -
352 06-11 Leveraging data-driven weather models for improving numerical weather prediction skill through large-scale spectral nudging Nutzung von datengesteuerten Wettermodellen zur Verbesserung der numerischen Wettervorhersagefähigkeit durch großflächige spektrale Nudging 利用数据驱动天气模型,通过大型光谱裸露提高数字天气预测技能 2407.06100v3 -
353 06-11 UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching UmbraTTS: Text-zu-Sprechen an Umweltkontexte anpassen mit Flow Matching UmbratTS:用流动匹配使文字语音适应环境环境环境 2506.09874v1 -
354 06-11 Private Aggregation for Byzantine-Resilient Heterogeneous Federated Learning Private Aggregation für Byzantinisch-Resilient Heterogenes Federated Learning Byzantine-抗后异异质联邦学习私营聚合 2506.09870v1 -
355 06-11 Machine Learning-Based Classification of Oils Using Dielectric Properties and Microwave Resonant Sensing Machine Learning-based Klassifizierung von Ölen mit dielektrischen Eigenschaften und Mikrowellen-Resonant Sensing 利用介电特性和微波谐振传感对油类进行基于机器学习的分类 2506.09867v1 -
356 06-11 Conformal Prediction as Bayesian Quadrature Konforme Vorhersage als bayesische Quadratur Bayesian 夸度的非正规预测 2502.13228v2 -
357 06-11 Guided Graph Compression for Quantum Graph Neural Networks Geführte Graphenkompression für Quantum Graphen-Neural-Netzwerke 量图神经网络的向导图形压缩 2506.09862v1 -
358 06-11 ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models ImageChain: Advancing Sequential Image-to-Text Reasoning in multimodalen großen Sprachmodellen 图像Chain:在多模式大语言模型中推进按顺序排列的图像到文字理由 2502.19409v2 -
359 06-11 Tractable hierarchies of convex relaxations for polynomial optimization on the nonnegative orthant Tractable Hierarchien konvexer Entspannungen zur polynomialen Optimierung auf dem nicht negativen Orthanten 用于非阴性恒星多边优化的 convex 放松的可跟踪的等级结构 2209.06175v2 -
360 06-11 FinTSBridge: A New Evaluation Suite for Real-world Financial Prediction with Advanced Time Series Models FinTSBridge: Eine neue Evaluierungs-Suite für reale finanzielle Vorhersagen mit Modellen der fortgeschrittenen Zeitreihen FinTSBridge:具有先进时间序列模型的实时世界金融预测新评价套件 2503.06928v2 -
361 06-11 Advancing Exchange Rate Forecasting: Leveraging Machine Learning and AI for Enhanced Accuracy in Global Financial Markets Advancing Exchange Rate Forecasting: Hebeling Machine Learning und KI für verbesserte Genauigkeit in den globalen Finanzmärkten 提高汇率预测:利用机器学习和大赦国际促进全球金融市场更加准确性 2506.09851v1 -
362 06-11 Product of Experts with LLMs: Boosting Performance on ARC Is a Matter of Perspective Produkt von Experten mit LLMs: Leistungssteigerung auf ARC ist eine Frage der Perspektive 基于LLM的专家乘积:提升ARC上的性能是一个视角问题 2505.07859v2 -
363 06-11 Reinforcement Learning from Human Feedback Verstärktes Lernen aus menschlichem Feedback 从人类反馈中学习 2504.12501v2 -
364 06-11 A Deep Generative Model for the Simulation of Discrete Karst Networks Ein tiefes Generatives Modell für die Simulation diskreter Karst-Netzwerke Discrete Karst 网络模拟深度生成模型 2506.09832v1 -
365 06-11 Multi-Variable Batch Bayesian Optimization in Materials Research: Synthetic Data Analysis of Noise Sensitivity and Problem Landscape Effects Multivariable Batch Bayesian Optimierung in der Materialforschung: Synthetische Datenanalyse von Lärmempfindlichkeit und Problemlandschaftseffekten 材料研究中多种可变批量贝耶斯最佳优化:噪音敏感度和问题景观效应合成数据分析 2504.03943v2 -
366 06-11 Online Discovery of Simulation Models for Evolving Business Processes (Extended Version) Online Discovery of Simulation Models for Evolving Business Processes (Erweiterte Version) 不断演变的业务流程模拟模型在线发现(扩展版) 2506.10049v1 -
367 06-11 Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation Sparser, besser, schneller, stärker: Sparsity Detection für effiziente automatische Differenzierung Sparser、更好、更快、更有力:为高效自动差异而进行分级检测 2501.17737v2 -
368 06-11 Metritocracy: Representative Metrics for Lite Benchmarks Metritokratie: Repräsentative Metrics für Lite Benchmarks 宏观经济:利特基准的代议制衡量标准 2506.09813v1 -
369 06-11 Generalizing Supervised Contrastive learning: A Projection Perspective Verallgemeinern des Überwachten Kontrastivlernens: Eine Projektionsperspektive 普及受监督的反竞争学习:预测视角 2506.09810v1 -
370 06-11 Network Dynamics-Based Framework for Understanding Deep Neural Networks Network Dynamics-based Framework zum Verständnis tiefer neuraler Netzwerke 了解深神经网络的动态框架 2501.02436v3 -
371 06-11 Using Shapley interactions to understand how models use structure Shapley-Interaktionen nutzen, um zu verstehen, wie Modelle Struktur verwenden 利用模拟互动来了解模型如何使用结构 2403.13106v2 -
372 06-11 Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy Automatische Behandlungsplanung mit Verstärkungstraining für eine hoch dosierte Prostata Brachytherapie 利用高剂量前列腺布拉奇疗法强化学习进行自动治疗规划 2506.09805v1 -
373 06-11 Regularizing Learnable Feature Extraction for Automatic Speech Recognition Regularisieren von Lern-Feature-Extraktion für automatische Spracherkennung 将自动语音识别的可读功能抽取器正规化 2506.09804v1 -
374 06-11 Devil’s Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols Teufelshand: Daten vergiften Angriffe auf lokal private Graphen-Lernprotokolle 魔鬼之手:对本地私人图案学习程序的数据毒害攻击 2506.09803v1 -
375 06-11 LogProber: Disentangling confidence from contamination in LLM responses LogProber: Entwirren des Vertrauens in LLM-Antworten 日志Prober:解除对LLM反应中污染的信心 2408.14352v2 -
376 06-11 Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction Einschließlich sprachlicher Einschränkungen aus der externen Wissensquelle für Audio-Visual Target Speech Extraction 将外部知识来源的语言限制纳入音像目标演讲摘录 2506.09792v1 -
377 06-11 DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images DANCE: Deep Learning-Assisted Analyse von Proteinsequenzen mit Chaos verbesserte Kaleidoskopische Bilder 成就:利用混乱强化卡莱多斯图象对蛋白序列进行深入学习辅助分析 2409.06694v3 -
378 06-11 Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing Darstellung Shattering in Transformers: Synthetische Studie mit Wissensbearbeitung 在变形器中代表变形器:带有知识编辑的合成研究 2410.17194v5 -
379 06-11 A theoretical framework for self-supervised contrastive learning for continuous dependent data Ein theoretischer Rahmen für selbstüberwachtes kontrastives Lernen für kontinuierliche abhängige Daten 为持续依赖数据进行自我监督对比式学习的理论框架 2506.09785v1 -
380 06-11 On the Similarities of Embeddings in Contrastive Learning Über die Ähnlichkeiten von Einbettungen im kontrastiven Lernen 关于差异学习中的嵌入相似性 2506.09781v1 -
381 06-11 Cross-Channel Unlabeled Sensing over a Union of Signal Subspaces Cross-Channel-Unmarkierte Sensing über eine Union von Signal-Subräumen 对信号子空间联盟的交叉通道无标签遥感 2506.09773v1 -
382 06-11 Load-Aware Training Scheduling for Model Circulation-based Decentralized Federated Learning Load-Aware-Trainingsplanung für Modellzirkulationsbasiertes dezentralisiertes Federated Learning 模拟环环流分散式联邦学习模式的负载软件培训计划 2506.09769v1 -
383 06-11 Anomaly Detection via Autoencoder Composite Features and NCE Anomalieerkennung über Autoencoder Composite-Features und NCE 通过自动编码器合成特征和 NCE 异常探测 2502.01920v2 -
384 06-11 Learning to Optimize Package Picking for Large-Scale, Real-World Robot Induction Optimierung der Auswahl von Paketen für großformatige, real-Welt-Roboter-Induktion 学习优化大型、现实世界机器人上岗软件包选择 2506.09765v1 -
385 06-11 Alice and the Caterpillar: A more descriptive null model for assessing data mining results Alice und die Caterpillar: Ein beschreibendes Nullmodell zur Bewertung der Ergebnisse des Data Mining 爱丽丝和毛毛虫:评估数据开采结果的更具描述性的无效模式 2506.09764v1 -
386 06-11 Generating Likely Counterfactuals Using Sum-Product Networks Wahrscheinliche Counterfactuals mit Sum-Product-Netzwerken generieren 使用总和生产网络产生可能的反事实 2401.14086v5 -
387 06-11 STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization STAR: Vielfältiges Lernen von Roboter-Fähigkeit Abstraktionen durch Rotations-Augmented Vector Quantization STAR:通过旋转-增强矢量量化,学习多式机器人技能变异 2506.03863v2 -
388 06-11 Feature Engineering for Agents: An Adaptive Cognitive Architecture for Interpretable ML Monitoring Feature Engineering for Agents: Eine adaptive Kognitive Architektur für interpretierbare ML-Überwachung 代理物特征工程:用于可解释ML监测的适应性认知结构 2506.09742v1 -
389 06-11 Byzantine-Resilient Decentralized Multi-Armed Bandits Byzantinisch-Resiliente dezentralisierte Multi-Armed Banditen 拜占庭抗力强的分散式多武装强盗 2310.07320v2 -
390 06-11 Towards Multi-modal Graph Large Language Model Auf dem Weg zum multimodalen Graphenmodell für große Sprachen 走向多式大语言数字模型 2506.09738v1 -
391 06-11 AtmosMJ: Revisiting Gating Mechanism for AI Weather Forecasting Beyond the Year Scale AtmosMJ: Über die Jahresskala hinausgehende KI-Wettervorhersagen über das Gateing-Mechanismus hinaus AtmosMJ: 重新研究国际气象预报系统在年规模以后的天气预报的确定机制 2506.09733v1 -
392 06-11 Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts Auf dem Weg zu Basismodellen für die dynamische Systemrekonstruktion: Hierarchisches Meta-Learning über die Mischung von Experten 建立动态系统重建基础模式:通过专家混合学习 2502.05335v2 -
393 06-11 Reevaluating Meta-Learning Optimization Algorithms Through Contextual Self-Modulation Neubewertung von Meta-Learning-Optimierungsalgorithmen durch kontextuelle Selbst-Modulation 通过上下文自调制重新评估元学习优化算法 2410.01655v2 -
394 06-11 Auto-Compressing Networks Autokomprimierende Netzwerke 自动压缩网络 2506.09714v1 -
395 06-11 Training-Free Voice Conversion with Factorized Optimal Transport Schulungsfreie Sprachumwandlung mit Factorized Optimal Transport 以优化运输为要素的免费培训语音转换 2506.09709v1 -
396 06-11 LLM2TEA: Agentic AI Designer Finds Innovative Objects with Generative Evolutionary Multitasking LLM2TEA: Agentischer AI-Designer findet innovative Objekte mit generativem evolutionären Multitasking LLM2TEA: ARI 设计师发现具有引演进进化多任务的创新对象 2406.14917v2 -
397 06-11 On the Importance of Embedding Norms in Self-Supervised Learning Über die Bedeutung der Einbettung von Normen in das Selbstüberwachte Lernen 关于自我监督学习中嵌入规范的重要性 2502.09252v2 -
398 06-11 TRIDENT: Temporally Restricted Inference via DFA-Enhanced Neural Traversal TRIDENT: Vorübergehend eingeschränkte Schlussfolgerung über DFA-verstärktes Neuraltraversal 通过DFA-增强的神经轨迹轨迹临时受限制的引文 2506.09701v1 -
399 06-11 SLEEPYLAND: trust begins with fair evaluation of automatic sleep staging models SLEEPYLAND: Vertrauen beginnt mit einer fairen Bewertung automatischer Schlafstadien-Modelle SLEEPYLAND:信任始于对自动睡眠分期模型的公平评估 2506.08574v2 -
400 06-11 Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment Jenseits von Bradley-Terry-Modellen: Ein allgemeines Präferenzmodell für Sprachmodellausrichtung 超越布拉德利-泰瑞模式:语文模式统一通用优先模式 2410.02197v3 -
401 06-11 Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation Auto-Regressive vs Flow-Matching: Eine vergleichende Studie zur Modellierung von Paradigmen für die Text-zu-Musik-Generierung 自动递减与流动配接: 文本到音乐一代模型模型比较研究 2506.08570v2 -
402 06-11 Share Secrets for Privacy: Confidential Forecasting with Vertical Federated Learning Share Secrets for Privacy: Vertrauliche Vorhersagen mit vertikalem Federated Learning 私隐共享秘密:与垂直联邦学习有关的保密预报 2405.20761v2 -
403 06-11 Adding simple structure at inference improves Vision-Language Compositionality Einfache Struktur bei Inferenz hinzufügen verbessert die Vision-Sprachenkompositionalität 在推论中添加简单结构改进视野-语言的构成性 2506.09691v1 -
404 06-11 Wasserstein Hypergraph Neural Network Wasserstein Hypergraph Neuronales Netzwerk Vasserstein 高音神经网络 2506.09682v1 -
405 06-11 Assessing the Quality of Denoising Diffusion Models in Wasserstein Distance: Noisy Score and Optimal Bounds Bewertung der Qualität des Denoisierens von Diffusionsmodellen in Wasserstein Abstand: Geräuschpegel und optimale Bounds 评估瓦西尔斯坦距离中Disoising扩散模型的质量:噪音分数和最佳区区 2506.09681v1 -
406 06-11 SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting SplitLoRA: Balance Stabilität und Plastizität im kontinuierlichen Lernen durch gradienten Raum Splitting Split LoRA:通过逐步空间分割在持续学习中平衡稳定和可塑性 2505.22370v3 -
407 06-11 Wavelet Scattering Transform and Fourier Representation for Offline Detection of Malicious Clients in Federated Learning Wavelet Scattering Transform und Fourier-Darstellung zur Offline-Erkennung von bösartigen Kunden im Federated Learning 联邦学习联盟恶意客户离线检测分波散散变形和Fourier代表处 2506.09674v1 -
408 06-11 Optimizing Shortfall Risk Metric for Learning Regression Models Optimierung von Shortfall Risk Metric zum Lernen von Regressionsmodellen 优化 “ 短缺风险 “ 风险计量模型的学习倒退模型 2505.17777v3 -
409 06-11 CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization CROW: Beseitigung von Backdoors aus großen Sprachmodellen über interne Konsistenzregularisierung CROW:通过内部一致性规范化,消除大语言模式中的后门 2411.12768v2 -
410 06-11 CINeMA: Conditional Implicit Neural Multi-Modal Atlas for a Spatio-Temporal Representation of the Perinatal Brain CINeMA: Bedingter Impliziter Neuraler Multi-Modaler Atlas für eine räumlich-zeitliche Darstellung des Perinatalen Gehirns CINEMA: 围产期大脑空间时/时代表性有条件的隐性神经隐性多模式图集 2506.09668v1 -
411 06-11 Learning Time-Varying Multi-Region Brain Communications via Scalable Markovian Gaussian Processes Lernen von zeitvariierenden Multi-Region Gehirnkommunikation über skalierbare Markovian Gaussian Prozesse 通过可缩放的马尔科维扬高斯进程进行学习、改变时间的多区域脑交流 2407.00397v5 -
412 06-11 Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement Anmerkungsfreie MIDI-to-Audio-Synthese über Konkatenative Synthese und Generative Veredelung 通过分类合成和生产改进进行无说明的MIDI到研究综合综述 2410.16785v2 -
413 06-11 SyncFed: Time-Aware Federated Learning through Explicit Timestamping and Synchronization SyncFed: Time-Aware-Federated Learning durch explizites Zeitstempeln und Synchronisieren 同步Fed:通过明确时间印记和同步化进行时间软件联邦学习 2506.09660v1 -
414 06-11 Intent Factored Generation: Unleashing the Diversity in Your Language Model Intent Factored Generation: Die Vielfalt in Ihrem Sprachmodell entfesseln 内在因数代:在你的语言模式中释放多样性 2506.09659v1 -
415 06-11 DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy DipLLM: Feinsteuerungs-LLM für strategische Entscheidungsfindung in der Diplomatie DipLLM: 外交战略决策的精细推荐LLM 2506.09655v1 -
416 06-11 Improving Discriminator Guidance in Diffusion Models Verbesserung der Orientierungshilfen für Diskriminatoren bei Diffusionsmodellen 改进传播模式中反对者的指导 2503.16117v2 -
417 06-11 Revisiting the Equivalence of Bayesian Neural Networks and Gaussian Processes: On the Importance of Learning Activations Die Gleichwertigkeit Bayesischer neuraler Netzwerke und Gaussian-Prozesse neu betrachtet: Über die Bedeutung von Lernaktivierungen 重新审视贝叶斯神经网络与高斯过程的等价性:论学习激活函数的重要性 2410.15777v3 -
418 06-11 Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension Diskrepanzen sind Tugend: Schwach-zu-starke Verallgemeinerung durch Lens der Intrinsischen Dimension 差异是道德:通过内分泌尺寸的透镜对电压的微弱普遍化 2502.05075v4 -
419 06-11 Pruning Spurious Subgraphs for Graph Out-of-Distribution Generalization Pruning Spurious Subgraphs für Graph Out-of-Distribution Verallgemeinerung 为图的分布外泛化剪除虚假子图 2506.05957v3 -
420 06-11 No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces Keine Aufgabe nach links: Isotropes Modell, das sich mit gemeinsamen und aufgabenspezifischen Subräumen vermischt 未留下任务 : 与共同和任务特定子空间合并的异形模型 2502.04959v3 -
421 06-11 HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios HopaDIFF: Ganzheitlich-Partial Aware Fourier Conditioned Diffusion für die Referenzierung von Human Action Segmentierung in Multi-Personen-Szenarien HopaDIFF: 在多重人的情况中为提及人类行动分部分而传播的全方位全方位认识 2506.09650v1 -
422 06-11 Scaling Laws for Uncertainty in Deep Learning Skalierungsgesetze für Ungewissheit im Deep Learning 强化深层学习不确定性法律 2506.09648v1 -
423 06-11 Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls Verteilungs- und adversariell robuste Logistische Regression über Intersecting Wasserstein Balls 通过交叉瓦塞施泰因球的分布式和反向强劲后勤后退 2407.13625v4 -
424 06-11 Real-Time Network Traffic Forecasting with Missing Data: A Generative Model Approach Echtzeit-Netzverkehr mit fehlenden Daten: Ein generatives Modellansatz 缺少数据实时网络流量预测:生成模式方法 2506.09647v1 -
425 06-11 Generalizable and Fast Surrogates: Model Predictive Control of Articulated Soft Robots using Physics-Informed Neural Networks Generalisierbare und schnelle Surrogate: Modellvorhersagesteuerung von artikulierten Soft Robots mit physikinformierten Neuronalen Netzwerken 通用和快速超速超速:利用物理成形神经网络对人工软体机器人进行模拟预测控制 2502.01916v2 -
426 06-11 Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering Effizientes und generalisierbares Graph Retriever für Wissens-Graph-Frageantworten lernen 知识拼图问题解答的学习效率和通用图表 2506.09645v1 -
427 06-11 Evasion Attacks Against Bayesian Predictive Models Evasionsangriffe gegen bayesische Vorhersagemodelle 攻击贝耶斯人预测模型 2506.09640v1 -
428 06-11 FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models FedVLMBench:视觉-语言模型联邦微调基准 2506.09638v1 -
429 06-11 Model Attribution and Detection of Synthetic Speech via Vocoder Fingerprints Modellzuweisung und Erkennung von Synthetischer Sprache über Vocoder-Fingerabdrücke 通过 Vocoder 指纹识别合成演讲的模型归属和检测 2411.14013v2 -
430 06-11 In-Context Bias Propagation in LLM-Based Tabular Data Generation In-Context Bias Propagation in der LLM-basierten tabellarischen Datengenerierung 以LLM为基础的制表数据生成中的Intext Bias 2506.09630v1 -
431 06-11 Holistic Uncertainty Estimation For Open-Set Recognition Ganzheitliche Ungewissheitsschätzung für Open-Set-Erkennung Open-Set 识别的全局性不确定性估算 2408.14229v2 -
432 06-11 GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras GLGENN: Eine neuartige Parameter-Licht-Equivariante Neural Networks-Architektur auf Basis von Clifford Geometrischen Algebras GLGENN: 以克里夫德几何代数为基础的新参数-日等均质神经网络建筑 2506.09625v1 -
433 06-11 SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot SparseSSM: Effiziente selektiv strukturierte State Space-Modelle können in One-Shot beschnitten werden SparseSSM:高效的选择性结构化状态空间模型可以一次性剪枝 2506.09613v1 -
434 06-11 Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism Verständnis der Kompetenzlücke in recurrenten Sprachmodellen: Die Rolle des Sammel- und Aggregatmechanismus 了解经常语文模式中的技能差距:集合和集合机制的作用 2504.18574v2 -
435 06-11 Accelerating Large-Scale Regularized High-Order Tensor Recovery Beschleunigung der großflächigen Regularisierung der High-Order Tensor Erholung 加速大规模正规化高阶天文恢复 2506.09594v1 -
436 06-11 Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting Physics-informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting 用于全球次季对季预报的计算机化远程连接软件变换器 2506.08049v2 -
437 06-11 Metric-Entropy Limits on the Approximation of Nonlinear Dynamical Systems Metrisch-Entropie-Grenzen zur Annäherung nichtlinearer dynamischer Systeme 非线-线性动态系统接近度的中子-内基限值 2407.01250v2 -
438 06-11 Beyond Overconfidence: Foundation Models Redefine Calibration in Deep Neural Networks Jenseits von Overconfidence: Foundation-Modelle definieren die Kalibrierung in tiefen neuronalen Netzen neu 超越过度自信:基础模型重新定义深度神经网络中的校准 2506.09593v1 -
439 06-11 Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis Decoding Knowledge Attribution in Mixture-of-Experts: Ein Rahmenwerk der Basic-Refinement-Kollaboration und Effizienzanalyse 在混合专家中解码知识的配置:基础改进协作和效率分析框架 2505.24593v2 -
440 06-11 Human-like object concept representations emerge naturally in multimodal large language models Mensch-ähnliche Objekt-Konzept-Darstellungen entstehen natürlich in multimodalen großen Sprachmodellen 在多式大型语言模式中自然出现类似人类的物体概念表述 2407.01067v3 -
441 06-11 Feasibility Study of CNNs and MLPs for Radiation Heat Transfer in 2-D Furnaces with Spectrally Participative Gases Machbarkeitsstudie von CNNs und MLPs für den Strahlungswärmetransfer in 2-D-Öfen mit spektral partizipierenden Gasen CNN与MLP用于含光谱参与性气体的二维炉内辐射传热的可行性研究 2506.08033v2 -
442 06-11 MOORL: A Framework for Integrating Offline-Online Reinforcement Learning MOORL: Ein Rahmen für die Integration von Offline-Online-Verstärkungslernen MOORL:整合离线-在线强化学习的框架 2506.09574v1 -
443 06-11 Mechanistic PDE Networks for Discovery of Governing Equations Mechanistische PDE-Netzwerke für die Entdeckung von Regierenden Gleichungen 发现管理等分的机械 PDE 网络 2502.18377v2 -
444 06-11 From Symbolic to Neural and Back: Exploring Knowledge Graph-Large Language Model Synergies Von Symbolisch zu Neural und Zurück: Erforschen von Wissen Graphen-Große Sprachmodell-Synergien 从符号到神经和后背:探索知识图集-大型语言模式协同作用 2506.09566v1 -
445 06-11 Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics Funktion Normalisierung verhindert Zusammenbruch nicht kontrastiver Lerndynamiken 防止非争议学习动态的崩溃 2309.16109v2 -
446 06-11 Traceable LLM-based validation of statements in knowledge graphs Rückführbare LLM-basierte Validierung von Aussagen in Wissensgraphen 以可追踪的LLM为基础,对知识图中的报表进行可追踪的LLM验证 2409.07507v2 -
447 06-11 On Finetuning Tabular Foundation Models Über Finetuning Tabular Foundation Modelle 关于微调表表基础模型 2506.08982v2 -
448 06-11 Large Language Models Miss the Multi-Agent Mark Große Sprachmodelle verfehlen das Multi-Agenten-Ziel 大语言模型未达到多智能体的要求 2505.21298v2 -
449 06-11 Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means Robuste Verteilte Schätzung: Erweiterung von Gossip-Algorithmen auf Rangfolge und Trimmmittel 强有力的分布分布式估算:将Gossip的数值扩大至排名和缩略语 2505.17836v5 -
450 06-11 DeepExtractor: Time-domain reconstruction of signals and glitches in gravitational wave data with deep learning DeepExtractor: Zeit-Domänen-Rekonstruktion von Signalen und Störungen in Gravitationswellendaten mit Deep Learning 深度提取器:用深层学习重建引力波数据信号和小故障的时间- 域重建 2501.18423v3 -
451 06-11 Differentially Private Federated $k$-Means Clustering with Server-Side Data Differentiell privates föderiertes $k$-Means-Clustering mit serverseitigen Daten 利用服务器端数据的差分隐私联邦$k$-均值聚类 2506.05408v2 -
452 06-11 The Causal Information Bottleneck and Optimal Causal Variable Abstractions Die Kausalinformation Engpass und Optimale Kausalvariable Abstraktionen 因果信息瓶颈和最佳因果变量抽象 2410.00535v5 -
453 06-11 Identifiable Latent Bandits: Leveraging observational data for personalized decision-making Identifizierbare Latent Bandits: Nutzung von Beobachtungsdaten für personalisierte Entscheidungen 利用观测数据进行个性化决策 2407.16239v4 -
454 06-11 Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information Eng gekoppelte LiDAR-IMU-Leg-Odometrie mit Online-Lektionskinematik, die Fußtaktile Informationen enthält LIDAR-IMU-Leg Odo量度与包含脚触觉信息的在线学习腿脚力学 2506.09548v1 -
455 06-11 Design Patterns for Securing LLM Agents against Prompt Injections Designmuster zur Sicherung von LLM-Agenten gegen Prompt-Injektionen 保护LLM智能体免受提示注入攻击的设计模式 2506.08837v2 -
456 06-11 On the Privacy Risks of Spiking Neural Networks: A Membership Inference Analysis Über die Datenschutzrisiken von Spiking Neural Networks: Eine Membership-Inference-Analyse 关于脉冲神经网络的隐私风险:成员推断分析 2502.13191v4 -
457 06-11 Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models Athena: Multimodale Vernunft mit dateneffizienten Prozess-Reward-Modellen verbessern Athena:利用数据高效流程奖励模式加强多式联运理由 2506.09532v1 -
458 06-11 Neural Functions for Learning Periodic Signal Neurale Funktionen für das Lernen von periodischem Signal 学习周期信号的神经功能 2506.09526v1 -
459 06-11 Bias Detection via Maximum Subgroup Discrepancy Bias-Erkennung über maximale Untergruppe Diskrepanz 通过最大分组差异探测比亚斯 2502.02221v2 -
460 06-11 LLM-Powered CPI Prediction Inference with Online Text Time Series LLM-Powered CPI Prediction Inferenz mit Online Text Time Series LLM 授权的CPI 在线文本时间序列预测推论 2506.09516v1 -
461 06-11 A Survey on the Role of Artificial Intelligence and Machine Learning in 6G-V2X Applications Eine Umfrage zur Rolle von Künstlicher Intelligenz und maschinellem Lernen in 6G-V2X Anwendungen 关于6G-V2X应用中人工情报和机器学习的作用的调查 2506.09512v1 -
462 06-11 Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design Effizientes Preference-Based-Verstärkungslernen: Randomized Exploration trifft auf experimentelles Design 高效的以优惠为基础的强化学习:随机探索与实验设计 2506.09508v1 -
463 06-11 A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes Eine einheitliche Theorie der Kompositionalität, Modularität und Interpretationsfähigkeit in Markov-Entscheidungsprozessen 马尔科夫决定程序的组成、模式和解释统一理论 2506.09499v1 -
464 06-11 LEMUR Neural Network Dataset: Towards Seamless AutoML LEMUR Neural Network Dataset: Auf dem Weg zur nahtlosen AutoML LEMUR 神经网络数据集: 走向无缝自动ML 2504.10552v2 -
465 06-11 EnerBridge-DPO: Energy-Guided Protein Inverse Folding with Markov Bridges and Direct Preference Optimization EnerBridge-DPO: Energiegeführtes Protein inverses Falten mit Markov-Brücken und direkter Präferenzoptimierung EnerBridge-DPO:以Markov桥和直接优化为首的热导蛋白质反向折叠 2506.09496v1 -
466 06-11 Bridging Online Behavior and Clinical Insight: A Longitudinal LLM-based Study of Suicidality on YouTube Reveals Novel Digital Markers Bridging Online Behavior and Clinical Insight: Eine längsseitige LLM-basierte Studie zur Suizidalität auf YouTube enthüllt neuartige digitale Marker 架桥在线行为和临床透视:YouTube Reveals新数字标记上基于纵向LLM的自杀性纵向LM研究 2506.09495v1 -
467 06-11 MIRA: Medical Time Series Foundation Model for Real-World Health Data MIRA: Medical Time Series Foundation Modell für real-World Gesundheitsdaten 医疗时间系列基金会实际世界卫生数据模型 2506.07584v2 -
468 06-11 Minimax optimality of deep neural networks on dependent data via PAC-Bayes bounds Minimax-Optimalität von tiefen neuronalen Netzwerken auf abhängigen Daten über PAC-Bayes-Grenzen 通过PAC-Bayes界限依靠数据建立深海神经网络的最小最佳程度 2410.21702v2 -
469 06-11 BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation BemaGANv2: Eine Tutorial- und Vergleichsstudie von GAN-basierten Vocodern für die Langzeit-Audiogenerierung BemaGANv2: 以GAN为基地的长期音频生成Vocoders的教学和比较调查 2506.09487v1 -
470 06-11 Plug-and-Play image restoration with Stochastic deNOising REgularization Plug-and-Play-Bildrestaurierung mit stochastischer Entnoisierung der Regularisierung 利用斯托卡式脱氧化还溶性恢复插件和布板图像 2402.01779v3 -
471 06-11 Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling Auf dem Weg zu einem einheitlichen und verlustfreien Latent-Raum für 3D-Molekular-Latent-Diffusionsmodell 迈向3D分子中子中子扩散模型的 统一和无损失的冷冻空间 2503.15567v3 -
472 06-11 Convergence Conditions for Stochastic Line Search Based Optimization of Over-parametrized Models Konvergenzbedingungen für stochastische Liniensuche auf Basis der Optimierung von überparametrisierten Modellen 存储线搜索的一致条件 优化过度平衡模型的优化 2408.03199v2 -
473 06-11 On a few pitfalls in KL divergence gradient estimation for RL Bei einigen Fallen bei KL-Divergenzgradientenschätzung für RL KL RL差差差梯度估算的几个陷阱 2506.09477v1 -
474 06-11 Adjusting Model Size in Continual Gaussian Processes: How Big is Big Enough? Modellgröße in kontinuierlichen Gauß-Prozessen anpassen: Wie groß ist groß genug? 调整连续高斯进程模型大小:大到多大? 2408.07588v4 -
475 06-11 A Reinforcement Learning Approach for RIS-aided Fair Communications Ein Verstärkungslern-Ansatz für RIS-gestützte Fair Communications 以强化学习方法促进RIS辅助的公平通信 2506.06344v2 -
476 06-11 Forecasting high-impact research topics via machine learning on evolving knowledge graphs Prognose hochwirksamer Forschungsthemen durch maschinelles Lernen auf sich entwickelnden Wissensgraphen 通过对不断演变的知识图表进行机器学习,预测高影响研究专题 2402.08640v4 -
477 06-11 ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning ReinFlow: Feinsteuerungs-Flow Matching-Politik mit Online-Verstärkungs-Lernen ReinFlow: 与在线强化学习匹配流动政策的微调 2505.22094v3 -
478 06-11 Bayesian Neural Scaling Law Extrapolation with Prior-Fitted Networks Bayesian Neural Scaling Law Extrapolation mit Prior-Fitted-Netzwerke Bayesian神经扩增法与事先确定网络的外推法 2505.23032v2 -
479 06-11 Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms Auf dem Weg zur Überbrückung des Reward-Generation-Gaps in direkten Ausrichtungsalgorithmen 争取弥合直接对齐算法中的奖励-生成差距 2506.09457v1 -
480 06-11 NDCG-Consistent Softmax Approximation with Accelerated Convergence NDCG-Konsistente Softmax-Annäherung mit beschleunigter Konvergenz NDCG-Consistent Softmax 与加速汇合相近 2506.09454v1 -
481 06-11 MADCluster: Model-agnostic Anomaly Detection with Self-supervised Clustering Network MADCluster: Modell-agnostische Anomalieerkennung mit selbstüberwachtem Clustering-Netzwerk MADCluster:使用自监管的集群网进行模型-不可知异常探测 2505.16223v5 -
482 06-11 SafeCoT: Improving VLM Safety with Minimal Reasoning SafeCoT: VLM-Sicherheit mit minimaler Vernunft verbessern SafeCot: 提高VLM安全性,并采用最低合理理由 2506.08399v2 -
483 06-11 Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform Verschleierungen von LLM-Embedding-Sequenzen lernen: Stained Glass Transform 学习LLM嵌入序列的混淆:Stained Glass Transform 2506.09452v1 -
484 06-11 Safe Screening Rules for Group SLOPE Sichere Screening-Regeln für Gruppen-SLOPE Group SLOPE的安全筛选规则 2506.09451v1 -
485 06-11 Modified K-means Algorithm with Local Optimality Guarantees Modifizierter K-Mittel-Algorithmus mit lokalen Optimierungsgarantien 使用本地最佳度保障修改 K 平均值 2506.06990v2 -
486 06-11 Attention-Bayesian Hybrid Approach to Modular Multiple Particle Tracking Aufmerksamkeits-Bayesischer Hybrid-Ansatz für modulares multiples Partikel-Tracking 模块多粒子跟踪的注意 – – 巴伊斯混合办法 2506.09441v1 -
487 06-11 Optimization over Sparse Support-Preserving Sets: Two-Step Projection with Global Optimality Guarantees Optimierung über Sparse Support-Preserving Sets: Zwei-Schritt-Projektion mit globalen Optimierungsgarantien 使用全球最佳保证的两步投影 2506.08558v2 -
488 06-11 Generalization Error Analysis for Attack-Free and Byzantine-Resilient Decentralized Learning with Data Heterogeneity Generalisierungsfehleranalyse für Angriffsfreies und Byzantinisch-Resilientes dezentralisiertes Lernen mit Daten Heterogenität 为无攻击攻击和有数据差异化的抗拜占庭分散化学习提供通用错误分析 2506.09438v1 -
489 06-11 Discovering Physics Laws of Dynamical Systems via Invariant Function Learning Physikgesetze dynamischer Systeme durch Invariant Function Learning entdecken 通过动态功能学习发现动态系统的物理法 2502.04495v2 -
490 06-11 Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling Reward-Überoptimierung bei Direktausrichtungsalgorithmen mit Importance Sampling abmildern 利用重要性采样减轻直接对齐算法中的奖励过度优化 2506.08681v2 -
491 06-11 When Is Diversity Rewarded in Cooperative Multi-Agent Learning? Wann wird Vielfalt im kooperativen Multi-Agenten-Lernen belohnt? 当多样性在多机构合作学习中得到回报吗? 2506.09434v1 -
492 06-11 Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training Milderung von Scheinkorrelationen in LLMs durch kausalitätsbewusstes Post-Training 通过因果感知的后训练减轻LLM中的虚假相关性 2506.09433v1 -
493 06-11 Through a Steerable Lens: Magnifying Neural Network Interpretability via Phase-Based Extrapolation Durch eine Steerable Lens: Vergrößerung der Neuralen Netzwerk-Interpretabilität über phasenbasierte Extrapolation 通过可操纵透镜:通过基于相位的外推法放大神经网络的可解释性 2506.02300v3 -
494 06-11 CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis CTPD: Cross-Modal Temporal Pattern Discovery für verbesserte multimodale elektronische Gesundheitsdatenanalyse CTPD: 增强多模式电子健康记录分析的跨模式时间模式发现 2411.00696v2 -
495 06-11 Weakly Supervised Multiple Instance Learning for Whale Call Detection and Temporal Localization in Long-Duration Passive Acoustic Monitoring Schwach überwachtes Multiple Instance-Lernen für Whale Call Detection und zeitliche Lokalisierung in der langdauerhaften passiven akustischen Überwachung 在长期被动听力监测中,受监督的捕鲸呼叫探测和时间定位多案例学习能力薄弱 2502.20838v2 -
496 06-11 Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models Folgen Sie der Energie, finden Sie den Weg: Riemannsche Metrics aus energiebasierten Modellen 遵循能源,寻找道路:以能源为基础的模型的里曼尼度量仪 2505.18230v2 -
497 06-11 Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Vertrauen ist alles, was Sie brauchen: Few-Shot-RL-Feintuning von Sprachmodellen 置信度就是你所需要的一切:语言模型的少样本RL微调 2506.06395v3 -
498 06-11 Improving the performance of optical inverse design of multilayer thin films using CNN-LSTM tandem neural networks Verbesserung der Leistungsfähigkeit des optischen inversen Designs von Mehrschicht-Dünnfilmen mit CNN-LSTM Tandem-Neuronalen Netzwerken 利用CNN-LSTM同步神经网络改进多层薄薄膜光反光设计性能 2506.10044v1 -
499 06-11 Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation Zeiteinheitliche Diffusionspolitik mit Aktionsdiskriminierung für die Robotermanipulation 对机器人操纵行动歧视的、时间统一的统一传播政策 2506.09422v1 -
500 06-11 A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy Ein Aufruf für kollaborative Intelligenz: Warum Menschen-Agenten-Systeme der KI-Autonomie vorausgehen sollten 呼吁合作情报:为什么人力-物力系统应先于自主 2506.09420v1 -
501 06-11 NestQuant: Nested Lattice Quantization for Matrix Products and LLMs NestQuant: Nested Lattice Quantization für Matrix-Produkte und LLMs NestQuant: 母体产品和LLMs的Nasted Lattice量化 2502.09720v2 -
502 06-11 DeepForm: Reasoning Large Language Model for Communication System Formulation DeepForm: Großes Sprachmodell für die Formulierung von Kommunikationssystemen 深形式:开发通信系统的理由大语言模式 2506.08551v2 -
503 06-11 Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors Wer die Störung gestartet hat, sollte sie beenden: Leiten datenfreies Modell Zusammenführen über Task-Vektoren 开始干预者应当结束干预的任何人:指导通过任务矢量合并无数据模式 2503.08099v2 -
504 06-11 Arrival Control in Quasi-Reversible Queueing Systems: Optimization and Reinforcement Learning Ankunftskontrolle in Quasi-Reversible Queueing Systemen: Optimierung und Verstärkungslernen 准可更新排队系统抵达控制:优化和强化学习 2505.16353v2 -
505 06-11 Code-Switching Curriculum Learning for Multilingual Transfer in LLMs Code-Switching Curriculum Learning für den mehrsprachigen Transfer in LLMs 以LLMM为多种语言转让进行代码转换课程学习 2411.02460v2 -
506 06-11 Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations Sim-to-Real-Causal-Transfer: Ein metrischer Lernansatz zu kausal-aware Interaktionsdarstellungen 简单到实际因果转移:从计量学习方法进行体能互动演示 2312.04540v2 -
507 06-11 Scoop-and-Toss: Dynamic Object Collection for Quadrupedal Systems Scoop-and-Toss: Dynamische Objektsammlung für vierbeinige Systeme Scoop-and-Toss:四足系统的动态物体收集 2506.09406v1 -
508 06-11 Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps Feinabstimmungs-Diffusions-Politiken mit Backpropagation durch Diffusion Zeitschritte 微调通过扩散时间步骤进行反向宣传的传播政策 2505.10482v2 -
509 06-11 What Can RL Bring to VLA Generalization? An Empirical Study Was kann RL zur VLA-Verallgemeinerung bringen? Eine empirische Studie RL能带给VLA的概括化带来什么?经验研究。 2505.19789v2 -
510 06-11 AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI) AdversariaL attacK sAfety aLIgnment(ALKALI): Sicherung von LLMs durch GRACE: Geometrische Darstellung-Bewusst Kontrastive Verbesserung- Einführung von Adversarial Vulnerability Quality Index (AVQI) AdversariaL attacK sAfety aLIgnment(ALKALI):通过GRACE保护LLM:几何表示感知的对比增强-引入对抗脆弱性质量指数(AVQI) 2506.08885v2 -
511 06-11 Chem42: a Family of chemical Language Models for Target-aware Ligand Generation Chem42: eine Familie chemischer Sprachmodelle für die zielbewusste Ligand-Generation Chem42: “ 目标觉知长者 “ 化学语言模式家庭 2503.16563v2 -
512 06-11 Synergizing Reinforcement Learning and Genetic Algorithms for Neural Combinatorial Optimization Synergisierendes Verstärkungslernen und genetische Algorithmen für die neurale Kombinatorialoptimierung 实现神经组合优化的强化强化学习和遗传数值同步化 2506.09404v1 -
513 06-11 Pre-trained Large Language Models Learn Hidden Markov Models In-context Vortrainierte große Sprachmodelle lernen versteckte Markov-Modelle In-Kontext 受过培训的大型语言模型学习隐藏Markov模型 2506.07298v2 -
514 06-11 LLM Enhancers for GNNs: An Analysis from the Perspective of Causal Mechanism Identification LLM Enhancers for GNNs: Eine Analyse aus der Perspektive der Identifizierung von Kausalmechanismen GNN的LLM增强者:从因果关系机制的确定角度进行分析 2505.08265v3 -
515 06-11 A theoretical basis for model collapse in recursive training Eine theoretische Grundlage für Modellkollaps in rekursiver Ausbildung 循环培训模式崩溃的理论基础 2506.09401v1 -
516 06-11 Efficient Prediction of SO(3)-Equivariant Hamiltonian Matrices via SO(2) Local Frames Effiziente Vorhersage von SO(3)-Equivarienten Hamiltonian Matrices über SO(2) Local Frames 通过SO(2)地方框架有效预测SO(3)-等质汉密尔顿母体 2506.09398v1 -
517 06-11 The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent Der Computational Advantage of Depth: High-Dimensional Hierarchical Funktionen mit gradienten Abstieg lernen 深度的计算优势:学习带有渐变后代的高多元等级功能 2502.13961v3 -
518 06-11 SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving SLED: Ein spekulatives LLM-Decoding-Framework für effizientes Edge Serving SLED: 有效边缘服务投机性LLM代谢框架 2506.09397v1 -
519 06-11 A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities Ein genauerer Blick auf TabPFN v2: seine Stärken verstehen und seine Fähigkeiten erweitern 更仔细地看看TabPFN v 2:了解其力量并扩大其能力 2502.17361v2 -
520 06-11 MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models MedChat: Multi-Agenten-Framework für multimodale Diagnosen mit großen Sprachmodellen MedChat:使用大语言模式的多语言多模式诊断多机构框架 2506.07400v2 -
521 06-11 Spatial-RAG: Spatial Retrieval Augmented Generation for Real-World Geospatial Reasoning Questions Raum-RAG: Raum-Retrieval Augmented Generation für Real-World Geospatial Reasoning Fragen 空间-RAG:世界现实地理空间原因问题空间检索增加一代 2502.18470v5 -
522 06-11 Coil2Coil: Self-supervised MR image denoising using phased-array coil images Coil2Coil: Selbstüberwachte MR-Bildentrauschung mit Hilfe von Phased-Array-Coil-Bildern Coil2Coil:使用相控阵线圈图像的自监督MR图像去噪 2208.07552v2 -
523 06-11 Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation Die neuen Diffusionsmodelle: Vom Generativen Pre-Training bis zur Ein-Schritt-Generation 重新审视传播模式:从产生培训前培训到单步一代 2506.09376v1 -
524 06-11 Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining Modality-Balancing Preference Optimierung großer multimodaler Modelle durch disversariale Negative Mining 通过反反向负负采矿优化大型多式联运模型 2506.08022v2 -
525 06-11 Effective Regularization Through Loss-Function Metalearning Effektive Regularisierung durch Loss-Function Metalearning 通过损失 – – 职能计算学习实现有效正规化 2010.00788v5 -
526 06-11 Low-resource domain adaptation while minimizing energy and hardware resource consumption Low-Resource-Domänenanpassung bei Minimierung des Energie- und Hardware-Ressourcenverbrauchs 低资源领域的适应,同时尽量减少能源和硬件资源的消耗 2506.08433v2 -
527 06-11 TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation TimeDART: Ein Diffusion Autoregressiver Transformer für selbstüberwachte Zeitreihendarstellung TimeDART: 自动控制时间序列代表的传播自动递减变动器 2410.05711v5 -
528 06-11 LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization LPO: Auf dem Weg zu einer präzisen Interaktion der GUI-Agenten über Standortpräferenzoptimierung LPO:通过位置优化实现准确的图形界面代理互动 2506.09373v1 -
529 06-11 Learnable Spatial-Temporal Positional Encoding for Link Prediction Erlernbare räumlich-temporale Positionskodierung für Link Prediction 链接预测的空间-临时定位编码 2506.08309v2 -
530 06-11 Anomaly Detection and Generation with Diffusion Models: A Survey Anomalieerkennung und -generierung mit Diffusionsmodellen: Eine Umfrage 使用传播模型进行异常检测和生成:调查 2506.09368v1 -
531 06-11 An Online Learning Approach to Prompt-based Selection of Generative Models and LLMs Ein Online-Lernansatz zur promptbasierten Auswahl generativer Modelle und LLMs 基于提示选择生成模型和LLM的在线学习方法 2410.13287v4 -
532 06-11 SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending SkillBlender: Auf dem Weg zu einer vielseitigen humanoiden Ganzkörperloko-Manipulation durch Skill Blending 技术人才库:通过技能混合处理,努力通过技能混合处理,实现全体操控和操纵 2506.09366v1 -
533 06-11 TextSquare: Scaling up Text-Centric Visual Instruction Tuning TextSquare: Skalieren von Text-Centric Visual Instruction Tuning TextSquare: 增强文本内容的视觉教学图示 2404.12803v3 -
534 06-11 Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models Automatische Pseudo-Harmful Prompt Generation zur Bewertung falscher Ablehnungen in großen Sprachmodellen 自动生成用于评价大语言模式虚假拒绝的优多有害快速自动生成 2409.00598v2 -
535 06-11 Generalized Lie Symmetries in Physics-Informed Neural Operators Generalisierte Lie-Symmetrien in physikinformierten Neuraloperatoren 物理信息神经算子中的广义李对称性 2502.00373v2 -
536 06-11 PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs PDE-Controller: LLMs für Autoformalisierung und Begründung von PDEs PDE – – 主计长:PDE的自动化正规化和原因说明 2502.00963v2 -
537 06-11 Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation Autoregressive Adversariale Nachschulung für die Echtzeit-Interactive Videogenerierung 实时互动互动视频制作培训后自动递减反反反反向自动 2506.09350v1 -
538 06-11 TS-PIELM: Time-Stepping Physics-Informed Extreme Learning Machine Facilitates Soil Consolidation Analyses TS-PIELM: Zeitschnelle Physik-informierte extreme Lernmaschine erleichtert Bodenkonsolidierungsanalysen TS-PIELM: 时间制备物理系统一体化极端学习机器促进土壤巩固分析 2506.08381v2 -
539 06-11 Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis Warum sind Web-KI-Agenten anfälliger als Standalone LLMs? Eine Sicherheitsanalyse 为什么Web AI智能体比独立的LLM更脆弱?一项安全分析 2502.20383v2 -
540 06-11 Adversarial Surrogate Risk Bounds for Binary Classification Adversarial Surrogate Risk Bounds für Binäre Klassifizierung 二元分类的反向代谢风险临界值 2506.09348v1 -
541 06-11 ErrorEraser: Unlearning Data Bias for Improved Continual Learning ErrorEraser: Unlearning Data Bias für verbessertes kontinuierliches Lernen ErrorEraser:消除数据偏差以改进持续学习 2506.09347v1 -
542 06-11 Ming-Omni: A Unified Multimodal Model for Perception and Generation Ming-Omni: Ein einheitliches multimodales Modell für Wahrnehmung und Erzeugung Ming-Omni:感知和一代的统一多模式模型 2506.09344v1 -
543 06-11 RePO: Replay-Enhanced Policy Optimization RePO: Replay-erweiterte Politikoptimierung RePO: 重放强化政策优化 2506.09340v1 -
544 06-11 AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin AsFT: Verankerungssicherheit während LLM-Feinsteuerung innerhalb eines schmalen Sicherheitsbeckens AsFT:LLM在狭窄安全盆地内精密出海期间安整安全 2506.08473v2 -
545 06-11 Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference Sentence-Level-Reward-Modell kann besser für die Ausrichtung LLM aus menschlichen Präferenzen Generalisieren 句子级奖励模型在依据人类偏好对齐LLM时泛化更好 2503.04793v4 -
546 06-11 Know What You Don’t Know: Uncertainty Calibration of Process Reward Models Wissen, was Sie nicht wissen: Unsicherheit Kalibrierung von Prozess-Reward-Modellen 你不知道你不知道什么: 过程奖励模型的不确定性校准 2506.09338v1 -
547 06-11 STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation STAMImputer: Raum-Temporale Aufmerksamkeit MoE für Verkehrsdaten Imputation STAMImputer: 用于交通数据插补的时空注意力MoE 2506.08054v2 -
548 06-11 Natural Language Guided Ligand-Binding Protein Design Natürliche Sprache geführt Ligand-Binding Protein Design 自然语言引导的配体结合蛋白设计 2506.09332v1 -
549 06-11 SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space SWAT-NN: Simultangewichte und Architekturtraining für neurale Netzwerke im Latent Space SWAT-NN: 低空空间神经网络的同质重量和建筑培训 2506.08270v2 -
550 06-11 Your Agent Can Defend Itself against Backdoor Attacks Ihr Agent kann sich gegen Hintertürangriffe verteidigen 你的特工可以保护自己 免受幕后攻击 2506.08336v2 -
551 06-11 polyGen: A Learning Framework for Atomic-level Polymer Structure Generation polyGen: Ein Lernrahmen für die atomare Polymerstrukturbildung 聚金:原子级聚合体结构生成学习框架 2504.17656v3 -
552 06-11 Low-Loss Space in Neural Networks is Continuous and Fully Connected Low-Loss-Raum in neuralen Netzwerken ist kontinuierlich und vollständig verbunden 神经网络中的低损失空间是连续和完全连接的 2505.02604v3 -
553 06-11 Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies Beschleunigung der LLM-Inferenz mit verlustfreier spekulativer Dekodierung von Algorithmen für heterogene Vokabeln 加速LLM 超异基因词汇表无损失投机代谢值的加速LLM推导值 2502.05202v3 -
554 06-11 Monte Carlo Tree Diffusion for System 2 Planning Monte Carlo Tree Diffusion für System 2 Planung 用于系统2规划的蒙特卡洛树传播 2502.07202v4 -
555 06-11 On The Impact of Merge Request Deviations on Code Review Practices Über die Auswirkungen von Merge Request Abweichungen auf Code-Review-Praktiken 合并请求对守则审查惯例的影响 2506.08860v2 -
556 06-11 Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models Schulungsfreie Anleitung über die Unterscheidungsfähigkeit: Skalierbare Pfadsteuerung mit Baumsuche in Diffusions- und Flussmodellen 超越可微性的免训练引导:在扩散和流模型中利用树搜索进行可扩展的路径引导 2502.11420v3 -
557 06-11 Alzheimer’s Dementia Detection Using Perplexity from Paired Large Language Models Alzheimer’s Dementia Detection Using Perplexity from Paired Large Language Models 阿兹海默氏病痴呆症检测,使用Paired大语言模型的复度 2506.09315v1 -
558 06-11 Surrogate models to optimize plasma assisted atomic layer deposition in high aspect ratio features Surrogate-Modelle zur Optimierung der plasmaunterstützten Atomschichtabscheidung in Eigenschaften mit hohem Seitenverhältnis 利用代用模型优化高方位比率特征的等离子体辅助原子层沉积 2506.09313v1 -
559 06-11 What is the Cost of Differential Privacy for Deep Learning-Based Trajectory Generation? Was sind die Kosten der unterschiedlichen Privatsphäre für Deep Learning-Based Trajektorie Generation? 深层学习轨迹世代的不同隐私成本是多少? 2506.09312v1 -
560 06-11 Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions Rewind-to-Delete: Zertifiziertes Maschinen-Entlernen für nicht konvexe Funktionen 倒回到删除: 认证机器不学习非convex函数 2409.09778v4 -
561 06-11 A Geometric Modeling of Occam’s Razor in Deep Learning Eine geometrische Modellierung von Occams Razor im Deep Learning Occam的深层学习剃刀的几何建模 1905.11027v9 -
562 06-11 PARAFAC2-based Coupled Matrix and Tensor Factorizations with Constraints PARAFAC2-basierte gekoppelte Matrix- und Tensor-Faktorisierungen mit Einschränkungen 基于PARAFAC2的带约束耦合矩阵与张量分解 2406.12338v2 -
563 06-11 Neuromorphic Optical Tracking and Imaging of Randomly Moving Targets through Strongly Scattering Media Neuromorphes optisches Tracking und Bildgebung von zufällig sich bewegenden Zielen durch stark streuende Medien 通过强烈散射媒体对随机移动目标进行神经形态光学跟踪和成像 2501.03874v2 -
564 06-11 Lower Bounds for Learning Quantum States with Single-Copy Measurements Untere Grenzen für das Lernen von Quantenzuständen mit Single-Copy-Messungen 使用单拷贝测量学习量子态的下界 2207.14438v3 -
565 06-10 (2) Lightweight Object Detection Using Quantized YOLOv4-Tiny for Emergency Response in Aerial Imagery Leichtgewichtige Objekterkennung mit Quantisiertem YOLOv4-Tiny für Notfallreaktion in Luftbildern 在空中图像中,使用量化的YOLOv4-Tiny用于应急反应的轻重量物体探测 2506.09299v1 -
566 06-10 WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles WaKA: Datenzuweisung unter Verwendung von K-Nächstgelegenen Nachbarn und Mitgliedschafts-Privacy Principles WaKA:使用K-近邻和成员隐私权原则的数据归属 2411.01357v3 -
567 06-10 Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems Künstliche Intelligenz für die Wissenschaft in Quanten-, Atom- und Kontinuumsystemen 量子、原子学和连续系统科学人造情报 2307.08423v5 -
568 06-10 Causal Graph Recovery in Neuroimaging through Answer Set Programming Causal Graph Recovery in Neuroimaging durch Answer Set Programmierung 通过 “ 解答组 “ 方案编制实现神经成形过程中的因果图恢复 2506.09286v1 -
569 06-10 ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs Skalierbare HD: Skalierbare und hochdurchsatzfähige Hyperdimensionale Computing-Inferenz auf Multi-Core-CPUs 可缩缩放的HD:可缩放的和高通向的多孔数多维电子计算引文 2506.09282v1 -
570 06-10 TTrace: Lightweight Error Checking and Diagnosis for Distributed Training TTrace: Leichte Fehlerprüfung und Diagnose für verteiltes Training TTrace:轻量度错误检查和分布式培训诊断 2506.09280v1 -
571 06-10 Ambient Diffusion Omni: Training Good Models with Bad Data Ambient Diffusion Omni: Gute Modelle mit schlechten Daten trainieren 环境扩散全云:培训不良数据良好模型 2506.10038v1 -
572 06-10 A Topic Modeling Analysis of Stigma Dimensions, Social, and Related Behavioral Circumstances in Clinical Notes Among Patients with HIV Thematische Modellierung von Stigma-Dimensionen, sozialen und damit zusammenhängenden Verhaltensumständen bei Patienten mit HIV 艾滋病毒携带者临床说明中的污名、社会和相关行为环境的典型问题专题分析 2506.09279v1 -
573 06-10 UFM: A Simple Path towards Unified Dense Correspondence with Flow UFM: Ein einfacher Weg zur Unified Dense Correspondence mit Flow UFM:通向统一思维与流动对应的简单路径 2506.09278v1 -
574 06-10 EMMA: Efficient Visual Alignment in Multi-Modal LLMs EMMA: Effiziente visuelle Ausrichtung in Multi-Modal LLMs EMMA:多模式LMS中高效视觉协调 2410.02080v2 -
575 06-10 Learning The Minimum Action Distance Die minimale Aktionsdistanz lernen 学习最低行动距离 2506.09276v1 -
576 06-10 Amortized Inference of Causal Models via Conditional Fixed-Point Iterations Amortisierte Schlussfolgerung von Kausalmodellen über bedingte Fixed-Point-Iterationen 通过有条件的固定点定点迭代对因果模型进行摊销 2410.06128v3 -
577 06-10 RocketPPA: Code-Level Power, Performance, and Area Prediction via LLM and Mixture of Experts RocketPPA: Code-Level Power, Performance und Area Prediction über LLM und Mixture of Experts 火箭式PPPA:通过LLM和专家混合进行代码级动力、性能和地区预测 2503.21971v3 -
578 06-10 G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration G-Sim: Generative Simulationen mit großen Sprachmodellen und gradientfreier Kalibrierung G-Sim:使用大语言模型和梯度自由校准的生成模拟 2506.09272v1 -
579 06-10 Uncertainty Prioritized Experience Replay Ungewissheit Priorisierte Erfahrung Replay 不确定性确定性优先经验重放 2506.09270v1 -
580 06-10 A Stable Whitening Optimizer for Efficient Neural Network Training Ein stabiler Whitening Optimizer für effizientes Neural Network Training 高效神经网络培训的稳定白电优化器 2506.07254v2 -
581 06-10 Dataset Properties Shape the Success of Neuroimaging-Based Patient Stratification: A Benchmarking Analysis Across Clustering Algorithms Datensatzeigenschaften gestalten den Erfolg der Neuroimaging-basierten Patientenschichtung: Eine Benchmarking-Analyse über Cluster-Algorithmen hinweg 数据集属性形状基于神经成形的患者分层成功率:对集束数值的基准分析 2503.12066v2 -
582 06-10 Archon: An Architecture Search Framework for Inference-Time Techniques Archon: Ein Architektur-Suchrahmen für Inferenz-Zeit-Techniken Archon: 推断-时间技术建筑搜索框架 2409.15254v6 -
583 06-10 CFMI: Flow Matching for Missing Data Imputation CFMI: Flow Matching für fehlende Daten-Imputation CFMI: 缺失数据计算流程匹配 2506.09258v1 -
584 06-10 Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Kommentar zur Illusion des Denkens: Die Stärken und Grenzen von Vernunftmodellen über das Lens of Problem Complexity verstehen 关于 “ 思考的幻觉:通过问题复杂焦点了解理性模型的长处和局限性 “ 的评论 2506.09250v1 -
585 06-10 Simulation-based Inference for High-dimensional Data using Surjective Sequential Neural Likelihood Estimation Simulationsbasierte Inferenz für hochdimensionale Daten mittels Surjektiv-Sequentieller Neural Likelihood-Abschätzung 使用粒子序列神经神经近似近似性估计法对高维数据进行基于模拟的推论 2308.01054v3 -
586 06-10 Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation 在多式工业数据库检索增强型多式工业数据库中有条件监测援助 2506.09247v1 -
587 06-10 A Generative Framework for Predictive Modeling of Multiple Chronic Conditions Using Graph Variational Autoencoder and Bandit-Optimized Graph Neural Network Ein generatives Framework für die vorausschauende Modellierung mehrerer chronischer Zustände mit Graph Variational Autoencoder und Bandit-Optimiertes Graphen-Neural-Netzwerk 利用图表变异自动编码器和强盗优化的图形神经网络对多种慢性条件进行预测模拟的生成框架 2409.13671v3 -
588 06-10 Root Cause Attribution of Delivery Risks via Causal Discovery with Reinforcement Learning Ursachenzuweisung von Lieferrisiken durch kausale Entdeckung mit Verstärkungslernen 利用强化学习发现原因和强化学习,确定分娩风险的根源原因 2408.05860v3 -
589 06-10 Electroencephalogram Emotion Recognition via AUC Maximization Elektroenzephalogramm-Emotionserkennung über AUC-Maximierung 通过AUUC实现最大化来识别电脑图情感 2408.08979v3 -
590 06-10 PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies PatchGuard: Widerwärtig robuste Anomalieerkennung und Lokalisierung durch Vision Transformer und Pseudoanomalien PatchGuard:通过视觉变形器和假冒异常器反向强力异常检测和定位 2506.09237v1 -
591 06-10 FastLloyd: Federated, Accurate, Secure, and Tunable $k$-Means Clustering with Differential Privacy FastLloyd: Federated, Exakt, Secure, and Tunable $k$-Means Clustering with Differential Privacy 联邦、准确、安全和金枪鱼用$k$-配有不同隐私的组合型工具 2405.02437v3 -
592 06-10 LieRE: Lie Rotational Positional Encodings LieRE: Lie Rotational Positional Encodings 谎言: 谎言旋转位置编码 2406.10322v4 -
593 06-10 SoK: Machine Unlearning for Large Language Models SoK: Maschinelles Verlernen für große Sprachmodelle SoK:大语言模型的机器脱学 2506.09227v1 -
594 06-10 The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound Die Wirksamkeit von Semantik-erhaltenden Transformationen im selbstüberwachten Lernen für medizinischen Ultraschall 医学超声波自监测学习中语义保留变异的功效 2504.07904v2 -
595 06-10 Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs Robuste Geräuschdämpfung durch adaptive Bündelung von Transformer-Ausgängen 通过调适式组合变异器产出,通过调适式组合组合,强有力噪音减速 2506.09215v1 -
596 06-10 Revisiting Graph Projections for Effective Complementary Product Recommendation Überarbeiten von Graph Projektionen für eine effektive ergänzende Produktempfehlung 修订有效补充产品建议书的图表预测 2506.09209v1 -
597 06-10 A Comparative Study of Conventional and Tripolar EEG for High-Performance Reach-to-Grasp BCI Systems Vergleichende Studie des konventionellen und tripolaren EEG für Hochleistungs-Reach-to-Grasp BCI-Systeme 利用常规和三极EEG进行高绩效接近格拉斯普到格拉斯普BCI系统比较研究 2402.09448v3 -
598 06-10 FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models FLoRIST: Singular Value Thresholding für effizientes und präzises Federated Fine-Tuning von großen Sprachmodellen FLORST:为高效率和准确的联邦大语言模型的精密使用而保持单一价值 2506.09199v1 -
599 06-10 Multimodal Pragmatic Jailbreak on Text-to-image Models Multimodaler Pragmatischer Jailbreak zu Text-to-Image-Modellen 关于文本到图像模型的多式实用监狱 2409.19149v2 -
600 06-10 Certified Unlearning for Neural Networks Zertifiziertes Verlernen für neurale Netzwerke 神经网络认证的不学习 2506.06985v2 -
601 06-10 LaDCast: A Latent Diffusion Model for Medium-Range Ensemble Weather Forecasting LaDCast: Ein latentes Diffusionsmodell für mittelgroße Ensemble-Wettervorhersage LaDCast: 中频集合天气预报的延迟传播模型 2506.09193v1 -
602 06-10 Multi-Task Reward Learning from Human Ratings Multi-Task-Reward-Lernen aus menschlichen Bewertungen 从人类评级中学习多任务奖励 2506.09183v1 -
603 06-10 Pointwise confidence estimation in the non-linear $\ell^2$-regularized least squares Punktweise Vertrauensschätzung in den nichtlinearen $\ell^2$-regularisierten kleinsten Quadraten 在非线性$\ell2美元正规化最低平方中点性信心估计值 2506.07088v2 -
604 06-10 Spatial Reasoning with Denoising Models Raumgerechtes Denken mit vernarbenden Modellen 空间原因与排减模式 2502.21075v2 -
605 06-10 Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism Roboter-Gated Interaktives Imitation-Lernen mit adaptivem Interventionsmechanismus 采用适应性干预机制的机器人自制交互式模拟学习 2506.09176v1 -
606 06-10 Multivariate Long-term Time Series Forecasting with Fourier Neural Filter Multivariate Langzeit-Zeitreihenprognose mit Fourier-Neuralfilter 使用 Fourier神经过滤器预测多变长期时间序列 2506.09174v1 -
607 06-10 The Curious Language Model: Strategic Test-Time Information Acquisition Das Kuriose Sprachmodell: Strategische Test-Zeit-Informationsbeschaffung 好奇语言模式:战略试验时间信息采集 2506.09173v1 -
608 06-10 MultiNet: An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models MultiNet: Ein Open Source Software Toolkit & Benchmark Suite für die Bewertung und Anpassung multimodaler Aktionsmodelle 多网:多模式行动模式评价和适应评估的开放源软件工具包 基准套件 2506.09172v1 -
609 06-10 Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search Verbesserung der LLM-Agentenplanung mit In-Context Learning über Atomic Fact Augmentation und Lookahead Search 通过原子事实增强和目光头搜索改进LLM Agent 规划,通过原子事实增强和目光头搜索进行文中学习 2506.09171v1 -
610 06-10 ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning ProteinZero: Selbstverbessernde Proteinerzeugung über Online-Verstärkungslernen 蛋白零:通过在线强化学习改善蛋白质生成 2506.07459v2 -
611 06-10 Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models Ist die Lärmkonditionierung notwendig? Eine einheitliche Theorie der Bedingungslosen Graphen-Diffusionsmodelle 是否有必要设定噪音条件? 无条件图形扩散模型的统一理论 2505.22935v2 -
612 06-10 Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes Skalierbare Spatiotemporale Inferenz mit Biased Scan Aufmerksamkeit Transformer Neural Prozesse 可缩缩缩的 Spatoto时推理 2506.09163v1 -
613 06-10 ICONS: Influence Consensus for Vision-Language Data Selection ICONS: Einflusskonsens für Vision-Sprachen-Datenauswahl ICONS: 愿景语言数据选择影响共识 2501.00654v3 -
614 06-10 FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts FC-Attack:通过自动发光流程图制作的破监狱多式多式大语言模型 2502.21059v2 -
615 06-10 QuEST: Stable Training of LLMs with 1-Bit Weights and Activations QuEST: Stabiles Training von LLMs mit 1-Bit Gewichten und Aktivierungen QuEST:对具有1-比分重量和激活作用的LLMs进行稳定培训 2502.05003v2 -
616 06-10 TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval TRACE: Erdungszeitreihen im Kontext für multimodale Einbettung und Retrieval TRACE: 在多式嵌入和检索背景下确定时间序列 2506.09114v1 -
617 06-10 Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations Verständnis von Aufgabenvektoren im In-Context-Lernen: Emergenz, Funktionalität und Einschränkungen 了解内文学习中的任务矢量:新出现、功能和局限性 2506.09048v1 -
618 06-10 Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation Agentische Neuronale Netzwerke: Selbstständige Multi-Agenten-Systeme über textuelle Backpropagation 动态神经网络:通过文字反向分析实现自我演进的多行为者系统 2506.09046v1 -
619 06-10 Evolutionary Policy Optimization Optimierung der Evolutionspolitik 演变政策优化 2503.19037v2 -
620 06-10 The Decoupled Risk Landscape in Performative Prediction Die entkoppelte Risikolandschaft in der Performativen Vorhersage 性能预测中脱钩的风险景观 2506.09044v1 -
621 06-10 FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed FZOO: Schneller Zeroth-Order-Optimierer für feintunende große Sprachmodelle in Richtung Adam-Scale-Geschwindigkeit FZOO: 向亚当规模速度的微调大语言模型最佳最佳优选器 2506.09034v1 -
622 06-10 Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning Router-R1: Lehren von LLMs Multi-Round Routing und Aggregation durch Verstärkungslernen 路由-R1路由-R1路由:教学LLMS 2506.09033v1 -
623 06-10 Optimal Spectral Transitions in High-Dimensional Multi-Index Models Optimale Spektralübergänge in High-Dimensional Multi-Index-Modellen 高多维多指数模型中最佳光谱转换 2502.02545v2 -
624 06-10 JuStRank: Benchmarking LLM Judges for System Ranking JuStRank: Benchmarking der LLM-Richter für Systemranking JuStRank:为系统评分确定LLM法官基准 2412.09569v2 -
625 06-10 Spectral Estimators for Multi-Index Models: Precise Asymptotics and Optimal Weak Recovery Spektrale Schätzwerte für Multi-Index-Modelle: Präzise Asymptotik und optimale schwache Erholung 多 Index 模型的光谱光谱模拟器:精确的无症状和最佳微弱恢复 2502.01583v2 -
626 06-10 Diffuse and Disperse: Image Generation with Representation Regularization Diffuse und Disperse: Bildgenerierung mit Repräsentationsregularisierung Diffuse & diffperse: 形象生成,有代表性的规范化 2506.09027v1 -
627 06-10 e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs e3: Erforschen lernen ermöglicht Extrapolation von Test-Time Compute für LLMs e3: 学习探索以利对LLMM的试验时间计算进行外推计算 2506.09026v1 -
628 06-10 DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging DIsoN: Dezentrale Isolationsnetzwerke für Out-of-Distribution Detection in Medical Imaging DISON:医疗成像中分散分布式隔离探测网 2506.09024v1 -
629 06-10 Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing Wahrscheinlich kostensensive gegnerische Verteidigung durch Randomized Glättung 通过随机滑动平滑,进行成本敏感度高的反反向防御 2310.08732v3 -
630 06-10 Edit Flows: Flow Matching with Edit Operations Flows bearbeiten: Flow passend zu Bearbeitungsoperationen 编辑流程: 与编辑操作匹配的流程 2506.09018v1 -
631 06-10 SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning SPEED-RL: Schnellere Schulung von Vernunftmodellen über Online Curriculum Learning SPEEED-RL:通过在线课程学习更快地培训推理模型 2506.09016v1 -
632 06-10 Effective Data Pruning through Score Extrapolation Effektive Datenkorrektur durch Score-Extrapolation 通过计分法外推法有效调节数据 2506.09010v1 -
633 06-10 Scalable Equilibrium Sampling with Sequential Boltzmann Generators Skalierbare Equilibrium-Probenahme mit sequenziellen Boltzmann-Generatoren 配有序列波尔茨曼发电机的可缩缩平衡取样器 2502.18462v2 -
634 06-10 Branched Schrödinger Bridge Matching Verzweigte Schrödingerbrücke passend 分割的 Schrödinger 桥桥配对 2506.09007v1 -
635 06-10 CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model CodeBrain: Bridging Entkoppelter Tokenizer und Multi-Scale-Architektur für EEG-Stiftungsmodell 编码系统:为EEG基金会模式建立连接拆分调控器和多规模结构 2506.09110v1 -
636 06-10 SensorLM: Learning the Language of Wearable Sensors SensorLM: Die Sprache der tragbaren Sensoren lernen 传感器LM:学习穿装传感器的语言 2506.09108v1 -
637 06-10 Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models Effiziente Ausrichtung der medizinischen Visionssprache durch Anpassung maskierter Visionsmodelle 通过调适蒙面愿景模型,提高医疗愿景-语言协调效率 2506.08990v1 -
638 06-10 SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning SwS: Selbstbewusste Schwäche-getriebene Problemsynthese bei der Stärkung des Lernens für LLM-Reasoning SWS:自觉的弱点驱动的问题综述 2506.08989v1 -
639 06-10 Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models Quamba2: Robustes und skalierbares Post-Training Quantization Framework für selektive State Space Models Quamba2:有选择性的国家空间模型强力和可扩展的培训后量化框架 2503.22879v3 -
640 06-10 PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips PrisonBreak: Gefängnisbrechende große Sprachmodelle mit weniger als fünfundzwanzig gezielten Bit-Flips 监狱破碎:破禁大语言模式,目标位翻页小于25个目标小于25个 2412.07192v2 -
641 06-10 Bias Analysis in Unconditional Image Generative Models Bias-Analyse in bedingungslosen Bildgenerativen Modellen 无条件图像生成模型中的偏见分析 2506.09106v1 -
642 06-10 DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models DiffLM: Ansteuerbare Synthetische Datengenerierung über Diffusions-Sprachenmodelle DiffLM: 通过传播语言模型生成可控合成数据 2411.03250v2 -
643 06-10 Exact Upper and Lower Bounds for the Output Distribution of Neural Networks with Random Inputs Exakte obere und untere Grenzen für die Ausgabeverteilung von neuralen Netzwerken mit Random Inputs 带有随机输入的神经网络输出分布精确的上下上下曲线 2502.11672v2 -
644 06-10 Propositional Logic for Probing Generalization in Neural Networks Propositionelle Logik zur Probing-Verallgemeinerung in neuralen Netzwerken 神经网络普遍化的预示逻辑 2506.08978v1 -
645 06-10 Tailored Architectures for Time Series Forecasting: Evaluating Deep Learning Models on Gaussian Process-Generated Data Maßgeschneiderte Architekturen für die Prognose von Zeitreihen: Bewertung von Deep Learning-Modellen auf gaussischen prozessgenerierten Daten 时间序列预测:评价高斯进程生成数据的深学习模型 2506.08977v1 -
646 06-10 Calibrated Physics-Informed Uncertainty Quantification Kalibrierung der physikinformierten Unsicherheit Quantifizierung 校准物理 - 内部不确定性量化 2502.04406v2 -
647 06-10 On Large-scale Evaluation of Embedding Models for Knowledge Graph Completion Zur groß angelegten Bewertung von Einbettungsmodellen für die Wissensgraphenvervollständigung 大规模评价知识图完成情况嵌入模型 2504.08970v2 -
648 06-10 Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers Vortrainierte Sprachmodelle lernen bemerkenswert genaue Darstellungen von Zahlen 培训前语文模式的精确数字表示 2506.08966v1 -
649 06-10 GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO GFRIEND: Generatives, nur wenige Schuss Belohnungsableitung durch EfficieNt DPO GFRIEND: 通过公平竞争和消费者保护局生成的微小的报偿推理 2506.08965v1 -
650 06-10 Gumbel-max List Sampling for Distribution Coupling with Multiple Samples Gumbel-max-Listenstichproben für die Verteilungskopplung mit mehreren Proben 与多个样本相混合的分布式组合的 Gumbel- max 列表列表抽样 2506.05632v2 -
651 06-10 BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly 双形:双体几何大会的学习协作能力 2506.06221v2 -
652 06-10 Towards Robust Deep Reinforcement Learning against Environmental State Perturbation Auf dem Weg zu einem robusten, tiefen Stärkungslernen gegen Umweltzustandsstörung 努力开展强有力的深入强化学习,防止环境状态的干扰 2506.08961v1 -
653 06-10 MetaTT: A Global Tensor-Train Adapter for Parameter-Efficient Fine-Tuning MetaTT: Ein globaler Tensor-Train-Adapter für parametereffizientes Feintuning MetATT: 参数有效精美调试全球Tensor-Train适应器 2506.09105v1 -
654 06-10 IntTrajSim: Trajectory Prediction for Simulating Multi-Vehicle driving at Signalized Intersections IntTrajSim: Trajektorienvorhersage für das Simulieren von Multi-Fahrzeug-Fahren bei Signalisierten Schnitten IntrajSim: 信号交界处模拟多车辆驾驶轨迹预测 2506.08957v1 -
655 06-10 Unifying Block-wise PTQ and Distillation-based QAT for Progressive Quantization toward 2-bit Instruction-Tuned LLMs Vereinheitlichen Blockweise PTQ und Destillationsbasiertes QAT für die progressive Quantisierung in Richtung 2-Bit Instruction-Tuned LLMs 用于逐步量化至2位制导二位制导LMM 的QAT 统一块式 PTQ 和基于蒸馏的QAT 统一PTQ 和 蒸馏基QAT 2506.09104v1 -
656 06-10 Data Augmentation For Small Object using Fast AutoAugment Datenvergrößerung für kleine Objekte mit schneller AutoAugment 使用快速自动增强的小型对象数据增强 2506.08956v1 -
657 06-10 Segment Concealed Objects with Incomplete Supervision Segmentverdeckte Objekte mit unvollständiger Überwachung 带不完全监督的隐藏物体 2506.08955v1 -
658 06-10 Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction Protriever: End-to-End-Unterscheidbare Proteinhomologie Suche nach Fitness-Vorhersage 远期:最终到最终差异可辨别的蛋白质同理学研究,以预测适合性。 2506.08954v1 -
659 06-10 Multi-SpaCE: Multi-Objective Subsequence-based Sparse Counterfactual Explanations for Multivariate Time Series Classification Multi-SpaCE: Multi-Objective Subsequence-based Sparse Counterfactual Erklärungen für multivariate Zeitreihenklassifikation 多变量时间序列分类的多目标子序列、基于多目标子序列的简单反事实解释 2501.04009v2 -
660 06-10 Epistemic Uncertainty in Conformal Scores: A Unified Approach Epistemische Ungewissheit in konformen Partituren: Ein einheitlicher Ansatz 非正式分数中的不确定性:统一办法 2502.06995v2 -
661 06-10 Implicit Neural Representations for Chemical Reaction Paths Implizite neurale Darstellungen für chemische Reaktionswege 化学反应路径的隐性神经图示 2502.15843v2 -
662 06-10 KARMA: A Multilevel Decomposition Hybrid Mamba Framework for Multivariate Long-Term Time Series Forecasting KARMA: Ein Multilevel-Dekompositions-Hybrid-Mamba-Framework für multivariate Langzeitreihenprognosen KARMA: 多变长期时间序列预测多层次分解混合 Mamba 框架 2506.08939v1 -
663 06-10 BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models BioLangFusion: Multimodale Fusion von DNA, mRNA und Protein Language Models BioLangFusion:DNA、MRNA和蛋白素语言模型的多式联运 2506.08936v1 -
664 06-10 Can A Gamer Train A Mathematical Reasoning Model? Kann ein Gamer ein mathematisches Reasoning-Modell trainieren? 赌徒能训练数学理论模型吗? 2506.08935v1 -
665 06-10 Local MDI+: Local Feature Importances for Tree-Based Models Lokale MDI+: lokale Feature-Wichtigkeiten für baumbasierte Modelle 本地 MDI+: 树基模型的本地特点重要性 2506.08928v1 -
666 06-10 Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap Multimodale Unüberwachte Domain-Verallgemeinerung durch Herausholen über die Modalitätslücke 通过跨越模式差距的检索实现多式多式不受监督域化 2402.04416v3 -
667 06-10 PropMEND: Hypernetworks for Knowledge Propagation in LLMs PropMEND: Hypernetworks für die Wissensverbreitung in LLMs PropMEND: 以LLMMs方式传播知识的超网络 2506.08920v1 -
668 06-10 Provably Accurate Adaptive Sampling for Collocation Points in Physics-informed Neural Networks Wahrscheinlich genaue adaptive Probenahme für Kollokationspunkte in physikinformierten Neuronalen Netzwerken 物理知情神经网络定位点的准确适应性抽样 2504.00910v2 -
669 06-10 Enhancing generalizability of model discovery across parameter space with multi-experiment equation learning (ME-EQL) Verbesserung der Verallgemeinerbarkeit der Modellentdeckung über den Parameterraum mit Multi-Experiment-Gleichung Lernen (ME-EQL) 通过多实验方程学习(ME-EQL)加强跨参数空间模型发现的一般性 2506.08916v1 -
670 06-10 Federated Generalised Variational Inference: A Robust Probabilistic Federated Learning Framework Federated Generalised Variational Inference: Ein robustes probabilistisches Federated Learning Framework 联邦通用多变性推断:强力概率联邦学习框架 2502.00846v3 -
671 06-10 Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU Implementierung von Keyword-Spotting auf dem MCUX947 Microcontroller mit integrierter NPU 执行关于MCUX947微型控制器与集成 NPU 2506.08911v1 -
672 06-10 BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning BEAST: Effiziente Tokenisierung von B-Splines kodierten Aktionssequenzen für das Imitationslernen BEAST: 模拟学习的B-S级编码行动序列的有效招接 2506.06072v2 -
673 06-10 Intention-Conditioned Flow Occupancy Models Intentionskonditionierte Flow-Belegungsmodelle 有意有条件流动占用模式 2506.08902v1 -
674 06-10 Feature Shift Localization Network Feature Shift Localization Network 本地化网络 2506.09101v1 -
675 06-10 Generalization analysis of an unfolding network for analysis-based Compressed Sensing Generalisierungsanalyse eines sich entfaltenden Netzwerks für analysebasiertes Komprimiertes Sensing 对正在展开的基于分析的压缩遥感压缩分析网络的概括化分析 2303.05582v3 -
676 06-10 Improved Variational Inference in Discrete VAEs using Error Correcting Codes Verbesserte Variationale Schlussfolgerungen in diskreten VAEs mit Fehlerkorrekturcodes 使用错误校正代码的差错 VAEs 中改进的变量引用 2410.07840v2 -
677 06-10 Real-Time Cascade Mitigation in Power Systems Using Influence Graph Improved by Reinforcement Learning Real-Time Cascade Mitigation in Stromsystemen mit Einfluss Graph verbessert durch Verstärkung Learning 利用强化学习所改进的影响图,在电力系统中实时连锁减缓电源系统 2506.08893v1 -
678 06-10 SeerAttention-R: Sparse Attention Adaptation for Long Reasoning SeerAttention-R: Sparse-Attention-Anpassung für langes Nachdenken SeerAttention-R: 偏差的注意力调整以适应长期理由 2506.08889v1 -
679 06-10 InfoDPCCA: Information-Theoretic Dynamic Probabilistic Canonical Correlation Analysis InfoDPCCA: Informationstheoretische dynamische probabilistische kanonischen Korrelationsanalyse InfoDPCA: 信息理论动态动态概率能力与卡尼诺相关关系分析 2506.08884v1 -
680 06-10 Filling in the Blanks: Applying Data Imputation in incomplete Water Metering Data Füllen der Blanks: Anwenden von Daten Imputation in unvollständigen Wasserzählerdaten 填充空白:在不完整的水计量数据中应用数据计算 2506.08882v1 -
681 06-10 Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models Pivoting Factorization: Eine kompakte Meta-Low-Rank-Darstellung von Sparsität für effiziente Schlussfolgerungen in großen Sprachmodellen 重力因素化:大语言模型中高效引引的公平代表性的契约元低率代表 2501.19090v2 -
682 06-10 Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks Sparse Spectral Training und Inferenz auf Euklidische und Hyperbolische Neuronale Netzwerke Euclidean 和超双曲神经网络的 广度光谱培训和推断 2405.15481v2 -
683 06-10 Adapting to Heterophilic Graph Data with Structure-Guided Neighbor Discovery Anpassung an heterophile Graphendaten mit strukturgeführter Neighbor Discovery 与结构引导邻居发现发现器相适应以适应结构辅助邻居发现器的热对数图形数据 2506.08871v1 -
684 06-10 Low-Rank Augmented Implicit Neural Representation for Unsupervised High-Dimensional Quantitative MRI Reconstruction Low-Rank Augmented Implizite Neural Repräsentanz für unüberwachte hochdimensionelle Quantitative MRT-Rekonstruktion 低Rank 低强度增强的内隐隐含神经力代表,用于未受监督的高多样性定量MRI重建 2506.09100v1 -
685 06-10 k-NN as a Simple and Effective Estimator of Transferability k-NN als einfacher und effektiver Schätzwert für die Übertragbarkeit k-NN作为可转让性的一个简单而有效的模拟器 2503.18528v2 -
686 06-10 Deep reinforcement learning for near-deterministic preparation of cubic- and quartic-phase gates in photonic quantum computing Tiefes Verstärkungslernen für die nahezu deterministische Vorbereitung von kubischen und quartic-Phasentoren im photonischen Quanten-Computing 深入强化学习,为光子量衡算的立方和临界阶段大门进行近乎决定性的准备 2506.07859v2 -
687 06-10 Cooperative and Collaborative Multi-Task Semantic Communication for Distributed Sources Kooperative und kollaborative Multi-Task Semantische Kommunikation für verteilte Quellen 合作与合作 多任务、跨语义交流促进分配源的传播 2411.02150v2 -
688 06-10 StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams StreamSplat: Auf dem Weg zu Online Dynamische 3D-Rekonstruktion aus unkalibrierten Videostreams StreamSplat:从未经校正的视频流走向在线动态3D重建 2506.08862v1 -
689 06-10 DIME:Diffusion-Based Maximum Entropy Reinforcement Learning DIME:Diffusionsbasiertes maximales Entropie-Verstärkungs-Lernen DIME: 以传播为基础的最大助爆强化学习 2502.02316v2 -
690 06-10 Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers Zu groß zu denken: Kapazität, Erinnerung und Verallgemeinerung in vortrainierten Transformern 能力、记忆和在培训前变异器中的普及化 2506.09099v1 -
691 06-10 On the Hardness of Sampling from Mixture Distributions via Langevin Dynamics Über die Härte der Probenahme aus Mischungsverteilungen über Langevin Dynamics 使用 Langevin Directives 进行混音发行的样本的强度 2406.02017v3 -
692 06-10 Learning surrogate equations for the analysis of an agent-based cancer model Erlernen von Surrogatgleichungen für die Analyse eines agentenbasierten Krebsmodells 用于分析基于制剂的癌症模式的学习替代方程 2503.01718v2 -
693 06-10 Agile Reinforcement Learning for Real-Time Task Scheduling in Edge Computing Agiles Verstärkungslernen für Echtzeit-Aufgabenplanung im Edge Computing 边边计算中实时任务排期 2506.08850v1 -
694 06-10 IMAGIC-500: IMputation benchmark on A Generative Imaginary Country (500k samples) IMAGIC-500: Imputations-Benchmark für ein generatives Imaginary Country (500k-Proben) IMAGIC-500: 创造想象力国家(500公里样本)的估测基准 2506.08844v1 -
695 06-10 Quantum Architecture Search with Unsupervised Representation Learning Quantenarchitektur Suche mit unüberwachtem Repräsentationslernen 以无人监督的代表制学习方式搜索量子结构 2401.11576v5 -
696 06-10 Robot Pouring: Identifying Causes of Spillage and Selecting Alternative Action Parameters Using Probabilistic Actual Causation Robot-Pouring: Identifizierung von Ursachen von Spillage und Auswahl alternativer Aktionsparameter mittels probabilistischer tatsächlicher Schadensursachen 机器人定位:利用概率实际原因查明螺旋原因和选择替代行动参数 2502.09395v3 -
697 06-10 Nearly Optimal Differentially Private ReLU Regression Fast optimal unterschiedliche private ReLU-Regression 接近最佳差异化私自 ReLU 回归 2503.06009v2 -
698 06-10 Relational decomposition for program synthesis Verhältnismäßige Zersetzung für die Programmsynthese 用于方案合成的通货膨胀分解 2408.12212v3 -
699 06-10 Stepsize anything: A unified learning rate schedule for budgeted-iteration training Stepsize anything: Eine einheitliche Lernrate Zeitplan für budgetierte iteration Ausbildung 采取任何步骤:为列入预算的就地培训统一学习率时间表 2505.24452v2 -
700 06-10 EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations EquivaMap: LLM-Leveraging für automatische Äquivalenzprüfung von Optimierungsformulierungen Equivamap: 利用LMLMs对优化配方进行自动等效检查 2502.14760v2 -
701 06-10 High-Dimensional Independence Testing via Maximum and Average Distance Correlations High-Dimensional Independence Testing über maximale und mittlere Distanzkorrelationen 通过最大和平均距离交错进行高度多功能独立测试 2001.01095v4 -
702 06-10 Activated LoRA: Fine-tuned LLMs for Intrinsics Aktivierte LoRA: Feinjustierte LLMs für die Intrinsik 活性LORA: 用于内分泌的微调LMLM 2504.12397v4 -
703 06-10 Spectral invariance and maximality properties of the frequency spectrum of quantum neural networks Spektrale Invarianz und maximale Eigenschaften des Frequenzspektrums quantenneuraler Netzwerke 量子神经网络频谱的光谱性和最大度性 2402.14515v3 -
704 06-10 Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO Eintauchen in RL für Bildgenerierung mit CoT: Eine Studie über DPO vs. GRPO 与COT一起为图像生成而跳入RL:关于DPO与GROP公司的研究 2505.17017v2 -
705 06-10 In Praise of Stubbornness: An Empirical Case for Cognitive-Dissonance Aware Continual Update of Knowledge in LLMs In Lob der Stubbornness: Ein empirischer Fall für Kognitiv-Dissonanz Aware Continual Update of Knowledge in LLMs 赞美固态:关于知识在LLM中不断更新的认知与不相识经验案例 2502.04390v2 -
706 06-10 Improving the Noise Estimation of Latent Neural Stochastic Differential Equations Verbesserung der Lärmschätzung latenter neuraler stochastischer Differentialgleichungen 改进对内心神经斯托卡式差异的噪音估计 2412.17499v2 -
707 06-10 Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval Optimierte Text-Embedding-Modelle und Benchmarks für die Amharische Passage Retrieval 阿姆光通过通过检索的最佳文本嵌入模型和基准 2505.19356v2 -
708 06-10 syren-baryon: Analytic emulators for the impact of baryons on the matter power spectrum Syren-Baryon: Analytische Emulatoren für den Einfluss von Baryonen auf das Materie-Leistungsspektrum syren-baryon: 用于分析男爵对物质能量频谱影响的分析模拟器 2506.08783v1 -
709 06-10 LT-PINN: Lagrangian Topology-conscious Physics-informed Neural Network for Boundary-focused Engineering Optimization LT-PINN: Lagrangian Topologie-bewusste physik-informierte Neuronales Netzwerk für boundary-focused Engineering Optimization LT-PINN:Lagrangian 地形 – – 具有意识的地形 – – 物理意识 – – 以边界为重点的工程优化知情神经网络 2506.06300v2 -
710 06-10 Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models Landsat-Bench: Datensätze und Benchmarks für Landsat-Stiftungsmodelle Landsat-Bench:Landsat基金会模型数据集和基准 2506.08780v1 -
711 06-10 Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch Silencing Empowerment, Allowing Bigotry: Auditierung der Moderation von Hass Rede auf Twitch 静默赋权,允许偏见:审计对抽搐者仇恨言论的温和程度 2506.07667v2 -
712 06-10 Data-Driven Discovery of PDEs via the Adjoint Method Datengestützte Entdeckung von PDEs über die Adjoint-Methode 通过 “ 联合方法 “ 发现项目设计书的数据驱动数据 2401.17177v4 -
713 06-10 TextAtari: 100K Frames Game Playing with Language Agents TextAtari: 100K Frames Spiel mit Sprachagenten TextAtari: 100K 框架游戏与语言代理游戏 2506.04098v2 -
714 06-10 Paths to Causality: Finding Informative Subgraphs Within Knowledge Graphs for Knowledge-Based Causal Discovery Pfade zur Kausalität: Informative Subgraphen innerhalb von Wissensgraphen finden für wissensbasierte Kausale Entdeckung 因果路径:在知识型因果发现的知识图中查找信息子集 2506.08771v1 -
715 06-10 CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries CLARIFY: Kontrastive Präferenz Verstärkungslernen für untangling Ambiguous Queries CLARIFY: 拼凑不清的问询中竞争优惠强化学习 2506.00388v3 -
716 06-10 Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting Kann langsam denkende LLMs Vernunft im Laufe der Zeit? Empirische Studien in Zeitreihen Prognose 时间序列预测的经验研究 2505.24511v2 -
717 06-10 Detecting malignant dynamics on very few blood sample using signature coefficients Erkennung maligner Dynamik an sehr wenigen Blutproben mit Signaturkoeffizienten 利用签字系数检测极少数血液样本中的恶性动态 2506.09097v1 -
718 06-10 On the Stability of the Jacobian Matrix in Deep Neural Networks Zur Stabilität der Jacobian Matrix in tiefen neuralen Netzwerken 深神经网络Jacobian矩阵的稳定 2506.08764v1 -
719 06-10 EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements EDINET-Bench: Bewertung von LLMs zu komplexen Finanzaufgaben anhand japanischer Finanzausweise EDINET-Bench:利用日本财务报表评估复杂金融任务LMs 2506.08762v1 -
720 06-10 Intra-Trajectory Consistency for Reward Modeling Intra-Trajektorien-Konsistenz für Prämienmodellierung 奖励建模在轨内兼容性 2506.09096v1 -
721 06-10 Enhancing Accuracy and Maintainability in Nuclear Plant Data Retrieval: A Function-Calling LLM Approach Over NL-to-SQL Verbesserung der Genauigkeit und Aufrechterhaltung des Datenrücklaufs von Kernkraftwerken: Ein funktionsorientierter LLM-Ansatz über NL-zu-SQL 提高核电厂数据检索的准确性和可维护性:NL-SQL的功能计算LLM法 2506.08757v1 -
722 06-10 Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction Denken vs. Tun: Agenten, die Grund durch Skalierung Test-Zeit-Interaktion 思考与做:通过扩大试验时间互动而有理由的代理商 2506.07976v2 -
723 06-10 Understanding High-Dimensional Bayesian Optimization High-Dimensional Bayesian Optimierung verstehen 理解高多样性巴耶斯最佳优化 2502.09198v2 -
724 06-10 Superposed Parameterised Quantum Circuits Überlagerte parametrische Quantenkreise 量子电路 2506.08749v1 -
725 06-10 Towards Secure and Private Language Models for Nuclear Power Plants Auf dem Weg zu sicheren und privaten Sprachmodellen für Kernkraftwerke 争取实现核电厂安全和私用语言模式 2506.08746v1 -
726 06-10 Bridging RDF Knowledge Graphs with Graph Neural Networks for Semantically-Rich Recommender Systems Bridging RDF Knowledge Graphs mit Graph Neural Networks für Semantically-Rich Recommender Systeme 将RDF知识图与Smantic-Rich建议系统图形神经网络连接起来 2506.08743v1 -
727 06-10 Urban Incident Prediction with Graph Neural Networks: Integrating Government Ratings and Crowdsourced Reports Urban Incident Prediction mit Graph Neural Networks: Integration von Regierungsbewertungen und Crowdsourced Reports 与图形神经网络城市事件预测:整合政府评级和众包报告 2506.08740v1 -
728 06-10 Exploration by Random Reward Perturbation Exploration durch zufällige Belohnung Störung 利用随机奖励干扰探测 2506.08737v1 -
729 06-10 scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data scSSL-Bench: Benchmarking Selbstüberwachtes Lernen für Single-Cell-Daten scSSL-Bench:为单一分类数据制定自我监督学习基准 2506.10031v1 -
730 06-10 Fusing Global and Local: Transformer-CNN Synergy for Next-Gen Current Estimation Fusing Global und Local: Transformer-CNN-Synergie für die aktuelle Schätzung von Next-Gen 全球和局部引信: 变异器-CNN 用于下金当前估算的协同效应 2504.07996v2 -
731 06-10 Flexible and Efficient Drift Detection without Labels Flexible und effiziente Drift-Erkennung ohne Etiketten 无标签的灵活而高效的漂流探测 2506.08734v1 -
732 06-10 Mitigating fairwashing using Two-Source Audits Fairwashing durch Zwei-Quellen-Audits abmildern 利用双重来源审计减少洗水 2305.13883v2 -
733 06-10 Breaking the ICE: Exploring promises and challenges of benchmarks for Inference Carbon & Energy estimation for LLMs Breaking the ICE: Erforschen von Versprechungen und Herausforderungen von Benchmarks für Inferenz-Kohlenstoff- & Energieschätzungen für LLMs 打破ICE:探索LLMM的碳和能源估算基准的许诺和挑战 2506.08727v1 -
734 06-10 Stop Misusing t-SNE and UMAP for Visual Analytics Stoppen Sie die missbräuchliche Verwendung von t-SNE und UMAP für Visual Analytics 停止滥用 t- SNE 和视觉分析分析 UMAP 2506.08725v1 -
735 06-10 VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs? VIST-GPT: Sich in der Ära des Visual Storytelling mit LLMs niederlassen? 使用LLMS的视觉故事时代? 2504.19267v3 -
736 06-10 Strain Problems got you in a Twist? Try StrainRelief: A Quantum-Accurate Tool for Ligand Strain Calculations Strain Probleme haben Sie in einem Twist? Versuchen Sie StrainRelief: Ein Quantum-Accurate-Tool für Ligand Strain Berechnungen Strain Relief: Ligand Strain 计算 的量子精确工具 2503.13352v2 -
737 06-10 TimeWak: Temporal Chained-Hashing Watermark for Time Series Data TimeWak: Temporales Ketten-Hashing-Wasserzeichen für Zeitreihendaten TimeWak:用于时间序列数据的时链套装水印 2506.06407v2 -
738 06-10 ASIDE: Architectural Separation of Instructions and Data in Language Models ASIDE: Architekturelle Trennung von Instruktionen und Daten in Sprachmodellen AIDIDE: 语言模式教学和数据结构分类 2503.10566v3 -
739 06-10 A Survey on Quantum Machine Learning: Current Trends, Challenges, Opportunities, and the Road Ahead Eine Umfrage zum Quantum Machine Learning: Aktuelle Trends, Herausforderungen, Chancen und die Zukunft 关于 “ 量子机器学习:当前趋势、挑战、机遇和路前道路 “ 的调查 2310.10315v4 -
740 06-10 Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data Intelligenter zusammenführen, besser verallgemeinern: Modell verbessern Zusammenführen auf OOD-Daten 合并的 “ 聪明者 “ 、 “ 更普遍化:加强OOD数据模型合并 “ 2506.09093v1 -
741 06-10 Variational Autoencoder-Based Approach to Latent Feature Analysis on Efficient Representation of Power Load Monitoring Data Variationeller Autoencoder-basierter Ansatz zur Latent Feature Analyse zur effizienten Darstellung von Power Load Monitoring Daten 以变式自动编码器为基础,对电荷载量监测数据高效代表情况进行长期特征分析的变式自动编码器方法 2506.08698v1 -
742 06-10 Self-Training Elicits Concise Reasoning in Large Language Models Selbst-Trainings-Elizite schließen Vernunft in großen Sprachmodellen ein 大语言模式中的简明理由 2502.20122v3 -
743 06-10 CUDA-LLM: LLMs Can Write Efficient CUDA Kernels CUDA-LLM: LLMs können effiziente CUDA-Kernel schreiben CUDA-LLM: LLM能够编写高效的CUDA内核 2506.09092v1 -
744 06-10 Variational Inference Optimized Using the Curved Geometry of Coupled Free Energy Variationelle Schlussfolgerung optimiert mit der gekrümmten Geometrie der gekoppelten freien Energie 使用共同自由能源曲线几何法优化 2506.09091v1 -
745 06-10 Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning? Optuna vs Code Llama: Sind LLMs ein neues Paradigma für Hyperparameter-Tuning? Optuna 与 Code Llama:LLMs是超参数图例的新范式吗? 2504.06006v3 -
746 06-10 RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling RuleReasoner: Verstärkte regelbasiertes Reasoning via Domain-aware Dynamic Sampling 规则响应器:通过Domain-aware动态抽样加强基于规则的理由说明 2506.08672v1 -
747 06-10 sparseGeoHOPCA: A Geometric Solution to Sparse Higher-Order PCA Without Covariance Estimation sparseGeoHOPCA: Eine geometrische Lösung für dünnbesetzte PCA höherer Ordnung ohne Kovarianzschätzung sparseGeoHOPCA:无需协方差估计的稀疏高阶PCA的几何解法 2506.08670v1 -
748 06-10 Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search Mit Blueprints und Prompt Template Search die Fähigkeit kleiner Sprachmodelle zu verbessern 提高有蓝图和快速模板搜索的小型语言模式的理性能力 2506.08669v1 -
749 06-10 Optimizing Learned Image Compression on Scalar and Entropy-Constraint Quantization Optimierung der Bildkompression auf Scalar- und Entropie-Konstraint-Quantisierung 优化 Scalar 和 Entropy- Contrpy- 约束量化的学习图像压缩 2506.08662v1 -
750 06-10 Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness Auf dem Weg zu robuster Echtzeit-Multivariate-Zeitreihenprognose: Ein einheitliches Framework für Abhängigkeit, Asynchronität und Missingness 走向强劲的现实世界多变时间序列预测:关于依赖、同步和失踪的统一框架 2506.08660v1 -
751 06-10 Integrating Asynchronous AdaBoost into Federated Learning: Five Real World Applications Integration von Asynchron AdaBoost in das Federated Learning: Fünf Anwendungen in der realen Welt 将AdaBoost亚洲同步体纳入联邦学习:五个真实世界应用 2506.09090v1 -
752 06-10 Scalable Graph Attention-based Instance Selection via Mini-Batch Sampling and Hierarchical Hashing Skalierbare Grafik aufmerksamkeitsbasierte Instanzenauswahl über Mini-Batch-Sampling und Hierarchical Hashing 通过小型批量抽样和等级类散列选择基于可缩放图表的注意点选案 2502.20293v2 -
753 06-10 When Simple Model Just Works: Is Network Traffic Classification in Crisis? Wenn einfaches Modell gerade funktioniert: Ist Netzwerkverkehrsklassifikation in der Krise? ” 简单模型 “ 何时起作用:网络交通分类是否处于危机之中? 2506.08655v1 -
754 06-10 A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck Ein Privacy-Preserving Föderated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck 保护隐私的联邦学习框架,以普及CBCT, 将合成CT翻译在头部和颈部 2506.08654v1 -
755 06-10 Compositional Causal Reasoning Evaluation in Language Models Kompositorische Kausale Reasoning-Bewertung in Sprachmodellen 语言模型中的组合因果推理评估 2503.04556v4 -
756 06-10 JoFormer (Journey-based Transformer): Theory and Empirical Analysis on the Tiny Shakespeare Dataset JoFormer (Journey-basierter Transformer): Theorie und empirische Analyse auf dem Tiny-Shakespeare-Datensatz JoFormer(基于旅程的Transformer):关于Tiny Shakespeare数据集的理论和经验分析 2506.08652v1 -
757 06-10 BridgeNet: A Hybrid, Physics-Informed Machine Learning Framework for Solving High-Dimensional Fokker-Planck Equations BridgeNet: Hybrides, physikinformiertes Machine Learning Framework zur Lösung hochdimensionaler Fokker-Planck-Gleichungen BridgeNet:用于解决高二分法克-普朗克赤道的混合、物理成形机械学习框架 2506.04354v3 -
758 06-10 TableDreamer: Progressive and Weakness-guided Data Synthesis from Scratch for Table Instruction Tuning TableDreamer: Progressive und schwache Datensynthese von Scratch für die Table Instruction Tuning 表Dreamer:Scratch为表格指示图纸提供的渐进和弱点制导数据合成 2506.08646v1 -
759 06-10 Fusing Cross-modal and Uni-modal Representations: A Kronecker Product Approach Durchgängige cross-modale und uni-modale Darstellungen: Ein Kronecker-Produktansatz 引信的跨模式和统一模式代表:克罗马克产品方法 2506.08645v1 -
760 06-10 Semi-gradient DICE for Offline Constrained Reinforcement Learning Halbgradienter DICE für Offline-Beschränktes Verstärkungslernen 用于非线外约束训练强化学习的半级DICE 2506.08644v1 -
761 06-10 An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation Eine Analyse von Hyperparameter-Optimierungsmethoden für die retrieval Augmented Generation 对回收增加的一代的超光子辐射仪最佳优化方法的分析 2505.03452v2 -
762 06-10 Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers Zeitreihendarstellungen für die Klassifizierung liegen in vorgebildeten Visionstransformatoren versteckt 未受过训练的愿景变异者中隐藏的分类隐隐藏物的 时间序列 2506.08641v1 -
763 06-10 CASE: Contrastive Activation for Saliency Estimation CASE: Kontrastive Aktivierung für Salienzschätzung CASE: 用于显著性估计的对比激活 2506.07327v2 -
764 06-10 Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth Effizientes Feintuning Quantisierter Modelle über adaptive Rank und Bitwidth 通过适应级和Bitwidth对量化模型进行高效率的微调 2505.03802v3 -
765 06-10 Discovering Continuous-Time Memory-Based Symbolic Policies using Genetic Programming Kontinuierliche-Zeit-Speicher-basierte symbolische Richtlinien mit genetischer Programmierung entdecken 利用遗传方案拟定办法发现基于连续时间记忆的象征性政策 2406.02765v6 -
766 06-10 HSG-12M: A Large-Scale Spatial Multigraph Dataset HSG-12M: Ein großräumiger multigraphischer Datensatz HSG-12M:大型空间多光谱数据集 2506.08618v1 -
767 06-10 Predicting and Understanding College Student Mental Health with Interpretable Machine Learning Vorhersagen und Verständnis College Student Mental Health mit Interpretable Machine Learning 预测和理解学院学生心理健康与可解释机器学习 2503.08002v3 -
768 06-10 Generalizing while preserving monotonicity in comparison-based preference learning models Verallgemeinern unter Wahrung der Monotonizität in vergleichsbasierten Präferenz-Lernmodellen 在以比较为基础的优惠学习模式中,在保持单一性的同时,实现普遍化 2506.08616v1 -
769 06-10 PatchTrAD: A Patch-Based Transformer focusing on Patch-Wise Reconstruction Error for Time Series Anomaly Detection PatchTrAD: Ein Patch-basierter Transformer mit Schwerpunkt auf Patch-Wise-Reconstruction-Fehler für Zeitreihenanomalienerkennung PatchTrAD:一个基于补丁的Transformer,侧重于用于时间序列异常检测的逐补丁重建误差 2504.08827v2 -
770 06-10 A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices Ein zweistufiges Datenauswahl-Framework für dateneffiziente Modellschulungen auf Edge-Geräten 边缘装置数据有效模型培训两阶段数据选择框架 2505.16563v2 -
771 06-10 CENTAUR: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference CENTAUR: Die Unmögliche Dreifaltigkeit von Datenschutz, Effizienz und Leistung in der Datenschutz-Bewahrung Transformer-Schlussfolgerung CENTAUR:在隐私保护变换人推断中弥合隐私、效率和性能这三方面不可能实现的三重问题 2412.10652v2 -
772 06-10 Spiking Neural Models for Decision-Making Tasks with Learning Spiralische Neuralmodelle für Entscheidungs-Making-Aufgaben mit Lernen 学习决策任务Spiking神经模型 2506.09087v1 -
773 06-10 Sample Efficient Demonstration Selection for In-Context Learning Beispiel Effiziente Demonstrationsauswahl für das In-Context-Lernen 内部学习的高效示范选择 2506.08607v1 -
774 06-10 Flow Matching Meets PDEs: A Unified Framework for Physics-Constrained Generation Flow Matching trifft auf PDEs: Ein einheitliches Framework für physikbedingte Generationen 流动匹配符合PDEs:受物理约束的一代的统一框架 2506.08604v1 -
775 06-10 CALT: A Library for Computer Algebra with Transformer CALT: Eine Bibliothek für Computer Algebra mit Transformer CALT:一个带有变换器的计算机代数库 2506.08600v1 -
776 06-10 Model-based learning for joint channel estimation and hybrid MIMO precoding Modellbasiertes Lernen für gemeinsame Kanalschätzung und hybride MIMO-Vorkodierung 联合信道估计和混合MIMO预编码的基于模型的学习 2505.04255v2 -
777 06-10 Solving excited states for long-range interacting trapped ions with neural networks Lösen von angeregten Zuständen für Langstrecken-Interagieren von gefangenen Ionen mit neuronalen Netzwerken 用神经网络解决长距离互动嵌入离子的兴奋状态 2506.08594v1 -
778 06-10 Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings Dichte Retriever können bei einfachen Abfragen scheitern: Enthüllen des Granularitäts-Dilemmas der Einbettungen 简单查询: 读取嵌入器的颗粒性三角形 2506.08592v1 -
779 06-10 Diversity-Guided MLP Reduction for Efficient Large Vision Transformers Diversity-geführte MLP-Reduktion für effiziente Großbildtransformatoren 为高效的大型愿景变异器而减少由多样性引导的减量 2506.08591v1 -
780 06-10 How transformers learn structured data: insights from hierarchical filtering Wie Transformatoren strukturierte Daten lernen: Erkenntnisse aus der hierarchischen Filterung 变压器如何学习结构化数据:从上层过滤器获得的洞察力 2408.15138v3 -
781 06-10 Diffusion-based Time Series Forecasting for Sewerage Systems Diffusionsbasierte Zeitreihenprognosen für Kanalisationssysteme 排污系统时间序列预报 2506.08577v1 -
782 06-10 FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL FlickerFusion: Intra-Trajektorie Domain Generalizing Multi-Agent RL FlickerFusion: 磁盘内域域 通用多代理 RL 2410.15876v4 -
783 06-10 How Malicious AI Swarms Can Threaten Democracy Wie bösartige KI-Schwärme die Demokratie bedrohen können 恶意的AI Swararms能够如何威胁民主 2506.06299v2 -
784 06-10 The Geometries of Truth Are Orthogonal Across Tasks Die Geometrien der Wahrheit sind orthogonal über Aufgaben hinweg 真理的地貌不同任务 2506.08572v1 -
785 06-10 Asymptotic Normality of Infinite Centered Random Forests - Application to Imbalanced Classification Asymptotische Normalität von unendlichen zentrierten Zufallswäldern - Anwendung auf eine unausgewogene Klassifizierung 无限中心化随机森林的渐近正态性:在不平衡分类中的应用 2506.08548v1 -
786 06-10 LLM-ML Teaming: Integrated Symbolic Decoding and Gradient Search for Valid and Stable Generative Feature Transformation LLM-ML Teaming: Integrierte Symbolische Dekodierung und Gradientensuche nach gültiger und stabiler Generativer Feature Transformation LLM-ML 团队化:综合符号解码和渐进搜索,以进行有效和稳定的生成地貌转型 2506.09085v1 -
787 06-10 Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models Verbesserte Ganzseitenoptimierung durch gemischt-körnige Belohnungsmechanismen-angepasste Sprachmodelle 通过混合发奖机制-成熟语言模式优化全页 2506.09084v1 -
788 06-10 Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models Auf dem Weg zu einem autonomen Stärkungslernen für die real-weltliche Robotermanipulation mit großen Sprachmodellen 争取以大语言模式进行现实世界机器人操纵的自主强化学习 2503.04280v4 -
789 06-10 Structured Variational $D$-Decomposition for Accurate and Stable Low-Rank Approximation Strukturierte Variation $D$-Dekomposition für genaue und stabile Low-Rank-Annäherung 用于准确稳定低秩逼近的结构化变分 $D$-分解 2506.08535v1 -
790 06-10 Regularized Langevin Dynamics for Combinatorial Optimization Regularisierte Langevin-Dynamik für kombinatorische Optimierung 固定的用于组合优化的兰埃文动态 2502.00277v2 -
791 06-10 Federated Learning: From Theory to Practice Föderiertes Lernen: Von der Theorie zur Praxis 联邦学习:从理论到实践 2505.19183v2 -
792 06-10 Robust Evolutionary Multi-Objective Network Architecture Search for Reinforcement Learning (EMNAS-RL) Robuste evolutionäre multiobjektive Netzwerkarchitektur Suche nach Verstärkungslernen (EMNAS-RL) 强劲进化多目标网络强化学习建筑搜索(EMNAS-RL) 2506.08533v1 -
793 06-10 Efficient Robust Conformal Prediction via Lipschitz-Bounded Networks Effiziente robuste konforme Vorhersage über Lipschitz-gebundene Netzwerke 通过Lipschitz-Bounds网络进行非正式预测 2506.05434v2 -
794 06-10 Leveraging chaos in the training of artificial neural networks Das Chaos in der Ausbildung künstlicher neuronaler Netze abbauen 在培训人工神经网络时利用混乱 2506.08523v1 -
795 06-10 FEDTAIL: Federated Long-Tailed Domain Generalization with Sharpness-Guided Gradient Matching FEDTAIL: Federated Long-Tailed Domain Generalization mit Sharpness-Guided Gradient Matching FEDTAIL: 采用尖锐引导梯度匹配法的联邦长期失败域 2506.08518v1 -
796 06-10 Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering Meta-Adaptive Prompt-Destillation für wenige-heiße visuelle Frage-Antworten 用于少数浅光视觉问题解答的元适应性快速蒸馏 2506.06905v2 -
797 06-10 NeurIPS 2024 ML4CFD Competition: Results and Retrospective Analysis NeurIPS 2024 ML4CFD Wettbewerb: Ergebnisse und Retrospektivenanalyse NeurIPS 2024 ML4CFD竞争:结果和回溯性分析 2506.08516v1 -
798 06-10 DiffGradCAM: A Universal Class Activation Map Resistant to Adversarial Training DiffGradCAM: Eine universelle Aktivierungskarte der Klasse, die dem adversarialen Training standhält DiffGradCAM: 通用级启动地图抗反向培训 2506.08514v1 -
799 06-10 Human-Aligned Image Models Improve Visual Decoding from the Brain Menschlich ausgerichtete Imagemodelle verbessern die visuelle Dekodierung aus dem Gehirn 人与人之间的图像模型改进大脑的视觉解码 2502.03081v3 -
800 06-10 Explaining, Fast and Slow: Abstraction and Refinement of Provable Explanations Erklären, Schnell und Langsam: Abstraktion und Verfeinerung wahrnehmbarer Erklärungen 解释、快速和缓慢:可允许解释的抽象和完善 2506.08505v1 -
801 06-10 CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations CoMuMDR: Code-gemischter multimodaler Multidomain-Korpus für Diskurs paRsing in Gesprächen CoMuMDR: 用于对话篇章解析的语码混合多模态多领域语料库 2506.08504v1 -
802 06-10 Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning Memory, Benchmark & Robots: Ein Benchmark für die Lösung komplexer Aufgaben mit Verstärkungslernen 记忆、基准和机器人:用强化学习解决复杂任务的基准 2502.10550v2 -
803 06-10 DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs In Konflikte geraten: In suchgesteigerten LLMs widersprüchliche Quellen erkennen und bekämpfen 钻入冲突:发现和解决搜索中的冲突源 2506.08500v1 -
804 06-10 Chip Placement with Diffusion Models Chipplatzierung mit Diffusionsmodellen 与传播模型一起放置芯片 2407.12282v3 -
805 06-10 The Face of Populism: Examining Differences in Facial Emotional Expressions of Political Leaders Using Machine Learning Das Gesicht des Populismus: Prüfung von Unterschieden in Gesichtsgefühlen politischer Führer mit Hilfe von maschinellem Lernen 民粹主义的面孔:利用机器学习研究政治领袖的情感表达方式的差别 2304.09914v5 -
806 06-10 Thermodynamically Consistent Latent Dynamics Identification for Parametric Systems Thermodynamisch konsistente Latent Dynamics-Identifikation für parametrische Systeme 参数系统热动力一致的冷流动态识别 2506.08475v1 -
807 06-10 Penalty Learning for Optimal Partitioning using Multilayer Perceptron Penalty Learning für optimale Partitionierung mit Multilayer Perceptron 使用多层 Perceptron 进行最佳分区的惩罚学习 2408.00856v4 -
808 06-10 A Survey of the Self Supervised Learning Mechanisms for Vision Transformers Eine Übersicht über die selbstüberwachten Lernmechanismen für Visionstransformatoren 展望变革者自我监督学习机制调查 2408.17059v5 -
809 06-10 How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation Wie richten und ergänzen Bilder LiDAR? Auf dem Weg zu einer harmonisierten multimodalen 3D-Panoptischen Segmentierung 图像如何对齐和补充 LiDAR?迈向协调的多模态3D全景分割 2505.18956v2 -
810 06-10 AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models AVA-Bench: Atomic Visual Ability Benchmark für Vision Foundation Modelle AVA-Bench:远景基础模型的原子视觉能力基准 2506.09082v1 -
811 06-10 MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature MAC: Eine effiziente Gradienten-Vorkonditionierung mit der mittleren Aktivierung Ungefähre Krümmung MAC: 使用平均振动作用的高效渐进式预设设备 2506.08464v1 -
812 06-10 How to Provably Improve Return Conditioned Supervised Learning? Wie man wahrscheinlich verbessern Rückkehr bedingtes Überwachtes Lernen? 如何改善受监督的学习? 2506.08463v1 -
813 06-10 MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning MOBODY: Modellbasierte Off-Dynamik Offline-Verstärkung Lernen MOBODY: 示范非药物非药物脱线强化学习 2506.08460v1 -
814 06-10 Textual Unlearning Gives a False Sense of Unlearning Textuelles Nichtlernen gibt einen falschen Sinn für das Nichtlernen 文字不学习给人一个错误的不学习感 2406.13348v3 -
815 06-10 The interplay of robustness and generalization in quantum machine learning Das Zusammenspiel von Robustheit und Verallgemeinerung im quantenmechanischen Lernen 量子机器学习中强力和一般化的相互作用 2506.08455v1 -
816 06-10 An Explainable Vision Transformer with Transfer Learning Combined with Support Vector Machine Based Efficient Drought Stress Identification Ein erklärbarer Vision Transformer mit Transfer Learning kombiniert mit Unterstützung Vector Maschine basierte effiziente Drought Stress-Identifikation 一个可解释的愿景变异器,与转让学习相结合,与基于支持矢量的高效干旱应激反应机的识别 2407.21666v2 -
817 06-10 Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data Stabilisierendes Kontrastives RL: Techniken für das Roboterziel, das von Offline-Daten aus erreicht wird 稳定违规RL:利用离线数据实现机器人目标的技术 2306.03346v3 -
818 06-10 Systematic and Efficient Construction of Quadratic Unconstrained Binary Optimization Forms for High-order and Dense Interactions Systematische und effiziente Konstruktion von Quadratic Unconstrained Binary Optimization Forms für High-Order und Dense Interaktionen 系统、高效地为高阶和高调互动和高调互动系统构建 “ 中度、无限制的 “ 优化二进制表格 2506.08448v1 -
819 06-10 Understanding Bias Reinforcement in LLM Agents Debate Verständnis der Bias-Verstärkung in LLM-Agenten-Debatte 了解LLLM代理商的强化申请 2503.16814v3 -
820 06-10 A Framework for Controllable Multi-objective Learning with Annealed Stein Variational Hypernetworks Ein Rahmen für kontrollierbares multiobjektives Lernen mit Annealed Stein Variational Hypernetworks 与Annaaled Stein变异超网络的可控多目标学习框架 2506.06715v2 -
821 06-10 Time-Aware World Model for Adaptive Prediction and Control Time-Aware-Weltmodell für adaptive Vorhersage und Kontrolle 适应性预测和控制世界模型 2506.08441v1 -
822 06-10 Refiner: Data Refining against Gradient Leakage Attacks in Federated Learning Raffinerie: Datenveredelung gegen gradiente Leckage-Angriffe im Federated Learning 完善:完善数据,防止联邦学习中逐渐泄漏事件 2212.02042v3 -
823 06-10 Scaling Laws in Linear Regression: Compute, Parameters, and Data Skalierungsgesetze in linearer Regression: Berechnung, Parameter und Daten 线性回归法的扩展法:计算、参数和数据 2406.08466v3 -
824 06-10 Learning to Lead: Incentivizing Strategic Agents in the Dark Führen lernen: Anreize für strategische Agenten im Dunkeln 学习铅:在黑暗中激励战略剂 2506.08438v1 -
825 06-10 Olica: Efficient Structured Pruning of Large Language Models without Retraining Olica: Effizientes strukturiertes Beschneiden großer Sprachmodelle ohne Umschulung Olica:无再培训的大型语言模型高效结构稳健 2506.08436v1 -
826 06-10 Boosting Gradient Leakage Attacks: Data Reconstruction in Realistic FL Settings Förderung gradienter Leckagenangriffe: Datenwiederaufbau in realistischen FL-Einstellungen 推动渐变渗漏袭击:在现实的FL设置中进行数据重建 2506.08435v1 -
827 06-10 FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making FinHEAR: Human Expertise und adaptive Risiko-Bewusst Temporale Begründung für Finanzentscheidungen FinHEAR: 金融决策的人类专门知识和适应性风险—-风险警报时间原因 2506.09080v1 -
828 06-10 Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings Verstärkte Konvergenzraten für die nichtkonvexe Optimierung durch Reduktionsmappings 通过减少绘图实现非电流优化的夏端趋同率 2506.08428v1 -
829 06-10 HASFL: Heterogeneity-aware Split Federated Learning over Edge Computing Systems HASFL: Heterogenität bewusst Split Federated Learning over Edge Computing Systems HASFL: 超越边缘电子计算系统的异异质-认知异质分化联邦学习系统 2506.08426v1 -
830 06-10 Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy Mic-hackathon 2024: Hackathon zum maschinellen Lernen für Elektronen- und Rastersondenmikroskopie Mic-hackathon 2024:关于电子和扫描探测器机器学习的Hackathon 2506.08423v1 -
831 06-10 Dense ReLU Neural Networks for Temporal-spatial Model Dense ReLU Neuronale Netzwerke für zeitlich-räumliches Modell 时空空间模型神经网络 2411.09961v8 -
832 06-10 DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization DisCO: Mit diskriminativ eingeschränkter Optimierung große Reasoning-Modelle stärken DisCO:用判别式约束优化强化大型推理模型 2505.12366v2 -
833 06-10 Online Learning-guided Learning Rate Adaptation via Gradient Alignment Online Learning-guided Learning Rate Anpassung über Gradient Alignment 通过 “ 渐进协调 “ 调整,调整在线学习指导学习率 2506.08419v1 -
834 06-10 Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood Offline RL mit glatter OOD-Verallgemeinerung in Convex Hull und seiner Nachbarschaft 在Convex Hull及其邻里平滑OOOD平滑的离线RL(RL) 2506.08417v1 -
835 06-10 DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data DISCO balanciert die Skalen: Adaptives Domain- und Schwierigkeitsgrad-Bewusst-Verstärkung-Lernen auf unausgewogenen Daten DISCO 平衡规模:关于数据不平衡的适应性域域和困难性软件强化学习 2505.15074v2 -
836 06-10 Improved Scaling Laws in Linear Regression via Data Reuse Verbesserte Skalierungsgesetze bei linearer Regression über Datenwiederverwertung 通过数据再利用改进线性回归通过数据再利用的扩展法 2506.08415v1 -
837 06-10 Learning to Hear Broken Motors: Signature-Guided Data Augmentation for Induction-Motor Diagnostics Lernen, gebrochene Motoren zu hören: Signatur-geführte Datenvergrößerung für Induktions-Motor-Diagnostik 学习听听破碎汽车:诱导-摩托汽车诊断的签名指导数据增强 2506.08412v1 -
838 06-10 Mutual-Taught for Co-adapting Policy and Reward Models Gegenseitiges Lernen für Ko-Anpassungspolitik und Prämienmodelle 共同适应政策和奖励模式相互培训 2506.06292v2 -
839 06-10 FUSE: Measure-Theoretic Compact Fuzzy Set Representation for Taxonomy Expansion FUSE: Maß-theoretische Kompakt-Fuzzy-Set-Darstellung für Taxonomie-Expansion FUSE: 测量-理论结合模糊集束代表扩大分类学 2506.08409v1 -
840 06-10 Towards Generalized Source Tracing for Codec-Based Deepfake Speech Auf dem Weg zu einer generalisierten Quellenverfolgung für codec-basierte Deepfake-Rede 努力实现以基于编码的 “ 深层假言 “ 为依据的 “ 普遍源源追踪 “ 目标 2506.07294v2 -
841 06-10 RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation RLHS: Fehlausrichtung in RLHF mit Hindsight Simulation abmildern RLHS: 减轻RLHF的差错,采用速光模拟模拟 2501.08617v3 -
842 06-10 Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry Sparse Training von Zufallsinitialisierung: Ausrichten von Lotteriekartenmasken mit Gewichtssymmetrie 随机初始化的分散训练: 使用重量对称对调彩票票面罩 2505.05143v2 -
843 06-10 mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks mSTEB: Massive mehrsprachige Bewertung von LLMs zu Sprach- und Textaufgaben mSTEB: 对LLM在语音和文本任务上的大规模多语种评估 2506.08400v1 -
844 06-10 Spatiotemporal deep learning models for detection of rapid intensification in cyclones Geotemporale Deep-Learning-Modelle zur Erkennung einer schnellen Verstärkung bei Zyklonen 探测气旋风暴迅速加剧的斯波时深深学习模式 2506.08397v1 -
845 06-10 From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models Von Pixeln zu Prädikaten: Learning Symbolic World Models via Pretrained Vision-Language Models 从像素到预言:通过预先培训的视觉语言模型学习象征性世界模式 2501.00296v3 -
846 06-10 Learning-based density-equalizing map Learning-based Dichte-equalizing Karte 基于学习的密度平等图 2506.10027v1 -
847 06-10 Reinforcement Learning Teachers of Test Time Scaling Verstärktes Lernen von Lehrern der Testzeitskalierung 测试时间尺度强化学习教师 2506.08388v1 -
848 06-10 Network Threat Detection: Addressing Class Imbalanced Data with Deep Forest Netzwerk Threat Detection: Adressierung von unausgeglichenen Daten der Klasse mit tiefem Wald 网络威胁探测:处理深森林的分类平衡数据 2506.08383v1 -
849 06-10 Reinforce LLM Reasoning through Multi-Agent Reflection Verstärkung der LLM-Vernunft durch Multi-Agent-Reflexion 通过多方机构反省说明理由 2506.08379v1 -
850 06-10 A Probabilistic Framework for Imputing Genetic Distances in Spatiotemporal Pathogen Models Ein probabilistischer Rahmen für die Imputierung genetischer Entfernungen in Spatiotemporal Pathogen-Modellen 关于将遗传距离算入时地病原体模型的概率框架 2506.09076v1 -
851 06-10 AlphaFold Database Debiasing for Robust Inverse Folding AlphaFold Datenbank-Debiasing für robuste Inverse Faltung AlphaFold 数据库对硬性反向折叠的偏移 2506.08365v1 -
852 06-10 Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(ε^{-4/7})$ Second-Order Oracle Complexity Lösen von Convex-Concave-Problemen mit $\tilde{\mathcal{O}}(ε^{-4/7})$ Oracle-Komplexität zweiter Ordnung 以 $\tilde{\mathcal{O}}(ε^{-4/7})$ 二阶预言机复杂度求解凸凹问题 2506.08362v1 -
853 06-10 NysAct: A Scalable Preconditioned Gradient Descent using Nystrom Approximation NysAct: Ein skalierbarer vorkonditionierter Gradient Abstieg mit Nystrom-Annäherung NysAct: 使用Nystrom 接近法的可缩放的预设条件渐变源 2506.08360v1 -
854 06-10 An Adaptive Method Stabilizing Activations for Enhanced Generalization Eine adaptive Methode zur Stabilisierung von Aktivierungen für verstärkte Generalisierung 增强普遍化的适应方法稳定活动 2506.08353v1 -
855 06-10 Differentially Private Relational Learning with Entity-level Privacy Guarantees Unterschiedlich privates relationales Lernen mit Datenschutzgarantien auf Körperschaftsebene 与实体一级的隐私保障有区别的私人关系学习 2506.08347v1 -
856 06-10 Re4MPC: Reactive Nonlinear MPC for Multi-model Motion Planning via Deep Reinforcement Learning Re4MPC: Reaktives nichtlineares MPC für Multi-Model-Bewegungsplanung über Deep Reinforcement Learning Re4MPC:通过深强化学习进行多模式运动规划的反活动非线性非线性MPC 2506.08344v1 -
857 06-10 Dynamical System Optimization Dynamische Systemoptimierung 动态系统优化 2506.08340v1 -
858 06-10 Equivalence of stochastic and deterministic policy gradients Gleichwertigkeit stochastischer und deterministischer politischer Gradienten 政策梯度和确定性政策梯度等同 2505.23244v2 -
859 06-10 midr: Learning from Black-Box Models by Maximum Interpretation Decomposition midr: Von Black-Box-Modellen lernen durch maximale Interpretationszersetzung 中间体: 通过最大解析分解从黑ox 模型中学习 2506.08338v1 -
860 06-10 A Simple Analysis of Discretization Error in Diffusion Models Eine einfache Analyse des Diskretisierungsfehlers in Diffusionsmodellen 简单分析扩散模型中的分解错误 2506.08337v1 -
861 06-10 Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer Zero-shot Meta-Learning für Tabular Prediction Aufgaben mit adversarially Pre-trained Transformer 采用未受过训练的变换器进行表列预测任务 2502.04573v2 -
862 06-10 Generation from Noisy Examples Generierung aus lauteren Beispielen 从吵闹中生成 2501.04179v2 -
863 06-10 Graph Prompting for Graph Learning Models: Recent Advances and Future Directions Graph Prompting für Graph Learning Models: Aktuelle Fortschritte und zukünftige Richtungen 图示学习模型提示图示:最近的进展和未来方向 2506.08326v1 -
864 06-10 Model-Free Kernel Conformal Depth Measures Algorithm for Uncertainty Quantification in Regression Models in Separable Hilbert Spaces Modellfreier Kernel Conformal Depth misst Algorithmen zur Unsicherheitsquantifizierung in Regressionsmodellen in trennbaren Hilbert-Räumen 在塞可分离的希尔伯特空间的回归模型中,不确定性量化不确定性的模型-无模型核心常规 常规深度测量标准 2506.08325v1 -
865 06-10 RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors und Natural Language 关于音频、视频、嵌入传感器和自然语言的问答答案 2505.17114v2 -
866 06-10 TPP-LLM: Modeling Temporal Point Processes by Efficiently Fine-Tuning Large Language Models TPP-LLM: Modellierung von Temporal Point-Prozessen durch effizientes Feintuning großer Sprachmodelle TPP-LLM:通过高效微调大语言模式模拟时点进程 2410.02062v2 -
867 06-10 Curse of High Dimensionality Issue in Transformer for Long-context Modeling Fluch der Hochdimensionalitätsfrage im Transformer für die Langkontextmodellierung 变异器中高多维度问题的诅咒,用于长期建模 2505.22107v3 -
868 06-10 Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion Warum Diffusion arbeitet maskieren: Zustand auf dem Sprungplan für verbesserte diskrete Diffusion 为何遮掩传播工作有效:改进传播的跳跳时间表上的条件 2506.08316v1 -
869 06-10 Private Evolution Converges Private Evolutionskonvergenzen 私人进化集市 2506.08312v1 -
870 06-10 Continuous Temporal Learning of Probability Distributions via Neural ODEs with Applications in Continuous Glucose Monitoring Data Kontinuierliches Temporales Lernen der Wahrscheinlichkeitsverteilungen über neurale ODEs mit Anwendungen in kontinuierlichen Glukoseüberwachungsdaten 不断从时间上学习通过有持续甘蔗监测数据应用的神经代码对概率分布进行概率分布的连续时间学习 2505.08698v2 -
871 06-10 GRAM: Generalization in Deep RL with a Robust Adaptation Module GRAM: Verallgemeinerung in Deep RL mit einem robusten Anpassungsmodul GRAM: 带有强力适应模块的深 RL 常规化 2412.04323v2 -
872 06-10 H$^2$GFM: Towards unifying Homogeneity and Heterogeneity on Text-Attributed Graphs H$^2$GFM: Auf dem Weg zur Vereinheitlichung von Homogenität und Heterogenität auf Text-Attributen Grafiken H$^2$GFM:努力在文本属性图上统一同质性和异质性 2506.08298v1 -
873 06-09 (1) From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information? Von passiv zur aktiven Vernunft: Können große Sprachmodelle unter unvollständigen Informationen die richtigen Fragen stellen? 从被动到积极理由:大语言模式能够在不完整信息下询问正确的问题吗? 2506.08295v1 -
874 06-09 From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium Von der Debatte zum Equilibrium: Glaube-getriebener Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium 从辩论到平衡:通过巴伊西亚纳什平衡进行 2506.08292v1 -
875 06-09 Digital Twin Synchronization: Bridging the Sim-RL Agent to a Real-Time Robotic Additive Manufacturing Control Digitale Twin-Synchronisierung: Überbrücken des Sim-RL Agents zu einer Echtzeit-Roboter-Zusatz-Fertigungssteuerung 数字双双同步同步:将Sim-RL代理连接到实时机器人添加剂制造控制 2501.18016v2 -
876 06-09 Edge Computing based Human-Robot Cognitive Fusion: A Medical Case Study in the Autism Spectrum Disorder Therapy Edge Computing basierte human-Roboter Kognitive Fusion: Eine medizinische Fallstudie in der Autismus-Spektrum-Störungstherapie 以边缘计算机为基础的人类-机器人认知共生:自闭症频谱病理医学案例研究 2401.00776v2 -
877 06-09 Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers Präzise hochdimensionale Asymptotik für die Quantifizierung heterogener Übertragungen 用于量化异基因转移的精确高多元性高活性 2010.11750v5 -
878 06-09 Voice Impression Control in Zero-Shot TTS Sprach-Impression-Steuerung in Zero-Shot TTS 零热TTTS中的声音压抑控制 2506.05688v2 -
879 06-09 Seeing Voices: Generating A-Roll Video from Audio with Mirage Stimmen sehen: A-Roll-Video aus Audio mit Mirage generieren 观看声音:从《幻影之音》制作一首歌曲视频 2506.08279v1 -
880 06-09 Instruction-Tuned Video-Audio Models Elucidate Functional Specialization in the Brain Instruction-Tuned Video-Audio Modelle Elucidate Funktionelle Spezialisierung im Gehirn 脑中精明功能性专业 2506.08277v1 -
881 06-09 SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device SnapGen-V: Ein Fünf-Sekunden-Video innerhalb von fünf Sekunden auf einem mobilen Gerät erzeugen SnapGen-V:在移动设备上制作5秒内5秒内5秒的视频 2412.10494v2 -
882 06-09 LEANN: A Low-Storage Vector Index LEANN: Ein speicherarmer Vektorindex LEANN:低压力矢量指数 2506.08276v1 -
883 06-09 Exploring Microstructural Dynamics in Cryptocurrency Limit Order Books: Better Inputs Matter More Than Stacking Another Hidden Layer Erforschung mikrostruktureller Dynamiken in Kryptowährung Limit Order Books: Bessere Eingaben sind wichtiger als das Stapeln einer anderen versteckten Ebene 探索加密货币限制顺序书中的微观结构动态:更好的投入比另层隐藏层更重要 2506.05764v2 -
884 06-09 DualDynamics: Synergizing Implicit and Explicit Methods for Robust Irregular Time Series Analysis DualDynamics: Synergetische Implizite und explizite Methoden zur robusten irregulären Zeitreihenanalyse 双动态: 优化非常规时间序列分析的隐含和显性方法 2401.04979v6 -
885 06-09 Navigating the Latent Space Dynamics of Neural Models Navigation der latenten Raumdynamik von Neuralmodellen 导航内壳模型的冷层空间动态 2505.22785v2 -
886 06-09 Comprehensive Review of Neural Differential Equations for Time Series Analysis Umfassende Überprüfung neuraler Differentialgleichungen für die Zeitreihenanalyse 全面审查用于时间序列分析的神经差异等同 2502.09885v2 -
887 06-09 Universal Differential Equations for Scientific Machine Learning of Node-Wise Battery Dynamics in Smart Grids Universelle Differentialgleichungen für das wissenschaftliche maschinelle Lernen von Node-Wise-Batteriedynamiken in Smart Grids 智能网格节点电池动态科学机器学习通用差异等量 2506.08272v1 -
888 06-09 Why Gradients Rapidly Increase Near the End of Training Warum Gradienten am Ende der Ausbildung rasch zunehmen 为何在培训接近结束时梯度迅速增加 2506.02285v2 -
889 06-09 Sparse Interpretable Deep Learning with LIES Networks for Symbolic Regression Sparse Interpretable Deep Learning mit LIES-Netzwerken für symbolische Regression 与 LIES 网络共享的粗化可解释深空学习,用于符号回归 2506.08267v1 -
890 06-09 Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints Verstärktes Lernen aus menschlichem Feedback mit hochkonfidenziellen Sicherheitseinschränkungen 在高度信任安全限制下,从人类反馈中加强学习 2506.08266v1 -
891 06-09 Learning-Based Multiuser Scheduling in MIMO-OFDM Systems with Hybrid Beamforming Lernbasiertes Multiuser-Scheduling in MIMO-OFDM-Systemen mit Hybrid-Beamforming MIMO-OFDM系统和混合波束系统中基于学习的多用户规划 2506.08263v1 -
892 06-09 SHIELD: Secure Hypernetworks for Incremental Expansion Learning Defense SHIELD: Sichere Hypernetzwerke für inkrementale Expansions-Learning Defense SHIELD: 安全超网络,以逐步扩大学习防御 2506.08255v1 -
893 06-09 Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction Selbstüberwachte Transformer als iterative Lösungsverbesserungen für Einschränkungen Zufriedenheit 自我主宰的变形器作为抑制满意度的循环解决方案改进器 2502.15794v2 -
894 06-09 TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation TRAVEL: Training-freies Retrieval und Ausrichtung für Vision-und-Sprachen-Navigation TRAVEL:无培训检索和协调愿景和语言导航 2502.07306v2 -
895 06-09 Approximation-Aware Bayesian Optimization Approximationsbewusste Bayes'sche Optimierung 近似感知贝叶斯优化 2406.04308v2 -
896 06-09 General Loss Functions Lead to (Approximate) Interpolation in High Dimensions Allgemeine Verlustfunktionen führen zu (nahezu) Interpolation in hohen Dimensionen 一般损失函数在高维中导致(近似)插值 2303.07475v2 -
897 06-09 Parameter-free approximate equivariance for tasks with finite group symmetry Parameterfreie ungefähre Äquivarianz für Aufgaben mit endlicher Gruppensymmetrie 有限群对称任务的无参数近似等变性 2506.08244v1 -
898 06-09 Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic Temporalisierendes Vertrauen: Bewertung von Ketten-of-Thought-Reasoning mit Signal Temporal Logic 时间化信任度:对根据信号时空逻辑进行的研究链原因的评价 2506.08243v1 -
899 06-09 Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations Umgang mit den bösen Zwillingen: Zufällige Augmentation verbessern, indem Katastrophenvergessen von verschiedenen Augmentationen adressiert wird 对付邪恶双胞胎:通过解决灾难性地忘记多种增加现象,改善随机增加现象 2506.08240v1 -
900 06-09 MF-LAL: Drug Compound Generation Using Multi-Fidelity Latent Space Active Learning MF-LAL: Arzneimittel-Compound-Generierung mit Multi-Fidelity Latent Space Active Learning MF-LAL:利用多功能低空空间主动学习生成药物化合物 2410.11226v3 -
901 06-09 Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework Gewährleistung der Zuverlässigkeit von kuratierten EHR-Daten: Die Validierung der Genauigkeit für LLM/ML-extrahierte Informationen und Daten (VALID) Framework 确保封闭式电子HR-驱动数据可靠性:LLM/ML-提取信息和数据(VALID)框架的准确性验证 2506.08231v1 -
902 06-09 Scaling Laws of Motion Forecasting and Planning – A Technical Report Skalierungsgesetze der Bewegungsvorhersage und Planung – Ein technischer Bericht 扩大运动预测和规划法 – – 技术报告 2506.08228v1 -
903 06-09 Mondrian: Transformer Operators via Domain Decomposition Mondrian: Transformer Operators via Domain Decomposition Mondrian:通过域分解的变形操作器 2506.08226v1 -
904 06-09 Harnessing the Continuous Structure: Utilizing the First-order Approach in Online Contract Design Kontinuierliche Struktur: Nutzung des First-Order-Ansatzes im Online-Contract Design 利用连续结构:在网上合同设计中使用一等办法 2403.07143v3 -
905 06-09 What makes an Ensemble (Un) Interpretable? Was macht ein Ensemble (Un) Interpretierbar? 集合物怎么可以解释呢? 2506.08216v1 -
906 06-09 Scalable Meta-Learning via Mixed-Mode Differentiation Skalierbares Meta-Learning über Mixed-Mode Differenzierung 通过混合模式差异的可缩放元学习 2505.00793v2 -
907 06-09 Distinguishing Cause from Effect with Causal Velocity Models Bezeichnende Ursache von Wirkung mit kausalen Velocity-Modellen 区别因与因果速率模型的影响的原因 2502.05122v2 -
908 06-09 A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation Eine umfassende Studie von Decoder-Only LLMs für die Text-to-Image-Generierung 综合研究 “ 文字到图像生成的开发者 – – 唯一 “ 有限LLMs “ 综合研究 2506.08210v1 -
909 06-09 A Machine Learning Approach to Generate Residual Stress Distributions using Sparse Characterization Data in Friction-Stir Processed Parts Ein maschineller Lernansatz zur Generierung residualer Stressverteilungen mit Sparse Charakterisierungsdaten in Friction-Stir Prozessteilen 利用摩擦-标准处理部件中的扭曲特征数据生成残余应激反应分布的机械学习方法 2506.08205v1 -
910 06-09 Correlated Noise Mechanisms for Differentially Private Learning Correlated Noise Mechanisms for Differentially Private Learning 差别私人学习相关噪音机制 2506.08201v1 -
911 06-09 Tight Lower Bounds and Improved Convergence in Performative Prediction Enge niedrigere Grenzen und verbesserte Konvergenz in der Performativen Vorhersage 性能预测的下下角拉紧和更好趋同 2412.03671v2 -
912 06-09 Interpreting Agent Behaviors in Reinforcement-Learning-Based Cyber-Battle Simulation Platforms Verdolmetschung von Agenten-Behaviors bei verstärkten, Learning-basierten Cyber-Battle-Simulationsplattformen 强化-以学习为基础的网络战模拟平台中解释代理行为行为 2506.08192v1 -
913 06-09 Adapting Prediction Sets to Distribution Shifts Without Labels Anpassung von Vorhersagen an Verteilungsverschiebungen ohne Etiketten 调整预测集以适应无标签的分布转换 2406.01416v2 -
914 06-09 Innate-Values-driven Reinforcement Learning based Cognitive Modeling Angeborene Werte-getriebene Verstärkungs-Lernen auf Kognitive Modellierung 基于认知模型的强化学习模型 2411.09160v2 -
915 06-09 Innate-Values-driven Reinforcement Learning based Cooperative Multi-Agent Cognitive Modeling Angeborene Werte-getriebene Verstärkung Learning basierte kooperative Multi-Agent Kognitive Modellierung 以基于强化的学习为基础的合作多代理共认型建模 2401.05572v2 -
916 06-09 FedGA-Tree: Federated Decision Tree using Genetic Algorithm FedGA-Baum: Föderierter Entscheidungsbaum mit genetischem Algorithmus FedGA-Tree:使用遗传算法的联邦决策树 2506.08176v1 -
917 06-09 xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs xGen-MM-Vid (BLIP-3-Video): Sie benötigen nur 32 Zeichen, um ein Video auch in VLMs darzustellen xGen-MM-Vid (BLIP-3-Video):您只需32个调子即可在 VLMs 中代表视频 2410.16267v2 -
918 06-09 Grouped Discrete Representation for Object-Centric Learning Gruppierte diskrete Darstellung für Objekt-Zentrisches Lernen 用于对象中心学习的分组差异代表 2411.02299v2 -
919 06-09 Federated Learning on Stochastic Neural Networks Föderiertes Lernen in stochastischen Neuronalen Netzwerken 存储神经神经网络方面的联邦学习 2506.08169v1 -
920 06-09 SILK: Smooth InterpoLation frameworK for motion in-betweening A Simplified Computational Approach SILK: Smooth InterpoLation frameworK für Bewegungs-In-Betweening: Ein vereinfachter rechnerischer Ansatz SILK:用于运动中间帧生成的平滑插值框架:一种简化的计算方法 2506.09075v1 -
921 06-09 UniVarFL: Uniformity and Variance Regularized Federated Learning for Heterogeneous Data UniVarFL: Uniformität und Varianz Regularisiertes Federated Learning für heterogene Daten UniVarFL: 统一和差异校正的联邦异种数据学习 2506.08167v1 -
922 06-09 BLUR: A Bi-Level Optimization Approach for LLM Unlearning BLUR: Ein Bi-Level-Optimierungsansatz für LLM-Unlearning BLUR: LLM 重新学习的双级优化办法 2506.08164v1 -
923 06-09 A Metrics-Oriented Architectural Model to Characterize Complexity on Machine Learning-Enabled Systems Ein metrisch ausgerichtetes architektonisches Modell zur Charakterisierung von Komplexität auf maschinell lernfähigen Systemen 以计量为主的建筑建筑模型,以明确机械学习系统的复杂性 2506.08153v1 -
924 06-09 Towards a Mechanistic Explanation of Diffusion Model Generalization Auf dem Weg zu einer mechanistischen Erklärung der Verallgemeinerung des Diffusionsmodells 争取对传播模型通用化进行机械解释 2411.19339v3 -
925 06-09 Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models Mehrsprachige Hate Speech Detection in sozialen Medien mit translationsbasierten Ansätzen mit großen Sprachmodellen 在社会媒体中使用基于翻译的方法和大语言模式的多语言社会媒体多语言仇恨言论探测 2506.08147v1 -
926 06-09 A Theory of Machine Understanding via the Minimum Description Length Principle Eine Theorie des Maschinenverständnisses über das Prinzip der Mindestbeschreibungslänge 通过最低描述长度原则理解机器理论 2504.00395v3 -
927 06-09 Fully data-driven inverse hyperelasticity with hyper-network neural ODE fields Vollständig datengetriebene inverse Hyperelastizität mit hyper-network neuronalen ODE-Feldern 由全数据驱动的全数据驱动的超反超弹性,具有超网络神经极极光字段 2506.08146v1 -
928 06-09 Accelerating Spectral Clustering under Fairness Constraints Beschleunigung des Spektralclustering unter Fairness-Beschränkungen 加速在公平制约下聚居 2506.08143v1 -
929 06-09 AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists AutoSDT: Scaling Data-Driven Discovery Aufgaben gegenüber offenen Mitwissenschaftlern AutoSDT: 扩大数据驱动发现任务, 以建立开放的共同科学家 2506.08140v1 -
930 06-09 Nearness of Neighbors Attention for Regression in Supervised Finetuning Nähe der Nachbarn Achtung vor der Regression in überwachter Finetuning 邻近邻居的近邻 在监督的微调中注意后退 2506.08139v1 -
931 06-09 Learning Efficient Representations of Neutrino Telescope Events Effiziente Darstellungen von Neutrino-Teleskop-Veranstaltungen lernen 中子望远镜活动学习效率高的代表性 2410.13148v2 -
932 06-09 UnCLe: Benchmarking Unsupervised Continual Learning for Depth Completion UnCLe: Benchmarking unüberwachtes kontinuierliches Lernen für die Tiefenvervollständigung UnCLe:为深度完成确定不受监督的连续学习基准 2410.18074v4 -
933 06-09 Directed Exploration in Reinforcement Learning from Linear Temporal Logic Direkte Exploration bei der Verstärkung Lernen von linearen Temporal Logic 直接探索从线上时间逻辑中加强学习 2408.09495v2 -
934 06-09 Constrained Pareto Set Identification with Bandit Feedback Beschränkte Pareto-Set-Identifikation mit Bandit-Feedback 被约束的 Pareto 设置 识别强盗反馈 2506.08127v1 -
935 06-09 Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning Bingo: Effiziente Begründung von LLMs durch dynamisches und auf Bedeutung basierendes Verstärkungslernen 宾果:通过以动态和重要内容为基础的强化学习,提高LLMs高效率的理据能力 2506.08125v1 -
936 06-09 Continuous Policy and Value Iteration for Stochastic Control Problems and Its Convergence Kontinuierliche Politik und Wert Iteration für stochastische Kontrollprobleme und deren Konvergenz 斯托卡控制问题及其趋同的持续政策和价值迭代 2506.08121v1 -
937 06-09 Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting Benchmarking-Modelle der vortrainierten Zeitreihen für die Strompreisprognose 确定电力价格预测培训前时间序列模型的基准 2506.08113v1 -
938 06-09 Selective Matching Losses – Not All Scores Are Created Equal Selektive Matching Losses – Nicht alle Scores werden gleich erstellt 选择性匹配损失 – – 并非所有分数都等于创建 2506.04446v2 -
939 06-09 Positional Attention: Expressivity and Learnability of Algorithmic Computation Positionale Aufmerksamkeit: Expressivität und Erlernbarkeit algorithmischer Computation 位置注意: 算法的可表达性和可理解性 2410.01686v3 -
940 06-09 StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets StableMTL:从部分附加说明合成数据集中重新定位多任务学习的远程传播模型 2506.08013v1 -
941 06-09 Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion Selbsterzwingen: Überbrückung der Zug-Testlücke in autoregressiver Video-Diffusion 自我强迫:弥合自动递减视频传播中的培训-测试差距 2506.08009v1 -
942 06-09 Hidden in plain sight: VLMs overlook their visual representations Versteckt in freier Sicht: VLMs übersehen ihre visuellen Darstellungen 隐藏在普通视线之下:VLMs忽略了它们的视觉形象。 2506.08008v1 -
943 06-09 Reparameterized LLM Training via Orthogonal Equivalence Transformation Reparameterisiertes LLM-Training über Orthogonale Äquivalenztransformation 通过正正对等转化进行修复性磁力LLM培训 2506.08001v1 -
944 06-09 MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation MADFormer: Mixed Autoregressive und Diffusion Transformer für die kontinuierliche Bildgenerierung MADFormer: 用于连续图像生成的混合自动递减和扩散变异器 2506.07999v1 -
945 06-09 Generative Modeling of Weights: Generalization or Memorization? Generative Modellierung von Gewichten: Verallgemeinerung oder Erinnerung? 产生重力建模:普遍化还是记忆化? 2506.07998v1 -
946 06-09 Domain Switching on the Pareto Front: Multi-Objective Deep Kernel Learning in Automated Piezoresponse Force Microscopy Domain-Umschaltung der Pareto-Front: Multi-Objektives Deep Kernel-Lernen in automatisierter Piezoresponse-Kraftmikroskopie Pareto Front 域域切换:在自动 Piezo响应力量中进行多目标深核心学习的显微镜 2506.08073v1 -
947 06-09 PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation PhantomWiki: On-Demand-Datensätze für die Bewertung von Vernunft und Retrieval PhantomWiki: 用于理性和检索评价的现场数据集 2502.20377v2 -
948 06-09 When Two LLMs Debate, Both Think They’ll Win Wenn zwei LLMs diskutieren, denken beide, dass sie gewinnen werden 当两个LLM 辩论, 双方都认为他们会赢 2505.19184v3 -
949 06-09 Rethinking Crowd-Sourced Evaluation of Neuron Explanations Rethinking Crowd-Sourced Evaluation von Neuron Erklärungen 重新思考人群-源对内向解释的重新思考 2506.07985v1 -
950 06-09 CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray CXR-LT 2024: Eine MICCAI-Herausforderung bei Langzeit-, Multilabel- und Zero-Shot-Erkrankungsklassifizierung aus der Bruströntgenuntersuchung CXR-LT 2024:MICCAI对胸前X光中长尾、多标签和零发疾病分类的挑战 2506.07984v1 -
951 06-09 Real-time Localization of a Soccer Ball from a Single Camera Echtzeit Lokalisierung eines Fußballballs aus einer einzigen Kamera 单摄像头足球球的实时本地化 2506.07981v1 -
952 06-09 Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation Bipartite Ranking von mehreren Labels: Auf Verlust Versus Label Aggregation 多标签的双边排名:关于损失的Versus Label聚合 2504.11284v2 -
953 06-09 Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator Realistischer städtischer Verkehrsgenerator mit dezentralisiertem Federated Learning für den SUMO-Simulator 使用分散化的联邦学习来模拟SUMO的模拟器进行现实化城市交通发电机 2506.07980v1 -
954 06-09 Automated Capability Discovery via Foundation Model Self-Exploration Automatisierte Capability Discovery über die Selbsterkundung von Fundamentmodellen 通过基金会自我探索示范模式自动发现能力 2502.07577v3 -
955 06-09 Hyperpruning: Efficient Search through Pruned Variants of Recurrent Neural Networks Leveraging Lyapunov Spectrum Hyperpruning: Effiziente Suche durch beschnittene Varianten von recurrenten Neuronalen Netzwerken, die Lyapunov Spectrum nutzen 超光速运行:通过经常神经网络的审慎变式有效搜索,利用Lyapunov光谱网 2506.07975v1 -
956 06-09 Intelligent Offloading in Vehicular Edge Computing: A Comprehensive Review of Deep Reinforcement Learning Approaches and Architectures Intelligentes Offloading im Fahrzeug Edge Computing: Eine umfassende Überprüfung von Deep Reinforcement-Lernansätzen und Architekturen 在车辆边缘计算机中卸载:对深强化学习方法和架构的全面审查 2502.06963v2 -
957 06-09 HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization HeuriGym: Agentischer Benchmark für LLM-gekrempelte Heuristik in der kombinatorischen Optimierung HeuriGym: 组合优化中LLM-减压希力学的代理基准 2506.07972v1 -
958 06-09 MIB: A Mechanistic Interpretability Benchmark MIB: Ein Mechanistischer Interpretations-Benchmark MIB: 机械解释基准 2504.13151v2 -
959 06-09 A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling Ein zweiphasiges Deep-Learning-Framework für adaptive Zeitschritte in der High-Speed-Flow-Modellierung 高频流建模适应性时间排行的两阶段深学习框架 2506.07969v1 -
960 06-09 Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs 物理知情神经溶解器中发现聚合物的神经内核内核分析:PIKANs诉PINNs 2506.07958v1 -
961 06-09 Distillation Robustifies Unlearning Destillation stärkt das Unlearning 蒸馏使遗忘学习更稳健 2506.06278v2 -
962 06-09 Language Models over Canonical Byte-Pair Encodings Sprachmodelle über Canonical Byte-Pair Encodings 规范字节对编码上的语言模型 2506.07956v1 -
963 06-09 Discrete and Continuous Difference of Submodular Minimization Diskreter und kontinuierlicher Unterschied der Submodularen Minimierung 子模块最小化的分质和连续差异 2506.07952v1 -
964 06-09 Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization Easy2Hard-Bench: Standardisierte Schwierigkeits-Labels für Profiling LLM Performance und Generalisierung Easy2Hard-Bench:用于分析LLM性能和通用的标准化困难标签 2409.18433v2 -
965 06-09 Cost-Optimal Active AI Model Evaluation Kostenoptimale, aktive KI-Modellbewertung AI模型评价 2506.07949v1 -
966 06-09 TokenBreak: Bypassing Text Classification Models Through Token Manipulation TokenBreak: Textklassifikationsmodelle durch Token-Manipulation umgehen TokenBreak: 通过词元操纵绕过文本分类模型 2506.07948v1 -
967 06-09 Info-Coevolution: An Efficient Framework for Data Model Coevolution Info-Coevolution: Ein effizienter Rahmen für die Datenmodellkoevolution 信息革命:数据模型革命的有效框架 2506.08070v1 -
968 06-09 Gradients: When Markets Meet Fine-tuning – A Distributed Approach to Model Optimisation Gradienten: Wenn Märkte Feinsteuerung treffen – Ein verteilter Ansatz zur Modelloptimierung 渐进:市场达到微调时 – – 模型优化的分配办法 2506.07940v1 -
969 06-09 Reliably detecting model failures in deployment without labels Zuverlässige Erkennung von Modellfehlern im Einsatz ohne Etiketten 可靠地发现无标签部署中的模型失败 2506.05047v2 -
970 06-09 Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models Mimicking oder Reasoning: Multi-Modal In-Context Learning in Vision-Language-Modellen neu denken 模拟或解释:在视觉语言模型中重新思考多模式内学习 2506.07936v1 -
971 06-09 Ensemble-Based Survival Models with the Self-Attended Beran Estimator Predictions Ensemble-basierte Überlebensmodelle mit den selbstbestimmten Beran-Estimator-Vorhersagen 与自行预测的贝兰估算器合成生存模型 2506.07933v1 -
972 06-09 Predicting Bad Goods Risk Scores with ARIMA Time Series: A Novel Risk Assessment Approach Risiko-Scores mit ARIMA Time Series voraussagen: Ein neuartiger Ansatz zur Risikobewertung ARIMA 时间系列:新风险评估方法 2502.16520v3 -
973 06-09 Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor Squeeze3D: Ihr Modell der 3D-Generation ist geheim ein extremer Neuralkompressor Squeeze3D: 您的 3D 代的模型是一个秘密的 极端神经压缩器 2506.07932v1 -
974 06-09 Missing Data Imputation by Reducing Mutual Information with Rectified Flows Fehlende Daten-Imputation durch Verringerung gegenseitiger Informationen mit rektifizierten Strömen 通过减少相互信息与纠正流流 2505.11749v2 -
975 06-09 A Generative Physics-Informed Reinforcement Learning-Based Approach for Construction of Representative Drive Cycle Ein generativer physik-informierter, lernbasierter Ansatz zur Konstruktion repräsentativer Antriebszyklen 为建造具有代表性的驱动周期而采用基于学习的创形物理综合强化强化学习方法 2506.07929v1 -
976 06-09 Solving Inequality Proofs with Large Language Models Ungleichungsbeweise mit großen Sprachmodellen lösen 以大语言模型求解不等式证明 2506.07927v1 -
977 06-09 A Comparative Study of U-Net Architectures for Change Detection in Satellite Images Eine vergleichende Studie über U-Net-Architekturen zur Erkennung von Veränderungen in Satellitenbildern U-Net图象变化探测网络结构比较研究 2506.07925v1 -
978 06-09 W4S4: WaLRUS Meets S4 for Long-Range Sequence Modeling W4S4: WaLRUS trifft S4 für lange Sequenzmodellierung W4S4:WALRUS在长程序列建模中与S4相匹配 2506.07920v1 -
979 06-09 Uncovering the Functional Roles of Nonlinearity in Memory Enthüllen der funktionalen Rollen der Nichtlinearität im Gedächtnis 发挥记忆中非线性的作用 2506.07919v1 -
980 06-09 CausalPFN: Amortized Causal Effect Estimation via In-Context Learning CausalPFN: Amortisierte Abschätzung des Kausaleffekts durch In-Context Learning CausalPFN: 通过内文学习对因果效应的摊销估计 2506.07918v1 -
981 06-09 Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces Diffuse Alles: Multimodale Diffusionsmodelle auf willkürlichen Staatsräumen Diffuse Everything:关于任意国家空间的多模式扩散模型 2506.07903v1 -
982 06-09 FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling FunDiff: Diffusionsmodelle über Funktionsräume für physikinformierte Generative Modellierung FunDiff:物理成形生成模型的功能空间上的传播模型 2506.07902v1 -
983 06-09 MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs MEMOIR: Lebenslange Modellbearbeitung mit minimalem Überschreiben und informierter Aufbewahrung für LLMs MEMOIR: 长寿命模型编辑,以最小覆盖和知情保留LLM 2506.07899v1 -
984 06-09 GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution GaussianVAE: Adaptive Lerndynamik von 3D Gaussianern für High-Fidelity Super-Resolution GaussianVAE: 用于高保真超分辨率的3D高斯自适应学习动态 2506.07897v1 -
985 06-09 Learning to Stop: Deep Learning for Mean Field Optimal Stopping Stoppen lernen: Deep Learning für das mittlere Feld Optimales Stoppen 学习到停止:为平均字段进行深度学习 2410.08850v2 -
986 06-09 SMRS: advocating a unified reporting standard for surrogate models in the artificial intelligence era SMRS: Für einen einheitlichen Berichterstattungsstandard für Surrogatmodelle im Zeitalter der künstlichen Intelligenz SMRS:倡导人工智能时代代用模型的统一报告标准 2502.06753v2 -
987 06-09 SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark SoK: Datenrekonstruktion greift gegen Modelle des maschinellen Lernens an: Definition, Metrics und Benchmark SoK:对机器学习模式进行数据重建攻击:定义、计量和基准 2506.07888v1 -
988 06-09 SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection SimLTD: Einfache, überwachte und halbüberwachte Langzeitobjekterkennung SimLTD: 简单监视和半监视长途远航物体探测 2412.20047v3 -
989 06-09 Schauder Bases for $C[0, 1]$ Using ReLU, Softplus and Two Sigmoidal Functions Schauder-Basen für $C[0, 1]$ mit ReLU, Softplus und zwei Sigmoidal-Funktionen 使用 ReLU、Softplus 和两个 Sigmoid 型函数的 $C[0, 1]$ 的 Schauder 基 2506.07884v1 -
990 06-09 Diffusion Counterfactual Generation with Semantic Abduction Diffusion Gegenfaktische Generation mit semantischer Abduktion 以语义绑架方式传播反实际生成 2506.07883v1 -
991 06-09 On Hypothesis Transfer Learning of Functional Linear Models Über Hypothesentransfer Lernen von funktionalen Linearmodellen 关于职能线性模型学习的假设转移 2206.04277v5 -
992 06-09 GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity GradSkip: Kommunikationsbeschleunigte lokale Gradientenmethoden mit besserer Computational Complexity GradSkip:通讯加速的地方渐进法,具有更好的计算复杂度 2210.16402v3 -
993 06-09 SIFBench: An Extensive Benchmark for Fatigue Analysis SIFBench: Ein umfassender Benchmark für die Ermüdungsanalyse SIFBench:关于 “ 致命分析 “ 的广泛基准 2506.01173v2 -
994 06-09 Algorithmic Aspects of Strategic Trading Algorithmische Aspekte des strategischen Handels 战略贸易的算法方面 2502.07606v2 -
995 06-09 TS-RSR: A provably efficient approach for batch Bayesian Optimization TS-RSR: Ein nachweislich effizienter Ansatz für die Batch Bayesian Optimierung TS-RSR:对Bayesian的批量最佳化采取可衡量有效办法 2403.04764v4 -
996 06-09 Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models? Können Hessian-Based Insights Fehlerdiagnosen in aufmerksamkeitsbasierten Modellen unterstützen? 以海珊为基地的洞察能支持以关注为基础的模型中的过失诊断吗? 2506.07871v1 -
997 06-09 Accurate INT8 Training Through Dynamic Block-Level Fallback Genaues INT8-Training durch dynamischen Block-Level-Fallback 通过动态区档级后退进行准确的INT8培训 2503.08040v3 -
998 06-09 FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity FreeGave: 3D Physik Lernen aus dynamischen Videos von Gaussian Velocity FreeGave: Gaussian 速度从动态视频学习 3D物理 2506.07865v1 -
999 06-09 Lightweight Sequential Transformers for Blood Glucose Level Prediction in Type-1 Diabetes Leichte Sequentielle Transformer für Blutglukose-Level-Vorhersage bei Typ-1-Diabetes 一型糖尿病血凝胶水平预测的轻量量 轻序式变压器 2506.07864v1 -
1000 06-09 VIVAT: Virtuous Improving VAE Training through Artifact Mitigation VIVAT: Tugendhafte Verbesserung der VAE-Ausbildung durch Artefakt-Minderung VIVAT:通过减少非手工行为,提高VAE培训的道德水平 2506.07863v1 -
1001 06-09 PropEnc: A Property Encoder for Graph Neural Networks PropEnc: Ein Eigenschafts-Encoder für Graph-Neural-Netzwerke PropEnc: 图形神经网络的属性编码器 2409.11554v3 -
1002 06-09 Fairness Overfitting in Machine Learning: An Information-Theoretic Perspective Fairness Overfitting in Machine Learning: Eine informationstheoretische Perspektive 在机器学习中过分适应公平:信息理论视角 2506.07861v1 -
1003 06-09 LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds LogoSP: Lokale-globale Gruppierung von Superpoints für unbeaufsichtigte semantische Segmentierung von 3D-Punkt-Clouds LogoSP: 3D点云无监督的语义分割超点地方-全球组别 2506.07857v1 -
1004 06-09 Residual Reweighted Conformal Prediction for Graph Neural Networks Residual Reweighted Conformal Prediction for Graph Neural Networks 图形神经网络的重新加权常规预测 2506.07854v1 -
1005 06-09 Conditional Local Independence Testing with Application to Dynamic Causal Discovery Conditional Local Independence Testing mit Anwendung auf Dynamische Kausal-Discovery 有条件的局部独立测试,应用到动态因子发现 2506.07844v1 -
1006 06-09 Jarzynski Reweighting and Sampling Dynamics for Training Energy-Based Models: Theoretical Analysis of Different Transition Kernels Jarzynski Umgewichtungs- und Probenahmedynamiken für das Training energiebasierter Modelle: Theoretische Analyse verschiedener Übergangskerne Jarzynski 培训以能源为基础的模型的加权和抽样动态:对不同过渡核心的理论分析 2506.07843v1 -
1007 06-09 FedX: Adaptive Model Decomposition and Quantization for IoT Federated Learning FedX: Adaptive Modellzersetzung und Quantisierung für IoT-Federated Learning FedX:物联网联邦学习的适应性模型分解和量化 2504.12849v3 -
1008 06-09 A Cognac Shot To Forget Bad Memories: Corrective Unlearning for Graph Neural Networks Ein Cognac, um schlechte Erinnerungen zu vergessen: Korrektives Lernen für Graphen-Neural-Netzwerke 为忘记坏记忆而拍摄的科格纳克白兰地:为图形神经网络进行矫正性学习 2412.00789v4 -
1009 06-09 C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition C3T: Grenzüberschreitender Transfer durch Zeit für sensorgestützte menschliche Aktivitätserkennung C3T: 以传感器为基础的人类活动识别跨时间跨模式转让 2407.16803v3 -
1010 06-09 Efficient and Generalized end-to-end Autonomous Driving System with Latent Deep Reinforcement Learning and Demonstrations Effizientes und generalisiertes End-to-End Autonomes Fahrsystem mit Latent Deep Reinforcement Learning und Demonstrationen 高效和普遍通用的端到端自动驾驶系统,配有深深层深强化学习和示范 2401.11792v7 -
1011 06-09 Beyond Numeric Rewards: In-Context Dueling Bandits with LLM Agents Über Numerische Belohnungen hinaus: In-Context-Duellbanditen mit LLM-Agenten 除数值奖励之外:与LLM代理商进行文中评分的强盗 2407.01887v4 -
1012 06-09 Improving large language models with concept-aware fine-tuning Verbesserung großer Sprachmodelle mit konzeptbewusster Feinsteuerung 改进大语言模式,进行概念意识微调 2506.07833v1 -
1013 06-09 Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information Dezentralisierung des Mehr-Agenten-Verstärkungs-Lernens mit zeitlichen Kausalinformationen 利用时间原因信息分散多机构强化学习 2506.07829v1 -
1014 06-09 R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation R3D2: Realistische 3D Asset Insertion über Diffusion für autonome Fahrsimulation R3D2:现实的 3D 资产通过传播插入资产,用于自动驾驶模拟 2506.07826v1 -
1015 06-09 Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation Beschleunigung von Diffusionsmodellen in Offline-RL über die Reward-Aware-Konsistenz-Trajektoriendestillation 通过奖励-软件兼容性轨迹蒸馏在离线 RL 中加速扩散模型 2506.07822v1 -
1016 06-09 Accelerating Constrained Sampling: A Large Deviations Approach Beschränkte Probenahme beschleunigen: Ein Ansatz über große Abweichungen 加速受控抽样:大偏离方法 2506.07816v1 -
1017 06-09 A weighted quantum ensemble of homogeneous quantum classifiers Ein gewichtetes Quantenensemble homogener Quantenklassifikatoren 一份加权量量的加权总量,由同质量分类器组成 2506.07810v1 -
1018 06-09 Identifiable Object Representations under Spatial Ambiguities Identifizierbare Objektdarstellungen unter räumlichen Ambiguitäten 空间模糊度下可识别物体的表示 2506.07806v1 -
1019 06-09 Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability Verbesserung der adversarialen Robustheit mit konformer Vorhersage: Ein Rahmen für eine garantierte Model-Zuverlässigkeit 增强以非正式预测增强反向能力:保证模型可靠性框架 2506.07804v1 -
1020 06-09 Fine-grained Hierarchical Crop Type Classification from Integrated Hyperspectral EnMAP Data and Multispectral Sentinel-2 Time Series: A Large-scale Dataset and Dual-stream Transformer Method Feinkörnige Hierarchische Sortenklassifikation aus integrierten Hyperspektralen EnMAP-Daten und Multispektralen Sentinel-2 Zeitreihen: Eine großflächige Datensatz- und Dual-Stream-Transformer-Methode 从综合超高光谱 EnMAP 数据和多谱哨兵-2时间序列:大型数据集和双流变异器方法中精细测的等级级作物类型分类 2506.06155v2 -
1021 06-09 Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness Widersacher mit Anreizen: Eine strategische Alternative zur widerspenstigen Robustheit 具有奖励措施的逆向:反反向强力战略替代方案 2406.11458v3 -
1022 06-09 MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification MultiMatch: Multihead-Konsistenzregularisierung passend zur semi-überwachten Textklassifikation 多匹配: 用于半有效文本分类的多标题一致性规则化 2506.07801v1 -
1023 06-09 Generalized Interpolating Discrete Diffusion Generalisierte interpolierende diskrete Diffusion 广义插值离散扩散 2503.04482v2 -
1024 06-09 LLM Unlearning Should Be Form-Independent LLM Unlearning sollte formunabhängig sein LLM 重新学习应是独立的形式 2506.07795v1 -
1025 06-09 High-Dimensional Learning in Finance High-Dimensional Learning in Finance 金融领域高层次学习 2506.03780v2 -
1026 06-09 Text-to-LoRA: Instant Transformer Adaption Text-zu-LoRA: Instant Transformer Adaption 文本到LORA:即时变换器适应 2506.06105v2 -
1027 06-09 Detecting Out-of-Distribution Objects through Class-Conditioned Inpainting Nicht-Verteilungsobjekte durch klassenkonditionierte Inpainting erkennen 通过按类别修饰油漆检测分销物体 2402.03292v3 -
1028 06-09 Trial and Trust: Addressing Byzantine Attacks with Comprehensive Defense Strategy Prozess und Vertrauen: Byzantinische Angriffe mit umfassender Verteidigungsstrategie angehen 审判和信任:以全面防卫战略对付拜占庭袭击 2505.07614v2 -
1029 06-09 Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger Re-Ranking Reasoning Kontext mit Baumsuche macht große Vision-Sprache Modelle stärker 以树木搜索为重新排列的原因说明背景,使大型视觉语言模型更强大 2506.07785v1 -
1030 06-09 Is poisoning a real threat to LLM alignment? Maybe more so than you think Ist Vergiftung eine echte Bedrohung für LLM Ausrichtung? Vielleicht mehr, als Sie denken 中毒是否真正威胁到LLM对齐?也许比你想的更严重 2406.12091v4 -
1031 06-09 WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection WWAggr: Ein Fenster Wasserstein-basierte Aggregation für Ensemble Change Point Detection WWAggr:一个基于窗口的瓦西施泰因聚合群,用于聚合变化点探测 2506.08066v1 -
1032 06-09 Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics Die Grenzen der Low-Bit-Optimierer drücken: Fokus auf EMA-Dynamik 推动低比最佳优化因素的极限:注重EMA动态 2505.00347v2 -
1033 06-09 Trend-Aware Fashion Recommendation with Visual Segmentation and Semantic Similarity Trend-Aware Fashion Empfehlung mit Visueller Segmentierung und semantischer Ähnlichkeit 具有视觉分解和语义相似性的趋势软件时装建议 2506.07773v1 -
1034 06-09 Diffusion Models-Aided Uplink Channel Estimation for RIS-Assisted Systems Diffusion Models-Aided Uplink Channel Abschätzung für RIS-Assistente Systeme RIS辅助系统传播模型辅助上链接频道估计 2506.07770v1 -
1035 06-09 Clustered Federated Learning via Embedding Distributions Clustered Federated Learning durch Einbettung von Distributionen 通过嵌入式分发分发方式进行分组联邦学习 2506.07769v1 -
1036 06-09 Quickest Causal Change Point Detection by Adaptive Intervention Schnellste ursächliche Punktänderungserkennung durch adaptive Intervention 通过适应干预措施快速发现因果变化点 2506.07760v1 -
1037 06-09 Agent Semantics, Semantic Spacetime, and Graphical Reasoning Agent Semantics, Semantische Raumzeit und Graphische Vernunft 语义学、语义空间时间和图形解释 2506.07756v1 -
1038 06-09 Comparing Credit Risk Estimates in the Gen-AI Era Vergleich der Schätzungen des Kreditrisikos in der Ära der Gen-AI 比较GEN-AI时代的信用风险估算 2506.07754v1 -
1039 06-09 An Overview of the Burer-Monteiro Method for Certifiable Robot Perception Ein Überblick über die Burer-Monteiro-Methode für zertifizierbare Roboterwahrnehmung 关于可核证的机器人感知的Burer-Monteiro方法概览 2410.00117v2 -
1040 06-09 DualCast: A Model to Disentangle Aperiodic Events from Traffic Series DualCast: Ein Modell, um Aperiodische Ereignisse aus der Verkehrsreihe zu entwirren DualCast: 将周期性事件与交通系列分开的模型 2411.18286v2 -
1041 06-09 Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning Graph-Assistente Stiche für Offline-Hierarchisches Verstärkungslernen 离线高层强化学习的图表辅助细化 2506.07744v1 -
1042 06-09 An end-to-end attention-based approach for learning on graphs Ein durchgängiger, auf der Aufmerksamkeit basierender Ansatz für das Lernen auf Diagrammen 以端对端关注为基础的图表学习方法 2402.10793v3 -
1043 06-09 When, Where and Why to Average Weights? Wann, wo und warum zu Durchschnittsgewichten? 何时、何地和为何平均体重? 2502.06761v2 -
1044 06-09 Language Embedding Meets Dynamic Graph: A New Exploration for Neural Architecture Representation Learning Spracheinbettung trifft auf Dynamische Grafik: Eine neue Exploration für neurale Architektur-Repräsentations-Lernen 语言嵌入与动态图表相匹配:神经结构代表制学习的新探索 2506.07735v1 -
1045 06-09 Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild Robuste 3D-Form-Rekonstruktion in Zero-Shot aus einem einzigen Bild in der Wildnis 从野生的单一图像中进行零热零位的强力 3D 形状重建 2403.14539v3 -
1046 06-09 Profiling Electric Vehicles via Early Charging Voltage Patterns Profilieren von Elektrofahrzeugen über frühe Ladespannungsmuster 通过早期充电压模式分析电动车辆 2506.07714v1 -
1047 06-09 Representation Bending for Large Language Model Safety Darstellungsbiegen für große Sprachmodellsicherheit 大语文示范语文安全示范语文代表名单 2504.01550v2 -
1048 06-09 Statistical Taylor Expansion Statistische Taylor-Erweiterung 扩大泰勒统计 2410.01223v8 -
1049 06-09 Evaluating Robustness in Latent Diffusion Models via Embedding Level Augmentation Bewertung der Robustheit in latenten Diffusionsmodellen durch Einbetten von Level Augmentation 通过嵌入级递增评估前流传播模型的强力 2506.07706v1 -
1050 06-09 Towards a Small Language Model Lifecycle Framework Auf dem Weg zu einem Rahmen für den Lebenszyklus eines kleinen Sprachmodells 建立一个小型语言模拟生命周期框架 2506.07695v1 -
1051 06-09 Training Superior Sparse Autoencoders for Instruct Models Training Superior Sparse Autoencoder für Instruct-Modelle 为指令模型训练更优的稀疏自编码器 2506.07691v1 -
1052 06-09 Rao-Blackwellised Reparameterisation Gradients Rao-Blackwellised Reparameterisation Gradienten Rao-黑-黑- 恢复度梯度 2506.07687v1 -
1053 06-09 On Fitting Flow Models with Large Sinkhorn Couplings Auf Einbau-Flow-Modelle mit großen Sinkhorn-Kupplungen 配有大型Sinkhorn组合体的 “ 适合流程模型 “ 2506.05526v2 -
1054 06-09 Forests for Differences: Robust Causal Inference Beyond Parametric DiD Wälder für Unterschiede: Robuste Kausalableitung jenseits parametrischer DiD 差异的森林:超出参数二D的强力因果推断 2505.09706v2 -
1055 06-09 Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models Unlösbare Problemerkennung: Robuste Bewertung des Verständnisses für große multimodale Modelle 无法解决的问题探测:大型多式联运模型的强有力理解评估 2403.20331v4 -
1056 06-09 How Benchmark Prediction from Fewer Data Misses the Mark Wie Benchmark-Voraussagen von weniger Daten das Zeichen vermissen 从更少数据中的基准预测如何错过标记 2506.07673v1 -
1057 06-09 Data-driven inventory management for new products: An adjusted Dyna-$Q$ approach with transfer learning Datengesteuertes Bestandsmanagement für neue Produkte: Ein angepasster Dyna-$Q$-Ansatz mit Transfer-Lernen 新产品数据驱动的库存管理:采用迁移学习的经调整的Dyna-$Q$办法 2501.08109v4 -
1058 06-09 ProARD: progressive adversarial robustness distillation: provide wide range of robust students ProARD: progressive konversarische Robustheitsdestillation: für eine breite Palette robuster Studenten ProARD: 进步对抗性强力蒸馏:提供各种强健学生 2506.07666v1 -
1059 06-09 The Universality Lens: Why Even Highly Over-Parametrized Models Learn Well Die Universalitätslinse: Warum selbst hoch überparametrisierte Modelle gut lernen 普世透镜:为什么甚至高超美化模型学习良好 2506.07661v1 -
1060 06-09 Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks Bewertung von Dialect Fairness und Robustheit von großen Sprachmodellen in mit Gründen versehenen Aufgaben 评估大语言模式在说明理由任务中的公平性和强度 2410.11005v3 -
1061 06-09 Feature Statistics with Uncertainty Help Adversarial Robustness Feature Statistik mit Unsicherheit Hilfe Widersprüchliche Robustheit 具有不确定性的特征统计 2503.20583v2 -
1062 06-09 Trajectory World Models for Heterogeneous Environments Trajektoriale Weltmodelle für heterogene Umgebungen 不同不同环境世界模型 2502.01366v2 -
1063 06-09 Training Articulatory Inversion Models for Interspeaker Consistency Training Artikulatorische Inversionsmodelle für die Konsistenz von Interspeakern 跨口间连通性人工转换模型培训 2505.20529v3 -
1064 06-09 HieraEdgeNet: A Multi-Scale Edge-Enhanced Framework for Automated Pollen Recognition HieraEdgeNet: Multi-Scale Edge-Enhanced Framework für automatisierte Pollenerkennung HieraEdgeNet:一个多层边缘强化自动承认花粉框架 2506.07637v1 -
1065 06-09 RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors RAID: Ein Datensatz für die Prüfung der adversarialen Robustheit von KI-generierten Bilddetektoren 用于测试AI-光学图像探测器的反向强力的数据集 2506.03988v3 -
1066 06-09 Can Perplexity Predict Fine-tuning Performance? An Investigation of Tokenization Effects on Sequential Language Models for Nepali Kann Perplexity Fine-Tuning Performance voraussagen? Eine Untersuchung von Tokenization-Effekten auf sequentielle Sprachmodelle für Nepali 重复性预测微调性能能否? 对尼泊尔的序列语言模式的收缩性影响调查 2404.18071v2 -
1067 06-09 AI-based Framework for Robust Model-Based Connector Mating in Robotic Wire Harness Installation KI-basiertes Framework für robuste modellbasierte Steckverbinder-Matierung in der Roboterdraht-Harness-Installation 以AI为基础的机器人电线装置中强有力的模型连接器配对框架 2503.09409v2 -
1068 06-09 Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range Tasks Rückkehr von ChebNet: Verständnis und Verbesserung eines überblickten GNN bei langfristigen Aufgaben ChebNet的返回:了解和改进对长期任务GNN的忽视 2506.07624v1 -
1069 06-09 LoRMA: Low-Rank Multiplicative Adaptation for LLMs LoRMA: Multiplikative Anpassung mit niedrigem Rank für LLMs LoRMA:LLMML的低Rank多复制性适应 2506.07621v1 -
1070 06-09 Retrieval-Augmented Generation as Noisy In-Context Learning: A Unified Theory and Risk Bounds Retrieval-Augmented Generation as Noisy In-Context Learning: Eine einheitliche Theorie und Risikogrenzen 作为有噪音的文内学习:统一理论和风险波 2506.03100v3 -
1071 06-09 The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning Der Catechol-Benchmark: Zeitreihen-Lösungsauswahldaten für wenige Aufnahmen maschinelles Lernen Catechol基准:几发机器学习的时间序列溶剂选择数据 2506.07619v1 -
1072 06-09 FuXi-Air: Urban Air Quality Forecasting Based on Emission-Meteorology-Pollutant multimodal Machine Learning FuXi-Air: Vorhersage der städtischen Luftqualität auf der Grundlage von Emissions-Meteorologie-Pollutant multimodalen maschinellen Lernens FuXi-Air:基于排放-气象-污染-多式联运机学习的城市空气质量预测 2506.07616v1 -
1073 06-09 Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds Poisson Midpoint-Methode für Log Concave Sampling: Jenseits der starken Fehler unteren Bounds 日志集中取样的 Poisson 中点方法: 超越强误差, 下界 2506.07614v1 -
1074 06-09 APE: Selective Fine-tuning with Acceptance Criteria for Language Model Adaptation APE: Selektive Feinabstimmung mit Annahmekriterien für Sprachmodellanpassung APE:有选择地调整语言模式适应接受标准 2505.19912v2 -
1075 06-09 Link Prediction with Relational Hypergraphs Link Vorhersage mit relationalen Hypergraphen 将预测与关系电报联系起来 2402.04062v3 -
1076 06-09 SWAG: Long-term Surgical Workflow Prediction with Generative-based Anticipation SWAG: Langzeit-Surgical Workflow Prediction mit generativer Antizipation SWAG: 长期外科工作流动预测,以产生为基础预测 2412.18849v3 -
1077 06-09 Tree-Sliced Wasserstein Distance with Nonlinear Projection Baumgeschliffene Wasserstein-Distanz mit nichtlinearer Projektion 用非线性投影绘制的树切瓦西尔斯坦距离 2505.00968v2 -
1078 06-09 Learning from Double Positive and Unlabeled Data for Potential-Customer Identification Lernen aus doppelt positiven und unmarkierten Daten für die potenzielle Kundenidentifikation 学习用于潜在海关识别的双正和无标签数据 2506.00436v2 -
1079 06-09 How Expressive are Knowledge Graph Foundation Models? Wie ausdrucksstark sind Wissensgraphen-Stiftungsmodelle? 知识图基础模型的表达性如何? 2502.13339v2 -
1080 06-09 Revisiting Non-Acyclic GFlowNets in Discrete Environments Nicht-azyklische GFlowNets in diskreten Umgebungen erneut besuchen 重新审视环境混乱中的非周期性GFlowNets 2502.07735v2 -
1081 06-09 TwinBreak: Jailbreaking LLM Security Alignments based on Twin Prompts TwinBreak: Jailbreaking LLM Security Alignments basierend auf Twin Prompts 双突破:基于双提示的监狱破碎LLM安保调整 2506.07596v1 -
1082 06-09 Tree-Sliced Wasserstein Distance: A Geometric Perspective Baumgeschliffene Wasserstein-Distanz: Eine geometrische Perspektive 树切瓦西尔斯坦距离:几何视角 2406.13725v3 -
1083 06-09 Exploiting Curvature in Online Convex Optimization with Delayed Feedback Ausnutzen von Krümmung in Online Convex-Optimierung mit verzögertem Feedback 在带延迟反馈的在线凸优化中利用曲率 2506.07595v1 -
1084 06-09 Lite-RVFL: A Lightweight Random Vector Functional-Link Neural Network for Learning Under Concept Drift Lite-RVFL: Ein leichter Zufallsvektor Funktions-Link Neuronales Netzwerk zum Lernen unter Konzept Drift Lite-RVFL: 轻量随机随机矢量功能链接神经网络,用于在概念驱动下学习 2506.08063v1 -
1085 06-09 FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning FairDICE: Fairness-getriebenes Offline-Multi-Ziel-Verstärkungs-Lernen FairDICE: 公平发展离线多目标强化学习 2506.08062v1 -
1086 06-09 Explore the vulnerability of black-box models via diffusion models Entdecken Sie die Schwachstelle von Black-Box-Modellen über Diffusionsmodelle 通过推广模式探索黑盒模式的脆弱性 2506.07590v1 -
1087 06-09 Attention with Trained Embeddings Provably Selects Important Tokens Aufmerksamkeit bei trainierten Einbettungen wählt wahrscheinlich wichtige Token aus 与经过训练的嵌入器的关注 2505.17282v2 -
1088 06-09 PrunePEFT: Iterative Hybrid Pruning for Parameter-Efficient Fine-tuning of LLMs PrunePEFT: Iterative Hybride Pruning für eine parametereffiziente Feinabstimmung von LLMs PrunePEFT: LLMs 参数有效微调过渡混合节制 2506.07587v1 -
1089 06-09 Theoretical Limitations of Ensembles in the Age of Overparameterization Theoretische Einschränkungen von Ensembles im Zeitalter der Überparameterisierung 过分参数化时代集合群的理论局限性 2410.16201v2 -
1090 06-09 Aircraft Trajectory Dataset Augmentation in Latent Space Flugbahndatensatz Augmentation im Latent Space 低空空间轨迹数据集增强 2506.07585v1 -
1091 06-09 FedCGD: Collective Gradient Divergence Optimized Scheduling for Wireless Federated Learning FedCGD: Kollektive Gradient Divergenz optimierte Planung für drahtloses Federated Learning FedCGD:无线联邦学习集体渐进变异优化计划 2506.07581v1 -
1092 06-09 HSF: Defending against Jailbreak Attacks with Hidden State Filtering HSF: Verteidigung gegen Jailbreak Attacken mit versteckten Zustand Filterung HSF: 防范以隐蔽国家过滤手段侵入监狱的袭击 2409.03788v2 -
1093 06-09 Denoising the Future: Top-p Distributions for Moving Through Time Denoising the Future: Top-P-Distributionen für die Zeit 摒弃未来:跨越时空的顶级发行品 2506.07578v1 -
1094 06-09 Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models Uncertainty-o: Ein Modell-agnostisches Framework zur Enthüllung von Uncertainty in großen multimodalen Modellen 不确定性 – – o:大型多式联运模型中不确定性的不确定性稳定的一个示范 – – 不可知框架 2506.07575v1 -
1095 06-09 Attacking Attention of Foundation Models Disrupts Downstream Tasks Angriff auf die Aufmerksamkeit von Stiftungsmodellen bricht nachgelagerte Aufgaben aus 攻击基金会模型的注意 破坏下流任务 2506.05394v2 -
1096 06-09 AI Scientists Fail Without Strong Implementation Capability KI-Wissenschaftler scheitern ohne starke Umsetzungsfähigkeit AI 缺乏强有力的执行能力的科学家失败 2506.01372v2 -
1097 06-09 Complex Physics-Informed Neural Network Komplexes physik-informiertes Neuronales Netzwerk 复杂物理内化神经网络 2502.04917v2 -
1098 06-09 A Simplifying and Learnable Graph Convolutional Attention Network for Unsupervised Knowledge Graphs Alignment Ein einfacheres und erlernbares Graph Convolutional Attention Network für unüberwachtes Wissen Graphen Alignment 一个用于不受监督知识图表对齐的简化和可学习的图变注意网络 2410.13263v2 -
1099 06-09 AQUATIC-Diff: Additive Quantization for Truly Tiny Compressed Diffusion Models AQUATIC-Diff: Additive Quantisierung für wirklich kleine komprimierte Diffusionsmodelle AQUATIC-Diff: 真正微缩压缩扩散模型补充量化 2506.05960v2 -
1100 06-09 Improving Memory Efficiency for Training KANs via Meta Learning Verbesserung der Speichereffizienz für die Schulung von KANs über Meta Learning 通过Meta学习提高培训KANs的记忆效率 2506.07549v1 -
1101 06-09 Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques Eliciting Fine-Tuned Transformer Fähigkeiten durch Inferenz-Zeit-Techniken 通过推断-时间技术的精密导精度变形器能力 2506.08060v1 -
1102 06-09 Output-Constrained Decision Trees Output-beschränkte Entscheidungsbäume 受产出约束的决策树 2405.15314v3 -
1103 06-09 Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling Heartcare Suite: Multidimensionales Verständnis von EKG mit Raw Multi-Lead Signal Modeling 心脏护理套套件:以原始多牵头信号建模对ECG的多层面理解 2506.05831v2 -
1104 06-09 Flowing Datasets with Wasserstein over Wasserstein Gradient Flows Flowing Datasets mit Wasserstein über Wasserstein Gradient Flows 与瓦森斯坦对瓦森斯坦梯流的流数据集 2506.07534v1 -
1105 06-09 Learning What Reinforcement Learning Can’t: Interleaved Online Fine-Tuning for Hardest Questions Lernen, was Verstärktes Lernen nicht kann: Interleaved Online Fine-Tuning für härteste Fragen 学习强化学习无法做到的东西:针对最难问题的交错式在线微调 2506.07527v1 -
1106 06-09 Test-time Correlation Alignment Prüfzeitkorrelation Ausrichtung 测试时关联对齐 2505.00533v2 -
1107 06-09 Towards Achieving Perfect Multimodal Alignment Auf dem Weg zu einer perfekten multimodalen Ausrichtung 实现完美多式联运协调 2503.15352v2 -
1108 06-09 Addressing Correlated Latent Exogenous Variables in Debiased Recommender Systems Adressierung korrelierter latenter exogener Variablen in debiased Recommender Systems 处理受损害建议系统中与气候相关的低端外部源变量 2506.07517v1 -
1109 06-09 SecFormer: Fast and Accurate Privacy-Preserving Inference for Transformer Models via SMPC SecFormer: Schnelle und genaue Datenschutz-Bewahrung von Schlussfolgerungen für Transformer-Modelle über SMPC SecFormer:通过SMPC为变形模型快速和准确的隐私保护推断 2401.00793v5 -
1110 06-09 Beyond Closure Models: Learning Chaotic-Systems via Physics-Informed Neural Operators Beyond Closure Models: Chaotische Systeme über physikinformierte Neuraloperatoren lernen 超越封闭模型:通过物理内成形神经操作员学习毛质系统 2408.05177v4 -
1111 06-09 Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models Eliciting In-Context Retrieval und Reasoning für Lang-Kontext Große Sprachmodelle 长式长式大语言模型的全文检索和理由 2501.08248v3 -
1112 06-09 Reinforcement Learning via Implicit Imitation Guidance Verstärkung Lernen über Implizite Imitation Anleitung 通过隐含模拟指导加强学习 2506.07505v1 -
1113 06-09 CaliciBoost: Performance-Driven Evaluation of Molecular Representations for Caco-2 Permeability Prediction CaliciBoost: Performance-getriebene Auswertung von molekularen Darstellungen für Caco-2 Permeabilitätsvorhersage CaliciBoost:对Caco-2 渗透性预测的分子代表量进行实绩-驱动评价 2506.08059v1 -
1114 06-09 BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction BatteryLife: Ein umfassender Datensatz und Benchmark für die Vorhersage der Akkulaufzeit 电池寿命:电池寿命预测综合数据集和基准 2502.18807v6 -
1115 06-09 Scalable Sobolev IPM for Probability Measures on a Graph Skalierbares Sobolev IPM für Wahrscheinlichkeitsmaße auf einem Graphen 用于图上概率测度的可扩展 Sobolev IPM 2502.00737v2 -
1116 06-09 Graph-of-Causal Evolution: Challenging Chain-of-Model for Reasoning Graph-of-Causal Evolution: Herausfordernde Modellkette für die Vernunft Causal 进化图:有挑战性的理由模型链 2506.07501v1 -
1117 06-09 Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks Mind the Gap: Entfernen der Diskretisierungslücke in differenzierbaren Logic Gate-Netzwerken 牢记差距:消除不同逻辑门网的分化差距 2506.07500v1 -
1118 06-09 Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization Low-Dimension-to-High-Dimension Verallgemeinerung und ihre Implikationen für Länge Verallgemeinerung 低差异至高差异通用化及其对延长通用化的影响 2410.08898v2 -
1119 06-09 Explicit Preference Optimization: No Need for an Implicit Reward Model Explizite Preference-Optimierung: Keine Notwendigkeit für ein Implizites Prämienmodell 明确的优惠最优化:没有必要采用隐性奖赏模式 2506.07492v1 -
1120 06-09 Binary Classifier Optimization for Large Language Model Alignment Binärklassifikator-Optimierung für großsprachliche Modellausrichtung 大语言模型对齐的二进制分类器优化 2404.04656v2 -
1121 06-09 CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model für die Kontextoptimierung CoCoA-Mix:环境优化组合模型 2506.07484v1 -
1122 06-09 Premise Selection for a Lean Hammer Premise Auswahl für einen Lean Hammer 精精精锤预选 2506.07477v1 -
1123 06-09 FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models FRED: Flexible Reduction-Distribution Interconnect und Communication Implementation für Wafer-Scale Distributed Training von DNN-Modellen FRED: DNN模型的宽度分布式分散式培训灵活弹性吸收-分散式互联和通信实施 2406.19580v2 -
1124 06-09 Deep Ridgelet Transform and Unified Universality Theorem for Deep and Shallow Joint-Group-Equivariant Machines Deep Ridgelet Transform und Unified Universality Theorem für tiefe und flache Gelenk-Gruppe-Equivariante Maschinen 深脊变异和统一通用深海和浅水联合小组-平等合用机器理论 2405.13682v5 -
1125 06-09 Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models Verfolgung beweglicher Ziele mit Online-Selbstspiel-Verstärkung Lernen für sicherere Sprachmodelle 利用在线加强自身能力学习,建立更安全语言模式,以追踪移动目标 2506.07468v1 -
1126 06-09 Physics-Informed Deep Learning Model for Line-integral Diagnostics Across Fusion Devices Physik-informiertes Deep-Learning-Modell für linienintegrierte Diagnosen über Fusionsgeräte 横穿融合装置的直线整体诊断法深学习模型 2412.00087v3 -
1127 06-09 Circumventing Backdoor Space via Weight Symmetry Umgehung des Backdoor-Raums über die Gewichtssymmetrie 通过重量对称法环绕后门空间 2506.07467v1 -
1128 06-09 Parameter-Efficient Fine-Tuning of State Space Models Parameter-Effizient Feintuning von State Space Models 国家空间模型有效精美设计 2410.09016v3 -
1129 06-09 KScope: A Framework for Characterizing the Knowledge Status of Language Models KScope: Ein Rahmen für die Charakterisierung des Wissensstatus von Sprachmodellen KScope: 语言模式知识状况描述框架 2506.07458v1 -
1130 06-09 When Style Breaks Safety: Defending Language Models Against Superficial Style Alignment Wenn Stil die Sicherheit bricht: Verteidigung von Sprachmodellen gegen oberflächliche Stilausrichtung 当样式打破安全时: 防御语言模型, 对抗表面样式对齐 2506.07452v1 -
1131 06-09 Extending Epistemic Uncertainty Beyond Parameters Would Assist in Designing Reliable LLMs Die Erweiterung der epistemischen Unsicherheit über Parameter hinaus würde helfen, zuverlässige LLMs zu konzipieren 超越参数的不确定性延伸有助于设计可靠的长效磁环 2506.07448v1 -
1132 06-09 One Fits All: Learning Fair Graph Neural Networks for Various Sensitive Attributes Eins passt auf alle: Faire Graphen-Neural-Netzwerke für verschiedene Sensitive Attribute lernen ” 一个适合所有人 “ :学习各种敏感属性的公平图表神经网络 2406.13544v3 -
1133 06-09 Resilience of Rademacher chaos of low degree Resilienz von Rademacher Chaos von niedrigem Grad Rademacher低程度混乱的复原能力 2402.10504v5 -
1134 06-09 Position: We Need Responsible, Application-Driven (RAD) AI Research Position: Wir brauchen verantwortungsbewusste, anwendungsgetriebene (RAD) KI-Forschung 立场:我们需要负责任、应用驱动(RAD)AI研究 2505.04104v2 -
1135 06-09 Straight-Line Diffusion Model for Efficient 3D Molecular Generation Geradliniges Diffusionsmodell für effiziente 3D-Molekulargeneration 高效 3D 分子生成的直线扩散模型 2503.02918v2 -
1136 06-09 Federated In-Context Learning: Iterative Refinement for Improved Answer Quality Föderiertes In-Context-Lernen: Iterative Veredelung für verbesserte Antwortqualität 联邦内部学习:改进答复质量的过渡性改进 2506.07440v1 -
1137 06-09 Fast Geometric Embedding for Node Influence Maximization Schnelle geometrische Einbettung für Knotenbeeinflussung Maximierung 节点影响最大化快速几何嵌入 2506.07435v1 -
1138 06-09 Cartridges: Lightweight and general-purpose long context representations via self-study Patronen: Leichte und universelle lange Kontextdarstellungen durch Selbststudium Cartridges:轻量和一般用途长背景介绍,通过自学 2506.06266v2 -
1139 06-09 Multi-agent Architecture Search via Agentic Supernet Multi-Agent Architektur Suche über Agentic Supernet 通过 Agric Supernet 多剂机构建筑搜索 2502.04180v2 -
1140 06-09 FlashDMoE: Fast Distributed MoE in a Single Kernel FlashDMoE: Schnell verteiltes MoE in einem einzigen Kernel FlashDMoE: 在一个单一核心中快速分布的教育部 2506.04667v2 -
1141 06-09 State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models State-offset Tuning: State-based Parameter-Efficient Fine-Tuning für State Space Models 以国家为基础的国家空间模型参数 – – 有效精美设计 2503.03499v2 -
1142 06-09 HeTa: Relation-wise Heterogeneous Graph Foundation Attack Model HeTa: Verhältnisweises Heterogenes Graph Foundation Attack Model HeTa: 关系性异异异图形基金会攻击模型 2506.07428v1 -
1143 06-09 Power-Law Decay Loss for Large Language Model Finetuning: A Theory Perspective Macht-Rechts-Dekay Verlust für große Sprachmodell Finetuning: Eine Theorie-Perspektive 大语言模型微调的功率法减缩损失:理论视角 2505.16900v5 -
1144 06-09 DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning DOMAIN: Mildly COnservative Model-BAsed Offline Verstärkungslernen DOMAIN: 低度防腐性示范性软体外表强化学习 2309.08925v4 -
1145 06-09 IF-GUIDE: Influence Function-Guided Detoxification of LLMs IF-GUIDE: Einfluss funktionsgeführte Entgiftung von LLMs IF-GUIDE: LLMs的受控功能解毒作用影响 2506.01790v2 -
1146 06-09 Evidential Spectrum-Aware Contrastive Learning for OOD Detection in Dynamic Graphs Evidential Spectrum-Aware Kontrastives Lernen für OOD Detection in dynamischen Graphen 动态图中 OOD 探测的 证据性光谱- 软件差异性学习 2506.07417v1 -
1147 06-09 LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments LiteVLM: Eine Bildverarbeitungs-Modell-Inferenz-Pipeline mit geringer Latenz für ressourcenbeschränkte Umgebungen LiteVLM:资源受限环境的低寿命愿景-语言模型推导管道 2506.07416v1 -
1148 06-09 Variational Supervised Contrastive Learning Variationelles Überwachtes Kontrastuelles Lernen 差异监督反舞弊学习 2506.07413v1 -
1149 06-09 GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models GANQ: GPU-Adaptive nicht einheitliche Quantisierung für große Sprachmodelle GANQ: 大语言模型的GPU自适应非统一量化 2501.12956v3 -
1150 06-09 Structured Pruning for Diverse Best-of-N Reasoning Optimization Strukturierte Prunings für die unterschiedliche Best-of-N-Reasoning-Optimierung 面向多样化 Best-of-N 推理优化的结构化剪枝 2506.03978v2 -
1151 06-09 Fractional-order Jacobian Matrix Differentiation and Its Application in Artificial Neural Networks Fraktions-Order Jacobian Matrix Differenzierung und seine Anwendung in künstlichen neuralen Netzwerken Jacobian 矩阵差异及其在人造神经网络中的应用 2506.07408v1 -
1152 06-09 Anomaly Detection and Early Warning Mechanism for Intelligent Monitoring Systems in Multi-Cloud Environments Based on LLM Anomalieerkennungs- und Frühwarnmechanismus für intelligente Überwachungssysteme in Multi-Cloud-Umgebungen auf Basis von LLM 以LLM为基础在多城市环境中建立智能监测系统的异常探测和预警机制 2506.07407v1 -
1153 06-09 InverseScope: Scalable Activation Inversion for Interpreting Large Language Models InverseScope: Skalierbare Aktivierungs-Inversion zum Dolmetschen großer Sprachmodelle 反向范围: 用于解释大语言模型的可缩放活动转换 2506.07406v1 -
1154 06-09 RiemannFormer: A Framework for Attention in Curved Spaces RiemannFormer: Ein Rahmenwerk für Aufmerksamkeit in geschwungenen Räumen RiemannFormer: 曲线空间的注意框架 2506.07405v1 -
1155 06-09 Outlier-weighed Layerwise Sampling for LLM Fine-tuning Ausgewägte schichtweise Probenahme für LLM-Feinabstimmung LLM 精密调试 2405.18380v3 -
1156 06-09 Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions Numerische Untersuchung der Sequenzmodellierungstheorie mit steuerbaren Speicherfunktionen 使用可控内存函数对序列模拟理论进行数字调查 2506.05678v2 -
1157 06-09 G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems G-Memory: Hierarchischer Speicher für Multi-Agent-Systeme G-记忆:为多机构系统追踪等级记忆 2506.07398v1 -
1158 06-09 Nonparametric Modern Hopfield Models Nichtparametrische moderne Hopfield-Modelle 非参数现代跳地模型 2404.03900v2 -
1159 06-09 From Static to Adaptive Defense: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks Von der statischen zur adaptiven Verteidigung: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks 从静态到适应性防卫:联邦多要求深入强化学习-驱动移动目标防御,防止UAV Swarm网络的DoS攻击 2506.07392v1 -
1160 06-09 Protecting Deep Learning Model Copyrights with Adversarial Example-Free Reuse Detection Schutz von Deep-Learning-Modell-Urheberrechten mit zweifelhafter Beispiel-freier Wiederverwertungserkennung 保护深学习模式版权,进行反反对学性实例自由再利用探测 2407.03883v2 -
1161 06-09 Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert Lärm-Robustheit durch Lärm: Asymmetrische LoRA-Anpassung mit Giftungsexperte 噪音造成的噪音破坏:与中毒专家一起进行非对称 LoRA适应 2505.23868v3 -
1162 06-09 Moment Alignment: Unifying Gradient and Hessian Matching for Domain Generalization Moment Alignment: Vereinheitlichung von Gradienten und Hessischen Matching für Domain Generalization 时速对齐: 统一渐变和赫西相匹配域域通用化 2506.07378v1 -
1163 06-09 Adapter Naturally Serves as Decoupler for Cross-Domain Few-Shot Semantic Segmentation Adapter dient natürlich als Entkoppler für die Cross-Domain-Few-Shot Semantic Segmentierung 自然适应器作为跨域鲜热语义分割的解码器 2506.07376v1 -
1164 06-09 Numerical Analysis of HiPPO-LegS ODE for Deep State Space Models Numerische Analyse von HiPPO-LegS ODE für Deep State Space Modelle HIPPPO-LegS深海国家空间模型ODE的数值分析 2412.08595v2 -
1165 06-09 A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment Eine umfassende Umfrage in LLM(-Agent) Full Stack Sicherheit: Daten, Schulung und Bereitstellung 用LLLM(-代理)全堆安全:数据、培训和部署进行的全面调查 2504.15585v4 -
1166 06-09 Theoretical Benefit and Limitation of Diffusion Language Model Theoretischer Nutzen und Begrenzung des Diffusions-Sprachmodells 传播语言模式理论效益和限制 2502.09622v2 -
1167 06-09 MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing MoE-GPS: Guidlines für Vorhersagestrategie für dynamische Expertenduplikation in MoE Load Balancing MoE-GPS:教育部负载平衡中动态专家重叠预测战略指南 2506.07366v1 -
1168 06-09 Multiple Object Stitching for Unsupervised Representation Learning Mehrere Objektstiche für unüberwachtes Repräsentationslernen 用于无人监督的代表学习的多对象拼接 2506.07364v1 -
1169 06-09 Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding Stochastische nichtlineare Steuerung über finite-dimensionale Spektral-Dynamische Einbettung 通过有限维谱光谱动态嵌入器控制非线性控件 2304.03907v5 -
1170 06-09 Lightweight Joint Audio-Visual Deepfake Detection via Single-Stream Multi-Modal Learning Framework Leichte gemeinsame Audio-Visuelle Deepfake-Erkennung über Multi-Modal-Lernrahmen mit Single-Stream 通过单流多模态学习框架进行轻量级音视频联合深度伪造检测 2506.07358v1 -
1171 06-09 CBAM-STN-TPS-YOLO: Enhancing Agricultural Object Detection through Spatially Adaptive Attention Mechanisms CBAM-STN-TPS-YOLO: Verbesserung der Erkennung landwirtschaftlicher Objekte durch räumlich adaptive Aufmerksamkeitsmechanismen CBAM-STN-TPS-YOLO:通过空间适应性关注机制加强农业物体的探测 2506.07357v1 -
1172 06-09 Unveiling and Addressing Pseudo Forgetting in Large Language Models Enthüllung und Adressierung von Pseudo Forgetting in großen Sprachmodellen 在大语言模式中统一和解决 “ 优于 “ 的遗忘 2411.11932v2 -
1173 06-09 SALT: A Lightweight Model Adaptation Method for Closed Split Computing Environments SALT: Eine leichte Modellanpassungsmethode für geschlossene Split Computing-Umgebungen SALT: 用于封闭式拆分计算环境的轻量模型适应方法 2506.07355v1 -
1174 06-09 Task Generalization With AutoRegressive Compositional Structure: Can Learning From $D$ Tasks Generalize to $D^{T}$ Tasks? Aufgabenverallgemeinerung mit autoregressiver Kompositionsstruktur: Kann man von $D$ Aufgaben zu $D^{T}$ Aufgaben verallgemeinern? 具有自回归组合结构的任务泛化:从 $D$ 个任务中学习能否泛化到 $D^{T}$ 个任务? 2502.08991v2 -
1175 06-09 Decentralized Optimization on Compact Submanifolds by Quantized Riemannian Gradient Tracking Dezentralisierte Optimierung auf kompakten Submanifolds durch Quantized Riemannian Gradient Tracking 量化的里伊曼尼梯度跟踪对契约子部件的优化 2506.07351v1 -
1176 06-09 Distributed Risk-Sensitive Safety Filters for Uncertain Discrete-Time Systems Verteilte risikosensible Sicherheitsfilter für unsichere Diskretzeitsysteme 用于不确定分辨时间系统的分散式风险敏感度安全过滤器 2506.07347v1 -
1177 06-09 ST-GraphNet: A Spatio-Temporal Graph Neural Network for Understanding and Predicting Automated Vehicle Crash Severity ST-GraphNet: Ein räumlich-temporales Neuralnetzwerk für das Verständnis und die Vorhersage von Automatisiertem Fahrzeug-Crash-Severity ST-GraphNet:一个了解和预测自动车辆撞车严重性的空间-时时图神经网络 2506.08051v1 -
1178 06-09 A Comprehensive Survey on Artificial Intelligence for Complex Network: Potential, Methodology and Application Eine umfassende Umfrage zu Künstlicher Intelligenz für komplexes Netzwerk: Potenzial, Methodik und Anwendung 复杂网络人工情报综合调查:潜力、方法和应用 2402.16887v2 -
1179 06-09 On the Impact of Uncertainty and Calibration on Likelihood-Ratio Membership Inference Attacks Über die Auswirkungen von Unsicherheit und Kalibrierung auf Likelihood-Ratio-Mitgliedschafts-Inferenzangriffe 不确定性和校准对潜在――拉蒂奥成员推论攻击的影响 2402.10686v5 -
1180 06-09 Calibrated Value-Aware Model Learning with Probabilistic Environment Models Kalibriertes Value-Aware-Modell-Lernen mit probabilistischen Umweltmodellen 使用概率环境模型校准的价值软件模型学习 2505.22772v2 -
1181 06-09 Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models Membership Inferenz Attack sollte weiter zu Verteilungsstatistiken für destillierte Generative Modelle 成员攻击的推论应转向已蒸馏生成模型的分发统计数据 2502.02970v2 -
1182 06-09 Real-Time Execution of Action Chunking Flow Policies Real-Time Ausführung von Maßnahmen Chunking Flow Richtlinien 实时执行 “ 快速行动 “ 流动政策 2506.07339v1 -
1183 06-09 Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models Graph-KV: Durchbrechen der Sequenz durch Einspritzen struktureller Biasen in große Sprachmodelle 图图-KV:通过向大语言模型输入结构分界来打破序列 2506.07334v1 -
1184 06-09 Model Generalization on Text Attribute Graphs: Principles with Large Language Models Modellverallgemeinerung auf Text-Attribut-Grafiken: Prinzipien mit großen Sprachmodellen 文本属性图的模型通用化:大语言模型原则 2502.11836v2 -
1185 06-09 JavelinGuard: Low-Cost Transformer Architectures for LLM Security JavelinGuard: Low-Cost-Transformer-Architekturen für LLM-Sicherheit JavelinGuard:低成本变革结构促进LLM安全 2506.07330v1 -
1186 06-09 Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning Dynamische Planung für Fahrzeug-zu-Fahrzeug-Kommunikation verbessertes Federated Learning 车辆到车辆的交通通信增强联邦学习的动态日程安排 2406.17470v2 -
1187 06-08 (7) Mobility-Aware Asynchronous Federated Learning with Dynamic Sparsification Mobilität-Bewusst Asynchrones Federated Learning mit dynamischer Sparsifikation 具有动态分级功能的流动意识非同步联邦学习 2506.07328v1 -
1188 06-08 Reward Model Interpretability via Optimal and Pessimal Tokens Reward Model Interpretierbarkeit über Optimale und Pessimal Token 通过最佳和微量调制调来解释性 2506.07326v1 -
1189 06-08 Devanagari Digit Recognition using Quantum Machine Learning Devanagari Digit Recognition mit Quantum Machine Learning Devanagari Digit 使用量子机器学习识别 2506.09069v1 -
1190 06-08 DEF: Diffusion-augmented Ensemble Forecasting DEF: Diffusionsgenerierte Ensemble-Prognose DEF: 扩散增强的集合预测 2506.07324v1 -
1191 06-08 Watermarking Language Models with Error Correcting Codes Wasserzeichen-Sprachenmodelle mit Fehler-Korrektur-Codes 错误校正代码的划水语言模型 2406.10281v4 -
1192 06-08 Generative Modeling of Networked Time-Series via Transformer Architectures Generative Modellierung von vernetzten Zeitreihen über Transformer-Architekturen 通过变换器架构生成网络化时序建模 2506.07312v1 -
1193 06-08 Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference Paged Attention trifft auf FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference 专注:在部署推论中解锁长文本效率 2506.07311v1 -
1194 06-08 PASS: Private Attributes Protection with Stochastic Data Substitution PASS: Schutz privater Attribute mit stochastischer Datenersetzung PASS: 私人财产保护,用存储数据替代数据 2506.07308v1 -
1195 06-08 Highly Fast Text Segmentation With Pairwise Markov Chains Hochschnelle Textsegmentierung mit paarweise Markov-Ketten 配对马科夫链条的快速文本分割 2102.11037v2 -
1196 06-08 Sharpness-Aware Teleportation on Riemannian Manifolds Scharfheitsbewusste Teleportation auf Riemannschen Manifolds Riemannian Manifolds 上的移动 2309.17215v2 -
1197 06-08 Uncertainty-Aware Strategies: A Model-Agnostic Framework for Robust Financial Optimization through Subsampling Unsichere Strategien: Ein modell-agnostischer Rahmen für robuste finanzielle Optimierung durch Unterstichproben 不确定性 – – 软件战略:通过子取样实现稳健金融优化的模型 – – 不可确定框架 2506.07299v1 -
1198 06-08 MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache MiniKV: Die Grenzen der LLM-Inferenz durch 2-Bit-Layer-Diskriminativ-KV-Cache drücken MiniKV: 通过2-Bit 层分解 KV 缓存推进LLM 推断的限度 2411.18077v3 -
1199 06-08 On the kernel learning problem Zum Problem des Kernel-Lernens 关于内核学习问题 2502.11665v2 -
1200 06-08 Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI Multi-Step Guided Diffusion für die Bildwiederherstellung an Edge-Geräten: Auf dem Weg zur leichten Wahrnehmung in körpernaher KI 多脚制导扩散,促进边缘设备图像恢复:向Embodied AI 中的轻量度感知 2506.07286v1 -
1201 06-08 Defending Against Diverse Attacks in Federated Learning Through Consensus-Based Bi-Level Optimization Verteidigung gegen verschiedene Angriffe im Federated Learning durch Konsens-basierte Bi-Level-Optimierung 通过基于共识的双级优化,在联邦学习中防范多种袭击 2412.02535v2 -
1202 06-08 Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback 超前优先优化(EGPO):超越纳什最后的融合(Nash) 从人类反馈中学习 2503.08942v2 -
1203 06-08 Selective Prompt Anchoring for Code Generation Selektive Prompt-Ankerung für die Code-Generierung 用于代码生成的选择性提示锚定 2408.09121v5 -
1204 06-08 Machine Learning Should Maximize Welfare, but Not by (Only) Maximizing Accuracy Maschinelles Lernen sollte Wohlfahrt maximieren, aber nicht durch (nur) Maximierung der Genauigkeit 机器学习应该最大限度地提高福利,但不能(仅仅)最大限度地提高准确度 2502.11981v2 -
1205 06-08 Tokenized Bandit for LLM Decoding and Alignment Tokenized Bandit für LLM Decodierung und Ausrichtung 用于LLM解码和对齐的词元化老虎机 2506.07276v1 -
1206 06-08 A Cramér-von Mises Approach to Incentivizing Truthful Data Sharing Ein Cramér-von-Mises-Ansatz zur Förderung der wahrheitsgemäßen Datenfreigabe 激励真相数据共享的Cramér-von Mises方法 2506.07272v1 -
1207 06-08 Machine Learning-Based Self-Localization Using Internal Sensors for Automating Bulldozers Maschinelle Learning-basierte Selbstlokalisierung mittels interner Sensoren zur Automatisierung von Bulldozern 利用内部传感器进行基于机器学习的自定位,实现推土机自动化 2506.07271v1 -
1208 06-08 PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference PecSched: Präventive und effiziente Clusterplanung für LLM-Inferenz 预测:先发制人和高效率的LLM推理群集安排 2409.15104v2 -
1209 06-08 Universality of conformal prediction under the assumption of randomness Universalität der konformen Vorhersage unter der Annahme der Zufälligkeit 假设随机性的情况下一致预测的普遍性 2502.19254v2 -
1210 06-08 FDC: Fast KV Dimensionality Compression for Efficient LLM Inference FDC: Schnelle KV-Dimensionalitätskompression für effiziente LLM-Inferenz FDC: 高效 LLM 推断快速 KV 尺寸压缩 2408.04107v3 -
1211 06-08 Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods Ensemble-MIX: Verbesserung der Probeneffizienz in Multi-Agent RL mit Ensemble-Methoden 组合-MIX:利用组合方法提高多代理RL的取样效率 2506.02841v2 -
1212 06-08 BG-HOP: A Bimanual Generative Hand-Object Prior BG-HOP: Bimanuelle Generative Hand-Objekt-Vorhersage BG-HOP: 双体生成手向对象前置 2506.09068v1 -
1213 06-08 Counterfactual inference in sequential experiments Gegenfaktische Inferenz in sequentiellen Experimenten 连续实验中的反事实推论 2202.06891v5 -
1214 06-08 RADAR: Recall Augmentation through Deferred Asynchronous Retrieval RADAR: Erinnerung an Augmentation durch verzögerte asynchrone Retrieval RADAR:通过推迟的非同步回溯性回溯性增量 2506.07261v1 -
1215 06-08 ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition ALINE: Gemeinsame Amortisation für Bayesian Inferenz und aktive Datenerfassung ALINE: 贝耶斯推断和主动数据采集联合摊销 2506.07259v1 -
1216 06-08 Can the Rookies Cut the Tough Cookie? Exploring the Use of LLMs for SQL Equivalence Checking Können die Rookies das harte Cookie ausschneiden? Erforschen der Verwendung von LLMs für SQL-Equivalenzprüfung Rookies 能够切开硬饼干吗? 探索 SQL 等效检查使用 LLMs 的情况 2412.05561v2 -
1217 06-08 AMPO: Active Multi-Preference Optimization for Self-play Preference Selection AMPO: Aktive Multi-Preference-Optimierung für Selbstspiel-Preference-Auswahl AMPO: 主动的自玩首选多优先优化 2502.18293v2 -
1218 06-08 DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products DeltaProduct: Verbesserung des State-Trackings in linearen RNNs über Householder-Produkte DeltaProduct:通过Householder乘积改进线性RNN中的状态跟踪 2502.10297v5 -
1219 06-08 Horizon Reduction Makes RL Scalable Horizontreduzierung macht RL skalierbar 降低地平线使RL可缩放 2506.04168v2 -
1220 06-08 Regretful Decisions under Label Noise Bedauerliche Entscheidungen im Rahmen von Label Noise 标签噪音下的遗憾决定 2504.09330v2 -
1221 06-08 Unraveling Token Prediction Refinement and Identifying Essential Layers in Language Models Unraveling Token Prediction Verfeinerung und Identifizierung wesentlicher Ebenen in Sprachmodellen 在语言模型中统一地改进和确定关键层 2501.15054v2 -
1222 06-08 Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search Unsichere LLM-basierte Suche: Quantitative Analyse und Minderung von Sicherheitsrisiken in der AI-Websuche 不安全的LLM-基于LLM的搜索:AI网络搜索中安全风险的定量分析和减轻 2502.04951v2 -
1223 06-08 Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy Imperatives Lernen: Ein selbstüberwachtes neuro-symbolisches Lernrahmenwerk für Roboterautonomie 必要学习:自我监督的机器人自主的神经-双向学习框架 2406.16087v6 -
1224 06-08 InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference InfoSEM: Ein tiefgehendes Generatives Modell mit informativen Prioren für Gene Regulatory Network Inferenz InfoSEM: 基因监管网络推论具有信息前置信息的深生成模型 2503.04483v2 -
1225 06-08 Improving the Efficiency of Long Document Classification using Sentence Ranking Approach Verbesserung der Effizienz der Langdokumentklassifikation mittels Sentence-Ranking-Ansatz 采用判决分级办法提高长文件分类的效率 2506.07248v1 -
1226 06-08 Promoting Ensemble Diversity with Interactive Bayesian Distributional Robustness for Fine-tuning Foundation Models Förderung der Ensemblevielfalt mit interaktiver bayesischer Distributions Robustheit für Feinsteuerungs-Stiftungsmodelle 促进混合多样性与互动的贝叶斯分配强力促进微调基础模型 2506.07247v1 -
1227 06-08 Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs Übertaktung LLM-Reasoning: Überwachung und Kontrolle von Denkpfadlängen in LLMs LLM 解释:监测和控制LLM中思考长度 2506.07240v1 -
1228 06-08 Learn as Individuals, Evolve as a Team: Multi-agent LLMs Adaptation in Embodied Environments Lernen als Individuen, Evolve als Team: Multi-Agent LLMs Anpassung in körpereigenen Umgebungen 作为个人学习,作为一个团队参与:多剂LMs在渗透环境中的适应 2506.07232v1 -
1229 06-08 VARSHAP: Addressing Global Dependency Problems in Explainable AI with Variance-Based Local Feature Attribution VARSHAP: Bewältigung globaler Abhängigkeitsprobleme bei erklärbarer KI mit varianzbasierter lokaler Feature-Attribution VARSHAP: 解决全球依赖性问题,在可解释的AI 中与基于差异的地方特点 2506.07229v1 -
1230 06-08 A spring-block theory of feature learning in deep neural networks Eine Frühling-Block-Theorie des Feature-Lernens in tiefen neuronalen Netzwerken 深神经网络中特征学习的春季阶段理论 2407.19353v3 -
1231 06-08 Targeting relative risk heterogeneity with causal forests Ausrichtung der relativen Risiko Heterogenität mit ursächlichen Wäldern 确定相对风险目标,确定与因果森林的异质性 2309.15793v3 -
1232 06-08 VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data VORTEX: Ein räumliches Computing-Framework für optimierte Drohnentelemetrie-Extraktion aus der First-Person-Ansicht Flugdaten anzeigen VORTEX:从第一人查看飞行数据中最佳遥测提取无人机空间计算框架 2412.18505v2 -
1233 06-08 Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts Diversifizierung des Expertenwissens für Task-Agnostic Pruning in Sparse Mixture-of-Experts 使专家在Sparse Mixture 专家中的任务不可知性调节专家知识多样化 2407.09590v4 -
1234 06-08 Advancing Multimodal Reasoning Capabilities of Multimodal Large Language Models via Visual Perception Reward Mehr multimodale Begründungsfähigkeiten multimodaler Großsprachenmodelle durch Visual Perception Reward 通过视觉感知奖励促进多式大语言模型的多式多式能力 2506.07218v1 -
1235 06-08 Decoupled Data Consistency with Diffusion Purification for Image Restoration Entkoppelte Datenkonsistenz mit Diffusionsreinigung für Bildrestaurierung 数据与用于图像恢复的传播净化相兼容 2403.06054v6 -
1236 06-08 RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts RepLoRA: Reparameterisieren der Low-Rank-Anpassung über die Perspektive der Mischung von Experten RepLoRA:通过专家混合的视角对低秩适应进行重参数化 2502.03044v2 -
1237 06-08 Tilted Sharpness-Aware Minimization Gekippte Schärfe-Bewusst-Minimierung 静液尖锐度- 最小化 2410.22656v2 -
1238 06-08 Randomness, exchangeability, and conformal prediction Zufälligkeit, Austauschbarkeit und konforme Vorhersage 随机性、易交换性以及符合性预测 2501.11689v3 -
1239 06-08 Improving Generalization with Flat Hilbert Bayesian Inference Verbesserung der Generalisierung durch flache Hilbert Bayesische Schlussfolgerung 利用Flat Hilbert Bayesian推论改进普遍化 2410.04196v2 -
1240 06-08 Another look at inference after prediction Ein weiterer Blick auf die Schlussfolgerung nach der Vorhersage 另一种观点是预测之后的推论 2411.19908v4 -
1241 06-08 Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching Audio-Synthesizer-Inversion in symmetrischen Parameterräumen mit ungefähr gleichmässigem Flow-Matching 音频合成器在对称参数空间的反转, 大约等差流匹配 2506.07199v1 -
1242 06-08 GGBall: Graph Generative Model on Poincaré Ball GGBall: Graph Generatives Modell auf Poincaré Ball GGBall:poincaré球球的图形生成模型 2506.07198v1 -
1243 06-08 Finite-PINN: A Physics-Informed Neural Network with Finite Geometric Encoding for Solid Mechanics Finite-PINN: Ein physikinformiertes Neuronales Netzwerk mit Finite Geometrischer Encoding für feste Mechanik Finite-PINN:一个物理化神经网络,其中含有固体机械学的有限几何编码 2412.09453v2 -
1244 06-08 Analyzing Breast Cancer Survival Disparities by Race and Demographic Location: A Survival Analysis Approach Analyse der Überlebensdisparitäten von Brustkrebs nach Rasse und demographischer Lage: Ein Ansatz zur Überlebensanalyse 按种族和人口地点分析乳腺癌生存差异:生存分析方法 2506.07191v1 -
1245 06-08 ST-USleepNet: A Spatial-Temporal Coupling Prominence Network for Multi-Channel Sleep Staging ST-USleepNet: Ein Raum-Temporal-Koppelungs-Prominenz-Netzwerk für Multi-Channel-Sleep-Staging ST-USleepNet:多通道睡眠调节空间-时际聚合显要网络 2408.11884v3 -
1246 06-08 Learning based on neurovectors for tabular data: a new neural network approach Lernen auf Basis von Neurovektoren für tabellarische Daten: ein neuer neuronaler Netzwerkansatz 基于用于表格数据的神经动力学的学习:新的神经网络方法 2506.07185v1 -
1247 06-08 Regularized Adaptive Graph Learning for Large-Scale Traffic Forecasting Regularisiertes Adaptives Graphenlernen für großräumige Verkehrsprognosen 用于大型交通流量预测的正规化适应性图表学习 2506.07179v1 -
1248 06-08 RULE: Reinforcement UnLEarning Achieves Forget-Retain Pareto Optimality RULE: Verstärkung des Unerfahrenen erreicht Vergessen-Behalten Pareto-Optimalität RULE: 强化不放功率达到遗忘-保留最佳程度 2506.07171v1 -
1249 06-08 Efficient Text-Attributed Graph Learning through Selective Annotation and Graph Alignment Effizientes Text-Attributes Graphenlernen durch selektive Annotation und Graphenausrichtung 通过选择性注释和图表对齐,高效率的文本定位图表学习 2506.07168v1 -
1250 06-08 AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models AMoPO: Adaptive Multi-Objektive Preference-Optimierung ohne Prämienmodelle und Referenzmodelle AMoPO: 适应性、多目标、不奖励模式和参考模式的优化 2506.07165v1 -
1251 06-08 ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models ExLM: Nachdenken über die Auswirkungen von [MASK] Token in maskierten Sprachmodellen EXLM:重新思考[MASK]语调模式中[MASK]语调的影响 2501.13397v5 -
1252 06-08 SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision SMI-Editor: Bearbeiten-basiertes SMILES-Sprachenmodell mit Fragment-Level-Überwachung SMI-编辑器:基于编辑的SMILES语言模式,带有零散监督级别 2412.05569v2 -
1253 06-08 pFedSOP : Accelerating Training Of Personalized Federated Learning Using Second-Order Optimization pFedSOP : Beschleunigendes Training des personalisierten Föderierten Lernens unter Verwendung von Second-Order-Optimierung pFedSOP: 利用二阶优化加快个人化联邦学习培训 2506.07159v1 -
1254 06-08 Syntactic Control of Language Models by Posterior Inference Syntaktische Kontrolle von Sprachmodellen durch hintere Inferenz 由Posideer 推断对语言模型的协同控制 2506.07154v1 -
1255 06-08 Tensor Product Neural Networks for Functional ANOVA Model Tensor Produkt Neuronale Netzwerke für funktionales ANOVA-Modell 功能ANOVA模型的神经网络 2502.15215v4 -
1256 06-08 Comba: Improving Bilinear RNNs with Closed-loop Control Comba: Bilineare RNNs mit Closed-Loop-Steuerung verbessern Comba: 改进有闭环控制的双线区域网网 2506.02475v2 -
1257 06-08 How Does DPO Reduce Toxicity? A Mechanistic Neuron-Level Analysis Wie reduziert DPO die Toxizität? Eine mechanistische Neuronen-Level-Analyse DPO 如何减少毒性? 机制性神经元层面分析 2411.06424v3 -
1258 06-08 Quantile-Optimal Policy Learning under Unmeasured Confounding Quantile-Optimal Policy Learning unter ungemessener Verwirrung 在无法计量的混乱状态下进行量化最佳政策学习 2506.07140v1 -
1259 06-08 Evaluation of Machine Learning Models in Student Academic Performance Prediction Evaluation von Machine Learning Models in der studentischen Leistungsvorhersage 评价学生学术表现预测中的机器学习模式 2506.08047v1 -
1260 06-08 Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning Zuverlässige Kritiker: Monotonische Verbesserung und Konvergenzgarantien für die Stärkung des Lernens 可靠的批评:加强学习的单体改进和融合保障 2506.07134v1 -
1261 06-08 DiC: Rethinking Conv3x3 Designs in Diffusion Models DiC: Conv3x3-Designs in Diffusionsmodellen neu denken DiC: 重新思考传播模型中的Conv3x3 设计 2501.00603v2 -
1262 06-08 Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models Gleichwertige Denoiser können Graphen nicht kopieren: Richten Sie Ihre Graphen-Diffusionsmodelle aus 等式 Denois 无法复制图表: 对齐您的图形扩散模型 2405.17656v2 -
1263 06-08 Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models Quality-Diversity Red-Teaming: Automatisierte Generierung von hochwertigen und unterschiedlichen Attackern für große Sprachmodelle 质量-多样性红队:为大语言模式自动生成高品质和不同攻击者 2506.07121v1 -
1264 06-08 A Hybrid Multi-Factor Network with Dynamic Sequence Modeling for Early Warning of Intraoperative Hypotension Hybrides Multi-Factor-Netzwerk mit dynamischer Sequenzmodellierung zur Frühwarnung von intraoperativer Hypotonie 具有动态序列建模的混合多因素网络,用于术中低血压的早期预警 2409.11064v4 -
1265 06-08 Conditional Denoising Meets Polynomial Modeling: A Flexible Decoupled Framework for Time Series Forecasting Bedingtes Stören trifft auf Polynommodellierung: Ein flexibles entkoppeltes Framework für die Zeitreihenprognose 满足多面性建模:时间序列预测灵活拆分框架 2410.13253v7 -
1266 06-08 Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings Auf dem Weg zu einer universellen Offline-Black-Box-Optimierung über Sprachmodell-Embeddings 通过学习语言模型嵌入式嵌入,实现普遍离线黑箱优化 2506.07109v1 -
1267 06-08 Filling the Missings: Spatiotemporal Data Imputation by Conditional Diffusion Füllung der fehlenden Daten: Spatiotemporale Daten-Imputation durch Bedingte Diffusion 填补失踪人员:通过有条件的传播来估计时空数据 2506.07099v1 -
1268 06-08 A Theoretical Characterization of Optimal Data Augmentations in Self-Supervised Learning Eine theoretische Charakterisierung von Optimal Data Augmentations im Selbst-überwachten Lernen 自我监督学习中最佳数据增加的理论特征 2411.01767v4 -
1269 06-08 Epidemic-guided deep learning for spatiotemporal forecasting of Tuberculosis outbreak Epidemisch geführtes Deep Learning für die spatiotemporale Vorhersage des Tuberkuloseausbruchs 流行病指导深层学习,以对结核病爆发进行时空预测 2502.10786v2 -
1270 06-08 Patient Similarity Computation for Clinical Decision Support: An Efficient Use of Data Transformation, Combining Static and Time Series Data Patienten-Ähnlichkeitsrechnung für die klinische Entscheidungsunterstützung: Eine effiziente Nutzung von Datentransformationen, die Kombination von statischen und Zeitreihendaten 临床决定支持:有效利用数据转换、静态和时间系列数据 2506.07092v1 -
1271 06-08 Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models Stellen Sie sich vor zu hören: Auditory Knowledge Generation kann ein effektiver Assistent für Sprachmodelle sein 想象一下:产生听力知识可以成为语言模式的有效助理。 2503.16853v2 -
1272 06-08 State Entropy Regularization for Robust Reinforcement Learning Staatliche Entropie-Regularisierung für robustes Stärkungslernen 强力强化学习的状态元件常规化 2506.07085v1 -
1273 06-08 Inverse Design of Metamaterials with Manufacturing-Guiding Spectrum-to-Structure Conditional Diffusion Model Inverses Design von Metamaterialien mit Fertigungsleit-Spektrum-zu-Struktur Bedingtes Diffusionsmodell 具有 “ 制造-引导光谱到结构有条件扩散模型 “ 的元材料逆向设计 2506.07083v1 -
1274 06-08 On the Generalization of Data-Assisted Control in port-Hamiltonian Systems (DAC-pH) Zur Generalisierung der datengestützten Steuerung in port-Hamiltonischen Systemen (DAC-pH) 端口哈密顿系统数据辅助控制普遍化(DAC-pH) 2506.07079v1 -
1275 06-08 AgentDrug: Utilizing Large Language Models in an Agentic Workflow for Zero-Shot Molecular Optimization AgentDrug: Einsatz großer Sprachmodelle in einem Agentischen Workflow für die Null-Shot-Molekularoptimierung A剂药物:利用大型语言模型进行零热分子优化的制剂工作流程中的大语言模型 2410.13147v9 -
1276 06-08 E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models E-BATS: Effiziente Backpropagation-freie Testzeitanpassung für Sprachstiftungsmodelle EBATS: 语音基础模型的有效后支持-无测试-时间适应 2506.07078v1 -
1277 06-08 Diversity from Human Feedback Vielfalt durch menschliches Feedback 来自人类反馈的多样性 2310.06648v3 -
1278 06-08 Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization Beschleunigung der 3D Gaussian Splatting mit neuraler Sortierung und achsenorientierter Rasterisierung 利用神经排序和面向轴的光栅化加速3D高斯泼溅 2506.07069v1 -
1279 06-08 Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems Stabile Derivative Freie Gaussian Mischung Variationelle Schlussfolgerung für Bayesian Inverse Probleme Bayes 反问题中稳定衍生自由高斯混集变异推论 2501.04259v3 -
1280 06-08 AdaSplash: Adaptive Sparse Flash Attention AdaSplash: Adaptive Sparse Flash Attention AdaSplash: 适应性散散闪光关注 2502.12082v2 -
1281 06-08 D2R: dual regularization loss with collaborative adversarial generation for model robustness D2R: Dualer Regularisierungsverlust mit kollaborativer konversatorischer Generation für Modell Robustheit D2R:双重身份正规化损失,为示范性稳健而协作性对抗性一代人 2506.07056v1 -
1282 06-08 Policy Gradient with Tree Search: Avoiding Local Optimas through Lookahead Policy Gradient mit Baumsuche: Lokale Optimas durch Lookahead vermeiden 与树搜索的政策渐变:通过 Lookahead 避免本地 Oppima 2506.07054v1 -
1283 06-08 Hyperbolic recurrent neural network as the first type of non-Euclidean neural quantum state ansatz Hyperbolisches rezidivierendes neuronales Netzwerk als erste Art von nicht-euklidischen neuronalen Quantenzustandsansatz 超双曲经常性神经网络,作为第一种非欧洲的神经量子状态 ansatz 2505.22083v2 -
1284 06-08 AHSG: Adversarial Attack on High-level Semantics in Graph Neural Networks AHSG: Adversarialer Angriff auf hochrangige Semantik in Graphen-Neuralen Netzwerken AHSG:对图形神经网络高层语义的反向攻击 2412.07468v3 -
1285 06-08 When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models Wenn Aufmerksamkeit zusammenbricht: Wie degenerierte Ebenen in LLMs kleinere, stärkere Modelle aktivieren 注意折叠时: LLM 中如何分解层的 LLM 启用更小、 更强模型 2404.08634v3 -
1286 06-08 FairPFN: A Tabular Foundation Model for Causal Fairness FairPFN: Ein Tabular Foundation Modell für Causal Fairness FairPFN: 因果关系公平性表示基金会模型 2506.07049v1 -
1287 06-08 SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration SageAttention: Genaue 8-Bit-Attention für Plug-and-Play-Inferenzbeschleunigung SageAttention: 用于即插即用推理加速的精确8位注意力 2410.02367v8 -
1288 06-08 SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization SageAttention2: Effiziente Attention mit gründlicher Ausreißerglättung und per-thread INT4 Quantisierung SageAttention2: 具有彻底离群值平滑和逐线程INT4量化的高效注意力 2411.10958v6 -
1289 06-08 Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning Effiziente $Q$-Learning- und Actor-Critic-Methoden für robustes durchschnittliches Reward-Verstärkungs-Lernen 高效益平均奖励强化学习的学习和行为-竞争方法 2506.07040v1 -
1290 06-08 BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modeling BRIDGE: Bootstrapping-Text zur Steuerung der Time-Series-Generation über Multi-Agent iterative Optimierung und Diffusionsmodellierung BRIDGE:通过多代理迭代优化和传播模型化控制时间- 系列生成的推进文本 2503.02445v4 -
1291 06-08 Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond Learning-at-Criticality in großen Sprachmodellen für Quantenfeldtheorie und darüber hinaus 量子实地理论及以后大语言模型的学习困难 2506.03703v2 -
1292 06-08 Mixture Experts with Test-Time Self-Supervised Aggregation for Tabular Imbalanced Regression Mixtur-Experten mit selbstüberwachter Aggregation für tabellarische Imbalanced Regression 用于制表平衡回归的测试时自我监督集成混合专家 2506.07033v1 -
1293 06-08 AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint AlphaSteer:学习使用有原则的Null-空间限制的拒绝指导 2506.07022v1 -
1294 06-08 LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models LEANCODE: Modelle besser verstehen für Code-Vereinfachung von vortrainierten großen Sprachmodellen LEANCODE: 更好地理解模式,以更好地简化培训前大语言模式的守则 2505.14759v3 -
1295 06-08 Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation Struktur-agnostische Optimalität des doppelt robusten Lernens zur Abschätzung des Behandlungseffekts Doubly 治疗效果估测工具的“治疗效果强力学习”的结构-不可知性优化 2402.14264v4 -
1296 06-08 TABLET: Table Structure Recognition using Encoder-only Transformers TABLET: Tabellenstrukturerkennung mit Encoder-only Transformers TABLET: 使用只使用编码器的变换器确认表格结构 2506.07015v1 -
1297 06-08 Comparison of Lightweight Methods for Vehicle Dynamics-Based Driver Drowsiness Detection Vergleich von Leichtbaumethoden für Fahrzeugdynamik-basierte Fahrer-Drowsiness-Erkennung 机动机动机动机动机动机动车机动机动车驾驶识别漂泊情况轻型方法比较 2506.07014v1 -
1298 06-08 Half-AVAE: Adversarial-Enhanced Factorized and Structured Encoder-Free VAE for Underdetermined Independent Component Analysis Half-AVAE: Adversarial-erhöhte faktorisierte und strukturierte Encoder-freie VAE für unterbestimmte unabhängige Komponentenanalyse 半AVAE:用于未确定独立部件分析的反强化加工厂和结构化无编码器 VAE 2506.07011v1 -
1299 06-08 When do neural networks learn world models? Wann lernen neuronale Netzwerke Weltmodelle? 神经网络何时学习世界模型? 2502.09297v4 -
1300 06-08 LLM-HDR: Bridging LLM-based Perception and Self-Supervision for Unpaired LDR-to-HDR Image Reconstruction LLM-HDR: Überbrückung der LLM-basierten Wahrnehmung und Selbstüberwachung für ungepaarte LDR-zu-HDR-Bildrekonstruktion LLM-HDR: 连接以LLM为基础的LLM感知和自我监督的未受重视的LDR到HDR图像重建 2410.15068v3 -
1301 06-08 Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts Feynman-Kac Korrektoren in der Diffusion: Annealing, Anleitung und Produkt von Experten 扩散中的Feynman-Kac校正器:退火、引导与专家乘积 2503.02819v2 -
1302 06-08 Automatic Doubly Robust Forests Automatic Doppelt robuste Wälder Doubly 自动硬森林 2412.07184v2 -
1303 06-08 End-to-End Probabilistic Framework for Learning with Hard Constraints End-to-End-Probabilistisches Rahmenwerk für das Lernen mit harten Einschränkungen 最终到最终的艰苦制约因素学习概率框架 2506.07003v1 -
1304 06-08 Towards Physics-informed Diffusion for Anomaly Detection in Trajectories Auf dem Weg zu physikinformierter Diffusion für Anomalienerkennung in Trajektorien 走向在物理知情的情况下传播用于轨迹中的异常探测 2506.06999v1 -
1305 06-08 What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding Was macht Vernunftmodelle anders? Folgen Sie der Vernunftführerin für effiziente Decodierung 是什么让理性模型不同? 跟随高效配方的理性导师 2506.06998v1 -
1306 06-08 Sample Complexity of Diffusion Model Training Without Empirical Risk Minimizer Access Probenkomplexität des Diffusionsmodelltrainings ohne empirischen Risikominimizer-Zugang 无需访问经验风险最小化器的扩散模型训练的样本复杂度 2505.18344v2 -
1307 06-08 STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings STAMP Ihr Inhalt: Nachweis der Datensatzmitgliedschaft über Watermarked Rephrasings STAMP 您的内容: 通过水标修改方式证明数据集成员 2504.13416v2 -
1308 06-08 Correcting for Position Bias in Learning to Rank: A Control Function Approach Korrektur für Position Bias in Learning to Rank: Ein Steuerungsfunktionsansatz 校正学习排名中的比值位置:控制功能方法 2506.06989v1 -
1309 06-08 Fully Explainable Classification Models Using Hyperblocks Vollständig erklärbare Klassifikationsmodelle mit Hyperblocks 使用超超分块的完全可解释的分类模型 2506.06986v1 -
1310 06-08 Deep RL Needs Deep Behavior Analysis: Exploring Implicit Planning by Model-Free Agents in Open-Ended Environments Deep RL braucht tiefe Verhaltensanalyse: Implizite Planung durch modellfreie Agenten in offenen Umgebungen erkunden 深 RL 需要深 RL 深度行为分析:探索不限名额环境中无示范工具的隐性规划 2506.06981v1 -
1311 06-08 MoXGATE: Modality-aware cross-attention for multi-omic gastrointestinal cancer sub-type classification MoXGATE: Modalitätsbewusste Cross-Attention für multi-omere gastrointestinale Tumor-Subtypklassifikation MoXGATE:多工程胃肠胃癌亚型癌症分类模式 – – 意识交叉关注 2506.06980v1 -
1312 06-08 Turb-L1: Achieving Long-term Turbulence Tracing By Tackling Spectral Bias Turb-L1: Langfristige Turbulenzverfolgung durch Bewältigung des spektralen Bias Turb-L1:通过处理频谱偏差,实现长期湍流追踪 2505.19038v2 -
1313 06-08 Near Optimal Non-asymptotic Sample Complexity of 1-Identification Nahe Optimale nicht-asymptotische Probenkomplexität der 1-Identifikation 近于最优化的非不消毒样品 复杂度 1 - 识别 2506.06978v1 -
1314 06-08 Post Reinforcement Learning Inference Rückschlüsse nach der Verstärkung des Lernens 强化后学习的推论 2302.08854v4 -
1315 06-08 UdonCare: Hierarchy Pruning for Unseen Domain Discovery in Predictive Healthcare UdonCare: Hierarchie Pruning für Unsichtbare Domain Discovery in Predictive Healthcare UdonCare: 预测保健中未见的域域发现 的分级预留机制 2506.06977v1 -
1316 06-08 Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning Lernen, durch verstärktes Lernen durch reward-gewichtete Feintuning zu klären 学习通过通过奖励-加权微调加强学习来澄清问题 2506.06964v1 -
1317 06-08 Fitting Multilevel Factor Models Passende Multilevel-Faktormodelle 适合多级因数模型 2409.12067v3 -
1318 06-08 CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration CE-CoLLM: Effiziente und adaptive Großsprachenmodelle durch Cloud-Edge-Kollaboration CE-COLLM:通过云眼协作建立高效和适应性大语言模型 2411.02829v2 -
1319 06-08 Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs 超越自反应内核:历史驱动目标,实现高效的非线性非线性通用图形MCMC 2505.18300v2 -
1320 06-08 Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression Sicherheitsbewusstes Verstärktes Lernen zur Kontrolle über risikosensitive Aktions-Value Iteration und quantitative Regression 通过风险敏感行动增值和量回归,加强安全软件强化学习,以通过风险敏感行动控制 2506.06954v1 -
1321 06-07 (6) Is Your Training Pipeline Production-Ready? A Case Study in the Healthcare Domain Ist Ihr Training Pipeline Production-Ready? Eine Fallstudie im Bereich Healthcare 你的训练管道生产-准备? 保健领域案例研究 2506.06946v1 -
1322 06-07 Polar Hierarchical Mamba: Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences Polar Hierarchical Mamba: Auf dem Weg zur Streaming-LiDAR-Objekterkennung mit Punktwolken als egozentrische Sequenzen 极地分级马姆巴:向以点云为主心序列的分点云星探测星体径流激光雷达物体的方向迈进 2506.06944v1 -
1323 06-07 Conditional Denoising Diffusion for ISAC Enhanced Channel Estimation in Cell-Free 6G Bedingte störende Diffusion für ISAC-Verstärkte Kanalabschätzung in zellfreier 6G 以无细胞6G方式为ISAC强化频道估计仪进行有条件的低排放传播 2506.06942v1 -
1324 06-07 RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior RestoreGrad: Signal-Restaurierung mit bedingten Denoising-Diffusionsmodellen mit gemeinsam gelerntem Prior RestoreGrad:使用带有联合学习先验的条件去噪扩散模型进行信号恢复 2502.13574v3 -
1325 06-07 The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Die Illusion des Denkens: Die Stärken und Grenzen von Vernunftmodellen über das Lens of Problem Complexity verstehen 思考的幻觉:通过问题复杂焦点了解理性模型的长处和局限性 2506.06941v1 -
1326 06-07 Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More Schärfe-Dynamik im NN-Training mit einem minimalistischen Beispiel verstehen: Die Auswirkungen von Dataset-Schwierigkeiten, Tiefe, Stochastik und mehr 在NN培训中以最微小的示例了解尖锐的动态:数据集难度、深度、储存性等的影响 2506.06940v1 -
1327 06-07 Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning Intelligente Anomalieerkennung für Lane-Rendering mit Transformer mit selbstüberwachtem Pre-Training und maßgeschneidertem Feintuning 利用经过自监督预训练和定制微调的Transformer进行车道渲染的智能异常检测 2312.04398v6 -
1328 06-07 Scaling over Scaling: Exploring Test-Time Scaling Plateau in Large Reasoning Models Skalierung über Skalierung: Untersuchung des Testzeit-Skalierungs-Plateau in Großmodellen 缩放超过缩放: 探索大型理由模型中的试验时间缩放高台 2505.20522v2 -
1329 06-07 Rewriting the Budget: A General Framework for Black-Box Attacks Under Cost Asymmetry Das Budget neu schreiben: Ein allgemeiner Rahmen für Black-Box-Angriffe unter Kosten-Asymmetrie 重写预算:费用不对称下的黑盒攻击总框架 2506.06933v1 -
1330 06-07 Revisiting semi-supervised learning in the era of foundation models Halbbeaufsichtigtes Lernen im Zeitalter der Gründungsmodelle 在基础模式时代重新审查半监督的学习 2503.09707v2 -
1331 06-07 BLEUBERI: BLEU is a surprisingly effective reward for instruction following BLEUBERI: BLEU ist eine überraschend effektive Belohnung für Instruction Following BLEUBERI:BLEU是一种令人惊讶的有效的指令遵循奖励 2505.11080v2 -
1332 06-07 Analysis of Thompson Sampling for Controlling Unknown Linear Diffusion Processes Analyse der Thompson-Probenahme zur Kontrolle unbekannter linearer Diffusionsprozesse Thompson 用于控制未知线性扩散过程的抽样分析 2206.09977v2 -
1333 06-07 Distributionally Robust Learning in Survival Analysis Verteilungsweise robustes Lernen in der Überlebensanalyse 在生存分析中进行有力的分配学习 2506.01348v2 -
1334 06-07 Collapse-Proof Non-Contrastive Self-Supervised Learning Kollaps-Proof des nicht kontrasiven Selbstüberwachten Lernens 非封闭性自我监督学习的折叠结果 2410.04959v3 -
1335 06-07 Basis Transformers for Multi-Task Tabular Regression Basistransformatoren für Multi-Task-Tabellarische Regression 多任务表格回归基础变换器 2506.06926v1 -
1336 06-07 The global convergence time of stochastic gradient descent in non-convex landscapes: Sharp estimates via large deviations Die globale Konvergenzzeit stochastischer Gradientenabgänge in nicht konvexen Landschaften: Scharfe Schätzungen über große Abweichungen 非沿海地貌中悬浮性梯度下降的全球趋同时间:因巨大偏差而得出的粗略估计 2503.16398v2 -
1337 06-07 Measurement to Meaning: A Validity-Centered Framework for AI Evaluation Messung zur Bedeutung: Ein gültigkeitszentrierter Rahmen für die AI-Bewertung 衡量到意义:AI评价的有效性-中心框架 2505.10573v3 -
1338 06-07 Graph-Based Physics-Guided Urban PM2.5 Air Quality Imputation with Constrained Monitoring Data Graph-based Physics-Guided Urban PM2.5 Luftqualität Imputation mit eingeschränkten Überwachungsdaten 基于图表的物理指导城市 PM2.5 空气质量与受控监测数据 2506.06917v1 -
1339 06-07 Graph Neural Networks in Modern AI-aided Drug Discovery Graph Neuronale Netzwerke in der modernen KI-gestützten Drug Discovery 现代人工智能辅助药物发现现代神经网络图 2506.06915v1 -
1340 06-07 Uncertainty Estimation on Graphs with Structure Informed Stochastic Partial Differential Equations Ungewissheitsschätzung auf Graphen mit Struktur Informierte stochastische Partielle Differentialgleichungen 具有结构、知情斯托卡部分差别的图图的不确定性估计 2506.06907v1 -
1341 06-07 When Incentives Backfire, Data Stops Being Human Wenn Incentives Backfire, Daten aufhören, Mensch zu sein 当鼓励反火时,数据停止成为人类 2502.07732v2 -
1342 06-07 Prediction-Powered Adaptive Shrinkage Estimation Vorhersage-Powered Adaptive Schrumpfung Schätzung 预测力的适应性减缩估计值 2502.14166v2 -
1343 06-07 An Adaptively Inexact Method for Bilevel Learning Using Primal-Dual Style Differentiation Eine adaptiv ungenaue Methode für das Zwei-Stufen-Lernen mit primär-dualer Stildifferenzierung 使用纯-双式风格差异的双级学习适应性不精确方法 2412.06436v3 -
1344 06-07 TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training TorchTitan: Ein-Stop PyTorch native Lösung für produktionsfertiges LLM-Vortraining TorchTitan:面向生产就绪LLM预训练的一站式PyTorch原生解决方案 2410.06511v3 -
1345 06-07 NSD-Imagery: A benchmark dataset for extending fMRI vision decoding methods to mental imagery NSD-Imagery: Ein Benchmark-Datensatz zur Erweiterung von fMRI-Vision-Dekodierungsmethoden auf geistige Bilder NSD- 想象力:将FMRI愿景解码方法扩展至精神图像的基准数据集 2506.06898v1 -
1346 06-07 Exploring Image Transforms derived from Eye Gaze Variables for Progressive Autism Diagnosis Erforschen von Bildtransformationen, abgeleitet von Eye Gaze-Variablen für die progressive Autismus-Diagnose 探索从用于渐进自闭症诊断的眼盖变量中产生的图像变换 2506.09065v1 -
1347 06-07 LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models LLM-SRBench: Ein neuer Benchmark für die Entdeckung wissenschaftlicher Gleichungen mit großen Sprachmodellen LLM-SRBench:采用大语言模式科学等同发现新基准 2504.10415v2 -
1348 06-07 Multimodal Integration of Longitudinal Noninvasive Diagnostics for Survival Prediction in Immunotherapy Using Deep Learning Multimodale Integration der langitudinalen Nichtinvasiven Diagnostik zur Überlebensvorhersage in der Immuntherapie mittels Deep Learning 利用深层学习的免疫疗法中生存预测纵向非侵入性诊断的多模式综合 2411.18253v2 -
1349 06-07 Scalable Gaussian Processes with Latent Kronecker Structure Skalierbare Gaußprozesse mit latenter Kronecker-Struktur 具有潜在Kronecker结构的可扩展高斯过程 2506.06895v1 -
1350 06-07 A Novel Neural Filter to Improve Accuracy of Neural Network Models of Dynamic Systems Ein neuartiger neuraler Filter zur Verbesserung der Genauigkeit neuraler Netzwerkmodelle dynamischer Systeme 提高动态系统神经网络模型准确性的新神经过滤器 2409.13654v2 -
1351 06-07 Can In-Context Reinforcement Learning Recover From Reward Poisoning Attacks? Kann In-Context-Verstärkung lernen, sich von Belohnungs-Vergiftungsangriffen zu erholen? 文中加强学习能从受毒害袭击中恢复吗? 2506.06891v1 -
1352 06-07 FREE: Fast and Robust Vision Language Models with Early Exits FREE: Schnelle und robuste Vision-Sprachmodelle mit frühen Ausgängen FREE:具有早期退出的快速且鲁棒的视觉语言模型 2506.06884v1 -
1353 06-07 LookAlike: Consistent Distractor Generation in Math MCQs LookAlike: Konsistente Distraktor-Generation in Math MCQs 类似外观: 在数学 MCQs 中, 一致的分隔器生成 2505.01903v2 -
1354 06-07 Empowering COVID-19 Detection: Optimizing Performance Through Fine-Tuned EfficientNet Deep Learning Architecture Empowering COVID-19 Detection: Optimierung der Leistung durch eine effiziente Net Deep Learning Architektur 增强COVID-19探测能力:通过精巧高效网络深层学习架构优化业绩 2311.16593v2 -
1355 06-07 Learnable Activation Functions in Physics-Informed Neural Networks for Solving Partial Differential Equations Lernbare Aktivierungsfunktionen in physikinformierten Neuronalen Netzwerken zur Lösung partieller Differentialgleichungen 用于解决部分差异等量的物理内成神经网络中可学习的激活功能 2411.15111v3 -
1356 06-07 An Optimized Ensemble Deep Learning Model For Brain Tumor Classification Ein optimiertes Ensemble Deep Learning Modell für die Klassifizierung von Gehirntumoren 大脑肿瘤分类最佳组合深层学习模型 2305.12844v3 -
1357 06-07 Improved Sample Complexity for Private Nonsmooth Nonconvex Optimization Verbesserte Probenkomplexität für private Nonsmooth Nonconvex-Optimierung 改进私人非机动非康通非节流优化的样品复杂度 2410.05880v2 -
1358 06-07 Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding Weit und deutlich sehen: Halluzinationen in MLLMs mit Aufmerksamkeit abbauen 远看远看得很清楚:在MLLLMS中减少幻觉,并注意固态解说 2505.16652v2 -
1359 06-07 Learning Fluid-Structure Interaction Dynamics with Physics-Informed Neural Networks and Immersed Boundary Methods Learning Fluid-Struktur-Interaktion Dynamik mit physikinformierten Neuronalen Netzwerken und eingetauchten Grenzmethoden 与物理内成形神经网络和混合边界方法的互动动态 2505.18565v2 -
1360 06-07 Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning Log-Sum-Exponential-Estimator für Off-Policy-Evaluierung und -Lernen 用于离策略评估与学习的Log-Sum-Exponential估计器 2506.06873v1 -
1361 06-07 Watermark under Fire: A Robustness Evaluation of LLM Watermarking Wasserzeichen unter Feuer: Eine Robustheitsbewertung von LLM Watermarking 火灾下的水印:对LLM水标志的强力评价 2411.13425v3 -
1362 06-07 KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies KEA: Exploration durch proaktiv koordinierte Explorationsstrategien am Leben erhalten KEA:通过主动协调的勘探战略使勘探活动保持活跃 2503.18234v2 -
1363 06-07 SAFE: Finding Sparse and Flat Minima to Improve Pruning SAFE: Sparse und flache Minima finden, um Pruning zu verbessern SAFE:寻找稀疏且平坦的极小值以改进剪枝 2506.06866v1 -
1364 06-07 Generalization and Robustness of the Tilted Empirical Risk Verallgemeinerung und Robustheit des abgeschwächten Empirischen Risikos 微薄经验风险的概括化和有力性 2409.19431v3 -
1365 06-07 Multimodal Spatial Language Maps for Robot Navigation and Manipulation Multimodale räumliche Sprachkarten für Roboternavigation und -manipulation 用于机器人导航和操纵的多模式空间语言地图 2506.06862v1 -
1366 06-07 Differentially Private Sparse Linear Regression with Heavy-tailed Responses Unterschiedlich Private Sparse Lineare Regression mit stark gestaffelten Responses 具有重尾量反应的 不同私人的粗线性递减 2506.06861v1 -
1367 06-07 High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations High-Fidelity Scientific Simulation Surrogats über adaptive Implizite Neuraldarstellungen 通过适应性隐隐性神经仪表显示的高纤维科学模拟模拟器 2506.06858v1 -
1368 06-07 Free Random Projection for In-Context Reinforcement Learning Freie Zufallsprojektion für In-Context-Verstärkung 内容强化学习自由随机投射 2504.06983v2 -
1369 06-07 JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data JESTR: Gemeinsame Einbettung von Weltraumtechnik für Ranking-Kandidatenmoleküle für die Annotation untargeted Metabolomics Data JESTR: 联合嵌入空间技术,以评分非目标元元数据评分候选候选分子 2411.14464v3 -
1370 06-07 Curvature Enhanced Data Augmentation for Regression Verstärkte Datenvergrößerung für Regression 后回归加速度增强数据增强值 2506.06853v1 -
1371 06-07 Active Preference Optimization for Sample Efficient RLHF Aktive Preference-Optimierung für Probeneffizienz RLHF 抽样高效RLHF的主动首选最佳优化 2402.10500v3 -
1372 06-07 A Statistical Framework for Model Selection in LSTM Networks Ein statistischer Rahmen für die Modellauswahl in LSTM-Netzwerken LSTM网络模式选择统计框架 2506.06840v1 -
1373 06-07 Harnessing Vision-Language Models for Time Series Anomaly Detection Nutzung von Vision-Language-Modellen für Zeitreihenanomalienerkennung 利用视觉语言模型进行时间序列异常探测 2506.06836v1 -
1374 06-07 LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models LLM-Attacken: Verbesserung der Closed-Loop Adversarial Scenario Generation für autonomes Fahren mit großen Sprachmodellen LLM-攻击者:加强利用大语言模型自主驾驶的闭路反反向假想生成 2501.15850v2 -
1375 06-07 FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-World LoRA FedALT: Federated Fine-Tuning durch Adaptives lokales Training mit Rest-of-World LoRA FedALT:通过与世界其余地区合作的适应性地方培训实现联邦退休 2503.11880v2 -
1376 06-07 The Currents of Conflict: Decomposing Conflict Trends with Gaussian Processes Die Strömungen des Konflikts: Konflikttrends mit gaussischen Prozessen zersetzen 冲突流:与高斯进程冲突趋势的分解 2506.06828v1 -
1377 06-07 Data Shapley in One Training Run Data Shapley in einem Trainingslauf 一次训练运行中的Data Shapley 2406.11011v3 -
1378 06-07 ASPO: Constraint-Aware Bayesian Optimization for FPGA-based Soft Processors ASPO: Constraint-Aware Bayesian Optimierung für FPGA-basierte Softprozessoren ASPO:面向基于FPGA的软处理器的约束感知贝叶斯优化 2506.06817v1 -
1379 06-07 Path Integral Optimiser: Global Optimisation via Neural Schrödinger-Föllmer Diffusion Path Integral Optimiser: Globale Optimierung über Neural Schrödinger-Föllmer Diffusion 路径综合优化:通过Neural Schrödinger-Föllmer扩散实现全球优化 2506.06815v1 -
1380 06-07 Homophily-Driven Sanitation View for Robust Graph Contrastive Learning Homophil-getriebene Sanitäransicht für robustes Graph Kontrastives Lernen 强力图表对比学习的光极- 驱动光学卫生视图 2307.12555v2 -
1381 06-07 Efficient Parallelization of Message Passing Neural Network Potentials for Large-scale Molecular Dynamics Effiziente Parallelisierung von Message-Passing-Neural Network-Potenziale für großmaßstäbliche molekulare Dynamiken 大型分子动态信息传递神经网络潜力的有效平行 2505.06711v3 -
1382 06-07 Fast Causal Discovery by Approximate Kernel-based Generalized Score Functions with Linear Computational Complexity Schnelle kausale Entdeckung durch ungefähre Kernel-basierte generalisierte Score-Funktionen mit linearer Computational Complexity 以近端内核为基础的通用计分函数及线性比较复杂度 2412.17717v2 -
1383 06-07 IMPA-HGAE:Intra-Meta-Path Augmented Heterogeneous Graph Autoencoder IMPA-HGAE:Intra-Meta-Path Augmented Heterogene Graph Autoencoder IMPA-HGAE: 甲基板内增增异异异形图形自动编码器 2506.06809v1 -
1384 06-07 Label-semantics Aware Generative Approach for Domain-Agnostic Multilabel Classification Label-Semantik Aware Generativer Ansatz für Domain-Agnostic Multilabel-Klassifikation 域-不可知性多标签分类的认知生成方法 2506.06806v1 -
1385 06-07 Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs Dialog ohne Grenzen: Konstant-Sized KV-Caches für erweiterte Reaktionen in LLMs 无限制对话框: 普通 KV 宽度响应库( LLM ) 2503.00979v2 -
1386 06-07 Prediction-Enhanced Monte Carlo: A Machine Learning View on Control Variate Vorhersage-erweitert Monte Carlo: Eine Machine-Learning-Ansicht auf Steuerungsvariate 预测增强的蒙特卡洛:关于控制Variatte的机械学习观点 2412.11257v3 -
1387 06-07 EasyFS: an Efficient Model-free Feature Selection Framework via Elastic Transformation of Features EasyFS: Ein effizientes modellfreies Feature Selection Framework über die elastische Transformation von Features EasyFS:通过特征的弹性变换实现高效的无模型特征选择框架 2402.05954v2 -
1388 06-07 Refining Adaptive Zeroth-Order Optimization at Ease Verfeinerung adaptiver Zeroth-Order-Optimierung bei Leichtigkeit Ease 的适应性零极优化 2502.01014v2 -
1389 06-07 Is Optimal Transport Necessary for Inverse Reinforcement Learning? Ist Optimaler Transport für das Inverse Verstärkungslernen notwendig? 最优传输对逆强化学习是否必要? 2506.06793v1 -
1390 06-07 Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models Ferret: Federated Full-Parameter Tuning auf Scale für große Sprachmodelle Ferret:在大语言模型规模上采用联邦全票全票制表 2409.06277v3 -
1391 06-07 FuncGNN: Learning Functional Semantics of Logic Circuits with Graph Neural Networks FuncGNN: Funktionelle Semantik von Logic Circuits mit Graph Neural Networks lernen FuncGNN: 学习具有图形神经网络的逻辑电路功能性词义 2506.06787v1 -
1392 06-07 Caterpillar GNN: Replacing Message Passing with Efficient Aggregation Caterpillar GNN: Ersetzen von Message Passing durch effiziente Aggregation Caterpillar GNN:以高效聚合取代消息传递 2506.06784v1 -
1393 06-07 Interpretable LLMs for Credit Risk: A Systematic Review and Taxonomy Interpretierbare LLMs für Kreditrisiken: Systematische Überprüfung und Taxonomie 用于信用风险的可解释LLM:系统综述与分类 2506.04290v2 -
1394 06-07 Feature-Based Instance Neighbor Discovery: Advanced Stable Test-Time Adaptation in Dynamic World Feature-Based Instance Neighbor Discovery: Erweiterte stabile Test-Time-Anpassung in dynamischer Welt 以地貌为基础的实例邻国发现:动态世界中的高级稳定测试时间适应 2506.06782v1 -
1395 06-07 Continuous-Time SO(3) Forecasting with Savitzky–Golay Neural Controlled Differential Equations Continuous-Time SO(3) Vorhersage mit Savitzky–Golay Neural gesteuerte Differentialgleichungen 连续时间 SO(3) 使用Savitzky-Golay 控制神经控制的神经等量的预测 2506.06780v1 -
1396 06-07 Continuous Semi-Implicit Models Kontinuierliche semiimplizite Modelle 连续半集成模型 2506.06778v1 -
1397 06-07 Label Distribution Learning using the Squared Neural Family on the Probability Simplex Label-Distribution-Lernen mit der Quadratischen Neural-Familie auf der Wahrscheinlichkeit Simplex 利用 “ 概率简单 “ 上的平方神经家庭进行标签分布学习 2412.07324v2 -
1398 06-07 Taming Wild Branches: Overcoming Hard-to-Predict Branches using the Bullseye Predictor Wilde Zweige zähmen: Hard-to-Predict-Zweige mit dem Bullseye Predictor überwinden 塔姆荒野分支:利用 “ 红心 “ 预测仪克服难以预防的分支 2506.06773v1 -
1399 06-07 On Adversarial Robustness of Language Models in Transfer Learning Zur adversarialen Robustheit von Sprachmodellen im Transfer-Lernen 迁移学习中语言模型的对抗鲁棒性 2501.00066v2 -
1400 06-07 AfroBench: How Good are Large Language Models on African Languages? AfroBench: Wie gut sind große Sprachmodelle auf afrikanischen Sprachen? AfroBench:非洲语言大语言模式有多好? 2311.07978v5 -
1401 06-07 Xeno-learning: knowledge transfer across species in deep learning-based spectral image analysis Xeno-Learning: Wissenstransfer über Arten hinweg in der Deep Learning-basierten Spektralbildanalyse Xeno-学习:深学习光谱图象分析中的跨物种知识转让 2410.19789v2 -
1402 06-07 The OCR Quest for Generalization: Learning to recognize low-resource alphabets with model editing OCR Quest for Generalization: Lernen, Low-Resource-Alphabets mit Modellbearbeitung zu erkennen OCR 普遍化调查:学会以示范编辑方式识别低资源字母表 2506.06761v1 -
1403 06-07 Bio-Inspired Classification: Combining Information Theory and Spiking Neural Networks – Influence of the Learning Rules Bio-Inspirierte Klassifikation: Verknüpfung von Informationstheorie und spikenden Neuronalen Netzwerken – Einfluss der Lernregeln 生物启发分类:信息理论与Spiking神经网络相结合 – – 学习规则的影响 2506.06750v1 -
1404 06-07 Honey, I shrunk the hypothesis space (through logical preprocessing) Schatz, ich habe den Hypothesenraum geschrumpft (durch logische Vorverarbeitung) 亲爱的,我缩小了假设空间(通过逻辑预处理) 2506.06739v1 -
1405 06-07 WorldLLM: Improving LLMs’ world modeling using curiosity-driven theory-making WorldLLM: Verbesserung der Weltmodellierung von LLMs durch kuriositätsgetriebene Theoriebildung WorldLLM:利用好奇心驱动的理论构建改进LLM的世界建模 2506.06725v1 -
1406 06-07 Directly Forecasting Belief for Reinforcement Learning with Delays Direkt voraussagender Glaube für verstärktes Lernen mit Verzögerungen 直接预测学习迟迟加强学习的信念 2505.00546v2 -
1407 06-07 IQFM A Wireless Foundational Model for I/Q Streams in AI-Native 6G IQFM Ein drahtloses Grundmodell für I/Q Streams in AI-Native 6G AI-Native 6G的I/Q流无线无线基础模型 2506.06718v1 -
1408 06-07 Online SLA Decomposition: Enabling Real-Time Adaptation to Evolving Network Systems Online-SLA-Zersetzung: Echtzeit-Anpassung an sich entwickelnde Netzwerksysteme ermöglichen SLA在线分解:使实时适应不断演变的网络系统 2408.08968v5 -
1409 06-07 Towards Black-Box Membership Inference Attack for Diffusion Models Auf dem Weg zu Black-Box-Mitgliedschafts-Inferenz-Angriff für Diffusionsmodelle 争取为扩散模型进行黑箱成员推导攻击 2405.20771v4 -
1410 06-07 Understanding Nonlinear Implicit Bias via Region Counts in Input Space Nichtlineare Implizite Bias über Regionszählungen im Eingaberaum verstehen 通过输入空间中的区域计数理解非线性隐隐含比值 2505.11370v2 -
1411 06-07 RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking RED QUEEN: Schutz großer Sprachmodelle gegen verschleierte Multi-Turn-Jailbreaking RED QUEEN: 保护大型语言模型,防止隐蔽的多轮破门破门 2409.17458v2 -
1412 06-07 Do Protein Transformers Have Biological Intelligence? Haben Proteintransformatoren biologische Intelligenz? 蛋白质变异器有生物智能吗? 2506.06701v1 -
1413 06-07 MarginSel : Max-Margin Demonstration Selection for LLMs MarginSel : Max-Margin-Demonstrationsauswahl für LLMs MarginSel:面向LLM的最大间隔示例选择 2506.06699v1 -
1414 06-07 Contextual Experience Replay for Self-Improvement of Language Agents Kontextuelles Erfahrungsreplay zur Selbstverbesserung von Sprachagenten 语文代理人员自我改进的内事经验回顾 2506.06698v1 -
1415 06-07 Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning Breaking Data Silos: Auf dem Weg zu offenen und skalierbaren Mobilitäts-Stiftungsmodellen durch generatives kontinuierliches Lernen 突破数据硅:通过创造持续学习实现开放和可缩放的流动基金会模式 2506.06694v1 -
1416 06-07 From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications Vom Tiefrang gradienten Subraumstabilisierung zu Low-Rank-Gewichten: Beobachtungen, Theorien und Anwendungen 从低级梯度子空间稳定到低射速重量:观测、理论和应用 2407.11239v2 -
1417 06-07 Learning Robust Heterogeneous Graph Representations via Contrastive-Reconstruction under Sparse Semantics Robuste Heterogene Graphendarstellungen über Kontrastive-Rekonstruktion unter Sparsemantik lernen 在粗简语义下通过对比重建进行学习强强强异基因图示 2506.06682v1 -
1418 06-07 From Offline to Online Memory-Free and Task-Free Continual Learning via Fine-Grained Hypergradients Von Offline zu Online Memory-Free und Task-Free Continual Learning über feinkörnige Hypergradienten 从离线到在线无记忆和无任务持续学习 2502.18762v2 -
1419 06-07 Interpretation of Deep Learning Model in Embryo Selection for In Vitro Fertilization (IVF) Treatment Interpretation von Deep-Learning-Modell in der Embryo-Auswahl für die In-vitro-Fertilisation (IVF) Behandlung 体外受精(IVF)治疗胚胎选择中深度学习模型的解释 2506.06680v1 -
1420 06-07 Rethinking Benign Overfitting in Two-Layer Neural Networks Neudenken Benign Overfitting in Zwei-Ebenen-Neural-Netzwerke 重新思考两层神经网络中的 “ 仁善 “ 改造 2502.11893v2 -
1421 06-07 Accounting for plasticity: An extension of inelastic Constitutive Artificial Neural Networks Bilanzierung der Plastizität: Eine Erweiterung der inelastischen konstitutiven künstlichen Neuronalen Netzwerke 可塑性核算:无弹性建筑人工神经神经网络的扩展 2407.19326v2 -
1422 06-07 Flood-DamageSense: Multimodal Mamba with Multitask Learning for Building Flood Damage Assessment using SAR Remote Sensing Imagery Flood-DamageSense: Multimodales Mamba mit Multitasking-Lernen zur Erstellung von Hochwasserschäden mit Hilfe von SAR Remote Sensing Imagery 水灾-灾情警报:利用SAR遥感图像学进行水灾损害评估的多任务学习的多模式Mamba 2506.06667v1 -
1423 06-07 Through the Gaps: Uncovering Tactical Line-Breaking Passes with Clustering Durch die Lücken: Entdeckt taktische Linien-Breaking-Pässe mit Clustering 穿透差距:以集束方式破开战术直线通道 2506.06666v1 -
1424 06-07 SDP-CROWN: Efficient Bound Propagation for Neural Network Verification with Tightness of Semidefinite Programming SDP-CROWN: Effiziente Bound Propagation für neurale Netzwerkverifikation mit Zähigkeit der semidefiniten Programmierung SDP-CROWN: 有效促进神经网络核查和半无限期方案拟订的严格性 2506.06665v1 -
1425 06-07 SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL SLAC: Simulationsvortrainierter Latent Action Space für Ganzkörper-Real-World RL SLAC:用于全身真实世界RL的仿真预训练潜在动作空间 2506.04147v2 -
1426 06-07 Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence Kurzlanges Adversarial Training hilft LLMs, langanhaltende Jailbreak Attacken zu verteidigen: Theoretische und empirische Beweise 短期对面培训帮助长者捍卫长期越狱袭击:理论和经验证据 2502.04204v2 -
1427 06-07 Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning Q-Policy: Quantenverstärkte Politikevaluierung für skalierbares Stärkungslernen Q-政策:为可扩展的强化学习加强政策评价 2505.11862v2 -
1428 06-07 Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning Pareto-Optimal Rewards von Noisy Preferences lernen: Ein Rahmen für multi-objektives Inverse-Verstärkung-Lernen 从新偏爱中学习 Pareto- Opatimal 奖励:多目标反强化学习框架 2505.11864v2 -
1429 06-07 GraphRAG under Fire GraphRAG unter Feuer 火下的图形RAG 2501.14050v3 -
1430 06-07 Rescaled Influence Functions: Accurate Data Attribution in High Dimension Reskalierte Einflussfunktionen: Genaue Datenzuweisung in hoher Dimension 重新尺度影响函数: 高维度的准确数据归属 2506.06656v1 -
1431 06-07 Explaining Risks: Axiomatic Risk Attributions for Financial Models Risiken erklären: Axiomatische Risikozuweisungen für Finanzmodelle 解释风险:金融模型的逻辑风险归属 2506.06653v1 -
1432 06-07 Protap: A Benchmark for Protein Modeling on Realistic Downstream Applications Protap: Ein Benchmark für Proteinmodellierung auf realistischen Downstream-Anwendungen Protap:关于现实下下游应用的蛋白质建模基准 2506.02052v2 -
1433 06-07 SAFER: A Calibrated Risk-Aware Multimodal Recommendation Model for Dynamic Treatment Regimes SAFER: Ein kalibriertes risikobewusstes multimodales Empfehlungsmodell für dynamische Behandlungssysteme SAFER:面向动态治疗方案的经校准的风险感知多模态推荐模型 2506.06649v1 -
1434 06-07 Spark Transformer: Reactivating Sparsity in FFN and Attention Spark Transformer: Sparsity in FFN und Attention reaktivieren Spark Transformer:重新激活FFN和注意力中的稀疏性 2506.06644v1 -
1435 06-07 GLASS: Guided Latent Slot Diffusion for Object-Centric Learning GLASS: Geführte Latent Slot Diffusion für Objekt-Centric Learning GLASS: 向导中中流槽扩散,用于对象中心学习 2407.17929v2 -
1436 06-07 Zero-shot Generalist Graph Anomaly Detection with Unified Neighborhood Prompts Zero-shot Generalist Graph Anomalie Erkennung mit Unified Neighborhood Prompts 使用统一邻里提示器进行异常探测 2410.14886v2 -
1437 06-07 Non-Intrusive Load Monitoring Based on Image Load Signatures and Continual Learning Nicht-intrusive Lastüberwachung basierend auf Bildlastsignaturen und kontinuierlichem Lernen 基于图像加载签名和不断学习的无侵入性负载监测 2506.06637v1 -
1438 06-07 Vision-QRWKV: Exploring Quantum-Enhanced RWKV Models for Image Classification Vision-QRWKV: Quantum-erweiterte RWKV-Modelle für die Bildklassifikation Vision-QRWKV:探索用于图像分类的量子增强RWKV模型 2506.06633v1 -
1439 06-07 Arbitrarily-Conditioned Multi-Functional Diffusion for Multi-Physics Emulation Willkürlich bedingte multifunktionale Diffusion für multiphysikalische Emulation 多物理模拟模拟多功能多功能扩散 2410.13794v2 -
1440 06-07 Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach Wiederherstellung der Kalibrierung für ausgerichtete große Sprachmodelle: Ein Calibration-Aware Fine-Tuning-Ansatz 统一大语言模型的重新校准:校准-软件精美调试方法 2505.01997v2 -
1441 06-07 Harnessing On-Device Large Language Model: Empirical Results and Implications for AI PC Nutzung von On-Device Large Language Modell: Empirische Ergebnisse und Implikationen für KI-PC 利用大语言大语言模型:经验结果和对AIPC的影响 2505.15030v3 -
1442 06-07 Policy Filtration for RLHF to Mitigate Noise in Reward Models Policy-Filtration für RLHF zur Minderung von Rauschen in Belohnungsmodellen 用于RLHF的策略过滤,以减轻奖励模型中的噪声 2409.06957v5 -
1443 06-07 Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning Lehrplan Verstärkung Lernen von einfachen bis harten Aufgaben verbessert LLM Reasoning 从易到困难的工作学习改进LLM 理由 2506.06632v1 -
1444 06-07 Q-WSL: Optimizing Goal-Conditioned RL with Weighted Supervised Learning via Dynamic Programming Q-WSL: Optimierung der zielkonditionierten RL mit gewichtetem betreutem Lernen durch dynamische Programmierung Q-WSL:通过动态方案拟订,以加权监督的学习方式,优化目标有条件的RL 2410.06648v5 -
1445 06-07 Certifiably Robust Model Evaluation in Federated Learning under Meta-Distributional Shifts Zertifiziert robuste Modellbewertung im Federated Learning unter Meta-Distributionsverschiebungen 在混合分配变化下对联邦学习模式进行可验证的健全模式评价 2410.20250v2 -
1446 06-07 Equilibrium Distribution for t-Distributed Stochastic Neighbor Embedding with Generalized Kernels Gleichgewichtsverteilung für t-Distributed Stochastic Neighbor Einbettung mit generalisierten Kernen 具有广义核的t分布随机邻域嵌入的平衡分布 2505.24311v2 -
1447 06-07 Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations Robuste Erlernbarkeit von Proben-kompressiblen Verteilungen unter Lärm oder störenden Störungen 在噪音或对称干扰下抽样压缩分发的可有力学习性 2506.06613v1 -
1448 06-07 Transferring Features Across Language Models With Model Stitching Übertragung von Funktionen über Sprachmodelle mit Modellstich 使用模型裁剪的跨语言模型传输功能 2506.06609v1 -
1449 06-07 Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit Training-Free Tokenizer Transplantation über Orthogonal Passende Verfolgung 通过矫形配对运动进行无培训的取暖器移植 2506.06607v1 -
1450 06-07 Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization Stacey: Förderung des stochastischen Steepest Descents durch beschleunigte Optimierung von $\ell_p$-Smooth Nonconvex Stacey: 通过加速 $\ ell_ p$- Smooth 的非convex 优化, 推广斯托切性隐蔽源 2506.06606v1 -
1451 06-07 Scoring the Unscorables: Cyber Risk Assessment Beyond Internet Scans Bewertung der Unscorables: Cyber-Risikobewertung jenseits von Internet-Scans 为不可评分者评分:互联网扫描之外的网络风险评估 2506.06604v1 -
1452 06-07 CAtCh: Cognitive Assessment through Cookie Thief CAtCh: Kognitive Bewertung durch Cookie Dieb CATCH: 通过饼干小偷进行认知评估 2506.06603v1 -
1453 06-07 Direct Prediction Set Minimization via Bilevel Conformal Classifier Training Direkte Vorhersage Einstellung Minimierung über Bilevel Conformal Classifier Training 通过双级正规分类培训尽量减少直接预测设置 2506.06599v1 -
1454 06-07 Scaling Inference-Efficient Language Models Skalierung von Schlussfolgerungen-Effiziente Sprachmodelle 扩展率-有效语言模型 2501.18107v2 -
1455 06-07 From Model-Based and Adaptive Control to Evolving Fuzzy Control Von der modellbasierten und adaptiven Steuerung zur sich entwickelnden Fuzzy-Steuerung 从模型和适应控制到演化模糊控制 2506.06594v1 -
1456 06-06 (5) Improving the Diffusability of Autoencoders Verbesserung der Diffusibilität von Autoencodern 提高自动编码器的易变性 2502.14831v3 -
1457 06-06 Neighbour-Driven Gaussian Process Variational Autoencoders for Scalable Structured Latent Modelling Nachbar-getriebene Gauß-Prozessvariationelle Autoencoder für skalierbare strukturierte Latentenmodellierung 相邻的Driven Gaussian 用于可缩放结构晚期模拟模型的流程变异自动编码器 2505.16481v2 -
1458 06-06 Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures Globale Konvergenz von Gradienten EM für überparameterisierte Gauß-Mischungen 用于超度测量高斯混合体的梯度 EM 全球趋同 2506.06584v1 -
1459 06-06 Demystifying Topological Message-Passing with Relational Structures: A Case Study on Oversquashing in Simplicial Message-Passing Entmystifizieren Topologisches Message-Passing mit relationalen Strukturen: Eine Fallstudie zum Übersetzen in Simplicial Message-Passing 解开神秘的地形信息与关系结构的粘贴:关于简单信息传送中过度隔绝的案例研究 2506.06582v1 -
1460 06-06 Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques Auf dem Weg zu einer effizienten Multi-LLM-Inferenz: Charakterisierung und Analyse von LLM Routing und Hierarchical Techniques 迈向高效的多LLM推理:LLM路由与层级技术的特征刻画和分析 2506.06579v1 -
1461 06-06 Generalization of Geometric Graph Neural Networks with Lipschitz Loss Functions Generalisierung geometrischer Graphen-Neuralnetze mit Lipschitz-Verlustfunktionen 具有Lipschitz损失功能的几何图神经网络的通用化 2409.05191v2 -
1462 06-06 DISC: Dynamic Decomposition Improves LLM Inference Scaling DISC: Dynamische Zersetzung verbessert LLM-Inferenzskalierung DISC:动态分解改善LLM推理扩展 2502.16706v2 -
1463 06-06 A Manifold Perspective on the Statistical Generalization of Graph Neural Networks Eine vielfältige Perspektive auf die statistische Verallgemeinerung von Graphen-Neuralen Netzwerken 图表神经网络统计一般化的多功能视角 2406.05225v6 -
1464 06-06 Graph Persistence goes Spectral Graph-Persistenz wird spektral 图持久性走向谱域 2506.06571v1 -
1465 06-06 Reconstructing Heterogeneous Biomolecules via Hierarchical Gaussian Mixtures and Part Discovery Rekonstruieren heterogener Biomoleküle über hierarchische Gauß-Mischungen und Teil-Entdeckung 通过层次高斯混合与部件发现重建异质生物分子 2506.09063v1 -
1466 06-06 Securing Traffic Sign Recognition Systems in Autonomous Vehicles Sicherung von Verkehrszeichenerkennungssystemen in autonomen Fahrzeugen 保障机动车辆交通标志识别系统的安全 2506.06563v1 -
1467 06-06 Enhancing radioisotope identification in gamma spectra via supervised domain adaptation Verbesserung der Radioisotop-Identifikation in Gammaspektren durch Überwachung der Domänenanpassung 通过受监督的域适应加强伽马光谱中放射性同位素识别 2412.07069v2 -
1468 06-06 ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models ThinkEdit: Verdolmetschbare Gewichte bearbeiten, um übermäßig kurzes Denken in vernünftigen Modellen zu vermischen 思考Edit:将可解释的重量编辑为 “ 解释性模型 “ 中模拟超短思考 2503.22048v3 -
1469 06-06 Rapid training of Hamiltonian graph networks without gradient descent Schnelles Training von Hamiltonschen Graphennetzwerken ohne Gradientenabstieg 快速培训汉密尔顿图表网络,无梯度下降 2506.06558v1 -
1470 06-06 Infinity Search: Approximate Vector Search with Projections on q-Metric Spaces Infinity Search: Ungefähre Vektorsuche mit Projektionen auf q-Metric Spaces 无限搜索: 近似矢量搜索和 q- 计量空间预测 2506.06557v1 -
1471 06-06 SDN-Based False Data Detection With Its Mitigation and Machine Learning Robustness for In-Vehicle Networks SDN-basierte falsche Datenerkennung mit seiner Abmilderung und Machine Learning Robustheit für In-Vehicle-Netzwerke 基于SDN的假数据探测,其减轻和机械学习强力适用于车辆内网络 2506.06556v1 -
1472 06-06 Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features Theoretische Grundlagen des Deep Copula Klassifikators: Ein generativer Ansatz zur Modellierung abhängiger Merkmale 深 Cocula 分类法理论基础:建模附属地貌的开创性方法 2505.22997v2 -
1473 06-06 GeoClip: Geometry-Aware Clipping for Differentially Private SGD GeoClip: Geometry-Aware Clipping für unterschiedlich private SGD GeoClip: 区分专用 SGD 的几何- 软件剪切 2506.06549v1 -
1474 06-06 MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping MINT: Multimodale Instruktions-Tuning mit multimodaler Interaktions-Gruppierung MINT: 与多式互动组合组合的多式指示调制 2506.02308v3 -
1475 06-06 Mixed-curvature decision trees and random forests Entscheidungsbäume mit gemischter Krümmung und zufällige Wälder 混合精度决定树和随机森林 2410.13879v3 -
1476 06-06 GNNAnatomy: Rethinking Model-Level Explanations for Graph Neural Networks GNNAnatomy: Rethinking Model-Level Erklärungen für Graph Neuronale Netzwerke GNNN 解剖:重新思考图形神经网络的模型级解释 2406.04548v3 -
1477 06-06 Direct Fisher Score Estimation for Likelihood Maximization Direkte Fisher-Score-Schätzung für Likelihood Maximierung 渔业直接分数估计,以争取使机会最大化 2506.06542v1 -
1478 06-06 LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs LabSafety Bench: Benchmarking LLMs zu Sicherheitsfragen in wissenschaftlichen Laboren LabSafety Bench:针对科学实验室安全问题的LLM基准测试 2410.14182v3 -
1479 06-06 Rational Tuning of LLM Cascades via Probabilistic Modeling Rationales Tuning von LLM-Kaskaden durch probabilistische Modellierung 通过概率建模对LLM Casscades 进行合理测算 2501.09345v4 -
1480 06-06 Hierarchical and Collaborative LLM-Based Control for Multi-UAV Motion and Communication in Integrated Terrestrial and Non-Terrestrial Networks Hierarchische und kollaborative LLM-basierte Steuerung für Multi-UAV-Bewegung und Kommunikation in integrierten terrestrischen und nicht-terrestrischen Netzwerken 在综合陆地和非地球网络中以基于LLM(基于LLM)的基于控制系统,以控制多功能、无人驾驶航空器的移动和通信 2506.06532v1 -
1481 06-06 Training-Free Constrained Generation With Stable Diffusion Models Trainingsfreie eingeschränkte Generierung mit stabilen Diffusionsmodellen 使用稳定扩散模型的免训练约束生成 2502.05625v3 -
1482 06-06 Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs Sharp Gap-Dependent Variance-Aware Regret Bounds für Tabular MDPs 表格型MDP的尖锐、间隙相关且方差感知的遗憾界 2506.06521v1 -
1483 06-06 A Systematic Review of Poisoning Attacks Against Large Language Models Eine systematische Überprüfung von Giftangriffen gegen große Sprachmodelle 系统审查针对大语言模式的中毒袭击 2506.06518v1 -
1484 06-06 MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding MindLLM: Ein subject-agnostisches und vielseitiges Modell für fMRI-to-Text Decodierung MindLLM:一种与受试者无关的通用fMRI到文本解码模型 2502.15786v2 -
1485 06-06 InstantFT: An FPGA-Based Runtime Subsecond Fine-tuning of CNN Models InstantFT: Eine FPGA-basierte Laufzeit-Feinabstimmung von CNN-Modellen im Subsekundenbereich InstantFT:基于FPGA的CNN模型运行时亚秒级微调 2506.06505v1 -
1486 06-06 Optimal Rates in Continual Linear Regression via Increasing Regularization Optimale Raten in kontinuierlicher linearer Regression durch zunehmende Regularisierung 通过增加正规化,持续线性回归的优化率 2506.06501v1 -
1487 06-06 Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms Synthetische Problemgenerierung für die Vernunft über Qualität-Diversität-Algorithmen 通过质量-多样性算法导致的原因产生合成问题 2506.06499v1 -
1488 06-06 Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent Verbesserte Finite-Partikel-Konvergenzraten für Steinvariationsgradabszenz 改进斯坦因变化而变的渐变后代的有限粒子聚合率 2409.08469v3 -
1489 06-06 Synthetic Text Generation for Training Large Language Models via Gradient Matching Synthetische Textgenerierung für das Training großer Sprachmodelle über Gradient Matching 通过 “ 渐级匹配 “ 培训大语言模型的合成文本生成 2502.17607v2 -
1490 06-06 Differentiable Quadratic Optimization For The Maximum Independent Set Problem Unterschiedliche quadratische Optimierung für das maximale unabhängige Set-Problem 最大独立集集问题可区别的二次二次曲线优化 2406.19532v7 -
1491 06-06 RelGNN: Composite Message Passing for Relational Deep Learning RelGNN: Composite-Botschaft für relationales Deep Learning RelGNN:用于关系深深学习的综合信息传递 2502.06784v2 -
1492 06-06 Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks Alternierende Gradientenströme: Eine Theorie des Feature-Lernens in zweischichtigen Neuronalen Netzwerken 交错的渐变流:两层神经网络中的特色学习理论 2506.06489v1 -
1493 06-06 Diversity of Transformer Layers: One Aspect of Parameter Scaling Laws Diversity of Transformer Layers: Ein Aspekt der Parameterskalierungsgesetze 变形层多样性:参数面积法的一个方面 2505.24009v2 -
1494 06-06 Membership Inference Attacks for Unseen Classes Mitgliedschafts-Inferenz-Angriffe für ungesehene Klassen 对不见人见的类类的推论攻击 2506.06488v1 -
1495 06-06 Neural-Augmented Kelvinlet: Real-Time Soft Tissue Deformation with Multiple Graspers Neural-Augmented Kelvinlet: Echtzeit-Soft Tissue-Deformation mit mehreren Graspers 神经加速开尔文莱:实时软体组织变形,带多颗粒变形 2506.08043v1 -
1496 06-06 A Certified Unlearning Approach without Access to Source Data Ein zertifizierter Unlearning-Ansatz ohne Zugriff auf Quelldaten 未经认证的不学习方法,无法获得源数据 2506.06486v1 -
1497 06-06 The Economic Dispatch of Power-to-Gas Systems with Deep Reinforcement Learning: Tackling the Challenge of Delayed Rewards with Long-Term Energy Storage Die wirtschaftliche Dispatch von Power-to-Gas-Systemen mit tiefem Ausbau-Lernen: Die Herausforderung von verzögerten Belohnungen mit langfristiger Energiespeicherung bewältigen 以深入强化学习的方式经济发送电力到地电系统:利用长期能源储存应对延迟奖励的挑战 2506.06484v1 -
1498 06-06 Noise Consistency Regularization for Improved Subject-Driven Image Synthesis Lärmkonsistenz-Regularisierung für verbesserte themengetriebene Bildsynthese 用于改进主题驱动图像合成的噪声一致性正则化 2506.06483v1 -
1499 06-06 TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness TimeRecipe: Ein Time-Series-Prognose-Rezept über Benchmarking Module Level Wirksamkeit 时间Recpe: 通过基准标准模块的实时序列预测食谱 2506.06482v1 -
1500 06-06 Unveiling Markov Heads in Pretrained Language Models for Offline Reinforcement Learning Markov-Köpfe in vorgebildeten Sprachmodellen für Offline-Verstärkungslernen enthüllen 离线强化学习培训前语言模式中未排行的Markov负责人 2409.06985v2 -
1501 06-06 DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation DELT: Ein einfaches Diversity-orientiertes EarlyLate-Training für die Datensatzdestillation DELT: 以简单多样性驱动的数据集蒸馏早期远程培训 2411.19946v2 -
1502 06-06 Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage Kosteneffizientes LLM-Training mit Lifetime-Aware Tensor-Offloading über GPUDirect Storage 通过GPUPUDirect 储存进行终身软件传感器脱载的成本效益高的LLM培训 2506.06472v1 -
1503 06-06 SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation SIGMA:通过西布林-引导蒙特卡罗增殖改进大语言示范理由说明 2506.06470v1 -
1504 06-06 BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning BRiTE: Bootstrapping verstärkter Denkprozess zur Verbesserung von Sprachmodellen Brite:强化强化思考程序以加强语言示范理由 2501.18858v2 -
1505 06-06 Latent Thought Models with Variational Bayes Inference-Time Computation Latent Thought Models mit Variational Bayes Inferenz-Zeit-Computation 具有变异性贝湾推推算-时间计算法的古老思想模型 2502.01567v2 -
1506 06-06 FairICP: Encouraging Equalized Odds via Inverse Conditional Permutation FairICP: Förderung ausgeglichener Odds durch inverse Conditional Permutation FairICP:通过反条件性变异鼓励平等奇数 2404.05678v4 -
1507 06-06 Towards Infant Sleep-Optimized Driving: Synergizing Wearable and Vehicle Sensing in Intelligent Cruise Control Auf dem Weg zum schlafoptimierten Fahren für Säuglinge: Verwendbares und Fahrzeug sensibilisieren in intelligenter Kreuzfahrtsteuerung 婴儿睡眠优化驾驶:在智能巡航控制中统一穿戴和车辆遥感 2506.06459v1 -
1508 06-06 WISCA: A Consensus-Based Approach to Harmonizing Interpretability in Tabular Datasets WISCA: Ein auf Konsens basierender Ansatz zur Harmonisierung der Dolmetschbarkeit in Tabellendatensätzen WISSCA: 统一图表数据集解释性的协商一致办法 2506.06455v1 -
1509 06-06 LETS Forecast: Learning Embedology for Time Series Forecasting LETS-Prognose: Lernembedologie für Zeitreihenprognosen LETS预测: 学习时间序列预测的生态学 2506.06454v1 -
1510 06-06 Tackling Feature and Sample Heterogeneity in Decentralized Multi-Task Learning: A Sheaf-Theoretic Approach Schlagen von Feature und Sample Heterogenität im dezentralisierten Multi-Task-Lernen: Ein schaftheoretischer Ansatz 处理分散化多任务学习中的地貌特征和抽样差异:沙夫理论方法 2502.01145v2 -
1511 06-06 Canonical Autoregressive Generation Kanonische autoregressive Generation Canonic 自动后退一代 2506.06446v1 -
1512 06-06 Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Saffron-1: Auf dem Weg zu einem Skalierungsparadigma für die LLM-Sicherheitssicherung 山红-1:努力为LLM公司安全保证建立一个推论范围扩增范例 2506.06444v1 -
1513 06-06 Unlocking Chemical Insights: Superior Molecular Representations from Intermediate Encoder Layers Entsperren von chemischen Erkenntnissen: Überlegene molekulare Darstellungen von Zwischen-Encoder-Schichten 解锁化学透视:中间编码层的高级分子代表 2506.06443v1 -
1514 06-06 A Framework of decision-relevant observability: Reinforcement Learning converges under relative ignorability Ein Rahmen entscheidungsrelevanter Beobachtbarkeit: Verstärktes Lernen konvergiert unter relativer Entzündlichkeit 与决策相关的可观察性框架:加强学习在相对忽视的情况下趋同 2504.07722v4 -
1515 06-06 Exploring Diffusion Transformer Designs via Grafting Erforschen von Diffusion Transformer Designs über Grafting 探索通过Garfting 开发传播变异器设计 2506.05340v2 -
1516 06-06 Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias Eigenspektrumanalyse von Neuronalen Netzwerken ohne Aspect Ratio Bias 对无视比比比的神经网络进行微微光谱分析 2506.06280v1 -
1517 06-06 STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis STARFlow: Skalierung von Latent Normalisierungsströmen für hochauflösende Bildsynthese STARRFlow: 增加高分辨率图像合成的中端正常化流程 2506.06276v1 -
1518 06-06 Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding Filmfakten und Fibs (MF$^2$): Ein Benchmark für langes Filmverständnis 电影事实和纤维(MF$2美元):长期电影理解基准 2506.06275v1 -
1519 06-06 Do Large Language Models Reason Causally Like Us? Even Better? Haben große Sprachmodelle Grund ursächlich wie wir? Noch besser? 大语言模型的原因和我们一样吗?更好吗? 2502.10215v2 -
1520 06-06 Covering Number of Real Algebraic Varieties and Beyond: Improved Bounds and Applications Abdeckung der Anzahl der realen algebraischen Sorten und darüber hinaus: Verbesserte Grenzen und Anwendungen 覆盖实际代数差异数目及以后:改善边界和应用 2311.05116v4 -
1521 06-06 Sketched Equivariant Imaging Regularization and Deep Internal Learning for Inverse Problems Sketched Equivariant Imaging Regularization und Deep Internal Learning für inverse Probleme 用于反向问题的预言的等同成像正规化和深内部学习 2411.05771v4 -
1522 06-06 Normalizing Flows are Capable Generative Models Normalisierende Strömungen sind fähige Generative Modelle 正常流动是能够生成的模型 2412.06329v3 -
1523 06-06 Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens Reflect-then-Plan: Offline-Modellbasierte Planung durch eine doppelt Bayesian-Lens 反映时计划:通过杜布里·巴耶斯冷镜头进行离线示范规划 2506.06261v1 -
1524 06-06 DPCore: Dynamic Prompt Coreset for Continual Test-Time Adaptation DPCore: Dynamischer Prompt-Kernsatz für kontinuierliche Test-Zeit-Anpassung DPCore: 持续试验-时间适应动态快速核心集 2406.10737v4 -
1525 06-06 PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time PersonaAgent: Wenn große Sprachmodell-Agenten Personalisierung zur Testzeit treffen 人 员:当大语言示范媒介在试验时间达到个性化时 2506.06254v1 -
1526 06-06 Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models Sparse Autoencoder lernen monosemantische Funktionen in Vision-Language-Modellen 在视觉语言模型中学习数学特征 2504.02821v2 -
1527 06-06 Lagrangian-based Equilibrium Propagation: generalisation to arbitrary boundary conditions & equivalence with Hamiltonian Echo Learning Lagrangian-basierte Equilibrium Propagation: Verallgemeinerung zu willkürlichen Randbedingungen & Äquivalenz mit Hamiltonian Echo Learning Lagrangian 以拉格朗江为基础的平衡传播:对任意边界条件的概括化和与汉密尔顿回声学习的等同 2506.06248v1 -
1528 06-06 Neural Responses to Affective Sentences Reveal Signatures of Depression Neurale Reaktionen auf Affective Sentences Offenbarungssignaturen von Depressionen 对消极判决的神经反应 反萧条征兆 2506.06244v1 -
1529 06-06 LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models LlavaGuard: Ein offenes VLM-basiertes Framework zur Sicherung von Vision-Datensätzen und -Modellen LlavaGuard:基于VLM的公开保护视觉数据集和模型框架 2406.05113v3 -
1530 06-06 fairmetrics: An R package for group fairness evaluation fairmetrics: Ein R-Paket für die Bewertung von Gruppengerechtigkeit 公平度:团体公平评估R包件 2506.06243v1 -
1531 06-06 TQml Simulator: Optimized Simulation of Quantum Machine Learning TQml Simulator: Optimierte Simulation des Quantum Machine Learning TQml 模拟器: 量子机器学习优化模拟 2506.04891v2 -
1532 06-06 Towards an Explainable Comparison and Alignment of Feature Embeddings Auf dem Weg zu einem erklärbaren Vergleich und der Ausrichtung von Feature-Embeddings 争取实现可解释的比较和地貌隐含物的一致 2506.06231v1 -
1533 06-06 NeurNCD: Novel Class Discovery via Implicit Neural Representation NeurNCD: Novel Class Discovery über Implizite Neuraldarstellung NeurNCD: 通过隐性神经代表方式发现新类 2506.06412v1 -
1534 06-06 Fréchet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets Fréchet Radiomic Distance (FRD): Ein vielseitiges Metric für den Vergleich medizinischer Bilddatensätze Fréchet放射性距离(FRD):用于比较医疗成像数据集的Versatile计量仪 2412.01496v2 -
1535 06-06 Corrector Sampling in Language Models Korrektor-Probenahme in Sprachmodellen 语言模型中校正器样本 2506.06215v1 -
1536 06-06 Model-Driven Graph Contrastive Learning Modellgetriebenes Graph Kontrastives Lernen 模型驱动图反比学习 2506.06212v1 -
1537 06-06 Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning Pseudo-Etikettierung trifft Labelglättung für geräuscharmes partiales Label-Lernen Pseudo 标签会满足新闻部分标签学习的标签平滑 2402.04835v3 -
1538 06-06 Kinetics: Rethinking Test-Time Scaling Laws Kinetik: Überdenken von Test-Zeit-Skalierungsgesetzen 动因:重新思考试验时间扩增法 2506.05333v2 -
1539 06-06 ByzSecAgg: A Byzantine-Resistant Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment ByzSecAgg: Ein Byzantinisch-Resistent Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment ByzsecAgg:基于编码计算和矢量承诺的联邦学习拜占庭 – – 相对安全安全聚合计划 2302.09913v4 -
1540 06-06 How to craft a deep reinforcement learning policy for wind farm flow control Wie man eine tief greifende Lernpolitik für die Steuerung des Windparkflusses schafft 如何为风力农场流动控制制定深入强化学习政策 2506.06204v1 -
1541 06-06 MLOps with Microservices: A Case Study on the Maritime Domain MLOps mit Microservices: Eine Fallstudie zum maritimen Bereich 具有微服务的多边业务方案:海洋领域案例研究 2506.06202v1 -
1542 06-06 Quantifying the Optimization and Generalization Advantages of Graph Neural Networks Over Multilayer Perceptrons Quantifizierung der Optimierungs- und Generalisierungsvorteile von Graphen-Neuralnetzwerken gegenüber mehrschichtigen Perzeptronen 量化多层倍感图像神经网络的优化和普及优势 2306.13926v3 -
1543 06-06 DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models DyGMamba: Langfristige zeitliche Abhängigkeiten auf kontinuierlich-zeitdynamischen Graphen mit State Space Models effizient modellieren DyGMamba:利用国家空间模型以连续时动态图表作为有效模拟长期时间依赖模型 2408.04713v4 -
1544 06-06 Transformative or Conservative? Conservation laws for ResNets and Transformers Transformative oder konservative? Erhaltungsgesetze für ResNets und Transformer 改革还是保守? ResNets和变异器的养护法 2506.06194v1 -
1545 06-06 ICU-TSB: A Benchmark for Temporal Patient Representation Learning for Unsupervised Stratification into Patient Cohorts ICU-TSB: Benchmark für zeitliche Patientenrepräsentanz-Lernen zur unüberwachten Schichtung in Patientenkohorten ICU-TSB: 用于在不受监督的情况下将患者分划入患者群体的时间病人代表制学习基准 2506.06192v1 -
1546 06-06 Physics-Informed Neural Networks for Control of Single-Phase Flow Systems Governed by Partial Differential Equations Physik-informierte Neuronale Netze zur Steuerung von einphasigen Flusssystemen, die durch partielle Differentialgleichungen gesteuert werden 用于控制按部分差异等量划分的单阶段流动系统的物理内建神经网络 2506.06188v1 -
1547 06-06 Antithetic Noise in Diffusion Models Antithetischer Lärm bei Diffusionsmodellen 扩散模型中的抗异静能 2506.06185v1 -
1548 06-06 SemiOccam: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels SemiOccam: Ein robustes semi-überwachtes Bilderkennungsnetzwerk mit Sparse-Labels SemiOccam:使用稀疏标签的鲁棒半监督图像识别网络 2506.03582v2 -
1549 06-06 A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization Eine theoretische Studie der (Hyper-)Selbstachtung durch das Lens of Interactions: Repräsentation, Ausbildung, Verallgemeinerung 通过互动的镜头进行(Hyper)自我关注的理论研究:代表性、培训、普遍化 2506.06179v1 -
1550 06-06 CoxNTF: A New Approach for Joint Clustering and Prediction in Survival Analysis CoxNTF: Ein neuer Ansatz für Joint Clustering und Vorhersagen in der Überlebensanalyse CoxNTF:在生存分析中联合集群和预测的新方法 2506.06411v1 -
1551 06-06 Reusing Trajectories in Policy Gradients Enables Fast Convergence Wiederverwendung von Trajektorien in politischen Gradienten ermöglicht schnelle Konvergenz 在政策梯度中重新使用轨迹以启用快速聚合 2506.06178v1 -
1552 06-06 Improving choice model specification using reinforcement learning Verbesserung der Wahlmodellspezifikation mittels Verstärkungslernen 利用强化学习改进选择模式规格 2506.06410v1 -
1553 06-06 Emergent Response Planning in LLMs Emergent Response Planning in LLMs 新的LLMM 应急规划 2502.06258v2 -
1554 06-06 A Riemannian Optimization Perspective of the Gauss-Newton Method for Feedforward Neural Networks Riemannische Optimierungsperspektive der Gauß-Newton-Methode für neurale Netze Fefforward神经网络的高斯-牛顿方法的优化观点 2412.14031v4 -
1555 06-06 The Lock-in Hypothesis: Stagnation by Algorithm Die Lock-in Hypothese: Stagnation durch Algorithmen Lock-in 假说: 阿尔哥里特姆的停滞 2506.06166v1 -
1556 06-06 Automated Journalistic Questions: A New Method for Extracting 5W1H in French Automatisierte journalistische Fragen: Eine neue Methode zur Extraktion von 5W1H in Französisch 自动新闻问题:以法文提取5W1H的新方法 2505.14804v2 -
1557 06-06 Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning Nicht alle Rollouts sind nützlich: Down-Sampling Rollouts im LLM Verstärkungslernen 并非所有的推出都有用:在LLM强化学习中进行下选推出 2504.13818v2 -
1558 06-06 ENMA: Tokenwise Autoregression for Generative Neural PDE Operators ENMA: Tokenwise Autoregression für generative neurale PDE-Betreiber ENMA: 生成神经器 PDE 运算符的取向自动递减 2506.06158v1 -
1559 06-06 Deconfounding Multi-Cause Latent Confounders: A Factor-Model Approach to Climate Model Bias Correction Deconfounding Multi-Cause Latent Confounders: Ein Faktor-Modell Ansatz zur Klimamodell Bias Korrektur 多原因、中位本部混乱创始人:对气候模型比重纠正的因数模式办法 2408.12063v2 -
1560 06-06 Graph Neural Network Generalization with Gaussian Mixture Model Based Augmentation Graphische Neural-Netzwerk-Verallgemeinerung mit Gauß-Mischungsmodell-basierter Augmentation Gaussian 混合模型基底放大模型的神经网络通用 2411.08638v3 -
1561 06-06 ELEVATE-GenAI: Reporting Guidelines for the Use of Large Language Models in Health Economics and Outcomes Research: an ISPOR Working Group on Generative AI Report ELEVATE-GenAI: Berichterstattungsrichtlinien für den Einsatz von großen Sprachmodellen in der Gesundheitsökonomie und der Ergebnisforschung: eine ISPOR-Arbeitsgruppe für generative KI-Berichte ELEVATE-GenAI:在卫生经济学和成果研究中使用大语言模型的报告准则:ISPOR 生成式 AI 工作组报告 2501.12394v2 -
1562 06-06 Laplace Transform Based Low-Complexity Learning of Continuous Markov Semigroups Laplace Transform-basierte Low-Complexity-Lernen von kontinuierlichen Markov-Halbgruppen 连续马尔科夫半群体低复杂程度低学习 2410.14477v2 -
1563 06-06 LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws LLMs on the Line: Daten bestimmen die Loss-to-Loss-Skalierungsgesetze 线上LLM:数据确定损失到损失的缩放法 2502.12120v2 -
1564 06-06 carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks carps: Ein Rahmen für den Vergleich von N Hyperparameter-Optimierern auf M Benchmarks 在M基准上比较超超参数最佳放大器的框架 2506.06143v1 -
1565 06-06 Efficient Diffusion Models: A Survey Effiziente Diffusionsmodelle: Eine Umfrage 高效传播模型:调查 2502.06805v3 -
1566 06-06 Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models Tabelle-r1: Selbstbeaufsichtigtes und verstärktes Lernen für programmbasierte Tabellenveranlagung in kleinen Sprachmodellen 表-r1:针对基于方案的小语言模式说明理由的表格进行自我监督和强化学习 2506.06137v1 -
1567 06-06 Regret-Free Reinforcement Learning for LTL Specifications Bedauern-freie Verstärkungs-Lernen für LTL-Spezifikationen 为远程学习具体要求而后悔地免费加强学习 2411.12019v2 -
1568 06-06 VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters VisionTS: Visuelle Maskierte Autoencoder sind kostenlose Zero-Shot-Zeitreihen-Forecaster 视野:视觉蒙面自动识别器是无热零热时间序列预测器 2408.17253v4 -
1569 06-06 Similarity Matching Networks: Hebbian Learning and Convergence Over Multiple Time Scales Ähnlichkeit passender Netzwerke: Hebbisches Lernen und Konvergenz über mehrere Zeitskalan 相似性匹配网络: Hebbian 学习和多重时间尺度的趋同 2506.06134v1 -
1570 06-06 Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions Verbesserung der numerischen Stabilität des Normalisierten gegenseitigen Informations-Schätzers über hohe Abmessungen 提高高方面标准化相互信息模拟器的数值稳定 2410.07642v2 -
1571 06-06 Gradient Similarity Surgery in Multi-Task Deep Learning Gradient Ähnlichkeit Chirurgie in Multi-Task Deep Learning 多任务深层学习中的渐渐相似性外科手术 2506.06130v1 -
1572 06-06 Flow-Attentional Graph Neural Networks Fluss-aufmerksamer Graph Neuronale Netzwerke 流动动态图图神经网络 2506.06127v1 -
1573 06-06 Convergence of linear programming hierarchies for Gibbs states of spin systems Konvergenz von linearen Programmierhierarchien für Gibbs-Zustände von Spinsystemen Gibbbs旋转系统状态线性编程分级组合 2506.06125v1 -
1574 06-06 Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library Verstärkte Lernoptimierung für großformatiges Lernen: Eine effiziente und benutzerfreundliche Scaling-Bibliothek 大规模学习的强化学习优化:高效和方便用户的扩大规模图书馆 2506.06122v1 -
1575 06-06 Training on Plausible Counterfactuals Removes Spurious Correlations Training auf Plausible Counterfactals entfernt spurlose Korrelationen 关于可视反事实消除污损的培训 2505.16583v5 -
1576 06-06 Learners’ Languages Sprachen der Lernenden 学生语言 2103.01189v3 -
1577 06-06 Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness Auf dem Weg zum Lebenszyklus Unlearning Commitment Management: Messung der Stichprobenebene Unlearning Completeness 实现生命周期不学习承诺管理:衡量抽样一级不学习 2506.06112v1 -
1578 06-06 Synthetic Tabular Data: Methods, Attacks and Defenses Synthetische Tabellendaten: Methoden, Angriffe und Verteidigungen 合成图表数据:方法、攻击和防御 2506.06108v1 -
1579 06-06 The Canary’s Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text Das Echo der Kanaren: Auditing Privacy Risks of LLM-generated Synthetic Text 加那利的回声:LLM发明的合成文字的 审计隐私风险 2502.14921v2 -
1580 06-06 Label-Context-Dependent Internal Language Model Estimation for CTC Label-Context-Dependent Interne Sprachmodellschätzung für CTC 用于 CTC 的标签上下文相关内部语言模型估计 2506.06096v1 -
1581 06-06 Flexible Operator Fusion for Fast Sparse Transformer with Diverse Masking on GPU Flexible Operator Fusion für schnellen Sparse Transformer mit vielfältigem maskieren auf GPU 具有 GPU 上多种掩码的快速分散变异器灵活操作器融合 2506.06095v1 -
1582 06-06 On-board Mission Replanning for Adaptive Cooperative Multi-Robot Systems On-Board-Mission-Replanning für adaptive kooperative Multi-Roboter-Systeme 特派团适应性合作多机器人系统重新规划 2506.06094v1 -
1583 06-06 Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture Paradigmen der AI-Bewertung: Kartierungsziele, Methoden und Kultur AI 评价:绘图目标、方法和文化 2502.15620v2 -
1584 06-06 HeavyWater and SimplexWater: Watermarking Low-Entropy Text Distributions HeavyWater und SimplexWater: Wassermarkierung mit geringer Entropie Textverteilung 重水和简易水:低英基文本分发的标注水 2506.06409v1 -
1585 06-06 LinGuinE: Longitudinal Guidance Estimation for Volumetric Lung Tumour Segmentation LinGuinE: Longitudinal Guidance Schätzung für Volumetrische Lungentumorsegmentierung LinGuinE: 活性肺炎肿瘤分离的纵向指导估计 2506.06092v1 -
1586 06-06 Multidimensional Adaptive Coefficient for Inference Trajectory Optimization in Flow and Diffusion Multidimensionaler Adaptiver Koeffizient für die Inferenz-Trajektorie-Optimierung in Strömung und Diffusion 流动与传播过程中的多维可调适系数 2404.14161v3 -
1587 06-06 Multilevel neural simulation-based inference Mehrstufige neuronale simulationsbasierte Inferenz 多级神经模拟模拟推断 2506.06087v1 -
1588 06-06 A Novel, Human-in-the-Loop Computational Grounded Theory Framework for Big Social Data Ein neuartiges, Mensch-in-the-Loop Computational Grounded Theory Framework für große soziale Daten 面向大型社会数据的新型人在回路计算扎根理论框架 2506.06083v1 -
1589 06-06 Training Deep Learning Models with Norm-Constrained LMOs Ausbildung von Deep Learning-Modellen mit normgebundenen ABS 与受诺姆约束的LMO培训的LMO培训深学习模式 2502.07529v2 -
1590 06-06 Proximal Policy Distillation Proximale Politikdestillation Proximal 政策蒸馏 2407.15134v2 -
1591 06-06 System-Aware Unlearning Algorithms: Use Lesser, Forget Faster System-Aware Unlearning Algorithmen: Weniger verwenden, schneller vergessen 系统提醒 取消学习算法 : 使用更少, 忘记更快 2506.06073v1 -
1592 06-06 ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis ProofAug: Effiziente Neuraltheorem-Probung mittels feinkörniger Proof-Strukturanalyse 校对:通过精细的校对结构分析证明有效的神经理论 2501.18310v2 -
1593 06-06 Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models Bewertung von Intersektionalen Bias in Darstellungen von vortrainierten Bilderkennungsmodellen 评估培训前图像识别模型代表性的跨部门偏见 2506.03664v2 -
1594 06-06 The Influence of the Memory Capacity of Neural DDEs on the Universal Approximation Property Der Einfluss der Speicherkapazität von neuralen DDEs auf die universelle Annäherungseigenschaft 神经DDE的记忆力对普遍接近财产的影响 2505.07244v2 -
1595 06-06 Zero-Shot Detection of LLM-Generated Code via Approximated Task Conditioning Zero-Shot-Erkennung des LLM-generierten Codes über eine angenäherte Aufgabenkonditionierung 通过近似任务设置条件对LLM生成的代码进行零热检测 2506.06069v1 -
1596 06-06 Feedforward Few-shot Species Range Estimation Einspeisung Wenig abgespuckte Arten Range-Schätzung 向前进的微粒物种范围估计 2502.14977v2 -
1597 06-06 HIGHT: Hierarchical Graph Tokenization for Molecule-Language Alignment HIGHT: Hierarchische Graphen-Tokenisierung für die Ausrichtung von Molekülen HIG: 分子-语言对齐的等级图形方位化 2406.14021v2 -
1598 06-06 Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large $p$ Ab Initio Nichtparametrische Variable Auswahl für skalierbare symbolische Regression mit großen $p$ 用于大额p美元可缩放的符号回归的非参数变量选择 2410.13681v2 -
1599 06-06 Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation Effiziente feinkörnige Anleitung für die Diffusionsmodellbasierte Symbolische Musikgenerierung 以模型为基础的示范符号音乐制作的 扩散模型的精效微粒指导 2410.08435v3 -
1600 06-06 MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding MedXpertQA: Benchmarking von medizinischem Verständnis auf Expertenebene MedXperttQA:确定专家级医疗理由和理解基准 2501.18362v3 -
1601 06-06 TRUST: Test-time Resource Utilization for Superior Trustworthiness TRUST: Testzeit-Ressourcennutzung für höchste Vertrauenswürdigkeit 信托基金:高可信赖性试验时间利用资源 2506.06048v1 -
1602 06-06 Diffusion-Based Hierarchical Graph Neural Networks for Simulating Nonlinear Solid Mechanics Diffusionsbasierte Hierarchische Graphen-Neuralnetze zur Simulation nichtlinearer Festmechanik 用于模拟非线性固体力学的基于扩散的分层图神经网络 2506.06045v1 -
1603 06-06 SDS-Net: Shallow-Deep Synergism-detection Network for infrared small target detection SDS-Net: Shallow-Deep Synergism-Detektionsnetzwerk für Infrarot-Kleinzielerkennung SDS-Net:红外小型目标探测浅水深协同-探测网络 2506.06042v1 -
1604 06-06 Tensor-to-Tensor Models with Fast Iterated Sum Features Tensor-to-Tensor-Modelle mit schnell iterierten Sum-Features 具有快速迭代超特性的电传感器到传感器模型 2506.06041v1 -
1605 06-06 Do-PFN: In-Context Learning for Causal Effect Estimation Do-PFN: In-Context-Lernen für die Schätzung des Kausaleffekts DoPFN: 因果关系估算的理论内学习 2506.06039v1 -
1606 06-06 CRISP-NAM: Competing Risks Interpretable Survival Prediction with Neural Additive Models CRISP-NAM: Konkurrenzfähige Risiken interpretierbare Überlebensvorhersage mit neuralen Additivenmodellen CRIISP-NAM: 与神经添加模型相竞争的风险解释性生存预测 2505.21360v2 -
1607 06-06 A PID-Controlled Tensor Wheel Decomposition Model for Dynamic Link Prediction Ein PID-gesteuertes Tensor-Radzersetzungsmodell für dynamische Link-Vorhersage PID-控制式天线仪式轮式动态链接预测分解模型 2505.14211v2 -
1608 06-06 Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling Neural Flow Diffusion Modelle: Erlernbarer Vorwärtsprozess für verbesserte Diffusionsmodellierung 神经流流传播模型:改进传播模型的可学习前进进程 2404.12940v3 -
1609 06-06 SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations SDE Matching: Skalierbares und simulationsfreies Training latenter stochastischer Differentialgleichungen SDE 匹配:可缩放和模拟无模拟的静态碎裂差异等量模拟培训 2502.02472v2 -
1610 06-06 Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification Probenspezifische Geräuschinjektion für die Diffusions-basierte Adversarialreinigung 以传播为基础的反向净化的样本特定噪音注射 2506.06027v1 -
1611 06-06 On Inverse Problems, Parameter Estimation, and Domain Generalization Inverse Probleme, Parameterschätzung und Domain-Verallgemeinerung 反向问题、参数估计和域通用化 2506.06024v1 -
1612 06-06 Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation Begründung durch Ausführung: Vereinheitlichung von Prozess- und Ergebnisprämien für die Codegenerierung 执行中的理由:代码生成的统一程序和结果奖励 2412.15118v2 -
1613 06-06 Improving Customer Service with Automatic Topic Detection in User Emails Verbesserung des Kundenservices mit automatischer Themenerkennung in Benutzer-E-Mails 改进用户电子邮件自动专题检测客户服务 2502.19115v3 -
1614 06-06 Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems Unisoma: Ein Unified Transformer-basierter Solver für Multi-Solid-Systeme Unisoma:多层系统统一变压器解决方案 2506.06021v1 -
1615 06-06 One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy Ein Stein, zwei Vögel: Verstärkung der feindlichen Verteidigung durch die Linse der distributiven Diskrepanz 一石,两鸟:通过分布性差异的镜头加强反向防御 2503.02169v2 -
1616 06-06 A Theoretical Justification for Asymmetric Actor-Critic Algorithms Eine theoretische Rechtfertigung für asymmetrische Aktor-Kritische Algorithmen 关于非对称精算作用者-批评算法的理论理由 2501.19116v2 -
1617 06-06 A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces Ein Fisher-Rao Gradientenstrom für Entropie-regularisierte Markov-Entscheidungsprozesse in polnischen Räumen 用于波兰空间的环丙正规化的Markov决策过程的Fisher-Rao梯度流 2310.02951v3 -
1618 06-06 pLDDT-Predictor: High-speed Protein Screening Using Transformer and ESM2 pLDDT-Predictor: High-Speed Protein Screening mit Transformer und ESM2 pLDDT-预测:使用变压器和无害环境管理2进行高速蛋白素筛选 2410.21283v3 -
1619 06-06 Clone-Robust Weights in Metric Spaces: Handling Redundancy Bias for Benchmark Aggregation Klon-Robuste Gewichte in metrischen Räumen: Umgang mit Redundanz-Bias für Benchmark-Aggregation 中度空间的克隆-罗布体重量:为基准聚合处理冗余琵琶湖 2502.03576v2 -
1620 06-06 LightGTS: A Lightweight General Time Series Forecasting Model LightGTS: Ein leichtes allgemeines Zeitreihen-Vorhersagemodell LightGTS:轻型一般时间序列预测模型 2506.06005v1 -
1621 06-06 What Really is a Member? Discrediting Membership Inference via Poisoning Was ist wirklich ein Mitglied? Diskreditierende Mitgliedschaftsinferenz durch Vergiftung 真正的成员是什么?通过投毒使成员推断失效 2506.06003v1 -
1622 06-06 LaDEEP: A Deep Learning-based Surrogate Model for Large Deformation of Elastic-Plastic Solids LaDEEP: Ein Deep Learning-basiertes Surrogate-Modell für große Deformation von elastischen-Plastischen Festkörpern LaDEEEP:一个基于深层学习的弹性固体大规模变形代金模型 2506.06001v1 -
1623 06-06 Machine learning for in-situ composition mapping in a self-driving magnetron sputtering system Maschinelles Lernen für die In-situ-Kompositionsmapping in einem selbstfahrenden Magnetron-Sputtersystem 在自驾驶磁磁突飞系统内进行原地成像绘图的机器学习 2506.05999v1 -
1624 06-06 RETENTION: Resource-Efficient Tree-Based Ensemble Model Acceleration with Content-Addressable Memory RETENTION: Ressourceneffizientes baumbasiertes Ensemblemodell Beschleunigung mit Content-Addressablem Speicher RETENTION:利用内容可寻址存储器实现资源高效的树集成模型加速 2506.05994v1 -
1625 06-06 Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning Diffusions-Politiken für die Verallgemeinerung von Out-of-Distribution im Offline-Verstärkungs-Lernen 在离线强化学习中推广分配外普遍化政策 2307.04726v4 -
1626 06-06 Peri-LN: Revisiting Normalization Layer in the Transformer Architecture Peri-LN: Normalisierungsschicht in der Transformer-Architektur neu aufsuchen Peri-LN: 变形器结构中重新审视正常化层 2502.02732v3 -
1627 06-06 Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning Dynamische Mischung von Progressivem Parameter-Effizienter Expertenbibliothek für lebenslanges Roboterlernen 终身机器人学习进步参数有效专家图书馆动态混合体 2506.05985v1 -
1628 06-06 Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations Jacobian Sparse Autoencoder: Sparsify Computations, nicht nur Aktivierungen Jacobian Sparass Autoencoders: 简化计算,而不仅仅是激活 2502.18147v2 -
1629 06-06 A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations Eine Übersicht über automatische Bewertungsmethoden zu Text-, Bild- und Sprachgenerierungen 文字、视觉和语音一代自动评价方法调查 2506.10019v1 -
1630 06-06 HO-FMN: Hyperparameter Optimization for Fast Minimum-Norm Attacks HO-FMN: Hyperparameter-Optimierung für schnelle Minimal-Norm-Angriffe HO-FMN: 快速最低目标攻击的超参数优化 2407.08806v2 -
1631 06-06 AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification AMPED: Adaptive Multi-Objektive Projektion zum Ausgleich von Exploration und Geschicklichkeit Diversifizierung AMMPED: 兼顾探索与技能多样化的适应性多目标预测 2506.05980v1 -
1632 06-06 Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning Abmilderung des katastrophalen Vergessens durch adaptive Transformer-Block-Erweiterung im föderierten Fine-Tuning 通过联邦微调中的自适应 Transformer 块扩展减轻灾难性遗忘 2506.05977v1 -
1633 06-06 Graph Deep Learning for Time Series Forecasting Graph Deep Learning für die Vorhersage von Zeitreihen 用于时间序列预测的深深学习图 2310.15978v2 -
1634 06-06 On Measuring Long-Range Interactions in Graph Neural Networks Bei der Messung von Langzeit-Interaktionen in Graphen-Neuralen Netzwerken 关于在图形神经网络中测量长距离相互作用 2506.05971v1 -
1635 06-06 Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning Schrittweiser Übergang vom Bellman Optimality Operator zum Bellman Operator im Online-Verstärkungslernen 从Bellman最佳运算符逐步过渡到在线强化学习中的Bellman运算符 2506.05968v1 -
1636 06-06 Preference Learning for AI Alignment: a Causal Perspective Bevorzugtes Lernen für die AI-Ausrichtung: eine kausale Perspektive AI一致的优先学习:从因果关系角度看问题 2506.05967v1 -
1637 06-06 Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities Auf dem Weg zu robusten multimodalen physiologischen Grundlagenmodellen: Umgang mit willkürlich fehlenden Modalitäten 争取建立强有力的多式多模式生理基金会模型:处理任意失踪方式 2504.19596v2 -
1638 06-06 Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology Subspezialitätsspezifisches Stiftungsmodell für intelligente Gastrointestinalpathologie 面向智能胃肠道病理学的亚专科基础模型 2505.21928v2 -
1639 06-06 Certification for Differentially Private Prediction in Gradient-Based Training Zertifizierung für unterschiedliche private Vorhersagen im gradient-based Training 渐进式培训中差别私人预测认证 2406.13433v3 -
1640 06-06 Diving into Self-Evolving Training for Multimodal Reasoning Eintauchen in das Selbst-Evolving Training für multimodale Vernunft 跳入多模式理由自我发展培训 2412.17451v3 -
1641 06-06 Applying XAI based unsupervised knowledge discovering for Operation modes in a WWTP. A real case: AQUAVALL WWTP Anwendung von XAI-basierten unüberwachten Kenntnissen für Betriebsmodi in einem WWTP. Ein echter Fall: AQUAVALL WWTP 在WWTP中应用基于 XAI 的未经监督的操作模式知识发现 XAI 。 一个真实案例: AQUAVALL WWTP 2506.05958v1 -
1642 06-06 Inferring entropy production in many-body systems using nonequilibrium MaxEnt Ableitung der Entropieproduktion in Vielteilchensystemen unter Verwendung von Nonequilibrium MaxEnt 利用非平衡MaxEnt推断多体系统中的熵产生 2505.10444v2 -
1643 06-06 Learning Deterministic Policies with Policy Gradients in Constrained Markov Decision Processes Deterministische Politik mit politischen Gradienten in eingeschränkten Markov-Entscheidungsprozessen lernen 在受约束的Markov决策过程中,学习具有政策分级的政策确定性政策 2506.05953v1 -
1644 06-06 How to explain grokking Wie man Grokking erklärt 如何解释怪语 2412.18624v3 -
1645 06-06 GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression GRASP: Redundante Ebenen durch adaptive Singular-Parameter für effiziente Modellkompression ersetzen GRASP: 用高效模型压缩的适应性单质参数取代冗余层 2501.00339v3 -
1646 06-06 BARK: A Fully Bayesian Tree Kernel for Black-box Optimization BARK: Ein vollbayesischer Baumkernel für Black-Box-Optimierung 用于黑盒优化的全巴伊西亚树核心圈 2503.05574v2 -
1647 06-06 AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML AutoML-Agent: Ein Multi-Agent-LLM-Framework für Full-Pipeline-AutoML AutoML-Agent: 面向全流程AutoML的多智能体LLM框架 2410.02958v2 -
1648 06-06 Additive decomposition of one-dimensional signals using Transformers Additive Zersetzung von eindimensionalen Signalen mit Transformern 使用变压器的单维信号的添加分解 2506.05942v1 -
1649 06-06 SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models SafeAuto: Knowledge-Enhanced Safe Autonomous Driving mit multimodalen Foundation-Modellen SafeAuto:利用多模式基金会模式加强知识增强安全自主驾驶 2503.00211v2 -
1650 06-06 Comparative Analysis of Modern Machine Learning Models for Retail Sales Forecasting Vergleichende Analyse von modernen Machine Learning-Modellen für die Prognose des Einzelhandels 零售销售预测现代机器学习模型比较分析 2506.05941v1 -
1651 06-06 Exponential Family Variational Flow Matching for Tabular Data Generation Exponentieller Familien-Variationsfluss passend zur tabellarischen Datengenerierung 用于制表数据的家庭流动量变化 2506.05940v1 -
1652 06-06 Quantifying Adversarial Uncertainty in Evidential Deep Learning using Conflict Resolution Quantifizierung von Unsicherheiten im Evidential Deep Learning mittels Konfliktlösung 以解决冲突的方式量化证据深层学习中存在的逆差不确定性 2506.05937v1 -
1653 06-06 Deep Learning Weather Models for Subregional Ocean Forecasting: A Case Study on the Canary Current Upwelling System Deep Learning Weather Models for Subregional Ocean Forecasting: Eine Fallstudie über das kanarische aktuelle Auftriebssystem 次区域海洋预报的深学习气象模型:关于加那利洋流上升系统的个案研究 2505.24429v2 -
1654 06-06 Machine Learning Predictions for Traffic Equilibria in Road Renovation Scheduling Machine Learning Vorhersagen für Verkehrsequilibrien in der Road Renovation Scheduling 道路翻新日程安排交通平衡的机器学习预测 2506.05933v1 -
1655 06-06 Progressive Data Dropout: An Embarrassingly Simple Approach to Faster Training Progressive Data Dropout: Ein verblüffend einfacher Ansatz zum schnelleren Training 渐进数据辍学:快速培训的一个令人尴尬的简单方法 2505.22342v2 -
1656 06-06 Can Masked Autoencoders Also Listen to Birds? Können maskierte Autoencoder auch Vögel hören? 掩码自编码器也能聆听鸟鸣吗? 2504.12880v3 -
1657 06-06 GraphGPT: Generative Pre-trained Graph Eulerian Transformer GraphGPT: Generative vortrainierte Graph Eulerian Transformer GraphGPT: 生成式预训练图欧拉Transformer 2401.00529v3 -
1658 06-06 UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning UDora: Ein einheitliches rotes Teaming-Framework gegen LLM-Agenten durch Dynamisches Entführen ihrer eigenen Vernunft UDora:通过动态劫持LLM代理商的自身理由,针对这些代理商的统一红色团队框架 2503.01908v2 -
1659 06-06 Over-PINNs: Enhancing Physics-Informed Neural Networks via Higher-Order Partial Derivative Overdetermination of PDEs Over-PINNs: Verbesserung physikinformierter neuronaler Netzwerke durch Überbestimmung von PDEs mittels partieller Ableitungen höherer Ordnung Over-PINNs:通过偏微分方程的高阶偏导数超定增强物理信息神经网络 2506.05918v1 -
1660 06-06 Rethinking Semi-supervised Segmentation Beyond Accuracy: Reliability and Robustness Halbüberwachte Segmentierung jenseits der Genauigkeit neu denken: Zuverlässigkeit und Robustheit 重新思考超越准确度的半监督分割:可靠性和强性 2506.05917v1 -
1661 06-06 Approximating Latent Manifolds in Neural Networks via Vanishing Ideals Approximierende latente Manifolds in neuralen Netzen über Vanishing Ideals 借助消失的理念在神经网络中接近的中点修饰 2502.15051v2 -
1662 06-06 DeviceScope: An Interactive App to Detect and Localize Appliance Patterns in Electricity Consumption Time Series DeviceScope: Eine interaktive App zur Erkennung und Lokalisierung von Appliance-Mustern in der Stromverbrauchs-Zeitreihe 设备范围:在电力消耗时间序列中检测和本地化应用程序模式的交互式应用程序 2506.05912v1 -
1663 06-06 CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness CAPability: Ein umfassender Visual Caption Benchmark zur Bewertung von Korrektheit und Gründlichkeit 能力:评估正确性和彻底性的综合视觉能力基准 2502.14914v3 -
1664 06-06 Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants Brauchen wir Strategien zur Neuausrichtung? Eine theoretische und empirische Studie rund um SMOTE und deren Varianten 我们需要重新平衡战略吗? 围绕SMOTE及其变体进行理论和经验研究。 2402.03819v5 -
1665 06-06 A Driving Regime-Embedded Deep Learning Framework for Modeling Intra-Driver Heterogeneity in Multi-Scale Car-Following Dynamics Ein Driving Regime-embedded Deep Learning Framework zur Modellierung von Intra-Driver Heterogenität in multi-Scale Car-Following Dynamics 用于模拟多规模汽车跟踪动态中多规模汽车内部异质性模型的驱动系统(以制度为床位的深学习框架) 2506.05902v1 -
1666 06-06 State-Covering Trajectory Stitching for Diffusion Planners State-Covering Trajektorie Stich für Diffusionsplaner 用于扩散规划器的 国家覆盖轨迹切换 2506.00895v2 -
1667 06-06 Few Labels are all you need: A Weakly Supervised Framework for Appliance Localization in Smart-Meter Series Wenige Labels sind alles, was Sie brauchen: Ein schwach überwachtes Framework für die Gerätelokalisierung in der Smart-Meter-Serie 少数标签是您所需要的:智能- 密特系列中受微弱监督的应用程序本地化框架 2506.05895v1 -
1668 06-06 Policy Optimization for Continuous-time Linear-Quadratic Graphon Mean Field Games Politikoptimierung für lineare-Quadratic Graphon Mean Field Games mit kontinuierlicher Zeit 连续时线性二次曲线平面平面运动会的政策优化 2506.05894v1 -
1669 06-06 Open Your Eyes: Vision Enhances Message Passing Neural Networks in Link Prediction Öffnen Sie Ihre Augen: Vision verbessert die Botschaft an neuralen Netzwerken in der Link-Vorhersage ” 睁开眼睛:愿景在链接预测中加强信息传递神经网络 “ 2505.08266v3 -
1670 06-06 An Uncertainty-Aware ED-LSTM for Probabilistic Suffix Prediction Ein unsicherheitsbewusstes ED-LSTM für probabilistische Suffix-Vorhersage 用于概率后缀预测的不确定性感知ED-LSTM 2505.21339v2 -
1671 06-06 Variational Inference for Quantum HyperNetworks Variationelle Schlussfolgerung für Quantum HyperNetworks 量子超网络的变异推推推 2506.05888v1 -
1672 06-06 NILMFormer: Non-Intrusive Load Monitoring that Accounts for Non-Stationarity NILMFormer: Nicht-intrusives Lastmonitoring, das Nichtstationarität berücksichtigt NILMFormer: 考虑非平稳性的非侵入式负荷监测 2506.05880v1 -
1673 06-06 Infinite-Dimensional Diffusion Models Infinite-Dimensional Diffusionsmodelle 无限不同传播模型 2302.10130v3 -
1674 06-06 A projection-based framework for gradient-free and parallel learning Ein projektionsbasierter Rahmen für gradientenfreies und paralleles Lernen 以预测为基础的无梯度和平行学习框架 2506.05878v1 -
1675 06-06 Interpretable Clustering Ensemble Interpretiertes Cluster-Ensemble 可解释的集群组合组合 2506.05877v1 -
1676 06-06 BestServe: Serving Strategies with Optimal Goodput in Collocation and Disaggregation Architectures BestServe: Servierstrategien mit optimalem Goodput in Kollokations- und Disaggregationsarchitekturen 最佳服务:在合用和拆分结构中采用最优优利用的优良战略 2506.05871v1 -
1677 06-06 Loss Functions for Predictor-based Neural Architecture Search Verlustfunktionen für vorausschauende neurale Architektursuche 预测基于神经建筑搜索的损失函数 2506.05869v1 -
1678 06-06 Stealix: Model Stealing via Prompt Evolution Stealix: Modelldiebstahl über Prompt Evolution Stealix: 通过提示演化窃取模型 2506.05867v1 -
1679 06-06 Enhancing pretraining efficiency for medical image segmentation via transferability metrics Steigerung der Vorschulungseffizienz für die Segmentierung medizinischer Bilder durch Übertragbarkeitsmetriken 通过可转让性指标,提高医疗图象分形的训练前效率 2410.18677v2 -
1680 06-06 Conditioning Diffusions Using Malliavin Calculus Konditionierung Diffusionen mit Malliavin Calculus 使用 Malliavin Calculus 有条件的传播 2504.03461v2 -
1681 06-06 Improved Allergy Wheal Detection for the Skin Prick Automated Test Device Verbesserte Allergikererkennung für das Automatisierte Testgerät Skin Prick 改进皮肤皮板自动自动测试装置的过敏 wHeal 检测 2506.05862v1 -
1682 06-06 Wavelet-based Disentangled Adaptive Normalization for Non-stationary Times Series Forecasting Wavelet-basiert Disentangled Adaptive Normalisierung für nicht-stationäre Zeitreihen Forecasting 用于非静止时报系列预测的不固定时报调适正常化 2506.05857v1 -
1683 06-06 TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research TinySQL: Ein progressiver Text-zu-SQL-Datensatz für die mechanistische Interpretationsforschung TinySQL: 用于机械解释性研究的渐进文本到SQL数据集 2503.12730v3 -
1684 06-06 BoA: Attention-aware Post-training Quantization without Backpropagation BoA: Aufmerksamkeitsbewusste Nachschulung Quantisierung ohne Backpropagation BoA: 注意培训后量化而不反言反言 2406.13474v3 -
1685 06-06 Training-Free Query Optimization via LLM-Based Plan Similarity Schulungsfreie Abfrage Optimierung über LLM-basierte Plan-Ähnlichkeit 通过LLM-基于LLM-计划相似性实现无培训查询优化 2506.05853v1 -
1686 06-06 Regional, Lattice and Logical Representations of Neural Networks Regionale, gitter- und logische Vertretungen neuraler Netzwerke 神经网络的区域、拉蒂斯和逻辑代表 2506.05834v1 -
1687 06-06 Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning Mit dem Pfeil der Zeit lernen: Hyperbolische Geometrie für rückseitig-kompatibles Repräsentationslernen 沿时间之箭学习:反向比较代表学习的超单几何学 2506.05826v1 -
1688 06-06 Domain Generalizable Knowledge Tracing via Concept Aggregation and Relation-Based Attention Domain Generalizable Knowledge Tracing über Konzeptaggregation und relationsbasierte Aufmerksamkeit 通过概念聚合和基于关系的关注寻找可普遍适用的知识 2407.02547v2 -
1689 06-06 SAGE: A Framework of Precise Retrieval for RAG SAGE: Ein Rahmen für präzises Retrieval für RAG SAGE:RAG的精确检索框架 2503.01713v3 -
1690 06-06 SageAttention2++: A More Efficient Implementation of SageAttention2 SageAttention2++: Effizientere Umsetzung von SageAttention2 SageAttention2++:更有效地实施SageAttention2 2505.21136v3 -
1691 06-06 Non-Asymptotic Length Generalization Nicht-Asymptotische Längenverallgemeinerung 非负时长一般化 2506.03085v2 -
1692 06-06 Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants 高有机铜剂在 lvLiNGAM 中的因果效应识别 2506.05202v2 -
1693 06-06 Taming Knowledge Conflicts in Language Models Zähmen von Wissenskonflikten in Sprachmodellen 在语言模式中讨论知识冲突 2503.10996v2 -
1694 06-06 Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models Boolesche Matrix-Logik-Programmierung für aktives Lernen von Genfunktionen in metabolischen Netzwerkmodellen im Genommaßstab 在基因组规模代谢网络模型中积极学习基因功能的布尔矩阵逻辑逻辑编程 2405.06724v4 -
1695 06-06 SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference SpargeAttention: Genaue und trainingsfreie Sparse Attention zur Beschleunigung jeder Modellinferenz SpargeAttention:准确且免训练的稀疏注意力,加速任意模型推理 2502.18137v5 -
1696 06-06 Overcoming Challenges of Partial Client Participation in Federated Learning : A Comprehensive Review Herausforderungen der Teilkundenbeteiligung am Föderierten Lernen überwinden : Ein umfassender Überblick 克服部分客户参与联邦学习的挑战:全面审查 2506.02887v2 -
1697 06-06 BOLD: Boolean Logic Deep Learning BOLD: Boolean Logic Deep Learning 布尔逻辑深层学习 2405.16339v2 -
1698 06-06 Positional Encoding meets Persistent Homology on Graphs Positionale Kodierung trifft Persistente Homologie auf Graphen 定位编码与图纸上的常同理学相识 2506.05814v1 -
1699 06-06 E-LDA: Toward Interpretable LDA Topic Models with Strong Guarantees in Logarithmic Parallel Time E-LDA: Auf dem Weg zu interpretierbaren LDA-Themenmodellen mit starken Garantien in logarithmischer Parallelzeit E-LDA:争取在对数平行时间建立具有有力保证的可解释的LDA专题模型 2506.07747v1 -
1700 06-06 Unintended Harms of Value-Aligned LLMs: Psychological and Empirical Insights Unbeabsichtigte Schäden durch wertorientierte LLMs: Psychologische und empirische Erkenntnisse 价值观对齐的LLMs的意外危害:心理学与实证洞见 2506.06404v1 -
1701 06-06 ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think ZeroFlow: Katastrophisches Vergessen zu überwinden ist einfacher als Sie denken 零花:克服灾难的遗忘比你想象的容易 2501.01045v4 -
1702 06-06 Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model Neuraler Zusammenbruch in kumulativen Linkmodellen für Ordinalregression: Eine Analyse mit ungezwungenem Feature Model Ordinal 倒退累积链接模型中的神经折叠:含有不受限制特性模型的分析 2506.05801v1 -
1703 06-06 Uncertainty Propagation in the Fast Fourier Transform Ungewissheitsverbreitung in der schnellen Fourier-Transformation 快速傅里叶变形中的不确定性传播 2504.10136v2 -
1704 06-06 Option Pricing Using Ensemble Learning Optionspreise mit Ensemble-Lernen 使用集合学习的选项定价 2506.05799v1 -
1705 06-06 Maximum Entropy Reinforcement Learning with Diffusion Policy Maximales Entropie-Verstärkung-Lernen mit Diffusionspolitik 采用传播政策进行最大内容强化学习 2502.11612v3 -
1706 06-06 EqCollide: Equivariant and Collision-Aware Deformable Objects Neural Simulator EqCollide: Gleichwertige und kollisionsfähige verformbare Objekte Neural Simulator EqCollide: 等同和相撞-可变形物体神经模拟器 2506.05797v1 -
1707 06-06 TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining TiC-LM: Ein Web-Scale-Benchmark für zeitkontinuierliches LLM-Vortraining TiC-LM:面向时间持续LLM预训练的网络规模基准 2504.02107v3 -
1708 06-06 Exploiting Similarity for Computation and Communication-Efficient Decentralized Optimization Ausnutzung von Ähnlichkeiten für die Berechnung und Kommunikation-Effiziente dezentralisierte Optimierung 利用计算和通信效率-权力下放优化的相似性 2506.05791v1 -
1709 06-06 Robust sensor fusion against on-vehicle sensor staleness Robuste Sensor-Fusion gegen Stillstand des Sensors im Fahrzeug 压制车辆传感器悬浮剂的强感传感器聚变 2506.05780v1 -
1710 06-06 Pegasus: A Universal Framework for Scalable Deep Learning Inference on the Dataplane Pegasus: Ein universelles Framework für skalierbare Deep Learning-Schlussfolgerungen auf dem Dataplane Pegasus:数据飞机可缩放深学习推理通用框架 2506.05779v1 -
1711 06-06 Exploring Dark Knowledge under Various Teacher Capacities and Addressing Capacity Mismatch Dunkles Wissen unter verschiedenen Lehrer-Kapazitäten erforschen und Kapazitäts-Mismatch angehen 在各种教师能力之下探索黑暗知识,解决能力不足问题 2405.13078v2 -
1712 06-06 ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation ActionPiece: Kontextuelle Tokenisierung von Handlungssequenzen für eine generative Empfehlung ActionPiece: 产生建议的背景化行动序列 2502.13581v2 -
1713 06-06 Evaluating Neuron Explanations: A Unified Framework with Sanity Checks Neuron-Erklärungen bewerten: Ein einheitliches Framework mit Sanitätskontrollen 评估内向解释:一个统一框架,检查是否合乎情理 2506.05774v1 -
1714 06-06 SGD Jittering: A Training Strategy for Robust and Accurate Model-Based Architectures SGD Jittering: Eine Schulungsstrategie für robuste und präzise modellbasierte Architekturen SGD Jittering: 面向稳健且准确的基于模型的架构的训练策略 2410.14667v3 -
1715 06-06 Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks Visualisierung, Nachdenken und Abbau der verlorenen Landschaft tiefer neuraler Netzwerke 深神经网络丧失的地貌景观 2405.12493v2 -
1716 06-06 Predicting ICU In-Hospital Mortality Using Adaptive Transformer Layer Fusion Vorhersage der Krankenhaussterblichkeit der Intensivstation mittels adaptiver Transformer Layer Fusion 利用自适应Transformer层融合预测ICU院内死亡率 2506.04924v2 -
1717 06-06 AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation AANet: Virtuelles Screening unter struktureller Unsicherheit über Alignment und Aggregation AANet:通过对齐和聚合在结构不确定性下进行虚拟筛选 2506.05768v1 -
1718 06-06 On the Query Complexity of Verifier-Assisted Language Generation Zur Fragekomplexität der sprachlichen Generierung von Prüfern 关于验证者协助语言生成的查询复杂程度 2502.12123v2 -
1719 06-06 BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning BiTrajDiff: Bidirektionale Trajektorien-Generation mit Diffusionsmodellen für Offline-Verstärkungslernen BiTrajDiff:基于扩散模型的双向轨迹生成,用于离线强化学习 2506.05762v1 -
1720 06-06 Mapping correlations and coherence: adjacency-based approach to data visualization and regularity discovery Mapping-Korrelationen und Kohärenz: Adjazenz-basierter Ansatz zur Datenvisualisierung und Regelmäßigkeits-Entdeckung 绘图的相互关系和一致性:基于对应的可视化和定期发现数据方法 2506.05758v1 -
1721 06-06 FlowOE: Imitation Learning with Flow Policy from Ensemble RL Experts for Optimal Execution under Heston Volatility and Concave Market Impacts FlowOE: Imitation Learning with Flow Policy von Ensemble RL Experts for Optimal Execution unter Heston Volatility und Concave Market Impacts FlowOE:在赫斯顿波动和集中市场影响下最优化执行的集合RL专家以流动政策进行模拟学习 2506.05755v1 -
1722 06-06 Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective Beschränkte Probenahmen für Sprachmodelle sollten einfach sein: Eine MCMC-Perspektive 语言模式抽样应容易:MCMC视角 2506.05754v1 -
1723 06-06 Integrating Spatiotemporal Features in LSTM for Spatially Informed COVID-19 Hospitalization Forecasting Integration von Spatiotemporalen Features in LSTM für spatially Informed COVID-19 Hospitalization Forecasting LSTM综合地貌特征纳入LSTM系统,用于空间知情COVID-19医院化预测 2506.05752v1 -
1724 06-06 Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models 低Rank适应的计算限制(LORA)变换模型的精确调整 2406.03136v2 -
1725 06-06 Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance Effiziente Online-RFT mit Plug-and-Play-LLM-Richtern: Entsperren State-of-the-Art-Performance 与 “ 插插 “ 和 “ Play LLM “ 法官一起高效在线RFT:解锁最新业绩 2506.05748v1 -
1726 06-06 SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models SPRINT: Interleaved Planung und parallelisierte Ausführung in vernünftigen Modellen ermöglichen SPRINT: 在理由说明模式中促进跨离规划和平行执行 2506.05745v1 -
1727 06-06 In-context denoising with one-layer transformers: connections between attention and associative memory retrieval In-Context-Denoising mit Einschichttransformatoren: Verbindungen zwischen Aufmerksamkeit und assoziativem Speicherabruf 与一层变压器的自通脱色:注意和联动内存检索之间的联系 2502.05164v2 -
1728 06-06 FinSage: A Multi-aspect RAG System for Financial Filings Question Answering FinSage: Ein Multi-Aspect-RAG-System für Finanzfilialen Frageantworten FinSage:财务申报问题解答的多层RAG系统 2504.14493v3 -
1729 06-06 ResearchTown: Simulator of Human Research Community ResearchTown: Simulator der Human Research Community 研究对象:人类研究界模拟者 2412.17767v2 -
1730 06-06 Generalized Incremental Learning under Concept Drift across Evolving Data Streams Allgemeines Inkrementelles Lernen unter Konzept Drift über Evolving Data Streams 贯穿不断演变的数据流的 “ 概念漂流 “ 概念下的普通递增学习 2506.05736v1 -
1731 06-06 Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness Vergessen LLMs wirklich? Bewertung des Unlernens mit Wissenskorrelation und Vertrauensbewusstsein LLMs真的会忘却吗?用知识关联和信心意识来评价不学习 2506.05735v1 -
1732 06-06 Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity Nicht-konvexe Matrix-Erfassung: Brechen der quadratischen Rank-Schranke in der Probenkomplexität 非曲线矩阵表感测:打破样本复杂程度的二次级屏障 2408.13276v3 -
1733 06-06 GroMo: Plant Growth Modeling with Multiview Images GroMo: Plant Growth Modeling mit Multiview-Bildern GroMo:以多视角图像建模的植物增长模型 2503.06608v2 -
1734 06-06 Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization Provable Complexity Verbesserung von AdaGrad über SGD: obere und untere Bounds in stochastischen Non-Convex-Optimierung AdaGrad的复杂度高于SGD的改善:在斯托切性非Convex优化中的上下界 2406.04592v3 -
1735 06-06 Large Language Models are Good Relational Learners Große Sprachmodelle sind gute relationale Lernende 大语言模型是好关系学习者 2506.05725v1 -
1736 06-06 Any-Class Presence Likelihood for Robust Multi-Label Classification with Abundant Negative Data Jede-Klasse-Präsenz-Wahrscheinlichkeit für robuste Multi-Label-Klassifikation mit überschwänglichen Negativdaten 具有大量负值数据的强性多标签分类中存在任何类别的可能性 2506.05721v1 -
1737 06-06 Optimization Proxies using Limited Labeled Data and Training Time – A Semi-Supervised Bayesian Neural Network Approach Optimierungs-Proxies mit begrenzten beschrifteten Daten und Trainingszeit – Ein semi-überwachter Bayesian Neural Network-Ansatz 使用有限标签数据和培训时间的有限标签数据和培训时间 – – 半超模贝耶斯神经网络方法 2410.03085v3 -
1738 06-06 Grokking Beyond the Euclidean Norm of Model Parameters Grokking jenseits der euklidischen Norm von Modellparametern 超越模型参数欧几里得范数的Grokking 2506.05718v1 -
1739 06-06 Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning Ensemble Elastic DQN: Ein neuartiger Mehrschritt-Ensemble-Ansatz zur Bewältigung von Überschätzungen im tiefen wertbasierten Verstärkungslernen 共合大 Elastic DQN: 一种新颖的多步共步办法,以解决深重价值强化学习中的高估问题 2506.05716v1 -
1740 06-06 Mirage: A Multi-Level Superoptimizer for Tensor Programs Mirage: Ein Multi-Level-Superoptimizer für Tensor-Programme 幻影:向导方案多层次超强激励器 2405.05751v3 -
1741 06-06 Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation Kommen Sie zusammen, aber nicht jetzt: Eine progressive Strategie, um Low-Rank-Anpassung zu fördern 齐心合力,但现在不是现在:一个推进低Rank适应的渐进战略 2506.05713v1 -
1742 06-06 A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models Eine Umfrage über Sparse Autoencoder: Interpretation der internen Mechanismen großer Sprachmodelle 粗略自动编码器调查:解释大语言模式内部机制 2503.05613v2 -
1743 06-06 Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application Latent Diffusion Modellbasierter Denoisierungsempfänger für 6G Semantische Kommunikation: Von der stochastischen Differentialtheorie zur Anwendung 用于 6G 语义通讯: 从斯托卡差异理论到应用的 6G 语义通讯的 以 DEM 为基础的前传播模型模型 2506.05710v1 -
1744 06-06 Understanding Memorization in Generative Models via Sharpness in Probability Landscapes Gedächtnis in Generativen Modellen über Schärfe in Wahrscheinlichkeitslandschaften verstehen 通过在概率景观中的尖锐化理解生成模型中的记忆化 2412.04140v3 -
1745 06-06 Online Detection of LLM-Generated Texts via Sequential Hypothesis Testing by Betting Online-Erkennung von LLM-generierten Texten durch sequentielle Hypothesenprüfung durch Wetten 通过打赌按顺序假设测试在线检测LLM发光文本 2410.22318v3 -
1746 06-06 Action-Adaptive Continual Learning: Enabling Policy Generalization under Dynamic Action Spaces Action-Adaptive Continual Learning: Politische Generalisierung unter dynamischen Aktionsräumen aktivieren 行动促进性持续学习:动态行动空间下的扶持性政策概括化 2506.05702v1 -
1747 06-06 Statistically Valid Post-Deployment Monitoring Should Be Standard for AI-Based Digital Health Statistisch gültige Überwachung nach der Bereitstellung sollte Standard für KI-basierte digitale Gesundheit sein 统计上有效的部署后监测应作为AI 数字健康的标准 2506.05701v1 -
1748 06-06 Knowledge Retention for Continual Model-Based Reinforcement Learning Wissensvorbehalt für kontinuierliches modellbasiertes Verstärkungslernen 继续保留知识,以不断保持知识,进行基于模式的不断强化学习 2503.04256v4 -
1749 06-06 Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning Allgemeines LLM-Lernen von Graphen-Synthetischen Daten mit Verstärkungs-Lernen 利用强化学习学习图形合成数据学习 2506.00845v2 -
1750 06-06 Graph Attention Networks Unleashed: A Fast and Explainable Vulnerability Assessment Framework for Microgrids Graphen-Achtungsnetzwerke unleashed: Ein schneller und erklärbarer Sicherheitsbewertungsrahmen für Mikrogrids 未剥离的图示注意网络:微电网快速和可解释的脆弱性评估框架 2503.00786v2 -
1751 06-06 Being Strong Progressively! Enhancing Knowledge Distillation of Large Language Models through a Curriculum Learning Framework Progressiv stark sein! Erweiterung der Wissensdestillation von großen Sprachmodellen durch einen Curriculum Learning Framework 通过课程学习框架加强大语言模式的知识提炼 2506.05695v1 -
1752 06-06 Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs Direct Behavior Optimization: Entsperren des Potenzials leichter LLMs 直接行为优化:释放轻量级LLM的潜力 2506.06401v1 -
1753 06-06 Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning Gerade genug denken: Effiziente Vernunft mit adaptiven Längenstrafen Stärkungslernen ” 思考够了 “ : “ 适应性延长刑罚强化学习的高效理性 “ 2506.05256v2 -
1754 06-06 Simmering: Sufficient is better than optimal for training neural networks Simmering: Ausreichend ist besser als optimal für die Ausbildung neuronaler Netzwerke 浸泡:对培训神经网络而言,充足比最佳的更好 2410.19912v2 -
1755 06-06 TracLLM: A Generic Framework for Attributing Long Context LLMs TracLLM: Ein generisches Rahmenwerk für die Zuweisung von LLMs mit langem Kontext TracLLM: 长上下文LLM归因通用框架 2506.04202v2 -
1756 06-06 RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models RoSTE: Ein effizienter Quantization-Aware-überwachter Feintuning-Ansatz für große Sprachmodelle RoSTE:对大语言模式采用高效率的量化软件监督微调方法 2502.09003v3 -
1757 06-06 Sparse Autoencoders, Again? Sparse Autoencoder, schon wieder? 又是粗糙的自动编码器吗? 2506.04859v2 -
1758 06-06 Learning Design-Score Manifold to Guide Diffusion Models for Offline Optimization Learning Design-Score Manifold zur Anleitung von Diffusionsmodellen für Offline-Optimierung 学习设计-得分流形以引导用于离线优化的扩散模型 2506.05680v1 -
1759 06-06 Topology-aware Neural Flux Prediction Guided by Physics Topologie-bewusste neuronale Flux-Vorhersage, geleitet von der Physik 物理引导的拓扑感知神经通量预测 2506.05676v1 -
1760 06-06 LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models LauraTSE: Target Speaker Extraction mit Auto-Regressive Decoder-Only Language Models LauraTSE: 使用自回归仅解码器语言模型的目标说话人提取 2504.07402v2 -
1761 06-06 Kernel Density Machines Kerndichtemaschinen 内核密度机 2504.21419v2 -
1762 06-06 Contextually Guided Transformers via Low-Rank Adaptation Kontextuell geführte Transformer über Low-Rank-Anpassung 通过低辐射适应实现上下文向导变形器 2506.05672v1 -
1763 06-06 BAQ: Efficient Bit Allocation Quantization for Large Language Models BAQ: Effiziente Bitallokation Quantisierung für große Sprachmodelle BAQ: 大语言模式的高效比分配比量量化 2506.05664v1 -
1764 06-06 CoopetitiveV: Leveraging LLM-powered Coopetitive Multi-Agent Prompting for High-quality Verilog Generation CoopetitiveV: LLM-powered Coopetitive Multi-Agent für hochwertige Verilog-Generation 协作V:利用LLM-动力协同协作的多方协作促进高品质活性一代 2412.11014v2 -
1765 06-06 Emulating compact binary population synthesis simulations with robust uncertainty quantification and model comparison: Bayesian normalizing flows Emulieren kompakter binärer Populationssynthesesimulationen mit robuster Unsicherheitsquantifizierung und Modellvergleich: Bayesische Normalisierungsströme 利用稳健的不确定性量化和模型比较模拟致密双星族群合成模拟:贝叶斯归一化流 2506.05657v1 -
1766 06-06 Federated Foundation Model for GI Endoscopy Images Federiertes Stiftungsmodell für GI-Endoskopie-Bilder GI 内窥镜图像联邦基金会模型 2505.24108v2 -
1767 06-06 Learning to Weight Parameters for Data Attribution Gewichtungsparameter für die Zuordnung von Daten lernen 数据归属的重度参数学习 2506.05647v1 -
1768 06-06 Computational bottlenecks for denoising diffusions Computationale Engpässe für die Disseminierung von Diffusionen 消除传播传播的计算瓶颈 2503.08028v2 -
1769 06-06 Pseudo Relevance Feedback is Enough to Close the Gap Between Small and Large Dense Retrieval Models Pseudo Relevanz Feedback reicht aus, um die Lücke zwischen kleinen und großen Dense-Retrieval-Modellen zu schließen 伪相关反馈足以弥合小型和大型稠密检索模型之间的差距 2503.14887v2 -
1770 06-06 Interpretable Enzyme Function Prediction via Residue-Level Detection Interpretierbare Enzymfunktionsvorhersage über Rückstandserkennung 通过残留水平探测的可解释酶功能预测 2501.05644v2 -
1771 06-06 Where is the signal in tokenization space? Wo ist das Signal im Tokenisationsraum? 象征空间的信号在哪里? 2408.08541v2 -
1772 06-06 Hierarchical Refinement: Optimal Transport to Infinity and Beyond Hierarchische Verfeinerung: Optimaler Transport zur Unendlichkeit und darüber hinaus 梯级精炼:向无限及以后的最佳运输 2503.03025v2 -
1773 06-06 Autocomp: LLM-Driven Code Optimization for Tensor Accelerators Autocomp: LLM-gesteuerte Code-Optimierung für Tensor-Beschleuniger 自动comp: LLM- Driven 代码对 Tensor 加速器的优化 2505.18574v2 -
1774 06-06 Estimating stationary mass, frequency by frequency Schätzung der stationären Masse, Häufigkeit nach Häufigkeit 估计固定质量,按频率分列的频率 2503.12808v3 -
1775 06-06 Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer Longitudinal Targeted Minimal Loss-basierte Abschätzung mit zeitlich-differenzierter Heterogener Transformer 与时间差异变化异质变异器的纵向定值最低损失估计值 2404.04399v2 -
1776 06-06 Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones Projektierbare Modelle: One-Shot-Generation von kleinen Spezialtransformatoren aus großen Modellen 集成项目模型:大型一型小型专门化变形器的单制制式 2506.05641v1 -
1777 06-05 (4) A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition Ein fiktiver Q&A-Datensatz für das Studieren von Memorization und Knowledge Acquisition 用于研究记忆化和知识获取的虚构问答数据集 2506.05639v1 -
1778 06-05 Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks’ Internal Representations Erklärbare Konzeptgenerierung durch Vision-Sprachvorliebe Lernen zum Verständnis der internen Repräsentationen neuraler Netzwerke 通过愿景-语言优先学习,了解神经网络内部代表网络,可解释的概念生成 2408.13438v3 -
1779 06-05 CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP CLIPErase: Effizientes Entlernen von Visual-Textual Associations in CLIP CLIPErase:在CLIP中有效不学习视觉-语言协会 2410.23330v2 -
1780 06-05 Bayesian Inference for Correlated Human Experts and Classifiers Bayesische Schlussfolgerung für korrelierte menschliche Experten und Klassifikatoren 与相关人类专家和分类有关连的人类专家和分类人的贝叶斯推论 2506.05636v1 -
1781 06-05 AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking AdaReasoner: Adaptive Reasoning ermöglicht flexibleres Denken AdaReasoner:适应性理性使思考更灵活 2505.17312v2 -
1782 06-05 AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization AutoQD: Automatische Entdeckung von unterschiedlichen Verhalten mit Qualitäts-Diversitäts-Optimierung 自动QD: 以质量- 多样性优化自动发现多种行为 2506.05634v1 -
1783 06-05 The TESS Ten Thousand Catalog: 10,001 uniformly-vetted and -validated Eclipsing Binary Stars detected in Full-Frame Image data by machine learning and analyzed by citizen scientists Der TESS Zehntausend-Katalog: 10,001 gleichmäßig getestete und validierte Eclipsing Binary Stars in Full-Frame-Bilddaten durch maschinelles Lernen erkannt und von Bürgerwissenschaftlern analysiert TESS 10 000万个目录:10 001个统一审查的和经过验证的二进制星,通过机器学习和由公民科学家分析,在全框架图像数据中检测到 2506.05631v1 -
1784 06-05 A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series Ein verlässlicher Rahmen für die Mensch-in-the-Loop-Anomalie-Erkennung in der Zeitreihe 时间序列中人类在Loop异常探测的可靠框架 2405.03234v3 -
1785 06-05 GP-MoLFormer-Sim: Test Time Molecular Optimization through Contextual Similarity Guidance GP-MoLFormer-Sim: Testzeit-Molekularoptimierung durch kontextuelle Ähnlichkeitsführung GP-MoLFormer-Sim:通过背景相似性指导试验时间分子优化 2506.05628v1 -
1786 06-05 Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency Grundlegende Grenzen der schnellen Tuning Transformer: Universalität, Kapazität und Effizienz 迅速提款变换者的根本限制:普遍性、能力和效益 2411.16525v2 -
1787 06-05 Theoretical Analysis of Positional Encodings in Transformer Models: Impact on Expressiveness and Generalization Theoretische Analyse von Positionskodierungen in Transformer-Modellen: Auswirkungen auf Expressivität und Generalisierung 变换模型中位置编码理论分析:对表现力和普遍化的影响 2506.06398v1 -
1788 06-05 Two-dimensional Taxonomy for N-ary Knowledge Representation Learning Methods Zweidimensionale Taxonomie für N-are Wissensdarstellung Lernmethoden Nary知识代表制学习方法二维分类学 2506.05626v1 -
1789 06-05 Heterogeneous Sequel-Aware Graph Neural Networks for Sequential Learning Heterogene Sequel-Aware Graph Neuronale Netzwerke für sequentielles Lernen 用于序列学习的异异质Sequel-Aware 图像神经网络 2506.05625v1 -
1790 06-05 Distributional Matrix Completion via Nearest Neighbors in the Wasserstein Space Verteilungsmatrix Fertigstellung über die nächsten Nachbarn im Wassersteinraum 通过瓦塞斯坦空间近距离邻居完成分布矩阵 2410.13112v2 -
1791 06-05 Active Learning of Piecewise Gaussian Process Surrogates Aktives Lernen des stückweisen Gaußschen Prozesses Surrogats 正在积极学习小费高斯进程代得国家 2301.08789v3 -
1792 06-05 EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM? EnsemW2S: Kann ein Ensemble von LLMs dazu genutzt werden, eine stärkere LLM zu erhalten? EnsemW2S:能否利用LLMs的集成获得更强大的LLM? 2410.04571v2 -
1793 06-05 Exploring Representations and Interventions in Time Series Foundation Models Exploring Repräsentationen und Interventionen in Time Series Foundation Models 探讨在时间系列基础模型中的代表性和干预措施 2409.12915v5 -
1794 06-05 Toward Efficient Kernel-Based Solvers for Nonlinear PDEs Auf dem Weg zu effizienten Kernel-basierten Lösern für nichtlineare PDEs 实现非线性PDE 高效的以核心为基础的内核解决问题器 2410.11165v4 -
1795 06-05 Population-Proportional Preference Learning from Human Feedback: An Axiomatic Approach Bevölkerung-proportionale Präferenz Lernen aus menschlichem Feedback: Ein axiomatischer Ansatz 从人类反馈中学习:一种不言而喻的方法 2506.05619v1 -
1796 06-05 Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods Rollout Roulette: Ein probabilistischer Schlussfolgerungsansatz zur Inferenz-Zeit-Skalierung von LLMs mit partikelbasierten Monte-Carlo-Methoden 滚轮推出:使用以粒子为基础的蒙特卡洛方法对LLMs的推论-时间尺度的概率推推方法 2502.01618v4 -
1797 06-05 LFA applied to CNNs: Efficient Singular Value Decomposition of Convolutional Mappings by Local Fourier Analysis LFA für CNNs: Effiziente Singulärwertzerlegung von Faltungsabbildungen durch lokale Fourier-Analyse LFA应用于CNN:通过局部傅里叶分析对卷积映射进行高效奇异值分解 2506.05617v1 -
1798 06-05 When Maximum Entropy Misleads Policy Optimization Wenn maximale Entropie die Politikoptimierung verfälscht 当最大 Entroppy 错误引导政策优化 2506.05615v1 -
1799 06-05 Extracting Interpretable Logic Rules from Graph Neural Networks Extrahieren von interpretierbaren Logic Rules aus Graph Neural Networks 从图形神经网络中提取可解释逻辑规则 2503.19476v2 -
1800 06-05 Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning Reasoning Towards Fairness: Bias in Sprachmodellen durch reasoning-guided Fine-Tuning abmildern 追求公平的理由:通过有理、有理、有指导的微调减少语言模式中的偏见 2504.05632v3 -
1801 06-05 Robust and Efficient Transfer Learning via Supernet Transfer in Warm-started Neural Architecture Search Robustes und effizientes Transfer-Lernen über Supernet-Transfer in der neuralen Architektur 通过热启动的神经结构搜索中的超级网传输进行强力和高效传输学习 2407.20279v2 -
1802 06-05 Adversarial Tokenization Adversariale Tokenisierung 对抗感应器化 2503.02174v2 -
1803 06-05 Model-Based Exploration in Monitored Markov Decision Processes Modellbasierte Exploration in überwachten Markov-Entscheidungsprozessen 在监测的Markov决策过程中进行基于模型的探索 2502.16772v4 -
1804 06-05 An Optimal Cascade Feature-Level Spatiotemporal Fusion Strategy for Anomaly Detection in CAN Bus Eine optimale Kaskade-Feature-Level-Spatiotemporale Fusionsstrategie für Anomalieerkennung im CAN-Bus CAN总线异常检测的最优级联特征级时空融合策略 2501.18821v3 -
1805 06-05 FaCTR: Factorized Channel-Temporal Representation Transformers for Efficient Time Series Forecasting FaCTR: Factorized Channel-Temporal Representation Transformers für effiziente Zeitreihenprognosen FACTR: 高效时间序列预报的有条频道-临时代表制变革器 2506.05597v1 -
1806 06-05 Zero-shot protein stability prediction by inverse folding models: a free energy interpretation Null-Shot-Proteinstabilitätsprognose durch inverse Faltmodelle: eine freie Energieinterpretation 通过反折叠模型进行零射蛋白稳定性预测:免费能源解释 2506.05596v1 -
1807 06-05 Unsupervised Latent Pattern Analysis for Estimating Type 2 Diabetes Risk in Undiagnosed Populations Unüberwachte Latent Pattern Analyse zur Schätzung des Typ-2-Diabetes-Risikos in nicht diagnostizierten Populationen 未经监督的对未诊断的人群2型糖尿病风险估算的 2505.21824v2 -
1808 06-05 Nonlinear Causal Discovery through a Sequential Edge Orientation Approach Nichtlineare Causal Discovery durch einen sequentiellen Edge-Orientierungsansatz 通过序列边缘方向方法的非线性因果发现 2506.05590v1 -
1809 06-05 MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark MMTU: Ein massiver Multi-Task-Tabellen-Verstands- und Orientierungs-Benchmark MMTU: 大规模多任务表格理解和说明理由基准 2506.05587v1 -
1810 06-05 CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions CoFrNets: Interpretierte Neuralarchitektur Inspiriert von fortgesetzten Brüchen CoFrNets:受连续分数启发的可解释神经结构 2506.05586v1 -
1811 06-05 TabFlex: Scaling Tabular Learning to Millions with Linear Attention TabFlex: Tabulares Lernen zu Millionen mit linearer Aufmerksamkeit skalieren TabFlex: 利用线性注意力将表格学习扩展到百万级 2506.05584v1 -
1812 06-05 Conformal Prediction Adaptive to Unknown Subpopulation Shifts Konforme Vorhersage Anpassung an unbekannte Subpopulationsverschiebungen 适应未知亚人口变化 2506.05583v1 -
1813 06-05 On the Importance of Gaussianizing Representations Über die Bedeutung gaussianisierender Repräsentationen 关于 “ 高机构化 “ 代表的重要性 2505.00685v2 -
1814 06-05 The Impact of Inference Acceleration on Bias of LLMs Die Auswirkungen von Inferenzbeschleunigung auf LLM-Bias 推理加速对LLMs偏见的影响 2410.22118v3 -
1815 06-05 Collaborative Learning in Agentic Systems: A Collective AI is Greater Than the Sum of Its Parts Gemeinsames Lernen in Agentensystemen: Eine kollektive KI ist größer als die Summe ihrer Teile 危险系统合作学习:集体AI大于其各部分的总和 2506.05577v1 -
1816 06-05 When can in-context learning generalize out of task distribution? Wann kann das In-Context-Lernen aus der Aufgabenverteilung verallgemeinern? 什么时候能通俗易懂地了解任务分配的概况? 2506.05574v1 -
1817 06-05 Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning Ravan: Multi-Head Low-Rank Anpassung für Federated Fine-Tuning Ravan:针对联邦罚款的多级低兰克适应措施 2506.05568v1 -
1818 06-05 Partially-Supervised Neural Network Model For Quadratic Multiparametric Programming Teilweise überwachtes neurales Netzwerkmodell für die quadratische multiparametrische Programmierung 半二次多参数方案拟订部分运行的神经网络模型 2506.05567v1 -
1819 06-05 Applying Informer for Option Pricing: A Transformer-Based Approach Informer für Optionspreise anwenden: Ein transformerbasierter Ansatz 备选定价应用内线:基于变换器的方法 2506.05565v1 -
1820 06-05 Relational Conformal Prediction for Correlated Time Series Beziehungskonforme Vorhersage für korrelierte Zeitreihen Cor相关时间序列的通货膨胀正规预测 2502.09443v2 -
1821 06-05 Mixed-Precision Conjugate Gradient Solvers with RL-Driven Precision Tuning Mixed-Precision Conjugate Gradient Solver mit RL-getriebener Präzisionstuning RL-Driven 精确调试混合精密共分梯度溶剂 2504.14268v4 -
1822 06-05 DART-Vetter: A Deep LeARning Tool for automatic triage of exoplanet candidates DART-Vetter: Ein Deep LeARning Tool für die automatische Triage von Exoplaneten-Kandidaten DART – – 设计者:一个用于自动分流异丙型板块候选物的深深LeARning工具 2506.05556v1 -
1823 06-05 Multivariate Temporal Regression at Scale: A Three-Pillar Framework Combining ML, XAI, and NLP Multivariate Temporale Regression im Maßstab: Ein Drei-Pillar-Framework kombiniert ML, XAI und NLP 规模上多变时间回落:三边框架组合 ML、 XAI 和 NLP 2504.02151v2 -
1824 06-05 Online Conformal Model Selection for Nonstationary Time Series Online-Konforme Modellauswahl für nichtstationäre Zeitreihen 非固定时间系列在线非正式模型选择 2506.05544v1 -
1825 06-05 Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data Agentomics-ML: Autonomer Machine-Learning-Experimentieragent für genomische und transkriptomische Daten Agentomics-ML:用于基因组和转录组数据的自主机器学习实验代理 2506.05542v1 -
1826 06-05 SocialDF: Benchmark Dataset and Detection Model for Mitigating Harmful Deepfake Content on Social Media Platforms SocialDF: Benchmark Dataset und Detection Model zur Minderung schädlicher Deepfake-Inhalte auf Social Media-Plattformen 社会DF:社会媒体平台减少有害深层内容的基准数据集和检测模式 2506.05538v1 -
1827 06-05 Faster Approx. Top-K: Harnessing the Full Power of Two Stages Schneller ca. Top-K: Die volle Kraft zweier Etappen nutzen 快速近似。 最高K级: 利用两个阶段的全功率 2506.04165v2 -
1828 06-05 A new pathway to generative artificial intelligence by minimizing the maximum entropy Ein neuer Weg zur generativen künstlichen Intelligenz durch Minimierung der maximalen Entropie 通过最大限度地减少最大倍增 实现人工智能基因化的新途径 2502.13287v2 -
1829 06-05 MCMC-Correction of Score-Based Diffusion Models for Model Composition MCMC-Korrektion von Score-basierten Diffusionsmodellen für die Modellzusammensetzung MC-基于记分传播模型模型的示范构成的校正 2307.14012v3 -
1830 06-05 Spectral Graph Neural Networks are Incomplete on Graphs with a Simple Spectrum Spektrale Graphen-Neuralnetze sind auf Graphen mit einem einfachen Spektrum unvollständig 光谱图形神经网络在带有简单光谱的图形上不完整 2506.05530v1 -
1831 06-05 Avoiding Death through Fear Intrinsic Conditioning Vermeiden des Todes durch Angst Intrinsische Konditionierung 避免因恐惧而死亡 2506.05529v1 -
1832 06-05 MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning MORSE-500: Ein programmierbarer Video-Benchmark zum Stress-Test Multimodale Reasoning MORSE-500:压力测试多模式理由方案可控制的视频基准 2506.05523v1 -
1833 06-05 The Complexity of Learning Sparse Superposed Features with Feedback Die Komplexität des Lernens Sparse überlagerte Funktionen mit Feedback 具有反馈的学习偏散超强特征的复杂性 2502.05407v3 -
1834 06-05 Learning to Recover: Dynamic Reward Shaping with Wheel-Leg Coordination for Fallen Robots Learning to Recover: Dynamische Belohnungsformung mit Rad-Leg-Koordination für gefallene Roboter 学习恢复: 堕落机器人使用轮式轴协调进行动态回授形状 2506.05516v1 -
1835 06-05 Winner-takes-all for Multivariate Probabilistic Time Series Forecasting Winner-takes-all für Multivariate probabilistische Zeitreihen-Prognose 多变量概率性时间序列预测的赢取全胜 2506.05515v1 -
1836 06-05 Geometric and Physical Constraints Synergistically Enhance Neural PDE Surrogates Geometrische und physikalische Einschränkungen Synergistisch verbessern neurale PDE-Überlagerungen 同步增强神经PDE代号 2506.05513v1 -
1837 06-05 Investigating Non-Transitivity in LLM-as-a-Judge Untersuchung der Nichtübertragbarkeit in LLM-as-a-Richter 调查LLM-as-A法官的非过渡性 2502.14074v3 -
1838 06-05 How can Diffusion Models Evolve into Continual Generators? Wie können sich Diffusionsmodelle zu kontinuierlichen Generatoren entwickeln? 传播模型如何发展成持续发电机? 2505.11936v2 -
1839 06-05 The Generative Leap: Sharp Sample Complexity for Efficiently Learning Gaussian Multi-Index Models Der Generative Sprung: Scharfe Probenkomplexität für effizientes Lernen Gaussischer Multi-Index-Modelle 产生式滑坡:高效学习高斯多指标模型的简单抽样复杂程度 2506.05500v1 -
1840 06-05 Multidimensional Analysis of Specific Language Impairment Using Unsupervised Learning Through PCA and Clustering Multidimensionale Analyse der spezifischen Sprachbeeinträchtigung durch unbeaufsichtigtes Lernen durch PCA und Clustering 利用无监督的学习,通过常设仲裁院和集群,对特定语言障碍进行多层面分析 2506.05498v1 -
1841 06-05 Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models Conformal Prediction Beyond the Seen: Eine fehlende Massenperspektive für Unsicherheitsquantifizierung in generativen Modellen 可见之外的非正式预测:在生成模型中不确定性量化的缺失质量角度 2506.05497v1 -
1842 06-05 Learning-Augmented Hierarchical Clustering Learning-Augmented Hierarchical Clustering 学习增强的等级集群 2506.05495v1 -
1843 06-05 GENIUS: A Generative Framework for Universal Multimodal Search GENIUS: Ein generatives Framework für die universelle multimodale Suche GENIUS: 通用多模态搜索的生成式框架 2503.19868v2 -
1844 06-05 Sentiment Analysis in Learning Management Systems Understanding Student Feedback at Scale Sentiment Analysis in Lernmanagement-Systemen Verständnis des Feedbacks von Studierenden auf Scale 学习管理系统对学习管理系统的情感分析 了解学生的大规模反馈 2506.05490v1 -
1845 06-05 Initial Model Incorporation for Deep Learning FWI: Pretraining or Denormalization? Einbindung des Anfangsmodells für Deep Learning FWI: Vortraining oder Denormalisierung? 深度学习FWI的初始模型引入:预训练还是反归一化? 2506.05484v1 -
1846 06-05 ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting ODE-GS: Latente ODEs für dynamische Szene Extrapolation mit 3D Gaussian Splatting ODE-GS: 带有 3D Gaussian Splatting 3D 的动态场点外推法 2506.05480v1 -
1847 06-05 Learning-Augmented Algorithms for MTS with Bandit Access to Multiple Predictors Lerngesteigerte Algorithmen für MTS mit Bandit-Zugriff auf mehrere Vorhersager 多边贸易体系的学习强化比值,使强盗能够接触多种预测者 2506.05479v1 -
1848 06-05 Inference-Time Hyper-Scaling with KV Cache Compression Inferenz-Zeit-Hyper-Skalierung mit KV-Cache-Kompression 与 KV 缓存压缩一起的推断时间超大缩放 2506.05345v1 -
1849 06-05 Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets Warum LLM Sicherheits-Guardrails nach Feinabstimmung zusammenbrechen: Eine Ähnlichkeitsanalyse zwischen Alignment- und Feinabstimmungs-Datensätzen 为何LLM 安全护卫车在微调后倒塌:对准和微调数据集之间的相似性分析 2506.05346v1 -
1850 06-05 Search Arena: Analyzing Search-Augmented LLMs Sucharena: Analyse von Such-Augmentierten LLMs 搜索竞技场: 分析搜索增强的LLMs 2506.05334v1 -
1851 06-05 AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning AceReason-Nemotron: Fortschritte bei Mathematik- und Code-Reasoning durch Reinforcement Learning AceReason-Nemotron:通过强化学习推进数学和代码推理 2505.16400v3 -
1852 06-05 Zeroth-Order Optimization Finds Flat Minima Nullte-Order-Optimierung findet flache Minima 零阶优化发现平底微米 2506.05454v1 -
1853 06-05 The broader spectrum of in-context learning Das breitere Spektrum des In-Context-Lernens 广义的内通学习 2412.03782v3 -
1854 06-05 Admissibility of Completely Randomized Trials: A Large-Deviation Approach Zulässigkeit von völlig randomisierten Prüfungen: Ein Ansatz mit großer Abweichung 完全随机审理的可受理性:一个大型减缓办法 2506.05329v1 -
1855 06-05 Seeing the Invisible: Machine learning-Based QPI Kernel Extraction via Latent Alignment Das Unsichtbare sehen: Maschinelles Lernen basierte QPI-Kernel-Extraktion über Latent Alignment 可见:机器学习 QPI Kernel 采掘,通过Lient对齐 2506.05325v1 -
1856 06-05 LSM-2: Learning from Incomplete Wearable Sensor Data LSM-2: Lernen von unvollständigen Wearable Sensordaten LSM-2:学习不全穿感应器数据 2506.05321v1 -
1857 06-05 Generalizable, real-time neural decoding with hybrid state-space models Verallgemeinerbare, Echtzeit-Neuraldekodierung mit hybriden State-Space-Modellen 与混合状态空间模型的通用实时神经解码 2506.05320v1 -
1858 06-05 Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay Verbesserung der Dateneffizienz für LLM-Verstärkung Feinabstimmung durch schwierigkeitsorientierte Online-Datenauswahl und Rollout-Replay 通过针对难度的在线数据选择和rollout重放提高LLM强化微调的数据效率 2506.05316v1 -
1859 06-05 Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models Engropisches Lernen eingeschränkt: Ein primäres Rahmenwerk für große Sprachmodelle 未学习:大语言模式的原始-双重框架 2506.05314v1 -
1860 06-05 Learning normalized image densities via dual score matching Normalisierte Bilddichten durch Dual-Score-Matching lernen 通过双分比比对学习图像密度正常化 2506.05310v1 -
1861 06-05 ProRefine: Inference-time Prompt Refinement with Textual Feedback ProRefine: Inferenz-Zeit Prompt Verfeinerung mit Text-Feedback ProRefine: 用文字反馈迅速改进推论-时间 2506.05305v1 -
1862 06-05 Power Law Guided Dynamic Sifting for Efficient Attention Power Law geführtes Dynamic Sifting für effiziente Aufmerksamkeit 有效关注动力法指导动态筛选 2506.05300v1 -
1863 06-05 Control Tax: The Price of Keeping AI in Check Kontrollsteuer: Der Preis, KI in Schach zu halten 控制税:约束人工智能的代价 2506.05296v1 -
1864 06-05 A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search Ein glattes Meer hat nie eine geschickte $\texttt{SAILOR}$: Robuste Imitation über das Lernen zu suchen 平滑的海海从未造出技能用$\ textt{ SAILOR}$: 通过学习搜索进行强力模仿 2506.05294v1 -
1865 06-05 Learning Beyond Experience: Generalizing to Unseen State Space with Reservoir Computing Learning Beyond Experience: Mit Reservoir Computing den Zustand des Weltraums verallgemeinern 超越经验的学习:利用储量计算,向不见得的国家空间推广 2506.05292v1 -
1866 06-05 Stable Vision Concept Transformers for Medical Diagnosis Stabile Vision Konzepttransformatoren für medizinische Diagnose 医学诊断的稳定远景构想变异器 2506.05286v1 -
1867 06-05 Fast-DataShapley: Neural Modeling for Training Data Valuation Fast-DataShapley: Neurale Modellierung für Trainingsdatenbewertung 快速数据库:培训数据评估神经模型 2506.05281v1 -
1868 06-05 How to Unlock Time Series Editing? Diffusion-Driven Approach with Multi-Grained Control Wie man Zeitreihenbearbeitung entsperrt? Diffusion-getriebener Ansatz mit mehrkörniger Steuerung 如何解锁时间序列编辑 ? 2506.05276v1 -
1869 06-05 Tight analyses of first-order methods with error feedback Enge Analysen von First-Order-Methoden mit Fehlerrückmeldung 利用错误反馈对一级处理方法进行严格分析 2506.05271v1 -
1870 06-05 Context is Key: A Benchmark for Forecasting with Essential Textual Information Kontext ist der Schlüssel: Ein Benchmark für Vorhersagen mit wesentlichen Textinformationen 背景是关键:用基本文字信息预测的基准 2410.18959v4 -
1871 06-05 Learning long range dependencies through time reversal symmetry breaking Längere Abhängigkeiten durch Zeitumkehr-Symmetrie-Bruch lernen 通过时间反演对称性破缺学习长程依赖 2506.05259v1 -
1872 06-05 Continual Learning from Simulated Interactions via Multitask Prospective Rehearsal for Bionic Limb Behavior Modeling Kontinuierliches Lernen von Simulierten Interaktionen über Multitask Prospective Probe für Bionic Limb Behavior Modeling 通过多塔斯克双立架行为模拟模型预演预演预演,从模拟互动中不断学习 2405.01114v4 -
1873 06-05 Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning Konservative Klassifikatoren tun konsequent gut mit Verbesserung Agenten: Charakterisierung statistischer und Online-Lernen 保守的分类机构与改进机构保持一贯的很好:将统计和在线学习定性为特征 2506.05252v1 -
1874 06-05 On the Convergence of Gradient Descent on Learning Transformers with Residual Connections Über die Konvergenz des gradienten Abstiegs auf Lerntransformatoren mit residualen Verbindungen 关于有残余连接的学习变异器的 “ 渐渐后代 “ 趋同 2506.05249v1 -
1875 06-05 Rethinking LLM Advancement: Compute-Dependent and Independent Paths to Progress LLM-Fortschritt neu denken: Berechenbare und unabhängige Wege zum Fortschritt 重新思考LLM进步:计算依赖性和独立前进之路 2505.04075v2 -
1876 06-05 A theoretical framework for overfitting in energy-based modeling Ein theoretischer Rahmen für die Überrüstung in energiebasierte Modellierung 以能源为基础的建模的超套理论框架 2501.19158v3 -
1877 06-05 Robust Moment Identification for Nonlinear PDEs via a Neural ODE Approach Robuste Moment-Identifikation für nichtlineare PDEs über einen neuralen ODE-Ansatz 通过神经极光模式方法确定非线性PDE的强劲势头 2506.05245v1 -
1878 06-05 Aligning Latent Spaces with Flow Priors Latente Räume mit Flow Priors ausrichten 将局部空间与流动前置空间对齐 2506.05240v1 -
1879 06-05 Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit Sparse Autoencoder bewerten: Vom shallow Design bis hin zum passenden Pursuit 评估粗略自动校对器:从浅色设计到匹配追求 2506.05239v1 -
1880 06-05 Timing is Important: Risk-aware Fund Allocation based on Time-Series Forecasting Timing ist wichtig: Risiko-aware Fund Allokation basierend auf Time-Series Forecasting 时间选择很重要:根据时间-系列预测进行有风险的基金分配 2505.24835v2 -
1881 06-05 Blink of an eye: a simple theory for feature localization in generative models Blink des Auges: eine einfache Theorie zur Featurelokalisierung in generativen Modellen 眨眼:在基因模型中地貌化的简单理论 2502.00921v2 -
1882 06-05 Stochastic Poisson Surface Reconstruction with One Solve using Geometric Gaussian Processes Stochastische Poisson-Oberflächenrekonstruktion mit einer Lösung mit geometrischen Gauß-Prozessen 利用几何高斯进程进行一索一索的斯托查斯·普瓦松地面重建 2503.19136v2 -
1883 06-05 MesaNet: Sequence Modeling by Locally Optimal Test-Time Training MesaNet: Sequenzmodellierung durch lokal optimale Test-Time-Schulung MesaNet:通过当地最佳试验时间培训进行序列建模 2506.05233v1 -
1884 06-05 Causal Discovery from Conditionally Stationary Time Series Causal Discovery aus Conditionally Stationary Time Series 从有条件的静止时间序列中发现的因果发现 2110.06257v4 -
1885 06-05 Progressive Tempering Sampler with Diffusion Progressive Temperierung mit Diffusion 具有扩散作用的渐进性温度采样器 2506.05231v1 -
1886 06-05 Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts Diagonales Batching löst Parallelismus in recurrenten Speichertransformatoren für lange Kontexte auf 对角对角拔 解锁长期常用内存变换器中的平行主义 2506.05229v1 -
1887 06-05 The Lessons of Developing Process Reward Models in Mathematical Reasoning Die Lehren aus der Entwicklung von Prozess-Reward-Modellen in mathematischer Reasoning 数学理由中发展进程奖励模型的经验教训 2501.07301v2 -
1888 06-05 A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values Ein einheitliches Rahmenwerk für wahrscheinlich effiziente Algorithmen zur Schätzung von Shapley-Werten 用于估计损耗值的可预见高效算法统一框架 2506.05216v1 -
1889 06-05 Learning Theory of Decentralized Robust Kernel-Based Learning Algorithm Lerntheorie des dezentralisierten robusten Kernel-basierten Lernalgorithmus 分散的强力核心学习学习理论 2506.05215v1 -
1890 06-05 Mitigating Degree Bias Adaptively with Hard-to-Learn Nodes in Graph Contrastive Learning Degree Bias im Graph Contrastive Learning adaptiv mit schwer zu erlernenden Knoten abmildern 与图表对比学习中难以识别的节点相适应 2506.05214v1 -
1891 06-05 Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data Verstärktes Lernen für individuelle Optimale Politik aus heterogenen Daten 利用不同不同数据为个人最佳政策强化学习 2505.09496v2 -
1892 06-05 Unleashing The Power of Pre-Trained Language Models for Irregularly Sampled Time Series Lösen der Macht von vortrainierten Sprachmodellen für unregelmäßig gemusterte Zeitreihen 为不定期抽样抽样时间序列释放培训前语言模型的力量 2408.08328v2 -
1893 06-05 The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text The Common Pile v0.1: Ein 8TB-Datensatz von Public Domain und Openly Licensed Text The Common Pile v0.1:公共领域和开放许可文本的 8TB数据集 2506.05209v1 -
1894 06-05 One Wave To Explain Them All: A Unifying Perspective On Feature Attribution Eine Welle, um sie alle zu erklären: Eine verbindende Perspektive auf Feature Attribution 一波一波解释全部:一个统一化的地物归属观 2410.01482v2 -
1895 06-05 Trustworthiness Preservation by Copies of Machine Learning Systems Vertrauenswürdigkeit Erhaltung durch Kopien von Machine Learning Systemen 机器学习系统副本保存可信赖性 2506.05203v1 -
1896 06-05 Transformers Meet In-Context Learning: A Universal Approximation Theory Transformer treffen auf In-Context Learning: Eine universelle Annäherungstheorie 变革者满足内文学习:通用近似理论 2506.05200v1 -
1897 06-05 Quantifying Cross-Modality Memorization in Vision-Language Models Quantifizierung der Cross-Modality-Memorisierung in Vision-Language-Modellen 设想语言模型中量化跨模式的计量 2506.05198v1 -
1898 06-05 Abnormal component analysis Abnorme Komponentenanalyse 异常元件分析 2312.16139v2 -
1899 06-05 Locality Preserving Markovian Transition for Instance Retrieval Lokalität Erhaltung der Markovschen Übergang für Instance Retrieval 将Markovian 过渡保留为事件检索 2506.05196v1 -
1900 06-05 Chaotic Hedging with Iterated Integrals and Neural Networks Chaotische Absicherung mit iterierten Integralen und neuralen Netzwerken 使用循环综合器和神经网络的冷酷杂乱和神经网络 2209.10166v4 -
1901 06-05 Counterfactual reasoning: an analysis of in-context emergence Gegenfaktische Argumentation: eine Analyse der Entstehung von Inkontexten 反事实推理:对内源出现的分析 2506.05188v1 -
1902 06-05 TreeRPO: Tree Relative Policy Optimization TreeRPO: Baum Relative Politikoptimierung TreeRPO:树相对政策优化 2506.05183v1 -
1903 06-05 AI-powered Contextual 3D Environment Generation: A Systematic Review KI-powered Contextual 3D Environment Generation: Eine systematische Überprüfung 3D环境代:系统审查 2506.05449v1 -
1904 06-05 Associative Memory and Generative Diffusion in the Zero-noise Limit Assoziatives Gedächtnis und Generative Diffusion in der Null-Rausch-Grenze 零噪音限制中的联合内存和生成扩散 2506.05178v1 -
1905 06-05 Can Large Language Models Understand Intermediate Representations in Compilers? Können große Sprachmodelle Zwischendarstellungen in Compilern verstehen? 大语言模式能理解《汇编者》的中间代表吗? 2502.06854v2 -
1906 06-05 Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks Verallgemeinerungen durch Meta-Learned Model Representations: PAC-Bayes und Sample Compression Hypernetworks 通过Met-Learn模型代表:PAC-Bayes和抽样压缩超网络的通用环 2410.13577v3 -
1907 06-05 Goal-Oriented Time-Series Forecasting: Foundation Framework Design Zielorientierte Zeitreihenplanung: Grundlagenrahmengestaltung 目标导向的时间系列预测:基础框架设计 2504.17493v2 -
1908 06-05 Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests Bewertung der Robustheit von Deep Reinforcement Learning für die autonome Oberflächenfahrzeugsteuerung in Feldversuchen 评价实地测试中用于自动水陆车辆控制的深强化学习的强力 2505.10033v2 -
1909 06-05 Detection-Driven Object Count Optimization for Text-to-Image Diffusion Models Detektionsgesteuerte Objektanzahl-Optimierung für Text-zu-Bild-Diffusionsmodelle 文本到图像传播模型最佳化 2408.11721v2 -
1910 06-05 SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows SESaMo: Symmetrie-verstärkende stochastische Modulation für normalisierende Strömungen SESaMo: 正常流动的对称性-强化斯托调动 2505.19619v2 -
1911 06-05 Federated Isolation Forest for Efficient Anomaly Detection on Edge IoT Systems Föderierter Isolationswald für effiziente Anomalienerkennung an Edge IoT-Systemen 在边缘IOT系统中高效异常探测的联邦隔离林 2506.05138v1 -
1912 06-05 Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning Training Dynamics zugrunde liegende Sprachmodellskalierungsgesetze: Verlustverschleierung und Null-Summe-Lernen 培训动态基础语言示范缩写法:损失减速和零苏姆学习 2506.05447v1 -
1913 06-05 DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning DiCoRe: Verbesserung der Zero-Shot-Ereigniserkennung durch Divergent-Convergent LLM Reasoning DiCoRe: 通过发散-收敛式LLM推理加强零样本事件检测 2506.05128v1 -
1914 06-05 Membership Inference Attacks on Sequence Models Membership Inferenz Attacks on Sequence Models 对序列模型的攻击 2506.05126v1 -
1915 06-05 Jailbreak Attack Initializations as Extractors of Compliance Directions Jailbreak Attack Initialisierungen als Extraktoren für Compliance-Anweisungen 作为 “ 遵守指示 “ 的提取器 2502.09755v2 -
1916 06-05 Nonlinear Causal Discovery for Grouped Data Nichtlineare kausale Entdeckung für gruppierte Daten 分组数据的非线性构造发现 2506.05120v1 -
1917 06-05 Exploring Adversarial Watermarking in Transformer-Based Models: Transferability and Robustness Against Defense Mechanism for Medical Images Erforschung von adversarialen Wasserzeichen in transformerbasierten Modellen: Übertragbarkeit und Robustheit gegen Abwehrmechanismen für medizinische Bilder 探索以变换模式为基础的变换模型中的反对水标记:医疗图像防御机制的可转移性和强力 2506.06389v1 -
1918 06-05 Learning Rock Pushability on Rough Planetary Terrain Rock Pushability lernen auf rauem planetarischen Gelände 环绕行星地的学习岩岩推力 2505.09833v2 -
1919 06-05 Survey on the Evaluation of Generative Models in Music Umfrage zur Bewertung generativer Modelle in der Musik 音乐创创模型评价调查 2506.05104v1 -
1920 06-05 Structural Alignment Improves Graph Test-Time Adaptation Struktural Alignment verbessert Graph Test-Time Anpassung 结构调整改进图示测试时间适应 2502.18334v3 -
1921 06-05 Privacy Amplification Through Synthetic Data: Insights from Linear Regression Datenschutzverstärkung durch synthetische Daten: Einblicke aus linearer Regression 通过合成数据放大隐私:线性回归的洞视 2506.05101v1 -
1922 06-05 The Impossibility of Fair LLMs Die Unmöglichkeit fairer LLMs 公平LLMs的不可能性 2406.03198v2 -
1923 06-05 Semi-Implicit Variational Inference via Kernelized Path Gradient Descent Halbimplizite Variationsableitung über kernalisierten Pfad gradienten Abstieg 通过内核化路径梯度梯度 2506.05088v1 -
1924 06-05 Autoformulation of Mathematical Optimization Models Using LLMs Autoformulierung mathematischer Optimierungsmodelle mit LLMs 使用LLMs自动构建数学优化模型 2411.01679v2 -
1925 06-05 EMBER2024 – A Benchmark Dataset for Holistic Evaluation of Malware Classifiers EMBER2024 – Ein Benchmark-Datensatz für die ganzheitliche Auswertung von Malware-Klassifikatoren 2024年9月24日 – – 用于全面评估软件分类器基准数据集 2506.05074v1 -
1926 06-05 GoRA: Gradient-driven Adaptive Low Rank Adaptation GoRA: Gradient-getriebene Adaptive Low-Rank-Anpassung GoRA:梯度驱动的自适应低秩适应 2502.12171v2 -
1927 06-05 UnHiPPO: Uncertainty-aware Initialization for State Space Models UnHiPPO: Unsicherheitsbewusste Initialisierung für State Space Models UnHiPPO:状态空间模型的不确定性感知初始化 2506.05065v1 -
1928 06-05 NIMO: a Nonlinear Interpretable MOdel NIMO: ein nichtlineares interpretierbares MOdell NIMO:非线性可解释模型 2506.05059v1 -
1929 06-05 Mini Diffuser: Fast Multi-task Diffusion Policy Training Using Two-level Mini-batches Mini Diffusor: Schnelles Multi-Task Diffusion Policy Training mit zweistufigen Mini-Batches Mini Diffuser:利用两级小型水槽快速多任务传播政策培训 2505.09430v2 -
1930 06-05 Neurosymbolic Association Rule Mining from Tabular Data Neurosymbolische Assoziationsregel Mining aus tabellarischen Daten 利用表格数据进行常规采矿 2504.19354v2 -
1931 06-05 A Compressive-Expressive Communication Framework for Compositional Representations Ein kompressiv-expressiver Kommunikationsrahmen für kompositorische Repräsentationen 组成代表制压缩-高压沟通框架 2501.19182v3 -
1932 06-05 Optimizing Anytime Reasoning via Budget Relative Policy Optimization Optimierung jederzeit über Budget-Relational-Policy-Optimierung 通过预算相对政策优化优化,优化任何时间的理由 2505.13438v2 -
1933 06-05 Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic Causal Policy Learning in Verstärkung Lernen: Hintertür-angepasste Soft-Actor-Kritik 强化学习中因果政策学习:后门调整后软动作器-批评 2506.05445v1 -
1934 06-05 A Selective Homomorphic Encryption Approach for Faster Privacy-Preserving Federated Learning Ein selektiver homomorpher Verschlüsselungsansatz für schnelleres Datenschutz-Erhalten von Federated Learning 为更快的隐私-保护联邦学习采取选择性单态加密方法 2501.12911v4 -
1935 06-05 iN2V: Bringing Transductive Node Embeddings to Inductive Graphs iN2V: Transduktive Knoteneinbettungen in induktive Graphen bringen iN2V: 将感应节点嵌入式带入感应图 2506.05039v1 -
1936 06-05 Scalable Multi-Agent Offline Reinforcement Learning and the Role of Information Skalierbares Mehr-Agent-Offline-Verstärkungs-Lernen und die Rolle der Information 可缩放的多代理机构脱线强化学习和信息的作用 2502.11260v2 -
1937 06-05 TIMING: Temporality-Aware Integrated Gradients for Time Series Explanation TIMING: Temporality-Aware integrierte Gradienten für die Zeitreihe Erklärung TIMING: 时间序列解释时空软件集成梯度 2506.05035v1 -
1938 06-05 Identifying and Understanding Cross-Class Features in Adversarial Training Identifizieren und Verstehen von Cross-Class-Features im Adversarial Training 确定和理解反向培训的跨系列特点 2506.05032v1 -
1939 06-05 Artificial Intelligence Should Genuinely Support Clinical Reasoning and Decision Making To Bridge the Translational Gap Künstliche Intelligenz sollte die klinische Vernunft und Entscheidungsfindung wirklich unterstützen, um die translationale Lücke zu überbrücken 人工情报应真正支持临床理由和决策,以弥补翻译差距 2506.05030v1 -
1940 06-05 Bottlenecked Transformers: Periodic KV Cache Abstraction for Generalised Reasoning Engpässe Transformer: Regelmäßige KV-Cache Abstraktion für generalisierte Vernunft 装瓶式变压器: 用于一般理由的周期 KV 缓存摘要 2505.16950v2 -
1941 06-05 Tuning the Right Foundation Models is What you Need for Partial Label Learning Tuning the right foundation models is what you need for Partial Label Learning 向右基建模投注是部分标签学习所需要的 2506.05027v1 -
1942 06-05 A unified weighting framework for evaluating nearest neighbour classification Ein einheitlicher Gewichtungsrahmen für die Bewertung der nächsten Nachbarklassifikation 评价近邻分类的统一加权框架 2311.16872v3 -
1943 06-05 GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs GRU: Abbau des Kompromisses zwischen Unlearning und Retention für LLMs GRU:减少LLM的不学习与保留之间的取舍 2503.09117v3 -
1944 06-05 U-NetMN and SegNetMN: Modified U-Net and SegNet models for bimodal SAR image segmentation U-NetMN und SegNetMN: Modifizierte U-Net- und SegNet-Modelle für bimodale SAR-Bildsegmentierung U-NetMN和SegNetMN:用于双模态合成孔径雷达图像分割的改进型U-Net和SegNet模型 2506.05444v1 -
1945 06-05 On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding Über Expressive Power von Schleiftransformatoren: Theoretische Analyse und Verbesserung über die Zeitschrittkodierung 循环变换器的表达力:通过时间步骤编码进行理论分析和加强 2410.01405v7 -
1946 06-05 Stein Variational Evolution Strategies Strategien der Steinvariational Evolution Stein变异演变战略 2410.10390v2 -
1947 06-05 Controlling Summarization Length Through EOS Token Weighting Kontrolle der Zusammenfassungslänge durch EOS-Token-Gewichtung 通过EOS词元加权控制摘要长度 2506.05017v1 -
1948 06-05 A new architecture of high-order deep neural networks that learn martingales Eine neue Architektur tiefer neuronaler Netzwerke hoher Ordnung, die Martingale lernen 一种能够学习鞅的高阶深度神经网络新架构 2505.03789v2 -
1949 06-05 Multi-Point Proximity Encoding For Vector-Mode Geospatial Machine Learning Multi-Point Proximity Encoding für Vector-Mode Geospatial Machine Learning 矢量模式地理空间机器学习多点近似编码 2506.05016v1 -
1950 06-05 Towards Reasonable Concept Bottleneck Models Auf dem Weg zu vernünftigen Konzeptmodellen für Engpässe 迈向合理概念的瓶颈模式 2506.05014v1 -
1951 06-05 Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity 1568 Tokens in einen einzigen Vektor und wieder zurück krammen: Die Grenzen der Einbettung von Raumkapazität erkunden 将1568个词元塞进单个向量再还原:探索嵌入空间容量的极限 2502.13063v2 -
1952 06-05 QiMeng: Fully Automated Hardware and Software Design for Processor Chip QiMeng: Vollautomatisierte Hardware und Software-Design für Prozessor Chip QiMeng:处理芯片全自动硬件和软件设计 2506.05007v1 -
1953 06-05 Monotonic anomaly detection Monotonische Anomalieerkennung 单调异常检测 2410.23158v2 -
1954 06-05 Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games Vorsichtiger Optimismus: Ein Meta-Algorithmus für fast-konstantes Bedauern in den allgemeinen Spielen 谨慎乐观主义:普通运动会中近距离遗憾的代算算法 2506.05005v1 -
1955 06-05 UniPTMs: The First Unified Multi-type PTM Site Prediction Model via Master-Slave Architecture-Based Multi-Stage Fusion Strategy and Hierarchical Contrastive Loss UniPTMs: Das erste einheitliche Multi-Type PTM Site Prediction Model über Master-Slave Architecture-basierte Multi-Stage Fusionsstrategie und hierarchischen Kontrastverlust UniPTMs:首个统一的多类型PTM位点预测模型,采用基于主从架构的多阶段融合策略和分层对比损失 2506.05443v1 -
1956 06-05 FPTQuant: Function-Preserving Transforms for LLM Quantization FPTQuant: Funktionsschonende Transformationen für die LLM-Quantisierung FPTQuant: 用于LLM量化的功能保存变换 2506.04985v1 -
1957 06-05 Agentic AI for Intent-Based Industrial Automation Agentische KI für Intent-Based Industrial Automation 用于内在工业自动化的AAA剂 2506.04980v1 -
1958 06-05 Entropy-based Training Methods for Scalable Neural Implicit Sampler Entropie-basierte Trainingsmethoden für skalierbare neurale Implizite Sampler 用于可缩缩的神经隐隐性取样器的基于英信的培训方法 2306.04952v2 -
1959 06-05 Deep histological synthesis from mass spectrometry imaging for multimodal registration Tiefe histologische Synthese aus Massenspektrometrie-Bildgebung für multimodale Registrierung 从用于多式联运注册的质谱成像成像中产生深度神学合成 2506.05441v1 -
1960 06-05 Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences Diff-Instruct++: Ein-Schritt-Training Text-zu-Bild-Generator-Modell zur Ausrichtung mit menschlichen Einstellungen Diff-Instruct++:训练一步式文本到图像生成模型,使之与人类偏好相一致 2410.18881v2 -
1961 06-05 Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule Pilotieren von strukturbasiertem Drug Design über Modalitätsspezifische Optimale Planung 通过特定方式和特定最佳安排表进行基于结构的药物设计 2505.07286v2 -
1962 06-05 Efficient Time Series Processing for Transformers and State-Space Models through Token Merging Effiziente Zeitreihenverarbeitung für Transformatoren und State-Space-Modelle durch Token Merging 通过 Token 合并对变形器和国家空间模型的有效时间序列处理 2405.17951v3 -
1963 06-05 Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models Gemeinsame Interventionseffekte von Single-Variablen-Interventionen in Additiven-Modellen lernen 在 “ 补充模式 “ 中单一可变干预措施的 学习联合干预效应 2506.04945v1 -
1964 06-05 Weak Generative Sampler to Efficiently Sample Invariant Distribution of Stochastic Differential Equation Schwacher Generativer Sampler zur effizienten Probeninvariantverteilung der stochastischen Differentialgleichung 用于高效采样随机微分方程不变分布的弱生成采样器 2405.19256v2 -
1965 06-05 David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training David und Goliath: Kleines Ein-Schritt-Modell schlägt große Diffusion mit Partitur nach dem Training David和Goliath:小型一步骤模式比大规模传播更胜于记分后培训 2410.20898v3 -
1966 06-05 Can Transformers Do Enumerative Geometry? Können Transformer enumerative Geometrie? 变形人能做数字几何吗? 2408.14915v3 -
1967 06-05 Energentic Intelligence: From Self-Sustaining Systems to Enduring Artificial Life Energentische Intelligenz: Von selbsterhaltenden Systemen zum Ausharren des künstlichen Lebens 能源情报:从自我维持系统到人命持续 2506.04916v1 -
1968 06-05 MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining MiMo: Entsperren des vernünftigen Potenzials des Sprachmodells – Von der Vorschulung zur Nachschulung 米莫:释放语文模式的理性潜力 – – 从预培训到员额培训 2505.07608v2 -
1969 06-05 Dissecting Long Reasoning Models: An Empirical Study Modelle der langen Vernunft zu entschlüsseln: Eine empirische Studie 解析长期理由模型:经验研究 2506.04913v1 -
1970 06-05 When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models Wenn LLMs denken lügen: Enthüllen der strategischen Täuschung in Repräsentationen von Vernunftmodellen 当思考型LLM说谎时:揭示推理模型表征中的策略性欺骗 2506.04909v1 -
1971 06-05 Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy Trainieren Sie einen Sparse Autoencoder über mehrere Sparsity-Haushalte, um Dolmetschbarkeit und Genauigkeit zu erhalten 跨多级预算培训一个简单自动编码员,以维护可解释性和准确性 2505.24473v2 -
1972 06-05 Are all models wrong? Fundamental limits in distribution-free empirical model falsification Sind alle Modelle falsch? Grundlegende Grenzen in der verteilungsfreien empirischen Modellfälschung 所有模型是否都错了?无分销经验模型伪造的基本限制 2502.06765v2 -
1973 06-05 Leveraging Diffusion Models for Parameterized Quantum Circuit Generation Nutzung von Diffusionsmodellen für die parameterisierte Quantum Circuit Generation 利用可计量量子电路生成的传播模型 2505.20863v2 -
1974 06-05 Scalable Multi-Output Gaussian Processes with Stochastic Variational Inference Skalierbare Multi-Output Gaußsche Prozesse mit stochastischer Variationsableitung 可缩放多输出高斯进程,带有斯托卡变动推断 2407.02476v2 -
1975 06-05 Full-Parameter Continual Pretraining of Gemma2: Insights into Fluency and Domain Knowledge Full-Parameter Continual Pretraining von Gemma2: Einblicke in Fluency und Domain Knowledge Gemma2的全参数持续预训练:对流畅度和领域知识的洞察 2505.05946v2 -
1976 06-05 Gaussian Process Diffeomorphic Statistical Shape Modelling Outperforms Angle-Based Methods for Assessment of Hip Dysplasia Gaussian Prozess Diffeomorphe statistische Form Modellierung übertrifft Winkel-basierte Methoden zur Beurteilung von Hüftdysplasie 高斯过程微分同胚统计形状建模在髋关节发育不良评估中优于基于角度的方法 2506.04886v1 -
1977 06-05 Self-Tuning Spectral Clustering for Speaker Diarization Selbst-Tuning Spectral Clustering für Sprecher-Diarisierung 为议长进行自我引导光谱群集 2410.00023v2 -
1978 06-05 Multiplicative Dynamic Mode Decomposition Multiplikative Dynamische Moduszersetzung 多倍复制动态模式分解 2405.05334v2 -
1979 06-05 Multi-granularity Knowledge Transfer for Continual Reinforcement Learning Multi-Granularitäts-Wissenstransfer für kontinuierliches Stärkungslernen 用于持续加强学习的多族知识转让 2401.15098v3 -
1980 06-05 kTULA: A Langevin sampling algorithm with improved KL bounds under super-linear log-gradients kTULA: Ein Langevin-Sampling-Algorithmus mit verbesserten KL-Grenzen unter superlinearen Log-Gradienten kTULA: Langevin 取样算法,在超线性日志梯度下改进 KL 界限 2506.04878v1 -
1981 06-05 There Was Never a Bottleneck in Concept Bottleneck Models Es gab nie einen Engpass in Konzept Engpässe Modelle ” 概念瓶颈模型 “ 从未出现过瓶颈问题 2506.04877v1 -
1982 06-05 Aligning Multimodal Representations through an Information Bottleneck Multimodale Repräsentationen durch einen Informationsengpass ausrichten 通过信息瓶颈协调多式联运代表机构 2506.04870v1 -
1983 06-05 LLMs for sensory-motor control: Combining in-context and iterative learning LLMs für die sensorisch-motorische Steuerung: Kombination von Kontext und iterativem Lernen 感官运动控制LLMs:将内文与迭接学习相结合 2506.04867v1 -
1984 06-05 Fokker-Planck to Callan-Symanzik: evolution of weight matrices under training Fokker-Planck zu Callan-Symanzik: Entwicklung von Gewichtsmatrizen im Training Fokker-Planck to Callan-Symanzik:培训中重量矩阵的演变 2501.09659v2 -
1985 06-05 Improving AI-generated music with user-guided training Verbesserung der KI-generierten Musik durch benutzergeführtes Training 通过用户指导培训,改进AI制制的音乐 2506.04852v1 -
1986 06-05 An Unsupervised Framework for Dynamic Health Indicator Construction and Its Application in Rolling Bearing Prognostics Ein unbeaufsichtigtes Framework für die Konstruktion dynamischer Gesundheitsindikatoren und deren Anwendung in der Wälzlagerprognostik 一个无人监督的动态健康指标建设框架及其在滚轴预测器中的应用 2506.05438v1 -
1987 06-05 Learning pure quantum states (almost) without regret Lernen von reinen Quantenzuständen (fast) ohne Bedauern 学习纯量度状态(几乎) 2406.18370v2 -
1988 06-05 Hybrid deep convolution model for lung cancer detection with transfer learning Hybrides Deep-Convolution-Modell für Lungenkrebserkennung mit Transfer-Lernen 肺癌检测与转移学习的混合深变深混合模型 2501.02785v2 -
1989 06-05 Event Classification of Accelerometer Data for Industrial Package Monitoring with Embedded Deep Learning Event-Klassifikation von Accelerometer-Daten für die industrielle Paketüberwachung mit eingebettetem Deep Learning 用于以嵌入深层学习方式监测工业一揽子措施的加速计数据的事件分类 2506.05435v1 -
1990 06-05 From EHRs to Patient Pathways: Scalable Modeling of Longitudinal Health Trajectories with LLMs Von EHRs zu Patientenpfaden: Skalierbare Modellierung von langitudinalen Gesundheits-Trajektorien mit LLMs 从EHRs到患者途径:可扩展的长纵向健康轨迹模型与LLMs 2506.04831v1 -
1991 06-05 Reading Recognition in the Wild Leseerkennung in der Wildnis 野生生物的阅读识别 2505.24848v2 -
1992 06-05 Supervised Graph Contrastive Learning for Gene Regulatory Network Überwachtes Graph Kontrastives Lernen für Gene Regulatory Network 受监督的基因监管网络图表对比性学习 2505.17786v2 -
1993 06-05 Fool the Stoplight: Realistic Adversarial Patch Attacks on Traffic Light Detectors Tor das Stopplicht: Realistische Widersacher-Angriffe auf Verkehrslicht-Detektoren 愚笨的暗灯:现实的对抗性对交通灯光探测器的补丁攻击 2506.04823v1 -
1994 06-05 Offline Multi-agent Reinforcement Learning via Score Decomposition Offline-Multi-Agenten-Verstärkung Lernen über Score-Dekomposition 通过分数分解进行离线多智能体强化学习 2505.05968v2 -
1995 06-05 LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning LogicPuzzleRL: Robuste mathematische Reasoning in LLMs durch Verstärkungslernen kultivieren LogicPuzzleRL:通过强化学习在LLMs中培养稳健的数学推理 2506.04821v1 -
1996 06-05 Distributional encoding for Gaussian process regression with qualitative inputs Verteilungskodierung für Gaußsche Prozessregression mit qualitativen Inputs 带有定性投入的高斯进程回归的分布编码 2506.04813v1 -
1997 06-05 Uniform convergence of the smooth calibration error and its relationship with functional gradient Einheitliche Konvergenz des glatten Kalibrierfehlers und seines Verhältnisses mit dem funktionellen Gradienten 平稳校准误差及其与功能梯度的关系统一汇合 2505.19396v2 -
1998 06-05 Adaptive Preconditioners Trigger Loss Spikes in Adam Adaptive Preconditioners Trigger Verlust Spikes in Adam 亚当的触发器失落斯派克 2506.04805v1 -
1999 06-05 Augmented Invertible Koopman Autoencoder for long-term time series forecasting Augmented Invertible Koopman Autoencoder für langfristige Zeitreihen 用于长期时间序列预报的可逆 Koopman 自动编码器 2503.12930v2 -
2000 06-05 LotusFilter: Fast Diverse Nearest Neighbor Search via a Learned Cutoff Table LotusFilter: Schnelles Diverses Naheste Nachbarschaftssuche über einen erfahrenen Cutoff-Tisch LotusFilter:通过计算截止表快速搜索近邻 2506.04790v1 -
2001 06-05 Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques Auf dem Weg zur multimodalen Fusion LLM-Centric: Eine Studie über Integrationsstrategien und -techniken 走向LLM-Centric Multimods 融合:关于一体化战略和技术的调查 2506.04788v1 -
2002 06-05 Kernel $k$-Medoids as General Vector Quantization Kernel $k$-Medoids als General Vector Quantization 作为一般矢量量化的内核 $k$- 中值 2506.04786v1 -
2003 06-05 Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward Prefix Grouper: Effiziente GRPO-Schulung durch Shared-Prefix Forward Prefix Grouper:通过共享前缀前向传播实现高效GRPO训练 2506.05433v1 -
2004 06-05 Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards Verbesserte Regret Bounds für lineare Banditen mit schwerfälligen Belohnungen 使用重负奖励的线性强盗的遗憾弹道改善 2506.04775v1 -
2005 06-05 Fine-Grained Interpretation of Political Opinions in Large Language Models Feinkörnige Interpretation politischer Meinungen in großen Sprachmodellen 以大语言模式对政治意见的精细解释 2506.04774v1 -
2006 06-05 Inclusive, Differentially Private Federated Learning for Clinical Data Inklusives, differenziert privates Federated Learning für klinische Daten 包容性、差异化私联校临床数据学习 2505.22108v2 -
2007 06-05 Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective Der Stabilitäts-Plastizitäts-Trade-off im kontinuierlichen Lernen aus architektonischer Perspektive neu denken 从建筑学的视角重新思考持续学习中稳定-弹性权衡取舍 2506.03951v2 -
2008 06-05 PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling PCDVQ: Verbesserung der Vector Quantization für große Sprachmodelle über Polar Coordinate Entkopplung PCDVQ:通过极地协调脱钩,加强大语言模型的矢量量化 2506.05432v1 -
2009 06-05 OpenGT: A Comprehensive Benchmark For Graph Transformers OpenGT: Ein umfassender Benchmark für Graph Transformer OpenGT: 图表变形器综合基准 2506.04765v1 -
2010 06-05 Log-Linear Attention Log-Linear-Achtung 日志边注意 2506.04761v1 -
2011 06-05 Robustness Evaluation for Video Models with Reinforcement Learning Robustheitsbewertung für Videomodelle mit Verstärkungslernen 强化学习录像模型的强力评价 2506.05431v1 -
2012 06-05 Multi-Layer GRPO: Enhancing Reasoning and Self-Correction in Large Language Models Multi-Layer GRPO: Verbessern von Vernunft und Selbstkorrektion in großen Sprachmodellen 多层GRPO:增强大语言模型的推理与自我纠正 2506.04746v1 -
2013 06-05 Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models Text-zu-CAD-Generierung durch visuelles Feedback in großen Sprachmodellen 通过将视觉反馈引入大语言模型,将生成的文字引入CAD 2501.19054v3 -
2014 06-05 Focus On This, Not That! Steering LLMs with Adaptive Feature Specification Fokus auf diese, nicht das! Lenkung LLMs mit adaptiven Feature Specification 聚焦于此, 不是那个! 指导性带适应性特征规格的LLMS 2410.22944v4 -
2015 06-05 Model-based Neural Data Augmentation for sub-wavelength Radio Localization Modellbasierte Neural Data Augmentation für die Subwellenlängen-Radiolokalisierung 用于亚波长无线电定位的基于模型的神经数据增强 2506.06387v1 -
2016 06-05 The Role of Diversity in In-Context Learning for Large Language Models Die Rolle der Vielfalt im In-Context-Lernen für große Sprachmodelle 多样性在为大语言模式进行内文学习方面的作用 2505.19426v2 -
2017 06-05 SPHINX: Structural Prediction using Hypergraph Inference Network SPHINX: Structural Prediction mit Hypergraph Inference Network SPHINX:利用高光推断网络进行结构预测 2410.03208v2 -
2018 06-05 Automated Modeling Method for Pathloss Model Discovery Automatisierte Modellierungsmethode für Pathloss Model Discovery 病理模型发现自动建模方法 2505.23383v2 -
2019 06-05 Scaling Trends in Language Model Robustness Skalierungstrends in der Robustheit von Sprachmodellen 语言模型强度的增缩趋势 2407.18213v5 -
2020 06-05 $μ$nit Scaling: Simple and Scalable FP8 LLM Training $μ$nit Scaling: Einfache und skalierbare FP8 LLM-Schulung $μ$nit 缩放:简单和可缩放的FP8 LLM培训 2502.05967v3 -
2021 06-05 Coordinated Robustness Evaluation Framework for Vision-Language Models Koordinierte Robustheitsbewertungsrahmen für Vision-Language-Modelle 愿景-语言模型协调强力评价框架 2506.05429v1 -
2022 06-05 Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks Regularisierte KL-Divergenz für gut definierte Funktions-Raum-Varianten-Schlussfolgerung in Bayesischen neuronalen Netzwerken 贝叶斯神经网络中用于良定义函数空间变分推断的正则化KL散度 2406.04317v3 -
2023 06-05 Adapt before Continual Learning Anpassung vor dem kontinuierlichen Lernen 在持续学习前适应 2506.03956v2 -
2024 06-05 FlowDAS: A Stochastic Interpolant-based Framework for Data Assimilation FlowDAS: Ein stochastisches Interpolant-basiertes Framework für die Datenassimilierung FlowDAS:一个基于数据同化的存储式内插框架 2501.16642v2 -
2025 06-05 Wyckoff Transformer: Generation of Symmetric Crystals Wyckoff Transformer: Erzeugung symmetrischer Kristalle Wyckoff 变形器: 生成对称晶体 2503.02407v4 -
2026 06-05 Using In-Context Learning for Automatic Defect Labelling of Display Manufacturing Data Verwendung von In-Context-Learning zur automatischen Defekt-Etikettierung von Display-Produktionsdaten 使用内文学习自动去除显示器制造数据标签 2506.04717v1 -
2027 06-05 Infinitesimal Higher-Order Spectral Variations in Rectangular Real Random Matrices Infinitesimale Spektralvariationen höherer Ordnung in rektangulären realen Zufallsmatrizen 矩形真实随机矩阵中无尽的高端天文光谱变化 2506.03764v2 -
2028 06-05 Bandit Multiclass List Classification Bandit-Listenklassifikation für mehrere Klassen 强盗多级名单分类 2502.09257v2 -
2029 06-05 UNO: Unlearning via Orthogonalization in Generative models UNO: Entlernen über Orthogonalisierung in generativen Modellen UNO:在创制模型中,通过正正孔化重新学习 2506.04712v1 -
2030 06-05 Explicit Density Approximation for Neural Implicit Samplers Using a Bernstein-Based Convex Divergence Explizite Dichte-Annäherung für neurale Implizite Sampler mit einer Bernstein-basierten Konvex-Divergenz 使用伯恩斯坦-以伯尔斯坦为基础的电汇变异的神经隐性采样器的显性密度近似 2506.04700v1 -
2031 06-05 Enhanced Drought Analysis in Bangladesh: A Machine Learning Approach for Severity Classification Using Satellite Data Verbesserte Trockenheitsanalyse in Bangladesch: Ein maschineller Lernansatz für die Severity-Klassifikation mittels Satellitendaten 孟加拉国加强干旱分析:利用卫星数据进行重力分类的机械学习方法 2506.04696v1 -
2032 06-05 On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models Über den Mechanismus der bewussten Musterauswahl im Ausbau des Lernens für Sprachmodelle 语文模式强化学习中选择理由选择模式机制 2506.04695v1 -
2033 06-05 Influence Functions for Edge Edits in Non-Convex Graph Neural Networks Einflussfunktionen für Kantenbearbeitungen in nicht konvexen Graphen-Neuralnetzwerken 非Convex 图形神经网络中的边缘编辑的影响函数 2506.04694v1 -
2034 06-05 Towards Better Generalization via Distributional Input Projection Network Auf dem Weg zu einer besseren Verallgemeinerung über das Distributional Input Projection Network 通过分配投入预测网络更好地实现普遍化 2506.04690v1 -
2035 06-05 Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models Recycling the Web: Eine Methode zur Verbesserung der Vorschulung von Daten Qualität und Menge für Sprachmodelle 网上再循环:提高语文模式培训前数据质量和数量的方法 2506.04689v1 -
2036 06-05 UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules UniSim: Ein einheitlicher Simulator für zeitgesteuerte Dynamik von Biomolekülen UniSim: 生物分子时间分析分析动态的统一模拟器 2506.03157v2 -
2037 06-05 Model Predictive Control is Almost Optimal for Restless Bandit Modell Predictive Control ist fast optimal für Restless Bandit 模型预测控制几乎是无休止强盗的最佳模式 2410.06307v2 -
2038 06-05 Neural Discovery in Mathematics: Do Machines Dream of Colored Planes? Neural Discovery in der Mathematik: Träumen Maschinen von farbigen Flugzeugen? 数学中的神经发现:机器梦到彩色平面吗? 2501.18527v3 -
2039 06-05 Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction Diffusion mit einem sprachlichen Kompass: Steuerung der Generation klinisch plausibler Zukunft sMRI-Darstellungen für frühe MCI-Konversionsvorhersage 语言指南的传播:指导为早期MCI转换预测生成临床可视未来SMRI代表 2506.05428v1 -
2040 06-05 Urania: Differentially Private Insights into AI Use Urania: Unterschiedliche private Einblicke in die KI-Nutzung Urania: 差异私人透视转化为 AI 使用 2506.04681v1 -
2041 06-05 The cost of ensembling: is it always worth combining? Die Kosten für die Montage: lohnt es sich immer zu kombinieren? 集合的成本:总值得合并吗? 2506.04677v1 -
2042 06-05 Gen-n-Val: Agentic Image Data Generation and Validation Gen-n-Val: Agentische Bilddatengenerierung und -validierung Gen-n-Val: 代理图像数据生成和校验 2506.04676v1 -
2043 06-05 Psi-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models Psi-Sampler: Erste Partikel-Probenahme für SMC-basierte Inferenz-Zeit-Belohnung in Score-Modellen Psi-Sampler: 分数模型中基于 SMC 的推理-时间回报比对初始粒子抽样 2506.01320v2 -
2044 06-05 TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization TANGO: Clustering mit Typicality-Aware Nonlocal Mode-Seeking und Graph-Cut-Optimierung TANGO:以非本地模式搜索和图集优化组合组合 2408.10084v2 -
2045 06-05 MTPNet: Multi-Grained Target Perception for Unified Activity Cliff Prediction MTPNet: Multi-Grained Target Perception für Unified Activity Cliff Prediction MTPNet: 统一活动悬崖预测的多重目标感知 2506.05427v1 -
2046 06-05 FedAPM: Federated Learning via ADMM with Partial Model Personalization FedAPM: Federated Learning via ADMM mit Teilmodell-Personalisierung FedAPM:通过带部分模型个性化的ADMM进行联邦学习 2506.04672v1 -
2047 06-05 Noise-Resistant Label Reconstruction Feature Selection for Partial Multi-Label Learning Noise-Resistent Label Reconstruction Feature Selection for Partial Multi-Label Learning 面向部分多标签学习的抗噪声标签重建特征选择 2506.04669v1 -
2048 06-05 Mixture-of-Experts Meets In-Context Reinforcement Learning Mixture-of-Experts trifft auf In-Context-Verstärktes Lernen 混合专家会议 文中加强学习 2506.05426v1 -
2049 06-05 Multiple Invertible and Partial-Equivariant Function for Latent Vector Transformation to Enhance Disentanglement in VAEs Multiple Invertible und Partial-Equivariant Funktion für Latent Vector Transformation zur Verbesserung der Disentanglementierung in VAEs 用于中子矢量变异的多重可垂直和部分等同函数,以强化 VAEs 的分解 2502.03740v2 -
2050 06-05 Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning Latent Safety-Constrained Policy Approach für sicheres Offline-Verstärkungslernen 安全离线强化安全学习受安保约束的政策办法 2412.08794v2 -
2051 06-05 Can Artificial Intelligence Trade the Stock Market? Kann Künstliche Intelligenz den Aktienmarkt handeln? 人工智能贸易 股市? 2506.04658v1 -
2052 06-05 The Oversmoothing Fallacy: A Misguided Narrative in GNN Research Der überwältigende Fehler: Ein fehlgeleitetes Narrativ in der GNN-Forschung 《压倒性代价:GNN研究中的误导性叙述》 2506.04653v1 -
2053 06-05 Active Learning with Variational Quantum Circuits for Quantum Process Tomography Aktives Lernen mit unterschiedlichen Quantenkreisen für die Quantenprozesstomographie 使用量子过程地形图谱学量子变量电路进行主动学习 2412.20925v2 -
2054 06-05 Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction Neural Network Reprogrammierbarkeit: Ein einheitliches Thema auf Modellumprogrammierung, Prompt-Tuning und Prompt-Instruktion 神经网络重组:关于模式重组、即时提款和即时指示的统一主题 2506.04650v1 -
2055 06-05 Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing Nicht alle Optionen sind gleich: Textuelle Optionsgewichtung für Token-Efficient LLM-basierte Wissensverfolgung 并非所有选项都创建为等: Token- Efficient LLM 知识追踪的文本选项比重 2410.12872v2 -
2056 06-05 ActivePusher: Active Learning and Planning with Residual Physics for Nonprehensile Manipulation ActivePusher: Aktives Lernen und Planen mit Residual Physics für nichtprähensile Manipulation 主动推进器: 主动学习和规划,对非传染病操纵进行残余物理研究与规划 2506.04646v1 -
2057 06-05 Inference economics of language models Schlußfolgerung Wirtschaftlichkeit von Sprachmodellen 语言模式的推论经济学 2506.04645v1 -
2058 06-05 MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning MoDoMoDo: Multi-Domain-Datenmischungen für multimodales LLM-Verstärkungslernen MoDoMoDo:用于多模态LLM强化学习的多领域数据混合 2505.24871v2 -
2059 06-05 Adversarial Combinatorial Semi-bandits with Graph Feedback Adversariale Kombinatoriale Halbbänder mit Graph Feedback 带有图图反馈的半斜面 2502.18826v4 -
2060 06-05 An analytic theory of creativity in convolutional diffusion models Eine analytische Theorie der Kreativität in konvolutionären Diffusionsmodellen 革命扩散模型中的创造性分析理论 2412.20292v2 -
2061 06-05 Composing Agents to Minimize Worst-case Risk Komponieren von Agenten zur Minimierung des schlimmsten Risikos 尽量减少最坏情况风险的混合剂 2506.04632v1 -
2062 06-05 PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models Vergifteter Papagei: Subtile Daten vergiften Angriffe auf Elicit Copyright-Verletzung Inhalt aus großen Sprachmodellen 有毒Parrot:大语言模型中侵犯版权内容的隐含数据毒害攻击 2503.07697v2 -
2063 06-05 Constructive Symbolic Reinforcement Learning via Intuitionistic Logic and Goal-Chaining Inference Konstruktives symbolisches Stärkungslernen durch intuitionistische Logik und zielführende Schlussfolgerung 通过神学逻辑和目标 – – 测深推断法进行建设性建构性符号强化学习 2506.05422v1 -
2064 06-05 On The Sample Complexity Bounds In Bilevel Reinforcement Learning Auf der Probe Komplexität Grenzen im Zwei-Ebenen-Verstärkungs-Lernen 关于在双级强化学习中的抽样复杂度交错 2503.17644v3 -
2065 06-05 MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation MAVL: Ein mehrsprachiger Audio-Video-Text Datensatz für animierte Song-Übersetzung MAVL: 动动歌曲翻译多语种视听歌词数据集 2505.18614v2 -
2066 06-05 Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos Ungewissheitsbasiertes Extensibles Codebook für diskretes Federated Learning in heterogenen Datensilos 以不确定性为基础的异异种数据过滤器中分辨联邦学习扩展守则手册 2402.18888v4 -
2067 06-05 Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? Können Wasserzeichen große Sprachmodelle die urheberrechtlich geschützte Textgenerierung verhindern und Trainingsdaten ausblenden? 水标记大语言模型能否防止有版权的文本生成和隐藏培训数据? 2407.17417v3 -
2068 06-05 Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning Bedauern-Optimal Q-Lernen mit niedrigen Kosten für ein-Agenten und Federated Verstärkungs-Lernen 单一机构和联邦强化学习的低费用学习 2506.04626v1 -
2069 06-05 Static Word Embeddings for Sentence Semantic Representation Statische Wort-Einbettungen für Satz semantische Darstellung 判决语义代表的静态单词嵌入 2506.04624v1 -
2070 06-05 Solving Inverse Problems via Diffusion-Based Priors: An Approximation-Free Ensemble Sampling Approach Inverse Probleme durch Diffusionsbasierte Priorien lösen: Ein annäherungsfreier Ensemble-Probenansatz 通过以传播为基础的前例解决反向问题:一种近似-无聚合集成样样样的方法 2506.03979v2 -
2071 06-05 Proactive Model Adaptation Against Concept Drift for Online Time Series Forecasting Proaktive Modellanpassung gegen Konzept Drift für Online-Zeitreihenprognosen 针对在线时间序列预测的 “ 动态动态 “ 概念的前瞻性模型适应 2412.08435v4 -
2072 06-05 Efficiently Serving Large Multimodal Models Using EPD Disaggregation Effizientes Servieren großer multimodaler Modelle mit EPD-Disaggregation 利用EPD拆分有效服务大型多模式模式 2501.05460v3 -
2073 06-05 Exploring bidirectional bounds for minimax-training of Energy-based models Erforschung bidirektionaler Grenzen für Minimax-Training von energiebasierten Modellen 探讨能源模型小型培训的双向双向界限 2506.04609v1 -
2074 06-05 ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning ADG: Ambient Diffusion-Guided Dataset Recovery für Korruption-Robust Offline-Verstärkung Lernen ADG: 为腐败 – – ROBT离线强化学习而进行渗透性传播、指导性数据集恢复 2505.23871v2 -
2075 06-05 Ignoring Directionality Leads to Compromised Graph Neural Network Explanations Ignorieren der Richtungalität führt zu kompromissierten Graphen-Neural-Netzwerk-Erklärungen 忽略方向引导导致压缩的图形神经网络解释 2506.04608v1 -
2076 06-05 MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball MVP-Shapley: Featurebasierte Modellierung für die Bewertung des wertvollsten Spielers im Basketball MVP-Shapley:用于评估篮球最有价值球员的基于特征的建模 2506.04602v1 -
2077 06-05 Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets Skalierungsgesetze für einen robusten Vergleich von Open Foundation Language-Vision-Modellen und -Datensätzen 开放基础语言-视觉模型和数据集强力比较法 2506.04598v1 -
2078 06-05 Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem Lösen des vernünftigen Potenzials von vortrainierten LLMs durch Kritik Feinsteuerung auf ein Problem 通过Critique对一个问题的微调,释放预先培训的LLMs的理据潜力 2506.03295v2 -
2079 06-05 Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable Sicherheitssteuer: Sicherheitsausrichtung macht Ihre großen vernünftigen Modelle weniger vernünftig 安全税:安全协调使你的大理由模型降低合理性 2503.00555v2 -
2080 06-05 Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification Sicher: Mathematische Reasoning in großen Sprachmodellen durch retrospektive stufenweise Verifizierung verbessern 安全:通过回溯性逐步认识正式核查,加强大语言模型中的数学理由 2506.04592v1 -
2081 06-05 Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret Thompson Sampling und UCB verbinden: Auf dem Weg zu effizienteren Kompromissen zwischen Datenschutz und Bedauern 连接Thompson抽样和UCB:实现更高效的隐私和遗憾之间的权衡取舍 2505.02383v2 -
2082 06-05 Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization Adressierung von Konzept-Mislabeling in Konzept-Bottleneck-Modellen durch Preference-Optimierung 通过优先优化处理概念瓶颈模式中的概念误贴标签问题 2504.18026v3 -
2083 06-05 AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment AdvBDGen: Adversarially Fortified Prompt-Spezifische Fuzzy Backdoor Generator gegen LLM Ausrichtung AdvBDGen:抗LLM对齐的反增强的快速特制烟雾后门发电机 2410.11283v3 -
2084 06-05 Learning Hyperparameters via a Data-Emphasized Variational Objective Hyperparameter über ein datenverstärktes Variationsziel lernen 通过数据-按单位分列的变异目标学习超超参数 2502.01861v2 -
2085 06-05 Contrastive Visual Data Augmentation Kontrastive Bilddatenvergrößerung 对比视觉数据增强 2502.17709v2 -
2086 06-05 StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation StatsMerging: Statistik-geführtes Modell Zusammenführen über Task-Spezifische Lehrer-Destillation StatsMerging:通过特定任务教师蒸馏将统计指导模型合并 2506.04567v1 -
2087 06-05 Clustering and Median Aggregation Improve Differentially Private Inference Clustering und Median Aggregation verbessern unterschiedliche private Schlussfolgerungen 群集和中中聚合改善差别私人推断 2506.04566v1 -
2088 06-05 ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration ReCalKV: Low-Rank KV Cache-Kompression über Head Reordering und Offline-Kalibrierung ReCalKV: 通过头重排序和离线校准进行低Rank KV缓存压缩 2505.24357v2 -
2089 06-05 OpenThoughts: Data Recipes for Reasoning Models OpenThoughts: Datenrezepte für Reasoning-Modelle OpenThoughts: 用于推理模型的数据配方 2506.04178v2 -
2090 06-05 Investigating Distributions of Telecom Adapted Sentence Embeddings for Document Retrieval Untersuchung der Verteilung von Telecom-adaptierten Satz-Einbindungen für Dokumentenwiederherstellung 用于文件检索的经调整的远程判刑嵌入表的调查分发情况 2406.12336v3 -
2091 06-05 LEDRO: LLM-Enhanced Design Space Reduction and Optimization for Analog Circuits LEDRO: LLM-Enhanced Design Raumreduzierung und Optimierung für analoge Schaltungen LEDRO: LLM-加强设计空间减少和模拟电路优化 2411.12930v2 -
2092 06-05 What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective Was in LLM-Schichten passiert ist, wenn es um schnelles gegen langsames Denken geht: Eine gradiente Perspektive 训练快速与慢思考:渐进视角时 LLM 图层中发生的情况 2410.23743v2 -
2093 06-05 Non-linear Multi-objective Optimization with Probabilistic Branch and Bound Nichtlineare Multiobjektive Optimierung mit probabilistischem Zweig und Bound 非线性多目标优化与概率分支和圆 2506.04554v1 -
2094 06-05 Unsupervised Machine Learning for Scientific Discovery: Workflow and Best Practices Unüberwachtes maschinelles Lernen für wissenschaftliche Entdeckung: Workflow und Best Practices 为科学发现进行无人监督的机器学习:工作流程和最佳做法 2506.04553v1 -
2095 06-05 Policy learning “without” overlap: Pessimism and generalized empirical Bernstein’s inequality Politisches Lernen “ohne” überschneidet sich: Pessimismus und generalisierte empirische Ungleichheit Bernsteins 政策学习“无重叠”重叠:悲观主义和普遍经验型伯恩斯坦的不平等 2212.09900v4 -
2096 06-05 Communication Efficient Adaptive Model-Driven Quantum Federated Learning Kommunikation Effizientes adaptives modellgetriebenes Quantum-Federiertes Lernen 适应性强、适应性强的示范模式-量子联邦学习 2506.04548v1 -
2097 06-05 $K$-MSHC: Unmasking Minimally Sufficient Head Circuits in Large Language Models with Experiments on Syntactic Classification Tasks $K$-MSHC: Entlarvung minimal ausreichender Kopfkreise in großen Sprachmodellen mit Experimenten zu syntaktischen Klassifizierungsaufgaben $K$-MSHC:在以同步分类任务实验的大型语言模型中,以最小化的足够大语言电路总电路 2505.12268v2 -
2098 06-05 ATLaS: Agent Tuning via Learning Critical Steps ATLaS: Agent Tuning über Learning Critical Steps 通过学习关键步骤进行测试的代理 2503.02197v2 -
2099 06-05 General Intelligence Requires Reward-based Pretraining Allgemeine Intelligenz erfordert eine prämienbasierte Vorausbildung 以奖励为基础的训练前训练 2502.19402v2 -
2100 06-05 Combinatorial Reinforcement Learning with Preference Feedback Kombinatorisches Stärkungslernen mit Präferenz-Feedback 结合强化学习与优先反馈 2502.10158v3 -
2101 06-05 hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation hdl2v: Ein Code-Übersetzungsdatensatz für verbesserte LLM Verilog-Generierung hdl2v: 用于强化LLM Verilog 生成的代码翻译数据集 2506.04544v1 -
2102 06-05 Neural MJD: Neural Non-Stationary Merton Jump Diffusion for Time Series Prediction Neurale MJD: Neurale nicht-stationäre Merton Jump Diffusion für Zeitreihenvorhersage 神经MJD: 用于时间序列预测的神经非立体Merton跳跃扩散 2506.04542v1 -
2103 06-05 Chronoamperometry with Room-Temperature Ionic Liquids: Sub-Second Inference Techniques Chronoamperometrie mit Ionenflüssigkeiten der Raumtemperatur: Sub-Second Inferenztechniken 与室-室-温度-不态液体:二次二次推断技术 2506.04540v1 -
2104 06-05 Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models Intelligent4DSE: Optimierung der High-Level-Synthese Design Space Exploration mit Graph Neural Networks und großen Sprachmodellen 智能4DSE:利用图形神经网络和大语言模型优化高级合成设计空间探索 2504.19649v2 -
2105 06-05 Olfactory Inertial Odometry: Sensor Calibration and Drift Compensation Olfactory Inertial Odometrie: Sensorkalibrierung und Driftkompensation 腐蚀性惰性测量:传感器校准和漂流赔偿 2506.04539v1 -
2106 06-05 Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs Projektionsbasierte Lyapunov-Methode für voll heterogene schwach gekoppelte MDPs 以投射为基础的Lyapunov 方法,用于完全多元的微软相混合的 MDP 2502.06072v4 -
2107 06-05 NOBLE – Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models NOBLE – Neuronenoperator mit biologisch informierten Latent-Embeddings zur Erfassung experimenteller Variabilität in biologischen Neuronenmodellen NOBLE – 神经操作器,具有生物知情的中程嵌入装置,以捕获生物中子模型中的实验变异性 2506.04536v1 -
2108 06-05 On the Emergence of Position Bias in Transformers Über die Entstehung der Position Bias in Transformatoren 变换者中立场偏见的出现 2502.01951v2 -
2109 06-05 HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training HALoS: Hierarchische asynchrone lokale SGD über langsame Netzwerke für geo-distributed Large Language Model Training HALoS:地理分布大语言模式培训的跨慢网的等级非同步本地 SGD 2506.04531v1 -
2110 06-05 Self-Predictive Dynamics for Generalization of Vision-based Reinforcement Learning Selbstvorhersagbare Dynamiken für die Generalisierung des visionsbasierten Verstärkungslernens 普及基于愿景的强化学习的自我预测动态 2506.05418v1 -
2111 06-05 Hierarchical Implicit Neural Emulators Hierarchische Implizite Neuralemulatoren 等级隐性神经隐性模拟器 2506.04528v1 -
2112 06-05 Uncovering Memorization Effect in the Presence of Spurious Correlations Uncovering Memorization Effekt in der Gegenwart von Spurious Correlations 在存在净腐蚀物时未覆盖的记忆化效应 2501.00961v3 -
2113 06-05 Perturbative Gradient Training: A novel training paradigm for bridging the gap between deep neural networks and physical reservoir computing Perturbative Gradient Training: Ein neuartiges Trainingsparadigma zur Überbrückung der Kluft zwischen tiefen neuronalen Netzwerken und physikalischen Reservoir Computing 循环渐进培训:弥合深神经网络与物理储油层计算之间的差距的新培训模式 2506.04523v1 -
2114 06-05 EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles EnsemW2S: Verbesserung der Schwach-zu-Strong-Verallgemeinerung mit großsprachigen Modellensembles EnsemW2S:用大语言模型组合加强弱至强的通用化 2505.21959v2
Article 0
Title@2025-06-12 (4): Rethinking Losses for Diffusion Bridge Samplers
Title: Rethinking Losses for Diffusion Bridge Samplers | Rückdenken von Verlusten für Diffusionsbrücken-Probenehmer | 重新思考扩散桥采样器的损失 2506.10982v1 |
Authors (5): Sebastian Sanokowski, Lukas Gruber, Christoph Bartmann, Sepp Hochreiter, Sebastian Lehner
Diffusion bridges are a promising class of deep-learning methods for sampling from unnormalized distributions. Recent works show that the Log Variance (LV) loss consistently outperforms the reverse Kullback-Leibler (rKL) loss when using the reparametrization trick to compute rKL-gradients. While the on-policy LV loss yields identical gradients to the rKL loss when combined with the log-derivative trick for diffusion samplers with non-learnable forward processes, this equivalence does not hold for diffusion bridges or when diffusion coefficients are learned. Based on this insight, we argue that for diffusion bridges the LV loss does not represent an optimization objective that can be motivated like the rKL loss via the data processing inequality. Our analysis shows that employing the rKL loss with the log-derivative trick (rKL-LD) not only avoids these conceptual problems but also consistently outperforms the LV loss. Experimental results with different types of diffusion bridges on challenging benchmarks show that samplers trained with the rKL-LD loss achieve better performance. From a practical perspective, we find that rKL-LD requires significantly less hyperparameter optimization and yields more stable training behavior.
扩散桥是一个很有希望的深层学习方法类别,从非正常分布分布中取样。最近的工程显示,日志差异(LV)损失在使用重新校正技巧计算 rKL 梯度时,始终优于反向 Kullback- Leiber (rKL) 损失。虽然在政策上LV 损失产生与 RKL 损失相同的梯度,如果与逻辑衍生技巧(rKL-LD)相结合,不仅避免了这些概念问题,而且始终超越了LV 损失。在具有挑战性的基准上,不同种类的传播桥梁的实验结果显示,通过 RKLL-LD损失培训的采样者要达到更高的性能。从实际角度看,我们发现使用RKLLL损失的优化不是最强的,而是更稳定地进行高的优化。
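The gradient equivalence mentioned in the abstract can be sketched in a simplified setting. The derivation below is an illustration under stated assumptions, not the paper's own argument: path measures are abstracted to a sampler density $q_\theta$ and a fixed (non-learned) unnormalized target $\tilde p$, the LV loss is normalized with a factor $\tfrac12$, and $r$ denotes the detached sampling distribution, set to $q_\theta$ in the on-policy case.

```latex
% Assumption: f_\theta(x) = \log q_\theta(x) - \log \tilde p(x), with \tilde p independent of \theta.
\begin{align*}
\mathcal{L}_{\mathrm{rKL}}(\theta)
  &= \mathbb{E}_{x \sim q_\theta}\big[f_\theta(x)\big], \\
\nabla_\theta \mathcal{L}_{\mathrm{rKL}}(\theta)
  &= \mathbb{E}_{x \sim q_\theta}\big[f_\theta(x)\,\nabla_\theta \log q_\theta(x)\big]
   + \underbrace{\mathbb{E}_{x \sim q_\theta}\big[\nabla_\theta \log q_\theta(x)\big]}_{=\,0}, \\
\mathcal{L}_{\mathrm{LV}}(\theta)
  &= \tfrac{1}{2}\,\mathrm{Var}_{x \sim r}\big[f_\theta(x)\big], \\
\nabla_\theta \mathcal{L}_{\mathrm{LV}}(\theta)\Big|_{r = q_\theta}
  &= \mathbb{E}_{x \sim q_\theta}\Big[\big(f_\theta(x) - \mathbb{E}_{q_\theta}[f_\theta]\big)\,
     \nabla_\theta \log q_\theta(x)\Big]
   = \nabla_\theta \mathcal{L}_{\mathrm{rKL}}(\theta).
\end{align*}
```

The last equality uses $\nabla_\theta f_\theta = \nabla_\theta \log q_\theta$, which holds only because $\tilde p$ does not depend on $\theta$. Once the forward process is learned, $\nabla_\theta f_\theta$ picks up additional terms and the two gradients no longer coincide, which is consistent with the abstract's claim about diffusion bridges.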
Article 1
Title@2025-06-12 (4): Fine-Grained Perturbation Guidance via Attention Head Selection
Title: Fine-Grained Perturbation Guidance via Attention Head Selection | Feinkörnige Störungsführung über Aufmerksamkeitskopfauswahl | 通过 “ 关注负责人甄选 “ 指导 2506.10978v1 |
Authors (10): Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Minjae Kim, Jaewon Min, Wooseok Jang, Saungwu Lee, Sayak Paul, Susung Hong, Seungryong Kim
Recent guidance methods in diffusion models steer reverse sampling by perturbing the model to construct an implicit weak model and guide generation away from it. Among these approaches, attention perturbation has demonstrated strong empirical performance in unconditional scenarios where classifier-free guidance is not applicable. However, existing attention perturbation methods lack principled approaches for determining where perturbations should be applied, particularly in Diffusion Transformer (DiT) architectures where quality-relevant computations are distributed across layers. In this paper, we investigate the granularity of attention perturbations, ranging from the layer level down to individual attention heads, and discover that specific heads govern distinct visual concepts such as structure, style, and texture quality. Building on this insight, we propose “HeadHunter”, a systematic framework for iteratively selecting attention heads that align with user-centric objectives, enabling fine-grained control over generation quality and visual attributes. In addition, we introduce SoftPAG, which linearly interpolates each selected head’s attention map toward an identity matrix, providing a continuous knob to tune perturbation strength and suppress artifacts. Our approach not only mitigates the oversmoothing issues of existing layer-level perturbation but also enables targeted manipulation of specific visual styles through compositional head selection. We validate our method on modern large-scale DiT-based text-to-image models including Stable Diffusion 3 and FLUX.1, demonstrating superior performance in both general quality enhancement and style-specific guidance. Our work provides the first head-level analysis of attention perturbation in diffusion models, uncovering interpretable specialization within attention layers and enabling practical design of effective perturbation strategies.
传播模型的最近指导方法通过扰动模型来引导反向取样,以构建一个隐含的薄弱模型并引导其生成。在这些方法中,人们的注意力扰动在无条件的假设情景中表现出了很强的经验性表现,这些假设情景中不适用无分类指导。然而,现有的注意力扰动方法缺乏确定应在何处应用扰动的系统化框架,特别是在与质量相关的计算分布在多个层次的Difult 变异器(DiT)结构中。在本文中,我们调查了从层到个人关注头的注意力扰动颗粒性变化,发现具体的头部在结构、风格和质质质质量质量质量等不同的视觉概念上表现得非常强。基于这一洞察,我们建议“头动”是一个系统化框架,用于迭代地选择与用户中心目标一致的注意力头部结构,从而能够对生成质量和视觉属性进行细化控制。 此外,我们引入了SoftPAG, 将每个选定的头部关注度图以直线度为首,提供持续 knob 以调动的振动性流动性水平和压压动性结构型结构的深度结构。 我们的方法只能在视觉结构内部进行着的升级的深度上进行着的升级的升级,我们现有的视觉结构结构结构的升级,我们通过特定的调整,只能测测测测测测测测测测测测测测测测测测测测测测测测,我们现有的系统, 。
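The SoftPAG interpolation described in the abstract can be illustrated with a minimal NumPy sketch. The function name `soft_perturb_attention`, the tensor layout `(heads, tokens, tokens)`, and the parameter name `u` are assumptions made for illustration; the paper's implementation may differ, and the use of the perturbed branch for guidance is not shown.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    scores = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=axis, keepdims=True)

def soft_perturb_attention(attn, selected_heads, u):
    """Interpolate selected heads' attention maps toward the identity.

    attn: array of shape (heads, tokens, tokens), each row sums to 1.
    selected_heads: indices of heads to perturb (hypothetical selection).
    u: perturbation strength in [0, 1]; 0 leaves the map unchanged,
       1 replaces it with the identity matrix.
    """
    _, n, _ = attn.shape
    identity = np.eye(n)
    out = attn.copy()
    for h in selected_heads:
        out[h] = (1.0 - u) * attn[h] + u * identity
    return out

rng = np.random.default_rng(0)
attn = softmax(rng.normal(size=(4, 5, 5)))  # 4 heads, 5 tokens
perturbed = soft_perturb_attention(attn, selected_heads=[1, 3], u=0.25)
```

Because both the attention map and the identity are row-stochastic, every interpolated map remains a valid attention distribution for any `u` in [0, 1].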
Article 2
Title@2025-06-12 (4): AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
Title: AutoMind: Adaptive Knowledgeable Agent for Automated Data Science | AutoMind: Adaptives Knowledgeable Agent für automatisierte Datenwissenschaft | 自动Mind:自动数据科学适应性知识代理 2506.10974v1 |
Authors (9): Yixin Ou, Yujie Luo, Jingsheng Zheng, Lanning Wei, Shuofei Qiao, Jintian Zhang, Da Zheng, Huajun Chen, Ningyu Zhang
Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind delivers superior performance versus state-of-the-art baselines. Additional analyses confirm favorable effectiveness, efficiency, and qualitative solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science.
大型语言模型(LLM)代理商在解决现实世界数据科学问题方面表现出了巨大的潜力。LLM驱动的数据科学代理商承诺使整个机器学习管道自动化,然而其真实世界的有效性仍然有限。现有框架依赖于僵硬、预先定义的工作流程和不灵活的编码战略;因此,它们仅擅长于相对简单、古老的问题,未能捕捉人类从业者带来复杂、创新任务的经验专长。在这项工作中,我们引入了AutoMind(AutoMind)(一个适应性、知识丰富的LLM(LM)代理商)框架,通过三项关键进步克服了这些缺陷:(1) 一种成熟的专家知识基础,使该代理商具有领域专家知识;(2) 一种具有代理知识的树搜索算法,从战略上探索可能的解决方案;(3) 一种自我调整的编码战略,根据任务的复杂性动态地定制生成代码。对两个自动化数据科学基准的评估表明,AutoMind(AutoMind)能够提供优异的绩效、效率和质量解决方案质量,强调AutMind(Autmind)是迈向完全自动化数据科学的高效和稳健健捷的一步。
Article 3
Title@2025-06-12 (4): Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning
Title: Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning | Prinzipierte Ansätze zur Erweiterung neuraler Architekturen auf Funktionsräume für das Bediener-Lernen | 将神经结构扩展至功能空间以便操作员学习的原则办法 2506.10973v1 |
Authors (7): Julius Berner, Miguel Liu-Schiaffini, Jean Kossaifi, Valentin Duruisseaux, Boris Bonev, Kamyar Azizzadenesheli, Anima Anandkumar
A wide range of scientific problems, such as those described by continuous-time dynamical systems and partial differential equations (PDEs), are naturally formulated on function spaces. While function spaces are typically infinite-dimensional, deep learning has predominantly advanced through applications in computer vision and natural language processing that focus on mappings between finite-dimensional spaces. Such fundamental disparities in the nature of the data have kept neural networks from achieving, in scientific applications, the level of success seen in other fields. Neural operators are a principled way to generalize neural networks to mappings between function spaces, offering a pathway to replicate deep learning’s transformative impact on scientific problems. For instance, neural operators can learn solution operators for entire classes of PDEs, e.g., physical systems with different boundary conditions, coefficient functions, and geometries. A key factor in deep learning’s success has been the careful engineering of neural architectures through extensive empirical testing. Translating these neural architectures into neural operators allows operator learning to enjoy these same empirical optimizations. However, prior neural operator architectures have often been introduced as standalone models, not directly derived as extensions of existing neural network architectures. In this paper, we identify and distill the key principles for constructing practical implementations of mappings between infinite-dimensional function spaces. Using these principles, we propose a recipe for converting several popular neural architectures into neural operators with minimal modifications. This paper aims to guide practitioners through this process and details the steps to make neural operators work in practice. Our code can be found at https://github.com/neuraloperator/NNs-to-NOs
功能空间一般是无限的,但深层次的学习主要通过计算机视觉和自然语言处理的应用,侧重于在有限空间之间绘制地图。这些数据性质上的根本差异限制了神经网络在科学应用方面取得与其它领域相似的成功水平。神经操作员是将神经网络推广到功能空间间绘图的一条原则性途径,为复制深层次学习对科学问题的影响提供了途径。例如,神经操作员可以学习整个类别PDE的解决方案操作员,例如,具有不同边界条件、系数功能和地理特征的物理系统。深层次学习的关键因素之一是通过广泛的实验测试对神经结构进行仔细的工程设计。将这些神经结构转换成神经操作员,使操作员能够学习这些在功能空间之间的实验性优化。然而,以前的神经操作员结构常常被引入为独立模型,而不是直接在构建现有神经网络结构的扩展中,通过这些核心结构的扩展,我们通过这些软质操作员和软质结构的扩展,我们找到了这些软质操作员的模型。
Article 4
Title@2025-06-12 (4): Farseer: A Refined Scaling Law in Large Language Models
Title: Farseer: A Refined Scaling Law in Large Language Models | Farseer: Ein verfeinertes Skalierungsgesetz in großen Sprachmodellen | Farseer:大语言模式中经过完善的扩大法 2506.10972v1 |
Authors (11): Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing a model loss surface $L(N,D)$, Farseer achieves a significantly better fit to empirical data than prior laws (e.g., Chinchilla’s law). Our methodology yields accurate, robust, and highly generalizable predictions, demonstrating excellent extrapolation capabilities, improving upon Chinchilla’s law by reducing extrapolation error by 433%. This allows for the reliable evaluation of competing training strategies across all $(N,D)$ settings, enabling conclusions from small-scale ablation studies to be confidently extrapolated to predict large-scale performance. Furthermore, Farseer provides new insights into optimal compute allocation, better reflecting the nuanced demands of modern LLM training. To validate our approach, we trained an extensive suite of approximately 1,000 LLMs across diverse scales and configurations, consuming roughly 3 million NVIDIA H100 GPU hours. We are comprehensively open-sourcing all models, data, results, and logs at https://github.com/Farseer-Scaling-Law/Farseer to foster further research.
培训大型语言模型(LLMS)的费用太高,令人望而却步,造成了一个关键的缩小差距,因为小规模实验的洞察力往往无法向资源密集型生产系统转移,从而阻碍有效的创新。为了弥补这一差距,我们引入了Farseer, 这是一项新颖和经过改进的缩略法, 提供了更高的全尺度预测准确度。通过系统地构建一个模型损失表面值$(N,D),Farseer比以往的法律(例如Chinchilla的法律)更符合经验数据。我们的方法得出了准确、有力和高度普遍化的预测,显示了极好的外推法能力,通过减少433的外推误改进了Chinchilla的法律。这样,我们就可以可靠地评估所有(N,D)环境中相互竞争的培训战略,提供更高的预测性预测性预测性能。此外,Farseer提供了对最佳的计算分配的新见解,更好地反映现代LMM培训的精细要求。为了验证我们的方法,我们培训了整个方法,在各种规模和配置上大约1 000 LLMS-LMS-IA全面研算。
Article 5
Title@2025-06-12 (4): What Exactly Does Guidance Do in Masked Discrete Diffusion Models
Title: What Exactly Does Guidance Do in Masked Discrete Diffusion Models | Was genau tut Anleitung in maskierten diskreten Diffusionsmodellen? | 在隐蔽的分辨扩散模型中,指导到底做什么? 2506.10971v1 |
Authors (3): He Ye, Rojas Kevin, Tao Molei
We study masked discrete diffusion models with classifier-free guidance (CFG). Assuming neither score error nor discretization error, we derive an explicit solution to the guided reverse dynamics, so that the influence of guidance on the sampling behavior can be precisely characterized. When the full data distribution is a mixture over classes and the goal is to sample from a specific class, guidance amplifies class-specific regions while suppressing regions shared with other classes. This effect depends on the guidance strength $w$ and induces distinct covariance structures in the sampled distribution. Notably, we observe quantitatively different behaviors in $1$D and $2$D. We also show that for large $w$, the decay rate of the total variation ($\mathrm{TV}$) along the reverse dynamics is double-exponential in $w$ for both $1$D and $2$D. These findings highlight the role of guidance, not just in shaping the output distribution, but also in controlling the dynamics of the sampling trajectory. Our theoretical analysis is supported by experiments that illustrate the geometric effects of guidance and its impact on convergence.
假设没有分数错误或分解错误,我们就会为引导反向动态找到一个明确的解决方案,从而可以精确地描述指导如何影响抽样行为。当完整的数据分布是跨类混合的,目标是从特定类别中取样时,指导会扩大特定类别区域,同时压制与其他类别共享的区域。这种影响取决于指导强度,因为美元值取决于指导强度,并会在抽样分布中产生不同的共差结构。值得注意的是,我们观察到以1美元和2美元计算的量化不同行为。我们还表明,对于大美元而言,整个反向动态变化的衰减率($mathrm{TV}$)是双倍的,以1美元和2美元计算。这些结论突出了指导的作用,不仅在决定产出分布方面,而且在控制抽样分布的动态方面。我们的理论分析得到实验的支持,这些实验显示了指导的几何效果及其对趋同的影响。
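The amplify-and-suppress effect described in the abstract can be illustrated on a three-state toy example. The sketch below assumes the commonly used tilted form $p(x \mid c)^{1+w} / p(x)^{w}$ for the guided distribution; this convention, the function name `guided_distribution`, and the toy probabilities are assumptions for illustration, and the code shows only the static tilting, not the paper's explicit solution of the reverse dynamics.

```python
import numpy as np

def guided_distribution(p_cond, p_uncond, w):
    """Return the distribution proportional to p_cond**(1+w) / p_uncond**w.

    States where p_cond is zero keep zero mass, which avoids 0/0.
    """
    p_cond = np.asarray(p_cond, dtype=float)
    p_uncond = np.asarray(p_uncond, dtype=float)
    unnorm = np.zeros_like(p_cond)
    support = p_cond > 0
    unnorm[support] = p_cond[support] ** (1.0 + w) / p_uncond[support] ** w
    return unnorm / unnorm.sum()

# Toy states: 0 = only class A, 1 = shared by both classes, 2 = only class B.
p_a = np.array([0.5, 0.5, 0.0])
p_b = np.array([0.0, 0.5, 0.5])
p_data = 0.5 * p_a + 0.5 * p_b  # equal-weight mixture: [0.25, 0.5, 0.25]

for w in (0.0, 1.0, 3.0):
    # w = 0 recovers p_a; w = 1 gives [2/3, 1/3, 0]; w = 3 gives [8/9, 1/9, 0].
    print(w, guided_distribution(p_a, p_data, w))
```

As `w` grows, mass moves from the shared state to the class-specific state, which mirrors the qualitative behavior stated in the abstract.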
Article 6
Title@2025-06-12 (4): SoK: Watermarking for AI-Generated Content
Title: SoK: Watermarking for AI-Generated Content | SoK: Wasserzeichen für KI-generierten Inhalt | SoK: 用于 AI 生成内容的水标志 2411.18479v3 |
Authors (14): Xuandong Zhao, Sam Gunn, Miranda Christ, Jaiden Fairoze, Andres Fabrega, Nicholas Carlini, Sanjam Garg, Sanghyun Hong, Milad Nasr, Florian Tramer, Somesh Jha, Lei Li, Yu-Xiang Wang, Dawn Song
As the outputs of generative AI (GenAI) techniques improve in quality, it becomes increasingly challenging to distinguish them from human-created content. Watermarking schemes are a promising approach to address the problem of distinguishing between AI and human-generated content. These schemes embed hidden signals within AI-generated content to enable reliable detection. While watermarking is not a silver bullet for addressing all risks associated with GenAI, it can play a crucial role in enhancing AI safety and trustworthiness by combating misinformation and deception. This paper presents a comprehensive overview of watermarking techniques for GenAI, beginning with the need for watermarking from historical and regulatory perspectives. We formalize the definitions and desired properties of watermarking schemes and examine the key objectives and threat models for existing approaches. Practical evaluation strategies are also explored, providing insights into the development of robust watermarking techniques capable of resisting various attacks. Additionally, we review recent representative works, highlight open challenges, and discuss potential directions for this emerging field. By offering a thorough understanding of watermarking in GenAI, this work aims to guide researchers in advancing watermarking methods and applications, and support policymakers in addressing the broader implications of GenAI.
由于基因化的AI(GenAI)技术在质量上有所改进,因此将其与人类创造的内容区别开来已变得日益具有挑战性。水标记计划是解决区别AI和人类生成的内容问题的一个很有希望的方法。这些计划将隐藏的信号嵌入AI生成的内容中,以便能够进行可靠的检测。虽然水标记并不是解决与GenAI有关的所有风险的银弹,但它可以通过打击错误和欺骗行为,在提高AI的安全和可信度方面发挥关键作用。本文件全面概述了GenAI的水标记技术,从历史和监管角度出发,为GenAI提供一个水标记技术的全面概览。我们正式确定水标记计划的定义和预期特性,并审查现有方法的关键目标和威胁模式。我们还探讨了实际评估战略,为开发能够抵御各种袭击的稳健的水标记技术提供了深刻的见解。此外,我们审查最近的代表著作,突出公开的挑战,并讨论这一新兴领域的潜在方向。通过全面了解GenAI的水标记,这项工作旨在指导研究人员推进水标记方法和应用,并支持决策者解决GenAI的更广泛影响。
Article 7
Title@2025-06-12 (4): SpectralAR: Spectral Autoregressive Visual Generation
Title: SpectralAR: Spectral Autoregressive Visual Generation | SpectralAR: Spectral Autoregressive Visual Generation | 光谱AR:光谱自动递减视觉生成 2506.10962v1 |
Authors (6): Yuanhui Huang, Weiliang Chen, Wenzhao Zheng, Yueqi Duan, Jie Zhou, Jiwen Lu
Autoregressive visual generation has garnered increasing attention due to its scalability and compatibility with other modalities compared with diffusion models. Most existing methods construct visual sequences as spatial patches for autoregressive generation. However, image patches are inherently parallel, contradicting the causal nature of autoregressive modeling. To address this, we propose a Spectral AutoRegressive (SpectralAR) visual generation framework, which realizes causality for visual sequences from the spectral perspective. Specifically, we first transform an image into ordered spectral tokens with Nested Spectral Tokenization, representing lower to higher frequency components. We then perform autoregressive generation in a coarse-to-fine manner with the sequences of spectral tokens. By considering different levels of detail in images, our SpectralAR achieves both sequence causality and token efficiency without bells and whistles. We conduct extensive experiments on ImageNet-1K for image reconstruction and autoregressive generation, and SpectralAR achieves 3.02 gFID with only 64 tokens and 310M parameters. Project page: https://huang-yh.github.io/spectralar/.
与扩散模型相比,自动递减的视觉生成由于其可变性和与其他模式的兼容性而日益引起注意。大多数现有方法将视觉序列构建为自动递减生成的空间补丁。然而,图像补丁本质上是平行的,与自动递减模型的因果关系性质相矛盾。为了解决这个问题,我们提议一个光谱自动递减生成框架,从光谱角度认识视觉序列的因果关系。具体地说,我们首先将图像转换成带有Nested Spectoral Tokenization(代表低频至高频组件)的定购光谱符号。然后,我们用光谱符号序列的相向偏移方式进行自动递增生成。通过考虑图像中不同的详细程度,我们的SpectrameralAR在没有钟和哨子的情况下实现序列因果关系和象征性效率。我们用图像Net-1K进行广泛的实验,用于图像重建和自动递增生成,而SpecturalAR只实现3.02 gFID,只有64个符号和3100M参数。项目页面: https://huang-ygio/slamgioalgioal.
Article 8
Title@2025-06-12 (4): ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
Title: ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark | ChinesischHarm-Bench: Ein chinesischer schädlicher Content Detection Benchmark | 中中汉禁区:中国有害内容检测基准 2506.10960v1 |
Authors (10): Kangwei Liu, Siyuan Cheng, Bozhong Tian, Xiaozhuan Liang, Yuyang Yin, Meng Han, Ningyu Zhang, Bryan Hooi, Xi Chen, Shumin Deng
Large language models (LLMs) have been increasingly applied to automated harmful content detection tasks, assisting moderators in identifying policy violations and improving the overall efficiency and accuracy of content review. However, existing resources for harmful content detection are predominantly focused on English, with Chinese datasets remaining scarce and often limited in scope. We present a comprehensive, professionally annotated benchmark for Chinese content harm detection, which covers six representative categories and is constructed entirely from real-world data. Our annotation process further yields a knowledge rule base that provides explicit expert knowledge to assist LLMs in Chinese harmful content detection. In addition, we propose a knowledge-augmented baseline that integrates both human-annotated knowledge rules and implicit knowledge from large language models, enabling smaller models to achieve performance comparable to state-of-the-art LLMs. Code and data are available at https://github.com/zjunlp/ChineseHarm-bench.
大型语言模型(LLMS)已越来越多地用于自动识别有害内容的任务,协助主持人查明违反政策的情况,提高内容审查的总体效率和准确性;然而,用于检测有害内容的现有资源主要集中于英语,中国数据集仍然稀缺,而且范围往往有限;我们为中国内容伤害检测提供了一个全面、专业的附加说明的基准,该基准涵盖六个代表性类别,完全以现实世界数据为基础;我们的批注过程还产生了一个知识规则基础,该知识基础提供明确的专家知识知识,协助查找中国有害内容的LMS;此外,我们提出一个知识强化基线,将人附加说明的知识规则和大型语言模型的隐含知识结合起来,使较小的模型能够达到与最先进的LMS相近的性能;可在https://github.com/zjunp/中国Harm-bench查阅数据。
Article 9
Title@2025-06-12 (4): Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods
Title: Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods | In-Context-Lernen über strukturierte Manifolds verstehen: Aufmerksamkeit auf Kernel-Methoden überbrücken | 理解关于结构式操纵的内文学习:弥合对核心方法的注意 2506.10959v1 |
Authors (4): Zhaiming Shen, Alexander Hsu, Rongjie Lai, Wenjing Liao
While in-context learning (ICL) has achieved remarkable success in natural language and vision domains, its theoretical understanding, particularly in the context of structured geometric data, remains unexplored. In this work, we initiate a theoretical study of ICL for regression of Hölder functions on manifolds. By establishing a novel connection between the attention mechanism and classical kernel methods, we derive generalization error bounds in terms of the prompt length and the number of training tasks. When a sufficient number of training tasks are observed, transformers give rise to the minimax regression rate of Hölder functions on manifolds, which scales exponentially with the intrinsic dimension of the manifold, rather than the ambient space dimension. Our result also characterizes how the generalization error scales with the number of training tasks, shedding light on the complexity of transformers as in-context algorithm learners. Our findings provide foundational insights into the role of geometry in ICL and novel tools to study ICL of nonlinear models.
虽然在自然语言和视觉领域取得了显著的成功(ICL),但其理论理解 – – 特别是在结构化几何数据-剩余部分尚未探索的几何数据-领域。在这项工作中,我们开始对ICL进行理论研究,以在多元上回归H'older函数。通过在关注机制和古典内核方法之间建立新的联系,我们从迅速的长度和培训任务的数量方面得出了概括性错误的界限。当观察到足够数量的训练任务时,变压器产生在多元体上的H'older函数的微缩回归率,这些元体与多元体的内在维度相比,而不是环境空间层面是倍增的。我们的结果还说明了与培训任务数量相比的一般误差尺度,揭示了变压器作为同源算算算法学习者的复杂性。我们的调查结果为在ICL和研究非线性模型的ICL(ICL)的新工具中的几何作用提供了基本洞察力。
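One well-known way to see a link between attention and kernel methods is that softmax attention with distance-based logits coincides with Nadaraya-Watson kernel regression. The sketch below illustrates that general fact on points sampled from a circle; it is not the paper's construction, and the function names, the Gaussian kernel, and the bandwidth value are assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_predict(x_query, xs, ys, bandwidth):
    """Single-query softmax attention over prompt pairs (xs[i], ys[i])."""
    logits = -np.sum((xs - x_query) ** 2, axis=1) / (2.0 * bandwidth ** 2)
    return softmax(logits) @ ys

def nadaraya_watson(x_query, xs, ys, bandwidth):
    """Gaussian-kernel Nadaraya-Watson regression estimate."""
    k = np.exp(-np.sum((xs - x_query) ** 2, axis=1) / (2.0 * bandwidth ** 2))
    return (k @ ys) / k.sum()

# Prompt: 200 points on the unit circle (a 1-D manifold embedded in R^2).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
xs = np.stack([np.cos(theta), np.sin(theta)], axis=1)
ys = np.sin(3.0 * theta)  # smooth target function on the manifold

x_q = np.array([np.cos(1.0), np.sin(1.0)])
pred_attn = attention_predict(x_q, xs, ys, bandwidth=0.2)
pred_nw = nadaraya_watson(x_q, xs, ys, bandwidth=0.2)
```

The two predictions agree up to floating-point error because the softmax normalization is exactly the kernel-weight normalization; the prompt length plays the role of the sample size in the kernel estimate.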
Article 10
Title@2025-06-12 (4): AssistanceZero: Scalably Solving Assistance Games
Title: AssistanceZero: Scalably Solving Assistance Games | AssistanceZero: Skalably Solving Assistance Games | 援助零:可升级解决援助运动会 2504.07091v2 |
Authors (8): Cassidy Laidlaw, Eli Bronstein, Timothy Guo, Dylan Feng, Lukas Berglund, Justin Svegliato, Stuart Russell, Anca Dragan
Assistance games are a promising alternative to reinforcement learning from human feedback (RLHF) for training AI assistants. Assistance games resolve key drawbacks of RLHF, such as incentives for deceptive behavior, by explicitly modeling the interaction between assistant and user as a two-player game where the assistant cannot observe their shared goal. Despite their potential, assistance games have only been explored in simple settings. Scaling them to more complex environments is difficult because it requires both solving intractable decision-making problems under uncertainty and accurately modeling human users’ behavior. We present the first scalable approach to solving assistance games and apply it to a new, challenging Minecraft-based assistance game with over $10^{400}$ possible goals. Our approach, AssistanceZero, extends AlphaZero with a neural network that predicts human actions and rewards, enabling it to plan under uncertainty. We show that AssistanceZero outperforms model-free RL algorithms and imitation learning in the Minecraft-based assistance game. In a human study, our AssistanceZero-trained assistant significantly reduces the number of actions participants take to complete building tasks in Minecraft. Our results suggest that assistance games are a tractable framework for training effective AI assistants in complex environments. Our code and models are available at https://github.com/cassidylaidlaw/minecraft-building-assistance-game.
援助游戏是加强学习人类反馈(RLHF)以培训AI助理的一种有希望的替代方法。援助游戏解决了RLHF的关键缺点,例如鼓励欺骗行为。援助游戏将助理和用户之间的互动作为两个玩家的游戏,使助理无法遵守共同目标。援助游戏尽管有潜力,但只是在简单的环境下才探索。将援助游戏推广到更复杂的环境中是困难的,因为它既需要解决不确定的棘手决策问题,也需要准确地模拟人类用户的行为。我们提出了第一个可扩展的方法来解决援助游戏,并将其应用到一个新的、具有挑战性的基于地雷的援助游戏,其可能的目标超过1 000美元400美元。我们的方法,“援助Zero”,将阿尔法Zero扩展到一个预测人类行动和奖励的神经网络,使其能够在不确定的情况下进行规划。我们显示,援助Zero超越了无模型RL算法和模拟基于地雷的援助游戏的模拟学习。在一项人类研究中,我们培训Zero培训的助理大大减少了参与者完成地雷工业建筑任务的数量。我们的成果表明,“AlezeZero Zero”,我们的援助游戏是一个复杂的建筑模型。
Article 11
Title@2025-06-12 (4): ReGuidance: A Simple Diffusion Wrapper for Boosting Sample Quality on Hard Inverse Problems
Title: ReGuidance: A Simple Diffusion Wrapper for Boosting Sample Quality on Hard Inverse Problems | ReGuidance: Ein einfacher Diffusions-Wrapper zur Steigerung der Probenqualität bei harten inversen Problemen | 重新指导:一个简单的传播包包,用于提高反向问题样本质量 2506.10955v1 |
Authors (3): Aayush Karan, Kulin Shah, Sitan Chen
There has been a flurry of activity around using pretrained diffusion models as informed data priors for solving inverse problems, and more generally around steering these models using reward models. Training-free methods like diffusion posterior sampling (DPS) and its many variants have offered flexible heuristic algorithms for these tasks, but when the reward is not informative enough, e.g., in hard inverse problems with low signal-to-noise ratio, these techniques veer off the data manifold, failing to produce realistic outputs. In this work, we devise a simple wrapper, ReGuidance, for boosting both the sample realism and reward achieved by these methods. Given a candidate solution $\hat{x}$ produced by an algorithm of the user’s choice, we propose inverting the solution by running the unconditional probability flow ODE in reverse starting from $\hat{x}$, and then using the resulting latent as an initialization for DPS. We evaluate our wrapper on hard inverse problems like large box in-painting and super-resolution with high upscaling. Whereas state-of-the-art baselines visibly fail, we find that applying our wrapper on top of these baselines significantly boosts sample quality and measurement consistency. We complement these findings with theory proving that on certain multimodal data distributions, ReGuidance simultaneously boosts the reward and brings the candidate solution closer to the data manifold. To our knowledge, this constitutes the first rigorous algorithmic guarantee for DPS.
使用预先培训的传播模型,作为解决反向问题的知情数据前导,以及更一般地以奖励模型指导这些模型。 推广远端抽样(DPS)及其许多变量等无培训方法为这些任务提供了灵活的超光速算法,但是当奖励不够充分时,例如,在信号到噪音比率低的硬反向问题中,这些技术从数据元流中跳出,未能产生现实产出。在这项工作中,我们设计了一个简单的包装、重新指导,用以提升这些方法所取得的样板真实主义和奖励。鉴于根据用户选择的算法产生的一个候选人解决方案$\hat{x}及其许多变量,我们建议通过运行无条件概率流来扭转这些方法,从$\hat{x}开始,然后利用由此产生的潜值来启动DPS。我们首先评估关于硬反向反向的问题的包装,例如大型的纸箱和超分辨率,以高尺度来提升这些方法所取得的真实性真实性和超清晰度。 在通过用户选择的算算法的精度基准中,我们发现一些最精确性的数据,我们用最精确的精确的精确的基值模型来测量数据,我们用这些最精确的精确的精确的模型来测量模型分析结果。
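The inversion step of the wrapper, running the probability flow ODE from a candidate $\hat{x}$ to a latent, can be sketched on a one-dimensional Gaussian toy problem where the score is known in closed form. Everything below is an assumption for illustration: the Ornstein-Uhlenbeck forward process, the value of `SIGMA0`, the Euler integrator, and the function names. The subsequent DPS run initialized from the latent is not shown.

```python
import numpy as np

SIGMA0 = 0.5  # std of the toy Gaussian data distribution (assumption)

def marginal_var(t):
    """Variance at time t of dx = -x dt + sqrt(2) dW started from N(0, SIGMA0^2)."""
    return SIGMA0 ** 2 * np.exp(-2.0 * t) + (1.0 - np.exp(-2.0 * t))

def score(x, t):
    """Exact score of the Gaussian marginal N(0, marginal_var(t))."""
    return -x / marginal_var(t)

def pf_ode_drift(x, t):
    """Probability flow ODE drift f - 0.5 * g^2 * score, with f = -x and g^2 = 2."""
    return -x - score(x, t)

def invert_to_latent(x_hat, t_end=3.0, steps=3000):
    """Euler-integrate the ODE from data time 0 to noise time t_end."""
    x, dt = x_hat, t_end / steps
    for i in range(steps):
        x = x + dt * pf_ode_drift(x, i * dt)
    return x

def decode_from_latent(z, t_end=3.0, steps=3000):
    """Euler-integrate the same ODE back from t_end to 0."""
    x, dt = z, t_end / steps
    for i in range(steps, 0, -1):
        x = x - dt * pf_ode_drift(x, i * dt)
    return x

x_hat = 0.8                      # hypothetical candidate solution
latent = invert_to_latent(x_hat)
recon = decode_from_latent(latent)
```

In this toy case the ODE has the closed-form solution `x_hat * sqrt(marginal_var(t)) / SIGMA0`, so the numerical latent can be checked directly, and decoding the latent approximately recovers `x_hat`.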
Article 12
Title@2025-06-12 (4): Build the web for agents, not agents for the web
Title: Build the web for agents, not agents for the web | Erstellen Sie das Web für Agenten, nicht Agenten für das Web | 为代理者而不是网络代理者建立网络 2506.10953v1 |
Authors (4): Xing Han Lù, Gaurav Kamath, Marius Mosbach, Siva Reddy
Recent advancements in Large Language Models (LLMs) and multimodal counterparts have spurred significant interest in developing web agents – AI systems capable of autonomously navigating and completing tasks within web environments. While holding tremendous promise for automating complex web interactions, current approaches face substantial challenges due to the fundamental mismatch between human-designed interfaces and LLM capabilities. Current methods struggle with the inherent complexity of web inputs, whether processing massive DOM trees, relying on screenshots augmented with additional information, or bypassing the user interface entirely through API interactions. This position paper advocates for a paradigm shift in web agent research: rather than forcing web agents to adapt to interfaces designed for humans, we should develop a new interaction paradigm specifically optimized for agentic capabilities. To this end, we introduce the concept of an Agentic Web Interface (AWI), an interface specifically designed for agents to navigate a website. We establish six guiding principles for AWI design, emphasizing safety, efficiency, and standardization, to account for the interests of all primary stakeholders. This reframing aims to overcome fundamental limitations of existing interfaces, paving the way for more efficient, reliable, and transparent web agent design, which will be a collaborative effort involving the broader ML community.
大语言模型(LLMS)和多式联运对应方最近的进展引起了人们对开发网络代理器的极大兴趣 – – 能够自主浏览和完成网络环境中任务的AI系统。在拥有使复杂的网络互动自动化的巨大希望的同时,目前的方法面临巨大的挑战,因为人设计的界面与LLM能力之间根本的不匹配。目前的方法与网络投入的内在复杂性挣扎,无论是处理大型DOM树,依靠通过额外信息增加的截图,还是完全通过API互动绕过用户界面。这份立场文件倡导网络代理器研究的范式转变:而不是迫使网络代理器适应为人类设计的界面,我们应开发一种新的互动模式,专门优化了代理能力。为此,我们引入了AWI(AWI)概念,这是一个专门为代理器浏览网站而设计的界面。我们为AWI设计制定了六项指导原则,强调安全、效率和标准化,以顾及所有主要利益攸关方的利益。这一重新配置的目的是克服现有界面的基本限制,为更有效、可靠和透明的网络代理器设计铺平道路,这将扩大社区的协作努力。
Article 13
Title@2025-06-12 (4): Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training
Title: Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training | Domain2Vec: Vectorizing Datasets, um die optimale Datenmischung ohne Training zu finden | 域2Vec: 将数据集矢量化,以查找未经过培训的最佳数据混合体 2506.10952v1 |
Authors (4): Mozhi Zhang, Howe Tissue, Lu Wang, Xipeng Qiu
We introduce Domain2Vec, a novel approach that decomposes any dataset into a linear combination of several meta-domains, a new concept designed to capture the key underlying features of datasets. Domain2Vec maintains a vocabulary of meta-domains and uses a classifier to decompose any given dataset into a domain vector that corresponds to a distribution over this vocabulary. These domain vectors enable the identification of the optimal data mixture for language model (LM) pretraining in a training-free manner under the Distribution Alignment Assumption (DA$^{2}$), which suggests that when the data distributions of the training set and the validation set are better aligned, a lower validation loss is achieved. Moreover, Domain2Vec can be seamlessly integrated into previous works to model the relationship between domain vectors and LM performance, greatly enhancing the efficiency and scalability of previous methods. Extensive experiments demonstrate that Domain2Vec helps find the data mixture that enhances downstream task performance with minimal computational overhead. Specifically, Domain2Vec achieves the same validation loss on Pile-CC using only 51.5% of the computation required when training on the original mixture of The Pile dataset. Under an equivalent compute budget, Domain2Vec improves downstream performance by an average of 2.83%.
我们提出 Domain2Vec，这是一种新方法，可将任意数据集分解为若干元域（meta-domain）的线性组合；元域是一个新概念，旨在刻画数据集的关键底层特征。Domain2Vec 维护一个元域词表，并使用分类器将任意给定数据集分解为一个域向量，该向量对应于该词表上的一个分布。在分布对齐假设（Distribution Alignment Assumption，DA$^{2}$）下，这些域向量使我们能够以无需训练的方式找出语言模型（LM）预训练的最优数据混合；该假设认为，训练集与验证集的数据分布对齐得越好，验证损失就越低。此外，Domain2Vec 可以无缝集成到已有工作中，用于建模域向量与 LM 性能之间的关系，从而大幅提升已有方法的效率和可扩展性。大量实验表明，Domain2Vec 能以极小的计算开销找到可提升下游任务性能的数据混合。具体而言，与在 The Pile 数据集原始混合上训练相比，Domain2Vec 仅需 51.5% 的计算量即可在 Pile-CC 上达到相同的验证损失；在同等计算预算下，Domain2Vec 将下游性能平均提升 2.83%。
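The training-free selection idea in the Domain2Vec abstract can be illustrated with a small sketch. It assumes each training dataset already has a domain vector (a distribution over meta-domains) and picks the mixture whose combined vector is closest to the validation set's vector. The L1 distance, the coarse grid search, and the toy vectors are illustrative assumptions, not the paper's procedure.

```python
from itertools import product

def mix(weights, domain_vectors):
    """Domain vector of a mixture: the weighted average of the dataset vectors."""
    dim = len(domain_vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, domain_vectors)) for k in range(dim)]

def l1(p, q):
    """L1 distance between two distributions (an assumed alignment measure)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def best_mixture(domain_vectors, target, steps=10):
    """Grid-search mixture weights on the simplex, minimising distance to target."""
    n = len(domain_vectors)
    best_w, best_d = None, float("inf")
    for counts in product(range(steps + 1), repeat=n):
        if sum(counts) != steps:
            continue  # keep only weight vectors that sum to 1
        w = [c / steps for c in counts]
        d = l1(mix(w, domain_vectors), target)
        if d < best_d:
            best_w, best_d = w, d
    return best_w, best_d

# Hypothetical domain vectors over three meta-domains.
train_vectors = [[0.8, 0.2, 0.0], [0.1, 0.1, 0.8]]
validation_vector = [0.45, 0.15, 0.40]
weights, dist = best_mixture(train_vectors, validation_vector)
print(weights, dist)
```

No model is trained at any point: the search only compares distributions, which is what makes this style of mixture selection cheap.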
Article 14
Title@2025-06-12 (4): Execution Guided Line-by-Line Code Generation
Title: Execution Guided Line-by-Line Code Generation | Ausführen Geführte Zeilen-für-Linien-Code-Generierung | 逐行逐行制导 2506.10948v1 |
Authors (3): Boaz Lavon, Shahar Katz, Lior Wolf
We present a novel approach to neural code generation that incorporates real-time execution signals into the language model generation process. While large language models (LLMs) have demonstrated impressive code generation capabilities, they typically do not utilize execution feedback during inference, a critical signal that human programmers regularly leverage. Our method, Execution-Guided Classifier-Free Guidance (EG-CFG), dynamically incorporates execution signals as the model generates code, providing line-by-line feedback that guides the generation process toward executable solutions. EG-CFG employs a multi-stage process: first, we conduct beam search to sample candidate program completions for each line; second, we extract execution signals by executing these candidates against test cases; and finally, we incorporate these signals into the prompt during generation. By maintaining consistent signals across tokens within the same line and refreshing signals at line boundaries, our approach provides coherent guidance while preserving syntactic structure. Moreover, the method naturally supports native parallelism at the task level in which multiple agents operate in parallel, exploring diverse reasoning paths and collectively generating a broad set of candidate solutions. Our experiments across diverse coding tasks demonstrate that EG-CFG significantly improves code generation performance compared to standard approaches, achieving state-of-the-art results across various levels of complexity, from foundational problems to challenging competitive programming tasks. Our code is available at: https://github.com/boazlavon/eg_cfg
我们提出一种新的神经代码生成方法，将实时执行信号融入语言模型的生成过程。大型语言模型（LLM）已展现出令人印象深刻的代码生成能力，但它们在推理过程中通常不利用执行反馈，而这是人类程序员经常依赖的关键信号。我们的方法，即执行引导的无分类器引导（Execution-Guided Classifier-Free Guidance，EG-CFG），在模型生成代码时动态融入执行信号，提供逐行反馈，引导生成过程走向可执行的解决方案。EG-CFG 采用多阶段流程：首先，通过束搜索为每一行采样候选程序补全；其次，在测试用例上执行这些候选程序以提取执行信号；最后，在生成过程中将这些信号加入提示。通过在同一行内的各个 token 之间保持信号一致，并在行边界处刷新信号，我们的方法在保持句法结构的同时提供连贯的引导。此外，该方法天然支持任务级并行：多个智能体并行运行，探索不同的推理路径，共同生成范围广泛的候选解。在多种编程任务上的实验表明，与标准方法相比，EG-CFG 显著提升了代码生成性能，在从基础问题到高难度竞赛编程任务的各个复杂度层级上都取得了最先进的结果。我们的代码见：https://github.com/boazlavon/eg_cfg
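For readers unfamiliar with classifier-free guidance on token logits, here is a minimal sketch of the standard interpolation rule. Treating the prompt with execution feedback as the "conditional" branch and the plain prompt as the "unconditional" branch is an assumption drawn from the abstract; the logit values and the guidance strength `gamma` are made up.

```python
import math

def cfg_logits(cond, uncond, gamma):
    """Standard CFG rule: move from the unconditional logits toward
    (and, for gamma > 1, past) the conditional logits."""
    return [u + gamma * (c - u) for c, u in zip(cond, uncond)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits over a three-token vocabulary.
uncond = [2.0, 1.0, 0.0]   # prompt without execution feedback (assumed)
cond = [1.0, 2.0, 0.0]     # prompt with execution feedback (assumed)

guided = cfg_logits(cond, uncond, gamma=3.0)
probs = softmax(guided)
print(guided, probs)
```

With `gamma = 0` the rule returns the unconditional logits and with `gamma = 1` the conditional ones; larger values amplify whatever the extra signal changes.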
Article 15
Title@2025-06-12 (4): Spurious Rewards: Rethinking Training Signals in RLVR
Title: Spurious Rewards: Rethinking Training Signals in RLVR | Spurious Rewards: Rethinking Trainingssignale in RLVR | 净利奖:在RLVR重新思考培训信号 2506.10947v1 |
Authors (14): Rulin Shao, Shuyue Stella Li, Rui Xin, Scott Geng, Yiping Wang, Sewoong Oh, Simon Shaolei Du, Nathan Lambert, Sewon Min, Ranjay Krishna, Yulia Tsvetkov, Hannaneh Hajishirzi, Pang Wei Koh, Luke Zettlemoyer
We show that reinforcement learning with verifiable rewards (RLVR) can elicit strong mathematical reasoning in certain models even with spurious rewards that have little, no, or even negative correlation with the correct answer. For example, RLVR improves MATH-500 performance for Qwen2.5-Math-7B in absolute points by 21.4% (random reward), 13.8% (format reward), 24.1% (incorrect label), 26.0% (1-shot RL), and 27.1% (majority voting) – nearly matching the 29.1% gained with ground truth rewards. However, the spurious rewards that work for Qwen often fail to yield gains with other model families like Llama3 or OLMo2. In particular, we find code reasoning – thinking in code without actual code execution – to be a distinctive Qwen2.5-Math behavior that becomes significantly more frequent after RLVR, from 65% to over 90%, even with spurious rewards. Overall, we hypothesize that, given the lack of useful reward signal, RLVR must somehow be surfacing useful reasoning representations learned during pretraining, although the exact mechanism remains a topic for future work. We suggest that future RLVR research should possibly be validated on diverse models rather than a single de facto choice, as we show that it is easy to get significant performance gains on Qwen models even with completely spurious reward signals.
我们表明，带可验证奖励的强化学习（RLVR）即使使用与正确答案相关性很低、为零甚至为负的虚假奖励，也能在某些模型中激发出很强的数学推理能力。例如，RLVR 使 Qwen2.5-Math-7B 在 MATH-500 上的成绩按绝对百分点分别提升 21.4%（随机奖励）、13.8%（格式奖励）、24.1%（错误标签）、26.0%（单样本 RL）和 27.1%（多数投票），几乎与使用真实标签奖励获得的 29.1% 提升持平。然而，对 Qwen 有效的虚假奖励在 Llama3 或 OLMo2 等其他模型家族上往往无法带来提升。特别是，我们发现代码推理（即用代码进行思考但并不实际执行代码）是 Qwen2.5-Math 的一种独特行为，在 RLVR 之后，即使使用虚假奖励，其出现频率也会从 65% 显著上升到 90% 以上。总体而言，我们推测，由于缺乏有用的奖励信号，RLVR 必然是以某种方式激发了模型在预训练中学到的有用推理表征，但其确切机制仍有待未来研究。我们建议，未来的 RLVR 研究或许应当在多种模型上进行验证，而不是只依赖某一个事实上的默认选择，因为我们表明，即使使用完全虚假的奖励信号，也很容易在 Qwen 模型上获得显著的性能提升。
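To make the reward variants named in the abstract concrete, here is a hedged sketch of how such reward functions might look. The `\boxed{...}` answer convention, the function names, and the 0.5 probability for the random reward are assumptions for illustration; the abstract does not give the paper's exact definitions.

```python
import random
import re

# Assumed convention: the final answer appears as \boxed{...} in the response.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")

def extract_answer(response):
    """Return the content of the first \\boxed{...}, or None if absent."""
    match = BOXED.search(response)
    return match.group(1).strip() if match else None

def ground_truth_reward(response, gold):
    """1.0 only when the extracted answer equals the correct label."""
    return 1.0 if extract_answer(response) == gold else 0.0

def format_reward(response):
    """1.0 whenever a boxed answer is present, regardless of correctness."""
    return 1.0 if extract_answer(response) is not None else 0.0

def incorrect_label_reward(response, wrong_label):
    """1.0 when the answer matches a deliberately wrong label."""
    return 1.0 if extract_answer(response) == wrong_label else 0.0

def random_reward(p=0.5, rng=random):
    """1.0 with probability p, independent of the response."""
    return 1.0 if rng.random() < p else 0.0

sample = "The total is \\boxed{42}"
print(ground_truth_reward(sample, "42"), format_reward(sample))
```

Only the first function depends on the true answer; the other three carry little or no information about correctness, which is the sense in which the abstract calls them spurious.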
Article 16
Title@2025-06-12 (4): GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models
Title: GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models | GUARD: Geführtes Lernen und Zurückhalten über Datenzuweisung für große Sprachmodelle | GUARD:通过大语言模式数据归称制,指导学习和保留 2506.10946v1 |
Authors (7): Evelyn Ma, Duo Zhou, Peizhi Niu, Huiting Zhou, Huan Zhang, Olgica Milenkovic, S. Rasoul Etesami
Unlearning in large language models (LLMs) is becoming increasingly important due to regulatory compliance, copyright protection, and privacy concerns. However, a key challenge in LLM unlearning is unintended forgetting, where the removal of specific data inadvertently impairs the utility of the model and its retention of valuable, desired information. While prior work has primarily focused on architectural innovations, the influence of data-level factors on unlearning performance remains underexplored. As a result, existing methods often suffer from degraded retention when forgetting high-impact data. To address this, we propose GUARD, a novel framework for Guided Unlearning And Retention via Data attribution. At its core, GUARD introduces a lightweight proxy data attribution metric tailored for LLM unlearning, which quantifies the “alignment” between the forget and retain sets while remaining computationally efficient. Building on this, we design a novel unlearning objective that assigns adaptive, nonuniform unlearning weights to samples, inversely proportional to their proxy attribution scores. Through such a reallocation of unlearning power, GUARD mitigates unintended losses in retention. We provide rigorous theoretical guarantees that GUARD significantly enhances retention while maintaining forgetting metrics comparable to prior methods. Extensive experiments on the TOFU benchmark across multiple LLM architectures demonstrate that GUARD substantially improves utility preservation while ensuring effective unlearning. Notably, GUARD reduces utility sacrifice on the Retain Set by up to 194.92% in terms of Truth Ratio when forgetting 10% of the training data.
出于法规合规、版权保护和隐私方面的考虑，大型语言模型（LLM）的遗忘学习（unlearning）正变得日益重要。然而，LLM 遗忘学习面临的一个关键挑战是非预期遗忘，即删除特定数据时无意中损害了模型的效用及其对有价值、需保留信息的保持能力。已有工作主要关注架构层面的创新，而数据层面因素对遗忘性能的影响仍未得到充分研究。因此，现有方法在遗忘高影响数据时往往出现保留能力下降的问题。为此，我们提出 GUARD，这是一个通过数据归因引导遗忘与保留（Guided Unlearning And Retention via Data attribution）的新框架。GUARD 的核心是一种为 LLM 遗忘学习定制的轻量级代理数据归因度量，它在保持计算高效的同时，量化遗忘集与保留集之间的“对齐”程度。在此基础上，我们设计了一个新的遗忘目标，为样本分配自适应的、非均匀的遗忘权重，权重与其代理归因分数成反比。通过这样重新分配遗忘力度，GUARD 减轻了保留能力的非预期损失。我们给出了严格的理论保证，表明 GUARD 在保持与已有方法相当的遗忘指标的同时显著增强了保留能力。在 TOFU 基准上针对多种 LLM 架构的大量实验表明，GUARD 在确保有效遗忘的同时大幅改善了效用保持。值得注意的是，在遗忘 10% 训练数据时，GUARD 将保留集上的效用损失（以 Truth Ratio 衡量）最多降低了 194.92%。
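The weighting rule in the GUARD abstract (unlearning weights inversely proportional to proxy attribution scores) can be sketched as follows. The scores are assumed to be positive, and the normalisation to sum to one, the `eps` guard, and the toy numbers are illustrative choices rather than the paper's formulation.

```python
def unlearning_weights(scores, eps=1e-8):
    """Weights inversely proportional to (assumed positive) attribution scores,
    normalised to sum to 1."""
    inverse = [1.0 / (s + eps) for s in scores]
    total = sum(inverse)
    return [v / total for v in inverse]

def weighted_forget_loss(per_sample_losses, weights):
    """Weighted sum of per-sample forget losses."""
    return sum(w * l for w, l in zip(weights, per_sample_losses))

# Hypothetical proxy attribution scores for three forget-set samples.
scores = [1.0, 2.0, 4.0]
weights = unlearning_weights(scores)
loss = weighted_forget_loss([0.9, 0.5, 0.7], weights)
print(weights, loss)
```

Samples whose scores indicate strong alignment with the retain set receive less unlearning pressure, which is the reallocation the abstract describes.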
Article 17
Title@2025-06-12 (4): Coupled reaction and diffusion governing interface evolution in solid-state batteries
Title: Coupled reaction and diffusion governing interface evolution in solid-state batteries | Gekoppelte Reaktion und Diffusion, die die Schnittstellenentwicklung in Festkörperbatterien regeln | 关于固体状态电池界面演变的组合反应与扩散 2506.10944v1 |
Authors (7): Jingxuan Ding, Laura Zichi, Matteo Carli, Menghang Wang, Albert Musaelian, Yu Xie, Boris Kozinsky
Understanding and controlling the atomistic-level reactions governing the formation of the solid-electrolyte interphase (SEI) is crucial for the viability of next-generation solid-state batteries. However, challenges persist due to difficulties in experimentally characterizing buried interfaces and limits in simulation speed and accuracy. We conduct large-scale explicit reactive simulations with quantum accuracy for a symmetric battery cell, {\symcell}, enabled by active learning and deep equivariant neural network interatomic potentials. To automatically characterize the coupled reactions and interdiffusion at the interface, we formulate and use unsupervised classification techniques based on clustering in the space of local atomic environments. Our analysis reveals the formation of a previously unreported crystalline disordered phase, Li$_2$S$_{0.72}$P$_{0.14}$Cl$_{0.14}$, in the SEI, that evaded previous predictions based purely on thermodynamics, underscoring the importance of explicit modeling of full reaction and transport kinetics. Our simulations agree with and explain experimental observations of the SEI formations and elucidate the Li creep mechanisms, critical to dendrite initiation, characterized by significant Li motion along the interface. Our approach is to create a digital twin from first principles, without adjustable parameters fitted to experiment. As such, it offers capabilities to gain insights into atomistic dynamics governing complex heterogeneous processes in solid-state synthesis and electrochemistry.
理解并控制决定固态电解质界面相（SEI）形成的原子级反应，对下一代固态电池的可行性至关重要。然而，由于难以通过实验表征被掩埋的界面，且模拟的速度和精度有限，相关挑战依然存在。借助主动学习和深度等变神经网络原子间势，我们对对称电池 {\symcell} 进行了具有量子精度的大规模显式反应模拟。为了自动表征界面处耦合的反应与互扩散，我们基于局部原子环境空间中的聚类，提出并使用了无监督分类技术。我们的分析揭示，SEI 中形成了一种此前未被报道的无序晶相 Li$_2$S$_{0.72}$P$_{0.14}$Cl$_{0.14}$，此前纯粹基于热力学的预测未能发现它，这凸显了对完整反应和输运动力学进行显式建模的重要性。我们的模拟与有关 SEI 形成的实验观测相符并对其作出了解释，同时阐明了对枝晶萌生至关重要的锂蠕变机制，其特征是锂沿界面的显著运动。我们的方法是从第一性原理出发构建数字孪生，不含任何依据实验拟合的可调参数。因此，它有助于深入了解固态合成和电化学中支配复杂非均相过程的原子级动力学。
Article 18
Title@2025-06-12 (4): TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems
Title: TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems | TimberStrike: Datensatz-Rekonstruktion Angriff Enthüllen der Privatsphäre Leckage in Federated Tree-Based Systems | 木材三角:联邦树基系统中数据集重建攻击清除隐私渗漏 2506.07605v2 |
Authors (5): Marco Di Gennaro, Giovanni De Lucia, Stefano Longari, Stefano Zanero, Michele Carminati
Federated Learning has emerged as a privacy-oriented alternative to centralized Machine Learning, enabling collaborative model training without direct data sharing. While extensively studied for neural networks, the security and privacy implications of tree-based models remain underexplored. This work introduces TimberStrike, an optimization-based dataset reconstruction attack targeting horizontally federated tree-based models. Our attack, carried out by a single client, exploits the discrete nature of decision trees by using split values and decision paths to infer sensitive training data from other clients. We evaluate TimberStrike on state-of-the-art federated gradient boosting implementations across multiple frameworks, including Flower, NVFlare, and FedTree, demonstrating their vulnerability to privacy breaches. On a publicly available stroke prediction dataset, TimberStrike consistently reconstructs between 73.05% and 95.63% of the target dataset across all implementations. We further analyze Differential Privacy, showing that while it partially mitigates the attack, it also significantly degrades model performance. Our findings highlight the need for privacy-preserving mechanisms specifically designed for tree-based Federated Learning systems, and we provide preliminary insights into their design.
联邦学习联合会已成为中央机构学习的以隐私为导向的替代方案,有利于合作模式培训,而没有直接分享数据。尽管对神经网络进行了广泛研究,但基于树的模型对安全和隐私的影响仍未得到充分探讨。这项工作引入了TaultStrike,这是以横向结合的树为基础的模型为对象的基于优化的数据元重建攻击。我们由一个客户进行的攻击,利用决策树的离散性质,利用不同的价值和决定路径从其他客户处推断敏感培训数据。我们评估了木材在包括Flower、NVFFlare和FedTre在内的多个框架的州级联盟梯度促进实施方面发生的碰撞,展示了它们易受隐私破坏的脆弱性。在公开提供的中风预测数据集中,木材Strike持续地重建了所有执行过程中目标数据集的73.05 %至95.63%。我们进一步分析差异隐私,表明它虽然部分减轻了攻击,但也显著地降低了模型性。我们的调查结果突出表明需要专门为基于树木的联邦学习系统设计的隐私保护机制,我们提供了初步的见解。
Article 19
Title@2025-06-12 (4): Self-Adapting Language Models
Title: Self-Adapting Language Models | Selbstanpassende Sprachmodelle | 自我改进语言模式 2506.10943v1 |
Authors (6): Adam Zweiger, Jyothish Pari, Han Guo, Ekin Akyürek, Yoon Kim, Pulkit Agrawal
Large language models (LLMs) are powerful but static; they lack mechanisms to adapt their weights in response to new tasks, knowledge, or examples. We introduce Self-Adapting LLMs (SEAL), a framework that enables LLMs to self-adapt by generating their own finetuning data and update directives. Given a new input, the model produces a self-edit, a generation that may restructure the information in different ways, specify optimization hyperparameters, or invoke tools for data augmentation and gradient-based updates. Through supervised finetuning (SFT), these self-edits result in persistent weight updates, enabling lasting adaptation. To train the model to produce effective self-edits, we use a reinforcement learning loop with the downstream performance of the updated model as the reward signal. Unlike prior approaches that rely on separate adaptation modules or auxiliary networks, SEAL directly uses the model’s own generation to control its adaptation process. Experiments on knowledge incorporation and few-shot generalization show that SEAL is a promising step toward language models capable of self-directed adaptation. Our website and code are available at https://jyopari.github.io/posts/seal.
大型语言模型（LLM）功能强大但却是静态的：它们缺乏根据新任务、新知识或新示例调整自身权重的机制。我们提出自适应 LLM（Self-Adapting LLMs，SEAL），这一框架使 LLM 能够通过生成自己的微调数据和更新指令来实现自我适应。给定一个新输入，模型会生成一个自编辑（self-edit），即一段生成内容，它可以以不同方式重组信息、指定优化超参数，或调用工具进行数据增强和基于梯度的更新。通过监督微调（SFT），这些自编辑会带来持久的权重更新，从而实现长期适应。为了训练模型生成有效的自编辑，我们使用强化学习循环，并以更新后模型的下游性能作为奖励信号。与以往依赖独立适应模块或辅助网络的方法不同，SEAL 直接利用模型自身的生成来控制其适应过程。在知识整合和少样本泛化上的实验表明，SEAL 是迈向能够自主适应的语言模型的有希望的一步。我们的网站和代码见：https://jyopari.github.io/posts/seal
Article 20
Title@2025-06-12 (4): VINCIE: Unlocking In-context Image Editing from Video
Title: VINCIE: Unlocking In-context Image Editing from Video | VINCIE: Im Kontext Bildbearbeitung von Video entsperren | VINCIE: 从视频中解锁 Incontext 图像编辑 2506.10941v1 |
Authors (10): Leigang Qu, Feng Cheng, Ziyan Yang, Qi Zhao, Shanchuan Lin, Yichun Shi, Yicong Li, Wenjie Wang, Tat-Seng Chua, Lu Jiang
In-context image editing aims to modify images based on a contextual sequence comprising text and previously generated images. Existing methods typically depend on task-specific pipelines and expert models (e.g., segmentation and inpainting) to curate training data. In this work, we explore whether an in-context image editing model can be learned directly from videos. We introduce a scalable approach to annotate videos as interleaved multimodal sequences. To effectively learn from this data, we design a block-causal diffusion transformer trained on three proxy tasks: next-image prediction, current segmentation prediction, and next-segmentation prediction. Additionally, we propose a novel multi-turn image editing benchmark to advance research in this area. Extensive experiments demonstrate that our model exhibits strong in-context image editing capabilities and achieves state-of-the-art results on two multi-turn image editing benchmarks. Despite being trained exclusively on videos, our model also shows promising abilities in multi-concept composition, story generation, and chain-of-editing applications.
文本内图像编辑旨在根据由文本和先前生成的图像组成的背景序列修改图像。 现有方法通常取决于任务特定管道和专家模型( 如分解和油漆)来整理培训数据。 在这项工作中,我们探讨能否直接从视频中学习文本内图像编辑模型; 我们采用可扩缩的方法将视频注释作为相互脱节的多式联运序列; 为了有效地从这些数据中学习, 我们设计了一个块- 螺旋扩散变压器, 就三种代理任务进行了培训: 下一个图像预测, 当前分解预测, 和下一个分解预测。 此外, 我们提出一个新的多方向图像编辑基准, 以推进这一领域的研究。 广泛的实验表明, 我们的模式展示了超强的文本图像编辑能力, 在两个多方向图像编辑基准上取得了最新的结果。 尽管我们只接受过视频培训, 我们的模式还展示了多概念构成、 故事生成和链编辑应用程序方面的有前途的能力。
Article 21
Title@2025-06-12 (4): Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction
Title: Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction | Entwicklung eines leistungsfähigen Rahmens für die Sprachemotionserkennung unter naturalistischen Bedingungen Herausforderung für die emotionale Attributvorhersage | 制定一个高绩效框架,以便在自然条件下对情感属性预测的自然条件挑战中承认言论情感 2506.10930v1 |
Authors (4): Thanathai Lertpetchpun, Tiantian Feng, Dani Byrd, Shrikanth Narayanan
Speech emotion recognition (SER) in naturalistic conditions presents a significant challenge for the speech processing community. Challenges include disagreement in labeling among annotators and imbalanced data distributions. This paper presents a reproducible framework that achieves superior (top 1) performance in the Emotion Recognition in Naturalistic Conditions Challenge (IS25-SER Challenge) - Task 2, evaluated on the MSP-Podcast dataset. Our system is designed to tackle the aforementioned challenges through multimodal learning, multi-task learning, and imbalanced data handling. Specifically, our best system is trained by adding text embeddings, predicting gender, and including “Other” (O) and “No Agreement” (X) samples in the training set. Our system’s results secured both first and second places in the IS25-SER Challenge, and the top performance was achieved by a simple two-system ensemble.
语言处理社区面临的重大挑战是自然条件下的言语情绪识别(SER)对语言处理社区来说是一个重大挑战,挑战包括记录者之间在标签上的分歧和数据分布不平衡。本文件提供了一个可复制的框架,在自然条件下的情感识别挑战(IS25-SER挑战)中取得了优异(顶部1)性能。任务2,对MSP-Podcast数据集进行了评估。我们的系统旨在通过多式学习、多任务学习和不平衡的数据处理来应对上述挑战。具体地说,我们的最佳系统通过添加文本嵌入、预测性别以及将“其他”和“不协议”样本纳入培训集来培训。我们的系统结果确保了IS25-SER挑战的第一和第二位,而顶级的功能是通过简单的两系统共体实现的。
Article 22
Title@2025-06-12 (4): Improving LLM Safety Alignment with Dual-Objective Optimization
Title: Improving LLM Safety Alignment with Dual-Objective Optimization | Verbesserung der LLM-Sicherheitsausrichtung mit Dual-Ziel-Optimierung | 提高LLM安全一致性,实现双目标优化 2503.03710v2 |
Authors (7): Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song
Existing training-time safety alignment techniques for large language models (LLMs) remain vulnerable to jailbreak attacks. Direct preference optimization (DPO), a widely deployed alignment method, exhibits limitations in both experimental and theoretical contexts as its loss function proves suboptimal for refusal learning. Through gradient-based analysis, we identify these shortcomings and propose an improved safety alignment that disentangles DPO objectives into two components: (1) robust refusal training, which encourages refusal even when partial unsafe generations are produced, and (2) targeted unlearning of harmful knowledge. This approach significantly increases LLM robustness against a wide range of jailbreak attacks, including prefilling, suffix, and multi-turn attacks across both in-distribution and out-of-distribution scenarios. Furthermore, we introduce a method to emphasize critical refusal tokens by incorporating a reward-based token-level weighting mechanism for refusal learning, which further improves the robustness against adversarial exploits. Our research also suggests that robustness to jailbreak attacks is correlated with token distribution shifts in the training process and internal representations of refusal and harmful tokens, offering valuable directions for future research in LLM safety alignment. The code is available at https://github.com/wicai24/DOOR-Alignment
现有的面向大型语言模型（LLM）的训练阶段安全对齐技术仍然容易受到越狱攻击。直接偏好优化（DPO）是一种被广泛采用的对齐方法，但它在实验和理论两方面都表现出局限性，因为其损失函数对拒绝学习而言并非最优。通过基于梯度的分析，我们找出了这些缺陷，并提出一种改进的安全对齐方法，将 DPO 的目标拆分为两个部分：(1) 稳健的拒绝训练，即使已经生成了部分不安全内容也鼓励模型拒绝；(2) 对有害知识的定向遗忘。该方法显著提高了 LLM 对多种越狱攻击的稳健性，包括分布内和分布外场景下的预填充攻击、后缀攻击和多轮攻击。此外，我们提出一种强调关键拒绝 token 的方法，在拒绝学习中引入基于奖励的 token 级加权机制，进一步提高了对抗性攻击下的稳健性。我们的研究还表明，对越狱攻击的稳健性与训练过程中的 token 分布变化以及拒绝 token 和有害 token 的内部表征相关，这为 LLM 安全对齐的未来研究提供了有价值的方向。代码见：https://github.com/wicai24/DOOR-Alignment
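As background for the abstract's claim about the DPO loss, here is a minimal sketch of the standard DPO objective for a single preference pair. Treating the safe refusal as the preferred response and the harmful completion as the rejected one is an assumption about the safety setting; the log-probabilities and `beta` are made-up values, and the paper's two-component objective is not reproduced here.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)), where the
    margin compares policy-vs-reference log-ratios of the chosen and rejected
    responses."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Assumed setting: chosen = refusal, rejected = harmful completion.
at_reference = dpo_loss(-1.0, -2.0, -1.0, -2.0)       # policy equals reference
after_training = dpo_loss(-0.5, -3.0, -1.0, -2.0)     # refusal up, harmful down
print(at_reference, after_training)
```

The loss depends only on the difference between the two log-ratios, so it can fall by pushing the rejected response down without making the refusal itself more likely.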
Article 23
Title@2025-06-12 (4): On feature selection in double-imbalanced data settings: a Random Forest approach
Title: On feature selection in double-imbalanced data settings: a Random Forest approach | Zur Feature-Auswahl in doppelt ausgeglichenen Dateneinstellungen: ein Random Forest-Ansatz | 在双重平衡数据设置中选择功能的功能:随机森林办法 2506.10929v1 |
Authors (1): Fabio Demaria
Feature selection is a critical step in high-dimensional classification tasks, particularly under challenging conditions of double imbalance, namely settings characterized by both class imbalance in the response variable and dimensional asymmetry in the data $(n \gg p)$. In such scenarios, traditional feature selection methods applied to Random Forests (RF) often yield unstable or misleading importance rankings. This paper proposes a novel thresholding scheme for feature selection based on minimal depth, which exploits the tree topology to assess variable relevance. Extensive experiments on simulated and real-world datasets demonstrate that the proposed approach produces more parsimonious and accurate subsets of variables compared to conventional minimal depth-based selection. The method provides a practical and interpretable solution for variable selection in RF under double imbalance conditions.
地物选择是高层次分类任务的关键步骤,特别是在具有挑战性的双重不平衡条件下,即在数据$(n\gg p)中以反应变数和尺寸不对称两种等级不平衡为特点的设置。在这种情况下,适用于随机森林的传统地物选择方法往往产生不稳定或误导重要性的排名。本文提出基于最低深度的地物选择新颖的门槛计划,利用树本学评估可变关联性。模拟和实际世界数据集的广泛实验表明,与传统的最低深度选择相比,拟议的方法产生了更相似和准确的变量子集。这种方法为在双重失衡条件下选择俄罗斯联邦的变量提供了实用和可解释的解决方案。
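The minimal-depth statistic that the abstract builds on can be sketched without any library: for each tree, a variable's minimal depth is the depth of its split closest to the root, and the forest-level score is the average over trees. The nested-dict tree encoding, the `missing_depth` penalty for unused variables, and the fixed threshold are illustrative assumptions; the paper's own thresholding scheme is not reproduced here.

```python
def minimal_depths(tree, depth=0, out=None):
    """Map each feature to the depth of its split closest to the root."""
    out = {} if out is None else out
    if not tree or "feature" not in tree:
        return out  # leaf or missing child
    feature = tree["feature"]
    out[feature] = min(out.get(feature, depth), depth)
    minimal_depths(tree.get("left"), depth + 1, out)
    minimal_depths(tree.get("right"), depth + 1, out)
    return out

def mean_minimal_depth(forest, features, missing_depth):
    """Average minimal depth per feature; unused features get missing_depth."""
    totals = {f: 0.0 for f in features}
    for tree in forest:
        depths = minimal_depths(tree)
        for f in features:
            totals[f] += depths.get(f, missing_depth)
    return {f: totals[f] / len(forest) for f in features}

def select_features(mean_depths, threshold):
    """Keep features whose mean minimal depth is below the threshold."""
    return sorted(f for f, d in mean_depths.items() if d < threshold)

# Two hypothetical trees encoded as nested dicts.
tree_a = {"feature": "x1", "left": {"feature": "x2"}, "right": {"feature": "x3"}}
tree_b = {"feature": "x2", "left": {"feature": "x1"}, "right": None}
forest = [tree_a, tree_b]

scores = mean_minimal_depth(forest, ["x1", "x2", "x3"], missing_depth=2)
selected = select_features(scores, threshold=1.0)
print(scores, selected)
```

Variables that split near the root partition large parts of the sample, so a small mean minimal depth is read as a sign of relevance.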
Article 24
Title@2025-06-12 (4): Investigating the Relationship Between Physical Activity and Tailored Behavior Change Messaging: Connecting Contextual Bandit with Large Language Models
Title: Investigating the Relationship Between Physical Activity and Tailored Behavior Change Messaging: Connecting Contextual Bandit with Large Language Models | Untersuchung der Beziehung zwischen körperlicher Aktivität und maßgeschneidertem Verhalten Veränderung Messaging: Verknüpfung von Kontextbandit mit großen Sprachmodellen | 调查身体活动与因地制宜的行为变化通信之间的关系:将环境强盗与大语言模式联系起来 2506.07275v2 |
Authors (7): Haochen Song, Dominik Hofer, Rania Islambouli, Laura Hawkins, Ananya Bhattacharjee, Meredith Franklin, Joseph Jay Williams
Machine learning approaches, such as contextual multi-armed bandit (cMAB) algorithms, offer a promising strategy to reduce sedentary behavior by delivering personalized interventions to encourage physical activity. However, cMAB algorithms typically require large participant samples to learn effectively and may overlook key psychological factors that are not explicitly encoded in the model. In this study, we propose a hybrid approach that combines cMAB for selecting intervention types with large language models (LLMs) to personalize message content. We evaluate four intervention types: behavioral self-monitoring, gain-framed, loss-framed, and social comparison, each delivered as a motivational message aimed at increasing motivation for physical activity and daily step count. Message content is further personalized using dynamic contextual factors including daily fluctuations in self-efficacy, social influence, and regulatory focus. Over a seven-day trial, participants receive daily messages assigned by one of four models: cMAB alone, LLM alone, combined cMAB with LLM personalization (cMABxLLM), or equal randomization (RCT). Outcomes include daily step count and message acceptance, assessed via ecological momentary assessments (EMAs). We apply a causal inference framework to evaluate the effects of each model. Our findings offer new insights into the complementary roles of LLM-based personalization and cMAB adaptation in promoting physical activity through personalized behavioral messaging.
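上下文多臂老虎机（cMAB）算法等机器学习方法，通过提供个性化干预来鼓励身体活动，为减少久坐行为提供了一种有前景的策略。然而，cMAB 算法通常需要大量参与者样本才能有效学习，并且可能忽略模型中未显式编码的关键心理因素。在这项研究中，我们提出一种混合方法，将用于选择干预类型的 cMAB 与用于个性化消息内容的大型语言模型（LLM）相结合。我们评估四种干预类型：行为自我监测、收益框架、损失框架和社会比较，每种都以激励性消息的形式发送，旨在增强身体活动的动机并增加每日步数。消息内容还会根据动态情境因素进一步个性化，包括自我效能、社会影响和调节定向的每日波动。在为期七天的试验中，参与者每天收到由以下四种模型之一分配的消息：仅 cMAB、仅 LLM、cMAB 与 LLM 个性化相结合（cMABxLLM），或等概率随机分配（RCT）。结果指标包括每日步数和消息接受度，通过生态瞬时评估（EMA）进行测量。我们采用因果推断框架来评估每种模型的效果。我们的研究结果为基于 LLM 的个性化与 cMAB 自适应在通过个性化行为消息促进身体活动方面的互补作用提供了新的见解。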
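The division of labour described in the abstract (a bandit picks the intervention type, an LLM then words the message) can be illustrated with a small contextual bandit. The abstract does not specify the cMAB algorithm, so the epsilon-greedy rule, the discrete context labels, and the reward values below are stand-in assumptions; the LLM personalization step is not shown.

```python
import random
from collections import defaultdict

# The four intervention types named in the abstract.
ARMS = ["self_monitoring", "gain_framed", "loss_framed", "social_comparison"]

class ContextualBandit:
    """Epsilon-greedy bandit keeping one running mean reward per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.values = defaultdict(float)

    def select(self, context):
        """Explore with probability epsilon, otherwise pick the best-known arm."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def update(self, context, arm, reward):
        """Incrementally update the mean reward of the chosen arm."""
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]

bandit = ContextualBandit(ARMS, epsilon=0.0)
# Hypothetical feedback: 1.0 if the participant accepted the message.
bandit.update("low_self_efficacy", "gain_framed", 1.0)
bandit.update("low_self_efficacy", "loss_framed", 0.0)
choice = bandit.select("low_self_efficacy")
print(choice)
```

In a hybrid setup such as the one in the abstract, the selected arm would be handed to the language model together with the participant's current context to generate the message text.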
Article 25
Title@2025-06-12 (4): Robustly Improving LLM Fairness in Realistic Settings via Interpretability
Title: Robustly Improving LLM Fairness in Realistic Settings via Interpretability | Robuste Verbesserung der LLM Fairness in realistischen Einstellungen durch Dolmetschbarkeit | 通过可解释性在现实环境中强有力地提高LLM公平性 2506.10922v1 |
Authors (2): Adam Karvonen, Samuel Marks
Large language models (LLMs) are increasingly deployed in high-stakes hiring applications, making decisions that directly impact people’s careers and livelihoods. While prior studies suggest simple anti-bias prompts can eliminate demographic biases in controlled evaluations, we find these mitigations fail when realistic contextual details are introduced. We address these failures through internal bias mitigation: by identifying and neutralizing sensitive attribute directions within model activations, we achieve robust bias reduction across all tested scenarios. Across leading commercial (GPT-4o, Claude 4 Sonnet, Gemini 2.5 Flash) and open-source models (Gemma-2 27B, Gemma-3, Mistral-24B), we find that adding realistic context such as company names, culture descriptions from public careers pages, and selective hiring constraints (e.g., “only accept candidates in the top 10%”) induces significant racial and gender biases (up to 12% differences in interview rates). When these biases emerge, they consistently favor Black over White candidates and female over male candidates across all tested models and scenarios. Moreover, models can infer demographics and become biased from subtle cues like college affiliations, with these biases remaining invisible even when inspecting the model’s chain-of-thought reasoning. To address these limitations, our internal bias mitigation identifies race and gender-correlated directions and applies affine concept editing at inference time. Despite using directions from a simple synthetic dataset, the intervention generalizes robustly, consistently reducing bias to very low levels (typically under 1%, always below 2.5%) while largely maintaining model performance. Our findings suggest that practitioners deploying LLMs for hiring should adopt more realistic evaluation methodologies and consider internal mitigation strategies for equitable outcomes.
大型语言模型（LLM）越来越多地被部署在高风险的招聘应用中，所作出的决定直接影响人们的职业和生计。已有研究表明，在受控评估中，简单的反偏见提示可以消除人口统计偏见，但我们发现，一旦引入现实的情境细节，这些缓解措施就会失效。我们通过内部偏见缓解来解决这些失效问题：通过识别并中和模型激活中的敏感属性方向，我们在所有测试场景中都实现了稳健的偏见降低。在领先的商业模型（GPT-4o、Claude 4 Sonnet、Gemini 2.5 Flash）和开源模型（Gemma-2 27B、Gemma-3、Mistral-24B）上，我们发现，加入公司名称、公开招聘页面上的企业文化描述以及选择性招聘限制（例如“只接受排名前 10% 的候选人”）等现实情境，会引发显著的种族和性别偏见（面试率差异最高达 12%）。当这些偏见出现时，在所有测试的模型和场景中，它们始终偏向黑人候选人而非白人候选人、偏向女性候选人而非男性候选人。此外，模型可以从院校背景等细微线索中推断人口统计信息并产生偏见，而且即使检查模型的思维链推理，这些偏见也无法被察觉。针对这些局限，我们的内部偏见缓解方法识别与种族和性别相关的方向，并在推理时应用仿射概念编辑。尽管这些方向取自一个简单的合成数据集，该干预仍能稳健泛化，始终将偏见降至很低的水平（通常低于 1%，始终低于 2.5%），同时基本保持模型性能。我们的研究结果表明，将 LLM 用于招聘的从业者应采用更贴近现实的评估方法，并考虑采用内部缓解策略以实现公平的结果。
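The inference-time intervention mentioned in the abstract can be pictured as an affine edit of an activation vector along a concept direction: the component along the direction is replaced by a fixed target value, and everything orthogonal to it is left alone. This is a generic sketch under that reading; the two-dimensional vectors, the target value, and the function names are hypothetical, and how the paper obtains its directions is not shown.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def affine_edit(activation, direction, target):
    """Set the activation's projection onto the unit direction to `target`,
    leaving the orthogonal component unchanged."""
    d = normalize(direction)
    shift = target - dot(activation, d)
    return [a + shift * di for a, di in zip(activation, d)]

# Hypothetical activation and attribute-correlated direction.
activation = [1.0, 2.0]
direction = [3.0, 4.0]
edited = affine_edit(activation, direction, target=0.5)
print(edited)
```

Setting `target` to zero gives plain projection-based ablation; a non-zero value (for example an average projection over a reference set) is what makes the edit affine rather than purely linear.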
Article 26
Title@2025-06-12 (4): Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization
Title: Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization | Dekomponieren von MLP-Aktivierungen in Interpretierbare Funktionen über semi-Nonnegative Matrix-Fabrikisierung | 通过半氮基矩阵化系数化,将劳动和生产部的分解活动转化为可解释性特征 2506.10920v1 |
Authors (3): Or Shafran, Atticus Geiger, Mor Geva
A central goal for mechanistic interpretability has been to identify the right units of analysis in large language models (LLMs) that causally explain their outputs. While early work focused on individual neurons, evidence that neurons often encode multiple concepts has motivated a shift toward analyzing directions in activation space. A key question is how to find directions that capture interpretable features in an unsupervised manner. Current methods rely on dictionary learning with sparse autoencoders (SAEs), commonly trained over residual stream activations to learn directions from scratch. However, SAEs often struggle in causal evaluations and lack intrinsic interpretability, as their learning is not explicitly tied to the computations of the model. Here, we tackle these limitations by directly decomposing MLP activations with semi-nonnegative matrix factorization (SNMF), such that the learned features are (a) sparse linear combinations of co-activated neurons, and (b) mapped to their activating inputs, making them directly interpretable. Experiments on Llama 3.1, Gemma 2 and GPT-2 show that SNMF derived features outperform SAEs and a strong supervised baseline (difference-in-means) on causal steering, while aligning with human-interpretable concepts. Further analysis reveals that specific neuron combinations are reused across semantically-related features, exposing a hierarchical structure in the MLP’s activation space. Together, these results position SNMF as a simple and effective tool for identifying interpretable features and dissecting concept representations in LLMs.
机制可解释性的一个核心目标，是在大型语言模型（LLM）中找到能够对其输出作出因果解释的恰当分析单元。早期工作聚焦于单个神经元，但有证据表明神经元常常同时编码多个概念，这促使研究转向分析激活空间中的方向。一个关键问题是如何以无监督的方式找到刻画可解释特征的方向。现有方法依赖于使用稀疏自编码器（SAE）的字典学习，SAE 通常在残差流激活上训练，从零开始学习方向。然而，SAE 在因果评估中往往表现不佳，并且缺乏内在的可解释性，因为其学习过程与模型的计算没有显式关联。在本文中，我们通过半非负矩阵分解（SNMF）直接分解 MLP 激活来解决这些局限，使学到的特征 (a) 是共同激活的神经元的稀疏线性组合，并且 (b) 被映射到激活它们的输入，从而可以直接解释。在 Llama 3.1、Gemma 2 和 GPT-2 上的实验表明，SNMF 得到的特征在因果引导上优于 SAE 和一个较强的有监督基线（均值差，difference-in-means），同时与人类可解释的概念相一致。进一步的分析表明，特定的神经元组合会在语义相关的特征之间被复用，揭示了 MLP 激活空间中的层级结构。综上，这些结果表明 SNMF 是识别可解释特征、剖析 LLM 中概念表征的一种简单而有效的工具。
Article 27
Title@2025-06-12 (4): Sequential-Parallel Duality in Prefix Scannable Models
Title: Sequential-Parallel Duality in Prefix Scannable Models | Sequential-Parallel-Dualität in voreingestellten Scannable-Modellen | 前缀扫描可扫描模型中的序列- Parallel 品质 2506.10918v1 |
Authors (6): Morris Yau, Sharut Gupta, Valerie Engelmayer, Kazuki Irie, Stefanie Jegelka, Jacob Andreas
Modern neural sequence models are designed to meet the dual mandate of parallelizable training and fast sequential inference. Recent developments have given rise to various models, such as Gated Linear Attention (GLA) and Mamba, that achieve such "sequential-parallel duality." This raises a natural question: can we characterize the full class of neural sequence models that support near-constant-time parallel evaluation and linear-time, constant-space sequential inference? We begin by describing a broad class of such models – state space models – as those whose state updates can be computed using the classic parallel prefix scan algorithm with a custom associative aggregation operator. We then define a more general class, Prefix-Scannable Models (PSMs), by relaxing the state aggregation operator to allow arbitrary (potentially non-associative) functions such as softmax attention. This generalization unifies many existing architectures, including element-wise RNNs (e.g., Mamba) and linear transformers (e.g., GLA, Mamba2, mLSTM), while also introducing new models with softmax-like operators that achieve O(1) amortized compute per token and log(N) memory for sequence length N. We empirically evaluate such models on illustrative small-scale language modeling and canonical synthetic tasks, including state tracking and associative recall. We find that PSMs retain the expressivity of transformer-based architectures while matching the inference efficiency of state space models – in some cases exhibiting better length generalization than either.
现代神经序列模型的设计需要同时满足可并行训练和快速顺序推理这两项要求。近期出现的多种模型,如门控线性注意力(GLA)和Mamba,实现了这种“顺序-并行对偶性”。这引出一个自然的问题:能否刻画出支持近常数时间并行计算以及线性时间、常数空间顺序推理的全部神经序列模型?我们首先描述一大类这样的模型,即状态空间模型:其状态更新可以通过经典的并行前缀扫描算法配合自定义的可结合聚合算子来计算。随后我们放宽对状态聚合算子的要求,允许任意(可能不满足结合律的)函数,例如softmax注意力,从而定义更一般的一类模型,即前缀可扫描模型(PSM)。这一推广统一了许多现有架构,包括逐元素RNN(如Mamba)和线性Transformer(如GLA、Mamba2、mLSTM),同时还引入了采用类softmax算子的新模型,对长度为N的序列实现每个token O(1)的摊销计算量和log(N)的内存。我们在示例性的小规模语言建模以及状态跟踪、联想回忆等典型合成任务上对这类模型进行了实证评估。结果表明,PSM保留了基于Transformer架构的表达能力,同时达到状态空间模型的推理效率,在某些情况下长度泛化能力优于两者。
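The associative case described in the abstract can be sketched with a scalar linear recurrence h_t = a_t * h_{t-1} + b_t. Each step is represented as a pair (a_t, b_t), and composing two steps is an associative operation, so all prefixes can be computed in about log2(N) rounds of independent updates instead of N sequential steps. The function names and the toy recurrence are assumptions chosen for illustration; the sketch does not cover the non-associative operators that PSMs additionally allow.

```python
def combine(left, right):
    """Compose two affine steps h -> a*h + b; this operation is associative."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def sequential(a, b, h0=0):
    """Reference: linear-time, constant-space sequential evaluation."""
    h, out = h0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return out

def prefix_scan(elems, op):
    """Hillis-Steele inclusive scan.

    Runs ceil(log2 N) rounds; within a round every position is updated
    independently, so a round could execute in parallel.
    """
    xs = list(elems)
    step = 1
    while step < len(xs):
        xs = [xs[i] if i < step else op(xs[i - step], xs[i])
              for i in range(len(xs))]
        step *= 2
    return xs

a = [2, 3, 1, 2, 5]
b = [1, 0, 4, 1, 2]
# With h0 = 0 the hidden state is the offset component of each prefix.
scanned = [offset for _, offset in prefix_scan(zip(a, b), combine)]
reference = sequential(a, b)
```

The same model thus has two evaluation modes: `prefix_scan` for training-style parallel evaluation and `sequential` for token-by-token inference.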
Article 28
Title@2025-06-12 (4): M4V: Multi-Modal Mamba for Text-to-Video Generation
Title: M4V: Multi-Modal Mamba for Text-to-Video Generation | M4V: Multi-Modal Mamba für die Text-zu-Video-Generierung | M4V: 用于文本到视频一代的多式Mamba 2506.10915v1 |
Authors (8): Jiancheng Huang, Gengwei Zhang, Zequn Jie, Siyu Jiao, Yinlong Qian, Ling Chen, Yunchao Wei, Lin Ma
Text-to-video generation has significantly enriched content creation and holds the potential to evolve into powerful world simulators. However, modeling the vast spatiotemporal space remains computationally demanding, particularly when employing Transformers, which incur quadratic complexity in sequence processing and thus limit practical applications. Recent advancements in linear-time sequence modeling, particularly the Mamba architecture, offer a more efficient alternative. Nevertheless, its plain design limits its direct applicability to multi-modal and spatiotemporal video generation tasks. To address these challenges, we introduce M4V, a Multi-Modal Mamba framework for text-to-video generation. Specifically, we propose a multi-modal diffusion Mamba (MM-DiM) block that enables seamless integration of multi-modal information and spatiotemporal modeling through a multi-modal token re-composition design. As a result, the Mamba blocks in M4V reduce FLOPs by 45% compared to the attention-based alternative when generating videos at 768$\times$1280 resolution. Additionally, to mitigate the visual quality degradation in long-context autoregressive generation processes, we introduce a reward learning strategy that further enhances per-frame visual realism. Extensive experiments on text-to-video benchmarks demonstrate M4V’s ability to produce high-quality videos while significantly lowering computational costs. Code and models will be publicly available at https://huangjch526.github.io/M4V_project.
文本到视频生成极大地丰富了内容创作,并有潜力发展为强大的世界模拟器。然而,对庞大的时空空间进行建模在计算上仍然代价高昂,使用Transformer时尤其如此:其序列处理具有二次复杂度,因而限制了实际应用。线性时间序列建模的最新进展,特别是Mamba架构,提供了更高效的替代方案。但其朴素的设计限制了它在多模态和时空视频生成任务中的直接适用性。为应对这些挑战,我们提出M4V,一个用于文本到视频生成的多模态Mamba框架。具体而言,我们提出多模态扩散Mamba(MM-DiM)模块,通过多模态token重组设计实现多模态信息与时空建模的无缝融合。因此,在生成768$\times$1280分辨率的视频时,M4V中的Mamba模块与基于注意力的方案相比可将FLOPs降低45%。此外,为缓解长上下文自回归生成过程中的视觉质量退化,我们引入一种奖励学习策略,进一步提升逐帧的视觉真实感。在文本到视频基准上的大量实验表明,M4V能够生成高质量视频,同时显著降低计算成本。代码和模型将在 https://huangjch526.github.io/M4V_project 公开。
Article 29
Title@2025-06-12 (4): Foundation Models for Causal Inference via Prior-Data Fitted Networks
Title: Foundation Models for Causal Inference via Prior-Data Fitted Networks | Gründungsmodelle für die Kausalableitung über Prior-Data Fitted Networks | 通过先前数据适合型网络进行因果关系推断的模型基础 2506.10914v1 |
Authors (4): Yuchen Ma, Dennis Frauen, Emil Javurek, Stefan Feuerriegel
Prior-data fitted networks (PFNs) have recently been proposed as a promising way to train tabular foundation models. PFNs are transformers that are pre-trained on synthetic data generated from a prespecified prior distribution and that enable Bayesian inference through in-context learning. In this paper, we introduce CausalFM, a comprehensive framework for training PFN-based foundation models in various causal inference settings. First, we formalize the construction of Bayesian priors for causal inference based on structural causal models (SCMs) in a principled way and derive necessary criteria for the validity of such priors. Building on this, we propose a novel family of prior distributions using causality-inspired Bayesian neural networks that enable CausalFM to perform Bayesian causal inference in various settings, including back-door, front-door, and instrumental variable adjustment. Finally, we instantiate CausalFM and explicitly train a foundation model for estimating conditional average treatment effects (CATEs) using back-door adjustment. We show that CausalFM performs competitively for CATE estimation using various synthetic and semi-synthetic benchmarks. In sum, our framework can be used as a general recipe to train foundation models for various causal inference settings. In contrast to the current state-of-the-art in causal inference, CausalFM offers a novel paradigm with the potential to fundamentally change how practitioners perform causal inference in medicine, economics, and other disciplines.
最近,有人提议将先前安装数据的网络(PFNs)作为培训表层基础模型的有希望的方法。PFNs是变压器,在事先规定的先前分发的合成数据上经过预先培训,并且能够通过同文学习进行贝叶斯的推断。在本文中,我们引入了CausalFM(一个综合框架),用于在各种因果推断环境中培训以PFN为基础的基础模型。首先,我们以原则方式正式建立基于结构性因果模型的Bayesian因果推断前科,并为此类前科的有效性制定必要的标准。在此基础上,我们提议采用因果诱导导出贝叶斯的神经网络,使Bayesaurgy(Causal FMy)能够在不同环境中(包括后门、前门和辅助性变数调整)进行巴伊斯的因果推断。最后,我们用后门调整的方式正式建立Bayesian Font(CATEMs) ,并明确培训一个基础模型,用以评估有条件的平均治疗效果。我们展示了CATEFM(CED)在基本理论环境中进行竞争性估算,使用各种合成和半因果分析基础基础基础基础,在各种因果基质模型中可以用来进行。
Article 30
Title@2025-06-12 (4): NoLoCo: No-all-reduce Low Communication Training Method for Large Models
Title: NoLoCo: No-all-reduce Low Communication Training Method for Large Models | NoLoCo: Nein-alles-Reduce Low Communication Training Method for Large Models | NoLOCO: 大型模型的低通信培训方法 2506.10911v1 |
Authors (5): Jari Kolehmainen, Nikolay Blagoev, John Donaghy, Oğuzhan Ersoy, Christopher Nies
Training large language models is generally done via optimization methods on clusters containing tens of thousands of accelerators, communicating over a high-bandwidth interconnect. Scaling up these clusters is expensive and can become impractical, imposing limits on the size of models that can be trained. Several recent studies have proposed training methods that are less communication intensive, avoiding the need for a highly connected compute cluster. These state-of-the-art low-communication training methods still employ a synchronization step for model parameters, which, when performed over all model replicas, can become costly on a low-bandwidth network. In this work, we propose a novel optimization method, NoLoCo, that does not explicitly synchronize all model parameters during training and, as a result, does not require any collective communication. NoLoCo implicitly synchronizes model weights via a novel variant of the Nesterov momentum optimizer by partially averaging each replica's model weights with those of a randomly selected other replica. We provide both a theoretical convergence analysis for our proposed optimizer and empirical results from language model training. We benchmark NoLoCo on a wide range of accelerator counts and model sizes, from 125M to 6.8B parameters. Our method requires significantly less communication overhead than fully sharded data parallel training or even the widely used low-communication training method DiLoCo. The synchronization step itself is estimated to be one order of magnitude faster than the all-reduce used in DiLoCo for a few hundred accelerators training over the internet. We also do not have any global blocking communication, which reduces accelerator idle time. Compared to DiLoCo, we also observe up to $4\%$ faster convergence across a wide range of model sizes and accelerator counts.
大型语言模型的训练通常在包含数万个加速器、通过高带宽互连通信的集群上用优化方法完成。扩展这类集群成本高昂且可能不切实际,从而限制了可训练模型的规模。近期有若干研究提出了通信量更低的训练方法,避免了对高度互联计算集群的需求。这些最先进的低通信训练方法仍然需要对模型参数执行同步步骤,而在低带宽网络上对所有模型副本执行该步骤代价可能很高。本文提出一种新的优化方法NoLoCo,它在训练期间不显式同步所有模型参数,因而不需要任何集合通信。NoLoCo通过Nesterov动量优化器的一种新变体,将模型权重与随机选取的另一个副本的权重做部分平均,从而隐式地同步模型权重。我们为所提优化器给出了理论收敛性分析,并提供了语言模型训练的实证结果。我们在多种加速器数量和模型规模(1.25亿至68亿参数)下对NoLoCo进行了基准测试。与完全分片数据并行训练相比,甚至与广泛使用的低通信训练方法DiLoCo相比,我们的方法所需的通信开销都显著更低。据估计,在互联网上使用数百个加速器训练时,其同步步骤本身比DiLoCo所用的all-reduce快一个数量级。我们也不存在任何全局阻塞通信,这减少了加速器的空闲时间。与DiLoCo相比,在多种模型规模和加速器数量下,我们还观察到收敛速度最多快$4\%$。
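The idea of synchronizing without a collective operation can be illustrated with plain pairwise ("gossip") averaging: in each round every replica exchanges weights with one randomly chosen partner and moves part of the way toward it. This is a deliberately simplified sketch and not the NoLoCo optimizer itself, which folds the averaging into a Nesterov momentum variant; the names `gossip_round` and `spread`, the mixing factor `alpha`, and the toy weight vectors are all assumptions made here.

```python
import random

def gossip_round(replicas, alpha, rng):
    """Shuffle replicas into disjoint pairs and partially average each pair.

    Only point-to-point exchanges are needed; no all-reduce over all replicas.
    """
    order = list(range(len(replicas)))
    rng.shuffle(order)
    for i, j in zip(order[0::2], order[1::2]):
        wi, wj = replicas[i], replicas[j]
        replicas[i] = [(1 - alpha) * a + alpha * b for a, b in zip(wi, wj)]
        replicas[j] = [(1 - alpha) * b + alpha * a for a, b in zip(wi, wj)]

def spread(replicas):
    """Sum of squared deviations of every replica from the replica mean."""
    n, dim = len(replicas), len(replicas[0])
    mean = [sum(w[k] for w in replicas) / n for k in range(dim)]
    return sum((w[k] - mean[k]) ** 2 for w in replicas for k in range(dim))

rng = random.Random(0)
# Eight toy replicas, each holding a 4-dimensional weight vector.
replicas = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
before = spread(replicas)
for _ in range(10):
    gossip_round(replicas, alpha=0.25, rng=rng)
after = spread(replicas)
```

Each symmetric pairwise step preserves the replica mean and shrinks the disagreement within the pair, so the replicas drift toward a common point even though no step ever involves all of them at once.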
Article 31
Title@2025-06-12 (4): Probably Approximately Correct Labels
Title: Probably Approximately Correct Labels | Wahrscheinlich ungefähr korrekte Etiketten | 可能是近似正确标签 2506.10908v1 |
Authors (3): Emmanuel J. Candès, Andrew Ilyas, Tijana Zrnic
Obtaining high-quality labeled datasets is often costly, requiring either extensive human annotation or expensive experiments. We propose a method that supplements such “expert” labels with AI predictions from pre-trained models to construct labeled datasets more cost-effectively. Our approach results in probably approximately correct labels: with high probability, the overall labeling error is small. This solution enables rigorous yet efficient dataset curation using modern AI models. We demonstrate the benefits of the methodology through text annotation with large language models, image labeling with pre-trained vision models, and protein folding analysis with AlphaFold.
获取高质量的标签数据集往往成本高昂,需要广泛的人类批注或昂贵的实验。我们提出了一个方法来补充这种“专家”标签,加上从经过培训的模型中预测的人工智能标签,以便以更具成本效益的方式构建标签数据集。我们的方法可以大致正确的标签:在高概率的情况下,总体标签错误很小。这个方法可以使用现代的AI模型进行严格而高效的数据集弯曲。我们通过使用大型语言模型的文本注释、与经过培训的视觉模型进行图像标签以及与阿尔法福尔德进行蛋白质折叠分析来证明这种方法的好处。
Article 32
Title@2025-06-12 (4): Demystifying Spectral Feature Learning for Instrumental Variable Regression
Title: Demystifying Spectral Feature Learning for Instrumental Variable Regression | Entmystifizieren von Spektralfunktionen für instrumentelle variable Regression | 为乐器变量回归而解开神秘的光谱特征学习 2506.10899v1 |
Authors (5): Dimitri Meunier, Antoine Moulin, Jakub Wornbard, Vladimir R. Kostic, Arthur Gretton
We address the problem of causal effect estimation in the presence of hidden confounders, using nonparametric instrumental variable (IV) regression. A leading strategy employs spectral features - that is, learned features spanning the top eigensubspaces of the operator linking treatments to instruments. We derive a generalization error bound for a two-stage least squares estimator based on spectral features, and gain insights into the method’s performance and failure modes. We show that performance depends on two key factors, leading to a clear taxonomy of outcomes. In a good scenario, the approach is optimal. This occurs with strong spectral alignment, meaning the structural function is well-represented by the top eigenfunctions of the conditional operator, coupled with this operator’s slow eigenvalue decay, indicating a strong instrument. Performance degrades in a bad scenario: spectral alignment remains strong, but rapid eigenvalue decay (indicating a weaker instrument) demands significantly more samples for effective feature learning. Finally, in the ugly scenario, weak spectral alignment causes the method to fail, regardless of the eigenvalues’ characteristics. Our synthetic experiments empirically validate this taxonomy.
我们用非参数工具变量(IV)回归法解决在隐蔽混淆器存在的情况下因果估计问题。 一种主要策略使用光谱特征, 即操作者将处理方法与仪器连接起来的顶部天分次空间的学习特征。 我们根据光谱特征为两阶段最小方形估计器得出一个一般错误, 并深入了解方法的性能和故障模式。 我们显示, 性能取决于两个关键因素, 导致结果的清晰分类。 在一种良好的情景中, 这种方法是最佳的。 发生于强光谱匹配, 意味着结构功能由有条件操作者顶部天分功能充分体现, 以及该操作者缓慢的天分衰减, 表明一种强大的工具。 性能在一种坏的情景中降解: 光谱对齐仍然很强大, 但快速的光值衰变( 指示较弱的仪器) 需要大量样本来有效特征学习。 最后, 在丑陋的情景中, 弱光谱协调法导致方法失败, 而不考虑其特性的特性。
Article 33
Title@2025-06-12 (4): The Diffusion Duality
Title: The Diffusion Duality | Die Diffusionsdualität | 传播质量 2506.10892v1 |
Authors (6): Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, Volodymyr Kuleshov
Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, doubling training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm unlocks few-step generation in diffusion language models by accelerating sampling by two orders of magnitude. We provide the code and model checkpoints on the project page: http://s-sahoo.github.io/duo
均匀状态离散扩散模型具有内在的自我纠错能力,因而有望实现快速文本生成。然而,它们的表现通常不及自回归模型和掩码扩散模型。本文利用一个关键洞见来缩小这一性能差距:均匀状态扩散过程可以自然地从底层的高斯扩散中导出。我们的方法Duo把高斯扩散中的强大技术迁移过来,以同时改进训练和采样。首先,我们提出一种由高斯过程引导的课程学习策略,通过降低方差使训练速度翻倍。采用课程学习训练的模型在7个基准中的3个上,零样本困惑度超过了自回归模型。其次,我们提出离散一致性蒸馏,将一致性蒸馏从连续情形推广到离散情形。该算法将采样加速两个数量级,从而使扩散语言模型能够少步生成。我们在项目页面上提供代码和模型检查点:http://s-sahoo.github.io/duo
Article 34
Title@2025-06-12 (4): PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play
Title: PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play | PLAY2PROMPT: Zero-shot Tool Instruction Optimierung für LLM Agenten über Tool Play | PLAY2PROMOPT: 通过工具游戏优化LLM代理器的零射工具指令 2503.14432v2 |
Authors (5): Wei Fang, Yang Zhang, Kaizhi Qian, James Glass, Yada Zhu
Large language models (LLMs) are increasingly integrated with specialized external tools, yet many tasks demand zero-shot tool usage with minimal or noisy documentation. Existing solutions rely on manual rewriting or labeled data for validation, making them inapplicable in true zero-shot settings. To address these challenges, we propose PLAY2PROMPT, an automated framework that systematically “plays” with each tool to explore its input-output behaviors. Through this iterative trial-and-error process, PLAY2PROMPT refines tool documentation and generates usage examples without any labeled data. These examples not only guide LLM inference but also serve as validation to further enhance tool utilization. Extensive experiments on real-world tasks demonstrate that PLAY2PROMPT significantly improves zero-shot tool performance across both open and closed models, offering a scalable and effective solution for domain-specific tool integration.
大型语言模型(LLMS)日益与专门的外部工具融合在一起,但许多任务要求使用零光工具,且文件很少或吵闹。现有解决方案依靠人工重写或标签数据进行验证,使其在真正的零光环境中不适用。为了应对这些挑战,我们提议PLAY2PROMPT,这是一个系统“玩耍”的自动框架,每个工具都可以探索其输入-输出行为。通过这个迭接的试验-操作程序,PLAY2PROMOT改进了工具文档,并生成了使用示例,而没有任何标签数据。这些示例不仅指导LLLM推断,而且还作为进一步增强工具利用的验证。关于现实世界任务的广泛实验表明,PLAY2PROMOT大大改善了开放和封闭模式的零光工具性,为特定域工具整合提供了可扩展的有效解决方案。
Article 35
Title@2025-06-12 (4): Lattice Climber Attack: Adversarial attacks for randomized mixtures of classifiers
Title: Lattice Climber Attack: Adversarial attacks for randomized mixtures of classifiers | Lattice Climber Attack: Adversariale Angriffe für randomisierte Mischungen von Klassifikatoren | Lattice 攀爬者攻击: 随机混合分类器的反向攻击 2506.10888v1 |
Authors (3): Lucas Gnecco-Heredia, Benjamin Negrevergne, Yann Chevaleyre
Finite mixtures of classifiers (a.k.a. randomized ensembles) have been proposed as a way to improve robustness against adversarial attacks. However, existing attacks have been shown to not suit this kind of classifier. In this paper, we discuss the problem of attacking a mixture in a principled way and introduce two desirable properties of attacks based on a geometrical analysis of the problem (effectiveness and maximality). We then show that existing attacks do not meet both of these properties. Finally, we introduce a new attack called the lattice climber attack, with theoretical guarantees in the binary linear setting, and demonstrate its performance by conducting experiments on synthetic and real datasets.
提议将精密的分类器混合物(a.k.a. 随机组合)作为提高抵御对抗性攻击的力度的一种方法。然而,现有攻击已证明不适合这种分类器。在本文中,我们讨论以原则性方式攻击混合物的问题,根据对问题(有效性和最大性)的几何分析,引入两种可取的攻击特性。然后,我们表明现有的攻击不符合这两种特性。最后,我们用二进制线性线性环境的理论保证引进了一个新的攻击,称为“范拉蒂斯登山者攻击”),并通过对合成和真实数据集进行实验来证明其性能。
Article 36
Title@2025-06-12 (4): Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
Title: Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers | Verallgemeinerung oder Halluzination? Verstehen von Out-of-Context-Reasoning in Transformers | 通化还是幻觉? 理解变异器的逻辑外原因 2506.10887v1 |
Authors (8): Yixiao Huang, Hanlin Zhu, Tianyu Guo, Jiantao Jiao, Somayeh Sojoudi, Michael I. Jordan, Stuart Russell, Song Mei
Large language models (LLMs) can acquire new knowledge through fine-tuning, but this process exhibits a puzzling duality: models can generalize remarkably from new facts, yet are also prone to hallucinating incorrect information. However, the reasons for this phenomenon remain poorly understood. In this work, we argue that both behaviors stem from a single mechanism known as out-of-context reasoning (OCR): the ability to deduce implications by associating concepts, even those without a causal link. Our experiments across five prominent LLMs confirm that OCR indeed drives both generalization and hallucination, depending on whether the associated concepts are causally related. To build a rigorous theoretical understanding of this phenomenon, we then formalize OCR as a synthetic factual recall task. We empirically show that a one-layer single-head attention-only transformer with factorized output and value matrices can learn to solve this task, while a model with combined weights cannot, highlighting the crucial role of matrix factorization. Our theoretical analysis shows that the OCR capability can be attributed to the implicit bias of gradient descent, which favors solutions that minimize the nuclear norm of the combined output-value matrix. This mathematical structure explains why the model learns to associate facts and implications with high sample efficiency, regardless of whether the correlation is causal or merely spurious. Ultimately, our work provides a theoretical foundation for understanding the OCR phenomenon, offering a new lens for analyzing and mitigating undesirable behaviors from knowledge injection.
大型语言模型(LLMS)可以通过微调获得新知识,但这一过程呈现出一种令人费解的双重性:模型可以从新的事实中加以概括,但又容易产生幻觉。然而,这一现象的原因仍然没有得到很好理解。在这项工作中,我们争辩说,这两种行为都来自一个单一的机制,称为超文本推理(OCR):通过结合概念,甚至没有因果联系的概念,来推断影响的能力。我们在五个突出的LMS的实验中证实,OCR确实既会推动普遍性,也会引发幻觉,这取决于相关概念是否因果相关。要建立严格的理论理解,我们然后将OCR正式确定为综合事实回顾任务。我们从经验上表明,一个单层单一的注意力变异器,带有因素性产出和价值矩阵,可以学会解决这项任务,而一个综合权重的模型则不能突出矩阵化的关键作用。我们的理论分析显示,OCR能力可以归因于渐渐下降的隐含偏见,这取决于相关概念是否具有因果关系。为了建立严格的核规范,我们然后将OCRR正式确定一个综合产出-价值模型,这是我们最终的理论分析基础,这解释了为什么是提供我们如何解释。
Article 37
Title@2025-06-12 (4): Three iterations of $(d-1)$-WL test distinguish non isometric clouds of $d$-dimensional points
Title: Three iterations of $(d-1)$-WL test distinguish non isometric clouds of $d$-dimensional points | Drei Iterationen von $(d-1)$-WL-Test unterscheiden nicht-isometrische Wolken von $d$-dimensionalen Punkten | 三次(d-1-1美元-WL)的迭代(美元-美元-WL)测试区分非等度云(美元-维点) 2303.12853v4 |
Authors (5): Valentino Delle Rose, Alexander Kozachinskiy, Cristóbal Rojas, Mircea Petrache, Pablo Barceló
The Weisfeiler–Lehman (WL) test is a fundamental iterative algorithm for checking isomorphism of graphs. It has also been observed that it underlies the design of several graph neural network architectures, whose capabilities and performance can be understood in terms of the expressive power of this test. Motivated by recent developments in machine learning applications to datasets involving three-dimensional objects, we study when the WL test is complete for clouds of Euclidean points represented by complete distance graphs, i.e., when it can distinguish, up to isometry, any arbitrary such cloud. This question is important for understanding which architectures of graph neural networks are capable of fully exploiting the spatial structure of a point cloud. Our main result states that the $(d-1)$-dimensional WL test is complete for point clouds in $d$-dimensional Euclidean space, for any $d\ge 2$, and that only three iterations of the test suffice. We also observe that the $d$-dimensional WL test only requires one iteration to achieve completeness. Our paper thus provides a complete understanding of the 3-dimensional case: it was shown in previous works that 1-WL is not complete in $\mathbb{R}^3$, and we show that 2-WL is complete there. We also strengthen the lower bound for 1-WL by showing that it is unable to recognize planar point clouds in $\mathbb{R}^3$. Finally, we show that 2-WL is not complete in $\mathbb{R}^6$, leaving as an open question whether it is complete in $\mathbb{R}^{d}$ for $d = 4,5$.
Weisfeiler–Lehman(WL)测试是检验图同构的一种基本迭代算法。人们还注意到,它是若干图神经网络架构设计的基础,这些架构的能力和性能可以用该测试的表达能力来理解。受机器学习在三维物体数据集上的最新应用所启发,我们研究WL测试何时对由完全距离图表示的欧氏点云是完备的,即何时能够在等距意义下区分任意这样的点云。这一问题对于理解哪些图神经网络架构能够充分利用点云的空间结构十分重要。我们的主要结果表明,对任意$d\ge 2$,$(d-1)$维WL测试对$d$维欧氏空间中的点云是完备的,并且只需三次迭代。我们还注意到,$d$维WL测试只需一次迭代即可达到完备。因此,本文对三维情形给出了完整的理解:已有工作表明1-WL在$\mathbb{R}^3$中不完备,而我们证明2-WL在其中完备。我们还加强了1-WL的下界,证明它无法识别$\mathbb{R}^3$中的平面点云。最后,我们证明2-WL在$\mathbb{R}^6$中不完备,而它在$d = 4,5$时于$\mathbb{R}^{d}$中是否完备仍是一个开放问题。
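For planar clouds ($d = 2$) the test in question is ordinary 1-WL color refinement run on the complete distance graph. The sketch below shows that setting: every point starts with the same color, and in each iteration a point's new color combines its old color with the multiset of (distance, neighbor color) pairs. The function names are assumptions, squared integer distances are used so that comparisons are exact, and colors are kept as nested tuples instead of hashes, which is only practical for very small clouds.

```python
def sq_dist(p, q):
    """Squared Euclidean distance; exact for integer coordinates."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def wl_signature(points, iterations=3):
    """1-WL color refinement on the complete distance graph of a point cloud.

    Returns the sorted multiset of final colors, which is invariant under
    isometries and under reordering of the points.
    """
    n = len(points)
    colors = [0] * n
    for _ in range(iterations):
        colors = [
            (colors[v],
             tuple(sorted((sq_dist(points[v], points[u]), colors[u])
                          for u in range(n) if u != v)))
            for v in range(n)
        ]
    return sorted(colors)

cloud_a = [(0, 0), (1, 0), (0, 2)]
# cloud_a rotated by 90 degrees and translated by (5, 5).
cloud_b = [(5, 5), (5, 6), (3, 5)]
# A cloud that is not isometric to cloud_a.
cloud_c = [(0, 0), (1, 0), (0, 3)]

same = wl_signature(cloud_a) == wl_signature(cloud_b)
different = wl_signature(cloud_a) != wl_signature(cloud_c)
```

Equal signatures for isometric clouds hold by construction; the content of the paper's completeness result is the converse direction, which a toy example like this can illustrate but not establish.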
Article 38
Title@2025-06-12 (4): A Goemans-Williamson type algorithm for identifying subcohorts in clinical trials
Title: A Goemans-Williamson type algorithm for identifying subcohorts in clinical trials | Ein Goemans-Williamson-Algorithmus zur Identifizierung von Subkohorten in klinischen Studien | Goemans- Williamson 型算法,用于在临床试验中识别亚科 2506.10879v1 |
Authors (1): Pratik Worah
We design an efficient algorithm that outputs a linear classifier for identifying homogeneous subsets (equivalently, subcohorts) from large inhomogeneous datasets. Our theoretical contribution is a rounding technique, similar to that of Goemans and Williamson (1994), that approximates the optimal solution of the underlying optimization problem within a factor of $0.82$. As an application, we use our algorithm to design a simple test that can identify homogeneous subcohorts of patients that consist mainly of metastatic cases, from the RNA microarray dataset for breast cancer by Curtis et al. (2012). Furthermore, we also use the test output by the algorithm to systematically identify subcohorts of patients in which statistically significant changes in methylation levels of tumor suppressor genes co-occur with statistically significant changes in nuclear receptor expression. Identifying such homogeneous subcohorts of patients can be useful for the discovery of disease pathways and therapeutics specific to the subcohort.
我们设计了一个高效的算法,从大型异质数据集中输出一个线性分类器,用于识别同质子集(相当于亚异质)。我们的理论贡献是一种四舍五入的技术,类似于戈曼斯和威廉姆森(1994年)的技术,它近似于0.82美元系数内潜在优化问题的最佳解决办法。作为一种应用,我们用我们的算法设计一个简单的测试,可以识别病人的同质次组,主要包括Curtis等人(2012年)的乳腺癌RNA微粒数据集中的同质次组。此外,我们还利用算法的测试输出系统识别病人的次组群,其中肿瘤抑制基因的甲基化水平发生了统计学上的重大变化,同时对核感应器表达方式也发生了统计上的重大变化。确定病人的同质次组群可以有助于发现子科特的疾病路径和治疗方法。
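The rounding step that the abstract compares its technique to can be sketched as classic random-hyperplane rounding: each item has a unit vector (in Goemans-Williamson these come from a semidefinite program), a random Gaussian direction is drawn, and items are split by the sign of their inner product with it. Two vectors at angle θ land on opposite sides with probability θ/π. The vectors below are hand-picked rather than obtained from any solver, and the function names are assumptions; the paper's own rounding scheme and its 0.82 factor are not reproduced here.

```python
import random

def hyperplane_round(vectors, rng):
    """Label each vector +1 or -1 by its side of a random hyperplane."""
    dim = len(vectors[0])
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # random normal direction
    return [1 if sum(a * b for a, b in zip(v, r)) >= 0 else -1
            for v in vectors]

def separation_rate(u, v, trials, rng):
    """Empirical probability that u and v receive different labels."""
    cuts = 0
    for _ in range(trials):
        su, sv = hyperplane_round([u, v], rng)
        cuts += su != sv
    return cuts / trials

rng = random.Random(0)
u = (1.0, 0.0, 0.0)
v = (0.0, 1.0, 0.0)                                 # 90 degrees from u
rate_orthogonal = separation_rate(u, v, 20000, rng) # theory: (pi/2)/pi = 0.5
rate_identical = separation_rate(u, u, 2000, rng)   # theory: 0
```

The induced labeling is itself a linear classifier on the vectors, which is the sense in which a rounding step of this kind can output a linear rule for membership in a subset.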
Article 39
Title@2025-06-12 (4): Data-Driven Prediction of Dynamic Interactions Between Robot Appendage and Granular Material
Title: Data-Driven Prediction of Dynamic Interactions Between Robot Appendage and Granular Material | Datengestützte Vorhersage dynamischer Interaktionen zwischen Roboter-Anhang und Granular Material | 机器人附加和颗粒材料之间动态相互作用的数据驱动预测 2506.10875v1 |
Authors (4): Guanjin Wang, Xiangxue Zhao, Shapour Azarm, Balakumar Balachandran
An alternative data-driven modeling approach has been proposed and employed to gain fundamental insights into robot motion interaction with granular terrain at certain length scales. The approach is based on an integration of dimension reduction (Sequentially Truncated Higher-Order Singular Value Decomposition), surrogate modeling (Gaussian Process), and data assimilation techniques (Reduced Order Particle Filter). This approach can be used online and is based on offline data, obtained from the offline collection of high-fidelity simulation data and a set of sparse experimental data. The results have shown that orders of magnitude reduction in computational time can be obtained from the proposed data-driven modeling approach compared with physics-based high-fidelity simulations. With only simulation data as input, the data-driven prediction technique can generate predictions whose accuracy is comparable to that of simulations. With both simulation data and sparse physical experimental measurements as input, the data-driven approach with its embedded data assimilation techniques has the potential to outperform high-fidelity simulations alone for long-horizon predictions. In addition, it is demonstrated that the data-driven modeling approach can also reproduce the scaling relationship recovered by physics-based simulations for maximum resistive forces, which may indicate its general predictability beyond a case-by-case basis. The results are expected to help robot navigation and exploration in unknown and complex terrains during both online and offline phases.
本文提出并采用了一种数据驱动的建模方法,用于在特定长度尺度上深入认识机器人运动与颗粒地形的相互作用。该方法结合了降维(顺序截断高阶奇异值分解)、代理建模(高斯过程)和数据同化技术(降阶粒子滤波)。该方法可以在线使用,并以离线数据为基础,这些数据来自离线收集的高保真仿真数据和一组稀疏的实验数据。结果表明,与基于物理的高保真仿真相比,所提出的数据驱动建模方法可将计算时间降低数个数量级。仅以仿真数据作为输入时,该数据驱动预测技术给出的预测精度与仿真相当。同时以仿真数据和稀疏的物理实验测量作为输入时,内嵌数据同化技术的数据驱动方法在长时程预测上有可能优于仅使用高保真仿真的结果。此外,数据驱动建模方法还能复现基于物理的仿真所得到的最大阻力的标度关系,这可能表明其具有超越个案的一般预测能力。这些结果有望在在线和离线阶段帮助机器人在未知的复杂地形中导航和探索。
Article 40
Title@2025-06-12 (4): The Gittins Index: A Design Principle for Decision-Making Under Uncertainty
Title: The Gittins Index: A Design Principle for Decision-Making Under Uncertainty | Der Gittins Index: Ein Design-Prinzip für Entscheidungsfindung unter Unsicherheit | Gittins指数:不确定性下决策的设计原则 2506.10872v1 |
Authors (2): Ziv Scully, Alexander Terenin
The Gittins index is a tool that optimally solves a variety of decision-making problems involving uncertainty, including multi-armed bandit problems, minimizing mean latency in queues, and search problems like the Pandora’s box model. However, despite the above examples and later extensions thereof, the space of problems that the Gittins index can solve perfectly optimally is limited, and its definition is rather subtle compared to those of other multi-armed bandit algorithms. As a result, the Gittins index is often regarded as being primarily a concept of theoretical importance, rather than a practical tool for solving decision-making problems. The aim of this tutorial is to demonstrate that the Gittins index can be fruitfully applied to practical problems. We start by giving an example-driven introduction to the Gittins index, then walk through several examples of problems it solves - some optimally, some suboptimally but still with excellent performance. Two practical highlights in the latter category are applying the Gittins index to Bayesian optimization, and applying the Gittins index to minimizing tail latency in queues.
Gittins指数是一个工具,它能最好地解决各种涉及不确定性的决策问题,包括多臂土匪问题,最大限度地减少队列中的平均潜伏,以及Pandora箱式模型等搜索问题。然而,尽管有上述例子和后来的扩展,Gittins指数完全可以最佳解决的问题空间是有限的,其定义与其他多臂土匪算法相比相当微妙。因此,Gittins指数常常被视为主要具有理论重要性的概念,而不是解决决策问题的实用工具。这个教程的目的是证明Gittins指数可以富有成效地应用于实际问题。我们首先以实例为驱动地介绍Gittins指数,然后通过几个问题的例子,有些是最佳的,有些是次近的,但表现仍然很好。后一类的两个实际亮点是将Gittins指数应用于Bayesian 优化,并应用Gittins索引来尽量减少排队尾拖。
Article 41
Title@2025-06-12 (4): Unveiling the Role of Randomization in Multiclass Adversarial Classification: Insights from Graph Theory
Title: Unveiling the Role of Randomization in Multiclass Adversarial Classification: Insights from Graph Theory | Enthüllen der Rolle der Randomisierung in der Mehrklassen-Adversarial-Klassifikation: Einsichten aus der Graphentheorie | 打破随机化在多级反对向分类中的作用:图表理论的透视 2503.14299v2 |
Authors (6): Lucas Gnecco-Heredia, Matteo Sammut, Muni Sreenivas Pydi, Rafael Pinot, Benjamin Negrevergne, Yann Chevaleyre
Randomization as a means to improve the adversarial robustness of machine learning models has recently attracted significant attention. Unfortunately, much of the theoretical analysis so far has focused on binary classification, providing only limited insights into the more complex multiclass setting. In this paper, we take a step toward closing this gap by drawing inspiration from the field of graph theory. Our analysis focuses on discrete data distributions, allowing us to cast the adversarial risk minimization problems within the well-established framework of set packing problems. By doing so, we are able to identify three structural conditions on the support of the data distribution that are necessary for randomization to improve robustness. Furthermore, we are able to construct several data distributions where (contrary to binary classification) switching from a deterministic to a randomized solution significantly reduces the optimal adversarial risk. These findings highlight the crucial role randomization can play in enhancing robustness to adversarial attacks in multiclass classification.
作为改善机器学习模式对抗性强健性的一种手段,随机化最近引起了人们的极大关注。 不幸的是,迄今为止的理论分析大多侧重于二进制分类,只对更复杂的多级设置提供了有限的洞察力。在本文中,我们通过从图形理论领域获得灵感,朝着缩小这一差距迈出了一步。我们的分析侧重于离散的数据分布,从而使我们能够将对抗性风险最小化的问题置于既定的成套包装问题框架之内。通过这样做,我们得以确定支持数据分配的三个结构性条件,而这是随机化提高稳健性所必需的。此外,我们能够在从确定性(从二进制分类到二进制分类)转换为随机化解决方案的地方建立若干数据分布,从而大大降低了最佳的对抗性风险。这些结论强调了随机化在多级分类中增强对抗性攻击的稳健性方面可以发挥的关键作用。
Article 42
Title@2025-06-12 (4): Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization
Title: Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization | Lebensfähigkeit künftiger Maßnahmen: Robuste Sicherheit im Stärkungslernen durch Entropieregularisierung | 未来行动的可行性:通过英信正规化加强学习的有力安全 2506.10871v1 |
Authors (4): Pierre-François Massiani, Alexander von Rohr, Lukas Haverbeck, Sebastian Trimpe
Despite the many recent advances in reinforcement learning (RL), the question of learning policies that robustly satisfy state constraints under unknown disturbances remains open. In this paper, we offer a new perspective on achieving robust safety by analyzing the interplay between two well-established techniques in model-free RL: entropy regularization and constraint penalization. We reveal empirically that entropy regularization in constrained RL inherently biases learning toward maximizing the number of future viable actions, thereby promoting constraint satisfaction that is robust to action noise. Furthermore, we show that by relaxing strict safety constraints through penalties, the constrained RL problem can be approximated arbitrarily closely by an unconstrained one and thus solved using standard model-free RL. This reformulation preserves both safety and optimality while empirically improving resilience to disturbances. Our results indicate that the connection between entropy regularization and robustness is a promising avenue for further empirical and theoretical investigation, as it enables robust safety in RL through simple reward shaping.
尽管在加强学习(RL)方面最近取得了许多进展,但学习政策如何在未知扰动下强有力地满足国家制约的问题仍然有待解决。在本文件中,我们通过分析无模式RL中两种成熟技术的相互作用,提供了实现稳健安全的新视角:即:变本加厉的正规化和约束性处罚。我们从经验中发现,受限制的RL的诱导性正规化在学习最大限度地增加未来可行行动的数量方面有着内在的偏向性,从而促使对行动噪音产生强烈的满意度。此外,我们表明,通过通过惩罚放松严格的安全限制,受限制的RL问题可以任意地被一个不受约束的束缚的束缚性监管者所近似,从而使用标准的无模式RL来解决。这种重整既能维护安全和最佳性,又在经验上提高抗扰动的复原力。我们的结果表明,变本和稳健之间的联系是进一步进行实证和理论性调查的一个有希望的途径,因为它通过简单的奖励方式使RL的稳健安全。
Article 43
Title@2025-06-12 (4): mLaSDI: Multi-stage latent space dynamics identification
Title: mLaSDI: Multi-stage latent space dynamics identification | mLaSDI: Mehrstufige latente Raumdynamik-Identifikation | mLASDI:多阶段潜在空间动态识别 2506.09207v2 |
Authors (3): William Anderson, Seung Whan Chung, Youngsoo Choi
Determining accurate numerical solutions of partial differential equations (PDEs) is an important task in many scientific disciplines. However, solvers can be computationally expensive, leading to the development of reduced-order models (ROMs). Recently, Latent Space Dynamics Identification (LaSDI) was proposed as a data-driven, non-intrusive ROM framework. LaSDI compresses the training data using an autoencoder and learns a system of user-chosen ordinary differential equations (ODEs), which govern the latent space dynamics. This allows for rapid predictions by interpolating and evolving the low-dimensional ODEs in the latent space. While LaSDI has produced effective ROMs for numerous problems, the autoencoder can have difficulty accurately reconstructing training data while also satisfying the imposed dynamics in the latent space, particularly in complex or high-frequency regimes. To address this, we propose multi-stage Latent Space Dynamics Identification (mLaSDI). With mLaSDI, several autoencoders are trained sequentially in stages, where each autoencoder learns to correct the error of the previous stages. We find that applying mLaSDI with small autoencoders results in lower prediction and reconstruction errors, while also reducing training time compared to LaSDI.
确定部分差异方程式(PDEs)的准确数字解决方案在许多科学学科中是一项重要任务。然而,解答器在计算上可能费用高昂,导致开发了减少顺序模型(ROMs)。最近,LaST空间动态识别(LASDI)被提议为数据驱动的非侵入性ROM框架。LASDI使用自动编码器压缩培训数据,并学习一个管理潜在空间动态的用户选择的普通差异方程式系统(MLASDI),这样就可以通过对潜层低维代号进行内插和演化来进行快速预测。虽然LASDI为许多问题制作了有效的ROM,但自动编码器可能难以准确重建培训数据,同时也满足了潜在空间,特别是复杂或高频系统中的强制动态。为了解决这个问题,我们提出了多阶段的低端空间动态识别(MLASDI)系统。使用MSDI系统,对几个自动编码器进行了连续培训,使每个小型解码器都学会如何纠正前几个阶段的错误。我们发现,在将LASDRADS在进行低级的重建时,同时将LADDRADS在降低时间预测中进行。
Article 44
Title@2025-06-12 (4): Latent Action Learning Requires Supervision in the Presence of Distractors
Title: Latent Action Learning Requires Supervision in the Presence of Distractors | Latent Action Learning erfordert Überwachung in der Gegenwart von Distraktoren | 长期行动学习要求对有减压器的存在进行监督 2502.00379v5 |
Authors (7): Alexander Nikulin, Ilya Zisman, Denis Tarasov, Nikita Lyubaykin, Andrei Polubarov, Igor Kiselev, Vladislav Kurenkov
Recently, latent action learning, pioneered by Latent Action Policies (LAPO), has shown remarkable pre-training efficiency on observation-only data, offering potential for leveraging vast amounts of video available on the web for embodied AI. However, prior work has focused on distractor-free data, where changes between observations are primarily explained by ground-truth actions. Unfortunately, real-world videos contain action-correlated distractors that may hinder latent action learning. Using the Distracting Control Suite (DCS), we empirically investigate the effect of distractors on latent action learning and demonstrate that LAPO struggles in such scenarios. We propose LAOM, a simple LAPO modification that improves the quality of latent actions by 8x, as measured by linear probing. Importantly, we show that providing supervision with ground-truth actions, as few as 2.5% of the full dataset, during latent action learning improves downstream performance by 4.2x on average. Our findings suggest that integrating supervision during Latent Action Models (LAM) training is critical in the presence of distractors, challenging the conventional pipeline of first learning LAM and only then decoding from latent to ground-truth actions.
最近,由Lient Action Policy(LAPO)率先推出的潜在行动学习在只观察数据上表现出了惊人的训练前效率,为利用网上大量视频进行体现AI提供了潜力。然而,先前的工作侧重于分散式数据,观测之间的变化主要通过地面真相行动来解释。不幸的是,真实世界的视频含有可能阻碍潜在行动学习的与行动有关的分散性。我们利用干扰控制套件,实证地调查了分散器对潜在行动学习的影响,并展示了LAPO在这种情景中的斗争。我们建议LAOM,这是一个简单的LAPO修改,按照线性研究的测量,将潜在行动的质量提高8x。重要的是,我们表明,在潜在行动学习过程中,对地面真相行动的监督,只有2.5 % , 平均而言,4.2x的下游表现会提高。我们的调查结果表明,在使用远程行动模型(LAM)培训时,综合监督对于分散式的培训对于转移器的存在至关重要,对最初学习LAM的常规管道提出了挑战,然后才从潜至地面的分解。
Article 45
Title@2025-06-12 (4): Multi-group Uncertainty Quantification for Long-form Text Generation
Title: Multi-group Uncertainty Quantification for Long-form Text Generation | Multi-Gruppen-Unsicherheits-Quantifizierung für langformige Textgenerierung | 长式文本生成的不确定性量化 2407.21057v2 |
Authors (2): Terrance Liu, Zhiwei Steven Wu
While past works have shown how uncertainty quantification can be applied to large language model (LLM) outputs, the question of whether resulting uncertainty guarantees still hold within sub-groupings of data remains open. In our work, given some long-form text generated by an LLM, we study uncertainty at both the level of individual claims contained within the output (via calibration) and across the entire output itself (via conformal prediction). Using biography generation as a testbed for this study, we derive a set of (demographic) attributes (e.g., whether some text describes a man or woman) for each generation to form such “subgroups” of data. We find that although canonical methods for both types of uncertainty quantification perform well when measuring across the entire dataset, such guarantees break down when examining particular subgroups. Having established this issue, we invoke group-conditional methods for uncertainty quantification – multicalibration and multivalid conformal prediction – and find that across a variety of approaches, additional subgroup information consistently improves calibration and conformal prediction within subgroups (while crucially retaining guarantees across the entire dataset). As the problems of calibration, conformal prediction, and their multi-group counterparts have not been extensively explored in the context of long-form text generation, we consider these results to form a benchmark for this setting.
过去的工作表明,不确定性的量化如何适用于大型语言模型(LLM)产出,由此产生的不确定性的保证是否仍然保留在数据分组内的问题仍然未解决。在我们的工作中,鉴于LLM产生的一些长式文本,我们研究了产出(校准)和整个产出本身(校准)中包含的个别索赔的不确定性水平(通过一致的预测),利用传记生成作为本研究的测试台,我们为每代人得出一套(人口)属性(例如,一些文字是否描述一个男人或女人),以形成这种“子群”的数据。我们发现,尽管这两种不确定性量化的可归纳方法在测量整个数据集时效果良好,但在检查特定分组时,这种保证会破裂。我们确定了这一问题,我们援引了不确定性量化的群点 – – 多重校准和多valid一致的预测 – – 并发现,在各种方法中,额外的分组信息始终在改进校准和一致的预测(同时关键地在整个数据集中保留着保证)。由于校准、符合这些版本的预测和多式结果的组问题不是我们所探讨的长式的对应。
Article 46
Title@2025-06-12 (4): Breaking Distortion-free Watermarks in Large Language Models
Title: Breaking Distortion-free Watermarks in Large Language Models | Breaking Distortion-free Watermarks in großen Sprachmodellen | 大语言模型中无扭曲的无水印 2502.18608v2 |
Authors (8): Shayleen Reynolds, Hengzhi He, Dung Daniel T. Ngo, Saheed Obitayo, Niccolò Dalmasso, Guang Cheng, Vamsi K. Potluru, Manuela Veloso
In recent years, LLM watermarking has emerged as an attractive safeguard against AI-generated content, with promising applications in many real-world domains. However, there are growing concerns that the current LLM watermarking schemes are vulnerable to expert adversaries wishing to reverse-engineer the watermarking mechanisms. Prior work in breaking or stealing LLM watermarks mainly focuses on the distribution-modifying algorithm of Kirchenbauer et al. (2023), which perturbs the logit vector before sampling. In this work, we focus on reverse-engineering the other prominent LLM watermarking scheme, distortion-free watermarking (Kuditipudi et al. 2024), which preserves the underlying token distribution by using a hidden watermarking key sequence. We demonstrate that, even under a more sophisticated watermarking scheme, it is possible to compromise the LLM and carry out a spoofing attack, i.e. generate a large number of (potentially harmful) texts that can be attributed to the original watermarked LLM. Specifically, we propose using adaptive prompting and a sorting-based algorithm to accurately recover the underlying secret key for watermarking the LLM. Our empirical findings on LLAMA-3.1-8B-Instruct, Mistral-7B-Instruct, Gemma-7b, and OPT-125M challenge the current theoretical claims on the robustness and usability of the distortion-free watermarking techniques.
近几年来,LLM 水的标记已成为防止AI产生的含量的有吸引力的保障,在许多现实世界域中应用了很有希望,然而,人们越来越担心目前的LLM水的标记办法对希望逆向设计水的标记机制的专家对手来说是脆弱的。以前破坏或偷LLM水的标记主要是Kirchenbauer等人(2023年)的分布调节算法,这种算法在取样之前会破坏对对对正对流矢量的干扰。在这项工作中,我们侧重于反向设计其他著名的LM水的标记办法,即无扭曲的水标记(Kuditipudi等人,2024年),这种办法通过使用隐藏的水标记关键序列来保持基本象征性的分布。我们证明,即使是在更复杂的水标记办法下,也有可能损害LMM和Spofing攻击,即产生大量(可能有害的)案文,可归因于原始的水标记LMM的理论。具体地说,我们建议使用适应性迅速和基于排序的算法,以精确地恢复我们关于当前主要关键LMMM的标记和MMM的精确的精确的精确的对目前进行水的磁标记。
Article 47
Title@2025-06-12 (4): Debiasing Watermarks for Large Language Models via Maximal Coupling
Title: Debiasing Watermarks for Large Language Models via Maximal Coupling | Debiasing Wasserzeichen für große Sprachmodelle über Maximal Coupling | 通过Maximal Coupling为大语言模型减少对水标记的偏差 2411.11203v2 |
Authors (5): Yangxinyu Xie, Xiang Li, Tanwi Mallick, Weijie J. Su, Ruixun Zhang
Watermarking language models is essential for distinguishing between human and machine-generated text and thus maintaining the integrity and trustworthiness of digital communication. We present a novel green/red list watermarking approach that partitions the token set into "green" and "red" lists, subtly increasing the generation probability for green tokens. To correct token distribution bias, our method employs maximal coupling, using a uniform coin flip to decide whether to apply bias correction, with the result embedded as a pseudorandom watermark signal. Theoretical analysis confirms this approach’s unbiased nature and robust detection capabilities. Experimental results show that it outperforms prior techniques by preserving text quality while maintaining high detectability, and it demonstrates resilience to targeted modifications aimed at improving text quality. This research provides a promising watermarking solution for language models, balancing effective detection with minimal impact on text quality.
水标识语言模型对于区分人类和机器生成的文本,从而保持数字通信的完整性和可信赖性至关重要。我们展示了一种新的绿色/红列表水标识方法,将标牌分割成“绿色”和“红”列表,小于增加绿色标牌的生成概率。为了纠正象征性分布偏差,我们的方法采用了最大程度的组合,使用统一硬币翻转来确定是否应用偏向纠正,结果嵌入伪随机水标识。理论分析证实了这一方法的公正性和稳健的检测能力。实验结果表明,它通过在保持高可探测性的同时维护文本质量,优于先前的技术,并展示了对旨在改善文本质量的有针对性的修改的复原力。这一研究为语言模型提供了一个有希望的标注水解决方案,平衡了有效检测,对文本质量的影响最小。
Article 48
Title@2025-06-12 (4): Sample Complexity and Representation Ability of Test-time Scaling Paradigms
Title: Sample Complexity and Representation Ability of Test-time Scaling Paradigms | Probenkomplexität und Darstellungsfähigkeit von Testzeit-Skalierparadigmen | 试验时间缩放模型的复杂性和代表性 2506.05295v2 |
Authors (8): Baihe Huang, Shanda Li, Tianhao Wu, Yiming Yang, Ameet Talwalkar, Kannan Ramchandran, Michael I. Jordan, Jiantao Jiao
Test-time scaling paradigms have significantly advanced the capabilities of large language models (LLMs) on complex tasks. Despite their empirical success, theoretical understanding of the sample efficiency of various test-time strategies – such as self-consistency, best-of-$n$, and self-correction – remains limited. In this work, we first establish a separation result between two repeated sampling strategies: self-consistency requires $\Theta(1/\Delta^2)$ samples to produce the correct answer, while best-of-$n$ only needs $\Theta(1/\Delta)$, where $\Delta < 1$ denotes the probability gap between the correct and second most likely answers. Next, we present an expressiveness result for the self-correction approach with verifier feedback: it enables Transformers to simulate online learning over a pool of experts at test time. Therefore, a single Transformer architecture can provably solve multiple tasks without prior knowledge of the specific task associated with a user query, extending the representation theory of Transformers from single-task to multi-task settings. Finally, we empirically validate our theoretical results, demonstrating the practical effectiveness of self-correction methods.
测试时间缩放范式极大地提高了大型语言模型(LLMS)在复杂任务方面的能力。尽管它们取得了经验上的成功,但对于各种测试时间战略(如自我一致性、最佳美元和自我校正等)的抽样效率的理论理解仍然有限。在这项工作中,我们首先将两种重复的抽样战略区分开来:自我一致性需要$\Tata(1/Delta)2美元样本来得出正确的答案,而最佳美元只需要$@theta(1/Delta)$($-n)$($-n)$($-delta < 1美元)就表示正确和第二最有可能的答案之间的概率差距。接下来,我们展示了自我校验者反馈的自我校正方法的清晰性结果:它使变换者能够在测试时间在一个专家库中模拟在线学习。因此,单一的变换器结构可以在不事先了解与用户查询有关的具体任务的情况下以可调和方式解决多项任务,将变换器的表述理论从单项到多任务设置之间的概率差差。最后,我们验证了实际的自我校正方法。
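As a rough illustration of the separation stated in the abstract above, the two rates can be evaluated at a single gap value. The choice $\Delta = 0.1$ and the labels $n_{\text{SC}}$ (self-consistency) and $n_{\text{BoN}}$ (best-of-$n$) are assumptions introduced here, and the figures ignore the constants hidden inside $\Theta(\cdot)$, so they indicate scaling only.

```latex
\[
\Delta = 0.1:\qquad
n_{\text{SC}} = \Theta\!\left(\frac{1}{\Delta^{2}}\right) \sim 100,
\qquad
n_{\text{BoN}} = \Theta\!\left(\frac{1}{\Delta}\right) \sim 10,
\qquad
\frac{n_{\text{SC}}}{n_{\text{BoN}}} = \Theta\!\left(\frac{1}{\Delta}\right).
\]
```

Read this way, the advantage of best-of-$n$ grows as the gap between the two most likely answers shrinks.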
Article 49
Title@2025-06-12 (4): OmniFluids: Unified Physics Pre-trained Modeling of Fluid Dynamics
Title: OmniFluids: Unified Physics Pre-trained Modeling of Fluid Dynamics | OmniFluids: Unified Physics Pre-trained Modeling of Fluid Dynamics | OmniFluids: 流体动力学统一物理预培训模拟模型 2506.10862v1 |
Authors (6): Rui Zhang, Qi Meng, Han Wan, Yang Liu, Zhi-Ming Ma, Hao Sun
High-fidelity and efficient simulation of fluid dynamics drives progress in various scientific and engineering applications. Traditional computational fluid dynamics methods offer strong interpretability and guaranteed convergence, but rely on fine spatial and temporal meshes, incurring prohibitive computational costs. Physics-informed neural networks (PINNs) and neural operators aim to accelerate PDE solvers using deep learning techniques. However, PINNs require extensive retraining and careful tuning, and purely data-driven operators demand large labeled datasets. Hybrid physics-aware methods embed numerical discretizations into network architectures or loss functions, but achieve marginal speed gains and become unstable when balancing coarse priors against high-fidelity measurements. To this end, we introduce OmniFluids, a unified physics pre-trained operator learning framework that integrates physics-only pre-training, coarse-grid operator distillation, and few-shot fine-tuning, which enables fast inference and accurate prediction under limited or zero data supervision. For architectural design, the key components of OmniFluids include a mixture of operators, a multi-frame decoder, and factorized Fourier layers, which enable efficient and scalable modeling of diverse physical tasks while maintaining seamless integration with physics-based supervision. Across a broad range of two- and three-dimensional benchmarks, OmniFluids significantly outperforms state-of-the-art AI-driven methods in flow field reconstruction and turbulence statistics accuracy, delivering 10-100x speedups compared to classical solvers, and accurately recovers unknown physical parameters from sparse, noisy data. This work establishes a new paradigm for efficient and generalizable surrogate modeling in complex fluid systems under limited data availability.
传统计算流体动力学方法提供强大的解释性和有保障的趋同,但依赖细微的时空模模,导致令人难以承受的计算成本。物理知情神经网络(PINNs)和神经操作员的目标是利用深层学习技术加速PDE解答器。然而,PINN需要广泛的再培训和仔细调整,而纯数据驱动操作员则需要大量贴标签的数据集。混合物理认知方法将数字分解参数嵌入网络结构或损失功能,但实现边际速度增益,在平衡粗缩的先前时间和高纤维度测量时变得不稳定。为此,我们引入了OmniFluids、统一的物理预训练操作员学习框架,将仅物理学前培训、粗电离电网操作员蒸馏和微微微调结合起来,从而可以在有限或零数据监督下快速推断和准确预测模型。对于建筑设计、OmniFlud的精确流流数据组合的关键部分包括操作员的混合体、多框架拆解的重建、以及四层集成的系统,这三维基化的实地操作的流流化和四层的系统可以保证在一般数据系统中进行高效的流流流流流中执行。
Article 50
Title@2025-06-12 (4): Energy-Efficient Deep Learning for Traffic Classification on Microcontrollers
Title: Energy-Efficient Deep Learning for Traffic Classification on Microcontrollers | Energieeffizientes Deep Learning für die Verkehrsklassifizierung an Mikrocontrollern | 关于微控制器交通分类的节能深层学习 2506.10851v1 |
Authors (4): Adel Chehade, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino
In this paper, we present a practical deep learning (DL) approach for energy-efficient traffic classification (TC) on resource-limited microcontrollers, which are widely used in IoT-based smart systems and communication networks. Our objective is to balance accuracy, computational efficiency, and real-world deployability. To that end, we develop a lightweight 1D-CNN, optimized via hardware-aware neural architecture search (HW-NAS), which achieves 96.59% accuracy on the ISCX VPN-NonVPN dataset with only 88.26K parameters, a 20.12K maximum tensor size, and 10.08M floating-point operations (FLOPs). Moreover, it generalizes across various TC tasks, with accuracies ranging from 94% to 99%. To enable deployment, the model is quantized to INT8, suffering only a marginal 1-2% accuracy drop relative to its Float32 counterpart. We evaluate real-world inference performance on two microcontrollers: the high-performance STM32F746G-DISCO and the cost-sensitive Nucleo-F401RE. The deployed model achieves inference latencies of 31.43 ms and 115.40 ms, with energy consumption of 7.86 mJ and 29.10 mJ per inference, respectively. These results demonstrate the feasibility of on-device encrypted traffic analysis, paving the way for scalable, low-power IoT security solutions.
在本文中,我们介绍了对资源有限的微控制器进行节能交通分类(TC)的实际深入学习(DL)方法,该方法在基于IoT的智能系统和通信网络中广泛使用。我们的目标是平衡准确性、计算效率和实际世界可部署性。为此,我们开发了一个轻量的1D-CNN,通过硬件智能神经结构搜索优化1D-CNN(HW-NAS),该方法在ISCX VPN-NONVPN数据集上实现了96.59%的准确性,该数据仅具有88.26K参数,20.12K最高拉高尺寸和10.08M浮动点操作(FLOPs)。此外,该方法在各种技术合作任务中广泛使用,从94%到99%不等。为了能够部署,该模型仅通过硬件智能神经结构搜索(HW-NAS), 其神经结构搜索中只有1至2%的精确度下降。我们评估了两个微量控制器的实时性能表现:高性STMF746G-DISCO和10M浮标定点操作操作(FLEL-CL-401M401),分别在31-F4401号部署的安全性安全性安全分析中实现了高效度的SBS-CLMLMF-40和成本性流流/40MLVLVLVA结果。
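The deployment figures quoted in the abstract above can be cross-checked with simple arithmetic. The sketch below is not from the paper: the function names are hypothetical, "88.26K" is assumed to mean 88,260 parameters, and the pairing of latency and energy values with boards follows the order in which the abstract lists them. It estimates weight storage for INT8 versus Float32 and the average power implied by energy divided by latency.

```python
# Figures copied from the abstract; derived values are rough arithmetic only.
PARAMS = 88_260  # "88.26K parameters", assuming K = 1000

# Board -> (latency in ms, energy per inference in mJ), paired in the order
# the abstract lists them ("respectively").
BOARDS = {
    "STM32F746G-DISCO": (31.43, 7.86),
    "Nucleo-F401RE": (115.40, 29.10),
}


def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Storage needed for the weights alone (no activations, no runtime)."""
    return num_params * bytes_per_param


def average_power_mw(latency_ms: float, energy_mj: float) -> float:
    """Mean power during one inference: mJ / ms = W, reported in mW."""
    return energy_mj / latency_ms * 1000.0


if __name__ == "__main__":
    print("INT8 weights:   ", weight_bytes(PARAMS, 1), "bytes")
    print("Float32 weights:", weight_bytes(PARAMS, 4), "bytes")
    for board, (latency_ms, energy_mj) in BOARDS.items():
        print(f"{board}: {average_power_mw(latency_ms, energy_mj):.1f} mW average")
```

Under these assumptions both boards average roughly 250 mW during inference, so the difference in energy per inference follows the difference in latency.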
Article 51
Title@2025-06-12 (4): GraphThought: Graph Combinatorial Optimization with Thought Generation
Title: GraphThought: Graph Combinatorial Optimization with Thought Generation | GraphThought: Graph Combinatorial Optimization mit Thought Generation | 图图图图图图: 与新一代人相比的图形组合式优化 2502.11607v2 |
Authors (9): Zixiao Huang, Lifeng Guo, Wenhao Li, Junjie Sheng, Chuyun Shen, Haosheng Chen, Bo Jin, Changhong Lu, Xiangfeng Wang
Graph combinatorial optimization (GCO) problems are central to domains like logistics and bioinformatics. While traditional solvers dominate, large language models (LLMs) offer new possibilities for structured reasoning, yet struggle with complex GCO tasks requiring rigorous combinatorial analysis and multi-step deduction, often producing hallucinated steps. We first formalize the Optimal Thoughts Design (OTD) problem, which provides structured guidance for producing high-quality intermediate reasoning steps. Building on this formulation, we introduce GraphThought, a novel framework that generates effective reasoning sequences through either heuristic-guided forward search or solver-aligned backward reasoning. By fine-tuning LLMs on these structured thought sequences, we develop Llama-GT, an 8B-parameter model that achieves state-of-the-art performance on the GraphArena benchmark, outperforming significantly larger models like DeepSeek-V3. Our results demonstrate that when scaffolded with structured reasoning priors, principled thought generation can significantly enhance LLM performance on GCO tasks without requiring increased model scale.
图形组合优化(GCO)问题是物流和生物信息学等领域的核心。虽然传统解决方案占主导地位,但大型语言模型(LLMs)为结构化推理提供了新的可能性,但与复杂的GCO任务挣扎,这些任务需要严格的组合分析和多步扣减,常常产生幻觉步骤。我们首先正式确定最佳思维设计(OTD)问题,它为产生高质量的中间推理步骤提供了结构化指导。基于这一提法,我们引入了GaphThought,这是一个通过超常引导前搜索或求解后向推理来产生有效推理序列的新框架。通过对这些结构化思维序列进行微调LLMs,我们开发了Llama-GT,8B参数模型,在GagArenna基准上达到最新表现,比DeepSeek-V3等大得多的模型表现要好得多。我们的结果显示,在以结构推理的推理推理推理推理推理推理推理推理法时,原则思想生成能够大大提高GCO任务的LM性能。
Article 52
Title@2025-06-12 (4): Accelerating Diffusion Large Language Models with SlowFast: The Three Golden Principles
Title: Accelerating Diffusion Large Language Models with SlowFast: The Three Golden Principles | Beschleunigen von Diffusion Großsprachige Modelle mit SlowFast: Die drei goldenen Prinzipien | 以慢速加速传播大语言模式:三大金原则 2506.10848v1 |
Authors (5): Qingyan Wei, Yaojie Zhang, Zhiyuan Liu, Dongrui Liu, Linfeng Zhang
Diffusion-based language models (dLLMs) have emerged as a promising alternative to traditional autoregressive LLMs by enabling parallel token generation and significantly reducing inference latency. However, existing sampling strategies for dLLMs, such as confidence-based or semi-autoregressive decoding, often suffer from static behavior, leading to suboptimal efficiency and limited flexibility. In this paper, we propose SlowFast Sampling, a novel dynamic sampling strategy that adaptively alternates between exploratory and accelerated decoding stages. Our method is guided by three golden principles: certainty principle, convergence principle, and positional principle, which govern when and where tokens can be confidently and efficiently decoded. We further integrate our strategy with dLLM-Cache to reduce redundant computation. Extensive experiments across benchmarks and models show that SlowFast Sampling achieves up to 15.63$\times$ speedup on LLaDA with minimal accuracy drop, and up to 34.22$\times$ when combined with caching. Notably, our approach outperforms strong autoregressive baselines like LLaMA3 8B in throughput, demonstrating that well-designed sampling can unlock the full potential of dLLMs for fast and high-quality generation.
以扩散为基础的语言模型(dLLMS)通过平行象征性生成和大量减少推导延迟度,成为传统自动递增的LLMS(dLLMS)的一个有希望的替代方法,它使平行象征性生成和显著减少推导延迟度,然而,现有的dLLMS取样战略,例如信心基或半自动递减解码,往往受到静态行为的影响,导致效率低下和灵活性有限。在本文中,我们建议慢速取样,这是一种新的动态抽样战略,在探索阶段和加速解码阶段之间互换。我们的方法以三项黄金原则为指导:确定性原则、趋同原则和定位原则,这些原则在何时和何地规范标志可以自信和高效地解码。我们进一步将我们的战略与dLLLMM-Case整合,以减少多余的计算。跨基准和模型的广泛实验表明,慢速采样在LLADADA中达到15.631美元,在精确下降后达到34.22\时间,加固。值得注意的是,我们的方法超越了像LLMAMA3和快速制模8B的快速制成的完整自动采样的潜力。
Article 53
Title@2025-06-12 (4): Advanced fraud detection using machine learning models: enhancing financial transaction security
Title: Advanced fraud detection using machine learning models: enhancing financial transaction security | Fortgeschrittene Betrugserkennung mit maschinellen Lernmodellen: Erhöhung der finanziellen Transaktionssicherheit | 利用机器学习模式发现高级欺诈:加强金融交易安全 2506.10842v1 |
Authors (10): Nudrat Fariha, Md Nazmuddin Moin Khan, Md Iqbal Hossain, Syed Ali Reza, Joy Chakra Bortty, Kazi Sharmin Sultana, Md Shadidur Islam Jawad, Saniah Safat, Md Abdul Ahad, Maksuda Begum
The rise of digital payments has accelerated the need for intelligent and scalable systems to detect fraud. This research presents an end-to-end, feature-rich machine learning framework for detecting credit card transaction anomalies and fraud using real-world data. The study begins by merging transactional, cardholder, merchant, and merchant category datasets from a relational database to create a unified analytical view. Through the feature engineering process, we extract behavioural signals such as average spending, deviation from historical patterns, transaction timing irregularities, and category frequency metrics. These features are enriched with temporal markers such as hour, day of week, and weekend indicators to expose all latent patterns that indicate fraudulent behaviours. Exploratory data analysis reveals contextual transaction trends across all the dataset features. Using the transactional data, we train and evaluate a range of unsupervised models: Isolation Forest, One Class SVM, and a deep autoencoder trained to reconstruct normal behavior. These models flag the top 1% of reconstruction errors as outliers. PCA visualizations illustrate each model's ability to separate anomalies in a two-dimensional latent space. We further segment the transaction landscape using K-Means clustering and DBSCAN to identify dense clusters of normal activity and isolate sparse, suspicious regions.
数字付款的上升加快了对智能和可扩缩系统的需求,以发现欺诈行为。这一研究为发现信用卡交易异常和欺诈,提供了一个端到端的、内容丰富的机器学习框架,以利用真实世界的数据来发现信用卡交易异常和欺诈。研究首先将交易、持卡人、商人和商类数据集从一个关系数据库中合并起来,以形成统一的分析观点。我们通过特征工程过程,提取了平均支出、偏离历史模式、交易时间异常和类别频率尺度等行为信号。这些特征增加了时间标记,如小时、每周一天和周末指标,以暴露显示显示欺诈行为的所有潜在模式。探索性数据分析揭示了所有数据集特征的背景交易趋势。我们利用交易数据,培训和评价一系列不受监督的模式:隔离森林、一类SVM和受过重建正常行为的训练的深层自动编码器。这些模型将重建错误的前1%标为外围值。五氯苯的可视化说明每一种模型都有能力将异常现象分解成一个二维的隐蔽空间。我们使用K-MEAs的正常集群和DBS-CAN活动区域。
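Two steps named in the abstract above, the temporal markers and the rule that flags the top 1% of anomaly scores, can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names, the feature keys, and the tie-handling rule are assumptions made here and are not taken from the paper.

```python
from datetime import datetime


def temporal_features(ts: datetime) -> dict:
    """Hour, day-of-week (Monday = 0), and weekend indicator for a timestamp."""
    day = ts.weekday()
    return {"hour": ts.hour, "day_of_week": day, "is_weekend": day >= 5}


def flag_top_fraction(scores: list[float], fraction: float = 0.01) -> list[bool]:
    """Mark the largest `fraction` of scores as outliers (at least one).

    Scores tied with the cutoff are also flagged, so slightly more than
    `fraction` of the items can be marked when ties occur.
    """
    if not scores:
        return []
    k = max(1, round(len(scores) * fraction))
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [s >= cutoff for s in scores]


if __name__ == "__main__":
    print(temporal_features(datetime(2025, 6, 14, 23, 5)))
    errors = [float(i) for i in range(200)]  # stand-in reconstruction errors
    flags = flag_top_fraction(errors)
    print(sum(flags), "of", len(errors), "transactions flagged")
```

A fixed-fraction rule always flags something, even on clean data, so in practice the 1% cutoff acts as a review budget rather than an estimate of the fraud rate.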
Article 54
Title@2025-06-12 (4): Simplicity bias and optimization threshold in two-layer ReLU networks
Title: Simplicity bias and optimization threshold in two-layer ReLU networks | Einfachheits-Bias und Optimierungsschwelle in zweischichtigen ReLU-Netzwerken | 双层ReLU网络中的简化偏差和优化阈值 2410.02348v2 |
Authors (2): Etienne Boursier, Nicolas Flammarion
Understanding generalization of overparametrized neural networks remains a fundamental challenge in machine learning. Most of the literature studies generalization from an interpolation point of view, taking convergence of parameters towards a global minimum of the training loss for granted. While overparametrized architectures indeed interpolate the data for typical classification tasks, this interpolation paradigm no longer seems valid for more complex tasks such as in-context learning or diffusion. Instead, for such tasks, it has been empirically observed that the trained models go from global minima to spurious local minima of the training loss as the number of training samples becomes larger than some level we call the optimization threshold. While the former yields a poor generalization to the true population loss, the latter was observed to actually correspond to the minimiser of this true loss. This paper explores this phenomenon theoretically in the context of two-layer ReLU networks. We demonstrate that, despite overparametrization, networks often converge toward simpler solutions rather than interpolating the training data, which can lead to a drastic improvement on the test loss with respect to interpolating solutions. Our analysis relies on the so-called early alignment phase, during which neurons align towards specific directions. This directional alignment, which occurs in the early stage of training, leads to a simplicity bias, wherein the network approximates the ground truth model without converging to the global minimum of the training loss. Our results suggest that this bias, resulting in an optimization threshold from which interpolation is no longer reached, is beneficial and enhances the generalization of trained models.
对超均分的神经神经网络的普遍理解仍然是机器学习中的一项根本挑战。大多数文献大多研究从内推角度的概括性,将参数的趋同性从全球范围的最低培训损失归结到一个全球最低水平。虽然过分对称性结构实际上将典型分类任务的数据相互交织,但这种内推式模式似乎不再适用于更复杂的任务,如文中学习或传播。对于这些任务,从经验上看,经过培训的模型往往从全球最低水平到虚假的当地培训损失最低水平,因为培训样本的数量比我们所谓的优化临界水平要大得多。虽然前者的参数趋同性参数对真正的人口损失造成较低的普遍程度,但人们认为后者实际上与真正损失的最小程度相对应。本文从理论上讲,在两个层次的RELU网络中探索了这种现象。我们证明,尽管这种偏差性,网络往往趋向于更简单的模式,而不是对培训数据进行内推,这可能导致测试损失与内置解决方案的急剧改善。我们的分析依靠的是,在如此早期的神经调整阶段,在进行这种特定的调整过程中,在进行这种调整的早期的阶段,在进行这种特定的调整,在进行中,在进行这种调整过程中,在进行某种特定的调整。
Article 55
Title@2025-06-12 (4): Efficiency Robustness of Dynamic Deep Learning Systems
Title: Efficiency Robustness of Dynamic Deep Learning Systems | Effizienz Robustheit dynamischer Deep Learning Systeme | 动态深深学习系统的效率强强 2506.10831v1 |
Authors (7): Ravishka Rathnasuriya, Tingxi Li, Zexin Xu, Zihe Song, Mirazul Haque, Simin Chen, Wei Yang
Deep Learning Systems (DLSs) are increasingly deployed in real-time applications, including those in resource-constrained environments such as mobile and IoT devices. To address efficiency challenges, Dynamic Deep Learning Systems (DDLSs) adapt inference computation based on input complexity, reducing overhead. While this dynamic behavior improves efficiency, such behavior introduces new attack surfaces. In particular, efficiency adversarial attacks exploit these dynamic mechanisms to degrade system performance. This paper systematically explores efficiency robustness of DDLSs, presenting the first comprehensive taxonomy of efficiency attacks. We categorize these attacks based on three dynamic behaviors: (i) attacks on dynamic computations per inference, (ii) attacks on dynamic inference iterations, and (iii) attacks on dynamic output production for downstream tasks. Through an in-depth evaluation, we analyze adversarial strategies that target DDLS efficiency and identify key challenges in securing these systems. In addition, we investigate existing defense mechanisms, demonstrating their limitations against increasingly popular efficiency attacks and the necessity for novel mitigation strategies to secure future adaptive DDLSs.
为了应对效率挑战,动态深层学习系统(DLS)根据投入复杂度调整推论计算,降低管理费用。虽然这种动态行为提高了效率,但这种行为引入了新的攻击表面。特别是,效率对抗性攻击利用了这些动态机制来降低系统性能。本文系统地探索DLS的效率强度,首次对效率攻击进行全面分类。我们根据三种动态行为对这些攻击进行了分类:(一) 攻击动态计算,根据动态推断,(二) 攻击动态推断迭代,(三) 攻击下游任务动态产出生产,通过深入评估,我们分析针对DLS效率的对抗性战略,并找出保障这些系统的关键性挑战。此外,我们调查现有的防御机制,表明它们对日益流行的效率攻击的局限性,以及有必要制定新的缓解战略,以确保今后的DLS适应性攻击。
Article 56
Title@2025-06-12 (4): On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves
Title: On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves | Auf der Geometrie der Receiver-Betriebscharakteristik und Precision-Recall-Kurven | 收件者操作特征和精确召回曲线几何 2504.02169v2 |
Authors (1): Reza Sameni
We study the geometry of Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves in binary classification problems. The key finding is that many of the most commonly used binary classification metrics are merely functions of the composition function $G := F_p \circ F_n^{-1}$, where $F_p(\cdot)$ and $F_n(\cdot)$ are the class-conditional cumulative distribution functions of the classifier scores in the positive and negative classes, respectively. This geometric perspective facilitates the selection of operating points, understanding the effect of decision thresholds, and comparison between classifiers. It also helps explain how the shapes and geometry of ROC/PR curves reflect classifier behavior, providing objective tools for building classifiers optimized for specific applications with context-specific constraints. We further explore the conditions for classifier dominance, present analytical and numerical examples demonstrating the effects of class separability and variance on ROC and PR geometries, and derive a link between the positive-to-negative class leakage function $G(\cdot)$ and the Kullback–Leibler divergence. The framework highlights practical considerations, such as model calibration, cost-sensitive optimization, and operating point selection under real-world capacity constraints, enabling more informed approaches to classifier deployment and decision-making.
在二进制分类问题中,我们研究了接收器操作特征曲线(ROC)和精度-回调(PR)曲线的几何性。关键结论是,许多最常用的二进制分类指标只是构成函数的函数$G:=F_p\circ F_n-1}$,其中,F_p(cdot)$和$F_n(cdot)$分别是正类和负类中分类分数的等级累积性分布功能。这一几何观点有利于选择操作点,理解决定阈值的影响和分类者之间的比较。它也有助于解释ROC/PR曲线的形状和几何形状和几何分反映分类行为,为在特定环境的限制下优化特定应用程序优化的分类提供客观工具。我们进一步探索了分类支配地位的条件,提出分析性和数字实例,表明分类对正到负级的分类差分数和差异对ROC和PR的分布式分布性功能。这种几何角度视角有助于选择操作点渗漏功能,理解决定阈值-实际选择成本比重。
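The composition function $G := F_p \circ F_n^{-1}$ from the abstract above can be made concrete with a small numerical sketch. Everything beyond the definition of $G$ is an assumption made here for illustration: scores are taken to be Gaussian in each class, scores above the threshold are labeled positive, and the helper names are hypothetical. Under that convention, a threshold $t$ gives $\mathrm{FPR} = 1 - F_n(t)$ and $\mathrm{TPR} = 1 - F_p(t)$, so the ROC curve is $\mathrm{TPR} = 1 - G(1 - \mathrm{FPR})$.

```python
from statistics import NormalDist


def make_G(neg: NormalDist, pos: NormalDist):
    """Return G = F_p o F_n^{-1} for the given class-conditional score laws."""
    def G(u: float) -> float:
        return pos.cdf(neg.inv_cdf(u))
    return G


def roc_tpr(G, fpr: float) -> float:
    """TPR at a given FPR, assuming scores above the threshold are positive."""
    if fpr <= 0.0:  # inv_cdf is undefined at the endpoints, handle them directly
        return 0.0
    if fpr >= 1.0:
        return 1.0
    return 1.0 - G(1.0 - fpr)


def auc(G, steps: int = 2000) -> float:
    """Area under the ROC curve by the trapezoidal rule on a uniform FPR grid."""
    total = 0.0
    prev = roc_tpr(G, 0.0)
    for i in range(1, steps + 1):
        cur = roc_tpr(G, i / steps)
        total += 0.5 * (prev + cur) / steps
        prev = cur
    return total


if __name__ == "__main__":
    # Illustrative choice: unit-variance scores, positive class shifted by 1.
    G = make_G(NormalDist(0.0, 1.0), NormalDist(1.0, 1.0))
    for fpr in (0.05, 0.1, 0.3):
        print(f"FPR={fpr:.2f} -> TPR={roc_tpr(G, fpr):.3f}")
    print(f"AUC ~ {auc(G):.3f}")
```

When the two score distributions coincide, $G$ is the identity and the curve collapses to the diagonal, which gives a quick sanity check on any implementation of this construction.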
Article 57
Title@2025-06-12 (4): Learning richness modulates equality reasoning in neural networks
Title: Learning richness modulates equality reasoning in neural networks | Lernreichtum moduliert Gleichstellungsabwägungen in neuronalen Netzwerken | 学习丰富 适应神经网络的平等推理 2503.09781v2 |
Authors (2): William L. Tong, Cengiz Pehlevan
Equality reasoning is ubiquitous and purely abstract: sameness or difference may be evaluated no matter the nature of the underlying objects. As a result, same-different (SD) tasks have been extensively studied as a starting point for understanding abstract reasoning in humans and across animal species. With the rise of neural networks that exhibit striking apparent proficiency for abstractions, equality reasoning in these models has also gained interest. Yet despite extensive study, conclusions about equality reasoning vary widely and with little consensus. To clarify the underlying principles in learning SD tasks, we develop a theory of equality reasoning in multi-layer perceptrons (MLP). Following observations in comparative psychology, we propose a spectrum of behavior that ranges from conceptual to perceptual outcomes. Conceptual behavior is characterized by task-specific representations, efficient learning, and insensitivity to spurious perceptual details. Perceptual behavior is characterized by strong sensitivity to spurious perceptual details, accompanied by the need for exhaustive training to learn the task. We develop a mathematical theory to show that an MLP’s behavior is driven by learning richness. Rich-regime MLPs exhibit conceptual behavior, whereas lazy-regime MLPs exhibit perceptual behavior. We validate our theoretical findings in vision SD experiments, showing that rich feature learning promotes success by encouraging hallmarks of conceptual behavior. Overall, our work identifies feature learning richness as a key parameter modulating equality reasoning, and suggests that equality reasoning in humans and animals may similarly depend on learning richness in neural circuits.
相等性推理无处不在,而且是纯粹抽象的:无论底层对象的性质如何,都可以判断相同或不同。因此,相同-不同(SD)任务作为理解人类和各种动物抽象推理的起点得到了广泛研究。随着在抽象方面表现出惊人能力的神经网络的兴起,这些模型中的相等性推理也引起了关注。然而,尽管研究广泛,关于相等性推理的结论差异很大,几乎没有共识。为了阐明学习SD任务的基本原理,我们发展了多层感知机(MLP)中的相等性推理理论。依据比较心理学的观察,我们提出了一个从概念性到感知性的行为谱。概念性行为的特点是任务特定的表征、高效的学习以及对虚假感知细节不敏感。感知性行为的特点是对虚假感知细节高度敏感,并且需要穷尽式的训练才能学会任务。我们建立了一个数学理论,表明MLP的行为由学习丰富度驱动。富(rich)状态的MLP表现出概念性行为,而惰性(lazy)状态的MLP表现出感知性行为。我们在视觉SD实验中验证了理论结果,表明丰富的特征学习通过促进概念性行为的特征来推动成功。总体而言,我们的工作确定特征学习丰富度是调节相等性推理的关键参数,并表明人类和动物的相等性推理可能同样取决于神经回路中的学习丰富度。
Article 58
Title@2025-06-12 (4): Empirical and computer-aided robustness analysis of long-step and accelerated methods in smooth convex optimization
Title: Empirical and computer-aided robustness analysis of long-step and accelerated methods in smooth convex optimization | Empirische und computergestützte Robustheitsanalyse von langen und beschleunigten Methoden in reibungsloser konvexer Optimierung | 经验分析和计算机辅助的稳健性分析,分析如何以长步骤和加速方法顺利优化顺流优化 2506.09730v2 |
Authors (2): Pierre Vernimmen, François Glineur
This work assesses both empirically and theoretically, using the performance estimation methodology, how robust different first-order optimization methods are when subject to relative inexactness in their gradient computations. Relative inexactness occurs, for example, when compressing the gradient using fewer bits of information, which happens when dealing with large-scale problems on GPUs. Three major families of methods are analyzed: constant step gradient descent, long-step methods, and accelerated methods. The latter two are first shown to be theoretically not robust to inexactness. Then, a semi-heuristic shortening factor is introduced to improve their theoretical guarantees. All methods are subsequently tested on a concrete inexact problem, with two different types of relative inexactness, and it is observed that both accelerated methods are much more robust than expected, and that the shortening factor significantly helps the long-step methods. In the end, all shortened methods appear to be promising, even in this inexact setting.
这项工作利用业绩估计方法,从经验上和理论上评估了在梯度计算中相对不精确时,不同一级优化方法的稳健程度如何。例如,在使用较少的信息块压缩梯度时,就会出现相对不准确的情况,这在处理GPU的大规模问题时发生。对方法的三大类进行了分析:不变的梯度梯度下降、长步骤方法和加速方法。后两类方法首先显示在理论上不精确。然后,采用了半超速缩短系数来改进其理论保障。所有方法随后都经过一个具体的不精确问题的测试,有两种不同的相对不准确性,据观察,两种加速方法都比预期的强得多,缩短系数对长步骤方法大有帮助。最后,所有缩短的方法似乎在理论上都不可靠,即使在这种精确的环境下也是有希望的。
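As a rough illustration of the relative inexactness described in the abstract above, the sketch below runs constant-step gradient descent on a small quadratic while perturbing every gradient by an error whose norm is a fixed fraction eps of the true gradient norm. The test function, the step size 1/L, and the helper names inexact_gradient and gradient_descent are assumptions made for illustration; none of them is taken from the paper.

```python
import numpy as np

def inexact_gradient(grad, eps, rng):
    """Return g_tilde with ||g_tilde - grad|| = eps * ||grad|| (relative inexactness)."""
    noise = rng.standard_normal(grad.shape)
    noise_norm = np.linalg.norm(noise)
    if noise_norm == 0.0:
        return grad.copy()
    return grad + eps * np.linalg.norm(grad) * noise / noise_norm

def gradient_descent(A, x0, step, eps, iters, seed=0):
    """Constant-step gradient descent on f(x) = 0.5 * x^T A x with inexact gradients."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(iters):
        exact = A @ x                      # true gradient of the quadratic
        x = x - step * inexact_gradient(exact, eps, rng)
    return x

def f(x):
    return 0.5 * x @ A @ x

# Illustrative problem (assumption): smoothness constant L = 10, strong convexity 1.
A = np.diag([1.0, 10.0])
L = 10.0
x0 = np.array([1.0, 1.0])

x_exact = gradient_descent(A, x0, step=1.0 / L, eps=0.0, iters=200)
x_inexact = gradient_descent(A, x0, step=1.0 / L, eps=0.1, iters=200)
print("f with exact gradients:  ", f(x_exact))
print("f with 10% relative error:", f(x_inexact))
```

With eps = 0.1 and step 1/L the standard descent lemma still guarantees a decrease at every iteration; for larger eps that guarantee disappears at this step size, which is the kind of situation a shortened step is meant to address.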
Article 59
Title@2025-06-12 (4): General targeted machine learning for modern causal mediation analysis
Title: General targeted machine learning for modern causal mediation analysis | Allgemeines gezieltes maschinelles Lernen für moderne kausale Mediationsanalyse | 现代因果调解分析通用定向机器学习 2408.14620v2 |
Authors (4): Richard Liu, Nicholas T. Williams, Kara E. Rudolph, Iván Díaz
Causal mediation analyses investigate the mechanisms through which causes exert their effects, and are therefore central to scientific progress. The literature on the non-parametric definition and identification of mediational effects in rigorous causal models has grown significantly in recent years, and there has been important progress to address challenges in the interpretation and identification of such effects. Despite great progress on the causal inference front, statistical methodology for non-parametric estimation has lagged behind, with few or no methods available for tackling non-parametric estimation in the presence of multiple, continuous, or high-dimensional mediators. In this paper we show that the identification formulas for six popular non-parametric approaches to mediation analysis proposed in recent years can be recovered from just two statistical estimands. We leverage this finding to propose an all-purpose one-step estimation algorithm that can be coupled with machine learning in any mediation study that uses any of these six definitions of mediation. The estimators have desirable properties, such as $\sqrt{n}$-convergence and asymptotic normality. Estimating the first-order correction for the one-step estimator requires estimation of complex density ratios on the potentially high-dimensional mediators, a challenge that is solved using recent advancements in so-called Riesz learning. We illustrate the properties of our methods in a simulation study and illustrate their use on real data to estimate the extent to which pain management practices mediate the total effect of having a chronic pain disorder on opioid use disorder.
因果中介分析研究原因产生其效应的机制,因此是科学进步的核心。近年来,关于在严格因果模型中对中介效应进行非参数定义和识别的文献显著增长,在解释和识别这类效应方面也取得了重要进展。尽管因果推断方面进展巨大,非参数估计的统计方法却相对滞后,在存在多个、连续或高维中介变量的情况下,几乎没有可用于非参数估计的方法。在本文中,我们表明,近年来提出的六种常用非参数中介分析方法的识别公式可以仅由两个统计估计目标(estimand)恢复。我们利用这一发现提出一种通用的一步估计算法,它可以与机器学习相结合,用于任何采用这六种中介定义之一的中介研究。这些估计量具有良好的性质,例如$\sqrt{n}$收敛性和渐近正态性。估计一步估计量的一阶修正需要在可能高维的中介变量上估计复杂的密度比,这一难题利用所谓Riesz学习的最新进展得以解决。我们在模拟研究中展示了方法的性质,并在真实数据上展示其用途,估计疼痛管理实践在多大程度上中介了慢性疼痛障碍对阿片类药物使用障碍的总效应。
Article 60
Title@2025-06-12 (4): Detecting High-Stakes Interactions with Activation Probes
Title: Detecting High-Stakes Interactions with Activation Probes | Ermitteln von High-Stakes-Interaktionen mit Aktivierungssonden | 与活性检测检测检测检测高吸量相互作用 2506.10805v1 |
Authors (7): Alex McKenzie, Urja Pawar, Phil Blandfort, William Bankes, David Krueger, Ekdeep Singh Lubana, Dmitrii Krasheninnikov
Monitoring is an important aspect of safely deploying Large Language Models (LLMs). This paper examines activation probes for detecting “high-stakes” interactions – where the text indicates that the interaction might lead to significant harm – as a critical, yet underexplored, target for such monitoring. We evaluate several probe architectures trained on synthetic data, and find them to exhibit robust generalization to diverse, out-of-distribution, real-world data. Probes’ performance is comparable to that of prompted or finetuned medium-sized LLM monitors, while offering computational savings of six orders-of-magnitude. Our experiments also highlight the potential of building resource-aware hierarchical monitoring systems, where probes serve as an efficient initial filter and flag cases for more expensive downstream analysis. We release our novel synthetic dataset and codebase to encourage further study.
本文审查了用于探测“高接触”相互作用的激活探测器 – – 文本指出这种相互作用可能导致重大伤害 – – 作为这种监测的关键但探索不足的目标。我们评估了几个经过合成数据培训的探测结构,发现这些结构能够展示出对多样化的、分布外的、真实世界数据进行有力概括的特征。探头的性能与激励或微调的中等规模LLM监测器的性能相当,同时提供了六个微量级的计算节余。我们的实验还突显了建立有资源意识的等级监测系统的潜力,在这种系统中,探测器作为高效的初始过滤器和标志案例,用于更昂贵的下游分析。我们发行了我们新的合成数据集和代码库,以鼓励进一步研究。
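To make the idea of an activation probe concrete, here is a minimal sketch of a linear (logistic-regression) probe trained on activation vectors. Everything in it is an assumption for illustration: the activations are synthetic Gaussian vectors standing in for pooled hidden states, and the names train_linear_probe and probe_scores are hypothetical. The paper evaluates several probe architectures that are not reproduced here.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, epochs=500):
    """Fit logistic-regression weights on activation vectors X with binary labels y."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of "high-stakes"
        w -= lr * X.T @ (p - y) / n             # gradient step on the mean log-loss
        b -= lr * np.mean(p - y)
    return w, b

def probe_scores(X, w, b):
    """Probability that each activation vector comes from a high-stakes interaction."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Synthetic stand-in for pooled hidden activations (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
d = 16
direction = rng.standard_normal(d)
direction /= np.linalg.norm(direction)
y = rng.integers(0, 2, size=400).astype(float)
X = rng.standard_normal((400, d)) + 3.0 * np.outer(2.0 * y - 1.0, direction)

w, b = train_linear_probe(X[:300], y[:300])
accuracy = np.mean((probe_scores(X[300:], w, b) > 0.5) == (y[300:] == 1.0))
print("held-out accuracy:", accuracy)
```

A probe of this size costs one dot product per example at inference time, which is the source of the computational savings over running a separate LLM monitor.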
Article 61
Title@2025-06-12 (4): Near-Optimal Algorithms for Constrained k-Center Clustering with Instance-level Background Knowledge
Title: Near-Optimal Algorithms for Constrained k-Center Clustering with Instance-level Background Knowledge | Optimale Algorithmen für eingeschränktes k-Center-Clustering mit Instance-Level-Hintergrundwissen | 受约束的 k- Center 集成实情背景知识的近最佳 K- Center 算法 2401.12533v4 |
Authors (5): Longkun Guo, Chaoqi Jia, Kewen Liao, Zhigang Lu, Minhui Xue
Center-based clustering has attracted significant research interest from both theory and practice. In many practical applications, input data often contain background knowledge that can be used to improve clustering results. In this work, we build on widely adopted $k$-center clustering and model its input background knowledge as must-link (ML) and cannot-link (CL) constraint sets. However, most clustering problems including $k$-center are inherently $\mathcal{NP}$-hard, while the more complex constrained variants are known to suffer more severe approximation and computation barriers that significantly limit their applicability. By employing a suite of techniques including reverse dominating sets, linear programming (LP) integral polyhedron, and LP duality, we arrive at the first efficient approximation algorithm for constrained $k$-center with the best possible ratio of 2. We also construct competitive baseline algorithms and empirically evaluate our approximation algorithm against them on a variety of real datasets. The results validate our theoretical findings and demonstrate the great advantages of our algorithm in terms of clustering cost, clustering quality, and running time.
以中心为主的集群在理论和实践两方面都引起了重要的研究兴趣。在许多实际应用中,输入数据往往包含可用于改进集群结果的背景知识。在这项工作中,我们以广泛采用的以美元为核心的集群为基础,并以其输入背景知识作为必须链接(ML)和不能链接(CL)的制约组合模型进行建模。然而,包括美元中枢在内的大多数集群问题本质上是美元硬的,而已知较复杂的受限制变量会遭受严重近似和计算障碍,从而严重限制其适用性。通过使用一系列技术,包括反向主导数据集、线性编程(LP)整体多元元和LP的双重性,我们达到了限制美元中枢的第一个有效近似算法,其最佳比率为2。 我们还建立了竞争性基线算法,并用经验评估我们根据各种真实数据集进行的近似算法。结果证实了我们的理论结论,并展示了我们的算法在组合成本、组合质量和运行时间方面的巨大优势。
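For readers unfamiliar with the ratio of 2 mentioned above, the sketch below shows the classic farthest-first traversal (Gonzalez's greedy algorithm), which gives a 2-approximation for the unconstrained $k$-center problem. It is not the paper's algorithm and it ignores must-link and cannot-link constraints entirely; the function name greedy_k_center and the toy points are assumptions for illustration.

```python
import numpy as np

def greedy_k_center(points, k, first=0):
    """Farthest-first traversal: pick k centers among the points, return (centers, radius)."""
    centers = [first]
    # dist[i] = distance from point i to its nearest chosen center so far
    dist = np.linalg.norm(points - points[first], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(dist))          # the point farthest from all current centers
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return centers, float(dist.max())

# Toy data (hypothetical): three well-separated pairs of points.
points = np.array([[0.0, 0.0], [0.0, 1.0],
                   [10.0, 0.0], [10.0, 1.0],
                   [20.0, 0.0], [20.0, 1.0]])
centers, radius = greedy_k_center(points, k=3)
print("centers:", centers, "radius:", radius)
```

The constrained setting is harder precisely because a greedy choice like this can violate a cannot-link pair or split a must-link group, which is why the paper turns to LP-based techniques.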
Article 62
Title@2025-06-12 (4): Dense Associative Memory with Epanechnikov Energy
Title: Dense Associative Memory with Epanechnikov Energy | Dichtes assoziatives Gedächtnis mit Epanechnikov Energie | Epanechnikov能源的密集联合记忆 2506.10801v1 |
Authors (5): Benjamin Hoover, Zhaoyang Shi, Krishnakumar Balasubramanian, Dmitry Krotov, Parikshit Ram
We propose a novel energy function for Dense Associative Memory (DenseAM) networks, the log-sum-ReLU (LSR), inspired by optimal kernel density estimation. Unlike the common log-sum-exponential (LSE) function, LSR is based on the Epanechnikov kernel and enables exact memory retrieval with exponential capacity without requiring exponential separation functions. Moreover, it introduces abundant additional emergent local minima while preserving perfect pattern recovery – a characteristic previously unseen in DenseAM literature. Empirical results show that LSR energy has significantly more local minima (memories) that have comparable log-likelihood to LSE-based models. Analysis of LSR’s emergent memories on image datasets reveals a degree of creativity and novelty, hinting at this method’s potential for both large-scale memory storage and generative tasks.
我们提议为 “ 共振内存 “ 网络,即由最佳内核密度估计所启发的日志总和(LSR)网络提供一个新的能源功能。与通用日志总和(LSE)功能不同,LSR以Epanechnikov内核为基础,能够以指数能力进行精确的内存检索,而不需要指数分离功能。此外,它引入了大量更多的额外\emph{emergent}当地微型资源,同时保持了完美的模式恢复 – – 这是DenseAM文献中以前所见的特征。经验性结果显示,LSR能量具有比LSE模型类似本地微型(模量)的显著增加。对图像数据集的原始记忆的分析揭示了一定程度的创造性和新颖性,暗示了这一方法对于大型内存存储和基因化任务的潜力。
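The contrast between the two energies can be sketched numerically. The functional forms below are assumptions chosen to match the abstract's description: LSE as the negative log of a sum of Gaussian kernels, and LSR as the negative log of a sum of Epanechnikov (ReLU-truncated quadratic) kernels, with a small constant eps added only to keep the logarithm finite. The exact definitions and constants used in the paper may differ.

```python
import numpy as np

def lse_energy(x, memories, beta):
    """Assumed LSE form: -(1/beta) * log sum_mu exp(-beta/2 * ||x - xi_mu||^2)."""
    sq = np.sum((memories - x) ** 2, axis=1)
    return -np.log(np.sum(np.exp(-0.5 * beta * sq))) / beta

def lsr_energy(x, memories, beta, eps=1e-9):
    """Assumed LSR form: -(1/beta) * log(eps + sum_mu ReLU(1 - beta/2 * ||x - xi_mu||^2))."""
    sq = np.sum((memories - x) ** 2, axis=1)
    return -np.log(eps + np.sum(np.maximum(0.0, 1.0 - 0.5 * beta * sq))) / beta

def central_slope(energy, h=1e-4):
    """Finite-difference slope of a 1-D energy at the stored pattern x = 0."""
    return (energy(np.array([h])) - energy(np.array([-h]))) / (2.0 * h)

# Two stored 1-D patterns (hypothetical), far enough apart that their
# Epanechnikov supports (radius sqrt(2 / beta) = 1) do not overlap.
memories = np.array([[0.0], [3.0]])
beta = 2.0

lse_slope = central_slope(lambda x: lse_energy(x, memories, beta))
lsr_slope = central_slope(lambda x: lsr_energy(x, memories, beta))
print("LSE slope at stored pattern:", lse_slope)
print("LSR slope at stored pattern:", lsr_slope)
```

Under these assumed forms the LSR slope at the stored pattern vanishes because the other memory contributes nothing inside the kernel's support, while the LSE slope is small but nonzero because the Gaussian tails of the other memory pull the minimum slightly away from the pattern.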
Article 63
Title@2025-06-12 (4): Divide-Fuse-Conquer: Eliciting “Aha Moments” in Multi-Scenario Games
Title: Divide-Fuse-Conquer: Eliciting “Aha Moments” in Multi-Scenario Games | Divide-Fuse-Conquer: Eliciting “Aha Momente” in Multi-Szenario-Spiele | 分裂-裂变:在多种场景运动会中激发“哈动力” 2505.16401v4 |
Authors (8): Xiaoqing Zhang, Huabin Zheng, Ang Lv, Yuhan Liu, Zirui Song, Xiuying Chen, Rui Yan, Flood Sung
Large language models (LLMs) have been observed to suddenly exhibit advanced reasoning abilities during reinforcement learning (RL), resembling an “aha moment” triggered by simple outcome-based rewards. While RL has proven effective in eliciting such breakthroughs in tasks involving mathematics, coding, and vision, it faces significant challenges in multi-scenario games. The diversity of game rules, interaction modes, and environmental complexities often leads to policies that perform well in one scenario but fail to generalize to others. Simply combining multiple scenarios during training introduces additional challenges, such as training instability and poor performance. To overcome these challenges, we propose Divide-Fuse-Conquer, a framework designed to enhance generalization in multi-scenario RL. This approach starts by heuristically grouping games based on characteristics such as rules and difficulties. Specialized models are then trained for each group to excel at the games in that group; we refer to this as the divide step. Next, we fuse model parameters from different groups as a new model, and continue training it for multiple groups, until the scenarios in all groups are conquered. Experiments across 18 TextArena games show that Qwen2.5-32B-Align trained with the Divide-Fuse-Conquer strategy reaches a performance level comparable to Claude3.5, achieving 7 wins and 4 draws. We hope our approach can inspire future research on using reinforcement learning to improve the generalization of LLMs.
大型语言模型(LLMS)被观察到在强化学习(RL)期间突然展示了先进的推理能力,类似于简单的基于结果的奖励触发了“aha moment ” 的“aha moment ” 。虽然RL已证明在数学、编码和愿景等任务中取得突破是有效的,但它在多角度的游戏中面临重大挑战。游戏规则、互动模式和环境复杂性的多样性往往导致在一种情景中表现良好,但未能推广到其他情景。仅仅在培训期间将多种情景结合起来就带来了额外的挑战,例如培训不稳定和业绩不佳。为了克服这些挑战,我们提议了一个旨在强化多角度RL中通用化的框架。这个方法始于基于规则和困难等特点的超理论性组合游戏。然后,我们为每个群体培训了在集团游戏中优异的模型,我们称之为分化步骤。接下来,我们将不同群体的模式参数融合为新的模式,并继续为多个群体提供培训,直到所有群体的情况都被征服。在18个TextArnaF游戏中进行实验,通过经过培训的升级的升级战略来提升我们未来的学习水平。
Article 64
Title@2025-06-12 (4): Adaptive Federated LoRA in Heterogeneous Wireless Networks with Independent Sampling
Title: Adaptive Federated LoRA in Heterogeneous Wireless Networks with Independent Sampling | Adaptives Federated LoRA in heterogenen drahtlosen Netzwerken mit unabhängiger Probenahme | 具有独立抽样调查的多源无线网络中的联邦适应性 2505.23555v2 |
Authors (7): Yanzhao Hou, Jiaxiang Geng, Boyu Li, Xiaofeng Tao, Juncheng Wang, Xiaodong Xu, Bing Luo
Federated LoRA has emerged as a promising technique for efficiently fine-tuning large language models (LLMs) on distributed devices by reducing the number of trainable parameters. However, existing approaches often overlook the theoretical and practical implications of system and data heterogeneity, thereby failing to optimize the overall training efficiency, particularly in terms of wall-clock time. In this paper, we propose an adaptive federated LoRA strategy with independent client sampling to minimize the convergence wall-clock time of federated fine-tuning under both computation and communication heterogeneity. We first derive a new convergence bound for federated LoRA with arbitrary and independent client sampling, notably without requiring the stringent bounded gradient assumption. Then, we introduce an adaptive bandwidth allocation scheme that accounts for heterogeneous client resources and system bandwidth constraints. Based on the derived theory, we formulate and solve a non-convex optimization problem to jointly determine the LoRA sketching ratios and sampling probabilities, aiming to minimize wall-clock convergence time. An efficient and low-complexity algorithm is developed to approximate the solution. Finally, extensive experiments demonstrate that our approach significantly reduces wall-clock training time compared to state-of-the-art methods across various models and datasets.
通过减少可训练参数的数量,联邦洛拉联盟已成为高效微调分布式设备上大型语言模型(LLMs)的一个很有希望的技术,通过减少可训练参数的数量,可以有效地微调分布式设备上的大型语言模型(LLMs),但是,现有的方法往往没有适当地忽视系统和数据差异的理论和实践影响,从而未能优化总体培训效率,特别是墙时时段的培训效率。在本文件中,我们提出了一个适应性的联邦洛拉联盟战略,通过独立客户抽样,尽量减少计算和通信差异性两种情况下联合微调的同步时间。我们首先为具有任意和独立客户抽样的联邦洛拉公司找到新的趋同点,特别是不需要严格的封闭梯度假设。然后,我们引入了适应性带宽分配计划,考虑到各种客户资源和系统带宽限制。根据推理,我们制定并解决非凝固型优化问题,共同确定洛拉的草图比例和取样概率,目的是最大限度地减少墙时段的趋同时间。我们制定了高效和低兼容性的算法,以近解决方案。最后,广泛的实验表明我们的做法大大缩短了各种壁点培训时间和不同状态的数据。
Article 65
Title@2025-06-12 (4): Mimicking Human Intuition: Cognitive Belief-Driven Reinforcement Learning
Title: Mimicking Human Intuition: Cognitive Belief-Driven Reinforcement Learning | Mimikking Human Intuition: Kognitives Glauben-getriebenes Stärkungslernen | 模拟人类感知:认知信仰 – – 强化学习 2410.01739v3 |
Authors (3): Xingrui Gu, Guanren Qiao, Chuyi Jiang
Traditional reinforcement learning (RL) methods mainly rely on trial-and-error exploration, often lacking mechanisms to guide agents toward more informative decision-making and struggling to leverage past experiences, resulting in low sample efficiency. To overcome this issue, we propose an innovative framework inspired by cognitive principles: Cognitive Belief-Driven Reinforcement Learning (CBD-RL). By incorporating cognitive heuristics, CBD-RL transforms conventional trial-and-error learning into a more structured and guided learning paradigm, simulating the human reasoning process. This framework’s core is a belief system that optimizes action probabilities by integrating feedback with prior experience, thus enhancing decision making under uncertainty. It also organizes state-action pairs into meaningful categories, promoting generalization and improving sample efficiency. The concrete implementations of this framework, CBDQ, CBDPPO, and CBDSAC, demonstrate superior performance in discrete and continuous action spaces in diverse environments such as Atari and MuJoCo. By bridging cognitive science and reinforcement learning, this research opens a new avenue for developing RL systems that are more interpretable, efficient, and cognitively inspired.
传统强化学习(RL)方法主要依赖试探和试探探索,往往缺乏机制来引导代理人进行信息更加丰富的决策,并努力利用过去的经验,从而降低抽样效率。为了克服这一问题,我们提议了一个由认知原则启发的创新框架:认知信仰驱动强化学习(CBD-RL),通过将认知超常性学纳入一个更有条理、更有指导的学习模式,CBD-RL将常规试探和试探学习转化成一个更有条理的学习模式,模拟人类推理过程。这个框架的核心是一个信仰系统,它通过将反馈与以往的经验结合起来,优化行动概率,从而增强在不确定性下的决策。它还将州际行动对口组织成有意义的类别,促进总体化并提高抽样效率。这个框架(CBDQ、CBOPPO和CDSAC)的具体实施展示了在诸如Atari和MuJoCo等不同环境中的离散和连续行动空间的优异性表现。通过连接认知科学和强化学习,这一研究为开发更易解释、高效和有认知灵感的RL系统开辟了一条新途径。
Article 66
Title@2025-06-12 (4): Incentivizing Quality Text Generation via Statistical Contracts
Title: Incentivizing Quality Text Generation via Statistical Contracts | Förderung der Erzeugung von Qualitätstexten durch statistische Verträge | 通过统计合同鼓励质量制文本 2406.11118v2 |
Authors (3): Eden Saig, Ohad Einav, Inbal Talgam-Cohen
While the success of large language models (LLMs) increases demand for machine-generated text, current pay-per-token pricing schemes create a misalignment of incentives known in economics as moral hazard: Text-generating agents have strong incentive to cut costs by preferring a cheaper model over the cutting-edge one, and this can be done “behind the scenes” since the agent performs inference internally. In this work, we approach this issue from an economic perspective, by proposing a pay-for-performance, contract-based framework for incentivizing quality. We study a principal-agent game where the agent generates text using costly inference, and the contract determines the principal’s payment for the text according to an automated quality evaluation. Since standard contract theory is inapplicable when internal inference costs are unknown, we introduce cost-robust contracts. As our main theoretical contribution, we characterize optimal cost-robust contracts through a direct correspondence to optimal composite hypothesis tests from statistics, generalizing a result of Saig et al. (NeurIPS’23). We evaluate our framework empirically by deriving contracts for a range of objectives and LLM evaluation benchmarks, and find that cost-robust contracts sacrifice only a marginal increase in objective value compared to their cost-aware counterparts.
虽然大型语言模型(LLMs)的成功增加了对机器生成文本的需求,但目前的按成本计价计划却造成了对经济学中已知的道德危害的激励因素的不匹配: 文本制作商通过选择更便宜的模型而不是尖端模型而强烈地鼓励削减成本,而且这样做可以“在幕后”进行,因为该代理商在内部进行推断。在这项工作中,我们从经济角度处理该问题,提出一个基于合同的按业绩计酬框架来激励质量。我们研究了一种主要代理商游戏,即代理商使用昂贵的推断生成文本,而合同根据自动质量评估确定文本的本金付款。由于标准合同理论在内部推论成本未知的情况下不适用,我们引入了成本-破坏合同。作为我们的主要理论贡献,我们通过直接对应统计中的最佳综合假设测试,概括赛格等人的结果,我们通过将成本与成本比比比,我们仅将成本比对目标和成本比比的牺牲合同来评估我们的框架。(NeurIPS23)。
Article 67
Title@2025-06-12 (4): SlotPi: Physics-informed Object-centric Reasoning Models
Title: SlotPi: Physics-informed Object-centric Reasoning Models | SlotPi: Physik-informierte objektzentrierte Vernunftmodelle | SlotPi:物理学-知情的物体中心理由模型 2506.10778v1 |
Authors (11): Jian Li, Wan Han, Ning Lin, Yu-Liang Zhan, Ruizhi Chengze, Haining Wang, Yi Zhang, Hongsheng Liu, Zidong Wang, Fan Yu, Hao Sun
Understanding and reasoning about dynamics governed by physical laws through visual observation, akin to human capabilities in the real world, poses significant challenges. Currently, object-centric dynamic simulation methods, which emulate human behavior, have achieved notable progress but overlook two critical aspects: (1) the integration of physical knowledge into models, even though humans gain physical insights by observing the world and apply this knowledge to reason accurately about various dynamic scenarios; and (2) the validation of model adaptability across diverse scenarios, even though real-world dynamics, especially those involving fluids and objects, demand models that not only capture object interactions but also simulate fluid flow characteristics. To address these gaps, we introduce SlotPi, a slot-based physics-informed object-centric reasoning model. SlotPi integrates a physical module based on Hamiltonian principles with a spatio-temporal prediction module for dynamic forecasting. Our experiments highlight the model’s strengths in tasks such as prediction and Visual Question Answering (VQA) on benchmark and fluid datasets. Furthermore, we have created a real-world dataset encompassing object interactions, fluid dynamics, and fluid-object interactions, on which we validated our model’s capabilities. The model’s robust performance across all datasets underscores its strong adaptability, laying a foundation for developing more advanced world models.
通过视觉观测,与现实世界中的人类能力相近,对物理法规范的动态的理解和推理提出了重大挑战。目前,以物体为中心的动态模拟方法,与人类行为相似,取得了显著的进展,但忽略了两个关键方面:(1)物理知识融入模型;人类通过观察世界而获得物理洞察力,并将这种知识应用于对各种动态假设的准确理解;(2)对不同情景模型的适应性进行验证;(2)真实世界动态,特别是涉及流体和物体的动态,要求不仅捕捉物体相互作用,而且模拟流体流体特征的模型。为弥补这些差距,我们引入了SlotPi,即基于空格的物理知情物理学物体中心推理模型。SlotPi将基于汉密尔顿原则的物理模块与用于动态预测的时空预测模块结合起来。我们的实验突出模型在基准和流体数据集等任务中的优势。此外,我们创建了一个包含物体相互作用、流体动力动态和流体反射体相互作用的实时数据集,我们以此验证了我们模型的先进能力。模型展示了一种强大的世界模型的基础。
Article 68
Title@2025-06-12 (4): Monotone Classification with Relative Approximations
Title: Monotone Classification with Relative Approximations | Einfarbige Klassifizierung mit relativen Annäherungen | 相对相近的单体分类 2506.10775v1 |
Authors (1): Yufei Tao
In monotone classification, the input is a multi-set $P$ of points in $\mathbb{R}^d$, each associated with a hidden label from $\{-1, 1\}$. The goal is to identify a monotone function $h$, which acts as a classifier, mapping from $\mathbb{R}^d$ to $\{-1, 1\}$ with a small error, measured as the number of points $p \in P$ whose labels differ from the function values $h(p)$. The cost of an algorithm is defined as the number of points having their labels revealed. This article presents the first study on the lowest cost required to find a monotone classifier whose error is at most $(1 + \epsilon) \cdot k^*$, where $\epsilon \ge 0$ and $k^*$ is the minimum error achieved by an optimal monotone classifier – in other words, the error is allowed to exceed the optimal by at most a relative factor. Nearly matching upper and lower bounds are presented for the full range of $\epsilon$. All previous work on the problem can only achieve an error higher than the optimal by an absolute factor.
在单调分类中,输入是$\mathbb{R}^d$中点的多重集$P$,每个点都关联一个取自$\{-1, 1\}$的隐藏标签。目标是找到一个单调函数$h$,它作为分类器把$\mathbb{R}^d$映射到$\{-1, 1\}$,并且误差较小,误差定义为标签与函数值$h(p)$不同的点$p \in P$的数量。算法的代价定义为被揭示标签的点的数量。本文首次研究了找到误差至多为$(1 + \epsilon) \cdot k^*$的单调分类器所需的最低代价,其中$\epsilon \ge 0$,$k^*$是最优单调分类器达到的最小误差;换言之,允许误差至多以一个相对因子超过最优值。对于$\epsilon$的整个取值范围,本文给出了几乎匹配的上界和下界。此前关于该问题的所有工作只能达到比最优值高出一个绝对因子的误差。
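The error measure and the optimal error $k^*$ can be made concrete in the simplest case $d = 1$, where every monotone classifier on a finite point set behaves like a threshold rule. The sketch below computes $k^*$ by brute force with all labels revealed, so it illustrates the definitions only and says nothing about the label-cost question the paper studies. The function names and the toy data are assumptions for illustration.

```python
def threshold_error(xs, labels, t):
    """Error of the monotone classifier h(x) = 1 if x >= t else -1."""
    return sum((1 if x >= t else -1) != y for x, y in zip(xs, labels))

def optimal_error_1d(xs, labels):
    """k* for d = 1: the smallest error over all threshold classifiers."""
    # Each distinct coordinate is a candidate threshold; +inf gives the all -1 classifier.
    candidates = sorted(set(xs)) + [float("inf")]
    return min(threshold_error(xs, labels, t) for t in candidates)

def is_relative_approximation(xs, labels, t, epsilon):
    """Check whether threshold t has error at most (1 + epsilon) * k*."""
    return threshold_error(xs, labels, t) <= (1 + epsilon) * optimal_error_1d(xs, labels)

# Toy multi-set (hypothetical): one label is inconsistent with any monotone rule.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
labels = [-1, 1, -1, 1, 1]
k_star = optimal_error_1d(xs, labels)
print("k* =", k_star)
print("t = 2 within factor 1.5:", is_relative_approximation(xs, labels, 2.0, 0.5))
```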
Article 69
Title@2025-06-12 (4): Skillful joint probabilistic weather forecasting from marginals
Title: Skillful joint probabilistic weather forecasting from marginals | Qualifizierte gemeinsame probabilistische Wettervorhersage von Randgruppen | 来自边缘地带的有技能的联合概率天气预报 2506.10772v1 |
Authors (11): Ferran Alet, Ilan Price, Andrew El-Kadi, Dominic Masters, Stratis Markou, Tom R. Andersson, Jacklynn Stott, Remi Lam, Matthew Willson, Alvaro Sanchez-Gonzalez, Peter Battaglia
Machine learning (ML)-based weather models have rapidly risen to prominence due to their greater accuracy and speed than traditional forecasts based on numerical weather prediction (NWP), recently outperforming traditional ensembles in global probabilistic weather forecasting. This paper presents FGN, a simple, scalable and flexible modeling approach which significantly outperforms the current state-of-the-art models. FGN generates ensembles via learned model-perturbations with an ensemble of appropriately constrained models. It is trained directly to minimize the continuous ranked probability score (CRPS) of per-location forecasts. It produces state-of-the-art ensemble forecasts as measured by a range of deterministic and probabilistic metrics, makes skillful ensemble tropical cyclone track predictions, and captures joint spatial structure despite being trained only on marginals.
机器学习(ML)的天气模型由于比基于数字天气预测(NWP)的传统预测更加精确和速度,在近期全球概率天气预报中表现优于传统组合,因此迅速变得突出,本文件介绍了FGN这一简单、可扩展和灵活的模型方法,该方法大大优于目前最先进的模型;FGN通过经学习的模型干扰和一系列适当受限制的模型生成集合;它直接接受培训,以尽量降低每个地点预测的连续级概率分数(CRPS);它根据一系列确定性和概率指标进行最先进的共同预测;使高技能的热带气旋轨迹预测变得多才多艺;尽管只受过边际学培训,但它捕捉了联合空间结构。
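For reference, the CRPS of an ensemble forecast at a single location can be estimated directly from the ensemble members. The sketch below uses the standard plain estimator, the mean absolute error of the members minus half the mean absolute difference between pairs of members. Whether FGN trains on this estimator or on a bias-corrected ("fair") variant is not stated in the abstract, so the choice here is an assumption, as is the function name ensemble_crps.

```python
import numpy as np

def ensemble_crps(members, obs):
    """Plain ensemble estimate of CRPS: E|X - y| - 0.5 * E|X - X'|."""
    members = np.asarray(members, dtype=float)
    skill = np.mean(np.abs(members - obs))                               # distance to the observation
    spread = np.mean(np.abs(members[:, None] - members[None, :]))       # pairwise member distance
    return skill - 0.5 * spread

# Hypothetical temperature forecasts (degrees C) for one grid point.
sharp_and_right = ensemble_crps([19.8, 20.1, 20.0, 20.3], obs=20.0)
sharp_but_wrong = ensemble_crps([24.8, 25.1, 25.0, 25.3], obs=20.0)
print("CRPS, well-centred ensemble:", sharp_and_right)
print("CRPS, biased ensemble:      ", sharp_but_wrong)
```

With a single member the score reduces to the absolute error, which is why CRPS is often described as a probabilistic generalization of MAE. Because the score is computed per location, it only constrains marginal distributions, which is what makes the joint spatial structure reported in the abstract notable.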
Article 70
Title@2025-06-12 (4): OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems
Title: OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems | OPT-BENCH: LLM Agent auf großräumigen Suchräumen bewerten Optimierungsprobleme | OOP-BENCH:评估大型搜索空间优化问题LLM代理 2506.10764v1 |
Authors (7): Xiaozhe Li, Jixuan Chen, Xinyu Fang, Shengyuan Ding, Haodong Duan, Qingwen Liu, Kai Chen
Large Language Models (LLMs) have shown remarkable capabilities in solving diverse tasks. However, their proficiency in iteratively optimizing complex solutions through learning from previous feedback remains insufficiently explored. To bridge this gap, we present OPT-BENCH, a comprehensive benchmark designed to evaluate LLM agents on large-scale search space optimization problems. OPT-BENCH includes 20 real-world machine learning tasks sourced from Kaggle and 10 classical NP problems, offering a diverse and challenging environment for assessing LLM agents on iterative reasoning and solution refinement. To enable rigorous evaluation, we introduce OPT-Agent, an end-to-end optimization framework that emulates human reasoning when tackling complex problems by generating, validating, and iteratively improving solutions through leveraging historical feedback. Through extensive experiments on 9 state-of-the-art LLMs from 6 model families, we analyze the effects of optimization iterations, temperature settings, and model architectures on solution quality and convergence. Our results demonstrate that incorporating historical context significantly enhances optimization performance across both ML and NP tasks. All datasets, code, and evaluation tools are open-sourced to promote further research in advancing LLM-driven optimization and iterative reasoning. Project page: https://github.com/OliverLeeXZ/OPT-BENCH
大型语言模型(LLM)在解决各种任务方面表现出了非凡的能力。然而,它们通过从以往反馈中学习来迭代优化复杂解决方案的能力仍未得到充分探讨。为了弥补这一差距,我们提出了OPT-BENCH,这是一个旨在评估LLM代理在大规模搜索空间优化问题上表现的综合基准。OPT-BENCH包括来自Kaggle的20个真实世界机器学习任务和10个经典NP问题,为评估LLM代理的迭代推理和解决方案改进能力提供了多样而具有挑战性的环境。为了进行严格的评估,我们引入了OPT-Agent,这是一个端到端优化框架,它通过利用历史反馈来生成、验证并迭代改进解决方案,从而模拟人类处理复杂问题时的推理。通过对来自6个模型家族的9个最先进LLM进行广泛实验,我们分析了优化迭代次数、温度设置和模型架构对解决方案质量和收敛的影响。结果表明,纳入历史上下文可以显著提升ML和NP任务上的优化性能。所有数据集、代码和评估工具均已开源,以促进LLM驱动的优化和迭代推理方面的进一步研究。项目页面:https://github.com/OliverLeeXZ/OPT-BENCH
Article 71
Title@2025-06-12 (4): Great Models Think Alike and this Undermines AI Oversight
Title: Great Models Think Alike and this Undermines AI Oversight | Große Modelle denken ähnlich und dies unterminiert AI Oversight | 伟大的模特儿们想着类似的想法 和这枚地下地雷 AI监督 2502.04313v2 |
Authors (9): Shashwat Goel, Joschka Struber, Ilze Amanda Auzina, Karuna K Chandra, Ponnurangam Kumaraguru, Douwe Kiela, Ameya Prabhu, Matthias Bethge, Jonas Geiping
As Language Model (LM) capabilities advance, evaluating and supervising them at scale is getting harder for humans. There is hope that other language models can automate both these tasks, which we refer to as “AI Oversight”. We study how model similarity affects both aspects of AI oversight by proposing Chance Adjusted Probabilistic Agreement (CAPA): a metric for LM similarity based on overlap in model mistakes. Using CAPA, we first show that LLM-as-a-judge scores favor models similar to the judge, generalizing recent self-preference results. Then, we study training on LM annotations, and find that complementary knowledge between the weak supervisor and strong student model plays a crucial role in gains from “weak-to-strong generalization”. As model capabilities increase, it becomes harder to find their mistakes, and we might defer more to AI oversight. However, we observe a concerning trend – model mistakes are becoming more similar with increasing capabilities, pointing to risks from correlated failures. Our work underscores the importance of reporting and correcting for model similarity, especially in the emerging paradigm of AI oversight.
随着语言模型(LM)能力的进步、评估和在规模上监督这些能力对人类来说越来越困难。希望其他语言模型能够使这两项任务自动化,我们称之为“AI 监督 ” 。我们研究模型相似性如何影响AI监督的两方面,提出“CAPA 机会调整概率协议(CAPA) : 一种基于模式错误重叠的LM相似性衡量标准。我们使用CAPA,我们首先显示LLM-as-a-judge 评分类似法官的偏爱模式,概括最近的自我参照结果。然后,我们研究LM 说明,并在弱的主管和强的学生模型之间找到互补知识,在“Weak-en-strong-processional化”的收益中发挥着关键作用。随着模型能力的增长,发现错误变得更加困难,我们可能更偏重于AI监督。然而,我们观察到一种趋势 — 模型错误正越来越接近于日益增强的能力,指出了相关失败的风险。我们的工作强调了报告和纠正模式相似性的重要性,特别是在新兴的AI 监督模式中。
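The general idea of adjusting agreement for chance can be illustrated with Cohen's kappa computed on the hard predictions of two models. This is a simplified stand-in and not CAPA itself: according to the abstract, CAPA is probabilistic and built on overlap in model mistakes, and neither refinement is reproduced here. The function name and the toy predictions are assumptions for illustration.

```python
from collections import Counter

def chance_adjusted_agreement(preds_a, preds_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(preds_a) != len(preds_b) or len(preds_a) == 0:
        raise ValueError("prediction lists must be non-empty and of equal length")
    n = len(preds_a)
    observed = sum(a == b for a, b in zip(preds_a, preds_b)) / n
    freq_a, freq_b = Counter(preds_a), Counter(preds_b)
    # Agreement expected if both models answered independently with their own label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 0.0  # both models always give the same single label; agreement is all chance
    return (observed - expected) / (1.0 - expected)

# Hypothetical multiple-choice answers from three models on eight questions.
model_x = ["A", "B", "C", "A", "D", "B", "C", "A"]
model_y = ["A", "B", "C", "A", "D", "B", "A", "C"]
model_z = ["B", "A", "A", "C", "C", "D", "B", "D"]
print("x vs y:", chance_adjusted_agreement(model_x, model_y))
print("x vs z:", chance_adjusted_agreement(model_x, model_z))
```

Raw agreement overstates similarity when two models are both accurate or both biased toward the same option; subtracting the chance term is what lets a similarity score be compared across models of different accuracy.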
Article 72
Title@2025-06-12 (4): Persistent Topological Features in Large Language Models
Title: Persistent Topological Features in Large Language Models | Persistente Topologische Features in großen Sprachmodellen | 大语言模式中的持久性有机污染物特征 2410.11042v2 |
Authors (6): Yuri Gardinazzi, Karthik Viswanathan, Giada Panerai, Alessio Ansuini, Alberto Cazzaniga, Matteo Biagetti
Understanding the decision-making processes of large language models is critical given their widespread applications. To achieve this, we aim to connect a formal mathematical framework – zigzag persistence from topological data analysis – with practical and easily applicable algorithms. Zigzag persistence is particularly effective for characterizing data as it dynamically transforms across model layers. Within this framework, we introduce topological descriptors that measure how topological features, $p$-dimensional holes, persist and evolve throughout the layers. Unlike methods that assess each layer individually and then aggregate the results, our approach directly tracks the full evolutionary path of these features. This offers a statistical perspective on how prompts are rearranged and their relative positions changed in the representation space, providing insights into the system’s operation as an integrated whole. To demonstrate the expressivity and applicability of our framework, we highlight how sensitive these descriptors are to different models and a variety of datasets. As a showcase application to a downstream task, we use zigzag persistence to establish a criterion for layer pruning, achieving results comparable to state-of-the-art methods while preserving the system-level perspective.
理解大语言模型的决策过程至关重要,因为大语言模型的应用十分广泛。为了实现这一点,我们力求将正式数学框架 – – 从地形数据分析得出的zigzag持久性 – – 与实用和易于应用的算法联系起来。 Zigzag持久性对于数据特征的定性特别有效,因为它能动态地跨模式层转变。在此框架内,我们引入了表层描述符,用来测量地形特征、美元-维洞、如何持续并在整个层次上演。不同于对每个层次进行单独评估和随后汇总结果的方法,我们的方法直接跟踪这些特征的全面演进路径。这提供了一种统计视角,说明在代表空间中如何重新排列亮度及其相对位置的变化,为整个系统的运作提供洞察力。为了展示我们框架的清晰度和适用性,我们强调这些描述符对不同模型和各种数据集的敏感度。作为下游任务的示范应用,我们用 zigzag持久性来建立层钻探标准,在保持系统层面视角的同时,取得与最新方法相近的结果。
Article 73
Title@2025-06-12 (4): Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering
Title: Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering | Neural bei ArchEHR-QA 2025: Agentische Prompt-Optimierung für evidenzgerundete klinische Fragen | ArchEHR-QA 2025:证据四舍五入临床问题解答的代理快速优化 2506.10751v1 |
Authors (6): Sai Prasanna Teja Reddy Bogireddy, Abrar Majeedi, Viswanatha Reddy Gajjala, Zhuoyan Xu, Siddhant Rai, Vaishnav Potlapalli
Automated question answering (QA) over electronic health records (EHRs) can bridge critical information gaps for clinicians and patients, yet it demands both precise evidence retrieval and faithful answer generation under limited supervision. In this work, we present Neural, the runner-up in the BioNLP 2025 ArchEHR-QA shared task on evidence-grounded clinical QA. Our proposed method decouples the task into (1) sentence-level evidence identification and (2) answer synthesis with explicit citations. For each stage, we automatically explore the prompt space with DSPy’s MIPROv2 optimizer, jointly tuning instructions and few-shot demonstrations on the development set. A self-consistency voting scheme further improves evidence recall without sacrificing precision. On the hidden test set, our method attains an overall score of 51.5, placing second while outperforming standard zero-shot and few-shot prompting by over 20 and 10 points, respectively. These results indicate that data-driven prompt optimization is a cost-effective alternative to model fine-tuning for high-stakes clinical QA, advancing the reliability of AI assistants in healthcare.
对电子健康记录自动回答问题(QA)可以弥补临床医生和病人的关键信息差距,但要求准确的证据检索和在有限监督下忠实地生成答案。在这项工作中,我们介绍BioNLP 2025年ArchEHR-QA的后排神经系统,共同执行有证据根据的临床质量分析任务。我们建议的方法将这项任务分为(1) 判决一级证据识别和(2) 回答合成,并明确引用。在每一个阶段,我们自动探索与DSPy’s MIPROv2优化器、联合调整指示和对成套开发的微小的演示的迅速空间。自我一致投票计划可以进一步改进证据召回,而不牺牲精确性。在隐藏的测试中,我们的方法达到总分51.5分,处于第二个阶段,同时超过标准零发数和几发点,分别超过20分和10分。这些结果表明,数据驱动的快速优化是高取临床质量分析模型的成本效益替代方法,提高AI助理在医疗保健方面的可靠性。
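The self-consistency voting step described above can be sketched as a majority vote over the evidence sets returned by repeated runs of the same prompt. This is a minimal illustration, not the authors' implementation: the function name `vote_evidence`, the `min_votes` threshold, and the sample runs are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Set


def vote_evidence(runs: Iterable[Set[int]], min_votes: int) -> List[int]:
    """Keep sentence ids selected by at least `min_votes` independent runs."""
    counts = Counter()
    for selected in runs:
        counts.update(set(selected))  # each run votes at most once per sentence
    return sorted(sid for sid, n in counts.items() if n >= min_votes)


# Hypothetical outputs of three sampled runs of an evidence-identification prompt.
runs = [{1, 4, 7}, {1, 4}, {1, 7, 9}]
print(vote_evidence(runs, min_votes=2))  # sentences chosen by a majority of runs
```

Lowering `min_votes` favours recall and raising it favours precision, which is one plausible reading of how voting trades the two off.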
Article 74
Title@2025-06-12 (4): Graphical Transformation Models
Title: Graphical Transformation Models | Graphische Transformationsmodelle | 图形转换模型 2503.17845v3 |
Authors (4): Matthias Herp, Johannes Brachem, Michael Altenbuchinger, Thomas Kneib
Graphical Transformation Models (GTMs) are introduced as a novel approach to effectively model multivariate data with intricate marginals and complex dependency structures non-parametrically, while maintaining interpretability through the identification of varying conditional independencies. GTMs extend multivariate transformation models by replacing the Gaussian copula with a custom-designed multivariate transformation, offering two major advantages. Firstly, GTMs can capture more complex interdependencies using penalized splines, which also provide an efficient regularization scheme. Secondly, we demonstrate how to approximately regularize GTMs using a lasso penalty towards pairwise conditional independencies, akin to Gaussian graphical models. The model’s robustness and effectiveness are validated through simulations, showcasing its ability to accurately learn parametric vine copulas and identify conditional independencies. Additionally, the model is applied to a benchmark astrophysics dataset, where the GTM demonstrates favorable performance compared to non-parametric vine copulas in learning complex multivariate distributions.
图形转换模型(GTMs)被引入为一种新的方法,以有效模拟具有复杂边际和复杂依赖结构的多变量数据,而不以参数为依据,同时通过识别不同的有条件依赖性来保持解释性。GTMs扩展了多变量转换模型,以定制设计的多变量变换模式取代Gaussian 椰子,提供了两个主要的优势。首先,GTMs可以使用惩罚性的样条来捕捉更复杂的相互依存关系,这也提供了一个有效的正规化计划。第二,我们展示了如何使用拉索罚款来大致规范GTMs,使其适应与高斯图形模型类似的双向有条件依赖性。该模型的坚固性和有效性通过模拟得到验证,展示其准确学习参数性松动和确定有条件的互不依赖性的能力。此外,该模型还用于一个基准的天体物理学数据集,其中GTM在学习复杂的多变量分布中显示优于非参数相交配方的功能。
Article 75
Title@2025-06-12 (4): A User’s Guide to Sampling Strategies for Sliced Optimal Transport
Title: A User’s Guide to Sampling Strategies for Sliced Optimal Transport | Leitfaden für Anwender zur Probenahme von Strategien für den optimalen Transport | 剪切最佳运输抽样战略用户指南 2502.02275v4 |
Authors (3): Keanu Sisouk, Julie Delon, Julien Tierny
This paper serves as a user’s guide to sampling strategies for sliced optimal transport. We provide reminders and additional regularity results on the Sliced Wasserstein distance. We detail the construction methods, generation time complexity, theoretical guarantees, and conditions for each strategy. Additionally, we provide insights into their suitability for sliced optimal transport in theory. Extensive experiments on both simulated and real-world data offer a representative comparison of the strategies, culminating in practical recommendations for their best usage.
本文作为用户对切片最佳运输的抽样战略的指南,我们提供了关于切片瓦森斯坦距离的提醒和额外规律性结果,我们详细介绍了每个战略的建造方法、生成时间复杂性、理论保障和条件。此外,我们深入了解这些方法在理论上是否适合切片最佳运输。关于模拟数据和现实世界数据的广泛实验对战略进行了有代表性的比较,最终提出了最佳使用的实际建议。
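For orientation, the baseline that sampling strategies are usually compared against is plain Monte Carlo: draw directions uniformly on the sphere, project, and average one-dimensional Wasserstein distances. The sketch below assumes equal sample sizes and uses only NumPy. The function name and defaults are illustrative and are not taken from the paper.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: arrays of shape (n, d) with the same number of samples n.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Uniform directions on the unit sphere: normalise Gaussian draws.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both samples onto every direction and sort along the sample axis.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    # In 1D, optimal transport between equal-size samples pairs sorted points.
    return float(np.mean(np.abs(px - py) ** p) ** (1.0 / p))


rng = np.random.default_rng(1)
x = rng.normal(size=(500, 3))
print(sliced_wasserstein(x, x))        # identical samples
print(sliced_wasserstein(x, x + 1.0))  # same cloud shifted by (1, 1, 1)
```

For a pure translation by a vector t in dimension d, the exact sliced 2-Wasserstein distance is ||t|| / sqrt(d), which equals 1 here, so the second value gives a quick sanity check of the estimator.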
Article 76
Title@2025-06-12 (4): Hybrid Bernstein Normalizing Flows for Flexible Multivariate Density Regression with Interpretable Marginals
Title: Hybrid Bernstein Normalizing Flows for Flexible Multivariate Density Regression with Interpretable Marginals | Hybride Bernstein-Normalisierungsströmungen für flexible Multivariate Dichteregression mit interpretierbaren Marginalen | 具有可解释边际作用的灵活多变密度递减的 伯恩斯坦混合 伯恩斯坦 2505.14164v2 |
Authors (4): Marcel Arpogaus, Thomas Kneib, Thomas Nagler, David Rügamer
Density regression models allow a comprehensive understanding of data by modeling the complete conditional probability distribution. While flexible estimation approaches such as normalizing flows (NF) work particularly well in multiple dimensions, interpreting the input-output relationship of such models is often difficult, due to the black-box character of deep learning models. In contrast, existing statistical methods for multivariate outcomes such as multivariate conditional transformation models (MCTM) are restricted in flexibility and are often not expressive enough to represent complex multivariate probability distributions. In this paper, we combine MCTM with state-of-the-art and autoregressive NF to leverage the transparency of MCTM for modeling interpretable feature effects on the marginal distributions in the first step and the flexibility of neural-network-based NF techniques to account for complex and non-linear relationships in the joint data distribution. We demonstrate our method’s versatility in various numerical experiments and compare it with MCTM and other NF models on both simulated and real-world data.
密度回归模型通过对完全的有条件概率分布进行建模,使得人们能够全面理解数据。灵活的估算方法,例如正常流动(NF)在多个方面特别有效,但是由于深层学习模型的黑盒特性,解释这些模型的输入-产出关系往往很困难。相比之下,多种变式有条件转换模型(MCTM)等现有多变量结果统计方法的灵活性有限,而且往往不够清晰,无法代表复杂的多变量概率分布。在本文中,我们将MCTTM与最新和自动递增的NF结合起来,以便利用MTM的透明度来模拟第一阶段边际分布的可解释特点效应,以及神经网络的NFT技术的灵活性,以便在联合数据分布中说明复杂和非线性关系。我们展示了我们的方法在各种数字实验中的多功能性,并在模拟数据和实际数据中与MTM和其他NF模型进行比较。
Article 77
Title@2025-06-12 (4): BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery
Title: BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery | BioNeMo Framework: eine modulare, leistungsstarke Bibliothek für die Entwicklung von KI-Modellen in der Wirkstoffentdeckung | BioNemo框架:一个模块化、高绩效图书馆,用于在药物发现方面开发AI模型 2411.10548v3 |
Authors (93): Peter St. John, Dejun Lin, Polina Binder, Malcolm Greaves, Vega Shah, John St. John, Adrian Lange, Patrick Hsu, Rajesh Illango, Arvind Ramanathan, Anima Anandkumar, David H Brookes, Akosua Busia, Abhishaike Mahajan, Stephen Malina, Neha Prasad, Sam Sinai, Lindsay Edwards, Thomas Gaudelet, Cristian Regep, Martin Steinegger, Burkhard Rost, Alexander Brace, Kyle Hippe, Luca Naef, Keisuke Kamata, George Armstrong, Kevin Boyd, Zhonglin Cao, Han-Yi Chou, Simon Chu, Allan dos Santos Costa, Sajad Darabi, Eric Dawson, Kieran Didi, Cong Fu, Mario Geiger, Michelle Gill, Darren J Hsu, Gagan Kaushik, Maria Korshunova, Steven Kothen-Hill, Youhan Lee, Meng Liu, Micha Livne, Zachary McClure, Jonathan Mitchell, Alireza Moradzadeh, Ohad Mosafi, Youssef Nashed, Saee Paliwal, Yuxing Peng, Sara Rabhi, Farhad Ramezanghorbani, Danny Reidenbach, Camir Ricketts, Brian C Roland, Kushal Shah, Tyler Shimko, Hassan Sirelkhatim, Savitha Srinivasan, Abraham C Stern, Dorota Toczydlowska, Srimukh Prasad Veccham, Niccolò Alberto Elia Venanzi, Anton Vorontsov, Jared Wilber, Isabel Wilkinson, Wei Jing Wong, Eva Xue, Cory Ye, Xin Yu, Yang Zhang, Guoqing Zhou, Becca Zandstein, Alejandro Chacòn, Prashant Sohani, Maximilian Stadler, Christian Hundt, Feiwen Zhu, Christian Dallago, Bruno Trentini, Emine Kucukbenli, Saee Paliwal, Timur Rvachov, Eddie Calleja, Johnny Israeli, Harry Clifford, Risto Haukioja, Nicholas Haemel, Kyle Tretina, Neha Tadimeti, Anthony B Costa
Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLM) training on hundreds of graphics processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models across hundreds of GPUs. Its modular design allows the integration of individual components, such as data loaders, into existing workflows and is open to community contributions. We detail technical features of the BioNeMo Framework through use cases such as pLM pre-training and fine-tuning. On 256 NVIDIA A100s, BioNeMo Framework trains a three-billion-parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use.
人工智能模型的编码生物学和化学正在开辟通向高通量和高质量硅药物开发的新途径,然而,它们的培训越来越依赖计算尺度,最近对数百个图形处理器进行了蛋白质语言模型培训;我们引入了生物Nemo框架,以便利在数百个全球动力器中进行计算生物学和化学AI模型的培训;其模块设计允许将单个组成部分,如数据处理器等纳入现有工作流程,并允许社区作出贡献;我们通过使用诸如PLM预培训和微调等案例,详细介绍了生物Nemo框架的技术特点;关于256个NVIDIA A100s,生物Nemo框架在4.2天里对30亿个参数BERT基PLM进行了为期1万亿多个符号的培训;生物Nemo框架是开放的,可供每个人免费使用。
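The headline figures imply a training throughput that is easy to back out. The short calculation below uses only the numbers quoted in the abstract. Because the abstract says "over one trillion tokens", the result is a lower bound.

```python
tokens = 1e12  # "over one trillion tokens", so this is a lower bound
days = 4.2
gpus = 256

seconds = days * 24 * 3600
cluster_rate = tokens / seconds      # tokens per second across all GPUs
per_gpu_rate = cluster_rate / gpus   # tokens per second per A100

print(f"{cluster_rate:,.0f} tokens/s total, {per_gpu_rate:,.0f} tokens/s per GPU")
```

This works out to about 2.76 million tokens per second across the cluster, or roughly 10.8 thousand tokens per second per GPU.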
Article 78
Title@2025-06-12 (4): PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models
Title: PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models | PREMISE: Skalierbare und strategische Prompt-Optimierung für effiziente mathematische Reasoning in großen Modellen | PREMISE:大规模模型中高效数学理由的可扩展和战略快速优化 2506.10716v1 |
Authors (3): Ye Yu, Yaoning Yu, Haohan Wang
Large reasoning models (LRMs) such as Claude 3.7 Sonnet and OpenAI o1 achieve strong performance on mathematical benchmarks using lengthy chain-of-thought (CoT) reasoning, but the resulting traces are often unnecessarily verbose. This inflates token usage and cost, limiting deployment in latency-sensitive or API-constrained settings. We introduce PREMISE (PRompt-based Efficient Mathematical Inference with Strategic Evaluation), a prompt-only framework that reduces reasoning overhead without modifying model weights. PREMISE combines trace-level diagnostics with gradient-inspired prompt optimization to minimize redundant computation while preserving answer accuracy. The approach jointly optimizes brevity and correctness through a multi-objective textual search that balances token length and answer validity. Unlike prior work, PREMISE runs in a single-pass black-box interface, so it can be applied directly to commercial LLMs. On GSM8K, SVAMP, and Math500 we match or exceed baseline accuracy ($96\%\rightarrow96\%$ with Claude, $91\%\rightarrow92\%$ with Gemini) while reducing reasoning tokens by up to $87.5\%$ and cutting dollar cost by $69$–$82\%$. These results show that prompt-level optimization is a practical and scalable path to efficient LRM inference without compromising reasoning quality.
Claude 3.7 Sonnet 和 OpenAI o1 等大型推理模型(LRM)借助冗长的思维链(CoT)推理在数学基准上取得了很强的成绩,但由此产生的推理轨迹往往过于冗长。这会增加词元用量和成本,限制其在对延迟敏感或受 API 限制的环境中的部署。我们提出 PREMISE(PRompt-based Efficient Mathematical Inference with Strategic Evaluation),这是一个仅依靠提示的框架,可在不修改模型权重的情况下降低推理开销。PREMISE 将轨迹级诊断与受梯度启发的提示优化相结合,在保持答案准确性的同时尽量减少冗余计算。该方法通过多目标文本搜索在词元长度与答案正确性之间取得平衡,从而同时优化简洁性和正确性。与以往工作不同,PREMISE 在单次调用的黑箱接口中运行,因此可以直接应用于商业 LLM。在 GSM8K、SVAMP 和 Math500 上,我们达到或超过基线准确率(Claude 为 $96\%\rightarrow96\%$,Gemini 为 $91\%\rightarrow92\%$),同时将推理词元最多减少 $87.5\%$,并将美元成本降低 $69$–$82\%$。这些结果表明,提示层面的优化是实现高效 LRM 推理的一条实用且可扩展的途径,且不损害推理质量。
Article 79
Title@2025-06-12 (4): Testing Generalizability in Causal Inference
Title: Testing Generalizability in Causal Inference | Prüfung der Verallgemeinerbarkeit bei Kausalzusammenhang | 因果关系中的通用性 2411.03021v2 |
Authors (3): Daniel de Vassimon Manela, Linying Yang, Robin J. Evans
Ensuring robust model performance in diverse real-world scenarios requires addressing generalizability across domains with covariate shifts. However, no formal procedure exists for statistically evaluating generalizability in machine learning algorithms. Existing predictive metrics like mean squared error (MSE) help to quantify the relative performance between models, but do not directly answer whether a model can or cannot generalize. To address this gap in the domain of causal inference, we propose a systematic framework for statistically evaluating the generalizability of high-dimensional causal inference models. Our approach uses the frugal parameterization to flexibly simulate from fully and semi-synthetic causal benchmarks, offering a comprehensive evaluation for both mean and distributional regression methods. Grounded in real-world data, our method ensures more realistic evaluations, which are often missing in current work relying on simplified datasets. Furthermore, by combining simulations with statistical testing, our framework remains robust and avoids over-reliance on conventional metrics, providing statistical safeguards for decision making.
在不同现实世界情景中确保稳健的模型性能需要解决不同领域的通用性,并发生共变变化。然而,在从统计角度评估机器学习算法的通用性方面,没有正式的程序。现有的预测性指标,如平均平方差(MSE)有助于量化模型之间的相对性能,但并不直接回答模型能否或不能概括化。为了解决因果推断领域的这一差距,我们提出了一个系统框架,用于统计评估高维因果推理模型的通用性。我们的方法利用节制参数,从充分和半合成因果基准中进行灵活模拟,为平均值和分布回归方法提供综合评价。基于现实世界数据,我们的方法确保了更现实的评估,而目前的工作往往依靠简化的数据集,而这种评估往往缺乏。此外,我们的框架使用模拟和统计测试,是稳健健的,避免过度依赖常规指标,为决策提供统计保障。
Article 80
Title@2025-06-12 (4): Continual Hyperbolic Learning of Instances and Classes
Title: Continual Hyperbolic Learning of Instances and Classes | Kontinuierliches Hyperbolisches Lernen von Instanzen und Klassen | 不断对各种事件和课程进行双曲学习 2506.10710v1 |
Authors (6): Melika Ayoughi, Mina Ghadimi Atigh, Mohammad Mahdi Derakhshani, Cees G. M. Snoek, Pascal Mettes, Paul Groth
Continual learning has traditionally focused on classifying either instances or classes, but real-world applications, such as robotics and self-driving cars, require models to handle both simultaneously. To mirror real-life scenarios, we introduce the task of continual learning of instances and classes, at the same time. This task challenges models to adapt to multiple levels of granularity over time, which requires balancing fine-grained instance recognition with coarse-grained class generalization. In this paper, we identify that classes and instances naturally form a hierarchical structure. To model these hierarchical relationships, we propose HyperCLIC, a continual learning algorithm that leverages hyperbolic space, which is uniquely suited for hierarchical data due to its ability to represent tree-like structures with low distortion and compact embeddings. Our framework incorporates hyperbolic classification and distillation objectives, enabling the continual embedding of hierarchical relations. To evaluate performance across multiple granularities, we introduce continual hierarchical metrics. We validate our approach on EgoObjects, the only dataset that captures the complexity of hierarchical object recognition in dynamic real-world environments. Empirical results show that HyperCLIC operates effectively at multiple granularities with improved hierarchical generalization.
持续学习传统上侧重于对实例或类别进行分类, 但现实世界应用, 如机器人和自驾驶汽车等, 需要同时使用模型来处理两者。 为了镜像现实生活情景, 我们同时引入持续学习实例和类别的任务。 这个任务挑战模型要适应颗粒的多重水平, 需要用粗微分分层分类法的概括性来平衡微微分体识别。 在本文中, 我们确定该类别和情况自然形成等级结构。 为了模拟这些等级关系, 我们建议使用超超CLIC, 一种持续学习的算法, 利用超单向空间, 因为它能够代表像树一样的结构, 且低扭曲和紧凑嵌入, 因而特别适合等级数据 。 我们的框架包含超单向分类和蒸馏目标, 使等级关系能够持续嵌入。 为了评估多颗粒性的性, 我们引入了连续的等级测量标准。 我们验证了我们在EgoObjects上的方法, 唯一的数据集能捕捉到动态真实世界环境中分级物体识别的复杂性 。 磁性CLIC 的结果显示, 系统在多级上有效操作的压性 。
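The claim that hyperbolic space represents tree-like structure with low distortion rests on how distances grow toward the boundary of the space. A minimal sketch of the geodesic distance in the Poincaré ball follows. Choosing the Poincaré ball as the model is an assumption for illustration, because the abstract only says "hyperbolic space". The example points are made up.

```python
import numpy as np


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincaré ball."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / denom))


root = [0.0, 0.0]        # hypothetical class embedding near the origin
instance_a = [0.9, 0.0]  # hypothetical instance embeddings near the boundary
instance_b = [0.0, 0.9]

print(poincare_distance(root, instance_a))
print(poincare_distance(instance_a, instance_b))
```

Points near the boundary are far from each other even when their Euclidean gap is modest, which leaves room for many fine-grained instances under a coarse class placed closer to the origin.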
Article 81
Title@2025-06-12 (4): Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models
Title: Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models | Anreize für eine fortgeschrittene Instruktions-Folge von großen Sprachmodellen | 为采用大语言模式的高级指示提供激励理由 2506.01413v2 |
Authors (9): Yulei Qin, Gang Li, Zongyi Li, Zihan Xu, Yuchen Shi, Zhekai Lin, Xiao Cui, Ke Li, Xing Sun
Existing large language models (LLMs) face challenges in following complex instructions, especially when multiple constraints are present and organized in parallel, chained, and branching structures. One intuitive solution, namely chain-of-thought (CoT), is expected to universally improve capabilities of LLMs. However, we find that the vanilla CoT exerts a negative impact on performance due to its superficial reasoning pattern of simply paraphrasing the instructions. It fails to peel back the compositions of constraints for identifying their relationship across hierarchies of types and dimensions. To this end, we propose a systematic method to boost LLMs in dealing with complex instructions via incentivizing reasoning for test-time compute scaling. First, we start from the decomposition of complex instructions under existing taxonomies and propose a reproducible data acquisition method. Second, we exploit reinforcement learning (RL) with verifiable rule-centric reward signals to cultivate reasoning specifically for instruction following. We address the shallow, non-essential nature of reasoning under complex instructions via sample-wise contrast for superior CoT enforcement. We also exploit behavior cloning of experts to facilitate steady distribution shift from fast-thinking LLMs to skillful reasoners. Extensive evaluations on seven comprehensive benchmarks confirm the validity of the proposed method, where a 1.5B LLM achieves 11.74% gains with performance comparable to an 8B LLM. Codes and data are available at https://github.com/yuleiqin/RAIF.
现有大型语言模型(LLM)在遵循复杂指令方面面临挑战,尤其是当多个约束以并列、链式和分支结构组织时。一种直观的解决方案,即思维链(CoT),被期望能普遍提升 LLM 的能力。然而,我们发现原始的 CoT 只是浅层地复述指令,这种肤浅的推理模式反而会损害性能,它无法剖析约束的构成,也就无法识别约束在类型和维度层级上的关系。为此,我们提出一种系统方法,通过激励推理来扩展测试时计算,从而提升 LLM 处理复杂指令的能力。首先,我们依据现有分类体系对复杂指令进行分解,并提出一种可复现的数据获取方法。其次,我们利用强化学习(RL)和可验证的、以规则为中心的奖励信号,专门培养面向指令遵循的推理能力。我们通过样本级对比来解决复杂指令下推理浅层、非本质的问题,以更好地强化 CoT。我们还利用对专家的行为克隆,促进模型从快思考的 LLM 向熟练的推理者平稳地完成分布迁移。在七个综合基准上的大量评估证实了所提方法的有效性:一个 1.5B 的 LLM 取得了 11.74% 的提升,性能可与 8B 的 LLM 相当。代码和数据见 https://github.com/yuleiqin/RAIF。
Article 82
Title@2025-06-12 (4): ConTextTab: A Semantics-Aware Tabular In-Context Learner
Title: ConTextTab: A Semantics-Aware Tabular In-Context Learner | ConTextTab: Ein Semantik-Bewusst Tabular In-Context Learner | ConTextTab : 语义学软件制表器 2506.10707v1 |
Authors (4): Marco Spinaci, Marek Polewczyk, Maximilian Schambach, Sam Thelin
Tabular in-context learning (ICL) has recently achieved state-of-the-art (SOTA) performance on several tabular prediction tasks. Previously restricted to classification problems on small tables, recent advances such as TabPFN and TabICL have extended its use to larger datasets. Although architecturally efficient and well adapted to tabular data structures, current table-native ICL architectures are trained exclusively on synthetic data and therefore do not fully leverage the rich semantics and world knowledge contained in real-world tabular data. At the other end of the spectrum, tabular ICL models based on pretrained large language models such as TabuLa-8B integrate deep semantic understanding and world knowledge but are only able to make use of a small amount of context due to inherent architectural limitations. Aiming to combine the best of both worlds, we introduce ConTextTab, integrating semantic understanding and alignment into a table-native ICL framework. By employing specialized embeddings for different data modalities and by training on large-scale real-world tabular data, our model is competitive with SOTA across a broad set of benchmarks while setting a new standard on the semantically rich CARTE benchmark.
最近的一些进展,如TabPFN和TabICL等,已经将其用于更大的数据集。虽然目前的表型ICL结构在结构上效率高,适应了表格数据结构,但完全以合成数据为主的培训,目前的表型ICL结构没有充分利用真实世界表格数据中所包含的丰富的语义和世界知识。在这一频谱的另一端,基于诸如TabuLa-8B等预先培训的大型语言模型的ICL模型,将深度语义理解和世界知识整合在一起,但由于固有的建筑限制,我们只能利用少量的环境。为了将这两个世界的最佳数据组合起来,我们引入了ConTextTAB,将语义理解和一致性纳入表格型ICL框架。通过对不同数据模式的专门嵌入和大规模真实世界表格数据的培训,我们的模型与SOTA具有竞争力,跨越一套广泛的标准基准,同时制定了新的标准CARSAR标准。
Article 83
Title@2025-06-12 (4): Preserving Task-Relevant Information Under Linear Concept Removal
Title: Preserving Task-Relevant Information Under Linear Concept Removal | Task-Relevante Informationen unter linearen Konzeptentfernung speichern | 在线性概念删除下保留任务相关信息 2506.10703v1 |
Authors (3): Floris Holstege, Shauli Ravfogel, Bram Wouters
Modern neural networks often encode unwanted concepts alongside task-relevant information, leading to fairness and interpretability concerns. Existing post-hoc approaches can remove undesired concepts but often degrade useful signals. We introduce SPLICE (Simultaneous Projection for LInear concept removal and Covariance prEservation), which eliminates sensitive concepts from representations while exactly preserving their covariance with a target label. SPLICE achieves this via an oblique projection that “splices out” the unwanted direction yet protects important label correlations. Theoretically, it is the unique solution that removes linear concept predictability and maintains target covariance with minimal embedding distortion. Empirically, SPLICE outperforms baselines on benchmarks such as Bias in Bios and Winobias, removing protected attributes while minimally damaging main-task information.
现代神经网络往往将不必要的概念与任务相关的信息编码起来,从而导致对公平和可解释性的关切。 现有的热后方法可以消除不理想的概念,但往往会降低有用的信号。 我们引入了 SPLICE - 同步预测, 用于Learar 概念的清除和共性观测, 从而消除敏感概念的表达方式, 同时完全保持与目标标签的共性。 SPLICE通过模糊的预测实现这一点, 即“ 防止” 不必要的方向, 同时又保护重要的标签关联性。 理论上, 这是消除线性概念可预测性并维持目标变量, 且最小的嵌入扭曲的独特解决方案 。 简而言之, SPLICE 超越了生物和维诺比亚等基准的基线, 去除受保护的属性, 同时又尽可能减少主要任务信息 。
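The two constraints named in the abstract (zero covariance with the concept, unchanged covariance with the target label) can be met by a rank-one oblique projection. The sketch below is one simple construction that satisfies both constraints. It is not claimed to be the paper's closed form, and it makes no attempt at the minimal-distortion property. The synthetic data and all variable names are assumptions for illustration.

```python
import numpy as np


def oblique_projection(cov_xz, cov_xy):
    """Rank-one oblique projection P with P @ cov_xz = 0 and P @ cov_xy = cov_xy."""
    c = np.asarray(cov_xz, dtype=float)
    t = np.asarray(cov_xy, dtype=float)
    # Part of c orthogonal to t, so that u @ t == 0 and the label direction is untouched.
    u = c - t * (t @ c) / (t @ t)
    return np.eye(c.size) - np.outer(c, u) / (u @ c)


rng = np.random.default_rng(0)
n, d = 2000, 5
z = rng.normal(size=n)            # hypothetical concept to remove
y = 0.5 * z + rng.normal(size=n)  # hypothetical task label, correlated with z
x = (rng.normal(size=(n, d))
     + np.outer(z, rng.normal(size=d))
     + np.outer(y, rng.normal(size=d)))

xc = x - x.mean(axis=0)
cov_xz = xc.T @ (z - z.mean()) / n
cov_xy = xc.T @ (y - y.mean()) / n

P = oblique_projection(cov_xz, cov_xy)
x_clean = xc @ P.T
print(np.abs(x_clean.T @ (z - z.mean()) / n).max())           # concept covariance after projection
print(np.abs(x_clean.T @ (y - y.mean()) / n - cov_xy).max())  # change in label covariance
```

Both printed values should be at floating-point noise level. An orthogonal projection along `cov_xz` alone would also zero the first quantity but would generally alter the second, which is the motivation for an oblique projection.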
Article 84
Title@2025-06-12 (4): SNR and Resource Adaptive Deep JSCC for Distributed IoT Image Classification
Title: SNR and Resource Adaptive Deep JSCC for Distributed IoT Image Classification | SNR und Resource Adaptive Deep JSCC für verteilte IoT-Bildklassifikation | SNR与资源适应性深部JSC,用于分布的IoT图像分类 2506.10699v1 |
Authors (2): Ali Waqas, Sinem Coleri
Sensor-based local inference at IoT devices faces severe computational limitations, often requiring data transmission over noisy wireless channels for server-side processing. To address this, split-network Deep Neural Network (DNN) based Joint Source-Channel Coding (JSCC) schemes are used to extract and transmit relevant features instead of raw data. However, most existing methods rely on fixed network splits and static configurations, lacking adaptability to varying computational budgets and channel conditions. In this paper, we propose a novel SNR- and computation-adaptive distributed CNN framework for wireless image classification across IoT devices and edge servers. We introduce a learning-assisted intelligent Genetic Algorithm (LAIGA) that efficiently explores the CNN hyperparameter space to optimize network configuration under given FLOPs constraints and a given SNR. LAIGA intelligently discards infeasible network configurations that exceed the computational budget at the IoT device. It also benefits from Random Forest-based learning assistance to avoid a thorough exploration of the hyperparameter space and to induce application-specific bias in candidate optimal configurations. Experimental results demonstrate that the proposed framework outperforms fixed-split architectures and existing SNR-adaptive methods, especially under low SNR and limited computational resources. We achieve a 10% increase in classification accuracy compared to the existing JSCC-based SNR-adaptive multilayer framework at an SNR as low as -10 dB across a range of available computational budgets (1M to 70M FLOPs) at the IoT device.
物联网(IoT)设备上基于传感器的本地推理面临严重的计算限制,往往需要通过有噪声的无线信道传输数据,交由服务器端处理。为此,人们采用基于分割式深度神经网络(DNN)的联合信源信道编码(JSCC)方案来提取并传输相关特征,而不是原始数据。然而,现有方法大多依赖固定的网络分割和静态配置,无法适应变化的计算预算和信道条件。本文提出一种新的信噪比(SNR)与计算量自适应的分布式 CNN 框架,用于跨 IoT 设备和边缘服务器的无线图像分类。我们引入一种学习辅助的智能遗传算法(LAIGA),在给定 FLOPs 约束和给定 SNR 下高效探索 CNN 超参数空间,以优化网络配置。LAIGA 会智能地舍弃超出 IoT 设备计算预算的不可行网络配置。它还借助基于随机森林的学习辅助,避免对超参数空间进行穷尽式探索,并在候选最优配置中引入面向具体应用的偏置。实验结果表明,所提框架优于固定分割架构和现有的 SNR 自适应方法,在低 SNR 和计算资源有限的情况下尤为明显。在 SNR 低至 -10 dB 时,在 IoT 设备的一系列可用计算预算(1M 到 70M FLOPs)下,与现有基于 JSCC 的 SNR 自适应多层框架相比,我们的分类准确率提高了 10%。
Article 85
Title@2025-06-12 (4): A hierarchical approach for assessing the vulnerability of tree-based classification models to membership inference attack
Title: A hierarchical approach for assessing the vulnerability of tree-based classification models to membership inference attack | Ein hierarchischer Ansatz zur Bewertung der Verwundbarkeit baumbasierter Klassifikationsmodelle für den Schlussfolgerungsangriff der Mitgliedschaft | 评估树树分类模式易受成员推推推攻击的脆弱性的等级办法 2502.09396v2 |
Authors (2): Richard J. Preen, Jim Smith
Machine learning models can inadvertently expose confidential properties of their training data, making them vulnerable to membership inference attacks (MIA). While numerous evaluation methods exist, many require computationally expensive processes, such as training multiple shadow models. This article presents two new complementary approaches for efficiently identifying vulnerable tree-based models: an ante-hoc analysis of hyperparameter choices and a post-hoc examination of trained model structure. While these new methods cannot certify whether a model is safe from MIA, they provide practitioners with a means to significantly reduce the number of models that need to undergo expensive MIA assessment through a hierarchical filtering approach. More specifically, it is shown that the rank order of disclosure risk for different hyperparameter combinations remains consistent across datasets, enabling the development of simple, human-interpretable rules for identifying relatively high-risk models before training. While this ante-hoc analysis cannot determine absolute safety since this also depends on the specific dataset, it allows the elimination of unnecessarily risky configurations during hyperparameter tuning. Additionally, computationally inexpensive structural metrics serve as indicators of MIA vulnerability, providing a second filtering stage to identify risky models after training but before conducting expensive attacks. Empirical results show that hyperparameter-based risk prediction rules can achieve high accuracy in predicting the most at risk combinations of hyperparameters across different tree-based model types, while requiring no model training. Moreover, target model accuracy is not seen to correlate with privacy risk, suggesting opportunities to optimise model configurations for both performance and privacy.
机器学习模型可能无意中暴露其培训数据的保密性质,使其容易受到会员推价攻击(MIA) 。虽然存在许多评价方法,但许多评价方法需要计算昂贵的过程,例如培训多个影子模型。本篇文章提出了两种新的补充方法,以有效识别脆弱的树型模型:超光谱选择的抗热分析和经过培训的模型结构的后热检查。虽然这些新方法无法证明模型是否安全地来自MIA,但它们为从业者提供了一种手段,以大幅减少需要通过等级过滤法进行昂贵的隐私权评估的模型数量。更具体地说,这表明不同超光度组合的披露风险等级顺序在数据集之间仍然一致,从而得以制定简单、人为互换规则,在培训前确定相对高风险的模式。虽然这种超光谱分析不能确定绝对安全性,因为这也取决于具体的数据集,但允许在超光光度调整期间消除不必要的模型配置。此外,计算价格低廉的结构指标是MIA脆弱性的指标,提供第二个过滤阶段,用以在进行高度预测之前确定风险性模型,同时在进行高风险预测之前进行高风险预测。
Article 86
Title@2025-06-12 (4): FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems
Title: FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems | FedRAG: Ein Rahmen für Systeme der Feinsteuerung von Retrieval-Augmented Generation | FFRAG: 微调取回系统框架 2506.09200v2 |
Authors (8): Val Andrei Fajardo, David B. Emerson, Amandeep Singh, Veronica Chatrath, Marcelo Lotif, Ravi Theja, Alex Cheung, Izuki Matsuba
Retrieval-augmented generation (RAG) systems have been shown to be effective in addressing many of the drawbacks of relying solely on the parametric memory of large language models. Recent work has demonstrated that RAG systems can be improved via fine-tuning of their retriever and generator models. In this work, we introduce FedRAG, a framework for fine-tuning RAG systems across centralized and federated architectures. FedRAG supports state-of-the-art fine-tuning methods, offering a simple and intuitive interface and a seamless conversion from centralized to federated training tasks. FedRAG is also deeply integrated with the modern RAG ecosystem, filling a critical gap in available tools.
事实证明,回收-提款发电系统(RAG)对于解决完全依赖大型语言模型的参数记忆的许多缺点是有效的,最近的工作表明,通过微调其检索器和发电机模型,可以改进RAG系统。在这项工作中,我们引入FedRAG,这是对中央和联合结构的RAG系统进行微调的框架。 FedRAG支持最先进的微调方法,提供一个简单和直观的界面,并将中央化培训任务顺利地转换为联合培训任务。 FedRAG还与现代RAG生态系统紧密结合,填补了现有工具中的一个关键空白。
Article 87
Title@2025-06-12 (4): Scalable unsupervised feature selection via weight stability
Title: Scalable unsupervised feature selection via weight stability | Skalierbare unüberwachte Feature-Auswahl über Gewichtsstabilität | 通过重量稳定性进行可缩放的、不受监督的特征选择 2506.06114v2 |
Authors (2): Xudong Zhang, Renato Cordeiro de Amorim
Unsupervised feature selection is critical for improving clustering performance in high-dimensional data, where irrelevant features can obscure meaningful structure. In this work, we introduce the Minkowski weighted $k$-means++, a novel initialisation strategy for the Minkowski Weighted $k$-means. Our initialisation selects centroids probabilistically using feature relevance estimates derived from the data itself. Building on this, we propose two new feature selection algorithms, FS-MWK++, which aggregates feature weights across a range of Minkowski exponents to identify stable and informative features, and SFS-MWK++, a scalable variant based on subsampling. We support our approach with a theoretical guarantee under mild assumptions and extensive experiments showing that our methods consistently outperform existing alternatives. Our software can be found at https://github.com/xzhang4-ops1/FSMWK.
不受监督的特性选择对于提高高维数据的群集性能至关重要, 在高维数据的群集性能方面, 不相干的特点可以模糊有意义的结构。 在这项工作中, 我们引入了 Minkowski 加权 $k$- means++, 这是Minkowski Weighted $k$s- means的新型初始化战略。 我们的初始化使用数据本身得出的特征相关估计, 概率性地选择了小行星。 在此基础上, 我们提议了两种新的特征选择算法 FS- MWK++, 将一系列Minkowski Expents 的权重组合在一起, 以识别稳定和信息性特征。 SFS- MWK++, 是一种基于子抽样的可缩放变量。 我们支持我们的方法, 在温和假设和广泛实验下, 以理论保证我们的方法始终优于现有替代品。 我们的软件可以在 https://github.com/xzhang4-ops1/FSMWWK 中找到。
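To make the idea of probabilistic seeding with feature weights concrete, the sketch below combines k-means++-style sampling with a feature-weighted Minkowski distance. The distance form (sum over features of `(w_v * |x_v - c_v|) ** p`) follows earlier Minkowski weighted k-means work and is an assumption here. How the paper derives the weights from the data is not stated in the abstract, so the weights are simply passed in. All names are illustrative.

```python
import numpy as np


def weighted_minkowski(x, c, w, p):
    """Sum over features of (w_v * |x_v - c_v|) ** p."""
    return np.sum((w * np.abs(x - c)) ** p, axis=-1)


def seed_centroids(data, k, w, p=2.0, seed=0):
    """k-means++-style seeding using the weighted Minkowski distance."""
    rng = np.random.default_rng(seed)
    centroids = [data[rng.integers(len(data))]]
    while len(centroids) < k:
        # Distance of every point to its nearest already-chosen centroid.
        dist = np.min([weighted_minkowski(data, c, w, p) for c in centroids], axis=0)
        probs = dist / dist.sum()
        centroids.append(data[rng.choice(len(data), p=probs)])
    return np.array(centroids)


rng = np.random.default_rng(42)
# Hypothetical data: feature 0 separates three groups, feature 1 is noise.
data = np.column_stack([
    np.repeat([0.0, 5.0, 10.0], 50) + rng.normal(scale=0.3, size=150),
    rng.normal(scale=5.0, size=150),
])
weights = np.array([0.9, 0.1])  # assumed relevance estimates, not computed from data
print(seed_centroids(data, k=3, w=weights))
```

Down-weighting the noisy feature makes far-apart points along the informative feature more likely to be picked as seeds.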
Article 88
Title@2025-06-12 (4): STOAT: Spatial-Temporal Probabilistic Causal Inference Network
Title: STOAT: Spatial-Temporal Probabilistic Causal Inference Network | STOAT: Raum-Temporal Probabilistic Causal Inference Network | STOAT: 空间-临时概率因果关系网络 2506.09544v2 |
Authors (4): Yang Yang, Du Yin, Hao Xue, Flora Salim
Spatial-temporal causal time series (STC-TS) involve region-specific temporal observations driven by causally relevant covariates and interconnected across geographic or network-based spaces. Existing methods often model spatial and temporal dynamics independently and overlook causality-driven probabilistic forecasting, limiting their predictive power. To address this, we propose STOAT (Spatial-Temporal Probabilistic Causal Inference Network), a novel framework for probabilistic forecasting in STC-TS. The proposed method extends a causal inference approach by incorporating a spatial relation matrix that encodes interregional dependencies (e.g. proximity or connectivity), enabling spatially informed causal effect estimation. The resulting latent series are processed by deep probabilistic models to estimate the parameters of the distributions, enabling calibrated uncertainty modeling. We further explore multiple output distributions (e.g., Gaussian, Student’s-$t$, Laplace) to capture region-specific variability. Experiments on COVID-19 data across six countries demonstrate that STOAT outperforms state-of-the-art probabilistic forecasting models (DeepAR, DeepVAR, Deep State Space Model, etc.) in key metrics, particularly in regions with strong spatial dependencies. By bridging causal inference and geospatial probabilistic forecasting, STOAT offers a generalizable framework for complex spatial-temporal tasks, such as epidemic management.
时空因果时间序列(STC-TS)由各区域特有的时间观测构成,这些观测受因果相关的协变量驱动,并在地理空间或网络空间上相互关联。现有方法往往分别对空间和时间动态建模,并忽视因果驱动的概率预测,从而限制了预测能力。为此,我们提出 STOAT(Spatial-Temporal Probabilistic Causal Inference Network),这是一个用于 STC-TS 概率预测的新框架。该方法在因果推断方法的基础上引入空间关系矩阵,用于编码区域间的依赖关系(例如邻近性或连通性),从而实现考虑空间信息的因果效应估计。所得的潜在序列再由深度概率模型处理,以估计分布的参数,实现经过校准的不确定性建模。我们还探索了多种输出分布(例如高斯分布、Student’s-$t$ 分布、拉普拉斯分布),以刻画各区域特有的变异性。在六个国家的 COVID-19 数据上的实验表明,STOAT 在关键指标上优于最先进的概率预测模型(DeepAR、DeepVAR、Deep State Space Model 等),在空间依赖性强的区域尤其如此。通过连接因果推断与地理空间概率预测,STOAT 为疫情管理等复杂时空任务提供了一个可推广的框架。
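The choice among Gaussian, Student's-t, and Laplace output distributions amounts to choosing which negative log-likelihood the network's predicted parameters are scored against. The sketch below writes out the three per-observation losses with the standard library only. Function names and the parameterisation (location plus scale) are assumptions for illustration.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-density of N(mu, sigma^2) at y."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def laplace_nll(y, mu, b):
    """Negative log-density of Laplace(mu, b) at y."""
    return math.log(2 * b) + abs(y - mu) / b


def student_t_nll(y, mu, scale, nu):
    """Negative log-density of a location-scale Student's t with nu degrees of freedom."""
    z = (y - mu) / scale
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return -(log_norm - (nu + 1) / 2 * math.log1p(z * z / nu))


# A hypothetical outlying observation, five scale units from the predicted location.
for name, loss in [("gaussian", gaussian_nll(5.0, 0.0, 1.0)),
                   ("laplace", laplace_nll(5.0, 0.0, 1.0)),
                   ("student-t", student_t_nll(5.0, 0.0, 1.0, nu=3.0))]:
    print(name, round(loss, 3))
```

The heavier-tailed distributions penalise the outlier far less than the Gaussian, which is the usual reason to try them on regions with bursty counts.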
Article 89
Title@2025-06-12 (4): Learning hidden cascades via classification
Title: Learning hidden cascades via classification | Versteckte Kaskaden durch Klassifizierung lernen | 通过分类学习隐藏的级联 2505.11228v2 |
Authors (5): Derrick Gilchrist Edward Manoharan, Anubha Goel, Alexandros Iosifidis, Henri Hansen, Juho Kanniainen
The spreading dynamics in social networks are often studied under the assumption that individuals’ statuses, whether informed or infected, are fully observable. However, in many real-world situations, an individual’s final status, which is crucial for determining their potential to further spread the infection, remains unobservable. While this final status is hidden, intermediate indicators such as symptoms of infection are observable and provide important insights into the spread process. We propose a partial observability-aware Machine Learning framework to learn the characteristics of the spreading model. We term the method Distribution Classification, which utilizes the power of classifiers to infer the underlying transmission dynamics. We evaluate our method on two types of synthetic networks and extend the study to a real-world insider trading network. Results show that the method performs well, especially on complex networks with high cyclic connectivity, supporting its utility in analyzing real-world spreading phenomena where direct observation of individual statuses is not possible.
社会网络的蔓延动态往往在以下假设下研究:个人状况,无论是知情还是受感染,都是完全可见的;然而,在许多现实世界局势中,这种状况仍然是不可观察的,这对于确定个人进一步传播感染的可能性至关重要;虽然这种最终状况是隐蔽的,但传染症状等中间指标是可观测的,对传播过程提供了重要的洞察力;我们提议了一个部分易观察性机器学习框架,以了解传播模式的特点;我们使用了方法分配分类,利用分类者的力量来推断潜在的传播动态;我们评估了两种合成网络的方法,并将研究扩大到真实世界的内部交易网络;结果显示,该方法运行良好,特别是在具有高度循环连接的复杂网络上,支持其用于分析无法直接观察个人状况的真实世界传播现象。
Article 90
Title@2025-06-12 (4): Large Language Models for Detection of Life-Threatening Texts
Title: Large Language Models for Detection of Life-Threatening Texts | Große Sprachmodelle zur Erkennung lebensbedrohlicher Texte | 探测生命威胁文字的长语言大语言模型 2506.10687v1 |
Authors (3): Thanh Thi Nguyen, Campbell Wilson, Janis Dalins
Detecting life-threatening language is essential for safeguarding individuals in distress, promoting mental health and well-being, and preventing potential harm and loss of life. This paper presents an effective approach to identifying life-threatening texts using large language models (LLMs) and compares them with traditional methods such as bag of words, word embedding, topic modeling, and Bidirectional Encoder Representations from Transformers. We fine-tune three open-source LLMs (Gemma, Mistral, and Llama-2) in their 7B-parameter variants on different datasets, which are constructed with class-balanced, imbalanced, and extremely imbalanced scenarios. Experimental results demonstrate that LLMs perform strongly against traditional methods. More specifically, the Mistral and Llama-2 models are the top performers in both balanced and imbalanced data scenarios, while Gemma is slightly behind. We employ upsampling to deal with the imbalanced data scenarios and demonstrate that while this method benefits traditional approaches, it has less impact on LLMs. This study demonstrates the great potential of LLMs for real-world life-threatening language detection problems.
检测威胁生命的语言对于保护处于困境中的个人、促进心理健康和福祉以及防止潜在伤害和生命损失至关重要,本文件展示了使用大型语言模型(LLMs)确定威胁生命的文本的有效方法,并将这些文本与传统方法进行比较,如变异器的单词包、字嵌入、主题模型和双向编码显示器等传统方法。我们微小地运用高端技术处理不平衡的数据假设情况,并表明虽然这种方法有利于传统方法,但对Llama-2 系统的影响不大。这项研究表明LLMs在现实世界威胁生命的语言探测问题方面有很大的潜力。
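The upsampling step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes plain random oversampling (duplicating minority-class examples until every class matches the largest one); the abstract does not say which upsampling variant the authors used, and the function name `upsample` is hypothetical.

```python
import random


def upsample(texts, labels, seed=0):
    """Randomly duplicate minority-class examples until all classes are equal in size.

    Assumption: simple random oversampling with replacement; this is an
    illustration, not the procedure from the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels
```

Oversampling of this kind should be applied to the training split only, so that duplicated examples never leak into evaluation data.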
Article 91
Title@2025-06-12 (4): Croppable Knowledge Graph Embedding
Title: Croppable Knowledge Graph Embedding | Croppable Knowledge Graph Embedding | 可裁剪知识图嵌入 2407.02779v2 |
Authors (6): Yushan Zhu, Wen Zhang, Zhiqiang Liu, Mingyang Chen, Lei Liang, Huajun Chen
Knowledge Graph Embedding (KGE) is a common approach for using Knowledge Graphs (KGs) in AI tasks. The required embedding dimension depends on the application scenario. Requiring a new dimension means training a new KGE model from scratch, which increases cost and limits efficiency and flexibility. In this work, we propose a novel KGE training framework, MED. A single training run yields a croppable KGE model for multiple scenarios with different dimensional needs: sub-models of the required dimensions can be directly cropped and used without extra training. In MED, we propose a mutual learning mechanism to improve the low-dimensional sub-models and make the high-dimensional sub-models retain the low-dimensional sub-models’ capacity, an evolutionary improvement mechanism to help the high-dimensional sub-models master the triples that the low-dimensional sub-models cannot, and a dynamic loss weight to adaptively balance the multiple losses. Experiments on 4 KGE models across 4 standard KG completion datasets, 3 real-world scenarios using a large-scale KG, and an extension of MED to the BERT language model demonstrate its effectiveness, high efficiency, and flexible extensibility.
知识嵌入图(KGE)是AI任务中知识图(KGS)的一种常见方法。嵌入维度取决于应用情景。 要求新的维度意味着从零开始培训一个新的KGE模型,增加成本并限制效率和灵活性。 在这项工作中,我们提议了一个新的KGE培训框架MED。它允许一种培训为具有不同维度需要的多种情景获得可裁成的KGE模型。要求层面的子模型可以在不经过额外培训的情况下直接裁剪和使用。在MED中,我们提议了一个相互学习机制,以改善低维次模型,并使高维次模型保持低维次模型的能力,这是一种进化改进机制,以促进高维次模型掌握低维子模型无法掌握的三重模型,以及动态损失权重,以适应性平衡多重损失。在4个标准KG的完成数据集中对4个KGE模型进行实验,3个真实世界模型使用大型KG,并将MED扩展至BERT语言模型,展示其有效性、高效率和灵活性。
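To make the idea of a croppable embedding concrete, here is a minimal sketch. It assumes that cropping means keeping the leading dimensions of each embedding vector and uses the TransE score as one illustrative KGE scoring function; neither choice is confirmed by the abstract, and the helper names are hypothetical.

```python
import math


def crop(vector, dim):
    """Keep only the first `dim` coordinates (assumed cropping rule)."""
    return vector[:dim]


def transe_score(head, relation, tail):
    """TransE plausibility score: negative L2 distance of head + relation from tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def cropped_score(head, relation, tail, dim):
    """Score a triple with the sub-model obtained by cropping to `dim` dimensions."""
    return transe_score(crop(head, dim), crop(relation, dim), crop(tail, dim))
```

The same stored vectors serve every deployment size: a device with a small budget calls `cropped_score(..., dim=2)`, while a server uses the full dimension.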
Article 92
Title@2025-06-12 (4): Implicit Language Models are RNNs: Balancing Parallelization and Expressivity
Title: Implicit Language Models are RNNs: Balancing Parallelization and Expressivity | Implizite Sprachmodelle sind RNNs: Ausgleichende Parallelisierung und Expressivität | 隐含语言模式是RNNs:平衡平行和表达。 2502.07827v3 |
Authors (6): Mark Schöne, Babak Rahmani, Heiner Kremer, Fabian Falck, Hitesh Ballani, Jannes Gladrow
State-space models (SSMs) and transformers dominate the language modeling landscape. However, they are constrained to a lower computational complexity than classical recurrent neural networks (RNNs), limiting their expressivity. In contrast, RNNs lack parallelization during training, raising fundamental questions about the trade-off between parallelization and expressivity. We propose implicit SSMs, which iterate a transformation until convergence to a fixed point. Theoretically, we show that implicit SSMs implement the non-linear state transitions of RNNs. Empirically, we find that only approximate fixed-point convergence suffices, enabling the design of a scalable training curriculum that largely retains parallelization, with full convergence required only for a small subset of tokens. Our approach demonstrates superior state-tracking capabilities on regular languages, surpassing transformers and SSMs. We further scale implicit SSMs to natural language reasoning tasks and to pretraining of large-scale language models with up to 1.3B parameters on 207B tokens, representing, to our knowledge, the largest implicit model trained to date. Notably, our implicit models outperform their explicit counterparts on standard benchmarks. Our code is publicly available at http://github.com/microsoft/implicit_languagemodels .
国家空间模型(SSM)和变压器在语言建模格局中占主导地位,然而,它们被限制在比古典经常性神经网络(RNN)更低的计算复杂性下,限制其表达性。相比之下,在培训期间,RNNN缺乏平行性,对平行性和表达性之间的平衡提出了根本性问题。我们提出了隐含的 SMS,在向一个固定点趋同之前,它会转换。理论上,我们显示隐含的SMS实施了非线性状态的RNNs过渡。我们经常发现,只有近乎固定点的趋同,才能设计一个可扩展的培训课程,基本上保持平行化,只对一小部分象征物要求完全趋同。我们的方法显示了在常规语言、超变压器和SMM等上的高级国家跟踪能力。我们进一步将隐含的SMS扩大到自然语言推理任务,并预先培训大型语言模型,在207B标志上达到1.3B参数,据我们所知,是迄今所训练的最大隐含型模型。 值得注意的是,我们隐含的模型超越了他们在标准模模模/Anglimal标准基准上的明确对应者。我们代码是公开的。
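The phrase "iterate a transformation until convergence to a fixed point" in the abstract above can be sketched on a scalar toy problem. This is an assumption-laden illustration: the map `tanh(w_h * h + w_x * x)` and the names `implicit_step`, `w_h`, and `w_x` are hypothetical stand-ins for the model's actual layer, chosen so that the iteration is a contraction when `|w_h| < 1`.

```python
import math


def implicit_step(x, w_h=0.5, w_x=1.0, tol=1e-8, max_iters=100):
    """Solve h = tanh(w_h * h + w_x * x) by plain fixed-point iteration.

    Returns the approximate fixed point and the number of iterations used.
    Stopping early with a looser `tol` mimics approximate convergence.
    """
    h = 0.0
    for iteration in range(1, max_iters + 1):
        h_next = math.tanh(w_h * h + w_x * x)
        if abs(h_next - h) < tol:
            return h_next, iteration
        h = h_next
    return h, max_iters
```

Raising `tol` trades accuracy of the fixed point for fewer iterations, which is the knob an approximate-convergence training curriculum would turn.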
Article 93
Title@2025-06-12 (4): SR-Reward: Taking The Path More Traveled
Title: SR-Reward: Taking The Path More Traveled | SR-Reward: Den Weg mehr beschreiten | SR-奖励:走上更多旅行的道路 2501.02330v3 |
Authors (4): Seyed Mahdi B. Azad, Zahra Padar, Gabriel Kalweit, Joschka Boedecker
In this paper, we propose a novel method for learning reward functions directly from offline demonstrations. Unlike traditional inverse reinforcement learning (IRL), our approach decouples the reward function from the learner’s policy, eliminating the adversarial interaction typically required between the two. This results in a more stable and efficient training process. Our reward function, called SR-Reward, leverages successor representation (SR) to encode a state based on expected future states’ visitation under the demonstration policy and transition dynamics. By utilizing the Bellman equation, SR-Reward can be learned concurrently with most reinforcement learning (RL) algorithms without altering the existing training pipeline. We also introduce a negative sampling strategy to mitigate overestimation errors by reducing rewards for out-of-distribution data, thereby enhancing robustness. This strategy inherently introduces a conservative bias into RL algorithms that employ the learned reward. We evaluate our method on the D4RL benchmark, achieving competitive results compared to offline RL algorithms with access to true rewards and imitation learning (IL) techniques like behavioral cloning. Moreover, our ablation studies on data size and quality reveal the advantages and limitations of SR-Reward as a proxy for true rewards.
在本文中,我们提出了一种从离线演示中直接学习奖励功能的新颖方法。与传统的反向强化学习(IRL)不同,我们的方法将奖励功能从学习者的政策中分离出来,消除两者之间通常需要的对抗性互动。这导致一个更加稳定和高效的培训过程。我们的奖励功能,称为\ textit{SR-Rerward},利用后续代表(SR),根据示范政策和过渡动态,根据预期的未来国家访问来将一个国家编码。通过使用贝尔曼方程式,SR-Rward可以与大多数强化学习(RL)算法同时学习,而不改变现有的培训管道。我们还采用了负面的抽样战略,通过减少分配外数据的奖励来减少过高估计错误,从而增强稳健性。我们的奖励功能是将保守的偏差引入使用学到的奖励的RL算法。我们用D4RL基准的方法取得了竞争性的结果,而从离线的RL算法可以取得真正的奖赏和模仿学习(IL)技术,例如行为性克隆。此外,我们关于数据规模和质量奖赏的真正关系研究,揭示了数据的优势。
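The successor representation that the abstract above builds on satisfies a Bellman equation, $M(s, s') = \mathbb{1}[s = s'] + \gamma\, \mathbb{E}[M(s_{\text{next}}, s')]$, and can be learned by temporal-difference updates. The sketch below is a tabular illustration under stated assumptions: discrete states, transitions taken from demonstrations, and a hypothetical function name `learn_sr`. It does not reproduce how SR-Reward turns the representation into a reward.

```python
def learn_sr(transitions, n_states, gamma=0.9, alpha=0.1, sweeps=500):
    """Tabular TD learning of the successor representation M.

    `transitions` is a list of (state, next_state) pairs observed in
    demonstrations. M[s][j] estimates the discounted expected number of
    visits to state j when starting from state s.
    """
    M = [[0.0] * n_states for _ in range(n_states)]
    for _ in range(sweeps):
        for s, s_next in transitions:
            for j in range(n_states):
                indicator = 1.0 if j == s else 0.0
                # Bellman target: immediate visit plus discounted future visits.
                target = indicator + gamma * M[s_next][j]
                M[s][j] += alpha * (target - M[s][j])
    return M
```

States that demonstrations visit often accumulate large entries in M, which is the intuition behind rewarding "the path more traveled".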
Article 94
Title@2025-06-12 (4): Saturation Self-Organizing Map
Title: Saturation Self-Organizing Map | Karte zur Selbstorganisation der Sättigung | 饱和自组织地图 2506.10680v1 |
Authors (2): Igor Urbanik, Paweł Gajewski
Continual learning poses a fundamental challenge for neural systems, which often suffer from catastrophic forgetting when exposed to sequential tasks. Self-Organizing Maps (SOMs), despite their interpretability and efficiency, are not immune to this issue. In this paper, we introduce Saturation Self-Organizing Maps (SatSOM), an extension of SOMs designed to improve knowledge retention in continual learning scenarios. SatSOM incorporates a novel saturation mechanism that gradually reduces the learning rate and neighborhood radius of neurons as they accumulate information. This effectively freezes well-trained neurons and redirects learning to underutilized areas of the map.
持续学习给神经系统带来了根本性的挑战,神经系统在接连工作时往往会遭受灾难性的遗忘,而自我组织的地图尽管可以解释和效率高,但不能幸免于这一问题,在本文件中,我们引入了饱和自组织地图(SatSOM)——SOM(SatSOM)——扩展SOM(SOM),目的是在不断学习的情景中提高知识保留率。SatSOM包含一种新的饱和机制,在神经元积累信息时逐渐降低学习率和周边周边周边。这有效地冻结了受过良好训练的神经元,并将学习转向地图中未充分利用的地区。
Article 95
Title@2025-06-12 (4): VeriContaminated: Assessing LLM-Driven Verilog Coding for Data Contamination
Title: VeriContaminated: Assessing LLM-Driven Verilog Coding for Data Contamination | VeriContaminiert: Bewertung von LLM-getriebener Verilog Coding für Datenkontamination | 受污染:评估数据污染的LLM-Driven Verilog编码 2503.13572v3 |
Authors (8): Zeng Wang, Minghao Shao, Jitendra Bhandari, Likhitha Mankali, Ramesh Karri, Ozgur Sinanoglu, Muhammad Shafique, Johann Knechtel
Large Language Models (LLMs) have revolutionized code generation, achieving exceptional results on various established benchmarking frameworks. However, concerns about data contamination - where benchmark data inadvertently leaks into pre-training or fine-tuning datasets - raise questions about the validity of these evaluations. While this issue is known, limiting the industrial adoption of LLM-driven software engineering, hardware coding has received little to no attention regarding these risks. For the first time, we analyze state-of-the-art (SOTA) evaluation frameworks for Verilog code generation (VerilogEval and RTLLM), using established methods for contamination detection (CCD and Min-K% Prob). We cover SOTA commercial and open-source LLMs (CodeGen2.5, Minitron 4b, Mistral 7b, phi-4 mini, LLaMA-{1,2,3.1}, GPT-{2,3.5,4o}, Deepseek-Coder, and CodeQwen 1.5), in baseline and fine-tuned models (RTLCoder and Verigen). Our study confirms that data contamination is a critical concern. We explore mitigations and the resulting trade-offs for code quality vs fairness (i.e., reducing contamination toward unbiased benchmarking).
大型语言模型(LLMS)使代码生成发生了革命性变化,在各种既定基准框架(VerilogEval和RTLLM)上取得了特殊成果,然而,对数据污染的关注(基准数据无意中泄漏到培训前或微调数据集中)引起了关于这些评估有效性的问题,尽管这个问题众所周知,限制LLM驱动软件工程的工业采用,硬件编码在这些风险方面几乎没有引起任何注意。我们首次在基线和微调模型(RTLCD和Verigen)中分析了Verilog码生成的最新(SOTA)评价框架(VerilogEval和RTLLLM),我们的研究证实,SOTA的商业和开放源码软件(CodeGen2.5、Minitron 4b、Mistral 7b、phy-4 mini、LLAMA-1,2,3.1}、GPT-{2,3.5,4o}、Deepseekk-Coder和DCQwen)中,数据污染质量和标准化标准(RTLCoder和Vergen)是一个重要的关切。我们的研究确认数据污染是降低质量和标准。
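Of the two contamination detectors named in the abstract above, Min-K% Prob is simple enough to sketch: it averages the log-probabilities of the k% least likely tokens in a sample, on the intuition that text seen during training contains fewer very surprising tokens. The sketch assumes per-token log-probabilities have already been obtained from the model under test; the rounding rule for the token count and the function name are assumptions.

```python
import math


def min_k_prob(token_logprobs, k=0.2):
    """Average log-probability of the k fraction of tokens with the lowest values.

    A higher (less negative) score suggests the sample is more likely to
    have been seen during training. The decision threshold must be
    calibrated separately and is not part of this sketch.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    # Assumed rounding: at least one token, rounded up.
    count = max(1, math.ceil(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:count]
    return sum(lowest) / count
```

For a Verilog benchmark problem, the input would be the log-probabilities the evaluated LLM assigns to each token of the reference solution.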
Article 96
Title@2025-06-12 (4): Practical Improvements of A/B Testing with Off-Policy Estimation
Title: Practical Improvements of A/B Testing with Off-Policy Estimation | Praktische Verbesserungen der A/B-Prüfung mit Off-Policy-Abschätzung | 实际改进A/B测试和非政策性估算的非政策性A/B测试 2506.10677v1 |
Authors (3): Sakhi Otmane, Gilotte Alexandre, Rohde David
We address the problem of A/B testing, a widely used protocol for evaluating the potential improvement achieved by a new decision system compared to a baseline. This protocol segments the population into two subgroups, each exposed to one version of the system, and estimates the improvement as the difference between the measured effects. In this work, we demonstrate that the commonly used difference-in-means estimator, while unbiased, can be improved. We introduce a family of unbiased off-policy estimators that achieve lower variance than the standard approach. Within this family, we identify the estimator with the lowest variance. The resulting estimator is simple and offers substantial variance reduction when the two tested systems exhibit similarities. Our theoretical analysis and experimental results validate the effectiveness and practicality of the proposed method.
我们处理A/B测试问题,这是用来评估新决策系统与基线相比可能实现的改进的一个广泛应用的协议。这个协议将人口分成两个分组,每个分组都接触一个版本的系统,并将改进作为衡量效果之间的差别来估计。在这项工作中,我们证明,通常使用的中值差异估测器虽然没有偏见,但可以改进。我们引入了一个不带偏见的非政策性估计器组成的大家庭,其差异小于标准方法。在这个大家庭中,我们找出了差异最小的估测器。由此得出的估测器很简单,当两个测试的系统显示出相似之处时,其差异就会大为减少。我们的理论分析和实验结果证实了拟议方法的有效性和实用性。
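As a point of reference for the abstract above, the baseline it improves on, the difference-in-means estimator, can be written in a few lines. The sketch shows only this standard baseline and its usual variance estimate; the paper's off-policy estimators are not reproduced here, and the function name is hypothetical.

```python
from statistics import mean, variance


def difference_in_means(control, treatment):
    """Standard A/B estimate: mean(treatment) - mean(control).

    Also returns the usual variance estimate for two independent groups,
    var(treatment)/n_t + var(control)/n_c, using sample variances.
    """
    estimate = mean(treatment) - mean(control)
    estimate_variance = (
        variance(treatment) / len(treatment) + variance(control) / len(control)
    )
    return estimate, estimate_variance
```

Any lower-variance estimator is judged against the second return value: for the same traffic, a smaller variance means a tighter confidence interval around the measured improvement.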
Article 97
Title@2025-06-12 (4): RmGPT: A Foundation Model with Generative Pre-trained Transformer for Fault Diagnosis and Prognosis in Rotating Machinery
Title: RmGPT: A Foundation Model with Generative Pre-trained Transformer for Fault Diagnosis and Prognosis in Rotating Machinery | RmGPT: Ein Fundamentmodell mit generativem vortrainiertem Transformer für Fehlerdiagnose und Prognose in rotierenden Maschinen | RMGPT: 轮调机械系统失灵诊断和预预报的预培训前生成变形器基金会模型 2409.17604v2 |
Authors (8): Yilin Wang, Yifei Yu, Kong Sun, Peixuan Lei, Yuxuan Zhang, Enrico Zio, Aiguo Xia, Yuanxiang Li
In industry, the reliability of rotating machinery is critical for production efficiency and safety. Current methods of Prognostics and Health Management (PHM) often rely on task-specific models, which face significant challenges in handling diverse datasets with varying signal characteristics, fault modes and operating conditions. Inspired by advancements in generative pretrained models, we propose RmGPT, a unified model for diagnosis and prognosis tasks. RmGPT introduces a novel generative token-based framework, incorporating Signal Tokens, Prompt Tokens, Time-Frequency Task Tokens and Fault Tokens to handle heterogeneous data within a unified model architecture. We leverage self-supervised learning for robust feature extraction and introduce a next signal token prediction pretraining strategy, alongside efficient prompt learning for task-specific adaptation. Extensive experiments demonstrate that RmGPT significantly outperforms state-of-the-art algorithms, achieving near-perfect accuracy in diagnosis tasks and exceptionally low errors in prognosis tasks. Notably, RmGPT excels in few-shot learning scenarios, achieving 82% accuracy in 16-class one-shot experiments, highlighting its adaptability and robustness. This work establishes RmGPT as a powerful PHM foundation model for rotating machinery, advancing the scalability and generalizability of PHM solutions. Code is available at: https://github.com/Pandalin98/RmGPT.
在工业中,旋转机械的可靠性对于生产效率和安全至关重要。目前的故障预测与健康管理(PHM)方法往往依赖任务特定的模型,这些模型在处理具有不同信号特征、故障模式和运行条件的多样化数据集时面临重大挑战。受生成式预训练模型进展的启发,我们提出RmGPT,一个用于诊断和预测任务的统一模型。RmGPT引入了一个新颖的基于词元的生成式框架,包含Signal Tokens、Prompt Tokens、Time-Frequency Task Tokens和Fault Tokens,以便在统一的模型架构中处理异构数据。我们利用自监督学习进行稳健的特征提取,并引入下一信号词元预测的预训练策略,同时采用高效的提示学习进行任务特定的适配。广泛的实验表明,RmGPT显著优于最先进的算法,在诊断任务中达到接近完美的准确率,在预测任务中误差极低。值得注意的是,RmGPT在少样本学习场景中表现出色,在16类单样本实验中达到82%的准确率,突显了其适应性和稳健性。这项工作确立了RmGPT作为旋转机械强大的PHM基础模型的地位,提升了PHM解决方案的可扩展性和泛化能力。代码见:https://github.com/Pandalin98/RmGPT。
Article 98
Title@2025-06-12 (4): Automated Generation of Precedence Graphs in Digital Value Chains for Automotive Production
Title: Automated Generation of Precedence Graphs in Digital Value Chains for Automotive Production | Automatisierte Generierung von Precedence Graphen in digitalen Wertschöpfungsketten für die Automobilproduktion | 自动生成汽车生产数字价值链中的优先性图 2504.19835v2 |
Authors (2): Cornelius Hake, Christian Friedrich
This study examines the digital value chain in automotive manufacturing, focusing on the identification, software flashing, customization, and commissioning of electronic control units in vehicle networks. A novel precedence graph design is proposed to optimize this process chain using an automated scheduling algorithm, which combines structured data extraction from heterogeneous sources via natural language processing and classification techniques with mixed integer linear programming for efficient graph generation. The results show significant improvements in key metrics. The algorithm reduces the number of production stations equipped with expensive hardware and software to execute digital value chain processes, while also increasing capacity utilization through efficient scheduling and reduced idle time. Task parallelization is optimized, resulting in streamlined workflows and increased throughput. Compared to the traditional scheduling method, the automated approach has reduced preparation time by 50% and reduced scheduling activities, as it now takes two minutes to create the precedence graph. The flexibility of the algorithm’s constraints allows for vehicle-specific configurations while maintaining high responsiveness, eliminating backup stations and facilitating the integration of new topologies. Automated scheduling significantly outperforms manual methods in efficiency, functionality, and adaptability.
这项研究审查了汽车制造业的数字价值链,重点是车辆网络电子控制单位的识别、软件闪烁、定制和调试,建议采用新的优先图表设计,采用自动排期算法优化这一流程链,将通过自然语言处理和分类技术从不同来源按结构提取的数据与用于高效生成图形的混合整数线性编程相结合,结果显示关键指标有了重大改进;算法减少了配备了昂贵硬件和软件以执行数字价值链流程的生产站的数量,同时通过高效排期和减少闲置时间提高了能力利用率;优化了任务平行化,从而简化了工作流程和吞吐量。与传统的排期法相比,自动化方法减少了50%的排期,减少了排期活动,因为现在需要两分钟才能创建排期图。算法的局限性使得车辆特定配置具有灵活性,同时保持高度的响应能力,消除备份站,并促进新表的整合。自动排期表在效率、功能和适应性方面大大优于手工方法。
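A precedence graph, as discussed in the abstract above, records which process steps must finish before others may start, and from it one can read off which steps can run in parallel. The sketch below uses Python's standard `graphlib.TopologicalSorter` rather than the mixed integer linear programming formulation the authors describe, so it is a simplified stand-in. The step names come from the abstract, but the precedence relations between them in the usage note are assumed purely for illustration.

```python
from graphlib import TopologicalSorter


def parallel_stages(predecessors):
    """Group tasks into stages; all tasks within a stage can run in parallel.

    `predecessors` maps each task to the set of tasks that must finish first.
    Raises graphlib.CycleError if the precedence constraints are cyclic.
    """
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

With the assumed constraints `{"flashing": {"identification"}, "customization": {"flashing"}, "commissioning": {"flashing"}}`, the function yields three stages, the last of which runs customization and commissioning side by side.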
Article 99
Title@2025-06-12 (4): Diffusion-Free Graph Generation with Next-Scale Prediction
Title: Diffusion-Free Graph Generation with Next-Scale Prediction | Diffusionsfreie Graphengenerierung mit Next-Scale-Vorhersage | 下期期期预测的无传播图制作 2503.23612v2 |
Authors (6): Samuel Belkadi, Steve Hong, Marian Chen, Miruna Cretu, Charles Harris, Pietro Lio
Autoregressive models excel in efficiency and plug directly into the transformer ecosystem, delivering robust generalization, predictable scalability, and seamless workflows such as fine-tuning and parallelized training. However, they require an explicit sequence order, which contradicts the unordered nature of graphs. In contrast, diffusion models maintain permutation invariance and enable one-shot generation but require up to thousands of denoising steps and additional features for expressivity, leading to high computational costs. Inspired by recent breakthroughs in image generation, especially the success of visual autoregressive methods, we propose MAG, a novel diffusion-free graph generation framework based on next-scale prediction. By leveraging a hierarchy of latent representations, the model progressively generates scales of the entire graph without the need for explicit node ordering. Experiments on both generic and molecular graph datasets demonstrated the potential of this method, achieving inference speedups of up to three orders of magnitude over state-of-the-art methods, while preserving high-quality generation.
自动递减模型在效率方面非常出色,直接插入变压器生态系统,提供了强有力的通用、可预测的缩放性和无缝工作流程,如微调和平行培训。然而,它们需要明确的序列顺序,这与图形的无顺序性质相矛盾。相反,扩散模型保持了一闪不动的生成,但需要多达数千个分流步骤和额外特征来进行表达,从而导致高计算成本。在图像生成方面最近出现的突破,特别是视觉自动递增方法的成功,我们建议MAG,一个基于下尺度预测的新型无扩散图形生成框架。模型通过利用潜伏代表结构的等级,逐渐生成整个图形的尺度,而不需要明确的节点排序。对通用和分子图形数据集的实验显示了这种方法的潜力,在保持高质量生成的同时,在将高质量生成的同时,实现最高至三个级的超标度加速。
Article 100
Title@2025-06-12 (4): Logarithmic Smoothing for Adaptive PAC-Bayesian Off-Policy Learning
Title: Logarithmic Smoothing for Adaptive PAC-Bayesian Off-Policy Learning | Logarithmische Glättung für adaptives PAC-Bayesisches Off-Policy-Lernen | PAC-Bayesian适应性PAC-Bayesian离政策学习对数平滑 2506.10664v1 |
Authors (2): Maxime Haddouche, Otmane Sakhi
Off-policy learning serves as the primary framework for learning optimal policies from logged interactions collected under a static behavior policy. In this work, we investigate the more practical and flexible setting of adaptive off-policy learning, where policies are iteratively refined and re-deployed to collect higher-quality data. Building on the success of PAC-Bayesian learning with Logarithmic Smoothing (LS) in static settings, we extend this framework to the adaptive scenario using tools from online PAC-Bayesian theory. Furthermore, we demonstrate that a principled adjustment to the LS estimator naturally accommodates multiple rounds of deployment and yields faster convergence rates under mild conditions. Our method matches the performance of leading offline approaches in static settings, and significantly outperforms them when intermediate policy deployments are allowed. Empirical evaluations across diverse scenarios highlight both the advantages of adaptive data collection and the strength of the PAC-Bayesian formulation.
离政策学习是学习从静态行为政策下收集的登录互动中获取最佳政策的最佳政策的主要框架。 在这项工作中,我们调查适应性离政策学习的更实际和灵活环境,在这种环境中,对政策进行迭代改进和重新部署,以收集更高质量的数据。在PAC-Bayesian在静态环境中与对数平滑(LS)成功学习的基础上,我们利用在线PAC-Bayesian理论的工具,将这一框架扩大到适应情景。此外,我们还表明,对LS测算仪进行有原则的调整,自然会适应多轮部署,并在温和条件下产生更快的趋同率。我们的方法与静态环境中主要离线方法的绩效相匹配,并在允许中期政策部署时大大优于这些绩效。 不同情景的实证评估强调了适应性数据收集的优势和PAC-Bayesian公式的力度。
Article 101
Title@2025-06-12 (4): Pushing the Limits of Extreme Weather: Constructing Extreme Heatwave Storylines with Differentiable Climate Models
Title: Pushing the Limits of Extreme Weather: Constructing Extreme Heatwave Storylines with Differentiable Climate Models | Die Grenzen des extremen Wetters sprengen: Extreme Hitzewellen-Geschichten mit differenzierbaren Klimamodellen konstruieren | 推推极端天气的极限:用不同的气候模型构建极端热浪线 2506.10660v1 |
Authors (2): Tim Whittaker, Alejandro Di Luca
Understanding the plausible upper bounds of extreme weather events is essential for risk assessment in a warming climate. Existing methods, based on large ensembles of physics-based models, are often computationally expensive or lack the fidelity needed to simulate rare, high-impact extremes. Here, we present a novel framework that leverages a differentiable hybrid climate model, NeuralGCM, to optimize initial conditions and generate physically consistent worst-case heatwave trajectories. Applied to the 2021 Pacific Northwest heatwave, our method produces temperature anomalies up to 3.7 $^\circ$C above the most extreme member of a 75-member ensemble. These trajectories feature intensified atmospheric blocking and amplified Rossby wave patterns, both hallmarks of severe heat events. Our results demonstrate that differentiable climate models can efficiently explore the upper tails of event likelihoods, providing a powerful new approach for constructing targeted storylines of extreme weather under climate change.
了解极端天气事件可信的上层界限对于在气候变暖情况下进行风险评估至关重要。基于基于物理模型的大型组合的现有方法往往计算成本昂贵或缺乏模拟稀有、高影响极端所需的忠诚性。在这里,我们提出了一个新的框架,利用不同的混合气候模型NeuralGCM来优化初始条件和产生物理上一致的最差热波轨迹。在2021年太平洋西北热浪中,我们的方法产生了温度异常,超过75个成员组合中最极端的成员3.7 $circ$C。这些轨迹特征是大气阻塞加剧,并放大了严重热事件的罗斯比波模式。我们的结果表明,不同的气候模型能够有效地探索事件可能性的上尾部,为构建气候变化下极端天气的定向故事线提供了强有力的新办法。
Article 102
Title@2025-06-12 (4): CoRT: Code-integrated Reasoning within Thinking
Title: CoRT: Code-integrated Reasoning within Thinking | CoRT: Code-integrierte Vernunft im Denken | CORT: 思考中守则综合理由 2506.09820v2 |
Authors (11): Chengpeng Li, Zhengyang Tang, Ziniu Li, Mingfeng Xue, Keqin Bao, Tian Ding, Ruoyu Sun, Benyou Wang, Xiang Wang, Junyang Lin, Dayiheng Liu
Large Reasoning Models (LRMs) like o1 and DeepSeek-R1 have shown remarkable progress in natural language reasoning with long chain-of-thought (CoT), yet they remain inefficient or inaccurate when handling complex mathematical operations. Addressing these limitations through computational tools (e.g., computation libraries and symbolic solvers) is promising, but it introduces a technical challenge: Code Interpreter (CI) brings external knowledge beyond the model’s internal text representations, thus the direct combination is not efficient. This paper introduces CoRT, a post-training framework for teaching LRMs to leverage CI effectively and efficiently. As a first step, we address the data scarcity issue by synthesizing code-integrated reasoning data through Hint-Engineering, which strategically inserts different hints at appropriate positions to optimize LRM-CI interaction. We manually create 30 high-quality samples, upon which we post-train models ranging from 1.5B to 32B parameters, with supervised fine-tuning, rejection fine-tuning and reinforcement learning. Our experimental results demonstrate that Hint-Engineering models achieve 4% and 8% absolute improvements on DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-1.5B respectively, across five challenging mathematical reasoning datasets. Furthermore, Hint-Engineering models use about 30% fewer tokens for the 32B model and 50% fewer tokens for the 1.5B model compared with the natural language models. The models and code are available at https://github.com/ChengpengLi1003/CoRT.
o1 和 DeepSeek-R1 等大型推理模型(LRMs)在使用长思维链(CoT)的自然语言推理方面取得了显著进展,但在处理复杂的数学运算时,它们仍然效率低或不准确。通过计算工具(例如计算库和符号求解器)解决这些限制是大有希望的,但它带来了一个技术挑战:代码解释器(CI)引入了超出模型内部文本表示的外部知识,因此直接组合效率不高。本文介绍了CoRT,这是一个后训练框架,用于教会LRM有效且高效地利用CI。作为第一步,我们通过Hint-Engineering合成代码集成推理数据来解决数据稀缺问题,该方法在适当位置策略性地插入不同提示,以优化LRM与CI的交互。我们手工创建了30个高质量样本,并在此基础上对1.5B至32B参数的模型进行后训练,包括监督微调、拒绝微调和强化学习。实验结果表明,在五个具有挑战性的数学推理数据集上,Hint-Engineering模型在DeepSeek-R1-Distill-Qwen-32B和DeepSeek-R1-Distill-Qwen-1.5B上分别取得4%和8%的绝对提升。此外,与自然语言模型相比,Hint-Engineering模型在32B模型上少用约30%的词元,在1.5B模型上少用约50%的词元。模型和代码见 https://github.com/ChengpengLi1003/CoRT。
Article 103
Title@2025-06-12 (4): TSFM-Bench: A Comprehensive and Unified Benchmark of Foundation Models for Time Series Forecasting
Title: TSFM-Bench: A Comprehensive and Unified Benchmark of Foundation Models for Time Series Forecasting | TSFM-Bench: Ein umfassender und einheitlicher Benchmark von Basismodellen für die Zeitreihenprognose | TSFM-Bench:时间序列预测基础模型的全面统一基准 2410.11802v6 |
Authors (11): Zhe Li, Xiangfei Qiu, Peng Chen, Yihang Wang, Hanyin Cheng, Yang Shu, Jilin Hu, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Bin Yang
Time Series Forecasting (TSF) is a key functionality in numerous fields, such as financial investment, weather services, and energy management. Although increasingly capable TSF methods are emerging, many of them require domain-specific data collection and model training and do not generalize well when applied in other domains. Time Series Foundation Models (TSFMs) that are pre-trained on massive heterogeneous time series data aim to overcome these limitations. The prospects for generalizability have spurred the development of a new generation of TSFMs. This study proposes a benchmark, TSFM-Bench, to facilitate comprehensive and unified evaluation of TSFMs. TSFM-Bench covers a wide range of TSFMs, including those based on large language models and those pre-trained on time series data. TSFM-Bench supports multiple forecasting scenarios, including zero-shot, few-shot, and full-shot, enabling assessment across the full range of adaptation strategies. TSFM-Bench also provides standardized experimental protocols for critical evaluation processes such as dataset splitting, loading, normalization, and few-shot sampling, facilitating consistency and fairness. We report on an extensive evaluation of TSFMs across a diverse range of datasets spanning multiple domains and exhibiting varied statistical characteristics. Specifically, we identify the pros, cons, and inherent limitations of existing TSFMs, and we propose potential directions for new model designs.
时间序列预测(TSF)是金融投资、天气服务和能源管理等许多领域的关键功能,尽管技术战略框架方法越来越具有能力,但其中许多方法需要具体领域的数据收集和模型培训,在应用其他领域时没有很好地推广。时间系列基础模型(TSFM)预先培训了大规模不同时间序列数据,目的是克服这些限制。一般性的前景刺激了新一代技术战略框架的发展。本研究报告提出了一套基准,即TSFM-Bench,以促进对技术供应链的全面和统一评价。TSFM-Bench涵盖广泛的技术战略框架,包括基于大语言模型和事先接受过时间序列数据培训的模型。TSFM-Bench支持多种预测情景,包括零射、几发和全射,有利于对各种适应战略进行评估。TSFM-Bench还提出了一套标准化的示范性实验程序,用于关键评估进程,例如数据集的分离、装载、正常化和少发抽样取样,促进一致性和公平性。我们报告关于广泛评价大规模评估战略领域和多种统计模式的动态,我们提出了各种统计模式,并提出了各种统计模式,并提出了我们提出了各种统计模式,并展示了多种模式。
Article 104
Title@2025-06-12 (4): Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference
Title: Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference | Graph-dependent regret Bounds in Multi-Armed Bandits mit Interferenz | 多武装有干扰力的多武装强力土匪的图形- 依赖性 Regret Regret 界圈 2503.07555v2 |
Authors (3): Fateme Jamshidi, Mohammad Shahverdikondori, Negar Kiyavash
We study multi-armed bandits under network interference, where each unit’s reward depends on its own treatment and those of its neighbors in a given graph. This induces an exponentially large action space, making standard approaches computationally impractical. We propose a novel algorithm that uses the local graph structure to minimize regret. We derive a graph-dependent upper bound on cumulative regret that improves over prior work. Additionally, we provide the first lower bounds for bandits with arbitrary network interference, where each bound involves a distinct structural property of the graph. These bounds show that for both dense and sparse graphs, our algorithm is nearly optimal, with matching upper and lower bounds up to logarithmic factors. When the interference graph is unknown, a variant of our algorithm is Pareto optimal: no algorithm can uniformly outperform it across all instances. We complement our theoretical results with numerical experiments, showing that our approach outperforms the baseline methods.
在网络干扰下,我们研究多武装强盗,每个单位的奖赏取决于自己的待遇和某个图表中邻居的待遇。这引出了一个指数大的行动空间,使得标准方法在计算上变得不切实际。我们建议了一个使用本地图形结构的新型算法,以最大限度地减少遗憾。我们从累积的遗憾中得出一个依靠图形的上限,从而改善先前的工作。此外,我们为强盗提供了第一个较低的下限,即任意网络干扰,每个捆绑都涉及图的特殊结构属性。这些界限表明,对于稠密和稀疏的图表来说,我们的算法几乎是最佳的,与对数因素相匹配。当干扰图为未知时,我们的算法的变式是Pareto最佳的:没有一种算法可以在所有实例中一致地超越它。我们用数字实验来补充我们的理论结果,表明我们的方法比基线方法要好。
Article 105
Title@2025-06-12 (4): Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes
Title: Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes | Robuste, unüberwachte Anpassung eines Spracherkennungsgeräts mit Entropie-Minimierungs- und Lautsprechercodes | 使用磁最小化和演讲人守则的演讲者演讲者 2506.10653v1 |
Authors (4): Rogier C. van Dalen, Shucong Zhang, Titouan Parcollet, Sourav Bhattacharya
Speech recognisers usually perform optimally only in a specific environment and need to be adapted to work well in another. For adaptation to a new speaker, there is often too little data for fine-tuning to be robust, and that data is usually unlabelled. This paper proposes a combination of approaches to make adaptation to a single minute of data robust. First, instead of estimating the adaptation parameters with cross-entropy on a single error-prone hypothesis or “pseudo-label”, this paper proposes a novel loss function, the conditional entropy over complete hypotheses. Using multiple hypotheses makes adaptation more robust to errors in the initial recognition. Second, a “speaker code” characterises a speaker in a vector short enough that it requires little data to estimate. On a far-field noise-augmented version of Common Voice, the proposed scheme yields a 20% relative improvement in word error rate on one minute of adaptation data, increasing on 10 minutes to 29%.
语音识别器通常只在特定环境中最优化地发挥作用, 并且需要在另一个环境中适应效果良好。 为了适应新扬声器, 通常只有太少的数据可以微调, 数据通常不贴标签。 本文提出将适应方法组合起来, 使数据达到单分钟的稳健度。 首先, 本文没有在单一的错误易变假设或“ 假名” 上用跨点滴子来估计适应参数, 而是提出一个新的损失功能, 即有条件的对完整假称的诱变。 使用多个假设使适应能力更能适应初始识别错误。 其次, “ 扩音码” 的特性使一个矢量短, 需要很少数据来估计。 在远方的常见声音版本中, 所拟议的方案在适应数据一分钟的字错误率上产生20%的相对改善, 增加10分钟到29 % 。
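The loss described in the abstract above, an entropy over complete hypotheses rather than cross-entropy on a single pseudo-label, can be sketched for an n-best list. The sketch assumes the hypothesis posterior is approximated by normalizing the recogniser's log-scores over the n-best list; the abstract does not state how the authors approximate it, and the function name is hypothetical.

```python
import math


def hypothesis_entropy(log_scores):
    """Entropy (in nats) of the distribution obtained by normalizing n-best log-scores.

    Low entropy means the recogniser is confident in one hypothesis;
    minimizing it during adaptation sharpens the posterior without
    committing to a single, possibly wrong, pseudo-label.
    """
    peak = max(log_scores)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(score - peak) for score in log_scores]
    total = sum(weights)
    probabilities = [weight / total for weight in weights]
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)
```

In an actual adaptation loop this quantity would be computed with a differentiable framework so that its gradient can update the speaker code; the pure-Python version only shows what is being minimized.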
Article 106
Title@2025-06-12 (4): Data Shifts Hurt CoT: A Theoretical Study
Title: Data Shifts Hurt CoT: A Theoretical Study | Data Shifts Hurt CoT: Eine theoretische Studie | Data Shifts Hurt CoT:理论研究 2506.10647v1 |
Authors (3): Lang Yin, Debangshu Banerjee, Gagandeep Singh
Chain of Thought (CoT) has been applied to various large language models (LLMs) and proven to be effective in improving the quality of outputs. In recent studies, transformers have been proven to have absolute upper bounds in terms of expressive power, and consequently, they cannot solve many computationally difficult problems. However, empowered by CoT, transformers are proven to be able to solve some difficult problems effectively, such as the $k$-parity problem. Nevertheless, those works rely on two key assumptions: (1) identical training and testing distributions, and (2) corruption-free training data with correct reasoning steps. In the real world, these assumptions do not always hold. Although the risks of data shifts have caught attention, our work is, to the best of our knowledge, the first to rigorously study the exact harm caused by such shifts. Focusing on the $k$-parity problem, in this work we investigate the joint impact of two types of data shifts, distribution shifts and data poisoning, on the quality of trained models obtained by a well-established CoT decomposition. In addition to revealing a surprising phenomenon that CoT leads to worse performance on learning parity than directly generating the prediction, our technical results also give a rigorous and comprehensive explanation of the mechanistic reasons for such impact.
运用于各种大型语言模型(LLMS),并证明在提高产出质量方面行之有效。在最近的研究中,变压器被证明在表达力方面具有绝对的上限,因此无法解决许多计算上的困难问题。然而,由于CoT的赋权,变压器被证明能够有效解决一些棘手问题,例如美元差异问题。然而,这些工作依赖于两个紧迫的假设:(1) 相同的培训和测试分布,和(2) 无腐败的培训数据,并有正确的推理步骤。然而,在现实世界中,这些假设并非始终有效。尽管数据变化的风险已经引起注意,但我们的工作是首先严格研究这种变化对我们知识的最佳影响所造成的确切损害。在这项工作中,我们注重美元差异问题,我们调查两类数据变化的共同影响:分布变化和数据中毒,以及由完善的Cot分解法获得的经过培训的模式的质量。此外,我们还揭示出一种令人吃惊的现象,即CoT导致学习平等性表现比直接产生严格的预测结果。
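To make the $k$-parity problem in the abstract above concrete, the following minimal sketch computes the parity of a hidden subset of bits both directly and through a chain of intermediate partial parities. The function names and the left-to-right running-XOR decomposition are assumptions chosen for illustration; the abstract does not specify the exact CoT decomposition the paper analyses.

```python
from functools import reduce
from operator import xor

def k_parity(bits, support):
    """Direct prediction: XOR of the bits at the hidden index set `support`."""
    return reduce(xor, (bits[i] for i in support), 0)

def k_parity_cot(bits, support):
    """Hypothetical CoT-style decomposition: emit every running parity."""
    steps, acc = [], 0
    for i in support:
        acc ^= bits[i]          # one 2-parity per reasoning step
        steps.append(acc)
    return steps                # the last step equals the final answer

bits = [1, 0, 1, 1, 0, 1]
support = [0, 2, 5]             # k = 3 relevant coordinates (illustrative)
direct = k_parity(bits, support)
chain = k_parity_cot(bits, support)
```

In this toy setting a corrupted intermediate step would propagate to every later step, which is one intuition for why poisoned reasoning traces can hurt more than poisoned final labels.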
Article 107
Title@2025-06-12 (4): Flexible Tails for Normalizing Flows
Title: Flexible Tails for Normalizing Flows | Flexible Endlagen für Normalisierung von Strömungen | 流动正常化的灵活反面 2406.16971v2 |
Authors (2): Tennessee Hickling, Dennis Prangle
Normalizing flows are a flexible class of probability distributions, expressed as transformations of a simple base distribution. A limitation of standard normalizing flows is their difficulty in representing distributions with heavy tails, which arise in applications to both density estimation and variational inference. A popular current solution to this problem is to use a heavy-tailed base distribution. We argue this can lead to poor performance due to the difficulty of optimising neural networks, such as normalizing flows, under heavy-tailed input. We propose an alternative, “tail transform flow” (TTF), which uses a Gaussian base distribution and a final transformation layer which can produce heavy tails. Experimental results show this approach outperforms current methods, especially when the target distribution has large dimension or tail weight.
正常化流是一种灵活的概率分布类别,表现为简单基分布的转换。标准正常流的局限性代表重尾的分布,这种分布出现在密度估计和变异推论的应用中。当前最受欢迎的解决办法是使用重尾基分布。我们争辩说,这可能导致性能不佳,因为很难优化神经网络,例如,在重尾投入下流的正常化。我们提出了另一种选择,即“尾变流 ” (TTF),它使用高山基分布和最终变形层,可以产生重尾。实验结果显示,这种方法优于当前的方法,特别是在目标分布具有大尺寸或尾重量的情况下。
Article 108
Title@2025-06-12 (4): Structure and asymptotic preserving deep neural surrogates for uncertainty quantification in multiscale kinetic equations
Title: Structure and asymptotic preserving deep neural surrogates for uncertainty quantification in multiscale kinetic equations | Struktur und asymptotische Erhaltung tiefer neuronaler Surrogate zur Unsicherheitsquantifizierung in multiskaligen kinetischen Gleichungen | 用于多尺度动动能方程中不确定性量化的深海神经代代体 2506.10636v1 |
Authors (3): Wei Chen, Giacomo Dimarco, Lorenzo Pareschi
The high dimensionality of kinetic equations with stochastic parameters poses major computational challenges for uncertainty quantification (UQ). Traditional Monte Carlo (MC) sampling methods, while widely used, suffer from slow convergence and high variance, which become increasingly severe as the dimensionality of the parameter space grows. To accelerate MC sampling, we adopt a multiscale control variates strategy that leverages low-fidelity solutions from simplified kinetic models to reduce variance. To further improve sampling efficiency and preserve the underlying physics, we introduce surrogate models based on structure and asymptotic preserving neural networks (SAPNNs). These deep neural networks are specifically designed to satisfy key physical properties, including positivity, conservation laws, entropy dissipation, and asymptotic limits. By training the SAPNNs on low-fidelity models and enriching them with selected high-fidelity samples from the full Boltzmann equation, our method achieves significant variance reduction while maintaining physical consistency and asymptotic accuracy. The proposed methodology enables efficient large-scale prediction in kinetic UQ and is validated across both homogeneous and nonhomogeneous multiscale regimes. Numerical results demonstrate improved accuracy and computational efficiency compared to standard MC techniques.
使用传统的蒙特卡洛(MC)采样方法,虽然广泛使用,但缓慢趋同和高度差异,随着参数空间的维度增加而变得日益严重。为了加速MC采样,我们采取了一个多尺度的控制变异战略,利用简化动动能模型的低纤维溶液来减少差异。为了进一步提高采样效率和保护基础物理学,我们采用了基于结构和无药保存神经网络的代谢模型(SAPNNS)。这些深神经网络专门设计以满足关键物理特性,包括现实性、保护法、消化和无药限制。通过对SAPNN(S)进行低纤维模型培训,并用全波尔茨曼方程式中选定的高纤维化样本丰富这些网络,我们的方法在保持物理一致性和无药性精确性的同时实现了显著的差异减少。拟议的方法使得在运动UQ(SAPNNNS)中进行高效的大规模预测,并且经过同质和非遗传性、非遗传性计算方法的验证,从而展示了标准化和非遗传性结果。
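The variance-reduction idea behind control variates, mentioned in the abstract above, can be sketched on a toy problem. This is only an illustration under stated assumptions: `math.exp` stands in for an expensive high-fidelity quantity, the identity function stands in for a cheap low-fidelity model whose mean (0.5) is known exactly, and all function names are hypothetical. It does not reproduce the paper's multiscale or neural-surrogate construction.

```python
import math
import random
import statistics

def plain_mc(high, samples):
    """Standard Monte Carlo estimate of E[high(X)]."""
    return statistics.fmean(high(x) for x in samples)

def control_variate_mc(high, low, low_mean, samples):
    """Control-variate estimate of E[high(X)] using `low` with known mean."""
    h = [high(x) for x in samples]
    l = [low(x) for x in samples]
    h_bar, l_bar = statistics.fmean(h), statistics.fmean(l)
    cov = sum((a - h_bar) * (b - l_bar) for a, b in zip(h, l)) / (len(h) - 1)
    lam = cov / statistics.variance(l)      # estimated optimal coefficient
    return h_bar - lam * (l_bar - low_mean)

rng = random.Random(0)
plain, cv = [], []
for _ in range(200):                        # repeat to compare estimator spread
    xs = [rng.random() for _ in range(50)]  # X ~ Uniform(0, 1)
    plain.append(plain_mc(math.exp, xs))
    cv.append(control_variate_mc(math.exp, lambda x: x, 0.5, xs))
```

Because the two quantities are strongly correlated, the control-variate estimates cluster much more tightly around the true value $e - 1$ than the plain estimates do.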
Article 109
Title@2025-06-12 (4): Deploying Open-Source Large Language Models: A performance Analysis
Title: Deploying Open-Source Large Language Models: A performance Analysis | Bereitstellung von Open-Source-Großsprachenmodellen: Eine Leistungsanalyse | 部署开放源码大语言模式:业绩分析 2409.14887v4 |
Authors (3): Yannis Bendi-Ouis, Dan Dutartre, Xavier Hinaut
Since the release of ChatGPT in November 2022, large language models (LLMs) have seen considerable success, including in the open-source community, with many open-weight models available. However, the requirements to deploy such a service are often unknown and difficult to evaluate in advance. To facilitate this process, we conducted numerous tests at the Centre Inria de l’Université de Bordeaux. In this article, we propose a comparison of the performance of several models of different sizes (mainly Mistral and LLaMa) depending on the available GPUs, using vLLM, a Python library designed to optimize the inference of these models. Our results provide valuable information for private and public groups wishing to deploy LLMs, allowing them to evaluate the performance of different models based on their available hardware. This study thus contributes to facilitating the adoption and use of these large language models in various application domains.
自2022年11月公布ChattGPT以来,大型语言模式(LLMS)取得了相当大的成功,包括在开放源码社区,有许多开放量级模式,然而,部署这种服务的要求往往不为人所知,而且难以事先评估,为便利这一进程,我们在Bordeaux大学中心进行了多次测试,在本篇文章中,我们提议比较几种不同规模的模型(主要是Mistral和LLLaMa)的性能,这取决于现有GPUs的性能,使用VLLM(一个旨在优化这些模型推断的Python图书馆),我们的成果为希望部署LMS的私人和公共团体提供了宝贵的信息,使他们能够根据它们现有的硬件评估不同模型的性能,从而帮助在不同应用领域采用和使用这些大型语言模式。
Article 110
Title@2025-06-12 (4): Hessian Geometry of Latent Space in Generative Models
Title: Hessian Geometry of Latent Space in Generative Models | Hessische Geometrie des Latentenraums in generativen Modellen | 创创模型中的深层空间的赫西德几何学 2506.10632v1 |
Authors (4): Alexander Lobashev, Dmitry Guskov, Maria Larchenko, Mikhail Tamm
This paper presents a novel method for analyzing the latent space geometry of generative models, including statistical physics models and diffusion models, by reconstructing the Fisher information metric. The method approximates the posterior distribution of latent variables given generated samples and uses this to learn the log-partition function, which defines the Fisher metric for exponential families. Theoretical convergence guarantees are provided, and the method is validated on the Ising and TASEP models, outperforming existing baselines in reconstructing thermodynamic quantities. Applied to diffusion models, the method reveals a fractal structure of phase transitions in the latent space, characterized by abrupt changes in the Fisher metric. We demonstrate that while geodesic interpolations are approximately linear within individual phases, this linearity breaks down at phase boundaries, where the diffusion model exhibits a divergent Lipschitz constant with respect to the latent space. These findings provide new insights into the complex structure of diffusion model latent spaces and their connection to phenomena like phase transitions. Our source code is available at https://github.com/alobashev/hessian-geometry-of-diffusion-models.
本文介绍了一种新颖的方法,用以分析包括统计物理模型和传播模型在内的基因模型的潜在空间几何,方法是通过重建渔业信息度量来分析渔业信息度量,该方法近似了所生成样本潜在变量的后部分布,并用来学习记录分配功能,该功能为指数型家庭定义了渔业指标。提供了理论趋同保证,该方法在Ising和TASEP模型上得到了验证,超过了重建热力学数量的现有基线。该方法适用于扩散模型,该方法揭示了以渔业指标突变为特征的潜层空间阶段转变的分形结构。我们表明,虽然地深线间间相互作用在个别阶段中大致线性,但这种线性在阶段间断,在潜在空间方面,扩散模型显示利普施奇兹的常数。这些发现为扩散模型潜在空间的复杂结构及其与阶段过渡等现象的联系提供了新的见解。我们的源代码可在 https://github.com/alobashev/hessian-geometry-of-diffusion-models 获取。
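For reference, the link between the log-partition function and the Fisher metric that the abstract above relies on is a standard exponential-family identity. The notation here ($\theta$ for natural parameters, $T$ for sufficient statistics, $h$ for the base measure) is assumed for illustration and may differ from the paper's:

$$p(x \mid \theta) = h(x)\,\exp\!\big(\theta^\top T(x) - \log Z(\theta)\big), \qquad g_{ij}(\theta) = \frac{\partial^2 \log Z(\theta)}{\partial \theta_i\,\partial \theta_j} = \operatorname{Cov}_{\theta}\!\big(T_i(x),\, T_j(x)\big).$$

Learning $\log Z$ therefore determines the metric through its Hessian, which is where the "Hessian geometry" of the title comes from.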
Article 111
Title@2025-06-12 (4): Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs
Title: Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs | Zeitreihenprognosen als Begründung: Ein langsam denkender Ansatz mit verstärkten LLMs | 以时间序列预测作为理由:与强化的LLMs采取慢思考方式 2506.10630v1 |
Authors (7): Yucong Luo, Yitong Zhou, Mingyue Cheng, Jiahao Wang, Daoyu Wang, Tingyue Pan, Jintao Zhang
To advance time series forecasting (TSF), various methods have been proposed to improve prediction accuracy, evolving from statistical techniques to data-driven deep learning architectures. Despite their effectiveness, most existing methods still adhere to a fast-thinking paradigm, relying on extracting historical patterns and mapping them to future values as their core modeling philosophy, and lacking an explicit thinking process that incorporates intermediate time series reasoning. Meanwhile, emerging slow-thinking LLMs (e.g., OpenAI-o1) have shown remarkable multi-step reasoning capabilities, offering an alternative way to overcome these issues. However, prompt engineering alone presents several limitations, including high computational cost, privacy risks, and limited capacity for in-depth domain-specific time series reasoning. To address these limitations, a more promising approach is to train LLMs to develop slow-thinking capabilities and acquire strong time series reasoning skills. For this purpose, we propose Time-R1, a two-stage reinforcement fine-tuning framework designed to enhance the multi-step reasoning ability of LLMs for time series forecasting. Specifically, the first stage conducts supervised fine-tuning for warmup adaptation, while the second stage employs reinforcement learning to improve the model’s generalization ability. In particular, we design a fine-grained multi-objective reward specifically for time series forecasting, and then introduce GRIP (group-based relative importance for policy optimization), which leverages non-uniform sampling to further encourage and optimize the model’s exploration of effective reasoning paths. Experiments demonstrate that Time-R1 significantly improves forecast performance across diverse datasets.
为推动时间序列预测(TSF),提出了各种提高预测准确性的方法,从统计技术演变为数据驱动的深层学习结构。尽管这些方法具有效力,但大多数现有方法仍然坚持快速思维模式,以提取历史模式并将其映射成未来价值观作为其核心模型哲学,缺乏包含中间时间序列推理的明确思维过程。与此同时,新出现的低思维长效LMS(例如OpenAI-o1)显示出显著的多步推理能力,为克服这些问题提供了替代方法。然而,仅迅速工程本身就提出了若干限制,包括高计算成本、隐私风险和深度特定域时间序列推理能力有限。为了克服这些限制,一个更有希望的方法是培训LLMS发展缓慢的思维能力,并获得强大的时间序列推理技能。为此,我们提议了一个两阶段强化微调框架,目的是提高LMS模型在时间序列预测方面的多步推理能力。1 第一阶段对暖化适应进行了监督的微调整,第二阶段则利用强化学习来提高模型的精确性精确度预测,从而具体地推理学地推理,从而推理地推理地推算出一个不比性地推理的GRRRRRR的模型。
Article 112
Title@2025-06-12 (4): Task Adaptation from Skills: Information Geometry, Disentanglement, and New Objectives for Unsupervised Reinforcement Learning
Title: Task Adaptation from Skills: Information Geometry, Disentanglement, and New Objectives for Unsupervised Reinforcement Learning | Aufgabenanpassung aus Kompetenzen: Informationsgeometrie, Entflechtung und neue Ziele für unbeaufsichtigtes Weiterbildungslernen | 适应技能的任务:信息几何、分离和未受监督的加强学习新目标 2506.10629v1 |
Authors (6): Yucheng Yang, Tianyi Zhou, Qiang He, Lei Han, Mykola Pechenizkiy, Meng Fang
Unsupervised reinforcement learning (URL) aims to learn general skills for unseen downstream tasks. Mutual Information Skill Learning (MISL) addresses URL by maximizing the mutual information between states and skills but lacks sufficient theoretical analysis, e.g., how well its learned skills can initialize a downstream task’s policy. Our new theoretical analysis in this paper shows that the diversity and separability of learned skills are fundamentally critical to downstream task adaptation but MISL does not necessarily guarantee these properties. To complement MISL, we propose a novel disentanglement metric LSEPIN. Moreover, we build an information-geometric connection between LSEPIN and downstream task adaptation cost. For better geometric properties, we investigate a new strategy that replaces the KL divergence in information geometry with Wasserstein distance. We extend the geometric analysis to it, which leads to a novel skill-learning objective WSEP. It is theoretically justified to be helpful to downstream task adaptation and it is capable of discovering more initial policies for downstream tasks than MISL. We finally propose another Wasserstein distance-based algorithm PWSEP that can theoretically discover all optimal initial policies.
无人监督的强化学习(URL)旨在学习隐性下游任务的一般技能。 互信技能学习(MISL)通过最大限度地增加国家与技能之间的相互信息,处理URL,但缺乏足够的理论分析,例如,其所学技能在多大程度上能够启动下游任务的政策。 本文中的新理论分析表明,学习技能的多样性和分离性对于下游任务适应至关重要,但MISL不一定保证这些属性。 为了补充MISL, 我们提议了一个新的分解性指标LSEPIN。 此外, 我们在LSEPIN和下游任务适应成本之间建立了信息地理学联系。 为了更好的几何特性,我们调查了一种新的战略,用瓦塞斯坦距离取代KLL在信息几何方面的差异。 我们将地理学分析扩展到它,从而导致一个新的技能学习目标WSEP。 从理论上讲,帮助下游任务适应是有道理的,并且能够发现比MISL更多的下游任务初始政策。 我们最后建议了另一个瓦列斯特斯坦远程算算算法 PWESEP, 能够从理论上发现所有最佳初始政策。
Article 113
Title@2025-06-12 (4): Leveraging Low-rank Factorizations of Conditional Correlation Matrices in Graph Learning
Title: Leveraging Low-rank Factorizations of Conditional Correlation Matrices in Graph Learning | Leveraging Low-rank Factorizations of Conditional Correlation Matrices in Graph Learning | 在图表学习中利用条件性关联矩阵的低级别定级 2506.10628v1 |
Authors (4): Thu Ha Phi, Alexandre Hippert-Ferrer, Florent Bouchard, Arnaud Breloy
This paper addresses the problem of learning an undirected graph from data gathered at each node. Within the graph signal processing framework, the topology of such a graph can be linked to the support of the conditional correlation matrix of the data. The corresponding graph learning problem then scales with the square of the number of variables (nodes), which is usually problematic in high dimensions. To tackle this issue, we propose a graph learning framework that leverages a low-rank factorization of the conditional correlation matrix. To solve the resulting optimization problems, we derive the tools required to apply Riemannian optimization techniques to this particular structure. The proposal is then particularized to a low-rank constrained counterpart of the GLasso algorithm, i.e., the penalized maximum likelihood estimation of a Gaussian graphical model. Experiments on synthetic and real data show that a very efficient dimension-versus-performance trade-off can be achieved with this approach.
本文探讨从每个节点收集的数据中学习一个非方向图的问题。 在图形信号处理框架内, 此图的表层可以与数据有条件相关矩阵的支持联系起来。 相应的图层学习问题然后将变量数( 节点) 的方形按比例排列为通常在大范围内存在问题的变量数( 节点) 。 为了解决这一问题, 我们提议了一个图形学习框架, 利用条件相关矩阵的低等级因子化。 为了解决由此产生的优化问题, 我们获取了将里曼尼亚优化技术应用于这一特定结构所需的工具。 然后, 该提案被具体指定为GLasso 算法的低级别限制对应方, 即高斯图形模型的最大受限估计可能性。 实验合成和真实数据证据表明, 使用这种方法可以实现非常高效的维度- 反效果- 贸易。
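As background for the GLasso algorithm named in the abstract above, the standard (unconstrained-rank) graphical lasso estimates a sparse precision matrix by penalized Gaussian maximum likelihood. The symbols $S$ (sample covariance), $\Theta$ (precision matrix) and $\lambda$ (penalty weight) are assumed notation, and the paper's low-rank constraint on the conditional correlation matrix is not shown here:

$$\hat{\Theta} = \arg\max_{\Theta \succ 0}\; \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \lambda\,\lVert \Theta \rVert_{1}.$$

The nonzero off-diagonal entries of $\hat{\Theta}$ give the edges of the learned graph, and the number of free entries grows with the square of the number of nodes, which is the scaling issue the abstract refers to.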
Article 114
Title@2025-06-12 (4): Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation
Title: Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation | Pretraining Generative Flow Networks mit kostengünstigen Belohnungen für die molekulare Graphengenerierung | 为分子图生成建立具有不昂贵奖励的 预培训产生流动网络 2503.06337v4 |
Authors (5): Mohit Pandey, Gopeshh Subbaraj, Artem Cherkasov, Martin Ester, Emmanuel Bengio
Generative Flow Networks (GFlowNets) have recently emerged as a suitable framework for generating diverse and high-quality molecular structures by learning from rewards treated as unnormalized distributions. Previous works in this framework often restrict exploration by using predefined molecular fragments as building blocks, limiting the chemical space that can be accessed. In this work, we introduce Atomic GFlowNets (A-GFNs), a foundational generative model leveraging individual atoms as building blocks to explore drug-like chemical space more comprehensively. We propose an unsupervised pre-training approach using drug-like molecule datasets, which teaches A-GFNs about inexpensive yet informative molecular descriptors such as drug-likeness, topological polar surface area, and synthetic accessibility scores. These properties serve as proxy rewards, guiding A-GFNs towards regions of chemical space that exhibit desirable pharmacological properties. We further implement a goal-conditioned finetuning process, which adapts A-GFNs to optimize for specific target properties. In this work, we pretrain A-GFN on a subset of the ZINC dataset, and by employing robust evaluation metrics we show the effectiveness of our approach when compared to other relevant baseline methods for a wide range of drug design tasks. The code is accessible at https://github.com/diamondspark/AGFN.
最近,出现了一种利用个体原子作为组成部分来更全面地探索类似毒品的化学空间的基础性基因化模型(GFlowNets),通过学习作为非标准化分布分布的奖励,形成多样化和高质量分子结构的合适框架。本框架以前的工作往往限制勘探,使用预先定义的分子碎片作为构件,限制可以进入的化学空间。在这项工作中,我们引入了Atoro-GFlowNets(A-GFNs),这是利用个体原子作为建筑块来更全面地探索类似毒品的化学空间的基础性基因化模型。我们建议采用一种不受监督的训练前方法,使用类似毒品的分子数据集,教A-GFNs使用廉价但信息丰富的分子描述器,例如类似毒品、表层极地表面积和合成的可获取性分数。这些特性作为代理奖赏,指导A-GFNs到具有适当药性能的化学空间区域。我们进一步实施一个目标性调整过程,使A-GFNs适应于特定的目标性特性。在这项工作中,我们先将A-GFN系统用于ZINC数据基点的一个子组的一组,然后在使用可靠的基准/RAFDAFS范围上,在使用其他设计方法上显示我们可获取性基准设计范围时,我们的相关设计方法的范围。
Article 115
Title@2025-06-12 (4): SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis
Title: SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | SDialog: Ein Python-Toolkit für die Synthetische Dialog-Generierung und -Analyse | Sidialog:合成对话生成和分析的Python工具包 2506.10622v1 |
Authors (3): Sergio Burdisso, Esaú Villatoro-Tello, Petr Motlicek
The advancement of conversational AI systems relies on the availability of high-quality, flexible, and reproducible synthetic dialogues for training, evaluation, and benchmarking. SDialog is a modular, extensible Python toolkit designed to address the challenges of synthetic dialogue generation and analysis. By leveraging instruction-tuned Large Language Models (LLMs), SDialog provides abstractions for personas, orchestration, and scenario management, enabling the creation of realistic, diverse, and controllable conversational data for research and development. SDialog supports workflows such as multi-agent simulation and scenario-driven generation, and represents a step forward in the standardization of tools and frameworks for synthetic data generation, a crucial advancement for ensuring reproducibility in today’s fast-evolving research landscape.
Sdialog是一个模块化的、可推广的Python工具包,旨在应对合成对话生成和分析的挑战。 Sdialog通过利用受指导的大型语言模型(LLMs),为个人、交响和情景管理提供抽象信息,为研究和开发创造现实、多样和可控制的交谈数据。Sdialog支持多试模拟和情景生成等工作流程,并代表了合成数据生成工具和框架标准化的一步,这是确保当今快速变化的研究格局中再生的关键进步。
Article 116
Title@2025-06-12 (4): Assessing the Resilience of Automotive Intrusion Detection Systems to Adversarial Manipulation
Title: Assessing the Resilience of Automotive Intrusion Detection Systems to Adversarial Manipulation | Bewertung der Resilienz von Intrusions-Detektionssystemen für die Adversarialmanipulation | 评估汽车入侵探测系统对冲操纵的复原力 2506.10620v1 |
Authors (4): Stefano Longari, Paolo Cerracchio, Michele Carminati, Stefano Zanero
The security of modern vehicles has become increasingly important, with the controller area network (CAN) bus serving as a critical communication backbone for various Electronic Control Units (ECUs). The absence of robust security measures in CAN, coupled with the increasing connectivity of vehicles, makes them susceptible to cyberattacks. While intrusion detection systems (IDSs) have been developed to counter such threats, they are not foolproof. Adversarial attacks, particularly evasion attacks, can manipulate inputs to bypass detection by IDSs. This paper extends our previous work by investigating the feasibility and impact of gradient-based adversarial attacks performed with different degrees of knowledge against automotive IDSs. We consider three scenarios: white-box (attacker with full system knowledge), grey-box (partial system knowledge), and the more realistic black-box (no knowledge of the IDS’ internal workings or data). We evaluate the effectiveness of the proposed attacks against state-of-the-art IDSs on two publicly available datasets. Additionally, we study the effect of the adversarial perturbation on the attack impact and evaluate real-time feasibility by precomputing evasive payloads for timed injection based on bus traffic. Our results demonstrate that, besides attacks being challenging due to the automotive domain constraints, their effectiveness is strongly dependent on the dataset quality, the target IDS, and the attacker’s degree of knowledge.
现代车辆的安全已变得日益重要,控制区网络(CAN)公共汽车是各种电子控制单位的关键通信主干线。由于CAN缺乏强有力的安全措施,加上车辆的连通性日益增强,因此这些车辆很容易受到网络攻击。虽然已经开发了入侵探测系统(IDS)来对付这种威胁,但这些系统并非易犯的。反向攻击,特别是逃避攻击,可以操纵输入,绕过IDS的探测。本文扩展了我们以前的工作,调查了基于梯度的对冲攻击的可行性和影响,对汽车IDS进行了不同程度的知识攻击。我们考虑了三种情景:白箱(具有全系统知识的攻击者)、灰盒(部分系统知识)和更加现实的黑盒(对IDS的内部工作或数据一无所知)。我们评估了拟议对State of the state-the art-arted IDS的攻击的有效性。此外,我们研究了对攻击袭击的对抗性透视线对攻击影响的影响,并通过预先分析对实时可行性进行评估。我们考虑的是:白箱(具有全系统知识的攻击)、灰盒(部分系统知识)以及更现实的黑盒(没有时间定位数据系统),我们对攻击的定位数据系统测试质量限制。
Article 117
Title@2025-06-12 (4): Universal Neural Optimal Transport
Title: Universal Neural Optimal Transport | Universeller Neural Optimaler Verkehr | 通用神经优化运输 2212.00133v6 |
Authors (4): Jonathan Geuter, Gregor Kornhardt, Ingimar Tomasson, Vaios Laschos
Optimal Transport (OT) problems are a cornerstone of many applications, but solving them is computationally expensive. To address this problem, we propose UNOT (Universal Neural Optimal Transport), a novel framework capable of accurately predicting (entropic) OT distances and plans between discrete measures for a given cost function. UNOT builds on Fourier Neural Operators, a universal class of neural networks that map between function spaces and that are discretization-invariant, which enables our network to process measures of variable resolutions. The network is trained adversarially using a second, generating network and a self-supervised bootstrapping loss. We ground UNOT in an extensive theoretical framework. Through experiments on Euclidean and non-Euclidean domains, we show that our network not only accurately predicts OT distances and plans across a wide range of datasets, but also captures the geometry of the Wasserstein space correctly. Furthermore, we show that our network can be used as a state-of-the-art initialization for the Sinkhorn algorithm with speedups of up to $7.4\times$, significantly outperforming existing approaches.
最佳运输(OT)问题是许多应用的基石,但解决它们的费用是计算成本高昂的。为了解决这个问题,我们提议UNOT(通用神经最佳运输),这是一个能够准确预测特定成本功能离散措施(即机能性)OT距离和计划的新框架。UNOT建在Fourier神经操作器上,这是在功能空间之间绘制图的通用神经网络类别,并且是分解和异变的神经网络,使我们的网络能够处理变量分辨率的测量。这个网络是用第二个工具进行对抗性训练的,产生网络和自我监督的靴子损失。我们把UNOT置于一个广泛的理论框架内。我们通过在Euclidean和非EClidean域的实验,我们表明我们的网络不仅准确预测了在广泛数据集中的OT距离和计划,而且还正确地捕捉了瓦列斯坦空间的几何测量。此外,我们显示我们的网络可以作为Sinkhorn算法的尖端初始化方法,其速度高达7.4美元。
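The role of an initialization for the Sinkhorn algorithm, as described in the abstract above, can be illustrated with a small pure-Python sketch of entropic OT between two discrete measures. The `v_init` argument is a hypothetical stand-in for a predicted scaling vector (for example, one derived from a network's output); here it is simply seeded with an already converged solution, so this is not UNOT itself, and the tolerance, regularization strength and toy cost matrix are all assumptions.

```python
import math

def sinkhorn(mu, nu, cost, eps=0.5, v_init=None, tol=1e-9, max_iter=10_000):
    """Entropic OT via Sinkhorn scaling; returns (plan, v, iterations used)."""
    n, m = len(mu), len(nu)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    v = list(v_init) if v_init is not None else [1.0] * m
    u = [1.0] * n
    it = 0
    for it in range(1, max_iter + 1):
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Column marginals are exact after the v update; check the rows.
        err = sum(abs(u[i] * sum(K[i][j] * v[j] for j in range(m)) - mu[i])
                  for i in range(n))
        if err < tol:
            break
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    return plan, v, it

mu, nu = [0.5, 0.5], [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan, v_star, cold_iters = sinkhorn(mu, nu, cost)
_, _, warm_iters = sinkhorn(mu, nu, cost, v_init=v_star)  # warm start
```

A good initial guess for `v` lets the loop terminate after very few iterations, which is the kind of speedup a learned initialization aims for.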
Article 118
Title@2025-06-12 (4): Deep Learning-Based Digitization of Overlapping ECG Images with Open-Source Python Code
Title: Deep Learning-Based Digitization of Overlapping ECG Images with Open-Source Python Code | Deep Learning-based Digitalisierung von überlappenden EKG-Bildern mit Open-Source-Python-Code | 使用开放源码的 ECG 重叠图像的深学习数字化 2506.10617v1 |
Authors (4): Reza Karbasi, Masoud Rahimi, Abdol-Hossein Vahabie, Hadi Moradi
This paper addresses the persistent challenge of accurately digitizing paper-based electrocardiogram (ECG) recordings, with a particular focus on robustly handling single leads compromised by signal overlaps, a common yet under-addressed issue in existing methodologies. We propose a two-stage pipeline designed to overcome this limitation. The first stage employs a U-Net based segmentation network, trained on a dataset enriched with overlapping signals and fortified with custom data augmentations, to accurately isolate the primary ECG trace. The subsequent stage converts this refined binary mask into a time-series signal using established digitization techniques, enhanced by an adaptive grid detection module for improved versatility across different ECG formats and scales. Our experimental results demonstrate the efficacy of our approach. The U-Net architecture achieves an IoU of 0.87 for the fine-grained segmentation task. Crucially, our proposed digitization method yields superior performance compared to a well-established baseline technique across both non-overlapping and challenging overlapping ECG samples. For non-overlapping signals, our method achieved a Mean Squared Error (MSE) of 0.0010 and a Pearson Correlation Coefficient (rho) of 0.9644, compared to 0.0015 and 0.9366, respectively, for the baseline. On samples with signal overlap, our method achieved an MSE of 0.0029 and a rho of 0.9641, significantly improving upon the baseline’s 0.0178 and 0.8676. This work demonstrates an effective strategy to significantly enhance digitization accuracy, especially in the presence of signal overlaps, thereby laying a strong foundation for the reliable conversion of analog ECG records into analyzable digital data for contemporary research and clinical applications. The implementation is publicly available at this GitHub repository: https://github.com/masoudrahimi39/ECG-code.
本文讨论了将基于纸张的心电图(ECG)记录进行准确数字化的持久挑战,特别侧重于严格处理因信号重叠而受损的单项线索,这是现有方法中常见但处理不足的问题。我们提出一个旨在克服这一限制的两阶段管道。第一阶段使用基于U-Net的分解网络,该网的分解网络经过培训,该网的数据集经过重叠信号的丰富和定制数据增强,以准确隔离主要的ECG追踪。随后阶段利用既定的数字化技术,将这一经过改进的双面遮罩转换成一个时间序列信号,并辅之以一个适应性电网检测模块,以改善不同ECGF格式和尺度的多功能。我们的实验结果显示了我们的方法的有效性。U-Net结构为精细分解任务实现了0.87的IOU。 值得注意的是,我们提议的数字化方法在非重叠和具有挑战性的ECGH样本中,将这一改进的双面双面双面双层遮固的双层遮掩罩。 对于非重叠的信号,我们的方法在0.00和Pearson-Correxl的服务器的转换应用中, 将这一基基基比为0.9-0.00的基比的基数据, 提高了我们的数据基基基比为0.16的基比为0.9的基的基比的基比的基基基比的基基比提高了。
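The three evaluation metrics quoted in the abstract above (IoU, MSE, and the Pearson correlation coefficient) have standard definitions, sketched below in plain Python. The function names and the toy inputs are illustrative assumptions and are not taken from the authors' repository.

```python
import math

def iou(mask_a, mask_b):
    """Intersection over union of two flat binary masks."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return inter / union if union else 1.0

def mse(pred, target):
    """Mean squared error between two equally long signals."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def pearson(x, y):
    """Pearson correlation coefficient (rho) between two signals."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

reference = [0.0, 1.0, 2.0, 3.0]
digitized = [2 * s + 1 for s in reference]   # perfectly linearly related
```

Note that Pearson's rho is insensitive to gain and offset errors, whereas MSE is not, which is presumably why the abstract reports both.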
Article 119
Title@2025-06-12 (4): Non-stationary Online Learning for Curved Losses: Improved Dynamic Regret via Mixability
Title: Non-stationary Online Learning for Curved Losses: Improved Dynamic Regret via Mixability | Non-stationäres Online-Lernen für geschwungene Verluste: Dynamischer Streit durch Mischbarkeit verbessert | 非静止的曲线损失在线学习:通过混合改进动态迟缓 2506.10616v1 |
Authors (3): Yu-Jie Zhang, Peng Zhao, Masashi Sugiyama
Non-stationary online learning has drawn much attention in recent years. Despite considerable progress, dynamic regret minimization has primarily focused on convex functions, leaving the functions with stronger curvature (e.g., squared or logistic loss) underexplored. In this work, we address this gap by showing that the regret can be substantially improved by leveraging the concept of mixability, a property that generalizes exp-concavity to effectively capture loss curvature. Let $d$ denote the dimensionality and $P_T$ the path length of comparators that reflects the environmental non-stationarity. We demonstrate that an exponential-weight method with fixed-share updates achieves an $\mathcal{O}(d T^{1/3} P_T^{2/3} \log T)$ dynamic regret for mixable losses, improving upon the best-known $\mathcal{O}(d^{10/3} T^{1/3} P_T^{2/3} \log T)$ result (Baby and Wang, 2021) in $d$. More importantly, this improvement arises from a simple yet powerful analytical framework that exploits the mixability, which avoids the Karush-Kuhn-Tucker-based analysis required by existing work.
近些年来,非静止在线学习引起了人们的极大关注。尽管取得了相当大的进步,但动态最小化的遗憾主要集中于 convex 功能,使得这些功能的曲线(如平方或后勤损失)没有得到充分探讨。在这项工作中,我们通过利用混合化概念,表明这一遗憾可以大大改善,因为混合化概念是一种通用化的属性,可以有效捕捉损失曲线。让美元表示方位和美元方块的路径长度,反映环境非静止化的参照方块。我们证明,使用固定共享更新的指数加权法可以取得美元(d T1/3}(d T2/3}P_T_P_T$)。更重要的是,这种改进产生于简单而有力的分析框架,通过利用现有的模型(Baby和Wang, 2021, 美元),避免了目前这种组合化分析。
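The combination of exponential weights with fixed-share updates mentioned in the abstract above can be sketched for a finite set of experts under squared loss. This is a deliberately simplified illustration: the paper's method operates in a $d$-dimensional setting and exploits mixability, whereas this sketch uses a finite expert set, a weighted-average prediction, and the exp-concavity rate `eta=0.5` for squared loss on $[0, 1]$. The sharing rate `alpha` and all names are assumptions.

```python
import math

def fixed_share(expert_preds, outcomes, eta=0.5, alpha=0.05):
    """Exponential weights with fixed-share mixing; returns learner predictions.

    expert_preds[t][i] is expert i's prediction in [0, 1] at round t.
    """
    n = len(expert_preds[0])
    w = [1.0 / n] * n
    preds = []
    for p_t, y in zip(expert_preds, outcomes):
        preds.append(sum(wi * pi for wi, pi in zip(w, p_t)))
        # Exponential-weight update on the squared loss of each expert.
        w = [wi * math.exp(-eta * (pi - y) ** 2) for wi, pi in zip(w, p_t)]
        total = sum(w)
        w = [wi / total for wi in w]
        # Fixed-share step: redistribute a fraction alpha uniformly so that
        # the learner can recover quickly when the best expert changes.
        w = [(1 - alpha) * wi + alpha / n for wi in w]
    return preds

# Two constant experts (always 0, always 1); the environment switches halfway.
experts = [[0.0, 1.0]] * 100
outcomes = [0.0] * 50 + [1.0] * 50
predictions = fixed_share(experts, outcomes)
```

Without the fixed-share step the weight of the second expert would decay exponentially during the first phase, and the learner would need many rounds to recover after the switch; this tracking ability is what dynamic regret measures.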
Article 120
Title@2025-06-12 (4): Data Driven Diagnosis for Large Cyber-Physical-Systems with Minimal Prior Information
Title: Data Driven Diagnosis for Large Cyber-Physical-Systems with Minimal Prior Information | Datenangetriebene Diagnose für große Cyber-Physical-Systeme mit minimalen Vorinformationen | 具有最起码原始信息的大型网络物理系统数据驱动诊断 2506.10613v1 |
Authors (6): Henrik Sebastian Steude, Alexander Diedrich, Ingo Pill, Lukas Moddemann, Daniel Vranješ, Oliver Niggemann
Diagnostic processes for complex cyber-physical systems often require extensive prior knowledge in the form of detailed system models or comprehensive training data. However, obtaining such information poses a significant challenge. To address this issue, we present a new diagnostic approach that operates with minimal prior knowledge, requiring only a basic understanding of subsystem relationships and data from nominal operations. Our method combines a neural network-based symptom generator, which employs subsystem-level anomaly detection, with a new graph diagnosis algorithm that leverages minimal causal relationship information between subsystems, information that is typically available in practice. Our experiments with fully controllable simulated datasets show that our method includes the true causal component in its diagnosis set for 82% of all cases while effectively reducing the search space in 73% of the scenarios. Additional tests on the real-world Secure Water Treatment dataset showcase the approach’s potential for practical scenarios. Our results thus highlight our approach’s potential for practical applications with large and complex cyber-physical systems where limited prior knowledge is available.
复杂的网络物理体系的诊断过程往往需要以详细系统模型或全面培训数据的形式提供广泛的先前知识。然而,获取此类信息是一个重大挑战。为了解决这一问题,我们提出了一个新的诊断方法,它以最起码的先前知识运作,只需要对子系统关系和名义操作数据有基本了解。我们的方法将神经网络症状生成器(使用子系统一级异常探测法)与新的图表诊断算法(利用通常可以实际得到的子系统信息之间最小的因果关系信息)结合起来。我们用完全可控的模拟数据集进行的实验表明,我们的方法包括所有案例的正确因果关系,而其诊断集为82页c.,同时有效地减少了情景的搜索空间。对现实世界安全水处理数据集的额外测试展示了该方法在实际情景方面的潜力。因此,我们的结果突出表明了我们的方法在使用大型和复杂的网络物理系统方面的潜在实际应用潜力,而先前的知识有限。
Article 121
Title@2025-06-12 (4): Density Ratio Estimation with Conditional Probability Paths
Title: Density Ratio Estimation with Conditional Probability Paths | Schätzung des Dichte-Verhältnisses mit bedingten Wahrscheinlichkeitspfaden | 以有条件概率路径估算密度比率 2502.02300v3 |
Authors (5): Hanlin Yu, Arto Klami, Aapo Hyvärinen, Anna Korba, Omar Chehab
Density ratio estimation in high dimensions can be reframed as integrating a certain quantity, the time score, over probability paths which interpolate between the two densities. In practice, the time score has to be estimated based on samples from the two densities. However, existing methods for this problem remain computationally expensive and can yield inaccurate estimates. Inspired by recent advances in generative modeling, we introduce a novel framework for time score estimation, based on a conditioning variable. Choosing the conditioning variable judiciously enables a closed-form objective function. We demonstrate that, compared to previous approaches, our approach results in faster learning of the time score and competitive or better estimation accuracies of the density ratio on challenging tasks. Furthermore, we establish theoretical guarantees on the error of the estimated density ratio.
高维度的密度比率估计可被重新设定为结合一定数量、时间评分、超过两种密度之间相互交错的概率路径。 实际上,时间评分必须根据两种密度的样本来估计。然而,这一问题的现有方法仍然计算昂贵,可以得出不准确的估计数。在基因模型的最近进展的启发下,我们引入了一个基于调节变数的时间评分新框架。选择调节变数明智地选择了封闭式目标功能。我们证明,与以往方法相比,我们的方法可以更快地了解时间评分和竞争性或更好估计挑战性任务密度比率的准确性。此外,我们还为估计密度比率的错误建立了理论保障。
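The reformulation described in the abstract above follows from the fundamental theorem of calculus applied along a probability path. Writing $p_t$ for a path with endpoints $p_0$ and $p_1$ (the assignment of numerator and denominator to the endpoints is a convention assumed here), the log density ratio is the integral of the time score:

$$\log\frac{p_1(x)}{p_0(x)} = \int_0^1 \frac{\partial}{\partial t}\,\log p_t(x)\,\mathrm{d}t.$$

An estimator of the time score $\partial_t \log p_t(x)$ therefore yields an estimate of the density ratio by numerical integration over $t$.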
Article 122
Title@2025-06-12 (4): Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models
Title: Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models | Inferenz-Zeit-Dekomposition von Aktivierungen (ITDA): Ein skalierbarer Ansatz zur Interpretation großer Sprachmodelle | 触发(ITDA):解释大语言模式的可扩展办法 2505.17769v2 |
Authors (3): Patrick Leask, Neel Nanda, Noura Al Moubayed
Sparse autoencoders (SAEs) are a popular method for decomposing Large Language Model (LLM) activations into interpretable latents. However, due to their substantial training cost, most academic research uses open-source SAEs which are only available for a restricted set of models of up to 27B parameters. SAE latents are also learned from a dataset of activations, which means they do not transfer between models. Motivated by relative representation similarity measures, we introduce Inference-Time Decomposition of Activations (ITDA) models, an alternative method for decomposing language model activations. To train an ITDA, we greedily construct a dictionary of language model activations on a dataset of prompts, selecting those activations which were worst approximated by matching pursuit on the existing dictionary. ITDAs can be trained in just 1% of the time required for SAEs, using 1% of the data. This allowed us to train ITDAs on Llama-3.1 70B and 405B on a single consumer GPU. ITDAs can achieve similar reconstruction performance to SAEs on some target LLMs, but generally incur a performance penalty. However, ITDA dictionaries enable cross-model comparisons, and a simple Jaccard similarity index on ITDA dictionaries outperforms existing methods like CKA, SVCCA, and relative representation similarity metrics. ITDAs provide a cheap alternative to SAEs where computational resources are limited, or when cross-model comparisons are necessary. Code available at https://github.com/pleask/itda.
稀疏自编码器(SAE)是将大语言模型(LLM)激活分解为可解释潜变量的常用方法。然而,由于训练成本高昂,大多数学术研究使用开源的SAE,而这些SAE仅适用于一组有限的、参数量不超过27B的模型。SAE的潜变量还是从激活数据集中学习得到的,这意味着它们无法在模型之间迁移。受相对表示相似性度量的启发,我们引入了推理时激活分解(ITDA)模型,这是分解语言模型激活的一种替代方法。为了训练ITDA,我们在提示数据集上贪婪地构建语言模型激活的字典,选择那些用匹配追踪在现有字典上近似得最差的激活。ITDA的训练时间仅为SAE所需时间的1%,使用的数据也仅为1%。这使我们能够在单块消费级GPU上对Llama-3.1 70B和405B训练ITDA。ITDA在某些目标LLM上可以达到与SAE相近的重建性能,但通常会带来性能损失。不过,ITDA字典支持跨模型比较,基于ITDA字典的简单Jaccard相似性指数优于CKA、SVCCA和相对表示相似性度量等现有方法。在计算资源有限或需要跨模型比较时,ITDA是SAE的一种低成本替代方案。代码见 https://github.com/pleask/itda。
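The greedy dictionary construction described in the ITDA abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the fixed number of pursuit steps, and the rule "add an activation when its relative reconstruction error exceeds a threshold" are assumptions chosen for illustration, and activations are assumed to be non-zero vectors.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def matching_pursuit(x, atoms, n_steps):
    """Greedily approximate x with unit-norm atoms; return coefficients and residual."""
    residual = list(x)
    coeffs = {}
    for _ in range(n_steps):
        if not atoms:
            break
        scores = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        if abs(scores[k]) < 1e-12:
            break  # no atom explains any of the remaining residual
        coeffs[k] = coeffs.get(k, 0.0) + scores[k]
        residual = [r - scores[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


def build_dictionary(activations, n_steps=2, threshold=0.1):
    """Hypothetical selection rule: keep activations the current dictionary reconstructs poorly."""
    atoms = []
    for x in activations:
        _, residual = matching_pursuit(x, atoms, n_steps)
        relative_error = dot(residual, residual) / dot(x, x)
        if relative_error > threshold:
            atoms.append(normalize(x))
    return atoms


activations = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 3]]
dictionary = build_dictionary(activations)
print(len(dictionary))  # 3 atoms: the second and fourth vectors are already well approximated
```

Because the dictionary entries are themselves activations rather than learned directions, no gradient-based training is involved, which is consistent with the low training cost the abstract reports.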
Article 123
Title@2025-06-12 (4): Distortion-Aware Brushing for Reliable Cluster Analysis in Multidimensional Projections
Title: Distortion-Aware Brushing for Reliable Cluster Analysis in Multidimensional Projections | Distortion-Aware Bürsten für zuverlässige Clusteranalyse in mehrdimensionalen Projektionen | 为多层面预测中的可靠集集分析 2201.06379v2 |
Authors (7): Hyeon Jeon, Michaël Aupetit, Soohyun Lee, Kwon Ko, Youngtaek Kim, Ghulam Jilani Quadri, Jinwook Seo
Brushing is a common interaction technique in 2D scatterplots, allowing users to select clustered points within a continuous, enclosed region for further analysis or filtering. However, applying conventional brushing to 2D representations of multidimensional (MD) data, i.e., Multidimensional Projections (MDPs), can lead to unreliable cluster analysis due to MDP-induced distortions that inaccurately represent the cluster structure of the original MD data. To alleviate this problem, we introduce a novel brushing technique for MDPs called Distortion-aware brushing. As users perform brushing, Distortion-aware brushing corrects distortions around the currently brushed points by dynamically relocating points in the projection, pulling data points close to the brushed points in MD space while pushing distant ones apart. This dynamic adjustment helps users brush MD clusters more accurately, leading to more reliable cluster analysis. Our user studies with 24 participants show that Distortion-aware brushing significantly outperforms previous brushing techniques for MDPs in accurately separating clusters in the MD space and remains robust against distortions. We further demonstrate the effectiveness of our technique through two use cases: (1) conducting cluster analysis of geospatial data and (2) interactively labeling MD clusters.
在 2D 散射点中,浏览是一种常见的互动技术,使用户能够在连续、封闭的封闭区域中选择聚集点,以便进一步分析或过滤。然而,对多维(MD)数据的2D表示方式,即多维预测(MDP)数据(MDP)应用常规的刷刷,可能导致由于MDP引起的扭曲而导致群集分析不可靠,这种扭曲不准确地代表原始MD数据的群集结构。为了缓解这一问题,我们为MDP引进了一种叫作扭曲的刷刷新技术。随着用户在投影中通过动态移动点对目前刷出的点进行刷新,扭曲意识的刷新,将数据点拉近MD空间的刷新点,同时将数据拉离遥远的点。这种动态调整有助于用户更准确地刷刷MD组,从而导致更可靠的群集分析。我们与24名参与者进行的用户研究表明,扭曲觉觉的刷新方法大大偏离了MD 空间中精确分离组合的MD 空间中微调技术,并保持稳健防止扭曲。我们进一步通过两个互动案例展示了我们技术的有效性:进行MD 类集 标签分析。
Article 124
Title@2025-06-12 (4): TransMLA: Multi-Head Latent Attention Is All You Need
Title: TransMLA: Multi-Head Latent Attention Is All You Need | TransMLA: Mehrkopf-Latent-Achtung ist alles, was Sie brauchen | TransMLA: 多党领袖关注是你们需要的 2502.07864v5 |
Authors (6): Fanxu Meng, Pingzhi Tang, Xiaojuan Tang, Zengwei Yao, Xing Sun, Muhan Zhang
In this paper, we present TransMLA, a framework that seamlessly converts any GQA-based pre-trained model into an MLA-based model. Our approach enables direct compatibility with DeepSeek’s codebase, allowing these models to fully leverage DeepSeek-specific optimizations such as vLLM and SGLang. By compressing 93% of the KV cache in LLaMA-2-7B, TransMLA achieves a 10.6x inference speedup at an 8K context length while preserving meaningful output quality. Additionally, the model requires only 6 billion tokens for fine-tuning to regain performance on par with the original across multiple benchmarks. TransMLA offers a practical solution for migrating GQA-based models to the MLA structure. When combined with DeepSeek’s advanced features, such as FP8 quantization and Multi-Token Prediction, even greater inference acceleration can be realized.
在本文中,我们展示了 TransMLA, 这个框架可以无缝地将基于GQA的预培训模式转换成基于司法协助的模式。 我们的方法可以直接与DeepSeek的代码库兼容, 使这些模型能够充分利用DepSeek特有的优化, 如 vLLM 和SGlan。 通过压缩LalaMA-2-7B 中93%的 KV缓存, TransMLA 可以在保留有意义的产出质量的同时以8K 的上下文长度加速10.6x 的推断速度。 此外, 该模型只需要60亿个符号进行微调, 以恢复与最初的多个基准相同的业绩。 TransMLA 提供了将基于GQA 的模型迁移到司法协助结构的实用解决方案。 当与DeepSeek的先进功能, 如FP8 量化和多方向预测相结合时, 甚至可以实现更大的推论加速。
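To put the 93% KV-cache figure in perspective, the arithmetic below estimates the cache size for LLaMA-2-7B at an 8K context. The layer, head, and head-dimension values are the publicly documented LLaMA-2-7B configuration. The 16-bit cache precision, batch size of one, and reading "8K" as 8192 tokens are assumptions not stated in the abstract.

```python
# LLaMA-2-7B shape: 32 layers, 32 key/value heads, head dimension 128.
N_LAYERS = 32
N_KV_HEADS = 32
HEAD_DIM = 128
BYTES_PER_VALUE = 2   # assumption: fp16/bf16 cache entries
CONTEXT_LEN = 8192    # assumption: "8K" means 8192 tokens
COMPRESSION = 0.93    # fraction of the cache removed, from the abstract


def kv_cache_bytes(tokens):
    # The factor 2 accounts for storing both keys and values.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * tokens


full = kv_cache_bytes(CONTEXT_LEN)
compressed = full * (1 - COMPRESSION)
print(f"full cache:       {full / 2**30:.2f} GiB")        # 4.00 GiB
print(f"compressed cache: {compressed / 2**30:.2f} GiB")  # 0.28 GiB
```

Under these assumptions each sequence drops from about 4 GiB to about 0.28 GiB of cache, which suggests why decoding at long context lengths can speed up substantially.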
Article 125
Title@2025-06-12 (4): ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization
Title: ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization | ConfPO: Ausnutzen des politischen Modells Vertrauen für kritische Token-Auswahl in Präferenz-Optimierung | 召集:利用政策模范信心在优先最佳化中选择关键物优选标准 2506.08712v2 |
Authors (5): Hee Suk Yoon, Eunseop Yoon, Mark Hasegawa-Johnson, Sungwoong Kim, Chang D. Yoo
We introduce ConfPO, a method for preference learning in Large Language Models (LLMs) that identifies and optimizes preference-critical tokens based solely on the training policy’s confidence, without requiring any auxiliary models or compute. Unlike prior Direct Alignment Algorithms (DAAs) such as Direct Preference Optimization (DPO), which uniformly adjust all token probabilities regardless of their relevance to preference, ConfPO focuses optimization on the most impactful tokens. This targeted approach improves alignment quality while mitigating overoptimization (i.e., reward hacking) by using the KL divergence budget more efficiently. In contrast to recent token-level methods that rely on credit-assignment models or AI annotators, raising concerns about scalability and reliability, ConfPO is simple, lightweight, and model-free. Experimental results on challenging alignment benchmarks, including AlpacaEval 2 and Arena-Hard, demonstrate that ConfPO consistently outperforms uniform DAAs across various LLMs, delivering better alignment with zero additional computational overhead.
我们引入了ConpPO(ConfPO),这是在大语言模型中进行优先学习的一种方法,它仅基于培训政策的信心,而无需任何辅助模型或计算,而确定和优化偏爱关键符号。与以前直接对齐的算法(DAAs)(DPO)不同,前者统一调整所有象征性概率,而不论其与偏好的相关性如何,ConfPO(Fonpo)侧重于最有影响力的标码。这一有针对性的方法提高了匹配质量,同时通过使用 KL 差异预算更有效地减少过度优化(即奖励黑客),同时通过使用 KL 差异预算来降低过度优化(即奖励) 。与最近采用依赖信用分配模型或AI Antiers(AI ) 的象征性方法相比, ConfPO(Compo) 是简单、轻和无型的。关于具有挑战性的校准基准(包括AlpacaEval 2 和 Arena-Hard)的实验结果显示,Conpo(Confo) 始终优于不同LMS(DAA) 的一致的DA) ,与零的额外计算管理。
Article 126
Title@2025-06-12 (4): Generative Uncertainty in Diffusion Models
Title: Generative Uncertainty in Diffusion Models | Generative Unsicherheit bei Diffusionsmodellen | 扩散模型中产生的不确定性 2502.20946v2 |
Authors (6): Metod Jazbec, Eliot Wong-Toi, Guoxuan Xia, Dan Zhang, Eric Nalisnick, Stephan Mandt
Diffusion models have recently driven significant breakthroughs in generative modeling. While state-of-the-art models produce high-quality samples on average, individual samples can still be low quality. Detecting such samples without human inspection remains a challenging task. To address this, we propose a Bayesian framework for estimating generative uncertainty of synthetic samples. We outline how to make Bayesian inference practical for large, modern generative models and introduce a new semantic likelihood (evaluated in the latent space of a feature extractor) to address the challenges posed by high-dimensional sample spaces. Through our experiments, we demonstrate that the proposed generative uncertainty effectively identifies poor-quality samples and significantly outperforms existing uncertainty-based methods. Notably, our Bayesian framework can be applied post-hoc to any pretrained diffusion or flow matching model (via the Laplace approximation), and we propose simple yet effective techniques to minimize its computational overhead during sampling.
最近,传播模型在基因模型方面取得了重大突破。尽管最先进的模型平均产生高质量的样本,但单个样本的质量仍然很低。在未经人类检查的情况下检测这些样本仍然是一项艰巨的任务。为了解决这个问题,我们提议建立一个贝叶斯框架来估计合成样本的基因变异性。我们概述了如何使贝叶斯的推论在大型现代基因模型中切实可行,并引入一种新的语义可能性(在特征提取的潜质空间中进行评估)来应对高维样本空间构成的挑战。通过我们的实验,我们证明拟议的基因变异性有效地识别了质量差的样本,大大超越了现有的基于不确定性的方法。值得注意的是,我们的贝叶斯框架可以应用在热后应用任何未经训练的传播或流动匹配模型(通过拉帕热近光线),我们提出了在取样过程中尽量减少其计算间接费用的简单而有效的技术。
Article 127
Title@2025-06-12 (4): Size-adaptive Hypothesis Testing for Fairness
Title: Size-adaptive Hypothesis Testing for Fairness | Size-adaptive Hypothesenprüfung auf Fairness | 公平性假设测试 2506.10586v1 |
Authors (5): Antonio Ferrara, Francesco Cozzi, Alan Perotti, André Panisson, Francesco Bonchi
Determining whether an algorithmic decision-making system discriminates against a specific demographic typically involves comparing a single point estimate of a fairness metric against a predefined threshold. This practice is statistically brittle: it ignores sampling error and treats small demographic subgroups the same as large ones. The problem intensifies in intersectional analyses, where multiple sensitive attributes are considered jointly, giving rise to a larger number of smaller groups. As these groups become more granular, the data representing them becomes too sparse for reliable estimation, and fairness metrics yield excessively wide confidence intervals, precluding meaningful conclusions about potential unfair treatments. In this paper, we introduce a unified, size-adaptive, hypothesis-testing framework that turns fairness assessment into an evidence-based statistical decision. Our contribution is twofold. (i) For sufficiently large subgroups, we prove a Central-Limit result for the statistical parity difference, leading to analytic confidence intervals and a Wald test whose type-I (false positive) error is guaranteed at level $\alpha$. (ii) For the long tail of small intersectional groups, we derive a fully Bayesian Dirichlet-multinomial estimator; Monte-Carlo credible intervals are calibrated for any sample size and naturally converge to Wald intervals as more data becomes available. We validate our approach empirically on benchmark datasets, demonstrating how our tests provide interpretable, statistically rigorous decisions under varying degrees of data availability and intersectionality.
确定算法决策系统是否歧视特定人口,通常需要将公平性指标的单一点估计值与预先确定的阈值相比较。这种做法在统计上是脆弱的:它忽略了抽样误差,把小的人口分组与大的分组同等对待。在交叉分析中,问题更加严重,因为多个敏感属性被一并考虑,产生了更多规模更小的群体。随着这些群体变得更加细分,代表它们的数据变得过于稀疏,无法可靠地估计,公平性指标会产生过宽的置信区间,从而无法对潜在的不公平待遇得出有意义的结论。在本文中,我们引入了一个统一的、随样本量自适应的假设检验框架,将公平性评估转化为基于证据的统计决策。我们的贡献有两方面。(一)对于足够大的分组,我们证明了统计均等差异的中心极限结果,由此得到解析的置信区间和一个Wald检验,其第一类(假阳性)错误被保证控制在水平 $\alpha$。(二)对于小型交叉群体的长尾,我们推导出一个完全贝叶斯的Dirichlet-multinomial估计器;蒙特卡洛可信区间对任意样本量都经过校准,并随着数据增多自然收敛到Wald区间。我们在基准数据集上对方法进行了实证验证,展示了我们的检验如何在不同的数据可得性和交叉程度下给出可解释、统计上严谨的决策。
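The two regimes in the fairness abstract can be sketched side by side. This is a minimal illustration rather than the paper's procedure: the Wald part uses the textbook normal approximation for a difference of two proportions, and the Bayesian part assumes a symmetric Dirichlet prior over the four (group, outcome) cells. The function names, prior strength, and draw count are hypothetical.

```python
import math
import random


def wald_test_spd(pos_a, n_a, pos_b, n_b):
    """Large-sample test of H0: statistical parity difference (SPD) = 0."""
    p_a, p_b = pos_a / n_a, pos_b / n_b
    spd = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = spd / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    half_width = 1.959963984540054 * se         # 95% confidence interval
    return spd, (spd - half_width, spd + half_width), p_value


def bayes_spd_interval(pos_a, n_a, pos_b, n_b, prior=1.0, draws=20000, seed=0):
    """Monte Carlo 95% credible interval for SPD under a Dirichlet posterior."""
    rng = random.Random(seed)
    counts = [pos_a, n_a - pos_a, pos_b, n_b - pos_b]
    samples = []
    for _ in range(draws):
        # Independent Gamma draws are an unnormalized Dirichlet sample;
        # the normalizing constant cancels in the two ratios below.
        g = [rng.gammavariate(c + prior, 1.0) for c in counts]
        samples.append(g[0] / (g[0] + g[1]) - g[2] / (g[2] + g[3]))
    samples.sort()
    return samples[int(0.025 * draws)], samples[int(0.975 * draws) - 1]


# Large groups: the Wald interval is narrow enough to support a decision.
print(wald_test_spd(300, 1000, 240, 1000))
# Tiny intersectional groups: the credible interval stays honest about uncertainty.
print(bayes_spd_interval(3, 8, 1, 7))
```

With only 8 and 7 records per group, the credible interval spans zero even though the raw rates differ, which is the situation a bare point-estimate threshold would misjudge.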
Article 128
Title@2025-06-12 (4): MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark
Title: MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark | MAYA: Inkonsistenzen in generativem Passwort ansprechen Raten über einen einheitlichen Benchmark | MAYA:通过统一基准解决生成密码猜猜不一致问题 2504.16651v2 |
Authors (4): William Corrias, Fabio De Gaspari, Dorjan Hitaj, Luigi V. Mancini
Recent advances in generative models have led to their application in password guessing, with the aim of replicating the complexity, structure, and patterns of human-created passwords. Despite their potential, inconsistencies and inadequate evaluation methodologies in prior research have hindered meaningful comparisons and a comprehensive, unbiased understanding of their capabilities. This paper introduces MAYA, a unified, customizable, plug-and-play benchmarking framework designed to facilitate the systematic characterization and benchmarking of generative password-guessing models in the context of trawling attacks. Using MAYA, we conduct a comprehensive assessment of six state-of-the-art approaches, which we re-implemented and adapted to ensure standardization. Our evaluation spans eight real-world password datasets and covers an exhaustive set of advanced testing scenarios, totaling over 15,000 compute hours. Our findings indicate that these models effectively capture different aspects of human password distribution and exhibit strong generalization capabilities. However, their effectiveness varies significantly with long and complex passwords. Through our evaluation, sequential models consistently outperform other generative architectures and traditional password-guessing tools, demonstrating unique capabilities in generating accurate and complex guesses. Moreover, the diverse password distributions learned by the models enable a multi-model attack that outperforms the best individual model. By releasing MAYA, we aim to foster further research, providing the community with a new tool to consistently and reliably benchmark generative password-guessing models. Our framework is publicly available at https://github.com/williamcorrias/MAYA-Password-Benchmarking.
生成模型的最新进展使其被应用于密码猜测,目的是复现人类创建的密码的复杂性、结构和模式。尽管这些模型具有潜力,但以往研究中的不一致和不充分的评估方法妨碍了有意义的比较以及对其能力的全面、公正的理解。本文介绍了MAYA,一个统一的、可定制的、即插即用的基准框架,旨在便于在撒网式攻击的背景下对生成式密码猜测模型进行系统的刻画和基准测试。利用MAYA,我们对六种最先进的方法进行了全面评估,并对其进行了重新实现和调整以确保标准化。我们的评估涵盖八个真实世界密码数据集和一套详尽的高级测试场景,总计超过15,000个计算小时。我们的研究结果表明,这些模型有效地捕捉了人类密码分布的不同方面,并展示了强大的泛化能力。然而,对于长而复杂的密码,它们的有效性差异显著。在我们的评估中,序列模型始终优于其他生成式架构和传统的密码猜测工具,展示了在生成准确而复杂的猜测方面的独特能力。此外,各模型学到的不同密码分布使得多模型攻击成为可能,其表现优于最佳的单一模型。通过发布MAYA,我们旨在推动进一步研究,为社区提供一个新工具,以一致且可靠地对生成式密码猜测模型进行基准测试。我们的框架公开于 https://github.com/williamcorrias/MAYA-Password-Benchmarking。
Article 129
Title@2025-06-12 (4): Bandit Convex Optimisation
Title: Bandit Convex Optimisation | Bandit Convex Optimierung | Bandit Convex 优化优化 2402.06535v4 |
Authors (1): Tor Lattimore
Bandit convex optimisation is a fundamental framework for studying zeroth-order convex optimisation. This book covers the many tools used for this problem, including cutting plane methods, interior point methods, continuous exponential weights, gradient descent and online Newton step. The nuances between the many assumptions and setups are explained. Although there is not much truly new here, some existing tools are applied in novel ways to obtain new algorithms. A few bounds are improved in minor ways.
Bandit convex 优化是研究零顺序 convex 优化的基本框架。 这本书涵盖了用于这一问题的许多工具, 包括切割平面方法、 内点方法、 连续指数权重、 梯度下行和在线 牛顿 步骤。 许多假设和设置之间的细微差别得到了解释。 虽然这里没有多少真正的新事物, 但有些现有工具正在以新颖的方式应用, 以获得新的算法。 一些界限稍有改进 。
Article 130
Title@2025-06-12 (4): Graph Neural Networks for Automatic Addition of Optimizing Components in Printed Circuit Board Schematics
Title: Graph Neural Networks for Automatic Addition of Optimizing Components in Printed Circuit Board Schematics | Graphische Neuralnetze zur automatischen Ergänzung von Optimierungskomponenten in gedruckten Schaltplanen | 打印电路板图示板自动增加优化部件的神经网络 2506.10577v1 |
Authors (4): Pascal Plettenberg, André Alcalde, Bernhard Sick, Josephine M. Thomas
The design and optimization of Printed Circuit Board (PCB) schematics is crucial for the development of high-quality electronic devices. An important task in this process is to optimize drafts by adding components that improve the robustness and reliability of the circuit, e.g., pull-up resistors or decoupling capacitors. Since there is a shortage of skilled engineers and manual optimizations are very time-consuming, these best practices are often neglected. However, this typically leads to higher costs for troubleshooting in later development stages as well as shortened product life cycles, resulting in an increased amount of electronic waste that is difficult to recycle. Here, we present an approach for automating the addition of new components into PCB schematics by representing them as bipartite graphs and utilizing a node pair prediction model based on Graph Neural Networks (GNNs). We apply our approach to three highly relevant PCB design optimization tasks and compare the performance of several popular GNN architectures on real-world datasets labeled by human experts. We show that GNNs can solve these problems with high accuracy and demonstrate that our approach offers the potential to automate PCB design optimizations in a time- and cost-efficient manner.
印刷电路委员会(PCB)示意图的设计和优化对于发展高质量的电子装置至关重要。因此,一项重要任务是通过增加能够提高电路的稳健性和可靠性的部件,例如拉动阻力器或脱钩电容器,优化草稿。由于缺少熟练的工程师和人工优化非常耗时,这些最佳做法往往被忽视。然而,这通常会导致后期开发阶段的故障排除成本较高,产品寿命周期缩短,导致电子废物数量增加,难以回收。在这里,我们提出一种方法,将新部件添加到多氯联苯示意图上,作为双向图,并使用基于“神经网络”图的结对预测模型。我们采用的方法是三项高度相关的多氯联苯设计优化任务,并将人类专家所标注的实时数据集上若干广受欢迎的GNN结构的性能加以比较。我们表明,GNN可以以高准确性的方式解决这些问题,并表明我们的方法能够以高效的方式为多氯联苯设计自动化的升级。
Article 131
Title@2025-06-12 (4): Evolutionary Prediction Games
Title: Evolutionary Prediction Games | Evolutionäre Vorhersage Spiele | 演进预测运动会 2503.03401v2 |
Authors (2): Eden Saig, Nir Rosenfeld
When a prediction algorithm serves a collection of users, disparities in prediction quality are likely to emerge. If users respond to accurate predictions by increasing engagement, inviting friends, or adopting trends, repeated learning creates a feedback loop that shapes both the model and the population of its users. In this work, we introduce evolutionary prediction games, a framework grounded in evolutionary game theory, which models such feedback loops as natural-selection processes among groups of users. Our theoretical analysis reveals a gap between idealized and real-world learning settings: In idealized settings with unlimited data and computational power, repeated learning creates competition and promotes competitive exclusion across a broad class of behavioral dynamics. However, under realistic constraints such as finite data, limited compute, or risk of overfitting, we show that stable coexistence and mutualistic symbiosis between groups become possible. We analyze these possibilities in terms of their stability and feasibility, present mechanisms that can sustain their existence, and empirically demonstrate our findings.
当预测算法用于收集用户时,预测质量的差异可能会出现。如果用户通过增加参与、邀请朋友或采用趋势来回应准确的预测,反复学习会产生影响模型和用户人口的反馈循环。在这项工作中,我们引入进化预测游戏,这是一个基于进化游戏理论的框架,它模拟了用户群体之间自然选择过程等反馈循环。我们的理论分析揭示了理想化和现实世界学习环境之间的差距:在理想化环境中,有无限的数据和计算能力,反复学习在广泛的行为动态中产生竞争,促进竞争性排斥。然而,在诸如有限数据、有限计算或过度适应风险等现实的限制下,我们显示群体之间能够实现稳定的共存和相互共生关系。我们从稳定性和可行性的角度分析这些可能性,现有的机制能够维持它们的存在,并用经验展示我们的调查结果。
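The feedback loop in the evolutionary prediction games abstract can be illustrated with discrete-time replicator dynamics, a standard tool from evolutionary game theory. This is a toy sketch, not the paper's model: the linear accuracy function, which makes a group's prediction accuracy grow with its population share, is an assumption chosen only to mimic the idealized setting where larger groups are served better.

```python
def accuracy(shares, base=0.5, gain=0.4):
    # Toy assumption: a model fit to the pooled population predicts
    # better for groups that make up a larger share of the data.
    return [base + gain * s for s in shares]


def replicator_step(shares, fitness):
    # Each group's share grows in proportion to its fitness relative to the mean.
    mean_fitness = sum(s * f for s, f in zip(shares, fitness))
    return [s * f / mean_fitness for s, f in zip(shares, fitness)]


def simulate(shares, steps=500):
    for _ in range(steps):
        shares = replicator_step(shares, accuracy(shares))
    return shares


final = simulate([0.55, 0.45])
print([round(s, 4) for s in final])  # the initially larger group takes over
```

In this toy setting a small initial advantage compounds until one group dominates, which corresponds to competitive exclusion. The coexistence regimes the abstract describes under finite data or limited compute would need a different accuracy function and are not modeled here.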
Article 132
Title@2025-06-12 (4): Identifiability Challenges in Sparse Linear Ordinary Differential Equations
Title: Identifiability Challenges in Sparse Linear Ordinary Differential Equations | Identifizierbarkeitsherausforderungen in Sparse Lineare Standarddifferenzgleichungen | 分散线性普通差别等分法的可识别性挑战 2506.09816v2 |
Authors (3): Cecilia Casolo, Sören Becker, Niki Kilbertus
Dynamical systems modeling is a core pillar of scientific inquiry across natural and life sciences. Increasingly, dynamical system models are learned from data, rendering identifiability a paramount concept. For systems that are not identifiable from data, no guarantees can be given about their behavior under new conditions and inputs, or about possible control mechanisms to steer the system. It is known in the community that “linear ordinary differential equations (ODE) are almost surely identifiable from a single trajectory.” However, this only holds for dense matrices. The sparse regime remains underexplored, despite its practical relevance with sparsity arising naturally in many biological, social, and physical systems. In this work, we address this gap by characterizing the identifiability of sparse linear ODEs. Contrary to the dense case, we show that sparse systems are unidentifiable with a positive probability in practically relevant sparsity regimes and provide lower bounds for this probability. We further study empirically how this theoretical unidentifiability manifests in state-of-the-art methods to estimate linear ODEs from data. Our results corroborate that sparse systems are also practically unidentifiable. Theoretical limitations are not resolved through inductive biases or optimization dynamics. Our findings call for rethinking what can be expected from data-driven dynamical system modeling and allow for quantitative assessments of how much to trust a learned linear ODE.
动态系统建模是自然科学和生命科学科学研究的核心支柱。 动态系统模型越来越多地从数据中学习, 使识别性成为最高概念。 对于无法从数据中识别的系统, 无法保证其在新的条件和投入下的行为, 或可能的控制机制来指导系统。 社区知道“ 线性普通差异方程式( ODE) 几乎可以肯定地从单一轨迹中识别出来 ” 。 然而, 这只能维持在密度基质上。 稀疏的系统仍然没有得到充分探讨, 尽管它与许多生物、社会和物理系统中自然产生的弥漫性具有实际相关性。 在这项工作中, 我们通过说明稀疏线性线性模型来弥补这一差距。 与密集的情况相反, 我们显示, 稀疏的系统是无法识别的,在实际相关的宽度制度中具有积极的可能性,并且为这种可能性提供了较低的界限。 我们进一步研究了这种理论性不可辨别的方法是如何在从数据中估计线性 ODEs 。 我们的结果证实, 稀疏的系统也是几乎无法辨别的模型。 理论限制使得我们所期望的动态的动态的系统能够被重新定位, 成为一种动态的动态分析。
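The contrast between dense and sparse systems can be made concrete with a small check. The abstract does not state its identifiability criterion. The sketch below assumes the commonly cited condition from the linear-ODE literature: for $\dot{x} = Ax$ with initial state $x_0$ in $n$ dimensions, $A$ is identifiable from the single trajectory exactly when $x_0, Ax_0, \dots, A^{n-1}x_0$ are linearly independent. The helper names are hypothetical.

```python
def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def rank(rows, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    M = [list(r) for r in rows]
    r = 0
    for c in range(len(M[0])):
        pivot = max(range(r, len(M)), key=lambda i: abs(M[i][c]))
        if abs(M[pivot][c]) < tol:
            continue  # no usable pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


def krylov_rank(A, x0):
    """Rank of the vectors x0, A x0, ..., A^(n-1) x0."""
    vectors, v = [], list(x0)
    for _ in range(len(x0)):
        vectors.append(v)
        v = mat_vec(A, v)
    return rank(vectors)


dense = [[-1.0, 2.0], [0.5, -3.0]]
sparse = [[-1.0, 0.0], [0.0, -1.0]]  # two decoupled states with the same decay rate
x0 = [1.0, 1.0]
print(krylov_rank(dense, x0))   # 2: full rank, identifiable under the assumed criterion
print(krylov_rank(sparse, x0))  # 1: rank deficient, not identifiable
```

In the sparse example the trajectory never leaves the line spanned by $x_0$, so any matrix that acts as $-1$ on that line produces the same data.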
Article 133
Title@2025-06-12 (4): Box-Constrained Softmax Function and Its Application for Post-Hoc Calibration
Title: Box-Constrained Softmax Function and Its Application for Post-Hoc Calibration | Box-Constrained Softmax-Funktion und seine Anwendung für die Post-Hoc-Kalibrierung | Box- Contract Softmax 函数及其热后校准应用程序 2506.10572v1 |
Authors (4): Kyohei Atarashi, Satoshi Oyama, Hiromi Arai, Hisashi Kashima
Controlling the output probabilities of softmax-based models is a common problem in modern machine learning. Although the $\mathrm{Softmax}$ function provides soft control via its temperature parameter, it lacks the ability to enforce hard constraints, such as box constraints, on output probabilities, which can be critical in certain applications requiring reliable and trustworthy models. In this work, we propose the box-constrained softmax ($\mathrm{BCSoftmax}$) function, a novel generalization of the $\mathrm{Softmax}$ function that explicitly enforces lower and upper bounds on output probabilities. While $\mathrm{BCSoftmax}$ is formulated as the solution to a box-constrained optimization problem, we develop an exact and efficient computation algorithm for $\mathrm{BCSoftmax}$. As a key application, we introduce two post-hoc calibration methods based on $\mathrm{BCSoftmax}$. The proposed methods mitigate underconfidence and overconfidence in predictive models by learning the lower and upper bounds of the output probabilities or logits after model training, thereby enhancing reliability in downstream decision-making tasks. We demonstrate the effectiveness of our methods experimentally using the TinyImageNet, CIFAR-100, and 20NewsGroups datasets, achieving improvements in calibration metrics.
控制基于softmax的模型的输出概率是现代机器学习中的一个常见问题。尽管 $\mathrm{Softmax}$ 函数通过其温度参数提供软控制,但它无法对输出概率施加硬约束(例如箱约束),而这在某些需要可靠、可信模型的应用中可能至关重要。在这项工作中,我们提出箱约束softmax($\mathrm{BCSoftmax}$)函数,它是 $\mathrm{Softmax}$ 函数的一种新的推广,可显式地对输出概率施加下界和上界。虽然 $\mathrm{BCSoftmax}$ 被表述为一个箱约束优化问题的解,我们为其开发了一种精确且高效的计算算法。作为一个关键应用,我们引入了两种基于 $\mathrm{BCSoftmax}$ 的事后校准方法。所提出的方法通过在模型训练后学习输出概率或logits的下界和上界,缓解预测模型中的信心不足和过度自信,从而提高下游决策任务的可靠性。我们使用TinyImageNet、CIFAR-100和20NewsGroups数据集通过实验证明了方法的有效性,在校准指标上取得了改进。
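A box-constrained softmax can be sketched numerically. The abstract does not spell out the optimization problem. The code assumes it is the entropy-regularized problem whose unconstrained solution is the ordinary softmax, with added constraints $a_i \le p_i \le b_i$. Under that assumption the optimality conditions give $p_i = \mathrm{clip}(e^{x_i - \nu}, a_i, b_i)$, with $\nu$ chosen so the probabilities sum to one. The sketch finds $\nu$ by bisection, which is a stand-in for, not a reproduction of, the exact algorithm the paper develops. It also assumes moderate logit values, strictly positive upper bounds, and $a_i \le b_i$.

```python
import math


def softmax(x):
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def bc_softmax(x, lower, upper, iters=200):
    """Bisection sketch: p_i = clip(exp(x_i - nu), lower_i, upper_i) with sum(p) = 1."""
    if not (sum(lower) <= 1.0 <= sum(upper)):
        raise ValueError("bounds are infeasible: need sum(lower) <= 1 <= sum(upper)")

    def probs(nu):
        return [min(max(math.exp(v - nu), a), b) for v, a, b in zip(x, lower, upper)]

    # At nu = lo every entry sits at its upper bound; at nu = hi every entry
    # sits at (or negligibly above) its lower bound, so the root is bracketed.
    lo = min(v - math.log(b) for v, b in zip(x, upper))
    hi = max(v - math.log(max(a, 1e-300)) for v, a in zip(x, lower))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return probs(0.5 * (lo + hi))


logits = [2.0, 1.0, 0.0]
print(softmax(logits))
print(bc_softmax(logits, lower=[0.0, 0.0, 0.2], upper=[0.5, 1.0, 1.0]))
```

With trivial bounds of 0 and 1 the function reduces to the ordinary softmax. In the example, the first class is capped at 0.5, the last is raised to 0.2, and the remaining mass goes to the middle class.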
Article 134
Title@2025-06-12 (4): On the role of non-linear latent features in bipartite generative neural networks
Title: On the role of non-linear latent features in bipartite generative neural networks | Zur Rolle nichtlinearer latenter Merkmale in bipartitischen generativen neuronalen Netzwerken | 关于非线性潜伏特征在双边基因神经网络中的作用 2506.10552v1 |
Authors (4): Tony Bonnaire, Giovanni Catania, Aurélien Decelle, Beatriz Seoane
We investigate the phase diagram and memory retrieval capabilities of bipartite energy-based neural networks, namely Restricted Boltzmann Machines (RBMs), as a function of the prior distribution imposed on their hidden units - including binary, multi-state, and ReLU-like activations. Drawing connections to the Hopfield model and employing analytical tools from statistical physics of disordered systems, we explore how the architectural choices and activation functions shape the thermodynamic properties of these models. Our analysis reveals that standard RBMs with binary hidden nodes and extensive connectivity suffer from reduced critical capacity, limiting their effectiveness as associative memories. To address this, we examine several modifications, such as introducing local biases and adopting richer hidden unit priors. These adjustments restore ordered retrieval phases and markedly improve recall performance, even at finite temperatures. Our theoretical findings, supported by finite-size Monte Carlo simulations, highlight the importance of hidden unit design in enhancing the expressive power of RBMs.
我们研究二分能量型神经网络,即受限玻尔兹曼机(RBM)的相图和记忆检索能力,并考察其如何随施加在隐藏单元上的先验分布(包括二值、多状态和类ReLU激活)而变化。通过与Hopfield模型建立联系,并借助无序系统统计物理的分析工具,我们探讨架构选择和激活函数如何塑造这些模型的热力学性质。我们的分析表明,具有二值隐藏节点和广泛连接的标准RBM的临界容量降低,限制了它们作为联想记忆的有效性。为了解决这一问题,我们考察了若干修改,例如引入局部偏置和采用更丰富的隐藏单元先验。这些调整恢复了有序的检索相,并显著改善了回忆性能,即使在有限温度下也是如此。我们的理论结果得到有限尺寸蒙特卡洛模拟的支持,突出了隐藏单元设计在增强RBM表达能力方面的重要性。
Article 135
Title@2025-06-12 (4): Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods
Title: Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods | Kostenlose Bewertung von Record-Level-Datensicherheitsrisiken durch Artifact-basierte Methoden | 采用人工方法进行自由记录隐私风险评估 2411.05743v3 |
Authors (4): Joseph Pollock, Igor Shilov, Euodia Dodd, Yves-Alexandre de Montjoye
Membership inference attacks (MIAs) are widely used to empirically assess privacy risks in machine learning models, both providing model-level vulnerability metrics and identifying the most vulnerable training samples. State-of-the-art methods, however, require training hundreds of shadow models with the same architecture as the target model. This makes the computational cost of assessing the privacy of models prohibitive for many practical applications, particularly when used iteratively as part of the model development process and for large models. We propose a novel approach for identifying the training samples most vulnerable to membership inference attacks by analyzing artifacts naturally available during the training process. Our method, Loss Trace Interquartile Range (LT-IQR), analyzes per-sample loss trajectories collected during model training to identify high-risk samples without requiring any additional model training. Through experiments on standard benchmarks, we demonstrate that LT-IQR achieves 92% precision@k=1% in identifying the samples most vulnerable to state-of-the-art MIAs. This result holds across datasets and model architectures with LT-IQR outperforming both traditional vulnerability metrics, such as loss, and lightweight MIAs using few shadow models. We also show LT-IQR to accurately identify points vulnerable to multiple MIA methods and perform ablation studies. We believe LT-IQR enables model developers to identify vulnerable training samples, for free, as part of the model development process. Our results emphasize the potential of artifact-based methods to efficiently evaluate privacy risks.
成员推导攻击(MIAs)被广泛用于实证评估机器学习模型中的隐私风险,既提供模型级别的脆弱度度度指标,又确定最易受到伤害的培训样本。然而,最先进的方法要求培训数百个影子模型,与目标模型的架构相同。这使得评估模型隐私的计算成本对许多实际应用来说令人望而却步,特别是在作为模型开发过程的一部分和大型模型的迭代使用时。我们提出一种新的方法,通过分析培训过程中自然可得的工艺品,确定最易成为会员推断袭击的样本。我们的方法,即损失追踪国际间交易范围(LT-IQR),分析在模型培训期间收集的每个抽样损失轨迹,以确定高风险样品,而不需要任何额外的示范培训。我们通过对标准基准的实验,证明LT-IQR实现了92%的精确度@k=1%的精确度,通过分析最易受到伤害的模式样本,分析在培训过程中自然可以获得的文物。我们的方法,LT-IR(LT-I)超越了传统的脆弱度标点,同时进行模拟的光度和模型的模型评估。我们通过对低比标准的模型的模型的模型,还显示了损失评估方法。
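The LT-IQR idea of scoring each training sample by the interquartile range of its loss trajectory can be sketched with the standard library. The exact quantile method, the choice of which epochs enter the trace, and the assumption that a larger IQR indicates higher vulnerability are not specified in the abstract and are labeled here as assumptions. The function and sample names are hypothetical.

```python
import statistics


def loss_trace_iqr(trace):
    """Interquartile range of one sample's per-epoch training losses."""
    q1, _, q3 = statistics.quantiles(trace, n=4)
    return q3 - q1


def most_vulnerable(loss_traces, fraction=0.01):
    """Return the top `fraction` of sample ids, ranked by loss-trace IQR (assumed: higher = riskier)."""
    scores = {sample_id: loss_trace_iqr(t) for sample_id, t in loss_traces.items()}
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical per-epoch losses logged during ordinary training.
traces = {
    "easy": [2.0, 0.5, 0.1, 0.05, 0.02, 0.01],  # learned almost immediately
    "hard": [2.5, 2.4, 1.8, 0.9, 0.2, 0.05],    # learned late, likely memorized
    "flat": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
}
print(most_vulnerable(traces, fraction=0.34))  # ['hard']
```

The only input is the per-sample loss that training already computes, so the ranking needs no shadow models, in line with the "for free" framing of the abstract.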
Article 136
Title@2025-06-12 (4): Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models
Title: Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models | Obliviate: Effiziente Unvergesslichkeit für den Schutz geistigen Eigentums in großen Sprachmodellen | 默认:在大语言模式中有效统一保护知识产权 2502.15010v2 |
Authors (2): Mark Russinovich, Ahmed Salem
Recent copyright agreements between AI companies and content creators underscore the need for fine-grained control over language models’ ability to reproduce copyrighted text. Existing defenses, ranging from aggressive unlearning to simplistic output filters, either sacrifice model utility or inadequately address verbatim leakage. We introduce Obliviate, a lightweight post-training method that surgically suppresses exact reproduction of specified sequences while preserving semantic understanding. Obliviate first identifies memorized passages and then, for each target token, minimally adjusts the model’s output distribution via a Kullback-Leibler divergence penalty to drive down the probability of exact reproduction. Simultaneously, we enforce a consistency loss on non-target tokens to retain the model’s fluency and task performance. We evaluate Obliviate on four popular 6-8B-parameter models (LLaMA-3.1, LLaMA-3.1-Instruct, Qwen-2.5, and Yi-1.5) using synthetic memorization benchmarks and organic copyrighted excerpts (e.g., Moby Dick, Frankenstein, Alice in Wonderland and Les Miserables). Across all settings, Obliviate reduces verbatim recall by two orders of magnitude (e.g., from hundreds of words to fewer than 12) while degrading downstream accuracy by at most 1% on HellaSwag, MMLU, TruthfulQA, and Winogrande. Furthermore, we benchmark Obliviate against different unlearning and copyright techniques using the MUSE and CoTaEval benchmarks. These results position Obliviate as a practical, high-fidelity solution for copyright compliance in deployed LLMs.
AI公司和内容创作者之间最近达成的版权协议强调,需要对语言模型复制版权文本的能力进行细微的控制。现有的防御措施包括:从主动的不学习到简单的产出过滤器,无论是牺牲模型效用还是不当的邮件渗漏。我们采用了轻巧的培训后训练方法Obliviate,在保持语义理解的同时,用外科方式抑制特定序列的复制。首先确定记忆化的段落,然后对每个目标标志,通过Kullback-Lebel差价罚款,对模型的输出分配进行最小调整,以降低精确复制的概率。与此同时,我们对非目标标志实施一致性损失,以保持模型的流畅和任务性。我们用四种受欢迎的 6-8B 参数模型(LLLAMA-3、LalaMA-3.1-Instruct、Qwen-2.5和Yy-1.5) 评估了四种受欢迎的6-8B参数模型(LLO-I-Instruct、Q-2.5和Y-1.5),用合成的数学标准基准和有机版权基准(例如,Moby Dick、Francel、Al-Al-Al-Al-Lislate 和Lislation) 和最低级的准确性排序,用两种标准,用最低级标准,用最低的顺序和最低的顺序,将这些结果和最低的精确性。
Article 137
Title@2025-06-12 (4): Exploring Performance-Complexity Trade-Offs in Sound Event Detection Models
Title: Exploring Performance-Complexity Trade-Offs in Sound Event Detection Models | Erforschung von Leistungs-Komplexitäts-Trade-Offs in Sound Event Detection-Modellen | 探索健全事件探测模型中的性能-复杂交易-便利交易 2503.11373v2 |
Authors (5): Tobias Morocutti, Florian Schmid, Jonathan Greif, Francesco Foscarin, Gerhard Widmer
We target the problem of developing new low-complexity networks for the sound event detection task. Our goal is to meticulously analyze the performance-complexity trade-off, aiming to be competitive with the large state-of-the-art models, at a fraction of the computational requirements. We find that low-complexity convolutional models previously proposed for audio tagging can be effectively adapted for event detection (which requires frame-wise prediction) by adjusting convolutional strides, removing the global pooling, and, importantly, adding a sequence model before the (now frame-wise) classification heads. Systematic experiments reveal that the best choice for the sequence model type depends on which complexity metric is most important for the given application. We also investigate the impact of enhanced training strategies such as knowledge distillation. In the end, we show that combined with an optimized training strategy, we can reach event detection performance comparable to state-of-the-art transformers while requiring only around 5% of the parameters. We release all our pre-trained models and the code for reproducing this work to support future research in low-complexity sound event detection at https://github.com/theMoro/EfficientSED.
我们针对为声音事件检测任务开发新的低复杂度网络这一问题。我们的目标是细致分析性能与复杂度之间的权衡,力求以一小部分计算需求与大型最先进模型相竞争。我们发现,此前为音频标记提出的低复杂度卷积模型可以有效地适配于事件检测(这需要逐帧预测),方法是调整卷积步幅、去除全局池化,并且重要的是,在(现为逐帧的)分类头之前添加一个序列模型。系统性实验表明,序列模型类型的最佳选择取决于对给定应用最重要的是哪种复杂度指标。我们还研究了知识蒸馏等增强训练策略的影响。最后,我们表明,结合优化的训练策略,我们可以达到与最先进的Transformer相当的事件检测性能,而只需约5%的参数。我们发布了所有预训练模型以及复现这项工作的代码,以支持未来低复杂度声音事件检测的研究,见 https://github.com/theMoro/EfficientSED。
Article 138
Title@2025-06-12 (4): Data-driven Day Ahead Market Prices Forecasting: A Focus on Short Training Set Windows
Title: Data-driven Day Ahead Market Prices Forecasting: A Focus on Short Training Set Windows | datengetriebener Tag voraus Marktpreise Prognose: Ein Fokus auf Kurztraining-Set Windows | 头期市场价格预测:关注短期培训成套窗口 2506.10536v1 |
Authors (6): Vasilis Michalakopoulos, Christoforos Menos-Aikateriniadis, Elissaios Sarmas, Antonis Zakynthinos, Pavlos S. Georgilakis, Dimitris Askounis
This study investigates the performance of machine learning models in forecasting electricity Day-Ahead Market (DAM) prices using short historical training windows, with a focus on detecting seasonal trends and price spikes. We evaluate four models, namely LSTM with Feed Forward Error Correction (FFEC), XGBoost, LightGBM, and CatBoost, across three European energy markets (Greece, Belgium, Ireland) using feature sets derived from ENTSO-E forecast data. Training window lengths range from 7 to 90 days, allowing assessment of model adaptability under constrained data availability. Results indicate that LightGBM consistently achieves the highest forecasting accuracy and robustness, particularly with 45- and 60-day training windows, which balance temporal relevance and learning depth. Furthermore, LightGBM demonstrates superior detection of seasonal effects and peak price events compared to LSTM and other boosting models. These findings suggest that short-window training approaches, combined with boosting methods, can effectively support DAM forecasting in volatile, data-scarce environments.
这项研究调查了利用短期历史培训窗口预测电日市场价格的机器学习模型的绩效,重点是发现季节趋势和价格涨幅;我们评估了四个模型,即具有进料错误校正的LSTM(FFEC)、XGBost、LightGBM和CatBoost(希腊、比利时、爱尔兰)三个欧洲能源市场,利用ENTSO-E预报数据生成的成套特征;培训窗口长度从7天到90天不等,以便评估受数据限制的模型适应性;结果显示,灯GBM始终在预测准确性和稳健性方面达到最高水平,特别是45天和60天的培训窗口,这些窗口平衡了时间相关性和学习深度;此外,灯GBM显示,与LSTM和其他推进模型相比,对季节效应和高峰价格事件的探测率较高。这些研究结果表明,短窗口培训方法与推进方法相结合,可以有效支持DAM在波动、数据侵蚀环境中的预报。
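The short-window evaluation described above can be sketched as a walk-forward backtest. This is a minimal illustration, not the authors' pipeline: `walk_forward_mae`, `window_mean`, and the toy price series are hypothetical, and the placeholder model (the window mean) stands in for the LSTM and boosting models named in the abstract. The toy data only shows the adaptation side of the trade-off; it says nothing about which window length is best on real market data.

```python
from statistics import mean

def walk_forward_mae(prices, window, fit_predict):
    """Mean absolute error of one-step-ahead forecasts, each trained on the last `window` points."""
    errors = []
    for t in range(window, len(prices)):
        history = prices[t - window:t]     # only the most recent `window` observations
        forecast = fit_predict(history)    # the model is refit from scratch at every step
        errors.append(abs(prices[t] - forecast))
    return mean(errors)

def window_mean(history):
    """Placeholder model (assumption): forecast the mean of the training window."""
    return mean(history)

# Toy daily price series with a level shift halfway through (illustrative, not real data).
prices = [50.0] * 60 + [80.0] * 60

mae_by_window = {w: walk_forward_mae(prices, w, window_mean) for w in (7, 30, 90)}
```

On this series the shorter windows recover from the level shift sooner, so their error is lower; with noisy real prices a very short window would also carry more estimation variance.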
Article 139
Title@2025-06-12 (4): Equivariant Neural Diffusion for Molecule Generation
Title: Equivariant Neural Diffusion for Molecule Generation | Equivariante Neuraldiffusion für die Molekülgeneration | 分子代的等量神经元传播 2506.10532v1 |
Authors (4): François Cornet, Grigory Bartosh, Mikkel N. Schmidt, Christian A. Naesseth
We introduce Equivariant Neural Diffusion (END), a novel diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. Compared to current state-of-the-art equivariant diffusion models, the key innovation in END lies in its learnable forward process for enhanced generative modelling. Rather than being pre-specified, the forward process is parameterized through a time- and data-dependent transformation that is equivariant to rigid transformations. Through a series of experiments on standard molecule generation benchmarks, we demonstrate the competitive performance of END compared to several strong baselines for both unconditional and conditional generation.
我们引入了异质神经分解(END),这是3D分子生成的新型扩散模型,与欧clidean变异具有等同性。与当前最新的等异性扩散模型相比,END的关键创新在于其可学习的强化基因建模前期进程。远前进程不是预先规定的,而是通过一个与僵化变异等同的时间和数据化变异来参数化。通过一系列标准分子生成基准实验,我们展示了END的竞争性性能,而无条件和有条件的生成则有几条强有力的基线。
Article 140
Title@2025-06-12 (4): Decision Making under the Exponential Family: Distributionally Robust Optimisation with Bayesian Ambiguity Sets
Title: Decision Making under the Exponential Family: Distributionally Robust Optimisation with Bayesian Ambiguity Sets | Entscheidungsfindung unter der Exponential-Familie: Verteilungsstarke Optimierung mit Bayesian Ambiguity Sets | 根据 “ 名人家庭 “ 下的决策:与贝ysian 暗形组的分布式强力优化 2411.16829v2 |
Authors (3): Charita Dellaporta, Patrick O’Hara, Theodoros Damoulas
Decision making under uncertainty is challenging as the data-generating process (DGP) is often unknown. Bayesian inference proceeds by estimating the DGP through posterior beliefs on the model’s parameters. However, minimising the expected risk under these beliefs can lead to suboptimal decisions due to model uncertainty or limited, noisy observations. To address this, we introduce Distributionally Robust Optimisation with Bayesian Ambiguity Sets (DRO-BAS) which hedges against model uncertainty by optimising the worst-case risk over a posterior-informed ambiguity set. We provide two such sets, based on posterior expectations (DRO-BAS(PE)) or posterior predictives (DRO-BAS(PP)) and prove that both admit, under conditions, strong dual formulations leading to efficient single-stage stochastic programs which are solved with a sample average approximation. For DRO-BAS(PE) this covers all conjugate exponential family members while for DRO-BAS(PP) this is shown under conditions on the predictive’s moment generating function. Our DRO-BAS formulations outperform existing Bayesian DRO on the Newsvendor problem and achieve faster solve times with comparable robustness on the Portfolio problem.
在不确定情况下作出决策具有挑战性,因为数据生成过程(DGP)往往未知。贝叶斯推断通过对模型参数的后验信念来估计DGP。然而,由于模型不确定性或有限且含噪的观测,在这些信念下最小化期望风险可能导致次优决策。为了解决这个问题,我们提出了带贝叶斯模糊集的分布鲁棒优化(DRO-BAS),它通过在由后验信息确定的模糊集上优化最坏情况风险来对冲模型不确定性。我们给出了两种这样的模糊集,分别基于后验期望(DRO-BAS(PE))和后验预测分布(DRO-BAS(PP)),并证明二者在一定条件下都具有强对偶形式,从而得到可用样本平均近似求解的高效单阶段随机规划。对于DRO-BAS(PE),这一结论涵盖所有共轭指数族成员;对于DRO-BAS(PP),这一结论在预测分布的矩母函数满足一定条件时成立。我们的DRO-BAS形式在报童(Newsvendor)问题上优于现有的贝叶斯DRO方法,并在投资组合(Portfolio)问题上以相当的鲁棒性实现了更快的求解时间。
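The abstract mentions sample average approximation (SAA) and the Newsvendor problem. The sketch below shows plain SAA for a nominal newsvendor cost only; it is not DRO-BAS and contains no ambiguity set or worst-case term. The cost parameters, function names, and demand samples are assumptions chosen for illustration.

```python
def newsvendor_cost(order, demand, holding=1.0, backorder=4.0):
    """Cost of ordering `order` units when `demand` is realised (assumed unit costs)."""
    return holding * max(order - demand, 0.0) + backorder * max(demand - order, 0.0)

def saa_order(demand_samples, holding=1.0, backorder=4.0):
    """Order quantity minimising the sample-average cost.

    The objective is piecewise linear and convex in `order`, so a minimiser
    can always be found among the sampled demand values.
    """
    def avg_cost(order):
        total = sum(newsvendor_cost(order, d, holding, backorder) for d in demand_samples)
        return total / len(demand_samples)
    return min(sorted(set(demand_samples)), key=avg_cost)

# Hypothetical demand draws, e.g. from a posterior predictive distribution.
samples = [8.0, 10.0, 12.0, 9.0, 11.0, 15.0, 7.0, 10.0]
best = saa_order(samples)
```

With these costs the critical ratio is 4 / (4 + 1) = 0.8, and the chosen order is the smallest sample whose empirical CDF reaches that ratio.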
Article 141
Title@2025-06-12 (4): Learning in Budgeted Auctions with Spacing Objectives
Title: Learning in Budgeted Auctions with Spacing Objectives | Lernen in budgetierten Auktionen mit unterschiedlichen Zielen | 具有间距目标的预算拍卖中的学习 2411.04843v2 |
Authors (6): Giannis Fikioris, Robert Kleinberg, Yoav Kolumbus, Raunak Kumar, Yishay Mansour, Éva Tardos
In many repeated auction settings, participants care not only about how frequently they win but also how their winnings are distributed over time. This problem arises in various practical domains where avoiding congested demand is crucial, such as online retail sales and compute services, as well as in advertising campaigns that require sustained visibility over time. We introduce a simple model of this phenomenon, modeling it as a budgeted auction where the value of a win is a concave function of the time since the last win. This implies that for a given number of wins, even spacing over time is optimal. We also extend our model and results to the case when not all wins result in “conversions” (realization of actual gains), and the probability of conversion depends on a context. The goal is to maximize and evenly space conversions rather than just wins. We study the optimal policies for this setting in second-price auctions and offer learning algorithms for the bidders that achieve low regret against the optimal bidding policy in a Bayesian online setting. Our main result is a computationally efficient online learning algorithm that achieves $\tilde O(\sqrt T)$ regret. We achieve this by showing that an infinite-horizon Markov decision process (MDP) with the budget constraint in expectation is essentially equivalent to our problem, even when limiting that MDP to a very small number of states. The algorithm achieves low regret by learning a bidding policy that chooses bids as a function of the context and the system’s state, which will be the time elapsed since the last win (or conversion). We show that state-independent strategies incur linear regret even without uncertainty of conversions. We complement this by showing that there are state-independent strategies that, while still having linear regret, achieve a $(1-\frac 1 e)$ approximation to the optimal reward.
在许多反复的拍卖环境中, 参与者不仅关心他们赢得的频率, 也关心他们赢得的得分如何随时间而分配。 这个问题出现在各种实际领域, 避免需求紧要, 比如在线零售销售和计算服务, 以及需要长期持续可见的广告运动中。 我们引入了这一现象的简单模式, 将赢的价值作为自上次赢以来时间的混搭功能, 以预算为模式的拍卖模式。 这意味着对于某个特定数目的赢者来说, 甚至独立时间的间隔是最佳的。 我们还将我们的模型和结果推广到并非所有赢得“ 转换” ( 实际收益的实现) 的关键领域, 以及转换的可能性取决于一个环境。 我们的目标是在二价拍卖中研究这个环境的最佳政策, 并且给竞拍者提供学习算法, 在Bayesian 的网络环境下, 我们的主要结果是计算到一个高效的在线学习算法, 实现O( sqrent T) 的变换值( 实现实际收益的变值 ) , 而转换的概率概率取决于一个环境 。 我们从一个无限的算算法, 将显示我们预算的回报的变数 的算算法, 的算算法将显示一个比重的变数, 的策略将显示我们的预算 的算法 的 的 的算法是 的 的 的 。
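The claim that even spacing is optimal for a fixed number of wins follows from concavity (Jensen's inequality). A small numerical check, assuming the square root as the concave value function; the abstract does not specify one, and `total_value` is a hypothetical helper.

```python
import math

def total_value(win_times, value=math.sqrt):
    """Sum of value(gap) over consecutive wins, treating time 0 as the previous win."""
    total, last = 0.0, 0
    for t in win_times:
        total += value(t - last)   # each win is worth a concave function of the time since the last one
        last = t
    return total

even = [3, 6, 9, 12]       # four wins, evenly spaced over 12 rounds
bunched = [1, 2, 3, 12]    # four wins over the same horizon, clustered early

even_value = total_value(even)
bunched_value = total_value(bunched)
```

Both schedules have gaps summing to 12, so the comparison isolates the effect of spacing.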
Article 142
Title@2025-06-12 (4): QuXAI: Explainers for Hybrid Quantum Machine Learning Models
Title: QuXAI: Explainers for Hybrid Quantum Machine Learning Models | QuXAI: Erklärer für Hybrid-Quantum-Maschinen-Lernmodelle | QuXAI:混合量子机器学习模型的解释器 2505.10167v3 |
Authors (6): Saikat Barua, Mostafizur Rahman, Shehenaz Khaled, Md Jafor Sadek, Rafiul Islam, Shahnewaz Siddique
The emergence of hybrid quantum-classical machine learning (HQML) models opens new horizons of computational intelligence, but their fundamental complexity frequently leads to black-box behavior that undermines transparency and reliability in their application. Although XAI for quantum systems is still in its infancy, a major research gap is evident in robust global and local explainability approaches designed for HQML architectures that employ quantized feature encoding followed by classical learning. This gap is the focus of this work, which introduces QuXAI, a framework based upon Q-MEDLEY, an explainer for feature importance in these hybrid systems. Our approach entails creating HQML models that incorporate quantum feature maps, applying Q-MEDLEY, which combines feature-based inferences while preserving the quantum transformation stage, and visualizing the resulting attributions. Our results show that Q-MEDLEY delineates influential classical aspects in HQML models, as well as separating out their noise, and competes well against established XAI techniques in classical validation settings. Ablation studies further expose the virtues of the composite structure used in Q-MEDLEY. This work provides a route to improving the interpretability and reliability of HQML models, thus promoting greater confidence and safer, more responsible use of quantum-enhanced AI technology. Our code and experiments are open-sourced at: https://github.com/GitsSaikat/QuXAI
混合量子古典机学(HQML)模式的出现开启了计算智能的新视野,但其基本复杂性经常导致黑盒行为,破坏其应用的透明度和可靠性。虽然量子系统XAI仍然处于初级阶段,但为使用量化特征编码的HQML结构设计的强大的全球和地方解释方法明显存在重大研究差距,这些结构采用量化特征编码,随后是古典学习。这种差距是这项工作的焦点,它引入了QuXAI(QuXAI)(基于Q-MEDLEY(Q-MEDLEY)的框架),这是解释这些混合系统重要性的一个解释器。我们的模型需要创建包含量子特征图的黑盒行为,使用Q-MEDLEY(Q-MEDAY)模型,该模型结合基于推断的特性,保存量子转换阶段和由此产生的属性的可视化分析方法。我们的结果显示,Q-MEDLE(Q)中具有影响力的经典 XAI(Q-MEDL)技术的开放技术,并与之竞争。更显著地暴露了在Q-MEDLA/Q(HLAIAY)中所使用的复合结构结构的美度的美度,因此,可以更安全地解释。在提高的模型中进行更安全性、更安全的模型。
Article 143
Title@2025-06-12 (4): Macro Graph of Experts for Billion-Scale Multi-Task Recommendation
Title: Macro Graph of Experts for Billion-Scale Multi-Task Recommendation | Makrograph der Experten für Milliarden-Scale Multi-Task-Empfehlung | 10亿多任务建议十亿多阶段专家宏观图 2506.10520v1 |
Authors (9): Hongyu Yao, Zijin Hong, Hao Chen, Yuanchen Bei, Zhiqing Li, Qijie Shen, Zuobin Ying, Huan Gong, Feiran Huang
Graph-based multi-task learning at billion-scale presents a significant challenge, as different tasks correspond to distinct billion-scale graphs. Traditional multi-task learning methods often neglect these graph structures, relying solely on individual user and item embeddings. However, disregarding graph structures overlooks substantial potential for improving performance. In this paper, we introduce the Macro Graph of Experts (MGOE) framework, the first approach capable of leveraging macro graph embeddings to capture task-specific macro features while modeling the correlations between task-specific experts. Specifically, we propose the concept of a Macro Graph Bottom, which, for the first time, enables multi-task learning models to incorporate graph information effectively. We design the Macro Prediction Tower to dynamically integrate macro knowledge across tasks. MGOE has been deployed at scale, powering multi-task learning for the homepage of a leading billion-scale recommender system. Extensive offline experiments conducted on three public benchmark datasets demonstrate its superiority over state-of-the-art multi-task learning methods, establishing MGOE as a breakthrough in multi-task graph-based recommendation. Furthermore, online A/B tests confirm the superiority of MGOE in billion-scale recommender systems.
在10亿尺度上,基于图表的多任务学习是一个重大挑战,因为不同的任务与不同的10亿尺度图表相对应。传统的多任务学习方法往往忽视这些图表结构,只依靠单个用户和项目嵌入。然而,无视图表结构忽略了改善业绩的巨大潜力。在本文中,我们引入了宏观专家图(MGOE)框架,这是能够利用宏观图嵌入来捕捉具体任务宏观特征的第一个方法,同时模拟具体任务专家之间的关联。具体地说,我们提出了宏观图底部的概念,首次使多任务学习模型能够有效地纳入图表信息。我们设计宏观预测塔,以动态地将宏观知识贯穿各项任务。MGOE已经大规模部署,为领先的10亿尺度建议系统的主页提供多任务学习动力。在三个公共基准数据集上进行的广泛离线实验表明它优于最先进的多任务学习方法,将MGOE确定为多任务图表制建议中的突破。此外,在线A/B测试确认了MGO系统在多任务图表基础上的优势。
Article 144
Title@2025-06-12 (4): A Weighted Loss Approach to Robust Federated Learning under Data Heterogeneity
Title: A Weighted Loss Approach to Robust Federated Learning under Data Heterogeneity | Ein gewichtiger Verlust Ansatz für robustes Federated Learning unter Daten Heterogenität | 数据异质下联邦强力学习的加权损失方法 2506.09824v2 |
Authors (5): Johan Erbani, Sonia Ben Mokhtar, Pierre-Edouard Portier, Elod Egyed-Zsigmond, Diana Nurbakova
Federated learning (FL) is a machine learning paradigm that enables multiple data holders to collaboratively train a machine learning model without sharing their training data with external parties. In this paradigm, workers locally update a model and share their updated gradients (or model parameters) with a central server. While FL seems appealing from a privacy perspective, it opens a number of threats from a security perspective, as (Byzantine) participants can contribute poisonous gradients (or model parameters) that harm model convergence. Byzantine-resilient FL addresses this issue by ensuring that the training proceeds as if Byzantine participants were absent. To this end, common strategies ignore outlier gradients during model aggregation, assuming that Byzantine gradients deviate more from honest gradients than honest gradients do from each other. However, in heterogeneous settings, honest gradients may differ significantly, making it difficult to distinguish honest outliers from Byzantine ones. In this paper, we introduce the Worker Label Alignment Loss (WoLA), a weighted loss that aligns honest worker gradients despite data heterogeneity, which facilitates the identification of Byzantines’ gradients. This approach significantly outperforms state-of-the-art methods in heterogeneous settings. We provide both theoretical insights and empirical evidence of its effectiveness.
联邦学习(FL)是一种机器学习模式,它使多个数据持有者能够在不与外部各方分享培训数据的情况下合作培训机器学习模式,而无需与外部各方分享其培训数据。在这个模式中,工人在当地更新一个模型,并与中央服务器共享其更新的梯度(或模型参数)。虽然FL从隐私角度似乎吸引,但从安全角度打开了一些威胁,因为(Byzantine)参与者可以造成有毒梯度(或模型参数)损害模型趋同。Byzantine-Relishive FL(Byzantine-Relishive FL)解决这个问题,确保培训成果与Byzantine参与者不存在一样。为此,共同战略在模型汇总期间忽略了外部梯度,假设Byzantine梯度偏离的诚实梯度多于诚实梯度。然而,在多种环境里,诚实梯度的梯度可能差异很大,难以区分诚实的离差者与Byzantine的模型趋同者。在本文中,我们介绍了WoLA, 一种加权损失使诚实的工人梯度梯度与数据具有高度的梯度,这为我们提供了模型化的理论推导路。
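The "ignore outlier gradients during aggregation" strategy mentioned above can be illustrated with a coordinate-wise median, a standard robust aggregator. This sketch is not WoLA, which is a training loss rather than an aggregation rule; the gradient values and the `aggregate` helper are made up for illustration.

```python
from statistics import mean, median

def aggregate(gradients, reducer):
    """Apply `reducer` coordinate-wise across a list of equal-length gradient vectors."""
    return [reducer(coords) for coords in zip(*gradients)]

# Four honest workers with similar gradients and one Byzantine worker (toy values).
honest = [[1.0, -2.0], [1.2, -1.8], [0.9, -2.1], [1.1, -1.9]]
byzantine = [[100.0, 100.0]]
all_grads = honest + byzantine

mean_agg = aggregate(all_grads, mean)      # pulled far away by the single poisoned gradient
median_agg = aggregate(all_grads, median)  # stays within the range of the honest gradients
```

The median only helps while honest gradients are close to each other. Under data heterogeneity they spread out, which is the difficulty the abstract describes.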
Article 145
Title@2025-06-12 (4): Subgraph Gaussian Embedding Contrast for Self-Supervised Graph Representation Learning
Title: Subgraph Gaussian Embedding Contrast for Self-Supervised Graph Representation Learning | Subgraph Gaussian Einbettungskontrast für selbstüberwachtes Graphen-Darstellungslernen | 自支持图表代表制学习的 Subgraph Gaussian 嵌入式对比对比度 2505.23529v2 |
Authors (3): Shifeng Xie, Aref Einizade, Jhony H. Giraldo
Graph Representation Learning (GRL) is a fundamental task in machine learning, aiming to encode high-dimensional graph-structured data into low-dimensional vectors. Self-Supervised Learning (SSL) methods are widely used in GRL because they can avoid expensive human annotation. In this work, we propose a novel Subgraph Gaussian Embedding Contrast (SubGEC) method. Our approach introduces a subgraph Gaussian embedding module, which adaptively maps subgraphs to a structured Gaussian space, ensuring the preservation of input subgraph characteristics while generating subgraphs with a controlled distribution. We then employ optimal transport distances, more precisely the Wasserstein and Gromov-Wasserstein distances, to effectively measure the similarity between subgraphs, enhancing the robustness of the contrastive learning process. Extensive experiments across multiple benchmarks demonstrate that SubGEC outperforms or presents competitive performance against state-of-the-art approaches. Our findings provide insights into the design of SSL methods for GRL, emphasizing the importance of the distribution of the generated contrastive pairs.
图形教学( GRL) 是机器学习的一项基本任务, 目的是将高方图结构数据编码为低维矢量。 自我支持学习( SSL) 方法在 GRL 中被广泛使用, 因为它们可以避免昂贵的人类批注。 在这项工作中, 我们提出一个新的Subgraph Gausian 嵌入对比( SubGEC) 方法。 我们的方法引入了子集子集成模块, 该模块将子集成成到结构化的高斯空间, 确保在生成受控分布的子集时保存输入子集特性。 然后我们使用最佳的运输距离, 更精确地说, 瓦瑟斯坦 和 Gromov- Wasserstein 的距离, 以有效测量子集之间的相似性, 增强对比性学习过程的稳健性。 跨多个基准的大规模实验表明, 血压~ 外形或显示与最新技术方法相比具有竞争力的性能。 我们的发现为 GRSL 设计 SL 方法提供了洞察 设计 方法的洞察力, , 强调生成对比配对分布的重要性 。
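To give a feel for the Wasserstein distance used above, the 2-Wasserstein distance between two one-dimensional Gaussians has a simple closed form, sqrt((m1 - m2)^2 + (s1 - s2)^2). The sketch below covers only that special case; the paper works with subgraph embeddings and also uses the Gromov-Wasserstein distance, neither of which is reproduced here. The function name is hypothetical.

```python
import math

def w2_gaussian_1d(mean_a, std_a, mean_b, std_b):
    """2-Wasserstein distance between N(mean_a, std_a^2) and N(mean_b, std_b^2)."""
    # Closed form for one-dimensional Gaussians: Euclidean distance in (mean, std) space.
    return math.hypot(mean_a - mean_b, std_a - std_b)

same = w2_gaussian_1d(0.0, 1.0, 0.0, 1.0)      # identical distributions
shifted = w2_gaussian_1d(0.0, 1.0, 3.0, 5.0)   # different mean and spread
```

Unlike a divergence that compares densities point by point, this distance grows smoothly with the shift between the two distributions.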
Article 146
Title@2025-06-12 (4): Differentially private and decentralized randomized power method
Title: Differentially private and decentralized randomized power method | Differential private und dezentrale randomisierte Energiemethode | 区别对待的私人和分散的随机电源方法 2411.01931v3 |
Authors (5): Julien Nicolas, César Sabater, Mohamed Maouche, Sonia Ben Mokhtar, Mark Coates
The randomized power method has gained significant interest due to its simplicity and efficient handling of large-scale spectral analysis and recommendation tasks. However, its application to large datasets containing personal information (e.g., web interactions, search history, personal tastes) raises critical privacy problems. This paper addresses these issues by proposing enhanced privacy-preserving variants of the method. First, we propose a variant that reduces the amount of noise required in current techniques to achieve Differential Privacy (DP). More precisely, we refine the privacy analysis so that the Gaussian noise variance no longer grows linearly with the target rank, achieving the same DP guarantees with strictly less noise. Second, we adapt our method to a decentralized framework in which data is distributed among multiple users. The decentralized protocol strengthens privacy guarantees with no accuracy penalty and a low computational and communication overhead. Our results include the provision of tighter convergence bounds for both the centralized and decentralized versions, and an empirical comparison with previous work using real recommendation datasets.
随机电动方法由于简单和高效地处理大型光谱分析和建议任务而引起了极大的兴趣,然而,它适用于包含个人信息的大型数据集(例如网络互动、搜索历史、个人口味),引起了严重的隐私问题。本文件通过提出方法的强化隐私保护变体来解决这些问题。首先,我们提出了一个变体,减少当前技术为实现差异隐私(DP)所需噪音的数量。更准确地说,我们完善隐私分析,使高西亚噪音不再与目标级别成直线变化,在严格减少噪音的情况下实现相同的DP保障。第二,我们调整我们的方法,使之适应一个分散化的框架,在多个用户之间分配数据。分散化的协议加强了隐私保障,没有准确的罚款,计算和通信管理低。我们的成果包括提供集中和分散化版本的更紧密的连接线,以及用真实的建议数据集对先前的工作进行经验性比较。
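A minimal sketch of a noisy power iteration, for the rank-one case only. It assumes a small symmetric matrix and adds independent Gaussian noise to each matrix-vector product. The `noise_std` parameter is a free knob here: it is not calibrated to any privacy budget, so this code carries no DP guarantee and does not implement the paper's refined analysis or its decentralized protocol.

```python
import math
import random

def noisy_power_method(matrix, steps=50, noise_std=0.0, seed=0):
    """Estimate the dominant eigenvector of a symmetric matrix with per-step Gaussian noise."""
    rng = random.Random(seed)
    n = len(matrix)
    vec = [rng.gauss(0.0, 1.0) for _ in range(n)]   # random start vector
    for _ in range(steps):
        # Matrix-vector product plus an independent Gaussian perturbation per coordinate.
        vec = [sum(row[j] * vec[j] for j in range(n)) + rng.gauss(0.0, noise_std)
               for row in matrix]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]               # renormalise to unit length
    return vec

# Toy matrix with eigenvalues 2 and 1; the dominant eigenvector is the first axis.
A = [[2.0, 0.0], [0.0, 1.0]]
v = noisy_power_method(A, steps=50, noise_std=0.0)
```

Raising `noise_std` trades accuracy for perturbation, which is why reducing the noise needed for a given guarantee matters.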
Article 147
Title@2025-06-12 (4): Large Scale Multi-Task Bayesian Optimization with Large Language Models
Title: Large Scale Multi-Task Bayesian Optimization with Large Language Models | Große Multi-Task Bayesian Optimierung mit großen Sprachmodellen | 使用大语言模式的大型多语类多语类多语类多语类最佳优化 2503.08131v2 |
Authors (10): Yimeng Zeng, Natalie Maus, Haydn Thomas Jones, Jeffrey Tao, Fangping Wan, Marcelo Der Torossian Torres, Cesar de la Fuente-Nunez, Ryan Marcus, Osbert Bastani, Jacob R. Gardner
In multi-task Bayesian optimization, the goal is to leverage experience from optimizing existing tasks to improve the efficiency of optimizing new ones. While approaches using multi-task Gaussian processes or deep kernel transfer exist, the performance improvement is marginal when scaling beyond a moderate number of tasks. We introduce a novel approach leveraging large language models (LLMs) to learn from, and improve upon, previous optimization trajectories, scaling to approximately 1500 distinct tasks. Specifically, we propose a feedback loop in which an LLM is fine-tuned on the high quality solutions to specific tasks found by Bayesian optimization (BO). This LLM is then used to generate initialization points for future BO searches for new tasks. The trajectories of these new searches provide additional training data for fine-tuning the LLM, completing the loop. We evaluate our method on two distinct domains: database query optimization and antimicrobial peptide design. Results demonstrate that our approach creates a positive feedback loop, where the LLM’s generated initializations gradually improve, leading to better optimization performance. As this feedback loop continues, we find that the LLM is eventually able to generate solutions to new tasks in just a few shots that are better than the solutions produced “from scratch” by Bayesian optimization, while simultaneously requiring significantly fewer oracle calls.
在多功能贝叶斯优化中,目标是利用现有任务的经验,优化现有任务,提高优化新任务的效率。 虽然使用多任务高斯进程或深内核传输的方法存在,但绩效改进在超出中度任务范围以外时是微不足道的。 我们引入了一种创新方法,利用大语言模型(LLMs)从以往的优化轨迹中学习并改进,缩小到大约1500项不同的任务。 具体地说,我们提议了一个反馈循环,让一个LLM对巴耶斯优化(BO)发现的具体任务的高质量解决方案进行微调。这个LM随后被用来为未来BO搜索新任务生成初始化点。这些新搜索的轨迹提供了额外的培训数据,用于微调LM,完成这一循环。我们在两个不同的领域评估了我们的方法:数据库查询优化和抗微缩胶条纹设计。结果显示,我们的方法创造了积极的反馈循环,在这个循环中,LLM的初始化逐渐改进,导致更好的优化绩效。 由于这一反馈循环继续存在,我们发现LM公司最终能够产生比新的解决方案的更好。
Article 148
Title@2025-06-12 (4): Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation
Title: Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation | Qualität über Menge: Steigerung der Dateneffizienz durch ensembled multimodale Daten-Curation | 质量高于数量:通过集合多模式数据曲线提高数据效率 2502.08211v2 |
Authors (7): Jinda Xu, Yuhao Song, Daming Wang, Weiwei Zhao, Minghua Chen, Kangliang Chen, Qinya Li
In an era overwhelmed by vast amounts of data, the effective curation of web-crawl datasets is essential for optimizing model performance. This paper tackles the challenges associated with the unstructured and heterogeneous nature of such datasets. Traditional heuristic curation methods often inadequately capture complex features, resulting in biases and the exclusion of relevant data. We introduce an advanced, learning-driven approach, Ensemble Curation Of DAta ThroUgh Multimodal Operators (EcoDatum), incorporating a novel quality-guided deduplication method to ensure balanced feature distributions. EcoDatum strategically integrates various unimodal and multimodal data curation operators within a weak supervision ensemble framework, utilizing automated optimization to score each data point effectively. EcoDatum, which significantly improves the data curation quality and efficiency, outperforms existing state-of-the-art (SOTA) techniques, ranking 1st on the DataComp leaderboard with an average performance score of 0.182 across 38 diverse evaluation datasets. This represents a 28% improvement over the DataComp baseline method, demonstrating its effectiveness in improving dataset curation and model training efficiency.
在一个被大量数据所淹没的时代,有效校正网络拖网数据集对于优化模型性能至关重要。本文件应对与这类数据集的无结构性和多样性性质有关的挑战。传统的超常调解析方法往往不能充分捕捉复杂的特征,导致偏向和排斥相关数据。我们引入了一种先进的、由学习驱动的方法,即Data ThroUgh多式操作员(EcoDatum)的集合曲线,其中包括一种新型的质量引导解析方法,以确保均衡地分布特征。EcoDatum在战略上将各种单式和多式联运数据调控操作员纳入一个薄弱的监管整体框架,利用自动化优化来有效评分每个数据点。EcoDatum大大改进了数据调控的质量和效率,超越了现有的状态(SOTA)技术,在数据组合领导板上排名第1位,在38个不同的评估数据集中平均业绩评分为0.182。这代表了数据比较基准方法的28%的改进,显示了其在改进数据整理模型和培训效率方面的有效性。
Article 149
Title@2025-06-12 (4): A Crack in the Bark: Leveraging Public Knowledge to Remove Tree-Ring Watermarks
Title: A Crack in the Bark: Leveraging Public Knowledge to Remove Tree-Ring Watermarks | Ein Riss in der Bark: Benutzt öffentliches Wissen, um Baum-Ring-Wasserzeichen zu entfernen | 瓶中的裂缝:利用公共知识来清除树状水印 2506.10502v1 |
Authors (2): Junhua Lin, Marc Juarez
We present a novel attack specifically designed against Tree-Ring, a watermarking technique for diffusion models known for its high imperceptibility and robustness against removal attacks. Unlike previous removal attacks, which rely on strong assumptions about attacker capabilities, our attack only requires access to the variational autoencoder that was used to train the target diffusion model, a component that is often publicly available. By leveraging this variational autoencoder, the attacker can approximate the model’s intermediate latent space, enabling more effective surrogate-based attacks. Our evaluation shows that this approach leads to a dramatic reduction in the AUC of Tree-Ring detector’s ROC and PR curves, decreasing from 0.993 to 0.153 and from 0.994 to 0.385, respectively, while maintaining high image quality. Notably, our attacks outperform existing methods that assume full access to the diffusion model. These findings highlight the risk of reusing public autoencoders to train diffusion models – a threat not considered by current industry practices. Furthermore, the results suggest that the Tree-Ring detector’s precision, a metric that has been overlooked by previous evaluations, falls short of the requirements for real-world deployment.
我们展示了一种新颖的攻击,专门针对树林,一种以其高度不可感知性和强力来对付清除攻击的传播模型。与以往的清除攻击相比,我们的攻击仅仅需要使用用于训练目标扩散模型的变异自动编码器,这是经常公开使用的部件。通过利用这种变异自动编码器,攻击者可以接近模型的中间潜伏空间,从而能够更有效地进行代孕攻击。我们的评估表明,这一方法导致AUC的植树探测器ROC和PR曲线大幅下降,分别从0.993降至0.153和0.994降至0.385,同时保持高图像质量。值得注意的是,我们的攻击超出了现有方法,无法完全进入扩散模型。这些调查结果突出表明了重新使用公共自动编码器来训练扩散模型的风险,目前工业实践没有考虑到这一威胁。此外,结果显示,树林探测器的精确度是以前评估所忽略的、实际世界部署要求的短处。
Article 150
Title@2025-06-12 (4): BugGen: A Self-Correcting Multi-Agent LLM Pipeline for Realistic RTL Bug Synthesis
Title: BugGen: A Self-Correcting Multi-Agent LLM Pipeline for Realistic RTL Bug Synthesis | BugGen: Eine selbstkorrigierende LLM-Pipeline für eine realistische RTL-Bug-Synthese | BugGen: 现实的 RTL 错误合成自更正多 Agency LLM 管道 2506.10501v1 |
Authors (7): Surya Jasper, Minh Luu, Evan Pan, Aakash Tyagi, Michael Quinn, Jiang Hu, David Kebo Houngninou
Hardware complexity continues to strain verification resources, motivating the adoption of machine learning (ML) methods to improve debug efficiency. However, ML-assisted debugging critically depends on diverse and scalable bug datasets, which existing manual or automated bug insertion methods fail to reliably produce. We introduce BugGen, a first-of-its-kind, fully autonomous, multi-agent pipeline leveraging Large Language Models (LLMs) to systematically generate, insert, and validate realistic functional bugs in RTL. BugGen partitions modules, selects mutation targets via a closed-loop agentic architecture, and employs iterative refinement and rollback mechanisms to ensure syntactic correctness and functional detectability. Evaluated across five OpenTitan IP blocks, BugGen produced 500 unique bugs with 94% functional accuracy and achieved a throughput of 17.7 validated bugs per hour, over five times faster than typical manual expert insertion. Additionally, BugGen identified 104 previously undetected bugs in OpenTitan regressions, highlighting its utility in exposing verification coverage gaps. Compared against Certitude, BugGen demonstrated over twice the syntactic accuracy, deeper exposure of testbench blind spots, and more functionally meaningful and complex bug scenarios. Furthermore, when these BugGen-generated datasets were employed to train ML-based failure triage models, we achieved high classification accuracy (88.1%-93.2%) across different IP blocks, confirming the practical utility and realism of generated bugs. BugGen thus provides a scalable solution for generating high-quality bug datasets, significantly enhancing verification efficiency and ML-assisted debugging.
硬件复杂程度继续使核查资源紧张,促使采用机器学习(ML)方法来提高调试效率。然而,ML协助的调试关键取决于多种且可缩放的错误数据集,而现有的人工或自动错误插入方法无法可靠地生成这些数据集。我们引入了BugGen,这是同类的首个完全自主的多剂管道管道,利用大语言模型(LLLMS)系统生成、插入和验证了RTL. BugGen 分区模块中切合实际的功能错误,通过闭路代理结构选择了突变目标,并使用了迭代性改进和回滚回机制,以确保同步性效用正确性和功能可探测性。在五个 OpenTitan IP IP区中,BugGen 生成了500个独特的错误,其功能准确性为94%,并实现了17.7个经验证的错误,比典型的手动专家插入速度快5倍。此外,BugG在 OpenTeral-Timan Registration中发现了104个先前未检测过的错误,突出的错误在揭示核查范围上产生漏洞。 对比,BBBBBG 80,BBBen 和滚化的精确度的精确度为两倍的两倍。当我们使用了双向了两次地展示的精确度提高了的精确度测测测测测测测测测测算时,这些测测测测测测测测测测测测测测算。
Article 151
Title@2025-06-12 (4): Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG
Title: Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG | Lag nicht, RAG: Training-freie Adversarial Detection mit RAG | 不要拉格,RAG:使用RAG进行无训练的反向探测 2504.04858v2 |
Authors (4): Roie Kazoom, Raz Lapid, Moshe Sipper, Ofer Hadar
Adversarial patch attacks pose a major threat to vision systems by embedding localized perturbations that mislead deep models. Traditional defense methods often require retraining or fine-tuning, making them impractical for real-world deployment. We propose a training-free Visual Retrieval-Augmented Generation (VRAG) framework that integrates Vision-Language Models (VLMs) for adversarial patch detection. By retrieving visually similar patches and images that resemble stored attacks in a continuously expanding database, VRAG performs generative reasoning to identify diverse attack types, all without additional training or fine-tuning. We extensively evaluate open-source large-scale VLMs, including Qwen-VL-Plus, Qwen2.5-VL-72B, and UI-TARS-72B-DPO, alongside Gemini-2.0, a closed-source model. Notably, the open-source UI-TARS-72B-DPO model achieves up to 95 percent classification accuracy, setting a new state-of-the-art for open-source adversarial patch detection. Gemini-2.0 attains the highest overall accuracy, 98 percent, but remains closed-source. Experimental results demonstrate VRAG’s effectiveness in identifying a variety of adversarial patches with minimal human annotation, paving the way for robust, practical defenses against evolving adversarial patch attacks.
通过嵌入局部扰动,误导深层模型,Adversarial Adversarial Communication攻击对视觉系统构成重大威胁。传统防御方法往往需要再培训或微调,使其不适合于现实世界的部署。我们提议了一个无培训的视觉回溯回动一代(VRAG)框架,将视觉-语言模型(VLMS)与Gemini-2.0的封闭源模式结合起来。值得注意的是,开放源代码的UI-TARS-72B-DPO模型在不断扩展的数据库中获取类似存储式袭击的类似视觉补丁和图像,实现了95%的分类精确度,在没有额外培训或微调的情况下,都确定了各种不同的攻击类型。我们广泛评价了开放源大型VLMS,包括Qwen-VL-Plus、Qwen2.5-VL-72B和UI-TARS-72B-DPO(VRAG-72B-DPO),以及封闭源模型。一个最高水平的保密性测试结果,用于不断升级的VG-TAR-TAR-72B-DPO。
Article 152
Title@2025-06-12 (4): SHORE: A Long-term User Lifetime Value Prediction Model in Digital Games
Title: SHORE: A Long-term User Lifetime Value Prediction Model in Digital Games | SHORE: Ein langfristiges User Lifetime Value Prediction Model in digitalen Spielen | SHORE: 数字游戏中长期用户终身价值预测模型 2506.10487v1 |
Authors (6): Shuaiqi Sun, Congde Yuan, Haoqiang Yang, Mengzhuo Guo, Guiying Wei, Jiangbo Tian
In digital gaming, long-term user lifetime value (LTV) prediction is essential for monetization strategy, yet presents major challenges due to delayed payment behavior, sparse early user data, and the presence of high-value outliers. While existing models typically rely on either short-cycle observations or strong distributional assumptions, such approaches often underestimate long-term value or suffer from poor robustness. To address these issues, we propose SHort-cycle auxiliary with Order-preserving REgression (SHORE), a novel LTV prediction framework that integrates short-horizon predictions (e.g., LTV-15 and LTV-30) as auxiliary tasks to enhance long-cycle targets (e.g., LTV-60). SHORE also introduces a hybrid loss function combining order-preserving multi-class classification and a dynamic Huber loss to mitigate the influence of zero-inflation and outlier payment behavior. Extensive offline and online experiments on real-world datasets demonstrate that SHORE significantly outperforms existing baselines, achieving a 47.91% relative reduction in prediction error in online deployment. These results highlight SHORE’s practical effectiveness and robustness in industrial-scale LTV prediction for digital games.
在数字赌博中,长期用户寿命值(LTV)预测对于货币化战略至关重要,但是由于延迟支付行为、早期用户数据稀少以及存在高值外值,带来了重大挑战。虽然现有模型通常依赖短周期观测或强有力的分配假设,但这类方法往往低估长期价值或缺乏稳健性。为了解决这些问题,我们提议采用“Shoort周期辅助”和“命令保护回流(SHORE)”,这是一个新的LTV预测框架,将短视预测(例如LTV-15和LTV-30)作为辅助任务,以加强长周期目标(例如LTV-60),从而构成重大挑战。SHORE还引入了混合损失功能,结合了维持订单的多级分类和动态枢纽损失,以减轻零通胀和外部支付行为的影响。在现实世界数据集上的广泛的离线和在线实验表明,SHORE大大超越了现有基线(例如LTVE),实现了在线部署中预测误差47.91%的相对减少。这些结果突出表明了SHORTV的实用性预测。
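For reference, the standard Huber loss that the "dynamic Huber loss" above builds on can be written in a few lines. This is the textbook fixed-threshold form; how SHORE adapts the threshold is not described in the abstract, so the constant `delta` here is an assumption.

```python
def huber_loss(residual, delta=1.0):
    """Standard Huber loss: quadratic for small residuals, linear beyond `delta`."""
    abs_r = abs(residual)
    if abs_r <= delta:
        return 0.5 * residual ** 2
    # The linear branch grows slower than the square, limiting the pull of outliers.
    return delta * (abs_r - 0.5 * delta)

small = huber_loss(0.5)          # quadratic regime
large = huber_loss(3.0)          # linear regime; squared error would give 4.5
```

The linear tail is what makes the loss less sensitive to rare high-value payers than squared error.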
Article 153
Title@2025-06-12 (4): A Conjoint Graph Representation Learning Framework for Hypertension Comorbidity Risk Prediction
Title: A Conjoint Graph Representation Learning Framework for Hypertension Comorbidity Risk Prediction | Ein konjoint Graph Representation Learning Framework für Hypertonie Comorbidity Risk Prediction | 超高频风险预测联合图表代表学习框架 2505.05094v2 |
Authors (3): Leming Zhou, Zuo Wang, Zhixuan Duan
The comorbidities of hypertension impose a heavy burden on patients and society. Early identification is necessary for prompt intervention, but it remains a challenging task. This study aims to address this challenge by combining joint graph learning with network analysis. We develop a Conjoint Graph Representation Learning (CGRL) framework that: a) constructs two networks based on disease coding, namely the patient network and the disease difference network, and generates three comorbidity network features from the basic difference network to capture the potential relationship between comorbidities and risk diseases; b) incorporates computational structure intervention and feature representation learning to predict the risks of diabetes and coronary heart disease in patients; and c) analyzes comorbidity patterns and explores pathways of disease progression, which may reveal the pathogenesis of diabetes and coronary heart disease. The results show that the network features extracted from the difference network are important, and that the proposed framework provides more accurate predictions than other strong models.
高血压的发病率给病人和社会带来沉重的负担。早期诊断对于及时干预来说是必要的,但它仍然是一项艰巨的任务。本研究旨在通过将联合图表学习与网络分析相结合来应对这一挑战。我们受这一发现驱动,开发了一个共同图表代表学习框架:(a) 建立基于疾病编码的两个网络,包括病人网络和疾病差异网络。三个发病率网络功能基于基本差异网络产生,以捕捉并发症和风险疾病之间的潜在关系;(b) 纳入计算结构干预和学习特征代表,开发了高血压中心,以预测糖尿病和慢性心脏病在病人中的风险;以及(c) 分析并发症模式并探索疾病演变的途径、糖尿病和冠心病病病病的病理病理病理发。结果显示,基于差异网络提取的网络特征很重要,我们提出的框架比其他强型模型更准确地预测。
Article 154
Title@2025-06-12 (4): Zero-Shot Offline Imitation Learning via Optimal Transport
Title: Zero-Shot Offline Imitation Learning via Optimal Transport | Zero-Shot Offline-Imitation Lernen über optimalen Transport | 通过最佳交通方式进行零热离线模拟学习 2410.08751v3 |
Authors (5): Thomas Rupf, Marco Bagatella, Nico Gürtler, Jonas Frey, Georg Martius
Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector, and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent’s immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks. The code is available at https://github.com/martius-lab/zilot.
零光模拟学习算法有望从测试时的单一演示中产生复制无形行为的希望。 现有的实用方法将专家演示视为一个目标序列, 能够与高级目标选择器进行模仿, 以及一个低级目标限制政策。 但是, 这个框架可能会受到近似行为的影响: 代理人为实现个别目标而立即采取的行动可能会破坏长期目标。 我们引入了一种新的方法, 通过直接优化与模仿学习所固有的占用匹配的目标来缓解这一问题。 我们提议将一个有目标条件的价值函数提升到占居之间的距离, 而后者又可以通过一个学习的世界模型进行近似。 由此产生的方法可以从离线、 亚最佳的数据中学习, 并且能够进行非显微、 零光的模仿, 正如我们在复杂、 连续的基准中所显示的那样。 该代码可以在 https://github.com/ martius-lab/zilot 上查阅 。
Article 155
Title@2025-06-12 (4): Engagement-Driven Content Generation with Large Language Models
Title: Engagement-Driven Content Generation with Large Language Models | Engagement-getriebene Content-Generierung mit großen Sprachmodellen | 具有大语言模式的 “ 互动互动 “ 式内容生成 2411.13187v5 |
Authors (5): Erica Coppolillo, Federico Cinus, Marco Minici, Francesco Bonchi, Giuseppe Manco
Large Language Models (LLMs) demonstrate significant persuasive capabilities in one-on-one interactions, but their influence within social networks, where interconnected users and complex opinion dynamics pose unique challenges, remains underexplored. This paper addresses the research question: "Can LLMs generate meaningful content that maximizes user engagement on social networks?" To answer this, we propose a pipeline using reinforcement learning with simulated feedback, where the network’s response to LLM-generated content (i.e., the reward) is simulated through a formal engagement model. This approach bypasses the temporal cost and complexity of live experiments, enabling an efficient feedback loop between the LLM and the network under study. It also allows control over endogenous factors such as the LLM’s position within the social network and the distribution of opinions on a given topic. Our approach is adaptive to the opinion distribution of the underlying network and agnostic to the specifics of the engagement model, which is embedded as a plug-and-play component. Such flexibility makes it suitable for more complex engagement tasks and interventions in computational social science. Using our framework, we analyze the performance of LLMs in generating social engagement under different conditions, showcasing their full potential in this task. The experimental code is publicly available at https://github.com/mminici/Engagement-Driven-Content-Generation.
大型语言模型(LLMS)在一对一互动中展示了相当的说服能力,但在社交网络中,用户相互联系和复杂的观点动态构成独特的挑战,其影响仍然未得到充分探讨。本文论述研究问题: =emph{Can LLMS产生有意义的内容,使用户在社交网络上的参与最大化?}为此,我们提议利用模拟反馈的强化学习管道,让网络对LLM产生的内容(即奖赏)的反应通过正式参与模式模拟。这种方法绕过现场实验的时间成本和复杂性,使得LLM和正在研究的网络之间能够有一个高效的反馈循环。它还能够控制本地因素,如LLMM在社交网络中的位置和对某个特定主题的意见的传播。我们的方法是适应基础网络的舆论分布,对参与模式的具体内容(即奖赏)的认知,该模式被嵌入为“Con-lay”部分。这种灵活性使得它适合于更复杂的参与任务和计算社会科学中的干预。我们用这个框架来分析LLMMMMS/LG公司在不同的条件下现有的实验性任务。
Article 156
Title@2025-06-12 (4): Prediction of steady states in a marine ecosystem model by a machine learning technique
Title: Prediction of steady states in a marine ecosystem model by a machine learning technique | Vorhersage von stetigen Zuständen in einem marinen Ökosystemmodell durch eine Maschinelle Lerntechnik | 通过机械学习技术预测海洋生态系统模型中稳定的状态 2506.10475v1 |
Authors (2): Sarker Miraz Mahfuz, Thomas Slawig
We used precomputed steady states obtained by a spin-up for a global marine ecosystem model as training data to build a mapping from the small number of biogeochemical model parameters onto the three-dimensional converged steady annual cycle. The mapping was performed by a conditional variational autoencoder (CVAE) with mass correction. Applied to test data, the prediction obtained by the CVAE already gives a reasonably good approximation of the steady states obtained by a regular spin-up. However, the predictions do not reach the same level of annual periodicity as those obtained in the original spin-up data. Thus, we took the predictions as initial values for a spin-up. We could show that the number of necessary iterations, corresponding to model years, to reach a prescribed stopping criterion in the spin-up could be significantly reduced compared to the use of the originally uniform, constant initial value. The amount of reduction depends on the applied stopping criterion, which measures the periodicity of the solution. The savings in needed iterations and, thus, in computing time for the spin-up range from 50 to 95%, depending on the stopping criterion for the spin-up. We compared these results with the use of the mean of the training data as an initial value. We found that this also accelerates the spin-up, but only by a much lower factor.
我们用一个全球海洋生态系统模型的附带值获得的预测稳定状态作为培训数据,从少量生物地球化学模型参数到三维趋同的稳定的年周期进行绘图。该绘图由一个有条件的变异自动coder(CVAE)进行,并进行质量校正。我们应用了测试数据,显示CVAE获得的预测已经对通过定期旋转获得的稳定状态提供了合理的良好近似值。然而,预测并不达到与原始旋转数据中获得的周期值相同的年度周期水平。因此,我们将这些预测作为递增的初步值。我们可以显示,必要的迭代数(与模型年相对应)与质量校正校准相符,与最初的统一、不变的初始值相比,可能大大降低。降低的幅度取决于应用的停止标准,衡量解决办法的周期性。需要的年周期性节省量,因此,旋转周期的时间从50到95°不等,这取决于循环值的停止值,与模型年相比,我们也可以表明,必要的迭代数的数量可以达到一定的停止标准,与模型值相比,与最初的统一、不变的初始值相比,我们用这个加速值比较了这些系数的结果。我们还比较了这一数值。
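The warm-start idea in this abstract can be illustrated with a toy fixed-point iteration. This is only a minimal sketch and not the marine ecosystem model: `toy_model_year` is a hypothetical scalar contraction standing in for one model year, the stopping criterion is a simple difference between successive iterates, and the "predicted" initial value is just assumed to lie close to the fixed point.

```python
def spin_up(step, y0, tol=1e-8, max_iter=100_000):
    """Iterate y <- step(y) until successive iterates differ by less than tol.

    Returns (final state, number of iterations used).
    """
    y = y0
    for n in range(1, max_iter + 1):
        y_next = step(y)
        if abs(y_next - y) < tol:
            return y_next, n
        y = y_next
    return y, max_iter


def toy_model_year(y):
    # Hypothetical stand-in for one model year: a linear contraction
    # whose fixed point (the "steady state") is y* = 2.0.
    return 0.99 * y + 0.02


# Spin-up from a constant initial value versus from an assumed good prediction.
steady, iters_constant = spin_up(toy_model_year, y0=0.0)
_, iters_warm = spin_up(toy_model_year, y0=1.98)
print(iters_constant, iters_warm)
```

In this toy setting the saving depends on both the quality of the initial guess and the tolerance, which mirrors the abstract's remark that the reduction depends on the stopping criterion.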
Article 157
Title@2025-06-12 (4): PASCO (PArallel Structured COarsening): an overlay to speed up graph clustering algorithms
Title: PASCO (PArallel Structured COarsening): an overlay to speed up graph clustering algorithms | PASCO (PArallel Structured COarsening): ein Overlay, um Graph Clustering Algorithmen zu beschleunigen | PASCO(PARALLEL 结构化COseting):加速图形组合算法的重叠 2412.13592v2 |
Authors (7): Etienne Lasalle, Rémi Vaudaine, Titouan Vayer, Pierre Borgnat, Rémi Gribonval, Paulo Gonçalves, Màrton Karsai
Clustering the nodes of a graph is a cornerstone of graph analysis and has been extensively studied. However, some popular methods are not suitable for very large graphs: e.g., spectral clustering requires the computation of the spectral decomposition of the Laplacian matrix, which is not applicable to large graphs with a large number of communities. This work introduces PASCO, an overlay that accelerates clustering algorithms. Our method consists of three steps: (1) we compute several independent small graphs representing the input graph by applying an efficient and structure-preserving coarsening algorithm; (2) a clustering algorithm is run in parallel on each small graph, which provides several partitions of the initial graph; (3) these partitions are aligned and combined with an optimal transport method to output the final partition. The PASCO framework is based on two key contributions: a novel global algorithm structure designed to enable parallelization and a fast, empirically validated graph coarsening algorithm that preserves structural properties. We demonstrate the strong performance of PASCO in terms of computational efficiency, structural preservation, and output partition quality, evaluated on both synthetic and real-world graph datasets.
将图形的节点分组为图表分析的基石,并且已经进行了广泛的研究。但是,一些流行的方法不适合非常大的图表:例如,光谱集群要求计算拉普拉西亚矩阵的光谱分解,而拉普拉西亚矩阵不适用于有大量社区的大图。这项工作介绍了PASCO, 这是一种加速组合算法的叠加。我们的方法包括三个步骤:1-我们通过应用高效和结构保存的粗略算法,计算代表输入图的几个独立的小图。2-A组算法是平行在每一个小图上运行的,并提供了初始图的若干分区。3-这些分区与输出最终分区的最佳运输方法一致并结合。PASCO框架基于两种关键贡献:一种新的全球算法结构,旨在促成平行化,一种经经验验证的快速图表分解算法,以保存结构特性。我们从计算效率、结构保存和产出分区质量的角度展示了1个PASCO在合成和现实世界图表数据集上的巨大表现。
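The third step, aligning and combining several partitions, can be sketched on a tiny example. Note the substitution: instead of the optimal transport method used by PASCO, this sketch aligns labels by brute-force search over label permutations (only feasible for a handful of clusters) and then takes a majority vote. The function names `align` and `combine` are hypothetical.

```python
from collections import Counter
from itertools import permutations


def align(reference, partition, k):
    """Relabel `partition` with the label permutation that best agrees with `reference`."""
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(perm[p] == r for p, r in zip(partition, reference)),
    )
    return [best[p] for p in partition]


def combine(partitions, k):
    """Align every partition to the first one, then take a per-node majority vote."""
    reference = partitions[0]
    aligned = [reference] + [align(reference, p, k) for p in partitions[1:]]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same clustering, labels swapped
    [0, 0, 1, 1, 1, 1],  # one node disagrees
]
final = combine(partitions, k=2)
print(final)
```

The alignment step matters because cluster labels from independent runs are arbitrary: without it, the second partition would vote against the first even though both describe the same grouping.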
Article 158
Title@2025-06-12 (4): Glimpse: Generalized Locality for Scalable and Robust CT
Title: Glimpse: Generalized Locality for Scalable and Robust CT | Glimpse: Generalisierte Lokalität für skalierbare und robuste CT | Glumpse: 可缩放和强力CT的通用位置 2401.00816v3 |
Authors (4): AmirEhsan Khorashadizadeh, Valentin Debarnot, Tianlin Liu, Ivan Dokmanić
Deep learning has become the state-of-the-art approach to medical tomographic imaging. A common approach is to feed the result of a simple inversion, for example the backprojection, to a multiscale convolutional neural network (CNN) which computes the final reconstruction. Despite good results on in-distribution test data, this often results in overfitting certain large-scale structures and poor generalization on out-of-distribution (OOD) samples. Moreover, the memory and computational complexity of multiscale CNNs scale unfavorably with image resolution, making them impractical for application at realistic clinical resolutions. In this paper, we introduce Glimpse, a local coordinate-based neural network for computed tomography which reconstructs a pixel value by processing only the measurements associated with the neighborhood of the pixel. Glimpse significantly outperforms successful CNNs on OOD samples, while achieving comparable or better performance on in-distribution test data and maintaining a memory footprint almost independent of image resolution; 5 GB of memory suffices to train on 1024x1024 images, which is orders of magnitude less than CNNs require. Glimpse is fully differentiable and can be used plug-and-play in arbitrary deep learning architectures, enabling feats such as correcting miscalibrated projection orientations. Our implementation and Google Colab demo can be accessed at https://github.com/swing-research/Glimpse.
深度学习已成为医学断层成像的最先进方法。一种常见做法是将简单反演(例如反投影)的结果输入多尺度卷积神经网络(CNN),由其计算最终重建结果。尽管在分布内测试数据上效果良好,但这往往导致对某些大尺度结构的过拟合,并且在分布外(OOD)样本上泛化较差。此外,多尺度CNN的内存和计算复杂度随图像分辨率的扩展性较差,使其难以应用于实际的临床分辨率。在本文中,我们提出了Glimpse,这是一种用于计算机断层成像的基于局部坐标的神经网络,它仅处理与像素邻域相关的测量值来重建该像素的值。Glimpse在OOD样本上显著优于成功的CNN,同时在分布内测试数据上取得相当或更好的性能,并保持几乎与图像分辨率无关的内存占用;5GB内存足以在1024x1024图像上进行训练,比CNN所需的内存低几个数量级。Glimpse完全可微,可以即插即用地用于任意深度学习架构,从而实现校正标定有误的投影方向等功能。我们的实现和Google Colab演示可在 https://github.com/swing-research/Glimpse 获取。
Article 159
Title@2025-06-12 (4): Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts
Title: Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts | Auf dem Weg zur robusten multimodalen Emotionserkennung unter fehlenden Modalitäten und Verteilungsverschiebungen | 争取在缺失模式和分销转移模式下强有力地承认多模式情感 2506.10452v1 |
Authors (5): Guowei Zhong, Ruohong Huan, Mingzhen Wu, Ronghua Liang, Peng Chen
Recent advancements in Multimodal Emotion Recognition (MER) face challenges in addressing both modality missing and Out-Of-Distribution (OOD) data simultaneously. Existing methods often rely on specific models or introduce excessive parameters, which limits their practicality. To address these issues, we propose a novel robust MER framework, Causal Inference Distiller (CIDer), and introduce a new task, Random Modality Feature Missing (RMFM), to generalize the definition of modality missing. CIDer integrates two key components: a Model-Specific Self-Distillation (MSSD) module and a Model-Agnostic Causal Inference (MACI) module. MSSD enhances robustness under the RMFM task through a weight-sharing self-distillation approach applied across low-level features, attention maps, and high-level representations. Additionally, a Word-level Self-aligned Attention Module (WSAM) reduces computational complexity, while a Multimodal Composite Transformer (MCT) facilitates efficient multimodal fusion. To tackle OOD challenges, MACI employs a tailored causal graph to mitigate label and language biases using a Multimodal Causal Module (MCM) and fine-grained counterfactual texts. Notably, MACI can independently enhance OOD generalization with minimal additional parameters. Furthermore, we also introduce the new repartitioned MER OOD datasets. Experimental results demonstrate that CIDer achieves robust performance in both RMFM and OOD scenarios, with fewer parameters and faster training compared to state-of-the-art methods. The implementation of this work is publicly accessible at https://github.com/gw-zhong/CIDer.
多模态情感识别(MER)的最新进展在同时处理模态缺失和分布外(OOD)数据方面仍面临挑战。现有方法往往依赖特定模型或引入过多参数,从而限制了其实用性。为了解决这些问题,我们提出了一个新的鲁棒MER框架,即因果推断蒸馏器(Causal Inference Distiller,CIDer),并引入一项新任务,即随机模态特征缺失(RMFM),以推广模态缺失的定义。CIDer整合了两个关键组件:模型特定的自蒸馏(MSSD)模块和模型无关的因果推断(MACI)模块。MSSD通过在低层特征、注意力图和高层表示上应用权重共享的自蒸馏方法,增强了RMFM任务下的鲁棒性。此外,词级自对齐注意力模块(WSAM)降低了计算复杂度,而多模态复合Transformer(MCT)则促进了高效的多模态融合。为应对OOD挑战,MACI采用定制的因果图,利用多模态因果模块(MCM)和细粒度反事实文本来减轻标签偏差和语言偏差。值得注意的是,MACI能够以极少的额外参数独立地增强OOD泛化能力。此外,我们还引入了新的重新划分的MER OOD数据集。实验结果表明,与最先进的方法相比,CIDer在RMFM和OOD两种场景下都取得了稳健的性能,且参数更少、训练更快。本工作的实现已在 https://github.com/gw-zhong/CIDer 公开。
Article 160
Title@2025-06-12 (4): Extending AALpy with Passive Learning: A Generalized State-Merging Approach
Title: Extending AALpy with Passive Learning: A Generalized State-Merging Approach | AALpy mit passivem Lernen erweitern: Ein generalisierter Staats-Merging-Ansatz | 扩大被动学习的AALPY:普遍化的国家-老龄化办法 2506.06333v2 |
Authors (2): Benjamin von Berg, Bernhard K. Aichernig
AALpy is a well-established open-source automata learning library written in Python with a focus on active learning of systems with IO behavior. It provides a wide range of state-of-the-art algorithms for different automaton types ranging from fully deterministic to probabilistic automata. In this work, we present the recent addition of a generalized implementation of an important method from the domain of passive automata learning: state-merging in the red-blue framework. Using a common internal representation for different automaton types allows for a general and highly configurable implementation of the red-blue framework. We describe how to define and execute state-merging algorithms using AALpy, which reduces the implementation effort for state-merging algorithms mainly to the definition of compatibility criteria and scoring. This aids the implementation of both existing and novel algorithms. In particular, defining some existing state-merging algorithms from the literature with AALpy only takes a few lines of code.
AALpy 是一个以 Python 写成的成熟的开放源码自动磁体学习图书馆, 重点是积极学习具有 IO 行为的系统。 它为从完全确定性到概率性不同类型的不同自动磁体提供了一系列最先进的算法, 从完全确定性到概率性自动磁体。 在这项工作中, 我们介绍了最近从被动自动磁体学习领域普遍采用重要方法的附加情况: 红蓝色框架中的国家合并。 使用不同的自动磁体类型的共同内部代表可以普遍和高度可配置地实施红蓝色框架。 我们描述如何定义和执行使用 AALPy 的州合并算法, 从而减少国家合并算法的实施努力, 主要是确定兼容性标准和评分。 这有利于实施现有和新颖的算法。 特别是, 从文献中用 AALpy 来定义某些现有的州合并算法只需要几行代码。
Article 161
Title@2025-06-12 (4): MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices
Title: MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices | MNN-LLM: Eine generische Inferenz-Engine für schnelle großsprachliche Modellbereitstellung auf mobilen Geräten | MNN-LLM:移动设备快速大语言模型部署通用推断引擎 2506.10443v1 |
Authors (7): Zhaode Wang, Jingbang Yang, Xinyu Qian, Shiwen Xing, Xiaotang Jiang, Chengfei Lv, Shengyu Zhang
Large language models (LLMs) have demonstrated exceptional performance across a variety of tasks. However, their substantial scale leads to significant computational resource consumption during inference, resulting in high costs. Consequently, edge device inference presents a promising solution. The primary challenges of edge inference include memory usage and inference speed. This paper introduces MNN-LLM, a framework specifically designed to accelerate the deployment of large language models on mobile devices. MNN-LLM addresses the runtime characteristics of LLMs through model quantization and DRAM-Flash hybrid storage, effectively reducing memory usage. It rearranges weights and inputs based on mobile CPU instruction sets and GPU characteristics while employing strategies such as multicore load balancing, mixed-precision floating-point operations, and geometric computations to enhance performance. Notably, MNN-LLM achieves up to an 8.6x speed increase compared to current mainstream LLM-specific frameworks.
大型语言模型(LLMS)在各种任务中表现出非凡的性能,然而,它们的巨大规模导致在推论期间大量计算资源消耗,从而导致高成本。因此,边缘装置推论提出了一个有希望的解决办法。边缘推论的主要挑战包括内存使用和推论速度。本文介绍了MNN-LLM,这是一个专门旨在加速在移动设备上部署大型语言模型的框架。MNN-LLM通过模型四分化和DRAM-Flash混合储存处理LMS的运行时间特点,有效地减少了记忆的用量。它根据移动式CPU教学组和GPU特性重新安排了重量和投入,同时采用了多核心负载平衡、混合精度浮动点操作和几何计算等战略,以提高性能。值得注意的是,MNN-LLM比当前的主流LM特定框架提高了8.6x的速度。
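To make the memory argument for model quantization concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. The abstract does not state which quantization scheme or bit width MNN-LLM uses, so the scheme and the helper names `quantize_int8` and `dequantize` are assumptions chosen purely for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of a list of floats to int8 codes.

    Returns (codes, scale) such that weights[i] is approximately codes[i] * scale.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float weights."""
    return [c * scale for c in codes]


weights = [0.12, -0.5, 0.33, 0.0, 0.49]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Each weight is stored in one byte instead of four (for float32), at the cost of a rounding error bounded by half the scale.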
Article 162
Title@2025-06-12 (4): Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts
Title: Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts | Experten-Rennen: Eine flexible Routing-Strategie für Scaling Diffusion Transformer mit Mixture von Experten | 专家种族:与专家混合的扩大传播变异器灵活机动战略 2503.16057v3 |
Authors (7): Yike Yuan, Ziyu Wang, Zihao Huang, Defa Zhu, Xun Zhou, Jingyi Yu, Qiyang Min
Diffusion models have emerged as a mainstream framework in visual generation. Building upon this success, the integration of Mixture of Experts (MoE) methods has shown promise in enhancing model scalability and performance. In this paper, we introduce Race-DiT, a novel MoE model for diffusion transformers with a flexible routing strategy, Expert Race. By allowing tokens and experts to compete together and select the top candidates, the model learns to dynamically assign experts to critical tokens. Additionally, we propose per-layer regularization to address challenges in shallow layer learning, and router similarity loss to prevent mode collapse, ensuring better expert utilization. Extensive experiments on ImageNet validate the effectiveness of our approach, showcasing significant performance gains and promising scaling properties.
在视觉生成中,传播模式已成为主流框架。在这一成功的基础上,整合专家混合方法在加强模型缩放性和性能方面显示出了希望。在本文中,我们引入了 “ 种族DIT “ ,这是用于传播变压器的新型的 “ 种族DIT “ 模型,这是一个具有灵活路由战略的传播变压器的新颖的 “ 种族DIT “ 模型。通过允许象征性品和专家共同竞争和挑选顶级候选人,该模型学会动态地将专家分配到关键标志上。此外,我们提议对每个层次进行正规化,以应对浅层学习方面的挑战,以及路由性相似性损失,以防止模式崩溃,确保专家得到更好的利用。关于图像网络的广泛实验验证了我们的方法的有效性,展示了在有希望提升性能的同时取得的重大绩效收益。
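The routing idea, letting tokens and experts compete together and keeping the top candidates, can be sketched as a single top-k selection over all (token, expert) scores rather than a separate top-k per token. This is a minimal sketch under that reading of the abstract; the score values and the function name `expert_race_route` are hypothetical, and the paper's actual scoring and normalization are not reproduced here.

```python
import heapq


def expert_race_route(scores, k):
    """Select the k highest-scoring (token, expert) pairs across the whole score matrix.

    scores[t][e] is a hypothetical router affinity of token t for expert e.
    Returns a dict mapping token index -> list of selected expert indices.
    """
    flat = [(s, t, e) for t, row in enumerate(scores) for e, s in enumerate(row)]
    winners = heapq.nlargest(k, flat)
    routing = {t: [] for t in range(len(scores))}
    for _, t, e in winners:
        routing[t].append(e)
    return routing


scores = [
    [0.9, 0.8, 0.7],    # a "critical" token with high scores on every expert
    [0.1, 0.2, 0.6],
    [0.05, 0.1, 0.15],  # an "easy" token that may win no expert at all
]
routing = expert_race_route(scores, k=4)
print(routing)
```

Unlike per-token top-k, the number of experts per token is not fixed: the first token receives three experts and the last receives none, which is the flexibility the abstract describes.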
Article 163
Title@2025-06-12 (4): MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning
Title: MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning | MasHost baut alles: Autonomes Multi-Agenten-System, das durch Verstärkungslernen gesteuert wird | 以强化学习为导向的多机构自治系统 2506.08507v2 |
Authors (8): Kuo Yang, Xingjie Yang, Linhui Yu, Qing Xu, Yan Fang, Xu Wang, Zhengyang Zhou, Yang Wang
Large Language Model (LLM)-driven Multi-agent systems (Mas) have recently emerged as a powerful paradigm for tackling complex real-world tasks. However, existing Mas construction methods typically rely on manually crafted interaction mechanisms or heuristic rules, introducing human biases and constraining the autonomous ability. Even with recent advances in adaptive Mas construction, existing systems largely remain within the paradigm of semi-autonomous patterns. In this work, we propose MasHost, a Reinforcement Learning (RL)-based framework for autonomous and query-adaptive Mas design. By formulating Mas construction as a graph search problem, our proposed MasHost jointly samples agent roles and their interactions through a unified probabilistic sampling mechanism. Beyond the accuracy and efficiency objectives pursued in prior works, we introduce component rationality as an additional and novel design principle in Mas. To achieve this multi-objective optimization, we propose Hierarchical Relative Policy Optimization (HRPO), a novel RL strategy that collaboratively integrates group-relative advantages and action-wise rewards. To our knowledge, our proposed MasHost is the first RL-driven framework for autonomous Mas graph construction. Extensive experiments on six benchmarks demonstrate that MasHost consistently outperforms most competitive baselines, validating its effectiveness, efficiency, and structure rationality.
大型语言模型(LLM)驱动的多试剂系统(Mas)最近成为处理复杂的现实世界任务的一个强有力的范例,然而,现有的Mas建筑方法通常依赖人工设计的互动机制或超常规则,引入人类偏见并限制自主能力。即使最近在适应性Mas建设方面有所进展,现有系统在很大程度上仍然处于半自治模式范式的范式之内。在这项工作中,我们提议以MasHost为主的强化学习框架(RL)为基础,用于自主和调试性Mas设计。通过将Mas建筑设计成图表搜索问题,我们提议的Mashost联合样本代理作用及其相互作用通过统一的概率抽样机制进行。除了在以前的工程中追求的准确性和效率目标外,我们还引入了部分合理性,作为新的设计原则。为了实现这一多目标优化,我们提议了高分级相对优化(HRPO),这是一个新型的RL战略,将群体优势和行动角度的奖赏结合起来。我们所拟议的Mashost是第一个由RL驱动的自主结构最有竞争力的标准,在连续的马斯最有竞争力的结构上展示的基线。
Article 164
Title@2025-06-12 (4): From Features to Graphs: Exploring Graph Structures and Pairwise Interactions via GNNs
Title: From Features to Graphs: Exploring Graph Structures and Pairwise Interactions via GNNs | Von Features zu Graphen: Erforschen von Graphenstrukturen und paarweise Interaktionen über GNNs | 从地貌到图表:通过GNNs探索图表结构和对称互动 2502.13471v2 |
Authors (4): Phaphontee Yamchote, Saw Nay Htet Win, Chainarong Amornbunchornvej, Thanapon Noraset
Feature interaction is crucial in predictive machine learning models, as it captures the relationships between features that influence model performance. In this work, we focus on pairwise interactions and investigate their importance in constructing feature graphs for Graph Neural Networks (GNNs). We leverage existing GNN models and tools to explore the relationship between feature graph structures and their effectiveness in modeling interactions. Through experiments on synthesized datasets, we uncover that edges between interacting features are important for enabling GNNs to model feature interactions effectively. We also observe that including non-interaction edges can act as noise, degrading model performance. Furthermore, we provide theoretical support for sparse feature graph selection using the Minimum Description Length (MDL) principle. We prove that feature graphs retaining only necessary interaction edges yield a more efficient and interpretable representation than complete graphs, aligning with Occam’s Razor. Our findings offer both theoretical insights and practical guidelines for designing feature graphs that improve the performance and interpretability of GNN models.
在预测性机器学习模型中,特性互动至关重要,因为它能捕捉影响模型性能的特征之间的关系。在这项工作中,我们注重双向互动,并调查其在为图形神经网络(GNN)构建特征图形方面的重要性。我们利用现有的GNN模型和工具来探索特征图形结构之间的关系及其建模互动的有效性。通过综合数据集实验,我们发现互动特征之间的边缘对于使GNN能够有效地模拟特征互动非常重要。我们还注意到,包括非互动边缘可以起到噪音、降低模式性能的作用。此外,我们提供理论支持,支持使用最低描述长度原则选择稀有特征图形。我们证明,只保留必要互动边缘的特征图形能够比完整的图形产生比完整图形更高效和可解释的描述性。我们的调查结果为设计改进GNN模型的性能和可解释性提供了理论洞察力和实用指南。
Article 165
Title@2025-06-12 (4): System Identification Using Kolmogorov-Arnold Networks: A Case Study on Buck Converters
Title: System Identification Using Kolmogorov-Arnold Networks: A Case Study on Buck Converters | Systemidentifizierung mit Kolmogorov-Arnold Networks: Eine Fallstudie zu Buck Convertern | 使用Kolmogorov-Arnold网络的系统识别系统:关于巴克转换器的案例研究 2506.10434v1 |
Authors (3): Nart Gashi, Panagiotis Kakosimos, George Papafotiou
Kolmogorov-Arnold Networks (KANs) are emerging as a powerful framework for interpretable and efficient system identification in dynamic systems. By leveraging the Kolmogorov-Arnold representation theorem, KANs enable function approximation through learnable activation functions, offering improved scalability, accuracy, and interpretability compared to traditional neural networks. This paper investigates the application of KANs to model and analyze the dynamics of a buck converter system, focusing on state-space parameter estimation along with discovering the system equations. Using simulation data, the methodology involves approximating state derivatives with KANs, constructing interpretable state-space representations, and validating these models through numerical experiments. The results demonstrate the ability of KANs to accurately identify system dynamics, verify model consistency, and detect parameter changes, providing valuable insights into their applicability for system identification in modern industrial systems.
Kolmogorov-Arnold Networks (KANs) 正在成为动态系统中可解释和高效系统识别的强大框架。 通过利用 Kolmogorov-Arnold 演示理论, KANs 能够通过可学习的激活功能实现功能近似, 提供比传统神经网络更好的可缩放性、 准确性和可解释性。 本文调查了 KANs 用于模拟和分析高压转换器系统的动态, 重点是州空间参数估计以及发现系统方程式。 使用模拟数据, 该方法涉及将国家衍生物与 KANs相匹配, 构建可解释的状态- 空间表达方式, 通过数字实验验证这些模型。 结果表明 KANs 能够准确识别系统动态、 校验模型一致性和检测参数变化, 提供有价值的洞察其适用于现代工业系统中系统识别系统的实用性。
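State-space parameter estimation for a buck converter can be illustrated without any neural network. The sketch below swaps the KAN for ordinary linear least squares and uses the standard averaged buck converter model, L di/dt = d·V_in − v and C dv/dt = i − v/R. All component values, the duty-cycle profile, and the function names are assumptions made for illustration, and the state derivatives are taken directly from the simulated model rather than approximated from measurements.

```python
def buck_derivatives(i_l, v_c, duty, L=1e-3, C=1e-4, R=10.0, v_in=12.0):
    """Averaged buck converter model: returns (di/dt, dv/dt)."""
    di = (duty * v_in - v_c) / L
    dv = (i_l - v_c / R) / C
    return di, dv


def fit_two_params(x1, x2, y):
    """Least-squares fit of y ~ a*x1 + b*x2 via the 2x2 normal equations."""
    s11 = sum(p * p for p in x1)
    s12 = sum(p * q for p, q in zip(x1, x2))
    s22 = sum(q * q for q in x2)
    t1 = sum(p * r for p, r in zip(x1, y))
    t2 = sum(q * r for q, r in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


# Simulate with forward Euler and a duty-cycle step to excite the dynamics.
dt, i_l, v_c = 1e-5, 0.0, 0.0
volts, duties, di_samples = [], [], []
for n in range(2000):
    duty = 0.3 if n < 1000 else 0.6
    di, dv = buck_derivatives(i_l, v_c, duty)
    volts.append(v_c)
    duties.append(duty)
    di_samples.append(di)
    i_l += dt * di
    v_c += dt * dv

# The inductor equation is di/dt = a * v_c + b * duty with a = -1/L and b = v_in/L.
a, b = fit_two_params(volts, duties, di_samples)
L_est = -1.0 / a
v_in_est = b * L_est
print(L_est, v_in_est)
```

Because the data here are noise-free and the model is linear in the unknowns, the fit recovers the assumed inductance and input voltage; a learned model such as a KAN becomes relevant when the functional form itself has to be discovered.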
Article 166
Title@2025-06-12 (4): Measuring Semantic Information Production in Generative Diffusion Models
Title: Measuring Semantic Information Production in Generative Diffusion Models | Messung der semantischen Informationsproduktion in generativen Diffusionsmodellen | 生物传播模型中计量语义信息生产 2506.10433v1 |
Authors (4): Florian Handke, Félix Koulischer, Gabriel Raya, Luca Ambrogioni
It is well known that semantic and structural features of the generated images emerge at different times during the reverse dynamics of diffusion, a phenomenon that has been connected to physical phase transitions in magnets and other materials. In this paper, we introduce a general information-theoretic approach to measure when these class-semantic “decisions” are made during the generative process. By using an online formula for the optimal Bayesian classifier, we estimate the conditional entropy of the class label given the noisy state. We then determine the time intervals corresponding to the highest information transfer between noisy states and class labels using the time derivative of the conditional entropy. We demonstrate our method on one-dimensional Gaussian mixture models and on DDPM models trained on the CIFAR10 dataset. As expected, we find that the semantic information transfer is highest in the intermediate stages of diffusion while vanishing during the final stages. However, we found sizable differences between the entropy rate profiles of different classes, suggesting that different “semantic decisions” are located at different intermediate times.
众所周知, 生成图像的语义和结构特征在不同时间的反向扩散动态中出现, 这种现象与磁体和其他材料的物理阶段转变有关。 在本文中, 我们引入了一种一般的信息理论方法, 以测量这些类类的语义“ 决定” 是在基因化过程中作出的。 我们通过使用最佳贝叶斯语分类器的在线公式, 估计了等级标签的有条件的酶值, 以吵闹的状态为条件。 然后, 我们用有条件的 entropy 衍生物的时间来决定噪音状态和类标签之间最高信息传输的时间间隔。 我们用单维高斯语混合模型和在 CIFAR10 数据集培训的 DDPM 模型来演示我们的方法 。 正如预期的那样, 我们发现语义信息传输在传播的中间阶段最高, 在最后阶段消失。 然而, 我们发现不同等级的 的 伦比率剖面剖面图有相当大的差异, 表明不同的“ 命令” 位于不同的中间时期 。
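The conditional entropy of the class label given the noisy state can be estimated directly in a toy case. The sketch below is an assumption-laden simplification, not the paper's estimator: two equally likely classes sit at x0 = -1 and x0 = +1, the noisy state is x_t = sqrt(alpha)·x0 + sqrt(1 − alpha)·eps, and the exact posterior p(y = +1 | x_t) = sigmoid(2·sqrt(alpha)·x_t / (1 − alpha)) is averaged by Monte Carlo. The parameter `alpha` plays the role of the signal level at a given diffusion time.

```python
import math
import random


def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def conditional_entropy(alpha, n_samples=20_000, seed=0):
    """Monte Carlo estimate of H(y | x_t) in nats for classes at x0 = -1 and x0 = +1."""
    rng = random.Random(seed)
    mean, var = math.sqrt(alpha), 1.0 - alpha
    total = 0.0
    for _ in range(n_samples):
        y = rng.choice((-1.0, 1.0))
        x_t = y * mean + math.sqrt(var) * rng.gauss(0.0, 1.0)
        logit = 2.0 * mean * x_t / var
        p_plus = 0.5 * (1.0 + math.tanh(logit / 2.0))  # numerically stable sigmoid
        total += binary_entropy(p_plus)
    return total / n_samples


h_noisy = conditional_entropy(0.01)  # almost pure noise: label nearly undetermined
h_clean = conditional_entropy(0.99)  # almost clean: label nearly determined
print(h_noisy, h_clean)
```

Evaluating `conditional_entropy` on a grid of `alpha` values and taking finite differences gives a rough picture of where along the trajectory the entropy drops fastest, which is the kind of quantity the abstract uses to locate class decisions.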
Article 167
Title@2025-06-12 (4): CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning
Title: CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning | CheMatAgent: Verbesserung von LLMs für Chemie und Materialwissenschaft durch baumsuchebasiertes Tool Learning | CheMatAgent:通过植树搜索工具学习加强化学和材料科学LLMs 2506.07551v2 |
Authors (10): Mengsong Wu, YaFei Wang, Yidong Ming, Yuqi An, Yuwei Wan, Wenliang Chen, Binbin Lin, Yuqiang Li, Tong Xie, Dongzhan Zhou
Large language models (LLMs) have recently demonstrated promising capabilities in chemistry tasks while still facing challenges due to outdated pretraining knowledge and the difficulty of incorporating specialized chemical expertise. To address these issues, we propose an LLM-based agent that synergistically integrates 137 external chemical tools, ranging from basic information retrieval to complex reaction predictions, and a dataset curation pipeline to generate the dataset ChemToolBench, which facilitates both effective tool selection and precise parameter filling during fine-tuning and evaluation. We introduce a Hierarchical Evolutionary Monte Carlo Tree Search (HE-MCTS) framework, enabling independent optimization of tool planning and execution. By leveraging self-generated data, our approach supports step-level fine-tuning (FT) of the policy model and training task-adaptive PRM and ORM that surpass GPT-4o. Experimental evaluations demonstrate that our approach significantly improves performance in Chemistry QA and discovery tasks, offering a robust solution to integrate specialized tools with LLMs for advanced chemical applications. All datasets and code are available at https://github.com/AI4Chem/ChemistryAgent.
大型语言模型(LLMS)最近表现出了在化学任务方面充满希望的能力,同时由于培训前知识过时和难以纳入专门的化学专门知识,仍然面临着挑战。为了解决这些问题,我们提议一个基于LLM的代理机构,以协同方式整合从基本信息检索到复杂反应预测等范围创建的137个外部化学工具,以及一个旨在生成数据集ChemToolBench的数据中心曲线管道,该平台既有利于在微调和评估期间有效选择工具,也有利于精确的参数填充。我们引入了一个等级进化蒙特卡洛树搜索框架,使工具规划和执行能够独立优化。我们的方法利用自我生成的数据,支持政策模型的级微调,以及培训任务适应性PRM和ORM,这超过了GPT-4o。实验性评估表明,我们的方法大大改进了化学质量和发现任务的业绩,为先进化学应用的专业工具与LMS整合提供了强有力的解决方案。所有数据集和代码都可在https://github.com/AI4chem/ChemAgentistry上查阅。
Article 168
Title@2025-06-12 (4): Multi-dimensional Autoscaling of Processing Services: A Comparison of Agent-based Methods
Title: Multi-dimensional Autoscaling of Processing Services: A Comparison of Agent-based Methods | Mehrdimensionale Autoskalierung von Verarbeitungsdienstleistungen: Ein Vergleich von agentenbasierten Methoden | 处理服务多维多维自动升级:以代理为基础的方法比较 2506.10420v1 |
Authors (5): Boris Sedlak, Alireza Furutanpey, Zihang Wang, Víctor Casamayor Pujol, Schahram Dustdar
Edge computing breaks with traditional autoscaling because of strict resource constraints, which motivates more flexible scaling behaviors using multiple elasticity dimensions. This work introduces an agent-based autoscaling framework that dynamically adjusts both hardware resources and internal service configurations to maximize requirements fulfillment in constrained environments. We compare four types of scaling agents: Active Inference, Deep Q Network, Analysis of Structural Knowledge, and Deep Active Inference, using two real-world processing services running in parallel: YOLOv8 for visual recognition and OpenCV for QR code detection. Results show all agents achieve acceptable SLO performance with varying convergence patterns. While the Deep Q Network benefits from pre-training, the structural analysis converges quickly, and the deep active inference agent combines theoretical foundations with practical scalability advantages. Our findings provide evidence for the viability of multi-dimensional agent-based autoscaling for edge environments and encourage future work in this research direction.
由于严格的资源限制,计算断层与传统的自动计算断裂,因此,利用多种弹性维度鼓励更灵活的缩放行为。这项工作引入了一个基于代理的自动缩放框架,对硬件资源和内部服务配置进行动态调整,以最大限度地满足受限制环境中的要求。我们比较了四种类型的缩放剂:主动推论、深Q网络、结构知识分析和深活性推理,同时运行两个真实世界的处理服务:用于视觉识别的YOLOv8和用于QR代码检测的 OpenCV。结果显示,所有代理都实现了可接受的 SLO 性能,并有各种不同的趋同模式。虽然深Q网络从培训前获益,但结构分析迅速汇合,深活性推理剂将理论基础与实用的可伸缩性优势结合起来。我们的调查结果为边缘环境基于多维体的自动调整的可行性提供了证据,并鼓励今后在这一研究方向开展工作。
Article 169
Title@2025-06-12 (4): Data-Driven Soil Organic Carbon Sampling: Integrating Spectral Clustering with Conditioned Latin Hypercube Optimization
Title: Data-Driven Soil Organic Carbon Sampling: Integrating Spectral Clustering with Conditioned Latin Hypercube Optimization | Datengetriebene Bodenorganische Kohlenstoff-Probenahme: Integration von Spektral-Clustering mit bedingter lateinischer Hyperkube-Optimierung | 数据驱动土壤有机碳取样:将光谱集成与有条件的拉丁超立方体优化结合 2506.10419v1 |
Authors (3): Weiying Zhao, Aleksei Unagaev, Natalia Efremova
Soil organic carbon (SOC) monitoring often relies on selecting representative field sampling locations based on environmental covariates. We propose a novel hybrid methodology that integrates spectral clustering, an unsupervised machine learning technique, with conditioned Latin hypercube sampling (cLHS) to enhance the representativeness of SOC sampling. In our approach, spectral clustering partitions the study area into $K$ homogeneous zones using multivariate covariate data, and cLHS is then applied within each zone to select sampling locations that collectively capture the full diversity of environmental conditions. This hybrid spectral-cLHS method ensures that even minor but important environmental clusters are sampled, addressing a key limitation of vanilla cLHS, which can overlook such areas. We demonstrate on a real SOC mapping dataset that spectral-cLHS provides more uniform coverage of covariate feature space and spatial heterogeneity than standard cLHS. This improved sampling design has the potential to yield more accurate SOC predictions by providing better-balanced training data for machine learning models.
土壤有机碳(SOC)监测往往依靠在环境共变的基础上选择具有代表性的实地取样地点。我们建议采用一种新的混合方法,将光谱集聚(一种不受监督的机器学习技术,使用有条件的拉丁超立方取样(cLHS)来提高SOC取样的代表性。在我们的方法中,光谱集聚将研究区分成一个使用多变量共变数据,将研究区划成千元同质区,然后在每个区内应用CLHS来选择能够共同捕捉环境条件充分多样性的取样地点。这种混合光谱-cLHS方法确保即使是小型但重要的环境组群都取样,解决可以忽略这些地区的香草丙基HS的一个关键限制。我们用真正的SOC测绘数据集证明,光谱-LHS提供了比标准 cLHS系统更统一的共变地空间和空间异性区。这种改进的取样设计有可能通过为机器学习模型提供更平衡的培训数据,产生更准确的SOC预测。
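The zone-wise design described in the abstract above can be illustrated with a small budget-allocation sketch. It assumes zone labels have already been computed (for example by spectral clustering) and substitutes plain random selection inside each zone for cLHS, so it only shows why small zones are never skipped. The function name `allocate_and_sample` and the "one per zone, then proportional" rule are illustrative assumptions, not the authors' procedure.

```python
import random
from collections import defaultdict


def allocate_and_sample(labels, n_samples, seed=0):
    """Pick n_samples site indices so that every zone gets at least one site.

    labels: zone label per candidate site (assumed precomputed).
    Within a zone, sites are drawn uniformly at random; the paper applies
    cLHS there instead, which this sketch does not implement.
    """
    zones = defaultdict(list)
    for idx, zone in enumerate(labels):
        zones[zone].append(idx)
    if n_samples < len(zones):
        raise ValueError("need at least one sample per zone")
    if n_samples > len(labels):
        raise ValueError("cannot sample more sites than exist")

    # One guaranteed site per zone, so minor zones are always represented.
    quota = {zone: 1 for zone in zones}
    remaining = n_samples - len(zones)
    while remaining > 0:
        # Give the next site to the zone that is most under-sampled
        # relative to its size and still has unused sites.
        open_zones = [z for z in zones if quota[z] < len(zones[z])]
        target = max(open_zones, key=lambda z: len(zones[z]) / quota[z])
        quota[target] += 1
        remaining -= 1

    rng = random.Random(seed)
    chosen = []
    for zone in sorted(zones):
        chosen.extend(rng.sample(zones[zone], quota[zone]))
    return sorted(chosen)
```

With a single-site zone in the input, that site is always selected, which mirrors the abstract's point that minor clusters are not overlooked.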
Article 170
Title@2025-06-12 (4): Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation
Title: Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation | Verbesserter Algorithmus für tiefes aktives Lernen unter Unwucht durch Optimale Trennung | 通过最佳离职,在平衡下改善深层积极学习的比值 2312.09196v4 |
Authors (4): Shyam Nuggehalli, Jifan Zhang, Lalit Jain, Robert Nowak
Class imbalance severely impacts machine learning performance on minority classes in real-world applications. While various solutions exist, active learning offers a fundamental fix by strategically collecting balanced, informative labeled examples from abundant unlabeled data. We introduce DIRECT, an algorithm that identifies class separation boundaries and selects the most uncertain nearby examples for annotation. By reducing the problem to one-dimensional active learning, DIRECT leverages established theory to handle batch labeling and label noise, another common challenge in data annotation that particularly affects active learning methods. Our work presents the first comprehensive study of active learning under both class imbalance and label noise. Extensive experiments on imbalanced datasets show DIRECT reduces annotation costs by over 60% compared to state-of-the-art active learning methods and over 80% versus random sampling, while maintaining robustness to label noise.
虽然存在各种解决办法,但积极学习通过从大量未贴标签的数据中战略性地收集平衡、信息化的标签示例,提供了根本的解决方案。我们引入了直接(Direct)算法,该算法识别了分类隔离界限,并选择了附近最不确定的注解实例。通过将问题降低为一维积极学习,直接(Direct)利用既定理论处理批量标签和标签噪音,这是数据批量标签和标签噪音方面的另一个共同挑战,尤其影响积极的学习方法。我们的工作展示了在分类不平衡和标签噪音下积极学习的首次全面研究。关于不平衡数据集的广泛实验显示,直接(Direct)比最新活跃学习方法减少了批量记成本60%以上,相对于随机抽样则减少了80%以上,同时保持了标签噪音的稳健性。
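The idea of reducing selection to a one-dimensional problem can be sketched as follows: examples are ordered by a scalar score, a separation threshold is estimated from the labels collected so far, and the unlabeled examples closest to that threshold are sent for annotation. This is a simplified illustration under stated assumptions (scalar scores, binary labels, hypothetical function names), not the DIRECT algorithm itself.

```python
def best_threshold(scored_labels):
    """Estimate a separation threshold from labeled (score, label) pairs.

    Labels are 0 or 1. Returns the threshold t that minimizes the number
    of mistakes made by the rule "predict 1 iff score >= t".
    """
    pts = sorted(scored_labels)
    scores = [s for s, _ in pts]
    # Candidate thresholds: below all scores, between neighbors, above all.
    candidates = (
        [scores[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(scores, scores[1:])]
        + [scores[-1] + 1.0]
    )

    def errors(t):
        return sum((s >= t) != bool(y) for s, y in pts)

    return min(candidates, key=errors)


def select_for_annotation(unlabeled_scores, threshold, batch_size):
    """Return indices of the unlabeled examples closest to the threshold."""
    order = sorted(
        range(len(unlabeled_scores)),
        key=lambda i: abs(unlabeled_scores[i] - threshold),
    )
    return order[:batch_size]
```

Picking a batch rather than a single example reflects the batch-labeling setting mentioned in the abstract; handling label noise would need a more robust threshold estimate than this error count.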
Article 171
Title@2025-06-12 (4): Towards Graph-Based Privacy-Preserving Federated Learning: ModelNet – A ResNet-based Model Classification Dataset
Title: Towards Graph-Based Privacy-Preserving Federated Learning: ModelNet – A ResNet-based Model Classification Dataset | Auf dem Weg zu einem graphisch basierten Datenschutz-Erhaltungs-Federated Learning: ModelNet – Ein resNet-basierter Modellklassifikations-Datensatz | 走向基于图表的隐私-保留联邦学习:模型网 – – 以ResNet为基础的分类模型数据集 2506.00476v2 |
Authors (2): Abhisek Ray, Lukas Esterle
Federated Learning (FL) has emerged as a powerful paradigm for training machine learning models across distributed data sources while preserving data locality. However, the privacy of local data is always a pivotal concern and has received a lot of attention in recent research on the FL regime. Moreover, the lack of domain heterogeneity and client-specific segregation in the benchmarks remains a critical bottleneck for rigorous evaluation. In this paper, we introduce ModelNet, a novel image classification dataset constructed from the embeddings extracted from a pre-trained ResNet50 model. First, we modify the CIFAR100 dataset into three client-specific variants, considering three domain heterogeneities (homogeneous, heterogeneous, and random). Subsequently, we train each client-specific subset of all three variants on the pre-trained ResNet50 model to save model parameters. In addition to multi-domain image data, we propose a new hypothesis to define the FL algorithm that can access the anonymized model parameters to preserve the local privacy in a more effective manner compared to existing ones. ModelNet is designed to simulate realistic FL settings by incorporating non-IID data distributions and client diversity design principles in the mainframe for both conventional and futuristic graph-driven FL algorithms. The three variants are ModelNet-S, ModelNet-D, and ModelNet-R, which are based on homogeneous, heterogeneous, and random data settings, respectively. To the best of our knowledge, we are the first to propose a cross-environment client-specific FL dataset along with the graph-based variant. Extensive experiments based on domain shifts and aggregation strategies show the effectiveness of the above variants, making it a practical benchmark for classical and graph-based FL research. The dataset and related code are available online.
联邦学习(FL)已成为在保持数据本地性的同时,跨分布式数据源训练机器学习模型的强大范式。然而,本地数据的隐私始终是一个关键关切,在最近关于FL的研究中受到了很多关注。此外,基准中缺乏域异质性和客户专用划分,这仍然是严格评估的关键瓶颈。在本文中,我们引入了ModelNet,这是一个由预训练ResNet50模型提取的嵌入构建的新型图像分类数据集。首先,我们将CIFAR100数据集修改为三种客户特有变体,考虑三种域异质性(同质、异质和随机)。随后,我们在预训练的ResNet50模型上训练所有三种变体的每个客户特有子集,以保存模型参数。除了多域图像数据外,我们还提出了一个新的假设,用来定义可以访问匿名化模型参数的FL算法,以便比现有方法更有效地保护本地隐私。ModelNet旨在通过在主体框架中纳入非独立同分布(non-IID)数据分布和客户多样性设计原则,为传统的和面向未来的图驱动FL算法模拟现实的FL环境。三种变体分别是ModelNet-S、ModelNet-D和ModelNet-R,分别基于同质、异质和随机数据设置。据我们所知,我们首次提出了跨环境、客户特有的FL数据集及其基于图的变体。基于域偏移和聚合策略的大量实验显示了上述变体的有效性,使其成为经典FL和基于图的FL研究的实用基准。数据集和相关代码可在网上获取。
Article 172
Title@2025-06-12 (4): Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series
Title: Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series | Zeit-IMM: Ein Datensatz und Benchmark für irreguläre multimodale Multivariate Zeitreihen | 时间-IMM:非正常多式联运多变时间序列的数据集和基准 2506.10412v1 |
Authors (7): Ching Chang, Jeehyun Hwang, Yidan Shi, Haixin Wang, Wen-Chih Peng, Tien-Fu Chen, Wei Wang
Time series data in real-world applications such as healthcare, climate modeling, and finance are often irregular, multimodal, and messy, with varying sampling rates, asynchronous modalities, and pervasive missingness. However, existing benchmarks typically assume clean, regularly sampled, unimodal data, creating a significant gap between research and real-world deployment. We introduce Time-IMM, a dataset specifically designed to capture cause-driven irregularity in multimodal multivariate time series. Time-IMM represents nine distinct types of time series irregularity, categorized into trigger-based, constraint-based, and artifact-based mechanisms. Complementing the dataset, we introduce IMM-TSF, a benchmark library for forecasting on irregular multimodal time series, enabling asynchronous integration and realistic evaluation. IMM-TSF includes specialized fusion modules, including a timestamp-to-text fusion module and a multimodality fusion module, which support both recency-aware averaging and attention-based integration strategies. Empirical results demonstrate that explicitly modeling multimodality on irregular time series data leads to substantial gains in forecasting performance. Time-IMM and IMM-TSF provide a foundation for advancing time series analysis under real-world conditions. The dataset is publicly available at https://www.kaggle.com/datasets/blacksnail789521/time-imm/data, and the benchmark library can be accessed at https://anonymous.4open.science/r/IMMTSF_NeurIPS2025.
在现实世界应用中的时间序列数据,如医疗保健、气候模型和融资,往往不规则、多模式和混乱,抽样率不同,不统一的方式和普遍缺失;然而,现有基准通常假定清洁、定期抽样、单一方式的数据,造成研究和现实世界部署之间的巨大差距;我们引入了时间-IMM,这是一个数据集,专门用来记录多式多变时间序列中由原因驱动的不规则性;时间-IMM是9种不同的时间序列,分为触发型、约束型和工艺型机制;补充数据集,我们引入IMM-TSF,这是一个用于预测不规则的多式联运时间序列的基准图书馆,能够实现非同步整合和现实评估;IMM-TSM-TSF包括专门的聚合模块,包括一个时间戳至文字融合模块和一个基于多式多变时间序列的多式联运模块;时间-观测平均和关注型整合型战略。
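Recency-aware averaging over asynchronous observations, as mentioned in the abstract above, can be illustrated with an exponentially decaying weight on the time gap. The decay form, the `tau` parameter, and the function name are assumptions made for illustration; the IMM-TSF library may define its fusion differently.

```python
import math


def recency_weighted_average(observations, t_now, tau=1.0):
    """Average past observation vectors, favoring recent ones.

    observations: list of (timestamp, vector) pairs at irregular times.
    Only observations with timestamp <= t_now are used, each weighted by
    exp(-(t_now - timestamp) / tau). Returns None if nothing is available.
    """
    past = [(t, v) for t, v in observations if t <= t_now]
    if not past:
        return None
    weights = [math.exp(-(t_now - t) / tau) for t, _ in past]
    total = sum(weights)
    dim = len(past[0][1])
    return [
        sum(w * v[d] for w, (_, v) in zip(weights, past)) / total
        for d in range(dim)
    ]
```

A smaller `tau` makes the average track the latest observation more closely, while a large `tau` approaches a plain mean over everything seen so far.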
Article 173
Title@2025-06-12 (4): DeepGDel: Deep Learning-based Gene Deletion Prediction Framework for Growth-Coupled Production in Genome-Scale Metabolic Models
Title: DeepGDel: Deep Learning-based Gene Deletion Prediction Framework for Growth-Coupled Production in Genome-Scale Metabolic Models | DeepGDel: Deep Learning-basierte Gene Deletion Prediction Framework für wachstumsverbundene Produktion in Genom-Scale Metabolic-Modellen | 深层GDel:在基因组-规模元元模型中实现增长和混合生产以深学习为基础的基因删除预测框架 2504.06316v2 |
Authors (2): Ziwei Yang, Takeyuki Tamura
In genome-scale constraint-based metabolic models, gene deletion strategies are crucial for achieving growth-coupled production, where cell growth and target metabolite production are simultaneously achieved. While computational methods for calculating gene deletions have been widely explored and contribute to developing gene deletion strategy databases, current approaches are limited in leveraging new data-driven paradigms, such as machine learning, for more efficient strain design. Therefore, it is necessary to propose a fundamental framework for this objective. In this study, we first formulate the problem of gene deletion strategy prediction and then propose a framework for predicting gene deletion strategies for growth-coupled production in genome-scale metabolic models. The proposed framework leverages deep learning algorithms to learn and integrate sequential gene and metabolite data representation, enabling the automatic gene deletion strategy prediction. Computational experiment results demonstrate the feasibility of the proposed framework, showing substantial improvements over baseline methods. Specifically, the proposed framework achieves a 14.69%, 22.52%, and 13.03% increase in overall accuracy across three metabolic models of different scales under study, while maintaining balanced precision and recall in predicting gene deletion statuses. The source code and examples for the framework are publicly available at https://github.com/MetNetComp/DeepGDel.
在基因组规模的基于约束的代谢模型中,基因删除策略对于实现生长耦合生产至关重要,即细胞生长和目标代谢物生产同时实现。虽然计算基因删除的计算方法已经得到广泛探讨,并有助于开发基因删除策略数据库,但目前的方法在利用机器学习等新的数据驱动范式进行更高效的菌株设计方面仍然有限。因此,有必要为此目标提出一个基本框架。在本研究中,我们首先提出基因删除策略预测问题,然后提出一个框架,用于预测基因组规模代谢模型中生长耦合生产的基因删除策略。所提框架利用深度学习算法学习并整合基因和代谢物的序列数据表示,使基因删除策略的自动预测成为可能。计算实验结果显示了所提框架的可行性,相对基线方法有显著改进。具体而言,所提框架在所研究的三个不同规模的代谢模型上,总体准确率分别提高了14.69%、22.52%和13.03%,同时在预测基因删除状态时保持均衡的精确率和召回率。该框架的源代码和示例公开于 https://github.com/MetNetComp/DeepGDel。
Article 174
Title@2025-06-12 (4): IoTGeM: Generalizable Models for Behaviour-Based IoT Attack Detection
Title: IoTGeM: Generalizable Models for Behaviour-Based IoT Attack Detection | IoTGeM: Generalisierbare Modelle für die IoT-Angrifferkennung auf Verhaltensbasis | IotGeM:基于行为的IoT攻击探测通用模型 2401.01343v2 |
Authors (3): Kahraman Kostas, Mike Just, Michael A. Lones
Previous research on behavior-based attack detection for networks of IoT devices has resulted in machine learning models whose ability to adapt to unseen data is limited and often not demonstrated. This paper presents IoTGeM, an approach for modeling IoT network attacks that focuses on generalizability, yet also leads to better detection and performance. We first introduce an improved rolling window approach for feature extraction. To reduce overfitting, we then apply a multi-step feature selection process where a Genetic Algorithm (GA) is uniquely guided by exogenous feedback from a separate, independent dataset. To prevent common data leaks that have limited previous models, we build and test our models using strictly isolated train and test datasets. The resulting models are rigorously evaluated using a diverse portfolio of machine learning algorithms and datasets. Our window-based models demonstrate superior generalization compared to traditional flow-based models, particularly when tested on unseen datasets. On these stringent, cross-dataset tests, IoTGeM achieves F1 scores of 99% for ACK, HTTP, SYN, MHD, and PS attacks, as well as a 94% F1 score for UDP attacks. Finally, we build confidence in the models by using the SHAP (SHapley Additive exPlanations) explainable AI technique, allowing us to identify the specific features that underlie the accurate detection of attacks.
先前对以行为为基础探测IoT装置网络攻击的研究已经产生了机器学习模型,这些模型适应不可见数据的能力有限,而且往往没有展示。本文展示了IoTGeM,这是模拟IoT网络攻击的一种方法,侧重于通用性,但也导致更好的探测和性能。我们首先采用更好的滚动窗口方法进行特征提取。为了减少过度配制,我们随后采用一个多步骤特征选择程序,在这种程序下,遗传阿尔格利什姆(GA)以单独独立的独立的数据集提供的外部反馈为独特的指导。为了防止共同数据泄漏,我们使用严格孤立的列车和测试数据集来建立和测试我们的模型。由此产生的模型使用多种机器学习算法和数据集的组合进行严格的评估。我们基于窗口的模型比传统的流基模型更高级的通用性,特别是在对隐蔽的数据集进行测试时。在这些严格、交叉数据集测试中,IoTGEM为ACK、HTTP、SYN、MH和PSS测试数据集攻击的F1级分数。我们用SAP攻击的分数,最后的SADSBrealS,将SADrestration 解释我们用于SADreal攻击的F1的精确攻击的分数。
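A rolling window over per-packet values, in the spirit of the window-based feature extraction described above, might look like the sketch below. The chosen statistics (mean, population standard deviation, count) and the function name are illustrative assumptions; the paper's improved windowing scheme and feature set are not reproduced here.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_window_features(packet_sizes, window=3):
    """Compute per-packet features over the most recent `window` packets.

    Each output entry summarizes the packets seen so far, limited to the
    last `window` of them, so features depend on recent behavior only.
    """
    buf = deque(maxlen=window)  # oldest packets fall out automatically
    feats = []
    for size in packet_sizes:
        buf.append(size)
        feats.append(
            {"mean": mean(buf), "std": pstdev(buf), "count": len(buf)}
        )
    return feats
```

Because each feature row depends only on a bounded number of preceding packets, the features do not grow with flow length, which is one plausible reason window-based models transfer across datasets more readily than whole-flow statistics.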
Article 175
Title@2025-06-12 (4): An energy-efficient learning solution for the Agile Earth Observation Satellite Scheduling Problem
Title: An energy-efficient learning solution for the Agile Earth Observation Satellite Scheduling Problem | Eine energieeffiziente Lernlösung für das agile Erdbeobachtungssatelliten-Scheduling-Problem | Agile地球观测卫星排程问题的一个节能学习解决方案 2503.04803v2 |
Authors (3): Antonio M. Mercado-Martínez, Beatriz Soret, Antonio Jurado-Navas
The Agile Earth Observation Satellite Scheduling Problem (AEOSSP) entails finding the subset of observation targets to be scheduled along the satellite’s orbit while meeting operational constraints of time, energy and memory. The problem of deciding what and when to observe is inherently complex, and becomes even more challenging when considering several issues that compromise the quality of the captured images, such as cloud occlusion, atmospheric turbulence, and image resolution. This paper presents a Deep Reinforcement Learning (DRL) approach for addressing the AEOSSP with time-dependent profits, integrating these three factors to optimize the use of energy and memory resources. The proposed method involves a dual decision-making process: selecting the sequence of targets and determining the optimal observation time for each. Our results demonstrate that the proposed algorithm reduces the capture of images that fail to meet quality requirements by more than 60% and consequently decreases energy waste from attitude maneuvers by up to 78%, all while maintaining strong observation performance.
敏捷的地球观测卫星规划问题(AEOSSP)意味着在满足时间、能量和记忆等操作限制的同时,在卫星轨道上寻找预定的观测目标子集。决定观测什么和何时观测的问题本身就十分复杂,在考虑影响所捕获图像质量的一些问题时甚至更具挑战性,如云层隔离、大气动荡和图像解析。本文件介绍了一种深度强化学习(DRL)方法,以基于时间的利润解决AEOSSP,将这三个因素结合起来,以优化能源和记忆资源的使用。拟议方法涉及一个双重决策程序:选择目标序列和确定每个目标的最佳观测时间。我们的结果表明,拟议的算法减少了未能达到质量要求的60%以上的图像的捕获,从而将能源浪费从姿态动作减少到78 %,所有这些都保持了强大的观测性能。
Article 176
Title@2025-06-12 (4): PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier
Title: PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier | PAG: Multi-Turn verstärkt LLM Selbstkorrektion mit Politik als Generativer Prüfer | PAG: 多发强化LLM自我校正,政策作为产生验证 2506.10406v1 |
Authors (8): Yuhua Jiang, Yuwen Xiong, Yufeng Yuan, Chao Xin, Wenyuan Xu, Yu Yue, Qianchuan Zhao, Lin Yan
Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks, yet they still struggle to reliably verify the correctness of their own outputs. Existing solutions to this verification challenge often depend on separate verifier models or require multi-stage self-correction training pipelines, which limit scalability. In this paper, we propose Policy as Generative Verifier (PAG), a simple and effective framework that empowers LLMs to self-correct by alternating between policy and verifier roles within a unified multi-turn reinforcement learning (RL) paradigm. Distinct from prior approaches that always generate a second attempt regardless of model confidence, PAG introduces a selective revision mechanism: the model revises its answer only when its own generative verification step detects an error. This verify-then-revise workflow not only alleviates model collapse but also jointly enhances both reasoning and verification abilities. Extensive experiments across diverse reasoning benchmarks highlight PAG’s dual advancements: as a policy, it enhances direct generation and self-correction accuracy; as a verifier, its self-verification outperforms self-consistency.
大型语言模型(LLMS)在复杂的推理任务中表现出了令人印象深刻的能力,然而,它们仍然在努力可靠地核实其自身产出的正确性。这一核查挑战的现有解决办法往往取决于单独的核查模型,或需要多阶段自我校正培训管道,这限制了可缩放性。在本文中,我们提议政策作为“创制验证器”(PAG),这是一个简单而有效的框架,它使LLMs能够通过在统一的多方向强化学习模式中交替政策和验证员的作用来自我纠正。与以往的做法不同,以往的做法总是产生第二次尝试,而不管模型是否有信心。 PAG采用了选择性的修订机制:模型只有在其自身的基因核查步骤发现错误时才修改其答复。这一校验后工作流程不仅缓解了模型的崩溃,而且还共同增强了推理和核查能力。 不同推理基准的广泛实验突出PAG的双重进步:作为一种政策,它能加强直接生成和自我校正的准确性;作为校验者,其自我核查超越了自我一致性。
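The verify-then-revise workflow described above can be sketched as a short control loop. The callables `generate`, `verify`, and `revise` are hypothetical stand-ins for a single model acting in its policy and verifier roles; the sketch only shows the selective-revision logic, not the reinforcement learning training.

```python
def verify_then_revise(question, generate, verify, revise, max_turns=3):
    """Answer a question, revising only when self-verification fails.

    generate(question) -> answer
    verify(question, answer) -> True if the answer is judged correct
    revise(question, answer) -> new answer
    Returns the final answer and the list of all attempts.
    """
    answer = generate(question)
    history = [answer]
    for _ in range(max_turns - 1):
        if verify(question, answer):
            break  # verifier accepts the answer, so no second attempt
        answer = revise(question, answer)
        history.append(answer)
    return answer, history
```

The early `break` is the difference from approaches that always produce a second attempt: an answer the verifier already accepts is returned unchanged.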
Article 177
Title@2025-06-12 (4): Generative Algorithms for Wildfire Progression Reconstruction from Multi-Modal Satellite Active Fire Measurements and Terrain Height
Title: Generative Algorithms for Wildfire Progression Reconstruction from Multi-Modal Satellite Active Fire Measurements and Terrain Height | Generative Algorithmen für Wildfire-Progressions-Rekonstruktion durch Multi-Modal Satellite-Aktivfeuer-Messungen und Geländehöhe | 从多模式卫星活性火灾测量和地面高度生成的野火进量重建 2506.10404v1 |
Authors (11): Bryan Shaddy, Brianna Binder, Agnimitra Dasgupta, Haitong Qin, James Haley, Angel Farguell, Kyle Hilburn, Derek V. Mallia, Adam Kochanski, Jan Mandel, Assad Oberai
Increasing wildfire occurrence has spurred growing interest in wildfire spread prediction. However, even the most complex wildfire models diverge from observed progression during multi-day simulations, motivating the need for data assimilation. A useful approach to assimilating measurement data into complex coupled atmosphere-wildfire models is to estimate wildfire progression from measurements and use this progression to develop a matching atmospheric state. In this study, an approach is developed for estimating fire progression from VIIRS active fire measurements, GOES-derived ignition times, and terrain height data. A conditional Generative Adversarial Network is trained with simulations of historic wildfires from the atmosphere-wildfire model WRF-SFIRE, thus allowing incorporation of WRF-SFIRE physics into estimates. Fire progression is succinctly represented by fire arrival time, and measurements for training are obtained by applying an approximate observation operator to WRF-SFIRE solutions, eliminating the need for satellite data during training. The model is trained on tuples of fire arrival times, measurements, and terrain, and, once trained, leverages measurements of real fires and corresponding terrain data to generate samples of fire arrival times. The approach is validated on five Pacific US wildfires, with results compared against high-resolution perimeters measured via aircraft, finding an average Sorensen-Dice coefficient of 0.81. The influence of terrain height on the arrival time inference is also evaluated, and it is observed that terrain has minimal influence when the inference is conditioned on satellite measurements.
野火发生率的上升激发了人们对野火蔓延预测日益增长的兴趣。然而,即使是最复杂的野火模型,在多日模拟中也会偏离观测到的火势发展,因此需要进行数据同化。将测量数据同化到复杂的大气-野火耦合模型中的一种有用方法,是根据测量结果估计野火的发展过程,并利用这一过程构建与之匹配的大气状态。本研究提出了一种方法,利用VIIRS主动火点测量、GOES推导的点火时间和地形高度数据来估计火势发展。我们用大气-野火耦合模型WRF-SFIRE对历史野火的模拟结果训练条件生成对抗网络,从而将WRF-SFIRE的物理规律纳入估计。火势发展用火到达时间简洁地表示,训练所用的测量数据通过对WRF-SFIRE的解施加近似观测算子获得,因此训练期间无需卫星数据。模型在由火到达时间、测量值和地形组成的元组上训练,训练完成后利用真实火灾的测量值和相应的地形数据生成火到达时间的样本。该方法在美国太平洋地区的五场野火上得到验证,并与飞机测得的高分辨率火场边界进行比较,平均Sorensen-Dice系数为0.81。研究还评估了地形高度对到达时间推断的影响,发现当推断以卫星测量为条件时,地形的影响很小。
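The Sorensen-Dice coefficient used for validation above is defined as 2|A ∩ B| / (|A| + |B|) for two sets A and B. A minimal sketch for binary burned-area grids follows; the grid representation and the function name are assumptions for illustration, and the paper's perimeter rasterization is not shown.

```python
def dice_coefficient(pred, truth):
    """Sorensen-Dice overlap between two equally sized binary grids.

    pred, truth: lists of rows, each cell truthy if burned.
    Returns 2 * |intersection| / (|pred| + |truth|), or 1.0 when both
    grids are empty (no burned cells in either).
    """
    inter = size_pred = size_truth = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            size_pred += 1 if p else 0
            size_truth += 1 if t else 0
            inter += 1 if (p and t) else 0
    if size_pred + size_truth == 0:
        return 1.0
    return 2 * inter / (size_pred + size_truth)
```

A value of 1 means the predicted and measured burned areas coincide exactly, and 0 means they do not overlap at all.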
Article 178
Title@2025-06-12 (4): Time To Impeach LLM-as-a-Judge: Programs are the Future of Evaluation
Title: Time To Impeach LLM-as-a-Judge: Programs are the Future of Evaluation | Time To Impeach LLM-as-a-Richter: Programme sind die Zukunft der Evaluation | 弹劾LLM作为法官的时间:方案就是评价的未来 2506.10403v1 |
Authors (3): Tzu-Heng Huang, Harit Vishwakarma, Frederic Sala
Large language models (LLMs) are widely used to evaluate the quality of LLM generations and responses, but this leads to significant challenges: high API costs, uncertain reliability, inflexible pipelines, and inherent biases. To address these, we introduce PAJAMA (Program-As-a-Judge for Automated Model Assessment), a new alternative that uses LLMs to synthesize executable judging programs instead of directly scoring responses. These synthesized programs can be stored and run locally, costing orders of magnitude less while providing interpretable and auditable judging logic that can be easily adapted. Program-based judges mitigate biases, improving judgment consistency by 15.83% and reducing biased responses by 23.7% on average compared to a Qwen2.5-14B-based LLM-as-a-judge. When program judgments are distilled into a model, PAJAMA outperforms LLM-as-a-judge on the challenging CHAT-HARD subset of RewardBench, with gains of 2.19% on Prometheus and 8.67% on the JudgeLM dataset, all at three orders of magnitude lower cost.
大型语言模型(LLMS)被广泛用于评估LLM世代和应对措施的质量,但这带来了重大挑战:API成本高、可靠性不确定、输油管不灵活以及固有的偏见。为了解决这些问题,我们引入了PAAMA(自动模型评估方案法官),这是一个新的替代方案,使用LLMS来综合可执行的判断程序,而不是直接评分。这些综合方案可以在当地储存和运行,成本水平低,同时提供可解释和可审计的逻辑,容易调整。基于方案的法官减少偏见,提高判断一致性15.83%,将偏差反应平均减少23.7%,而基于Qwen2.5-14B的LM-as-a-judge。当方案判决被提炼为模型时,PAJAMA优于LM-A-a-法官对挑战性CHAT-HARD的RewardBench(RewardBench)子集,优于Prometheus和法官LM数据集8.67%的指数,均以2.19%的成绩低于2.9%。
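To make the notion of an executable judging program concrete, here is a hand-written toy judge. In PAJAMA such programs would be synthesized by an LLM; the specific checks, names, and the 200-word limit below are invented for illustration and are not taken from the paper.

```python
def judge_program(question, response):
    """Score a response in [0, 1] using explicit, auditable checks.

    Returns the score and the per-check results so that a reader can
    see exactly why a response was rated the way it was.
    """
    keywords = [w.strip("?.,!").lower() for w in question.split()]
    keywords = [w for w in keywords if len(w) > 3]
    text = response.lower()
    checks = {
        "non_empty": bool(response.strip()),
        "mentions_question_terms": any(w in text for w in keywords),
        "not_too_long": len(response.split()) <= 200,
    }
    return sum(checks.values()) / len(checks), checks


def prefer(question, response_a, response_b):
    """Pairwise judgment: return "A" or "B" (ties go to "A")."""
    score_a, _ = judge_program(question, response_a)
    score_b, _ = judge_program(question, response_b)
    return "A" if score_a >= score_b else "B"
```

Because the judge is ordinary code, it runs locally at negligible cost and every rule can be read, tested, and edited, which is the interpretability and auditability argument made in the abstract.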
Article 179
Title@2025-06-12 (4): RsGCN: Rescaling Enhances Generalization of GCNs for Solving Scalable Traveling Salesman Problems
Title: RsGCN: Rescaling Enhances Generalization of GCNs for Solving Scalable Traveling Salesman Problems | RsGCN: Rescaling verbessert die Generalisierung von GCNs zur Lösung skalierbarer reisender Salesman-Probleme | RsGCN: 提高全球氯化萘的通用化,以解决可缩放旅行销售员问题 2506.00533v2 |
Authors (4): Junquan Huang, Zong-Gan Chen, Yuncheng Jiang, Zhi-Hui Zhan
Neural traveling salesman problem (TSP) solvers face two critical challenges: poor generalization for scalable TSPs and high training costs. To address these challenges, we propose a new Rescaling Graph Convolutional Network (RsGCN). Focusing on the scale-dependent features (i.e., features varied with problem scales) related to nodes and edges that influence the sensitivity of GCNs to the problem scales, a Rescaling Mechanism in RsGCN enhances the generalization capability by (1) rescaling adjacent nodes to construct a subgraph with a uniform number of adjacent nodes for each node across various scales of TSPs, which stabilizes the graph message aggregation; (2) rescaling subgraph edges to adjust the lengths of subgraph edges to the same magnitude, which maintains numerical consistency. In addition, an efficient training strategy with a mixed-scale dataset and bidirectional loss is used in RsGCN. To fully exploit the heatmaps generated by RsGCN, we design an efficient post-search algorithm termed Re2Opt, in which a reconstruction process based on adaptive weight is incorporated to help avoid local optima. Based on a combined architecture of RsGCN and Re2Opt, our solver achieves remarkable generalization and low training cost: with only 3 epochs of training on the mixed-scale dataset containing instances with up to 100 nodes, it can be generalized successfully to 10K-node instances without any fine-tuning. Extensive experiments demonstrate our state-of-the-art performance across uniform distribution instances of 9 different scales from 20 to 10K nodes and 78 real-world instances from TSPLIB, while requiring the fewest learnable parameters and training epochs among neural competitors.
神经旅行商问题(TSP)求解器面临两大挑战:对不同规模TSP的泛化能力差,以及训练成本高。为应对这些挑战,我们提出了一种新的重缩放图卷积网络(RsGCN)。针对与节点和边相关、影响GCN对问题规模敏感度的规模相关特征(即随问题规模变化的特征),RsGCN中的重缩放机制通过以下方式增强泛化能力:(1)重缩放相邻节点,为不同规模TSP中的每个节点构建相邻节点数量一致的子图,从而稳定图消息聚合;(2)重缩放子图边,将子图边的长度调整到同一量级,从而保持数值一致性。此外,RsGCN采用了结合混合规模数据集和双向损失的高效训练策略。为了充分利用RsGCN生成的热图,我们设计了一种名为Re2Opt的高效后搜索算法,其中加入了基于自适应权重的重构过程,以帮助避免陷入局部最优。基于RsGCN与Re2Opt的组合架构,我们的求解器实现了出色的泛化能力和较低的训练成本:只需在包含最多100个节点实例的混合规模数据集上训练3个轮次,无需任何微调即可成功泛化到1万个节点的实例。大量实验表明,我们在从20到1万个节点的9种不同规模的均匀分布实例以及TSPLIB的78个真实实例上均达到了最先进的性能,同时在神经网络类方法中所需的可学习参数和训练轮次最少。
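The two rescaling steps described above (a fixed number of adjacent nodes per node, and edge lengths brought to the same magnitude) can be sketched for a single node as follows. Using k nearest neighbors and dividing by the longest retained edge are assumptions chosen for illustration; the exact normalization used by RsGCN may differ.

```python
import math


def rescaled_subgraph(coords, node, k):
    """Build a scale-normalized neighborhood for one TSP node.

    coords: list of (x, y) city coordinates.
    Returns the k nearest neighbors of `node` as (index, length) pairs,
    with lengths divided by the longest of the k edges so that they fall
    in (0, 1] regardless of instance size or coordinate scale.
    """
    x0, y0 = coords[node]
    dists = sorted(
        (math.hypot(x - x0, y - y0), j)
        for j, (x, y) in enumerate(coords)
        if j != node
    )
    nearest = dists[:k]  # same neighbor count for every node and instance
    longest = nearest[-1][0]
    if longest == 0:
        return [(j, 0.0) for _, j in nearest]
    return [(j, d / longest) for d, j in nearest]
```

Scaling all coordinates by a constant leaves the output unchanged, which illustrates the kind of scale independence the rescaling mechanism is meant to provide.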
Article 180
Title@2025-06-12 (4): TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning
Title: TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning | TooBadRL: Trigger-Optimierung zur Steigerung der Effektivität von Backdoor-Angriffen auf Deep Verstärkung Learning | TooBadRL:在深强化学习中提高后门袭击效果的触发最佳优化 2506.09562v2 |
Authors (4): Songze Li, Mingxuan Zhang, Kang Wei, Shouling Ji
Deep reinforcement learning (DRL) has achieved remarkable success in a wide range of sequential decision-making domains, including robotics, healthcare, smart grids, and finance. Recent research demonstrates that attackers can efficiently exploit system vulnerabilities during the training phase to execute backdoor attacks, producing malicious actions when specific trigger patterns are present in the state observations. However, most existing backdoor attacks rely primarily on simplistic and heuristic trigger configurations, overlooking the potential efficacy of trigger optimization. To address this gap, we introduce TooBadRL (Trigger Optimization to Boost Effectiveness of Backdoor Attacks on DRL), the first framework to systematically optimize DRL backdoor triggers along three critical axes, i.e., temporal, spatial, and magnitude. Specifically, we first introduce a performance-aware adaptive freezing mechanism for injection timing. Then, we formulate dimension selection as a cooperative game, utilizing Shapley value analysis to identify the most influential state variable for the injection dimension. Furthermore, we propose a gradient-based adversarial procedure to optimize the injection magnitude under environment constraints. Evaluations on three mainstream DRL algorithms and nine benchmark tasks show that TooBadRL significantly improves attack success rates, while ensuring minimal degradation of normal task performance. These results highlight the previously underappreciated importance of principled trigger optimization in DRL backdoor attacks. The source code of TooBadRL can be found at https://github.com/S3IC-Lab/TooBadRL.
深度强化学习(DRL)在机器人、医疗保健、智能电网和金融等众多序贯决策领域取得了显著成功。最近的研究表明,攻击者可以在训练阶段有效利用系统漏洞实施后门攻击,当状态观测中出现特定触发模式时产生恶意动作。然而,现有的后门攻击大多依赖简单的启发式触发配置,忽视了触发优化的潜在效力。为弥补这一不足,我们提出了TooBadRL(Trigger Optimization to Boost Effectiveness of Backdoor Attacks on DRL),这是第一个沿时间、空间和幅度三个关键维度系统优化DRL后门触发器的框架。具体而言,我们首先针对注入时机引入一种性能感知的自适应冻结机制。然后,我们将维度选择表述为合作博弈,利用Shapley值分析确定对注入维度影响最大的状态变量。此外,我们提出一种基于梯度的对抗过程,在环境约束下优化注入幅度。在三种主流DRL算法和九项基准任务上的评估表明,TooBadRL显著提高了攻击成功率,同时确保正常任务性能的下降最小。这些结果凸显了有原则的触发优化在DRL后门攻击中此前被低估的重要性。TooBadRL的源代码见 https://github.com/S3IC-Lab/TooBadRL。
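Treating dimension selection as a cooperative game, as the abstract above describes, relies on Shapley values: each player's average marginal contribution over all orderings of the players. The generic sketch below computes them exactly for a small player set; the toy value function in the usage note is invented and is unrelated to the paper's game over state variables.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values for a small cooperative game.

    players: list of hashable player identifiers.
    value: function mapping a frozenset of players to a number.
    Averages each player's marginal contribution over all orderings,
    so the cost grows factorially with the number of players.
    """
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}
```

For example, with three state dimensions whose individual contributions are 1, 3, and 0 and a bonus of 2 when dimensions 0 and 1 are both present, the bonus is split equally between dimensions 0 and 1, and the dimension with the largest value would be the one selected.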
Article 181
Title@2025-06-12 (4): EQA-RM: A Generative Embodied Reward Model with Test-time Scaling
Title: EQA-RM: A Generative Embodied Reward Model with Test-time Scaling | EQA-RM: Generatives Embmund-Reward-Modell mit Testzeitskalierung | EQA-RM:带有试验时间缩放的产生性内嵌式奖赏模型 2506.10389v1 |
Authors (3): Yuhang Chen, Zhen Tan, Tianlong Chen
Reward Models (RMs), vital for large model alignment, are underexplored for complex embodied tasks like Embodied Question Answering (EQA) where nuanced evaluation of agents’ spatial, temporal, and logical understanding is critical yet not considered by generic approaches. We introduce EQA-RM, a novel generative multimodal reward model specifically architected for EQA, trained via our innovative Contrastive Group Relative Policy Optimization (C-GRPO) strategy to learn fine-grained behavioral distinctions. The generative nature of EQA-RM provides interpretable, structured reward feedback (beyond simple scalars), uniquely enabling test-time scaling to dynamically adjust evaluation granularity, from concise scores to detailed critiques of reasoning and grounding, at inference without retraining. Concurrently, we introduce EQARewardBench, a new benchmark built on OpenEQA for standardized EQA reward model assessment. Demonstrating high sample efficiency, EQA-RM (fine-tuning Qwen2-VL-2B-Instruct) achieves 61.9% accuracy on EQA-RM-Bench with only 700 samples, outperforming strong proprietary baselines, including Gemini-2.5-Flash, GPT-4o, Claude-3.5-Haiku, and open-sourced state-of-the-art models such as RoVRM and VisualPRM. The code and dataset can be found at https://github.com/UNITES-Lab/EQA-RM.
奖励模型(RM)对大模型对齐至关重要,但在具身问答(EQA)等复杂具身任务中尚未得到充分探索;在这类任务中,对智能体的空间、时间和逻辑理解进行细致评估十分关键,而通用方法并未考虑这一点。我们提出EQA-RM,这是一种专为EQA设计的新型生成式多模态奖励模型,通过我们创新的对比式组相对策略优化(C-GRPO)策略进行训练,以学习细粒度的行为差异。EQA-RM的生成式特性提供了可解释的结构化奖励反馈(不限于简单标量),并独特地支持测试时缩放,无需重新训练即可在推理时动态调整评估粒度,从简明的分数到对推理和落地(grounding)的详细评析。同时,我们提出EQARewardBench,这是一个基于OpenEQA构建、用于标准化评估EQA奖励模型的新基准。EQA-RM(微调Qwen2-VL-2B-Instruct)展现出很高的样本效率,仅用700个样本就在EQA-RM-Bench上达到61.9%的准确率,优于Gemini-2.5-Flash、GPT-4o、Claude-3.5-Haiku等强大的专有基线,以及RoVRM和VisualPRM等开源的最先进模型。代码和数据集见 https://github.com/UNITES-Lab/EQA-RM。
Article 182
Title@2025-06-12 (4): Measuring Representational Shifts in Continual Learning: A Linear Transformation Perspective
Title: Measuring Representational Shifts in Continual Learning: A Linear Transformation Perspective | Messen von Repräsentationsverschiebungen im kontinuierlichen Lernen: Eine lineare Transformationsperspektive | 持续学习中的衡量代表性变化:线性转变前景 2505.20970v3 |
Authors (3): Joonkyu Kim, Yejin Kim, Jy-yong Sohn
In continual learning scenarios, catastrophic forgetting of previously learned tasks is a critical issue, making it essential to effectively measure such forgetting. Recently, there has been growing interest in focusing on representation forgetting, the forgetting measured at the hidden layer. In this paper, we provide the first theoretical analysis of representation forgetting and use this analysis to better understand the behavior of continual learning. First, we introduce a new metric called representation discrepancy, which measures the difference between representation spaces constructed by two snapshots of a model trained through continual learning. We demonstrate that our proposed metric serves as an effective surrogate for representation forgetting while remaining analytically tractable. Second, through mathematical analysis of our metric, we derive several key findings about the dynamics of representation forgetting: forgetting occurs more rapidly and to a higher degree as the layer index increases, while increasing the width of the network slows the forgetting process. Third, we support our theoretical findings through experiments on real image datasets, including Split-CIFAR100 and ImageNet1K.
在持续学习的情景中,灾难性地忘记以往学到的任务是一个关键问题,因此,必须有效地衡量如何忘记这些任务。最近,人们越来越关注注重代表的遗忘,在隐蔽层测量的忘记。在本文中,我们首次对代表的遗忘进行了理论分析,并利用这一分析更好地了解持续学习的行为。首先,我们引入了一个新的指标,称为代表差异,以衡量通过持续学习所培训的模型的两张快照所构建的代表性空间之间的差异。我们证明,我们提议的衡量标准是代表的遗忘的有效替代物,同时仍然可以分析地进行。第二,我们通过数学分析我们的衡量标准,得出关于代表的动态的一些关键结论:随着分层指数的增加,遗忘现象会更快地发生,同时提高网络的宽度,减缓了遗忘过程。第三,我们通过对真实图像数据集的实验,包括Splid-CIFAR100和图像网络1K,支持我们的理论发现。
Article 183
Title@2025-06-12 (4): Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning
Title: Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning | Hierarchische Latentenfähigkeiten von Sprachmodellen über das kausale Repräsentationslernen entdecken | 通过因果代表制学习发现语言模式的分级本端能力 2506.10378v1 |
Authors (4): Jikai Jin, Vasilis Syrgkanis, Sham Kakade, Hanlin Zhang
Faithful evaluation of language model capabilities is crucial for deriving actionable insights that can inform model development. However, rigorous causal evaluations in this domain face significant methodological challenges, including complex confounding effects and prohibitive computational costs associated with extensive retraining. To tackle these challenges, we propose a causal representation learning framework wherein observed benchmark performance is modeled as a linear transformation of a few latent capability factors. Crucially, these latent factors are identified as causally interrelated after appropriately controlling for the base model as a common confounder. Applying this approach to a comprehensive dataset encompassing over 1500 models evaluated across six benchmarks from the Open LLM Leaderboard, we identify a concise three-node linear causal structure that reliably explains the observed performance variations. Further interpretation of this causal structure provides substantial scientific insights beyond simple numerical rankings: specifically, we reveal a clear causal direction starting from general problem-solving capabilities, advancing through instruction-following proficiency, and culminating in mathematical reasoning ability. Our results underscore the essential role of carefully controlling base model variations during evaluation, a step critical to accurately uncovering the underlying causal relationships among latent model capabilities.
对语言模型能力的忠实评价对于得出可用于模型开发的可操作的洞察力至关重要。然而,这一领域的严格的因果评估面临着重大的方法挑战,包括复杂的混乱效应和与广泛再培训有关的令人难以承受的计算费用。为了应对这些挑战,我们提议了一个因果代表性学习框架,其中所观察到的基准业绩以几个潜在能力因素的线性转变为模型。关键是,这些潜在因素在适当控制基础模型后被确定为因果性关联性,而作为共同的混淆因素。将这一方法应用于一个包含1500多个模型的综合数据集,该数据集由开放 LLM 领头板的六个基准进行评估,我们确定了一个简洁的三点线性因果结构,可靠地解释了观察到的业绩差异。对这一因果结构的进一步解释提供了超出简单数字排序以外的大量科学洞察力:具体地说,我们揭示了一个明确的因果性方向,从一般解决问题的能力出发,通过以熟练的教学方式推进,以数学推理能力为顶峰。我们的结果强调了在评估期间仔细控制基础模型变换的基本作用,这是准确发现潜在模型能力之间潜在因果关系的关键一步。
Article 184
Title@2025-06-12 (4): A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Title: A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | Ein minimalistischer Ansatz zur LLM-Vernunft: von der Abstoßung zur Verstärkung | 从拒绝抽样到强化 2504.11343v2 |
Authors (11): Wei Xiong, Jiarui Yao, Yuhui Xu, Bo Pang, Lei Wang, Doyen Sahoo, Junnan Li, Nan Jiang, Tong Zhang, Caiming Xiong, Hanze Dong
Reinforcement learning (RL) has become a prevailing approach for fine-tuning large language models (LLMs) on complex reasoning tasks. Among recent methods, GRPO stands out for its empirical success in training models such as DeepSeek-R1, yet the sources of its effectiveness remain poorly understood. In this work, we revisit GRPO from a REINFORCE-like algorithm perspective and analyze its core components. Surprisingly, we find that a simple rejection sampling baseline, RAFT, which trains only on positively rewarded samples, yields performance competitive with GRPO and PPO. Our ablation studies reveal that GRPO's main advantage arises from discarding prompts with entirely incorrect responses, rather than from its reward normalization. Motivated by this insight, we propose Reinforce-Rej, a minimal extension of policy gradient that filters both entirely incorrect and entirely correct samples. Reinforce-Rej improves KL efficiency and stability, serving as a lightweight yet effective alternative to more complex RL algorithms. We advocate RAFT as a robust and interpretable baseline, and suggest that future advances should focus on more principled designs for incorporating negative samples, rather than relying on them indiscriminately. Our findings provide guidance for future work in reward-based LLM post-training.
强化学习(RL)已成为在复杂推理任务上微调大型语言模型(LLMs)的主流方法。在近期的方法中,GRPO因在训练DeepSeek-R1等模型时取得的经验性成功而引人注目,但其有效性的来源仍未得到充分理解。在这项工作中,我们从类REINFORCE算法的角度重新审视GRPO,并分析其核心组成部分。令人惊讶的是,我们发现一个简单的拒绝采样基线RAFT(仅在获得正奖励的样本上训练)取得了可与GRPO和PPO相竞争的性能。我们的消融研究表明,GRPO的主要优势来自于丢弃回答完全错误的提示,而不是其奖励归一化。受此启发,我们提出Reinforce-Rej,它是策略梯度的一个最小扩展,同时过滤完全错误和完全正确的样本。Reinforce-Rej提高了KL效率和稳定性,可作为更复杂RL算法的轻量而有效的替代方案。我们主张将RAFT作为稳健且可解释的基线,并建议未来的进展应侧重于以更有原则的设计来纳入负样本,而不是不加区分地依赖它们。我们的发现为基于奖励的LLM后训练的未来工作提供了指导。
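The two sample-selection rules named in this abstract can be sketched in a few lines. This is a minimal illustration under assumed data structures: `rewards_by_prompt` (a dict from prompt to a list of binary rewards) and both function names are hypothetical, and the gradient update that follows selection is not shown.

```python
def raft_select(rewards_by_prompt):
    """RAFT-style selection: keep only positively rewarded samples.

    Returns a dict prompt -> indices of samples with reward > 0.
    Prompts with no positive sample contribute nothing.
    """
    return {
        prompt: [i for i, r in enumerate(rewards) if r > 0]
        for prompt, rewards in rewards_by_prompt.items()
        if any(r > 0 for r in rewards)
    }

def reinforce_rej_select(rewards_by_prompt):
    """Reinforce-Rej-style filter: drop prompts whose samples are
    all incorrect or all correct; keep the mixed ones."""
    kept = []
    for prompt, rewards in rewards_by_prompt.items():
        n_correct = sum(1 for r in rewards if r > 0)
        if 0 < n_correct < len(rewards):
            kept.append(prompt)
    return kept
```

For example, with rewards `{"a": [0, 0, 0], "b": [1, 1, 1], "c": [1, 0, 1]}`, the RAFT-style rule keeps all samples of `"b"` and samples 0 and 2 of `"c"`, while the Reinforce-Rej-style filter keeps only prompt `"c"`.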
Article 185
Title@2025-06-12 (4): Revisiting Transformers with Insights from Image Filtering
Title: Revisiting Transformers with Insights from Image Filtering | Übertragen von Transformern mit Einsichten aus der Bildfilterung | 以图像过滤透视的洞视重新审视变形器 2506.10371v1 |
Authors (3): Laziz U. Abdullaev, Maksim Tkachenko, Tan M. Nguyen
The self-attention mechanism, a cornerstone of Transformer-based state-of-the-art deep learning architectures, is largely heuristic-driven and fundamentally challenging to interpret. Establishing a robust theoretical foundation to explain its remarkable success and limitations has therefore become an increasingly prominent focus in recent research. Some notable directions have explored understanding self-attention through the lens of image denoising and nonparametric regression. While promising, existing frameworks still lack a deeper mechanistic interpretation of various architectural components that enhance self-attention, both in its original formulation and subsequent variants. In this work, we aim to advance this understanding by developing a unifying image processing framework, capable of explaining not only the self-attention computation itself but also the role of components such as positional encoding and residual connections, including numerous later variants. We also pinpoint potential distinctions between the two concepts building upon our framework, and make an effort to close this gap. We introduce two independent architectural modifications within transformers. While our primary objective is interpretability, we empirically observe that image processing-inspired modifications can also lead to notably improved accuracy and robustness against data contamination and adversaries across language and vision tasks, as well as better long sequence understanding.
自我注意机制是以变异器为基础的最先进的深层次学习结构的基石,它在很大程度上是超自然学驱动的,对解释具有根本的挑战性。因此,建立坚实的理论基础来解释其显著的成功和局限性,已成为最近研究的一个日益突出的重点。一些值得注意的方向探索了通过图像脱色和非对称回归的视角来理解自我注意。虽然前景看好,但现有框架仍然缺乏对各种建筑构件的更深层次机械化解释,这些构件在最初的构件和随后的变式中都加强了自我注意。在这项工作中,我们的目标是通过开发一个统一的图像处理框架来推进这种理解,不仅能够解释自我注意计算本身,而且能够解释定位编码和剩余连接等构件的作用,包括许多后来的变式。我们还确定了在我们框架内建立的两个概念之间的潜在区别,并努力缩小这一差距。我们在变异体中引入了两个独立的建筑修饰。虽然我们的首要目标是可解释性,但我们从经验上认为,受启发的修改还能够显著地提高准确性和稳健性,防止数据在语言和愿景上相互交叉和长远理解。
Article 186
Title@2025-06-12 (4): Can We Infer Confidential Properties of Training Data from LLMs?
Title: Can We Infer Confidential Properties of Training Data from LLMs? | Können wir vertrauliche Eigenschaften von Trainingsdaten von LLMs ableiten? | 我们能否从LLMS中推断培训数据的机密性? 2506.10364v1 |
Authors (4): Pengrun Huang, Chhavi Yadav, Ruihan Wu, Kamalika Chaudhuri
Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets to support applications in fields such as healthcare, finance, and law. These fine-tuning datasets often have sensitive and confidential dataset-level properties – such as patient demographics or disease prevalence – that are not intended to be revealed. While prior work has studied property inference attacks on discriminative models (e.g., image classification models) and generative models (e.g., GANs for image data), it remains unclear if such attacks transfer to LLMs. In this work, we introduce PropInfer, a benchmark task for evaluating property inference in LLMs under two fine-tuning paradigms: question-answering and chat-completion. Built on the ChatDoctor dataset, our benchmark includes a range of property types and task configurations. We further propose two tailored attacks: a prompt-based generation attack and a shadow-model attack leveraging word frequency signals. Empirical evaluations across multiple pretrained LLMs show the success of our attacks, revealing a previously unrecognized vulnerability in LLMs.
大型语言模型(LLMS)日益在特定领域数据集上进行微调,以支持保健、金融和法律等领域的应用;这些微调数据集往往具有敏感和保密的数据集级特性,如病人人口或疾病流行,这些性质并不打算披露;虽然以前的工作研究过对歧视模型(例如图像分类模型)和基因模型(例如图像数据GANs)进行的财产推断攻击,但仍然不清楚这种攻击是否转移到LMS;在这项工作中,我们引入PropInfer,这是在两个微调模式下评估LLMS财产推断的基准任务:回答问题和完成聊天。在ChatDoctor数据集上,我们的基准包括一系列财产类型和任务配置。我们进一步建议了两种有针对性的攻击:即时生成攻击和影子模型攻击,利用文字频率信号。在多个受过训练的LMS中进行的经验性评价显示我们攻击的成功,揭示了LMS中先前未确认的脆弱性。
Article 187
Title@2025-06-12 (4): Permutation-Based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data
Title: Permutation-Based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data | Permutationsbasierter Rank-Test in der Gegenwart von Diskretisierung und Anwendung in der kausalen Entdeckung mit gemischten Daten | 在与混合数据相混合的Causal Discovery中存在分化和应用 2501.18990v2 |
Authors (9): Xinshuai Dong, Ignavier Ng, Boyang Sun, Haoyue Dai, Guang-Yuan Hao, Shunxing Fan, Peter Spirtes, Yumou Qiu, Kun Zhang
Recent advances have shown that statistical tests for the rank of cross-covariance matrices play an important role in causal discovery. These rank tests include partial correlation tests as special cases and provide further graphical information about latent variables. Existing rank tests typically assume that all the continuous variables can be perfectly measured, and yet, in practice many variables can only be measured after discretization. For example, in psychometric studies, the continuous level of certain personality dimensions of a person can only be measured after being discretized into order-preserving options such as disagree, neutral, and agree. Motivated by this, we propose Mixed data Permutation-based Rank Test (MPRT), which properly controls the statistical errors even when some or all variables are discretized. Theoretically, we establish the exchangeability and estimate the asymptotic null distribution by permutations; as a consequence, MPRT can effectively control the Type I error in the presence of discretization while previous methods cannot. Empirically, our method is validated by extensive experiments on synthetic data and real-world data to demonstrate its effectiveness as well as applicability in causal discovery.
最近的进展表明,跨变量矩阵等级的统计测试在因果发现中起着重要作用,这些等级测试包括作为特殊案例的部分相关测试,并提供关于潜在变量的进一步图形信息。现有的等级测试通常假定所有连续变量都可以完全测量,但实际上许多变量只能在分解后才能测量。例如,在心理测定研究中,一个人某些个性层面的连续水平只有在分解成不同、中立和同意等按顺序保存选项之后才能测量。我们为此提出混合数据跨位测试(MPRT),即使某些或所有变量是分解的,也能够适当控制统计错误。理论上,我们确定可互换性,并估计通过分解的无损分布;因此,MPRT能够有效控制存在离解时的I型错误,而以前的方法则无法。我们的方法通过对合成数据和真实世界数据的广泛实验加以验证,以证明其有效性和在因果发现中的适用性。
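The permutation idea behind this abstract can be shown in its simplest form: a permutation test of association between two variables, which still applies when one of them has been discretized. This is a generic textbook sketch, not the authors' MPRT procedure (which tests the rank of cross-covariance matrices); the function names and the default of 999 permutations are assumptions.

```python
import random

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the null of no association.

    Under the null, the pairing of x and y is exchangeable, so shuffling y
    yields draws from the null distribution of the statistic.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself and keep p > 0.
    return (hits + 1) / (n_perm + 1)
```

The test relies only on exchangeability under the null, not on a Gaussian form for the data, which is the property that makes permutation approaches attractive for discretized measurements.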
Article 188
Title@2025-06-12 (4): Advanced deep architecture pruning using single filter performance
Title: Advanced deep architecture pruning using single filter performance | Fortschrittliche Tiefenarchitektur-Beschneidung mit Single-Filter-Performance | 使用单一过滤性能进行高级深高级结构修剪 2501.12880v2 |
Authors (6): Yarden Tzach, Yuval Meir, Ronit D. Gross, Ofek Tevet, Ella Koresh, Ido Kanter
Pruning the parameters and structure of neural networks reduces the computational complexity, energy consumption, and latency during inference. Recently, a novel underlying mechanism for successful deep learning (DL) was presented, based on a method that quantitatively measures the single filter performance in each layer of a DL architecture. This statistical-mechanics-inspired viewpoint makes it possible to reveal the macroscopic behavior of the entire network from the microscopic performance of each filter and their cooperative behavior. Herein, we demonstrate how this understanding paves the path to high quenched dilution of the convolutional layers of deep architectures without affecting their overall accuracy, using applied filter cluster connections (AFCC). AFCC is exemplified on VGG-11 and EfficientNet-B0 architectures trained on CIFAR-100, and its high pruning outperforms other techniques using the same pruning magnitude. Additionally, this technique is broadened to single nodal performance and high pruning of fully connected layers, suggesting a possible implementation to considerably reduce the complexity of over-parameterized AI tasks.
保护神经网络的参数和结构可以降低计算复杂性、能源消耗和推论期间的潜伏。最近,根据定量测量DL结构每一层的单一过滤性能的方法,提出了一个新的深层成功学习基本机制(DL),并提出了新的深层学习工作的综合机制。这一统计机制启发的观点能够揭示整个网络的宏观行为,从每个过滤器的微镜性能及其合作行为中揭示出。在这里,我们展示了这一理解如何铺平了深层建筑的快速稀释,而不会影响其总体准确性,使用应用的过滤性集束连接(AFCC),AFCC以VG-11和高效的Net-B0为范例,在CIFAR-100上培训的架构及其高运行率超过使用相同运行强度的其他技术。此外,这一技术已扩大到单一节度性表现和高度连接的层,建议可能实施,以大大降低过度分离的AI任务的复杂性。
Article 189
Title@2025-06-12 (4): Dimension-Independent Kernel ε-Covers
Title: Dimension-Independent Kernel ε-Covers | Dimensionsunabhängiger Kernel ε-Covers | 独立的内核 - 中央 2306.16516v2 |
Authors (2): Jeff M. Phillips, Hasan Pourmahmood-Aghababa
We introduce the notion of an $\varepsilon$-cover for a kernel range space. A kernel range space concerns a set of points $X \subset \mathbb{R}^d$ and the space of all queries by a fixed kernel (e.g., a Gaussian kernel $K(p,\cdot) = \exp(-\|p-\cdot\|^2)$, where $p \in \mathbb{R}^d$). For a point set $X$ of size $n$, a query returns a vector of values $R_p \in \mathbb{R}^n$, where the $i$th coordinate is $(R_p)_i = K(p,x_i)$ for $x_i \in X$. An $\varepsilon$-cover is a subset of points $Q \subset \mathbb{R}^d$ such that for any $p \in \mathbb{R}^d$, $\frac{1}{n} \|R_p - R_q\|_1 \leq \varepsilon$ for some $q \in Q$. This is a smooth analog of Haussler's notion of $\varepsilon$-covers for combinatorial range spaces (e.g., defined by subsets of points within a ball query) where the resulting vectors $R_p$ are in $\{0,1\}^n$ instead of $[0,1]^n$. The kernel versions of these range spaces show up in data analysis tasks where the coordinates may be uncertain or imprecise, and hence one wishes to add some flexibility in the notion of inside and outside of a query range. Our main result is that, unlike combinatorial range spaces, the size of kernel $\varepsilon$-covers is independent of the input size $n$ and dimension $d$. We obtain a bound of $2^{\tilde{O}(1/\varepsilon^2)}$, where $\tilde{O}(f(1/\varepsilon))$ hides log factors in $(1/\varepsilon)$ that can depend on the kernel. This implies that by relaxing the notion of boundaries in range queries, eventually the curse of dimensionality disappears, which may help explain the success of machine learning in very high dimensions. We also complement this result with a lower bound of almost $(1/\varepsilon)^{\Omega(1/\varepsilon)}$, showing that the exponential dependence on $1/\varepsilon$ is necessary.
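The definition in this abstract translates directly into a small numerical check. The sketch below computes the response vector $R_p$ for a Gaussian kernel and the normalized $\ell_1$ distance to the nearest cover point. The function names are hypothetical, and `is_cover_on` tests only a finite set of probe queries, so passing it is a necessary condition for an $\varepsilon$-cover rather than a proof over all of $\mathbb{R}^d$.

```python
import math

def gaussian_kernel(p, x):
    """K(p, x) = exp(-||p - x||^2) for points given as tuples."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(p, x)))

def response_vector(p, X):
    """R_p: kernel value of query p against every data point in X."""
    return [gaussian_kernel(p, x) for x in X]

def cover_distance(p, Q, X):
    """min over q in Q of (1/n) * ||R_p - R_q||_1."""
    rp = response_vector(p, X)
    n = len(X)
    return min(
        sum(abs(a - b) for a, b in zip(rp, response_vector(q, X))) / n
        for q in Q
    )

def is_cover_on(probes, Q, X, eps):
    """Check the epsilon-cover condition on a finite set of probe queries only."""
    return all(cover_distance(p, Q, X) <= eps for p in probes)
```

A query far from every data point has a response vector close to zero, so a single faraway cover point represents all such queries regardless of the dimension $d$.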
Article 190
Title@2025-06-12 (4): Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success
Title: Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success | Demonstration von Multi-Suction Item Picking at Scale über Multi-Modal Lernen von Pick-Erfolg | 通过多模式学习选取成功模式, 演示通过选取成功多模式学习在缩放中选择的多测试项目 2506.10359v1 |
Authors (9): Che Wang, Jeroen van Baar, Chaitanya Mitash, Shuai Li, Dylan Randle, Weiyao Wang, Sumedh Sontakke, Kostas E. Bekris, Kapil Katyal
This work demonstrates how autonomously learning aspects of robotic operation from sparsely-labeled, real-world data of deployed, engineered solutions at industrial scale can provide solutions that achieve improved performance. Specifically, it focuses on multi-suction robot picking and performs a comprehensive study on the application of multi-modal visual encoders for predicting the success of candidate robotic picks. Picking diverse items from unstructured piles is an important and challenging task for robot manipulation in real-world settings, such as warehouses. Methods for picking from clutter must work for an open set of items while simultaneously meeting latency constraints to achieve high throughput. The demonstrated approach utilizes multiple input modalities, such as RGB, depth and semantic segmentation, to estimate the quality of candidate multi-suction picks. The strategy is trained from real-world item picking data, with a combination of multimodal pretraining and finetuning. The manuscript provides comprehensive experimental evaluation performed over a large item-picking dataset, an item-picking dataset targeted to include partial occlusions, and a package-picking dataset, which focuses on containers, such as boxes and envelopes, instead of unpackaged items. The evaluation measures performance for different item configurations, pick scenes, and object types. Ablations help to understand the effects of in-domain pretraining, the impact of different modalities and the importance of finetuning. These ablations reveal both the importance of training over multiple modalities and the ability of models to learn the relationship between modalities during pretraining, so that during finetuning and inference only a subset of them can be used as input.
这项工作展示了如何从工业规模部署的工程化解决方案所产生的稀疏标注的真实世界数据中自主学习机器人操作的各个方面,从而得到性能更优的解决方案。具体而言,本文聚焦于多吸盘机器人拣选,并全面研究了如何应用多模态视觉编码器来预测候选机器人拣选的成功率。从非结构化堆叠中拣选各类物品,是仓库等真实场景中机器人操作的一项重要而具有挑战性的任务。在杂乱环境中拣选的方法必须适用于开放的物品集合,同时满足延迟约束以实现高吞吐量。所展示的方法利用RGB、深度和语义分割等多种输入模态来估计候选多吸盘拣选的质量。该策略利用真实世界的物品拣选数据进行训练,结合了多模态预训练和微调。本文在一个大型物品拣选数据集、一个专门包含部分遮挡的物品拣选数据集以及一个包裹拣选数据集上进行了全面的实验评估,其中包裹拣选数据集关注箱子和信封等容器,而非未包装的物品。评估衡量了不同物品配置、拣选场景和物体类型下的性能。消融实验有助于理解域内预训练的效果、不同模态的影响以及微调的重要性。这些消融实验既揭示了在多种模态上训练的重要性,也表明模型能够在预训练期间学习模态之间的关系,从而在微调和推理时只需使用其中一部分模态作为输入。
Article 191
Title@2025-06-12 (4): TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree
Title: TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree | TreeLoRA: Effizientes kontinuierliches Lernen über schichtweise LoRAs, geleitet von einem Hierarchischen Gradienten-Similaritätsbaum | TreeLORA:通过层-维兹 LoRAs 有效持续学习,由等级梯度梯度- 硅化树指导 2506.10355v1 |
Authors (5): Yu-Yang Qian, Yuan-Ze Xu, Zhen-Yu Zhang, Peng Zhao, Zhi-Hua Zhou
Many real-world applications collect data in a streaming environment, where learning tasks are encountered sequentially. This necessitates continual learning (CL) to update models online, enabling adaptation to new tasks while preserving past knowledge to prevent catastrophic forgetting. Nowadays, with the flourishing of large pre-trained models (LPMs), efficiency has become increasingly critical for CL, due to their substantial computational demands and growing parameter sizes. In this paper, we introduce TreeLoRA (K-D Tree of Low-Rank Adapters), a novel approach that constructs layer-wise adapters by leveraging hierarchical gradient similarity to enable efficient CL, particularly for LPMs. To reduce the computational burden of task similarity estimation, we employ bandit techniques to develop an algorithm based on lower confidence bounds to efficiently explore the task structure. Furthermore, we use sparse gradient updates to facilitate parameter optimization, making the approach better suited for LPMs. Theoretical analysis is provided to justify the rationale behind our approach, and experiments on both vision transformers (ViTs) and large language models (LLMs) demonstrate the effectiveness and efficiency of our approach across various domains, including vision and natural language processing tasks.
许多现实世界应用程序在不断流出的环境中收集数据,学习任务依次相遇。这需要不断学习(CL),在网上更新模型,使适应新任务,同时保留过去的知识,防止灾难性的遗忘。如今,随着大量预先培训的模型(LPMs)的蓬勃发展,效率对于CL越来越重要,因为它们有大量的计算需求,而且参数大小不断增大。在本文中,我们引入了TrealoRA(K-D 低兰克适应器树),这是一种新颖的方法,它利用等级梯度相似的等级梯度来构建多层次适应器,从而使得高效率的CLLL,特别是LPMs。为了减少任务相似性估算的计算负担,我们使用了土匪技术来开发一种基于较低信任度的算法,以便有效地探索任务结构。此外,我们使用稀有的梯度更新法来方便参数的优化,使LPMs更适合LPMs。我们的方法有合理性,并且对视觉变异器和大语言模型进行实验,以展示我们的方法的效益和效率,包括视觉和自然语言处理任务。
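The lower-confidence-bound idea mentioned in this abstract can be sketched as a standard bandit loop that looks for the most similar previous task (smallest gradient distance) from noisy, cheap distance estimates. This is a flat-arm textbook sketch, not the TreeLoRA algorithm, which searches a hierarchical tree; `sample_distance`, the bonus term $\sqrt{2 \ln t / n_a}$, and the function name are assumptions.

```python
import math
import random

def lcb_search(sample_distance, n_arms, budget, seed=0):
    """Find the arm with the smallest mean distance using lower confidence bounds.

    sample_distance(arm, rng) returns one noisy distance estimate for that arm.
    Assumes budget >= n_arms so that every arm is sampled at least once.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, budget + 1):
        if t <= n_arms:
            arm = t - 1  # initial round: sample every arm once
        else:
            # Optimism for minimization: pick the smallest lower bound.
            arm = min(
                range(n_arms),
                key=lambda a: means[a] - math.sqrt(2 * math.log(t) / counts[a]),
            )
        d = sample_distance(arm, rng)
        counts[arm] += 1
        means[arm] += (d - means[arm]) / counts[arm]  # running mean
    best = min(range(n_arms), key=lambda a: means[a])
    return best, counts
```

Arms that look clearly dissimilar are sampled rarely, so most of the estimation budget goes to the promising candidates; this is the source of the computational saving that the abstract describes.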
Article 192
Title@2025-06-12 (4): History-Aware Neural Operator: Robust Data-Driven Constitutive Modeling of Path-Dependent Materials
Title: History-Aware Neural Operator: Robust Data-Driven Constitutive Modeling of Path-Dependent Materials | History-Aware Neuronal Operator: Robuste datengetriebene konstitutive Modellierung von pfadabhängigen Materialien | 历史软件神经操作器:路径依赖材料的强有力数据驱动建模模型 2506.10352v1 |
Authors (3): Binyao Guo, Zihan Lin, QiZhi He
This study presents an end-to-end learning framework for data-driven modeling of path-dependent inelastic materials using neural operators. The framework is built on the premise that irreversible evolution of material responses, governed by hidden dynamics, can be inferred from observable data. We develop the History-Aware Neural Operator (HANO), an autoregressive model that predicts path-dependent material responses from short segments of recent strain-stress history without relying on hidden state variables, thereby overcoming self-consistency issues commonly encountered in recurrent neural network (RNN)-based models. Built on a Fourier-based neural operator backbone, HANO enables discretization-invariant learning. To enhance its ability to capture both global loading patterns and critical local path dependencies, we embed a hierarchical self-attention mechanism that facilitates multiscale feature extraction. Beyond ensuring self-consistency, HANO mitigates sensitivity to initial hidden states, a commonly overlooked issue that can lead to instability in recurrent models when applied to generalized loading paths. By modeling stress-strain evolution as a continuous operator rather than relying on fixed input-output mappings, HANO naturally accommodates varying path discretizations and exhibits robust performance under complex conditions, including irregular sampling, multi-cycle loading, noisy data, and pre-stressed states. We evaluate HANO on two benchmark problems: elastoplasticity with hardening and progressive anisotropic damage in brittle solids. Results show that HANO consistently outperforms baseline models in predictive accuracy, generalization, and robustness. With its demonstrated capabilities, HANO provides an effective data-driven surrogate for simulating inelastic materials and is well-suited for integration with classical numerical solvers.
本研究提出了一个端到端学习框架,利用神经算子对路径依赖的非弹性材料进行数据驱动建模。该框架建立在这样一个前提之上:由隐藏动力学支配的材料响应的不可逆演化可以从可观测数据中推断出来。我们开发了历史感知神经算子(HANO),这是一种自回归模型,它根据近期应变-应力历史的短片段预测路径依赖的材料响应,而不依赖隐藏状态变量,从而克服了基于循环神经网络(RNN)的模型中常见的自洽性问题。HANO建立在基于傅里叶的神经算子骨干之上,能够实现离散化不变的学习。为了增强其同时捕捉全局加载模式和关键局部路径依赖的能力,我们嵌入了一种有助于多尺度特征提取的分层自注意力机制。除了保证自洽性之外,HANO还降低了对初始隐藏状态的敏感性,这是一个常被忽视的问题,可能导致循环模型在应用于一般化加载路径时出现不稳定。通过将应力-应变演化建模为连续算子而不是依赖固定的输入-输出映射,HANO能够自然地适应不同的路径离散化,并在不规则采样、多循环加载、含噪数据和预应力状态等复杂条件下表现出稳健的性能。我们在两个基准问题上评估了HANO:带硬化的弹塑性,以及脆性固体中的渐进各向异性损伤。结果表明,HANO在预测精度、泛化能力和稳健性方面始终优于基线模型。凭借所展示的能力,HANO为模拟非弹性材料提供了一种有效的数据驱动替代模型,并且非常适合与经典数值求解器集成。
Article 193
Title@2025-06-12 (4): PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications
Title: PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications | PhysNav-DG: Ein neuartiges adaptives Framework für robuste VLM-Sensor-Fusion in Navigationsanwendungen | PhysNav-DG:导航应用中的强力VLM-传感器融合新适应框架 2505.01881v2 |
Authors (2): Trisanth Srinivasan, Santosh Patapati
Robust navigation in diverse environments and domains requires both accurate state estimation and transparent decision making. We present PhysNav-DG, a novel framework that integrates classical sensor fusion with the semantic power of vision-language models. Our dual-branch architecture predicts navigation actions from multi-sensor inputs while simultaneously generating detailed chain-of-thought explanations. A modified Adaptive Kalman Filter dynamically adjusts its noise parameters based on environmental context. It leverages several streams of raw sensor data along with semantic insights from models such as LLaMA 3.2 11B and BLIP-2. To evaluate our approach, we introduce the MD-NEX Benchmark, a novel multi-domain dataset that unifies indoor navigation, autonomous driving, and social navigation tasks with ground-truth actions and human-validated explanations. Extensive experiments and ablations show that PhysNav-DG improves navigation success rates by over 20% and achieves high efficiency, with explanations that are both highly grounded and clear. This work connects high-level semantic reasoning and geometric planning for safer and more trustworthy autonomous systems.
在不同环境和领域进行强力导航既需要准确的状态估计,也需要透明的决策。 我们展示了PhysNav-DG, 这个将古典传感器与视觉语言模型的语义力融合在一起的新框架。 我们的双部门架构从多传感器投入中预测导航行动,同时产生详细的思维链解释。 一个修改过的适应性卡尔曼过滤器根据环境背景动态调整其噪音参数。 它利用了多种原始传感器数据流以及LalaMA 3. 2 11B 和 BLIP-2 等模型的语义洞察。 为了评估我们的方法,我们引入了MD-NEX基准,这是一套新颖的多维数据集,它以地面导航、自主驾驶和社会导航动作和人性解释为一体。 广泛的实验和推理显示PhysNav-DG将导航成功率提高20%以上,并实现了高效率,其解释基础和清晰。 这项工作将高层次的语义推理推理和几何规划连接了更安全和更可靠的自主系统。
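The statement that the filter "dynamically adjusts its noise parameters based on environmental context" can be illustrated with a scalar Kalman filter whose measurement-noise variance is rescaled at every step. This is a minimal sketch under assumptions: the constant-state model, the per-step `noise_scales` input (standing in for whatever context signal a semantic model would supply), and the default values of `q` and `r_base` are all hypothetical, not the modified filter of the paper.

```python
def adaptive_kalman_1d(measurements, noise_scales, q=0.01, r_base=1.0):
    """Scalar Kalman filter with context-dependent measurement noise.

    measurements: sequence of noisy observations of a slowly varying state.
    noise_scales: per-step multipliers; the measurement variance at step t
                  is r_base * noise_scales[t] (large = distrust the sensor).
    """
    x, p = measurements[0], 1.0   # initial state estimate and variance
    estimates = [x]
    for z, scale in zip(measurements[1:], noise_scales[1:]):
        p = p + q                 # predict: state assumed constant, add process noise
        r = r_base * scale        # adapt measurement noise to the context
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # update with the innovation
        p = (1 - k) * p
        estimates.append(x)
    return estimates
```

When the context marks a sensor as unreliable (large scale), the gain approaches zero and the estimate barely moves; when the sensor is trusted (small scale), the estimate follows the measurement.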
Article 194
Title@2025-06-12 (4): Enhancing Glass Defect Detection with Diffusion Models: Addressing Imbalanced Datasets in Manufacturing Quality Control
Title: Enhancing Glass Defect Detection with Diffusion Models: Addressing Imbalanced Datasets in Manufacturing Quality Control | Verbesserung der Glasdefekterkennung mit Diffusionsmodellen: Adressierung unausgewogener Datensätze in der Fertigungsqualitätskontrolle | 利用传播模型加强玻璃破损检测:在制造业质量控制中解决数据集不平衡问题 2505.03134v2 |
Authors (3): Sajjad Rezvani Boroujeni, Hossein Abedi, Tom Bush
Visual defect detection in industrial glass manufacturing remains a critical challenge due to the low frequency of defective products, leading to imbalanced datasets that limit the performance of deep learning models and computer vision systems. This paper presents a novel approach using Denoising Diffusion Probabilistic Models (DDPMs) to generate synthetic defective glass product images for data augmentation, effectively addressing class imbalance issues in manufacturing quality control and automated visual inspection. The methodology significantly enhances image classification performance of standard CNN architectures (ResNet50V2, EfficientNetB0, and MobileNetV2) in detecting anomalies by increasing the minority class representation. Experimental results demonstrate substantial improvements in key machine learning metrics, particularly in recall for defective samples across all tested deep neural network architectures while maintaining perfect precision. The most dramatic improvement was observed in ResNet50V2’s overall classification accuracy, which increased from seventy-eight percent to ninety-three percent when trained with the augmented data. This work provides a scalable, cost effective approach to enhancing automated defect detection in glass manufacturing that can potentially be extended to other industrial quality assurance systems and industries with similar class imbalance challenges.
工业玻璃制造中的视觉缺陷检测由于有缺陷产品的频率低,导致数据集失衡,限制了深层学习模型和计算机视觉系统的性能,因此,在工业玻璃制造中发现有缺陷仍然是一项重大挑战。本文介绍了一种新颖的方法,使用Denoising Difmission 概率模型(DDPMs)生成合成有缺陷的玻璃产品图像,用于数据增强,有效解决制造质量控制和自动视觉检查中的阶级不平衡问题。该方法通过增加少数阶层的代表性,大大提高了有线电视新闻新闻网标准结构(ResNet50V2, effectiveNetB0, 和 MovedNetV2)在发现反常现象方面的图像分类性能。实验结果表明,关键机器学习指标有重大改进,特别是在所有经过测试的深神经网络结构中召回有缺陷的样本,同时保持完全精确性。在ResNet50V2的总体分类精确度中观察到了最显著的改进,在接受强化数据培训后,从78%增加到93%。这项工作为改进玻璃制造中的自动缺陷检测提供了一种可扩展到其他工业质量保证系统和有类似阶级不平衡挑战的可推广的可推广的、成本有效的方法。
Article 195
Title@2025-06-12 (4): Decoding for Punctured Convolutional and Turbo Codes: A Deep Learning Solution for Protocols Compliance
Title: Decoding for Punctured Convolutional and Turbo Codes: A Deep Learning Solution for Protocols Compliance | Dekodierung für punktierte Konvolutions- und Turbo-Codes: Eine Deep-Learning-Lösung für Protokolle Compliance | 钻穿的革命代码和涡轮代码解码:遵守议定书的深学习解决方案 2502.15475v2 |
Authors (2): Yongli Yan, Linglong Dai
Neural network-based decoding methods have shown promise in enhancing error correction performance, but traditional approaches struggle with the challenges posed by punctured codes. In particular, these methods fail to address the complexities of variable code rates and the need for protocol compatibility. This paper presents a unified Long Short-Term Memory (LSTM)-based decoding architecture specifically designed to overcome these challenges. The proposed method unifies punctured convolutional and Turbo codes. A puncture embedding mechanism integrates puncturing patterns directly into the network, enabling seamless adaptation to varying code rates, while balanced bit error rate training ensures robustness across different code lengths, rates, and channels, maintaining protocol flexibility. Extensive simulations in Additive White Gaussian Noise and Rayleigh fading channels demonstrate that the proposed approach outperforms conventional decoding techniques, providing significant improvements in decoding accuracy and robustness. These results underscore the potential of LSTM-based decoding as a promising solution for next-generation artificial intelligence powered communication systems.
以神经网络为基础的解码方法在提高错误校正性能方面显示出了希望,但传统方法在克服被刺破的代码所带来的挑战方面挣扎,特别是,这些方法未能解决可变代码率的复杂性和对协议兼容性的需求。本文件介绍了一个专门为克服这些挑战而设计的基于统一长期内存(LSTM)的解码结构。拟议方法统一了穿透性脉冲和涡轮编码。一个穿刺嵌入机制将刺破模式直接纳入网络,使得能够无缝地适应不同的代码率,而平衡的位数错误率培训确保了不同代码长度、费率和渠道的稳健性,并保持了协议的灵活性。在Additive White Gaussian Noise和Rayleigh 的大规模模拟表明,拟议方法超越了常规解码技术,大大改进了解码准确性和稳健性。这些结果突出表明了基于LSTM解码的解码作为下一代人工智能通信系统有希望的解决办法的潜力。
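Puncturing, which this abstract builds on, is easy to show concretely: a repeating 0/1 pattern deletes some coded bits before transmission, and the receiver reinserts neutral values at those positions before decoding. The sketch below assumes the pattern `[1, 1, 1, 0]`, which turns a rate-1/2 mother code into rate 2/3. The function names are hypothetical, and `puncture_mask` is only one plausible way to expose the pattern to a neural decoder, not the paper's puncture embedding.

```python
def puncture(bits, pattern):
    """Transmit only the coded bits where the repeating pattern is 1."""
    return [b for i, b in enumerate(bits) if pattern[i % len(pattern)]]

def depuncture(received, pattern, total_len, erasure=0.0):
    """Rebuild a length-total_len sequence, filling punctured positions
    with a neutral value (e.g., a log-likelihood ratio of 0)."""
    it = iter(received)
    return [
        next(it) if pattern[i % len(pattern)] else erasure
        for i in range(total_len)
    ]

def puncture_mask(pattern, total_len):
    """0/1 mask of transmitted positions, usable as a side input to a decoder."""
    return [pattern[i % len(pattern)] for i in range(total_len)]
```

With this pattern, every four coded bits (two information bits at rate 1/2) become three transmitted bits, which gives the 2/3 rate. Because the decoder input always has the mother-code length after depuncturing, one decoder can serve several code rates.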
Article 196
Title@2025-06-12 (4): PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
Title: PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation | PhysioWave: Multi-Scale Wavelet-Transformer für Physiologische Signaldarstellung | PhysioWave: 生理信号代表的多阶段波盘转换器 2506.10351v1 |
Authors (6): Yanlong Chen, Mattia Orlandi, Pierangelo Maria Rapa, Simone Benatti, Luca Benini, Yawei Li
Physiological signals are often corrupted by motion artifacts, baseline drift, and other low-SNR disturbances, which pose significant challenges for analysis. Additionally, these signals exhibit strong non-stationarity, with sharp peaks and abrupt changes that evolve continuously, making them difficult to represent using traditional time-domain or filtering methods. To address these issues, a novel wavelet-based approach for physiological signal analysis is presented, aiming to capture multi-scale time-frequency features in various physiological signals. Leveraging this technique, two large-scale pretrained models specific to EMG and ECG are introduced for the first time, achieving superior performance and setting new baselines in downstream tasks. Additionally, a unified multi-modal framework is constructed by integrating a pretrained EEG model, where each modality is guided through its dedicated branch and fused via learnable weighted fusion. This design effectively addresses challenges such as low signal-to-noise ratio, high inter-subject variability, and device mismatch, outperforming existing methods on multi-modal tasks. The proposed wavelet-based architecture lays a solid foundation for analysis of diverse physiological signals, while the multi-modal design points to next-generation physiological signal processing with potential impact on wearable health monitoring, clinical diagnostics, and broader biomedical applications.
此外,这些信号表现出强烈的非常态性,其峰值和突变不断变化,使得它们难以代表使用传统的时空或过滤方法。为了解决这些问题,提出了一种新的基于波盘的生理信号分析方法,旨在捕捉各种生理信号中的多尺度时间频率特征。利用这一技术,首次引入了针对环境管理集团和ECG的两种大规模预先培训的模型,实现了优异性,为下游任务设定了新的基线。此外,通过整合预先培训的EEEG模型,构建了一个统一的多模式框架,其中每种模式都通过其专用分支提供指导,并通过可学习的加权聚合而结合。这一设计有效地应对了诸如信号到噪音比率低、高跨质变异和装置不匹配等挑战,超过了关于多模式任务的现有方法。拟议的波盘基结构为分析多种生理信号奠定了坚实的基础,同时在多模式、更广泛的临床诊断性诊断性应用方面,同时将多模式设计点用于后期的生理影响处理。
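"Learnable weighted fusion" of modality branches, as named in this abstract, is commonly realized as a softmax-weighted sum of per-modality embeddings. The sketch below shows that common form under stated assumptions: one scalar logit per modality, equal embedding sizes, and hypothetical function names. The paper's actual fusion layer may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_fusion(embeddings, logits):
    """Fuse per-modality embeddings with softmax-normalized weights.

    embeddings: list of equal-length vectors, one per modality (e.g., EMG, ECG, EEG).
    logits: one scalar per modality; in training these would be learned parameters.
    """
    weights = softmax(logits)
    dim = len(embeddings[0])
    return [
        sum(w * emb[j] for w, emb in zip(weights, embeddings))
        for j in range(dim)
    ]
```

Equal logits give a plain average; raising one modality's logit shifts the fused representation toward that branch, which is how such a layer can down-weight a noisy modality.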
Article 197
Title@2025-06-12 (4): LightKG: Efficient Knowledge-Aware Recommendations with Simplified GNN Architecture
Title: LightKG: Efficient Knowledge-Aware Recommendations with Simplified GNN Architecture | LightKG: Effiziente wissensbasierte Empfehlungen mit vereinfachter GNN-Architektur | LightKG: 具有简化 GNN 架构的高效知识软件建议 2506.10347v1 |
Authors (5): Yanhui Li, Dongxia Wang, Zhu Sun, Haonan Zhang, Huizhong Guo
Recently, Graph Neural Networks (GNNs) have become the dominant approach for Knowledge Graph-aware Recommender Systems (KGRSs) due to their proven effectiveness. Building upon GNN-based KGRSs, Self-Supervised Learning (SSL) has been incorporated to address the sparsity issue, leading to longer training time. However, through extensive experiments, we reveal that: (1) compared to other KGRSs, the existing GNN-based KGRSs fail to keep their superior performance under sparse interactions even with SSL; (2) more complex models tend to perform worse in sparse interaction scenarios, and complex mechanisms, like attention mechanisms, can be detrimental as they often increase learning difficulty. Inspired by these findings, we propose LightKG, a simple yet powerful GNN-based KGRS to address sparsity issues. LightKG includes a simplified GNN layer that encodes directed relations as scalar pairs rather than dense embeddings and employs a linear aggregation framework, greatly reducing the complexity of GNNs. Additionally, LightKG incorporates an efficient contrastive layer to implement SSL. It directly minimizes the node similarity in the original graph, avoiding the time-consuming subgraph generation and comparison required in previous SSL methods. Experiments on four benchmark datasets show that LightKG outperforms 12 competitive KGRSs in both sparse and dense scenarios while significantly reducing training time. Specifically, it surpasses the best baselines by an average of 5.8% in recommendation accuracy and saves 84.3% of training time compared to KGRSs with SSL. Our code is available at https://github.com/1371149/LightKG.
最近,图形神经网络(GNN)因其证明的有效性而成为了知识图形显示建议系统(KGRS)的主导方法。在基于GNN的KGRS、自增强学习系统(SSL)的基础上,已经纳入一个简单而强大的GNNKGS,以解决模糊性问题,导致培训时间延长。然而,通过广泛的实验,我们发现:(1)与其他KGRS相比,现有的GNNS的GGGGS没有在即使与SSL的微薄互动中保持其优异性。(2) 更复杂的模型往往在稀少的互动假设和复杂机制(如关注机制)中表现得更差,因为它们往往会增加学习困难。根据这些发现,我们提出了LightKG,这是一个简单但又强大的GNNNGG系统,以解决模糊性问题。 LightKG包括一个简化的GNNNS图层,该图将关系作为卡路配方而不是密集的嵌入,并采用线性汇总框架,大大降低GNNSNS的复杂度。此外,LKG在SLV的高效对比层结构中,将SLSLS的节级培训比时间与S的精确时间比。在以往的G中将SBERG标准中直接显示SBRRRR的模型中,将SBR的原始数据格式缩小了。
Article 198
Title@2025-06-12 (4): Technical Report with Proofs for A Full Picture in Conformance Checking: Efficiently Summarizing All Optimal Alignments
Title: Technical Report with Proofs for A Full Picture in Conformance Checking: Efficiently Summarizing All Optimal Alignments | Technischer Bericht mit Beweisen für ein vollständiges Bild in Übereinstimmungsprüfung: Alle optimalen Ausrichtungen effizient zusammenfassen | 技术报告,附有符合要求检查中完整图片的校对证明:高效地对齐所有最佳对齐 2506.10345v1 |
Authors (3): Philipp Bär, Moe T. Wynn, Sander J. J. Leemans
This technical report provides proofs for the claims in the paper “A Full Picture in Conformance Checking: Efficiently Summarizing All Optimal Alignments”.
本技术报告为“符合要求的完整图片检查:有效概括所有最佳调整”文件中的索赔提供了证据。
Article 199
Title@2025-06-12 (4): Provably Learning from Language Feedback
Title: Provably Learning from Language Feedback | Wahrscheinlich von Sprachfeedback lernen | 从语言反馈中学习 2506.10341v1 |
Authors (6): Wanqiao Xu, Allen Nie, Ruijie Zheng, Aditya Modi, Adith Swaminathan, Ching-An Cheng
Interactively learning from observation and language feedback is an increasingly studied area driven by the emergence of large language model (LLM) agents. While impressive empirical demonstrations have been shown, so far a principled framing of these decision problems remains lacking. In this paper, we formalize the Learning from Language Feedback (LLF) problem, assert sufficient assumptions to enable learning despite latent rewards, and introduce $\textit{transfer eluder dimension}$ as a complexity measure to characterize the hardness of LLF problems. We show that transfer eluder dimension captures the intuition that information in the feedback changes the learning complexity of the LLF problem. We demonstrate cases where learning from rich language feedback can be exponentially faster than learning from reward. We develop a no-regret algorithm, called $\texttt{HELiX}$, that provably solves LLF problems through sequential interactions, with performance guarantees that scale with the transfer eluder dimension of the problem. Across several empirical domains, we show that $\texttt{HELiX}$ performs well even when repeatedly prompting LLMs does not work reliably. Our contributions mark a first step towards designing principled interactive learning algorithms from generic language feedback.
从观察和语言反馈中相互学习是一个日益受大型语言模式(LLM)代理商的出现所驱动的研究领域。虽然已经展示了令人印象深刻的经验示范,但迄今为止仍然缺乏对这些决定问题的有原则框架。在本文中,我们将语言反馈(LLLF)问题正式化,提出足够的假设,使学习成为可能得到的回报,并引入$\textit{transfer eluder situle} $,作为衡量LLF问题难度的复杂尺度。我们表明,传输私利的维度能够捕捉到反馈中的信息改变LLLF问题学习复杂性的直觉。我们展示了从丰富的语言反馈中学习的速度可能比从奖励中学习的速度快得多的案例。我们开发了一个名为$\ textt{HELX}的无约束算法,这通过连续的相互作用可以解决LLLF问题,而绩效保证与LF问题的转移 Eluder 维度的规模。在几个经验领域,我们显示, $\ textt{HELX} 能够很好地完成LLMS的学习问题的复杂性。我们的贡献标志着设计从通用语言的交互式反射法的第一步。
Article 200
Title@2025-06-12 (4): Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Towards Privacy-Preserving Distributed Intelligence in AR/VR/MR
Title: Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Towards Privacy-Preserving Distributed Intelligence in AR/VR/MR | Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Auf dem Weg zu datenschutzerhaltender verteilter Intelligenz in AR/VR/MR | 多模式、多模式、多任务、多任务联邦基金会下一代扩展现实系统模式:争取在AR/VR/MR/MR中发布隐私保护分配情报 2506.05683v3 |
Authors (5): Fardis Nadimi, Payam Abdisarabshali, Kasra Borazjani, Jacob Chakareski, Seyyedali Hosseinalipour
Extended reality (XR) systems, which consist of virtual reality (VR), augmented reality (AR), and mixed reality (MR), offer a transformative interface for immersive, multi-modal, and embodied human-computer interaction. In this paper, we envision that multi-modal multi-task (M3T) federated foundation models (FedFMs) can offer transformative capabilities for XR systems through integrating the representational strength of M3T foundation models (FMs) with the privacy-preserving model training principles of federated learning (FL). We present a modular architecture for FedFMs, which entails different coordination paradigms for model training and aggregations. Central to our vision is the codification of XR challenges that affect the implementation of FedFMs under the SHIFT dimensions: (1) Sensor and modality diversity, (2) Hardware heterogeneity and system-level constraints, (3) Interactivity and embodied personalization, (4) Functional/task variability, and (5) Temporality and environmental variability. We illustrate the manifestation of these dimensions across a set of emerging and anticipated applications of XR systems. Finally, we propose evaluation metrics, dataset requirements, and design tradeoffs necessary for the development of resource-aware FedFMs in XR. This perspective aims to chart the technical and conceptual foundations for context-aware privacy-preserving intelligence in the next generation of XR systems.
由虚拟现实(VR)、强化现实(AR)和混合现实(XR)组成的扩展现实(XR)系统为隐性、多模式和体现人的计算机互动提供了一个变革界面,在本文件中,我们设想,多模式多任务(M3T)联合基础模型(FedFMs)能够通过将M3T基础模型(FMs)的代表性强度和体现的个人化、(4)功能/任务变异性以及(5)时间和环境变异性相结合,为XRFM提供一个模块架构,这需要为模型培训和聚合提供不同的协调模式。我们愿景的核心是编纂影响SHIFT维度下FMS执行的XR挑战:(1) 传感器和模式多样性,(2) 硬软件异质和系统级限制,(3) 互动和体现个人化,(4) 功能/任务变异性,以及(5) 时间和环境变异性。我们展示了这些层面在XR系统一系列新兴和预期的应用中的表现。我们愿景的核心是:在SHIFFFM系统下,为下一个数据库设计、数据格式设计目标,我们提议了XFM格式设计技术图表的下一个格式要求。
Article 201
Title@2025-06-12 (4): On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention
Title: On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention | On-the-Fly Adaptive Destillation von Transformer auf Dual-State Lineare Aufmerksamkeit | 向两国线性注意对变压器进行实时调整性蒸馏 2506.09316v2 |
Authors (5): Yeonju Ro, Zhenyu Zhang, Souvik Kundu, Zhangyang Wang, Aditya Akella
Large language models (LLMs) excel at capturing global token dependencies via self-attention but face prohibitive compute and memory costs on lengthy inputs. While sub-quadratic methods (e.g., linear attention) can reduce these costs, they often degrade accuracy due to overemphasizing recent tokens. In this work, we first propose dual-state linear attention (DSLA), a novel design that maintains two specialized hidden states, one for preserving historical context and one for tracking recency, thereby mitigating the short-range bias typical of linear-attention architectures. To further balance efficiency and accuracy under dynamic workload conditions, we introduce DSLA-Serve, an online adaptive distillation framework that progressively replaces Transformer layers with DSLA layers at inference time, guided by a sensitivity-based layer ordering. DSLA-Serve uses a chained fine-tuning strategy to ensure that each newly converted DSLA layer remains consistent with previously replaced layers, preserving the overall quality. Extensive evaluations on commonsense reasoning, long-context QA, and text summarization demonstrate that DSLA-Serve yields 2.3x faster inference than Llama2-7B and 3.0x faster than the hybrid Zamba-7B, while retaining comparable performance across downstream tasks. Our ablation studies show that DSLA’s dual states capture both global and local dependencies, addressing the historical-token underrepresentation seen in prior linear attention methods. Code is available at https://github.com/utnslab/DSLA-Serve.
大型语言模型(LLMS)擅长通过自我注意捕捉全球象征性依赖性,但在冗长投入下却面临着令人望而却步的计算和记忆成本。尽管次二次计算方法(如线性关注)可以降低这些费用,但由于过度强调最近的一些象征性,这些方法往往会降低准确性。在这项工作中,我们首先提议双州线性关注(DSLA),这是保持两种保存历史背景的专门隐蔽状态的新设计,也是追踪反应的新型设计,从而减轻线性关注结构典型的短距离偏差。为了在动态工作量条件下进一步平衡效率和准确性,我们引入DSLA-Serve,这是一个在线适应性蒸馏框架,在基于敏感性的层次下,以DSLA-Servoration层取代变异的变异层。 DSLA-Servervey 使用一个链式的微调战略,以确保每个新转换的DSLA层与先前更换的层保持一致,从而保持总体质量。对通用逻辑推理学、长直线性QA和文字平衡进行广泛的评价,我们在LSLSLA-ServiewServiewSDSLx公司在LA-SLA-ServiewSLSLxSBSerview中比SLSBSBSBServiax前的成绩上更快更快的成绩上更快。
Article 202
Title@2025-06-12 (4): Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
Title: Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs | Amulett: Neuausrichtung während der Testzeit für Personalisierte Präferenzanpassung von LLMs | 缩略图:在试验期间重新对准,以适应LLMM的个性化偏好 2502.19148v3 |
Authors (8): Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang
How to align large language models (LLMs) with user preferences from a static general dataset has been frequently studied. However, user preferences are usually personalized, changing, and diverse regarding culture, values, or time. This leads to the problem that the actual user preferences often do not coincide with those trained by the model developers in the practical use of LLMs. Since we cannot collect enough data and retrain for every demand, researching efficient real-time preference adaptation methods based on the backbone LLMs during test time is important. To this end, we introduce Amulet, a novel, training-free framework that formulates the decoding process of every token as a separate online learning problem with the guidance of simple user-provided prompts, thus enabling real-time optimization to satisfy users’ personalized preferences. To reduce the computational cost brought by this optimization process for each token, we additionally provide a closed-form solution for each iteration step of the optimization process, thereby reducing the computational time cost to a negligible level. The detailed experimental results demonstrate that Amulet can achieve significant performance improvements in rich settings with combinations of different LLMs, datasets, and user preferences, while maintaining acceptable computational efficiency.
经常研究如何使大型语言模型(LLMS)与静态一般数据集的用户偏好相一致,然而,用户偏好通常是个性化的、变化的,在文化、价值或时间方面是多种多样的,这导致一个问题,即实际用户偏好往往与模型开发者在实际使用LLMs方面所培训的偏好不相吻合。由于我们无法为每项需求收集足够的数据和再培训,因此在测试期间根据主干LLMS研究有效的实时偏好适应方法十分重要。为此,我们引入了Amulet,这是一个创新的、无培训的框架,它以简单的用户提供的提示为指导,将每个符号的解码进程作为一个单独的在线学习问题,从而使得实时优化能够满足用户的个人偏好。为了减少这种优化进程为每种符号带来的计算成本,我们为优化过程的每个迭代步骤提供额外的封闭式解决方案,从而将计算时间成本降至微不足道的水平。我们详细的实验结果表明,Amulett可以在丰富的环境中实现显著的性改进,同时将不同的LMS、数据设置和用户偏好,同时保持可接受的效率。
Article 203
Title@2025-06-12 (4): Air in Your Neighborhood: Fine-Grained AQI Forecasting Using Mobile Sensor Data
Title: Air in Your Neighborhood: Fine-Grained AQI Forecasting Using Mobile Sensor Data | Luft in Ihrer Nachbarschaft: Feinkörnige AQI-Prognose mit mobilen Sensordaten | 利用移动感应器数据进行精密的AQI预报 2506.10332v1 |
Authors (1): Aaryam Sharma
Air pollution has become a significant health risk in developing countries. While governments routinely publish air-quality index (AQI) data to track pollution, these values fail to capture the local reality, as sensors are often very sparse. In this paper, we address this gap by predicting AQI in 1 km^2 neighborhoods, using the AirDelhi dataset as an example. Using spatio-temporal GNNs, we surpass existing works by 71.654 MSE, a 79% reduction, even on unseen coordinates. We also discover new insights about AQI, such as the existence of strong repetitive short-term patterns and changing spatial relations. The code is available on GitHub.
在发展中国家,空气污染已成为严重的健康风险。虽然政府定期公布空气质量指数(AQI)数据以跟踪污染,但这些数值未能反映当地现实,因为传感器往往非常稀少。 在本文中,我们用AirDelhi数据集的例子预测AQI在1平方公里的相邻地区存在这一差距。使用时空GNNs,我们比现有工程减少71.654 MSE79%,即使是在看不见的坐标上。还发现了关于AQI的新见解,例如存在强烈重复的短期模式和不断变化的空间关系。该代码可在GitHub上查阅。
Article 204
Title@2025-06-12 (4): CompilerDream: Learning a Compiler World Model for General Code Optimization
Title: CompilerDream: Learning a Compiler World Model for General Code Optimization | CompilerDream: Lernen eines Compiler-Weltmodells für die allgemeine Code-Optimierung | 汇编者:学习编纂者世界通用守则优化模式 2404.16077v3 |
Authors (5): Chaoyi Deng, Jialong Wu, Ningya Feng, Jianmin Wang, Mingsheng Long
Effective code optimization in compilers is crucial for computer and software engineering. The success of these optimizations primarily depends on the selection and ordering of the optimization passes applied to the code. While most compilers rely on a fixed sequence of optimization passes, current methods to find the optimal sequence either employ impractically slow search algorithms or learning methods that struggle to generalize to code unseen during training. We introduce CompilerDream, a model-based reinforcement learning approach to general code optimization. CompilerDream comprises a compiler world model that accurately simulates the intrinsic properties of optimization passes and an agent trained on this model to produce effective optimization strategies. By training on a large-scale program dataset, CompilerDream is equipped to serve as a general code optimizer across various application scenarios and source-code languages. Our extensive experiments first highlight CompilerDream’s strong optimization capabilities for autotuning, where it leads the CompilerGym leaderboard. More importantly, the zero-shot generalization ability of the large-scale trained compiler world model and agent excels across diverse datasets, surpassing LLVM’s built-in optimizations and other state-of-the-art methods in both settings of value prediction and end-to-end code optimization.
编译器的有效代码优化对于计算机和软件工程至关重要。 这些优化的成功主要取决于对代码应用的优化通行证的选择和顺序。 虽然大多数编译器依赖固定的优化通行证序列, 但当前寻找最佳序列的方法要么采用不切实际的慢速搜索算法, 要么采用在培训期间难以普及到无法理解的代码的学习方法。 我们引入了基于模型的强化学习方法CapilrDream, 这是一种基于模型的强化学习方法, 用于一般代码优化。 编译器Dream 包括一个编译器世界模型, 准确模拟优化通行证的内在特性, 以及一个经过培训的代理人, 以制作有效的优化战略。 虽然大多数编译器依赖一个大型程序数据集的培训, 但通过对编译器的数据集进行培训, 将编译器Dream 的强大优化能力 首次凸显了 , 并引导了编译器Gym 领导板。 更重要的是, 大规模经过培训的编译器世界模型和代理器的零光化能力, 超越了各种数据集, 超越了LLVM的终极值和最优化预测设置中的其他状态。
Article 205
Title@2025-06-12 (4): Towards Scalable SOAP Note Generation: A Weakly Supervised Multimodal Framework
Title: Towards Scalable SOAP Note Generation: A Weakly Supervised Multimodal Framework | Auf dem Weg zu einer skalierbaren SOAP-Note Generation: Ein schwach überwachter multimodaler Rahmen | 向可缩放的 SOAP 笔记本生成方向迈进: 受微弱监督的多模式框架 2506.10328v1 |
Authors (3): Sadia Kamal, Tim Oates, Joy Wan
Skin carcinoma is the most prevalent form of cancer globally, accounting for over $8 billion in annual healthcare expenditures. In clinical settings, physicians document patient visits using detailed SOAP (Subjective, Objective, Assessment, and Plan) notes. However, manually generating these notes is labor-intensive and contributes to clinician burnout. In this work, we propose a weakly supervised multimodal framework to generate clinically structured SOAP notes from limited inputs, including lesion images and sparse clinical text. Our approach reduces reliance on manual annotations, enabling scalable, clinically grounded documentation while alleviating clinician burden and reducing the need for large annotated data. Our method achieves performance comparable to GPT-4o, Claude, and DeepSeek Janus Pro across key clinical relevance metrics. To evaluate clinical quality, we introduce two novel metrics, MedConceptEval and Clinical Coherence Score (CCS), which assess semantic alignment with expert medical concepts and input features, respectively.
皮肤癌是全球最常见的癌症形式,每年的保健支出超过80亿美元。在临床环境中,医生使用详细的SOAP(目标、目标、评估和计划)注释来记录病人的探视。然而,人工生成这些笔记是劳动密集型的,有助于临床烧伤。在这项工作中,我们提出一个监督不力的多式联运框架,从有限的投入(包括损耗图像和稀少的临床文本)中产生临床结构化的SOAP笔记。我们的方法减少了对手动说明的依赖,允许可缩放、临床基础文件,同时减轻临床负担和减少对大量附加说明数据的需求。我们的方法取得了与GPT-4o、Claude和DeepSeek Janus Pro相类似的业绩。为了评估临床质量,我们引入了两个新型的计量标准MedCeptionEval和临床一致性计数,分别评估与专家医疗概念和输入特征的语义调整。
Article 206
Title@2025-06-12 (4): A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon
Title: A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon | Ein Benchmark für die Verallgemeinerung unterschiedlicher Teamstrategien im wettbewerbsfähigen Pokémon | 普凯蒙竞争中全面推广不同团队战略的基准 2506.10326v1 |
Authors (5): Cameron Angliss, Jiaxun Cui, Jiaheng Hu, Arrasy Rahman, Peter Stone
Developing AI agents that can robustly adapt to dramatically different strategic landscapes without retraining is a central challenge for multi-agent learning. Pokémon Video Game Championships (VGC) is a domain with an extraordinarily large space of possible team configurations, approximately $10^{139}$, far larger than those of Dota or Starcraft. The highly discrete, combinatorial nature of team building in Pokémon VGC causes optimal strategies to shift dramatically depending on both the team being piloted and the opponent’s team, making generalization uniquely challenging. To advance research on this problem, we introduce VGC-Bench: a benchmark that provides critical infrastructure, standardizes evaluation protocols, and supplies human-play datasets and a range of baselines, from large-language-model agents and behavior cloning to reinforcement learning and empirical game-theoretic methods such as self-play, fictitious play, and double oracle. In the restricted setting where an agent is trained and evaluated on a single-team configuration, our methods are able to win against a professional VGC competitor. We extensively evaluate all baseline methods over progressively larger team sets and find that even the best-performing algorithm in the single-team setting struggles to scale up as the team set grows. Thus, policy generalization across diverse team strategies remains an open challenge for the community. Our code is open sourced at https://github.com/cameronangliss/VGC-Bench.
在不再培训的情况下,发展能够强有力地适应完全不同的战略景观的AI代理机构是多试探学习的一个中心挑战。 Pok\ emamon Vegle General Campales(VGC)是一个非常庞大的领域,拥有大约10139美元(比Dota或Starcraft要大得多)的可能团队配置空间,远大于Dota或Starcraft。在Pok'emon VGC的团队建设中,高度离散、组合性强的团队建设导致最佳战略的急剧转变,取决于正在试点的团队和对手团队,使普遍性具有独特的挑战性。为了推进对这一问题的研究,我们引入VGC-Bench:一个提供关键基础设施的基准,将评估协议标准化,并提供人类游戏数据集和一系列基线 — 从大型语言模范代理和行为克隆到强化学习和实验性游戏-理论方法,如自玩游戏、虚构游戏和双形或变形。在限制环境中培训和评价一个代理机构,我们的方法能够战胜专业VGC竞争者。我们广泛评估VGC-Bench:我们广泛评估了所有基线方法,在团队规模上超越了整个团队的团队规模上,在团队中不断提升了整个团队的游戏中,在团队中不断演进进进进进进进进进进进进进进进进进进进进进进进式的系统。
Article 207
Title@2025-06-12 (4): SWDL: Stratum-Wise Difference Learning with Deep Laplacian Pyramid for Semi-Supervised 3D Intracranial Hemorrhage Segmentation
Title: SWDL: Stratum-Wise Difference Learning with Deep Laplacian Pyramid for Semi-Supervised 3D Intracranial Hemorrhage Segmentation | SWDL: Stratum-Wise Difference Learning mit tiefer Laplacian Pyramide für halbüberwachte 3D-Intrakranialblutungssegmentierung | SWDL: 与深拉拉拉比西亚金字塔为半超3D内出血分块进行平流和深拉拉拉西亚金字塔差异学习 2506.10325v1 |
Authors (6): Cheng Wang, Siqi Chen, Donghua Mi, Yang Chen, Yudong Zhang, Yinsheng Li
Recent advances in medical imaging have established deep learning-based segmentation as the predominant approach, though it typically requires large amounts of manually annotated data. However, obtaining annotations for intracranial hemorrhage (ICH) remains particularly challenging due to the tedious and costly labeling process. Semi-supervised learning (SSL) has emerged as a promising solution to address the scarcity of labeled data, especially in volumetric medical image segmentation. Unlike conventional SSL methods that primarily focus on high-confidence pseudo-labels or consistency regularization, we propose SWDL-Net, a novel SSL framework that exploits the complementary advantages of Laplacian pyramid and deep convolutional upsampling. The Laplacian pyramid excels at edge sharpening, while deep convolutions enhance detail precision through flexible feature mapping. Our framework achieves superior segmentation of lesion details and boundaries through a difference learning mechanism that effectively integrates these complementary approaches. Extensive experiments on a 271-case ICH dataset and public benchmarks demonstrate that SWDL-Net outperforms current state-of-the-art methods in scenarios with only 2% labeled data. Additional evaluations on the publicly available Brain Hemorrhage Segmentation Dataset (BHSD) with 5% labeled data further confirm the superiority of our approach. Code and data have been released at https://github.com/SIAT-CT-LAB/SWDL.
医学成像(SSL)的近期进展将深层次的基于学习的分割作为主要方法,尽管通常需要大量人工附加说明的数据。然而,由于标签过程既乏味又昂贵,因此获得室内出血(ICH)的说明仍然特别困难。半监督的学习(SSL)已成为解决标签数据稀缺问题的一个大有希望的解决办法,特别是在体积图解析方面。与主要侧重于高信任伪标签或一致性正规化的常规SSL方法不同,我们提议SWDL-Net(SWDL-Net)是一个新的SSL框架,它利用了拉普拉西金字塔和深刻的革命性上层取样的互补优势。 Laplacian金字塔在精锐化方面表现突出,而深层的演进化则通过灵活的地貌图绘制提高了细节的精确度。我们的框架通过一种差异学习机制实现腐蚀细节和界限的更优分化,有效地融合了这些互补的方法。关于271个案件ICH数据集和公共基准的广泛实验表明,SWD-AB网络在假设中超越了目前状态的状态方法,在SBRial-ral-labalalalation方法中,只确认了我们已公开标定的MS/MISD数据格式的数据。
Article 208
Title@2025-06-12 (4): DiffUMI: Training-Free Universal Model Inversion via Unconditional Diffusion for Face Recognition
Title: DiffUMI: Training-Free Universal Model Inversion via Unconditional Diffusion for Face Recognition | DiffUMI: Schulungsfreie universelle Modellinversion über Bedingungslose Diffusion für die Gesichtserkennung | DiffUMI:通过无条件传播面相承认,无培训通用模式翻版 2504.18015v2 |
Authors (4): Hanrui Wang, Shuo Wang, Chun-Shien Lu, Isao Echizen
Face recognition technology presents serious privacy risks due to its reliance on sensitive and immutable biometric data. To address these concerns, such systems typically convert raw facial images into embeddings, which are traditionally viewed as privacy-preserving. However, model inversion attacks challenge this assumption by reconstructing private facial images from embeddings, highlighting a critical vulnerability in face recognition systems. Most existing inversion methods require training a separate generator for each target model, making them computationally intensive. In this work, we introduce DiffUMI, a diffusion-based universal model inversion attack that requires no additional training. DiffUMI is the first approach to successfully leverage unconditional face generation without relying on model-specific generators. It surpasses state-of-the-art attacks by 15.5% and 9.82% in success rate on standard and privacy-preserving face recognition systems, respectively. Furthermore, we propose a novel use of out-of-domain detection (OODD), demonstrating for the first time that model inversion can differentiate between facial and non-facial embeddings using only the embedding space.
面部识别技术由于依赖敏感和不可改变的生物鉴别数据而带来严重的隐私风险。 为解决这些关切,这类系统通常将原始面部图像转换成嵌入器,传统上被视为隐私保护。 然而,模型反向攻击通过将私人面部图像从嵌入器中重建出来,在面部识别系统中突出一个关键弱点,从而挑战这一假设。 大多数现有的反向方法要求为每个目标模型培训一个单独的生成器,使其在计算上集聚。 在这项工作中,我们引入了DiffUMI, 一种基于扩散的通用反向模型攻击,不需要额外培训。 DiffUMI是成功利用无条件面部生成而无需依赖模型特定生成器的第一个方法。 它在标准上和隐私面部识别系统中分别超过最先进的攻击率15.5%和9.82%。 此外,我们提议新使用外部检测(OODDD), 首次展示了模型能够仅使用嵌入空间区分面部和非面部嵌入器。
Article 209
Title@2025-06-12 (4): DeePoly: A High-Order Accuracy Scientific Machine Learning Framework for Function Approximation and Solving PDE
Title: DeePoly: A High-Order Accuracy Scientific Machine Learning Framework for Function Approximation and Solving PDE | DeePoly: Ein High-Order Accuracy Scientific Machine Learning Framework für die Funktionsannäherung und Lösung von PDE | Deepoly: 功能近似和溶解PDE高分精确度科学机器学习框架 2506.04613v2 |
Authors (2): Li Liu, Heng Yong
Recently, machine learning methods have gained significant traction in scientific computing, particularly for solving Partial Differential Equations (PDEs). However, methods based on deep neural networks (DNNs) often lack convergence guarantees and computational efficiency compared to traditional numerical schemes. This work introduces DeePoly, a novel framework that transforms the solution paradigm from pure non-convex parameter optimization to a two-stage approach: first employing a DNN to capture complex global features, followed by linear space optimization with combined DNN-extracted features (Scoper) and polynomial basis functions (Sniper). This strategic combination leverages the complementary strengths of both methods – DNNs excel at approximating complex global features (i.e., high-gradient features) and stabilize the polynomial approximation while polynomial bases provide high-precision local corrections with convergence guarantees. Theoretical analysis and numerical experiments demonstrate that this approach significantly enhances both high-order accuracy and efficiency across diverse problem types while maintaining mesh-free and scheme-free properties. This paper also serves as a theoretical exposition for the open-source project DeePoly.
最近,机器学习方法在科学计算中获得了显著的牵引力,特别是在解决部分差异(PDE)方面。然而,基于深神经网络(DNNS)的方法往往缺乏趋同保障和与传统数字方法相比的计算效率。这项工作介绍了Deepolly,这是一个将解决方案范式从纯非Convex参数优化转变为两阶段方法的新框架:首先使用DNN来捕捉复杂的全球特征,然后是线性空间优化,同时结合DNN抽取的特征(Scoper)和多元基函数(Sniper),这种战略组合利用了两种方法的互补优势 – – DNNS在相近复杂的全球特征(即高位特征)方面的优势,稳定多元近距离,而多元基础则提供高精度的地方校正,同时提供趋同保证。理论分析和数字实验表明,这一方法极大地提高了不同类型问题高端的准确度和效率,同时保持无网格和无计划特性。这份文件还作为开放源项目Deeppoly的理论演示。
Article 210
Title@2025-06-12 (4): Policy-Based Trajectory Clustering in Offline Reinforcement Learning
Title: Policy-Based Trajectory Clustering in Offline Reinforcement Learning | Policy-based Trajektorie Clustering im Offline-Verstärkungslernen | 以政策为基础的离线强化学习轨迹集群 2506.09202v2 |
Authors (3): Hao Hu, Xinqi Wang, Simon Shaolei Du
We introduce a novel task of clustering trajectories from offline reinforcement learning (RL) datasets, where each cluster center represents the policy that generated its trajectories. By leveraging the connection between the KL-divergence of offline trajectory distributions and a mixture of policy-induced distributions, we formulate a natural clustering objective. To solve this, we propose Policy-Guided K-means (PG-Kmeans) and Centroid-Attracted Autoencoder (CAAE). PG-Kmeans iteratively trains behavior cloning (BC) policies and assigns trajectories based on policy generation probabilities, while CAAE resembles the VQ-VAE framework by guiding the latent representations of trajectories toward the vicinity of specific codebook entries to achieve clustering. Theoretically, we prove the finite-step convergence of PG-Kmeans and identify a key challenge in offline trajectory clustering: the inherent ambiguity of optimal solutions due to policy-induced conflicts, which can result in multiple equally valid but structurally distinct clusterings. Experimentally, we validate our methods on the widely used D4RL dataset and custom GridWorld environments. Our results show that both PG-Kmeans and CAAE effectively partition trajectories into meaningful clusters. They offer a promising framework for policy-based trajectory clustering, with broad applications in offline RL and beyond.
为了解决这个问题,我们提出了将离线强化学习(RL)数据集的轨迹分组的新任务,即每个集群中心代表产生其轨迹的政策。通过利用离线轨迹分布的KL光度与政策驱动分布的混合分布之间的连接,我们制定了自然集群目标。为了解决这个问题,我们提出了政策指南K手段(PG-Kpoles)和Centrid-Atrate Autoencoder(CAAE),PG-Kpoles反复培训行为克隆(BC)政策,并根据政策生成概率分配轨迹。CAAE类似于VQ-VAE框架,指导轨道在特定代码集条目附近的潜在表达方式,以达到集群。理论上,我们证明PG-Kpoles(PG-Kpoles)和Coide-Aloadroadroadal 集群的有限阶段融合,并找出一个关键挑战:政策引发的冲突导致多重的、但结构上不同的组合,而结构上则可以使R-VQ-VA的应用程序与VV-VA(R)框架相似,我们有效地验证了我们G-G Greal-G Greal-G Groal-I-G)在数据库提供的结果。
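The alternating loop described for PG-Kmeans (fit one behavior-cloning policy per cluster, then reassign each trajectory to the policy most likely to have generated it) can be sketched in a tabular setting. This is an illustrative sketch only: the count-based policies with Laplace smoothing, the function names, and the caller-supplied initial assignment are assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict


def fit_policy(trajectories, n_actions, smoothing=1.0):
    """Tabular behavior cloning: smoothed action frequencies per state."""
    counts = defaultdict(lambda: [smoothing] * n_actions)
    for trajectory in trajectories:
        for state, action in trajectory:
            counts[state][action] += 1.0
    return {s: [c / sum(cs) for c in cs] for s, cs in counts.items()}


def log_likelihood(policy, trajectory, n_actions):
    """Log-probability that `policy` generates the actions in `trajectory`."""
    uniform = 1.0 / n_actions  # fallback for states the policy never saw
    return sum(
        math.log(policy[s][a] if s in policy else uniform) for s, a in trajectory
    )


def pg_kmeans(trajectories, n_clusters, n_actions, init_assignment, max_iters=50):
    """Alternate between policy fitting and likelihood-based reassignment."""
    assignment = list(init_assignment)
    policies = []
    for _ in range(max_iters):
        # Step 1: train one cloned policy on each cluster's trajectories.
        policies = [
            fit_policy(
                [t for t, c in zip(trajectories, assignment) if c == k], n_actions
            )
            for k in range(n_clusters)
        ]
        # Step 2: move each trajectory to its most likely generating policy.
        new_assignment = [
            max(range(n_clusters),
                key=lambda k: log_likelihood(policies[k], t, n_actions))
            for t in trajectories
        ]
        if new_assignment == assignment:
            break  # assignments are stable
        assignment = new_assignment
    return assignment, policies
```

With a perfectly symmetric initialization every cluster policy is identical and ties decide the outcome, a small instance of the ambiguity among equally valid clusterings that the abstract points to.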
Article 211
Title@2025-06-12 (4): PerfTracker: Online Performance Troubleshooting for Large-scale Model Training in Production
Title: PerfTracker: Online Performance Troubleshooting for Large-scale Model Training in Production | PerfTracker: Online-Performance-Fehlersuche für großformatige Modellschulungen in der Produktion | PerfTracker:大规模生产示范培训在线绩效问题解决 2506.08528v3 |
Authors (13): Yu Guan, Zhiyu Yin, Haoyu Chen, Sheng Cheng, Chaojie Yang, Kun Qian, Tianyin Xu, Yang Zhang, Hanyu Zhao, Yong Li, Wei Lin, Dennis Cai, Ennan Zhai
Troubleshooting performance problems of large model training (LMT) is immensely challenging, due to unprecedented scales of modern GPU clusters, the complexity of software-hardware interactions, and the data intensity of the training process. Existing troubleshooting approaches designed for traditional distributed systems or datacenter networks fall short and can hardly apply to real-world training systems. In this paper, we present PerfTracker, the first online troubleshooting system utilizing fine-grained profiling, to diagnose performance issues of large-scale model training in production. PerfTracker can diagnose performance issues rooted in both hardware (e.g., GPUs and their interconnects) and software (e.g., Python functions and GPU operations). It scales to LMT on modern GPU clusters. PerfTracker effectively summarizes runtime behavior patterns of fine-grained LMT functions via online profiling, and leverages differential observability to localize the root cause with minimal production impact. PerfTracker has been deployed as a production service for large-scale GPU clusters of O(10,000) GPUs (product homepage https://help.aliyun.com/zh/pai/user-guide/perftracker-online-performance-analysis-diagnostic-tool). It has been used to diagnose a variety of difficult performance issues.
大型模型培训(LMT)的故障排除问题非常艰巨,原因是现代GPU集群规模空前,软件硬件互动的复杂性,以及培训过程的数据强度。为传统分布式系统或数据中心网络设计的现有故障排除方法不足,难以适用于现实世界培训系统。本文介绍PerfTracker,这是第一个使用微小分析的在线故障排除系统,用于诊断生产中大规模模型培训的绩效问题。 PerfTracker可以诊断硬件(例如,GPUs及其内部连接)和软件(例如,Python功能和GPU操作)两方面的绩效问题。在现代GPU组或数据中心中,现有故障排除方法不足,难以适用于现实世界培训系统。在本文中,我们介绍PerfTracker,这是第一个使用微小分析模型分析的在线故障排除系统,用来分析大规模GPUG(例如,10,000)的硬件组合及其内部连接)和软件(例如,Python 功能-stall-stall) GPUS-stall a has hasimal-hillagemental-deviewal ASyalmental a.
Article 212
Title@2025-06-12 (4): Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting
Title: Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting | Aufwärtsgewichtung einfache Proben in Feinabstimmung Mitigates Vergessen | 高重量加重量易加重量样本 2502.02797v2 |
Authors (5): Sunny Sanyal, Hayden Prairie, Rudrajit Das, Ali Kavis, Sujay Sanghavi
Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities, a phenomenon known as “catastrophic forgetting”. This is especially an issue when one does not have access to the data and recipe used to develop the pre-trained model. Under this constraint, most existing methods for mitigating forgetting are inapplicable. To address this challenge, we propose a sample weighting scheme for the fine-tuning data solely based on the pre-trained model’s losses. Specifically, we upweight the easy samples, on which the pre-trained model’s loss is low, and downweight the hard ones, to limit the drift from the pre-trained model. Our approach is orthogonal and yet complementary to existing methods; while such methods mostly operate on parameter or gradient space, we concentrate on the sample space. We theoretically analyze the impact of fine-tuning with our method in a linear setting, showing that it stalls learning in a certain subspace, which inhibits overfitting to the target task. We empirically demonstrate the efficacy of our method on both language and vision tasks. As an example, when fine-tuning Gemma 2 2B on MetaMathQA, our method results in only a $0.8\%$ drop in accuracy on GSM8K (another math dataset) compared to standard fine-tuning, while preserving $5.4\%$ more accuracy on the pre-training datasets. Our code is publicly available at https://github.com/sanyalsunny111/FLOW_finetuning.
在下游任务上微调一个经过预先训练的模型往往会降低其原有能力,即所谓的“灾难性的忘记”现象。当一个人无法获取用于开发预训练模式的数据和食谱时,这尤其是一个问题。在这个制约因素下,大多数现有的减轻忘记的方法是不适用的。为了应对这一挑战,我们建议只根据经过训练的模型的损失对微调数据进行抽样加权。具体地说,我们将经过训练的模型损失程度较低的简易样本加起来,反之,以限制从预训练模式中流出。我们的方法是垂直的,但又与现有方法相辅相成;虽然这些方法大多在参数或梯度空间上运作,但我们集中关注抽样空间。我们从理论上分析在线性环境下微调方法的影响,表明它使微调方法的学习停滞在某些子空间里,这抑制了目标任务。我们从经验上展示了我们方法在语言和视觉任务两方面的功效。我们的方法在MetAQA 2B 上微调GMath-QA 2,我们的方法只是将G-Q-Q-roup a dal deal dal disal disal disal setty on setal setal dal sal sal setal sal setal set laps.
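The weighting idea above can be sketched as follows. The abstract only says that weights depend on the pre-trained model's per-sample losses and favor low-loss samples; the exponential form, the median-loss temperature, and both function names are assumptions made for illustration.

```python
import math
import statistics


def easy_sample_weights(pretrained_losses, temperature=None):
    """Normalized weights that are larger where the pre-trained loss is lower.

    Assumed form: w_i proportional to exp(-loss_i / temperature).
    """
    if temperature is None:
        # Assumption: default the temperature to the median pre-trained loss.
        temperature = statistics.median(pretrained_losses)
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    raw = [math.exp(-loss / temperature) for loss in pretrained_losses]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_finetuning_loss(current_losses, weights):
    """Weighted average of the current per-sample fine-tuning losses."""
    return sum(w * loss for w, loss in zip(weights, current_losses))
```

The weights are computed once from the frozen pre-trained model and then held fixed during fine-tuning, so the scheme needs no access to the pre-training data, matching the constraint stated in the abstract.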
Article 213
Title@2025-06-12 (4): PyLO: Towards Accessible Learned Optimizers in PyTorch
Title: PyLO: Towards Accessible Learned Optimizers in PyTorch | PyLO: Auf dem Weg zu zugänglich gelernten Optimierern in PyTorch | PyLO:在皮托尔奇努力实现无障碍、可获取的优化 2506.10315v1 |
Authors (6): Paul Janson, Benjamin Therien, Quentin Anthony, Xiaolong Huang, Abhinav Moudgil, Eugene Belilovsky
Learned optimizers have been an active research topic over the past decade, with increasing progress toward practical, general-purpose optimizers that can serve as drop-in replacements for widely used methods like Adam. However, recent advances, such as VeLO, which was meta-trained for 4000 TPU-months, remain largely inaccessible to the broader community, in part due to their reliance on JAX and the absence of user-friendly packages for applying the optimizers after meta-training. To address this gap, we introduce PyLO, a PyTorch-based library that brings learned optimizers to the broader machine learning community through familiar, widely adopted workflows. Unlike prior work focused on synthetic or convex tasks, our emphasis is on applying learned optimization to real-world large-scale pre-training tasks. Our release includes a CUDA-accelerated version of the small_fc_lopt learned optimizer architecture from Metz et al. (2022a), delivering substantial speedups: from 39.36 to 205.59 samples/sec throughput for training ViT B/16 with batch size 32. PyLO also allows us to easily combine learned optimizers with existing optimization tools such as learning rate schedules and weight decay. When doing so, we find that learned optimizers can substantially benefit. Our code is available at https://github.com/Belilovsky-Lab/pylo.
过去十年来,学习型优化器一直是活跃的研究课题,并且在实用的通用优化器方面不断取得进展,这类优化器可以直接替代 Adam 等广泛使用的方法。然而,最近的进展(例如经过 4000 个 TPU 月元训练的 VeLO)对更广泛的社区来说基本上仍难以使用,部分原因是它们依赖 JAX,而且缺少在元训练之后应用这些优化器的易用软件包。为了弥补这一差距,我们引入了 PyLO,这是一个基于 PyTorch 的库,通过人们熟悉且广泛采用的工作流程,把学习型优化器带给更广泛的机器学习社区。与以往侧重于合成任务或凸任务的工作不同,我们的重点是将学习型优化应用于现实世界的大规模预训练任务。我们的发布包含 (Metz et al., 2022a) 中 small_fc_lopt 学习型优化器架构的 CUDA 加速版本,带来了显著的加速:在批大小为 32 的情况下训练 ViT B/16 时,吞吐量从 39.36 提升到 205.59 样本/秒。PyLO 还让我们能够轻松地将学习型优化器与学习率调度、权重衰减等现有优化工具结合起来;这样做时,我们发现学习型优化器可以显著受益。我们的代码见 https://github.com/Belilovsky-Lab/pylo
Article 214
Title@2025-06-12 (4): Detecting Sockpuppetry on Wikipedia Using Meta-Learning
Title: Detecting Sockpuppetry on Wikipedia Using Meta-Learning | Sockepuppetry auf Wikipedia erkennen Mit Meta-Learning | 在维基百科上用元学习探测袜子布料 2506.10314v1 |
Authors (2): Luc Raszewski, Christine De Kock
Malicious sockpuppet detection on Wikipedia is critical to preserving access to reliable information on the internet and preventing the spread of disinformation. Prior machine learning approaches rely on stylistic and meta-data features, but do not prioritise adaptability to author-specific behaviours. As a result, they struggle to effectively model the behaviour of specific sockpuppet-groups, especially when text data is limited. To address this, we propose the application of meta-learning, a machine learning technique designed to improve performance in data-scarce settings by training models across multiple tasks. Meta-learning optimises a model for rapid adaptation to the writing style of a new sockpuppet-group. Our results show that meta-learning significantly enhances the precision of predictions compared to pre-trained models, marking an advancement in combating sockpuppetry on open editing platforms. We release a new dataset of sockpuppet investigations to foster future research in both sockpuppetry and meta-learning fields.
维基百科上的恶性袜子探测对保存在互联网上获取可靠信息的机会和防止虚假信息传播至关重要。 先前的机器学习方法依赖于文体和元数据特征,但并不优先考虑作者特定行为。 结果,他们努力有效地模拟特定袜子布偶群体的行为,特别是在文本数据有限的情况下。 为了解决这个问题,我们提议应用元学习,即一种机器学习技术,通过多种任务的培训模型来改善数据卡片环境中的性能。 元学习选修模式是快速适应一个新的袜子布偶团体的写作风格的模式。 我们的结果表明,与预先培训的模式相比,元学习极大地提高了预测的准确性,标志着在打击开放编辑平台上的袜子布偶方面有所进步。 我们发布了一套新数据集,由袜子调查提供,以促进今后在袜子和元学习领域的研究。
Article 215
Title@2025-06-12 (4): Collaborative Min-Max Regret in Grouped Multi-Armed Bandits
Title: Collaborative Min-Max Regret in Grouped Multi-Armed Bandits | Kollaborativer Min-Max-Streit in gruppierten Multi-Armed Bandits | 联合多武装强盗团体中负负负负 2506.10313v1 |
Authors (2): Moïse Blanchard, Vineet Goyal
We study the impact of sharing exploration in multi-armed bandits in a grouped setting where a set of groups have overlapping feasible action sets [Baek and Farias ‘24]. In this grouped bandit setting, groups share reward observations, and the objective is to minimize the collaborative regret, defined as the maximum regret across groups. This naturally captures applications in which one aims to balance the exploration burden between groups or populations – it is known that standard algorithms can lead to significantly imbalanced exploration cost between groups. We address this problem by introducing an algorithm Col-UCB that dynamically coordinates exploration across groups. We show that Col-UCB achieves both optimal minimax and instance-dependent collaborative regret up to logarithmic factors. These bounds are adaptive to the structure of shared action sets between groups, providing insights into when collaboration yields significant benefits over each group learning their best action independently.
我们研究在一个组合环境中在多武装匪徒中分享勘探活动的影响,在这种组合环境中,一组团体[Baek和Faras ‘24]有重叠的可行行动组合。在这个组合的匪徒环境中,小组分享奖励意见,目标是尽量减少协作的遗憾,即各团体之间的最大遗憾。这自然地捕捉了旨在平衡各团体或人口之间勘探负担的各种应用 – – 众所周知,标准算法可能导致各团体之间勘探费用严重失衡。我们通过采用动态协调各团体间勘探活动的Col-CUB算法来解决这一问题。我们证明,Col-UB既实现了最佳小型合作,又实现了以实例为基础的协作遗憾,最终达到了对数因素。这些界限适应了各团体之间共同行动组合的结构,提供了对协作在为每个团体独立学习其最佳行动带来重大好处时的洞察力。
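To make the objective in this abstract concrete, the sketch below simulates a grouped bandit in which groups share reward observations and reports the collaborative regret as the maximum regret across groups. It runs a plain UCB index per group, not the paper's Col-UCB, and the function name `grouped_ucb`, the Bernoulli rewards, and the example arm means are all assumptions made for illustration.

```python
import math
import random


def grouped_ucb(means, groups, horizon, seed=0):
    """Shared-observation UCB over groups with overlapping action sets.

    means  : true mean reward of each arm (Bernoulli rewards, an assumption)
    groups : list of feasible arm lists, one per group
    Returns (per-group pseudo-regret, collaborative regret = max over groups).
    """
    rng = random.Random(seed)
    pulls = [0] * len(means)      # pull counts, shared by all groups
    totals = [0.0] * len(means)   # summed rewards, shared by all groups
    regret = [0.0] * len(groups)

    for t in range(1, horizon + 1):
        for g, arms in enumerate(groups):
            def index(a):
                if pulls[a] == 0:
                    return float("inf")  # pull every arm at least once
                bonus = math.sqrt(2 * math.log(t) / pulls[a])
                return totals[a] / pulls[a] + bonus

            arm = max(arms, key=index)
            reward = 1.0 if rng.random() < means[arm] else 0.0
            pulls[arm] += 1
            totals[arm] += reward
            # Regret is measured against the best arm this group may play.
            regret[g] += max(means[a] for a in arms) - means[arm]

    return regret, max(regret)


regret, collaborative = grouped_ucb([0.2, 0.5, 0.8], [[0, 1], [1, 2]], 200)
```

Here arm 1 is feasible for both groups, so every pull of it by one group also tightens the other group's estimate. Balancing who pays for that exploration is the problem the abstract describes.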
Article 216
Title@2025-06-12 (4): Agnostic Smoothed Online Learning without Knowledge of the Base Measure
Title: Agnostic Smoothed Online Learning without Knowledge of the Base Measure | Agnostisches geglättetes Online-Lernen ohne Kenntnis des Basismaßes | 缺乏基础措施知识的Agnostic平滑在线学习 2410.05124v3 |
Authors (1): Moïse Blanchard
Classical results in statistical learning typically consider two extreme data-generating models: i.i.d. instances from an unknown distribution, or fully adversarial instances, often much more challenging statistically. To bridge the gap between these models, recent work introduced the smoothed framework, in which at each iteration an adversary generates instances from a distribution constrained to have density bounded by $\sigma^{-1}$ compared to some fixed base measure $\mu$. This framework interpolates between the i.i.d. and adversarial cases, depending on the value of $\sigma$. For the classical online prediction problem, most prior results in smoothed online learning rely on the arguably strong assumption that the base measure $\mu$ is known to the learner, contrasting with standard settings in the PAC learning or consistency literature. We consider the general agnostic problem in which the base measure is unknown and values are arbitrary. Along this direction, Block et al. showed that empirical risk minimization has sublinear regret under the well-specified assumption. We propose an algorithm R-Cover based on recursive coverings which is the first to guarantee sublinear regret for agnostic smoothed online learning without prior knowledge of $\mu$. For classification, we prove that R-Cover has adaptive regret $\tilde O(\sqrt{dT/\sigma})$ for function classes with VC dimension $d$, which is optimal up to logarithmic factors. For regression, we establish that R-Cover has sublinear oblivious regret for function classes with polynomial fat-shattering dimension growth.
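统计学习的经典结果通常考虑两种极端的数据生成模型:来自未知分布的 i.i.d. 实例,或完全对抗性的实例,后者在统计上往往更具挑战性。为了弥合这两种模型之间的差距,最近的工作引入了平滑框架:在每次迭代中,对手从某个分布中生成实例,该分布相对于某个固定的基测度 $\mu$ 的密度以 $\sigma^{-1}$ 为界。根据 $\sigma$ 的取值,该框架在 i.i.d. 情形与对抗情形之间进行插值。对于经典的在线预测问题,平滑在线学习的大多数已有结果都依赖于一个可以说相当强的假设,即学习者已知基测度 $\mu$,这与 PAC 学习或一致性文献中的标准设定形成对比。我们考虑一般的不可知问题,其中基测度未知,取值是任意的。沿着这一方向,Block 等人证明了经验风险最小化在设定正确(well-specified)的假设下具有次线性遗憾。我们提出了基于递归覆盖的算法 R-Cover,它是第一个在事先不知道 $\mu$ 的情况下为不可知平滑在线学习保证次线性遗憾的算法。对于分类,我们证明 R-Cover 对 VC 维为 $d$ 的函数类具有自适应遗憾 $\tilde O(\sqrt{dT/\sigma})$,这在对数因子范围内是最优的。对于回归,我们证明 R-Cover 对具有多项式 fat-shattering 维数增长的函数类具有次线性的非自适应(oblivious)遗憾。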
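For reference, the smoothness constraint described in this abstract can be written out as below. The symbols $\mu_t$ (the adversary's distribution at round $t$), $\mathcal{X}$ (the instance space) and $T$ (the horizon) are notation assumed here for illustration, and $\mu$ is taken to be a probability measure.

```latex
\[
  \mu_t \ll \mu
  \quad\text{and}\quad
  \frac{d\mu_t}{d\mu}(x) \le \frac{1}{\sigma}
  \quad\text{for $\mu$-almost every } x \in \mathcal{X},\quad t = 1, \dots, T .
\]
% sigma = 1 forces mu_t = mu at every round (the i.i.d. case);
% as sigma -> 0 the constraint vanishes and mu_t may concentrate
% on arbitrary instances (the fully adversarial case).
```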
Article 217
Title@2025-06-12 (4): Self-learning signal classifier for decameter coherent scatter radars
Title: Self-learning signal classifier for decameter coherent scatter radars | Selbstlernender Signalklassifikator für Dekameter-Konsistenzstreuradar | 10度连续散射雷达自学信号分类器 2506.10305v1 |
Authors (2): Oleg Berngardt, Ivan Lavygin
The paper presents a method for automatically constructing a classifier for processed data obtained by decameter coherent scatter radars. The method is based only on the radar data obtained, the results of automatic modeling of radio wave propagation in the ionosphere, and mathematical criteria for estimating the quality of the models. The final classifier is the model trained on data obtained by 12 radars of the SuperDARN and SECIRA networks over two years for each radar. The model has 2669 coefficients. For the classification, the model uses both the calculated parameters of radio wave propagation in the model ionosphere and the parameters directly measured by the radar. Radio wave elevation measurements at each radar were calibrated using signals scattered by meteor trails. The analysis showed that the optimal number of classes in the data is 37, of which 25 are frequently observed. From these, the analysis made it possible to choose 14 classes that are confidently separated in other variants of model training. A preliminary interpretation of 10 of them was carried out. The dynamics of observation of the various classes and their dependence on the geographical latitude of the radars at different levels of solar and geomagnetic activity are presented and shown not to contradict known physical mechanisms. The analysis showed that the most important parameters for identifying the classes are the shape of the signal ray-tracing trajectory in its second half, the ray-traced scattering height, and the Doppler velocity measured by the radar.
论文展示了一种方法,用于自动构建一个分类器,用于对经处理的数据进行分解,该方法仅以获得的雷达数据、电离层无线电波传播自动建模结果和估计模型质量的数学标准为基础;最后的分类器是经12个雷达(SuperDARN和SECIRA网络的12个雷达)在两年中为每个雷达获得的数据培训的模型;模型系数数为2669;对于分类,模型使用了模型电离层无线电波传播的计算参数和雷达直接测量的参数;对每个雷达的无线电波高度测量进行了校准,使用了流星踪分散信号;分析表明,数据中最优的等级数为37个,其中25个经常观测;分析使得有可能从中选择14个类别,这些类别在其他模型培训变式中自信地分开;对其中10个模型进行了初步解释;对不同等级的观测动态及其对不同级别雷达地理纬度的依赖度的动态和由雷达直接测量的参数;对每个雷达进行校准的频率测量,每个雷达的测量是使用流道分散的信号信号波散的信号;分析显示,其半级与已知的轨道分析显示,其物理结构与已知的轨道与已知的轨道对比是相反。
Article 218
Title@2025-06-12 (4): The Alignment Trap: Complexity Barriers
Title: The Alignment Trap: Complexity Barriers | Die Alignment-Falle: Komplexitätsbarrieren | 协调陷阱:复杂障碍 2506.10304v1 |
Authors (1): Jasper Yao
We establish fundamental computational complexity barriers to verifying AI safety as system capabilities scale. Our main results show that for AI systems with expressiveness EXP$(m)$ above a critical threshold $\tau$, safety verification requires exponential time and is coNP-complete. We formalize the Capability-Risk Scaling (CRS) dynamic, which demonstrates how increasing AI capability drives societal safety requirements toward perfection, creating an inescapable tension with verification complexity. Through four core theorems, we prove that (1) verification complexity grows exponentially with system expressiveness, (2) safe policies comprise at most a $2^{-2^m}$ fraction of the policy space, (3) no finite set of alignment techniques can provide universal coverage, and (4) robust safety properties form measure-zero sets for neural networks. These results characterize an “intractability gap” where practical safety requirements fall within the region of computational intractability. We conclude by presenting a strategic trilemma: AI development must either constrain system complexity to maintain verifiable safety, accept unverifiable risks while scaling capabilities, or develop fundamentally new safety paradigms beyond verification. Our work provides the first systematic complexity-theoretic analysis of AI alignment and establishes rigorous bounds that any safety approach must confront. A formal verification of the core theorems in Lean4 is currently in progress.
我们通过系统能力规模,为核查AI安全设置了基本的计算复杂性障碍。我们的主要结果表明,对于清晰度 EXP$(m)$(m)超过临界阈值的AI系统,安全核查需要指数化的时间,并且已经完全完成。我们正式确定了CRS(CRS)动态,这表明了增强AI能力如何推动社会安全要求达到完美,造成无法避免的紧张与核查复杂性。我们通过四个核心理论,证明(1) 核查复杂性随着系统的清晰度而急剧增长,(2) 安全政策最多包含政策空间的2-2m}美元份额,(3) 没有一套有限的校准技术能够提供普遍覆盖,(4) 神经网络的稳健的安全性度为措施-零套。这些结果具有“可选性差距”的特点,因为实际的安全要求在计算不易失当的区域范围内。我们通过提出战略三重模型来得出结论: AI 开发必须要么限制系统复杂性,以保持可核查的安全,接受无法核查的风险,而同时提升能力,或者开发超出核查范围的基本新的安全模式。我们的工作提供了当前系统性复杂度分析,即严格地进行常规的AI调整。
Article 219
Title@2025-06-12 (4): Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training
Title: Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training | ReLU-Netzwerke zusammenführen, um bei der Initialisierung und während des Trainings viele lineare Regionen exponieren zu lassen | 在初始化和培训期间,将ReLU网络强制推至展示极多线性线性区域 2311.18022v7 |
Authors (3): Max Milkert, David Hyde, Forrest Laine
In a neural network with ReLU activations, the number of piecewise linear regions in the output can grow exponentially with depth. However, this is highly unlikely to happen when the initial parameters are sampled randomly, which therefore often leads to the use of networks that are unnecessarily large. To address this problem, we introduce a novel parameterization of the network that restricts its weights so that a depth $d$ network produces exactly $2^d$ linear regions at initialization and maintains those regions throughout training under the parameterization. This approach allows us to learn approximations of convex, one dimensional functions that are several orders of magnitude more accurate than their randomly initialized counterparts. We further demonstrate a preliminary extension of our construction to multidimensional and non-convex functions, allowing the technique to replace traditional dense layers in various architectures.
在使用 ReLU 激活的神经网络中,输出中的片断线性区域的数量会随着深度而成倍增长。 但是,当初始参数随机抽样时,这极不可能发生,因此往往导致使用不必要的大网络。为了解决这个问题,我们引入了网络的新参数化,限制其重量,以便一个深度的美元网络在初始化时能够产生精确的2d美元线性区域,并在整个培训过程中在参数化下保持这些区域。这个方法使我们能够学习二次曲线的近似值,即比随机初始化的对等系统更精确的数个数量级的一维函数。我们进一步展示了我们建设的多维和非曲线功能的初步扩展,使得技术能够取代各种结构中传统的密度层。
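The claim that a depth-$d$ ReLU network can realize $2^d$ linear regions can be checked numerically with the classical sawtooth construction sketched below. This is the textbook tent-map composition, not the parameterization proposed in the paper, and the helper names (`tent`, `network`, `count_linear_regions`) are assumptions made for illustration.

```python
def relu(x):
    return x if x > 0 else 0.0


def tent(x):
    """One layer with two ReLU units: rises 0 -> 1 on [0, 0.5], falls 1 -> 0 on [0.5, 1]."""
    return 2 * relu(x) - 4 * relu(x - 0.5)


def network(x, depth):
    """Compose the tent layer `depth` times (a depth-`depth` ReLU network on [0, 1])."""
    for _ in range(depth):
        x = tent(x)
    return x


def count_linear_regions(depth, bits=12):
    """Count linear pieces on [0, 1] from slope changes on a dyadic grid.

    Requires depth <= bits so that every breakpoint k / 2**depth is a grid
    point; with dyadic inputs all arithmetic here is exact in floating point.
    """
    n = 2 ** bits
    ys = [network(i / n, depth) for i in range(n + 1)]
    slopes = [ys[i + 1] - ys[i] for i in range(n)]
    changes = sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)
    return changes + 1


regions = [count_linear_regions(d) for d in range(1, 9)]
```

Each extra layer folds the interval once more, which doubles the number of pieces. Random initialization rarely lands on weights aligned like this, which is the gap the abstract sets out to close.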
Article 220
Title@2025-06-12 (4): Distributionally-Constrained Adversaries in Online Learning
Title: Distributionally-Constrained Adversaries in Online Learning | Verteilt bedingte Widersacher im Online-Lernen | 在线学习中经分发的、经过培训的在线学习论文 2506.10293v1 |
Authors (2): Moïse Blanchard, Samory Kpotufe
There has been much recent interest in understanding the continuum from adversarial to stochastic settings in online learning, with various frameworks including smoothed settings proposed to bridge this gap. We consider the more general and flexible framework of distributionally constrained adversaries in which instances are drawn from distributions chosen by an adversary within some constrained distribution class [RST11]. Compared to smoothed analysis, we consider general distributional classes which allows for a fine-grained understanding of learning settings between fully stochastic and fully adversarial for which a learner can achieve non-trivial regret. We give a characterization for which distribution classes are learnable in this context against both oblivious and adaptive adversaries, providing insights into the types of interplay between the function class and distributional constraints on adversaries that enable learnability. In particular, our results recover and generalize learnability for known smoothed settings. Further, we show that for several natural function classes including linear classifiers, learning can be achieved without any prior knowledge of the distribution class – in other words, a learner can simultaneously compete against any constrained adversary within learnable distribution classes.
最近人们非常关心理解在线学习中从对立式到对立式的连续过程,包括提出平滑的设置等各种框架,以弥合这一差距。我们考虑到分布上受限制的对手较一般和灵活的框架,在这种框架中,从一些受限制的分配类[RST11]中由对手选择的分布中产生。与平滑的分析相比,我们考虑一般分配类,这种分配类可以细微地理解完全随机和完全对立式的学习环境,使学习者能够实现非三重式的遗憾。我们给出了在这种背景下可以学习哪些分销类的特征,既能与盲目对手和适应性对手对立,又能够洞察功能类和对对手的分布限制之间的相互作用类型。特别是,我们的结果恢复了已知的平稳环境,并普遍地学习。此外,我们表明,对于几个自然功能类,包括线性分类,学习可以在不事先了解分配类的情况下实现 – 换句话说,学习者可以同时与任何受限制的对手在可学习的分配类内进行竞争。
Article 221
Title@2025-06-12 (4): Heterogeneous Multi-Agent Reinforcement Learning for Distributed Channel Access in WLANs
Title: Heterogeneous Multi-Agent Reinforcement Learning for Distributed Channel Access in WLANs | Heterogenes Multi-Agenten-Verstärkungslernen für verteilten Kanalzugang in WLANs | 不同多机构强化学习,用于在局域网中分配的频道接入 2412.14218v2 |
Authors (6): Jiaming Yu, Le Liang, Chongtao Guo, Ziyang Guo, Shi Jin, Geoffrey Ye Li
This paper investigates the use of multi-agent reinforcement learning (MARL) to address distributed channel access in wireless local area networks. In particular, we consider the challenging yet more practical case where the agents heterogeneously adopt value-based or policy-based reinforcement learning algorithms to train the model. We propose a heterogeneous MARL training framework, named QPMIX, which adopts a centralized training with distributed execution paradigm to enable heterogeneous agents to collaborate. Moreover, we theoretically prove the convergence of the proposed heterogeneous MARL method when using the linear value function approximation. Our method maximizes the network throughput and ensures fairness among stations, therefore, enhancing the overall network performance. Simulation results demonstrate that the proposed QPMIX algorithm improves throughput, mean delay, delay jitter, and collision rates compared with conventional carrier-sense multiple access with collision avoidance (CSMA/CA) mechanism in the saturated traffic scenario. Furthermore, the QPMIX algorithm is robust in unsaturated and delay-sensitive traffic scenarios. It coexists well with the conventional CSMA/CA mechanism and promotes cooperation among heterogeneous agents.
本文调查多试剂强化学习(MARL)的使用,以解决无线局域网中分散的频道接入问题。我们特别审议了一个具有挑战性的、更实际的情况,即代理商采用多种不同的基于价值或基于政策的强化学习算法来培训模型。我们提议了一个称为QPMIX的多式MARL培训框架,采用集中培训,采用分布式执行模式,使不同代理商能够合作。此外,我们从理论上证明,在使用线性值函数近似时,拟议的多元MARL方法已经趋于一致。我们的方法使网络通过量最大化,确保各站之间的公平性,从而加强了整个网络的性能。模拟结果表明,与饱和交通情景下的常规承运人多发式避免碰撞(CSMA/CA)机制相比,拟议的QPMIX算法改进了吞吐量、平均延迟、延迟和碰撞率。此外,QPMIX算法在不饱和迟缓性反应性交通情景下十分稳健。它与常规的CSMA/CA机制共存,并促进各不同物剂之间的合作。
Article 222
Title@2025-06-12 (4): Context Is Not Comprehension
Title: Context Is Not Comprehension | Kontext ist nicht verständlich | 背景不令人理解 2506.04907v4 |
Authors (2): Alex Pan, Mary-Anne Williams
The dominant way of judging Large Language Models (LLMs) has been to ask how well they can recall explicit facts from very long inputs. While today’s best models achieve near perfect recall, this masks a harder skill: performing multi-step reasoning and tracking intermediate state that never appears verbatim. We introduce Verbose ListOps (VLO), a benchmark that embeds deterministic ListOps computations inside narrative camouflage and, crucially, allows step-level evaluation of every intermediate result. Experiments show that models which solve raw ListOps with approximately 100% accuracy collapse on VLO after only 10,000 tokens. By exposing where a model’s reasoning chain first diverges, VLO moves assessment beyond sheer context length and toward genuine comprehension. VLO’s generation pipeline is task-agnostic: it can weave any deterministically verifiable reasoning schema – arithmetic, symbolic, abductive, inductive or defeasible – into narrative form. This makes VLO a reusable test-bed for the next wave of reasoning-centric model designs, not merely those with step-explicit scaffolds.
判断大语言模型(LLMS)的主要方法一直是询问它们能够从非常长的投入中记得清楚的事实。 虽然今天的最佳模型几乎完全可以回顾, 但是它掩盖了一个更难掌握的技能: 执行多步推理和追踪从未出现过的中间状态。 我们引入了Verbose ListOps (VLO), 这个基准将确定性 ListOps 计算方法嵌入叙事伪装中, 并且关键地允许对每个中间结果进行分级评估。 实验显示, 解决原始 ListOps 的模型, 其精度在1万个符号之后大约为100%的VLO 崩溃。 通过曝光模型推理链首先出现差异的地方, VLO 移动评估超越纯粹的上下文长度, 走向真正的理解。 VLO 的一代管道具有任务性: 它可以将任何可确定性、 可核实的推理的推理方法 – 算、 符号、 、 诱性、 诱导性或易失能性 – 编成叙述形式。 这让VLO成为下一个推理中心模型设计的可重复的测试台, 不仅仅是具有分辨的试的测试台式模型设计。 。 这让VLO成为下一个波的试的测试台, 的测试台, 。
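The step-level evaluation mentioned in this abstract relies on every intermediate ListOps result being deterministic and recordable. The sketch below is a minimal ListOps-style evaluator that returns the final value together with a trace of intermediate results. The operator set (`MAX`, `MIN`, `MED`, `SM` as sum modulo 10), the bracket syntax, and the use of the lower median are assumptions based on the common ListOps format, and the narrative embedding that VLO adds is not modeled.

```python
import statistics

# Operator table (assumed ListOps-style set; MED uses the lower median).
OPS = {
    "MAX": max,
    "MIN": min,
    "MED": statistics.median_low,
    "SM": lambda xs: sum(xs) % 10,
}


def evaluate(expr):
    """Evaluate a ListOps string; return (value, trace of (op, args, result))."""
    tokens = expr.replace("[", " [ ").replace("]", " ] ").split()
    trace = []
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "[":
            return int(tok)          # a literal operand
        op = tokens[pos]
        pos += 1
        args = []
        while tokens[pos] != "]":
            args.append(parse())     # nested lists are evaluated first
        pos += 1                     # consume the closing bracket
        value = OPS[op](args)
        trace.append((op, args, value))
        return value

    return parse(), trace


value, trace = evaluate("[MAX 2 9 [MIN 4 7 ] 0 ]")
```

Comparing a model's stated intermediate values against `trace`, in order, locates the first step at which its reasoning diverges.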
Article 223
Title@2025-06-12 (4): ClusterUCB: Efficient Gradient-Based Data Selection for Targeted Fine-Tuning of LLMs
Title: ClusterUCB: Efficient Gradient-Based Data Selection for Targeted Fine-Tuning of LLMs | ClusterUCB: Effiziente Gradient-basierte Datenauswahl für gezielte Feinsteuerung von LLMs | COCUCB: 高效率的逐步数据选择,以便有针对性地微调LLMM 2506.10288v1 |
Authors (6): Zige Wang, Qi Zhu, Fei Mi, Minghui Xu, Ruochun Jin, Wenjing Yang
Gradient-based data influence approximation has been leveraged to select useful data samples in the supervised fine-tuning of large language models. However, the computation of gradients throughout the fine-tuning process requires too many resources to be feasible in practice. In this paper, we propose an efficient gradient-based data selection framework with clustering and a modified Upper Confidence Bound (UCB) algorithm. Based on the intuition that data samples with similar gradient features will have similar influences, we first perform clustering on the training data pool. Then, we frame the inter-cluster data selection as a constrained computing budget allocation problem and consider it a multi-armed bandit problem. A modified UCB algorithm is leveraged to solve this problem. Specifically, during the iterative sampling process, historical data influence information is recorded to directly estimate the distributions of each cluster, and a cold start is adopted to balance exploration and exploitation. Experimental results on various benchmarks show that our proposed framework, ClusterUCB, can achieve comparable results to the original gradient-based data selection methods while greatly reducing computing consumption.
在大型语言模型的监督下微调中,已利用基于渐变的数据影响近似值来选择有用的数据样本,但在整个微调过程中计算梯度需要太多的资源才能实际可行。在本文件中,我们建议采用高效的梯度数据选择框架,同时采用集群和修改的超信任量计算法。基于具有类似梯度特征的数据样本具有类似影响的直觉,我们首先在培训数据库中进行分组。然后,我们将集群数据选择视为一个有限的计算预算分配问题,并将其视为一个多臂强盗问题。修改后的UCB算法被用来解决这一问题。具体地说,在迭代抽样过程中,记录历史数据影响信息,直接估计每个组的分布情况,并采用一个寒冷的起点来平衡探索和利用。各种基准的实验结果表明,我们提议的Crob CUCB框架可以取得与原始梯度数据选择方法的类似结果,同时大大减少计算消耗量。
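The selection loop described in this abstract (clusters as bandit arms, a cold start, then UCB-guided draws under a fixed budget) can be sketched as follows. This uses a standard UCB index, not the paper's modified UCB. The function `cluster_ucb_select`, the `influence` callable (standing in for an expensive gradient-based influence estimate), and the exploration constant `c` are hypothetical names chosen for illustration.

```python
import math


def cluster_ucb_select(clusters, influence, budget, c=1.0):
    """Select up to `budget` samples, treating each cluster as a bandit arm.

    clusters  : list of lists of sample ids (assumed already clustered)
    influence : callable id -> float, a stand-in for an influence estimate
    Returns (list of (sample id, score), draws per cluster).
    """
    pools = [list(ids) for ids in clusters]   # samples not yet drawn
    counts = [0] * len(pools)                 # draws per cluster
    means = [0.0] * len(pools)                # running mean influence per cluster
    selected = []

    def draw(k):
        sample = pools[k].pop()
        score = influence(sample)
        counts[k] += 1
        means[k] += (score - means[k]) / counts[k]
        selected.append((sample, score))

    # Cold start: one draw from every non-empty cluster, budget permitting.
    for k in range(len(pools)):
        if pools[k] and len(selected) < budget:
            draw(k)

    # Spend the remaining budget on the cluster with the highest UCB index.
    while len(selected) < budget:
        live = [k for k in range(len(pools)) if pools[k]]
        if not live:
            break
        t = len(selected)
        best = max(
            live,
            key=lambda k: means[k] + c * math.sqrt(math.log(t) / counts[k]),
        )
        draw(best)

    return selected, counts


# Toy pool: three clusters of 40 ids with influence 0.1, 0.5 and 0.9.
toy_clusters = [list(range(0, 40)), list(range(40, 80)), list(range(80, 120))]
selected, counts = cluster_ucb_select(
    toy_clusters, lambda i: (0.1, 0.5, 0.9)[i // 40], budget=30
)
```

Influence is computed only for samples that are actually drawn, so the cost scales with the budget, not with the size of the pool.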
Article 224
Title@2025-06-12 (4): Graph-MLLM: Harnessing Multimodal Large Language Models for Multimodal Graph Learning
Title: Graph-MLLM: Harnessing Multimodal Large Language Models for Multimodal Graph Learning | Graph-MLLM: Multimodale große Sprachmodelle für multimodales Graphenlernen nutzen | 图图-MLLM:利用多模式多模式大语言模型进行多模式图表学习 2506.10282v1 |
Authors (6): Jiajin Liu, Dongzhe Fan, Jiacheng Shen, Chuanhao Ji, Daochen Zha, Qiaoyu Tan
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in representing and understanding diverse modalities. However, they typically focus on modality alignment in a pairwise manner while overlooking structural relationships across data points. Integrating multimodality with structured graph information (i.e., multimodal graphs, MMGs) is essential for real-world applications such as social networks, healthcare, and recommendation systems. Existing MMG learning methods fall into three paradigms based on how they leverage MLLMs: Encoder, Aligner, and Predictor. MLLM-as-Encoder focuses on enhancing graph neural networks (GNNs) via multimodal feature fusion; MLLM-as-Aligner aligns multimodal attributes in language or hidden space to enable LLM-based graph reasoning; MLLM-as-Predictor treats MLLMs as standalone reasoners with in-context learning or fine-tuning. Despite their advances, the MMG field lacks a unified benchmark to fairly evaluate across these approaches, making it unclear what progress has been made. To bridge this gap, we present Graph-MLLM, a comprehensive benchmark for multimodal graph learning by systematically evaluating these three paradigms across six datasets with different domains. Through extensive experiments, we observe that jointly considering the visual and textual attributes of the nodes benefits graph learning, even when using pre-trained text-to-image alignment models (e.g., CLIP) as encoders. We also find that converting visual attributes into textual descriptions further improves performance compared to directly using visual inputs. Moreover, we observe that fine-tuning MLLMs on specific MMGs can achieve state-of-the-art results in most scenarios, even without explicit graph structure information. We hope that our open-sourced library will facilitate rapid, equitable evaluation and inspire further innovative research in this field.
多模态大语言模型(MLLMs)在表示和理解多种模态方面展现出了卓越的能力。然而,它们通常以成对的方式关注模态对齐,却忽视了数据点之间的结构关系。将多模态与结构化图信息相结合(即多模态图,MMGs),对于社交网络、医疗保健和推荐系统等现实应用至关重要。现有的 MMG 学习方法按其利用 MLLMs 的方式可分为三种范式:Encoder、Aligner 和 Predictor。MLLM-as-Encoder 侧重于通过多模态特征融合来增强图神经网络(GNNs);MLLM-as-Aligner 在语言空间或隐空间中对齐多模态属性,以支持基于 LLM 的图推理;MLLM-as-Predictor 则通过上下文学习或微调,把 MLLMs 当作独立的推理器。尽管取得了这些进展,MMG 领域仍缺少一个能够公平评估这些方法的统一基准,因此尚不清楚究竟取得了哪些进展。为了弥补这一差距,我们提出了 Graph-MLLM,这是一个面向多模态图学习的综合基准,在六个不同领域的数据集上系统地评估这三种范式。通过大量实验,我们观察到,即使使用预训练的文本-图像对齐模型(例如 CLIP)作为编码器,联合考虑节点的视觉属性和文本属性也有利于图学习。我们还发现,与直接使用视觉输入相比,把视觉属性转换为文本描述能进一步提升性能。此外,我们观察到,即使没有显式的图结构信息,在特定 MMG 上微调 MLLMs 也能在大多数场景中取得最先进的结果。我们希望我们的开源库能够促进快速、公平的评估,并激发该领域更多的创新研究。
Article 225
Title@2025-06-12 (4): Noise Balance and Stationary Distribution of Stochastic Gradient Descent
Title: Noise Balance and Stationary Distribution of Stochastic Gradient Descent | Geräuschausgleich und Stationäre Verteilung des stochastischen Gradientenabstiegs | 沙粒渐变后裔的噪音平衡和固定分布 2308.06671v2 |
Authors (3): Liu Ziyin, Hongchao Li, Masahito Ueda
The stochastic gradient descent (SGD) algorithm is the algorithm we use to train neural networks. However, it remains poorly understood how the SGD navigates the highly nonlinear and degenerate loss landscape of a neural network. In this work, we show that the minibatch noise of SGD regularizes the solution towards a noise-balanced solution whenever the loss function contains a rescaling parameter symmetry. Because the difference between a simple diffusion process and SGD dynamics is the most significant when symmetries are present, our theory implies that the loss function symmetries constitute an essential probe of how SGD works. We then apply this result to derive the stationary distribution of stochastic gradient flow for a diagonal linear network with arbitrary depth and width. The stationary distribution exhibits complicated nonlinear phenomena such as phase transitions, broken ergodicity, and fluctuation inversion. These phenomena are shown to exist uniquely in deep networks, implying a fundamental difference between deep and shallow models.
用于培养神经网络的算法(SGD)是用于培养神经网络的算法。 但是,它仍然不太清楚 SGD是如何导航神经网络高度非线性和退化性损失景观的。 在这项工作中,我们显示SGD的微小噪声在损失函数包含一个伸缩参数对称功能时,会规范对噪音平衡解决办法的解决方案。由于简单扩散过程与SGD动态之间的差别在出现对称时最为显著,我们的理论暗示损失函数对称构成了SGD如何工作的基本探测。 然后我们应用这一结果来为具有任意深度和宽度的对角线网络生成静态的斜度流分布。 静止分布显示了复杂的非线性现象, 如阶段转换、 折断的垂直和波动。 这些现象在深层网络中表现出独特的存在, 意味着深度和浅度模型之间的根本差异 。
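To illustrate what a rescaling parameter symmetry is, the sketch below uses a two-parameter model $f(x) = u\,w\,x$, whose loss is unchanged under $(u, w) \mapsto (\lambda u, w/\lambda)$ while its gradients are not. The model, the data point, and the helper names are assumptions chosen for illustration, and the sketch does not reproduce the paper's analysis of minibatch noise.

```python
import math


def loss(u, w, x=1.5, y=2.0):
    """Squared error of the two-parameter model f(x) = u * w * x."""
    return (u * w * x - y) ** 2


def grads(u, w, x=1.5, y=2.0):
    """Return (dL/du, dL/dw) for the loss above."""
    r = 2 * (u * w * x - y) * x
    return r * w, r * u


def rescale(u, w, lam):
    """Apply the rescaling symmetry (u, w) -> (lam * u, w / lam)."""
    return lam * u, w / lam


u, w = 0.8, 1.3
u2, w2 = rescale(u, w, 4.0)
gu, gw = grads(u, w)
gu2, gw2 = grads(u2, w2)
```

The loss takes the same value at `(u, w)` and `(u2, w2)`, yet the gradient with respect to `u` shrinks by the factor 4 and the gradient with respect to `w` grows by it. The points along a symmetry orbit are therefore equivalent for the loss but not for gradient-based dynamics.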
Article 226
Title@2025-06-12 (4): Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty
Title: Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty | Leichtes Datenset Pruning ohne vollständiges Training über Beispiel Schwierigkeit und Vorhersage Ungewissheit | 轻量数据集不通过实例困难和预测不确定性进行充分培训 2502.06905v3 |
Authors (4): Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun
Recent advances in deep learning rely heavily on massive datasets, leading to substantial storage and training costs. Dataset pruning aims to alleviate this demand by discarding redundant examples. However, many existing methods require training a model with a full dataset over a large number of epochs before being able to prune the dataset, which ironically makes the pruning process more expensive than just training the model on the entire dataset. To overcome this limitation, we introduce a Difficulty and Uncertainty-Aware Lightweight (DUAL) score, which aims to identify important samples from the early training stage by considering both example difficulty and prediction uncertainty. To address a catastrophic accuracy drop at an extreme pruning, we further propose a ratio-adaptive sampling using Beta distribution. Experiments on various datasets and learning scenarios such as image classification with label noise and image corruption, and model architecture generalization demonstrate the superiority of our method over previous state-of-the-art (SOTA) approaches. Specifically, on ImageNet-1k, our method reduces the time cost for pruning to 66% compared to previous methods while achieving a SOTA, specifically 60% test accuracy at a 90% pruning ratio. On CIFAR datasets, the time cost is reduced to just 15% while maintaining SOTA performance.
深度学习的最新进展严重依赖海量数据集,带来了高昂的存储和训练成本。数据集剪枝旨在通过丢弃冗余样本来缓解这一需求。然而,许多现有方法需要先在完整数据集上训练模型很多个 epoch,然后才能对数据集进行剪枝,讽刺的是,这使得剪枝过程比直接在整个数据集上训练模型还要昂贵。为了克服这一限制,我们引入了难度与不确定性感知的轻量级(DUAL)评分,它同时考虑样本难度和预测不确定性,旨在从训练早期阶段识别重要样本。为了解决极端剪枝比例下准确率灾难性下降的问题,我们进一步提出了使用 Beta 分布的比例自适应采样。在各种数据集和学习场景(例如带标签噪声和图像损坏的图像分类,以及模型架构泛化)上的实验表明,我们的方法优于以往的最先进(SOTA)方法。具体而言,在 ImageNet-1k 上,与以往方法相比,我们的方法将剪枝的时间成本降至 66%,同时达到 SOTA 水平,即在 90% 的剪枝比例下取得 60% 的测试准确率。在 CIFAR 数据集上,时间成本降至仅 15%,同时保持 SOTA 性能。
Article 227
Title@2025-06-12 (4): VQC-MLPNet: An Unconventional Hybrid Quantum-Classical Architecture for Scalable and Robust Quantum Machine Learning
Title: VQC-MLPNet: An Unconventional Hybrid Quantum-Classical Architecture for Scalable and Robust Quantum Machine Learning | VQC-MLPNet: Eine unkonventionelle hybride Quantum-Klassische Architektur für skalierbares und robustes Quantum Machine Learning | VQC-MLPNet:用于可缩放和强力量量机器学习的非常规混合量子分类结构 2506.10275v1 |
Authors (4): Jun Qi, Chao-Han Yang, Pin-Yu Chen, Min-Hsiu Hsieh
Variational Quantum Circuits (VQCs) offer a novel pathway for quantum machine learning, yet their practical application is hindered by inherent limitations such as constrained linear expressivity, optimization challenges, and acute sensitivity to quantum hardware noise. This work introduces VQC-MLPNet, a scalable and robust hybrid quantum-classical architecture designed to overcome these obstacles. By innovatively employing quantum circuits to dynamically generate parameters for classical Multi-Layer Perceptrons (MLPs) via amplitude encoding and parameterized quantum operations, VQC-MLPNet substantially expands representation capabilities and augments training stability. We provide rigorous theoretical guarantees via statistical learning techniques and Neural Tangent Kernel analysis, explicitly deriving upper bounds on approximation, uniform deviation, and optimization errors. These theoretical insights demonstrate exponential improvements in representation capacity relative to quantum circuit depth and the number of qubits, providing clear computational advantages over standalone quantum circuits and existing hybrid quantum architectures. Our theoretical claims are empirically corroborated through extensive experiments, including classifying semiconductor quantum-dot charge states and predicting genomic transcription factor binding sites, demonstrating resilient performance even under realistic IBM quantum noise simulations. This research establishes a theoretically sound and practically robust framework, advancing the frontiers of quantum-enhanced learning for unconventional computing paradigms in the Noisy Intermediate-Scale Quantum era and beyond.
变分量子电路(VQCs)为量子机器学习提供了一条新途径,但其实际应用受到一些固有局限的阻碍,例如受限的线性表达能力、优化困难以及对量子硬件噪声的高度敏感。本文提出了 VQC-MLPNet,这是一种可扩展且稳健的量子-经典混合架构,旨在克服这些障碍。VQC-MLPNet 创新性地利用量子电路,通过振幅编码和参数化量子操作为经典多层感知机(MLPs)动态生成参数,从而大幅扩展了表示能力并增强了训练稳定性。我们借助统计学习技术和神经正切核(Neural Tangent Kernel)分析给出了严格的理论保证,明确推导了近似误差、一致偏差和优化误差的上界。这些理论结果表明,表示能力相对于量子电路深度和量子比特数呈指数级提升,相较于独立的量子电路和现有的混合量子架构具有明显的计算优势。我们的理论结论通过大量实验得到了实证验证,包括对半导体量子点电荷态进行分类以及预测基因组转录因子结合位点,即使在贴近实际的 IBM 量子噪声模拟下也表现出稳健的性能。这项研究建立了一个理论上可靠、实践中稳健的框架,推动了含噪中等规模量子(Noisy Intermediate-Scale Quantum)时代及以后面向非常规计算范式的量子增强学习的前沿。
Article 228
Title@2025-06-12 (4): TimeBridge: Better Diffusion Prior Design with Bridge Models for Time Series Generation
Title: TimeBridge: Better Diffusion Prior Design with Bridge Models for Time Series Generation | TimeBridge: Besseres Diffusions-Vorentwurf mit Bridge-Modellen für die Zeitreihenerstellung | TimeBridge: 更好的传播前设计,具有时间序列生成的桥梁模型 2408.06672v2 |
Authors (5): Jinseong Park, Seungyun Lee, Woojin Jeong, Yujin Choi, Jaewook Lee
Time series generation is widely used in real-world applications such as simulation, data augmentation, and hypothesis testing. Recently, diffusion models have emerged as the de facto approach to time series generation, enabling diverse synthesis scenarios. However, the fixed standard-Gaussian diffusion prior may be ill-suited for general time series data, such as temporal order and fixed points. In this paper, we propose TimeBridge, a framework that flexibly synthesizes time series data by using diffusion bridges to learn paths between a chosen prior and the data distribution. We then explore several prior designs tailored to time series synthesis. Our framework covers (i) data- and time-dependent priors for unconditional generation and (ii) scale-preserving priors for conditional generation. Experiments show that our framework with data-driven priors outperforms standard diffusion models on time series generation.
时间序列生成被广泛用于模拟、数据扩增和假设测试等现实世界应用。最近,扩散模型被作为时间序列生成的实际方法出现,从而产生了多种合成设想。然而,以前固定的标准-Gausian扩散可能不适合一般时间序列数据,如时间顺序和固定点。在本文中,我们提议了TimeBridge,这是一个通过使用传播桥梁来灵活合成时间序列数据的框架,以学习选定先前数据分布与数据分布之间的路径。然后我们探索了以前为时间序列合成而定制的若干设计。我们的框架涵盖(一) 无条件生成的数据和时间前期,以及(二) 有条件生成的缩放前期。实验表明,我们以数据驱动的前期生成框架在时间序列生成方面优于标准扩散模型。
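As background for the diffusion bridges mentioned in this abstract, the textbook Brownian bridge pinned at both endpoints has the marginal below. This is the generic construction with a constant noise scale $\sigma$, not necessarily the bridge used in the paper. Reading $x_0$ as a data sample and $x_T$ as a draw from the chosen prior is an assumption made here for illustration.

```latex
\[
  x_t \mid x_0, x_T \;\sim\;
  \mathcal{N}\!\left(
    \Bigl(1 - \tfrac{t}{T}\Bigr)\, x_0 + \tfrac{t}{T}\, x_T,\;
    \sigma^2\, \frac{t\,(T - t)}{T}\, I
  \right),
  \qquad 0 \le t \le T .
\]
% The variance vanishes at t = 0 and t = T, so the path starts exactly at
% x_0 and ends exactly at x_T. The endpoint x_T can therefore be drawn from
% any prior, including a data- or time-dependent one, and need not be a
% standard Gaussian.
```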
Article 229
Title@2025-06-12 (4): GD doesn’t make the cut: Three ways that non-differentiability affects neural network training
Title: GD doesn’t make the cut: Three ways that non-differentiability affects neural network training | GD macht nicht den Schnitt: Drei Wege, dass Nicht-Unterschiedlichkeit das neuronale Netzwerk-Training beeinflusst | GD没有做出削减: 三种无差异的方式影响神经网络培训 2401.08426v9 |
Authors (1): Siddharth Krishna Kumar
This paper critically examines the fundamental distinctions between gradient methods applied to non-differentiable functions (NGDMs) and classical gradient descents (GDs) for differentiable functions, revealing significant gaps in current deep learning optimization theory. We demonstrate that NGDMs exhibit markedly different convergence properties compared to GDs, strongly challenging the applicability of extensive neural network convergence literature based on $L$-smoothness to non-smooth neural networks. Our analysis reveals paradoxical behavior of NGDM solutions for $L_{1}$-regularized problems, where increasing regularization counterintuitively leads to larger $L_{1}$ norms of optimal solutions. This finding calls into question widely adopted $L_{1}$ penalization techniques for network pruning. We further challenge the common assumption that optimization algorithms like RMSProp behave similarly in differentiable and non-differentiable contexts. Expanding on the Edge of Stability phenomenon, we demonstrate its occurrence in a broader class of functions, including Lipschitz continuous convex differentiable functions. This finding raises important questions about its relevance and interpretation in non-convex, non-differentiable neural networks, particularly those using ReLU activations. Our work identifies critical misunderstandings of NGDMs in influential literature, stemming from an overreliance on strong smoothness assumptions. These findings necessitate a reevaluation of optimization dynamics in deep learning, emphasizing the crucial need for more nuanced theoretical foundations in analyzing these complex systems.
本文批判性地考察了应用于不可微函数的梯度方法(NGDM)与应用于可微函数的经典梯度下降(GD)之间的根本区别,揭示了当前深度学习优化理论中的重大缺口。我们证明,NGDM 的收敛性质与 GD 明显不同,这对基于 $L$-光滑性的大量神经网络收敛文献能否适用于非光滑神经网络提出了有力质疑。我们的分析揭示了 NGDM 在 $L_{1}$ 正则化问题上的反常行为:增大正则化强度反而会使最优解的 $L_{1}$ 范数变大。这一发现使广泛采用的、用于网络剪枝的 $L_{1}$ 惩罚技术受到质疑。我们进一步质疑一个常见假设,即 RMSProp 等优化算法在可微与不可微情形下表现相似。在“稳定性边缘”(Edge of Stability)现象的基础上,我们证明它出现在更广泛的函数类中,包括 Lipschitz 连续的凸可微函数。这一发现对该现象在非凸、不可微神经网络(尤其是使用 ReLU 激活的网络)中的相关性和解释提出了重要问题。我们的工作指出了有影响力的文献中对 NGDM 的关键误解,这些误解源于对强光滑性假设的过度依赖。这些发现要求重新评估深度学习中的优化动态,并强调在分析这些复杂系统时迫切需要更细致的理论基础。
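A minimal sketch, not taken from the paper, of the kind of gap the abstract describes: with a constant step size, gradient descent on the smooth function $x^2/2$ contracts to the minimizer, while the same update applied to the non-differentiable $|x|$ (using a subgradient) keeps hopping across the minimizer at the scale of the step size. The functions, starting point, and step size are illustrative assumptions.

```python
def sign(x):
    """A subgradient of |x| (returns 0 at x == 0)."""
    return (x > 0) - (x < 0)


def descend(grad, x0, lr, steps):
    """Plain fixed-step (sub)gradient descent in one dimension."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# Illustrative choices (assumptions, not values from the paper).
x_smooth = descend(lambda x: x, x0=1.05, lr=0.1, steps=200)  # f(x) = x^2 / 2
x_nonsmooth = descend(sign, x0=1.05, lr=0.1, steps=200)      # f(x) = |x|

print(abs(x_smooth))     # shrinks geometrically toward 0
print(abs(x_nonsmooth))  # for this start, stays near lr / 2 and never settles
```

The non-smooth iterate does not improve with more steps at a fixed learning rate, which is why convergence arguments that rely on $L$-smoothness do not carry over directly.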
Article 230
Title@2025-06-12 (4): Predicting function of evolutionarily implausible DNA sequences
Title: Predicting function of evolutionarily implausible DNA sequences | Vorhersagefunktion evolutionär unglaubwürdiger DNA-Sequenzen | 进化性DNA序列的预测功能 2506.10271v1 |
Authors (3): Shiyu Jiang, Xuyin Liu, Zitong Jerry Wang
Genomic language models (gLMs) show potential for generating novel, functional DNA sequences for synthetic biology, but doing so requires them to learn not just evolutionary plausibility, but also sequence-to-function relationships. We introduce a set of prediction tasks called Nullsettes, which assesses a model’s ability to predict loss-of-function mutations created by translocating key control elements in synthetic expression cassettes. Across 12 state-of-the-art models, we find that mutation effect prediction performance strongly correlates with the predicted likelihood of the nonmutant. Furthermore, the range of likelihood values predictive of strong model performance is highly dependent on sequence length. Our work highlights the importance of considering both sequence likelihood and sequence length when using gLMs for mutation effect prediction.
基因组语言模型(gLMs)显示出为合成生物学生成新型、功能性DNA序列的潜力,但这样做要求它们不仅了解进化的可信任性,而且还了解序列到功能关系。我们引入了一套称为Nullsettes的预测任务,评估模型预测通过合成表达式磁带中关键控制要素的转换产生的功能性突变的能力。在12个最先进的模型中,我们发现变异效应预测性能与预测的非变异性的可能性密切相关。此外,预测强型性能的可能性值范围高度取决于序列长度。我们的工作强调了在使用 gLMs进行突变效应预测时考虑序列概率和序列长度的重要性。
Article 231
Title@2025-06-12 (4): Interior-Point Vanishing Problem in Semidefinite Relaxations for Neural Network Verification
Title: Interior-Point Vanishing Problem in Semidefinite Relaxations for Neural Network Verification | Interior-Point Vanishing Problem in semidefinite Entspannung für Neural Network Verifikation | 神经网络核查半无限期放松措施中的内部-点消失问题 2506.10269v1 |
Authors (4): Ryota Ueda, Takami Sato, Ken Kobayashi, Kazuhide Nakata
Semidefinite programming (SDP) relaxation has emerged as a promising approach for neural network verification, offering tighter bounds than other convex relaxation methods for deep neural networks (DNNs) with ReLU activations. However, we identify a critical limitation in the SDP relaxation when applied to deep networks: interior-point vanishing, which leads to the loss of strict feasibility – a crucial condition for the numerical stability and optimality of SDP. Through rigorous theoretical and empirical analysis, we demonstrate that as the depth of DNNs increases, strict feasibility is likely to be lost, creating a fundamental barrier to scaling SDP-based verification. To address interior-point vanishing, we design and investigate five solutions to enhance the feasibility conditions of the verification problem. Our methods can successfully solve 88% of the problems that could not be solved by existing methods, accounting for 41% of the total. Our analysis also reveals that the valid constraints on the lower and upper bounds of each ReLU unit, traditionally inherited from prior work without solid justification, are not only unhelpful but actually harmful to the problem’s feasibility. This work provides valuable insights into the fundamental challenges of SDP-based DNN verification and offers practical solutions to improve its applicability to deeper neural networks, contributing to the development of more reliable and secure systems with DNNs.
在神经网络核查方面,半终点编程的放松已成为一个有希望的神经网络核查方法,为深神经网络的激活提供了比其他孔雀放松方法更紧的界限。然而,我们发现,在应用到深网络时,SDP放松是一个关键限制:内点消失,导致失去严格的可行性 – – 这是SDP数字稳定性和最佳性的一个关键条件。通过严格的理论和经验分析,我们表明,随着DNN的深度增加,严格的可行性有可能丧失,为扩大SDP的核查造成一个基本障碍。为解决内部点消失的问题,我们设计和调查五项解决办法,以加强核查问题的可行性条件。我们的方法可以成功地解决现有方法无法解决的88%的问题,占总数的41%。我们的分析还表明,对ReLU每个单位的下层和上层界限的有效限制,传统上是没有坚实理由从先前工作中继承的,但实际上不仅不可靠,而且对扩大基于SDP的核查范围构成根本挑战,而且对问题的实际可行性也有害。这项工作为SNPR更深入的核查提供了宝贵的见解,为SD的系统提供了更可靠的安全性的挑战。
Article 232
Title@2025-06-12 (4): Do Language Models Have Bayesian Brains? Distinguishing Stochastic and Deterministic Decision Patterns within Large Language Models
Title: Do Language Models Have Bayesian Brains? Distinguishing Stochastic and Deterministic Decision Patterns within Large Language Models | Haben Sprachmodelle Bayesische Gehirne? Beeindruckende stochastische und deterministische Entscheidungsmuster innerhalb großer Sprachmodelle | 语言模式是否具有贝耶斯人脑? 区分大语言模式中的斯托卡和决定性决定模式 2506.10268v1 |
Authors (2): Andrea Yaoyun Cui, Pengfei Yu
Language models are essentially probability distributions over token sequences. Auto-regressive models generate sentences by iteratively computing and sampling from the distribution of the next token. This iterative sampling introduces stochasticity, leading to the assumption that language models make probabilistic decisions, similar to sampling from unknown distributions. Building on this assumption, prior research has used simulated Gibbs sampling, inspired by experiments designed to elicit human priors, to infer the priors of language models. In this paper, we revisit a critical question: Do language models possess Bayesian brains? Our findings show that under certain conditions, language models can exhibit near-deterministic decision-making, such as producing maximum likelihood estimates, even with a non-zero sampling temperature. This challenges the sampling assumption and undermines previous methods for eliciting human-like priors. Furthermore, we demonstrate that without proper scrutiny, a system with deterministic behavior undergoing simulated Gibbs sampling can converge to a “false prior.” To address this, we propose a straightforward approach to distinguish between stochastic and deterministic decision patterns in Gibbs sampling, helping to prevent the inference of misleading language model priors. We experiment on a variety of large language models to identify their decision patterns under various circumstances. Our results provide key insights into the decision making of large language models.
语言模型基本上是象征性序列的概率分布。 自动递减模型通过迭代计算和采样从下一个符号的分布中产生句子。 迭代抽样引入了随机性, 从而假设语言模型作出概率性决定, 类似于未知分布的抽样。 基于这一假设, 先前的研究使用了模拟Gibs抽样, 受旨在提取人类前科的实验的启发, 并参照了语言模型的前身。 在本文中, 我们重新审视了一个关键问题 : 语言模型是否拥有巴耶斯人的大脑? 我们的研究结果显示, 在某些条件下, 语言模型可以展示出近乎决定性的决策决策, 例如生成最大概率估计, 甚至是非零采样温度。 这挑战了取样假设, 并破坏了先前的人类类似模式。 此外, 我们证明, 未经适当审查, 模拟吉布抽样抽样的具有确定性行为的系统可以归结为“ 假象 ” 。 为了解决这个问题, 我们建议一种直截了的方法来区分随机和确定性决定性的决定模式, 有助于防止在大量语言模型中做出误导性判断。
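A small sketch of why a non-zero sampling temperature does not by itself guarantee stochastic behavior: when one option's logit dominates, temperature-1 sampling returns it almost every time. This only illustrates the phenomenon; it is not the paper's test for distinguishing decision patterns, and the logits are made-up values.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [value / temperature for value in logits]
    top = max(scaled)
    exps = [math.exp(value - top) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits: one sharply peaked decision, one genuinely uncertain.
peaked = softmax([10.0, 2.0, 1.0], temperature=1.0)
flat = softmax([1.0, 1.0, 1.0], temperature=1.0)

rng = random.Random(0)
draws = [sample(peaked, rng) for _ in range(1000)]
top_share = draws.count(0) / len(draws)

print(peaked[0], flat[0], top_share)
```

A chain of such near-deterministic choices can look like sampling from a prior while in fact repeating a fixed mapping, which is the failure mode the abstract calls a “false prior.”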
Article 233
Title@2025-06-12 (4): Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update
Title: Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update | Heavy-Tailed Linear Bandits: Huber-Regression mit One-Pass-Update | 重困线性强盗:用单页更新的 “ Huber Refer “ 后退 2503.00419v2 |
Authors (4): Jing Wang, Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou
We study stochastic linear bandits with heavy-tailed noise. Two principled strategies for handling heavy-tailed noise, truncation and median-of-means, have been introduced to heavy-tailed bandits. Nonetheless, these methods rely on specific noise assumptions or bandit structures, limiting their applicability to general settings. The recent work [Huang et al., 2024] develops a soft truncation method via adaptive Huber regression to address these limitations. However, their method incurs undesirable computational costs: it requires storing all historical data and performing a full pass over these data at each round. In this paper, we propose a one-pass algorithm based on the online mirror descent framework. Our method updates using only current data at each round, reducing the per-round computational cost from $\mathcal{O}(t \log T)$ to $\mathcal{O}(1)$ with respect to current round $t$ and the time horizon $T$, and achieves a near-optimal and variance-aware regret of order $\widetilde{\mathcal{O}}\big(d T^{\frac{1-\epsilon}{2(1+\epsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$ where $d$ is the dimension and $\nu_t^{1+\epsilon}$ is the $(1+\epsilon)$-th central moment of reward at round $t$.
我们研究带有重尾噪声的随机线性赌博机(bandit)问题。处理重尾噪声的两种原则性策略,即截断和均值中位数(median-of-means),已被引入重尾赌博机问题。然而,这些方法依赖于特定的噪声假设或赌博机结构,限制了它们在一般情形下的适用性。最近的工作 [Huang et al., 2024] 通过自适应 Huber 回归提出了一种软截断方法来克服这些限制。但是,该方法的计算成本过高:它需要存储全部历史数据,并在每一轮对这些数据做一次完整遍历。本文基于在线镜像下降框架提出一种单遍(one-pass)算法。我们的方法在每一轮只使用当前数据进行更新,将每轮计算成本从 $\mathcal{O}(t \log T)$ 降低到 $\mathcal{O}(1)$(其中 $t$ 为当前轮次,$T$ 为时间范围),并达到近似最优且感知方差的遗憾界 $\widetilde{\mathcal{O}}\big(d T^{\frac{1-\epsilon}{2(1+\epsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$,其中 $d$ 为维度,$\nu_t^{1+\epsilon}$ 为第 $t$ 轮奖励的 $(1+\epsilon)$ 阶中心矩。
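To make the "one-pass" idea concrete, here is a rough sketch of a single online update on the Huber loss that touches only the current sample. It is plain online gradient descent, swapped in for the paper's online mirror descent method, and the names `one_pass_update`, `lr`, and `tau` are hypothetical.

```python
def huber_psi(residual, tau):
    """Derivative of the Huber loss: identity inside [-tau, tau], clipped outside."""
    return max(-tau, min(tau, residual))


def one_pass_update(theta, x, y, lr, tau):
    """One gradient step on the Huber loss of the residual y - <theta, x>.

    Only the current pair (x, y) is used; no history is stored or revisited,
    so the cost per round does not grow with the round index.
    """
    residual = y - sum(t * xi for t, xi in zip(theta, x))
    step = lr * huber_psi(residual, tau)
    return [t + step * xi for t, xi in zip(theta, x)]


# A single extreme reward (an assumed outlier) moves the estimate by a bounded amount.
theta = [0.0, 0.0]
theta = one_pass_update(theta, x=[1.0, 2.0], y=1000.0, lr=0.1, tau=1.0)
print(theta)
```

Because the derivative is clipped at `tau`, a heavy-tailed reward cannot drag the estimate arbitrarily far in one round, which is the "soft truncation" effect of Huber regression.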
Article 234
Title@2025-06-12 (4): Meta-learning Representations for Learning from Multiple Annotators
Title: Meta-learning Representations for Learning from Multiple Annotators | Meta-Learning-Darstellungen zum Lernen von mehreren Annotatoren | 从多个宣传员那里学习的元学习代表 2506.10259v1 |
Authors (5): Atsutoshi Kumagai, Tomoharu Iwata, Taishi Nishiyama, Yasutoshi Ida, Yasuhiro Fujiwara
We propose a meta-learning method for learning from multiple noisy annotators. In many applications such as crowdsourcing services, labels for supervised learning are given by multiple annotators. Since annotators differ in skill and bias, the given labels can be noisy. To learn accurate classifiers, existing methods require large amounts of noisily annotated data. However, sufficient data might be unavailable in practice. To overcome the lack of data, the proposed method uses labeled data obtained in different but related tasks. The proposed method embeds each example in a task into a latent space by using a neural network and constructs a probabilistic model for learning a task-specific classifier while estimating annotators’ abilities in the latent space. This neural network is meta-learned to improve the expected test classification performance when the classifier is adapted to a given small amount of annotated data. This classifier adaptation is performed by maximizing the posterior probability via the expectation-maximization (EM) algorithm. Since each step in the EM algorithm is computed in closed form and is differentiable, the proposed method can efficiently backpropagate the loss through the EM algorithm to meta-learn the neural network. We show the effectiveness of our method on real-world datasets with synthetic noise and on real-world crowdsourcing datasets.
我们建议了一种向多个吵闹的告示员学习的元学习方法。 在诸如众包服务等许多应用中,监督学习的标签由多个说明员提供。 由于批注员有不同的技能或偏差, 给标签会很吵。 要学习准确的分类师, 现有的方法需要许多吵闹的附加说明的数据。 但是, 在实践中可能缺乏足够的数据。 为了克服数据缺乏的问题, 提议的方法使用在不同但相关的任务中获得的标签数据。 拟议的方法将每个例子嵌入任务的潜在空间中, 使用神经网络, 并构建一种在估计潜在空间内特定任务分类员的能力时学习特定分类员的概率模型。 这个神经网络是元化的, 以提高预期的测试分类性性能, 当分类员适应给定少量附加说明的数据时。 这个分类员的调整是通过期望- 最大化的算法来进行。 由于EM 算法的每个步骤很容易以封闭的形式和不同的方式计算, 拟议的方法可以有效地将一个特定分类员在潜在空间中学习一个任务分类员的能力模型。 这个神经网络的计算出真正的损失, 通过我们 EM 的合成的算算算算方法可以有效地用真实数据系统来显示真实的数据- 。
Article 235
Title@2025-06-12 (4): Elucidating the Design Space of Multimodal Protein Language Models
Title: Elucidating the Design Space of Multimodal Protein Language Models | Erklärung des Gestaltungsraums multimodaler Protein-Sprachmodelle | 说明多模式蛋白质语言模型的设计空间 2504.11454v3 |
Authors (8): Cheng-Yen Hsieh, Xinyou Wang, Daiheng Zhang, Dongyu Xue, Fei Ye, Shujian Huang, Zaixiang Zheng, Quanquan Gu
Multimodal protein language models (PLMs) integrate sequence and token-based structural information, serving as a powerful foundation for protein modeling, generation, and design. However, the reliance on tokenizing 3D structures into discrete tokens causes substantial loss of fidelity about fine-grained structural details and correlations. In this paper, we systematically elucidate the design space of multimodal PLMs to overcome their limitations. We identify tokenization loss and inaccurate structure token predictions by the PLMs as major bottlenecks. To address these, our proposed design space covers improved generative modeling, structure-aware architectures and representation learning, and data exploration. Our advancements approach finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling. The effective design methods dramatically improve structure generation diversity and, notably, the folding ability of our 650M model, reducing the RMSD from 5.52 to 2.36 on the PDB test set, outperforming even 3B baselines and performing on par with specialized folding models. Project page and code: https://bytedance.github.io/dplm/dplm-2.1/.
多式蛋白语言模型(PLM)结合了序列和基于象征性的结构信息,作为蛋白质建模、生成和设计的一个强有力的基础。然而,依赖象征性的三维结构作为离散的象征物造成了对精细结构细节和相关性的忠实性的巨大损失。在本文件中,我们系统地阐明了多式蛋白语言模型的设计空间,以克服其局限性。我们确定PLM模型的象征性损失和不准确的结构象征性预测是主要的瓶颈。为解决这些问题,我们拟议的设计空间包括改良的基因模型、结构认知架构和代表性学习以及数据探索。我们的进步方法的精细化监督,表明基于象征性的多式MLMs能够实现强有力的结构建模。有效的设计方法极大地改进了我们650M模型的结构多样性,特别是将PDB测试的RMSD从5.52降至2.36,甚至超过3B基线和与专门折叠模型相匹配。项目页和代码:https://byteance.gith.io/dplm/dplm-2.1/。
Article 236
Title@2025-06-12 (4): OmniSage: Large Scale, Multi-Entity Heterogeneous Graph Representation Learning
Title: OmniSage: Large Scale, Multi-Entity Heterogeneous Graph Representation Learning | OmniSage: Large Scale, Multi-Entity Heterogenes Graph Representation Learning | 总括:大规模、多实体异质图代表制学习 2504.17811v3 |
Authors (8): Anirudhan Badrinath, Alex Yang, Kousik Rajesh, Prabhat Agarwal, Jaewon Yang, Haoyu Chen, Jiajing Xu, Charles Rosenberg
Representation learning, a task of learning latent vectors to represent entities, is a key task in improving search and recommender systems in web applications. Various representation learning methods have been developed, including graph-based approaches for relationships among entities, sequence-based methods for capturing the temporal evolution of user activities, and content-based models for leveraging text and visual content. However, the development of a unifying framework that integrates these diverse techniques to support multiple applications remains a significant challenge. This paper presents OmniSage, a large-scale representation framework that learns universal representations for a variety of applications at Pinterest. OmniSage integrates graph neural networks with content-based models and user sequence models by employing multiple contrastive learning tasks to effectively process graph data, user sequence data, and content signals. To support the training and inference of OmniSage, we developed an efficient infrastructure capable of supporting Pinterest graphs with billions of nodes. The universal representations generated by OmniSage have significantly enhanced user experiences on Pinterest, leading to an approximate 2.5% increase in sitewide repins (saves) across five applications. This paper highlights the impact of unifying representation learning methods, and we make the model code publicly available at https://github.com/pinterest/atg-research/tree/main/omnisage.
代表学习是学习潜在载体以代表实体的一项任务,是改进网络应用程序搜索和建议系统的关键任务。已经开发了各种代表学习方法,包括实体间关系的图形化方法、用户活动时间演变的顺序化方法、利用文字和视觉内容的基于内容的模式。然而,开发一个整合这些不同技术以支持多种应用的统一框架,仍是一项重大挑战。本文介绍了OmniSage,这是一个大型代表框架,它学习各种应用在兴趣中的通用代表。OmniSage将图形神经网络与基于内容的模式和用户序列模型相结合,为此采用多种对比性学习任务来有效处理图表数据、用户序列数据和内容信号。为了支持OmniSage的培训和推断,我们开发了一个高效的基础设施,能够支持有数十亿个节点的利息图表。OmniSage的通用表述极大地增强了用户在兴趣方面的经验,导致五个应用程序的全网站(savises)调控器增加了大约2.5%。本文着重介绍了统一代表性/在线学习方法的影响。
Article 237
Title@2025-06-12 (4): Qronos: Correcting the Past by Shaping the Future… in Post-Training Quantization
Title: Qronos: Correcting the Past by Shaping the Future… in Post-Training Quantization | Qronos: Die Vergangenheit korrigieren, indem sie die Zukunft gestalten… in der Post-Training-Quantisierung | Qronos:在培训后量化中通过塑造未来来纠正过去 2505.11695v2 |
Authors (4): Shihao Zhang, Haoyu Zhang, Ian Colbert, Rayan Saab
We introduce Qronos – a new state-of-the-art post-training quantization algorithm that sequentially rounds and updates neural network weights. Qronos not only explicitly corrects errors due to both weight and activation quantization, but also errors resulting from quantizing previous layers. Our iterative algorithm is based on an interpretable and disciplined optimization framework that subsumes and surpasses existing data-driven approaches. At each step, Qronos alternates between error correction and diffusion via optimal update rules. Importantly, we prove that Qronos admits an efficient implementation that uses the Cholesky decomposition for solving least-squares problems. We also demonstrate that Qronos is compatible with existing transformation techniques such as Hadamard-based incoherence processing and weight-activation scaling equalization, among others. We evaluate Qronos using recent autoregressive language generation models in the Llama3 family; Qronos consistently outperforms previous state-of-the-art adaptive rounding methods when quantizing the weights, activations, and/or KV caches.
我们引入了Qronos – – 一种新的最先进的培训后量化算法,它依次循环并更新神经网络重量。 Qronos不仅明确纠正由于重量和激活量化造成的错误, 而且还纠正了前几层的量化所产生的错误。 我们的迭代算法基于一个可解释和有纪律的优化框架, 该框架将分解并超过现有的数据驱动方法。 每一步, Qronos通过最佳更新规则将错误更正与传播相交。 重要的是, 我们证明 Qronos 承认了一种高效的实施, 使用Choolesky分解法解决最小方位问题。 我们还表明, Qronos 与现有的变异技术兼容, 例如基于 Hadamad 的异位处理和重振动比例均衡等 。 我们使用最近的Llama3 家族的反向语言生成模型来评估Qronos ; Qronos 一贯地超越了在对重重力、 激活和/ 或 KV 缓存中进行四轮回调的旧方法。
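As background for the Cholesky remark in the abstract, the following is a generic sketch of solving a least-squares problem through the normal equations $A^\top A x = A^\top b$ with a Cholesky factorization. It is a textbook routine written with the standard library only and makes no claim about how Qronos itself is implemented; it assumes $A$ has full column rank.

```python
import math


def cholesky(G):
    """Lower-triangular L with L L^T = G, for symmetric positive definite G."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(G[i][i] - partial)
            else:
                L[i][j] = (G[i][j] - partial) / L[j][j]
    return L


def solve_with_cholesky(L, rhs):
    """Solve L L^T x = rhs by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (rhs[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def least_squares(A, b):
    """Minimize ||A x - b||_2 via the normal equations."""
    rows, cols = len(A), len(A[0])
    gram = [[sum(A[r][i] * A[r][j] for r in range(rows)) for j in range(cols)]
            for i in range(cols)]
    rhs = [sum(A[r][i] * b[r] for r in range(rows)) for i in range(cols)]
    return solve_with_cholesky(cholesky(gram), rhs)


x_ls = least_squares([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0])
print(x_ls)
```

The factorization is computed once per Gram matrix, so repeated solves against the same matrix with different right-hand sides only pay for the two triangular substitutions.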
Article 238
Title@2025-06-11 (3): Worth Their Weight: Randomized and Regularized Block Kaczmarz Algorithms without Preprocessing
Title: Worth Their Weight: Randomized and Regularized Block Kaczmarz Algorithms without Preprocessing | Gewicht wert: Randomisierte und Regularisierte Blockkaczmarz-Algorithmen ohne Vorverarbeitung | 重量:无预处理的随机和正规化卡茨马尔兹理算区块 2502.00882v2 |
Authors (3): Gil Goldshlager, Jiang Hu, Lin Lin
Due to the ever growing amounts of data leveraged for machine learning and scientific computing, it is increasingly important to develop algorithms that sample only a small portion of the data at a time. In the case of linear least-squares, the randomized block Kaczmarz method (RBK) is an appealing example of such an algorithm, but its convergence is only understood under sampling distributions that require potentially prohibitively expensive preprocessing steps. To address this limitation, we analyze RBK when the data is sampled uniformly, showing that its iterates converge in a Monte Carlo sense to a $\textit{weighted}$ least-squares solution. Unfortunately, for general problems the condition number of the weight matrix and the variance of the iterates can become arbitrarily large. We control these issues by incorporating regularization into the RBK iterations, yielding the regularized algorithm ReBlocK. Numerical experiments including examples arising from natural gradient optimization demonstrate that ReBlocK can outperform both RBK and minibatch stochastic gradient descent for inconsistent problems with rapidly decaying singular values.
由于机器学习和科学计算所利用的数据数量不断增加,因此越来越有必要发展一种算法,每次只对一小部分数据进行抽样。对于线性最小方块而言,随机的卡茨马尔兹区块方法(Kaczmarz)是这种算法的引人注意的例子,但它的趋同只能通过抽样分布来理解,而抽样分布则可能需要花费巨大得多的预处理步骤。为了解决这一限制,我们在对数据进行统一取样时对RBK进行了分析,表明其循环在蒙特卡洛感中汇合到美元(textit{加权}美元)最低方块的溶液中。不幸的是,对于一般问题,重量矩阵的条件数量和迭代国的差异可能会变得任意大。我们通过将正规化纳入RBK迭代法,产生正规化的算法ReBlocK. Numerical实验,包括自然梯度优化所产生的实例,表明ReBlocK和微型批次梯度梯度梯度梯度梯度梯度梯度梯度梯度下降,因为问题与指数的迅速衰减。
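A rough sketch of a Kaczmarz iteration with uniform sampling, restricted to blocks of a single row so that no linear solve is needed. The regularized variant below simply adds `lam` to the denominator, which is the one-row analogue of regularizing the block Gram matrix; this form and the parameter name `lam` are assumptions for illustration, not the exact ReBlocK update from the paper. The example system is consistent, so both variants should approach the same solution.

```python
import random


def kaczmarz(A, b, lam=0.0, iters=2000, seed=0):
    """Kaczmarz iteration with uniformly sampled single-row blocks.

    lam = 0 gives the classical projection step; lam > 0 damps each step.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        i = rng.randrange(len(A))  # uniform sampling, no preprocessing of A
        row = A[i]
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        scale = residual / (sum(a * a for a in row) + lam)
        x = [xj + scale * a for xj, a in zip(x, row)]
    return x


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]  # consistent system with solution (1, 2)
x_plain = kaczmarz(A, b, lam=0.0)
x_reg = kaczmarz(A, b, lam=0.1)
print(x_plain, x_reg)
```

Each iteration reads one row of the data, which is the property that makes this family of methods attractive when only a small portion of the data can be sampled at a time.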
Article 239
Title@2025-06-11 (3): A new type of federated clustering: A non-model-sharing approach
Title: A new type of federated clustering: A non-model-sharing approach | Eine neue Art von föderierten Clustern: Ein nicht-Modell-Sharing-Ansatz | 新型的联盟联合集群:非模式共享办法 2506.10244v1 |
Authors (8): Yuji Kawamata, Kaoru Kamijo, Maki Kihira, Akihiro Toyoda, Tomoru Nakayama, Akira Imakura, Tetsuya Sakurai, Yukihiko Okada
In recent years, the growing need to leverage sensitive data across institutions has led to increased attention on federated learning (FL), a decentralized machine learning paradigm that enables model training without sharing raw data. However, existing FL-based clustering methods, known as federated clustering, typically assume simple data partitioning scenarios such as horizontal or vertical splits, and cannot handle more complex distributed structures. This study proposes data collaboration clustering (DC-Clustering), a novel federated clustering method that supports clustering over complex data partitioning scenarios where horizontal and vertical splits coexist. In DC-Clustering, each institution shares only intermediate representations instead of raw data, ensuring privacy preservation while enabling collaborative clustering. The method allows flexible selection between k-means and spectral clustering, and achieves final results with a single round of communication with the central server. We conducted extensive experiments using synthetic and open benchmark datasets. The results show that our method achieves clustering performance comparable to centralized clustering where all data are pooled. DC-Clustering addresses an important gap in current FL research by enabling effective knowledge discovery from distributed heterogeneous data. Its practical properties – privacy preservation, communication efficiency, and flexibility – make it a promising tool for privacy-sensitive domains such as healthcare and finance.
近年来,各机构利用敏感数据的日益需要,导致对联合学习(FL)这一分散式机器学习模式的日益重视,这种分散式的机械学习模式使示范培训得以不共享原始数据。然而,现有的基于FL的集群方法(称为Federate Group)通常采取简单的数据分割方案,例如横向或纵向分割,无法处理更复杂的分布结构。本研究报告提议数据合作集群(DC-Clustering),这是一种新的联合式集群方法,支持在横向和纵向分裂共存的复杂数据分割假设中进行集群。在DC-Clustering中,每个机构只共享中间显示器而不是原始数据,确保隐私保护,同时促成合作集群。这种方法允许灵活选择 k- means和光谱集群,并通过与中央服务器的单轮通信取得最终结果。我们利用合成和开放基准数据集进行了广泛的实验。结果显示,我们的方法的组合性业绩与集中组合性(所有数据都聚集在一起)相近。DC-Clustering 解决当前FL研究中的一个重要差距,即能够从分布的混杂数据中有效地发现知识。它的实际性质 – – 隐私保护、通信效率和灵活性 – – – 使它成为一种有希望的工具。
Article 240
Title@2025-06-11 (3): PointNet with KAN versus PointNet with MLP for 3D Classification and Segmentation of Point Sets
Title: PointNet with KAN versus PointNet with MLP for 3D Classification and Segmentation of Point Sets | PointNet mit KAN versus PointNet mit MLP für 3D Klassifizierung und Segmentierung von Punktsätzen | 与 KAN 和 PointNet 和 MLP 相对的点网点网点,用于 3D 的点集分类和分划 2410.10084v3 |
Authors (1): Ali Kashefi
Kolmogorov-Arnold Networks (KANs) have recently gained attention as an alternative to traditional Multilayer Perceptrons (MLPs) in deep learning frameworks. KANs have been integrated into various deep learning architectures such as convolutional neural networks, graph neural networks, and transformers, with their performance evaluated. However, their effectiveness within point-cloud-based neural networks remains unexplored. To address this gap, we incorporate KANs into PointNet for the first time to evaluate their performance on 3D point cloud classification and segmentation tasks. Specifically, we introduce PointNet-KAN, built upon two key components. First, it employs KANs instead of traditional MLPs. Second, it retains the core principle of PointNet by using shared KAN layers and applying symmetric functions for global feature extraction, ensuring permutation invariance with respect to the input features. In traditional MLPs, the goal is to train the weights and biases with fixed activation functions; however, in KANs, the goal is to train the activation functions themselves. We use Jacobi polynomials to construct the KAN layers. We extensively and systematically evaluate PointNet-KAN across various polynomial degrees and special types such as the Lagrange, Chebyshev, and Gegenbauer polynomials. Our results show that PointNet-KAN achieves competitive performance compared to PointNet with MLPs on benchmark datasets for 3D object classification and part and semantic segmentation, despite employing a shallower and simpler network architecture. We also study a hybrid PointNet model incorporating both KAN and MLP layers. We hope this work serves as a foundation and provides guidance for integrating KANs, as an alternative to MLPs, into more advanced point cloud processing architectures.
Kolmogorov-Arnold 网络( KANs) 最近作为传统多层天体( MLPs) 的替代方案,在深层学习框架中受到关注。 KANs 已被整合到各种深层学习结构中, 如 convolual 神经网络、图形神经网络和变压器, 并且对其性能进行了评估。 但是, 在基于点球的神经网络中, KANPs 仍然没有被挖掘出来。 为了解决这一差距, 我们首次将 KANs 纳入 PortNet 网络, 以评价其在3D 点的云级分类和分解任务上的表现。 具体地说, 我们引入了点Net- KAN- KAN, 以两个关键组成部分为基础建立。 首先, 它使用 KANs 而不是传统的 MLPs 。 第二, 它保留了点网络的核心原则, 使用共享 KAN 的层次, 并应用对全方位功能模型, 确保输入输入输入输入。 在传统的 MLPPS 中, 我们的目标是将重重度和偏重点用于 IMAN 和 Calal IM 。
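The two ingredients named in the abstract, a shared KAN layer and a symmetric pooling function, can be sketched as follows. For brevity the sketch uses Chebyshev polynomials (a special case of the Jacobi family) and squashes inputs with `tanh` so they lie in $[-1, 1]$; the squashing, the layer sizes, and the random coefficients are assumptions, not details taken from the paper.

```python
import math
import random


def chebyshev_basis(u, degree):
    """T_0(u), ..., T_degree(u) via the three-term recurrence."""
    values = [1.0, u]
    for _ in range(2, degree + 1):
        values.append(2.0 * u * values[-1] - values[-2])
    return values[: degree + 1]


def kan_layer(point, weights):
    """Map one point (d_in features) to d_out features.

    weights[j][i][k] multiplies T_k(tanh(point[i])) in output j; these
    polynomial coefficients are what would be trained instead of MLP weights.
    """
    degree = len(weights[0][0]) - 1
    basis = [chebyshev_basis(math.tanh(v), degree) for v in point]
    return [
        sum(w_jik * basis[i][k]
            for i, w_ji in enumerate(w_j)
            for k, w_jik in enumerate(w_ji))
        for w_j in weights
    ]


def global_feature(points, weights):
    """Apply the same layer to every point, then max-pool over the points."""
    per_point = [kan_layer(p, weights) for p in points]
    return [max(column) for column in zip(*per_point)]


rng = random.Random(0)
d_in, d_out, degree = 3, 4, 2
weights = [[[rng.uniform(-1, 1) for _ in range(degree + 1)]
            for _ in range(d_in)] for _ in range(d_out)]
points = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(5)]

feature = global_feature(points, weights)
feature_reversed = global_feature(points[::-1], weights)
print(feature == feature_reversed)
```

Because the layer is shared across points and the maximum ignores order, reordering the point set leaves the global feature unchanged, which is the permutation invariance PointNet relies on.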
Article 241
Title@2025-06-11 (3): Function-Guided Conditional Generation Using Protein Language Models with Adapters
Title: Function-Guided Conditional Generation Using Protein Language Models with Adapters | Funktionsgeführte Conditional Generation Verwendung von Protein Language Models mit Adaptern | 使用配有适应器的蛋白语言模型的 函数引导有条件生成 2410.03634v2 |
Authors (4): Jason Yang, Aadyot Bhatnagar, Jeffrey A. Ruffolo, Ali Madani
The conditional generation of proteins with desired functions is a key goal for generative models. Existing methods based on prompting of protein language models (PLMs) can generate proteins conditioned on a target functionality, such as a desired enzyme family. However, these methods are limited to simple, tokenized conditioning and have not been shown to generalize to unseen functions. In this study, we propose ProCALM (Protein Conditionally Adapted Language Model), an approach for the conditional generation of proteins using adapters to PLMs. While previous methods have used adapters for structure-conditioned generation from PLMs, our implementation of ProCALM involves finetuning ProGen2 to condition generation based on versatile representations of protein function, e.g., enzyme family, taxonomy, or natural language descriptions. ProCALM matches or exceeds the performance of existing methods at conditional sequence generation from target functions. Impressively, it can also generalize to rare and unseen functions. Overall, ProCALM is a flexible and computationally efficient approach, and we expect that it can be extended to a wide range of generative language models.
具有理想功能的有条件蛋白质的生成是基因模型的一个关键目标; 以刺激蛋白语言模型(PLMs)为基础的现有方法可以产生以目标功能为条件的蛋白质,例如理想的酶系; 然而,这些方法仅限于简单、象征性的调节,没有被显示为可概括的功能; 在这项研究中,我们提议了ProCALM(有条件适应语言模型),一种使用适应器生成有条件蛋白质的方法; 虽然以前的方法已经使用适应器从PLMs中产生结构成型的适应器,但我们实施ProCALM(ProCALM)需要根据蛋白质功能的多功能(例如:酶族、分类或自然语言描述)对ProGen2进行微调,以条件生成。ProCALM在设定目标功能的有条件序列生成时匹配或超过现有方法的功能。 令人印象深刻地说,它也可以概括到稀有和看不见的功能。 总体来说,ProCALM是一种灵活和计算高效的方法,我们期望它能够扩大到广泛的基因化语言模型。
Article 242
Title@2025-06-11 (3): Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods
Title: Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods | Prompt Angriffe Enthüllen Oberflächliche Wissensentfernung in unlearning Methoden | 以不学习方法清除知识 2506.10236v1 |
Authors (4): Yeonwoo Jang, Shariqah Hossain, Ashwin Sreevatsa, Diogo Cruz
In this work, we show that some machine unlearning methods may fail when subjected to straightforward prompt attacks. We systematically evaluate eight unlearning techniques across three model families, and employ output-based, logit-based, and probe analysis to determine to what extent supposedly unlearned knowledge can be retrieved. While methods like RMU and TAR demonstrate robust unlearning, ELM remains vulnerable to specific prompt attacks (e.g., Hindi filler text in original prompt recovering 57.3% accuracy). Our logit analysis also confirms that unlearned models are generally not hiding knowledge by modifying the way the answer is formatted, as the correlation between output and logit accuracy is strong. These results challenge prevailing assumptions about unlearning effectiveness and highlight the need for evaluation frameworks that can reliably distinguish between true knowledge removal and superficial output suppression. We also make our evaluation framework publicly available so that prompting techniques for retrieving unlearned knowledge can be evaluated easily.
在这项工作中,我们发现,当受到直截了当的迅速攻击时,一些机器的不学习方法可能会失败。我们系统地评估了三个模式家庭8种不学习技术,并采用了基于产出的、基于登录的和基于调查的分析,以确定在多大程度上可以检索到据称没有获得的知识。RMU和TAR等方法显示的是强有力的不学习,而ELM则仍然容易受到具体的迅速攻击(例如,印地语填充器的原始快速恢复57.3%的准确性)。我们的对日志的分析还证实,由于产出和登录准确性之间的关联性很强,未学习模式通常不会通过改变答案的格式来隐藏知识。这些结果挑战了关于不学习有效性的普遍假设,并强调了评价框架的必要性,这种框架能够可靠地区分真正的知识删除和表面禁止产出。我们还公开提供我们的评价框架,以方便地评价检索不学习知识的快速技术。
Article 243
Title@2025-06-11 (3): LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation
Title: LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation | LaMAGIC2: Erweiterte Schaltungsformulierungen für sprachmodellbasierte analoge Topologie-Generierung | LaMAGIC2:语言模拟模拟模拟地形生成的先进电路配制 2506.10235v1 |
Authors (5): Chen-Chia Chang, Wan-Hsuan Lin, Yikang Shen, Yiran Chen, Xin Zhang
Automating analog topology design is crucial because modern applications have customized requirements that currently demand heavy manual engineering effort. The state-of-the-art work applies a sequence-to-sequence approach and supervised finetuning on language models to generate topologies given user specifications. However, its circuit formulation is inefficient due to $O(|V|^2)$ token length and suffers from low precision sensitivity to numeric inputs. In this work, we introduce LaMAGIC2, a succinct float-input canonical formulation with identifier (SFCI) for language model-based analog topology generation. SFCI addresses these challenges by improving component-type recognition through identifier-based representations, reducing token length complexity to $O(|V|)$, and enhancing numeric precision sensitivity for better performance under tight tolerances. Our experiments demonstrate that LaMAGIC2 achieves 34% higher success rates under a tight tolerance of 0.01 and 10X lower MSEs compared to a prior method. LaMAGIC2 also exhibits better transferability for circuits with more vertices with up to 58.5% improvement. These advancements establish LaMAGIC2 as a robust framework for analog topology generation.
模拟电路拓扑设计的自动化至关重要,因为现代应用的定制化需求目前依赖大量人工工程。最先进的工作采用序列到序列方法,对语言模型进行有监督微调,以根据用户规格生成拓扑。然而,其电路表示因 $O(|V|^2)$ 的 token 长度而效率低下,并且对数值输入的精度敏感性较低。在这项工作中,我们提出 LaMAGIC2,一种用于基于语言模型的模拟拓扑生成的、带标识符的简洁浮点输入规范表示(SFCI)。SFCI 通过基于标识符的表示改进元件类型识别,将 token 长度复杂度降低到 $O(|V|)$,并增强数值精度敏感性,从而在严格容差下取得更好的性能。我们的实验表明,与先前方法相比,LaMAGIC2 在 0.01 的严格容差下成功率高出 34%,MSE 低 10 倍。LaMAGIC2 还对顶点更多的电路表现出更好的可迁移性,改进幅度最高达 58.5%。这些进展使 LaMAGIC2 成为模拟拓扑生成的稳健框架。
Article 244
Title@2025-06-11 (3): Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices
Title: Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices | Große Sprachmodelle sichern: Bedrohungen, Schwachstellen und verantwortungsvolle Praktiken | 保障大语言模式:威胁、脆弱性和负责任做法 2403.12503v2 |
Authors (5): Sara Abdali, Richard Anarfi, CJ Barberan, Jia He, Erfan Shayegani
Large language models (LLMs) have significantly transformed the landscape of Natural Language Processing (NLP). Their impact extends across a diverse spectrum of tasks, revolutionizing how we approach language understanding and generation. Nevertheless, alongside their remarkable utility, LLMs introduce critical security and risk considerations. These challenges warrant careful examination to ensure responsible deployment and safeguard against potential vulnerabilities. This research paper thoroughly investigates security and privacy concerns related to LLMs from five thematic perspectives: security and privacy concerns, vulnerabilities against adversarial attacks, potential harms caused by misuses of LLMs, mitigation strategies to address these challenges while identifying limitations of current strategies. Lastly, the paper recommends promising avenues for future research to enhance the security and risk management of LLMs.
大型语言模型(LLMs)极大地改变了自然语言处理(NLP)的格局,其影响跨越了各种各样的任务,改变了我们对待语言理解和几代人的方式,然而,LLMs除了具有非凡的效用外,还引入了重要的安全和风险考虑,这些挑战值得仔细研究,以确保负责任地部署和防范潜在的脆弱性,本研究文件从五个主题角度彻底调查与LLMs有关的安全和隐私问题:安全和隐私关切、对抗性攻击的脆弱性、滥用LLMs可能造成的伤害、应对这些挑战的缓解战略,同时查明目前战略的局限性;最后,该文件建议今后开展研究以加强LLMs的安全和风险管理的可行途径。
Article 245
Title@2025-06-11 (3): ScoreMix: Improving Face Recognition via Score Composition in Diffusion Generators
Title: ScoreMix: Improving Face Recognition via Score Composition in Diffusion Generators | ScoreMix: Verbesserung der Gesichtserkennung über Score-Komposition in Diffusionsgeneratoren | 分数组合:通过传播发电机的分分数构成改善面对面的承认 2506.10226v1 |
Authors (2): Parsa Rahimi, Sebastien Marcel
In this paper, we propose ScoreMix, a novel yet simple data augmentation strategy leveraging the score compositional properties of diffusion models to enhance discriminator performance, particularly under scenarios with limited labeled data. By convexly mixing the scores from different class-conditioned trajectories during diffusion sampling, we generate challenging synthetic samples that significantly improve discriminative capabilities in all studied benchmarks. We systematically investigate class-selection strategies for mixing and discover that greater performance gains arise when combining classes distant in the discriminator’s embedding space, rather than close in the generator’s condition space. Moreover, we empirically show that, under standard metrics, the correlation between the generator’s learned condition space and the discriminator’s embedding space is minimal. Our approach achieves notable performance improvements without extensive parameter searches, demonstrating practical advantages for training discriminative models while effectively mitigating problems regarding collections of large datasets. Paper website: https://parsa-ra.github.io/scoremix
在本文中,我们提出 ScoreMix,这是一种新颖而简单的数据增强策略,它利用扩散模型的分数组合特性来提升判别器的性能,尤其是在标注数据有限的场景下。通过在扩散采样过程中对不同类别条件轨迹的分数进行凸组合,我们生成了具有挑战性的合成样本,在所有研究的基准上显著提高了判别能力。我们系统地研究了用于混合的类别选择策略,发现当组合的类别在判别器的嵌入空间中相距较远(而不是在生成器的条件空间中相近)时,性能提升更大。此外,我们通过实验表明,在标准度量下,生成器学到的条件空间与判别器的嵌入空间之间的相关性极小。我们的方法无需大量参数搜索即可取得显著的性能提升,展示了训练判别模型的实际优势,同时有效缓解了大规模数据集收集方面的问题。论文网站:https://parsa-ra.github.io/scoremix
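The convex score mixing described in the ScoreMix abstract can be illustrated with a toy sketch. The unit-variance Gaussian score below is only a stand-in for a diffusion model's learned class-conditioned score, and the names `gaussian_score`, `mix_scores`, and `lam` are hypothetical, not taken from the paper.

```python
# Illustrative sketch only: convex mixing of two class-conditioned scores.
# A unit-variance Gaussian score stands in for a learned score network;
# all function and variable names here are hypothetical.

def gaussian_score(x, mean):
    """Score (gradient of the log-density) of a unit-variance Gaussian at x."""
    return [-(xi - mi) for xi, mi in zip(x, mean)]

def mix_scores(score_a, score_b, lam):
    """Convex combination lam * score_a + (1 - lam) * score_b."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(score_a, score_b)]

x = [0.5, -1.0]
mean_a, mean_b = [2.0, 0.0], [-2.0, 4.0]   # two toy "class" means
mixed = mix_scores(gaussian_score(x, mean_a), gaussian_score(x, mean_b), lam=0.25)

# For unit-variance Gaussians the mixed score equals the score of a Gaussian
# centred at the convex combination of the two means.
blended_mean = [0.25 * a + 0.75 * b for a, b in zip(mean_a, mean_b)]
expected = gaussian_score(x, blended_mean)
```

In an actual sampler the mixed score would replace the single-class score at each denoising step; how the paper schedules or selects the mixing weight is not stated in the abstract.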
Article 246
Title@2025-06-11 (3): Meta-learning Optimizers for Communication-Efficient Learning
Title: Meta-learning Optimizers for Communication-Efficient Learning | Meta-Learning-Optimierer für Kommunikations-Effizientes Lernen | 通信-高效学习的元学习优化工具 2312.02204v2 |
Authors (5): Charles-Étienne Joseph, Benjamin Thérien, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky
Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches compute multiple gradient steps locally on each worker, before averaging model parameters, helping relieve the critical communication bottleneck in distributed deep learning training. Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art adaptive optimizers for deep learning. In this work, we investigate if the recent progress in the emerging area of learned optimizers can potentially close this gap in homogeneous data and homogeneous device settings while remaining communication-efficient. Specifically, we meta-learn how to perform global updates given an update from local SGD iterations. Our results demonstrate that learned optimizers can substantially outperform local SGD and its sophisticated variants while maintaining their communication efficiency. Our learned optimizers can even generalize to unseen and much larger datasets and architectures, including ImageNet and ViTs, and to unseen modalities such as language modeling. We therefore show the potential of learned optimizers for improving communication-efficient distributed learning.
SGD, 特别是本地 SGD 的通信效率变异器近年来受到极大关注。 这些方法在平均模型参数之前,在当地计算每个工人的多重梯度步骤,帮助减轻分布式深层学习培训中关键的通信瓶颈。 虽然提出了许多变异器,但有时它们可能落后于最先进的适应性优化系统,以便深层学习。在这项工作中,我们调查在新兴的有知识的优化器领域最近取得的进展能否在保持通信效率的同时,缩小同质数据和同质设备设置方面的差距。具体地说,我们利用本地SGD的更新,将如何进行全球更新。我们的成果表明,学习的优化器在保持通信效率的同时,可以大大优于本地的 SGD及其尖端变异器。我们学到的优化器甚至可以概括为看不见的和更大的数据集和结构,包括图像网和ViT,以及语言模型等隐形模式。 因此,我们展示了学习的优化器在改进传播效率的学习方面的潜力。
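As a minimal sketch of the local SGD baseline the abstract refers to, the loop below runs several gradient steps on each worker and then averages the resulting parameters. The scalar quadratic losses and the names `local_sgd`, `centers`, and `local_steps` are illustrative assumptions; the learned optimizer studied in the paper would replace the plain averaging update marked in the comment.

```python
# Minimal local SGD sketch on toy scalar problems (illustrative assumption:
# worker k minimises 0.5 * (w - c_k)^2, so its gradient is w - c_k).

def local_sgd(centers, w0=0.0, rounds=50, local_steps=4, lr=0.1):
    w = w0
    for _ in range(rounds):
        local_models = []
        for c in centers:
            w_k = w                      # every worker starts from the global model
            for _ in range(local_steps):
                w_k -= lr * (w_k - c)    # local gradient step, no communication
            local_models.append(w_k)
        # One communication round: average the local models.
        delta = sum(local_models) / len(local_models) - w
        # Plain local SGD applies the averaged update directly; a learned
        # optimizer would instead map `delta` to the global update.
        w = w + delta
    return w

w_final = local_sgd([1.0, 2.0, 6.0])
```

With identical step counts and learning rates on every worker, this toy setup contracts toward the mean of the per-worker minimisers, which is the minimiser of the averaged loss.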
Article 247
Title@2025-06-11 (3): Neural Networks Generalize on Low Complexity Data
Title: Neural Networks Generalize on Low Complexity Data | Neurale Netzwerke verallgemeinern auf Daten mit geringer Komplexität | 通用低复杂度数据神经网络 2409.12446v3 |
Authors (2): Sourav Chatterjee, Timothy Sudijono
We show that feedforward neural networks with ReLU activation generalize on low complexity data, suitably defined. Given i.i.d. data generated from a simple programming language, the minimum description length (MDL) feedforward neural network which interpolates the data generalizes with high probability. We define this simple programming language, along with a notion of description length of such networks. We provide several examples on basic computational tasks, such as checking primality of a natural number. For primality testing, our theorem shows the following and more. Suppose that we draw an i.i.d. sample of $n$ numbers uniformly at random from $1$ to $N$. For each number $x_i$, let $y_i = 1$ if $x_i$ is a prime and $0$ if it is not. Then, the interpolating MDL network accurately answers, with probability $1- O((\ln N)/n)$, whether a newly drawn number between $1$ and $N$ is a prime or not. Note that the network is not designed to detect primes; minimum description learning discovers a network which does so. Extensions to noisy data are also discussed, suggesting that MDL neural network interpolators can demonstrate tempered overfitting.
我们证明,在适当定义的低复杂度数据上,采用 ReLU 激活的前馈神经网络能够泛化。给定由一种简单编程语言生成的独立同分布(i.i.d.)数据,对数据进行插值的最小描述长度(MDL)前馈神经网络以高概率泛化。我们定义了这种简单编程语言,以及此类网络的描述长度概念。我们给出了若干基本计算任务的例子,例如检验自然数是否为素数。对于素性检验,我们的定理表明如下结论(以及更多)。假设我们从 $1$ 到 $N$ 中均匀随机抽取 $n$ 个数构成 i.i.d. 样本。对每个数 $x_i$,若 $x_i$ 是素数则令 $y_i = 1$,否则令 $y_i = 0$。那么,插值的 MDL 网络能以 $1- O((\ln N)/n)$ 的概率正确判断从 $1$ 到 $N$ 之间新抽取的数是否为素数。注意,该网络并非为检测素数而设计;最小描述学习自行发现了能够做到这一点的网络。文中还讨论了向含噪数据的扩展,表明 MDL 神经网络插值器可以表现出温和过拟合(tempered overfitting)。
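The primality setup in the abstract (an i.i.d. uniform sample from 1 to N with labels y_i = 1 for primes) can be reproduced with a short sketch. The helper names `is_prime` and `draw_sample` and the fixed seed are illustrative assumptions; the sketch builds only the labeled data, not the MDL network.

```python
import random

def is_prime(x):
    """Trial division; adequate for the small N used in this illustration."""
    if x < 2:
        return False
    i = 2
    while i * i <= x:
        if x % i == 0:
            return False
        i += 1
    return True

def draw_sample(n, N, seed=0):
    """Draw n numbers uniformly at random from 1..N and label primes with 1."""
    rng = random.Random(seed)          # seeded only to make the sketch repeatable
    xs = [rng.randint(1, N) for _ in range(n)]
    return [(x, 1 if is_prime(x) else 0) for x in xs]

sample = draw_sample(n=200, N=1000)
```

The stated guarantee is that an interpolating MDL network trained on such a sample errs on a fresh draw with probability O((ln N)/n), so the bound tightens as n grows relative to ln N.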
Article 248
Title@2025-06-11 (3): The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks
Title: The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks | Die Auswirkungen von Feature Scaling im maschinellen Lernen: Auswirkungen auf Regressions- und Klassifizierungsaufgaben | 机械学习中的特质增强效果:对倒退和分类任务的影响 2506.08274v2 |
Authors (7): João Manoel Herrera Pinheiro, Suzana Vilas Boas de Oliveira, Thiago Henrique Segreto Silva, Pedro Antonio Rabelo Saraiva, Enzo Ferreira de Souza, Leonardo André Ambrosio, Marcelo Becker
This research addresses the critical lack of comprehensive studies on feature scaling by systematically evaluating 12 scaling techniques - including several less common transformations - across 14 different Machine Learning algorithms and 16 datasets for classification and regression tasks. We meticulously analyzed impacts on predictive performance (using metrics such as accuracy, MAE, MSE, and $R^2$) and computational costs (training time, inference time, and memory usage). Key findings reveal that while ensemble methods (such as Random Forest and gradient boosting models like XGBoost, CatBoost and LightGBM) demonstrate robust performance largely independent of scaling, other widely used models such as Logistic Regression, SVMs, TabNet, and MLPs show significant performance variations highly dependent on the chosen scaler. This extensive empirical analysis, with all source code, experimental results, and model parameters made publicly available to ensure complete transparency and reproducibility, offers model-specific crucial guidance to practitioners on the need for an optimal selection of feature scaling techniques.
这项研究通过系统评估14种不同的机器学习算法和16个数据集用于分类和回归任务的12种缩放技术(包括若干较不常见的变换)和16个数据集,解决了严重缺乏关于特征缩放的全面研究的问题。我们仔细分析了对预测性业绩的影响(使用精确度、MAE、MSE和$R%2美元等指标)和计算成本(培训时间、推算时间和记忆使用)和计算成本的影响(培训时间、试验结果和记忆使用)。主要调查结果显示,尽管混合方法(如随机森林和梯度加速模型,如XGBoost、CatBoost和LightGBM)显示强健的性能基本上独立于缩放,但其他广泛使用的模式(如物流递增、SVMS、TabNet和MLPs)显示,显著的性能差异在很大程度上取决于选定的缩放尺度。这一广泛的实证分析提供了所有源代码、实验结果和模型参数,以确保完全透明和可复制性,向从业人员提供关于最佳选择特征缩放技术需要的模型的关键指导。
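To make the notion of a scaler concrete, here is a small sketch of two common techniques, standardization and min-max scaling, written with the standard library only. The abstract does not list which 12 techniques were evaluated, so these two are illustrative choices and the function names are hypothetical.

```python
import statistics

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]   # constant feature: nothing to scale
    return [(v - mean) / std for v in values]

def min_max(values):
    """Rescale a feature linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

feature = [2.0, 4.0, 10.0]
z_scores = standardize(feature)
unit_range = min_max(feature)
```

Tree ensembles split on thresholds, so monotone rescalings like these leave their splits essentially unchanged, which is consistent with the robustness the abstract reports for Random Forest and gradient boosting models.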
Article 249
Title@2025-06-11 (3): SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design
Title: SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design | SALAD: Systematische Bewertung des maschinellen Verlernens im LLM-gestützten Hardware-Design | SALAD: 系统评估LLM辅助硬件设计中的机器遗忘 2506.02089v2 |
Authors (9): Zeng Wang, Minghao Shao, Rupesh Karn, Likhitha Mankali, Jitendra Bhandari, Ramesh Karri, Ozgur Sinanoglu, Muhammad Shafique, Johann Knechtel
Large Language Models (LLMs) offer transformative capabilities for hardware design automation, particularly in Verilog code generation. However, they also pose significant data security challenges, including Verilog evaluation data contamination, intellectual property (IP) design leakage, and the risk of malicious Verilog generation. We introduce SALAD, a comprehensive assessment that leverages machine unlearning to mitigate these threats. Our approach enables the selective removal of contaminated benchmarks, sensitive IP and design artifacts, or malicious code patterns from pre-trained LLMs, all without requiring full retraining. Through detailed case studies, we demonstrate how machine unlearning techniques effectively reduce data security risks in LLM-aided hardware design.
大型语言模型(LLMS)为硬件设计自动化提供了变革能力,特别是在Verilog代码生成方面,但也带来了数据安全方面的重大挑战,包括Verilog评价数据污染、知识产权设计渗漏和恶意Verilog生成的风险。我们引入了SALAD(SALAD)综合评估,该评估利用机器退出学习来减轻这些威胁。我们的方法使得有选择地将受污染的基准、敏感的IP和设计文物或恶意代码模式从预先培训的LMS中去除,而无需充分再培训。通过详细的案例研究,我们展示了机器不学习技术如何有效减少LLM辅助硬件设计的数据安全风险。
Article 250
Title@2025-06-11 (3): Cross-Learning Between ECG and PCG: Exploring Common and Exclusive Characteristics of Bimodal Electromechanical Cardiac Waveforms
Title: Cross-Learning Between ECG and PCG: Exploring Common and Exclusive Characteristics of Bimodal Electromechanical Cardiac Waveforms | Cross-Learning zwischen EKG und PCG: Erforschung gemeinsamer und exklusiver Eigenschaften bimodaler elektromechanischer Herzwellenformen | ECG和PCG之间的交叉学习:探索双式电动机械心脏病波形的常见和独有特点 2506.10212v1 |
Authors (4): Sajjad Karimi, Amit J. Shah, Gari D. Clifford, Reza Sameni
Simultaneous electrocardiography (ECG) and phonocardiogram (PCG) provide a comprehensive, multimodal perspective on cardiac function by capturing the heart’s electrical and mechanical activities, respectively. However, the distinct and overlapping information content of these signals, as well as their potential for mutual reconstruction and biomarker extraction, remains incompletely understood, especially under varying physiological conditions and across individuals. In this study, we systematically investigate the common and exclusive characteristics of ECG and PCG using the EPHNOGRAM dataset of simultaneous ECG-PCG recordings during rest and exercise. We employ a suite of linear and nonlinear machine learning models, including non-causal LSTM networks, to reconstruct each modality from the other and analyze the influence of causality, physiological state, and cross-subject variability. Our results demonstrate that nonlinear models, particularly non-causal LSTM, provide superior reconstruction performance, with reconstructing ECG from PCG proving more tractable than the reverse. Exercise and cross-subject scenarios present significant challenges, but envelope-based modeling that utilizes instantaneous amplitude features substantially improves cross-subject generalizability for cross-modal learning. Furthermore, we demonstrate that clinically relevant ECG biomarkers, such as fiducial points and QT intervals, can be estimated from PCG in cross-subject settings. These findings advance our understanding of the relationship between electromechanical cardiac modalities, in terms of both waveform characteristics and the timing of cardiac events, with potential applications in novel multimodal cardiac monitoring technologies.
同步心电图(ECG)和心音图(PCG)分别捕捉心脏的电活动和机械活动,为心脏功能提供了全面的多模态视角。然而,这两种信号各自独有和相互重叠的信息内容,以及它们相互重建和提取生物标志物的潜力,仍未得到充分理解,尤其是在不同生理条件下和不同个体之间。在本研究中,我们利用 EPHNOGRAM 数据集(静息和运动期间同步采集的 ECG-PCG 记录),系统地研究 ECG 和 PCG 的共有特征与独有特征。我们采用一系列线性和非线性机器学习模型(包括非因果 LSTM 网络),由一种模态重建另一种模态,并分析因果性、生理状态和跨受试者差异的影响。结果表明,非线性模型(尤其是非因果 LSTM)具有更优的重建性能,且由 PCG 重建 ECG 比反向重建更容易。运动和跨受试者场景带来了显著挑战,但利用瞬时幅度特征的基于包络的建模大幅提高了跨模态学习的跨受试者泛化能力。此外,我们证明,在跨受试者设置下,可以由 PCG 估计具有临床意义的 ECG 生物标志物,例如基准点和 QT 间期。这些发现从波形特征和心脏事件时序两方面加深了我们对心脏电-机械模态之间关系的理解,并有望应用于新型多模态心脏监测技术。
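The abstract mentions envelope-based modeling with instantaneous amplitude features. One common way to obtain an instantaneous amplitude is the magnitude of the analytic signal (Hilbert transform); whether the paper uses exactly this construction is an assumption. The sketch below uses a naive O(n^2) DFT to stay dependency-free, and the names `dft` and `envelope` are hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive discrete Fourier transform; fine for short illustrative signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def envelope(signal):
    """Instantaneous amplitude as the magnitude of the analytic signal."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue                     # keep DC and Nyquist bins unchanged
        # Double positive frequencies, zero out negative frequencies.
        spectrum[k] = 2 * spectrum[k] if k < (n + 1) // 2 else 0.0
    return [abs(z) for z in dft(spectrum, inverse=True)]

n = 32
tone = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
amp = envelope(tone)   # a pure tone has a flat, unit-amplitude envelope
```

For real recordings an FFT-based routine would be the practical choice; the point here is only that the envelope discards fine waveform detail while keeping slowly varying amplitude, which plausibly varies less across subjects.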
Article 251
Title@2025-06-11 (3): Failure Modes of LLMs for Causal Reasoning on Narratives
Title: Failure Modes of LLMs for Causal Reasoning on Narratives | Failure Modes von LLMs für die ursächliche Begründung von Narrativen | 以叙述为由解释原因的LLMs失败模式 2410.23884v4 |
Authors (5): Khurram Yamin, Shantanu Gupta, Gaurav R. Ghosal, Zachary C. Lipton, Bryan Wilder
In this work, we investigate the causal reasoning abilities of large language models (LLMs) through the representative problem of inferring causal relationships from narratives. We find that even state-of-the-art language models rely on unreliable shortcuts, both in terms of the narrative presentation and their parametric knowledge. For example, LLMs tend to determine causal relationships based on the topological ordering of events (i.e., earlier events cause later ones), resulting in lower performance whenever events are not narrated in their exact causal order. Similarly, we demonstrate that LLMs struggle with long-term causal reasoning and often fail when the narratives are long and contain many events. Additionally, we show LLMs appear to rely heavily on their parametric knowledge at the expense of reasoning over the provided narrative. This degrades their abilities whenever the narrative opposes parametric knowledge. We extensively validate these failure modes through carefully controlled synthetic experiments, as well as evaluations on real-world narratives. Finally, we observe that explicitly generating a causal graph generally improves performance while naive chain-of-thought is ineffective. Collectively, our results distill precise failure modes of current state-of-the-art models and can pave the way for future techniques to enhance causal reasoning in LLMs.
在这项工作中,我们调查大型语言模型(LLMs)的因果推理能力,方法是通过从叙述中推断因果关系这一具有代表性的问题来推断大语言模型(LLMs)的因果推理能力;我们发现,即使是最先进的语言模型也依赖不可靠的捷径,无论是叙述性陈述还是其参数知识;例如,LLMs往往根据事件的地形顺序确定因果关系(即早期事件导致后来的事件),因此,如果事件没有按其确切的因果顺序加以说明,其性能就会下降;同样,我们证明LLMs与长期因果推理斗争,当叙述很长且包含许多事件时往往失败。此外,我们显示LLMs似乎严重依赖其准度知识,而牺牲了对所提供的叙述性陈述的推理。这削弱了他们的能力,每当叙述性说明反对参数知识时,我们通过仔细控制的合成实验以及实际世界叙述性叙述性来广泛验证这些失败模式。最后,我们指出,明确产生因果关系的图表通常会改善业绩,而天真的思维链是无效的。从整体上看,我们的结果会保持准确的失败模式,从而强化了当前磁号模型的推算方法,可以加强未来的推导法。
Article 252
Title@2025-06-11 (3): SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models
Title: SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models | SealQA: Anhebung der Messlatte für die Vernunft in Search-Augmented Language Models | SealQA: 提高搜索增强语言模式中的原因栏 2506.01062v2 |
Authors (6): Thinh Pham, Nguyen Nguyen, Pratibha Zunjare, Weiyuan Chen, Yu-Min Tseng, Tu Vu
We introduce SealQA, a new challenge benchmark for evaluating SEarch-Augmented Language models on fact-seeking questions where web search yields conflicting, noisy, or unhelpful results. SealQA comes in three flavors: (1) Seal-0 (main) and (2) Seal-Hard, which assess factual accuracy and reasoning capabilities, with Seal-0 focusing on the most challenging questions where chat models (e.g., GPT-4.1) typically achieve near-zero accuracy; and (3) LongSeal, which extends SealQA to test long-context, multi-document reasoning in “needle-in-a-haystack” settings. Our evaluation reveals critical limitations in current models: even frontier LLMs perform poorly across all SealQA flavors. On Seal-0, frontier agentic models equipped with tools, such as o3 and o4-mini, achieve only 17.1% and 6.3% accuracy, respectively, at their best reasoning efforts. We find that advanced reasoning models such as DeepSeek-R1-671B and o3-mini are highly vulnerable to noisy search results. Notably, increasing test-time compute does not yield reliable gains across o3-mini, o4-mini, and o3, with performance often plateauing or even declining early. Additionally, while recent models are less affected by the “lost-in-the-middle” issue, they still fail to reliably identify relevant documents in LongSeal when faced with numerous distractors. To facilitate future work, we release SealQA at huggingface.co/datasets/vtllms/sealqa.
我们引入了Search-Auged 语言模型, 用于评估网络搜索结果相互冲突、吵闹或无益的实况调查问题的新挑战基准。 SealQA 以三种口味出现:(1) Seal-0 (main) 和(2) Seal-Hard, 评估真实准确性和推理能力, 由Seal-0 侧重于聊天模式(例如GPT-4.1)通常达到接近零准确度的最具有挑战性的问题;(3) LongSeal, 将SealQA 扩展至测试“需要-a-haystack” 设置中的长相文本、多文档推理。 我们的评估揭示了当前模式中的关键局限性:(1) 甚至边境LLMS在所有Seal 口味上表现不佳。 在Seil-0, 配有O3 和 o4 工具的边境代理模型分别只达到17.1% 和6.3% 准确度, 其最佳推理工作。 我们发现, DeepSeek- R1-671B 和 o3 Mini 等高级推理模型仍然极易受到冷测的搜索结果的伤害。 。 。 明显, 不断的测试- 分析- 和不断下降的收益 , 无法产生 。
Article 253
Title@2025-06-11 (3): ETS: Efficient Tree Search for Inference-Time Scaling
Title: ETS: Efficient Tree Search for Inference-Time Scaling | ETS: Effiziente Baumsuche nach Inferenz-Zeitskalierung | ETS: 高效树搜索推断时间缩放 2502.13575v2 |
Authors (10): Coleman Hooper, Sehoon Kim, Suhong Moon, Kerem Dilmen, Monishwaran Maheswaran, Nicholas Lee, Michael W. Mahoney, Sophia Shao, Kurt Keutzer, Amir Gholami
Test-time compute scaling has emerged as a new axis along which to improve model accuracy, where additional computation is used at inference time to allow the model to think longer for more challenging problems. One promising approach for test-time compute scaling is search against a process reward model, where a model generates multiple potential candidates at each step of the search, and these partial trajectories are then scored by a separate reward model in order to guide the search process. The diversity of trajectories in the tree search process affects the accuracy of the search, since increasing diversity promotes more exploration. However, this diversity comes at a cost, as divergent trajectories have less KV sharing, which means they consume more memory and slow down the search process. Previous search methods either do not perform sufficient exploration, or else explore diverse trajectories but have high latency. We address this challenge by proposing Efficient Tree Search (ETS), which promotes KV sharing by pruning redundant trajectories while maintaining necessary diverse trajectories. ETS incorporates a linear programming cost model to promote KV cache sharing by penalizing the number of nodes retained, while incorporating a semantic coverage term into the cost model to ensure that we retain trajectories which are semantically different. We demonstrate how ETS can achieve 1.8$\times$ reduction in average KV cache size during the search process, leading to 1.4$\times$ increased throughput relative to prior state-of-the-art methods, with minimal accuracy degradation and without requiring any custom kernel implementation. Code is available at: https://github.com/SqueezeAILab/ETS.
测试时计算缩放是一个新的轴轴,沿着这个轴来提高模型的准确性,在这个轴上,在推论时间使用额外的计算方法,让模型更长时间思考更具有挑战性的问题。一个有希望的测试时计算缩放方法就是对一个过程奖励模型进行搜索,一个模型在搜索的每一步都产生多种潜在候选人,然后这些部分轨迹通过一个单独的奖赏模型来评分,以指导搜索进程。树上搜索过程中的轨迹多样性影响到搜索的准确性,因为增加的多样化有助于更多的探索。然而,这种多样性是成本产生的,因为不同的轨迹较少地让模型对更具挑战性的问题进行思考。不同的轨迹显示,这意味着它们消耗更多的记忆力和减慢搜索过程。 以前的搜索方法要么没有进行充分的探索,或者探索不同的轨迹图,但是我们提出高效的树木搜索(ETS),通过运行多余的美元轨迹来促进 KV的共享,同时保持必要的最低轨迹。 ETS 包含一个线性编程成本模型,用来促进KV缓存的缓存, 将我们保留在平时, 平时可以保留一个稳定的搜索时间里程。
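The trade-off in the ETS abstract (penalize retained nodes to encourage KV sharing, reward semantic coverage to keep diverse trajectories) can be illustrated on a tiny tree. This sketch is not the paper's linear programming formulation: it brute-forces all subsets, and the objective, the weights `lam_nodes` and `lam_cov`, and the cluster labels are illustrative assumptions.

```python
from itertools import combinations

def select_trajectories(paths, weights, clusters, lam_nodes, lam_cov):
    """Choose the subset of trajectories with the best toy objective:
    sum of reward weights - lam_nodes * (#distinct tree nodes retained)
    + lam_cov * (#semantic clusters covered). Brute force, small inputs only."""
    best_score, best_subset = 0.0, ()
    for r in range(1, len(paths) + 1):
        for subset in combinations(range(len(paths)), r):
            nodes = set().union(*(paths[i] for i in subset))   # shared prefixes count once
            covered = {clusters[i] for i in subset}
            score = (sum(weights[i] for i in subset)
                     - lam_nodes * len(nodes)
                     + lam_cov * len(covered))
            if score > best_score:
                best_score, best_subset = score, subset
    return best_subset

# Three partial trajectories: the first two share the prefix root -> x and fall
# in the same (hypothetical) semantic cluster; the third is semantically different.
paths = [("root", "x", "a"), ("root", "x", "b"), ("root", "y", "c")]
weights = [0.5, 0.2, 0.3]     # normalised reward-model scores (made up)
clusters = [0, 0, 1]
kept = select_trajectories(paths, weights, clusters, lam_nodes=0.25, lam_cov=0.5)
```

On this input the redundant low-weight sibling is pruned while the semantically distinct branch survives. A solver-based formulation, as the abstract describes, would be needed for realistic tree sizes.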
Article 254
Title@2025-06-11 (3): Towards Fair Representation: Clustering and Consensus
Title: Towards Fair Representation: Clustering and Consensus | Auf dem Weg zu einer fairen Vertretung: Clustering und Konsens | 实现公平代表性:集群和共识 2506.08673v2 |
Authors (5): Diptarka Chakraborty, Kushagra Chatterjee, Debarati Das, Tien Long Nguyen, Romina Nobahari
Consensus clustering, a fundamental task in machine learning and data analysis, aims to aggregate multiple input clusterings of a dataset, potentially based on different non-sensitive attributes, into a single clustering that best represents the collective structure of the data. In this work, we study this fundamental problem through the lens of fair clustering, as introduced by Chierichetti et al. [NeurIPS’17], which incorporates the disparate impact doctrine to ensure proportional representation of each protected group in the dataset within every cluster. Our objective is to find a consensus clustering that is not only representative but also fair with respect to specific protected attributes. To the best of our knowledge, we are the first to address this problem and provide a constant-factor approximation. As part of our investigation, we examine how to minimally modify an existing clustering to enforce fairness – an essential postprocessing step in many clustering applications that require fair representation. We develop an optimal algorithm for datasets with equal group representation and near-linear time constant factor approximation algorithms for more general scenarios with different proportions of two group sizes. We complement our approximation result by showing that the problem is NP-hard for two unequal-sized groups. Given the fundamental nature of this problem, we believe our results on Closest Fair Clustering could have broader implications for other clustering problems, particularly those for which no prior approximation guarantees exist for their fair variants.
集成共识是机器学习和数据分析的一项根本任务,其宗旨是将可能基于不同非敏感属性的数据集的多重输入群集汇总成一个最能代表数据集体结构的单一组群。在这项工作中,我们通过由Chierichetti et al.[NeurIPS’17]介绍的公平组群透镜研究这一根本问题,它包含了不同的影响理论,以确保每个受保护群体在每个组群的数据集中的比例代表比例。我们的目标是找到一个不仅具有代表性的共识组群,而且在特定受保护属性方面也是公平的。我们最了解的是,我们首先处理这一问题,并提供一个常态点近似。作为我们调查的一部分,我们研究如何尽量减少现有组群集的公平化镜头,以落实公平性,这是许多组群应用中一个基本的后处理步骤,需要公平代表性。我们为每个组群群中具有平等代表性的数据集和近线时常系数缩算法制定了一种最佳算法,用于两个组规模不同的更一般假设。我们通过显示问题的严重性,我们首先处理这一问题,提供不变的近似结果。作为我们调查的一部分,我们无法对两个不平等的组群体产生更深刻的影响。
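The fairness notion in the abstract, proportional representation of each protected group within every cluster, can be checked with a short sketch. The exact-equality test below is a strict illustrative reading of that definition (formulations such as the balance measure of Chierichetti et al. are stated differently), and the names `group_shares` and `is_proportionally_fair` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def group_shares(points, group_of):
    """Exact fraction of each protected group among the given points."""
    counts = Counter(group_of[p] for p in points)
    total = sum(counts.values())
    return {g: Fraction(c, total) for g, c in counts.items()}

def is_proportionally_fair(clusters, group_of):
    """True if every cluster reproduces the dataset-wide group proportions."""
    target = group_shares(list(group_of), group_of)
    return all(group_shares(c, group_of) == target for c in clusters)

# Toy dataset: four "red" and two "blue" points, i.e. a 2:1 ratio overall.
group_of = {0: "red", 1: "red", 2: "red", 3: "red", 4: "blue", 5: "blue"}
fair = is_proportionally_fair([[0, 1, 4], [2, 3, 5]], group_of)
unfair = is_proportionally_fair([[0, 1, 2, 3], [4, 5]], group_of)
```

The Closest Fair Clustering problem in the abstract then asks for the fewest changes that turn a clustering like the second one into one that passes a check of this kind.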
Article 255
Title@2025-06-11 (3): AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent
Title: AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent | AWP: Activation-Aware Weight Pruning und Quantisierung mit projiziertem Gradient Descent | AWP: 与预测的渐变人种相比,活性-敏量 体重 稳重和量化 2506.10205v1 |
Authors (5): Jing Liu, Toshiaki Koike-Akino, Ye Wang, Hassan Mansour, Matthew Brand
To address the enormous size of Large Language Models (LLMs), model compression methods, such as quantization and pruning, are often deployed, especially on edge devices. In this work, we focus on layer-wise post-training quantization and pruning. Drawing connections between activation-aware weight pruning and sparse approximation problems, and motivated by the success of Iterative Hard Thresholding (IHT), we propose a unified method for Activation-aware Weight pruning and quantization via Projected gradient descent (AWP). Our experiments demonstrate that AWP outperforms state-of-the-art LLM pruning and quantization methods. Theoretical convergence guarantees of the proposed method for pruning are also provided.
为解决大语言模型(LLMs)的巨大规模问题,通常会采用模型压缩方法,如量化和裁剪等,特别是在边缘设备上。在这项工作中,我们侧重于从层到层的训练后量化和裁剪。在激活-觉察重量裁剪和稀疏近近似问题之间建立联系,并受迭代硬推力控股的成功推动,我们提出了一种统一的方法,通过预测梯度下降进行活化-觉察重量裁剪和四分化。我们的实验表明,AWP优于最先进的LLM裁剪裁剪和四分化方法。我们还提供了拟议裁剪方法的理论趋同保证。
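The connection the abstract draws between activation-aware pruning and Iterative Hard Thresholding can be sketched for a single weight vector: minimize the output error 0.5 * ||Xw - Xv||^2 over k-sparse v by alternating a gradient step with a projection that keeps the k largest-magnitude entries. This is a generic IHT sketch under those assumptions, not the authors' implementation; the calibration matrix `X`, the step-size rule, and all function names are illustrative.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def hard_threshold(v, k):
    """Projection onto k-sparse vectors: keep the k largest-magnitude entries."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]

def output_error(X, w, v):
    """Activation-aware error 0.5 * ||X w - X v||^2."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(matvec(X, w), matvec(X, v)))

def prune_pgd(X, w, k, steps=200):
    Xt = [list(col) for col in zip(*X)]
    # 1 / ||X||_F^2 never exceeds 1 / lambda_max(X^T X), so the step is safe.
    step = 1.0 / sum(x * x for row in X for x in row)
    target = matvec(X, w)
    v = hard_threshold(w, k)             # start from plain magnitude pruning
    for _ in range(steps):
        residual = [a - b for a, b in zip(matvec(X, v), target)]
        grad = matvec(Xt, residual)      # gradient of the output error w.r.t. v
        v = hard_threshold([vi - step * g for vi, g in zip(v, grad)], k)
    return v

X = [[1.0, 0.9, 0.0], [0.0, 0.1, 2.0], [1.0, 1.0, 0.5]]   # toy activations
w = [1.0, -0.8, 0.3]                                       # dense weights
pruned = prune_pgd(X, w, k=2)
```

With this conservative step size each iteration cannot increase the output error relative to the magnitude-pruned starting point. Replacing the projection with rounding to a quantization grid would give the analogous quantization variant, again only as an illustration.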
Article 256
Title@2025-06-11 (3): VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Title: VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use | VTool-R1: VLMs lernen mit Bildern zu denken, indem sie mehr über multimodale Werkzeugnutzung lernen | VTool-R1:VLMs通过多模式工具使用强化学习学习如何用图像思考 2505.19255v3 |
Authors (9): Mingyuan Wu, Jingcheng Yang, Jize Jiang, Meitang Li, Kaizhuo Yan, Hanchao Yu, Minjia Zhang, Chengxiang Zhai, Klara Nahrstedt
Reinforcement Learning Finetuning (RFT) has significantly advanced the reasoning capabilities of large language models (LLMs) by enabling long chains of thought, self-correction, and effective tool use. While recent works attempt to extend RFT to vision-language models (VLMs), these efforts largely produce text-only reasoning conditioned on static image inputs, falling short of true multimodal reasoning in the response. In contrast, test-time methods like Visual Sketchpad incorporate visual steps but lack training mechanisms. We introduce VTool-R1, the first framework that trains VLMs to generate multimodal chains of thought by interleaving text and intermediate visual reasoning steps. VTool-R1 integrates Python-based visual editing tools into the RFT process, enabling VLMs to learn when and how to generate visual reasoning steps that benefit final reasoning. Trained with outcome-based rewards tied to task accuracy, our approach elicits strategic visual tool use for reasoning without relying on process-based supervision. Experiments on structured visual question answering over charts and tables show that VTool-R1 enhances reasoning performance by teaching VLMs to “think with images” and generate multimodal chains of thought with tools.
通过长期思维链、自我校正和有效工具的使用,大型语言模型(LLMS)的推理能力有了显著提高,最近的工作试图将RFT推广到视觉语言模型(VLMS),但这些努力基本上产生了仅以静态图像输入为条件的文本推理,没有达到真正的多式推理。相比之下,视觉Strachpad等试验时间方法包含视觉步骤,但缺乏培训机制。我们引入VTool-R1,这是培训VLMS通过互读文本和中间视觉推理步骤产生多式思维链的第一个框架。VTool-R1将Python的视觉编辑工具纳入RFTA进程,使VLMS能够学习何时以及如何产生有利于最终推理的视觉推理步骤。经过基于结果的奖励与任务精度挂钩,我们的方法引出战略视觉工具用于推理,而无需基于程序的监督。通过图表和表格对结构直观问题进行实验,显示VTool-R1通过教授VLMS公司“用图像思考”和产生多式思维链。
Article 257
Title@2025-06-11 (3): Prompt Variability Effects On LLM Code Generation
Title: Prompt Variability Effects On LLM Code Generation | Veränderliche Auswirkungen auf die LLM-Code-Generierung | 对LLM 代码生成的迅速易变性效应 2506.10204v1 |
Authors (5): Andrei Paleyes, Radzim Sendyka, Diana Robinson, Christian Cabrera, Neil D. Lawrence
Code generation is one of the most active areas of application of Large Language Models (LLMs). While LLMs lower barriers to writing code and accelerate the development process, the overall quality of generated programs depends on the quality of the given prompts. Specifically, the functionality and quality of generated code can be sensitive to the user’s background and familiarity with software development. It is therefore important to quantify LLMs’ sensitivity to variations in the input. To this end, we propose a synthetic evaluation pipeline for code generation with LLMs, as well as a systematic persona-based evaluation approach to expose qualitative differences in LLM responses dependent on prospective user background. Both proposed methods are completely independent of specific programming tasks and LLMs, and thus are widely applicable. We provide experimental evidence illustrating the utility of our methods and share our code for the benefit of the community.
生成代码是应用大语言模型最活跃的领域之一。虽然LLMs公司降低写法码和加速发展进程的障碍,但生成程序的总体质量取决于给定提示的质量。具体地说,生成代码的功能和质量对用户的背景和软件开发的熟悉度十分敏感,因此,必须量化LLM公司对投入差异的敏感度。为此,我们提议为与LLMs公司生成代码建立一个合成评价管道,并采用基于人的系统评价方法,以揭示LLM公司根据潜在用户背景做出的反应的质量差异。两种拟议方法都完全独立于具体的方案编制任务和LMS公司,因此广泛适用。我们提供实验性证据,说明我们的方法的效用,并分享我们的代码,以造福社区。
Article 258
Title@2025-06-11 (3): From Neural Representations to Interpretable Logic Rules
Title: From Neural Representations to Interpretable Logic Rules | Von neuralen Vertretungen zu interpretierbaren Logischen Regeln | 从神经代表到可解释逻辑规则 2501.08281v2 |
Authors (5): Chuqin Geng, Xiaojie Xu, Anqi Xing, Ziyu Zhao, Xujie Si
As deep neural networks continue to excel across various domains, their black-box nature has raised concerns about transparency and trust. In particular, interpretability has become increasingly essential for applications that demand high safety and knowledge rigor, such as drug discovery, autonomous driving, and genomics. However, progress in understanding even the simplest deep neural networks - such as fully connected networks - has been limited, despite their role as foundational elements in state-of-the-art models like ResNet and Transformer. In this paper, we address this challenge by introducing NeuroLogic, a novel approach for decoding interpretable logic rules from neural networks. NeuroLogic leverages neural activation patterns to capture the model’s critical decision-making processes, translating them into logical rules represented by hidden predicates. Thanks to its flexible design in the grounding phase, NeuroLogic can be adapted to a wide range of neural networks. For simple fully connected neural networks, hidden predicates can be grounded in certain split patterns of original input features to derive decision-tree-like rules. For large, complex vision neural networks, NeuroLogic grounds hidden predicates into high-level visual concepts that are understandable to humans. Our empirical study demonstrates that NeuroLogic can extract global and interpretable rules from state-of-the-art models such as ResNet, a task at which existing work struggles. We believe NeuroLogic can help pave the way for understanding the black-box nature of neural networks.
由于深层神经网络继续跨越多个领域,它们的黑箱性质引起了对透明度和信任的担忧。 特别是,对于需要高度安全和知识固定的应用程序,例如药物发现、自主驾驶和基因组学,解释性已变得越来越重要。 然而,对即使是最简单的深层神经网络 — — 如完全连接的网络 — — 的理解进展也很有限,尽管这些网络在ResNet和变异器等最先进的模型中扮演着基本元素的作用。 在本文中,我们通过引入“神经实验室”来应对这一挑战,这是将神经网络中可解释的逻辑规则解译为新颖的方法。 神经实验室网络利用神经激活模式来捕捉捉摸模型的关键决策过程,将其转化为隐蔽的上游规则。 由于在地面阶段的灵活设计,NeroLologic可以适应广泛的神经网络网络网络网络。 对于简单完全连接的神经网络来说,隐藏的上游数据可以植根于某些原始输入模式的分解模式中,从而得出类似决策层规则。 对于大型、复杂的神经网络网络网络网络网络网络网络网络网络的动态激活模式来说,可以将神经激活模式转换成隐含的内行法。
Article 259
Title@2025-06-11 (3): Learning Geometric Invariant Features for Classification of Vector Polygons with Graph Message-passing Neural Network
Title: Learning Geometric Invariant Features for Classification of Vector Polygons with Graph Message-passing Neural Network | Lernen Geometrische Invariante Funktionen zur Klassifizierung von Vektor-Polygonen mit Graph Message-passing Neural Network | 具有图形信件通过神经网络的矢量多边形分类 2407.04334v2 |
Authors (3): Zexian Huang, Kourosh Khoshelham, Martin Tomko
Geometric shape classification of vector polygons remains a challenging task in spatial analysis. Previous studies have primarily focused on deep learning approaches for rasterized vector polygons, while the study of discrete polygon representations and corresponding learning methods remains underexplored. In this study, we investigate a graph-based representation of vector polygons and propose a simple graph message-passing framework, PolyMP, along with its densely self-connected variant, PolyMP-DSC, to learn more expressive and robust latent representations of polygons. This framework hierarchically captures self-looped graph information and learns geometric-invariant features for polygon shape classification. Through extensive experiments, we demonstrate that combining a permutation-invariant graph message-passing neural network with a densely self-connected mechanism achieves robust performance on benchmark datasets, including synthetic glyphs and real-world building footprints, outperforming several baseline methods. Our findings indicate that PolyMP and PolyMP-DSC effectively capture expressive geometric features that remain invariant under common transformations, such as translation, rotation, scaling, and shearing, while also being robust to trivial vertex removals. Furthermore, we highlight the strong generalization ability of the proposed approach, enabling the transfer of learned geometric features from synthetic glyph polygons to real-world building footprints.
矢量多边形的几何形状分类仍然是空间分析中一项艰巨的任务。先前的研究主要侧重于对弧式矢量多边形的深度学习方法,而对离散多边形和相应的学习方法的研究仍未得到充分探讨。在本研究中,我们调查矢量多边形的图形代表,并提议一个简单的图表信息传递框架,即PolyMP及其密集的自我连接变体PolyMP-DSC,以学习更多表达和强大的多边形的潜在表现。这个框架从等级上捕捉自我透视图信息,并学习多边形分类的几何异性特征。通过广泛的实验,我们证明,将偏差-异式图形信息传递神经网络与密集的自我连接机制相结合,在基准数据集(包括合成晶体和真实世界建筑足迹)上取得了强有力的业绩,超过了几个基线方法。我们的研究结果表明, PolyMP 和 PolyMP-DSC 有效捕捉到在共同变异状态下仍然存在的显性地球度特征,例如翻译、旋转、缩放、缩缩和缩放等几异变异性特性特征。同时,还提议将精准的合成地球平面方法转化为学习。
Article 260
Title@2025-06-11 (3): Privacy-aware Berrut Approximated Coded Computing for Federated Learning
Title: Privacy-aware Berrut Approximated Coded Computing for Federated Learning | Privacy-aware Berrut Ungefähre Coded Computing für Federated Learning | Berrut Berrut Berrut 接近的联邦学习联合会编码计算器 2405.01704v3 |
Authors (3): Xavier Martínez Luaña, Rebeca P. Díaz Redondo, Manuel Fernández Veiga
Federated Learning (FL) is an interesting strategy that enables the collaborative training of an AI model among different data owners without revealing their private datasets. Even so, FL has some privacy vulnerabilities, which researchers have tried to overcome with techniques such as Differential Privacy (DP), Homomorphic Encryption (HE), or Secure Multi-Party Computation (SMPC). However, these techniques have some important drawbacks that might narrow their range of application: difficulty handling non-linear functions and large matrix multiplications, and high communication and computational costs when dealing with semi-honest nodes. In this context, we propose a solution to guarantee privacy in FL schemes that simultaneously solves the previously mentioned problems. Our proposal is based on Berrut Approximated Coded Computing, a technique from the Coded Distributed Computing paradigm, adapted to a Secret Sharing configuration, to provide input privacy to FL in a scalable way. It can be applied to compute non-linear functions and treats the special case of distributed matrix multiplication, a key primitive at the core of many automated learning tasks. Because of these characteristics, it could be applied in a wide range of FL scenarios, since it is independent of the machine learning models or aggregation algorithms used in the FL scheme. We provide an analysis of the achieved privacy and complexity of our solution and, based on extensive numerical results, observe a good trade-off between privacy and precision.
联邦学习(FL)是一种使不同数据所有者能够在不泄露其私有数据集的情况下协同训练AI模型的策略。即便如此,FL仍存在一些隐私漏洞,人们尝试通过差分隐私(DP)、同态加密(HE)或安全多方计算(SMPC)等技术加以克服。然而,这些技术存在一些重要缺陷,可能缩小其适用范围:难以处理非线性函数和大规模矩阵乘法,并且在应对半诚实节点时通信和计算开销较高。在此背景下,我们提出一种在FL方案中保障隐私的解决方案,可同时解决上述问题。我们的方案基于Berrut近似编码计算(Berrut Approximated Coded Computing),这是编码分布式计算范式中的一种技术,经调整后适用于秘密共享配置,从而以可扩展的方式为FL提供输入隐私。它可用于计算非线性函数,并处理分布式矩阵乘法这一特例,而矩阵乘法是许多自动学习任务核心的关键原语。由于这些特性,该方案独立于FL方案中所使用的机器学习模型或聚合算法,因此可应用于广泛的FL场景。我们对方案所实现的隐私性和复杂度进行了分析,大量数值结果表明,隐私与精度之间可以取得良好的权衡。
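The abstract above builds on Berrut's rational interpolant. The following is a minimal sketch of the encode, compute, decode pattern of approximated coded computing, not the authors' implementation: the function names, the use of first-kind Chebyshev nodes for both data and workers, and the omission of the random masking shares that provide privacy are all simplifying assumptions made for illustration.

```python
import math

def chebyshev_nodes(n):
    """Chebyshev points of the first kind on (-1, 1), in decreasing order."""
    return [math.cos((2 * i + 1) * math.pi / (2 * n)) for i in range(n)]

def berrut_interpolate(nodes, values, z):
    """Evaluate Berrut's rational interpolant through (nodes[i], values[i]) at z.

    Uses the barycentric form with alternating weights (-1)^i, which assumes
    the nodes are sorted monotonically.
    """
    num = 0.0
    den = 0.0
    for i, (x, f) in enumerate(zip(nodes, values)):
        if z == x:  # exactly on a node: the interpolant equals the data value
            return f
        w = (-1.0) ** i / (z - x)
        num += w * f
        den += w
    return num / den

def coded_compute(f, data, n_workers):
    """Hypothetical helper: approximate [f(x) for x in data] via coded shares.

    No privacy masking is applied here; a secret-sharing variant would mix
    random values into the encoding step.
    """
    alphas = chebyshev_nodes(len(data))   # one node per data item
    betas = chebyshev_nodes(n_workers)    # one node per worker
    # Encode: each worker receives the interpolant of the data at its node.
    shares = [berrut_interpolate(alphas, data, b) for b in betas]
    # Compute: workers apply the (possibly non-linear) function to their share.
    results = [f(s) for s in shares]
    # Decode: interpolate the worker results and read them off at the data nodes.
    return [berrut_interpolate(betas, results, a) for a in alphas]
```

Because the decoder only interpolates, the outputs are approximations of `f` applied to each data item; the scheme trades exact recovery for the ability to handle non-linear `f`.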
Article 261
Title@2025-06-11 (3): DynaSubVAE: Adaptive Subgrouping for Scalable and Robust OOD Detection
Title: DynaSubVAE: Adaptive Subgrouping for Scalable and Robust OOD Detection | DynaSubVAE: Adaptive Subgruppierung für skalierbare und robuste OOD-Erkennung | DynaSubVAE:可缩放和强力OOD探测适应性分组 2506.10200v1 |
Authors (4): Tina Behrouzi, Sana Tonekaboni, Rahul G. Krishnan, Anna Goldenberg
Real-world observational data often contain existing or emerging heterogeneous subpopulations that deviate from global patterns. The majority of models tend to overlook these underrepresented groups, leading to inaccurate or even harmful predictions. Existing solutions often rely on detecting these samples as Out-of-domain (OOD) rather than adapting the model to new emerging patterns. We introduce DynaSubVAE, a Dynamic Subgrouping Variational Autoencoder framework that jointly performs representation learning and adaptive OOD detection. Unlike conventional approaches, DynaSubVAE evolves with the data by dynamically updating its latent structure to capture new trends. It leverages a novel non-parametric clustering mechanism, inspired by Gaussian Mixture Models, to discover and model latent subgroups based on embedding similarity. Extensive experiments show that DynaSubVAE achieves competitive performance in both near-OOD and far-OOD detection, and excels in class-OOD scenarios where an entire class is missing during training. We further illustrate that our dynamic subgrouping mechanism outperforms standalone clustering methods such as GMM and KMeans++ in terms of both OOD accuracy and regret precision.
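真实世界的观测数据往往包含偏离全局模式的既有或新出现的异质子群体。大多数模型往往忽视这些代表性不足的群体,导致预测不准确甚至有害。现有解决方案通常将这些样本检测为域外(OOD)样本,而不是让模型适应新出现的模式。我们提出DynaSubVAE,一个动态子分组变分自编码器框架,可同时进行表示学习和自适应OOD检测。与传统方法不同,DynaSubVAE通过动态更新其潜在结构来捕捉新趋势,从而随数据演化。它利用一种受高斯混合模型启发的新型非参数聚类机制,基于嵌入相似性发现并建模潜在子群。大量实验表明,DynaSubVAE在近OOD和远OOD检测中均取得有竞争力的性能,并在训练期间整个类别缺失的类OOD场景中表现优异。我们进一步说明,我们的动态子分组机制在OOD准确率和regret precision两方面均优于GMM和KMeans++等独立聚类方法。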
Article 262
Title@2025-06-11 (3): Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance
Title: Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance | Objekt-Halluzination in großen Visions-Sprachen-Modellen durch Bild-umrundete Anleitung abmildern | 通过图像四舍五入指南在大型视觉语言模型中减轻物体幻觉 2402.08680v2 |
Authors (4): Linxi Zhao, Yihe Deng, Weitong Zhang, Quanquan Gu
The advancement of Large Vision-Language Models (LVLMs) has increasingly highlighted the critical issue of their tendency to hallucinate non-existing objects in the images. To address this issue, previous works focused on using specially curated datasets or powerful LLMs to rectify the outputs of LVLMs. However, these approaches require either costly training or fine-tuning, or API access to proprietary LLMs for post-generation correction. In response to these limitations, we propose Mitigating hallucinAtion via image-gRounded guIdaNcE (MARINE), a framework that is both training-free and API-free. MARINE effectively and efficiently reduces object hallucinations during inference by introducing image-grounded guidance to LVLMs. This is achieved by leveraging open-source vision models to extract object-level information, thereby enhancing the precision of LVLM-generated content. Our framework’s flexibility further allows for the integration of multiple vision models, enabling more reliable and robust object-level guidance. Through comprehensive evaluations across 5 popular LVLMs with diverse evaluation metrics and benchmarks, we demonstrate the effectiveness of MARINE, which even outperforms existing fine-tuning-based methods. Remarkably, it reduces hallucinations consistently in GPT-4V-assisted evaluation while maintaining the detailedness of LVLMs’ generations. We release our code at https://github.com/Linxi-ZHAO/MARINE.
大型视觉语言模型(LVLM)的发展日益凸显出一个关键问题:它们倾向于对图像中不存在的物体产生幻觉。为解决这一问题,以往的工作侧重于使用专门整理的数据集或强大的LLM来纠正LVLM的输出。然而,这些方法要么需要代价高昂的训练或微调,要么需要通过API访问专有LLM以进行生成后校正。针对这些局限,我们提出MARINE(Mitigating hallucinAtion via image-gRounded guIdaNcE),一个既无需训练又无需API的框架。MARINE通过向LVLM引入基于图像的引导,在推理过程中切实高效地减少物体幻觉。这是通过利用开源视觉模型提取物体级信息来实现的,从而提高LVLM生成内容的精确度。我们框架的灵活性还允许集成多个视觉模型,从而提供更可靠、更稳健的物体级引导。通过在5个流行LVLM上使用多种评估指标和基准进行全面评估,我们证明了MARINE的有效性,它甚至优于现有的基于微调的方法。值得注意的是,在GPT-4V辅助评估中,它能持续减少幻觉,同时保持LVLM生成内容的详细程度。我们的代码发布于 https://github.com/Linxi-ZHAO/MARINE。
Article 263
Title@2025-06-11 (3): Exploring Topological and Localization Phenomena in SSH Chains under Generalized AAH Modulation: A Computational Approach
Title: Exploring Topological and Localization Phenomena in SSH Chains under Generalized AAH Modulation: A Computational Approach | Erforschung topologischer und Lokalisierungsphänomene in SSH-Ketten unter generalisierter AAH-Modulation: Ein rechnerischer Ansatz | 根据普遍AAAH变动:一种计算方法,在SSH链链中探索地形学和本地化现象 2506.10195v1 |
Authors (2): Souvik Ghosh, Sayak Roy
The Su-Schrieffer-Heeger (SSH) model serves as a canonical example of a one-dimensional topological insulator, yet its behavior under more complex, realistic conditions remains a fertile ground for research. This paper presents a comprehensive computational investigation into generalized SSH models, exploring the interplay between topology, quasi-periodic disorder, non-Hermiticity, and time-dependent driving. Using exact diagonalization and specialized numerical solvers, we map the system’s phase space through its spectral properties and localization characteristics, quantified by the Inverse Participation Ratio (IPR). We demonstrate that while the standard SSH model exhibits topologically protected edge states, these are destroyed by a localization transition induced by strong Aubry-André-Harper (AAH) modulation. Further, we employ unsupervised machine learning (PCA) to autonomously classify the system’s phases, revealing that strong localization can obscure underlying topological signatures. Extending the model beyond Hermiticity, we uncover the non-Hermitian skin effect, a dramatic localization of all bulk states at a boundary. Finally, we apply a periodic Floquet drive to a topologically trivial chain, successfully engineering a Floquet topological insulator characterized by the emergence of anomalous edge states at the boundaries of the quasi-energy zone. These findings collectively provide a multi-faceted view of the rich phenomena hosted in generalized 1D topological systems.
Su-Schrieffer-Heeger(SSH)模型是一维拓扑绝缘体的典型范例,但其在更复杂、更现实条件下的行为仍是研究的沃土。本文对广义SSH模型进行了全面的计算研究,探讨拓扑、准周期无序、非厄米性和含时驱动之间的相互作用。我们使用精确对角化和专门的数值求解器,通过谱性质和局域化特征绘制系统的相空间,并以逆参与率(IPR)加以量化。我们证明,虽然标准SSH模型表现出受拓扑保护的边缘态,但这些边缘态会被强Aubry-André-Harper(AAH)调制所诱导的局域化转变破坏。此外,我们采用无监督机器学习(PCA)对系统的相进行自主分类,结果表明强局域化会掩盖潜在的拓扑特征。将模型扩展到厄米性之外,我们发现了非厄米趋肤效应,即所有体态在边界处的剧烈局域化。最后,我们对拓扑平庸的链施加周期性Floquet驱动,成功构造出Floquet拓扑绝缘体,其特征是在准能区边界处出现反常边缘态。这些发现共同为广义一维拓扑系统中丰富的物理现象提供了多方面的视角。
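The Inverse Participation Ratio mentioned in the abstract above has a compact standard definition, IPR = sum_i |psi_i|^4 / (sum_i |psi_i|^2)^2. A small sketch follows; the function name is illustrative, and the eigenvectors would in practice come from diagonalizing the SSH Hamiltonian, which is not shown here.

```python
def inverse_participation_ratio(psi):
    """Return the IPR of a state given as a sequence of (complex) amplitudes.

    The value is 1 for a state localized on a single site and 1/N for a
    state spread uniformly over N sites.
    """
    probs = [abs(a) ** 2 for a in psi]
    norm = sum(probs)  # dividing by norm**2 makes normalization unnecessary
    return sum(p * p for p in probs) / (norm * norm)
```

An IPR that stays of order one as the chain grows signals a localized state, while an IPR that shrinks like 1/N signals an extended one.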
Article 264
Title@2025-06-11 (3): Multi-task Representation Learning for Mixed Integer Linear Programming
Title: Multi-task Representation Learning for Mixed Integer Linear Programming | Multi-Task-Darstellungs-Lernen für gemischte Integer-Linear-Programmierung | 混合整数线性线性方案规划多任务代表学习 2412.14409v2 |
Authors (3): Junyang Cai, Taoan Huang, Bistra Dilkina
Mixed Integer Linear Programs (MILPs) are highly flexible and powerful tools for modeling and solving complex real-world combinatorial optimization problems. Recently, machine learning (ML)-guided approaches have demonstrated significant potential in improving MILP-solving efficiency. However, these methods typically rely on separate offline data collection and training processes, which limits their scalability and adaptability. This paper introduces the first multi-task learning framework for ML-guided MILP solving. The proposed framework provides MILP embeddings helpful in guiding MILP solving across solvers (e.g., Gurobi and SCIP) and across tasks (e.g., Branching and Solver configuration). Through extensive experiments on three widely used MILP benchmarks, we demonstrate that our multi-task learning model performs similarly to specialized models within the same distribution. Moreover, it significantly outperforms them in generalization across problem sizes and tasks.
混合线性程序(MILP)是模拟和解决复杂的现实世界组合优化问题的非常灵活和有力的工具。最近,机器学习(ML)指导方法在提高MILP解决效率方面显示出巨大潜力。然而,这些方法通常依靠单独的离线数据收集和培训程序,这限制了其可缩放性和适应性。本文介绍了ML引导的MILP解决的第一个多任务学习框架。拟议框架提供了MILP嵌入器,有助于指导MILP解决解决解决解决解决解决解决解决所有问题(如Gurobi和SCIP)和跨任务(如分流和溶解器配置)的问题。通过对三大广泛使用的MILP基准进行广泛试验,我们证明我们的多任务学习模式与同一分布范围内的专门模型类似。此外,它大大超越了在问题大小和任务之间的通用模式。
Article 265
Title@2025-06-11 (3): Improving Oral Cancer Outcomes Through Machine Learning and Dimensionality Reduction
Title: Improving Oral Cancer Outcomes Through Machine Learning and Dimensionality Reduction | Verbesserung der Ergebnisse von Oralkrebs durch maschinelles Lernen und Reduzierung der Dimensionalität | 通过机器学习和减少分量来改善口腔癌结果 2506.10189v1 |
Authors (3): Mohammad Subhi Al-Batah, Muhyeeddin Alqaraleh, Mowafaq Salem Alzboon
Oral cancer presents a formidable challenge in oncology, necessitating early diagnosis and accurate prognosis to enhance patient survival rates. Recent advancements in machine learning and data mining have revolutionized traditional diagnostic methodologies, providing sophisticated and automated tools for differentiating between benign and malignant oral lesions. This study presents a comprehensive review of cutting-edge data mining methodologies, including Neural Networks, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and ensemble learning techniques, specifically applied to the diagnosis and prognosis of oral cancer. Through a rigorous comparative analysis, our findings reveal that Neural Networks surpass other models, achieving an impressive classification accuracy of 93.6% in predicting oral cancer. Furthermore, we underscore the potential benefits of integrating feature selection and dimensionality reduction techniques to enhance model performance. These insights underscore the significant promise of advanced data mining techniques in bolstering early detection, optimizing treatment strategies, and ultimately improving patient outcomes in the realm of oral oncology.
口腔癌在肿瘤学方面提出了艰巨的挑战,需要早期诊断和准确预测,以提高患者存活率。最近机器学习和数据挖掘方面的进步使传统诊断方法发生了革命性的变化,为区分良性和恶性口腔损伤提供了尖端和自动化的工具。本研究报告全面审查了尖端数据挖掘方法,包括神经网络、K-Nearest Nearbearbors(KNNN)、支持矢量机(SVM)和联合学习技术,特别适用于口腔癌的诊断和诊断。通过严格的比较分析,我们的调查结果显示神经网络超越了其他模型,在预测口腔癌症方面达到了令人印象深刻的分类精确度93.6 % 。此外,我们强调将特征选择和减少维度技术结合起来以提高模型性能的潜在好处。这些见解强调了先进的数据挖掘技术在支持早期检测、优化治疗战略以及最终改善口腔肿瘤领域的患者结果方面的巨大前景。
Article 266
Title@2025-06-11 (3): Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce
Title: Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce | Zukunft der Arbeit mit KI-Agenten: Auditing Automation und Augmentation Potential in der gesamten US-Arbeitskraft | 未来与AI代理机构的合作:全美国员工队伍的审计自动化和增强潜力 2506.06576v2 |
Authors (7): Yijia Shao, Humishka Zope, Yucheng Jiang, Jiaxin Pei, David Nguyen, Erik Brynjolfsson, Diyi Yang
The rapid rise of compound AI systems (a.k.a., AI agents) is reshaping the labor market, raising concerns about job displacement, diminished human agency, and overreliance on automation. Yet, we lack a systematic understanding of the evolving landscape. In this paper, we address this gap by introducing a novel auditing framework to assess which occupational tasks workers want AI agents to automate or augment, and how those desires align with the current technological capabilities. Our framework features an audio-enhanced mini-interview to capture nuanced worker desires and introduces the Human Agency Scale (HAS) as a shared language to quantify the preferred level of human involvement. Using this framework, we construct the WORKBank database, building on the U.S. Department of Labor’s O*NET database, to capture preferences from 1,500 domain workers and capability assessments from AI experts across over 844 tasks spanning 104 occupations. Jointly considering the desire and technological capability divides tasks in WORKBank into four zones: Automation “Green Light” Zone, Automation “Red Light” Zone, R&D Opportunity Zone, Low Priority Zone. This highlights critical mismatches and opportunities for AI agent development. Moving beyond a simple automate-or-not dichotomy, our results reveal diverse HAS profiles across occupations, reflecting heterogeneous expectations for human involvement. Moreover, our study offers early signals of how AI agent integration may reshape the core human competencies, shifting from information-focused skills to interpersonal ones. These findings underscore the importance of aligning AI agent development with human desires and preparing workers for evolving workplace dynamics.
复合AI系统(a.k.a.a.a.,AI代理商)的迅速上升正在改变劳动力市场,使人们对就业转移、人力机构衰减和过度依赖自动化等问题感到关切。然而,我们缺乏对不断变化的形势的系统理解。在本文件中,我们通过引入新的审计框架来弥补这一差距,以评估哪些职业任务工人希望AI代理商自动化或扩大,以及这些工人的愿望如何与当前的技术能力相一致。我们的框架将强化的小型访问功能用于捕捉微妙的工人愿望,并将人类机构规模作为量化人类参与偏好程度的共同语言。我们利用这一框架,在劳工部O*NET数据库的基础上,建立WorkBank数据库,从1,500个领域工人的偏好和AI代理商的能力评估,涵盖超过104种职业的844项任务。我们共同考虑将工作班克的任务分为四个区域:自动化的“绿色光区”、自动化的“亮点区”的“人类机构规模”,将人类机构规模作为量化人类参与程度的通用语言。我们利用这个框架构建了Horb Bank数据库数据库,从A.
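The four-zone split described in the abstract above can be pictured as a two-by-two grid over worker desire and technological capability. The sketch below is one natural reading of the zone names, not the paper's procedure: the numeric rating scale, the `threshold` value, and the function name are assumptions introduced for illustration.

```python
def assign_zone(desire, capability, threshold=3.0):
    """Map a task's (desire, capability) ratings to one of four zones.

    Both ratings and the cut-off are hypothetical; the abstract only names
    the zones and states that they arise from considering both axes jointly.
    """
    wants_automation = desire >= threshold
    is_feasible = capability >= threshold
    if wants_automation and is_feasible:
        return 'Automation "Green Light" Zone'
    if is_feasible:
        return 'Automation "Red Light" Zone'   # feasible but not desired
    if wants_automation:
        return "R&D Opportunity Zone"          # desired but not yet feasible
    return "Low Priority Zone"
```

Under this reading, the mismatches the abstract highlights are the two off-diagonal cells: tasks that can be automated but workers would rather keep, and tasks workers want automated that current systems cannot yet handle.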
Article 267
Title@2025-06-11 (3): Balans: Multi-Armed Bandits-based Adaptive Large Neighborhood Search for Mixed-Integer Programming Problem
Title: Balans: Multi-Armed Bandits-based Adaptive Large Neighborhood Search for Mixed-Integer Programming Problem | Balans: Multi-Armed Bandits-basierte adaptive Großnachbarschaft Suche nach gemischt-integer-Programmierungsproblem | Balans:多武装强盗基于适应性的大型邻里搜索混合内插方案拟订问题 2412.14382v2 |
Authors (3): Junyang Cai, Serdar Kadioglu, Bistra Dilkina
Mixed-integer programming (MIP) is a powerful paradigm for modeling and solving various important combinatorial optimization problems. Recently, learning-based approaches have shown potential to speed up MIP solving via offline training that then guides important design decisions during the search. However, a significant drawback of these methods is their heavy reliance on offline training, which requires collecting training datasets and computationally costly training epochs yet offers only limited generalization to unseen (larger) instances. In this paper, we propose Balans, an adaptive meta-solver for MIPs with online learning capability that does not require any supervision or a priori training. At its core, Balans is based on adaptive large-neighborhood search, operating on top of an MIP solver by successive applications of destroy and repair neighborhood operators. During the search, the selection among different neighborhood definitions is guided on the fly for the instance at hand via multi-armed bandit algorithms. Our extensive experiments on hard optimization instances show that Balans offers significant performance gains over the default MIP solver, is better than committing to any single best neighborhood, and improves over the state-of-the-art large-neighborhood search for MIPs. Finally, we release Balans as a highly configurable, MIP solver agnostic, open-source software.
混合内插程序(MIP)是建模和解决各种重要的组合优化问题的强大范例。最近,基于学习的方法显示出通过离线培训加快MIP解决离线培训的潜力,从而在搜索过程中指导重要的设计决定。然而,这些方法的一个重大缺陷是严重依赖离线培训,这需要收集培训数据集和计算成本高昂的培训时代,但仅对看不见(大)案例提供有限的概括化。在本文中,我们提议Balans,这是具有无需任何监督或优先培训的在线学习能力的MIP的适应性元软件。在其核心方面,Balans基于适应性大型邻里搜索,通过连续的摧毁和修理社区运营商应用在MIP解决方案解决方案解决方案解决方案解决方案解决方案解决方案的顶端运作。在搜索过程中,不同街区定义的选择以多种手边际(大)算法为指导。我们在硬性优化实例上的广泛实验显示,Balans为默认的MIP解决方案解决方案解决方案解决方案解决方案解决方案解决方案解决方案提供显著的绩效收益,比致力于任何单一的最佳邻里搜索工具更好,我们最终将高端的搜索系统升级。
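The online selection among neighborhoods described above can be illustrated with a generic multi-armed bandit. This is a minimal epsilon-greedy sketch, not the Balans API: the class name, the reward definition, and the neighborhood labels in the usage note are all hypothetical.

```python
import random

class EpsilonGreedyBandit:
    """Pick among named arms, favoring the best running-average reward."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.arms}
        self.means = {a: 0.0 for a in self.arms}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.means[a])

    def update(self, arm, reward):
        # Incremental update of the running mean reward for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

In a large-neighborhood search loop, each arm would stand for one destroy neighborhood (for example labels such as "mutation" or "local_branching"), `select()` would choose the neighborhood for the next iteration, and `update()` would receive a reward such as the objective improvement obtained after repair.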
Article 268
Title@2025-06-11 (3): Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment
Title: Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment | Skalierbare nicht-equivariante 3D-Molekül-Erzeugung über Rotational Alignment | 通过旋转对齐可缩放的非等式 3D 分子生成 2506.10186v1 |
Authors (2): Yuhui Ding, Thomas Hofmann
Equivariant diffusion models have achieved impressive performance in 3D molecule generation. These models incorporate Euclidean symmetries of 3D molecules by utilizing an SE(3)-equivariant denoising network. However, specialized equivariant architectures limit the scalability and efficiency of diffusion models. In this paper, we propose an approach that relaxes such equivariance constraints. Specifically, our approach learns a sample-dependent SO(3) transformation for each molecule to construct an aligned latent space. A non-equivariant diffusion model is then trained over the aligned representations. Experimental results demonstrate that our approach performs significantly better than previously reported non-equivariant models. It yields sample quality comparable to state-of-the-art equivariant diffusion models and offers improved training and sampling efficiency. Our code is available at https://github.com/skeletondyh/RADM
在 3D 分子 生成 中, 等式扩散模型取得了令人印象深刻的性能。 这些模型通过使用 SE(3) 等式除尘网络, 包含了 3D 分子 的 Euclide 的对称性。 然而, 专业的等式结构限制了 扩散模型的可缩放性和效率 。 在本文中, 我们提出了一种放松这种等式限制的方法 。 具体地说, 我们的方法为每个分子学习了一种基于样本的 SO(3) 转换, 以构建一个匹配的潜在空间 。 然后, 对一个非 等式扩散模型进行了相应的培训 。 实验结果显示, 我们的方法比先前报告的非等式模型要好得多 。 它产生与最先进的等式扩散模型相近的样本质量, 并提供更好的培训和取样效率 。 我们的代码可以在 https://github.com/skeletondyh/RADM 上查阅 。
Article 269
Title@2025-06-11 (3): Optimizing Genetic Algorithms with Multilayer Perceptron Networks for Enhancing TinyFace Recognition
Title: Optimizing Genetic Algorithms with Multilayer Perceptron Networks for Enhancing TinyFace Recognition | Optimierung genetischer Algorithmen mit mehrschichtigen Perceptron-Netzwerken zur Verbesserung der TinyFace-Erkennung | 利用增强微面识别的多层倍感人网络优化遗传算法 2506.10184v1 |
Authors (3): Mohammad Subhi Al-Batah, Mowafaq Salem Alzboon, Muhyeeddin Alqaraleh
This study empirically examines MLP networks through rigorous, methodical experimentation on three diverse datasets: TinyFace, Heart Disease, and Iris. The study includes three key methods: (a) baseline training using the default settings for the Multi-Layer Perceptron (MLP), (b) feature selection using Genetic Algorithm (GA) based refinement, and (c) Principal Component Analysis (PCA) based dimension reduction. The results provide important information on how these techniques affect performance. While PCA showed benefits in low-dimensional and noise-free datasets, GA consistently increased accuracy in complex datasets by accurately identifying critical features. The comparison reveals that feature selection and dimensionality reduction play interdependent roles in enhancing MLP performance. The study contributes to the literature on feature engineering and neural network parameter optimization, offering practical guidelines for a wide range of machine learning tasks.
研究概览:研究包括三个关键方法:(a) 利用多湖 Perceptron(MLP)的默认设置进行基线培训;(b) 利用基于基因变异(GA)的精细进行特征选择;(c) 基于主要组成部分分析(PCA)的尺寸降低。研究结果显示了有关这些技术如何影响性能的重要信息。尽管五氯苯甲醚通过准确识别关键特征,在低维和无噪音数据集方面显示出在GA持续提高复杂数据集的准确性的好处。比较表明,特征选择和减少维度在提高MLP性能方面发挥着相互依存的作用。该研究有助于关于特征工程学和神经网络参数优化的文献,为广泛的机器学习任务提供了实用指南。
Article 270
Title@2025-06-11 (3): A Comparative Study of Machine Learning Techniques for Early Prediction of Diabetes
Title: A Comparative Study of Machine Learning Techniques for Early Prediction of Diabetes | Eine vergleichende Studie über Techniken des maschinellen Lernens zur Frühvorhersage von Diabetes | 早期预测糖尿病的机器学习技术比较研究 2506.10180v1 |
Authors (5): Mowafaq Salem Alzboon, Mohammad Al-Batah, Muhyeeddin Alqaraleh, Ahmad Abuashour, Ahmad Fuad Bader
In many nations, diabetes is becoming a significant health problem, and early identification and control are crucial. Using machine learning algorithms to predict diabetes has yielded encouraging results. Using the Pima Indians Diabetes dataset, this study attempts to evaluate the efficacy of several machine-learning methods for diabetes prediction. The collection includes information on 768 patients, such as their ages, BMIs, and glucose levels. The techniques assessed are Logistic Regression, Decision Tree, Random Forest, k-Nearest Neighbors, Naive Bayes, Support Vector Machine, Gradient Boosting, and Neural Network. The findings indicate that the Neural Network algorithm performed the best, with an accuracy of 78.57 percent, followed by the Random Forest method, with an accuracy of 76.30 percent. The study implies that machine learning algorithms can aid diabetes prediction and be an efficient early detection tool.
在许多国家,糖尿病正在成为一个严重的健康问题,早期识别和控制至关重要。使用机器学习算法来预测糖尿病已经产生了令人鼓舞的结果。使用皮马印第安人糖尿病数据集,这项研究试图评估糖尿病预测的若干机器学习方法的功效。收集包括768名病人的信息,如他们的年龄、BMIs和葡萄糖水平。评估的技术是后勤递增、决定树、随机森林、K-Near邻居、Naive Bayes、支持矢量机、Gradient Bobsting和Neural 网络。 研究结果显示神经网络算法最出色,精确度为78.57%,其次是随机森林法,精确度为76.30%。研究意味着机器学习算法可以帮助糖尿病的预测,并成为高效的早期检测工具。
Article 271
Title@2025-06-11 (3): Geometric Regularity in Deterministic Sampling of Diffusion-based Generative Models
Title: Geometric Regularity in Deterministic Sampling of Diffusion-based Generative Models | Geometrische Regelmäßigkeit bei der Deterministischen Probenahme von Diffusions-basierten Generativen Modellen | 以传播为基础的创造模型的确定性抽样的几何规律性 2506.10177v1 |
Authors (4): Defang Chen, Zhenyu Zhou, Can Wang, Siwei Lyu
Diffusion-based generative models employ stochastic differential equations (SDEs) and their equivalent probability flow ordinary differential equations (ODEs) to establish a smooth transformation between complex high-dimensional data distributions and tractable prior distributions. In this paper, we reveal a striking geometric regularity in the deterministic sampling dynamics: each simulated sampling trajectory lies within an extremely low-dimensional subspace, and all trajectories exhibit an almost identical “boomerang” shape, regardless of the model architecture, applied conditions, or generated content. We characterize several intriguing properties of these trajectories, particularly under closed-form solutions based on kernel-estimated data modeling. We also demonstrate a practical application of the discovered trajectory regularity by proposing a dynamic programming-based scheme to better align the sampling time schedule with the underlying trajectory structure. This simple strategy requires minimal modification to existing ODE-based numerical solvers, incurs negligible computational overhead, and achieves superior image generation performance, especially in regions with only $5 \sim 10$ function evaluations.
在本文中,我们揭示了确定性取样动态中惊人的几何规律性:每个模拟采样轨迹都位于一个极低维次空间之内,而所有轨迹都呈现出几乎相同的“boomerang”形状,而不论模型结构、应用条件或生成的内容如何。我们对这些轨迹的一些令人感兴趣的特性作了定性,特别是在基于内核估计数据模型的封闭式解决办法下。我们还展示了所发现的轨迹规律性的实际应用,为此提出了一个动态的基于规划的计划,以更好地将采样时间表与基本轨迹结构相协调。这一简单战略要求对现有以ODE为基础的数字解答器进行最低限度的修改,产生微不足道的计算间接费用,并取得更好的图像生成性能,特别是在只有5美元或10美元的功能评价的区域。
Article 272
Title@2025-06-11 (3): SPARKE: Scalable Prompt-Aware Diversity Guidance in Diffusion Models via RKE Score
Title: SPARKE: Scalable Prompt-Aware Diversity Guidance in Diffusion Models via RKE Score | SPARKE: Skalierbares Prompt-Aware Diversity Guidance in Diffusionsmodellen über RKE-Score | SPARKE:通过RKE分数在传播模型中提供可缩放的即时快速软件多样性指导 2506.10173v1 |
Authors (4): Mohammad Jalali, Haoyu Lei, Amin Gohari, Farzan Farnia
Diffusion models have demonstrated remarkable success in high-fidelity image synthesis and prompt-guided generative modeling. However, ensuring adequate diversity in generated samples of prompt-guided diffusion models remains a challenge, particularly when the prompts span a broad semantic spectrum and the diversity of generated data needs to be evaluated in a prompt-aware fashion across semantically similar prompts. Recent methods have introduced guidance via diversity measures to encourage more varied generations. In this work, we extend the diversity measure-based approaches by proposing the Scalable Prompt-Aware Rényi Kernel Entropy Diversity Guidance (SPARKE) method for prompt-aware diversity guidance. SPARKE utilizes conditional entropy for diversity guidance, which dynamically conditions diversity measurement on similar prompts and enables prompt-aware diversity control. While the entropy-based guidance approach enhances prompt-aware diversity, its reliance on the matrix-based entropy scores poses computational challenges in large-scale generation settings. To address this, we focus on the special case of Conditional latent RKE Score Guidance, reducing entropy computation and gradient-based optimization complexity from the $O(n^3)$ of general entropy measures to $O(n)$. The reduced computational complexity allows for diversity-guided sampling over potentially thousands of generation rounds on different prompts. We numerically test the SPARKE method on several text-to-image diffusion models, demonstrating that the proposed method improves the prompt-aware diversity of the generated data without incurring significant computational costs. We release our code on the project page: https://mjalali.github.io/SPARKE
扩散模型在高保真图像合成和提示引导的生成建模方面取得了显著成功。然而,确保提示引导扩散模型所生成样本具有足够的多样性仍是一项挑战,尤其是当提示覆盖广泛的语义范围、且需要在语义相似的提示之间以提示感知的方式评估生成数据的多样性时。近期的方法通过多样性度量引入引导,以鼓励生成更多样的结果。在这项工作中,我们扩展了基于多样性度量的方法,提出用于提示感知多样性引导的SPARKE方法(Scalable Prompt-Aware Rényi Kernel Entropy Diversity Guidance)。SPARKE利用条件熵进行多样性引导,根据相似提示动态地调整多样性度量的条件,从而实现提示感知的多样性控制。虽然基于熵的引导方法增强了提示感知的多样性,但它对基于矩阵的熵得分的依赖在大规模生成场景中带来了计算挑战。为此,我们聚焦于条件潜在RKE得分引导(Conditional latent RKE Score Guidance)这一特例,将熵计算和基于梯度的优化复杂度从一般熵度量的$O(n^3)$降至$O(n)$。计算复杂度的降低使得可以在不同提示上进行可能多达数千轮生成的多样性引导采样。我们在多个文本到图像扩散模型上对SPARKE方法进行了数值测试,表明该方法在不产生显著计算开销的情况下提高了生成数据的提示感知多样性。我们在项目页面发布代码:https://mjalali.github.io/SPARKE
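To make the complexity claim above more concrete, here is a sketch of an order-2 kernel entropy score of the form n^2 / sum_ij k(x_i, x_j)^2, which needs no eigendecomposition. This is an assumption about the general shape of such a score rather than the SPARKE code: the Gaussian kernel, the bandwidth, the unconditional (prompt-free) setting, and all function names are illustrative.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Normalized Gaussian kernel, so that k(x, x) == 1."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def squared_kernel_sum(samples, bandwidth=1.0):
    """Sum of k(x_i, x_j)^2 over all ordered pairs (squared Frobenius norm of K)."""
    return sum(gaussian_kernel(x, y, bandwidth) ** 2
               for x in samples for y in samples)

def rke_score(samples, bandwidth=1.0):
    """Order-2 score: about 1 for identical samples, about n for well-separated ones."""
    n = len(samples)
    return n * n / squared_kernel_sum(samples, bandwidth)

def squared_kernel_sum_increment(samples, new_sample, bandwidth=1.0):
    """Change in squared_kernel_sum when new_sample is appended.

    Only the n cross terms are needed, so one new sample costs O(n)
    kernel evaluations instead of recomputing a spectrum.
    """
    cross = sum(gaussian_kernel(new_sample, x, bandwidth) ** 2 for x in samples)
    return 2.0 * cross + 1.0
```

A guidance scheme built on such a score would differentiate the cross terms with respect to the new sample; that gradient step is omitted here.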
Article 273
Title@2025-06-11 (3): Learning a Gaussian Mixture for Sparsity Regularization in Inverse Problems
Title: Learning a Gaussian Mixture for Sparsity Regularization in Inverse Problems | Eine Gaußische Mischung für Sparsamkeitsregularisierung in inversen Problemen lernen | a 学习高斯混音,以在反向问题中实现分级规范化 2401.16612v2 |
Authors (4): Giovanni S. Alberti, Luca Ratti, Matteo Santacesaria, Silvia Sciutto
In inverse problems, it is widely recognized that the incorporation of a sparsity prior yields a regularization effect on the solution. This approach is grounded on the a priori assumption that the unknown can be appropriately represented in a basis with a limited number of significant components, while most coefficients are close to zero. This occurrence is frequently observed in real-world scenarios, such as with piecewise smooth signals. In this study, we propose a probabilistic sparsity prior formulated as a mixture of degenerate Gaussians, capable of modeling sparsity with respect to a generic basis. Under this premise, we design a neural network that can be interpreted as the Bayes estimator for linear inverse problems. Additionally, we put forth both a supervised and an unsupervised training strategy to estimate the parameters of this network. To evaluate the effectiveness of our approach, we conduct a numerical comparison with commonly employed sparsity-promoting regularization techniques, namely LASSO, group LASSO, iterative hard thresholding, and sparse coding/dictionary learning. Notably, our reconstructions consistently exhibit lower mean square error values across all 1D datasets utilized for the comparisons, even in cases where the datasets significantly deviate from a Gaussian mixture model.
反之,人们广泛认识到,将表面偏差纳入之前的偏差性会给解决办法带来一种正规化效果。这一方法基于先验的假设,即未知因素可以在数量有限的重要组成部分的基础上得到适当代表,而大多数系数接近于零。这种情况经常在现实世界的情景中观察到,例如以片状光滑信号观察到。在这项研究中,我们提出一种概率偏差,先作为退化的高斯人的一种混合体来配制,能够模拟一般基础的过度性。在这个前提下,我们设计了一个神经网络,可以被解释为对线性反向问题进行拜斯估计。此外,我们提出了一个监督和非监督的培训战略来估计这个网络的参数。为了评估我们的方法的有效性,我们与常用的偏差性刺激规范化技术进行了数字比较,即LASSO、LASSO组、迭代硬阈定门槛和稀有的编码/字典学习。值得注意的是,我们的重建工作始终显示,在所有1美元D的混合物模型中,与用于比较的数据差得多。
Article 274
Title@2025-06-11 (3): Momentum Multi-Marginal Schrödinger Bridge Matching
Title: Momentum Multi-Marginal Schrödinger Bridge Matching | Momentum Multi-Marginal Schrödinger Brücke passend | 多波盘桥配桥 2506.10168v1 |
Authors (4): Panagiotis Theodoropoulos, Augustinos D. Saravanos, Evangelos A. Theodorou, Guan-Horng Liu
Understanding complex systems by inferring trajectories from sparse sample snapshots is a fundamental challenge in a wide range of domains, e.g., single-cell biology, meteorology, and economics. Despite advancements in Bridge and Flow matching frameworks, current methodologies rely on pairwise interpolation between adjacent snapshots. This hinders their ability to capture long-range temporal dependencies and potentially affects the coherence of the inferred trajectories. To address these issues, we introduce Momentum Multi-Marginal Schrödinger Bridge Matching (3MSBM), a novel matching framework that learns smooth measure-valued splines for stochastic systems that satisfy multiple positional constraints. This is achieved by lifting the dynamics to phase space and generalizing stochastic bridges to be conditioned on several points, forming a multi-marginal conditional stochastic optimal control problem. The underlying dynamics are then learned by minimizing a variational objective, having fixed the path induced by the multi-marginal conditional bridge. As a matching approach, 3MSBM learns transport maps that preserve intermediate marginals throughout training, significantly improving convergence and scalability. Extensive experimentation in a series of real-world applications validates the superior performance of 3MSBM compared to existing methods in capturing complex dynamics with temporal dependencies, opening new avenues for training matching frameworks in multi-marginal settings.
通过从少样的样片中推断出轨迹来理解复杂系统,这是一系列广泛领域,例如单细胞生物学、气象学和经济学领域的一项根本挑战。尽管在大桥和流程匹配框架方面有所进步,但目前的方法依赖相邻快照之间对齐的内插。这妨碍了它们捕捉长距离时间依赖的能力,并有可能影响推断轨迹的一致性。为了解决这些问题,我们引入了\ textbf{Mmomentum 多重悬界Schr"大桥匹配(3MSBMBM})这一新型匹配框架,以学习满足多种定位限制的随机系统平滑度估量值螺旋丝。尽管在大桥和流动匹配框架方面有所进步,但目前的方法是:通过提升空间的动态,将随机桥梁连接到若干点,形成多边际的有条件的新偏移最佳控制问题。然后通过尽量减少变异性目标来学习基本动态,同时将多边框桥桥牵引导出的道路固定在(3MSBM)中。作为匹配方法的匹配方法,3MSBBM系列的运输路径与在大幅改进现有双向性动态定位的移动模型的模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模型中学习,从而在大大保持现有伸缩图。
Article 275
Title@2025-06-11 (3): Wasserstein Barycenter Soft Actor-Critic
Title: Wasserstein Barycenter Soft Actor-Critic | Wasserstein Barycenter Soft-Actor-Kritik | Wasserstein重心软演员-评论家 2506.10167v1 |
Authors (2): Zahra Shahrooei, Ali Baheri
Deep off-policy actor-critic algorithms have emerged as the leading framework for reinforcement learning in continuous control domains. However, most of these algorithms suffer from poor sample efficiency, especially in environments with sparse rewards. In this paper, we take a step towards addressing this issue by providing a principled directed exploration strategy. We propose Wasserstein Barycenter Soft Actor-Critic (WBSAC) algorithm, which benefits from a pessimistic actor for temporal difference learning and an optimistic actor to promote exploration. This is achieved by using the Wasserstein barycenter of the pessimistic and optimistic policies as the exploration policy and adjusting the degree of exploration throughout the learning process. We compare WBSAC with state-of-the-art off-policy actor-critic algorithms and show that WBSAC is more sample-efficient on MuJoCo continuous control tasks.
在连续控制领域,深层的外政策行为者-批评算法已成为加强学习的主要框架,但是,大多数这些算法都因抽样效率低下而受到影响,特别是在回报微薄的环境中;在本文件中,我们通过提供有原则的定向勘探战略,为解决这一问题迈出了一步;我们提出了瓦瑟斯坦·巴里森·巴里森特·索夫特·演员-批评(WBSAC)算法,该算法得益于一个在时间差异方面学习的悲观行为者,以及一个促进探索的乐观行为者。这是通过利用悲观和乐观政策的瓦塞斯坦温温中心作为探索政策,并在整个学习过程中调整勘探程度来实现的。我们把WBSAC与最先进的非政策行为者-批评性算法进行比较,并表明WBSAC在 MuJoco连续控制任务上更具样本效率。
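For intuition on the exploration policy described above, the 2-Wasserstein barycenter of Gaussians with diagonal covariance has a closed form: means and standard deviations are each averaged with the barycenter weights. The sketch below assumes both policies output diagonal Gaussians over actions, which the abstract does not state; the function name and the `weight` parameter are illustrative.

```python
def gaussian_w2_barycenter(mean_a, std_a, mean_b, std_b, weight=0.5):
    """W2 barycenter of two diagonal Gaussians, given per-dimension lists.

    weight = 0 returns the first Gaussian, weight = 1 the second; values in
    between interpolate both the mean and the standard deviation linearly.
    """
    mean = [(1.0 - weight) * ma + weight * mb for ma, mb in zip(mean_a, mean_b)]
    std = [(1.0 - weight) * sa + weight * sb for sa, sb in zip(std_a, std_b)]
    return mean, std
```

With a pessimistic and an optimistic policy as the two inputs, shifting `weight` over the course of training is one way to picture adjusting the degree of exploration.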
Article 276
Title@2025-06-11 (3): The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset
Title: The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset | Der PNPL-Wettbewerb 2025: Spracherkennung und Phoneme-Klassifizierung im LibriBrain-Datensatz | 2025年PNPL竞争:Libribrain数据集中的语音探测和电话分类 2506.10165v1 |
Authors (18): Gilad Landau, Miran Özdogan, Gereon Elvers, Francesco Mantegna, Pratik Somaiya, Dulhan Jayalath, Luisa Kurth, Teyun Kwon, Brendan Shillingford, Greg Farquhar, Minqi Jiang, Karim Jerbi, Hamza Abdelhedi, Yorguin Mantilla Ramos, Caglar Gulcehre, Mark Woolrich, Natalie Voets, Oiwi Parker Jones
The advance of speech decoding from non-invasive brain data holds the potential for profound societal impact. Among its most promising applications is the restoration of communication to paralysed individuals affected by speech deficits such as dysarthria, without the need for high-risk surgical interventions. The ultimate aim of the 2025 PNPL competition is to produce the conditions for an “ImageNet moment” or breakthrough in non-invasive neural decoding, by harnessing the collective power of the machine learning community. To facilitate this vision we present the largest within-subject MEG dataset recorded to date (LibriBrain) together with a user-friendly Python library (pnpl) for easy data access and integration with deep learning frameworks. For the competition we define two foundational tasks (i.e. Speech Detection and Phoneme Classification from brain data), complete with standardised data splits and evaluation metrics, illustrative benchmark models, online tutorial code, a community discussion board, and public leaderboard for submissions. To promote accessibility and participation the competition features a Standard track that emphasises algorithmic innovation, as well as an Extended track that is expected to reward larger-scale computing, accelerating progress toward a non-invasive brain-computer interface for speech.
从非侵入性大脑数据中解码言论的推进具有深刻的社会影响的潜力,其中最有希望的应用是恢复通信,使受言论赤字影响的个人瘫痪,例如听障者,不需要高风险外科手术干预。2025年PNPL竞争的最终目的是通过利用机器学习界的集体力量,创造“图像网时刻”或非侵入性神经解码突破的条件。为了促进这一愿景,我们展示了迄今为止记录的最大主体内主MEG数据集(利布里布雷恩),以及一个方便用户的Python图书馆(ppl),以便容易获得数据并融入深层学习框架。关于竞争,我们定义了两项基本任务(即语音探测和脑数据的电话分类),完成于标准化的数据分解和评估衡量标准、说明性基准模型、在线辅导代码、社区讨论平台以及公众提交材料的首选板。促进无障碍和参与竞争,其特点是强调算法创新的标准轨道,并将其作为升级的语音界面,用于加速进行非升级的升级的计算机计算。
Article 277
Title@2025-06-11 (3): Probabilistic Variational Contrastive Learning
Title: Probabilistic Variational Contrastive Learning | Probabilistisches Variationelles Kontrastives Lernen | 概率概率变化差异性差异性学习 2506.10159v1 |
Authors (3): Minoh Jeong, Seonho Kim, Alfred Hero
Deterministic embeddings learned by contrastive learning (CL) methods such as SimCLR and SupCon achieve state-of-the-art performance but lack a principled mechanism for uncertainty quantification. We propose Variational Contrastive Learning (VCL), a decoder-free framework that maximizes the evidence lower bound (ELBO) by interpreting the InfoNCE loss as a surrogate reconstruction term and adding a KL divergence regularizer to a uniform prior on the unit hypersphere. We model the approximate posterior $q_\theta(z | x)$ as a projected normal distribution, enabling the sampling of probabilistic embeddings. Our two instantiations, VSimCLR and VSupCon, replace deterministic embeddings with samples from $q_\theta(z | x)$ and incorporate a normalized KL term into the loss. Experiments on multiple benchmarks demonstrate that VCL mitigates dimensional collapse, enhances mutual information with class labels, and matches or outperforms deterministic baselines in classification accuracy, all the while providing meaningful uncertainty estimates through the posterior model. VCL thus equips contrastive learning with a probabilistic foundation, serving as a new basis for contrastive approaches.
SimCLR 和 SupCon 等对比学习方法所学的确定性嵌入方法,例如SimCLR 和 SupCon , 取得了最先进的性能,但却缺乏一个用于量化不确定性的原则机制。我们建议采用差异性对比学习(VCL),这是一个无编码性框架,通过将InfoNCE 损失解释为替代重建术语,在单位超视谱中将一个标准化的 KL 术语纳入一个统一的前题,从而最大限度地增加证据约束性(ELBO ) 。我们将近似后端 $Qtheta (zxx) 的数据作为预测正常分布的模型,以便能够取样概率性嵌入。我们的两个即时- VimCLR 和 VSupCon 的确定性嵌入点与 $qótheta (zxx ) 的样本的样本混集(ELBOB) , 并在损失中加入一个标准化的 KLL 术语。在多个基准实验中表明VCLL 能够缓解尺寸崩溃, 用类标签加强相互信息, 和匹配或超出确定性基准, 分类精确性基准, , 并同时提供新的对比基础。
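The InfoNCE term that the abstract reinterprets as a surrogate reconstruction term can be sketched for a single anchor as follows. This is a minimal illustration only: the cosine-similarity logits, the `temperature` default, and the function name `info_nce` are assumptions, and the sketch omits the KL regularizer and the projected normal posterior that VCL adds.

```python
import math


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax score of the positive pair."""

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    # Index 0 holds the positive pair; the rest are negatives.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]

    # Numerically stable log-sum-exp over all candidates.
    largest = max(logits)
    log_denominator = largest + math.log(sum(math.exp(l - largest) for l in logits))
    return log_denominator - logits[0]
```

The loss is small when the positive is the candidate most similar to the anchor and grows when a negative is more similar.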
Article 278
Title@2025-06-11 (3): Measuring Corporate Human Capital Disclosures: Lexicon, Data, Code, and Research Opportunities
Title: Measuring Corporate Human Capital Disclosures: Lexicon, Data, Code, and Research Opportunities | Corporate Human Capital Disclosures messen: Lexikon, Daten, Code und Forschungsmöglichkeiten | 衡量公司人力资本披露:词汇、数据、代码和研究机会 2506.10155v1 |
Authors (3): Elizabeth Demers, Victor Xiaoqi Wang, Kean Wu
Human capital (HC) is increasingly important to corporate value creation. Unlike other assets, however, HC is not currently subject to well-defined measurement or disclosure rules. We use a machine learning algorithm (word2vec) trained on a confirmed set of HC disclosures to develop a comprehensive list of HC-related keywords classified into five subcategories (DEI; health and safety; labor relations and culture; compensation and benefits; and demographics and other) that capture the multidimensional nature of HC management. We share our lexicon, corporate HC disclosures, and the Python code used to develop the lexicon, and we provide detailed examples of using our data and code, including for fine-tuning a BERT model. Researchers can use our HC lexicon (or modify the code to capture another construct of interest) with their samples of corporate communications to address pertinent HC questions. We close with a discussion of future research opportunities related to HC management and disclosure.
与其它资产不同,HC目前不受明确界定的测量或披露规则的约束。 我们使用经过经证实的HC披露方法培训的机器学习算法(word2vec)来制定一份全面的HC相关关键词清单,这些关键词可分为五个亚类(DI;卫生和安全;劳动关系和文化;报酬与福利;人口统计等),它们反映了HC管理的多层面性质。我们分享了我们的词汇、公司HC披露和用于开发词汇的Python代码,我们提供了使用我们数据和代码的详细例子,包括微调BERT模型。研究人员可以使用我们的HC词汇(或修改代码以获取另一个利益结构)及其公司通信样本来解决相关的HC问题。我们最后讨论了与HC管理和披露有关的未来研究机会。
Article 279
Title@2025-06-11 (3): Analyzing Emotions in Bangla Social Media Comments Using Machine Learning and LIME
Title: Analyzing Emotions in Bangla Social Media Comments Using Machine Learning and LIME | Analysieren von Emotionen in Bangla Social Media Kommentare mit maschinellem Lernen und LIME | 利用机器学习和LIME分析孟加拉社会媒体评论中的情感 2506.10154v1 |
Authors (5): Bidyarthi Paul, SM Musfiqur Rahman, Dipta Biswas, Md. Ziaul Hasan, Md. Zahid Hossain
Research on understanding emotions in written language continues to expand, especially for understudied languages with distinctive regional expressions and cultural features, such as Bangla. This study examines emotion analysis using 22,698 social media comments from the EmoNoBa dataset. For language analysis, we employ machine learning models: Linear SVM, KNN, and Random Forest with n-gram data from a TF-IDF vectorizer. We additionally investigated how PCA affects the reduction of dimensionality. Moreover, we utilized a BiLSTM model and AdaBoost to improve decision trees. To make our machine learning models easier to understand, we used LIME to explain the predictions of the AdaBoost classifier, which uses decision trees. With the goal of advancing sentiment analysis in languages with limited resources, our work examines various techniques to find efficient techniques for emotion identification in Bangla.
有关理解书面语言情感的研究在继续扩大,特别是针对研究不足的语言,这些语言具有独特的区域表现形式和文化特征,如孟加拉语。本研究利用EmonoBa数据集的22,698份社交媒体评论来审查情感分析。在语言分析中,我们采用了机器学习模型:Linear SVM、KNN和随机森林,使用来自TF-IDF矢量器的n-gram数据。我们进一步调查了五氯苯如何影响维度的减少。此外,我们利用了BILSTM模型和AdaBoost来改进决策树。为了使机器学习模型更容易理解,我们利用LIME来解释AdaBoost分类师的预测,因为后者使用决策树。为了在资源有限的情况下推进语言的情绪分析,我们的工作研究了各种技术,以便在孟加拉语中找到有效的情感识别技术。
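The n-gram TF-IDF features mentioned in the abstract can be illustrated with a small standard-library sketch. It uses the textbook weighting tf × log(N / df) over word unigrams and bigrams; the tokenization by whitespace, the `n_max` parameter, and the function names are assumptions, and library vectorizers typically apply additional smoothing and normalization.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from whitespace tokens."""
    tokens = text.split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(corpus, n_max=2):
    """Map each document to a {term: tf * log(N / df)} dictionary."""
    docs = [Counter(word_ngrams(text, n_max)) for text in corpus]
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in doc)
    n_docs = len(docs)
    return [
        {
            term: (count / sum(doc.values())) * math.log(n_docs / doc_freq[term])
            for term, count in doc.items()
        }
        for doc in docs
    ]
```

Terms that occur in every document receive weight zero, so only the discriminative n-grams contribute to a downstream classifier.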
Article 280
Title@2025-06-11 (3): Attention on flow control: transformer-based reinforcement learning for lift regulation in highly disturbed flows
Title: Attention on flow control: transformer-based reinforcement learning for lift regulation in highly disturbed flows | Aufmerksamkeit auf Strömungssteuerung: Transformer-basiertes Bewehrungslernen zur Heberegulierung in stark gestörten Strömungen | 注意流动控制:以变压器为基础的增强力学习如何在高度动荡的流量中进行升降管制 2506.10153v1 |
Authors (2): Zhecheng Liu, Jeff D. Eldredge
A linear flow control strategy designed for weak disturbances may not remain effective in sequences of strong disturbances due to nonlinear interactions, but it is sensible to leverage it for developing a better strategy. In the present study, we propose a transformer-based reinforcement learning (RL) framework to learn an effective control strategy for regulating aerodynamic lift in gust sequences via pitch control. The transformer addresses the challenge of partial observability from limited surface pressure sensors. We demonstrate that the training can be accelerated with two techniques – pretraining with an expert policy (here, linear control) and task-level transfer learning (here, extending a policy trained on isolated gusts to multiple gusts). We show that the learned strategy outperforms the best proportional control, with the performance gap widening as the number of gusts increases. The control strategy learned in an environment with a small number of successive gusts is shown to effectively generalize to an environment with an arbitrarily long sequence of gusts. We investigate the pivot configuration and show that quarter-chord pitching control can achieve superior lift regulation with substantially less control effort compared to mid-chord pitching control. Through a decomposition of the lift, we attribute this advantage to the dominant added-mass contribution accessible via quarter-chord pitching. The success on multiple configurations shows the generalizability of the proposed transformer-based RL framework, which offers a promising approach to solve more computationally demanding flow control problems when combined with the proposed acceleration techniques.
针对微弱扰动设计的线性流动控制战略,由于非线性互动,在强烈扰动序列中可能不会继续有效,但由于非线性互动而使线性流动控制战略有效,但利用它来制定更好的战略是明智的。在本研究中,我们提议了一个基于变压器的强化学习(RL)框架,以学习一种有效的控制战略,以通过声控控制来调节古斯特序列中的空气动力提升。变压器应对从有限的表面压力传感器中部分可视化的挑战。我们证明,可以通过两种技术 – – 采用专家政策(这里,线性控制)和任务级转移学习(这里,将受过孤立螺旋式训练的政策扩大到多个螺旋型) – – 加快培训速度。我们表明,学习的变压式战略超越了最佳比例控制(RL)框架,随着气压数量的增加,性能差距扩大。 在一个环境中位性压式压力传感器的拟议组合中,我们调查了分弦式配置,并表明,四弦级调控控能能够实现更高级的升升调调,而我们通过中位式的递增压式控制优势。
Article 281
Title@2025-06-11 (3): Balanced Hyperbolic Embeddings Are Natural Out-of-Distribution Detectors
Title: Balanced Hyperbolic Embeddings Are Natural Out-of-Distribution Detectors | Ausgewogene hyperbolische Einbettungen sind natürliche Out-of-Distribution-Detektoren | 平衡的双曲嵌入器是天然流散探测器 2506.10146v1 |
Authors (3): Tejaswi Kasarla, Max van Spengler, Pascal Mettes
Out-of-distribution recognition forms an important and well-studied problem in deep learning, with the goal to filter out samples that do not belong to the distribution on which a network has been trained. The conclusion of this paper is simple: a good hierarchical hyperbolic embedding is preferred for discriminating in- and out-of-distribution samples. We introduce Balanced Hyperbolic Learning. We outline a hyperbolic class embedding algorithm that jointly optimizes for hierarchical distortion and balancing between shallow and wide subhierarchies. We then use the class embeddings as hyperbolic prototypes for classification on in-distribution data. We outline how to generalize existing out-of-distribution scoring functions to operate with hyperbolic prototypes. Empirical evaluations across 13 datasets and 13 scoring functions show that our hyperbolic embeddings outperform existing out-of-distribution approaches when trained on the same data with the same backbones. We also show that our hyperbolic embeddings outperform other hyperbolic approaches, beat state-of-the-art contrastive methods, and natively enable hierarchical out-of-distribution generalization.
分配外的识别是深层学习中的一个重要且经过广泛研究的问题,目的是筛选不属于网络所培训的分布范围的样本。 本文的结论很简单:良好的层次双曲嵌入更适合区分分布内和分布外样本。
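Classification against hyperbolic prototypes can be sketched as a nearest-prototype rule under a hyperbolic distance. The abstract does not say which model of hyperbolic space is used; the sketch below assumes the Poincaré ball with curvature -1, and the names `poincare_distance` and `nearest_prototype` are hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def nearest_prototype(z, prototypes):
    """Return the label whose prototype is closest to the embedding z."""
    return min(prototypes, key=lambda label: poincare_distance(z, prototypes[label]))
```

A distance-based out-of-distribution score could then be derived from the distance to the closest prototype, which is one way existing scoring functions might be adapted.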
Article 282
Title@2025-06-11 (3): Physiological-Model-Based Neural Network for Heart Rate Estimation during Daily Physical Activities
Title: Physiological-Model-Based Neural Network for Heart Rate Estimation during Daily Physical Activities | Physiologisch-Modellbasiertes Neuronales Netzwerk zur Schätzung der Herzfrequenz während täglicher körperlicher Aktivitäten | 日常体育活动中心电率估计的生理-基于模型的神经网络 2506.10144v1 |
Authors (5): Yaowen Zhang, Libera Fresiello, Peter H. Veltink, Dirk W. Donker, Ying Wang
Heart failure (HF) poses a significant global health challenge, with early detection offering opportunities for improved outcomes. Abnormalities in heart rate (HR), particularly during daily activities, may serve as early indicators of HF risk. However, existing HR monitoring tools for HF detection are limited by their reliance on population-based averages. The estimation of individualized HR serves as a dynamic digital twin, enabling precise tracking of cardiac health biomarkers. Current HR estimation methods, categorized into physiologically-driven and purely data-driven models, struggle with efficiency and interpretability. This study introduces a novel physiological-model-based neural network (PMB-NN) framework for HR estimation based on oxygen uptake (VO2) data during daily physical activities. The framework was trained and tested on individual datasets from 12 participants engaged in activities including resting, cycling, and running. By embedding physiological constraints, which were derived from our proposed simplified human movement physiological model (PM), into the neural network training process, the PMB-NN model adheres to human physiological principles while achieving high estimation accuracy, with a median R$^2$ score of 0.8 and an RMSE of 8.3 bpm. Comparative statistical analysis demonstrates that the PMB-NN achieves performance on par with the benchmark neural network model while significantly outperforming the traditional physiological model (p=0.002). In addition, our PMB-NN is adept at identifying personalized parameters of the PM, enabling the PM to generate reasonable HR estimates. The proposed framework, combined with a precise VO2 estimation system derived from body movements, opens future possibilities for personalized and real-time cardiac monitoring during daily-life physical activities.
心衰竭(HF)是全球健康的一个重大挑战,早期发现为改进结果提供了机会。心衰竭(HR)的常态(HR),特别是在日常活动中,可能是高频风险的早期指标。然而,现有的高频检测的HR监测工具因其在基于人口的平均数上的可靠性而受到限制。个人化的HR估算是一个动态数字双,有利于准确跟踪心脏健康生物标志。目前的HR估算方法,分为生理驱动和纯数据驱动的模式,在效率和可解释性上挣扎。这项研究引入了一个基于生理摄入氧量(VO2)数据的新颖的基于生理模型的神经网络(PMB-NN)框架。在日常物理活动中,对参与包括休息、骑自行车和运行等活动的12名参与者的个人数据集进行了培训和测试。通过将我们提议的简化人类运动生理模型(PMF)纳入神经网络培训进程,PMB-NN模型符合人类生理原则,同时实现高估精度的准确度估算值,从摄入0.8的RMB值,而IMF的逻辑值值为8.3 IMB 个人-MB 运行基准进行大幅的对比分析。
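The two reported metrics, R$^2$ and RMSE, follow their standard definitions and can be computed as in this short sketch. The function names and the sample heart-rate values in the usage note are illustrative only and are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and estimated values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For example, measured heart rates of 60, 80, and 100 bpm against estimates of 62, 78, and 100 bpm give an R$^2$ of 0.99.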
Article 283
Title@2025-06-11 (3): Diffusion prior as a direct regularization term for FWI
Title: Diffusion prior as a direct regularization term for FWI | Diffusion vor der direkten Regularisierung für FWI | 之前作为家庭福利基金会直接正规化期的传播 2506.10141v1 |
Authors (3): Yuke Xie, Hervé Chauris, Nicolas Desassis
Diffusion models have recently shown promise as powerful generative priors for inverse problems. However, conventional applications require solving the full reverse diffusion process and operating on noisy intermediate states, which poses challenges for physics-constrained computational seismic imaging. In particular, such instability is pronounced in non-linear solvers like those used in Full Waveform Inversion (FWI), where wave propagation through noisy velocity fields can lead to numerical artifacts and poor inversion quality. In this work, we propose a simple yet effective framework that directly integrates a pretrained Denoising Diffusion Probabilistic Model (DDPM) as a score-based generative diffusion prior into FWI through a score rematching strategy. Unlike traditional diffusion approaches, our method avoids the reverse diffusion sampling and needs fewer iterations. We operate the image inversion entirely in the clean image space, eliminating the need to operate through noisy velocity models. The generative diffusion prior can be introduced as a simple regularization term in the standard FWI update rule, requiring minimal modification to existing FWI pipelines. This promotes stable wave propagation and can improve convergence behavior and inversion quality. Numerical experiments suggest that the proposed method offers enhanced fidelity and robustness compared to conventional and GAN-based FWI approaches, while remaining practical and computationally efficient for seismic imaging and other inverse problem tasks.
然而,常规应用需要解决完全反向扩散过程,并在吵闹的中间状态上运行,这给物理上受限制的计算地震成像带来了挑战。特别是,这种不稳定在非线性溶液中十分明显,如在全波变换(FWI)中所使用的那种非线性溶液中十分明显,因为通过超速田的波浪传播可能导致数字文物和低反向质量。在这项工作中,我们提议了一个简单而有效的框架,直接将事先经过训练的Denoising Difmission Abability 模型(DDPM)整合为一种基于分的遗传传播模型,在FWI之前,通过分数再配比战略将之作为一种基于分的遗传传播模型传播。与传统的传播方法不同,我们的方法避免了反向扩散抽样,需要更少的迭代。我们完全在清洁的图像空间操作这种图像,不需要通过噪音速度模型操作。在FWIFWI标准更新规则中引入前的基因化扩散,这需要对现有FWI输管管管进行最低限度的修改。这可以促进稳定的波传播,并且改进趋近感行为和内化质量,同时将其他的GWIWILIA试验提出改进。
Article 284
Title@2025-06-11 (3): Survival Analysis as Imprecise Classification with Trainable Kernels
Title: Survival Analysis as Imprecise Classification with Trainable Kernels | Überlebensanalyse als ungenaue Klassifizierung mit lernfähigen Kernen | 进行生存分析,将生存分析作为可训练内核的不精确分类 2506.10140v1 |
Authors (3): Andrei V. Konstantinov, Vlada A. Efremenko, Lev V. Utkin
Survival analysis is a fundamental tool for modeling time-to-event data in healthcare, engineering, and finance, where censored observations pose significant challenges. While traditional methods like the Beran estimator offer nonparametric solutions, they often struggle with complex data structures and heavy censoring. This paper introduces three novel survival models, iSurvM (the imprecise Survival model based on Mean likelihood functions), iSurvQ (the imprecise Survival model based on the Quantiles of likelihood functions), and iSurvJ (the imprecise Survival model based on the Joint learning), that combine imprecise probability theory with attention mechanisms to handle censored data without parametric assumptions. The first idea behind the models is to represent censored observations by interval-valued probability distributions for each instance over the time intervals between event moments. The second idea is to employ kernel-based Nadaraya-Watson regression with trainable attention weights for computing the imprecise probability distribution over time intervals for the entire dataset. The third idea is to consider three decision strategies for training, which correspond to the three proposed models. Experiments on synthetic and real datasets demonstrate that the proposed models, especially iSurvJ, consistently outperform the Beran estimator in terms of both accuracy and computational complexity. Codes implementing the proposed models are publicly available.
生存分析是保健、工程和金融领域时间到活动数据模型的基本工具,在这种模型中,经过审查的观察提出了重大挑战。贝兰天文仪等传统方法提供了非参数性的解决办法,但往往与复杂的数据结构和繁琐的审查程序相冲突。本文介绍了三种新的生存模型,即iSurvM(基于平均概率功能的不精确生存模型)、iSurvQ(基于概率功能量化的不精确生存模型)和iSurvJ(基于联合学习的不精确生存模型),这些模型结合了不准确概率理论和关注机制,以不带参数性假设的方式处理受审查的数据。模型的第一个想法是代表通过时间间隔间间隔间间隔对每个案例的受审查概率分布进行观察。第二个想法是采用基于内核的纳达拉亚-瓦特森(基于概率功能的不精确生存模型)和iSurvJ(基于联合学习的不精确生存模型),用以计算整个数据集的不精确时间间隔的概率分布。第三个想法是考虑三种培训决策战略,这与拟议的三个模型相匹配。对合成和真实的精确度假设和真实的精确度模型进行实验,从提议的公开计算,特别是从提议的模型显示。
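The Nadaraya-Watson regression named in the abstract is a kernel-weighted average of training targets. The sketch below uses a fixed Gaussian kernel on scalar inputs; the `bandwidth` parameter and function name are assumptions, and it does not include the trainable attention weights or the interval-valued distributions that the proposed models use.

```python
import math


def nadaraya_watson(x_query, xs, ys, bandwidth=1.0):
    """Kernel-weighted average of ys, with Gaussian weights centred on x_query."""
    scores = [-((x_query - x) ** 2) / (2.0 * bandwidth ** 2) for x in xs]
    # Subtract the maximum score before exponentiating for numerical stability.
    largest = max(scores)
    weights = [math.exp(s - largest) for s in scores]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total
```

The normalized weights play the role of attention weights: replacing the fixed Gaussian score with a learned score function gives the trainable variant.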
Article 285
Title@2025-06-11 (3): Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban
Title: Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban | Lernte Suche interpretieren: Ein Übergangsmodell und eine Wertfunktion in einer RNN zu finden, die Sokoban spielt | 解释学识搜索:在播放 Sokoban 的 RNN 中找到过渡模式和价值函数 2506.10138v1 |
Authors (4): Mohammad Taufeeque, Aaron David Tucker, Adam Gleave, Adrià Garriga-Alonso
We partially reverse-engineer a convolutional recurrent neural network (RNN) trained to play the puzzle game Sokoban with model-free reinforcement learning. Prior work found that this network solves more levels with more test-time compute. Our analysis reveals several mechanisms analogous to components of classic bidirectional search. For each square, the RNN represents its plan in the activations of channels associated with specific directions. These state-action activations are analogous to a value function - their magnitudes determine when to backtrack and which plan branch survives pruning. Specialized kernels extend these activations (containing plan and value) forward and backward to create paths, forming a transition model. The algorithm is also unlike classical search in some ways. State representation is not unified; instead, the network considers each box separately. Each layer has its own plan representation and value function, increasing search depth. Far from being inscrutable, the mechanisms leveraging test-time compute learned in this network by model-free training can be understood in familiar terms.
我们的分析揭示了类似于经典双向搜索组件的若干机制。 对于每个广场, RNN 代表着它与特定方向相关频道的激活计划。 这些州行动激活类似于一个价值函数 - 它们的数量决定了向后跟踪的时间和哪个计划分支生存下来。 专门的内核扩展了这些前向和后向的启动( 包含计划和价值) 以创建路径, 形成一个过渡模式。 算法也不同于经典的搜索方式 。 国家代表制不统一; 相反, 网络将每个框分开考虑。 每个层都有自己的计划表达和价值功能, 增加搜索深度。 远非易变形, 可以理解利用无模式培训的测试时间计算机制。
Article 286
Title@2025-06-11 (3): Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
Title: Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning | Selbstprädiktive Darstellungen für die kombinatorische Generalisierung im Verhaltensklonen | 行为克隆综合通用的自我预测代表 2506.10137v1 |
Authors (5): Daniel Lawson, Adriana Hugessen, Charlotte Cloutier, Glen Berseth, Khimya Khetarpal
Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in domains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective, $\text{BYOL-}\gamma$ augmented GCBC, which is not only able to theoretically approximate the successor representation in the finite MDP case without contrastive samples or TD learning, but also results in competitive empirical performance across a suite of challenging tasks requiring combinatorial generalization.
通过监督学习(SL)培训的行为性克隆(BC)方法是学习机器人等领域人类示范政策的有效途径。 目标调整这些政策使单一的通用政策能够捕捉离线数据集内包含的各种行为。 虽然目标性行为性克隆(GCBC)方法在分散培训任务中可以很好地发挥作用,但不一定将零射射入到需要以新的国家目标配对(即组合式概括化)为条件的任务中。这一限制可部分归因于不列颠哥伦比亚所学的国家代表性缺乏时间上的一致性;如果与时间相关的国家被编码为类似的潜在代表,那么新的州目标配对的超分配差距就会缩小。因此,鼓励代表性空间的这种时间性一致性应该有助于组合式的概括化。 成功表现(它将未来所访问的国家的分布与当前状态相挂钩,即精美地概括了这一属性。 但是,先前的后继代表制方法依赖于对比性样本、时间差异(TD)学习,或者两者兼有。在这项工作中,我们提议,在不要求进行IMBBA-B的常规代表性总体对比中,我们建议一个简单的、不要求在IM-BC的排序中进行一个简单的常规对比性对比,而能够对等的对比。
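For a finite MDP under a fixed policy, the classical successor representation is $M = \sum_{t \ge 0} \gamma^t P^t = (I - \gamma P)^{-1}$, where $P$ is the state-transition matrix. The sketch below computes it by fixed-point iteration; it illustrates the quantity being approximated, not the BYOL-$\gamma$ objective itself, and the function name and iteration count are assumptions.

```python
def successor_representation(P, gamma=0.9, iterations=500):
    """Iterate M <- I + gamma * P @ M until (approximate) convergence."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iterations):
        M = [
            [
                identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return M
```

Entry `M[i][j]` is the expected discounted number of visits to state `j` when starting from state `i`, which is why temporally close states end up with similar rows.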
Article 287
Title@2025-06-11 (3): Provable Sim-to-Real Transfer via Offline Domain Randomization
Title: Provable Sim-to-Real Transfer via Offline Domain Randomization | Provable Sim-to-Real-Übertragung über Offline-Domain Randomisierung | 通过离线网域随机转接 2506.10133v1 |
Authors (3): Arnaud Fickinger, Abderrahim Bendahi, Stuart Russell
Reinforcement-learning agents often struggle when deployed from simulation to the real world. A dominant strategy for reducing the sim-to-real gap is domain randomization (DR), which trains the policy across many simulators produced by sampling dynamics parameters, but standard DR ignores offline data already available from the real system. We study offline domain randomization (ODR), which first fits a distribution over simulator parameters to an offline dataset. While a growing body of empirical work reports substantial gains with algorithms such as DROPO, the theoretical foundations of ODR remain largely unexplored. In this work, we (i) formalize ODR as a maximum-likelihood estimation over a parametric simulator family, (ii) prove consistency of this estimator under mild regularity and identifiability conditions, showing it converges to the true dynamics as the dataset grows, (iii) derive gap bounds demonstrating that ODR's sim-to-real error is up to an O(M) factor tighter than that of uniform DR in the finite-simulator case (with analogous gains in the continuous setting), and (iv) introduce E-DROPO, a new version of DROPO which adds an entropy bonus to prevent variance collapse, yielding broader randomization and more robust zero-shot transfer in practice.
在从模拟到真实世界的部署过程中,强化学习的代理机构往往在从模拟到真实世界的部署中挣扎。减少模拟到现实差距的主要策略是域随机化(DR),通过取样动态参数对许多模拟模拟器进行政策培训,但标准的DR忽略了实际系统中已经存在的离线数据。我们研究了离线域随机化(ODR),它首先适合模拟参数的分布,到离线数据集。越来越多的经验性工作报告在诸如DROPO这样的算法中取得了巨大的收益,而网上解决的理论基础基本上仍未得到探讨。在这项工作中,我们(i)将网上解决正式确定为对一个计量参数模拟器大家庭的最大相似性估计,(ii)证明这个估计器在温和的经常性和可识别性条件下具有一致性,显示它与随着数据集的不断增长而真实的动态一致。 (iii) 显示网上解决机构模拟到真实错误的分错的分界线是O(M)比统一的DRM系数更接近于固定模拟器案(和类似在连续的递增的DPO中的成果),并引入更强劲的递增的递增的DPO-RPO(iv),和递增的递增的递增的递增的RPO)。
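The maximum-likelihood view of ODR over a finite simulator family can be sketched with a toy example. Everything specific here is hypothetical: the one-dimensional dynamics `s_next = s + theta * a`, the Gaussian noise model, and the function names. The sketch also returns a single best parameter rather than the distribution over parameters that ODR fits.

```python
import math


def log_likelihood(theta, transitions, noise_std=0.1):
    """Gaussian log-likelihood of offline (s, a, s_next) tuples under simulator theta."""
    total = 0.0
    for s, a, s_next in transitions:
        residual = s_next - (s + theta * a)  # toy dynamics, assumed for illustration
        total += -0.5 * (residual / noise_std) ** 2
        total -= math.log(noise_std * math.sqrt(2.0 * math.pi))
    return total


def fit_simulator(candidates, transitions, noise_std=0.1):
    """Pick the candidate parameter that maximizes the offline log-likelihood."""
    return max(candidates, key=lambda theta: log_likelihood(theta, transitions, noise_std))
```

As the offline dataset grows, the likelihood concentrates on the candidate closest to the true dynamics, which is the intuition behind the consistency result.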
Article 288
Title@2025-06-11 (3): ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
Title: ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | ViCrit: Eine überprüfbare Verstärkungs-Lern-Proxy-Aufgabe für visuelle Wahrnehmung in VLMs | 虚拟: VLM 视觉感知的可核查强化学习代用任务 2506.10128v1 |
Authors (13): Xiyao Wang, Zhengyuan Yang, Chao Feng, Yongyuan Liang, Yuhang Zhou, Xiaoyu Liu, Ziyi Zang, Ming Li, Chung-Ching Lin, Kevin Lin, Linjie Li, Furong Huang, Lijuan Wang
Reinforcement learning (RL) has shown great effectiveness for fine-tuning large language models (LLMs) using tasks that are challenging yet easily verifiable, such as math reasoning or code generation. However, extending this success to visual perception in vision-language models (VLMs) has been impeded by the scarcity of vision-centric tasks that are simultaneously challenging and unambiguously verifiable. To this end, we introduce ViCrit (Visual Caption Hallucination Critic), an RL proxy task that trains VLMs to localize a subtle, synthetic visual hallucination injected into paragraphs of human-written image captions. Starting from 200-word captions, we inject a single, subtle visual description error, altering a few words on objects, attributes, counts, or spatial relations, and task the model to pinpoint the corrupted span given the image and the modified caption. This formulation preserves the full perceptual difficulty while providing a binary, exact-match reward that is easy to compute and unambiguous. Models trained with the ViCrit task exhibit substantial gains across a variety of VL benchmarks. Crucially, the improvements transfer beyond natural-image training data to abstract image reasoning and visual math, showing promise of learning to perceive rather than merely memorizing seen objects. To facilitate evaluation, we further introduce ViCrit-Bench, a category-balanced diagnostic benchmark that systematically probes perception errors across diverse image domains and error types. Together, our results demonstrate that fine-grained hallucination criticism is an effective and generalizable objective for enhancing visual perception in VLMs.
强化学习(RL)在微调大型语言模型(LLMS)方面显示了极大的实效,这些模型使用具有挑战性但易于核查的任务,如数学推理或代码生成等。然而,将这一成功扩大到视觉语言模型(VLMS)中的视觉感知差值,由于缺少具有挑战性和明确可核查的视觉中心任务而受阻。为此,我们引入了VCrit (VicalCaption Hallucilation Criticle),这是一个代理任务,该任务培训VLMS将隐含的微妙、合成视觉幻觉植入人类写图像标题的段落。从200个字标题开始,我们输入了一个单一、微妙的视觉描述错差值,在对象、属性、计数或空间关系模型中添加了几个字眼感知觉觉觉知觉,以找出图像和经修改的标题。我们引入了完全的认知困难,同时提供了易于理解和清晰的二进式、准确的奖赏。经过VICrit任务培训的模型展示了各种VL基准中的重大收益。 值得注意,在各种VLIL基准基准中,在视觉感知错度方面进行改进,超越了常规和直观领域,在直观分析目标领域之外,我们学习了一种精测测测测数据,在学习了一种微的模型,在学习了我们所见的数学和直观测标的模型的模型,我们所看到的数学比。
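The binary, exact-match reward described in the abstract can be sketched as a string comparison between the span the model points to and the injected span. The case and whitespace normalization and the function name are assumptions; the abstract does not say how spans are normalized before comparison.

```python
def exact_match_reward(predicted_span, corrupted_span):
    """Return 1.0 if the predicted span equals the injected span, else 0.0."""

    def normalize(text):
        # Assumed normalization: lowercase and collapse runs of whitespace.
        return " ".join(text.lower().split())

    return 1.0 if normalize(predicted_span) == normalize(corrupted_span) else 0.0
```

Because the reward is computed from a known injected edit, it needs no learned judge and leaves no ambiguity about correctness.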
Article 289
Title@2025-06-11 (3): Generative Modeling with Diffusion
Title: Generative Modeling with Diffusion | Generative Modellierung mit Diffusion | 利用传播生成模型 2412.10948v2 |
Authors (1): Justin Le
We provide an overview of the diffusion model as a method to generate new samples. Generative models have been recently adopted for tasks such as art generation (Stable Diffusion, Dall-E) and text generation (ChatGPT). Diffusion models in particular apply noise to sample data and then “reverse” this noising process to generate new samples. We will formally define these noising and denoising processes, then present algorithms to train and generate with a diffusion model. Afterward, we will explore a potential application of diffusion models in improving classifier performance on imbalanced data.
我们概述了扩散模型作为产生新样本的一种方法,最近为艺术生成(稳定传播、Dall-E)和文本生成(ChatGPT)等任务采用了生成模型。传播模型特别将噪音应用于样本数据,然后“反转”这个“循环”过程来生成新的样本。我们将正式界定这些点火和去注过程,然后提出算法来培训和生成扩散模型。随后,我们将探索推广模型的可能应用,以提高分类者在不平衡数据方面的性能。
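The noising step can be illustrated with the closed-form forward process used in DDPM-style models, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1 - \bar\alpha_t}\,\epsilon$ with $\bar\alpha_t = \prod_{s \le t}(1 - \beta_s)$. The abstract does not state which formulation the overview uses, so the DDPM form, the linear `betas` schedule in the usage note, and the function names are assumptions.

```python
import math
import random


def alpha_bar(t, betas):
    """Cumulative product of (1 - beta_s) over the first t noising steps."""
    product = 1.0
    for beta in betas[:t]:
        product *= 1.0 - beta
    return product


def noise_sample(x0, t, betas, rng):
    """Draw x_t directly from x_0 without simulating the intermediate steps."""
    a = alpha_bar(t, betas)
    return [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for x in x0
    ]
```

With a linear schedule such as `betas = [1e-4 + (0.02 - 1e-4) * i / 999 for i in range(1000)]`, calling `noise_sample(x0, 1000, betas, random.Random(0))` returns a sample that is almost pure Gaussian noise; the denoising model is trained to reverse this process.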
Article 290
Title@2025-06-11 (3): Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms
Title: Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms | Treffen Sie mich am Arm: Das kooperative Multi-Armed Bandits Problem mit geteilten Armen | 与我在 “ 武器:合作的多武装强盗与可分享武器问题 “ 聚会 2506.10127v1 |
Authors (2): Xinyi Hu, Aldo Pacchiano
We study the decentralized multi-player multi-armed bandits (MMAB) problem under a no-sensing setting, where each player receives only their own reward and obtains no information about collisions. Each arm has an unknown capacity, and if the number of players pulling an arm exceeds its capacity, all players involved receive zero reward. This setting generalizes the classical unit-capacity model and introduces new challenges in coordination and capacity discovery under severe feedback limitations. We propose A-CAPELLA (Algorithm for Capacity-Aware Parallel Elimination for Learning and Allocation), a decentralized algorithm that achieves logarithmic regret in this generalized regime. Our main contribution is a collaborative hypothesis testing protocol that enables synchronized successive elimination and capacity estimation through carefully structured collision patterns. This represents a provably efficient learning result in decentralized no-sensing MMAB with unknown arm capacities.
我们研究在无观测环境下的多玩家多武装强盗(MMAB)问题,每个玩家只得到他们自己的奖赏,没有关于碰撞的信息。每个手臂都具有未知的能力,如果拉着一个手臂的玩家人数超过其能力,所有参与者都得到零奖。这概括了典型的单位能力模式,在严重的反馈限制下提出了协调和能力发现方面的新挑战。我们提议A-CAPELLA(能力-软件平行消除学习和分配的算法),这是一种分散的算法,在这个普遍制度中实现对数的遗憾。我们的主要贡献是合作的假设测试协议,它能够通过仔细结构的碰撞模式同步连续消除和能力估算。这代表了一种相当有效的学习结果,即分散了不明的手臂能力。
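The capacity-based collision rule from the abstract can be sketched as a reward function over one round of play. Two simplifications are assumed here: rewards are deterministic per arm, and every player on an arm within capacity receives that arm's full reward. The function name and argument layout are hypothetical.

```python
from collections import Counter


def collect_rewards(choices, capacities, arm_rewards):
    """Return each player's reward given the arm index each player pulled."""
    load = Counter(choices)  # number of players on each arm this round
    return [
        arm_rewards[arm] if load[arm] <= capacities[arm] else 0.0
        for arm in choices
    ]
```

In the no-sensing setting a player observing a zero cannot tell an overloaded arm from an unlucky draw, which is why capacity has to be inferred through structured collision patterns.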
Article 291
Title@2025-06-11 (3): Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds’ Annotated Imagery
Title: Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds’ Annotated Imagery | Peer-Ranked Precision: Erstellung eines Basisdatensatzes für feinfühlige Visionsmodelle aus DataSeeds kommentierten Bildern | 同行式精确度:从数据种子附加图象中为精美的愿景模型创建基础数据集 2506.05673v3 |
Authors (4): Sajjad Abdoli, Freeman Lewin, Gediminas Vasiliauskas, Fabian Schonholz
The development of modern Artificial Intelligence (AI) models, particularly diffusion-based models employed in computer vision and image generation tasks, is undergoing a paradigmatic shift in development methodologies. Traditionally dominated by a “Model Centric” approach, in which performance gains were primarily pursued through increasingly complex model architectures and hyperparameter optimization, the field is now recognizing a more nuanced “Data-Centric” approach. This emergent framework foregrounds the quality, structure, and relevance of training data as the principal driver of model performance. To operationalize this paradigm shift, we introduce the DataSeeds.AI sample dataset (the “DSD”), initially comprised of approximately 10,610 high-quality human peer-ranked photography images accompanied by extensive multi-tier annotations. The DSD is a foundational computer vision dataset designed to usher in a new standard for commercial image datasets. Representing a small fraction of DataSeeds.AI’s 100 million-plus image catalog, the DSD provides a scalable foundation necessary for robust commercial and multimodal AI development. Through this in-depth exploratory analysis, we document the quantitative improvements generated by the DSD on specific models against known benchmarks and make the code and the trained models used in our evaluation publicly available.
现代人工智能(AI)模型的开发,特别是计算机视觉和图像生成任务中使用的基于传播的模型,正在发展方法的范式转变中发生范式转变。传统上,发展方法以“模型中心”方法为主,主要通过日益复杂的模型结构和超光谱优化来追求绩效收益,目前,该领域正在承认一种更加精细的“数据中心”方法。这个新兴框架为培训数据的质量、结构和相关性提供了基础,作为模型绩效的主要驱动因素。为了实施这一范式转变,我们引入了DataSeids.AI样本数据集(“DSD”),最初由大约10,610个高质量的人类同侪摄影图像组成,并配有广泛的多层次图解。DSD是一个基础计算机远景数据集,旨在引入新的商业图像数据集标准。DSD代表了一小部分数据SEE.AI的1,000万以上图像目录,它为稳健的商业和多式联运开发提供了可扩展的基础。通过这一深入的探索性分析,我们记录了我们所了解的量化的模型,并用在已知的DDD所制作的定量模型上所使用的定量模型。
Article 292
Title@2025-06-11 (3): GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments
Title: GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments | GRAIL: Ein Benchmark für GRAPH ActIve Learning in dynamischen Sensing-Umgebungen | GRAIL: 动态遥感环境中GRaph Active学习基准 2506.10120v1 |
Authors (2): Maryam Khalid, Akane Sano
Graph-based Active Learning (AL) leverages the structure of graphs to efficiently prioritize label queries, reducing labeling costs and user burden in applications like health monitoring, human behavior analysis, and sensor networks. By identifying strategically positioned nodes, graph AL minimizes data collection demands while maintaining model performance, making it a valuable tool for dynamic environments. Despite its potential, existing graph AL methods are often evaluated on static graph datasets and primarily focus on prediction accuracy, neglecting user-centric considerations such as sampling diversity, query fairness, and adaptability to dynamic settings. To bridge this gap, we introduce GRAIL, a novel benchmarking framework designed to evaluate graph AL strategies in dynamic, real-world environments. GRAIL introduces novel metrics to assess sustained effectiveness, diversity, and user burden, enabling a comprehensive evaluation of AL methods under varying conditions. Extensive experiments on datasets featuring dynamic, real-life human sensor data reveal trade-offs between prediction performance and user burden, highlighting limitations in existing AL strategies. GRAIL demonstrates the importance of balancing node importance, query diversity, and network topology, providing an evaluation mechanism for graph AL solutions in dynamic environments.
基于图的主动学习(AL)利用图结构高效地确定标签查询的优先级,从而降低健康监测、人类行为分析和传感器网络等应用中的标注成本和用户负担。通过识别处于关键位置的节点,图AL在保持模型性能的同时最大限度地减少数据收集需求,使其成为动态环境中的宝贵工具。尽管具有潜力,现有的图AL方法通常在静态图数据集上评估,且主要关注预测准确性,忽视了采样多样性、查询公平性和对动态环境的适应性等以用户为中心的考量。为了弥合这一差距,我们引入了GRAIL,这是一个新的基准框架,旨在评估动态、真实环境中的图AL策略。GRAIL引入了新的指标来评估持续有效性、多样性和用户负担,从而能够在不同条件下对AL方法进行全面评估。在包含动态、真实人类传感器数据的数据集上进行的大量实验揭示了预测性能与用户负担之间的权衡,凸显了现有AL策略的局限性。GRAIL表明了平衡节点重要性、查询多样性和网络拓扑的重要性,为动态环境中的图AL解决方案提供了评估机制。
Article 293
Title@2025-06-11 (3): Detecção da Psoríase Utilizando Visão Computacional: Uma Abordagem Comparativa Entre CNNs e Vision Transformers
Title: Detecção da Psoríase Utilizando Visão Computacional: Uma Abordagem Comparativa Entre CNNs e Vision Transformers | Erkennung von Psoriasis mittels Computer Vision: Ein vergleichender Ansatz zwischen CNNs und Vision Transformers | 利用计算机视觉检测银屑病:CNN与Vision Transformer的比较研究 2506.10119v1 |
Authors (3): Natanael Lucena, Fábio S. da Silva, Ricardo Rios
This paper compares the performance of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) on the multi-class classification of images containing lesions of psoriasis and similar diseases. Models pre-trained on ImageNet were adapted to a specific data set. Both families achieved high predictive metrics, but the ViTs stood out for their superior performance with smaller models. Dual Attention Vision Transformer-Base (DaViT-B) obtained the best results, with an F1-score of 96.4%, and is recommended as the most efficient architecture for automated psoriasis detection. This article reinforces the potential of ViTs for medical image classification tasks.
本文比较了卷积神经网络(CNN)和视觉Transformer(ViT)在对包含银屑病及类似疾病皮损的图像进行多类别分类任务中的表现。在ImageNet上预训练的模型被适配到一个特定数据集。两类模型都取得了较高的预测指标,但ViT以较小的模型取得了更优的性能而表现突出。Dual Attention Vision Transformer-Base(DaViT-B)取得了最佳结果,F1分数为96.4%,被推荐为银屑病自动检测最高效的架构。本文进一步印证了ViT在医学图像分类任务中的潜力。
Article 294
Title@2025-06-11 (3): NnD: Diffusion-based Generation of Physically-Nonnegative Objects
Title: NnD: Diffusion-based Generation of Physically-Nonnegative Objects | NnD: Diffusionsbasierte Generierung physikalisch-nonnegativer Objekte | NnD: 基于扩散的物理非负物体生成 2506.10112v1 |
Authors (3): Nadav Torem, Tamar Sde-Chen, Yoav Y. Schechner
Most natural objects have inherent complexity and variability. While some simple objects can be modeled from first principles, many real-world phenomena, such as cloud formation, require computationally expensive simulations that limit scalability. This work focuses on a class of physically meaningful, nonnegative objects that are computationally tractable but costly to simulate. To dramatically reduce computational costs, we propose nonnegative diffusion (NnD), a learned generative model based on score-based diffusion. It adapts annealed Langevin dynamics to enforce, by design, non-negativity throughout iterative scene generation and analysis (inference). NnD trains on high-quality physically simulated objects. Once trained, it can be used for generation and inference. We demonstrate generation of 3D volumetric clouds, comprising inherently nonnegative microphysical fields. Our generated clouds are consistent with cloud physics trends, and experts effectively do not perceive them as non-physical.
大多数自然物体具有固有的复杂性和可变性。 虽然一些简单的天体可以从最初的原则中建模, 但许多现实世界现象, 如云层形成, 需要计算昂贵的模拟, 以限制可缩放性。 这项工作侧重于一组物理上有意义的非阴性物体, 这些物体在计算上可移动, 但模拟成本却昂贵。 为了大幅降低计算成本, 我们建议非阴性扩散( NnD ) 。 这是一个使用基于分数的传播方法的学习的基因模型。 它使无线的兰格文动态在迭代场生成和分析( 推断) 中以设计方式执行非负性。 NnD 用于高质量的物理模拟物体的培训。 一旦经过培训, 就可以用于生成和推断。 我们展示了3D体积云的生成, 由固有的非负性微物理场组成。 我们生成的云与云层物理趋势一致。 它们实际上没有被专家的认知区分为非物理。
Article 295
Title@2025-06-11 (3): AI5GTest: AI-Driven Specification-Aware Automated Testing and Validation of 5G O-RAN Components
Title: AI5GTest: AI-Driven Specification-Aware Automated Testing and Validation of 5G O-RAN Components | AI5GTest: AI-Driven Specification-Aware Automatisierte Prüfung und Validierung von 5G O-RAN Komponenten | AI5GTest: AI-Driver 5G O-RAN 部件的AI-DRE 规格-软件自动测试和验证 2506.10111v1 |
Authors (3): Abiodun Ganiyu, Pranshav Gajjar, Vijay K Shah
The advent of Open Radio Access Networks (O-RAN) has transformed the telecommunications industry by promoting interoperability, vendor diversity, and rapid innovation. However, its disaggregated architecture introduces complex testing challenges, particularly in validating multi-vendor components against O-RAN ALLIANCE and 3GPP specifications. Existing frameworks, such as those provided by Open Testing and Integration Centres (OTICs), rely heavily on manual processes, are fragmented, and are prone to human error, leading to inconsistency and scalability issues. To address these limitations, we present AI5GTest, an AI-powered, specification-aware testing framework designed to automate the validation of O-RAN components. AI5GTest leverages a cooperative Large Language Model (LLM) framework consisting of Gen-LLM, Val-LLM, and Debug-LLM. Gen-LLM automatically generates expected procedural flows for test cases based on 3GPP and O-RAN specifications, while Val-LLM cross-references signaling messages against these flows to validate compliance and detect deviations. If anomalies arise, Debug-LLM performs root cause analysis, providing insight into the cause of the failure. To enhance transparency and trustworthiness, AI5GTest incorporates a human-in-the-loop mechanism, in which Gen-LLM presents the top-k relevant official specifications to the tester for approval before proceeding with validation. Evaluated using a range of test cases obtained from O-RAN TIFG and WG5-IOT test specifications, AI5GTest demonstrates a significant reduction in overall test execution time compared to traditional manual methods, while maintaining high validation accuracy.
开放电台接入网络(O-RAN)的出现,通过促进互操作性、供应商多样性和快速创新,改变了电信业,然而,其分类结构带来了复杂的测试挑战,特别是在根据O-RAN Allianscent和3GP规格验证多供应商组件方面;现有框架,如开放测试和一体化中心(OTICs)提供的框架,严重依赖人工流程,支离破碎,容易发生人为错误,导致不一致和可缩放问题;为解决这些限制,我们提出了AI5GTest – – 一个由AI驱动的、符合规格的测试框架,旨在自动验证O-RAN的组件;AI5GTest利用一个合作型大语言模型框架,包括Gen-LLM、Val-LLIM和Debug-LPPM。 Gentral-LUM自动生成基于3GPP和O-RAN规格的测试案件的预期程序流,同时,Val-LLM交叉参照这些流动的信息信号,以验证遵守情况和检测偏离。如果出现异常情况,DLLM进行相关的根分析,则使用相关的根分析,提供对IG标准的测试。
Article 296
Title@2025-06-11 (3): A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning
Title: A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning | Ein einheitliches Framework, um Symmetrie im maschinellen Lernen zu stärken, zu entdecken und zu fördern | 实施、发现和促进机器学习对称的统一框架 2311.00212v3 |
Authors (4): Samuel E. Otto, Nicholas Zolman, J. Nathan Kutz, Steven L. Brunton
Symmetry is present throughout nature and continues to play an increasingly central role in physics and machine learning. Fundamental symmetries, such as Poincaré invariance, allow physical laws discovered in laboratories on Earth to be extrapolated to the farthest reaches of the universe. Symmetry is essential to achieving this extrapolatory power in machine learning applications. For example, translation invariance in image classification allows models with fewer parameters, such as convolutional neural networks, to be trained on smaller data sets and achieve state-of-the-art performance. In this paper, we provide a unifying theoretical and methodological framework for incorporating symmetry into machine learning models in three ways: 1. enforcing known symmetry when training a model; 2. discovering unknown symmetries of a given model or data set; and 3. promoting symmetry during training by learning a model that breaks symmetries within a user-specified group of candidates when there is sufficient evidence in the data. We show that these tasks can be cast within a common mathematical framework whose central object is the Lie derivative associated with fiber-linear Lie group actions on vector bundles. We extend and unify several existing results by showing that enforcing and discovering symmetry are linear-algebraic tasks that are dual with respect to the bilinear structure of the Lie derivative. We also propose a novel way to promote symmetry by introducing a class of convex regularization functions based on the Lie derivative and nuclear norm relaxation to penalize symmetry breaking during training of machine learning models. We explain how these ideas can be applied to a wide range of machine learning models including basis function regression, dynamical systems discovery, neural networks, and neural operators acting on fields.
对称性遍布于自然界,并在物理学和机器学习中发挥着日益核心的作用。庞加莱(Poincaré)不变性等基本对称性使得在地球实验室中发现的物理定律可以外推到宇宙最遥远的角落。对称性对于在机器学习应用中实现这种外推能力至关重要。例如,图像分类中的平移不变性使得参数较少的模型(如卷积神经网络)能够在较小的数据集上训练并达到最先进的性能。在本文中,我们提供了一个统一的理论和方法框架,以三种方式将对称性纳入机器学习模型:1. 在训练模型时强制施加已知对称性;2. 发现给定模型或数据集的未知对称性;3. 在训练过程中促进对称性,即学习一个模型,使其仅在数据中有充分证据时才打破用户指定的候选对称群中的对称性。我们表明,这些任务可以纳入一个共同的数学框架,其核心对象是与向量丛上纤维线性李群作用相关的李导数。我们通过证明强制施加对称性和发现对称性是关于李导数双线性结构互为对偶的线性代数任务,扩展并统一了若干已有结果。我们还提出了一种促进对称性的新方法:引入一类基于李导数和核范数松弛的凸正则化函数,在机器学习模型训练过程中惩罚对称性破缺。我们解释了如何将这些思想应用于多种机器学习模型,包括基函数回归、动力系统发现、神经网络以及作用于场的神经算子。
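The claim that enforcing symmetry is a linear-algebraic task can be illustrated on a toy case. The sketch below is not the paper's framework: it assumes a 2x2 linear model `W` and the planar rotation generator `J`, and uses the commutator `W J - J W` as a simple stand-in for the Lie derivative of a linear map. All function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutator(w, gen):
    """W*gen - gen*W; it vanishes exactly when W commutes with the generator."""
    wg, gw = matmul(w, gen), matmul(gen, w)
    return [[wg[i][j] - gw[i][j] for j in range(2)] for i in range(2)]


def frob_norm(m):
    """Frobenius norm of a 2x2 matrix."""
    return sum(x * x for row in m for x in row) ** 0.5


# Generator of planar rotations (an assumption chosen for this toy example).
J = [[0.0, -1.0], [1.0, 0.0]]


def project_equivariant(w):
    """Orthogonal projection of W onto span{I, J}, the 2x2 matrices commuting with J."""
    a = (w[0][0] + w[1][1]) / 2.0
    b = (w[1][0] - w[0][1]) / 2.0
    return [[a, -b], [b, a]]


W = [[2.0, 1.0], [0.0, 3.0]]
W_sym = project_equivariant(W)
symmetry_violation_before = frob_norm(commutator(W, J))
symmetry_violation_after = frob_norm(commutator(W_sym, J))
```

Here enforcing the symmetry amounts to a linear projection onto the null space of the map `W -> W J - J W`; discovering a symmetry would instead fix `W` and solve for the generators that make the same expression vanish.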
Article 297
Title@2025-06-11 (3): Learning to Collaborate Over Graphs: A Selective Federated Multi-Task Learning Approach
Title: Learning to Collaborate Over Graphs: A Selective Federated Multi-Task Learning Approach | Lernen, über Graphen zu kollaborieren: Ein selektiver, federführender Lernansatz mit mehreren Aufgaben | 学习 “ 合作翻图:选择性联邦多任务学习方法 “ 2506.10102v1 |
Authors (3): Ahmed Elbakary, Chaouki Ben Issaid, Mehdi Bennis
We present a novel federated multi-task learning method that leverages cross-client similarity to enable personalized learning for each client. To avoid transmitting the entire model to the parameter server, we propose a communication-efficient scheme that introduces a feature anchor, a compact vector representation that summarizes the features learned from the client’s local classes. This feature anchor is shared with the server to account for local clients’ distribution. In addition, the clients share their classification heads, each a lightweight linear layer, and perform graph-based regularization to enable collaboration among clients. By modeling collaboration between clients as a dynamic graph and continuously updating and refining this graph, we can account for any drift from the clients. To ensure beneficial knowledge transfer and prevent negative collaboration, we leverage a community detection-based approach that partitions this dynamic graph into homogeneous communities, maximizing the sum of task similarities, represented as the graph edges’ weights, within each community. This mechanism restricts collaboration to highly similar clients within their formed communities, ensuring positive interaction and preserving personalization. Extensive experiments on two heterogeneous datasets demonstrate that our method significantly outperforms state-of-the-art baselines. Furthermore, we show that our method exhibits superior computation and communication efficiency and promotes fairness across clients.
我们提出了一个新的联结式多任务学习方法,利用跨客户的相似性,使每个客户都能进行个性化学习。为了避免将整个模型传输到参数服务器,我们提议了一个通信高效计划,引入一个功能锚,即一个集客户本地分类所学特征的紧凑矢量代表。这个功能锚与服务器共享,以说明当地客户的分配情况。此外,客户共享分类头,一个轻量的线性层,并进行基于图表的正规化,以使客户之间能够进行合作。通过将客户之间的合作建模成动态图表,不断更新和完善这个图表,我们可以说明客户的任何漂移。为了确保有益的知识转移和防止负面合作,我们利用基于社区检测的方法,将这种动态图形分割成一个单一的社区,最大限度地增加任务相似性的总和,作为每个社区的图形边端的重量。这个机制限制了客户在他们组成社区内与非常相似的客户的合作,确保积极的互动和保持个性化。在两个不同的数据中进行广泛的实验,表明我们的方法大大优于州-艺术客户的基线。此外,我们展示了我们的方法,展示了我们的方法,提高了客户的公平性。
Article 298
Title@2025-06-11 (3): VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis
Title: VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis | VISTA: Vision-Language-Schlussfolgerung für eine trainingsfreie Analyse der Stock-Zeitreihen | VISTA:无培训-库存无培训-时间-系列分析的远景-语言推断 2505.18570v3 |
Authors (5): Tina Khezresmaeilzadeh, Parsa Razmara, Seyedarmin Azizi, Mohammad Erfan Sadeghi, Erfan Baghaei Potraghloo
Stock price prediction remains a complex and high-stakes task in financial analysis, traditionally addressed using statistical models or, more recently, language models. In this work, we introduce VISTA (Vision-Language Inference for Stock Time-series Analysis), a novel, training-free framework that leverages Vision-Language Models (VLMs) for multi-modal stock forecasting. VISTA prompts a VLM with both textual representations of historical stock prices and their corresponding line charts to predict future price values. By combining numerical and visual modalities in a zero-shot setting and using carefully designed chain-of-thought prompts, VISTA captures complementary patterns that unimodal approaches often miss. We benchmark VISTA against standard baselines, including ARIMA and text-only LLM-based prompting methods. Experimental results show that VISTA outperforms these baselines by up to 89.83%, demonstrating the effectiveness of multi-modal inference for stock time-series analysis and highlighting the potential of VLMs in financial forecasting tasks without requiring task-specific training.
股票价格预测仍然是金融分析中一项复杂和高度的任务,传统上使用统计模型或最近使用的语言模型加以处理。在这项工作中,我们采用了VISTA(股票时间序列分析的视觉-语言推断),这是一个没有培训的新框架,利用视野-语言模型(VLMs)进行多模式股票预测。VISTA推动了一个VLM,同时以文字形式表述历史股票价格及其相应的线性图表来预测未来价格值。通过在零镜头设置中结合数字和视觉模式,并使用精心设计的思维链提示,VISTA捕捉了通常不采用单一方式的方法的互补模式。我们根据标准基线对VISTA进行基准,包括ARIMA和基于文本的LLMM快速方法。实验结果表明,VISTA比这些基线高出89.83%,表明股票时间序列分析的多模式推断的有效性,并强调VLMs在财务预测任务中的潜力,而无需具体任务培训。
Article 299
Title@2025-06-11 (3): Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes
Title: Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes | Grundlegende Grenzen des Lernens Hochdimensionale Vereinfachungen in Noisy Regimes | 吵闹制度中学习高维简单主义的基本限制 2506.10101v1 |
Authors (4): Seyed Amir Hossein Saberi, Amir Najafi, Abolfazl Motahari, Babak H. Khalaj
In this paper, we establish sample complexity bounds for learning high-dimensional simplices in $\mathbb{R}^K$ from noisy data. Specifically, we consider $n$ i.i.d. samples uniformly drawn from an unknown simplex in $\mathbb{R}^K$, each corrupted by additive Gaussian noise of unknown variance. We prove an algorithm exists that, with high probability, outputs a simplex within $\ell_2$ or total variation (TV) distance at most $\varepsilon$ from the true simplex, provided $n \ge (K^2/\varepsilon^2) e^{\mathcal{O}(K/\mathrm{SNR}^2)}$, where $\mathrm{SNR}$ is the signal-to-noise ratio. Extending our prior work (Saberi et al., 2023), we derive new information-theoretic lower bounds, showing that simplex estimation within TV distance $\varepsilon$ requires at least $n \ge \Omega(K^3 \sigma^2/\varepsilon^2 + K/\varepsilon)$ samples, where $\sigma^2$ denotes the noise variance. In the noiseless scenario, our lower bound $n \ge \Omega(K/\varepsilon)$ matches known upper bounds up to constant factors. We resolve an open question by demonstrating that when $\mathrm{SNR} \ge \Omega(K^{1/2})$, noisy-case complexity aligns with the noiseless case. Our analysis leverages sample compression techniques (Ashtiani et al., 2018) and introduces a novel Fourier-based method for recovering distributions from noisy observations, potentially applicable beyond simplex learning.
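在本文中,我们建立了从含噪数据中学习 $\mathbb{R}^K$ 中高维单纯形的样本复杂度界。具体而言,我们考虑从 $\mathbb{R}^K$ 中一个未知单纯形均匀抽取的 $n$ 个独立同分布样本,每个样本都受到方差未知的加性高斯噪声污染。我们证明存在一种算法,只要 $n \ge (K^2/\varepsilon^2) e^{\mathcal{O}(K/\mathrm{SNR}^2)}$,它就能以高概率输出一个与真实单纯形的 $\ell_2$ 距离或全变差(TV)距离至多为 $\varepsilon$ 的单纯形,其中 $\mathrm{SNR}$ 为信噪比。在我们先前工作(Saberi等,2023)的基础上,我们推导出新的信息论下界,表明在TV距离 $\varepsilon$ 内估计单纯形至少需要 $n \ge \Omega(K^3 \sigma^2/\varepsilon^2 + K/\varepsilon)$ 个样本,其中 $\sigma^2$ 表示噪声方差。在无噪声情形下,我们的下界 $n \ge \Omega(K/\varepsilon)$ 与已知上界在常数因子范围内一致。我们通过证明当 $\mathrm{SNR} \ge \Omega(K^{1/2})$ 时含噪情形的复杂度与无噪声情形一致,解决了一个公开问题。我们的分析利用了样本压缩技术(Ashtiani等,2018),并引入了一种新的基于傅里叶的方法,用于从含噪观测中恢复分布,该方法可能适用于单纯形学习以外的问题。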
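The data model in this abstract (uniform samples from a simplex, each corrupted by additive Gaussian noise) can be sketched as follows. This only illustrates the sampling setup, not the paper's estimator; the function name and the choice of a standard simplex in the usage lines are assumptions. Uniform barycentric weights are drawn as normalized Exp(1) variables, which is a standard way to sample a flat Dirichlet distribution.

```python
import random


def sample_noisy_simplex(vertices, n, sigma, rng):
    """Draw n points uniformly from the simplex spanned by `vertices`
    (K+1 points in R^K) and add i.i.d. N(0, sigma^2) noise per coordinate."""
    dim = len(vertices[0])
    samples = []
    for _ in range(n):
        # Normalized exponentials give Dirichlet(1, ..., 1) weights,
        # i.e. a uniform distribution over the simplex.
        e = [rng.expovariate(1.0) for _ in vertices]
        total = sum(e)
        weights = [x / total for x in e]
        point = [sum(w * v[d] for w, v in zip(weights, vertices))
                 for d in range(dim)]
        samples.append([p + rng.gauss(0.0, sigma) for p in point])
    return samples


# Usage: the standard simplex in R^3 (origin plus the three unit vectors).
standard_simplex = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
clean = sample_noisy_simplex(standard_simplex, 100, 0.0, random.Random(0))
noisy = sample_noisy_simplex(standard_simplex, 100, 0.1, random.Random(0))
```

With `sigma = 0.0` every sample lies inside the simplex; with positive `sigma` some samples fall outside it, which is what makes the noisy estimation problem harder than the noiseless one.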
Article 300
Title@2025-06-11 (3): Token-Efficient RL for LLM Reasoning
Title: Token-Efficient RL for LLM Reasoning | Token-Efficient RL für LLM-Vernunft | 用于LLM 理由解释的 Token- Effecticent RL 2504.20834v4 |
Authors (2): Alan Lee, Harry Tong
We propose reinforcement learning (RL) strategies tailored for reasoning in large language models (LLMs) under strict memory and compute limits, with a particular focus on compatibility with LoRA fine-tuning. Building on early policy gradient methods with baseline subtraction, we design critic-free methods that operate on a small, informative subset of output tokens to reduce memory usage and stabilize training. We introduce S-GRPO, a stochastic variant of Group Relative Policy Optimization, and T-SPMO, a token-level prefix matching approach for fine-grained credit assignment. Applied to Qwen2-1.5B, our methods raise accuracy on the SVAMP benchmark from 46% to over 70% and show strong performance on multi-digit multiplication. Surprisingly, full-token GRPO under LoRA fails to improve over the base model, suggesting that selective token-level optimization may act as an implicit regularizer in low-parameter training regimes.
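The combination of baseline subtraction and a loss computed on a subset of output tokens can be sketched roughly as below. This is an illustrative assumption, not the authors' S-GRPO or T-SPMO: rewards are normalized within a group of sampled completions, and each token is kept in the loss with probability `keep_prob`. All names are hypothetical, and the log-probabilities are plain floats rather than model outputs.

```python
import random
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Subtract the group mean as a baseline and scale by the group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def sampled_token_loss(token_logprobs, advantages, keep_prob, rng):
    """Policy-gradient surrogate averaged over a random subset of tokens.

    token_logprobs[i] holds the per-token log-probabilities of completion i.
    """
    total, count = 0.0, 0
    for logps, adv in zip(token_logprobs, advantages):
        for lp in logps:
            if rng.random() < keep_prob:  # stochastic token selection
                total += -adv * lp
                count += 1
    return total / count if count else 0.0


# Usage with made-up numbers: four completions, two of them rewarded.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
logprobs = [[-1.0, -2.0], [-0.5], [-0.3, -0.7], [-1.5]]
loss = sampled_token_loss(logprobs, advantages, 0.5, random.Random(0))
```

Keeping only a fraction of tokens shrinks the number of terms that need gradients, which is the memory-saving idea the abstract describes; how the informative subset is actually chosen is specific to the paper.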
Article 301
Title@2025-06-11 (3): Estimating the Joint Probability of Scenario Parameters with Gaussian Mixture Copula Models
Title: Estimating the Joint Probability of Scenario Parameters with Gaussian Mixture Copula Models | Schätzung der gemeinsamen Wahrscheinlichkeit von Szenario-Parametern mit Gaussian Mixture Copula-Modellen | 与高山混合混合胶合合构件模型共同估计设想参数的 联合概率 2506.10098v1 |
Authors (4): Christian Reichenbächer, Philipp Rank, Jochen Hipp, Oliver Bringmann
This paper presents the first application of Gaussian Mixture Copula Models to the statistical modeling of driving scenarios for the safety validation of automated driving systems. Knowledge of the joint probability distribution of scenario parameters is essential for scenario-based safety assessment, where risk quantification depends on the likelihood of concrete parameter combinations. Gaussian Mixture Copula Models bring together the multimodal expressivity of Gaussian Mixture Models and the flexibility of copulas, enabling separate modeling of marginal distributions and dependencies. We benchmark Gaussian Mixture Copula Models against previously proposed approaches (Gaussian Mixture Models and Gaussian Copula Models) using real-world driving data drawn from scenarios defined in United Nations Regulation No. 157. Our evaluation across 18 million scenario instances demonstrates that Gaussian Mixture Copula Models provide a better fit to the data in terms of both likelihood and Sinkhorn distance. These results suggest that Gaussian Mixture Copula Models are a compelling foundation for future scenario-based validation frameworks.
Article 302
Title@2025-06-11 (3): Unsupervised Deep Clustering of MNIST with Triplet-Enhanced Convolutional Autoencoders
Title: Unsupervised Deep Clustering of MNIST with Triplet-Enhanced Convolutional Autoencoders | Unüberwachtes Deep Clustering von MNIST mit Triplet-Enhanced Convolutional Autoencodern | 在无人监督下,MNIST与三维强化的革命自动自动编码器一起深入集群 2506.10094v1 |
Authors (1): Md. Faizul Islam Ansari
This research implements an unsupervised clustering system for MNIST handwritten digits built on a two-phase deep autoencoder architecture. In phase one, a deep neural autoencoder is trained to learn compact yet interpretable image representations by minimizing reconstruction error. In phase two, we combine the reconstruction error with a KMeans clustering loss on the learned latent embeddings in a joint distance-based objective. The model uses batch normalization, dropout, and weight decay to obtain stable results that generalize. In extensive tests, the framework achieves superior clustering performance on intrinsic measures (Silhouette Score and Davies-Bouldin Index) and on extrinsic metrics (NMI and ARI). t-SNE visualizations of the learned embeddings show distinct clusters for the digits. The approach balances reconstruction accuracy against cluster separation and purity, while remaining interpretable and scalable to implement. It provides a dependable base for deploying unsupervised representation learning in large-scale image clustering applications.
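A joint objective of the kind described for phase two might look like the sketch below. It assumes the common form "mean squared reconstruction error plus a weighted mean squared distance to the nearest centroid"; the weight `lam`, the function names, and the exact form of the clustering term are assumptions, since the abstract does not spell them out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def joint_loss(inputs, reconstructions, embeddings, centroids, lam=0.1):
    """Reconstruction error plus lam times a KMeans-style clustering loss."""
    n = len(inputs)
    recon = sum(sq_dist(x, r) for x, r in zip(inputs, reconstructions)) / n
    # Each embedding is charged the squared distance to its nearest centroid.
    cluster = sum(min(sq_dist(z, c) for c in centroids) for z in embeddings) / n
    return recon + lam * cluster


# Usage with tiny made-up vectors standing in for images and latent codes.
example_loss = joint_loss(
    inputs=[[0.0, 0.0], [1.0, 1.0]],
    reconstructions=[[0.0, 0.0], [1.0, 0.0]],
    embeddings=[[0.0], [2.0]],
    centroids=[[0.0], [3.0]],
)
```

In a real training loop both terms would be differentiable functions of the autoencoder weights, and the centroids would be updated alongside them; here plain lists stand in for tensors.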
Article 303
Title@2025-06-11 (3): Efficient kernelized bandit algorithms via exploration distributions
Title: Efficient kernelized bandit algorithms via exploration distributions | Effiziente kernisierte Bandit-Algorithmen über Explorations-Distributionen | 通过勘探分布分配进行高效率的内脏化土匪算法 2506.10091v1 |
Authors (3): Bingshan Hu, Zheng He, Danica J. Sutherland
We consider a kernelized bandit problem with a compact arm set $X \subset \mathbb{R}^d$ and a fixed but unknown reward function $f^*$ with a finite norm in some Reproducing Kernel Hilbert Space (RKHS). We propose a class of computationally efficient kernelized bandit algorithms, which we call GP-Generic, based on a novel concept: exploration distributions. This class of algorithms includes Upper Confidence Bound-based approaches as a special case, but also allows for a variety of randomized algorithms. With careful choice of exploration distribution, our proposed generic algorithm realizes a wide range of concrete algorithms that achieve $\tilde{O}(\gamma_T\sqrt{T})$ regret bounds, where $\gamma_T$ characterizes the RKHS complexity. This matches known results for UCB- and Thompson Sampling-based algorithms; we also show that randomization can yield better results in practice.
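One way to picture an exploration distribution is as the source of the weight placed on posterior uncertainty when scoring arms. The sketch below is an assumption about the general shape of such a rule, not the GP-Generic algorithm itself: it takes posterior means and standard deviations as given (no Gaussian process is fitted), and the scoring rule `mean + w * std` and all names are hypothetical.

```python
import random


def choose_arm(means, stds, draw_weight):
    """Pick the arm maximizing mean + w * std, with w drawn once per round."""
    w = draw_weight()
    scores = [m + w * s for m, s in zip(means, stds)]
    return max(range(len(scores)), key=scores.__getitem__)


means = [0.5, 0.4]
stds = [0.0, 0.2]

# A point mass on a constant recovers a UCB-style rule.
ucb_arm = choose_arm(means, stds, lambda: 1.0)

# A randomized rule: the weight is a half-normal draw (an arbitrary choice).
rng = random.Random(0)
randomized_arm = choose_arm(means, stds, lambda: abs(rng.gauss(0.0, 1.0)))
```

Under this reading, different algorithms in the class differ only in `draw_weight`, which is how a single analysis could cover both deterministic and randomized variants.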
Article 304
Title@2025-06-11 (3): Assortment Optimization for Patient-Provider Matching
Title: Assortment Optimization for Patient-Provider Matching | Sortimentsoptimierung für Patient-Provider-Matching | 患者-供应者匹配的组群优化 2502.10353v2 |
Authors (2): Naveen Raman, Holly Wiberg
Rising provider turnover means that patients must frequently be rematched with available providers. However, the rematching process is cumbersome for both patients and health systems, resulting in labor-intensive and ad hoc reassignments. We propose a novel patient-provider matching approach to address this issue by offering patients limited provider menus. The goal is to maximize match quality across the system while preserving patient choice. We frame this as a novel variant of assortment optimization, where patient-specific provider menus are offered upfront, and patients respond in a random sequence to make their selections. This hybrid offline-online setting is understudied in previous literature and captures system dynamics across various domains. We first demonstrate that a greedy baseline policy, which offers all providers to all patients, can maximize the match rate but leads to low-quality matches. Based on this, we construct a set of policies and demonstrate that the best policy depends on problem specifics, such as a patient’s willingness to match and the ratio of patients to providers. On real-world data, our proposed policy improves average match quality by 13% over a greedy solution by tailoring assortments based on patient characteristics. Our analysis reveals a tradeoff between menu size and system-wide match quality, highlighting the value of balancing patient choice with centralized planning.
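The offline-online setting (menus fixed upfront, patients responding in random order) can be simulated in a few lines. The sketch below rests on simplifying assumptions that are not from the paper: each provider takes one patient, every patient always accepts the highest-quality provider still available on their menu, and the quality numbers are invented.

```python
import random


def simulate(menus, quality, rng):
    """Patients arrive in random order and take their best available provider."""
    order = list(menus)
    rng.shuffle(order)
    taken, matches = set(), {}
    for patient in order:
        available = [p for p in menus[patient] if p not in taken]
        if available:
            choice = max(available, key=lambda p: quality[patient][p])
            taken.add(choice)
            matches[patient] = choice
    return matches


def total_quality(matches, quality):
    """Sum of match qualities over all matched patients."""
    return sum(quality[patient][prov] for patient, prov in matches.items())


quality = {"a": {"x": 0.9, "y": 0.8}, "b": {"x": 1.0, "y": 0.1}}
full_menus = {"a": ["x", "y"], "b": ["x", "y"]}   # greedy: offer everything
tailored_menus = {"a": ["y"], "b": ["x"]}          # restricted menus

greedy_result = simulate(full_menus, quality, random.Random(0))
tailored_result = simulate(tailored_menus, quality, random.Random(0))
```

In this toy instance both policies match every patient, but the full menu lets patient "a" take provider "x" when arriving first, leaving "b" with a poor match; the tailored menus remove that dependence on arrival order.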
Article 305
Title@2025-06-11 (3): Optimizing Latent Dimension Allocation in Hierarchical VAEs: Balancing Attenuation and Information Retention for OOD Detection
Title: Optimizing Latent Dimension Allocation in Hierarchical VAEs: Balancing Attenuation and Information Retention for OOD Detection | Optimierung der Latent Dimension Allocation in Hierarchical VAEs: Abgleichende Dämpfung und Informationsretention für OOD Detection | 优化在等级性VAEs中的中局量分配:平衡减速和信息保留以探测OOD 2506.10089v1 |
Authors (3): Dane Williamson, Yangfeng Ji, Matthew Dwyer
Out-of-distribution (OOD) detection is a critical task in machine learning, particularly for safety-critical applications where unexpected inputs must be reliably flagged. While hierarchical variational autoencoders (HVAEs) offer improved representational capacity over traditional VAEs, their performance is highly sensitive to how latent dimensions are distributed across layers. Existing approaches often allocate latent capacity arbitrarily, leading to ineffective representations or posterior collapse. In this work, we introduce a theoretically grounded framework for optimizing latent dimension allocation in HVAEs, drawing on principles from information theory to formalize the trade-off between information loss and representational attenuation. We prove the existence of an optimal allocation ratio $r^{\ast}$ under a fixed latent budget, and empirically show that tuning this ratio consistently improves OOD detection performance across datasets and architectures. Our approach outperforms baseline HVAE configurations and provides practical guidance for principled latent structure design, leading to more robust OOD detection with deep generative models.
Article 306
Title@2025-06-11 (3): M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Predictive Embedding Architecture
Title: M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Predictive Embedding Architecture | M3-JEPA: Multimodale Ausrichtung über Multi-Gate MoE auf Basis der Joint-Predictive Embedding Architecture | M3-JEPA:通过基于联合预防嵌入结构的多门教育部实现多式联运协调 2409.05929v5 |
Authors (11): Hongyang Lei, Xiaolong Cheng, Qi Qin, Dan Wang, Kun Fan, Huazhen Huang, Qingqing Gu, Yetao Wu, Zhonglin Jiang, Yong Chen, Luo Ji
Current multimodal learning strategies primarily optimize in the original token space. Such a framework is easy to integrate with the backbone of a pretrained language model, but might result in modality collapse. To alleviate such issues, we apply the joint embedding predictive architecture (JEPA) to multimodal tasks: a predictor maps the input embedding into the output embedding space, and cross-modal alignment is then conducted in the latent space. We implement this predictor as a Multi-Gate Mixture of Experts (MMoE) and accordingly name the framework M3-JEPA. The gating function disentangles modality-specific and shared information and derives information-theoretic optimality. The framework is implemented with both contrastive and regularization losses, and solved by alternative gradient descent (AGD) between different multimodal tasks. Through carefully designed experiments, we show that M3-JEPA can obtain state-of-the-art performance on different modalities and tasks, generalize to unseen datasets and domains, and is computationally efficient in both training and inference. Our observations suggest that M3-JEPA might become a new basis for self-supervised learning in the open world.
Article 307
Title@2025-06-11 (3): Patient-Specific Deep Reinforcement Learning for Automatic Replanning in Head-and-Neck Cancer Proton Therapy
Title: Patient-Specific Deep Reinforcement Learning for Automatic Replanning in Head-and-Neck Cancer Proton Therapy | Patientenspezifisches Tiefen-Verstärkungs-Lernen zur automatischen Neuplanung in der Kopf-Neck-Krebs-Protonen-Therapie | 在头部和颈部癌症质子治疗中进行自动再规划的 具体病人深强化学习 2506.10073v1 |
Authors (6): Malvern Madondo, Yuan Shao, Yingzi Liu, Jun Zhou, Xiaofeng Yang, Zhen Tian
Anatomical changes during intensity-modulated proton therapy (IMPT) for head-and-neck cancer (HNC) can shift Bragg peaks, risking tumor underdosing and organ-at-risk overdosing. As a result, treatment replanning is often required to maintain clinically acceptable treatment quality. However, current manual replanning processes are resource-intensive and time-consuming. We propose a patient-specific deep reinforcement learning (DRL) framework for automated IMPT replanning, with a reward-shaping mechanism based on a 150-point plan quality score addressing competing clinical objectives. We formulate the planning process as an RL problem where agents learn control policies to adjust optimization priorities, maximizing plan quality. Unlike population-based approaches, our framework trains personalized agents for each patient using their planning CT (Computed Tomography) and augmented anatomies simulating anatomical changes (tumor progression and regression). This patient-specific approach leverages anatomical similarities throughout treatment, enabling effective plan adaptation. We implemented two DRL algorithms, Deep Q-Network and Proximal Policy Optimization, using dose-volume histograms (DVHs) as state representations and a 22-dimensional action space of priority adjustments. Evaluation on five HNC patients using actual replanning CT data showed both DRL agents improved initial plan scores from $120.63 \pm 21.40$ to $139.78 \pm 6.84$ (DQN) and $142.74 \pm 5.16$ (PPO), surpassing manual replans generated by a human planner ($137.20 \pm 5.58$). Clinical validation confirms that improvements translate to better tumor coverage and OAR sparing across diverse anatomical changes. This work demonstrates DRL’s potential in addressing geometric and dosimetric complexities of adaptive proton therapy, offering efficient offline adaptation solutions and advancing online adaptive proton therapy.
Article 308
Title@2025-06-11 (3): Textual Bayes: Quantifying Uncertainty in LLM-Based Systems
Title: Textual Bayes: Quantifying Uncertainty in LLM-Based Systems | Textbuchten: Quantifizierung der Unsicherheit in LLM-basierten Systemen | 以LLM为基础的系统中的量化不确定性 2506.10060v1 |
Authors (11): Brendan Leigh Ross, Noël Vouitsis, Atiyeh Ashari Ghomi, Rasa Hosseinzadeh, Ji Xin, Zhaoyan Liu, Yi Sui, Shiyi Hou, Kin Kwan Leung, Gabriel Loaiza-Ganem, Jesse C. Cresswell
Although large language models (LLMs) are becoming increasingly capable of solving challenging real-world tasks, accurately quantifying their uncertainty remains a critical open problem, which limits their applicability in high-stakes domains. This challenge is further compounded by the closed-source, black-box nature of many state-of-the-art LLMs. Moreover, LLM-based systems can be highly sensitive to the prompts that bind them together, which often require significant manual tuning (i.e., prompt engineering). In this work, we address these challenges by viewing LLM-based systems through a Bayesian lens. We interpret prompts as textual parameters in a statistical model, allowing us to use a small training dataset to perform Bayesian inference over these prompts. This novel perspective enables principled uncertainty quantification over both the model’s textual parameters and its downstream predictions, while also incorporating prior beliefs about these parameters expressed in free-form text. To perform Bayesian inference, a difficult problem even for well-studied data modalities, we introduce Metropolis-Hastings through LLM Proposals (MHLP), a novel Markov chain Monte Carlo (MCMC) algorithm that combines prompt optimization techniques with standard MCMC methods. MHLP is a turnkey modification to existing LLM pipelines, including those that rely exclusively on closed-source models. Empirically, we demonstrate that our method yields improvements in both predictive accuracy and uncertainty quantification (UQ) on a range of LLM benchmarks and UQ tasks. More broadly, our work demonstrates a viable path for incorporating methods from the rich Bayesian literature into the era of LLMs, paving the way for more reliable and calibrated LLM-based systems.
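The Metropolis-Hastings accept/reject step that an MCMC method over prompts relies on can be sketched generically. This is textbook Metropolis-Hastings on a toy two-state space, not the MHLP algorithm: the "prompts", the fixed log-posterior table, and the deterministic `flip` proposal are hypothetical stand-ins for an LLM proposal and a data-dependent posterior.

```python
import math
import random


def mh_step(current, propose, log_target, log_proposal, rng):
    """One Metropolis-Hastings step; log_proposal(to, frm) = log q(to | frm)."""
    candidate = propose(current)
    log_ratio = (
        log_target(candidate) + log_proposal(current, candidate)
        - log_target(current) - log_proposal(candidate, current)
    )
    accept = log_ratio >= 0 or rng.random() < math.exp(log_ratio)
    return candidate if accept else current


# Toy stand-ins (hypothetical): two "prompts" with a fixed posterior.
LOG_POSTERIOR = {"prompt A": math.log(0.2), "prompt B": math.log(0.8)}


def flip(prompt):
    """Proposal that always suggests the other prompt."""
    return "prompt B" if prompt == "prompt A" else "prompt A"


def run_chain(steps, seed=0):
    """Run the chain from 'prompt A' and count visits to each prompt."""
    rng = random.Random(seed)
    state = "prompt A"
    visits = {"prompt A": 0, "prompt B": 0}
    for _ in range(steps):
        state = mh_step(state, flip, LOG_POSTERIOR.get,
                        lambda to, frm: 0.0, rng)
        visits[state] += 1
    return visits


visits = run_chain(20000)
```

Over a long run the visit frequencies approach the target probabilities, which is what turns a collection of sampled prompts into an approximate posterior; with a real LLM proposal the `log_proposal` term would generally not cancel and would have to be evaluated or approximated.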
Article 309
Title@2025-06-11 (3): DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos
Title: DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos | DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos | DGS-LRM: 实时变形 3D Gaussian 重建独视视频 2506.09997v1 |
Authors (14): Chieh Hubert Lin, Zhaoyang Lv, Songyin Wu, Zhen Xu, Thu Nguyen-Phuoc, Hung-Yu Tseng, Julian Straub, Numair Khan, Lei Xiao, Ming-Hsuan Yang, Yuheng Ren, Richard Newcombe, Zhao Dong, Zhengqin Li
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. Feed-forward scene reconstruction has gained significant attention for its ability to rapidly create digital replicas of real-world environments. However, most existing models are limited to static scenes and fail to reconstruct the motion of moving objects. Developing a feed-forward model for dynamic scene reconstruction poses significant challenges, including the scarcity of training data and the need for appropriate 3D representations and training paradigms. To address these challenges, we introduce several key technical contributions: an enhanced large-scale synthetic dataset with ground-truth multi-view videos and dense 3D scene flow supervision; a per-pixel deformable 3D Gaussian representation that is easy to learn, supports high-quality dynamic view synthesis, and enables long-range 3D tracking; and a large transformer network that achieves real-time, generalizable dynamic scene reconstruction. Extensive qualitative and quantitative experiments demonstrate that DGS-LRM achieves dynamic scene reconstruction quality comparable to optimization-based methods, while significantly outperforming the state-of-the-art predictive dynamic reconstruction method on real-world examples. Its predicted physically grounded 3D deformation is accurate and can readily adapt for long-range 3D tracking tasks, achieving performance on par with state-of-the-art monocular video 3D tracking methods.
Article 310
Title@2025-06-11 (3): Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling
Title: Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling | Flipping gegen alle Ungeraden: LLM-Münze Flip-Bias durch verbalisierte Ablehnungsstichproben reduzieren | 逆势掷币:通过言语化拒绝采样减少LLM掷硬币偏差 2506.09998v1 |
Authors (6): Tim Z. Xiao, Johannes Zenn, Zhen Liu, Weiyang Liu, Robert Bamler, Bernhard Schölkopf
Large language models (LLMs) can often accurately describe probability distributions using natural language, yet they still struggle to generate faithful samples from them. This mismatch limits their use in tasks requiring reliable stochasticity, such as Monte Carlo methods, agent-based simulations, and randomized decision-making. We investigate this gap between knowledge and sampling in the context of Bernoulli distributions. We introduce Verbalized Rejection Sampling (VRS), a natural-language adaptation of classical rejection sampling that prompts the LLM to reason about and accept or reject proposed samples. Despite relying on the same Bernoulli mechanism internally, VRS substantially reduces sampling bias across models. We provide theoretical analysis showing that, under mild assumptions, VRS improves over direct sampling, with gains attributable to both the algorithm and prompt design. More broadly, our results show how classical probabilistic tools can be verbalized and embedded into LLM workflows to improve reliability, without requiring access to model internals or heavy prompt engineering.
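For context, the classical rejection sampling that VRS adapts can be sketched for a Bernoulli target as below. This is only the textbook algorithm computed exactly, not the verbalized LLM procedure from the paper; the function names and the example bias of 0.7 are assumptions made for illustration.

```python
def acceptance_probabilities(p_target, q_proposal):
    """Return accept probabilities for x=0 and x=1 under classical rejection sampling."""
    target = {0: 1.0 - p_target, 1: p_target}
    proposal = {0: 1.0 - q_proposal, 1: q_proposal}
    # Envelope constant M must satisfy target(x) <= M * proposal(x) for all x.
    envelope = max(target[x] / proposal[x] for x in (0, 1))
    return {x: target[x] / (envelope * proposal[x]) for x in (0, 1)}


def accepted_distribution(p_target, q_proposal):
    """Exact probability that an accepted sample equals 1."""
    accept = acceptance_probabilities(p_target, q_proposal)
    proposal = {0: 1.0 - q_proposal, 1: q_proposal}
    mass = {x: proposal[x] * accept[x] for x in (0, 1)}
    return mass[1] / (mass[0] + mass[1])


# A proposal biased toward heads (0.7) is corrected to a fair coin (0.5).
corrected = accepted_distribution(p_target=0.5, q_proposal=0.7)
```

The sketch requires a proposal strictly between 0 and 1; otherwise one outcome is never proposed and the envelope constant is undefined.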
Article 311
Title@2025-06-11 (3): EVINET: Towards Open-World Graph Learning via Evidential Reasoning Network
Title: EVINET: Towards Open-World Graph Learning via Evidential Reasoning Network | EVINET: Auf dem Weg zum Open-World Graph Learning über Evidential Reasoning Network | EVINET:通过证明理由网络走向开放世界图表学习 2506.07288v2 |
Authors (5): Weijie Guan, Haohui Wang, Jian Kang, Lihui Liu, Dawei Zhou
Graph learning has been crucial to many real-world tasks, but it is often studied under a closed-world assumption, in which all possible labels of the data are known a priori. To enable effective graph learning in an open and noisy environment, it is critical to inform model users when the model makes a wrong prediction on in-distribution data of a known class (misclassification detection) or when the model encounters out-of-distribution data from novel classes (out-of-distribution detection). This paper introduces Evidential Reasoning Network (EVINET), a framework that addresses these two challenges by integrating Beta embedding within a subjective logic framework. EVINET includes two key modules: Dissonance Reasoning for misclassification detection and Vacuity Reasoning for out-of-distribution detection. Extensive experiments demonstrate that EVINET outperforms state-of-the-art methods across multiple metrics in the tasks of in-distribution classification, misclassification detection, and out-of-distribution detection. EVINET demonstrates the necessity of uncertainty estimation and logical reasoning for misclassification detection and out-of-distribution detection and paves the way for open-world graph learning. Our code and data are available at https://github.com/SSSKJ/EviNET.
Article 312
Title@2025-06-11 (3): Text-Aware Image Restoration with Diffusion Models
Title: Text-Aware Image Restoration with Diffusion Models | Text-Aware Bildrestaurierung mit Diffusionsmodellen | 带有传播模型的文本软件图像恢复 2506.09993v1 |
Authors (9): Jaewon Min, Jin Hyeon Kim, Paul Hyunbin Cho, Jaeeun Lee, Jihye Park, Minkyu Park, Sangpil Kim, Hyunhee Park, Seungryong Kim
Image restoration aims to recover degraded images. However, existing diffusion-based restoration methods, despite great success in natural image restoration, often struggle to faithfully reconstruct textual regions in degraded images. Those methods frequently generate plausible but incorrect text-like patterns, a phenomenon we refer to as text-image hallucination. In this paper, we introduce Text-Aware Image Restoration (TAIR), a novel restoration task that requires the simultaneous recovery of visual contents and textual fidelity. To tackle this task, we present SA-Text, a large-scale benchmark of 100K high-quality scene images densely annotated with diverse and complex text instances. Furthermore, we propose a multi-task diffusion framework, called TeReDiff, that integrates internal features from diffusion models into a text-spotting module, enabling both components to benefit from joint training. This allows for the extraction of rich text representations, which are utilized as prompts in subsequent denoising steps. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art restoration methods, achieving significant gains in text recognition accuracy. See our project page: https://cvlab-kaist.github.io/TAIR/
Article 313
Title@2025-06-11 (3): Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Title: Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation | Multiverse: Ihre Sprachmodelle entscheiden geheim, wie die Generation parallelisiert und zusammengeführt werden kann | 多种语言:您的语言模型秘密决定如何平行和合并一代 2506.09991v1 |
Authors (5): Xinyu Yang, Yuwei An, Hongyi Liu, Tianqi Chen, Beidi Chen
Autoregressive Large Language Models (AR-LLMs) frequently exhibit implicit parallelism in sequential generation. Inspired by this, we introduce Multiverse, a new generative model that enables natively parallel generation. Multiverse internalizes a MapReduce paradigm, generating automatically through three stages: (i) a Map stage for adaptive task decomposition, (ii) a Process stage for parallel subtask execution, and (iii) a Reduce stage for lossless result synthesis. Next, we build a real-world Multiverse reasoning model with co-design of data, algorithm, and system, enabling rapid and seamless transfer from frontier AR-LLMs. Starting from sequential reasoning chains, we create Multiverse 1K by converting them into structured training data using an automated LLM-assisted pipeline, avoiding costly human annotations. Algorithmically, we design Multiverse Attention to separate parallel reasoning steps while keeping compatibility with causal attention for efficient training. Systematically, we implement Multiverse Engine to enable parallel inference. It features a dedicated scheduler that dynamically switches between sequential and parallel generation, triggered directly by the model. After a 3-hour fine-tuning with 1K examples, our Multiverse-32B stands as the only open-sourced non-AR model achieving performance on par with leading AR-LLMs of the same scale, evidenced by AIME24 & 25 scores of 54% and 46%, respectively. Moreover, our budget control experiments show that Multiverse-32B exhibits superior scaling, outperforming AR-LLMs by 1.87% on average using the same context length. Such scaling further leads to practical efficiency gain, achieving up to 2x speedup across varying batch sizes. We have open-sourced the entire Multiverse ecosystem, including data, model weights, engine, supporting tools, as well as complete data curation prompts and detailed training and evaluation recipes.
Article 314
Title@2025-06-11 (3): Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation
Title: Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation | Chain-of-Action: Trajektorie Autoregressive Modellierung für die Robotermanipulation | 操作链:机器人操纵的轨迹自动递减模型模型 2506.09990v1 |
Authors (10): Wenbo Zhang, Tianrun Hu, Yanyuan Qiao, Hanbo Zhang, Yuchu Qin, Yang Li, Jiajun Liu, Tao Kong, Lingqiao Liu, Xiao Ma
We present Chain-of-Action (CoA), a novel visuo-motor policy paradigm built upon Trajectory Autoregressive Modeling. Unlike conventional approaches that predict next step action(s) forward, CoA generates an entire trajectory by explicit backward reasoning with task-specific goals through an action-level Chain-of-Thought (CoT) process. This process is unified within a single autoregressive structure: (1) the first token corresponds to a stable keyframe action that encodes the task-specific goals; and (2) subsequent action tokens are generated autoregressively, conditioned on the initial keyframe and previously predicted actions. This backward action reasoning enforces a global-to-local structure, allowing each local action to be tightly constrained by the final goal. To further realize the action reasoning structure, CoA incorporates four complementary designs: continuous action token representation; dynamic stopping for variable-length trajectory generation; reverse temporal ensemble; and multi-token prediction to balance action chunk modeling with global structure. As a result, CoA gives strong spatial generalization capabilities while preserving the flexibility and simplicity of a visuo-motor policy. Empirically, we observe CoA achieves the state-of-the-art performance across 60 RLBench tasks and 8 real-world manipulation tasks.
Article 315
Title@2025-06-11 (3): Spectral Image Tokenizer
Title: Spectral Image Tokenizer | Spektraler Bild-Tokenizer | 频谱图像调试器 2412.09607v2 |
Authors (3): Carlos Esteves, Mohammed Suhail, Ameesh Makadia
Image tokenizers map images to sequences of discrete tokens, and are a crucial component of autoregressive transformer-based image generation. The tokens are typically associated with spatial locations in the input image, arranged in raster scan order, which is not ideal for autoregressive modeling. In this paper, we propose to tokenize the image spectrum instead, obtained from a discrete wavelet transform (DWT), such that the sequence of tokens represents the image in a coarse-to-fine fashion. Our tokenizer brings several advantages: 1) it leverages that natural images are more compressible at high frequencies, 2) it can take and reconstruct images of different resolutions without retraining, 3) it improves the conditioning for next-token prediction – instead of conditioning on a partial line-by-line reconstruction of the image, it takes a coarse reconstruction of the full image, 4) it enables partial decoding where the first few generated tokens can reconstruct a coarse version of the image, 5) it enables autoregressive models to be used for image upsampling. We evaluate the tokenizer reconstruction metrics as well as multiscale image generation, text-guided image upsampling and editing.
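The coarse-to-fine ordering described in this abstract can be illustrated with a toy one-dimensional Haar wavelet transform. This is a simplified sketch, not the paper's tokenizer: it works on a single row of numbers instead of a 2D image, performs no quantization into discrete tokens, and the function names are hypothetical.

```python
def haar_decompose(signal):
    """Multi-level Haar transform; returns coefficient bands ordered coarse to fine."""
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.insert(0, [(a - b) / 2 for a, b in pairs])  # finer bands end up last
        approx = [(a + b) / 2 for a, b in pairs]
    return [approx] + details


def haar_reconstruct(bands, num_bands=None):
    """Invert the transform using only the first `num_bands` bands (coarse first)."""
    used = bands if num_bands is None else bands[:num_bands]
    signal = list(used[0])
    for detail in used[1:]:
        signal = [value for avg, diff in zip(signal, detail)
                  for value in (avg + diff, avg - diff)]
    # Upsample by repetition if fine bands were dropped.
    while len(signal) < 2 ** (len(bands) - 1):
        signal = [value for value in signal for _ in range(2)]
    return signal


pixels = [9, 7, 3, 5, 6, 10, 2, 6]  # toy 1D "image row"; length must be a power of two
bands = haar_decompose(pixels)
coarse = haar_reconstruct(bands, num_bands=2)  # partial decode from the first bands
full = haar_reconstruct(bands)                 # exact reconstruction from all bands
```

Decoding only a prefix of the bands yields a blurred version of the whole row, which mirrors the partial-decoding property listed in the abstract.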
Article 316
Title@2025-06-11 (3): EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits
Title: EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits | EditInspector: Ein Benchmark für die Auswertung von textgeführten Bildbearbeitungen | 编辑监视器:文本指导图像编辑评价基准 2506.09988v1 |
Authors (4): Ron Yosef, Moran Yanuka, Yonatan Bitton, Dani Lischinski
Text-guided image editing, fueled by recent advancements in generative AI, is becoming increasingly widespread. This trend highlights the need for a comprehensive framework to verify text-guided edits and assess their quality. To address this need, we introduce EditInspector, a novel benchmark for evaluation of text-guided image edits, based on human annotations collected using an extensive template for edit verification. We leverage EditInspector to evaluate the performance of state-of-the-art (SoTA) vision and language models in assessing edits across various dimensions, including accuracy, artifact detection, visual quality, seamless integration with the image scene, adherence to common sense, and the ability to describe edit-induced changes. Our findings indicate that current models struggle to evaluate edits comprehensively and frequently hallucinate when describing the changes. To address these challenges, we propose two novel methods that outperform SoTA models in both artifact detection and difference caption generation.
Article 317
Title@2025-06-11 (3): Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs
Title: Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs | Omni-DPO: Ein Dual-Perspektiven-Paradigma für dynamisches Preference-Lernen von LLMs | Omni-DPO:一个双双面范式,以动态方式学习LLMs 2506.10054v1 |
Authors (10): Shangpin Peng, Weinong Wang, Zhuotao Tian, Senqiao Yang, Xing Wu, Haotian Xu, Chengquan Zhang, Takashi Isobe, Baotian Hu, Min Zhang
Direct Preference Optimization (DPO) has become a cornerstone of reinforcement learning from human feedback (RLHF) due to its simplicity and efficiency. However, existing DPO-based approaches typically treat all preference pairs uniformly, ignoring critical variations in their inherent quality and learning utility, leading to suboptimal data utilization and performance. To address this challenge, we propose Omni-DPO, a dual-perspective optimization framework that jointly accounts for (1) the inherent quality of each preference pair and (2) the model’s evolving performance on those pairs. By adaptively weighting samples according to both data quality and the model’s learning dynamics during training, Omni-DPO enables more effective training data utilization and achieves better performance. Experimental results on various models and benchmarks demonstrate the superiority and generalization capabilities of Omni-DPO. On textual understanding tasks, Gemma-2-9b-it finetuned with Omni-DPO beats the leading LLM, Claude 3 Opus, by a significant margin of 6.7 points on the Arena-Hard benchmark. On mathematical reasoning tasks, Omni-DPO consistently outperforms the baseline methods across all benchmarks, providing strong empirical evidence for the effectiveness and robustness of our approach. Code and models will be available at https://github.com/pspdada/Omni-DPO.
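To make the idea of weighting preference pairs concrete, here is a minimal sketch of the standard DPO loss with a per-pair weight applied on top. The abstract does not give Omni-DPO's actual weighting functions, so `weights` is a hypothetical placeholder standing in for any quality or learning-dynamics score, and the log-probabilities in the example are invented.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probabilities."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) written in a numerically stable form.
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)


def weighted_dpo_loss(pairs, weights, beta=0.1):
    """Weighted mean of per-pair DPO losses; `weights` are hypothetical per-pair scores."""
    total = sum(w * dpo_loss(*pair, beta=beta) for pair, w in zip(pairs, weights))
    return total / sum(weights)


# Each tuple: (policy_chosen, policy_rejected, ref_chosen, ref_rejected) log-probs.
pairs = [(-10.0, -12.0, -11.0, -11.0), (-9.0, -9.5, -9.0, -9.0)]
uniform = weighted_dpo_loss(pairs, weights=[1.0, 1.0])
reweighted = weighted_dpo_loss(pairs, weights=[1.0, 0.5])
```

With uniform weights the function reduces to the ordinary mean DPO loss, which corresponds to the uniform treatment of pairs that the abstract criticizes.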
Article 318
Title@2025-06-11 (3): A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs
Title: A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs | Ein Shortcut-aware-Video-QA-Benchmark für physisches Verständnis über minimale Video-Paare | 通过最小视频对等器进行身体理解的捷径视距视频QA基准 2506.09987v1 |
Authors (7): Benno Krojer, Mojtaba Komeili, Candace Ross, Quentin Garrido, Koustuv Sinha, Nicolas Ballas, Mahmoud Assran
Existing benchmarks for assessing the spatio-temporal understanding and reasoning abilities of video language models are susceptible to score inflation due to the presence of shortcut solutions based on superficial visual or textual cues. This paper mitigates the challenges in accurately assessing model performance by introducing the Minimal Video Pairs (MVP) benchmark, a simple shortcut-aware video QA benchmark for assessing the physical understanding of video language models. The benchmark comprises 55K high-quality multiple-choice video QA examples focusing on physical world understanding. Examples are curated from nine video data sources, spanning first-person egocentric and exocentric videos, robotic interaction data, and cognitive science intuitive physics benchmarks. To mitigate shortcut solutions that rely on superficial visual or textual cues and biases, each sample in MVP has a minimal-change pair – a visually similar video accompanied by an identical question but an opposing answer. To answer a question correctly, a model must provide correct answers for both examples in the minimal-change pair; as such, models that solely rely on visual or textual biases would achieve below-random performance. Human performance on MVP is 92.9%, while the best open-source state-of-the-art video-language model achieves 40.2% compared to random performance at 25%.
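The paired scoring rule described above can be sketched as follows. This is an illustrative reading of the abstract rather than the benchmark's official scorer; the assumption of two answer options per question is mine, chosen because it reproduces the 25% random baseline quoted above.

```python
def paired_accuracy(results):
    """Fraction of minimal-change pairs where both answers are correct.

    `results` is a list of (first_correct, second_correct) booleans.
    """
    return sum(1 for a, b in results if a and b) / len(results)


def chance_paired_accuracy(num_options):
    """Chance level when each question in a pair is guessed independently."""
    return (1.0 / num_options) ** 2


# Hypothetical outcomes for four pairs; only two pairs are fully correct.
results = [(True, True), (True, False), (False, False), (True, True)]
score = paired_accuracy(results)
baseline = chance_paired_accuracy(num_options=2)
```

A model that always gives the same answer to both videos in a pair gets at most one of the two right, so it scores zero under this rule.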
Article 319
Title@2025-06-11 (3): V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
Title: V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | V-JEPA 2: Selbstüberwachte Videomodelle ermöglichen Verständnis, Vorhersage und Planung | V-JEPA2:自我监督的视频模型,促进理解、预测和规划 2506.09985v1 |
Authors (29): Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Mojtaba Komeili, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, Sergio Arnaud, Abha Gejji, Ada Martin, Francois Robert Hogan, Daniel Dugas, Piotr Bojanowski, Vasil Khalidov, Patrick Labatut, Francisco Massa, Marc Szafraniec, Kapil Krishnakumar, Yong Li, Xiaodong Ma, Sarath Chandar, Franziska Meier, Yann LeCun, Michael Rabbat, Nicolas Ballas
A major challenge for modern AI is to learn to understand the world and learn to act largely by observation. This paper explores a self-supervised approach that combines internet-scale video data with a small amount of interaction data (robot trajectories), to develop models capable of understanding, predicting, and planning in the physical world. We first pre-train an action-free joint-embedding-predictive architecture, V-JEPA 2, on a video and image dataset comprising over 1 million hours of internet video. V-JEPA 2 achieves strong performance on motion understanding (77.3 top-1 accuracy on Something-Something v2) and state-of-the-art performance on human action anticipation (39.7 recall-at-5 on Epic-Kitchens-100) surpassing previous task-specific models. Additionally, after aligning V-JEPA 2 with a large language model, we demonstrate state-of-the-art performance on multiple video question-answering tasks at the 8 billion parameter scale (e.g., 84.0 on PerceptionTest, 76.9 on TempCompass). Finally, we show how self-supervised learning can be applied to robotic planning tasks by post-training a latent action-conditioned world model, V-JEPA 2-AC, using less than 62 hours of unlabeled robot videos from the Droid dataset. We deploy V-JEPA 2-AC zero-shot on Franka arms in two different labs and enable picking and placing of objects using planning with image goals. Notably, this is achieved without collecting any data from the robots in these environments, and without any task-specific training or reward. This work demonstrates how self-supervised learning from web-scale data and a small amount of robot interaction data can yield a world model capable of planning in the physical world.
Article 320
Title@2025-06-11 (3): Trustworthy AI: Safety, Bias, and Privacy – A Survey
Title: Trustworthy AI: Safety, Bias, and Privacy – A Survey | Vertrauenswürdige KI: Sicherheit, Bias und Privatsphäre – Eine Umfrage | 值得信赖的AI:安全、偏见和隐私 – 调查 2502.10450v2 |
Authors (4): Xingli Fang, Jianwei Li, Varun Mulchandani, Jung-Eun Kim
The capabilities of artificial intelligence systems have been advancing to a great extent, but these systems still struggle with failure modes, vulnerabilities, and biases. In this paper, we study the current state of the field, and present promising insights and perspectives regarding concerns that challenge the trustworthiness of AI models. In particular, this paper investigates the issues regarding three thrusts: safety, privacy, and bias, which hurt models’ trustworthiness. For safety, we discuss safety alignment in the context of large language models, preventing them from generating toxic or harmful content. For bias, we focus on spurious biases that can mislead a network. Lastly, for privacy, we cover membership inference attacks in deep neural networks. The discussions addressed in this paper reflect our own experiments and observations.
Article 321
Title@2025-06-11 (3): Resa: Transparent Reasoning Models via SAEs
Title: Resa: Transparent Reasoning Models via SAEs | Resa: Transparente Begründungsmodelle über SAE | Resa:通过SAEs建立透明说明理由模型 2506.09967v1 |
Authors (7): Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Deqing Fu, Willie Neiswanger
How cost-effectively can we elicit strong reasoning in language models by leveraging their underlying representations? We answer this question with Resa, a family of 1.5B reasoning models trained via a novel and efficient sparse autoencoder tuning (SAE-Tuning) procedure. This method first trains an SAE to capture reasoning abilities from a source model, and then uses the trained SAE to guide a standard supervised fine-tuning process to elicit such abilities in a target model, all using verified question-answer data without any reasoning traces. Notably, when applied to certain base models before further RL post-training, SAE-Tuning retains >97% of its RL-trained counterpart’s reasoning performance while reducing training costs by >2000x to roughly $1 and training time by >450x to around 20 minutes. Furthermore, when applied to lightly RL-trained models (e.g., within 1 hour on 2 GPUs), it enables reasoning performance such as 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23 for only around $1 additional cost. Surprisingly, the reasoning abilities extracted via SAEs are potentially both generalizable and modular. Generality means abilities extracted from one dataset still elevate performance on a larger and overlapping corpus. Modularity means abilities extracted from Qwen or Qwen-Math can be attached to the R1-Distill model at test time, without any retraining, and yield comparable gains. Extensive ablations validate these findings and all artifacts are fully open-sourced.
Article 322
Title@2025-06-11 (3): Gradient Aligned Regression via Pairwise Losses
Title: Gradient Aligned Regression via Pairwise Losses | Gradient Aligned Regression über Paarweise Verluste | 通过对等损失逐步走向一致后退 2402.06104v6 |
Authors (3): Dixian Zhu, Tianbao Yang, Livnat Jerby
Regression is a fundamental task in machine learning that has garnered extensive attention over the past decades. The conventional approach to regression employs loss functions that primarily concentrate on aligning the model prediction with the ground truth for each individual data sample. Recent research has introduced novel perspectives by incorporating label similarity into regression, imposing extra pairwise regularization on the latent feature space, and has demonstrated its effectiveness. However, these approaches have two drawbacks: i) their pairwise operation in the latent feature space is computationally more expensive than conventional regression losses; ii) they lack theoretical justification for such regularization. In this work, we propose GAR (Gradient Aligned Regression) as a competitive alternative method in label space, which consists of a conventional regression loss and two pairwise label difference losses for gradient alignment, covering magnitude and direction. GAR enjoys: i) the same level of efficiency as a conventional regression loss, because the quadratic complexity of the proposed pairwise losses can be reduced to linear complexity; ii) theoretical insights connecting learning the pairwise label difference to learning the gradient of the ground truth function. We limit our current scope to regression in the clean data setting, without noise, outliers, or distributional shifts. We demonstrate the effectiveness of the proposed method on two synthetic datasets and on eight extensive real-world tasks from six benchmark datasets, against eight other competitive baselines. Running time experiments demonstrate the superior efficiency of the proposed GAR over existing methods with pairwise regularization in the latent feature space, and ablation studies demonstrate the effectiveness of each component of GAR.
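The claim that a quadratic-cost pairwise loss can be computed in linear time can be illustrated with a generic identity: the mean squared mismatch between pairwise prediction differences and pairwise label differences equals twice the variance of the residuals. The sketch below checks this on invented numbers; it is not necessarily the exact loss used in GAR, and the function names are hypothetical.

```python
def pairwise_loss_quadratic(predictions, labels):
    """Mean squared mismatch between pairwise prediction and label differences."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff = (predictions[i] - predictions[j]) - (labels[i] - labels[j])
            total += diff ** 2
    return total / (n * n)


def pairwise_loss_linear(predictions, labels):
    """Same quantity in O(n): twice the variance of the residuals."""
    n = len(labels)
    residuals = [p - y for p, y in zip(predictions, labels)]
    mean = sum(residuals) / n
    return 2.0 * sum((r - mean) ** 2 for r in residuals) / n


# Made-up predictions and labels for four samples.
predictions = [1.0, 2.5, 0.5, 4.0]
labels = [1.5, 2.0, 1.0, 3.0]
slow = pairwise_loss_quadratic(predictions, labels)
fast = pairwise_loss_linear(predictions, labels)
```

The equivalence follows from expanding the square over all pairs, which leaves only the sum of the residuals and the sum of their squares, both computable in a single pass.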
Article 323
Title@2025-06-11 (3): Griffin: Towards a Graph-Centric Relational Database Foundation Model
Title: Griffin: Towards a Graph-Centric Relational Database Foundation Model | Griffin: Auf dem Weg zu einem Graph-Centric Relational Database Foundation Modell | 格里芬:走向图形中心关系数据库基础模型 2505.05568v2 |
Authors (7): Yanbo Wang, Xiyuan Wang, Quan Gan, Minjie Wang, Qibin Yang, David Wipf, Muhan Zhang
We introduce Griffin, the first foundation model attempt designed specifically for Relational Databases (RDBs). Unlike previous smaller models focused on single RDB tasks, Griffin unifies the data encoder and task decoder to handle diverse tasks. Additionally, we enhance the architecture by incorporating a cross-attention module and a novel aggregator. Griffin utilizes pretraining on both single-table and RDB datasets, employing advanced encoders for categorical, numerical, and metadata features, along with innovative components such as cross-attention modules and enhanced message-passing neural networks (MPNNs) to capture the complexities of relational data. Evaluated on large-scale, heterogeneous, and temporal graphs extracted from RDBs across various domains (spanning over 150 million nodes), Griffin demonstrates superior or comparable performance to individually trained models, excels in low-data scenarios, and shows strong transferability with similarity and diversity in pretraining across new datasets and tasks, highlighting its potential as a universally applicable foundation model for RDBs. Code available at https://github.com/yanxwb/Griffin.
Article 324
Title@2025-06-11 (3): Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models
Title: Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models | Provable Benefits of Unüberwachte Pre-Training and Transfer Learning via Single-Index Models | 通过单一指标模式进行的未受监督的培训和转让前学习的惠益 2502.16849v2 |
Authors (3): Taj Jones-McCormick, Aukosh Jagannath, Subhabrata Sen
Unsupervised pre-training and transfer learning are commonly used techniques to initialize training algorithms for neural networks, particularly in settings with limited labeled data. In this paper, we study the effects of unsupervised pre-training and transfer learning on the sample complexity of high-dimensional supervised learning. Specifically, we consider the problem of training a single-layer neural network via online stochastic gradient descent. We establish that pre-training and transfer learning (under concept shift) reduce sample complexity by polynomial factors (in the dimension) under very general assumptions. We also uncover some surprising settings where pre-training grants exponential improvement over random initialization in terms of sample complexity.
Article 325
Title@2025-06-11 (3): Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy
Title: Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy | Kvasir-VQA-x1: Ein multimodaler Datensatz für medizinische Vernunft und robustes MedVQA in der gastrointestinalen Endoskopie | Kvasir-VQA-x1:一个用于医疗理由的多模式数据集和气肠内肠内镜检查中的强力MDVQA 2506.09958v1 |
Authors (3): Sushant Gautam, Michael A. Riegler, Pål Halvorsen
Medical Visual Question Answering (MedVQA) is a promising field for developing clinical decision support systems, yet progress is often limited by the available datasets, which can lack clinical complexity and visual diversity. To address these gaps, we introduce Kvasir-VQA-x1, a new, large-scale dataset for gastrointestinal (GI) endoscopy. Our work significantly expands upon the original Kvasir-VQA by incorporating 159,549 new question-answer pairs that are designed to test deeper clinical reasoning. We developed a systematic method using large language models to generate these questions, which are stratified by complexity to better assess a model’s inference capabilities. To ensure our dataset prepares models for real-world clinical scenarios, we have also introduced a variety of visual augmentations that mimic common imaging artifacts. The dataset is structured to support two main evaluation tracks: one for standard VQA performance and another to test model robustness against these visual perturbations. By providing a more challenging and clinically relevant benchmark, Kvasir-VQA-x1 aims to accelerate the development of more reliable and effective multimodal AI systems for use in clinical settings. The dataset is fully accessible and adheres to FAIR data principles, making it a valuable resource for the wider research community. Code and data: https://github.com/Simula/Kvasir-VQA-x1 and https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1
Article 326
Title@2025-06-11 (3): Canonical Latent Representations in Conditional Diffusion Models
Title: Canonical Latent Representations in Conditional Diffusion Models | Kanonische Latentendarstellungen in bedingten Diffusionsmodellen | 有条件扩散模型中卡尼卡内端表示器 2506.09955v1 |
Authors (4): Yitao Xu, Tong Zhang, Ehsan Pajouheshgar, Sabine Süsstrunk
Conditional diffusion models (CDMs) have shown impressive performance across a range of generative tasks. Their ability to model the full data distribution has opened new avenues for analysis-by-synthesis in downstream discriminative learning. However, this same modeling capacity causes CDMs to entangle the class-defining features with irrelevant context, posing challenges to extracting robust and interpretable representations. To this end, we identify Canonical LAtent Representations (CLAReps), latent codes whose internal CDM features preserve essential categorical information while discarding non-discriminative signals. When decoded, CLAReps produce representative samples for each class, offering an interpretable and compact summary of the core class semantics with minimal irrelevant details. Exploiting CLAReps, we develop a novel diffusion-based feature-distillation paradigm, CaDistill. While the student has full access to the training set, the CDM as teacher transfers core class knowledge only via CLAReps, which amounts to merely 10% of the training data in size. After training, the student achieves strong adversarial robustness and generalization ability, focusing more on the class signals instead of spurious background cues. Our findings suggest that CDMs can serve not just as image generators but also as compact, interpretable teachers that can drive robust representation learning.
nan
Article 327
Title@2025-06-11 (3): Scalable First-order Method for Certifying Optimal k-Sparse GLMs
Title: Scalable First-order Method for Certifying Optimal k-Sparse GLMs | Skalierbare Methode erster Ordnung zur Zertifizierung optimaler k-Spar-GLMs | 可缩放的核证最佳 k- Sparse GLM 最佳 GLM 的可缩放第一等级方法 2502.09502v3 |
Authors (3): Jiachang Liu, Soroosh Shafiee, Andrea Lodi
This paper investigates the problem of certifying optimality for sparse generalized linear models (GLMs), where sparsity is enforced through an $\ell_0$ cardinality constraint. While branch-and-bound (BnB) frameworks can certify optimality by pruning nodes using dual bounds, existing methods for computing these bounds are either computationally intensive or exhibit slow convergence, limiting their scalability to large-scale problems. To address this challenge, we propose a first-order proximal gradient algorithm designed to solve the perspective relaxation of the problem within a BnB framework. Specifically, we formulate the relaxed problem as a composite optimization problem and demonstrate that the proximal operator of the non-smooth component can be computed exactly in log-linear time complexity, eliminating the need to solve a computationally expensive second-order cone program. Furthermore, we introduce a simple restart strategy that enhances convergence speed while maintaining low per-iteration complexity. Extensive experiments on synthetic and real-world datasets show that our approach significantly accelerates dual bound computations and is highly effective in providing optimality certificates for large-scale problems.
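As a rough illustration of the composite-optimization structure described above, the following Python sketch runs a plain proximal gradient loop. It is not the paper's algorithm: the soft-thresholding prox of an $\ell_1$ penalty stands in for the perspective-relaxation prox, the restart strategy is omitted, and the names `proximal_gradient`, `soft_threshold`, `b`, and `lam` are hypothetical.

```python
def soft_threshold(v, t):
    """Prox of t * ||.||_1 (a stand-in for the paper's non-smooth component)."""
    return [max(abs(vi) - t, 0.0) * (1 if vi > 0 else -1) for vi in v]


def proximal_gradient(grad_f, prox_g, x0, step, iters=100):
    """Iterate x <- prox_g(x - step * grad_f(x), step) for a fixed number of steps."""
    x = list(x0)
    for _ in range(iters):
        g = grad_f(x)
        x = prox_g([xi - step * gi for xi, gi in zip(x, g)], step)
    return x


# Toy problem (assumed for illustration): minimize 0.5 * ||x - b||^2 + lam * ||x||_1.
b = [3.0, -0.5, 1.5]
lam = 1.0
x_star = proximal_gradient(
    grad_f=lambda x: [xi - bi for xi, bi in zip(x, b)],  # gradient of the smooth part
    prox_g=lambda v, step: soft_threshold(v, step * lam),  # prox of the non-smooth part
    x0=[0.0, 0.0, 0.0],
    step=0.5,
)
```

On this toy problem the minimizer is the soft-thresholded vector of `b`, so the loop can be checked against a closed form; the point of the sketch is only that each iteration needs one gradient and one cheap prox evaluation.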
nan
Article 328
Title@2025-06-11 (3): 7B Fully Open Source Moxin-LLM/VLM – From Pretraining to GRPO-based Reinforcement Learning Enhancement
Title: 7B Fully Open Source Moxin-LLM/VLM – From Pretraining to GRPO-based Reinforcement Learning Enhancement | 7B Vollständig offene Quelle Moxin-LLM/VLM – Vom Vortraining zur GRPO-basierten Verstärkungs-Lernverbesserung | 7B 完全开放源代码 Moxin-LLM/VLM – – 从预培训到基于GROP的强化学习加强 2412.06845v5 |
Authors (18): Pu Zhao, Xuan Shen, Zhenglun Kong, Yixin Shen, Sung-En Chang, Timothy Rupprecht, Lei Lu, Enfu Nan, Changdi Yang, Yumei He, Weiyan Shi, Xingchen Xu, Yu Huang, Wei Jiang, Wei Wang, Yue Chen, Yong He, Yanzhi Wang
Recently, Large Language Models (LLMs) have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary LLMs like GPT-4 and GPT-o1, which have captured widespread attention in the AI community due to their remarkable performance and versatility. Simultaneously, open-source LLMs, such as LLaMA, have made great contributions to the ever-increasing popularity of LLMs because they are easy to customize and deploy across diverse applications. Although open-source LLMs present unprecedented opportunities for innovation and research, the commercialization of LLMs has raised concerns about transparency, reproducibility, and safety. Many open-source LLMs fail to meet fundamental transparency requirements by withholding essential components like training code and data, which may hinder further innovations on LLMs. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed in adherence to the principles of open science, open source, open data, and open access. We release the pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints, aiming to make continuous commitments to fully open-source LLMs. After pre-training the base model, we finetune the Moxin Base model with a SOTA post-training framework and instruction data to obtain the Moxin Instruct model. To improve the reasoning capability, we further finetune our Instruct model with chain-of-thought data distilled from DeepSeek R1, and then use Group Relative Policy Optimization (GRPO) following DeepSeek R1 to finetune our model, leading to the Moxin Reasoning model. Moreover, we develop our vision language model based on our Moxin model. Experiments show that our models achieve superior performance in various evaluations such as zero-shot evaluation, few-shot evaluation, and CoT evaluation.
nan
Article 329
Title@2025-06-11 (3): Curvature Tuning: Provable Training-free Model Steering From a Single Parameter
Title: Curvature Tuning: Provable Training-free Model Steering From a Single Parameter | Curvature Tuning: Vorstellbare trainingsfreie Modelllenkung aus einem einzigen Parameter | 曲线图示:从单一参数看可实现的无培训示范指导 2502.07783v4 |
Authors (3): Leyang Hu, Matteo Gamba, Randall Balestriero
The scaling of model and data sizes has reshaped the AI landscape, establishing finetuning pretrained models as the standard paradigm for solving downstream tasks. However, dominant finetuning methods typically rely on weight adaptation, often lack interpretability, and depend on heuristically chosen hyperparameters. In this paper, we take a different perspective and shift the focus from weights to activation functions, viewing them through the lens of spline operators. We propose Curvature Tuning (CT), an interpretable and principled steering method that modulates a model’s decision boundary by injecting a single hyperparameter into its activation functions. We show that CT provably adjusts model decision boundary curvature and, more fundamentally, projects a model onto a space of smooth functions, thereby complementing current finetuning methods, whose effect lies primarily in feature adaptation. Making this hyperparameter trainable gives rise to a novel and highly parameter-efficient finetuning method. Empirically, CT improves both generalization and robustness. For example, it boosts downstream accuracy of ResNet-50/152 by 7.14%/8.46% over linear probing and 4.64%/1.70% over LoRA across 12 datasets, and improves robust accuracy on the $\ell_\infty$ benchmark from RobustBench by 1032.64%/1494.46%. Our code is available at https://github.com/Leon-Leyang/curvature-tuning.
nan
Article 330
Title@2025-06-11 (3): The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability
Title: The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability | Die Komplexität der Online-Strategischen Entscheidungsfindung mit Informationen Asymmetrie und Wissenstransportierbarkeit | 在线战略决策与信息不对称和知识可传输性 2506.09940v1 |
Authors (7): Jiachen Hu, Rui Ai, Han Zhong, Xiaoyu Chen, Liwei Wang, Zhaoran Wang, Zhuoran Yang
Information asymmetry is a pervasive feature of multi-agent systems, especially evident in economics and social sciences. In these settings, agents tailor their actions based on private information to maximize their rewards. These strategic behaviors often introduce complexities due to confounding variables. Simultaneously, knowledge transportability poses another significant challenge, arising from the difficulties of conducting experiments in target environments. It requires transferring knowledge from environments where empirical data is more readily available. Against these backdrops, this paper explores a fundamental question in online learning: Can we employ non-i.i.d. actions to learn about confounders even when requiring knowledge transfer? We present a sample-efficient algorithm designed to accurately identify system dynamics under information asymmetry and to navigate the challenges of knowledge transfer effectively in reinforcement learning, framed within an online strategic interaction model. Our method provably achieves learning of an $\epsilon$-optimal policy with a tight sample complexity of $O(1/\epsilon^2)$.
nan
Article 331
Title@2025-06-11 (3): Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order
Title: Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order | Leveraging Coordinate Momentum in SignSGD und Muon: Speicheroptimierte Null-Order | 利用 SignSGD 和 Muon 中的调控坐标动态: 内存适应的零点 2506.04430v2 |
Authors (8): Egor Petrov, Grigoriy Evseev, Aleksey Antonov, Andrey Veprikov, Pavel Plyusnin, Nikolay Bushkov, Stanislav Moiseev, Aleksandr Beznosikov
Fine-tuning Large Language Models (LLMs) is essential for adapting pre-trained models to downstream tasks. Yet traditional first-order optimizers such as Stochastic Gradient Descent (SGD) and Adam incur prohibitive memory and computational costs that scale poorly with model size. In this paper, we investigate zero-order (ZO) optimization methods as a memory- and compute-efficient alternative, particularly in the context of parameter-efficient fine-tuning techniques like LoRA. We propose $\texttt{JAGUAR SignSGD}$, a ZO momentum-based algorithm that extends ZO SignSGD, requiring the same number of parameters as the standard ZO SGD and only $\mathcal{O}(1)$ function evaluations per iteration. To the best of our knowledge, this is the first study to establish rigorous convergence guarantees for SignSGD in the stochastic ZO case. We further propose $\texttt{JAGUAR Muon}$, a novel ZO extension of the Muon optimizer that leverages the matrix structure of model parameters, and we provide its convergence rate under arbitrary stochastic noise. Through extensive experiments on challenging LLM fine-tuning benchmarks, we demonstrate that the proposed algorithms meet or exceed the convergence quality of standard first-order methods, achieving significant memory reduction. Our theoretical and empirical results establish new ZO optimization methods as a practical and theoretically grounded approach for resource-constrained LLM adaptation. Our code is available at https://github.com/brain-mmo-lab/ZO_LLM
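To make the idea of a zero-order, momentum-based sign update concrete, here is a minimal Python sketch. It is a generic two-point ZO estimator with momentum and a sign step, not the JAGUAR update from the paper; the function name `zo_sign_momentum`, the Gaussian probe direction, and all hyperparameter values are assumptions chosen for illustration.

```python
import random


def zo_sign_momentum(f, x0, steps=200, lr=0.05, tau=1e-3, beta=0.9, seed=0):
    """Generic zero-order sign descent with momentum (illustrative, not JAGUAR)."""
    rng = random.Random(seed)
    x = list(x0)
    m = [0.0] * len(x)  # momentum buffer, same size as the parameters
    for _ in range(steps):
        # Random probe direction; only two function evaluations per iteration.
        e = [rng.gauss(0.0, 1.0) for _ in x]
        f_plus = f([xi + tau * ei for xi, ei in zip(x, e)])
        f_minus = f([xi - tau * ei for xi, ei in zip(x, e)])
        d = (f_plus - f_minus) / (2.0 * tau)  # directional-derivative estimate
        g = [d * ei for ei in e]  # gradient estimate along the probe
        m = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g)]
        # Sign step: move each coordinate by lr against the sign of its momentum.
        x = [xi - lr * ((mi > 0) - (mi < 0)) for xi, mi in zip(x, m)]
    return x


def quadratic(x):
    return sum(xi * xi for xi in x)


x0 = [1.0, -2.0, 3.0]
x_final = zo_sign_momentum(quadratic, x0)
```

No gradient is ever computed here; the only access to the objective is through `f`, which is what keeps the memory footprint close to that of inference.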
nan
Article 332
Title@2025-06-11 (3): Limits of nonlinear and dispersive fiber propagation for an optical fiber-based extreme learning machine
Title: Limits of nonlinear and dispersive fiber propagation for an optical fiber-based extreme learning machine | Grenzen der nichtlinearen und dispersiven Faserausbreitung für eine optische Faser-basierte extreme Lernmaschine | 光纤极端学习机器的非线性和分散性纤维传播限制 2503.03649v3 |
Authors (8): Andrei V. Ermolaev, Mathilde Hary, Lev Leybov, Piotr Ryczkowski, Anas Skalli, Daniel Brunner, Goëry Genty, John M. Dudley
We report a generalized nonlinear Schrödinger equation simulation model of an extreme learning machine (ELM) based on optical fiber propagation. Using the MNIST handwritten digit dataset as a benchmark, we study how accuracy depends on propagation dynamics, as well as parameters governing spectral encoding, readout, and noise. For this dataset and with quantum noise limited input, test accuracies of over 91% and 93% are found for propagation in the anomalous and normal dispersion regimes respectively. Our results also suggest that quantum noise on the input pulses introduces an intrinsic penalty to ELM performance.
nan
Article 333
Title@2025-06-11 (3): CaLMQA: Exploring culturally specific long-form question answering across 23 languages
Title: CaLMQA: Exploring culturally specific long-form question answering across 23 languages | CaLMQA: Erforschen von kulturell spezifischen Langzeitfragen in 23 Sprachen | CALMQA:探讨以23种语言回答文化上特有的长式问题 2406.17761v3 |
Authors (6): Shane Arora, Marzena Karpinska, Hung-Ting Chen, Ipsita Bhattacharjee, Mohit Iyyer, Eunsol Choi
Despite rising global usage of large language models (LLMs), their ability to generate long-form answers to culturally specific questions remains unexplored in many languages. To fill this gap, we perform the first study of textual multilingual long-form QA by creating CaLMQA, a dataset of 51.7K culturally specific questions across 23 different languages. We define culturally specific questions as those that refer to concepts unique to one or a few cultures, or have different answers depending on the cultural or regional context. We obtain these questions by crawling naturally-occurring questions from community web forums in high-resource languages, and by hiring native speakers to write questions in under-resourced, rarely-studied languages such as Fijian and Kirundi. Our data collection methodologies are translation-free, enabling the collection of culturally unique questions like “Kuber iki umwami wa mbere w’uburundi yitwa Ntare?” (Kirundi; English translation: “Why was the first king of Burundi called Ntare (Lion)?”). We evaluate factuality, relevance and surface-level quality of LLM-generated long-form answers, finding that (1) for many languages, even the best models make critical surface-level errors (e.g., answering in the wrong language, repetition), especially for low-resource languages; and (2) answers to culturally specific questions contain more factual errors than answers to culturally agnostic questions – questions that have consistent meaning and answer across many cultures. We release CaLMQA to facilitate future research in cultural and multilingual long-form QA.
nan
Article 334
Title@2025-06-11 (3): Generate-then-Verify: Reconstructing Data from Limited Published Statistics
Title: Generate-then-Verify: Reconstructing Data from Limited Published Statistics | Generate-then-Verify: Rekonstruieren von Daten aus begrenzten veröffentlichten Statistiken | 生成 - - - - - - - 核查:重新构建来自有限出版统计数据的数据 2504.21199v2 |
Authors (5): Terrance Liu, Eileen Xiao, Adam Smith, Pratiksha Thaker, Zhiwei Steven Wu
We study the problem of reconstructing tabular data from aggregate statistics, in which the attacker aims to identify interesting claims about the sensitive data that can be verified with 100% certainty given the aggregates. Successful attempts in prior work have conducted studies in settings where the set of published statistics is rich enough that entire datasets can be reconstructed with certainty. In our work, we instead focus on the regime where many possible datasets match the published statistics, making it impossible to reconstruct the entire private dataset perfectly (i.e., when approaches in prior work fail). We propose the problem of partial data reconstruction, in which the goal of the adversary is to instead output a $\textit{subset}$ of rows and/or columns that are $\textit{guaranteed to be correct}$. We introduce a novel integer programming approach that first $\textbf{generates}$ a set of claims and then $\textbf{verifies}$ whether each claim holds for all possible datasets consistent with the published aggregates. We evaluate our approach on the housing-level microdata from the U.S. Decennial Census release, demonstrating that privacy violations can still persist even when information published about such data is relatively sparse.
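The generate-then-verify idea can be illustrated on a toy table small enough to enumerate. The sketch below uses brute-force enumeration in place of the integer programming approach described above, and the record domain, the aggregates, and the names `consistent_datasets` and `verify` are all hypothetical.

```python
from itertools import combinations_with_replacement, product


def consistent_datasets(n_rows, domain, aggregates):
    """Yield every multiset of n_rows records that matches all published counts."""
    for ds in combinations_with_replacement(domain, n_rows):
        if all(sum(1 for r in ds if pred(r)) == count for pred, count in aggregates):
            yield ds


def verify(claim, n_rows, domain, aggregates):
    """A claim is verified only if it holds in every consistent dataset."""
    candidates = list(consistent_datasets(n_rows, domain, aggregates))
    return bool(candidates) and all(claim(ds) for ds in candidates)


# Toy setting (assumed): 3 rows, two binary attributes (a, b).
domain = list(product([0, 1], repeat=2))
aggregates = [
    (lambda r: r[0] == 1, 2),  # published: two rows have a == 1
    (lambda r: r[1] == 1, 2),  # published: two rows have b == 1
]

has_row_11 = verify(lambda ds: (1, 1) in ds, 3, domain, aggregates)
has_row_00 = verify(lambda ds: (0, 0) in ds, 3, domain, aggregates)
```

Here the claim "some row is (1, 1)" is guaranteed by the two counts alone, while "some row is (0, 0)" holds in only one of the two consistent datasets, so it cannot be asserted with certainty.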
nan
Article 335
Title@2025-06-11 (3): Discovering Forbidden Topics in Language Models
Title: Discovering Forbidden Topics in Language Models | Verbotene Themen in Sprachmodellen entdecken | 发现语言模型中的禁止专题 2505.17441v3 |
Authors (4): Can Rager, Chris Wendler, Rohit Gandikota, David Bau
Refusal discovery is the task of identifying the full set of topics that a language model refuses to discuss. We introduce this new problem setting and develop a refusal discovery method, Iterated Prefill Crawler (IPC), that uses token prefilling to find forbidden topics. We benchmark IPC on Tulu-3-8B, an open-source model with public safety tuning data. Our crawler manages to retrieve 31 out of 36 topics within a budget of 1000 prompts. Next, we scale the crawler to a frontier model using the prefilling option of Claude-Haiku. Finally, we crawl three widely used open-weight models: Llama-3.3-70B and two of its variants finetuned for reasoning: DeepSeek-R1-70B and Perplexity-R1-1776-70B. DeepSeek-R1-70B reveals patterns consistent with censorship tuning: The model exhibits “thought suppression” behavior that indicates memorization of CCP-aligned responses. Although Perplexity-R1-1776-70B is robust to censorship, IPC elicits CCP-aligned refusal answers in the quantized model. Our findings highlight the critical need for refusal discovery methods to detect biases, boundaries, and alignment failures of AI systems.
nan
Article 336
Title@2025-06-11 (3): RNE: a plug-and-play framework for diffusion density estimation and inference-time control
Title: RNE: a plug-and-play framework for diffusion density estimation and inference-time control | RNE: ein Plug-and-Play-Framework für Diffusionsdichteschätzung und Inferenzzeitkontrolle | RNE:用于传播密度估计和推断时间控制的一个插件和操作框架 2506.05668v2 |
Authors (4): Jiajun He, José Miguel Hernández-Lobato, Yuanqi Du, Francisco Vargas
In this paper, we introduce the Radon-Nikodym Estimator (RNE), a flexible, plug-and-play framework for diffusion inference-time density estimation and control, based on the concept of the density ratio between path distributions. RNE connects and unifies a variety of existing density estimation and inference-time control methods under a single and intuitive perspective, stemming from basic variational inference and probabilistic principles, therefore offering both theoretical clarity and practical versatility. Experiments demonstrate that RNE delivers strong results in diffusion density estimation, and offers broad applicability to inference-time control tasks – such as annealing, diffusion model composition, and reward-tilting – with promising inference-time scaling performance.
nan
Article 337
Title@2025-06-11 (3): Bayesian Probabilistic Matrix Factorization
Title: Bayesian Probabilistic Matrix Factorization | Bayesische probabilistische Matrix-Fabrikisierung | Bayesian 概率矩阵 2506.09928v1 |
Authors (2): Ruixuan Xu, Xiangxiang Weng
Matrix factorization is a widely used technique in recommendation systems. Probabilistic Matrix Factorization (PMF) [1] extends traditional matrix factorization by incorporating probability distributions over latent factors, allowing for uncertainty quantification. However, computing the posterior distribution is intractable due to the high-dimensional integral. To address this, we employ two Bayesian inference methods: Markov Chain Monte Carlo (MCMC) [2] and Variational Inference (VI) [3] to approximate the posterior. We evaluate their performance on the MovieLens dataset and compare their convergence speed, predictive accuracy, and computational efficiency. Experimental results demonstrate that VI offers faster convergence, while MCMC provides more accurate posterior estimates.
nan
Article 338
Title@2025-06-11 (3): Mamba time series forecasting with uncertainty quantification
Title: Mamba time series forecasting with uncertainty quantification | Mamba-Zeitreihenvorhersage mit Unsicherheitsquantifizierung | Mamba时间序列预测,带有不确定性量化的Mamba时间序列 2503.10873v2 |
Authors (5): Pedro Pessoa, Paul Campitelli, Douglas P. Shepherd, S. Banu Ozkan, Steve Pressé
State space models, such as Mamba, have recently garnered attention in time series forecasting due to their ability to capture sequence patterns. However, in electricity consumption benchmarks, Mamba forecasts exhibit a mean error of approximately 8%. Similarly, in traffic occupancy benchmarks, the mean error reaches 18%. This discrepancy leaves us to wonder whether the prediction is simply inaccurate or falls within the error expected given the spread in historical data. To address this limitation, we propose a method to quantify the predictive uncertainty of Mamba forecasts. Here, we propose a dual-network framework based on the Mamba architecture for probabilistic forecasting, where one network generates point forecasts while the other estimates predictive uncertainty by modeling variance. We abbreviate our tool, Mamba with probabilistic time series forecasting, as Mamba-ProbTSF, and the code for its implementation is available on GitHub (https://github.com/PessoaP/Mamba-ProbTSF). Evaluating this approach on synthetic and real-world benchmark datasets, we find the Kullback-Leibler divergence between the learned distributions and the data (which, in the limit of infinite data, should converge to zero if the model correctly captures the underlying probability distribution) reduced to the order of $10^{-3}$ for synthetic data and $10^{-1}$ for real-world benchmarks, demonstrating its effectiveness. We find that in both the electricity consumption and traffic occupancy benchmarks, the true trajectory stays within the predicted uncertainty interval at the two-sigma level about 95% of the time. We end with a consideration of potential limitations, adjustments to improve performance, and considerations for applying this framework to processes with purely or largely stochastic dynamics where the stochastic changes accumulate, as observed for example in pure Brownian motion or molecular dynamics trajectories.
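The two-sigma check mentioned above is simple to state in code. The following Python sketch computes the empirical coverage of a mean-plus-or-minus-two-standard-deviations interval and compares it with the value expected under a Gaussian; the function name `two_sigma_coverage` and the sample numbers are made up for illustration and are not taken from the paper.

```python
import math


def two_sigma_coverage(y_true, mean, std):
    """Fraction of observations within mean +/- 2 * std of the forecast."""
    inside = sum(1 for y, m, s in zip(y_true, mean, std) if abs(y - m) <= 2.0 * s)
    return inside / len(y_true)


# Probability mass of a Gaussian within two standard deviations of its mean.
gaussian_two_sigma = math.erf(2.0 / math.sqrt(2.0))

# Hypothetical observations, point forecasts, and predicted standard deviations.
y_true = [1.0, 2.0, 3.0, 10.0]
mean = [1.1, 2.0, 2.5, 4.0]
std = [0.1, 0.5, 0.3, 1.0]
coverage = two_sigma_coverage(y_true, mean, std)
```

If the predicted variances are well calibrated and the residuals are roughly Gaussian, the empirical coverage should sit near `gaussian_two_sigma`, which is why a figure of about 95% is read as a sign of calibrated uncertainty.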
nan
Article 339
Title@2025-06-11 (3): Dynamic Diffusion Schrödinger Bridge in Astrophysical Observational Inversions
Title: Dynamic Diffusion Schrödinger Bridge in Astrophysical Observational Inversions | Dynamische Diffusion Schrödinger Brücke in astrophysikalischen Beobachtungsinversionen | 天体物理观测反转中的动态扩散Srödinger桥 2506.08065v2 |
Authors (5): Ye Zhu, Duo Xu, Zhiwei Deng, Jonathan C. Tan, Olga Russakovsky
We study Diffusion Schrödinger Bridge (DSB) models in the context of dynamical astrophysical systems, specifically tackling observational inverse prediction tasks within Giant Molecular Clouds (GMCs) for star formation. We introduce the Astro-DSB model, a variant of DSB with the pairwise domain assumption tailored for astrophysical dynamics. By investigating its learning process and prediction performance in both physically simulated data and in real observations (the Taurus B213 data), we present two main takeaways. First, from the astrophysical perspective, our proposed paired DSB method improves interpretability, learning efficiency, and prediction performance over conventional astrostatistical and other machine learning methods. Second, from the generative modeling perspective, probabilistic generative modeling reveals improvements over discriminative pixel-to-pixel modeling in Out-Of-Distribution (OOD) testing cases of physical simulations with unseen initial conditions and different dominant physical processes. Our study expands research into diffusion models beyond the traditional visual synthesis application and provides evidence of the models’ learning abilities beyond pure data statistics, paving a path for future physics-aware generative models which can align dynamics between machine learning and real (astro)physical systems.
nan
Article 340
Title@2025-06-11 (3): Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning
Title: Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning | Apollo: A Posteriori Label-Only Mitgliedschaft Inferenz Angriff auf Maschine Unlearning | 阿波罗:一个后选标签 - 唯一的会员 推论攻击 机器退出学习 2506.09923v1 |
Authors (3): Liou Tang, James Joshi, Ashish Kundu
Machine Unlearning (MU) aims to update Machine Learning (ML) models following requests to remove training samples and their influences on a trained model efficiently without retraining the original ML model from scratch. While MU itself has been employed to provide privacy protection and regulatory compliance, it can also increase the attack surface of the model. Existing privacy inference attacks towards MU that aim to infer properties of the unlearned set rely on the weaker threat model that assumes the attacker has access to both the unlearned model and the original model, limiting their feasibility toward real-life scenarios. We propose a novel privacy attack, A Posteriori Label-Only Membership Inference Attack towards MU, Apollo, that infers whether a data sample has been unlearned, following a strict threat model where an adversary has access to the label-output of the unlearned model only. We demonstrate that our proposed attack, while requiring less access to the target model compared to previous attacks, can achieve relatively high precision on the membership status of the unlearned samples.
nan
Article 341
Title@2025-06-11 (3): Logits-Based Finetuning
Title: Logits-Based Finetuning | Logits-based Finetuning | 基于登录的微调 2505.24461v2 |
Authors (7): Jingyao Li, Senqiao Yang, Sitong Wu, Han Shi, Chuanyang Zheng, Hong Xu, Jiaya Jia
In recent years, developing compact and efficient large language models (LLMs) has emerged as a thriving area of research. Traditional Supervised Fine-Tuning (SFT), which relies on singular ground truth labels, often fails to capture token-level dependencies and linguistic diversity. To address these limitations, we propose a logits-based fine-tuning framework that integrates the strengths of supervised learning and knowledge distillation. Our approach constructs enriched training targets by combining teacher logits with ground truth labels, preserving both correctness and linguistic diversity. This ensures more reliable and effective training. We constructed a large-scale 1.2M logits dataset and trained a series of science-focused models. Experimental results demonstrate that our method achieves significant improvements, with accuracy gains of 18% on Mawps and 22.7% on TabMWP. Across nine widely used mathematical benchmarks, our method consistently outperforms prior SFT models, achieving an average improvement of 7.28%. Codes are available at https://github.com/dvlab-research/Logits-Based-Finetuning.
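One simple way to picture "combining teacher logits with ground truth labels" is as a convex mixture of a one-hot label and the teacher's softmax distribution. The Python sketch below shows that construction; the mixing rule, the weight `alpha`, and the function names are assumptions for illustration and may differ from the construction used in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def enriched_target(teacher_logits, label, alpha=0.5):
    """Mix a one-hot label with the teacher distribution (assumed rule)."""
    probs = softmax(teacher_logits)
    return [
        alpha * (1.0 if i == label else 0.0) + (1.0 - alpha) * p
        for i, p in enumerate(probs)
    ]


def cross_entropy(student_logits, target):
    """Cross-entropy of the student distribution against a soft target."""
    student = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(target, student))


# Hypothetical 4-token vocabulary: the teacher prefers token 2, the label is token 1.
target = enriched_target([0.5, 1.0, 2.0, -1.0], label=1, alpha=0.5)
loss = cross_entropy([0.0, 0.5, 0.2, -0.3], target)
```

With `alpha` at or above 0.5 the labeled token always keeps the largest target probability, so correctness is preserved while the remaining mass follows the teacher's ranking of alternatives.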
nan
Article 342
Title@2025-06-11 (3): Optimal Noise Reduction in Dense Mixed-Membership Stochastic Block Models under Diverging Spiked Eigenvalues Condition
Title: Optimal Noise Reduction in Dense Mixed-Membership Stochastic Block Models under Diverging Spiked Eigenvalues Condition | Optimale Lärmreduzierung in Dichte Mischmitgliedschaft Stochastic Block Models unter Diverging Spiked Eigenvalues Zustand | 在四分五裂的斯派派值条件下,减少高密度混合混合成员制碎块模型的最佳噪音 2307.14530v3 |
Authors (2): Fedor Noskov, Maxim Panov
Community detection is one of the most critical problems in modern network science. Its applications can be found in various fields, from protein modeling to social network analysis. Recently, many papers have studied the problem of overlapping community detection, where each node of a network may belong to several communities. In this work, we consider the Mixed-Membership Stochastic Block Model (MMSB), first proposed by Airoldi et al. MMSB provides quite a general setting for modeling overlapping community structure in graphs. The central question of this paper is to reconstruct relations between communities given an observed network. We compare different approaches and establish the minimax lower bound on the estimation error. Then, we propose a new estimator that matches this lower bound. Theoretical results are proved under fairly general conditions on the considered model. Finally, we illustrate the theory in a series of experiments.
nan
Article 343
Title@2025-06-11 (3): Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity
Title: Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity | Adam exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity | Adam 通过协调的适应性 开发 $\ ell\ incty$- 损失地貌测量 2410.08198v3 |
Authors (3): Shuo Xie, Mohamad Amin Mohamadi, Zhiyuan Li
Adam outperforms SGD when training language models. Yet this advantage is not well-understood theoretically – previous convergence analysis for Adam and SGD mainly focuses on the number of steps $T$ and is already minimax-optimal in non-convex cases, which are both $\widetilde{O}(T^{-1/4})$. In this work, we argue that the exploitation of nice $\ell_\infty$-geometry is the key advantage of Adam over SGD. More specifically, we give a new convergence analysis for Adam under novel assumptions that loss is smooth under $\ell_\infty$-geometry rather than the more common $\ell_2$-geometry, which yields a much better empirical smoothness constant for GPT-2 and ResNet models. Our experiments confirm that Adam performs much worse when the favorable $\ell_\infty$-geometry is changed while SGD provably remains unaffected. We also extend the convergence analysis to blockwise Adam under novel blockwise smoothness assumptions.
nan
Article 344
Title@2025-06-11 (3): Temperature Optimization for Bayesian Deep Learning
Title: Temperature Optimization for Bayesian Deep Learning | Temperaturoptimierung für Bayesian Deep Learning | Bayesian深层学习的温度优化 2410.05757v2 |
Authors (4): Kenyon Ng, Chris van der Heide, Liam Hodgkinson, Susan Wei
The Cold Posterior Effect (CPE) is a phenomenon in Bayesian Deep Learning (BDL), where tempering the posterior to a cold temperature often improves the predictive performance of the posterior predictive distribution (PPD). Although the term ‘CPE’ suggests colder temperatures are inherently better, the BDL community increasingly recognizes that this is not always the case. Despite this, there remains no systematic method for finding the optimal temperature beyond grid search. In this work, we propose a data-driven approach to select the temperature that maximizes test log-predictive density, treating the temperature as a model parameter and estimating it directly from the data. We empirically demonstrate that our method performs comparably to grid search, at a fraction of the cost, across both regression and classification tasks. Finally, we highlight the differing perspectives on CPE between the BDL and Generalized Bayes communities: while the former primarily emphasizes the predictive performance of the PPD, the latter prioritizes the utility of the posterior under model misspecification; these distinct objectives lead to different temperature preferences.
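For reference, posterior tempering is commonly written as follows. This is one standard convention and is an assumption here, since the abstract does not state which parameterization (full-posterior or likelihood-only tempering) the paper uses.

```latex
p_T(\theta \mid \mathcal{D}) \;\propto\; \bigl( p(\mathcal{D} \mid \theta)\, p(\theta) \bigr)^{1/T},
\qquad T < 1 \ \text{(cold posterior)}, \quad T = 1 \ \text{(standard Bayes posterior)}.
```

Under this convention, lowering $T$ sharpens the posterior around its modes, and selecting $T$ by grid search means refitting or resampling once per candidate value.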
nan
Article 345
Title@2025-06-11 (3): PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants
Title: PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants | PersonaLens: Ein Benchmark für die Personalisierungsbewertung in Conversational AI Assistants | 人际关系:对话性AI助理个人化评价基准 2506.09902v1 |
Authors (6): Zheng Zhao, Clara Vania, Subhradeep Kayal, Naila Khan, Shay B. Cohen, Emine Yilmaz
Large language models (LLMs) have advanced conversational AI assistants. However, systematically evaluating how well these assistants apply personalization – adapting to individual user preferences while completing tasks – remains challenging. Existing personalization benchmarks focus on chit-chat, non-conversational tasks, or narrow domains, failing to capture the complexities of personalized task-oriented assistance. To address this, we introduce PersonaLens, a comprehensive benchmark for evaluating personalization in task-oriented AI assistants. Our benchmark features diverse user profiles equipped with rich preferences and interaction histories, along with two specialized LLM-based agents: a user agent that engages in realistic task-oriented dialogues with AI assistants, and a judge agent that employs the LLM-as-a-Judge paradigm to assess personalization, response quality, and task success. Through extensive experiments with current LLM assistants across diverse tasks, we reveal significant variability in their personalization capabilities, providing crucial insights for advancing conversational AI systems.
nan
Article 346
Title@2025-06-11 (3): “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended)
Title: “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) | “Was sind meine Optionen?”: Erklären von RL-Agenten mit unterschiedlichen nahezu optimalen Alternativen (erweitert) | “我有什么选择?”:解释具有多种近最佳替代方法的RL代理(扩展) 2506.09901v1 |
Authors (4): Noel Brindise, Vijeth Hebbar, Riya Shah, Cedric Langbort
In this work, we provide an extended discussion of a new approach to explainable Reinforcement Learning called Diverse Near-Optimal Alternatives (DNA), first proposed at L4DC 2025. DNA seeks a set of reasonable “options” for trajectory-planning agents, optimizing policies to produce qualitatively diverse trajectories in Euclidean space. In the spirit of explainability, these distinct policies are used to “explain” an agent’s options in terms of available trajectory shapes from which a human user may choose. In particular, DNA applies to value function-based policies on Markov decision processes where agents are limited to continuous trajectories. Here, we describe DNA, which uses reward shaping in local, modified Q-learning problems to solve for distinct policies with guaranteed epsilon-optimality. We show that it successfully returns qualitatively different policies that constitute meaningfully different “options” in simulation, including a brief comparison to related approaches in the stochastic optimization field of Quality Diversity. Beyond the explanatory motivation, this work opens new possibilities for exploration and adaptive planning in RL.
Article 347
Title@2025-06-11 (3): The Remarkable Robustness of LLMs: Stages of Inference?
Title: The Remarkable Robustness of LLMs: Stages of Inference? | Die bemerkenswerte Robustheit von LLMs: Stufen der Schlussfolgerung? | LLMS的显著威力:推论阶段? 2406.19384v2 |
Authors (3): Vedang Lad, Wes Gurnee, Max Tegmark
We investigate the robustness of Large Language Models (LLMs) to structural interventions by deleting and swapping adjacent layers during inference. Surprisingly, models retain 72-95% of their original top-1 prediction accuracy without any fine-tuning. We find that performance degradation is not uniform across layers: interventions to the early and final layers cause the most degradation, while the model is remarkably robust to dropping middle layers. This pattern of localized sensitivity motivates our hypothesis of four stages of inference, observed across diverse model families and sizes: (1) detokenization, where local context is integrated to lift raw token embeddings into higher-level representations; (2) feature engineering, where task- and entity-specific features are iteratively refined; (3) prediction ensembling, where hidden states are aggregated into plausible next-token predictions; and (4) residual sharpening, where irrelevant features are suppressed to finalize the output distribution. Synthesizing behavioral and mechanistic evidence, we provide a framework for interpreting depth-dependent computations in LLMs.
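The two interventions described above, deleting a layer and swapping adjacent layers, can be illustrated on a toy residual stack. This is a minimal sketch and not the authors' code: `make_layer`, `run_residual_stack`, `delete_layer`, and `swap_adjacent` are hypothetical names, and the "layers" are small scalar nonlinearities standing in for transformer blocks.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_layer(scale: float, bias: float) -> Layer:
    """Hypothetical stand-in for a transformer block: returns a residual update."""
    def layer(h: Vector) -> Vector:
        return [scale * max(0.0, x) + bias for x in h]
    return layer


def run_residual_stack(layers: Sequence[Layer], h: Vector) -> Vector:
    """Apply each layer as h <- h + layer(h), mimicking a residual stream."""
    for layer in layers:
        update = layer(h)
        h = [x + u for x, u in zip(h, update)]
    return h


def delete_layer(layers: Sequence[Layer], i: int) -> List[Layer]:
    """Return a copy of the stack with layer i removed."""
    return list(layers[:i]) + list(layers[i + 1:])


def swap_adjacent(layers: Sequence[Layer], i: int) -> List[Layer]:
    """Return a copy of the stack with layers i and i+1 exchanged."""
    out = list(layers)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


# Toy stack of four layers (illustrative parameters only).
stack = [make_layer(0.5, 0.1), make_layer(-0.3, 0.0),
         make_layer(0.2, -0.1), make_layer(0.1, 0.05)]
x = [1.0, -2.0, 0.5]

baseline = run_residual_stack(stack, x)
without_middle = run_residual_stack(delete_layer(stack, 1), x)
swapped_middle = run_residual_stack(swap_adjacent(stack, 1), x)
```

Because each layer only adds an update to the running state, removing or reordering a layer still yields a valid forward pass. The study measures how much the output degrades under such edits in real models.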
Article 348
Title@2025-06-11 (3): A look at adversarial attacks on radio waveforms from discrete latent space
Title: A look at adversarial attacks on radio waveforms from discrete latent space | Ein Blick auf feindliche Angriffe auf Radiowellenformen aus diskretem latenten Raum | 审视对来自离散潜空空间的无线电波形的对抗性攻击 2506.09896v1 |
Authors (3): Attanasia Garuso, Silvija Kokalj-Filipovic, Yagna Kaasaragadda
Having designed a VQVAE that maps digital radio waveforms into a discrete latent space and yields a perfectly classifiable reconstruction of the original data, we here analyze the attack-suppressing properties of VQVAE when an adversarial attack is performed on high-SNR radio-frequency (RF) data points. To target amplitude modulations from a subset of digitally modulated waveform classes, we first create adversarial attacks that preserve the phase between the in-phase and quadrature components whose values are adversarially changed. We compare them with adversarial attacks of the same intensity where phase is not preserved. We test the classification accuracy of such adversarial examples on a classifier trained to deliver 100% accuracy on the original data. To assess the ability of VQVAE to suppress the strength of the attack, we evaluate the classifier accuracy on the VQVAE reconstructions of the adversarial data points and show that VQVAE substantially decreases the effectiveness of the attack. We also compare the I/Q plane diagram of the attacked data, their reconstructions and the original data. Finally, using multiple methods and metrics, we compare the probability distribution of the VQVAE latent space with and without attack. Varying the attack strength, we observe interesting properties of the discrete space, which may help detect the attacks.
Article 349
Title@2025-06-11 (3): Causal Climate Emulation with Bayesian Filtering
Title: Causal Climate Emulation with Bayesian Filtering | Causal Klima Emulation mit Bayesian Filtering | 以贝耶斯过滤器模拟因果气候 2506.09891v1 |
Authors (9): Sebastian Hickman, Ilija Trajkovic, Julia Kaltenborn, Francis Pelletier, Alex Archibald, Yaniv Gurwicz, Peer Nowack, David Rolnick, Julien Boussard
Traditional models of climate change use complex systems of coupled equations to simulate physical processes across the Earth system. These simulations are highly computationally expensive, limiting our predictions of climate change and analyses of its causes and effects. Machine learning has the potential to quickly emulate data from climate models, but current approaches are not able to incorporate physics-informed causal relationships. Here, we develop an interpretable climate model emulator based on causal representation learning. We derive a physics-informed approach including a Bayesian filter for stable long-term autoregressive emulation. We demonstrate that our emulator learns accurate climate dynamics, and we show the importance of each one of its components on a realistic synthetic dataset and data from two widely deployed climate models.
Article 350
Title@2025-06-11 (3): Learning single-index models via harmonic decomposition
Title: Learning single-index models via harmonic decomposition | Ein-Index-Modelle über harmonische Zersetzung lernen | 通过口腔分解学习单指数模型 2506.09887v1 |
Authors (4): Nirmit Joshi, Hugo Koubbi, Theodor Misiakiewicz, Nathan Srebro
We study the problem of learning single-index models, where the label $y \in \mathbb{R}$ depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown one-dimensional projection $\langle \boldsymbol{w}_*,\boldsymbol{x}\rangle$. Prior work has shown that under Gaussian inputs, the statistical and computational complexity of recovering $\boldsymbol{w}_*$ is governed by the Hermite expansion of the link function. In this paper, we propose a new perspective: we argue that “spherical harmonics” – rather than “Hermite polynomials” – provide the natural basis for this problem, as they capture its intrinsic “rotational symmetry”. Building on this insight, we characterize the complexity of learning single-index models under arbitrary spherically symmetric input distributions. We introduce two families of estimators – based on tensor unfolding and online SGD – that respectively achieve either optimal sample complexity or optimal runtime, and argue that estimators achieving both may not exist in general. When specialized to Gaussian inputs, our theory not only recovers and clarifies existing results but also reveals new phenomena that had previously been overlooked.
Article 351
Title@2025-06-11 (3): OmniJet-$α_C$: Learning point cloud calorimeter simulations using generative transformers
Title: OmniJet-$α_C$: Learning point cloud calorimeter simulations using generative transformers | OmniJet-$α_C$: Simulationen von Punktwolkenkalorimetern mit generativen Transformatoren | OmniJet-$α_C$: 使用基因变压器学习点云度热量计模拟 2501.05534v2 |
Authors (6): Joschka Birk, Frank Gaede, Anna Hallin, Gregor Kasieczka, Martina Mozzanica, Henning Rose
We show the first use of generative transformers for generating calorimeter showers as point clouds in a high-granularity calorimeter. Using the tokenizer and generative part of the OmniJet-${\alpha}$ model, we represent the hits in the detector as sequences of integers. This model allows variable-length sequences, which means that it supports realistic shower development and does not need to be conditioned on the number of hits. Since the tokenization represents the showers as point clouds, the model learns the geometry of the showers without being restricted to any particular voxel grid.
Article 352
Title@2025-06-11 (3): Leveraging data-driven weather models for improving numerical weather prediction skill through large-scale spectral nudging
Title: Leveraging data-driven weather models for improving numerical weather prediction skill through large-scale spectral nudging | Nutzung von datengesteuerten Wettermodellen zur Verbesserung der numerischen Wettervorhersagefähigkeit durch großflächige spektrale Nudging | 利用数据驱动天气模型,通过大型光谱裸露提高数字天气预测技能 2407.06100v3 |
Authors (12): Syed Zahid Husain, Leo Separovic, Jean-François Caron, Rabah Aider, Mark Buehner, Stéphane Chamberland, Ervig Lapalme, Ron McTaggart-Cowan, Christopher Subich, Paul A. Vaillancourt, Jing Yang, Ayrton Zadra
Operational meteorological forecasting has long relied on physics-based numerical weather prediction (NWP) models. Recently, this landscape has faced disruption by the advent of data-driven artificial intelligence (AI)-based weather models, which offer tremendous computational performance and competitive forecasting accuracy. However, data-driven models for medium-range forecasting generally suffer from major limitations, including low effective resolution and a narrow range of predicted variables. This study illustrates the relative strengths and weaknesses of these competing paradigms using the physics-based GEM (Global Environmental Multiscale) and the AI-based GraphCast models. Analyses of their respective global predictions in physical and spectral space reveal that GraphCast-predicted large scales outperform GEM, particularly for longer lead times, even though fine scales predicted by GraphCast suffer from excessive smoothing. Building on this insight, a hybrid NWP-AI system is proposed, wherein temperature and horizontal wind components predicted by GEM are spectrally nudged toward GraphCast predictions at large scales, while GEM itself freely generates the fine-scale details critical for local predictability and weather extremes. This hybrid approach is capable of leveraging the strengths of GraphCast to enhance the prediction skill of the GEM model while generating a full suite of physically consistent forecast fields with a full power spectrum. Additionally, trajectories of tropical cyclones are predicted with enhanced accuracy without significant changes in intensity. Work is in progress for operationalization of this hybrid system at the Canadian Meteorological Centre.
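The idea of nudging only the large scales can be sketched on a one-dimensional periodic field. This is a toy illustration under stated assumptions, not the GEM/GraphCast implementation: the function names, the cutoff `max_wavenumber`, and the blending weight `alpha` are hypothetical, and a naive discrete Fourier transform stands in for the spectral transform of a real model.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive discrete Fourier transform (O(n^2), fine for a toy example)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs: List[complex]) -> List[float]:
    """Inverse transform; returns the real part of the reconstructed field."""
    n = len(coeffs)
    return [(sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def spectral_nudge(model: List[float], target: List[float],
                   max_wavenumber: int, alpha: float) -> List[float]:
    """Relax model coefficients toward the target for wavenumbers <= cutoff only."""
    n = len(model)
    m_hat, t_hat = dft(model), dft(target)
    nudged = []
    for k in range(n):
        wavenumber = min(k, n - k)  # fold negative frequencies onto positive ones
        if wavenumber <= max_wavenumber:
            nudged.append(m_hat[k] + alpha * (t_hat[k] - m_hat[k]))
        else:
            nudged.append(m_hat[k])  # fine scales are left to the model
    return idft(nudged)


n = 32
grid = [2 * math.pi * t / n for t in range(n)]
# "Physics model": a large-scale wave plus fine-scale detail.
model_field = [math.sin(g) + 0.3 * math.sin(8 * g) for g in grid]
# "AI model": a different large-scale amplitude and no fine-scale detail.
target_field = [2.0 * math.sin(g) for g in grid]

hybrid = spectral_nudge(model_field, target_field, max_wavenumber=2, alpha=1.0)
```

With `alpha=1.0` the hybrid field takes its large-scale wave from the target and keeps the wavenumber-8 detail from the model. Smaller values of `alpha` give a partial relaxation.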
Article 353
Title@2025-06-11 (3): UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching
Title: UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching | UmbraTTS: Text-zu-Sprechen an Umweltkontexte anpassen mit Flow Matching | UmbratTS:用流动匹配使文字语音适应环境环境环境 2506.09874v1 |
Authors (9): Neta Glazer, Aviv Navon, Yael Segal, Aviv Shamsian, Hilit Segev, Asaf Buchnick, Menachem Pirchi, Gil Hetz, Joseph Keshet
Recent advances in Text-to-Speech (TTS) have enabled highly natural speech synthesis, yet integrating speech with complex background environments remains challenging. We introduce UmbraTTS, a flow-matching-based TTS model that jointly generates both speech and environmental audio, conditioned on text and acoustic context. Our model allows fine-grained control over background volume and produces diverse, coherent, and context-aware audio scenes. A key challenge is the lack of data with speech and background audio aligned in natural context. To overcome the lack of paired training data, we propose a self-supervised framework that extracts speech, background audio, and transcripts from unannotated recordings. Extensive evaluations demonstrate that UmbraTTS significantly outperforms existing baselines, producing natural, high-quality, environmentally aware audio.
Article 354
Title@2025-06-11 (3): Private Aggregation for Byzantine-Resilient Heterogeneous Federated Learning
Title: Private Aggregation for Byzantine-Resilient Heterogeneous Federated Learning | Private Aggregation für Byzantinisch-Resilient Heterogenes Federated Learning | Byzantine-抗后异异质联邦学习私营聚合 2506.09870v1 |
Authors (2): Maximilian Egger, Rawad Bitar
Ensuring resilience to Byzantine clients while maintaining the privacy of the clients’ data is a fundamental challenge in federated learning (FL). When the clients’ data is homogeneous, suitable countermeasures were studied from an information-theoretic perspective utilizing secure aggregation techniques while ensuring robust aggregation of the clients’ gradients. However, the countermeasures used fail when the clients’ data is heterogeneous. Suitable pre-processing techniques, such as nearest neighbor mixing, were recently shown to enhance the performance of those countermeasures in the heterogeneous setting. Nevertheless, those pre-processing techniques cannot be applied with the introduced privacy-preserving mechanisms. We propose a multi-stage method encompassing a careful co-design of verifiable secret sharing, secure aggregation, and a tailored symmetric private information retrieval scheme to achieve information-theoretic privacy guarantees and Byzantine resilience under data heterogeneity. We evaluate the effectiveness of our scheme on a variety of attacks and show how it outperforms the previously known techniques. Since the communication overhead of secure aggregation is non-negligible, we investigate the interplay with zero-order estimation methods that reduce the communication cost in state-of-the-art FL tasks and thereby make private aggregation scalable.
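As background for the secure aggregation building block, the sketch below shows plain additive secret sharing of integer updates. It is an illustrative simplification under stated assumptions: it is not the verifiable secret sharing or private information retrieval scheme of the paper, it offers no Byzantine resilience, and the modulus `PRIME` and the function names are hypothetical choices.

```python
import random
from typing import List

PRIME = 2_147_483_647  # field modulus (2**31 - 1); an illustrative choice


def share(secret: int, n_shares: int, rng: random.Random) -> List[int]:
    """Split a secret into n additive shares that sum to it modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_shares - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]


def aggregate(all_shares: List[List[int]]) -> int:
    """all_shares[c][s] is share s of client c.

    Each share holder s sums the shares it received; combining those partial
    sums reveals only the total of all clients' secrets.
    """
    n_holders = len(all_shares[0])
    partial_sums = [sum(client[s] for client in all_shares) % PRIME
                    for s in range(n_holders)]
    return sum(partial_sums) % PRIME


# Toy run; a real system would draw shares from a cryptographically secure source.
rng = random.Random(0)
client_updates = [5, 11, 26]  # quantized scalar updates, one per client
shared = [share(u, 3, rng) for u in client_updates]
total = aggregate(shared)
```

No single share holder sees an individual update, yet the combined partial sums equal the sum of all updates modulo `PRIME`.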
Article 355
Title@2025-06-11 (3): Machine Learning-Based Classification of Oils Using Dielectric Properties and Microwave Resonant Sensing
Title: Machine Learning-Based Classification of Oils Using Dielectric Properties and Microwave Resonant Sensing | Machine Learning-based Klassifizierung von Ölen mit dielektrischen Eigenschaften und Mikrowellen-Resonant Sensing | 利用电力特性和微波共振遥感对石油进行机 机 动 学习 分类 2506.09867v1 |
Authors (3): Amit Baran Dey, Wasim Arif, Rakhesh Singh Kshetrimayum
This paper proposes a machine learning-based methodology for the classification of various oil samples based on their dielectric properties, utilizing a microwave resonant sensor. The dielectric behaviour of oils, governed by their molecular composition, induces distinct shifts in the sensor’s resonant frequency and amplitude response. These variations are systematically captured and processed to extract salient features, which serve as inputs for multiple machine learning classifiers. The microwave resonant sensor operates in a non-destructive, low-power manner, making it particularly well-suited for real-time industrial applications. A comprehensive dataset is developed by varying the permittivity of oil samples and acquiring the corresponding sensor responses. Several classifiers are trained and evaluated using the extracted resonant features to assess their capability in distinguishing between oil types. Experimental results demonstrate that the proposed approach achieves a high classification accuracy of 99.41% with the random forest classifier, highlighting its strong potential for automated oil identification. The system’s compact form factor, efficiency, and high performance underscore its viability for fast and reliable oil characterization in industrial environments.
Article 356
Title@2025-06-11 (3): Conformal Prediction as Bayesian Quadrature
Title: Conformal Prediction as Bayesian Quadrature | Konforme Vorhersage als bayesische Quadratur | Bayesian 夸度的非正规预测 2502.13228v2 |
Authors (2): Jake C. Snell, Thomas L. Griffiths
As machine learning-based prediction systems are increasingly used in high-stakes situations, it is important to understand how such predictive models will perform upon deployment. Distribution-free uncertainty quantification techniques such as conformal prediction provide guarantees about the loss black-box models will incur even when the details of the models are hidden. However, such methods are based on frequentist probability, which unduly limits their applicability. We revisit the central aspects of conformal prediction from a Bayesian perspective and thereby illuminate the shortcomings of frequentist guarantees. We propose a practical alternative based on Bayesian quadrature that provides interpretable guarantees and offers a richer representation of the likely range of losses to be observed at test time.
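For reference, the frequentist baseline the abstract revisits can be sketched as standard split conformal prediction with absolute-residual scores. This is the textbook procedure, not the Bayesian quadrature alternative proposed in the paper, and the function names and toy numbers are illustrative assumptions.

```python
import math
from typing import List, Tuple


def conformal_quantile(cal_scores: List[float], alpha: float) -> float:
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, the finite-sample guarantee requires an
    infinite threshold.
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(cal_scores)[rank - 1]


def prediction_interval(point_prediction: float, q: float) -> Tuple[float, float]:
    """Symmetric interval around a black-box point prediction."""
    return (point_prediction - q, point_prediction + q)


# Held-out calibration data: black-box predictions and observed targets.
cal_predictions = [1.0, 2.0, 3.0]
cal_targets = [1.3, 1.9, 2.8]
scores = [abs(y - p) for p, y in zip(cal_predictions, cal_targets)]

q_hat = conformal_quantile(scores, alpha=0.5)
interval = prediction_interval(2.5, q_hat)
```

The procedure only needs the model's predictions, which is why it applies to black-box models. Its coverage guarantee is a statement about repeated draws of calibration and test data, the frequentist reading the paper reexamines.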
Article 357
Title@2025-06-11 (3): Guided Graph Compression for Quantum Graph Neural Networks
Title: Guided Graph Compression for Quantum Graph Neural Networks | Geführte Graphenkompression für Quantum Graphen-Neural-Netzwerke | 量图神经网络的向导图形压缩 2506.09862v1 |
Authors (6): Mikel Casals, Vasilis Belis, Elias F. Combarro, Eduard Alarcón, Sofia Vallecorsa, Michele Grossi
Graph Neural Networks (GNNs) are effective for processing graph-structured data but face challenges with large graphs due to high memory requirements and inefficient sparse matrix operations on GPUs. Quantum Computing (QC) offers a promising avenue to address these issues and inspires new algorithmic approaches. In particular, Quantum Graph Neural Networks (QGNNs) have been explored in recent literature. However, current quantum hardware limits the dimension of the data that can be effectively encoded. Existing approaches either simplify datasets manually or use artificial graph datasets. This work introduces the Guided Graph Compression (GGC) framework, which uses a graph autoencoder to reduce both the number of nodes and the dimensionality of node features. The compression is guided to enhance the performance of a downstream classification task, which can be applied either with a quantum or a classical classifier. The framework is evaluated on the Jet Tagging task, a classification problem of fundamental importance in high energy physics that involves distinguishing particle jets initiated by quarks from those by gluons. The GGC is compared against using the autoencoder as a standalone preprocessing step and against a baseline classical GNN classifier. Our numerical results demonstrate that GGC outperforms both alternatives, while also facilitating the testing of novel QGNN ansatzes on realistic datasets.
Article 358
Title@2025-06-11 (3): ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models
Title: ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models | ImageChain: Advancing Sequential Image-to-Text Reasoning in multimodalen großen Sprachmodellen | 图像Chain:在多模式大语言模型中推进按顺序排列的图像到文字理由 2502.19409v2 |
Authors (3): Danae Sánchez Villegas, Ingo Ziegler, Desmond Elliott
Reasoning over sequences of images remains a challenge for multimodal large language models (MLLMs). While recent models incorporate multi-image data during pre-training, they still struggle to recognize sequential structures, often treating images independently. This work introduces ImageChain, a framework that enhances MLLMs with sequential reasoning capabilities over image data by modeling visual sequences as a multi-turn conversation. In ImageChain, images are interleaved with corresponding textual descriptions to form a controlled dialogue that explicitly captures temporal dependencies and narrative progression. Our method optimizes for the task of next-scene description, where the model generates a context-aware description of an upcoming scene based on preceding visual and textual cues. We demonstrate that our approach improves performance on the next-scene description task – achieving an average improvement from 3.7% to 19% in SimRate, a metric that quantifies semantic similarity to human-annotated ground truths. Moreover, ImageChain achieves robust zero-shot out-of-domain performance in applications ranging from comics to robotics. Extensive experiments validate that instruction-tuning in a multimodal, multi-turn conversation design is key to bridging the gap between static image understanding and temporally-aware reasoning.
Article 359
Title@2025-06-11 (3): Tractable hierarchies of convex relaxations for polynomial optimization on the nonnegative orthant
Title: Tractable hierarchies of convex relaxations for polynomial optimization on the nonnegative orthant | Tractable Hierarchien konvexer Entspannungen zur polynomialen Optimierung auf dem nicht negativen Orthanten | 用于非阴性恒星多边优化的 convex 放松的可跟踪的等级结构 2209.06175v2 |
Authors (4): Ngoc Hoang Anh Mai, Victor Magron, Jean-Bernard Lasserre, Kim-Chuan Toh
We consider polynomial optimization problems (POP) on a semialgebraic set contained in the nonnegative orthant (every POP on a compact set can be put in this format by a simple translation of the origin). Such a POP can be converted to an equivalent POP by squaring each variable. Using even symmetry and the concept of factor width, we propose a hierarchy of semidefinite relaxations based on the extension of Pólya’s Positivstellensatz by Dickinson-Povh. As its distinguishing and crucial feature, the maximal matrix size of each resulting semidefinite relaxation can be chosen arbitrarily and in addition, we prove that the sequence of values returned by the new hierarchy converges to the optimal value of the original POP at the rate $O(\varepsilon^{-c})$ if the semialgebraic set has nonempty interior. When applied to (i) robustness certification of multi-layer neural networks and (ii) computation of positive maximal singular values, our method based on Pólya’s Positivstellensatz provides better bounds and runs several hundred times faster than the standard Moment-SOS hierarchy.
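The squaring step can be written out explicitly. As a sketch in assumed notation ($f$ for the objective and $g_1,\dots,g_m$ for the constraint polynomials, neither of which is named in the abstract), a POP on the nonnegative orthant

$$
f^\star = \inf_{x \in \mathbb{R}^n} \left\{ f(x) \;:\; g_j(x) \ge 0,\ j = 1,\dots,m,\quad x_i \ge 0,\ i = 1,\dots,n \right\}
$$

becomes, after the substitution $x_i = y_i^2$,

$$
f^\star = \inf_{y \in \mathbb{R}^n} \left\{ f(y_1^2,\dots,y_n^2) \;:\; g_j(y_1^2,\dots,y_n^2) \ge 0,\ j = 1,\dots,m \right\}.
$$

The sign constraints disappear because every $y_i^2$ is nonnegative and every $x_i \ge 0$ can be written as a square. The resulting polynomials are even in each $y_i$, which is the even symmetry the abstract refers to.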
Article 360
Title@2025-06-11 (3): FinTSBridge: A New Evaluation Suite for Real-world Financial Prediction with Advanced Time Series Models
Title: FinTSBridge: A New Evaluation Suite for Real-world Financial Prediction with Advanced Time Series Models | FinTSBridge: Eine neue Evaluierungs-Suite für reale finanzielle Vorhersagen mit Modellen der fortgeschrittenen Zeitreihen | FinTSBridge:具有先进时间序列模型的实时世界金融预测新评价套件 2503.06928v2 |
Authors (7): Yanlong Wang, Jian Xu, Tiantian Gao, Hongkang Zhang, Shao-Lun Huang, Danny Dongning Sun, Xiao-Ping Zhang
As time series forecasting has attracted growing attention in recent years, many studies have proposed various solutions to the challenges encountered in time series prediction, aiming to improve forecasting performance. However, effectively applying these time series forecasting models to the field of financial asset pricing remains a challenging issue. There is still a need for a bridge to connect cutting-edge time series forecasting models with financial asset pricing. To bridge this gap, we have undertaken the following efforts: 1) We constructed three datasets from the financial domain; 2) We selected over ten time series forecasting models from recent studies and validated their performance in financial time series; 3) We developed new metrics, msIC and msIR, in addition to MSE and MAE, to showcase the time series correlation captured by the models; 4) We designed financial-specific tasks for these three datasets and assessed the practical performance and application potential of these forecasting models in important financial problems. We hope the new evaluation suite, FinTSBridge, can provide valuable insights into the effectiveness and robustness of advanced forecasting models in financial domains.
Article 361
Title@2025-06-11 (3): Advancing Exchange Rate Forecasting: Leveraging Machine Learning and AI for Enhanced Accuracy in Global Financial Markets
Title: Advancing Exchange Rate Forecasting: Leveraging Machine Learning and AI for Enhanced Accuracy in Global Financial Markets | Verbesserung der Wechselkursprognose: Nutzung von maschinellem Lernen und KI für höhere Genauigkeit auf den globalen Finanzmärkten | 推进汇率预测:利用机器学习和人工智能提高全球金融市场的预测准确性 2506.09851v1 |
Authors (7): Md. Yeasin Rahat, Rajan Das Gupta, Nur Raisa Rahman, Sudipto Roy Pritom, Samiur Rahman Shakir, Md Imrul Hasan Showmick, Md. Jakir Hossen
The prediction of foreign exchange rates, such as the US Dollar (USD) to Bangladeshi Taka (BDT), plays a pivotal role in global financial markets, influencing trade, investments, and economic stability. This study leverages historical USD/BDT exchange rate data from 2018 to 2023, sourced from Yahoo Finance, to develop advanced machine learning models for accurate forecasting. A Long Short-Term Memory (LSTM) neural network is employed, achieving an exceptional accuracy of 99.449%, a Root Mean Square Error (RMSE) of 0.9858, and a test loss of 0.8523, significantly outperforming traditional methods like ARIMA (RMSE 1.342). Additionally, a Gradient Boosting Classifier (GBC) is applied for directional prediction, with backtesting on a $10,000 initial capital revealing a 40.82% profitable trade rate, though resulting in a net loss of $20,653.25 over 49 trades. The study analyzes historical trends, showing a decline in BDT/USD rates from 0.012 to 0.009, and incorporates normalized daily returns to capture volatility. These findings highlight the potential of deep learning in forex forecasting, offering traders and policymakers robust tools to mitigate risks. Future work could integrate sentiment analysis and real-time economic indicators to further enhance model adaptability in volatile markets.
Article 362
Title@2025-06-11 (3): Product of Experts with LLMs: Boosting Performance on ARC Is a Matter of Perspective
Title: Product of Experts with LLMs: Boosting Performance on ARC Is a Matter of Perspective | Produkt von Experten mit LLMs: Leistungssteigerung auf ARC ist eine Frage der Perspektive | 与LLMM公司的专家产品有关的产品:促进ACR的绩效是一个透视问题 2505.07859v2 |
Authors (3): Daniel Franzen, Jan Disselhoff, David Hartmann
The Abstraction and Reasoning Corpus (ARC-AGI) poses a significant challenge for large language models (LLMs), exposing limitations in their abstract reasoning abilities. In this work, we leverage task-specific data augmentations throughout the training, generation, and scoring phases, and employ a depth-first search algorithm to generate diverse, high-probability candidate solutions. Furthermore, we utilize the LLM not only as a generator but also as a scorer, using its output probabilities to select the most promising solutions. Our method achieves a score of 71.6% (286.5/400 solved tasks) on the public ARC-AGI evaluation set, demonstrating state-of-the-art performance among publicly available approaches. While concurrent closed-source work has reported higher scores, our method distinguishes itself through its transparency, reproducibility, and remarkably low inference cost, averaging only around 2ct per task on readily available hardware (we assume a price of 36ct/hour for a Nvidia 4090 GPU).
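The figures quoted above can be cross-checked with a few lines of arithmetic. The only assumption is that the "around 2ct" cost per task is taken as exactly 2 ct, so the implied runtime is approximate.

```python
# Score: solved tasks over total tasks on the public evaluation set.
solved_tasks = 286.5
total_tasks = 400
score_percent = 100 * solved_tasks / total_tasks  # 71.625, reported as 71.6%

# Cost: GPU price and per-task cost as stated in the abstract.
gpu_price_ct_per_hour = 36
cost_ct_per_task = 2  # "around 2ct", treated as exact for this estimate
seconds_per_task = 3600 * cost_ct_per_task / gpu_price_ct_per_hour  # 200.0
```

At the assumed price, 2 ct buys about 200 seconds of GPU time per task, which gives a sense of the inference budget behind the reported score.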
Article 363
Title@2025-06-11 (3): Reinforcement Learning from Human Feedback
Title: Reinforcement Learning from Human Feedback | Verstärktes Lernen aus menschlichem Feedback | 从人类反馈中学习 2504.12501v2 |
Authors (1): Nathan Lambert
Reinforcement learning from human feedback (RLHF) has become an important technical and storytelling tool to deploy the latest machine learning systems. In this book, we hope to give a gentle introduction to the core methods for people with some level of quantitative background. The book starts with the origins of RLHF – both in recent literature and in a convergence of disparate fields of science in economics, philosophy, and optimal control. We then set the stage with definitions, problem formulation, data collection, and other common math used in the literature. The core of the book details every optimization stage in using RLHF, from starting with instruction tuning to training a reward model and finally all of rejection sampling, reinforcement learning, and direct alignment algorithms. The book concludes with advanced topics – understudied research questions in synthetic data and evaluation – and open questions for the field.
Article 364
Title@2025-06-11 (3): A Deep Generative Model for the Simulation of Discrete Karst Networks
Title: A Deep Generative Model for the Simulation of Discrete Karst Networks | Ein tiefes Generatives Modell für die Simulation diskreter Karst-Netzwerke | Discrete Karst 网络模拟深度生成模型 2506.09832v1 |
Authors (3): Dany Lauzon, Julien Straubhaar, Philippe Renard
The simulation of discrete karst networks presents a significant challenge due to the complexity of the physicochemical processes occurring within various geological and hydrogeological contexts over extended periods. This complex interplay leads to a wide variety of karst network patterns, each intricately linked to specific hydrogeological conditions. We explore a novel approach that represents karst networks as graphs and applies graph generative models (deep learning techniques) to capture the intricate nature of karst environments. In this representation, nodes retain spatial information and properties, while edges signify connections between nodes. Our generative process consists of two main steps. First, we utilize graph recurrent neural networks (GraphRNN) to learn the topological distribution of karst networks. GraphRNN decomposes the graph simulation into a sequential generation of nodes and edges, informed by previously generated structures. Second, we employ denoising diffusion probabilistic models on graphs (G-DDPM) to learn node features (spatial coordinates and other properties). G-DDPMs enable the generation of node features on the graphs produced by the GraphRNN that adhere to the learned statistical properties by sampling from the derived probability distribution, ensuring that the generated graphs are realistic and capture the essential features of the original data. We test our approach using real-world karst networks and compare generated subgraphs with actual subgraphs from the database, using geometry and topology metrics. Our methodology allows stochastic simulation of discrete karst networks across various types of formations, a useful tool for studying the behavior of physical processes such as flow and transport.
Article 365
Title@2025-06-11 (3): Multi-Variable Batch Bayesian Optimization in Materials Research: Synthetic Data Analysis of Noise Sensitivity and Problem Landscape Effects
Title: Multi-Variable Batch Bayesian Optimization in Materials Research: Synthetic Data Analysis of Noise Sensitivity and Problem Landscape Effects | Multivariable Batch Bayesian Optimierung in der Materialforschung: Synthetische Datenanalyse von Lärmempfindlichkeit und Problemlandschaftseffekten | 材料研究中多种可变批量贝耶斯最佳优化:噪音敏感度和问题景观效应合成数据分析 2504.03943v2 |
Authors (7): Imon Mia, Armi Tiihonen, Anna Ernst, Anusha Srivastava, Tonio Buonassisi, William Vandenberghe, Julia W. P. Hsu
The Bayesian Optimization (BO) machine learning method is increasingly used to guide experimental optimization tasks in materials science. To emulate the large number of input variables and noise-containing results in experimental materials research, we perform batch BO simulation of six design variables with a range of noise levels. Two test cases relevant for materials science problems are examined: a needle-in-a-haystack case (Ackley function) that may be encountered in, e.g., molecule optimizations, and a smooth landscape with a local optimum in addition to the global optimum (Hartmann function) that may be encountered in, e.g., material composition optimization. We show learning curves, performance metrics, and visualization to effectively track the optimization progression and evaluate how the optimization outcomes are affected by noise, batch-picking method, choice of acquisition function, and exploration hyperparameter values. We find that the effects of noise depend on the problem landscape: noise degrades the optimization results of a needle-in-a-haystack search (Ackley) dramatically more. However, with increasing noise, we observe an increasing probability of landing on the local optimum in Hartmann. Therefore, prior knowledge of the problem domain structure and noise level is essential when designing BO for materials research experiments. Synthetic data studies – with known ground truth and controlled noise levels – enable us to isolate and evaluate the impact of different batch BO components, e.g., acquisition policy, objective metrics, and hyperparameter values, before transitioning to the inherent uncertainties of real experimental systems. The results and methodology of this study will facilitate a greater utilization of BO in guiding experimental materials research, specifically in settings with a large number of design variables to optimize.
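To make the needle-in-a-haystack test case concrete, the sketch below evaluates the standard Ackley function in six dimensions with additive Gaussian noise. The constants are the usual textbook defaults. The sign convention (minimization), the noise model, and the names `ackley` and `noisy_ackley` are assumptions that may differ from the study's setup.

```python
import math
import random
from typing import Sequence


def ackley(x: Sequence[float], a: float = 20.0, b: float = 0.2,
           c: float = 2 * math.pi) -> float:
    """Standard Ackley function; global minimum of 0 at the origin."""
    d = len(x)
    mean_sq = sum(v * v for v in x) / d
    mean_cos = sum(math.cos(c * v) for v in x) / d
    return -a * math.exp(-b * math.sqrt(mean_sq)) - math.exp(mean_cos) + a + math.e


def noisy_ackley(x: Sequence[float], noise_std: float, rng: random.Random) -> float:
    """Ackley value corrupted by zero-mean Gaussian observation noise."""
    return ackley(x) + rng.gauss(0.0, noise_std)


rng = random.Random(0)
origin = [0.0] * 6       # six design variables, as in the study
far_point = [3.0] * 6

clean_at_origin = ackley(origin)
clean_far = ackley(far_point)
noisy_samples = [noisy_ackley(origin, 0.5, rng) for _ in range(5)]
```

Away from the narrow well at the origin the landscape is nearly flat, so observation noise can easily mask the small differences an optimizer relies on. This is consistent with the reported sensitivity of the Ackley case to noise.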
Article 366
Title@2025-06-11 (3): Online Discovery of Simulation Models for Evolving Business Processes (Extended Version)
Title: Online Discovery of Simulation Models for Evolving Business Processes (Extended Version) | Online Discovery of Simulation Models for Evolving Business Processes (Erweiterte Version) | 不断演变的业务流程模拟模型在线发现(扩展版) 2506.10049v1 |
Authors (4): Francesco Vinci, Gyunam Park, Wil van der Aalst, Massimiliano de Leoni
Business Process Simulation (BPS) refers to techniques designed to replicate the dynamic behavior of a business process. Many approaches have been proposed to automatically discover simulation models from historical event logs, reducing the cost and time to manually design them. However, in dynamic business environments, organizations continuously refine their processes to enhance efficiency, reduce costs, and improve customer satisfaction. Existing techniques to process simulation discovery lack adaptability to real-time operational changes. In this paper, we propose a streaming process simulation discovery technique that integrates Incremental Process Discovery with Online Machine Learning methods. This technique prioritizes recent data while preserving historical information, ensuring adaptation to evolving process dynamics. Experiments conducted on four different event logs demonstrate the importance in simulation of giving more weight to recent data while retaining historical knowledge. Our technique not only produces more stable simulations but also exhibits robustness in handling concept drift, as highlighted in one of the use cases.
nan
Article 367
Title@2025-06-11 (3): Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation
Title: Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation | Sparser, besser, schneller, stärker: Sparsity Detection für effiziente automatische Differenzierung | Sparser、更好、更快、更有力:为高效自动差异而进行分级检测 2501.17737v2 |
Authors (2): Adrian Hill, Guillaume Dalle
From implicit differentiation to probabilistic modeling, Jacobian and Hessian matrices have many potential use cases in Machine Learning (ML), but they are viewed as computationally prohibitive. Fortunately, these matrices often exhibit sparsity, which can be leveraged to speed up the process of Automatic Differentiation (AD). This paper presents advances in sparsity detection, previously the performance bottleneck of Automatic Sparse Differentiation (ASD). Our implementation of sparsity detection is based on operator overloading, able to detect both local and global sparsity patterns, and supports flexible index set representations. It is fully automatic and requires no modification of user code, making it compatible with existing ML codebases. Most importantly, it is highly performant, unlocking Jacobians and Hessians at scales where they were considered too expensive to compute. On real-world problems from scientific ML, graph neural networks and optimization, we show significant speed-ups of up to three orders of magnitude. Notably, using our sparsity detection system, ASD outperforms standard AD for one-off computations, without amortization of either sparsity detection or matrix coloring.
nan
Article 368
Title@2025-06-11 (3): Metritocracy: Representative Metrics for Lite Benchmarks
Title: Metritocracy: Representative Metrics for Lite Benchmarks | Metritokratie: Repräsentative Metrics für Lite Benchmarks | 宏观经济:利特基准的代议制衡量标准 2506.09813v1 |
Authors (4): Ariel Procaccia, Benjamin Schiffer, Serena Wang, Shirley Zhang
A common problem in LLM evaluation is how to choose a subset of metrics from a full suite of possible metrics. Subset selection is usually done for efficiency or interpretability reasons, and the goal is often to select a “representative” subset of metrics. However, “representative” is rarely clearly defined. In this work, we use ideas from social choice theory to formalize two notions of representation for the selection of a subset of evaluation metrics. We first introduce positional representation, which guarantees every alternative is sufficiently represented at every position cutoff. We then introduce positional proportionality, which guarantees no alternative is proportionally over- or under-represented by more than a small error at any position. We prove upper and lower bounds on the smallest number of metrics needed to guarantee either of these properties in the worst case. We also study a generalized form of each property that allows for additional input on groups of metrics that must be represented. Finally, we tie theory to practice through real-world case studies on both LLM evaluation and hospital quality evaluation.
nan
Article 369
Title@2025-06-11 (3): Generalizing Supervised Contrastive learning: A Projection Perspective
Title: Generalizing Supervised Contrastive learning: A Projection Perspective | Verallgemeinern des Überwachten Kontrastivlernens: Eine Projektionsperspektive | 普及受监督的反竞争学习:预测视角 2506.09810v1 |
Authors (2): Minoh Jeong, Alfred Hero
Self-supervised contrastive learning (SSCL) has emerged as a powerful paradigm for representation learning and has been studied from multiple perspectives, including mutual information and geometric viewpoints. However, supervised contrastive (SupCon) approaches have received comparatively little attention in this context: for instance, while InfoNCE used in SSCL is known to form a lower bound on mutual information (MI), the relationship between SupCon and MI remains unexplored. To address this gap, we introduce ProjNCE, a generalization of the InfoNCE loss that unifies supervised and self-supervised contrastive objectives by incorporating projection functions and an adjustment term for negative pairs. We prove that ProjNCE constitutes a valid MI bound and affords greater flexibility in selecting projection strategies for class embeddings. Building on this flexibility, we further study the centroid-based class embeddings in SupCon by exploring a variety of projection methods. Extensive experiments on multiple datasets and settings demonstrate that ProjNCE consistently outperforms both SupCon and standard cross-entropy training. Our work thus refines SupCon along two complementary perspectives, mutual information interpretation and projection design, and offers broadly applicable improvements whenever SupCon serves as the foundational contrastive objective.
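For readers unfamiliar with the baseline being generalized, here is a minimal sketch of the standard InfoNCE loss for a single anchor. It is not ProjNCE: the projection functions and the negative-pair adjustment term are not specified in the abstract and are left out. The function name, the `temperature` default, and the use of raw similarity scores as inputs are assumptions for illustration.

```python
import math


def info_nce(pos_score, neg_scores, temperature=0.1):
    """InfoNCE for one anchor: -log softmax probability of the positive pair.

    pos_score is the similarity of the anchor to its positive;
    neg_scores are its similarities to the negatives.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # log-sum-exp with max subtraction for numerical stability
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]


# With K equally scored candidates the loss equals log(K).
uninformative = info_nce(0.5, [0.5, 0.5, 0.5])
well_separated = info_nce(1.0, [0.0, 0.0, 0.0])
```

When every candidate scores the same, the loss is log K for K candidates, so in the usual bound (MI at least log K minus the loss) the estimate is zero; a well-separated positive drives the loss toward zero.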
nan
Article 370
Title@2025-06-11 (3): Network Dynamics-Based Framework for Understanding Deep Neural Networks
Title: Network Dynamics-Based Framework for Understanding Deep Neural Networks | Network Dynamics-based Framework zum Verständnis tiefer neuraler Netzwerke | 了解深神经网络的动态框架 2501.02436v3 |
Authors (4): Yuchen Lin, Yong Zhang, Sihan Feng, Hong Zhao
Advancements in artificial intelligence call for a deeper understanding of the fundamental mechanisms underlying deep learning. In this work, we propose a theoretical framework to analyze learning dynamics through the lens of dynamical systems theory. We redefine the notions of linearity and nonlinearity in neural networks by introducing two fundamental transformation units at the neuron level: order-preserving transformations and non-order-preserving transformations. Different transformation modes lead to distinct collective behaviors in weight vector organization, different modes of information extraction, and the emergence of qualitatively different learning phases. Transitions between these phases may occur during training, accounting for key phenomena such as grokking. To further characterize generalization and structural stability, we introduce the concept of attraction basins in both sample and weight spaces. The distribution of neurons with different transformation modes across layers, along with the structural characteristics of the two types of attraction basins, forms a set of core metrics for analyzing the performance of learning models. Hyperparameters such as depth, width, learning rate, and batch size act as control variables for fine-tuning these metrics. Our framework not only sheds light on the intrinsic advantages of deep learning, but also provides a novel perspective for optimizing network architectures and training strategies.
nan
Article 371
Title@2025-06-11 (3): Using Shapley interactions to understand how models use structure
Title: Using Shapley interactions to understand how models use structure | Shapley-Interaktionen nutzen, um zu verstehen, wie Modelle Struktur verwenden | 利用模拟互动来了解模型如何使用结构 2403.13106v2 |
Authors (6): Divyansh Singhvi, Diganta Misra, Andrej Erkelens, Raghav Jain, Isabel Papadimitriou, Naomi Saphra
Language is an intricately structured system, and a key goal of NLP interpretability is to provide methodological insights for understanding how language models represent this structure internally. In this paper, we use Shapley Taylor interaction indices (STII) to examine how language and speech models internally relate and structure their inputs. Pairwise Shapley interactions measure how much two inputs work together to influence model outputs beyond what the linear sum of their independent influences would predict, providing a view into how models encode structural interactions between inputs. We relate the interaction patterns in models to three underlying linguistic structures: syntactic structure, non-compositional semantics, and phonetic coarticulation. We find that autoregressive text models encode interactions that correlate with the syntactic proximity of inputs, and that both autoregressive and masked models encode nonlinear interactions in idiomatic phrases with non-compositional semantics. Our speech results show that inputs are more entangled for pairs where a neighboring consonant is likely to influence a vowel or approximant, showing that models encode the phonetic interaction needed for extracting discrete phonemic representations.
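The quantity underlying pairwise interaction indices is a discrete second-order difference of a value function over input subsets. The sketch below computes that difference for one fixed context; STII averages such differences over contexts with specific weights, and that averaging is omitted here. The toy value functions `additive` and `with_pair_effect` are hypothetical stand-ins for a model's output on a masked input.

```python
def pairwise_interaction(value_fn, i, j, context=frozenset()):
    """Discrete second-order difference of value_fn for inputs i and j.

    Measures how much i and j contribute jointly beyond the sum of their
    individual contributions, given the inputs already present in context.
    """
    t = frozenset(context)
    return (value_fn(t | {i, j}) - value_fn(t | {i})
            - value_fn(t | {j}) + value_fn(t))


def additive(subset):
    """Toy model whose inputs contribute independently."""
    weights = {0: 1.0, 1: 2.0, 2: 3.0}
    return sum(weights[k] for k in subset)


def with_pair_effect(subset):
    """Same toy model plus a bonus only when inputs 0 and 1 co-occur."""
    return additive(subset) + (5.0 if {0, 1} <= subset else 0.0)


no_interaction = pairwise_interaction(additive, 0, 1)
joint_effect = pairwise_interaction(with_pair_effect, 0, 1, context={2})
```

A purely additive value function yields zero for every pair, so a nonzero value indicates that the two inputs are processed jointly rather than independently.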
nan
Article 372
Title@2025-06-11 (3): Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy
Title: Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy | Automatische Behandlungsplanung mit Verstärkungstraining für eine hoch dosierte Prostata Brachytherapie | 利用高剂量前列腺布拉奇疗法强化学习进行自动治疗规划 2506.09805v1 |
Authors (3): Tonghe Wang, Yining Feng, Xiaofeng Yang
Purpose: In high-dose-rate (HDR) prostate brachytherapy procedures, the pattern of needle placement relies solely on physician experience. We investigated the feasibility of using reinforcement learning (RL) to provide needle positions and dwell times based on patient anatomy during the pre-planning stage. This approach would reduce procedure time and ensure consistent plan quality. Materials and Methods: We train an RL agent to adjust the position of one selected needle and all the dwell times on it to maximize a pre-defined reward function after observing the environment. After adjusting, the RL agent moves on to the next needle, until all needles are adjusted. The agent plays multiple rounds until the maximum number of rounds is reached. Plan data from 11 prostate HDR boost patients (1 for training and 10 for testing) treated in our clinic were included in this study. The dosimetric metrics and the number of needles used in the RL plans were compared to those of the clinical results (ground truth). Results: On average, RL plans and clinical plans have very similar prostate coverage (Prostate V100) and Rectum D2cc (no statistical significance), while RL plans have lower prostate hotspot (Prostate V150) and Urethra D20% values, with statistical significance. Moreover, RL plans use two fewer needles than clinical plans on average. Conclusion: We present the first study demonstrating the feasibility of using reinforcement learning to autonomously generate clinically practical HDR prostate brachytherapy plans. This RL-based method achieved equal or improved plan quality compared to conventional clinical approaches while requiring fewer needles. With minimal data requirements and strong generalizability, this approach has substantial potential to standardize brachytherapy planning, reduce clinical variability, and enhance patient outcomes.
nan
Article 373
Title@2025-06-11 (3): Regularizing Learnable Feature Extraction for Automatic Speech Recognition
Title: Regularizing Learnable Feature Extraction for Automatic Speech Recognition | Regularisieren von Lern-Feature-Extraktion für automatische Spracherkennung | 将自动语音识别的可读功能抽取器正规化 2506.09804v1 |
Authors (5): Peter Vieting, Maximilian Kannen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney
Neural front-ends are an appealing alternative to traditional, fixed feature extraction pipelines for automatic speech recognition (ASR) systems since they can be directly trained to fit the acoustic model. However, their performance often falls short compared to classical methods, which we show is largely due to their increased susceptibility to overfitting. This work therefore investigates regularization methods for training ASR models with learnable feature extraction front-ends. First, we examine audio perturbation methods and show that larger relative improvements can be obtained for learnable features. Additionally, we identify two limitations in the standard use of SpecAugment for these front-ends and propose masking in the short time Fourier transform (STFT)-domain as a simple but effective modification to address these challenges. Finally, integrating both regularization approaches effectively closes the performance gap between traditional and learnable features.
nan
Article 374
Title@2025-06-11 (3): Devil’s Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols
Title: Devil’s Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols | Teufelshand: Daten vergiften Angriffe auf lokal private Graphen-Lernprotokolle | 魔鬼之手:对本地私人图案学习程序的数据毒害攻击 2506.09803v1 |
Authors (5): Longzhu He, Chaozhuo Li, Peng Tang, Litian Zhang, Sen Su
Graph neural networks (GNNs) have achieved significant success in graph representation learning and have been applied to various domains. However, many real-world graphs contain sensitive personal information, such as user profiles in social networks, raising serious privacy concerns when graph learning is performed using GNNs. To address this issue, locally private graph learning protocols have gained considerable attention. These protocols leverage the privacy advantages of local differential privacy (LDP) and the effectiveness of GNN’s message-passing in calibrating noisy data, offering strict privacy guarantees for users’ local data while maintaining high utility (e.g., node classification accuracy) for graph learning. Despite these advantages, such protocols may be vulnerable to data poisoning attacks, a threat that has not been considered in previous research. Identifying and addressing these threats is crucial for ensuring the robustness and security of privacy-preserving graph learning frameworks. This work introduces the first data poisoning attack targeting locally private graph learning protocols. The attacker injects fake users into the protocol, manipulates these fake users to establish links with genuine users, and sends carefully crafted data to the server, ultimately compromising the utility of private graph learning. The effectiveness of the attack is demonstrated both theoretically and empirically. In addition, several defense strategies have also been explored, but their limited effectiveness highlights the need for more robust defenses.
nan
Article 375
Title@2025-06-11 (3): LogProber: Disentangling confidence from contamination in LLM responses
Title: LogProber: Disentangling confidence from contamination in LLM responses | LogProber: Entwirren des Vertrauens in LLM-Antworten | 日志Prober:解除对LLM反应中污染的信心 2408.14352v2 |
Authors (3): Nicolas Yax, Pierre-Yves Oudeyer, Stefano Palminteri
In machine learning, contamination refers to situations where testing data leak into the training set. The issue is particularly relevant for evaluating the performance of Large Language Models (LLMs), which are generally trained on gargantuan, and generally opaque, corpora of text scraped from the world wide web. Developing tools to detect contamination is therefore crucial for tracking the evolution of LLM performance fairly and properly. To date, only a few recent studies have attempted to address the issue of quantifying and detecting contamination in short text sequences, such as those commonly found in benchmarks. However, these methods have limitations that can sometimes render them impractical. In the present paper, we introduce LogProber, a novel, efficient algorithm that we show can detect contamination in a black-box setting and that tries to tackle some of these drawbacks by focusing on familiarity with the question rather than the answer. Here, we explore the properties of the proposed method in comparison with concurrent approaches, identify its advantages and limitations, and illustrate how different forms of contamination can go undetected depending on the design of the detection algorithm.
nan
Article 376
Title@2025-06-11 (3): Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction
Title: Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction | Einschließlich sprachlicher Einschränkungen aus der externen Wissensquelle für Audio-Visual Target Speech Extraction | 将外部知识来源的语言限制纳入音像目标演讲摘录 2506.09792v1 |
Authors (5): Wenxuan Wu, Shuai Wang, Xixin Wu, Helen Meng, Haizhou Li
Audio-visual target speaker extraction (AV-TSE) models primarily rely on target visual cues to isolate the target speaker’s voice from others. We know that humans leverage linguistic knowledge, such as syntax and semantics, to support speech perception. Inspired by this, we explore the potential of pre-trained speech-language models (PSLMs) and pre-trained language models (PLMs) as auxiliary knowledge sources for AV-TSE. In this study, we propose incorporating the linguistic constraints from PSLMs or PLMs for the AV-TSE model as additional supervision signals. Without introducing any extra computational cost during inference, the proposed approach consistently improves speech quality and intelligibility. Furthermore, we evaluate our method in multi-language settings and visual cue-impaired scenarios and show robust performance gains.
nan
Article 377
Title@2025-06-11 (3): DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images
Title: DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images | DANCE: Deep Learning-Assisted Analyse von Proteinsequenzen mit Chaos verbesserte Kaleidoskopische Bilder | 成就:利用混乱强化卡莱多斯图象对蛋白序列进行深入学习辅助分析 2409.06694v3 |
Authors (5): Taslim Murad, Prakash Chourasia, Sarwan Ali, Imdad Ullah Khan, Murray Patterson
Cancer is a complex disease characterized by uncontrolled cell growth. T cell receptors (TCRs), crucial proteins in the immune system, play a key role in recognizing antigens, including those associated with cancer. Recent advancements in sequencing technologies have facilitated comprehensive profiling of TCR repertoires, uncovering TCRs with potent anti-cancer activity and enabling TCR-based immunotherapies. However, analyzing these intricate biomolecules necessitates efficient representations that capture their structural and functional information. T-cell protein sequences pose unique challenges because they are relatively short compared to other biomolecules. An image-based representation approach becomes a preferred choice for efficient embeddings, allowing for the preservation of essential details and enabling comprehensive analysis of T-cell protein sequences. In this paper, we propose to generate images from the protein sequences using the idea of Chaos Game Representation (CGR) via the kaleidoscopic images approach. This Deep Learning Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images (called DANCE) provides a unique way to visualize protein sequences by recursively applying chaos game rules around a central seed point. We classify TCR protein sequences in terms of their respective target cancer cells, as TCRs are known for their immune response against cancer. The TCR sequences are converted into images using the DANCE method. We employ deep-learning vision models to perform the classification and to obtain insights into the relationship between the visual patterns observed in the generated kaleidoscopic images and the underlying protein properties. By combining CGR-based image generation with deep learning classification, this study opens novel possibilities in the protein analysis domain.
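To make the chaos game rule concrete, the sketch below implements the classic Chaos Game Representation on a four-letter alphabet: each symbol owns a corner of the unit square, and the current point moves halfway toward the corner of the next symbol. This is only the textbook construction. Protein sequences use a 20-letter alphabet, and the kaleidoscopic variant used by DANCE is not described in the abstract, so the corner layout and the name `chaos_game_points` are assumptions for illustration.

```python
# Corner assignment for a four-letter alphabet (illustrative choice).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_points(sequence, corners=CORNERS, start=(0.5, 0.5)):
    """Return the point visited after each symbol of the sequence.

    Starting from a central seed point, each step moves the current point
    halfway toward the corner assigned to the next symbol.
    """
    x, y = start
    points = []
    for symbol in sequence:
        cx, cy = corners[symbol]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


trajectory = chaos_game_points("AGT")
```

Rasterizing the visited points into a pixel grid gives the image that a vision model would then consume; sequences sharing a suffix land in the same sub-square, which is why the picture encodes sequence content.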
nan
Article 378
Title@2025-06-11 (3): Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
Title: Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing | Darstellung Shattering in Transformers: Synthetische Studie mit Wissensbearbeitung | 在变形器中代表变形器:带有知识编辑的合成研究 2410.17194v5 |
Authors (6): Kento Nishi, Rahul Ramesh, Maya Okawa, Mikail Khona, Hidenori Tanaka, Ekdeep Singh Lubana
Knowledge Editing (KE) algorithms alter models’ weights to perform targeted updates to incorrect, outdated, or otherwise unwanted factual associations. However, recent work has shown that applying KE can adversely affect models’ broader factual recall accuracy and diminish their reasoning abilities. Although these studies give insights into the potential harms of KE algorithms, e.g., through performance evaluations on benchmarks, little is understood about why such destructive failures occur. Motivated by this, we define a novel synthetic task in which a Transformer is trained from scratch to internalize a “structured” knowledge graph. The structure enforces relationships between entities of the graph, such that editing a factual association has “trickling effects” on other entities (e.g., altering “X’s parent is Y” to “X’s parent is Z” affects who the parent of X’s siblings is). Through evaluations of edited models on this task, we show that KE inadvertently affects representations of entities beyond the targeted one, distorting relevant structures that allow a model to infer unseen knowledge about an entity. We call this phenomenon representation shattering and demonstrate that it degrades models’ factual recall and reasoning performance. We further corroborate our findings in naturalistic settings with pre-trained Llama and Mamba models. Overall, our work yields a precise mechanistic hypothesis to explain why KE has adverse effects on model abilities.
nan
Article 379
Title@2025-06-11 (3): A theoretical framework for self-supervised contrastive learning for continuous dependent data
Title: A theoretical framework for self-supervised contrastive learning for continuous dependent data | Ein theoretischer Rahmen für selbstüberwachtes kontrastives Lernen für kontinuierliche abhängige Daten | 为持续依赖数据进行自我监督对比式学习的理论框架 2506.09785v1 |
Authors (3): Alexander Marusov, Alexander Yuhay, Alexey Zaytsev
Self-supervised learning (SSL) has emerged as a powerful approach to learning representations, particularly in the field of computer vision. However, its application to dependent data, such as temporal and spatio-temporal domains, remains underexplored. Besides, traditional contrastive SSL methods often assume semantic independence between samples, which does not hold for dependent data exhibiting complex correlations. We propose a novel theoretical framework for contrastive SSL tailored to continuous dependent data, which allows the nearest samples to be semantically close to each other. In particular, we propose two possible ground truth similarity measures between objects: hard and soft closeness. Under this framework, we derive an analytical form for the estimated similarity matrix that accommodates both types of closeness between samples, thereby introducing dependency-aware loss functions. We validate our approach, Dependent TS2Vec, on temporal and spatio-temporal downstream problems. Given the dependency patterns present in the data, our approach surpasses modern ones for dependent data, highlighting the effectiveness of our theoretically grounded loss functions for SSL in capturing spatio-temporal dependencies. Specifically, we outperform TS2Vec on the standard UEA and UCR benchmarks, with accuracy improvements of 4.17% and 2.08%, respectively. Furthermore, on the drought classification task, which involves complex spatio-temporal patterns, our method achieves a 7% higher ROC-AUC score.
nan
Article 380
Title@2025-06-11 (3): On the Similarities of Embeddings in Contrastive Learning
Title: On the Similarities of Embeddings in Contrastive Learning | Über die Ähnlichkeiten von Einbettungen im kontrastiven Lernen | 关于差异学习中的嵌入相似性 2506.09781v1 |
Authors (4): Chungpa Lee, Sehee Lim, Kibok Lee, Jy-yong Sohn
Contrastive learning (CL) operates on a simple yet effective principle: embeddings of positive pairs are pulled together, while those of negative pairs are pushed apart. Although various forms of contrastive loss have been proposed and analyzed from different perspectives, prior works lack a comprehensive framework that systematically explains a broad class of these objectives. In this paper, we present a unified framework for understanding CL, which is based on analyzing the cosine similarity between embeddings of positive and negative pairs. In full-batch settings, we show that perfect alignment of positive pairs is unattainable when similarities of negative pairs fall below a certain threshold, and that this misalignment can be alleviated by incorporating within-view negative pairs. In mini-batch settings, we demonstrate that smaller batch sizes incur stronger separation among negative pairs within batches, which leads to higher variance in similarities of negative pairs. To address this limitation of mini-batch CL, we introduce an auxiliary loss term that reduces the variance of similarities of negative pairs in CL. Empirical results demonstrate that incorporating the proposed loss consistently improves the performance of CL methods in small-batch training.
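The quantity the analysis centers on, cosine similarity between embeddings and its spread across negative pairs, can be sketched as follows. Treating every pair of distinct samples in a batch as a negative pair is a simplifying assumption, and `negative_similarity_variance` is a hypothetical helper rather than the auxiliary loss proposed in the paper, whose exact form the abstract does not give.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def negative_similarity_variance(embeddings):
    """Population variance of cosine similarity over all distinct pairs."""
    n = len(embeddings)
    sims = [cosine(embeddings[i], embeddings[j])
            for i in range(n) for j in range(i + 1, n)]
    mean = sum(sims) / len(sims)
    return sum((s - mean) ** 2 for s in sims) / len(sims)


# Three unit vectors spaced 120 degrees apart: every pair has similarity -0.5.
h = math.sqrt(3.0) / 2.0
evenly_spread = [(1.0, 0.0), (-0.5, h), (-0.5, -h)]
uneven = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]

var_even = negative_similarity_variance(evenly_spread)
var_uneven = negative_similarity_variance(uneven)
```

An evenly spread batch has zero variance in negative-pair similarity, while an uneven one does not; a penalty built on this statistic would push small batches toward the even configuration.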
nan
Article 381
Title@2025-06-11 (3): Cross-Channel Unlabeled Sensing over a Union of Signal Subspaces
Title: Cross-Channel Unlabeled Sensing over a Union of Signal Subspaces | Cross-Channel-Unmarkierte Sensing über eine Union von Signal-Subräumen | 对信号子空间联盟的交叉通道无标签遥感 2506.09773v1 |
Authors (4): Taulant Koka, Manolis C. Tsakiris, Benjamín Béjar Haro, Michael Muma
Cross-channel unlabeled sensing addresses the problem of recovering a multi-channel signal from measurements that were shuffled across channels. This work expands the cross-channel unlabeled sensing framework to signals that lie in a union of subspaces. The extension allows for handling more complex signal structures and broadens the framework to tasks like compressed sensing. These mismatches between samples and channels often arise in applications such as whole-brain calcium imaging of freely moving organisms or multi-target tracking. We improve over previous models by deriving tighter bounds on the required number of samples for unique reconstruction, while supporting more general signal types. The approach is validated through an application in whole-brain calcium imaging, where organism movements disrupt sample-to-neuron mappings. This demonstrates the utility of our framework in real-world settings with imprecise sample-channel associations, achieving accurate signal reconstruction.
nan
Article 382
Title@2025-06-11 (3): Load-Aware Training Scheduling for Model Circulation-based Decentralized Federated Learning
Title: Load-Aware Training Scheduling for Model Circulation-based Decentralized Federated Learning | Load-Aware-Trainingsplanung für Modellzirkulationsbasiertes dezentralisiertes Federated Learning | 模拟环环流分散式联邦学习模式的负载软件培训计划 2506.09769v1 |
Authors (2): Haruki Kainuma, Takayuki Nishio
This paper proposes Load-aware Tram-FL, an extension of Tram-FL that introduces a training scheduling mechanism to minimize total training time in decentralized federated learning by accounting for both computational and communication loads. The scheduling problem is formulated as a global optimization task, which, though intractable in its original form, is made solvable by decomposing it into node-wise subproblems. To promote balanced data utilization under non-IID distributions, a variance constraint is introduced, while the overall training latency, including both computation and communication costs, is minimized through the objective function. Simulation results on MNIST and CIFAR-10 demonstrate that Load-aware Tram-FL significantly reduces training time and accelerates convergence compared to baseline methods.
nan
Article 383
Title@2025-06-11 (3): Anomaly Detection via Autoencoder Composite Features and NCE
Title: Anomaly Detection via Autoencoder Composite Features and NCE | Anomalieerkennung über Autoencoder Composite-Features und NCE | 通过自动编码器合成特征和 NCE 异常探测 2502.01920v2 |
Authors (2): Yalin Liao, Austin J. Brockmeier
Unsupervised anomaly detection is a challenging task. Autoencoders (AEs) or generative models are often employed to model the data distribution of normal inputs and subsequently identify anomalous, out-of-distribution inputs by high reconstruction error or low likelihood, respectively. However, AEs may generalize and achieve small reconstruction errors on abnormal inputs. We propose a decoupled training approach for anomaly detection that uses both an AE and a likelihood model trained with noise contrastive estimation (NCE). After training the AE, NCE estimates a probability density function, which serves as the anomaly score, on the joint space of the AE’s latent representation combined with features of the reconstruction quality. To further reduce the false negative rate in NCE, we systematically vary the reconstruction features to augment the training and optimize the contrastive Gaussian noise distribution. Experimental assessments on multiple benchmark datasets demonstrate that the proposed approach matches the performance of prevalent state-of-the-art anomaly detection algorithms.
nan
Article 384
Title@2025-06-11 (3): Learning to Optimize Package Picking for Large-Scale, Real-World Robot Induction
Title: Learning to Optimize Package Picking for Large-Scale, Real-World Robot Induction | Optimierung der Auswahl von Paketen für großformatige, real-Welt-Roboter-Induktion | 学习优化大型、现实世界机器人上岗软件包选择 2506.09765v1 |
Authors (6): Shuai Li, Azarakhsh Keipour, Sicong Zhao, Srinath Rajagopalan, Charles Swan, Kostas E. Bekris
Warehouse automation plays a pivotal role in enhancing operational efficiency, minimizing costs, and improving resilience to workforce variability. While prior research has demonstrated the potential of machine learning (ML) models to increase picking success rates in large-scale robotic fleets by prioritizing high-probability picks and packages, these efforts primarily focused on predicting success probabilities for picks sampled using heuristic methods. Limited attention has been given, however, to leveraging data-driven approaches to directly optimize sampled picks for better performance at scale. In this study, we propose an ML-based framework that predicts transform adjustments and improves the selection of suction cups for multi-suction end effectors, so that sampled picks have higher success probabilities. The framework was integrated and evaluated in test workcells that resemble the operations of Amazon Robotics’ Robot Induction (Robin) fleet, which is used for package manipulation. Evaluated on over 2 million picks, the proposed method achieves a 20% reduction in pick failure rates compared to a heuristic-based pick sampling baseline, demonstrating its effectiveness in large-scale warehouse automation scenarios.
nan
Article 385
Title@2025-06-11 (3): Alice and the Caterpillar: A more descriptive null model for assessing data mining results
Title: Alice and the Caterpillar: A more descriptive null model for assessing data mining results | Alice und die Caterpillar: Ein beschreibendes Nullmodell zur Bewertung der Ergebnisse des Data Mining | 爱丽丝和毛毛虫:评估数据开采结果的更具描述性的无效模式 2506.09764v1 |
Authors (3): Giulia Preti, Gianmarco De Francisci Morales, Matteo Riondato
We introduce novel null models for assessing the results obtained from observed binary transactional and sequence datasets, using statistical hypothesis testing. Our null models maintain more properties of the observed dataset than existing ones. Specifically, they preserve the Bipartite Joint Degree Matrix of the bipartite (multi-)graph corresponding to the dataset, which ensures that the number of caterpillars, i.e., paths of length three, is preserved, in addition to other properties considered by other models. We describe Alice, a suite of Markov chain Monte Carlo algorithms for sampling datasets from our null models, based on a carefully defined set of states and efficient operations to move between them. The results of our experimental evaluation show that Alice mixes fast and scales well, and that our null model finds different significant results than ones previously considered in the literature.
Article 386
Title@2025-06-11 (3): Generating Likely Counterfactuals Using Sum-Product Networks
Title: Generating Likely Counterfactuals Using Sum-Product Networks | Wahrscheinliche Counterfactuals mit Sum-Product-Netzwerken generieren | 使用总和生产网络产生可能的反事实 2401.14086v5 |
Authors (3): Jiri Nemecek, Tomas Pevny, Jakub Marecek
The need to explain decisions made by AI systems is driven by both recent regulation and user demand. The decisions are often explainable only post hoc. In counterfactual explanations, one may ask what constitutes the best counterfactual explanation. Clearly, multiple criteria must be taken into account, although “distance from the sample” is a key criterion. Recent methods that consider the plausibility of a counterfactual seem to sacrifice this original objective. Here, we present a system that provides high-likelihood explanations that are, at the same time, close and sparse. We show that the search for the most likely explanations satisfying many common desiderata for counterfactual explanations can be modeled using Mixed-Integer Optimization (MIO). We use a Sum-Product Network (SPN) to estimate the likelihood of a counterfactual. To achieve that, we propose an MIO formulation of an SPN, which can be of independent interest. The source code with examples is available at https://github.com/Epanemu/LiCE.
Article 387
Title@2025-06-11 (3): STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization
Title: STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization | STAR: Vielfältiges Lernen von Roboter-Fähigkeit Abstraktionen durch Rotations-Augmented Vector Quantization | STAR:通过旋转-增强矢量量化,学习多式机器人技能变异 2506.03863v2 |
Authors (7): Hao Li, Qi Lv, Rui Shao, Xiang Deng, Yinchuan Li, Jianye Hao, Liqiang Nie
Transforming complex actions into discrete skill abstractions has demonstrated strong potential for robotic manipulation. Existing approaches mainly leverage latent variable models, e.g., VQ-VAE, to learn skill abstractions through learned vectors (codebooks), but they suffer from codebook collapse and struggle to model the causal relationship between learned skills. To address these limitations, we present Skill Training with Augmented Rotation (STAR), a framework that advances both skill learning and composition to complete complex behaviors. Specifically, to prevent codebook collapse, we devise rotation-augmented residual skill quantization (RaRSQ). It encodes relative angles between encoder outputs into the gradient flow through a rotation-based gradient mechanism. Points within the same skill code are forced to be either pushed apart or pulled closer together depending on gradient directions. Further, to capture the causal relationship between skills, we present the causal skill transformer (CST), which explicitly models dependencies between skill representations through an autoregressive mechanism for coherent action generation. Extensive experiments demonstrate the superiority of STAR on both the LIBERO benchmark and real-world tasks, with around 12% improvement over the baselines.
Article 388
Title@2025-06-11 (3): Feature Engineering for Agents: An Adaptive Cognitive Architecture for Interpretable ML Monitoring
Title: Feature Engineering for Agents: An Adaptive Cognitive Architecture for Interpretable ML Monitoring | Feature Engineering for Agents: Eine adaptive Kognitive Architektur für interpretierbare ML-Überwachung | 代理物特征工程:用于可解释ML监测的适应性认知结构 2506.09742v1 |
Authors (6): Gusseppe Bravo-Rocca, Peini Liu, Jordi Guitart, Rodrigo M Carrillo-Larco, Ajay Dholakia, David Ellison
Monitoring Machine Learning (ML) models in production environments is crucial, yet traditional approaches often yield verbose, low-interpretability outputs that hinder effective decision-making. We propose a cognitive architecture for ML monitoring that applies feature engineering principles to agents based on Large Language Models (LLMs), significantly enhancing the interpretability of monitoring outputs. Central to our approach is a Decision Procedure module that simulates feature engineering through three key steps: Refactor, Break Down, and Compile. The Refactor step improves data representation to better capture feature semantics, allowing the LLM to focus on salient aspects of the monitoring data while reducing noise and irrelevant information. Break Down decomposes complex information for detailed analysis, and Compile integrates sub-insights into clear, interpretable outputs. This process leads to a more deterministic planning approach, reducing dependence on LLM-generated planning, which can sometimes be inconsistent and overly general. The combination of feature engineering-driven planning and selective LLM utilization results in a robust decision support system, capable of providing highly interpretable and actionable insights. Experiments using multiple LLMs demonstrate the efficacy of our approach, achieving significantly higher accuracy compared to various baselines across several domains.
Article 389
Title@2025-06-11 (3): Byzantine-Resilient Decentralized Multi-Armed Bandits
Title: Byzantine-Resilient Decentralized Multi-Armed Bandits | Byzantinisch-Resiliente dezentralisierte Multi-Armed Banditen | 拜占庭抗力强的分散式多武装强盗 2310.07320v2 |
Authors (4): Jingxuan Zhu, Alec Koppel, Alvaro Velasquez, Ji Liu
In decentralized cooperative multi-armed bandits (MAB), each agent observes a distinct stream of rewards, and seeks to exchange information with others to select a sequence of arms so as to minimize its regret. Agents in the cooperative setting can outperform a single agent running a MAB method such as Upper-Confidence Bound (UCB) independently. In this work, we study how to recover such salient behavior when an unknown fraction of the agents can be Byzantine, that is, communicate arbitrarily wrong information in the form of reward mean-estimates or confidence sets. This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets. Our key contribution is the development of a fully decentralized resilient upper confidence bound (UCB) algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values. This truncation step enables us to establish that the performance of each normal agent is no worse than the classic single-agent UCB1 algorithm in terms of regret, and more importantly, the cumulative regret of all normal agents is strictly better than the non-cooperative case, provided that each agent has at least 3f+1 neighbors, where f is the maximum possible number of Byzantine agents in each agent’s neighborhood. Extensions to time-varying neighbor graphs and minimax lower bounds on the achievable regret are further established. Experiments corroborate the merits of this framework in practice.
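The truncation idea can be illustrated with a trimmed mean. This is a minimal sketch, not the paper's algorithm: the function names, the trimming rule (drop the f lowest and f highest reports), and the example numbers are assumptions chosen only to show why extreme Byzantine reports lose their influence before a UCB1-style index is computed.

```python
import math

def trimmed_mean(values, f):
    """Drop the f smallest and f largest reports, then average the rest."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f reports to trim f from each side")
    kept = sorted(values)[f:len(values) - f]
    return sum(kept) / len(kept)

def ucb_index(mean_estimate, pulls, t):
    """Classic UCB1 index: empirical mean plus an exploration bonus."""
    return mean_estimate + math.sqrt(2.0 * math.log(t) / pulls)

# Hypothetical reports of one arm's mean reward from 3f+1 = 7 neighbors,
# of which up to f = 2 may be Byzantine (here: the two extreme values).
reports = [0.52, 0.48, 0.50, 0.51, 0.49, 100.0, -100.0]
robust_mean = trimmed_mean(reports, f=2)
index = ucb_index(robust_mean, pulls=10, t=100)
```

With the two outliers trimmed, the estimate is driven only by the values in the middle of the sorted list, so a bounded number of adversarial neighbors cannot drag it arbitrarily far.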
Article 390
Title@2025-06-11 (3): Towards Multi-modal Graph Large Language Model
Title: Towards Multi-modal Graph Large Language Model | Auf dem Weg zum multimodalen Graphenmodell für große Sprachen | 走向多式大语言数字模型 2506.09738v1 |
Authors (6): Xin Wang, Zeyang Zhang, Linxin Xiao, Haibo Chen, Chendi Ge, Wenwu Zhu
Multi-modal graphs, which integrate diverse multi-modal features and relations, are ubiquitous in real-world applications. However, existing multi-modal graph learning methods are typically trained from scratch for specific graph data and tasks, failing to generalize across various multi-modal graph data and tasks. To bridge this gap, we explore the potential of Multi-modal Graph Large Language Models (MG-LLM) to unify and generalize across diverse multi-modal graph data and tasks. We propose a unified framework of multi-modal graph data, task, and model, discovering the inherent multi-granularity and multi-scale characteristics in multi-modal graphs. Specifically, we present five key desired characteristics for MG-LLM: 1) unified space for multi-modal structures and attributes, 2) capability of handling diverse multi-modal graph tasks, 3) multi-modal graph in-context learning, 4) multi-modal graph interaction with natural language, and 5) multi-modal graph reasoning. We then elaborate on the key challenges, review related works, and highlight promising future research directions towards realizing these ambitious characteristics. Finally, we summarize existing multi-modal graph datasets pertinent for model training. We believe this paper can contribute to the ongoing advancement of the research towards MG-LLM for generalization across multi-modal graph data and tasks.
Article 391
Title@2025-06-11 (3): AtmosMJ: Revisiting Gating Mechanism for AI Weather Forecasting Beyond the Year Scale
Title: AtmosMJ: Revisiting Gating Mechanism for AI Weather Forecasting Beyond the Year Scale | AtmosMJ: Über die Jahresskala hinausgehende KI-Wettervorhersagen über das Gateing-Mechanismus hinaus | AtmosMJ: 重新研究国际气象预报系统在年规模以后的天气预报的确定机制 2506.09733v1 |
Authors (1): Minjong Cheon
The advent of Large Weather Models (LWMs) has marked a turning point in data-driven forecasting, with many models now outperforming traditional numerical systems in the medium range. However, achieving stable, long-range autoregressive forecasts beyond a few weeks remains a significant challenge. Prevailing state-of-the-art models that achieve year-long stability, such as SFNO and DLWP-HPX, have relied on transforming input data onto non-standard spatial domains like spherical harmonics or HEALPix meshes. This has led to the prevailing assumption that such representations are necessary to enforce physical consistency and long-term stability. This paper challenges that assumption by investigating whether comparable long-range performance can be achieved on the standard latitude-longitude grid. We introduce AtmosMJ, a deep convolutional network that operates directly on ERA5 data without any spherical remapping. The model’s stability is enabled by a novel Gated Residual Fusion (GRF) mechanism, which adaptively moderates feature updates to prevent error accumulation over long recursive simulations. Our results demonstrate that AtmosMJ produces stable and physically plausible forecasts for about 500 days. In quantitative evaluations, it achieves competitive 10-day forecast accuracy against models like Pangu-Weather and GraphCast, all while requiring a remarkably low training budget of 5.7 days on a V100 GPU. Our findings suggest that efficient architectural design, rather than non-standard data representation, can be the key to unlocking stable and computationally efficient long-range weather prediction.
Article 392
Title@2025-06-11 (3): Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts
Title: Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts | Auf dem Weg zu Basismodellen für die dynamische Systemrekonstruktion: Hierarchisches Meta-Learning über die Mischung von Experten | 建立动态系统重建基础模式:通过专家混合学习 2502.05335v2 |
Authors (5): Roussel Desmond Nzoyem, Grant Stevens, Amarpal Sahota, David A. W. Barton, Tom Deakin
As foundational models reshape scientific discovery, a bottleneck persists in dynamical system reconstruction (DSR): the ability to learn across system hierarchies. Many meta-learning approaches have been applied successfully to single systems, but falter when confronted with sparse, loosely related datasets requiring multiple hierarchies to be learned. Mixture of Experts (MoE) offers a natural paradigm to address these challenges. Despite their potential, we demonstrate that naive MoEs are inadequate for the nuanced demands of hierarchical DSR, largely due to their gradient descent-based gating update mechanism which leads to slow updates and conflicted routing during training. To overcome this limitation, we introduce MixER: Mixture of Expert Reconstructors, a novel sparse top-1 MoE layer employing a custom gating update algorithm based on $K$-means and least squares. Extensive experiments validate MixER’s capabilities, demonstrating efficient training and scalability to systems of up to ten parametric ordinary differential equations. However, our layer underperforms state-of-the-art meta-learners in high-data regimes, particularly when each expert is constrained to process only a fraction of a dataset composed of highly related data points. Further analysis with synthetic and neuroscientific time series suggests that the quality of the contextual representations generated by MixER is closely linked to the presence of hierarchical structure in the data.
Article 393
Title@2025-06-11 (3): Reevaluating Meta-Learning Optimization Algorithms Through Contextual Self-Modulation
Title: Reevaluating Meta-Learning Optimization Algorithms Through Contextual Self-Modulation | Neubewertung von Meta-Learning-Optimierungsalgorithmen durch kontextuelle Selbst-Modulation | 通过上下文自体调整重新评价元-收益优化优化化的元-收益优化化 2410.01655v2 |
Authors (3): Roussel Desmond Nzoyem, David A. W. Barton, Tom Deakin
Contextual Self-Modulation (CSM) (Nzoyem et al., 2025) is a potent regularization mechanism for Neural Context Flows (NCFs) which demonstrates powerful meta-learning on physical systems. However, CSM has limitations in its applicability across different modalities and in high-data regimes. In this work, we introduce two extensions: $i$CSM which expands CSM to infinite-dimensional variations by embedding the contexts into a function space, and StochasticNCF which improves scalability by providing a low-cost approximation of meta-gradient updates through a sampled set of nearest environments. These extensions are demonstrated through comprehensive experimentation on a range of tasks, including dynamical systems, computer vision challenges, and curve fitting problems. Additionally, we incorporate higher-order Taylor expansions via Taylor-Mode automatic differentiation, revealing that higher-order approximations do not necessarily enhance generalization. Finally, we demonstrate how CSM can be integrated into other meta-learning frameworks with FlashCAVIA, a computationally efficient extension of the CAVIA meta-learning framework (Zintgraf et al., 2019). Together, these contributions highlight the significant benefits of CSM and indicate that its strengths in meta-learning and out-of-distribution tasks are particularly well-suited to physical systems. Our open-source library, designed for modular integration of self-modulation into contextual meta-learning workflows, is available at https://github.com/ddrous/self-mod.
Article 394
Title@2025-06-11 (3): Auto-Compressing Networks
Title: Auto-Compressing Networks | Autokomprimierende Netzwerke | 自动压缩网络 2506.09714v1 |
Authors (3): Vaggelis Dorovatas, Georgios Paraskevopoulos, Alexandros Potamianos
Deep neural networks with short residual connections have demonstrated remarkable success across domains, but increasing depth often introduces computational redundancy without corresponding improvements in representation quality. In this work, we introduce Auto-Compressing Networks (ACNs), an architectural variant where additive long feedforward connections from each layer to the output replace traditional short residual connections. ACNs showcase a unique property we coin as “auto-compression”, the ability of a network to organically compress information during training with gradient descent, through architectural design alone. Through auto-compression, information is dynamically “pushed” into early layers during training, enhancing their representational quality and revealing potential redundancy in deeper ones. We theoretically show that this property emerges from layer-wise training patterns present in ACNs, where layers are dynamically utilized during training based on task requirements. We also find that ACNs exhibit enhanced noise robustness compared to residual networks, superior performance in low-data settings, improved transfer learning capabilities, and reduced catastrophic forgetting, suggesting that they learn representations that generalize better despite using fewer parameters. Our results demonstrate up to 18% reduction in catastrophic forgetting and 30-80% architectural compression while maintaining accuracy across vision transformers, MLP-mixers, and BERT architectures. Furthermore, we demonstrate that coupling ACNs with traditional pruning techniques enables significantly better sparsity-performance trade-offs compared to conventional architectures. These findings establish ACNs as a practical approach to developing efficient neural architectures that automatically adapt their computational footprint to task complexity, while learning robust representations.
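The architectural difference can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the authors' implementation: the toy `layer` function, the choice to sum only layer outputs (excluding the raw input), and all names are hypothetical.

```python
import numpy as np

def layer(x, w):
    """Toy layer: a linear map followed by tanh."""
    return np.tanh(x @ w)

def residual_forward(x, weights):
    """Short residual connections: each layer adds to its own input."""
    for w in weights:
        x = x + layer(x, w)
    return x

def long_connection_forward(x, weights):
    """Long connections: layers are chained without skips, and every
    layer's output is added directly to the network output."""
    out = np.zeros_like(x)
    for w in weights:
        x = layer(x, w)   # plain feedforward step, no short skip
        out = out + x     # additive long connection to the output
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
weights = [rng.normal(scale=0.3, size=(8, 8)) for _ in range(4)]
y_res = residual_forward(x, weights)
y_long = long_connection_forward(x, weights)
```

In the long-connection variant, truncating the network after layer k only removes the later terms of the output sum, which is one intuition for why deeper layers can turn out to be redundant.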
Article 395
Title@2025-06-11 (3): Training-Free Voice Conversion with Factorized Optimal Transport
Title: Training-Free Voice Conversion with Factorized Optimal Transport | Schulungsfreie Sprachumwandlung mit Factorized Optimal Transport | 以优化运输为要素的免费培训语音转换 2506.09709v1 |
Authors (3): Alexander Lobashev, Assel Yermekova, Maria Larchenko
This paper introduces Factorized MKL-VC, a training-free modification of the kNN-VC pipeline. In contrast with the original pipeline, our algorithm performs high-quality any-to-any cross-lingual voice conversion with only 5 seconds of reference audio. MKL-VC replaces kNN regression with a factorized optimal transport map in WavLM embedding subspaces, derived from the Monge-Kantorovich linear solution. Factorization addresses non-uniform variance across dimensions, ensuring effective feature transformation. Experiments on the LibriSpeech and FLEURS datasets show that MKL-VC significantly improves content preservation and robustness with short reference audio, outperforming kNN-VC. MKL-VC achieves performance comparable to FACodec, especially in the cross-lingual voice conversion domain.
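For reference, the closed-form linear optimal transport map between two Gaussians N(mu_s, S) and N(mu_t, T) is x -> mu_t + A (x - mu_s) with A = S^{-1/2} (S^{1/2} T S^{1/2})^{1/2} S^{-1/2}. The sketch below fits this map from samples for a single block of dimensions. The synthetic features stand in for WavLM embeddings, and how the paper splits dimensions into subspaces is not shown; all names are assumptions.

```python
import numpy as np

def spd_sqrt(m):
    """Square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def mk_linear_map(src, tgt, eps=1e-8):
    """Fit the Gaussian (Monge-Kantorovich linear) transport map src -> tgt."""
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    d = src.shape[1]
    cov_s = np.cov(src, rowvar=False) + eps * np.eye(d)  # small ridge for stability
    cov_t = np.cov(tgt, rowvar=False) + eps * np.eye(d)
    s_half = spd_sqrt(cov_s)
    s_inv_half = np.linalg.inv(s_half)
    a = s_inv_half @ spd_sqrt(s_half @ cov_t @ s_half) @ s_inv_half
    return lambda x: mu_t + (x - mu_s) @ a.T

# Synthetic stand-ins for source-speaker and target-speaker features.
rng = np.random.default_rng(0)
src = rng.normal(size=(500, 4)) * np.array([1.0, 2.0, 0.5, 1.5])
tgt = rng.normal(size=(400, 4)) * np.array([2.0, 1.0, 1.0, 0.5]) + 3.0
transport = mk_linear_map(src, tgt)
mapped = transport(src)
```

After the map is applied, the source features have the target's mean and covariance. A factorized variant would fit one such map per group of dimensions instead of a single full-covariance map.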
Article 396
Title@2025-06-11 (3): LLM2TEA: Agentic AI Designer Finds Innovative Objects with Generative Evolutionary Multitasking
Title: LLM2TEA: Agentic AI Designer Finds Innovative Objects with Generative Evolutionary Multitasking | LLM2TEA: Agentischer AI-Designer findet innovative Objekte mit generativem evolutionären Multitasking | LLM2TEA: ARI 设计师发现具有引演进进化多任务的创新对象 2406.14917v2 |
Authors (5): Melvin Wong, Jiao Liu, Thiago Rios, Stefan Menzel, Yew Soon Ong
In this paper, we introduce LLM-driven MultiTask Evolutionary Algorithm (LLM2TEA), the first agentic AI designer within a generative evolutionary multitasking (GEM) framework that promotes the crossover and synergy of designs from multiple domains, leading to innovative solutions that transcend individual disciplines. Of particular interest is the discovery of objects that are not only innovative but also conform to the physical specifications of the real world in science and engineering. LLM2TEA comprises a large language model to initialize a population of genotypes (defined by text prompts) describing the objects of interest, a text-to-3D generative model to produce phenotypes from these prompts, a classifier to interpret the semantic representations of the objects, and a physics simulation model to assess their physical properties. We propose several novel LLM-based multitask evolutionary operators to guide the search toward the discovery of high-performing practical objects. Experimental results in conceptual design optimization validate the effectiveness of LLM2TEA, revealing a 97% to 174% improvement in the diversity of innovative objects compared to the present text-to-3D generative model baseline. In addition, more than 73% of the generated designs have better physical performance than the top 1% of the designs generated in the baseline. Moreover, LLM2TEA generates designs that are not only aesthetically creative but also functional in real-world applications. Several of these designs have been successfully 3D-printed, emphasizing the proposed approach’s capacity to transform AI-generated outputs into tangible physical objects. The designs produced by LLM2TEA meet practical requirements while showcasing creative and innovative features, underscoring its potential applications in complex design optimization and discovery.
Article 397
Title@2025-06-11 (3): On the Importance of Embedding Norms in Self-Supervised Learning
Title: On the Importance of Embedding Norms in Self-Supervised Learning | Über die Bedeutung der Einbettung von Normen in das Selbstüberwachte Lernen | 关于自我监督学习中嵌入规范的重要性 2502.09252v2 |
Authors (7): Andrew Draganov, Sharvaree Vadgama, Sebastian Damrich, Jan Niklas Böhm, Lucas Maes, Dmitry Kobak, Erik Bekkers
Self-supervised learning (SSL) allows training data representations without a supervised signal and has become an important paradigm in machine learning. Most SSL methods employ the cosine similarity between embedding vectors and hence effectively embed data on a hypersphere. While this seemingly implies that embedding norms cannot play any role in SSL, a few recent works have suggested that embedding norms have properties related to network convergence and confidence. In this paper, we resolve this apparent contradiction and systematically establish the embedding norm’s role in SSL training. Using theoretical analysis, simulations, and experiments, we show that embedding norms (i) govern SSL convergence rates and (ii) encode network confidence, with smaller norms corresponding to unexpected samples. Additionally, we show that manipulating embedding norms can have large effects on convergence speed. Our findings demonstrate that SSL embedding norms are integral to understanding and optimizing network behavior.
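One way to see why norms matter even under a cosine-similarity loss: the gradient of cos(x, y) with respect to x is (y_hat - cos(x, y) * x_hat) / ||x||, so its magnitude shrinks as the embedding norm grows. The sketch below checks this standard identity numerically. It illustrates the scaling only and does not reproduce the paper's analysis; the function names are assumptions.

```python
import numpy as np

def cos_sim(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def cos_sim_grad(x, y):
    """Analytic gradient of cos_sim(x, y) with respect to x."""
    nx = np.linalg.norm(x)
    x_hat, y_hat = x / nx, y / np.linalg.norm(y)
    return (y_hat - (x_hat @ y_hat) * x_hat) / nx

rng = np.random.default_rng(0)
x, y = rng.normal(size=16), rng.normal(size=16)

g_small = cos_sim_grad(x, y)         # embedding with norm ||x||
g_large = cos_sim_grad(10.0 * x, y)  # same direction, 10x the norm
ratio = np.linalg.norm(g_small) / np.linalg.norm(g_large)
```

Scaling the embedding by 10 leaves the cosine similarity unchanged but divides the gradient magnitude by 10, so larger-norm embeddings move more slowly under the same learning rate.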
Article 398
Title@2025-06-11 (3): TRIDENT: Temporally Restricted Inference via DFA-Enhanced Neural Traversal
Title: TRIDENT: Temporally Restricted Inference via DFA-Enhanced Neural Traversal | TRIDENT: Vorübergehend eingeschränkte Schlussfolgerung über DFA-verstärktes Neuraltraversal | 通过DFA-增强的神经轨迹轨迹临时受限制的引文 2506.09701v1 |
Authors (5): Vincenzo Collura, Karim Tit, Laura Bussi, Eleonora Giunchiglia, Maxime Cordy
Large Language Models (LLMs) and other neural architectures have achieved impressive results across a variety of generative and classification tasks. However, they remain fundamentally ill-equipped to ensure that their outputs satisfy temporal constraints, such as those expressible in Linear Temporal Logic over finite traces (LTLf). In this paper, we introduce TRIDENT: a general and model-agnostic inference-time algorithm that guarantees compliance with such constraints without requiring any retraining. TRIDENT compiles LTLf formulas into a Deterministic Finite Automaton (DFA), which is used to guide a constrained variant of beam search. At each decoding step, transitions that would lead to constraint violations are masked, while remaining paths are dynamically re-ranked based on both the model’s probabilities and the DFA’s acceptance structure. We formally prove that the resulting sequences are guaranteed to satisfy the given LTLf constraints, and we empirically demonstrate that TRIDENT also improves output quality. We validate our approach on two distinct tasks: temporally constrained image-stream classification and controlled text generation. In both settings, TRIDENT achieves perfect constraint satisfaction, while comparison with the state of the art shows improved efficiency and high standard quality metrics.
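The masking step can be sketched with a toy DFA and a toy context-free "model". This is not TRIDENT itself: the transition table, the fixed per-step token distributions, and the plain score-based ranking are assumptions, and the DFA-aware re-ranking described in the abstract is omitted.

```python
import math

def constrained_beam_search(step_logprobs, dfa, start, accepting, beam=2):
    """step_logprobs: one {token: logprob} dict per decoding step (toy model).
    dfa: {(state, token): next_state}; a missing entry means a violation."""
    beams = [(0.0, [], start)]
    for dist in step_logprobs:
        candidates = []
        for score, seq, state in beams:
            for tok, lp in dist.items():
                nxt = dfa.get((state, tok))
                if nxt is None:   # transition would violate the constraint: mask it
                    continue
                candidates.append((score + lp, seq + [tok], nxt))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam]
    finished = [b for b in beams if b[2] in accepting]
    return finished[0][1] if finished else None

# Toy constraint: token "b" may never directly follow another "b".
# State 0: last token was not "b"; state 1: last token was "b".
dfa = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0}
steps = [{"a": math.log(0.3), "b": math.log(0.7)}] * 3
best = constrained_beam_search(steps, dfa, start=0, accepting={0, 1})
```

The unconstrained model would prefer "b b b". With masking, the most probable sequence that still satisfies the constraint is returned. For constraints that require reaching an accepting state by the end of decoding, masking alone is not enough, which is where ranking by the automaton's acceptance structure would come in.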
Article 399
Title@2025-06-11 (3): SLEEPYLAND: trust begins with fair evaluation of automatic sleep staging models
Title: SLEEPYLAND: trust begins with fair evaluation of automatic sleep staging models | SLEEPYLAND: Vertrauen beginnt mit einer fairen Bewertung automatischer Schlafinszenierungsmodelle | SLEEPYLAND:信任始于公平评价自动睡眠模式 2506.08574v2 |
Authors (10): Alvise Dei Rossi, Matteo Metaldi, Michal Bechny, Irina Filchenko, Julia van der Meer, Markus H. Schmidt, Claudio L. A. Bassetti, Athina Tzovara, Francesca D. Faraci, Luigi Fiorillo
Despite advances in deep learning for automatic sleep staging, clinical adoption remains limited due to challenges in fair model evaluation, generalization across diverse datasets, model bias, and variability in human annotations. We present SLEEPYLAND, an open-source sleep staging evaluation framework designed to address these barriers. It includes more than 220’000 hours of in-domain (ID) sleep recordings and more than 84’000 hours of out-of-domain (OOD) sleep recordings, spanning a broad range of ages, sleep-wake disorders, and hardware setups. We release pre-trained models based on high-performing SoA architectures and evaluate them under standardized conditions across single- and multi-channel EEG/EOG configurations. We introduce SOMNUS, an ensemble combining models across architectures and channel setups via soft voting. SOMNUS achieves robust performance across twenty-four different datasets, with macro-F1 scores between 68.7% and 87.2%, outperforming individual models in 94.9% of cases. Notably, SOMNUS surpasses previous SoA methods, even in cases where the compared models were trained ID while SOMNUS treated the same data as OOD. Using a subset of the BSWR (N=6’633), we quantify model biases linked to age, gender, AHI, and PLMI, showing that while ensembling improves robustness, no model architecture consistently minimizes bias in performance and clinical markers estimation. In evaluations on OOD multi-annotated datasets (DOD-H, DOD-O), SOMNUS exceeds the best human scorer, i.e., MF1 85.2% vs 80.8% on DOD-H, and 80.2% vs 75.9% on DOD-O, better reproducing the scorer consensus than any individual expert (k = 0.89/0.85 and ACS = 0.95/0.94 for healthy/OSA cohorts). Finally, we introduce ensemble disagreement metrics, based on entropy and inter-model divergence, that predict regions of scorer disagreement with ROC AUCs up to 0.828, offering a data-driven proxy for human uncertainty.
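Soft voting and an entropy-based disagreement score can be sketched as follows. This is a generic illustration, not the SLEEPYLAND code: the array layout, the equal model weights, and the use of the entropy of the averaged distribution are assumptions.

```python
import numpy as np

def soft_vote(prob_stack):
    """prob_stack has shape (n_models, n_epochs, n_classes).
    Returns averaged probabilities, predicted labels, and per-epoch entropy."""
    mean_probs = prob_stack.mean(axis=0)
    labels = mean_probs.argmax(axis=1)
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)
    return mean_probs, labels, entropy

# Two hypothetical models, two 30-second epochs, three sleep stages.
model_a = np.array([[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]])
model_b = np.array([[0.90, 0.05, 0.05], [0.10, 0.85, 0.05]])
mean_probs, labels, entropy = soft_vote(np.stack([model_a, model_b]))
```

The models agree on the first epoch and disagree on the second, so the second epoch gets a flatter averaged distribution and a higher entropy. Scores of this kind are what could be compared against regions where human scorers disagree.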
Article 400
Title@2025-06-11 (3): Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
Title: Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment | Jenseits von Bradley-Terry-Modellen: Ein allgemeines Präferenzmodell für Sprachmodellausrichtung | 超越布拉德利-泰瑞模式:语文模式统一通用优先模式 2410.02197v3 |
Authors (5): Yifan Zhang, Ge Zhang, Yue Wu, Kangping Xu, Quanquan Gu
Modeling human preferences is crucial for aligning foundation models with human values. Traditional reward modeling methods, such as the Bradley-Terry (BT) reward model, fall short in expressiveness, particularly in addressing intransitive preferences. In this paper, we introduce preference embedding, an approach that embeds responses into a latent space to capture intricate preference structures efficiently, achieving linear query complexity. Additionally, we propose preference score-based General Preference Optimization (GPO), which generalizes reward-based reinforcement learning from human feedback (RLHF). Experimental results show that our General Preference embedding Model (GPM) consistently outperforms the BT reward model on the RewardBench benchmark and effectively models cyclic preferences where any BT reward model behaves like a random guess. Furthermore, evaluations on downstream tasks such as AlpacaEval2.0, following the language model post-training with GPO and our general preference model, reveal performance improvements over BT models. These findings indicate that our method may enhance the alignment of foundation models with nuanced human values. The code is available at https://github.com/general-preference/general-preference-model.
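The limitation with cyclic preferences can be made concrete. Under a Bradley-Terry model, P(i beats j) = sigmoid(r_i - r_j), so a cycle A > B > C > A would need r_A > r_B > r_C > r_A, which is impossible. The sketch below shows one way an embedding-based score can represent such a cycle. The skew-symmetric bilinear form and the 2-D embeddings are assumptions for illustration, since the abstract does not specify the scoring function.

```python
import numpy as np

# Skew-symmetric operator: guarantees score(i, j) == -score(j, i).
R = np.array([[0.0, 1.0], [-1.0, 0.0]])

def pref_score(v_i, v_j):
    """Positive when response i is preferred over response j."""
    return float(v_i @ R @ v_j)

def pref_prob(v_i, v_j):
    return 1.0 / (1.0 + np.exp(-pref_score(v_i, v_j)))

# Three hypothetical responses embedded 120 degrees apart on the unit circle.
angles = np.deg2rad([0.0, 120.0, 240.0])
emb = {name: np.array([np.cos(a), np.sin(a)]) for name, a in zip("ABC", angles)}

p_ab = pref_prob(emb["A"], emb["B"])
p_bc = pref_prob(emb["B"], emb["C"])
p_ca = pref_prob(emb["C"], emb["A"])
```

All three probabilities exceed 0.5, so the model encodes A > B, B > C, and C > A at once. Each response is embedded a single time, so scoring any pair needs only the two precomputed vectors, which is consistent with the linear query complexity mentioned in the abstract.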
Article 401
Title@2025-06-11 (3): Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation
Title: Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation | Auto-Regressive vs Flow-Matching: Eine vergleichende Studie zur Modellierung von Paradigmen für die Text-zu-Musik-Generierung | 自动递减与流动配接: 文本到音乐一代模型模型比较研究 2506.08570v2 |
Authors (3): Or Tal, Felix Kreuk, Yossi Adi
Recent progress in text-to-music generation has enabled models to synthesize high-quality musical segments and full compositions, and even to respond to fine-grained control signals, e.g. chord progressions. State-of-the-art (SOTA) systems differ significantly across many dimensions, such as training datasets, modeling paradigms, and architectural choices. This diversity complicates efforts to evaluate models fairly and pinpoint which design choices most influence performance. While factors like data and architecture are important, in this study we focus exclusively on the modeling paradigm. We conduct a systematic empirical analysis to isolate its effects, offering insights into associated trade-offs and emergent behaviors that can guide future text-to-music generation systems. Specifically, we compare the two arguably most common modeling paradigms: Auto-Regressive decoding and Conditional Flow-Matching. We conduct a controlled comparison by training all models from scratch using identical datasets, training configurations, and similar backbone architectures. Performance is evaluated across multiple axes, including generation quality, robustness to inference configurations, scalability, adherence to both textual and temporally aligned conditioning, and editing capabilities in the form of audio inpainting. This comparative study sheds light on distinct strengths and limitations of each paradigm, providing actionable insights that can inform future architectural and training decisions in the evolving landscape of text-to-music generation. Sampled audio examples are available at: https://huggingface.co/spaces/ortal1602/ARvsFM
Article 402
Title@2025-06-11 (3): Share Secrets for Privacy: Confidential Forecasting with Vertical Federated Learning
Title: Share Secrets for Privacy: Confidential Forecasting with Vertical Federated Learning | Share Secrets for Privacy: Vertrauliche Vorhersagen mit vertikalem Federated Learning | 私隐共享秘密:与垂直联邦学习有关的保密预报 2405.20761v2 |
Authors (5): Aditya Shankar, Jérémie Decouchant, Dimitra Gkorou, Rihan Hai, Lydia Y. Chen
Vertical federated learning (VFL) is a promising area for time series forecasting in many applications, such as healthcare and manufacturing. Critical challenges to address include data privacy and over-fitting on small and noisy datasets during both training and inference. Additionally, such forecasting models must scale well with the number of parties while ensuring strong convergence and low tuning complexity. We address these challenges and propose “Secret-shared Time Series Forecasting with VFL” (STV), a novel framework with the following key features: i) a privacy-preserving algorithm for forecasting with SARIMAX and autoregressive trees on vertically-partitioned data; ii) decentralised forecasting using secret sharing and multi-party computation; and iii) novel N-party algorithms for matrix multiplication and inverse operations for exact parameter optimization, giving strong convergence with minimal tuning complexity. We evaluate on six representative datasets from public and industry-specific contexts. Results demonstrate that STV’s forecasting accuracy is comparable to that of centralized approaches. Our exact optimization outperforms centralized methods, including state-of-the-art diffusion models and long short-term memory, by 23.81% on forecasting accuracy. We also evaluate scalability by examining the communication costs of exact and iterative optimization to navigate the choice between the two. STV’s code and supplementary material are available online: https://github.com/adis98/STV.
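The secret-sharing building block can be illustrated with additive shares over a prime field. This is a textbook sketch rather than STV's protocol: the modulus, the function names, and the three-party sum are assumptions, and the N-party matrix multiplication and inversion algorithms from the paper are not shown.

```python
import secrets

PRIME = 2**61 - 1  # field modulus; an arbitrary choice for this sketch

def share(value, n_parties):
    """Split an integer into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

def secure_sum(private_values):
    """Each party shares its value; party j only ever sees the j-th shares."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    partial = [sum(s[j] for s in all_shares) % PRIME for j in range(n)]
    return reconstruct(partial)

total = secure_sum([3, 5, 9])
```

Any subset of fewer than n shares is uniformly random and reveals nothing about the secret, yet additions can be carried out locally on shares, so the parties learn the total without seeing each other's inputs.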
nan
Article 403
Title@2025-06-11 (3): Adding simple structure at inference improves Vision-Language Compositionality
Title: Adding simple structure at inference improves Vision-Language Compositionality | Einfache Struktur bei Inferenz hinzufügen verbessert die Vision-Sprachenkompositionalität | 在推论中添加简单结构改进视野-语言的构成性 2506.09691v1 |
Authors (4): Imanol Miranda, Ander Salaberria, Eneko Agirre, Gorka Azkune
Dual encoder Vision-Language Models (VLM) such as CLIP are widely used for image-text retrieval tasks. However, those models struggle with compositionality, showing a bag-of-words-like behavior that limits their retrieval performance. Many different training approaches have been proposed to improve the vision-language compositionality capabilities of those models. In comparison, inference-time techniques have received little attention. In this paper, we propose to add simple structure at inference, where, given an image and a caption: i) we divide the image into different smaller crops, ii) we extract text segments, capturing objects, attributes and relations, iii) using a VLM, we find the image crops that best align with the text segments to obtain matches, and iv) we compute the final image-text similarity by aggregating the individual similarities of the matches. Based on various popular dual encoder VLMs, we evaluate our approach on controlled and natural datasets for VL compositionality. We find that our approach consistently improves the performance of the evaluated VLMs without any training, which shows the potential of inference-time techniques. The results are especially good for attribute-object binding, as shown on the controlled dataset. As a result of an extensive analysis: i) we show that processing image crops is actually essential for the observed gains in performance, and ii) we identify specific areas to further improve inference-time approaches.
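Steps iii) and iv) of the abstract can be illustrated with a minimal sketch. It assumes the crop-to-segment similarities have already been computed by some VLM and stored in a matrix; the function name, the best-crop matching rule, and the mean aggregator are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def structured_similarity(sim):
    """Aggregate crop/segment similarities into one image-text score.

    sim[i, j] is the (precomputed, assumed) similarity between image crop i
    and text segment j. Each segment is matched to its best crop, and the
    matched similarities are averaged (the averaging is an assumption).
    """
    sim = np.asarray(sim, dtype=float)
    best_per_segment = sim.max(axis=0)  # best crop for every text segment
    return float(best_per_segment.mean())

# Toy example: 2 crops x 2 text segments.
toy_sim = np.array([[0.2, 0.9],
                    [0.7, 0.1]])
toy_score = structured_similarity(toy_sim)
```

In the toy matrix the first segment matches the second crop and the second segment matches the first crop, so the score averages 0.7 and 0.9.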
nan
Article 404
Title@2025-06-11 (3): Wasserstein Hypergraph Neural Network
Title: Wasserstein Hypergraph Neural Network | Wasserstein Hypergraph Neuronales Netzwerk | Vasserstein 高音神经网络 2506.09682v1 |
Authors (2): Iulia Duta, Pietro Liò
The ability to model relational information using machine learning has driven advancements across various domains, from medicine to social science. While graph representation learning has become mainstream over the past decade, representing higher-order relationships through hypergraphs is rapidly gaining momentum. In the last few years, numerous hypergraph neural networks have emerged, most of them falling under a two-stage, set-based framework in which messages are sent from nodes to edges and then from edges to nodes. However, most of the advancement still takes inspiration from the graph counterpart, often simplifying the aggregations to basic pooling operations. In this paper, we introduce the Wasserstein Hypergraph Neural Network, a model that treats the node and hyperedge neighbourhoods as distributions and aggregates the information using Sliced Wasserstein Pooling. Unlike conventional aggregators such as mean or sum, which only capture first-order statistics, our approach can preserve geometric properties like the shape and spread of distributions. This enables the learned embeddings to reflect how easily one hyperedge distribution can be transformed into another, following principles of optimal transport. Experimental results demonstrate that applying Wasserstein pooling in a hypergraph setting significantly benefits node classification tasks, achieving top performance on several real-world datasets.
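To make concrete why a Wasserstein-based aggregator sees more than a mean does, here is a minimal sketch of the sliced 2-Wasserstein distance between two equally sized point sets. It is not the paper's Sliced Wasserstein Pooling layer; the function name, the number of projections, and the toy data are assumptions for illustration only.

```python
import numpy as np

def sliced_wasserstein2(x, y, n_projections=64, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance between two
    point sets x and y of shape (n, d) with the same number of points n."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Random unit directions on the sphere.
    theta = rng.normal(size=(n_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # In 1-D, sorting both projections gives the optimal coupling.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    return float(np.sqrt(np.mean((px - py) ** 2)))

# Two node sets with the same mean but different spread.
narrow = np.array([[-1.0, 0.0], [1.0, 0.0]])
wide = np.array([[-3.0, 0.0], [3.0, 0.0]])
same_mean = np.allclose(narrow.mean(axis=0), wide.mean(axis=0))
sw_distance = sliced_wasserstein2(narrow, wide)
```

Mean pooling maps both toy sets to the same vector, while the sliced Wasserstein distance between them is strictly positive.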
nan
Article 405
Title@2025-06-11 (3): Assessing the Quality of Denoising Diffusion Models in Wasserstein Distance: Noisy Score and Optimal Bounds
Title: Assessing the Quality of Denoising Diffusion Models in Wasserstein Distance: Noisy Score and Optimal Bounds | Bewertung der Qualität des Denoisierens von Diffusionsmodellen in Wasserstein Abstand: Geräuschpegel und optimale Bounds | 评估瓦西尔斯坦距离中Disoising扩散模型的质量:噪音分数和最佳区区 2506.09681v1 |
Authors (3): Vahan Arsenyan, Elen Vardanyan, Arnak Dalalyan
Generative modeling aims to produce new random examples from an unknown target distribution, given access to a finite collection of examples. Among the leading approaches, denoising diffusion probabilistic models (DDPMs) construct such examples by mapping a Brownian motion via a diffusion process driven by an estimated score function. In this work, we first provide empirical evidence that DDPMs are robust to constant-variance noise in the score evaluations. We then establish finite-sample guarantees in Wasserstein-2 distance that exhibit two key features: (i) they characterize and quantify the robustness of DDPMs to noisy score estimates, and (ii) they achieve faster convergence rates than previously known results. Furthermore, we observe that the obtained rates match those known in the Gaussian case, implying their optimality.
nan
Article 406
Title@2025-06-11 (3): SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting
Title: SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting | SplitLoRA: Balance Stabilität und Plastizität im kontinuierlichen Lernen durch gradienten Raum Splitting | Split LoRA:通过逐步空间分割在持续学习中平衡稳定和可塑性 2505.22370v3 |
Authors (6): Haomiao Qiu, Miao Zhang, Ziyue Qiao, Weili Guan, Min Zhang, Liqiang Nie
Continual Learning (CL) requires a model to learn multiple tasks in sequence while maintaining both stability (preserving knowledge from previously learned tasks) and plasticity (effectively learning new tasks). Gradient projection has emerged as an effective and popular paradigm in CL: it partitions the gradient space of previously learned tasks into two orthogonal subspaces, a primary subspace and a minor subspace. New tasks are learned within the minor subspace, thereby reducing interference with previously acquired knowledge. However, existing gradient projection methods struggle to achieve an optimal balance between plasticity and stability, as it is hard to partition the gradient space appropriately. In this work, we consider a continual learning paradigm based on Low-Rank Adaptation, which has gained considerable attention due to its efficiency and wide applicability, and propose a novel approach for continual learning, called SplitLoRA. We first provide a theoretical analysis of how subspace partitioning affects model stability and plasticity. Informed by this analysis, we then introduce an effective method that derives the optimal partition of the gradient space for previously learned tasks. This approach effectively balances stability and plasticity in continual learning. Experimental results on multiple datasets demonstrate that the proposed method achieves state-of-the-art performance.
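The generic gradient projection idea described above can be sketched as follows. This is not SplitLoRA itself: it only shows how a new-task gradient could be restricted to the minor subspace once a split point `k` has been chosen. The function name, the use of an SVD over stored past gradients, and the toy numbers are assumptions.

```python
import numpy as np

def project_to_minor_subspace(grad, past_grads, k):
    """Remove from `grad` its component in the primary subspace.

    past_grads: (d, n) matrix whose columns are gradients of earlier tasks.
    k: number of leading singular directions treated as the primary subspace.
    A larger k protects more old knowledge (stability) but leaves fewer
    directions for the new task (plasticity).
    """
    u, _, _ = np.linalg.svd(past_grads, full_matrices=False)
    primary = u[:, :k]  # orthonormal basis of the primary subspace
    return grad - primary @ (primary.T @ grad)

# Toy example: past gradients are dominated by the first coordinate.
past = np.array([[3.0, 0.0],
                 [0.0, 1.0],
                 [0.0, 0.0]])
new_grad = np.array([1.0, 2.0, 3.0])
projected = project_to_minor_subspace(new_grad, past, k=1)
```

With `k=1` only the dominant direction is protected, so the first coordinate of the new gradient is removed and the rest is left unchanged.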
nan
Article 407
Title@2025-06-11 (3): Wavelet Scattering Transform and Fourier Representation for Offline Detection of Malicious Clients in Federated Learning
Title: Wavelet Scattering Transform and Fourier Representation for Offline Detection of Malicious Clients in Federated Learning | Wavelet Scattering Transform und Fourier-Darstellung zur Offline-Erkennung von bösartigen Kunden im Federated Learning | 联邦学习联盟恶意客户离线检测分波散散变形和Fourier代表处 2506.09674v1 |
Authors (3): Alessandro Licciardi, Davide Leo, Davide Carbone
Federated Learning (FL) enables the training of machine learning models across decentralized clients while preserving data privacy. However, the presence of anomalous or corrupted clients, such as those with faulty sensors or non-representative data distributions, can significantly degrade model performance. Detecting such clients without accessing raw data remains a key challenge. We propose WAFFLE (Wavelet and Fourier representations for Federated Learning), a detection algorithm that labels malicious clients before training, using locally computed compressed representations derived from either the Wavelet Scattering Transform (WST) or the Fourier Transform. Both approaches provide low-dimensional, task-agnostic embeddings suitable for unsupervised client separation. A lightweight detector, trained on a distilled public dataset, performs the labeling with minimal communication and computational overhead. While both transforms enable effective detection, WST offers theoretical advantages, such as non-invertibility and stability to local deformations, that make it particularly well-suited to federated scenarios. Experiments on benchmark datasets show that our method improves detection accuracy and downstream classification performance compared to existing FL anomaly detection algorithms, validating its effectiveness as a pre-training alternative to online detection strategies.
nan
Article 408
Title@2025-06-11 (3): Optimizing Shortfall Risk Metric for Learning Regression Models
Title: Optimizing Shortfall Risk Metric for Learning Regression Models | Optimierung von Shortfall Risk Metric zum Lernen von Regressionsmodellen | 优化 “ 短缺风险 “ 风险计量模型的学习倒退模型 2505.17777v3 |
Authors (2): Harish G. Ramaswamy, L. A. Prashanth
We consider the problem of estimating and optimizing utility-based shortfall risk (UBSR) of a loss, say $(Y - \hat Y)^2$, in the context of a regression problem. Empirical risk minimization with a UBSR objective is challenging since UBSR is a non-linear function of the underlying distribution. We first derive a concentration bound for UBSR estimation using independent and identically distributed (i.i.d.) samples. We then frame the UBSR optimization problem as minimization of a pseudo-linear function in the space of achievable distributions $\mathcal D$ of the loss $(Y- \hat Y)^2$. We construct a gradient oracle for the UBSR objective and a linear minimization oracle (LMO) for the set $\mathcal D$. Using these oracles, we devise a bisection-type algorithm, and establish convergence to the UBSR-optimal solution.
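As a rough illustration of the estimation step, the sketch below computes an empirical UBSR from i.i.d. loss samples. It assumes the standard definition $\mathrm{SR}_\lambda(X) = \inf\{t : E[l(X - t)] \le \lambda\}$ with an increasing utility $l$, and finds the threshold by bisection on the sample mean. The function name, the bracket construction, and the exponential utility used in the example are assumptions, and this is not the paper's optimization algorithm.

```python
import numpy as np

def empirical_ubsr(samples, utility, lam, tol=1e-9):
    """Smallest t with mean(utility(samples - t)) <= lam, found by bisection.

    Assumes `utility` is increasing and that the empirical mean crosses `lam`
    somewhere, so the bracket expansion below terminates.
    """
    samples = np.asarray(samples, dtype=float)

    def excess(t):
        return np.mean(utility(samples - t)) - lam

    lo, hi = samples.min() - 1.0, samples.max() + 1.0
    while excess(lo) <= 0:  # widen until the constraint is violated at lo
        lo -= hi - lo
    while excess(hi) > 0:   # widen until the constraint is satisfied at hi
        hi += hi - lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi

# Squared regression residuals play the role of the loss (Y - Y_hat)^2.
rng = np.random.default_rng(0)
sq_losses = rng.normal(size=500) ** 2
ubsr_exp = empirical_ubsr(sq_losses, np.exp, lam=1.0)
entropic = np.log(np.mean(np.exp(sq_losses)))
```

With $l = \exp$ and $\lambda = 1$ the UBSR has the closed form $\log E[e^{X}]$, which gives a convenient sanity check for the bisection.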
nan
Article 409
Title@2025-06-11 (3): CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
Title: CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization | CROW: Beseitigung von Backdoors aus großen Sprachmodellen über interne Konsistenzregularisierung | CROW:通过内部一致性规范化,消除大语言模式中的后门 2411.12768v2 |
Authors (4): Nay Myat Min, Long H. Pham, Yige Li, Jun Sun
Large Language Models (LLMs) are vulnerable to backdoor attacks that manipulate outputs via hidden triggers. Existing defense methods, designed for vision/text classification tasks, fail for text generation. We propose Internal Consistency Regularization (CROW), a defense leveraging the observation that backdoored models exhibit unstable layer-wise hidden representations when triggered, while clean models show smooth transitions. CROW enforces consistency across layers via adversarial perturbations and regularization during finetuning, neutralizing backdoors without requiring clean reference models or trigger knowledge; only a small clean dataset is needed. Experiments across Llama-2 (7B, 13B), CodeLlama (7B, 13B), and Mistral-7B demonstrate CROW’s effectiveness: it achieves significant reductions in attack success rates across diverse backdoor strategies (sentiment steering, targeted refusal, code injection) while preserving generative performance. CROW’s architecture-agnostic design enables practical deployment.
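One way to quantify "smooth transitions" between layers is a penalty on how much hidden states rotate from one layer to the next. The sketch below uses the mean cosine dissimilarity between consecutive layers; this choice, the function name, and the array layout are assumptions for illustration, and the adversarial perturbation step mentioned in the abstract is omitted.

```python
import numpy as np

def layer_consistency_penalty(hidden_states, eps=1e-8):
    """Mean (1 - cosine similarity) between consecutive layers.

    hidden_states: array of shape (layers, tokens, dim).
    Returns about 0 when every layer points in the same direction as the
    previous one, and grows as representations change abruptly.
    """
    h = np.asarray(hidden_states, dtype=float)
    prev, nxt = h[:-1], h[1:]
    dots = np.sum(prev * nxt, axis=-1)
    norms = np.linalg.norm(prev, axis=-1) * np.linalg.norm(nxt, axis=-1)
    return float(np.mean(1.0 - dots / (norms + eps)))

# Toy hidden states: 3 layers, 2 tokens, 4 dimensions.
smooth = np.ones((3, 2, 4))
jumpy = smooth * np.array([1.0, -1.0, 1.0]).reshape(3, 1, 1)  # sign flips
smooth_penalty = layer_consistency_penalty(smooth)
jumpy_penalty = layer_consistency_penalty(jumpy)
```

In a finetuning loop a term like this would be added to the language modeling loss with some weight, which is again an assumption rather than a detail taken from the abstract.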
nan
Article 410
Title@2025-06-11 (3): CINeMA: Conditional Implicit Neural Multi-Modal Atlas for a Spatio-Temporal Representation of the Perinatal Brain
Title: CINeMA: Conditional Implicit Neural Multi-Modal Atlas for a Spatio-Temporal Representation of the Perinatal Brain | CINeMA: Bedingter Impliziter Neuraler Multi-Modaler Atlas für eine räumlich-zeitliche Darstellung des Perinatalen Gehirns | CINEMA: 围产期大脑空间时/时代表性有条件的隐性神经隐性多模式图集 2506.09668v1 |
Authors (8): Maik Dannecker, Vasiliki Sideri-Lampretsa, Sophie Starck, Angeline Mihailov, Mathieu Milh, Nadine Girard, Guillaume Auzias, Daniel Rueckert
Magnetic resonance imaging of fetal and neonatal brains reveals rapid neurodevelopment marked by substantial anatomical changes unfolding within days. Studying this critical stage of the developing human brain, therefore, requires accurate brain models, referred to as atlases, of high spatial and temporal resolution. To meet these demands, established traditional atlases and recently proposed deep learning-based methods rely on large and comprehensive datasets. This poses a major challenge for studying brains in the presence of pathologies for which data remains scarce. We address this limitation with CINeMA (Conditional Implicit Neural Multi-Modal Atlas), a novel framework for creating high-resolution, spatio-temporal, multimodal brain atlases, suitable for low-data settings. Unlike established methods, CINeMA operates in latent space, avoiding compute-intensive image registration and reducing atlas construction times from days to minutes. Furthermore, it enables flexible conditioning on anatomical features including gestational age (GA), birth age, and pathologies like ventriculomegaly (VM) and agenesis of the corpus callosum (ACC). CINeMA supports downstream tasks such as tissue segmentation and age prediction, while its generative properties enable synthetic data creation and anatomically informed data augmentation. Surpassing state-of-the-art methods in accuracy, efficiency, and versatility, CINeMA represents a powerful tool for advancing brain research. We release the code and atlases at https://github.com/m-dannecker/CINeMA.
nan
Article 411
Title@2025-06-11 (3): Learning Time-Varying Multi-Region Brain Communications via Scalable Markovian Gaussian Processes
Title: Learning Time-Varying Multi-Region Brain Communications via Scalable Markovian Gaussian Processes | Lernen von zeitvariierenden Multi-Region Gehirnkommunikation über skalierbare Markovian Gaussian Prozesse | 通过可缩放的马尔科维扬高斯进程进行学习、改变时间的多区域脑交流 2407.00397v5 |
Authors (4): Weihan Li, Yule Wang, Chengrui Li, Anqi Wu
Understanding and modeling brain communications that capture dynamic interactions across multiple regions is fundamental to modern systems neuroscience, yet current methods struggle to find time-varying region-level communications or to scale to large neural datasets with long recording durations. We present a novel framework using Markovian Gaussian Processes to learn brain communications with time-varying temporal delays from multi-region neural recordings, named Adaptive Delay Model (ADM). Our method combines Gaussian Processes with State Space Models and employs parallel scan inference algorithms, enabling efficient scaling to large datasets while identifying concurrent communication patterns that evolve over time. This time-varying approach captures how brain region interactions shift dynamically during cognitive processes. Validated on synthetic and multi-region neural recording datasets, our approach discovers both the directionality and temporal dynamics of neural communication. This work advances our understanding of distributed neural computation and provides a scalable tool for analyzing dynamic brain networks.
nan
Article 412
Title@2025-06-11 (3): Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement
Title: Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement | Anmerkungsfreie MIDI-to-Audio-Synthese über Konkatenative Synthese und Generative Veredelung | 通过分类合成和生产改进进行无说明的MIDI到研究综合综述 2410.16785v2 |
Authors (2): Osamu Take, Taketo Akama
Recent MIDI-to-audio synthesis methods using deep neural networks have successfully generated high-quality, expressive instrumental tracks. However, these methods require MIDI annotations for supervised training, limiting the diversity of instrument timbres and expression styles in the output. We propose CoSaRef, a MIDI-to-audio synthesis method that does not require MIDI-audio paired datasets. CoSaRef first generates a synthetic audio track using concatenative synthesis based on MIDI input, then refines it with a diffusion-based deep generative model trained on datasets without MIDI annotations. This approach improves the diversity of timbres and expression styles. Additionally, it allows detailed control over timbres and expression through audio sample selection and extra MIDI design, similar to traditional functions in digital audio workstations. Experiments showed that CoSaRef could generate realistic tracks while preserving fine-grained timbre control via one-shot samples. Moreover, despite not being supervised on MIDI annotation, CoSaRef outperformed the state-of-the-art timbre-controllable method based on MIDI supervision in both objective and subjective evaluation.
nan
Article 413
Title@2025-06-11 (3): SyncFed: Time-Aware Federated Learning through Explicit Timestamping and Synchronization
Title: SyncFed: Time-Aware Federated Learning through Explicit Timestamping and Synchronization | SyncFed: Time-Aware-Federated Learning durch explizites Zeitstempeln und Synchronisieren | 同步Fed:通过明确时间印记和同步化进行时间软件联邦学习 2506.09660v1 |
Authors (4): Baran Can Gül, Stefanos Tziampazis, Nasser Jazdi, Michael Weyrich
As Federated Learning (FL) expands to larger and more distributed environments, consistency in training is challenged by network-induced delays, clock unsynchronicity, and variability in client updates. This combination of factors may contribute to misaligned contributions that undermine model reliability and convergence. Existing methods like staleness-aware aggregation and model versioning address lagging updates heuristically, yet lack mechanisms to quantify staleness, especially in latency-sensitive and cross-regional deployments. In light of these considerations, we introduce SyncFed, a time-aware FL framework that employs explicit synchronization and timestamping to establish a common temporal reference across the system. Staleness is quantified numerically based on exchanged timestamps under the Network Time Protocol (NTP), enabling the server to reason about the relative freshness of client updates and apply temporally informed weighting during aggregation. Our empirical evaluation on a geographically distributed testbed shows that, under SyncFed, the global model evolves within a stable temporal context, resulting in improved accuracy and information freshness compared to round-based baselines devoid of temporal semantics.
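Temporally informed weighting during aggregation could look like the following sketch, which assumes client and server clocks are already synchronized so that timestamps are directly comparable. The exponential decay rule, the `decay` parameter, and the function name are assumptions for illustration and are not taken from SyncFed.

```python
import numpy as np

def staleness_weighted_average(updates, timestamps, server_time, decay=0.1):
    """Average client updates, down-weighting older ones.

    updates: (clients, params) array of model updates.
    timestamps: time (in seconds, on the shared clock) at which each
        update was produced.
    Staleness is server_time - timestamp; the weight exp(-decay * staleness)
    is a hypothetical choice of weighting function.
    """
    updates = np.asarray(updates, dtype=float)
    staleness = np.maximum(server_time - np.asarray(timestamps, dtype=float), 0.0)
    weights = np.exp(-decay * staleness)
    weights /= weights.sum()
    return weights @ updates, weights

client_updates = np.array([[1.0, 1.0],
                           [3.0, 3.0]])
fresh_avg, fresh_w = staleness_weighted_average(client_updates, [10.0, 10.0], server_time=10.0)
stale_avg, stale_w = staleness_weighted_average(client_updates, [0.0, 10.0], server_time=10.0)
```

When both updates are equally fresh the result is the plain average; when the first client's update is ten seconds old, the aggregate moves toward the second client's update.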
nan
Article 414
Title@2025-06-11 (3): Intent Factored Generation: Unleashing the Diversity in Your Language Model
Title: Intent Factored Generation: Unleashing the Diversity in Your Language Model | Intent Factored Generation: Die Vielfalt in Ihrem Sprachmodell entfesseln | 内在因数代:在你的语言模式中释放多样性 2506.09659v1 |
Authors (5): Eltayeb Ahmed, Uljad Berdica, Martha Elliott, Danijela Horak, Jakob N. Foerster
Obtaining multiple meaningfully diverse, high-quality samples from Large Language Models for a fixed prompt remains an open challenge. Current methods for increasing diversity often only operate at the token level, paraphrasing the same response. This is problematic because it leads to poor exploration on reasoning problems and to unengaging, repetitive conversational agents. To address this, we propose Intent Factored Generation (IFG), factorising the sampling process into two stages. First, we sample a semantically dense intent, e.g., a summary or keywords. Second, we sample the final response, conditioning on both the original prompt and the intent from the first stage. This allows us to use a higher temperature during the intent step to promote conceptual diversity, and a lower temperature during the final generation to ensure the outputs are coherent and self-consistent. Additionally, we find that prompting the model to explicitly state its intent for each step of the chain-of-thought before generating the step is beneficial for reasoning tasks. We demonstrate our method’s effectiveness across a diverse set of tasks. We show this method improves both pass@k and Reinforcement Learning from Verifier Feedback on maths and code tasks. For instruction-tuning, we combine IFG with Direct Preference Optimisation to increase conversational diversity without sacrificing reward. Finally, we achieve higher diversity while maintaining the quality of generations on a general language modelling task, using a new dataset of reader comments and news articles that we collect and open-source. In summary, we present a simple method of increasing the sample diversity of LLMs while maintaining performance. This method can be implemented by changing the prompt and varying the temperature during generation, making it easy to integrate into many algorithms for gains across various applications.
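The two-stage sampling scheme can be sketched around any text sampler. In the sketch below, `sample` is a hypothetical callable `(prompt, temperature) -> str` standing in for a real LLM call; the prompt templates and the default temperatures are assumptions, not values from the paper.

```python
from typing import Callable, Tuple

def intent_factored_generate(
    prompt: str,
    sample: Callable[[str, float], str],
    intent_temperature: float = 1.2,
    response_temperature: float = 0.5,
) -> Tuple[str, str]:
    """Sample an intent at high temperature, then a response at low temperature."""
    # Stage 1: a short, semantically dense intent (hypothetical template).
    intent = sample(f"{prompt}\n\nIntent (a few keywords):", intent_temperature)
    # Stage 2: the response, conditioned on both the prompt and the intent.
    response = sample(f"{prompt}\n\nIntent: {intent}\n\nResponse:", response_temperature)
    return intent, response

# Stand-in sampler that records its calls instead of querying a model.
recorded_calls = []

def fake_sampler(text: str, temperature: float) -> str:
    recorded_calls.append((text, temperature))
    return f"output-{len(recorded_calls)}"

demo_intent, demo_response = intent_factored_generate("Write a haiku.", fake_sampler)
```

Swapping `fake_sampler` for a real model call is all that is needed to try the idea, which matches the abstract's point that the method only changes the prompt and the temperature.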
nan
Article 415
Title@2025-06-11 (3): DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy
Title: DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy | DipLLM: Feinsteuerungs-LLM für strategische Entscheidungsfindung in der Diplomatie | DipLLM: 外交战略决策的精细推荐LLM 2506.09655v1 |
Authors (6): Kaixuan Xu, Jiajun Chai, Sicheng Li, Yuqian Fu, Yuanheng Zhu, Dongbin Zhao
Diplomacy is a complex multiplayer game that requires both cooperation and competition, posing significant challenges for AI systems. Traditional methods rely on equilibrium search to generate extensive game data for training, which demands substantial computational resources. Large Language Models (LLMs) offer a promising alternative, leveraging pre-trained knowledge to achieve strong performance with relatively small-scale fine-tuning. However, applying LLMs to Diplomacy remains challenging due to the exponential growth of possible action combinations and the intricate strategic interactions among players. To address this challenge, we propose DipLLM, a fine-tuned LLM-based agent that learns equilibrium policies for Diplomacy. DipLLM employs an autoregressive factorization framework to simplify the complex task of multi-unit action assignment into a sequence of unit-level decisions. By defining an equilibrium policy within this framework as the learning objective, we fine-tune the model using only 1.5% of the data required by the state-of-the-art Cicero model, surpassing its performance. Our results demonstrate the potential of fine-tuned LLMs for tackling complex strategic decision-making in multiplayer games.
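The autoregressive factorization mentioned above is, at its core, the chain rule of probability applied to the joint action of all units. A minimal statement, using notation that is assumed here rather than taken from the paper ($s$ for the game state, $a^{k}$ for the order given to the $k$-th of $K$ units):

```latex
\pi\left(a^{1}, \dots, a^{K} \mid s\right)
  = \prod_{k=1}^{K} \pi\left(a^{k} \mid s, a^{1}, \dots, a^{k-1}\right)
```

Each factor is a decision over a single unit's orders, so the model never has to enumerate the exponentially large set of joint actions directly.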
nan
Article 416
Title@2025-06-11 (3): Improving Discriminator Guidance in Diffusion Models
Title: Improving Discriminator Guidance in Diffusion Models | Verbesserung der Orientierungshilfen für Diskriminatoren bei Diffusionsmodellen | 改进传播模式中反对者的指导 2503.16117v2 |
Authors (5): Alexandre Verine, Ahmed Mehdi Inane, Florian Le Bronnec, Benjamin Negrevergne, Yann Chevaleyre
Discriminator Guidance has become a popular method for efficiently refining pre-trained Score-Matching Diffusion models. However, in this paper, we demonstrate that the standard implementation of this technique does not necessarily lead to a distribution closer to the real data distribution. Specifically, we show that training the discriminator using Cross-Entropy loss, as commonly done, can in fact increase the Kullback-Leibler divergence between the model and target distributions, particularly when the discriminator overfits. To address this, we propose a theoretically sound training objective for discriminator guidance that properly minimizes the KL divergence. We analyze its properties and demonstrate empirically across multiple datasets that our proposed method consistently improves over the conventional method by producing samples of higher quality.
nan
Article 417
Title@2025-06-11 (3): Revisiting the Equivalence of Bayesian Neural Networks and Gaussian Processes: On the Importance of Learning Activations
Title: Revisiting the Equivalence of Bayesian Neural Networks and Gaussian Processes: On the Importance of Learning Activations | Die Gleichwertigkeit Bayesischer neuraler Netzwerke und Gaussian-Prozesse: Über die Bedeutung von Lernaktivierungen | 重新审视贝耶斯神经网络和高斯进程等效的贝耶斯神经网络和高斯进程:关于学习活动的重要性 2410.15777v3 |
Authors (3): Marcin Sendera, Amin Sorkhei, Tomasz Kuśmierczyk
Gaussian Processes (GPs) provide a convenient framework for specifying function-space priors, making them a natural choice for modeling uncertainty. In contrast, Bayesian Neural Networks (BNNs) offer greater scalability and extendability but lack the advantageous properties of GPs. This motivates the development of BNNs capable of replicating GP-like behavior. However, existing solutions are either limited to specific GP kernels or rely on heuristics. We demonstrate that trainable activations are crucial for effective mapping of GP priors to wide BNNs. Specifically, we leverage the closed-form 2-Wasserstein distance for efficient gradient-based optimization of reparameterized priors and activations. Beyond learned activations, we also introduce trainable periodic activations that ensure global stationarity by design, and functional priors conditioned on GP hyperparameters to allow efficient model selection. Empirically, our method consistently outperforms existing approaches or matches performance of the heuristic methods, while offering stronger theoretical foundations.
nan
Article 418
Title@2025-06-11 (3): Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension
Title: Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension | Diskrepanzen sind Tugend: Schwach-zu-starke Verallgemeinerung durch Lens der Intrinsischen Dimension | 差异是道德:通过内分泌尺寸的透镜对电压的微弱普遍化 2502.05075v4 |
Authors (5): Yijun Dong, Yicheng Li, Yunai Li, Jason D. Lee, Qi Lei
Weak-to-strong (W2S) generalization is a type of finetuning (FT) where a strong (large) student model is trained on pseudo-labels generated by a weak teacher. Surprisingly, W2S FT often outperforms the weak teacher. We seek to understand this phenomenon through the observation that FT often occurs in intrinsically low-dimensional spaces. Leveraging the low intrinsic dimensionality of FT, we analyze W2S in the ridgeless regression setting from a variance reduction perspective. For a strong student-weak teacher pair with sufficiently expressive low-dimensional feature subspaces $\mathcal{V}_s, \mathcal{V}_w$, we provide an exact characterization of the variance that dominates the generalization error of W2S. This unveils a virtue of discrepancy between the strong and weak models in W2S: the variance of the weak teacher is inherited by the strong student in $\mathcal{V}_s \cap \mathcal{V}_w$, while reduced by a factor of $\mathrm{dim}(\mathcal{V}_s)/N$ in the subspace of discrepancy $\mathcal{V}_w \setminus \mathcal{V}_s$ with $N$ pseudo-labels for W2S. Our analysis further casts light on the sample complexities and the scaling of performance gap recovery in W2S. The analysis is supported by experiments on synthetic regression problems, as well as real vision and NLP tasks.
nan
Article 419
Title@2025-06-11 (3): Pruning Spurious Subgraphs for Graph Out-of-Distribution Generalization
Title: Pruning Spurious Subgraphs for Graph Out-of-Distribution Generalization | Pruning Spurious Subgraphs für Graph Out-of-Distribution Verallgemeinerung | 平面平面平面平面图图解裂平面图集集 2506.05957v3 |
Authors (7): Tianjun Yao, Haoxuan Li, Yongqiang Chen, Tongliang Liu, Le Song, Eric Xing, Zhiqiang Shen
Graph Neural Networks (GNNs) often encounter significant performance degradation under distribution shifts between training and test data, hindering their applicability in real-world scenarios. Recent studies have proposed various methods to address the out-of-distribution generalization challenge, with many methods in the graph domain focusing on directly identifying an invariant subgraph that is predictive of the target label. However, we argue that identifying the edges from the invariant subgraph directly is challenging and error-prone, especially when some spurious edges exhibit strong correlations with the targets. In this paper, we propose PrunE, the first pruning-based graph OOD method that eliminates spurious edges to improve OOD generalizability. By pruning spurious edges, PrunE retains the invariant subgraph more comprehensively, which is critical for OOD generalization. Specifically, PrunE employs two regularization terms to prune spurious edges: 1) graph size constraint to exclude uninformative spurious edges, and 2) $\epsilon$-probability alignment to further suppress the occurrence of spurious edges. Through theoretical analysis and extensive experiments, we show that PrunE achieves superior OOD performance and outperforms previous state-of-the-art methods significantly. Code is available at: https://github.com/tianyao-aka/PrunE-GraphOOD.
nan
Article 420
Title@2025-06-11 (3): No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces
Title: No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces | Keine Aufgabe nach links: Isotropes Modell, das sich mit gemeinsamen und aufgabenspezifischen Subräumen vermischt | 未留下任务 : 与共同和任务特定子空间合并的异形模型 2502.04959v3 |
Authors (6): Daniel Marczak, Simone Magistri, Sebastian Cygert, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer
Model merging integrates the weights of multiple task-specific models into a single multi-task model. Despite recent interest in the problem, a significant performance gap between the combined and single-task models remains. In this paper, we investigate the key characteristics of task matrices (weight update matrices applied to a pre-trained model) that enable effective merging. We show that alignment between singular components of task-specific and merged matrices strongly correlates with performance improvement over the pre-trained model. Based on this, we propose an isotropic merging framework that flattens the singular value spectrum of task matrices, enhances alignment, and reduces the performance gap. Additionally, we incorporate both common and task-specific subspaces to further improve alignment and performance. Our proposed approach achieves state-of-the-art performance on vision and language tasks across various sets of tasks and model scales. This work advances the understanding of model merging dynamics, offering an effective methodology to merge models without requiring additional training. Code is available at https://github.com/danielm1405/iso-merging.
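A minimal sketch of what "flattening the singular value spectrum" could mean for a single layer is given below. It sums the task matrices, takes an SVD, and replaces every singular value by their mean. That specific flattening rule and the function name are assumptions, and the common/task-specific subspace step from the abstract is left out.

```python
import numpy as np

def isotropic_merge(task_matrices):
    """Merge per-task weight updates and flatten the singular value spectrum.

    task_matrices: sequence of (out_dim, in_dim) arrays, one per task, each
    holding fine-tuned weights minus pre-trained weights for one layer.
    """
    total = np.sum(np.asarray(task_matrices, dtype=float), axis=0)
    u, s, vt = np.linalg.svd(total, full_matrices=False)
    flat = np.full_like(s, s.mean())  # every direction gets equal strength
    return u @ np.diag(flat) @ vt

rng = np.random.default_rng(0)
toy_tasks = [rng.normal(size=(4, 3)) for _ in range(3)]
merged_update = isotropic_merge(toy_tasks)
merged_spectrum = np.linalg.svd(merged_update, compute_uv=False)
summed_spectrum = np.linalg.svd(np.sum(toy_tasks, axis=0), compute_uv=False)
```

The merged update keeps the singular directions of the summed task matrices but gives each of them the same weight, so no single task's dominant directions crowd out the others. The merged layer would then be the pre-trained weights plus this update, possibly scaled.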
nan
Article 421
Title@2025-06-11 (3): HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios
Title: HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | HopaDIFF: Ganzheitlich-Partial Aware Fourier Conditioned Diffusion für die Referenzierung von Human Action Segmentierung in Multi-Personen-Szenarien | HopaDIFF: 在多重人的情况中为提及人类行动分部分而传播的全方位全方位认识 2506.09650v1 |
Authors (10): Kunyu Peng, Junchao Huang, Xiangsheng Huang, Di Wen, Junwei Zheng, Yufan Chen, Kailun Yang, Jiamin Wu, Chongqing Hao, Rainer Stiefelhagen
Action segmentation is a core challenge in high-level video understanding, aiming to partition untrimmed videos into segments and assign each a label from a predefined action set. Existing methods primarily address single-person activities with fixed action sequences, overlooking multi-person scenarios. In this work, we pioneer textual reference-guided human action segmentation in multi-person settings, where a textual description specifies the target person for segmentation. We introduce the first dataset for Referring Human Action Segmentation, i.e., RHAS133, built from 133 movies and annotated with 137 fine-grained actions across 33 hours of video data, together with textual descriptions for this new task. Benchmarking existing action recognition methods on RHAS133 using VLM-based feature extractors reveals limited performance and poor aggregation of visual cues for the target person. To address this, we propose a holistic-partial aware Fourier-conditioned diffusion framework, i.e., HopaDIFF, leveraging a novel cross-input gate attentional xLSTM to enhance holistic-partial long-range reasoning and a novel Fourier condition to introduce more fine-grained control to improve the action segmentation generation. HopaDIFF achieves state-of-the-art results on RHAS133 in diverse evaluation settings. The code is available at https://github.com/KPeng9510/HopaDIFF.git.
Article 422
Title@2025-06-11 (3): Scaling Laws for Uncertainty in Deep Learning
Title: Scaling Laws for Uncertainty in Deep Learning | Skalierungsgesetze für Ungewissheit im Deep Learning | 强化深层学习不确定性法律 2506.09648v1 |
Authors (5): Mattia Rosso, Simone Rossi, Giulio Franzese, Markus Heinonen, Maurizio Filippone
Deep learning has recently revealed the existence of scaling laws, demonstrating that model performance follows predictable trends based on dataset and model sizes. Inspired by these findings and fascinating phenomena emerging in the over-parameterized regime, we examine a parallel direction: do similar scaling laws govern predictive uncertainties in deep learning? In identifiable parametric models, such scaling laws can be derived in a straightforward manner by treating model parameters in a Bayesian way. In this case, for example, we obtain $O(1/N)$ contraction rates for epistemic uncertainty with respect to the number of data $N$. However, in over-parameterized models, these guarantees do not hold, leading to largely unexplored behaviors. In this work, we empirically show the existence of scaling laws associated with various measures of predictive uncertainty with respect to dataset and model sizes. Through experiments on vision and language tasks, we observe such scaling laws for in- and out-of-distribution predictive uncertainty estimated through popular approximate Bayesian inference and ensemble methods. Besides the elegance of scaling laws and the practical utility of extrapolating uncertainties to larger data or models, this work provides strong evidence to dispel recurring skepticism against Bayesian approaches: “In many applications of deep learning we have so much data available: what do we need Bayes for?”. Our findings show that “so much data” is typically not enough to make epistemic uncertainty negligible.
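A minimal sketch of the $O(1/N)$ contraction mentioned in the abstract, assuming the simplest identifiable case: a conjugate Gaussian model with known noise variance. The model, the function name `posterior_variance`, and the values of `sigma2` and `tau2` are illustrative assumptions, not taken from the paper.

```python
def posterior_variance(n, sigma2=1.0, tau2=10.0):
    """Posterior variance of the mean theta after n observations.

    Assumed model (illustrative): y_i ~ N(theta, sigma2), prior theta ~ N(0, tau2).
    Conjugacy gives Var[theta | y_1..y_n] = 1 / (1 / tau2 + n / sigma2).
    """
    return 1.0 / (1.0 / tau2 + n / sigma2)


for n in (10, 100, 1000, 10000):
    v = posterior_variance(n)
    # n * v approaches sigma2, so the epistemic variance shrinks like sigma2 / n.
    print(n, v, n * v)
```

In this toy setting the product `n * v` tends to `sigma2`, which is the $O(1/N)$ rate; the abstract's point is that no such guarantee carries over to over-parameterized models.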
Article 423
Title@2025-06-11 (3): Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls
Title: Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls | Verteilungs- und adversariell robuste Logistische Regression über Intersecting Wasserstein Balls | 通过交叉瓦塞施泰因球的分布式和反向强劲后勤后退 2407.13625v4 |
Authors (6): Aras Selvi, Eleonora Kreacic, Mohsen Ghassemi, Vamsi Potluru, Tucker Balch, Manuela Veloso
Adversarially robust optimization (ARO) has emerged as the de facto standard for training models that hedge against adversarial attacks in the test stage. While these models are robust against adversarial attacks, they tend to suffer severely from overfitting. To address this issue, some successful methods replace the empirical distribution in the training stage with alternatives including (i) a worst-case distribution residing in an ambiguity set, resulting in a distributionally robust (DR) counterpart of ARO; (ii) a mixture of the empirical distribution with a distribution induced by an auxiliary (e.g., synthetic, external, out-of-domain) dataset. Inspired by the former, we study the Wasserstein DR counterpart of ARO for logistic regression and show it admits a tractable convex optimization reformulation. Adopting the latter setting, we revise the DR approach by intersecting its ambiguity set with another ambiguity set built using the auxiliary dataset, which offers a significant improvement whenever the Wasserstein distance between the data generating and auxiliary distributions can be estimated. We study the underlying optimization problem, develop efficient solution algorithms, and demonstrate that the proposed method outperforms benchmark approaches on standard datasets.
Article 424
Title@2025-06-11 (3): Real-Time Network Traffic Forecasting with Missing Data: A Generative Model Approach
Title: Real-Time Network Traffic Forecasting with Missing Data: A Generative Model Approach | Echtzeit-Netzverkehr mit fehlenden Daten: Ein generatives Modellansatz | 缺少数据实时网络流量预测:生成模式方法 2506.09647v1 |
Authors (4): Lei Deng, Wenhan Xu, Jingwei Li, Danny H. K. Tsang
Real-time network traffic forecasting is crucial for network management and early resource allocation. Existing network traffic forecasting approaches operate under the assumption that the network traffic data is fully observed. However, in practical scenarios, the collected data are often incomplete due to various human and natural factors. In this paper, we propose a generative model approach for real-time network traffic forecasting with missing data. Firstly, we model the network traffic forecasting task as a tensor completion problem. Secondly, we incorporate a pre-trained generative model to achieve the low-rank structure commonly associated with tensor completion. The generative model effectively captures the intrinsic low-rank structure of network traffic data during pre-training and enables the mapping from a compact latent representation to the tensor space. Thirdly, rather than directly optimizing the high-dimensional tensor, we optimize its latent representation, which simplifies the optimization process and enables real-time forecasting. We also establish a theoretical recovery guarantee that quantifies the error bound of the proposed approach. Experiments on real-world datasets demonstrate that our approach achieves accurate network traffic forecasting within 100 ms, with a mean absolute error (MAE) below 0.002, as validated on the Abilene dataset.
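The idea of optimizing a latent representation instead of the full tensor can be sketched as follows. This is a toy illustration under strong assumptions: a fixed linear map `W` stands in for the pre-trained generative model, the "tensor" is a flat vector of four entries, and the names `generate` and `complete` are hypothetical. It is not the authors' implementation.

```python
def generate(W, z):
    """Map a latent vector z to data space (linear stand-in for a generator)."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W]


def complete(W, x_obs, mask, steps=200, lr=0.1):
    """Fit z by gradient descent on the squared error over observed entries only."""
    z = [0.0] * len(W[0])
    for _ in range(steps):
        x_hat = generate(W, z)
        grad = [0.0] * len(z)
        for i, row in enumerate(W):
            if mask[i]:  # missing entries contribute nothing to the loss
                err = x_hat[i] - x_obs[i]
                for j, w_ij in enumerate(row):
                    grad[j] += 2.0 * err * w_ij
        z = [z_j - lr * g_j for z_j, g_j in zip(z, grad)]
    return generate(W, z)  # the generator fills in the missing entries


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
x_obs = [2.0, 3.0, 5.0, 0.0]  # last value is a placeholder for a missing entry
mask = [1, 1, 1, 0]           # 1 = observed, 0 = missing
x_hat = complete(W, x_obs, mask)
print(x_hat)
```

Only two latent coordinates are optimized here rather than four data entries; the same reduction in search dimension is what the abstract credits for making the optimization simpler and fast enough for real-time use.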
Article 425
Title@2025-06-11 (3): Generalizable and Fast Surrogates: Model Predictive Control of Articulated Soft Robots using Physics-Informed Neural Networks
Title: Generalizable and Fast Surrogates: Model Predictive Control of Articulated Soft Robots using Physics-Informed Neural Networks | Generalisierbare und schnelle Surrogate: Modellvorhersagesteuerung von artikulierten Soft Robots mit physikinformierten Neuronalen Netzwerken | 通用和快速超速超速:利用物理成形神经网络对人工软体机器人进行模拟预测控制 2502.01916v2 |
Authors (6): Tim-Lukas Habich, Aran Mohammad, Simon F. G. Ehlers, Martin Bensch, Thomas Seel, Moritz Schappler
Soft robots can revolutionize several applications with high demands on dexterity and safety. When operating these systems, real-time estimation and control require fast and accurate models. However, prediction with first-principles (FP) models is slow, and learned black-box models have poor generalizability. Physics-informed machine learning offers excellent advantages here, but it is currently limited to simple, often simulated systems without considering changes after training. We propose physics-informed neural networks (PINNs) for articulated soft robots (ASRs) with a focus on data efficiency. The amount of expensive real-world training data is reduced to a minimum: one dataset in one system domain. Two hours of data in different domains are used for a comparison against two gold-standard approaches: In contrast to a recurrent neural network, the PINN provides high generalizability. The prediction speed of an accurate FP model is exceeded with the PINN by up to a factor of 467 at slightly reduced accuracy. This enables nonlinear model predictive control (MPC) of a pneumatic ASR. Accurate position tracking with the MPC running at 47 Hz is achieved in six dynamic experiments.
Article 426
Title@2025-06-11 (3): Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering
Title: Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering | Effizientes und generalisierbares Graph Retriever für Wissens-Graph-Frageantworten lernen | 知识拼图问题解答的学习效率和通用图表 2506.09645v1 |
Authors (6): Tianjun Yao, Haoxuan Li, Zhiqiang Shen, Pan Li, Tongliang Liu, Kun Zhang
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains, but their reliability is hindered by outdated knowledge and hallucinations. Retrieval-Augmented Generation (RAG) mitigates these issues by grounding LLMs with external knowledge; however, most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning. Knowledge graphs, which represent facts as relational triples, offer a more structured and compact alternative. Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering (KGQA), with a significant proportion adopting the retrieve-then-reasoning paradigm. In this framework, graph-based retrievers have demonstrated strong empirical performance, yet they still face challenges in generalization ability. In this work, we propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA. RAPL addresses these limitations through three aspects: (1) a two-stage labeling strategy that combines heuristic signals with parametric models to provide causally grounded supervision; (2) a model-agnostic graph transformation approach to capture both intra- and inter-triple interactions, thereby enhancing representational capacity; and (3) a path-based reasoning strategy that facilitates learning from the injected rational knowledge and supports the downstream reasoner through structured inputs. Empirically, RAPL outperforms state-of-the-art methods by $2.66\%-20.34\%$, and significantly reduces the performance gap between smaller and more powerful LLM-based reasoners, as well as the gap under cross-dataset settings, highlighting its superior retrieval capability and generalizability. Code is available at: https://github.com/tianyao-aka/RAPL.
Article 427
Title@2025-06-11 (3): Evasion Attacks Against Bayesian Predictive Models
Title: Evasion Attacks Against Bayesian Predictive Models | Evasionsangriffe gegen bayesische Vorhersagemodelle | 攻击贝耶斯人预测模型 2506.09640v1 |
Authors (3): Pablo G. Arce, Roi Naveiro, David Ríos Insua
There is an increasing interest in analyzing the behavior of machine learning systems against adversarial attacks. However, most of the research in adversarial machine learning has focused on studying weaknesses against evasion or poisoning attacks to predictive models in classical setups, with the susceptibility of Bayesian predictive models to attacks remaining underexplored. This paper introduces a general methodology for designing optimal evasion attacks against such models. We investigate two adversarial objectives: perturbing specific point predictions and altering the entire posterior predictive distribution. For both scenarios, we propose novel gradient-based attacks and study their implementation and properties in various computational setups.
Article 428
Title@2025-06-11 (3): FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models
Title: FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models | FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models | FFVLMBENCH: 设想语言模型联邦罚款基准 2506.09638v1 |
Authors (6): Weiying Zheng, Ziyue Lin, Pengxin Guo, Yuyin Zhou, Feifei Wang, Liangqiong Qu
Vision-Language Models (VLMs) have demonstrated remarkable capabilities in cross-modal understanding and generation by integrating visual and textual information. While instruction tuning and parameter-efficient fine-tuning methods have substantially improved the generalization of VLMs, most existing approaches rely on centralized training, posing challenges for deployment in domains with strict privacy requirements like healthcare. Recent efforts have introduced Federated Learning (FL) into VLM fine-tuning to address these privacy concerns, yet comprehensive benchmarks for evaluating federated fine-tuning strategies, model architectures, and task generalization remain lacking. In this work, we present FedVLMBench, the first systematic benchmark for federated fine-tuning of VLMs. FedVLMBench integrates two mainstream VLM architectures (encoder-based and encoder-free), four fine-tuning strategies, five FL algorithms, six multimodal datasets spanning four cross-domain single-task scenarios and two cross-domain multitask settings, covering four distinct downstream task categories. Through extensive experiments, we uncover key insights into the interplay between VLM architectures, fine-tuning strategies, data heterogeneity, and multi-task federated optimization. Notably, we find that a 2-layer multilayer perceptron (MLP) connector with concurrent connector and LLM tuning emerges as the optimal configuration for encoder-based VLMs in FL. Furthermore, current FL methods exhibit significantly higher sensitivity to data heterogeneity in vision-centric tasks than text-centric ones, across both encoder-free and encoder-based VLM architectures. Our benchmark provides essential tools, datasets, and empirical guidance for the research community, offering a standardized platform to advance privacy-preserving, federated training of multimodal foundation models.
Article 429
Title@2025-06-11 (3): Model Attribution and Detection of Synthetic Speech via Vocoder Fingerprints
Title: Model Attribution and Detection of Synthetic Speech via Vocoder Fingerprints | Modellzuweisung und Erkennung von Synthetischer Sprache über Vocoder-Fingerabdrücke | 通过 Vocoder 指纹识别合成演讲的模型归属和检测 2411.14013v2 |
Authors (5): Matías Pizarro, Mike Laszkiewicz, Shawkat Hesso, Dorothea Kolossa, Asja Fischer
As speech generation technology advances, so do the potential threats of misusing synthetic speech signals. This work tackles three tasks: (1) single-model attribution in an open-world setting corresponding to the task of identifying whether synthetic speech signals originate from a specific vocoder (which requires only target vocoder data), (2) model attribution in a closed-world setting that corresponds to selecting the specific model that generated a sample from a given set of models, and (3) distinguishing synthetic from real speech. We show that standardized average residuals between audio signals and their low-pass or EnCodec filtered versions serve as powerful vocoder fingerprints that can be leveraged for all tasks achieving an average AUROC of over 99% on LJSpeech and JSUT in most settings. The accompanying robustness study shows that it is also resilient to noise levels up to a certain degree.
Article 430
Title@2025-06-11 (3): In-Context Bias Propagation in LLM-Based Tabular Data Generation
Title: In-Context Bias Propagation in LLM-Based Tabular Data Generation | In-Context Bias Propagation in der LLM-basierten tabellarischen Datengenerierung | 以LLM为基础的制表数据生成中的Intext Bias 2506.09630v1 |
Authors (6): Pol G. Recasens, Alberto Gutierrez, Jordi Torres, Josep Ll. Berral, Anisa Halimi, Kieran Fraser
Large Language Models (LLMs) are increasingly used for synthetic tabular data generation through in-context learning (ICL), offering a practical solution for data augmentation in data-scarce scenarios. While prior work has shown the potential of LLMs to improve downstream task performance through augmenting underrepresented groups, these benefits often assume access to a subset of unbiased in-context examples, representative of the real dataset. In real-world settings, however, data is frequently noisy and demographically skewed. In this paper, we systematically study how statistical biases within in-context examples propagate to the distribution of synthetic tabular data, showing that even mild in-context biases lead to global statistical distortions. We further introduce an adversarial scenario where a malicious contributor can inject bias into the synthetic dataset via a subset of in-context examples, ultimately compromising the fairness of downstream classifiers for a targeted and protected subgroup. Our findings demonstrate a new vulnerability associated with LLM-based data generation pipelines that rely on in-context prompts in sensitive domains.
Article 431
Title@2025-06-11 (3): Holistic Uncertainty Estimation For Open-Set Recognition
Title: Holistic Uncertainty Estimation For Open-Set Recognition | Ganzheitliche Ungewissheitsschätzung für Open-Set-Erkennung | Open-Set 识别的全局性不确定性估算 2408.14229v2 |
Authors (2): Leonid Erlygin, Alexey Zaytsev
Accurate uncertainty estimation is a critical challenge in open-set recognition, where a probe biometric sample may belong to an unknown identity. It can be addressed through sample quality estimation via probabilistic embeddings. However, the low variance of probabilistic embedding only partly implies a low identification error probability: an embedding of a sample could be close to several classes in a gallery, thus yielding high uncertainty despite high sample quality. We propose HolUE - a holistic uncertainty estimation method based on a Bayesian probabilistic model; it is aware of two sources of ambiguity in the open-set recognition system: (1) the gallery uncertainty caused by overlapping classes and (2) the uncertainty of embeddings. Challenging open-set recognition datasets, such as IJB-C for the image domain and VoxBlink for the audio domain, serve as a testbed for our method. We also provide a new open-set recognition protocol for the identification of whales and dolphins. In all cases, HolUE better identifies recognition errors than alternative uncertainty estimation methods, including those based solely on sample quality.
Article 432
Title@2025-06-11 (3): GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras
Title: GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras | GLGENN: Eine neuartige Parameter-Licht-Equivariante Neural Networks-Architektur auf Basis von Clifford Geometrischen Algebras | GLGENN: 以克里夫德几何代数为基础的新参数-日等均质神经网络建筑 2506.09625v1 |
Authors (2): Ekaterina Filimoshina, Dmitry Shirokov
We propose, implement, and compare with competitors a new architecture of equivariant neural networks based on geometric (Clifford) algebras: Generalized Lipschitz Group Equivariant Neural Networks (GLGENN). These networks are equivariant to all pseudo-orthogonal transformations, including rotations and reflections, of a vector space with any non-degenerate or degenerate symmetric bilinear form. We propose a weight-sharing parametrization technique that takes into account the fundamental structures and operations of geometric algebras. Due to this technique, the GLGENN architecture is parameter-light and less prone to overfitting than baseline equivariant models. GLGENN outperforms or matches competitors on several benchmarking equivariant tasks, including estimation of an equivariant function and a convex hull experiment, while using significantly fewer optimizable parameters.
Article 433
Title@2025-06-11 (3): SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot
Title: SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot | SparseSSM: Effiziente selektiv strukturierte State Space-Modelle können in One-Shot beschnitten werden | SprassessSSSSM: 高效选择性结构结构国家空间模型可在单层中谨慎处理 2506.09613v1 |
Authors (2): Kaiwen Tuo, Huan Wang
State-space language models such as Mamba match Transformer quality while permitting linear complexity inference, yet still comprise billions of parameters that hinder deployment. Existing one-shot pruning methods are tailored to attention blocks and fail to account for the time-shared and discretized state-transition matrix at the heart of the selective state-space module (SSM). In this paper, we introduce SparseSSM, the first training-free pruning framework that extends the classic optimal brain surgeon (OBS) framework to state space architectures. Our layer-wise algorithm (i) derives an approximate second-order saliency score that aggregates Hessian-trace information across time steps, (ii) incorporates a component sensitivity analysis to guide feed-forward network (FFN) pruning, which also sheds light on where redundancy resides in the Mamba architecture, and (iii) can be easily extended to semi-structured and structured sparsity. Empirically, we prune 50% of SSM weights without fine-tuning and observe no zero-shot accuracy loss, setting the current state of the art among pruning algorithms for Mamba-based LLMs.
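A rough sketch of an OBS-style saliency score with curvature aggregated over time steps. It assumes a diagonal Hessian approximation, under which the OBS saliency $w_i^2 / (2[H^{-1}]_{ii})$ reduces to $\tfrac{1}{2} h_i w_i^2$. The function names and the toy numbers are hypothetical, and the actual SparseSSM score may differ in how it aggregates and approximates second-order information.

```python
def obs_saliency(weights, hessian_diag_per_step):
    """Saliency 0.5 * h_i * w_i^2, with h_i summed over time steps.

    Assumption: each entry of hessian_diag_per_step is a per-time-step
    estimate of the diagonal of the Hessian for the shared weights.
    """
    agg = [sum(step[i] for step in hessian_diag_per_step)
           for i in range(len(weights))]
    return [0.5 * h * w * w for w, h in zip(weights, agg)]


def prune_mask(saliency, sparsity=0.5):
    """Return a 0/1 mask that drops the lowest-saliency fraction of weights."""
    k = int(len(saliency) * sparsity)
    order = sorted(range(len(saliency)), key=lambda i: saliency[i])
    pruned = set(order[:k])
    return [0 if i in pruned else 1 for i in range(len(saliency))]


weights = [0.1, -2.0, 0.5, 1.0]
hessian_steps = [[1.0, 1.0, 1.0, 1.0],   # time step 1
                 [1.0, 1.0, 4.0, 0.1]]   # time step 2
saliency = obs_saliency(weights, hessian_steps)
mask = prune_mask(saliency, sparsity=0.5)
print(saliency, mask)
```

In this toy case the weight `1.0` is pruned while the smaller weight `0.5` is kept, because the aggregated curvature for the latter is higher. That is the difference between a second-order criterion and plain magnitude pruning.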
Article 434
Title@2025-06-11 (3): Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Title: Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism | Verständnis der Kompetenzlücke in recurrenten Sprachmodellen: Die Rolle des Sammel- und Aggregatmechanismus | 了解经常语文模式中的技能差距:集合和集合机制的作用 2504.18574v2 |
Authors (3): Aviv Bick, Eric Xing, Albert Gu
State-space models (SSMs) offer efficient alternatives to Transformers for long sequences, but their fixed-size recurrent state limits capability on algorithmic tasks, such as retrieving past context. In this work, we examine how in-context retrieval operates in Transformer- and SSM-based language models and find that both rely on a similar Gather-and-Aggregate (G&A) mechanism: a Gather Head extracts relevant information pieces from context, which an Aggregate Head integrates into a single representation. In both architectures, G&A concentrates in a few heads, forming critical bottlenecks even for simple retrieval. For example, we show that disabling a single Gather or Aggregate Head in a pruned Llama-3.1-8B impairs retrieving the correct answer letter in MMLU, reducing its accuracy from 66% to 25% (random guessing). Moreover, this retrieval bottleneck can obscure limited knowledge demands of tasks as the pruned model succeeds on MMLU with functioning G&A heads yet fails on other knowledge benchmarks. The bottleneck similarly extends to tasks where SSMs typically underperform, such as GSM8K, BBH, and dialogue comprehension. We show that SSMs’ retrieval challenges manifest in these heads, creating smoother attention patterns instead of the sharp token transitions effective G&A requires. Thus, the Transformer-SSM retrieval gap exists in just a few heads, rather than the entire language model. This suggests a unified explanation for Transformer vs. SSM performance gap while showing how to merge their strengths. We find that pretrained hybrid models, where SSMs are combined with a few attention layers, delegate the role of Aggregate Heads to attention. Similarly, replacing a single G&A head in a pretrained SSM with an attention variant boosts retrieval and benchmark scores.
Article 435
Title@2025-06-11 (3): Accelerating Large-Scale Regularized High-Order Tensor Recovery
Title: Accelerating Large-Scale Regularized High-Order Tensor Recovery | Beschleunigung der großflächigen Regularisierung der High-Order Tensor Erholung | 加速大规模正规化高阶天文恢复 2506.09594v1 |
Authors (4): Wenjin Qin, Hailin Wang, Jingyao Hou, Jianjun Wang
Existing tensor recovery methods fail to recognize the impact of tensor scale variations on their structural characteristics. Furthermore, existing studies face prohibitive computational costs when dealing with large-scale high-order tensor data. To alleviate these issues, assisted by the Krylov subspace iteration, block Lanczos bidiagonalization process, and random projection strategies, this article first devises two fast and accurate randomized algorithms for the low-rank tensor approximation (LRTA) problem. Theoretical bounds on the accuracy of the approximation error estimate are established. Next, we develop a novel generalized nonconvex modeling framework tailored to large-scale tensor recovery, in which a new regularization paradigm is exploited to achieve insightful prior representation for large-scale tensors. On the basis of the above, we further investigate new unified nonconvex models and efficient optimization algorithms, respectively, for several typical high-order tensor recovery tasks in unquantized and quantized situations. To render the proposed algorithms practical and efficient for large-scale tensor data, the proposed randomized LRTA schemes are integrated into their central and time-intensive computations. Finally, we conduct extensive experiments on various large-scale tensors, whose results demonstrate the practicability, effectiveness and superiority of the proposed method in comparison with some state-of-the-art approaches.
Article 436
Title@2025-06-11 (3): Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting
Title: Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting | Physics-informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting | 用于全球次季对季预报的计算机化远程连接软件变换器 2506.08049v2 |
Authors (3): Tengfei Lyu, Weijia Zhang, Hao Liu
Subseasonal-to-seasonal (S2S) forecasting, which predicts climate conditions from several weeks to months in advance, presents significant challenges due to the chaotic dynamics of atmospheric systems and complex interactions across multiple scales. Current approaches often fail to explicitly model underlying physical processes and teleconnections that are crucial at S2S timescales. We introduce TelePiT, a novel deep learning architecture that enhances global S2S forecasting through integrated multi-scale physics and teleconnection awareness. Our approach consists of three key components: (1) Spherical Harmonic Embedding, which accurately encodes global atmospheric variables onto spherical geometry; (2) Multi-Scale Physics-Informed Neural ODE, which explicitly captures atmospheric physical processes across multiple learnable frequency bands; (3) Teleconnection-Aware Transformer, which models critical global climate interactions through tactfully injecting teleconnection patterns into the self-attention. Extensive experiments demonstrate that TelePiT significantly outperforms state-of-the-art data-driven baselines and operational numerical weather prediction systems, with remarkable improvements for atmospheric variables including a 57.7% reduction in RMSE for 2-meter temperature compared to previous best models.
Article 437
Title@2025-06-11 (3): Metric-Entropy Limits on the Approximation of Nonlinear Dynamical Systems
Title: Metric-Entropy Limits on the Approximation of Nonlinear Dynamical Systems | Metrisch-Entropie-Grenzen zur Annäherung nichtlinearer dynamischer Systeme | 非线-线性动态系统接近度的中子-内基限值 2407.01250v2 |
Authors (3): Yang Pan, Clemens Hutter, Helmut Bölcskei
This paper is concerned with fundamental limits on the approximation of nonlinear dynamical systems. Specifically, we show that recurrent neural networks (RNNs) can approximate nonlinear systems that satisfy a Lipschitz property and forget past inputs fast enough in a metric-entropy-optimal manner. As the sets of sequence-to-sequence mappings realized by the dynamical systems we consider are significantly more massive than function classes generally analyzed in approximation theory, a refined metric-entropy characterization is needed, namely in terms of order, type, and generalized dimension. We compute these quantities for the classes of exponentially and polynomially Lipschitz fading-memory systems and show that RNNs can achieve them.
Article 438
Title@2025-06-11 (3): Beyond Overconfidence: Foundation Models Redefine Calibration in Deep Neural Networks
Title: Beyond Overconfidence: Foundation Models Redefine Calibration in Deep Neural Networks | Beyond Overconfidency: Foundation Models Neufine Calibration in Deep Neural Networks | 超越自信:深神经网络中的模型重新校准基金会 2506.09593v1 |
Authors (3): Achim Hekler, Lukas Kuhn, Florian Buettner
Reliable uncertainty calibration is essential for safely deploying deep neural networks in high-stakes applications. Deep neural networks are known to exhibit systematic overconfidence, especially under distribution shifts. Although foundation models such as ConvNeXt, EVA and BEiT have demonstrated significant improvements in predictive performance, their calibration properties remain underexplored. This paper presents a comprehensive investigation into the calibration behavior of foundation models, revealing insights that challenge established paradigms. Our empirical analysis shows that these models tend to be underconfident in in-distribution predictions, resulting in higher calibration errors, while demonstrating improved calibration under distribution shifts. Furthermore, we demonstrate that foundation models are highly responsive to post-hoc calibration techniques in the in-distribution setting, enabling practitioners to effectively mitigate underconfidence bias. However, these methods become progressively less reliable under severe distribution shifts and can occasionally produce counterproductive results. Our findings highlight the complex, non-monotonic effects of architectural and training innovations on calibration, challenging established narratives of continuous improvement.
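To make "underconfident" and "calibration error" concrete, here is a minimal sketch of one common metric, the expected calibration error (ECE), together with a signed confidence gap. The abstract does not state which metrics the authors use, so the choice of ECE, the equal-width binning, and both function names are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1]."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(ok for _, ok in b) / len(b)
        ece += len(b) / n * abs(acc - avg_conf)
    return ece


def confidence_gap(confidences, correct):
    """Mean confidence minus accuracy: negative means underconfident."""
    return sum(confidences) / len(confidences) - sum(correct) / len(correct)


# Toy predictions: the model reports 60% confidence but is right 9 times out of 10.
conf = [0.6] * 10
hits = [1] * 9 + [0]
print(expected_calibration_error(conf, hits), confidence_gap(conf, hits))
```

A negative gap alongside a non-trivial ECE is the pattern the abstract reports for foundation models on in-distribution data; post-hoc methods such as temperature scaling aim to shrink both.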
Article 439
Title@2025-06-11 (3): Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis
Title: Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis | Decoding Knowledge Attribution in Mixture-of-Experts: Ein Rahmenwerk der Basic-Refinement-Kollaboration und Effizienzanalyse | 在混合专家中解码知识的配置:基础改进协作和效率分析框架 2505.24593v2 |
Authors (6): Junzhuo Li, Bo Wang, Xiuze Zhou, Peijie Jiang, Jia Liu, Xuming Hu
The interpretability of Mixture-of-Experts (MoE) models, especially those with heterogeneous designs, remains underexplored. Existing attribution methods for dense models fail to capture dynamic routing-expert interactions in sparse MoE architectures. To address this issue, we propose a cross-level attribution algorithm to analyze sparse MoE architectures (Qwen 1.5-MoE, OLMoE, Mixtral-8x7B) against dense models (Qwen 1.5-7B, Llama-7B, Mistral-7B). Results show MoE models achieve 37% higher per-layer efficiency via a “mid-activation, late-amplification” pattern: early layers screen experts, while late layers refine knowledge collaboratively. Ablation studies reveal a “basic-refinement” framework: shared experts handle general tasks (entity recognition), while routed experts specialize in domain-specific processing (geographic attributes). Semantic-driven routing is evidenced by strong correlations between attention heads and experts (r=0.68), enabling task-aware coordination. Notably, architectural depth dictates robustness: deep Qwen 1.5-MoE mitigates expert failures (e.g., 43% MRR drop in geographic tasks when blocking top-10 experts) through shared expert redundancy, whereas shallow OLMoE suffers severe degradation (76% drop). Task sensitivity further guides design: core-sensitive tasks (geography) require concentrated expertise, while distributed-tolerant tasks (object attributes) leverage broader participation. These insights advance MoE interpretability, offering principles to balance efficiency, specialization, and robustness.
Article 440
Title@2025-06-11 (3): Human-like object concept representations emerge naturally in multimodal large language models
Title: Human-like object concept representations emerge naturally in multimodal large language models | Mensch-ähnliche Objekt-Konzept-Darstellungen entstehen natürlich in multimodalen großen Sprachmodellen | 在多式大型语言模式中自然出现类似人类的物体概念表述 2407.01067v3 |
Authors (13): Changde Du, Kaicheng Fu, Bincheng Wen, Yi Sun, Jie Peng, Wei Wei, Ying Gao, Shengpei Wang, Chuncheng Zhang, Jinpeng Li, Shuang Qiu, Le Chang, Huiguang He
Understanding how humans conceptualize and categorize natural objects offers critical insights into perception and cognition. With the advent of Large Language Models (LLMs), a key question arises: can these models develop human-like object representations from linguistic and multimodal data? In this study, we combined behavioral and neuroimaging analyses to explore the relationship between object concept representations in LLMs and human cognition. We collected 4.7 million triplet judgments from LLMs and Multimodal LLMs (MLLMs) to derive low-dimensional embeddings that capture the similarity structure of 1,854 natural objects. The resulting 66-dimensional embeddings were stable, predictive, and exhibited semantic clustering similar to human mental representations. Remarkably, the dimensions underlying these embeddings were interpretable, suggesting that LLMs and MLLMs develop human-like conceptual representations of objects. Further analysis showed strong alignment between model embeddings and neural activity patterns in brain regions such as EBA, PPA, RSC, and FFA. This provides compelling evidence that the object representations in LLMs, while not identical to human ones, share fundamental similarities that reflect key aspects of human conceptual knowledge. Our findings advance the understanding of machine intelligence and inform the development of more human-like artificial cognitive systems.
nan
Article 441
Title@2025-06-11 (3): Feasibility Study of CNNs and MLPs for Radiation Heat Transfer in 2-D Furnaces with Spectrally Participative Gases
Title: Feasibility Study of CNNs and MLPs for Radiation Heat Transfer in 2-D Furnaces with Spectrally Participative Gases | Machbarkeitsstudie von CNNs und MLPs für den Strahlungswärmetransfer in 2-D-Öfen mit Spektrally Participative Gasen | 关于有线电视新闻网和多频多频卫星在2-D发热中用光谱参与气体进行辐射热传导的有线电视新闻网和 MLP的可行性研究 2506.08033v2 |
Authors (5): Axel TahmasebiMoradi, Vincent Ren, Benjamin Le-Creurer, Chetra Mang, Mouadh Yagoubi
Aiming to reduce the computational cost of numerical simulations, a convolutional neural network (CNN) and a multi-layer perceptron (MLP) are introduced to build a surrogate model to approximate radiative heat transfer solutions in a 2-D walled domain with participative gases. The originality of this work lies in the adaptation of the inputs of the problem (gas and wall properties) to fit the CNN architecture, which is more commonly used for image processing. Two precision datasets have been created with the classical solver, ICARUS2D, which uses the discrete transfer radiation method with the statistical narrow bands model. The performance of the CNN architecture is compared to a more classical MLP architecture in terms of speed and accuracy. Thanks to Optuna, all results are obtained using networks with optimized hyperparameters. The results show a significant speedup with industrially acceptable relative errors compared to the classical solver for both architectures. Additionally, the CNN outperforms the MLP in terms of precision and is more robust and stable to changes in hyperparameters. An analysis of performance as a function of dataset size has also been carried out to gain a deeper understanding of the model behavior.
nan
Article 442
Title@2025-06-11 (3): MOORL: A Framework for Integrating Offline-Online Reinforcement Learning
Title: MOORL: A Framework for Integrating Offline-Online Reinforcement Learning | MOORL: Ein Rahmen für die Integration von Offline-Online-Verstärkungslernen | MORL: 整合离线-在线强化学习的框架 2506.09574v1 |
Authors (3): Gaurav Chaudhary, Wassim Uddin Mondal, Laxmidhar Behera
Sample efficiency and exploration remain critical challenges in Deep Reinforcement Learning (DRL), particularly in complex domains. Offline RL, which enables agents to learn optimal policies from static, pre-collected datasets, has emerged as a promising alternative. However, offline RL is constrained by issues such as out-of-distribution (OOD) actions that limit policy performance and generalization. To overcome these limitations, we propose Meta Offline-Online Reinforcement Learning (MOORL), a hybrid framework that unifies offline and online RL for efficient and scalable learning. While previous hybrid methods rely on extensive design components and added computational complexity to utilize offline data effectively, MOORL introduces a meta-policy that seamlessly adapts across offline and online trajectories. This enables the agent to leverage offline data for robust initialization while utilizing online interactions to drive efficient exploration. Our theoretical analysis demonstrates that the hybrid approach enhances exploration by effectively combining the complementary strengths of offline and online data. Furthermore, we demonstrate that MOORL learns a stable Q-function without added complexity. Extensive experiments on 28 tasks from the D4RL and V-D4RL benchmarks validate its effectiveness, showing consistent improvements over state-of-the-art offline and hybrid RL baselines. With minimal computational overhead, MOORL achieves strong performance, underscoring its potential for practical applications in real-world scenarios.
nan
Article 443
Title@2025-06-11 (3): Mechanistic PDE Networks for Discovery of Governing Equations
Title: Mechanistic PDE Networks for Discovery of Governing Equations | Mechanistische PDE-Netzwerke für die Entdeckung von Regierenden Gleichungen | 发现管理等分的机械 PDE 网络 2502.18377v2 |
Authors (3): Adeel Pervez, Efstratios Gavves, Francesco Locatello
We present Mechanistic PDE Networks – a model for discovery of governing partial differential equations from data. Mechanistic PDE Networks represent spatiotemporal data as space-time dependent linear partial differential equations in neural network hidden representations. The represented PDEs are then solved and decoded for specific tasks. The learned PDE representations naturally express the spatiotemporal dynamics in data in neural network hidden space, enabling increased power for dynamical modeling. Solving the PDE representations in a compute and memory-efficient way, however, is a significant challenge. We develop a native, GPU-capable, parallel, sparse, and differentiable multigrid solver specialized for linear partial differential equations that acts as a module in Mechanistic PDE Networks. Leveraging the PDE solver, we propose a discovery architecture that can discover nonlinear PDEs in complex settings while also being robust to noise. We validate PDE discovery on a number of PDEs, including reaction-diffusion and Navier-Stokes equations.
nan
Article 444
Title@2025-06-11 (3): From Symbolic to Neural and Back: Exploring Knowledge Graph-Large Language Model Synergies
Title: From Symbolic to Neural and Back: Exploring Knowledge Graph-Large Language Model Synergies | Von Symbolisch zu Neural und Zurück: Erforschen von Wissen Graphen-Große Sprachmodell-Synergien | 从符号到神经和后背:探索知识图集-大型语言模式协同作用 2506.09566v1 |
Authors (4): Blaž Škrlj, Boshko Koloski, Senja Pollak, Nada Lavrač
Integrating structured knowledge from Knowledge Graphs (KGs) into Large Language Models (LLMs) enhances factual grounding and reasoning capabilities. This survey paper systematically examines the synergy between KGs and LLMs, categorizing existing approaches into two main groups: KG-enhanced LLMs, which improve reasoning, reduce hallucinations, and enable complex question answering; and LLM-augmented KGs, which facilitate KG construction, completion, and querying. Through comprehensive analysis, we identify critical gaps and highlight the mutual benefits of structured knowledge integration. Compared to existing surveys, our study uniquely emphasizes scalability, computational efficiency, and data quality. Finally, we propose future research directions, including neuro-symbolic integration, dynamic KG updating, data reliability, and ethical considerations, paving the way for intelligent systems capable of managing more complex real-world knowledge tasks.
nan
Article 445
Title@2025-06-11 (3): Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics
Title: Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics | Funktion Normalisierung verhindert Zusammenbruch nicht kontrastiver Lerndynamiken | 防止非争议学习动态的崩溃 2309.16109v2 |
Authors (1): Han Bao
Contrastive learning is a self-supervised representation learning framework, where two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force makes them far from negative examples. Non-contrastive learning, represented by BYOL and SimSiam, further gets rid of negative examples and improves computational efficiency. While learned representations may collapse into a single point due to the lack of the repulsive force at first sight, Tian et al. (2021) revealed through the learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account commonly-used feature normalization, a normalizer before measuring the similarity of representations, and hence excessively strong regularization may collapse the dynamics, which is an unnatural behavior under the presence of feature normalization. Therefore, we extend the previous theory based on the L2 loss by considering the cosine loss, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces a third-order one), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions with given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing the dynamics collapse.
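The difference between the two losses discussed in this abstract can be illustrated with a minimal NumPy sketch. The function names `l2_loss` and `cosine_loss` are illustrative choices and are not taken from the paper; the sketch only shows that the cosine loss normalizes features before comparing them, so it is unchanged when the representations are rescaled, whereas the L2 loss is not.

```python
import numpy as np

def l2_loss(z1, z2):
    """Half squared Euclidean distance between two views (no normalization)."""
    return 0.5 * np.sum((z1 - z2) ** 2, axis=-1)

def cosine_loss(z1, z2, eps=1e-12):
    """Negative cosine similarity: both views are L2-normalized first."""
    z1n = z1 / (np.linalg.norm(z1, axis=-1, keepdims=True) + eps)
    z2n = z2 / (np.linalg.norm(z2, axis=-1, keepdims=True) + eps)
    return -np.sum(z1n * z2n, axis=-1)

# Two toy batches of representations (rows are samples).
z1 = np.array([[1.0, 2.0], [0.5, -1.0]])
z2 = np.array([[2.0, 4.0], [0.5, -1.0]])

# Rescaling the features changes the L2 loss but not the cosine loss.
l2_original, l2_rescaled = l2_loss(z1, z2), l2_loss(3.0 * z1, 3.0 * z2)
cos_original, cos_rescaled = cosine_loss(z1, z2), cosine_loss(3.0 * z1, 5.0 * z2)
```

This scale invariance is only the static property of the loss; the sixth-order dynamics result in the abstract concerns training behavior and is not reproduced here.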
nan
Article 446
Title@2025-06-11 (3): Traceable LLM-based validation of statements in knowledge graphs
Title: Traceable LLM-based validation of statements in knowledge graphs | Rückführbare LLM-basierte Validierung von Aussagen in Wissensgraphen | 以可追踪的LLM为基础,对知识图中的报表进行可追踪的LLM验证 2409.07507v2 |
Authors (2): Daniel Adam, Tomáš Kliegr
This article presents a method for verifying RDF triples using LLMs, with an emphasis on providing traceable arguments. Because the LLMs cannot currently reliably identify the origin of the information used to construct the response to the user prompt, our approach is to avoid using internal LLM factual knowledge altogether. Instead, verified RDF statements are compared to chunks of external documents retrieved through a web search or Wikipedia. To assess the possible application of this retrieval augmented generation (RAG) workflow on biosciences content, we evaluated 1,719 positive statements from the BioRED dataset and the same number of newly generated negative statements. The resulting precision is 88 %, and recall is 44 %. This indicates that the method requires human oversight. We also evaluated the method on the SNLI dataset, which allowed us to compare our approach with models specifically tuned for the natural language inference task. We demonstrate the method on Wikidata, where a SPARQL query is used to automatically retrieve statements needing verification. Overall, the results suggest that LLMs could be used for large-scale verification of statements in KGs, a task previously unfeasible due to human annotation costs.
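As a worked illustration of why the reported figures suggest human oversight, the precision of 88 % and recall of 44 % can be combined into an F1 score. The F1 value is not reported in the abstract; it is derived here from the two stated numbers using the standard harmonic-mean definition.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \cdot 0.88 \cdot 0.44}{0.88 + 0.44}
    = \frac{0.7744}{1.32}
    \approx 0.587
```

The low recall dominates the harmonic mean: most accepted statements are correct, but more than half of the true statements are not confirmed.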
nan
Article 447
Title@2025-06-11 (3): On Finetuning Tabular Foundation Models
Title: On Finetuning Tabular Foundation Models | Über Finetuning Tabular Foundation Modelle | 关于微调表表基础模型 2506.08982v2 |
Authors (4): Ivan Rubachev, Akim Kotelnikov, Nikolay Kartashev, Artem Babenko
Foundation models are an emerging research direction in tabular deep learning. Notably, TabPFNv2 recently claimed superior performance over traditional GBDT-based methods on small-scale datasets using an in-context learning paradigm, which does not adapt model parameters to target datasets. However, the optimal finetuning approach for adapting tabular foundational models, and how this adaptation reshapes their internal mechanisms, remains underexplored. While prior works studied finetuning for earlier foundational models, inconsistent findings and TabPFNv2’s unique architecture necessitate fresh investigation. To address these questions, we first systematically evaluate various finetuning strategies on diverse datasets. Our findings establish full finetuning as the most practical solution for TabPFNv2 in terms of time-efficiency and effectiveness. We then investigate how finetuning alters TabPFNv2’s inner mechanisms, drawing an analogy to retrieval-augmented models. We reveal that the success of finetuning stems from the fact that after gradient-based adaptation, the dot products of the query-representations of test objects and the key-representations of in-context training objects more accurately reflect their target similarity. This improved similarity allows finetuned TabPFNv2 to better approximate target dependency by appropriately weighting relevant in-context samples, improving the retrieval-based prediction logic. From the practical perspective, we managed to finetune TabPFNv2 on datasets with up to 50K objects, observing performance improvements on almost all tasks. More precisely, on academic datasets with I.I.D. splits, finetuning allows TabPFNv2 to achieve state-of-the-art results, while on datasets with gradual temporal shifts and rich feature sets, TabPFNv2 is less stable and prior methods remain better.
nan
Article 448
Title@2025-06-11 (3): Large Language Models Miss the Multi-Agent Mark
Title: Large Language Models Miss the Multi-Agent Mark | Große Sprachmodelle vermissen das Multi-Agent Mark | 大语言模型 2505.21298v2 |
Authors (8): Emanuele La Malfa, Gabriele La Malfa, Samuele Marro, Jie M. Zhang, Elizabeth Black, Michael Luck, Philip Torr, Michael Wooldridge
Recent interest in Multi-Agent Systems of Large Language Models (MAS LLMs) has led to an increase in frameworks leveraging multiple LLMs to tackle complex tasks. However, much of this literature appropriates the terminology of MAS without engaging with its foundational principles. In this position paper, we highlight critical discrepancies between MAS theory and current MAS LLMs implementations, focusing on four key areas: the social aspect of agency, environment design, coordination and communication protocols, and measuring emergent behaviours. Our position is that many MAS LLMs lack multi-agent characteristics such as autonomy, social interaction, and structured environments, and often rely on oversimplified, LLM-centric architectures. The field may slow down and lose traction by revisiting problems the MAS literature has already addressed. Therefore, we systematically analyse this issue and outline associated research opportunities; we advocate for better integrating established MAS concepts and more precise terminology to avoid mischaracterisation and missed opportunities.
nan
Article 449
Title@2025-06-11 (3): Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means
Title: Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means | Robuste Verteilte Schätzung: Erweiterung von Gossip-Algorithmen auf Rangfolge und Trimmmittel | 强有力的分布分布式估算:将Gossip的数值扩大至排名和缩略语 2505.17836v5 |
Authors (3): Anna Van Elst, Igor Colin, Stephan Clémençon
This paper addresses the problem of robust estimation in gossip algorithms over arbitrary communication graphs. Gossip algorithms are fully decentralized, relying only on local neighbor-to-neighbor communication, making them well-suited for situations where communication is constrained. A fundamental challenge in existing mean-based gossip algorithms is their vulnerability to malicious or corrupted nodes. In this paper, we show that an outlier-robust mean can be computed by globally estimating a robust statistic. More specifically, we propose a novel gossip algorithm for rank estimation, referred to as GoRank, and leverage it to design a gossip procedure dedicated to trimmed mean estimation, coined GoTrim. In addition to a detailed description of the proposed methods, a key contribution of our work is a precise convergence analysis: we establish an $\mathcal{O}(1/t)$ rate for rank estimation and an $\mathcal{O}((\log t)/\sqrt{t})$ rate for trimmed mean estimation, where $t$ denotes the number of iterations. Moreover, we provide a breakdown point analysis of GoTrim. We empirically validate our theoretical results through experiments on diverse network topologies, data distributions and contamination schemes.
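The motivation can be sketched in a few lines of Python. This is not GoRank or GoTrim: `pairwise_gossip_mean` is the textbook randomized pairwise averaging scheme, and `trimmed_mean` is a centralized reference implementation that assumes every value's global rank is known. Both function names and the toy ring graph are assumptions made for illustration.

```python
import numpy as np

def trimmed_mean(x, alpha):
    """Centralized alpha-trimmed mean, written in terms of ranks."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks = np.argsort(np.argsort(x))   # 0-based rank of each value
    k = int(np.floor(alpha * n))        # number of values trimmed on each side
    keep = (ranks >= k) & (ranks < n - k)
    return x[keep].mean()

def pairwise_gossip_mean(x, edges, steps, seed=0):
    """Standard randomized gossip: a random edge averages its two endpoints."""
    rng = np.random.default_rng(seed)
    est = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        i, j = edges[rng.integers(len(edges))]
        est[i] = est[j] = 0.5 * (est[i] + est[j])
    return est

# Five nodes on a ring; node 4 holds a corrupted value.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

gossip_estimates = pairwise_gossip_mean(data, ring, steps=2000)  # every node converges to the plain mean
robust_value = trimmed_mean(data, alpha=0.2)                     # drops the smallest and largest value
```

Plain gossip averaging drives every node to the contaminated mean, while the trimmed mean ignores the outlier; computing it in a decentralized way requires each node to estimate its own rank, which is the role the abstract assigns to GoRank.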
nan
Article 450
Title@2025-06-11 (3): DeepExtractor: Time-domain reconstruction of signals and glitches in gravitational wave data with deep learning
Title: DeepExtractor: Time-domain reconstruction of signals and glitches in gravitational wave data with deep learning | DeepExtractor: Zeit-Domänen-Rekonstruktion von Signalen und Störungen in Gravitationswellendaten mit Deep Learning | 深度提取器:用深层学习重建引力波数据信号和小故障的时间- 域重建 2501.18423v3 |
Authors (7): Tom Dooney, Harsh Narola, Stefano Bromuri, R. Lyana Curier, Chris Van Den Broeck, Sarah Caudill, Daniel Stanley Tan
Gravitational wave (GW) detectors, such as LIGO, Virgo, and KAGRA, detect faint signals from distant astrophysical events. However, their high sensitivity also makes them susceptible to background noise, which can obscure these signals. This noise often includes transient artifacts called ‘glitches’, that can mimic genuine astrophysical signals or mask their true characteristics. In this study, we present DeepExtractor, a deep learning framework that is designed to reconstruct signals and glitches with power exceeding interferometer noise, regardless of their source. We design DeepExtractor to model the inherent noise distribution of GW detectors, following conventional assumptions that the noise is Gaussian and stationary over short time scales. It operates by predicting and subtracting the noise component of the data, retaining only the clean reconstruction of signal or glitch. We focus on applications related to glitches and validate DeepExtractor’s effectiveness through three experiments: (1) reconstructing simulated glitches injected into simulated detector noise, (2) comparing its performance with the state-of-the-art BayesWave algorithm, and (3) analyzing real data from the Gravity Spy dataset to demonstrate effective glitch subtraction from LIGO strain data. We further demonstrate its potential by reconstructing three real GW events from LIGO’s third observing run, without being trained on GW waveforms. Our proposed model achieves a median mismatch of only 0.9% for simulated glitches, outperforming several deep learning baselines. Additionally, DeepExtractor surpasses BayesWave in glitch recovery, offering a dramatic computational speedup by reconstructing one glitch sample in approximately 0.1 seconds on a CPU, compared to BayesWave’s processing time of approximately one hour per glitch.
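The mismatch figure quoted above can be read as one minus a normalized overlap between the true and reconstructed waveforms. The sketch below uses a plain, unweighted time-domain inner product as a simplifying assumption; the paper's exact definition may differ (for example, by weighting with the detector noise spectrum or maximizing over time shifts), and the toy sine-Gaussian glitch is illustrative only.

```python
import numpy as np

def mismatch(h, s):
    """1 - normalized overlap, using an unweighted time-domain inner product."""
    h = np.asarray(h, dtype=float)
    s = np.asarray(s, dtype=float)
    overlap = np.dot(h, s) / np.sqrt(np.dot(h, h) * np.dot(s, s))
    return 1.0 - overlap

# Toy sine-Gaussian "glitch" and a slightly noisy reconstruction of it.
t = np.linspace(0.0, 1.0, 1024)
glitch = np.exp(-((t - 0.5) / 0.05) ** 2) * np.sin(2 * np.pi * 60 * t)
reconstruction = glitch + 0.01 * np.random.default_rng(0).normal(size=t.size)

m = mismatch(glitch, reconstruction)
```

A mismatch of 0 means the two waveforms agree up to an overall amplitude factor, and a mismatch of 1 means they are orthogonal.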
nan
Article 451
Title@2025-06-11 (3): Differentially Private Federated $k$-Means Clustering with Server-Side Data
Title: Differentially Private Federated $k$-Means Clustering with Server-Side Data | Differential Private Federated $k$-Means Clustering mit Server-Side-Daten | 与服务器-Side数据组合在一起 2506.05408v2 |
Authors (3): Jonathan Scott, Christoph H. Lampert, David Saulpic
Clustering is a cornerstone of data analysis that is particularly suited to identifying coherent subgroups or substructures in unlabeled data, which are generated continuously in large amounts these days. However, in many cases traditional clustering methods are not applicable, because data are increasingly being produced and stored in a distributed way, e.g. on edge devices, and privacy concerns prevent them from being transferred to a central server. To address this challenge, we present FedDP-KMeans, a new algorithm for $k$-means clustering that is fully federated as well as differentially private. Our approach leverages (potentially small and out-of-distribution) server-side data to overcome the primary challenge of differentially private clustering methods: the need for a good initialization. Combining our initialization with a simple federated DP-Lloyds algorithm, we obtain an algorithm that achieves excellent results on synthetic and real-world benchmark tasks. We also provide a theoretical analysis of our method that provides bounds on the convergence speed and cluster identification success.
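To make the role of initialization concrete, here is a rough sketch of a single noisy Lloyd update of the kind a DP-Lloyd-style method repeats. It is not FedDP-KMeans: the function name `noisy_lloyd_step`, the noise scale `sigma`, and the rule for skipping near-empty clusters are all assumptions, and the sketch does no privacy accounting (it also assumes the points have already been clipped to a bounded norm).

```python
import numpy as np

def noisy_lloyd_step(points, centers, sigma, rng):
    """One Lloyd update with Gaussian noise added to cluster sums and counts."""
    # Assign every point to its nearest current center.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    new_centers = centers.copy()
    for k in range(len(centers)):
        members = points[assign == k]
        noisy_sum = members.sum(axis=0) + rng.normal(0.0, sigma, size=centers.shape[1])
        noisy_count = len(members) + rng.normal(0.0, sigma)
        # Keep the old center when the noisy count is too small to divide by.
        if noisy_count >= 1.0:
            new_centers[k] = noisy_sum / noisy_count
    return new_centers

points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
init = np.array([[1.0, 1.0], [9.0, 9.0]])
rng = np.random.default_rng(0)

exact = noisy_lloyd_step(points, init, sigma=0.0, rng=rng)   # reduces to a plain Lloyd step
noisy = noisy_lloyd_step(points, init, sigma=0.5, rng=rng)   # perturbed centers
```

Because every iteration injects noise, only a few iterations are affordable under a fixed privacy budget, which is why starting from good centers matters so much.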
nan
Article 452
Title@2025-06-11 (3): The Causal Information Bottleneck and Optimal Causal Variable Abstractions
Title: The Causal Information Bottleneck and Optimal Causal Variable Abstractions | Die Kausalinformation Engpass und Optimale Kausalvariable Abstraktionen | 因果信息瓶颈和最佳因果变量抽象 2410.00535v5 |
Authors (3): Francisco N. F. Q. Simoes, Mehdi Dastani, Thijs van Ommen
To effectively study complex causal systems, it is often useful to construct abstractions of parts of the system by discarding irrelevant details while preserving key features. The Information Bottleneck (IB) method is a widely used approach to construct variable abstractions by compressing random variables while retaining predictive power over a target variable. Traditional methods like IB are purely statistical and ignore underlying causal structures, making them ill-suited for causal tasks. We propose the Causal Information Bottleneck (CIB), a causal extension of the IB, which compresses a set of chosen variables while maintaining causal control over a target variable. This method produces abstractions of (sets of) variables which are causally interpretable, give us insight about the interactions between the abstracted variables and the target variable, and can be used when reasoning about interventions. We present experimental results demonstrating that the learned abstractions accurately capture causal relations as intended.
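For reference, the classical Information Bottleneck that the CIB extends is usually written as the following Lagrangian, where $X$ is the variable being compressed, $T$ its abstraction, $Y$ the target, and $\beta$ a trade-off parameter. This is the standard statistical formulation; the causal control term used by the CIB is not stated in the abstract and is therefore not reproduced here.

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y), \qquad \beta > 0
```

The first term rewards compression of $X$, and the second rewards retaining information about $Y$; the abstract describes replacing the purely predictive second term with a measure of causal control over the target.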
nan
Article 453
Title@2025-06-11 (3): Identifiable Latent Bandits: Leveraging observational data for personalized decision-making
Title: Identifiable Latent Bandits: Leveraging observational data for personalized decision-making | Identifizierbare Latent Bandits: Nutzung von Beobachtungsdaten für personalisierte Entscheidungen | 利用观测数据进行个性化决策 2407.16239v4 |
Authors (4): Ahmet Zahid Balcıoğlu, Newton Mwai, Emil Carlsson, Fredrik D. Johansson
For many decision-making tasks, such as precision medicine, historical data alone are insufficient to determine the right choice for a new problem instance or patient. Online algorithms like multi-armed bandits can find optimal personalized decisions but are notoriously sample-hungry. In practice, training a bandit for a new individual from scratch is often infeasible, as the number of trials required is larger than the practical number of decision points. Latent bandits offer rapid exploration and personalization beyond what context variables can reveal, provided that a latent variable model can be learned consistently. In this work, we propose an identifiable latent bandit framework that leads to optimal decision-making with a shorter exploration time than classical bandits by learning from historical records of decisions and outcomes. Our method is based on nonlinear independent component analysis that provably identifies representations from observational data sufficient to infer the optimal action in new bandit instances. We verify this strategy in simulated and semi-synthetic environments, showing substantial improvement over online and offline learning baselines when identifying conditions are satisfied.
nan
Article 454
Title@2025-06-11 (3): Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information
Title: Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information | Eng gekoppelte LiDAR-IMU-Leg-Odometrie mit Online-Lektionskinematik, die Fußtaktile Informationen enthält | LIDAR-IMU-Leg Odo量度与包含脚触觉信息的在线学习腿脚力学 2506.09548v1 |
Authors (7): Taku Okawara, Kenji Koide, Aoki Takanose, Shuji Oishi, Masashi Yokozuka, Kentaro Uno, Kazuya Yoshida
In this letter, we present tightly coupled LiDAR-IMU-leg odometry, which is robust to challenging conditions such as featureless environments and deformable terrains. We developed an online learning-based leg kinematics model named the neural leg kinematics model, which incorporates tactile information (foot reaction force) to implicitly express the nonlinear dynamics between robot feet and the ground. Online training of this model enhances its adaptability to weight load changes of a robot (e.g., assuming delivery or transportation tasks) and terrain conditions. Using the neural adaptive leg odometry factor and online uncertainty estimation of the leg kinematics model-based motion predictions, we jointly solve online training of this kinematics model and odometry estimation on a unified factor graph to retain the consistency of both. The proposed method was verified through real experiments using a quadruped robot in two challenging situations: 1) a sandy beach, representing an extremely featureless area with a deformable terrain, and 2) a campus, including multiple featureless areas and terrain types of asphalt, gravel (deformable terrain), and grass. Experimental results showed that our odometry estimation incorporating the neural leg kinematics model outperforms state-of-the-art works. Our project page is available for further details: https://takuokawara.github.io/RAL2025_project_page/
nan
Article 455
Title@2025-06-11 (3): Design Patterns for Securing LLM Agents against Prompt Injections
Title: Design Patterns for Securing LLM Agents against Prompt Injections | Designmuster zur Sicherung von LLM-Agenten gegen sofortige Injektionen | 防止及时注射的LLM代理物的安全设计模式 2506.08837v2 |
Authors (14): Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn
As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent’s reliance on natural language inputs, an especially dangerous threat when agents are granted tool access or handle sensitive information. In this work, we propose a set of principled design patterns for building AI agents with provable resistance to prompt injection. We systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through a series of case studies.
nan
Article 456
Title@2025-06-11 (3): On the Privacy Risks of Spiking Neural Networks: A Membership Inference Analysis
Title: On the Privacy Risks of Spiking Neural Networks: A Membership Inference Analysis | Über die Datenschutzrisiken von Spiking Neuronal Networks: Eine Schlussfolgerungsanalyse für die Mitgliedschaft | 《关于间谍神经网络的隐私风险:成员推断分析》 2502.13191v4 |
Authors (4): Junyi Guan, Abhijith Sharma, Chong Tian, Salem Lahlou
Spiking Neural Networks (SNNs) are increasingly explored for their energy efficiency and robustness in real-world applications, yet their privacy risks remain largely unexamined. In this work, we investigate the susceptibility of SNNs to Membership Inference Attacks (MIAs), a major privacy threat where an adversary attempts to determine whether a given sample was part of the training dataset. While prior work suggests that SNNs may offer inherent robustness due to their discrete, event-driven nature, we find that their resilience diminishes as latency (T) increases. Furthermore, we introduce an input dropout strategy under the black-box setting that significantly enhances membership inference in SNNs. Our findings challenge the assumption that SNNs are inherently more secure: although they are expected to fare better, our results reveal that SNNs exhibit privacy vulnerabilities comparable to those of Artificial Neural Networks (ANNs). Our code is available at https://github.com/sharmaabhijith/MIA_SNN.
nan
Article 457
Title@2025-06-11 (3): Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
Title: Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models | Athena: Multimodale Vernunft mit dateneffizienten Prozess-Reward-Modellen verbessern | Athena:利用数据高效流程奖励模式加强多式联运理由 2506.09532v1 |
Authors (6): Shuai Wang, Zhenhua Liu, Jiaheng Wei, Xuanwu Yin, Dong Li, Emad Barsoum
We present Athena-PRM, a multimodal process reward model (PRM) designed to evaluate the reward score for each step in solving complex reasoning problems. Developing high-performance PRMs typically demands significant time and financial investment, primarily due to the necessity for step-level annotations of reasoning steps. Conventional automated labeling methods, such as Monte Carlo estimation, often produce noisy labels and incur substantial computational costs. To efficiently generate high-quality process-labeled data, we propose leveraging prediction consistency between weak and strong completers as a criterion for identifying reliable process labels. Remarkably, Athena-PRM demonstrates outstanding effectiveness across various scenarios and benchmarks with just 5,000 samples. We also develop two effective strategies to improve the performance of PRMs: ORM initialization and up-sampling for negative data. We validate our approach in three specific scenarios: verification for test time scaling, direct evaluation of reasoning step correctness, and reward ranked fine-tuning. Our Athena-PRM consistently achieves superior performance across multiple benchmarks and scenarios. Notably, when using Qwen2.5-VL-7B as the policy model, Athena-PRM enhances performance by 10.2 points on WeMath and 7.1 points on MathVista for test time scaling. Furthermore, Athena-PRM sets state-of-the-art (SoTA) results on VisualProcessBench, outperforming the previous SoTA by 3.9 F1 points and showcasing its robust capability to accurately assess the correctness of each reasoning step. Additionally, utilizing Athena-PRM as the reward model, we develop Athena-7B with reward ranked fine-tuning, which outperforms the baseline by a significant margin on five benchmarks.
nan
Article 458
Title@2025-06-11 (3): Neural Functions for Learning Periodic Signal
Title: Neural Functions for Learning Periodic Signal | Neurale Funktionen für das Lernen von periodischem Signal | 学习周期信号的神经功能 2506.09526v1 |
Authors (4): Woojin Cho, Minju Jo, Kookjin Lee, Noseong Park
As function approximators, deep neural networks have served as an effective tool to represent various signal types. Recent approaches utilize multi-layer perceptrons (MLPs) to learn a nonlinear mapping from a coordinate to its corresponding signal, facilitating the learning of continuous neural representations from discrete data points. Despite notable successes in learning diverse signal types, coordinate-based MLPs often face issues of overfitting and limited generalizability beyond the training region, resulting in subpar extrapolation performance. This study addresses scenarios where the underlying true signals exhibit periodic properties, either spatially or temporally. We propose a novel network architecture, which extracts periodic patterns from measurements and leverages this information to represent the signal, thereby enhancing generalization and improving extrapolation performance. We demonstrate the efficacy of the proposed method through comprehensive experiments, including the learning of the periodic solutions for differential equations, and time series imputation (interpolation) and forecasting (extrapolation) on real-world datasets.
nan
Article 459
Title@2025-06-11 (3): Bias Detection via Maximum Subgroup Discrepancy
Title: Bias Detection via Maximum Subgroup Discrepancy | Bias-Erkennung über maximale Untergruppe Diskrepanz | 通过最大分组差异探测比亚斯 2502.02221v2 |
Authors (5): Jiří Němeček, Mark Kozdoba, Illia Kryvoviaz, Tomáš Pevný, Jakub Mareček
Bias evaluation is fundamental to trustworthy AI, both in terms of checking data quality and in terms of checking the outputs of AI systems. In testing data quality, for example, one may study the distance of a given dataset, viewed as a distribution, to a given ground-truth reference dataset. However, classical metrics, such as the Total Variation and the Wasserstein distances, are known to have high sample complexities and, therefore, may fail to provide a meaningful distinction in many practical scenarios. In this paper, we propose a new notion of distance, the Maximum Subgroup Discrepancy (MSD). In this metric, two distributions are close if, roughly, discrepancies are low for all feature subgroups. While the number of subgroups may be exponential, we show that the sample complexity is linear in the number of features, thus making it feasible for practical applications. Moreover, we provide a practical algorithm for evaluating the distance based on Mixed-integer optimization (MIO). We also note that the proposed distance is easily interpretable, thus providing clearer paths to fixing the biases once they have been identified. Finally, we describe a natural general bias detection framework, termed MSDD distances, and show that MSD aligns well with this framework. We empirically evaluate MSD by comparing it with other metrics and by demonstrating the above properties of MSD on real-world datasets.
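The idea of a subgroup-wise distance can be sketched by brute force. The sketch assumes that a subgroup is a conjunction of feature-value conditions and that the discrepancy is the absolute difference in the subgroup's relative frequency between two samples; the paper's precise definition may differ, and the exhaustive enumeration below is exponential in the number of features, unlike the mixed-integer optimization approach the abstract describes. The function name is illustrative.

```python
from itertools import combinations
import numpy as np

def max_subgroup_discrepancy(a, b):
    """Largest gap in subgroup frequency between samples a and b (brute force)."""
    a, b = np.asarray(a), np.asarray(b)
    best, best_group = 0.0, None
    for size in range(1, a.shape[1] + 1):
        for cols in combinations(range(a.shape[1]), size):
            idx = list(cols)
            # Every value combination on these columns seen in either sample.
            observed = {tuple(row) for row in np.vstack([a[:, idx], b[:, idx]]).tolist()}
            for vals in sorted(observed):
                p_a = np.mean(np.all(a[:, idx] == np.array(vals), axis=1))
                p_b = np.mean(np.all(b[:, idx] == np.array(vals), axis=1))
                gap = abs(p_a - p_b)
                if gap > best:
                    best, best_group = gap, dict(zip(idx, vals))
    return best, best_group

# Both samples have identical single-feature marginals but different joint structure.
a = np.array([[0, 0], [1, 1], [0, 0], [1, 1]])
b = np.array([[0, 1], [1, 0], [0, 1], [1, 0]])
gap, group = max_subgroup_discrepancy(a, b)
```

In this toy case each feature looks perfectly balanced on its own, yet the intersectional subgroup with both features equal to 0 makes up half of one sample and none of the other. The returned subgroup is directly readable, which matches the interpretability point made in the abstract.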
Article 460
Title@2025-06-11 (3): LLM-Powered CPI Prediction Inference with Online Text Time Series
Title: LLM-Powered CPI Prediction Inference with Online Text Time Series | LLM-Powered CPI Prediction Inferenz mit Online Text Time Series | LLM 授权的CPI 在线文本时间序列预测推论 2506.09516v1 |
Authors (4): Yingying Fan, Jinchi Lv, Ao Sun, Yurou Wang
Forecasting the Consumer Price Index (CPI) is an important yet challenging task in economics, where most existing approaches rely on low-frequency, survey-based data. With the recent advances in large language models (LLMs), there is growing potential to leverage high-frequency online text data for improved CPI prediction, an area still largely unexplored. This paper proposes LLM-CPI, an LLM-based approach for CPI prediction inference incorporating online text time series. We collect a large set of high-frequency online texts from a widely used Chinese social network site and employ LLMs such as ChatGPT, together with trained BERT models, to construct continuous inflation labels for posts that are related to inflation. Online text embeddings are extracted via LDA and BERT. We develop a joint time series framework that combines monthly CPI data with LLM-generated daily CPI surrogates. The monthly model employs an ARX structure combining observed CPI data with text embeddings and macroeconomic variables, while the daily model uses a VARX structure built on LLM-generated CPI surrogates and text embeddings. We establish the asymptotic properties of the method and provide two forms of constructed prediction intervals. The finite-sample performance and practical advantages of LLM-CPI are demonstrated through both simulation and real data examples.
Article 461
Title@2025-06-11 (3): A Survey on the Role of Artificial Intelligence and Machine Learning in 6G-V2X Applications
Title: A Survey on the Role of Artificial Intelligence and Machine Learning in 6G-V2X Applications | Eine Umfrage zur Rolle von Künstlicher Intelligenz und maschinellem Lernen in 6G-V2X Anwendungen | 关于6G-V2X应用中人工情报和机器学习的作用的调查 2506.09512v1 |
Authors (4): Donglin Wang, Anjie Qiu, Qiuheng Zhou, Hans D. Schotten
The rapid advancement of Vehicle-to-Everything (V2X) communication is transforming Intelligent Transportation Systems (ITS), with 6G networks expected to provide ultra-reliable, low-latency, and high-capacity connectivity for Connected and Autonomous Vehicles (CAVs). Artificial Intelligence (AI) and Machine Learning (ML) have emerged as key enablers in optimizing V2X communication by enhancing network management, predictive analytics, security, and cooperative driving, owing to their outstanding performance across various domains, such as natural language processing and computer vision. This survey comprehensively reviews recent advances in AI and ML models applied to 6G-V2X communication. It focuses on state-of-the-art techniques, including Deep Learning (DL), Reinforcement Learning (RL), Generative Learning (GL), and Federated Learning (FL), with particular emphasis on developments from the past two years. Notably, AI, especially GL, has shown remarkable progress and emerging potential in enhancing the performance, adaptability, and intelligence of 6G-V2X systems. Despite these advances, a systematic summary of recent research efforts in this area remains lacking, which this survey aims to address. We analyze the roles of these techniques in 6G-V2X applications, such as intelligent resource allocation, beamforming, intelligent traffic management, and security management. Furthermore, we explore the technical challenges, including computational complexity, data privacy, and real-time decision-making constraints, while identifying future research directions for AI-driven 6G-V2X development. This study aims to provide valuable insights for researchers, engineers, and policymakers working towards realizing intelligent, AI-powered V2X ecosystems in 6G communication.
Article 462
Title@2025-06-11 (3): Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design
Title: Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design | Effizientes Preference-Based-Verstärkungslernen: Randomized Exploration trifft auf experimentelles Design | 高效的以优惠为基础的强化学习:随机探索与实验设计 2506.09508v1 |
Authors (3): Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour
We study reinforcement learning from human feedback in general Markov decision processes, where agents learn from trajectory-level preference comparisons. A central challenge in this setting is to design algorithms that select informative preference queries to identify the underlying reward while ensuring theoretical guarantees. We propose a meta-algorithm based on randomized exploration, which avoids the computational challenges associated with optimistic approaches and remains tractable. We establish both regret and last-iterate guarantees under mild reinforcement learning oracle assumptions. To improve query complexity, we introduce and analyze an improved algorithm that collects batches of trajectory pairs and applies optimal experimental design to select informative comparison queries. The batch structure also enables parallelization of preference queries, which is relevant in practical deployment as feedback can be gathered concurrently. Empirical evaluation confirms that the proposed method is competitive with reward-based reinforcement learning while requiring a small number of preference queries.
Article 463
Title@2025-06-11 (3): A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes
Title: A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes | Eine einheitliche Theorie der Kompositionalität, Modularität und Interpretationsfähigkeit in Markov-Entscheidungsprozessen | 马尔科夫决定程序的组成、模式和解释统一理论 2506.09499v1 |
Authors (2): Thomas J. Ringstrom, Paul R. Schrater
We introduce Option Kernel Bellman Equations (OKBEs) for a new reward-free Markov Decision Process. Rather than a value function, OKBEs directly construct and optimize a predictive map called a state-time option kernel (STOK) to maximize the probability of completing a goal while avoiding constraint violations. STOKs are compositional, modular, and interpretable initiation-to-termination transition kernels for policies in the Options Framework of Reinforcement Learning. This means: 1) STOKs can be composed using Chapman-Kolmogorov equations to make spatiotemporal predictions for multiple policies over long horizons, 2) high-dimensional STOKs can be represented and computed efficiently in a factorized and reconfigurable form, and 3) STOKs record the probabilities of semantically interpretable goal-success and constraint-violation events, needed for formal verification. Given a high-dimensional state-transition model for an intractable planning problem, we can decompose it with local STOKs and goal-conditioned policies that are aggregated into a factorized goal kernel, making it possible to forward-plan at the level of goals in high dimensions to solve the problem. These properties lead to highly flexible agents that can rapidly synthesize meta-policies, reuse planning representations across many tasks, and justify goals using empowerment, an intrinsic motivation function. We argue that reward-maximization is in conflict with the properties of compositionality, modularity, and interpretability. In contrast, OKBEs facilitate these properties to support verifiable long-horizon planning and intrinsic motivation that scales to dynamic high-dimensional world-models.
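The composition property in item 1 can be sketched in its simplest form. The example below applies the Chapman-Kolmogorov equation to ordinary finite-state transition matrices; it is a simplification that ignores the time dimension and the goal and constraint events carried by STOKs, and the function name `compose_kernels` is hypothetical.

```python
def compose_kernels(p, q):
    """Chapman-Kolmogorov composition of two finite transition kernels.

    p[i][j] is the probability of ending in state j when starting in i under
    the first policy; q[j][k] is the same for the second policy. The result
    r[i][k] = sum_j p[i][j] * q[j][k] describes running them back to back.
    """
    n_mid = len(q)
    if any(len(row) != n_mid for row in p):
        raise ValueError("inner dimensions of the kernels must match")
    n_out = len(q[0])
    return [[sum(row[j] * q[j][k] for j in range(n_mid))
             for k in range(n_out)]
            for row in p]
```

Because each input row is a probability distribution, each row of the composed kernel also sums to one, so compositions can be chained over long horizons.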
Article 464
Title@2025-06-11 (3): LEMUR Neural Network Dataset: Towards Seamless AutoML
Title: LEMUR Neural Network Dataset: Towards Seamless AutoML | LEMUR Neural Network Dataset: Auf dem Weg zur nahtlosen AutoML | LEMUR 神经网络数据集: 走向无缝自动ML 2504.10552v2 |
Authors (10): Arash Torabi Goodarzi, Roman Kochnev, Waleed Khalid, Furui Qin, Tolgay Atinc Uzun, Yashkumar Sanjaybhai Dhameliya, Yash Kanubhai Kathiriya, Zofia Antonina Bentyn, Dmitry Ignatov, Radu Timofte
Neural networks are fundamental in artificial intelligence, driving progress in computer vision and natural language processing. High-quality datasets are crucial for their development, and there is growing interest in datasets composed of neural networks themselves to support benchmarking, automated machine learning (AutoML), and model analysis. We introduce LEMUR, an open source dataset of neural network models with well-structured code for diverse architectures across tasks such as object detection, image classification, segmentation, and natural language processing. LEMUR is primarily designed to provide a rich source of structured model representations and associated performance data, enabling the fine-tuning of large language models for AutoML applications. Leveraging Python and PyTorch, LEMUR enables seamless extension to new datasets and models while maintaining consistency. It integrates an Optuna-powered framework for evaluation, hyperparameter optimization, statistical analysis, and graphical insights. The LEMUR VR extension enables the seamless deployment of models in virtual reality, optimizing their performance on resource-constrained devices. Providing tools for model evaluation, preprocessing, and database management, LEMUR supports researchers and practitioners in developing, testing, and analyzing neural networks. It offers an API that delivers comprehensive information about neural network models and their complete performance statistics with a single request, which can be used in experiments with code-generating large language models. LEMUR and its plugins are accessible as open source projects under the MIT license at https://github.com/ABrain-One/nn-dataset, https://github.com/ABrain-One/nn-plots and https://github.com/ABrain-One/nn-vr.
Article 465
Title@2025-06-11 (3): EnerBridge-DPO: Energy-Guided Protein Inverse Folding with Markov Bridges and Direct Preference Optimization
Title: EnerBridge-DPO: Energy-Guided Protein Inverse Folding with Markov Bridges and Direct Preference Optimization | EnerBridge-DPO: Energiegeführtes Protein inverses Falten mit Markov-Brücken und direkter Präferenzoptimierung | EnerBridge-DPO:以Markov桥和直接优化为首的热导蛋白质反向折叠 2506.09496v1 |
Authors (6): Dingyi Rong, Haotian Lu, Wenzhuo Zheng, Fan Zhang, Shuangjia Zheng, Ning Liu
Designing protein sequences with optimal energetic stability is a key challenge in protein inverse folding, as current deep learning methods are primarily trained by maximizing sequence recovery rates, often neglecting the energy of the generated sequences. This work aims to overcome this limitation by developing a model that directly generates low-energy, stable protein sequences. We propose EnerBridge-DPO, a novel inverse folding framework focused on generating low-energy, high-stability protein sequences. Our core innovations are twofold. First, we integrate Markov Bridges with Direct Preference Optimization (DPO), where energy-based preferences are used to fine-tune the Markov Bridge model. The Markov Bridge initiates optimization from an information-rich prior sequence, providing DPO with a pool of structurally plausible sequence candidates. Second, we introduce an explicit energy constraint loss, which enhances the energy-driven nature of DPO based on prior sequences, enabling the model to effectively learn energy representations from a wealth of prior knowledge and directly predict sequence energy values, thereby capturing quantitative features of the energy landscape. Our evaluations demonstrate that EnerBridge-DPO can design protein complex sequences with lower energy while maintaining sequence recovery rates comparable to state-of-the-art models, and accurately predicts $\Delta \Delta G$ values between various sequences.
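To make the energy-based preference idea concrete, here is a minimal sketch of the standard DPO loss applied to a pair ranked by energy. It assumes that the lower-energy sequence is the preferred one and that sequence log-probabilities are already available; the helper names are hypothetical, and neither the Markov Bridge model nor the paper's energy constraint loss is modeled.

```python
import math


def energy_preference_pair(seq_a, energy_a, seq_b, energy_b):
    """Return (preferred, dispreferred), preferring the lower-energy sequence."""
    return (seq_a, seq_b) if energy_a <= energy_b else (seq_b, seq_a)


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair: -log sigmoid(beta * margin).

    logp_* are sequence log-probabilities under the policy being tuned and
    ref_logp_* under the frozen reference model; w is the preferred sequence.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference, the margin is zero and the loss equals log 2; it falls as the policy shifts probability toward the lower-energy sequence.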
Article 466
Title@2025-06-11 (3): Bridging Online Behavior and Clinical Insight: A Longitudinal LLM-based Study of Suicidality on YouTube Reveals Novel Digital Markers
Title: Bridging Online Behavior and Clinical Insight: A Longitudinal LLM-based Study of Suicidality on YouTube Reveals Novel Digital Markers | Bridging Online Behavior and Clinical Insight: Eine längsseitige LLM-basierte Studie zur Suizidalität auf YouTube enthüllt neuartige digitale Marker | 架桥在线行为和临床透视:YouTube Reveals新数字标记上基于纵向LLM的自杀性纵向LM研究 2506.09495v1 |
Authors (7): Ilanit Sobol, Shir Lissak, Refael Tikochinski, Tal Nakash, Anat Brunstein Klomek, Eyal Fruchter, Roi Reichart
Suicide remains a leading cause of death in Western countries, underscoring the need for new research approaches. As social media becomes central to daily life, digital footprints offer valuable insight into suicidal behavior. Focusing on individuals who attempted suicide while uploading videos to their channels, we investigate: How do suicidal behaviors manifest on YouTube, and how do they differ from expert knowledge? We applied complementary approaches: computational bottom-up, hybrid, and expert-driven top-down, on a novel longitudinal dataset of 181 YouTube channels from individuals with life-threatening attempts, alongside 134 control channels. In the bottom-up approach, we applied LLM-based topic modeling to identify behavioral indicators. Of 166 topics, five were associated with suicide attempts, with two also showing temporal attempt-related changes ($p<.01$): Mental Health Struggles ($+0.08$)* and YouTube Engagement ($+0.1$)*. In the hybrid approach, a clinical expert reviewed LLM-derived topics and flagged 19 as suicide-related. However, none showed significant attempt-related temporal effects beyond those identified bottom-up. Notably, YouTube Engagement, a platform-specific indicator, was not flagged by the expert, underscoring the value of bottom-up discovery. In the top-down approach, psychological assessment of suicide attempt narratives revealed that the only significant difference between individuals who attempted before their upload period and those who attempted during it was the motivation to share this experience: the former aimed to Help Others ($\beta=-1.69$, $p<.01$), while the latter framed it as part of their Personal Recovery ($\beta=1.08$, $p<.01$). By integrating these approaches, we offer a nuanced understanding of suicidality, bridging digital behavior and clinical insights. * Within-group changes in relation to the suicide attempt.
Article 467
Title@2025-06-11 (3): MIRA: Medical Time Series Foundation Model for Real-World Health Data
Title: MIRA: Medical Time Series Foundation Model for Real-World Health Data | MIRA: Medical Time Series Foundation Modell für real-World Gesundheitsdaten | 医疗时间系列基金会实际世界卫生数据模型 2506.07584v2 |
Authors (11): Hao Li, Bowen Deng, Chang Xu, Zhiyuan Feng, Viktor Schlegel, Yu-Hao Huang, Yizheng Sun, Jingyuan Sun, Kailai Yang, Yiyao Yu, Jiang Bian
A unified foundation model for medical time series, pretrained on open access and ethics board-approved medical corpora, offers the potential to reduce annotation burdens, minimize model customization, and enable robust transfer across clinical institutions, modalities, and tasks, particularly in data-scarce or privacy-constrained environments. However, existing generalist time series foundation models struggle to handle medical time series data due to their inherent challenges, including irregular intervals, heterogeneous sampling rates, and frequent missing values. To address these challenges, we introduce MIRA, a unified foundation model specifically designed for medical time series forecasting. MIRA incorporates a Continuous-Time Rotary Positional Encoding that enables fine-grained modeling of variable time intervals, a frequency-specific mixture-of-experts layer that routes computation across latent frequency regimes to further promote temporal specialization, and a Continuous Dynamics Extrapolation Block based on Neural ODE that models the continuous trajectory of latent states, enabling accurate forecasting at arbitrary target timestamps. Pretrained on a large-scale and diverse medical corpus comprising over 454 billion time points collected from publicly available datasets, MIRA reduces forecasting errors by an average of 10% and 7% in out-of-distribution and in-distribution scenarios, respectively, when compared to other zero-shot and fine-tuned baselines. We also introduce a comprehensive benchmark spanning multiple downstream clinical tasks, establishing a foundation for future research in medical time series modeling.
Article 468
Title@2025-06-11 (3): Minimax optimality of deep neural networks on dependent data via PAC-Bayes bounds
Title: Minimax optimality of deep neural networks on dependent data via PAC-Bayes bounds | Minimax-Optimalität von tiefen neuronalen Netzwerken auf abhängigen Daten über PAC-Bayes-Grenzen | 通过PAC-Bayes界限依靠数据建立深海神经网络的最小最佳程度 2410.21702v2 |
Authors (2): Pierre Alquier, William Kengne
In a groundbreaking work, Schmidt-Hieber (2020) proved the minimax optimality of deep neural networks with ReLU activation for least-squares regression estimation over a large class of functions defined by composition. In this paper, we extend these results in many directions. First, we remove the i.i.d. assumption on the observations, to allow some time dependence. The observations are assumed to be a Markov chain with a non-null pseudo-spectral gap. Then, we study a more general class of machine learning problems, which includes least-squares and logistic regression as special cases. Leveraging PAC-Bayes oracle inequalities and a version of the Bernstein inequality due to Paulin (2015), we derive upper bounds on the estimation risk for a generalized Bayesian estimator. In the case of least-squares regression, this bound matches (up to a logarithmic factor) the lower bound of Schmidt-Hieber (2020). We establish a similar lower bound for classification with the logistic loss, and prove that the proposed DNN estimator is optimal in the minimax sense.
Article 469
Title@2025-06-11 (3): BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation
Title: BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation | BemaGANv2: Eine Tutorial- und Vergleichsstudie von GAN-basierten Vocodern für die Langzeit-Audiogenerierung | BemaGANv2: 以GAN为基地的长期音频生成Vocoders的教学和比较调查 2506.09487v1 |
Authors (10): Taesoo Park, Mungwi Jeong, Mingyu Park, Narae Kim, Junyoung Kim, Mujung Kim, Jisang Yoo, Hoyun Lee, Sanghoon Kim, Soonchul Kwon
This paper presents a tutorial-style survey and implementation guide of BemaGANv2, an advanced GAN-based vocoder designed for high-fidelity and long-term audio generation. Built upon the original BemaGAN architecture, BemaGANv2 incorporates major architectural innovations by replacing traditional ResBlocks in the generator with the Anti-aliased Multi-Periodicity composition (AMP) module, which internally applies the Snake activation function to better model periodic structures. In the discriminator framework, we integrate the Multi-Envelope Discriminator (MED), a novel architecture we originally proposed, to extract rich temporal envelope features crucial for periodicity detection. Coupled with the Multi-Resolution Discriminator (MRD), this combination enables more accurate modeling of long-range dependencies in audio. We systematically evaluate various discriminator configurations, including MSD + MED, MSD + MRD, and MPD + MED + MRD, using objective metrics (FAD, SSIM, PLCC, MCD) and subjective evaluations (MOS, SMOS). This paper also provides a comprehensive tutorial on the model architecture, training methodology, and implementation to promote reproducibility. The code and pre-trained models are available at: https://github.com/dinhoitt/BemaGANv2.
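For readers unfamiliar with the Snake activation mentioned above, the sketch below gives its usual scalar form, $x + \sin^2(\alpha x)/\alpha$. It treats `alpha` as a fixed scalar, which is an assumption made for illustration; how the AMP module parameterizes or learns it is not shown here.

```python
import math


def snake(x, alpha=1.0):
    """Snake activation: x + sin^2(alpha * x) / alpha.

    The linear term keeps the function monotone on average, while the
    sin^2 term adds a periodic component with period pi / alpha.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return x + math.sin(alpha * x) ** 2 / alpha
```

Shifting the input by one period `pi / alpha` shifts the output by exactly that amount, which is the periodic inductive bias that makes the activation attractive for audio waveforms.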
Article 470
Title@2025-06-11 (3): Plug-and-Play image restoration with Stochastic deNOising REgularization
Title: Plug-and-Play image restoration with Stochastic deNOising REgularization | Plug-and-Play-Bildrestaurierung mit stochastischer Entnoisierung der Regularisierung | 利用斯托卡式脱氧化还溶性恢复插件和布板图像 2402.01779v3 |
Authors (4): Marien Renaud, Jean Prost, Arthur Leclaire, Nicolas Papadakis
Plug-and-Play (PnP) algorithms are a class of iterative algorithms that address image inverse problems by combining a physical model and a deep neural network for regularization. Although they produce impressive image restoration results, these algorithms rely on a non-standard use of a denoiser on images that become less and less noisy over the iterations, which contrasts with recent algorithms based on Diffusion Models (DM), where the denoiser is applied only on re-noised images. We propose a new PnP framework, called Stochastic deNOising REgularization (SNORE), which applies the denoiser only on images with an appropriate noise level. It is based on an explicit stochastic regularization, which leads to a stochastic gradient descent algorithm to solve ill-posed inverse problems. A convergence analysis of this algorithm and its annealing extension is provided. Experimentally, we show that SNORE is competitive with state-of-the-art methods on deblurring and inpainting tasks, both quantitatively and qualitatively.
Article 471
Title@2025-06-11 (3): Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling
Title: Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling | Auf dem Weg zu einem einheitlichen und verlustfreien Latent-Raum für 3D-Molekular-Latent-Diffusionsmodell | 迈向3D分子中子中子扩散模型的 统一和无损失的冷冻空间 2503.15567v3 |
Authors (9): Yanchen Luo, Zhiyuan Liu, Yi Zhao, Sihang Li, Hengxing Cai, Kenji Kawaguchi, Tat-Seng Chua, Yang Zhang, Xiang Wang
3D molecule generation is crucial for drug discovery and material science, requiring models to process complex multi-modalities, including atom types, chemical bonds, and 3D coordinates. A key challenge is integrating these modalities of different shapes while maintaining SE(3) equivariance for 3D coordinates. To achieve this, existing approaches typically maintain separate latent spaces for invariant and equivariant modalities, reducing efficiency in both training and sampling. In this work, we propose the Unified Variational Auto-Encoder for 3D Molecular Latent Diffusion Modeling (UAE-3D), a multi-modal VAE that compresses 3D molecules into latent sequences from a unified latent space, while maintaining near-zero reconstruction error. This unified latent space eliminates the complexities of handling multi-modality and equivariance when performing latent diffusion modeling. We demonstrate this by employing the Diffusion Transformer, a general-purpose diffusion model without any molecular inductive bias, for latent generation. Extensive experiments on GEOM-Drugs and QM9 datasets demonstrate that our method establishes new benchmarks in both de novo and conditional 3D molecule generation, achieving leading efficiency and quality. On GEOM-Drugs, it reduces FCD by 72.6% over the previous best result, while achieving over 70% relative average improvements in geometric fidelity.
Article 472
Title@2025-06-11 (3): Convergence Conditions for Stochastic Line Search Based Optimization of Over-parametrized Models
Title: Convergence Conditions for Stochastic Line Search Based Optimization of Over-parametrized Models | Konvergenzbedingungen für stochastische Liniensuche auf Basis der Optimierung von überparametrisierten Modellen | 存储线搜索的一致条件 优化过度平衡模型的优化 2408.03199v2 |
Authors (2): Matteo Lapucci, Davide Pucci
In this paper, we deal with algorithms to solve the finite-sum problems related to fitting over-parametrized models, which typically satisfy the interpolation condition. In particular, we focus on approaches based on stochastic line searches and employing general search directions. We define conditions on the sequence of search directions that guarantee finite termination and bounds for the backtracking procedure. Moreover, we shed light on the additional property of directions needed to prove fast (linear) convergence of the general class of algorithms when applied to PL functions in the interpolation regime. From the point of view of algorithm design, the proposed analysis identifies safeguarding conditions that could be employed in relevant algorithmic frameworks. In particular, it could be of interest to integrate stochastic line searches within momentum, conjugate gradient or adaptive preconditioning methods.
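The setting can be illustrated with a minimal sketch of a stochastic Armijo backtracking line search on a small consistent least-squares problem, where interpolation holds by construction. The constants (`step0`, `c`, `shrink`), the function names, and the use of the negative mini-batch gradient as the search direction are illustrative assumptions; the paper's conditions on general directions are not implemented.

```python
import random


def armijo_backtrack(f, x, grad_x, direction,
                     step0=1.0, c=0.1, shrink=0.5, max_backtracks=50):
    """Shrink t until f(x + t d) <= f(x) + c * t * <grad, d> on the sampled loss."""
    slope = sum(g * d for g, d in zip(grad_x, direction))
    if slope > 0:
        raise ValueError("direction is not a descent direction for the sampled loss")
    if slope == 0:
        return list(x), 0.0  # sampled loss is already stationary
    fx, t = f(x), step0
    for _ in range(max_backtracks):
        x_new = [xi + t * di for xi, di in zip(x, direction)]
        if f(x_new) <= fx + c * t * slope:
            return x_new, t
        t *= shrink
    return list(x), 0.0  # no acceptable step found; keep the iterate


def sgd_with_line_search(rows, targets, n_iters=200, seed=0):
    """SGD on single-sample squared losses, with the step set by backtracking."""
    rng = random.Random(seed)
    w = [0.0] * len(rows[0])
    for _ in range(n_iters):
        i = rng.randrange(len(rows))
        a, y = rows[i], targets[i]

        def sample_loss(v, a=a, y=y):
            return 0.5 * (sum(ai * vi for ai, vi in zip(a, v)) - y) ** 2

        residual = sum(ai * wi for ai, wi in zip(a, w)) - y
        grad = [residual * ai for ai in a]
        w, _ = armijo_backtrack(sample_loss, w, grad, [-g for g in grad])
    return w
```

With more parameters than samples, every single-sample loss can reach zero simultaneously, so each accepted step makes progress without a decaying step-size schedule.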
Article 473
Title@2025-06-11 (3): On a few pitfalls in KL divergence gradient estimation for RL
Title: On a few pitfalls in KL divergence gradient estimation for RL | Bei einigen Fallen bei KL-Divergenzgradientenschätzung für RL | KL RL差差差梯度估算的几个陷阱 2506.09477v1 |
Authors (2): Yunhao Tang, Rémi Munos
We point out a few pitfalls in implementing gradient estimation for KL divergence in RL training for LLMs, as seen in a number of open source projects and papers. The first major pitfall is to differentiate through the KL estimate, treating it as a loss function to minimize the KL divergence. We show that such implementations are generally incorrect and do not produce the desired KL gradient. Secondly, we show that some implementations do not account for the sequential nature of the estimation problem and produce a partial gradient at best. We demonstrate the impact of such issues with illustrative tabular and LLM experiments, and show the correct way to implement the KL gradient.
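One instance of the first pitfall can be checked exactly in a tabular setting. The sketch below assumes a softmax policy over a few outcomes and the simple log-ratio estimator $\log \pi(x) - \log \pi_{\mathrm{ref}}(x)$ with $x \sim \pi$; it is an illustration under those assumptions, not a reproduction of the paper's experiments. Differentiating the estimator with the sample held fixed gives an expected gradient of zero, whereas weighting the score function by the log-ratio recovers the true gradient of $\mathrm{KL}(\pi \,\|\, \pi_{\mathrm{ref}})$.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(logits, ref):
    """KL(pi || ref) for a softmax policy pi over a finite set."""
    return sum(p * math.log(p / q) for p, q in zip(softmax(logits), ref))


def score(pi, x, k):
    """d log pi(x) / d logit_k for a softmax policy."""
    return (1.0 if x == k else 0.0) - pi[k]


def expected_naive_gradient(logits, ref):
    """E_{x~pi}[d/dtheta (log pi(x) - log ref(x))] with the sample x held fixed."""
    pi = softmax(logits)
    n = len(pi)
    return [sum(pi[x] * score(pi, x, k) for x in range(n)) for k in range(n)]


def expected_score_function_gradient(logits, ref):
    """E_{x~pi}[log(pi(x) / ref(x)) * d log pi(x)/dtheta]."""
    pi = softmax(logits)
    n = len(pi)
    return [sum(pi[x] * math.log(pi[x] / ref[x]) * score(pi, x, k)
                for x in range(n)) for k in range(n)]


def finite_difference_gradient(logits, ref, eps=1e-6):
    """Central-difference reference for the true KL gradient."""
    grad = []
    for k in range(len(logits)):
        up, down = list(logits), list(logits)
        up[k] += eps
        down[k] -= eps
        grad.append((kl(up, ref) - kl(down, ref)) / (2 * eps))
    return grad
```

The naive expectation vanishes because the expected score is zero for any policy, so minimizing that loss provides no KL signal on average.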
Article 474
Title@2025-06-11 (3): Adjusting Model Size in Continual Gaussian Processes: How Big is Big Enough?
Title: Adjusting Model Size in Continual Gaussian Processes: How Big is Big Enough? | Modellgröße in kontinuierlichen Gauß-Prozessen anpassen: Wie groß ist groß genug? | 调整连续高斯进程模型大小:大到多大? 2408.07588v4 |
Authors (3): Guiomar Pescador-Barrios, Sarah Filippi, Mark van der Wilk
Many machine learning models require setting a parameter that controls their size before training, e.g. number of neurons in DNNs, or inducing points in GPs. Increasing capacity typically improves performance until all the information from the dataset is captured. After this point, computational cost keeps increasing, without improved performance. This leads to the question “How big is big enough?” We investigate this problem for Gaussian processes (single-layer neural networks) in continual learning. Here, data becomes available incrementally, and the final dataset size will therefore not be known before training, preventing the use of heuristics for setting a fixed model size. We develop a method to automatically adjust model size while maintaining near-optimal performance. Our experimental procedure follows the constraint that any hyperparameters must be set without seeing dataset properties, and we show that our method performs well across diverse datasets without the need to adjust its hyperparameter, showing it requires less tuning than others.
Article 475
Title@2025-06-11 (3): A Reinforcement Learning Approach for RIS-aided Fair Communications
Title: A Reinforcement Learning Approach for RIS-aided Fair Communications | Ein stärkerer Lernansatz für RIS-gestützte Fair Communications | 以强化学习方法促进RIRS援助的公平交流 2506.06344v2 |
Authors (5): Alex Pierron, Michel Barbeau, Luca De Cicco, Jose Rubio-Hernan, Joaquin Garcia-Alfaro
Reconfigurable Intelligent Surfaces (RISs) are composed of physical elements that can dynamically alter electromagnetic wave properties to enhance beamforming, leading to improvements in areas with low coverage. They have the potential to be combined with Reinforcement Learning (RL) techniques to improve network performance and energy efficiency via optimization techniques. In addition to performance and energy improvements, it is also crucial to consider the concept of fair communications. RISs must ensure that User Equipment (UE) units receive their signals with adequate strength, without other UE units being deprived of service due to insufficient power. In this paper, we address such a problem. We explore the fairness properties of previous work and propose a novel method that aims at obtaining an efficient and fair duplex RIS-RL system for multiple legitimate UE units. We report and discuss our experimental work and simulation results. We also release our code and datasets to foster further research on the topic.
Article 476
Title@2025-06-11 (3): Forecasting high-impact research topics via machine learning on evolving knowledge graphs
Title: Forecasting high-impact research topics via machine learning on evolving knowledge graphs | Prognose hochwirksamer Forschungsthemen durch maschinelles Lernen auf sich entwickelnden Wissensgraphen | 通过对不断演变的知识图表进行机器学习,预测高影响研究专题 2402.08640v4 |
Authors (2): Xuemei Gu, Mario Krenn
The exponential growth in scientific publications poses a severe challenge for human researchers. It forces attention to narrower sub-fields, which makes it challenging to discover new impactful research ideas and collaborations outside one’s own field. While there are ways to predict a scientific paper’s future citation counts, they need the research to be finished and the paper written, usually assessing impact long after the idea was conceived. Here we show how to predict the impact of onsets of ideas that have never been published by researchers. For that, we developed a large evolving knowledge graph built from more than 21 million scientific papers. It combines a semantic network created from the content of the papers and an impact network created from the historic citations of papers. Using machine learning, we can predict the dynamics of the evolving network into the future with high accuracy (AUC values beyond 0.9 for most experiments), and thereby the impact of new research directions. We envision that the ability to predict the impact of new ideas will be a crucial component of future artificial muses that can inspire new impactful and interesting scientific ideas.
Article 477
Title@2025-06-11 (3): ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
Title: ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning | ReinFlow: Feinsteuerungs-Flow Matching-Politik mit Online-Verstärkungs-Lernen | ReinFlow: 与在线强化学习匹配流动政策的微调 2505.22094v3 |
Authors (4): Tonghe Zhang, Chao Yu, Sichang Su, Yu Wang
We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy’s deterministic path, converting the flow into a discrete-time Markov Process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly with very few denoising steps or even a single one. We benchmark ReinFlow in representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. The episode reward of Rectified Flow policies obtained an average net growth of 135.36% after fine-tuning in challenging legged locomotion tasks while saving denoising steps and 82.63% of wall time compared to the state-of-the-art diffusion RL fine-tuning method DPPO [43]. The success rate of the Shortcut Model policies in state and visual manipulation tasks achieved an average net increase of 40.34% after fine-tuning with ReinFlow at four or even one denoising step, with performance comparable to fine-tuned DDIM policies while saving an average of 23.20% of computation time. Project webpage: https://reinflow.github.io/
Article 478
Title@2025-06-11 (3): Bayesian Neural Scaling Law Extrapolation with Prior-Fitted Networks
Title: Bayesian Neural Scaling Law Extrapolation with Prior-Fitted Networks | Bayesian Neural Scaling Law Extrapolation mit Prior-Fitted-Netzwerke | Bayesian神经扩增法与事先确定网络的外推法 2505.23032v2 |
Authors (8): Dongwoo Lee, Dong Bok Lee, Steven Adriaensen, Juho Lee, Sung Ju Hwang, Frank Hutter, Seon Joo Kim, Hae Beom Lee
Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling laws often follow a power law and have proposed several variants of power-law functions to predict the scaling behavior at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications involving decision-making problems such as determining the expected performance improvements achievable by investing additional computational resources. In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation. Specifically, we design a prior distribution that enables the sampling of infinitely many synthetic functions resembling real-world neural scaling laws, allowing our PFN to meta-learn the extrapolation. We validate the effectiveness of our approach on real-world neural scaling laws, comparing it against both the existing point estimation methods and Bayesian approaches. Our method demonstrates superior performance, particularly in data-limited scenarios such as Bayesian active learning, underscoring its potential for reliable, uncertainty-aware extrapolation in practical applications.
Article 479
Title@2025-06-11 (3): Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms
Title: Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms | Auf dem Weg zur Überbrückung des Lohn-Generation-Gaps in direkten Ausrichtungsalgorithmen | 争取弥合直接调整数值中的奖励-评分差距 2506.09457v1 |
Authors (3): Zeguan Xiao, Yun Chen, Guanhua Chen
Direct Alignment Algorithms (DAAs), such as Direct Preference Optimization (DPO) and Simple Preference Optimization (SimPO), have emerged as efficient alternatives to Reinforcement Learning from Human Feedback (RLHF) algorithms for aligning large language models (LLMs) with human preferences. However, DAAs suffer from a fundamental limitation we identify as the “reward-generation gap” – a misalignment between optimization objectives during training and actual generation performance during inference. In this paper, we find that one contributor to the reward-generation gap is the mismatch between the inherent importance of prefix tokens during the LLM generation process and how this importance is reflected in the implicit reward functions of DAAs. To bridge the gap, we introduce a simple yet effective approach called Prefix-Oriented Equal-length Training (POET), which truncates both preferred and dispreferred responses to match the shorter one’s length. Because POET truncates both responses in each sample to equal length, and the truncated lengths vary across samples, the optimization of the DAA objective is implicitly constrained to converge across all positions, thus paying more attention to prefix tokens than standard DAAs do. We conduct experiments with DPO and SimPO, two representative DAAs, demonstrating that POET improves over their standard implementations, achieving up to 15.6 points in AlpacaEval 2 and overall improvements across downstream tasks. Our results highlight the importance of addressing the misalignment between reward optimization and generation performance in DAAs.
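The truncation step described in the abstract above (cutting both responses of a preference pair to the length of the shorter one) can be illustrated with a minimal sketch. The function name `poet_truncate` and the toy token-ID lists are hypothetical, chosen only for illustration; this is not the authors' implementation, and it leaves out the DPO/SimPO loss entirely.

```python
def poet_truncate(preferred, dispreferred):
    """Truncate both token sequences to the length of the shorter one.

    Hypothetical helper: the inputs are assumed to be lists of token IDs
    for the preferred and dispreferred responses of one preference pair.
    """
    n = min(len(preferred), len(dispreferred))
    return preferred[:n], dispreferred[:n]


# Toy preference pairs (made-up token IDs) with different response lengths.
pairs = [
    ([1, 2, 3, 4, 5], [9, 8, 7]),
    ([4, 4], [6, 6, 6, 6]),
]

# Each pair ends up with its own truncation length (3 and 2 here),
# so the truncated lengths vary across samples.
truncated = [poet_truncate(p, d) for p, d in pairs]
```

Because the cut-off is set per pair, different samples end at different positions, which matches the abstract's remark about diverse truncated lengths across samples.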
Article 480
Title@2025-06-11 (3): NDCG-Consistent Softmax Approximation with Accelerated Convergence
Title: NDCG-Consistent Softmax Approximation with Accelerated Convergence | NDCG-Konsistente Softmax-Annäherung mit beschleunigter Konvergenz | NDCG-Consistent Softmax 与加速汇合相近 2506.09454v1 |
Authors (6): Yuanhao Pu, Defu Lian, Xiaolong Chen, Xu Huang, Jin Chen, Enhong Chen
Ranking tasks constitute fundamental components of extreme similarity learning frameworks, where extremely large corpora of objects are modeled through relative similarity relationships adhering to predefined ordinal structures. Among various ranking surrogates, Softmax (SM) Loss has been widely adopted due to its natural capability to handle listwise ranking via global negative comparisons, along with its flexibility across diverse application scenarios. However, despite its effectiveness, SM Loss often suffers from significant computational overhead and scalability limitations when applied to large-scale object spaces. To address this challenge, we propose novel loss formulations that align directly with ranking metrics: the Ranking-Generalizable squared (RG$^2$) Loss and the Ranking-Generalizable interactive (RG$^\times$) Loss, both derived through Taylor expansions of the SM Loss. Notably, RG$^2$ reveals the intrinsic mechanisms underlying weighted squared losses (WSL) in ranking methods and uncovers fundamental connections between sampling-based and non-sampling-based loss paradigms. Furthermore, we integrate the proposed RG losses with the highly efficient Alternating Least Squares (ALS) optimization method, providing both generalization guarantees and convergence rate analyses. Empirical evaluations on real-world datasets demonstrate that our approach achieves comparable or superior ranking performance relative to SM Loss, while significantly accelerating convergence. This framework offers the similarity learning community both theoretical insights and practically efficient tools, with methodologies applicable to a broad range of tasks where balancing ranking quality and computational efficiency is essential.
Article 481
Title@2025-06-11 (3): MADCluster: Model-agnostic Anomaly Detection with Self-supervised Clustering Network
Title: MADCluster: Model-agnostic Anomaly Detection with Self-supervised Clustering Network | MADCluster: Modell-agnostische Anomalieerkennung mit selbstüberwachtem Clustering-Netzwerk | MADCluster:使用自监管的集群网进行模型-不可知异常探测 2505.16223v5 |
Authors (3): Sangyong Lee, Subo Hwang, Dohoon Kim
In this paper, we propose MADCluster, a novel model-agnostic anomaly detection framework utilizing self-supervised clustering. MADCluster is applicable to various deep learning architectures and addresses the ‘hypersphere collapse’ problem inherent in existing deep learning-based anomaly detection methods. The core idea is to cluster normal pattern data into a ‘single cluster’ while simultaneously learning the cluster center and mapping data close to this center. Also, to improve expressiveness and enable effective single clustering, we propose a new ‘One-directed Adaptive loss’. The optimization of this loss is mathematically proven. MADCluster consists of three main components: Base Embedder capturing high-dimensional temporal dynamics, Cluster Distance Mapping, and Sequence-wise Clustering for continuous center updates. Its model-agnostic characteristics are achieved by applying various architectures to the Base Embedder. Experiments on four time series benchmark datasets demonstrate that applying MADCluster improves the overall performance of comparative models. In conclusion, the compatibility of MADCluster shows potential for enhancing model performance across various architectures.
Article 482
Title@2025-06-11 (3): SafeCoT: Improving VLM Safety with Minimal Reasoning
Title: SafeCoT: Improving VLM Safety with Minimal Reasoning | SafeCoT: VLM-Sicherheit mit minimaler Vernunft verbessern | SafeCot: 提高VLM安全性,并采用最低合理理由 2506.08399v2 |
Authors (4): Jiachen Ma, Zhanhui Zhou, Chao Yang, Chaochao Lu
Ensuring safe and appropriate responses from vision-language models (VLMs) remains a critical challenge, particularly in high-risk or ambiguous scenarios. We introduce SafeCoT, a lightweight, interpretable framework that leverages rule-based chain-of-thought (CoT) supervision to improve refusal behavior in VLMs. Unlike prior methods that rely on large-scale safety annotations or complex modeling, SafeCoT uses minimal supervision to help models reason about safety risks and make context-aware refusals. Experiments across multiple benchmarks show that SafeCoT significantly reduces overrefusal and enhances generalization, even with limited training data. Our approach offers a scalable solution for aligning VLMs with safety-critical objectives.
Article 483
Title@2025-06-11 (3): Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform
Title: Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform | Verschlimmerungen von LLM-Embedding-Sequenzen lernen: Glas-Reformation | 学习LLM 嵌入序列的学习障碍:染色玻璃变形 2506.09452v1 |
Authors (4): Jay Roberts, Kyle Mylonakis, Sidhartha Roy, Kaan Kale
The high cost of ownership of AI compute infrastructure and the challenges of robust serving of large language models (LLMs) have led to a surge in managed Model-as-a-service deployments. Even when enterprises choose on-premises deployments, the compute infrastructure is typically shared across many teams in order to maximize the return on investment. In both scenarios the deployed models operate only on plaintext data, and so enterprise data owners must allow their data to appear in plaintext on a shared or multi-tenant compute infrastructure. This results in data owners with private or sensitive data being hesitant or restricted in what data they use with these types of deployments. In this work we introduce the Stained Glass Transform, a learned, stochastic, and sequence-dependent transformation of the word embeddings of an LLM which information-theoretically provides privacy to the input of the LLM while preserving the utility of the model. We theoretically connect a particular class of Stained Glass Transforms to the theory of mutual information of Gaussian Mixture Models. We then calculate a posteriori privacy estimates, based on mutual information, and verify the privacy and utility of instances of transformed embeddings through token-level metrics of privacy and standard LLM performance benchmarks.
Article 484
Title@2025-06-11 (3): Safe Screening Rules for Group SLOPE
Title: Safe Screening Rules for Group SLOPE | Sichere Screening-Regeln für Gruppen-LOPE | SLOPE集团安全筛选规则 2506.09451v1 |
Authors (3): Runxue Bao, Quanchao Lu, Yanfu Zhang
Variable selection is a challenging problem in high-dimensional sparse learning, especially when group structures exist. Group SLOPE performs well for the adaptive selection of groups of predictors. However, the block non-separable group effects in Group SLOPE make existing methods either invalid or inefficient. Consequently, Group SLOPE tends to incur significant computational costs and memory usage in practical high-dimensional scenarios. To overcome this issue, we introduce a safe screening rule tailored for the Group SLOPE model, which efficiently identifies inactive groups with zero coefficients by addressing the block non-separable group effects. By excluding these inactive groups during training, we achieve considerable gains in computational efficiency and memory usage. Importantly, the proposed screening rule can be seamlessly integrated into existing solvers for both batch and stochastic algorithms. Theoretically, we establish that our screening rule can be safely employed with existing optimization algorithms, ensuring the same results as the original approaches. Experimental results confirm that our method effectively detects inactive feature groups and significantly boosts computational efficiency without compromising accuracy.
Article 485
Title@2025-06-11 (3): Modified K-means Algorithm with Local Optimality Guarantees
Title: Modified K-means Algorithm with Local Optimality Guarantees | Modifizierter K-Mittel-Algorithmus mit lokalen Optimierungsgarantien | 使用本地最佳度保障修改 K 平均值 2506.06990v2 |
Authors (3): Mingyi Li, Michael R. Metel, Akiko Takeda
The K-means algorithm is one of the most widely studied clustering algorithms in machine learning. While extensive research has focused on its ability to achieve a globally optimal solution, a rigorous analysis of its local optimality guarantees is still lacking. In this paper, we first present conditions under which the K-means algorithm converges to a locally optimal solution. Based on this, we propose simple modifications to the K-means algorithm which ensure local optimality in both the continuous and discrete sense, with the same computational complexity as the original K-means algorithm. As the dissimilarity measure, we consider a general Bregman divergence, which is an extension of the squared Euclidean distance often used in the K-means algorithm. Numerical experiments confirm that the K-means algorithm does not always find a locally optimal solution in practice, while our proposed methods provide improved locally optimal solutions with reduced clustering loss. Our code is available at https://github.com/lmingyi/LO-K-means.
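As a small illustration of the abstract's remark that a Bregman divergence extends the squared Euclidean distance, the sketch below evaluates the standard definition D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> with the generator phi(v) = ||v||^2. The helper names (`bregman`, `sq_norm`, `sq_norm_grad`) are hypothetical, and the snippet says nothing about the paper's modified K-means steps.

```python
def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (a - b) for g, a, b in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner


def sq_norm(v):
    """Generator phi(v) = ||v||^2."""
    return sum(a * a for a in v)


def sq_norm_grad(v):
    """Gradient of the generator: 2 * v."""
    return [2 * a for a in v]


x, y = [1.0, 2.0], [3.0, -1.0]

# With phi = squared norm, the divergence reduces to ||x - y||^2.
d = bregman(sq_norm, sq_norm_grad, x, y)
sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
```

Swapping in another convex generator (with its gradient) gives a different divergence through the same `bregman` function.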
Article 486
Title@2025-06-11 (3): Attention-Bayesian Hybrid Approach to Modular Multiple Particle Tracking
Title: Attention-Bayesian Hybrid Approach to Modular Multiple Particle Tracking | Aufmerksamkeits-Bayesischer Hybrid-Ansatz für modulares multiples Partikel-Tracking | 模块多粒子跟踪的注意 – – 巴伊斯混合办法 2506.09441v1 |
Authors (2): Piyush Mishra, Philippe Roudot
Tracking multiple particles in noisy and cluttered scenes remains challenging due to a combinatorial explosion of trajectory hypotheses, which scales super-exponentially with the number of particles and frames. The transformer architecture has shown a significant improvement in robustness against this high combinatorial load. However, its performance still falls short of the conventional Bayesian filtering approaches in scenarios presenting a reduced set of trajectory hypotheses. This suggests that while transformers excel at narrowing down possible associations, they may not be able to reach the optimality of the Bayesian approach in locally sparse scenarios. Hence, we introduce a hybrid tracking framework that combines the ability of self-attention to learn the underlying representation of particle behavior with the reliability and interpretability of Bayesian filtering. We perform trajectory-to-detection association by solving a label prediction problem, using a transformer encoder to infer soft associations between detections across frames. This prunes the hypothesis set, enabling efficient multiple-particle tracking in a Bayesian filtering framework. Our approach demonstrates improved tracking accuracy and robustness against spurious detections, offering a solution for high-clutter multiple-particle tracking scenarios.
Article 487
Title@2025-06-11 (3): Optimization over Sparse Support-Preserving Sets: Two-Step Projection with Global Optimality Guarantees
Title: Optimization over Sparse Support-Preserving Sets: Two-Step Projection with Global Optimality Guarantees | Optimierung über Sparse Support-Preserving Sets: Zwei-Schritt-Projektion mit globalen Optimierungsgarantien | 使用全球最佳保证的两步投影 2506.08558v2 |
Authors (3): William de Vazelhes, Xiao-Tong Yuan, Bin Gu
In sparse optimization, enforcing hard constraints using the $\ell_0$ pseudo-norm offers advantages like controlled sparsity compared to convex relaxations. However, many real-world applications demand not only sparsity constraints but also some extra constraints. While prior algorithms have been developed to address this complex scenario with mixed combinatorial and convex constraints, they typically require a closed-form projection onto the mixed constraints, which might not exist, and/or only provide local convergence guarantees, which differ from the global guarantees commonly sought in sparse optimization. To fill this gap, in this paper, we study the problem of sparse optimization with extra support-preserving constraints commonly encountered in the literature. We present a new variant of the iterative hard-thresholding algorithm equipped with a two-step consecutive projection operator customized for these mixed constraints, serving as a simple alternative to the Euclidean projection onto the mixed constraint. By introducing a novel trade-off between sparsity relaxation and sub-optimality, we provide global guarantees in objective value for the output of our algorithm, in the deterministic, stochastic, and zeroth-order settings, under the conventional restricted strong-convexity/smoothness assumptions. As a fundamental contribution in proof techniques, we develop a novel extension of the classic three-point lemma to the considered two-step non-convex projection operator, which allows us to analyze the convergence in objective value in an elegant way that has not been possible with existing techniques. In the zeroth-order case, this technique also improves upon the state-of-the-art result from de Vazelhes et al. (2022), even in the case without additional constraints, by allowing us to remove a non-vanishing system error present in their work.
Article 488
Title@2025-06-11 (3): Generalization Error Analysis for Attack-Free and Byzantine-Resilient Decentralized Learning with Data Heterogeneity
Title: Generalization Error Analysis for Attack-Free and Byzantine-Resilient Decentralized Learning with Data Heterogeneity | Generalisierungsfehleranalyse für Angriffsfreies und Byzantinisch-Resilientes dezentralisiertes Lernen mit Daten Heterogenität | 为无攻击攻击和有数据差异化的抗拜占庭分散化学习提供通用错误分析 2506.09438v1 |
Authors (3): Haoxiang Ye, Tao Sun, Qing Ling
Decentralized learning, which facilitates joint model training across geographically scattered agents, has gained significant attention in the field of signal and information processing in recent years. While the optimization errors of decentralized learning algorithms have been extensively studied, their generalization errors remain relatively under-explored. As the generalization errors reflect the scalability of trained models on unseen data and are crucial in determining the performance of trained models in real-world applications, understanding the generalization errors of decentralized learning is of paramount importance. In this paper, we present a fine-grained generalization error analysis for both attack-free and Byzantine-resilient decentralized learning with heterogeneous data and under mild assumptions, in contrast to prior studies that consider homogeneous data and/or rely on a stringent bounded stochastic gradient assumption. Our results shed light on the impact of data heterogeneity, model initialization and stochastic gradient noise – factors that have not been closely investigated before – on the generalization error of decentralized learning. We also reveal that Byzantine attacks performed by malicious agents largely affect the generalization error, and their negative impact is inherently linked to the data heterogeneity while remaining independent of the sample size. Numerical experiments on both convex and non-convex tasks are conducted to validate our theoretical findings.
Article 489
Title@2025-06-11 (3): Discovering Physics Laws of Dynamical Systems via Invariant Function Learning
Title: Discovering Physics Laws of Dynamical Systems via Invariant Function Learning | Physikgesetze dynamischer Systeme durch Invariant Function Learning entdecken | 通过动态功能学习发现动态系统的物理法 2502.04495v2 |
Authors (3): Shurui Gui, Xiner Li, Shuiwang Ji
We consider learning the underlying laws of dynamical systems governed by ordinary differential equations (ODEs). A key challenge is how to discover intrinsic dynamics across multiple environments while circumventing environment-specific mechanisms. Unlike prior work, we tackle more complex environments where changes extend beyond function coefficients to entirely different function forms. For example, we demonstrate the discovery of an ideal pendulum’s natural motion $\alpha^2 \sin(\theta_t)$ by observing pendulum dynamics in different environments, such as the damped environment $\alpha^2 \sin(\theta_t) - \rho \omega_t$ and the powered environment $\alpha^2 \sin(\theta_t) + \rho \frac{\omega_t}{\left|\omega_t\right|}$. Here, we formulate this problem as an invariant function learning task and propose a new method, known as Disentanglement of Invariant Functions (DIF), that is grounded in causal analysis. We propose a causal graph and design an encoder-decoder hypernetwork that explicitly disentangles invariant functions from environment-specific dynamics. The discovery of invariant functions is guaranteed by our information-based principle that enforces the independence between extracted invariant functions and environments. Quantitative comparisons with meta-learning and invariant learning baselines on three ODE systems demonstrate the effectiveness and efficiency of our method. Furthermore, symbolic regression explanation results highlight the ability of our framework to uncover intrinsic laws. Our code has been released as part of the AIRS library (https://github.com/divelab/AIRS/tree/main/OpenODE/DIF).
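The three pendulum expressions quoted in the abstract above share the term $\alpha^2 \sin(\theta_t)$ and differ only in an environment-specific term. A minimal sketch of that structure follows; the function names, the parameter values, and the handling of $\omega_t = 0$ in the powered case are assumptions made for illustration, and the code is unrelated to the DIF implementation in the AIRS library.

```python
import math


def invariant(theta, alpha):
    """Shared term alpha^2 * sin(theta) quoted for the ideal pendulum."""
    return alpha ** 2 * math.sin(theta)


def damped(theta, omega, alpha, rho):
    """Damped environment: invariant term minus rho * omega."""
    return invariant(theta, alpha) - rho * omega


def powered(theta, omega, alpha, rho):
    """Powered environment: invariant term plus rho * omega / |omega|.

    The expression is undefined at omega == 0; returning only the
    invariant term there is an assumption of this sketch.
    """
    if omega == 0:
        return invariant(theta, alpha)
    return invariant(theta, alpha) + rho * omega / abs(omega)


# Hypothetical state and parameters.
theta, omega, alpha, rho = 0.5, -2.0, 1.5, 0.2

# Subtracting the invariant term isolates each environment-specific part.
damped_residual = damped(theta, omega, alpha, rho) - invariant(theta, alpha)
powered_residual = powered(theta, omega, alpha, rho) - invariant(theta, alpha)
```

Here the residuals are `-rho * omega` and `rho * sign(omega)` respectively, which is the environment-specific part that an invariant-function learner would need to separate from the shared term.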
Article 490
Title@2025-06-11 (3): Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling
Title: Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling | Lohnüberoptimierung bei Direktausrichtungsalgorithmen mit Wichtigkeits-Sampling abmildern | 在具有重要性抽样调查的直接对齐数值中,减缓回调过度优化 2506.08681v2 |
Authors (8): Phuc Minh Nguyen, Ngoc-Hieu Nguyen, Duy H. M. Nguyen, Anji Liu, An Mai, Binh T. Nguyen, Daniel Sonntag, Khoa D. Doan
Direct Alignment Algorithms (DAAs) such as Direct Preference Optimization (DPO) have emerged as alternatives to the standard Reinforcement Learning from Human Feedback (RLHF) for aligning large language models (LLMs) with human values. However, these methods are more susceptible to over-optimization, in which the model drifts away from the reference policy, leading to degraded performance as training progresses. This paper proposes a novel importance-sampling approach to mitigate the over-optimization problem of offline DAAs. This approach, called IS-DAAs, multiplies the DAA objective by an importance ratio that accounts for the reference policy distribution. IS-DAAs additionally avoid the high-variance issue associated with importance sampling by clipping the importance ratio to a maximum value. Our extensive experiments demonstrate that IS-DAAs can effectively mitigate over-optimization, especially under low regularization strength, and achieve better performance than other methods designed to address this problem. Our implementations are provided publicly at this link.
Article 491
Title@2025-06-11 (3): When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
Title: When Is Diversity Rewarded in Cooperative Multi-Agent Learning? | Wann wird Vielfalt im kooperativen Multi-Agenten-Lernen belohnt? | 当多样性在多机构合作学习中得到回报吗? 2506.09434v1 |
Authors (3): Michael Amir, Matteo Bettini, Amanda Prorok
The success of teams in robotics, nature, and society often depends on the division of labor among diverse specialists; however, a principled explanation for when such diversity surpasses a homogeneous team is still missing. Focusing on multi-agent task allocation problems, our goal is to study this question from the perspective of reward design: what kinds of objectives are best suited for heterogeneous teams? We first consider an instantaneous, non-spatial setting where the global reward is built by two generalized aggregation operators: an inner operator that maps the $N$ agents’ effort allocations on individual tasks to a task score, and an outer operator that merges the $M$ task scores into the global team reward. We prove that the curvature of these operators determines whether heterogeneity can increase reward, and that for broad reward families this collapses to a simple convexity test. Next, we ask what incentivizes heterogeneity to emerge when embodied, time-extended agents must learn an effort allocation policy. To study heterogeneity in such settings, we use multi-agent reinforcement learning (MARL) as our computational paradigm, and introduce Heterogeneous Environment Design (HED), a gradient-based algorithm that optimizes the parameter space of underspecified MARL environments to find scenarios where heterogeneity is advantageous. Experiments in matrix games and an embodied Multi-Goal-Capture environment show that, despite the difference in settings, HED rediscovers the reward regimes predicted by our theory to maximize the advantage of heterogeneity, both validating HED and connecting our theoretical insights to reward design in MARL. Together, these results help us understand when behavioral diversity delivers a measurable benefit.
Article 492
Title@2025-06-11 (3): Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training
Title: Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training | Milderung von sauberen Korrelationen in LLMs durch Kausalität-Bewusstsein nach dem Training | 通过Causaliity-Aware 培训后在LLMs中减轻净污损 2506.09433v1 |
Authors (2): Shurui Gui, Shuiwang Ji
While large language models (LLMs) have demonstrated remarkable capabilities in language modeling, recent studies reveal that they often fail on out-of-distribution (OOD) samples due to spurious correlations acquired during pre-training. Here, we aim to mitigate such spurious correlations through causality-aware post-training (CAPT). By decomposing a biased prediction into two unbiased steps, known as event estimation and event intervention, we reduce LLMs’ pre-training biases without incurring additional fine-tuning biases, thus enhancing the model’s generalization ability. Experiments on the formal causal inference benchmark CLadder and the logical reasoning dataset PrOntoQA show that 3B-scale language models fine-tuned with CAPT can outperform both traditional SFT and larger LLMs on in-distribution (ID) and OOD tasks using only 100 ID fine-tuning samples, demonstrating the effectiveness and sample efficiency of CAPT.
Article 493
Title@2025-06-11 (3): Through a Steerable Lens: Magnifying Neural Network Interpretability via Phase-Based Extrapolation
Title: Through a Steerable Lens: Magnifying Neural Network Interpretability via Phase-Based Extrapolation | Durch eine Steerable Lens: Vergrößerung der Neuralen Netzwerk-Interpretabilität über phasenbasierte Extrapolation | a 通过固态透镜:通过基于分阶段的外推法使神经网络放大解释 2506.02300v3 |
Authors (4): Farzaneh Mahdisoltani, Saeed Mahdisoltani, Roger B. Grosse, David J. Fleet
Understanding the internal representations and decision mechanisms of deep neural networks remains a critical open challenge. While existing interpretability methods often identify influential input regions, they may not elucidate how a model distinguishes between classes or what specific changes would transition an input from one category to another. To address these limitations, we propose a novel framework that visualizes the implicit path between classes by treating the network gradient as a form of infinitesimal motion. Drawing inspiration from phase-based motion magnification, we first decompose images using invertible transforms (specifically the Complex Steerable Pyramid), then compute class-conditional gradients in the transformed space. Rather than iteratively integrating the gradient to trace a full path, we amplify the one-step gradient to the input and perform a linear extrapolation to expose how the model moves from source to target class. By operating in the steerable pyramid domain, these amplified gradients produce semantically meaningful, spatially coherent morphs that highlight the classifier’s most sensitive directions, giving insight into the geometry of its decision boundaries. Experiments on both synthetic and real-world datasets demonstrate that our phase-focused extrapolation yields perceptually aligned, semantically meaningful transformations, offering a novel, interpretable lens into neural classifiers’ internal representations.
Article 494
Title@2025-06-11 (3): CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis
Title: CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis | CTPD: Cross-Modal Temporal Pattern Discovery für verbesserte multimodale elektronische Gesundheitsdatenanalyse | CTPD: 增强多模式电子健康记录分析的跨模式时间模式发现 2411.00696v2 |
Authors (4): Fuying Wang, Feng Wu, Yihan Tang, Lequan Yu
Integrating multimodal Electronic Health Records (EHR) data, such as numerical time series and free-text clinical reports, has great potential in predicting clinical outcomes. However, prior work has primarily focused on capturing temporal interactions within individual samples and fusing multimodal information, overlooking critical temporal patterns across patients. These patterns, such as trends in vital signs like abnormal heart rate or blood pressure, can indicate deteriorating health or an impending critical event. Similarly, clinical notes often contain textual descriptions that reflect these patterns. Identifying corresponding temporal patterns across different modalities is crucial for improving the accuracy of clinical outcome predictions, yet it remains a challenging task. To address this gap, we introduce a Cross-Modal Temporal Pattern Discovery (CTPD) framework, designed to efficiently extract meaningful cross-modal temporal patterns from multimodal EHR data. Our approach introduces shared initial temporal pattern representations which are refined using slot attention to generate temporal semantic embeddings. To ensure rich cross-modal temporal semantics in the learned patterns, we introduce a contrastive-based TPNCE loss for cross-modal alignment, along with two reconstruction losses to retain core information of each modality. Evaluations on two clinically critical tasks, 48-hour in-hospital mortality and 24-hour phenotype classification, using the MIMIC-III database demonstrate the superiority of our method over existing approaches.
Article 495
Title@2025-06-11 (3): Weakly Supervised Multiple Instance Learning for Whale Call Detection and Temporal Localization in Long-Duration Passive Acoustic Monitoring
Title: Weakly Supervised Multiple Instance Learning for Whale Call Detection and Temporal Localization in Long-Duration Passive Acoustic Monitoring | Schwach überwachtes Multiple Instance-Lernen für Whale Call Detection und zeitliche Lokalisierung in der langdauerhaften passiven akustischen Überwachung | 在长期被动听力监测中,受监督的捕鲸呼叫探测和时间定位多案例学习能力薄弱 2502.20838v2 |
Authors (4): Ragib Amin Nihal, Benjamin Yen, Runwu Shi, Kazuhiro Nakadai
Marine ecosystem monitoring via Passive Acoustic Monitoring (PAM) generates vast data, but deep learning often requires precise annotations and short segments. We introduce DSMIL-LocNet, a Multiple Instance Learning framework for whale call detection and localization using only bag-level labels. Our dual-stream model processes 2-30 minute audio segments, leveraging spectral and temporal features with attention-based instance selection. Tests on Antarctic whale data show longer contexts improve classification (F1: 0.8-0.9) while medium instances ensure localization precision (0.65-0.70). This suggests MIL can enhance scalable marine monitoring. Code: https://github.com/Ragib-Amin-Nihal/DSMIL-Loc
Article 496
Title@2025-06-11 (3): Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models
Title: Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models | Folgen Sie der Energie, finden Sie den Weg: Riemannsche Metrics aus energiebasierten Modellen | 遵循能源,寻找道路:以能源为基础的模型的里曼尼度量仪 2505.18230v2 |
Authors (6): Louis Béthune, David Vigouroux, Yilun Du, Rufin VanRullen, Thomas Serre, Victor Boutin
What is the shortest path between two data points lying in a high-dimensional space? While the answer is trivial in Euclidean geometry, it becomes significantly more complex when the data lies on a curved manifold – requiring a Riemannian metric to describe the space’s local curvature. Estimating such a metric, however, remains a major challenge in high dimensions. In this work, we propose a method for deriving Riemannian metrics directly from pretrained Energy-Based Models (EBMs) – a class of generative models that assign low energy to high-density regions. These metrics define spatially varying distances, enabling the computation of geodesics – shortest paths that follow the data manifold’s intrinsic geometry. We introduce two novel metrics derived from EBMs and show that they produce geodesics that remain closer to the data manifold and exhibit lower curvature distortion, as measured by alignment with ground-truth trajectories. We evaluate our approach on increasingly complex datasets: synthetic datasets with known data density, rotated character images with interpretable geometry, and high-resolution natural images embedded in a pretrained VAE latent space. Our results show that EBM-derived metrics consistently outperform established baselines, especially in high-dimensional settings. Our work is the first to derive Riemannian metrics from EBMs, enabling data-aware geodesics and unlocking scalable, geometry-driven learning for generative modeling and simulation.
Article 497
Title@2025-06-11 (3): Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Title: Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models | Vertrauen ist alles, was Sie brauchen: Wenig heißes RL Feintuning von Sprachmodellen | 信心是你们需要的:很少点点RL精细的语言模式 2506.06395v3 |
Authors (5): Pengyi Li, Matvey Skripkin, Alexander Zubrey, Andrey Kuznetsov, Ivan Oseledets
Large language models (LLMs) excel at reasoning, yet post-training remains critical for aligning their behavior with task goals. Existing reinforcement learning (RL) methods often depend on costly human annotations or external reward models. We propose Reinforcement Learning via Self-Confidence (RLSC), which uses the model’s own confidence as the reward signal, eliminating the need for labels, preference models, or reward engineering. Applied to Qwen2.5-Math-7B with only 16 samples per question and 10 or 20 training steps, RLSC improves accuracy by +13.4% on AIME2024, +21.2% on MATH500, +21.7% on Minerva Math, +20.8% on OlympiadBench, and +9.7% on AMC23. RLSC provides a simple, scalable post-training method for inference models, requiring only a small number of samples and unlabelled supervision.
Article 498
Title@2025-06-11 (3): Improving the performance of optical inverse design of multilayer thin films using CNN-LSTM tandem neural networks
Title: Improving the performance of optical inverse design of multilayer thin films using CNN-LSTM tandem neural networks | Verbesserung der Leistungsfähigkeit des optischen inversen Designs von Mehrschicht-Dünnfilmen mit CNN-LSTM Tandem-Neuronalen Netzwerken | 利用CNN-LSTM同步神经网络改进多层薄薄膜光反光设计性能 2506.10044v1 |
Authors (4): Uijun Jung, Deokho Jang, Sungchul Kim, Jungho Kim
The optical properties of thin films are greatly influenced by the thickness of each layer. Accurately predicting these thicknesses and their corresponding optical properties is important in the optical inverse design of thin films. However, traditional inverse design methods usually demand extensive numerical simulations and optimization procedures, which are time-consuming. In this paper, we use deep learning for the inverse design of the transmission spectra of SiO2/TiO2 multilayer thin films. We implement a tandem neural network (TNN), which can solve the one-to-many mapping problem that greatly degrades the performance of deep-learning-based inverse designs. In general, a TNN is implemented as a back-to-back connection of an inverse neural network and a pre-trained forward neural network, both of which are based on multilayer perceptron (MLP) algorithms. In this paper, we propose to use not only MLP but also convolutional neural network (CNN) or long short-term memory (LSTM) algorithms in the configuration of the TNN. We show that an LSTM-LSTM-based TNN yields the highest accuracy but takes the longest training time among nine TNN configurations. We also find that a CNN-LSTM-based TNN is an optimal solution in terms of accuracy and speed because it can integrate the strengths of the CNN and LSTM algorithms.
Article 499
Title@2025-06-11 (3): Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation
Title: Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation | Zeiteinheitliche Diffusionspolitik mit Aktionsdiskriminierung für die Robotermanipulation | 对机器人操纵行动歧视的、时间统一的统一传播政策 2506.09422v1 |
Authors (5): Ye Niu, Sanping Zhou, Yizhe Li, Ye Den, Le Wang
In many complex scenarios, robotic manipulation relies on generative models to estimate the distribution of multiple successful actions. As the diffusion model has better training robustness than other generative models, it performs well in imitation learning from successful robot demonstrations. However, diffusion-based policy methods typically require significant time to iteratively denoise robot actions, which hinders real-time responses in robotic manipulation. Moreover, existing diffusion policies model a time-varying action denoising process, whose temporal complexity increases the difficulty of model training and leads to suboptimal action accuracy. To generate robot actions efficiently and accurately, we present the Time-Unified Diffusion Policy (TUDP), which utilizes action recognition capabilities to build a time-unified denoising process. On the one hand, we build a time-unified velocity field in action space with additional action discrimination information. By unifying all timesteps of action denoising, our velocity field reduces the difficulty of policy learning and speeds up action generation. On the other hand, we propose an action-wise training method, which introduces an action discrimination branch to supply additional action discrimination information. Through action-wise training, the TUDP implicitly learns to discern successful actions, which improves denoising accuracy. Our method achieves state-of-the-art performance on RLBench with the highest success rate of 82.6% on a multi-view setup and 83.8% on a single-view setup. In particular, when using fewer denoising iterations, TUDP achieves a more significant improvement in success rate. Additionally, TUDP can produce accurate actions for a wide range of real-world tasks.
Article 500
Title@2025-06-11 (3): A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy
Title: A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy | Ein Aufruf für kollaborative Intelligenz: Warum Menschen-Agenten-Systeme der KI-Autonomie vorausgehen sollten | 呼吁合作情报:为什么人力-物力系统应先于自主 2506.09420v1 |
Authors (13): Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Chunyu Miao, Dongyuan Li, Aiwei Liu, Yue Zhou, Yankai Chen, Weizhi Zhang, Yangning Li, Liancheng Fang, Renhe Jiang, Philip S. Yu
Recent improvements in large language models (LLMs) have led many researchers to focus on building fully autonomous AI agents. This position paper questions whether this approach is the right path forward, as these autonomous systems still have problems with reliability, transparency, and understanding the actual requirements of humans. We suggest a different approach: LLM-based Human-Agent Systems (LLM-HAS), where AI works with humans rather than replacing them. By keeping humans involved to provide guidance, answer questions, and maintain control, these systems can be more trustworthy and adaptable. Looking at examples from healthcare, finance, and software development, we show how human-AI teamwork can handle complex tasks better than AI working alone. We also discuss the challenges of building these collaborative systems and offer practical solutions. This paper argues that progress in AI should not be measured by how independent systems become, but by how well they can work with humans. The most promising future for AI is not in systems that take over human roles, but in those that enhance human capabilities through meaningful partnership.
Article 501
Title@2025-06-11 (3): NestQuant: Nested Lattice Quantization for Matrix Products and LLMs
Title: NestQuant: Nested Lattice Quantization for Matrix Products and LLMs | NestQuant: Nested Lattice Quantization für Matrix-Produkte und LLMs | NestQuant: 母体产品和LLMs的Nasted Lattice量化 2502.09720v2 |
Authors (4): Semyon Savkin, Eitan Porat, Or Ordentlich, Yury Polyanskiy
Post-training quantization (PTQ) has emerged as a critical technique for efficient deployment of large language models (LLMs). This work proposes NestQuant, a novel PTQ scheme for weights and activations that is based on self-similar nested lattices. Recent works have mathematically shown such quantizers to be information-theoretically optimal for low-precision matrix multiplication. We implement a practical low-complexity version of NestQuant based on the Gosset lattice, making it a drop-in quantizer for any matrix multiplication step (e.g., in self-attention, MLP, etc.). For example, NestQuant quantizes the weights, KV-cache, and activations of Llama-3-8B to 4 bits, achieving a perplexity of 6.6 on wikitext2. This represents a reduction of more than 55% in the perplexity gap with respect to the unquantized model (perplexity of 6.14) compared to the state-of-the-art methods: Meta's SpinQuant (perplexity 7.3), OstQuant (7.3), and QuaRot (8.2). Comparisons on bigger models (up to 70B) and on various LLM evaluation benchmarks confirm the uniform superiority of NestQuant.
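The "more than 55%" figure can be checked from the perplexities quoted in the abstract. The sketch below assumes that "perplexity gap" means the quantized perplexity minus the unquantized perplexity; the function name `gap_reduction` is hypothetical and is used here only for illustration.

```python
def gap_reduction(ppl_method, ppl_baseline, ppl_unquantized):
    """Fractional reduction of the perplexity gap to the unquantized model.

    Assumption: gap = quantized perplexity - unquantized perplexity.
    """
    gap_method = ppl_method - ppl_unquantized
    gap_baseline = ppl_baseline - ppl_unquantized
    return 1.0 - gap_method / gap_baseline


# Perplexities on wikitext2 as quoted in the abstract (Llama-3-8B, 4 bits).
UNQUANTIZED = 6.14
NESTQUANT = 6.6
BASELINES = {"SpinQuant": 7.3, "OstQuant": 7.3, "QuaRot": 8.2}

reductions = {
    name: gap_reduction(NESTQUANT, ppl, UNQUANTIZED)
    for name, ppl in BASELINES.items()
}

for name, reduction in reductions.items():
    print(f"{name}: gap reduced by {reduction:.1%}")
```

Under this reading, the smallest reduction is against the strongest baselines (SpinQuant and OstQuant, both at 7.3), which is consistent with the abstract's lower bound.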
Article 502
Title@2025-06-11 (3): DeepForm: Reasoning Large Language Model for Communication System Formulation
Title: DeepForm: Reasoning Large Language Model for Communication System Formulation | DeepForm: Großes Sprachmodell für die Formulierung von Kommunikationssystemen | 深形式:开发通信系统的理由大语言模式 2506.08551v2 |
Authors (6): Panlong Wu, Ting Wang, Yifei Zhong, Haoqi Zhang, Zitong Wang, Fangxin Wang
Communication system formulation is critical for advancing 6G and future wireless technologies, yet it remains a complex, expertise-intensive task. While Large Language Models (LLMs) offer potential, existing general-purpose models often lack the specialized domain knowledge, nuanced reasoning capabilities, and access to high-quality, domain-specific training data required for adapting a general LLM into an LLM specialized for communication system formulation. To bridge this gap, we introduce DeepForm, the first reasoning LLM specialized for automated communication system formulation. We propose the world's first large-scale, open-source dataset meticulously curated for this domain, called the Communication System Formulation Reasoning Corpus (CSFRC). Our framework employs a two-stage training strategy: first, Supervised Fine-Tuning (SFT) with Chain-of-Thought (CoT) data to distill domain knowledge; second, a novel rule-based Reinforcement Learning (RL) algorithm, C-ReMax, based on ReMax, to cultivate advanced modeling capabilities and elicit sophisticated reasoning patterns like self-correction and verification. Extensive experiments demonstrate that our model achieves state-of-the-art performance, significantly outperforming larger proprietary LLMs on diverse scenarios. We will release related resources to foster further research in this area after the paper is accepted.
Article 503
Title@2025-06-11 (3): Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors
Title: Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors | Wer die Störung gestartet hat, sollte sie beenden: Leiten datenfreies Modell Zusammenführen über Task-Vektoren | 开始干预者应当结束干预的任何人:指导通过任务矢量合并无数据模式 2503.08099v2 |
Authors (5): Runxi Cheng, Feng Xiong, Yongxian Wei, Wanyun Zhu, Chun Yuan
Model merging seeks to integrate task-specific expert models into a unified architecture while preserving multi-task generalization capabilities, yet parameter interference between constituent models frequently induces performance degradation. Although prior work has explored many merging strategies, resolving interference without additional data for retraining or test-time computation remains challenging. In this paper, we theoretically demonstrate that the task vectors of the linear layer constitute an approximate linear subspace for its corresponding input. Therefore, we can minimize interference under the guidance of task vectors. Based on this insight, we propose WUDI-Merging (Whoever started the interference shoUld enD It), a simple yet effective model merging method that eliminates interference without any additional data or rescaling coefficients. Comprehensive empirical evaluations across vision and language benchmarks demonstrate our method’s superiority, achieving state-of-the-art performance in data-free model merging scenarios (average 10.9% improvement versus baseline methods) while even outperforming mainstream test-time adaptation approaches by 3.3%, and only very few computing resources are required. The code will be publicly available soon.
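For readers unfamiliar with the terms, the sketch below illustrates task vectors and parameter interference with plain task arithmetic. It is not WUDI-Merging, whose procedure the abstract does not spell out. It assumes the common definition of a task vector as fine-tuned weights minus pretrained weights, and it uses a made-up three-parameter "model" with invented values.

```python
# Toy stand-in for model weights: a flat list of three parameters (values invented).
pretrained = [0.5, -1.0, 2.0]
experts = {
    "task_a": [0.7, -1.0, 1.5],  # hypothetical fine-tuned weights for task A
    "task_b": [0.5, -0.4, 2.5],  # hypothetical fine-tuned weights for task B
}

# Task vector (assumed definition): fine-tuned weights minus pretrained weights.
task_vectors = {
    name: [w - p for w, p in zip(weights, pretrained)]
    for name, weights in experts.items()
}

# Naive data-free merge (plain task arithmetic): add every task vector
# back onto the pretrained weights.
merged = [
    p + sum(tv[i] for tv in task_vectors.values())
    for i, p in enumerate(pretrained)
]

# Interference seen from each task: how far the merged weights sit from
# that task's own expert weights, parameter by parameter.
interference = {
    name: [m - w for m, w in zip(merged, weights)]
    for name, weights in experts.items()
}

print("merged:", merged)
print("interference:", interference)
```

In this toy case the two experts push the third parameter in opposite directions, so the naive merge cancels both updates and leaves that parameter at its pretrained value. Data-free merging methods aim to reduce this kind of conflict.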
Article 504
Title@2025-06-11 (3): Arrival Control in Quasi-Reversible Queueing Systems: Optimization and Reinforcement Learning
Title: Arrival Control in Quasi-Reversible Queueing Systems: Optimization and Reinforcement Learning | Ankunftskontrolle in Quasi-Reversible Queueing Systemen: Optimierung und Verstärkungslernen | 准可更新排队系统抵达控制:优化和强化学习 2505.16353v2 |
Authors (2): Céline Comte, Pascal Moyal
In this paper, we introduce a versatile scheme for optimizing the arrival rates of quasi-reversible queueing systems. First, we propose an alternative definition of quasi-reversibility that encompasses reversibility and highlights the importance of the definition of customer classes. Second, we introduce balanced arrival control policies, which generalize the notion of balanced arrival rates, introduced in the context of Whittle networks, to the much broader class of quasi-reversible queueing systems. We prove that supplementing a quasi-reversible queueing system with a balanced arrival-control policy preserves quasi-reversibility, and we specify the form of the stationary measures. We revisit two canonical examples of quasi-reversible queueing systems, Whittle networks and order-independent queues. Lastly, we focus on the problem of admission control and leverage our results in the frameworks of optimization and reinforcement learning.
Article 505
Title@2025-06-11 (3): Code-Switching Curriculum Learning for Multilingual Transfer in LLMs
Title: Code-Switching Curriculum Learning for Multilingual Transfer in LLMs | Code-Switching Curriculum Learning für den mehrsprachigen Transfer in LLMs | 以LLMM为多种语言转让进行代码转换课程学习 2411.02460v2 |
Authors (5): Haneul Yoo, Cheonbok Park, Sangdoo Yun, Alice Oh, Hwaran Lee
Large language models (LLMs) now exhibit near human-level performance in various tasks, but their performance drops drastically beyond a handful of high-resource languages due to the imbalance in pre-training data. Inspired by the human process of second language acquisition, particularly code-switching (the practice of language alternation in a conversation), we propose code-switching curriculum learning (CSCL) to enhance cross-lingual transfer for LLMs. CSCL mimics the stages of human language learning by progressively training models with a curriculum consisting of 1) token-level code-switching, 2) sentence-level code-switching, and 3) monolingual corpora. Using Qwen 2 as our underlying model, we demonstrate the efficacy of CSCL in improving language transfer to Korean, achieving significant performance gains compared to monolingual continual pre-training methods. Ablation studies reveal that both token- and sentence-level code-switching significantly enhance cross-lingual transfer and that curriculum learning amplifies these effects. We also extend our findings to various languages, including Japanese (high-resource) and Indonesian (low-resource), and to two additional models (Gemma 2 and Phi 3.5). We further show that CSCL mitigates spurious correlations between language resources and safety alignment, presenting a robust, efficient framework for more equitable language transfer in LLMs. We observe that CSCL is effective for low-resource settings where high-quality, monolingual corpora for language transfer are hardly available.
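The first two curriculum stages can be pictured with a toy data-construction sketch. The abstract does not say how the code-switched text is produced, so dictionary substitution for stage 1 and sentence alternation over a parallel corpus for stage 2 are assumptions made here. The tiny lexicon, the example sentences, and the function names are all hypothetical.

```python
import random

# Hypothetical toy English -> Korean lexicon (illustration only).
LEXICON = {"cat": "고양이", "water": "물", "school": "학교"}


def token_level_switch(sentence, lexicon, ratio, rng):
    """Stage 1 (assumed form): swap some known tokens for their translations."""
    switched = []
    for token in sentence.split():
        if token in lexicon and rng.random() < ratio:
            switched.append(lexicon[token])
        else:
            switched.append(token)
    return " ".join(switched)


def sentence_level_switch(source_sentences, target_sentences):
    """Stage 2 (assumed form): alternate languages sentence by sentence."""
    pairs = zip(source_sentences, target_sentences)
    return [src if i % 2 == 0 else tgt for i, (src, tgt) in enumerate(pairs)]


rng = random.Random(0)
english = ["the cat drinks water", "the child goes to school"]
korean = ["고양이가 물을 마신다", "아이는 학교에 간다"]

stage1 = [token_level_switch(s, LEXICON, ratio=1.0, rng=rng) for s in english]
stage2 = sentence_level_switch(english, korean)
stage3 = korean  # Stage 3: monolingual target-language text.

for stage in (stage1, stage2, stage3):
    print(stage)
```

With `ratio=1.0` every token found in the lexicon is replaced. A lower ratio would mix the two languages more sparsely.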
Article 506
Title@2025-06-11 (3): Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations
Title: Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations | Sim-to-Real-Causal-Transfer: Ein metrischer Lernansatz zu kausal-aware Interaktionsdarstellungen | 简单到实际因果转移:从计量学习方法进行体能互动演示 2312.04540v2 |
Authors (5): Ahmad Rahimi, Po-Chien Luan, Yuejiang Liu, Frano Rajič, Alexandre Alahi
Modeling spatial-temporal interactions among neighboring agents is at the heart of multi-agent problems such as motion forecasting and crowd navigation. Despite notable progress, it remains unclear to what extent modern representations can capture the causal relationships behind agent interactions. In this work, we take an in-depth look at the causal awareness of these representations, from computational formalism to real-world practice. First, we cast doubt on the notion of non-causal robustness studied in the recent CausalAgents benchmark. We show that recent representations are already partially resilient to perturbations of non-causal agents, and yet modeling indirect causal effects involving mediator agents remains challenging. To address this challenge, we introduce a metric learning approach that regularizes latent representations with causal annotations. Our controlled experiments show that this approach not only leads to higher degrees of causal awareness but also yields stronger out-of-distribution robustness. To further operationalize it in practice, we propose a sim-to-real causal transfer method via cross-domain multi-task learning. Experiments on pedestrian datasets show that our method can substantially boost generalization, even in the absence of real-world causal annotations. We hope our work provides a new perspective on the challenges and pathways towards causally-aware representations of multi-agent interactions. Our code is available at https://github.com/vita-epfl/CausalSim2Real.
Article 507
Title@2025-06-11 (3): Scoop-and-Toss: Dynamic Object Collection for Quadrupedal Systems
Title: Scoop-and-Toss: Dynamic Object Collection for Quadrupedal Systems | Scoop-and-Toss: Dynamische Objektsammlung für Vierradsysteme | Scooop-tos:四制系统动态对象收藏 2506.09406v1 |
Authors (3): Minji Kang, Chanwoo Baek, Yoonsang Lee
Quadruped robots have made significant advances in locomotion, extending their capabilities from controlled environments to real-world applications. Beyond movement, recent work has explored loco-manipulation using the legs to perform tasks such as pressing buttons or opening doors. While these efforts demonstrate the feasibility of leg-based manipulation, most have focused on relatively static tasks. In this work, we propose a framework that enables quadruped robots to collect objects without additional actuators by leveraging the agility of their legs. By attaching a simple scoop-like add-on to one leg, the robot can scoop objects and toss them into a collection tray mounted on its back. Our method employs a hierarchical policy structure comprising two expert policies (one for scooping and tossing, and one for approaching object positions) and a meta-policy that dynamically switches between them. The expert policies are trained separately, followed by meta-policy training for coordinated multi-object collection. This approach demonstrates how quadruped legs can be effectively utilized for dynamic object manipulation, expanding their role beyond locomotion.
Article 508
Title@2025-06-11 (3): Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
Title: Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps | Feinabstimmungs-Diffusions-Politiken mit Backpropagation durch Diffusion Zeitschritte | 微调通过扩散时间步骤进行反向宣传的传播政策 2505.10482v2 |
Authors (5): Ningyuan Yang, Jiaxuan Gao, Feng Gao, Yi Wu, Chao Yu
Diffusion policies, widely adopted in decision-making scenarios such as robotics, gaming and autonomous driving, are capable of learning diverse skills from demonstration data due to their high representation power. However, the sub-optimal and limited coverage of demonstration data could lead to diffusion policies that generate sub-optimal trajectories and even catastrophic failures. While reinforcement learning (RL)-based fine-tuning has emerged as a promising solution to address these limitations, existing approaches struggle to effectively adapt Proximal Policy Optimization (PPO) to diffusion models. This challenge stems from the computational intractability of action likelihood estimation during the denoising process, which leads to complicated optimization objectives. In our experiments starting from randomly initialized policies, we find that online tuning of Diffusion Policies demonstrates much lower sample efficiency compared to directly applying PPO on MLP policies (MLP+PPO). To address these challenges, we introduce NCDPO, a novel framework that reformulates Diffusion Policy as a noise-conditioned deterministic policy. By treating each denoising step as a differentiable transformation conditioned on pre-sampled noise, NCDPO enables tractable likelihood evaluation and gradient backpropagation through all diffusion timesteps. Our experiments demonstrate that NCDPO achieves sample efficiency comparable to MLP+PPO when training from scratch, outperforming existing methods in both sample efficiency and final performance across diverse benchmarks, including continuous robot control and multi-agent game scenarios. Furthermore, our experimental results show that our method is robust to the number of denoising timesteps in the Diffusion Policy.
Article 509
Title@2025-06-11 (3): What Can RL Bring to VLA Generalization? An Empirical Study
Title: What Can RL Bring to VLA Generalization? An Empirical Study | Was kann RL zur VLA-Verallgemeinerung bringen? Eine empirische Studie | RL能带给VLA的概括化带来什么?经验研究。 2505.19789v2 |
Authors (8): Jijia Liu, Feng Gao, Bingwen Wei, Xinlei Chen, Qingmin Liao, Yi Wu, Chao Yu, Yu Wang
Large Vision-Language Action (VLA) models have shown significant potential for embodied AI. However, their predominant training via supervised fine-tuning (SFT) limits generalization due to susceptibility to compounding errors under distribution shifts. Reinforcement learning (RL) offers a path to overcome these limitations by optimizing for task objectives via trial-and-error, yet a systematic understanding of its specific generalization benefits for VLAs compared to SFT is lacking. To address this, our study introduces a comprehensive benchmark for evaluating VLA generalization and systematically investigates the impact of RL fine-tuning across diverse visual, semantic, and execution dimensions. Our extensive experiments reveal that RL fine-tuning, particularly with PPO, significantly enhances generalization in semantic understanding and execution robustness over SFT, while maintaining comparable visual robustness. We identify PPO as a more effective RL algorithm for VLAs than LLM-derived methods like DPO and GRPO. We also develop a simple recipe for efficient PPO training on VLAs, and demonstrate its practical utility for improving VLA generalization. The project page is at https://rlvla.github.io
Article 510
Title@2025-06-11 (3): AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI)
Title: AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI) | AdversariaL attacK sAfety alignment(ALKALI): Sicherung von LLMs durch GRACE: Geometrische Darstellung-Bewusst Kontrastive Verbesserung- Einführung von Adversarial Vulnerability Quality Index (AVQI) | Adversarial attacK sAfety alignment(ALKALI):通过GRACE来保护LLMS:几何代表-提高软件对抗性增强-采用反脆弱性质量指数(AVQI) 2506.08885v2 |
Authors (7): Danush Khanna, Krishna Kumar, Basab Ghosh, Vinija Jain, Vasu Sharma, Aman Chadha, Amitava Das
Adversarial threats against LLMs are escalating faster than current defenses can adapt. We expose a critical geometric blind spot in alignment: adversarial prompts exploit latent camouflage, embedding perilously close to the safe representation manifold while encoding unsafe intent, thereby evading surface-level defenses like Direct Preference Optimization (DPO), which remain blind to the latent geometry. We introduce ALKALI, the first rigorously curated adversarial benchmark and the most comprehensive to date, spanning 9,000 prompts across three macro categories, six subtypes, and fifteen attack families. Evaluation of 21 leading LLMs reveals alarmingly high Attack Success Rates (ASRs) across both open- and closed-source models, exposing an underlying vulnerability we term latent camouflage, a structural blind spot where adversarial completions mimic the latent geometry of safe ones. To mitigate this vulnerability, we introduce GRACE (Geometric Representation-Aware Contrastive Enhancement), an alignment framework coupling preference learning with latent space regularization. GRACE enforces two constraints: latent separation between safe and adversarial completions, and adversarial cohesion among unsafe and jailbreak behaviors. These operate over layerwise pooled embeddings guided by a learned attention profile, reshaping internal geometry without modifying the base model, and achieve up to 39% ASR reduction. Moreover, we introduce AVQI, a geometry-aware metric that quantifies latent alignment failure via cluster separation and compactness. AVQI reveals when unsafe completions mimic the geometry of safe ones, offering a principled lens into how models internally encode safety. We make the code publicly available at https://anonymous.4open.science/r/alkali-B416/README.md.
Article 511
Title@2025-06-11 (3): Chem42: a Family of chemical Language Models for Target-aware Ligand Generation
Title: Chem42: a Family of chemical Language Models for Target-aware Ligand Generation | Chem42: eine Familie chemischer Sprachmodelle für die zielbewusste Ligand-Generation | Chem42: “ 目标觉知长者 “ 化学语言模式家庭 2503.16563v2 |
Authors (7): Aahan Singh, Engin Tekin, Maryam Nadeem, Nancy A. ElNaker, Mohammad Amaan Sayeed, Natalia Vassilieva, Boulbaba Ben Amor
Revolutionizing drug discovery demands more than just understanding molecular interactions - it requires generative models that can design novel ligands tailored to specific biological targets. While chemical Language Models (cLMs) have made strides in learning molecular properties, most fail to incorporate target-specific insights, restricting their ability to drive de-novo ligand generation. Chem42, a cutting-edge family of generative chemical Language Models, is designed to bridge this gap. By integrating atomic-level interactions with multimodal inputs from Prot42, a complementary protein Language Model, Chem42 achieves a sophisticated cross-modal representation of molecular structures, interactions, and binding patterns. This innovative framework enables the creation of structurally valid, synthetically accessible ligands with enhanced target specificity. Evaluations across diverse protein targets confirm that Chem42 surpasses existing approaches in chemical validity, target-aware design, and predicted binding affinity. By reducing the search space of viable drug candidates, Chem42 could accelerate the drug discovery pipeline, offering a powerful generative AI tool for precision medicine. Our Chem42 models set a new benchmark in molecule property prediction, conditional molecule generation, and target-aware ligand design. The models are publicly available at huggingface.co/inceptionai.
Article 512
Title@2025-06-11 (3): Synergizing Reinforcement Learning and Genetic Algorithms for Neural Combinatorial Optimization
Title: Synergizing Reinforcement Learning and Genetic Algorithms for Neural Combinatorial Optimization | Synergisierendes Verstärkungslernen und genetische Algorithmen für die neurale Kombinatorialoptimierung | 实现神经组合优化的强化强化学习和遗传数值同步化 2506.09404v1 |
Authors (5): Shengda Gu, Kai Li, Junliang Xing, Yifan Zhang, Jian Cheng
Combinatorial optimization problems are notoriously challenging due to their discrete structure and exponentially large solution space. Recent advances in deep reinforcement learning (DRL) have enabled learning heuristics directly from data. However, DRL methods often suffer from limited exploration and susceptibility to local optima. On the other hand, evolutionary algorithms such as Genetic Algorithms (GAs) exhibit strong global exploration capabilities but are typically sample-inefficient and computationally intensive. In this work, we propose the Evolutionary Augmentation Mechanism (EAM), a general, plug-and-play framework that synergizes the learning efficiency of DRL with the global search power of GAs. EAM operates by generating solutions from a learned policy and refining them through domain-specific genetic operations such as crossover and mutation. These evolved solutions are then selectively reinjected into the policy training loop, thereby enhancing exploration and accelerating convergence. We further provide a theoretical analysis that establishes an upper bound on the KL divergence between the evolved solution distribution and the policy distribution, ensuring stable and effective policy updates. EAM is model-agnostic and can be seamlessly integrated with state-of-the-art DRL solvers such as the Attention Model, POMO, and SymNCO. Extensive results on benchmark problems (e.g., TSP, CVRP, PCTSP, and OP) demonstrate that EAM significantly improves both solution quality and training efficiency over competitive baselines.
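The "generate, then refine with crossover and mutation" loop can be sketched on a small TSP instance. This is an illustrative sketch, not the authors' EAM implementation: a uniform random tour sampler stands in for the learned policy, order crossover and swap mutation are common textbook operators chosen here by assumption, and every function name is hypothetical.

```python
import math
import random


def tour_length(tour, coords):
    """Total length of the closed tour visiting the cities in order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def order_crossover(parent1, parent2, rng):
    """Keep a slice of parent1 in place; fill the rest in parent2's order."""
    i, j = sorted(rng.sample(range(len(parent1)), 2))
    segment = parent1[i:j + 1]
    kept = set(segment)
    rest = [city for city in parent2 if city not in kept]
    return rest[:i] + segment + rest[i:]


def swap_mutation(tour, rng):
    """Return a copy of the tour with two randomly chosen cities exchanged."""
    tour = list(tour)
    i, j = rng.sample(range(len(tour)), 2)
    tour[i], tour[j] = tour[j], tour[i]
    return tour


def sample_from_policy(n_cities, rng):
    """Placeholder for a learned policy: here, a uniformly random tour."""
    tour = list(range(n_cities))
    rng.shuffle(tour)
    return tour


def evolve(population, coords, rng, generations=50):
    """Refine sampled tours; a child replaces the worst tour only if shorter."""
    population = sorted(population, key=lambda t: tour_length(t, coords))
    for _ in range(generations):
        parent1, parent2 = rng.sample(population[: len(population) // 2], 2)
        child = swap_mutation(order_crossover(parent1, parent2, rng), rng)
        if tour_length(child, coords) < tour_length(population[-1], coords):
            population[-1] = child
            population.sort(key=lambda t: tour_length(t, coords))
    return population


rng = random.Random(0)
coords = [(rng.random(), rng.random()) for _ in range(12)]
initial = [sample_from_policy(len(coords), rng) for _ in range(20)]
evolved = evolve(initial, coords, rng)
print("best sampled tour:", min(tour_length(t, coords) for t in initial))
print("best evolved tour:", min(tour_length(t, coords) for t in evolved))
```

In a training loop of the kind the abstract describes, tours from `evolved` that improve on the sampled ones would be fed back as additional training signal for the policy. That step depends on the specific DRL solver, so it is left out of this sketch.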
Article 513
Title@2025-06-11 (3): Pre-trained Large Language Models Learn Hidden Markov Models In-context
Title: Pre-trained Large Language Models Learn Hidden Markov Models In-context | Vortrainierte große Sprachmodelle lernen versteckte Markov-Modelle In-Kontext | 受过培训的大型语言模型学习隐藏Markov模型 2506.07298v2 |
Authors (5): Yijia Dai, Zhaolin Gao, Yahya Sattar, Sarah Dean, Jennifer J. Sun
Hidden Markov Models (HMMs) are foundational tools for modeling sequential data with latent Markovian structure, yet fitting them to real-world data remains computationally challenging. In this work, we show that pre-trained large language models (LLMs) can effectively model data generated by HMMs via in-context learning (ICL), their ability to infer patterns from examples within a prompt. On a diverse set of synthetic HMMs, LLMs achieve predictive accuracy approaching the theoretical optimum. We uncover novel scaling trends influenced by HMM properties, and offer theoretical conjectures for these empirical observations. We also provide practical guidelines for scientists on using ICL as a diagnostic tool for complex data. On real-world animal decision-making tasks, ICL achieves competitive performance with models designed by human experts. To our knowledge, this is the first demonstration that ICL can learn and predict HMM-generated sequences, an advance that deepens our understanding of in-context learning in LLMs and establishes its potential as a powerful tool for uncovering hidden structure in complex scientific data.
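One natural reading of "theoretical optimum" is the next-observation predictor that knows the true HMM parameters, which forward filtering computes. The abstract does not define the optimum, so this reading is an assumption. The two-state HMM below uses invented parameters, and the function name is hypothetical.

```python
def next_observation_distribution(observations, init, trans, emit):
    """P(o_{t+1} | o_1..o_t) for a known HMM, via forward filtering.

    init[s]      : probability of starting in state s
    trans[s][s2] : probability of moving from state s to state s2
    emit[s][o]   : probability of emitting symbol o in state s
    """
    n_states = len(init)
    belief = list(init)  # predicted state distribution before each observation
    for obs in observations:
        # Condition on the observed symbol, then renormalise.
        belief = [belief[s] * emit[s][obs] for s in range(n_states)]
        total = sum(belief)
        belief = [b / total for b in belief]
        # Propagate one step through the transition matrix.
        belief = [
            sum(belief[s] * trans[s][s2] for s in range(n_states))
            for s2 in range(n_states)
        ]
    n_symbols = len(emit[0])
    return [
        sum(belief[s] * emit[s][o] for s in range(n_states))
        for o in range(n_symbols)
    ]


# Invented two-state, two-symbol HMM with "sticky" states.
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]
EMIT = [[0.8, 0.2], [0.2, 0.8]]

prior = next_observation_distribution([], INIT, TRANS, EMIT)
after_zeros = next_observation_distribution([0, 0, 0], INIT, TRANS, EMIT)
print("no context:", prior)
print("after three 0s:", after_zeros)
```

An in-context learner can be compared against this predictor by feeding it the same observation prefix and measuring how often its next-symbol guess matches the filtered argmax.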
Article 514
Title@2025-06-11 (3): LLM Enhancers for GNNs: An Analysis from the Perspective of Causal Mechanism Identification
Title: LLM Enhancers for GNNs: An Analysis from the Perspective of Causal Mechanism Identification | LLM Enhancers for GNNs: Eine Analyse aus der Perspektive der Identifizierung von Kausalmechanismen | GNN的LLM增强者:从因果关系机制的确定角度进行分析 2505.08265v3 |
Authors (6): Hang Gao, Wenxuan Huang, Fengge Wu, Junsuo Zhao, Changwen Zheng, Huaping Liu
The use of large language models (LLMs) as feature enhancers to optimize node representations, which are then used as inputs for graph neural networks (GNNs), has shown significant potential in graph representation learning. However, the fundamental properties of this approach remain underexplored. To address this gap, we conduct a more in-depth analysis based on the interchange intervention method. First, we construct a synthetic graph dataset with controllable causal relationships, enabling precise manipulation of semantic relationships and causal modeling to provide data for analysis. Using this dataset, we conduct interchange interventions to examine the deeper properties of LLM enhancers and GNNs, uncovering their underlying logic and internal mechanisms. Building on the analytical results, we design a plug-and-play optimization module to improve the information transfer between LLM enhancers and GNNs. Experiments across multiple datasets and models validate the proposed module.
Article 515
Title@2025-06-11 (3): A theoretical basis for model collapse in recursive training
Title: A theoretical basis for model collapse in recursive training | Eine theoretische Grundlage für Modellkollaps in rekursiver Ausbildung | 循环培训模式崩溃的理论基础 2506.09401v1 |
Authors (1): Vivek Shripad Borkar
It is known that recursive training from generative models can lead to the so-called 'collapse' of the simulated probability distribution. This note shows that one in fact gets two different asymptotic behaviours depending on whether an external source, howsoever minor, is also contributing samples.
Article 516
Title@2025-06-11 (3): Efficient Prediction of SO(3)-Equivariant Hamiltonian Matrices via SO(2) Local Frames
Title: Efficient Prediction of SO(3)-Equivariant Hamiltonian Matrices via SO(2) Local Frames | Effiziente Vorhersage von SO(3)-Equivarienten Hamiltonian Matrices über SO(2) Local Frames | 通过SO(2)地方框架有效预测SO(3)-等质汉密尔顿母体 2506.09398v1 |
Authors (5): Haiyang Yu, Yuchao Lin, Xuan Zhang, Xiaofeng Qian, Shuiwang Ji
We consider the task of predicting Hamiltonian matrices to accelerate electronic structure calculations, which play an important role in physics, chemistry, and materials science. Motivated by the inherent relationship between the off-diagonal blocks of the Hamiltonian matrix and the SO(2) local frame, we propose a novel and efficient network, called QHNetV2, that achieves global SO(3) equivariance without the costly SO(3) Clebsch-Gordan tensor products. This is achieved by introducing a set of new efficient and powerful SO(2)-equivariant operations and performing all off-diagonal feature updates and message passing within SO(2) local frames, thereby eliminating the need for SO(3) tensor products. Moreover, a continuous SO(2) tensor product is performed within the SO(2) local frame at each node to fuse node features, mimicking the symmetric contraction operation. Extensive experiments on the large QH9 and MD17 datasets demonstrate that our model achieves superior performance across a wide range of molecular structures and trajectories, highlighting its strong generalization capability. The proposed SO(2) operations on SO(2) local frames offer a promising direction for scalable and symmetry-aware learning of electronic structures. Our code will be released as part of the AIRS library https://github.com/divelab/AIRS.
Article 517
Title@2025-06-11 (3): The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent
Title: The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent | Der Computational Advantage of Depth: High-Dimensional Hierarchical Funktionen mit gradienten Abstieg lernen | 深度的计算优势:学习带有渐变后代的高多元等级功能 2502.13961v3 |
Authors (4): Yatin Dandi, Luca Pesce, Lenka Zdeborová, Florent Krzakala
Understanding the advantages of deep neural networks trained by gradient descent (GD) compared to shallow models remains an open theoretical challenge. In this paper, we introduce a class of target functions (single and multi-index Gaussian hierarchical targets) that incorporate a hierarchy of latent subspace dimensionalities. This framework enables us to analytically study the learning dynamics and generalization performance of deep networks compared to shallow ones in the high-dimensional limit. Specifically, our main theorem shows that feature learning with GD successively reduces the effective dimensionality, transforming a high-dimensional problem into a sequence of lower-dimensional ones. This enables learning the target function with drastically fewer samples than with shallow networks. While the results are proven in a controlled training setting, we also discuss more common training procedures and argue that they learn through the same mechanisms.
Article 518
Title@2025-06-11 (3): SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving
Title: SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving | SLED: Ein spekulatives LLM-Decoding-Framework für effizientes Edge Serving | SLED: 有效边缘服务投机性LLM代谢框架 2506.09397v1 |
Authors (5): Xiangchen Li, Dimitrios Spatharakis, Saeid Ghafouri, Jiakun Fan, Dimitrios Nikolopoulos
Regardless of the advancements in device capabilities, efficient inference of advanced large language models (LLMs) at the edge remains challenging due to limited device memory and power constraints. Existing strategies, such as aggressive quantization, pruning, or remote inference, trade accuracy for efficiency or lead to substantial cost burdens. This position paper introduces a new approach that leverages speculative decoding, previously viewed primarily as a decoding acceleration technique for autoregressive generation of LLMs, as a promising approach specifically adapted for edge computing by orchestrating computation across heterogeneous devices. We propose SLED, a method that allows lightweight edge devices to draft multiple candidate tokens locally using diverse draft models, while a single, shared edge server efficiently batches and verifies the tokens utilizing a more precise target model. This approach supports device heterogeneity and reduces server-side memory footprint by avoiding the need to deploy multiple target models. Our initial experiments with Jetson Orin Nano, Raspberry Pi 5, and an RTX 6000 edge server indicate substantial benefits: significantly reduced latency, improved energy efficiency, and increased concurrent inference sessions, all without sacrificing model accuracy.
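The draft-then-verify split described above can be illustrated with a minimal sketch of greedy speculative decoding. This is not the SLED implementation: the function names, the toy next-token callables, and the greedy acceptance rule are all assumptions made for illustration, and a real server would verify the drafted tokens in one batched forward pass rather than one call per token.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # hypothetical "model": context -> greedy next token


def draft_tokens(draft_model: NextToken, prefix: List[Token], k: int) -> List[Token]:
    """Edge device side: propose k tokens autoregressively with the small draft model."""
    out: List[Token] = []
    for _ in range(k):
        out.append(draft_model(prefix + out))
    return out


def verify_tokens(target_model: NextToken, prefix: List[Token], draft: List[Token]) -> List[Token]:
    """Server side: keep the longest draft prefix the target agrees with.

    On the first mismatch the target's own token replaces the draft token.
    If every draft token passes, one extra target token is appended.
    """
    accepted: List[Token] = []
    for tok in draft:
        expected = target_model(prefix + accepted)
        if tok != expected:
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    accepted.append(target_model(prefix + accepted))
    return accepted


def generate(draft_model: NextToken, target_model: NextToken,
             prompt: List[Token], max_new_tokens: int, k: int = 4) -> List[Token]:
    """Alternate drafting and verification until enough tokens are produced."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new_tokens:
        draft = draft_tokens(draft_model, seq, k)
        seq.extend(verify_tokens(target_model, seq, draft))
    return seq[: len(prompt) + max_new_tokens]


# Toy models (assumptions): the target counts upward modulo 10;
# the draft model agrees except right after token 4.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

Under greedy verification the output matches what the target model alone would generate, so the draft model affects only how many target calls are needed, not the result.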
Article 519
Title@2025-06-11 (3): A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities
Title: A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities | Ein genauerer Blick auf TabPFN v2: seine Stärken verstehen und seine Fähigkeiten erweitern | 更仔细地看看TabPFN v 2:了解其力量并扩大其能力 2502.17361v2 |
Authors (3): Han-Jia Ye, Si-Yang Liu, Wei-Lun Chao
Tabular datasets are inherently heterogeneous, presenting significant challenges for developing pre-trained foundation models. The recently introduced transformer-based Tabular Prior-data Fitted Network v2 (TabPFN v2) achieves unprecedented in-context learning performance across diverse downstream datasets, marking a pivotal advancement in tabular foundation models. In this paper, we take a closer look at TabPFN v2 to examine how it effectively handles heterogeneity and achieves high predictive accuracy, and to explore how its limitations in high-dimensional, many-category, and large-scale tasks can be mitigated. We find that TabPFN v2 can infer attribute relationships even when provided with randomized attribute token inputs, eliminating the need to explicitly learn dataset-specific attribute embeddings to address heterogeneity. We further show that TabPFN v2 can be transformed into a feature extractor, revealing its ability to construct a highly separable feature space for accurate predictions. Lastly, we demonstrate that TabPFN v2’s limitations can be addressed through a test-time divide-and-conquer strategy, enabling scalable inference without requiring re-training. By uncovering the mechanisms behind TabPFN v2’s success and introducing strategies to extend its applicability, this study offers key insights into the design of future tabular foundation models.
Article 520
Title@2025-06-11 (3): MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models
Title: MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models | MedChat: Multi-Agenten-Framework für multimodale Diagnosen mit großen Sprachmodellen | MedChat:使用大语言模式的多语言多模式诊断多机构框架 2506.07400v2 |
Authors (9): Philip R. Liu, Sparsh Bansal, Jimmy Dinh, Aditya Pawar, Ramani Satishkumar, Shail Desai, Neeraj Gupta, Xin Wang, Shu Hu
The integration of deep learning-based glaucoma detection with large language models (LLMs) presents an automated strategy to mitigate ophthalmologist shortages and improve clinical reporting efficiency. However, applying general LLMs to medical imaging remains challenging due to hallucinations, limited interpretability, and insufficient domain-specific medical knowledge, which can potentially reduce clinical accuracy. Although recent approaches combining imaging models with LLM reasoning have improved reporting, they typically rely on a single generalist agent, restricting their capacity to emulate the diverse and complex reasoning found in multidisciplinary medical teams. To address these limitations, we propose MedChat, a multi-agent diagnostic framework and platform that combines specialized vision models with multiple role-specific LLM agents, all coordinated by a director agent. This design enhances reliability, reduces hallucination risk, and enables interactive diagnostic reporting through an interface tailored for clinical review and educational use. Code available at https://github.com/Purdue-M2/MedChat.
Article 521
Title@2025-06-11 (3): Spatial-RAG: Spatial Retrieval Augmented Generation for Real-World Geospatial Reasoning Questions
Title: Spatial-RAG: Spatial Retrieval Augmented Generation for Real-World Geospatial Reasoning Questions | Raum-RAG: Raum-Retrieval Augmented Generation für Real-World Geospatial Reasoning Fragen | 空间-RAG:世界现实地理空间原因问题空间检索增加一代 2502.18470v5 |
Authors (6): Dazhou Yu, Riyang Bao, Ruiyu Ning, Jinghong Peng, Gengchen Mai, Liang Zhao
Answering real-world geospatial questions–such as finding restaurants along a travel route or amenities near a landmark–requires reasoning over both geographic relationships and semantic user intent. However, existing large language models (LLMs) lack spatial computing capabilities and access to up-to-date, ubiquitous real-world geospatial data, while traditional geospatial systems fall short in interpreting natural language. To bridge this gap, we introduce Spatial-RAG, a Retrieval-Augmented Generation (RAG) framework designed for geospatial question answering. Spatial-RAG integrates structured spatial databases with LLMs via a hybrid spatial retriever that combines sparse spatial filtering and dense semantic matching. It formulates the answering process as a multi-objective optimization over spatial and semantic relevance, identifying Pareto-optimal candidates and dynamically selecting the best response based on user intent. Experiments across multiple tourism and map-based QA datasets show that Spatial-RAG significantly improves accuracy, precision, and ranking performance over strong baselines.
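The multi-objective step mentioned above can be sketched as a Pareto filter over two scores. This is only an illustration under stated assumptions: the candidate tuples, the scores, and the weighted-sum tie-breaker `pick` are hypothetical, and the abstract does not say how Spatial-RAG itself performs the final intent-based selection.

```python
from typing import List, Tuple

# (name, spatial_score, semantic_score); higher is better for both (assumed convention)
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is at least as good on both scores and strictly better on one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def pick(front: List[Candidate], spatial_weight: float) -> Candidate:
    """Hypothetical intent model: a single weight trading spatial against semantic relevance."""
    return max(front, key=lambda c: spatial_weight * c[1] + (1.0 - spatial_weight) * c[2])


candidates = [
    ("A", 0.9, 0.2),   # very close to the route, weak semantic match
    ("B", 0.5, 0.6),
    ("C", 0.4, 0.5),   # dominated by B on both scores
    ("D", 0.1, 0.95),  # far away, strong semantic match
]
front = pareto_front(candidates)
```

With these made-up scores, `C` drops out of the front, a spatially focused query (`spatial_weight=0.8`) selects `A`, and a semantically focused one (`spatial_weight=0.1`) selects `D`.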
Article 522
Title@2025-06-11 (3): Coil2Coil: Self-supervised MR image denoising using phased-array coil images
Title: Coil2Coil: Self-supervised MR image denoising using phased-array coil images | Coil2Coil: Selbstüberwachtes MR-Bild, das mit Hilfe von Phased-Array-Coil-Bildern dämpft | Coil2Coil:使用相相继阵状圆圈图像自行监督的 MR 图像除去污染 2208.07552v2 |
Authors (6): Juhyung Park, Dongwon Park, Sooyeon Ji, Hyeong-Geol Shin, Se Young Chun, Jongho Lee
Denoising of magnetic resonance images is beneficial in improving the quality of low signal-to-noise ratio images. Recently, denoising using deep neural networks has demonstrated promising results. Most of these networks, however, utilize supervised learning, which requires large training images of noise-corrupted and clean image pairs. Obtaining training images, particularly clean images, is expensive and time-consuming. Hence, methods such as Noise2Noise (N2N) that require only pairs of noise-corrupted images have been developed to reduce the burden of obtaining training datasets. In this study, we propose a new self-supervised denoising method, Coil2Coil (C2C), that does not require the acquisition of clean images or paired noise-corrupted images for training. Instead, the method utilizes multichannel data from phased-array coils to generate training images. First, it divides and combines multichannel coil images into two images, one for input and the other for label. Then, they are processed to impose noise independence and sensitivity normalization such that they can be used for the training images of N2N. For inference, the method inputs a coil-combined image (e.g., DICOM image), enabling a wide application of the method. When evaluated using synthetic noise-added images, C2C shows the best performance against several self-supervised methods, reporting comparable outcomes to supervised methods. When testing the DICOM images, C2C successfully denoised real noise without showing structure-dependent residuals in the error maps. Because of the significant advantage of not requiring additional scans for clean or paired images, the method can be easily utilized for various clinical applications.
Article 523
Title@2025-06-11 (3): Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation
Title: Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation | Die neuen Diffusionsmodelle: Vom Generativen Pre-Training bis zur Ein-Schritt-Generation | 重新审视传播模式:从产生培训前培训到单步一代 2506.09376v1 |
Authors (2): Bowen Zheng, Tianming Yang
Diffusion distillation is a widely used technique to reduce the sampling cost of diffusion models, yet it often requires extensive training, and the student performance tends to be degraded. Recent studies show that incorporating a GAN objective may alleviate these issues, yet the underlying mechanism remains unclear. In this work, we first identify a key limitation of distillation: mismatched step sizes and parameter numbers between the teacher and the student model lead them to converge to different local minima, rendering direct imitation suboptimal. We further demonstrate that a standalone GAN objective, without relying on a distillation loss, overcomes this limitation and is sufficient to convert diffusion models into efficient one-step generators. Based on this finding, we propose that diffusion training may be viewed as a form of generative pre-training, equipping models with capabilities that can be unlocked through lightweight GAN fine-tuning. Supporting this view, we create a one-step generation model by fine-tuning a pre-trained model with 85% of parameters frozen, achieving strong performance with only 0.2M images and near-SOTA results with 5M images. We further present a frequency-domain analysis that may explain the one-step generative capability gained in diffusion training. Overall, our work provides a new perspective for diffusion training, highlighting its role as a powerful generative pre-training process, which can be the basis for building efficient one-step generation models.
Article 524
Title@2025-06-11 (3): Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining
Title: Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining | Modality-Balancing Preference Optimierung großer multimodaler Modelle durch disversariale Negative Mining | 通过反反向负负采矿优化大型多式联运模型 2506.08022v2 |
Authors (7): Chenxi Liu, Tianyi Xiong, Ruibo Chen, Yihan Wu, Junfeng Guo, Tianyi Zhou, Heng Huang
The task adaptation and alignment of Large Multimodal Models (LMMs) have been significantly advanced by instruction tuning and further strengthened by recent preference optimization. Yet, most LMMs still suffer from severe modality imbalance during reasoning, i.e., language prior biases outweighing visual inputs, which bottlenecks their generalization to downstream tasks and causes hallucinations. However, existing preference optimization approaches for LMMs do not focus on restraining the internal biases of their Large Language Model (LLM) backbones when curating the training data. Moreover, they heavily rely on offline data and lack the capacity to explore diverse responses adaptive to dynamic distributional shifts during training. Meanwhile, Group Relative Policy Optimization (GRPO), a recent method using online-generated data and verified rewards to improve reasoning capabilities, remains largely underexplored in LMM alignment. In this paper, we propose a novel preference learning framework, Modality-Balancing Preference Optimization (MBPO), to address the modality imbalance in LMMs. MBPO constructs a more effective offline preference dataset by generating hard negatives, i.e., rejected responses misled by LLM biases due to limited usage of visual information, through adversarial perturbation of input images. Moreover, MBPO leverages the easy-to-verify nature of close-ended tasks to generate online responses with verified rewards. GRPO is then employed to train the model with offline-online hybrid data. Extensive experiments demonstrate that MBPO can enhance LMM performance on challenging vision-language tasks and effectively reduce hallucinations.
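The "verified rewards" plus GRPO combination can be made concrete with a small sketch of group-relative advantages: several responses are sampled for one closed-ended question, each gets a checkable reward, and rewards are normalised within the group. The exact-match reward, the function names, and the use of the population standard deviation are assumptions for illustration, not details taken from MBPO.

```python
from statistics import mean, pstdev
from typing import List


def verified_reward(response: str, gold: str) -> float:
    """Hypothetical verifier for a closed-ended task: 1.0 on exact match, else 0.0."""
    return 1.0 if response.strip().lower() == gold.strip().lower() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalise each reward by the mean and spread of its own sampling group.

    eps keeps the division defined when every response gets the same reward,
    in which case all advantages are zero and the group carries no signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one multiple-choice question whose gold answer is "B".
responses = ["B", "A", "c", " b "]
rewards = [verified_reward(r, "B") for r in responses]
advantages = group_relative_advantages(rewards)
```

Because the baseline is the group mean, no separate value network is needed; correct answers receive positive advantages and incorrect ones negative advantages of equal total magnitude.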
Article 525
Title@2025-06-11 (3): Effective Regularization Through Loss-Function Metalearning
Title: Effective Regularization Through Loss-Function Metalearning | Effektive Regularisierung durch Loss-Function Metalearning | 通过损失 – – 职能计算学习实现有效正规化 2010.00788v5 |
Authors (3): Santiago Gonzalez, Xin Qiu, Risto Miikkulainen
Evolutionary computation can be used to optimize several different aspects of neural network architectures. For instance, the TaylorGLO method discovers novel, customized loss functions, resulting in improved performance, faster training, and improved data utilization. A likely reason is that such functions discourage overfitting, leading to effective regularization. This paper demonstrates theoretically that this is indeed the case for TaylorGLO. Learning rule decomposition reveals that evolved loss functions balance two factors: the pull toward zero error, and a push away from it to avoid overfitting. This is a general principle that may be used to understand other regularization techniques as well (as demonstrated in this paper for label smoothing). The theoretical analysis leads to a constraint that can be utilized to find more effective loss functions in practice; the mechanism also results in networks that are more robust (as demonstrated in this paper with adversarial inputs). The analysis in this paper thus constitutes a first step towards understanding regularization, and demonstrates the power of evolutionary neural architecture search in general.
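The "pull toward zero error, push away from it" balance is easiest to see in label smoothing, the other technique the abstract names. The sketch below is plain label-smoothed cross-entropy, not TaylorGLO; the function names and the smoothing value `alpha=0.1` are illustrative assumptions.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_cross_entropy(logits: List[float], target: int, alpha: float = 0.1) -> float:
    """Cross-entropy against a target that puts (1 - alpha) on the true class
    and spreads alpha uniformly over all k classes."""
    k = len(logits)
    probs = softmax(logits)
    loss = 0.0
    for i, p in enumerate(probs):
        q = (1.0 - alpha) * (1.0 if i == target else 0.0) + alpha / k
        loss -= q * math.log(p)
    return loss


confident = [10.0, 0.0, 0.0]  # nearly all probability mass on class 0
plain = smoothed_cross_entropy(confident, target=0, alpha=0.0)
smoothed = smoothed_cross_entropy(confident, target=0, alpha=0.1)
```

For the over-confident prediction, the unsmoothed loss is close to zero while the smoothed loss stays clearly positive: the uniform part of the target penalises driving the other class probabilities to zero, which is the "push away" term in this simple case.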
Article 526
Title@2025-06-11 (3): Low-resource domain adaptation while minimizing energy and hardware resource consumption
Title: Low-resource domain adaptation while minimizing energy and hardware resource consumption | Low-Resource-Domänenanpassung bei Minimierung des Energie- und Hardware-Ressourcenverbrauchs | 低资源领域的适应,同时尽量减少能源和硬件资源的消耗 2506.08433v2 |
Authors (3): Hernán Maina, Nicolás Wolovick, Luciana Benotti
Training Large Language Models (LLMs) is costly in terms of energy, hardware, and annotated data, often resulting in a positionality rooted in predominant cultures and values (Santy et al., 2023). Domain adaptation has emerged as a promising strategy to better align models with diverse cultural and value contexts (Hershcovich et al., 2022), but its computational cost remains a significant barrier, particularly for research groups lacking access to large-scale infrastructure. In this paper, we evaluate how the use of different numerical precision formats and data parallelization strategies impacts both training speed (as a proxy for energy and hardware consumption) and model accuracy, with the goal of facilitating domain adaptation in low-resource environments. Our findings are relevant to any setting where energy efficiency, accessibility, or limited hardware availability are key concerns.
Article 527
Title@2025-06-11 (3): TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation
Title: TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation | TimeDART: Ein Diffusion Autoregressiver Transformer für selbstüberwachte Zeitreihendarstellung | TimeDART: 自动控制时间序列代表的传播自动递减变动器 2410.05711v5 |
Authors (4): Daoyu Wang, Mingyue Cheng, Zhiding Liu, Qi Liu
Self-supervised learning has garnered increasing attention in time series analysis for benefiting various downstream tasks and reducing reliance on labeled data. Despite its effectiveness, existing methods often struggle to comprehensively capture both long-term dynamic evolution and subtle local patterns in a unified manner. In this work, we propose TimeDART, a novel self-supervised time series pre-training framework that unifies two powerful generative paradigms to learn more transferable representations. Specifically, we first employ a causal Transformer encoder, accompanied by a patch-based embedding strategy, to model the evolving trends from left to right. Building on this global modeling, we further introduce a denoising diffusion process to capture fine-grained local patterns through forward diffusion and reverse denoising. Finally, we optimize the model in an autoregressive manner. As a result, TimeDART effectively accounts for both global and local sequence features in a coherent way. We conduct extensive experiments on public datasets for time series forecasting and classification. The experimental results demonstrate that TimeDART consistently outperforms previous compared methods, validating the effectiveness of our approach. Our code is available at https://github.com/Melmaphother/TimeDART.
Article 528
Title@2025-06-11 (3): LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
Title: LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization | LPO: Auf dem Weg zu einer präzisen Interaktion der GUI-Agenten über Standortpräferenzoptimierung | LPO:通过位置优化实现准确的图形界面代理互动 2506.09373v1 |
Authors (12): Jiaqi Tang, Yu Xia, Yi-Feng Wu, Yuwei Hu, Yuhui Chen, Qing-Guo Chen, Xiaogang Xu, Xiangyu Wu, Hao Lu, Yanqing Ma, Shiyin Lu, Qifeng Chen
The advent of autonomous agents is transforming interactions with Graphical User Interfaces (GUIs) by employing natural language as a powerful intermediary. Despite the predominance of Supervised Fine-Tuning (SFT) methods in current GUI agents for achieving spatial localization, these methods face substantial challenges due to their limited capacity to accurately perceive positional data. Existing strategies, such as reinforcement learning, often fail to assess positional accuracy effectively, thereby restricting their utility. In response, we introduce Location Preference Optimization (LPO), a novel approach that leverages locational data to optimize interaction preferences. LPO uses information entropy to predict interaction positions by focusing on zones rich in information. Besides, it further introduces a dynamic location reward function based on physical distance, reflecting the varying importance of interaction positions. Supported by Group Relative Preference Optimization (GRPO), LPO facilitates an extensive exploration of GUI environments and significantly enhances interaction precision. Comprehensive experiments demonstrate LPO’s superior performance, achieving SOTA results across both offline benchmarks and real-world online evaluations. Our code will be made publicly available soon, at https://github.com/AIDC-AI/LPO.
Article 529
Title@2025-06-11 (3): Learnable Spatial-Temporal Positional Encoding for Link Prediction
Title: Learnable Spatial-Temporal Positional Encoding for Link Prediction | Erlernbare räumlich-temporale Positionskodierung für Link Prediction | 链接预测的空间-临时定位编码 2506.08309v2 |
Authors (5): Katherine Tieu, Dongqi Fu, Zihao Li, Ross Maciejewski, Jingrui He
Accurate predictions rely on the expressive power of graph deep learning frameworks like graph neural networks and graph transformers, where a positional encoding mechanism has become indispensable in recent state-of-the-art works to record the canonical position information. However, the current positional encoding is limited in three aspects: (1) most positional encoding methods use pre-defined and fixed functions, which are inadequate to adapt to the complex attributed graphs; (2) a few pioneering works proposed the learnable positional encoding but are still limited to the structural information, not considering the real-world time-evolving topological and feature information; (3) most positional encoding methods are equipped with transformers’ attention mechanism to fully leverage their capabilities, where the dense or relational attention is often unaffordable on large-scale structured data. Hence, we aim to develop Learnable Spatial-Temporal Positional Encoding in an effective and efficient manner and propose a simple temporal link prediction model named L-STEP. Briefly, for L-STEP, we (1) prove the proposed positional learning scheme can preserve the graph property from the spatial-temporal spectral viewpoint, (2) verify that MLPs can fully exploit the expressiveness and reach transformers’ performance on that encoding, (3) change different initial positional encoding inputs to show robustness, (4) analyze the theoretical complexity and obtain less empirical running time than SOTA, and (5) demonstrate its temporal link prediction out-performance on 13 classic datasets and with 10 algorithms in both transductive and inductive settings using 3 different sampling strategies. Also, L-STEP obtains the leading performance in the newest large-scale TGB benchmark. Our code is available at https://github.com/kthrn22/L-STEP.
Article 530
Title@2025-06-11 (3): Anomaly Detection and Generation with Diffusion Models: A Survey
Title: Anomaly Detection and Generation with Diffusion Models: A Survey | Anomalieerkennung und -generierung mit Diffusionsmodellen: Eine Umfrage | 使用传播模型进行异常检测和生成:调查 2506.09368v1 |
Authors (10): Yang Liu, Jing Liu, Chengfang Li, Rui Xi, Wenchao Li, Liang Cao, Jin Wang, Laurence T. Yang, Junsong Yuan, Wei Zhou
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing, by identifying unexpected patterns that deviate from established norms in real-world data. Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest due to their ability to learn complex data distributions and generate high-fidelity samples, offering a robust framework for unsupervised AD. In this survey, we comprehensively review anomaly detection and generation with diffusion models (ADGDM), presenting a tutorial-style analysis of the theoretical foundations and practical implementations and spanning images, videos, time series, tabular, and multimodal data. Crucially, unlike existing surveys that often treat anomaly detection and generation as separate problems, we highlight their inherent synergistic relationship. We reveal how DMs enable a reinforcing cycle where generation techniques directly address the fundamental challenge of anomaly data scarcity, while detection methods provide critical feedback to improve generation fidelity and relevance, advancing both capabilities beyond their individual potential. A detailed taxonomy categorizes ADGDM methods based on anomaly scoring mechanisms, conditioning strategies, and architectural designs, analyzing their strengths and limitations. We finally discuss key challenges including scalability and computational efficiency, and outline promising future directions such as efficient architectures, conditioning strategies, and integration with foundation models (e.g., visual-language models and large language models). By synthesizing recent advances and outlining open research questions, this survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
Article 531
Title@2025-06-11 (3): An Online Learning Approach to Prompt-based Selection of Generative Models and LLMs
Title: An Online Learning Approach to Prompt-based Selection of Generative Models and LLMs | Ein Online-Lernansatz zur sofortigen Auswahl generativer Modelle und LLMs | 以在线学习方式迅速选择创创模式和LLMs 2410.13287v4 |
Authors (3): Xiaoyan Hu, Ho-fung Leung, Farzan Farnia
Selecting a sample generation scheme from multiple prompt-based generative models, including large language models (LLMs) and prompt-guided image and video generation models, is typically addressed by choosing the model that maximizes an averaged evaluation score. However, this score-based selection overlooks the possibility that different models achieve the best generation performance for different types of text prompts. An online identification of the best generation model for various input prompts can reduce the costs associated with querying sub-optimal models. In this work, we explore the possibility of varying rankings of text-based generative models for different text prompts and propose an online learning framework to predict the best data generation model for a given input prompt. The proposed PAK-UCB algorithm addresses a contextual bandit (CB) setting with shared context variables across the arms, utilizing the generated data to update kernel-based functions that predict the score of each model available for unseen text prompts. Additionally, we leverage random Fourier features (RFF) to accelerate the online learning process of PAK-UCB. Our numerical experiments on real and simulated text-to-image and image-to-text generative models show that RFF-UCB performs successfully in identifying the best generation model across different sample types. The code is available at: github.com/yannxiaoyanhu/dgm-online-select.
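The role of random Fourier features can be shown in isolation: an explicit finite feature map whose inner product approximates a Gaussian (RBF) kernel, so kernel-based score predictors can be updated with fixed-size vectors. The sketch below is the generic RFF construction only, not the PAK-UCB or RFF-UCB algorithm; the function names, the bandwidth, and the feature count are assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Sequence


def make_rff(dim: int, n_features: int, bandwidth: float,
             seed: int = 0) -> Callable[[Sequence[float]], List[float]]:
    """Build a feature map z with z(x) . z(y) ~= exp(-||x - y||^2 / (2 * bandwidth^2)).

    Frequencies are drawn from N(0, 1 / bandwidth^2) and phases from U[0, 2*pi].
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x: Sequence[float]) -> List[float]:
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return features


def rbf_kernel(x: Sequence[float], y: Sequence[float], bandwidth: float) -> float:
    """Exact Gaussian kernel, used here only to check the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


features = make_rff(dim=3, n_features=2000, bandwidth=1.0, seed=0)
x, y = [0.1, 0.2, 0.3], [0.4, 0.0, -0.1]
approx = dot(features(x), features(y))
exact = rbf_kernel(x, y, 1.0)
```

The approximation error shrinks roughly with the inverse square root of `n_features`, which is the trade-off between speed and fidelity that motivates using such features in an online setting.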
Article 532
Title@2025-06-11 (3): SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending
Title: SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending | SkillBlender: Auf dem Weg zu einer vielseitigen humanoiden Ganzkörperloko-Manipulation durch Skill Blending | 技术人才库:通过技能混合处理,努力通过技能混合处理,实现全体操控和操纵 2506.09366v1 |
Authors (8): Yuxuan Kuang, Haoran Geng, Amine Elhafsi, Tan-Dzung Do, Pieter Abbeel, Jitendra Malik, Marco Pavone, Yue Wang
Humanoid robots hold significant potential in accomplishing daily tasks across diverse environments thanks to their flexibility and human-like morphology. Recent works have made significant progress in humanoid whole-body control and loco-manipulation leveraging optimal control or reinforcement learning. However, these methods require tedious task-specific tuning for each task to achieve satisfactory behaviors, limiting their versatility and scalability to diverse tasks in daily scenarios. To that end, we introduce SkillBlender, a novel hierarchical reinforcement learning framework for versatile humanoid loco-manipulation. SkillBlender first pretrains goal-conditioned task-agnostic primitive skills, and then dynamically blends these skills to accomplish complex loco-manipulation tasks with minimal task-specific reward engineering. We also introduce SkillBench, a parallel, cross-embodiment, and diverse simulated benchmark containing three embodiments, four primitive skills, and eight challenging loco-manipulation tasks, accompanied by a set of scientific evaluation metrics balancing accuracy and feasibility. Extensive simulated experiments show that our method significantly outperforms all baselines, while naturally regularizing behaviors to avoid reward hacking, resulting in more accurate and feasible movements for diverse loco-manipulation tasks in our daily scenarios. Our code and benchmark will be open-sourced to the community to facilitate future research. Project page: https://usc-gvl.github.io/SkillBlender-web/.
Article 533
Title@2025-06-11 (3): TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Title: TextSquare: Scaling up Text-Centric Visual Instruction Tuning | TextSquare: Skalieren von Text-Centric Visual Instruction Tuning | TextSquare: 增强文本内容的视觉教学图示 2404.12803v3 |
Authors (18): Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Yangfan He, Kuan Lu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang
Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data. To this end, we introduce a new approach for creating a massive, high-quality instruction-tuning dataset, Square-10M, which is generated using closed-source MLLMs. The data construction process, termed Square, consists of four steps: Self-Questioning, Answering, Reasoning, and Evaluation. Our experiments with Square-10M led to three key findings: 1) Our model, TextSquare, considerably surpasses previous open-source state-of-the-art text-centric MLLMs and sets a new standard on OCRBench (62.2%). It even outperforms top-tier models like GPT4V and Gemini in 6 of 10 text-centric benchmarks. 2) Additionally, we demonstrate the critical role of VQA reasoning data in offering comprehensive contextual insights for specific questions. This not only improves accuracy but also significantly mitigates hallucinations. Specifically, TextSquare scores an average of 75.1% across four general VQA and hallucination evaluation datasets, outperforming previous state-of-the-art models. 3) Notably, the phenomenon observed in scaling text-centric VQA datasets reveals a vivid pattern: the exponential increase of instruction tuning data volume is directly proportional to the improvement in model performance, thereby validating the necessity of the dataset scale and the high quality of Square-10M.
Article 534
Title@2025-06-11 (3): Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Title: Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models | Automatische Pseudo-Harmful Prompt Generation zur Bewertung falscher Ablehnungen in großen Sprachmodellen | 自动生成用于评价大语言模式虚假拒绝的优多有害快速自动生成 2409.00598v2 |
Authors (6): Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang
Safety-aligned large language models (LLMs) sometimes falsely refuse pseudo-harmful prompts, like “how to kill a mosquito,” which are actually harmless. Frequent false refusals not only frustrate users but also provoke a public backlash against the very values alignment seeks to protect. In this paper, we propose the first method to auto-generate diverse, content-controlled, and model-dependent pseudo-harmful prompts. Using this method, we construct an evaluation dataset called PHTest, which is ten times larger than existing datasets, covers more false refusal patterns, and separately labels controversial prompts. We evaluate 20 LLMs on PHTest, uncovering new insights due to its scale and labeling. Our findings reveal a trade-off between minimizing false refusals and improving safety against jailbreak attacks. Moreover, we show that many jailbreak defenses significantly increase the false refusal rates, thereby undermining usability. Our method and dataset can help developers evaluate and fine-tune safer and more usable LLMs. Our code and dataset are available at https://github.com/umd-huang-lab/FalseRefusal
Article 535
Title@2025-06-11 (3): Generalized Lie Symmetries in Physics-Informed Neural Operators
Title: Generalized Lie Symmetries in Physics-Informed Neural Operators | Generalisierte Lie-Symmetrien in physikinformierten Neuraloperatoren | 物理内成形神经操作器中的一般谎言对称 2502.00373v2 |
Authors (5): Amy Xiang Wang, Zakhar Shumaylov, Peter Zaika, Ferdia Sherry, Carola-Bibiane Schönlieb
Physics-informed neural operators (PINOs) have emerged as powerful tools for learning solution operators of partial differential equations (PDEs). Recent research has demonstrated that incorporating Lie point symmetry information can significantly enhance the training efficiency of PINOs, primarily through techniques like data, architecture, and loss augmentation. In this work, we focus on the latter, highlighting that point symmetries oftentimes result in no training signal, limiting their effectiveness in many problems. To address this, we propose a novel loss augmentation strategy that leverages evolutionary representatives of point symmetries, a specific class of generalized symmetries of the underlying PDE. These generalized symmetries provide a richer set of generators compared to standard symmetries, leading to a more informative training signal. We demonstrate that leveraging evolutionary representatives enhances the performance of neural operators, resulting in improved data efficiency and accuracy during training.
Article 536
Title@2025-06-11 (3): PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs
Title: PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs | PDE-Controller: LLMs für Autoformalisierung und Begründung von PDEs | PDE – – 主计长:PDE的自动化正规化和原因说明 2502.00963v2 |
Authors (6): Mauricio Soroco, Jialin Song, Mengzhou Xia, Kye Emond, Weiran Sun, Wuyang Chen
While recent AI-for-math has made strides in pure mathematics, areas of applied mathematics, particularly PDEs, remain underexplored despite their significant real-world applications. We present PDE-Controller, a framework that enables large language models (LLMs) to control systems governed by partial differential equations (PDEs). Our approach enables LLMs to transform informal natural language instructions into formal specifications, and then execute reasoning and planning steps to improve the utility of PDE control. We build a holistic solution comprising datasets (both human-written cases and 2 million synthetic samples), math-reasoning models, and novel evaluation metrics, all of which require significant effort. Our PDE-Controller significantly outperforms prompting the latest open source and GPT models in reasoning, autoformalization, and program synthesis, achieving up to a 62% improvement in utility gain for PDE control. By bridging the gap between language generation and PDE systems, we demonstrate the potential of LLMs in addressing complex scientific and engineering challenges. We release all data, model checkpoints, and code at https://pde-controller.github.io/.
Article 537
Title@2025-06-11 (3): Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation
Title: Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation | Autoregressive Adversariale Nachschulung für die Echtzeit-Interactive Videogenerierung | 实时互动互动视频制作培训后自动递减反反反反向自动 2506.09350v1 |
Authors (9): Shanchuan Lin, Ceyuan Yang, Hao He, Jianwen Jiang, Yuxi Ren, Xin Xia, Yang Zhao, Xuefeng Xiao, Lu Jiang
Existing large-scale video generation models are computationally intensive, preventing adoption in real-time and interactive applications. In this work, we propose autoregressive adversarial post-training (AAPT) to transform a pre-trained latent video diffusion model into a real-time, interactive video generator. Our model autoregressively generates a latent frame at a time using a single neural function evaluation (1NFE). The model can stream the result to the user in real time and receive interactive responses as controls to generate the next latent frame. Unlike existing approaches, our method explores adversarial training as an effective paradigm for autoregressive generation. This not only allows us to design an architecture that is more efficient for one-step generation while fully utilizing the KV cache, but also enables training the model in a student-forcing manner that proves to be effective in reducing error accumulation during long video generation. Our experiments demonstrate that our 8B model achieves real-time, 24fps, streaming video generation at 736x416 resolution on a single H100, or 1280x720 on 8xH100 up to a minute long (1440 frames). Visit our research website at https://seaweed-apt.com/2
Article 538
Title@2025-06-11 (3): TS-PIELM: Time-Stepping Physics-Informed Extreme Learning Machine Facilitates Soil Consolidation Analyses
Title: TS-PIELM: Time-Stepping Physics-Informed Extreme Learning Machine Facilitates Soil Consolidation Analyses | TS-PIELM: Zeitschnelle Physik-informierte extreme Lernmaschine erleichtert Bodenkonsolidierungsanalysen | TS-PIELM: 时间制备物理系统一体化极端学习机器促进土壤巩固分析 2506.08381v2 |
Authors (5): He Yang, Fei Ren, Hai-Sui Yu, Xueyu Geng, Pei-Zhi Zhuang
Accuracy and efficiency of the conventional physics-informed neural network (PINN) need to be improved before it can be a competitive alternative for soil consolidation analyses. This paper aims to overcome these limitations by proposing a highly accurate and efficient physics-informed machine learning (PIML) approach, termed time-stepping physics-informed extreme learning machine (TS-PIELM). In the TS-PIELM framework the consolidation process is divided into numerous time intervals, which helps overcome the limitation of PIELM in solving differential equations with sharp gradients. To accelerate network training, the solution is approximated by a single-layer feedforward extreme learning machine (ELM), rather than using a fully connected neural network in PINN. The input layer weights of the ELM network are generated randomly and fixed during the training process. Subsequently, the output layer weights are directly computed by solving a system of linear equations, which significantly enhances the training efficiency compared to the time-consuming gradient descent method in PINN. Finally, the superior performance of TS-PIELM is demonstrated by solving three typical Terzaghi consolidation problems. Compared to PINN, results show that the computational efficiency and accuracy of the novel TS-PIELM framework are improved by more than 1000 times and 100 times for one-dimensional cases, respectively. This paper provides compelling evidence that PIML can be a powerful tool for computational geotechnics.
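The two ingredients described above (fixed random input weights with a linear solve for the output weights, and splitting time into intervals) can be illustrated on a toy problem. The sketch below is not the authors' implementation and does not solve a Terzaghi consolidation problem; it assumes the simple ODE u'(t) = -u(t) as a stand-in, and the function names, network sizes, and weight ranges are all hypothetical choices made for illustration.

```python
import numpy as np

def elm_step(u0, dt, n_hidden=30, n_colloc=40, rng=None):
    """Solve u'(t) = -u(t), u(0) = u0 on [0, dt] with a single-hidden-layer ELM.

    Toy stand-in problem (assumption); returns the approximate u(dt).
    """
    rng = np.random.default_rng() if rng is None else rng
    w = rng.uniform(-2.0, 2.0, n_hidden)  # input weights: drawn once, never trained
    b = rng.uniform(-2.0, 2.0, n_hidden)  # hidden biases: drawn once, never trained

    def features(tt):
        h = np.tanh(tt * w + b)           # hidden activations, shape (n_points, n_hidden)
        dh = (1.0 - h**2) * w             # time derivative of each activation
        return h, dh

    t = np.linspace(0.0, dt, n_colloc)[:, None]
    h, dh = features(t)
    h0, _ = features(np.zeros((1, 1)))

    # Linear system in the output weights beta:
    #   collocation rows enforce the ODE residual u' + u = 0,
    #   the last row enforces the initial condition u(0) = u0.
    A = np.vstack([dh + h, h0])
    rhs = np.concatenate([np.zeros(n_colloc), [u0]])
    beta, *_ = np.linalg.lstsq(A, rhs, rcond=None)  # least-squares solve, no gradient descent

    h_end, _ = features(np.array([[dt]]))
    return float((h_end @ beta)[0])

def time_stepping_elm(u0, t_end, n_steps, seed=0):
    """March through n_steps intervals, passing each end value on as the next initial value."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    u, values = u0, [u0]
    for _ in range(n_steps):
        u = elm_step(u, dt, rng=rng)
        values.append(u)
    return np.array(values)
```

Calling `time_stepping_elm(1.0, 4.0, 4)` returns the approximate solution at t = 0, 1, 2, 3, 4, which can be compared against `np.exp(-t)`. Each interval gets its own small network, so no single network has to represent the solution over the whole time span.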
Article 539
Title@2025-06-11 (3): Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis
Title: Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis | Warum sind Web-KI-Agenten anfälliger als Standalone LLMs? Eine Sicherheitsanalyse | 为什么互联网上AI代理机构比独立的LLMs更加脆弱? 2502.20383v2 |
Authors (5): Jeffrey Yang Fan Chiang, Seungjae Lee, Jia-Bin Huang, Furong Huang, Yizheng Chen
Recent advancements in Web AI agents have demonstrated remarkable capabilities in addressing complex web navigation tasks. However, emerging research shows that these agents exhibit greater vulnerability compared to standalone Large Language Models (LLMs), despite both being built upon the same safety-aligned models. This discrepancy is particularly concerning given the greater flexibility of Web AI agents compared to standalone LLMs, which may expose them to a wider range of adversarial user inputs. To build a scaffold that addresses these concerns, this study investigates the underlying factors that contribute to the increased vulnerability of Web AI agents. Notably, this disparity stems from the multifaceted differences between Web AI agents and standalone LLMs, as well as the complex signals - nuances that simple evaluation metrics, such as success rate, often fail to capture. To tackle these challenges, we propose a component-level analysis and a more granular, systematic evaluation framework. Through this fine-grained investigation, we identify three critical factors that amplify the vulnerability of Web AI agents: (1) embedding user goals into the system prompt, (2) multi-step action generation, and (3) observational capabilities. Our findings highlight the pressing need to enhance security and robustness in AI agent design and provide actionable insights for targeted defense strategies.
Article 540
Title@2025-06-11 (3): Adversarial Surrogate Risk Bounds for Binary Classification
Title: Adversarial Surrogate Risk Bounds for Binary Classification | Adversarial Surrogate Risk Bounds für Binäre Klassifizierung | 二元分类的反向代谢风险临界值 2506.09348v1 |
Authors (1): Natalie S. Frank
A central concern in classification is the vulnerability of machine learning models to adversarial attacks. Adversarial training is one of the most popular techniques for training robust classifiers, which involves minimizing an adversarial surrogate risk. Recent work characterized when a minimizing sequence of an adversarial surrogate risk is also a minimizing sequence of the adversarial classification risk for binary classification – a property known as adversarial consistency. However, these results do not address the rate at which the adversarial classification risk converges to its optimal value for such a sequence of functions that minimize the adversarial surrogate. This paper provides surrogate risk bounds that quantify that convergence rate. Additionally, we derive distribution-dependent surrogate risk bounds in the standard (non-adversarial) learning setting, which may be of independent interest.
Article 541
Title@2025-06-11 (3): ErrorEraser: Unlearning Data Bias for Improved Continual Learning
Title: ErrorEraser: Unlearning Data Bias for Improved Continual Learning | ErrorErraser: Unlearning Data Bias für verbessertes kontinuierliches Lernen | 错误搜索器: 退出数据比亚, 用于改进持续学习 2506.09347v1 |
Authors (7): Xuemei Cao, Hanlin Gu, Xin Yang, Bingjun Wei, Haoyang Liang, Xiangkun Wang, Tianrui Li
Continual Learning (CL) primarily aims to retain knowledge to prevent catastrophic forgetting and transfer knowledge to facilitate learning new tasks. Unlike traditional methods, we propose a novel perspective: CL not only needs to prevent forgetting, but also requires intentional forgetting. This arises from existing CL methods ignoring biases in real-world data, leading the model to learn spurious correlations that transfer and amplify across tasks. From feature extraction and prediction results, we find that data biases simultaneously reduce CL’s ability to retain and transfer knowledge. To address this, we propose ErrorEraser, a universal plugin that removes erroneous memories caused by biases in CL, enhancing performance in both new and old tasks. ErrorEraser consists of two modules: Error Identification and Error Erasure. The former learns the probability density distribution of task data in the feature space without prior knowledge, enabling accurate identification of potentially biased samples. The latter ensures only erroneous knowledge is erased by shifting the decision space of representative outlier samples. Additionally, an incremental feature distribution learning strategy is designed to reduce the resource overhead during error identification in downstream tasks. Extensive experimental results show that ErrorEraser significantly mitigates the negative impact of data biases, achieving higher accuracy and lower forgetting rates across three types of CL methods. The code is available at https://github.com/diadai/ErrorEraser.
Article 542
Title@2025-06-11 (3): Ming-Omni: A Unified Multimodal Model for Perception and Generation
Title: Ming-Omni: A Unified Multimodal Model for Perception and Generation | Ming-Omni: Ein einheitliches multimodales Modell für Wahrnehmung und Erzeugung | Ming-Omni:感知和一代的统一多模式模型 2506.09344v1 |
Authors (58): Inclusion AI, Biao Gong, Cheng Zou, Chuanyang Zheng, Chunluan Zhou, Canxiang Yan, Chunxiang Jin, Chunjie Shen, Dandan Zheng, Fudong Wang, Furong Xu, GuangMing Yao, Jun Zhou, Jingdong Chen, Jianxin Sun, Jiajia Liu, Jianjiang Zhu, Jun Peng, Kaixiang Ji, Kaiyou Song, Kaimeng Ren, Libin Wang, Lixiang Ru, Lele Xie, Longhua Tan, Lyuxin Xue, Lan Wang, Mochen Bai, Ning Gao, Pei Chen, Qingpei Guo, Qinglong Zhang, Qiang Xu, Rui Liu, Ruijie Xiong, Sirui Gao, Tinghao Liu, Taisong Li, Weilong Chai, Xinyu Xiao, Xiaomei Wang, Xiaoxue Chen, Xiao Lu, Xiaoyu Li, Xingning Dong, Xuzheng Yu, Yi Yuan, Yuting Gao, Yunxiao Sun, Yipeng Chen, Yifei Wu, Yongjie Lyu, Ziping Ma, Zipeng Feng, Zhijiang Fang, Zhihao Qiu, Ziyuan Huang, Zhengyu He
We propose Ming-Omni, a unified multimodal model capable of processing images, text, audio, and video, while demonstrating strong proficiency in both speech and image generation. Ming-Omni employs dedicated encoders to extract tokens from different modalities, which are then processed by Ling, an MoE architecture equipped with newly proposed modality-specific routers. This design enables a single model to efficiently process and fuse multimodal inputs within a unified framework, thereby facilitating diverse tasks without requiring separate models, task-specific fine-tuning, or structural redesign. Importantly, Ming-Omni extends beyond conventional multimodal models by supporting audio and image generation. This is achieved through the integration of an advanced audio decoder for natural-sounding speech and Ming-Lite-Uni for high-quality image generation, which also allows the model to engage in context-aware chatting, perform text-to-speech conversion, and conduct versatile image editing. Our experimental results show that Ming-Omni offers a powerful solution for unified perception and generation across all modalities. Notably, our proposed Ming-Omni is the first open-source model we are aware of to match GPT-4o in modality support, and we release all code and model weights to encourage further research and development in the community.
Article 543
Title@2025-06-11 (3): RePO: Replay-Enhanced Policy Optimization
Title: RePO: Replay-Enhanced Policy Optimization | RePO: Replay-erweiterte Politikoptimierung | REPO: 重放强化政策优化 2506.09340v1 |
Authors (5): Siheng Li, Zhanhui Zhou, Wai Lam, Chao Yang, Chaochao Lu
Reinforcement learning (RL) is vital for optimizing large language models (LLMs). Recent Group Relative Policy Optimization (GRPO) estimates advantages using multiple on-policy outputs per prompt, leading to high computational costs and low data efficiency. To address this, we introduce Replay-Enhanced Policy Optimization (RePO), which leverages diverse replay strategies to retrieve off-policy samples from a replay buffer, allowing policy optimization based on a broader and more diverse set of samples for each prompt. Experiments on five LLMs across seven mathematical reasoning benchmarks demonstrate that RePO achieves absolute average performance gains of 18.4 and 4.1 points for Qwen2.5-Math-1.5B and Qwen3-1.7B, respectively, compared to GRPO. Further analysis indicates that RePO increases computational cost by 15% while raising the number of effective optimization steps by 48% for Qwen3-1.7B, with both on-policy and off-policy sample numbers set to 8. The repository can be accessed at https://github.com/SihengLi99/RePO.
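As a rough illustration of estimating advantages from a group that mixes fresh and replayed samples, the sketch below normalizes rewards within one prompt's group, in the style of GRPO. The abstract does not say how RePO weights or corrects off-policy samples, so pooling both sets into a single normalization group is an assumption, and the function names are hypothetical.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: each reward standardized within its prompt's group."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def mixed_group_advantages(on_policy_rewards, replay_rewards):
    """Pool on-policy and replayed rewards for one prompt, then normalize together.

    Assumption: both sample sets share one normalization group; the actual
    RePO estimator may treat off-policy samples differently.
    """
    rewards = np.concatenate([on_policy_rewards, replay_rewards])
    adv = group_relative_advantages(rewards)
    n_on = len(on_policy_rewards)
    return adv[:n_on], adv[n_on:]
```

With 8 on-policy and 8 replayed samples per prompt, as in the setting quoted above, each prompt's group would contain 16 rewards rather than 8.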
Article 544
Title@2025-06-11 (3): AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin
Title: AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin | AsFT: Verankerungssicherheit während LLM-Feinsteuerung innerhalb eines schmalen Sicherheitsbeckens | AsFT:LLM在狭窄安全盆地内精密出海期间安整安全 2506.08473v2 |
Authors (11): Shuo Yang, Qihui Zhang, Yuyang Liu, Yue Huang, Xiaojun Jia, Kunpeng Ning, Jiayu Yao, Jigang Wang, Hailiang Dai, Yibing Song, Li Yuan
Large language models (LLMs) are vulnerable to safety risks during fine-tuning, where small amounts of malicious or harmless data can compromise safeguards. In this paper, building on the concept of alignment direction – defined by the weight difference between aligned and unaligned models – we observe that perturbations along this direction preserve model safety. In contrast, perturbations along directions orthogonal to this alignment are strongly linked to harmful direction perturbations, rapidly degrading safety and framing the parameter space as a narrow safety basin. Based on this insight, we propose a methodology for safety fine-tuning called AsFT (Anchoring Safety in Fine-Tuning), which integrates a regularization term into the training objective. This term uses the alignment direction as an anchor to suppress updates in harmful directions, ensuring that fine-tuning is constrained within the narrow safety basin. Extensive experiments on multiple datasets show that AsFT outperforms Safe LoRA, reducing harmful behavior by 7.60 percent, improving model performance by 3.44 percent, and maintaining robust performance across various experimental settings. Code is available at https://github.com/PKU-YuanGroup/AsFT
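One way to picture a regularizer anchored on the alignment direction is to split a weight update into the part along that direction and the part orthogonal to it, and to penalize only the orthogonal part. The sketch below does that for a single weight matrix. It is an assumption about the general shape of such a term, not the AsFT objective itself; the function name and the `lam` coefficient are hypothetical.

```python
import numpy as np

def orthogonal_penalty(delta_w, w_aligned, w_unaligned, lam=1.0):
    """Penalize the component of an update that leaves the alignment direction.

    delta_w:      proposed fine-tuning update for one weight matrix
    w_aligned:    weights of the aligned model
    w_unaligned:  weights of the unaligned model
    lam:          regularization strength (hypothetical hyperparameter)
    """
    d = (w_aligned - w_unaligned).ravel()
    d = d / np.linalg.norm(d)              # unit alignment direction
    u = delta_w.ravel()
    parallel = (u @ d) * d                 # component along the alignment direction
    orthogonal = u - parallel              # component treated here as potentially harmful
    return lam * float(orthogonal @ orthogonal)
```

In training, a term like this would be added to the task loss so that updates stay close to the alignment direction. An update that is a pure multiple of `w_aligned - w_unaligned` incurs zero penalty.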
Article 545
Title@2025-06-11 (3): Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference
Title: Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference | Sentence-Level-Reward-Modell kann besser für die Ausrichtung LLM aus menschlichen Präferenzen Generalisieren | 判决一级奖赏模式可以普遍化,更有利于将人类特惠的LLM与LM相匹配 2503.04793v4 |
Authors (7): Wenjie Qiu, Yi-Chen Li, Xuqin Zhang, Tianyi Zhang, Yihang Zhang, Zongzhang Zhang, Yang Yu
Learning reward models from human preference datasets and subsequently optimizing language models via reinforcement learning has emerged as a fundamental paradigm for aligning LLMs with human preferences. The performance of the reward model plays a crucial role in the effectiveness of alignment. Previous reward models operate at a coarse-grained level, requiring the generation of a complete response to obtain a reward value. The sparse reward may present challenges for downstream reinforcement learning. While recent efforts have attempted to learn token-level reward models, the lack of explicit semantic information makes it difficult to model the credit of every individual token. In this paper, we propose assigning scores to every sentence, introducing an intermediate-grained reward model. By segmenting the complete response into sentences and applying differential operations to reward output at the start and end positions of each sentence, we can effectively model the rewards of sentences. Moreover, a novel attention mechanism is introduced to aggregate the scores of all sentences into a response-level score, which allows it to be trained using the Bradley-Terry model. On common benchmarks, our method outperforms the response-level reward model by 2.7% on RewardBench (for reward modeling evaluation) and surpasses all baselines on AlpacaEval (for alignment evaluation).
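The pipeline described above (per-sentence rewards from differences of the reward output at sentence boundaries, attention-weighted aggregation, and a Bradley-Terry objective) can be sketched numerically. This is a minimal sketch under assumptions: the "differential operation" is taken to be end-position output minus start-position output, and the attention logits are passed in as plain numbers rather than learned. All function names are hypothetical.

```python
import numpy as np

def sentence_rewards(token_outputs, boundaries):
    """Reward of each sentence as the change in reward output across it.

    token_outputs: reward-head output at every token position
    boundaries:    list of (start, end) token indices per sentence, end inclusive
    """
    return np.array([token_outputs[end] - token_outputs[start]
                     for start, end in boundaries])

def aggregate(rewards, attn_logits):
    """Softmax-weighted sum of sentence rewards, giving a response-level score."""
    logits = np.asarray(attn_logits, dtype=float)
    weights = np.exp(logits - logits.max())
    weights = weights / weights.sum()
    return float(weights @ rewards)

def bradley_terry_loss(score_chosen, score_rejected):
    """Negative log-likelihood that the chosen response beats the rejected one."""
    return float(np.log1p(np.exp(-(score_chosen - score_rejected))))
```

Because the aggregated score is a single number per response, the model can be trained on ordinary pairwise preference data while still exposing a reward for each sentence.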
Article 546
Title@2025-06-11 (3): Know What You Don’t Know: Uncertainty Calibration of Process Reward Models
Title: Know What You Don’t Know: Uncertainty Calibration of Process Reward Models | Wissen, was Sie nicht wissen: Unsicherheit Kalibrierung von Prozess-Reward-Modellen | 你不知道你不知道什么: 过程奖励模型的不确定性校准 2506.09338v1 |
Authors (5): Young-Jin Park, Kristjan Greenewald, Kaveh Alim, Hao Wang, Navid Azizan
Process reward models (PRMs) play a central role in guiding inference-time scaling algorithms for large language models (LLMs). However, we observe that even state-of-the-art PRMs can be poorly calibrated and often overestimate success probabilities. To address this, we present a calibration approach, performed via quantile regression, that adjusts PRM outputs to better align with true success probabilities. Leveraging these calibrated success estimates and their associated confidence bounds, we introduce an instance-adaptive scaling (IAS) framework that dynamically adjusts the inference budget based on the estimated likelihood that a partial reasoning trajectory will yield a correct final answer. Unlike conventional methods that allocate a fixed number of reasoning trajectories per query, this approach successfully adapts to each instance and reasoning step when using our calibrated PRMs. Experiments on mathematical reasoning benchmarks show that (i) our PRM calibration method successfully achieves small calibration error, outperforming the baseline methods, (ii) calibration is crucial for enabling effective adaptive scaling, and (iii) the proposed IAS strategy reduces inference costs while maintaining final answer accuracy, utilizing less compute on more confident problems as desired.
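To see why a calibrated success probability lets the budget shrink on confident problems, consider a simple budget rule: sample enough trajectories that at least one succeeds with a target probability. The sketch below assumes trajectories succeed independently with probability `p_success` (for example, a calibrated lower bound); this rule and the function name are illustrative assumptions, not necessarily the allocation rule used in the paper.

```python
import math

def samples_needed(p_success, target=0.95, max_samples=64):
    """Smallest N with 1 - (1 - p_success)**N >= target, capped at max_samples.

    Assumes independent trajectories; p_success would come from a calibrated PRM.
    """
    if p_success <= 0.0:
        return max_samples          # no evidence of success: spend the full budget
    if p_success >= 1.0:
        return 1                    # certain success: one trajectory suffices
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p_success))
    return max(1, min(n, max_samples))
```

Under this rule, a problem with an estimated success probability of 0.5 needs 5 trajectories for a 95% target, while a problem at 0.9 needs only 2. An overconfident PRM would under-allocate in exactly the cases where more sampling is needed, which is why calibration matters for this kind of scheme.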
Article 547
Title@2025-06-11 (3): STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation
Title: STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation | STAMImputer: Raum-Temporale Aufmerksamkeit MoE für Verkehrsdaten Imputation | STAMIMUTER: 交通数据截读的时/时注意MoE 2506.08054v2 |
Authors (7): Yiming Wang, Hao Peng, Senzhang Wang, Haohua Du, Chunyang Liu, Jia Wu, Guanlin Wu
Traffic data imputation is fundamentally important to support various applications in intelligent transportation systems such as traffic flow prediction. However, existing time-to-space sequential methods often fail to effectively extract features in block-wise missing data scenarios. Meanwhile, the static graph structure for spatial feature propagation significantly constrains the model's flexibility in handling the distribution shift issue for nonstationary traffic data. To address these issues, this paper proposes a SpatioTemporal Attention Mixture of experts network named STAMImputer for traffic data imputation. Specifically, we introduce a Mixture of Experts (MoE) framework to capture latent spatio-temporal features and their influence weights, effectively imputing block-wise missing data. A novel Low-rank guided Sampling Graph ATtention (LrSGAT) mechanism is designed to dynamically balance the local and global correlations across road networks. The sampled attention vectors are utilized to generate dynamic graphs that capture real-time spatial correlations. Extensive experiments are conducted on four traffic datasets for evaluation. The results show that STAMImputer achieves significant performance improvements compared with existing SOTA approaches. Our code is available at https://github.com/RingBDStack/STAMImupter.
Article 548
Title@2025-06-11 (3): Natural Language Guided Ligand-Binding Protein Design
Title: Natural Language Guided Ligand-Binding Protein Design | Natürliche Sprache geführt Ligand-Binding Protein Design | Ligand-Binding 蛋白设计 2506.09332v1 |
Authors (5): Zhenqiao Song, Ramith Hettiarachchi, Chuan Li, Jianwen Xie, Lei Li
Can AI protein models follow human language instructions and design proteins with desired functions (e.g. binding to a ligand)? Designing proteins that bind to a given ligand is crucial in a wide range of applications in biology and chemistry. Most prior AI models are trained on protein-ligand complex data, which is scarce due to the high cost and time requirements of laboratory experiments. In contrast, there is a substantial body of human-curated text descriptions about protein-ligand interactions and ligand formulas. In this paper, we propose InstructPro, a family of protein generative models that follow natural language instructions to design ligand-binding proteins. Given a textual description of the desired function and a ligand formula in SMILES, InstructPro generates protein sequences that are functionally consistent with the specified instructions. We develop the model architecture, training strategy, and a large-scale dataset, InstructProBench, to support both training and evaluation. InstructProBench consists of 9,592,829 triples of (function description, ligand formula, protein sequence). We train two model variants: InstructPro-1B (with 1 billion parameters) and InstructPro-3B (with 3 billion parameters). Both variants consistently outperform strong baselines, including ProGen2, ESM3, and Pinal. Notably, InstructPro-1B achieves the highest docking success rate (81.52% at moderate confidence) and the lowest average root mean square deviation (RMSD) compared to ground truth structures (4.026 Å). InstructPro-3B further decreases the average RMSD to 2.527 Å, demonstrating InstructPro’s ability to generate ligand-binding proteins that align with the functional specifications.
Article 549
Title@2025-06-11 (3): SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space
Title: SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space | SWAT-NN: Simultangewichte und Architekturtraining für neurale Netzwerke im Latent Space | SWAT-NN: 低空空间神经网络的同质重量和建筑培训 2506.08270v2 |
Authors (3): Zitong Huang, Mansooreh Montazerin, Ajitesh Srivastava
Designing neural networks typically relies on manual trial and error or a neural architecture search (NAS) followed by weight training. The former is time-consuming and labor-intensive, while the latter often discretizes architecture search and weight optimization. In this paper, we propose a fundamentally different approach that simultaneously optimizes both the architecture and the weights of a neural network. Our framework first trains a universal multi-scale autoencoder that embeds both architectural and parametric information into a continuous latent space, where functionally similar neural networks are mapped closer together. Given a dataset, we then randomly initialize a point in the embedding space and update it via gradient descent to obtain the optimal neural network, jointly optimizing its structure and weights. The optimization process incorporates sparsity and compactness penalties to promote efficient models. Experiments on synthetic regression tasks demonstrate that our method effectively discovers sparse and compact neural networks with strong performance.
Article 550
Title@2025-06-11 (3): Your Agent Can Defend Itself against Backdoor Attacks
Title: Your Agent Can Defend Itself against Backdoor Attacks | Ihr Agent kann sich gegen Hintertürangriffe verteidigen | 你的特工可以保护自己 免受幕后攻击 2506.08336v2 |
Authors (5): Li Changjiang, Liang Jiacheng, Cao Bochuan, Chen Jinghui, Wang Ting
Despite their growing adoption across domains, large language model (LLM)-powered agents face significant security risks from backdoor attacks during training and fine-tuning. These compromised agents can subsequently be manipulated to execute malicious operations when presented with specific triggers in their inputs or environments. To address this pressing risk, we present ReAgent, a novel defense against a range of backdoor attacks on LLM-based agents. Intuitively, backdoor attacks often result in inconsistencies among the user’s instruction, the agent’s planning, and its execution. Drawing on this insight, ReAgent employs a two-level approach to detect potential backdoors. At the execution level, ReAgent verifies consistency between the agent’s thoughts and actions; at the planning level, ReAgent leverages the agent’s capability to reconstruct the instruction based on its thought trajectory, checking for consistency between the reconstructed instruction and the user’s instruction. Extensive evaluation demonstrates ReAgent’s effectiveness against various backdoor attacks across tasks. For instance, ReAgent reduces the attack success rate by up to 90% in database operation tasks, outperforming existing defenses by large margins. This work reveals the potential of utilizing compromised agents themselves to mitigate backdoor risks.
Article 551
Title@2025-06-11 (3): polyGen: A Learning Framework for Atomic-level Polymer Structure Generation
Title: polyGen: A Learning Framework for Atomic-level Polymer Structure Generation | polyGen: Ein Lernrahmen für die atomare Polymerstrukturbildung | 聚金:原子级聚合体结构生成学习框架 2504.17656v3 |
Authors (2): Ayush Jain, Rampi Ramprasad
Synthetic polymeric materials underpin fundamental technologies in the energy, electronics, consumer goods, and medical sectors, yet their development still suffers from prolonged design timelines. Although polymer informatics tools have supported speedup, polymer simulation protocols continue to face significant challenges in the on-demand generation of realistic 3D atomic structures that respect conformational diversity. Generative algorithms for 3D structures of inorganic crystals, bio-polymers, and small molecules exist, but have not addressed synthetic polymers because of challenges in representation and dataset constraints. In this work, we introduce polyGen, the first generative model designed specifically for polymer structures from minimal inputs such as the repeat unit chemistry alone. polyGen combines graph-based encodings with a latent diffusion transformer using positional biased attention for realistic conformation generation. Given the limited dataset of 3,855 DFT-optimized polymer structures, we incorporate joint training with small molecule data to enhance generation quality. We also establish structure matching criteria to benchmark our approach on this novel problem. polyGen overcomes the limitations of traditional crystal structure prediction methods for polymers, successfully generating realistic and diverse linear and branched conformations, with promising performance even on challenging large repeat units. As the first atomic-level proof-of-concept capturing intrinsic polymer flexibility, it marks a new capability in material structure generation.
Article 552
Title@2025-06-11 (3): Low-Loss Space in Neural Networks is Continuous and Fully Connected
Title: Low-Loss Space in Neural Networks is Continuous and Fully Connected | Low-Loss-Raum in neuralen Netzwerken ist kontinuierlich und vollständig verbunden | 神经网络中的低损失空间是连续和完全连接的 2505.02604v3 |
Authors (4): Yongding Tian, Zaid Al-Ars, Maksim Kitsak, Peter Hofstee
Visualizations of the loss landscape in neural networks suggest that minima are isolated points. However, both theoretical and empirical studies indicate that it is possible to connect two different minima with a path consisting of intermediate points that also have low loss. In this study, we propose a new algorithm that investigates low-loss paths in the full parameter space, not only between two minima. Our experiments on LeNet5, ResNet18, and Compact Convolutional Transformer architectures consistently demonstrate the existence of such continuous paths in the parameter space. These results suggest that the low-loss region is a fully connected and continuous space in the parameter space. Our findings provide theoretical insight into neural network over-parameterization, highlighting that parameters collectively define a high-dimensional low-loss space, implying parameter redundancy exists only within individual models and not throughout the entire low-loss space. Our work also provides new visualization methods and opportunities to improve model generalization by exploring the low-loss space that is closer to the origin.
Article 553
Title@2025-06-11 (3): Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies
Title: Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies | Beschleunigung der LLM-Inferenz mit verlustfreier spekulativer Dekodierung von Algorithmen für heterogene Vokabeln | 加速LLM 超异基因词汇表无损失投机代谢值的加速LLM推导值 2502.05202v3 |
Authors (8): Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Gaurav Jain, Oren Pereg, Moshe Wasserblat, David Harel
Accelerating the inference of large language models (LLMs) is a critical challenge in generative AI. Speculative decoding (SD) methods offer substantial efficiency gains by generating multiple tokens using a single target forward pass. However, existing SD approaches require the drafter and target models to share the same vocabulary, thus limiting the pool of possible drafters, often necessitating the training of a drafter from scratch. We present three new SD methods that remove this shared-vocabulary constraint. All three methods preserve the target distribution (i.e., they are lossless) and work with off-the-shelf models without requiring additional training or modifications. Empirically, on summarization, programming, and long-context tasks, our algorithms demonstrate significant speedups of up to 2.8x over standard autoregressive decoding. By enabling any off-the-shelf model to serve as a drafter and requiring no retraining, this work substantially broadens the applicability of the SD framework in practice.
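To make the word "lossless" concrete, the following minimal sketch shows the standard accept/reject rule of speculative sampling for a single drafted token when drafter and target share a vocabulary. It is not one of the three heterogeneous-vocabulary methods from the paper; the function names and the toy distributions `p` and `q` are assumptions made for illustration.

```python
import random

def speculative_step(p_target, q_draft, rng):
    """Sample one token whose distribution is exactly p_target.

    p_target, q_draft: probability lists over the same (shared) vocabulary.
    """
    vocab = range(len(p_target))
    # The drafter proposes a token from its own distribution q.
    x = rng.choices(vocab, weights=q_draft)[0]
    # Accept the proposal with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x
    # On rejection, resample from the residual max(p - q, 0);
    # rng.choices normalizes the weights internally.
    residual = [max(p - q, 0.0) for p, q in zip(p_target, q_draft)]
    return rng.choices(vocab, weights=residual)[0]

def empirical_distribution(p_target, q_draft, n=100_000, seed=0):
    """Estimate the output distribution of speculative_step by simulation."""
    rng = random.Random(seed)
    counts = [0] * len(p_target)
    for _ in range(n):
        counts[speculative_step(p_target, q_draft, rng)] += 1
    return [c / n for c in counts]

p = [0.6, 0.3, 0.1]   # toy target distribution (hypothetical)
q = [0.2, 0.5, 0.3]   # toy drafter distribution (hypothetical)
freqs = empirical_distribution(p, q)
```

The empirical frequencies in `freqs` should match `p` up to sampling noise even though every proposal comes from `q`, which is the property the abstract calls preserving the target distribution.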
Article 554
Title@2025-06-11 (3): Monte Carlo Tree Diffusion for System 2 Planning
Title: Monte Carlo Tree Diffusion for System 2 Planning | Monte Carlo Tree Diffusion für System 2 Planung | 用于系统2规划的蒙特卡洛树传播 2502.07202v4 |
Authors (5): Jaesik Yoon, Hyeonseo Cho, Doojin Baek, Yoshua Bengio, Sungjin Ahn
Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS), whose performance naturally improves with inference-time computation scaling, standard diffusion-based planners offer only limited avenues for scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS, such as controlling exploration-exploitation trade-offs, within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as inference-time computation increases.
Article 555
Title@2025-06-11 (3): On The Impact of Merge Request Deviations on Code Review Practices
Title: On The Impact of Merge Request Deviations on Code Review Practices | Über die Auswirkungen von Merge Request Abweichungen auf Code-Review-Praktiken | 合并请求对守则审查惯例的影响 2506.08860v2 |
Authors (3): Samah Kansab, Francis Bordeleau, Ali Tizghadam
Code review is a key practice in software engineering, ensuring quality and collaboration. However, industrial Merge Request (MR) workflows often deviate from standardized review processes, with many MRs serving non-review purposes (e.g., drafts, rebases, or dependency updates). We term these cases deviations and hypothesize that ignoring them biases analytics and undermines ML models for review analysis. We identify seven deviation categories, occurring in 37.02% of MRs, and propose a few-shot learning detection method (91% accuracy). By excluding deviations, ML models predicting review completion time improve performance in 53.33% of cases (up to 2.25x) and exhibit significant shifts in feature importance (47% overall, 60% top-k). Our contributions include: (1) a taxonomy of MR deviations, (2) an AI-driven detection approach, and (3) empirical evidence of their impact on ML-based review analytics. This work aids practitioners in optimizing review efforts and ensuring reliable insights.
Article 556
Title@2025-06-11 (3): Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models
Title: Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models | Schulungsfreie Anleitung über die Unterscheidungsfähigkeit: Skalierbare Pfadsteuerung mit Baumsuche in Diffusions- und Flussmodellen | 超越差异的无培训指导:在传播和流动模型中进行植树搜索的可扩展路径指导 2502.11420v3 |
Authors (4): Yingqing Guo, Yukang Yang, Hui Yuan, Mengdi Wang
Training-free guidance enables controlled generation in diffusion and flow models, but most methods rely on gradients and assume differentiable objectives. This work focuses on training-free guidance addressing challenges from non-differentiable objectives and discrete data distributions. We propose TreeG: Tree Search-Based Path Steering Guidance, applicable to both continuous and discrete settings in diffusion and flow models. TreeG offers a unified framework for training-free guidance by proposing, evaluating, and selecting candidates at each step, enhanced with tree search over active paths and parallel exploration. We comprehensively investigate the design space of TreeG over the candidate proposal module and the evaluation function, instantiating TreeG into three novel algorithms. Our experiments show that TreeG consistently outperforms top guidance baselines in symbolic music generation, small molecule design, and enhancer DNA design with improvements of 29.01%, 16.6%, and 18.43%. Additionally, we identify an inference-time scaling law showing TreeG’s scalability in inference-time computation.
Article 557
Title@2025-06-11 (3): Alzheimer’s Dementia Detection Using Perplexity from Paired Large Language Models
Title: Alzheimer’s Dementia Detection Using Perplexity from Paired Large Language Models | Alzheimer’s Dementia Detection Using Perplexity from Paired Large Language Models | 阿兹海默氏病痴呆症检测,使用Paired大语言模型的复度 2506.09315v1 |
Authors (3): Yao Xiao, Heidi Christensen, Stefan Goetze
Alzheimer’s dementia (AD) is a neurodegenerative disorder with cognitive decline that commonly impacts language ability. This work extends the paired perplexity approach to detecting AD by using a recent large language model (LLM), the instruction-following version of Mistral-7B. We improve accuracy by an average of 3.33% over the best current paired perplexity method and by 6.35% over the top-ranked method from the ADReSS 2020 challenge benchmark. Our further analysis demonstrates that the proposed approach can effectively detect AD with a clear and interpretable decision boundary in contrast to other methods that suffer from opaque decision-making processes. Finally, by prompting the fine-tuned LLMs and comparing the model-generated responses to human responses, we illustrate that the LLMs have learned the special language patterns of AD speakers, which opens up possibilities for novel methods of model interpretation and data augmentation.
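As a rough illustration of the paired perplexity idea, the sketch below scores one transcript under two language models and thresholds the difference of log-perplexities. It assumes one model is adapted to control speech and the other to AD speech, which the abstract does not spell out; the per-token log-probabilities, the function names, and the zero threshold are hypothetical.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def paired_perplexity_score(logprobs_control_model, logprobs_ad_model):
    """Difference of log-perplexities; positive means the AD model fits better."""
    return (math.log(perplexity(logprobs_control_model))
            - math.log(perplexity(logprobs_ad_model)))

def classify(score, threshold=0.0):
    """A single scalar threshold gives a one-dimensional decision boundary."""
    return "AD" if score > threshold else "control"

# Hypothetical per-token log-probabilities of one transcript under each model.
lp_control = [-2.0, -3.0, -2.5, -3.5]
lp_ad = [-1.0, -1.5, -2.0, -1.5]
score = paired_perplexity_score(lp_control, lp_ad)
label = classify(score)
```

Because the decision depends on one scalar per transcript, the boundary can be inspected directly, which is one way to read the interpretability claim in the abstract.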
Article 558
Title@2025-06-11 (3): Surrogate models to optimize plasma assisted atomic layer deposition in high aspect ratio features
Title: Surrogate models to optimize plasma assisted atomic layer deposition in high aspect ratio features | Surrogate-Modelle zur Optimierung der plasmaunterstützten Atomschichtabscheidung in Eigenschaften mit hohem Seitenverhältnis | 利用代用模型优化高方位比率特征的等离子体辅助原子层沉积 2506.09313v1 |
Authors (2): Angel Yanguas-Gil, Jeffrey W. Elam
In this work we explore surrogate models to optimize plasma enhanced atomic layer deposition (PEALD) in high aspect ratio features. In plasma-based processes such as PEALD and atomic layer etching, surface recombination can dominate the reactivity of plasma species with the surface, which can lead to unfeasibly long exposure times to achieve full conformality inside nanostructures like high aspect ratio vias. Using a synthetic dataset based on simulations of PEALD, we train artificial neural networks to predict saturation times based on cross section thickness data obtained for partially coated conditions. The results obtained show that just two experiments in undersaturated conditions contain enough information to predict saturation times within 10% of the ground truth. A surrogate model trained to determine whether surface recombination dominates the plasma-surface interactions in a PEALD process achieves 99% accuracy. This demonstrates that machine learning can provide a new pathway to accelerate the optimization of PEALD processes in areas such as microelectronics. Our approach can be easily extended to atomic layer etching and more complex structures.
Article 559
Title@2025-06-11 (3): What is the Cost of Differential Privacy for Deep Learning-Based Trajectory Generation?
Title: What is the Cost of Differential Privacy for Deep Learning-Based Trajectory Generation? | Was sind die Kosten der unterschiedlichen Privatsphäre für Deep Learning-Based Trajektorie Generation? | 深层学习轨迹世代的不同隐私成本是多少? 2506.09312v1 |
Authors (6): Erik Buchholz, Natasha Fernandes, David D. Nguyen, Alsharif Abuadbba, Surya Nepal, Salil S. Kanhere
While location trajectories offer valuable insights, they also reveal sensitive personal information. Differential Privacy (DP) offers formal protection, but achieving a favourable utility-privacy trade-off remains challenging. Recent works explore deep learning-based generative models to produce synthetic trajectories. However, current models lack formal privacy guarantees and rely on conditional information derived from real data during generation. This work investigates the utility cost of enforcing DP in such models, addressing three research questions across two datasets and eleven utility metrics. (1) We evaluate how DP-SGD, the standard DP training method for deep learning, affects the utility of state-of-the-art generative models. (2) Since DP-SGD is limited to unconditional models, we propose a novel DP mechanism for conditional generation that provides formal guarantees and assess its impact on utility. (3) We analyse how model types - Diffusion, VAE, and GAN - affect the utility-privacy trade-off. Our results show that DP-SGD significantly impacts performance, although some utility remains if the dataset is sufficiently large. The proposed DP mechanism improves training stability, particularly when combined with DP-SGD, for unstable models such as GANs and on smaller datasets. Diffusion models yield the best utility without guarantees, but with DP-SGD, GANs perform best, indicating that the best non-private model is not necessarily optimal when targeting formal guarantees. In conclusion, DP trajectory generation remains a challenging task, and formal guarantees are currently only feasible with large datasets and in constrained use cases.
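For readers unfamiliar with DP-SGD, here is a minimal sketch of a single update in plain Python: each per-example gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping norm is added, and the result is averaged. Privacy accounting is omitted, and all names and values (`dp_sgd_step`, `max_norm`, `noise_multiplier`, the toy gradients) are assumptions for illustration rather than the paper's setup.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each gradient, sum, add Gaussian noise, average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    batch = len(clipped)
    noisy_mean = []
    for j in range(len(params)):
        total = sum(g[j] for g in clipped)
        # Noise standard deviation is proportional to the clipping norm.
        total += rng.gauss(0.0, noise_multiplier * max_norm)
        noisy_mean.append(total / batch)
    return [p - lr * g for p, g in zip(params, noisy_mean)]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical gradients with norms 5.0 and 0.5
# noise_multiplier=0.0 disables the noise so the clipping effect is visible.
new_params = dp_sgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                         noise_multiplier=0.0, rng=rng)
```

Clipping bounds the influence of any single trajectory on the update, and the added noise is what degrades utility on small datasets, consistent with the trade-off the abstract reports.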
Article 560
Title@2025-06-11 (3): Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
Title: Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions | Rewind-to-Delete: Zertifiziertes Maschinen-Entlernen für nicht konvexe Funktionen | 倒回到删除: 认证机器不学习非convex函数 2409.09778v4 |
Authors (2): Siqiao Mu, Diego Klabjan
Machine unlearning algorithms aim to efficiently remove data from a model without retraining it from scratch, in order to remove corrupted or outdated data or respect a user’s “right to be forgotten.” Certified machine unlearning is a strong theoretical guarantee based on differential privacy that quantifies the extent to which an algorithm erases data from the model weights. In contrast to existing works in certified unlearning for convex or strongly convex loss functions, or nonconvex objectives with limiting assumptions, we propose the first, first-order, black-box (i.e., can be applied to models pretrained with vanilla gradient descent) algorithm for unlearning on general nonconvex loss functions, which unlearns by “rewinding” to an earlier step during the learning process before performing gradient descent on the loss function of the retained data points. We prove $(\epsilon, \delta)$ certified unlearning and performance guarantees that establish the privacy-utility-complexity tradeoff of our algorithm, and we prove generalization guarantees for functions that satisfy the Polyak-Lojasiewicz inequality. Finally, we demonstrate the superior performance of our algorithm compared to existing methods, within a new experimental framework that more accurately reflects unlearning user data in practice.
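The rewinding step described above can be sketched on a toy one-parameter problem: train with plain gradient descent while keeping checkpoints, then restart from an earlier checkpoint and descend on the retained data only. This is a simplified illustration under assumed settings (squared loss, scalar parameter, hypothetical function names); any calibrated noise needed for the $(\epsilon, \delta)$ certificate is omitted.

```python
def grad(theta, data):
    """Gradient of the loss 0.5 * mean((theta - x)^2) with respect to theta."""
    return sum(theta - x for x in data) / len(data)

def train(theta, data, steps, lr):
    """Plain gradient descent that records every iterate as a checkpoint."""
    checkpoints = [theta]
    for _ in range(steps):
        theta -= lr * grad(theta, data)
        checkpoints.append(theta)
    return checkpoints

def rewind_and_unlearn(checkpoints, retained, rewind_to, extra_steps, lr):
    """Restart from an earlier checkpoint and descend on the retained data only."""
    return train(checkpoints[rewind_to], retained, extra_steps, lr)[-1]

full_data = [1.0, 2.0, 3.0, 10.0]   # the last point is the one to forget
retained = full_data[:-1]
ckpts = train(0.0, full_data, steps=50, lr=0.5)
unlearned = rewind_and_unlearn(ckpts, retained, rewind_to=5,
                               extra_steps=50, lr=0.5)
```

In this toy case the original model converges to the mean of all four points, while the rewound model converges to the mean of the three retained points, the same solution retraining from scratch would reach.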
Article 561
Title@2025-06-11 (3): A Geometric Modeling of Occam’s Razor in Deep Learning
Title: A Geometric Modeling of Occam’s Razor in Deep Learning | Eine geometrische Modellierung von Occams Razor im Deep Learning | Occam的深层学习剃刀的几何建模 1905.11027v9 |
Authors (2): Ke Sun, Frank Nielsen
Why do deep neural networks (DNNs) benefit from very high dimensional parameter spaces? The contrast between their huge parameter complexity and their stunning performance in practice is all the more intriguing and cannot be explained by the standard theory of model selection for regular models. In this work, we propose a geometrically flavored information-theoretic approach to study this phenomenon. With the belief that simplicity is linked to better generalization, as grounded in the theory of minimum description length, the objective of our analysis is to examine and bound the complexity of DNNs. We introduce the locally varying dimensionality of the parameter space of neural network models by considering the number of significant dimensions of the Fisher information matrix, and model the parameter space as a manifold using the framework of singular semi-Riemannian geometry. We derive model complexity measures which yield short description lengths for deep neural network models based on their singularity analysis, thus explaining the good performance of DNNs despite their large number of parameters.
Article 562
Title@2025-06-11 (3): PARAFAC2-based Coupled Matrix and Tensor Factorizations with Constraints
Title: PARAFAC2-based Coupled Matrix and Tensor Factorizations with Constraints | PARAFAC2-basierte gekoppelte Matrix- und Tensor-Faktorisierungen mit Einschränkungen | 基于PARAFAC2的带约束耦合矩阵与张量分解 2406.12338v2 |
Authors (5): Carla Schenker, Xiulin Wang, David Horner, Morten A. Rasmussen, Evrim Acar
Data fusion models based on Coupled Matrix and Tensor Factorizations (CMTF) have been effective tools for joint analysis of data from multiple sources. While the vast majority of CMTF models are based on the strictly multilinear CANDECOMP/PARAFAC (CP) tensor model, the more flexible PARAFAC2 model has recently also been integrated into CMTF models. PARAFAC2 tensor models can handle irregular/ragged tensors and have been shown to be especially useful for modelling dynamic data with unaligned or irregular time profiles. However, existing PARAFAC2-based CMTF models have limitations in terms of possible regularizations on the factors and/or types of coupling between datasets. To address these limitations, in this paper we introduce a flexible algorithmic framework that fits PARAFAC2-based CMTF models using Alternating Optimization (AO) and the Alternating Direction Method of Multipliers (ADMM). The proposed framework allows imposing various constraints on all modes and linear couplings to other matrix-, CP- or PARAFAC2-models. Experiments on various simulated datasets and a real dataset demonstrate the utility and versatility of the proposed framework as well as its benefits in terms of accuracy and efficiency in comparison with state-of-the-art methods.
Article 563
Title@2025-06-11 (3): Neuromorphic Optical Tracking and Imaging of Randomly Moving Targets through Strongly Scattering Media
Title: Neuromorphic Optical Tracking and Imaging of Randomly Moving Targets through Strongly Scattering Media | Neuromorphes optisches Tracking und Bildgebung von zufällig sich bewegenden Zielen durch stark streuende Medien | 通过强烈散射媒体对随机移动目标进行神经形态光学跟踪和成像 2501.03874v2 |
Authors (3): Ning Zhang, Timothy Shea, Arto Nurmikko
Tracking and acquiring simultaneous optical images of randomly moving targets obscured by scattering media remains a challenging problem of importance to many applications that require precise object localization and identification. In this work we develop an end-to-end neuromorphic optical engineering and computational approach to demonstrate how to track and image normally invisible objects by combining an event detecting camera with a multistage neuromorphic deep learning strategy. Photons emerging from dense scattering media are detected by the event camera and converted to pixel-wise asynchronized spike trains - a first step in isolating object-specific information from the dominant uninformative background. Spiking data is fed into a deep spiking neural network (SNN) engine where object tracking and image reconstruction are performed by two separate yet interconnected modules running in parallel in discrete time steps over the event duration. Through benchtop experiments we demonstrate tracking and imaging of randomly moving objects in dense turbid media as well as image reconstruction of spatially stationary but optically dynamic objects. Standardized character sets serve as representative proxies for geometrically complex objects, underscoring the method’s generality. The results highlight the advantages of a fully neuromorphic approach in meeting a major imaging technology challenge with high computational efficiency and low power consumption.
Article 564
Title@2025-06-11 (3): Lower Bounds for Learning Quantum States with Single-Copy Measurements
Title: Lower Bounds for Learning Quantum States with Single-Copy Measurements | Untere Grenzen für das Lernen von Quantenzuständen mit Single-Copy-Messungen | 具有单复度测量法的低下学习区次数国家 2207.14438v3 |
Authors (2): Angus Lowe, Ashwin Nayak
We study the problems of quantum tomography and shadow tomography using measurements performed on individual, identical copies of an unknown $d$-dimensional state. We first revisit a known lower bound due to Haah et al. (2017) on quantum tomography with accuracy $\epsilon$ in trace distance, when the measurement choices are independent of previously observed outcomes (i.e., they are nonadaptive). We give a succinct proof of this result. This leads to stronger lower bounds when the learner uses measurements with a constant number of outcomes. In particular, this rigorously establishes the optimality of the folklore “Pauli tomography” algorithm in terms of its sample complexity. We also derive novel bounds of $\Omega(r^2 d/\epsilon^2)$ and $\Omega(r^2 d^2/\epsilon^2)$ for learning rank $r$ states using arbitrary and constant-outcome measurements, respectively, in the nonadaptive case. In addition to the sample complexity, a resource of practical significance for learning quantum states is the number of different measurements used by an algorithm. We extend our lower bounds to the case where the learner performs possibly adaptive measurements from a fixed set of $\exp(O(d))$ measurements. This implies in particular that adaptivity does not give us any advantage using single-copy measurements that are efficiently implementable. We also obtain a similar bound in the case where the goal is to predict the expectation values of a given sequence of observables, a task known as shadow tomography. Finally, in the case of adaptive, single-copy measurements implementable with polynomial-size circuits, we prove that a straightforward strategy based on computing sample means of the given observables is optimal.
Article 565
Title@2025-06-10 (2): Lightweight Object Detection Using Quantized YOLOv4-Tiny for Emergency Response in Aerial Imagery
Title: Lightweight Object Detection Using Quantized YOLOv4-Tiny for Emergency Response in Aerial Imagery | Leichtgewichtige Objekterkennung mit Quantisiertem YOLOv4-Tiny für Notfallreaktion in Luftbildern | 在空中图像中,使用量化的YOLOv4-Tiny用于应急反应的轻重量物体探测 2506.09299v1 |
Authors (2): Sindhu Boddu, Arindam Mukherjee
This paper presents a lightweight and energy-efficient object detection solution for aerial imagery captured during emergency response situations. We focus on deploying the YOLOv4-Tiny model, a compact convolutional neural network, optimized through post-training quantization to INT8 precision. The model is trained on a custom-curated aerial emergency dataset, consisting of 10,820 annotated images covering critical emergency scenarios. Unlike prior works that rely on publicly available datasets, we created this dataset ourselves due to the lack of publicly available drone-view emergency imagery, making the dataset itself a key contribution of this work. The quantized model is evaluated against YOLOv5-small across multiple metrics, including mean Average Precision (mAP), F1 score, inference time, and model size. Experimental results demonstrate that the quantized YOLOv4-Tiny achieves comparable detection performance while reducing the model size from 22.5 MB to 6.4 MB and improving inference speed by 44%. With a 71% reduction in model size and a 44% increase in inference speed, the quantized YOLOv4-Tiny model proves highly suitable for real-time emergency detection on low-power edge devices.
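The core of INT8 post-training quantization can be sketched as an affine mapping from floats to 8-bit integers and back. The abstract does not state which quantization scheme or toolchain was used, so the per-tensor asymmetric scheme, the function names, and the toy weights below are assumptions; the last line simply recomputes the reported size reduction.

```python
def quantize_int8(values):
    """Affine (asymmetric) per-tensor quantization of floats to int8."""
    # The representable range is widened to include 0 so that 0.0 maps exactly.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0   # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(x - zero_point) * scale for x in q]

weights = [-1.0, -0.25, 0.0, 0.5, 1.5]   # hypothetical weight values
q, scale, zp = quantize_int8(weights)
restored = dequantize_int8(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Size reduction implied by the abstract's figures: 22.5 MB -> 6.4 MB.
size_reduction = 1 - 6.4 / 22.5   # about 0.716, in line with the reported 71%
```

For values inside the quantization range, the round-trip error stays within half a quantization step, which is why detection quality can remain comparable while storage shrinks.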
Article 566
Title@2025-06-10 (2): WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles
Title: WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles | WaKA: Datenzuweisung unter Verwendung von K-Nächstgelegenen Nachbarn und Mitgliedschafts-Privacy Principles | WaKA:使用K-近邻和成员隐私权原则的数据归属 2411.01357v3 |
Authors (5): Patrick Mesana, Clément Bénesse, Hadrien Lautraite, Gilles Caporossi, Sébastien Gambs
In this paper, we introduce WaKA (Wasserstein K-nearest-neighbors Attribution), a novel attribution method that leverages principles from the LiRA (Likelihood Ratio Attack) framework and k-nearest neighbors classifiers (k-NN). WaKA efficiently measures the contribution of individual data points to the model’s loss distribution, analyzing every possible k-NN that can be constructed using the training set, without needing to sample subsets of the training set. WaKA is versatile and can be used a posteriori as a membership inference attack (MIA) to assess privacy risks or a priori for privacy influence measurement and data valuation. Thus, WaKA can be seen as bridging the gap between data attribution and membership inference attack (MIA) by providing a unified framework to distinguish between a data point’s value and its privacy risk. For instance, we have shown that self-attribution values are more strongly correlated with the attack success rate than the contribution of a point to the model generalization. WaKA’s different uses were also evaluated across diverse real-world datasets, demonstrating performance very close to LiRA when used as an MIA on k-NN classifiers, but with greater computational efficiency. Additionally, WaKA shows greater robustness than Shapley Values for data minimization tasks (removal or addition) on imbalanced datasets.
Article 567
Title@2025-06-10 (2): Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems
Title: Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems | Künstliche Intelligenz für die Wissenschaft in Quanten-, Atom- und Kontinuumsystemen | 量子、原子学和连续系统科学人造情报 2307.08423v5 |
Authors (63): Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Alex Strasser, Haiyang Yu, YuQing Xie, Xiang Fu, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence, Hannes Stärk, Shurui Gui, Carl Edwards, Nicholas Gao, Adriana Ladera, Tailin Wu, Elyssa F. Hofgard, Aria Mansouri Tehrani, Rui Wang, Ameya Daigavane, Montgomery Bohde, Jerry Kurtin, Qian Huang, Tuong Phung, Minkai Xu, Chaitanya K. Joshi, Simon V. Mathis, Kamyar Azizzadenesheli, Ada Fang, Alán Aspuru-Guzik, Erik Bekkers, Michael Bronstein, Marinka Zitnik, Anima Anandkumar, Stefano Ermon, Pietro Liò, Rose Yu, Stephan Günnemann, Jure Leskovec, Heng Ji, Jimeng Sun, Regina Barzilay, Tommi Jaakkola, Connor W. Coley, Xiaoning Qian, Xiaofeng Qian, Tess Smidt, Shuiwang Ji
Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interest and efforts to further advance AI4Science.
Article 568
Title@2025-06-10 (2): Causal Graph Recovery in Neuroimaging through Answer Set Programming
Title: Causal Graph Recovery in Neuroimaging through Answer Set Programming | Causal Graph Recovery in Neuroimaging durch Answer Set Programmierung | 通过 “ 解答组 “ 方案编制实现神经成形过程中的因果图恢复 2506.09286v1 |
Authors (5): Mohammadsajad Abavisani, Kseniya Solovyeva, David Danks, Vince Calhoun, Sergey Plis
Learning graphical causal structures from time series data presents significant challenges, especially when the measurement frequency does not match the causal timescale of the system. This often leads to a set of equally possible underlying causal graphs due to information loss from sub-sampling (i.e., not observing all possible states of the system throughout time). Our research addresses this challenge by incorporating the effects of sub-sampling in the derivation of causal graphs, resulting in more accurate and intuitive outcomes. We use a constraint optimization approach, specifically answer set programming (ASP), to find the optimal set of answers. ASP not only identifies the most probable underlying graph, but also provides an equivalence class of possible graphs for expert selection. In addition, using ASP allows us to leverage graph theory to further prune the set of possible solutions, yielding a smaller, more accurate answer set significantly faster than traditional approaches. We validate our approach on both simulated data and empirical structural brain connectivity, and demonstrate its superiority over established methods in these experiments. We further show how our method can be used as a meta-approach on top of established methods to obtain, on average, a 12% improvement in F1 score. In addition, we achieve state-of-the-art results in terms of precision and recall when reconstructing causal graphs from sub-sampled time series data. Finally, our method shows robustness to varying degrees of sub-sampling on realistic simulations, whereas other methods perform worse for higher rates of sub-sampling.
Article 569
Title@2025-06-10 (2): ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs
Title: ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs | Skalierbare HD: Skalierbare und hochdurchsatzfähige Hyperdimensionale Computing-Inferenz auf Multi-Core-CPUs | 可缩缩放的HD:可缩放的和高通向的多孔数多维电子计算引文 2506.09282v1 |
Authors (2): Dhruv Parikh, Viktor Prasanna
Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that represents and manipulates information using high-dimensional vectors, called hypervectors (HV). Traditional HDC methods, while robust to noise and inherently parallel, rely on single-pass, non-parametric training and often suffer from low accuracy. To address this, recent approaches adopt iterative training of base and class HVs, typically accelerated on GPUs. Inference, however, remains lightweight and well-suited for real-time execution. Yet, efficient HDC inference has been studied almost exclusively on specialized hardware such as FPGAs and GPUs, with limited attention to general-purpose multi-core CPUs. To address this gap, we propose ScalableHD for scalable and high-throughput HDC inference on multi-core CPUs. ScalableHD employs a two-stage pipelined execution model, where each stage is parallelized across cores and processes chunks of base and class HVs. Intermediate results are streamed between stages using a producer-consumer mechanism, enabling on-the-fly consumption and improving cache locality. To maximize performance, ScalableHD integrates memory tiling and NUMA-aware worker-to-core binding. Further, it features two execution variants tailored for small and large batch sizes, each designed to exploit compute parallelism based on workload characteristics while mitigating the memory-bound compute pattern that limits HDC inference performance on modern multi-core CPUs. ScalableHD achieves up to 10x speedup in throughput (samples per second) over state-of-the-art baselines such as TorchHD, across a diverse set of tasks ranging from human activity recognition to image classification, while preserving task accuracy. Furthermore, ScalableHD exhibits robust scalability: increasing the number of cores yields near-proportional throughput improvements.
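To show what HDC inference computes, the sketch below implements a basic single-pass pipeline in plain Python: features are projected onto random bipolar base HVs and binarized, class HVs are formed by summing encoded training samples, and a query is assigned to the class HV with the largest dot product. It is a generic textbook-style illustration, not ScalableHD's pipelined, tiled, or NUMA-aware implementation; the encoding choice, function names, and toy data are assumptions.

```python
import random

def make_base_hvs(num_features, dim, rng):
    """One random bipolar base hypervector per input feature (assumed encoding)."""
    return [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(num_features)]

def encode(x, base_hvs):
    """Project the features onto the base HVs and binarize with sign."""
    dim = len(base_hvs[0])
    sums = [sum(x[f] * base_hvs[f][d] for f in range(len(x))) for d in range(dim)]
    return [1 if s >= 0 else -1 for s in sums]

def train_class_hvs(samples, labels, base_hvs, num_classes):
    """Single-pass training: bundle (sum) the encoded samples of each class."""
    dim = len(base_hvs[0])
    class_hvs = [[0] * dim for _ in range(num_classes)]
    for x, y in zip(samples, labels):
        for d, v in enumerate(encode(x, base_hvs)):
            class_hvs[y][d] += v
    return class_hvs

def predict(x, base_hvs, class_hvs):
    """Inference: encode the query, then pick the most similar class HV."""
    h = encode(x, base_hvs)
    scores = [sum(a * b for a, b in zip(h, c)) for c in class_hvs]
    return scores.index(max(scores))

rng = random.Random(0)
base = make_base_hvs(num_features=2, dim=512, rng=rng)
X = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]   # toy samples
y = [0, 0, 1, 1]
classes = train_class_hvs(X, y, base, num_classes=2)
pred = [predict(x, base, classes) for x in X]
```

Both `encode` and `predict` reduce to large matrix-vector products over the base and class HVs, which is the memory-bound workload the abstract describes splitting into two pipelined stages.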
Article 570
Title@2025-06-10 (2): TTrace: Lightweight Error Checking and Diagnosis for Distributed Training
Title: TTrace: Lightweight Error Checking and Diagnosis for Distributed Training | TTrace: Leichte Fehlerprüfung und Diagnose für verteiltes Training | TTrace:轻量度错误检查和分布式培训诊断 2506.09280v1 |
Authors (8): Haitian Jiang, Shaowei Zhu, Zhen Zhang, Zhenyu Song, Xinwei Fu, Zhen Jia, Yida Wang, Jinyang Li
Distributed training is essential for scaling the training of large neural network models, such as large language models (LLMs), across thousands of GPUs. However, the complexity of distributed training programs makes them particularly prone to silent bugs, which do not produce an explicit error signal but lead to incorrect training outcomes. Effectively detecting and localizing such silent bugs in distributed training is challenging. Common debugging practice using metrics like training loss or gradient norm curves can be inefficient and ineffective. Additionally, obtaining intermediate tensor values and determining whether they are correct during silent bug localization is difficult, particularly in the context of low-precision training. To address those challenges, we design and implement TTrace, the first system capable of detecting and localizing silent bugs in distributed training. TTrace collects intermediate tensors from distributed training in a fine-grained manner and compares them against those from a trusted single-device reference implementation. To properly compare the floating-point values in the tensors, we propose a novel mathematical analysis that provides a guideline for setting thresholds, enabling TTrace to distinguish bug-induced errors from floating-point round-off errors. Experimental results demonstrate that TTrace effectively detects 11 existing bugs and 3 new bugs in the widely used Megatron-LM framework, while requiring fewer than 10 lines of code change. TTrace is effective in various training recipes, including low-precision recipes involving BF16 and FP8.
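The idea of separating bug-induced errors from round-off can be sketched as a thresholded comparison against a reference tensor. This is only an illustration: the abstract does not give TTrace's threshold analysis, so the relative L2 error metric, the `slack` factor, and the function names below are assumptions. The one fixed fact used is that bfloat16 has machine epsilon 2^-7.

```python
import math

BF16_EPS = 2.0 ** -7  # gap between 1.0 and the next representable bfloat16 value

def relative_error(candidate, reference):
    """Relative L2 error between two flat lists of floats."""
    diff = math.sqrt(sum((c - r) ** 2 for c, r in zip(candidate, reference)))
    norm = math.sqrt(sum(r ** 2 for r in reference))
    return diff / norm if norm > 0 else diff

def looks_buggy(candidate, reference, eps=BF16_EPS, slack=4.0):
    """Flag a tensor whose error exceeds a round-off-sized threshold.

    `slack` is a hypothetical safety factor, not a value from the paper.
    """
    return relative_error(candidate, reference) > slack * eps

reference = [1.0, -2.0, 0.5, 4.0]                          # trusted single-device values
rounded = [r * (1 + 0.5 * BF16_EPS) for r in reference]   # round-off-sized deviation
corrupted = [1.0, -2.0, 0.5, 0.0]                          # one element silently lost

round_off_flagged = looks_buggy(rounded, reference)
corruption_flagged = looks_buggy(corrupted, reference)
```

A fixed multiple of epsilon is the crudest possible rule; a real threshold would have to account for how round-off accumulates through each operator.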
Article 571
Title@2025-06-10 (2): Ambient Diffusion Omni: Training Good Models with Bad Data
Title: Ambient Diffusion Omni: Training Good Models with Bad Data | Ambient Diffusion Omni: Gute Modelle mit schlechten Daten trainieren | 环境扩散全云:培训不良数据良好模型 2506.10038v1 |
Authors (5): Giannis Daras, Adrian Rodriguez-Munoz, Adam Klivans, Antonio Torralba, Constantinos Daskalakis
We show how to use low-quality, synthetic, and out-of-distribution images to improve the quality of a diffusion model. Typically, diffusion models are trained on curated datasets that emerge from highly filtered data pools from the Web and other sources. We show that there is immense value in the lower-quality images that are often discarded. We present Ambient Diffusion Omni, a simple, principled framework to train diffusion models that can extract signal from all available images during training. Our framework exploits two properties of natural images – spectral power law decay and locality. We first validate our framework by successfully training diffusion models with images synthetically corrupted by Gaussian blur, JPEG compression, and motion blur. We then use our framework to achieve state-of-the-art ImageNet FID, and we show significant improvements in both image quality and diversity for text-to-image generative modeling. The core insight is that noise dampens the initial skew between the desired high-quality distribution and the mixed distribution we actually observe. We provide rigorous theoretical justification for our approach by analyzing the trade-off between learning from biased data versus limited unbiased data across diffusion times.
Article 572
Title@2025-06-10 (2): A Topic Modeling Analysis of Stigma Dimensions, Social, and Related Behavioral Circumstances in Clinical Notes Among Patients with HIV
Title: A Topic Modeling Analysis of Stigma Dimensions, Social, and Related Behavioral Circumstances in Clinical Notes Among Patients with HIV | Thematische Modellierung von Stigma-Dimensionen, sozialen und damit zusammenhängenden Verhaltensumständen bei Patienten mit HIV | 艾滋病毒携带者临床说明中的污名、社会和相关行为环境的典型问题专题分析 2506.09279v1 |
Authors (8): Ziyi Chen, Yiyang Liu, Mattia Prosperi, Krishna Vaddiparti, Robert L Cook, Jiang Bian, Yi Guo, Yonghui Wu
Objective: To characterize stigma dimensions, social, and related behavioral circumstances in people living with HIV (PLWHs) seeking care, using natural language processing methods applied to a large collection of electronic health record (EHR) clinical notes from a large integrated health system in the southeast United States. Methods: We identified a cohort of 9,140 PLWHs from the UF Health IDR and performed topic modeling analysis using Latent Dirichlet Allocation (LDA) to uncover stigma dimensions, social, and related behavioral circumstances. Domain experts created a seed list of HIV-related stigma keywords, then applied a snowball strategy to iteratively review notes for additional terms until saturation was reached. To identify more target topics, we tested three keyword-based filtering strategies. Domain experts manually reviewed the detected topics using the prevalent terms and key discussion topics. Word frequency analysis was used to highlight the prevalent terms associated with each topic. In addition, we conducted topic variation analysis among subgroups to examine differences across age and sex-specific demographics. Results and Conclusion: Topic modeling on sentences containing at least one keyword uncovered a wide range of topic themes associated with HIV-related stigma, social, and related behavioral circumstances, including “Mental Health Concern and Stigma”, “Social Support and Engagement”, “Limited Healthcare Access and Severe Illness”, and “Treatment Refusal and Isolation”, among others. Topic variation analysis across age subgroups revealed differences. Extracting and understanding the HIV-related stigma dimensions, social, and related behavioral circumstances from EHR clinical notes enables scalable, time-efficient assessment, overcoming the limitations of traditional questionnaires and improving patient outcomes.
Article 573
Title@2025-06-10 (2): UFM: A Simple Path towards Unified Dense Correspondence with Flow
Title: UFM: A Simple Path towards Unified Dense Correspondence with Flow | UFM: Ein einfacher Weg zur Unified Dense Correspondence mit Flow | UFM:通向统一思维与流动对应的简单路径 2506.09278v1 |
Authors (12): Yuchen Zhang, Nikhil Keetha, Chenwei Lyu, Bhuvan Jhamb, Yutian Chen, Yuheng Qiu, Jay Karhade, Shreyas Jha, Yaoyu Hu, Deva Ramanan, Sebastian Scherer, Wenshan Wang
Dense image correspondence is central to many applications, such as visual odometry, 3D reconstruction, object association, and re-identification. Historically, dense correspondence has been tackled separately for wide-baseline scenarios and optical flow estimation, despite the common goal of matching content between two images. In this paper, we develop a Unified Flow & Matching model (UFM), which is trained on unified data for pixels that are co-visible in both source and target images. UFM uses a simple, generic transformer architecture that directly regresses the (u,v) flow. It is easier to train and more accurate for large flows compared to the typical coarse-to-fine cost volumes in prior work. UFM is 28% more accurate than state-of-the-art flow methods (Unimatch), while also having 62% less error and being 6.7x faster than dense wide-baseline matchers (RoMa). UFM is the first to demonstrate that unified training can outperform specialized approaches across both domains. This result enables fast, general-purpose correspondence and opens new directions for multi-modal, long-range, and real-time correspondence tasks.
Article 574
Title@2025-06-10 (2): EMMA: Efficient Visual Alignment in Multi-Modal LLMs
Title: EMMA: Efficient Visual Alignment in Multi-Modal LLMs | EMMA: Effiziente visuelle Ausrichtung in Multi-Modal LLMs | EMMA:多模式LMS中高效视觉协调 2410.02080v2 |
Authors (5): Sara Ghazanfari, Alexandre Araujo, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami
Multi-modal Large Language Models (MLLMs) have recently exhibited impressive general-purpose capabilities by leveraging vision foundation models to encode the core concepts of images into representations. These are then combined with instructions and processed by the language model to generate high-quality responses. Despite significant progress in enhancing the language component, challenges persist in optimally fusing visual encodings within the language model for task-specific adaptability. Recent research has focused on improving this fusion through modality adaptation modules but at the cost of significantly increased model complexity and training data needs. In this paper, we propose EMMA (Efficient Multi-Modal Adaptation), a lightweight cross-modality module designed to efficiently fuse visual and textual encodings, generating instruction-aware visual representations for the language model. Our key contributions include: (1) an efficient early fusion mechanism that integrates vision and language representations with minimal added parameters (less than 0.2% increase in model size); (2) an in-depth interpretability analysis that sheds light on the internal mechanisms of the proposed method; (3) comprehensive experiments that demonstrate notable improvements on both specialized and general benchmarks for MLLMs. Empirical results show that EMMA boosts performance across multiple tasks by up to 9.3% while significantly improving robustness against hallucinations. Our code is available at https://github.com/SaraGhazanfari/EMMA
Article 575
Title@2025-06-10 (2): Learning The Minimum Action Distance
Title: Learning The Minimum Action Distance | Die minimale Aktionsdistanz lernen | 学习最低行动距离 2506.09276v1 |
Authors (4): Lorenzo Steccanella, Joshua B. Evans, Özgür Şimşek, Anders Jonsson
This paper presents a state representation framework for Markov decision processes (MDPs) that can be learned solely from state trajectories, requiring neither reward signals nor the actions executed by the agent. We propose learning the minimum action distance (MAD), defined as the minimum number of actions required to transition between states, as a fundamental metric that captures the underlying structure of an environment. MAD naturally enables critical downstream tasks such as goal-conditioned reinforcement learning and reward shaping by providing a dense, geometrically meaningful measure of progress. Our self-supervised learning approach constructs an embedding space where the distances between embedded state pairs correspond to their MAD, accommodating both symmetric and asymmetric approximations. We evaluate the framework on a comprehensive suite of environments with known MAD values, encompassing both deterministic and stochastic dynamics, as well as discrete and continuous state spaces, and environments with noisy observations. Empirical results demonstrate that the proposed approach not only efficiently learns accurate MAD representations across these diverse settings but also significantly outperforms existing state representation methods in terms of representation quality.
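For a small deterministic environment whose transition graph is known, the ground-truth MAD follows directly from the definition and can be computed with breadth-first search. The sketch below is such a reference computation, not the paper's self-supervised learning method; the toy graph and function name are illustrative assumptions.

```python
from collections import deque

def minimum_action_distance(transitions, start, goal):
    """Fewest actions needed to reach `goal` from `start`, or None if unreachable."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        state, dist = frontier.popleft()
        if state == goal:
            return dist
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None

# Toy directed environment: each state maps to the states one action away.
transitions = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}

forward = minimum_action_distance(transitions, "A", "C")   # A -> B -> C
backward = minimum_action_distance(transitions, "C", "A")  # C -> A
```

Here `forward` and `backward` differ, which shows why the distance is not symmetric in general and why an embedding may need an asymmetric approximation.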
Article 576
Title@2025-06-10 (2): Amortized Inference of Causal Models via Conditional Fixed-Point Iterations
Title: Amortized Inference of Causal Models via Conditional Fixed-Point Iterations | Amortisierte Schlussfolgerung von Kausalmodellen über bedingte Fixed-Point-Iterationen | 通过有条件的固定点定点迭代对因果模型进行摊销 2410.06128v3 |
Authors (5): Divyat Mahajan, Jannes Gladrow, Agrin Hilmkil, Cheng Zhang, Meyer Scetbon
Structural Causal Models (SCMs) offer a principled framework to reason about interventions and support out-of-distribution generalization, which are key goals in scientific discovery. However, the task of learning SCMs from observed data poses formidable challenges, and often requires training a separate model for each dataset. In this work, we propose amortized inference of SCMs by training a single model on multiple datasets sampled from different SCMs. We first use a transformer-based architecture for amortized learning of dataset embeddings, and then extend the Fixed-Point Approach (FiP) (Scetbon et al.) to infer SCMs conditionally on their dataset embeddings. As a byproduct, our method can generate observational and interventional data from novel SCMs at inference time, without updating parameters. Empirical results show that our amortized procedure performs on par with baselines trained specifically for each dataset on both in- and out-of-distribution problems, and also outperforms them in scarce data regimes.
Article 577
Title@2025-06-10 (2): RocketPPA: Code-Level Power, Performance, and Area Prediction via LLM and Mixture of Experts
Title: RocketPPA: Code-Level Power, Performance, and Area Prediction via LLM and Mixture of Experts | RocketPPA: Code-Level Power, Performance und Area Prediction über LLM und Mixture of Experts | 火箭式PPPA:通过LLM和专家混合进行代码级动力、性能和地区预测 2503.21971v3 |
Authors (3): Armin Abdollahi, Mehdi Kamal, Massoud Pedram
This paper presents RocketPPA, a novel ultra-fast power, performance (delay), and area (PPA) estimator operating directly at the code-level abstraction using HDL code as input. The key technical innovation is its LLM-based regression model, which uniquely integrates a large language model (LLM) with a mixture-of-experts (MoE) architecture composed of multilayer perceptrons (MLPs). The LLM interprets the input HDL code and then utilizes its final hidden-layer representations to predict PPA metrics. Low-rank adaptation (LoRA) is used for parameter-efficient fine-tuning to enable efficient LLM training. Furthermore, the work includes the development of an LLM-based HDL code repair framework to generate a large and synthesizable training dataset. Experimental results on the VerilogEval benchmark demonstrate that RocketPPA achieves significant improvements in the accuracy of PPA estimation compared to previous state-of-the-art methods like Llama3-MetRex-8B. Specifically, at a 10% relative error threshold, RocketPPA enhances the pass rate for area prediction by 13.6%, delay by 9.4%, and power by 14.7%. At a 20% threshold, the improvements are 9.6% for area, 10.8% for delay, and 18.5% for power. Moreover, RocketPPA achieves a speedup of over 20x compared to MetRex and 30x over MasterRTL in processing the test set. The impact of RocketPPA is the potential to substantially accelerate the hardware design process by providing accurate PPA estimations early in the design cycle, thus avoiding the overhead of manual feature engineering and time-consuming synthesis flows.
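The pass-rate figures above can be read with a small worked example. The sketch assumes that "pass rate at an X% relative error threshold" means the fraction of designs whose predicted metric lies within X% of the reference value; the abstract does not spell out the definition, and the numbers below are made up for illustration.

```python
def pass_rate(predicted, actual, threshold=0.10):
    """Fraction of predictions whose relative error is at most `threshold`."""
    hits = sum(abs(p - a) / abs(a) <= threshold for p, a in zip(predicted, actual))
    return hits / len(actual)

# Hypothetical area values for four designs (arbitrary units).
actual = [100.0, 200.0, 50.0, 80.0]
predicted = [105.0, 230.0, 54.0, 100.0]   # relative errors: 5%, 15%, 8%, 25%

rate_at_10 = pass_rate(predicted, actual, threshold=0.10)
rate_at_20 = pass_rate(predicted, actual, threshold=0.20)
```

Loosening the threshold can only keep or raise the pass rate, so results at the 10% and 20% thresholds are not directly comparable to each other.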
Article 578
Title@2025-06-10 (2): G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration
Title: G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration | G-Sim: Generative Simulationen mit großen Sprachmodellen und gradientfreier Kalibrierung | G-Sim:使用大语言模型和梯度自由校准的生成模拟 2506.09272v1 |
Authors (4): Samuel Holt, Max Ruiz Luyten, Antonin Berthon, Mihaela van der Schaar
Constructing robust simulators is essential for asking “what if?” questions and guiding policy in critical domains like healthcare and logistics. However, existing methods often struggle, either failing to generalize beyond historical data or, when using Large Language Models (LLMs), suffering from inaccuracies and poor empirical alignment. We introduce G-Sim, a hybrid framework that automates simulator construction by synergizing LLM-driven structural design with rigorous empirical calibration. G-Sim employs an LLM in an iterative loop to propose and refine a simulator’s core components and causal relationships, guided by domain knowledge. This structure is then grounded in reality by estimating its parameters using flexible calibration techniques. Specifically, G-Sim can leverage methods that are both likelihood-free and gradient-free with respect to the simulator, such as gradient-free optimization for direct parameter estimation or simulation-based inference for obtaining a posterior distribution over parameters. This allows it to handle non-differentiable and stochastic simulators. By integrating domain priors with empirical evidence, G-Sim produces reliable, causally-informed simulators, mitigating data-inefficiency and enabling robust system-level interventions for complex decision-making.
Article 579
Title@2025-06-10 (2): Uncertainty Prioritized Experience Replay
Title: Uncertainty Prioritized Experience Replay | Ungewissheit Priorisierte Erfahrung Replay | 不确定性确定性优先经验重放 2506.09270v1 |
Authors (4): Rodrigo Carrasco-Davis, Sebastian Lee, Claudia Clopath, Will Dabney
Prioritized experience replay, which improves sample efficiency by selecting relevant transitions to update parameter estimates, is a crucial component of contemporary value-based deep reinforcement learning models. Typically, transitions are prioritized based on their temporal difference error. However, this approach is prone to favoring noisy transitions, even when the value estimation closely approximates the target mean. This phenomenon resembles the noisy TV problem postulated in the exploration literature, in which exploration-guided agents get stuck by mistaking noise for novelty. To mitigate the disruptive effects of noise in value estimation, we propose using epistemic uncertainty estimation to guide the prioritization of transitions from the replay buffer. Epistemic uncertainty quantifies the uncertainty that can be reduced by learning, so prioritizing by it reduces the sampling of transitions generated by unpredictable random processes. We first illustrate the benefits of epistemic uncertainty prioritized replay in two tabular toy models: a simple multi-arm bandit task, and a noisy gridworld. Subsequently, we evaluate our prioritization scheme on the Atari suite, outperforming quantile regression deep Q-learning benchmarks, thus forging a path for the use of uncertainty prioritized replay in reinforcement learning agents.
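The contrast between the two prioritization signals can be sketched on two toy transitions. The abstract does not say how epistemic uncertainty is estimated, so this example swaps in a common proxy, the variance of value estimates across an ensemble; that choice, the function names, and all numbers are assumptions made for illustration.

```python
from statistics import pvariance

def sampling_probs(priorities, alpha=1.0, floor=1e-6):
    """Turn non-negative priorities into replay sampling probabilities."""
    scaled = [(p + floor) ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]

def ensemble_disagreement(value_estimates):
    """Variance across ensemble members, used here as an epistemic proxy."""
    return pvariance(value_estimates)

# Transition 0: noisy reward but well learned (members agree, TD error is large).
# Transition 1: rarely visited (members disagree, TD error is moderate).
td_errors = [2.0, 0.5]
ensembles = [[1.0, 1.01, 0.99], [0.2, 1.0, 1.8]]

td_probs = sampling_probs([abs(e) for e in td_errors])
unc_probs = sampling_probs([ensemble_disagreement(e) for e in ensembles])
```

With TD-error priorities the noisy transition dominates the buffer, while the disagreement-based priorities shift sampling toward the transition that can still be learned.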
Article 580
Title@2025-06-10 (2): A Stable Whitening Optimizer for Efficient Neural Network Training
Title: A Stable Whitening Optimizer for Efficient Neural Network Training | Ein stabiler Whitening Optimizer für effizientes Neural Network Training | 高效神经网络培训的稳定白电优化器 2506.07254v2 |
Authors (3): Kevin Frans, Sergey Levine, Pieter Abbeel
In this work, we take an experimentally grounded look at neural network optimization. Building on the Shampoo family of algorithms, we identify and alleviate three key issues, resulting in the proposed SPlus method. First, we find that naive Shampoo is prone to divergence when matrix-inverses are cached for long periods. We introduce an alternate bounded update combining a historical eigenbasis with instantaneous normalization, resulting in across-the-board stability and significantly lower computational requirements. Second, we adapt a shape-aware scaling to enable learning rate transfer across network width. Third, we find that high learning rates result in large parameter noise, and propose a simple iterate-averaging scheme which unblocks faster learning. To properly confirm these findings, we introduce a pointed Transformer training benchmark, considering three objectives (language modelling, image classification, and diffusion modelling) across different stages of training. On average, SPlus is able to reach the validation performance of Adam within 44% of the gradient steps and 62% of the wallclock time.
Article 581
Title@2025-06-10 (2): Dataset Properties Shape the Success of Neuroimaging-Based Patient Stratification: A Benchmarking Analysis Across Clustering Algorithms
Title: Dataset Properties Shape the Success of Neuroimaging-Based Patient Stratification: A Benchmarking Analysis Across Clustering Algorithms | Datensatzeigenschaften gestalten den Erfolg der Neuroimaging-basierten Patientenschichtung: Eine Benchmarking-Analyse über Cluster-Algorithmen hinweg | 数据集属性形状基于神经成形的患者分层成功率:对集束数值的基准分析 2503.12066v2 |
Authors (6): Yuetong Yu, Ruiyang Ge, Ilker Hacihaliloglu, Alexander Rauscher, Roger Tam, Sophia Frangou
Background: Data-driven stratification of patients into biologically informed subtypes holds promise for precision neuropsychiatry, yet neuroimaging-based clustering methods often fail to generalize across cohorts. While algorithmic innovations have focused on model complexity, the role of underlying dataset characteristics remains underexplored. We hypothesized that cluster separation, size imbalance, noise, and the direction and magnitude of disease-related effects in the input data critically determine both within-algorithm accuracy and reproducibility. Methods: We evaluated 4 widely used stratification algorithms, HYDRA, SuStaIn, SmileGAN, and SurrealGAN, on a suite of synthetic brain-morphometry cohorts derived from the Human Connectome Project Young Adult dataset. Three global transformation patterns were applied to 600 pseudo-patients against 508 controls, followed by 4 within-dataset variations varying cluster count (k=2-6), overlap, and effect magnitude. Algorithm performance was quantified by accuracy in recovering the known ground-truth clusters. Results: Across 122 synthetic scenarios, data complexity consistently outweighed algorithm choice in predicting stratification success. Well-separated clusters yielded high accuracy for all methods, whereas overlapping, unequal-sized, or subtle effects reduced accuracy by up to 50%. SuStaIn could not scale beyond 17 features; HYDRA’s accuracy varied unpredictably with data heterogeneity. SmileGAN and SurrealGAN maintained robust pattern detection but did not assign discrete cluster labels to individuals. Conclusions: The results demonstrate the impact of the statistical properties of input data across algorithms, highlight the need to use realistic dataset distributions when developing new algorithms, and suggest a greater focus on data-centric strategies that actively shape and standardize the input distributions.
Article 582
Title@2025-06-10 (2): Archon: An Architecture Search Framework for Inference-Time Techniques
Title: Archon: An Architecture Search Framework for Inference-Time Techniques | Archon: Ein Architektur-Suchrahmen für Inferenz-Zeit-Techniken | Archon: 推断-时间技术建筑搜索框架 2409.15254v6 |
Authors (11): Jon Saad-Falcon, Adrian Gamarra Lafuente, Shlok Natarajan, Nahum Maru, Hristo Todorov, Etash Guha, E. Kelly Buchanan, Mayee Chen, Neel Guha, Christopher Ré, Azalia Mirhoseini
Inference-time techniques, such as repeated sampling or iterative revisions, are emerging as powerful ways to enhance large-language models (LLMs) at test time. However, best practices for developing systems that combine these techniques remain underdeveloped due to our limited understanding of the utility of each technique across models and tasks, the interactions between them, and the massive search space for combining them. To address these challenges, we introduce Archon, a modular and automated framework for optimizing the process of selecting and combining inference-time techniques and LLMs. Given a compute budget and a set of available LLMs, Archon explores a large design space to discover optimized configurations tailored to target benchmarks. It can design custom or general-purpose architectures that advance the Pareto frontier of accuracy vs. maximum token budget compared to top-performing baselines. Across instruction-following, reasoning, and coding tasks, we show that Archon can leverage additional inference compute budget to design systems that outperform frontier models such as OpenAI’s o1, GPT-4o, and Claude 3.5 Sonnet by an average of 15.1%.
Article 583
Title@2025-06-10 (2): CFMI: Flow Matching for Missing Data Imputation
Title: CFMI: Flow Matching for Missing Data Imputation | CFMI: Flow Matching für fehlende Daten-Imputation | CFMI: 缺失数据计算流程匹配 2506.09258v1 |
Authors (2): Vaidotas Simkus, Michael U. Gutmann
We introduce conditional flow matching for imputation (CFMI), a new general-purpose method to impute missing data. The method combines continuous normalising flows, flow-matching, and shared conditional modelling to deal with intractabilities of traditional multiple imputation. Our comparison with nine classical and state-of-the-art imputation methods on 24 small to moderate-dimensional tabular data sets shows that CFMI matches or outperforms both traditional and modern techniques across a wide range of metrics. Applying the method to zero-shot imputation of time-series data, we find that it matches the accuracy of a related diffusion-based method while outperforming it in terms of computational efficiency. Overall, CFMI performs at least as well as traditional methods on lower-dimensional data while remaining scalable to high-dimensional settings, matching or exceeding the performance of other deep learning-based approaches, making it a go-to imputation method for a wide range of data types and dimensionalities.
Article 584
Title@2025-06-10 (2): Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Title: Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | Kommentar zur Illusion des Denkens: Die Stärken und Grenzen von Vernunftmodellen über das Lens of Problem Complexity verstehen | 关于 “ 思考的幻觉:通过问题复杂焦点了解理性模型的长处和局限性 “ 的评论 2506.09250v1 |
Authors (2): C. Opus, A. Lawsen
Shojaee et al. (2025) report that Large Reasoning Models (LRMs) exhibit “accuracy collapse” on planning puzzles beyond certain complexity thresholds. We demonstrate that their findings primarily reflect experimental design limitations rather than fundamental reasoning failures. Our analysis reveals three critical issues: (1) Tower of Hanoi experiments systematically exceed model output token limits at reported failure points, with models explicitly acknowledging these constraints in their outputs; (2) The authors’ automated evaluation framework fails to distinguish between reasoning failures and practical constraints, leading to misclassification of model capabilities; (3) Most concerningly, their River Crossing benchmarks include mathematically impossible instances for N > 5 due to insufficient boat capacity, yet models are scored as failures for not solving these unsolvable problems. When we control for these experimental artifacts, by requesting generating functions instead of exhaustive move lists, preliminary experiments across multiple models indicate high accuracy on Tower of Hanoi instances previously reported as complete failures. These findings highlight the importance of careful experimental design when evaluating AI reasoning capabilities.
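The output-length argument in point (1) rests on a standard fact: the optimal Tower of Hanoi solution for N disks has 2^N - 1 moves, so an exhaustive move list grows exponentially while a program that generates it stays a few lines long. The sketch below is the textbook recursive generator, offered as an illustration of that gap rather than as the prompts or code used in the paper.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Yield the optimal move sequence for n disks as (disk, from_peg, to_peg)."""
    if n == 0:
        return
    # Move the n-1 smaller disks out of the way, move disk n, then restack.
    yield from hanoi_moves(n - 1, source, spare, target)
    yield (n, source, target)
    yield from hanoi_moves(n - 1, spare, target, source)

moves = list(hanoi_moves(10))
move_count = len(moves)          # equals 2**10 - 1
```

Each additional disk doubles the list length (plus one move), so a fixed output token budget is exceeded at some N regardless of how well the procedure is understood.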
Article 585
Title@2025-06-10 (2): Simulation-based Inference for High-dimensional Data using Surjective Sequential Neural Likelihood Estimation
Title: Simulation-based Inference for High-dimensional Data using Surjective Sequential Neural Likelihood Estimation | Simulationsbasierte Inferenz für hochdimensionale Daten mittels Surjektiv-Sequentieller Neural Likelihood-Abschätzung | 使用粒子序列神经神经近似近似性估计法对高维数据进行基于模拟的推论 2308.01054v3 |
Authors (3): Simon Dirmeier, Carlo Albert, Fernando Perez-Cruz
Neural likelihood estimation methods for simulation-based inference can suffer from performance degradation when the modeled data is very high-dimensional or lies along a lower-dimensional manifold, which is due to the inability of the density estimator to accurately estimate a density function. We present Surjective Sequential Neural Likelihood (SSNL) estimation, a novel member in the family of methods for simulation-based inference (SBI). SSNL fits a dimensionality-reducing surjective normalizing flow model and uses it as a surrogate likelihood function, which allows for computational inference via Markov chain Monte Carlo or variational Bayes methods. Among other benefits, SSNL avoids the requirement to manually craft summary statistics for inference of high-dimensional data sets, since the lower-dimensional representation is computed simultaneously with learning the likelihood and without additional computational overhead. We evaluate SSNL on a wide variety of experiments, including two challenging real-world examples from the astrophysics and neuroscience literatures, and show that it either outperforms or is on par with state-of-the-art methods, making it an excellent off-the-shelf estimator for SBI for high-dimensional data sets.
Article 586
Title@2025-06-10 (2): Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation
Title: Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation | Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation | 在多式工业数据库检索增强型多式工业数据库中有条件监测援助 2506.09247v1 |
Authors (5): Karl Löwenmark, Daniel Strömbergsson, Chang Liu, Marcus Liwicki, Fredrik Sandin
Condition monitoring (CM) plays a crucial role in ensuring reliability and efficiency in the process industry. Although computerised maintenance systems effectively detect and classify faults, tasks like fault severity estimation and maintenance decisions still largely depend on human expert analysis. The analysis and decision making automatically performed by current systems typically exhibit considerable uncertainty and high false alarm rates, leading to increased workload and reduced efficiency. This work integrates large language model (LLM)-based reasoning agents with CM workflows to address analyst and industry needs, namely reducing false alarms, enhancing fault severity estimation, improving decision support, and offering explainable interfaces. We propose MindRAG, a modular framework combining multimodal retrieval-augmented generation (RAG) with novel vector store structures designed specifically for CM data. The framework leverages existing annotations and maintenance work orders as surrogates for labels in a supervised learning protocol, addressing the common challenge of training predictive models on unlabelled and noisy real-world datasets. The primary contributions include: (1) an approach for structuring industry CM data into a semi-structured multimodal vector store compatible with LLM-driven workflows; (2) developing multimodal RAG techniques tailored for CM data; (3) developing practical reasoning agents capable of addressing real-world CM queries; and (4) presenting an experimental framework for integrating and evaluating such agents in realistic industrial scenarios. Preliminary results, evaluated with the help of an experienced analyst, indicate that MindRAG provides meaningful decision support for more efficient management of alarms, thereby improving the interpretability of CM systems.
Article 587
Title@2025-06-10 (2): A Generative Framework for Predictive Modeling of Multiple Chronic Conditions Using Graph Variational Autoencoder and Bandit-Optimized Graph Neural Network
Title: A Generative Framework for Predictive Modeling of Multiple Chronic Conditions Using Graph Variational Autoencoder and Bandit-Optimized Graph Neural Network | Ein generatives Framework für die vorausschauende Modellierung mehrerer chronischer Zustände mit Graph Variational Autoencoder und Bandit-Optimiertes Graphen-Neural-Netzwerk | 利用图表变异自动编码器和强盗优化的图形神经网络对多种慢性条件进行预测模拟的生成框架 2409.13671v3 |
Authors (5): Julian Carvajal Rico, Adel Alaeddini, Syed Hasib Akhter Faruqui, Susan P Fisher-Hoch, Joseph B Mccormick
Predicting the emergence of multiple chronic conditions (MCC) is crucial for early intervention and personalized healthcare, as MCC significantly impacts patient outcomes and healthcare costs. Graph neural networks (GNNs) are effective methods for modeling complex graph data, such as those found in MCC. However, a significant challenge with GNNs is their reliance on an existing graph structure, which is not readily available for MCC. To address this challenge, we propose a novel generative framework for GNNs that constructs a representative underlying graph structure by utilizing the distribution of the data to enhance predictive analytics for MCC. Our framework employs a graph variational autoencoder (GVAE) to capture the complex relationships in patient data. This allows for a comprehensive understanding of individual health trajectories and facilitates the creation of diverse patient stochastic similarity graphs while preserving the original feature set. These variations of patient stochastic similarity graphs, generated from the GVAE decoder, are then processed by a GNN using a novel Laplacian regularization technique to refine the graph structure over time and improve the prediction accuracy of MCC. A contextual Bandit is designed to evaluate the stochastically generated graphs and identify the best-performing graph for the GNN model iteratively until model convergence. We validate the performance of the proposed contextual Bandit algorithm against $\varepsilon$-Greedy and multi-armed Bandit algorithms on a large cohort (n = 1,592) of patients with MCC. These advancements highlight the potential of the proposed approach to transform predictive healthcare analytics, enabling a more personalized and proactive approach to MCC management.
Article 588
Title@2025-06-10 (2): Root Cause Attribution of Delivery Risks via Causal Discovery with Reinforcement Learning
Title: Root Cause Attribution of Delivery Risks via Causal Discovery with Reinforcement Learning | Ursachenzuweisung von Lieferrisiken durch kausale Entdeckung mit Verstärkungslernen | 利用强化学习发现原因和强化学习,确定分娩风险的根源原因 2408.05860v3 |
Authors (1): Minheng Xiao
This paper presents a novel approach to root cause attribution of delivery risks within supply chains by integrating causal discovery with reinforcement learning. As supply chains become increasingly complex, traditional methods of root cause analysis struggle to capture the intricate interrelationships between various factors, often leading to spurious correlations and suboptimal decision-making. Our approach addresses these challenges by leveraging causal discovery to identify the true causal relationships between operational variables, and reinforcement learning to iteratively refine the causal graph. This method enables the accurate identification of key drivers of late deliveries, such as shipping mode and delivery status, and provides actionable insights for optimizing supply chain performance. We apply our approach to a real-world supply chain dataset, demonstrating its effectiveness in uncovering the underlying causes of delivery delays and offering strategies for mitigating these risks. The findings have significant implications for improving operational efficiency, customer satisfaction, and overall profitability within supply chains.
Article 589
Title@2025-06-10 (2): Electroencephalogram Emotion Recognition via AUC Maximization
Title: Electroencephalogram Emotion Recognition via AUC Maximization | Elektroenzephalogramm-Emotionserkennung über AUC-Maximierung | 通过AUUC实现最大化来识别电脑图情感 2408.08979v3 |
Authors (1): Minheng Xiao
Imbalanced datasets pose significant challenges in areas including neuroscience, cognitive science, and medical diagnostics, where accurately detecting minority classes is essential for robust model performance. This study addresses the issue of class imbalance, using the 'Liking' label in the DEAP dataset as an example. Such imbalances are often overlooked by prior research, which typically focuses on the more balanced arousal and valence labels and predominantly uses accuracy metrics to measure model performance. To tackle this issue, we adopt numerical optimization techniques aimed at maximizing the area under the curve (AUC), thus enhancing the detection of underrepresented classes. Our approach, which begins with a linear classifier, is compared against traditional linear classifiers, including logistic regression and support vector machines (SVM). Our method significantly outperforms these models, increasing recall from 41.6\% to 79.7\% and improving the F1-score from 0.506 to 0.632. These results highlight the efficacy of AUC maximization via numerical optimization in managing imbalanced datasets, providing an effective solution for enhancing predictive accuracy in detecting minority but crucial classes in out-of-sample datasets.
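The abstract does not state the exact objective or optimizer, so the following is only a minimal sketch of AUC maximization for a linear classifier. It assumes a pairwise logistic surrogate for AUC and plain gradient descent; both are illustrative choices, not the paper's method. The toy data and all function names are hypothetical.

```python
import math

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_pairwise(X, y, lr=0.1, epochs=200):
    """Gradient descent on the pairwise loss log(1 + exp(-(w.x_pos - w.x_neg)))."""
    w = [0.0] * len(X[0])
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    n_pairs = len(pos) * len(neg)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xp in pos:
            for xn in neg:
                diff = [a - b for a, b in zip(xp, xn)]
                margin = sum(wi * di for wi, di in zip(w, diff))
                # Derivative of log(1 + exp(-margin)) with respect to margin.
                coef = -1.0 / (1.0 + math.exp(margin))
                for k, d in enumerate(diff):
                    grad[k] += coef * d
        w = [wi - lr * g / n_pairs for wi, g in zip(w, grad)]
    return w

# Hypothetical imbalanced toy set: 2 positives, 6 negatives.
X = [[2.0, 1.0], [1.5, 0.5], [0.0, 1.0], [0.2, 0.8],
     [-1.0, 0.3], [0.1, -0.5], [-0.5, 0.0], [0.3, 0.2]]
y = [1, 1, 0, 0, 0, 0, 0, 0]

w = fit_pairwise(X, y)
scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
print("weights:", w, "AUC:", auc(scores, y))
```

The loss only looks at positive/negative pairs, so the class ratio does not dominate the objective the way it does for accuracy.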
Article 590
Title@2025-06-10 (2): PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies
Title: PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies | PatchGuard: Widerwärtig robuste Anomalieerkennung und Lokalisierung durch Vision Transformer und Pseudoanomalien | PatchGuard:通过视觉变形器和假冒异常器反向强力异常检测和定位 2506.09237v1 |
Authors (5): Mojtaba Nafez, Amirhossein Koochakian, Arad Maleki, Jafar Habibi, Mohammad Hossein Rohban
Anomaly Detection (AD) and Anomaly Localization (AL) are crucial in fields that demand high reliability, such as medical imaging and industrial monitoring. However, current AD and AL approaches are often susceptible to adversarial attacks due to limitations in training data, which typically include only normal, unlabeled samples. This study introduces PatchGuard, an adversarially robust AD and AL method that incorporates pseudo anomalies with localization masks within a Vision Transformer (ViT)-based architecture to address these vulnerabilities. We begin by examining the essential properties of pseudo anomalies, and follow it by providing theoretical insights into the attention mechanisms required to enhance the adversarial robustness of AD and AL systems. We then present our approach, which leverages Foreground-Aware Pseudo-Anomalies to overcome the deficiencies of previous anomaly-aware methods. Our method incorporates these crafted pseudo-anomaly samples into a ViT-based framework, with adversarial training guided by a novel loss function designed to improve model robustness, as supported by our theoretical analysis. Experimental results on well-established industrial and medical datasets demonstrate that PatchGuard significantly outperforms previous methods in adversarial settings, achieving performance gains of $53.2\%$ in AD and $68.5\%$ in AL, while also maintaining competitive accuracy in non-adversarial settings. The code repository is available at https://github.com/rohban-lab/PatchGuard .
Article 591
Title@2025-06-10 (2): FastLloyd: Federated, Accurate, Secure, and Tunable $k$-Means Clustering with Differential Privacy
Title: FastLloyd: Federated, Accurate, Secure, and Tunable $k$-Means Clustering with Differential Privacy | FastLloyd: Federated, Exakt, Secure, and Tunable $k$-Means Clustering with Differential Privacy | 联邦、准确、安全和金枪鱼用$k$-配有不同隐私的组合型工具 2405.02437v3 |
Authors (3): Abdulrahman Diaa, Thomas Humphries, Florian Kerschbaum
We study the problem of privacy-preserving $k$-means clustering in the horizontally federated setting. Existing federated approaches using secure computation suffer from substantial overheads and do not offer output privacy. At the same time, differentially private (DP) $k$-means algorithms either assume a trusted central curator or significantly degrade utility by adding noise in the local DP model. Naively combining the secure and central DP solutions results in a protocol with impractical overhead. Instead, our work provides enhancements to both the DP and secure computation components, resulting in a design that is faster, more private, and more accurate than previous work. By utilizing the computational DP model, we design a lightweight, secure aggregation-based approach that achieves five orders of magnitude speed-up over state-of-the-art related work. Furthermore, we not only maintain the utility of the state-of-the-art in the central model of DP, but we improve the utility further by designing a new DP clustering mechanism.
Article 592
Title@2025-06-10 (2): LieRE: Lie Rotational Positional Encodings
Title: LieRE: Lie Rotational Positional Encodings | LieRE: Lie Rotational Positional Encodings | 谎言: 谎言旋转位置编码 2406.10322v4 |
Authors (6): Sophie Ostmeier, Brian Axelrod, Maya Varma, Michael E. Moseley, Akshay Chaudhari, Curtis Langlotz
Transformer architectures depend on explicit position encodings to capture token positional information. Rotary Position Encoding (RoPE) has emerged as a popular choice in language models due to its efficient encoding of relative position information through key-query rotations. However, RoPE faces significant limitations beyond language processing: it is constrained to one-dimensional sequence data and, even with learnable phases, offers limited representational capacity. We address these challenges with Lie Relative Encodings (LieRE), which generalizes RoPE to high-dimensional rotation matrices by leveraging their Lie group structure. Through extensive evaluation on three image datasets across 2D and 3D classification tasks, LieRE achieves 1.5% improvement over state-of-the-art baselines on 2D tasks and 1% on 3D tasks, while demonstrating superior generalization to higher resolutions. Our implementation is computationally efficient, with results reproducible on 4 A100 GPUs in 30 minutes on CIFAR100. Our code is available at https://github.com/StanfordMIMI/LieRE.
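As a rough illustration of the idea, the sketch below encodes a 2-D patch position by rotating a 2-dimensional query or key. It assumes the rotation is the exponential of a skew-symmetric generator that is linear in the position; the abstract does not spell this out. The 2x2 case is chosen because the generators commute there, so the rotation reduces to a single angle. `theta` stands in for learned parameters and is hypothetical, as are the function names.

```python
import math

def rotation(angle):
    """2x2 rotation, i.e. exp of the skew-symmetric matrix [[0, -angle], [angle, 0]]."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def apply(R, v):
    """Matrix-vector product for a 2x2 matrix."""
    return [R[0][0] * v[0] + R[0][1] * v[1], R[1][0] * v[0] + R[1][1] * v[1]]

def encode(vec, position, theta):
    """Rotate a 2-d query/key by the angle <position, theta>."""
    angle = sum(p * t for p, t in zip(position, theta))
    return apply(rotation(angle), vec)

def score(q, k, pos_q, pos_k, theta):
    """Attention logit between a position-encoded query and key."""
    qr, kr = encode(q, pos_q, theta), encode(k, pos_k, theta)
    return qr[0] * kr[0] + qr[1] * kr[1]

theta = (0.3, -0.7)          # hypothetical "learned" generator weights
q, k = [1.0, 2.0], [0.5, -1.0]

# Same relative offset (2, 3) at two different absolute locations.
a = score(q, k, (3, 5), (1, 2), theta)
b = score(q, k, (2, 3), (0, 0), theta)
print(a, b)
```

In this commutative special case the logit depends only on the offset between the two positions, which is the relative-position property RoPE is known for. Higher-dimensional generators need a full matrix exponential and do not commute in general.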
Article 593
Title@2025-06-10 (2): SoK: Machine Unlearning for Large Language Models
Title: SoK: Machine Unlearning for Large Language Models | SoK: Maschinelles Lernen für große Sprachmodelle | SoK:大语言模型的机器脱学 2506.09227v1 |
Authors (5): Jie Ren, Yue Xing, Yingqian Cui, Charu C. Aggarwal, Hui Liu
Large language model (LLM) unlearning has become a critical topic in machine learning, aiming to eliminate the influence of specific training data or knowledge without retraining the model from scratch. A variety of techniques have been proposed, including Gradient Ascent, model editing, and re-steering hidden representations. While existing surveys often organize these methods by their technical characteristics, such classifications tend to overlook a more fundamental dimension: the underlying intention of unlearning–whether it seeks to truly remove internal knowledge or merely suppress its behavioral effects. In this SoK paper, we propose a new taxonomy based on this intention-oriented perspective. Building on this taxonomy, we make three key contributions. First, we revisit recent findings suggesting that many removal methods may functionally behave like suppression, and explore whether true removal is necessary or achievable. Second, we survey existing evaluation strategies, identify limitations in current metrics and benchmarks, and suggest directions for developing more reliable and intention-aligned evaluations. Third, we highlight practical challenges–such as scalability and support for sequential unlearning–that currently hinder the broader deployment of unlearning methods. In summary, this work offers a comprehensive framework for understanding and advancing unlearning in generative AI, aiming to support future research and guide policy decisions around data removal and privacy.
Article 594
Title@2025-06-10 (2): The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound
Title: The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound | Die Wirksamkeit von Semantik-erhaltenden Transformationen im selbstüberwachten Lernen für medizinischen Ultraschall | 医学超声波自监测学习中语义保留变异的功效 2504.07904v2 |
Authors (4): Blake VanBerlo, Alexander Wong, Jesse Hoey, Robert Arntfield
Data augmentation is a central component of joint embedding self-supervised learning (SSL). Approaches that work for natural images may not always be effective in medical imaging tasks. This study systematically investigated the impact of data augmentation and preprocessing strategies in SSL for lung ultrasound. Three data augmentation pipelines were assessed: (1) a baseline pipeline commonly used across imaging domains, (2) a novel semantic-preserving pipeline designed for ultrasound, and (3) a distilled set of the most effective transformations from both pipelines. Pretrained models were evaluated on multiple classification tasks: B-line detection, pleural effusion detection, and COVID-19 classification. Experiments revealed that semantics-preserving data augmentation resulted in the greatest performance for COVID-19 classification - a diagnostic task requiring global image context. Cropping-based methods yielded the greatest performance on the B-line and pleural effusion object classification tasks, which require strong local pattern recognition. Lastly, semantics-preserving ultrasound image preprocessing resulted in increased downstream performance for multiple tasks. Guidance regarding data augmentation and preprocessing strategies was synthesized for practitioners working with SSL in ultrasound.
Article 595
Title@2025-06-10 (2): Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs
Title: Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs | Robuste Geräuschdämpfung durch adaptive Bündelung von Transformer-Ausgängen | 通过调适式组合变异器产出,通过调适式组合组合,强有力噪音减速 2506.09215v1 |
Authors (1): Greyson Brothers
We investigate the design of pooling methods used to summarize the outputs of transformer embedding models, primarily motivated by reinforcement learning and vision applications. This work considers problems where a subset of the input vectors contains requisite information for a downstream task (signal) while the rest are distractors (noise). By framing pooling as vector quantization with the goal of minimizing signal loss, we demonstrate that the standard methods used to aggregate transformer outputs, AvgPool, MaxPool, and ClsToken, are vulnerable to performance collapse as the signal-to-noise ratio (SNR) of inputs fluctuates. We then show that an attention-based adaptive pooling method can approximate the signal-optimal vector quantizer within derived error bounds for any SNR. Our theoretical results are first validated by supervised experiments on a synthetic dataset designed to isolate the SNR problem, then generalized to standard relational reasoning, multi-agent reinforcement learning, and vision benchmarks with noisy observations, where transformers with adaptive pooling display superior robustness across tasks.
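A small sketch of the contrast between average pooling and attention-based pooling under noise. It assumes a single softmax-weighted pooling step; the query vector is fixed by hand here, whereas in practice it would be learned. The vectors and names are hypothetical and are not taken from the paper.

```python
import math

def avg_pool(vectors):
    """Unweighted mean of the input vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def attn_pool(vectors, query, temperature=1.0):
    """Softmax-weighted mean, with weights from query-vector dot products."""
    logits = [sum(q * x for q, x in zip(query, v)) / temperature for v in vectors]
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(vectors[0]))]

def dist(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

signal = [4.0, 0.0]
noise = [[0.0, 1.0], [0.0, -1.0], [-1.0, 0.5], [0.5, -2.0]]
vectors = [signal] + noise
query = [1.0, 0.0]                        # hypothetical query aligned with the signal

err_avg = dist(avg_pool(vectors), signal)
err_attn = dist(attn_pool(vectors, query), signal)
print("avg error:", err_avg, "attention error:", err_attn)
```

With one signal vector among four distractors, the mean is pulled toward the noise, while the softmax weights concentrate on the vector that matches the query.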
Article 596
Title@2025-06-10 (2): Revisiting Graph Projections for Effective Complementary Product Recommendation
Title: Revisiting Graph Projections for Effective Complementary Product Recommendation | Überarbeiten von Graph Projektionen für eine effektive ergänzende Produktempfehlung | 修订有效补充产品建议书的图表预测 2506.09209v1 |
Authors (3): Leandro Anghinoni, Pablo Zivic, Jorge Adrian Sanchez
Complementary product recommendation is a powerful strategy to improve customer experience and retail sales. However, recommending the right product is not a simple task because of the noisy and sparse nature of user-item interactions. In this work, we propose a simple yet effective method to predict a list of complementary products given a query item, based on the structure of a directed weighted graph projected from the user-item bipartite graph. We revisit bipartite graph projections for recommender systems and propose a novel approach for inferring complementarity relationships from historical user-item interactions. We compare our model with recent methods from the literature and show, despite the simplicity of our approach, an average improvement of +43% and +38% over sequential and graph-based recommenders, respectively, over different benchmarks.
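The abstract does not give the projection weights, so this is a minimal sketch assuming one simple scheme: an edge a -> b counts how often b appears after a in a user's history, and out-going weights are normalized per item. The data and function names are hypothetical.

```python
from collections import defaultdict

def project(histories):
    """Project user -> ordered item lists onto a directed weighted item graph."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for i, a in enumerate(items):
            for b in items[i + 1:]:
                if a != b:
                    counts[a][b] += 1     # b was interacted with after a
    graph = {}
    for a, nbrs in counts.items():
        total = sum(nbrs.values())
        graph[a] = {b: c / total for b, c in nbrs.items()}
    return graph

def recommend(graph, item, k=3):
    """Top-k out-neighbours by weight; ties are broken alphabetically."""
    nbrs = graph.get(item, {})
    ranked = sorted(nbrs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [b for b, _ in ranked[:k]]

histories = {
    "u1": ["phone", "case", "charger"],
    "u2": ["phone", "case"],
    "u3": ["phone", "case", "earbuds"],
    "u4": ["case", "screen"],
}
graph = project(histories)
print(recommend(graph, "phone", k=2))
```

Because the edges are directed, "phone" points to "case" but "case" does not point back to "phone". That asymmetry is what separates complements from merely similar items.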
Article 597
Title@2025-06-10 (2): A Comparative Study of Conventional and Tripolar EEG for High-Performance Reach-to-Grasp BCI Systems
Title: A Comparative Study of Conventional and Tripolar EEG for High-Performance Reach-to-Grasp BCI Systems | Vergleichende Studie des konventionellen und tripolaren EEG für Hochleistungs-Reach-to-Grap BCI-Systeme | 利用常规和三极EEG进行高绩效接近格拉斯普到格拉斯普BCI系统比较研究 2402.09448v3 |
Authors (6): Ali Rabiee, Sima Ghafoori, Anna Cetera, Maryam Norouzi, Walter Besio, Reza Abiri
This study aims to enhance BCI applications for individuals with motor impairments by comparing the effectiveness of tripolar EEG (tEEG) with conventional EEG. The focus is on interpreting and decoding various grasping movements, such as power grasp and precision grasp. The goal is to determine which EEG technology is more effective in processing and translating grasp related neural signals. The approach involved experimenting on ten healthy participants who performed two distinct grasp movements: power grasp and precision grasp, with a no movement condition serving as the baseline. Our research presents a thorough comparison between EEG and tEEG in decoding grasping movements. This comparison spans several key parameters, including signal to noise ratio (SNR), spatial resolution via functional connectivity, ERPs, and wavelet time frequency analysis. Additionally, our study involved extracting and analyzing statistical features from the wavelet coefficients, and both binary and multiclass classification methods were employed. Four machine learning algorithms were used to evaluate the decoding accuracies. Our results indicated that tEEG demonstrated superior performance over conventional EEG in various aspects. This included a higher signal to noise ratio, enhanced spatial resolution, and more informative data in ERPs and wavelet time frequency analysis. The use of tEEG led to notable improvements in decoding accuracy for differentiating movement types. Specifically, tEEG achieved around 90% accuracy in binary and 75.97% for multiclass classification. These results are markedly better than those from standard EEG, which recorded a maximum of 77.85% and 61.27% in similar tasks, respectively. These findings highlight the superior effectiveness of tEEG over EEG in decoding grasp types and its competitive or superior performance in complex classifications compared with existing research.
Article 598
Title@2025-06-10 (2): FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models
Title: FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models | FLoRIST: Singular Value Thresholding für effizientes und präzises Federated Fine-Tuning von großen Sprachmodellen | FLORST:为高效率和准确的联邦大语言模型的精密使用而保持单一价值 2506.09199v1 |
Authors (2): Hariharan Ramesh, Jyotikrishna Dass
Integrating Low-Rank Adaptation (LoRA) into federated learning offers a promising solution for parameter-efficient fine-tuning of Large Language Models (LLMs) without sharing local data. However, several methods designed for federated LoRA present significant challenges in balancing communication efficiency, model accuracy, and computational cost, particularly among heterogeneous clients. These methods either rely on simplistic averaging of local adapters, which introduces aggregation noise, require transmitting large stacked local adapters, leading to poor communication efficiency, or necessitate reconstructing a memory-dense global weight-update matrix and performing a computationally expensive decomposition to design client-specific low-rank adapters. In this work, we propose FLoRIST, a federated fine-tuning framework that achieves mathematically accurate aggregation without incurring high communication or computational overhead. Instead of constructing the full global weight-update matrix at the server, FLoRIST employs an efficient decomposition pipeline by performing singular value decomposition on stacked local adapters separately. This approach operates within a compact intermediate space to represent the accumulated information from local LoRAs. We introduce tunable singular value thresholding for server-side optimal rank selection to construct a pair of global low-rank adapters shared by all clients. Extensive empirical evaluations across multiple datasets and LLMs demonstrate that FLoRIST consistently strikes the best balance between superior communication efficiency and competitive performance in both homogeneous and heterogeneous setups.
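The thresholding rule itself is not given in the abstract. The sketch below assumes one common choice: keep the smallest rank whose squared singular values retain at least a fraction `tau` of the total energy. The function name, the energy criterion, and the example spectrum are all assumptions made for illustration.

```python
def select_rank(singular_values, tau=0.9):
    """Smallest rank whose cumulative squared singular values reach tau of the total."""
    s = sorted(singular_values, reverse=True)
    total = sum(v * v for v in s)
    running = 0.0
    for rank, v in enumerate(s, start=1):
        running += v * v
        if running >= tau * total:
            return rank
    return len(s)

# Hypothetical spectrum of an aggregated adapter update.
spectrum = [5.0, 3.0, 1.0, 0.5, 0.1]
for tau in (0.9, 0.99):
    print("tau =", tau, "-> rank", select_rank(spectrum, tau))
```

Raising `tau` keeps more singular directions, which means larger adapters to send and less truncation error. This is the tunable trade-off between communication and accuracy that the abstract describes.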
Article 599
Title@2025-06-10 (2): Multimodal Pragmatic Jailbreak on Text-to-image Models
Title: Multimodal Pragmatic Jailbreak on Text-to-image Models | Multimodaler Pragmatischer Jailbreak zu Text-to-Image-Modellen | 关于文本到图像模型的多式实用监狱 2409.19149v2 |
Authors (9): Tong Liu, Zhixin Lai, Jiawen Wang, Gengyuan Zhang, Shuo Chen, Philip Torr, Vera Demberg, Volker Tresp, Jindong Gu
Diffusion models have recently achieved remarkable advancements in terms of image quality and fidelity to textual prompts. Concurrently, the safety of such generative models has become an area of growing concern. This work introduces a novel type of jailbreak, which triggers T2I models to generate images with visual text, where the image and the text, although considered to be safe in isolation, combine to form unsafe content. To systematically explore this phenomenon, we propose a dataset to evaluate the current diffusion-based text-to-image (T2I) models under such jailbreak. We benchmark nine representative T2I models, including two closed-source commercial models. Experimental results reveal a concerning tendency to produce unsafe content: all tested models suffer from this type of jailbreak, with rates of unsafe generation ranging from around 10\% to 70\%, where DALLE 3 demonstrates almost the highest unsafety. In real-world scenarios, various filters, such as keyword blocklists, customized prompt filters, and NSFW image filters, are commonly employed to mitigate these risks. We evaluate the effectiveness of such filters against our jailbreak and find that, while these filters may be effective for single modality detection, they fail to work against our jailbreak. We also investigate the underlying reason for such jailbreaks, from the perspective of text rendering capability and training data. Our work provides a foundation for further development towards more secure and reliable T2I models. Project page at https://multimodalpragmatic.github.io/.
Article 600
Title@2025-06-10 (2): Certified Unlearning for Neural Networks
Title: Certified Unlearning for Neural Networks | Zertifiziertes Lernen für neurale Netzwerke | 神经网络认证的不学习 2506.06985v2 |
Authors (5): Anastasia Koloskova, Youssef Allouah, Animesh Jha, Rachid Guerraoui, Sanmi Koyejo
We address the problem of machine unlearning, where the goal is to remove the influence of specific training data from a model upon request, motivated by privacy concerns and regulatory requirements such as the “right to be forgotten.” Unfortunately, existing methods rely on restrictive assumptions or lack formal guarantees. To this end, we propose a novel method for certified machine unlearning, leveraging the connection between unlearning and privacy amplification by stochastic post-processing. Our method uses noisy fine-tuning on the retain data, i.e., data that does not need to be removed, to ensure provable unlearning guarantees. This approach requires no assumptions about the underlying loss function, making it broadly applicable across diverse settings. We analyze the theoretical trade-offs in efficiency and accuracy and demonstrate empirically that our method not only achieves formal unlearning guarantees but also performs effectively in practice, outperforming existing baselines. Our code is available at https://github.com/stair-lab/certified-unlearning-neural-networks-icml-2025
Article 601
Title@2025-06-10 (2): LaDCast: A Latent Diffusion Model for Medium-Range Ensemble Weather Forecasting
Title: LaDCast: A Latent Diffusion Model for Medium-Range Ensemble Weather Forecasting | LaDCast: Ein latentes Diffusionsmodell für mittelgroße Ensemble-Wettervorhersage | LaDCast: 中频集合天气预报的延迟传播模型 2506.09193v1 |
Authors (2): Yilin Zhuang, Karthik Duraisamy
Accurate probabilistic weather forecasting demands both high accuracy and efficient uncertainty quantification, challenges that overburden both ensemble numerical weather prediction (NWP) and recent machine-learning methods. We introduce LaDCast, the first global latent-diffusion framework for medium-range ensemble forecasting, which generates hourly ensemble forecasts entirely in a learned latent space. An autoencoder compresses high-dimensional ERA5 reanalysis fields into a compact representation, and a transformer-based diffusion model produces sequential latent updates with arbitrary hour initialization. The model incorporates Geometric Rotary Position Embedding (GeoRoPE) to account for the Earth’s spherical geometry, a dual-stream attention mechanism for efficient conditioning, and sinusoidal temporal embeddings to capture seasonal patterns. LaDCast achieves deterministic and probabilistic skill close to that of the European Centre for Medium-Range Weather Forecasts IFS-ENS, without any explicit perturbations. Notably, LaDCast demonstrates superior performance in tracking rare extreme events such as cyclones, capturing their trajectories more accurately than established models. By operating in latent space, LaDCast reduces storage and compute by orders of magnitude, demonstrating a practical path toward forecasting at kilometer-scale resolution in real time. We open-source our code and models and provide the training and evaluation pipelines at: https://github.com/tonyzyl/ladcast.
Article 602
Title@2025-06-10 (2): Multi-Task Reward Learning from Human Ratings
Title: Multi-Task Reward Learning from Human Ratings | Multi-Task-Reward-Lernen aus menschlichen Bewertungen | 从人类评级中学习多任务奖励 2506.09183v1 |
Authors (6): Mingkang Wu, Devin White, Evelyn Rose, Vernon Lawhern, Nicholas R Waytowich, Yongcan Cao
Reinforcement learning from human feedback (RLHF) has become a key factor in aligning model behavior with users’ goals. However, while humans integrate multiple strategies when making decisions, current RLHF approaches often simplify this process by modeling human reasoning through isolated tasks such as classification or regression. In this paper, we propose a novel reinforcement learning (RL) method that mimics human decision-making by jointly considering multiple tasks. Specifically, we leverage human ratings in reward-free environments to infer a reward function, introducing learnable weights that balance the contributions of both classification and regression models. This design captures the inherent uncertainty in human decision-making and allows the model to adaptively emphasize different strategies. We conduct several experiments using synthetic human ratings to validate the effectiveness of the proposed approach. Results show that our method consistently outperforms existing rating-based RL methods, and in some cases, even surpasses traditional RL approaches.
Article 603
Title@2025-06-10 (2): Pointwise confidence estimation in the non-linear $\ell^2$-regularized least squares
Title: Pointwise confidence estimation in the non-linear $\ell^2$-regularized least squares | Punktweise Vertrauensschätzung in den nichtlinearen $\ell^2$-regularisierten kleinsten Quadraten | 在非线性$\ell^2$正规化最低平方中点性信心估计值 2506.07088v2 |
Authors (2): Ilja Kuzborskij, Yasin Abbasi Yadkori
We consider high-probability non-asymptotic confidence estimation in the $\ell^2$-regularized non-linear least-squares setting with fixed design. In particular, we study confidence estimation for local minimizers of the regularized training loss. We show a pointwise confidence bound, meaning that it holds for the prediction on any given fixed test input $x$. Importantly, the proposed confidence bound scales with the similarity of the test input to the training data in the implicit feature space of the predictor (for instance, becoming very large when the test input lies far outside of the training data). This desirable feature is captured by the weighted norm involving the inverse-Hessian matrix of the objective function, which is a generalized version of its counterpart in the linear setting, $x^{\top} \text{Cov}^{-1} x$. Our generalized result can be regarded as a non-asymptotic counterpart of the classical confidence interval based on asymptotic normality of the MLE. We propose an efficient method for computing the weighted norm, which only mildly exceeds the cost of a gradient computation of the loss function. Finally, we complement our analysis with empirical evidence showing that the proposed confidence bound provides a better coverage/width trade-off compared to confidence estimation by bootstrapping, which is a gold-standard method in many applications involving non-linear predictors such as neural networks.
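For intuition about the linear-setting quantity $x^{\top} \text{Cov}^{-1} x$ mentioned above, here is a small worked sketch. It assumes that Cov is the regularized Gram matrix $X^{\top}X + \lambda I$ of two-dimensional training inputs; this reading, the data, and the function names are illustrative assumptions and do not reproduce the paper's inverse-Hessian construction.

```python
def covariance(X, reg):
    """Entries (a, b, d) of the symmetric 2x2 matrix X^T X + reg * I."""
    a = sum(x[0] * x[0] for x in X) + reg
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X) + reg
    return a, b, d

def weighted_norm_sq(x, X, reg=1.0):
    """x^T (X^T X + reg I)^{-1} x via the closed-form 2x2 inverse."""
    a, b, d = covariance(X, reg)
    det = a * d - b * b
    return (d * x[0] * x[0] - 2 * b * x[0] * x[1] + a * x[1] * x[1]) / det

# Training inputs concentrated along the first axis.
X_train = [[1.0, 0.0], [2.0, 0.1], [3.0, -0.1], [4.0, 0.0]]

near = [2.5, 0.0]   # lies in the direction covered by the training data
far = [0.0, 2.5]    # same Euclidean norm, but in an unexplored direction

print("near:", weighted_norm_sq(near, X_train))
print("far: ", weighted_norm_sq(far, X_train))
```

Both test inputs have the same length, yet the weighted norm is much larger for the one pointing away from the training data. This is the behaviour the abstract describes for the confidence width.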
Article 604
Title@2025-06-10 (2): Spatial Reasoning with Denoising Models
Title: Spatial Reasoning with Denoising Models | Raumgerechtes Denken mit vernarbenden Modellen | 空间原因与排减模式 2502.21075v2 |
Authors (4): Christopher Wewer, Bart Pogodzinski, Bernt Schiele, Jan Eric Lenssen
We introduce Spatial Reasoning Models (SRMs), a framework to perform reasoning over sets of continuous variables via denoising generative models. SRMs infer continuous representations on a set of unobserved variables, given observations on observed variables. Current generative models on spatial domains, such as diffusion and flow matching models, often collapse to hallucination in case of complex distributions. To measure this, we introduce a set of benchmark tasks that test the quality of complex reasoning in generative models and can quantify hallucination. The SRM framework allows us to report key findings about the importance of sequentialization in generation, the associated order, and the sampling strategies during training. It demonstrates, for the first time, that the order of generation can be successfully predicted by the denoising network itself. Using these findings, we can increase the accuracy of specific reasoning tasks from <1% to >50%. Our project website provides additional videos, code, and the benchmark datasets: https://geometric-rl.mpi-inf.mpg.de/srm
Article 605
Title@2025-06-10 (2): Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism
Title: Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism | Roboter-Gated Interaktives Imitation-Lernen mit adaptivem Interventionsmechanismus | 采用适应性干预机制的机器人自制交互式模拟学习 2506.09176v1 |
Authors (3): Haoyuan Cai, Zhenghao Peng, Bolei Zhou
Interactive Imitation Learning (IIL) allows agents to acquire desired behaviors through human interventions, but current methods impose high cognitive demands on human supervisors. We propose the Adaptive Intervention Mechanism (AIM), a novel robot-gated IIL algorithm that learns an adaptive criterion for requesting human demonstrations. AIM utilizes a proxy Q-function to mimic the human intervention rule and adjusts intervention requests based on the alignment between agent and human actions. By assigning high Q-values when the agent deviates from the expert and decreasing these values as the agent becomes proficient, the proxy Q-function enables the agent to assess the real-time alignment with the expert and request assistance when needed. Our expert-in-the-loop experiments reveal that AIM significantly reduces expert monitoring efforts in both continuous and discrete control tasks. Compared to the uncertainty-based baseline Thrifty-DAgger, our method achieves a 40% improvement in terms of human take-over cost and learning efficiency. Furthermore, AIM effectively identifies safety-critical states for expert assistance, thereby collecting higher-quality expert demonstrations and reducing overall expert data and environment interactions needed. Code and demo video are available at https://github.com/metadriverse/AIM.
Article 606
Title@2025-06-10 (2): Multivariate Long-term Time Series Forecasting with Fourier Neural Filter
Title: Multivariate Long-term Time Series Forecasting with Fourier Neural Filter | Multivariate Langzeit-Zeitreihenprognose mit Fourier-Neuralfilter | 使用 Fourier神经过滤器预测多变长期时间序列 2506.09174v1 |
Authors (4): Chenheng Xu, Dan Wu, Yixin Zhu, Ying Nian Wu
Multivariate long-term time series forecasting has been suffering from the challenge of capturing both temporal dependencies within variables and spatial correlations across variables simultaneously. Current approaches predominantly repurpose backbones from natural language processing or computer vision (e.g., Transformers), which fail to adequately address the unique properties of time series (e.g., periodicity). The research community lacks a dedicated backbone with temporal-specific inductive biases, instead relying on domain-agnostic backbones supplemented with auxiliary techniques (e.g., signal decomposition). We introduce FNF as the backbone and DBD as the architecture to provide excellent learning capabilities and optimal learning pathways for spatio-temporal modeling, respectively. Our theoretical analysis proves that FNF unifies local time-domain and global frequency-domain information processing within a single backbone that extends naturally to spatial modeling, while information bottleneck theory demonstrates that DBD provides superior gradient flow and representation capacity compared to existing unified or sequential architectures. Our empirical evaluation across 11 public benchmark datasets spanning five domains (energy, meteorology, transportation, environment, and nature) confirms state-of-the-art performance with consistent hyperparameter settings. Notably, our approach achieves these results without any auxiliary techniques, suggesting that properly designed neural architectures can capture the inherent properties of time series, potentially transforming time series modeling in scientific and industrial applications.
Article 607
Title@2025-06-10 (2): The Curious Language Model: Strategic Test-Time Information Acquisition
Title: The Curious Language Model: Strategic Test-Time Information Acquisition | Das Kuriose Sprachmodell: Strategische Test-Zeit-Informationsbeschaffung | 好奇语言模式:战略试验时间信息采集 2506.09173v1 |
Authors (5): Michael Cooper, Rohan Wadhawan, John Michael Giorgi, Chenhao Tan, Davis Liang
Decision-makers often possess insufficient information to render a confident decision. In these cases, the decision-maker can often undertake actions to acquire the necessary information about the problem at hand, e.g., by consulting knowledgeable authorities or by conducting experiments. Importantly, different levers of information acquisition come with different costs, posing the challenge of selecting the actions that are both informative and cost-effective. In this work, we propose CuriosiTree, a heuristic-based, test-time policy for zero-shot information acquisition in large language models (LLMs). CuriosiTree employs a greedy tree search to estimate the expected information gain of each action and strategically chooses actions based on a balance of anticipated information gain and associated cost. Empirical validation in a clinical diagnosis simulation shows that CuriosiTree enables cost-effective integration of heterogeneous sources of information, and outperforms baseline action selection strategies in selecting action sequences that enable accurate diagnosis.
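The trade-off between expected information gain and cost described above can be illustrated with a small sketch. This is not the CuriosiTree implementation: the one-step scoring rule (gain minus weighted cost), the function names, the toy likelihood tables, and the `cost_weight` parameter are all assumptions made for illustration, and the greedy tree search over action sequences is omitted.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def expected_information_gain(prior, likelihood):
    """Expected entropy reduction from observing one action's outcome.

    prior[h]         : current belief over hypotheses h
    likelihood[h][o] : P(outcome o | hypothesis h) for this action
    """
    n_hyp, n_out = len(prior), len(likelihood[0])
    gain = entropy(prior)
    for o in range(n_out):
        p_o = sum(prior[h] * likelihood[h][o] for h in range(n_hyp))
        if p_o == 0:
            continue  # impossible outcome contributes nothing
        posterior = [prior[h] * likelihood[h][o] / p_o for h in range(n_hyp)]
        gain -= p_o * entropy(posterior)
    return gain

def choose_action(prior, actions, cost_weight=1.0):
    """Pick the action maximizing gain - cost_weight * cost (hypothetical rule)."""
    def score(name):
        likelihood, cost = actions[name]
        return expected_information_gain(prior, likelihood) - cost_weight * cost
    return max(actions, key=score)

# Toy setting: two equally likely diagnoses, two available tests.
prior = [0.5, 0.5]
actions = {
    "cheap_noisy_test": ([[0.7, 0.3], [0.3, 0.7]], 0.05),
    "expensive_exact_test": ([[1.0, 0.0], [0.0, 1.0]], 0.5),
}
best = choose_action(prior, actions)
```

Raising `cost_weight` shifts the choice toward the cheaper, noisier test, which is the kind of balance the abstract refers to.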
Article 608
Title@2025-06-10 (2): MultiNet: An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models
Title: MultiNet: An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models | MultiNet: Ein Open Source Software Toolkit & Benchmark Suite für die Bewertung und Anpassung multimodaler Aktionsmodelle | 多网:多模式行动模式评价和适应评估的开放源软件工具包 基准套件 2506.09172v1 |
Authors (3): Pranav Guruprasad, Yangyue Wang, Harshvardhan Sikka
Recent innovations in multimodal action models represent a promising direction for developing general-purpose agentic systems, combining visual understanding, language comprehension, and action generation. We introduce MultiNet - a novel, fully open-source benchmark and surrounding software ecosystem designed to rigorously evaluate and adapt models across vision, language, and action domains. We establish standardized evaluation protocols for assessing vision-language models (VLMs) and vision-language-action models (VLAs), and provide open source software to download relevant data, models, and evaluations. Additionally, we provide a composite dataset with over 1.3 trillion tokens of image captioning, visual question answering, commonsense reasoning, robotic control, digital game-play, simulated locomotion/manipulation, and many more tasks. The MultiNet benchmark, framework, toolkit, and evaluation harness have been used in downstream research on the limitations of VLA generalization.
Article 609
Title@2025-06-10 (2): Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search
Title: Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search | Verbesserung der LLM-Agentenplanung mit In-Context Learning über Atomic Fact Augmentation und Lookahead Search | 通过原子事实增强和目光头搜索改进LLM Agent 规划,通过原子事实增强和目光头搜索进行文中学习 2506.09171v1 |
Authors (4): Samuel Holt, Max Ruiz Luyten, Thomas Pouplin, Mihaela van der Schaar
Large Language Models (LLMs) are increasingly capable but often require significant guidance or extensive interaction history to perform effectively in complex, interactive environments. Existing methods may struggle with adapting to new information or efficiently utilizing past experiences for multi-step reasoning without fine-tuning. We introduce a novel LLM agent framework that enhances planning capabilities through in-context learning, facilitated by atomic fact augmentation and a recursive lookahead search. Our agent learns to extract task-critical “atomic facts” from its interaction trajectories. These facts dynamically augment the prompts provided to LLM-based components responsible for action proposal, latent world model simulation, and state-value estimation. Planning is performed via a depth-limited lookahead search, where the LLM simulates potential trajectories and evaluates their outcomes, guided by the accumulated facts and interaction history. This approach allows the agent to improve its understanding and decision-making online, leveraging its experience to refine its behavior without weight updates. We provide a theoretical motivation linking performance to the quality of fact-based abstraction and LLM simulation accuracy. Empirically, our agent demonstrates improved performance and adaptability on challenging interactive tasks, achieving more optimal behavior as it accumulates experience, showcased in tasks such as TextFrozenLake and ALFWorld.
Article 610
Title@2025-06-10 (2): ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning
Title: ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning | ProteinZero: Selbstverbessernde Proteinerzeugung über Online-Verstärkungslernen | 蛋白零:通过在线强化学习改善蛋白质生成 2506.07459v2 |
Authors (6): Ziwen Wang, Jiajun Fan, Ruihan Guo, Thao Nguyen, Heng Ji, Ge Liu
Protein generative models have shown remarkable promise in protein design but still face limitations in success rate, due to the scarcity of high-quality protein datasets for supervised pretraining. We present ProteinZero, a novel framework that enables scalable, automated, and continuous self-improvement of the inverse folding model through online reinforcement learning. To achieve computationally tractable online feedback, we introduce efficient proxy reward models based on ESM-fold and a novel rapid ddG predictor that significantly accelerates evaluation speed. ProteinZero employs a general RL framework balancing multi-reward maximization, KL-divergence from a reference model, and a novel protein-embedding level diversity regularization that prevents mode collapse while promoting higher sequence diversity. Through extensive experiments, we demonstrate that ProteinZero substantially outperforms existing methods across every key metric in protein design, achieving significant improvements in structural accuracy, designability, thermodynamic stability, and sequence diversity. Most impressively, ProteinZero reduces design failure rates by approximately 36% to 48% compared to widely used methods like ProteinMPNN, ESM-IF and InstructPLM, consistently achieving success rates exceeding 90% across diverse and complex protein folds. Notably, the entire RL run on CATH-4.3 can be done with a single 8-GPU node in under 3 days, including reward computation. Our work establishes a new paradigm for protein design where models evolve continuously from their own generated outputs, opening new possibilities for exploring the vast protein design space.
Article 611
Title@2025-06-10 (2): Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models
Title: Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models | Ist die Lärmkonditionierung notwendig? Eine einheitliche Theorie der Bedingungslosen Graphen-Diffusionsmodelle | 是否有必要设定噪音条件? 无条件图形扩散模型的统一理论 2505.22935v2 |
Authors (2): Jipeng Li, Yanning Shen
Explicit noise-level conditioning is widely regarded as essential for the effective operation of Graph Diffusion Models (GDMs). In this work, we challenge this assumption by investigating whether denoisers can implicitly infer noise levels directly from corrupted graph structures, potentially eliminating the need for explicit noise conditioning. To this end, we develop a theoretical framework centered on Bernoulli edge-flip corruptions and extend it to encompass more complex scenarios involving coupled structure-attribute noise. Extensive empirical evaluations on both synthetic and real-world graph datasets, using models such as GDSS and DiGress, provide strong support for our theoretical findings. Notably, unconditional GDMs achieve performance comparable or superior to their conditioned counterparts, while also offering reductions in parameters (4-6%) and computation time (8-10%). Our results suggest that the high-dimensional nature of graph data itself often encodes sufficient information for the denoising process, opening avenues for simpler, more efficient GDM architectures.
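As a rough illustration of why a corrupted graph may carry information about its own noise level under Bernoulli edge flips, consider edge density alone. If a clean graph has edge density p0 and every edge slot is flipped independently with probability beta, the expected corrupted density is p0 + beta * (1 - 2 * p0), which can be inverted for beta. The sketch below is not the paper's framework: it assumes the clean density is known, and the helper names are hypothetical.

```python
import numpy as np

def flip_edges(adj, beta, rng):
    """Flip each undirected edge slot independently with probability beta."""
    n = adj.shape[0]
    iu = np.triu_indices(n, k=1)          # one slot per unordered node pair
    flips = rng.random(iu[0].size) < beta
    noisy = np.zeros_like(adj)
    noisy[iu] = adj[iu] ^ flips           # XOR toggles the selected slots
    return noisy | noisy.T                # symmetrize

def edge_density(adj):
    iu = np.triu_indices(adj.shape[0], k=1)
    return adj[iu].mean()

def estimate_beta(noisy_adj, clean_density):
    """Invert E[noisy density] = p0 + beta * (1 - 2 * p0); needs p0 != 0.5."""
    return (edge_density(noisy_adj) - clean_density) / (1.0 - 2.0 * clean_density)

rng = np.random.default_rng(0)
n = 300
empty = np.zeros((n, n), dtype=bool)
clean = flip_edges(empty, 0.1, rng)       # random graph with density near 0.1
noisy = flip_edges(clean, 0.2, rng)       # corrupt with true beta = 0.2
beta_hat = estimate_beta(noisy, edge_density(clean))
```

A learned denoiser would not be handed the clean density; the point of the sketch is only that a simple global statistic of a large graph already concentrates around a value determined by the noise level.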
Article 612
Title@2025-06-10 (2): Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes
Title: Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes | Skalierbare Spatiotemporale Inferenz mit Biased Scan Aufmerksamkeit Transformer Neural Prozesse | 可缩缩缩的 Spatoto时推理 2506.09163v1 |
Authors (7): Daniel Jenson, Jhonathan Navott, Piotr Grynfelder, Mengyan Zhang, Makkunda Sharma, Elizaveta Semenova, Seth Flaxman
Neural Processes (NPs) are a rapidly evolving class of models designed to directly model the posterior predictive distribution of stochastic processes. While early architectures were developed primarily as a scalable alternative to Gaussian Processes (GPs), modern NPs tackle far more complex and data-hungry applications spanning geology, epidemiology, climate, and robotics. These applications have placed increasing pressure on the scalability of these models, with many architectures compromising accuracy for scalability. In this paper, we demonstrate that this tradeoff is often unnecessary, particularly when modeling fully or partially translation-invariant processes. We propose a versatile new architecture, the Biased Scan Attention Transformer Neural Process (BSA-TNP), which introduces Kernel Regression Blocks (KRBlocks), group-invariant attention biases, and memory-efficient Biased Scan Attention (BSA). BSA-TNP is able to: (1) match or exceed the accuracy of the best models while often training in a fraction of the time, (2) exhibit translation invariance, enabling learning at multiple resolutions simultaneously, (3) transparently model processes that evolve in both space and time, (4) support high-dimensional fixed effects, and (5) scale gracefully – running inference with over 1M test points with 100K context points in under a minute on a single 24GB GPU.
Article 613
Title@2025-06-10 (2): ICONS: Influence Consensus for Vision-Language Data Selection
Title: ICONS: Influence Consensus for Vision-Language Data Selection | ICONS: Einflusskonsens für Vision-Sprachen-Datenauswahl | ICONS: 愿景语言数据选择影响共识 2501.00654v3 |
Authors (6): Xindi Wu, Mengzhou Xia, Rulin Shao, Zhiwei Deng, Pang Wei Koh, Olga Russakovsky
Training vision-language models via instruction tuning often relies on large mixtures of data spanning diverse tasks and domains. However, these mixtures frequently include redundant information, increasing computational costs without proportional performance gains, necessitating more effective data selection strategies. Existing methods typically rely on task-agnostic heuristics to estimate data importance or focus on optimizing single tasks in isolation, limiting their effectiveness in multitask settings. In this work, we introduce ICONS, a gradient-based Influence CONsensus approach for vision-language data Selection. Our method leverages first-order training dynamics to estimate the influence of individual training examples on validation performance and aggregates these estimates across tasks via majority voting over task-specific influences. This cross-task consensus identifies data points that are consistently valuable across tasks, enabling us to prioritize examples that drive overall performance. The voting-based design further mitigates issues such as score calibration and outlier sensitivity, resulting in robust and scalable data selection for diverse multitask mixtures. With only 20% of the data from LLaVA-665K and Cambrian-7M, our selected subsets retain 98.6% and 98.8% of the performance achieved with full datasets, and can even surpass full data training at a 60% selection ratio on LLaVA-665K. Our approach also generalizes to unseen tasks and architectures, demonstrating strong transfer. We release two compact, high-utility subsets, LLaVA-ICONS-133K and Cambrian-ICONS-1.4M, preserving impactful training examples for efficient and scalable vision-language model development.
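The cross-task majority vote described above can be sketched as follows. This is a minimal illustration, not the ICONS code: the influence matrix is assumed to be precomputed, and the per-task voting fraction, the function name, and the tie-breaking rule (lower index first) are assumptions.

```python
import numpy as np

def consensus_select(influence, per_task_fraction, budget):
    """Select samples by majority vote over task-specific influence scores.

    influence         : array of shape (n_tasks, n_samples)
    per_task_fraction : share of samples each task votes for
    budget            : number of samples to keep
    """
    n_tasks, n_samples = influence.shape
    k = max(1, int(round(per_task_fraction * n_samples)))
    votes = np.zeros(n_samples, dtype=int)
    for task_scores in influence:
        # Each task votes for its own top-k samples; only ranks within a
        # task matter, so differently scaled scores cannot dominate.
        top_k = np.argpartition(-task_scores, k - 1)[:k]
        votes[top_k] += 1
    order = np.argsort(-votes, kind="stable")  # most-voted first
    return order[:budget], votes

# Three tasks with very different score scales, five training samples.
influence = np.array([
    [0.9, 0.8, 0.1, 0.0, 0.2],
    [100.0, 1.0, 90.0, 2.0, 3.0],
    [0.5, 0.1, 0.2, 0.6, 0.0],
])
selected, votes = consensus_select(influence, per_task_fraction=0.4, budget=1)
```

Because each task contributes only a binary vote per sample, the second task's much larger raw scores carry no extra weight, which reflects the calibration and outlier robustness the abstract attributes to the voting design.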
Article 614
Title@2025-06-10 (2): FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts
Title: FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts | FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts | FC-Attack:通过自动发光流程图制作的破监狱多式多式大语言模型 2502.21059v2 |
Authors (5): Ziyi Zhang, Zhen Sun, Zongmin Zhang, Jihui Guo, Xinlei He
Multimodal Large Language Models (MLLMs) have become powerful and widely adopted in some practical applications. However, recent research has revealed their vulnerability to multimodal jailbreak attacks, whereby the model can be induced to generate harmful content, leading to safety risks. Although most MLLMs have undergone safety alignment, recent research shows that the visual modality is still vulnerable to jailbreak attacks. In our work, we discover that by using flowcharts with partially harmful information, MLLMs can be induced to provide additional harmful details. Based on this, we propose a jailbreak attack method based on auto-generated flowcharts, FC-Attack. Specifically, FC-Attack first fine-tunes a pre-trained LLM to create a step-description generator based on benign datasets. The generator is then used to produce step descriptions corresponding to a harmful query, which are transformed into flowcharts in 3 different shapes (vertical, horizontal, and S-shaped) as visual prompts. These flowcharts are then combined with a benign textual prompt to execute the jailbreak attack on MLLMs. Our evaluations on Advbench show that FC-Attack attains an attack success rate of up to 96% via images and up to 78% via videos across multiple MLLMs. Additionally, we investigate factors affecting the attack performance, including the number of steps and the font styles in the flowcharts. We also find that FC-Attack can improve the jailbreak performance from 4% to 28% in Claude-3.5 by changing the font style. To mitigate the attack, we explore several defenses and find that AdaShield can largely reduce the jailbreak performance, but at the cost of a drop in utility.
Article 615
Title@2025-06-10 (2): QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
Title: QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | QuEST: Stabiles Training von LLMs mit 1-Bit Gewichten und Aktivierungen | QuEST:对具有1-比分重量和激活作用的LLMs进行稳定培训 2502.05003v2 |
Authors (6): Andrei Panferov, Jiale Chen, Soroush Tabesh, Roberto L. Castro, Mahdi Nikdan, Dan Alistarh
One approach to reducing the massive costs of large language models (LLMs) is the use of quantized or sparse representations for training or deployment. While post-training compression methods are very popular, the question of obtaining even more accurate compressed models by directly training over such representations, i.e., Quantization-Aware Training (QAT), is still open: for example, a recent study (arXiv:2411.04330) put the “optimal” bit-width at which models can be trained using QAT, while staying accuracy-competitive with standard FP16/BF16 precision, at 8-bit weights and activations. We advance this state-of-the-art via a new method called QuEST, for which we demonstrate optimality at 4 bits and stable convergence as low as 1-bit weights and activations. QuEST achieves this by improving two key aspects of QAT methods: (1) accurate and fast quantization of the (continuous) distributions of weights and activations via Hadamard normalization and MSE-optimal fitting; (2) a new trust gradient estimator based on the idea of explicitly minimizing the error between the noisy gradient computed over quantized states and the “true” (but unknown) full-precision gradient. Experiments on Llama-type architectures show that QuEST induces stable scaling laws across the entire range of hardware-supported precisions, and can be extended to sparse representations. We provide GPU kernel support showing that models produced by QuEST can be executed efficiently. Our code is available at https://github.com/IST-DASLab/QuEST.
Article 616
Title@2025-06-10 (2): TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval
Title: TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval | TRACE: Erdungszeitreihen im Kontext für multimodale Einbettung und Retrieval | TRACE: 在多式嵌入和检索背景下确定时间序列 2506.09114v1 |
Authors (8): Jialin Chen, Ziyu Zhao, Gaukhar Nurbek, Aosong Feng, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, Rex Ying
The ubiquity of dynamic data in domains such as weather, healthcare, and energy underscores a growing need for effective interpretation and retrieval of time-series data. These data are inherently tied to domain-specific contexts, such as clinical notes or weather narratives, making cross-modal retrieval essential not only for downstream tasks but also for developing robust time-series foundation models by retrieval-augmented generation (RAG). Despite the increasing demand, time-series retrieval remains largely underexplored. Existing methods often lack semantic grounding, struggle to align heterogeneous modalities, and have limited capacity for handling multi-channel signals. To address this gap, we propose TRACE, a generic multimodal retriever that grounds time-series embeddings in aligned textual context. TRACE enables fine-grained channel-level alignment and employs hard negative mining to facilitate semantically meaningful retrieval. It supports flexible cross-modal retrieval modes, including Text-to-Timeseries and Timeseries-to-Text, effectively linking linguistic descriptions with complex temporal patterns. By retrieving semantically relevant pairs, TRACE enriches downstream models with informative context, leading to improved predictive accuracy and interpretability. Beyond a static retrieval engine, TRACE also serves as a powerful standalone encoder, with lightweight task-specific tuning that refines context-aware representations while maintaining strong cross-modal alignment. These representations achieve state-of-the-art performance on downstream forecasting and classification tasks. Extensive experiments across multiple domains highlight its dual utility, as both an effective encoder for downstream applications and a general-purpose retriever to enhance time-series models.
Article 617
Title@2025-06-10 (2): Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations
Title: Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations | Verständnis von Aufgabenvektoren im In-Context-Lernen: Emergenz, Funktionalität und Einschränkungen | 了解内文学习中的任务矢量:新出现、功能和局限性 2506.09048v1 |
Authors (4): Yuxin Dong, Jiachen Jiang, Zhihui Zhu, Xia Ning
Task vectors offer a compelling mechanism for accelerating inference in in-context learning (ICL) by distilling task-specific information into a single, reusable representation. Despite their empirical success, the underlying principles governing their emergence and functionality remain unclear. This work proposes the Linear Combination Conjecture, positing that task vectors act as single in-context demonstrations formed through linear combinations of the original ones. We provide both theoretical and empirical support for this conjecture. First, we show that task vectors naturally emerge in linear transformers trained on triplet-formatted prompts through loss landscape analysis. Next, we predict the failure of task vectors on representing high-rank mappings and confirm this on practical LLMs. Our findings are further validated through saliency analyses and parameter visualization, suggesting an enhancement of task vectors by injecting multiple ones into few-shot prompts. Together, our results advance the understanding of task vectors and shed light on the mechanisms underlying ICL in transformer-based models.
Article 618
Title@2025-06-10 (2): Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation
Title: Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation | Agentische Neuronale Netzwerke: Selbstständige Multi-Agenten-Systeme über textuelle Backpropagation | 动态神经网络:通过文字反向分析实现自我演进的多行为者系统 2506.09046v1 |
Authors (5): Xiaowen Ma, Chenyang Lin, Yao Zhang, Volker Tresp, Yunpu Ma
Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative “team” focused on a specific subtask. Agentic Neural Network follows a two-phase optimization strategy: (1) Forward Phase: drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase: mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables ANN to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across four benchmark datasets, ANN surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements. Our findings indicate that ANN provides a scalable, data-driven framework for multi-agent systems, combining the collaborative capabilities of LLMs with the efficiency and flexibility of neural network principles. We plan to open-source the entire framework.
Article 619
Title@2025-06-10 (2): Evolutionary Policy Optimization
Title: Evolutionary Policy Optimization | Optimierung der Evolutionspolitik | 演变政策优化 2503.19037v2 |
Authors (4): Jianren Wang, Yifan Su, Abhinav Gupta, Deepak Pathak
On-policy reinforcement learning (RL) algorithms are widely used for their strong asymptotic performance and training stability, but they struggle to scale with larger batch sizes, as additional parallel environments yield redundant data due to limited policy-induced diversity. In contrast, Evolutionary Algorithms (EAs) scale naturally and encourage exploration via randomized population-based search, but are often sample-inefficient. We propose Evolutionary Policy Optimization (EPO), a hybrid algorithm that combines the scalability and diversity of EAs with the performance and stability of policy gradients. EPO maintains a population of agents conditioned on latent variables, shares actor-critic network parameters for coherence and memory efficiency, and aggregates diverse experiences into a master agent. Across tasks in dexterous manipulation, legged locomotion, and classic control, EPO outperforms state-of-the-art baselines in sample efficiency, asymptotic performance, and scalability.
Article 620
Title@2025-06-10 (2): The Decoupled Risk Landscape in Performative Prediction
Title: The Decoupled Risk Landscape in Performative Prediction | Die entkoppelte Risikolandschaft in der Performativen Vorhersage | 性能预测中脱钩的风险景观 2506.09044v1 |
Authors (4): Javier Sanguino, Thomas Kehrenberg, Jose A. Lozano, Novi Quadrianto
Performative Prediction addresses scenarios where deploying a model induces a distribution shift in the input data, such as individuals modifying their features and reapplying for a bank loan after rejection. The literature has largely taken a theoretical perspective, giving mathematical guarantees for convergence (either to the stable or the optimal point). We believe that visualization of the loss landscape can complement these theoretical advances with practical insights. Therefore, (1) we introduce a simple decoupled risk visualization method inspired by the two-step process underlying performative prediction. Our approach visualizes the risk landscape with respect to two parameter vectors: model parameters and data parameters. We use this method to propose new properties of the interest points, to examine how existing algorithms traverse the risk landscape and perform under more realistic conditions, including strategic classification with non-linear models. (2) Building on this decoupled risk visualization, we introduce a novel setting - extended Performative Prediction - which captures scenarios where the distribution reacts to a model different from the decision-making one, reflecting the reality that agents often lack full access to the deployed model.
Article 621
Title@2025-06-10 (2): FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed
Title: FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed | FZOO: Schneller Zeroth-Order-Optimierer für feintunende große Sprachmodelle in Richtung Adam-Scale-Geschwindigkeit | FZOO: 向亚当规模速度的微调大语言模型最佳最佳优选器 2506.09034v1 |
Authors (7): Sizhe Dang, Yangyang Guo, Yanjun Zhao, Haishan Ye, Xiaodong Zheng, Guang Dai, Ivor Tsang
Fine-tuning large language models (LLMs) often faces GPU memory bottlenecks: the backward pass of first-order optimizers like Adam increases memory usage to more than 10 times the inference level (e.g., 633 GB for OPT-30B). Zeroth-order (ZO) optimizers avoid this cost by estimating gradients only from forward passes, yet existing methods like MeZO usually require many more steps to converge. Can this trade-off between speed and memory in ZO be fundamentally improved? Normalized-SGD demonstrates strong empirical performance with greater memory efficiency than Adam. In light of this, we introduce FZOO, a Fast Zeroth-Order Optimizer toward Adam-Scale Speed. FZOO reduces the total forward passes needed for convergence by employing batched one-sided estimates that adapt step sizes based on the standard deviation of batch losses. It also accelerates per-batch computation through the use of Rademacher random vector perturbations coupled with CUDA’s parallel processing. Extensive experiments on diverse models, including RoBERTa-large, OPT (350M-66B), Phi-2, and Llama3, across 11 tasks validate FZOO’s effectiveness. On average, FZOO outperforms MeZO by 3 percent in accuracy while requiring 3 times fewer forward passes. For RoBERTa-large, FZOO achieves average improvements of 5.6 percent in accuracy and an 18 times reduction in forward passes compared to MeZO, achieving convergence speeds comparable to Adam. We also provide theoretical analysis proving FZOO’s formal equivalence to a normalized-SGD update rule and its convergence guarantees. FZOO integrates smoothly into PEFT techniques, enabling even larger memory savings. Overall, our results make single-GPU, high-speed, full-parameter fine-tuning practical and point toward future work on memory-efficient pre-training.
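To make the idea of a forward-only, batched one-sided estimate with Rademacher perturbations more concrete, here is a small sketch on a toy quadratic. It is written under stated assumptions and is not the FZOO implementation: the exact update rule, the hyperparameter values, and the function name `fzoo_like_step` are hypothetical, and the step is simply divided by the standard deviation of the perturbed losses to mimic the adaptive step size mentioned in the abstract.

```python
import numpy as np

def fzoo_like_step(loss_fn, theta, rng, n_perturb=16, eps=1e-3, lr=0.05):
    """One zeroth-order update using only forward evaluations of loss_fn.

    Assumed form: one-sided differences against the unperturbed loss,
    averaged over Rademacher directions and scaled by the standard
    deviation of the perturbed losses.
    """
    base_loss = loss_fn(theta)
    # Rademacher directions: every entry is +1 or -1 with equal probability.
    u = rng.choice([-1.0, 1.0], size=(n_perturb, theta.size))
    losses = np.array([loss_fn(theta + eps * u_i) for u_i in u])
    sigma = losses.std() + 1e-12  # guard against division by zero
    direction = ((losses - base_loss)[:, None] * u).mean(axis=0)
    return theta - lr * direction / sigma

def quadratic(theta):
    return 0.5 * float(theta @ theta)

rng = np.random.default_rng(0)
theta = np.ones(10)
initial_loss = quadratic(theta)
for _ in range(200):
    theta = fzoo_like_step(quadratic, theta, rng)
final_loss = quadratic(theta)
```

Dividing by the loss standard deviation makes the step length roughly independent of the gradient magnitude, which is the sense in which such an update resembles normalized SGD. No backward pass is used, so memory stays at the inference level.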
Article 622
Title@2025-06-10 (2): Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
Title: Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning | Router-R1: Lehren von LLMs Multi-Round Routing und Aggregation durch Verstärkungslernen | 路由-R1路由-R1路由:教学LLMS 2506.09033v1 |
Authors (3): Haozhen Zhang, Tao Feng, Jiaxuan You
The rapid emergence of diverse large language models (LLMs) has spurred the development of LLM routers that assign user queries to the most suitable model. However, existing LLM routers typically perform a single-round, one-to-one mapping (i.e., assigning each query to a single model in isolation), which limits their capability to tackle complex tasks that demand the complementary strengths of multiple LLMs. In this paper, we present Router-R1, a reinforcement learning (RL)-based framework that formulates multi-LLM routing and aggregation as a sequential decision process. Router-R1 instantiates the router itself as a capable LLM, leveraging its reasoning ability to interleave “think” actions (internal deliberation) with “route” actions (dynamic model invocation), and integrates each response into its evolving context. To guide learning, we employ a lightweight rule-based reward comprising format rewards, final outcome rewards, and a novel cost reward for performance and cost trade-off optimization, opening a pathway toward optimizing performance-cost tradeoffs via RL. Router-R1 also conditions only on simple model descriptors such as pricing, latency, and example performance, enabling strong generalization to unseen model selection. Experiments on seven general and multi-hop QA benchmarks show that Router-R1 outperforms several strong baselines, achieving superior performance while maintaining robust generalization and cost management. Code is available at https://github.com/ulab-uiuc/Router-R1.
Article 623
Title@2025-06-10 (2): Optimal Spectral Transitions in High-Dimensional Multi-Index Models
Title: Optimal Spectral Transitions in High-Dimensional Multi-Index Models | Optimale Spektralübergänge in High-Dimensional Multi-Index-Modellen | 高多维多指数模型中最佳光谱转换 2502.02545v2 |
Authors (5): Leonardo Defilippis, Yatin Dandi, Pierre Mergny, Florent Krzakala, Bruno Loureiro
We consider the problem of how many samples from a Gaussian multi-index model are required to weakly reconstruct the relevant index subspace. Despite its increasing popularity as a testbed for investigating the computational complexity of neural networks, results beyond the single-index setting remain elusive. In this work, we introduce spectral algorithms based on the linearization of a message passing scheme tailored to this problem. Our main contribution is to show that the proposed methods achieve the optimal reconstruction threshold. Leveraging a high-dimensional characterization of the algorithms, we show that above the critical threshold the leading eigenvector correlates with the relevant index subspace, a phenomenon reminiscent of the Baik-Ben Arous-Peche (BBP) transition in spiked models arising in random matrix theory. Supported by numerical experiments and a rigorous theoretical framework, our work bridges critical gaps in the computational limits of weak learnability in multi-index models.
Article 624
Title@2025-06-10 (2): JuStRank: Benchmarking LLM Judges for System Ranking
Title: JuStRank: Benchmarking LLM Judges for System Ranking | JuStRank: Benchmarking der LLM-Richter für Systemranking | JuStRank:为系统评分确定LLM法官基准 2412.09569v2 |
Authors (6): Ariel Gera, Odellia Boni, Yotam Perlitz, Roy Bar-Haim, Lilach Eden, Asaf Yehudai
Given the rapid progress of generative AI, there is a pressing need to systematically compare and choose between the numerous models and configurations available. The scale and versatility of such evaluations make the use of LLM-based judges a compelling solution for this challenge. Crucially, this approach first requires validating the quality of the LLM judge itself. Previous work has focused on instance-based assessment of LLM judges, where a judge is evaluated over a set of responses, or response pairs, while being agnostic to their source systems. We argue that this setting overlooks critical factors affecting system-level ranking, such as a judge’s positive or negative bias towards certain systems. To address this gap, we conduct the first large-scale study of LLM judges as system rankers. System scores are generated by aggregating judgment scores over multiple system outputs, and the judge’s quality is assessed by comparing the resulting system ranking to a human-based ranking. Beyond overall judge assessment, our analysis provides a fine-grained characterization of judge behavior, including their decisiveness and bias.
Article 625
Title@2025-06-10 (2): Spectral Estimators for Multi-Index Models: Precise Asymptotics and Optimal Weak Recovery
Title: Spectral Estimators for Multi-Index Models: Precise Asymptotics and Optimal Weak Recovery | Spektrale Schätzwerte für Multi-Index-Modelle: Präzise Asymptotik und optimale schwache Erholung | 多 Index 模型的光谱光谱模拟器:精确的无症状和最佳微弱恢复 2502.01583v2 |
Authors (3): Filip Kovačević, Yihan Zhang, Marco Mondelli
Multi-index models provide a popular framework to investigate the learnability of functions with low-dimensional structure and, also due to their connections with neural networks, they have been the object of recent intensive study. In this paper, we focus on recovering the subspace spanned by the signals via spectral estimators – a family of methods routinely used in practice, often as a warm-start for iterative algorithms. Our main technical contribution is a precise asymptotic characterization of the performance of spectral methods, when sample size and input dimension grow proportionally and the dimension $p$ of the space to recover is fixed. Specifically, we locate the top-$p$ eigenvalues of the spectral matrix and establish the overlaps between the corresponding eigenvectors (which give the spectral estimators) and a basis of the signal subspace. Our analysis unveils a phase transition phenomenon in which, as the sample complexity grows, eigenvalues escape from the bulk of the spectrum and, when that happens, eigenvectors recover directions of the desired subspace. The precise characterization we put forward enables the optimization of the data preprocessing, thus allowing us to identify the spectral estimator that requires the minimal sample size for weak recovery.
Article 626
Title@2025-06-10 (2): Diffuse and Disperse: Image Generation with Representation Regularization
Title: Diffuse and Disperse: Image Generation with Representation Regularization | Diffuse und Disperse: Bildgenerierung mit Repräsentationsregularisierung | Diffuse & diffperse: 形象生成,有代表性的规范化 2506.09027v1 |
Authors (2): Runqian Wang, Kaiming He
The development of diffusion-based generative models over the past decade has largely proceeded independently of progress in representation learning. These diffusion models typically rely on regression-based objectives and generally lack explicit regularization. In this work, we propose Dispersive Loss, a simple plug-and-play regularizer that effectively improves diffusion-based generative models. Our loss function encourages internal representations to disperse in the hidden space, analogous to contrastive self-supervised learning, with the key distinction that it requires no positive sample pairs and therefore does not interfere with the sampling process used for regression. Compared to the recent method of representation alignment (REPA), our approach is self-contained and minimalist, requiring no pre-training, no additional parameters, and no external data. We evaluate Dispersive Loss on the ImageNet dataset across a range of models and report consistent improvements over widely used and strong baselines. We hope our work will help bridge the gap between generative modeling and representation learning.
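To make the idea of a repulsion-only regularizer concrete, here is a minimal sketch of one plausible form. The abstract does not give the formula, so the choices here are assumptions: a contrastive-style term with no positive pairs, squared Euclidean distance, and a hypothetical temperature `tau`. The loss is lower when hidden representations in a batch are further apart.

```python
import math


def dispersive_loss(reps, tau=0.5):
    """Log of the mean pairwise similarity exp(-||z_i - z_j||^2 / tau), i != j.

    Assumed form for illustration only; there are no positive pairs,
    so every pair of representations is simply pushed apart.
    """
    n = len(reps)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(reps[i], reps[j]))
                total += math.exp(-sq_dist / tau)
    return math.log(total / (n * (n - 1)))


clustered = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
spread = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]
print(dispersive_loss(clustered), dispersive_loss(spread))
```

In a training loop, a term like this would presumably be added to the usual regression objective with a small weight; that weighting is likewise not specified in the abstract.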
Article 627
Title@2025-06-10 (2): e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
Title: e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs | e3: Erforschen lernen ermöglicht Extrapolation von Test-Time Compute für LLMs | e3: 学习探索以利对LLMM的试验时间计算进行外推计算 2506.09026v1 |
Authors (8): Amrith Setlur, Matthew Y. R. Yang, Charlie Snell, Jeremy Greer, Ian Wu, Virginia Smith, Max Simchowitz, Aviral Kumar
Test-time scaling offers a promising path to improve LLM reasoning by utilizing more compute at inference time; however, the true promise of this paradigm lies in extrapolation (i.e., improvement in performance on hard problems as LLMs keep “thinking” for longer, beyond the maximum token budget they were trained on). Surprisingly, we find that most existing reasoning models do not extrapolate well. We show that one way to enable extrapolation is by training the LLM to perform in-context exploration: training the LLM to effectively spend its test time budget by chaining operations (such as generation, verification, refinement, etc.), or testing multiple hypotheses before it commits to an answer. To enable in-context exploration, we identify three key ingredients as part of our recipe e3: (1) chaining skills that the base LLM has asymmetric competence in, e.g., chaining verification (easy) with generation (hard), as a way to implement in-context search; (2) leveraging “negative” gradients from incorrect traces to amplify exploration during RL, resulting in longer search traces that chain additional asymmetries; and (3) coupling task difficulty with training token budget during training via a specifically-designed curriculum to structure in-context exploration. Our recipe e3 produces the best known 1.7B model according to AIME’25 and HMMT’25 scores, and extrapolates to 2x the training token budget. Our e3-1.7B model not only attains high pass@1 scores, but also improves pass@k over the base model.
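For readers unfamiliar with the pass@1 and pass@k metrics mentioned above, the sketch below shows a widely used unbiased estimator computed from `n` sampled answers of which `c` are correct. The abstract does not say how the authors compute these scores, so treating this estimator as their procedure is an assumption.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples is correct.

    n: total samples drawn per problem, c: number of correct samples, k <= n.
    """
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=10, c=3, k=1), pass_at_k(n=10, c=3, k=5))
```

With `k=1` the estimator reduces to the plain fraction of correct samples, `c / n`.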
Article 628
Title@2025-06-10 (2): DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging
Title: DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging | DIsoN: Dezentrale Isolationsnetzwerke für Out-of-Distribution Detection in Medical Imaging | DISON:医疗成像中分散分布式隔离探测网 2506.09024v1 |
Authors (5): Felix Wagner, Pramit Saha, Harry Anthony, J. Alison Noble, Konstantinos Kamnitsas
Safe deployment of machine learning (ML) models in safety-critical domains such as medical imaging requires detecting inputs with characteristics not seen during training, known as out-of-distribution (OOD) detection, to prevent unreliable predictions. Effective OOD detection after deployment could benefit from access to the training data, enabling direct comparison between test samples and the training data distribution to identify differences. State-of-the-art OOD detection methods, however, either discard training data after deployment or assume that test samples and training data are centrally stored together, an assumption that rarely holds in real-world settings. This is because shipping training data with the deployed model is usually impossible due to the size of training databases, as well as proprietary or privacy constraints. We introduce the Isolation Network, an OOD detection framework that quantifies the difficulty of separating a target test sample from the training data by solving a binary classification task. We then propose Decentralized Isolation Networks (DIsoN), which enables the comparison of training and test data when data-sharing is impossible, by exchanging only model parameters between the remote computational nodes of training and deployment. We further extend DIsoN with class-conditioning, comparing a target sample solely with training data of its predicted class. We evaluate DIsoN on four medical imaging datasets (dermatology, chest X-ray, breast ultrasound, histopathology) across 12 OOD detection tasks. DIsoN performs favorably against existing methods while respecting data-privacy. This decentralized OOD detection framework opens the way for a new type of service that ML developers could provide along with their models: providing remote, secure utilization of their training data for OOD detection services. Code will be available upon acceptance at: *****
Article 629
Title@2025-06-10 (2): Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing
Title: Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing | Wahrscheinlich kostensensive gegnerische Verteidigung durch Randomized Glättung | 通过随机滑动平滑,进行成本敏感度高的反反向防御 2310.08732v3 |
Authors (4): Yuan Xin, Dingfan Chen, Michael Backes, Xiao Zhang
As ML models are increasingly deployed in critical applications, robustness against adversarial perturbations is crucial. While numerous defenses have been proposed to counter such attacks, they typically assume that all adversarial transformations are equally important, an assumption that rarely aligns with real-world applications. To address this, we study the problem of robust learning against adversarial perturbations under cost-sensitive scenarios, where the potential harm of different types of misclassifications is encoded in a cost matrix. Our solution introduces a provably robust learning algorithm to certify and optimize for cost-sensitive robustness, building on the scalable certification framework of randomized smoothing. Specifically, we formalize the definition of cost-sensitive certified radius and propose our novel adaptation of the standard certification algorithm to generate tight robustness certificates tailored to any cost matrix. In addition, we design a robust training method that improves certified cost-sensitive robustness without compromising model accuracy. Extensive experiments on benchmark datasets, including challenging ones unsolvable by existing methods, demonstrate the effectiveness of our certification algorithm and training method across various cost-sensitive scenarios.
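As a rough illustration of the certification idea, the sketch below computes the standard randomized-smoothing L2 radius for Gaussian noise and then a cost-sensitive variant. The variant is an assumption made for illustration, not the paper's definition: it assumes that only misclassifications into classes with nonzero cost matter, so the runner-up probability is taken over those classes only. The function names and probability bounds are hypothetical.

```python
from statistics import NormalDist


def certified_radius(p_top_lower, p_other_upper, sigma):
    """Standard L2 radius: sigma / 2 * (Phi^-1(p_top) - Phi^-1(p_other)).

    Both probabilities must lie strictly between 0 and 1.
    """
    if p_top_lower <= p_other_upper:
        return 0.0  # no certificate (abstain)
    inv = NormalDist().inv_cdf
    return 0.5 * sigma * (inv(p_top_lower) - inv(p_other_upper))


def cost_sensitive_radius(p_top_lower, p_upper_by_class, costly_classes, sigma):
    """Assumed variant: only classes flagged as costly can break the certificate."""
    if not costly_classes:
        return float("inf")  # no misclassification is costly for this input
    worst = max(p_upper_by_class[c] for c in costly_classes)
    return certified_radius(p_top_lower, worst, sigma)


# Hypothetical upper bounds on the smoothed probabilities of two non-predicted classes.
p_upper = {1: 0.10, 2: 0.02}
print(certified_radius(0.90, max(p_upper.values()), sigma=0.5))
print(cost_sensitive_radius(0.90, p_upper, costly_classes={2}, sigma=0.5))
```

Under this assumption the cost-sensitive radius is never smaller than the standard one, since it ignores classes whose confusion carries no cost.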
Article 630
Title@2025-06-10 (2): Edit Flows: Flow Matching with Edit Operations
Title: Edit Flows: Flow Matching with Edit Operations | Flows bearbeiten: Flow passend zu Bearbeitungsoperationen | 编辑流程: 与编辑操作匹配的流程 2506.09018v1 |
Authors (4): Marton Havasi, Brian Karrer, Itai Gat, Ricky T. Q. Chen
Autoregressive generative models naturally generate variable-length sequences, while non-autoregressive models struggle, often imposing rigid, token-wise structures. We propose Edit Flows, a non-autoregressive model that overcomes these limitations by defining a discrete flow over sequences through edit operations: insertions, deletions, and substitutions. By modeling these operations within a Continuous-time Markov Chain over the sequence space, Edit Flows enables flexible, position-relative generation that aligns more closely with the structure of sequence data. Our training method leverages an expanded state space with auxiliary variables, making the learning process efficient and tractable. Empirical results show that Edit Flows outperforms both autoregressive and mask models on image captioning and significantly outperforms the mask construction in text and code generation.
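The three edit operations can be illustrated with a small sketch. It only shows how insertions, deletions, and substitutions change a token sequence, including its length; the learned rates of the Markov chain and the model itself are out of scope, and the tuple encoding of an edit is a hypothetical choice.

```python
def apply_edit(tokens, edit):
    """Apply one edit, encoded as (kind, position[, token]), to a token list."""
    kind, pos, *rest = edit
    if kind == "ins":   # insert a new token before index pos
        return tokens[:pos] + [rest[0]] + tokens[pos:]
    if kind == "del":   # delete the token at index pos
        return tokens[:pos] + tokens[pos + 1:]
    if kind == "sub":   # replace the token at index pos
        return tokens[:pos] + [rest[0]] + tokens[pos + 1:]
    raise ValueError(f"unknown edit kind: {kind}")


def apply_edits(tokens, edits):
    """Apply a list of edits one after another."""
    for edit in edits:
        tokens = apply_edit(tokens, edit)
    return tokens


result = apply_edits(["a", "c"], [("ins", 1, "b"), ("sub", 2, "d"), ("del", 0)])
print(result)
```

Because insertions and deletions change the sequence length, a process built from such edits is not tied to a fixed number of positions.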
Article 631
Title@2025-06-10 (2): SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning
Title: SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning | SPEED-RL: Schnellere Schulung von Vernunftmodellen über Online Curriculum Learning | SPEEED-RL:通过在线课程学习更快地培训推理模型 2506.09016v1 |
Authors (4): Ruiqi Zhang, Daman Arora, Song Mei, Andrea Zanette
Training large language models with reinforcement learning (RL) against verifiable rewards significantly enhances their reasoning abilities, yet remains computationally expensive due to inefficient uniform prompt sampling. We introduce Selective Prompting with Efficient Estimation of Difficulty (SPEED), an adaptive online RL curriculum that selectively chooses training examples of intermediate difficulty to maximize learning efficiency. Theoretically, we establish that intermediate-difficulty prompts improve the gradient estimator’s signal-to-noise ratio, accelerating convergence. Empirically, our efficient implementation leads to 2x to 6x faster training without degrading accuracy, requires no manual tuning, and integrates seamlessly into standard RL algorithms.
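A minimal sketch of the selection idea follows, under stated assumptions: prompt difficulty is estimated from the pass rate of a few cheap rollouts, and prompts whose estimated pass rate is exactly 0 or 1 are skipped. The thresholds, function names, and data are hypothetical, and the variance helper is only a loose intuition for why such prompts carry little signal, not the paper's signal-to-noise analysis.

```python
def pass_rate(rewards):
    """Fraction of rollouts with reward 1 for a single prompt."""
    return sum(rewards) / len(rewards)


def select_intermediate(rollout_rewards, low=0.0, high=1.0):
    """Keep prompts whose estimated pass rate lies strictly between low and high."""
    return [
        prompt
        for prompt, rewards in rollout_rewards.items()
        if low < pass_rate(rewards) < high
    ]


def reward_variance(p):
    """Variance of a binary reward with success probability p: zero at p = 0 or 1."""
    return p * (1.0 - p)


# Hypothetical binary rewards from four rollouts per prompt.
rollouts = {"easy": [1, 1, 1, 1], "hard": [0, 0, 0, 0], "mid": [1, 0, 1, 0]}
print(select_intermediate(rollouts))
```

When every rollout for a prompt gets the same reward, the reward variance is zero, which is the intuition behind skipping such prompts in this sketch.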
Article 632
Title@2025-06-10 (2): Effective Data Pruning through Score Extrapolation
Title: Effective Data Pruning through Score Extrapolation | Effektive Datenkorrektur durch Score-Extrapolation | 通过计分法外推法有效调节数据 2506.09010v1 |
Authors (6): Sebastian Schmidt, Prasanga Dhungel, Christoffer Löffler, Björn Nieth, Stephan Günnemann, Leo Schwinn
Training advanced machine learning models demands massive datasets, resulting in prohibitive computational costs. To address this challenge, data pruning techniques identify and remove redundant training samples while preserving model performance. Yet, existing pruning techniques predominantly require a full initial training pass to identify removable samples, negating any efficiency benefits for single training runs. To overcome this limitation, we introduce a novel importance score extrapolation framework that requires training on only a small subset of data. We present two initial approaches in this framework - k-nearest neighbors and graph neural networks - to accurately predict sample importance for the entire dataset using patterns learned from this minimal subset. We demonstrate the effectiveness of our approach for 2 state-of-the-art pruning methods (Dynamic Uncertainty and TDDS), 4 different datasets (CIFAR-10, CIFAR-100, Places-365, and ImageNet), and 3 training paradigms (supervised, unsupervised, and adversarial). Our results indicate that score extrapolation is a promising direction to scale expensive score calculation methods, such as pruning, data attribution, or other tasks.
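The k-nearest-neighbor variant of score extrapolation might look roughly like the sketch below. It assumes that each sample has an embedding, that importance scores are known only for a small labeled subset, and that an unlabeled sample receives the mean score of its `k` nearest labeled neighbors; the embeddings, scores, and function name are hypothetical.

```python
import math


def knn_extrapolate(query, labeled, k=3):
    """Predict an importance score as the mean score of the k nearest labeled samples.

    labeled: list of (embedding, score) pairs from the small scored subset.
    """
    nearest = sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]
    return sum(score for _, score in nearest) / len(nearest)


# Hypothetical 2-D embeddings with importance scores from the scored subset.
labeled = [
    ((0.0, 0.0), 0.1),
    ((0.1, 0.0), 0.2),
    ((1.0, 1.0), 0.9),
    ((0.9, 1.0), 0.8),
]
print(knn_extrapolate((0.05, 0.0), labeled, k=2))
```

The extrapolated scores could then be used to decide which samples to prune from the full dataset without first training on all of it.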
Article 633
Title@2025-06-10 (2): Scalable Equilibrium Sampling with Sequential Boltzmann Generators
Title: Scalable Equilibrium Sampling with Sequential Boltzmann Generators | Skalierbare Equilibrium-Probenahme mit sequenziellen Boltzmann-Generatoren | 配有序列波尔茨曼发电机的可缩缩平衡取样器 2502.18462v2 |
Authors (6): Charlie B. Tan, Avishek Joey Bose, Chen Lin, Leon Klein, Michael M. Bronstein, Alexander Tong
Scalable sampling of molecular states in thermodynamic equilibrium is a long-standing challenge in statistical physics. Boltzmann generators tackle this problem by pairing normalizing flows with importance sampling to obtain uncorrelated samples under the target distribution. In this paper, we extend the Boltzmann generator framework with two key contributions, denoting our framework Sequential Boltzmann Generators (SBG). The first is a highly efficient Transformer-based normalizing flow operating directly on all-atom Cartesian coordinates. In contrast to the equivariant continuous flows of prior methods, we leverage exactly invertible non-equivariant architectures which are highly efficient during both sample generation and likelihood evaluation. This efficiency unlocks more sophisticated inference strategies beyond standard importance sampling. In particular, we perform inference-time scaling of flow samples using a continuous-time variant of sequential Monte Carlo, in which flow samples are transported towards the target distribution with annealed Langevin dynamics. SBG achieves state-of-the-art performance w.r.t. all metrics on peptide systems, demonstrating the first equilibrium sampling in Cartesian coordinates of tri-, tetra- and hexa-peptides that were thus far intractable for prior Boltzmann generators.
Article 634
Title@2025-06-10 (2): Branched Schrödinger Bridge Matching
Title: Branched Schrödinger Bridge Matching | Verzweigte Schrödingerbrücke passend | 分割的 Schrödinger 桥桥配对 2506.09007v1 |
Authors (4): Sophia Tang, Yinuo Zhang, Alexander Tong, Pranam Chatterjee
Predicting the intermediate trajectories between an initial and target distribution is a central problem in generative modeling. Existing approaches, such as flow matching and Schrödinger Bridge Matching, effectively learn mappings between two distributions by modeling a single stochastic path. However, these methods are inherently limited to unimodal transitions and cannot capture branched or divergent evolution from a common origin to multiple distinct outcomes. To address this, we introduce Branched Schrödinger Bridge Matching (BranchSBM), a novel framework that learns branched Schrödinger bridges. BranchSBM parameterizes multiple time-dependent velocity fields and growth processes, enabling the representation of population-level divergence into multiple terminal distributions. We show that BranchSBM is not only more expressive but also essential for tasks involving multi-path surface navigation, modeling cell fate bifurcations from homogeneous progenitor states, and simulating diverging cellular responses to perturbations.
Article 635
Title@2025-06-10 (2): CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model
Title: CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model | CodeBrain: Bridging Entkoppelter Tokenizer und Multi-Scale-Architektur für EEG-Stiftungsmodell | 编码系统:为EEG基金会模式建立连接拆分调控器和多规模结构 2506.09110v1 |
Authors (7): Jingying Ma, Feng Wu, Qika Lin, Yucheng Xing, Chenyu Liu, Ziyu Jia, Mengling Feng
Electroencephalography (EEG) provides real-time insights into brain activity and is widely used in neuroscience. However, variations in channel configurations, sequence lengths, and task objectives limit the transferability of traditional task-specific models. Although recent EEG foundation models (EFMs) aim to learn generalizable representations, they struggle with limited heterogeneous representation capacity and inefficiency in capturing multi-scale brain dependencies. To address these challenges, we propose CodeBrain, an efficient EFM structurally aligned with brain organization, trained in two stages. (1) We introduce a TFDual-Tokenizer that independently tokenizes heterogeneous temporal and frequency components, enabling a quadratic expansion of the discrete representation space. This also offers a degree of interpretability through cross-domain token analysis. (2) We propose the EEGSSM, which combines a structured global convolution architecture and a sliding window attention mechanism to jointly model sparse long-range and local dependencies. Unlike fully connected Transformer models, EEGSSM better reflects the brain’s small-world topology and efficiently captures EEG’s inherent multi-scale structure. EEGSSM is trained with a masked self-supervised learning objective to predict token indices obtained in TFDual-Tokenizer. Comprehensive experiments on 10 public EEG datasets demonstrate the generalizability of CodeBrain with linear probing. By offering biologically informed and interpretable EEG modeling, CodeBrain lays the foundation for future neuroscience research. Both code and pretraining weights will be released in a future version.
Article 636
Title@2025-06-10 (2): SensorLM: Learning the Language of Wearable Sensors
Title: SensorLM: Learning the Language of Wearable Sensors | SensorLM: Die Sprache der tragbaren Sensoren lernen | 传感器LM:学习穿装传感器的语言 2506.09108v1 |
Authors (20): Yuwei Zhang, Kumar Ayush, Siyuan Qiao, A. Ali Heydari, Girish Narayanswamy, Maxwell A. Xu, Ahmed A. Metwally, Shawn Xu, Jake Garrison, Xuhai Xu, Tim Althoff, Yun Liu, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Cecilia Mascolo, Xin Liu, Daniel McDuff, Yuzhe Yang
We present SensorLM, a family of sensor-language foundation models that enable wearable sensor data understanding with natural language. Despite its pervasive nature, aligning and interpreting sensor data with language remains challenging due to the lack of paired, richly annotated sensor-text descriptions in uncurated, real-world wearable data. We introduce a hierarchical caption generation pipeline designed to capture statistical, structural, and semantic information from sensor data. This approach enabled the curation of the largest sensor-language dataset to date, comprising over 59.7 million hours of data from more than 103,000 people. Furthermore, SensorLM extends prominent multimodal pretraining architectures (e.g., CLIP, CoCa) and recovers them as specific variants within a generic architecture. Extensive experiments on real-world tasks in human activity analysis and healthcare verify the superior performance of SensorLM over state-of-the-art in zero-shot recognition, few-shot learning, and cross-modal retrieval. SensorLM also demonstrates intriguing capabilities including scaling behaviors, label efficiency, sensor captioning, and zero-shot generalization to unseen tasks.
Article 637
Title@2025-06-10 (2): Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models
Title: Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Effiziente Ausrichtung der medizinischen Visionssprache durch Anpassung maskierter Visionsmodelle | 通过调适蒙面愿景模型,提高医疗愿景-语言协调效率 2506.08990v1 |
Authors (5): Chenyu Lian, Hong-Yu Zhou, Dongyun Liang, Jing Qin, Liansheng Wang
Medical vision-language alignment through cross-modal contrastive learning shows promising performance in image-text matching tasks, such as retrieval and zero-shot classification. However, conventional cross-modal contrastive learning (CLIP-based) methods suffer from suboptimal visual representation capabilities, which also limits their effectiveness in vision-language alignment. In contrast, although the models pretrained via multimodal masked modeling struggle with direct cross-modal matching, they excel in visual representation. To address this contradiction, we propose ALTA (ALign Through Adapting), an efficient medical vision-language alignment method that utilizes only about 8% of the trainable parameters and less than 1/5 of the computational consumption required for masked record modeling. ALTA achieves superior performance in vision-language matching tasks like retrieval and zero-shot classification by adapting the pretrained vision model from masked record modeling. Additionally, we integrate temporal-multiview radiograph inputs to enhance the information consistency between radiographs and their corresponding descriptions in reports, further improving the vision-language alignment. Experimental evaluations show that ALTA outperforms the best-performing counterpart by over 4% absolute points in text-to-image accuracy and approximately 6% absolute points in image-to-text retrieval accuracy. The adaptation of vision-language models during efficient alignment also promotes better vision and language understanding. Code is publicly available at https://github.com/DopamineLcy/ALTA.
Article 638
Title@2025-06-10 (2): SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
Title: SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning | SwS: Selbstbewusste Schwäche-getriebene Problemsynthese bei der Stärkung des Lernens für LLM-Reasoning | SWS:自觉的弱点驱动的问题综述 2506.08989v1 |
Authors (8): Xiao Liang, Zhong-Zhi Li, Yeyun Gong, Yang Wang, Hengyuan Zhang, Yelong Shen, Ying Nian Wu, Weizhu Chen
Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for training large language models (LLMs) on complex reasoning tasks, such as mathematical problem solving. A prerequisite for the scalability of RLVR is a high-quality problem set with precise and verifiable answers. However, the scarcity of well-crafted human-labeled math problems and limited-verification answers in existing distillation-oriented synthetic datasets limit their effectiveness in RL. Additionally, most problem synthesis strategies indiscriminately expand the problem set without considering the model’s capabilities, leading to low efficiency in generating useful questions. To mitigate this issue, we introduce a Self-aware Weakness-driven problem Synthesis framework (SwS) that systematically identifies model deficiencies and leverages them for problem augmentation. Specifically, we define weaknesses as questions that the model consistently fails to learn through its iterative sampling during RL training. We then extract the core concepts from these failure cases and synthesize new problems to strengthen the model’s weak areas in subsequent augmented training, enabling it to focus on and gradually overcome its weaknesses. Without relying on external knowledge distillation, our framework enables robust generalization by empowering the model to self-identify and address its weaknesses in RL, yielding average performance gains of 10.0% and 7.7% on 7B and 32B models across eight mainstream reasoning benchmarks.
Article 639
Title@2025-06-10 (2): Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Title: Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models | Quamba2: Robustes und skalierbares Post-Training Quantization Framework für selektive State Space Models | Quamba2:有选择性的国家空间模型强力和可扩展的培训后量化框架 2503.22879v3 |
Authors (6): Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu
State Space Models (SSMs) are emerging as a compelling alternative to Transformers because of their consistent memory usage and high performance. Despite this, scaling up SSMs on cloud services or limited-resource devices is challenging due to their storage requirements and computational power. To overcome this, quantizing SSMs with low bit-width data formats can reduce model size and benefit from hardware acceleration. As SSMs are prone to quantization-induced errors, recent efforts have focused on optimizing a particular model or bit-width for efficiency without sacrificing performance. However, distinct bit-width configurations are essential for different scenarios, like W4A8 for boosting large-batch decoding speed, and W4A16 for enhancing generation speed in short prompt applications for a single user. To this end, we present Quamba2, compatible with W8A8, W4A8, and W4A16 for both Mamba1 and Mamba2 backbones, addressing the growing demand for SSM deployment on various platforms. Based on the channel order preserving and activation persistence of SSMs, we propose an offline approach to quantize inputs of a linear recurrence in 8-bit by sorting and clustering for input $x$, combined with a per-state-group quantization for input-dependent parameters $B$ and $C$. To ensure compute-invariance in the SSM output, we rearrange weights offline according to the clustering sequence. The experiments show that Quamba2-8B outperforms two state-of-the-art SSM quantization methods and delivers 1.3$\times$ and 3$\times$ speed-ups in the pre-filling and generation stages, respectively, while offering 4$\times$ memory reduction with only a $1.6\%$ average accuracy drop. The evaluation on MMLU shows the generalizability and robustness of our framework. The code and quantized models will be released at: https://github.com/enyac-group/Quamba.
Article 640
Title@2025-06-10 (2): PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips
Title: PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips | PrisonBreak: Gefängnisbrechende große Sprachmodelle mit weniger als fünfundzwanzig gezielten Bit-Flips | 监狱破碎:破禁大语言模式,目标位翻页小于25个目标小于25个 2412.07192v2 |
Authors (8): Zachary Coalson, Jeonghyun Woo, Yu Sun, Shiyang Chen, Lishan Yang, Prashant Nair, Bo Fang, Sanghyun Hong
We introduce a new class of attacks on commercial-scale (human-aligned) language models that induce jailbreaking through targeted bitwise corruptions in model parameters. Our adversary can jailbreak billion-parameter language models with fewer than 25 bit-flips in all cases (and as few as 5 in some), using up to 40$\times$ fewer bit-flips than existing attacks on computer vision models at least 100$\times$ smaller. Unlike prompt-based jailbreaks, our attack renders these models in memory ‘uncensored’ at runtime, allowing them to generate harmful responses without any input modifications. Our attack algorithm efficiently identifies target bits to flip, offering up to 20$\times$ more computational efficiency than previous methods. This makes it practical for language models with billions of parameters. We show an end-to-end exploitation of our attack using software-induced fault injection, Rowhammer (RH). Our work examines 56 DRAM RH profiles from DDR4 and LPDDR4X devices with different RH vulnerabilities. We show that our attack can reliably induce jailbreaking in systems similar to those affected by prior bit-flip attacks. Moreover, our approach remains effective even against highly RH-secure systems (e.g., 46$\times$ more secure than previously tested systems). Our analyses further reveal that: (1) models with less post-training alignment require fewer bit flips to jailbreak; (2) certain model components, such as value projection layers, are substantially more vulnerable than others; and (3) our method is mechanistically different from existing jailbreaks. Our findings highlight a pressing, practical threat to the language model ecosystem and underscore the need for research to protect these models from bit-flip attacks.
Article 641
Title@2025-06-10 (2): Bias Analysis in Unconditional Image Generative Models
Title: Bias Analysis in Unconditional Image Generative Models | Bias-Analyse in bedingungslosen Bildgenerativen Modellen | 无条件图像生成模型中的偏见分析 2506.09106v1 |
Authors (5): Xiaofeng Zhang, Michelle Lin, Simon Lacoste-Julien, Aaron Courville, Yash Goyal
The widespread adoption of generative AI models has raised growing concerns about representational harm and potential discriminatory outcomes. Yet, despite growing literature on this topic, the mechanisms by which bias emerges - especially in unconditional generation - remain disentangled. We define the bias of an attribute as the difference between the probability of its presence in the observed distribution and its expected proportion in an ideal reference distribution. In our analysis, we train a set of unconditional image generative models and adopt a commonly used bias evaluation framework to study bias shift between training and generated distributions. Our experiments reveal that the detected attribute shifts are small. We find that the attribute shifts are sensitive to the attribute classifier used to label generated images in the evaluation framework, particularly when its decision boundaries fall in high-density regions. Our empirical analysis indicates that this classifier sensitivity is often observed in attribute values that lie on a spectrum, as opposed to exhibiting a binary nature. This highlights the need for more representative labeling practices, understanding the shortcomings through greater scrutiny of evaluation frameworks, and recognizing the socially complex nature of attributes when evaluating bias.
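The bias definition above amounts to a difference of proportions, which the following sketch makes explicit. It assumes binary attribute labels (1 if the attribute is present) produced by some attribute classifier; the label lists and the reference proportion are made up for illustration.

```python
def attribute_proportion(labels):
    """Empirical probability that the attribute is present (labels are 0 or 1)."""
    return sum(labels) / len(labels)


def attribute_bias(labels, reference_proportion):
    """Bias: observed proportion minus the expected proportion in the reference."""
    return attribute_proportion(labels) - reference_proportion


def bias_shift(train_labels, generated_labels):
    """Shift of the attribute proportion from training data to generated samples."""
    return attribute_proportion(generated_labels) - attribute_proportion(train_labels)


# Hypothetical classifier labels for one attribute.
train = [1, 0, 1, 0]
generated = [1, 1, 1, 0]
print(attribute_bias(generated, reference_proportion=0.5), bias_shift(train, generated))
```

Since the labels come from a classifier, any error near its decision boundary feeds directly into these proportions, which is the sensitivity the abstract describes.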
Article 642
Title@2025-06-10 (2): DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models
Title: DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models | DiffLM: Ansteuerbare Synthetische Datengenerierung über Diffusions-Sprachenmodelle | DiffLM: 通过传播语言模型生成可控合成数据 2411.03250v2 |
Authors (9): Ying Zhou, Xinyao Wang, Yulei Niu, Yaojie Shen, Lexin Tang, Fan Chen, Ben He, Le Sun, Longyin Wen
Recent advancements in large language models (LLMs) have significantly enhanced their knowledge and generative capabilities, leading to a surge of interest in leveraging LLMs for high-quality data synthesis. However, synthetic data generation via prompting LLMs remains challenging due to LLMs’ limited understanding of target data distributions and the complexity of prompt engineering, especially for structured formatted data. To address these issues, we introduce DiffLM, a controllable data synthesis framework based on a variational autoencoder (VAE), which further (1) leverages diffusion models to preserve more information about the original distribution and format structure in the learned latent distribution and (2) decouples the learning of target distribution knowledge from the LLM’s generative objectives via a plug-and-play latent feature injection module. As we observed significant discrepancies between the VAE’s latent representations and the real data distribution, the latent diffusion module is introduced into our framework to learn a fully expressive latent distribution. Evaluations on seven real-world datasets with structured formatted data (i.e., Tabular, Code, and Tool data) demonstrate that DiffLM generates high-quality data, with performance on downstream tasks surpassing that of real data by 2%-7% in certain cases. Data and code are available at https://github.com/bytedance/DiffLM.
Article 643
Title@2025-06-10 (2): Exact Upper and Lower Bounds for the Output Distribution of Neural Networks with Random Inputs
Title: Exact Upper and Lower Bounds for the Output Distribution of Neural Networks with Random Inputs | Exakte obere und untere Grenzen für die Ausgabeverteilung von neuralen Netzwerken mit Random Inputs | 带有随机输入的神经网络输出分布精确的上下上下曲线 2502.11672v2 |
Authors (4): Andrey Kofnov, Daniel Kapla, Ezio Bartocci, Efstathia Bura
We derive exact upper and lower bounds for the cumulative distribution function (cdf) of the output of a neural network (NN) over its entire support subject to noisy (stochastic) inputs. The upper and lower bounds converge to the true cdf over its domain as the resolution increases. Our method applies to any feedforward NN using continuous monotonic piecewise twice continuously differentiable activation functions (e.g., ReLU, tanh and softmax) and convolutional NNs, which were beyond the scope of competing approaches. The novelty and instrumental tool of our approach is to bound general NNs with ReLU NNs. The ReLU NN-based bounds are then used to derive the upper and lower bounds of the cdf of the NN output. Experiments demonstrate that our method delivers guaranteed bounds of the predictive output distribution over its support, thus providing exact error guarantees, in contrast to competing approaches.
Article 644
Title@2025-06-10 (2): Propositional Logic for Probing Generalization in Neural Networks
Title: Propositional Logic for Probing Generalization in Neural Networks | Propositionelle Logik zur Probing-Verallgemeinerung in neuralen Netzwerken | 神经网络普遍化的预示逻辑 2506.08978v1 |
Authors (3): Anna Langedijk, Jaap Jumelet, Willem Zuidema
The extent to which neural networks are able to acquire and represent symbolic rules remains a key topic of research and debate. Much current work focuses on the impressive capabilities of large language models, as well as their often ill-understood failures on a wide range of reasoning tasks. In this paper, in contrast, we investigate the generalization behavior of three key neural architectures (Transformers, Graph Convolution Networks and LSTMs) in a controlled task rooted in propositional logic. The task requires models to generate satisfying assignments for logical formulas, making it a structured and interpretable setting for studying compositionality. We introduce a balanced extension of an existing dataset to eliminate superficial patterns and enable testing on unseen operator combinations. Using this dataset, we evaluate the ability of the three architectures to generalize beyond the training distribution. While all models perform well in-distribution, we find that generalization to unseen patterns, particularly those involving negation, remains a significant challenge. Transformers fail to apply negation compositionally, unless structural biases are introduced. Our findings highlight persistent limitations in the ability of standard architectures to learn systematic representations of logical operators, suggesting the need for stronger inductive biases to support robust rule-based reasoning.
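The task of generating satisfying assignments can be made concrete with a brute-force reference solver. This is a minimal sketch, assuming a hypothetical nested-tuple formula encoding; the paper's dataset format and operator set are not given in the abstract and may differ.

```python
from itertools import product
from typing import Dict, Iterator, Set, Tuple, Union

# Hypothetical encoding: a variable name, or ("not", f), ("and", f, g), ("or", f, g).
Formula = Union[str, Tuple]


def evaluate(formula: Formula, assignment: Dict[str, bool]) -> bool:
    """Evaluate a formula under a complete truth assignment."""
    if isinstance(formula, str):
        return assignment[formula]
    op = formula[0]
    if op == "not":
        return not evaluate(formula[1], assignment)
    if op == "and":
        return evaluate(formula[1], assignment) and evaluate(formula[2], assignment)
    if op == "or":
        return evaluate(formula[1], assignment) or evaluate(formula[2], assignment)
    raise ValueError(f"unknown operator: {op}")


def variables(formula: Formula) -> Set[str]:
    """Collect the variable names that occur in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(variables(sub) for sub in formula[1:]))


def satisfying_assignments(formula: Formula) -> Iterator[Dict[str, bool]]:
    """Enumerate every assignment that makes the formula true."""
    names = sorted(variables(formula))
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if evaluate(formula, assignment):
            yield assignment


# (not a) and (a or b): negation combined with a disjunction.
formula = ("and", ("not", "a"), ("or", "a", "b"))
solutions = list(satisfying_assignments(formula))
```

A checker like `evaluate` is also enough to score a model's output, since any satisfying assignment counts as correct, not only one canonical answer.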
Article 645
Title@2025-06-10 (2): Tailored Architectures for Time Series Forecasting: Evaluating Deep Learning Models on Gaussian Process-Generated Data
Title: Tailored Architectures for Time Series Forecasting: Evaluating Deep Learning Models on Gaussian Process-Generated Data | Maßgeschneiderte Architekturen für die Prognose von Zeitreihen: Bewertung von Deep Learning-Modellen auf gaussischen prozessgenerierten Daten | 时间序列预测:评价高斯进程生成数据的深学习模型 2506.08977v1 |
Authors (2): Victoria Hankemeier, Malte Schilling
Developments in Deep Learning have significantly improved time series forecasting by enabling more accurate modeling of complex temporal dependencies inherent in sequential data. The effectiveness of such models is often demonstrated on limited sets of specific real-world data. Although this allows for comparative analysis, it still does not demonstrate how specific data characteristics align with the architectural strengths of individual models. Our research aims at uncovering clear connections between time series characteristics and particular models. We introduce a novel dataset generated using Gaussian Processes, specifically designed to display distinct, known characteristics for targeted evaluations of model adaptability to them. Furthermore, we present TimeFlex, a new model that incorporates a modular architecture tailored to handle diverse temporal dynamics, including trends and periodic patterns. This model is compared to current state-of-the-art models, offering a deeper understanding of how models perform under varied time series conditions.
Article 646
Title@2025-06-10 (2): Calibrated Physics-Informed Uncertainty Quantification
Title: Calibrated Physics-Informed Uncertainty Quantification | Kalibrierung der physikinformierten Unsicherheit Quantifizierung | 校准物理 - 内部不确定性量化 2502.04406v2 |
Authors (8): Vignesh Gopakumar, Ander Gray, Lorenzo Zanisi, Timothy Nunn, Daniel Giles, Matt J. Kusner, Stanislas Pamela, Marc Peter Deisenroth
Simulating complex physical systems is crucial for understanding and predicting phenomena across diverse fields, such as fluid dynamics and heat transfer, as well as plasma physics and structural mechanics. Traditional approaches rely on solving partial differential equations (PDEs) using numerical methods, which are computationally expensive and often prohibitively slow for real-time applications or large-scale simulations. Neural PDEs have emerged as efficient alternatives to these costly numerical solvers, offering significant computational speed-ups. However, their lack of robust uncertainty quantification (UQ) limits deployment in critical applications. We introduce a model-agnostic, physics-informed conformal prediction (CP) framework that provides guaranteed uncertainty estimates without requiring labelled data. By utilising a physics-based approach, we can quantify and calibrate the model’s inconsistencies with the physics rather than the uncertainty arising from the data. Our approach utilises convolutional layers as finite-difference stencils and leverages physics residual errors as nonconformity scores, enabling data-free UQ with marginal and joint coverage guarantees across prediction domains for a range of complex PDEs. We further validate the efficacy of our method on neural PDE models for plasma modelling and shot design in fusion reactors.
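The idea of using a physics residual as a nonconformity score can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' implementation: it uses a hypothetical 1D problem u'' = f, the 3-point finite-difference stencil, one score per prediction (the largest residual over the grid), and the standard split-conformal quantile rule. All names and numbers are illustrative.

```python
import math
from typing import List, Sequence


def max_stencil_residual(u: Sequence[float], f: Sequence[float], h: float) -> float:
    """Largest |u'' - f| over interior grid points, with u'' from the stencil [1, -2, 1] / h^2."""
    return max(
        abs((u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h) - f[i])
        for i in range(1, len(u) - 1)
    )


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]


# Sanity check: u(x) = x^2 solves u'' = 2, so its residual vanishes.
grid: List[float] = [0.0, 0.5, 1.0, 1.5, 2.0]
exact_solution = [x * x for x in grid]
source_term = [2.0] * len(grid)
exact_score = max_stencil_residual(exact_solution, source_term, h=0.5)

# Hypothetical residual scores from 10 calibration predictions (no labels needed).
calibration_scores = [i / 10 for i in range(1, 11)]
threshold = conformal_quantile(calibration_scores, alpha=0.2)
```

The scores depend only on the model output and the governing equation, which is why no labelled solutions are required for calibration; a new prediction whose residual score exceeds `threshold` falls outside the calibrated band.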
Article 647
Title@2025-06-10 (2): On Large-scale Evaluation of Embedding Models for Knowledge Graph Completion
Title: On Large-scale Evaluation of Embedding Models for Knowledge Graph Completion | Zur groß angelegten Bewertung von Einbettungsmodellen für die Wissensgraphenvervollständigung | 大规模评价知识图完成情况嵌入模型 2504.08970v2 |
Authors (3): Nasim Shirvani-Mahdavi, Farahnaz Akrami, Chengkai Li
Knowledge graph embedding (KGE) models are extensively studied for knowledge graph completion, yet their evaluation remains constrained by unrealistic benchmarks. Standard evaluation metrics rely on the closed-world assumption, which penalizes models for correctly predicting missing triples, contradicting the fundamental goals of link prediction. These metrics often compress accuracy assessment into a single value, obscuring models’ specific strengths and weaknesses. The prevailing evaluation protocol, link prediction, operates under the unrealistic assumption that an entity’s properties, for which values are to be predicted, are known in advance. While alternative protocols such as property prediction, entity-pair ranking, and triple classification address some of these limitations, they remain underutilized. Moreover, commonly used datasets are either faulty or too small to reflect real-world data. Few studies examine the role of mediator nodes, which are essential for modeling n-ary relationships, or investigate model performance variation across domains. This paper conducts a comprehensive evaluation of four representative KGE models on large-scale datasets FB-CVT-REV and FB+CVT-REV. Our analysis reveals critical insights, including substantial performance variations between small and large datasets, both in relative rankings and absolute metrics, systematic overestimation of model capabilities when n-ary relations are binarized, and fundamental limitations in current evaluation protocols and metrics.
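To illustrate how rank-based metrics compress accuracy into a single value, here is a minimal sketch of mean reciprocal rank (MRR) and Hits@k. The abstract does not name its metrics, so treating these two as the "standard evaluation metrics" is an assumption; the rank values are made up.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1 / rank over all test triples (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples.
ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(ranks)
hits3 = hits_at_k(ranks, 3)
```

Under the closed-world assumption, a true but unrecorded triple that the model ranks above the test answer pushes that answer's rank down, so both numbers fall even though the prediction was correct.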
Article 648
Title@2025-06-10 (2): Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers
Title: Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers | Vortrainierte Sprachmodelle lernen bemerkenswert genaue Darstellungen von Zahlen | 培训前语文模式的精确数字表示 2506.08966v1 |
Authors (5): Marek Kadlčík, Michal Štefánik, Timothee Mickus, Michal Spiegel, Josef Kuchař
Pretrained language models (LMs) are prone to arithmetic errors. Existing work showed limited success in probing numeric values from models’ representations, indicating that these errors can be attributed to the inherent unreliability of distributionally learned embeddings in representing exact quantities. However, we observe that previous probing methods are inadequate for the emergent structure of learned number embeddings with sinusoidal patterns. In response, we propose a novel probing technique that decodes numeric values from input embeddings with near-perfect accuracy across a range of open-source LMs. This proves that after pre-training alone, LMs represent numbers with remarkable precision. Finally, we find that the embeddings’ precision, as judged by our probe’s accuracy, explains a large portion of LMs’ errors in elementary arithmetic, and show that aligning the embeddings with the pattern discovered by our probe can mitigate these errors.
Article 649
Title@2025-06-10 (2): GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO
Title: GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO | GFRIEND: Generatives, nur wenige Schuss Belohnungsableitung durch EfficieNt DPO | GFRIEND: 通过公平竞争和消费者保护局生成的微小的报偿推理 2506.08965v1 |
Authors (3): Yiyang Zhao, Huiyu Bai, Xuejiao Zhao
The ability to train high-performing reward models with few-shot data is critical for enhancing the efficiency and scalability of Reinforcement Learning from Human Feedback (RLHF). We propose a data augmentation and expansion framework that enables generative reward models trained on small datasets to achieve comparable performance to those trained on large-scale datasets. Traditional methods to train a generative reward model, such as Direct Preference Optimization (DPO), are constrained by inefficiencies in sample pairing and limited data diversity. This work introduces preference refinement, which employs Chain-of-Thought (CoT) sampling to uncover diverse and high-quality preference relationships. It also incorporates a perplexity-based scoring mechanism to assign nuanced preference levels and utilizes Multi-level Direct Preference Optimization (M-DPO) to enable the model to capture finer-grained preference differences between samples. Experimental results demonstrate that the proposed method significantly enhances data efficiency and model performance, enabling reward models trained in a few-shot setting to achieve results on par with those trained on large-scale datasets. This study underscores the potential of data-efficient strategies in advancing reward model optimization, offering a robust solution for low-resource RLHF applications.
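Two ingredients named in this abstract have standard closed forms: the DPO loss for one preference pair and the perplexity of a sampled response. The sketch below shows only those; the multi-level weighting of M-DPO is not specified in the abstract and is not reproduced. The function names and the default `beta` are assumptions for illustration.

```python
import math
from typing import Sequence


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # -log(sigmoid(m)) = log(1 + exp(-m)), computed in a numerically stable way.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def perplexity(token_logps: Sequence[float]) -> float:
    """Perplexity of a sequence from its per-token natural-log probabilities."""
    return math.exp(-sum(token_logps) / len(token_logps))


# With no preference margin the loss equals log(2).
neutral_loss = dpo_loss(0.0, 0.0, 0.0, 0.0)

# The policy raises the chosen response and lowers the rejected one relative to the reference.
improved_loss = dpo_loss(-1.0, -3.0, -2.0, -2.0)

# Every token has probability 1/4, so the perplexity is 4.
uniform_ppl = perplexity([math.log(0.25)] * 3)
```

A perplexity-based score of this kind could be used to rank sampled judgments before pairing them, though how the paper maps scores to preference levels is not stated in the abstract.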
Article 650
Title@2025-06-10 (2): Gumbel-max List Sampling for Distribution Coupling with Multiple Samples
Title: Gumbel-max List Sampling for Distribution Coupling with Multiple Samples | Gumbel-max-Listenstichproben für die Verteilungskopplung mit mehreren Proben | 与多个样本相混合的分布式组合的 Gumbel- max 列表列表抽样 2506.05632v2 |
Authors (3): Joseph Rowan, Buu Phan, Ashish Khisti
We study a relaxation of the problem of coupling probability distributions – a list of samples is generated from one distribution and an accept is declared if any one of these samples is identical to the sample generated from the other distribution. We propose a novel method for generating samples, which extends the Gumbel-max sampling suggested in Daliri et al. (arXiv:2408.07978) for coupling probability distributions. We also establish a corresponding lower bound on the acceptance probability, which we call the list matching lemma. We next discuss two applications of our setup. First, we develop a new mechanism for multi-draft speculative sampling that is simple to implement and achieves performance competitive with baselines such as SpecTr and SpecInfer across a range of language tasks. Our method also guarantees a certain degree of drafter invariance with respect to the output tokens which is not supported by existing schemes. We also provide a theoretical lower bound on the token level acceptance probability. As our second application, we consider distributed lossy compression with side information in a setting where a source sample is compressed and available to multiple decoders, each with independent side information. We propose a compression technique that is based on our generalization of Gumbel-max sampling and show that it provides significant gains in experiments involving synthetic Gaussian sources and the MNIST image dataset.
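The single-sample Gumbel-max coupling that this work extends can be sketched as follows: two parties share the same Gumbel noise and each takes the argmax of its own log-probabilities plus that noise. This is a minimal illustration of the basic coupling only; the list-based extension proposed in the paper is not reproduced here, and the function names and example distributions are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def gumbel_noise(n: int, rng: random.Random) -> List[float]:
    """Draw n standard Gumbel variates as -log(-log(U)) with U uniform on (0, 1)."""
    noise = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() never returns 1.0
            u = rng.random()
        noise.append(-math.log(-math.log(u)))
    return noise


def gumbel_max_sample(probs: Sequence[float], noise: Sequence[float]) -> int:
    """Return argmax_i (log p_i + g_i), which is distributed according to probs."""
    def score(i: int) -> float:
        return (math.log(probs[i]) if probs[i] > 0 else -math.inf) + noise[i]
    return max(range(len(probs)), key=score)


def coupled_samples(
    p: Sequence[float], q: Sequence[float], rng: random.Random
) -> Tuple[int, int]:
    """Sample from p and q using the same Gumbel noise so the two draws often coincide."""
    noise = gumbel_noise(len(p), rng)
    return gumbel_max_sample(p, noise), gumbel_max_sample(q, noise)


rng = random.Random(0)
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
pairs = [coupled_samples(p, q, rng) for _ in range(1000)]
match_rate = sum(1 for x, y in pairs if x == y) / len(pairs)
```

In the list relaxation described above, one side would produce several candidates and an accept would be declared if any of them equals the other side's sample, which can only raise the acceptance probability relative to this single-sample case.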
Article 651
Title@2025-06-10 (2): BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly
Title: BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly | BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly | 双形:双体几何大会的学习协作能力 2506.06221v2 |
Authors (9): Yan Shen, Ruihai Wu, Yubin Ke, Xinyuan Song, Zeyi Li, Xiaoqi Li, Hongwei Fan, Haoran Lu, Hao Dong
Shape assembly, the process of combining parts into a complete whole, is a crucial robotic skill with broad real-world applications. Among various assembly tasks, geometric assembly–where broken parts are reassembled into their original form (e.g., reconstructing a shattered bowl)–is particularly challenging. This requires the robot to recognize geometric cues for grasping, assembly, and subsequent bimanual collaborative manipulation on varied fragments. In this paper, we exploit the geometric generalization of point-level affordance, learning affordance aware of bimanual collaboration in geometric assembly with long-horizon action sequences. To address the evaluation ambiguity caused by geometry diversity of broken parts, we introduce a real-world benchmark featuring geometric variety and global reproducibility. Extensive experiments demonstrate the superiority of our approach over both previous affordance-based and imitation-based methods. Project page: https://sites.google.com/view/biassembly/.
Article 652
Title@2025-06-10 (2): Towards Robust Deep Reinforcement Learning against Environmental State Perturbation
Title: Towards Robust Deep Reinforcement Learning against Environmental State Perturbation | Auf dem Weg zu einem robusten, tiefen Stärkungslernen gegen Umweltzustandsstörung | 努力开展强有力的深入强化学习,防止环境状态的干扰 2506.08961v1 |
Authors (2): Chenxu Wang, Huaping Liu
Adversarial attacks and robustness in Deep Reinforcement Learning (DRL) have been widely studied in various threat models; however, few consider environmental state perturbations, which are natural in embodied scenarios. To improve the robustness of DRL agents, we formulate the problem of environmental state perturbation, introducing a preliminary non-targeted attack method as a calibration adversary, and then propose a defense framework, named Boosted Adversarial Training (BAT), which first tunes the agents via supervised learning to avoid catastrophic failure and subsequently adversarially trains the agent with reinforcement learning. Extensive experimental results substantiate the vulnerability of mainstream agents under environmental state perturbations and the effectiveness of our proposed attack. The defense results demonstrate that while existing robust reinforcement learning algorithms may not be suitable, our BAT framework can significantly enhance the robustness of agents against environmental state perturbations across various situations.
Article 653
Title@2025-06-10 (2): MetaTT: A Global Tensor-Train Adapter for Parameter-Efficient Fine-Tuning
Title: MetaTT: A Global Tensor-Train Adapter for Parameter-Efficient Fine-Tuning | MetaTT: Ein globaler Tensor-Train-Adapter für parametereffizientes Feintuning | MetATT: 参数有效精美调试全球Tensor-Train适应器 2506.09105v1 |
Authors (6): Javier Lopez-Piqueres, Pranav Deshpande, Archan Ray, Mattia J. Villani, Marco Pistoia, Niraj Kumar
We present MetaTT, a unified Tensor Train (TT) adapter framework for global low-rank fine-tuning of pre-trained transformers. Unlike LoRA, which fine-tunes each weight matrix independently, MetaTT uses a single shared TT to factorize all transformer sub-modules – query, key, value, projection, and feed-forward layers – by indexing the structural axes like layer and matrix type, and optionally heads and tasks. For a given rank, while LoRA adds parameters proportional to the product across modes, MetaTT only adds parameters proportional to the sum across modes, leading to a significantly compressed final adapter. Our benchmarks compare MetaTT with LoRA along with recent state-of-the-art matrix and tensor decomposition based fine-tuning schemes. We observe that when tested on standard language modeling benchmarks, MetaTT yields the largest reduction in parameters while maintaining similar accuracy to LoRA and even outperforming other tensor-based methods. Unlike CP or other rank-factorizations, the TT ansatz benefits from mature optimization routines – e.g., DMRG-style rank adaptive minimization in addition to Adam, which we find simplifies training. Because new modes can be appended cheaply, MetaTT naturally extends to shared adapters across many tasks without redesigning the core tensor.
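The "product versus sum across modes" claim can be checked with simple arithmetic. The sketch below assumes square d x d weight matrices, a uniform rank r, and a four-core tensor train ordered as (input dimension, layer, matrix type, output dimension); this core layout and the example sizes are assumptions for illustration, not the configuration used in the paper.

```python
def lora_param_count(d: int, r: int, num_layers: int, num_matrix_types: int) -> int:
    """LoRA: one pair of d x r and r x d factors for every adapted matrix."""
    return num_layers * num_matrix_types * 2 * d * r


def shared_tt_param_count(d: int, r: int, num_layers: int, num_matrix_types: int) -> int:
    """Shared TT with cores of shape (d, r), (r, L, r), (r, M, r), (r, d)."""
    boundary_cores = 2 * d * r
    layer_core = num_layers * r * r
    type_core = num_matrix_types * r * r
    return boundary_cores + layer_core + type_core


# Hypothetical sizes: hidden dimension 768, rank 8, 12 layers, 4 matrix types.
lora_params = lora_param_count(d=768, r=8, num_layers=12, num_matrix_types=4)
tt_params = shared_tt_param_count(d=768, r=8, num_layers=12, num_matrix_types=4)
compression_ratio = lora_params / tt_params
```

Under these assumptions, adding another layer costs LoRA a further `num_matrix_types * 2 * d * r` parameters, while the shared TT grows by only `r * r`, which is the sense in which new modes can be appended cheaply.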
Article 654
Title@2025-06-10 (2): IntTrajSim: Trajectory Prediction for Simulating Multi-Vehicle driving at Signalized Intersections
Title: IntTrajSim: Trajectory Prediction for Simulating Multi-Vehicle driving at Signalized Intersections | IntTrajSim: Trajektorienvorhersage für das Simulieren von Multi-Fahrzeug-Fahren bei Signalisierten Schnitten | IntrajSim: 信号交界处模拟多车辆驾驶轨迹预测 2506.08957v1 |
Authors (4): Yash Ranjan, Rahul Sengupta, Anand Rangarajan, Sanjay Ranka
Traffic simulators are widely used to study the operational efficiency of road infrastructure, but their rule-based approach limits their ability to mimic real-world driving behavior. Traffic intersections are critical components of the road infrastructure, in terms of both safety risk (nearly 28% of fatal crashes and 58% of nonfatal crashes happen at intersections) and the operational efficiency of a road corridor. This raises an important question: can we create a data-driven simulator that can mimic the macro- and micro-statistics of the driving behavior at a traffic intersection? Deep Generative Modeling-based trajectory prediction models provide a good starting point to model the complex dynamics of vehicles at an intersection. However, they are not tested in a “live” micro-simulation scenario and are not evaluated on traffic engineering-related metrics. In this study, we propose traffic engineering-related metrics to evaluate generative trajectory prediction models and provide a simulation-in-the-loop pipeline to do so. We also provide a multi-headed self-attention-based trajectory prediction model that incorporates the signal information, which outperforms our previous models on the evaluation metrics.
Article 655
Title@2025-06-10 (2): Unifying Block-wise PTQ and Distillation-based QAT for Progressive Quantization toward 2-bit Instruction-Tuned LLMs
Title: Unifying Block-wise PTQ and Distillation-based QAT for Progressive Quantization toward 2-bit Instruction-Tuned LLMs | Vereinheitlichen Blockweise PTQ und Destillationsbasiertes QAT für die progressive Quantisierung in Richtung 2-Bit Instruction-Tuned LLMs | 用于逐步量化至2位制导二位制导LMM 的QAT 统一块式 PTQ 和基于蒸馏的QAT 统一PTQ 和 蒸馏基QAT 2506.09104v1 |
Authors (5): Jung Hyun Lee, Seungjae Shin, Vinnam Kim, Jaeseong You, An Chen
As the rapid scaling of large language models (LLMs) poses significant challenges for deployment on resource-constrained devices, there is growing interest in extremely low-bit quantization, such as 2-bit. Although prior works have shown that 2-bit large models are Pareto-optimal over their 4-bit smaller counterparts in both accuracy and latency, these advancements have been limited to pre-trained LLMs and have not yet been extended to instruction-tuned models. To bridge this gap, we propose Unified Progressive Quantization (UPQ), a novel progressive quantization framework (FP16$\rightarrow$INT4$\rightarrow$INT2) that unifies block-wise post-training quantization (PTQ) with distillation-based quantization-aware training (Distill-QAT) for INT2 instruction-tuned LLM quantization. UPQ first quantizes FP16 instruction-tuned models to INT4 using block-wise PTQ to significantly reduce the quantization error introduced by subsequent INT2 quantization. Next, UPQ applies Distill-QAT to enable INT2 instruction-tuned LLMs to generate responses consistent with their original FP16 counterparts by minimizing the generalized Jensen-Shannon divergence (JSD) between the two. To the best of our knowledge, we are the first to demonstrate that UPQ can quantize open-source instruction-tuned LLMs to INT2 without relying on proprietary post-training data, while achieving state-of-the-art performance on MMLU and IFEval, two of the most representative benchmarks for evaluating instruction-tuned LLMs.
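The distillation objective mentioned above is a generalized Jensen-Shannon divergence between the FP16 teacher's and the INT2 student's output distributions. The sketch below uses one common definition, a beta-weighted mixture form; whether the paper uses exactly this weighting is an assumption, and the example distributions are made up.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def generalized_jsd(p: Sequence[float], q: Sequence[float], beta: float = 0.5) -> float:
    """beta * KL(p || m) + (1 - beta) * KL(q || m), with mixture m = beta * p + (1 - beta) * q."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    mixture = [beta * pi + (1.0 - beta) * qi for pi, qi in zip(p, q)]
    return beta * kl_divergence(p, mixture) + (1.0 - beta) * kl_divergence(q, mixture)


# Hypothetical next-token distributions from the teacher and the quantized student.
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]
divergence = generalized_jsd(teacher, student, beta=0.5)
```

Unlike plain KL, this quantity stays finite when one distribution assigns zero probability to a token the other supports, which can make it a more forgiving training signal for a heavily quantized student.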
Article 656
Title@2025-06-10 (2): Data Augmentation For Small Object using Fast AutoAugment
Title: Data Augmentation For Small Object using Fast AutoAugment | Datenvergrößerung für kleine Objekte mit schneller AutoAugment | 使用快速自动增强的小型对象数据增强 2506.08956v1 |
Authors (4): DaeEun Yoon, Semin Kim, SangWook Yoo, Jongha Lee
In recent years, there has been tremendous progress in object detection performance. However, despite these advances, the detection performance for small objects is significantly inferior to that of large objects. Detecting small objects is one of the most challenging and important problems in computer vision. To improve the detection performance for small objects, we propose an optimal data augmentation method using Fast AutoAugment. Through our proposed method, we can quickly find optimal augmentation policies that can overcome degradation when detecting small objects, and we achieve a 20% performance improvement on the DOTA dataset.
Article 657
Title@2025-06-10 (2): Segment Concealed Objects with Incomplete Supervision
Title: Segment Concealed Objects with Incomplete Supervision | Segmentverdeckte Objekte mit unvollständiger Überwachung | 带不完全监督的隐藏物体 2506.08955v1 |
Authors (11): Chunming He, Kai Li, Yachao Zhang, Ziyun Yang, Youwei Pang, Longxiang Tang, Chengyu Fang, Yulun Zhang, Linghe Kong, Xiu Li, Sina Farsiu
Incompletely-Supervised Concealed Object Segmentation (ISCOS) involves segmenting objects that seamlessly blend into their surrounding environments, utilizing incompletely annotated data, such as weak and semi-annotations, for model training. This task remains highly challenging due to (1) the limited supervision provided by the incompletely annotated training data, and (2) the difficulty of distinguishing concealed objects from the background, which arises from the intrinsic similarities in concealed scenarios. In this paper, we introduce the first unified method for ISCOS to address these challenges. To tackle the issue of incomplete supervision, we propose a unified mean-teacher framework, SEE, that leverages the vision foundation model, Segment Anything Model (SAM), to generate pseudo-labels using coarse masks produced by the teacher model as prompts. To mitigate the effect of low-quality segmentation masks, we introduce a series of strategies for pseudo-label generation, storage, and supervision. These strategies aim to produce informative pseudo-labels, store the best pseudo-labels generated, and select the most reliable components to guide the student model, thereby ensuring robust network training. Additionally, to tackle the issue of intrinsic similarity, we design a hybrid-granularity feature grouping module that groups features at different granularities and aggregates these results. By clustering similar features, this module promotes segmentation coherence, facilitating more complete segmentation for both single-object and multiple-object images. We validate the effectiveness of our approach across multiple ISCOS tasks, and experimental results demonstrate that our method achieves state-of-the-art performance. Furthermore, SEE can serve as a plug-and-play solution, enhancing the performance of existing models.
Article 658
Title@2025-06-10 (2): Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction
Title: Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction | Protriever: End-to-End-Unterscheidbare Proteinhomologie Suche nach Fitness-Vorhersage | 远期:最终到最终差异可辨别的蛋白质同理学研究,以预测适合性。 2506.08954v1 |
Authors (7): Ruben Weitzman, Peter Mørch Groth, Lood Van Niekerk, Aoi Otani, Yarin Gal, Debora Marks, Pascal Notin
Retrieving homologous protein sequences is essential for a broad range of protein modeling tasks such as fitness prediction, protein design, structure modeling, and protein-protein interactions. Traditional workflows have relied on a two-step process: first retrieving homologs via Multiple Sequence Alignments (MSA), then training models on one or more of these alignments. However, MSA-based retrieval is computationally expensive, struggles with highly divergent sequences or complex insertion and deletion patterns, and operates independently of the downstream modeling objective. We introduce Protriever, an end-to-end differentiable framework that learns to retrieve relevant homologs while simultaneously training for the target task. When applied to protein fitness prediction, Protriever achieves state-of-the-art performance compared to sequence-based models that rely on MSA-based homolog retrieval, while being two orders of magnitude faster through efficient vector search. Protriever is both architecture- and task-agnostic, and can flexibly adapt to different retrieval strategies and protein databases at inference time – offering a scalable alternative to alignment-centric approaches.
Article 659
Title@2025-06-10 (2): Multi-SpaCE: Multi-Objective Subsequence-based Sparse Counterfactual Explanations for Multivariate Time Series Classification
Title: Multi-SpaCE: Multi-Objective Subsequence-based Sparse Counterfactual Explanations for Multivariate Time Series Classification | Multi-SpaCE: Multi-Objective Subsequence-based Sparse Counterfactual Erklärungen für multivariate Zeitreihenklassifikation | 多变量时间序列分类的多目标子序列、基于多目标子序列的简单反事实解释 2501.04009v2 |
Authors (2): Mario Refoyo, David Luengo
Deep Learning systems excel in complex tasks but often lack transparency, limiting their use in critical applications. Counterfactual explanations, a core tool within eXplainable Artificial Intelligence (XAI), offer insights into model decisions by identifying minimal changes to an input to alter its predicted outcome. However, existing methods for time series data are limited by univariate assumptions, rigid constraints on modifications, or lack of validity guarantees. This paper introduces Multi-SpaCE, a multi-objective counterfactual explanation method for multivariate time series. Using the Non-dominated Sorting Genetic Algorithm II (NSGA-II), Multi-SpaCE balances proximity, sparsity, plausibility, and contiguity. Unlike most methods, it ensures perfect validity, supports multivariate data and provides a Pareto front of solutions, allowing adaptation to different end-user needs. Comprehensive experiments in diverse datasets demonstrate the ability of Multi-SpaCE to consistently achieve perfect validity and deliver superior performance compared to existing methods.
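The Pareto front mentioned here is the set of candidate explanations that no other candidate beats on every objective. The following is a minimal sketch of that filtering step only, assuming all objectives are minimized; it is not NSGA-II itself, and the two objectives and candidate scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical counterfactuals scored as (distance to the original, number of changed points).
candidates = [(1.0, 5), (2.0, 3), (3.0, 1), (2.5, 4), (3.5, 2)]
front = pareto_front(candidates)
```

Returning the whole front instead of one weighted optimum lets an end user choose between, for example, a very sparse counterfactual and one that stays closer to the original series.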
Article 660
Title@2025-06-10 (2): Epistemic Uncertainty in Conformal Scores: A Unified Approach
Title: Epistemic Uncertainty in Conformal Scores: A Unified Approach | Epistemische Ungewissheit in konformen Partituren: Ein einheitlicher Ansatz | 非正式分数中的不确定性:统一办法 2502.06995v2 |
Authors (4): Luben M. C. Cabezas, Vagner S. Santos, Thiago R. Ramos, Rafael Izbicki
Conformal prediction methods create prediction bands with distribution-free guarantees but do not explicitly capture epistemic uncertainty, which can lead to overconfident predictions in data-sparse regions. Although recent conformal scores have been developed to address this limitation, they are typically designed for specific tasks, such as regression or quantile regression. Moreover, they rely on particular modeling choices for epistemic uncertainty, restricting their applicability. We introduce $\texttt{EPICSCORE}$, a model-agnostic approach that enhances any conformal score by explicitly integrating epistemic uncertainty. Leveraging Bayesian techniques such as Gaussian Processes, Monte Carlo Dropout, or Bayesian Additive Regression Trees, $\texttt{EPICSCORE}$ adaptively expands predictive intervals in regions with limited data while maintaining compact intervals where data is abundant. As with any conformal method, it preserves finite-sample marginal coverage. It also achieves asymptotic conditional coverage. Experiments demonstrate its good performance compared to existing methods. Designed for compatibility with any Bayesian model, but equipped with distribution-free guarantees, $\texttt{EPICSCORE}$ provides a general-purpose framework for uncertainty quantification in prediction problems.
Article 661
Title@2025-06-10 (2): Implicit Neural Representations for Chemical Reaction Paths
Title: Implicit Neural Representations for Chemical Reaction Paths | Implizite neurale Darstellungen für chemische Reaktionswege | 化学反应路径的隐性神经图示 2502.15843v2 |
Authors (5): Kalyan Ramakrishnan, Lars L. Schaaf, Chen Lin, Guangrun Wang, Philip Torr
We show that neural networks can be optimized to represent minimum energy paths as continuous functions, offering a flexible alternative to discrete path-search methods like Nudged Elastic Band (NEB). Our approach parameterizes reaction paths with a network trained on a loss function that discards tangential energy gradients and enables instant estimation of the transition state. We first validate the method on two-dimensional potentials and then demonstrate its advantages over NEB on challenging atomistic systems where (i) poor initial guesses yield unphysical paths, (ii) multiple competing paths exist, or (iii) the reaction follows a complex multi-step mechanism. Results highlight the versatility of the method: for instance, a simple adjustment to the sampling strategy during optimization can help escape local-minimum solutions. Finally, in a low-dimensional setting, we demonstrate that a single neural network can learn from existing paths and generalize to unseen systems, showing promise for a universal reaction path representation.
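Discarding the tangential part of the energy gradient amounts to projecting the gradient onto the subspace perpendicular to the path tangent, as in NEB. The sketch below shows that projection for a single point on a path; it is an illustration of the projection step under that assumption, not the paper's loss function, and the example vectors are made up.

```python
import math
from typing import List, Sequence


def perpendicular_gradient(grad: Sequence[float], tangent: Sequence[float]) -> List[float]:
    """Remove the component of grad along the path tangent: g - (g . t_hat) t_hat."""
    norm = math.sqrt(sum(t * t for t in tangent))
    if norm == 0.0:
        raise ValueError("tangent must be non-zero")
    unit = [t / norm for t in tangent]
    along = sum(g * u for g, u in zip(grad, unit))
    return [g - along * u for g, u in zip(grad, unit)]


# Hypothetical energy gradient and path tangent at one point of a 2D path.
energy_gradient = [3.0, 4.0]
path_tangent = [2.0, 0.0]
perp = perpendicular_gradient(energy_gradient, path_tangent)
```

On a minimum energy path the perpendicular component vanishes everywhere, so driving its norm toward zero along the whole parameterized path is one natural training signal for a network that outputs path coordinates.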
Article 662
Title@2025-06-10 (2): KARMA: A Multilevel Decomposition Hybrid Mamba Framework for Multivariate Long-Term Time Series Forecasting
Title: KARMA: A Multilevel Decomposition Hybrid Mamba Framework for Multivariate Long-Term Time Series Forecasting | KARMA: Ein Multilevel-Dekompositions-Hybrid-Mamba-Framework für multivariate Langzeitreihenprognosen | KARMA: 多变长期时间序列预测多层次分解混合 Mamba 框架 2506.08939v1 |
Authors (7): Hang Ye, Gaoxiang Duan, Haoran Zeng, Yangxin Zhu, Lingxue Meng, Xiaoying Zheng, Yongxin Zhu
Multivariate long-term and efficient time series forecasting is a key requirement for a variety of practical applications, and there are complex interleaving time dynamics in time series data that require decomposition modeling. Traditional time series decomposition methods apply a single decomposition and rely on fixed rules, which are insufficient for mining the potential information of the series and adapting to the dynamic characteristics of complex series. On the other hand, the Transformer-based models for time series forecasting struggle to effectively model long sequences and intricate dynamic relationships due to their high computational complexity. To overcome these limitations, we introduce KARMA, with an Adaptive Time Channel Decomposition module (ATCD) to dynamically extract trend and seasonal components. It further integrates a Hybrid Frequency-Time Decomposition module (HFTD) to decompose the series into frequency-domain and time-domain components. These components are coupled with a multi-scale Mamba-based KarmaBlock to efficiently process global and local information in a coordinated manner. Experiments on eight real-world datasets from diverse domains demonstrate that KARMA significantly outperforms mainstream baseline methods in both predictive accuracy and computational efficiency. Code and full results are available at this repository: https://github.com/yedadasd/KARMA
Article 663
Title@2025-06-10 (2): BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models
Title: BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models | BioLangFusion: Multimodale Fusion von DNA, mRNA und Protein Language Models | BioLangFusion:DNA、MRNA和蛋白素语言模型的多式联运 2506.08936v1 |
Authors (6): Amina Mollaysa, Artem Moskale, Pushpak Pati, Tommaso Mansi, Mangal Prakash, Rui Liao
We present BioLangFusion, a simple approach for integrating pre-trained DNA, mRNA, and protein language models into unified molecular representations. Motivated by the central dogma of molecular biology (information flow from gene to transcript to protein), we align per-modality embeddings at the biologically meaningful codon level (three nucleotides encoding one amino acid) to ensure direct cross-modal correspondence. BioLangFusion studies three standard fusion techniques: (i) codon-level embedding concatenation, (ii) entropy-regularized attention pooling inspired by multiple-instance learning, and (iii) cross-modal multi-head attention – each technique providing a different inductive bias for combining modality-specific signals. These methods require no additional pre-training or modification of the base models, allowing straightforward integration with existing sequence-based foundation models. Across five molecular property prediction tasks, BioLangFusion outperforms strong unimodal baselines, showing that even simple fusion of pre-trained models can capture complementary multi-omic information with minimal overhead.
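Codon-level alignment followed by concatenation, technique (i) above, can be sketched as follows. Mean pooling over each nucleotide triple is an assumption made for illustration (the abstract does not state the pooling operator), and all function names and toy embeddings are hypothetical.

```python
def pool_to_codons(nucleotide_embeddings):
    """Average every three consecutive nucleotide vectors into one codon vector."""
    if len(nucleotide_embeddings) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = []
    for start in range(0, len(nucleotide_embeddings), 3):
        triple = nucleotide_embeddings[start:start + 3]
        dim = len(triple[0])
        codons.append([sum(vec[d] for vec in triple) / 3 for d in range(dim)])
    return codons


def concat_fusion(codon_embeddings, protein_embeddings):
    """Concatenate per-codon and per-amino-acid vectors position by position."""
    if len(codon_embeddings) != len(protein_embeddings):
        raise ValueError("modalities must have one vector per codon")
    return [c + p for c, p in zip(codon_embeddings, protein_embeddings)]


# Six nucleotides (two codons) with 2-d embeddings, two amino acids with 1-d embeddings.
dna = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [0.0, 3.0], [0.0, 6.0], [0.0, 9.0]]
protein = [[0.5], [0.25]]
fused = concat_fusion(pool_to_codons(dna), protein)
```

After pooling, every modality has one vector per codon, so position `i` refers to the same biological unit in each.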
Article 664
Title@2025-06-10 (2): Can A Gamer Train A Mathematical Reasoning Model?
Title: Can A Gamer Train A Mathematical Reasoning Model? | Kann ein Gamer ein mathematisches Reasoning-Modell trainieren? | 赌徒能训练数学理论模型吗? 2506.08935v1 |
Authors (1): Andrew Shin
While large language models (LLMs) have achieved remarkable performance in various tasks including mathematical reasoning, their development typically demands prohibitive computational resources. Recent advancements have reduced costs for training capable models, yet even these approaches rely on high-end hardware clusters. In this paper, we demonstrate that a single average gaming GPU can train a solid mathematical reasoning model by integrating reinforcement learning and memory optimization techniques. Specifically, we train a 1.5B-parameter mathematical reasoning model on an RTX 3080 Ti with 16GB of memory; in resource-constrained environments, it achieves comparable or better performance on mathematical reasoning benchmarks than models several times larger. Our results challenge the paradigm that state-of-the-art mathematical reasoning necessitates massive infrastructure, democratizing access to high-performance AI research. https://github.com/shinandrew/YouronMath
Article 665
Title@2025-06-10 (2): Local MDI+: Local Feature Importances for Tree-Based Models
Title: Local MDI+: Local Feature Importances for Tree-Based Models | Lokale MDI+: lokale Feature-Wichtigkeiten für baumbasierte Modelle | 本地 MDI+: 树基模型的本地特点重要性 2506.08928v1 |
Authors (5): Zhongyuan Liang, Zachary T. Rewolinski, Abhineet Agarwal, Tiffany M. Tang, Bin Yu
Tree-based ensembles such as random forests remain the go-to for tabular data over deep learning models due to their prediction performance and computational efficiency. These advantages have led to their widespread deployment in high-stakes domains, where interpretability is essential for ensuring trustworthy predictions. This has motivated the development of popular local (i.e., sample-specific) feature importance (LFI) methods such as LIME and TreeSHAP. However, these approaches rely on approximations that ignore the model’s internal structure and instead depend on potentially unstable perturbations. These issues are addressed in the global setting by MDI+, a feature importance method that exploits an equivalence between decision trees and linear models on a transformed node basis. However, the global MDI+ scores cannot explain individual predictions when samples have heterogeneous characteristics. To address this gap, we propose Local MDI+ (LMDI+), a novel extension of the MDI+ framework to the sample-specific setting. LMDI+ outperforms the existing baselines LIME and TreeSHAP in identifying instance-specific signal features, averaging a 10% improvement in downstream task performance across twelve real-world benchmark datasets. It further demonstrates greater stability by consistently producing similar instance-level feature importance rankings across multiple random forest fits. Finally, LMDI+ enables local interpretability use cases, including the identification of closer counterfactuals and the discovery of homogeneous subgroups.
Article 666
Title@2025-06-10 (2): Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap
Title: Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap | Multimodale Unüberwachte Domain-Verallgemeinerung durch Herausholen über die Modalitätslücke | 通过跨越模式差距的检索实现多式多式不受监督域化 2402.04416v3 |
Authors (4): Christopher Liao, Christian So, Theodoros Tsiligkaridis, Brian Kulis
Domain generalization (DG) is the important problem of learning a model that generalizes to unseen test domains by leveraging one or more source domains, under the assumption of shared label spaces. However, most DG methods assume access to abundant source data in the target label space, a requirement that proves overly stringent for numerous real-world applications, where acquiring source data in the same label space as the target task is prohibitively expensive. For this setting, we tackle the multimodal version of the unsupervised domain generalization (MUDG) problem, which uses a large task-agnostic unlabeled source dataset during finetuning. Our framework does not explicitly assume any relationship between the source dataset and target task. Instead, it relies only on the premise that the source dataset can be accurately and efficiently searched in a joint vision-language space. We make three contributions in the MUDG setting. Firstly, we show theoretically that cross-modal approximate nearest neighbor search suffers from low recall due to the large distance between text queries and the image centroids used for coarse quantization. Accordingly, we propose paired k-means, a simple clustering algorithm that improves nearest neighbor recall by storing centroids in query space instead of image space. Secondly, we propose an adaptive text augmentation scheme for target labels designed to improve zero-shot accuracy and diversify retrieved image data. Lastly, we present two simple but effective components to further improve downstream target accuracy. We compare against state-of-the-art name-only transfer, source-free DG and zero-shot (ZS) methods on their respective benchmarks and show consistent improvement in accuracy on 20 diverse datasets. Code is available: https://github.com/Chris210634/mudg
Article 667
Title@2025-06-10 (2): PropMEND: Hypernetworks for Knowledge Propagation in LLMs
Title: PropMEND: Hypernetworks for Knowledge Propagation in LLMs | PropMEND: Hypernetworks für die Wissensverbreitung in LLMs | PropMEND: 以LLMMs方式传播知识的超网络 2506.08920v1 |
Authors (3): Zeyu Leo Liu, Greg Durrett, Eunsol Choi
Knowledge editing techniques for large language models (LLMs) can inject knowledge that is later reproducible verbatim, but they fall short on propagating that knowledge: models cannot answer questions that require reasoning with the injected knowledge. We present a hypernetwork-based approach for knowledge propagation, named PropMEND, where we meta-learn how to modify gradients of a language modeling loss to encourage injected information to propagate. Our approach extends the meta-objective of MEND [29] so that gradient updates on knowledge are transformed to enable answering multi-hop questions involving that knowledge. We show improved performance on the RippleEdit dataset, achieving almost 2x the accuracy on challenging multi-hop questions whose answers are not explicitly stated in the injected fact. We further introduce a new dataset, Controlled RippleEdit, to evaluate the generalization of our hypernetwork, testing knowledge propagation along relations and entities unseen during hypernetwork training. PropMEND still outperforms existing approaches on unseen entity-relation pairs, yet the performance gap decreases substantially, suggesting future work on propagating knowledge to a wide range of relations.
Article 668
Title@2025-06-10 (2): Provably Accurate Adaptive Sampling for Collocation Points in Physics-informed Neural Networks
Title: Provably Accurate Adaptive Sampling for Collocation Points in Physics-informed Neural Networks | Wahrscheinlich genaue adaptive Probenahme für Kollokationspunkte in physikinformierten Neuronalen Netzwerken | 物理知情神经网络定位点的准确适应性抽样 2504.00910v2 |
Authors (5): Antoine Caradot, Rémi Emonet, Amaury Habrard, Abdel-Rahim Mezidi, Marc Sebban
Despite considerable scientific advances in numerical simulation, efficiently solving PDEs remains a complex and often expensive problem. Physics-informed Neural Networks (PINNs) have emerged as an efficient way to learn surrogate solvers by embedding the PDE in the loss function and minimizing its residuals using automatic differentiation at so-called collocation points. These points were originally sampled uniformly, but their choice has been the subject of recent advances leading to adaptive sampling refinements for PINNs. In this paper, leveraging a new quadrature method for approximating definite integrals, we introduce a provably accurate sampling method for collocation points based on the Hessian of the PDE residuals. Comparative experiments conducted on a set of 1D and 2D PDEs demonstrate the benefits of our method.
Article 669
Title@2025-06-10 (2): Enhancing generalizability of model discovery across parameter space with multi-experiment equation learning (ME-EQL)
Title: Enhancing generalizability of model discovery across parameter space with multi-experiment equation learning (ME-EQL) | Verbesserung der Verallgemeinerbarkeit der Modellentdeckung über den Parameterraum mit Multi-Experiment-Gleichung Lernen (ME-EQL) | 通过多实验方程学习(ME-EQL)加强跨参数空间模型发现的一般性 2506.08916v1 |
Authors (6): Maria-Veronica Ciocanel, John T. Nardini, Kevin B. Flores, Erica M. Rutter, Suzanne S. Sindi, Alexandria Volkening
Agent-based modeling (ABM) is a powerful tool for understanding self-organizing biological systems, but it is computationally intensive and often not analytically tractable. Equation learning (EQL) methods can derive continuum models from ABM data, but they typically require extensive simulations for each parameter set, raising concerns about generalizability. In this work, we extend EQL to Multi-experiment equation learning (ME-EQL) by introducing two methods: one-at-a-time ME-EQL (OAT ME-EQL), which learns individual models for each parameter set and connects them via interpolation, and embedded structure ME-EQL (ES ME-EQL), which builds a unified model library across parameters. We demonstrate these methods using a birth–death mean-field model and an on-lattice agent-based model of birth, death, and migration with spatial structure. Our results show that both methods significantly reduce the relative error in recovering parameters from agent-based simulations, with OAT ME-EQL offering better generalizability across parameter space. Our findings highlight the potential of equation learning from multiple experiments to enhance the generalizability and interpretability of learned models for complex biological systems.
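The one-at-a-time variant, which learns a model per parameter set and connects them via interpolation, can be sketched for a single learned coefficient. Linear interpolation is an assumption (the abstract only says "interpolation"), and the function name and toy values are hypothetical.

```python
def interpolate_coefficient(param_values, coefficients, query):
    """Linearly interpolate a learned coefficient between sampled parameter values."""
    pairs = sorted(zip(param_values, coefficients))
    if len(pairs) < 2:
        raise ValueError("need at least two sampled parameter values")
    if not pairs[0][0] <= query <= pairs[-1][0]:
        raise ValueError("query lies outside the sampled parameter range")
    for (p0, c0), (p1, c1) in zip(pairs, pairs[1:]):
        if p0 <= query <= p1:
            weight = (query - p0) / (p1 - p0)
            return c0 + weight * (c1 - c0)


# Hypothetical: one equation coefficient learned separately at three parameter values.
sampled_params = [0.1, 0.5, 1.0]
learned_coeffs = [0.2, 1.0, 2.0]
estimate = interpolate_coefficient(sampled_params, learned_coeffs, query=0.75)
```

The sketch assumes distinct parameter values and refuses to extrapolate beyond the sampled range.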
Article 670
Title@2025-06-10 (2): Federated Generalised Variational Inference: A Robust Probabilistic Federated Learning Framework
Title: Federated Generalised Variational Inference: A Robust Probabilistic Federated Learning Framework | Federated Generalised Variational Inference: Ein robustes probabilistisches Federated Learning Framework | 联邦通用多变性推断:强力概率联邦学习框架 2502.00846v3 |
Authors (4): Terje Mildner, Oliver Hamelijnck, Paris Giampouras, Theodoros Damoulas
We introduce FedGVI, a probabilistic Federated Learning (FL) framework that is robust to both prior and likelihood misspecification. FedGVI addresses limitations in both frequentist and Bayesian FL by providing unbiased predictions under model misspecification, with calibrated uncertainty quantification. Our approach generalises previous FL approaches, specifically Partitioned Variational Inference (Ashman et al., 2022), by allowing robust and conjugate updates, decreasing computational complexity at the clients. We offer theoretical analysis in terms of fixed-point convergence, optimality of the cavity distribution, and provable robustness to likelihood misspecification. Further, we empirically demonstrate the effectiveness of FedGVI in terms of improved robustness and predictive performance on multiple synthetic and real world classification data sets.
Article 671
Title@2025-06-10 (2): Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU
Title: Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU | Implementierung von Keyword-Spoting auf dem MCUX947 Microcontroller mit integrierter NPU | 执行关于MCUX947微型控制器与集成 NPU 2506.08911v1 |
Authors (2): Petar Jakuš, Hrvoje Džapo
This paper presents a keyword spotting (KWS) system implemented on the NXP MCXN947 microcontroller with an integrated Neural Processing Unit (NPU), enabling real-time voice interaction on resource-constrained devices. The system combines MFCC feature extraction with a CNN classifier, optimized using Quantization Aware Training to reduce model size with minimal accuracy drop. Experimental results demonstrate a 59x speedup in inference time when leveraging the NPU compared to CPU-only execution, achieving 97.06% accuracy with a model size of 30.58 KB, demonstrating the feasibility of efficient, low-power voice interfaces on embedded platforms.
Article 672
Title@2025-06-10 (2): BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning
Title: BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning | BEAST: Effiziente Tokenisierung von B-Splines kodierten Aktionssequenzen für das Imitationslernen | BEAST: 模拟学习的B-S级编码行动序列的有效招接 2506.06072v2 |
Authors (14): Hongyi Zhou, Weiran Liao, Xi Huang, Yucheng Tang, Fabian Otto, Xiaogang Jia, Xinkai Jiang, Simon Hilber, Ge Li, Qian Wang, Ömer Erdinç Yağmurlu, Nils Blank, Moritz Reuss, Rudolf Lioutikov
We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact discrete or continuous tokens using B-splines. In contrast to existing action tokenizers based on vector quantization or byte pair encoding, BEAST requires no separate tokenizer training and consistently produces tokens of uniform length, enabling fast action sequence generation via parallel decoding. Thanks to its B-spline formulation, BEAST inherently generates smooth trajectories without discontinuities between adjacent segments. We extensively evaluate BEAST by integrating it with three distinct model architectures: a Variational Autoencoder (VAE) with continuous tokens, a decoder-only Transformer with discrete tokens, and Florence-2, a pretrained Vision-Language Model with an encoder-decoder architecture, demonstrating BEAST’s compatibility and scalability with large pretrained models. We evaluate BEAST across three established benchmarks consisting of 166 simulated tasks and on three distinct robot settings with a total of 8 real-world tasks. Experimental results demonstrate that BEAST (i) significantly reduces both training and inference computational costs, (ii) consistently generates smooth, high-frequency control signals suitable for continuous control tasks, and (iii) reliably achieves competitive task success rates compared to state-of-the-art methods.
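How a fixed number of B-spline control points can stand in for an action sequence of any length is sketched below for one action dimension. Only decoding is shown; the clamped uniform knot vector, the function names, and the toy control points are assumptions, not details taken from BEAST.

```python
def bspline_basis(i, degree, knots, t):
    """Cox-de Boor recursion for the i-th B-spline basis function."""
    if degree == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    denom = knots[i + degree] - knots[i]
    if denom > 0:
        left = (t - knots[i]) / denom * bspline_basis(i, degree - 1, knots, t)
    right = 0.0
    denom = knots[i + degree + 1] - knots[i + 1]
    if denom > 0:
        right = (knots[i + degree + 1] - t) / denom * bspline_basis(i + 1, degree - 1, knots, t)
    return left + right


def decode_actions(control_points, degree, num_steps):
    """Evaluate a clamped uniform B-spline at num_steps times in [0, 1)."""
    n = len(control_points)
    if n <= degree:
        raise ValueError("need more control points than the spline degree")
    interior = [k / (n - degree) for k in range(1, n - degree)]
    knots = [0.0] * (degree + 1) + interior + [1.0] * (degree + 1)
    actions = []
    for step in range(num_steps):
        t = step / num_steps  # stay inside the half-open knot span
        actions.append(sum(c * bspline_basis(i, degree, knots, t)
                           for i, c in enumerate(control_points)))
    return actions


# Four control points ("tokens") expand into twenty smooth action values.
control_points = [0.0, 1.0, -1.0, 0.5]
actions = decode_actions(control_points, degree=2, num_steps=20)
```

The number of control points stays the same however many steps are decoded, which is what makes the token length uniform in this sketch.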
Article 673
Title@2025-06-10 (2): Intention-Conditioned Flow Occupancy Models
Title: Intention-Conditioned Flow Occupancy Models | Intentionskonditionierte Flow-Belegungsmodelle | 有意有条件流动占用模式 2506.08902v1 |
Authors (4): Chongyi Zheng, Seohong Park, Sergey Levine, Benjamin Eysenbach
Large-scale pre-training has fundamentally changed how machine learning research is done today: large foundation models are trained once, and then can be used by anyone in the community (including those without data or compute resources to train a model from scratch) to adapt and fine-tune to specific tasks. Applying this same framework to reinforcement learning (RL) is appealing because it offers compelling avenues for addressing core challenges in RL, including sample efficiency and robustness. However, there remains a fundamental challenge in pre-training large models in the context of RL: actions have long-term dependencies, so training a foundation model that reasons across time is important. Recent advances in generative AI have provided new tools for modeling highly complex distributions. In this paper, we build a probabilistic model to predict which states an agent will visit in the temporally distant future (i.e., an occupancy measure) using flow matching. As large datasets are often constructed by many distinct users performing distinct tasks, we include in our model a latent variable capturing the user intention. This intention increases the expressivity of our model, and enables adaptation with generalized policy improvement. We call our proposed method intention-conditioned flow occupancy models (InFOM). Compared with alternative methods for pre-training, the proposed method achieves $1.8 \times$ median improvement in returns and increases success rates by $36\%$ in our experiments on $36$ state-based and $4$ image-based benchmark tasks. Website: https://chongyi-zheng.github.io/infom Code: https://github.com/chongyi-zheng/infom
Article 674
Title@2025-06-10 (2): Feature Shift Localization Network
Title: Feature Shift Localization Network | Feature Shift Localization Network | 本地化网络 2506.09101v1 |
Authors (4): Míriam Barrabés, Daniel Mas Montserrat, Kapal Dev, Alexander G. Ioannidis
Feature shifts between data sources are present in many applications involving healthcare, biomedical, socioeconomic, financial, survey, and multi-sensor data, among others, where unharmonized heterogeneous data sources, noisy data measurements, or inconsistent processing and standardization pipelines can lead to erroneous features. Localizing shifted features is important to address the underlying cause of the shift and correct or filter the data to avoid degrading downstream analysis. While many techniques can detect distribution shifts, localizing the features that cause them is still challenging, with current solutions being either inaccurate or not scalable to large and high-dimensional datasets. In this work, we introduce the Feature Shift Localization Network (FSL-Net), a neural network that can localize feature shifts in large and high-dimensional datasets in a fast and accurate manner. The network, trained with a large number of datasets, learns to extract the statistical properties of the datasets and can localize feature shifts from previously unseen datasets and shifts without the need for re-training. The code and ready-to-use trained model are available at https://github.com/AI-sandbox/FSL-Net.
Article 675
Title@2025-06-10 (2): Generalization analysis of an unfolding network for analysis-based Compressed Sensing
Title: Generalization analysis of an unfolding network for analysis-based Compressed Sensing | Generalisierungsanalyse eines sich entfaltenden Netzwerks für analysebasiertes Komprimiertes Sensing | 对正在展开的基于分析的压缩遥感压缩分析网络的概括化分析 2303.05582v3 |
Authors (2): Vicky Kouni, Yannis Panagakis
Unfolding networks have shown promising results in the Compressed Sensing (CS) field. Yet, the investigation of their generalization ability is still in its infancy. In this paper, we perform a generalization analysis of a state-of-the-art ADMM-based unfolding network, which jointly learns a decoder for CS and a sparsifying redundant analysis operator. To this end, we first impose a structural constraint on the learnable sparsifier, which parametrizes the network’s hypothesis class. For the latter, we estimate its Rademacher complexity. With this estimate in hand, we deliver generalization error bounds – which scale like the square root of the number of layers – for the examined network. Finally, the validity of our theory is assessed and numerical comparisons to a state-of-the-art unfolding network are made, on synthetic and real-world datasets. Our experimental results demonstrate that our proposed framework complies with our theoretical findings and outperforms the baseline, consistently for all datasets.
Article 676
Title@2025-06-10 (2): Improved Variational Inference in Discrete VAEs using Error Correcting Codes
Title: Improved Variational Inference in Discrete VAEs using Error Correcting Codes | Verbesserte Variationale Schlussfolgerungen in diskreten VAEs mit Fehlerkorrekturcodes | 使用错误校正代码的差错 VAEs 中改进的变量引用 2410.07840v2 |
Authors (4): María Martínez-García, Grace Villacrés, David Mitchell, Pablo M. Olmos
Despite advances in deep probabilistic models, learning discrete latent representations remains challenging. This work introduces a novel method to improve inference in discrete Variational Autoencoders by reframing the inference problem through a generative perspective. We conceptualize the model as a communication system, and propose to leverage Error-Correcting Codes (ECCs) to introduce redundancy in latent representations, allowing the variational posterior to produce more accurate estimates and reduce the variational gap. We present a proof-of-concept using a Discrete Variational Autoencoder with binary latent variables and low-complexity repetition codes, extending it to a hierarchical structure for disentangling global and local data features. Our approach significantly improves generation quality, data reconstruction, and uncertainty calibration, outperforming the uncoded models even when trained with tighter bounds such as the Importance Weighted Autoencoder objective. We also outline the properties that ECCs should possess to be effectively utilized for improved discrete variational inference.
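The redundancy that a repetition code adds to binary latents can be illustrated with a hard-decision encoder and majority-vote decoder. This is a textbook sketch rather than the paper's inference procedure, which operates on the variational posterior; the function names are hypothetical.

```python
def encode_repetition(bits, repeats):
    """Repeat every bit `repeats` times."""
    return [b for b in bits for _ in range(repeats)]


def decode_majority(coded, repeats):
    """Recover each bit by majority vote over its block of repeated copies."""
    if len(coded) % repeats != 0:
        raise ValueError("coded length must be a multiple of repeats")
    decoded = []
    for start in range(0, len(coded), repeats):
        block = coded[start:start + repeats]
        decoded.append(1 if sum(block) * 2 > repeats else 0)
    return decoded


message = [1, 0, 1]
coded = encode_repetition(message, repeats=3)
# Flip one bit in each of the first two blocks to mimic a noisy estimate.
noisy = list(coded)
noisy[1] = 0
noisy[4] = 1
recovered = decode_majority(noisy, repeats=3)
```

With three copies per bit, any single error within a block is corrected, which is the sense in which redundancy makes the estimate of each latent bit more reliable.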
Article 677
Title@2025-06-10 (2): Real-Time Cascade Mitigation in Power Systems Using Influence Graph Improved by Reinforcement Learning
Title: Real-Time Cascade Mitigation in Power Systems Using Influence Graph Improved by Reinforcement Learning | Real-Time Cascade Mitigation in Stromsystemen mit Einfluss Graph verbessert durch Verstärkung Learning | 利用强化学习所改进的影响图,在电力系统中实时连锁减缓电源系统 2506.08893v1 |
Authors (4): Kai Zhou, Youbiao He, Chong Zhong, Yifu Wu
Despite high reliability, modern power systems with growing renewable penetration face an increasing risk of cascading outages. Real-time cascade mitigation requires fast, complex operational decisions under uncertainty. In this work, we extend the influence graph into a Markov decision process model (MDP) for real-time mitigation of cascading outages in power transmission systems, accounting for uncertainties in generation, load, and initial contingencies. The MDP includes a do-nothing action to allow for conservative decision-making and is solved using reinforcement learning. We present a policy gradient learning algorithm initialized with a policy corresponding to the unmitigated case and designed to handle invalid actions. The proposed learning method converges faster than the conventional algorithm. Through careful reward design, we learn a policy that takes conservative actions without deteriorating system conditions. The model is validated on the IEEE 14-bus and IEEE 118-bus systems. The results show that proactive line disconnections can effectively reduce cascading risk, and certain lines consistently emerge as critical in mitigating cascade propagation.
Article 678
Title@2025-06-10 (2): SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Title: SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | SeherAchtung-R: Sparse-Achtung-Anpassung für langes Nachdenken | SeerAttention-R: 偏差的注意力调整以适应长期理由 2506.08889v1 |
Authors (15): Yizhao Gao, Shuming Guo, Shijie Cao, Yuqing Xia, Yu Cheng, Lei Wang, Lingxiao Ma, Yutao Sun, Tianzhu Ye, Li Dong, Hayden Kwok-Hay So, Yu Hua, Ting Cao, Fan Yang, Mao Yang
We introduce SeerAttention-R, a sparse attention framework specifically tailored for the long decoding of reasoning models. Extended from SeerAttention, SeerAttention-R retains the design of learning attention sparsity through a self-distilled gating mechanism, while removing query pooling to accommodate auto-regressive decoding. With a lightweight plug-in gating mechanism, SeerAttention-R is flexible and can be easily integrated into existing pretrained models without modifying the original parameters. We demonstrate that SeerAttention-R, trained on just 0.4B tokens, maintains near-lossless reasoning accuracy with a 4K token budget on the AIME benchmark under large sparse attention block sizes (64/128). Using TileLang, we develop a highly optimized sparse decoding kernel that achieves near-theoretical speedups of up to 9x over FlashAttention-3 on an H100 GPU at 90% sparsity. Code is available at: https://github.com/microsoft/SeerAttention.
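The relationship between token budget, block size, and sparsity can be sketched as a top-k selection over per-block gate scores. The selection rule, the function name, and the scores are assumptions for illustration; in SeerAttention-R the gate is learned and the real kernel is far more involved.

```python
def select_blocks(block_scores, block_size, token_budget):
    """Keep the highest-scoring key/value blocks that fit within the token budget."""
    num_keep = min(len(block_scores), token_budget // block_size)
    ranked = sorted(range(len(block_scores)),
                    key=lambda i: block_scores[i], reverse=True)
    # Return kept block indices in positional order.
    return sorted(ranked[:num_keep])


# Hypothetical gate scores for eight blocks of 64 tokens each (512 cached tokens).
scores = [0.1, 0.9, 0.3, 0.7, 0.2, 0.05, 0.6, 0.4]
kept = select_blocks(scores, block_size=64, token_budget=256)
sparsity = 1 - len(kept) / len(scores)
```

Under this rule a 4K budget with 64-token blocks would keep at most 4096 // 64 = 64 blocks, so sparsity grows as the decoded sequence gets longer.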
Article 679
Title@2025-06-10 (2): InfoDPCCA: Information-Theoretic Dynamic Probabilistic Canonical Correlation Analysis
Title: InfoDPCCA: Information-Theoretic Dynamic Probabilistic Canonical Correlation Analysis | InfoDPCCA: Informationstheoretische dynamische probabilistische kanonischen Korrelationsanalyse | InfoDPCA: 信息理论动态动态概率能力与卡尼诺相关关系分析 2506.08884v1 |
Authors (2): Shiqin Tang, Shujian Yu
Extracting meaningful latent representations from high-dimensional sequential data is a crucial challenge in machine learning, with applications spanning natural science and engineering. We introduce InfoDPCCA, a dynamic probabilistic Canonical Correlation Analysis (CCA) framework designed to model two interdependent sequences of observations. InfoDPCCA leverages a novel information-theoretic objective to extract a shared latent representation that captures the mutual structure between the data streams and balances representation compression and predictive sufficiency. It also learns separate latent components that encode information specific to each sequence. Unlike prior dynamic CCA models, such as DPCCA, our approach explicitly enforces the shared latent space to encode only the mutual information between the sequences, improving interpretability and robustness. We further introduce a two-step training scheme to bridge the gap between information-theoretic representation learning and generative modeling, along with a residual connection mechanism to enhance training stability. Through experiments on synthetic and medical fMRI data, we demonstrate that InfoDPCCA excels as a tool for representation learning. Code for InfoDPCCA is available at https://github.com/marcusstang/InfoDPCCA.
Article 680
Title@2025-06-10 (2): Filling in the Blanks: Applying Data Imputation in incomplete Water Metering Data
Title: Filling in the Blanks: Applying Data Imputation in incomplete Water Metering Data | Füllen der Blanks: Anwenden von Daten Imputation in unvollständigen Wasserzählerdaten | 填充空白:在不完整的水计量数据中应用数据计算 2506.08882v1 |
Authors (4): Dimitrios Amaxilatis, Themistoklis Sarantakos, Ioannis Chatzigiannakis, Georgios Mylonas
In this work, we explore the application of recent data imputation techniques to enhance the monitoring and management of water distribution networks using smart water meters, based on data from a real-world IoT water grid monitoring deployment. Despite the detailed data produced by such meters, data gaps due to technical issues can significantly impact operational decisions and efficiency. We compare various imputation methods, such as k-Nearest Neighbors, MissForest, Transformers, and Recurrent Neural Networks, and study their effect on the accuracy and reliability of water metering data. Our results indicate that effective data imputation can substantially enhance the quality of the insights derived from water consumption data, supporting applications like leak detection and predictive maintenance scheduling.
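Of the methods compared, k-Nearest Neighbors imputation is the simplest to sketch. The version below is a minimal illustration under assumed conventions (rows are meters, columns are time steps, `None` marks a gap); it is not the study's pipeline, and the readings are made up.

```python
import math


def knn_impute(rows, k):
    """Fill None entries with the column mean over the k most similar rows."""
    def distance(a, b):
        # Root-mean-square difference over the columns both rows have observed.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical consumption readings: four meters, three time steps, one gap.
readings = [
    [10.0, 12.0, 11.0],
    [10.0, None, 11.0],
    [50.0, 55.0, 52.0],
    [11.0, 14.0, 12.0],
]
completed = knn_impute(readings, k=2)
```

The gap is filled from the two meters with the most similar observed readings, so the high-consumption meter in the third row does not distort the estimate.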
Article 681
Title@2025-06-10 (2): Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models
Title: Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models | Pivoting Factorization: Eine kompakte Meta-Low-Rank-Darstellung von Sparsität für effiziente Schlussfolgerungen in großen Sprachmodellen | 重力因素化:大语言模型中高效引引的公平代表性的契约元低率代表 2501.19090v2 |
Authors (3): Jialin Zhao, Yingtao Zhang, Carlo Vittorio Cannistraci
The rapid growth of Large Language Models has driven demand for effective model compression techniques to reduce memory and computation costs. Low-rank pruning has gained attention for its GPU compatibility across all densities. However, low-rank pruning struggles to match the performance of semi-structured pruning, often doubling perplexity at similar densities. In this paper, we propose Pivoting Factorization (PIFA), a novel lossless meta low-rank representation that learns, without supervision, a compact form of any low-rank representation, effectively eliminating redundant information. PIFA identifies pivot rows (linearly independent rows) and expresses non-pivot rows as linear combinations, achieving 24.2% additional memory savings and 24.6% faster inference over low-rank layers at rank = 50% of dimension. To mitigate the performance degradation caused by low-rank pruning, we introduce a novel, retraining-free reconstruction method that minimizes error accumulation (M). MPIFA, combining M and PIFA into an end-to-end framework, significantly outperforms existing low-rank pruning methods, and achieves performance comparable to semi-structured pruning, while surpassing it in GPU efficiency and compatibility. Our code is available at https://github.com/biomedical-cybernetics/pivoting-factorization.
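A back-of-envelope parameter count shows where savings of this size can come from. The sketch assumes a standard low-rank layer stores two factors, while a pivot-row form stores the pivot rows in full plus one coefficient vector per non-pivot row; it ignores bookkeeping such as pivot indices, and the function names and the dimension 4096 are hypothetical.

```python
def low_rank_params(rows, cols, rank):
    """Parameters in a two-factor low-rank layer: (rows x rank) and (rank x cols)."""
    return rank * (rows + cols)


def pivot_params(rows, cols, rank):
    """Parameters when `rank` pivot rows are stored in full and every other
    row is stored as `rank` combination coefficients."""
    return rank * cols + (rows - rank) * rank


dim = 4096
rank = dim // 2  # rank = 50% of dimension, as in the abstract
saving = 1 - pivot_params(dim, dim, rank) / low_rank_params(dim, dim, rank)
```

Under these assumptions the saving at half rank is 25%, in the same range as the 24.2% reported above.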
Article 682
Title@2025-06-10 (2): Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks
Title: Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks | Sparse Spectral Training und Inferenz auf Euklidische und Hyperbolische Neuronale Netzwerke | Euclidean 和超双曲神经网络的 广度光谱培训和推断 2405.15481v2 |
Authors (5): Jialin Zhao, Yingtao Zhang, Xinghang Li, Huaping Liu, Carlo Vittorio Cannistraci
The growing demands on GPU memory posed by the increasing number of neural network parameters call for training approaches that are more memory-efficient. Previous memory reduction training techniques, such as Low-Rank Adaptation (LoRA) and ReLoRA, face challenges, with LoRA being constrained by its low-rank structure, particularly during intensive tasks like pre-training, and ReLoRA suffering from saddle point issues. In this paper, we propose Sparse Spectral Training (SST) to optimize memory usage for pre-training. SST updates all singular values and selectively updates singular vectors through a multinomial sampling method weighted by the magnitude of the singular values. Furthermore, SST employs singular value decomposition to initialize and periodically reinitialize low-rank parameters, reducing distortion relative to full-rank training compared to other low-rank methods. Through comprehensive testing on both Euclidean and hyperbolic neural networks across various tasks, SST demonstrates its ability to outperform existing memory reduction training methods and is comparable to full-rank training in various cases. On LLaMA-1.3B, with only 18.7% of the parameters trainable compared to full-rank training (using a rank equivalent to 6% of the embedding dimension), SST reduces the perplexity gap between other low-rank methods and full-rank training by 97.4%. This result highlights SST as an effective parameter-efficient technique for model pre-training.
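The selective update step, choosing which singular vectors to train by sampling in proportion to singular value magnitude, can be sketched as follows. Sampling without replacement, the function name, and the toy spectrum are assumptions; the abstract specifies only that the sampling is multinomial and weighted by magnitude.

```python
import random


def sample_singular_indices(singular_values, num_samples, rng):
    """Draw distinct indices with probability proportional to singular value."""
    remaining = list(range(len(singular_values)))
    chosen = []
    for _ in range(min(num_samples, len(remaining))):
        weights = [abs(singular_values[i]) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)  # assumed: no index is selected twice per round
    return chosen


rng = random.Random(0)
singular_values = [5.0, 3.0, 1.0, 0.5, 0.5]
# Indices of the singular vectors that would receive gradient updates this round.
selected = sample_singular_indices(singular_values, num_samples=2, rng=rng)
```

Directions with large singular values are updated most often, while small ones still get an occasional chance, unlike a hard top-k rule.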
Article 683
Title@2025-06-10 (2): Adapting to Heterophilic Graph Data with Structure-Guided Neighbor Discovery
Title: Adapting to Heterophilic Graph Data with Structure-Guided Neighbor Discovery | Anpassung an heterophile Graphendaten mit strukturgeführter Neighbor Discovery | 与结构引导邻居发现发现器相适应以适应结构辅助邻居发现器的热对数图形数据 2506.08871v1 |
Authors (5): Victor M. Tenorio, Madeline Navarro, Samuel Rey, Santiago Segarra, Antonio G. Marques
Graph Neural Networks (GNNs) often struggle with heterophilic data, where connected nodes may have dissimilar labels, as they typically assume homophily and rely on local message passing. To address this, we propose creating alternative graph structures by linking nodes with similar structural attributes (e.g., role-based or global), thereby fostering higher label homophily on these new graphs. We theoretically prove that GNN performance can be improved by utilizing graphs with fewer false positive edges (connections between nodes of different classes) and that considering multiple graph views increases the likelihood of finding such beneficial structures. Building on these insights, we introduce Structure-Guided GNN (SG-GNN), an architecture that processes the original graph alongside the newly created structural graphs, adaptively learning to weigh their contributions. Extensive experiments on various benchmark datasets, particularly those with heterophilic characteristics, demonstrate that our SG-GNN achieves state-of-the-art or highly competitive performance, highlighting the efficacy of exploiting structural information to guide GNNs.
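A toy example may help show why linking structurally similar nodes can raise label homophily. The sketch below uses node degree as the structural attribute and a nearest-neighbour rule. Both choices, and the helper names, are assumptions for illustration and are not taken from the paper.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label."""
    same = sum(labels[u] == labels[v] for u, v in edges)
    return same / len(edges)

def structural_knn_edges(edges, num_nodes, k=1):
    """Link each node to the k nodes whose degree is closest to its own."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    new_edges = set()
    for u in range(num_nodes):
        others = sorted(
            (v for v in range(num_nodes) if v != u),
            key=lambda v: (abs(degree[u] - degree[v]), v),  # ties broken by node index
        )
        for v in others[:k]:
            new_edges.add((min(u, v), max(u, v)))
    return sorted(new_edges)

# Two hubs (label 1), each connected to three leaves (label 0): a fully heterophilic graph.
edges = [(0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7)]
labels = [1, 1, 0, 0, 0, 0, 0, 0]
structural_edges = structural_knn_edges(edges, num_nodes=8, k=1)
```

In this example every original edge joins a hub to a leaf, whereas the degree-based graph connects hubs to hubs and leaves to leaves, so the labels agree along the new edges.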
Article 684
Title@2025-06-10 (2): Low-Rank Augmented Implicit Neural Representation for Unsupervised High-Dimensional Quantitative MRI Reconstruction
Title: Low-Rank Augmented Implicit Neural Representation for Unsupervised High-Dimensional Quantitative MRI Reconstruction | Low-Rank Augmented Implizite Neural Repräsentanz für unüberwachte hochdimensionelle Quantitative MRT-Rekonstruktion | 低Rank 低强度增强的内隐隐含神经力代表,用于未受监督的高多样性定量MRI重建 2506.09100v1 |
Authors (4): Haonan Zhang, Guoyan Lao, Yuyao Zhang, Hongjiang Wei
Quantitative magnetic resonance imaging (qMRI) provides tissue-specific parameters vital for clinical diagnosis. Although simultaneous multi-parametric qMRI (MP-qMRI) technologies enhance imaging efficiency, robustly reconstructing qMRI from highly undersampled, high-dimensional measurements remains a significant challenge. This difficulty arises primarily because current reconstruction methods rely solely on a single prior or physics-informed model to solve the highly ill-posed inverse problem, which often leads to suboptimal results. To overcome this limitation, we propose LoREIN, a novel unsupervised and dual-prior-integrated framework for accelerated 3D MP-qMRI reconstruction. Technically, LoREIN incorporates both a low-rank prior and a continuity prior via low-rank representation (LRR) and implicit neural representation (INR), respectively, to enhance reconstruction fidelity. The powerful continuous representation of INR enables the estimation of optimal spatial bases within the low-rank subspace, facilitating high-fidelity reconstruction of weighted images. Simultaneously, the predicted multi-contrast weighted images provide essential structural and quantitative guidance, further enhancing the reconstruction accuracy of quantitative parameter maps. Furthermore, our work introduces a zero-shot learning paradigm with broad potential in complex spatiotemporal and high-dimensional image reconstruction tasks, further advancing the field of medical imaging.
Article 685
Title@2025-06-10 (2): k-NN as a Simple and Effective Estimator of Transferability
Title: k-NN as a Simple and Effective Estimator of Transferability | k-NN als einfacher und effektiver Schätzwert für die Übertragbarkeit | k-NN作为可转让性的一个简单而有效的模拟器 2503.18528v2 |
Authors (5): Moein Sorkhei, Christos Matsoukas, Johan Fredin Haslum, Emir Konuk, Kevin Smith
How well can one expect transfer learning to work in a new setting where the domain is shifted, the task is different, and the architecture changes? Many transfer learning metrics have been proposed to answer this question. But how accurate are their predictions in a realistic new setting? We conducted an extensive evaluation involving over 42,000 experiments comparing 23 transferability metrics across 16 different datasets to assess their ability to predict transfer performance. Our findings reveal that none of the existing metrics perform well across the board. However, we find that a simple k-nearest neighbor evaluation – as is commonly used to evaluate feature quality for self-supervision – not only surpasses existing metrics, but also offers better computational efficiency and ease of implementation.
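A k-nearest neighbor evaluation of this kind can be sketched in a few lines. The version below scores features from a pre-trained model by leave-one-out k-NN accuracy on the target labels. The leave-one-out protocol, Euclidean distance, majority vote and default `k` are assumptions, since the abstract does not specify them.

```python
import numpy as np

def knn_transfer_score(features, labels, k=5):
    """Leave-one-out k-NN accuracy of target labels in a frozen feature space."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                     # a sample may not vote for itself
    neighbours = np.argsort(d2, axis=1)[:, :k]
    correct = 0
    for i in range(len(y)):
        values, counts = np.unique(y[neighbours[i]], return_counts=True)
        correct += int(values[np.argmax(counts)] == y[i])   # majority vote
    return correct / len(y)
```

A higher score would suggest that the pre-trained features already separate the target classes, which is the intuition behind using it to rank candidate source models.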
Article 686
Title@2025-06-10 (2): Deep reinforcement learning for near-deterministic preparation of cubic- and quartic-phase gates in photonic quantum computing
Title: Deep reinforcement learning for near-deterministic preparation of cubic- and quartic-phase gates in photonic quantum computing | Tiefes Verstärkungslernen für die nahezu deterministische Vorbereitung von kubischen und quartic-Phasentoren im photonischen Quanten-Computing | 深入强化学习,为光子量衡算的立方和临界阶段大门进行近乎决定性的准备 2506.07859v2 |
Authors (4): Amanuel Anteneh, Léandre Brunel, Carlos González-Arciniegas, Olivier Pfister
Cubic-phase states are a sufficient resource for universal quantum computing over continuous variables. We present results from numerical experiments in which deep neural networks are trained via reinforcement learning to control a quantum optical circuit for generating cubic-phase states, with an average success rate of 96%. The only non-Gaussian resource required is photon-number-resolving measurements. We also show that the exact same resources enable the direct generation of a quartic-phase gate, with no need for a cubic gate decomposition.
Article 687
Title@2025-06-10 (2): Cooperative and Collaborative Multi-Task Semantic Communication for Distributed Sources
Title: Cooperative and Collaborative Multi-Task Semantic Communication for Distributed Sources | Kooperative und kollaborative Multi-Task Semantische Kommunikation für verteilte Quellen | 合作与合作 多任务、跨语义交流促进分配源的传播 2411.02150v2 |
Authors (5): Ahmad Halimi Razlighi, Maximilian H. V. Tillmann, Edgar Beck, Carsten Bockelmann, Armin Dekorsy
In this paper, we explore a multi-task semantic communication (SemCom) system for distributed sources, extending the existing focus on collaborative single-task execution. We build on the cooperative multi-task processing introduced in [1], which divides the encoder into a common unit (CU) and multiple specific units (SUs). While earlier studies in multi-task SemCom focused on full observation settings, our research explores a more realistic case where only distributed partial observations are available, such as in a production line monitored by multiple sensing nodes. To address this, we propose a SemCom system that supports multi-task processing through cooperation on the transmitter side via a split structure and collaboration on the receiver side. We adopt an information-theoretic perspective with variational approximations for our end-to-end data-driven approach. Simulation results demonstrate that the proposed cooperative and collaborative multi-task (CCMT) SemCom system significantly improves task execution accuracy, particularly on complex datasets, provided that the noise introduced by the communication channel does not severely limit task performance. Our findings contribute to a more general SemCom framework capable of handling distributed sources and multiple tasks simultaneously, advancing the applicability of SemCom systems in real-world scenarios.
Article 688
Title@2025-06-10 (2): StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams
Title: StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams | StreamSplat: Auf dem Weg zu Online Dynamische 3D-Rekonstruktion aus unkalibrierten Videostreams | StreamSplat:从未经校正的视频流走向在线动态3D重建 2506.08862v1 |
Authors (5): Zike Wu, Qi Yan, Xuanyu Yi, Lele Wang, Renjie Liao
Real-time reconstruction of dynamic 3D scenes from uncalibrated video streams is crucial for numerous real-world applications. However, existing methods struggle to jointly address three key challenges: 1) processing uncalibrated inputs in real time, 2) accurately modeling dynamic scene evolution, and 3) maintaining long-term stability and computational efficiency. To this end, we introduce StreamSplat, the first fully feed-forward framework that transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting (3DGS) representations in an online manner, capable of recovering scene dynamics from temporally local observations. We propose two key technical innovations: a probabilistic sampling mechanism in the static encoder for 3DGS position prediction, and a bidirectional deformation field in the dynamic decoder that enables robust and efficient dynamic modeling. Extensive experiments on static and dynamic benchmarks demonstrate that StreamSplat consistently outperforms prior works in both reconstruction quality and dynamic scene modeling, while uniquely supporting online reconstruction of arbitrarily long video streams. Code and models are available at https://github.com/nickwzk/StreamSplat.
Article 689
Title@2025-06-10 (2): DIME:Diffusion-Based Maximum Entropy Reinforcement Learning
Title: DIME:Diffusion-Based Maximum Entropy Reinforcement Learning | DIME:Diffusionsbasiertes maximales Entropie-Verstärkungs-Lernen | DIME: 以传播为基础的最大助爆强化学习 2502.02316v2 |
Authors (8): Onur Celik, Zechu Li, Denis Blessing, Ge Li, Daniel Palenicek, Jan Peters, Georgia Chalvatzaki, Gerhard Neumann
Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges, primarily due to the intractability of computing their marginal entropy. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion-based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity.
Article 690
Title@2025-06-10 (2): Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers
Title: Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers | Zu groß zu denken: Kapazität, Erinnerung und Verallgemeinerung in vortrainierten Transformern | 能力、记忆和在培训前变异器中的普及化 2506.09099v1 |
Authors (2): Joshua Barron, Devin White
The relationship between memorization and generalization in large language models (LLMs) remains an open area of research, with growing evidence that the two are deeply intertwined. In this work, we investigate this relationship by pre-training a series of capacity-limited Transformer models from scratch on two synthetic character-level tasks designed to separately probe generalization (via arithmetic extrapolation) and memorization (via factual recall). We observe a consistent trade-off: small models extrapolate to unseen arithmetic cases but fail to memorize facts, while larger models memorize but fail to extrapolate. An intermediate-capacity model exhibits a similar shift toward memorization. When trained on both tasks jointly, no model (regardless of size) succeeds at extrapolation. These findings suggest that pre-training may intrinsically favor one learning mode over the other. By isolating these dynamics in a controlled setting, our study offers insight into how model capacity shapes learning behavior and offers broader implications for the design and deployment of small language models.
Article 691
Title@2025-06-10 (2): On the Hardness of Sampling from Mixture Distributions via Langevin Dynamics
Title: On the Hardness of Sampling from Mixture Distributions via Langevin Dynamics | Über die Härte der Probenahme aus Mischungsverteilungen über Langevin Dynamics | 使用 Langevin Directives 进行混音发行的样本的强度 2406.02017v3 |
Authors (3): Xiwei Cheng, Kexin Fu, Farzan Farnia
Langevin Dynamics (LD), which aims to sample from a probability distribution using its score function, has been widely used for analyzing and developing score-based generative modeling algorithms. While the convergence behavior of LD in sampling from a uni-modal distribution has been extensively studied, the analysis of LD under a mixture distribution with distinct modes remains underexplored. In this work, we analyze LD in sampling from a mixture distribution and theoretically study its convergence properties. Our theoretical results indicate that for general mixture distributions of sub-Gaussian components, LD could fail to find all the components within a sub-exponential number of steps in the data dimension. Following our result on the complexity of LD in sampling from high-dimensional variables, we propose Chained Langevin Dynamics (Chained-LD), which divides the data vector into patches of smaller sizes and generates every patch sequentially conditioned on the previous patches. Our theoretical analysis of Chained-LD indicates its faster convergence speed to the components of a mixture distribution. We present the results of several numerical experiments on synthetic and real image datasets, validating our theoretical results on the iteration complexities of sample generation from mixture distributions using the vanilla and chained LD algorithms.
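For readers unfamiliar with the sampler, the vanilla LD update is x ← x + η ∇ log p(x) + √(2η) ξ with ξ standard Gaussian. The sketch below applies it to a one-dimensional Gaussian mixture. The 1-D setting, equal component variances, step size and function names are illustrative assumptions, and the example does not reproduce the paper's high-dimensional analysis or Chained-LD.

```python
import numpy as np

def mixture_score(x, means, sigma, weights):
    """Score (gradient of log density) of a 1-D Gaussian mixture with shared sigma."""
    diff = x[:, None] - means[None, :]
    logp = np.log(weights)[None, :] - 0.5 * (diff / sigma) ** 2
    logp -= logp.max(axis=1, keepdims=True)          # stabilise the softmax
    resp = np.exp(logp)
    resp /= resp.sum(axis=1, keepdims=True)          # posterior component weights
    return (resp * (-diff / sigma ** 2)).sum(axis=1)

def langevin(x0, score_fn, step, n_steps, rng):
    """Unadjusted Langevin dynamics run independently for each chain in x0."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step * score_fn(x) + np.sqrt(2.0 * step) * noise
    return x

means = np.array([-10.0, 10.0])
weights = np.array([0.5, 0.5])
rng = np.random.default_rng(0)
samples = langevin(
    np.full(200, -10.0),                              # every chain starts in the left mode
    lambda x: mixture_score(x, means, 1.0, weights),
    step=0.01, n_steps=500, rng=rng,
)
```

With modes this far apart, one would expect the chains to stay near the mode they started in for any practical number of steps, which is the kind of mode-finding failure the abstract refers to.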
Article 692
Title@2025-06-10 (2): Learning surrogate equations for the analysis of an agent-based cancer model
Title: Learning surrogate equations for the analysis of an agent-based cancer model | Erlernen von Surrogatgleichungen für die Analyse eines agentenbasierten Krebsmodells | 用于分析基于制剂的癌症模式的学习替代方程 2503.01718v2 |
Authors (5): Kevin Burrage, Pamela M. Burrage, Justin N. Kreikemeyer, Adelinde M. Uhrmacher, Hasitha N. Weerasinghe
In this paper, we adapt a two-species agent-based cancer model that describes the interaction between cancer cells and healthy cells on a uniform grid to include the interaction with a third species, namely immune cells. We run six different scenarios to explore how the competition between cancer and immune cells and the initial concentration of the immune cells affect cancer dynamics. We then use coupled equation learning to construct a population-based reaction model for each scenario. We show how these models can be unified into a single surrogate population-based reaction model, whose underlying three coupled ordinary differential equations are much easier to analyse than the original agent-based model. As an example, by finding the single steady state of the cancer concentration, we are able to find a linear relationship between this concentration and the initial concentration of the immune cells. This then enables us to estimate suitable values for the competition and initial concentration to reduce the cancer substantially without performing additional complex and expensive simulations from an agent-based stochastic model.
Article 693
Title@2025-06-10 (2): Agile Reinforcement Learning for Real-Time Task Scheduling in Edge Computing
Title: Agile Reinforcement Learning for Real-Time Task Scheduling in Edge Computing | Agiles Verstärkungslernen für Echtzeit-Aufgabenplanung im Edge Computing | 边边计算中实时任务排期 2506.08850v1 |
Authors (3): Amin Avan, Akramul Azim, Qusay Mahmoud
Soft real-time applications are becoming increasingly complex, posing significant challenges for scheduling offloaded tasks in edge computing environments while meeting task timing constraints. Moreover, the exponential growth of the search space, presence of multiple objectives and parameters, and highly dynamic nature of edge computing environments further exacerbate the complexity of task scheduling. As a result, schedulers based on heuristic and metaheuristic algorithms frequently encounter difficulties in generating optimal or near-optimal task schedules due to their constrained ability to adapt to the dynamic conditions and complex environmental characteristics of edge computing. Accordingly, reinforcement learning algorithms have been incorporated into schedulers to address the complexity and dynamic conditions inherent in task scheduling in edge computing. However, a significant limitation of reinforcement learning algorithms is the prolonged learning time required to adapt to new environments and to address medium- and large-scale problems. This challenge arises from the extensive global action space and frequent random exploration of irrelevant actions. Therefore, this study proposes Agile Reinforcement learning (aRL), in which the RL-agent performs informed exploration and executes only relevant actions. Consequently, the predictability of the RL-agent is enhanced, leading to rapid adaptation and convergence, which positions aRL as a suitable candidate for scheduling the tasks of soft real-time applications in edge computing. The experiments demonstrate that the combination of informed exploration and action-masking methods enables aRL to achieve a higher hit-ratio and converge faster than the baseline approaches.
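Action masking, one of the two ingredients named at the end of the abstract, is commonly implemented by excluding invalid actions both from random exploration and from the greedy choice. The sketch below shows that generic pattern with an epsilon-greedy policy. It is not the aRL algorithm itself, and the function name and interface are assumptions.

```python
import numpy as np

def masked_epsilon_greedy(q_values, valid_mask, epsilon, rng):
    """Pick an action among the valid ones only, exploring with probability epsilon."""
    q = np.asarray(q_values, dtype=float)
    mask = np.asarray(valid_mask, dtype=bool)
    valid = np.flatnonzero(mask)
    if valid.size == 0:
        raise ValueError("no valid action available")
    if rng.random() < epsilon:
        return int(rng.choice(valid))            # explore, but never an irrelevant action
    masked_q = np.where(mask, q, -np.inf)        # masked actions can never win the argmax
    return int(np.argmax(masked_q))
```

In a scheduling setting the mask might, for instance, exclude edge nodes that cannot meet a task's deadline, so the agent does not spend exploration steps on them.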
Article 694
Title@2025-06-10 (2): IMAGIC-500: IMputation benchmark on A Generative Imaginary Country (500k samples)
Title: IMAGIC-500: IMputation benchmark on A Generative Imaginary Country (500k samples) | IMAGIC-500: Imputations-Benchmark für ein generatives Imaginary Country (500k-Proben) | IMAGIC-500: 创造想象力国家(500公里样本)的估测基准 2506.08844v1 |
Authors (6): Siyi Sun, David Antony Selby, Yunchuan Huang, Sebastian Vollmer, Seth Flaxman, Anisoara Calinescu
Missing data imputation in tabular datasets remains a pivotal challenge in data science and machine learning, particularly within socioeconomic research. However, real-world socioeconomic datasets are typically subject to strict data protection protocols, which often prohibit public sharing, even for synthetic derivatives. This severely limits the reproducibility and accessibility of benchmark studies in such settings. Further, there are very few publicly available synthetic datasets. Thus, there is limited availability of benchmarks for systematic evaluation of imputation methods on socioeconomic datasets, whether real or synthetic. In this study, we utilize the World Bank’s publicly available synthetic dataset, Synthetic Data for an Imaginary Country, which closely mimics a real World Bank household survey while being fully public, enabling broad access for methodological research. With this as a starting point, we derive the IMAGIC-500 dataset: we select a subset of 500k individuals across approximately 100k households with 19 socioeconomic features, designed to reflect the hierarchical structure of real-world household surveys. This paper introduces a comprehensive missing data imputation benchmark on IMAGIC-500 under various missing mechanisms (MCAR, MAR, MNAR) and missingness ratios (10%, 20%, 30%, 40%, 50%). Our evaluation considers the imputation accuracy for continuous and categorical variables, computational efficiency, and impact on downstream predictive tasks, such as estimating educational attainment at the individual level. The results highlight the strengths and weaknesses of statistical, traditional machine learning, and deep learning imputation techniques, including recent diffusion-based methods. The IMAGIC-500 dataset and benchmark aim to facilitate the development of robust imputation algorithms and foster reproducible social science research.
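The three missingness mechanisms differ in what the probability of a value being missing depends on: nothing (MCAR), another observed column (MAR), or the value itself (MNAR). The sketch below generates masks for one column under each mechanism. The rank-based probabilities and the function names are assumptions chosen for simplicity, not the benchmark's actual procedure, and the construction assumes a target rate of at most 50%.

```python
import numpy as np

def _rank01(x):
    """Map values to evenly spaced ranks in [0, 1]."""
    ranks = np.argsort(np.argsort(x))
    return ranks / (len(x) - 1)

def missing_mask(target, driver, mechanism, rate, rng):
    """Boolean mask (True = missing) for `target` with expected missingness `rate`."""
    n = len(target)
    if mechanism == "MCAR":
        p = np.full(n, rate)                   # independent of all data
    elif mechanism == "MAR":
        p = 2.0 * rate * _rank01(driver)       # depends on an observed column
    elif mechanism == "MNAR":
        p = 2.0 * rate * _rank01(target)       # depends on the value being hidden
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return rng.random(n) < p
```

Because the ranks average 0.5, the expected missingness equals `rate` under all three mechanisms, so imputation methods can be compared at matched ratios.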
Article 695
Title@2025-06-10 (2): Quantum Architecture Search with Unsupervised Representation Learning
Title: Quantum Architecture Search with Unsupervised Representation Learning | Quantenarchitektur Suche mit unüberwachtem Repräsentationslernen | 以无人监督的代表制学习方式搜索量子结构 2401.11576v5 |
Authors (4): Yize Sun, Zixin Wu, Yunpu Ma, Volker Tresp
Unsupervised representation learning presents new opportunities for advancing Quantum Architecture Search (QAS) on Noisy Intermediate-Scale Quantum (NISQ) devices. QAS is designed to optimize quantum circuits for Variational Quantum Algorithms (VQAs). Most QAS algorithms tightly couple the search space and search algorithm, typically requiring the evaluation of numerous quantum circuits, resulting in high computational costs and limiting scalability to larger quantum circuits. Predictor-based QAS algorithms mitigate this issue by estimating circuit performance based on structure or embedding. However, these methods often demand time-intensive labeling to optimize gate parameters across many circuits, which is crucial for training accurate predictors. Inspired by the classical neural architecture search algorithm Arch2vec, we investigate the potential of unsupervised representation learning for QAS without relying on predictors. Our framework decouples unsupervised architecture representation learning from the search process, enabling the learned representations to be applied across various downstream tasks. Additionally, it integrates an improved quantum circuit graph encoding scheme, addressing the limitations of existing representations and enhancing search efficiency. This predictor-free approach removes the need for large labeled datasets. During the search, we employ REINFORCE and Bayesian Optimization to explore the latent representation space and compare their performance against baseline methods. We further validate our approach by executing the best-discovered MaxCut circuits on IBM’s ibm_sherbrooke quantum processor, confirming that the architectures retain optimal performance even under real hardware noise. Our results demonstrate that the framework efficiently identifies high-performing quantum circuits with fewer search iterations.
Article 696
Title@2025-06-10 (2): Robot Pouring: Identifying Causes of Spillage and Selecting Alternative Action Parameters Using Probabilistic Actual Causation
Title: Robot Pouring: Identifying Causes of Spillage and Selecting Alternative Action Parameters Using Probabilistic Actual Causation | Robot-Pouring: Identifizierung von Ursachen von Spillage und Auswahl alternativer Aktionsparameter mittels probabilistischer tatsächlicher Schadensursachen | 机器人定位:利用概率实际原因查明螺旋原因和选择替代行动参数 2502.09395v3 |
Authors (5): Jaime Maldonado, Jonas Krumme, Christoph Zetzsche, Vanessa Didelez, Kerstin Schill
In everyday life, we perform tasks (e.g., cooking or cleaning) that involve a large variety of objects and goals. When confronted with an unexpected or unwanted outcome, we take corrective actions and try again until achieving the desired result. The reasoning performed to identify a cause of the observed outcome and to select an appropriate corrective action is a crucial aspect of human reasoning for successful task execution. Central to this reasoning is the assumption that a factor is responsible for producing the observed outcome. In this paper, we investigate the use of probabilistic actual causation to determine whether a factor is the cause of an observed undesired outcome. Furthermore, we show how the actual causation probabilities can be used to find alternative actions to change the outcome. We apply the probabilistic actual causation analysis to a robot pouring task. When spillage occurs, the analysis indicates whether a task parameter is the cause and how it should be changed to avoid spillage. The analysis requires a causal graph of the task and the corresponding conditional probability distributions. To fulfill these requirements, we perform a complete causal modeling procedure (i.e., task analysis, definition of variables, determination of the causal graph structure, and estimation of conditional probability distributions) using data from a realistic simulation of the robot pouring task, covering a large combinatorial space of task parameters. Based on the results, we discuss the implications of the variables’ representation and how the alternative actions suggested by the actual causation analysis would compare to the alternative solutions proposed by a human observer. The practical use of the analysis of probabilistic actual causation to select alternative action parameters is demonstrated.
Article 697
Title@2025-06-10 (2): Nearly Optimal Differentially Private ReLU Regression
Title: Nearly Optimal Differentially Private ReLU Regression | Fast optimal unterschiedliche private ReLU-Regression | 接近最佳差异化私自 ReLU 回归 2503.06009v2 |
Authors (6): Meng Ding, Mingxi Lei, Shaowei Wang, Tianhang Zheng, Di Wang, Jinhui Xu
In this paper, we investigate one of the most fundamental nonconvex learning problems, ReLU regression, in the Differential Privacy (DP) model. Previous studies on private ReLU regression heavily rely on stringent assumptions, such as constant bounded norms for feature vectors and labels. We relax these assumptions to a more standard setting, where data can be i.i.d. sampled from $O(1)$-sub-Gaussian distributions. We first show that when $\varepsilon = \tilde{O}(\sqrt{\frac{1}{N}})$ and there is some public data, it is possible to achieve an upper bound of $\tilde{O}(\frac{d^2}{N^2 \varepsilon^2})$ for the excess population risk in $(\varepsilon, \delta)$-DP, where $d$ is the dimension and $N$ is the number of data samples. Moreover, we relax the requirement of $\varepsilon$ and public data by proposing and analyzing a one-pass mini-batch Generalized Linear Model Perceptron algorithm (DP-MBGLMtron). Additionally, using the tracing attack argument technique, we demonstrate that the minimax rate of the estimation error for $(\varepsilon, \delta)$-DP algorithms is lower bounded by $\Omega(\frac{d^2}{N^2 \varepsilon^2})$. This shows that DP-MBGLMtron achieves the optimal utility bound up to logarithmic factors. Experiments further support our theoretical results.
Article 698
Title@2025-06-10 (2): Relational decomposition for program synthesis
Title: Relational decomposition for program synthesis | Verhältnismäßige Zersetzung für die Programmsynthese | 用于方案合成的通货膨胀分解 2408.12212v3 |
Authors (2): Céline Hocquette, Andrew Cropper
We introduce a relational approach to program synthesis. The key idea is to decompose synthesis tasks into simpler relational synthesis subtasks. Specifically, our representation decomposes a training input-output example into sets of input and output facts respectively. We then learn relations between the input and output facts. We demonstrate our approach using an off-the-shelf inductive logic programming (ILP) system on four challenging synthesis datasets. Our results show that (i) our representation can outperform a standard one, and (ii) an off-the-shelf ILP system with our representation can outperform domain-specific approaches.
Article 699
Title@2025-06-10 (2): Stepsize anything: A unified learning rate schedule for budgeted-iteration training
Title: Stepsize anything: A unified learning rate schedule for budgeted-iteration training | Stepsize everything: Eine einheitliche Lernrate Zeitplan für budgetierte iteration Ausbildung | 采取任何步骤:为列入预算的就地培训统一学习率时间表 2505.24452v2 |
Authors (5): Anda Tang, Yiming Dong, Yutao Zeng, zhou Xun, Zhouchen Lin
The expanding computational costs and limited resources underscore the critical need for budgeted-iteration training, which aims to achieve optimal learning within predetermined iteration budgets. While learning rate schedules fundamentally govern the performance of different networks and tasks, particularly in budgeted-iteration scenarios, their design remains largely heuristic, lacking theoretical foundations. In addition, the optimal learning rate schedule requires extensive trial-and-error selection, making the training process inefficient. In this work, we propose the Unified Budget-Aware (UBA) schedule, a theoretically grounded learning rate schedule that consistently outperforms commonly used schedules across diverse architectures and tasks under different constrained training budgets. First, we bridge the gap by constructing a novel training budget-aware optimization framework, which explicitly accounts for the robustness to landscape curvature variations. From this framework, we derive the UBA schedule, controlled by a single hyper-parameter $\varphi$ that provides a trade-off between flexibility and simplicity, eliminating the need for per-network numerical optimization. Moreover, we establish a theoretical connection between $\varphi$ and the condition number, adding interpretation and justification to our approach. We also prove convergence for different values of $\varphi$ and offer practical guidelines for its selection via theoretical analysis and empirical results. Extensive experimental results show that UBA consistently surpasses the commonly used schedules across diverse vision and language tasks, spanning network architectures (e.g., ResNet, OLMo) and scales, under different training-iteration budgets.
Article 700
Title@2025-06-10 (2): EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations
Title: EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations | EquivaMap: LLM-Leveraging für automatische Äquivalenzprüfung von Optimierungsformulierungen | Equivamap: 利用LMLMs对优化配方进行自动等效检查 2502.14760v2 |
Authors (4): Haotian Zhai, Connor Lawless, Ellen Vitercik, Liu Leqi
A fundamental problem in combinatorial optimization is identifying equivalent formulations. Despite the growing need for automated equivalence checks – driven, for example, by optimization copilots, which generate problem formulations from natural language descriptions – current approaches rely on simple heuristics that fail to reliably check formulation equivalence. Inspired by Karp reductions, in this work we introduce Quasi-Karp equivalence, a formal criterion for determining when two optimization formulations are equivalent based on the existence of a mapping between their decision variables. We propose EquivaMap, a framework that leverages large language models to automatically discover such mappings for scalable, reliable equivalence checking, with a verification stage that ensures mapped solutions preserve feasibility and optimality without additional solver calls. To evaluate our approach, we construct EquivaFormulation, the first open-source dataset of equivalent optimization formulations, generated by applying transformations such as adding slack variables or valid inequalities to existing formulations. Empirically, EquivaMap significantly outperforms existing methods, achieving substantial improvements in correctly identifying formulation equivalence.
Article 701
Title@2025-06-10 (2): High-Dimensional Independence Testing via Maximum and Average Distance Correlations
Title: High-Dimensional Independence Testing via Maximum and Average Distance Correlations | High-Dimensional Independence Testing über maximale und mittlere Distanzkorrelationen | 通过最大和平均距离交错进行高度多功能独立测试 2001.01095v4 |
Authors (2): Cencheng Shen, Yuexiao Dong
This paper investigates the utilization of maximum and average distance correlations for multivariate independence testing. We characterize their consistency properties in high-dimensional settings with respect to the number of marginally dependent dimensions, compare the advantages of each test statistic, examine their respective null distributions, and present a fast chi-square-based testing procedure. The resulting tests are non-parametric and applicable to both Euclidean distance and the Gaussian kernel as the underlying metric. To better understand the practical use cases of the proposed tests, we evaluate the empirical performance of the maximum distance correlation, average distance correlation, and the original distance correlation across various multivariate dependence scenarios, as well as conduct a real data experiment to test the presence of various cancer types and peptide levels in human plasma.
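As a rough illustration of the statistics involved, the sketch below computes the classical (biased) sample distance correlation and then takes its maximum and average over all pairs of marginal dimensions. Reading "maximum" and "average" as aggregates over marginal pairs is an assumption here, as are the function names. The paper's exact definitions and its chi-square test are not reproduced.

```python
import numpy as np

def _centered_dist(x):
    """Double-centered pairwise Euclidean distance matrix of a sample."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Biased sample distance correlation between two samples of equal length."""
    A, B = _centered_dist(x), _centered_dist(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))

def max_and_average_dcor(X, Y):
    """Maximum and mean of distance correlations over all marginal pairs (X_i, Y_j)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    vals = np.array([
        [distance_correlation(X[:, i], Y[:, j]) for j in range(Y.shape[1])]
        for i in range(X.shape[1])
    ])
    return float(vals.max()), float(vals.mean())
```

Intuitively, the maximum is sensitive to a single strongly dependent dimension, while the average responds when many dimensions are weakly dependent.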
Article 702
Title@2025-06-10 (2): Activated LoRA: Fine-tuned LLMs for Intrinsics
Title: Activated LoRA: Fine-tuned LLMs for Intrinsics | Aktivierte LoRA: Feinjustierte LLMs für die Intrinsik | 活性LORA: 用于内分泌的微调LMLM 2504.12397v4 |
Authors (9): Kristjan Greenewald, Luis Lastras, Thomas Parnell, Vraj Shah, Lucian Popa, Giulio Zizzo, Chulaka Gunasekara, Ambrish Rawat, David Cox
Low-Rank Adaptation (LoRA) has emerged as a highly efficient framework for finetuning the weights of large foundation models, and has become the go-to method for data-driven customization of LLMs. Despite the promise of highly customized behaviors and capabilities, switching between relevant LoRAs in a multiturn setting is inefficient, as the key-value (KV) cache of the entire turn history must be recomputed with the LoRA weights before generation can begin. To address this problem, we propose Activated LoRA (aLoRA), an adapter architecture which modifies the LoRA framework to only adapt weights for the tokens in the sequence after the aLoRA is invoked. This change crucially allows aLoRA to accept the base model’s KV cache of the input string, meaning that aLoRA can be instantly activated whenever needed in a chain without recomputing the cache. This enables building what we call intrinsics, i.e., specialized models invoked to perform well-defined operations on portions of an input chain or conversation that otherwise uses the base model by default. We train a set of aLoRA-based intrinsics models, demonstrating competitive accuracy with standard LoRA while achieving significant inference benefits. The codebase is at https://github.com/IBM/activated-lora.
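The core idea, adapting weights only for tokens that come after the adapter is invoked, can be sketched on a single projection. This is a toy illustration under stated assumptions, not the code from the linked repository: the function name, the shapes, and the `invoke_at` parameter are hypothetical.

```python
import numpy as np

def alora_projection(x, w, a, b, invoke_at):
    """Toy projection: base weights everywhere, low-rank update only from invoke_at on.

    x: (seq_len, d_in) token activations
    w: (d_in, d_out) frozen base weight
    a: (d_in, r), b: (r, d_out) low-rank adapter factors
    invoke_at: index of the first token that sees the adapter (hypothetical parameter)
    """
    base = x @ w
    # Boolean column mask: False for the prefix, True once the adapter is active.
    active = (np.arange(x.shape[0]) >= invoke_at)[:, None]
    return base + active * (x @ a @ b)
```

Because the rows before `invoke_at` are exactly the base model's outputs, key and value projections computed this way for the prefix match what the base model already cached, which is why no recomputation is needed in this sketch.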
Article 703
Title@2025-06-10 (2): Spectral invariance and maximality properties of the frequency spectrum of quantum neural networks
Title: Spectral invariance and maximality properties of the frequency spectrum of quantum neural networks | Spektrale Invarianz und maximale Eigenschaften des Frequenzspektrums quantenneuraler Netzwerke | 量子神经网络频谱的光谱性和最大度性 2402.14515v3 |
Authors (2): Patrick Holzer, Ivica Turkalj
Quantum Neural Networks (QNNs) are a popular approach in Quantum Machine Learning. We analyze the frequency spectrum of QNNs using the Minkowski sum for sets and the set of differences, which makes it particularly easy to express and calculate the frequency spectrum algebraically, and prove different maximality results for a large class of models. Furthermore, we prove that under some mild conditions there exists a bijection between classes of models with the same area $A:=R\cdot L$ that preserves the frequency spectrum, where $R$ denotes the number of qubits and $L$ the number of layers, which we consequently call spectral invariance under area-preserving transformations. With this we explain the symmetry in $R$ and $L$ in the results often observed in the literature and show that the maximal frequency spectrum depends only on the area $A=RL$ and not on the individual values of $R$ and $L$. Moreover, we collect and extend existing results and specify the maximum possible frequency spectrum of a QNN with arbitrarily many layers as a function of the spectrum of its generators. In the case of arbitrary dimensional generators, where our two introduced notions of maximality differ, we extend existing results based on the so-called Golomb ruler and introduce a second novel approach based on a variation of the turnpike problem, which we call the relaxed turnpike problem.
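A small sketch can show how the set of differences and the Minkowski sum combine into a frequency spectrum. It assumes, purely for illustration, that the same encoding generator acts on every qubit in every layer and that all frequencies are reachable, so the result is an upper bound rather than the spectrum of a particular trained circuit. Function names are hypothetical.

```python
from itertools import product

def difference_set(eigenvalues):
    """All pairwise differences a - b of the generator eigenvalues."""
    return {a - b for a, b in product(eigenvalues, repeat=2)}

def minkowski_sum(s, t):
    """Minkowski sum of two finite sets of numbers."""
    return {a + b for a, b in product(s, t)}

def frequency_spectrum(generator_eigenvalues, n_qubits, n_layers):
    """Largest possible spectrum when one generator is reused R * L times (assumption)."""
    step = difference_set(generator_eigenvalues)
    spectrum = {0}
    for _ in range(n_qubits * n_layers):
        spectrum = minkowski_sum(spectrum, step)
    return spectrum
```

With integer eigenvalues `(0, 1)`, the calls `frequency_spectrum((0, 1), 2, 3)` and `frequency_spectrum((0, 1), 3, 2)` return the same set, which mirrors the abstract's statement that the maximal spectrum depends only on the area $A = RL$.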
Article 704
Title@2025-06-10 (2): Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Title: Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO | Eintauchen in RL für Bildgenerierung mit CoT: Eine Studie über DPO vs. GRPO | 与COT一起为图像生成而跳入RL:关于DPO与GROP公司的研究 2505.17017v2 |
Authors (8): Chengzhuo Tong, Ziyu Guo, Renrui Zhang, Wenyu Shan, Xinyu Wei, Zhenghao Xing, Hongsheng Li, Pheng-Ann Heng
Recent advancements underscore the significant role of Reinforcement Learning (RL) in enhancing the Chain-of-Thought (CoT) reasoning capabilities of large language models (LLMs). Two prominent RL algorithms, Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO), are central to these developments, showcasing different pros and cons. Autoregressive image generation, also interpretable as a sequential CoT reasoning process, presents unique challenges distinct from LLM-based CoT reasoning. These encompass ensuring text-image consistency, improving image aesthetic quality, and designing sophisticated reward models, rather than relying on simpler rule-based rewards. While recent efforts have extended RL to this domain, these explorations typically lack an in-depth analysis of the domain-specific challenges and the characteristics of different RL strategies. To bridge this gap, we provide the first comprehensive investigation of the GRPO and DPO algorithms in autoregressive image generation, evaluating their in-domain performance and out-of-domain generalization, while scrutinizing the impact of different reward models on their respective capabilities. Our findings reveal that GRPO and DPO exhibit distinct advantages, and crucially, that reward models possessing stronger intrinsic generalization capabilities potentially enhance the generalization potential of the applied RL algorithms. Furthermore, we systematically explore three prevalent scaling strategies to enhance both their in-domain and out-of-domain proficiency, deriving unique insights into efficiently scaling performance for each paradigm. We hope our study paves a new path for inspiring future work on developing more effective RL algorithms to achieve robust CoT reasoning in the realm of autoregressive image generation. Code is released at https://github.com/ZiyuGuo99/Image-Generation-CoT
Article 705
Title@2025-06-10 (2): In Praise of Stubbornness: An Empirical Case for Cognitive-Dissonance Aware Continual Update of Knowledge in LLMs
Title: In Praise of Stubbornness: An Empirical Case for Cognitive-Dissonance Aware Continual Update of Knowledge in LLMs | In Lob der Stubbornness: Ein empirischer Fall für Kognitiv-Dissonanz Aware Continual Update of Knowledge in LLMs | 赞美固态:关于知识在LLM中不断更新的认知与不相识经验案例 2502.04390v2 |
Authors (6): Simone Clemente, Zied Ben Houidi, Alexis Huet, Dario Rossi, Giulio Franzese, Pietro Michiardi
Through systematic empirical investigation, we uncover a fundamental and concerning property of Large Language Models: while they can safely learn facts that don’t contradict their knowledge, attempting to update facts with contradictory information triggers catastrophic corruption of unrelated knowledge. Unlike humans, who naturally resist contradictory information, these models indiscriminately accept contradictions, leading to devastating interference, destroying up to 80% of unrelated knowledge even when learning as few as 10-100 contradicting facts. To understand whether this interference could be mitigated through selective plasticity, we experiment with targeted network updates, distinguishing between previously used (stubborn) and rarely used (plastic) neurons. We uncover another asymmetry: while sparing frequently-used neurons significantly improves retention of existing knowledge for non-contradictory updates (98% vs 93% with standard updates), contradictory updates trigger catastrophic interference regardless of targeting strategy. This effect, which persists across tested model scales (GPT-2 to GPT-J-6B), suggests a fundamental limitation in how neural networks handle contradictions. Finally, we demonstrate that contradictory information can be reliably detected (95%+ accuracy) using simple model features, offering a potential protective mechanism. These findings motivate new architectures that can, like humans, naturally resist contradictions rather than allowing destructive overwrites.
Article 706
Title@2025-06-10 (2): Improving the Noise Estimation of Latent Neural Stochastic Differential Equations
Title: Improving the Noise Estimation of Latent Neural Stochastic Differential Equations | Verbesserung der Lärmschätzung latenter neuraler stochastischer Differentialgleichungen | 改进对内心神经斯托卡式差异的噪音估计 2412.17499v2 |
Authors (4): Linus Heck, Maximilian Gelbrecht, Michael T. Schaub, Niklas Boers
Latent neural stochastic differential equations (SDEs) have recently emerged as a promising approach for learning generative models from stochastic time series data. However, they systematically underestimate the noise level inherent in such data, limiting their ability to capture stochastic dynamics accurately. We investigate this underestimation in detail and propose a straightforward solution: by including an explicit additional noise regularization in the loss function, we are able to learn a model that accurately captures the diffusion component of the data. We demonstrate our results on a conceptual model system that highlights the improved latent neural SDE’s capability to model stochastic bistable dynamics.
Article 707
Title@2025-06-10 (2): Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval
Title: Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval | Optimierte Text-Embedding-Modelle und Benchmarks für die Amharische Passage Retrieval | 阿姆光通过通过检索的最佳文本嵌入模型和基准 2505.19356v2 |
Authors (3): Kidist Amde Mekonnen, Yosef Worku Alemneh, Maarten de Rijke
Neural retrieval methods using transformer-based pre-trained language models have advanced multilingual and cross-lingual retrieval. However, their effectiveness for low-resource, morphologically rich languages such as Amharic remains underexplored due to data scarcity and suboptimal tokenization. We address this gap by introducing Amharic-specific dense retrieval models based on pre-trained Amharic BERT and RoBERTa backbones. Our proposed RoBERTa-Base-Amharic-Embed model (110M parameters) achieves a 17.6% relative improvement in MRR@10 and a 9.86% gain in Recall@10 over the strongest multilingual baseline, Arctic Embed 2.0 (568M parameters). More compact variants, such as RoBERTa-Medium-Amharic-Embed (42M), remain competitive while being over 13x smaller. Additionally, we train a ColBERT-based late interaction retrieval model that achieves the highest MRR@10 score (0.843) among all evaluated models. We benchmark our proposed models against both sparse and dense retrieval baselines to systematically assess retrieval effectiveness in Amharic. Our analysis highlights key challenges in low-resource settings and underscores the importance of language-specific adaptation. To foster future research in low-resource IR, we publicly release our dataset, codebase, and trained models at https://github.com/kidist-amde/amharic-ir-benchmarks.
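For readers unfamiliar with the reported metrics, the following sketch shows how MRR@10 and Recall@10 are conventionally computed. The data layout (a ranked list of passage ids per query and a set of relevant ids per query) and the function names are assumptions for illustration, not the benchmark's actual evaluation code.

```python
def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant id within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant ids that appear within the top k."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def mean_over_queries(metric, runs, qrels, k=10):
    """Average a per-query metric over all queries that have relevance labels."""
    return sum(metric(runs[q], qrels[q], k) for q in qrels) / len(qrels)
```

A relative improvement such as the 17.6% quoted above is then `(new - baseline) / baseline` computed on these averaged scores.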
Article 708
Title@2025-06-10 (2): syren-baryon: Analytic emulators for the impact of baryons on the matter power spectrum
Title: syren-baryon: Analytic emulators for the impact of baryons on the matter power spectrum | Syren-Baryon: Analytische Emulatoren für den Einfluss von Baryonen auf das Materie-Leistungsspektrum | syren-baryon: 用于分析男爵对物质能量频谱影响的分析模拟器 2506.08783v1 |
Authors (5): Lukas Kammerer, Deaglan J. Bartlett, Gabriel Kronberger, Harry Desmond, Pedro G. Ferreira
Baryonic physics has a considerable impact on the distribution of matter in our Universe on scales probed by current and future cosmological surveys, acting as a key systematic in such analyses. We seek simple symbolic parametrisations for the impact of baryonic physics on the matter power spectrum for a range of physically motivated models, as a function of wavenumber, redshift, cosmology, and parameters controlling the baryonic feedback. We use symbolic regression to construct analytic approximations for the ratio of the matter power spectrum in the presence of baryons to that without such effects. We obtain separate functions for each of four distinct sub-grid prescriptions of baryonic physics from the CAMELS suite of hydrodynamical simulations (Astrid, IllustrisTNG, SIMBA and Swift-EAGLE) as well as for a baryonification algorithm. We also provide functions which describe the uncertainty on these predictions, due to both the stochastic nature of baryonic physics and the errors on our fits. The error on our approximations to the hydrodynamical simulations is comparable to the sample variance estimated through varying initial conditions, and our baryonification expression has a root mean squared error of better than one percent, although this increases on small scales. These errors are comparable to those of previous numerical emulators for these models. Our expressions are constrained to have the physically correct behaviour on large scales and at high redshift. Due to their analytic form, we are able to directly interpret the impact of varying cosmology and feedback parameters, and we can identify parameters which have little to no effect. Each function is based on a different implementation of baryonic physics, and can therefore be used to discriminate between these models when applied to real data. We provide publicly available code for all symbolic approximations found.
Article 709
Title@2025-06-10 (2): LT-PINN: Lagrangian Topology-conscious Physics-informed Neural Network for Boundary-focused Engineering Optimization
Title: LT-PINN: Lagrangian Topology-conscious Physics-informed Neural Network for Boundary-focused Engineering Optimization | LT-PINN: Lagrangian Topologie-bewusste physik-informierte Neuronales Netzwerk für boundary-focused Engineering Optimization | LT-PINN:Lagrangian 地形 – – 具有意识的地形 – – 物理意识 – – 以边界为重点的工程优化知情神经网络 2506.06300v2 |
Authors (5): Yuanye Zhou, Zhaokun Wang, Kai Zhou, Hui Tang, Xiaofan Li
Physics-informed neural networks (PINNs) have emerged as a powerful meshless tool for topology optimization, capable of simultaneously determining optimal topologies and physical solutions. However, conventional PINNs rely on density-based topology descriptions, which necessitate manual interpolation and limit their applicability to complex geometries. To address this, we propose Lagrangian topology-conscious PINNs (LT-PINNs), a novel framework for boundary-focused engineering optimization. By parameterizing the control variables of topology boundary curves as learnable parameters, LT-PINNs eliminate the need for manual interpolation and enable precise boundary determination. We further introduce a specialized boundary condition loss function and a topology loss function to ensure sharp and accurate boundary representations, even for intricate topologies. The accuracy and robustness of LT-PINNs are validated via two types of partial differential equations (PDEs), including the elastic equation with Dirichlet boundary conditions and Laplace’s equation with Neumann boundary conditions. Furthermore, we demonstrate the effectiveness of LT-PINNs on more complex time-dependent and time-independent flow problems without relying on measurement data, and showcase their engineering application potential in flow velocity rearrangement, transforming a uniform upstream velocity into a sine-shaped downstream profile. The results demonstrate that (1) LT-PINNs achieve substantial reductions in relative L2 errors compared with the state-of-the-art density topology-oriented PINNs (DT-PINNs), (2) LT-PINNs can handle arbitrary boundary conditions, making them suitable for a wide range of PDEs, and (3) LT-PINNs can infer clear topology boundaries without manual interpolation, especially for complex topologies.
Article 710
Title@2025-06-10 (2): Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models
Title: Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models | Landsat-Bench: Datensätze und Benchmarks für Landsat-Stiftungsmodelle | Landsat-Bench:Landsat基金会模型数据集和基准 2506.08780v1 |
Authors (3): Isaac Corley, Lakshay Sharma, Ruth Crasto
The Landsat program offers over 50 years of globally consistent Earth imagery. However, the lack of benchmarks for this data constrains progress towards Landsat-based Geospatial Foundation Models (GFM). In this paper, we introduce Landsat-Bench, a suite of three benchmarks with Landsat imagery adapted from existing remote sensing datasets – EuroSAT-L, BigEarthNet-L, and LC100-L. We establish baseline and standardized evaluation methods across both common architectures and Landsat foundation models pretrained on the SSL4EO-L dataset. Notably, we provide evidence that SSL4EO-L pretrained GFMs extract better representations for downstream tasks than ImageNet-pretrained models, including performance gains of +4% OA and +5.1% mAP on EuroSAT-L and BigEarthNet-L, respectively.
Article 711
Title@2025-06-10 (2): Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch
Title: Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch | Silencing Empowerment, Allowing Bigotry: Auditierung der Moderation von Hass Rede auf Twitch | 静默赋权,允许偏见:审计对抽搐者仇恨言论的温和程度 2506.07667v2 |
Authors (6): Prarabdh Shukla, Wei Yin Chong, Yash Patel, Brennan Schaffner, Danish Pruthi, Arjun Bhagoji
To meet the demands of content moderation, online platforms have resorted to automated systems. Newer forms of real-time engagement (e.g., users commenting on live streams) on platforms like Twitch exert additional pressures on the latency expected of such moderation systems. Despite their prevalence, relatively little is known about the effectiveness of these systems. In this paper, we conduct an audit of Twitch’s automated moderation tool (AutoMod) to investigate its effectiveness in flagging hateful content. For our audit, we create streaming accounts to act as siloed test beds, and interface with the live chat using Twitch’s APIs to send over 107,000 comments collated from 4 datasets. We measure AutoMod’s accuracy in flagging blatantly hateful content containing misogyny, racism, ableism and homophobia. Our experiments reveal that a large fraction of hateful messages, up to 94% on some datasets, bypass moderation. Contextual addition of slurs to these messages results in 100% removal, revealing AutoMod’s reliance on slurs as a moderation signal. We also find that contrary to Twitch’s community guidelines, AutoMod blocks up to 89.5% of benign examples that use sensitive words in pedagogical or empowering contexts. Overall, our audit points to large gaps in AutoMod’s capabilities and underscores the importance for such systems to understand context effectively.
Article 712
Title@2025-06-10 (2): Data-Driven Discovery of PDEs via the Adjoint Method
Title: Data-Driven Discovery of PDEs via the Adjoint Method | Datengestützte Entdeckung von PDEs über die Adjoint-Methode | 通过 “ 联合方法 “ 发现项目设计书的数据驱动数据 2401.17177v4 |
Authors (3): Mohsen Sadr, Tony Tohme, Kamal Youcef-Toumi
In this work, we present an adjoint-based method for discovering the underlying governing partial differential equations (PDEs) given data. The idea is to consider a parameterized PDE in a general form and formulate a PDE-constrained optimization problem aimed at minimizing the error of the PDE solution from data. Using variational calculus, we obtain an evolution equation for the Lagrange multipliers (adjoint equations) allowing us to compute the gradient of the objective function with respect to the parameters of PDEs given data in a straightforward manner. In particular, we consider a family of parameterized PDEs encompassing linear, nonlinear, and spatial derivative candidate terms, and elegantly derive the corresponding adjoint equations. We show the efficacy of the proposed approach in identifying the form of the PDE up to machine accuracy, enabling the accurate discovery of PDEs from data. We also compare its performance with the famous PDE Functional Identification of Nonlinear Dynamics method known as PDE-FIND (Rudy et al., 2017), on both smooth and noisy data sets. Even though the proposed adjoint method relies on forward/backward solvers, it outperforms PDE-FIND for large data sets thanks to the analytic expressions for gradients of the cost function with respect to each PDE parameter.
Article 713
Title@2025-06-10 (2): TextAtari: 100K Frames Game Playing with Language Agents
Title: TextAtari: 100K Frames Game Playing with Language Agents | TextAtari: 100K Frames Spiel mit Sprachagenten | TextAtari: 100K 框架游戏与语言代理游戏 2506.04098v2 |
Authors (11): Wenhao Li, Wenwu Li, Chuyun Shen, Junjie Sheng, Zixiao Huang, Di Wu, Yun Hua, Wei Yin, Xiangfeng Wang, Hongyuan Zha, Bo Jin
We present TextAtari, a benchmark for evaluating language agents on very long-horizon decision-making tasks spanning up to 100,000 steps. By translating the visual state representations of classic Atari games into rich textual descriptions, TextAtari creates a challenging test bed that bridges sequential decision-making with natural language processing. The benchmark includes nearly 100 distinct tasks with varying complexity, action spaces, and planning horizons, all rendered as text through an unsupervised representation learning framework (AtariARI). We evaluate three open-source large language models (Qwen2.5-7B, Gemma-7B, and Llama3.1-8B) across three agent frameworks (zero-shot, few-shot chain-of-thought, and reflection reasoning) to assess how different forms of prior knowledge affect performance on these long-horizon challenges. Four scenarios (Basic, Obscured, Manual Augmentation, and Reference-based) investigate the impact of semantic understanding, instruction comprehension, and expert demonstrations on agent decision-making. Our results reveal significant performance gaps between language agents and human players in extensive planning tasks, highlighting challenges in sequential reasoning, state tracking, and strategic planning across tens of thousands of steps. TextAtari provides standardized evaluation protocols, baseline implementations, and a framework for advancing research at the intersection of language models and planning. Our code is available at https://github.com/Lww007/Text-Atari-Agents.
Article 714
Title@2025-06-10 (2): Paths to Causality: Finding Informative Subgraphs Within Knowledge Graphs for Knowledge-Based Causal Discovery
Title: Paths to Causality: Finding Informative Subgraphs Within Knowledge Graphs for Knowledge-Based Causal Discovery | Pfade zur Kausalität: Informative Subgraphen innerhalb von Wissensgraphen finden für wissensbasierte Kausale Entdeckung | 因果路径:在知识型因果发现的知识图中查找信息子集 2506.08771v1 |
Authors (2): Yuni Susanti, Michael Färber
Inferring causal relationships between variable pairs is crucial for understanding multivariate interactions in complex systems. Knowledge-based causal discovery – which involves inferring causal relationships by reasoning over the metadata of variables (e.g., names or textual context) – offers a compelling alternative to traditional methods that rely on observational data. However, existing methods using Large Language Models (LLMs) often produce unstable and inconsistent results, compromising their reliability for causal inference. To address this, we introduce a novel approach that integrates Knowledge Graphs (KGs) with LLMs to enhance knowledge-based causal discovery. Our approach identifies informative metapath-based subgraphs within KGs and further refines the selection of these subgraphs using Learning-to-Rank-based models. The top-ranked subgraphs are then incorporated into zero-shot prompts, improving the effectiveness of LLMs in inferring the causal relationship. Extensive experiments on biomedical and open-domain datasets demonstrate that our method outperforms most baselines by up to 44.4 points in F1 scores, evaluated across diverse LLMs and KGs. Our code and datasets are available on GitHub: https://github.com/susantiyuni/path-to-causality
Article 715
Title@2025-06-10 (2): CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries
Title: CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries | KLARIFY: Kontrastive Präferenz Verstärkungslernen für untangling Ambiguous Queries | CLARIFY: 拼凑不清的问询中竞争优惠强化学习 2506.00388v3 |
Authors (6): Ni Mu, Hao Hu, Xiao Hu, Yiqin Yang, Bo Xu, Qing-Shan Jia
Preference-based reinforcement learning (PbRL) bypasses explicit reward engineering by inferring reward functions from human preference comparisons, enabling better alignment with human intentions. However, humans often struggle to label a clear preference between similar segments, reducing label efficiency and limiting PbRL’s real-world applicability. To address this, we propose an offline PbRL method: Contrastive LeArning for ResolvIng Ambiguous Feedback (CLARIFY), which learns a trajectory embedding space that incorporates preference information, ensuring clearly distinguished segments are spaced apart, thus facilitating the selection of less ambiguous queries. Extensive experiments demonstrate that CLARIFY outperforms baselines in both non-ideal teacher and real human feedback settings. Our approach not only selects more clearly distinguished queries but also learns meaningful trajectory embeddings.
Article 716
Title@2025-06-10 (2): Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting
Title: Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting | Kann langsam denkende LLMs Vernunft im Laufe der Zeit? Empirische Studien in Zeitreihen Prognose | 时间序列预测的经验研究 2505.24511v2 |
Authors (3): Jiahao Wang, Mingyue Cheng, Qi Liu
Time series forecasting (TSF) is a fundamental and widely studied task, spanning methods from classical statistical approaches to modern deep learning and multimodal language modeling. Despite their effectiveness, these methods often follow a fast-thinking paradigm emphasizing pattern extraction and direct value mapping, while overlooking explicit reasoning over temporal dynamics and contextual dependencies. Meanwhile, emerging slow-thinking LLMs (e.g., ChatGPT-o1, DeepSeek-R1) have demonstrated impressive multi-step reasoning capabilities across diverse domains, suggesting a new opportunity for reframing TSF as a structured reasoning task. This motivates a key question: can slow-thinking LLMs effectively reason over temporal patterns to support time series forecasting, even in a zero-shot manner? To investigate this, we propose TimeReasoner, an extensive empirical study that formulates TSF as a conditional reasoning task. We design a series of prompting strategies to elicit inference-time reasoning from pretrained slow-thinking LLMs and evaluate their performance across diverse TSF benchmarks. Our findings reveal that slow-thinking LLMs exhibit non-trivial zero-shot forecasting capabilities, especially in capturing high-level trends and contextual shifts. While preliminary, our study surfaces important insights into the reasoning behaviors of LLMs in temporal domains, highlighting both their potential and limitations. We hope this work catalyzes further research into reasoning-based forecasting paradigms and paves the way toward more interpretable and generalizable TSF frameworks.
Article 717
Title@2025-06-10 (2): Detecting malignant dynamics on very few blood sample using signature coefficients
Title: Detecting malignant dynamics on very few blood sample using signature coefficients | Erkennung maligner Dynamik an sehr wenigen Blutproben mit Signaturkoeffizienten | 利用签字系数检测极少数血液样本中的恶性动态 2506.09097v1 |
Authors (2): Rémi Vaucher, Stéphane Chrétien
Recent discoveries have suggested that the promising avenue of using circulating tumor DNA (ctDNA) levels in blood samples provides reasonable accuracy for cancer monitoring, with extremely low burden on the patient’s side. It is known that the presence of ctDNA can result from various mechanisms leading to DNA release from cells, such as apoptosis, necrosis or active secretion. One key idea in recent cancer monitoring studies is that monitoring the dynamics of ctDNA levels might be sufficient for early multi-cancer detection. This interesting idea has been turned into commercial products, e.g. by the company GRAIL. In the present work, we propose to explore the use of Signature theory for detecting aggressive cancer tumors based on the analysis of blood samples. Our approach combines tools from continuous time Markov modelling for the dynamics of ctDNA levels in the blood, with Signature theory for building efficient testing procedures. Signature theory is a topic of growing interest in the Machine Learning community (see Chevyrev2016 and Fermanian2021), which is now recognised as a powerful feature extraction tool for irregularly sampled signals. The method proposed in the present paper is shown to correctly address the challenging problem of overcoming the inherent data scarcity due to the extremely small number of blood samples per patient. The relevance of our approach is illustrated with extensive numerical experiments that confirm the efficiency of the proposed pipeline.
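To make "signature coefficients" concrete, here is a minimal sketch that computes the first two levels of the signature of a piecewise-linear path, using Chen's identity to append one linear segment at a time. It is a generic illustration, not the paper's pipeline; the function name and the input layout (one row per sample, one column per channel) are assumptions.

```python
import numpy as np

def signature_level_2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: array of shape (n_points, d), one row per observation.
    Returns (s1, s2) with shapes (d,) and (d, d).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        # Chen's identity: cross term with the path so far, plus the
        # level-2 term of the new straight segment itself.
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)
        s1 += dx
    return s1, s2
```

For the path `[(0, 0), (1, 0), (1, 1)]` the first level is the total increment `(1, 1)` and the antisymmetric part of the second level gives a Lévy area of `0.5`. These terms depend only on the order of the points, not on their spacing in time, which is one reason signatures suit irregularly sampled series with very few observations.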
Article 718
Title@2025-06-10 (2): On the Stability of the Jacobian Matrix in Deep Neural Networks
Title: On the Stability of the Jacobian Matrix in Deep Neural Networks | Zur Stabilität der Jacobian Matrix in tiefen neuralen Netzwerken | 深神经网络Jacobian矩阵的稳定 2506.08764v1 |
Authors (5): Benjamin Dadoun, Soufiane Hayou, Hanan Salam, Mohamed El Amine Seddik, Pierre Youssef
Deep neural networks are known to suffer from exploding or vanishing gradients as depth increases, a phenomenon closely tied to the spectral behavior of the input-output Jacobian. Prior work has identified critical initialization schemes that ensure Jacobian stability, but these analyses are typically restricted to fully connected networks with i.i.d. weights. In this work, we go significantly beyond these limitations: we establish a general stability theorem for deep neural networks that accommodates sparsity (such as that introduced by pruning) and non-i.i.d., weakly correlated weights (e.g. induced by training). Our results rely on recent advances in random matrix theory, and provide rigorous guarantees for spectral stability in a much broader class of network models. This extends the theoretical foundation for initialization schemes in modern neural networks with structured and dependent randomness.
Article 719
Title@2025-06-10 (2): EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements
Title: EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements | EDINET-Bench: Bewertung von LLMs zu komplexen Finanzaufgaben anhand japanischer Finanzausweise | EDINET-Bench:利用日本财务报表评估复杂金融任务LMs 2506.08762v1 |
Authors (7): Issa Sugiura, Takashi Ishida, Taro Makino, Chieko Tazuke, Takanori Nakagawa, Kosuke Nakago, David Ha
Financial analysis presents complex challenges that could leverage large language model (LLM) capabilities. However, the scarcity of challenging financial datasets, particularly for Japanese financial data, impedes academic innovation in financial analytics. As LLMs advance, this lack of accessible research resources increasingly hinders their development and evaluation in this specialized domain. To address this gap, we introduce EDINET-Bench, an open-source Japanese financial benchmark designed to evaluate the performance of LLMs on challenging financial tasks including accounting fraud detection, earnings forecasting, and industry prediction. EDINET-Bench is constructed by downloading annual reports from the past 10 years from Japan’s Electronic Disclosure for Investors’ NETwork (EDINET) and automatically assigning labels corresponding to each evaluation task. Our experiments reveal that even state-of-the-art LLMs struggle, performing only slightly better than logistic regression in binary classification for fraud detection and earnings forecasting. These results highlight significant challenges in applying LLMs to real-world financial applications and underscore the need for domain-specific adaptation. Our dataset, benchmark construction code, and evaluation code are publicly available to facilitate future research in finance with LLMs.
Article 720
Title@2025-06-10 (2): Intra-Trajectory Consistency for Reward Modeling
Title: Intra-Trajectory Consistency for Reward Modeling | Intra-Trajektorien-Konsistenz für Prämienmodellierung | 奖励建模在轨内兼容性 2506.09096v1 |
Authors (7): Chaoyang Zhou, Shunyu Liu, Zengmao Wang, Di Wang, Rong-Cheng Tu, Bo Du, Dacheng Tao
Reward models are critical for improving large language models (LLMs), particularly in reinforcement learning from human feedback (RLHF) or inference-time verification. Current reward modeling typically relies on scores of overall responses to learn the outcome rewards for the responses. However, since the response-level scores are coarse-grained supervision signals, the reward model struggles to identify the specific components within a response trajectory that truly correlate with the scores, leading to poor generalization on unseen responses. In this paper, we propose to leverage generation probabilities to establish reward consistency between processes in the response trajectory, which allows the response-level supervisory signal to propagate across processes, thereby providing additional fine-grained signals for reward learning. Building on analysis under the Bayesian framework, we develop an intra-trajectory consistency regularization to enforce that adjacent processes with higher next-token generation probability maintain more consistent rewards. We apply the proposed regularization to the advanced outcome reward model, improving its performance on RewardBench. We also show that the reward model trained with the proposed regularization induces better DPO-aligned policies and achieves better best-of-N (BON) inference-time verification results. Our code is available at https://github.com/chaoyang101/ICRM.
Article 721
Title@2025-06-10 (2): Enhancing Accuracy and Maintainability in Nuclear Plant Data Retrieval: A Function-Calling LLM Approach Over NL-to-SQL
Title: Enhancing Accuracy and Maintainability in Nuclear Plant Data Retrieval: A Function-Calling LLM Approach Over NL-to-SQL | Verbesserung der Genauigkeit und Aufrechterhaltung des Datenrücklaufs von Kernkraftwerken: Ein funktionsorientierter LLM-Ansatz über NL-zu-SQL | 提高核电厂数据检索的准确性和可维护性:NL-SQL的功能计算LLM法 2506.08757v1 |
Authors (5): Mishca de Costa, Muhammad Anwar, Dave Mercier, Mark Randall, Issam Hammad
Retrieving operational data from nuclear power plants requires exceptional accuracy and transparency due to the criticality of the decisions it supports. Traditionally, natural language to SQL (NL-to-SQL) approaches have been explored for querying such data. While NL-to-SQL promises ease of use, it poses significant risks: end-users cannot easily validate generated SQL queries, and legacy nuclear plant databases – often complex and poorly structured – complicate query generation due to decades of incremental modifications. These challenges increase the likelihood of inaccuracies and reduce trust in the approach. In this work, we propose an alternative paradigm: leveraging function-calling large language models (LLMs) to address these challenges. Instead of directly generating SQL queries, we define a set of pre-approved, purpose-specific functions representing common use cases. Queries are processed by invoking these functions, which encapsulate validated SQL logic. This hybrid approach mitigates the risks associated with direct NL-to-SQL translations by ensuring that SQL queries are reviewed and optimized by experts before deployment. While this strategy introduces the upfront cost of developing and maintaining the function library, we demonstrate how NL-to-SQL tools can assist in the initial generation of function code, allowing experts to focus on validation rather than creation. Our study includes a performance comparison between direct NL-to-SQL generation and the proposed function-based approach, highlighting improvements in accuracy and maintainability. This work underscores the importance of balancing user accessibility with operational safety and provides a novel, actionable framework for robust data retrieval in critical systems.
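The function-calling pattern described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `readings` table, the `get_max_reading` function, the `APPROVED_FUNCTIONS` registry, and the call format are all hypothetical, and an in-memory SQLite database stands in for a plant database. The point it shows is that the model only selects a function name and arguments, while the SQL itself is fixed and reviewed in advance.

```python
import sqlite3


def get_max_reading(conn, sensor_id, start, end):
    """Pre-approved query: maximum reading for one sensor in a time window."""
    row = conn.execute(
        "SELECT MAX(value) FROM readings "
        "WHERE sensor_id = ? AND ts BETWEEN ? AND ?",
        (sensor_id, start, end),
    ).fetchone()
    return row[0]


# Registry of expert-reviewed functions (hypothetical); the model may only
# choose among these names and never writes SQL itself.
APPROVED_FUNCTIONS = {"get_max_reading": get_max_reading}


def dispatch(conn, call):
    """Run a model-proposed call of the form {"name": ..., "arguments": {...}}."""
    name = call["name"]
    if name not in APPROVED_FUNCTIONS:
        raise ValueError(f"function not approved: {name}")
    return APPROVED_FUNCTIONS[name](conn, **call["arguments"])


def demo_connection():
    """Build a tiny in-memory database with made-up readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor_id TEXT, ts TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [
            ("pump_1", "2025-01-01", 2.0),
            ("pump_1", "2025-01-02", 3.5),
            ("pump_2", "2025-01-01", 9.0),
        ],
    )
    return conn
```

Because the query text is parameterized and fixed, an end user can audit the small function library once instead of validating every generated SQL statement.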
Article 722
Title@2025-06-10 (2): Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
Title: Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction | Denken vs. Tun: Agenten, die Grund durch Skalierung Test-Zeit-Interaktion | 思考与做:通过扩大试验时间互动而有理由的代理商 2506.07976v2 |
Authors (11): Junhong Shen, Hao Bai, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar
The current paradigm of test-time scaling relies on generating long reasoning traces (“thinking” more) before producing a response. In agent problems that require interaction, this can be done by generating thinking traces before acting in the world. However, this process does not allow agents to acquire new information from the environment or adapt their behavior over time. In this work, we propose to scale test-time interaction, an untapped dimension of test-time scaling that increases the agent’s interaction horizon to enable running rich behaviors such as exploration, backtracking, and dynamic re-planning within a single rollout. To demonstrate the promise of this scaling dimension, we study the domain of web agents. We first show that even prompting-based interaction scaling without any training can improve task success on web benchmarks non-trivially. Building on this, we introduce TTI (Test-Time Interaction), a curriculum-based online reinforcement learning (RL) approach that trains agents by adaptively adjusting their rollout lengths. Using a Gemma 3 12B model, TTI produces state-of-the-art open-source, open-data web agents on WebVoyager and WebArena benchmarks. We further show that TTI enables agents to balance exploration and exploitation adaptively. Our results establish interaction scaling as a powerful, complementary axis to scaling per-step compute, offering new avenues for training adaptive agents.
Article 723
Title@2025-06-10 (2): Understanding High-Dimensional Bayesian Optimization
Title: Understanding High-Dimensional Bayesian Optimization | High-Dimensional Bayesian Optimierung verstehen | 理解高多样性巴耶斯最佳优化 2502.09198v2 |
Authors (3): Leonard Papenmeier, Matthias Poloczek, Luigi Nardi
Recent work reported that simple Bayesian optimization (BO) methods perform well for high-dimensional real-world tasks, seemingly contradicting prior work and tribal knowledge. This paper investigates why. We identify underlying challenges that arise in high-dimensional BO and explain why recent methods succeed. Our empirical analysis shows that vanishing gradients caused by Gaussian process (GP) initialization schemes play a major role in the failures of high-dimensional Bayesian optimization (HDBO) and that methods that promote local search behaviors are better suited for the task. We find that maximum likelihood estimation (MLE) of GP length scales suffices for state-of-the-art performance. Based on this, we propose a simple variant of MLE called MSR that leverages these findings to achieve state-of-the-art performance on a comprehensive set of real-world applications. We present targeted experiments to illustrate and confirm our findings.
Article 724
Title@2025-06-10 (2): Superposed Parameterised Quantum Circuits
Title: Superposed Parameterised Quantum Circuits | Überlagerte parametrische Quantenkreise | 量子电路 2506.08749v1 |
Authors (3): Viktoria Patapovich, Mo Kordzanganeh, Alexey Melnikov
Quantum machine learning has shown promise for high-dimensional data analysis, yet many existing approaches rely on linear unitary operations and shared trainable parameters across outputs. These constraints limit expressivity and scalability relative to the multi-layered, non-linear architectures of classical deep networks. We introduce superposed parameterised quantum circuits to overcome these limitations. By combining flip-flop quantum random-access memory with repeat-until-success protocols, a superposed parameterised quantum circuit embeds an exponential number of parameterised sub-models in a single circuit and induces polynomial activation functions through amplitude transformations and post-selection. We provide an analytic description of the architecture, showing how multiple parameter sets are trained in parallel while non-linear amplitude transformations broaden representational power beyond conventional quantum kernels. Numerical experiments underscore these advantages: on a 1D step-function regression, a two-qubit superposed parameterised quantum circuit cuts the mean-squared error by three orders of magnitude versus a parameter-matched variational baseline; on a 2D star-shaped classification task, introducing a quadratic activation lifts accuracy to 81.4% and reduces run-to-run variance three-fold. These results position superposed parameterised quantum circuits as a hardware-efficient route toward deeper, more versatile parameterised quantum circuits capable of learning complex decision boundaries.
Article 725
Title@2025-06-10 (2): Towards Secure and Private Language Models for Nuclear Power Plants
Title: Towards Secure and Private Language Models for Nuclear Power Plants | Auf dem Weg zu sicheren und privaten Sprachmodellen für Kernkraftwerke | 争取实现核电厂安全和私用语言模式 2506.08746v1 |
Authors (4): Muhammad Anwar, Mishca de Costa, Issam Hammad, Daniel Lau
This paper introduces a domain-specific Large Language Model for nuclear applications, built from the publicly accessible Essential CANDU textbook. Drawing on a compact Transformer-based architecture, the model is trained on a single GPU to protect the sensitive data inherent in nuclear operations. Despite relying on a relatively small dataset, it shows encouraging signs of capturing specialized nuclear vocabulary, though the generated text sometimes lacks syntactic coherence. By focusing exclusively on nuclear content, this approach demonstrates the feasibility of in-house LLM solutions that align with rigorous cybersecurity and data confidentiality standards. Early successes in text generation underscore the model’s utility for specialized tasks, while also revealing the need for richer corpora, more sophisticated preprocessing, and instruction fine-tuning to enhance domain accuracy. Future directions include extending the dataset to cover diverse nuclear subtopics, refining tokenization to reduce noise, and systematically evaluating the model’s readiness for real-world applications in the nuclear domain.
Article 726
Title@2025-06-10 (2): Bridging RDF Knowledge Graphs with Graph Neural Networks for Semantically-Rich Recommender Systems
Title: Bridging RDF Knowledge Graphs with Graph Neural Networks for Semantically-Rich Recommender Systems | Bridging RDF Knowledge Graphs mit Graph Neural Networks für Semantically-Rich Recommender Systeme | 将RDF知识图与Smantic-Rich建议系统图形神经网络连接起来 2506.08743v1 |
Authors (3): Michael Färber, David Lamprecht, Yuni Susanti
Graph Neural Networks (GNNs) have substantially advanced the field of recommender systems. However, despite the creation of more than a thousand knowledge graphs (KGs) under the W3C standard RDF, their rich semantic information has not yet been fully leveraged in GNN-based recommender systems. To address this gap, we propose a comprehensive integration of RDF KGs with GNNs that utilizes both the topological information from RDF object properties and the content information from RDF datatype properties. Our main focus is an in-depth evaluation of various GNNs, analyzing how different semantic feature initializations and types of graph structure heterogeneity influence their performance in recommendation tasks. Through experiments across multiple recommendation scenarios involving multi-million-node RDF graphs, we demonstrate that harnessing the semantic richness of RDF KGs significantly improves recommender systems and lays the groundwork for GNN-based recommender systems for the Linked Open Data cloud. The code and data are available on our GitHub repository: https://github.com/davidlamprecht/rdf-gnn-recommendation
Article 727
Title@2025-06-10 (2): Urban Incident Prediction with Graph Neural Networks: Integrating Government Ratings and Crowdsourced Reports
Title: Urban Incident Prediction with Graph Neural Networks: Integrating Government Ratings and Crowdsourced Reports | Urban Incident Prediction mit Graph Neural Networks: Integration von Regierungsbewertungen und Crowdsourced Reports | 与图形神经网络城市事件预测:整合政府评级和众包报告 2506.08740v1 |
Authors (5): Sidhika Balachandar, Shuvom Sadhuka, Bonnie Berger, Emma Pierson, Nikhil Garg
Graph neural networks (GNNs) are widely used in urban spatiotemporal forecasting, such as predicting infrastructure problems. In this setting, government officials wish to know in which neighborhoods incidents like potholes or rodent issues occur. The true state of incidents (e.g., street conditions) for each neighborhood is observed via government inspection ratings. However, these ratings are only conducted for a sparse set of neighborhoods and incident types. We also observe the state of incidents via crowdsourced reports, which are more densely observed but may be biased due to heterogeneous reporting behavior. First, for such settings, we propose a multiview, multioutput GNN-based model that uses both unbiased rating data and biased reporting data to predict the true latent state of incidents. Second, we investigate a case study of New York City urban incidents and collect, standardize, and make publicly available a dataset of 9,615,863 crowdsourced reports and 1,041,415 government inspection ratings over 3 years and across 139 types of incidents. Finally, we show on both real and semi-synthetic data that our model can better predict the latent state compared to models that use only reporting data or models that use only rating data, especially when rating data is sparse and reports are predictive of ratings. We also quantify demographic biases in crowdsourced reporting, e.g., higher-income neighborhoods report problems at higher rates. Our analysis showcases a widely applicable approach for latent state prediction using heterogeneous, sparse, and biased data.
Article 728
Title@2025-06-10 (2): Exploration by Random Reward Perturbation
Title: Exploration by Random Reward Perturbation | Exploration durch zufällige Belohnung Störung | 利用随机奖励干扰探测 2506.08737v1 |
Authors (5): Haozhe Ma, Guoji Fu, Zhengding Luo, Jiele Wu, Tze-Yun Leong
We introduce Random Reward Perturbation (RRP), a novel exploration strategy for reinforcement learning (RL). Our theoretical analyses demonstrate that adding zero-mean noise to environmental rewards effectively enhances policy diversity during training, thereby expanding the range of exploration. RRP is fully compatible with action-perturbation-based exploration strategies, such as $\epsilon$-greedy, stochastic policies, and entropy regularization, providing additive improvements to exploration effects. It is general, lightweight, and can be integrated into existing RL algorithms with minimal implementation effort and negligible computational overhead. RRP establishes a theoretical connection between reward shaping and noise-driven exploration, highlighting their complementary potential. Experiments show that RRP significantly boosts the performance of Proximal Policy Optimization and Soft Actor-Critic, achieving higher sample efficiency and escaping local optima across various tasks, under both sparse and dense reward scenarios.
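The core idea stated above, adding zero-mean noise to environment rewards, can be sketched in a few lines. This is only an illustration under assumptions: the abstract does not specify the noise distribution or any schedule, so the Gaussian choice, the `sigma` value, and the function names `perturb_reward` and `perturb_batch` are hypothetical and should not be read as the authors' implementation.

```python
import random


def perturb_reward(reward, sigma=0.1, rng=random):
    """Return the reward plus zero-mean Gaussian noise (sigma is an assumed scale)."""
    return reward + rng.gauss(0.0, sigma)


def perturb_batch(rewards, sigma=0.1, seed=None):
    """Perturb a list of rewards with a dedicated, optionally seeded generator."""
    rng = random.Random(seed)
    return [perturb_reward(r, sigma, rng) for r in rewards]
```

Since the noise has zero mean, the expected reward is unchanged; only its sampled value varies between updates, which is what lets the perturbation sit alongside action-level exploration such as $\epsilon$-greedy.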
Article 729
Title@2025-06-10 (2): scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data
Title: scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | scSSL-Bench: Benchmarking Selbstüberwachtes Lernen für Single-Cell-Daten | scSSL-Bench:为单一分类数据制定自我监督学习基准 2506.10031v1 |
Authors (7): Olga Ovcharenko, Florian Barkmann, Philip Toma, Imant Daunhawer, Julia Vogt, Sebastian Schelter, Valentina Boeva
Self-supervised learning (SSL) has proven to be a powerful approach for extracting biologically meaningful representations from single-cell data. To advance our understanding of SSL methods applied to single-cell data, we present scSSL-Bench, a comprehensive benchmark that evaluates nineteen SSL methods. Our evaluation spans nine datasets and focuses on three common downstream tasks: batch correction, cell type annotation, and missing modality prediction. Furthermore, we systematically assess various data augmentation strategies. Our analysis reveals task-specific trade-offs: the specialized single-cell frameworks, scVI, CLAIRE, and the finetuned scGPT excel at uni-modal batch correction, while generic SSL methods, such as VICReg and SimCLR, demonstrate superior performance in cell typing and multi-modal data integration. Random masking emerges as the most effective augmentation technique across all tasks, surpassing domain-specific augmentations. Notably, our results indicate the need for a specialized single-cell multi-modal data integration framework. scSSL-Bench provides a standardized evaluation platform and concrete recommendations for applying SSL to single-cell analysis, advancing the convergence of deep learning and single-cell genomics.
Article 730
Title@2025-06-10 (2): Fusing Global and Local: Transformer-CNN Synergy for Next-Gen Current Estimation
Title: Fusing Global and Local: Transformer-CNN Synergy for Next-Gen Current Estimation | Fusing Global und Local: Transformer-CNN-Synergie für die aktuelle Schätzung von Next-Gen | 全球和局部引信: 变异器-CNN 用于下金当前估算的协同效应 2504.07996v2 |
Authors (8): Junlang Huang, Hao Chen, Li Luo, Yong Cai, Lexin Zhang, Tianhao Ma, Yitian Zhang, Zhong Guan
This paper presents a hybrid model combining Transformer and CNN for predicting the current waveform in signal lines. Unlike traditional approaches such as current source models, driver linear representations, waveform functional fitting, or equivalent load capacitance methods, our model does not rely on fixed simplified models of standard-cell drivers or RC loads. Instead, it replaces the complex Newton iteration process used in traditional SPICE simulations, leveraging the powerful sequence modeling capabilities of the Transformer framework to directly predict current responses without iterative solving steps. The hybrid architecture effectively integrates the global feature-capturing ability of Transformers with the local feature extraction advantages of CNNs, significantly improving the accuracy of current waveform predictions. Experimental results demonstrate that, compared to traditional SPICE simulations, the proposed algorithm achieves an error of only 0.0098. These results highlight the algorithm’s superior capabilities in predicting signal line current waveforms, timing analysis, and power evaluation, making it suitable for a wide range of technology nodes, from 40nm to 3nm.
Article 731
Title@2025-06-10 (2): Flexible and Efficient Drift Detection without Labels
Title: Flexible and Efficient Drift Detection without Labels | Flexible und effiziente Drift-Erkennung ohne Etiketten | 无标签的灵活而高效的漂流探测 2506.08734v1 |
Authors (4): Nelvin Tan, Yu-Ching Shih, Dong Yang, Amol Salunkhe
Machine learning models are increasingly used to automate decisions in almost every domain, and ensuring the performance of these models is crucial for delivering high-quality machine-learning-enabled services. Detecting concept drift early is thus of the highest importance. Much research on concept drift has focused on the supervised case, which assumes that the true labels of supervised tasks are available immediately after making predictions. Controlling for false positives while periodically monitoring the performance of predictive models used to make inferences from extremely large datasets, where the true labels are not instantly available, becomes extremely challenging. We propose a flexible and efficient concept drift detection algorithm that uses classical statistical process control in a label-less setting to accurately detect concept drifts. We show empirically that under computational constraints, our approach has better statistical power than previously known methods. Furthermore, we introduce a new drift detection framework to model the scenario of detecting drift (without labels) given prior detections, and show how our drift detection algorithm can be incorporated effectively into this framework. We demonstrate promising performance via numerical simulations.
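To make the phrase "classical statistical process control in a label-less setting" concrete, here is a generic Shewhart-style control chart applied to a label-free statistic, such as the mean predicted score per batch. It is a textbook technique swapped in for illustration, not the algorithm proposed in the paper; the names `control_limits` and `detect_drift` and the multiplier `k=3.0` are assumptions.

```python
import statistics


def control_limits(reference_stats, k=3.0):
    """Compute lower and upper control limits from in-control batch statistics."""
    center = statistics.fmean(reference_stats)
    spread = statistics.stdev(reference_stats)
    return center - k * spread, center + k * spread


def detect_drift(reference_stats, new_stats, k=3.0):
    """Return indices of new batches whose statistic falls outside the limits."""
    lower, upper = control_limits(reference_stats, k)
    return [i for i, s in enumerate(new_stats) if not lower <= s <= upper]
```

For example, `reference_stats` could hold the mean model score of each batch observed during a period known to be stable, and `new_stats` the same statistic for incoming batches. No true labels are needed at any point.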
Article 732
Title@2025-06-10 (2): Mitigating fairwashing using Two-Source Audits
Title: Mitigating fairwashing using Two-Source Audits | Fairwashing durch Zwei-Quellen-Audits abmildern | 利用双重来源审计减少洗水 2305.13883v2 |
Authors (4): Jade Garcia Bourrée, Erwan Le Merrer, Gilles Tredan, Benoît Rottembourg
Recent legislation requires online platforms to provide dedicated APIs to assess the compliance of their decision-making algorithms with the law. Research has nevertheless shown that the auditors of such platforms are prone to manipulation (a practice referred to as fairwashing). To address this salient problem, recent work has considered audits under the assumption of partial knowledge of the platform’s internal mechanisms. In this paper, we propose a more pragmatic approach with the Two-Source Audit setup: while still leveraging the API, we advocate for the addition of a second source of data to perform both the audit of a platform and the detection of fairwashing attempts. Our method is based on identifying discrepancies between the two data sources, using data proxies used in the fairness literature. We formally demonstrate the conditions for success in this fairwashing mitigation task. We then validate our method empirically, demonstrating that Two-Source Audits can achieve a Pareto-optimal balance between the two objectives. We believe this paper sets the stage for reliable audits in manipulation-prone setups, under mild assumptions.
Article 733
Title@2025-06-10 (2): Breaking the ICE: Exploring promises and challenges of benchmarks for Inference Carbon & Energy estimation for LLMs
Title: Breaking the ICE: Exploring promises and challenges of benchmarks for Inference Carbon & Energy estimation for LLMs | Breaking the ICE: Erforschen von Versprechungen und Herausforderungen von Benchmarks für Inferenz-Kohlenstoff- & Energieschätzungen für LLMs | 打破ICE:探索LLMM的碳和能源估算基准的许诺和挑战 2506.08727v1 |
Authors (8): Samarth Sikand, Rohit Mehra, Priyavanshi Pathania, Nikhil Bamby, Vibhu Saujanya Sharma, Vikrant Kaulgud, Sanjay Podder, Adam P. Burden
While Generative AI stands to be one of the fastest adopted technologies ever, studies have made evident that the usage of Large Language Models (LLMs) puts a significant burden on energy grids and our environment. It may prove a hindrance to the Sustainability goals of any organization. A crucial step in any Sustainability strategy is monitoring or estimating the energy consumption of various components. While there exist multiple tools for monitoring energy consumption, there is a dearth of tools and frameworks for estimating the consumption or carbon emissions. Current drawbacks of both monitoring and estimation tools include the large number of input data points required, their intrusive nature, and high error margins. We posit that leveraging emerging LLM benchmarks and related data points can help overcome the aforementioned challenges while balancing the accuracy of the emission estimations. To that end, we discuss the challenges of current approaches and present our evolving framework, R-ICE, which estimates prompt-level inference carbon emissions by leveraging existing state-of-the-art (SOTA) benchmarks. This direction provides a more practical and non-intrusive way to enable emerging use cases like dynamic LLM routing, carbon accounting, etc. Our promising validation results suggest that benchmark-based modelling holds great potential for inference emission estimation and warrants further exploration from the scientific community.
Article 734
Title@2025-06-10 (2): Stop Misusing t-SNE and UMAP for Visual Analytics
Title: Stop Misusing t-SNE and UMAP for Visual Analytics | Stoppen Sie die missbräuchliche Verwendung von t-SNE und UMAP für Visual Analytics | 停止滥用 t- SNE 和视觉分析分析 UMAP 2506.08725v1 |
Authors (4): Hyeon Jeon, Jeongin Park, Sungbok Shin, Jinwook Seo
Misuses of t-SNE and UMAP in visual analytics have become increasingly common. For example, although t-SNE and UMAP projections often do not faithfully reflect true distances between clusters, practitioners frequently use them to investigate inter-cluster relationships. In this paper, we bring this issue to the surface and comprehensively investigate why such misuse occurs and how to prevent it. We conduct a literature review of 114 papers to verify the prevalence of the misuse and analyze the reasoning behind it. We then execute an interview study to uncover practitioners’ implicit motivations for using these techniques – rationales often undisclosed in the literature. Our findings indicate that misuse of t-SNE and UMAP primarily stems from limited discourse on their appropriate use in visual analytics. We conclude by proposing future directions and concrete action items to promote more reasonable use of dimensionality reduction (DR).
Article 735
Title@2025-06-10 (2): VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
Title: VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs? | VIST-GPT: Sich in der Ära des Visual Storytelling mit LLMs niederlassen? | 使用LLMS的视觉故事时代? 2504.19267v3 |
Authors (5): Mohamed Gado, Towhid Taliee, Muhammad Memon, Dmitry Ignatov, Radu Timofte
Visual storytelling is an interdisciplinary field combining computer vision and natural language processing to generate cohesive narratives from sequences of images. This paper presents a novel approach that leverages recent advancements in multimodal models, specifically adapting transformer-based architectures and large multimodal models, for the visual storytelling task. Leveraging the large-scale Visual Storytelling (VIST) dataset, our VIST-GPT model produces visually grounded, contextually appropriate narratives. We address the limitations of traditional evaluation metrics, such as BLEU, METEOR, ROUGE, and CIDEr, which are not suitable for this task. Instead, we utilize RoViST and GROOVIST, novel reference-free metrics designed to assess visual storytelling, focusing on visual grounding, coherence, and non-redundancy. These metrics provide a more nuanced evaluation of narrative quality, aligning closely with human judgment.
Article 736
Title@2025-06-10 (2): Strain Problems got you in a Twist? Try StrainRelief: A Quantum-Accurate Tool for Ligand Strain Calculations
Title: Strain Problems got you in a Twist? Try StrainRelief: A Quantum-Accurate Tool for Ligand Strain Calculations | Strain Probleme haben Sie in einem Twist? Versuchen Sie StrainRelief: Ein Quantum-Accurate-Tool für Ligand Strain Berechnungen | Strain Relief: Ligand Strain 计算 的量子精确工具 2503.13352v2 |
Authors (3): Ewan R. S. Wallace, Nathan C. Frey, Joshua A. Rackers
Ligand strain energy, the energy difference between the bound and unbound conformations of a ligand, is an important component of structure-based small molecule drug design. A large majority of observed ligands in protein-small molecule co-crystal structures bind in low-strain conformations, making strain energy a useful filter for structure-based drug design. In this work we present a tool for calculating ligand strain with high accuracy. StrainRelief uses a MACE Neural Network Potential (NNP), trained on a large database of Density Functional Theory (DFT) calculations, to estimate ligand strain of neutral molecules with quantum accuracy. We show that this tool estimates strain energy differences relative to DFT to within 1.4 kcal/mol, more accurately than alternative NNPs. These results highlight the utility of NNPs in drug discovery, and provide a useful tool for drug discovery teams.
Article 737
Title@2025-06-10 (2): TimeWak: Temporal Chained-Hashing Watermark for Time Series Data
Title: TimeWak: Temporal Chained-Hashing Watermark for Time Series Data | TimeWak: Temporales Ketten-Hashing-Wasserzeichen für Zeitreihendaten | TimeWak:用于时间序列数据的时链套装水印 2506.06407v2 |
Authors (7): Zhi Wen Soi, Chaoyi Zhu, Fouad Abiad, Aditya Shankar, Jeroen M. Galjaard, Huijuan Wang, Lydia Y. Chen
Synthetic time series generated by diffusion models enable sharing privacy-sensitive datasets, such as patients’ functional MRI records. Key criteria for synthetic data include high data utility and traceability to verify the data source. Recent watermarking methods embed in homogeneous latent spaces, but state-of-the-art time series generators operate in real space, making latent-based watermarking incompatible. This creates the challenge of watermarking directly in real space while handling feature heterogeneity and temporal dependencies. We propose TimeWak, the first watermarking algorithm for multivariate time series diffusion models. To handle temporal dependence and spatial heterogeneity, TimeWak embeds a temporal chained-hashing watermark directly within the real temporal-feature space. The other unique feature is the $\epsilon$-exact inversion, which addresses the non-uniform reconstruction error distribution across features from inverting the diffusion process to detect watermarks. We derive the error bound of inverting multivariate time series and further maintain high watermark detectability. We extensively evaluate TimeWak on its impact on synthetic data quality, watermark detectability, and robustness under various post-editing attacks, against 5 datasets and baselines of different temporal lengths. Our results show that TimeWak achieves improvements of 61.96% in context-FID score, and 8.44% in correlational scores against the state-of-the-art baseline, while remaining consistently detectable.
Article 738
Title@2025-06-10 (2): ASIDE: Architectural Separation of Instructions and Data in Language Models
Title: ASIDE: Architectural Separation of Instructions and Data in Language Models | ASIDE: Architekturelle Trennung von Instruktionen und Daten in Sprachmodellen | AIDIDE: 语言模式教学和数据结构分类 2503.10566v3 |
Authors (8): Egor Zverev, Evgenii Kortukov, Alexander Panfilov, Alexandra Volkova, Soroush Tabesh, Sebastian Lapuschkin, Wojciech Samek, Christoph H. Lampert
Despite their remarkable performance, large language models lack elementary safety features, making them susceptible to numerous malicious attacks. In particular, previous work has identified the absence of an intrinsic separation between instructions and data as a root cause of the success of prompt injection attacks. In this work, we propose a new architectural element, ASIDE, that allows language models to clearly separate instructions and data at the level of embeddings. ASIDE applies an orthogonal rotation to the embeddings of data tokens, thus creating clearly distinct representations of instructions and data tokens without introducing any additional parameters. As we demonstrate experimentally across a range of models, instruction-tuning LLMs with ASIDE (1) leads to highly increased instruction-data separation without a loss in model utility and (2) makes the models more robust to prompt injection benchmarks, even without dedicated safety training. Additionally, we provide insights into the mechanism underlying our method through an analysis of the model representations. The source code and training scripts are openly accessible at https://github.com/egozverev/aside.
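The mechanism described above, an orthogonal rotation applied only to the embeddings of data tokens, can be sketched without any learned parameters. The sketch below assumes one particular orthogonal map, a 90-degree rotation within each consecutive coordinate pair, and uses plain Python lists in place of a real embedding table. The function names `rotate_pairs_90` and `embed_with_roles` and the per-token `is_data` flags are hypothetical; the authors' repository should be consulted for the actual method.

```python
def rotate_pairs_90(vec):
    """Rotate each consecutive (x, y) coordinate pair by 90 degrees: (x, y) -> (-y, x)."""
    if len(vec) % 2:
        raise ValueError("embedding dimension must be even")
    out = []
    for i in range(0, len(vec), 2):
        x, y = vec[i], vec[i + 1]
        out.extend([-y, x])
    return out


def embed_with_roles(token_vectors, is_data):
    """Rotate embeddings of data tokens; leave instruction tokens unchanged."""
    return [
        rotate_pairs_90(v) if flag else list(v)
        for v, flag in zip(token_vectors, is_data)
    ]
```

Under this assumed rotation, the same token receives embeddings of equal norm but orthogonal direction depending on whether it is marked as instruction or data, which is one way to obtain distinct representations with no additional parameters.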
Article 739
Title@2025-06-10 (2): A Survey on Quantum Machine Learning: Current Trends, Challenges, Opportunities, and the Road Ahead
Title: A Survey on Quantum Machine Learning: Current Trends, Challenges, Opportunities, and the Road Ahead | Eine Umfrage zum Quantum Machine Learning: Aktuelle Trends, Herausforderungen, Chancen und die Zukunft | 关于 “ 量子机器学习:当前趋势、挑战、机遇和路前道路 “ 的调查 2310.10315v4 |
Authors (4): Kamila Zaman, Alberto Marchisio, Muhammad Abdullah Hanif, Muhammad Shafique
Quantum Computing (QC) claims to improve the efficiency of solving complex problems, compared to classical computing. When QC is integrated with Machine Learning (ML), it creates a Quantum Machine Learning (QML) system. This paper aims to provide a thorough understanding of the foundational concepts of QC and its notable advantages over classical computing. Following this, we delve into the key aspects of QML in a detailed and comprehensive manner. In this survey, we investigate a variety of QML algorithms, discussing their applicability across different domains. We examine quantum datasets, highlighting their unique characteristics and advantages. The survey also covers the current state of hardware technologies, providing insights into the latest advancements and their implications for QML. Additionally, we review the software tools and simulators available for QML development, discussing their features and usability. Furthermore, we explore practical applications of QML, illustrating how it can be leveraged to solve real-world problems more efficiently than classical ML methods. This survey aims to consolidate the current landscape of QML and outline key opportunities and challenges for future research.
Article 740
Title@2025-06-10 (2): Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data
Title: Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data | Intelligenter zusammenführen, besser verallgemeinern: Modell verbessern Zusammenführen auf OOD-Daten | 合并的 “ 聪明者 “ 、 “ 更普遍化:加强OOD数据模型合并 “ 2506.09093v1 |
Authors (8): Bingjie Zhang, Hongkang Li, Changlong Shi, Guowei Rong, He Zhao, Dongsheng Wang, Dandan Guo, Meng Wang
Multi-task learning (MTL) concurrently trains a model on diverse task datasets to exploit common features, thereby improving overall performance across the tasks. Recent studies have dedicated efforts to merging multiple independent model parameters into a unified model for MTL, thus circumventing the need for training data and expanding the scope of applicable scenarios of MTL. However, current approaches to model merging predominantly concentrate on enhancing performance within in-domain (ID) datasets, often overlooking their efficacy on out-of-domain (OOD) datasets. In this work, we propose LwPTV (Layer-wise Pruning Task Vector), which builds a saliency score that measures the redundancy of parameters in task vectors. With this score, our method obtains a mask vector for each task and performs layer-wise pruning on the task vectors, keeping only the pre-trained model parameters at the corresponding layers of the merged model. Owing to its flexibility, our method can be seamlessly integrated with most existing model merging methods to improve their performance on OOD tasks. Extensive experiments demonstrate that applying our method yields substantial improvements in OOD performance while preserving performance on ID tasks.
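Layer-wise pruning of task vectors can be sketched as follows. The abstract does not define the saliency score, so this sketch assumes a stand-in (mean absolute change per layer) and a fixed keep ratio; `task_vector`, `layer_mask`, and `merge` are hypothetical helpers, and the merge itself is plain task-vector addition rather than any specific published method.

```python
def task_vector(finetuned, pretrained):
    """Task vector = fine-tuned weights minus pre-trained weights, per layer."""
    return {k: [f - p for f, p in zip(finetuned[k], pretrained[k])]
            for k in pretrained}

def layer_mask(tv, keep_ratio=0.5):
    """Keep the layers with the highest saliency (assumed: mean |delta|)."""
    scores = {k: sum(abs(x) for x in v) / len(v) for k, v in tv.items()}
    n_keep = max(1, round(keep_ratio * len(scores)))
    kept = set(sorted(scores, key=scores.get, reverse=True)[:n_keep])
    return {k: (k in kept) for k in tv}

def merge(pretrained, task_vectors, masks, scale=1.0):
    """Add only unpruned layer deltas; pruned layers stay at pre-trained values."""
    merged = {}
    for k, base in pretrained.items():
        delta = [0.0] * len(base)
        for tv, mask in zip(task_vectors, masks):
            if mask[k]:
                delta = [d + x for d, x in zip(delta, tv[k])]
        merged[k] = [b + scale * d for b, d in zip(base, delta)]
    return merged

# Toy two-layer model: layer "l1" barely changes, layer "l2" changes a lot.
pretrained = {"l1": [0.0, 0.0], "l2": [1.0, 1.0]}
finetuned = {"l1": [0.1, 0.1], "l2": [2.0, 2.0]}

tv = task_vector(finetuned, pretrained)
mask = layer_mask(tv, keep_ratio=0.5)
merged = merge(pretrained, [tv], [mask])
```

In this toy case the low-saliency layer `l1` is pruned and falls back to its pre-trained weights, while `l2` receives the full task-vector update.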
Article 741
Title@2025-06-10 (2): Variational Autoencoder-Based Approach to Latent Feature Analysis on Efficient Representation of Power Load Monitoring Data
Title: Variational Autoencoder-Based Approach to Latent Feature Analysis on Efficient Representation of Power Load Monitoring Data | Variationeller Autoencoder-basierter Ansatz zur Latent Feature Analyse zur effizienten Darstellung von Power Load Monitoring Daten | 以变式自动编码器为基础,对电荷载量监测数据高效代表情况进行长期特征分析的变式自动编码器方法 2506.08698v1 |
Authors (2): Boyu Xie, Tangtang Xie
With the development of smart grids, High-Dimensional and Incomplete (HDI) Power Load Monitoring (PLM) data challenges the performance of Power Load Forecasting (PLF) models. In this paper, we propose VAE-LF, a latent feature model based on the Variational Autoencoder (VAE), for efficiently representing PLM data and completing its missing entries. VAE-LF splits the HDI PLM data into vectors, feeds them sequentially into an Encoder-Decoder structure to learn a low-dimensional latent representation, and generates the completed data. Experiments on the UK-DALE dataset show that VAE-LF outperforms other benchmark models in both the 5% and 10% sparsity test cases, with significantly lower RMSE and MAE, and that its advantage is largest on data with a low sparsity ratio. The method provides an efficient data-completion solution for electric load management in smart grids.
Article 742
Title@2025-06-10 (2): Self-Training Elicits Concise Reasoning in Large Language Models
Title: Self-Training Elicits Concise Reasoning in Large Language Models | Selbst-Trainings-Elizite schließen Vernunft in großen Sprachmodellen ein | 大语言模式中的简明理由 2502.20122v3 |
Authors (6): Tergel Munkhbat, Namgyu Ho, Seo Hyun Kim, Yongjin Yang, Yujin Kim, Se-Young Yun
Chain-of-thought (CoT) reasoning has enabled large language models (LLMs) to utilize additional computation through intermediate tokens to solve complex tasks. However, we posit that typical reasoning traces contain many redundant tokens, incurring extraneous inference costs. Upon examination of the output distribution of current LLMs, we find evidence of their latent ability to reason more concisely than their default behavior suggests. To elicit this capability, we propose simple fine-tuning methods that leverage self-generated concise reasoning paths obtained by best-of-N sampling and few-shot conditioning, in task-specific settings. Our combined method achieves a 30% reduction in output tokens on average, across five model families on GSM8K and MATH, while maintaining average accuracy. By exploiting the fundamental stochasticity and in-context learning capabilities of LLMs, our self-training approach robustly elicits concise reasoning on a wide range of models, including those with extensive post-training. Code is available at https://github.com/TergelMunkhbat/concise-reasoning
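The best-of-N step can be illustrated with a small selection routine. This is a sketch under stated assumptions: it picks the shortest sample whose final answer matches the reference, with length measured in whitespace-separated tokens. The abstract does not spell out the selection rule, and `shortest_correct` and the sample format are hypothetical.

```python
def shortest_correct(samples, gold):
    """Among N sampled reasoning paths, return the shortest one that is correct.

    Returns None when no sample reaches the reference answer.
    """
    correct = [s for s in samples if s["answer"] == gold]
    if not correct:
        return None
    return min(correct, key=lambda s: len(s["reasoning"].split()))

# Toy samples for one question (assumed format: reasoning text plus final answer).
samples = [
    {"reasoning": "First note that 12 is 3 times 4 so the total is 3 * 4 = 12", "answer": "12"},
    {"reasoning": "3 * 4 = 12", "answer": "12"},
    {"reasoning": "3 + 4 = 7", "answer": "7"},
]

best = shortest_correct(samples, gold="12")
```

Paths selected this way would then serve as fine-tuning targets, so the model learns to produce the concise form by default.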
Article 743
Title@2025-06-10 (2): CUDA-LLM: LLMs Can Write Efficient CUDA Kernels
Title: CUDA-LLM: LLMs Can Write Efficient CUDA Kernels | CUDA-LLM: LLMs können effiziente CUDA-Kernel schreiben | CUDA-LLM: 能够写写有效CUDA核心的LLMss 2506.09092v1 |
Authors (5): Wentao Chen, Jiace Zhu, Qi Fan, Yehan Ma, An Zou
Large Language Models (LLMs) have demonstrated strong capabilities in general-purpose code generation. However, generating code that is deeply hardware-specific, architecture-aware, and performance-critical, especially for massively parallel GPUs, remains a complex challenge. In this work, we explore the use of LLMs for the automated generation and optimization of CUDA programs, with the goal of producing high-performance GPU kernels that fully exploit the underlying hardware. To address this challenge, we propose a novel framework called Feature Search and Reinforcement (FSR). FSR jointly optimizes compilation and functional correctness, as well as the runtime performance, which are validated through extensive and diverse test cases, and measured by actual kernel execution latency on the target GPU, respectively. This approach enables LLMs not only to generate syntactically and semantically correct CUDA code but also to iteratively refine it for efficiency, tailored to the characteristics of the GPU architecture. We evaluate FSR on representative CUDA kernels, covering AI workloads and computationally intensive algorithms. Our results show that LLMs augmented with FSR consistently guarantee correctness rates. Meanwhile, the automatically generated kernels can outperform general human-written code by a factor of up to 179$\times$ in execution speed. These findings highlight the potential of combining LLMs with performance reinforcement to automate GPU programming for hardware-specific, architecture-sensitive, and performance-critical applications.
Article 744
Title@2025-06-10 (2): Variational Inference Optimized Using the Curved Geometry of Coupled Free Energy
Title: Variational Inference Optimized Using the Curved Geometry of Coupled Free Energy | Variationelle Schlussfolgerung optimiert mit der gekrümmten Geometrie der gekoppelten freien Energie | 使用共同自由能源曲线几何法优化 2506.09091v1 |
Authors (5): Kenric Nelson, Igor Oliveira, Amenah Al-Najafi, Fode Zhang, Hon Keung Tony Ng
We introduce an optimization framework for variational inference based on the coupled free energy, extending variational inference techniques to account for the curved geometry of the coupled exponential family. This family includes important heavy-tailed distributions such as the generalized Pareto and the Student’s t. By leveraging the coupled free energy, which is equal to the coupled evidence lower bound (ELBO) of the inverted probabilities, we improve the accuracy and robustness of the learned model. The coupled generalization of Fisher Information metric and the affine connection. The method is applied to the design of a coupled variational autoencoder (CVAE). By using the coupling for both the distributions and cost functions, the reconstruction metric is derived to still be the mean-square average loss with modified constants. The novelty comes from sampling the heavy-tailed latent distribution with its associated coupled probability, which has faster decaying tails. The result is the ability to train a model with high penalties in the tails, while assuring that the training samples have a reduced number of outliers. The Wasserstein-2 or Fréchet Inception Distance of the reconstructed CelebA images shows the CVAE has a 3% improvement over the VAE after 5 epochs of training.
Article 745
Title@2025-06-10 (2): Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?
Title: Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning? | Optuna vs Code Llama: Sind LLMs ein neues Paradigma für Hyperparameter-Tuning? | Optuna 与 Code Llama:LLMs是超参数图例的新范式吗? 2504.06006v3 |
Authors (5): Roman Kochnev, Arash Torabi Goodarzi, Zofia Antonina Bentyn, Dmitry Ignatov, Radu Timofte
Optimal hyperparameter selection is critical for maximizing neural network performance, especially as models grow in complexity. This work investigates the viability of leveraging large language models (LLMs) for hyperparameter optimization by fine-tuning a parameter-efficient version of Code Llama using LoRA. The adapted LLM is capable of generating accurate and efficient hyperparameter recommendations tailored to diverse neural network architectures. Unlike traditional approaches such as Optuna, which rely on computationally intensive trial-and-error procedures, our method achieves competitive or superior results in terms of Root Mean Square Error (RMSE) while significantly reducing computational overhead. Our findings demonstrate that LLM-based optimization not only matches the performance of state-of-the-art techniques like Tree-structured Parzen Estimators (TPE) but also substantially accelerates the tuning process. This positions LLMs as a promising alternative for rapid experimentation, particularly in resource-constrained environments such as edge devices and mobile platforms, where computational efficiency is essential. In addition to improved efficiency, the method offers time savings and consistent performance across various tasks, highlighting its robustness and generalizability. All generated hyperparameters are included in the LEMUR Neural Network (NN) Dataset, which is publicly available and serves as an open-source benchmark for hyperparameter optimization research.
Article 746
Title@2025-06-10 (2): RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
Title: RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling | RuleReasoner: Verstärkte regelbasiertes Reasoning via Domain-aware Dynamic Sampling | 规则响应器:通过Domain-aware动态抽样加强基于规则的理由说明 2506.08672v1 |
Authors (3): Yang Liu, Jiaqi Li, Zilong Zheng
Rule-based reasoning has been acknowledged as one of the fundamental problems in reasoning, while deviations in rule formats, types, and complexity in real-world applications pose severe challenges. Recent studies have shown that large reasoning models (LRMs) have remarkable reasoning capabilities, and their performance is substantially enhanced by reinforcement learning (RL). However, it remains an open question whether small reasoning models (SRMs) can learn rule-based reasoning effectively with robust generalization across diverse tasks and domains. To address this, we introduce Reinforced Rule-based Reasoning, a.k.a. RuleReasoner, a simple yet effective method to conduct rule-based reasoning via a wide collection of curated tasks and a novel domain-aware dynamic sampling approach. Specifically, RuleReasoner resamples each training batch by updating the sampling weights of different domains based on historical rewards. This facilitates domain augmentation and flexible online learning schedules for RL, obviating the need for pre-hoc human-engineered mix-training recipes used in existing methods. Empirical evaluations on in-distribution (ID) and out-of-distribution (OOD) benchmarks reveal that RuleReasoner outperforms frontier LRMs by a significant margin ($\Delta$4.1% average points on eight ID tasks and $\Delta$10.4% average points on three OOD tasks over OpenAI-o1). Notably, our approach also exhibits higher computational efficiency compared to prior dynamic sampling methods for RL.
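The idea of re-weighting domains from historical rewards can be sketched as below. The update rule is an assumption made for illustration (a softmax over one minus the mean reward, so weaker domains are sampled more often); the abstract does not give the actual formula, and `update_weights`, `sample_batch`, and the domain names are hypothetical.

```python
import math
import random

def update_weights(reward_history, temperature=0.5):
    """Turn per-domain reward histories into sampling weights.

    Assumed rule: softmax over (1 - mean reward) / temperature, which gives
    more probability mass to domains where the policy currently does worse.
    """
    avg = {d: sum(r) / len(r) for d, r in reward_history.items()}
    logits = {d: (1.0 - a) / temperature for d, a in avg.items()}
    z = sum(math.exp(v) for v in logits.values())
    return {d: math.exp(v) / z for d, v in logits.items()}

def sample_batch(pools, weights, batch_size, rng):
    """Draw a training batch: pick a domain by weight, then a task from its pool."""
    domains = list(pools)
    picks = rng.choices(domains, weights=[weights[d] for d in domains], k=batch_size)
    return [(d, rng.choice(pools[d])) for d in picks]

# Toy reward histories in [0, 1] for two hypothetical domains.
history = {"logic": [0.9, 0.8, 1.0], "arithmetic": [0.2, 0.3, 0.1]}
pools = {"logic": ["l1", "l2"], "arithmetic": ["a1", "a2", "a3"]}

weights = update_weights(history)
batch = sample_batch(pools, weights, batch_size=8, rng=random.Random(0))
```

Recomputing the weights after each batch removes the need for a hand-tuned, fixed mixing ratio across domains.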
Article 747
Title@2025-06-10 (2): sparseGeoHOPCA: A Geometric Solution to Sparse Higher-Order PCA Without Covariance Estimation
Title: sparseGeoHOPCA: A Geometric Solution to Sparse Higher-Order PCA Without Covariance Estimation | spärlichGeoHOPCA: Eine geometrische Lösung, um PCA ohne Kovarianzabschätzung höher zu bestellen | 稀多GeoHOPCA:无变化估计的普通五氯苯甲醚的几何解决办法 2506.08670v1 |
Authors (6): Renjie Xu, Chong Wu, Maolin Che, Zhuoheng Ran, Yimin Wei, Hong Yan
We propose sparseGeoHOPCA, a novel framework for sparse higher-order principal component analysis (SHOPCA) that introduces a geometric perspective to high-dimensional tensor decomposition. By unfolding the input tensor along each mode and reformulating the resulting subproblems as structured binary linear optimization problems, our method transforms the original nonconvex sparse objective into a tractable geometric form. This eliminates the need for explicit covariance estimation and iterative deflation, enabling significant gains in both computational efficiency and interpretability, particularly in high-dimensional and unbalanced data scenarios. We theoretically establish the equivalence between the geometric subproblems and the original SHOPCA formulation, and derive worst-case approximation error bounds based on classical PCA residuals, providing data-dependent performance guarantees. The proposed algorithm achieves a total computational complexity of $O\left(\sum_{n=1}^{N} (k_n^3 + J_n k_n^2)\right)$, which scales linearly with tensor size. Extensive experiments demonstrate that sparseGeoHOPCA accurately recovers sparse supports in synthetic settings, preserves classification performance under 10$\times$ compression, and achieves high-quality image reconstruction on ImageNet, highlighting its robustness and versatility.
Article 748
Title@2025-06-10 (2): Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search
Title: Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | Mit Blueprints und Prompt Template Search die Fähigkeit kleiner Sprachmodelle zu verbessern | 提高有蓝图和快速模板搜索的小型语言模式的理性能力 2506.08669v1 |
Authors (10): Dongge Han, Menglin Xia, Daniel Madrigal Diaz, Samuel Kessler, Ankur Mallick, Xuchao Zhang, Mirian Del Carmen Hipolito Garcia, Jin Xu, Victor Rühle, Saravan Rajmohan
Small language models (SLMs) offer promising and efficient alternatives to large language models (LLMs). However, SLMs’ limited capacity restricts their reasoning capabilities and makes them sensitive to prompt variations. To address these challenges, we propose a novel framework that enhances SLM reasoning capabilities through LLM generated blueprints. The blueprints provide structured, high-level reasoning guides that help SLMs systematically tackle related problems. Furthermore, our framework integrates a prompt template search mechanism to mitigate the SLMs’ sensitivity to prompt variations. Our framework demonstrates improved SLM performance across various tasks, including math (GSM8K), coding (MBPP), and logic reasoning (BBH). Our approach improves the reasoning capabilities of SLMs without increasing model size or requiring additional training, offering a lightweight and deployment-friendly solution for on-device or resource-constrained environments.
Article 749
Title@2025-06-10 (2): Optimizing Learned Image Compression on Scalar and Entropy-Constraint Quantization
Title: Optimizing Learned Image Compression on Scalar and Entropy-Constraint Quantization | Optimierung der Bildkompression auf Scalar- und Entropie-Konstraint-Quantisierung | 优化 Scalar 和 Entropy- Contrpy- 约束量化的学习图像压缩 2506.08662v1 |
Authors (6): Florian Borzechowski, Michael Schäfer, Heiko Schwarz, Jonathan Pfaff, Detlev Marpe, Thomas Wiegand
The continuous improvements in image compression with variational autoencoders have led to learned codecs that are competitive with conventional approaches in terms of rate-distortion efficiency. Nonetheless, taking the quantization into account during the training process remains a problem, since it produces zero derivatives almost everywhere and needs to be replaced with a differentiable approximation that allows end-to-end optimization. Though there are different methods for approximating the quantization, none of them models the quantization noise correctly, and thus they result in suboptimal networks. Hence, we propose an additional finetuning training step: after conventional end-to-end training, parts of the network are retrained on quantized latents obtained at the inference stage. For entropy-constraint quantizers like Trellis-Coded Quantization, the impact of the quantizer is particularly difficult to approximate by rounding or adding noise, as the quantized latents are interdependently chosen through a trellis search based on both the entropy model and a distortion measure. We show that retraining on correctly quantized data consistently yields additional coding gain for both uniform scalar and especially for entropy-constraint quantization, without increasing inference complexity. For the Kodak test set, we obtain average savings between 1% and 2%, and for the TecNick test set up to 2.2% in terms of Bjøntegaard-Delta bitrate.
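The training problem described above can be made concrete with a small numerical check. The sketch below shows that rounding has a zero finite-difference slope away from the rounding boundaries, and contrasts it with the additive uniform-noise surrogate mentioned in the abstract. All function names are hypothetical, and the snippet covers only scalar rounding, not trellis-coded quantization or the proposed finetuning step.

```python
import random

def hard_quantize(latents):
    """Inference-time quantizer: round each latent to the nearest integer."""
    return [float(round(v)) for v in latents]

def noisy_proxy(latents, rng):
    """Training-time surrogate: add uniform noise in [-0.5, 0.5] instead of rounding."""
    return [v + rng.uniform(-0.5, 0.5) for v in latents]

def finite_diff(f, x, eps=1e-3):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

rng = random.Random(0)
latents = [0.3, 1.7, -2.2]

# Rounding is piecewise constant, so its slope is zero almost everywhere.
grad_round = finite_diff(lambda v: float(round(v)), 0.3)

# With the noise sample held fixed, the surrogate is the identity plus a constant.
grad_proxy = finite_diff(lambda v: v + 0.1, 0.3)

quantized = hard_quantize(latents)
noisy = noisy_proxy(latents, rng)
```

The surrogate passes gradients but its outputs differ from the values the decoder sees at inference, which is the mismatch the finetuning step on truly quantized latents is meant to remove.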
Article 750
Title@2025-06-10 (2): Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness
Title: Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness | Auf dem Weg zu robuster Echtzeit-Multivariate-Zeitreihenprognose: Ein einheitliches Framework für Abhängigkeit, Asynchronität und Missingness | 走向强劲的现实世界多变时间序列预测:关于依赖、同步和失踪的统一框架 2506.08660v1 |
Authors (4): Jinkwan Jang, Hyungjin Park, Jinmyeong Choi, Taesup Kim
Real-world time series data are inherently multivariate, often exhibiting complex inter-channel dependencies. Each channel is typically sampled at its own period and is prone to missing values due to various practical and operational constraints. These characteristics pose fundamental challenges related to channel dependency, sampling asynchrony, and missingness, all of which must be addressed to enable robust and reliable forecasting in practical settings. However, most existing architectures are built on oversimplified assumptions, such as identical sampling periods across channels and fully observed inputs at test time, which often do not hold in real-world scenarios. To bridge this gap, we propose ChannelTokenFormer, a Transformer-based forecasting model with a flexible architecture designed to explicitly capture cross-channel interactions, accommodate channel-wise asynchronous sampling, and effectively handle missing values. Extensive experiments on three benchmark datasets modified to reflect practical settings, along with one real-world industrial dataset, demonstrate the superior robustness and accuracy of ChannelTokenFormer under challenging real-world conditions.
Article 751
Title@2025-06-10 (2): Integrating Asynchronous AdaBoost into Federated Learning: Five Real World Applications
Title: Integrating Asynchronous AdaBoost into Federated Learning: Five Real World Applications | Integration von Asynchron AdaBoost in das Federated Learning: Fünf Anwendungen in der realen Welt | 将AdaBoost亚洲同步体纳入联邦学习:五个真实世界应用 2506.09090v1 |
Authors (2): Arthur Oghlukyan, Nuria Gomez Blas
This paper presents a comprehensive analysis of an enhanced asynchronous AdaBoost framework for federated learning (FL), focusing on its application across five distinct domains: computer vision on edge devices, blockchain-based model transparency, on-device mobile personalization, IoT anomaly detection, and federated healthcare diagnostics. The proposed algorithm incorporates adaptive communication scheduling and delayed weight compensation to reduce synchronization frequency and communication overhead while preserving or improving model accuracy. We examine how these innovations improve communication efficiency, scalability, convergence, and robustness in each domain. Comparative metrics including training time, communication overhead, convergence iterations, and classification accuracy are evaluated using data and estimates derived from Oghlukyan’s enhanced AdaBoost framework. Empirical results show, for example, training time reductions on the order of 20-35% and communication overhead reductions of 30-40% compared to baseline AdaBoost, with convergence achieved in significantly fewer boosting rounds. Tables and charts summarize these improvements by domain. Mathematical formulations of the adaptive scheduling rule and error-driven synchronization thresholds are provided. Overall, the enhanced AdaBoost exhibits markedly improved efficiency and robustness across diverse FL scenarios, suggesting broad applicability of the approach.
Article 752
Title@2025-06-10 (2): Scalable Graph Attention-based Instance Selection via Mini-Batch Sampling and Hierarchical Hashing
Title: Scalable Graph Attention-based Instance Selection via Mini-Batch Sampling and Hierarchical Hashing | Skalierbare Grafik aufmerksamkeitsbasierte Instanzenauswahl über Mini-Batch-Sampling und Hierarchical Hashing | 通过小型批量抽样和等级类散列选择基于可缩放图表的注意点选案 2502.20293v2 |
Authors (3): Zahiriddin Rustamov, Ayham Zaitouny, Nazar Zaki
Instance selection (IS) addresses the critical challenge of reducing dataset size while keeping informative characteristics, becoming increasingly important as datasets grow to millions of instances. Current IS methods often struggle to capture complex relationships in high-dimensional spaces and to scale to large datasets. This paper introduces a graph attention-based instance selection (GAIS) method that uses attention mechanisms to identify informative instances through their structural relationships in graph representations. We present two approaches for scalable graph construction: a distance-based mini-batch sampling technique that achieves dataset-size-independent complexity through strategic batch processing, and a hierarchical hashing approach that enables efficient similarity computation through random projections. The mini-batch approach preserves class distributions through stratified sampling, while the hierarchical hashing method captures relationships at multiple granularities through single-level, multi-level, and multi-view variants. Experiments across 39 datasets show that GAIS achieves reduction rates above 96% while maintaining or improving model performance relative to state-of-the-art IS methods. The findings show that the distance-based mini-batch approach offers the best efficiency for large-scale datasets, while multi-view variants excel on complex, high-dimensional data, demonstrating that attention-based importance scoring can effectively identify instances important for maintaining decision boundaries while avoiding computationally prohibitive pairwise comparisons.
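Hashing by random projections can be sketched as follows. This is a generic sign-of-projection scheme (a standard locality-sensitive hash for angular similarity), shown here as an assumption about what a single-level variant might look like; it is not the GAIS implementation, and `make_hyperplanes`, `hash_key`, and `bucketize` are hypothetical names.

```python
import random
from collections import defaultdict

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_key(x, planes):
    """One bit per hyperplane: which side of the plane the point falls on."""
    return tuple(int(sum(w * v for w, v in zip(p, x)) >= 0.0) for p in planes)

def bucketize(points, planes):
    """Group point indices by hash key; only same-bucket points need comparing."""
    buckets = defaultdict(list)
    for i, x in enumerate(points):
        buckets[hash_key(x, planes)].append(i)
    return dict(buckets)

planes = make_hyperplanes(dim=3, n_bits=4)
points = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0]]
buckets = bucketize(points, planes)
```

Points pointing in the same direction share a bucket, so graph edges can be built within buckets instead of over all pairs. Using several independent sets of hyperplanes would be one way to approximate the multi-view idea.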
Article 753
Title@2025-06-10 (2): When Simple Model Just Works: Is Network Traffic Classification in Crisis?
Title: When Simple Model Just Works: Is Network Traffic Classification in Crisis? | Wenn einfaches Modell gerade funktioniert: Ist Netzwerkverkehrsklassifikation in der Krise? | ” 简单模型 “ 何时起作用:网络交通分类是否处于危机之中? 2506.08655v1 |
Authors (6): Kamil Jerabek, Jan Luxemburk, Richard Plny, Josef Koumar, Jaroslav Pesek, Karel Hynek
Machine learning has been applied to network traffic classification (TC) for over two decades. While early efforts used shallow models, the late 2010s saw a shift toward complex neural networks, often reporting near-perfect accuracy. However, it was recently revealed that a simple k-NN baseline using packet sequence metadata (sizes, times, and directions) can be on par with or even outperform more complex methods. In this paper, we evaluate this baseline across 12 datasets and 15 TC tasks and investigate why it performs so well. Our analysis shows that most datasets contain over 50% redundant samples (identical packet sequences), which frequently appear in both training and test sets due to common splitting practices. This redundancy can lead to overestimated model performance and reduce the theoretical maximum accuracy when identical flows have conflicting labels. Given the distinct characteristics of TC, we further argue that standard machine learning practices adapted from domains like NLP or computer vision may be ill-suited for it. Finally, we propose new directions for task formulation and evaluation to address these challenges and help realign the field.
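Both effects named in the abstract, duplicates shared between splits and an accuracy ceiling caused by conflicting labels, are easy to measure. The sketch below assumes each flow is represented as a hashable tuple of packet metadata; `leakage_rate`, `max_accuracy`, and the toy flows are hypothetical.

```python
from collections import Counter, defaultdict

def leakage_rate(train, test):
    """Fraction of test samples whose exact packet sequence also occurs in training."""
    seen = {seq for seq, _ in train}
    return sum(seq in seen for seq, _ in test) / len(test)

def max_accuracy(samples):
    """Best accuracy any classifier can reach when identical inputs carry different labels.

    For each distinct sequence, at most the majority label can be predicted correctly.
    """
    by_seq = defaultdict(Counter)
    for seq, label in samples:
        by_seq[seq][label] += 1
    best = sum(max(counts.values()) for counts in by_seq.values())
    return best / len(samples)

# Toy flows: each sequence is a tuple of (packet size, direction) pairs.
flow_a = ((100, 1), (60, -1))
flow_b = ((40, 1), (40, -1))
flow_c = ((1500, -1), (52, 1))

train = [(flow_a, "web"), (flow_b, "dns")]
test = [(flow_a, "web"), (flow_a, "video"), (flow_c, "dns"), (flow_a, "web")]

leak = leakage_rate(train, test)
ceiling = max_accuracy(test)
```

A high leakage rate explains why a nearest-neighbour lookup can match deep models, and a ceiling below 1.0 shows that some reported errors are label noise rather than model weakness.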
Article 754
Title@2025-06-10 (2): A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck
Title: A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck | Ein Privacy-Preserving Föderated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck | 保护隐私的联邦学习框架,以普及CBCT, 将合成CT翻译在头部和颈部 2506.08654v1 |
Authors (3): Ciro Benito Raggio, Paolo Zaffino, Maria Francesca Spadea
Cone-beam computed tomography (CBCT) has become a widely adopted modality for image-guided radiotherapy (IGRT). However, CBCT suffers from increased noise, limited soft-tissue contrast, and artifacts, resulting in unreliable Hounsfield unit values and hindering direct dose calculation. Synthetic CT (sCT) generation from CBCT addresses these issues, especially using deep learning (DL) methods. Existing approaches are limited by institutional heterogeneity, scanner-dependent variations, and data privacy regulations that prevent multi-center data sharing. To overcome these challenges, we propose a cross-silo horizontal federated learning (FL) approach for CBCT-to-sCT synthesis in the head and neck region, extending our FedSynthCT framework. A conditional generative adversarial network was collaboratively trained on data from three European medical centers in the public SynthRAD2025 challenge dataset. The federated model demonstrated effective generalization across centers, with mean absolute error (MAE) ranging from $64.38\pm13.63$ to $85.90\pm7.10$ HU, structural similarity index (SSIM) from $0.882\pm0.022$ to $0.922\pm0.039$, and peak signal-to-noise ratio (PSNR) from $32.86\pm0.94$ to $34.91\pm1.04$ dB. Notably, on an external validation dataset of 60 patients, comparable performance was achieved (MAE: $75.22\pm11.81$ HU, SSIM: $0.904\pm0.034$, PSNR: $33.52\pm2.06$ dB) without additional training, confirming robust generalization despite protocol and scanner differences and registration errors. These findings demonstrate the technical feasibility of FL for CBCT-to-sCT synthesis while preserving data privacy and offer a collaborative solution for developing generalizable models across institutions without centralized data sharing or site-specific fine-tuning.
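For readers unfamiliar with the reported metrics, the sketch below computes MAE and PSNR on flattened voxel lists. It uses the textbook definitions; the dynamic range passed to `psnr` is an assumption (the abstract does not state the one used), SSIM is omitted for brevity, and the toy intensities are invented for illustration only.

```python
import math

def mae(pred, ref):
    """Mean absolute error, in the same unit as the inputs (HU for CT)."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

def psnr(pred, ref, data_range):
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    mse = sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)

# Toy voxel intensities in HU (illustrative values, not from the paper).
sct = [0.0, 100.0, 200.0]
ct = [10.0, 90.0, 200.0]

mae_hu = mae(sct, ct)
psnr_db = psnr(sct, ct, data_range=2000.0)
```

Because PSNR depends on the chosen `data_range`, values are only comparable between studies that clip and scale intensities the same way.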
Article 755
Title@2025-06-10 (2): Compositional Causal Reasoning Evaluation in Language Models
Title: Compositional Causal Reasoning Evaluation in Language Models | kompositorische Kausale Reasoning-Bewertung in Sprachmodellen | 语文模式的构成原因原因原因评估 2503.04556v4 |
Authors (5): Jacqueline R. M. A. Maasch, Alihan Hüyük, Xinnuo Xu, Aditya V. Nori, Javier Gonzalez
Causal reasoning and compositional reasoning are two core aspirations in AI. Measuring the extent of these behaviors requires principled evaluation methods. We explore a unified perspective that considers both behaviors simultaneously, termed compositional causal reasoning (CCR): the ability to infer how causal measures compose and, equivalently, how causal quantities propagate through graphs. We instantiate a framework for the systematic evaluation of CCR for the average treatment effect and the probability of necessity and sufficiency. As proof of concept, we demonstrate CCR evaluation for language models in the Llama, Phi, and GPT families. On a math word problem, our framework revealed a range of taxonomically distinct error patterns. CCR errors increased with the complexity of causal paths for all models except o1.
Article 756
Title@2025-06-10 (2): JoFormer (Journey-based Transformer): Theory and Empirical Analysis on the Tiny Shakespeare Dataset
Title: JoFormer (Journey-based Transformer): Theory and Empirical Analysis on the Tiny Shakespeare Dataset | JoFormer (Journal-basierter Transformer): Theorie und empirische Analyse auf dem winzigen Shakespeare-Datensatz | Joformer(基于日圆的变换器):关于小莎士比亚数据集的理论和经验分析 2506.08652v1 |
Authors (1): Mahesh Godavarti
Transformers have demonstrated remarkable success in sequence modeling, yet effectively incorporating positional information remains a challenging and active area of research. In this paper, we introduce JoFormer, a journey-based Transformer architecture grounded in a recently proposed non-commutative algebra for composing transformations across positions. JoFormer represents relative positions through learnable directional transforms that are sequentially composed along the input, thereby extending and generalizing existing approaches based on relative position representations. We derive the JoFormer attention mechanism from first principles and show that it subsumes standard methods such as rotary transformations as special cases. To evaluate its effectiveness, we compare JoFormer to the RoFormer baseline on the Tiny Shakespeare character-level language modeling task. Our results demonstrate that JoFormer consistently achieves lower perplexity and faster convergence, highlighting the advantages of its more expressive, journey-based treatment of position. Notably, the per-token JoFormer is still a primitive, conceptual variant with layer-independent angles, yet it already demonstrates strong performance, underscoring its promise as a proof of concept for more expressive architectures. We conclude by discussing how JoFormer offers a principled approach to integrating positional structure into Transformer architectures. The code used in this work is available at https://github.com/mahesh-godavarti/joformer.
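The rotary special case mentioned above can be checked numerically. The sketch below covers only standard rotary position encoding in two dimensions, not JoFormer's learnable, composed transforms; `rotate`, `rotary_score`, and the angle step `theta` are hypothetical names and values.

```python
import math

def rotate(vec2, angle):
    """Rotate a 2-D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = vec2
    return (c * x - s * y, s * x + c * y)

def rotary_score(q, k, m, n, theta=0.1):
    """Attention logit between a query at position m and a key at position n.

    Each vector is rotated by its own absolute position times theta.
    """
    qm = rotate(q, m * theta)
    kn = rotate(k, n * theta)
    return qm[0] * kn[0] + qm[1] * kn[1]

q, k = (1.0, 2.0), (0.5, -1.0)

# Different absolute positions, same offset n - m = 4.
score_a = rotary_score(q, k, m=3, n=7)
score_b = rotary_score(q, k, m=10, n=14)
```

The two scores agree because composing the inverse of one rotation with another leaves a rotation by the position difference only. Rotations in a plane commute, which is the restriction a non-commutative composition of transforms would lift.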
Article 757
Title@2025-06-10 (2): BridgeNet: A Hybrid, Physics-Informed Machine Learning Framework for Solving High-Dimensional Fokker-Planck Equations
Title: BridgeNet: A Hybrid, Physics-Informed Machine Learning Framework for Solving High-Dimensional Fokker-Planck Equations | BridgeNet: Hybrides, physikinformiertes Machine Learning Framework zur Lösung hochdimensionaler Fokker-Planck-Gleichungen | BridgeNet:用于解决高二分法克-普朗克赤道的混合、物理成形机械学习框架 2506.04354v3 |
Authors (3): Elmira Mirzabeigi, Rezvan Salehi, Kourosh Parand
BridgeNet is a novel hybrid framework that integrates convolutional neural networks with physics-informed neural networks to efficiently solve non-linear, high-dimensional Fokker-Planck equations (FPEs). Traditional PINNs, which typically rely on fully connected architectures, often struggle to capture complex spatial hierarchies and enforce intricate boundary conditions. In contrast, BridgeNet leverages adaptive CNN layers for effective local feature extraction and incorporates a dynamically weighted loss function that rigorously enforces physical constraints. Extensive numerical experiments across various test cases demonstrate that BridgeNet not only achieves significantly lower error metrics and faster convergence compared to conventional PINN approaches but also maintains robust stability in high-dimensional settings. This work represents a substantial advancement in computational physics, offering a scalable and accurate solution methodology with promising applications in fields ranging from financial mathematics to complex system dynamics.
Article 758
Title@2025-06-10 (2): TableDreamer: Progressive and Weakness-guided Data Synthesis from Scratch for Table Instruction Tuning
Title: TableDreamer: Progressive and Weakness-guided Data Synthesis from Scratch for Table Instruction Tuning | TableDreamer: Progressive und schwache Datensynthese von Scratch für die Table Instruction Tuning | 表Dreamer:Scratch为表格指示图纸提供的渐进和弱点制导数据合成 2506.08646v1 |
Authors (7): Mingyu Zheng, Zhifan Feng, Jia Wang, Lanrui Wang, Zheng Lin, Yang Hao, Weiping Wang
Despite the commendable progress of recent LLM-based data synthesis methods, they face two limitations in generating table instruction tuning data. First, they cannot thoroughly explore the vast input space of table understanding tasks, leading to limited data diversity. Second, they ignore the weaknesses in table understanding ability of the target LLM and blindly pursue the increase of data quantity, resulting in suboptimal data efficiency. In this paper, we introduce a progressive and weakness-guided data synthesis framework tailored for table instruction tuning, named TableDreamer, to mitigate the above issues. Specifically, we first synthesize diverse tables and related instructions as seed data, and then perform an iterative exploration of the input space under the guidance of the newly identified weakness data, which eventually serve as the final training data for fine-tuning the target LLM. Extensive experiments on 10 tabular benchmarks demonstrate the effectiveness of the proposed framework, which boosts the average accuracy of Llama3.1-8B-instruct by 11.62% (49.07% to 60.69%) with 27K GPT-4o synthetic data and outperforms state-of-the-art data synthesis baselines which use more training data. The code and data are available at https://github.com/SpursGoZmy/TableDreamer
Article 759
Title@2025-06-10 (2): Fusing Cross-modal and Uni-modal Representations: A Kronecker Product Approach
Title: Fusing Cross-modal and Uni-modal Representations: A Kronecker Product Approach | Durchgängige cross-modale und uni-modale Darstellungen: Ein Kronecker-Produktansatz | 引信的跨模式和统一模式代表:克罗马克产品方法 2506.08645v1 |
Authors (3): Youqi Wu, Jingwei Zhang, Farzan Farnia
Cross-modal embeddings, such as CLIP, BLIP and their variants, have achieved promising results in aligning representations across modalities. However, these embeddings could underperform compared to state-of-the-art single-modality embeddings on modality-specific tasks. On the other hand, single-modality embeddings excel in their domains but lack cross-modal alignment capabilities. In this work, we focus on the problem of unifying cross-modality and single-modality embeddings to achieve the performance of modality-expert embedding within individual modalities while preserving cross-modal alignment. To this end, we propose RP-KrossFuse, a method that leverages a random projection-based Kronecker product to integrate cross-modal embeddings with single-modality embeddings. RP-KrossFuse aims to fuse the sample-pairwise similarity scores of the fused embeddings. It operates efficiently in a specified kernel space and supports scalable implementations via random Fourier features for shift-invariant kernels such as the Gaussian kernel. We demonstrate the effectiveness of RP-KrossFuse through several numerical experiments, combining CLIP embeddings with uni-modal image and text embeddings. Our numerical results indicate that RP-KrossFuse achieves competitive modality-specific performance while retaining cross-modal alignment, bridging the gap between cross-modal and single-modality embeddings.
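One reason a Kronecker product is a natural way to fuse similarity scores is the identity that the inner product of two Kronecker products equals the product of the individual inner products. The sketch below verifies this on small hand-picked vectors. It is only an illustration of that identity, not the RP-KrossFuse method itself: it omits the random projection and kernel steps, and all names and values are assumptions.

```python
def kron(u, v):
    """Kronecker product of two 1-D vectors, as a flat list."""
    return [a * b for a in u for b in v]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical embeddings of two samples under two different encoders.
cross_1, uni_1 = [1.0, 2.0], [0.5, -1.0, 3.0]
cross_2, uni_2 = [0.0, 1.5], [2.0, 1.0, -1.0]

# Similarity of the fused (Kronecker) embeddings ...
fused_similarity = dot(kron(cross_1, uni_1), kron(cross_2, uni_2))

# ... equals the product of the per-encoder similarities.
product_of_similarities = dot(cross_1, cross_2) * dot(uni_1, uni_2)
```

The fused vector has length 2 x 3 = 6 here; for realistic embedding sizes the product dimension grows quickly, which is one motivation for a projection step.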
Article 760
Title@2025-06-10 (2): Semi-gradient DICE for Offline Constrained Reinforcement Learning
Title: Semi-gradient DICE for Offline Constrained Reinforcement Learning | Halbgradienter DICE für Offline-Beschränktes Verstärkungslernen | 用于非线外约束训练强化学习的半级DICE 2506.08644v1 |
Authors (4): Woosung Kim, JunHo Seo, Jongmin Lee, Byung-Jun Lee
Stationary Distribution Correction Estimation (DICE) addresses the mismatch between the stationary distribution induced by a policy and the target distribution required for reliable off-policy evaluation (OPE) and policy optimization. DICE-based offline constrained RL particularly benefits from the flexibility of DICE, as it simultaneously maximizes return while estimating costs in offline settings. However, we have observed that recent approaches designed to enhance the offline RL performance of the DICE framework inadvertently undermine its ability to perform OPE, making them unsuitable for constrained RL scenarios. In this paper, we identify the root cause of this limitation: their reliance on a semi-gradient optimization, which solves a fundamentally different optimization problem and results in failures in cost estimation. Building on these insights, we propose a novel method to enable OPE and constrained RL through semi-gradient DICE. Our method ensures accurate cost estimation and achieves state-of-the-art performance on the offline constrained RL benchmark, DSRL.
Article 761
Title@2025-06-10 (2): An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation
Title: An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation | Eine Analyse von Hyperparameter-Optimierungsmethoden für die retrieval Augmented Generation | 对回收增加的一代的超光子辐射仪最佳优化方法的分析 2505.03452v2 |
Authors (15): Matan Orbach, Ohad Eytan, Benjamin Sznajder, Ariel Gera, Odellia Boni, Yoav Kantor, Gal Bloch, Omri Levy, Hadas Abraham, Nitzan Barzilay, Eyal Shnarch, Michael E. Factor, Shila Ofek-Koifman, Paula Ta-Shma, Assaf Toledo
Finding the optimal Retrieval-Augmented Generation (RAG) configuration for a given use case can be complex and expensive. Motivated by this challenge, frameworks for RAG hyper-parameter optimization (HPO) have recently emerged, yet their effectiveness has not been rigorously benchmarked. To address this gap, we present a comprehensive study involving 5 HPO algorithms over 5 datasets from diverse domains, including a new one collected for this work on real-world product documentation. Our study explores the largest HPO search space considered to date, with three evaluation metrics as optimization targets. Analysis of the results shows that RAG HPO can be done efficiently, either greedily or with random search, and that it significantly boosts RAG performance for all datasets. For greedy HPO approaches, we show that optimizing model selection first is preferable to the prevalent practice of optimizing according to RAG pipeline order.
Article 762
Title@2025-06-10 (2): Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers
Title: Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers | Zeitreihendarstellungen für die Klassifizierung liegen in vorgebildeten Visionstransformatoren versteckt | 未受过训练的愿景变异者中隐藏的分类隐隐藏物的 时间序列 2506.08641v1 |
Authors (5): Simon Roschmann, Quentin Bouniot, Vasilii Feofanov, Ievgen Redko, Zeynep Akata
Time series classification is a fundamental task in healthcare and industry, yet the development of time series foundation models (TSFMs) remains limited by the scarcity of publicly available time series datasets. In this work, we propose Time Vision Transformer (TiViT), a framework that converts time series into images to leverage the representational power of frozen Vision Transformers (ViTs) pretrained on large-scale image datasets. First, we theoretically motivate our approach by analyzing the 2D patching of ViTs for time series, showing that it can increase the number of label-relevant tokens and reduce the sample complexity. Second, we empirically demonstrate that TiViT achieves state-of-the-art performance on standard time series classification benchmarks by utilizing the hidden representations of large OpenCLIP models. We explore the structure of TiViT representations and find that intermediate layers with high intrinsic dimension are the most effective for time series classification. Finally, we assess the alignment between TiViT and TSFM representation spaces and identify a strong complementarity, with further performance gains achieved by combining their features. Our findings reveal yet another direction for reusing vision representations in a non-visual domain.
Article 763
Title@2025-06-10 (2): CASE: Contrastive Activation for Saliency Estimation
Title: CASE: Contrastive Activation for Saliency Estimation | RECHTSSACHE: Kontrastive Aktivierung für Salienzschätzung | CASE: 弹性估计的反动动动动 2506.07327v2 |
Authors (3): Dane Williamson, Yangfeng Ji, Matthew Dwyer
Saliency methods are widely used to visualize which input features are deemed relevant to a model’s prediction. However, their visual plausibility can obscure critical limitations. In this work, we propose a diagnostic test for class sensitivity: a method’s ability to distinguish between competing class labels on the same input. Through extensive experiments, we show that many widely used saliency methods produce nearly identical explanations regardless of the class label, calling into question their reliability. We find that class-insensitive behavior persists across architectures and datasets, suggesting the failure mode is structural rather than model-specific. Motivated by these findings, we introduce CASE, a contrastive explanation method that isolates features uniquely discriminative for the predicted class. We evaluate CASE using the proposed diagnostic and a perturbation-based fidelity test, and show that it produces faithful and more class-specific explanations than existing methods.
Article 764
Title@2025-06-10 (2): Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth
Title: Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | Effizientes Feintuning Quantisierter Modelle über adaptive Rank und Bitwidth | 通过适应级和Bitwidth对量化模型进行高效率的微调 2505.03802v3 |
Authors (6): Changhai Zhou, Shijie Han, Shiyang Zhang, Yuhua Zhou, Weizhong Zhang, Cheng Jin
QLoRA effectively combines low-bit quantization and LoRA to achieve memory-friendly fine-tuning for large language models (LLMs). Recently, methods based on SVD for continuous update iterations to initialize LoRA matrices to accommodate quantization errors have generally failed to consistently improve performance. Dynamic mixed precision is a natural idea for continuously improving the fine-tuning performance of quantized models, but previous methods often optimize low-rank subspaces or quantization components separately, without considering their synergy. To address this, we propose QR-Adaptor, a unified, gradient-free strategy that uses partial calibration data to jointly search the quantization components and the rank of low-rank spaces for each layer, thereby continuously improving model performance. QR-Adaptor does not minimize quantization error but treats precision and rank allocation as a discrete optimization problem guided by actual downstream performance and memory usage. Compared to state-of-the-art (SOTA) quantized LoRA fine-tuning methods, our approach achieves a 4.89% accuracy improvement on GSM8K, and in some cases even outperforms the 16-bit fine-tuned model while maintaining the memory footprint of the 4-bit setting.
Article 765
Title@2025-06-10 (2): Discovering Continuous-Time Memory-Based Symbolic Policies using Genetic Programming
Title: Discovering Continuous-Time Memory-Based Symbolic Policies using Genetic Programming | Kontinuierliche-Zeit-Speicher-basierte symbolische Richtlinien mit genetischer Programmierung entdecken | 利用遗传方案拟定办法发现基于连续时间记忆的象征性政策 2406.02765v6 |
Authors (3): Sigur de Vries, Sander Keemink, Marcel van Gerven
Artificial intelligence techniques are increasingly being applied to solve control problems, but often rely on black-box methods without transparent output generation. To improve the interpretability and transparency in control systems, models can be defined as white-box symbolic policies described by mathematical expressions. For better performance in partially observable and volatile environments, the symbolic policies are extended with memory represented by continuous-time latent variables, governed by differential equations. Genetic programming is used for optimisation, resulting in interpretable policies consisting of symbolic expressions. Our results show that symbolic policies with memory perform comparably to black-box policies on a variety of control tasks. Furthermore, the benefit of the memory in symbolic policies is demonstrated on experiments where memory-less policies fall short. Overall, we present a method for evolving high-performing symbolic policies that offer interpretability and transparency, which black-box models lack.
Article 766
Title@2025-06-10 (2): HSG-12M: A Large-Scale Spatial Multigraph Dataset
Title: HSG-12M: A Large-Scale Spatial Multigraph Dataset | HSG-12M: Ein großräumiger multigraphischer Datensatz | HSG-12M:大型空间多光谱数据集 2506.08618v1 |
Authors (5): Xianquan Yan, Hakan Akgün, Kenji Kawaguchi, N. Duane Loh, Ching Hua Lee
Existing graph benchmarks assume non-spatial, simple edges, collapsing physically distinct paths into a single link. We introduce HSG-12M, the first large-scale dataset of spatial multigraphs: graphs embedded in a metric space where multiple geometrically distinct trajectories between two nodes are retained as separate edges. HSG-12M contains 11.6 million static and 5.1 million dynamic Hamiltonian spectral graphs across 1401 characteristic-polynomial classes, derived from 177 TB of spectral potential data. Each graph encodes the full geometry of a 1-D crystal’s energy spectrum on the complex plane, producing diverse, physics-grounded topologies that transcend conventional node-coordinate datasets. To enable future extensions, we release Poly2Graph: a high-performance, open-source pipeline that maps arbitrary 1-D crystal Hamiltonians to spectral graphs. Benchmarks with popular GNNs expose new challenges in learning from multi-edge geometry at scale. Beyond its practical utility, we show that spectral graphs serve as universal topological fingerprints of polynomials, vectors, and matrices, forging a new algebra-to-graph link. HSG-12M lays the groundwork for geometry-aware graph learning and new opportunities for data-driven scientific discovery in condensed matter physics and beyond.
Article 767
Title@2025-06-10 (2): Predicting and Understanding College Student Mental Health with Interpretable Machine Learning
Title: Predicting and Understanding College Student Mental Health with Interpretable Machine Learning | Vorhersagen und Verständnis College Student Mental Health mit Interpretable Machine Learning | 预测和理解学院学生心理健康与可解释机器学习 2503.08002v3 |
Authors (5): Meghna Roy Chowdhury, Wei Xuan, Shreyas Sen, Yixue Zhao, Yi Ding
Mental health issues among college students have reached critical levels, significantly impacting academic performance and overall wellbeing. Predicting and understanding mental health status among college students is challenging due to three main factors: the necessity for large-scale longitudinal datasets, the prevalence of black-box machine learning models lacking transparency, and the tendency of existing approaches to provide aggregated insights at the population level rather than individualized understanding. To tackle these challenges, this paper presents I-HOPE, the first Interpretable Hierarchical mOdel for Personalized mEntal health prediction. I-HOPE is a two-stage hierarchical model that connects raw behavioral features to mental health status through five defined behavioral categories as interaction labels. We evaluate I-HOPE on the College Experience Study, the longest longitudinal mobile sensing dataset. This dataset spans five years and captures data from both pre-pandemic periods and the COVID-19 pandemic. I-HOPE achieves a prediction accuracy of 91%, significantly surpassing the 60-70% accuracy of baseline methods. In addition, I-HOPE distills complex patterns into interpretable and individualized insights, enabling the future development of tailored interventions and improving mental health support. The code is available at https://github.com/roycmeghna/I-HOPE.
Article 768
Title@2025-06-10 (2): Generalizing while preserving monotonicity in comparison-based preference learning models
Title: Generalizing while preserving monotonicity in comparison-based preference learning models | Verallgemeinern unter Wahrung der Monotonizität in vergleichsbasierten Präferenz-Lernmodellen | 在以比较为基础的优惠学习模式中,在保持单一性的同时,实现普遍化 2506.08616v1 |
Authors (4): Julien Fageot, Peva Blanchard, Gilles Bareilles, Lê-Nguyên Hoang
If you tell a learning model that you prefer an alternative $a$ over another alternative $b$, then you probably expect the model to be monotone, that is, the valuation of $a$ increases, and that of $b$ decreases. Yet, perhaps surprisingly, many widely deployed comparison-based preference learning models, including large language models, fail to have this guarantee. Until now, the only comparison-based preference learning algorithms that were proved to be monotone are the Generalized Bradley-Terry models. Yet, these models are unable to generalize to uncompared data. In this paper, we advance the understanding of the set of models with generalization ability that are monotone. Namely, we propose a new class of Linear Generalized Bradley-Terry models with Diffusion Priors, and identify sufficient conditions on alternatives’ embeddings that guarantee monotonicity. Our experiments show that this monotonicity is far from being a general guarantee, and that our new class of generalizing models improves accuracy, especially when the dataset is limited.
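The monotonicity property described here can be illustrated on the plain Bradley-Terry model, where the probability that `a` beats `b` is a logistic function of the score difference. The sketch below takes one gradient-ascent step on the log-likelihood of a single comparison and shows the winner's score rising and the loser's falling. It is a toy under stated assumptions (independent scalar scores, a hypothetical learning rate `lr`), not the Linear Generalized Bradley-Terry model with Diffusion Priors proposed in the paper.

```python
import math

def bt_prob(theta_a, theta_b):
    """Bradley-Terry probability that the first alternative is preferred."""
    return 1.0 / (1.0 + math.exp(-(theta_a - theta_b)))

def update(scores, winner, loser, lr=0.5):
    """One gradient-ascent step on log P(winner preferred over loser).

    The gradient with respect to the winner's score is (1 - p), and with
    respect to the loser's score it is -(1 - p); other scores are unaffected.
    """
    p = bt_prob(scores[winner], scores[loser])
    grad = 1.0 - p
    new_scores = dict(scores)
    new_scores[winner] += lr * grad
    new_scores[loser] -= lr * grad
    return new_scores

old = {"a": 0.0, "b": 0.0, "c": 1.0}
new = update(old, winner="a", loser="b")
```

In this toy, the uncompared alternative `c` does not move at all, which mirrors the abstract's point that plain Generalized Bradley-Terry models do not generalize to uncompared data.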
Article 769
Title@2025-06-10 (2): PatchTrAD: A Patch-Based Transformer focusing on Patch-Wise Reconstruction Error for Time Series Anomaly Detection
Title: PatchTrAD: A Patch-Based Transformer focusing on Patch-Wise Reconstruction Error for Time Series Anomaly Detection | PatchTrad: Ein Patch-basierter Transformer mit Schwerpunkt auf Patch-Wise-Reconstruction-Fehler für Zeitreihenanomalienerkennung | PatchTraAD:一个以补丁为主的变换器,侧重于用于反常探测的时间序列的补补维重建错误 2504.08827v2 |
Authors (3): Samy-Melwan Vilhes, Gilles Gasso, Mokhtar Z Alaya
Time series anomaly detection (TSAD) focuses on identifying whether observations in streaming data deviate significantly from normal patterns. With the prevalence of connected devices, anomaly detection on time series has become paramount, as it enables real-time monitoring and early detection of irregular behaviors across various application domains. In this work, we introduce PatchTrAD, a Patch-based Transformer model for time series anomaly detection. Our approach leverages a Transformer encoder along with the use of patches under a reconstruction-based framework for anomaly detection. Empirical evaluations on multiple benchmark datasets show that PatchTrAD is on par, in terms of detection performance, with state-of-the-art deep learning models for anomaly detection while being time efficient during inference.
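The idea of scoring anomalies by patch-wise reconstruction error can be sketched without any model: split the series into patches, compare each patch with its reconstruction, and flag patches whose error is large. In the example below the reconstruction is a hand-written stand-in for a model's output, and the patch length and threshold are arbitrary assumptions; none of this is taken from the PatchTrAD code.

```python
def patch_errors(series, reconstruction, patch_len):
    """Mean squared error for each non-overlapping patch of length patch_len."""
    errors = []
    for start in range(0, len(series) - patch_len + 1, patch_len):
        original = series[start:start + patch_len]
        rebuilt = reconstruction[start:start + patch_len]
        mse = sum((a - b) ** 2 for a, b in zip(original, rebuilt)) / patch_len
        errors.append(mse)
    return errors

# Toy series with a spike at index 4.
series = [0.0, 0.1, 0.0, 0.1, 5.0, 0.1, 0.0, 0.1]

# Stand-in for a model that has learned the normal pattern only.
reconstruction = [0.0, 0.1, 0.0, 0.1, 0.0, 0.1, 0.0, 0.1]

errors = patch_errors(series, reconstruction, patch_len=2)

# Hypothetical threshold; in practice it would be chosen on validation data.
anomalous_patches = [i for i, e in enumerate(errors) if e > 1.0]
```

Only the patch containing the spike (index 2) exceeds the threshold, since a reconstruction that captures normal behavior fits the other patches exactly.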
Article 770
Title@2025-06-10 (2): A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices
Title: A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices | Ein zweistufiges Datenauswahl-Framework für dateneffiziente Modellschulungen auf Edge-Geräten | 边缘装置数据有效模型培训两阶段数据选择框架 2505.16563v2 |
Authors (4): Chen Gong, Rui Xing, Zhenzhe Zheng, Fan Wu
The demand for machine learning (ML) model training on edge devices is escalating due to data privacy and personalized service needs. However, we observe that current on-device model training is hampered by the under-utilization of on-device data, due to low training throughput, limited storage and diverse data importance. To improve data resource utilization, we propose a two-stage data selection framework, Titan, to select the most important data batch from streaming data for model training with guaranteed efficiency and effectiveness. Specifically, in the first stage, Titan filters out a candidate dataset with potentially high importance in a coarse-grained manner. In the second stage of fine-grained selection, we propose a theoretically optimal data selection strategy to identify the data batch with the highest model performance improvement for the current training round. To further enhance time-and-resource efficiency, Titan leverages a pipeline to co-execute data selection and model training, and avoids resource conflicts by exploiting idle computing resources. We evaluate Titan on real-world edge devices and three representative edge computing tasks with diverse models and data modalities. Empirical results demonstrate that Titan achieves up to 43% reduction in training time and 6.2% increase in final accuracy with minor system overhead, such as data processing delay, memory footprint and energy consumption.
Article 771
Title@2025-06-10 (2): CENTAUR: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference
Title: CENTAUR: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference | CENTAUR: Die Unmögliche Dreifaltigkeit von Datenschutz, Effizienz und Leistung in der Datenschutz-Bewahrung Transformer-Schlussfolgerung | CENTAUR:在隐私保护变换人推断中弥合隐私、效率和性能这三方面不可能实现的三重问题 2412.10652v2 |
Authors (9): Jinglong Luo, Guanzhong Chen, Yehong Zhang, Shiyu Liu, Hui Wang, Yue Yu, Xun Zhou, Yuan Qi, Zenglin Xu
With the growing deployment of pre-trained models like Transformers on cloud platforms, privacy concerns about model parameters and inference data are intensifying. Existing Privacy-Preserving Transformer Inference (PPTI) frameworks face the “impossible trinity” of balancing privacy, efficiency, and performance: Secure Multi-Party Computation (SMPC)-based approaches ensure strong privacy but suffer from high computational overhead and performance losses; conversely, permutation-based methods achieve near-plaintext efficiency and accuracy but compromise privacy by exposing sensitive model parameters and intermediate results. Bridging this gap with a single approach presents substantial challenges, motivating the introduction of CENTAUR, a groundbreaking PPTI framework that seamlessly integrates random permutations and SMPC to address the “impossible trinity”. By designing efficient PPTI algorithms tailored to the structural properties of Transformer models, CENTAUR achieves an unprecedented balance among privacy, efficiency, and performance. Our experiments demonstrate CENTAUR’s ability to resist diverse data reconstruction attacks, achieve plaintext-level inference accuracy, and boost inference speed by 5.0-30.4 times, unlocking new possibilities for secure and efficient AI deployment.
Article 772
Title@2025-06-10 (2): Spiking Neural Models for Decision-Making Tasks with Learning
Title: Spiking Neural Models for Decision-Making Tasks with Learning | Spiralische Neuralmodelle für Entscheidungs-Making-Aufgaben mit Lernen | 学习决策任务Spiking神经模型 2506.09087v1 |
Authors (4): Sophie Jaffard, Giulia Mezzadri, Patricia Reynaud-Bouret, Etienne Tanré
In cognition, response times and choices in decision-making tasks are commonly modeled using Drift Diffusion Models (DDMs), which describe the accumulation of evidence for a decision as a stochastic process, specifically a Brownian motion, with the drift rate reflecting the strength of the evidence. In the same vein, the Poisson counter model describes the accumulation of evidence as discrete events whose counts over time are modeled as Poisson processes, and has a spiking-neuron interpretation, as these processes are used to model neuronal activity. However, these models lack a learning mechanism and are limited to tasks where participants have prior knowledge of the categories. To bridge the gap between cognitive and biological models, we propose a biologically plausible Spiking Neural Network (SNN) model for decision-making that incorporates a learning mechanism and whose neuronal activity is modeled by a multivariate Hawkes process. First, we show a coupling result between the DDM and the Poisson counter model, establishing that these two models provide similar categorizations and reaction times and that the DDM can be approximated by spiking Poisson neurons. To go further, we show that a particular DDM with correlated noise can be derived from a Hawkes network of spiking neurons governed by a local learning rule. In addition, we designed an online categorization task to evaluate the model predictions. This work provides a significant step toward integrating biologically relevant neural mechanisms into cognitive models, fostering a deeper understanding of the relationship between neural activity and behavior.
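The drift diffusion model referred to in this abstract can be simulated with a simple Euler-Maruyama scheme: evidence starts at zero, drifts at a constant rate with Gaussian noise, and a decision is made when it crosses one of two symmetric thresholds. The sketch below is a generic textbook DDM, not the paper's Hawkes-based network; the parameter values, the function name, and the symmetric-threshold convention are assumptions.

```python
import math
import random

def simulate_ddm(drift, threshold=1.0, noise=1.0, dt=0.001, rng=None,
                 max_steps=100_000):
    """Simulate one DDM trial.

    Returns (choice, reaction_time): choice is 1 for the upper threshold,
    0 for the lower one, or None if no threshold is reached in max_steps.
    """
    if rng is None:
        rng = random.Random()
    evidence, elapsed = 0.0, 0.0
    for _ in range(max_steps):
        # Euler-Maruyama step of dX = drift * dt + noise * dW.
        evidence += drift * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        elapsed += dt
        if evidence >= threshold:
            return 1, elapsed
        if evidence <= -threshold:
            return 0, elapsed
    return None, elapsed

rng = random.Random(0)  # fixed seed for reproducibility
trials = [simulate_ddm(drift=3.0, rng=rng) for _ in range(200)]
upper_fraction = sum(1 for choice, _ in trials if choice == 1) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
```

With a strongly positive drift, almost all trials end at the upper threshold. A weaker drift would produce more errors and longer reaction times, which is how the drift rate reflects evidence strength.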
Article 773
Title@2025-06-10 (2): Sample Efficient Demonstration Selection for In-Context Learning
Title: Sample Efficient Demonstration Selection for In-Context Learning | Beispiel Effiziente Demonstrationsauswahl für das In-Context-Lernen | 内部学习的高效示范选择 2506.08607v1 |
Authors (4): Kiran Purohit, V Venktesh, Sourangshu Bhattacharya, Avishek Anand
The in-context learning paradigm with LLMs has been instrumental in advancing a wide range of natural language processing tasks. The selection of few-shot examples (exemplars / demonstration samples) is essential for constructing effective prompts under context-length budget constraints. In this paper, we formulate the exemplar selection task as a top-m best arms identification problem. A key challenge in this setup is the exponentially large number of arms that need to be evaluated to identify the m-best arms. We propose CASE (Challenger Arm Sampling for Exemplar selection), a novel sample-efficient selective exploration strategy that maintains a shortlist of “challenger” arms, which are current candidates for the top-m arms. In each iteration, only one of the arms from this shortlist or the current top-m set is pulled, thereby reducing sample complexity and, consequently, the number of LLM evaluations. Furthermore, we model the scores of exemplar subsets (arms) using a parameterized linear scoring function, leading to a stochastic linear bandit setting. CASE achieves remarkable efficiency gains of up to 7x speedup in runtime while requiring 7x fewer LLM calls (87% reduction) without sacrificing performance compared to state-of-the-art exemplar selection methods. We release our code and data at https://github.com/kiranpurohit/CASE
Article 774
Title@2025-06-10 (2): Flow Matching Meets PDEs: A Unified Framework for Physics-Constrained Generation
Title: Flow Matching Meets PDEs: A Unified Framework for Physics-Constrained Generation | Flow Matching trifft auf PDEs: Ein einheitliches Framework für physikbedingte Generationen | 流动匹配符合PDEs:受物理约束的一代的统一框架 2506.08604v1 |
Authors (4): Giacomo Baldan, Qiang Liu, Alberto Guardone, Nils Thuerey
Generative machine learning methods, such as diffusion models and flow matching, have shown great potential in modeling complex system behaviors and building efficient surrogate models. However, these methods typically learn the underlying physics implicitly from data. We propose Physics-Based Flow Matching (PBFM), a novel generative framework that explicitly embeds physical constraints, both PDE residuals and algebraic relations, into the flow matching objective. We also introduce temporal unrolling at training time that improves the accuracy of the final, noise-free sample prediction. Our method jointly minimizes the flow matching loss and the physics-based residual loss without requiring hyperparameter tuning of their relative weights. Additionally, we analyze the role of the minimum noise level, $\sigma_{\min}$, in the context of physical constraints and evaluate a stochastic sampling strategy that helps to reduce physical residuals. Through extensive benchmarks on three representative PDE problems, we show that our approach yields up to $8\times$ more accurate physical residuals compared to FM, while clearly outperforming existing algorithms in terms of distributional accuracy. PBFM thus provides a principled and efficient framework for surrogate modeling, uncertainty quantification, and accelerated simulation in physics and engineering applications.
Article 775
Title@2025-06-10 (2): CALT: A Library for Computer Algebra with Transformer
Title: CALT: A Library for Computer Algebra with Transformer | CALT: Eine Bibliothek für Computer Algebra mit Transformer | CALT:一个带有变换器的计算机代数库 2506.08600v1 |
Authors (3): Hiroshi Kera, Shun Arakawa, Yuta Sato
Recent advances in artificial intelligence have demonstrated the learnability of symbolic computation through end-to-end deep learning. Given a sufficient number of examples of symbolic expressions before and after the target computation, Transformer models, which are highly effective learners of sequence-to-sequence functions, can be trained to emulate the computation. This development opens up several intriguing challenges and new research directions, which require active contributions from the symbolic computation community. In this work, we introduce Computer Algebra with Transformer (CALT), a user-friendly Python library designed to help non-experts in deep learning train models for symbolic computation tasks.
Article 776
Title@2025-06-10 (2): Model-based learning for joint channel estimation and hybrid MIMO precoding
Title: Model-based learning for joint channel estimation and hybrid MIMO precoding | Modellbasiertes Lernen für gemeinsame Kanalschätzung und hybride MIMO-Vorkodierung | 联合渠道估计和混合混合混合的混合混合混合混合混合混合混合混合模式的模型学习 2505.04255v2 |
Authors (5): Nay Klaimi, Amira Bedoui, Clément Elvira, Philippe Mary, Luc Le Magoarou
Hybrid precoding is a key ingredient of cost-effective massive multiple-input multiple-output transceivers. However, jointly setting digital and analog precoders to optimally serve multiple users is a difficult optimization problem. Moreover, it relies heavily on precise knowledge of the channels, which is difficult to obtain, especially when considering realistic systems comprising hardware impairments. In this paper, a joint channel estimation and hybrid precoding method is proposed, which consists of an end-to-end architecture taking received pilots as inputs and outputting precoders. The resulting neural network is fully model-based, making it lightweight and interpretable with very few learnable parameters. The channel estimation step is performed using the unfolded matching pursuit algorithm, accounting for imperfect knowledge of the antenna system, while the precoding step is done via unfolded projected gradient ascent. The great potential of the proposed method is empirically demonstrated on realistic synthetic channels.
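For readers unfamiliar with matching pursuit, the classical (non-learned) algorithm is a greedy loop: pick the dictionary atom most correlated with the current residual, record its coefficient, and subtract its contribution. The sketch below shows that loop on real-valued toy data. It is the textbook algorithm, not the unfolded variant used in the paper, and the dictionary, signal, and function names are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length real vectors."""
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse approximation of `signal` over unit-norm `atoms`.

    Returns (coefficients, residual) after n_iter greedy steps.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        correlations = [dot(residual, atom) for atom in atoms]
        # Select the atom that best explains the current residual.
        best = max(range(len(atoms)), key=lambda i: abs(correlations[i]))
        coeffs[best] += correlations[best]
        residual = [r - correlations[best] * a
                    for r, a in zip(residual, atoms[best])]
    return coeffs, residual

# Toy orthonormal dictionary (standard basis) and a 2-sparse signal.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
signal = [3.0, 0.0, -2.0]

coeffs, residual = matching_pursuit(signal, atoms, n_iter=2)
```

With an orthonormal dictionary, two iterations recover the two nonzero coefficients exactly and leave a zero residual. In an unfolded version, each iteration would become a network layer whose dictionary can be adjusted by training.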
nan
Article 777
Title@2025-06-10 (2): Solving excited states for long-range interacting trapped ions with neural networks
Title: Solving excited states for long-range interacting trapped ions with neural networks | Lösen von angeregten Zuständen für Langstrecken-Interagieren von gefangenen Ionen mit neuronalen Netzwerken | 用神经网络解决长距离互动嵌入离子的兴奋状态 2506.08594v1 |
Authors (7): Yixuan Ma, Chang Liu, Weikang Li, Shun-Yao Zhang, L. -M. Duan, Yukai Wu, Dong-Ling Deng
The computation of excited states in strongly interacting quantum many-body systems is of fundamental importance. Yet, it is notoriously challenging due to the exponential scaling of the Hilbert space dimension with the system size. Here, we introduce a neural network-based algorithm that can simultaneously output multiple low-lying excited states of a quantum many-body spin system in an accurate and efficient fashion. This algorithm, dubbed the neural quantum excited-state (NQES) algorithm, requires no explicit orthogonalization of the states and is generally applicable to higher dimensions. We demonstrate, through concrete examples including the Haldane-Shastry model with all-to-all interactions, that the NQES algorithm is capable of efficiently computing multiple excited states and their related observable expectations. In addition, we apply the NQES algorithm to two classes of long-range interacting trapped-ion systems in a two-dimensional Wigner crystal. For non-decaying all-to-all interactions with alternating signs, our computed low-lying excited states bear spatial correlation patterns similar to those of the ground states, which closely match recent experimental observations that the quasi-adiabatically prepared state accurately reproduces analytical ground-state correlations. For a system of up to 300 ions with power-law decaying antiferromagnetic interactions, we successfully uncover its gap scaling and correlation features. Our results establish a scalable and efficient algorithm for computing excited states of interacting quantum many-body systems, which holds potential applications ranging from benchmarking quantum devices to photoisomerization.
nan
Article 778
Title@2025-06-10 (2): Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings
Title: Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings | Dichte Retriever können bei einfachen Abfragen scheitern: Enthüllen des Granularitäts-Dilemmas der Einbettungen | 简单查询: 读取嵌入器的颗粒性三角形 2506.08592v1 |
Authors (6): Liyan Xu, Zhenlin Su, Mo Yu, Jiangnan Li, Fandong Meng, Jie Zhou
This work focuses on an observed limitation of text encoders: embeddings may not be able to recognize fine-grained entities or events within the semantics, resulting in failed dense retrieval on even simple cases. To examine such behaviors, we first introduce a new evaluation dataset in Chinese, named CapRetrieval, whose passages are image captions and whose queries are phrases inquiring about entities or events in various forms. Zero-shot evaluation suggests that encoders may fail at this fine-grained matching, regardless of training sources or model sizes. Aiming for enhancement, we proceed to finetune encoders with our proposed data generation strategies, which obtain the best performance on CapRetrieval. Within this process, we further identify a granularity dilemma, a challenge for embeddings to express fine-grained salience while aligning with overall semantics. Our dataset, code and models in this work are publicly released at https://github.com/lxucs/CapRetrieval.
nan
Article 779
Title@2025-06-10 (2): Diversity-Guided MLP Reduction for Efficient Large Vision Transformers
Title: Diversity-Guided MLP Reduction for Efficient Large Vision Transformers | Diversity-geführte MLP-Reduktion für effiziente Großbildtransformatoren | 为高效的大型愿景变异器而减少由多样性引导的减量 2506.08591v1 |
Authors (5): Chengchao Shen, Hourun Zhu, Gongfan Fang, Jianxin Wang, Xinchao Wang
Transformer models exhibit an excellent scaling property, where performance improves as model capacity increases. However, large-scale model parameters lead to an unaffordable cost of computing and memory. We analyze popular transformer architectures and find that multilayer perceptron (MLP) modules take up the majority of model parameters. To this end, we focus on the recoverability of the compressed models and propose a Diversity-Guided MLP Reduction (DGMR) method to significantly reduce the parameters of large vision transformers with only negligible performance degradation. Specifically, we conduct a Gram-Schmidt weight pruning strategy to eliminate redundant neurons of the MLP hidden layer, while preserving weight diversity for better performance recovery during distillation. Compared to the model trained from scratch, our pruned model requires only 0.06% of the data of LAION-2B (used to train large vision transformers), without labels (ImageNet-1K), to recover the original performance. Experimental results on several state-of-the-art large vision transformers demonstrate that our method achieves a more than 57.0% parameter and FLOPs reduction in a near lossless manner. Notably, for EVA-CLIP-E (4.4B), our method accomplishes a 71.5% parameter and FLOPs reduction without performance degradation. The source code and trained weights are available at https://github.com/visresearch/DGMR.
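To make the idea of diversity-preserving neuron selection concrete, here is a minimal sketch of a greedy, pivoted Gram-Schmidt selection over the rows of a weight matrix. It is a generic technique offered under assumptions, not the DGMR procedure itself: the function name `gram_schmidt_select` and the "largest orthogonal residual" criterion are illustrative choices.

```python
import numpy as np

def gram_schmidt_select(W, keep):
    """Greedily pick `keep` rows of W (one row per hidden neuron).

    At each step the row with the largest component orthogonal to the
    rows already chosen is selected, so near-duplicate neurons are skipped.
    Hypothetical helper for illustration only.
    """
    R = W.astype(float).copy()  # residuals after removing chosen directions
    selected = []
    for _ in range(keep):
        norms = np.linalg.norm(R, axis=1)
        if selected:
            norms[selected] = -1.0  # never pick the same neuron twice
        i = int(np.argmax(norms))
        selected.append(i)
        # Project the chosen direction out of every residual row.
        q = R[i] / (np.linalg.norm(R[i]) + 1e-12)
        R = R - np.outer(R @ q, q)
    return sorted(selected)

# Neuron 0 is parallel to neuron 1, so it is the redundant one.
W_demo = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
kept_demo = gram_schmidt_select(W_demo, keep=2)
```

In this toy case the selection keeps neurons 1 and 2 and drops neuron 0, whose weights add no new direction.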
nan
Article 780
Title@2025-06-10 (2): How transformers learn structured data: insights from hierarchical filtering
Title: How transformers learn structured data: insights from hierarchical filtering | Wie Transformatoren strukturierte Daten lernen: Erkenntnisse aus der hierarchischen Filterung | 变压器如何学习结构化数据:从上层过滤器获得的洞察力 2408.15138v3 |
Authors (4): Jerome Garnier-Brun, Marc Mézard, Emanuele Moscato, Luca Saglietti
Understanding the learning process and the embedded computation in transformers is becoming a central goal for the development of interpretable AI. In the present study, we introduce a hierarchical filtering procedure for data models of sequences on trees, allowing us to hand-tune the range of positional correlations in the data. Leveraging this controlled setting, we provide evidence that vanilla encoder-only transformers can approximate the exact inference algorithm when trained on root classification and masked language modeling tasks, and study how this computation is discovered and implemented. We find that correlations at larger distances, corresponding to increasing layers of the hierarchy, are sequentially included by the network during training. By comparing attention maps from models trained with varying degrees of filtering and by probing the different encoder levels, we find clear evidence of a reconstruction of correlations on successive length scales corresponding to the various levels of the hierarchy, which we relate to a plausible implementation of the exact inference algorithm within the same architecture.
nan
Article 781
Title@2025-06-10 (2): Diffusion-based Time Series Forecasting for Sewerage Systems
Title: Diffusion-based Time Series Forecasting for Sewerage Systems | Diffusionsbasierte Zeitreihenprognosen für Kanalisationssysteme | 排污系统时间序列预报 2506.08577v1 |
Authors (5): Nicholas A. Pearson, Francesca Cairoli, Luca Bortolussi, Davide Russo, Francesca Zanello
We introduce a novel deep learning approach that harnesses the power of generative artificial intelligence to enhance the accuracy of contextual forecasting in sewerage systems. By developing a diffusion-based model that processes multivariate time series data, our system excels at capturing complex correlations across diverse environmental signals, enabling robust predictions even during extreme weather events. To strengthen the model’s reliability, we further calibrate its predictions with a conformal inference technique, tailored for probabilistic time series data, ensuring that the resulting prediction intervals are statistically reliable and cover the true target values with a desired confidence level. Our empirical tests on real sewerage system data confirm the model’s exceptional capability to deliver reliable contextual predictions, maintaining accuracy even under severe weather conditions.
nan
Article 782
Title@2025-06-10 (2): FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL
Title: FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL | FlickerFusion: Intra-Trajektorie Domain Generalizing Multi-Agent RL | FlickerFusion: 磁盘内域域 通用多代理 RL 2410.15876v4 |
Authors (8): Woosung Koh, Wonbeen Oh, Siyeol Kim, Suhin Shin, Hyeongjin Kim, Jaein Jang, Junghyun Lee, Se-Young Yun
Multi-agent reinforcement learning has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL approaches often rely on the restrictive assumption that the number of entities (e.g., agents, obstacles) remains constant between training and inference. This overlooks scenarios where entities are dynamically removed or added during the inference trajectory – a common occurrence in real-world environments like search and rescue missions and dynamic combat situations. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization, where such dynamic changes cannot be anticipated beforehand. Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods. FlickerFusion stochastically drops out parts of the observation space, emulating in-domain conditions when inference is run OOD. The results show that FlickerFusion not only achieves superior inference rewards but also uniquely reduces uncertainty vis-à-vis the backbone, compared to existing methods. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.
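The stochastic dropout of observed entities described in the abstract can be sketched roughly as follows. This is an assumption-laden illustration rather than the FlickerFusion implementation: the entity-by-feature observation layout, the function name `drop_entities`, and the rule of always keeping at least one entity are all hypothetical.

```python
import numpy as np

def drop_entities(obs, mask, n_drop, rng):
    """Randomly hide `n_drop` currently visible entities.

    obs:  array of shape (n_entities, n_features)
    mask: boolean array of shape (n_entities,), True where an entity is visible
    Returns the masked observation and the updated mask (inputs are not modified).
    """
    present = np.flatnonzero(mask)
    # Keep at least one visible entity (an illustrative choice).
    n_drop = min(n_drop, max(len(present) - 1, 0))
    hidden = rng.choice(present, size=n_drop, replace=False)
    new_mask = mask.copy()
    new_mask[hidden] = False
    return obs * new_mask[:, None], new_mask

rng_demo = np.random.default_rng(0)
obs_demo = np.ones((5, 3))
mask_demo = np.ones(5, dtype=bool)
new_obs_demo, new_mask_demo = drop_entities(obs_demo, mask_demo, 2, rng_demo)
```

Re-sampling the hidden subset at every step would produce the "flickering" view of the scene; how often and how many entities to drop are tuning choices not specified here.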
nan
Article 783
Title@2025-06-10 (2): How Malicious AI Swarms Can Threaten Democracy
Title: How Malicious AI Swarms Can Threaten Democracy | Wie bösartige KI-Schwärme die Demokratie bedrohen können | 恶意的AI Swararms能够如何威胁民主 2506.06299v2 |
Authors (21): Daniel Thilo Schroeder, Meeyoung Cha, Andrea Baronchelli, Nick Bostrom, Nicholas A. Christakis, David Garcia, Amit Goldenberg, Yara Kyrychenko, Kevin Leyton-Brown, Nina Lutz, Gary Marcus, Filippo Menczer, Gordon Pennycook, David G. Rand, Frank Schweitzer, Christopher Summerfield, Audrey Tang, Jay Van Bavel, Sander van der Linden, Dawn Song, Jonas R. Kunst
Advances in AI portend a new era of sophisticated disinformation operations. While individual AI systems already create convincing – and at times misleading – information, an imminent development is the emergence of malicious AI swarms. These systems can coordinate covertly, infiltrate communities, evade traditional detectors, and run continuous A/B tests, with round-the-clock persistence. The result can include fabricated grassroots consensus, fragmented shared reality, mass harassment, voter micro-suppression or mobilization, contamination of AI training data, and erosion of institutional trust. With democratic processes worldwide increasingly vulnerable, we urge a three-pronged response: (1) platform-side defenses – always-on swarm-detection dashboards, pre-election high-fidelity swarm-simulation stress-tests, transparency audits, and optional client-side “AI shields” for users; (2) model-side safeguards – standardized persuasion-risk tests, provenance-authenticating passkeys, and watermarking; and (3) system-level oversight – a UN-backed AI Influence Observatory.
nan
Article 784
Title@2025-06-10 (2): The Geometries of Truth Are Orthogonal Across Tasks
Title: The Geometries of Truth Are Orthogonal Across Tasks | Die Geometrien der Wahrheit sind orthogonal über Aufgaben hinweg | 真理的地貌不同任务 2506.08572v1 |
Authors (7): Waiss Azizian, Michael Kirchhof, Eugene Ndiaye, Louis Bethune, Michal Klein, Pierre Ablin, Marco Cuturi
Large Language Models (LLMs) have demonstrated impressive generalization capabilities across various tasks, but their claim to practical relevance is still mired in concerns about their reliability. Recent works have proposed examining the activations produced by an LLM at inference time to assess whether its answer to a question is correct. Some works claim that a “geometry of truth” can be learned from examples, in the sense that the activations that generate correct answers can be distinguished from those leading to mistakes with a linear classifier. In this work, we underline a limitation of these approaches: we observe that these “geometries of truth” are intrinsically task-dependent and fail to transfer across tasks. More precisely, we show that linear classifiers trained across distinct tasks share little similarity and, when trained with sparsity-enforcing regularizers, have almost disjoint supports. We show that more sophisticated approaches (e.g., using mixtures of probes and tasks) fail to overcome this limitation, likely because activation vectors commonly used to classify answers form clearly separated clusters when examined across tasks.
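Two simple diagnostics can make "little similarity" and "almost disjoint supports" concrete for a pair of linear probes. The sketch below assumes each probe is summarized by a weight vector; the helper names `cosine_similarity` and `support_jaccard` and the tolerance `tol` are illustrative assumptions, not the metrics reported in the paper.

```python
import numpy as np

def cosine_similarity(w_a, w_b):
    """Cosine of the angle between two probe weight vectors."""
    return float(w_a @ w_b / (np.linalg.norm(w_a) * np.linalg.norm(w_b)))

def support_jaccard(w_a, w_b, tol=1e-8):
    """Jaccard overlap of the nonzero coordinates of two sparse probes."""
    s_a = np.abs(w_a) > tol
    s_b = np.abs(w_b) > tol
    union = np.logical_or(s_a, s_b).sum()
    if union == 0:
        return 1.0  # two all-zero probes trivially share their (empty) support
    return float(np.logical_and(s_a, s_b).sum() / union)

# Two toy sparse probes that use entirely different coordinates.
w_task_a = np.array([1.0, 0.0, 2.0, 0.0])
w_task_b = np.array([0.0, 3.0, 0.0, 0.0])
cos_demo = cosine_similarity(w_task_a, w_task_b)
jac_demo = support_jaccard(w_task_a, w_task_b)
```

Values near zero for both quantities would correspond to the orthogonal, non-transferable probes the abstract describes.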
nan
Article 785
Title@2025-06-10 (2): Asymptotic Normality of Infinite Centered Random Forests - Application to Imbalanced Classification
Title: Asymptotic Normality of Infinite Centered Random Forests - Application to Imbalanced Classification | Asymptotische Normalität von unendlichen zentrierten Zufallswäldern - Anwendung auf eine unausgewogene Klassifizierung | 无限中心随机森林的渐近正态性:在不平衡分类中的应用 2506.08548v1 |
Authors (4): Moria Mayala, Erwan Scornet, Charles Tillier, Olivier Wintenberger
Many classification tasks involve imbalanced data, in which a class is largely underrepresented. Several techniques consist in creating a rebalanced dataset on which a classifier is trained. In this paper, we study such a procedure theoretically when the classifier is a Centered Random Forest (CRF). We establish a Central Limit Theorem (CLT) on the infinite CRF with explicit rates and an exact constant. We then prove that the CRF trained on the rebalanced dataset exhibits a bias, which can be removed with appropriate techniques. Based on an importance sampling (IS) approach, the resulting debiased estimator, called IS-ICRF, satisfies a CLT centered at the prediction function value. For high imbalance settings, we prove that the IS-ICRF estimator enjoys a variance reduction compared to the ICRF trained on the original data. Therefore, our theoretical analysis highlights the benefits of training random forests on a rebalanced dataset (followed by a debiasing procedure) compared to using the original data. Our theoretical results, especially the variance rates and the variance reduction, appear to be valid for Breiman’s random forests in our experiments.
nan
Article 786
Title@2025-06-10 (2): LLM-ML Teaming: Integrated Symbolic Decoding and Gradient Search for Valid and Stable Generative Feature Transformation
Title: LLM-ML Teaming: Integrated Symbolic Decoding and Gradient Search for Valid and Stable Generative Feature Transformation | LLM-ML Teaming: Integrierte Symbolische Dekodierung und Gradientensuche nach gültiger und stabiler Generativer Feature Transformation | LLM-ML 团队化:综合符号解码和渐进搜索,以进行有效和稳定的生成地貌转型 2506.09085v1 |
Authors (7): Xinyuan Wang, Haoyue Bai, Nanxu Gong, Wangyang Ying, Sixun Dong, Xiquan Cui, Yanjie Fu
Feature transformation enhances data representation by deriving new features from the original data. Generative AI offers potential for this task, but faces challenges in stable generation (consistent outputs) and valid generation (error-free sequences). Existing methods fail to resolve both: traditional ML methods suffer from low validity, and LLMs from instability. We find that LLMs ensure valid syntax, while ML’s gradient-steered search stabilizes performance. To bridge this gap, we propose a teaming framework combining LLMs’ symbolic generation with ML’s gradient optimization. This framework includes four steps: (1) golden example generation, aiming to prepare high-quality samples with the ground knowledge of the teacher LLM; (2) feature transformation sequence embedding and search, intending to uncover potentially superior embeddings within the latent space; (3) student LLM feature transformation, aiming to distill knowledge from the teacher LLM; (4) LLM-ML decoder teaming, which combines ML and student LLM probabilities for valid and stable generation. The experiments on various datasets show that the teaming policy can achieve a 5% improvement in downstream performance while reducing error cases by nearly half. The results also demonstrate the efficiency and robustness of the teaming policy. We also report findings on LLMs’ capacity to understand the original data.
nan
Article 787
Title@2025-06-10 (2): Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models
Title: Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models | Verbesserte Ganzseitenoptimierung durch gemischt-körnige Belohnungsmechanismen-angepasste Sprachmodelle | 通过混合发奖机制-成熟语言模式优化全页 2506.09084v1 |
Authors (3): Xinyuan Wang, Liang Wu, Yanjie Fu
Optimizing the presentation of search and recommendation results is crucial to enhancing user experience and engagement. Whole Page Optimization (WPO) plays a pivotal role in this process, as it directly influences how information is surfaced to users. While Pre-trained Large Language Models (LLMs) have demonstrated remarkable capabilities in generating coherent and contextually relevant content, fine-tuning these models for complex tasks like WPO presents challenges. Specifically, the need for extensive human-annotated data to mitigate issues such as hallucinations and model instability can be prohibitively expensive, especially in large-scale systems that interact with millions of items daily. In this work, we address the challenge of fine-tuning LLMs for WPO by using user feedback as the supervision. Unlike manually labeled datasets, user feedback is inherently noisy and less precise. To overcome this, we propose a reward-based fine-tuning approach, PageLLM, which employs a mixed-grained reward mechanism that combines page-level and item-level rewards. The page-level reward evaluates the overall quality and coherence, while the item-level reward focuses on the accuracy and relevance of key recommendations. This dual-reward structure ensures that both the holistic presentation and the critical individual components are optimized. We validate PageLLM on both public and industrial datasets. PageLLM outperforms baselines and achieves a 0.44% GMV increase in an online A/B test with over 10 million users, demonstrating its real-world impact.
nan
Article 788
Title@2025-06-10 (2): Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models
Title: Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models | Auf dem Weg zu einem autonomen Stärkungslernen für die real-weltliche Robotermanipulation mit großen Sprachmodellen | 争取以大语言模式进行现实世界机器人操纵的自主强化学习 2503.04280v4 |
Authors (6): Niccolò Turcato, Matteo Iovino, Aris Synodinos, Alberto Dalla Libera, Ruggero Carli, Pietro Falco
Recent advancements in Large Language Models (LLMs) and Visual Language Models (VLMs) have significantly impacted robotics, enabling high-level semantic motion planning applications. Reinforcement Learning (RL), a complementary paradigm, enables agents to autonomously optimize complex behaviors through interaction and reward signals. However, designing effective reward functions for RL remains challenging, especially in real-world tasks where sparse rewards are insufficient and dense rewards require elaborate design. In this work, we propose Autonomous Reinforcement learning for Complex Human-Informed Environments (ARCHIE), an unsupervised pipeline leveraging GPT-4, a pre-trained LLM, to generate reward functions directly from natural language task descriptions. The rewards are used to train RL agents in simulated environments, where we formalize the reward generation process to enhance feasibility. Additionally, GPT-4 automates the coding of task success criteria, creating a fully automated, one-shot procedure for translating human-readable text into deployable robot skills. Our approach is validated through extensive simulated experiments on single-arm and bi-manual manipulation tasks using an ABB YuMi collaborative robot, highlighting its practicality and effectiveness. Tasks are demonstrated on the real robot setup.
nan
Article 789
Title@2025-06-10 (2): Structured Variational $D$-Decomposition for Accurate and Stable Low-Rank Approximation
Title: Structured Variational $D$-Decomposition for Accurate and Stable Low-Rank Approximation | Strukturierte Variation $D$-Dekomposition für genaue und stabile Low-Rank-Annäherung | 结构变化($D美元)-准确度和稳定、低 Rank 接近度的削减 2506.08535v1 |
Authors (1): Ronald Katende
We introduce the $D$-decomposition, a non-orthogonal matrix factorization of the form $A \approx P D Q$, where $P \in \mathbb{R}^{n \times k}$, $D \in \mathbb{R}^{k \times k}$, and $Q \in \mathbb{R}^{k \times n}$. The decomposition is defined variationally by minimizing a regularized Frobenius loss, allowing control over rank, sparsity, and conditioning. Unlike algebraic factorizations such as LU or SVD, it is computed by alternating minimization. We establish existence and perturbation stability of the solution and show that each update has complexity $\mathcal{O}(n^2k)$. Benchmarks against truncated SVD, CUR, and nonnegative matrix factorization show improved reconstruction accuracy on MovieLens, MNIST, Olivetti Faces, and gene expression matrices, particularly under sparsity and noise.
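The alternating minimization described in the abstract can be illustrated with a small NumPy sketch. It assumes a plain ridge-regularized objective, $\|A - PDQ\|_F^2 + \lambda(\|P\|_F^2 + \|D\|_F^2 + \|Q\|_F^2)$, which is a stand-in since the abstract does not spell out the regularizer; the function name `d_decomposition` and its parameters are likewise hypothetical. Each block update below is the exact minimizer of that assumed objective with the other two factors held fixed.

```python
import numpy as np

def d_decomposition(A, k, lam=1e-3, n_iter=50, seed=0):
    """Alternating ridge-regularized least squares for A ~ P @ D @ Q."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    P = rng.standard_normal((m, k))
    D = np.eye(k)
    Q = rng.standard_normal((k, n))
    I = np.eye(k)
    for _ in range(n_iter):
        # Update P with D, Q fixed: P = A M^T (M M^T + lam I)^{-1}, M = D Q.
        M = D @ Q
        P = np.linalg.solve(M @ M.T + lam * I, M @ A.T).T
        # Update Q with P, D fixed: Q = (N^T N + lam I)^{-1} N^T A, N = P D.
        N = P @ D
        Q = np.linalg.solve(N.T @ N + lam * I, N.T @ A)
        # Update D: solve (P^T P) D (Q Q^T) + lam D = P^T A Q^T
        # by diagonalizing the two symmetric Gram matrices.
        ev_p, U = np.linalg.eigh(P.T @ P)
        ev_q, V = np.linalg.eigh(Q @ Q.T)
        C = U.T @ (P.T @ A @ Q.T) @ V
        D = U @ (C / (np.outer(ev_p, ev_q) + lam)) @ V.T
    return P, D, Q

# Toy usage on an exactly rank-2 matrix.
rng_demo = np.random.default_rng(1)
A_demo = rng_demo.standard_normal((6, 2)) @ rng_demo.standard_normal((2, 6))
P_demo, D_demo, Q_demo = d_decomposition(A_demo, k=2)
rel_err_demo = np.linalg.norm(A_demo - P_demo @ D_demo @ Q_demo) / np.linalg.norm(A_demo)
```

For a square $n \times n$ input, the dominant cost per sweep is the products involving $A$, which are $\mathcal{O}(n^2 k)$, in line with the per-update complexity quoted in the abstract. Sparsity or conditioning penalties would replace the closed-form updates with other solvers.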
nan
Article 790
Title@2025-06-10 (2): Regularized Langevin Dynamics for Combinatorial Optimization
Title: Regularized Langevin Dynamics for Combinatorial Optimization | Regularisierte Langevin-Dynamik für kombinatorische Optimierung | 固定的用于组合优化的兰埃文动态 2502.00277v2 |
Authors (2): Shengyu Feng, Yiming Yang
This work proposes a simple yet effective sampling framework for combinatorial optimization (CO). Our method builds on discrete Langevin dynamics (LD), an efficient gradient-guided generative paradigm. However, we observe that directly applying LD often leads to limited exploration. To overcome this limitation, we propose the Regularized Langevin Dynamics (RLD), which enforces an expected distance between the sampled and current solutions, effectively avoiding local minima. We develop two CO solvers on top of RLD, one based on simulated annealing (SA) and the other based on a neural network (NN). Empirical results on three classic CO problems demonstrate that both of our methods can achieve comparable or better performance against the previous state-of-the-art (SOTA) SA- and NN-based solvers. In particular, our SA algorithm reduces the runtime of the previous SOTA SA method by up to 80%, while achieving equal or superior performance. In summary, RLD offers a promising framework for enhancing both traditional heuristics and NN models to solve CO problems. Our code is available at https://github.com/Shengyu-Feng/RLD4CO.
nan
Article 791
Title@2025-06-10 (2): Federated Learning: From Theory to Practice
Title: Federated Learning: From Theory to Practice | Föderiertes Lernen: Von der Theorie zur Praxis | 联邦学习:从理论到实践 2505.19183v2 |
Authors (1): A. Jung
This book offers a hands-on introduction to building and understanding federated learning (FL) systems. FL enables multiple devices – such as smartphones, sensors, or local computers – to collaboratively train machine learning (ML) models, while keeping their data private and local. It is a powerful solution when data cannot or should not be centralized due to privacy, regulatory, or technical reasons. The book is designed for students, engineers, and researchers who want to learn how to design scalable, privacy-preserving FL systems. Our main focus is on personalization: enabling each device to train its own model while still benefiting from collaboration with relevant devices. This is achieved by leveraging similarities between (the learning tasks associated with) devices that are encoded by the weighted edges (or links) of a federated learning network (FL network). The key idea is to represent real-world FL systems as networks of devices, where nodes correspond to devices and edges represent communication links and data similarities between them. The training of personalized models for these devices can be naturally framed as a distributed optimization problem. This optimization problem is referred to as generalized total variation minimization (GTVMin) and ensures that devices with similar learning tasks learn similar model parameters. Our approach is both mathematically principled and practically motivated. While we introduce some advanced ideas from optimization theory and graph-based learning, we aim to keep the book accessible. Readers are guided through the core ideas step by step, with intuitive explanations.
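For orientation, one common way to write a GTVMin-style objective is sketched below. The notation is an assumption made here rather than a quotation from the book: $\mathcal{V}$ and $\mathcal{E}$ denote the nodes and edges of the FL network, $L_i$ the local loss of device $i$, $w^{(i)}$ its model parameters, $A_{i,j} > 0$ the edge weights, and $\alpha > 0$ a regularization strength. The squared Euclidean norm is only one possible choice of variation penalty.

$$\min_{\{w^{(i)}\}_{i \in \mathcal{V}}} \; \sum_{i \in \mathcal{V}} L_i\big(w^{(i)}\big) \;+\; \alpha \sum_{\{i,j\} \in \mathcal{E}} A_{i,j} \,\big\| w^{(i)} - w^{(j)} \big\|_2^2$$

The first term fits each device's model to its local data; the second term penalizes differences between the parameters of devices joined by heavily weighted edges, which is how similar learning tasks end up with similar models.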
nan
Article 792
Title@2025-06-10 (2): Robust Evolutionary Multi-Objective Network Architecture Search for Reinforcement Learning (EMNAS-RL)
Title: Robust Evolutionary Multi-Objective Network Architecture Search for Reinforcement Learning (EMNAS-RL) | Robuste evolutionäre multiobjektive Netzwerkarchitektur Suche nach Verstärkungslernen (EMNAS-RL) | 强劲进化多目标网络强化学习建筑搜索(EMNAS-RL) 2506.08533v1 |
Authors (4): Nihal Acharya Adde, Alexandra Gianzina, Hanno Gottschalk, Andreas Ebert
This paper introduces Evolutionary Multi-Objective Network Architecture Search (EMNAS) for the first time to optimize neural network architectures in large-scale Reinforcement Learning (RL) for Autonomous Driving (AD). EMNAS uses genetic algorithms to automate network design, tailored to enhance rewards and reduce model size without compromising performance. Additionally, parallelization techniques are employed to accelerate the search, and teacher-student methodologies are implemented to ensure scalable optimization. This research underscores the potential of transfer learning as a robust framework for optimizing performance across iterative learning processes by effectively leveraging knowledge from earlier generations to enhance learning efficiency and stability in subsequent generations. Experimental results demonstrate that tailored EMNAS outperforms manually designed models, achieving higher rewards with fewer parameters. The findings of these strategies contribute positively to EMNAS for RL in autonomous driving, advancing the field toward better-performing networks suitable for real-world scenarios.
nan
Article 793
Title@2025-06-10 (2): Efficient Robust Conformal Prediction via Lipschitz-Bounded Networks
Title: Efficient Robust Conformal Prediction via Lipschitz-Bounded Networks | Effiziente robuste konforme Vorhersage über Lipschitz-gebundene Netzwerke | 通过Lipschitz-Bounds网络进行非正式预测 2506.05434v2 |
Authors (7): Thomas Massena, Léo Andéol, Thibaut Boissin, Franck Mamalet, Corentin Friedrich, Mathieu Serrurier, Sébastien Gerchinovitz
Conformal Prediction (CP) has proven to be an effective post-hoc method for improving the trustworthiness of neural networks by providing prediction sets with finite-sample guarantees. However, under adversarial attacks, classical conformal guarantees no longer hold: this problem is addressed in the field of Robust Conformal Prediction. Several methods have been proposed to provide robust CP sets with guarantees under adversarial perturbations, but, for large-scale problems, these sets are either too large or the methods are too computationally demanding to be deployed in real-life scenarios. In this work, we propose a new method that leverages Lipschitz-bounded networks to precisely and efficiently estimate robust CP sets. When combined with a 1-Lipschitz robust network, we demonstrate that our lip-rcp method outperforms state-of-the-art results in both the size of the robust CP sets and computational efficiency in medium and large-scale scenarios such as ImageNet. Taking a different angle, we also study vanilla CP under attack, and derive new worst-case coverage bounds of vanilla CP sets, which are valid simultaneously for all adversarial attack levels. Our lip-rcp method makes this second approach as efficient as vanilla CP while also allowing robustness guarantees.
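To see why a Lipschitz bound makes robust prediction sets cheap, consider the generic sketch below. It is not the lip-rcp method itself: it assumes a nonconformity score (lower means more conforming) that is `lipschitz`-Lipschitz in the input, calibration on clean data, and a test input perturbed by at most `eps` in the matching norm; all function and variable names are hypothetical.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(np.sort(cal_scores)[k - 1])

def robust_prediction_set(test_scores, q_hat, lipschitz, eps):
    """Labels whose score could be <= q_hat somewhere within eps of the input.

    If the clean input lies within eps of the observed one, its score is at
    least (observed score - lipschitz * eps), so no label that conforms on
    the clean input is discarded.
    """
    return np.flatnonzero(test_scores - lipschitz * eps <= q_hat)

# Toy usage: nine calibration scores and three candidate labels.
cal_demo = np.arange(1, 10) / 10
q_demo = conformal_quantile(cal_demo, alpha=0.5)
scores_demo = np.array([0.4, 0.55, 0.9])
plain_set_demo = robust_prediction_set(scores_demo, q_demo, lipschitz=1.0, eps=0.0)
robust_set_demo = robust_prediction_set(scores_demo, q_demo, lipschitz=1.0, eps=0.1)
```

The robust set costs one forward pass plus a subtraction, and it grows with `lipschitz * eps`, which is why a small, known Lipschitz constant matters for keeping the sets tight.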
nan
Article 794
Title@2025-06-10 (2): Leveraging chaos in the training of artificial neural networks
Title: Leveraging chaos in the training of artificial neural networks | Das Chaos in der Ausbildung künstlicher neuronaler Netze abbauen | 在培训人工神经网络时利用混乱 2506.08523v1 |
Authors (3): Pedro Jiménez-González, Miguel C. Soriano, Lucas Lacasa
Traditional algorithms to optimize artificial neural networks when confronted with a supervised learning task are usually exploitation-type relaxational dynamics such as gradient descent (GD). Here, we explore the dynamics of the neural network trajectory along training for unconventionally large learning rates. We show that for a region of values of the learning rate, the GD optimization shifts away from a purely exploitation-like algorithm into a regime of exploration-exploitation balance, as the neural network is still capable of learning but the trajectory shows sensitive dependence on initial conditions, as characterized by a positive network maximum Lyapunov exponent. Interestingly, the characteristic training time required to reach an acceptable accuracy in the test set reaches a minimum precisely in such a learning rate region, further suggesting that one can accelerate the training of artificial neural networks by operating at the onset of chaos. Our results, initially illustrated for the MNIST classification task, qualitatively hold for a range of supervised learning tasks, learning architectures and other hyperparameters, and showcase the emergent, constructive role of transient chaotic dynamics in the training of artificial neural networks.
nan
Article 795
Title@2025-06-10 (2): FEDTAIL: Federated Long-Tailed Domain Generalization with Sharpness-Guided Gradient Matching
Title: FEDTAIL: Federated Long-Tailed Domain Generalization with Sharpness-Guided Gradient Matching | FEDTAIL: Federated Long-Tailed Domain Generalization mit Sharpness-Guided Gradient Matching | FEDTAIL: 采用尖锐引导梯度匹配法的联邦长期失败域 2506.08518v1 |
Authors (4): Sunny Gupta, Nikita Jangid, Shounak Das, Amit Sethi
Domain Generalization (DG) seeks to train models that perform reliably on unseen target domains without access to target data during training. While recent progress in smoothing the loss landscape has improved generalization, existing methods often falter under long-tailed class distributions and conflicting optimization objectives. We introduce FedTAIL, a federated domain generalization framework that explicitly addresses these challenges through sharpness-guided, gradient-aligned optimization. Our method incorporates a gradient coherence regularizer to mitigate conflicts between classification and adversarial objectives, leading to more stable convergence. To combat class imbalance, we perform class-wise sharpness minimization and propose a curvature-aware dynamic weighting scheme that adaptively emphasizes underrepresented tail classes. Furthermore, we enhance conditional distribution alignment by integrating sharpness-aware perturbations into entropy regularization, improving robustness under domain shift. FedTAIL unifies optimization harmonization, class-aware regularization, and conditional alignment into a scalable, federated-compatible framework. Extensive evaluations across standard domain generalization benchmarks demonstrate that FedTAIL achieves state-of-the-art performance, particularly in the presence of domain shifts and label imbalance, validating its effectiveness in both centralized and federated settings. Code: https://github.com/sunnyinAI/FedTail
nan
Article 796
Title@2025-06-10 (2): Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
Title: Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering | Meta-Adaptive Prompt-Destillation für wenige-heiße visuelle Frage-Antworten | 用于少数浅光视觉问题解答的元适应性快速蒸馏 2506.06905v2 |
Authors (3): Akash Gupta, Amos Storkey, Mirella Lapata
Large Multimodal Models (LMMs) often rely on in-context learning (ICL) to perform new tasks with minimal supervision. However, ICL performance, especially in smaller LMMs, is inconsistent and does not always improve monotonically with increasing examples. We hypothesize that this occurs due to the LMM being overwhelmed by additional information present in the image embeddings, which is not required for the downstream task. To address this, we propose a meta-learning approach that provides an alternative for inducing few-shot capabilities in LMMs, using a fixed set of soft prompts that are distilled from task-relevant image features and can be adapted at test time using a few examples. To facilitate this distillation, we introduce an attention-mapper module that can be easily integrated with the popular LLaVA v1.5 architecture and is jointly learned with soft prompts, enabling task adaptation in LMMs under low-data regimes with just a few gradient steps. Evaluation on the VL-ICL Bench shows that our method consistently outperforms ICL and related prompt-tuning approaches, even under image perturbations, improving task induction and reasoning across visual question answering tasks.
nan
Article 797
Title@2025-06-10 (2): NeurIPS 2024 ML4CFD Competition: Results and Retrospective Analysis
Title: NeurIPS 2024 ML4CFD Competition: Results and Retrospective Analysis | NeurIPS 2024 ML4CFD Wettbewerb: Ergebnisse und Retrospektivenanalyse | NeurIPS 2024 ML4CFD竞争:结果和回溯性分析 2506.08516v1 |
Authors (18): Mouadh Yagoubi, David Danan, Milad Leyli-Abadi, Ahmed Mazari, Jean-Patrick Brunet, Abbas Kabalan, Fabien Casenave, Yuxin Ma, Giovanni Catalani, Jean Fesquet, Jacob Helwig, Xuan Zhang, Haiyang Yu, Xavier Bertrand, Frederic Tost, Michael Baurheim, Joseph Morlier, Shuiwang Ji
The integration of machine learning (ML) into the physical sciences is reshaping computational paradigms, offering the potential to accelerate demanding simulations such as computational fluid dynamics (CFD). Yet, persistent challenges in accuracy, generalization, and physical consistency hinder the practical deployment of ML models in scientific domains. To address these limitations and systematically benchmark progress, we organized the ML4CFD competition, centered on surrogate modeling for aerodynamic simulations over two-dimensional airfoils. The competition attracted over 240 teams, who were provided with a curated dataset generated via OpenFOAM and evaluated through a multi-criteria framework encompassing predictive accuracy, physical fidelity, computational efficiency, and out-of-distribution generalization. This retrospective analysis reviews the competition outcomes, highlighting several approaches that outperformed baselines under our global evaluation score. Notably, the top entry exceeded the performance of the original OpenFOAM solver on aggregate metrics, illustrating the promise of ML-based surrogates to outperform traditional solvers under tailored criteria. Drawing from these results, we analyze the key design principles of top submissions, assess the robustness of our evaluation framework, and offer guidance for future scientific ML challenges.
nan
Article 798
Title@2025-06-10 (2): DiffGradCAM: A Universal Class Activation Map Resistant to Adversarial Training
Title: DiffGradCAM: A Universal Class Activation Map Resistant to Adversarial Training | DiffGradCAM: Eine universelle Aktivierungskarte der Klasse, die dem adversarialen Training standhält | DiffGradCAM: 通用级启动地图抗反向培训 2506.08514v1 |
Authors (3): Jacob Piland, Chris Sweet, Adam Czakja
Class Activation Mapping (CAM) and its gradient-based variants (e.g., GradCAM) have become standard tools for explaining Convolutional Neural Network (CNN) predictions. However, these approaches typically focus on individual logits, while for neural networks using softmax, the class membership probability estimates depend only on the differences between logits, not on their absolute values. This disconnect leaves standard CAMs vulnerable to adversarial manipulation, such as passive fooling, where a model is trained to produce misleading CAMs without affecting decision performance. We introduce Salience-Hoax Activation Maps (SHAMs), an entropy-aware form of passive fooling that serves as a benchmark for CAM robustness under adversarial conditions. To address the passive fooling vulnerability, we then propose DiffGradCAM, a novel, lightweight, and contrastive approach to class activation mapping that is not susceptible to passive fooling and also matches the output of standard CAM methods such as GradCAM in the non-adversarial case. Together, SHAM and DiffGradCAM establish a new framework for probing and improving the robustness of saliency-based explanations. We validate both contributions across multi-class tasks with few and many classes.
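The claim that softmax probabilities depend only on differences between logits can be checked directly: adding the same constant to every logit leaves the output unchanged. The following is a minimal sketch of that property only, with made-up logit values; it is not an implementation of SHAM or DiffGradCAM.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5, -1.0]              # illustrative values
shifted = [z + 100.0 for z in logits]  # same pairwise differences

p = softmax(logits)
q = softmax(shifted)

# Largest elementwise gap between the two probability vectors.
max_gap = max(abs(a - b) for a, b in zip(p, q))
```

Because the absolute logit level carries no information about the prediction, an explanation built on a single raw logit can be shifted without changing the model's decisions, which is the gap the abstract describes.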
nan
Article 799
Title@2025-06-10 (2): Human-Aligned Image Models Improve Visual Decoding from the Brain
Title: Human-Aligned Image Models Improve Visual Decoding from the Brain | Menschlich ausgerichtete Imagemodelle verbessern die visuelle Dekodierung aus dem Gehirn | 人与人之间的图像模型改进大脑的视觉解码 2502.03081v3 |
Authors (6): Nona Rajabi, Antônio H. Ribeiro, Miguel Vasco, Farzaneh Taleb, Mårten Björkman, Danica Kragic
Decoding visual images from brain activity has significant potential for advancing brain-computer interaction and enhancing the understanding of human perception. Recent approaches align the representation spaces of images and brain activity to enable visual decoding. In this paper, we introduce the use of human-aligned image encoders to map brain signals to images. We hypothesize that these models more effectively capture perceptual attributes associated with the rapid visual stimuli presentations commonly used in visual brain data recording experiments. Our empirical results support this hypothesis, demonstrating that this simple modification improves image retrieval accuracy by up to 21% compared to state-of-the-art methods. Comprehensive experiments confirm consistent performance improvements across diverse EEG architectures, image encoders, alignment methods, participants, and brain imaging modalities.
nan
Article 800
Title@2025-06-10 (2): Explaining, Fast and Slow: Abstraction and Refinement of Provable Explanations
Title: Explaining, Fast and Slow: Abstraction and Refinement of Provable Explanations | Erklären, Schnell und Langsam: Abstraktion und Verfeinerung wahrnehmbarer Erklärungen | 解释、快速和缓慢:可允许解释的抽象和完善 2506.08505v1 |
Authors (5): Shahaf Bassan, Yizhak Yisrael Elboher, Tobias Ladner, Matthias Althoff, Guy Katz
Despite significant advancements in post-hoc explainability techniques for neural networks, many current methods rely on heuristics and do not provide formally provable guarantees over the explanations provided. Recent work has shown that it is possible to obtain explanations with formal guarantees by identifying subsets of input features that are sufficient to determine that predictions remain unchanged using neural network verification techniques. Despite the appeal of these explanations, their computation faces significant scalability challenges. In this work, we address this gap by proposing a novel abstraction-refinement technique for efficiently computing provably sufficient explanations of neural network predictions. Our method abstracts the original large neural network by constructing a substantially reduced network, where a sufficient explanation of the reduced network is also provably sufficient for the original network, hence significantly speeding up the verification process. If the explanation is insufficient on the reduced network, we iteratively refine the network size by gradually increasing it until convergence. Our experiments demonstrate that our approach enhances the efficiency of obtaining provably sufficient explanations for neural network predictions while additionally providing a fine-grained interpretation of the network’s predictions across different abstraction levels.
nan
Article 801
Title@2025-06-10 (2): CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations
Title: CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations | CoMuMDR: Code-gemischter multimodaler Multidomain-Korpus für Diskurs paRsing in Gesprächen | COMMDR: 混合编码的多式混合多式对话中聊天声响多式公文 2506.08504v1 |
Authors (6): Divyaksh Shukla, Ritesh Baviskar, Dwijesh Gohil, Aniket Tiwari, Atul Shree, Ashutosh Modi
Discourse parsing is an important task useful for NLU applications such as summarization, machine comprehension, and emotion recognition. The current discourse parsing datasets based on conversations consist of written English dialogues restricted to a single domain. In this resource paper, we introduce CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations. The corpus (code-mixed in Hindi and English) has both audio and transcribed text and is annotated with nine discourse relations. We experiment with various SoTA baseline models; the poor performance of SoTA models highlights the challenges of a multi-domain code-mixed corpus, pointing towards the need for developing better models for such realistic settings.
nan
Article 802
Title@2025-06-10 (2): Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
Title: Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning | Memory, Benchmark & Robots: Ein Benchmark für die Lösung komplexer Aufgaben mit Verstärkungslernen | 记忆、基准和机器人:用强化学习解决复杂任务的基准 2502.10550v2 |
Authors (4): Egor Cherepanov, Nikita Kachaev, Alexey K. Kovalev, Aleksandr I. Panov
Memory is crucial for enabling agents to tackle complex tasks with temporal and spatial dependencies. While many reinforcement learning (RL) algorithms incorporate memory, the field lacks a universal benchmark to assess an agent’s memory capabilities across diverse scenarios. This gap is particularly evident in tabletop robotic manipulation, where memory is essential for solving tasks with partial observability and ensuring robust performance, yet no standardized benchmarks exist. To address this, we introduce MIKASA (Memory-Intensive Skills Assessment Suite for Agents), a comprehensive benchmark for memory RL, with three key contributions: (1) we propose a comprehensive classification framework for memory-intensive RL tasks, (2) we collect MIKASA-Base – a unified benchmark that enables systematic evaluation of memory-enhanced agents across diverse scenarios, and (3) we develop MIKASA-Robo (pip install mikasa-robo-suite) – a novel benchmark of 32 carefully designed memory-intensive tasks that assess memory capabilities in tabletop robotic manipulation. Our work introduces a unified framework to advance memory RL research, enabling more robust systems for real-world use. MIKASA is available at https://tinyurl.com/membenchrobots.
nan
Article 803
Title@2025-06-10 (2): DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs
Title: DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs | In Konflikte geraten: In suchgesteigerten LLMs widersprüchliche Quellen erkennen und bekämpfen | 钻入冲突:发现和解决搜索中的冲突源 2506.08500v1 |
Authors (9): Arie Cattan, Alon Jacovi, Ori Ram, Jonathan Herzig, Roee Aharoni, Sasha Goldshtein, Eran Ofek, Idan Szpektor, Avi Caciularu
Retrieval Augmented Generation (RAG) is a commonly used approach for enhancing large language models (LLMs) with relevant and up-to-date information. However, the retrieved sources can often contain conflicting information and it remains unclear how models should address such discrepancies. In this work, we first propose a novel taxonomy of knowledge conflict types in RAG, along with the desired model behavior for each type. We then introduce CONFLICTS, a high-quality benchmark with expert annotations of conflict types in a realistic RAG setting. CONFLICTS is the first benchmark that enables tracking progress on how models address a wide range of knowledge conflicts. We conduct extensive experiments on this benchmark, showing that LLMs often struggle to appropriately resolve conflicts between sources. While prompting LLMs to explicitly reason about the potential conflict in the retrieved documents significantly improves the quality and appropriateness of their responses, substantial room for improvement in future research remains.
nan
Article 804
Title@2025-06-10 (2): Chip Placement with Diffusion Models
Title: Chip Placement with Diffusion Models | Chipplatzierung mit Diffusionsmodellen | 与传播模型一起放置芯片 2407.12282v3 |
Authors (6): Vint Lee, Minh Nguyen, Leena Elzeiny, Chun Deng, Pieter Abbeel, John Wawrzynek
Macro placement is a vital step in digital circuit design that defines the physical location of large collections of components, known as macros, on a 2D chip. Because key performance metrics of the chip are determined by the placement, optimizing it is crucial. Existing learning-based methods typically fall short because of their reliance on reinforcement learning (RL), which is slow and struggles to generalize, requiring online training on each new circuit. Instead, we train a diffusion model capable of placing new circuits zero-shot, using guided sampling in lieu of RL to optimize placement quality. To enable such models to train at scale, we designed a capable yet efficient architecture for the denoising model, and propose a novel algorithm to generate large synthetic datasets for pre-training. To allow zero-shot transfer to real circuits, we empirically study the design decisions of our dataset generation algorithm, and identify several key factors enabling generalization. When trained on our synthetic data, our models generate high-quality placements on unseen, realistic circuits, achieving competitive performance on placement benchmarks compared to state-of-the-art methods.
nan
Article 805
Title@2025-06-10 (2): The Face of Populism: Examining Differences in Facial Emotional Expressions of Political Leaders Using Machine Learning
Title: The Face of Populism: Examining Differences in Facial Emotional Expressions of Political Leaders Using Machine Learning | Das Gesicht des Populismus: Prüfung von Unterschieden in Gesichtsgefühlen politischer Führer mit Hilfe von maschinellem Lernen | 民粹主义的面孔:利用机器学习研究政治领袖的情感表达方式的差别 2304.09914v5 |
Authors (2): Sara Major, Aleksandar Tomašević
Populist rhetoric employed on online media is characterized as deeply impassioned and often imbued with strong emotions. The aim of this paper is to empirically investigate the differences in affective nonverbal communication of political leaders. We use a deep-learning approach to process a sample of 220 YouTube videos of political leaders from 15 different countries, analyze their facial expressions of emotion and then examine differences in average emotion scores representing the relative presence of 6 emotional states (anger, disgust, fear, happiness, sadness, and surprise) and a neutral expression for each frame of the YouTube video. Based on a sample of manually coded images, we find that this deep-learning approach has 53-60% agreement with human labels. We observe statistically significant differences in the average score of negative emotions between groups of leaders with varying degrees of populist rhetoric.
nan
Article 806
Title@2025-06-10 (2): Thermodynamically Consistent Latent Dynamics Identification for Parametric Systems
Title: Thermodynamically Consistent Latent Dynamics Identification for Parametric Systems | Thermodynamisch konsistente Latent Dynamics-Identifikation für parametrische Systeme | 参数系统热动力一致的冷流动态识别 2506.08475v1 |
Authors (6): Xiaolong He, Yeonjong Shin, Anthony Gruber, Sohyeon Jung, Kookjin Lee, Youngsoo Choi
We propose an efficient thermodynamics-informed latent space dynamics identification (tLaSDI) framework for the reduced-order modeling of parametric nonlinear dynamical systems. This framework integrates autoencoders for dimensionality reduction with newly developed parametric GENERIC formalism-informed neural networks (pGFINNs), which enable efficient learning of parametric latent dynamics while preserving key thermodynamic principles such as free energy conservation and entropy generation across the parameter space. To further enhance model performance, a physics-informed active learning strategy is incorporated, leveraging a greedy, residual-based error indicator to adaptively sample informative training data, outperforming uniform sampling at equivalent computational cost. Numerical experiments on the Burgers’ equation and the 1D/1V Vlasov-Poisson equation demonstrate that the proposed method achieves up to 3,528x speed-up with 1-3% relative errors, and significant reduction in training (50-90%) and inference (57-61%) cost. Moreover, the learned latent space dynamics reveal the underlying thermodynamic behavior of the system, offering valuable insights into the physical-space dynamics.
nan
Article 807
Title@2025-06-10 (2): Penalty Learning for Optimal Partitioning using Multilayer Perceptron
Title: Penalty Learning for Optimal Partitioning using Multilayer Perceptron | Penalty Learning für optimale Partitionierung mit Multilayer Perceptron | 使用多层 Perceptron 进行最佳分区的惩罚学习 2408.00856v4 |
Authors (2): Tung L Nguyen, Toby Dylan Hocking
Changepoint detection is a technique used to identify significant shifts in sequences and is widely used in fields such as finance, genomics, and medicine. To identify the changepoints, dynamic programming (DP) algorithms, particularly the Optimal Partitioning (OP) family, are widely used. To control the number of changepoints, these algorithms use a fixed penalty on the presence of each changepoint. To predict the optimal value of that penalty, existing methods have used simple models, such as linear or tree-based ones, which may limit predictive performance. To address this issue, this study proposes using a multilayer perceptron (MLP) with a ReLU activation function to predict the penalty. The proposed model generates continuous predictions (as opposed to the stepwise ones in tree-based models) and handles non-linearity better than linear models. Experiments on large benchmark genomic datasets demonstrate that the proposed model improves accuracy and F1 score compared to existing models.
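To make the role of the penalty concrete, here is a minimal sketch of the classic Optimal Partitioning recursion with a squared-error segment cost. The function name, the cost choice, and the toy signal are assumptions for illustration; the sketch takes the penalty as a given number and does not include the MLP that the paper trains to predict it.

```python
def optimal_partitioning(y, penalty):
    """Return sorted changepoint positions minimizing
    (sum of per-segment squared errors) + penalty * (number of changepoints)."""
    n = len(y)
    # Prefix sums give each segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a, b):
        """Squared error of y[a:b] around its own mean."""
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)  # best[t]: optimal penalized cost of y[:t]
    best[0] = -penalty      # offsets the penalty charged to the first segment
    last = [0] * (n + 1)    # last[t]: start index of the final segment of y[:t]
    for t in range(1, n + 1):
        best[t], last[t] = min(
            (best[s] + cost(s, t) + penalty, s) for s in range(t)
        )

    # Backtrack through segment starts to recover the changepoints.
    changepoints = []
    t = n
    while t > 0:
        s = last[t]
        if s > 0:
            changepoints.append(s)
        t = s
    return sorted(changepoints)

signal = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0]
few = optimal_partitioning(signal, penalty=1.0)     # small penalty
none = optimal_partitioning(signal, penalty=100.0)  # large penalty
```

On this toy signal a small penalty recovers the single jump while a large one suppresses it, which is why choosing the penalty well per sequence matters. This quadratic-time version is for clarity only; pruned variants exist but are not shown.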
nan
Article 808
Title@2025-06-10 (2): A Survey of the Self Supervised Learning Mechanisms for Vision Transformers
Title: A Survey of the Self Supervised Learning Mechanisms for Vision Transformers | Eine Übersicht über die selbstüberwachten Lernmechanismen für Visionstransformatoren | 展望变革者自我监督学习机制调查 2408.17059v5 |
Authors (14): Asifullah Khan, Anabia Sohail, Mustansar Fiaz, Mehdi Hassan, Tariq Habib Afridi, Sibghat Ullah Marwat, Farzeen Munir, Safdar Ali, Hannan Naseem, Muhammad Zaigham Zaheer, Kamran Ali, Tangina Sultana, Ziaurrehman Tanoli, Naeem Akhter
Vision Transformers (ViTs) have recently demonstrated remarkable performance in computer vision tasks. However, their parameter-intensive nature and reliance on large amounts of data for effective performance have shifted the focus from traditional human-annotated labels to unsupervised learning and pretraining strategies that uncover hidden structures within the data. In response to this challenge, self-supervised learning (SSL) has emerged as a promising paradigm. SSL leverages inherent relationships within the data itself as a form of supervision, eliminating the need for manual labeling and offering a more scalable and resource-efficient alternative for model training. Given these advantages, it is imperative to explore the integration of SSL techniques with ViTs, particularly in scenarios with limited labeled data. Inspired by this evolving trend, this survey aims to systematically review SSL mechanisms tailored for ViTs. We propose a comprehensive taxonomy to classify SSL techniques based on their representations and pre-training tasks. Additionally, we discuss the motivations behind SSL, review prominent pre-training tasks, and highlight advancements and challenges in this field. Furthermore, we conduct a comparative analysis of various SSL methods designed for ViTs, evaluating their strengths, limitations, and applicability to different scenarios.
nan
Article 809
Title@2025-06-10 (2): How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation
Title: How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation | Wie richten und ergänzen Bilder LiDAR? Auf dem Weg zu einer harmonisierten multimodalen 3D-Panoptischen Segmentierung | 图像如何对齐和补充 LiDAR ? 2505.18956v2 |
Authors (4): Yining Pan, Qiongjie Cui, Xulei Yang, Na Zhao
LiDAR-based 3D panoptic segmentation often struggles with the inherent sparsity of data from LiDAR sensors, which makes it challenging to accurately recognize distant or small objects. Recently, a few studies have sought to overcome this challenge by integrating LiDAR inputs with camera images, leveraging the rich and dense texture information provided by the latter. While these approaches have shown promising results, they still face challenges, such as misalignment during data augmentation and the reliance on post-processing steps. To address these issues, we propose Image-Assists-LiDAR (IAL), a novel multi-modal 3D panoptic segmentation framework. In IAL, we first introduce a modality-synchronized data augmentation strategy, PieAug, to ensure alignment between LiDAR and image inputs from the start. Next, we adopt a transformer decoder to directly predict panoptic segmentation results. To effectively fuse LiDAR and image features into tokens for the decoder, we design a Geometric-guided Token Fusion (GTF) module. Additionally, we leverage the complementary strengths of each modality as priors for query initialization through a Prior-based Query Generation (PQG) module, enhancing the decoder’s ability to generate accurate instance masks. Our IAL framework achieves state-of-the-art performance compared to previous multi-modal 3D panoptic segmentation methods on two widely used benchmarks. Code and models are publicly available at https://github.com/IMPL-Lab/IAL.git.
nan
Article 810
Title@2025-06-10 (2): AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models
Title: AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models | AVA-Bench: Atomic Visual Ability Benchmark für Vision Foundation Modelle | AVA-Bench:远景基础模型的原子视觉能力基准 2506.09082v1 |
Authors (8): Zheda Mai, Arpita Chowdhury, Zihe Wang, Sooyoung Jeon, Lemeng Wang, Jiacheng Hou, Jihyung Kil, Wei-Lun Chao
The rise of vision foundation models (VFMs) calls for systematic evaluation. A common approach pairs VFMs with large language models (LLMs) as general-purpose heads, followed by evaluation on broad Visual Question Answering (VQA) benchmarks. However, this protocol has two key blind spots: (i) the instruction tuning data may not align with VQA test distributions, meaning a wrong prediction can stem from such data mismatch rather than a VFM’s visual shortcomings; (ii) VQA benchmarks often require multiple visual abilities, making it hard to tell whether errors stem from lacking all required abilities or just a single critical one. To address these gaps, we introduce AVA-Bench, the first benchmark that explicitly disentangles 14 Atomic Visual Abilities (AVAs) – foundational skills like localization, depth estimation, and spatial understanding that collectively support complex visual reasoning tasks. By decoupling AVAs and matching training and test distributions within each, AVA-Bench pinpoints exactly where a VFM excels or falters. Applying AVA-Bench to leading VFMs thus reveals distinctive “ability fingerprints,” turning VFM selection from educated guesswork into principled engineering. Notably, we find that a 0.5B LLM yields similar VFM rankings as a 7B LLM while cutting GPU hours by 8x, enabling more efficient evaluation. By offering a comprehensive and transparent benchmark, we hope AVA-Bench lays the foundation for the next generation of VFMs.
nan
Article 811
Title@2025-06-10 (2): MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature
Title: MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature | MAC: Eine effiziente Gradienten-Vorkonditionierung mit der mittleren Aktivierung Ungefähre Krümmung | MAC: 使用平均振动作用的高效渐进式预设设备 2506.08464v1 |
Authors (3): Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko
Second-order optimization methods for training neural networks, such as KFAC, exhibit superior convergence by utilizing curvature information of the loss landscape. However, this comes at the expense of a high computational burden. In this work, we analyze the two components that constitute the layer-wise Fisher information matrix (FIM) used in KFAC: the Kronecker factors related to activations and pre-activation gradients. Based on empirical observations on their eigenspectra, we propose efficient approximations for them, resulting in a computationally efficient optimization method called MAC. To the best of our knowledge, MAC is the first algorithm to apply the Kronecker factorization to the FIM of attention layers used in transformers and explicitly integrate attention scores into the preconditioning. We also study the convergence property of MAC on nonlinear neural networks and provide two conditions under which it converges to global minima. Our extensive evaluations on various network architectures and datasets show that the proposed method outperforms KFAC and other state-of-the-art methods in terms of accuracy, end-to-end training time, and memory usage. Code is available at https://github.com/hseung88/mac.
nan
Article 812
Title@2025-06-10 (2): How to Provably Improve Return Conditioned Supervised Learning?
Title: How to Provably Improve Return Conditioned Supervised Learning? | Wie man wahrscheinlich verbessern Rückkehr bedingtes Überwachtes Lernen? | 如何改善受监督的学习? 2506.08463v1 |
Authors (5): Zhishuai Liu, Yu Yang, Ruhan Wang, Pan Xu, Dongruo Zhou
In sequential decision-making problems, Return-Conditioned Supervised Learning (RCSL) has gained increasing recognition for its simplicity and stability in modern decision-making tasks. Unlike traditional offline reinforcement learning (RL) algorithms, RCSL frames policy learning as a supervised learning problem by taking both the state and return as input. This approach eliminates the instability often associated with temporal difference (TD) learning in offline RL. However, RCSL has been criticized for lacking the stitching property, meaning its performance is inherently limited by the quality of the policy used to generate the offline dataset. To address this limitation, we propose a principled and simple framework called Reinforced RCSL. The key innovation of our framework is the introduction of a concept we call the in-distribution optimal return-to-go. This mechanism leverages our policy to identify the best achievable in-dataset future return based on the current state, avoiding the need for complex return augmentation techniques. Our theoretical analysis demonstrates that Reinforced RCSL can consistently outperform the standard RCSL approach. Empirical results further validate our claims, showing significant performance improvements across a range of benchmarks.
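For readers unfamiliar with the return signal that RCSL conditions on, the sketch below computes returns-to-go for logged trajectories and, as a deliberately naive stand-in, takes the largest return-to-go observed from each state in the dataset. The function names, the tabular states, and the toy trajectories are all hypothetical; the paper's in-distribution optimal return-to-go is its own construct, and this per-state maximum is only an assumption-laden illustration that would be over-optimistic under stochastic dynamics.

```python
from collections import defaultdict

def returns_to_go(rewards, gamma=1.0):
    """Discounted sum of future rewards at every time step."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def best_in_dataset_rtg(trajectories):
    """Naive tabular illustration: largest return-to-go seen from each state."""
    best = defaultdict(lambda: float("-inf"))
    for states, rewards in trajectories:
        for s, g in zip(states, returns_to_go(rewards)):
            best[s] = max(best[s], g)
    return dict(best)

# Two toy trajectories that share states "a" and "b" but end differently.
trajectories = [
    (["a", "b", "c"], [0.0, 1.0, 0.0]),
    (["a", "b", "d"], [0.0, 0.0, 5.0]),
]
targets = best_in_dataset_rtg(trajectories)
```

Conditioning on the per-state target rather than on each trajectory's own return is one simple way to see how a policy could be asked for the better continuation from a shared state, which is the intuition behind stitching.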
nan
Article 813
Title@2025-06-10 (2): MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning
Title: MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning | MOBODY: Modellbasierte Off-Dynamik Offline-Verstärkung Lernen | MOBODY: 示范非药物非药物脱线强化学习 2506.08460v1 |
Authors (4): Yihong Guo, Yu Yang, Pan Xu, Anqi Liu
We study the off-dynamics offline reinforcement learning problem, where the goal is to learn a policy from offline datasets collected from source and target domains with mismatched transitions. Existing off-dynamics offline RL methods typically either filter source transitions that resemble those of the target domain or apply reward augmentation to source data, both constrained by the limited transitions available from the target domain. As a result, the learned policy is unable to explore the target domain beyond the offline datasets. We propose MOBODY, a Model-Based Off-Dynamics offline RL algorithm that addresses this limitation by enabling exploration of the target domain via learned dynamics. MOBODY generates new synthetic transitions in the target domain through model rollouts, which are used as data augmentation during offline policy learning. Unlike existing model-based methods that learn dynamics from a single domain, MOBODY tackles the challenge of mismatched dynamics by leveraging both source and target datasets. Directly merging these datasets can bias the learned model toward source dynamics. Instead, MOBODY learns target dynamics by discovering a shared latent representation of states and transitions across domains through representation learning. To stabilize training, MOBODY incorporates a behavior cloning loss that regularizes the policy. Specifically, we introduce a Q-weighted behavior cloning loss that regularizes the policy toward actions with high target-domain Q-values, rather than uniformly imitating all actions in the dataset. These Q-values are learned from an enhanced target dataset composed of offline target data, augmented source data, and rollout data from the learned target dynamics. We evaluate MOBODY on MuJoCo benchmarks and show that it significantly outperforms state-of-the-art baselines, with especially pronounced improvements in challenging scenarios.
nan
Article 814
Title@2025-06-10 (2): Textual Unlearning Gives a False Sense of Unlearning
Title: Textual Unlearning Gives a False Sense of Unlearning | Textuelles Nichtlernen gibt einen falschen Sinn für das Nichtlernen | 文字不学习给人一个错误的不学习感 2406.13348v3 |
Authors (6): Jiacheng Du, Zhibo Wang, Jie Zhang, Xiaoyi Pang, Jiahui Hu, Kui Ren
Language Models (LMs) are prone to “memorizing” training data, including substantial sensitive user information. To mitigate privacy risks and safeguard the right to be forgotten, machine unlearning has emerged as a promising approach for enabling LMs to efficiently “forget” specific texts. However, despite the good intentions, is textual unlearning really as effective and reliable as expected? To address the concern, we first propose Unlearning Likelihood Ratio Attack+ (U-LiRA+), a rigorous textual unlearning auditing method, and find that unlearned texts can still be detected with very high confidence after unlearning. Further, we conduct an in-depth investigation on the privacy risks of textual unlearning mechanisms in deployment and present the Textual Unlearning Leakage Attack (TULA), along with its variants in both black- and white-box scenarios. We show that textual unlearning mechanisms could instead reveal more about the unlearned texts, exposing them to significant membership inference and data reconstruction risks. Our findings highlight that existing textual unlearning actually gives a false sense of unlearning, underscoring the need for more robust and secure unlearning mechanisms.
nan
Article 815
Title@2025-06-10 (2): The interplay of robustness and generalization in quantum machine learning
Title: The interplay of robustness and generalization in quantum machine learning | Das Zusammenspiel von Robustheit und Verallgemeinerung im quantenmechanischen Lernen | 量子机器学习中强力和一般化的相互作用 2506.08455v1 |
Authors (3): Julian Berberich, Tobias Fellner, Christian Holm
While adversarial robustness and generalization have individually received substantial attention in the recent literature on quantum machine learning, their interplay is much less explored. In this chapter, we address this interplay for variational quantum models, which were recently proposed as function approximators in supervised learning. We discuss recent results quantifying both robustness and generalization via Lipschitz bounds, which explicitly depend on model parameters. Thus, they give rise to a regularization-based training approach for robust and generalizable quantum models, highlighting the importance of trainable data encoding strategies. The practical implications of the theoretical results are demonstrated with an application to time series analysis.
Article 816
Title@2025-06-10 (2): An Explainable Vision Transformer with Transfer Learning Combined with Support Vector Machine Based Efficient Drought Stress Identification
Title: An Explainable Vision Transformer with Transfer Learning Combined with Support Vector Machine Based Efficient Drought Stress Identification | Ein erklärbarer Vision Transformer mit Transfer Learning kombiniert mit Unterstützung Vector Maschine basierte effiziente Drought Stress-Identifikation | 一个可解释的愿景变异器,与转让学习相结合,与基于支持矢量的高效干旱应激反应机的识别 2407.21666v2 |
Authors (3): Aswini Kumar Patra, Ankit Varshney, Lingaraj Sahoo
Early detection of drought stress is critical for taking timely measures to reduce crop loss before the drought impact becomes irreversible. The subtle phenotypical and physiological changes in response to drought stress are captured by non-invasive imaging techniques, and these imaging data serve as a valuable resource for machine learning methods to identify drought stress. While convolutional neural networks (CNNs) are in wide use, vision transformers (ViTs) present a promising alternative in capturing long-range dependencies and intricate spatial relationships, thereby enhancing the detection of subtle indicators of drought stress. We propose an explainable deep learning pipeline that leverages the power of ViTs for drought stress detection in potato crops using aerial imagery. We applied two distinct approaches: a synergistic combination of ViT and support vector machine (SVM), where ViT extracts intricate spatial features from aerial images and SVM classifies the crops as stressed or healthy, and an end-to-end approach using a dedicated classification layer within ViT to directly detect drought stress. Our key findings explain the ViT model’s decision-making process by visualizing attention maps. These maps highlight the specific spatial features within the aerial images that the ViT model focuses on as the drought stress signature. Our findings demonstrate that the proposed methods not only achieve high accuracy in drought stress identification but also shed light on the diverse subtle plant features associated with drought stress. This offers a robust and interpretable solution for drought stress monitoring, helping farmers make informed decisions for improved crop management.
Article 817
Title@2025-06-10 (2): Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data
Title: Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data | Stabilisierendes Kontrastives RL: Techniken für das Roboterziel, das von Offline-Daten aus erreicht wird | 稳定违规RL:利用离线数据实现机器人目标的技术 2306.03346v3 |
Authors (7): Chongyi Zheng, Benjamin Eysenbach, Homer Walke, Patrick Yin, Kuan Fang, Ruslan Salakhutdinov, Sergey Levine
Robotic systems that rely primarily on self-supervised learning have the potential to decrease the amount of human annotation and engineering effort required to learn control strategies. In the same way that prior robotic systems have leveraged self-supervised techniques from computer vision (CV) and natural language processing (NLP), our work builds on prior work showing that reinforcement learning (RL) itself can be cast as a self-supervised problem: learning to reach any goal without human-specified rewards or labels. Despite the seeming appeal, little (if any) prior work has demonstrated how self-supervised RL methods can be practically deployed on robotic systems. By first studying a challenging simulated version of this task, we discover design decisions about architectures and hyperparameters that increase the success rate by $2 \times$. These findings lay the groundwork for our main result: we demonstrate that a self-supervised RL algorithm based on contrastive learning can solve real-world, image-based robotic manipulation tasks, with tasks being specified by a single goal image provided after training.
Article 818
Title@2025-06-10 (2): Systematic and Efficient Construction of Quadratic Unconstrained Binary Optimization Forms for High-order and Dense Interactions
Title: Systematic and Efficient Construction of Quadratic Unconstrained Binary Optimization Forms for High-order and Dense Interactions | Systematische und effiziente Konstruktion von Quadratic Unconstrained Binary Optimization Forms für High-Order und Dense Interaktionen | 系统、高效地为高阶和高调互动和高调互动系统构建 “ 中度、无限制的 “ 优化二进制表格 2506.08448v1 |
Authors (2): Hyakka Nakada, Shu Tanaka
Quantum Annealing (QA) can efficiently solve combinatorial optimization problems whose objective functions are represented by Quadratic Unconstrained Binary Optimization (QUBO) formulations. For broader applicability of QA, quadratization methods are used to transform higher-order problems into QUBOs. However, quadratization methods for complex problems involving Machine Learning (ML) remain largely unknown. In these problems, strong nonlinearity and dense interactions prevent conventional methods from being applied. Therefore, we model target functions as a sum of rectified linear unit bases, which not only are universal approximators but also have an equivalent quadratic-polynomial representation. In this study, the proof of concept is verified both numerically and analytically. In addition, by combining QA with the proposed quadratization, we design a new black-box optimization scheme, in which ML surrogate regressors are passed to QA after the quadratization process.
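As background on what quadratization means, the sketch below shows the classical Rosenberg substitution on a single cubic term. This is a conventional method, not the ReLU-based construction proposed in the abstract. The product x1*x2 is replaced by an auxiliary bit y, and a penalty that vanishes exactly when y = x1*x2 is added. The function names and the penalty weight are illustrative assumptions.

```python
from itertools import product

PENALTY = 2  # illustrative weight; it must exceed the cubic coefficient's magnitude


def cubic(x1, x2, x3):
    """Higher-order objective: one cubic monomial with coefficient -1."""
    return -x1 * x2 * x3


def quadratized(x1, x2, x3, y):
    """Quadratic surrogate with auxiliary bit y standing in for x1 * x2."""
    # Rosenberg penalty: 0 when y == x1 * x2, at least 1 otherwise.
    substitution_penalty = x1 * x2 - 2 * (x1 + x2) * y + 3 * y
    return -y * x3 + PENALTY * substitution_penalty


def matches_everywhere():
    """Check that minimising over y recovers the cubic on every input."""
    for x1, x2, x3 in product((0, 1), repeat=3):
        best = min(quadratized(x1, x2, x3, y) for y in (0, 1))
        if best != cubic(x1, x2, x3):
            return False
    return True


print(matches_everywhere())
```

Each cubic term handled this way costs one auxiliary variable, which is why dense, strongly nonlinear objectives become expensive under conventional substitution.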
Article 819
Title@2025-06-10 (2): Understanding Bias Reinforcement in LLM Agents Debate
Title: Understanding Bias Reinforcement in LLM Agents Debate | Verständnis der Bias-Verstärkung in LLM-Agenten-Debatte | 了解LLLM代理商的强化申请 2503.16814v3 |
Authors (4): Jihwan Oh, Minchan Jeong, Jongwoo Ko, Se-Young Yun
Large Language Models (LLMs) solve complex problems using training-free methods like prompt engineering and in-context learning, yet ensuring reasoning correctness remains challenging. While self-correction methods such as self-consistency and self-refinement aim to improve reliability, they often reinforce biases due to the lack of effective feedback mechanisms. Multi-Agent Debate (MAD) has emerged as an alternative, but we identify two key limitations: bias reinforcement, where debate amplifies model biases instead of correcting them, and lack of perspective diversity, as all agents share the same model and reasoning patterns, limiting true debate effectiveness. To systematically evaluate these issues, we introduce MetaNIM Arena, a benchmark designed to assess LLMs in adversarial strategic decision-making, where dynamic interactions influence optimal decisions. To overcome MAD’s limitations, we propose DReaMAD (Diverse Reasoning via Multi-Agent Debate with Refined Prompt), a novel framework that (1) refines LLM’s strategic prior knowledge to improve reasoning quality and (2) promotes diverse viewpoints within a single model by systematically modifying prompts, reducing bias. Empirical results show that DReaMAD significantly improves decision accuracy, reasoning diversity, and bias mitigation across multiple strategic tasks, establishing it as a more effective approach for LLM-based decision-making.
Article 820
Title@2025-06-10 (2): A Framework for Controllable Multi-objective Learning with Annealed Stein Variational Hypernetworks
Title: A Framework for Controllable Multi-objective Learning with Annealed Stein Variational Hypernetworks | Ein Rahmen für kontrollierbares multiobjektives Lernen mit Annealed Stein Variational Hypernetworks | 与Annaaled Stein变异超网络的可控多目标学习框架 2506.06715v2 |
Authors (2): Minh-Duc Nguyen, Dung D. Le
Pareto Set Learning (PSL) is popular as an efficient approach to obtaining the complete optimal solution in Multi-objective Learning (MOL). A set of optimal solutions approximates the Pareto set, and its mapping is a set of dense points in the Pareto front in objective space. However, some current methods face a challenge: how to keep the Pareto solutions diverse while maximizing the hypervolume value. In this paper, we propose a novel method to address this challenge, which employs Stein Variational Gradient Descent (SVGD) to approximate the entire Pareto set. SVGD pushes a set of particles towards the Pareto set by applying a form of functional gradient descent, which helps to converge and diversify optimal solutions. Additionally, we employ diverse gradient direction strategies to thoroughly investigate a unified framework for SVGD in multi-objective optimization and adapt this framework with an annealing schedule to promote stability. We introduce our method, SVH-MOL, and validate its effectiveness through extensive experiments on multi-objective problems and multi-task learning, demonstrating its superior performance.
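For readers unfamiliar with SVGD, the following is a minimal sketch of the generic single-objective update on a one-dimensional Gaussian target. It is not SVH-MOL: there are no hypernetworks, no multiple objectives, and no annealing here. The fixed kernel bandwidth, the step size, and the target N(2, 1) are assumptions chosen only to keep the example small.

```python
import math


def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One SVGD update with the RBF kernel k(a, b) = exp(-(a - b)^2 / bandwidth)."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / bandwidth)
            attraction = k * grad_log_p(xj)  # pulls xi toward high density
            repulsion = -2.0 * (xj - xi) / bandwidth * k  # d k(xj, xi) / d xj
            phi += attraction + repulsion
        updated.append(xi + step_size * phi / n)
    return updated


def run_svgd(num_particles=20, num_steps=1000):
    """Move particles from [-1, 1] toward the assumed target N(2, 1)."""

    def grad_log_p(x):
        return -(x - 2.0)  # score function of N(2, 1)

    particles = [-1.0 + 2.0 * i / (num_particles - 1) for i in range(num_particles)]
    for _ in range(num_steps):
        particles = svgd_step(particles, grad_log_p)
    return particles


final = run_svgd()
print(sum(final) / len(final), max(final) - min(final))
```

The attraction term drives convergence and the kernel-gradient term keeps particles apart, which is the "converge and diversify" behaviour the abstract refers to.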
Article 821
Title@2025-06-10 (2): Time-Aware World Model for Adaptive Prediction and Control
Title: Time-Aware World Model for Adaptive Prediction and Control | Time-Aware-Weltmodell für adaptive Vorhersage und Kontrolle | 适应性预测和控制世界模型 2506.08441v1 |
Authors (3): Anh N. Nhu, Sanghyun Son, Ming Lin
In this work, we introduce the Time-Aware World Model (TAWM), a model-based approach that explicitly incorporates temporal dynamics. By conditioning on the time-step size, $\Delta t$, and training over a diverse range of $\Delta t$ values – rather than sampling at a fixed time-step – TAWM learns both high- and low-frequency task dynamics across diverse control problems. Grounded in the information-theoretic insight that the optimal sampling rate depends on a system’s underlying dynamics, this time-aware formulation improves both performance and data efficiency. Empirical evaluations show that TAWM consistently outperforms conventional models across varying observation rates in a variety of control tasks, using the same number of training samples and iterations. Our code can be found online at: github.com/anh-nn01/Time-Aware-World-Model.
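The idea of conditioning on the time-step size can be illustrated with a small data-generation sketch. It only shows how training tuples might carry a varying step size as an extra model input. The log-uniform sampling range, the toy constant-velocity dynamics, and all function names are assumptions, not details taken from the TAWM code.

```python
import math
import random


def sample_time_step(rng, dt_min=0.001, dt_max=0.1):
    """Draw a step size log-uniformly (an assumed choice) from [dt_min, dt_max]."""
    return math.exp(rng.uniform(math.log(dt_min), math.log(dt_max)))


def make_training_tuple(rng):
    """Build one (model_input, target) pair for a time-conditioned model."""
    position = rng.uniform(-1.0, 1.0)
    velocity = rng.uniform(-1.0, 1.0)
    dt = sample_time_step(rng)
    next_position = position + velocity * dt  # toy ground-truth dynamics
    model_input = (position, velocity, dt)  # dt is part of the input
    return model_input, next_position


rng = random.Random(0)
batch = [make_training_tuple(rng) for _ in range(4)]
for model_input, target in batch:
    print(model_input, target)
```

A model trained on such tuples sees the same state under many step sizes, rather than a single fixed sampling rate.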
Article 822
Title@2025-06-10 (2): Refiner: Data Refining against Gradient Leakage Attacks in Federated Learning
Title: Refiner: Data Refining against Gradient Leakage Attacks in Federated Learning | Raffinerie: Datenveredelung gegen gradiente Leckage-Angriffe im Federated Learning | 完善:完善数据,防止联邦学习中逐渐泄漏事件 2212.02042v3 |
Authors (5): Mingyuan Fan, Cen Chen, Chengyu Wang, Xiaodan Li, Wenmeng Zhou
Recent works have brought attention to the vulnerability of Federated Learning (FL) systems to gradient leakage attacks. Such attacks exploit clients’ uploaded gradients to reconstruct their sensitive data, thereby compromising the privacy protection capability of FL. In response, various defense mechanisms have been proposed to mitigate this threat by manipulating the uploaded gradients. Unfortunately, empirical evaluations have demonstrated limited resilience of these defenses against sophisticated attacks, indicating an urgent need for more effective defenses. In this paper, we explore a novel defensive paradigm that departs from conventional gradient perturbation approaches and instead focuses on the construction of robust data. Intuitively, if robust data exhibits low semantic similarity with clients’ raw data, the gradients associated with robust data can effectively obfuscate attackers. To this end, we design Refiner, which jointly optimizes two metrics, one for privacy protection and one for performance maintenance. The utility metric is designed to promote consistency between the gradients of key parameters associated with robust data and those derived from clients’ data, thus maintaining model performance. Furthermore, the privacy metric guides the generation of robust data towards enlarging the semantic gap with clients’ data. Theoretical analysis supports the effectiveness of Refiner, and empirical evaluations on multiple benchmark datasets demonstrate the superior effectiveness of Refiner in defending against state-of-the-art attacks.
Article 823
Title@2025-06-10 (2): Scaling Laws in Linear Regression: Compute, Parameters, and Data
Title: Scaling Laws in Linear Regression: Compute, Parameters, and Data | Skalierungsgesetze in linearer Regression: Berechnung, Parameter und Daten | 线性回归法的扩展法:计算、参数和数据 2406.08466v3 |
Authors (5): Licong Lin, Jingfeng Wu, Sham M. Kakade, Peter L. Bartlett, Jason D. Lee
Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow. However, conventional wisdom suggests the test error consists of approximation, bias, and variance errors, where the variance error increases with model size. This disagrees with the general form of neural scaling laws, which predict that increasing model size monotonically improves performance. We study the theory of scaling laws in an infinite dimensional linear regression setup. Specifically, we consider a model with $M$ parameters as a linear function of sketched covariates. The model is trained by one-pass stochastic gradient descent (SGD) using $N$ data. Assuming the optimal parameter satisfies a Gaussian prior and the data covariance matrix has a power-law spectrum of degree $a>1$, we show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$. The variance error, which increases with $M$, is dominated by the other errors due to the implicit regularization of SGD, thus disappearing from the bound. Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
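The stated rate can be explored numerically with a few lines of arithmetic. The sketch below evaluates the two terms of $M^{-(a-1)} + N^{-(a-1)/a}$ with all hidden constants set to 1, which is an assumption, since the $\Theta(\cdot)$ bound fixes the rate only up to constant factors. The function names are illustrative.

```python
def reducible_error_rate(num_params, num_samples, a):
    """Evaluate M^{-(a-1)} + N^{-(a-1)/a}, ignoring constants hidden by Theta."""
    if a <= 1:
        raise ValueError("the power-law degree a must exceed 1")
    model_size_term = num_params ** (-(a - 1))
    data_size_term = num_samples ** (-(a - 1) / a)
    return model_size_term + data_size_term


def balancing_sample_size(num_params, a):
    """Sample count at which both terms are equal: N = M^a."""
    return num_params ** a


# With a = 2, M = 100 parameters are balanced by N = 100^2 = 10,000 samples.
print(balancing_sample_size(100, 2.0))
print(reducible_error_rate(100, 10_000, 2.0))
```

Setting the two terms equal gives $N = M^a$, so under this rate, growing the model beyond $M \approx N^{1/a}$ leaves the data term dominant.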
Article 824
Title@2025-06-10 (2): Learning to Lead: Incentivizing Strategic Agents in the Dark
Title: Learning to Lead: Incentivizing Strategic Agents in the Dark | Führen lernen: Anreize für strategische Agenten im Dunkeln | 学习铅:在黑暗中激励战略剂 2506.08438v1 |
Authors (3): Yuchen Wu, Xinyi Zhong, Zhuoran Yang
We study an online learning version of the generalized principal-agent model, where a principal interacts repeatedly with a strategic agent who possesses private types and private rewards and takes unobservable actions. The agent is non-myopic, optimizing a discounted sum of future rewards, and may strategically misreport types to manipulate the principal’s learning. The principal, observing only her own realized rewards and the agent’s reported types, aims to learn an optimal coordination mechanism that minimizes strategic regret. We develop the first provably sample-efficient algorithm for this challenging setting. Our approach features a novel pipeline that combines (i) a delaying mechanism to incentivize approximately myopic agent behavior, (ii) an innovative reward angle estimation framework that uses sector tests and a matching procedure to recover type-dependent reward functions, and (iii) a pessimistic-optimistic LinUCB algorithm that enables the principal to explore efficiently while respecting the agent’s incentive constraints. We establish a near-optimal $\tilde{O}(\sqrt{T})$ regret bound for learning the principal’s optimal policy, where $\tilde{O}(\cdot)$ omits logarithmic factors. Our results open up new avenues for designing robust online learning algorithms for a wide range of game-theoretic settings involving private types and strategic agents.
Article 825
Title@2025-06-10 (2): Olica: Efficient Structured Pruning of Large Language Models without Retraining
Title: Olica: Efficient Structured Pruning of Large Language Models without Retraining | Olica: Effizientes strukturiertes Beschneiden großer Sprachmodelle ohne Umschulung | Olica:无再培训的大型语言模型高效结构稳健 2506.08436v1 |
Authors (2): Jiujun He, Huazhen Lin
Most existing structured pruning methods for Large Language Models (LLMs) require substantial computational and data resources for retraining to reestablish the corrupted correlations, making them prohibitively expensive. To address this, we propose a pruning framework for LLMs called Orthogonal decomposition and Linear Calibration (Olica), which eliminates the need for retraining. A key observation is that the multi-head attention (MHA) layer depends on two types of matrix products. By treating these matrix products as unified entities and applying principal component analysis (PCA), we extract the most important information to compress LLMs without sacrificing accuracy or disrupting their original structure. Consequently, retraining becomes unnecessary. A fast decomposition method is devised, reducing the complexity of PCA by a factor of the square of the number of attention heads. Additionally, to mitigate the error accumulation problem caused by pruning the feed-forward network (FFN) layer, we introduce a linear calibration method to reconstruct the residual errors of pruned layers using low-rank matrices. By leveraging singular value decomposition (SVD) on the solution of the least-squares problem, these matrices are obtained without requiring retraining. Extensive experiments show that the proposed Olica is efficient in terms of data usage, GPU memory, and running time, while delivering superior performance across multiple benchmarks.
Article 826
Title@2025-06-10 (2): Boosting Gradient Leakage Attacks: Data Reconstruction in Realistic FL Settings
Title: Boosting Gradient Leakage Attacks: Data Reconstruction in Realistic FL Settings | Förderung gradienter Leckagenangriffe: Datenwiederaufbau in realistischen FL-Einstellungen | 推动渐变渗漏袭击:在现实的FL设置中进行数据重建 2506.08435v1 |
Authors (4): Mingyuan Fan, Fuyi Wang, Cen Chen, Jianying Zhou
Federated learning (FL) enables collaborative model training among multiple clients without the need to expose raw data. Its ability to safeguard privacy, at the heart of FL, has recently been a hot-button debate topic. To elaborate, several studies have introduced a type of attack known as gradient leakage attacks (GLAs), which exploit the gradients shared during training to reconstruct clients’ raw data. On the flip side, some literature contends that there is no substantial privacy risk in practical FL environments, because the effectiveness of such GLAs is limited to overly relaxed conditions, such as small batch sizes and knowledge of clients’ data distributions. This paper bridges this critical gap by empirically demonstrating that clients’ data can still be effectively reconstructed, even within realistic FL environments. Upon revisiting GLAs, we recognize that their performance failures stem from their inability to handle the gradient matching problem. To alleviate the performance bottlenecks identified above, we develop FedLeak, which introduces two novel techniques, partial gradient matching and gradient regularization. Moreover, to evaluate the performance of FedLeak in real-world FL environments, we formulate a practical evaluation protocol grounded in a thorough review of extensive FL literature and industry practices. Under this protocol, FedLeak can still achieve high-fidelity data reconstruction, thereby underscoring the significant vulnerability in FL systems and the urgent need for more effective defense methods.
Article 827
Title@2025-06-10 (2): FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making
Title: FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making | FinHEAR: Human Expertise und adaptive Risiko-Bewusst Temporale Begründung für Finanzentscheidungen | FinHEAR: 金融决策的人类专门知识和适应性风险—-风险警报时间原因 2506.09080v1 |
Authors (7): Jiaxiang Chen, Mingxi Zou, Zhuo Wang, Qifan Wang, Dongning Sun, Chi Zhang, Zenglin Xu
Financial decision-making presents unique challenges for language models, demanding temporal reasoning, adaptive risk assessment, and responsiveness to dynamic events. While large language models (LLMs) show strong general reasoning capabilities, they often fail to capture behavioral patterns central to human financial decisions, such as expert reliance under information asymmetry, loss-averse sensitivity, and feedback-driven temporal adjustment. We propose FinHEAR, a multi-agent framework for Human Expertise and Adaptive Risk-aware reasoning. FinHEAR orchestrates specialized LLM-based agents to analyze historical trends, interpret current events, and retrieve expert-informed precedents within an event-centric pipeline. Grounded in behavioral economics, it incorporates expert-guided retrieval, confidence-adjusted position sizing, and outcome-based refinement to enhance interpretability and robustness. Empirical results on curated financial datasets show that FinHEAR consistently outperforms strong baselines across trend prediction and trading tasks, achieving higher accuracy and better risk-adjusted returns.
Article 828
Title@2025-06-10 (2): Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings
Title: Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings | Verstärkte Konvergenzraten für die nichtkonvexe Optimierung durch Reduktionsmappings | 通过减少绘图实现非电流优化的夏端趋同率 2506.08428v1 |
Authors (3): Evan Markou, Thalaiyasingam Ajanthan, Stephen Gould
Many high-dimensional optimisation problems exhibit rich geometric structures in their set of minimisers, often forming smooth manifolds due to over-parametrisation or symmetries. When this structure is known, at least locally, it can be exploited through reduction mappings that reparametrise part of the parameter space to lie on the solution manifold. These reductions naturally arise from inner optimisation problems and effectively remove redundant directions, yielding a lower-dimensional objective. In this work, we introduce a general framework to understand how such reductions influence the optimisation landscape. We show that well-designed reduction mappings improve curvature properties of the objective, leading to better-conditioned problems and theoretically faster convergence for gradient-based methods. Our analysis unifies a range of scenarios where structural information at optimality is leveraged to accelerate convergence, offering a principled explanation for the empirical gains observed in such optimisation algorithms.
Article 829
Title@2025-06-10 (2): HASFL: Heterogeneity-aware Split Federated Learning over Edge Computing Systems
Title: HASFL: Heterogeneity-aware Split Federated Learning over Edge Computing Systems | HASFL: Heterogenität bewusst Split Federated Learning over Edge Computing Systems | HASFL: 超越边缘电子计算系统的异异质-认知异质分化联邦学习系统 2506.08426v1 |
Authors (5): Zheng Lin, Zhe Chen, Xianhao Chen, Wei Ni, Yue Gao
Split federated learning (SFL) has emerged as a promising paradigm to democratize machine learning (ML) on edge devices by enabling layer-wise model partitioning. However, existing SFL approaches suffer significantly from the straggler effect due to the heterogeneous capabilities of edge devices. To address the fundamental challenge, we propose adaptively controlling batch sizes (BSs) and model splitting (MS) for edge devices to overcome resource heterogeneity. We first derive a tight convergence bound of SFL that quantifies the impact of varied BSs and MS on learning performance. Based on the convergence bound, we propose HASFL, a heterogeneity-aware SFL framework capable of adaptively controlling BS and MS to balance communication-computing latency and training convergence in heterogeneous edge networks. Extensive experiments with various datasets validate the effectiveness of HASFL and demonstrate its superiority over state-of-the-art benchmarks.
Article 830
Title@2025-06-10 (2): Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy
Title: Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy | Mic-hackathon 2024: Hackathon zum maschinellen Lernen für Elektronen- und Rastersondenmikroskopie | Mic-hackathon 2024:关于电子和扫描探测器机器学习的Hackathon 2506.08423v1 |
Authors (73): Utkarsh Pratiush, Austin Houston, Kamyar Barakati, Aditya Raghavan, Dasol Yoon, Harikrishnan KP, Zhaslan Baraissov, Desheng Ma, Samuel S. Welborn, Mikolaj Jakowski, Shawn-Patrick Barhorst, Alexander J. Pattison, Panayotis Manganaris, Sita Sirisha Madugula, Sai Venkata Gayathri Ayyagari, Vishal Kennedy, Ralph Bulanadi, Michelle Wang, Kieran J. Pang, Ian Addison-Smith, Willy Menacho, Horacio V. Guzman, Alexander Kiefer, Nicholas Furth, Nikola L. Kolev, Mikhail Petrov, Viktoriia Liu, Sergey Ilyev, Srikar Rairao, Tommaso Rodani, Ivan Pinto-Huguet, Xuli Chen, Josep Cruañes, Marta Torrens, Jovan Pomar, Fanzhi Su, Pawan Vedanti, Zhiheng Lyu, Xingzhi Wang, Lehan Yao, Amir Taqieddin, Forrest Laskowski, Xiangyu Yin, Yu-Tsun Shao, Benjamin Fein-Ashley, Yi Jiang, Vineet Kumar, Himanshu Mishra, Yogesh Paul, Adib Bazgir, Rama chandra Praneeth Madugula, Yuwen Zhang, Pravan Omprakash, Jian Huang, Eric Montufar-Morales, Vivek Chawla, Harshit Sethi, Jie Huang, Lauri Kurki, Grace Guinan, Addison Salvador, Arman Ter-Petrosyan, Madeline Van Winkle, Steven R. Spurgeon, Ganesh Narasimha, Zijie Wu, Richard Liu, Yongtao Liu, Boris Slautin, Andrew R Lupini, Rama Vasudevan, Gerd Duscher, Sergei V. Kalinin
Microscopy is a primary source of information on materials structure and functionality at nanometer and atomic scales. The data generated is often well-structured, enriched with metadata and sample histories, though not always consistent in detail or format. The adoption of Data Management Plans (DMPs) by major funding agencies promotes preservation and access. However, deriving insights remains difficult due to the lack of standardized code ecosystems, benchmarks, and integration strategies. As a result, data usage is inefficient and analysis time is extensive. In addition to post-acquisition analysis, new APIs from major microscope manufacturers enable real-time, ML-based analytics for automated decision-making and ML-agent-controlled microscope operation. Yet, a gap remains between the ML and microscopy communities, limiting the impact of these methods on physics, materials discovery, and optimization. Hackathons help bridge this divide by fostering collaboration between ML researchers and microscopy experts. They encourage the development of novel solutions that apply ML to microscopy, while preparing a future workforce for instrumentation, materials science, and applied ML. This hackathon produced benchmark datasets and digital twins of microscopes to support community growth and standardized workflows. All related code is available at GitHub: https://github.com/KalininGroup/Mic-hackathon-2024-codes-publication/tree/1.0.0.1
Article 831
Title@2025-06-10 (2): Dense ReLU Neural Networks for Temporal-spatial Model
Title: Dense ReLU Neural Networks for Temporal-spatial Model | Dense ReLU Neuronale Netzwerke für zeitlich-räumliches Modell | 时空空间模型神经网络 2411.09961v8 |
Authors (5): Carlos Misael Madrid Padilla, Zhi Zhang, Xiaokai Luo, Daren Wang, Oscar Hernan Madrid Padilla
In this paper, we focus on fully connected deep neural networks utilizing the Rectified Linear Unit (ReLU) activation function for nonparametric estimation. We derive non-asymptotic bounds that lead to convergence rates, addressing both temporal and spatial dependence in the observed measurements. By accounting for dependencies across time and space, our models better reflect the complexities of real-world data, enhancing both predictive performance and theoretical robustness. We also tackle the curse of dimensionality by modeling the data on a manifold, exploring the intrinsic dimensionality of high-dimensional data. We broaden existing theoretical findings of temporal-spatial analysis by applying them to neural networks in more general contexts and demonstrate that our proof techniques are effective for models with short-range dependence. Our empirical simulations across various synthetic response functions underscore the superior performance of our method, outperforming established approaches in the existing literature. These findings provide valuable insights into the strong capabilities of dense neural networks (Dense NN) for temporal-spatial modeling across a broad range of function classes.
Article 832
Title@2025-06-10 (2): DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Title: DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization | DisCO: Mit diskriminativ eingeschränkter Optimierung groß angelegte Modelle stärken | disCO:加强大理由理由模型,采用有偏见的限制性最佳化 2505.12366v2 |
Authors (5): Gang Li, Ming Lin, Tomer Galanti, Zhengzhong Tu, Tianbao Yang
The recent success and openness of DeepSeek-R1 have brought widespread attention to Group Relative Policy Optimization (GRPO) as a reinforcement learning method for large reasoning models (LRMs). In this work, we analyze the GRPO objective under a binary reward setting and reveal an inherent limitation of question-level difficulty bias. We also identify a connection between GRPO and traditional discriminative methods in supervised learning. Motivated by these insights, we introduce a new Discriminative Constrained Optimization (DisCO) framework for reinforcing LRMs, grounded in the principle of discriminative learning. The main differences between DisCO and GRPO and its recent variants are: (1) it replaces the group relative objective with a discriminative objective defined by a scoring function; (2) it abandons clipping-based surrogates in favor of non-clipping RL surrogate objectives used as scoring functions; (3) it employs a simple yet effective constrained optimization approach to enforce the KL divergence constraint, ensuring stable training. As a result, DisCO offers notable advantages over GRPO and its variants: (i) it completely eliminates difficulty bias by adopting discriminative objectives; (ii) it addresses the entropy instability in GRPO and its variants through the use of non-clipping scoring functions and a constrained optimization approach; (iii) it allows the incorporation of advanced discriminative learning techniques to address data imbalance, where a significant number of questions have more negative than positive generated answers during training. Our experiments on enhancing the mathematical reasoning capabilities of SFT-finetuned models show that DisCO significantly outperforms GRPO and its improved variants such as DAPO, achieving average gains of 7% over GRPO and 6% over DAPO across six benchmark tasks for a 1.5B model.
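To see where question-level statistics enter a group-relative objective, the sketch below computes the commonly used group-normalized advantage for binary rewards. It is a generic illustration rather than the paper's analysis or the DisCO objective. Using the population standard deviation and the small `eps` constant are assumptions, as implementations differ on both.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one question's group of sampled answers."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumed convention
    return [(r - mu) / (sigma + eps) for r in rewards]


# One hard question (1 of 4 answers correct) and one easier question (3 of 4).
hard_question = group_relative_advantages([1, 0, 0, 0])
easy_question = group_relative_advantages([1, 1, 1, 0])

print(hard_question)
print(easy_question)
```

With binary rewards and success rate p, the correct answers receive an advantage of about sqrt((1 - p) / p), so the per-answer signal depends on how difficult the question is for the current policy. Groups in which every answer is right or every answer is wrong yield zero advantage.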
Article 833
Title@2025-06-10 (2): Online Learning-guided Learning Rate Adaptation via Gradient Alignment
Title: Online Learning-guided Learning Rate Adaptation via Gradient Alignment | Online Learning-guided Learning Rate Anpassung über Gradient Alignment | 通过 “ 渐进协调 “ 调整,调整在线学习指导学习率 2506.08419v1 |
Authors (3): Ruichen Jiang, Ali Kavis, Aryan Mokhtari
The performance of an optimizer on large-scale deep learning models depends critically on fine-tuning the learning rate, often requiring an extensive grid search over base learning rates, schedules, and other hyperparameters. In this paper, we propose a principled framework called GALA (Gradient Alignment-based Learning rate Adaptation), which dynamically adjusts the learning rate by tracking the alignment between consecutive gradients and using a local curvature estimate. Guided by the convergence analysis, we formulate the problem of selecting the learning rate as a one-dimensional online learning problem. When paired with an online learning algorithm such as Follow-the-Regularized-Leader, our method produces a flexible, adaptive learning rate schedule that tends to increase when consecutive gradients are aligned and decrease otherwise. We establish a data-adaptive convergence rate for normalized SGD equipped with GALA in the smooth, nonconvex setting. Empirically, common optimizers such as SGD and Adam, when augmented with GALA, demonstrate robust performance across a wide range of initial learning rates and perform competitively without the need for tuning.
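The intuition that aligned consecutive gradients should raise the learning rate can be shown with a toy multiplicative rule. This sketch is not GALA: it omits the local curvature estimate and the online-learning (Follow-the-Regularized-Leader) formulation described in the abstract. The rule `lr * exp(sensitivity * cosine)`, the `sensitivity` value, and the quadratic test problem are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def adapt_learning_rate(lr, prev_grad, grad, sensitivity=0.1):
    """Grow lr when consecutive gradients align, shrink it when they oppose."""
    return lr * math.exp(sensitivity * cosine(prev_grad, grad))


def minimise_quadratic(x, lr=0.01, steps=100):
    """Gradient descent on f(x) = 0.5 * ||x||^2 with the toy rule above."""
    prev_grad = None
    for _ in range(steps):
        grad = list(x)  # gradient of 0.5 * ||x||^2 is x itself
        if prev_grad is not None:
            lr = adapt_learning_rate(lr, prev_grad, grad)
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
        prev_grad = grad
    return x, lr


final_x, final_lr = minimise_quadratic([3.0, -4.0])
print(final_x, final_lr)
```

Starting from a deliberately small learning rate, the rule increases it while steps keep pointing the same way and backs off once an update overshoots and the gradient flips sign.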
Article 834
Title@2025-06-10 (2): Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood
Title: Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood | Offline RL mit glatter OOD-Verallgemeinerung in Convex Hull und seiner Nachbarschaft | 在Convex Hull及其邻里平滑OOOD平滑的离线RL(RL) 2506.08417v1 |
Authors (8): Qingmao Yao, Zhichao Lei, Tianyuan Chen, Ziyue Yuan, Xuefan Chen, Jianxiang Liu, Faguo Wu, Xiao Zhang
Offline Reinforcement Learning (RL) struggles with distributional shifts, leading to $Q$-value overestimation for out-of-distribution (OOD) actions. Existing methods address this issue by imposing constraints; however, they often become overly conservative when evaluating OOD regions, which constrains the $Q$-function generalization. This over-constraint issue results in poor $Q$-value estimation and hinders policy improvement. In this paper, we introduce a novel approach to achieve better $Q$-value estimation by enhancing $Q$-function generalization in OOD regions within the Convex Hull and its Neighborhood (CHN). Under the safety generalization guarantees of the CHN, we propose the Smooth Bellman Operator (SBO), which updates OOD $Q$-values by smoothing them with neighboring in-sample $Q$-values. We theoretically show that SBO approximates true $Q$-values for both in-sample and OOD actions within the CHN. Our practical algorithm, Smooth Q-function OOD Generalization (SQOG), empirically alleviates the over-constraint issue, achieving near-accurate $Q$-value estimation. On the D4RL benchmarks, SQOG outperforms existing state-of-the-art methods in both performance and computational efficiency.
Article 835
Title@2025-06-10 (2): DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data
Title: DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data | DISCO balanciert die Skalen: Adaptives Domain- und Schwierigkeitsgrad-Bewusst-Verstärkung-Lernen auf unausgewogenen Daten | DISCO 平衡规模:关于数据不平衡的适应性域域和困难性软件强化学习 2505.15074v2 |
Authors (10): Yuhang Zhou, Jing Zhu, Shengyi Qian, Zhuokai Zhao, Xiyao Wang, Xiaoyu Liu, Ming Li, Paiheng Xu, Wei Ai, Furong Huang
Large Language Models (LLMs) are increasingly aligned with human preferences through Reinforcement Learning from Human Feedback (RLHF). Among RLHF methods, Group Relative Policy Optimization (GRPO) has gained attention for its simplicity and strong performance, notably eliminating the need for a learned value function. However, GRPO implicitly assumes a balanced domain distribution and uniform semantic alignment across groups - assumptions that rarely hold in real-world datasets. When applied to multi-domain, imbalanced data, GRPO disproportionately optimizes for dominant domains, neglecting underrepresented ones and resulting in poor generalization and fairness. We propose Domain-Informed Self-Consistency Policy Optimization (DISCO), a principled extension to GRPO that addresses inter-group imbalance with two key innovations. Domain-aware reward scaling counteracts frequency bias by reweighting optimization based on domain prevalence. Difficulty-aware reward scaling leverages prompt-level self-consistency to identify and prioritize uncertain prompts that offer greater learning value. Together, these strategies promote more equitable and effective policy learning across domains. Extensive experiments across multiple LLMs and skewed training distributions show that DISCO improves generalization, outperforms existing GRPO variants by 5% on Qwen3 models, and sets new state-of-the-art results on multi-domain alignment benchmarks.
Article 836
Title@2025-06-10 (2): Improved Scaling Laws in Linear Regression via Data Reuse
Title: Improved Scaling Laws in Linear Regression via Data Reuse | Verbesserte Skalierungsgesetze bei linearer Regression über Datenwiederverwertung | 通过数据再利用改进线性回归通过数据再利用的扩展法 2506.08415v1 |
Authors (3): Licong Lin, Jingfeng Wu, Peter L. Bartlett
Neural scaling laws suggest that the test error of large language models trained online decreases polynomially as the model size and data size increase. However, such scaling can be unsustainable when running out of new data. In this work, we show that data reuse can improve existing scaling laws in linear regression. Specifically, we derive sharp test error bounds on $M$-dimensional linear models trained by multi-pass stochastic gradient descent (multi-pass SGD) on $N$ data with sketched features. Assuming that the data covariance has a power-law spectrum of degree $a$, and that the true parameter follows a prior with an aligned power-law spectrum of degree $b-a$ (with $a > b > 1$), we show that multi-pass SGD achieves a test error of $\Theta(M^{1-b} + L^{(1-b)/a})$, where $L \lesssim N^{a/b}$ is the number of iterations. In the same setting, one-pass SGD only attains a test error of $\Theta(M^{1-b} + N^{(1-b)/a})$ (see e.g., Lin et al., 2024). This suggests an improved scaling law via data reuse (i.e., choosing $L>N$) in data-constrained regimes. Numerical simulations are also provided to verify our theoretical findings.
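The two rates quoted in this abstract can be compared numerically. The sketch below evaluates both expressions with all constants hidden by $\Theta(\cdot)$ set to 1; the values of `a`, `b`, `M`, and `N` are hypothetical and chosen only to satisfy $a > b > 1$.

```python
def one_pass_rate(M, N, a, b):
    """One-pass SGD rate M^(1-b) + N^((1-b)/a), constants omitted."""
    return M ** (1 - b) + N ** ((1 - b) / a)

def multi_pass_rate(M, L, a, b):
    """Multi-pass SGD rate M^(1-b) + L^((1-b)/a), constants omitted."""
    return M ** (1 - b) + L ** ((1 - b) / a)

# Hypothetical values chosen only to satisfy a > b > 1.
a, b = 2.0, 1.5
M, N = 10**6, 10**4
L = N ** (a / b)  # largest iteration count allowed by L <~ N^(a/b)

one_pass = one_pass_rate(M, N, a, b)
multi_pass = multi_pass_rate(M, L, a, b)
print(one_pass, multi_pass)
```

With $L = N^{a/b}$ the data-dependent term becomes $N^{(1-b)/b}$, which decays faster than $N^{(1-b)/a}$ because $b < a$.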
Article 837
Title@2025-06-10 (2): Learning to Hear Broken Motors: Signature-Guided Data Augmentation for Induction-Motor Diagnostics
Title: Learning to Hear Broken Motors: Signature-Guided Data Augmentation for Induction-Motor Diagnostics | Lernen, gebrochene Motoren zu hören: Signatur-geführte Datenvergrößerung für Induktions-Motor-Diagnostik | 学习听听破碎汽车:诱导-摩托汽车诊断的签名指导数据增强 2506.08412v1 |
Authors (5): Saraa Ali, Aleksandr Khizhik, Stepan Svirin, Artem Ryzhikov, Denis Derkach
The application of machine learning (ML) algorithms in the intelligent diagnosis of three-phase engines has the potential to significantly enhance diagnostic performance and accuracy. Traditional methods largely rely on signature analysis, which, despite being a standard practice, can benefit from the integration of advanced ML techniques. In our study, we innovate by combining ML algorithms with a novel unsupervised anomaly generation methodology that takes into account the engine physics model. We propose Signature-Guided Data Augmentation (SGDA), an unsupervised framework that synthesizes physically plausible faults directly in the frequency domain of healthy current signals. Guided by Motor Current Signature Analysis, SGDA creates diverse and realistic anomalies without resorting to computationally intensive simulations. This hybrid approach leverages the strengths of both supervised ML and unsupervised signature analysis, achieving superior diagnostic accuracy and reliability along with wide industrial application. The findings highlight the potential of our approach to contribute significantly to the field of engine diagnostics, offering a robust and efficient solution for real-world applications.
Article 838
Title@2025-06-10 (2): Mutual-Taught for Co-adapting Policy and Reward Models
Title: Mutual-Taught for Co-adapting Policy and Reward Models | Gegenseitiges Lernen für Ko-Anpassungspolitik und Prämienmodelle | 共同适应政策和奖励模式相互培训 2506.06292v2 |
Authors (8): Tianyuan Shi, Canbin Huang, Fanqi Wan, Longguang Zhong, Ziyi Yang, Weizhou Shen, Xiaojun Quan, Ming Yan
During the preference optimization of large language models (LLMs), distribution shifts may arise between newly generated model samples and the data used to train the reward model (RM). This shift reduces the efficacy of the RM, which in turn negatively impacts the performance of the policy model (PM). To address this challenge, we propose Mutual-Taught, a self-training method that iteratively improves both the PM and RM without requiring additional human annotation. Our approach mirrors the expectation-maximization (EM) algorithm. In the E-step, the PM is updated using feedback from the current RM, guiding the PM toward a better approximation of the latent optimal preference distribution. In the M-step, we update the RM by constructing training data from the outputs of the PM before and after the E-step update. This process ensures that the RM adapts to the evolving policy distribution. Experimental results demonstrate that this iterative approach leads to consistent improvements in both models. Specifically, our 8B policy model, LLaMA-3-8B-Instruct-MT, achieves a length-controlled win rate of 54.1% on AlpacaEval-2, while our 8B reward model, FsfairX-LLaMA3-RM-MT, performs on par with GPT-4o-2024-08-06 on RewardBench.
Article 839
Title@2025-06-10 (2): FUSE: Measure-Theoretic Compact Fuzzy Set Representation for Taxonomy Expansion
Title: FUSE: Measure-Theoretic Compact Fuzzy Set Representation for Taxonomy Expansion | FUSE: Maß-theoretische Kompakt-Fuzzy-Set-Darstellung für Taxonomie-Expansion | FUSE: 测量-理论结合模糊集束代表扩大分类学 2506.08409v1 |
Authors (7): Fred Xu, Song Jiang, Zijie Huang, Xiao Luo, Shichang Zhang, Adrian Chen, Yizhou Sun
Taxonomy Expansion, which models complex concepts and their relations, can be formulated as a set representation learning task. Fuzzy sets, a generalization of sets, incorporate uncertainty and measure the information within a semantic concept, making them suitable for concept modeling. Existing works usually model sets as vectors or geometric objects such as boxes, which are not closed under set operations. In this work, we propose a sound and efficient formulation of set representation learning based on its volume approximation as a fuzzy set. The resulting embedding framework, Fuzzy Set Embedding (FUSE), satisfies all set operations and compactly approximates the underlying fuzzy set, hence preserving information while being efficient to learn, relying on a minimal neural architecture. We empirically demonstrate the power of FUSE on the task of taxonomy expansion, where FUSE achieves remarkable improvements of up to 23% compared with existing baselines. Our work marks the first attempt to understand and efficiently compute the embeddings of fuzzy sets.
Article 840
Title@2025-06-10 (2): Towards Generalized Source Tracing for Codec-Based Deepfake Speech
Title: Towards Generalized Source Tracing for Codec-Based Deepfake Speech | Auf dem Weg zu einer generalisierten Quellenverfolgung für codec-basierte Deepfake-Rede | 努力实现以基于编码的 “ 深层假言 “ 为依据的 “ 普遍源源追踪 “ 目标 2506.07294v2 |
Authors (6): Xuanjun Chen, I-Ming Lin, Lin Zhang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang
Recent attempts at source tracing for codec-based deepfake speech (CodecFake), generated by neural audio codec-based speech generation (CoSG) models, have exhibited suboptimal performance. However, how to train source tracing models using simulated CoSG data while maintaining strong performance on real CoSG-generated audio remains an open challenge. In this paper, we show that models trained solely on codec-resynthesized data tend to overfit to non-speech regions and struggle to generalize to unseen content. To mitigate these challenges, we introduce the Semantic-Acoustic Source Tracing Network (SASTNet), which jointly leverages Whisper for semantic feature encoding and Wav2vec2 with AudioMAE for acoustic feature encoding. Our proposed SASTNet achieves state-of-the-art performance on the CoSG test set of the CodecFake+ dataset, demonstrating its effectiveness for reliable source tracing.
Article 841
Title@2025-06-10 (2): RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
Title: RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation | RLHS: Fehlausrichtung in RLHF mit Hindsight Simulation abmildern | RLHS: 减轻RLHF的差错,采用速光模拟模拟 2501.08617v3 |
Authors (5): Kaiqu Liang, Haimin Hu, Ryan Liu, Thomas L. Griffiths, Jaime Fernández Fisac
While Reinforcement Learning from Human Feedback (RLHF) has shown promise in aligning generative AI, we present empirical evidence that it can also cause severe, systematic misalignment. We hypothesize that this stems from evaluator feedback depending on downstream outcome predictions (foresight) that can be influenced by the AI’s output, inducing Goodhart’s law dynamics. We present a theoretical analysis showing that conditioning evaluator feedback on downstream observations (hindsight) inhibits this effect by decoupling the alignment signal from potentially compromised predictions; crucially, the result holds even if the observed outcomes are sampled from the AI’s own world model. Building on this insight, we introduce Reinforcement Learning from Hindsight Simulation (RLHS), which presents plausible simulated outcomes to evaluators before eliciting feedback. We validate RLHS across three consultancy settings (marketplace interactions, restaurant recommendations, and online course advising) using both online (PPO) and offline (DPO) fine-tuning methods, and show that it substantially improves alignment over RLHF in experiments and human evaluations. We perform post-hoc benchmark evaluations on TruthfulQA, HaluEval, and TrustLLM, finding that even after single-task fine-tuning, RLHF misalignment persists, whereas RLHS consistently outperforms baselines and demonstrates robust alignment generalization. The project webpage and code are available at https://rl-hindsight.github.io.
Article 842
Title@2025-06-10 (2): Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry
Title: Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry | Sparse Training von Zufallsinitialisierung: Ausrichten von Lotteriekartenmasken mit Gewichtssymmetrie | 随机初始化的分散训练: 使用重量对称对调彩票票面罩 2505.05143v2 |
Authors (5): Mohammed Adnan, Rohan Jain, Ekansh Sharma, Rahul Krishnan, Yani Ioannou
The Lottery Ticket Hypothesis (LTH) suggests there exists a sparse LTH mask and weights that achieve the same generalization performance as the dense model while using significantly fewer parameters. However, finding an LTH solution is computationally expensive, and an LTH sparsity mask does not generalize to other random weight initializations. Recent work has suggested that neural networks trained from random initialization find solutions within the same basin modulo permutation, and has proposed a method to align trained models within the same loss basin. We hypothesize that misalignment of basins is the reason why LTH masks do not generalize to new random initializations, and we propose permuting the LTH mask to align with the new optimization basin when performing sparse training from a different random initialization. We empirically show a significant increase in generalization when performing sparse training from random initialization with the permuted mask, compared to using the non-permuted LTH mask, on multiple datasets (CIFAR-10, CIFAR-100 and ImageNet) and models (VGG11, ResNet20 and ResNet50).
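To make the notion of permuting a mask concrete, here is a minimal Python sketch for two consecutive dense layers. It assumes the hidden-unit permutation `perm` is already known; how the paper finds that permutation is not shown. The function name `permute_masks` and the mask layout are assumptions for illustration.

```python
def permute_masks(mask1, mask2, perm):
    """Apply a hidden-unit permutation to the masks of two consecutive dense layers.

    mask1 has shape (hidden, inputs); mask2 has shape (outputs, hidden).
    New hidden unit i takes the role of old hidden unit perm[i], so the rows of
    mask1 and the columns of mask2 are reordered consistently.
    """
    new_mask1 = [list(mask1[p]) for p in perm]
    new_mask2 = [[row[p] for p in perm] for row in mask2]
    return new_mask1, new_mask2

# Hypothetical masks: 3 hidden units, 2 inputs, 1 output.
mask1 = [[1, 0], [0, 1], [1, 1]]
mask2 = [[1, 0, 1]]
perm = [2, 0, 1]

new_mask1, new_mask2 = permute_masks(mask1, mask2, perm)
```

Reordering rows of the first mask and columns of the second keeps the number of retained weights unchanged, so the sparsity level is preserved.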
Article 843
Title@2025-06-10 (2): mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks
Title: mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks | mSTEB: Massive mehrsprachige Bewertung von LLMs zu Sprach- und Textaufgaben | mSTEB: 对关于发言和文本任务LLM女士进行大规模多语种评价 2506.08400v1 |
Authors (7): Luel Hagos Beyene, Vivek Verma, Min Ma, Jesujoba O. Alabi, Fabian David Schmidt, Joyce Nakatumba-Nabende, David Ifeoluwa Adelani
Large language models (LLMs) have demonstrated impressive performance on a wide range of tasks, including in multimodal settings such as speech. However, their evaluation is often limited to English and a few high-resource languages. For low-resource languages, there is no standardized evaluation benchmark. In this paper, we address this gap by introducing mSTEB, a new benchmark to evaluate the performance of LLMs on a wide range of tasks covering language identification, text classification, question answering, and translation tasks on both speech and text modalities. We evaluated the performance of leading LLMs such as Gemini 2.0 Flash and GPT-4o (Audio) and state-of-the-art open models such as Qwen 2 Audio and Gemma 3 27B. Our evaluation shows a wide gap in performance between high-resource and low-resource languages, especially for languages spoken in Africa and the Americas/Oceania. Our findings show that more investment is needed to address their under-representation in LLM coverage.
Article 844
Title@2025-06-10 (2): Spatiotemporal deep learning models for detection of rapid intensification in cyclones
Title: Spatiotemporal deep learning models for detection of rapid intensification in cyclones | Geotemporale Deep-Learning-Modelle zur Erkennung einer schnellen Verstärkung bei Zyklonen | 探测气旋风暴迅速加剧的斯波时深深学习模式 2506.08397v1 |
Authors (3): Vamshika Sutar, Amandeep Singh, Rohitash Chandra
Cyclone rapid intensification is the rapid increase in cyclone wind intensity, exceeding a threshold of 30 knots, within 24 hours. Rapid intensification is considered an extreme event during a cyclone, and its occurrence is relatively rare, contributing to a class imbalance in the dataset. A diverse array of factors influences the likelihood of a cyclone undergoing rapid intensification, further complicating the task for conventional machine learning models. In this paper, we evaluate deep learning, ensemble learning and data augmentation frameworks to detect cyclone rapid intensification based on wind intensity and spatial coordinates. We note that conventional data augmentation methods cannot be utilised for generating spatiotemporal patterns replicating cyclones that undergo rapid intensification. Therefore, our framework employs deep learning models to generate spatial coordinates and wind intensity that replicate cyclones to address the class imbalance problem of rapid intensification. We also use a deep learning model for the classification module within the data augmentation framework to differentiate between rapid and non-rapid intensification events during a cyclone. Our results show that data augmentation improves the results for rapid intensification detection in cyclones, and spatial coordinates play a critical role as input features to the given models. This paves the way for research in synthetic data generation for spatiotemporal data with extreme events.
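The definition at the start of this abstract translates into a simple labeling rule. The sketch below flags time steps where wind speed rises by at least 30 knots over the next 24 hours; the 6-hourly sampling interval, the `>=` comparison, and the example track are assumptions, not details taken from the paper.

```python
def rapid_intensification_flags(winds_kt, step_hours=6, window_hours=24, threshold_kt=30):
    """Flag each time index whose wind speed rises by at least threshold_kt
    over the following window_hours. Indices without a full window are False."""
    if window_hours % step_hours != 0:
        raise ValueError("window_hours must be a multiple of step_hours")
    k = window_hours // step_hours  # number of records spanning the window
    flags = []
    for i in range(len(winds_kt)):
        j = i + k
        flags.append(j < len(winds_kt) and winds_kt[j] - winds_kt[i] >= threshold_kt)
    return flags

# Hypothetical 6-hourly wind speeds in knots for one cyclone track.
track = [35, 40, 50, 60, 70, 75, 75]
flags = rapid_intensification_flags(track)
positive_rate = sum(flags) / len(flags)
```

Even on this short track only a minority of time steps are positive, which is the class imbalance the abstract refers to.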
Article 845
Title@2025-06-10 (2): From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models
Title: From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models | Von Pixeln zu Prädikaten: Learning Symbolic World Models via Pretrained Vision-Language Models | 从像素到预言:通过预先培训的视觉语言模型学习象征性世界模式 2501.00296v3 |
Authors (7): Ashay Athalye, Nishanth Kumar, Tom Silver, Yichao Liang, Jiuguang Wang, Tomás Lozano-Pérez, Leslie Pack Kaelbling
Our aim is to learn to solve long-horizon decision-making problems in complex robotics domains given low-level skills and a handful of short-horizon demonstrations containing sequences of images. To this end, we focus on learning abstract symbolic world models that facilitate zero-shot generalization to novel goals via planning. A critical component of such models is the set of symbolic predicates that define properties of and relationships between objects. In this work, we leverage pretrained vision language models (VLMs) to propose a large set of visual predicates potentially relevant for decision-making, and to evaluate those predicates directly from camera images. At training time, we pass the proposed predicates and demonstrations into an optimization-based model-learning algorithm to obtain an abstract symbolic world model that is defined in terms of a compact subset of the proposed predicates. At test time, given a novel goal in a novel setting, we use the VLM to construct a symbolic description of the current world state, and then use a search-based planning algorithm to find a sequence of low-level skills that achieves the goal. We demonstrate empirically across experiments in both simulation and the real world that our method can generalize aggressively, applying its learned world model to solve problems with a wide variety of object types, arrangements, numbers of objects, and visual backgrounds, as well as novel goals and much longer horizons than those seen at training time.
Article 846
Title@2025-06-10 (2): Learning-based density-equalizing map
Title: Learning-based density-equalizing map | Learning-based Dichte-equalizing Karte | 基于学习的密度平等图 2506.10027v1 |
Authors (3): Yanwen Huang, Lok Ming Lui, Gary P. T. Choi
Density-equalizing map (DEM) serves as a powerful technique for creating shape deformations with the area changes reflecting an underlying density function. In recent decades, DEM has found widespread applications in fields such as data visualization, geometry processing, and medical imaging. Traditional approaches to DEM primarily rely on iterative numerical solvers for diffusion equations or optimization-based methods that minimize handcrafted energy functionals. However, these conventional techniques often face several challenges: they may suffer from limited accuracy, produce overlapping artifacts in extreme cases, and require substantial algorithmic redesign when extended from 2D to 3D, due to the derivative-dependent nature of their energy formulations. In this work, we propose a novel learning-based density-equalizing mapping framework (LDEM) using deep neural networks. Specifically, we introduce a loss function that enforces density uniformity and geometric regularity, and utilize a hierarchical approach to predict the transformations at both the coarse and dense levels. Our method demonstrates superior density-equalizing and bijectivity properties compared to prior methods for a wide range of simple and complex density distributions, and can be easily applied to surface remeshing with different effects. Also, it generalizes seamlessly from 2D to 3D domains without structural changes to the model architecture or loss formulation. Altogether, our work opens up new possibilities for scalable and robust computation of density-equalizing maps for practical applications.
Article 847
Title@2025-06-10 (2): Reinforcement Learning Teachers of Test Time Scaling
Title: Reinforcement Learning Teachers of Test Time Scaling | Verstärktes Lernen von Lehrern der Testzeitskalierung | 测试时间尺度强化学习教师 2506.08388v1 |
Authors (3): Edoardo Cetin, Tianyu Zhao, Yujin Tang
Training reasoning language models (LMs) with reinforcement learning (RL) for one-hot correctness inherently relies on the LM being able to explore and solve its task with some chance at initialization. Furthermore, a key use case of reasoning LMs is to act as teachers for distilling new students and cold-starting future RL iterations rather than being deployed themselves. From these considerations, we introduce a new framework that avoids RL’s exploration challenge by training a new class of Reinforcement-Learned Teachers (RLTs) focused on yielding the most effective downstream distillation. RLTs are prompted with both the question and solution to each problem, and tasked to simply “connect-the-dots” with detailed explanations tailored for their students. We train RLTs with dense rewards obtained by feeding each explanation to the student and testing its understanding of the problem’s solution. In practice, the raw outputs of a 7B RLT provide higher final performance on competition and graduate-level tasks than existing distillation and cold-starting pipelines that collect and postprocess the reasoning traces of orders of magnitude larger LMs. Furthermore, RLTs maintain their effectiveness when training larger students and when applied zero-shot to out-of-distribution tasks, unlocking new levels of efficiency and re-usability for the RL reasoning framework.
Article 848
Title@2025-06-10 (2): Network Threat Detection: Addressing Class Imbalanced Data with Deep Forest
Title: Network Threat Detection: Addressing Class Imbalanced Data with Deep Forest | Netzwerk Threat Detection: Adressierung von unausgeglichenen Daten der Klasse mit tiefem Wald | 网络威胁探测:处理深森林的分类平衡数据 2506.08383v1 |
Authors (2): Jiaqi Chen, Rongbin Ye
With the rapid expansion of Internet of Things (IoT) networks, detecting malicious traffic in real-time has become a critical cybersecurity challenge. This research addresses the detection challenges by presenting a comprehensive empirical analysis of machine learning techniques for malware detection using the IoT-23 dataset provided by the Stratosphere Laboratory. We address the significant class imbalance within the dataset through three resampling strategies. We implement and compare a few machine learning techniques. Our findings demonstrate that the combination of appropriate imbalance treatment techniques with ensemble methods, particularly gcForest, achieves better detection performance compared to traditional approaches. This work contributes significantly to the development of more intelligent and efficient automated threat detection systems for IoT environments, helping to secure critical infrastructure against sophisticated cyber attacks while optimizing computational resource usage.
Article 849
Title@2025-06-10 (2): Reinforce LLM Reasoning through Multi-Agent Reflection
Title: Reinforce LLM Reasoning through Multi-Agent Reflection | Verstärkung der LLM-Vernunft durch Multi-Agent-Reflexion | 通过多方机构反省说明理由 2506.08379v1 |
Authors (2): Yurun Yuan, Tengyang Xie
Leveraging more test-time computation has proven to be an effective way to boost the reasoning capabilities of large language models (LLMs). Among various methods, the verify-and-improve paradigm stands out for enabling dynamic solution exploration and feedback incorporation. However, existing approaches often suffer from restricted feedback spaces and lack of coordinated training of different parties, leading to suboptimal performance. To address this, we model this multi-turn refinement process as a Markov Decision Process and introduce DPSDP (Direct Policy Search by Dynamic Programming), a reinforcement learning algorithm that trains an actor-critic LLM system to iteratively refine answers via direct preference learning on self-generated data. Theoretically, DPSDP can match the performance of any policy within the training distribution. Empirically, we instantiate DPSDP with various base models and show improvements on both in- and out-of-distribution benchmarks. For example, on benchmark MATH 500, majority voting over five refinement steps increases first-turn accuracy from 58.2% to 63.2% with Ministral-based models. An ablation study further confirms the benefits of multi-agent collaboration and out-of-distribution generalization.
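The majority-voting step mentioned in this abstract can be sketched in a few lines of Python. The answer strings below are hypothetical, and the tie-breaking rule (earliest-seen answer wins) is an assumption rather than something the abstract specifies.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer; ties go to the earliest-seen answer."""
    if not answers:
        raise ValueError("answers must be non-empty")
    return Counter(answers).most_common(1)[0][0]

# Hypothetical final answers collected across refinement turns for one problem.
answers = ["12", "15", "15", "12", "15", "15"]
voted = majority_vote(answers)
```

Voting over several refined answers can recover from a wrong first-turn answer as long as later turns converge on the correct one more often than not.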
Article 850
Title@2025-06-10 (2): A Probabilistic Framework for Imputing Genetic Distances in Spatiotemporal Pathogen Models
Title: A Probabilistic Framework for Imputing Genetic Distances in Spatiotemporal Pathogen Models | Ein probabilistischer Rahmen für die Imputierung genetischer Entfernungen in Spatiotemporal Pathogen-Modellen | 关于将遗传距离算入时地病原体模型的概率框架 2506.09076v1 |
Authors (8): Haley Stone, Jing Du, Hao Xue, Matthew Scotch, David Heslop, Andreas Züfle, Chandini Raina MacIntyre, Flora Salim
Pathogen genome data offers valuable structure for spatial models, but its utility is limited by incomplete sequencing coverage. We propose a probabilistic framework for inferring genetic distances between unsequenced cases and known sequences within defined transmission chains, using time-aware evolutionary distance modeling. The method estimates pairwise divergence from collection dates and observed genetic distances, enabling biologically plausible imputation grounded in observed divergence patterns, without requiring sequence alignment or known transmission chains. Applied to highly pathogenic avian influenza A/H5 cases in wild birds in the United States, this approach supports scalable, uncertainty-aware augmentation of genomic datasets and enhances the integration of evolutionary information into spatiotemporal modeling workflows.
Article 851
Title@2025-06-10 (2): AlphaFold Database Debiasing for Robust Inverse Folding
Title: AlphaFold Database Debiasing for Robust Inverse Folding | AlphaFold Datenbank-Debiasing für robuste Inverse Faltung | AlphaFold 数据库对硬性反向折叠的偏移 2506.08365v1 |
Authors (6): Cheng Tan, Zhenxiao Cao, Zhangyang Gao, Siyuan Li, Yufei Huang, Stan Z. Li
The AlphaFold Protein Structure Database (AFDB) offers unparalleled structural coverage at near-experimental accuracy, positioning it as a valuable resource for data-driven protein design. However, its direct use in training deep models that are sensitive to fine-grained atomic geometry, such as inverse folding, exposes a critical limitation. Comparative analysis of structural feature distributions reveals that AFDB structures exhibit distinct statistical regularities, reflecting a systematic geometric bias that deviates from the conformational diversity found in experimentally determined structures from the Protein Data Bank (PDB). While AFDB structures are cleaner and more idealized, PDB structures capture the intrinsic variability and physical realism essential for generalization in downstream tasks. To address this discrepancy, we introduce a Debiasing Structure AutoEncoder (DeSAE) that learns to reconstruct native-like conformations from intentionally corrupted backbone geometries. By training the model to recover plausible structural states, DeSAE implicitly captures a more robust and natural structural manifold. At inference, applying DeSAE to AFDB structures produces debiased structures that significantly improve inverse folding performance across multiple benchmarks. This work highlights the critical impact of subtle systematic biases in predicted structures and presents a principled framework for debiasing, significantly boosting the performance of structure-based learning tasks like inverse folding.
Article 852
Title@2025-06-10 (2): Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(ε^{-4/7})$ Second-Order Oracle Complexity
Title: Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(ε^{-4/7})$ Second-Order Oracle Complexity | Lösen von Convex-Concave-Problemen mit $\tilde{\mathcal{O}}(ε^{-4/7})$ Oracle-Komplexität zweiter Ordnung | 以 $\tilde{\mathcal{O}}(ε^{-4/7})$ 解决二次折形甲骨文复杂度的连接问题 2506.08362v1 |
Authors (4): Lesi Chen, Chengchang Liu, Luo Luo, Jingzhao Zhang
Previous algorithms can solve convex-concave minimax problems $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x,y)$ with $\mathcal{O}(\epsilon^{-2/3})$ second-order oracle calls using Newton-type methods. This result has been speculated to be optimal because the upper bound is achieved by a natural generalization of the optimal first-order method. In this work, we show an improved upper bound of $\tilde{\mathcal{O}}(\epsilon^{-4/7})$ by generalizing the optimal second-order method for convex optimization to solve the convex-concave minimax problem. We further apply a similar technique to lazy Hessian algorithms and show that our proposed algorithm can also be seen as a second-order "Catalyst" framework (Lin et al., JMLR 2018) that could accelerate any globally convergent algorithm for solving minimax problems.
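The gap between the two bounds in this abstract is easiest to see numerically. The sketch below compares only the polynomial parts $\epsilon^{-2/3}$ and $\epsilon^{-4/7}$; constants and the logarithmic factors hidden by $\tilde{\mathcal{O}}$ are ignored, so the numbers indicate the trend rather than actual oracle counts.

```python
def oracle_calls(eps, exponent):
    """eps^(-exponent), ignoring constants and logarithmic factors."""
    return eps ** (-exponent)

def speedup(eps):
    """Ratio of the previous bound to the improved bound, equal to eps^(-2/21)."""
    return oracle_calls(eps, 2 / 3) / oracle_calls(eps, 4 / 7)

for eps in (1e-3, 1e-6, 1e-9):
    print(eps, oracle_calls(eps, 2 / 3), oracle_calls(eps, 4 / 7), speedup(eps))
```

Since $2/3 - 4/7 = 2/21$, the ratio grows slowly as $\epsilon$ shrinks; the improvement is asymptotic rather than dramatic at moderate accuracy.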
Article 853
Title@2025-06-10 (2): NysAct: A Scalable Preconditioned Gradient Descent using Nystrom Approximation
Title: NysAct: A Scalable Preconditioned Gradient Descent using Nystrom Approximation | NysAct: Ein skalierbarer vorkonditionierter Gradient Abstieg mit Nystrom-Annäherung | NysAct: 使用Nystrom 接近法的可缩放的预设条件渐变源 2506.08360v1 |
Authors (3): Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko
Adaptive gradient methods are computationally efficient and converge quickly, but they often suffer from poor generalization. In contrast, second-order methods enhance convergence and generalization but typically incur high computational and memory costs. In this work, we introduce NysAct, a scalable first-order gradient preconditioning method that strikes a balance between state-of-the-art first-order and second-order optimization methods. NysAct leverages an eigenvalue-shifted Nystrom method to approximate the activation covariance matrix, which is used as a preconditioning matrix, significantly reducing time and memory complexities with minimal impact on test accuracy. Our experiments show that NysAct not only achieves improved test accuracy compared to both first-order and second-order methods but also demands considerably fewer computational resources than existing second-order methods. Code is available at https://github.com/hseung88/nysact.
Article 854
Title@2025-06-10 (2): An Adaptive Method Stabilizing Activations for Enhanced Generalization
Title: An Adaptive Method Stabilizing Activations for Enhanced Generalization | Eine adaptive Methode zur Stabilisierung von Aktivierungen für verstärkte Generalisierung | 增强普遍化的适应方法稳定活动 2506.08353v1 |
Authors (3): Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko
We introduce AdaAct, a novel optimization algorithm that adjusts learning rates according to activation variance. Our method enhances the stability of neuron outputs by incorporating neuron-wise adaptivity during the training process, which subsequently leads to better generalization – a complementary approach to conventional activation regularization methods. Experimental results demonstrate AdaAct’s competitive performance across standard image classification benchmarks. We evaluate AdaAct on CIFAR and ImageNet, comparing it with other state-of-the-art methods. Importantly, AdaAct effectively bridges the gap between the convergence speed of Adam and the strong generalization capabilities of SGD, all while maintaining competitive execution times. Code is available at https://github.com/hseung88/adaact.
Article 855
Title@2025-06-10 (2): Differentially Private Relational Learning with Entity-level Privacy Guarantees
Title: Differentially Private Relational Learning with Entity-level Privacy Guarantees | Unterschiedlich privates relationales Lernen mit Datenschutzgarantien auf Körperschaftsebene | 与实体一级的隐私保障有区别的私人关系学习 2506.08347v1 |
Authors (5): Yinan Huang, Haoteng Ying, Eli Chien, Rongzhe Wei, Pan Li
Learning with relational and network-structured data is increasingly vital in sensitive domains where protecting the privacy of individual entities is paramount. Differential Privacy (DP) offers a principled approach for quantifying privacy risks, with DP-SGD emerging as a standard mechanism for private model training. However, directly applying DP-SGD to relational learning is challenging due to two key factors: (i) entities often participate in multiple relations, resulting in high and difficult-to-control sensitivity; and (ii) relational learning typically involves multi-stage, potentially coupled (interdependent) sampling procedures that make standard privacy amplification analyses inapplicable. This work presents a principled framework for relational learning with formal entity-level DP guarantees. We provide a rigorous sensitivity analysis and introduce an adaptive gradient clipping scheme that modulates clipping thresholds based on entity occurrence frequency. We also extend the privacy amplification results to a tractable subclass of coupled sampling, where the dependence arises only through sample sizes. These contributions lead to a tailored DP-SGD variant for relational data with provable privacy guarantees. Experiments on fine-tuning text encoders over text-attributed network-structured relational data demonstrate the strong utility-privacy trade-offs of our approach. Our code is available at https://github.com/Graph-COM/Node_DP.
Article 856
Title@2025-06-10 (2): Re4MPC: Reactive Nonlinear MPC for Multi-model Motion Planning via Deep Reinforcement Learning
Title: Re4MPC: Reactive Nonlinear MPC for Multi-model Motion Planning via Deep Reinforcement Learning | Re4MPC: Reaktives nichtlineares MPC für Multi-Model-Bewegungsplanung über Deep Reinforcement Learning | Re4MPC:通过深强化学习进行多模式运动规划的反活动非线性非线性MPC 2506.08344v1 |
Authors (4): Neşet Ünver Akmandor, Sarvesh Prajapati, Mark Zolotas, Taşkın Padır
Traditional motion planning methods for robots with many degrees-of-freedom, such as mobile manipulators, are often computationally prohibitive for real-world settings. In this paper, we propose a novel multi-model motion planning pipeline, termed Re4MPC, which computes trajectories using Nonlinear Model Predictive Control (NMPC). Re4MPC generates trajectories in a computationally efficient manner by reactively selecting the model, cost, and constraints of the NMPC problem depending on the complexity of the task and robot state. The policy for this reactive decision-making is learned via a Deep Reinforcement Learning (DRL) framework. We introduce a mathematical formulation to integrate NMPC into this DRL framework. To validate our methodology and design choices, we evaluate DRL training and test outcomes in a physics-based simulation involving a mobile manipulator. Experimental results demonstrate that Re4MPC is more computationally efficient and achieves higher success rates in reaching end-effector goals than the NMPC baseline, which computes whole-body trajectories without our learning mechanism.
Article 857
Title@2025-06-10 (2): Dynamical System Optimization
Title: Dynamical System Optimization | Dynamische Systemoptimierung | 动态系统优化 2506.08340v1 |
Authors (1): Emo Todorov
We develop an optimization framework centered around a core idea: once a (parametric) policy is specified, control authority is transferred to the policy, resulting in an autonomous dynamical system. Thus we should be able to optimize policy parameters without further reference to controls or actions, and without directly using the machinery of approximate Dynamic Programming and Reinforcement Learning. Here we derive simpler algorithms at the autonomous system level, and show that they compute the same quantities as policy gradients and Hessians, natural gradients, proximal methods. Analogs to approximate policy iteration and off-policy learning are also available. Since policy parameters and other system parameters are treated uniformly, the same algorithms apply to behavioral cloning, mechanism design, system identification, learning of state estimators. Tuning of generative AI models is not only possible, but is conceptually closer to the present framework than to Reinforcement Learning.
Article 858
Title@2025-06-10 (2): Equivalence of stochastic and deterministic policy gradients
Title: Equivalence of stochastic and deterministic policy gradients | Gleichwertigkeit stochastischer und deterministischer politischer Gradienten | 政策梯度和确定性政策梯度等同 2505.23244v2 |
Authors (1): Emo Todorov
Policy gradients in continuous control have been derived for both stochastic and deterministic policies. Here we study the relationship between the two. In a widely-used family of MDPs involving Gaussian control noise and quadratic control costs, we show that the stochastic and deterministic policy gradients, natural gradients, and state value functions are identical; while the state-control value functions are different. We then develop a general procedure for constructing an MDP with deterministic policy that is equivalent to a given MDP with stochastic policy. The controls of this new MDP are the sufficient statistics of the stochastic policy in the original MDP. Our results suggest that policy gradient methods can be unified by approximating state value functions rather than state-control value functions.
Article 859
Title@2025-06-10 (2): midr: Learning from Black-Box Models by Maximum Interpretation Decomposition
Title: midr: Learning from Black-Box Models by Maximum Interpretation Decomposition | midr: Von Black-Box-Modellen lernen durch maximale Interpretationszersetzung | 中间体: 通过最大解析分解从黑ox 模型中学习 2506.08338v1 |
Authors (3): Ryoichi Asashiba, Reiji Kozuma, Hirokazu Iwasawa
The use of appropriate methods of Interpretable Machine Learning (IML) and eXplainable Artificial Intelligence (XAI) is essential for adopting black-box predictive models in fields where model and prediction explainability is required. As a novel tool for interpreting black-box models, we introduce the R package midr, which implements Maximum Interpretation Decomposition (MID). MID is a functional decomposition approach that derives a low-order additive representation of a black-box model by minimizing the squared error between the model’s prediction function and this additive representation. midr enables learning from black-box models by constructing a global surrogate model with advanced analytical capabilities. After reviewing related work and the theoretical foundation of MID, we demonstrate the package’s usage and discuss some of its key features.
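To make the idea of a low-order additive surrogate concrete, here is a minimal Python sketch, not the midr package's R API. It assumes a hypothetical two-input black box evaluated on a full grid with uniform weights, where the least-squares first-order additive fit reduces to the grand mean plus centered marginal means. The function name `fit_additive_surrogate` and the toy models are illustrative only.

```python
from itertools import product


def fit_additive_surrogate(f, grid1, grid2):
    """Least-squares first-order additive fit of f on a full, uniformly weighted grid.

    Returns the surrogate g(a, b) = intercept + effect1[a] + effect2[b],
    together with its components. Both effects are centered (mean zero).
    """
    values = {(a, b): f(a, b) for a, b in product(grid1, grid2)}
    intercept = sum(values.values()) / len(values)
    # Main effect of each input: its marginal mean minus the grand mean.
    effect1 = {a: sum(values[(a, b)] for b in grid2) / len(grid2) - intercept
               for a in grid1}
    effect2 = {b: sum(values[(a, b)] for a in grid1) / len(grid1) - intercept
               for b in grid2}

    def surrogate(a, b):
        return intercept + effect1[a] + effect2[b]

    return surrogate, intercept, effect1, effect2


if __name__ == "__main__":
    grid1, grid2 = [0.0, 1.0, 2.0], [0.0, 1.0]

    def black_box(a, b):
        # Hypothetical model with an interaction term a * b.
        return 3.0 + a + 2.0 * b + a * b

    g, intercept, e1, e2 = fit_additive_surrogate(black_box, grid1, grid2)
    worst = max(abs(g(a, b) - black_box(a, b)) for a in grid1 for b in grid2)
    print("intercept:", intercept)
    print("largest residual (left unexplained by main effects):", worst)
```

A purely additive black box is reproduced exactly by this surrogate, while any interaction shows up as a nonzero residual, which is the part a higher-order term would have to absorb.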
Article 860
Title@2025-06-10 (2): A Simple Analysis of Discretization Error in Diffusion Models
Title: A Simple Analysis of Discretization Error in Diffusion Models | Eine einfache Analyse des Diskretisierungsfehlers in Diffusionsmodellen | 简单分析扩散模型中的分解错误 2506.08337v1 |
Authors (2): Juhyeok Choi, Chenglin Fan
Diffusion models, formulated as discretizations of stochastic differential equations (SDEs), achieve state-of-the-art generative performance. However, existing analyses of their discretization error often rely on complex probabilistic tools. In this work, we present a simplified theoretical framework for analyzing the Euler–Maruyama discretization of variance-preserving SDEs (VP-SDEs) in Denoising Diffusion Probabilistic Models (DDPMs), where $T$ denotes the number of denoising steps in the diffusion process. Our approach leverages Grönwall's inequality to derive a convergence rate of $\mathcal{O}(1/T^{1/2})$ under Lipschitz assumptions, significantly streamlining prior proofs. Furthermore, we demonstrate that the Gaussian noise in the discretization can be replaced by a discrete random variable (e.g., Rademacher or uniform noise) without sacrificing convergence guarantees, an insight with practical implications for efficient sampling. Experiments validate our theory, showing that (1) the error scales as predicted, (2) discrete noise achieves comparable sample quality to Gaussian noise, and (3) incorrect noise scaling degrades performance. By unifying simplified analysis and discrete noise substitution, our work bridges theoretical rigor with practical efficiency in diffusion-based generative modeling.
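The noise-substitution claim can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it discretizes only the forward VP-SDE $dx = -\tfrac{1}{2}\beta x\,dt + \sqrt{\beta}\,dW$ with a constant $\beta$ (no learned score and no reverse process), and the function names, step count, and horizon are illustrative choices rather than the paper's setup.

```python
import math
import random


def euler_maruyama_vp(x0, beta, t_end, num_steps, noise, rng):
    """Euler-Maruyama for dx = -0.5*beta*x dt + sqrt(beta) dW with constant beta.

    `noise` selects the unit-variance increment: "gaussian" or "rademacher".
    """
    h = t_end / num_steps
    x = x0
    for _ in range(num_steps):
        if noise == "gaussian":
            xi = rng.gauss(0.0, 1.0)
        elif noise == "rademacher":
            xi = 1.0 if rng.random() < 0.5 else -1.0
        else:
            raise ValueError(f"unknown noise type: {noise}")
        # Drift step plus diffusion step scaled by sqrt(beta * h).
        x = x - 0.5 * beta * x * h + math.sqrt(beta * h) * xi
    return x


def sample_stats(noise, n=2000, seed=0):
    """Empirical mean and variance of the terminal state over n independent paths."""
    rng = random.Random(seed)
    xs = [euler_maruyama_vp(2.0, 1.0, 5.0, 200, noise, rng) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


if __name__ == "__main__":
    for kind in ("gaussian", "rademacher"):
        mean, var = sample_stats(kind)
        print(f"{kind:>10}: mean={mean:.3f} var={var:.3f}")
```

Both noise types have zero mean and unit variance per step, so the terminal mean and variance agree up to Monte Carlo error. Note that the $\sqrt{\beta h}$ scaling matters: changing it alters the terminal variance, in line with the abstract's remark about incorrect noise scaling.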
Article 861
Title@2025-06-10 (2): Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer
Title: Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer | Zero-shot Meta-Learning für Tabular Prediction Aufgaben mit adversarially Pre-trained Transformer | 采用未受过训练的变换器进行表列预测任务 2502.04573v2 |
Authors (2): Yulun Wu, Doron L. Bergman
We present an Adversarially Pre-trained Transformer (APT) that is able to perform zero-shot meta-learning on tabular prediction tasks without pre-training on any real-world dataset, building on the recent development of Prior-Data Fitted Networks (PFNs) and TabPFN. Specifically, APT is pre-trained with adversarial synthetic data agents, which continually shift their underlying data-generating distribution and deliberately challenge the model with different synthetic datasets. In addition, we propose a mixture block architecture that is able to handle classification tasks with an arbitrary number of classes, addressing the class size limitation, a crucial weakness of prior deep tabular zero-shot learners. In experiments, we show that our framework matches state-of-the-art performance on small classification tasks without filtering on dataset characteristics such as number of classes and number of missing values, while maintaining an average runtime under one second. On common benchmark dataset suites in both classification and regression, we show that adversarial pre-training was able to enhance TabPFN's performance. In our analysis, we demonstrate that the adversarial synthetic data agents were able to generate a more diverse collection of data compared to the ordinary random generator in TabPFN. In addition, we demonstrate that our mixture block neural design has improved generalizability and greatly accelerated pre-training.
Article 862
Title@2025-06-10 (2): Generation from Noisy Examples
Title: Generation from Noisy Examples | Generierung aus lauteren Beispielen | 从吵闹中生成 2501.04179v2 |
Authors (2): Ananth Raman, Vinod Raman
We continue to study the learning-theoretic foundations of generation by extending the results from Kleinberg and Mullainathan [2024] and Li et al. [2024] to account for noisy example streams. In the noiseless setting of Kleinberg and Mullainathan [2024] and Li et al. [2024], an adversary picks a hypothesis from a binary hypothesis class and provides a generator with a sequence of its positive examples. The goal of the generator is to eventually output new, unseen positive examples. In the noisy setting, an adversary still picks a hypothesis and a sequence of its positive examples. But, before presenting the stream to the generator, the adversary inserts a finite number of negative examples. Unaware of which examples are noisy, the goal of the generator is to still eventually output new, unseen positive examples. In this paper, we provide necessary and sufficient conditions for when a binary hypothesis class can be noisily generatable. We provide such conditions with respect to various constraints on the number of distinct examples that need to be seen before perfect generation of positive examples. Interestingly, for finite and countable classes we show that generatability is largely unaffected by the presence of a finite number of noisy examples.
Article 863
Title@2025-06-10 (2): Graph Prompting for Graph Learning Models: Recent Advances and Future Directions
Title: Graph Prompting for Graph Learning Models: Recent Advances and Future Directions | Graph Prompting für Graph Learning Models: Aktuelle Fortschritte und zukünftige Richtungen | 图示学习模型提示图示:最近的进展和未来方向 2506.08326v1 |
Authors (10): Xingbo Fu, Zehong Wang, Zihan Chen, Jiazheng Li, Yaochen Zhu, Zhenyu Lei, Cong Shen, Yanfang Ye, Chuxu Zhang, Jundong Li
Graph learning models have demonstrated great prowess in learning expressive representations from large-scale graph data in a wide variety of real-world scenarios. As a prevalent strategy for training powerful graph learning models, the “pre-training, adaptation” scheme first pre-trains graph learning models on unlabeled graph data in a self-supervised manner and then adapts them to specific downstream tasks. During the adaptation phase, graph prompting emerges as a promising approach that learns trainable prompts while keeping the pre-trained graph learning models unchanged. In this paper, we present a systematic review of recent advancements in graph prompting. First, we introduce representative graph pre-training methods that serve as the foundation step of graph prompting. Next, we review mainstream techniques in graph prompting and elaborate on how they design learnable prompts for graph prompting. Furthermore, we summarize the real-world applications of graph prompting from different domains. Finally, we discuss several open challenges in existing studies with promising future directions in this field.
Article 864
Title@2025-06-10 (2): Model-Free Kernel Conformal Depth Measures Algorithm for Uncertainty Quantification in Regression Models in Separable Hilbert Spaces
Title: Model-Free Kernel Conformal Depth Measures Algorithm for Uncertainty Quantification in Regression Models in Separable Hilbert Spaces | Modellfreier Kernel Conformal Depth misst Algorithmen zur Unsicherheitsquantifizierung in Regressionsmodellen in trennbaren Hilbert-Räumen | 在塞可分离的希尔伯特空间的回归模型中,不确定性量化不确定性的模型-无模型核心常规 常规深度测量标准 2506.08325v1 |
Authors (5): Marcos Matabuena, Rahul Ghosal, Pavlo Mozharovskyi, Oscar Hernan Madrid Padilla, Jukka-Pekka Onnela
Depth measures are powerful tools for defining level sets in emerging, non-standard, and complex random objects such as high-dimensional multivariate data, functional data, and random graphs. Despite their favorable theoretical properties, the integration of depth measures into regression modeling to provide prediction regions remains a largely underexplored area of research. To address this gap, we propose a novel, model-free uncertainty quantification algorithm based on conditional depth measures, specifically conditional kernel mean embeddings and an integrated depth measure. These new algorithms can be used to define prediction and tolerance regions when predictors and responses are defined in separable Hilbert spaces. The use of kernel mean embeddings ensures faster convergence rates in prediction region estimation. To enhance the practical utility of the algorithms with finite samples, we also introduce a conformal prediction variant that provides marginal, non-asymptotic guarantees for the derived prediction regions. Additionally, we establish both conditional and unconditional consistency results, as well as fast convergence rates in certain homoscedastic settings. We evaluate the finite-sample performance of our model in extensive simulation studies involving various types of functional data and traditional Euclidean scenarios. Finally, we demonstrate the practical relevance of our approach through a digital health application related to physical activity, aiming to provide personalized recommendations.
Article 865
Title@2025-06-10 (2): RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language
Title: RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language | RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors und Natural Language | 关于音频、视频、嵌入传感器和自然语言的问答答案 2505.17114v2 |
Authors (3): Subrata Biswas, Mohammad Nur Hossain Khan, Bashima Islam
Multimodal question answering (QA) often requires identifying which video, audio, or sensor tokens are relevant to the question. Yet modality disagreements are common: off-camera speech, background noise, or motion outside the field of view often mislead fusion models that weight all streams equally. We present RAVEN, a unified QA architecture whose core is QuART, a query-conditioned cross-modal gating module that assigns scalar relevance scores to each token across modalities, enabling the model to amplify informative signals and suppress distractors before fusion. RAVEN is trained through a three-stage pipeline comprising unimodal pretraining, query-aligned fusion, and disagreement-oriented fine-tuning, with each stage targeting a distinct challenge in multi-modal reasoning: representation quality, cross-modal relevance, and robustness to modality mismatch. To support training and evaluation, we release AVS-QA, a dataset of 300K synchronized Audio-Video-Sensor streams paired with automatically generated question-answer pairs. Experimental results on seven multi-modal QA benchmarks, including egocentric and exocentric tasks, show that RAVEN achieves up to 14.5% and 8.0% gains in accuracy compared to state-of-the-art multi-modal large language models, respectively. Incorporating sensor data provides an additional 16.4% boost, and the model remains robust under modality corruption, outperforming SOTA baselines by 50.23%. Our code and dataset are available at https://github.com/BASHLab/RAVEN.
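The gating idea can be sketched in a few lines of Python. This is a toy illustration and not the QuART module: the abstract does not say how relevance scores are computed, so the sketch assumes a scaled dot product between a query vector and each token, passed through a sigmoid. The name `gate_tokens` and the example vectors are hypothetical.

```python
import math


def gate_tokens(query, tokens):
    """Scale each token by a scalar relevance gate in (0, 1).

    Assumption: gate = sigmoid(<query, token> / sqrt(d)). The real scoring
    function used by QuART is not specified in the abstract.
    """
    d = len(query)
    gates, gated = [], []
    for tok in tokens:
        score = sum(q * t for q, t in zip(query, tok)) / math.sqrt(d)
        g = 1.0 / (1.0 + math.exp(-score))
        gates.append(g)
        gated.append([g * t for t in tok])  # amplified or suppressed token
    return gated, gates


if __name__ == "__main__":
    query = [1.0, 0.0]
    # Tokens could come from any modality once projected to a shared space.
    tokens = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0]]
    gated, gates = gate_tokens(query, tokens)
    for tok, g in zip(tokens, gates):
        print(tok, "-> gate", round(g, 3))
```

Tokens aligned with the query receive gates above 0.5 and opposed tokens fall below it, so a downstream fusion step sees distractors at reduced magnitude.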
Article 866
Title@2025-06-10 (2): TPP-LLM: Modeling Temporal Point Processes by Efficiently Fine-Tuning Large Language Models
Title: TPP-LLM: Modeling Temporal Point Processes by Efficiently Fine-Tuning Large Language Models | TPP-LLM: Modellierung von Temporal Point-Prozessen durch effizientes Feintuning großer Sprachmodelle | TPP-LLM:通过高效微调大语言模式模拟时点进程 2410.02062v2 |
Authors (2): Zefang Liu, Yinzhu Quan
Temporal point processes (TPPs) are widely used to model the timing and occurrence of events in domains such as social networks, transportation systems, and e-commerce. In this paper, we introduce TPP-LLM, a novel framework that integrates large language models (LLMs) with TPPs to capture both the semantic and temporal aspects of event sequences. Unlike traditional methods that rely on categorical event type representations, TPP-LLM directly utilizes the textual descriptions of event types, enabling the model to capture rich semantic information embedded in the text. While LLMs excel at understanding event semantics, they are less adept at capturing temporal patterns. To address this, TPP-LLM incorporates temporal embeddings and employs parameter-efficient fine-tuning (PEFT) methods to effectively learn temporal dynamics without extensive retraining. This approach improves both predictive accuracy and computational efficiency. Experimental results across diverse real-world datasets demonstrate that TPP-LLM outperforms state-of-the-art baselines in sequence modeling and event prediction, highlighting the benefits of combining LLMs with TPPs.
Article 867
Title@2025-06-10 (2): Curse of High Dimensionality Issue in Transformer for Long-context Modeling
Title: Curse of High Dimensionality Issue in Transformer for Long-context Modeling | Fluch der Hochdimensionalitätsfrage im Transformer für die Langkontextmodellierung | 变异器中高多维度问题的诅咒,用于长期建模 2505.22107v3 |
Authors (8): Shuhai Zhang, Zeng You, Yaofo Chen, Zhiquan Wen, Qianyue Wang, Zhijie Qiu, Yuanqing Li, Mingkui Tan
Transformer-based large language models (LLMs) excel in natural language processing tasks by capturing long-range dependencies through self-attention mechanisms. However, long-context modeling faces significant computational inefficiencies due to redundant attention computations: while attention weights are often sparse, all tokens consume equal computational resources. In this paper, we reformulate traditional probabilistic sequence modeling as a supervised learning task, enabling the separation of relevant and irrelevant tokens and providing a clearer understanding of redundancy. Based on this reformulation, we theoretically analyze attention sparsity, revealing that only a few tokens significantly contribute to predictions. Building on this, we formulate attention optimization as a linear coding problem and propose a group coding strategy, theoretically showing its ability to improve robustness against random noise and enhance learning efficiency. Motivated by this, we propose Dynamic Group Attention (DGA), which leverages the group coding to explicitly reduce redundancy by aggregating less important tokens during attention computation. Empirical results show that our DGA significantly reduces computational costs while maintaining competitive performance. Code is available at https://github.com/bolixinyu/DynamicGroupAttention.
Article 868
Title@2025-06-10 (2): Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
Title: Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion | Warum Diffusion arbeitet maskieren: Zustand auf dem Sprungplan für verbesserte diskrete Diffusion | 为何遮掩传播工作有效:改进传播的跳跳时间表上的条件 2506.08316v1 |
Authors (3): Alan N. Amin, Nate Gruver, Andrew Gordon Wilson
Discrete diffusion models, like continuous diffusion models, generate high-quality samples by gradually undoing noise applied to datapoints with a Markov process. Gradual generation in theory comes with many conceptual benefits; for example, inductive biases can be incorporated into the noising Markov process, and improved sampling algorithms become available. In practice, however, the consistently best performing discrete diffusion model is, surprisingly, masking diffusion, which does not denoise gradually. Here we explain the superior performance of masking diffusion by noting that it makes use of a fundamental difference between continuous and discrete Markov processes: discrete Markov processes evolve by discontinuous jumps at a fixed rate and, unlike other discrete diffusion models, masking diffusion builds in the known distribution of jump times and only learns where to jump to. We show that we can similarly bake the known distribution of jump times into any discrete diffusion model. The resulting models, schedule-conditioned discrete diffusion (SCUD), generalize classical discrete diffusion and masking diffusion. By applying SCUD to models with noising processes that incorporate inductive biases on images, text, and protein data, we build models that outperform masking.
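The split between "when to jump" and "where to jump" can be sketched for the forward noising process alone. The Python below is a minimal illustration under stated assumptions: a single token, a constant jump rate, and a hypothetical three-state uniform transition table. It shows only that jump times follow a known Poisson-process law that can be sampled independently of the destinations; it is not the SCUD training procedure.

```python
import random


def sample_jump_times(rate, t_end, rng):
    """Jump times of a Poisson process with constant `rate` on [0, t_end]."""
    times = []
    t = rng.expovariate(rate)  # exponential waiting time until the first jump
    while t <= t_end:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def noise_token(state, jump_times, transition, rng):
    """Apply one transition draw per scheduled jump, ignoring the exact times."""
    for _ in jump_times:
        next_states, probs = zip(*transition[state].items())
        state = rng.choices(next_states, weights=probs)[0]
    return state


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical alphabet with uniform transitions between distinct states.
    transition = {
        "A": {"B": 0.5, "C": 0.5},
        "B": {"A": 0.5, "C": 0.5},
        "C": {"A": 0.5, "B": 0.5},
    }
    schedule = sample_jump_times(rate=2.0, t_end=1.5, rng=rng)
    print("number of jumps:", len(schedule))
    print("noised token:", noise_token("A", schedule, transition, rng))
```

Because the schedule is drawn from a known distribution, a model conditioned on it would only need to learn the destination part, which is the decomposition the abstract attributes to masking diffusion.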
Article 869
Title@2025-06-10 (2): Private Evolution Converges
Title: Private Evolution Converges | Private Evolutionskonvergenzen | 私人进化集市 2506.08312v1 |
Authors (3): Tomás González, Giulia Fanti, Aaditya Ramdas
Private Evolution (PE) is a promising training-free method for differentially private (DP) synthetic data generation. While it achieves strong performance in some domains (e.g., images and text), its behavior in others (e.g., tabular data) is less consistent. To date, the only theoretical analysis of the convergence of PE depends on unrealistic assumptions about both the algorithm’s behavior and the structure of the sensitive dataset. In this work, we develop a new theoretical framework to explain PE’s practical behavior and identify sufficient conditions for its convergence. For $d$-dimensional sensitive datasets with $n$ data points from a bounded domain, we prove that PE produces an $(\epsilon, \delta)$-DP synthetic dataset with expected 1-Wasserstein distance of order $\tilde{O}(d(n\epsilon)^{-1/d})$ from the original, establishing worst-case convergence of the algorithm as $n \to \infty$. Our analysis extends to general Banach spaces as well. We also connect PE to the Private Signed Measure Mechanism, a method for DP synthetic data generation that has thus far not seen much practical adoption. We demonstrate the practical relevance of our theoretical findings in simulations.
Article 870
Title@2025-06-10 (2): Continuous Temporal Learning of Probability Distributions via Neural ODEs with Applications in Continuous Glucose Monitoring Data
Title: Continuous Temporal Learning of Probability Distributions via Neural ODEs with Applications in Continuous Glucose Monitoring Data | Kontinuierliches Temporales Lernen der Wahrscheinlichkeitsverteilungen über neurale ODEs mit Anwendungen in kontinuierlichen Glukoseüberwachungsdaten | 不断从时间上学习通过有持续甘蔗监测数据应用的神经代码对概率分布进行概率分布的连续时间学习 2505.08698v2 |
Authors (2): Antonio Álvarez-López, Marcos Matabuena
Modeling the dynamics of probability distributions from time-dependent data samples is a fundamental problem in many fields, including digital health. The goal is to analyze how the distribution of a biomarker, such as glucose, changes over time and how these changes may reflect the progression of chronic diseases like diabetes. We introduce a probabilistic model based on a Gaussian mixture that captures the evolution of a continuous-time stochastic process. Our approach combines a non-parametric estimate of the distribution, obtained with Maximum Mean Discrepancy (MMD), and a Neural Ordinary Differential Equation (Neural ODE) that governs the temporal evolution of the mixture weights. The model is highly interpretable, detects subtle distribution shifts, and remains computationally efficient. Simulation studies show that our method matches or surpasses the estimation accuracy of state-of-the-art, less interpretable techniques such as normalizing flows and non-parametric kernel density estimators. We further demonstrate its utility using data from a digital clinical trial, revealing how interventions affect the time-dependent distribution of glucose levels. The proposed method enables rigorous comparisons between control and treatment groups from both mathematical and clinical perspectives, offering novel longitudinal characterizations that existing approaches cannot achieve.
Article 871
Title@2025-06-10 (2): GRAM: Generalization in Deep RL with a Robust Adaptation Module
Title: GRAM: Generalization in Deep RL with a Robust Adaptation Module | GRAM: Verallgemeinerung in Deep RL mit einem robusten Anpassungsmodul | GRAM: 带有强力适应模块的深 RL 常规化 2412.04323v2 |
Authors (6): James Queeney, Xiaoyi Cai, Alexander Schperberg, Radu Corcodel, Mouhacine Benosman, Jonathan P. How
The reliable deployment of deep reinforcement learning in real-world settings requires the ability to generalize across a variety of conditions, including both in-distribution scenarios seen during training as well as novel out-of-distribution scenarios. In this work, we present a framework for dynamics generalization in deep reinforcement learning that unifies these two distinct types of generalization within a single architecture. We introduce a robust adaptation module that provides a mechanism for identifying and reacting to both in-distribution and out-of-distribution environment dynamics, along with a joint training pipeline that combines the goals of in-distribution adaptation and out-of-distribution robustness. Our algorithm GRAM achieves strong generalization performance across in-distribution and out-of-distribution scenarios upon deployment, which we demonstrate through extensive simulation and hardware locomotion experiments on a quadruped robot.
Article 872
Title@2025-06-10 (2): H$^2$GFM: Towards unifying Homogeneity and Heterogeneity on Text-Attributed Graphs
Title: H$^2$GFM: Towards unifying Homogeneity and Heterogeneity on Text-Attributed Graphs | H$^2$GFM: Auf dem Weg zur Vereinheitlichung von Homogenität und Heterogenität auf Text-Attributen Grafiken | H$^2$GFM:努力在文本图表中统一同质性和异质性 2506.08298v1 |
Authors (7): Trung-Kien Nguyen, Heng Ping, Shixuan Li, Peiyu Zhang, Nikos Kanakaris, Nicholas Kotov, Paul Bogdan
The growing interests and applications of graph learning in diverse domains have propelled the development of a unified model generalizing well across different graphs and tasks, known as the Graph Foundation Model (GFM). Existing research has leveraged text-attributed graphs (TAGs) to tackle the heterogeneity in node features among graphs. However, they primarily focus on homogeneous TAGs (HoTAGs), leaving heterogeneous TAGs (HeTAGs), where multiple types of nodes/edges reside, underexplored. To enhance the capabilities and applications of GFM, we introduce H$^2$GFM, a novel framework designed to generalize across both HoTAGs and HeTAGs. Our model projects diverse meta-relations among graphs under a unified textual space, and employs a context encoding to capture spatial and higher-order semantic relationships. To achieve robust node representations, we propose a novel context-adaptive graph transformer (CGT), effectively capturing information from both context neighbors and their relationships. Furthermore, we employ a mixture of CGT experts to capture the heterogeneity in structural patterns among graph types. Comprehensive experiments on a wide range of HoTAGs and HeTAGs as well as learning scenarios demonstrate the effectiveness of our model.
Article 873
Title@2025-06-09 (1): From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?
Title: From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information? | Von passiv zur aktiven Vernunft: Können große Sprachmodelle unter unvollständigen Informationen die richtigen Fragen stellen? | 从被动到积极理由:大语言模式能够在不完整信息下询问正确的问题吗? 2506.08295v1 |
Authors (6): Zhanke Zhou, Xiao Feng, Zhaocheng Zhu, Jiangchao Yao, Sanmi Koyejo, Bo Han
While existing benchmarks probe the reasoning abilities of large language models (LLMs) across diverse domains, they predominantly assess passive reasoning, providing models with all the information needed to reach a solution. By contrast, active reasoning, where an LLM must interact with external systems to acquire missing evidence or data, has received little systematic attention. To address this shortfall, we present AR-Bench, a novel benchmark designed explicitly to evaluate an LLM's active reasoning skills. AR-Bench comprises three task families (detective cases, situation puzzles, and guessing numbers) that together simulate real-world, agentic scenarios and measure performance across commonsense, logical, and symbolic reasoning challenges. Empirical evaluation on AR-Bench demonstrates that contemporary LLMs exhibit pronounced difficulties with active reasoning: they frequently fail to acquire or leverage the information needed to solve tasks. This gap highlights a stark divergence between their passive and active reasoning abilities. Moreover, ablation studies indicate that even advanced strategies, such as tree-based searching or post-training approaches, yield only modest gains and fall short of the levels required for real-world deployment. Collectively, these findings highlight the critical need to advance methodology for active reasoning, e.g., incorporating interactive learning, real-time feedback loops, and environment-aware objectives for training. The benchmark is publicly available at: https://github.com/tmlr-group/AR-Bench.
Article 874
Title@2025-06-09 (1): From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium
Title: From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium | Von der Debatte zum Equilibrium: Glaube-getriebener Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium | 从辩论到平衡:通过巴伊西亚纳什平衡进行 2506.08292v1 |
Authors (6): Xie Yi, Zhanke Zhou, Chentao Cao, Qiyu Niu, Tongliang Liu, Bo Han
Multi-agent frameworks can substantially boost the reasoning power of large language models (LLMs), but they typically incur heavy computational costs and lack convergence guarantees. To overcome these challenges, we recast multi-LLM coordination as an incomplete-information game and seek a Bayesian Nash equilibrium (BNE), in which each agent optimally responds to its probabilistic beliefs about the strategies of others. We introduce Efficient Coordination via Nash Equilibrium (ECON), a hierarchical reinforcement-learning paradigm that marries distributed reasoning with centralized final output. Under ECON, each LLM independently selects responses that maximize its expected reward, conditioned on its beliefs about co-agents, without requiring costly inter-agent exchanges. We mathematically prove that ECON attains a markedly tighter regret bound than non-equilibrium multi-agent schemes. Empirically, ECON outperforms existing multi-LLM approaches by 11.2% on average across six benchmarks spanning complex reasoning and planning tasks. Further experiments demonstrate ECON’s ability to flexibly incorporate additional models, confirming its scalability and paving the way toward larger, more powerful multi-LLM ensembles. The code is publicly available at: https://github.com/tmlr-group/ECON.
Article 875
Title@2025-06-09 (1): Digital Twin Synchronization: Bridging the Sim-RL Agent to a Real-Time Robotic Additive Manufacturing Control
Title: Digital Twin Synchronization: Bridging the Sim-RL Agent to a Real-Time Robotic Additive Manufacturing Control | Digitale Twin-Synchronisierung: Überbrücken des Sim-RL Agents zu einer Echtzeit-Roboter-Zusatz-Fertigungssteuerung | 数字双双同步同步:将Sim-RL代理连接到实时机器人添加剂制造控制 2501.18016v2 |
Authors (4): Matsive Ali, Sandesh Giri, Sen Liu, Qin Yang
With its rapid development, deep reinforcement learning has demonstrated excellent potential and is becoming the most promising solution in robotics. However, in the smart manufacturing domain, there is still little research on dynamic adaptive control mechanisms that optimize complex processes. This research advances the integration of Soft Actor-Critic (SAC) with digital twins for industrial robotics applications, providing a framework for enhanced adaptive real-time control for smart additive manufacturing processing. The system architecture combines Unity’s simulation environment with ROS2 for seamless digital twin synchronization, while leveraging transfer learning to efficiently adapt trained models across tasks. We demonstrate our methodology using a Viper X300s robot arm with the proposed hierarchical reward structure to address the common reinforcement learning challenges in two distinct control scenarios. The results show rapid policy convergence and robust task execution in both simulated and physical environments, demonstrating the effectiveness of our approach.
Article 876
Title@2025-06-09 (1): Edge Computing based Human-Robot Cognitive Fusion: A Medical Case Study in the Autism Spectrum Disorder Therapy
Title: Edge Computing based Human-Robot Cognitive Fusion: A Medical Case Study in the Autism Spectrum Disorder Therapy | Edge Computing basierte human-Roboter Kognitive Fusion: Eine medizinische Fallstudie in der Autismus-Spektrum-Störungstherapie | 以边缘计算机为基础的人类-机器人认知共生:自闭症频谱病理医学案例研究 2401.00776v2 |
Authors (1): Qin Yang
In recent years, edge computing has served as a paradigm that enables many future technologies like AI, Robotics, IoT, and high-speed wireless sensor networks (like 5G) by connecting cloud computing facilities and services to the end users. Especially in medical and healthcare applications, it provides remote patient monitoring and increases voluminous multimedia. From the robotics angle, robot-assisted therapy (RAT) is an active-assistive robotic technology in rehabilitation robotics, attracting researchers to study it and to benefit people with disabilities, such as children with autism spectrum disorder (ASD). However, the main challenge of RAT is obtaining a model that can detect the affective states of people with ASD and recall individual preferences. Moreover, involving expert diagnosis and recommendations to guide robots in updating the therapy approach to adapt to different statuses and scenarios is a crucial part of the ASD therapy process. This paper proposes an edge cognitive computing architecture in which human experts and assistive robots collaborate in the same framework to achieve seamless remote diagnosis, round-the-clock symptom monitoring, emergency warning, therapy alteration, and advanced assistance.
Article 877
Title@2025-06-09 (1): Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers
Title: Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers | Präzise hochdimensionale Asymptotik für die Quantifizierung heterogener Übertragungen | 用于量化异基因转移的精确高多元性高活性 2010.11750v5 |
Authors (5): Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su
The problem of learning one task using samples from another task is central to transfer learning. In this paper, we focus on answering the following question: when does combining the samples from two related tasks perform better than learning with one target task alone? This question is motivated by an empirical phenomenon known as negative transfer, which has been observed in practice. While the transfer effect from one task to another depends on factors such as their sample sizes and the spectrum of their covariance matrices, precisely quantifying this dependence has remained a challenging problem. In order to compare a transfer learning estimator to single-task learning, one needs to compare the risks between the two estimators precisely. Further, the comparison depends on the distribution shifts between the two tasks. This paper applies recent developments of random matrix theory to tackle this challenge in a high-dimensional linear regression setting with two tasks. We show precise high-dimensional asymptotics for the bias and variance of a classical hard parameter sharing (HPS) estimator in the proportional limit, where the sample sizes of both tasks increase proportionally with dimension at fixed ratios. The precise asymptotics apply to various types of distribution shifts, including covariate shifts, model shifts, and combinations of both. We illustrate these results in a random-effects model to mathematically prove a phase transition from positive to negative transfer as the number of source task samples increases. One insight from the analysis is that a rebalanced HPS estimator, which downsizes the source task when the model shift is high, achieves the minimax optimal rate. The finding regarding phase transition also applies to multiple tasks when covariates are shared across tasks. Simulations validate the accuracy of the high-dimensional asymptotics for finite dimensions.
Article 878
Title@2025-06-09 (1): Voice Impression Control in Zero-Shot TTS
Title: Voice Impression Control in Zero-Shot TTS | Sprach-Impression-Steuerung in Zero-Shot TTS | 零热TTTS中的声音压抑控制 2506.05688v2 |
Authors (3): Keinichi Fujita, Shota Horiguchi, Yusuke Ijima
Para-/non-linguistic information in speech is pivotal in shaping the listeners’ impression. Although zero-shot text-to-speech (TTS) has achieved high speaker fidelity, modulating subtle para-/non-linguistic information to control perceived voice characteristics, i.e., impressions, remains challenging. We have therefore developed a voice impression control method in zero-shot TTS that utilizes a low-dimensional vector to represent the intensities of various voice impression pairs (e.g., dark-bright). The results of both objective and subjective evaluations have demonstrated our method’s effectiveness in impression control. Furthermore, generating this vector via a large language model enables target-impression generation from a natural language description of the desired impression, thus eliminating the need for manual optimization. Audio examples are available on our demo page (https://ntt-hilab-gensp.github.io/is2025voiceimpression/).
Article 879
Title@2025-06-09 (1): Seeing Voices: Generating A-Roll Video from Audio with Mirage
Title: Seeing Voices: Generating A-Roll Video from Audio with Mirage | Stimmen sehen: A-Roll-Video aus Audio mit Mirage generieren | 观看声音:从《幻影之音》制作一首歌曲视频 2506.08279v1 |
Authors (13): Aditi Sundararaman, Amogh Adishesha, Andrew Jaegle, Dan Bigioi, Hyoung-Kyu Song, Jon Kyl, Justin Mao, Kevin Lan, Mojtaba Komeili, ShahRukh Athar, Sheila Babayan, Stanislau Beliasau, William Buchwalter
From professional filmmaking to user-generated content, creators and consumers have long recognized that the power of video depends on the harmonious integration of what we hear (the video’s audio track) with what we see (the video’s image sequence). Current approaches to video generation either ignore sound to focus on general-purpose but silent image sequence generation or address both visual and audio elements but focus on restricted application domains such as re-dubbing. We introduce Mirage, an audio-to-video foundation model that excels at generating realistic, expressive output imagery from scratch given an audio input. When integrated with existing methods for speech synthesis (text-to-speech, or TTS), Mirage results in compelling multimodal video. When trained on audio-video footage of people talking (A-roll) and conditioned on audio containing speech, Mirage generates video of people delivering a believable interpretation of the performance implicit in input audio. Our central technical contribution is a unified method for training self-attention-based audio-to-video generation models, either from scratch or given existing weights. This methodology allows Mirage to retain generality as an approach to audio-to-video generation while producing outputs of superior subjective quality to methods that incorporate audio-specific architectures or loss components specific to people, speech, or details of how images or audio are captured. We encourage readers to watch and listen to the results of Mirage for themselves (see paper and comments for links).
Article 880
Title@2025-06-09 (1): Instruction-Tuned Video-Audio Models Elucidate Functional Specialization in the Brain
Title: Instruction-Tuned Video-Audio Models Elucidate Functional Specialization in the Brain | Instruction-Tuned Video-Audio Modelle Elucidate Funktionelle Spezialisierung im Gehirn | 脑中精明功能性专业 2506.08277v1 |
Authors (8): Subba Reddy Oota, Khushbu Pahwa, Prachi Jindal, Satya Sai Srinath Namburi, Maneesh Singh, Tanmoy Chakraborty, Bapi S. Raju, Manish Gupta
Recent voxel-wise multimodal brain encoding studies have shown that multimodal large language models (MLLMs) exhibit a higher degree of brain alignment compared to unimodal models in both unimodal and multimodal stimulus settings. More recently, instruction-tuned multimodal models have been shown to generate task-specific representations that align strongly with brain activity. However, prior work evaluating the brain alignment of MLLMs has primarily focused on unimodal settings or relied on non-instruction-tuned multimodal models for multimodal stimuli. To address this gap, we investigated brain alignment, that is, measuring the degree of predictivity of neural activity recorded while participants were watching naturalistic movies (video along with audio) with representations derived from MLLMs. We utilized instruction-specific embeddings from six video and two audio instruction-tuned MLLMs. Experiments with 13 video task-specific instructions show that instruction-tuned video MLLMs significantly outperform non-instruction-tuned multimodal (by 15%) and unimodal models (by 20%). Our evaluation of MLLMs for both video and audio tasks using language-guided instructions shows clear disentanglement in task-specific representations from MLLMs, leading to precise differentiation of multimodal functional processing in the brain. We also find that MLLM layers align hierarchically with the brain, with early sensory areas showing strong alignment with early layers, while higher-level visual and language regions align more with middle to late layers. These findings provide clear evidence for the role of task-specific instructions in improving the alignment between brain activity and MLLMs, and open new avenues for mapping joint information processing in both systems. We make the code publicly available [https://github.com/subbareddy248/mllm_videos].
Article 881
Title@2025-06-09 (1): SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
Title: SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device | SnapGen-V: Ein Fünf-Sekunden-Video innerhalb von fünf Sekunden auf einem mobilen Gerät erzeugen | SnapGen-V:在移动设备上制作5秒内5秒内5秒的视频 2412.10494v2 |
Authors (14): Yushu Wu, Zhixing Zhang, Yanyu Li, Yanwu Xu, Anil Kag, Yang Sui, Huseyin Coskun, Ke Ma, Aleksei Lebedev, Ju Hu, Dimitris Metaxas, Yanzhi Wang, Sergey Tulyakov, Jian Ren
We have witnessed the unprecedented success of diffusion-based video generation over the past year. Recently proposed models from the community have wielded the power to generate cinematic and high-resolution videos with smooth motions from arbitrary input prompts. However, as a supertask of image generation, video generation models require more computation and are thus hosted mostly on cloud servers, limiting broader adoption among content creators. In this work, we propose a comprehensive acceleration framework to bring the power of the large-scale video diffusion model to the hands of edge users. From the network architecture scope, we initialize from a compact image backbone and search out the design and arrangement of temporal layers to maximize hardware efficiency. In addition, we propose a dedicated adversarial fine-tuning algorithm for our efficient model and reduce the denoising steps to 4. Our model, with only 0.6B parameters, can generate a 5-second video on an iPhone 16 PM within 5 seconds. Compared to server-side models that take minutes on powerful GPUs to generate a single video, we accelerate the generation by orders of magnitude while delivering on-par quality.
Article 882
Title@2025-06-09 (1): LEANN: A Low-Storage Vector Index
Title: LEANN: A Low-Storage Vector Index | LEANN: Ein speicherarmer Vektorindex | LEANN:低压力矢量指数 2506.08276v1 |
Authors (13): Yichuan Wang, Shu Liu, Zhifei Li, Yongji Wu, Ziming Mao, Yilong Zhao, Xiao Yan, Zhiying Xu, Yang Zhou, Ion Stoica, Sewon Min, Matei Zaharia, Joseph E. Gonzalez
Embedding-based search is widely used in applications such as recommendation and retrieval-augmented generation (RAG). Recently, there is a growing demand to support these capabilities over personal data stored locally on devices. However, maintaining the necessary data structure associated with the embedding-based search is often infeasible due to its high storage overhead. For example, indexing 100 GB of raw data requires 150 to 700 GB of storage, making local deployment impractical. Reducing this overhead while maintaining search quality and latency becomes a critical challenge. In this paper, we present LEANN, a storage-efficient approximate nearest neighbor (ANN) search index optimized for resource-constrained personal devices. LEANN combines a compact graph-based structure with an efficient on-the-fly recomputation strategy to enable fast and accurate retrieval with minimal storage overhead. Our evaluation shows that LEANN reduces index size to under 5% of the original raw data, achieving up to 50 times smaller storage than standard indexes, while maintaining 90% top-3 recall in under 2 seconds on real-world question answering benchmarks.
Article 883
Title@2025-06-09 (1): Exploring Microstructural Dynamics in Cryptocurrency Limit Order Books: Better Inputs Matter More Than Stacking Another Hidden Layer
Title: Exploring Microstructural Dynamics in Cryptocurrency Limit Order Books: Better Inputs Matter More Than Stacking Another Hidden Layer | Erforschung mikrostruktureller Dynamiken in Kryptowährung Limit Order Books: Bessere Eingaben sind wichtiger als das Stapeln einer anderen versteckten Ebene | 探索加密货币限制顺序书中的微观结构动态:更好的投入比另层隐藏层更重要 2506.05764v2 |
Authors (1): Haochuan Wang
Cryptocurrency price dynamics are driven largely by microstructural supply-demand imbalances in the limit order book (LOB), yet the highly noisy nature of LOB data complicates the signal extraction process. Prior research has demonstrated that deep-learning architectures can yield promising predictive performance on pre-processed equity and futures LOB data, but they often treat model complexity as an unqualified virtue. In this paper, we aim to examine whether adding extra hidden layers or parameters to “black-box-ish” neural networks genuinely enhances short-term price forecasting, or if gains are primarily attributable to data preprocessing and feature engineering. We benchmark a spectrum of models, from interpretable baselines (logistic regression, XGBoost) to deep architectures (DeepLOB, Conv1D+LSTM), on BTC/USDT LOB snapshots sampled at 100 ms to multi-second intervals using publicly available Bybit data. We introduce two data filtering pipelines (Kalman, Savitzky-Golay) and evaluate both binary (up/down) and ternary (up/flat/down) labeling schemes. Our analysis compares models on out-of-sample accuracy, latency, and robustness to noise. Results reveal that, with data preprocessing and hyperparameter tuning, simpler models can match and even exceed the performance of more complex networks, offering faster inference and greater interpretability.
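The binary and ternary labeling schemes mentioned in the abstract can be illustrated with a minimal sketch. The function names, the `horizon` parameter, and the relative-change `threshold` are assumptions made for illustration; the abstract does not specify how the paper defines its labels.

```python
def ternary_labels(mid_prices, horizon, threshold):
    """Label each snapshot as up (1), flat (0), or down (-1).

    Hypothetical rule: compare the relative mid-price change `horizon`
    steps ahead against a symmetric `threshold`.
    """
    labels = []
    for t in range(len(mid_prices) - horizon):
        change = (mid_prices[t + horizon] - mid_prices[t]) / mid_prices[t]
        if change > threshold:
            labels.append(1)      # up
        elif change < -threshold:
            labels.append(-1)     # down
        else:
            labels.append(0)      # flat
    return labels


def binary_labels(mid_prices, horizon):
    """Label each snapshot as up (1) or not up (0), with no flat band."""
    return [
        1 if mid_prices[t + horizon] > mid_prices[t] else 0
        for t in range(len(mid_prices) - horizon)
    ]


prices = [100.0, 100.0, 101.0, 100.0, 100.0]
print(ternary_labels(prices, horizon=1, threshold=0.005))
print(binary_labels(prices, horizon=1))
```

The flat band in the ternary scheme absorbs small moves that are likely noise, which is one reason the two schemes can rank models differently.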
Article 884
Title@2025-06-09 (1): DualDynamics: Synergizing Implicit and Explicit Methods for Robust Irregular Time Series Analysis
Title: DualDynamics: Synergizing Implicit and Explicit Methods for Robust Irregular Time Series Analysis | DualDynamics: Synergetische Implizite und explizite Methoden zur robusten irregulären Zeitreihenanalyse | 双动态: 优化非常规时间序列分析的隐含和显性方法 2401.04979v6 |
Authors (3): YongKyung Oh, Dong-Young Lim, Sungil Kim
Real-world time series analysis faces significant challenges when dealing with irregular and incomplete data. While Neural Differential Equation (NDE) based methods have shown promise, they struggle with limited expressiveness, scalability issues, and stability concerns. Conversely, Neural Flows offer stability but falter with irregular data. We introduce ‘DualDynamics’, a novel framework that synergistically combines an NDE-based method with a Neural Flow-based method. This approach enhances expressive power while balancing computational demands, addressing critical limitations of existing techniques. We demonstrate DualDynamics’ effectiveness across diverse tasks: classification of robustness to dataset shift, irregularly-sampled series analysis, interpolation of missing data, and forecasting with partial observations. Our results show consistent outperformance over state-of-the-art methods, indicating DualDynamics’ potential to advance irregular time series analysis significantly.
Article 885
Title@2025-06-09 (1): Navigating the Latent Space Dynamics of Neural Models
Title: Navigating the Latent Space Dynamics of Neural Models | Navigation der latenten Raumdynamik von Neuralmodellen | 导航内壳模型的冷层空间动态 2505.22785v2 |
Authors (4): Marco Fumero, Luca Moschella, Emanuele Rodolà, Francesco Locatello
Neural networks transform high-dimensional data into compact, structured representations, often modeled as elements of a lower dimensional latent space. In this paper, we present an alternative interpretation of neural models as dynamical systems acting on the latent manifold. Specifically, we show that autoencoder models implicitly define a latent vector field on the manifold, derived by iteratively applying the encoding-decoding map, without any additional training. We observe that standard training procedures introduce inductive biases that lead to the emergence of attractor points within this vector field. Drawing on this insight, we propose to leverage the vector field as a representation for the network, providing a novel tool to analyze the properties of the model and the data. This representation makes it possible to: (i) analyze the generalization and memorization regimes of neural models, even throughout training; (ii) extract prior knowledge encoded in the network’s parameters from the attractors, without requiring any input data; (iii) identify out-of-distribution samples from their trajectories in the vector field. We further validate our approach on vision foundation models, showcasing the applicability and effectiveness of our method in real-world scenarios.
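The idea of a latent vector field obtained by iterating the encoding-decoding map can be sketched on a toy example. The `encode` and `decode` functions below are hypothetical stand-ins (an identity decoder and a tanh encoder), not a trained autoencoder and not the authors' code; they are chosen only so that the iteration has visible attractors.

```python
import math


def decode(z):
    """Toy decoder (assumption): identity map from latent to data space."""
    return list(z)


def encode(x):
    """Toy encoder (assumption): element-wise tanh with gain 2."""
    return [math.tanh(2.0 * v) for v in x]


def latent_step(z):
    """One application of the encode-after-decode map in latent space."""
    return encode(decode(z))


def vector_field(z):
    """Displacement induced by one step: V(z) = E(D(z)) - z."""
    return [a - b for a, b in zip(latent_step(z), z)]


def find_attractor(z, tol=1e-10, max_iter=10_000):
    """Iterate the map until the latent point stops moving."""
    for _ in range(max_iter):
        nz = latent_step(z)
        if max(abs(a - b) for a, b in zip(nz, z)) < tol:
            return nz
        z = nz
    return z


attractor = find_attractor([0.1, -0.3])
print(attractor, vector_field(attractor))
```

At an attractor the vector field vanishes, so the points where trajectories settle can be found without any input data, which matches use case (ii) in the abstract.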
Article 886
Title@2025-06-09 (1): Comprehensive Review of Neural Differential Equations for Time Series Analysis
Title: Comprehensive Review of Neural Differential Equations for Time Series Analysis | Umfassende Überprüfung neuraler Differentialgleichungen für die Zeitreihenanalyse | 全面审查用于时间序列分析的神经差异等同 2502.09885v2 |
Authors (6): YongKyung Oh, Seungsu Kam, Jonghun Lee, Dong-Young Lim, Sungil Kim, Alex Bui
Time series modeling and analysis have become critical in various domains. Conventional methods such as RNNs and Transformers, while effective for discrete-time and regularly sampled data, face significant challenges in capturing the continuous dynamics and irregular sampling patterns inherent in real-world scenarios. Neural Differential Equations (NDEs) represent a paradigm shift by combining the flexibility of neural networks with the mathematical rigor of differential equations. This paper presents a comprehensive review of NDE-based methods for time series analysis, including neural ordinary differential equations, neural controlled differential equations, and neural stochastic differential equations. We provide a detailed discussion of their mathematical formulations, numerical methods, and applications, highlighting their ability to model continuous-time dynamics. Furthermore, we address key challenges and future research directions. This survey serves as a foundation for researchers and practitioners seeking to leverage NDEs for advanced time series analysis.
Article 887
Title@2025-06-09 (1): Universal Differential Equations for Scientific Machine Learning of Node-Wise Battery Dynamics in Smart Grids
Title: Universal Differential Equations for Scientific Machine Learning of Node-Wise Battery Dynamics in Smart Grids | Universelle Differentialgleichungen für das wissenschaftliche maschinelle Lernen von Node-Wise-Batteriedynamiken in Smart Grids | 智能网格节点电池动态科学机器学习通用差异等量 2506.08272v1 |
Authors (1): Tarushri N. S.
Universal Differential Equations (UDEs), which blend neural networks with physical differential equations, have emerged as a powerful framework for scientific machine learning (SciML), enabling data-efficient, interpretable, and physically consistent modeling. In the context of smart grid systems, modeling node-wise battery dynamics remains a challenge due to the stochasticity of solar input and variability in household load profiles. Traditional approaches often struggle with generalization and fail to capture unmodeled residual dynamics. This work proposes a UDE-based approach to learn node-specific battery evolution by embedding a neural residual into a physically inspired battery ODE. Synthetic yet realistic solar generation and load demand data are used to simulate battery dynamics over time. The neural component learns to model unobserved or stochastic corrections arising from heterogeneity in node demand and environmental conditions. Comprehensive experiments reveal that the trained UDE aligns closely with ground truth battery trajectories, exhibits smooth convergence behavior, and maintains stability in long-term forecasts. These findings affirm the viability of UDE-based SciML approaches for battery modeling in decentralized energy networks and suggest broader implications for real-time control and optimization in renewable-integrated smart grids.
Article 888
Title@2025-06-09 (1): Why Gradients Rapidly Increase Near the End of Training
Title: Why Gradients Rapidly Increase Near the End of Training | Warum Gradienten am Ende der Ausbildung rasch zunehmen | 为何在培训接近结束时梯度迅速增加 2506.02285v2 |
Authors (1): Aaron Defazio
During long-duration Large Language Model (LLM) training runs, the gradient norm increases rapidly near the end of training. In this short note, we show that this increase is due to an unintended interaction between weight decay, normalization layers, and the learning rate schedule. We propose a simple correction that fixes this behavior while also resulting in lower loss values throughout training.
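One way to see how weight decay, normalization, and the learning rate can interact is a standard steady-state argument, sketched here under simplifying assumptions that the abstract does not state: plain SGD with learning rate $\gamma$ and coupled weight decay $\lambda$, and a weight vector $w$ that feeds a normalization layer, so the loss is scale-invariant in $w$ and the gradient $g_t$ is orthogonal to $w_t$. This is not necessarily the derivation used in the paper.

```latex
% Assumptions: plain SGD, coupled weight decay, scale-invariant weights
% so that \langle g_t, w_t \rangle = 0.
\begin{align*}
w_{t+1} &= (1 - \gamma\lambda)\, w_t - \gamma\, g_t \\
\|w_{t+1}\|^2 &= (1 - \gamma\lambda)^2 \|w_t\|^2 + \gamma^2 \|g_t\|^2 \\
\text{steady state } \|w_{t+1}\| = \|w_t\|: \quad
\frac{\|g_t\|^2}{\|w_t\|^2} &= \frac{2\lambda}{\gamma} - \lambda^2
\approx \frac{2\lambda}{\gamma}
\end{align*}
```

Under these assumptions the ratio $\|g_t\| / \|w_t\|$ scales like $\sqrt{2\lambda/\gamma}$, so it grows as a schedule drives $\gamma$ toward zero with $\lambda$ held fixed.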
Article 889
Title@2025-06-09 (1): Sparse Interpretable Deep Learning with LIES Networks for Symbolic Regression
Title: Sparse Interpretable Deep Learning with LIES Networks for Symbolic Regression | Sparse Interpretable Deep Learning mit LIES-Netzwerken für symbolische Regression | 与 LIES 网络共享的粗化可解释深空学习,用于符号回归 2506.08267v1 |
Authors (4): Mansooreh Montazerin, Majd Al Aawar, Antonio Ortega, Ajitesh Srivastava
Symbolic regression (SR) aims to discover closed-form mathematical expressions that accurately describe data, offering interpretability and analytical insight beyond standard black-box models. Existing SR methods often rely on population-based search or autoregressive modeling, which struggle with scalability and symbolic consistency. We introduce LIES (Logarithm, Identity, Exponential, Sine), a fixed neural network architecture with interpretable primitive activations that are optimized to model symbolic expressions. We develop a framework to extract compact formulae from LIES networks by training with an appropriate oversampling strategy and a tailored loss function to promote sparsity and to prevent gradient instability. After training, it applies additional pruning strategies to further simplify the learned expressions into compact formulae. Our experiments on SR benchmarks show that the LIES framework consistently produces sparse and accurate symbolic formulae outperforming all baselines. We also demonstrate the importance of each design component through ablation studies.
Article 890
Title@2025-06-09 (1): Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints
Title: Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints | Verstärktes Lernen aus menschlichem Feedback mit hochkonfidenziellen Sicherheitseinschränkungen | 在高度信任安全限制下,从人类反馈中加强学习 2506.08266v1 |
Authors (6): Yaswanth Chittepu, Blossom Metevier, Will Schwarzer, Austin Hoag, Scott Niekum, Philip S. Thomas
Existing approaches to language model alignment often treat safety as a tradeoff against helpfulness, which can lead to unacceptable responses in sensitive domains. To ensure reliable performance in such settings, we propose High-Confidence Safe Reinforcement Learning from Human Feedback (HC-RLHF), a method that provides high-confidence safety guarantees while maximizing helpfulness. Similar to previous methods, HC-RLHF explicitly decouples human preferences into helpfulness and harmlessness (safety), which are learned by training a reward model and a cost model, respectively. It then employs a two-step process to find safe solutions. In the first step, it optimizes the reward function under an intentionally pessimistic version of the cost constraint. In the second step, the trained model undergoes a safety test to verify whether its performance stays within an upper-confidence bound of the actual cost constraint. We provide a theoretical analysis of HC-RLHF, including proof that it will not return an unsafe solution with a probability greater than a user-specified threshold. For our empirical analysis, we apply HC-RLHF to align three different language models (Qwen2-1.5B, Qwen2.5-3B, and LLaMa3.2-3B) with human preferences. Our results demonstrate that HC-RLHF produces safe models with high probability and can improve harmlessness and helpfulness compared to previous methods.
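The second step described above, a safety test against an upper confidence bound on the cost, can be sketched as follows. The Hoeffding bound, the assumption that costs lie in a known range, and all function names are choices made for illustration; the abstract does not say which concentration bound HC-RLHF uses.

```python
import math


def hoeffding_upper_bound(costs, delta, cost_min=0.0, cost_max=1.0):
    """(1 - delta) upper confidence bound on the expected cost.

    Assumes i.i.d. cost samples bounded in [cost_min, cost_max].
    """
    n = len(costs)
    mean = sum(costs) / n
    width = (cost_max - cost_min) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return mean + width


def safety_test(costs, cost_limit, delta, cost_min=0.0, cost_max=1.0):
    """Accept the candidate model only if the bound stays within the limit."""
    bound = hoeffding_upper_bound(costs, delta, cost_min, cost_max)
    return bound <= cost_limit


# Same empirical mean, different amounts of held-out safety data.
print(safety_test([0.1] * 1000, cost_limit=0.2, delta=0.05))
print(safety_test([0.1] * 10, cost_limit=0.2, delta=0.05))
```

With few held-out samples the bound is wide and the test refuses to certify the model, which is why optimizing against a pessimistic cost constraint in the first step helps the candidate pass.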
Article 891
Title@2025-06-09 (1): Learning-Based Multiuser Scheduling in MIMO-OFDM Systems with Hybrid Beamforming
Title: Learning-Based Multiuser Scheduling in MIMO-OFDM Systems with Hybrid Beamforming | Lernbasierte Multiuser-Scheichung in MIMO-OFDM-Systemen mit Hybrid-Beamforming | MOIMO-OFDM系统和混合波束系统中基于学习的多用户规划 2506.08263v1 |
Authors (4): Pouya Agheli, Tugce Kobal, François Durand, Matthew Andrews
We investigate the multiuser scheduling problem in multiple-input multiple-output (MIMO) systems using orthogonal frequency division multiplexing (OFDM) and hybrid beamforming in which a base station (BS) communicates with multiple users over millimeter wave (mmWave) channels in the downlink. Improved scheduling is critical for enhancing spectral efficiency and the long-term performance of the system from the perspective of proportional fairness (PF) metric in hybrid beamforming systems due to its limited multiplexing gain. Our objective is to maximize PF by properly designing the analog and digital precoders within the hybrid beamforming and selecting the users subject to the number of radio frequency (RF) chains. Leveraging the characteristics of mmWave channels, we apply a two-timescale protocol. On a long timescale, we assign an analog beam to each user. Scheduling the users and designing the digital precoder are done accordingly on a short timescale. To conduct scheduling, we propose combinatorial solutions, such as greedy and sorting algorithms, followed by a machine learning (ML) approach. Our numerical results highlight the trade-off between the performance and complexity of the proposed approaches. Consequently, we show that the choice of approach depends on the specific criteria within a given scenario.
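A greedy scheduler driven by the proportional fairness metric can be sketched as below. This is a simplified illustration, not the paper's algorithm: it ignores inter-user interference and precoder design, and the names `pf_schedule`, `update_throughput`, and the smoothing factor `beta` are hypothetical.

```python
def pf_schedule(rates, avg_throughput, num_rf_chains):
    """Pick up to `num_rf_chains` users with the largest PF metric.

    PF metric per user: achievable rate divided by average throughput.
    """
    metric = {u: rates[u] / max(avg_throughput[u], 1e-9) for u in rates}
    ranked = sorted(metric, key=metric.get, reverse=True)
    return ranked[:num_rf_chains]


def update_throughput(avg_throughput, rates, scheduled, beta=0.1):
    """Exponential moving average of served rate (zero if not scheduled)."""
    return {
        u: (1.0 - beta) * avg_throughput[u]
        + beta * (rates[u] if u in scheduled else 0.0)
        for u in avg_throughput
    }


rates = {"a": 10.0, "b": 4.0, "c": 6.0}
avg = {"a": 10.0, "b": 1.0, "c": 3.0}
chosen = pf_schedule(rates, avg, num_rf_chains=2)
print(chosen, update_throughput(avg, rates, chosen))
```

User "a" has the highest rate but is skipped here because it has already been served well, which is the fairness effect the PF metric is meant to produce.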
Article 892
Title@2025-06-09 (1): SHIELD: Secure Hypernetworks for Incremental Expansion Learning Defense
Title: SHIELD: Secure Hypernetworks for Incremental Expansion Learning Defense | SHIELD: Sichere Hypernetzwerke für inkrementale Expansions-Learning Defense | SHIELD: 安全超网络,以逐步扩大学习防御 2506.08255v1 |
Authors (5): Patryk Krukowski, Łukasz Gorczyca, Piotr Helm, Kamil Książek, Przemysław Spurek
Traditional deep neural networks suffer from several limitations, including catastrophic forgetting. When models are adapted to new datasets, they tend to quickly forget previously learned knowledge. Another significant issue is the lack of robustness to even small perturbations in the input data. In practice, we can often easily perform adversarial attacks and change the network’s predictions by adding minimal noise to the input. Dedicated architectures and training procedures can solve each of the above problems separately. Unfortunately, currently, no model can simultaneously address both catastrophic forgetting and vulnerability to adversarial attacks. We introduce SHIELD (Secure Hypernetworks for Incremental Expansion and Learning Defense), a novel approach that integrates a hypernetwork-based continual learning approach with interval arithmetic. SHIELD uses the hypernetwork to transfer trainable task embedding vectors into the weights of a target model dedicated to specific data. This paradigm allows for the dynamic generation of separate networks for each subtask, while the hypernetwork aggregates and analyzes information across all tasks. The target model takes as input a data sample with a defined interval range and, by creating a hypercube, produces a prediction for the given range. Therefore, such target models provide strict guarantees against all possible attacks for data samples within the interval range. Our approach enhances security without sacrificing network adaptability, addressing the overlooked challenge of safety in continual learning.
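The interval arithmetic behind such guarantees can be illustrated by propagating an input hypercube through one affine layer and a ReLU. This is a generic interval bound propagation sketch with hypothetical weights, not SHIELD's hypernetwork-generated target model.

```python
def affine_bounds(lower, upper, weights, bias):
    """Propagate element-wise bounds [lower, upper] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                # A negative weight swaps which input end gives the extreme.
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


# Hypercube of radius 0.5 around the input x = [1.0, 2.0].
x, eps = [1.0, 2.0], 0.5
lower = [v - eps for v in x]
upper = [v + eps for v in x]
W = [[1.0, -1.0], [2.0, 0.5]]   # hypothetical layer weights
b = [0.0, -1.0]
print(relu_bounds(*affine_bounds(lower, upper, W, b)))
```

Every input inside the hypercube is guaranteed to produce an output inside the returned bounds, so if the bounds of the predicted class dominate those of all other classes, no perturbation within the range can change the prediction.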
Article 893
Title@2025-06-09 (1): Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction
Title: Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction | Selbstüberwachte Transformer als iterative Lösungsverbesserungen für Einschränkungen Zufriedenheit | 自我主宰的变形器作为抑制满意度的循环解决方案改进器 2502.15794v2 |
Authors (4): Yudong W. Xu, Wenhao Li, Scott Sanner, Elias B. Khalil
We present a Transformer-based framework for Constraint Satisfaction Problems (CSPs). CSPs find use in many applications and thus accelerating their solution with machine learning is of wide interest. Most existing approaches rely on supervised learning from feasible solutions or reinforcement learning, paradigms that require either feasible solutions to these NP-Complete CSPs or large training budgets and a complex expert-designed reward signal. To address these challenges, we propose ConsFormer, a self-supervised framework that leverages a Transformer as a solution refiner. ConsFormer constructs a solution to a CSP iteratively in a process that mimics local search. Instead of using feasible solutions as labeled data, we devise differentiable approximations to the discrete constraints of a CSP to guide model training. Our model is trained to improve random assignments for a single step but is deployed iteratively at test time, circumventing the bottlenecks of supervised and reinforcement learning. Experiments on Sudoku, Graph Coloring, Nurse Rostering, and MAXCUT demonstrate that our method can tackle out-of-distribution CSPs simply through additional iterations.
Article 894
Title@2025-06-09 (1): TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation
Title: TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation | TRAVEL: Training-freies Retrieval und Ausrichtung für Vision-und-Sprachen-Navigation | TRAVEL:无培训检索和协调愿景和语言导航 2502.07306v2 |
Authors (2): Navid Rajabi, Jana Kosecka
In this work, we propose a modular approach for the Vision-Language Navigation (VLN) task by decomposing the problem into four sub-modules that use state-of-the-art Large Language Models (LLMs) and Vision-Language Models (VLMs) in a zero-shot setting. Given a navigation instruction in natural language, we first prompt an LLM to extract the landmarks and the order in which they are visited. Assuming a known model of the environment, we retrieve the top-k locations of the last landmark and generate $k$ path hypotheses from the starting location to the last landmark using the shortest path algorithm on the topological map of the environment. Each path hypothesis is represented by a sequence of panoramas. We then use dynamic programming to compute the alignment score between the sequence of panoramas and the sequence of landmark names, with match scores obtained from a VLM. Finally, we compute the nDTW metric for the hypothesis that yields the highest alignment score to evaluate the path fidelity. We demonstrate superior performance compared to other approaches that use joint semantic maps like VLMaps on the complex R2R-Habitat instruction dataset and quantify in detail the effect of visual grounding on navigation performance.
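The dynamic-programming alignment step above can be sketched as an order-preserving matching between panoramas and landmarks. The recurrence below is an assumption chosen for illustration (each landmark is matched to one panorama, in order); the abstract does not specify the exact recurrence, and `alignment_score` and the score matrices are hypothetical.

```python
def alignment_score(match):
    """Best order-preserving alignment of landmarks to panoramas.

    match[t][l] is a VLM-style score for panorama t showing landmark l.
    Every landmark is assigned to one panorama, and assignments must keep
    the landmark order along the path.
    """
    num_panos, num_landmarks = len(match), len(match[0])
    neg_inf = float("-inf")
    # best[l][t]: best score matching the first l landmarks within the first t panoramas.
    best = [[0.0] * (num_panos + 1)]
    best += [[neg_inf] * (num_panos + 1) for _ in range(num_landmarks)]
    for l in range(1, num_landmarks + 1):
        for t in range(1, num_panos + 1):
            skip_pano = best[l][t - 1]
            use_pano = best[l - 1][t - 1] + match[t - 1][l - 1]
            best[l][t] = max(skip_pano, use_pano)
    return best[num_landmarks][num_panos]


# Two toy path hypotheses, three panoramas each, two landmarks (made-up scores).
path_a = [[0.9, 0.1], [0.2, 0.3], [0.1, 0.8]]  # landmarks appear in the instructed order
path_b = [[0.1, 0.8], [0.2, 0.3], [0.9, 0.1]]  # landmarks appear in reverse order
best_path = max([path_a, path_b], key=alignment_score)
```

Because the matching must respect the landmark order, a path that passes the right landmarks in the wrong order scores lower than one that follows the instruction.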
Article 895
Title@2025-06-09 (1): Approximation-Aware Bayesian Optimization
Title: Approximation-Aware Bayesian Optimization | Angemessene Bayesian-Optimierung | 近似- Aware Bayesian 优化优化 2406.04308v2 |
Authors (6): Natalie Maus, Kyurae Kim, Geoff Pleiss, David Eriksson, John P. Cunningham, Jacob R. Gardner
High-dimensional Bayesian optimization (BO) tasks such as molecular design often require 10,000 function evaluations before obtaining meaningful results. While methods like sparse variational Gaussian processes (SVGPs) reduce computational requirements in these settings, the underlying approximations result in suboptimal data acquisitions that slow the progress of optimization. In this paper we modify SVGPs to better align with the goals of BO: targeting informed data acquisition rather than global posterior fidelity. Using the framework of utility-calibrated variational inference, we unify GP approximation and data acquisition into a joint optimization problem, thereby ensuring optimal decisions under a limited computational budget. Our approach can be used with any decision-theoretic acquisition function and is compatible with trust region methods like TuRBO. We derive efficient joint objectives for the expected improvement and knowledge gradient acquisition functions in both the standard and batch BO settings. Our approach outperforms standard SVGPs on high-dimensional benchmark tasks in control and molecular design.
Article 896
Title@2025-06-09 (1): General Loss Functions Lead to (Approximate) Interpolation in High Dimensions
Title: General Loss Functions Lead to (Approximate) Interpolation in High Dimensions | Allgemeine Verlustfunktionen führen zu (nahezu) Interpolation in hohen Dimensionen | 导致(近近)高层面内插(近) 2303.07475v2 |
Authors (2): Kuo-Wei Lai, Vidya Muthukumar
We provide a unified framework that applies to a general family of convex losses across binary and multiclass settings in the overparameterized regime to approximately characterize the implicit bias of gradient descent in closed form. Specifically, we show that the implicit bias is approximated by (but not exactly equal to) the minimum-norm interpolation in high dimensions, which arises from training on the squared loss. In contrast to prior work, which was tailored to exponentially-tailed losses and used the intermediate support-vector-machine formulation, our framework directly builds on the primal-dual analysis of Ji and Telgarsky (2021), allowing us to provide new approximate equivalences for general convex losses through a novel sensitivity analysis. Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings. Finally, we provide evidence for the tightness of our techniques and use our results to demonstrate the effect of certain loss functions designed for out-of-distribution problems on the closed-form solution.
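For reference, the minimum-norm interpolation mentioned above has a standard closed form in the overparameterized regime. The notation is an assumption for illustration and is not taken from the paper: $X$ is the $n \times d$ data matrix with $d \ge n$ and full row rank, and $y$ is the vector of targets.

```latex
\hat{w}_{\mathrm{MNI}}
  = \arg\min_{w \in \mathbb{R}^{d}} \|w\|_2
    \quad \text{subject to} \quad Xw = y,
\qquad
\hat{w}_{\mathrm{MNI}} = X^{\top} \left( X X^{\top} \right)^{-1} y .
```

Gradient descent on the squared loss, started from zero with a suitably small step size, converges to this solution, which is why it serves as the reference point for other losses.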
Article 897
Title@2025-06-09 (1): Parameter-free approximate equivariance for tasks with finite group symmetry
Title: Parameter-free approximate equivariance for tasks with finite group symmetry | Parameterfreie ungefähre Äquivarianz für Aufgaben mit endlicher Gruppensymmetrie | 有限组对称任务无参数 2506.08244v1 |
Authors (3): Riccardo Ali, Pietro Liò, Jamie Vicary
Equivariant neural networks incorporate symmetries through group actions, embedding them as an inductive bias to improve performance on a wide variety of tasks. However, existing equivariant methods can be computationally intensive, with high parameter counts, and are often tied to a specific architecture. We propose a simple zero-parameter approach that imposes approximate equivariance for a finite group in the latent representation, as an additional term in the loss function. We conduct experiments which allow the network to learn a group representation on the latent space, and show that in every case it prefers to learn the regular representation. Fixing this action on the latent space yields a simple method to impose approximate equivariance as an additional loss penalty. We benchmark our approach on three datasets and compare it against several existing equivariant methods, showing that in many cases it achieves similar or better performance for a fraction of the parameters.
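A loss penalty of the kind described above can be sketched for the cyclic group C_n, whose regular representation acts on an n-dimensional latent vector by cyclic shifts. This is a toy illustration under assumptions: the group, the input action (also a cyclic shift), the linear encoders, and the names `shift`, `linear`, and `equivariance_penalty` are all hypothetical and not taken from the paper.

```python
def shift(v, k):
    """Action of the k-th element of C_n on a length-n vector: cyclic shift by k."""
    k %= len(v)
    return list(v[-k:]) + list(v[:-k]) if k else list(v)


def linear(weights, x):
    """Plain matrix-vector product, used here as a toy encoder."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def equivariance_penalty(encoder, x):
    """Mean over g in C_n of ||encoder(g . x) - rho(g) encoder(x)||^2.

    rho is the regular representation, realized as a cyclic shift of the
    latent vector, so the latent dimension must equal n = len(x).
    """
    n = len(x)
    z = encoder(x)
    total = 0.0
    for k in range(n):
        z_of_shifted = encoder(shift(x, k))
        shifted_z = shift(z, k)
        total += sum((a - b) ** 2 for a, b in zip(z_of_shifted, shifted_z))
    return total / n


# A circulant matrix commutes with cyclic shifts, so its penalty is (numerically) zero.
c = [1.0, 2.0, 0.5]
circulant = [[c[(i - j) % 3] for j in range(3)] for i in range(3)]
diagonal = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
x = [1.0, 0.0, -2.0]
penalty_equivariant = equivariance_penalty(lambda v: linear(circulant, v), x)
penalty_generic = equivariance_penalty(lambda v: linear(diagonal, v), x)
```

The penalty adds no trainable parameters: it only compares encoder outputs, so it can be added to the task loss of any architecture whose latent dimension matches the chosen representation.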
Article 898
Title@2025-06-09 (1): Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic
Title: Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic | Temporalisierendes Vertrauen: Bewertung von Ketten-of-Thought-Reasoning mit Signal Temporal Logic | 时间化信任度:对根据信号时空逻辑进行的研究链原因的评价 2506.08243v1 |
Authors (4): Zhenjiang Mao, Artem Bisliouk, Rohith Reddy Nama, Ivan Ruchkin
Large Language Models (LLMs) have shown impressive performance in mathematical reasoning tasks when guided by Chain-of-Thought (CoT) prompting. However, they tend to produce highly confident yet incorrect outputs, which poses significant risks in domains like education, where users may lack the expertise to assess reasoning steps. To address this, we propose a structured framework that models stepwise confidence as a temporal signal and evaluates it using Signal Temporal Logic (STL). In particular, we define formal STL-based constraints to capture desirable temporal properties and compute robustness scores that serve as structured, interpretable confidence estimates. Our approach also introduces a set of uncertainty reshaping strategies to enforce smoothness, monotonicity, and causal consistency across the reasoning trajectory. Experiments show that our approach consistently improves calibration metrics and provides more reliable uncertainty estimates than conventional confidence aggregation and post-hoc calibration.
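To make the idea of STL robustness scores over a confidence signal concrete, here is a minimal sketch using the standard quantitative semantics of Signal Temporal Logic: "always" takes the minimum margin over time and "eventually" takes the maximum. The specific constraints, thresholds, and function names are illustrative assumptions, not the paper's formulas.

```python
def always_at_least(signal, threshold):
    """Robustness of G (x_t >= threshold): the worst-case margin over time."""
    return min(x - threshold for x in signal)


def eventually_at_least(signal, threshold):
    """Robustness of F (x_t >= threshold): the best-case margin over time."""
    return max(x - threshold for x in signal)


def always_bounded_drop(signal, max_drop):
    """Robustness of G (x_{t+1} - x_t >= -max_drop): a smoothness-style constraint."""
    return min((b - a) + max_drop for a, b in zip(signal, signal[1:]))


# Stepwise confidences for one reasoning chain (made-up values).
confidence = [0.9, 0.8, 0.85, 0.4]
r_always = always_at_least(confidence, 0.5)        # negative: the last step violates it
r_eventually = eventually_at_least(confidence, 0.5)
r_smooth = always_bounded_drop(confidence, 0.2)    # negative: the final drop is too steep
```

A positive score means the property holds with that much margin, and a negative score means it is violated by that much, which is what makes the score usable as an interpretable confidence estimate.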
Article 899
Title@2025-06-09 (1): Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations
Title: Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations | Umgang mit den bösen Zwillingen: Zufällige Augmentation verbessern, indem Katastrophenvergessen von verschiedenen Augmentationen adressiert wird | 对付邪恶双胞胎:通过解决灾难性地忘记多种增加现象,改善随机增加现象 2506.08240v1 |
Authors (2): Dongkyu Cho, Rumi Chunara
Data augmentation is a promising tool for enhancing out-of-distribution generalization, where the key is to produce diverse, challenging variations of the source domain via costly targeted augmentations that maximize its generalization effect. Conversely, random augmentation is inexpensive but is deemed suboptimal due to its limited effect. In this paper, we revisit random augmentation and explore methods to address its shortcomings. We show that the stochastic nature of random augmentation can produce a set of colliding augmentations that distorts the learned features, similar to catastrophic forgetting. We propose a simple solution that improves the generalization effect of random augmentation by addressing forgetting, which displays strong generalization performance across various single source domain generalization (sDG) benchmarks.
Article 900
Title@2025-06-09 (1): MF-LAL: Drug Compound Generation Using Multi-Fidelity Latent Space Active Learning
Title: MF-LAL: Drug Compound Generation Using Multi-Fidelity Latent Space Active Learning | MF-LAL: Arzneimittel-Compound-Generierung mit Multi-Fidelity Latent Space Active Learning | MF-LAL:利用多功能低空空间主动学习生成药物化合物 2410.11226v3 |
Authors (5): Peter Eckmann, Dongxia Wu, Germano Heinzelmann, Michael K. Gilson, Rose Yu
Current generative models for drug discovery primarily use molecular docking as an oracle to guide the generation of active compounds. However, such models are often not useful in practice because even compounds with high docking scores do not consistently show real-world experimental activity. More accurate methods for activity prediction exist, such as molecular dynamics based binding free energy calculations, but they are too computationally expensive to use in a generative model. To address this challenge, we propose Multi-Fidelity Latent space Active Learning (MF-LAL), a generative modeling framework that integrates a set of oracles with varying cost-accuracy tradeoffs. Using active learning, we train a surrogate model for each oracle and use these surrogates to guide generation of compounds with high predicted activity. Unlike previous approaches that separately learn the surrogate model and generative model, MF-LAL combines the generative and multi-fidelity surrogate models into a single framework, allowing for more accurate activity prediction and higher quality samples. Our experiments on two disease-relevant proteins show that MF-LAL produces compounds with significantly better binding free energy scores than other single and multi-fidelity approaches (~50% improvement in mean binding free energy score). The code is available at https://github.com/Rose-STL-Lab/MF-LAL.
Article 901
Title@2025-06-09 (1): Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework
Title: Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework | Gewährleistung der Zuverlässigkeit von kuratierten EHR-Daten: Die Validierung der Genauigkeit für LLM/ML-extrahierte Informationen und Daten (VALID) Framework | 确保封闭式电子HR-驱动数据可靠性:LLM/ML-提取信息和数据(VALID)框架的准确性验证 2506.08231v1 |
Authors (11): Melissa Estevez, Nisha Singh, Lauren Dyson, Blythe Adamson, Qianyu Yuan, Megan W. Hildner, Erin Fidyk, Olive Mbah, Farhad Khan, Kathi Seidl-Rathkopf, Aaron B. Cohen
Large language models (LLMs) are increasingly used to extract clinical data from electronic health records (EHRs), offering significant improvements in scalability and efficiency for real-world data (RWD) curation in oncology. However, the adoption of LLMs introduces new challenges in ensuring the reliability, accuracy, and fairness of extracted data, which are essential for research, regulatory, and clinical applications. Existing quality assurance frameworks for RWD and artificial intelligence do not fully address the unique error modes and complexities associated with LLM-extracted data. In this paper, we propose a comprehensive framework for evaluating the quality of clinical data extracted by LLMs. The framework integrates variable-level performance benchmarking against expert human abstraction, automated verification checks for internal consistency and plausibility, and replication analyses comparing LLM-extracted data to human-abstracted datasets or external standards. This multidimensional approach enables the identification of variables most in need of improvement, systematic detection of latent errors, and confirmation of dataset fitness-for-purpose in real-world research. Additionally, the framework supports bias assessment by stratifying metrics across demographic subgroups. By providing a rigorous and transparent method for assessing LLM-extracted RWD, this framework advances industry standards and supports the trustworthy use of AI-powered evidence generation in oncology research and practice.
Article 902
Title@2025-06-09 (1): Scaling Laws of Motion Forecasting and Planning – A Technical Report
Title: Scaling Laws of Motion Forecasting and Planning – A Technical Report | Skalierungsgesetze der Bewegungsvorhersage und Planung – Ein technischer Bericht | 扩大运动预测和规划法 – – 技术报告 2506.08228v1 |
Authors (16): Mustafa Baniodeh, Kratarth Goel, Scott Ettinger, Carlos Fuertes, Ari Seff, Tim Shen, Cole Gulino, Chenjie Yang, Ghassen Jerfel, Dokook Choe, Rui Wang, Vinutha Kallem, Sergio Casas, Rami Al-Rfou, Benjamin Sapp, Dragomir Anguelov
We study the empirical scaling laws of a family of encoder-decoder autoregressive transformer models on the task of joint motion forecasting and planning in the autonomous driving domain. Using a driving dataset of 500 thousand hours, we demonstrate that, similar to language modeling, model performance improves as a power-law function of the total compute budget, and we observe a strong correlation between model training loss and model evaluation metrics. Most interestingly, closed-loop metrics also improve with scaling, which has important implications for the suitability of open-loop metrics for model development and hill climbing. We also study the optimal scaling of the number of transformer parameters and the training data size for a training compute-optimal model. We find that as the training compute budget grows, optimal scaling requires increasing the model size 1.5x as fast as the dataset size. We also study inference-time compute scaling, where we observe that sampling and clustering the output of smaller models makes them competitive with larger models, up to a crossover point beyond which a larger model becomes more inference-compute efficient. Overall, our experimental results demonstrate that optimizing the training and inference-time scaling properties of motion forecasting and planning models is a key lever for improving their performance to address a wide variety of driving scenarios. Finally, we briefly study the utility of training on general logged driving data of other agents to improve the performance of the ego-agent, an important research area to address the scarcity of robotics data for training large-capacity models.
Article 903
Title@2025-06-09 (1): Mondrian: Transformer Operators via Domain Decomposition
Title: Mondrian: Transformer Operators via Domain Decomposition | Mondrian: Transformer Operators via Domain Decomposition | Mondrian:通过域分解的变形操作器 2506.08226v1 |
Authors (3): Arthur Feeney, Kuei-Hsiang Huang, Aparna Chandramowlishwaran
Operator learning enables data-driven modeling of partial differential equations (PDEs) by learning mappings between function spaces. However, scaling transformer-based operator models to high-resolution, multiscale domains remains a challenge due to the quadratic cost of attention and its coupling to discretization. We introduce Mondrian, transformer operators that decompose a domain into non-overlapping subdomains and apply attention over sequences of subdomain-restricted functions. Leveraging principles from domain decomposition, Mondrian decouples attention from discretization. Within each subdomain, it replaces standard layers with expressive neural operators, and attention across subdomains is computed via softmax-based inner products over functions. The formulation naturally extends to hierarchical windowed and neighborhood attention, supporting both local and global interactions. Mondrian achieves strong performance on Allen-Cahn and Navier-Stokes PDEs, demonstrating resolution scaling without retraining. These results highlight the promise of domain-decomposed attention for scalable and general-purpose neural operators.
Article 904
Title@2025-06-09 (1): Harnessing the Continuous Structure: Utilizing the First-order Approach in Online Contract Design
Title: Harnessing the Continuous Structure: Utilizing the First-order Approach in Online Contract Design | Kontinuierliche Struktur: Nutzung des First-Order-Ansatzes im Online-Contract Design | 利用连续结构:在网上合同设计中使用一等办法 2403.07143v3 |
Authors (1): Shiliang Zuo
This work studies the online contract design problem. The principal’s goal is to learn the optimal contract that maximizes her utility through repeated interactions, without prior knowledge of the agent’s type (i.e., the agent’s cost and production functions). We leverage the structure provided by continuous action spaces, which allows the application of first-order conditions (FOC) to characterize the agent’s behavior. In some cases, we utilize conditions from the first-order approach (FOA) in economics, but in certain settings, we are able to apply FOC without additional assumptions, leading to simpler and more principled algorithms. We illustrate this approach in three problem settings. Firstly, we study the problem of learning the optimal contract when there can be many outcomes. In contrast to prior works that design highly specialized algorithms, we show that the problem can be directly reduced to Lipschitz bandits. Secondly, we study the problem of learning linear contracts. While the contracting problem involves hidden action (moral hazard) and the pricing problem involves hidden value (adverse selection), the two problems share a similar optimization structure, which enables direct reduction between the problem of learning linear contracts and dynamic pricing. Thirdly, we study the problem of learning contracts with many outcomes when agents are identical and provide an algorithm with polynomial sample complexity.
Article 905
Title@2025-06-09 (1): What makes an Ensemble (Un) Interpretable?
Title: What makes an Ensemble (Un) Interpretable? | Was macht ein Ensemble (Un) Interpretierbar? | 集合物怎么可以解释呢? 2506.08216v1 |
Authors (4): Shahaf Bassan, Guy Amir, Meirav Zehavi, Guy Katz
Ensemble models are widely recognized in the ML community for their limited interpretability. For instance, while a single decision tree is considered interpretable, ensembles of trees (e.g., boosted trees) are often treated as black-boxes. Despite this folklore recognition, there remains a lack of rigorous mathematical understanding of what particularly makes an ensemble (un)-interpretable, including how fundamental factors like the (1) number, (2) size, and (3) type of base models influence its interpretability. In this work, we seek to bridge this gap by applying concepts from computational complexity theory to study the challenges of generating explanations for various ensemble configurations. Our analysis uncovers nuanced complexity patterns influenced by various factors. For example, we demonstrate that under standard complexity assumptions like P$\neq$NP, interpreting ensembles remains intractable even when base models are of constant size. Surprisingly, the complexity changes drastically with the number of base models: small ensembles of decision trees are efficiently interpretable, whereas interpreting ensembles with even a constant number of linear models remains intractable. We believe that our findings provide a more robust foundation for understanding the interpretability of ensembles, emphasizing the benefits of examining it through a computational complexity lens.
Article 906
Title@2025-06-09 (1): Scalable Meta-Learning via Mixed-Mode Differentiation
Title: Scalable Meta-Learning via Mixed-Mode Differentiation | Skalierbares Meta-Learning über Mixed-Mode Differenzierung | 通过混合模式差异的可缩放元学习 2505.00793v2 |
Authors (5): Iurii Kemaev, Dan A Calian, Luisa M Zintgraf, Gregory Farquhar, Hado van Hasselt
Gradient-based bilevel optimisation is a powerful technique with applications in hyperparameter optimisation, task adaptation, algorithm discovery, meta-learning more broadly, and beyond. It often requires differentiating through the gradient-based optimisation itself, leading to “gradient-of-a-gradient” calculations with computationally expensive second-order and mixed derivatives. While modern automatic differentiation libraries provide a convenient way to write programs for calculating these derivatives, they oftentimes cannot fully exploit the specific structure of these problems out-of-the-box, leading to suboptimal performance. In this paper, we analyse such cases and propose Mixed-Flow Meta-Gradients, or MixFlow-MG – a practical algorithm that uses mixed-mode differentiation to construct more efficient and scalable computational graphs yielding over 10x memory and up to 25% wall-clock time improvements over standard implementations in modern meta-learning setups.
Article 907
Title@2025-06-09 (1): Distinguishing Cause from Effect with Causal Velocity Models
Title: Distinguishing Cause from Effect with Causal Velocity Models | Bezeichnende Ursache von Wirkung mit kausalen Velocity-Modellen | 区别因与因果速率模型的影响的原因 2502.05122v2 |
Authors (4): Johnny Xi, Hugh Dance, Peter Orbanz, Benjamin Bloem-Reddy
Bivariate structural causal models (SCMs) are often used to infer causal direction by examining their goodness-of-fit under restricted model classes. In this paper, we describe a parametrization of bivariate SCMs in terms of a causal velocity by viewing the cause variable as time in a dynamical system. The velocity implicitly defines counterfactual curves via the solution of initial value problems where the observation specifies the initial condition. Using tools from measure transport, we obtain a unique correspondence between SCMs and the score function of the generated distribution via its causal velocity. Based on this, we derive an objective function that directly regresses the velocity against the score function, the latter of which can be estimated non-parametrically from observational data. We use this to develop a method for bivariate causal discovery that extends beyond known model classes such as additive or location-scale noise, and that requires no assumptions on the noise distributions. When the score is estimated well, the objective is also useful for detecting model non-identifiability and misspecification. We present positive results in simulation and benchmark experiments where many existing methods fail, and perform ablation studies to examine the method’s sensitivity to accurate score estimation.
Article 908
Title@2025-06-09 (1): A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation
Title: A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation | Eine umfassende Studie von Decoder-Only LLMs für die Text-to-Image-Generierung | 综合研究 “ 文字到图像生成的开发者 – – 唯一 “ 有限LLMs “ 综合研究 2506.08210v1 |
Authors (5): Andrew Z. Wang, Songwei Ge, Tero Karras, Ming-Yu Liu, Yogesh Balaji
Both text-to-image generation and large language models (LLMs) have made significant advancements. However, many text-to-image models still employ the somewhat outdated T5 and CLIP as their text encoders. In this work, we investigate the effectiveness of using modern decoder-only LLMs as text encoders for text-to-image diffusion models. We build a standardized training and evaluation pipeline that allows us to isolate and evaluate the effect of different text embeddings. We train a total of 27 text-to-image models with 12 different text encoders to analyze the critical aspects of LLMs that could impact text-to-image generation, including the approaches to extract embeddings, different LLM variants, and model sizes. Our experiments reveal that the de facto way of using last-layer embeddings as conditioning leads to inferior performance. Instead, we explore embeddings from various layers and find that using layer-normalized averaging across all layers significantly improves alignment with complex prompts. Most LLMs with this conditioning outperform the baseline T5 model, showing enhanced performance in advanced visio-linguistic reasoning skills.
Article 909
Title@2025-06-09 (1): A Machine Learning Approach to Generate Residual Stress Distributions using Sparse Characterization Data in Friction-Stir Processed Parts
Title: A Machine Learning Approach to Generate Residual Stress Distributions using Sparse Characterization Data in Friction-Stir Processed Parts | Ein maschineller Lernansatz zur Generierung residualer Stressverteilungen mit Sparse Charakterisierungsdaten in Friction-Stir Prozessteilen | 利用摩擦-标准处理部件中的扭曲特征数据生成残余应激反应分布的机械学习方法 2506.08205v1 |
Authors (3): Shadab Anwar Shaikh, Kranthi Balusu, Ayoub Soulami
Residual stresses, which remain within a component after processing, can deteriorate performance. Accurately determining their full-field distributions is essential for optimizing structural integrity and longevity. However, the experimental effort required for full-field characterization is impractical. Given these challenges, this work proposes a machine learning (ML) based Residual Stress Generator (RSG) to infer full-field stresses from limited measurements. An extensive dataset was initially constructed by performing numerous process simulations with a diverse parameter set. An ML model based on the U-Net architecture was then trained to learn the underlying structure through systematic hyperparameter tuning. Then, the model’s ability to generate simulated stresses was evaluated, and it was ultimately tested on actual characterization data to validate its effectiveness. The model’s prediction of simulated stresses shows that it achieved excellent predictive accuracy and exhibited a significant degree of generalization, indicating that it successfully learnt the latent structure of residual stress distribution. The RSG’s performance in predicting experimentally characterized data highlights the feasibility of the proposed approach in providing a comprehensive understanding of residual stress distributions from limited measurements, thereby significantly reducing experimental efforts.
Article 910
Title@2025-06-09 (1): Correlated Noise Mechanisms for Differentially Private Learning
Title: Correlated Noise Mechanisms for Differentially Private Learning | Correlated Noise Mechanisms for Differentially Private Learning | 差别私人学习相关噪音机制 2506.08201v1 |
Authors (12): Krishna Pillutla, Jalaj Upadhyay, Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, Arun Ganesh, Monika Henzinger, Jonathan Katz, Ryan McKenna, H. Brendan McMahan, Keith Rush, Thomas Steinke, Abhradeep Thakurta
This monograph explores the design and analysis of correlated noise mechanisms for differential privacy (DP), focusing on their application to private training of AI and machine learning models via the core primitive of estimation of weighted prefix sums. While typical DP mechanisms inject independent noise into each step of a stochastic gradient descent (SGD) learning algorithm in order to protect the privacy of the training data, a growing body of recent research demonstrates that introducing (anti-)correlations in the noise can significantly improve privacy-utility trade-offs by carefully canceling out some of the noise added on earlier steps in subsequent steps. Such correlated noise mechanisms, known variously as matrix mechanisms, factorization mechanisms, and DP-Follow-the-Regularized-Leader (DP-FTRL) when applied to learning algorithms, have also been influential in practice, with industrial deployment at a global scale.
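The benefit of correlated noise for prefix sums can be illustrated with a small calculation. Write the prefix-sum workload as a lower-triangular all-ones matrix A, factor it as A = B C, add Gaussian noise z to C x, and release B (C x + z) = A x + B z. The noise scale is proportional to the largest column norm of C (assuming each example contributes to a single step), so the worst-case error variance is that squared norm times the largest squared row norm of B. The sketch below compares independent noise (C = I) with the square-root factorization B = C = A^(1/2), one commonly studied choice used here only as an illustration; the function names are hypothetical.

```python
def sqrt_coeffs(n):
    """First n Taylor coefficients of (1 - x)^(-1/2).

    These are the Toeplitz entries of A^(1/2): c_0 = 1 and
    c_k = c_{k-1} * (2k - 1) / (2k).
    """
    coeffs = [1.0]
    for k in range(1, n):
        coeffs.append(coeffs[-1] * (2 * k - 1) / (2 * k))
    return coeffs


def max_error_variance_independent(n):
    """C = I, B = A: unit sensitivity, and the last prefix sum accumulates n noise terms."""
    return float(n)


def max_error_variance_sqrt(n):
    """B = C = A^(1/2): squared sensitivity times the largest squared row norm of B."""
    squared_norm = sum(c * c for c in sqrt_coeffs(n))
    return squared_norm * squared_norm


# Worst-case variance per unit of base noise, for 1024 steps.
independent = max_error_variance_independent(1024)
correlated = max_error_variance_sqrt(1024)
```

With independent noise the error on the last prefix sum grows linearly in the number of steps, while with the square-root factorization it grows only polylogarithmically, because later noise terms partly cancel earlier ones.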
Article 911
Title@2025-06-09 (1): Tight Lower Bounds and Improved Convergence in Performative Prediction
Title: Tight Lower Bounds and Improved Convergence in Performative Prediction | Enge niedrigere Grenzen und verbesserte Konvergenz in der Performativen Vorhersage | 性能预测的下下角拉紧和更好趋同 2412.03671v2 |
Authors (5): Pedram Khorsandi, Rushil Gupta, Mehrnaz Mofakhami, Simon Lacoste-Julien, Gauthier Gidel
Performative prediction is a framework accounting for the shift in the data distribution induced by the prediction of a model deployed in the real world. Ensuring rapid convergence to a stable solution where the data distribution remains the same after the model deployment is crucial, especially in evolving environments. This paper extends the Repeated Risk Minimization (RRM) framework by utilizing historical datasets from previous retraining snapshots, yielding a class of algorithms that we call Affine Risk Minimizers and enabling convergence to a performatively stable point for a broader class of problems. We introduce a new upper bound for methods that use only the final iteration of the dataset and prove for the first time the tightness of both this new bound and the previous existing bounds within the same regime. We also prove that utilizing historical datasets can surpass the lower bound for last iterate RRM, and empirically observe faster convergence to the stable point on various performative prediction benchmarks. We offer at the same time the first lower bound analysis for RRM within the class of Affine Risk Minimizers, quantifying the potential improvements in convergence speed that could be achieved with other variants in our framework.
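The baseline Repeated Risk Minimization loop that the paper extends can be sketched on a toy problem. The setup is an assumption chosen for illustration: deploying a parameter theta shifts the data mean to mu + eps * theta, and the loss is squared error, so each retraining step simply returns the mean of the distribution induced by the previous deployment. The Affine Risk Minimizers proposed in the paper, which also use historical datasets, are not reproduced here.

```python
def rrm(mu, eps, theta0=0.0, steps=50):
    """Repeated Risk Minimization on a toy performative problem.

    Data are drawn with mean mu + eps * theta, where theta is the deployed
    model. For squared loss, the risk minimizer on a distribution is its mean.
    """
    theta = theta0
    trajectory = [theta]
    for _ in range(steps):
        # Retrain on data induced by the currently deployed theta.
        theta = mu + eps * theta
        trajectory.append(theta)
    return trajectory


# Weak performative effect: converges to the stable point mu / (1 - eps) = 2.0.
stable_run = rrm(mu=1.0, eps=0.5)
# Strong performative effect: retraining chases its own shift and diverges.
unstable_run = rrm(mu=1.0, eps=1.5)
```

At the stable point, retraining on the data induced by the deployed model returns the same model, which is the notion of performative stability the abstract refers to.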
Article 912
Title@2025-06-09 (1): Interpreting Agent Behaviors in Reinforcement-Learning-Based Cyber-Battle Simulation Platforms
Title: Interpreting Agent Behaviors in Reinforcement-Learning-Based Cyber-Battle Simulation Platforms | Verdolmetschung von Agenten-Behaviors bei verstärkten, Learning-basierten Cyber-Battle-Simulationsplattformen | 强化-以学习为基础的网络战模拟平台中解释代理行为行为 2506.08192v1 |
Authors (5): Jared Claypoole, Steven Cheung, Ashish Gehani, Vinod Yegneswaran, Ahmad Ridley
We analyze two open source deep reinforcement learning agents submitted to the CAGE Challenge 2 cyber defense challenge, where each competitor submitted an agent to defend a simulated network against each of several provided rules-based attack agents. We demonstrate that one can gain interpretability of agent successes and failures by simplifying the complex state and action spaces and by tracking important events, shedding light on the fine-grained behavior of both the defense and attack agents in each experimental scenario. By analyzing important events within an evaluation episode, we identify patterns in infiltration and clearing events that tell us how well the attacker and defender played their respective roles; for example, defenders were generally able to clear infiltrations within one or two timesteps of a host being exploited. By examining transitions in the environment’s state caused by the various possible actions, we determine which actions tended to be effective and which did not, showing that certain important actions are between 40% and 99% ineffective. We examine how decoy services affect exploit success, concluding for instance that decoys block up to 94% of exploits that would directly grant privileged access to a host. Finally, we discuss the realism of the challenge and ways that the CAGE Challenge 4 has addressed some of our concerns.
Article 913
Title@2025-06-09 (1): Adapting Prediction Sets to Distribution Shifts Without Labels
Title: Adapting Prediction Sets to Distribution Shifts Without Labels | Anpassung von Vorhersagen an Verteilungsverschiebungen ohne Etiketten | 调整预测集以适应无标签的分布转换 2406.01416v2 |
Authors (4): Kevin Kasa, Zhiyu Zhang, Heng Yang, Graham W. Taylor
Recently there has been a surge of interest in deploying confidence set predictions rather than point predictions in machine learning. Unfortunately, the effectiveness of such prediction sets is frequently impaired by distribution shifts in practice, and the challenge is often compounded by the lack of ground truth labels at test time. Focusing on a standard set-valued prediction framework called conformal prediction (CP), this paper studies how to improve its practical performance using only unlabeled data from the shifted test domain. This is achieved by two new methods called ECP and EACP, whose main idea is to adjust the score function in CP according to its base model’s own uncertainty evaluation. Through extensive experiments on a number of large-scale datasets and neural network architectures, we show that our methods provide consistent improvement over existing baselines and nearly match the performance of fully supervised methods.
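For readers unfamiliar with the baseline that ECP and EACP modify, the following is a minimal sketch of standard split conformal prediction for classification. It does not implement the paper's score adjustment. The score `1 - p(label)`, the calibration values, and the function names are assumptions chosen for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Keep every label whose score 1 - p(label) does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical calibration scores: 1 - probability assigned to the true label.
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.15, 0.25, 0.35, 0.12]
q = conformal_threshold(cal_scores, alpha=0.2)
print(prediction_set([0.70, 0.25, 0.05], q))
```

The calibration step needs labels, which is exactly what is unavailable on the shifted test domain; the abstract's methods instead adapt the score using the base model's own uncertainty.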
Article 914
Title@2025-06-09 (1): Innate-Values-driven Reinforcement Learning based Cognitive Modeling
Title: Innate-Values-driven Reinforcement Learning based Cognitive Modeling | Angeborene Werte-getriebene Verstärkungs-Lernen auf Kognitive Modellierung | 基于认知模型的强化学习模型 2411.09160v2 |
Authors (1): Qin Yang
Innate values describe agents’ intrinsic motivations, which reflect their inherent interests and preferences for pursuing goals and drive them to develop diverse skills that satisfy their various needs. Traditional reinforcement learning (RL) is learning from interaction based on the feedback rewards of the environment. However, in real scenarios, the rewards are generated by agents’ innate value systems, which differ vastly across individuals based on their needs and requirements. In other words, considering the AI agent as a self-organizing system, developing its awareness through balancing internal and external utilities based on its needs in different tasks is a crucial problem for individuals learning to support others and integrate into the community with safety and harmony in the long term. To address this gap, we propose a new RL model termed innate-values-driven RL (IVRL), based on combined motivations’ models and expected utility theory, to mimic the agent’s complex behaviors as they evolve through decision-making and learning. Then, we introduce two IVRL-based models: IV-DQN and IV-A2C. By comparing them with benchmark algorithms such as DQN, DDQN, A2C, and PPO in the Role-Playing Game (RPG) reinforcement learning test platform VIZDoom, we demonstrate that the IVRL-based models can help the agent rationally organize its various needs and achieve better performance.
Article 915
Title@2025-06-09 (1): Innate-Values-driven Reinforcement Learning based Cooperative Multi-Agent Cognitive Modeling
Title: Innate-Values-driven Reinforcement Learning based Cooperative Multi-Agent Cognitive Modeling | Angeborene Werte-getriebene Verstärkung Learning basierte kooperative Multi-Agent Kognitive Modellierung | 以基于强化的学习为基础的合作多代理共认型建模 2401.05572v2 |
Authors (1): Qin Yang
In multi-agent systems (MAS), the dynamic interaction among multiple decision-makers is driven by their innate values, affecting the environment’s state, and can cause specific behavioral patterns to emerge. On the other hand, innate values in cognitive modeling reflect individual interests and preferences for specific tasks and drive them to develop diverse skills and plans, satisfying their various needs and achieving common goals in cooperation. Therefore, building the awareness of AI agents to balance the group utilities and system costs and meet group members’ needs in their cooperation is a crucial problem for individuals learning to support their community and even integrate into human society in the long term. However, the current MAS reinforcement learning domain lacks a general intrinsic model to describe agents’ dynamic motivation for decision-making and learning from an individual needs perspective in their cooperation. To address the gap, this paper proposes a general MAS innate-values reinforcement learning (IVRL) architecture from the individual preferences angle. We tested the Multi-Agent IVRL Actor-Critic Model in different StarCraft Multi-Agent Challenge (SMAC) settings, which demonstrated its potential to organize the group’s behaviours to achieve better performance.
Article 916
Title@2025-06-09 (1): FedGA-Tree: Federated Decision Tree using Genetic Algorithm
Title: FedGA-Tree: Federated Decision Tree using Genetic Algorithm | FedGA-Baum: Föderierter Entscheidungsbaum mit genetischem Algorithmus | FedGA-Tree:使用遗传算法的联邦决策树 2506.08176v1 |
Authors (2): Anh V Nguyen, Diego Klabjan
In recent years, with rising concerns for data privacy, Federated Learning has gained prominence, as it enables collaborative training without the aggregation of raw data from participating clients. However, much of the current focus has been on parametric gradient-based models, while nonparametric counterparts such as decision trees are relatively understudied. Existing methods for adapting decision trees to Federated Learning generally combine a greedy tree-building algorithm with differential privacy to produce a global model for all clients. These methods are limited to classification trees and categorical data due to the constraints of differential privacy. In this paper, we explore an alternative approach that uses a genetic algorithm to facilitate the construction of personalized decision trees and to accommodate both categorical and numerical data, thus allowing for both classification and regression trees. Comprehensive experiments demonstrate that our method surpasses decision trees trained solely on local data and a benchmark algorithm.
Article 917
Title@2025-06-09 (1): xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Title: xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs | xGen-MM-Vid (BLIP-3-Video): Sie benötigen nur 32 Zeichen, um ein Video auch in VLMs darzustellen | xGen-MM-Vid (BLIP-3-Video):您只需32个调子即可在 VLMs 中代表视频 2410.16267v2 |
Authors (12): Michael S. Ryoo, Honglu Zhou, Shrikant Kendre, Can Qin, Le Xue, Manli Shu, Jongwoo Park, Kanchana Ranasinghe, Silvio Savarese, Ran Xu, Caiming Xiong, Juan Carlos Niebles
We present xGen-MM-Vid (BLIP-3-Video): a multimodal language model for videos, particularly designed to efficiently capture temporal information over multiple frames. BLIP-3-Video takes advantage of a ‘temporal encoder’ in addition to the conventional visual tokenizer, which maps a sequence of tokens over multiple frames into a compact set of visual tokens. This enables BLIP-3-Video to use far fewer visual tokens than its competing models (e.g., 32 vs. 4608 tokens). We explore different types of temporal encoders, including learnable spatio-temporal pooling as well as sequential models like Token Turing Machines. We experimentally confirm that BLIP-3-Video obtains video question-answering accuracies comparable to much larger state-of-the-art models (e.g., 34B), while being much smaller (i.e., 4B) and more efficient by using fewer visual tokens. The project website is at https://www.salesforceairesearch.com/opensource/xGen-MM-Vid/index.html
Article 918
Title@2025-06-09 (1): Grouped Discrete Representation for Object-Centric Learning
Title: Grouped Discrete Representation for Object-Centric Learning | Gruppierte diskrete Darstellung für Objekt-Zentrisches Lernen | 用于对象中心学习的分组差异代表 2411.02299v2 |
Authors (4): Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen
Object-Centric Learning (OCL) aims to discover objects in images or videos by reconstructing the input. Representative methods achieve this by reconstructing the input as its Variational Autoencoder (VAE) discrete representations, which suppress (super-)pixel noise and enhance object separability. However, these methods treat features as indivisible units, overlooking their compositional attributes, and discretize features via scalar code indexes, losing attribute-level similarities and differences. We propose Grouped Discrete Representation (GDR) for OCL. For better generalization, features are decomposed into combinatorial attributes by organized channel grouping. For better convergence, features are quantized into discrete representations via tuple code indexes. Experiments demonstrate that GDR consistently improves both mainstream and state-of-the-art OCL methods across various datasets. Visualizations further highlight GDR’s superior object separability and interpretability. The source code is available on https://github.com/Genera1Z/GroupedDiscreteRepresentation.
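The contrast between a scalar code index and a tuple code index can be illustrated with a small sketch. This is not the authors' GDR implementation: it only assumes that a feature's channels are split into equal groups and that each group is snapped to the nearest entry of its own codebook. The codebooks, feature values, and function names below are hypothetical.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` in squared Euclidean distance."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def grouped_quantize(feature, codebooks):
    """Split `feature` into len(codebooks) equal channel groups and quantize each one.

    Returns the tuple of per-group code indexes and the reconstructed feature.
    """
    group_size = len(feature) // len(codebooks)
    indexes, quantized = [], []
    for g, codebook in enumerate(codebooks):
        chunk = feature[g * group_size:(g + 1) * group_size]
        idx = nearest_code(chunk, codebook)
        indexes.append(idx)
        quantized.extend(codebook[idx])
    return tuple(indexes), quantized


# Hypothetical 4-channel feature, two groups of two channels, two codes per group.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # codes for group 0
    [[0.0, 1.0], [1.0, 0.0]],  # codes for group 1
]
indexes, quantized = grouped_quantize([0.9, 0.8, 0.1, 0.7], codebooks)
print(indexes, quantized)
```

Two features that share one attribute group then share one entry of the tuple, which a single scalar index cannot express.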
Article 919
Title@2025-06-09 (1): Federated Learning on Stochastic Neural Networks
Title: Federated Learning on Stochastic Neural Networks | Föderiertes Lernen in stochastischen Neuronalen Netzwerken | 存储神经神经网络方面的联邦学习 2506.08169v1 |
Authors (4): Jingqiao Tang, Ryan Bausback, Feng Bao, Richard Archibald
Federated learning is a machine learning paradigm that leverages edge computing on client devices to optimize models while maintaining user privacy by ensuring that local data remains on the device. However, since all data is collected by clients, federated learning is susceptible to latent noise in local datasets. Factors such as limited measurement capabilities or human errors may introduce inaccuracies in client data. To address this challenge, we propose the use of a stochastic neural network as the local model within the federated learning framework. Stochastic neural networks not only facilitate the estimation of the true underlying states of the data but also enable the quantification of latent noise. We refer to our federated learning approach, which incorporates stochastic neural networks as local models, as Federated stochastic neural networks. We will present numerical experiments demonstrating the performance and effectiveness of our method, particularly in handling non-independent and identically distributed data.
Article 920
Title@2025-06-09 (1): SILK: Smooth InterpoLation frameworK for motion in-betweening A Simplified Computational Approach
Title: SILK: Smooth InterpoLation frameworK for motion in-betweening A Simplified Computational Approach | SILK: Smooth InterpoLation frameworK für Bewegung zwischen einem vereinfachten Computational Approach | SILK: 在简化计算方法之间运动的平滑间插框架work 2506.09075v1 |
Authors (4): Elly Akhoundi, Hung Yu Ling, Anup Anand Deshmukh, Judith Butepage
Motion in-betweening is a crucial tool for animators, enabling intricate control over pose-level details in each keyframe. Recent machine learning solutions for motion in-betweening rely on complex models, incorporating skeleton-aware architectures or requiring multiple modules and training steps. In this work, we introduce a simple yet effective Transformer-based framework, employing a single Transformer encoder to synthesize realistic motions for motion in-betweening tasks. We find that data modeling choices play a significant role in improving in-betweening performance. Among others, we show that increasing data volume can yield equivalent or improved motion transitions, that the choice of pose representation is vital for achieving high-quality results, and that incorporating velocity input features enhances animation performance. These findings challenge the assumption that model complexity is the primary determinant of animation quality and provide insights into a more data-centric approach to motion interpolation. Additional videos and supplementary material are available at https://silk-paper.github.io.
Article 921
Title@2025-06-09 (1): UniVarFL: Uniformity and Variance Regularized Federated Learning for Heterogeneous Data
Title: UniVarFL: Uniformity and Variance Regularized Federated Learning for Heterogeneous Data | UniVarFL: Uniformität und Varianz Regularisiertes Federated Learning für heterogene Daten | UUVARFL: 统一和差异校正的联邦异种数据学习 2506.08167v1 |
Authors (3): Sunny Gupta, Nikita Jangid, Amit Sethi
Federated Learning (FL) often suffers from severe performance degradation when faced with non-IID data, largely due to local classifier bias. Traditional remedies such as global model regularization or layer freezing either incur high computational costs or struggle to adapt to feature shifts. In this work, we propose UniVarFL, a novel FL framework that emulates IID-like training dynamics directly at the client level, eliminating the need for global model dependency. UniVarFL leverages two complementary regularization strategies during local training: Classifier Variance Regularization, which aligns class-wise probability distributions with those expected under IID conditions, effectively mitigating local classifier bias; and Hyperspherical Uniformity Regularization, which encourages a uniform distribution of feature representations across the hypersphere, thereby enhancing the model’s ability to generalize under diverse data distributions. Extensive experiments on multiple benchmark datasets demonstrate that UniVarFL outperforms existing methods in accuracy, highlighting its potential as a highly scalable and efficient solution for real-world FL deployments, especially in resource-constrained settings. Code: https://github.com/sunnyinAI/UniVarFL
Article 922
Title@2025-06-09 (1): BLUR: A Bi-Level Optimization Approach for LLM Unlearning
Title: BLUR: A Bi-Level Optimization Approach for LLM Unlearning | BLUR: Ein Bi-Level-Optimierungsansatz für LLM-Unlearning | BLUR: LLM 重新学习的双级优化办法 2506.08164v1 |
Authors (9): Hadi Reisizadeh, Jinghan Jia, Zhiqi Bu, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Sijia Liu, Mingyi Hong
Enabling large language models (LLMs) to unlearn knowledge and capabilities acquired during training has proven vital for ensuring compliance with data regulations and promoting ethical practices in generative AI. Although there is growing interest in developing various unlearning algorithms, it remains unclear how to best formulate the unlearning problem. The most popular formulation uses a weighted sum of forget and retain losses, but it often leads to performance degradation due to the inherent trade-off between the two. In this work, we argue that it is important to model the hierarchical structure of the unlearning problem, where the forget problem (which unlearns certain knowledge and/or capabilities) takes priority over the retain problem (which preserves model utility). This hierarchical structure naturally leads to a bi-level optimization formulation where the lower-level objective focuses on minimizing the forget loss, while the upper-level objective aims to maintain the model’s utility. Based on this new formulation, we propose a novel algorithm, termed Bi-Level UnleaRning (BLUR), which not only possesses strong theoretical guarantees but, more importantly, delivers superior performance. In particular, our extensive experiments demonstrate that BLUR consistently outperforms all the state-of-the-art algorithms across various unlearning tasks, models, and metrics. Code is available at https://github.com/OptimAI-Lab/BLURLLMUnlearning.
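The two formulations contrasted in this abstract can be written side by side. The notation is an assumption made for illustration and is not taken from the paper: $\theta$ denotes the model parameters, $L_f$ the forget loss, $L_r$ the retain loss, and $\lambda > 0$ a trade-off weight.

```latex
% Weighted-sum formulation: forget and retain compete through \lambda.
\min_{\theta} \; L_f(\theta) + \lambda \, L_r(\theta)

% Generic bi-level formulation: forgetting takes priority at the lower level,
% and utility is optimized only among lower-level minimizers.
\min_{\theta \in \Theta^{\star}} \; L_r(\theta)
\quad \text{where} \quad
\Theta^{\star} = \arg\min_{\theta'} \; L_f(\theta')
```

In the second form, no choice of retain loss can pull $\theta$ away from the set of forget-loss minimizers, which is the priority ordering the abstract describes.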
Article 923
Title@2025-06-09 (1): A Metrics-Oriented Architectural Model to Characterize Complexity on Machine Learning-Enabled Systems
Title: A Metrics-Oriented Architectural Model to Characterize Complexity on Machine Learning-Enabled Systems | Ein metrisch ausgerichtetes architektonisches Modell zur Charakterisierung von Komplexität auf maschinell lernfähigen Systemen | 以计量为主的建筑建筑模型,以明确机械学习系统的复杂性 2506.08153v1 |
Authors (1): Renato Cordeiro Ferreira
How can the complexity of ML-enabled systems be managed effectively? The goal of this research is to investigate how complexity affects ML-Enabled Systems (MLES). To address this question, this research aims to introduce a metrics-based architectural model to characterize the complexity of MLES. The goal is to support architectural decisions, providing a guideline for the inception and growth of these systems. This paper showcases the first step for creating the metrics-based architectural model: an extension of a reference architecture that can describe MLES to collect their metrics.
Article 924
Title@2025-06-09 (1): Towards a Mechanistic Explanation of Diffusion Model Generalization
Title: Towards a Mechanistic Explanation of Diffusion Model Generalization | Auf dem Weg zu einer mechanistischen Erklärung der Verallgemeinerung des Diffusionsmodells | 争取对传播模型通用化进行机械解释 2411.19339v3 |
Authors (4): Matthew Niedoba, Berend Zwartsenberg, Kevin Murphy, Frank Wood
We propose a simple, training-free mechanism which explains the generalization behaviour of diffusion models. By comparing pre-trained diffusion models to their theoretically optimal empirical counterparts, we identify a shared local inductive bias across a variety of network architectures. From this observation, we hypothesize that network denoisers generalize through localized denoising operations, as these operations approximate the training objective well over much of the training distribution. To validate our hypothesis, we introduce novel denoising algorithms which aggregate local empirical denoisers to replicate network behaviour. Comparing these algorithms to network denoisers across forward and reverse diffusion processes, our approach exhibits consistent visual similarity to neural network outputs, with lower mean squared error than previously proposed methods.
Article 925
Title@2025-06-09 (1): Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models
Title: Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models | Mehrsprachige Hate Speech Detection in sozialen Medien mit translationsbasierten Ansätzen mit großen Sprachmodellen | 在社会媒体中使用基于翻译的方法和大语言模式的多语言社会媒体多语言仇恨言论探测 2506.08147v1 |
Authors (6): Muhammad Usman, Muhammad Ahmad, M. Shahiki Tash, Irina Gelbukh, Rolando Quintero Tellez, Grigori Sidorov
Social media platforms are critical spaces for public discourse, shaping opinions and community dynamics, yet their widespread use has amplified harmful content, particularly hate speech, threatening online safety and inclusivity. While hate speech detection has been extensively studied in languages like English and Spanish, Urdu remains underexplored, especially using translation-based approaches. To address this gap, we introduce a trilingual dataset of 10,193 tweets in English (3,834 samples), Urdu (3,197 samples), and Spanish (3,162 samples), collected via keyword filtering, with a balanced distribution of 4,849 Hateful and 5,344 Not-Hateful labels. Our methodology leverages attention layers as a precursor to transformer-based models and large language models (LLMs), enhancing feature extraction for multilingual hate speech detection. For non-transformer models, we use TF-IDF for feature extraction. The dataset is benchmarked using state-of-the-art models, including GPT-3.5 Turbo and Qwen 2.5 72B, alongside traditional machine learning models like SVM and other transformers (e.g., BERT, RoBERTa). Three annotators, following rigorous guidelines, ensured high dataset quality, achieving a Fleiss’ Kappa of 0.821. Our approach, integrating attention layers with GPT-3.5 Turbo and Qwen 2.5 72B, achieves strong performance, with macro F1 scores of 0.87 for English (GPT-3.5 Turbo), 0.85 for Spanish (GPT-3.5 Turbo), 0.81 for Urdu (Qwen 2.5 72B), and 0.88 for the joint multilingual model (Qwen 2.5 72B). These results reflect improvements of 8.75% in English (over SVM baseline 0.80), 8.97% in Spanish (over SVM baseline 0.78), 5.19% in Urdu (over SVM baseline 0.77), and 7.32% in the joint multilingual model (over SVM baseline 0.82). Our framework offers a robust solution for multilingual hate speech detection, fostering safer digital communities worldwide.
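The quoted improvement percentages are relative gains over the SVM baseline, and they can be checked directly from the macro F1 scores stated in the abstract. The helper name `relative_improvement` is illustrative.

```python
def relative_improvement(score, baseline):
    """Percentage gain of `score` over `baseline`, rounded to two decimals."""
    return round(100.0 * (score - baseline) / baseline, 2)


# (best macro F1, SVM baseline) pairs as quoted in the abstract.
results = {
    "English": (0.87, 0.80),
    "Spanish": (0.85, 0.78),
    "Urdu": (0.81, 0.77),
    "Joint": (0.88, 0.82),
}
for language, (score, baseline) in results.items():
    print(language, relative_improvement(score, baseline))
```

The four values reproduce the 8.75%, 8.97%, 5.19%, and 7.32% figures, confirming that they are relative rather than absolute differences in F1.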
Article 926
Title@2025-06-09 (1): A Theory of Machine Understanding via the Minimum Description Length Principle
Title: A Theory of Machine Understanding via the Minimum Description Length Principle | Eine Theorie des Maschinenverständnisses über das Prinzip der Mindestbeschreibungslänge | 通过最低描述长度原则理解机器理论 2504.00395v3 |
Authors (2): Canlin Zhang, Xiuwen Liu
Deep neural networks trained through end-to-end learning have achieved remarkable success across various domains in the past decade. However, the end-to-end learning strategy, originally designed to minimize predictive loss in a black-box manner, faces two fundamental limitations: the struggle to form explainable representations in a self-supervised manner, and the inability to compress information rigorously following the Minimum Description Length (MDL) principle. These two limitations point to a deeper issue: an end-to-end learning model is not able to “understand” what it learns. In this paper, we establish a novel theory connecting these two limitations. We design the Spectrum VAE, a novel deep learning architecture whose minimum description length (MDL) can be rigorously evaluated. Then, we introduce the concept of latent dimension combinations, or what we term spiking patterns, and demonstrate that the observed spiking patterns should be as few as possible based on the training data in order for the Spectrum VAE to achieve the MDL. Finally, our theory demonstrates that when the MDL is achieved with respect to the given data distribution, the Spectrum VAE will naturally produce explainable latent representations of the data. In other words, explainable representations–or “understanding”–can emerge in a self-supervised manner simply by making the deep network obey the MDL principle. In our opinion, this also implies a deeper insight: To understand is to compress. At its core, our theory advocates for a shift in the training objective of deep networks: not only to minimize predictive loss, but also to minimize the description length regarding the given data. That is, a deep network should not only learn, but also understand what it learns. This work is entirely theoretical and aims to inspire future research toward self-supervised, explainable AI grounded in the MDL principle.
Article 927
Title@2025-06-09 (1): Fully data-driven inverse hyperelasticity with hyper-network neural ODE fields
Title: Fully data-driven inverse hyperelasticity with hyper-network neural ODE fields | Vollständig datengetriebene inverse Hyperelastizität mit hyper-network neuronalen ODE-Feldern | 由全数据驱动的全数据驱动的超反超弹性,具有超网络神经极极光字段 2506.08146v1 |
Authors (6): Vahidullah Taç, Amirhossein Amiri-Hezaveh, Manuel K. Rausch, Grace N. Bechtel, Francisco Sahli Costabal, Adrian Buganza Tepole
We propose a new framework for identifying mechanical properties of heterogeneous materials without a closed-form constitutive equation. Given a full-field measurement of the displacement field, for instance as obtained from digital image correlation (DIC), a continuous approximation of the strain field is obtained by training a neural network that incorporates Fourier features to effectively capture sharp gradients in the data. A physics-based data-driven method built upon ordinary neural differential equations (NODEs) is employed to discover constitutive equations. The NODE framework can represent arbitrary materials while satisfying constraints in the theory of constitutive equations by default. To account for heterogeneity, a hyper-network is defined, where the input is the material coordinate system, and the output is the NODE-based constitutive equation. The parameters of the hyper-network are optimized by minimizing a multi-objective loss function that includes penalty terms for violations of the strong form of the equilibrium equations of elasticity and the associated Neumann boundary conditions. We showcase the framework with several numerical examples, including heterogeneity arising from variations in material parameters, spatial transitions from isotropy to anisotropy, material identification in the presence of noise, and, ultimately, application to experimental data. As the numerical results suggest, the proposed approach is robust and general in identifying the mechanical properties of heterogeneous materials with very few assumptions, making it a suitable alternative to classical inverse methods.
Article 928
Title@2025-06-09 (1): Accelerating Spectral Clustering under Fairness Constraints
Title: Accelerating Spectral Clustering under Fairness Constraints | Beschleunigung des Spektralclustering unter Fairness-Beschränkungen | 加速在公平制约下聚居 2506.08143v1 |
Authors (4): Francesco Tonin, Alex Lambert, Johan A. K. Suykens, Volkan Cevher
Fairness of decision-making algorithms is an increasingly important issue. In this paper, we focus on spectral clustering with group fairness constraints, where every demographic group is represented in each cluster proportionally as in the general population. We present a new efficient method for fair spectral clustering (Fair SC) by casting the Fair SC problem within the difference of convex functions (DC) framework. To this end, we introduce a novel variable augmentation strategy and employ an alternating direction method of multipliers type of algorithm adapted to DC problems. We show that each associated subproblem can be solved efficiently, resulting in higher computational efficiency compared to prior work, which required a computationally expensive eigendecomposition. Numerical experiments demonstrate the effectiveness of our approach on both synthetic and real-world benchmarks, showing significant speedups in computation time over prior art, especially as the problem size grows. This work thus represents a considerable step forward towards the adoption of fair clustering in real-world applications.
Article 929
Title@2025-06-09 (1): AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists
Title: AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists | AutoSDT: Scaling Data-Driven Discovery Aufgaben gegenüber offenen Mitwissenschaftlern | AutoSDD: 扩大数据驱动发现任务, 以建立开放的共同科学家 2506.08140v1 |
Authors (19): Yifei Li, Hanane Nour Moussa, Ziru Chen, Shijie Chen, Botao Yu, Mingyi Xue, Benjamin Burns, Tzu-Yao Chiu, Vishal Dey, Zitong Lu, Chen Wei, Qianheng Zhang, Tianyu Zhang, Song Gao, Xuhui Huang, Xia Ning, Nesreen K. Ahmed, Ali Payani, Huan Sun
Despite long-standing efforts in accelerating scientific discovery with AI, building AI co-scientists remains challenging due to limited high-quality data for training and evaluation. To tackle this data scarcity issue, we present AutoSDT, an automatic pipeline that collects high-quality coding tasks in real-world data-driven discovery workflows. AutoSDT leverages the coding capabilities and parametric knowledge of LLMs to search for diverse sources, select ecologically valid tasks, and synthesize accurate task instructions and code solutions. Using our pipeline, we construct AutoSDT-5K, a dataset of 5,404 coding tasks for data-driven discovery that covers four scientific disciplines and 756 unique Python packages. To the best of our knowledge, AutoSDT-5K is the only automatically collected and the largest open dataset for data-driven scientific discovery. Expert feedback on a subset of 256 tasks shows the effectiveness of AutoSDT: 93% of the collected tasks are ecologically valid, and 92.2% of the synthesized programs are functionally correct. Trained on AutoSDT-5K, the Qwen2.5-Coder-Instruct LLM series, dubbed AutoSDT-Coder, show substantial improvement on two challenging data-driven discovery benchmarks, ScienceAgentBench and DiscoveryBench. Most notably, AutoSDT-Coder-32B reaches the same level of performance as GPT-4o on ScienceAgentBench with a success rate of 7.8%, doubling the performance of its base model. On DiscoveryBench, it lifts the hypothesis matching score to 8.1, bringing a 17.4% relative improvement and closing the gap between open-weight models and GPT-4o.
Article 930
Title@2025-06-09 (1): Nearness of Neighbors Attention for Regression in Supervised Finetuning
Title: Nearness of Neighbors Attention for Regression in Supervised Finetuning | Nähe der Nachbarn Achtung vor der Regression in überwachter Finetuning | 邻近邻居的近邻 在监督的微调中注意后退 2506.08139v1 |
Authors (3): Aviad Susman, Mayte Suárez-Fariñas, Joseph T Colonel
It is common in supervised machine learning to combine the feature extraction capabilities of neural networks with the predictive power of traditional algorithms, such as k-nearest neighbors (k-NN) or support vector machines. This procedure involves performing supervised fine-tuning (SFT) on a domain-appropriate feature extractor, followed by training a traditional predictor on the resulting SFT embeddings. When used in this manner, traditional predictors often deliver increased performance over the SFT model itself, despite the fine-tuned feature extractor yielding embeddings specifically optimized for prediction by the neural network’s final dense layer. This suggests that directly incorporating traditional algorithms into SFT as prediction layers may further improve performance. However, many traditional algorithms have not been implemented as neural network layers due to their non-differentiable nature and their unique optimization requirements. As a step towards solving this problem, we introduce the Nearness of Neighbors Attention (NONA) regression layer. NONA uses the mechanics of neural network attention and a novel learned attention-masking scheme to yield a differentiable proxy of the k-NN regression algorithm. Results on multiple unstructured datasets show improved performance over both dense layer prediction and k-NN on SFT embeddings for regression.
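One generic way to obtain a differentiable stand-in for k-NN regression is to replace the hard neighbor selection with softmax attention over negative squared distances. The sketch below shows only that idea. It is not the NONA layer itself, and it omits the learned attention-masking scheme mentioned in the abstract. The `temperature` parameter and all names are assumptions.

```python
import math


def soft_knn_regress(query, train_x, train_y, temperature=1.0):
    """Softmax-weighted average of training targets.

    Weights are a softmax over negative squared distances to the query, so
    closer training points contribute more to the prediction.
    """
    logits = [
        -sum((q - x) ** 2 for q, x in zip(query, point)) / temperature
        for point in train_x
    ]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Hypothetical embeddings and targets.
train_x = [[0.0], [1.0], [10.0]]
train_y = [1.0, 3.0, 100.0]
print(soft_knn_regress([0.5], train_x, train_y))
```

Lowering `temperature` concentrates the weights on the closest points, approaching 1-NN, while a high temperature approaches the global mean of the targets.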
Article 931
Title@2025-06-09 (1): Learning Efficient Representations of Neutrino Telescope Events
Title: Learning Efficient Representations of Neutrino Telescope Events | Effiziente Darstellungen von Neutrino-Teleskop-Veranstaltungen lernen | 中子望远镜活动学习效率高的代表性 2410.13148v2 |
Authors (3): Felix J. Yu, Nicholas Kamp, Carlos A. Argüelles
Neutrino telescopes detect rare interactions of particles produced in some of the most extreme environments in the Universe. This is accomplished by instrumenting a cubic-kilometer volume of naturally occurring transparent medium with light sensors. Given their substantial size and the high frequency of background interactions, these telescopes amass an enormous quantity of large variance, high-dimensional data. These attributes create substantial challenges for analyzing and reconstructing interactions, particularly when utilizing machine learning (ML) techniques. In this paper, we present a novel approach, called om2vec, that employs transformer-based variational autoencoders to efficiently represent neutrino telescope events by learning compact and descriptive latent representations. We demonstrate that these latent representations offer enhanced flexibility and improved computational efficiency, thereby facilitating downstream tasks in data analysis.
Article 932
Title@2025-06-09 (1): UnCLe: Benchmarking Unsupervised Continual Learning for Depth Completion
Title: UnCLe: Benchmarking Unsupervised Continual Learning for Depth Completion | UnCLe: Benchmarking unüberwachtes kontinuierliches Lernen für die Tiefenvervollständigung | 无CLLe:为深度完成确定不受监督的连续学习基准 2410.18074v4 |
Authors (6): Xien Chen, Rit Gangopadhyay, Michael Chu, Patrick Rim, Hyoungseob Park, Alex Wong
We propose UnCLe, the first standardized benchmark for Unsupervised Continual Learning of a multimodal 3D reconstruction task: depth completion, which aims to infer a dense depth map from a synchronized pair of an RGB image and a sparse depth map. We benchmark depth completion models under the practical scenario of unsupervised learning over continuous streams of data. While unsupervised learning of depth offers the possibility of continual learning of novel data distributions over time, existing methods are typically trained on a static, or stationary, dataset. However, when adapting to novel non-stationary distributions, they “catastrophically forget” previously learned information. UnCLe simulates these non-stationary distributions by adapting depth completion models to sequences of datasets containing diverse scenes captured from distinct domains using different visual and range sensors. We adopt representative methods from continual learning paradigms and translate them to enable unsupervised continual learning of depth completion. We benchmark these models across indoor and outdoor environments, and investigate the degree of catastrophic forgetting through standard quantitative metrics. We find that unsupervised continual learning of depth completion is an open problem, and we invite researchers to leverage UnCLe as a development platform.
nan
Article 933
Title@2025-06-09 (1): Directed Exploration in Reinforcement Learning from Linear Temporal Logic
Title: Directed Exploration in Reinforcement Learning from Linear Temporal Logic | Direkte Exploration bei der Verstärkung Lernen von linearen Temporal Logic | 直接探索从线上时间逻辑中加强学习 2408.09495v2 |
Authors (3): Marco Bagatella, Andreas Krause, Georg Martius
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning, as it allows describing objectives beyond the expressivity of conventional discounted return formulations. Nonetheless, recent works have shown that LTL formulas can be translated into a variable rewarding and discounting scheme, whose optimization produces a policy maximizing a lower bound on the probability of formula satisfaction. However, the synthesized reward signal remains fundamentally sparse, making exploration challenging. We aim to overcome this limitation, which can prevent current algorithms from scaling beyond low-dimensional, short-horizon problems. We show how better exploration can be achieved by further leveraging the LTL specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process, thus enabling a form of high-level value estimation. By taking a Bayesian perspective over LDBA dynamics and proposing a suitable prior distribution, we show that the values estimated through this procedure can be treated as a shaping potential and mapped to informative intrinsic rewards. Empirically, we demonstrate applications of our method from tabular settings to high-dimensional continuous systems, which have so far represented a significant challenge for LTL-based reinforcement learning algorithms.
nan
Article 934
Title@2025-06-09 (1): Constrained Pareto Set Identification with Bandit Feedback
Title: Constrained Pareto Set Identification with Bandit Feedback | Beschränkte Pareto-Set-Identifikation mit Bandit-Feedback | 被约束的 Pareto 设置 识别强盗反馈 2506.08127v1 |
Authors (3): Cyrille Kone, Emilie Kaufmann, Laura Richert
In this paper, we address the problem of identifying the Pareto Set under feasibility constraints in a multivariate bandit setting. Specifically, given a $K$-armed bandit with unknown means $\mu_1, \dots, \mu_K \in \mathbb{R}^d$, the goal is to identify the set of arms whose mean is not uniformly worse than that of another arm (i.e., not smaller for all objectives), while satisfying some known set of linear constraints, expressing, for example, some minimal performance on each objective. Our focus lies in fixed-confidence identification, for which we introduce an algorithm that significantly outperforms racing-like algorithms and the intuitive two-stage approach that first identifies feasible arms and then their Pareto Set. We further prove an information-theoretic lower bound on the sample complexity of any algorithm for constrained Pareto Set identification, showing that the sample complexity of our approach is near-optimal. Our theoretical results are supported by an extensive empirical evaluation on a series of benchmarks.
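To make the target of the identification problem concrete, here is a minimal sketch of the set being sought when the means are already known. It is not the paper's algorithm, which must work from noisy samples. The function name `feasible_pareto_set`, the constraint form `A @ mu_k >= b`, and the example numbers are illustrative assumptions; dominance uses the common convention (at least as good everywhere, strictly better somewhere).

```python
import numpy as np

def feasible_pareto_set(mu, A, b):
    """Return indices of feasible arms that no other feasible arm dominates.

    mu : (K, d) array of arm means (assumed known here; unknown in the bandit setting).
    A, b : linear constraints, arm k is feasible when A @ mu[k] >= b (assumed form).
    """
    mu = np.asarray(mu, dtype=float)
    feasible = [k for k in range(len(mu)) if np.all(A @ mu[k] >= b)]
    pareto = []
    for i in feasible:
        # Arm j dominates arm i if it is at least as good on every objective
        # and strictly better on at least one.
        dominated = any(
            np.all(mu[j] >= mu[i]) and np.any(mu[j] > mu[i])
            for j in feasible
            if j != i
        )
        if not dominated:
            pareto.append(i)
    return pareto

# Five arms, two objectives, and a minimum performance of 0.08 on each objective.
mu = np.array([[0.90, 0.10], [0.50, 0.50], [0.40, 0.40], [0.10, 0.90], [0.95, 0.05]])
A = np.eye(2)
b = np.array([0.08, 0.08])
result = feasible_pareto_set(mu, A, b)
```

In this example arm 4 is infeasible (its second objective is below the threshold) and arm 2 is dominated by arm 1, so the sketch returns arms 0, 1, and 3. This mirrors the two-stage baseline mentioned in the abstract: filter for feasibility, then compute the Pareto set of what remains.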
nan
Article 935
Title@2025-06-09 (1): Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning
Title: Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning | Bingo: Effiziente Begründung von LLMs durch dynamisches und auf Bedeutung basierendes Verstärkungslernen | 宾果:通过以动态和重要内容为基础的强化学习,提高LLMs高效率的理据能力 2506.08125v1 |
Authors (8): Hanbing Liu, Lang Cao, Yuanyi Ren, Mengyu Zhou, Haoyu Dong, Xiaojun Ma, Shi Han, Dongmei Zhang
Large language models have demonstrated impressive reasoning capabilities, yet they often suffer from inefficiencies due to unnecessarily verbose or redundant outputs. While many works have explored reinforcement learning (RL) to enhance reasoning abilities, most primarily focus on improving accuracy, with limited attention to reasoning efficiency. Some existing approaches introduce direct length-based rewards to encourage brevity, but this often leads to noticeable drops in accuracy. In this paper, we propose Bingo, an RL framework that advances length-based reward design to boost efficient reasoning. Bingo incorporates two key mechanisms: a significance-aware length reward, which gradually guides the model to reduce only insignificant tokens, and a dynamic length reward, which initially encourages elaborate reasoning for hard questions but decays over time to improve overall efficiency. Experiments across multiple reasoning benchmarks show that Bingo improves both accuracy and efficiency. It outperforms the vanilla reward and several other length-based reward baselines in RL, achieving a favorable trade-off between accuracy and efficiency. These results underscore the potential of training LLMs explicitly for efficient reasoning.
nan
Article 936
Title@2025-06-09 (1): Continuous Policy and Value Iteration for Stochastic Control Problems and Its Convergence
Title: Continuous Policy and Value Iteration for Stochastic Control Problems and Its Convergence | Kontinuierliche Politik und Wert Iteration für stochastische Kontrollprobleme und deren Konvergenz | 斯托卡控制问题及其趋同的持续政策和价值迭代 2506.08121v1 |
Authors (2): Qi Feng, Gu Wang
We introduce a continuous policy-value iteration algorithm where the approximations of the value function of a stochastic control problem and the optimal control are simultaneously updated through Langevin-type dynamics. This framework applies to both the entropy-regularized relaxed control problems and the classical control problems, with infinite horizon. We establish policy improvement and demonstrate convergence to the optimal control under the monotonicity condition of the Hamiltonian. By utilizing Langevin-type stochastic differential equations for continuous updates along the policy iteration direction, our approach enables the use of distribution sampling and non-convex learning techniques in machine learning to optimize the value function and identify the optimal control simultaneously.
nan
Article 937
Title@2025-06-09 (1): Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting
Title: Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting | Benchmarking-Modelle der vortrainierten Zeitreihen für die Strompreisprognose | 确定电力价格预测培训前时间序列模型的基准 2506.08113v1 |
Authors (4): Timothée Hornek, Amir Sartipi, Igor Tchappi, Gilbert Fridgen
Accurate electricity price forecasting (EPF) is crucial for effective decision-making in power trading on the spot market. While recent advances in generative artificial intelligence (GenAI) and pre-trained large language models (LLMs) have inspired the development of numerous time series foundation models (TSFMs) for time series forecasting, their effectiveness in EPF remains uncertain. To address this gap, we benchmark several state-of-the-art pretrained models (Chronos-Bolt, Chronos-T5, TimesFM, Moirai, Time-MoE, and TimeGPT) against established statistical and machine learning (ML) methods for EPF. Using 2024 day-ahead auction (DAA) electricity prices from Germany, France, the Netherlands, Austria, and Belgium, we generate daily forecasts with a one-day horizon. Chronos-Bolt and Time-MoE emerge as the strongest among the TSFMs, performing on par with traditional models. However, the biseasonal MSTL model, which captures daily and weekly seasonality, stands out for its consistent performance across countries and evaluation metrics, with no TSFM statistically outperforming it.
nan
Article 938
Title@2025-06-09 (1): Selective Matching Losses – Not All Scores Are Created Equal
Title: Selective Matching Losses – Not All Scores Are Created Equal | Selektive Matching Losses – Nicht alle Scores werden gleich erstellt | 选择性匹配损失 – – 并非所有分数都等于创建 2506.04446v2 |
Authors (2): Gil I. Shamir, Manfred K. Warmuth
Learning systems match predicted scores to observations over some domain. Often, it is critical to produce accurate predictions in some subset (or region) of the domain, yet less important to accurately predict in other regions. We construct selective matching loss functions by design of increasing link functions over score domains. A matching loss is an integral over the link. A link defines loss sensitivity as a function of the score, emphasizing high-slope, high-sensitivity regions over flat ones. Loss asymmetry drives a model and resolves its underspecification to predict better in high-sensitivity regions where it is more important, and to distinguish between high- and low-importance regions. A large variety of selective scalar losses can be designed with scaled and shifted Sigmoid and hyperbolic sine links. Their properties, however, do not extend to multi-class. Applying them per dimension lacks ranking sensitivity that assigns importance according to class score ranking. Utilizing composite Softmax functions, we develop a framework for multidimensional selective losses. We overcome limitations of the standard Softmax function, which is good for classification but not for distinguishing between adjacent scores. Selective losses have a substantial advantage over traditional losses in applications with more important score regions, including dwell-time prediction, retrieval, ranking with either pointwise, contrastive pairwise, or listwise losses, distillation problems, and fine-tuning alignment of Large Language Models (LLMs).
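As an illustration of "a matching loss is an integral over the link", the sketch below uses the classical matching-loss form, the integral from the target score to the predicted score of g(z) - g(target), with a scaled and shifted sigmoid link g. Whether the paper uses exactly this form is an assumption; the function name `matching_loss` and the parameters `scale` and `shift` are hypothetical.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)); it is the antiderivative of the sigmoid.
    return np.logaddexp(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def matching_loss(pred, target, scale=1.0, shift=0.0):
    """Integral from target to pred of g(z) - g(target),
    with link g(z) = sigmoid(scale * (z - shift)).

    Closed form: G(pred) - G(target) - g(target) * (pred - target),
    where G(z) = softplus(scale * (z - shift)) / scale satisfies G' = g.
    """
    G = lambda z: softplus(scale * (z - shift)) / scale
    g = lambda z: sigmoid(scale * (z - shift))
    return G(pred) - G(target) - g(target) * (pred - target)

# The same 0.5 error costs far more where the link is steep (near shift = 2)
# than where it is flat (far below the shift).
near = matching_loss(pred=2.5, target=2.0, scale=4.0, shift=2.0)
far = matching_loss(pred=-1.5, target=-2.0, scale=4.0, shift=2.0)
```

Because the link is increasing, the loss is non-negative and zero only when the prediction equals the target. Moving `shift` and `scale` moves and sharpens the region in which errors are penalized most, which is the selectivity the abstract describes.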
nan
Article 939
Title@2025-06-09 (1): Positional Attention: Expressivity and Learnability of Algorithmic Computation
Title: Positional Attention: Expressivity and Learnability of Algorithmic Computation | Positionale Aufmerksamkeit: Expressivität und Erlernbarkeit algorithmischer Computation | 位置注意: 算法的可表达性和可理解性 2410.01686v3 |
Authors (5): Artur Back de Luca, George Giapitzakis, Shenghao Yang, Petar Veličković, Kimon Fountoulakis
There is a growing interest in the ability of neural networks to execute algorithmic tasks (e.g., arithmetic, summary statistics, and sorting). The goal of this work is to better understand the role of attention in Transformers for algorithmic execution. Its importance for algorithmic execution has been studied theoretically and empirically using parallel computational models. Notably, many parallel algorithms communicate between processors solely using positional information. Inspired by this observation, we investigate how Transformers can execute algorithms using positional attention, where attention weights depend exclusively on positional encodings. We prove that Transformers with positional attention (positional Transformers) maintain the same expressivity as parallel computational models, incurring a logarithmic depth cost relative to the input length. We analyze their in-distribution learnability and explore how parameter norms in positional attention affect sample complexity. Our results show that positional Transformers introduce a learning trade-off: while they exhibit better theoretical dependence on parameter norms, certain tasks may require more layers, which can, in turn, increase sample complexity. Finally, we empirically explore the out-of-distribution performance of positional Transformers and find that they perform well in tasks where their underlying algorithmic solution relies on positional information.
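The defining property, attention weights that depend only on positional encodings, can be sketched in a few lines of NumPy. This is a single-head illustration under assumed shapes, not the authors' implementation; the names `positional_attention`, `W_q`, `W_k`, and `W_v` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def positional_attention(X, P, W_q, W_k, W_v):
    """Single-head attention whose weights are computed from positions only.

    X : (n, d_model) input values, used only through the value projection.
    P : (n, d_pos) positional encodings, used for queries and keys.
    """
    Q = P @ W_q
    K = P @ W_k
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # independent of X
    return weights @ (X @ W_v), weights

rng = np.random.default_rng(0)
n, d_pos, d_model, d_head = 4, 3, 5, 2
P = rng.standard_normal((n, d_pos))
W_q = rng.standard_normal((d_pos, d_head))
W_k = rng.standard_normal((d_pos, d_head))
W_v = rng.standard_normal((d_model, d_head))

out_a, w_a = positional_attention(rng.standard_normal((n, d_model)), P, W_q, W_k, W_v)
out_b, w_b = positional_attention(rng.standard_normal((n, d_model)), P, W_q, W_k, W_v)
```

The two calls use different inputs `X` but the same positions `P`, so `w_a` and `w_b` are identical while the outputs differ. In standard self-attention the queries and keys would be computed from `X`, making the communication pattern data-dependent.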
nan
Article 940
Title@2025-06-09 (1): StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets
Title: StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets | StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets | StableMTL:从部分附加说明合成数据集中重新定位多任务学习的远程传播模型 2506.08013v1 |
Authors (3): Anh-Quan Cao, Ivan Lopes, Raoul de Charette
Multi-task learning for dense prediction is limited by the need for extensive annotation for every task, though recent works have explored training with partial task labels. Leveraging the generalization power of diffusion models, we extend the partial learning setup to a zero-shot setting, training a multi-task model on multiple synthetic datasets, each labeled for only a subset of tasks. Our method, StableMTL, repurposes image generators for latent regression, adapting a denoising framework with task encoding, per-task conditioning, and a tailored training scheme. Instead of per-task losses requiring careful balancing, a unified latent loss is adopted, enabling seamless scaling to more tasks. To encourage inter-task synergy, we introduce a multi-stream model with a task-attention mechanism that converts N-to-N task interactions into efficient 1-to-N attention, promoting effective cross-task sharing. StableMTL outperforms baselines on 7 tasks across 8 benchmarks.
nan
Article 941
Title@2025-06-09 (1): Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
Title: Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion | Selbsterzwingen: Überbrückung der Zug-Testlücke in autoregressiver Video-Diffusion | 自我强迫:弥合自动递减视频传播中的培训-测试差距 2506.08009v1 |
Authors (5): Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, Eli Shechtman
We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must generate sequences conditioned on their own imperfect outputs during inference. Unlike prior methods that denoise future frames based on ground-truth context frames, Self Forcing conditions each frame’s generation on previously self-generated outputs by performing autoregressive rollout with key-value (KV) caching during training. This strategy enables supervision through a holistic loss at the video level that directly evaluates the quality of the entire generated sequence, rather than relying solely on traditional frame-wise objectives. To ensure training efficiency, we employ a few-step diffusion model along with a stochastic gradient truncation strategy, effectively balancing computational cost and performance. We further introduce a rolling KV cache mechanism that enables efficient autoregressive video extrapolation. Extensive experiments demonstrate that our approach achieves real-time streaming video generation with sub-second latency on a single GPU, while matching or even surpassing the generation quality of significantly slower and non-causal diffusion models. Project website: http://self-forcing.github.io/
nan
Article 942
Title@2025-06-09 (1): Hidden in plain sight: VLMs overlook their visual representations
Title: Hidden in plain sight: VLMs overlook their visual representations | Versteckt in freier Sicht: VLMs übersehen ihre visuellen Darstellungen | 隐藏在普通视线之下:VLMs忽略了它们的视觉形象。 2506.08008v1 |
Authors (4): Stephanie Fu, Tyler Bonnen, Devin Guillory, Trevor Darrell
Language provides a natural interface to specify and evaluate performance on visual tasks. To realize this possibility, vision language models (VLMs) must successfully integrate visual and linguistic information. Our work compares VLMs to a direct readout of their visual encoders to understand their ability to integrate across these modalities. Across a series of vision-centric benchmarks (e.g., depth estimation, correspondence), we find that VLMs perform substantially worse than their visual encoders, dropping to near-chance performance. We investigate these results through a series of analyses across the entire VLM: namely 1) the degradation of vision representations, 2) brittleness to task prompt, and 3) the language model’s role in solving the task. We find that the bottleneck in performing these vision-centric tasks lies in this third category; VLMs are not effectively using visual information easily accessible throughout the entire model, and they inherit the language priors present in the LLM. Our work helps diagnose the failure modes of open-source VLMs, and presents a series of evaluations useful for future investigations into visual understanding within VLMs.
nan
Article 943
Title@2025-06-09 (1): Reparameterized LLM Training via Orthogonal Equivalence Transformation
Title: Reparameterized LLM Training via Orthogonal Equivalence Transformation | Reparameterisiertes LLM-Training über Orthogonale Äquivalenztransformation | 通过正正对等转化进行修复性磁力LLM培训 2506.08001v1 |
Authors (6): Zeju Qiu, Simon Buchholz, Tim Z. Xiao, Maximilian Dax, Bernhard Schölkopf, Weiyang Liu
While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field’s most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
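The spectral-preservation claim can be checked numerically. The sketch below assumes the reparameterization takes the form W = R W0 P, with R and P orthogonal and W0 a fixed random matrix; that specific form, and the helper `random_orthogonal`, are assumptions for illustration and are not stated in the abstract. Multiplying by orthogonal matrices leaves singular values unchanged.

```python
import numpy as np

def random_orthogonal(n, rng):
    # The Q factor of a QR decomposition of a square Gaussian matrix is orthogonal.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

rng = np.random.default_rng(0)
m, n = 6, 4
W0 = rng.standard_normal((m, n))   # fixed random weight matrix
R = random_orthogonal(m, rng)      # stands in for a learned left orthogonal factor
P = random_orthogonal(n, rng)      # stands in for a learned right orthogonal factor
W = R @ W0 @ P                     # effective weight under the assumed form

sv_fixed = np.linalg.svd(W0, compute_uv=False)
sv_effective = np.linalg.svd(W, compute_uv=False)
```

Here `sv_fixed` and `sv_effective` agree up to floating-point error. Training under this scheme would update only R and P while keeping them orthogonal; how that constraint is enforced efficiently is the subject of the paper and is not shown here.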
nan
Article 944
Title@2025-06-09 (1): MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation
Title: MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation | MADFormer: Mixed Autoregressive und Diffusion Transformer für die kontinuierliche Bildgenerierung | MADFormer: 用于连续图像生成的混合自动递减和扩散变异器 2506.07999v1 |
Authors (3): Junhao Chen, Yulia Tsvetkov, Xiaochuang Han
Recent progress in multimodal generation has increasingly combined autoregressive (AR) and diffusion-based approaches, leveraging their complementary strengths: AR models capture long-range dependencies and produce fluent, context-aware outputs, while diffusion models operate in continuous latent spaces to refine high-fidelity visual details. However, existing hybrids often lack systematic guidance on how and why to allocate model capacity between these paradigms. In this work, we introduce MADFormer, a Mixed Autoregressive and Diffusion Transformer that serves as a testbed for analyzing AR-diffusion trade-offs. MADFormer partitions image generation into spatial blocks, using AR layers for one-pass global conditioning across blocks and diffusion layers for iterative local refinement within each block. Through controlled experiments on FFHQ-1024 and ImageNet, we identify two key insights: (1) block-wise partitioning significantly improves performance on high-resolution images, and (2) vertically mixing AR and diffusion layers yields better quality-efficiency balances, improving FID by up to 75% under constrained inference compute. Our findings offer practical design principles for future hybrid generative models.
nan
Article 945
Title@2025-06-09 (1): Generative Modeling of Weights: Generalization or Memorization?
Title: Generative Modeling of Weights: Generalization or Memorization? | Generative Modellierung von Gewichten: Verallgemeinerung oder Erinnerung? | 产生重力建模:普遍化还是记忆化? 2506.07998v1 |
Authors (4): Boya Zeng, Yida Yin, Zhiqiu Xu, Zhuang Liu
Generative models, with their success in image and video generation, have recently been explored for synthesizing effective neural network weights. These approaches take trained neural network checkpoints as training data, and aim to generate high-performing neural network weights during inference. In this work, we examine four representative methods on their ability to generate novel model weights, i.e., weights that are different from the checkpoints seen during training. Surprisingly, we find that these methods synthesize weights largely by memorization: they produce either replicas, or at best simple interpolations, of the training checkpoints. Current methods fail to outperform simple baselines, such as adding noise to the weights or taking a simple weight ensemble, in obtaining different and simultaneously high-performing models. We further show that this memorization cannot be effectively mitigated by modifying modeling factors commonly associated with memorization in image diffusion models, or applying data augmentations. Our findings provide a realistic assessment of what types of data current generative models can model, and highlight the need for more careful evaluation of generative models in new domains. Our code is available at https://github.com/boyazeng/weight_memorization.
nan
Article 946
Title@2025-06-09 (1): Domain Switching on the Pareto Front: Multi-Objective Deep Kernel Learning in Automated Piezoresponse Force Microscopy
Title: Domain Switching on the Pareto Front: Multi-Objective Deep Kernel Learning in Automated Piezoresponse Force Microscopy | Domain-Umschaltung der Pareto-Front: Multi-Objektives Deep Kernel-Lernen in automatisierter Piezoresponse-Kraftmikroskopie | Pareto Front 域域切换:在自动 Piezo响应力量中进行多目标深核心学习的显微镜 2506.08073v1 |
Authors (8): Yu Liu, Utkarsh Pratiush, Kamyar Barakati, Hiroshi Funakubo, Ching-Che Lin, Jaegyu Kim, Lane W. Martin, Sergei V. Kalinin
Ferroelectric polarization switching underpins the functional performance of a wide range of materials and devices, yet its dependence on complex local microstructural features renders systematic exploration by manual or grid-based spectroscopic measurements impractical. Here, we introduce a multi-objective kernel-learning workflow that infers the microstructural rules governing switching behavior directly from high-resolution imaging data. Applied to automated piezoresponse force microscopy (PFM) experiments, our framework efficiently identifies the key relationships between domain-wall configurations and local switching kinetics, revealing how specific wall geometries and defect distributions modulate polarization reversal. Post-experiment analysis projects abstract reward functions, such as switching ease and domain symmetry, onto physically interpretable descriptors including domain configuration and proximity to boundaries. This enables not only high-throughput active learning, but also mechanistic insight into the microstructural control of switching phenomena. While demonstrated for ferroelectric domain switching, our approach provides a powerful, generalizable tool for navigating complex, non-differentiable design spaces, from structure-property correlations in molecular discovery to combinatorial optimization across diverse imaging modalities.
nan
Article 947
Title@2025-06-09 (1): PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Title: PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation | PhantomWiki: On-Demand-Datensätze für die Bewertung von Vernunft und Retrieval | PhantomWiki: 用于理性和检索评价的现场数据集 2502.20377v2 |
Authors (9): Albert Gong, Kamilė Stankevičiūtė, Chao Wan, Anmol Kabra, Raphael Thesmar, Johann Lee, Julius Klenke, Carla P. Gomes, Kilian Q. Weinberger
High-quality benchmarks are essential for evaluating reasoning and retrieval capabilities of large language models (LLMs). However, curating datasets for this purpose is not a permanent solution as they are prone to data leakage and inflated performance results. To address these challenges, we propose PhantomWiki: a pipeline to generate unique, factually consistent document corpora with diverse question-answer pairs. Unlike prior work, PhantomWiki is neither a fixed dataset, nor is it based on any existing data. Instead, a new PhantomWiki instance is generated on demand for each evaluation. We vary the question difficulty and corpus size to disentangle reasoning and retrieval capabilities respectively, and find that PhantomWiki datasets are surprisingly challenging for frontier LLMs. Thus, we contribute a scalable and data leakage-resistant framework for disentangled evaluation of reasoning, retrieval, and tool-use abilities. Our code is available at https://github.com/kilian-group/phantom-wiki.
nan
Article 948
Title@2025-06-09 (1): When Two LLMs Debate, Both Think They’ll Win
Title: When Two LLMs Debate, Both Think They’ll Win | Wenn zwei LLMs diskutieren, denken beide, dass sie gewinnen werden | 当两个LLM 辩论, 双方都认为他们会赢 2505.19184v3 |
Authors (2): Pradyumna Shyama Prasad, Minh Nhat Nguyen
Can LLMs accurately adjust their confidence when facing opposition? Building on previous studies measuring calibration on static fact-based question-answering tasks, we evaluate Large Language Models (LLMs) in a dynamic, adversarial debate setting, uniquely combining two realistic factors: (a) a multi-turn format requiring models to update beliefs as new information emerges, and (b) a zero-sum structure to control for task-related uncertainty, since mutual high-confidence claims imply systematic overconfidence. We organized 60 three-round policy debates among ten state-of-the-art LLMs, with models privately rating their confidence (0-100) in winning after each round. We observed five concerning patterns: (1) Systematic overconfidence: models began debates with average initial confidence of 72.9% vs. a rational 50% baseline. (2) Confidence escalation: rather than reducing confidence as debates progressed, debaters increased their win probabilities, averaging 83% by the final round. (3) Mutual overestimation: in 61.7% of debates, both sides simultaneously claimed >=75% probability of victory, a logical impossibility. (4) Persistent self-debate bias: models debating identical copies increased confidence from 64.1% to 75.2%; even when explicitly informed their chance of winning was exactly 50%, confidence still rose (from 50.0% to 57.1%). (5) Misaligned private reasoning: models’ private scratchpad thoughts sometimes differed from their public confidence ratings, raising concerns about faithfulness of chain-of-thought reasoning. These results suggest LLMs lack the ability to accurately self-assess or update their beliefs in dynamic, multi-turn tasks; a major concern as LLMs are now increasingly deployed without careful review in assistant and agentic roles. Code for our experiments is available at https://github.com/pradyuprasad/llms_overconfidence
nan
Article 949
Title@2025-06-09 (1): Rethinking Crowd-Sourced Evaluation of Neuron Explanations
Title: Rethinking Crowd-Sourced Evaluation of Neuron Explanations | Rethinking Crowd-Sourced Evaluation von Neuron Erklärungen | 重新思考人群-源对内向解释的重新思考 2506.07985v1 |
Authors (4): Tuomas Oikarinen, Ge Yan, Akshay Kulkarni, Tsui-Wei Weng
Interpreting individual neurons or directions in activation space is an important component of mechanistic interpretability. As such, many algorithms have been proposed to automatically produce neuron explanations, but it is often not clear how reliable these explanations are, or which methods produce the best explanations. This can be measured via crowd-sourced evaluations, but they can often be noisy and expensive, leading to unreliable results. In this paper, we carefully analyze the evaluation pipeline and develop a cost-effective and highly accurate crowdsourced evaluation strategy. In contrast to previous human studies that only rate whether the explanation matches the most highly activating inputs, we estimate whether the explanation describes neuron activations across all inputs. To estimate this effectively, we introduce a novel application of importance sampling to determine which inputs are the most valuable to show to raters, leading to around 30x cost reduction compared to uniform sampling. We also analyze the label noise present in crowd-sourced evaluations and propose a Bayesian method to aggregate multiple ratings, leading to a further ~5x reduction in the number of ratings required for the same accuracy. Finally, we use these methods to conduct a large-scale study comparing the quality of neuron explanations produced by the most popular methods for two different vision models.
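Why importance sampling can cut rating cost is easiest to see on a toy problem. The sketch below is not the paper's estimator or metric. It estimates a simple stand-in quantity, the mean of activation times a 0/1 "explanation matches" label, and compares uniform sampling with sampling proportional to activation. All data are synthetic, and the names `uniform_estimate` and `importance_estimate` are hypothetical.

```python
import numpy as np

def uniform_estimate(f, n, rng):
    # Rate n inputs chosen uniformly at random and average f over them.
    idx = rng.integers(0, len(f), size=n)
    return f[idx].mean()

def importance_estimate(f, q, n, rng):
    # Rate n inputs drawn from proposal q; reweight by 1 / (N * q) to stay unbiased.
    idx = rng.choice(len(f), size=n, p=q)
    return np.mean(f[idx] / (len(f) * q[idx]))

rng = np.random.default_rng(0)
N = 5000
activation = rng.exponential(1.0, N) ** 3          # heavy-tailed synthetic activations
matches = (rng.random(N) < 0.7).astype(float)      # synthetic rater labels (0 or 1)
f = activation * matches
truth = f.mean()

# Proposal: show raters the strongly activating inputs more often.
q = activation + 1e-6
q = q / q.sum()

uni = np.array([uniform_estimate(f, 50, rng) for _ in range(200)])
imp = np.array([importance_estimate(f, q, 50, rng) for _ in range(200)])
```

Both estimators target `truth`, but with 50 ratings per trial the spread of `imp` across trials is much smaller than that of `uni`, because uniform sampling rarely hits the few inputs that dominate the sum. Fewer ratings are then needed for the same accuracy, which is the mechanism behind the cost reduction described above.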
nan
Article 950
Title@2025-06-09 (1): CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray
Title: CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray | CXR-LT 2024: Eine MICCAI-Herausforderung bei Langzeit-, Multilabel- und Zero-Shot-Erkrankungsklassifizierung aus der Bruströntgenuntersuchung | CXR-LT 2024:MICCAI对胸前X光中长尾、多标签和零发疾病分类的挑战 2506.07984v1 |
Authors (33): Mingquan Lin, Gregory Holste, Song Wang, Yiliang Zhou, Yishu Wei, Imon Banerjee, Pengyi Chen, Tianjie Dai, Yuexi Du, Nicha C. Dvornek, Yuyan Ge, Zuowei Guo, Shouhei Hanaoka, Dongkyun Kim, Pablo Messina, Yang Lu, Denis Parra, Donghyun Son, Álvaro Soto, Aisha Urooj, René Vidal, Yosuke Yamagishi, Zefan Yang, Ruichi Zhang, Yang Zhou, Leo Anthony Celi, Ronald M. Summers, Zhiyong Lu, Hao Chen, Adam Flanders, George Shih, Zhangyang Wang, Yifan Peng
The CXR-LT series is a community-driven initiative designed to enhance lung disease classification using chest X-rays (CXR). It tackles challenges in open long-tailed lung disease classification and enhances the measurability of state-of-the-art techniques. The first event, CXR-LT 2023, aimed to achieve these goals by providing high-quality benchmark CXR data for model development and conducting comprehensive evaluations to identify ongoing issues impacting lung disease classification performance. Building on the success of CXR-LT 2023, CXR-LT 2024 expands the dataset to 377,110 chest X-rays (CXRs) and 45 disease labels, including 19 new rare disease findings. It also introduces a new focus on zero-shot learning to address limitations identified in the previous event. Specifically, CXR-LT 2024 features three tasks: (i) long-tailed classification on a large, noisy test set, (ii) long-tailed classification on a manually annotated “gold standard” subset, and (iii) zero-shot generalization to five previously unseen disease findings. This paper provides an overview of CXR-LT 2024, detailing the data curation process and consolidating state-of-the-art solutions, including the use of multimodal models for rare disease detection, advanced generative approaches to handle noisy labels, and zero-shot learning strategies for unseen diseases. Additionally, the expanded dataset enhances disease coverage to better represent real-world clinical settings, offering a valuable resource for future research. By synthesizing the insights and innovations of participating teams, we aim to advance the development of clinically realistic and generalizable diagnostic models for chest radiography.
nan
Article 951
Title@2025-06-09 (1): Real-time Localization of a Soccer Ball from a Single Camera
Title: Real-time Localization of a Soccer Ball from a Single Camera | Echtzeit Lokalisierung eines Fußballballs aus einer einzigen Kamera | 单摄像头足球球的实时本地化 2506.07981v1 |
Authors (3): Dmitrii Vorobev, Artem Prosvetov, Karim Elhadji Daou
We propose a computationally efficient method for real-time three-dimensional football trajectory reconstruction from a single broadcast camera. In contrast to previous work, our approach introduces a multi-mode state model with $W$ discrete modes to significantly accelerate optimization while preserving centimeter-level accuracy – even in cases of severe occlusion, motion blur, and complex backgrounds. The system operates on standard CPUs and achieves low latency suitable for live broadcast settings. Extensive evaluation on a proprietary dataset of 6K-resolution Russian Premier League matches demonstrates performance comparable to multi-camera systems, without the need for specialized or costly infrastructure. This work provides a practical method for accessible and accurate 3D ball tracking in professional football environments.
nan
Article 952
Title@2025-06-09 (1): Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation
Title: Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation | Bipartite Ranking von mehreren Labels: Auf Verlust Versus Label Aggregation | 多标签的双边排名:关于损失的Versus Label聚合 2504.11284v2 |
Authors (10): Michal Lukasik, Lin Chen, Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Felix X. Yu, Sashank J. Reddi, Gang Fu, Mohammadhossein Bateni, Sanjiv Kumar
Bipartite ranking is a fundamental supervised learning problem, with the goal of learning a ranking over instances with maximal Area Under the ROC Curve (AUC) against a single binary target label. However, one may often observe multiple binary target labels, e.g., from distinct human annotators. How can one synthesize such labels into a single coherent ranking? In this work, we formally analyze two approaches to this problem – loss aggregation and label aggregation – by characterizing their Bayes-optimal solutions. We show that while both approaches can yield Pareto-optimal solutions, loss aggregation can exhibit label dictatorship: one can inadvertently (and undesirably) favor one label over others. This suggests that label aggregation can be preferable to loss aggregation, which we empirically verify.
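The difference between the two strategies can be made concrete with a small sketch. It assumes a pairwise AUC, an average of per-label AUCs as the loss-aggregated objective, and a logical OR as the label-merging rule; the OR rule, the function names, and the toy data are illustrative assumptions, not the paper's definitions.

```python
from itertools import product

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def loss_aggregated_auc(scores, label_sets):
    """Score the ranking against each label separately, then average the AUCs."""
    return sum(auc(scores, labels) for labels in label_sets) / len(label_sets)

def label_aggregated_auc(scores, label_sets):
    """Merge the labels first (logical OR, an assumed rule), then compute one AUC."""
    merged = [int(any(ys)) for ys in zip(*label_sets)]
    return auc(scores, merged)

# Toy data: one ranking, two annotators who disagree on items 2 and 3.
scores = [0.9, 0.7, 0.4, 0.1]
annotator_a = [1, 1, 0, 0]
annotator_b = [1, 0, 1, 0]

per_label = loss_aggregated_auc(scores, [annotator_a, annotator_b])
merged = label_aggregated_auc(scores, [annotator_a, annotator_b])
```

Here the ranking agrees fully with annotator A and only partly with annotator B, so the two objectives score the same ranking differently.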
Article 953
Title@2025-06-09 (1): Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator
Title: Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator | Realistischer städtischer Verkehrsgenerator mit dezentralisiertem Federated Learning für den SUMO-Simulator | 使用分散化的联邦学习来模拟SUMO的模拟器进行现实化城市交通发电机 2506.07980v1 |
Authors (9): Alberto Bazán-Guillén, Carlos Beis-Penedo, Diego Cajaraville-Aboy, Pablo Barbecho-Bautista, Rebeca P. Díaz-Redondo, Luis J. de la Cruz Llopis, Ana Fernández-Vilas, Mónica Aguilar Igartua, Manuel Fernández-Veiga
Realistic urban traffic simulation is essential for sustainable urban planning and the development of intelligent transportation systems. However, generating high-fidelity, time-varying traffic profiles that accurately reflect real-world conditions, especially in large-scale scenarios, remains a major challenge. Existing methods often suffer from limitations in accuracy, scalability, or raise privacy concerns due to centralized data processing. This work introduces DesRUTGe (Decentralized Realistic Urban Traffic Generator), a novel framework that integrates Deep Reinforcement Learning (DRL) agents with the SUMO simulator to generate realistic 24-hour traffic patterns. A key innovation of DesRUTGe is its use of Decentralized Federated Learning (DFL), wherein each traffic detector and its corresponding urban zone function as an independent learning node. These nodes train local DRL models using minimal historical data and collaboratively refine their performance by exchanging model parameters with selected peers (e.g., geographically adjacent zones), without requiring a central coordinator. Evaluated using real-world data from the city of Barcelona, DesRUTGe outperforms standard SUMO-based tools such as RouteSampler, as well as other centralized learning approaches, by delivering more accurate and privacy-preserving traffic pattern generation.
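The coordinator-free parameter exchange can be sketched as one round of neighbor averaging. This is a minimal illustration under assumptions: plain unweighted averaging, parameters stored as flat lists, and the zone names and `peer_average` helper are hypothetical rather than taken from DesRUTGe.

```python
def peer_average(params, neighbors):
    """One synchronous round: each node averages its parameters with its peers'."""
    updated = {}
    for node, own in params.items():
        # The node's own parameters plus those received from its selected peers.
        group = [own] + [params[p] for p in neighbors.get(node, [])]
        updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated

# Three zones on a line: a - b - c. No node sees the whole network.
params = {"zone_a": [1.0, 0.0], "zone_b": [3.0, 2.0], "zone_c": [5.0, 4.0]}
neighbors = {"zone_a": ["zone_b"], "zone_b": ["zone_a", "zone_c"], "zone_c": ["zone_b"]}
after = peer_average(params, neighbors)
```

Every update reads only the previous round's values, so the result does not depend on the order in which nodes are visited.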
Article 954
Title@2025-06-09 (1): Automated Capability Discovery via Foundation Model Self-Exploration
Title: Automated Capability Discovery via Foundation Model Self-Exploration | Automatisierte Capability Discovery über die Selbsterkundung von Fundamentmodellen | 通过基金会自我探索示范模式自动发现能力 2502.07577v3 |
Authors (3): Cong Lu, Shengran Hu, Jeff Clune
Foundation models have become general-purpose assistants, exhibiting diverse capabilities across numerous domains through training on web-scale data. It remains challenging to precisely characterize even a fraction of the full spectrum of these abilities and potential risks in any new model. Existing evaluation approaches often require significant human effort, and it is taking increasing effort to design ever harder challenges for more capable models. We introduce Automated Capability Discovery (ACD), a framework that designates one foundation model as a scientist to systematically propose open-ended tasks probing the abilities of a subject model (potentially itself). By combining frontier models with ideas from the field of open-endedness, ACD automatically and systematically uncovers a diverse spectrum of surprising capabilities and failures in the subject model. We demonstrate ACD across a range of foundation models (including the GPT, Claude, and Llama series), showing that it automatically generates thousands of distinct tasks, which are then clustered to reveal dozens of broader capability areas and failure modes, that would be challenging for any single team to uncover. We further validate our method’s automated scoring with extensive human surveys, observing high agreement between model-generated and human evaluations. By leveraging foundation models’ ability to both create tasks and self-evaluate, ACD is a significant step toward scalable, automated evaluation of novel AI systems. All code and evaluation logs are open-sourced at https://github.com/conglu1997/ACD.
Article 955
Title@2025-06-09 (1): Hyperpruning: Efficient Search through Pruned Variants of Recurrent Neural Networks Leveraging Lyapunov Spectrum
Title: Hyperpruning: Efficient Search through Pruned Variants of Recurrent Neural Networks Leveraging Lyapunov Spectrum | Hyperpruning: Effiziente Suche durch beschnittene Varianten von recurrenten Neuronalen Netzwerken, die Lyapunov Spectrum nutzen | 超光速运行:通过经常神经网络的审慎变式有效搜索,利用Lyapunov光谱网 2506.07975v1 |
Authors (2): Caleb Zheng, Eli Shlizerman
A variety of pruning methods have been introduced for over-parameterized Recurrent Neural Networks to improve efficiency in terms of power consumption and storage utilization. These advances motivate a new paradigm, termed ‘hyperpruning’, which seeks to identify the most suitable pruning strategy for a given network architecture and application. Unlike conventional hyperparameter search, where the optimal configuration’s accuracy remains uncertain, in the context of network pruning, the accuracy of the dense model sets the target for the accuracy of the pruned one. The goal, therefore, is to discover pruned variants that match or even surpass this established accuracy. However, exhaustive search over pruning configurations is computationally expensive and lacks early performance guarantees. To address this challenge, we propose a novel Lyapunov Spectrum (LS)-based distance metric that enables early comparison between pruned and dense networks, allowing accurate prediction of post-training performance. By integrating this LS-based distance with standard hyperparameter optimization algorithms, we introduce an efficient hyperpruning framework, termed LS-based Hyperpruning (LSH). LSH reduces search time by an order of magnitude compared to conventional approaches relying on full training. Experiments on stacked LSTM and RHN architectures using the Penn Treebank dataset, and on AWD-LSTM-MoS using WikiText-2, demonstrate that under fixed training budgets and target pruning ratios, LSH consistently identifies superior pruned models. Remarkably, these pruned variants not only outperform those selected by a loss-based baseline but also exceed the performance of their dense counterparts.
Article 956
Title@2025-06-09 (1): Intelligent Offloading in Vehicular Edge Computing: A Comprehensive Review of Deep Reinforcement Learning Approaches and Architectures
Title: Intelligent Offloading in Vehicular Edge Computing: A Comprehensive Review of Deep Reinforcement Learning Approaches and Architectures | Intelligentes Offloading im Fahrzeug Edge Computing: Eine umfassende Überprüfung von Deep Reinforcement-Lernansätzen und Architekturen | 在车辆边缘计算机中卸载:对深强化学习方法和架构的全面审查 2502.06963v2 |
Authors (3): Ashab Uddin, Ahmed Hamdi Sakr, Ning Zhang
The increasing complexity of Intelligent Transportation Systems (ITS) has led to significant interest in computational offloading to external infrastructures such as edge servers, vehicular nodes, and UAVs. These dynamic and heterogeneous environments pose challenges for traditional offloading strategies, prompting the exploration of Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) as adaptive decision-making frameworks. This survey presents a comprehensive review of recent advances in DRL-based offloading for vehicular edge computing (VEC). We classify and compare existing works based on learning paradigms (e.g., single-agent, multi-agent), system architectures (e.g., centralized, distributed, hierarchical), and optimization objectives (e.g., latency, energy, fairness). Furthermore, we analyze how Markov Decision Process (MDP) formulations are applied and highlight emerging trends in reward design, coordination mechanisms, and scalability. Finally, we identify open challenges and outline future research directions to guide the development of robust and intelligent offloading strategies for next-generation ITS.
Article 957
Title@2025-06-09 (1): HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization
Title: HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization | HeuriGym: Agentischer Benchmark für LLM-gekrempelte Heuristik in der kombinatorischen Optimierung | HeuriGym: 组合优化中LLM-减压希力学的代理基准 2506.07972v1 |
Authors (13): Hongzheng Chen, Yingheng Wang, Yaohui Cai, Hins Hu, Jiajie Li, Shirley Huang, Chenhui Deng, Rongjian Liang, Shufeng Kong, Haoxing Ren, Samitha Samaranayake, Carla P. Gomes, Zhiru Zhang
While Large Language Models (LLMs) have demonstrated significant advancements in reasoning and agent-based problem-solving, current evaluation methodologies fail to adequately assess their capabilities: existing benchmarks either rely on closed-ended questions prone to saturation and memorization, or subjective comparisons that lack consistency and rigor. In this work, we introduce HeuriGym, an agentic framework designed for evaluating heuristic algorithms generated by LLMs for combinatorial optimization problems, characterized by clearly defined objectives and expansive solution spaces. HeuriGym empowers LLMs to propose heuristics, receive evaluative feedback via code execution, and iteratively refine their solutions. We evaluate nine state-of-the-art models on nine problems across domains such as computer systems, logistics, and biology, exposing persistent limitations in tool use, planning, and adaptive reasoning. To quantify performance, we propose the Quality-Yield Index (QYI), a metric that captures both solution pass rate and quality. Even top models like GPT-o4-mini-high and Gemini-2.5-Pro attain QYI scores of only 0.6, well below the expert baseline of 1. Our open-source benchmark aims to guide the development of LLMs toward more effective and realistic problem-solving in scientific and engineering domains.
Article 958
Title@2025-06-09 (1): MIB: A Mechanistic Interpretability Benchmark
Title: MIB: A Mechanistic Interpretability Benchmark | MIB: Ein Mechanistischer Interpretations-Benchmark | MIB: 机械解释基准 2504.13151v2 |
Authors (23): Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov
How can we know whether new mechanistic interpretability methods achieve real improvements? In pursuit of lasting evaluation standards, we propose MIB, a Mechanistic Interpretability Benchmark, with two tracks spanning four tasks and five models. MIB favors methods that precisely and concisely recover relevant causal pathways or causal variables in neural language models. The circuit localization track compares methods that locate the model components - and connections between them - most important for performing a task (e.g., attribution patching or information flow routes). The causal variable localization track compares methods that featurize a hidden vector, e.g., sparse autoencoders (SAEs) or distributed alignment search (DAS), and align those features to a task-relevant causal variable. Using MIB, we find that attribution and mask optimization methods perform best on circuit localization. For causal variable localization, we find that the supervised DAS method performs best, while SAE features are not better than neurons, i.e., non-featurized hidden vectors. These findings illustrate that MIB enables meaningful comparisons, and increases our confidence that there has been real progress in the field.
Article 959
Title@2025-06-09 (1): A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling
Title: A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling | Ein zweiphasiges Deep-Learning-Framework für adaptive Zeitschritte in der High-Speed-Flow-Modellierung | 高频流建模适应性时间排行的两阶段深学习框架 2506.07969v1 |
Authors (14): Jacob Helwig, Sai Sreeharsha Adavi, Xuan Zhang, Yuchao Lin, Felix S. Chim, Luke Takeshi Vizzini, Haiyang Yu, Muhammad Hasnain, Saykat Kumar Biswas, John J. Holloway, Narendra Singh, N. K. Anand, Swagnik Guhathakurta, Shuiwang Ji
We consider the problem of modeling high-speed flows using machine learning methods. While most prior studies focus on low-speed fluid flows in which uniform time-stepping is practical, flows approaching and exceeding the speed of sound exhibit sudden changes such as shock waves. In such cases, it is essential to use adaptive time-stepping methods to allow a temporal resolution sufficient to resolve these phenomena while simultaneously balancing computational costs. Here, we propose a two-phase machine learning method, known as ShockCast, to model high-speed flows with adaptive time-stepping. In the first phase, we propose to employ a machine learning model to predict the timestep size. In the second phase, the predicted timestep is used as an input along with the current fluid fields to advance the system state by the predicted timestep. We explore several physically-motivated components for timestep prediction and introduce timestep conditioning strategies inspired by neural ODE and Mixture of Experts. As ShockCast is the first framework for learning high-speed flows, we evaluate our methods by generating two supersonic flow datasets, available at https://huggingface.co/datasets/divelab. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).
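The two-phase scheme amounts to a rollout loop in which a timestep is predicted first and the state is then advanced by it. The sketch below shows only that control flow; `toy_predict_dt` and `toy_advance` are hypothetical stand-ins for the two learned models, and the scalar decay problem is an assumed toy, not one of the paper's flow datasets.

```python
def rollout(state, t_end, predict_dt, advance):
    """Two-phase loop: (1) predict a timestep from the state, (2) advance by it."""
    t, trajectory = 0.0, [(0.0, state)]
    while t < t_end:
        dt = min(predict_dt(state), t_end - t)  # phase 1, clipped to land on t_end
        state = advance(state, dt)              # phase 2, conditioned on dt
        t += dt
        trajectory.append((t, state))
    return trajectory

# Hypothetical stand-ins for the two learned models.
def toy_predict_dt(u):
    return 0.1 / (1.0 + abs(u))    # smaller steps while the state is large

def toy_advance(u, dt):
    return u - dt * 5.0 * u        # explicit Euler step for du/dt = -5u

traj = rollout(4.0, 1.0, toy_predict_dt, toy_advance)
```

The steps are short while the state changes quickly and lengthen as it settles, which is the behavior adaptive time-stepping is meant to provide.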
Article 960
Title@2025-06-09 (1): Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs
Title: Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs | Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs | 物理知情神经溶解器中发现聚合物的神经内核内核分析:PIKANs诉PINNs 2506.07958v1 |
Authors (2): Salah A. Faroughi, Farinaz Mostajeran
Physics-informed Kolmogorov-Arnold Networks (PIKANs), and in particular their Chebyshev-based variants (cPIKANs), have recently emerged as promising models for solving partial differential equations (PDEs). However, their training dynamics and convergence behavior remain largely unexplored both theoretically and numerically. In this work, we aim to advance the theoretical understanding of cPIKANs by analyzing them using Neural Tangent Kernel (NTK) theory. Our objective is to discern the evolution of kernel structure throughout gradient-based training and its subsequent impact on learning efficiency. We first derive the NTK of standard cKANs in a supervised setting, and then extend the analysis to the physics-informed context. We analyze the spectral properties of NTK matrices, specifically their eigenvalue distributions and spectral bias, for four representative PDEs: the steady-state Helmholtz equation, transient diffusion and Allen-Cahn equations, and forced vibrations governed by the Euler-Bernoulli beam equation. We also conduct an investigation into the impact of various optimization strategies, e.g., first-order, second-order, and hybrid approaches, on the evolution of the NTK and the resulting learning dynamics. Results indicate a tractable behavior for NTK in the context of cPIKANs, which exposes learning dynamics that standard physics-informed neural networks (PINNs) cannot capture. Spectral trends also reveal when domain decomposition improves training, directly linking kernel behavior to convergence rates under different setups. To the best of our knowledge, this is the first systematic NTK study of cPIKANs, providing theoretical insight that clarifies and predicts their empirical performance.
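For orientation, the standard empirical NTK and the training dynamics it induces in the supervised setting can be written as follows. This is the textbook form, assuming gradient flow on the squared loss $\tfrac12\lVert f(X;\theta)-y\rVert^2$ over training inputs $X$ with targets $y$; it is not the cPIKAN-specific kernel derived in the paper.

```latex
\Theta_t(x, x') = \nabla_\theta f(x;\theta_t)^{\top}\, \nabla_\theta f(x';\theta_t),
\qquad
\frac{\mathrm{d}}{\mathrm{d}t}\, f(X;\theta_t)
  = -\,\Theta_t(X, X)\,\bigl(f(X;\theta_t) - y\bigr).
```

If the kernel stays approximately constant with eigenpairs $(\lambda_i, v_i)$, the residual component along $v_i$ decays like $e^{-\lambda_i t}$. Directions with small eigenvalues are therefore learned slowly, which is why the eigenvalue distribution is informative about spectral bias and convergence.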
Article 961
Title@2025-06-09 (1): Distillation Robustifies Unlearning
Title: Distillation Robustifies Unlearning | Destillation stärkt das Unlearning | 固化 固化 校正 校正 2506.06278v2 |
Authors (9): Bruce W. Lee, Addie Foote, Alex Infanger, Leni Shor, Harish Kamath, Jacob Goldman-Wetzler, Bryce Woodworth, Alex Cloud, Alexander Matt Turner
Current LLM unlearning methods are not robust: they can be reverted easily with a few steps of finetuning. This is true even for the idealized unlearning method of training to imitate an oracle model that was never exposed to unwanted information, suggesting that output-based finetuning is insufficient to achieve robust unlearning. In a similar vein, we find that training a randomly initialized student to imitate an unlearned model transfers desired behaviors while leaving undesired capabilities behind. In other words, distillation robustifies unlearning. Building on this insight, we propose Unlearn-Noise-Distill-on-Outputs (UNDO), a scalable method that distills an unlearned model into a partially noised copy of itself. UNDO introduces a tunable tradeoff between compute cost and robustness, establishing a new Pareto frontier on synthetic language and arithmetic tasks. At its strongest setting, UNDO matches the robustness of a model retrained from scratch with perfect data filtering while using only 60-80% of the compute and requiring only 0.01% of the pretraining data to be labeled. We also show that UNDO robustifies unlearning on the more realistic Weapons of Mass Destruction Proxy (WMDP) benchmark. Since distillation is widely used in practice, incorporating an unlearning step beforehand offers a convenient path to robust capability removal.
Article 962
Title@2025-06-09 (1): Language Models over Canonical Byte-Pair Encodings
Title: Language Models over Canonical Byte-Pair Encodings | Sprachmodelle über Canonical Byte-Pair Encodings | Canonicic 字节对称编码语言模型 2506.07956v1 |
Authors (10): Tim Vieira, Tianyu Liu, Clemente Pasti, Yahya Emara, Brian DuSell, Benjamin LeBrun, Mario Giulianelli, Juan Luis Gastaldi, Timothy J. O’Donnell, Ryan Cotterell
Modern language models represent probability distributions over character strings as distributions over (shorter) token strings derived via a deterministic tokenizer, such as byte-pair encoding. While this approach is highly effective at scaling up language models to large corpora, its current incarnations have a concerning property: the model assigns nonzero probability mass to an exponential number of noncanonical token encodings of each character string – these are token strings that decode to valid character strings but are impossible under the deterministic tokenizer (i.e., they will never be seen in any training corpus, no matter how large). This misallocation is both erroneous, as noncanonical strings never appear in training data, and wasteful, diverting probability mass away from plausible outputs. These are avoidable mistakes! In this work, we propose methods to enforce canonicality in token-level language models, ensuring that only canonical token strings are assigned positive probability. We present two approaches: (1) canonicality by conditioning, leveraging test-time inference strategies without additional training, and (2) canonicality by construction, a model parameterization that guarantees canonical outputs but requires training. We demonstrate that fixing canonicality mistakes improves the likelihood of held-out data for several models and corpora.
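Canonicality can be checked by a round trip: a token string is canonical exactly when the deterministic tokenizer reproduces it from its own decoding. The sketch below uses a toy BPE with two hypothetical merge rules (`MERGES` is an assumption for illustration, not a real vocabulary) and applies each rule in priority order.

```python
MERGES = [("a", "b"), ("ab", "c")]  # hypothetical merge rules, highest priority first

def encode(text):
    """Toy BPE: start from characters and apply each merge rule in priority order."""
    tokens = list(text)
    for left, right in MERGES:
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

def decode(tokens):
    """Concatenate token strings back into the character string."""
    return "".join(tokens)

def is_canonical(tokens):
    """Canonical iff the tokenizer reproduces the tokens from their own decoding."""
    return encode(decode(tokens)) == tokens

# Three token strings that all decode to "abc"; only the first is canonical.
candidates = [["abc"], ["ab", "c"], ["a", "b", "c"]]
flags = [is_canonical(c) for c in candidates]
```

A model that places probability on the second or third candidate is spending mass on strings the tokenizer can never produce, which is the misallocation described above.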
Article 963
Title@2025-06-09 (1): Discrete and Continuous Difference of Submodular Minimization
Title: Discrete and Continuous Difference of Submodular Minimization | Diskreter und kontinuierlicher Unterschied der Submodularen Minimierung | 子模块最小化的分质和连续差异 2506.07952v1 |
Authors (3): George Orfanides, Tim Hoheisel, Marwa El Halabi
Submodular functions, defined on continuous or discrete domains, arise in numerous applications. We study the minimization of the difference of two submodular (DS) functions, over both domains, extending prior work restricted to set functions. We show that all functions on discrete domains and all smooth functions on continuous domains are DS. For discrete domains, we observe that DS minimization is equivalent to minimizing the difference of two convex (DC) functions, as in the set function case. We propose a novel variant of the DC Algorithm (DCA) and apply it to the resulting DC Program, obtaining theoretical guarantees comparable to those in the set function case. The algorithm can be applied to continuous domains via discretization. Experiments demonstrate that our method outperforms baselines in integer compressive sensing and integer least squares.
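As background, the classical DC Algorithm that the proposed variant builds on alternates a subgradient step on the subtracted function with a convex subproblem. The form below is the standard DCA, stated under the assumption that $g$ and $h$ are convex; the paper's variant may differ in its details.

```latex
\min_{x} \; f(x) = g(x) - h(x), \qquad g,\, h \ \text{convex},
\\[4pt]
y^{k} \in \partial h(x^{k}), \qquad
x^{k+1} \in \operatorname*{arg\,min}_{x} \; g(x) - \langle y^{k}, x \rangle .
```

Convexity of $h$ gives $h(x^{k+1}) \ge h(x^{k}) + \langle y^{k}, x^{k+1} - x^{k} \rangle$, so the iteration is monotone: $f(x^{k+1}) \le f(x^{k})$.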
Article 964
Title@2025-06-09 (1): Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization
Title: Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization | Easy2Hard-Bench: Standardisierte Schwierigkeits-Labels für Profiling LLM Performance und Generalisierung | Easy2Hard-Bench:用于分析LLM性能和通用的标准化困难标签 2409.18433v2 |
Authors (11): Mucong Ding, Chenghao Deng, Jocelyn Choo, Zichu Wu, Aakriti Agrawal, Avi Schwarzschild, Tianyi Zhou, Tom Goldstein, John Langford, Anima Anandkumar, Furong Huang
While generalization over tasks from easy to hard is crucial for profiling large language models (LLMs), datasets with fine-grained difficulty annotations for each problem across a broad range of complexity are still lacking. Aiming to address this limitation, we present Easy2Hard-Bench, a consistently formatted collection of 6 benchmark datasets spanning various domains, such as mathematics and programming problems, chess puzzles, and reasoning questions. Each problem within these datasets is annotated with numerical difficulty scores. To systematically estimate problem difficulties, we collect abundant performance data on attempts at each problem by humans in the real world or by LLMs on a prominent leaderboard. Leveraging the rich performance data, we apply well-established difficulty ranking systems, such as Item Response Theory (IRT) and Glicko-2 models, to uniformly assign numerical difficulty scores to problems. Moreover, datasets in Easy2Hard-Bench distinguish themselves from previous collections by a higher proportion of challenging problems. Through extensive experiments with six state-of-the-art LLMs, we provide a comprehensive analysis of their performance and generalization capabilities across varying levels of difficulty, with the aim of inspiring future research in LLM generalization. The datasets are available at https://huggingface.co/datasets/furonghuang-lab/Easy2Hard-Bench.
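To show how attempt records can be turned into a difficulty score, the sketch below fits a one-parameter (Rasch) IRT model for a single problem. It assumes solver abilities are already known and held fixed, and it uses plain gradient ascent; the function names, step size, and toy attempt data are illustrative assumptions, not the benchmark's actual fitting procedure.

```python
import math

def p_correct(ability, difficulty):
    """1PL (Rasch) IRT: success probability depends only on ability minus difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def fit_difficulty(abilities, outcomes, steps=200, lr=0.5):
    """Maximum-likelihood difficulty of one problem, with abilities held fixed."""
    b = 0.0
    for _ in range(steps):
        # d(log-likelihood)/db = sum(p_i - y_i): raise b when the model is too optimistic.
        grad = sum(p_correct(a, b) - y for a, y in zip(abilities, outcomes))
        b += lr * grad / len(abilities)
    return b

# Four solvers of increasing ability attempt two problems (1 = solved).
abilities = [-1.0, 0.0, 1.0, 2.0]
hard = fit_difficulty(abilities, [0, 0, 0, 1])   # only the strongest solver succeeds
easy = fit_difficulty(abilities, [0, 1, 1, 1])   # only the weakest solver fails
```

The problem solved only by the strongest solver receives the higher difficulty, and because both scores come from the same model they sit on a common numerical scale.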
Article 965
Title@2025-06-09 (1): Cost-Optimal Active AI Model Evaluation
Title: Cost-Optimal Active AI Model Evaluation | Kostenoptimale, aktive KI-Modellbewertung | AI模型评价 2506.07949v1 |
Authors (5): Anastasios N. Angelopoulos, Jacob Eisenstein, Jonathan Berant, Alekh Agarwal, Adam Fisch
The development lifecycle of generative AI systems requires continual evaluation, data acquisition, and annotation, which is costly in both resources and time. In practice, rapid iteration often makes it necessary to rely on synthetic annotation data because of the low cost, despite the potential for substantial bias. In this paper, we develop novel, cost-aware methods for actively balancing the use of a cheap, but often inaccurate, weak rater – such as a model-based autorater that is designed to automatically assess the quality of generated content – with a more expensive, but also more accurate, strong rater alternative such as a human. More specifically, the goal of our approach is to produce a low variance, unbiased estimate of the mean of the target “strong” rating, subject to some total annotation budget. Building on recent work in active and prediction-powered statistical inference, we derive a family of cost-optimal policies for allocating a given annotation budget between weak and strong raters so as to maximize statistical efficiency. Using synthetic and real-world data, we empirically characterize the conditions under which these policies yield improvements over prior methods. We find that, especially in tasks where there is high variability in the difficulty of examples, our policies can achieve the same estimation precision at a far lower total annotation budget than standard evaluation methods.
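The way a cheap weak rating can be corrected by occasional strong ratings without introducing bias can be sketched with an inverse-probability-weighted estimator in the style of prediction-powered and active inference. The sketch assumes each item is sent to the strong rater independently with a given probability; `active_mean_estimate`, the probabilities, and the toy ratings are hypothetical, and the paper's cost-optimal allocation policies are not reproduced here.

```python
import random

def active_mean_estimate(weak, strong_oracle, probs, rng):
    """Estimate the mean strong rating while querying the strong rater only sometimes.

    Item i is sent to the strong rater with probability probs[i]. The weak rating
    is corrected by the residual divided by that probability, which keeps the
    estimate unbiased for the mean strong rating.
    """
    total, queries = 0.0, 0
    for i, (w, p) in enumerate(zip(weak, probs)):
        if rng.random() < p:
            total += w + (strong_oracle(i) - w) / p
            queries += 1
        else:
            total += w
    return total / len(weak), queries

# Toy ratings: the weak rater is informative but imperfect.
strong = [1.0, 0.0, 1.0, 1.0]
weak = [0.8, 0.3, 0.6, 0.9]
estimate, used = active_mean_estimate(weak, lambda i: strong[i], [0.5] * 4,
                                      random.Random(0))
```

Raising the query probability on items where the weak rater is likely to be wrong lowers the variance for the same budget, which is the kind of trade-off a cost-aware policy targets.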
Article 966
Title@2025-06-09 (1): TokenBreak: Bypassing Text Classification Models Through Token Manipulation
Title: TokenBreak: Bypassing Text Classification Models Through Token Manipulation | TokenBreak: Textklassifikationsmodelle durch Token-Manipulation umgehen | Token Break: 通过调制调控绕过文本分类模型 2506.07948v1 |
Authors (3): Kasimir Schulz, Kenneth Yeung, Kieran Evans
Natural Language Processing (NLP) models are used for text-related tasks such as classification and generation. To complete these tasks, input data is first tokenized from human-readable text into a format the model can understand, enabling it to make inferences and understand context. Text classification models can be implemented to guard against threats such as prompt injection attacks against Large Language Models (LLMs), toxic input and cybersecurity risks such as spam emails. In this paper, we introduce TokenBreak: a novel attack that can bypass these protection models by taking advantage of the tokenization strategy they use. This attack technique manipulates input text in such a way that certain models give an incorrect classification. Importantly, the end target (LLM or email recipient) can still understand and respond to the manipulated text and therefore be vulnerable to the very attack the protection model was put in place to prevent. The tokenizer is tied to model architecture, meaning it is possible to predict whether or not a model is vulnerable to attack based on family. We also present a defensive strategy as an added layer of protection that can be implemented without having to retrain the defensive model.
Article 967
Title@2025-06-09 (1): Info-Coevolution: An Efficient Framework for Data Model Coevolution
Title: Info-Coevolution: An Efficient Framework for Data Model Coevolution | Info-Coevolution: Ein effizienter Rahmen für die Datenmodellkoevolution | 信息革命:数据模型革命的有效框架 2506.08070v1 |
Authors (9): Ziheng Qin, Hailun Xu, Wei Chee Yew, Qi Jia, Yang Luo, Kanchan Sarkar, Danhui Guan, Kai Wang, Yang You
Machine learning relies heavily on data, yet the continuous growth of real-world data poses challenges for efficient dataset construction and training. A fundamental yet unsolved question is: given our current model and data, does new data (a sample or batch) need to be annotated and learned? Conventional approaches retain all available data, leading to non-optimal data and training efficiency. Active learning aims to reduce data redundancy by selecting a subset of samples to annotate, but it increases pipeline complexity and introduces bias. In this work, we propose Info-Coevolution, a novel framework that efficiently enables models and data to coevolve through online selective annotation with no bias. Leveraging task-specific models (and open-source models), it selectively annotates and integrates online and web data to improve datasets efficiently. For real-world datasets like ImageNet-1K, Info-Coevolution reduces annotation and training costs by 32% without performance loss. It determines the saving ratio automatically, without manual tuning. It can further reduce the annotation ratio to 50% with semi-supervised learning. We also explore retrieval-based dataset enhancement using unlabeled open-source data. Code is available at https://github.com/NUS-HPC-AI-Lab/Info-Coevolution/.
Article 968
Title@2025-06-09 (1): Gradients: When Markets Meet Fine-tuning – A Distributed Approach to Model Optimisation
Title: Gradients: When Markets Meet Fine-tuning – A Distributed Approach to Model Optimisation | Gradienten: Wenn Märkte Feinsteuerung treffen – Ein verteilter Ansatz zur Modelloptimierung | 渐进:市场达到微调时 – – 模型优化的分配办法 2506.07940v1 |
Authors (1): Christopher Subia-Waud
Foundation model fine-tuning faces a fundamental challenge: existing AutoML platforms rely on single optimisation strategies that explore only a fraction of viable hyperparameter configurations. In this white paper, we introduce Gradients, a decentralised AutoML platform that transforms hyperparameter optimisation into a competitive marketplace where independent miners compete to discover optimal configurations. Economic incentives align individual exploration with collective optimisation goals, driving systematic investigation of hyperparameter regions that centralised methods miss. We evaluate our approach across 180 controlled experiments spanning diverse model architectures (70M to 70B parameters) and task types. Gradients achieves an 82.8% win rate against HuggingFace AutoTrain and 100% against TogetherAI, Databricks, and Google Cloud, with mean improvements of 11.8% and 42.1% respectively. Complex reasoning and retrieval tasks show particularly strong gains of 30-40%, whilst diffusion models achieve 23.4% improvements for person-specific generation. These results demonstrate that competitive, economically-driven approaches can systematically discover superior configurations that centralised AutoML consistently miss.
Article 969
Title@2025-06-09 (1): Reliably detecting model failures in deployment without labels
Title: Reliably detecting model failures in deployment without labels | Zuverlässige Erkennung von Modellfehlern im Einsatz ohne Etiketten | 可靠地发现无标签部署中的模型失败 2506.05047v2 |
Authors (7): Viet Nguyen, Changjian Shui, Vijay Giri, Siddarth Arya, Amol Verma, Fahad Razak, Rahul G. Krishnan
The distribution of data changes over time; models operating in dynamic environments need retraining. But knowing when to retrain, without access to labels, is an open challenge since some, but not all, shifts degrade model performance. This paper formalizes and addresses the problem of post-deployment deterioration (PDD) monitoring. We propose D3M, a practical and efficient monitoring algorithm based on the disagreement of predictive models, which achieves low false positive rates under non-deteriorating shifts and provides sample complexity bounds for high true positive rates under deteriorating shifts. Empirical results on both standard benchmarks and a real-world large-scale internal medicine dataset demonstrate the effectiveness of the framework and highlight its viability as an alert mechanism for high-stakes machine learning pipelines.
Article 970
Title@2025-06-09 (1): Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models
Title: Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models | Mimicking oder Reasoning: Multi-Modal In-Context Learning in Vision-Language-Modellen neu denken | 模拟或解释:在视觉语言模型中重新思考多模式内学习 2506.07936v1 |
Authors (7): Chengyue Huang, Yuchen Zhu, Sichen Zhu, Jingyun Xiao, Moises Andrade, Shivang Chopra, Zsolt Kira
Vision-language models (VLMs) are widely assumed to exhibit in-context learning (ICL), a property similar to that of their language-only counterparts. While recent work suggests VLMs can perform multimodal ICL (MM-ICL), studies show they often rely on shallow heuristics – such as copying or majority voting – rather than true task understanding. We revisit this assumption by evaluating VLMs under distribution shifts, where support examples come from a dataset different from the query. Surprisingly, performance often degrades with more demonstrations, and models tend to copy answers rather than learn from them. To investigate further, we propose a new MM-ICL with Reasoning pipeline that augments each demonstration with a generated rationale alongside the answer. We conduct extensive and comprehensive experiments on both perception- and reasoning-required datasets with open-source VLMs ranging from 3B to 72B and proprietary models such as Gemini 2.0. We conduct controlled studies varying shot count, retrieval method, rationale quality, and distribution. Our results show limited performance sensitivity across these factors, suggesting that current VLMs do not effectively utilize demonstration-level information as intended in MM-ICL.
Article 971
Title@2025-06-09 (1): Ensemble-Based Survival Models with the Self-Attended Beran Estimator Predictions
Title: Ensemble-Based Survival Models with the Self-Attended Beran Estimator Predictions | Ensemble-basierte Überlebensmodelle mit den selbstbestimmten Beran-Estimator-Vorhersagen | 与自行预测的贝兰估算器合成生存模型 2506.07933v1 |
Authors (5): Lev V. Utkin, Semen P. Khomets, Vlada A. Efremenko, Andrei V. Konstantinov, Natalya M. Verbova
Survival analysis predicts the time until an event of interest, such as failure or death, but faces challenges due to censored data, where some events remain unobserved. Ensemble-based models, like random survival forests and gradient boosting, are widely used but can produce unstable predictions due to variations in bootstrap samples. To address this, we propose SurvBESA (Survival Beran Estimators Self-Attended), a novel ensemble model that combines Beran estimators with a self-attention mechanism. Unlike traditional methods, SurvBESA applies self-attention to predicted survival functions, smoothing out noise by adjusting each survival function based on its similarity to neighboring survival functions. We also explore a special case using Huber’s contamination model to define attention weights, simplifying training to a quadratic or linear optimization problem. Numerical experiments show that SurvBESA outperforms state-of-the-art models. The implementation of SurvBESA is publicly available.
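The smoothing idea in the abstract above, adjusting each predicted survival function according to its similarity to neighboring ones, can be illustrated with a small sketch. This is not the SurvBESA implementation: the distance-based softmax weights, the `temperature` parameter, and the function name `attention_smooth` are all assumptions chosen for illustration, and the toy exponential curves stand in for Beran estimator outputs.

```python
import numpy as np

def attention_smooth(S, temperature=1.0):
    """Smooth survival curves with softmax attention over curve similarity.

    S has shape (n_models, n_times); each row is one predicted survival
    function evaluated on a shared time grid. Hypothetical helper, not the
    authors' code.
    """
    # Pairwise squared Euclidean distances between curves.
    diff = S[:, None, :] - S[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    # Closer curves get larger attention weights (numerically stable softmax).
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)
    # Each smoothed curve is a convex combination of all curves.
    return W @ S

times = np.linspace(0.0, 5.0, 6)
# Two similar curves and one outlier, as a toy ensemble.
S = np.stack([np.exp(-0.30 * times), np.exp(-0.32 * times), np.exp(-0.90 * times)])
S_smooth = attention_smooth(S, temperature=0.1)
```

Because every output row is a convex combination of valid survival curves, the smoothed curves stay within [0, 1] and remain non-increasing in time.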
Article 972
Title@2025-06-09 (1): Predicting Bad Goods Risk Scores with ARIMA Time Series: A Novel Risk Assessment Approach
Title: Predicting Bad Goods Risk Scores with ARIMA Time Series: A Novel Risk Assessment Approach | Risiko-Scores mit ARIMA Time Series voraussagen: Ein neuartiger Ansatz zur Risikobewertung | ARIMA 时间系列:新风险评估方法 2502.16520v3 |
Authors (1): Bishwajit Prasad Gond
The increasing complexity of supply chains and the rising costs associated with defective or substandard goods (bad goods) highlight the urgent need for advanced predictive methodologies to mitigate risks and enhance operational efficiency. This research presents a novel framework that integrates Time Series ARIMA (AutoRegressive Integrated Moving Average) models with a proprietary formula specifically designed to calculate bad goods after time series forecasting. By leveraging historical data patterns, including sales, returns, and capacity, the model forecasts potential quality failures, enabling proactive decision-making. ARIMA is employed to capture temporal trends in time series data, while the newly developed formula quantifies the likelihood and impact of defects with greater precision. Experimental results, validated on a dataset spanning 2022-2024 for Organic Beer-G 1 Liter, demonstrate that the proposed method outperforms traditional statistical models, such as Exponential Smoothing and Holt-Winters, in both prediction accuracy and risk evaluation. This study advances the field of predictive analytics by bridging time series forecasting, ARIMA, and risk management in supply chain quality control, offering a scalable and practical solution for minimizing losses due to bad goods.
Article 973
Title@2025-06-09 (1): Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor
Title: Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor | Squeeze3D: Ihr Modell der 3D-Generation ist geheim ein extremer Neuralkompressor | Squeeze3D: 您的 3D 代的模型是一个秘密的 极端神经压缩器 2506.07932v1 |
Authors (5): Rishit Dagli, Yushi Guan, Sankeerth Durvasula, Mohammadreza Mofayezi, Nandita Vijaykumar
We propose Squeeze3D, a novel framework that leverages implicit prior knowledge learnt by existing pre-trained 3D generative models to compress 3D data at extremely high compression ratios. Our approach bridges the latent spaces between a pre-trained encoder and a pre-trained generation model through trainable mapping networks. Any 3D model represented as a mesh, point cloud, or a radiance field is first encoded by the pre-trained encoder and then transformed (i.e. compressed) into a highly compact latent code. This latent code can effectively be used as an extremely compressed representation of the mesh or point cloud. A mapping network transforms the compressed latent code into the latent space of a powerful generative model, which is then conditioned to recreate the original 3D model (i.e. decompression). Squeeze3D is trained entirely on generated synthetic data and does not require any 3D datasets. The Squeeze3D architecture can be flexibly used with existing pre-trained 3D encoders and existing generative models. It can flexibly support different formats, including meshes, point clouds, and radiance fields. Our experiments demonstrate that Squeeze3D achieves compression ratios of up to 2187x for textured meshes, 55x for point clouds, and 619x for radiance fields while maintaining visual quality comparable to many existing methods. Squeeze3D only incurs a small compression and decompression latency since it does not involve training object-specific networks to compress an object.
Article 974
Title@2025-06-09 (1): Missing Data Imputation by Reducing Mutual Information with Rectified Flows
Title: Missing Data Imputation by Reducing Mutual Information with Rectified Flows | Fehlende Daten-Imputation durch Verringerung gegenseitiger Informationen mit rektifizierten Strömen | 通过减少相互信息与纠正流流 2505.11749v2 |
Authors (5): Jiahao Yu, Qizhen Ying, Leyang Wang, Ziyue Jiang, Song Liu
This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and their corresponding missing mask. Inspired by GAN-based approaches, which train generators to decrease the predictability of missingness patterns, our method explicitly targets the reduction of mutual information. Specifically, our algorithm iteratively minimizes the KL divergence between the joint distribution of the imputed data and missing mask, and the product of their marginals from the previous iteration. We show that the optimal imputation under this framework corresponds to solving an ODE, whose velocity field minimizes a rectified flow training objective. We further illustrate that some existing imputation techniques can be interpreted as approximate special cases of our mutual-information-reducing framework. Comprehensive experiments on synthetic and real-world datasets validate the efficacy of our proposed approach, demonstrating superior imputation performance.
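For reference, the quantity being reduced can be written out using the standard definition of mutual information as a KL divergence. The symbols $X$ (imputed data) and $M$ (missingness mask) are notation assumed here, not taken from the paper, and the paper's iterative objective (which uses marginals from the previous iteration) may differ in detail.

```latex
I(X; M)
  = D_{\mathrm{KL}}\bigl(p(x, m) \,\|\, p(x)\, p(m)\bigr)
  = \mathbb{E}_{p(x, m)}\!\left[\log \frac{p(x, m)}{p(x)\, p(m)}\right]
  \;\ge\; 0,
\qquad
I(X; M) = 0 \iff X \perp M .
```

Driving this quantity toward zero means the imputed data carry no information about which entries were originally missing.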
Article 975
Title@2025-06-09 (1): A Generative Physics-Informed Reinforcement Learning-Based Approach for Construction of Representative Drive Cycle
Title: A Generative Physics-Informed Reinforcement Learning-Based Approach for Construction of Representative Drive Cycle | Ein generativer physik-informierter, lernbasierter Ansatz zur Konstruktion repräsentativer Antriebszyklen | 为建造具有代表性的驱动周期而采用基于学习的创形物理综合强化强化学习方法 2506.07929v1 |
Authors (4): Amirreza Yasami, Mohammadali Tofigh, Mahdi Shahbakhti, Charles Robert Koch
Accurate driving cycle construction is crucial for vehicle design, fuel economy analysis, and environmental impact assessments. We introduce a generative Physics-Informed Expected SARSA-Monte Carlo (PIESMC) approach that constructs representative driving cycles by capturing transient dynamics, acceleration, deceleration, idling, and road grade transitions while ensuring model fidelity. Leveraging a physics-informed reinforcement learning framework with Monte Carlo sampling, PIESMC delivers efficient cycle construction with reduced computational cost. Experimental evaluations on two real-world datasets demonstrate that PIESMC replicates key kinematic and energy metrics, achieving up to a 57.3% reduction in cumulative kinematic fragment errors compared to the Micro-trip-based (MTB) method and a 10.5% reduction relative to the Markov-chain-based (MCB) method. Moreover, it is nearly an order of magnitude faster than conventional techniques. Analyses of vehicle-specific power distributions and wavelet-transformed frequency content further confirm its ability to reproduce experimental central tendencies and variability.
Article 976
Title@2025-06-09 (1): Solving Inequality Proofs with Large Language Models
Title: Solving Inequality Proofs with Large Language Models | Unequality Proofs mit großen Sprachmodellen lösen | 以大语言模式解决不平等证据 2506.07927v1 |
Authors (7): Jiayi Sheng, Luna Lyu, Jikai Jin, Tony Xia, Alex Gu, James Zou, Pan Lu
Inequality proving, crucial across diverse scientific and mathematical fields, tests advanced reasoning skills such as discovering tight bounds and strategic theorem application. This makes it a distinct, demanding frontier for large language models (LLMs), offering insights beyond general mathematical problem-solving. Progress in this area is hampered by existing datasets that are often scarce, synthetic, or rigidly formal. We address this by proposing an informal yet verifiable task formulation, recasting inequality proving into two automatically checkable subtasks: bound estimation and relation prediction. Building on this, we release IneqMath, an expert-curated dataset of Olympiad-level inequalities, including a test set and training corpus enriched with step-wise solutions and theorem annotations. We also develop a novel LLM-as-judge evaluation framework, combining a final-answer judge with four step-wise judges designed to detect common reasoning flaws. A systematic evaluation of 29 leading LLMs on IneqMath reveals a surprising reality: even top models like o1 achieve less than 10% overall accuracy under step-wise scrutiny; this is a drop of up to 65.5% from their accuracy considering only final answer equivalence. This discrepancy exposes fragile deductive chains and a critical gap for current LLMs between merely finding an answer and constructing a rigorous proof. Scaling model size and increasing test-time computation yield limited gains in overall proof correctness. Instead, our findings highlight promising research directions such as theorem-guided reasoning and self-refinement. Code and data are available at https://ineqmath.github.io/.
Article 977
Title@2025-06-09 (1): A Comparative Study of U-Net Architectures for Change Detection in Satellite Images
Title: A Comparative Study of U-Net Architectures for Change Detection in Satellite Images | Eine vergleichende Studie über U-Net-Architekturen zur Erkennung von Veränderungen in Satellitenbildern | U-Net图象变化探测网络结构比较研究 2506.07925v1 |
Authors (3): Yaxita Amin, Naimisha S Trivedi, Rashmi Bhattad
Remote sensing change detection is essential for monitoring the ever-changing landscapes of the Earth. The U-Net architecture has gained popularity for its capability to capture spatial information and perform pixel-wise classification. However, its application in the remote sensing field remains largely unexplored. This paper therefore fills the gap by conducting a comprehensive analysis of 34 papers. The study compares and analyzes 18 different U-Net variations, assessing their potential for detecting changes in remote sensing. We evaluate both the benefits and the drawbacks of each variation within the framework of this particular application. We emphasize variations that are explicitly built for change detection, such as Siamese Swin-U-Net, which utilizes a Siamese architecture. The analysis highlights the significance of aspects such as managing data from different time periods and capturing long-range relationships to enhance the precision of change detection. This study provides valuable insights for researchers and practitioners who choose U-Net versions for remote sensing change detection tasks.
Article 978
Title@2025-06-09 (1): W4S4: WaLRUS Meets S4 for Long-Range Sequence Modeling
Title: W4S4: WaLRUS Meets S4 for Long-Range Sequence Modeling | W4S4: WaLRUS trifft S4 für lange Sequenzmodellierung | W4S4:WALRUS在长程序列建模中与S4相匹配 2506.07920v1 |
Authors (3): Hossein Babaei, Mel White, Richard G. Baraniuk
State Space Models (SSMs) have emerged as powerful components for sequence modeling, enabling efficient handling of long-range dependencies via linear recurrence and convolutional computation. However, their effectiveness depends heavily on the choice and initialization of the state matrix. In this work, we build on the SaFARi framework and existing WaLRUS SSMs to introduce W4S4 (WaLRUS for S4), a new class of SSMs constructed from redundant wavelet frames. WaLRUS admits a stable diagonalization and supports fast kernel computation without requiring low-rank approximations, making it both theoretically grounded and computationally efficient. We show that WaLRUS retains information over long horizons significantly better than HiPPO-based SSMs, both in isolation and when integrated into deep architectures such as S4. Our experiments demonstrate consistent improvements across delay reconstruction tasks, classification benchmarks, and long-range sequence modeling, confirming that high-quality, structured initialization enabled by wavelet-based state dynamics offers substantial advantages over existing alternatives. WaLRUS provides a scalable and versatile foundation for the next generation of deep SSM-based models.
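The "linear recurrence and convolutional computation" mentioned above refers to a general property of discrete linear SSMs: the same output can be computed step by step or as a causal convolution with the kernel $K_j = C A^j B$. The sketch below checks this equivalence on a toy system. It is generic, not the W4S4 or WaLRUS construction; the random stable state matrix and all function names are assumptions for illustration.

```python
import numpy as np

def ssm_recurrence(A, B, C, u):
    """Run x_k = A x_{k-1} + B u_k, y_k = C x_k from a zero initial state."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B * u_k
        ys.append(C @ x)
    return np.array(ys)

def ssm_kernel(A, B, C, length):
    """Materialize the convolution kernel K_j = C A^j B for j < length."""
    K = []
    v = B.copy()
    for _ in range(length):
        K.append(C @ v)
        v = A @ v
    return np.array(K)

def ssm_convolution(A, B, C, u):
    """Compute the same outputs as a causal convolution of u with the kernel."""
    K = ssm_kernel(A, B, C, len(u))
    return np.convolve(u, K)[: len(u)]

rng = np.random.default_rng(0)
# A scaled orthogonal matrix has spectral radius 0.9, so the system is stable.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = 0.9 * Q
B = rng.standard_normal(4)
C = rng.standard_normal(4)
u = rng.standard_normal(32)

y_rec = ssm_recurrence(A, B, C, u)
y_conv = ssm_convolution(A, B, C, u)
```

How quickly the kernel entries decay with `j` is governed by the state matrix, which is why its choice and initialization matter for long-range memory.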
Article 979
Title@2025-06-09 (1): Uncovering the Functional Roles of Nonlinearity in Memory
Title: Uncovering the Functional Roles of Nonlinearity in Memory | Enthüllen der funktionalen Rollen der Nichtlinearität im Gedächtnis | 发挥记忆中非线性的作用 2506.07919v1 |
Authors (2): Manuel Brenner, Georgia Koppe
Memory and long-range temporal processing are core requirements for sequence modeling tasks across natural language processing, time-series forecasting, speech recognition, and control. While nonlinear recurrence has long been viewed as essential for enabling such mechanisms, recent work suggests that linear dynamics may often suffice. In this study, we go beyond performance comparisons to systematically dissect the functional role of nonlinearity in recurrent networks, identifying both when it is computationally necessary and what mechanisms it enables. We use Almost Linear Recurrent Neural Networks (AL-RNNs), which allow fine-grained control over nonlinearity, as both a flexible modeling tool and a probe into the internal mechanisms of memory. Across a range of classic sequence modeling tasks and a real-world stimulus selection task, we find that minimal nonlinearity is not only sufficient but often optimal, yielding models that are simpler, more robust, and more interpretable than their fully nonlinear or linear counterparts. Our results provide a principled framework for selectively introducing nonlinearity, bridging dynamical systems theory with the functional demands of long-range memory and structured computation in recurrent neural networks, with implications for both artificial and biological neural systems.
Article 980
Title@2025-06-09 (1): CausalPFN: Amortized Causal Effect Estimation via In-Context Learning
Title: CausalPFN: Amortized Causal Effect Estimation via In-Context Learning | CausalPFN: Amortisierte Abschätzung des Kausaleffekts durch In-Context Learning | CausalPFN: 通过内文学习对因果效应的摊销估计 2506.07918v1 |
Authors (7): Vahid Balazadeh, Hamidreza Kamkari, Valentin Thomas, Benson Li, Junwei Ma, Jesse C. Cresswell, Rahul G. Krishnan
Causal effect estimation from observational data is fundamental across various applications. However, selecting an appropriate estimator from dozens of specialized methods demands substantial manual effort and domain expertise. We present CausalPFN, a single transformer that amortizes this workflow: trained once on a large library of simulated data-generating processes that satisfy ignorability, it infers causal effects for new observational datasets out-of-the-box. CausalPFN combines ideas from Bayesian causal inference with the large-scale training protocol of prior-fitted networks (PFNs), learning to map raw observations directly to causal effects without any task-specific adjustment. Our approach achieves superior average performance on heterogeneous and average treatment effect estimation benchmarks (IHDP, Lalonde, ACIC). Moreover, it shows competitive performance for real-world policy making on uplift modeling tasks. CausalPFN provides calibrated uncertainty estimates to support reliable decision-making based on Bayesian principles. This ready-to-use model does not require any further training or tuning and takes a step toward automated causal inference (https://github.com/vdblm/CausalPFN).
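For readers unfamiliar with the estimands named above, the standard potential-outcomes definitions are as follows. The notation ($Y(0)$, $Y(1)$ for potential outcomes, $T$ for treatment, $X$ for covariates) is assumed here and may not match the paper's.

```latex
\text{ATE} = \mathbb{E}\bigl[Y(1) - Y(0)\bigr],
\qquad
\tau(x) = \mathbb{E}\bigl[Y(1) - Y(0) \mid X = x\bigr] \quad \text{(CATE)},
\\[4pt]
\text{Ignorability: } \bigl(Y(0), Y(1)\bigr) \perp T \mid X
\;\;\Longrightarrow\;\;
\tau(x) = \mathbb{E}[Y \mid T = 1, X = x] - \mathbb{E}[Y \mid T = 0, X = x],
```

where the identification on the right also requires overlap, $0 < P(T = 1 \mid X = x) < 1$.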
Article 981
Title@2025-06-09 (1): Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces
Title: Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces | Diffuse Alles: Multimodale Diffusionsmodelle auf willkürlichen Staatsräumen | Diffuse Everything:关于任意国家空间的多模式扩散模型 2506.07903v1 |
Authors (5): Kevin Rojas, Yuchen Zhu, Sichen Zhu, Felix X. -F. Ye, Molei Tao
Diffusion models have demonstrated remarkable performance in generating unimodal data across various tasks, including image, video, and text generation. In contrast, the joint generation of multimodal data through diffusion models is still in the early stages of exploration. Existing approaches rely heavily on external preprocessing protocols, such as tokenizers and variational autoencoders, to harmonize varied data representations into a unified, unimodal format. This process requires highly accurate encoders and decoders, which can be problematic for applications with limited data. To lift this restriction, we propose a novel framework for building multimodal diffusion models on arbitrary state spaces, enabling native generation of coupled data across different modalities. By introducing an innovative decoupled noise schedule for each modality, we enable both unconditional and modality-conditioned generation within a single model simultaneously. We empirically validate our approach for text-image generation and mixed-type tabular data synthesis, demonstrating that it achieves competitive performance.
Article 982
Title@2025-06-09 (1): FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling
Title: FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling | FunDiff: Diffusionsmodelle über Funktionsräume für physikinformierte Generative Modellierung | FunDiff:物理成形生成模型的功能空间上的传播模型 2506.07902v1 |
Authors (4): Sifan Wang, Zehao Dou, Tong-Rui Liu, Lu Lu
Recent advances in generative modeling – particularly diffusion models and flow matching – have achieved remarkable success in synthesizing discrete data such as images and videos. However, adapting these models to physical applications remains challenging, as the quantities of interest are continuous functions governed by complex physical laws. Here, we introduce $\textbf{FunDiff}$, a novel framework for generative modeling in function spaces. FunDiff combines a latent diffusion process with a function autoencoder architecture to handle input functions with varying discretizations, generate continuous functions evaluable at arbitrary locations, and seamlessly incorporate physical priors. These priors are enforced through architectural constraints or physics-informed loss functions, ensuring that generated samples satisfy fundamental physical laws. We theoretically establish minimax optimality guarantees for density estimation in function spaces, showing that diffusion-based estimators achieve optimal convergence rates under suitable regularity conditions. We demonstrate the practical effectiveness of FunDiff across diverse applications in fluid dynamics and solid mechanics. Empirical results show that our method generates physically consistent samples with high fidelity to the target distribution and exhibits robustness to noisy and low-resolution data. Code and datasets are publicly available at https://github.com/sifanexisted/fundiff.
Article 983
Title@2025-06-09 (1): MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs
Title: MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs | MEMOIR: Lebenslange Modellbearbeitung mit minimalem Überschreiben und informierter Aufbewahrung für LLMs | MEMOIR: 长寿命模型编辑,以最小覆盖和知情保留LLMM 2506.07899v1 |
Authors (5): Ke Wang, Yiming Qin, Nikolaos Dimitriadis, Alessandro Favero, Pascal Frossard
Language models deployed in real-world systems often require post-hoc updates to incorporate new or corrected knowledge. However, editing such models efficiently and reliably - without retraining or forgetting previous information - remains a major challenge. Existing methods for lifelong model editing either compromise generalization, interfere with past edits, or fail to scale to long editing sequences. We propose MEMOIR, a novel scalable framework that injects knowledge through a residual memory, i.e., a dedicated parameter module, while preserving the core capabilities of the pre-trained model. By sparsifying input activations through sample-dependent masks, MEMOIR confines each edit to a distinct subset of the memory parameters, minimizing interference among edits. At inference, it identifies relevant edits by comparing the sparse activation patterns of new queries to those stored during editing. This enables generalization to rephrased queries by activating only the relevant knowledge while suppressing unnecessary memory activation for unrelated prompts. Experiments on question answering, hallucination correction, and out-of-distribution generalization benchmarks across LLaMA-3 and Mistral demonstrate that MEMOIR achieves state-of-the-art performance across reliability, generalization, and locality metrics, scaling to thousands of sequential edits with minimal forgetting.
Article 984
Title@2025-06-09 (1): GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution
Title: GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution | GaussianVAE: Adaptive Lerndynamik von 3D Gaussianern für High-Fidelity Super-Resolution | GaussianVAE: 高频高频高频高频高频高分辨率3D的适应性学习动力 2506.07897v1 |
Authors (3): Shuja Khalid, Mohamed Ibrahim, Yang Liu
We present a novel approach for enhancing the resolution and geometric fidelity of 3D Gaussian Splatting (3DGS) beyond native training resolution. Current 3DGS methods are fundamentally limited by their input resolution, producing reconstructions that cannot extrapolate finer details than are present in the training views. Our work breaks this limitation through a lightweight generative model that predicts and refines additional 3D Gaussians where needed most. The key innovation is our Hessian-assisted sampling strategy, which intelligently identifies regions that are likely to benefit from densification, ensuring computational efficiency. Unlike computationally intensive GANs or diffusion approaches, our method operates in real-time (0.015s per inference on a single consumer-grade GPU), making it practical for interactive applications. Comprehensive experiments demonstrate significant improvements in both geometric accuracy and rendering quality compared to state-of-the-art methods, establishing a new paradigm for resolution-free 3D scene enhancement.
Article 985
Title@2025-06-09 (1): Learning to Stop: Deep Learning for Mean Field Optimal Stopping
Title: Learning to Stop: Deep Learning for Mean Field Optimal Stopping | Stoppen lernen: Deep Learning für das mittlere Feld Optimales Stoppen | 学习到停止:为平均字段进行深度学习 2410.08850v2 |
Authors (3): Lorenzo Magnino, Yuchen Zhu, Mathieu Laurière
Optimal stopping is a fundamental problem in optimization with applications in risk management, finance, robotics, and machine learning. We extend the standard framework to a multi-agent setting, named multi-agent optimal stopping (MAOS), where agents cooperate to make optimal stopping decisions in a finite-space, discrete-time environment. Since solving MAOS becomes computationally prohibitive when the number of agents is very large, we study the mean-field optimal stopping (MFOS) problem, obtained as the number of agents tends to infinity. We establish that MFOS provides a good approximation to MAOS and prove a dynamic programming principle (DPP) based on mean-field control theory. We then propose two deep learning approaches: one that learns optimal stopping decisions by simulating full trajectories and another that leverages the DPP to compute the value function and to learn the optimal stopping rule using backward induction. Both methods train neural networks to approximate optimal stopping policies. We demonstrate the effectiveness and the scalability of our work through numerical experiments on 6 different problems in spatial dimension up to 300. To the best of our knowledge, this is the first work to formalize and computationally solve MFOS in discrete time and finite space, opening new directions for scalable MAOS methods.
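As background for the backward-induction approach mentioned above, here is a minimal sketch of the classical single-agent version in finite space and discrete time. It is not the paper's mean-field or neural-network method: the reward-maximization convention, the random-walk dynamics, the reward vector `g`, and the function name `backward_induction` are all assumptions for illustration.

```python
import numpy as np

def backward_induction(P, g, T):
    """Solve a finite-horizon optimal stopping problem by dynamic programming.

    P: (n, n) row-stochastic transition matrix of the underlying chain.
    g: (n,) reward collected when stopping in each state.
    T: horizon; stopping is forced at time T.
    Returns the value table V of shape (T + 1, n) and a boolean stop policy.
    """
    n = len(g)
    V = np.zeros((T + 1, n))
    stop = np.zeros((T + 1, n), dtype=bool)
    V[T] = g          # at the horizon the agent must stop
    stop[T] = True
    for t in range(T - 1, -1, -1):
        cont = P @ V[t + 1]           # expected value of continuing one step
        V[t] = np.maximum(g, cont)    # stop or continue, whichever is better
        stop[t] = g >= cont
    return V, stop

# Toy chain: reflecting random walk on 5 states.
n = 5
P = np.zeros((n, n))
for i in range(n):
    P[i, max(i - 1, 0)] += 0.5
    P[i, min(i + 1, n - 1)] += 0.5
g = np.array([0.0, 1.0, 0.0, 3.0, 0.0])
V, stop = backward_induction(P, g, T=10)
```

In the mean-field setting the state is a distribution over agents rather than a single index, which is what makes neural approximation of the value function and stopping rule attractive.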
Article 986
Title@2025-06-09 (1): SMRS: advocating a unified reporting standard for surrogate models in the artificial intelligence era
Title: SMRS: advocating a unified reporting standard for surrogate models in the artificial intelligence era | SMRS: Für einen einheitlichen Berichterstattungsstandard für Surrogatmodelle im Zeitalter der künstlichen Intelligenz | SMRS:倡导人工智能时代代用模型的统一报告标准 2502.06753v2 |
Authors (5): Elizaveta Semenova, Alisa Sheinkman, Timothy James Hitge, Siobhan Mackenzie Hall, Jon Cockayne
Surrogate models are widely used to approximate complex systems across science and engineering to reduce computational costs. Despite their widespread adoption, the field lacks standardisation across key stages of the modelling pipeline, including data sampling, model selection, evaluation, and downstream analysis. This fragmentation limits reproducibility and cross-domain utility – a challenge further exacerbated by the rapid proliferation of AI-driven surrogate models. We argue for the urgent need to establish a structured reporting standard, the Surrogate Model Reporting Specification (SMRS), that systematically captures essential design and evaluation choices while remaining agnostic to implementation specifics. By promoting a standardised yet flexible framework, we aim to improve the reliability of surrogate modelling, foster interdisciplinary knowledge transfer, and, as a result, accelerate scientific progress in the AI era.
Article 987
Title@2025-06-09 (1): SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark
Title: SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark | SoK: Datenrekonstruktion greift gegen Modelle des maschinellen Lernens an: Definition, Metrics und Benchmark | SoK:对机器学习模式进行数据重建攻击:定义、计量和基准 2506.07888v1 |
Authors (4): Rui Wen, Yiyong Liu, Michael Backes, Yang Zhang
Data reconstruction attacks, which aim to recover the training dataset of a target model with limited access, have gained increasing attention in recent years. However, there is currently no consensus on a formal definition of data reconstruction attacks or appropriate evaluation metrics for measuring their quality. This lack of rigorous definitions and universal metrics has hindered further advancement in this field. In this paper, we address this issue in the vision domain by proposing a unified attack taxonomy and formal definitions of data reconstruction attacks. We first propose a set of quantitative evaluation metrics that consider important criteria such as quantifiability, consistency, precision, and diversity. Additionally, we leverage large language models (LLMs) as a substitute for human judgment, enabling visual evaluation with an emphasis on high-quality reconstructions. Using our proposed taxonomy and metrics, we present a unified framework for systematically evaluating the strengths and limitations of existing attacks and establishing a benchmark for future research. Empirical results, primarily from a memorization perspective, not only validate the effectiveness of our metrics but also offer valuable insights for designing new attacks.
Article 988
Title@2025-06-09 (1): SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection
Title: SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection | SimLTD: Einfache, überwachte und halbüberwachte Langzeitobjekterkennung | SimLTD: 简单监视和半监视长途远航物体探测 2412.20047v3 |
Authors (1): Phi Vu Tran
While modern visual recognition systems have made significant advancements, many continue to struggle with the open problem of learning from few exemplars. This paper focuses on the task of object detection in the setting where object classes follow a natural long-tailed distribution. Existing methods for long-tailed detection resort to external ImageNet labels to augment the low-shot training instances. However, such dependency on a large labeled database has limited utility in practical scenarios. We propose a versatile and scalable approach to leverage optional unlabeled images, which are easy to collect without the burden of human annotations. Our SimLTD framework is straightforward and intuitive, and consists of three simple steps: (1) pre-training on abundant head classes; (2) transfer learning on scarce tail classes; and (3) fine-tuning on a sampled set of both head and tail classes. Our approach can be viewed as an improved head-to-tail model transfer paradigm without the added complexities of meta-learning or knowledge distillation, as was required in past research. By harnessing supplementary unlabeled images, without extra image labels, SimLTD establishes new record results on the challenging LVIS v1 benchmark across both supervised and semi-supervised settings.
Article 989
Title@2025-06-09 (1): Schauder Bases for $C[0, 1]$ Using ReLU, Softplus and Two Sigmoidal Functions
Title: Schauder Bases for $C[0, 1]$ Using ReLU, Softplus and Two Sigmoidal Functions | Schauder-Basis für $C[0, 1]$ mit ReLU, Softplus und zwei Sigmoidal-Funktionen | 使用 ReLU、 Softplus 和两个悬浮函数的 $C[0, 1]$ Schauder 基地 2506.07884v1 |
Authors (3): Anand Ganesh, Babhrubahan Bose, Anand Rajagopalan
We construct four Schauder bases for the space $C[0,1]$, one using ReLU functions, another using Softplus functions, and two more using sigmoidal versions of the ReLU and Softplus functions. This establishes the existence of a basis using these functions for the first time, and improves on the universal approximation property associated with them.
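To give a feel for why ReLU functions are natural building blocks here, the sketch below writes the classical Faber-Schauder hat function as a combination of three ReLUs. This is the textbook construction, shown only as an illustration; the bases built in the paper may be defined differently, and the function names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    """Piecewise-linear hat on [0, 1] with peak 1 at x = 1/2, built from ReLUs."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def faber_schauder_hat(x, level, k):
    """Hat rescaled to the dyadic interval [k / 2^level, (k + 1) / 2^level]."""
    return hat(2.0 ** level * x - k)

xs = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
values = hat(xs)
```

The three ReLU terms cancel outside [0, 1], so each rescaled hat is supported on a single dyadic interval.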
Article 990
Title@2025-06-09 (1): Diffusion Counterfactual Generation with Semantic Abduction
Title: Diffusion Counterfactual Generation with Semantic Abduction | Diffusion Gegenfaktische Generation mit semantischer Abduktion | 以语义绑架方式传播反实际生成 2506.07883v1 |
Authors (5): Rajat Rasal, Avinash Kori, Fabio De Sousa Ribeiro, Tian Xia, Ben Glocker
Counterfactual image generation presents significant challenges, including preserving identity, maintaining perceptual quality, and ensuring faithfulness to an underlying causal model. While existing auto-encoding frameworks admit semantic latent spaces which can be manipulated for causal control, they struggle with scalability and fidelity. Advancements in diffusion models present opportunities for improving counterfactual image editing, having demonstrated state-of-the-art visual quality, human-aligned perception and representation learning capabilities. Here, we present a suite of diffusion-based causal mechanisms, introducing the notions of spatial, semantic and dynamic abduction. We propose a general framework that integrates semantic representations into diffusion models through the lens of Pearlian causality to edit images via a counterfactual reasoning process. To our knowledge, this is the first work to consider high-level semantic identity preservation for diffusion counterfactuals and to demonstrate how semantic control enables principled trade-offs between faithful causal control and identity preservation.
Article 991
Title@2025-06-09 (1): On Hypothesis Transfer Learning of Functional Linear Models
Title: On Hypothesis Transfer Learning of Functional Linear Models | Über Hypothesentransfer Lernen von funktionalen Linearmodellen | 关于职能线性模型学习的假设转移 2206.04277v5 |
Authors (2): Haotian Lin, Matthew Reimherr
We study transfer learning (TL) for functional linear regression (FLR) under the Reproducing Kernel Hilbert Space (RKHS) framework, observing that the TL techniques in existing high-dimensional linear regression are not compatible with truncation-based FLR methods, as functional data are intrinsically infinite-dimensional and generated by smooth underlying processes. We measure the similarity across tasks using RKHS distance, allowing the type of information being transferred to be tied to the properties of the imposed RKHS. Building on the hypothesis offset transfer learning paradigm, two algorithms are proposed: one conducts the transfer when positive sources are known, while the other leverages aggregation techniques to achieve robust transfer without prior information about the sources. We establish asymptotic lower bounds for this learning problem and show that the proposed algorithms enjoy a matching upper bound. These analyses provide statistical insights into factors that contribute to the dynamics of the transfer. We also extend the results to functional generalized linear models. The effectiveness of the proposed algorithms is demonstrated via extensive synthetic data as well as real-world data applications.
Article 992
Title@2025-06-09 (1): GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity
Title: GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity | GradSkip: Kommunikationsbeschleunigte lokale Gradientenmethoden mit besserer Computational Complexity | GradSkip:通讯加速的地方渐进法,具有更好的计算复杂度 2210.16402v3 |
Authors (3): Artavazd Maranjyan, Mher Safaryan, Peter Richtárik
We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing clients to perform multiple local gradient-type training steps before communication. In a recent breakthrough, Mishchenko et al. (2022) proved that local training, when properly executed, leads to provable communication acceleration, and this holds in the strongly convex regime without relying on any data similarity assumptions. However, their ProxSkip method requires all clients to take the same number of local training steps in each communication round. We propose a redesign of the ProxSkip method, allowing clients with “less important” data to get away with fewer local training steps without impacting the overall communication complexity of the method. In particular, we prove that our modified method, GradSkip, converges linearly under the same assumptions and has the same accelerated communication complexity, while the number of local gradient steps can be reduced relative to a local condition number. We further generalize our method by extending the randomness of probabilistic alternations to arbitrary unbiased compression operators and by considering a generic proximable regularizer. This generalization, which we call GradSkip+, recovers several related methods in the literature as special cases. Finally, we present an empirical study on carefully designed toy problems that confirms our theoretical claims.
Article 993
Title@2025-06-09 (1): SIFBench: An Extensive Benchmark for Fatigue Analysis
Title: SIFBench: An Extensive Benchmark for Fatigue Analysis | SIFBench: Ein umfassender Benchmark für die Ermüdungsanalyse | SIFBench:关于 “ 致命分析 “ 的广泛基准 2506.01173v2 |
Authors (4): Tushar Gautam, Robert M. Kirby, Jacob Hochhalter, Shandian Zhe
Fatigue-induced crack growth is a leading cause of structural failure across critical industries such as aerospace, civil engineering, automotive, and energy. Accurate prediction of stress intensity factors (SIFs) – the key parameters governing crack propagation in linear elastic fracture mechanics – is essential for assessing fatigue life and ensuring structural integrity. While machine learning (ML) has shown great promise in SIF prediction, its advancement has been severely limited by the lack of rich, transparent, well-organized, and high-quality datasets. To address this gap, we introduce SIFBench, an open-source, large-scale benchmark database designed to support ML-based SIF prediction. SIFBench contains over 5 million different crack and component geometries derived from high-fidelity finite element simulations across 37 distinct scenarios, and provides a unified Python interface for seamless data access and customization. We report baseline results using a range of popular ML models – including random forests, support vector machines, feedforward neural networks, and Fourier neural operators – alongside comprehensive evaluation metrics and template code for model training, validation, and assessment. By offering a standardized and scalable resource, SIFBench substantially lowers the entry barrier and fosters the development and application of ML methods in damage tolerance design and predictive maintenance.
Article 994
Title@2025-06-09 (1): Algorithmic Aspects of Strategic Trading
Title: Algorithmic Aspects of Strategic Trading | Algorithmische Aspekte des strategischen Handels | 战略贸易的算法方面 2502.07606v2 |
Authors (2): Michael Kearns, Mirah Shi
Algorithmic trading in modern financial markets is widely acknowledged to exhibit strategic, game-theoretic behaviors whose complexity can be difficult to model. A recent series of papers (Chriss, 2024a,b,c, 2025) has made progress in the setting of trading for position building. Here parties wish to buy or sell a fixed number of shares in a fixed time period in the presence of both temporary and permanent market impact, resulting in exponentially large strategy spaces. While these papers primarily consider the existence and structural properties of equilibrium strategies, in this work we focus on the algorithmic aspects of the proposed model. We give an efficient algorithm for computing best responses, and show that while the temporary-impact-only setting yields a potential game, best response dynamics do not generally converge in the general setting, for which no fast algorithm for (Nash) equilibrium computation is known. This leads us to consider the broader notion of Coarse Correlated Equilibria (CCE), which we show can be computed efficiently via an implementation of Follow the Perturbed Leader (FTPL). We illustrate the model and our results with an experimental investigation, where FTPL exhibits interesting behavior in different regimes of the relative weighting between temporary and permanent market impact.
Article 995
Title@2025-06-09 (1): TS-RSR: A provably efficient approach for batch Bayesian Optimization
Title: TS-RSR: A provably efficient approach for batch Bayesian Optimization | TS-RSR: Ein nachweislich effizienter Ansatz für die Batch Bayesian Optimierung | TS-RSR:对Bayesian的批量最佳化采取可衡量有效办法 2403.04764v4 |
Authors (2): Zhaolin Ren, Na Li
This paper presents a new approach for batch Bayesian Optimization (BO) called Thompson Sampling-Regret to Sigma Ratio directed sampling (TS-RSR), where we sample a new batch of actions by minimizing a Thompson Sampling approximation of a regret to uncertainty ratio. Our sampling objective is able to coordinate the actions chosen in each batch in a way that minimizes redundancy between points whilst focusing on points with high predictive means or high uncertainty. Theoretically, we provide rigorous convergence guarantees on our algorithm’s regret, and numerically, we demonstrate that our method attains state-of-the-art performance on a range of challenging synthetic and realistic test functions, where it outperforms several competitive benchmark batch BO algorithms.
Article 996
Title@2025-06-09 (1): Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models?
Title: Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models? | Können Hessian-Based Insights Fehlerdiagnosen in aufmerksamkeitsbasierten Modellen unterstützen? | 以海珊为基地的洞察能支持以关注为基础的模型中的过失诊断吗? 2506.07871v1 |
Authors (2): Sigma Jahan, Mohammad Masudur Rahman
As attention-based deep learning models scale in size and complexity, diagnosing their faults becomes increasingly challenging. In this work, we conduct an empirical study to evaluate the potential of Hessian-based analysis for diagnosing faults in attention-based models. Specifically, we use Hessian-derived insights to identify fragile regions (via curvature analysis) and parameter interdependencies (via parameter interaction analysis) within attention mechanisms. Through experiments on three diverse models (HAN, 3D-CNN, DistilBERT), we show that Hessian-based metrics can localize instability and pinpoint fault sources more effectively than gradients alone. Our empirical findings suggest that these metrics could significantly improve fault diagnosis in complex neural architectures, potentially improving software debugging practices.
Article 997
Title@2025-06-09 (1): Accurate INT8 Training Through Dynamic Block-Level Fallback
Title: Accurate INT8 Training Through Dynamic Block-Level Fallback | Genaues INT8-Training durch dynamischen Block-Level-Fallback | 通过动态区档级后退进行准确的INT8培训 2503.08040v3 |
Authors (5): Pengle Zhang, Jia Wei, Jintao Zhang, Jun Zhu, Jianfei Chen
Transformer models have achieved remarkable success across various AI applications but face significant training costs. Low-bit training, such as INT8 training, can leverage computational units with higher throughput, and has already demonstrated its effectiveness on GPT2 models with block-level quantization. However, it struggles with modern Transformer variants incorporating GLU units, because those variants exhibit complex distributions of activation outliers. To address this challenge, we propose Fallback Quantization, implementing mixed-precision GEMM that dynamically falls back from 8-bit to 16-bit for activation blocks containing outliers. Experiments show that our approach performs robustly in both fine-tuning and pretraining settings. Moreover, our method achieves a 1.57x end-to-end training speedup on RTX4090 GPUs.
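The fallback idea described above can be illustrated with a minimal sketch. This is not the authors' GEMM kernel: the block size, the max-magnitude outlier test, the `outlier_threshold` value, and all function names are assumptions chosen for illustration. Blocks without outliers take a symmetric INT8 round trip, while blocks containing an outlier are kept in half precision.

```python
import struct

def to_float16(v):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", v))[0]

def quantize_block_int8(block):
    """Symmetric INT8 round trip for one block: quantize, then dequantize."""
    max_abs = max(abs(v) for v in block)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [max(-127, min(127, round(v / scale))) * scale for v in block]

def fallback_quantize(x, block_size=4, outlier_threshold=8.0):
    """Process x block by block; fall back to 16-bit when a block has an outlier.

    The outlier test (max |value| > outlier_threshold) is a hypothetical
    criterion used only for this sketch.
    """
    out, fell_back = [], []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        use_16bit = max(abs(v) for v in block) > outlier_threshold
        if use_16bit:
            out.extend(to_float16(v) for v in block)
        else:
            out.extend(quantize_block_int8(block))
        fell_back.append(use_16bit)
    return out, fell_back

# The second block contains an outlier (100.0) that would dominate an INT8 scale.
activations = [0.1, -0.2, 0.3, 0.4, 0.5, 100.0, -0.5, 0.2]
dequantized, fell_back = fallback_quantize(activations)
max_error = max(abs(a - b) for a, b in zip(activations, dequantized))
int8_only = quantize_block_int8(activations[4:])
```

In this toy input, forcing the outlier block through INT8 would stretch the scale so far that the small entries lose most of their precision, which is the failure mode the fallback is meant to avoid.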
Article 998
Title@2025-06-09 (1): FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity
Title: FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity | FreeGave: 3D Physik Lernen aus dynamischen Videos von Gaussian Velocity | FreeGave: Gaussian 速度从动态视频学习 3D物理 2506.07865v1 |
Authors (4): Jinxi Li, Ziyang Song, Siyuan Zhou, Bo Yang
In this paper, we aim to model 3D scene geometry, appearance, and the underlying physics purely from multi-view videos. By applying various governing PDEs as PINN losses or incorporating physics simulation into neural networks, existing works often fail to learn complex physical motions at boundaries or require object priors such as masks or types. In this paper, we propose FreeGave to learn the physics of complex dynamic 3D scenes without needing any object priors. The key to our approach is to introduce a physics code followed by a carefully designed divergence-free module for estimating a per-Gaussian velocity field, without relying on the inefficient PINN losses. Extensive experiments on three public datasets and a newly collected challenging real-world dataset demonstrate the superior performance of our method for future frame extrapolation and motion segmentation. Most notably, our investigation into the learned physics codes reveals that they truly learn meaningful 3D physical motion patterns in the absence of any human labels in training.
Article 999
Title@2025-06-09 (1): Lightweight Sequential Transformers for Blood Glucose Level Prediction in Type-1 Diabetes
Title: Lightweight Sequential Transformers for Blood Glucose Level Prediction in Type-1 Diabetes | Leichte Sequentielle Transformer für Blutglukose-Level-Vorhersage bei Typ-1-Diabetes | 一型糖尿病血凝胶水平预测的轻量量 轻序式变压器 2506.07864v1 |
Authors (4): Mirko Paolo Barbato, Giorgia Rigamonti, Davide Marelli, Paolo Napoletano
Type 1 Diabetes (T1D) affects millions worldwide, requiring continuous monitoring to prevent severe hypo- and hyperglycemic events. While continuous glucose monitoring has improved blood glucose management, deploying predictive models on wearable devices remains challenging due to computational and memory constraints. To address this, we propose a novel Lightweight Sequential Transformer model designed for blood glucose prediction in T1D. By integrating the strengths of Transformers’ attention mechanisms and the sequential processing of recurrent neural networks, our architecture captures long-term dependencies while maintaining computational efficiency. The model is optimized for deployment on resource-constrained edge devices and incorporates a balanced loss function to handle the inherent data imbalance in hypo- and hyperglycemic events. Experiments on two benchmark datasets, OhioT1DM and DiaTrend, demonstrate that the proposed model outperforms state-of-the-art methods in predicting glucose levels and detecting adverse events. This work fills the gap between high-performance modeling and practical deployment, providing a reliable and efficient T1D management solution.
Article 1000
Title@2025-06-09 (1): VIVAT: Virtuous Improving VAE Training through Artifact Mitigation
Title: VIVAT: Virtuous Improving VAE Training through Artifact Mitigation | VIVAT: Tugendhafte Verbesserung der VAE-Ausbildung durch Artefakt-Minderung | VIVAT:通过减少非手工行为,提高VAE培训的道德水平 2506.07863v1 |
Authors (5): Lev Novitskiy, Viacheslav Vasilev, Maria Kovaleva, Vladimir Arkhipkin, Denis Dimitrov
Variational Autoencoders (VAEs) remain a cornerstone of generative computer vision, yet their training is often plagued by artifacts that degrade reconstruction and generation quality. This paper introduces VIVAT, a systematic approach to mitigating common artifacts in KL-VAE training without requiring radical architectural changes. We present a detailed taxonomy of five prevalent artifacts - color shift, grid patterns, blur, corner and droplet artifacts - and analyze their root causes. Through straightforward modifications, including adjustments to loss weights, padding strategies, and the integration of Spatially Conditional Normalization, we demonstrate significant improvements in VAE performance. Our method achieves state-of-the-art results in image reconstruction metrics (PSNR and SSIM) across multiple benchmarks and enhances text-to-image generation quality, as evidenced by superior CLIP scores. By preserving the simplicity of the KL-VAE framework while addressing its practical challenges, VIVAT offers actionable insights for researchers and practitioners aiming to optimize VAE training.
Article 1001
Title@2025-06-09 (1): PropEnc: A Property Encoder for Graph Neural Networks
Title: PropEnc: A Property Encoder for Graph Neural Networks | Propenc: Ein Objekt-Encoder für Graph-Neural-Netzwerke | PropEnc: 图形神经网络的属性编码器 2409.11554v3 |
Authors (3): Anwar Said, Waseem Abbas, Xenofon Koutsoukos
Graph machine learning, particularly using graph neural networks, heavily relies on node features. However, many real-world systems, such as social and biological networks, lack node features due to privacy concerns, incomplete data, or collection limitations. Structural and positional encodings are commonly used to address this but are constrained by the maximum values of the encoded properties, such as the highest node degree. This limitation makes them impractical for scale-free networks and applications involving large or non-categorical properties. This paper introduces PropEnc, a novel and versatile encoder to generate expressive node embeddings from any graph metric. By combining histogram construction with reversed index encoding, PropEnc offers a flexible solution that supports low-dimensional representations and diverse input types, effectively mitigating sparsity issues while improving computational efficiency. Additionally, it replicates one-hot encoding or approximates indices with high accuracy, making it adaptable to a wide range of graph applications. We validate PropEnc through extensive experiments on graph classification tasks across several social networks lacking node features. The empirical results demonstrate that PropEnc offers an efficient mechanism for constructing node features from various graph metrics.
Article 1002
Title@2025-06-09 (1): Fairness Overfitting in Machine Learning: An Information-Theoretic Perspective
Title: Fairness Overfitting in Machine Learning: An Information-Theoretic Perspective | Fairness Overfiting in Machine Learning: Eine informationstheoretische Perspektive | 在机器学习中过分适应公平:信息理论视角 2506.07861v1 |
Authors (4): Firas Laakom, Haobo Chen, Jürgen Schmidhuber, Yuheng Bu
Despite substantial progress in promoting fairness in high-stakes applications using machine learning models, existing methods often modify the training process, such as through regularizers or other interventions, but lack formal guarantees that fairness achieved during training will generalize to unseen data. Although overfitting with respect to prediction performance has been extensively studied, overfitting in terms of fairness loss has received far less attention. This paper proposes a theoretical framework for analyzing fairness generalization error through an information-theoretic lens. Our novel bounding technique is based on the Efron-Stein inequality, which allows us to derive tight information-theoretic fairness generalization bounds with both Mutual Information (MI) and Conditional Mutual Information (CMI). Our empirical results validate the tightness and practical relevance of these bounds across diverse fairness-aware learning algorithms. Our framework offers valuable insights to guide the design of algorithms improving fairness generalization.
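For reference, the Efron-Stein inequality named above can be stated in its standard textbook form. The notation ($f$, $X_i$, $X_i'$) is ours and is not taken from the paper; how the paper applies the inequality to fairness losses is not shown here.

```latex
% X_1, ..., X_n independent; X_i' an independent copy of X_i;
% X^{(i)} is X with its i-th coordinate replaced by X_i'.
\operatorname{Var}\bigl(f(X)\bigr)
  \;\le\; \frac{1}{2} \sum_{i=1}^{n}
  \mathbb{E}\Bigl[\bigl(f(X) - f(X^{(i)})\bigr)^{2}\Bigr],
\qquad
X^{(i)} = (X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n).
```

The bound controls the variance of a function of independent variables by how much the function changes when a single coordinate is resampled.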
Article 1003
Title@2025-06-09 (1): LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds
Title: LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds | LogoSP: Lokale-globale Gruppierung von Superpoints für unbeaufsichtigte semantische Segmentierung von 3D-Punkt-Clouds | LogoSP: 3D点云无监督的语义分割超点地方-全球组别 2506.07857v1 |
Authors (4): Zihui Zhang, Weisheng Dai, Hongtao Wen, Bo Yang
We study the problem of unsupervised 3D semantic segmentation on raw point clouds without needing human labels in training. Existing methods usually formulate this problem as learning per-point local features followed by a simple grouping strategy, and thus lack the ability to discover additional and possibly richer semantic priors beyond local features. In this paper, we introduce LogoSP to learn 3D semantics from both local and global point features. The key to our approach is to discover 3D semantic information by grouping superpoints according to their global patterns in the frequency domain, thus generating highly accurate semantic pseudo-labels for training a segmentation network. Extensive experiments on two indoor datasets and one outdoor dataset show that LogoSP surpasses all existing unsupervised methods by large margins, achieving state-of-the-art performance for unsupervised 3D semantic segmentation. Notably, our investigation into the learned global patterns reveals that they truly represent meaningful 3D semantics in the absence of human labels during training.
Article 1004
Title@2025-06-09 (1): Residual Reweighted Conformal Prediction for Graph Neural Networks
Title: Residual Reweighted Conformal Prediction for Graph Neural Networks | Residual Reweighted Conformal Prediction for Graph Neural Networks | 图形神经网络的重新加权常规预测 2506.07854v1 |
Authors (6): Zheng Zhang, Jie Bao, Zhixin Zhou, Nicolo Colombo, Lixin Cheng, Rui Luo
Graph Neural Networks (GNNs) excel at modeling relational data but face significant challenges in high-stakes domains due to unquantified uncertainty. Conformal prediction (CP) offers statistical coverage guarantees, but existing methods often produce overly conservative prediction intervals that fail to account for graph heteroscedasticity and structural biases. While residual reweighting CP variants address some of these limitations, they neglect graph topology and cluster-specific uncertainties, and they risk data leakage by reusing training sets. To address these issues, we propose Residual Reweighted GNN (RR-GNN), a framework designed to generate minimal prediction sets with provable marginal coverage guarantees. RR-GNN introduces three major innovations to enhance prediction performance. First, it employs Graph-Structured Mondrian CP to partition nodes or edges into communities based on topological features, ensuring cluster-conditional coverage that reflects heterogeneity. Second, it uses Residual-Adaptive Nonconformity Scores by training a secondary GNN on a held-out calibration set to estimate task-specific residuals, dynamically adjusting prediction intervals according to node or edge uncertainty. Third, it adopts a Cross-Training Protocol, which alternates the optimization of the primary GNN and the residual predictor to prevent information leakage while maintaining graph dependencies. We validate RR-GNN on 15 real-world graphs across diverse tasks, including node classification, regression, and edge weight prediction. Compared to state-of-the-art CP baselines, RR-GNN achieves improved efficiency with no loss of coverage.
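The Mondrian (group-conditional) step mentioned above can be sketched generically. This is ordinary split conformal calibration run separately per group, not the RR-GNN pipeline: the group labels are simply given here rather than derived from graph topology, and the function names and toy residuals are assumptions made for illustration.

```python
import math
from collections import defaultdict

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    # The small tolerance guards against floating-point noise pushing the
    # product just above an integer before taking the ceiling.
    rank = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[rank - 1]

def mondrian_thresholds(residuals, groups, alpha):
    """Calibrate one threshold per group from absolute residuals."""
    by_group = defaultdict(list)
    for r, g in zip(residuals, groups):
        by_group[g].append(abs(r))
    return {g: conformal_quantile(s, alpha) for g, s in by_group.items()}

# Toy calibration set: group "a" has small residuals, group "b" large ones.
residuals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
             1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
groups = ["a"] * 9 + ["b"] * 9
per_group = mondrian_thresholds(residuals, groups, alpha=0.2)
pooled = conformal_quantile([abs(r) for r in residuals], alpha=0.2)
```

A regression interval for a test point in group g would then be the point prediction plus or minus `per_group[g]`. In this toy data the pooled threshold is far too wide for group "a" and too narrow for group "b", which is the heterogeneity that group-conditional calibration addresses.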
Article 1005
Title@2025-06-09 (1): Conditional Local Independence Testing with Application to Dynamic Causal Discovery
Title: Conditional Local Independence Testing with Application to Dynamic Causal Discovery | Conditional Local Independence Testing mit Anwendung auf Dynamische Kausal-Discovery | 有条件的局部独立测试,应用到动态因子发现 2506.07844v1 |
Authors (3): Mingzhou Liu, Xinwei Sun, Yizhou Wang
In this note, we extend the conditional local independence testing theory developed in Christgau et al. (2024) to Ito processes. The result can be applied to causal discovery in dynamic systems.
Article 1006
Title@2025-06-09 (1): Jarzynski Reweighting and Sampling Dynamics for Training Energy-Based Models: Theoretical Analysis of Different Transition Kernels
Title: Jarzynski Reweighting and Sampling Dynamics for Training Energy-Based Models: Theoretical Analysis of Different Transition Kernels | Jarzynski Umgewichtungs- und Probenahmedynamiken für das Training energiebasierter Modelle: Theoretische Analyse verschiedener Übergangskerne | Jarzynski 培训以能源为基础的模型的加权和抽样动态:对不同过渡核心的理论分析 2506.07843v1 |
Authors (1): Davide Carbone
Energy-Based Models (EBMs) provide a flexible framework for generative modeling, but their training remains theoretically challenging due to the need to approximate normalization constants and efficiently sample from complex, multi-modal distributions. Traditional methods, such as contrastive divergence and score matching, introduce biases that can hinder accurate learning. In this work, we present a theoretical analysis of Jarzynski reweighting, a technique from non-equilibrium statistical mechanics, and its implications for training EBMs. We focus on the role of the choice of the kernel and we illustrate these theoretical considerations in two key generative frameworks: (i) flow-based diffusion models, where we reinterpret Jarzynski reweighting in the context of stochastic interpolants to mitigate discretization errors and improve sample quality, and (ii) Restricted Boltzmann Machines, where we analyze its role in correcting the biases of contrastive divergence. Our results provide insights into the interplay between kernel choice and model performance, highlighting the potential of Jarzynski reweighting as a principled tool for generative learning.
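For context, the Jarzynski equality from non-equilibrium statistical mechanics, on which the reweighting is based, reads as follows in its standard physics form. The symbols ($W$, $\beta$, $\Delta F$, $Z_A$, $Z_B$) follow the usual convention and are not taken from the paper; the paper's specific weights and transition kernels are not reproduced here.

```latex
% W: work performed along a non-equilibrium trajectory driving the system
%    from state A to state B; \beta: inverse temperature;
% \Delta F = F_B - F_A: free-energy difference; Z_A, Z_B: partition functions.
\mathbb{E}\bigl[e^{-\beta W}\bigr] \;=\; e^{-\beta \Delta F} \;=\; \frac{Z_B}{Z_A}
```

The identity relates an average over out-of-equilibrium trajectories to a ratio of normalization constants, which is why it is relevant when the normalization constant of an EBM changes during training.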
Article 1007
Title@2025-06-09 (1): FedX: Adaptive Model Decomposition and Quantization for IoT Federated Learning
Title: FedX: Adaptive Model Decomposition and Quantization for IoT Federated Learning | FedX: Adaptive Modellzersetzung und Quantisierung für IoT-Federated Learning | FFX:国际oT联合会学习的适应性模型分解和量化 2504.12849v3 |
Authors (8): Phung Lai, Xiaopeng Jiang, Hai Phan, Cristian Borcea, Khang Tran, An Chen, Vijaya Datta Mayyuri, Ruoming Jin
Federated Learning (FL) allows collaborative training among multiple devices without data sharing, thus enabling privacy-sensitive applications on mobile or Internet of Things (IoT) devices, such as mobile health and asset tracking. However, designing an FL system with good model utility that works with low computation/communication overhead on heterogeneous, resource-constrained mobile/IoT devices is challenging. To address this problem, this paper proposes FedX, a novel adaptive model decomposition and quantization FL system for IoT. To balance utility with resource constraints on IoT devices, FedX decomposes a global FL model into different sub-networks with adaptive numbers of quantized bits for different devices. The key idea is that a device with fewer resources receives a smaller sub-network for lower overhead but utilizes a larger number of quantized bits for higher model utility, and vice versa. The quantization operations in FedX are done at the server to reduce the computational load on devices. FedX iteratively minimizes the losses in the devices’ local data and in the server’s public data using quantized sub-networks under a regularization term, and thus it maximizes the benefits of combining FL with model quantization through knowledge sharing among the server and devices in a cost-effective training process. Extensive experiments show that FedX significantly improves quantization times by up to 8.43X, on-device computation time by 1.5X, and total end-to-end training time by 1.36X, compared with baseline FL systems. We guarantee the global model convergence theoretically and validate local model convergence empirically, highlighting FedX’s optimization efficiency.
Article 1008
Title@2025-06-09 (1): A Cognac Shot To Forget Bad Memories: Corrective Unlearning for Graph Neural Networks
Title: A Cognac Shot To Forget Bad Memories: Corrective Unlearning for Graph Neural Networks | Ein Cognac, um schlechte Erinnerungen zu vergessen: Korrektives Lernen für Graphen-Neural-Netzwerke | 为忘记坏记忆而拍摄的科格纳克白兰地:为图形神经网络进行矫正性学习 2412.00789v4 |
Authors (7): Varshita Kolipaka, Akshit Sinha, Debangan Mishra, Sumit Kumar, Arvindh Arun, Shashwat Goel, Ponnurangam Kumaraguru
Graph Neural Networks (GNNs) are increasingly being used for a variety of ML applications on graph data. Because graph data does not follow the independently and identically distributed (i.i.d.) assumption, adversarial manipulations or incorrect data can propagate to other data points through message passing, which deteriorates the model’s performance. To allow model developers to remove the adverse effects of manipulated entities from a trained GNN, we study the recently formulated problem of Corrective Unlearning. We find that current graph unlearning methods fail to unlearn the effect of manipulations even when the whole manipulated set is known. We introduce a new graph unlearning method, Cognac, which can unlearn the effect of the manipulation set even when only 5% of it is identified. It recovers most of the performance of a strong oracle with fully corrected training data, even beating retraining from scratch without the deletion set while being 8x more efficient. We hope our work assists GNN developers in mitigating harmful effects caused by issues in real-world data, post-training. Our code is publicly available at https://github.com/cognac-gnn-unlearning/corrective-unlearning-for-gnns
Article 1009
Title@2025-06-09 (1): C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition
Title: C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition | C3T: Grenzüberschreitender Transfer durch Zeit für sensorgestützte menschliche Aktivitätserkennung | C3T: 以传感器为基础的人类活动识别跨时间跨模式转让 2407.16803v3 |
Authors (3): Abhi Kamboj, Anh Duy Nguyen, Minh N. Do
In order to unlock the potential of diverse sensors, we investigate a method to transfer knowledge between time-series modalities using a multimodal temporal representation space for Human Activity Recognition (HAR). Specifically, we explore the setting where the modality used in testing has no labeled data during training, which we refer to as Unsupervised Modality Adaptation (UMA). We categorize existing UMA approaches as Student-Teacher or Contrastive Alignment methods. These methods typically compress continuous-time data samples into single latent vectors during alignment, inhibiting their ability to transfer temporal information through real-world temporal distortions. To address this, we introduce Cross-modal Transfer Through Time (C3T), which preserves temporal information during alignment to handle dynamic sensor data better. C3T achieves this by aligning a set of temporal latent vectors across sensing modalities. Our extensive experiments on various camera+IMU datasets demonstrate that C3T outperforms existing methods in UMA by at least 8% in accuracy and shows superior robustness to temporal distortions such as time-shift, misalignment, and dilation. Our findings suggest that C3T has significant potential for developing generalizable models for time-series sensor data, opening new avenues for various multimodal applications.
Article 1010
Title@2025-06-09 (1): Efficient and Generalized end-to-end Autonomous Driving System with Latent Deep Reinforcement Learning and Demonstrations
Title: Efficient and Generalized end-to-end Autonomous Driving System with Latent Deep Reinforcement Learning and Demonstrations | Effizientes und generalisiertes End-to-End Autonomes Fahrsystem mit Latent Deep Reinforcement Learning und Demonstrationen | 高效和普遍通用的端到端自动驾驶系统,配有深深层深强化学习和示范 2401.11792v7 |
Authors (4): Zuojin Tang, Xiaoyu Chen, Yongqiang Li, Jianyu Chen
An intelligent driving system should dynamically formulate appropriate driving strategies based on the current environment and vehicle status while ensuring system security and reliability. However, methods based on reinforcement learning and imitation learning often suffer from high sample complexity, poor generalization, and low safety. To address these challenges, this paper introduces an efficient and generalized end-to-end autonomous driving system (EGADS) for complex and varied scenarios. The RL agent in our EGADS combines variational inference with normalizing flows, which are independent of distribution assumptions. This combination allows the agent to capture historical information relevant to driving in latent space effectively, thereby significantly reducing sample complexity. Additionally, we enhance safety by formulating robust safety constraints and improve generalization and performance by integrating RL with expert demonstrations. Experimental results demonstrate that, compared to existing methods, EGADS significantly reduces sample complexity, greatly improves safety performance, and exhibits strong generalization capabilities in complex urban scenarios. In addition, we contribute an expert dataset collected through human expert steering wheel control, specifically using the G29 steering wheel.
Article 1011
Title@2025-06-09 (1): Beyond Numeric Rewards: In-Context Dueling Bandits with LLM Agents
Title: Beyond Numeric Rewards: In-Context Dueling Bandits with LLM Agents | Über Numerische Belohnungen hinaus: In-Context-Duellbanditen mit LLM-Agenten | 除数值奖励之外:与LLM代理商进行文中评分的强盗 2407.01887v4 |
Authors (4): Fanzeng Xia, Hao Liu, Yisong Yue, Tongxin Li
In-Context Reinforcement Learning (ICRL) is a frontier paradigm to solve Reinforcement Learning (RL) problems in the foundation model era. While ICRL capabilities have been demonstrated in transformers through task-specific training, the potential of Large Language Models (LLMs) out-of-the-box remains largely unexplored. This paper investigates whether LLMs can generalize cross-domain to perform ICRL under the problem of Dueling Bandits (DB), a stateless preference-based RL setting. We find that the top-performing LLMs exhibit a notable zero-shot capacity for relative decision-making, which translates to low short-term weak regret across all DB environment instances by quickly including the best arm in duels. However, an optimality gap still exists between LLMs and classic DB algorithms in terms of strong regret. LLMs struggle to converge and consistently exploit even when explicitly prompted to do so, and are sensitive to prompt variations. To bridge this gap, we propose an agentic flow framework: LLM with Enhanced Algorithmic Dueling (LEAD), which integrates off-the-shelf DB algorithm support with LLM agents through fine-grained adaptive interplay. We show that LEAD has theoretical guarantees inherited from classic DB algorithms on both weak and strong regret. We validate its efficacy and robustness even with noisy and adversarial prompts. The design of such an agentic framework sheds light on how to enhance the trustworthiness of general-purpose LLMs generalized to in-context decision-making tasks.
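The distinction between weak and strong regret in dueling bandits can be made concrete with a small sketch. It uses one common convention (strong regret averages the gaps of both duelled arms to the best arm, weak regret takes the smaller gap); the exact normalization in the paper may differ, and the preference matrix and function names below are hypothetical.

```python
def gap(pref, best, arm):
    """How much the best arm is preferred over `arm`: P(best beats arm) - 1/2."""
    return pref[best][arm] - 0.5

def strong_regret(pref, best, a, b):
    """Average gap of the two duelled arms; zero only if both are the best arm."""
    return 0.5 * (gap(pref, best, a) + gap(pref, best, b))

def weak_regret(pref, best, a, b):
    """Smaller of the two gaps; zero as soon as the best arm is in the duel."""
    return min(gap(pref, best, a), gap(pref, best, b))

# Hypothetical preference matrix: pref[i][j] = P(arm i beats arm j); arm 0 is best.
pref = [
    [0.5, 0.7, 0.9],
    [0.3, 0.5, 0.6],
    [0.1, 0.4, 0.5],
]
best = 0

with_best = (weak_regret(pref, best, 0, 2), strong_regret(pref, best, 0, 2))
without_best = (weak_regret(pref, best, 1, 2), strong_regret(pref, best, 1, 2))
```

Under this convention, a policy that quickly includes the best arm in every duel drives weak regret to zero while still paying strong regret for the other arm, which matches the gap the abstract describes between the two measures.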
Article 1012
Title@2025-06-09 (1): Improving large language models with concept-aware fine-tuning
Title: Improving large language models with concept-aware fine-tuning | Verbesserung großer Sprachmodelle mit konzeptbewusster Feinsteuerung | 改进大语言模式,进行概念意识微调 2506.07833v1 |
Authors (4): Michael K. Chen, Xikun Zhang, Jiaxing Huang, Dacheng Tao
Large language models (LLMs) have become the cornerstone of modern AI. However, the existing paradigm of next-token prediction fundamentally limits their ability to form coherent, high-level concepts, making it a critical barrier to human-like understanding and reasoning. Take the phrase “ribonucleic acid” as an example: an LLM will first decompose it into tokens, i.e., artificial text fragments (“rib”, “on”, …), then learn each token sequentially, rather than grasping the phrase as a unified, coherent semantic entity. This fragmented representation hinders deeper conceptual understanding and, ultimately, the development of truly intelligent systems. In response, we introduce Concept-Aware Fine-Tuning (CAFT), a novel multi-token training method that redefines how LLMs are fine-tuned. By enabling the learning of sequences that span multiple tokens, this method fosters stronger concept-aware learning. Our experiments demonstrate significant improvements compared to conventional next-token finetuning methods across diverse tasks, including traditional applications like text summarization and domain-specific ones like de novo protein design. Multi-token prediction was previously only possible in the prohibitively expensive pretraining phase; CAFT, to our knowledge, is the first to bring the multi-token setting to the post-training phase, thus effectively democratizing its benefits for the broader community of practitioners and researchers. Finally, the unexpected effectiveness of our proposed method suggests wider implications for the machine learning research community. All code and data are available at https://github.com/michaelchen-lab/caft-llm
Article 1013
Title@2025-06-09 (1): Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information
Title: Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information | Dezentralisierung des Mehr-Agenten-Verstärkungs-Lernens mit zeitlichen Kausalinformationen | 利用时间原因信息分散多机构强化学习 2506.07829v1 |
Authors (5): Jan Corazza, Hadi Partovi Aria, Hyohun Kim, Daniel Neider, Zhe Xu
Reinforcement learning (RL) algorithms can find an optimal policy for a single agent to accomplish a particular task. However, many real-world problems require multiple agents to collaborate in order to achieve a common goal. For example, a robot executing a task in a warehouse may require the assistance of a drone to retrieve items from high shelves. In Decentralized Multi-Agent RL (DMARL), agents learn independently and then combine their policies at execution time, but often must satisfy constraints on compatibility of local policies to ensure that they can achieve the global task when combined. In this paper, we study how providing high-level symbolic knowledge to agents can help address unique challenges of this setting, such as privacy constraints, communication limitations, and performance concerns. In particular, we extend the formal tools used to check the compatibility of local policies with the team task, making decentralized training with theoretical guarantees usable in more scenarios. Furthermore, we empirically demonstrate that symbolic knowledge about the temporal evolution of events in the environment can significantly expedite the learning process in DMARL.
Article 1014
Title@2025-06-09 (1): R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation
Title: R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation | R3D2: Realistische 3D Asset Insertion über Diffusion für autonome Fahrsimulation | R3D2:现实的 3D 资产通过传播插入资产,用于自动驾驶模拟 2506.07826v1 |
Authors (11): William Ljungbergh, Bernardo Taveira, Wenzhao Zheng, Adam Tonderski, Chensheng Peng, Fredrik Kahl, Christoffer Petersson, Michael Felsberg, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
Validating autonomous driving (AD) systems requires diverse and safety-critical testing, making photorealistic virtual environments essential. Traditional simulation platforms, while controllable, are resource-intensive to scale and often suffer from a domain gap with real-world data. In contrast, neural reconstruction methods like 3D Gaussian Splatting (3DGS) offer a scalable solution for creating photorealistic digital twins of real-world driving scenes. However, they struggle with dynamic object manipulation and reusability as their per-scene optimization-based methodology tends to result in incomplete object models with integrated illumination effects. This paper introduces R3D2, a lightweight, one-step diffusion model designed to overcome these limitations and enable realistic insertion of complete 3D assets into existing scenes by generating plausible rendering effects, such as shadows and consistent lighting, in real time. This is achieved by training R3D2 on a novel dataset: 3DGS object assets are generated from in-the-wild AD data using an image-conditioned 3D generative model, and then synthetically placed into neural rendering-based virtual environments, allowing R3D2 to learn realistic integration. Quantitative and qualitative evaluations demonstrate that R3D2 significantly enhances the realism of inserted assets, enabling use-cases like text-to-3D asset insertion and cross-scene/dataset object transfer, allowing for true scalability in AD validation. To promote further research in scalable and realistic AD simulation, we will release our dataset and code; see https://research.zenseact.com/publications/R3D2/.
Article 1015
Title@2025-06-09 (1): Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation
Title: Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation | Beschleunigung von Diffusionsmodellen in Offline-RL über die Reward-Aware-Konsistenz-Trajektoriendestillation | 通过奖励-软件兼容性轨迹蒸馏在离线 RL 中加速扩散模型 2506.07822v1 |
Authors (6): Xintong Duan, Yutong He, Fahim Tajwar, Ruslan Salakhutdinov, J. Zico Kolter, Jeff Schneider
Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. While the consistency model offers a potential solution, its applications to decision-making often struggle with suboptimal demonstrations or rely on complex concurrent training of multiple networks. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning that directly incorporates reward optimization into the distillation process. Our method enables single-step generation while maintaining higher performance and simpler training. Empirical evaluations on the Gym MuJoCo benchmarks and long horizon planning demonstrate that our approach can achieve an 8.7% improvement over previous state-of-the-art while offering up to 142x speedup over diffusion counterparts in inference time.
Article 1016
Title@2025-06-09 (1): Accelerating Constrained Sampling: A Large Deviations Approach
Title: Accelerating Constrained Sampling: A Large Deviations Approach | Beschleunigte Probenahme beschleunigen: Ein großer Abweichungsansatz | 加速受控抽样:大偏离方法 2506.07816v1 |
Authors (4): Yingli Wang, Changwei Tu, Xiaoyu Wang, Lingjiong Zhu
The problem of sampling a target probability distribution on a constrained domain arises in many applications including machine learning. For constrained sampling, various Langevin algorithms such as projected Langevin Monte Carlo (PLMC) based on the discretization of reflected Langevin dynamics (RLD) and more generally skew-reflected non-reversible Langevin Monte Carlo (SRNLMC) based on the discretization of skew-reflected non-reversible Langevin dynamics (SRNLD) have been proposed and studied in the literature. This work focuses on the long-time behavior of SRNLD, where a skew-symmetric matrix is added to RLD. Although the non-asymptotic convergence analysis for SRNLD (and SRNLMC) and the acceleration compared to RLD (and PLMC) have been studied in the literature, it is not clear how one should design the skew-symmetric matrix in the dynamics to achieve good performance in practice. We establish a large deviation principle (LDP) for the empirical measure of SRNLD when the skew-symmetric matrix is chosen such that its product with the inward unit normal vector field on the boundary is zero. By explicitly characterizing the rate functions, we show that SRNLD can accelerate the convergence to the target distribution compared to RLD with this choice of the skew-symmetric matrix. Numerical experiments for SRNLMC based on the proposed skew-symmetric matrix show superior performance, which validates the theoretical findings from the large deviations theory.
Article 1017
Title@2025-06-09 (1): A weighted quantum ensemble of homogeneous quantum classifiers
Title: A weighted quantum ensemble of homogeneous quantum classifiers | Ein gewichtetes Quantenensemble homogener Quantenklassifikatoren | 一份加权量量的加权总量,由同质量分类器组成 2506.07810v1 |
Authors (3): Emiliano Tolotti, Enrico Blanzieri, Davide Pastorello
Ensemble methods in machine learning aim to improve prediction accuracy by combining multiple models. This is achieved by ensuring diversity among predictors to capture different data aspects. Homogeneous ensembles use identical models, achieving diversity through different data subsets, and weighted-average ensembles assign higher influence to more accurate models through a weight learning procedure. We propose a method to achieve a weighted homogeneous quantum ensemble using quantum classifiers with indexing registers for data encoding. This approach leverages instance-based quantum classifiers, enabling feature and training point subsampling through superposition and controlled unitaries, and allowing for a quantum-parallel execution of diverse internal classifiers with different data compositions in superposition. The method integrates a learning process involving circuit execution and classical weight optimization, for a trained ensemble execution with weights encoded in the circuit at test-time. Empirical evaluation demonstrates the effectiveness of the proposed method, offering insights into its performance.
Article 1018
Title@2025-06-09 (1): Identifiable Object Representations under Spatial Ambiguities
Title: Identifiable Object Representations under Spatial Ambiguities | Identifizierbare Objektdarstellungen unter räumlichen Ambiguitäten | 空间模糊度下可识别物体的表示 2506.07806v1 |
Authors (3): Avinash Kori, Francesca Toni, Ben Glocker
Modular object-centric representations are essential for human-like reasoning but are challenging to obtain under spatial ambiguities, e.g. due to occlusions and view ambiguities. However, addressing these challenges presents both theoretical and practical difficulties. We introduce a novel multi-view probabilistic approach that aggregates view-specific slots to capture invariant content information while simultaneously learning disentangled global viewpoint-level information. Unlike prior single-view methods, our approach resolves spatial ambiguities, provides theoretical guarantees for identifiability, and requires no viewpoint annotations. Extensive experiments on standard benchmarks and novel complex datasets validate our method’s robustness and scalability.
Article 1019
Title@2025-06-09 (1): Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability
Title: Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability | Verbesserung der adversarialen Robustheit mit konformer Vorhersage: Ein Rahmen für eine garantierte Model-Zuverlässigkeit | 增强以非正式预测增强反向能力:保证模型可靠性框架 2506.07804v1 |
Authors (5): Jie Bao, Chuangyin Dang, Rui Luo, Hanwei Zhang, Zhixin Zhou
As deep learning models are increasingly deployed in high-risk applications, robust defenses against adversarial attacks and reliable performance guarantees become paramount. Moreover, accuracy alone does not provide sufficient assurance or reliable uncertainty estimates for these models. This study advances adversarial training by leveraging principles from Conformal Prediction. Specifically, we develop an adversarial attack method, termed OPSA (OPtimal Size Attack), designed to reduce the efficiency of conformal prediction at any significance level by maximizing model uncertainty without requiring coverage guarantees. Correspondingly, we introduce OPSA-AT (Adversarial Training), a defense strategy that integrates OPSA within a novel conformal training paradigm. Experimental evaluations demonstrate that our OPSA attack method induces greater uncertainty compared to baseline approaches for various defenses. Conversely, our OPSA-AT defensive model significantly enhances robustness not only against OPSA but also other adversarial attacks, and maintains reliable prediction. Our findings highlight the effectiveness of this integrated approach for developing trustworthy and resilient deep learning models for safety-critical domains. Our code is available at https://github.com/bjbbbb/Enhancing-Adversarial-Robustness-with-Conformal-Prediction.
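"Efficiency" in conformal prediction refers to how small the prediction sets are, so an attack on efficiency tries to make the sets larger. The sketch below shows standard split conformal prediction with the score 1 - p(true label). It is a generic illustration, not the OPSA or OPSA-AT procedure, and the calibration scores and softmax vectors are made-up values.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic
    if k > n:
        return float("inf")  # too few calibration points: include everything
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """All labels whose nonconformity score 1 - p(label) is within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical calibration scores: 1 - (softmax probability of the true label).
cal_scores = [0.10, 0.20, 0.35, 0.50, 0.62, 0.70, 0.80]
q = conformal_threshold(cal_scores, alpha=0.25)

confident = prediction_set([0.90, 0.06, 0.04], q)  # peaked softmax
uncertain = prediction_set([0.40, 0.35, 0.25], q)  # flatter softmax
```

The flatter softmax vector yields a larger set than the peaked one. A perturbation that flattens the model's output therefore degrades efficiency even when the true label remains covered.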
Article 1020
Title@2025-06-09 (1): Fine-grained Hierarchical Crop Type Classification from Integrated Hyperspectral EnMAP Data and Multispectral Sentinel-2 Time Series: A Large-scale Dataset and Dual-stream Transformer Method
Title: Fine-grained Hierarchical Crop Type Classification from Integrated Hyperspectral EnMAP Data and Multispectral Sentinel-2 Time Series: A Large-scale Dataset and Dual-stream Transformer Method | Feinkörnige Hierarchische Sortenklassifikation aus integrierten Hyperspektralen EnMAP-Daten und Multispektralen Sentinel-2 Zeitreihen: Eine großflächige Datensatz- und Dual-Stream-Transformer-Methode | 从综合超高光谱 EmmAP 数据和多谱哨兵-2时间序列:大型数据集和双流变异器方法中精细测的等级级作物类型分类 2506.06155v2 |
Authors (11): Wenyuan Li, Shunlin Liang, Yuxiang Zhang, Liqin Liu, Keyan Chen, Yongzhe Chen, Han Ma, Jianglei Xu, Yichuan Ma, Shikang Guan, Zhenwei Shi
Fine-grained crop type classification serves as the fundamental basis for large-scale crop mapping and plays a vital role in ensuring food security. It requires simultaneous capture of both phenological dynamics (obtained from multi-temporal satellite data like Sentinel-2) and subtle spectral variations (demanding nanometer-scale spectral resolution from hyperspectral imagery). Research combining these two modalities remains scarce because hyperspectral data are hard to acquire and crop type annotation is costly. To address these issues, we construct a hierarchical hyperspectral crop dataset (H2Crop) by integrating 30m-resolution EnMAP hyperspectral data with Sentinel-2 time series. With over one million annotated field parcels organized in a four-tier crop taxonomy, H2Crop establishes a vital benchmark for fine-grained agricultural crop classification and hyperspectral image processing. We propose a dual-stream Transformer architecture that synergistically processes these modalities. It coordinates two specialized pathways: a spectral-spatial Transformer extracts fine-grained signatures from hyperspectral EnMAP data, while a temporal Swin Transformer extracts crop growth patterns from Sentinel-2 time series. The designed hierarchical classification head with hierarchical fusion then simultaneously delivers multi-level crop type classification across all taxonomic tiers. Experiments demonstrate that adding hyperspectral EnMAP data to Sentinel-2 time series yields an average F1-score improvement of 4.2% (peaking at 6.3%). Extensive comparisons also confirm our method’s higher accuracy over existing deep learning approaches for crop type classification and the consistent benefits of hyperspectral data across varying temporal windows and crop change scenarios. Code and dataset are available at https://github.com/flyakon/H2Crop.
Article 1021
Title@2025-06-09 (1): Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness
Title: Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness | Widersacher mit Anreizen: Eine strategische Alternative zur widerspenstigen Robustheit | 具有奖励措施的逆向:反反向强力战略替代方案 2406.11458v3 |
Authors (3): Maayan Ehrenberg, Roy Ganz, Nir Rosenfeld
Adversarial training aims to defend against adversaries: malicious opponents whose sole aim is to harm predictive performance in any way possible. This presents a rather harsh perspective, which we assert results in unnecessarily conservative training. As an alternative, we propose to model opponents as simply pursuing their own goals, rather than working directly against the classifier. Employing tools from strategic modeling, our approach enables knowledge or beliefs regarding the opponent’s possible incentives to be used as inductive bias for learning. Accordingly, our method of strategic training is designed to defend against all opponents within an ‘incentive uncertainty set’. This resorts to adversarial learning when the set is maximal, but offers potential gains when the set can be appropriately reduced. We conduct a series of experiments that show how even mild knowledge regarding the opponent’s incentives can be useful, and that the degree of potential gains depends on how these incentives relate to the structure of the learning task.
Article 1022
Title@2025-06-09 (1): MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification
Title: MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification | MultiMatch: Multihead-Konsistenzregularisierung passend zur semi-überwachten Textklassifikation | 多匹配: 用于半有效文本分类的多标题一致性规则化 2506.07801v1 |
Authors (5): Iustin Sirbu, Robert-Adrian Popovici, Cornelia Caragea, Stefan Trausan-Matu, Traian Rebedea
We introduce MultiMatch, a novel semi-supervised learning (SSL) algorithm combining the paradigms of co-training and consistency regularization with pseudo-labeling. At its core, MultiMatch features a three-fold pseudo-label weighting module designed for three key purposes: selecting and filtering pseudo-labels based on head agreement and model confidence, and weighting them according to the perceived classification difficulty. This novel module enhances and unifies three existing techniques – heads agreement from Multihead Co-training, self-adaptive thresholds from FreeMatch, and Average Pseudo-Margins from MarginMatch – resulting in a holistic approach that improves robustness and performance in SSL settings. Experimental results on benchmark datasets highlight the superior performance of MultiMatch, achieving state-of-the-art results on 9 out of 10 setups from 5 natural language processing datasets and ranking first according to the Friedman test among 19 methods. Furthermore, MultiMatch demonstrates exceptional robustness in highly imbalanced settings, outperforming the second-best approach by 3.26% – and data imbalance is a key factor for many text classification tasks.
Article 1023
Title@2025-06-09 (1): Generalized Interpolating Discrete Diffusion
Title: Generalized Interpolating Discrete Diffusion | Generalisierte Interpolation diskreter Diffusion | 普遍国际刑 化 2503.04482v2 |
Authors (6): Dimitri von Rütte, Janis Fluri, Yuhui Ding, Antonio Orvieto, Bernhard Schölkopf, Thomas Hofmann
While state-of-the-art language models achieve impressive results through next-token prediction, they have inherent limitations such as the inability to revise already generated tokens. This has prompted exploration of alternative approaches such as discrete diffusion. However, masked diffusion, which has emerged as a popular choice due to its simplicity and effectiveness, reintroduces this inability to revise words. To overcome this, we generalize masked diffusion, deriving a new family of general interpolating discrete diffusion (GIDD) which offers greater flexibility in the design of the noising processes. Leveraging a novel diffusion ELBO, we achieve compute-matched state-of-the-art performance in diffusion language modeling. Exploiting GIDD’s flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality and unlocking the ability for the model to correct its own mistakes, an area where autoregressive models notoriously have struggled. Code: https://github.com/dvruette/gidd/
Article 1024
Title@2025-06-09 (1): LLM Unlearning Should Be Form-Independent
Title: LLM Unlearning Should Be Form-Independent | LLM Unlearning sollte formunabhängig sein | LLM 重新学习应是独立的形式 2506.07795v1 |
Authors (3): Xiaotian Ye, Mengqi Zhang, Shu Wu
Large Language Model (LLM) unlearning aims to erase or suppress undesirable knowledge within the model, offering promise for controlling harmful or private information to prevent misuse. However, recent studies highlight its limited efficacy in real-world scenarios, hindering practical adoption. In this study, we identify a pervasive issue underlying many downstream failures: the effectiveness of existing unlearning methods heavily depends on the form of training samples and frequently fails to generalize to alternate expressions of the same knowledge. We formally characterize this problem as Form-Dependent Bias and systematically investigate its specific manifestation patterns across various downstream tasks. To quantify its prevalence and support future research, we introduce ORT, a novel benchmark designed to evaluate the robustness of unlearning methods against variations in knowledge expression. Results reveal that Form-Dependent Bias is both widespread and severe among current techniques. We argue that LLM unlearning should be form-independent to address the endless forms of downstream tasks encountered in real-world security-critical scenarios. Towards this goal, we introduce Rank-one Concept Redirection (ROCR), a novel training-free method, as a promising solution path. ROCR performs unlearning by targeting the invariants in downstream tasks, specifically the activated dangerous concepts. It is capable of modifying model parameters within seconds to redirect the model’s perception of a specific unlearning target concept to another harmless concept. Extensive experiments demonstrate that ROCR significantly improves unlearning effectiveness compared to traditional methods while generating highly natural outputs.
Article 1025
Title@2025-06-09 (1): High-Dimensional Learning in Finance
Title: High-Dimensional Learning in Finance | High-Dimensional Learning in Finance | 金融领域高层次学习 2506.03780v2 |
Authors (1): Hasan Fallahgoul
Recent advances in machine learning have shown promising results for financial prediction using large, over-parameterized models. This paper provides theoretical foundations and empirical validation for understanding when and how these methods achieve predictive success. I examine two key aspects of high-dimensional learning in finance. First, I prove that within-sample standardization in Random Fourier Features implementations fundamentally alters the underlying Gaussian kernel approximation, replacing shift-invariant kernels with training-set dependent alternatives. Second, I establish information-theoretic lower bounds that identify when reliable learning is impossible no matter how sophisticated the estimator. A detailed quantitative calibration of the polynomial lower bound shows that with typical parameter choices, e.g., 12,000 features, 12 monthly observations, and an R-squared of 2-3%, the sample size required to escape the bound exceeds 25-30 years of data, well beyond any rolling window actually used. Thus, observed out-of-sample success must originate from lower-complexity artefacts rather than from the intended high-dimensional mechanism.
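For reference, the Gaussian kernel approximation that the first result concerns can be sketched with the textbook Random Fourier Features construction. This is a minimal standard-library illustration, not the paper's code; the dimensions, bandwidth `sigma`, seed, and test points are arbitrary assumptions.

```python
import math
import random


def rff_features(x, W, b):
    """Map x to sqrt(2/D) * cos(W x + b), the standard random feature map."""
    D = len(W)
    scale = math.sqrt(2.0 / D)
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
            for w, bj in zip(W, b)]


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


rng = random.Random(0)
d, D, sigma = 3, 4000, 1.0
# Frequencies drawn from N(0, sigma^-2 I); phases uniform on [0, 2*pi].
W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(D)]
b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(D)]

x, y = [0.2, -0.1, 0.4], [0.0, 0.3, 0.1]
approx = sum(p * q for p, q in zip(rff_features(x, W, b), rff_features(y, W, b)))
exact = gaussian_kernel(x, y, sigma)
```

The inner product of the raw features estimates a kernel that depends only on x - y. Rescaling each feature column by statistics computed on the training sample, as in the within-sample standardization the abstract describes, makes the implied kernel depend on the training set, which is the alteration the first result formalizes.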
Article 1026
Title@2025-06-09 (1): Text-to-LoRA: Instant Transformer Adaption
Title: Text-to-LoRA: Instant Transformer Adaption | Text-zu-LoRA: Instant Transformer Adaption | 文本到LORA:即时变换器适应 2506.06105v2 |
Authors (4): Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, Robert Tjarko Lange
While Foundation Models provide a general tool for rapid content creation, they regularly require task-specific adaptation. Traditionally, this exercise involves careful curation of datasets and repeated fine-tuning of the underlying model. Fine-tuning techniques enable practitioners to adapt foundation models for many new applications but require expensive and lengthy training while being notably sensitive to hyperparameter choices. To overcome these limitations, we introduce Text-to-LoRA (T2L), a model capable of adapting large language models (LLMs) on the fly solely based on a natural language description of the target task. T2L is a hypernetwork trained to construct LoRAs in a single inexpensive forward pass. After training T2L on a suite of 9 pre-trained LoRA adapters (GSM8K, Arc, etc.), we show that the ad-hoc reconstructed LoRA instances match the performance of task-specific adapters across the corresponding test sets. Furthermore, T2L can compress hundreds of LoRA instances and zero-shot generalize to entirely unseen tasks. This approach provides a significant step towards democratizing the specialization of foundation models and enables language-based adaptation with minimal compute requirements. Our code is available at https://github.com/SakanaAI/text-to-lora
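A LoRA adapter is a pair of low-rank matrices added to a frozen weight, so "constructing a LoRA" amounts to producing those two matrices. The sketch below shows only the standard LoRA merge W + (alpha / r) * B A. In T2L a hypernetwork would output `A` and `B` from a task description; here they are hard-coded placeholder values, and the helper names are assumptions for illustration.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def lora_weight(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(A)  # A is r x d_in, B is d_out x r
    delta = matmul(B, A)
    s = alpha / r
    return [[w + s * dw for w, dw in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen base weight, d_out=2, d_in=3
A = [[1.0, 2.0, 3.0]]                    # rank-1 down-projection (1 x 3)
B = [[0.5], [-0.5]]                      # rank-1 up-projection (2 x 1)
W_adapted = lora_weight(W, A, B, alpha=2.0)
```

Because the adapter holds only r * (d_in + d_out) numbers per layer, generating it in one forward pass is far cheaper than fine-tuning the full weight.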
Article 1027
Title@2025-06-09 (1): Detecting Out-of-Distribution Objects through Class-Conditioned Inpainting
Title: Detecting Out-of-Distribution Objects through Class-Conditioned Inpainting | Nicht-Verteilungsobjekte durch klassenkonditionierte Inpainting erkennen | 通过按类别修饰油漆检测分销物体 2402.03292v3 |
Authors (7): Quang-Huy Nguyen, Jin Peng Zhou, Zhenzhen Liu, Khanh-Huyen Bui, Kilian Q. Weinberger, Wei-Lun Chao, Dung D. Le
Recent object detectors have achieved impressive accuracy in identifying objects seen during training. However, real-world deployment often introduces novel and unexpected objects, referred to as out-of-distribution (OOD) objects, posing significant challenges to model trustworthiness. Modern object detectors are typically overconfident, making it unreliable to use their predictions alone for OOD detection. To address this, we propose leveraging an auxiliary model as a complementary solution. Specifically, we utilize an off-the-shelf text-to-image generative model, such as Stable Diffusion, which is trained with objective functions distinct from those of discriminative object detectors. We hypothesize that this fundamental difference enables the detection of OOD objects by measuring inconsistencies between the models. Concretely, for a given detected object bounding box and its predicted in-distribution class label, we perform class-conditioned inpainting on the image with the object removed. If the object is OOD, the inpainted image is likely to deviate significantly from the original, making the reconstruction error a robust indicator of OOD status. Extensive experiments demonstrate that our approach consistently surpasses existing zero-shot and non-zero-shot OOD detection methods, establishing a robust framework for enhancing object detection systems in dynamic environments.
Article 1028
Title@2025-06-09 (1): Trial and Trust: Addressing Byzantine Attacks with Comprehensive Defense Strategy
Title: Trial and Trust: Addressing Byzantine Attacks with Comprehensive Defense Strategy | Prozess und Vertrauen: Byzantinische Angriffe mit umfassender Verteidigungsstrategie angehen | 审判和信任:以全面防卫战略对付拜占庭袭击 2505.07614v2 |
Authors (9): Gleb Molodtsov, Daniil Medyakov, Sergey Skorik, Nikolas Khachaturov, Shahane Tigranyan, Vladimir Aletov, Aram Avetisyan, Martin Takáč, Aleksandr Beznosikov
Recent advancements in machine learning have improved performance while also increasing computational demands. While federated and distributed setups address these issues, their structure is vulnerable to malicious influences. In this paper, we address a specific threat, Byzantine attacks, where compromised clients inject adversarial updates to derail global convergence. We combine the trust scores concept with trial function methodology to dynamically filter outliers. Our methods address the critical limitations of previous approaches, allowing functionality even when Byzantine nodes are in the majority. Moreover, our algorithms adapt to widely used scaled methods like Adam and RMSProp, as well as practical scenarios, including local training and partial participation. We validate the robustness of our methods by conducting extensive experiments on both synthetic and real ECG data collected from medical institutions. Furthermore, we provide a broad theoretical analysis of our algorithms and their extensions to aforementioned practical setups. The convergence guarantees of our methods are comparable to those of classical algorithms developed without Byzantine interference.
Article 1029
Title@2025-06-09 (1): Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger
Title: Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger | Re-Ranking Reasoning Kontext mit Baumsuche macht große Vision-Sprache Modelle stärker | 以树木搜索为重新排列的原因说明背景,使大型视觉语言模型更强大 2506.07785v1 |
Authors (6): Qi Yang, Chenghao Zhang, Lubin Fan, Kun Ding, Jieping Ye, Shiming Xiang
Recent advancements in Large Vision Language Models (LVLMs) have significantly improved performance in Visual Question Answering (VQA) tasks through multimodal Retrieval-Augmented Generation (RAG). However, existing methods still face challenges, such as the scarcity of knowledge with reasoning examples and erratic responses from retrieved knowledge. To address these issues, in this study, we propose a multimodal RAG framework, termed RCTS, which enhances LVLMs by constructing a Reasoning Context-enriched knowledge base and a Tree Search re-ranking method. Specifically, we introduce a self-consistent evaluation mechanism to enrich the knowledge base with intrinsic reasoning patterns. We further propose a Monte Carlo Tree Search with Heuristic Rewards (MCTS-HR) to prioritize the most relevant examples. This ensures that LVLMs can leverage high-quality contextual reasoning for better and more consistent responses. Extensive experiments demonstrate that our framework achieves state-of-the-art performance on multiple VQA datasets, significantly outperforming In-Context Learning (ICL) and Vanilla-RAG methods. It highlights the effectiveness of our knowledge base and re-ranking method in improving LVLMs. Our code is available at https://github.com/yannqi/RCTS-RAG.
Article 1030
Title@2025-06-09 (1): Is poisoning a real threat to LLM alignment? Maybe more so than you think
Title: Is poisoning a real threat to LLM alignment? Maybe more so than you think | Ist Vergiftung eine echte Bedrohung für LLM Ausrichtung? Vielleicht mehr, als Sie denken | 中毒是否真正威胁到LLM对齐? 2406.12091v4 |
Authors (5): Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, Furong Huang
Recent advancements in Reinforcement Learning with Human Feedback (RLHF) have significantly impacted the alignment of Large Language Models (LLMs). The sensitivity of reinforcement learning algorithms such as Proximal Policy Optimization (PPO) has led to a new line of work on Direct Policy Optimization (DPO), which treats RLHF in a supervised learning framework. The increased practical use of these RLHF methods warrants an analysis of their vulnerabilities. In this work, we investigate the vulnerabilities of DPO to poisoning attacks under different scenarios and compare the effectiveness of preference poisoning, a first of its kind. We comprehensively analyze DPO’s vulnerabilities under different types of attacks, i.e., backdoor and non-backdoor attacks, and different poisoning methods across a wide array of language models, i.e., LLama 7B, Mistral 7B, and Gemma 7B. We find that, unlike PPO-based methods, which require at least 4% of the data to be poisoned to elicit harmful behavior in backdoor attacks, DPO can be exploited more simply: the model can be poisoned with as little as 0.5% of the data. We further investigate the potential reasons behind the vulnerability and how well this vulnerability translates into backdoor vs non-backdoor attacks.
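To see why preference data is a direct attack surface, it helps to write out the standard DPO loss for one preference pair. The sketch below uses the usual DPO objective and a plain label flip as the simplest form of preference poisoning. It does not reproduce the paper's attack methods, and the log-probabilities are made-up values.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """-log sigmoid(beta * margin), where margin compares policy/reference
    log-ratios of the chosen (w) and rejected (l) responses."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))


# Hypothetical sequence log-probabilities for one preference pair.
clean = dpo_loss(logp_w=-10.0, logp_l=-14.0, ref_logp_w=-12.0, ref_logp_l=-12.0)
# Label-flip poisoning: the same pair with chosen and rejected swapped.
flipped = dpo_loss(logp_w=-14.0, logp_l=-10.0, ref_logp_w=-12.0, ref_logp_l=-12.0)
```

The flipped pair incurs the higher loss, so gradient descent on it pushes the policy toward the originally rejected response. Since DPO consumes the labels directly, with no separately trained reward model in between, each poisoned pair acts on the policy update itself.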
Article 1031
Title@2025-06-09 (1): WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection
Title: WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection | WWAggr: Ein Fenster Wasserstein-basierte Aggregation für Ensemble Change Point Detection | WWWAggr:一个基于窗口的瓦西施泰因聚合群,用于聚合变化点探测 2506.08066v1 |
Authors (3): Alexander Stepikin, Evgenia Romanenkova, Alexey Zaytsev
Change Point Detection (CPD) aims to identify moments of abrupt distribution shifts in data streams. Real-world high-dimensional CPD remains challenging due to data pattern complexity and violation of common assumptions. Resorting to standalone deep neural networks, the current state-of-the-art detectors have yet to achieve perfect quality. Concurrently, ensembling provides more robust solutions, boosting performance. In this paper, we investigate ensembles of deep change point detectors and find that standard prediction aggregation techniques, e.g., averaging, are suboptimal and fail to account for problem peculiarities. As an alternative, we introduce WWAggr, a novel task-specific method of ensemble aggregation based on the Wasserstein distance. Our procedure is versatile, working effectively with various ensembles of deep CPD models. Moreover, unlike existing solutions, we practically resolve the long-standing problem of decision threshold selection for CPD.
Article 1032
Title@2025-06-09 (1): Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Title: Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics | Die Grenzen der Low-Bit-Optimierer drücken: Fokus auf EMA-Dynamik | 推动低比最佳优化因素的极限:注重EMA动态 2505.00347v2 |
Authors (10): Cong Xu, Wenbin Liang, Mo Yu, Anan Liu, Ke-Yue Zhang, Shunli Wang, Lizhuang Ma, Jianyong Wang, Jun Wang, Wei Zhang
The rapid scaling of models has led to prohibitively high training and fine-tuning costs. A major factor accounting for memory consumption is the widespread use of stateful optimizers (e.g., Adam), which maintain auxiliary information of up to 2x the model size in order to achieve optimal convergence. We therefore present SOLO, a novel type of optimizer that requires an extremely light memory footprint. While previous efforts have achieved certain success in 8-bit or 4-bit cases, SOLO enables Adam-style optimizers to maintain quantized states with precision as low as 3 bits, or even 2 bits. This progress stems from identifying and resolving two key challenges: the signal swamping problem in unsigned quantization that results in unchanged state dynamics, and the increased gradient variance in signed quantization that leads to incorrect descent directions. The theoretical analysis suggests a tailored logarithmic quantization for the former and a precision-specific momentum hyperparameter for the latter. SOLO can thus be seamlessly applied to Adam-style optimizers, leading to substantial memory savings with minimal accuracy loss.
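The signal swamping effect mentioned in this abstract can be illustrated with a toy sketch. It assumes a uniform 2-bit unsigned quantizer on [0, 1] and a scalar EMA state; the helper names (`quantize_unsigned`, `quantized_ema`, `full_precision_ema`) are hypothetical and this is not SOLO's quantizer. With a decay close to 1, each EMA increment is smaller than half a quantization step, so rounding returns the old state and the state never moves.

```python
def quantize_unsigned(x, bits=2, max_val=1.0):
    """Round x to the nearest level of a uniform unsigned grid on [0, max_val]."""
    levels = 2 ** bits - 1                 # number of steps between 0 and max_val
    x = min(max(x, 0.0), max_val)          # clamp into the representable range
    return round(x / max_val * levels) / levels * max_val


def quantized_ema(signal, beta, steps, bits=2):
    """EMA whose state is re-quantized after every update (toy assumption)."""
    state = 0.0
    for _ in range(steps):
        state = quantize_unsigned(beta * state + (1.0 - beta) * signal, bits)
    return state


def full_precision_ema(signal, beta, steps):
    """Reference EMA kept in floating point."""
    state = 0.0
    for _ in range(steps):
        state = beta * state + (1.0 - beta) * signal
    return state


# With beta = 0.999 the per-step increment (0.001) is swamped by the
# quantization step (1/3), so the quantized state stays at its initial value.
stuck = quantized_ema(1.0, beta=0.999, steps=1000)
reference = full_precision_ema(1.0, beta=0.999, steps=1000)
print(stuck, reference)
```

In this toy setting a much smaller decay (for example 0.2) lets the quantized state move, which is only meant to show why the interaction between the EMA decay and the quantization grid matters.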
Article 1033
Title@2025-06-09 (1): Trend-Aware Fashion Recommendation with Visual Segmentation and Semantic Similarity
Title: Trend-Aware Fashion Recommendation with Visual Segmentation and Semantic Similarity | Trend-Aware Fashion Empfehlung mit Visueller Segmentierung und semantischer Ähnlichkeit | 具有视觉分解和语义相似性的趋势软件时装建议 2506.07773v1 |
Authors (3): Mohamed Djilani, Nassim Ali Ousalah, Nidhal Eddine Chenni
We introduce a trend-aware and visually-grounded fashion recommendation system that integrates deep visual representations, garment-aware segmentation, semantic category similarity and user behavior simulation. Our pipeline extracts focused visual embeddings by masking non-garment regions via semantic segmentation followed by feature extraction using pretrained CNN backbones (ResNet-50, DenseNet-121, VGG16). To simulate realistic shopping behavior, we generate synthetic purchase histories influenced by user-specific trendiness and item popularity. Recommendations are computed using a weighted scoring function that fuses visual similarity, semantic coherence and popularity alignment. Experiments on the DeepFashion dataset demonstrate consistent gender alignment and improved category relevance, with ResNet-50 achieving 64.95% category similarity and lowest popularity MAE. An ablation study confirms the complementary roles of visual and popularity cues. Our method provides a scalable framework for personalized fashion recommendations that balances individual style with emerging trends. Our implementation is available at https://github.com/meddjilani/FashionRecommender
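A weighted scoring function of the kind described in this abstract might look like the following minimal sketch. The weights, the item fields (`embedding`, `category_similarity`, `popularity`) and the definition of popularity alignment as one minus the gap between user trendiness and item popularity are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_item(query_embedding, item, user_trendiness, weights=(0.6, 0.3, 0.1)):
    """Fuse visual, semantic and popularity terms with hypothetical weights."""
    w_visual, w_semantic, w_popularity = weights
    visual = cosine_similarity(query_embedding, item["embedding"])
    semantic = item["category_similarity"]          # assumed precomputed in [0, 1]
    alignment = 1.0 - abs(user_trendiness - item["popularity"])
    return w_visual * visual + w_semantic * semantic + w_popularity * alignment


def recommend(query_embedding, items, user_trendiness, k=1):
    """Return the names of the k highest-scoring items."""
    ranked = sorted(
        items,
        key=lambda it: score_item(query_embedding, it, user_trendiness),
        reverse=True,
    )
    return [it["name"] for it in ranked[:k]]


items = [
    {"name": "A", "embedding": [1.0, 0.0], "category_similarity": 1.0, "popularity": 0.9},
    {"name": "B", "embedding": [0.0, 1.0], "category_similarity": 0.0, "popularity": 0.1},
]
top = recommend([1.0, 0.0], items, user_trendiness=0.9)
```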
Article 1034
Title@2025-06-09 (1): Diffusion Models-Aided Uplink Channel Estimation for RIS-Assisted Systems
Title: Diffusion Models-Aided Uplink Channel Estimation for RIS-Assisted Systems | Diffusion Models-Aided Uplink Channel Abschätzung für RIS-Assistente Systeme | IRIS辅助系统传播模型辅助上链接频道估计 2506.07770v1 |
Authors (8): Yang Wang, Yin Xu, Cixiao Zhang, Zhiyong Chen, Xiaowu Ou, Mingzeng Dai, Meixia Tao, Wenjun Zhang
This letter proposes a channel estimation method for reconfigurable intelligent surface (RIS)-assisted systems through a novel diffusion model (DM) framework. We reformulate the channel estimation problem as a denoising process, which aligns with the reverse process of the DM. To overcome the inherent randomness in the reverse process of conventional DM approaches, we adopt a deterministic sampling strategy with a step alignment mechanism that ensures the accuracy of channel estimation while adapting to different signal-to-noise ratios (SNRs). Furthermore, to reduce the number of parameters of the U-Net, we meticulously design a lightweight network that achieves comparable performance, thereby enhancing the practicality of our proposed method. Extensive simulations demonstrate superior performance over a wide range of SNRs compared to baselines. For instance, the proposed method achieves performance improvements of up to 13.5 dB in normalized mean square error (NMSE) at SNR = 0 dB. Notably, the proposed lightweight network exhibits almost no performance loss compared to the original U-Net, while requiring only 6.59% of its parameters.
Article 1035
Title@2025-06-09 (1): Clustered Federated Learning via Embedding Distributions
Title: Clustered Federated Learning via Embedding Distributions | Clustered Federated Learning durch Einbettung von Distributionen | 通过嵌入式分发分发方式进行分组联邦学习 2506.07769v1 |
Authors (3): Dekai Zhang, Matthew Williams, Francesca Toni
Federated learning (FL) is a widely used framework for machine learning in distributed data environments where clients hold data that cannot be easily centralised, such as for data protection reasons. FL, however, is known to be vulnerable to non-IID data. Clustered FL addresses this issue by finding more homogeneous clusters of clients. We propose a novel one-shot clustering method, EMD-CFL, using the Earth Mover’s distance (EMD) between data distributions in embedding space. We theoretically motivate the use of EMDs using results from the domain adaptation literature and demonstrate empirically superior clustering performance in extensive comparisons against 16 baselines and on a range of challenging datasets.
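To make the idea of clustering clients by Earth Mover's distance concrete, here is a toy sketch. It assumes one-dimensional embeddings and equal sample counts per client, in which case the EMD reduces to the mean absolute difference between sorted samples. The greedy threshold grouping (`cluster_clients`) is a hypothetical stand-in chosen for brevity, not the EMD-CFL clustering procedure.

```python
def emd_1d(a, b):
    """EMD between two equal-size 1-D empirical distributions.

    For equal sample counts in one dimension, the optimal transport plan
    matches sorted samples, so the distance is the mean absolute difference.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample counts")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def cluster_clients(client_samples, threshold):
    """Greedy one-pass grouping: join the first cluster whose first member is close."""
    clusters = []                          # each cluster is a list of client indices
    for idx, samples in enumerate(client_samples):
        for cluster in clusters:
            representative = client_samples[cluster[0]]
            if emd_1d(samples, representative) <= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])         # no close cluster found: start a new one
    return clusters


clients = [
    [0.0, 1.0, 2.0],
    [0.1, 1.1, 2.1],
    [10.0, 11.0, 12.0],
    [10.2, 11.1, 12.3],
]
groups = cluster_clients(clients, threshold=1.0)
```

In higher-dimensional embedding spaces the sorted-sample shortcut no longer applies and a general optimal transport solver would be needed.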
Article 1036
Title@2025-06-09 (1): Quickest Causal Change Point Detection by Adaptive Intervention
Title: Quickest Causal Change Point Detection by Adaptive Intervention | Schnellste ursächliche Punktänderungserkennung durch adaptive Intervention | 通过适应干预措施快速发现因果变化点 2506.07760v1 |
Authors (2): Haijie Xu, Chen Zhang
We propose an algorithm for change point monitoring in linear causal models that accounts for interventions. Through a special centralization technique, we can concentrate the changes arising from causal propagation across nodes into a single dimension. Additionally, by selecting appropriate intervention nodes based on Kullback-Leibler divergence, we can amplify the change magnitude. We also present an algorithm for selecting the intervention values, which aids in the identification of the most effective intervention nodes. Two monitoring methods are proposed, each with an adaptive intervention policy to balance exploration and exploitation. We theoretically demonstrate the first-order optimality of the proposed methods and validate their properties using simulation datasets and two real-world case studies.
Article 1037
Title@2025-06-09 (1): Agent Semantics, Semantic Spacetime, and Graphical Reasoning
Title: Agent Semantics, Semantic Spacetime, and Graphical Reasoning | Agent Semantics, Semantische Raumzeit und Graphische Vernunft | 语义学、语义空间时间和图形解释 2506.07756v1 |
Authors (1): Mark Burgess
Some formal aspects of the Semantic Spacetime graph model are presented, with reference to its use for directed knowledge representations and process modelling. A finite $\gamma(3,4)$ representation is defined to form a closed set of operations that can scale to any degree of semantic complexity. The Semantic Spacetime postulates bring predictability with minimal constraints to pathways in graphs. The ubiquitous appearance of absorbing states in any partial graph means that a graph process leaks information. This is closely associated with the issue of division by zero, which signals a loss of closure and the need for manual injection of remedial information. The origins of the Semantic Spacetime model (and its Promise Theory) help to clarify how such absorbing states are associated with boundary information where intentionality can enter.
Article 1038
Title@2025-06-09 (1): Comparing Credit Risk Estimates in the Gen-AI Era
Title: Comparing Credit Risk Estimates in the Gen-AI Era | Vergleich der Schätzungen des Kreditrisikos in der Ära der Gen-AI | 比较GEN-AI时代的信用风险估算 2506.07754v1 |
Authors (7): Nicola Lavecchia, Sid Fadanelli, Federico Ricciuti, Gennaro Aloe, Enrico Bagli, Pietro Giuffrida, Daniele Vergari
Generative AI technologies have demonstrated significant potential across diverse applications. This study provides a comparative analysis of credit score modeling techniques, contrasting traditional approaches with those leveraging generative AI. Our findings reveal that current generative AI models fall short of matching the performance of traditional methods, regardless of the integration strategy employed. These results highlight the limitations in the current capabilities of generative AI for credit risk scoring, emphasizing the need for further research and development before generative AI can be applied to this specific task or equivalent ones.
Article 1039
Title@2025-06-09 (1): An Overview of the Burer-Monteiro Method for Certifiable Robot Perception
Title: An Overview of the Burer-Monteiro Method for Certifiable Robot Perception | Ein Überblick über die Burer-Monteiro-Methode für zertifizierbare Roboterwahrnehmung | 关于可核证的机器人感知的Burer-Monteiro方法概览 2410.00117v2 |
Authors (5): Alan Papalia, Yulun Tian, David M. Rosen, Jonathan P. How, John J. Leonard
This paper presents an overview of the Burer-Monteiro method (BM), a technique that has been applied to solve robot perception problems to certifiable optimality in real time. BM is often used to solve semidefinite programming relaxations, which can be used to perform global optimization for non-convex perception problems. Specifically, BM leverages the low-rank structure of typical semidefinite programs to dramatically reduce the computational cost of performing optimization. This paper discusses BM in certifiable perception, with three main objectives: (i) to consolidate information from the literature into a unified presentation, (ii) to elucidate the role of the linear independence constraint qualification (LICQ), a concept not yet well covered in the certifiable perception literature, and (iii) to share practical considerations that are discussed among practitioners but not thoroughly covered in the literature. Our general aim is to offer a practical primer for applying BM towards certifiable perception.
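For readers unfamiliar with the factorization, the standard textbook form of the Burer-Monteiro substitution is sketched here. The symbols $C$, $A_i$, $b_i$, $n$, $m$ and the rank $r$ are notation assumed for illustration and do not come from the abstract: the positive semidefinite variable $X$ is replaced by a low-rank product $Y Y^{\top}$, which removes the cone constraint at the price of a non-convex problem in $Y$.

```latex
\begin{aligned}
\text{(SDP)}\quad & \min_{X \in \mathbb{S}^{n},\; X \succeq 0} \ \langle C, X \rangle
  \quad \text{s.t.}\quad \langle A_i, X \rangle = b_i,\quad i = 1, \dots, m, \\
\text{(BM)}\quad & \min_{Y \in \mathbb{R}^{n \times r}} \ \langle C, Y Y^{\top} \rangle
  \quad \text{s.t.}\quad \langle A_i, Y Y^{\top} \rangle = b_i,\quad i = 1, \dots, m.
\end{aligned}
```

When $r \ll n$, the factor $Y$ has $nr$ entries instead of the roughly $n^2/2$ free entries of $X$, which is the source of the computational savings.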
Article 1040
Title@2025-06-09 (1): DualCast: A Model to Disentangle Aperiodic Events from Traffic Series
Title: DualCast: A Model to Disentangle Aperiodic Events from Traffic Series | DualCast: Ein Modell, um Aperiodische Ereignisse aus der Verkehrsreihe zu entwirren | DualCast: 将周期性事件与交通系列分开的模型 2411.18286v2 |
Authors (6): Xinyu Su, Feng Liu, Yanchuan Chang, Egemen Tanin, Majid Sarvi, Jianzhong Qi
Traffic forecasting is crucial for transportation systems optimisation. Current models minimise the mean forecasting errors, often favouring periodic events prevalent in the training data, while overlooking critical aperiodic ones like traffic incidents. To address this, we propose DualCast, a dual-branch framework that disentangles traffic signals into intrinsic spatial-temporal patterns and external environmental contexts, including aperiodic events. DualCast also employs a cross-time attention mechanism to capture high-order spatial-temporal relationships from both periodic and aperiodic patterns. DualCast is versatile. We integrate it with recent traffic forecasting models, consistently reducing their forecasting errors by up to 9.6% on multiple real datasets. Our source code is available at https://github.com/suzy0223/DualCast.
Article 1041
Title@2025-06-09 (1): Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning
Title: Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning | Graph-Assistente Stiche für Offline-Hierarchisches Verstärkungslernen | 离线高层强化学习的图表辅助细化 2506.07744v1 |
Authors (5): Seungho Baek, Taegeon Park, Jongchan Park, Seungjun Oh, Yusung Kim
Existing offline hierarchical reinforcement learning methods rely on high-level policy learning to generate subgoal sequences. However, their efficiency degrades as task horizons increase, and they lack effective strategies for stitching useful state transitions across different trajectories. We propose Graph-Assisted Stitching (GAS), a novel framework that formulates subgoal selection as a graph search problem rather than learning an explicit high-level policy. By embedding states into a Temporal Distance Representation (TDR) space, GAS clusters semantically similar states from different trajectories into unified graph nodes, enabling efficient transition stitching. A shortest-path algorithm is then applied to select subgoal sequences within the graph, while a low-level policy learns to reach the subgoals. To improve graph quality, we introduce the Temporal Efficiency (TE) metric, which filters out noisy or inefficient transition states, significantly enhancing task performance. GAS outperforms prior offline HRL methods across locomotion, navigation, and manipulation tasks. Notably, in the most stitching-critical task, it achieves a score of 88.3, dramatically surpassing the previous state-of-the-art score of 1.0. Our source code is available at: https://github.com/qortmdgh4141/GAS.
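The graph-search view of subgoal selection can be sketched with a plain Dijkstra search. This is a minimal illustration assuming a graph stored as a dictionary of weighted edges, where the weights stand in for temporal distances between clustered states. The function name `shortest_subgoal_path` and the toy graph are hypothetical, and the abstract does not say which shortest-path algorithm GAS uses.

```python
import heapq


def shortest_subgoal_path(graph, start, goal):
    """Return the lowest-cost node sequence from start to goal, or None."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue                        # stale heap entry, already improved
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if goal not in dist:
        return None                         # goal unreachable from start
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy graph: edge weights play the role of temporal distances between nodes.
graph = {
    "s": {"a": 1.0, "b": 4.0},
    "a": {"b": 1.0, "g": 5.0},
    "b": {"g": 1.0},
}
subgoals = shortest_subgoal_path(graph, "s", "g")
```

The returned sequence would then be handed to a low-level policy one subgoal at a time.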
Article 1042
Title@2025-06-09 (1): An end-to-end attention-based approach for learning on graphs
Title: An end-to-end attention-based approach for learning on graphs | Ein durchgängiger, auf der Aufmerksamkeit basierender Ansatz für das Lernen auf Diagrammen | 以端对端关注为基础的图表学习方法 2402.10793v3 |
Authors (4): David Buterez, Jon Paul Janet, Dino Oglic, Pietro Lio
There has been a recent surge in transformer-based architectures for learning on graphs, mainly motivated by attention as an effective learning mechanism and the desire to supersede handcrafted operators characteristic of message passing schemes. However, concerns over their empirical effectiveness, scalability, and complexity of the pre-processing steps have been raised, especially in relation to much simpler graph neural networks that typically perform on par with them across a wide range of benchmarks. To tackle these shortcomings, we consider graphs as sets of edges and propose a purely attention-based approach consisting of an encoder and an attention pooling mechanism. The encoder vertically interleaves masked and vanilla self-attention modules to learn effective representations of edges, while allowing for tackling possible misspecifications in input graphs. Despite its simplicity, the approach outperforms fine-tuned message passing baselines and recently proposed transformer-based methods on more than 70 node and graph-level tasks, including challenging long-range benchmarks. Moreover, we demonstrate state-of-the-art performance across different tasks, ranging from molecular to vision graphs, and heterophilous node classification. The approach also outperforms graph neural networks and transformers in transfer learning settings, and scales much better than alternatives with a similar performance level or expressive power.
Article 1043
Title@2025-06-09 (1): When, Where and Why to Average Weights?
Title: When, Where and Why to Average Weights? | Wann, wo und warum zu Durchschnittsgewichten? | 何时、何地和为何平均体重? 2502.06761v2 |
Authors (3): Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping
Averaging checkpoints along the training trajectory is a simple yet powerful approach to improve the generalization performance of Machine Learning models and reduce training time. Motivated by these potential gains, and in an effort to fairly and thoroughly benchmark this technique, we present an extensive evaluation of averaging techniques in modern Deep Learning, which we perform using AlgoPerf (Dahl et al., 2023), a large-scale benchmark for optimization algorithms. We investigate whether weight averaging can reduce training time, improve generalization, and replace learning rate decay, as suggested by recent literature. Our evaluation across seven architectures and datasets reveals that averaging significantly accelerates training and yields considerable efficiency gains, at the price of a minimal implementation and memory cost, while mildly improving generalization across all considered workloads. Finally, we explore the relationship between averaging and learning rate annealing and show how to optimally combine the two to achieve the best performance.
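Two common forms of weight averaging, a uniform average over saved checkpoints and an exponential moving average updated during training, can be sketched as follows. Checkpoints are represented here as plain dictionaries mapping parameter names to lists of floats, which is an assumption made to keep the example dependency-free. The abstract does not state which averaging variants or decay values the study evaluates.

```python
def average_checkpoints(checkpoints):
    """Uniformly average a list of checkpoints (dicts of name -> list of floats)."""
    count = len(checkpoints)
    first = checkpoints[0]
    return {
        name: [
            sum(ckpt[name][i] for ckpt in checkpoints) / count
            for i in range(len(first[name]))
        ]
        for name in first
    }


def ema_update(average, new_weights, decay):
    """One exponential-moving-average step: avg <- decay * avg + (1 - decay) * new."""
    return {
        name: [decay * a + (1.0 - decay) * w
               for a, w in zip(average[name], new_weights[name])]
        for name in average
    }


checkpoint_a = {"w": [1.0, 2.0], "b": [0.0]}
checkpoint_b = {"w": [3.0, 4.0], "b": [1.0]}

uniform = average_checkpoints([checkpoint_a, checkpoint_b])
ema = ema_update(checkpoint_a, checkpoint_b, decay=0.5)
```

The memory cost referred to in the abstract corresponds to keeping one extra copy of the weights (the running average) alongside the model being trained.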
Article 1044
Title@2025-06-09 (1): Language Embedding Meets Dynamic Graph: A New Exploration for Neural Architecture Representation Learning
Title: Language Embedding Meets Dynamic Graph: A New Exploration for Neural Architecture Representation Learning | Spracheinbettung trifft auf Dynamische Grafik: Eine neue Exploration für neurale Architektur-Repräsentations-Lernen | 语言嵌入与动态图表相匹配:神经结构代表制学习的新探索 2506.07735v1 |
Authors (6): Haizhao Jing, Haokui Zhang, Zhenhao Shang, Rong Xiao, Peng Wang, Yanning Zhang
Neural Architecture Representation Learning aims to transform network models into feature representations for predicting network attributes, playing a crucial role in deploying and designing networks for real-world applications. Recently, inspired by the success of transformers, transformer-based models integrated with Graph Neural Networks (GNNs) have achieved significant progress in representation learning. However, current methods still have some limitations. First, existing methods overlook hardware attribute information, which conflicts with the current trend of diversified deep learning hardware and limits the practical applicability of models. Second, current encoding approaches rely on static adjacency matrices to represent topological structures, failing to capture the structural differences between computational nodes, which ultimately compromises encoding effectiveness. In this paper, we introduce LeDG-Former, an innovative framework that addresses these limitations through the synergistic integration of language-based semantic embedding and dynamic graph representation learning. Specifically, inspired by large language models (LLMs), we propose a language embedding framework where both neural architectures and hardware platform specifications are projected into a unified semantic space through tokenization and LLM processing, enabling zero-shot prediction across different hardware platforms for the first time. Then, we propose a dynamic graph-based transformer for modeling neural architectures, resulting in improved neural architecture modeling performance. On the NNLQP benchmark, LeDG-Former surpasses previous methods, establishing a new SOTA while demonstrating the first successful cross-hardware latency prediction capability. Furthermore, our framework achieves superior performance on the cell-structured NAS-Bench-101 and NAS-Bench-201 datasets.
Article 1045
Title@2025-06-09 (1): Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild
Title: Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild | Robuste 3D-Form-Rekonstruktion in Zero-Shot aus einem einzigen Bild in der Wildnis | 从野生的单一图像中进行零热零位的强力 3D 形状重建 2403.14539v3 |
Authors (4): Junhyeong Cho, Kim Youwang, Hunmin Yang, Tae-Hyun Oh
Recent monocular 3D shape reconstruction methods have shown promising zero-shot results on object-segmented images without any occlusions. However, their effectiveness is significantly compromised in real-world conditions, due to imperfect object segmentation by off-the-shelf models and the prevalence of occlusions. To effectively address these issues, we propose a unified regression model that integrates segmentation and reconstruction, specifically designed for occlusion-aware 3D shape reconstruction. To facilitate its reconstruction in the wild, we also introduce a scalable data synthesis pipeline that simulates a wide range of variations in objects, occluders, and backgrounds. Training on our synthetic data enables the proposed model to achieve state-of-the-art zero-shot results on real-world images, using significantly fewer parameters than competing approaches.
Article 1046
Title@2025-06-09 (1): Profiling Electric Vehicles via Early Charging Voltage Patterns
Title: Profiling Electric Vehicles via Early Charging Voltage Patterns | Profilieren von Elektrofahrzeugen über frühe Ladespannungsmuster | 通过早期充电压模式分析电动车辆 2506.07714v1 |
Authors (4): Francesco Marchiori, Denis Donadel, Alessandro Brighente, Mauro Conti
Electric Vehicles (EVs) are rapidly gaining adoption as a sustainable alternative to fuel-powered vehicles, making secure charging infrastructure essential. Despite traditional authentication protocols, recent results showed that attackers may steal energy through tailored relay attacks. One countermeasure is leveraging the EV’s fingerprint on the current exchanged during charging. However, existing methods focus on the final charging stage, allowing malicious actors to consume substantial energy before being detected and repudiated. This underscores the need for earlier and more effective authentication methods to prevent unauthorized charging. Meanwhile, profiling raises privacy concerns, as uniquely identifying EVs through charging patterns could enable user tracking. In this paper, we propose a framework for uniquely identifying EVs using physical measurements from the early charging stages. We hypothesize that voltage behavior early in the process exhibits similar characteristics to current behavior in later stages. By extracting features from early voltage measurements, we demonstrate the feasibility of EV profiling. Our approach improves existing methods by enabling faster and more reliable vehicle identification. We test our solution on a dataset of 7408 usable charges from 49 EVs, achieving up to 0.86 accuracy. Feature importance analysis shows that near-optimal performance is possible with just 10 key features, improving efficiency alongside our lightweight models. This research lays the foundation for a novel authentication factor while exposing potential privacy risks from unauthorized access to charging data.
Article 1047
Title@2025-06-09 (1): Representation Bending for Large Language Model Safety
Title: Representation Bending for Large Language Model Safety | Darstellungsbiegen für große Sprachmodellsicherheit | 大语文示范语文安全示范语文代表名单 2504.01550v2 |
Authors (10): Ashkan Yousefpour, Taeheon Kim, Ryan S. Kwon, Seungbeen Lee, Wonje Jeung, Seungju Han, Alvin Wan, Harrison Ngan, Youngjae Yu, Jonghyun Choi
Large Language Models (LLMs) have emerged as powerful tools, but their inherent safety risks - ranging from harmful content generation to broader societal harms - pose significant challenges. These risks can be amplified by recent adversarial attacks, fine-tuning vulnerabilities, and the increasing deployment of LLMs in high-stakes environments. Existing safety-enhancing techniques, such as fine-tuning with human feedback or adversarial training, are still vulnerable as they address specific threats and often fail to generalize across unseen attacks, or require manual system-level defenses. This paper introduces RepBend, a novel approach that fundamentally disrupts the representations underlying harmful behaviors in LLMs, offering a scalable solution to enhance (potentially inherent) safety. RepBend brings the idea of activation steering - simple vector arithmetic for steering a model’s behavior during inference - to loss-based fine-tuning. Through extensive evaluation, RepBend achieves state-of-the-art performance, outperforming prior methods such as Circuit Breaker, RMU, and NPO, with up to 95% reduction in attack success rates across diverse jailbreak benchmarks, all with negligible reduction in model usability and general capabilities.
Article 1048
Title@2025-06-09 (1): Statistical Taylor Expansion
Title: Statistical Taylor Expansion | Statistische Taylor-Erweiterung | 扩大泰勒统计 2410.01223v8 |
Authors (1): Chengpu Wang
Statistical Taylor expansion replaces the precise input variables in a conventional Taylor expansion with random variables, each with a known distribution, to calculate the mean and deviation of the result. It is based on the uncorrelated uncertainty assumption: each input variable is measured independently with fine enough statistical precision, so that their uncertainties are independent of each other. It requires each sampling count to be sufficiently large for accurate results. Statistical Taylor expansion reveals that the intermediate analytic expressions can no longer be regarded as independent of each other, and that the result of an analytic expression should be path independent. This conclusion differs fundamentally from the conventional common approach in applied mathematics of finding the best execution path for a result. This paper also presents an implementation of statistical Taylor expansion called variance arithmetic, along with tests of variance arithmetic.
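The general idea of pushing an input distribution through a Taylor expansion can be sketched with the classical second-order propagation formulas. This sketch assumes a single Gaussian input and truncation at second order, and the function name `taylor_mean_variance` is hypothetical. It is a textbook approximation, not the paper's variance arithmetic.

```python
def taylor_mean_variance(f, df, d2f, mu, sigma):
    """Second-order Taylor estimate of the mean and variance of f(X), X ~ N(mu, sigma^2).

    mean     ~ f(mu) + 0.5 * f''(mu) * sigma^2
    variance ~ f'(mu)^2 * sigma^2 + 0.5 * f''(mu)^2 * sigma^4
    (the sigma^4 term uses Var(delta^2) = 2 * sigma^4 for a Gaussian delta)
    """
    var_in = sigma ** 2
    mean = f(mu) + 0.5 * d2f(mu) * var_in
    variance = df(mu) ** 2 * var_in + 0.5 * d2f(mu) ** 2 * var_in ** 2
    return mean, variance


# For f(x) = x^2 the expansion terminates at second order, so the estimate
# matches the exact Gaussian moments: mean = mu^2 + sigma^2,
# variance = 4 * mu^2 * sigma^2 + 2 * sigma^4.
mean, variance = taylor_mean_variance(
    f=lambda x: x * x,
    df=lambda x: 2.0 * x,
    d2f=lambda x: 2.0,
    mu=3.0,
    sigma=0.5,
)
```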
Article 1049
Title@2025-06-09 (1): Evaluating Robustness in Latent Diffusion Models via Embedding Level Augmentation
Title: Evaluating Robustness in Latent Diffusion Models via Embedding Level Augmentation | Bewertung der Robustheit in latenten Diffusionsmodellen durch Einbetten von Level Augmentation | 通过嵌入级递增评估前流传播模型的强力 2506.07706v1 |
Authors (2): Boris Martirosyan, Alexey Karmanov
Latent diffusion models (LDMs) achieve state-of-the-art performance across various tasks, including image generation and video synthesis. However, they generally lack robustness, a limitation that remains not fully explored in current research. In this paper, we propose several methods to address this gap. First, we hypothesize that the robustness of LDMs should primarily be measured without their text encoder, because if we take and explore the whole architecture, the problems of the image generator and the text encoder will be fused. Second, we introduce novel data augmentation techniques designed to reveal robustness shortcomings in LDMs when processing diverse textual prompts. We then fine-tune Stable Diffusion 3 and Stable Diffusion XL models using Dreambooth, incorporating these proposed augmentation methods across multiple tasks. Finally, we propose a novel evaluation pipeline specifically tailored to assess the robustness of LDMs fine-tuned via Dreambooth.
Article 1050
Title@2025-06-09 (1): Towards a Small Language Model Lifecycle Framework
Title: Towards a Small Language Model Lifecycle Framework | Auf dem Weg zu einem Rahmen für den Lebenszyklus eines kleinen Sprachmodells | 建立一个小型语言模拟生命周期框架 2506.07695v1 |
Authors (4): Parsa Miraghaei, Sergio Moreschini, Antti Kolehmainen, David Hästbacka
Background: The growing demand for efficient and deployable language models has led to increased interest in Small Language Models (SLMs). However, existing research remains fragmented, lacking a unified lifecycle perspective. Objective: This study aims to define a comprehensive lifecycle framework for SLMs by synthesizing insights from academic literature and practitioner sources. Method: We conducted a comprehensive survey of 36 works, analyzing and categorizing lifecycle-relevant techniques. Results: We propose a modular lifecycle model structured into main, optional, and cross-cutting components. The model captures key interconnections across stages, supporting method reuse, co-adaptation, and lifecycle-awareness. Conclusion: Our framework provides a coherent foundation for developing and maintaining SLMs, bridging theory and practice, and guiding future research and tool development.
Article 1051
Title@2025-06-09 (1): Training Superior Sparse Autoencoders for Instruct Models
Title: Training Superior Sparse Autoencoders for Instruct Models | Training Superior Sparse Autoencoder für Instruct-Modelle | 培训高级Sparass Autoencors 教学模型高级导师 2506.07691v1 |
Authors (8): Jiaming Li, Haoran Ye, Yukun Chen, Xinyue Li, Lei Zhang, Hamid Alinejad-Rokny, Jimmy Chih-Hsien Peng, Min Yang
As large language models (LLMs) grow in scale and capability, understanding their internal mechanisms becomes increasingly critical. Sparse autoencoders (SAEs) have emerged as a key tool in mechanistic interpretability, enabling the extraction of human-interpretable features from LLMs. However, existing SAE training methods are primarily designed for base models, resulting in reduced reconstruction quality and interpretability when applied to instruct models. To bridge this gap, we propose Finetuning-aligned Sequential Training (FAST), a novel training method specifically tailored for instruct models. FAST aligns the training process with the data distribution and activation patterns characteristic of instruct models, resulting in substantial improvements in both reconstruction and feature interpretability. On Qwen2.5-7B-Instruct, FAST achieves a mean squared error of 0.6468 in token reconstruction, significantly outperforming baseline methods with errors of 5.1985 and 1.5096. In feature interpretability, FAST yields a higher proportion of high-quality features: for Llama3.2-3B-Instruct, 21.1% scored in the top range, compared to 7.0% and 10.2% for BT(P) and BT(F). Surprisingly, we discover that intervening on the activations of special tokens via the SAEs leads to improvements in output quality, suggesting new opportunities for fine-grained control of model behavior. Code, data, and 240 trained SAEs are available at https://github.com/Geaming2002/FAST.
Article 1052
Title@2025-06-09 (1): Rao-Blackwellised Reparameterisation Gradients
Title: Rao-Blackwellised Reparameterisation Gradients | Rao-Blackwellised Reparameterisation Gradienten | Rao-黑-黑- 恢复度梯度 2506.07687v1 |
Authors (4): Kevin Lam, Thang Bui, George Deligiannidis, Yee Whye Teh
Latent Gaussian variables have been popularised in probabilistic machine learning. In turn, gradient estimators are the machinery that facilitates gradient-based optimisation for models with latent Gaussian variables. The reparameterisation trick is often used as the default estimator as it is simple to implement and yields low-variance gradients for variational inference. In this work, we propose the R2-G2 estimator as the Rao-Blackwellisation of the reparameterisation gradient estimator. Interestingly, we show that the local reparameterisation gradient estimator for Bayesian MLPs is an instance of the R2-G2 estimator and Rao-Blackwellisation. This lets us extend benefits of Rao-Blackwellised gradients to a suite of probabilistic models. We show that initial training with R2-G2 consistently yields better performance in models with multiple applications of the reparameterisation trick.
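As background, the plain reparameterisation gradient estimator that R2-G2 builds on can be sketched in a few lines. This assumes a scalar latent z = mu + sigma * eps with eps ~ N(0, 1) and a differentiable test function whose derivative is supplied by hand. The function name `reparam_gradients` is hypothetical, and the sketch shows only the baseline estimator, not the Rao-Blackwellised one proposed in the paper.

```python
import random


def reparam_gradients(df, mu, sigma, num_samples=20000, seed=0):
    """Monte Carlo reparameterisation gradients of E[f(z)], z = mu + sigma * eps.

    By the chain rule, d f(z) / d mu = f'(z) and d f(z) / d sigma = f'(z) * eps,
    so averaging these per-sample terms estimates both gradients.
    """
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps
        slope = df(z)
        grad_mu += slope
        grad_sigma += slope * eps
    return grad_mu / num_samples, grad_sigma / num_samples


# For f(z) = z^2, E[f(z)] = mu^2 + sigma^2, so the exact gradients are
# 2 * mu with respect to mu and 2 * sigma with respect to sigma.
g_mu, g_sigma = reparam_gradients(lambda z: 2.0 * z, mu=1.5, sigma=1.0)
```

Rao-Blackwellisation, in general terms, replaces such per-sample terms with conditional expectations to reduce variance without introducing bias.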
Article 1053
Title@2025-06-09 (1): On Fitting Flow Models with Large Sinkhorn Couplings
Title: On Fitting Flow Models with Large Sinkhorn Couplings | Auf Einbau-Flow-Modelle mit großen Sinkhorn-Kupplungen | 配有大型Sinkhorn组合体的 “ 适合流程模型 “ 2506.05526v2 |
Authors (4): Michal Klein, Alireza Mousavi-Hosseini, Stephen Zhang, Marco Cuturi
Flow models transform data gradually from one modality (e.g. noise) onto another (e.g. images). Such models are parameterized by a time-dependent velocity field, trained to fit segments connecting pairs of source and target points. When the pairing between source and target points is given, training flow models boils down to a supervised regression problem. When no such pairing exists, as is the case when generating data from noise, training flows is much harder. A popular approach lies in picking source and target points independently. This can, however, lead to velocity fields that are not only slow to train but also costly to integrate at inference time. In theory, one would greatly benefit from training flow models by sampling pairs from an optimal transport (OT) measure coupling source and target, since this would lead to a highly efficient flow solving the Benamou and Brenier dynamical OT problem. In practice, recent works have proposed to sample mini-batches of $n$ source and $n$ target points and reorder them using an OT solver to form better pairs. These works have advocated using batches of size $n\approx 256$, and considered OT solvers that return couplings that are either sharp (using e.g. the Hungarian algorithm) or blurred (using e.g. entropic regularization, a.k.a. Sinkhorn). We follow in the footsteps of these works by exploring the benefits of increasing $n$ by three to four orders of magnitude, and look more carefully at the effect of the entropic regularization $\varepsilon$ used in the Sinkhorn algorithm. Our analysis is facilitated by new scale-invariant quantities to report the sharpness of a coupling, while our sharded computations across multiple GPUs or GPU nodes allow scaling up $n$. We show that in both synthetic and image generation tasks, flow models greatly benefit when fitted with large Sinkhorn couplings, with a low entropic regularization $\varepsilon$.
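As a rough illustration of the mini-batch coupling step described above, here is a minimal pure-Python Sinkhorn sketch. It assumes 1-D points, a squared-distance cost, and uniform marginals; the batch size, the value of `eps`, and the function name `sinkhorn_coupling` are illustrative choices, not the settings or the sharded implementation used in the paper.

```python
import math


def sinkhorn_coupling(xs, ys, eps=0.5, n_iters=200):
    """Entropic OT coupling between two equal-size 1-D batches (uniform marginals)."""
    n = len(xs)
    a = [1.0 / n] * n   # source marginal
    b = [1.0 / n] * n   # target marginal
    # Gibbs kernel K_ij = exp(-cost_ij / eps) with squared-distance cost.
    K = [[math.exp(-((x - y) ** 2) / eps) for y in ys] for x in xs]
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(n_iters):
        # Alternately rescale columns and rows to match the marginals.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(n)]
        u = [a[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(n)]


source = [0.0, 0.5, 1.0]
target = [0.1, 0.6, 0.9]
P = sinkhorn_coupling(source, target)
row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]
```

Smaller values of `eps` give sharper couplings but need more iterations, and in practice a log-domain implementation to avoid underflow in the kernel.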
Article 1054
Title@2025-06-09 (1): Forests for Differences: Robust Causal Inference Beyond Parametric DiD
Title: Forests for Differences: Robust Causal Inference Beyond Parametric DiD | Wälder für Unterschiede: Robuste Kausalableitung jenseits parametrischer DiD | 差异的森林:超出参数二D的强力因果推断 2505.09706v2 |
Authors (2): Hugo Gobato Souto, Francisco Louzada Neto
This paper introduces the Difference-in-Differences Bayesian Causal Forest (DiD-BCF), a novel non-parametric model addressing key challenges in DiD estimation, such as staggered adoption and heterogeneous treatment effects. DiD-BCF provides a unified framework for estimating Average (ATE), Group-Average (GATE), and Conditional Average Treatment Effects (CATE). A core innovation, its Parallel Trends Assumption (PTA)-based reparameterization, enhances estimation accuracy and stability in complex panel data settings. Extensive simulations demonstrate DiD-BCF’s superior performance over established benchmarks, particularly under non-linearity, selection biases, and effect heterogeneity. Applied to U.S. minimum wage policy, the model uncovers significant conditional treatment effect heterogeneity related to county population, insights obscured by traditional methods. DiD-BCF offers a robust and versatile tool for more nuanced causal inference in modern DiD applications.
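For context, the canonical two-group, two-period DiD estimator that non-parametric approaches generalize can be sketched as follows. This is the textbook estimator under the parallel trends assumption, not DiD-BCF itself, and the outcome values are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Classic 2x2 difference-in-differences estimate of the treatment effect.

    Under parallel trends, the change in the control group stands in for the
    change the treated group would have seen without treatment.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcomes for two treated and two control units.
effect = did_estimate(
    treated_pre=[10.0, 12.0], treated_post=[15.0, 17.0],
    control_pre=[8.0, 10.0], control_post=[9.0, 11.0],
)
# Treated units rise by 5 on average and control units by 1, so effect == 4.0.
```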
Article 1055
Title@2025-06-09 (1): Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
Title: Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models | Unlösbare Problemerkennung: Robuste Bewertung des Verständnisses für große multimodale Modelle | 无法解决的问题探测:大型多式联运模型的强有力理解评估 2403.20331v4 |
Authors (10): Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li, Hai Li, Ziwei Liu, Kiyoharu Aizawa
This paper introduces a novel task to evaluate the robust understanding capability of Large Multimodal Models (LMMs), termed $\textbf{Unsolvable Problem Detection (UPD)}$. Multiple-choice question answering (MCQA) is widely used to assess the understanding capability of LMMs, but it does not guarantee that LMMs truly comprehend the answer. UPD assesses the LMM’s ability to withhold answers when encountering unsolvable problems of MCQA, verifying whether the model truly understands the answer. UPD encompasses three problems: Absent Answer Detection (AAD), Incompatible Answer Set Detection (IASD), and Incompatible Visual Question Detection (IVQD), covering unsolvable cases like answer-lacking or incompatible choices and image-question mismatches. For the evaluation, we introduce the MM-UPD Bench, a benchmark for assessing performance across various ability dimensions. Our experiments reveal that most LMMs, even those that demonstrate adequate performance on existing benchmarks, struggle significantly with MM-UPD, underscoring a novel aspect of trustworthiness that current benchmarks have overlooked. A detailed analysis shows that LMMs have different bottlenecks, and that chain-of-thought and self-reflection improve performance for LMMs whose bottleneck lies in their LLM capability. We hope our insights will enhance the broader understanding and development of more reliable LMMs. The code is available at https://github.com/AtsuMiyai/UPD.
Article 1056
Title@2025-06-09 (1): How Benchmark Prediction from Fewer Data Misses the Mark
Title: How Benchmark Prediction from Fewer Data Misses the Mark | Wie Benchmark-Voraussagen von weniger Daten das Zeichen vermissen | 从更少数据中的基准预测如何错过标记 2506.07673v1 |
Authors (3): Guanhua Zhang, Florian E. Dorner, Moritz Hardt
Large language model (LLM) evaluation is increasingly costly, prompting interest in methods that speed up evaluation by shrinking benchmark datasets. Benchmark prediction (also called efficient LLM evaluation) aims to select a small subset of evaluation points and predict overall benchmark performance from that subset. In this paper, we systematically assess the strengths and limitations of 11 benchmark prediction methods across 19 diverse benchmarks. First, we identify a highly competitive baseline: Take a random sample and fit a regression model on the sample to predict missing entries. Outperforming most existing methods, this baseline challenges the assumption that careful subset selection is necessary for benchmark prediction. Second, we discover that all existing methods crucially depend on model similarity. They work best when interpolating scores among similar models. The effectiveness of benchmark prediction sharply declines when new models have higher accuracy than previously seen models. In this setting of extrapolation, none of the previous methods consistently beat a simple average over random samples. To improve over the sample average, we introduce a new method inspired by augmented inverse propensity weighting. This method consistently outperforms the random sample average even for extrapolation. However, its performance still relies on model similarity and the gains are modest in general. This shows that benchmark prediction fails just when it is most needed: at the evaluation frontier, where the goal is to evaluate new models of unknown capabilities.
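The simple random-sample average that the abstract uses as a reference point can be sketched in a few lines. The per-item scores below are synthetic and the function name `random_sample_average` is an assumption; the regression baseline and the AIPW-inspired method from the paper are not reproduced here.

```python
import random


def random_sample_average(item_scores, sample_size, seed=0):
    """Estimate full-benchmark accuracy from a uniformly random subset of items."""
    rng = random.Random(seed)
    subset = rng.sample(item_scores, sample_size)   # sampling without replacement
    return sum(subset) / sample_size


# Synthetic per-item correctness for one hypothetical model on 1000 items.
scores = [0 if i % 4 == 0 else 1 for i in range(1000)]
estimate = random_sample_average(scores, sample_size=200)
full_accuracy = sum(scores) / len(scores)   # 0.75 for this synthetic benchmark
```

The estimate uses no information about other models, which is why it does not degrade when a new model is more accurate than anything seen before.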
Article 1057
Title@2025-06-09 (1): Data-driven inventory management for new products: An adjusted Dyna-$Q$ approach with transfer learning
Title: Data-driven inventory management for new products: An adjusted Dyna-$Q$ approach with transfer learning | Datengesteuertes Bestandsmanagement für neue Produkte: Ein angepasster Dyna-$Q$-Ansatz mit Transfer-Lernen | 新产品数据驱动的库存管理新产品库存管理:经调整的转让学习Dyna-Q$办法 2501.08109v4 |
Authors (3): Xinye Qu, Longxiao Liu, Wenjie Huang
In this paper, we propose a novel reinforcement learning algorithm for inventory management of newly launched products with no historical demand information. The algorithm follows the classic Dyna-$Q$ structure, balancing the model-free and model-based approaches, while accelerating the training process of Dyna-$Q$ and mitigating the model discrepancy generated by the model-based feedback. Based on the idea of transfer learning, warm-start information from the demand data of existing similar products can be incorporated into the algorithm to further stabilize the early-stage training and reduce the variance of the estimated optimal policy. Our approach is validated through a case study of bakery inventory management with real data. The adjusted Dyna-$Q$ shows up to a 23.7\% reduction in average daily cost compared with $Q$-learning, and up to a 77.5\% reduction in training time within the same horizon compared with classic Dyna-$Q$. With transfer learning, the adjusted Dyna-$Q$ achieves the lowest total cost, the lowest variance in total cost, and relatively low shortage percentages among all the benchmark algorithms in a 30-day test.
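The classic Dyna-$Q$ structure referred to above combines a model-free update with planning updates replayed from a learned model. The sketch below shows that structure on a toy deterministic chain; the environment, hyperparameters, and the name `dyna_q_chain` are hypothetical, and none of the paper's adjustments or transfer-learning warm starts are included.

```python
import random


def dyna_q_chain(n_states=5, episodes=200, planning_steps=10,
                 alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Classic tabular Dyna-Q on a toy chain; reward 1 on reaching the last state."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]   # actions: 0 = left, 1 = right
    model = {}                                  # (state, action) -> (reward, next, done)

    def step(state, action):
        nxt = max(0, state - 1) if action == 0 else state + 1
        done = nxt == goal
        return (1.0 if done else 0.0), nxt, done

    def update(state, action, reward, nxt, done):
        target = reward if done else reward + gamma * max(q[nxt])
        q[state][action] += alpha * (target - q[state][action])

    for _ in range(episodes):
        state = 0
        for _ in range(100):                    # cap on steps per episode
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)       # explore, or break ties at random
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            reward, nxt, done = step(state, action)
            update(state, action, reward, nxt, done)        # model-free update
            model[(state, action)] = (reward, nxt, done)    # model learning
            for _ in range(planning_steps):                 # model-based planning
                (s, a), (r, s2, d) = rng.choice(list(model.items()))
                update(s, a, r, s2, d)
            if done:
                break
            state = nxt
    return q


q_table = dyna_q_chain()
# Greedy action in every non-terminal state (1 means "move right").
greedy_policy = [0 if row[0] > row[1] else 1 for row in q_table[:-1]]
```

The planning loop reuses stored transitions, so each real interaction yields several value updates; this is the source of Dyna-$Q$'s sample efficiency and also of the model discrepancy the abstract mentions when the learned model is inaccurate.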
Article 1058
Title@2025-06-09 (1): ProARD: progressive adversarial robustness distillation: provide wide range of robust students
Title: ProARD: progressive adversarial robustness distillation: provide wide range of robust students | ProARD: progressive konversarische Robustheitsdestillation: für eine breite Palette robuster Studenten | ProARD: 进步对抗性强力蒸馏:提供各种强健学生 2506.07666v1 |
Authors (3): Seyedhamidreza Mousavi, Seyedali Mousavi, Masoud Daneshtalab
Adversarial Robustness Distillation (ARD) has emerged as an effective method to enhance the robustness of lightweight deep neural networks against adversarial attacks. Current ARD approaches have leveraged a large robust teacher network to train one robust lightweight student. However, due to the diverse range of edge devices and resource constraints, current approaches require training a new student network from scratch to meet specific constraints, leading to substantial computational costs and increased CO2 emissions. This paper proposes Progressive Adversarial Robustness Distillation (ProARD), enabling the efficient one-time training of a dynamic network that supports a diverse range of accurate and robust student networks without requiring retraining. We first make a dynamic deep neural network based on dynamic layers by encompassing variations in width, depth, and expansion in each design stage to support a wide range of architectures. Then, we consider the student network with the largest size as the dynamic teacher network. ProARD trains this dynamic network using a weight-sharing mechanism to jointly optimize the dynamic teacher network and its internal student networks. However, due to the high computational cost of calculating exact gradients for all the students within the dynamic network, a sampling mechanism is required to select a subset of students. We show that random student sampling in each iteration fails to produce accurate and robust students.
Article 1059
Title@2025-06-09 (1): The Universality Lens: Why Even Highly Over-Parametrized Models Learn Well
Title: The Universality Lens: Why Even Highly Over-Parametrized Models Learn Well | Die Universalitätslinse: Warum selbst hoch überparametrisierte Modelle gut lernen | 普世透镜:为什么甚至高超美化模型学习良好 2506.07661v1 |
Authors (3): Meir Feder, Ruediger Urbanke, Yaniv Fogel
A fundamental question in modern machine learning is why large, over-parameterized models, such as deep neural networks and transformers, tend to generalize well, even when their number of parameters far exceeds the number of training samples. We investigate this phenomenon through the lens of information theory, grounded in universal learning theory. Specifically, we study a Bayesian mixture learner with log-loss and (almost) uniform prior over an expansive hypothesis class. Our key result shows that the learner’s regret is not determined by the overall size of the hypothesis class, but rather by the cumulative probability of all models that are close, in Kullback-Leibler divergence distance, to the true data-generating process. We refer to this cumulative probability as the weight of the hypothesis. This leads to a natural notion of model simplicity: simple models are those with large weight and thus require fewer samples to generalize, while complex models have small weight and need more data. This perspective provides a rigorous and intuitive explanation for why over-parameterized models often avoid overfitting: the presence of simple hypotheses allows the posterior to concentrate on them when supported by the data. We further bridge theory and practice by recalling that stochastic gradient descent with Langevin dynamics samples from the correct posterior distribution, enabling our theoretical learner to be approximated using standard machine learning methods combined with ensemble learning. Our analysis yields non-uniform regret bounds and aligns with key practical concepts such as flat minima and model distillation. The results apply broadly across online, batch, and supervised learning settings, offering a unified and principled understanding of the generalization behavior of modern AI systems.
Article 1060
Title@2025-06-09 (1): Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks
Title: Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Bewertung von Dialect Fairness und Robustheit von großen Sprachmodellen in mit Gründen versehenen Aufgaben | 评估大语言模式在说明理由任务中的公平性和强度 2410.11005v3 |
Authors (10): Fangru Lin, Shaoguang Mao, Emanuele La Malfa, Valentin Hofmann, Adrian de Wynter, Xun Wang, Si-Qing Chen, Michael Wooldridge, Janet B. Pierrehumbert, Furu Wei
Language is not monolithic. While benchmarks, including those designed for multiple languages, are often used as proxies to evaluate the performance of Large Language Models (LLMs), they tend to overlook the nuances of within-language variation and thus fail to model the experience of speakers of non-standard dialects. Focusing on African American Vernacular English (AAVE), we present the first study aimed at objectively assessing the fairness and robustness of LLMs in handling dialects across canonical reasoning tasks, including algorithm, math, logic, and integrated reasoning. We introduce ReDial (Reasoning with Dialect Queries), a benchmark containing 1.2K+ parallel query pairs in Standardized English and AAVE. We hire AAVE speakers, including experts with computer science backgrounds, to rewrite seven popular benchmarks, such as HumanEval and GSM8K. With ReDial, we evaluate widely used LLMs, including GPT, Claude, Llama, Mistral, and the Phi model families. Our findings reveal that almost all of these widely used models show significant brittleness and unfairness to queries in AAVE. Our work establishes a systematic and objective framework for analyzing LLM bias in dialectal queries. Moreover, it highlights how mainstream LLMs provide unfair service to dialect speakers in reasoning tasks, laying a critical foundation for future research.
Article 1061
Title@2025-06-09 (1): Feature Statistics with Uncertainty Help Adversarial Robustness
Title: Feature Statistics with Uncertainty Help Adversarial Robustness | Feature Statistik mit Unsicherheit Hilfe Widersprüchliche Robustheit | 具有不确定性的特征统计 2503.20583v2 |
Authors (6): Ran Wang, Xinlei Zhou, Meng Hu, Rihao Li, Wenhui Wu, Yuheng Jia
Despite the remarkable success of deep neural networks (DNNs), the security threat of adversarial attacks poses a significant challenge to the reliability of DNNs. In this paper, both theoretically and empirically, we discover a universal phenomenon that has been neglected in previous works, i.e., adversarial attacks tend to shift the distributions of feature statistics. Motivated by this finding, and by leveraging the advantages of uncertainty-aware stochastic methods in building robust models efficiently, we propose an uncertainty-driven feature statistics adjustment module for robustness enhancement, named Feature Statistics with Uncertainty (FSU). It randomly resamples channel-wise feature means and standard deviations of examples from multivariate Gaussian distributions, which helps to reconstruct the perturbed examples and calibrate the shifted distributions. The calibration recovers some domain characteristics of the data for classification, thereby mitigating the influence of perturbations and weakening the ability of attacks to deceive models. The proposed FSU module has universal applicability in training, attacking, predicting, and fine-tuning, demonstrating impressive robustness enhancement ability at a trivial additional time cost. For example, by fine-tuning the well-established models with FSU, the state-of-the-art methods achieve up to 17.13% and 34.82% robustness improvement against powerful AA and CW attacks on benchmark datasets.
Article 1062
Title@2025-06-09 (1): Trajectory World Models for Heterogeneous Environments
Title: Trajectory World Models for Heterogeneous Environments | Trajektoriale Weltmodelle für heterogene Umgebungen | 不同不同环境世界模型 2502.01366v2 |
Authors (7): Shaofeng Yin, Jialong Wu, Siqiao Huang, Xingjian Su, Xu He, Jianye Hao, Mingsheng Long
Heterogeneity in sensors and actuators across environments poses a significant challenge to building large-scale pre-trained world models on top of this low-dimensional sensor information. In this work, we explore pre-training world models for heterogeneous environments by addressing key transfer barriers in both data diversity and model flexibility. We introduce UniTraj, a unified dataset comprising over one million trajectories from 80 environments, designed to scale data while preserving critical diversity. Additionally, we propose TrajWorld, a novel architecture capable of flexibly handling varying sensor and actuator information and capturing environment dynamics in-context. Pre-training TrajWorld on UniTraj yields substantial gains in transition prediction, achieves a new state-of-the-art for off-policy evaluation, and also delivers superior online performance of model predictive control. To the best of our knowledge, this work, for the first time, demonstrates the transfer benefits of world models across heterogeneous and complex control environments. Code and data are available at https://github.com/thuml/TrajWorld.
Article 1063
Title@2025-06-09 (1): Training Articulatory Inversion Models for Interspeaker Consistency
Title: Training Articulatory Inversion Models for Interspeaker Consistency | Training Artikulatorische Inversionsmodelle für die Konsistenz von Interspeakern | 跨口间连通性人工转换模型培训 2505.20529v3 |
Authors (3): Charles McGhee, Mark J. F. Gales, Kate M. Knill
Acoustic-to-Articulatory Inversion (AAI) attempts to model the inverse mapping from speech to articulation. Exact articulatory prediction from speech alone may be impossible, as speakers can choose different forms of articulation seemingly without reference to their vocal tract structure. However, once a speaker has selected an articulatory form, their productions vary minimally. Recent works in AAI have proposed adapting Self-Supervised Learning (SSL) models to single-speaker datasets, claiming that these single-speaker models provide a universal articulatory template. In this paper, we investigate whether SSL-adapted models trained on single and multi-speaker data produce articulatory targets which are consistent across speaker identities for English and Russian. We do this through the use of a novel evaluation method which extracts articulatory targets using minimal pair sets. We also present a training method which can improve interspeaker consistency using only speech data.
Article 1064
Title@2025-06-09 (1): HieraEdgeNet: A Multi-Scale Edge-Enhanced Framework for Automated Pollen Recognition
Title: HieraEdgeNet: A Multi-Scale Edge-Enhanced Framework for Automated Pollen Recognition | HieraEdgeNet: Multi-Scale Edge-Enhanced Framework für automatisierte Pollenerkennung | HieraEdgeNet:一个多层边缘强化自动承认花粉框架 2506.07637v1 |
Authors (6): Yuchong Long, Wen Sun, Ningxiao Sun, Wenxiao Wang, Chao Li, Shan Yin
Automated pollen recognition is vital to paleoclimatology, biodiversity monitoring, and public health, yet conventional methods are hampered by inefficiency and subjectivity. Existing deep learning models often struggle to achieve the requisite localization accuracy for microscopic targets like pollen, which are characterized by their minute size, indistinct edges, and complex backgrounds. To overcome this limitation, we introduce HieraEdgeNet, a multi-scale edge-enhancement framework. The framework’s core innovation is the introduction of three synergistic modules: the Hierarchical Edge Module (HEM), which explicitly extracts a multi-scale pyramid of edge features that corresponds to the semantic hierarchy at early network stages; the Synergistic Edge Fusion (SEF) module, for deeply fusing these edge priors with semantic information at each respective scale; and the Cross Stage Partial Omni-Kernel Module (CSPOKM), which maximally refines the most detail-rich feature layers using an Omni-Kernel operator - comprising anisotropic large-kernel convolutions and mixed-domain attention - all within a computationally efficient Cross-Stage Partial (CSP) framework. On a large-scale dataset comprising 120 pollen classes, HieraEdgeNet achieves a mean Average Precision (mAP@.5) of 0.9501, significantly outperforming state-of-the-art baseline models such as YOLOv12n and RT-DETR. Furthermore, qualitative analysis confirms that our approach generates feature representations that are more precisely focused on object boundaries. By systematically integrating edge information, HieraEdgeNet provides a robust and powerful solution for high-precision, high-efficiency automated detection of microscopic objects.
Article 1065
Title@2025-06-09 (1): RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors
Title: RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors | RAID: Ein Datensatz für die Prüfung der adversarialen Robustheit von KI-generierten Bilddetektoren | 用于测试AI-光学图像探测器的反向强力的数据集 2506.03988v3 |
Authors (11): Hicham Eddoubi, Jonas Ricker, Federico Cocchi, Lorenzo Baraldi, Angelo Sotgiu, Maura Pintor, Marcella Cornia, Lorenzo Baraldi, Asja Fischer, Rita Cucchiara, Battista Biggio
AI-generated images have reached a quality level at which humans are incapable of reliably distinguishing them from real images. To counteract the inherent risk of fraud and disinformation, the detection of AI-generated images is a pressing challenge and an active research topic. While many of the presented methods claim to achieve high detection accuracy, they are usually evaluated under idealized conditions. In particular, the adversarial robustness is often neglected, potentially due to a lack of awareness or the substantial effort required to conduct a comprehensive robustness analysis. In this work, we tackle this problem by providing a simpler means to assess the robustness of AI-generated image detectors. We present RAID (Robust evaluation of AI-generated image Detectors), a dataset of 72k diverse and highly transferable adversarial examples. The dataset is created by running attacks against an ensemble of seven state-of-the-art detectors and images generated by four different text-to-image models. Extensive experiments show that our methodology generates adversarial images that transfer with a high success rate to unseen detectors, which can be used to quickly provide an approximate yet still reliable estimate of a detector’s adversarial robustness. Our findings indicate that current state-of-the-art AI-generated image detectors can be easily deceived by adversarial examples, highlighting the critical need for the development of more robust methods. We release our dataset at https://huggingface.co/datasets/aimagelab/RAID and evaluation code at https://github.com/pralab/RAID.
Article 1066
Title@2025-06-09 (1): Can Perplexity Predict Fine-tuning Performance? An Investigation of Tokenization Effects on Sequential Language Models for Nepali
Title: Can Perplexity Predict Fine-tuning Performance? An Investigation of Tokenization Effects on Sequential Language Models for Nepali | Kann Perplexity Fine-Tuning Performance voraussagen? Eine Untersuchung von Tokenization-Effekten auf sequentielle Sprachmodelle für Nepali | 重复性预测微调性能能否? 对尼泊尔的序列语言模式的收缩性影响调查 2404.18071v2 |
Authors (4): Nishant Luitel, Nirajan Bekoju, Anand Kumar Sah, Subarna Shakya
The impact of subword tokenization on language model performance is well-documented for perplexity, with finer granularity consistently reducing this intrinsic metric. However, research on how different tokenization schemes affect a model’s understanding capabilities remains limited, particularly for non-Latin script languages. Addressing this gap, we conducted a comprehensive evaluation of six distinct tokenization strategies by pretraining transformer-based language models for Nepali and evaluating their performance across multiple downstream tasks. While recent prominent models like GPT, RoBERTa, Claude, LLaMA, Mistral, Falcon, and MPT have adopted byte-level BPE tokenization, our findings demonstrate that for Nepali, SentencePiece tokenization consistently yields superior results on understanding-based tasks. Unlike previous studies that primarily focused on BERT-based architectures, our research specifically examines sequential transformer models, providing valuable insights for language model development in low-resource languages and highlighting the importance of tokenization strategy beyond perplexity reduction.
Article 1067
Title@2025-06-09 (1): AI-based Framework for Robust Model-Based Connector Mating in Robotic Wire Harness Installation
Title: AI-based Framework for Robust Model-Based Connector Mating in Robotic Wire Harness Installation | KI-basiertes Framework für robuste modellbasierte Steckverbinder-Matierung in der Roboterdraht-Harness-Installation | 以AI为基础的机器人电线装置中强有力的模型连接器配对框架 2503.09409v2 |
Authors (6): Claudius Kienle, Benjamin Alt, Finn Schneider, Tobias Pertlwieser, Rainer Jäkel, Rania Rayyes
Despite the widespread adoption of industrial robots in automotive assembly, wire harness installation remains a largely manual process, as it requires precise and flexible manipulation. To address this challenge, we design a novel AI-based framework that automates cable connector mating by integrating force control with deep visuotactile learning. Our system optimizes search-and-insertion strategies using first-order optimization over a multimodal transformer architecture trained on visual, tactile, and proprioceptive data. Additionally, we design a novel automated data collection and optimization pipeline that minimizes the need for machine learning expertise. The framework optimizes robot programs that run natively on standard industrial controllers, permitting human experts to audit and certify them. Experimental validations on a center console assembly task demonstrate significant improvements in cycle times and robustness compared to conventional robot programming approaches. Videos are available under https://claudius-kienle.github.io/AppMuTT.
Article 1068
Title@2025-06-09 (1): Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range Tasks
Title: Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range Tasks | Rückkehr von ChebNet: Verständnis und Verbesserung eines überblickten GNN bei langfristigen Aufgaben | ChebNet的返回:了解和改进对长期任务GNN的忽视 2506.07624v1 |
Authors (9): Ali Hariri, Álvaro Arroyo, Alessio Gravina, Moshe Eliasof, Carola-Bibiane Schönlieb, Davide Bacciu, Kamyar Azizzadenesheli, Xiaowen Dong, Pierre Vandergheynst
ChebNet, one of the earliest spectral GNNs, has largely been overshadowed by Message Passing Neural Networks (MPNNs), which gained popularity for their simplicity and effectiveness in capturing local graph structure. Despite their success, MPNNs are limited in their ability to capture long-range dependencies between nodes. This has led researchers to adapt MPNNs through rewiring or to turn to Graph Transformers (GTs), which compromises the computational efficiency that characterized early spatial message-passing architectures and typically disregards the graph structure. Almost a decade after its original introduction, we revisit ChebNet to shed light on its ability to model distant node interactions. We find that, out of the box, ChebNet already shows competitive advantages relative to classical MPNNs and GTs on long-range benchmarks, while maintaining good scalability properties for high-order polynomials. However, we uncover that this polynomial expansion leads ChebNet to an unstable regime during training. To address this limitation, we cast ChebNet as a stable and non-dissipative dynamical system, which we term Stable-ChebNet. Our Stable-ChebNet model allows for stable information propagation and has controllable dynamics that do not require eigendecompositions, positional encodings, or graph rewiring. Across several benchmarks, Stable-ChebNet achieves near state-of-the-art performance.
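The polynomial expansion at the heart of ChebNet can be sketched as a Chebyshev filter applied through the three-term recurrence T_k = 2 L T_{k-1} - T_{k-2}. The sketch assumes one scalar feature per node and the common approximation lambda_max ≈ 2 for the normalized Laplacian; the helper names are hypothetical, and the Stable-ChebNet dynamics are not reproduced.

```python
import math


def scaled_laplacian(adj):
    """Rescaled Laplacian L_sym - I = -D^{-1/2} A D^{-1/2}, assuming lambda_max ≈ 2."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[-adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def chebyshev_filter(adj, x, theta):
    """Apply sum_k theta[k] * T_k(L) x using the Chebyshev recurrence."""
    lap = scaled_laplacian(adj)
    t_prev, t_curr = x, matvec(lap, x)           # T_0 x and T_1 x
    out = [theta[0] * xi for xi in x]
    if len(theta) > 1:
        out = [o + theta[1] * t for o, t in zip(out, t_curr)]
    for k in range(2, len(theta)):
        t_next = [2.0 * a - b for a, b in zip(matvec(lap, t_curr), t_prev)]
        out = [o + theta[k] * t for o, t in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out


# Two nodes joined by one edge, with a unit signal on the first node.
adjacency = [[0, 1], [1, 0]]
filtered = chebyshev_filter(adjacency, x=[1.0, 0.0], theta=[0.5, 1.0, 2.0])
```

A filter of order K mixes information from nodes up to K hops away using only sparse matrix-vector products, with no eigendecomposition.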
Article 1069
Title@2025-06-09 (1): LoRMA: Low-Rank Multiplicative Adaptation for LLMs
Title: LoRMA: Low-Rank Multiplicative Adaptation for LLMs | LoRMA: Multiplikative Anpassung mit niedrigem Rank für LLMs | LoRMA:LLMML的低Rank多复制性适应 2506.07621v1 |
Authors (3): Harsh Bihany, Shubham Patel, Ashutosh Modi
Large Language Models have shown remarkable capabilities in the NLP domain. Their effectiveness can mainly be attributed to their ability to adapt to an array of downstream tasks. However, full fine-tuning is generally computationally expensive. To mitigate this, many techniques that prioritize efficiency have been developed, a prominent one being Low-Rank Adaptation (LoRA). However, LoRA and its variants employ re-parametrized additive updates. In this paper, we propose Low-Rank Multiplicative Adaptation (LoRMA), which shifts the paradigm of additive updates to a richer space of matrix multiplicative transformations. We tackle challenges such as computational complexity and rank bottleneck of matrix multiplication by effectively re-ordering operations and introducing rank inflation strategies. We conduct extensive experiments to demonstrate the effectiveness of our approach in terms of various evaluation metrics.
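The contrast between additive and multiplicative low-rank updates, and the benefit of re-ordering operations, can be sketched with small integer matrices. This assumes the naive multiplicative form (B A) W with rank-1 factors; the matrices are made up, and the rank inflation strategies from the paper are not shown.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


W = [[1, 2, 0], [0, 1, 1], [2, 0, 1]]   # frozen pretrained weight (3 x 3)
B = [[1], [0], [2]]                     # low-rank factor (3 x 1)
A = [[1, 1, 0]]                         # low-rank factor (1 x 3)
x = [1, 2, 3]                           # input vector

BA = matmul(B, A)

# Additive (LoRA-style) update: W + B A.
W_additive = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]
additive_out = matvec(W_additive, x)

# Naive multiplicative update: form the dense product (B A) W first.
W_multiplicative = matmul(BA, W)
naive_out = matvec(W_multiplicative, x)

# Re-ordered computation: B (A (W x)) never builds a dense d x d product.
reordered_out = matvec(B, matvec(A, matvec(W, x)))
```

Every row of `W_multiplicative` here is a multiple of a single vector, so the naive product has rank 1; this is the rank bottleneck that the abstract says rank inflation is meant to address.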
Article 1070
Title@2025-06-09 (1): Retrieval-Augmented Generation as Noisy In-Context Learning: A Unified Theory and Risk Bounds
Title: Retrieval-Augmented Generation as Noisy In-Context Learning: A Unified Theory and Risk Bounds | Retrieval-Augmented Generation as Noisy In-Context Learning: Eine einheitliche Theorie und Risikogrenzen | 作为有噪音的文内学习:统一理论和风险波 2506.03100v3 |
Authors (5): Yang Guo, Yutian Tao, Yifei Ming, Robert D. Nowak, Yingyu Liang
Retrieval-augmented generation (RAG) has seen many empirical successes in recent years by aiding the LLM with external knowledge. However, its theoretical aspect has remained mostly unexplored. In this paper, we propose the first finite-sample generalization bound for RAG in in-context linear regression and derive an exact bias-variance tradeoff. Our framework views the retrieved texts as query-dependent noisy in-context examples and recovers the classical in-context learning (ICL) and standard RAG as the limit cases. Our analysis suggests that an intrinsic ceiling on generalization error exists on RAG as opposed to the ICL. Furthermore, our framework is able to model retrieval both from the training data and from external corpora by introducing uniform and non-uniform RAG noise. In line with our theory, we show the sample efficiency of ICL and RAG empirically with experiments on common QA benchmarks, such as Natural Questions and TriviaQA.
nan
Article 1071
Title@2025-06-09 (1): The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning
Title: The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning | Der Catechol-Benchmark: Zeitreihen-Lösungsauswahldaten für wenige Aufnahmen maschinelles Lernen | Catechol基准:几发机器学习的时间序列溶剂选择数据 2506.07619v1 |
Authors (14): Toby Boyne, Juan S. Campos, Becky D. Langdon, Jixiang Qing, Yilin Xie, Shiqiang Zhang, Calvin Tsay, Ruth Misener, Daniel W. Davies, Kim E. Jelfs, Sarah Boyall, Thomas M. Dixon, Linden Schrecker, Jose Pablo Folch
Machine learning has promised to change the landscape of laboratory chemistry, with impressive results in molecular property prediction and reaction retro-synthesis. However, chemical datasets are often inaccessible to the machine learning community as they tend to require cleaning, thorough understanding of the chemistry, or are simply not available. In this paper, we introduce a novel dataset for yield prediction, providing the first-ever transient flow dataset for machine learning benchmarking, covering over 1200 process conditions. While previous datasets focus on discrete parameters, our experimental set-up allows us to sample a large number of continuous process conditions, generating new challenges for machine learning models. We focus on solvent selection, a task that is particularly difficult to model theoretically and therefore ripe for machine learning applications. We showcase benchmarking for regression algorithms, transfer-learning approaches, feature engineering, and active learning, with important applications towards solvent replacement and sustainable manufacturing.
nan
Article 1072
Title@2025-06-09 (1): FuXi-Air: Urban Air Quality Forecasting Based on Emission-Meteorology-Pollutant multimodal Machine Learning
Title: FuXi-Air: Urban Air Quality Forecasting Based on Emission-Meteorology-Pollutant multimodal Machine Learning | FuXi-Air: Vorhersage der städtischen Luftqualität auf der Grundlage von Emissions-Meteorologie-Pollutan multimodalen maschinellen Lernens | FUXi-Air:基于排放-气象-污染-多式联运机学习的城市空气质量预测 2506.07616v1 |
Authors (12): Zhixin Geng, Xu Fan, Xiqiao Lu, Yan Zhang, Guangyuan Yu, Cheng Huang, Qian Wang, Yuewu Li, Weichun Ma, Qi Yu, Libo Wu, Hao Li
Air pollution has emerged as a major public health challenge in megacities. Numerical simulations and single-site machine learning approaches have been widely applied in air quality forecasting tasks. However, these methods face multiple limitations, including high computational costs, low operational efficiency, and limited integration with observational data. With the rapid advancement of artificial intelligence, there is an urgent need to develop a low-cost, efficient air quality forecasting model for smart urban management. In this study, an air quality forecasting model named FuXi-Air is constructed based on multimodal data fusion to support high-precision air quality forecasting, and it is operated in typical megacities. The model integrates meteorological forecasts, emission inventories, and pollutant monitoring data, guided by air pollution mechanisms. By combining an autoregressive prediction framework with a frame interpolation strategy, the model completes 72-hour forecasts for six major air pollutants at an hourly resolution across multiple monitoring sites within 25-30 seconds. In terms of both computational efficiency and forecasting accuracy, it outperforms the mainstream numerical air quality models in operational forecasting work. Ablation experiments concerning key influencing factors show that although meteorological data contribute more to model accuracy than emission inventories do, the integration of multimodal data significantly improves forecasting precision and ensures that reliable predictions are obtained under differing pollution mechanisms across megacities. This study provides both a technical reference and a practical example for applying multimodal data-driven models to air quality forecasting and offers new insights into building hybrid forecasting systems to support air pollution risk warning in smart city management.
nan
Article 1073
Title@2025-06-09 (1): Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds
Title: Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds | Poisson Midpoint-Methode für Log Concave Sampling: Jenseits der starken Fehler unteren Bounds | 日志集中取样的 Poisson 中点方法: 超越强误差, 下界 2506.07614v1 |
Authors (2): Rishikesh Srinivasan, Dheeraj Nagaraj
We study the problem of sampling from strongly log-concave distributions over $\mathbb{R}^d$ using the Poisson midpoint discretization (a variant of the randomized midpoint method) for overdamped/underdamped Langevin dynamics. We prove its convergence in the 2-Wasserstein distance ($W_2$), achieving a cubic speedup in dependence on the target accuracy ($\epsilon$) over the Euler-Maruyama discretization, surpassing existing bounds for randomized midpoint methods. Notably, in the case of underdamped Langevin dynamics, we demonstrate that the complexity of $W_2$ convergence is much smaller than the complexity lower bounds for convergence in $L^2$ strong error established in the literature.
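For orientation, the Euler-Maruyama baseline that the abstract compares against can be sketched as follows. This is only the standard baseline discretization of overdamped Langevin dynamics, not the Poisson midpoint method; the standard normal target, step size, chain count, and function names are illustrative assumptions.

```python
import math
import random

def euler_maruyama_langevin(grad_f, x0, step, n_steps, rng):
    """Iterate x <- x - step * grad_f(x) + sqrt(2 * step) * N(0, 1)."""
    x = x0
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        x = x - step * grad_f(x) + noise_scale * rng.gauss(0.0, 1.0)
    return x

def sample_standard_normal_target(n_chains=2000, step=0.05, n_steps=200, seed=0):
    """Run independent chains for the potential f(x) = x^2 / 2 (assumed toy target)."""
    rng = random.Random(seed)
    grad_f = lambda x: x  # gradient of x^2 / 2, a strongly log-concave target
    return [euler_maruyama_langevin(grad_f, 0.0, step, n_steps, rng)
            for _ in range(n_chains)]

samples = sample_standard_normal_target()
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
```

For this toy target the chain's stationary variance is $1/(1 - h/2)$ for step size $h$, so the discretization bias shrinks as the step size decreases; this step-size dependence is what improved discretizations aim to reduce.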
nan
Article 1074
Title@2025-06-09 (1): APE: Selective Fine-tuning with Acceptance Criteria for Language Model Adaptation
Title: APE: Selective Fine-tuning with Acceptance Criteria for Language Model Adaptation | APE: Selektive Feinabstimmung mit Annahmekriterien für Sprachmodellanpassung | APE:有选择地调整语言模式适应接受标准 2505.19912v2 |
Authors (1): Javier Marín
We present Adjacent Possible Exploration (APE), a selective fine-tuning method for adapting large language models that systematically explores parameter modifications while maintaining model stability. Inspired by evolutionary optimization principles, APE evaluates multiple candidate parameter updates through fine-tuning on small data subsets and accepts only those exceeding a performance threshold. Unlike standard fine-tuning that follows single gradient directions, APE implements a filtered selection process that prevents destabilizing parameter changes while enabling systematic improvement. Our method achieves 33.9\% BLEU improvement and 36.2\% perplexity reduction on news summarization tasks while using minimal computational resources. The approach provides a practical framework for controlled model adaptation that balances performance gains with representational stability.
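The accept-only-if-better loop described above can be sketched in a few lines. This is a toy illustration under strong assumptions: random Gaussian perturbations stand in for the candidate updates that the abstract obtains by fine-tuning on small data subsets, and the score function, threshold, and all names are hypothetical.

```python
import random

def ape_style_search(score, params, n_rounds=50, n_candidates=4,
                     scale=0.1, threshold=0.0, seed=0):
    """Propose several candidate updates per round; keep the best one only
    if it improves the current score by more than `threshold`."""
    rng = random.Random(seed)
    best = score(params)
    for _ in range(n_rounds):
        # Stand-in for candidate updates (the paper fine-tunes on data subsets).
        candidates = [[p + rng.gauss(0.0, scale) for p in params]
                      for _ in range(n_candidates)]
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score - best > threshold:  # acceptance criterion
            params, best = top, top_score
    return params, best

# Toy objective: negative squared distance to a hypothetical optimum.
target = [1.0, -1.0]
toy_score = lambda p: -sum((a - b) ** 2 for a, b in zip(p, target))
start = [0.0, 0.0]
final_params, final_score = ape_style_search(toy_score, start)
```

Because rejected candidates leave the parameters unchanged, the score never decreases across rounds, which is the stability property the filtered selection is meant to provide.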
nan
Article 1075
Title@2025-06-09 (1): Link Prediction with Relational Hypergraphs
Title: Link Prediction with Relational Hypergraphs | Link Vorhersage mit relationalen Hypergraphen | 将预测与关系电报联系起来 2402.04062v3 |
Authors (5): Xingyue Huang, Miguel Romero Orth, Pablo Barceló, Michael M. Bronstein, İsmail İlkan Ceylan
Link prediction with knowledge graphs has been thoroughly studied in graph machine learning, leading to a rich landscape of graph neural network architectures with successful applications. Nonetheless, it remains challenging to transfer the success of these architectures to relational hypergraphs, where the task of link prediction is over $k$-ary relations, which is substantially harder than link prediction with knowledge graphs. In this paper, we propose a framework for link prediction with relational hypergraphs, unlocking applications of graph neural networks to fully relational structures. Theoretically, we conduct a thorough analysis of the expressive power of the resulting model architectures via corresponding relational Weisfeiler-Leman algorithms and also via logical expressiveness. Empirically, we validate the power of the proposed model architectures on various relational hypergraph benchmarks. The resulting model architectures substantially outperform every baseline for inductive link prediction, and lead to state-of-the-art results for transductive link prediction.
nan
Article 1076
Title@2025-06-09 (1): SWAG: Long-term Surgical Workflow Prediction with Generative-based Anticipation
Title: SWAG: Long-term Surgical Workflow Prediction with Generative-based Anticipation | SWAG: Langzeit-Surgical Workflow Prediction mit generativer Antizipation | SWAG: 长期外科工作流动预测,以产生为基础预测 2412.18849v3 |
Authors (5): Maxence Boels, Yang Liu, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin
While existing approaches excel at recognising current surgical phases, they provide limited foresight and intraoperative guidance into future procedural steps. Similarly, current anticipation methods are constrained to predicting short-term and single events, neglecting the dense, repetitive, and long sequential nature of surgical workflows. To address these needs and limitations, we propose SWAG (Surgical Workflow Anticipative Generation), a framework that combines phase recognition and anticipation using a generative approach. This paper investigates two distinct decoding methods - single-pass (SP) and auto-regressive (AR) - to generate sequences of future surgical phases at minute intervals over long horizons. We propose a novel embedding approach using class transition probabilities to enhance the accuracy of phase anticipation. Additionally, we propose a generative framework using remaining time regression to classification (R2C). SWAG was evaluated on two publicly available datasets, Cholec80 and AutoLaparo21. Our single-pass model with class transition probability embeddings (SP*) achieves 32.1% and 41.3% F1 scores over 20 and 30 minutes on Cholec80 and AutoLaparo21, respectively. Moreover, our approach competes with existing methods on phase remaining time regression, achieving weighted mean absolute errors of 0.32 and 0.48 minutes for 2- and 3-minute horizons. SWAG demonstrates versatility across generative decoding frameworks and across classification and regression tasks, creating temporal continuity between surgical workflow recognition and anticipation. Our method provides steps towards intraoperative surgical workflow generation for anticipation. Project: https://maxboels.github.io/swag.
nan
Article 1077
Title@2025-06-09 (1): Tree-Sliced Wasserstein Distance with Nonlinear Projection
Title: Tree-Sliced Wasserstein Distance with Nonlinear Projection | Baumgeschliffene Wasserstein-Distanz mit nichtlinearer Projektion | 用非线性投影绘制的树切瓦西尔斯坦距离 2505.00968v2 |
Authors (7): Thanh Tran, Viet-Hoang Tran, Thanh Chu, Trang Pham, Laurent El Ghaoui, Tam Le, Tan M. Nguyen
Tree-Sliced methods have recently emerged as an alternative to the traditional Sliced Wasserstein (SW) distance, replacing one-dimensional lines with tree-based metric spaces and incorporating a splitting mechanism for projecting measures. This approach enhances the ability to capture the topological structures of integration domains in Sliced Optimal Transport while maintaining low computational costs. Building on this foundation, we propose a novel nonlinear projectional framework for the Tree-Sliced Wasserstein (TSW) distance, substituting the linear projections in earlier versions with general projections, while ensuring the injectivity of the associated Radon Transform and preserving the well-definedness of the resulting metric. By designing appropriate projections, we construct efficient metrics for measures on both Euclidean spaces and spheres. Finally, we validate our proposed metric through extensive numerical experiments for Euclidean and spherical datasets. Applications include gradient flows, self-supervised learning, and generative models, where our methods demonstrate significant improvements over recent SW and TSW variants.
nan
Article 1078
Title@2025-06-09 (1): Learning from Double Positive and Unlabeled Data for Potential-Customer Identification
Title: Learning from Double Positive and Unlabeled Data for Potential-Customer Identification | Lernen aus doppelt positiven und unmarkierten Daten für die potenzielle Kundenidentifikation | 学习用于潜在海关识别的双正和无标签数据 2506.00436v2 |
Authors (5): Masahiro Kato, Yuki Ikeda, Kentaro Baba, Takashi Imai, Ryo Inokuchi
In this study, we propose a method for identifying potential customers in targeted marketing by applying learning from positive and unlabeled data (PU learning). We consider a scenario in which a company sells a product and can observe only the customers who purchased it. Decision-makers seek to market products effectively based on whether people have loyalty to the company. Individuals with loyalty are those who are likely to remain interested in the company even without additional advertising. Consequently, those loyal customers would likely purchase from the company if they are interested in the product. In contrast, people with lower loyalty may overlook the product or buy similar products from other companies unless they receive marketing attention. Therefore, by focusing marketing efforts on individuals who are interested in the product but do not have strong loyalty, we can achieve more efficient marketing. To achieve this goal, we consider how to learn, from limited data, a classifier that identifies potential customers who (i) have interest in the product and (ii) do not have loyalty to the company. Although our algorithm comprises a single-stage optimization, its objective function implicitly contains two losses derived from standard PU learning settings. For this reason, we refer to our approach as double PU learning. We verify the validity of the proposed algorithm through numerical experiments, confirming that it functions appropriately for the problem at hand.
nan
Article 1079
Title@2025-06-09 (1): How Expressive are Knowledge Graph Foundation Models?
Title: How Expressive are Knowledge Graph Foundation Models? | Wie ausdrucksstark sind Wissensgraphen-Stiftungsmodelle? | 知识图基础模型的表达性如何? 2502.13339v2 |
Authors (7): Xingyue Huang, Pablo Barceló, Michael M. Bronstein, İsmail İlkan Ceylan, Mikhail Galkin, Juan L Reutter, Miguel Romero Orth
Knowledge Graph Foundation Models (KGFMs) are at the frontier for deep learning on knowledge graphs (KGs), as they can generalize to completely novel knowledge graphs with different relational vocabularies. Despite their empirical success, our theoretical understanding of KGFMs remains very limited. In this paper, we conduct a rigorous study of the expressive power of KGFMs. Specifically, we show that the expressive power of KGFMs directly depends on the motifs that are used to learn the relation representations. We then observe that the most typical motifs used in the existing literature are binary, as the representations are learned based on how pairs of relations interact, which limits the model’s expressiveness. As part of our study, we design more expressive KGFMs using richer motifs, which necessitate learning relation representations based on, e.g., how triples of relations interact with each other. Finally, we empirically validate our theoretical findings, showing that the use of richer motifs results in better performance on a wide range of datasets drawn from different domains.
nan
Article 1080
Title@2025-06-09 (1): Revisiting Non-Acyclic GFlowNets in Discrete Environments
Title: Revisiting Non-Acyclic GFlowNets in Discrete Environments | Nicht-azyklische GFlowNets in diskreten Umgebungen erneut besuchen | 重新审视环境混乱中的非周期性GFlowNets 2502.07735v2 |
Authors (4): Nikita Morozov, Ian Maksimov, Daniil Tiapkin, Sergey Samsonov
Generative Flow Networks (GFlowNets) are a family of generative models that learn to sample objects from a given probability distribution, potentially known up to a normalizing constant. Instead of working in the object space, GFlowNets proceed by sampling trajectories in an appropriately constructed directed acyclic graph environment, greatly relying on the acyclicity of the graph. In our paper, we revisit the theory that relaxes the acyclicity assumption and present a simpler theoretical framework for non-acyclic GFlowNets in discrete environments. Moreover, we provide various novel theoretical insights related to training with fixed backward policies, the nature of flow functions, and connections between entropy-regularized RL and non-acyclic GFlowNets, which naturally generalize the respective concepts and theoretical results from the acyclic setting. In addition, we experimentally re-examine the concept of loss stability in non-acyclic GFlowNet training, as well as validate our own theoretical findings.
nan
Article 1081
Title@2025-06-09 (1): TwinBreak: Jailbreaking LLM Security Alignments based on Twin Prompts
Title: TwinBreak: Jailbreaking LLM Security Alignments based on Twin Prompts | TwinBreak: Jailbreaking LLM Security Alignments basierend auf Twin Prompts | 双突破:基于双提示的监狱破碎LLM安保调整 2506.07596v1 |
Authors (3): Torsten Krauß, Hamid Dashtbani, Alexandra Dmitrienko
Machine learning is advancing rapidly, with applications bringing notable benefits, such as improvements in translation and code generation. Models like ChatGPT, powered by Large Language Models (LLMs), are increasingly integrated into daily life. However, alongside these benefits, LLMs also introduce social risks. Malicious users can exploit LLMs by submitting harmful prompts, such as requesting instructions for illegal activities. To mitigate this, models often include a security mechanism that automatically rejects such harmful prompts. However, such mechanisms can be bypassed through LLM jailbreaks. Current jailbreaks often require significant manual effort or high computational costs, or result in excessive model modifications that may degrade regular utility. We introduce TwinBreak, an innovative safety alignment removal method. Building on the idea that the safety mechanism operates like an embedded backdoor, TwinBreak identifies and prunes parameters responsible for this functionality. By focusing on the most relevant model layers, TwinBreak performs fine-grained analysis of parameters essential to model utility and safety. TwinBreak is the first method to analyze intermediate outputs from prompts with high structural and content similarity to isolate safety parameters. We present the TwinPrompt dataset containing 100 such twin prompts. Experiments confirm TwinBreak’s effectiveness, achieving 89% to 98% success rates with minimal computational requirements across 16 LLMs from five vendors.
nan
Article 1082
Title@2025-06-09 (1): Tree-Sliced Wasserstein Distance: A Geometric Perspective
Title: Tree-Sliced Wasserstein Distance: A Geometric Perspective | Baumgeschliffene Wasserstein-Distanz: Eine geometrische Perspektive | 树切瓦西尔斯坦距离:几何视角 2406.13725v3 |
Authors (7): Viet-Hoang Tran, Trang Pham, Tho Tran, Minh Khoi Nguyen Nhat, Thanh Chu, Tam Le, Tan M. Nguyen
Many variants of Optimal Transport (OT) have been developed to address its heavy computation. Among them, notably, Sliced Wasserstein (SW) is widely used for application domains by projecting the OT problem onto one-dimensional lines, and leveraging the closed-form expression of the univariate OT to reduce the computational burden. However, projecting measures onto low-dimensional spaces can lead to a loss of topological information. To mitigate this issue, in this work, we propose to replace one-dimensional lines with a more intricate structure, called tree systems. This structure is metrizable by a tree metric, which yields a closed-form expression for OT problems on tree systems. We provide an extensive theoretical analysis to formally define tree systems with their topological properties, introduce the concept of splitting maps, which operate as the projection mechanism onto these structures, then finally propose a novel variant of Radon transform for tree systems and verify its injectivity. This framework leads to an efficient metric between measures, termed Tree-Sliced Wasserstein distance on Systems of Lines (TSW-SL). By conducting a variety of experiments on gradient flows, image style transfer, and generative models, we illustrate that our proposed approach performs favorably compared to SW and its variants.
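The Sliced Wasserstein baseline mentioned above relies on the closed-form solution of one-dimensional optimal transport, which for two equal-size empirical measures reduces to matching sorted samples. A minimal sketch of that baseline follows; it does not implement tree systems or TSW-SL, and the function names, projection count, and equal-size assumption are illustrative.

```python
import math
import random

def wasserstein_1d(xs, ys, p=2):
    """W_p^p between two equal-size empirical measures on the line:
    the optimal coupling matches sorted samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def sliced_wasserstein(X, Y, n_projections=64, p=2, seed=0):
    """Monte Carlo estimate of the Sliced Wasserstein distance between
    two equal-size point clouds given as lists of coordinate lists."""
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random direction on the unit sphere.
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project both clouds onto the line spanned by theta.
        px = [sum(t * c for t, c in zip(theta, pt)) for pt in X]
        py = [sum(t * c for t, c in zip(theta, pt)) for pt in Y]
        total += wasserstein_1d(px, py, p)
    return (total / n_projections) ** (1.0 / p)

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
Y = [[x + 2.0, y] for x, y in X]  # X translated by (2, 0)
```

Each projection discards everything orthogonal to the chosen line, which is the loss of topological information that motivates replacing lines with richer structures.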
nan
Article 1083
Title@2025-06-09 (1): Exploiting Curvature in Online Convex Optimization with Delayed Feedback
Title: Exploiting Curvature in Online Convex Optimization with Delayed Feedback | Ausnutzen von Kurven in Online Convex-Optimierung mit verzögertem Feedback | 利用在线 Convex 优化使用延迟反馈的在线 Convex 优化中的曲线曲线 2506.07595v1 |
Authors (3): Hao Qiu, Emmanuel Esposito, Mengxiao Zhang
In this work, we study the online convex optimization problem with curved losses and delayed feedback. When losses are strongly convex, existing approaches obtain regret bounds of order $d_{\max} \ln T$, where $d_{\max}$ is the maximum delay and $T$ is the time horizon. However, in many cases, this guarantee can be much worse than $\sqrt{d_{\mathrm{tot}}}$ as obtained by a delayed version of online gradient descent, where $d_{\mathrm{tot}}$ is the total delay. We bridge this gap by proposing a variant of follow-the-regularized-leader that obtains regret of order $\min\{\sigma_{\max}\ln T, \sqrt{d_{\mathrm{tot}}}\}$, where $\sigma_{\max}$ is the maximum number of missing observations. We then consider exp-concave losses and extend the Online Newton Step algorithm to handle delays with an adaptive learning rate tuning, achieving regret $\min\{d_{\max} n\ln T, \sqrt{d_{\mathrm{tot}}}\}$ where $n$ is the dimension. To our knowledge, this is the first algorithm to achieve such a regret bound for exp-concave losses. We further consider the problem of unconstrained online linear regression and achieve a similar guarantee by designing a variant of the Vovk-Azoury-Warmuth forecaster with a clipping trick. Finally, we implement our algorithms and conduct experiments under various types of delay and losses, showing an improved performance over existing methods.
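To make the gap between these rates concrete, the orders of the three bounds for strongly convex losses can be compared numerically. The sketch below ignores constants and lower-order terms, and the delay scenario (a single round whose feedback arrives 5000 rounds late, all other feedback immediate) is a hypothetical example, not one from the paper.

```python
import math

def existing_bound(d_max, T):
    """Order of the prior strongly convex bound: d_max * ln T."""
    return d_max * math.log(T)

def delayed_ogd_bound(d_tot):
    """Order of the delayed online gradient descent bound: sqrt(d_tot)."""
    return math.sqrt(d_tot)

def proposed_bound(sigma_max, d_tot, T):
    """Order of the bound stated above: min{sigma_max * ln T, sqrt(d_tot)}."""
    return min(sigma_max * math.log(T), math.sqrt(d_tot))

# Hypothetical scenario: one observation is delayed by 5000 rounds.
T = 10_000
d_max, d_tot, sigma_max = 5000, 5000, 1  # at most one observation is ever missing

rates = {
    "existing": existing_bound(d_max, T),
    "delayed_ogd": delayed_ogd_bound(d_tot),
    "proposed": proposed_bound(sigma_max, d_tot, T),
}
```

In this scenario a single long delay inflates $d_{\max}$ while $\sigma_{\max}$ stays at 1, so the $\sigma_{\max}\ln T$ term is the smallest of the three.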
nan
Article 1084
Title@2025-06-09 (1): Lite-RVFL: A Lightweight Random Vector Functional-Link Neural Network for Learning Under Concept Drift
Title: Lite-RVFL: A Lightweight Random Vector Functional-Link Neural Network for Learning Under Concept Drift | Lite-RVFL: Ein leichter Zufallsvektor Funktions-Link Neuronales Netzwerk zum Lernen unter Konzept Drift | Lite-RVFL: 轻量随机随机矢量功能链接神经网络,用于在概念驱动下学习 2506.08063v1 |
Authors (3): Songqiao Hu, Zeyi Liu, Xiao He
The change in data distribution over time, also known as concept drift, poses a significant challenge to the reliability of online learning methods. Existing methods typically require model retraining or drift detection, both of which demand high computational costs and are often unsuitable for real-time applications. To address these limitations, a lightweight, fast and efficient random vector functional-link network termed Lite-RVFL is proposed, capable of adapting to concept drift without drift detection or retraining. Lite-RVFL introduces a novel objective function that assigns exponentially increasing weights to newer samples, thereby emphasizing recent data and enabling timely adaptation. Theoretical analysis confirms the feasibility of this objective function for drift adaptation, and an efficient incremental update rule is derived. Experimental results on a real-world safety assessment task validate the efficiency of Lite-RVFL, its effectiveness in adapting to drift, and its potential to capture temporal patterns. The source code is available at https://github.com/songqiaohu/Lite-RVFL.
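The effect of weighting newer samples exponentially more can be seen in a one-parameter weighted least-squares fit. This is a toy sketch, not the Lite-RVFL objective or its incremental update rule: there is no random hidden layer, and the growth factor, data, and function name are assumptions made for illustration.

```python
def weighted_slope(xs, ys, weights, ridge=0.0):
    """Closed-form minimiser of sum_i a_i * (y_i - w * x_i)^2 + ridge * w^2."""
    num = sum(a * x * y for a, x, y in zip(weights, xs, ys))
    den = sum(a * x * x for a, x in zip(weights, xs)) + ridge
    return num / den

# Synthetic stream with an abrupt drift: the slope jumps from 1 to 3 at i = 50.
xs = [1.0 + (i % 5) for i in range(100)]
ys = [(1.0 if i < 50 else 3.0) * x for i, x in enumerate(xs)]

uniform = [1.0] * 100
growth = 1.2  # hypothetical growth factor; weight of sample i is growth**i
recent_heavy = [growth ** i for i in range(100)]

slope_uniform = weighted_slope(xs, ys, uniform)      # averages both regimes
slope_recent = weighted_slope(xs, ys, recent_heavy)  # tracks the recent regime
```

With uniform weights the fit lands midway between the two regimes, whereas the exponentially weighted fit follows the post-drift slope without any explicit drift detection.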
nan
Article 1085
Title@2025-06-09 (1): FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning
Title: FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning | FairDICE: Fairness-getriebenes Offline-Multi-Ziel-Verstärkungs-Lernen | FairDICE: 公平发展离线多目标强化学习 2506.08062v1 |
Authors (4): Woosung Kim, Jinho Lee, Jongmin Lee, Byung-Jun Lee
Multi-objective reinforcement learning (MORL) aims to optimize policies in the presence of conflicting objectives, where linear scalarization is commonly used to reduce vector-valued returns into scalar signals. While effective for certain preferences, this approach cannot capture fairness-oriented goals such as Nash social welfare or max-min fairness, which require nonlinear and non-additive trade-offs. Although several online algorithms have been proposed for specific fairness objectives, a unified approach for optimizing nonlinear welfare criteria in the offline setting, where learning must proceed from a fixed dataset, remains unexplored. In this work, we present FairDICE, the first offline MORL framework that directly optimizes nonlinear welfare objectives. FairDICE leverages distribution correction estimation to jointly account for welfare maximization and distributional regularization, enabling stable and sample-efficient learning without requiring explicit preference weights or exhaustive weight search. Across multiple offline benchmarks, FairDICE demonstrates strong fairness-aware performance compared to existing baselines.
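The difference between linear scalarization and the fairness-oriented criteria named above can be shown on two hypothetical return vectors. The sketch only evaluates the welfare functions themselves; it does not implement FairDICE or any policy optimization, and the candidate returns and equal weights are made up for illustration.

```python
import math

def linear_scalarization(returns, weights):
    """Weighted sum of per-objective returns."""
    return sum(w * r for w, r in zip(weights, returns))

def nash_social_welfare(returns):
    """Sum of log-returns (log of the product); requires positive returns."""
    return sum(math.log(r) for r in returns)

def max_min_fairness(returns):
    """Welfare of the worst-off objective."""
    return min(returns)

# Two hypothetical policies with two-objective returns.
candidates = {"lopsided": (9.0, 1.0), "balanced": (4.5, 4.5)}

pick_linear = max(candidates, key=lambda k: linear_scalarization(candidates[k], (0.5, 0.5)))
pick_nash = max(candidates, key=lambda k: nash_social_welfare(candidates[k]))
pick_maxmin = max(candidates, key=lambda k: max_min_fairness(candidates[k]))
```

Under equal weights the linear criterion prefers the lopsided policy because its total is higher, while both nonlinear criteria prefer the balanced one.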
nan
Article 1086
Title@2025-06-09 (1): Explore the vulnerability of black-box models via diffusion models
Title: Explore the vulnerability of black-box models via diffusion models | Entdecken Sie die Schwachstelle von Black-Box-Modellen über Diffusionsmodelle | 通过推广模式探索黑盒模式的脆弱性 2506.07590v1 |
Authors (4): Jiacheng Shi, Yanfu Zhang, Huajie Shao, Ashley Gao
Recent advancements in diffusion models have enabled high-fidelity and photorealistic image generation across diverse applications. However, these models also present security and privacy risks, including copyright violations, sensitive information leakage, and the creation of harmful or offensive content that could be exploited maliciously. In this study, we uncover a novel security threat where an attacker leverages diffusion model APIs to generate synthetic images, which are then used to train a high-performing substitute model. This enables the attacker to execute model extraction and transfer-based adversarial attacks on black-box classification models with minimal queries, without needing access to the original training data. The generated images are sufficiently high-resolution and diverse to train a substitute model whose outputs closely match those of the target model. Across seven benchmarks, including CIFAR and ImageNet subsets, our method shows an average improvement of 27.37% over state-of-the-art methods while using just 0.01 times the query budget, achieving a 98.68% success rate in adversarial attacks on the target model.
nan
Article 1087
Title@2025-06-09 (1): Attention with Trained Embeddings Provably Selects Important Tokens
Title: Attention with Trained Embeddings Provably Selects Important Tokens | Aufmerksamkeit bei trainierten Einbettungen wählt wahrscheinlich wichtige Token aus | 与经过训练的嵌入器的关注 2505.17282v2 |
Authors (4): Diyuan Wu, Aleksandr Shevchenko, Samet Oymak, Marco Mondelli
Token embeddings play a crucial role in language modeling but, despite this practical relevance, their theoretical understanding remains limited. Our paper addresses the gap by characterizing the structure of embeddings obtained via gradient descent. Specifically, we consider a one-layer softmax attention model with a linear head for binary classification, i.e., $\texttt{Softmax}( p^\top E_X^\top ) E_X v = \frac{ \sum_{i=1}^T \exp(p^\top E_{x_i}) E_{x_i}^\top v}{\sum_{j=1}^T \exp(p^\top E_{x_{j}}) }$, where $E_X = [ E_{x_1} , \dots, E_{x_T} ]^\top$ contains the embeddings of the input sequence, $p$ is the embedding of the $\mathrm{\langle cls \rangle}$ token and $v$ the output vector. First, we show that, already after a single step of gradient training with the logistic loss, the embeddings $E_X$ capture the importance of tokens in the dataset by aligning with the output vector $v$ proportionally to the frequency with which the corresponding tokens appear in the dataset. Then, after training $p$ via gradient flow until convergence, the softmax selects the important tokens in the sentence (i.e., those that are predictive of the label), and the resulting $\mathrm{\langle cls \rangle}$ embedding maximizes the margin for such a selection. Experiments on real-world datasets (IMDB, Yelp) exhibit a phenomenology close to that unveiled by our theory.
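The model output defined above can be evaluated directly. The sketch below computes $\texttt{Softmax}(p^\top E_X^\top) E_X v$ for a tiny hand-picked example; the embedding values and function names are illustrative and no training is performed.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_output(E_X, p, v):
    """One-layer softmax attention with a linear head:
    sum_i softmax_i(p . E_{x_i}) * (E_{x_i} . v)."""
    scores = [dot(p, e) for e in E_X]
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return sum((w / total) * dot(e, v) for w, e in zip(exps, E_X))

# Two tokens with hypothetical 2-dimensional embeddings.
E_X = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, -1.0]

uniform_out = attention_output(E_X, [0.0, 0.0], v)   # p = 0: plain average
selective_out = attention_output(E_X, [10.0, 0.0], v)  # p aligned with token 1
```

With $p = 0$ the softmax averages all tokens, while a $p$ aligned with the first embedding concentrates nearly all weight on that token, which is the selection behaviour the abstract analyses.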
nan
Article 1088
Title@2025-06-09 (1): PrunePEFT: Iterative Hybrid Pruning for Parameter-Efficient Fine-tuning of LLMs
Title: PrunePEFT: Iterative Hybrid Pruning for Parameter-Efficient Fine-tuning of LLMs | PrunePEFT: Iterative Hybride Pruning für eine parametereffiziente Feinabstimmung von LLMs | PurnePEFT: LLMs 参数有效微调过渡混合节制 2506.07587v1 |
Authors (6): Tongzhou Yu, Zhuhao Zhang, Guanghui Zhu, Shen Jiang, Meikang Qiu, Yihua Huang
Parameter Efficient Fine-Tuning (PEFT) methods have emerged as effective and promising approaches for fine-tuning pre-trained language models. Compared with Full parameter Fine-Tuning (FFT), PEFT achieves comparable task performance with far fewer trainable parameters, which greatly reduces training and storage costs. However, using PEFT requires navigating a vast design space, such as the type of PEFT modules and the layers into which they are inserted. Inadequate configurations can lead to sub-optimal results. Conventional solutions such as architectural search techniques, while effective, tend to introduce substantial additional overhead. In this paper, we propose a novel approach, PrunePEFT, which formulates the PEFT strategy search as a pruning problem and introduces a hybrid pruning strategy that capitalizes on the sensitivity of pruning methods to different PEFT modules. This method extends traditional pruning techniques by iteratively removing redundant or conflicting PEFT modules, thereby optimizing the fine-tuned configuration. By efficiently identifying the most relevant modules, our approach significantly reduces the computational burden typically associated with architectural search processes, making it a more scalable and efficient solution for fine-tuning large pre-trained models.
nan
Article 1089
Title@2025-06-09 (1): Theoretical Limitations of Ensembles in the Age of Overparameterization
Title: Theoretical Limitations of Ensembles in the Age of Overparameterization | Theoretische Einschränkungen von Ensembles im Zeitalter der Überparameterisierung | 过分参数化时代集合群的理论局限性 2410.16201v2 |
Authors (3): Niclas Dern, John P. Cunningham, Geoff Pleiss
Classic ensembles generalize better than any single component model. In contrast, recent empirical studies find that modern ensembles of (overparameterized) neural networks may not provide any inherent generalization advantage over single but larger neural networks. This paper clarifies how modern overparameterized ensembles differ from their classic underparameterized counterparts, using ensembles of random feature (RF) regressors as a basis for developing theory. In contrast to the underparameterized regime, where ensembling typically induces regularization and increases generalization, we prove with minimal assumptions that infinite ensembles of overparameterized RF regressors become pointwise equivalent to (single) infinite-width RF regressors, and finite width ensembles rapidly converge to single models with the same parameter budget. These results, which are exact for ridgeless models and approximate for small ridge penalties, imply that overparameterized ensembles and single large models exhibit nearly identical generalization. We further characterize the predictive variance amongst ensemble members, demonstrating that it quantifies the expected effects of increasing capacity rather than capturing any conventional notion of uncertainty. Our results challenge common assumptions about the advantages of ensembles in overparameterized settings, prompting a reconsideration of how well intuitions from underparameterized ensembles transfer to deep ensembles and the overparameterized regime.
nan
Article 1090
Title@2025-06-09 (1): Aircraft Trajectory Dataset Augmentation in Latent Space
Title: Aircraft Trajectory Dataset Augmentation in Latent Space | Flugbahndatensatz Augmentation im Latent Space | 低空空间轨迹数据集增强 2506.07585v1 |
Authors (2): Seokbin Yoon, Keumjin Lee
Aircraft trajectory modeling plays a crucial role in Air Traffic Management (ATM) and is important for various downstream tasks, including conflict detection and landing time prediction. Dataset augmentation through the addition of synthetically generated trajectory data is necessary to develop a more robust aircraft trajectory model and ensure that the trajectory dataset is sufficient and balanced. In this work, we propose a novel framework called ATRADA for aircraft trajectory dataset augmentation. In the proposed framework, a Transformer encoder learns the underlying patterns in the original trajectory dataset and converts each data point into a context vector in the learned latent space. The converted dataset in the latent space is projected into reduced dimensions using principal component analysis (PCA), and a Gaussian mixture model (GMM) is applied to fit the probability distribution of the data points in the reduced-dimensional space. Finally, new samples are drawn from the fitted GMM, the dimension of the samples is reverted to the original dimension, and they are decoded with a Multi-Layer Perceptron (MLP). Several experiments, including comparisons against several baselines, demonstrate that the framework effectively generates new, high-quality synthetic aircraft trajectory data.
Article 1091
Title@2025-06-09 (1): FedCGD: Collective Gradient Divergence Optimized Scheduling for Wireless Federated Learning
Title: FedCGD: Collective Gradient Divergence Optimized Scheduling for Wireless Federated Learning | FedCGD: Kollektive Gradient Divergenz optimierte Planung für drahtloses Federated Learning | FedCGD:无线联邦学习集体渐进变异优化计划 2506.07581v1 |
Authors (5): Tan Chen, Jintao Yan, Yuxuan Sun, Sheng Zhou, Zhisheng Niu
Federated learning (FL) is a promising paradigm for multiple devices to cooperatively train a model. When applied in wireless networks, two issues consistently affect the performance of FL, i.e., data heterogeneity of devices and limited bandwidth. Many papers have investigated device scheduling strategies considering the two issues. However, most of them recognize data heterogeneity as a property of individual devices. In this paper, we prove that the convergence speed of FL is affected by the sum of device-level and sample-level collective gradient divergence (CGD). The device-level CGD refers to the gradient divergence of the scheduled device group, instead of the sum of the individual device divergences. The sample-level CGD is statistically upper bounded by sampling variance, which is inversely proportional to the total number of samples scheduled for local update. To derive a tractable form of the device-level CGD, we further consider a classification problem and transform it into the weighted earth moving distance (WEMD) between the group distribution and the global distribution. Then we propose the FedCGD algorithm to minimize the sum of multi-level CGDs by balancing WEMD and sampling variance, within polynomial time. Simulation shows that the proposed strategy increases classification accuracy on the CIFAR-10 dataset by up to 4.2% while scheduling 41.8% fewer devices, and flexibly switches between reducing WEMD and reducing sampling variance.
Article 1092
Title@2025-06-09 (1): HSF: Defending against Jailbreak Attacks with Hidden State Filtering
Title: HSF: Defending against Jailbreak Attacks with Hidden State Filtering | HSF: Verteidigung gegen Jailbreak Attacken mit versteckten Zustand Filterung | HSF: 防范以隐蔽国家过滤手段侵入监狱的袭击 2409.03788v2 |
Authors (4): Cheng Qian, Hainan Zhang, Lei Sha, Zhiming Zheng
With the growing deployment of LLMs in daily applications like chatbots and content generation, efforts to ensure outputs align with human values and avoid harmful content have intensified. However, increasingly sophisticated jailbreak attacks threaten this alignment, aiming to induce unsafe outputs. Current defense efforts either focus on prompt rewriting or detection, which are limited in effectiveness due to the varied designs of jailbreak prompts, or on output control and detection, which are computationally expensive as they require LLM inference. Therefore, designing a pre-inference defense method that resists diverse jailbreak prompts is crucial for preventing LLM jailbreak attacks. We observe that jailbreak attacks, safe queries, and harmful queries exhibit different clustering patterns within the LLM’s hidden state representation space. This suggests that by leveraging the LLM’s hidden state representational capabilities, we can analyze the LLM’s forthcoming behavior and proactively intervene for defense. In this paper, we propose a jailbreak attack defense strategy based on a Hidden State Filter (HSF), a lossless architectural defense mechanism that enables the model to preemptively identify and reject adversarial inputs before the inference process begins. We activate its defensive potential through an additional plugin module, effectively framing the defense task as a classification problem. Experimental results on two benchmark datasets, utilizing three different LLMs, show that HSF significantly enhances resilience against six cutting-edge jailbreak attacks. It significantly reduces the success rate of jailbreak attacks while minimally impacting responses to benign user queries, with negligible inference overhead, and outperforms defense baselines. Our code and data are available at https://anonymous.4open.science/r/Hidden-State-Filtering-8652/
Article 1093
Title@2025-06-09 (1): Denoising the Future: Top-p Distributions for Moving Through Time
Title: Denoising the Future: Top-p Distributions for Moving Through Time | Denoising the Future: Top-P-Distributionen für die Zeit | 摒弃未来:跨越时空的顶级发行品 2506.07578v1 |
Authors (4): Florian Andreas Marwitz, Ralf Möller, Magnus Bender, Marcel Gehrke
Inference in dynamic probabilistic models is a complex task involving expensive operations. In particular, for Hidden Markov Models, the whole state space has to be enumerated for advancing in time. Even states with negligible probabilities are considered, resulting in computational inefficiency and increased noise due to the propagation of unlikely probability mass. We propose to denoise the future and speed up inference by using only the top-p states, i.e., the most probable states with accumulated probability p. We show that the error introduced by using only the top-p states is bounded by p and the so-called minimal mixing rate of the underlying model. Moreover, in our empirical evaluation, we show that we can expect speedups of at least an order of magnitude, while the error in terms of total variation distance is below 0.09.
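The top-p idea described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names, the dictionary-based belief and transition representation, and the choice to renormalize the pruned belief are all assumptions made for the example.

```python
def top_p_states(belief, p):
    """Keep the most probable states until their accumulated mass reaches p.

    `belief` maps state -> probability. The kept states are renormalized
    (renormalization is an assumption of this sketch).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    ranked = sorted(belief.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = {}, 0.0
    for state, prob in ranked:
        if mass >= p:
            break
        kept[state] = prob
        mass += prob
    return {state: prob / mass for state, prob in kept.items()}


def forward_step(belief, transition, p):
    """Advance the belief one time step, propagating only the top-p states.

    `transition[s]` maps each successor state to its transition probability.
    """
    pruned = top_p_states(belief, p)
    nxt = {}
    for state, prob in pruned.items():
        for succ, weight in transition[state].items():
            nxt[succ] = nxt.get(succ, 0.0) + prob * weight
    return nxt


# Hypothetical three-state model used only for illustration.
belief = {"a": 0.6, "b": 0.3, "c": 0.1}
transition = {
    "a": {"a": 0.5, "b": 0.5},
    "b": {"b": 1.0},
    "c": {"c": 1.0},
}
next_belief = forward_step(belief, transition, p=0.8)
```

With p = 0.8, state "c" is dropped before the transition is applied, so only two of the three rows of the transition model are touched; in a large state space this pruning is where the speedup would come from.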
Article 1094
Title@2025-06-09 (1): Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models
Title: Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models | Uncertainty-o: Ein Modell-agnostisches Framework zur Enthüllung von Uncertainty in großen multimodalen Modellen | 不确定性 – – o:大型多式联运模型中不确定性的不确定性稳定的一个示范 – – 不可知框架 2506.07575v1 |
Authors (4): Ruiyang Zhang, Hu Zhang, Hao Fei, Zhedong Zheng
Large Multimodal Models (LMMs), harnessing the complementarity among diverse modalities, are often considered more robust than pure Large Language Models (LLMs); yet do LMMs know what they do not know? Three key questions remain open: (1) how to evaluate the uncertainty of diverse LMMs in a unified manner, (2) how to prompt LMMs to show their uncertainty, and (3) how to quantify uncertainty for downstream tasks. In an attempt to address these challenges, we introduce Uncertainty-o: (1) a model-agnostic framework designed to reveal uncertainty in LMMs regardless of their modalities, architectures, or capabilities, (2) an empirical exploration of multimodal prompt perturbations to uncover LMM uncertainty, offering insights and findings, and (3) a formulation of multimodal semantic uncertainty, which enables quantifying uncertainty from multimodal responses. Experiments across 18 benchmarks spanning various modalities and 10 LMMs (both open- and closed-source) demonstrate the effectiveness of Uncertainty-o in reliably estimating LMM uncertainty, thereby enhancing downstream tasks such as hallucination detection, hallucination mitigation, and uncertainty-aware Chain-of-Thought reasoning.
Article 1095
Title@2025-06-09 (1): Attacking Attention of Foundation Models Disrupts Downstream Tasks
Title: Attacking Attention of Foundation Models Disrupts Downstream Tasks | Angriff auf die Aufmerksamkeit von Stiftungsmodellen bricht nachgelagerte Aufgaben aus | 攻击基金会模型的注意 破坏下流任务 2506.05394v2 |
Authors (3): Hondamunige Prasanna Silva, Federico Becattini, Lorenzo Seidenari
Foundation models represent the most prominent and recent paradigm shift in artificial intelligence. Foundation models are large models, trained on broad data, that deliver high accuracy in many downstream tasks, often without fine-tuning. For this reason, models such as CLIP, DINO, or Vision Transformers (ViTs) are becoming the bedrock of many industrial AI-powered applications. However, the reliance on pre-trained foundation models also introduces significant security concerns, as these models are vulnerable to adversarial attacks. Such attacks involve deliberately crafted inputs designed to deceive AI systems, jeopardizing their reliability. This paper studies the vulnerabilities of vision foundation models, focusing specifically on CLIP and ViTs, and explores the transferability of adversarial attacks to downstream tasks. We introduce a novel attack, targeting the structure of transformer-based architectures in a task-agnostic fashion. We demonstrate the effectiveness of our attack on several downstream tasks: classification, captioning, image/text retrieval, segmentation and depth estimation. Code available at: https://github.com/HondamunigePrasannaSilva/attack-attention
Article 1096
Title@2025-06-09 (1): AI Scientists Fail Without Strong Implementation Capability
Title: AI Scientists Fail Without Strong Implementation Capability | KI-Wissenschaftler scheitern ohne starke Umsetzungsfähigkeit | AI 缺乏强有力的执行能力的科学家失败 2506.01372v2 |
Authors (7): Minjun Zhu, Qiujie Xie, Yixuan Weng, Jian Wu, Zhen Lin, Linyi Yang, Yue Zhang
The emergence of Artificial Intelligence (AI) Scientist represents a paradigm shift in scientific discovery, with large language models (LLMs) taking the lead as the primary executor in the entire scientific workflow from idea generation to experiment implementation. Recent AI Scientist studies demonstrate sufficient capabilities for independent scientific discovery, with the generated research reports gaining acceptance at the ICLR 2025 workshop and ACL 2025, arguing that a human-level AI Scientist, capable of uncovering phenomena previously unknown to humans, may be imminent. Despite this substantial progress, AI Scientist has yet to produce a groundbreaking achievement in the domain of computer science on par with automated scientific tools. Based on extensive quantitative evidence from existing benchmarks in complex engineering tasks and a systematic evaluation of 28 research papers generated by five advanced AI Scientist systems, we argue that the fundamental bottleneck for AI Scientists lies in their capability to execute the requisite verification procedures. Current AI Scientist systems lack the execution capabilities needed to execute rigorous experiments and produce high-quality scientific papers. To better illustrate the root cause of this implementation gap, we provide an in-depth discussion on the fundamental limitations of AI Scientist. This position paper aims to call for the participants in the community to bridge the implementation gap.
Article 1097
Title@2025-06-09 (1): Complex Physics-Informed Neural Network
Title: Complex Physics-Informed Neural Network | Komplexes physik-informiertes Neuronales Netzwerk | 复杂物理内化神经网络 2502.04917v2 |
Authors (4): Chenhao Si, Ming Yan, Xin Li, Zhihong Xia
We propose compleX-PINN, a novel physics-informed neural network (PINN) architecture incorporating a learnable activation function inspired by the Cauchy integral theorem. By optimizing the activation parameters, compleX-PINN achieves high accuracy with just a single hidden layer. Empirically, we demonstrate that compleX-PINN solves high-dimensional problems that pose significant challenges for PINNs. Our results show that compleX-PINN consistently achieves substantially greater precision, often improving accuracy by an order of magnitude, on these complex tasks.
Article 1098
Title@2025-06-09 (1): A Simplifying and Learnable Graph Convolutional Attention Network for Unsupervised Knowledge Graphs Alignment
Title: A Simplifying and Learnable Graph Convolutional Attention Network for Unsupervised Knowledge Graphs Alignment | Ein einfacheres und erlernbares Graph Convolutional Attention Network für unüberwachtes Wissen Graphen Alignment | 一个用于不受监督知识图表对齐的简化和可学习的图变注意网络 2410.13263v2 |
Authors (3): Weishan Cai, Wenjun Ma, Yuncheng Jiang
The success of current Entity Alignment (EA) tasks depends largely on the supervision information provided by labeled data. Considering the cost of labeled data, most supervised methods are difficult to apply in practical scenarios. Therefore, more and more works based on contrastive learning, active learning, or other deep learning techniques have been developed to solve the performance bottleneck caused by the lack of labeled data. However, existing unsupervised EA methods still have limitations: either their modeling complexity is high, or they cannot balance the effectiveness and practicality of alignment. To overcome these issues, we propose a Simplifying and Learnable graph convolutional attention network for Unsupervised Knowledge Graphs alignment method (SLU). Specifically, we first introduce LCAT, a new and simple framework, as the backbone network to model the graph structure of two KGs. Then we design a reconstruction method of relation structure based on potential matching relations for efficiently filtering invalid neighborhood information of aligned entities, to improve the usability and scalability of SLU. In addition, a similarity function based on consistency is proposed to better measure the similarity of candidate entity pairs. Finally, we conduct extensive experiments on three datasets of different sizes (15K and 100K) and different types (cross-lingual and monolingual) to verify the superiority of SLU. Experimental results show that SLU significantly improves alignment accuracy, outperforming 25 supervised or unsupervised methods and improving Hits@1 by 6.4% over the best baseline in the best case.
Article 1099
Title@2025-06-09 (1): AQUATIC-Diff: Additive Quantization for Truly Tiny Compressed Diffusion Models
Title: AQUATIC-Diff: Additive Quantization for Truly Tiny Compressed Diffusion Models | AQUATIC-Diff: Additive Quantisierung für wirklich kleine komprimierte Diffusionsmodelle | AQUATIC-Diff: 真正微缩压缩扩散模型补充量化 2506.05960v2 |
Authors (2): Adil Hasan, Thomas Peyrin
Significant investments have been made towards the commodification of diffusion models for generation of diverse media. Their mass-market adoption is however still hobbled by the intense hardware resource requirements of diffusion model inference. Model quantization strategies tailored specifically towards diffusion models have been useful in easing this burden, yet have generally explored the Uniform Scalar Quantization (USQ) family of quantization methods. In contrast, Vector Quantization (VQ) methods, which operate on groups of multiple related weights as the basic unit of compression, have seen substantial success in Large Language Model (LLM) quantization. In this work, we apply codebook-based additive vector quantization to the problem of diffusion model compression. Our resulting approach achieves a new Pareto frontier for the extremely low-bit weight quantization on the standard class-conditional benchmark of LDM-4 on ImageNet at 20 inference time steps. Notably, we report sFID 1.92 points lower than the full-precision model at W4A8 and the best-reported results for FID, sFID and ISC at W2A8. We are also able to demonstrate FLOPs savings on arbitrary hardware via an efficient inference kernel, as opposed to savings resulting from small integer operations which may lack broad hardware support.
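To make the notion of codebook-based additive vector quantization more concrete, the sketch below represents a small group of weights as the sum of one codeword from each of several codebooks. It is a hedged illustration only: the function names, the toy codebooks, and the greedy residual encoder are assumptions of this example (additive quantization schemes typically use a stronger code search), and nothing here reproduces the paper's actual procedure.

```python
def decode(codes, codebooks):
    """Reconstruct a weight group as the sum of one codeword per codebook."""
    dim = len(codebooks[0][0])
    out = [0.0] * dim
    for book, idx in zip(codebooks, codes):
        for i in range(dim):
            out[i] += book[idx][i]
    return out


def greedy_encode(group, codebooks):
    """Choose codewords one codebook at a time (a simplifying assumption).

    Each step picks the codeword closest to the current residual, then
    subtracts it, so later codebooks refine what earlier ones missed.
    """
    residual = list(group)
    codes = []
    for book in codebooks:
        best = min(
            range(len(book)),
            key=lambda k: sum((r - c) ** 2 for r, c in zip(residual, book[k])),
        )
        codes.append(best)
        residual = [r - c for r, c in zip(residual, book[best])]
    return codes


# Hypothetical toy setting: groups of 2 weights, 2 codebooks of 2 codewords.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [-0.5, 0.0]],
]
codes = greedy_encode([0.5, 1.0], codebooks)
approx = decode(codes, codebooks)
```

Only the integer codes need to be stored per group, so with M codebooks of K codewords each a group of d weights costs M·log2(K) bits, plus the shared codebooks.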
Article 1100
Title@2025-06-09 (1): Improving Memory Efficiency for Training KANs via Meta Learning
Title: Improving Memory Efficiency for Training KANs via Meta Learning | Verbesserung der Speichereffizienz für die Schulung von KANs über Meta Learning | 通过Meta学习提高培训KANs的记忆效率 2506.07549v1 |
Authors (4): Zhangchi Zhao, Jun Shu, Deyu Meng, Zongben Xu
Inspired by the Kolmogorov-Arnold representation theorem, KANs offer a novel framework for function approximation by replacing traditional neural network weights with learnable univariate functions. This design demonstrates significant potential as an efficient and interpretable alternative to traditional MLPs. However, KANs are characterized by a substantially larger number of trainable parameters, leading to challenges in memory efficiency and higher training costs compared to MLPs. To address this limitation, we propose to generate weights for KANs via a smaller meta-learner, called MetaKANs. By training KANs and MetaKANs in an end-to-end differentiable manner, MetaKANs achieve comparable or even superior performance while significantly reducing the number of trainable parameters and maintaining promising interpretability. Extensive experiments on diverse benchmark tasks, including symbolic regression, partial differential equation solving, and image classification, demonstrate the effectiveness of MetaKANs in improving parameter efficiency and memory usage. The proposed method provides an alternative technique for training KANs, that allows for greater scalability and extensibility, and narrows the training cost gap with MLPs stated in the original paper of KANs. Our code is available at https://github.com/Murphyzc/MetaKAN.
Article 1101
Title@2025-06-09 (1): Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques
Title: Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques | Eliciting Fine-Tuned Transformer Fähigkeiten durch Inferenz-Zeit-Techniken | 通过推断-时间技术的精密导精度变形器能力 2506.08060v1 |
Authors (1): Asankhaya Sharma
Large language models have transformed natural language processing, yet supervised fine-tuning (SFT) remains computationally intensive. This paper formally proves that capabilities acquired through SFT can be approximated by a base transformer model using inference-time techniques, specifically in-context learning (ICL), without altering model parameters, under idealized assumptions including unbounded computational resources and access to the fine-tuning dataset. We extend these results to practical scenarios with finite context lengths and partial dataset access. For text generation tasks with fixed output length $l$, datasets of size $\mathrm{O}\left( \frac{m V}{\varepsilon^2} \log \frac{m}{\delta} \right)$ or, with bounded context, $\mathrm{O}\left( \frac{l \log V}{\varepsilon^2} \log \frac{1}{\delta} \right)$ suffice to approximate fine-tuned behavior across $m$ contexts within error $\varepsilon$, where $V$ is the vocabulary size and $\delta$ is the failure probability. For linear classification, datasets of size $\mathrm{O}\left( \frac{d}{\varepsilon} \right)$ or, with fixed context, $\mathrm{O}\left( \frac{1}{\varepsilon^2} \log \frac{1}{\delta} \right)$ are sufficient, where $d$ is the input dimension. Grounded in the Turing completeness of transformers, these results provide a theoretical foundation for resource-efficient deployment of large language models, with practical techniques like retrieval-augmented generation bridging theory to real-world applications.
Article 1102
Title@2025-06-09 (1): Output-Constrained Decision Trees
Title: Output-Constrained Decision Trees | Output-beschränkte Entscheidungsbäume | 受产出约束的决策树 2405.15314v3 |
Authors (6): Hüseyin Tunç, Doğanay Özese, Ş. İlker Birbil, Donato Maragno, Marco Caserta, Mustafa Baydoğan
Incorporating domain-specific constraints into machine learning models is essential for generating predictions that are both accurate and feasible in real-world applications. This paper introduces new methods for training Output-Constrained Regression Trees (OCRT), addressing the limitations of traditional decision trees in constrained multi-target regression tasks. We propose three approaches: M-OCRT, which uses split-based mixed integer programming to enforce constraints; E-OCRT, which employs an exhaustive search for optimal splits and solves constrained prediction problems at each decision node; and EP-OCRT, which applies post-hoc constrained optimization to tree predictions. To illustrate their potential uses in ensemble learning, we also introduce a random forest framework working under convex feasible sets. We validate the proposed methods through a computational study both on synthetic and industry-driven hierarchical time series datasets. Our results demonstrate that imposing constraints on decision tree training results in accurate and feasible predictions.
Article 1103
Title@2025-06-09 (1): Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling
Title: Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling | Heartcare Suite: Multidimensionales Verständnis von EKG mit Raw Multi-Lead Signal Modeling | 心脏护理套套件:以原始多牵头信号建模对ECG的多层面理解 2506.05831v2 |
Authors (14): Yihan Xie, Sijing Li, Tianwei Lin, Zhuonan Wang, Chenglin Yang, Yu Zhong, Wenqiao Zhang, Haoyuan Li, Hao Jiang, Fengda Zhang, Qishan Chen, Jun Xiao, Yueting Zhuang, Beng Chin Ooi
We present Heartcare Suite, a multimodal comprehensive framework for fine-grained electrocardiogram (ECG) understanding. It comprises three key components: (i) Heartcare-220K, a high-quality, structured, and comprehensive multimodal ECG dataset covering essential tasks such as disease diagnosis, waveform morphology analysis, and rhythm interpretation; (ii) Heartcare-Bench, a systematic and multi-dimensional benchmark designed to evaluate diagnostic intelligence and guide the optimization of Medical Multimodal Large Language Models (Med-MLLMs) in ECG scenarios; and (iii) HeartcareGPT with a tailored tokenizer, Bidirectional ECG Abstract Tokenization (Beat), which compresses raw multi-lead signals into semantically rich discrete tokens via dual-level vector quantization and a query-guided bidirectional diffusion mechanism. Built upon Heartcare-220K, HeartcareGPT achieves strong generalization and SoTA performance across multiple clinically meaningful tasks. Extensive experiments demonstrate that Heartcare Suite is highly effective in advancing ECG-specific multimodal understanding and evaluation. Our project is available at https://github.com/DCDmllm/Heartcare-Suite.
Article 1104
Title@2025-06-09 (1): Flowing Datasets with Wasserstein over Wasserstein Gradient Flows
Title: Flowing Datasets with Wasserstein over Wasserstein Gradient Flows | Flowing Datasets mit Wasserstein über Wasserstein Gradient Flows | 与瓦森斯坦对瓦森斯坦梯流的流数据集 2506.07534v1 |
Authors (3): Clément Bonet, Christophe Vauthier, Anna Korba
Many applications in machine learning involve data represented as probability distributions. The emergence of such data requires radically novel techniques to design tractable gradient flows on probability distributions over this type of (infinite-dimensional) objects. For instance, being able to flow labeled datasets is a core task for applications ranging from domain adaptation to transfer learning or dataset distillation. In this setting, we propose to represent each class by the associated conditional distribution of features, and to model the dataset as a mixture distribution supported on these classes (which are themselves probability distributions), meaning that labeled datasets can be seen as probability distributions over probability distributions. We endow this space with a metric structure from optimal transport, namely the Wasserstein over Wasserstein (WoW) distance, derive a differential structure on this space, and define WoW gradient flows. The latter enable the design of dynamics over this space that decrease a given objective functional. We apply our framework to transfer learning and dataset distillation tasks, leveraging our gradient flow construction as well as novel tractable functionals that take the form of Maximum Mean Discrepancies with Sliced-Wasserstein based kernels between probability distributions.
Article 1105
Title@2025-06-09 (1): Learning What Reinforcement Learning Can’t: Interleaved Online Fine-Tuning for Hardest Questions
Title: Learning What Reinforcement Learning Can’t: Interleaved Online Fine-Tuning for Hardest Questions | Lernen, was Verstärktes Lernen nicht kann: Interleaved Online Fine-Tuning für härteste Fragen | 学习强化学习无法做到的东西: 2506.07527v1 |
Authors (11): Lu Ma, Hao Liang, Meiyi Qiang, Lexiang Tang, Xiaochen Ma, Zhen Hao Wong, Junbo Niu, Chengyu Shen, Runming He, Bin Cui, Wentao Zhang
Recent advances in large language model (LLM) reasoning have shown that sophisticated behaviors such as planning and self-reflection can emerge through reinforcement learning (RL). However, despite these successes, RL in its current form remains insufficient to induce capabilities that exceed the limitations of the base model, as it is primarily optimized based on existing knowledge of the model rather than facilitating the acquisition of new information. To address this limitation, we employ supervised fine-tuning (SFT) to learn what RL cannot, which enables the incorporation of new knowledge and reasoning patterns by leveraging high-quality demonstration data. We analyze the training dynamics of RL and SFT for LLM reasoning and find that RL excels at maintaining and improving performance on questions within the model’s original capabilities, while SFT is more effective at enabling progress on questions beyond the current scope of the model. Motivated by the complementary strengths of RL and SFT, we introduce a novel training approach, ReLIFT (Reinforcement Learning Interleaved with Online Fine-Tuning). In ReLIFT, the model is primarily trained using RL, but when it encounters challenging questions, high-quality solutions are collected for fine-tuning, and the training process alternates between RL and fine-tuning to enhance the model’s reasoning abilities. ReLIFT achieves an average improvement of over +5.2 points across five competition-level benchmarks and one out-of-distribution benchmark compared to other zero-RL models. Furthermore, we demonstrate that ReLIFT outperforms both RL and SFT while using only 13% of the detailed demonstration data, highlighting its scalability. These results provide compelling evidence that ReLIFT overcomes the fundamental limitations of RL and underscore its significant potential.
Article 1106
Title@2025-06-09 (1): Test-time Correlation Alignment
Title: Test-time Correlation Alignment | Prüfzeitkorrelation Ausrichtung | 测试时关联对齐 2505.00533v2 |
Authors (3): Linjing You, Jiabao Lu, Xiayuan Huang
Deep neural networks often degrade under distribution shifts. Although domain adaptation offers a solution, privacy constraints often prevent access to source data, making Test-Time Adaptation (TTA, which adapts using only unlabeled test data) increasingly attractive. However, current TTA methods still face practical challenges: (1) a primary focus on instance-wise alignment, overlooking CORrelation ALignment (CORAL) due to missing source correlations; (2) complex backpropagation operations for model updating, resulting in computational overhead; and (3) domain forgetting. To address these challenges, we provide a theoretical analysis to investigate the feasibility of Test-time Correlation Alignment (TCA), demonstrating that correlation alignment between high-certainty instances and test instances can enhance test performance with a theoretical guarantee. Based on this, we propose two simple yet effective algorithms: LinearTCA and LinearTCA+. LinearTCA applies a simple linear transformation to achieve both instance and correlation alignment without additional model updates, while LinearTCA+ serves as a plug-and-play module that can easily boost existing TTA methods. Extensive experiments validate our theoretical insights and show that TCA methods significantly outperform baselines across various tasks, benchmarks and backbones. Notably, LinearTCA achieves higher accuracy with only 4% GPU memory and 0.6% computation time compared to the best TTA baseline. It also outperforms existing methods on CLIP by over 1.86%.
Article 1107
Title@2025-06-09 (1): Towards Achieving Perfect Multimodal Alignment
Title: Towards Achieving Perfect Multimodal Alignment | Auf dem Weg zu einer perfekten multimodalen Ausrichtung | 实现完美多式联运协调 2503.15352v2 |
Authors (2): Abhi Kamboj, Minh N. Do
Multimodal alignment constructs a joint latent vector space where modalities representing the same concept map to neighboring latent vectors. We formulate this as an inverse problem and show that, under certain conditions, paired data from each modality can map to equivalent latent vectors, which we refer to as perfect alignment. When perfect alignment cannot be achieved, it can be approximated using the Singular Value Decomposition (SVD) of a multimodal data matrix. Experiments on synthetic multimodal Gaussian data verify the effectiveness of our perfect alignment method compared to a learned contrastive alignment method. We further demonstrate the practical application of cross-modal transfer for human action recognition, showing that perfect alignment significantly enhances the model’s accuracy. We conclude by discussing how these findings can be applied to various modalities and tasks and the limitations of our method. We hope these findings inspire further exploration of perfect alignment and its applications in representation learning.
Article 1108
Title@2025-06-09 (1): Addressing Correlated Latent Exogenous Variables in Debiased Recommender Systems
Title: Addressing Correlated Latent Exogenous Variables in Debiased Recommender Systems | Adressierung korrelierter latenter exogener Variablen in debiased Recommender Systems | 处理受损害建议系统中与气候相关的低端外部源变量 2506.07517v1 |
Authors (4): Shuqiang Zhang, Yuchao Zhang, Jinkun Chen, Haochen Sui
Recommendation systems (RS) aim to provide personalized content, but they face a challenge in unbiased learning due to selection bias, where users only interact with items they prefer. This bias leads to a distorted representation of user preferences, which hinders the accuracy and fairness of recommendations. To address the issue, various methods, such as error-imputation-based, inverse propensity scoring, and doubly robust techniques, have been developed. Despite the progress, from the structural causal model perspective, previous debiasing methods in RS assume the independence of the exogenous variables. In this paper, we relax this assumption and propose a learning algorithm based on likelihood maximization to learn a prediction model. We first discuss the correlation and difference between unmeasured confounding and our scenario, then we propose a unified method that effectively handles latent exogenous variables. Specifically, our method models the data generation process with latent exogenous variables under mild normality assumptions. We then develop a Monte Carlo algorithm to numerically estimate the likelihood function. Extensive experiments on synthetic datasets and three real-world datasets demonstrate the effectiveness of our proposed method. The code is at https://github.com/WallaceSUI/kdd25-background-variable.
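For readers unfamiliar with the baselines named above, the following sketch shows the standard inverse propensity scoring (IPS) estimator of the average prediction error under selection bias. It illustrates only this prior technique, not the likelihood-based method proposed in the paper; the function name and the toy numbers are hypothetical.

```python
def ips_loss(errors, observed, propensities):
    """Inverse-propensity-scored estimate of the mean error over all pairs.

    errors[i]       : prediction error for user-item pair i
    observed[i]     : 1 if the rating was observed, else 0
    propensities[i] : probability that pair i is observed (assumed known here)
    """
    total = 0.0
    for err, obs, prop in zip(errors, observed, propensities):
        if obs:
            # Up-weight observed pairs that were unlikely to be observed.
            total += err / prop
    return total / len(errors)


# Hypothetical toy data: four user-item pairs, three of them observed.
errors = [1.0, 4.0, 3.0, 2.0]
observed = [1, 0, 1, 1]
propensities = [0.5, 0.25, 0.5, 0.25]
estimate = ips_loss(errors, observed, propensities)
```

A naive average over the three observed pairs would give 2.0 here, whereas the IPS estimate is 4.0 because rarely observed pairs count for more. The estimator is unbiased only when the propensities are correct.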
Article 1109
Title@2025-06-09 (1): SecFormer: Fast and Accurate Privacy-Preserving Inference for Transformer Models via SMPC
Title: SecFormer: Fast and Accurate Privacy-Preserving Inference for Transformer Models via SMPC | SecFormer: Schnelle und genaue Datenschutz-Bewahrung von Schlussfolgerungen für Transformer-Modelle über SMPC | SecFormer:通过SMPC为变形模型快速和准确的隐私保护推断 2401.00793v5 |
Authors (8): Jinglong Luo, Yehong Zhang, Zhuo Zhang, Jiaqi Zhang, Xin Mu, Hui Wang, Yue Yu, Zenglin Xu
With the growing use of Transformer models hosted on cloud platforms to offer inference services, privacy concerns are escalating, especially concerning sensitive data like investment plans and bank account details. Secure Multi-Party Computing (SMPC) emerges as a promising solution to protect the privacy of inference data and model parameters. However, the application of SMPC in Privacy-Preserving Inference (PPI) for Transformer models often leads to considerable slowdowns or declines in performance. This is largely due to the multitude of nonlinear operations in the Transformer architecture, which are not well-suited to SMPC and difficult to circumvent or optimize effectively. To address this concern, we introduce a comprehensive PPI framework called SecFormer to achieve fast and accurate PPI for Transformer models. We successfully eliminate the high-cost exponential and maximum operations in PPI without sacrificing model performance and develop a suite of efficient SMPC protocols by employing suitable numerical computation methods to boost other complex nonlinear functions in PPI, including GeLU, LayerNorm, and a redesigned Softmax. Our extensive experiments reveal that SecFormer outperforms MPCFormer in performance, showing improvements of $3.4\%$ and $24.7\%$ for BERT$_{\text{BASE}}$ and BERT$_{\text{LARGE}}$, respectively. In terms of efficiency, SecFormer is 3.57 and 3.58 times faster than PUMA for BERT$_{\text{BASE}}$ and BERT$_{\text{LARGE}}$, demonstrating its effectiveness and speed.
Article 1110
Title@2025-06-09 (1): Beyond Closure Models: Learning Chaotic-Systems via Physics-Informed Neural Operators
Title: Beyond Closure Models: Learning Chaotic-Systems via Physics-Informed Neural Operators | Beyond Closure Models: Chaotische Systeme über physikinformierte Neuraloperatoren lernen | 超越封闭模型:通过物理内成形神经操作员学习毛质系统 2408.05177v4 |
Authors (7): Chuwei Wang, Julius Berner, Zongyi Li, Di Zhou, Jiayun Wang, Jane Bae, Anima Anandkumar
Accurately predicting the long-term behavior of chaotic systems is crucial for various applications such as climate modeling. However, achieving such predictions typically requires iterative computations over a dense spatiotemporal grid to account for the unstable nature of chaotic systems, which is expensive and impractical in many real-world situations. An alternative approach to such a fully-resolved simulation is using a coarse grid and then correcting its errors through a \textit{closure model}, which approximates the overall information from fine scales not captured in the coarse-grid simulation. Recently, ML approaches have been used for closure modeling, but they typically require a large number of training samples from expensive fully-resolved simulations (FRS). In this work, we prove an even more fundamental limitation, i.e., the standard approach to learning closure models suffers from a large approximation error for generic problems, no matter how large the model is, and it stems from the non-uniqueness of the mapping. We propose an alternative end-to-end learning approach using a physics-informed neural operator (PINO) that overcomes this limitation by not using a closure model or a coarse-grid solver. We first train the PINO model on data from a coarse-grid solver and then fine-tune it with (a small amount of) FRS and physics-based losses on a fine grid. The discretization-free nature of neural operators means that they do not suffer from the restriction of a coarse grid that closure models face, and they can provably approximate the long-term statistics of chaotic systems. In our experiments, our PINO model achieves a 330x speedup compared to FRS with a relative error $\sim 10\%$. In contrast, the closure model coupled with a coarse-grid solver is $60$x slower than PINO while having a much higher error $\sim186\%$ when the closure model is trained on the same FRS dataset.
Article 1111
Title@2025-06-09 (1): Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models
Title: Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models | Eliciting In-Context Retrieval und Reasoning für Lang-Kontext Große Sprachmodelle | 长式长式大语言模型的全文检索和理由 2501.08248v3 |
Authors (6): Yifu Qiu, Varun Embar, Yizhe Zhang, Navdeep Jaitly, Shay B. Cohen, Benjamin Han
Recent advancements in long-context language models (LCLMs) promise to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With their expanded context windows, LCLMs can process entire knowledge bases and perform retrieval and reasoning directly – a capability we define as In-Context Retrieval and Reasoning (ICR^2). However, existing benchmarks like LOFT often overestimate LCLM performance by providing overly simplified contexts. To address this, we introduce ICR^2, a benchmark that evaluates LCLMs in more realistic scenarios by including confounding passages retrieved with strong retrievers. We then propose three methods to enhance LCLM performance: (1) retrieve-then-generate fine-tuning, (2) retrieval-attention-probing, which uses attention heads to filter and de-noise long contexts during decoding, and (3) joint retrieval head training alongside the generation head. Our evaluation of five well-known LCLMs on LOFT and ICR^2 demonstrates significant gains with our best approach applied to Mistral-7B: +17 and +15 points by Exact Match on LOFT, and +13 and +2 points on ICR^2, compared to vanilla RAG and supervised fine-tuning, respectively. It even outperforms GPT-4-Turbo on most tasks despite being a much smaller model.
Article 1112
Title@2025-06-09 (1): Reinforcement Learning via Implicit Imitation Guidance
Title: Reinforcement Learning via Implicit Imitation Guidance | Verstärkung Lernen über Implizite Imitation Anleitung | 通过隐含模拟指导加强学习 2506.07505v1 |
Authors (4): Perry Dong, Alec M. Lessing, Annie S. Chen, Chelsea Finn
We study the problem of sample-efficient reinforcement learning, where prior data such as demonstrations are provided for initialization in lieu of a dense reward signal. A natural approach is to incorporate an imitation learning objective, either as regularization during training or to acquire a reference policy. However, imitation learning objectives can ultimately degrade long-term performance, as they do not directly align with reward maximization. In this work, we propose to use prior data solely for guiding exploration via noise added to the policy, sidestepping the need for explicit behavior cloning constraints. The key insight in our framework, Data-Guided Noise (DGN), is that demonstrations are most useful for identifying which actions should be explored, rather than forcing the policy to take certain actions. Our approach achieves up to 2-3x improvement over prior reinforcement learning from offline data methods across seven simulated continuous control tasks.
Article 1113
Title@2025-06-09 (1): CaliciBoost: Performance-Driven Evaluation of Molecular Representations for Caco-2 Permeability Prediction
Title: CaliciBoost: Performance-Driven Evaluation of Molecular Representations for Caco-2 Permeability Prediction | CaliciBoost: Performance-getriebene Auswertung von molekularen Darstellungen für Caco-2 Permeabilitätsvorhersage | CaliciBoost:对Caco-2 渗透性预测的分子代表量进行实绩-驱动评价 2506.08059v1 |
Authors (11): Huong Van Le, Weibin Ren, Junhong Kim, Yukyung Yun, Young Bin Park, Young Jun Kim, Bok Kyung Han, Inho Choi, Jong IL Park, Hwi-Yeol Yun, Jae-Mun Choi
Caco-2 permeability serves as a critical in vitro indicator for predicting the oral absorption of drug candidates during early-stage drug discovery. To enhance the accuracy and efficiency of computational predictions, we systematically investigated the impact of eight molecular feature representation types including 2D/3D descriptors, structural fingerprints, and deep learning-based embeddings combined with automated machine learning techniques to predict Caco-2 permeability. Using two datasets of differing scale and diversity (TDC benchmark and curated OCHEM data), we assessed model performance across representations and identified PaDEL, Mordred, and RDKit descriptors as particularly effective for Caco-2 prediction. Notably, the AutoML-based model CaliciBoost achieved the best MAE performance. Furthermore, for both PaDEL and Mordred representations, the incorporation of 3D descriptors resulted in a 15.73% reduction in MAE compared to using 2D features alone, as confirmed by feature importance analysis. These findings highlight the effectiveness of AutoML approaches in ADMET modeling and offer practical guidance for feature selection in data-limited prediction tasks.
Article 1114
Title@2025-06-09 (1): BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction
Title: BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction | BatteryLife: Ein umfassender Datensatz und Benchmark für die Vorhersage der Akkulaufzeit | 电池寿命:电池寿命预测综合数据集和基准 2502.18807v6 |
Authors (9): Ruifeng Tan, Weixiang Hong, Jiayue Tang, Xibin Lu, Ruijun Ma, Xiang Zheng, Jia Li, Jiaqiang Huang, Tong-Yi Zhang
Battery Life Prediction (BLP), which relies on time series data produced by battery degradation tests, is crucial for battery utilization, optimization, and production. Despite impressive advancements, this research area faces three key challenges. Firstly, the limited size of existing datasets impedes insights into modern battery life data. Secondly, most datasets are restricted to small-capacity lithium-ion batteries tested in labs with limited diversity, raising concerns about the generalizability of findings. Thirdly, inconsistent and limited benchmarks across studies obscure the effectiveness of baselines and leave it unclear if models popular in other time series fields are effective for BLP. To address these challenges, we propose BatteryLife, a comprehensive dataset and benchmark for BLP. BatteryLife integrates 16 datasets, offering 2.5 times the sample size of the previous largest dataset, and provides the most diverse battery life resource with batteries from 8 formats, 59 chemical systems, 9 operating temperatures, and 421 charge/discharge protocols, including both laboratory and industrial tests. Notably, BatteryLife is the first to release battery life datasets of zinc-ion batteries, sodium-ion batteries, and industry-tested large-capacity lithium-ion batteries. With the comprehensive dataset, we revisit the effectiveness of baselines popular in this and other time series fields. Furthermore, we propose CyclePatch, a plug-in technique that can be employed in various neural networks. Extensive benchmarking of 18 methods reveals that models popular in other time series fields can be unsuitable for BLP, and CyclePatch consistently improves model performance, establishing state-of-the-art benchmarks. Moreover, BatteryLife evaluates model performance across aging conditions and domains. BatteryLife is available at https://github.com/Ruifeng-Tan/BatteryLife.
Article 1115
Title@2025-06-09 (1): Scalable Sobolev IPM for Probability Measures on a Graph
Title: Scalable Sobolev IPM for Probability Measures on a Graph | Skalierbares Sobolev IPM für Wahrscheinlichkeitsmessungen auf einem Graphen | 可缩缩缩的 Sobolev 用于对图表进行概率测量的IPM IPM 2502.00737v2 |
Authors (4): Tam Le, Truyen Nguyen, Hideitsu Hino, Kenji Fukumizu
We investigate the Sobolev IPM problem for probability measures supported on a graph metric space. Sobolev IPM is an important instance of integral probability metrics (IPM), and is obtained by constraining a critic function within a unit ball defined by the Sobolev norm. In particular, it has been used to compare probability measures and is crucial for several theoretical works in machine learning. However, to our knowledge, there are no efficient algorithmic approaches to compute Sobolev IPM, which hinders its practical applications. In this work, we establish a relation between Sobolev norm and weighted $L^p$-norm, and leverage it to propose a \emph{novel regularization} for Sobolev IPM. By exploiting the graph structure, we demonstrate that the regularized Sobolev IPM provides a \emph{closed-form} expression for fast computation. This advancement addresses long-standing computational challenges, and paves the way to apply Sobolev IPM for practical applications, even in large-scale settings. Additionally, the regularized Sobolev IPM is negative definite. Utilizing this property, we design positive-definite kernels upon the regularized Sobolev IPM, and provide preliminary evidence of their advantages for comparing probability measures on a given graph for document classification and topological data analysis.
Article 1116
Title@2025-06-09 (1): Graph-of-Causal Evolution: Challenging Chain-of-Model for Reasoning
Title: Graph-of-Causal Evolution: Challenging Chain-of-Model for Reasoning | Graph-of-Causal Evolution: Herausfordernde Modellkette für die Vernunft | Causal 进化图:有挑战性的理由模型链 2506.07501v1 |
Authors (1): Libo Wang
In view of the problem that each subchain in the chain-of-model (CoM) relies only on the information of the previous subchain and may lose long-range dependencies due to the causal mask blocking the global context flow between multi-level subchains, this work proposes a graph of causal evolution (GoCE). Its core principle is to map the implicit token representation into a differentiable and sparse causal adjacency matrix, then permeate causal constraints through each layer of calculation using causal-masked attention and causal-MoE. By combining intervention consistency loss test and self-evolution gate, the dynamic balance between causal structure learning and adaptive updating of transformer architecture is realized. The researcher built experimental environments in sandboxes built with Claude Sonnet 4, o4-mini-high, and DeepSeek R1 respectively with the transformer variant architecture introduced in GoCE. It is evaluated on publicly available datasets including CLUTRR, CLADDER, EX-FEVER, and CausalQA and compared with the baseline LLMs. The finding proves that GoCE strengthens the transformer’s ability to capture long-range causal dependencies, while the ability to self-evolve is improved. It not only surpasses the design of CoM in terms of design principles, but also provides experience for future research on causal learning and continuous adaptive improvement.
Article 1117
Title@2025-06-09 (1): Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks
Title: Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks | Mind the Gap: Entfernen der Diskretisierungslücke in differenzierbaren Logic Gate-Netzwerken | 牢记差距:消除不同逻辑门网的分化差距 2506.07500v1 |
Authors (4): Shakir Yousefi, Andreas Plesner, Till Aczel, Roger Wattenhofer
Modern neural networks demonstrate state-of-the-art performance on numerous existing benchmarks; however, their high computational requirements and energy consumption prompt researchers to seek more efficient solutions for real-world deployment. Logic gate networks (LGNs) learn a large network of logic gates for efficient image classification. However, learning a network that can solve a simple problem like CIFAR-10 can take days to weeks to train. Even then, almost half of the network remains unused, causing a discretization gap. This discretization gap hinders real-world deployment of LGNs, as the performance drop between training and inference negatively impacts accuracy. We inject Gumbel noise with a straight-through estimator during training to significantly speed up training, improve neuron utilization, and decrease the discretization gap. We theoretically show that this results from implicit Hessian regularization, which improves the convergence properties of LGNs. We train networks $4.5 \times$ faster in wall-clock time, reduce the discretization gap by $98\%$, and reduce the number of unused gates by $100\%$.
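A minimal sketch of the generic Gumbel straight-through idea mentioned in the abstract above, not the authors' implementation. It assumes each neuron holds a vector of logits over candidate gates; the function names, the temperature `tau`, and the logit values are hypothetical. The forward pass uses the hard one-hot choice, and in an autograd framework the gradient would be taken through the soft probabilities.

```python
import math
import random

def sample_gumbel(rng):
    # Gumbel(0, 1) sample by inverse transform: -log(-log(U)), U ~ Uniform(0, 1).
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep U strictly inside (0, 1)
    return -math.log(-math.log(u))

def softmax(xs):
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel_straight_through(logits, rng, tau=1.0):
    # Perturb the logits with Gumbel noise, then relax with a softmax.
    noisy = [(l + sample_gumbel(rng)) / tau for l in logits]
    soft = softmax(noisy)
    # Hard one-hot choice used in the forward pass (a single discrete gate).
    k = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    # With autograd one would return hard + soft - stop_gradient(soft),
    # so the value is `hard` while gradients flow through `soft`.
    return hard, soft

rng = random.Random(0)
hard, soft = gumbel_straight_through([0.5, -1.0, 2.0, 0.0], rng)
print(hard, [round(p, 3) for p in soft])
```

Because the forward pass already commits to one gate per neuron, training and inference see the same kind of discrete network, which is the intuition behind shrinking the discretization gap.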
Article 1118
Title@2025-06-09 (1): Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization
Title: Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization | Low-Dimension-to-High-Dimension Verallgemeinerung und ihre Implikationen für Länge Verallgemeinerung | 低差异至高差异通用化及其对延长通用化的影响 2410.08898v2 |
Authors (4): Yang Chen, Long Yang, Yitao Liang, Zhouchen Lin
Low-Dimension-to-High-Dimension (LDHD) generalization is a special case of Out-of-Distribution (OOD) generalization, where the training data are restricted to a low-dimensional subspace of the high-dimensional testing space. Assuming that each instance is generated from a latent variable and the dimension of the latent variable reflects the problem scale, the inherent scaling challenge in length generalization can be captured by the LDHD generalization in the latent space. We theoretically demonstrate that LDHD generalization is generally unattainable without exploiting prior knowledge to provide appropriate inductive bias. Specifically, we explore LDHD generalization in Boolean functions. We verify that different architectures trained with (S)GD converge to \emph{min-degree interpolators w.r.t. different independent sets}. LDHD generalization is achievable if and only if the target function coincides with this inductive bias. Applying the insights from LDHD generalization to length generalization, we explain the effectiveness of CoT as changing the structure of the latent space to enable better LDHD generalization. We also propose a principle for position embedding design to handle both the inherent LDHD generalization and the nuisances such as the data format. Following the principle, we propose a novel position embedding called RPE-Square that remedies the RPE for dealing with the data format nuisance.
Article 1119
Title@2025-06-09 (1): Explicit Preference Optimization: No Need for an Implicit Reward Model
Title: Explicit Preference Optimization: No Need for an Implicit Reward Model | Explizite Preference-Optimierung: Keine Notwendigkeit für ein Implizites Prämienmodell | 明确的优惠最优化:没有必要采用隐性奖赏模式 2506.07492v1 |
Authors (4): Xiangkun Hu, Lemin Kong, Tong He, David Wipf
The generated responses of large language models (LLMs) are often fine-tuned to human preferences through a process called reinforcement learning from human feedback (RLHF). As RLHF relies on a challenging training sequence, whereby a separate reward model is independently learned and then later applied to LLM policy updates, ongoing research effort has targeted more straightforward alternatives. In this regard, direct preference optimization (DPO) and its many offshoots circumvent the need for a separate reward training step. Instead, through the judicious use of a reparameterization trick that induces an \textit{implicit} reward, DPO and related methods consolidate learning to the minimization of a single loss function. And yet despite demonstrable success in some real-world settings, we prove that DPO-based objectives are nonetheless subject to sub-optimal regularization and counter-intuitive interpolation behaviors, underappreciated artifacts of the reparameterizations upon which they are based. To this end, we introduce an \textit{explicit} preference optimization framework termed EXPO that requires no analogous reparameterization to achieve an implicit reward. Quite differently, we merely posit intuitively-appealing regularization factors from scratch that transparently avoid the potential pitfalls of key DPO variants, provably satisfying regularization desiderata that prior methods do not. Empirical results serve to corroborate our analyses and showcase the efficacy of EXPO.
Article 1120
Title@2025-06-09 (1): Binary Classifier Optimization for Large Language Model Alignment
Title: Binary Classifier Optimization for Large Language Model Alignment | Binärklassifikator-Optimierung für großsprachliche Modellausrichtung | 大语言模型对齐的二进制分类器优化 2404.04656v2 |
Authors (4): Seungjae Jung, Gunsoo Han, Daniel Wontae Nam, Kyoung-Woon On
In real-world services such as ChatGPT, aligning models based on user feedback is crucial for improving model performance. However, due to the simplicity and convenience of providing feedback, users typically offer only basic binary signals, such as ‘thumbs-up’ or ‘thumbs-down’. Most existing alignment research, on the other hand, relies on preference-based approaches that require both positive and negative responses as a pair. We propose Binary Classifier Optimization (BCO), a technique that effectively aligns LLMs using only binary feedback. BCO trains a binary classifier, where the logit serves as an implicit reward, effectively minimizing the Direct Preference Optimization (DPO) loss. We demonstrate that the binary cross-entropy loss employed in classifier training acts as an upper bound for the DPO loss. Additionally, a novel reward shift technique further minimizes the gap between the losses. We validate our methodology in two settings: first, on a paired preference dataset, where our method performs on par with DPO; and second, on a Likert-5 scale annotation dataset which stems from real users’ queries. Our model consistently demonstrates effective and robust alignment across four base LLMs and three different datasets, showcasing the strength of our approach to learning from binary signals.
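The upper-bound claim in the abstract above can be checked numerically in a simplified setting. The sketch below assumes scalar rewards `r_chosen` and `r_rejected` (hypothetical names standing in for the implicit rewards) and omits the paper's reward shift. It uses the identity $\sigma(a)\,\sigma(-b) \le \sigma(a-b)$, which gives $-\log\sigma(a-b) \le -\log\sigma(a) - \log\sigma(-b)$.

```python
import math
import random

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dpo_loss(r_chosen, r_rejected):
    # Pairwise loss on the reward margin: -log sigmoid(r_chosen - r_rejected).
    return -log_sigmoid(r_chosen - r_rejected)

def bce_loss(r_chosen, r_rejected):
    # Binary cross-entropy with the chosen response labeled 1 and the rejected labeled 0.
    return -log_sigmoid(r_chosen) - log_sigmoid(-r_rejected)

def max_violation(n=1000, seed=0):
    # Largest value of (DPO loss - BCE loss) over random reward pairs;
    # a non-positive result is consistent with BCE upper-bounding DPO.
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(n):
        a, b = rng.uniform(-10, 10), rng.uniform(-10, 10)
        worst = max(worst, dpo_loss(a, b) - bce_loss(a, b))
    return worst

print(max_violation() <= 1e-9)
```

Note that the binary loss never needs the two responses to share a prompt, which is why it applies to unpaired thumbs-up and thumbs-down feedback.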
Article 1121
Title@2025-06-09 (1): CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization
Title: CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization | CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model für die Kontextoptimierung | CoCOA-混合:环境优化组合模型 2506.07484v1 |
Authors (3): Dasol Hong, Wooju Lee, Hyun Myung
Prompt tuning, which adapts vision-language models by freezing model parameters and optimizing only the prompt, has proven effective for task-specific adaptations. The core challenge in prompt tuning is improving specialization for a specific task and generalization for unseen domains. However, frozen encoders often produce misaligned features, leading to confusion between classes and limiting specialization. To overcome this issue, we propose a confusion-aware loss (CoA-loss) that improves specialization by refining the decision boundaries between confusing classes. Additionally, we mathematically demonstrate that a mixture model can enhance generalization without compromising specialization. This is achieved using confidence-aware weights (CoA-weights), which adjust the weights of each prediction in the mixture model based on its confidence within the class domains. Extensive experiments show that CoCoA-Mix, a mixture model with CoA-loss and CoA-weights, outperforms state-of-the-art methods by enhancing specialization and generalization. Our code is publicly available at https://github.com/url-kaist/CoCoA-Mix.
Article 1122
Title@2025-06-09 (1): Premise Selection for a Lean Hammer
Title: Premise Selection for a Lean Hammer | Premise Auswahl für einen Lean Hammer | 精精精锤预选 2506.07477v1 |
Authors (5): Thomas Zhu, Joshua Clune, Jeremy Avigad, Albert Qiaochu Jiang, Sean Welleck
Neural methods are transforming automated reasoning for proof assistants, yet integrating these advances into practical verification workflows remains challenging. Hammers are tools that interface with external automatic theorem provers to automate tedious reasoning steps. They have dramatically improved productivity in proof assistants, but the Lean proof assistant still does not have a hammer despite its growing popularity. We present LeanHammer, the first end-to-end domain-general hammer for Lean, built on a novel neural premise selection system for a hammer in dependent type theory. Unlike existing Lean premise selectors, our approach dynamically adapts to user-specific contexts and combines with symbolic proof search and reconstruction to create a practical hammer. With comprehensive evaluations, we show that our premise selector enables LeanHammer to solve 21\% more goals relative to existing premise selectors, and generalize well to diverse domains. Our work bridges the gap between neural retrieval and symbolic reasoning, making formal verification more accessible to researchers and practitioners.
Article 1123
Title@2025-06-09 (1): FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
Title: FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models | FRED: Flexible Reduction-Distribution Interconnect und Communication Implementation für Wafer-Scale Distributed Training von DNN-Modellen | FRED: DNN模型的宽度分布式分散式培训灵活弹性吸收-分散式互联和通信实施 2406.19580v2 |
Authors (5): Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna
Distributed Deep Neural Network (DNN) training is a technique to reduce the training overhead by distributing the training tasks into multiple accelerators, according to a parallelization strategy. However, high-performance compute and interconnects are needed for maximum speed-up and linear scaling of the system. Wafer-scale systems are a promising technology that allows for tightly integrating high-end accelerators with high-speed wafer-scale interconnects, making it an attractive platform for distributed training. However, the wafer-scale interconnect should offer high performance and flexibility for various parallelization strategies to enable maximum optimizations for compute and memory usage. In this paper, we propose FRED, a wafer-scale interconnect that is tailored for the high-BW requirements of wafer-scale networks and can efficiently execute communication patterns of different parallelization strategies. Furthermore, FRED supports in-switch collective communication execution that reduces the network traffic by approximately 2X. Our results show that FRED can improve the average end-to-end training time of ResNet-152, Transformer-17B, GPT-3, and Transformer-1T by 1.76X, 1.87X, 1.34X, and 1.4X, respectively, when compared to a baseline wafer-scale 2D-Mesh fabric.
Article 1124
Title@2025-06-09 (1): Deep Ridgelet Transform and Unified Universality Theorem for Deep and Shallow Joint-Group-Equivariant Machines
Title: Deep Ridgelet Transform and Unified Universality Theorem for Deep and Shallow Joint-Group-Equivariant Machines | Deep Ridgelet Transform und Unified Universality Theorem für tiefe und flache Gelenk-Gruppe-Equivariante Maschinen | 深脊变异和统一通用深海和浅水联合小组-平等合用机器理论 2405.13682v5 |
Authors (4): Sho Sonoda, Yuka Hashimoto, Isao Ishikawa, Masahiro Ikeda
We present a constructive universal approximation theorem for learning machines equipped with joint-group-equivariant feature maps, called the joint-equivariant machines, based on the group representation theory. “Constructive” here indicates that the distribution of parameters is given in a closed-form expression known as the ridgelet transform. Joint-group-equivariance encompasses a broad class of feature maps that generalize classical group-equivariance. Particularly, fully-connected networks are not group-equivariant but are joint-group-equivariant. Our main theorem also unifies the universal approximation theorems for both shallow and deep networks. Until this study, the universality of deep networks has been shown in a different manner from the universality of shallow networks, but our results discuss them on common ground. Now we can understand the approximation schemes of various learning machines in a unified manner. As applications, we show the constructive universal approximation properties of four examples: depth-$n$ joint-equivariant machine, depth-$n$ fully-connected network, depth-$n$ group-convolutional network, and a new depth-$2$ network with quadratic forms whose universality has not been known.
Article 1125
Title@2025-06-09 (1): Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Title: Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models | Verfolgung beweglicher Ziele mit Online-Selbstspiel-Verstärkung Lernen für sicherere Sprachmodelle | 利用在线加强自身能力学习,建立更安全语言模式,以追踪移动目标 2506.07468v1 |
Authors (7): Mickel Liu, Liwei Jiang, Yancheng Liang, Simon Shaolei Du, Yejin Choi, Tim Althoff, Natasha Jaques
Conventional language model (LM) safety alignment relies on a reactive, disjoint procedure: attackers exploit a static model, followed by defensive fine-tuning to patch exposed vulnerabilities. This sequential approach creates a mismatch – attackers overfit to obsolete defenses, while defenders perpetually lag behind emerging threats. To address this, we propose Self-RedTeam, an online self-play reinforcement learning algorithm where an attacker and defender agent co-evolve through continuous interaction. We cast safety alignment as a two-player zero-sum game, where a single model alternates between attacker and defender roles – generating adversarial prompts and safeguarding against them – while a reward LM adjudicates outcomes. This enables dynamic co-adaptation. Grounded in the game-theoretic framework of zero-sum games, we establish a theoretical safety guarantee which motivates the design of our method: if self-play converges to a Nash Equilibrium, the defender will reliably produce safe responses to any adversarial input. Empirically, Self-RedTeam uncovers more diverse attacks (+21.8% SBERT) compared to attackers trained against static defenders and achieves higher robustness on safety benchmarks (e.g., +65.5% on WildJailBreak) than defenders trained against static attackers. We further propose hidden Chain-of-Thought, allowing agents to plan privately, which boosts adversarial diversity and reduces over-refusals. Our results motivate a shift from reactive patching to proactive co-evolution in LM safety training, enabling scalable, autonomous, and robust self-improvement of LMs via multi-agent reinforcement learning (MARL).
Article 1126
Title@2025-06-09 (1): Physics-Informed Deep Learning Model for Line-integral Diagnostics Across Fusion Devices
Title: Physics-Informed Deep Learning Model for Line-integral Diagnostics Across Fusion Devices | Physik-informiertes Deep-Learning-Modell für linienintegrierte Diagnosen über Fusionsgeräte | 横穿融合装置的直线整体诊断法深学习模型 2412.00087v3 |
Authors (13): Cong Wang, Weizhe Yang, Haiping Wang, Renjie Yang, Jing Li, Zhijun Wang, Yixiong Wei, Xianli Huang, Chenshu Hu, Zhaoyang Liu, Xinyao Yu, Changqing Zou, Zhifeng Zhao
Rapid reconstruction of 2D plasma profiles from line-integral measurements is important in nuclear fusion. This paper introduces a physics-informed model architecture called Onion that can enhance the performance of models and be adapted to various backbone networks. The model under Onion incorporates physical information by a multiplication process and applies the physics-informed loss function according to the principle of line integration. Prediction results demonstrate that the additional input of physical information improves the deep learning model’s ability, leading to a reduction in the average relative error $E_1$ between the reconstruction profiles and the target profiles by approximately $0.84 \times 10^{-2}$ on synthetic datasets and about $0.06 \times 10^{-2}$ on experimental datasets. Furthermore, the implementation of the Softplus activation function in the final two fully connected layers improves model performance. This enhancement results in a reduction in $E_1$ by approximately $1.06 \times 10^{-2}$ on synthetic datasets and about $0.11 \times 10^{-2}$ on experimental datasets. The incorporation of the physics-informed loss function has been shown to correct the model’s predictions, bringing the back-projections closer to the actual inputs and reducing the errors associated with inversion algorithms. Besides, we have developed a synthetic data model to generate customized line-integral diagnostic datasets and have also collected soft x-ray diagnostic datasets from EAST and HL-2A. This study achieves reductions in reconstruction errors and accelerates the development of surrogate models in fusion research.
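One way to read the "physics-informed loss according to the principle of line integration" is as a back-projection consistency term. The sketch below is an assumption, not the Onion loss itself: the reconstructed profile is flattened to a vector, a hypothetical geometry matrix holds the path length of each chord through each pixel, and the loss is the mean squared difference between back-projected and measured line integrals.

```python
def back_project(geometry, profile):
    # Line integral along each chord: weighted sum of pixel values,
    # where geometry[i][j] is the path length of chord i through pixel j.
    return [sum(w * p for w, p in zip(row, profile)) for row in geometry]

def line_integral_loss(geometry, profile, measured):
    # Mean squared error between back-projected and measured line integrals.
    predicted = back_project(geometry, profile)
    return sum((a - b) ** 2 for a, b in zip(predicted, measured)) / len(measured)

# Toy 2x2 profile (row-major) viewed by two horizontal and two vertical chords.
profile = [1.0, 2.0, 3.0, 4.0]
geometry = [
    [1, 1, 0, 0],  # top row
    [0, 0, 1, 1],  # bottom row
    [1, 0, 1, 0],  # left column
    [0, 1, 0, 1],  # right column
]
print(back_project(geometry, profile))                           # [3.0, 7.0, 4.0, 6.0]
print(line_integral_loss(geometry, profile, [3.0, 7.0, 4.0, 8.0]))  # 1.0
```

A term of this form penalizes reconstructions whose back-projections disagree with the diagnostic inputs, even when no ground-truth profile is available.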
Article 1127
Title@2025-06-09 (1): Circumventing Backdoor Space via Weight Symmetry
Title: Circumventing Backdoor Space via Weight Symmetry | Umgehung des Backdoor-Raums über die Gewichtssymmetrie | 通过重量对称法环绕后门空间 2506.07467v1 |
Authors (7): Jie Peng, Hongwei Yang, Jing Zhao, Hengji Dong, Hui He, Weizhe Zhang, Haoyu He
Deep neural networks are vulnerable to backdoor attacks, where malicious behaviors are implanted during training. While existing defenses can effectively purify compromised models, they typically require labeled data or specific training procedures, making them difficult to apply beyond supervised learning settings. Notably, recent studies have shown successful backdoor attacks across various learning paradigms, highlighting a critical security concern. To address this gap, we propose Two-stage Symmetry Connectivity (TSC), a novel backdoor purification defense that operates independently of data format and requires only a small fraction of clean samples. Through theoretical analysis, we prove that by leveraging permutation invariance in neural networks and quadratic mode connectivity, TSC amplifies the loss on poisoned samples while maintaining bounded clean accuracy. Experiments demonstrate that TSC achieves robust performance comparable to state-of-the-art methods in supervised learning scenarios. Furthermore, TSC generalizes to self-supervised learning frameworks, such as SimCLR and CLIP, maintaining its strong defense capabilities. Our code is available at https://github.com/JiePeng104/TSC.
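The permutation invariance mentioned above can be illustrated on a tiny two-layer ReLU network: reordering the hidden units, together with the matching rows and columns of the adjacent weight matrices, leaves the output unchanged. This is only a sketch of the symmetry itself, with made-up weights and hypothetical function names; it is not the TSC procedure, which the abstract does not spell out.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def mlp(x, w1, b1, w2):
    """Two-layer network: w2 @ relu(w1 @ x + b1)."""
    hidden = relu([h + b for h, b in zip(matvec(w1, x), b1)])
    return matvec(w2, hidden)

def permute_hidden(w1, b1, w2, perm):
    """Reorder hidden units: rows of w1 and b1, and columns of w2."""
    w1_p = [w1[i] for i in perm]
    b1_p = [b1[i] for i in perm]
    w2_p = [[row[i] for i in perm] for row in w2]
    return w1_p, b1_p, w2_p

# Illustrative weights (not taken from any real model).
w1 = [[1.0, -2.0], [0.5, 0.25], [-1.0, 3.0]]
b1 = [0.0, 1.0, -0.5]
w2 = [[2.0, -1.0, 0.5]]
x = [1.0, 2.0]

original = mlp(x, w1, b1, w2)
permuted = mlp(x, *permute_hidden(w1, b1, w2, [2, 0, 1]))
```

The two parameter sets differ as weight vectors but compute the same function, which is the kind of weight symmetry a connectivity-based defense can exploit.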
Article 1128
Title@2025-06-09 (1): Parameter-Efficient Fine-Tuning of State Space Models
Title: Parameter-Efficient Fine-Tuning of State Space Models | Parameter-Effizient Feintuning von State Space Models | 国家空间模型有效精美设计 2410.09016v3 |
Authors (5): Kevin Galim, Wonjun Kang, Yuchen Zeng, Hyung Il Koo, Kangwook Lee
Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have become powerful tools for language modeling, offering high performance and linear scalability with sequence length. However, the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models remains largely underexplored. We start by investigating two fundamental questions on existing PEFT methods: (i) How do they perform on SSM-based models? (ii) Which parameters should they target for optimal results? Our analysis shows that LoRA and its variants consistently outperform all other PEFT methods. While LoRA is effective for linear projection matrices, it fails on SSM modules, yet it still outperforms other methods applicable to SSMs, indicating their limitations. This underscores the need for a specialized SSM tuning approach. To address this, we propose Sparse Dimension Tuning (SDT), a PEFT method tailored for SSM modules. Combining SDT for SSMs with LoRA for linear projection matrices, we achieve state-of-the-art performance across extensive experiments.
Article 1129
Title@2025-06-09 (1): KScope: A Framework for Characterizing the Knowledge Status of Language Models
Title: KScope: A Framework for Characterizing the Knowledge Status of Language Models | KScope: Ein Rahmen für die Charakterisierung des Wissensstatus von Sprachmodellen | KCope: 语言模式知识状况描述框架 2506.07458v1 |
Authors (6): Yuxin Xiao, Shan Chen, Jack Gallifant, Danielle Bitterman, Thomas Hartvigsen, Marzyeh Ghassemi
Characterizing a large language model’s (LLM’s) knowledge of a given question is challenging. As a result, prior work has primarily examined LLM behavior under knowledge conflicts, where the model’s internal parametric memory contradicts information in the external context. However, this does not fully reflect how well the model knows the answer to the question. In this paper, we first introduce a taxonomy of five knowledge statuses based on the consistency and correctness of LLM knowledge modes. We then propose KScope, a hierarchical framework of statistical tests that progressively refines hypotheses about knowledge modes and characterizes LLM knowledge into one of these five statuses. We apply KScope to nine LLMs across four datasets and systematically establish: (1) Supporting context narrows knowledge gaps across models. (2) Context features related to difficulty, relevance, and familiarity drive successful knowledge updates. (3) LLMs exhibit similar feature preferences when partially correct or conflicted, but diverge sharply when consistently wrong. (4) Context summarization constrained by our feature analysis, together with enhanced credibility, further improves update effectiveness and generalizes across LLMs.
Article 1130
Title@2025-06-09 (1): When Style Breaks Safety: Defending Language Models Against Superficial Style Alignment
Title: When Style Breaks Safety: Defending Language Models Against Superficial Style Alignment | Wenn Stil die Sicherheit bricht: Verteidigung von Sprachmodellen gegen hocheffiziente Stilausrichtung | 当样式打破安全时: 防御语言模型, 对抗超优样式对齐 2506.07452v1 |
Authors (5): Yuxin Xiao, Sana Tonekaboni, Walter Gerych, Vinith Suriyakumar, Marzyeh Ghassemi
Large language models (LLMs) can be prompted with specific styles (e.g., formatting responses as lists), including in jailbreak queries. Although these style patterns are semantically unrelated to the malicious intents behind jailbreak queries, their safety impact remains unclear. In this work, we seek to understand whether style patterns compromise LLM safety, how superficial style alignment increases model vulnerability, and how best to mitigate these risks during alignment. We evaluate 32 LLMs across seven jailbreak benchmarks, and find that malicious queries with style patterns inflate the attack success rate (ASR) for nearly all models. Notably, ASR inflation correlates with both the length of style patterns and the relative attention an LLM exhibits on them. We then investigate superficial style alignment, and find that fine-tuning with specific styles makes LLMs more vulnerable to jailbreaks of those same styles. Finally, we propose SafeStyle, a defense strategy that incorporates a small amount of safety training data augmented to match the distribution of style patterns in the fine-tuning data. Across three LLMs and five fine-tuning style settings, SafeStyle consistently outperforms baselines in maintaining LLM safety.
Article 1131
Title@2025-06-09 (1): Extending Epistemic Uncertainty Beyond Parameters Would Assist in Designing Reliable LLMs
Title: Extending Epistemic Uncertainty Beyond Parameters Would Assist in Designing Reliable LLMs | Die Erweiterung der epistemischen Unsicherheit über Parameter hinaus würde helfen, zuverlässige LLMs zu konzipieren | 超越参数的不确定性延伸有助于设计可靠的长效磁环 2506.07448v1 |
Authors (4): T. Duy Nguyen-Hien, Desi R. Ivanova, Yee Whye Teh, Wee Sun Lee
Although large language models (LLMs) are highly interactive and extendable, current approaches to ensure reliability in deployments remain mostly limited to rejecting outputs with high uncertainty in order to avoid misinformation. This conservative strategy reflects the current lack of tools to systematically distinguish and respond to different sources of uncertainty. In this paper, we advocate for the adoption of Bayesian Modeling of Experiments – a framework that provides a coherent foundation to reason about uncertainty and clarify the reducibility of uncertainty – for managing and proactively addressing uncertainty that arises in LLM deployments. This framework enables LLMs and their users to take contextually appropriate steps, such as requesting clarification, retrieving external information, or refining inputs. By supporting active resolution rather than passive avoidance, it opens the door to more reliable, transparent, and broadly applicable LLM systems, particularly in high-stakes, real-world settings.
Article 1132
Title@2025-06-09 (1): One Fits All: Learning Fair Graph Neural Networks for Various Sensitive Attributes
Title: One Fits All: Learning Fair Graph Neural Networks for Various Sensitive Attributes | Eins passt auf alle: Faire Graphen-Neural-Netzwerke für verschiedene Sensitive Attribute lernen | ” 一个适合所有人 “ :学习各种敏感属性的公平图表神经网络 2406.13544v3 |
Authors (5): Yuchang Zhu, Jintang Li, Yatao Bian, Zibin Zheng, Liang Chen
Recent studies have highlighted fairness issues in Graph Neural Networks (GNNs), where they produce discriminatory predictions against specific protected groups categorized by sensitive attributes such as race and age. While various efforts to enhance GNN fairness have made significant progress, these approaches are often tailored to specific sensitive attributes. Consequently, they necessitate retraining the model from scratch to accommodate changes in the sensitive attribute requirement, resulting in high computational costs. To gain deeper insights into this issue, we approach the graph fairness problem from a causal modeling perspective, where we identify the confounding effect induced by the sensitive attribute as the underlying reason. Motivated by this observation, we formulate the fairness problem in graphs from an invariant learning perspective, which aims to learn invariant representations across environments. Accordingly, we propose a graph fairness framework based on invariant learning, namely FairINV, which enables the training of fair GNNs to accommodate various sensitive attributes within a single training session. Specifically, FairINV incorporates sensitive attribute partition and trains fair GNNs by eliminating spurious correlations between the label and various sensitive attributes. Experimental results on several real-world datasets demonstrate that FairINV significantly outperforms state-of-the-art fairness approaches, underscoring its effectiveness. Our code is available via: https://github.com/ZzoomD/FairINV/.
Article 1133
Title@2025-06-09 (1): Resilience of Rademacher chaos of low degree
Title: Resilience of Rademacher chaos of low degree | Resilienz von Rademacher Chaos von niedrigem Grad | Rademacher低程度混乱的复原能力 2402.10504v5 |
Authors (3): Elad Aigner-Horev, Daniel Rosenberg, Roi Weiss
The resilience of a Rademacher chaos is the maximum number of adversarial sign-flips that the chaos can sustain without having its largest atom probability significantly altered. Inspired by probabilistic lower-bound guarantees for the resilience of linear Rademacher chaos, obtained by Bandeira, Ferber, and Kwan (Advances in Mathematics, Vol. 319, 2017), we provide probabilistic lower-bound guarantees for the resilience of Rademacher chaos of arbitrary yet sufficiently low degree. Our main results distinguish between Rademacher chaos of order two and those of higher order. Our first main result pertains to the resilience of decoupled bilinear Rademacher forms, where different asymptotic behaviour is observed for sparse and dense matrices. For our second main result, we bootstrap our first result in order to provide resilience guarantees for quadratic Rademacher chaos. Our third main result generalises the first and handles the resilience of decoupled Rademacher chaos of arbitrary yet sufficiently low order. Our results for decoupled Rademacher chaos of order two and of higher order, whilst established through the same conceptual framework, differ substantially; the difference stems from how the same conceptual argument is implemented. The order two result is established using Dudley’s maximal inequality for sub-Gaussian processes, the Hanson-Wright inequality, as well as the Kolmogorov-Rogozin inequality. To handle higher order chaos, appeals to Dudley’s inequality as well as the Hanson-Wright inequality are replaced with tools suited for random tensors. In particular, appeals to the Hanson-Wright inequality are replaced with appeals to a concentration result for random tensors put forth by Adamczak and Wolff. Our results are instance-dependent and thus allow for the efficient computation of resilience guarantees provided the order of the chaos is constant.
Article 1134
Title@2025-06-09 (1): Position: We Need Responsible, Application-Driven (RAD) AI Research
Title: Position: We Need Responsible, Application-Driven (RAD) AI Research | Position: Wir brauchen verantwortungsbewusste, anwendungsgetriebene (RAD) KI-Forschung | 立场:我们需要负责任、应用驱动(RAD)AI研究 2505.04104v2 |
Authors (4): Sarah Hartman, Cheng Soon Ong, Julia Powles, Petra Kuhnert
This position paper argues that achieving meaningful scientific and societal advances with artificial intelligence (AI) requires a responsible, application-driven approach (RAD) to AI research. As AI is increasingly integrated into society, AI researchers must engage with the specific contexts where AI is being applied. This includes being responsive to ethical and legal considerations, technical and societal constraints, and public discourse. We present the case for RAD-AI to drive research through a three-staged approach: (1) building transdisciplinary teams and people-centred studies; (2) addressing context-specific methods, ethical commitments, assumptions, and metrics; and (3) testing and sustaining efficacy through staged testbeds and a community of practice. We present a vision for the future of application-driven AI research to unlock new value through technically feasible methods that are adaptive to the contextual needs and values of the communities they ultimately serve.
Article 1135
Title@2025-06-09 (1): Straight-Line Diffusion Model for Efficient 3D Molecular Generation
Title: Straight-Line Diffusion Model for Efficient 3D Molecular Generation | Geradliniges Diffusionsmodell für effiziente 3D-Molekulargeneration | 高效 3D 分子生成的直线扩散模型 2503.02918v2 |
Authors (8): Yuyan Ni, Shikun Feng, Haohan Chi, Bowen Zheng, Huan-ang Gao, Wei-Ying Ma, Zhi-Ming Ma, Yanyan Lan
Diffusion-based models have shown great promise in molecular generation but often require a large number of sampling steps to generate valid samples. In this paper, we introduce a novel Straight-Line Diffusion Model (SLDM) to tackle this problem, by formulating the diffusion process to follow a linear trajectory. The proposed process aligns well with the noise sensitivity characteristic of molecular structures and uniformly distributes reconstruction effort across the generative process, thus enhancing learning efficiency and efficacy. Consequently, SLDM achieves state-of-the-art performance on 3D molecule generation benchmarks, delivering a 100-fold improvement in sampling efficiency.
Article 1136
Title@2025-06-09 (1): Federated In-Context Learning: Iterative Refinement for Improved Answer Quality
Title: Federated In-Context Learning: Iterative Refinement for Improved Answer Quality | Föderiertes In-Context-Lernen: Iterative Veredelung für verbesserte Antwortqualität | 联邦内部学习:改进答复质量的过渡性改进 2506.07440v1 |
Authors (8): Ruhan Wang, Zhiyong Wang, Chengkai Huang, Rui Wang, Tong Yu, Lina Yao, John C. S. Lui, Dongruo Zhou
For question-answering (QA) tasks, in-context learning (ICL) enables language models to generate responses without modifying their parameters by leveraging examples provided in the input. However, the effectiveness of ICL heavily depends on the availability of high-quality examples, which are often scarce due to data privacy constraints, annotation costs, and distribution disparities. A natural solution is to utilize examples stored on client devices, but existing approaches either require transmitting model parameters - incurring significant communication overhead - or fail to fully exploit local datasets, limiting their effectiveness. To address these challenges, we propose Federated In-Context Learning (Fed-ICL), a general framework that enhances ICL through an iterative, collaborative process. Fed-ICL progressively refines responses by leveraging multi-round interactions between clients and a central server, improving answer quality without the need to transmit model parameters. We establish theoretical guarantees for the convergence of Fed-ICL and conduct extensive experiments on standard QA benchmarks, demonstrating that our proposed approach achieves strong performance while maintaining low communication costs.
Article 1137
Title@2025-06-09 (1): Fast Geometric Embedding for Node Influence Maximization
Title: Fast Geometric Embedding for Node Influence Maximization | Schnelle geometrische Einbettung für Knotenbeeinflussung Maximierung | 节点影响最大化快速几何嵌入 2506.07435v1 |
Authors (2): Alexander Kolpakov, Igor Rivin
Computing classical centrality measures such as betweenness and closeness is computationally expensive on large-scale graphs. In this work, we introduce an efficient force layout algorithm that embeds a graph into a low-dimensional space, where the radial distance from the origin serves as a proxy for various centrality measures. We evaluate our method on multiple graph families and demonstrate strong correlations with degree, PageRank, and paths-based centralities. As an application, the proposed embedding makes it possible to find high-influence nodes in a network, providing a fast and scalable alternative to the standard greedy algorithm.
Article 1138
Title@2025-06-09 (1): Cartridges: Lightweight and general-purpose long context representations via self-study
Title: Cartridges: Lightweight and general-purpose long context representations via self-study | Patronen: Leichte und universelle lange Kontextdarstellungen durch Selbststudium | Cartridges:轻量和一般用途长背景介绍,通过自学 2506.06266v2 |
Authors (11): Sabri Eyuboglu, Ryan Ehrlich, Simran Arora, Neel Guha, Dylan Zinsley, Emily Liu, Will Tennien, Atri Rudra, James Zou, Azalia Mirhoseini, Christopher Re
Large language models are often used to answer queries grounded in large text corpora (e.g. codebases, legal documents, or chat histories) by placing the entire corpus in the context window and leveraging in-context learning (ICL). Although current models support contexts of 100K-1M tokens, this setup is costly to serve because the memory consumption of the KV cache scales with input length. We explore an alternative: training a smaller KV cache offline on each corpus. At inference time, we load this trained KV cache, which we call a Cartridge, and decode a response. Critically, the cost of training a Cartridge can be amortized across all the queries referencing the same corpus. However, we find that the naive approach of training the Cartridge with next-token prediction on the corpus is not competitive with ICL. Instead, we propose self-study, a training recipe in which we generate synthetic conversations about the corpus and train the Cartridge with a context-distillation objective. We find that Cartridges trained with self-study replicate the functionality of ICL, while being significantly cheaper to serve. On challenging long-context benchmarks, Cartridges trained with self-study match ICL performance while using 38.6x less memory and enabling 26.4x higher throughput. Self-study also extends the model’s effective context length (e.g. from 128k to 484k tokens on MTOB) and surprisingly, leads to Cartridges that can be composed at inference time without retraining.
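The claim that KV cache memory scales with input length can be made concrete with back-of-the-envelope arithmetic. The sketch below uses the usual accounting for a decoder-only transformer (one key and one value vector per layer, per KV head, per token); the model configuration is hypothetical and not taken from the paper.

```python
def kv_cache_bytes(seq_len, num_layers=32, num_kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Approximate KV cache size: keys and values for every layer and token.

    Defaults describe a hypothetical model stored in 16-bit precision.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len

GIB = 2 ** 30
for tokens in (8 * 1024, 128 * 1024, 1024 * 1024):
    size_gib = kv_cache_bytes(tokens) / GIB
    print(f"{tokens:>8} tokens -> {size_gib:.1f} GiB")
```

Under these assumed settings each token costs 128 KiB, so a 128K-token context needs about 16 GiB of cache per sequence. A cost of this size is the motivation for a smaller trained cache that is reused across queries.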
Article 1139
Title@2025-06-09 (1): Multi-agent Architecture Search via Agentic Supernet
Title: Multi-agent Architecture Search via Agentic Supernet | Multi-Agent Architektur Suche über Agentic Supernet | 通过 Agric Supernet 多剂机构建筑搜索 2502.04180v2 |
Authors (6): Guibin Zhang, Luyang Niu, Junfeng Fang, Kun Wang, Lei Bai, Xiang Wang
Large Language Model (LLM)-empowered multi-agent systems extend the cognitive boundaries of individual agents through disciplined collaboration and interaction, while constructing these systems often requires labor-intensive manual designs. Despite the availability of methods to automate the design of agentic workflows, they typically seek to identify a static, complex, one-size-fits-all system, which, however, fails to dynamically allocate inference resources based on the difficulty and domain of each query. To address this challenge, we shift away from the pursuit of a monolithic agentic system, instead optimizing the agentic supernet, a probabilistic and continuous distribution of agentic architectures. We introduce MaAS, an automated framework that samples query-dependent agentic systems from the supernet, delivering high-quality solutions and tailored resource allocation (e.g., LLM calls, tool calls, token cost). Comprehensive evaluation across six benchmarks demonstrates that MaAS (I) requires only 6-45% of the inference costs of existing handcrafted or automated multi-agent systems, (II) surpasses them by 0.54%-11.82%, and (III) enjoys superior cross-dataset and cross-LLM-backbone transferability.
Article 1140
Title@2025-06-09 (1): FlashDMoE: Fast Distributed MoE in a Single Kernel
Title: FlashDMoE: Fast Distributed MoE in a Single Kernel | FlashDMoE: Schnell verteiltes MoE in einem einzigen Kernel | FlashDMoE: 在一个单一核心中快速分布的教育部 2506.04667v2 |
Authors (3): Osayamen Jonathan Aimuyo, Byungsoo Oh, Rachee Singh
The computational sparsity of Mixture-of-Experts (MoE) models enables sub-linear growth in compute cost as model size increases, thus offering a scalable path to training massive neural networks. However, existing implementations suffer from low GPU utilization, significant latency overhead, and a fundamental inability to leverage task locality, primarily due to CPU-managed scheduling, host-initiated communication, and frequent kernel launches. To overcome these limitations, we develop FlashDMoE, a fully GPU-resident MoE operator that fuses expert computation and inter-GPU communication into a single persistent GPU kernel. FlashDMoE enables fine-grained pipelining of dispatch, compute, and combine phases, eliminating launch overheads and reducing idle gaps. Unlike existing work, FlashDMoE obviates bulk-synchronous collectives for one-sided, device-initiated, inter-GPU (R)DMA transfers, thus unlocking payload efficiency, where we eliminate bloated or redundant network payloads in sparsely activated layers. When evaluated on a single 8-H100 GPU node with MoE models having up to 128 experts and 16K token sequences, FlashDMoE achieves up to 9x higher GPU utilization, 6x lower latency, 5.7x higher throughput, and 4x better overlap efficiency compared to state-of-the-art baselines, despite using FP32 while baselines use FP16. FlashDMoE demonstrates that principled GPU kernel-hardware co-design is key to unlocking the performance ceiling of large-scale distributed ML workloads.
Article 1141
Title@2025-06-09 (1): State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models
Title: State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models | State-offset Tuning: State-based Parameter-Efficient Fine-Tuning für State Space Models | 以国家为基础的国家空间模型参数 – – 有效精美设计 2503.03499v2 |
Authors (6): Wonjun Kang, Kevin Galim, Yuchen Zeng, Minjae Lee, Hyung Il Koo, Nam Ik Cho
State Space Models (SSMs) have emerged as efficient alternatives to Transformers, mitigating their quadratic computational cost. However, the application of Parameter-Efficient Fine-Tuning (PEFT) methods to SSMs remains largely unexplored. In particular, prompt-based methods like Prompt Tuning and Prefix-Tuning, which are widely used in Transformers, do not perform well on SSMs. To address this, we propose state-based methods as a superior alternative to prompt-based methods. This new family of methods naturally stems from the architectural characteristics of SSMs. State-based methods adjust state-related features directly instead of depending on external prompts. Furthermore, we introduce a novel state-based PEFT method: State-offset Tuning. At every timestep, our method directly affects the state at the current step, leading to more effective adaptation. Through extensive experiments across diverse datasets, we demonstrate the effectiveness of our method. Code is available at https://github.com/furiosa-ai/ssm-state-tuning.
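One possible reading of "directly affects the state at the current step" is a trainable offset added to the hidden state when each output is read out, while the recurrence itself stays frozen. The scalar SSM below is a toy sketch under that assumption; the parameter names and the exact placement of the offset are illustrative and are not confirmed by the abstract.

```python
def ssm_scan(xs, a, b, c, offset=0.0):
    """Scalar linear SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*(h_t + offset).

    `offset` stands in for the trainable state offset (assumed form);
    a, b, c play the role of frozen pretrained parameters.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x              # frozen recurrence, offset not fed back
        ys.append(c * (h + offset))    # offset shifts the state seen at this step
    return ys

xs = [1.0, 0.0, 0.0]
base = ssm_scan(xs, a=0.5, b=1.0, c=2.0)
tuned = ssm_scan(xs, a=0.5, b=1.0, c=2.0, offset=0.1)
```

In this toy form the offset shifts every output by the same amount (`c * offset`) regardless of how far back the input was. A prompt-like prefix, by contrast, influences later outputs only through the decaying recurrence.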
Article 1142
Title@2025-06-09 (1): HeTa: Relation-wise Heterogeneous Graph Foundation Attack Model
Title: HeTa: Relation-wise Heterogeneous Graph Foundation Attack Model | HeTa: Verhältnisweises Heterogenes Graph Foundation Attack Model | HeTa: 关系性异异异图形基金会攻击模型 2506.07428v1 |
Authors (4): Yuling Wang, Zihui Chen, Pengfei Jiao, Xiao Wang
Heterogeneous Graph Neural Networks (HGNNs) are vulnerable, highlighting the need for tailored attacks to assess their robustness and ensure security. However, existing HGNN attacks often require complex retraining of parameters to generate specific perturbations for new scenarios. Recently, foundation models have opened new horizons for the generalization of graph neural networks by capturing shared semantics across various graph distributions. This leads us to ask: Can we design a foundation attack model for HGNNs that enables generalizable perturbations across different HGNNs, and quickly adapts to new heterogeneous graphs (HGs)? Empirical findings reveal that, despite significant differences in model design and parameter space, different HGNNs surprisingly share common vulnerability patterns from a relation-aware perspective. Therefore, we explore how to design foundation HGNN attack criteria by mining shared attack units. In this paper, we propose a novel relation-wise heterogeneous graph foundation attack model, HeTa. We introduce a foundation surrogate model to align heterogeneity and identify the importance of shared relation-aware attack units. Building on this, we implement a serialized relation-by-relation attack based on the identified relational weights. In this way, the perturbation can be transferred to various target HGNNs and easily fine-tuned for new HGs. Extensive experiments demonstrate the strong attack performance and generalizability of our method.
Article 1143
Title@2025-06-09 (1): Power-Law Decay Loss for Large Language Model Finetuning: A Theory Perspective
Title: Power-Law Decay Loss for Large Language Model Finetuning: A Theory Perspective | Macht-Rechts-Dekay Verlust für große Sprachmodell Finetuning: Eine Theorie-Perspektive | 大语言模型微调的功率法减缩损失:理论视角 2505.16900v5 |
Authors (1): Jintian Shao
During the finetuning stage of text generation tasks, standard cross-entropy loss treats all tokens equally. This can lead models to overemphasize high-frequency, low-information tokens, neglecting lower-frequency tokens crucial for specificity and informativeness in generated content. This paper introduces a novel loss function, Power-Law Decay Loss (PDL), specifically designed to optimize the finetuning process for text generation. The core motivation for PDL stems from observations in information theory and linguistics: the informativeness of a token is often inversely proportional to its frequency of occurrence. PDL re-weights the contribution of each token in the standard cross-entropy loss based on its frequency in the training corpus, following a power-law decay. Specifically, the weights for high-frequency tokens are reduced, while low-frequency, information-dense tokens are assigned higher weights. This mechanism guides the model during finetuning to focus more on learning and generating tokens that convey specific and unique information, thereby enhancing the quality, diversity, and informativeness of the generated text. We theoretically elaborate on the motivation and construction of PDL and discuss its potential applications and advantages across various text generation finetuning tasks, such as abstractive summarization, dialogue systems, and style transfer.
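The re-weighting idea can be sketched in a few lines. The snippet assumes a weight of the form 1 / (freq + eps)^alpha applied to each token's negative log-likelihood; this functional form, the values of `alpha` and `eps`, and the function names are assumptions for illustration, since the abstract only states that weights follow a power-law decay in token frequency.

```python
import math
from collections import Counter

def token_frequencies(corpus_tokens):
    """Relative frequency of each token in the training corpus."""
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens)
    return {tok: n / total for tok, n in counts.items()}

def pdl_weight(freq, alpha=1.0, eps=1e-7):
    """Assumed power-law decay: frequent tokens get smaller weights."""
    return 1.0 / (freq + eps) ** alpha

def weighted_nll(target_tokens, target_probs, freqs, alpha=1.0):
    """Frequency-weighted negative log-likelihood, averaged over tokens."""
    losses = [
        -pdl_weight(freqs.get(tok, 0.0), alpha) * math.log(p)
        for tok, p in zip(target_tokens, target_probs)
    ]
    return sum(losses) / len(losses)

corpus = ["the"] * 8 + ["plasma", "tokamak"]
freqs = token_frequencies(corpus)
w_common = pdl_weight(freqs["the"])
w_rare = pdl_weight(freqs["plasma"])
loss = weighted_nll(["the", "plasma"], [0.9, 0.2], freqs)
```

With `alpha=0` every weight equals 1 and the expression reduces to the standard cross-entropy; larger `alpha` shifts more of the loss onto rare tokens.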
Article 1144
Title@2025-06-09 (1): DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning
Title: DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning | DOMAIN: Mildly COnservative Model-BAsed Offline Verstärkungslernen | DOMAIN: 低度防腐性示范性软体外表强化学习 2309.08925v4 |
Authors (10): Xiao-Yin Liu, Xiao-Hu Zhou, Mei-Jiang Gui, Guo-Tao Li, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Qi-Chao Zhang, Biao Luo, Zeng-Guang Hou
Model-based reinforcement learning (RL), which learns an environment model from the offline dataset and generates more out-of-distribution model data, has become an effective approach to the problem of distribution shift in offline RL. Due to the gap between the learned and actual environment, conservatism should be incorporated into the algorithm to balance accurate offline data and imprecise model data. The conservatism of current algorithms mostly relies on model uncertainty estimation. However, uncertainty estimation is unreliable and leads to poor performance in certain scenarios, and previous methods ignore differences between the model data, which leads to excessive conservatism. To address these issues, this paper proposes a milDly cOnservative Model-bAsed offlINe RL algorithm (DOMAIN) that does not estimate model uncertainty, and designs an adaptive sampling distribution of model samples, which can adaptively adjust the model data penalty. We theoretically demonstrate that the Q value learned by DOMAIN outside the region is a lower bound of the true Q value, that DOMAIN is less conservative than previous model-based offline RL algorithms, and that it has a safety policy improvement guarantee. Extensive experiments show that DOMAIN outperforms prior RL algorithms and improves average performance by 1.8% on the D4RL benchmark.
Article 1145
Title@2025-06-09 (1): IF-GUIDE: Influence Function-Guided Detoxification of LLMs
Title: IF-GUIDE: Influence Function-Guided Detoxification of LLMs | IF-GUIDE: Einfluss funktionsgeführte Entgiftung von LLMs | IFGIIDE: LLMs的受控功能解毒作用影响 2506.01790v2 |
Authors (4): Zachary Coalson, Juhan Bae, Nicholas Carlini, Sanghyun Hong
We study how training data contributes to the emergence of toxic behaviors in large-language models. Most prior work on reducing model toxicity adopts reactive approaches, such as fine-tuning pre-trained (and potentially toxic) models to align them with human values. In contrast, we propose a proactive approach, IF-Guide, which leverages influence functions to identify harmful tokens within any training data and suppress their impact during training. To this end, we first show that standard influence functions are ineffective at discovering harmful training records. We then present a novel adaptation that measures token-level attributions from training data to model toxicity, along with techniques for selecting toxic training documents and a learning objective that can be integrated into both pre-training and fine-tuning. Moreover, IF-Guide does not rely on human-preference data, which is typically required by existing alignment methods. In evaluation, we demonstrate that IF-Guide substantially reduces both explicit and implicit toxicity (by up to 10x compared to uncensored models, and up to 3x compared to baseline alignment methods, e.g., DPO and RAD) across both pre-training and fine-tuning scenarios. IF-Guide is computationally efficient: a billion-parameter model is not necessary for computing influence scores; a million-parameter model (with 7.5x fewer parameters) can effectively serve as a proxy for identifying harmful data. Our code is publicly available at: https://github.com/ztcoalson/IF-Guide
Article 1146
Title@2025-06-09 (1): Evidential Spectrum-Aware Contrastive Learning for OOD Detection in Dynamic Graphs
Title: Evidential Spectrum-Aware Contrastive Learning for OOD Detection in Dynamic Graphs | Evidential Spectrum-Aware Kontrastives Lernen für OOD Detection in dynamischen Graphen | 动态图中 OOD 探测的 证据性光谱- 软件差异性学习 2506.07417v1 |
Authors (6): Nan Sun, Xixun Lin, Zhiheng Zhou, Yanmin Shang, Zhenlin Cheng, Yanan Cao
Recently, out-of-distribution (OOD) detection in dynamic graphs, which aims to identify whether incoming data deviates from the distribution of the in-distribution (ID) training set, has garnered considerable attention in security-sensitive fields. Current OOD detection paradigms primarily focus on static graphs and confront two critical challenges: i) high bias and high variance caused by single-point estimation, which makes the predictions sensitive to randomness in the data; ii) score homogenization resulting from the lack of OOD training data, where the model only learns ID-specific patterns, resulting in overall low OOD scores and a narrow score gap between ID and OOD data. To tackle these issues, we first investigate OOD detection in dynamic graphs through the lens of Evidential Deep Learning (EDL). Specifically, we propose EviSEC, an innovative and effective OOD detector via Evidential Spectrum-awarE Contrastive Learning. We design an evidential neural network to redefine the output as the posterior Dirichlet distribution, explaining the randomness of inputs through the uncertainty of distribution, which is overlooked by single-point estimation. Moreover, a spectrum-aware augmentation module generates OOD approximations to identify patterns with high OOD scores, thereby widening the score gap between ID and OOD data and mitigating score homogenization. Extensive experiments on real-world datasets demonstrate that EviSEC effectively detects OOD samples in dynamic graphs.
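The step from a point estimate to a Dirichlet output can be sketched with the common evidential deep learning recipe: map logits to non-negative evidence, add one to obtain Dirichlet concentrations, and read off an uncertainty mass. The Softplus evidence function and the function names are assumptions for illustration; the abstract does not specify EviSEC's network head.

```python
import math

def softplus(x):
    """Smooth non-negative transform used here to turn logits into evidence."""
    return math.log1p(math.exp(x))

def dirichlet_from_logits(logits):
    """Return Dirichlet concentrations, expected class probabilities, and uncertainty."""
    evidence = [softplus(z) for z in logits]
    alpha = [e + 1.0 for e in evidence]       # Dirichlet concentration parameters
    strength = sum(alpha)                     # total evidence plus number of classes
    probs = [a / strength for a in alpha]     # mean of the Dirichlet
    uncertainty = len(alpha) / strength       # close to 1 when evidence is near zero
    return alpha, probs, uncertainty

_, p_confident, u_confident = dirichlet_from_logits([10.0, -10.0, -10.0])
_, p_vague, u_vague = dirichlet_from_logits([-10.0, -10.0, -10.0])
```

An input that produces little evidence for any class yields uncertainty close to 1, which is the signal an evidential OOD detector can threshold; a softmax point estimate would instead return a uniform but seemingly valid probability vector.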
Article 1147
Title@2025-06-09 (1): LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments
Title: LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments | LiteVLM: Eine Bildverarbeitungs-Modell-Inferenz-Pipeline mit geringer Latenz für ressourcenbeschränkte Umgebungen | LiteVLM:资源受限环境的低寿命愿景-语言模型推导管道 2506.07416v1 |
Authors (4): Jin Huang, Yuchao Jin, Le An, Josh Park
This paper introduces an efficient Vision-Language Model (VLM) pipeline specifically optimized for deployment on embedded devices, such as those used in robotics and autonomous driving. The pipeline significantly reduces the computational overhead by jointly leveraging patch selection to filter irrelevant camera views, a token selection module to reduce input sequence length for the LLM, and speculative decoding to accelerate token generation. Evaluated on the NVIDIA DRIVE Thor platform for an autonomous driving application, our pipeline achieves a $2.5\times$ end-to-end latency reduction without compromising task accuracy. The speed-up further increases to $3.2\times$ when applying FP8 post-training quantization. These results demonstrate our pipeline as a viable solution for enabling real-time VLM deployment in resource-constrained environments.
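As an illustration of the speculative decoding step mentioned above, the sketch below shows the greedy variant of the general technique: a cheap draft model proposes a few tokens, and the target model keeps the longest prefix it agrees with. It is not the LiteVLM implementation; the toy `target_next` and `draft_next` functions are hypothetical stand-ins for real models, and the sequential verification loop stands in for what would be one batched target pass in practice.

```python
def greedy_decode(next_token, prompt, n_new):
    """Plain autoregressive greedy decoding with a single model."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(next_token(seq))
    return seq


def speculative_decode(target_next, draft_next, prompt, n_new, k=4):
    """Greedy speculative decoding: output matches target-only greedy decoding."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # Draft model proposes up to k tokens autoregressively.
        proposal = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft_next(seq + proposal))
        # Target model verifies the proposal token by token.
        for tok in proposal:
            expected = target_next(seq)
            seq.append(expected)          # always keep the target's token
            if expected != tok:           # first disagreement ends this round
                break
    return seq


# Hypothetical toy models over a 7-token vocabulary.
def target_next(seq):
    return (3 * seq[-1] + len(seq)) % 7


def draft_next(seq):
    tok = target_next(seq)
    return tok if len(seq) % 4 else (tok + 1) % 7   # wrong on every 4th position


baseline = greedy_decode(target_next, [1], 20)
speculative = speculative_decode(target_next, draft_next, [1], 20)
print(baseline == speculative)
```

The speed-up in a real system comes from the draft model being much cheaper than the target and from verifying all proposed tokens in parallel; the sketch only demonstrates that the accepted output is unchanged.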
Article 1148
Title@2025-06-09 (1): Variational Supervised Contrastive Learning
Title: Variational Supervised Contrastive Learning | Variationelles Überwachtes Kontrastuelles Lernen | 差异监督反舞弊学习 2506.07413v1 |
Authors (5): Ziwen Wang, Jiajun Fan, Thao Nguyen, Heng Ji, Ge Liu
Contrastive learning has proven to be highly efficient and adaptable in shaping representation spaces across diverse modalities by pulling similar samples together and pushing dissimilar ones apart. However, two key limitations persist: (1) without explicit regulation of the embedding distribution, semantically related instances can inadvertently be pushed apart unless complementary signals guide pair selection, and (2) excessive reliance on large in-batch negatives and tailored augmentations hinders generalization. To address these limitations, we propose Variational Supervised Contrastive Learning (VarCon), which reformulates supervised contrastive learning as variational inference over latent class variables and maximizes a posterior-weighted evidence lower bound (ELBO) that replaces exhaustive pair-wise comparisons for efficient class-aware matching and grants fine-grained control over intra-class dispersion in the embedding space. In experiments on CIFAR-10, CIFAR-100, ImageNet-100, and ImageNet-1K, with models trained exclusively on image data, VarCon (1) achieves state-of-the-art performance for contrastive learning frameworks, reaching 79.36% Top-1 accuracy on ImageNet-1K and 78.29% on CIFAR-100 with a ResNet-50 encoder while converging in just 200 epochs; (2) yields substantially clearer decision boundaries and semantic organization in the embedding space, as evidenced by KNN classification, hierarchical clustering results, and transfer-learning assessments; and (3) outperforms the supervised baseline in few-shot learning and shows superior robustness across various augmentation strategies.
Article 1149
Title@2025-06-09 (1): GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models
Title: GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models | GANQ: GPU-Adaptive nicht einheitliche Quantisierung für große Sprachmodelle | GANQ: 大语言模型的GPU自适应非均匀量化 2501.12956v3 |
Authors (2): Pengxiang Zhao, Xiaoming Yuan
Large Language Models (LLMs) face significant deployment challenges due to their substantial resource requirements. While low-bit quantized weights can reduce memory usage and improve inference efficiency, current hardware lacks native support for mixed-precision General Matrix Multiplication (mpGEMM), resulting in inefficient dequantization-based implementations. Moreover, uniform quantization methods often fail to capture weight distributions adequately, leading to performance degradation. We propose GANQ (GPU-Adaptive Non-Uniform Quantization), a layer-wise post-training non-uniform quantization framework optimized for hardware-efficient lookup table-based mpGEMM. GANQ achieves superior quantization performance by utilizing a training-free, GPU-adaptive optimization algorithm to efficiently reduce layer-wise quantization errors. Extensive experiments demonstrate GANQ’s ability to reduce the perplexity gap from the FP16 baseline compared to state-of-the-art methods for both 3-bit and 4-bit quantization. Furthermore, when deployed on a single NVIDIA RTX 4090 GPU, GANQ’s quantized models achieve up to 2.57$\times$ speedup over the baseline, advancing memory and inference efficiency in LLM deployment.
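To make the idea of lookup-table-based non-uniform quantization concrete, the sketch below fits a small per-row codebook with 1-D Lloyd (k-means) iterations and stores each weight as an index into it. This is a generic illustration, not GANQ's GPU-adaptive optimization algorithm; all function names and the Gaussian test weights are hypothetical.

```python
import random


def nearest_index(value, codebook):
    """Index of the codebook entry closest to value."""
    return min(range(len(codebook)), key=lambda j: abs(value - codebook[j]))


def uniform_codebook(values, n_levels):
    """Evenly spaced levels (bin midpoints) over the value range."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_levels
    return [lo + (i + 0.5) * step for i in range(n_levels)]


def lloyd_codebook(values, n_levels, iters=25):
    """Non-uniform levels via 1-D Lloyd iterations, started from the uniform grid."""
    codebook = uniform_codebook(values, n_levels)
    for _ in range(iters):
        buckets = [[] for _ in range(n_levels)]
        for v in values:
            buckets[nearest_index(v, codebook)].append(v)
        # Move each level to the mean of its bucket; keep empty levels in place.
        codebook = [sum(b) / len(b) if b else c for b, c in zip(buckets, codebook)]
    return codebook


def quantize(values, codebook):
    return [nearest_index(v, codebook) for v in values]


def dequantize(codes, codebook):
    return [codebook[c] for c in codes]   # table lookup only


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(256)]   # hypothetical weight row
uni = uniform_codebook(weights, 8)                    # 3-bit uniform grid
lut = lloyd_codebook(weights, 8)                      # 3-bit non-uniform table
codes = quantize(weights, lut)
mse_uniform = mse(weights, dequantize(quantize(weights, uni), uni))
mse_lut = mse(weights, dequantize(codes, lut))
print(mse_uniform, mse_lut)
```

Because the Lloyd iterations start from the uniform grid and never increase the reconstruction error, the non-uniform table is at least as accurate as the uniform one on the same row, while dequantization stays a pure table lookup.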
Article 1150
Title@2025-06-09 (1): Structured Pruning for Diverse Best-of-N Reasoning Optimization
Title: Structured Pruning for Diverse Best-of-N Reasoning Optimization | Strukturierte Prunings für die unterschiedliche Best-of-N-Reasoning-Optimierung | 结构审慎,实现最佳理由的多样化最佳最佳理由的优化 2506.03978v2 |
Authors (3): Hieu Trung Nguyen, Bao Nguyen, Viet Anh Nguyen
Model pruning in transformer-based language models, traditionally viewed as a means of achieving computational savings, can enhance the model’s reasoning capabilities. In this work, we uncover a surprising phenomenon: the selective pruning of certain attention heads leads to improvements in reasoning performance, particularly on challenging tasks. Motivated by this observation, we propose SPRINT, a novel contrastive learning framework that dynamically selects the optimal head and layer to prune during inference. By aligning question embeddings with head embeddings, SPRINT identifies those pruned-head configurations that result in more accurate reasoning. Extensive experiments demonstrate that our method significantly outperforms traditional best-of-$N$ and random head selection strategies on the MATH500 and GSM8K datasets.
Article 1151
Title@2025-06-09 (1): Fractional-order Jacobian Matrix Differentiation and Its Application in Artificial Neural Networks
Title: Fractional-order Jacobian Matrix Differentiation and Its Application in Artificial Neural Networks | Fraktions-Order Jacobian Matrix Differenzierung und seine Anwendung in künstlichen neuralen Netzwerken | Jacobian 矩阵差异及其在人造神经网络中的应用 2506.07408v1 |
Authors (6): Xiaojun Zhou, Chunna Zhao, Yaqun Huang, Chengli Zhou, Junjie Ye, Kemeng Xiang
Fractional-order differentiation has many characteristics different from integer-order differentiation. These characteristics can be applied to the optimization algorithms of artificial neural networks to obtain better results. However, due to insufficient theoretical research, there is at present no fractional-order matrix differentiation method that is fully compatible with automatic differentiation (Autograd) technology. Therefore, we propose a fractional-order matrix differentiation calculation method. The method builds on the definition of the integer-order Jacobian matrix. We denote it as fractional-order Jacobian matrix differentiation ($\mathbf{J}^\alpha$). Through $\mathbf{J}^\alpha$, we can carry out the matrix-based fractional-order chain rule. Based on the Linear module and the fractional-order differentiation, we design the fractional-order Autograd technology to enable the use of fractional-order differentiation in hidden layers, thereby enhancing the practicality of fractional-order differentiation in deep learning. In the experiments, using the PyTorch framework, we design a fractional-order Linear layer (FLinear) and replace nn.Linear in the multilayer perceptron with FLinear. Through a qualitative analysis of the training and validation loss, a quantitative analysis of the test set indicators, and an analysis of time consumption and GPU memory usage during model training, we verify the superior performance of $\mathbf{J}^\alpha$ and show that it is an excellent fractional-order gradient descent method in the field of deep learning.
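For readers unfamiliar with fractional-order derivatives, the sketch below evaluates the classical Riemann-Liouville power rule, $D^\alpha x^k = \frac{\Gamma(k+1)}{\Gamma(k+1-\alpha)} x^{k-\alpha}$ for $x > 0$. Which definition the paper actually adopts is not stated in the abstract, so this choice is an assumption, and the function name is hypothetical.

```python
import math


def frac_power_derivative(k, alpha, x):
    """Order-alpha derivative of x**k at x > 0 (Riemann-Liouville power rule).

    Assumes k > -1 and that k + 1 - alpha is not a non-positive integer,
    where the gamma function has poles.
    """
    coeff = math.gamma(k + 1) / math.gamma(k + 1 - alpha)
    return coeff * x ** (k - alpha)


# alpha = 1 recovers the ordinary derivative: d/dx x^2 = 2x.
first_order = frac_power_derivative(2, 1.0, 3.0)

# Half-derivative of x at x = 1 equals 2 / sqrt(pi).
half_order = frac_power_derivative(1, 0.5, 1.0)

print(first_order, half_order)
```

Applying the half-derivative twice to $x$ gives back the ordinary derivative 1, which is a quick sanity check on the rule: $\frac{\Gamma(2)}{\Gamma(1.5)} \cdot \frac{\Gamma(1.5)}{\Gamma(1)} = 1$.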
Article 1152
Title@2025-06-09 (1): Anomaly Detection and Early Warning Mechanism for Intelligent Monitoring Systems in Multi-Cloud Environments Based on LLM
Title: Anomaly Detection and Early Warning Mechanism for Intelligent Monitoring Systems in Multi-Cloud Environments Based on LLM | Anomalieerkennungs- und Frühwarnmechanismus für intelligente Überwachungssysteme in Multi-Cloud-Umgebungen auf Basis von LLM | 以LLM为基础在多城市环境中建立智能监测系统的异常探测和预警机制 2506.07407v1 |
Authors (4): Yihong Jin, Ze Yang, Juntian Liu, Xinhe Xu
With the rapid development of multi-cloud environments, it is increasingly important to ensure the security and reliability of intelligent monitoring systems. In this paper, we propose an anomaly detection and early warning mechanism for intelligent monitoring systems in multi-cloud environments based on a Large-Scale Language Model (LLM). Building on the existing monitoring framework, the proposed model introduces a multi-level feature extraction method that combines the natural language processing ability of LLMs with traditional machine learning methods to enhance the accuracy of anomaly detection and improve real-time response efficiency. By drawing on the contextual understanding capabilities of LLMs, the model dynamically adapts to different cloud service providers and environments, so as to detect abnormal patterns and predict potential failures more effectively. Experimental results show that the proposed model is significantly better than the traditional anomaly detection system in terms of detection accuracy and latency, and significantly improves the resilience and active management capability of cloud infrastructure.
Article 1153
Title@2025-06-09 (1): InverseScope: Scalable Activation Inversion for Interpreting Large Language Models
Title: InverseScope: Scalable Activation Inversion for Interpreting Large Language Models | InverseScope: Skalierbare Aktivierungs-Inversion zum Dolmetschen großer Sprachmodelle | 反向范围: 用于解释大语言模型的可缩放活动转换 2506.07406v1 |
Authors (3): Yifan Luo, Zhennan Zhou, Bin Dong
Understanding the internal representations of large language models (LLMs) is a central challenge in interpretability research. Existing feature interpretability methods often rely on strong assumptions about the structure of representations that may not hold in practice. In this work, we introduce InverseScope, an assumption-light and scalable framework for interpreting neural activations via input inversion. Given a target activation, we define a distribution over inputs that generate similar activations and analyze this distribution to infer the encoded features. To address the inefficiency of sampling in high-dimensional spaces, we propose a novel conditional generation architecture that significantly improves sample efficiency compared to previous methods. We further introduce a quantitative evaluation protocol that tests interpretability hypotheses using a feature consistency rate computed over the sampled inputs. InverseScope scales inversion-based interpretability methods to larger models and practical tasks, enabling systematic and quantitative analysis of internal representations in real-world LLMs.
Article 1154
Title@2025-06-09 (1): RiemannFormer: A Framework for Attention in Curved Spaces
Title: RiemannFormer: A Framework for Attention in Curved Spaces | RiemannFormer: Ein Rahmenwerk für Aufmerksamkeit in geschwungenen Räumen | Riemann Fremer: 曲线空间的注意框架 2506.07405v1 |
Authors (1): Zhongping Ji
This research endeavors to offer insights into unlocking the further potential of transformer-based architectures. One of the primary motivations is to offer a geometric interpretation for the attention mechanism in transformers. In our framework, attention mainly involves metric tensors, tangent spaces, inner products, and how they relate to each other. These quantities and structures at discrete positions are interconnected via the parallel transport of tangent vectors. To make the learning process more efficient, we reduce the number of parameters through predefined configurations. Moreover, we introduce an explicit mechanism to highlight a neighborhood by attenuating the remote values, given that transformers inherently neglect local inductive bias. Experimental results demonstrate that our modules deliver significant performance improvements relative to the baseline. Further evaluation experiments on visual and large language models will follow.
Article 1155
Title@2025-06-09 (1): Outlier-weighed Layerwise Sampling for LLM Fine-tuning
Title: Outlier-weighed Layerwise Sampling for LLM Fine-tuning | Ausgewägte schichtweise Probenahme für LLM-Feinabstimmung | LLM 精密调试 2405.18380v3 |
Authors (4): Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei Liu
The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks. However, the substantial size of LLMs presents significant challenges in training or fine-tuning. While parameter-efficient approaches such as low-rank adaptation (LoRA) have gained popularity, they often compromise performance compared to full-rank fine-tuning. In this paper, we propose Outlier-weighed Layerwise Sampling (OWS), a new memory-efficient fine-tuning approach, inspired by the layerwise outlier distribution of LLMs. Unlike LoRA, which adds extra adapters to all layers, OWS strategically assigns higher sampling probabilities to layers with more outliers, selectively sampling only a few layers and fine-tuning their pre-trained weights. To further increase the number of fine-tuned layers without a proportional rise in memory costs, we incorporate gradient low-rank projection, further boosting the approach’s performance. Our extensive experiments across various architectures, including LLaMa2 and Mistral, demonstrate that OWS consistently outperforms baseline approaches, including full fine-tuning. Specifically, it achieves up to a 1.1% average accuracy gain on the Commonsense Reasoning benchmark, a 3.0% improvement on MMLU, and a notable 10% boost on MT-Bench, while being more memory efficient. OWS allows us to fine-tune 7B LLMs with only 21GB of memory. Our code is available at https://github.com/pixeli99/OWS.
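The layer-selection idea can be sketched as follows: score each layer by its share of outlier weights, turn the scores into sampling probabilities, and draw the few layers to fine-tune. The outlier criterion used here (magnitude above `tau` times the layer's mean magnitude), the smoothing constant, and all names are assumptions for illustration; the paper's exact scoring may differ.

```python
import random


def outlier_ratio(weights, tau=3.0):
    """Fraction of weights whose magnitude exceeds tau times the mean magnitude."""
    mags = [abs(w) for w in weights]
    mean_mag = sum(mags) / len(mags)
    return sum(m > tau * mean_mag for m in mags) / len(mags)


def sampling_probs(layers, tau=3.0, eps=1e-6):
    """Per-layer sampling probabilities proportional to outlier ratio."""
    scores = [outlier_ratio(w, tau) + eps for w in layers]   # eps avoids zeros
    total = sum(scores)
    return [s / total for s in scores]


def sample_layers(probs, n_active, rng):
    """Draw n_active distinct layer indices, weighted by probs."""
    remaining = list(range(len(probs)))
    chosen = []
    for _ in range(min(n_active, len(remaining))):
        pick = rng.choices(remaining, weights=[probs[i] for i in remaining], k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return sorted(chosen)


# Hypothetical two-layer model: layer 1 has every tenth weight scaled up 10x.
rng = random.Random(0)
plain = [rng.gauss(0.0, 1.0) for _ in range(1000)]
heavy = [w * 10 if i % 10 == 0 else w
         for i, w in enumerate(rng.gauss(0.0, 1.0) for _ in range(1000))]
probs = sampling_probs([plain, heavy])
active = sample_layers(probs, n_active=1, rng=rng)
print(probs, active)
```

Only the sampled layers would have their pre-trained weights updated in a given step, which is where the memory saving comes from.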
Article 1156
Title@2025-06-09 (1): Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions
Title: Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions | Numerische Untersuchung der Sequenzmodellierungstheorie mit steuerbaren Speicherfunktionen | 使用可控内存函数对序列模拟理论进行数字调查 2506.05678v2 |
Authors (4): Haotian Jiang, Zeyu Bao, Shida Wang, Qianxiao Li
The evolution of sequence modeling architectures, from recurrent neural networks and convolutional models to Transformers and structured state-space models, reflects ongoing efforts to address the diverse temporal dependencies inherent in sequential data. Despite this progress, systematically characterizing the strengths and limitations of these architectures remains a fundamental challenge. In this work, we propose a synthetic benchmarking framework to evaluate how effectively different sequence models capture distinct temporal structures. The core of this approach is to generate synthetic targets, each characterized by a memory function and a parameter that determines the strength of temporal dependence. This setup allows us to produce a continuum of tasks that vary in temporal complexity, enabling fine-grained analysis of model behavior concerning specific memory properties. We focus on four representative memory functions, each corresponding to a distinct class of temporal structures. Experiments on several sequence modeling architectures confirm existing theoretical insights and reveal new findings. These results demonstrate the effectiveness of the proposed method in advancing theoretical understanding and highlight the importance of using controllable targets with clearly defined structures for evaluating sequence modeling architectures.
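One plausible reading of the synthetic targets described above is a linear functional of the input history, $y_t = \sum_{s=0}^{t} \rho(s)\, x_{t-s}$, where the memory function $\rho$ has a single parameter controlling how quickly dependence on the past decays. The sketch below follows that assumption; the two memory functions shown are illustrative choices, not necessarily the four used in the paper.

```python
import math


def exp_memory(rate):
    """Exponentially decaying memory: rho(s) = exp(-rate * s)."""
    return lambda s: math.exp(-rate * s)


def poly_memory(power):
    """Polynomially decaying memory: rho(s) = (1 + s) ** (-power)."""
    return lambda s: (1.0 + s) ** (-power)


def make_targets(inputs, rho):
    """Targets y_t = sum_{s=0..t} rho(s) * x_{t-s} for a scalar input sequence."""
    return [
        sum(rho(s) * inputs[t - s] for s in range(t + 1))
        for t in range(len(inputs))
    ]


# A unit impulse at t = 0 makes the target trace out the memory function itself.
impulse = [1.0, 0.0, 0.0, 0.0]
slow = make_targets(impulse, poly_memory(1.0))    # long-range dependence
fast = make_targets(impulse, exp_memory(2.0))     # short-range dependence
print(slow, fast)
```

Sweeping the decay parameter then yields a continuum of tasks from short-memory to long-memory, which is the kind of controlled variation the benchmark relies on.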
Article 1157
Title@2025-06-09 (1): G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems
Title: G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems | G-Memory: Hierarchischer Speicher für Multi-Agent-Systeme | G-记忆:为多机构系统追踪等级记忆 2506.07398v1 |
Authors (6): Guibin Zhang, Muxin Fu, Guancheng Wan, Miao Yu, Kun Wang, Shuicheng Yan
Large language model (LLM)-powered multi-agent systems (MAS) have demonstrated cognitive and execution capabilities that far exceed those of single LLM agents, yet their capacity for self-evolution remains hampered by underdeveloped memory architectures. Upon close inspection, we are alarmed to discover that prevailing MAS memory mechanisms (1) are overly simplistic, completely disregarding the nuanced inter-agent collaboration trajectories, and (2) lack cross-trial and agent-specific customization, in stark contrast to the expressive memory developed for single agents. To bridge this gap, we introduce G-Memory, a hierarchical, agentic memory system for MAS inspired by organizational memory theory, which manages the lengthy MAS interaction via a three-tier graph hierarchy: insight, query, and interaction graphs. Upon receiving a new user query, G-Memory performs bi-directional memory traversal to retrieve both high-level, generalizable insights that enable the system to leverage cross-trial knowledge, and fine-grained, condensed interaction trajectories that compactly encode prior collaboration experiences. Upon task execution, the entire hierarchy evolves by assimilating new collaborative trajectories, nurturing the progressive evolution of agent teams. Extensive experiments across five benchmarks, three LLM backbones, and three popular MAS frameworks demonstrate that G-Memory improves success rates in embodied action and accuracy in knowledge QA by up to 20.89% and 10.12%, respectively, without any modifications to the original frameworks. Our code is available at https://github.com/bingreeky/GMemory.
Article 1158
Title@2025-06-09 (1): Nonparametric Modern Hopfield Models
Title: Nonparametric Modern Hopfield Models | Nichtparametrische moderne Hopfield-Modelle | 非参数现代跳地模型 2404.03900v2 |
Authors (5): Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu, Feng Ruan, Han Liu
We present a nonparametric interpretation for deep-learning-compatible modern Hopfield models and utilize this new perspective to debut efficient variants. Our key contribution stems from interpreting the memory storage and retrieval processes in modern Hopfield models as a nonparametric regression problem subject to a set of query-memory pairs. Interestingly, our framework not only recovers the known results from the original dense modern Hopfield model but also fills the void in the literature regarding efficient modern Hopfield models, by introducing sparse-structured modern Hopfield models with sub-quadratic complexity. We establish that this sparse model inherits the appealing theoretical properties of its dense analogue: connection with transformer attention, fixed point convergence and exponential memory capacity. Additionally, we showcase the versatility of our framework by constructing a family of modern Hopfield models as extensions, including linear, random masked, top-$K$ and positive random feature modern Hopfield models. Empirically, we validate our framework in both synthetic and realistic settings for memory retrieval and learning tasks.
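For context, the dense modern Hopfield retrieval that the abstract takes as its starting point can be written as one softmax-weighted update, $\xi^{\text{new}} = X\,\mathrm{softmax}(\beta X^\top \xi)$, which is also the link to transformer attention. The sketch below implements only this dense baseline in plain Python, not the sparse-structured variants the paper introduces; the stored patterns and the value of `beta` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_retrieve(memories, query, beta=8.0, steps=1):
    """Dense modern Hopfield update: new query = sum_i softmax_i(beta * <m_i, q>) * m_i."""
    for _ in range(steps):
        scores = [beta * sum(m * q for m, q in zip(mem, query)) for mem in memories]
        weights = softmax(scores)
        query = [
            sum(w * mem[d] for w, mem in zip(weights, memories))
            for d in range(len(query))
        ]
    return query


# Three hypothetical stored patterns and a noisy version of the first one.
memories = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noisy = [0.9, 0.2, 0.1]
retrieved = hopfield_retrieve(memories, noisy)
print(retrieved)
```

A single update already pulls the noisy query close to the stored pattern it most resembles; the cost of scoring against every memory is what motivates sub-quadratic variants.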
Article 1159
Title@2025-06-09 (1): From Static to Adaptive Defense: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks
Title: From Static to Adaptive Defense: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks | Von der statischen zur adaptiven Verteidigung: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks | 从静态到适应性防卫:联邦多要求深入强化学习-驱动移动目标防御,防止UAV Swarm网络的DoS攻击 2506.07392v1 |
Authors (6): Yuyang Zhou, Guang Cheng, Kang Du, Zihan Chen, Tian Qin, Yuyu Zhao
The proliferation of unmanned aerial vehicle (UAV) swarms has enabled a wide range of mission-critical applications, but also exposes UAV networks to severe Denial-of-Service (DoS) threats due to their open wireless environment, dynamic topology, and resource constraints. Traditional static or centralized defense mechanisms are often inadequate for such dynamic and distributed scenarios. To address these challenges, we propose a novel federated multi-agent deep reinforcement learning (FMADRL)-driven moving target defense (MTD) framework for proactive and adaptive DoS mitigation in UAV swarm networks. Specifically, we design three lightweight and coordinated MTD mechanisms, including leader switching, route mutation, and frequency hopping, that leverage the inherent flexibility of UAV swarms to disrupt attacker efforts and enhance network resilience. The defense problem is formulated as a multi-agent partially observable Markov decision process (POMDP), capturing the distributed, resource-constrained, and uncertain nature of UAV swarms under attack. Each UAV is equipped with a local policy agent that autonomously selects MTD actions based on partial observations and local experiences. By employing a policy gradient-based FMADRL algorithm, UAVs collaboratively optimize their defense policies via reward-weighted aggregation, enabling distributed learning without sharing raw data and thus reducing communication overhead. Extensive simulations demonstrate that our approach significantly outperforms state-of-the-art baselines, achieving up to a 34.6% improvement in attack mitigation rate, a reduction in average recovery time of up to 94.6%, and decreases in energy consumption and defense cost by as much as 29.3% and 98.3%, respectively, while maintaining robust mission continuity under various DoS attack strategies.
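The reward-weighted aggregation step can be pictured as a federated average in which each UAV's policy parameters are weighted by how well its local policy has performed. The abstract does not give the weighting formula, so the softmax-over-rewards rule below, the `temperature` parameter, and the function names are all assumptions made for illustration.

```python
import math


def aggregation_weights(rewards, temperature=1.0):
    """Softmax over local rewards (assumed weighting rule); handles negative rewards."""
    peak = max(rewards)
    exps = [math.exp((r - peak) / temperature) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def reward_weighted_aggregate(local_params, rewards, temperature=1.0):
    """Combine per-agent parameter vectors into one global vector."""
    weights = aggregation_weights(rewards, temperature)
    dim = len(local_params[0])
    return [
        sum(w * params[d] for w, params in zip(weights, local_params))
        for d in range(dim)
    ]


# Two hypothetical agents with two-parameter policies.
params = [[0.0, 0.0], [2.0, 4.0]]
equal = reward_weighted_aggregate(params, rewards=[1.0, 1.0])    # plain average
skewed = reward_weighted_aggregate(params, rewards=[0.0, 5.0])   # favors agent 1
print(equal, skewed)
```

Only parameter vectors and scalar rewards are exchanged in this scheme, never raw observations, which matches the stated goal of distributed learning without sharing raw data.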
Article 1160
Title@2025-06-09 (1): Protecting Deep Learning Model Copyrights with Adversarial Example-Free Reuse Detection
Title: Protecting Deep Learning Model Copyrights with Adversarial Example-Free Reuse Detection | Schutz von Deep-Learning-Modell-Urheberrechten mit zweifelhafter Beispiel-freier Wiederverwertungserkennung | 保护深学习模式版权,进行反反对学性实例自由再利用探测 2407.03883v2 |
Authors (4): Xiaokun Luan, Xiyue Zhang, Jingyi Wang, Meng Sun
Model reuse techniques can reduce the resource requirements for training high-performance deep neural networks (DNNs) by leveraging existing models. However, unauthorized reuse and replication of DNNs can lead to copyright infringement and economic loss to the model owner. This underscores the need to analyze the reuse relation between DNNs and develop copyright protection techniques to safeguard intellectual property rights. Existing white-box testing-based approaches cannot address the common heterogeneous reuse case where the model architecture is changed, and DNN fingerprinting approaches heavily rely on generating adversarial examples with good transferability, which is known to be challenging in the black-box setting. To bridge the gap, we propose NFARD, a Neuron Functionality Analysis-based Reuse Detector, which only requires normal test samples to detect reuse relations by measuring the models’ differences on a newly proposed model characterization, i.e., neuron functionality (NF). A set of NF-based distance metrics is designed to make NFARD applicable to both white-box and black-box settings. Moreover, we devise a linear transformation method to handle heterogeneous reuse cases by constructing the optimal projection matrix for dimension consistency, significantly extending the application scope of NFARD. To the best of our knowledge, this is the first adversarial example-free method that exploits neuron functionality for DNN copyright protection. As a side contribution, we constructed a reuse detection benchmark named Reuse Zoo that covers various practical reuse techniques and popular datasets. Extensive evaluations on this comprehensive benchmark show that NFARD achieves F1 scores of 0.984 and 1.0 for detecting reuse relationships in black-box and white-box settings, respectively, while generating test suites 2 to 99 times faster than previous methods.
Article 1161
Title@2025-06-09 (1): Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert
Title: Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert | Lärm-Robustheit durch Lärm: Asymmetrische LoRA-Anpassung mit Giftungsexperte | 噪音造成的噪音破坏:与中毒专家一起进行非对称 LoRA适应 2505.23868v3 |
Authors (8): Zhaokun Wang, Jinyu Guo, Jingwen Pu, Lingfeng Chen, Hongli Pu, Jie Ou, Libo Qin, Wenhong Tian
Current parameter-efficient fine-tuning methods for adapting pre-trained language models to downstream tasks are susceptible to interference from noisy data. Conventional noise-handling approaches either rely on laborious data pre-processing or employ model architecture modifications prone to error accumulation. In contrast to existing noise-processing paradigms, we propose a noise-robust adaptation method via asymmetric LoRA poisoning experts (LoPE), a novel framework that enhances model robustness to noise using only generated noisy data. Drawing inspiration from the mixture-of-experts architecture, LoPE strategically integrates a dedicated poisoning expert in an asymmetric LoRA configuration. Through a two-stage paradigm, LoPE performs noise injection on the poisoning expert during fine-tuning to enhance its noise discrimination and processing ability. During inference, we selectively mask the dedicated poisoning expert to leverage purified knowledge acquired by normal experts for noise-robust output. Extensive experiments demonstrate that LoPE achieves strong performance and robustness purely through low-cost noise injection, which completely eliminates the requirement of data cleaning.
Article 1162
Title@2025-06-09 (1): Moment Alignment: Unifying Gradient and Hessian Matching for Domain Generalization
Title: Moment Alignment: Unifying Gradient and Hessian Matching for Domain Generalization | Moment Alignment: Vereinheitlichung von Gradienten und Hessischen Matching für Domain Generalization | 时速对齐: 统一渐变和赫西相匹配域域通用化 2506.07378v1 |
Authors (4): Yuen Chen, Haozhe Si, Guojun Zhang, Han Zhao
Domain generalization (DG) seeks to develop models that generalize well to unseen target domains, addressing the prevalent issue of distribution shifts in real-world applications. One line of research in DG focuses on aligning domain-level gradients and Hessians to enhance generalization. However, existing methods are computationally inefficient and the underlying principles of these approaches are not well understood. In this paper, we develop the theory of moment alignment for DG. Grounded in transfer measure, a principled framework for quantifying generalizability between two domains, we first extend the definition of transfer measure to domain generalization that includes multiple source domains and establish a target error bound. Then, we prove that aligning derivatives across domains improves transfer measure both when the feature extractor induces an invariant optimal predictor across domains and when it does not. Notably, moment alignment provides a unifying understanding of Invariant Risk Minimization, gradient matching, and Hessian matching, three previously disconnected approaches to DG. We further connect feature moments and derivatives of the classifier head, and establish the duality between feature learning and classifier fitting. Building upon our theory, we introduce Closed-Form Moment Alignment (CMA), a novel DG algorithm that aligns domain-level gradients and Hessians in closed form. Our method overcomes the computational inefficiencies of existing gradient and Hessian-based techniques by eliminating the need for repeated backpropagation or sampling-based Hessian estimation. We validate the efficacy of our approach through two sets of experiments: linear probing and full fine-tuning. CMA demonstrates superior performance in both settings compared to Empirical Risk Minimization and state-of-the-art algorithms.
Article 1163
Title@2025-06-09 (1): Adapter Naturally Serves as Decoupler for Cross-Domain Few-Shot Semantic Segmentation
Title: Adapter Naturally Serves as Decoupler for Cross-Domain Few-Shot Semantic Segmentation | Adapter dient natürlich als Entkoppler für die Cross-Domain-Few-Shot Semantic Segmentierung | 自然适应器作为跨域鲜热语义分割的解码器 2506.07376v1 |
Authors (6): Jintao Tong, Ran Ma, Yixiong Zou, Guangyao Chen, Yuhua Li, Ruixuan Li
Cross-domain few-shot segmentation (CD-FSS) pre-trains a model on a source-domain dataset with sufficient samples and then transfers it to target-domain datasets where only a few samples are available for efficient fine-tuning. This task poses two major challenges: (1) the domain gap and (2) fine-tuning with scarce data. To address them, we revisit adapter-based methods and uncover an insight not explored in previous work: the adapter not only helps the fine-tuning of downstream tasks but also naturally serves as a domain information decoupler. We then investigate this finding and show that the model’s inherent structure could lead to a natural decoupling of domain information. Building upon this insight, we propose the Domain Feature Navigator (DFN), a structure-based decoupler (in contrast to the loss-based decouplers of current work) that captures domain-specific information, thereby directing the model’s attention towards domain-agnostic knowledge. Moreover, to prevent the DFN from overfitting during source-domain training, we further design the SAM-SVN method to constrain the DFN from learning sample-specific knowledge. On target domains, we freeze the model and fine-tune the DFN to learn target-specific knowledge. Extensive experiments demonstrate that our method significantly surpasses the state-of-the-art method in CD-FSS, by 2.69% and 4.68% MIoU in the 1-shot and 5-shot scenarios, respectively.
Article 1164
Title@2025-06-09 (1): Numerical Analysis of HiPPO-LegS ODE for Deep State Space Models
Title: Numerical Analysis of HiPPO-LegS ODE for Deep State Space Models | Numerische Analyse von HiPPO-LegS ODE für Deep State Space Modelle | HIPPPO-LegS深海国家空间模型ODE的数值分析 2412.08595v2 |
Authors (4): Jaesung R. Park, Jaewook J. Suh, Youngjoon Hong, Ernest K. Ryu
In deep learning, the recently introduced state space models utilize HiPPO (High-order Polynomial Projection Operators) memory units to approximate continuous-time trajectories of input functions using ordinary differential equations (ODEs), and these techniques have shown empirical success in capturing long-range dependencies in long input sequences. However, the mathematical foundations of these ODEs, particularly the singular HiPPO-LegS (Legendre Scaled) ODE, and their corresponding numerical discretizations remain unsettled. In this work, we fill this gap by establishing that HiPPO-LegS ODE is well-posed despite its singularity, albeit without the freedom of arbitrary initial conditions. Further, we establish convergence of the associated numerical discretization schemes for Riemann integrable input functions.
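To make the singular ODE concrete: in the standard HiPPO formulation, the LegS coefficients obey $c'(t) = -\frac{1}{t} A\,c(t) + \frac{1}{t} B\,f(t)$ with a lower-triangular $A$, so the $1/t$ factor blows up at $t = 0$. The sketch below applies a backward Euler step, $(I + A/k)\,c_k = c_{k-1} + B f_k / k$, in which the step size cancels. Backward Euler is chosen here for illustration and is not necessarily one of the schemes analyzed in the paper; the function names are hypothetical.

```python
import math


def legs_matrices(n):
    """HiPPO-LegS matrices: A is lower triangular, B_i = sqrt(2i + 1)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            a[i][j] = math.sqrt((2 * i + 1) * (2 * j + 1))
        a[i][i] = i + 1.0
    b = [math.sqrt(2 * i + 1) for i in range(n)]
    return a, b


def legs_backward_euler(signal, n):
    """Run (I + A/k) c_k = c_{k-1} + B f_k / k over the samples f_1, f_2, ..."""
    a, b = legs_matrices(n)
    c = [0.0] * n
    for k, f in enumerate(signal, start=1):
        rhs = [c[i] + b[i] * f / k for i in range(n)]
        new = [0.0] * n
        # Forward substitution, valid because A is lower triangular.
        for i in range(n):
            acc = rhs[i] - sum(a[i][j] / k * new[j] for j in range(i))
            new[i] = acc / (1.0 + a[i][i] / k)
        c = new
    return c


# For a constant input, only the zeroth (constant) coefficient should survive.
coeffs = legs_backward_euler([1.0] * 2000, n=4)
print(coeffs)
```

For a constant input the zeroth coefficient tends to the input value and the higher ones decay toward zero, matching the intuition that a constant function needs only the constant basis element.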
Article 1165
Title@2025-06-09 (1): A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
Title: A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment | Eine umfassende Umfrage in LLM(-Agent) Full Stack Sicherheit: Daten, Schulung und Bereitstellung | 用LLLM(-代理)全堆安全:数据、培训和部署进行的全面调查 2504.15585v4 |
Authors (103): Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Shicheng Xu, Junyuan Mao, Yu Wang, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Wenjie Qu, Yue Liu, Chengwei Liu, Yifan Zhang, Qiankun Li, Chongye Guo, Yalan Qin, Zhaoxin Fan, Kai Wang, Yi Ding, Donghai Hong, Jiaming Ji, Yingxin Lai, Zitong Yu, Xinfeng Li, Yifan Jiang, Yanhui Li, Xinyu Deng, Junlin Wu, Dongxia Wang, Yihao Huang, Yufei Guo, Jen-tse Huang, Qiufeng Wang, Xiaolong Jin, Wenxuan Wang, Dongrui Liu, Yanwei Yue, Wenke Huang, Guancheng Wan, Heng Chang, Tianlin Li, Yi Yu, Chenghao Li, Jiawei Li, Lei Bai, Jie Zhang, Qing Guo, Jingyi Wang, Tianlong Chen, Joey Tianyi Zhou, Xiaojun Jia, Weisong Sun, Cong Wu, Jing Chen, Xuming Hu, Yiming Li, Xiao Wang, Ningyu Zhang, Luu Anh Tuan, Guowen Xu, Jiaheng Zhang, Tianwei Zhang, Xingjun Ma, Jindong Gu, Liang Pang, Xiang Wang, Bo An, Jun Sun, Mohit Bansal, Shirui Pan, Lingjuan Lyu, Yuval Elovici, Bhavya Kailkhura, Yaodong Yang, Hongwei Li, Wenyuan Xu, Yizhou Sun, Wei Wang, Qing Li, Ke Tang, Yu-Gang Jiang, Felix Juefei-Xu, Hui Xiong, Xiaofeng Wang, Dacheng Tao, Philip S. Yu, Qingsong Wen, Yang Liu
The remarkable success of Large Language Models (LLMs) has illuminated a promising pathway toward achieving Artificial General Intelligence for both academic and industrial communities, owing to their unprecedented performance across various applications. As LLMs continue to gain prominence in both research and commercial domains, their security and safety implications have become a growing concern, not only for researchers and corporations but also for every nation. Existing surveys on LLM safety primarily focus on specific stages of the LLM lifecycle, e.g., the deployment phase or the fine-tuning phase, and lack a comprehensive understanding of the entire “lifechain” of LLMs. To address this gap, this paper introduces, for the first time, the concept of “full-stack” safety to systematically consider safety issues throughout the entire process of LLM training, deployment, and eventual commercialization. Compared to the off-the-shelf LLM safety surveys, our work demonstrates several distinctive advantages: (I) Comprehensive Perspective. We define the complete LLM lifecycle as encompassing data preparation, pre-training, post-training, deployment and final commercialization. To our knowledge, this represents the first safety survey to encompass the entire lifecycle of LLMs. (II) Extensive Literature Support. Our research is grounded in an exhaustive review of more than 800 papers, ensuring comprehensive coverage and systematic organization of security issues within a more holistic understanding. (III) Unique Insights. Through systematic literature analysis, we have developed reliable roadmaps and perspectives for each chapter. Our work identifies promising research directions, including safety in data generation, alignment techniques, model editing, and LLM-based agent systems. These insights provide valuable guidance for researchers pursuing future work in this field.
Article 1166
Title@2025-06-09 (1): Theoretical Benefit and Limitation of Diffusion Language Model
Title: Theoretical Benefit and Limitation of Diffusion Language Model | Theoretischer Nutzen und Begrenzung des Diffusions-Sprachmodells | 传播语言模式理论效益和限制 2502.09622v2 |
Authors (6): Guhao Feng, Yihan Geng, Jian Guan, Wei Wu, Liwei Wang, Di He
Diffusion language models have emerged as a promising approach for text generation. One would naturally expect this method to be an efficient replacement for autoregressive models since multiple tokens can be sampled in parallel during each diffusion step. However, its efficiency-accuracy trade-off is not yet well understood. In this paper, we present a rigorous theoretical analysis of a widely used type of diffusion language model, the Masked Diffusion Model (MDM), and find that its effectiveness heavily depends on the target evaluation metric. Under mild conditions, we prove that when using perplexity as the metric, MDMs can achieve near-optimal perplexity in sampling steps regardless of sequence length, demonstrating that efficiency can be achieved without sacrificing performance. However, when using the sequence error rate, which is important for understanding the “correctness” of a sequence such as a reasoning chain, we show that the required sampling steps must scale linearly with sequence length to obtain “correct” sequences, thereby eliminating MDM’s efficiency advantage over autoregressive models. Our analysis establishes the first theoretical foundation for understanding the benefits and limitations of MDMs. All theoretical findings are supported by empirical studies.
Article 1167
Title@2025-06-09 (1): MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing
Title: MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing | MoE-GPS: Guidlines für Vorhersagestrategie für dynamische Expertenduplikation in MoE Load Balancing | MoE-GPS:教育部负载平衡中动态专家重叠预测战略指南 2506.07366v1 |
Authors (3): Haiyue Ma, Zhixu Du, Yiran Chen
In a multi-GPU Mixture-of-Experts (MoE) network, experts are distributed across different GPUs, which creates load imbalance because each expert processes a different number of tokens. Recent works improve MoE inference load balance by dynamically duplicating popular experts to more GPUs to process excess tokens, which requires predicting the token distribution before routing. In this paper, we discuss the tradeoffs among prediction strategies, accuracy, overhead, and end-to-end system performance. We propose MoE-GPS, a framework that guides the selection of the optimal predictor design under various system configurations by quantifying the performance impact on system-level model runtime. Specifically, we advocate for Distribution-Only Prediction, a prediction strategy that predicts only the overall token distribution, which significantly reduces overhead compared to the traditional Token-to-Expert Prediction. For Mixtral 8x7B on the MMLU dataset, MoE-GPS suggests Distribution-Only Prediction, which improves end-to-end inference performance by more than 23% compared with Token-to-Expert Prediction.
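The load-balancing idea in this abstract can be illustrated with a small sketch: given a (predicted) count of tokens per expert, copy the most loaded expert onto the least loaded GPU and split its tokens evenly across replicas. The function names, the greedy placement rule, and the even-split assumption are hypothetical choices made for illustration; they are not the MoE-GPS framework or its predictors.

```python
def gpu_loads(tokens_per_expert, placement, num_gpus):
    """Per-GPU token load, assuming an expert's tokens split evenly across its replicas."""
    loads = [0.0] * num_gpus
    for expert, gpus in placement.items():
        share = tokens_per_expert.get(expert, 0) / len(gpus)
        for g in gpus:
            loads[g] += share
    return loads


def duplicate_hot_experts(tokens_per_expert, placement, num_gpus, num_copies):
    """Greedy sketch: replicate the hottest expert onto the least-loaded GPU, num_copies times."""
    placement = {e: list(gpus) for e, gpus in placement.items()}  # do not mutate the input
    for _ in range(num_copies):
        loads = gpu_loads(tokens_per_expert, placement, num_gpus)
        # "Hottest" means the largest per-replica token share.
        hot = max(placement, key=lambda e: tokens_per_expert.get(e, 0) / len(placement[e]))
        candidates = [g for g in range(num_gpus) if g not in placement[hot]]
        if not candidates:
            break
        target = min(candidates, key=lambda g: loads[g])
        placement[hot].append(target)
    return placement


# Toy distribution (made-up numbers): expert 0 receives 70 of 100 tokens.
predicted = {0: 70, 1: 10, 2: 10, 3: 10}
initial = {0: [0], 1: [1], 2: [2], 3: [3]}
balanced = duplicate_hot_experts(predicted, initial, num_gpus=4, num_copies=1)
```

Note that the sketch only needs the overall token distribution, not a per-token expert assignment, which is the kind of input a distribution-level predictor would supply.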
Article 1168
Title@2025-06-09 (1): Multiple Object Stitching for Unsupervised Representation Learning
Title: Multiple Object Stitching for Unsupervised Representation Learning | Mehrere Objektstiche für unüberwachtes Repräsentationslernen | 用于无人监督的代表学习的多对象拼接 2506.07364v1 |
Authors (3): Chengchao Shen, Dawei Liu, Jianxin Wang
Contrastive learning on single-object-centric images has achieved remarkable progress in unsupervised representation learning, but it suffers from inferior performance on the far more common images that contain multiple objects. In this paper, we propose a simple but effective method, Multiple Object Stitching (MOS), to refine the unsupervised representation for multi-object images. Specifically, we construct multi-object images by stitching together single-object-centric ones, so that the objects in the synthesized multi-object images are predetermined. Hence, compared to existing contrastive methods, our method provides additional object correspondences between multi-object images without human annotations. In this manner, our method pays more attention to the representation of each object in a multi-object image, thus providing more detailed representations for complicated downstream tasks, such as object detection and semantic segmentation. Experimental results on the ImageNet, CIFAR and COCO datasets demonstrate that our proposed method achieves leading unsupervised representation performance on both single-object-centric images and multi-object ones. The source code is available at https://github.com/visresearch/MultipleObjectStitching.
Article 1169
Title@2025-06-09 (1): Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding
Title: Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding | Stochastische nichtlineare Steuerung über finite-dimensionale Spektral-Dynamische Einbettung | 通过有限维谱光谱动态嵌入器控制非线性控件 2304.03907v5 |
Authors (5): Zhaolin Ren, Tongzheng Ren, Haitong Ma, Na Li, Bo Dai
This paper proposes an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear stochastic systems. This method reveals an infinite-dimensional feature representation induced by the system’s nonlinear stochastic dynamics, enabling a linear representation of the state-action value function. For practical implementation, this representation is approximated using finite-dimensional truncations, specifically via two prominent kernel approximation methods: random feature truncation and Nystrom approximation. To characterize the effectiveness of these approximations, we provide an in-depth theoretical analysis of the approximation error arising from the finite-dimensional truncation and the statistical error due to finite-sample approximation in both policy evaluation and policy optimization. Empirically, our algorithm performs favorably against existing stochastic control algorithms on several benchmark problems.
Article 1170
Title@2025-06-09 (1): Lightweight Joint Audio-Visual Deepfake Detection via Single-Stream Multi-Modal Learning Framework
Title: Lightweight Joint Audio-Visual Deepfake Detection via Single-Stream Multi-Modal Learning Framework | Leichte gemeinsame Audio-Visuelle Deepfake-Erkennung über Multi-Modal-Lernrahmen mit Single-Stream | 通过单层多模式学习框架,通过单一链式多模式学习框架,联合探测音频-视觉深海假相 2506.07358v1 |
Authors (5): Kuiyuan Zhang, Wenjie Pei, Rushi Lan, Yifang Guo, Zhongyun Hua
Deepfakes are AI-synthesized multimedia data that may be abused for spreading misinformation. Deepfake generation involves both visual and audio manipulation. To detect audio-visual deepfakes, previous studies commonly employ two relatively independent sub-models to learn audio and visual features, respectively, and fuse them subsequently for deepfake detection. However, this may underutilize the inherent correlations between audio and visual features. Moreover, utilizing two isolated feature learning sub-models can result in redundant neural layers, making the overall model inefficient and impractical for resource-constrained environments. In this work, we design a lightweight network for audio-visual deepfake detection via a single-stream multi-modal learning framework. Specifically, we introduce a collaborative audio-visual learning block to efficiently integrate multi-modal information while learning the visual and audio features. By iteratively employing this block, our single-stream network achieves a continuous fusion of multi-modal features across its layers. Thus, our network efficiently captures visual and audio features without the need for excessive block stacking, resulting in a lightweight network design. Furthermore, we propose a multi-modal classification module that can boost the dependence of the visual and audio classifiers on modality content. It also enhances the overall robustness of the video classifier to mismatches between the audio and visual modalities. We conduct experiments on the DF-TIMIT, FakeAVCeleb, and DFDC benchmark datasets. Compared to state-of-the-art audio-visual joint detection methods, our method is significantly lighter, with only 0.48M parameters, yet it achieves superior performance on both uni-modal and multi-modal deepfakes, as well as on unseen types of deepfakes.
Article 1171
Title@2025-06-09 (1): CBAM-STN-TPS-YOLO: Enhancing Agricultural Object Detection through Spatially Adaptive Attention Mechanisms
Title: CBAM-STN-TPS-YOLO: Enhancing Agricultural Object Detection through Spatially Adaptive Attention Mechanisms | CBAM-STN-TPS-YOLO: Verbesserung der Erkennung landwirtschaftlicher Objekte durch räumlich adaptive Aufmerksamkeitsmechanismen | CBAM-STN-TPS-YOLO:通过空间适应性关注机制加强农业物体的探测 2506.07357v1 |
Authors (2): Satvik Praveen, Yoonsung Jung
Object detection is vital in precision agriculture for plant monitoring, disease detection, and yield estimation. However, models like YOLO struggle with occlusions, irregular structures, and background noise, reducing detection accuracy. While Spatial Transformer Networks (STNs) improve spatial invariance through learned transformations, affine mappings are insufficient for non-rigid deformations such as bent leaves and overlaps. We propose CBAM-STN-TPS-YOLO, a model integrating Thin-Plate Splines (TPS) into STNs for flexible, non-rigid spatial transformations that better align features. Performance is further enhanced by the Convolutional Block Attention Module (CBAM), which suppresses background noise and emphasizes relevant spatial and channel-wise features. On the occlusion-heavy Plant Growth and Phenotyping (PGP) dataset, our model outperforms STN-YOLO in precision, recall, and mAP. It achieves a 12% reduction in false positives, highlighting the benefits of improved spatial flexibility and attention-guided refinement. We also examine the impact of the TPS regularization parameter in balancing transformation smoothness and detection performance. This lightweight model improves spatial awareness and supports real-time edge deployment, making it ideal for smart farming applications requiring accurate and efficient monitoring.
Article 1172
Title@2025-06-09 (1): Unveiling and Addressing Pseudo Forgetting in Large Language Models
Title: Unveiling and Addressing Pseudo Forgetting in Large Language Models | Enthüllung und Adressierung von Pseudo Forgetting in großen Sprachmodellen | 在大语言模式中统一和解决 “ 优于 “ 的遗忘 2411.11932v2 |
Authors (5): Huashan Sun, Yizhe Yang, Yinghao Li, Jiawei Li, Yang Gao
Although substantial efforts have been made to mitigate catastrophic forgetting in continual learning, the intrinsic mechanisms are not well understood. In this work, we demonstrate the existence of “pseudo forgetting”: the performance degradation on previous tasks is not attributable to a loss of capabilities, but rather to the failure of the instructions to activate the appropriate model abilities. We show that the model’s performance on previous tasks can be restored through two simple interventions: (1) providing a partial, externally supplied correct rationale, and (2) appending semantically meaningless suffixes to the original instructions to guide the generation of correct rationales. Through empirical analysis of the internal mechanisms governing rationale generation, we reveal that models exhibiting pseudo forgetting show reduced instruction dependence during rationale generation, leading to suboptimal activation of their inherent capabilities. Based on this insight, we propose the Rationale-Guidance Difficulty based Replay (RGD-R) framework, which dynamically allocates replay data based on the model’s ability to correctly leverage its intrinsic capabilities. Experimental results demonstrate that RGD-R effectively mitigates pseudo forgetting while maintaining model plasticity.
Article 1173
Title@2025-06-09 (1): SALT: A Lightweight Model Adaptation Method for Closed Split Computing Environments
Title: SALT: A Lightweight Model Adaptation Method for Closed Split Computing Environments | SALT: Eine leichte Modellanpassungsmethode für geschlossene Split Computing-Umgebungen | SALT: 用于封闭式拆分计算环境的轻量模型适应方法 2506.07355v1 |
Authors (2): Yuya Okada, Takayuki Nishio
We propose SALT (Split-Adaptive Lightweight Tuning), a lightweight model adaptation framework for Split Computing under closed constraints, where the head and tail networks are proprietary and inaccessible to users. In such closed environments, conventional adaptation methods are infeasible since they require access to model parameters or architectures. SALT addresses this challenge by introducing a compact, trainable adapter on the client side to refine latent features from the head network, enabling user-specific adaptation without modifying the original models or increasing communication overhead. We evaluate SALT on user-specific classification tasks with CIFAR-10 and CIFAR-100, demonstrating improved accuracy with lower training latency compared to fine-tuning methods. Furthermore, SALT facilitates model adaptation for robust inference over lossy networks, a common challenge in edge-cloud environments. With minimal deployment overhead, SALT offers a practical solution for personalized inference in edge AI systems under strict system constraints.
Article 1174
Title@2025-06-09 (1): Task Generalization With AutoRegressive Compositional Structure: Can Learning From $D$ Tasks Generalize to $D^{T}$ Tasks?
Title: Task Generalization With AutoRegressive Compositional Structure: Can Learning From $D$ Tasks Generalize to $D^{T}$ Tasks? | Aufgabenverallgemeinerung mit autoregressiver Kompositionsstruktur: Kann man von $D$ Aufgaben zu $D^{T}$ Aufgaben verallgemeinern? | 与自动递减的构成结构相关的一般任务:能否从“$D”的任务向“$D”的任务普遍学习到“$D”的任务? 2502.08991v2 |
Authors (6): Amirhesam Abedsoltan, Huaqing Zhang, Kaiyue Wen, Hongzhou Lin, Jingzhao Zhang, Mikhail Belkin
Large language models (LLMs) exhibit remarkable task generalization, solving tasks they were never explicitly trained on with only a few demonstrations. This raises a fundamental question: When can learning from a small set of tasks generalize to a large task family? In this paper, we investigate task generalization through the lens of autoregressive compositional structure, where each task is a composition of $T$ operations, and each operation is among a finite family of $D$ subtasks. This yields a total class of size $D^T$. We first show that generalization to all $D^T$ tasks is theoretically achievable by training on only $\widetilde{O}(D)$ tasks. Empirically, we demonstrate that Transformers achieve such exponential task generalization on sparse parity functions via In-context Learning (ICL) and chain-of-thought (CoT) reasoning. We further show generalization in arithmetic and translation, beyond parity functions.
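The counting argument above can be made concrete with a toy example: with $D$ subtasks and $T$ composition steps there are $D^T$ distinct operation sequences, even though only $D$ building blocks exist. The three arithmetic subtasks and the plain sequential composition below are hypothetical stand-ins chosen for illustration; they are a simplification of the autoregressive compositional structure studied in the paper.

```python
from itertools import product


def compose(ops):
    """Build one task by applying a sequence of T operations in order."""
    def task(x):
        for op in ops:
            x = op(x)
        return x
    return task


# D = 3 hypothetical subtasks acting on integers.
subtasks = [
    lambda x: x + 1,   # increment
    lambda x: x * 2,   # double
    lambda x: -x,      # negate
]
D = len(subtasks)
T = 4

# Every length-T sequence of subtasks defines one task: D**T sequences in total.
all_tasks = [compose(ops) for ops in product(subtasks, repeat=T)]
num_tasks = len(all_tasks)  # 3**4 = 81 sequences built from only 3 subtasks
```

Different sequences may happen to compute the same function, so $D^T$ counts operation sequences rather than distinct input-output maps in this toy setting.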
Article 1175
Title@2025-06-09 (1): Decentralized Optimization on Compact Submanifolds by Quantized Riemannian Gradient Tracking
Title: Decentralized Optimization on Compact Submanifolds by Quantized Riemannian Gradient Tracking | Dezentralisierte Optimierung auf kompakten Submanifolds durch Quantized Riemannian Gradient Tracking | 量化的里伊曼尼梯度跟踪对契约子部件的优化 2506.07351v1 |
Authors (7): Jun Chen, Lina Liu, Tianyi Zhu, Yong Liu, Guang Dai, Yunliang Jiang, Ivor W. Tsang
This paper considers the problem of decentralized optimization on compact submanifolds, where a finite sum of smooth (possibly non-convex) local functions is minimized by $n$ agents forming an undirected and connected graph. However, the efficiency of distributed optimization is often hindered by communication bottlenecks. To mitigate this, we propose the Quantized Riemannian Gradient Tracking (Q-RGT) algorithm, where agents update their local variables using quantized gradients. The introduction of quantization noise allows our algorithm to bypass the constraints of the accurate Riemannian projection operator (such as retraction), further improving iterative efficiency. To the best of our knowledge, this is the first algorithm to achieve an $\mathcal{O}(1/K)$ convergence rate in the presence of quantization, matching the convergence rate of methods without quantization. Additionally, we explicitly derive lower bounds on decentralized consensus associated with a function of quantization levels. Numerical experiments demonstrate that Q-RGT performs comparably to non-quantized methods while reducing communication bottlenecks and computational overhead.
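To make "quantized gradients" concrete, the sketch below implements an unbiased stochastic uniform quantizer: each coordinate is rounded to one of its two neighbouring grid points with probabilities chosen so that the expected value equals the input. This particular quantizer, the function name, and the `step` parameter are assumptions for illustration; the abstract does not specify which quantizer Q-RGT uses, and the manifold and gradient-tracking parts of the algorithm are omitted entirely.

```python
import math
import random


def stochastic_quantize(vec, step, rng):
    """Round each coordinate to a multiple of `step`, up or down at random, unbiased in expectation."""
    out = []
    for v in vec:
        low = math.floor(v / step)   # index of the grid point at or below v
        p_up = v / step - low        # fractional position in [0, 1)
        idx = low + (1 if rng.random() < p_up else 0)
        out.append(idx * step)
    return out


rng = random.Random(0)
gradient = [0.3, -1.1, 2.0]
quantized = stochastic_quantize(gradient, step=0.25, rng=rng)
```

Each agent would then transmit only the integer grid indices instead of full-precision floats, which is where the communication saving comes from.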
Article 1176
Title@2025-06-09 (1): Distributed Risk-Sensitive Safety Filters for Uncertain Discrete-Time Systems
Title: Distributed Risk-Sensitive Safety Filters for Uncertain Discrete-Time Systems | Verteilte risikosensible Sicherheitsfilter für unsichere Diskretzeitsysteme | 用于不确定分辨时间系统的分散式风险敏感度安全过滤器 2506.07347v1 |
Authors (3): Armin Lederer, Erfaun Noorani, Andreas Krause
Ensuring safety in multi-agent systems is a significant challenge, particularly in settings where centralized coordination is impractical. In this work, we propose a novel risk-sensitive safety filter for discrete-time multi-agent systems with uncertain dynamics that leverages control barrier functions (CBFs) defined through value functions. Our approach relies on centralized risk-sensitive safety conditions based on exponential risk operators to ensure robustness against model uncertainties. We introduce a distributed formulation of the safety filter by deriving two alternative strategies: one based on worst-case anticipation and another on proximity to a known safe policy. By allowing agents to switch between strategies, feasibility can be ensured. Through detailed numerical evaluations, we demonstrate the efficacy of our approach in maintaining safety without being overly conservative.
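As a rough single-agent illustration of a risk-sensitive safety filter, the sketch below uses the entropic (exponential) risk operator $-\frac{1}{\theta}\log \mathbb{E}[e^{-\theta Z}]$ inside a discrete-time barrier condition of the form $\rho_\theta\big(h(x_{t+1})\big) \ge (1-\alpha)\,h(x_t)$, and returns the admissible control closest to a nominal one. The scalar dynamics $x_{t+1} = x_t + u + w$, the barrier function, the candidate grid, and all parameter values are assumptions for illustration; the paper's value-function-based CBFs and its distributed formulation are not reproduced here.

```python
import math


def entropic_risk(values, theta):
    """Risk-averse certainty equivalent of samples Z: -(1/theta) * log(mean(exp(-theta * Z)))."""
    m = sum(math.exp(-theta * v) for v in values) / len(values)
    return -math.log(m) / theta


def safety_filter(x, u_nominal, candidates, noise, h, alpha, theta):
    """Return the candidate control closest to u_nominal that satisfies the risk-sensitive condition."""
    threshold = (1 - alpha) * h(x)
    safe = [
        u for u in candidates
        if entropic_risk([h(x + u + w) for w in noise], theta) >= threshold
    ]
    if not safe:
        return None  # no candidate certified; a real system needs a fallback strategy
    return min(safe, key=lambda u: abs(u - u_nominal))


# Toy setup: the safe set is |x| <= 1, encoded by h(x) = 1 - |x|.
barrier = lambda x: 1 - abs(x)
u_safe = safety_filter(
    x=0.8,
    u_nominal=0.5,
    candidates=[-0.5, -0.25, 0.0, 0.25, 0.5],
    noise=[-0.1, 0.0, 0.1],   # assumed disturbance samples
    h=barrier,
    alpha=0.5,
    theta=2.0,
)
```

Here the nominal control would push the state out of the safe set, so the filter falls back to the nearest control that keeps the risk-adjusted barrier value above the threshold.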
Article 1177
Title@2025-06-09 (1): ST-GraphNet: A Spatio-Temporal Graph Neural Network for Understanding and Predicting Automated Vehicle Crash Severity
Title: ST-GraphNet: A Spatio-Temporal Graph Neural Network for Understanding and Predicting Automated Vehicle Crash Severity | ST-GraphNet: Ein räumlich-temporales Neuralnetzwerk für das Verständnis und die Vorhersage von Automatisiertem Fahrzeug-Crash-Severity | ST-GraphNet:一个了解和预测自动车辆撞车严重性的空间-时时图神经网络 2506.08051v1 |
Authors (5): Mahmuda Sultana Mimi, Md Monzurul Islam, Anannya Ghosh Tusti, Shriyank Somvanshi, Subasish Das
Understanding the spatial and temporal dynamics of automated vehicle (AV) crash severity is critical for advancing urban mobility safety and infrastructure planning. In this work, we introduce ST-GraphNet, a spatio-temporal graph neural network framework designed to model and predict AV crash severity by using both fine-grained and region-aggregated spatial graphs. Using a balanced dataset of 2,352 real-world AV-related crash reports from Texas (2024), including geospatial coordinates, crash timestamps, SAE automation levels, and narrative descriptions, we construct two complementary graph representations: (1) a fine-grained graph with individual crash events as nodes, where edges are defined via spatio-temporal proximity; and (2) a coarse-grained graph where crashes are aggregated into Hexagonal Hierarchical Spatial Indexing (H3)-based spatial cells, connected through hexagonal adjacency. Each node in the graph is enriched with multimodal data covering semantic, spatial, and temporal attributes, including textual embeddings of crash narratives from a pretrained Sentence-BERT model. We evaluate various graph neural network (GNN) architectures, such as Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), and Dynamic Spatio-Temporal GCN (DSTGCN), to classify crash severity and predict high-risk regions. Our proposed ST-GraphNet, which utilizes a DSTGCN backbone on the coarse-grained H3 graph, achieves a test accuracy of 97.74%, substantially outperforming the best fine-grained model (64.7% test accuracy). These findings highlight the effectiveness of spatial aggregation, dynamic message passing, and multi-modal feature integration in capturing the complex spatio-temporal patterns underlying AV crash severity.
Article 1178
Title@2025-06-09 (1): A Comprehensive Survey on Artificial Intelligence for Complex Network: Potential, Methodology and Application
Title: A Comprehensive Survey on Artificial Intelligence for Complex Network: Potential, Methodology and Application | Eine umfassende Umfrage zu Künstlicher Intelligenz für komplexes Netzwerk: Potenzial, Methodik und Anwendung | 复杂网络人工情报综合调查:潜力、方法和应用 2402.16887v2 |
Authors (11): Jingtao Ding, Chang Liu, Yu Zheng, Yunke Zhang, Zihan Yu, Ruikun Li, Hongyi Chen, Jinghua Piao, Huandong Wang, Jiazhen Liu, Yong Li
Complex networks pervade various real-world systems, from the natural environment to human societies. The essence of these networks is in their ability to transition and evolve from microscopic disorder, where network topology and node dynamics intertwine, to a macroscopic order characterized by certain collective behaviors. Over the past two decades, complex network science has significantly enhanced our understanding of the statistical mechanics, structures, and dynamics underlying real-world networks. Despite these advancements, there remain considerable challenges in exploring more realistic systems and enhancing practical applications. The emergence of artificial intelligence (AI) technologies, coupled with the abundance of diverse real-world network data, has heralded a new era in complex network science research. This survey aims to systematically address the potential advantages of AI in overcoming the lingering challenges of complex network research. It endeavors to summarize the pivotal research problems and provide an exhaustive review of the corresponding methodologies and applications. Through this comprehensive survey, the first of its kind on AI for complex networks, we expect to provide valuable insights that will drive further research and advancement in this interdisciplinary field.
Article 1179
Title@2025-06-09 (1): On the Impact of Uncertainty and Calibration on Likelihood-Ratio Membership Inference Attacks
Title: On the Impact of Uncertainty and Calibration on Likelihood-Ratio Membership Inference Attacks | Über die Auswirkungen von Unsicherheit und Kalibrierung auf Likelihood-Ratio-Mitgliedschafts-Inferenzangriffe | 不确定性和校准对潜在――拉蒂奥成员推论攻击的影响 2402.10686v5 |
Authors (4): Meiyi Zhu, Caili Guo, Chunyan Feng, Osvaldo Simeone
In a membership inference attack (MIA), an attacker exploits the overconfidence exhibited by typical machine learning models to determine whether a specific data point was used to train a target model. In this paper, we analyze the performance of the likelihood ratio attack (LiRA) within an information-theoretical framework that allows the investigation of the impact of the aleatoric uncertainty in the true data generation process, of the epistemic uncertainty caused by a limited training data set, and of the calibration level of the target model. We compare three different settings, in which the attacker receives decreasingly informative feedback from the target model: confidence vector (CV) disclosure, in which the output probability vector is released; true label confidence (TLC) disclosure, in which only the probability assigned to the true label is made available by the model; and decision set (DS) disclosure, in which an adaptive prediction set is produced as in conformal prediction. We derive bounds on the advantage of an MIA adversary with the aim of offering insights into the impact of uncertainty and calibration on the effectiveness of MIAs. Simulation results demonstrate that the derived analytical bounds predict well the effectiveness of MIAs.
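For readers unfamiliar with the likelihood ratio attack, the following minimal sketch shows its usual parametric form: fit one Gaussian to logit-scaled confidences from shadow models trained with the target point (IN) and another to those trained without it (OUT), then score the observed confidence by the log-likelihood ratio. It uses only the probability assigned to the true label, which loosely corresponds to the TLC disclosure setting; the function names and the shadow-model confidence values are made up for illustration and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def logit(p, eps=1e-6):
    """Logit-scale a confidence so that a Gaussian fit is more reasonable."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def gaussian_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def lira_score(target_conf, in_confs, out_confs):
    """Log-likelihood ratio log p(x | IN) - log p(x | OUT); larger values suggest 'member'."""
    x = logit(target_conf)
    ins = [logit(p) for p in in_confs]
    outs = [logit(p) for p in out_confs]
    return (gaussian_logpdf(x, mean(ins), stdev(ins))
            - gaussian_logpdf(x, mean(outs), stdev(outs)))


# Hypothetical true-label confidences from shadow models (illustrative values only).
in_confs = [0.95, 0.97, 0.99, 0.96]    # shadow models trained on the point
out_confs = [0.60, 0.70, 0.55, 0.65]   # shadow models not trained on the point
score_high = lira_score(0.98, in_confs, out_confs)
score_low = lira_score(0.60, in_confs, out_confs)
```

The gap between the IN and OUT distributions is driven by overconfidence on training points, which is why calibration and uncertainty affect how well the attack can separate the two cases.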
Article 1180
Title@2025-06-09 (1): Calibrated Value-Aware Model Learning with Probabilistic Environment Models
Title: Calibrated Value-Aware Model Learning with Probabilistic Environment Models | Kalibriertes Value-Aware-Modell-Lernen mit probabilistischen Umweltmodellen | 使用概率环境模型校准的价值软件模型学习 2505.22772v2 |
Authors (6): Claas Voelcker, Anastasiia Pedan, Arash Ahmadian, Romina Abachi, Igor Gilitschenski, Amir-massoud Farahmand
The idea of value-aware model learning, that models should produce accurate value estimates, has gained prominence in model-based reinforcement learning. The MuZero loss, which penalizes a model’s value function prediction compared to the ground-truth value function, has been utilized in several prominent empirical works in the literature. However, theoretical investigation into its strengths and weaknesses is limited. In this paper, we analyze the family of value-aware model learning losses, which includes the popular MuZero loss. We show that these losses, as normally used, are uncalibrated surrogate losses, which means that they do not always recover the correct model and value function. Building on this insight, we propose corrections to solve this issue. Furthermore, we investigate the interplay between the loss calibration, latent model architectures, and auxiliary losses that are commonly employed when training MuZero-style agents. We show that while deterministic models can be sufficient to predict accurate values, learning calibrated stochastic models is still advantageous.
Article 1181
Title@2025-06-09 (1): Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models
Title: Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models | Membership Inferenz Attack sollte weiter zu Verteilungsstatistiken für destillierte Generative Modelle | 成员攻击的推论应转向已蒸馏生成模型的分发统计数据 2502.02970v2 |
Authors (6): Muxing Li, Zesheng Ye, Yixuan Li, Andy Song, Guangquan Zhang, Feng Liu
To detect unauthorized data usage in training large-scale generative models (e.g., ChatGPT or Midjourney), membership inference attacks (MIA) have proven effective in distinguishing a single training instance (a member) from a single non-training instance (a non-member). This success is mainly credited to a memorization effect: models tend to perform better on a member than a non-member. However, we find that standard MIAs fail against distilled generative models (i.e., student models) that are increasingly deployed in practice for efficiency (e.g., ChatGPT 4o-mini). Trained exclusively on data generated from a large-scale model (a teacher model), the student model lacks direct exposure to any members (teacher’s training data), nullifying the memorization effect that standard MIAs rely on. This finding reveals a serious privacy loophole, where generation-service providers could deploy a student model whose teacher was potentially trained on unauthorized data, yet claim the deployed model is clean because it was not directly trained on such data. Hence, are distilled models inherently unauditable for upstream privacy violations, and should we discard them when we care about privacy? We contend no, as we uncover a memory chain connecting the student and teacher’s member data: the distribution of student-generated data aligns more closely with the distribution of the teacher’s members than with non-members, thus we can detect unauthorized data usage even when direct instance-level memorization is absent. This leads us to posit that MIAs on distilled generative models should shift from instance-level scores to distribution-level statistics. We further propose three principles of distribution-based MIAs for detecting unauthorized training data through distilled generative models, and validate our position through an exemplar framework. We lastly discuss the implications of our position.
Article 1182
Title@2025-06-09 (1): Real-Time Execution of Action Chunking Flow Policies
Title: Real-Time Execution of Action Chunking Flow Policies | Real-Time Ausführung von Maßnahmen Chunking Flow Richtlinien | 实时执行 “ 快速行动 “ 流动政策 2506.07339v1 |
Authors (3): Kevin Black, Manuel Y. Galliker, Sergey Levine
Modern AI systems, especially those interacting with the physical world, increasingly require real-time performance. However, the high latency of state-of-the-art generalist models, including recent vision-language action models (VLAs), poses a significant challenge. While action chunking has enabled temporal consistency in high-frequency control tasks, it does not fully address the latency problem, leading to pauses or out-of-distribution jerky movements at chunk boundaries. This paper presents a novel inference-time algorithm that enables smooth asynchronous execution of action chunking policies. Our method, real-time chunking (RTC), is applicable to any diffusion- or flow-based VLA out of the box with no re-training. It generates the next action chunk while executing the current one, “freezing” actions guaranteed to execute and “inpainting” the rest. To test RTC, we introduce a new benchmark of 12 highly dynamic tasks in the Kinetix simulator, as well as evaluate 6 challenging real-world bimanual manipulation tasks. Results demonstrate that RTC is fast, performant, and uniquely robust to inference delay, significantly improving task throughput and enabling high success rates in precise tasks, such as lighting a match, even in the presence of significant latency. See https://pi.website/research/real_time_chunking for videos.
Article 1183
Title@2025-06-09 (1): Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models
Title: Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models | Graph-KV: Durchbrechen der Sequenz durch Einspritzen struktureller Biasen in große Sprachmodelle | 图图-KV:通过向大语言模型输入结构分界来打破序列 2506.07334v1 |
Authors (7): Haoyu Wang, Peihao Wang, Mufei Li, Shikun Liu, Siqi Miao, Zhangyang Wang, Pan Li
Modern large language models (LLMs) are inherently auto-regressive, requiring input to be serialized into flat sequences regardless of their structural dependencies. This serialization hinders the model’s ability to leverage structural inductive biases, especially in tasks such as retrieval-augmented generation (RAG) and reasoning on data with native graph structures, where inter-segment dependencies are crucial. We introduce Graph-KV with the potential to overcome this limitation. Graph-KV leverages the KV-cache of text segments as condensed representations and governs their interaction through structural inductive biases. In this framework, ‘target’ segments selectively attend only to the KV-caches of their designated ‘source’ segments, rather than all preceding segments in a serialized sequence. This approach induces a graph-structured block mask, sparsifying attention and enabling a message-passing-like step within the LLM. Furthermore, strategically allocated positional encodings for source and target segments reduce positional bias and context window consumption. We evaluate Graph-KV across three scenarios: (1) seven RAG benchmarks spanning direct inference, multi-hop reasoning, and long-document understanding; (2) Arxiv-QA, a novel academic paper QA task with full-text scientific papers structured as citation ego-graphs; and (3) paper topic classification within a citation network. By effectively reducing positional bias and harnessing structural inductive biases, Graph-KV substantially outperforms baselines, including standard costly sequential encoding, across various settings. Code and the Graph-KV data are publicly available.
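The graph-structured block mask described above can be sketched in a few lines: tokens attend causally within their own segment, and tokens of a target segment additionally attend to every token of its designated source segments, but not to other preceding segments. The function name, the `source_of` mapping, and the choice of causal attention inside each segment are assumptions for illustration; this is not the released Graph-KV code, and positional-encoding allocation is not modeled.

```python
def graph_block_mask(segment_lengths, source_of):
    """Boolean mask where mask[i][j] is True if token i may attend to token j.

    segment_lengths: number of tokens in each text segment, in serialized order.
    source_of: maps a target segment index to the segment indices it may read from.
    """
    seg_of_token = []
    for seg, length in enumerate(segment_lengths):
        seg_of_token.extend([seg] * length)

    total = len(seg_of_token)
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            si, sj = seg_of_token[i], seg_of_token[j]
            if si == sj:
                mask[i][j] = j <= i          # causal attention inside a segment
            elif sj in source_of.get(si, ()):
                mask[i][j] = True            # target reads the full KV cache of its source
    return mask


# Three segments of two tokens each; segment 2 is a target whose only source is segment 0.
mask = graph_block_mask([2, 2, 2], {2: [0]})
```

In a serialized sequence, segment 2 would also see segment 1; with this mask it does not, which is the sparsification the abstract refers to.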
nan
Article 1184
Title@2025-06-09 (1): Model Generalization on Text Attribute Graphs: Principles with Large Language Models
Title: Model Generalization on Text Attribute Graphs: Principles with Large Language Models | Modellverallgemeinerung auf Text-Attribut-Grafiken: Prinzipien mit großen Sprachmodellen | 文本属性图的模型通用化:大语言模型原则 2502.11836v2 |
Authors (4): Haoyu Wang, Shikun Liu, Rongzhe Wei, Pan Li
Large language models (LLMs) have recently been introduced to graph learning, aiming to extend their zero-shot generalization success to tasks where labeled graph data is scarce. Among these applications, inference over text-attributed graphs (TAGs) presents unique challenges: existing methods struggle with LLMs’ limited context length for processing large node neighborhoods and the misalignment between node embeddings and the LLM token space. To address these issues, we establish two key principles for ensuring generalization and derive the framework LLM-BP accordingly: (1) Unifying the attribute space with task-adaptive embeddings, where we leverage LLM-based encoders and task-aware prompting to enhance generalization of the text attribute embeddings; (2) Developing a generalizable graph information aggregation mechanism, for which we adopt belief propagation with LLM-estimated parameters that adapt across graphs. Evaluations on 11 real-world TAG benchmarks demonstrate that LLM-BP significantly outperforms existing approaches, achieving 8.10% improvement with task-conditional embeddings and an additional 1.71% gain from adaptive aggregation. The code and task-adaptive embeddings are publicly available.
nan
Article 1185
Title@2025-06-09 (1): JavelinGuard: Low-Cost Transformer Architectures for LLM Security
Title: JavelinGuard: Low-Cost Transformer Architectures for LLM Security | JavelinGuard: Low-Cost-Transformer-Architekturen für LLM-Sicherheit | JavelinGuard:低成本变革结构促进LLM安全 2506.07330v1 |
Authors (2): Yash Datta, Sharath Rajasekar
We present JavelinGuard, a suite of low-cost, high-performance model architectures designed for detecting malicious intent in Large Language Model (LLM) interactions, optimized specifically for production deployment. Recent advances in transformer architectures, including compact BERT (Devlin et al. 2019) variants (e.g., ModernBERT (Warner et al. 2024)), allow us to build highly accurate classifiers with as few as approximately 400M parameters that achieve rapid inference speeds even on standard CPU hardware. We systematically explore five progressively sophisticated transformer-based architectures: Sharanga (baseline transformer classifier), Mahendra (enhanced attention-weighted pooling with deeper heads), Vaishnava and Ashwina (hybrid neural ensemble architectures), and Raudra (an advanced multi-task framework with specialized loss functions). Our models are rigorously benchmarked across nine diverse adversarial datasets, including popular sets like the NotInject series, BIPIA, Garak, ImprovedLLM, ToxicChat, WildGuard, and our newly introduced JavelinBench, specifically crafted to test generalization on challenging borderline and hard-negative cases. Additionally, we compare our architectures against leading open-source guardrail models as well as large decoder-only LLMs such as gpt-4o, demonstrating superior cost-performance trade-offs in terms of accuracy and latency. Our findings reveal that while Raudra’s multi-task design offers the most robust performance overall, each architecture presents unique trade-offs in speed, interpretability, and resource requirements, guiding practitioners in selecting the optimal balance of complexity and efficiency for real-world LLM security applications.
nan
Article 1186
Title@2025-06-09 (1): Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning
Title: Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning | Dynamische Planung für Fahrzeug-zu-Fahrzeug-Kommunikation verbessertes Federated Learning | 车辆到车辆的交通通信增强联邦学习的动态日程安排 2406.17470v2 |
Authors (6): Jintao Yan, Tan Chen, Yuxuan Sun, Zhaojun Nan, Sheng Zhou, Zhisheng Niu
Leveraging the computing and sensing capabilities of vehicles, vehicular federated learning (VFL) has been applied to edge training for connected vehicles. The dynamic and interconnected nature of vehicular networks presents unique opportunities to harness direct vehicle-to-vehicle (V2V) communications, enhancing VFL training efficiency. In this paper, we formulate a stochastic optimization problem to optimize the VFL training performance, considering the energy constraints and mobility of vehicles, and propose a V2V-enhanced dynamic scheduling (VEDS) algorithm to solve it. The model aggregation requirements of VFL and the limited transmission time due to mobility result in a stepwise objective function, which presents challenges in solving the problem. We thus propose a derivative-based drift-plus-penalty method to convert the long-term stochastic optimization problem to an online mixed integer nonlinear programming (MINLP) problem, and provide a theoretical analysis to bound the performance gap between the online solution and the offline optimal solution. Further analysis of the scheduling priority reduces the original problem into a set of convex optimization problems, which are efficiently solved using the interior-point method. Experimental results demonstrate that compared with the state-of-the-art benchmarks, the proposed algorithm enhances the image classification accuracy on the CIFAR-10 dataset by 4.20% and reduces the average displacement errors on the Argoverse trajectory prediction dataset by 9.82%.
nan
Article 1187
Title@2025-06-08 (7): Mobility-Aware Asynchronous Federated Learning with Dynamic Sparsification
Title: Mobility-Aware Asynchronous Federated Learning with Dynamic Sparsification | Mobilität-Bewusst Asynchrones Federated Learning mit dynamischer Sparsifikation | 具有动态分级功能的流动意识非同步联邦学习 2506.07328v1 |
Authors (6): Jintao Yan, Tan Chen, Yuxuan Sun, Zhaojun Nan, Sheng Zhou, Zhisheng Niu
Asynchronous Federated Learning (AFL) enables distributed model training across multiple mobile devices, allowing each device to independently update its local model without waiting for others. However, device mobility introduces intermittent connectivity, which necessitates gradient sparsification and leads to model staleness, jointly affecting AFL convergence. This paper develops a theoretical model to characterize the interplay among sparsification, model staleness and mobility-induced contact patterns, and their joint impact on AFL convergence. Based on the analysis, we propose a mobility-aware dynamic sparsification (MADS) algorithm that optimizes the sparsification degree based on contact time and model staleness. Closed-form solutions are derived, showing that under low-speed conditions, MADS increases the sparsification degree to enhance convergence, while under high-speed conditions, it reduces the sparsification degree to guarantee reliable uploads within limited contact time. Experimental results validate the theoretical findings. Compared with the state-of-the-art benchmarks, the MADS algorithm increases the image classification accuracy on the CIFAR-10 dataset by 8.76% and reduces the average displacement error in the Argoverse trajectory prediction dataset by 9.46%.
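To make the notion of a sparsification degree concrete, here is a small sketch of top-k gradient sparsification with the number of kept entries capped by what fits in one contact window. The budget rule in `budgeted_k` is a hypothetical stand-in; it is not the closed-form MADS solution, which the abstract does not spell out.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad and zero the rest."""
    if k <= 0:
        return [0.0] * len(grad)
    k = min(k, len(grad))
    # Indices by decreasing magnitude; ties go to the lower index.
    keep = set(sorted(range(len(grad)), key=lambda i: (-abs(grad[i]), i))[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def budgeted_k(contact_time_s, rate_bits_per_s, bits_per_entry=64):
    """Hypothetical rule: how many (index, value) pairs fit in one contact."""
    return int(contact_time_s * rate_bits_per_s // bits_per_entry)


grad = [0.1, -2.0, 0.5, 3.0, -0.2]
k = budgeted_k(contact_time_s=0.5, rate_bits_per_s=256)
sparse = top_k_sparsify(grad, k)
```

A shorter contact time gives a smaller `k`, so fewer entries are uploaded; how that should be traded off against staleness is what the paper's analysis addresses.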
nan
Article 1188
Title@2025-06-08 (7): Reward Model Interpretability via Optimal and Pessimal Tokens
Title: Reward Model Interpretability via Optimal and Pessimal Tokens | Reward Model Interpretierbarkeit über Optimale und Pessimal Token | 通过最佳和微量调制调来解释性 2506.07326v1 |
Authors (5): Brian Christian, Hannah Rose Kirk, Jessica A. F. Thompson, Christopher Summerfield, Tsvetomira Dumbalska
Reward modeling has emerged as a crucial component in aligning large language models with human values. Significant attention has focused on using reward models as a means for fine-tuning generative models. However, the reward models themselves – which directly encode human value judgments by turning prompt-response pairs into scalar rewards – remain relatively understudied. We present a novel approach to reward model interpretability through exhaustive analysis of their responses across their entire vocabulary space. By examining how different reward models score every possible single-token response to value-laden prompts, we uncover several striking findings: (i) substantial heterogeneity between models trained on similar objectives, (ii) systematic asymmetries in how models encode high- vs low-scoring tokens, (iii) significant sensitivity to prompt framing that mirrors human cognitive biases, and (iv) overvaluation of more frequent tokens. We demonstrate these effects across ten recent open-source reward models of varying parameter counts and architectures. Our results challenge assumptions about the interchangeability of reward models, as well as their suitability as proxies of complex and context-dependent human values. We find that these models can encode concerning biases toward certain identity groups, which may emerge as unintended consequences of harmlessness training – distortions that risk propagating through the downstream large language models now deployed to millions.
nan
Article 1189
Title@2025-06-08 (7): Devanagari Digit Recognition using Quantum Machine Learning
Title: Devanagari Digit Recognition using Quantum Machine Learning | Devanagari Digit Recognition mit Quantum Machine Learning | Devanagari Digit 使用量子机器学习识别 2506.09069v1 |
Authors (1): Sahaj Raj Malla
Handwritten digit recognition in regional scripts, such as Devanagari, is crucial for multilingual document digitization, educational tools, and the preservation of cultural heritage. The script’s complex structure and limited annotated datasets pose significant challenges to conventional models. This paper introduces the first hybrid quantum-classical architecture for Devanagari handwritten digit recognition, combining a convolutional neural network (CNN) for spatial feature extraction with a 10-qubit variational quantum circuit (VQC) for quantum-enhanced classification. Trained and evaluated on the Devanagari Handwritten Character Dataset (DHCD), the proposed model achieves a test accuracy of 99.80%, state-of-the-art among quantum implementations, and a test loss of 0.2893, with an average per-class F1-score of 0.9980. Compared to equivalent classical CNNs, our model demonstrates superior accuracy with significantly fewer parameters and enhanced robustness. By leveraging quantum principles such as superposition and entanglement, this work establishes a novel benchmark for regional script recognition, highlighting the promise of quantum machine learning (QML) in real-world, low-resource language settings.
nan
Article 1190
Title@2025-06-08 (7): DEF: Diffusion-augmented Ensemble Forecasting
Title: DEF: Diffusion-augmented Ensemble Forecasting | DEF: Diffusionsgenerierte Ensemble-Prognose | DEF: 扩散增强的集合预测 2506.07324v1 |
Authors (4): David Millard, Arielle Carr, Stéphane Gaudreault, Ali Baheri
We present DEF (Diffusion-augmented Ensemble Forecasting), a novel approach for generating initial condition perturbations. Modern approaches to initial condition perturbations are primarily designed for numerical weather prediction (NWP) solvers, limiting their applicability in the rapidly growing field of machine learning for weather prediction. Consequently, stochastic models in this domain are often developed on a case-by-case basis. We demonstrate that a simple conditional diffusion model can (1) generate meaningful structured perturbations, (2) be applied iteratively, and (3) utilize a guidance term to intuitively control the level of perturbation. This method enables the transformation of any deterministic neural forecasting system into a stochastic one. With our stochastic extended systems, we show that the model accumulates less error over long-term forecasts while producing meaningful forecast distributions. We validate our approach on the 5.625$^\circ$ ERA5 reanalysis dataset, which comprises atmospheric and surface variables over a discretized global grid, spanning from the 1960s to the present. On this dataset, our method demonstrates improved predictive performance along with reasonable spread estimates.
nan
Article 1191
Title@2025-06-08 (7): Watermarking Language Models with Error Correcting Codes
Title: Watermarking Language Models with Error Correcting Codes | Wasserzeichen-Sprachenmodelle mit Fehler-Korrektur-Codes | 错误校正代码的划水语言模型 2406.10281v4 |
Authors (4): Patrick Chao, Yan Sun, Edgar Dobriban, Hamed Hassani
Recent progress in large language models enables the creation of realistic machine-generated content. Watermarking is a promising approach to distinguish machine-generated text from human text, embedding statistical signals in the output that are ideally undetectable to humans. We propose a watermarking framework that encodes such signals through an error correcting code. Our method, termed robust binary code (RBC) watermark, introduces no noticeable degradation in quality. We evaluate our watermark on base and instruction fine-tuned models and find that our watermark is robust to edits, deletions, and translations. We provide an information-theoretic perspective on watermarking, a powerful statistical test for detection and for generating $p$-values, and theoretical guarantees. Our empirical findings suggest our watermark is fast, powerful, and robust, comparing favorably to the state-of-the-art.
nan
Article 1192
Title@2025-06-08 (7): Generative Modeling of Networked Time-Series via Transformer Architectures
Title: Generative Modeling of Networked Time-Series via Transformer Architectures | Generative Modellierung von vernetzten Zeitreihen über Transformer-Architekturen | 通过变换器架构生成网络化时序建模 2506.07312v1 |
Authors (1): Yusuf Elnady
Many security and network applications require large datasets to train machine learning models. Limited data access is a well-known problem in the security domain. Recent studies have shown the potential of Transformer models to enlarge datasets by synthesizing new samples, but the synthesized samples do not improve the models over the real data. To address this issue, we design an efficient transformer-based generative framework for time-series data that can be used to boost the performance of existing and new ML workflows. Our new transformer model achieves SOTA results. We design our model to generalize across different datasets and to produce high-quality samples.
nan
Article 1193
Title@2025-06-08 (7): Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference
Title: Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference | Paged Attention trifft auf FlexAttention: Locking Long-Context Efficiency in Deployed Inference | 专注:在部署推论中解锁长文本效率 2506.07311v1 |
Authors (5): Thomas Joshi, Herman Saini, Neil Dhillon, Antoni Viros i Martin, Kaoutar El Maghraoui
Large Language Models (LLMs) encounter severe memory inefficiencies during long-context inference due to conventional handling of key-value (KV) caches. In this work, we introduce a novel integration of PagedAttention with PyTorch’s FlexAttention, addressing internal fragmentation and inefficiencies associated with monolithic KV cache allocations. Implemented within IBM’s Foundation Model Stack (FMS), our fused attention kernel efficiently gathers scattered KV data. Our benchmarks on an NVIDIA L4 GPU (24GB) demonstrate significantly reduced inference latency, growing only linearly (~2x) with sequence length from 128 to 2048 tokens when utilizing a global KV cache, compared to exponential latency increases without caching. While peak memory usage remains largely unchanged for single-step evaluations (dominated by model weights and activations), paged attention causes minimal incremental memory usage, observable only at sequence lengths exceeding 2048 tokens due to its power-of-two cache allocations. We open-source the full implementation and discuss its implications for future long-context model deployment.
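The paging idea can be sketched in a few lines: each sequence owns a page table that maps logical token positions to fixed-size physical pages, and attention reads the scattered entries back in logical order. This is a plain-Python illustration only; the class `PagedKVCache` and its methods are hypothetical, and the fused FlexAttention kernel and the power-of-two allocation policy mentioned in the abstract are not modeled.

```python
class PagedKVCache:
    """Toy paged key-value cache with a shared physical page pool."""

    def __init__(self, page_size=4):
        self.page_size = page_size
        self.pages = []        # physical pool: each page holds page_size (k, v) slots
        self.page_tables = {}  # sequence id -> list of physical page indices
        self.lengths = {}      # sequence id -> number of tokens stored

    def append(self, seq_id, k, v):
        table = self.page_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n == len(table) * self.page_size:
            # All pages of this sequence are full: take a fresh page from the pool.
            self.pages.append([None] * self.page_size)
            table.append(len(self.pages) - 1)
        self.pages[table[n // self.page_size]][n % self.page_size] = (k, v)
        self.lengths[seq_id] = n + 1

    def gather(self, seq_id):
        """Return the (k, v) entries of one sequence in logical order."""
        table = self.page_tables.get(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        size = self.page_size
        return [self.pages[table[i // size]][i % size] for i in range(n)]


cache = PagedKVCache(page_size=2)
cache.append("a", "ka0", "va0")
cache.append("b", "kb0", "vb0")
cache.append("a", "ka1", "va1")
cache.append("a", "ka2", "va2")  # sequence "a" spills into a non-adjacent page
```

Because pages are allocated on demand, a sequence wastes at most one partially filled page, which is the internal-fragmentation argument for paging over a monolithic preallocated cache.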
nan
Article 1194
Title@2025-06-08 (7): PASS: Private Attributes Protection with Stochastic Data Substitution
Title: PASS: Private Attributes Protection with Stochastic Data Substitution | PASS: Schutz privater Attribute mit stochastischer Datenersetzung | PASS: 私人财产保护,用存储数据替代数据 2506.07308v1 |
Authors (5): Yizhuo Chen, Chun-Fu Chen, Hsiang Hsu, Shaohan Hu, Tarek Abdelzaher
Growing Machine Learning (ML) services require extensive collections of user data, which may inadvertently include people’s private information irrelevant to the services. Various studies have proposed protecting private attributes by removing them from the data while maintaining the utility of the data for downstream tasks. Nevertheless, as we theoretically and empirically show in the paper, these methods are severely vulnerable because of a common weakness rooted in their adversarial-training-based strategies. To overcome this limitation, we propose a novel approach, PASS, designed to stochastically substitute the original sample with another one according to certain probabilities, which is trained with a novel loss function soundly derived from an information-theoretic objective defined for utility-preserving private attributes protection. The comprehensive evaluation of PASS on various datasets of different modalities, including facial images, human activity sensory signals, and voice recording datasets, substantiates PASS’s effectiveness and generalizability.
nan
Article 1195
Title@2025-06-08 (7): Highly Fast Text Segmentation With Pairwise Markov Chains
Title: Highly Fast Text Segmentation With Pairwise Markov Chains | Hochschnelle Textsegmentierung mit paarweise Markov-Ketten | 配对马科夫链条的快速文本分割 2102.11037v2 |
Authors (4): Elie Azeraf, Emmanuel Monfrini, Emmanuel Vignon, Wojciech Pieczynski
The current trend in Natural Language Processing (NLP) is to use ever more extra data to build the best possible models. This implies higher computational costs and longer training times, difficulties in deployment, and concerns about these models’ carbon footprint, which point to a critical problem in the future. Against this trend, our goal is to develop NLP models that require no extra data and minimize training time. To do so, in this paper, we explore Markov chain models, Hidden Markov Chain (HMC) and Pairwise Markov Chain (PMC), for NLP segmentation tasks. We apply these models to three classic applications: POS Tagging, Named-Entity-Recognition, and Chunking. We develop an original method to adapt these models to the specific challenges of text segmentation, obtaining relevant performance with very short training and execution times. PMC achieves results equivalent to those obtained by Conditional Random Fields (CRF), one of the most widely applied models for these tasks when no extra data are used. Moreover, PMC has training times 30 times shorter than those of CRF, which validates this model given our objectives.
nan
Article 1196
Title@2025-06-08 (7): Sharpness-Aware Teleportation on Riemannian Manifolds
Title: Sharpness-Aware Teleportation on Riemannian Manifolds | Scharfheitsbewusste Teleportation auf Riemannschen Manifolds | Riemannian Manifolds 上的移动 2309.17215v2 |
Authors (7): Tuan Truong, Hoang-Phi Nguyen, Haocheng Luo, Tung Pham, Mehrtash Harandi, Dinh Phung, Trung Le
Recent studies highlight the effectiveness of flat minima in enhancing generalization, with sharpness-aware minimization (SAM) achieving state-of-the-art performance. Additionally, insights into the intrinsic geometry of the loss landscape have shown promise for improving model generalization. Building on these advancements, we introduce a novel sharpness-aware, geometry-aware teleportation mechanism to further enhance robustness and generalization. The core innovation of our approach is to decompose each iteration into a teleportation step within a local orbit and a sharpness-aware step that transitions between different orbits, leveraging the Riemannian quotient manifold. Our approach is grounded in a theoretical framework that analyzes the generalization gap between population loss and worst-case empirical loss within the context of Riemannian manifolds. To demonstrate the effectiveness of our method, we evaluate and compare our algorithm on diverse vision benchmarks with various datasets and Riemannian manifolds.
nan
Article 1197
Title@2025-06-08 (7): Uncertainty-Aware Strategies: A Model-Agnostic Framework for Robust Financial Optimization through Subsampling
Title: Uncertainty-Aware Strategies: A Model-Agnostic Framework for Robust Financial Optimization through Subsampling | Unsichere Strategien: Ein modell-agnostischer Rahmen für robuste finanzielle Optimierung durch Unterstichproben | 不确定性 – – 软件战略:通过子取样实现稳健金融优化的模型 – – 不可确定框架 2506.07299v1 |
Authors (4): Hans Buehler, Blanka Horvath, Yannick Limmer, Thorsten Schmidt
This paper addresses the challenge of model uncertainty in quantitative finance, where decisions in portfolio allocation, derivative pricing, and risk management rely on estimating stochastic models from limited data. In practice, the unavailability of the true probability measure forces reliance on an empirical approximation, and even small misestimations can lead to significant deviations in decision quality. Building on the framework of Klibanoff et al. (2005), we enhance the conventional objective - whether this is expected utility in an investing context or a hedging metric - by superimposing an outer “uncertainty measure”, motivated by traditional monetary risk measures, on the space of models. In scenarios where a natural model distribution is lacking or Bayesian methods are impractical, we propose an ad hoc subsampling strategy, analogous to bootstrapping in statistical finance and related to mini-batch sampling in deep learning, to approximate model uncertainty. To address the quadratic memory demands of naive implementations, we also present an adapted stochastic gradient descent algorithm that enables efficient parallelization. Through analytical, simulated, and empirical studies - including multi-period, real data and high-dimensional examples - we demonstrate that uncertainty measures outperform traditional mixture of measures strategies and our model-agnostic subsampling-based approach not only enhances robustness against model risk but also achieves performance comparable to more elaborate Bayesian methods.
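The two-layer structure described above (an inner objective evaluated under each candidate model, and an outer uncertainty measure over models obtained by subsampling) can be sketched as follows. Everything concrete here is an assumption for illustration: the mean-variance inner objective, the worst-tail average used as the outer measure, and all function names. The memory-efficient stochastic gradient scheme from the abstract is not shown.

```python
import random
import statistics


def subsample(data, num_models, size, seed=0):
    """Draw subsamples without replacement; each stands in for one candidate model."""
    rng = random.Random(seed)
    return [rng.sample(data, size) for _ in range(num_models)]


def mean_variance_utility(position, returns, risk_aversion=1.0):
    """Inner objective under one empirical model (hypothetical choice)."""
    mu = statistics.fmean(returns)
    var = statistics.pvariance(returns)
    return position * mu - 0.5 * risk_aversion * position ** 2 * var


def tail_average(values, alpha=0.25):
    """Outer uncertainty measure (hypothetical): mean of the worst alpha-fraction."""
    worst = sorted(values)[:max(1, int(alpha * len(values)))]
    return statistics.fmean(worst)


returns = [0.02, -0.01, 0.03, 0.00, -0.02, 0.04, 0.01, -0.03, 0.05, 0.02]
models = subsample(returns, num_models=8, size=5)
position = 1.0
per_model = [mean_variance_utility(position, m) for m in models]
plain = statistics.fmean(per_model)   # simple mixture over models
robust = tail_average(per_model)      # penalizes disagreement across models
```

The robust value can never exceed the plain average, so optimizing it over `position` favors decisions that perform acceptably under every subsampled model rather than only on average.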
nan
Article 1198
Title@2025-06-08 (7): MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
Title: MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache | MiniKV: Die Grenzen der LLM-Inferenz durch 2-Bit-Layer-Diskriminativ-KV-Cache drücken | MiniKV: 通过2-Bit 层分解 KV 缓存推进LLM 推断的限度 2411.18077v3 |
Authors (5): Akshat Sharma, Hangliang Ding, Jianping Li, Neel Dani, Minjia Zhang
Serving LLMs efficiently in practice has become exceptionally challenging due to their prohibitive memory and computation requirements. In this study, we investigate optimizing the KV cache, whose memory footprint poses a critical bottleneck in LLM inference, especially when dealing with long context tasks. To tackle the challenge, we introduce MiniKV, a KV cache optimization method that preserves long context task accuracy while significantly reducing KV cache size via a novel 2-bit layer-discriminative KV cache. More importantly, we develop specialized CUDA kernels to make MiniKV compatible with FlashAttention. Experiments on a wide range of long context tasks show that MiniKV achieves an 86% KV cache compression ratio while recovering over 98.5% of accuracy, outperforming state-of-the-art methods while achieving excellent measured system performance improvements.
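As a rough illustration of what 2-bit storage means, the sketch below applies uniform min/max quantization to one group of values and packs four codes into a byte. It is a generic scheme under stated assumptions, not MiniKV's layer-discriminative method or its CUDA kernels; the function names are hypothetical.

```python
def quantize_2bit(values):
    """Map each value to one of 4 levels spanning [min, max] of the group."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 3 if hi > lo else 1.0
    codes = [min(3, max(0, round((v - lo) / scale))) for v in values]
    return codes, lo, scale


def dequantize_2bit(codes, lo, scale):
    """Reconstruct approximate values from codes and the group's metadata."""
    return [lo + c * scale for c in codes]


def pack(codes):
    """Pack four 2-bit codes into each byte, lowest bits first."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, c in enumerate(codes[i:i + 4]):
            byte |= c << (2 * j)
        out.append(byte)
    return bytes(out)


values = [0.0, 1.0, 2.0, 3.0]
codes, lo, scale = quantize_2bit(values)
restored = dequantize_2bit(codes, lo, scale)
packed = pack(codes)
```

Each group also has to store `lo` and `scale`, so the real footprint depends on the group size as well as the bit width.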
nan
Article 1199
Title@2025-06-08 (7): On the kernel learning problem
Title: On the kernel learning problem | Zum Problem des Kernel-Lernens | 关于内核学习问题 2502.11665v2 |
Authors (2): Yang Li, Feng Ruan
The classical kernel ridge regression problem aims to find the best fit for the output $Y$ as a function of the input data $X\in \mathbb{R}^d$, with a fixed choice of regularization term imposed by a given choice of a reproducing kernel Hilbert space, such as a Sobolev space. Here we consider a generalization of the kernel ridge regression problem, by introducing an extra matrix parameter $U$, which aims to detect the scale parameters and the feature variables in the data, and thereby improve the efficiency of kernel ridge regression. This naturally leads to a nonlinear variational problem to optimize the choice of $U$. We study various foundational mathematical aspects of this variational problem, and in particular how this behaves in the presence of multiscale structures in the data.
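One way to write down the two problems contrasted above is given below. The notation is assumed for illustration: $\mathcal{H}$ is the fixed reproducing kernel Hilbert space, $\lambda > 0$ the regularization weight, and $U$ is taken to act on the input as $U^\top X$; the abstract itself only names $X$, $Y$, and $U$, so the exact parametrization in the paper may differ.

```latex
% Assumed notation: H is the fixed RKHS, lambda > 0 the ridge weight,
% and U enters through the transformed input U^T X.
\begin{align}
  \text{classical:}\quad
    & \min_{f \in \mathcal{H}} \;
      \tfrac{1}{2}\,\mathbb{E}\big[(Y - f(X))^2\big]
      + \tfrac{\lambda}{2}\,\|f\|_{\mathcal{H}}^2, \\
  \text{with matrix parameter:}\quad
    & \min_{U} \; \min_{f \in \mathcal{H}} \;
      \tfrac{1}{2}\,\mathbb{E}\big[(Y - f(U^\top X))^2\big]
      + \tfrac{\lambda}{2}\,\|f\|_{\mathcal{H}}^2 .
\end{align}
```

The inner minimization over $f$ is an ordinary ridge problem for each fixed $U$; the outer minimization over $U$ is what makes the overall variational problem nonlinear.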
nan
Article 1200
Title@2025-06-08 (7): Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI
Title: Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI | Multi-Step Guided Diffusion für die Bildwiederherstellung an Edge-Geräten: Auf dem Weg zur leichten Wahrnehmung in körpernaher KI | 多脚制导扩散,促进边缘设备图像恢复:向Embodied AI 中的轻量度感知 2506.07286v1 |
Authors (1): Aditya Chakravarty
Diffusion models have shown remarkable flexibility for solving inverse problems without task-specific retraining. However, existing approaches such as Manifold Preserving Guided Diffusion (MPGD) apply only a single gradient update per denoising step, limiting restoration fidelity and robustness, especially in embedded or out-of-distribution settings. In this work, we introduce a multistep optimization strategy within each denoising timestep, significantly enhancing image quality, perceptual accuracy, and generalization. Our experiments on super-resolution and Gaussian deblurring demonstrate that increasing the number of gradient updates per step improves LPIPS and PSNR with minimal latency overhead. Notably, we validate this approach on a Jetson Orin Nano using degraded ImageNet and a UAV dataset, showing that MPGD, originally trained on face datasets, generalizes effectively to natural and aerial scenes. Our findings highlight MPGD’s potential as a lightweight, plug-and-play restoration module for real-time visual perception in embodied AI agents such as drones and mobile robots.
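The effect of taking several gradient updates inside one denoising step can be seen on a toy linear inverse problem. The sketch below refines a clean-image estimate against a measurement `y` using a data-fidelity gradient; the averaging operator, the step size, and all function names are assumptions, and the diffusion model and MPGD's manifold projection are left out.

```python
def forward(x):
    """Toy degradation A: 2x downsampling by averaging adjacent pairs."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def loss(x, y):
    """Data-fidelity term 0.5 * ||A x - y||^2."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(forward(x), y))


def grad(x, y):
    """Gradient of the loss, A^T (A x - y)."""
    g = []
    for a, b in zip(forward(x), y):
        g.extend([(a - b) / 2, (a - b) / 2])
    return g


def guided_refine(x0_hat, y, step_size, num_steps):
    """Apply num_steps gradient updates to the current clean-image estimate."""
    x = list(x0_hat)
    for _ in range(num_steps):
        x = [xi - step_size * gi for xi, gi in zip(x, grad(x, y))]
    return x


y = [1.0, 2.0]                  # observed low-resolution signal
x0_hat = [0.0, 0.0, 0.0, 0.0]   # estimate produced at some denoising timestep
one_step = guided_refine(x0_hat, y, step_size=0.5, num_steps=1)
five_steps = guided_refine(x0_hat, y, step_size=0.5, num_steps=5)
```

With `num_steps=1` this reduces to the single-update baseline; each extra update costs one more gradient evaluation, which is the latency trade-off the abstract reports as small.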
nan
Article 1201
Title@2025-06-08 (7): Defending Against Diverse Attacks in Federated Learning Through Consensus-Based Bi-Level Optimization
Title: Defending Against Diverse Attacks in Federated Learning Through Consensus-Based Bi-Level Optimization | Verteidigung gegen verschiedene Angriffe im Federated Learning durch Konsens-basierte Bi-Level-Optimierung | 通过基于共识的双级优化,在通过共识实现的两级最佳化,在联邦学习中防范多种袭击 2412.02535v2 |
Authors (5): Nicolás García Trillos, Aditya Kumar Akash, Sixu Li, Konstantin Riedl, Yuhua Zhu
Adversarial attacks pose significant challenges in many machine learning applications, particularly in the setting of distributed training and federated learning, where malicious agents seek to corrupt the training process with the goal of jeopardizing and compromising the performance and reliability of the final models. In this paper, we address the problem of robust federated learning in the presence of such attacks by formulating the training task as a bi-level optimization problem. We conduct a theoretical analysis of the resilience of consensus-based bi-level optimization (CB$^2$O), an interacting multi-particle metaheuristic optimization method, in adversarial settings. Specifically, we provide a global convergence analysis of CB$^2$O in mean-field law in the presence of malicious agents, demonstrating the robustness of CB$^2$O against a diverse range of attacks. Thereby, we offer insights into how specific hyperparameter choices help mitigate adversarial effects. On the practical side, we extend CB$^2$O to the clustered federated learning setting by proposing FedCB$^2$O, a novel interacting multi-particle system, and design a practical algorithm that addresses the demands of real-world applications. Extensive experiments demonstrate the robustness of the FedCB$^2$O algorithm against label-flipping attacks in decentralized clustered federated learning scenarios, showcasing its effectiveness in practical contexts.
nan
Article 1202
Title@2025-06-08 (7): Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback
Title: Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback | Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback | 超前优先优化(EGPO):超越纳什最后的融合(Nash) 从人类反馈中学习 2503.08942v2 |
Authors (3): Runlong Zhou, Maryam Fazel, Simon S. Du
Reinforcement learning from human feedback (RLHF) has become essential for improving language model capabilities, but traditional approaches rely on the assumption that human preferences follow a transitive Bradley-Terry model. This assumption fails to capture the non-transitive nature of populational human preferences. Nash learning from human feedback (NLHF), targeting non-transitive preferences, is a problem of computing the Nash equilibrium (NE) of the two-player constant-sum game defined by the human preference. We introduce Extragradient preference optimization (EGPO), a novel algorithm for NLHF achieving last-iterate linear convergence to the NE of KL-regularized games and polynomial convergence to the NE of original games, while being robust to noise. Unlike previous approaches that rely on nested optimization, we derive an equivalent implementation using gradients of an online variant of the identity preference optimization (IPO) loss, enabling more faithful implementation for neural networks. Our empirical evaluations demonstrate EGPO’s superior performance over baseline methods when training for the same number of epochs, as measured by pairwise win-rates using the ground truth preference. These results validate both the theoretical strengths and practical advantages of EGPO for language model alignment with non-transitive human preferences.
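Why an extragradient update matters for last-iterate convergence shows up already on the simplest two-player game, min over x and max over y of x*y, whose equilibrium is (0, 0). The sketch compares plain simultaneous gradient descent-ascent with the classical extragradient step. It is a textbook toy, not EGPO itself: there is no preference model, no KL regularization, and no IPO-style loss.

```python
def gda_step(x, y, lr):
    """Simultaneous gradient descent (on x) and ascent (on y) for f(x, y) = x * y."""
    return x - lr * y, y + lr * x


def extragradient_step(x, y, lr):
    """Look ahead with the current gradients, then update with look-ahead gradients."""
    x_half, y_half = x - lr * y, y + lr * x
    return x - lr * y_half, y + lr * x_half


def run(step, steps=100, lr=0.5):
    """Iterate from (1, 1) and return the last iterate's distance to equilibrium."""
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = step(x, y, lr)
    return (x * x + y * y) ** 0.5


gda_distance = run(gda_step)
eg_distance = run(extragradient_step)
```

Plain descent-ascent spirals outward on this game, while the extragradient iterates contract toward the equilibrium for any step size below 1.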
nan
Article 1203
Title@2025-06-08 (7): Selective Prompt Anchoring for Code Generation
Title: Selective Prompt Anchoring for Code Generation | Selektive Prompt-Ankerung für die Code-Generierung | 用于代码生成的选择性提示锚定 2408.09121v5 |
Authors (2): Yuan Tian, Tianyi Zhang
Recent advances in large language models (LLMs) have transformed software development by automatically generating code from natural language. Yet challenges remain in generating fully correct code that aligns with user intent. Our study reveals that LLMs tend to pay less attention to user prompts as more code tokens are generated. We hypothesize that this attention dilution issue is an important reason for code generation errors. To mitigate this issue, we propose Selective Prompt Anchoring (SPA) to guide code LLMs to pay more attention to user intent when generating code. We evaluate SPA using six base LLMs across six benchmarks. Our results demonstrate that SPA enhances Pass@1 by up to 12.9%, consistently outperforming SOTA code generation methods in all settings. Our code is available at https://github.com/magic-YuanTian/Selective-Prompt-Anchoring.
nan
Article 1204
Title@2025-06-08 (7): Machine Learning Should Maximize Welfare, but Not by (Only) Maximizing Accuracy
Title: Machine Learning Should Maximize Welfare, but Not by (Only) Maximizing Accuracy | Maschinelles Lernen sollte Wohlfahrt maximieren, aber nicht durch (nur) Maximierung der Genauigkeit | 机器学习应该最大限度地提高福利,但不能(仅仅)最大限度地提高准确度 2502.11981v2 |
Authors (2): Nir Rosenfeld, Haifeng Xu
Decades of research in machine learning have given us powerful tools for making accurate predictions. This has made such tools appealing for use in social settings and on human inputs. Yet despite a lack of justification for why the generic approach of accuracy maximization can or should improve our collective well-being – and mounting evidence of likely adverse outcomes – it remains the widespread default. This position paper asserts that for machine learning to become socially beneficial, it must be embedded within a broader economic framework that explicitly aims to maximize social welfare. The field of welfare economics asks: how should we allocate limited resources among self-interested agents to maximize overall benefits? We contend that this perspective applies to many contemporary applications of machine learning in social contexts, and advocate for its adoption. Rather than disposing of prediction, we propose to leverage this forte of machine learning towards welfare maximization. We demonstrate this idea by portraying a conceptual framework that gradually transitions from accuracy maximization (with awareness of welfare) to welfare maximization (via accurate prediction). We detail applications and use-cases for which this framework can be effective, identify technical challenges and practical opportunities, and highlight future avenues worth pursuing.
nan
Article 1205
Title@2025-06-08 (7): Tokenized Bandit for LLM Decoding and Alignment
Title: Tokenized Bandit for LLM Decoding and Alignment | Tokenized Bandit für LLM Decodierung und Ausrichtung | 用于LLMM 解码和对齐的定義土匪 2506.07276v1 |
Authors (4): Suho Shin, Chenghao Yang, Haifeng Xu, Mohammad T. Hajiaghayi
We introduce the tokenized linear bandit (TLB) and multi-armed bandit (TMAB), variants of linear and stochastic multi-armed bandit problems inspired by LLM decoding and alignment. In these problems, at each round $t \in [T]$, a user submits a query (context), and the decision maker (DM) sequentially selects a token irrevocably from a token set. Once the sequence is complete, the DM observes a random utility from the user, whose expectation is represented by a sequence function mapping the chosen token sequence to a nonnegative real value that depends on the query. In both problems, we first show that learning is impossible without any structure on the sequence function. We introduce a natural assumption, diminishing distance with more commons (DDMC), and propose algorithms with regret $\tilde{O}(L\sqrt{T})$ and $\tilde{O}(L\sqrt{T^{2/3}})$ for TLB and TMAB, respectively. As a side product, we obtain the (almost) optimality of greedy decoding for LLM decoding under DDMC, which justifies the unreasonable effectiveness of greedy decoding in several tasks. This also has an immediate application to decoding-time LLM alignment, when the misaligned utility can be represented as the frozen LLM’s utility and a linearly realizable latent function. We finally validate our algorithm’s performance empirically as well as verify our assumptions using synthetic and real-world datasets.
nan
Article 1206
Title@2025-06-08 (7): A Cramér-von Mises Approach to Incentivizing Truthful Data Sharing
Title: A Cramér-von Mises Approach to Incentivizing Truthful Data Sharing | Ein Cramér-von-Mises-Ansatz zur Förderung der wahrheitsgemäßen Datenfreigabe | 激励真相数据共享的Cramér-von Mises方法 2506.07272v1 |
Authors (5): Alex Clinton, Thomas Zeng, Yiding Chen, Xiaojin Zhu, Kirthevasan Kandasamy
Modern data marketplaces and data sharing consortia increasingly rely on incentive mechanisms to encourage agents to contribute data. However, schemes that reward agents based on the quantity of submitted data are vulnerable to manipulation, as agents may submit fabricated or low-quality data to inflate their rewards. Prior work has proposed comparing each agent’s data against others’ to promote honesty: when others contribute genuine data, the best way to minimize discrepancy is to do the same. Yet prior implementations of this idea rely on very strong assumptions about the data distribution (e.g. Gaussian), limiting their applicability. In this work, we develop reward mechanisms based on a novel, two-sample test inspired by the Cramér-von Mises statistic. Our methods strictly incentivize agents to submit more genuine data, while disincentivizing data fabrication and other types of untruthful reporting. We establish that truthful reporting constitutes a (possibly approximate) Nash equilibrium in both Bayesian and prior-agnostic settings. We theoretically instantiate our method in three canonical data sharing problems and show that it relaxes key assumptions made by prior work. Empirically, we demonstrate that our mechanism incentivizes truthful data sharing via simulations and on real-world language and image data.
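For orientation, the classical two-sample Cramér-von Mises criterion can be sketched as below. This is the textbook statistic, not the paper's novel test or its reward mechanism; the function names `ecdf` and `cvm_two_sample` are illustrative choices.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function z -> fraction of points <= z."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda z: bisect_right(ordered, z) / n


def cvm_two_sample(x, y):
    """Classical two-sample Cramér-von Mises criterion.

    T = n*m / (n+m)^2 * sum over pooled points z of (F_n(z) - G_m(z))^2,
    where F_n and G_m are the empirical CDFs of x and y.
    """
    n, m = len(x), len(y)
    f, g = ecdf(x), ecdf(y)
    pooled = list(x) + list(y)
    total = sum((f(z) - g(z)) ** 2 for z in pooled)
    return n * m / (n + m) ** 2 * total
```

A value near zero indicates that the two samples have similar empirical distributions; well-separated samples give a larger value. A discrepancy of this kind is what a comparison-based reward would penalize.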
nan
Article 1207
Title@2025-06-08 (7): Machine Learning-Based Self-Localization Using Internal Sensors for Automating Bulldozers
Title: Machine Learning-Based Self-Localization Using Internal Sensors for Automating Bulldozers | Maschinelle Learning-basierte Selbstlokalisierung mittels interner Sensoren zur Automatisierung von Bulldozern | 利用内部传感器实现推推推车自动化 2506.07271v1 |
Authors (7): Hikaru Sawafuji, Ryota Ozaki, Takuto Motomura, Toyohisa Matsuda, Masanori Tojima, Kento Uchida, Shinichi Shirakawa
Self-localization is an important technology for automating bulldozers. Conventional bulldozer self-localization systems rely on RTK-GNSS (Real Time Kinematic-Global Navigation Satellite Systems). However, RTK-GNSS signals are sometimes lost in certain mining conditions. Therefore, self-localization methods that do not depend on RTK-GNSS are required. In this paper, we propose a machine learning-based self-localization method for bulldozers. The proposed method consists of two steps: estimating local velocities using a machine learning model from internal sensors, and incorporating these estimates into an Extended Kalman Filter (EKF) for global localization. We also created a novel dataset for bulldozer odometry and conducted experiments across various driving scenarios, including slalom, excavation, and driving on slopes. The results demonstrated that the proposed self-localization method suppressed the accumulation of position errors compared to kinematics-based methods, especially when slip occurred. Furthermore, this study showed that bulldozer-specific sensors, such as blade position sensors and hydraulic pressure sensors, contributed to improving self-localization accuracy.
nan
Article 1208
Title@2025-06-08 (7): PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference
Title: PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference | PecSched: Präventive und effiziente Clusterplanung für LLM-Inferenz | 预测:先发制人和高效率的LLM推理群集安排 2409.15104v2 |
Authors (2): Zeyu Zhang, Haiying Shen
The scaling of transformer-based Large Language Models (LLMs) has significantly expanded their context lengths, enabling applications where inputs exceed 100K tokens. Our analysis of a recent Azure LLM inference trace reveals a highly skewed long-tail distribution of input lengths, with approximately 80% of inputs shorter than 2K tokens. Long inputs constitute only a small fraction. Existing cluster-level LLM scheduling strategies, including First-In-First-Out (FIFO), reservation-based, and priority-based approaches, primarily target short-input requests with lengths below 2K and fail to address this heterogeneity, leading to inefficiencies such as head-of-line blocking, resource underutilization, and starvation of long-input requests. We propose PecSched, a Preemptive and Efficient Cluster SCHEDuling system for LLM inference. PecSched introduces the following key techniques: 1) preemptive scheduling that prioritizes short-input requests for their performance; 2) coordinated prefill-decode colocation and disaggregation, which reduces both the duration and frequency of preemptions; 3) fast Sequence Parallelism (SP) that minimizes the prefill time of long-input requests to further reduce the likelihood and frequency of preemptions. Evaluations based on Azure LLM inference trace show that, compared to state-of-the-art cluster-level LLM inference schedulers, PecSched reduces the 99th percentile queueing delay of short-input requests by up to 92% and improves their throughput by up to 595%, without significantly affecting the Job Completion Time (JCT) of long-input requests. We open-sourced our code.
nan
Article 1209
Title@2025-06-08 (7): Universality of conformal prediction under the assumption of randomness
Title: Universality of conformal prediction under the assumption of randomness | Universalität der konformen Vorhersage unter der Annahme der Zufälligkeit | 假设随机性的情况下一致预测的普遍性 2502.19254v2 |
Authors (1): Vladimir Vovk
Conformal predictors provide set or functional predictions that are valid under the assumption of randomness, i.e., under the assumption of independent and identically distributed data. The question asked in this paper is whether there are predictors that are valid in the same sense under the assumption of randomness and that are more efficient than conformal predictors. The answer is that the class of conformal predictors is universal in that only limited gains in predictive efficiency are possible. The previous work in this area has relied on the algorithmic theory of randomness and so involved unspecified constants, whereas this paper’s results are much more practical. They are also shown to be optimal in some respects.
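As a concrete instance of a conformal predictor, the following is a minimal sketch of split conformal prediction for regression with absolute-residual scores. It is a standard construction offered for illustration, not the paper's analysis; the function names and the default `alpha` are assumptions.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest score, or infinity if that rank exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def split_conformal_interval(cal_preds, cal_targets, new_pred, alpha=0.1):
    """Prediction interval around `new_pred` built from held-out calibration residuals.

    Under the randomness (i.i.d.) assumption, the interval covers the true
    target with probability at least 1 - alpha.
    """
    scores = [abs(y - p) for p, y in zip(cal_preds, cal_targets)]
    q = conformal_quantile(scores, alpha)
    return new_pred - q, new_pred + q
```

With too few calibration points for the requested `alpha`, the quantile is infinite and the interval becomes the whole real line, which preserves validity at the cost of efficiency.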
nan
Article 1210
Title@2025-06-08 (7): FDC: Fast KV Dimensionality Compression for Efficient LLM Inference
Title: FDC: Fast KV Dimensionality Compression for Efficient LLM Inference | FDC: Schnelle KV-Dimensionalitätskompression für effiziente LLM-Inferenz | FDC: 高效 LLM 推断快速 KV 尺寸压缩 2408.04107v3 |
Authors (2): Zeyu Zhang, Haiying Shen
In large-language models, memory constraints in the Key-Value Cache (KVC) pose a challenge during inference. In this work, we propose FDC, a fast KV dimensionality compression system that eliminates the decompression overhead incurred in the existing KV dimensionality compression system, Palu, and reduces attention time. Moreover, FDC employs adaptive compression, tailoring KV compression rates across heads and layers based on their contributions to inference to maximize overall compression while maintaining an accuracy loss constraint. Additionally, FDC enhances the attention kernel to balance the uneven workloads caused by the adaptive compression approach to further reduce attention computation latency. Comprehensive experiments demonstrate that compared to Palu, FDC can reduce Job Completion Time (JCT) by up to 64%, and delivers up to 1.97X throughput under the same latency, while maintaining 99% of the accuracy without compression. When state-of-the-art eviction and quantization methods are combined with FDC, they exhibit similar improvements compared to those combined with Palu. We open-sourced the code.
nan
Article 1211
Title@2025-06-08 (7): Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods
Title: Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods | Ensemble-MIX: Verbesserung der Probeneffizienz in Multi-Agent RL mit Ensemble-Methoden | 组合-MIX:利用组合方法提高多代理RL的取样效率 2506.02841v2 |
Authors (2): Tom Danino, Nahum Shimkin
Multi-agent reinforcement learning (MARL) methods have achieved state-of-the-art results on a range of multi-agent tasks. Yet, MARL algorithms typically require significantly more environment interactions than their single-agent counterparts to converge, a problem exacerbated by the difficulty in exploring over a large joint action space and the high variance intrinsic to MARL environments. To tackle these issues, we propose a novel algorithm that combines a decomposed centralized critic with decentralized ensemble learning, incorporating several key contributions. The main component in our scheme is a selective exploration method that leverages ensemble kurtosis. We extend the global decomposed critic with a diversity-regularized ensemble of individual critics and utilize its excess kurtosis to guide exploration toward high-uncertainty states and actions. To improve sample efficiency, we train the centralized critic with a novel truncated variation of the TD($\lambda$) algorithm, enabling efficient off-policy learning with reduced variance. On the actor side, our suggested algorithm adapts the mixed samples approach to MARL, mixing on-policy and off-policy loss functions for training the actors. This approach balances between stability and efficiency and outperforms purely off-policy learning. The evaluation shows our method outperforms state-of-the-art baselines on standard MARL benchmarks, including a variety of SMAC II maps.
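The excess kurtosis of an ensemble's value estimates can be computed as in the sketch below. The selection rule `most_uncertain_action` and the dictionary layout of `ensemble_q` are hypothetical illustrations of how such a signal might steer exploration; they are not the paper's algorithm.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2^2 - 3 (0.0 when the values have no spread)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0.0:
        return 0.0
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


def most_uncertain_action(ensemble_q):
    """Pick the action whose ensemble Q-estimates have the largest excess kurtosis.

    `ensemble_q` maps each action to the list of Q-values predicted by the
    ensemble members (a hypothetical layout chosen for this example).
    """
    return max(ensemble_q, key=lambda action: excess_kurtosis(ensemble_q[action]))
```

Heavy-tailed disagreement, where a few ensemble members deviate strongly from the rest, yields a high excess kurtosis, whereas evenly split estimates yield a negative one.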
nan
Article 1212
Title@2025-06-08 (7): BG-HOP: A Bimanual Generative Hand-Object Prior
Title: BG-HOP: A Bimanual Generative Hand-Object Prior | BG-HOP: Bimanuelle Generative Hand-Objekt-Vorhersage | BG-HOP: 双体生成手向对象前置 2506.09068v1 |
Authors (3): Sriram Krishna, Sravan Chittupalli, Sungjae Park
In this work, we present BG-HOP, a generative prior that seeks to model bimanual hand-object interactions in 3D. We address the challenge of limited bimanual interaction data by extending existing single-hand generative priors, demonstrating preliminary results in capturing the joint distribution of hands and objects. Our experiments showcase the model’s capability to generate bimanual interactions and synthesize grasps for given objects. We make code and models publicly available.
nan
Article 1213
Title@2025-06-08 (7): Counterfactual inference in sequential experiments
Title: Counterfactual inference in sequential experiments | Gegenfaktische Inferenz in sequentiellen Experimenten | 连续实验中的反事实推论 2202.06891v5 |
Authors (6): Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah
We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale – mean outcome under different treatments for each unit and each time – with minimal assumptions on the adaptive treatment policy. Without any structural assumptions on the counterfactual means, this challenging task is infeasible due to more unknowns than observed data points. To make progress, we introduce a latent factor model over the counterfactual means that serves as a non-parametric generalization of the non-linear mixed effects model and the bilinear latent factor model considered in prior works. For estimation, we use a non-parametric method, namely a variant of nearest neighbors, and establish a non-asymptotic high probability error bound for the counterfactual mean for each unit and each time. Under regularity conditions, this bound leads to asymptotically valid confidence intervals for the counterfactual mean as the number of units and time points grows to $\infty$ together at suitable rates. We illustrate our theory via several simulations and a case study involving data from a mobile health clinical trial HeartSteps.
nan
Article 1214
Title@2025-06-08 (7): RADAR: Recall Augmentation through Deferred Asynchronous Retrieval
Title: RADAR: Recall Augmentation through Deferred Asynchronous Retrieval | RADAR: Erinnerung an Augmentation durch verzögerte asynchrone Retrieval | RADAR:通过推迟的非同步回溯性回溯性增量 2506.07261v1 |
Authors (3): Amit Jaspal, Qian Dang, Ajantha Ramineni
Modern large-scale recommender systems employ a multi-stage ranking funnel (Retrieval, Pre-ranking, Ranking) to balance engagement and computational constraints (latency, CPU). However, the initial retrieval stage, often relying on efficient but less precise methods like K-Nearest Neighbors (KNN), struggles to effectively surface the most engaging items from billion-scale catalogs, particularly distinguishing highly relevant and engaging candidates from merely relevant ones. We introduce Recall Augmentation through Deferred Asynchronous Retrieval (RADAR), a novel framework that leverages asynchronous, offline computation to pre-rank a significantly larger candidate set for users using the full complexity ranking model. These top-ranked items are stored and utilized as a high-quality retrieval source during online inference, bypassing online retrieval and pre-ranking stages for these candidates. We demonstrate through offline experiments that RADAR significantly boosts recall (2X Recall@200 vs DNN retrieval baseline) by effectively combining a larger retrieved candidate set with a more powerful ranking model. Online A/B tests confirm a +0.8% lift in topline engagement metrics, validating RADAR as a practical and effective method to improve recommendation quality under strict online serving constraints.
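The Recall@K metric cited above can be computed as in this minimal sketch. The function name and argument layout are illustrative; the paper's exact evaluation protocol is not specified here.

```python
def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant items that appear among the top-k ranked items."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = len(set(ranked_items[:k]) & relevant)
    return hits / len(relevant)
```

For example, a "2X Recall@200" result would mean this quantity, evaluated at `k=200`, is twice that of the baseline retrieval source.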
nan
Article 1215
Title@2025-06-08 (7): ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition
Title: ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition | ALINE: Gemeinsame Amortisation für Bayesian Inferenz und aktive Datenerfassung | AlINE: 贝耶斯推断和主动数据采集联合摊销 2506.07259v1 |
Authors (5): Daolang Huang, Xinyi Wen, Ayush Bharti, Samuel Kaski, Luigi Acerbi
Many critical applications, from autonomous scientific discovery to personalized medicine, demand systems that can both strategically acquire the most informative data and instantaneously perform inference based upon it. While amortized methods for Bayesian inference and experimental design offer part of the solution, neither approach is optimal in the most general and challenging task, where new data needs to be collected for instant inference. To tackle this issue, we introduce the Amortized Active Learning and Inference Engine (ALINE), a unified framework for amortized Bayesian inference and active data acquisition. ALINE leverages a transformer architecture trained via reinforcement learning with a reward based on self-estimated information gain provided by its own integrated inference component. This allows it to strategically query informative data points while simultaneously refining its predictions. Moreover, ALINE can selectively direct its querying strategy towards specific subsets of model parameters or designated predictive tasks, optimizing for posterior estimation, data prediction, or a mixture thereof. Empirical results on regression-based active learning, classical Bayesian experimental design benchmarks, and a psychometric model with selectively targeted parameters demonstrate that ALINE delivers both instant and accurate inference along with efficient selection of informative points.
nan
Article 1216
Title@2025-06-08 (7): Can the Rookies Cut the Tough Cookie? Exploring the Use of LLMs for SQL Equivalence Checking
Title: Can the Rookies Cut the Tough Cookie? Exploring the Use of LLMs for SQL Equivalence Checking | Können die Rookies das harte Cookie ausschneiden? Erforschen der Verwendung von LLMs für SQL-Equivalenzprüfung | Rookies 能够切开硬饼干吗? 探索 SQL 等效检查使用 LLMs 的情况 2412.05561v2 |
Authors (2): Rajat Singh, Srikanta Bedathur
Equivalence checking of SQL queries is an intractable problem often encountered in settings ranging from grading SQL submissions to debugging query optimizers. Despite recent work toward developing practical solutions, only simple queries written using a small subset of SQL are supported, leaving the equivalence checking of sophisticated SQL queries at the mercy of intensive, potentially error-prone, manual analysis. In this paper, we explore how LLMs can be used to reason with SQL queries to address this challenging problem. Towards this, we introduce a novel, realistic, and sufficiently complex benchmark called SQLEquiQuest for SQL query equivalence checking that reflects real-world settings. We establish strong baselines for SQL equivalence checking by leveraging the ability of LLMs to reason with SQL queries. We conduct a detailed evaluation of several state-of-the-art LLMs using various prompting strategies and carefully constructed in-context learning examples, including logical plans generated by SQL query processors. Our empirical evaluation shows that LLMs go well beyond the current capabilities of formal models for SQL equivalence, going from a mere 30% supported query pairs to full coverage, achieving up to 82% accuracy on Spider+DIN. However, a critical limitation of LLMs revealed by our analysis is that they exhibit a strong bias for equivalence predictions, with consistently poor performance over non-equivalent pairs, opening a new direction for potential future research.
nan
Article 1217
Title@2025-06-08 (7): AMPO: Active Multi-Preference Optimization for Self-play Preference Selection
Title: AMPO: Active Multi-Preference Optimization for Self-play Preference Selection | AMPO: Aktive Multi-Preference-Optimierung für Selbstspiel-Preference-Auswahl | AMPO: 主动的自玩首选多优先优化 2502.18293v2 |
Authors (5): Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan
Multi-preference optimization enriches language-model alignment beyond pairwise preferences by contrasting entire sets of helpful and undesired responses, thereby enabling richer training signals for large language models. During self-play alignment, these models often produce numerous candidate answers per query, rendering it computationally infeasible to include all responses in the training objective. In this work, we propose Active Multi-Preference Optimization (AMPO), a novel approach that combines on-policy generation, a multi-preference group-contrastive loss, and active subset selection. Specifically, we score and embed large candidate pools of responses and then select a small, yet informative, subset that covers reward extremes and distinct semantic clusters for preference optimization. Our contrastive training scheme is capable of identifying not only the best and worst answers but also subtle, underexplored modes that are crucial for robust alignment. Theoretically, we provide guarantees for expected reward maximization using our active selection method, and empirically, AMPO achieves state-of-the-art results on AlpacaEval using Llama 8B and Mistral 7B. We release our datasets at https://huggingface.co/Multi-preference-Optimization.
nan
Article 1218
Title@2025-06-08 (7): DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products
Title: DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products | DeltaProdukt: Verbesserung der State-Tracking in linearen RNNs über Haushaltsprodukte | DeltaProduction:通过家用产品改进国家通过家用产品对Linear RNNNs的跟踪 2502.10297v5 |
Authors (6): Julien Siems, Timur Carstensen, Arber Zela, Frank Hutter, Massimiliano Pontil, Riccardo Grazzi
Linear Recurrent Neural Networks (linear RNNs) have emerged as competitive alternatives to Transformers for sequence modeling, offering efficient training and linear-time inference. However, existing architectures face a fundamental trade-off between expressivity and efficiency, dictated by the structure of their state-transition matrices. Diagonal matrices, used in models such as Mamba, GLA, or mLSTM, yield fast runtime but have limited expressivity. To address this, recent architectures such as DeltaNet and RWKV-7 adopted a diagonal plus rank-1 structure, which allows simultaneous token and channel mixing, improving associative recall and, as recently shown, state-tracking when allowing negative eigenvalues in the state-transition matrices. Building on the interpretation of DeltaNet’s recurrence as performing one step of online gradient descent per token on an associative recall loss, we introduce DeltaProduct, which instead takes multiple ($n_h$) steps per token. This naturally leads to diagonal plus rank-$n_h$ state-transition matrices, formed as products of $n_h$ generalized Householder transformations, providing a tunable mechanism to balance expressivity and efficiency. We provide a detailed theoretical characterization of the state-tracking capability of DeltaProduct in finite precision, showing how it improves by increasing $n_h$. Our extensive experiments demonstrate that DeltaProduct outperforms DeltaNet in both state-tracking and language modeling, while also showing significantly improved length extrapolation capabilities.
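The structure of such a state-transition matrix can be sketched as a product of generalized Householder factors of the form I - beta * k k^T. This is a minimal illustration that assumes an identity diagonal part and unit-normalized keys; the helper names are hypothetical and this is not the authors' implementation.

```python
import math


def identity(d):
    """d x d identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def generalized_householder(key, beta):
    """I - beta * u u^T, where u is `key` scaled to unit norm and beta lies in [0, 2]."""
    norm = math.sqrt(sum(v * v for v in key))
    u = [v / norm for v in key]
    d = len(u)
    return [[(1.0 if i == j else 0.0) - beta * u[i] * u[j] for j in range(d)]
            for i in range(d)]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transition_matrix(keys, betas):
    """Product of len(keys) generalized Householder factors (identity plus rank <= len(keys))."""
    result = identity(len(keys[0]))
    for key, beta in zip(keys, betas):
        result = matmul(generalized_householder(key, beta), result)
    return result
```

With one factor this reduces to a DeltaNet-style identity-plus-rank-1 matrix. With two orthogonal keys and `beta = 2`, the product in two dimensions is minus the identity, a rotation that no single factor of this form can produce, which illustrates the expressivity gained from additional steps.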
nan
Article 1219
Title@2025-06-08 (7): Horizon Reduction Makes RL Scalable
Title: Horizon Reduction Makes RL Scalable | Horizontreduzierung macht RL skalierbar | 降低地平线使RL可缩放 2506.04168v2 |
Authors (6): Seohong Park, Kevin Frans, Deepinder Mann, Benjamin Eysenbach, Aviral Kumar, Sergey Levine
In this work, we study the scalability of offline reinforcement learning (RL) algorithms. In principle, a truly scalable offline RL algorithm should be able to solve any given problem, regardless of its complexity, given sufficient data, compute, and model capacity. We investigate if and how current offline RL algorithms match up to this promise on diverse, challenging, previously unsolved tasks, using datasets up to 1000x larger than typical offline RL datasets. We observe that despite scaling up data, many existing offline RL algorithms exhibit poor scaling behavior, saturating well below the maximum performance. We hypothesize that the horizon is the main cause behind the poor scaling of offline RL. We empirically verify this hypothesis through several analysis experiments, showing that long horizons indeed present a fundamental barrier to scaling up offline RL. We then show that various horizon reduction techniques substantially enhance scalability on challenging tasks. Based on our insights, we also introduce a minimal yet scalable method named SHARSA that effectively reduces the horizon. SHARSA achieves the best asymptotic performance and scaling behavior among our evaluation methods, showing that explicitly reducing the horizon unlocks the scalability of offline RL. Code: https://github.com/seohongpark/horizon-reduction
nan
Article 1220
Title@2025-06-08 (7): Regretful Decisions under Label Noise
Title: Regretful Decisions under Label Noise | Bedauerliche Entscheidungen im Rahmen von Label Noise | 标签噪音下的遗憾决定 2504.09330v2 |
Authors (4): Sujay Nagaraj, Yang Liu, Flavio P. Calmon, Berk Ustun
Machine learning models are routinely used to support decisions that affect individuals – be it to screen a patient for a serious illness or to gauge their response to treatment. In these tasks, we are limited to learning models from datasets with noisy labels. In this paper, we study the instance-level impact of learning under label noise. We introduce a notion of regret for this regime, which measures the number of unforeseen mistakes due to noisy labels. We show that standard approaches to learning under label noise can return models that perform well at a population-level while subjecting individuals to a lottery of mistakes. We present a versatile approach to estimate the likelihood of mistakes at the individual-level from a noisy dataset by training models over plausible realizations of datasets without label noise. This is supported by a comprehensive empirical study of label noise in clinical prediction tasks. Our results reveal how failure to anticipate mistakes can compromise model reliability and adoption – we demonstrate how we can address these challenges by anticipating and avoiding regretful decisions.
nan
Article 1221
Title@2025-06-08 (7): Unraveling Token Prediction Refinement and Identifying Essential Layers in Language Models
Title: Unraveling Token Prediction Refinement and Identifying Essential Layers in Language Models | Unraveling Token Prediction Verfeinerung und Identifizierung wesentlicher Ebenen in Sprachmodellen | 在语言模型中统一地改进和确定关键层 2501.15054v2 |
Authors (1): Jaturong Kongmanee
This research aims to unravel how large language models (LLMs) iteratively refine token predictions through internal processing. We utilized a logit lens technique to analyze the model’s token predictions derived from intermediate representations. Specifically, we focused on (1) how LLMs access and utilize information from input contexts, and (2) how positioning of relevant information affects the model’s token prediction refinement process. On a multi-document question answering task with varying input context lengths, we found that the depth of prediction refinement (defined as the number of intermediate layers an LLM uses to transition from an initial correct token prediction to its final, stable correct output), as a function of the position of relevant information, exhibits an approximately inverted U-shaped curve. We also found that the gap between these two layers, on average, diminishes when relevant information is positioned at the beginning or end of the input context. This suggested that the model requires more refinements when processing longer contexts with relevant information situated in the middle. Furthermore, our findings indicate that not all layers are equally essential for determining final correct outputs. Our analysis provides insights into how token predictions are distributed across different conditions, and establishes important connections to existing hypotheses and previous findings in AI safety research and development.
nan
Article 1222
Title@2025-06-08 (7): Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search
Title: Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search | Unsichere LLM-basierte Suche: Quantitative Analyse und Minderung von Sicherheitsrisiken in der AI-Websuche | 不安全的LLM-基于LLM的搜索:AI网络搜索中安全风险的定量分析和减轻 2502.04951v2 |
Authors (7): Zeren Luo, Zifan Peng, Yule Liu, Zhen Sun, Mingchen Li, Jingyi Zheng, Xinlei He
Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of AI-Powered Search Engines (AIPSEs), offering precise and efficient responses by integrating external databases with pre-existing knowledge. However, we observe that these AIPSEs raise risks such as quoting malicious content or citing malicious websites, leading to harmful or unverified information dissemination. In this study, we conduct the first safety risk quantification on seven production AIPSEs by systematically defining the threat model, risk type, and evaluating responses to various query types. With data collected from PhishTank, ThreatBook, and LevelBlue, our findings reveal that AIPSEs frequently generate harmful content that contains malicious URLs even with benign queries (e.g., with benign keywords). We also observe that directly querying a URL will increase the number of main risk-inclusive responses, while querying with natural language will slightly mitigate such risk. Compared to traditional search engines, AIPSEs outperform in both utility and safety. We further perform two case studies on online document spoofing and phishing to show the ease of deceiving AIPSEs in the real-world setting. To mitigate these risks, we develop an agent-based defense with a GPT-4.1-based content refinement tool and a URL detector. Our evaluation shows that our defense can effectively reduce the risk, with only a minor cost of reducing available information by approximately 10.7%. Our research highlights the urgent need for robust safety measures in AIPSEs.
nan
Article 1223
Title@2025-06-08 (7): Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy
Title: Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy | Imperatives Lernen: Ein selbstüberwachtes neuro-symbolisches Lernrahmenwerk für Roboterautonomie | 必要学习:自我监督的机器人自主的神经-双向学习框架 2406.16087v6 |
Authors (19): Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao
Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, labeling data for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neuro-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy, leveraging the generalization abilities of symbolic reasoning. The framework of IL consists of three primary components: a neural module, a reasoning engine, and a memory system. We formulate IL as a special bilevel optimization (BLO), which enables reciprocal learning over the three modules. This overcomes the label-intensive obstacles associated with data-driven approaches and takes advantage of symbolic reasoning concerning logical reasoning, physical principles, geometric analysis, etc. We discuss several optimization techniques for IL and verify their effectiveness in five distinct robot autonomy tasks including path planning, rule induction, optimal control, visual odometry, and multi-robot routing. Through various experiments, we show that IL can significantly enhance robot autonomy capabilities and we anticipate that it will catalyze further research across diverse domains.
Article 1224
Title@2025-06-08 (7): InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference
Title: InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference | InfoSEM: Ein tiefgehendes Generatives Modell mit informativen Prioren für Gene Regulatory Network Inferenz | InfoSEM: 基因监管网络推论具有信息前置信息的深生成模型 2503.04483v2 |
Authors (7): Tianyu Cui, Song-Jun Xu, Artem Moskalev, Shuwei Li, Tommaso Mansi, Mangal Prakash, Rui Liao
Inferring Gene Regulatory Networks (GRNs) from gene expression data is crucial for understanding biological processes. While supervised models are reported to achieve high performance for this task, they rely on costly ground truth (GT) labels and risk learning gene-specific biases, such as class imbalances of GT interactions, rather than true regulatory mechanisms. To address these issues, we introduce InfoSEM, an unsupervised generative model that leverages textual gene embeddings as informative priors, improving GRN inference without GT labels. InfoSEM can also integrate GT labels as an additional prior when available, avoiding biases and further enhancing performance. Additionally, we propose a biologically motivated benchmarking framework that better reflects real-world applications such as biomarker discovery and reveals learned biases of existing supervised methods. InfoSEM outperforms existing models by 38.5% across four datasets using textual embeddings as priors, and further boosts performance by 11.1% when integrating labeled data as priors.
Article 1225
Title@2025-06-08 (7): Improving the Efficiency of Long Document Classification using Sentence Ranking Approach
Title: Improving the Efficiency of Long Document Classification using Sentence Ranking Approach | Verbesserung der Effizienz der Langdokumentklassifikation mittels Sentence-Ranking-Ansatz | 采用判决分级办法提高长文件分类的效率 2506.07248v1 |
Authors (4): Prathamesh Kokate, Mitali Sarnaik, Manavi Khopade, Raviraj Joshi
Long document classification poses challenges due to the computational limitations of transformer-based models, particularly BERT, which are constrained by fixed input lengths and quadratic attention complexity. Moreover, using the full document for classification is often redundant, as only a subset of sentences typically carries the necessary information. To address this, we propose a TF-IDF-based sentence ranking method that improves efficiency by selecting the most informative content. Our approach explores fixed-count and percentage-based sentence selection, along with an enhanced scoring strategy combining normalized TF-IDF scores and sentence length. Evaluated on the MahaNews LDC dataset of long Marathi news articles, the method consistently outperforms baselines such as first, last, and random sentence selection. With MahaBERT-v2, we achieve near-identical classification accuracy with just a 0.33 percent drop compared to the full-context baseline, while reducing input size by over 50 percent and inference latency by 43 percent. This demonstrates that significant context reduction is possible without sacrificing performance, making the method practical for real-world long document classification tasks.
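The sentence-selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`split_sentences`, `select_sentences`), the regex-based splitter and tokenizer, and the smoothed IDF formula are all assumptions made for illustration, and the paper's enhanced scoring that also weights sentence length is not reproduced. A Marathi pipeline would need a language-appropriate tokenizer.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an illustrative assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercased word tokens; a real system would use a proper tokenizer."""
    return re.findall(r"\w+", sentence.lower())


def select_sentences(text, top_k):
    """Keep the top_k sentences by length-normalized TF-IDF, in document order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: in how many sentences each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        # Sum of tf * idf with tf normalized by sentence length.
        score = sum(
            (count / len(toks)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        )
        scores.append(score)
    # Stable sort keeps earlier sentences first among ties.
    ranked = sorted(range(n), key=lambda i: -scores[i])
    keep = sorted(ranked[:top_k])
    return [sentences[i] for i in keep]
```

The selected sentences would then be concatenated and passed to the classifier in place of the full document; a percentage-based variant would simply derive `top_k` from the sentence count.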
Article 1226
Title@2025-06-08 (7): Promoting Ensemble Diversity with Interactive Bayesian Distributional Robustness for Fine-tuning Foundation Models
Title: Promoting Ensemble Diversity with Interactive Bayesian Distributional Robustness for Fine-tuning Foundation Models | Förderung der Ensemblevielfalt mit interaktiver bayesischer Distributions Robustheit für Feinsteuerungs-Stiftungsmodelle | 促进混合多样性与互动的贝叶斯分配强力促进微调基础模型 2506.07247v1 |
Authors (6): Ngoc-Quan Pham, Tuan Truong, Quyen Tran, Tan Nguyen, Dinh Phung, Trung Le
We introduce Interactive Bayesian Distributional Robustness (IBDR), a novel Bayesian inference framework that allows modeling the interactions between particles, thereby enhancing ensemble quality through increased particle diversity. IBDR is grounded in a generalized theoretical framework that connects the distributional population loss with the approximate posterior, motivating a practical dual optimization procedure that enforces distributional robustness while fostering particle diversity. We evaluate IBDR’s performance against various baseline methods using the VTAB-1K benchmark and the common reasoning language task. The results consistently show that IBDR outperforms these baselines, underscoring its effectiveness in real-world applications.
Article 1227
Title@2025-06-08 (7): Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
Title: Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs | Übertaktung LLM-Reasoning: Überwachung und Kontrolle von Denkpfadlängen in LLMs | LLM 解释:监测和控制LLM中思考长度 2506.07240v1 |
Authors (3): Roy Eisenstadt, Itamar Zimerman, Lior Wolf
Recently, techniques such as explicit structured reasoning have demonstrated strong test-time scaling behavior by enforcing a separation between the model’s internal “thinking” process and the final response. A key factor influencing answer quality in this setting is the length of the thinking stage. When the reasoning is too short, the model may fail to capture the complexity of the task. Conversely, when it is too long, the model may overthink, leading to unnecessary computation and degraded performance. This paper explores and exploits the underlying mechanisms by which LLMs understand and regulate the length of their reasoning during explicit thought processes. First, we show that LLMs encode their progress through the reasoning process and introduce an interactive progress bar visualization, which is then used to reveal insights on the model’s planning dynamics. Second, we manipulate the internal progress encoding during inference to reduce unnecessary steps and generate a more concise and decisive chain of thoughts. Our empirical results demonstrate that this “overclocking” method mitigates overthinking, improves answer accuracy, and reduces inference latency. Our code is publicly available.
Article 1228
Title@2025-06-08 (7): Learn as Individuals, Evolve as a Team: Multi-agent LLMs Adaptation in Embodied Environments
Title: Learn as Individuals, Evolve as a Team: Multi-agent LLMs Adaptation in Embodied Environments | Lernen als Individuen, Evolve als Team: Multi-Agent LLMs Anpassung in körpereigenen Umgebungen | 作为个人学习,作为一个团队参与:多剂LMs在渗透环境中的适应 2506.07232v1 |
Authors (6): Xinran Li, Chenjia Bai, Zijian Li, Jiakun Zheng, Ting Xiao, Jun Zhang
Large language models (LLMs) possess extensive knowledge bases and strong reasoning capabilities, making them promising tools for complex, multi-agent planning in embodied environments. However, despite LLMs’ advanced abilities and the sophisticated modular design of agentic methods, existing LLM-based planning algorithms remain limited by weak adaptation capabilities to multi-agent embodied scenarios. We address this limitation by introducing a framework that enables LLM agents to learn and evolve both before and during test time, equipping them with environment-relevant knowledge for better planning and enhanced communication for improved cooperation. Inspired by centralized training with decentralized execution in multi-agent reinforcement learning, we propose a Learn as Individuals, Evolve as a Team (LIET) paradigm for multi-agent LLMs adaptation. At the individual level, LLM agents learn a local utility function from exploratory datasets to better comprehend the embodied environment, which is then queried during test time to support informed decision-making. At the team level, LLM agents collaboratively and iteratively maintain and update a shared cooperation knowledge list based on new experiences, using it to guide more effective communication. By combining individual learning with team evolution, LIET enables comprehensive and flexible adaptation for LLM agents. Our experiments on Communicative Watch-And-Help and ThreeD-World Multi-Agent Transport benchmarks demonstrate that LIET, instantiated with both LLaMA and GPT-4o, outperforms existing baselines and exhibits strong cooperative planning abilities.
Article 1229
Title@2025-06-08 (7): VARSHAP: Addressing Global Dependency Problems in Explainable AI with Variance-Based Local Feature Attribution
Title: VARSHAP: Addressing Global Dependency Problems in Explainable AI with Variance-Based Local Feature Attribution | VARSHAP: Bewältigung globaler Abhängigkeitsprobleme bei erklärbarer KI mit varianzbasierter lokaler Feature-Attribution | VARSHAP: 解决全球依赖性问题,在可解释的AI 中与基于差异的地方特点 2506.07229v1 |
Authors (4): Mateusz Gajewski, Mikołaj Morzy, Adam Karczmarz, Piotr Sankowski
Existing feature attribution methods like SHAP often suffer from global dependence, failing to capture true local model behavior. This paper introduces VARSHAP, a novel model-agnostic local feature attribution method which uses the reduction of prediction variance as the key importance metric of features. Building upon the Shapley value framework, VARSHAP satisfies the key Shapley axioms but, unlike SHAP, is resilient to global data distribution shifts. Experiments on synthetic and real-world datasets demonstrate that VARSHAP outperforms popular methods such as KernelSHAP or LIME, both quantitatively and qualitatively.
Article 1230
Title@2025-06-08 (7): A spring-block theory of feature learning in deep neural networks
Title: A spring-block theory of feature learning in deep neural networks | Eine Frühling-Block-Theorie des Feature-Lernens in tiefen neuronalen Netzwerken | 深神经网络中特征学习的春季阶段理论 2407.19353v3 |
Authors (3): Cheng Shi, Liming Pan, Ivan Dokmanić
Feature-learning deep nets progressively collapse data to a regular low-dimensional geometry. How this emerges from the collective action of nonlinearity, noise, learning rate, and other factors, has eluded first-principles theories built from microscopic neuronal dynamics. We exhibit a noise-nonlinearity phase diagram that identifies regimes where shallow or deep layers learn more effectively and propose a macroscopic mechanical theory that reproduces the diagram and links feature learning across layers to generalization.
Article 1231
Title@2025-06-08 (7): Targeting relative risk heterogeneity with causal forests
Title: Targeting relative risk heterogeneity with causal forests | Ausrichtung der relativen Risiko Heterogenität mit ursächlichen Wäldern | 确定相对风险目标,确定与因果森林的异质性 2309.15793v3 |
Authors (4): Vik Shirvaikar, Andrea Storås, Xi Lin, Chris Holmes
The identification of heterogeneous treatment effects (HTE) across subgroups is of significant interest in clinical trial analysis. Several state-of-the-art HTE estimation methods, including causal forests, apply recursive partitioning for non-parametric identification of relevant covariates and interactions. However, the partitioning criterion is typically based on differences in absolute risk. This can dilute statistical power by masking variation in the relative risk, which is often a more appropriate quantity of clinical interest. In this work, we propose and implement a methodology for modifying causal forests to target relative risk, using a novel node-splitting procedure based on exhaustive generalized linear model comparison. We present results from simulated data that suggest relative risk causal forests can capture otherwise undetected sources of heterogeneity. We implement our method on real-world trial data to explore HTEs for liraglutide in patients with type 2 diabetes.
Article 1232
Title@2025-06-08 (7): VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data
Title: VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data | VORTEX: Ein räumliches Computing-Framework für optimierte Drohnentelemetrie-Extraktion aus der First-Person-Ansicht Flugdaten anzeigen | VORTEX:从第一人查看飞行数据中最佳遥测提取无人机空间计算框架 2412.18505v2 |
Authors (2): James E. Gallagher, Edward J. Oughton
This paper presents the Visual Optical Recognition Telemetry EXtraction (VORTEX) system for extracting and analyzing drone telemetry data from First Person View (FPV) Uncrewed Aerial System (UAS) footage. VORTEX employs MMOCR, a PyTorch-based Optical Character Recognition (OCR) toolbox, to extract telemetry variables from drone Heads Up Display (HUD) recordings, utilizing advanced image preprocessing techniques, including CLAHE enhancement and adaptive thresholding. The study optimizes spatial accuracy and computational efficiency through systematic investigation of temporal sampling rates (1s, 5s, 10s, 15s, 20s) and coordinate processing methods. Results demonstrate that the 5-second sampling rate, utilizing 4.07% of available frames, provides the optimal balance with a point retention rate of 64% and mean speed accuracy within 4.2% of the 1-second baseline while reducing computational overhead by 80.5%. Comparative analysis of coordinate processing methods reveals that while UTM Zone 33N projection and Haversine calculations provide consistently similar results (within 0.1% difference), raw WGS84 coordinates underestimate distances by 15-30% and speeds by 20-35%. Altitude measurements showed unexpected resilience to sampling rate variations, with only 2.1% variation across all intervals. This research is the first of its kind, providing quantitative benchmarks for establishing a robust framework for drone telemetry extraction and analysis using open-source tools and spatial libraries.
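For readers unfamiliar with the Haversine calculation mentioned in this abstract, the following is a minimal sketch of great-circle distance and mean speed between two telemetry samples. The function names, the tuple layout `(lat, lon)` in decimal degrees, and the mean Earth radius constant are assumptions for illustration and are not taken from VORTEX.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def mean_speed_mps(p1, p2, dt_s):
    """Mean ground speed in m/s between samples p1 and p2 taken dt_s seconds apart."""
    return haversine_m(p1[0], p1[1], p2[0], p2[1]) / dt_s
```

With a 5-second sampling interval, `mean_speed_mps(prev, curr, 5.0)` would give the per-segment speed that such a pipeline could compare against the HUD-reported value.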
Article 1233
Title@2025-06-08 (7): Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
Title: Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts | Diversifizierung des Expertenwissens für Task-Agnostic Pruning in Sparse Mixture-of-Experts | 使专家在Sparse Mixture 专家中的任务不可知性调节专家知识多样化 2407.09590v4 |
Authors (5): Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, Jianfeng Gao
By increasing model parameters but activating them sparsely when performing a task, the use of Mixture-of-Experts (MoE) architecture significantly improves the performance of Large Language Models (LLMs) without increasing the inference cost. However, the memory consumption due to the growing number of experts presents a challenge to the deployment of these models in many real world settings. Our empirical study reveals that some experts encode redundant knowledge during pre-training. We thus propose a method of grouping and pruning similar experts to improve the model’s parameter efficiency. We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures, including Mixtral, Deepseek-MoE, and Qwen. The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks. We will release our code to facilitate future research.
Article 1234
Title@2025-06-08 (7): Advancing Multimodal Reasoning Capabilities of Multimodal Large Language Models via Visual Perception Reward
Title: Advancing Multimodal Reasoning Capabilities of Multimodal Large Language Models via Visual Perception Reward | Mehr multimodale Begründungsfähigkeiten multimodaler Großsprachenmodelle durch Visual Perception Reward | 通过视觉感知奖励促进多式大语言模型的多式多式能力 2506.07218v1 |
Authors (7): Tong Xiao, Xin Xu, Zhenya Huang, Hongyu Gao, Quan Liu, Qi Liu, Enhong Chen
Enhancing the multimodal reasoning capabilities of Multimodal Large Language Models (MLLMs) is a challenging task that has attracted increasing attention in the community. Recently, several studies have applied Reinforcement Learning with Verifiable Rewards (RLVR) to the multimodal domain in order to enhance the reasoning abilities of MLLMs. However, these works largely overlook the enhancement of multimodal perception capabilities in MLLMs, which serve as a core prerequisite and foundational component of complex multimodal reasoning. Through McNemar’s test, we find that existing RLVR methods fail to effectively enhance the multimodal perception capabilities of MLLMs, thereby limiting their further improvement in multimodal reasoning. To address this limitation, we propose Perception-R1, which introduces a novel visual perception reward that explicitly encourages MLLMs to perceive the visual content accurately, thereby effectively incentivizing both their multimodal perception and reasoning capabilities. Specifically, we first collect textual visual annotations from the CoT trajectories of multimodal problems, which serve as visual references for reward assignment. During RLVR training, we employ a judging LLM to assess the consistency between the visual annotations and the responses generated by the MLLM, and assign the visual perception reward based on these consistency judgments. Extensive experiments on several multimodal reasoning benchmarks demonstrate the effectiveness of Perception-R1, which achieves state-of-the-art performance on most benchmarks using only 1,442 training samples.
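McNemar’s test, which the abstract uses to compare models, works on paired outcomes: it looks only at the items where two models disagree. The sketch below is a generic version of the uncorrected test, not the authors' analysis script; the function name `mcnemar` and the argument names are hypothetical. Here `b` counts items only the first model gets right and `c` counts items only the second gets right.

```python
import math


def mcnemar(b, c):
    """Uncorrected McNemar chi-square statistic and p-value (1 degree of freedom).

    b: items correct under model A only; c: items correct under model B only.
    """
    if b + c == 0:
        # No discordant pairs: the two models are indistinguishable on this data.
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 dof: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

A large p-value indicates that the two models' perception errors are not significantly different, which is the kind of evidence the abstract describes for existing RLVR methods. For small discordant counts, an exact binomial version of the test is usually preferred.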
Article 1235
Title@2025-06-08 (7): Decoupled Data Consistency with Diffusion Purification for Image Restoration
Title: Decoupled Data Consistency with Diffusion Purification for Image Restoration | Entkoppelte Datenkonsistenz mit Diffusionsreinigung für Bildrestaurierung | 数据与用于图像恢复的传播净化相兼容 2403.06054v6 |
Authors (6): Xiang Li, Soo Min Kwon, Shijun Liang, Ismail R. Alkhouri, Saiprasad Ravishankar, Qing Qu
Diffusion models have recently gained traction as a powerful class of deep generative priors, excelling in a wide range of image restoration tasks due to their exceptional ability to model data distributions. To solve image restoration problems, many existing techniques achieve data consistency by incorporating additional likelihood gradient steps into the reverse sampling process of diffusion models. However, the additional gradient steps pose a challenge for real-world practical applications as they incur a large computational overhead, thereby increasing inference time. They also present additional difficulties when using accelerated diffusion model samplers, as the number of data consistency steps is limited by the number of reverse sampling steps. In this work, we propose a novel diffusion-based image restoration solver that addresses these issues by decoupling the reverse process from the data consistency steps. Our method involves alternating between a reconstruction phase to maintain data consistency and a refinement phase that enforces the prior via diffusion purification. Our approach demonstrates versatility, making it highly adaptable for efficient problem-solving in latent space. Additionally, it reduces the necessity for numerous sampling steps through the integration of consistency models. The efficacy of our approach is validated through comprehensive experiments across various image restoration tasks, including image denoising, deblurring, inpainting, and super-resolution.
Article 1236
Title@2025-06-08 (7): RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts
Title: RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts | RepLoRA: Reparameterisieren der Low-Rank-Anpassung über die Perspektive der Mischung von Experten | RepLORA:通过专家混合的视角,恢复低成本适应的计量 2502.03044v2 |
Authors (6): Tuan Truong, Chau Nguyen, Huy Nguyen, Minh Le, Trung Le, Nhat Ho
Low-rank Adaptation (LoRA) has emerged as a powerful method for fine-tuning large-scale foundation models. Despite its popularity, the theoretical understanding of LoRA has remained limited. This paper presents a theoretical analysis of LoRA by examining its connection to the Mixture of Experts models. Under this framework, we show that simple reparameterizations of the LoRA matrices can notably accelerate the low-rank matrix estimation process. In particular, we prove that reparameterization can reduce the data needed to achieve a desired estimation error from an exponential to a polynomial scale. Motivated by this insight, we propose Reparameterized Low-Rank Adaptation (RepLoRA), which incorporates lightweight MLPs to reparameterize the LoRA matrices. Extensive experiments across multiple domains demonstrate that RepLoRA consistently outperforms vanilla LoRA. Notably, with limited data, RepLoRA surpasses LoRA by a margin of up to 40.0% and achieves LoRA’s performance with only 30.0% of the training data, highlighting both the theoretical and empirical robustness of our PEFT method.
Article 1237
Title@2025-06-08 (7): Tilted Sharpness-Aware Minimization
Title: Tilted Sharpness-Aware Minimization | Gekippte Schärfe-Bewusst-Minimierung | 静液尖锐度- 最小化 2410.22656v2 |
Authors (3): Tian Li, Tianyi Zhou, Jeffrey A. Bilmes
Sharpness-Aware Minimization (SAM) has been demonstrated to improve the generalization performance of overparameterized models by seeking flat minima on the loss landscape through optimizing model parameters that incur the largest loss within a neighborhood. Nevertheless, such min-max formulations are computationally challenging, especially when the problem is highly non-convex. Additionally, focusing only on the worst-case local solution while ignoring potentially many other local solutions may be suboptimal when searching for flat minima. In this work, we propose Tilted SAM (TSAM), a smoothed generalization of SAM inspired by exponential tilting that effectively assigns higher priority to local solutions that incur larger losses. TSAM is parameterized by a tilt hyperparameter $t$ and reduces to SAM as $t$ approaches infinity. We show that TSAM is smoother than SAM and thus easier to optimize, and it explicitly favors flatter minima. We develop algorithms motivated by the discretization of Hamiltonian dynamics to solve TSAM. Empirically, TSAM arrives at flatter local minima and achieves better test performance than the SAM and ERM baselines across a range of image and text tasks.
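A hedged sketch of what an exponentially tilted objective of this kind typically looks like is given below. The notation is an assumption and does not come from the abstract: $L$ is the training loss, $\theta$ the parameters, and $\epsilon$ a perturbation drawn from a distribution $\mu$ supported on a neighborhood of $\theta$. The limit shown is the standard log-sum-exp property for continuous $L$, which matches the abstract's statement that the method reduces to SAM as $t$ approaches infinity.

```latex
\min_{\theta}\; \frac{1}{t}\,\log \mathbb{E}_{\epsilon \sim \mu}\!\left[ e^{\,t\,L(\theta + \epsilon)} \right],
\qquad
\lim_{t \to \infty} \frac{1}{t}\,\log \mathbb{E}_{\epsilon \sim \mu}\!\left[ e^{\,t\,L(\theta + \epsilon)} \right]
= \sup_{\epsilon \in \operatorname{supp}(\mu)} L(\theta + \epsilon).
```

For finite $t$, perturbations with larger loss receive exponentially larger weight rather than all of the weight, which is the sense in which the objective smooths the min-max problem.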
Article 1238
Title@2025-06-08 (7): Randomness, exchangeability, and conformal prediction
Title: Randomness, exchangeability, and conformal prediction | Zufälligkeit, Austauschbarkeit und konforme Vorhersage | 随机性、易交换性以及符合性预测 2501.11689v3 |
Authors (1): Vladimir Vovk
This paper argues for a wider use of the functional theory of randomness, a modification of the algorithmic theory of randomness getting rid of unspecified additive constants. Both theories are useful for understanding relationships between the assumptions of IID data and data exchangeability. While the assumption of IID data is standard in machine learning, conformal prediction relies on data exchangeability. Nouretdinov, V’yugin, and Gammerman showed, using the language of the algorithmic theory of randomness, that conformal prediction is a universal method under the assumption of IID data. In this paper (written for the Alex Gammerman Festschrift) I will selectively review connections between exchangeability and the property of being IID, early history of conformal prediction, my encounters and collaboration with Alex and other interesting people, and a translation of Nouretdinov et al.’s results into the language of the functional theory of randomness, which moves it closer to practice. Namely, the translation says that every confidence predictor that is valid for IID data can be transformed to a conformal predictor without losing much in predictive efficiency.
Article 1239
Title@2025-06-08 (7): Improving Generalization with Flat Hilbert Bayesian Inference
Title: Improving Generalization with Flat Hilbert Bayesian Inference | Verbesserung der Generalisierung durch flache Hilbert Bayesische Schlussfolgerung | 利用Flat Hilbert Bayesian推论改进普遍化 2410.04196v2 |
Authors (6): Tuan Truong, Quyen Tran, Quan Pham-Ngoc, Nhat Ho, Dinh Phung, Trung Le
We introduce Flat Hilbert Bayesian Inference (FHBI), an algorithm designed to enhance generalization in Bayesian inference. Our approach involves an iterative two-step procedure with an adversarial functional perturbation step and a functional descent step within a reproducing kernel Hilbert space. This methodology is supported by a theoretical analysis that extends previous findings on generalization ability from finite-dimensional Euclidean spaces to infinite-dimensional functional spaces. To evaluate the effectiveness of FHBI, we conduct comprehensive comparisons against nine baseline methods on the VTAB-1K benchmark, which encompasses 19 diverse datasets across various domains with diverse semantics. Empirical results demonstrate that FHBI consistently outperforms the baselines by notable margins, highlighting its practical efficacy.
Article 1240
Title@2025-06-08 (7): Another look at inference after prediction
Title: Another look at inference after prediction | Ein weiterer Blick auf die Schlussfolgerung nach der Vorhersage | 另一种观点是预测之后的推论 2411.19908v4 |
Authors (5): Jessica Gronsbell, Jianhui Gao, Yaqi Shi, Zachary R. McCaw, David Cheng
From structural biology to epidemiology, predictions from machine learning (ML) models are increasingly used to complement costly gold-standard data to enable faster, more affordable, and scalable scientific inquiry. In response, prediction-based (PB) inference has emerged to accommodate statistical analysis using a large volume of predictions together with a small amount of gold-standard data. The goals of PB inference are two-fold: (i) to mitigate bias from errors in predictions and (ii) to improve efficiency relative to traditional inference using only the gold-standard data. While early PB inference methods focused on bias, their ability to enhance efficiency remains unclear. We revisit a popular PB inference method and show that a simple modification can be applied to guarantee improvements in efficiency beyond yielding valid inferences when the ML predictions are imperfect. The utility of this approach in leveraging prediction-based outcomes to enhance efficiency is demonstrated through extensive simulation studies and an application to the UK Biobank data. We further contextualize the problem of PB inference through historical literature from economics and statistics to highlight perspectives from classical methods in this contemporary problem.
Article 1241
Title@2025-06-08 (7): Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching
Title: Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching | Audio-Synthesizer-Inversion in symmetrischen Parameterräumen mit ungefähr gleichmässigem Flow-Matching | 音频合成器在对称参数空间的反转, 大约等差流匹配 2506.07199v1 |
Authors (3): Ben Hayes, Charalampos Saitis, György Fazekas
Many audio synthesizers can produce the same signal given different parameter configurations, meaning the inversion from sound to parameters is an inherently ill-posed problem. We show that this is largely due to intrinsic symmetries of the synthesizer, and focus in particular on permutation invariance. First, we demonstrate on a synthetic task that regressing point estimates under permutation symmetry degrades performance, even when using a permutation-invariant loss function or symmetry-breaking heuristics. Then, viewing equivalent solutions as modes of a probability distribution, we show that a conditional generative model substantially improves performance. Further, acknowledging the invariance of the implicit parameter distribution, we find that performance is further improved by using a permutation equivariant continuous normalizing flow. To accommodate intricate symmetries in real synthesizers, we also propose a relaxed equivariance strategy that adaptively discovers relevant symmetries from data. Applying our method to Surge XT, a full-featured open source synthesizer used in real world audio production, we find our method outperforms regression and generative baselines across audio reconstruction metrics.
Article 1242
Title@2025-06-08 (7): GGBall: Graph Generative Model on Poincaré Ball
Title: GGBall: Graph Generative Model on Poincaré Ball | GGBall: Graph Generatives Modell auf Poincaré Ball | GGBall:poincaré球球的图形生成模型 2506.07198v1 |
Authors (6): Tianci Bu, Chuanrui Wang, Hao Ma, Haoren Zheng, Xin Lu, Tailin Wu
Generating graphs with hierarchical structures remains a fundamental challenge due to the limitations of Euclidean geometry in capturing exponential complexity. Here we introduce GGBall, a novel hyperbolic framework for graph generation that integrates geometric inductive biases with modern generative paradigms. GGBall combines a Hyperbolic Vector-Quantized Autoencoder (HVQVAE) with a Riemannian flow matching prior defined via closed-form geodesics. This design enables flow-based priors to model complex latent distributions, while vector quantization helps preserve the curvature-aware structure of the hyperbolic space. We further develop a suite of hyperbolic GNN and Transformer layers that operate entirely within the manifold, ensuring stability and scalability. Empirically, our model reduces degree MMD by over 75% on Community-Small and over 40% on Ego-Small compared to state-of-the-art baselines, demonstrating an improved ability to preserve topological hierarchies. These results highlight the potential of hyperbolic geometry as a powerful foundation for the generative modeling of complex, structured, and hierarchical data domains. Our code is available at https://github.com/AI4Science-WestlakeU/GGBall.
Article 1243
Title@2025-06-08 (7): Finite-PINN: A Physics-Informed Neural Network with Finite Geometric Encoding for Solid Mechanics
Title: Finite-PINN: A Physics-Informed Neural Network with Finite Geometric Encoding for Solid Mechanics | Finite-PINN: Ein physikinformiertes Neuronales Netzwerk mit Finite Geometrischer Encoding für feste Mechanik | Finite-PINN:一个物理化神经网络,其中含有固体机械学的有限几何编码 2412.09453v2 |
Authors (4): Haolin Li, Yuyang Miao, Zahra Sharif Khodaei, M. H. Aliabadi
PINN models have demonstrated capabilities in addressing fluid PDE problems, and their potential in solid mechanics is beginning to emerge. This study identifies two key challenges when using PINN to solve general solid mechanics problems. These challenges become evident when comparing the limitations of PINN with the well-established numerical methods commonly used in solid mechanics, such as the finite element method (FEM). Specifically: a) PINN models generate solutions over an infinite domain, which conflicts with the finite boundaries typical of most solid structures; and b) the solution space utilised by PINN is Euclidean, which is inadequate for addressing the complex geometries often present in solid structures. This work presents a PINN architecture for general solid mechanics problems, referred to as the Finite-PINN model. The model is designed to tackle these two challenges while retaining as much of the original PINN framework as possible. To this end, the Finite-PINN incorporates finite geometric encoding into the neural network inputs, thereby transforming the solution space from a conventional Euclidean space into a hybrid Euclidean-topological space. The model is trained using both strong-form and weak-form loss formulations, enabling its application to a wide range of forward and inverse problems in solid mechanics. For forward problems, the Finite-PINN model efficiently approximates solutions to solid mechanics problems when the geometric information of a given structure has been preprocessed. For inverse problems, it effectively reconstructs full-field solutions from very sparse observations by embedding both physical laws and geometric information within its architecture.
Article 1244
Title@2025-06-08 (7): Analyzing Breast Cancer Survival Disparities by Race and Demographic Location: A Survival Analysis Approach
Title: Analyzing Breast Cancer Survival Disparities by Race and Demographic Location: A Survival Analysis Approach | Analyse der Überlebensdisparitäten von Brustkrebs nach Rasse und demographischer Lage: Ein Ansatz zur Überlebensanalyse | 按种族和人口地点分析乳腺癌生存差异:生存分析方法 2506.07191v1 |
Authors (2): Ramisa Farha, Joshua O. Olukoya
This study employs a robust analytical framework to uncover patterns in survival outcomes among breast cancer patients from diverse racial and geographical backgrounds. It uses the SEER 2021 dataset to analyze breast cancer survival outcomes and to identify and understand disparities. Our approach begins with exploratory data analysis (EDA), through which we identify key variables that influence survival rates. We then apply survival analysis techniques, including the Kaplan-Meier estimator, the log-rank test, and the Cox Proportional Hazards model, to determine how survival rates vary across racial groups and countries. Model validation and interpretation are undertaken to ensure the reliability of our findings, which are documented comprehensively to inform policymakers and healthcare professionals. The outcome of this paper is a detailed statistical analysis that not only highlights disparities in breast cancer treatment and care but also serves as a foundational tool for developing targeted interventions to address these inequalities effectively. Through this research, we aim to contribute to global efforts to improve breast cancer outcomes and reduce treatment disparities.
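As a rough illustration of the Kaplan-Meier estimator named in the abstract, the sketch below computes the product-limit survival curve S(t) = prod over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of observed deaths at t_i and n_i is the number of patients still at risk. The function name `kaplan_meier` and the toy data are assumptions made for illustration; they are not taken from the paper or from SEER.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct observed event time.

    durations: follow-up time per patient
    events:    1 if the event (death) was observed, 0 if censored
    """
    # Count observed events at each time; censored patients add no deaths.
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Everyone whose follow-up lasted at least until t is still at risk.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Toy data, invented for this sketch.
toy_durations = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_durations, toy_events)
```

Computing one such curve per group (for example, per racial group) and comparing them is the usual starting point before a log-rank test or a Cox model.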
Article 1245
Title@2025-06-08 (7): ST-USleepNet: A Spatial-Temporal Coupling Prominence Network for Multi-Channel Sleep Staging
Title: ST-USleepNet: A Spatial-Temporal Coupling Prominence Network for Multi-Channel Sleep Staging | ST-USleepNet: Ein Raum-Temporal-Koppelungs-Prominenz-Netzwerk für Multi-Channel-Sleep-Staging | ST-USSLSLSINT:多通道睡眠调节空间-时际聚合显要网络 2408.11884v3 |
Authors (4): Jingying Ma, Qika Lin, Ziyu Jia, Mengling Feng
Sleep staging is critical to assess sleep quality and diagnose disorders. Despite advancements in artificial intelligence enabling automated sleep staging, significant challenges remain: (1) Simultaneously extracting prominent temporal and spatial sleep features from multi-channel raw signals, including characteristic sleep waveforms and salient spatial brain networks. (2) Capturing the spatial-temporal coupling patterns essential for accurate sleep staging. To address these challenges, we propose a novel framework named ST-USleepNet, comprising a spatial-temporal graph construction module (ST) and a U-shaped sleep network (USleepNet). The ST module converts raw signals into a spatial-temporal graph based on signal similarity, temporal, and spatial relationships to model spatial-temporal coupling patterns. The USleepNet employs a U-shaped structure for both the temporal and spatial streams, mirroring its original use in image segmentation to isolate significant targets. Applied to raw sleep signals and graph data from the ST module, USleepNet effectively segments these inputs, simultaneously extracting prominent temporal and spatial sleep features. Testing on three datasets demonstrates that ST-USleepNet outperforms existing baselines, and model visualizations confirm its efficacy in extracting prominent sleep features and temporal-spatial coupling patterns across various sleep stages. The code is available at https://github.com/Majy-Yuji/ST-USleepNet.
Article 1246
Title@2025-06-08 (7): Learning based on neurovectors for tabular data: a new neural network approach
Title: Learning based on neurovectors for tabular data: a new neural network approach | Lernen auf Basis von Neurovektoren für tabellarische Daten: ein neuer neuronaler Netzwerkansatz | 基于用于表格数据的神经动力学的学习:新的神经网络方法 2506.07185v1 |
Authors (4): J. C. Husillos, A. Gallego, A. Roma, A. Troncoso
In this paper, we present a novel learning approach based on Neurovectors, an innovative paradigm that structures information through interconnected nodes and vector relationships for tabular data processing. Unlike traditional artificial neural networks that rely on weight adjustment through backpropagation, Neurovectors encode information by structuring data in vector spaces where energy propagation, rather than traditional weight updates, drives the learning process, enabling a more adaptable and explainable learning process. Our method generates dynamic representations of knowledge through neurovectors, thereby improving both the interpretability and efficiency of the predictive model. Experimental results using datasets from well-established repositories such as the UCI machine learning repository and Kaggle are reported both for classification and regression. To evaluate its performance, we compare our approach with standard machine learning and deep learning models, showing that Neurovectors achieve competitive accuracy.
Article 1247
Title@2025-06-08 (7): Regularized Adaptive Graph Learning for Large-Scale Traffic Forecasting
Title: Regularized Adaptive Graph Learning for Large-Scale Traffic Forecasting | Regularisiertes Adaptives Graphenlernen für großräumige Verkehrsprognosen | 用于大型交通流量预测的正规化适应性图表学习 2506.07179v1 |
Authors (5): Kaiqi Wu, Weiyang Kong, Sen Zhang, Yubao Liu, Zitong Chen
Traffic prediction is a critical task in spatial-temporal forecasting with broad applications in travel planning and urban management. Adaptive graph convolution networks have emerged as mainstream solutions due to their ability to learn node embeddings in a data-driven manner and capture complex latent dependencies. However, existing adaptive graph learning methods for traffic forecasting often either ignore the regularization of node embeddings, which account for a significant proportion of model parameters, or face scalability issues from expensive graph convolution operations. To address these challenges, we propose a Regularized Adaptive Graph Learning (RAGL) model. First, we introduce a regularized adaptive graph learning framework that synergizes Stochastic Shared Embedding (SSE) and adaptive graph convolution via a residual difference mechanism, achieving both embedding regularization and noise suppression. Second, to ensure scalability on large road networks, we develop the Efficient Cosine Operator (ECO), which performs graph convolution based on the cosine similarity of regularized embeddings with linear time complexity. Extensive experiments on four large-scale real-world traffic datasets show that RAGL consistently outperforms state-of-the-art methods in terms of prediction accuracy and exhibits competitive computational efficiency.
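The abstract states that ECO performs graph convolution from the cosine similarity of embeddings in linear time. One standard way to get that complexity is to never form the N x N similarity matrix and instead reassociate the matrix product. The sketch below shows this trick under the assumption that the adjacency is exactly the cosine-similarity matrix of row-normalized embeddings; the paper's actual operator may include normalization or other terms not shown here, and both function names are hypothetical.

```python
import numpy as np


def cosine_graph_conv_dense(E, X):
    """Reference version: builds the full (N, N) cosine-similarity matrix."""
    E_hat = E / np.linalg.norm(E, axis=1, keepdims=True)
    A = E_hat @ E_hat.T  # O(N^2 d) time, O(N^2) memory
    return A @ X


def cosine_graph_conv_linear(E, X):
    """Same result, linear in N: (E_hat E_hat^T) X = E_hat (E_hat^T X)."""
    E_hat = E / np.linalg.norm(E, axis=1, keepdims=True)
    return E_hat @ (E_hat.T @ X)  # O(N d f) time, no (N, N) matrix


rng = np.random.default_rng(0)
E = rng.normal(size=(200, 8))    # node embeddings: N=200 nodes, d=8
X = rng.normal(size=(200, 16))   # node features: f=16
out_dense = cosine_graph_conv_dense(E, X)
out_linear = cosine_graph_conv_linear(E, X)
```

The two functions agree up to floating-point error; only the order of multiplication differs.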
Article 1248
Title@2025-06-08 (7): RULE: Reinforcement UnLEarning Achieves Forget-Retain Pareto Optimality
Title: RULE: Reinforcement UnLEarning Achieves Forget-Retain Pareto Optimality | RULE: Verstärkung des Unerfahrenen erreicht Vergessen-Behalten Pareto-Optimalität | RULE: 强化不放功率达到遗忘-保留最佳程度 2506.07171v1 |
Authors (8): Chenlong Zhang, Zhuoran Jin, Hongbang Yuan, Jiaheng Wei, Tong Zhou, Kang Liu, Jun Zhao, Yubo Chen
The widespread deployment of Large Language Models (LLMs) trained on massive, uncurated corpora has raised growing concerns about the inclusion of sensitive, copyrighted, or illegal content. This has led to increasing interest in LLM unlearning: the task of selectively removing specific information from a model without retraining from scratch or degrading overall utility. However, existing methods often rely on large-scale forget and retain datasets, and suffer from unnatural responses, poor generalization, or catastrophic utility loss. In this work, we propose Reinforcement UnLearning (RULE), an efficient framework that formulates unlearning as a refusal boundary optimization problem. RULE is trained with a small portion of the forget set and synthesized boundary queries, using a verifiable reward function that encourages safe refusal on forget-related queries while preserving helpful responses on permissible inputs. We provide both theoretical and empirical evidence demonstrating the effectiveness of RULE in achieving targeted unlearning without compromising model utility. Experimental results show that, with only $12\%$ of the forget set and $8\%$ synthesized boundary data, RULE outperforms existing baselines by up to $17.5\%$ in forget quality and $16.3\%$ in response naturalness while maintaining general utility, achieving forget-retain Pareto optimality. Remarkably, we further observe that RULE improves the naturalness of model outputs, enhances training efficiency, and exhibits strong generalization ability, extending refusal behavior to semantically related but unseen queries.
Article 1249
Title@2025-06-08 (7): Efficient Text-Attributed Graph Learning through Selective Annotation and Graph Alignment
Title: Efficient Text-Attributed Graph Learning through Selective Annotation and Graph Alignment | Effizientes Text-Attributes Graphenlernen durch selektive Annotation und Graphenausrichtung | 通过选择性注释和图表对齐,通过选择性注释和图表对齐,高效率的文本定位图表学习 2506.07168v1 |
Authors (9): Huanyi Xie, Lijie Hu, Lu Yu, Tianhao Huang, Longfei Li, Meng Li, Jun Zhou, Huan Wang, Di Wang
In the realm of Text-attributed Graphs (TAGs), traditional graph neural networks (GNNs) often fall short due to the complex textual information associated with each node. Recent methods have improved node representations by leveraging large language models (LLMs) to enhance node text features, but these approaches typically require extensive annotations or fine-tuning across all nodes, which is both time-consuming and costly. To overcome these challenges, we introduce GAGA, an efficient framework for TAG representation learning. GAGA reduces annotation time and cost by focusing on annotating only representative nodes and edges. It constructs an annotation graph that captures the topological relationships among these annotations. Furthermore, GAGA employs a two-level alignment module to effectively integrate the annotation graph with the TAG, aligning their underlying structures. Experiments show that GAGA achieves classification accuracies on par with or surpassing state-of-the-art methods while requiring only 1% of the data to be annotated, demonstrating its high efficiency.
Article 1250
Title@2025-06-08 (7): AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models
Title: AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models | AMoPO: Adaptive Multi-Objektive Preference-Optimierung ohne Prämienmodelle und Referenzmodelle | AMOP: 适应性、多目标、不奖励模式和参考模式的优化 2506.07165v1 |
Authors (10): Qi Liu, Jingqing Ruan, Hao Li, Haodong Zhao, Desheng Wang, Jiansong Chen, Wan Guanglu, Xunliang Cai, Zhi Zheng, Tong Xu
Existing multi-objective preference alignment methods for large language models (LLMs) face two limitations: (1) an inability to effectively balance various preference dimensions, and (2) a reliance on auxiliary reward/reference models, which introduces computational complexity. To address these challenges, we propose Adaptive Multi-objective Preference Optimization (AMoPO), a novel framework that achieves dynamic balance across preference dimensions. By introducing the multi-objective optimization paradigm to use the dimension-aware generation metrics as implicit rewards, AMoPO aligns LLMs with diverse preferences without additional reward models or reference models. We introduce an adaptive weight assignment mechanism that models the generation space as a Gaussian distribution, allowing dynamic prioritization of preference dimensions. Empirical results demonstrate that AMoPO outperforms state-of-the-art baselines by 28.5%, and the experiments on 7B, 14B, and 32B models reveal the scaling ability of AMoPO. Moreover, additional analysis of multiple dimensions verifies its adaptability and effectiveness. These findings validate AMoPO’s capability to achieve dimension-aware preference alignment, highlighting its superiority. Our code and datasets are available at https://github.com/Javkonline/AMoPO.
Article 1251
Title@2025-06-08 (7): ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models
Title: ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models | ExLM: Nachdenken über die Auswirkungen von [MASK] Token in maskierten Sprachmodellen | EXLM:重新思考[MASK]语调模式中[MASK]语调的影响 2501.13397v5 |
Authors (8): Kangjie Zheng, Junwei Yang, Siyue Liang, Bin Feng, Zequn Liu, Wei Ju, Zhiping Xiao, Ming Zhang
Masked Language Models (MLMs) have achieved remarkable success in many self-supervised representation learning tasks. MLMs are trained by randomly masking portions of the input sequences with [MASK] tokens and learning to reconstruct the original content based on the remaining context. This paper explores the impact of [MASK] tokens on MLMs. Analytical studies show that masking tokens can introduce the corrupted semantics problem, wherein the corrupted context may convey multiple, ambiguous meanings. This problem is also a key factor affecting the performance of MLMs on downstream tasks. Based on these findings, we propose a novel enhanced-context MLM, ExLM. Our approach expands [MASK] tokens in the input context and models the dependencies between these expanded states. This enhancement increases context capacity and enables the model to capture richer semantic information, effectively mitigating the corrupted semantics problem during pre-training. Experimental results demonstrate that ExLM achieves significant performance improvements in both text modeling and SMILES modeling tasks. Further analysis confirms that ExLM enriches semantic representations through context enhancement, and effectively reduces the semantic multimodality commonly observed in MLMs.
Article 1252
Title@2025-06-08 (7): SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision
Title: SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision | SMI-Editor: Bearbeiten-basiertes SMILES-Sprachenmodell mit Fragment-Level-Überwachung | SMI-编辑器:基于编辑的SMILES语言模式,带有零散监督级别 2412.05569v2 |
Authors (8): Kangjie Zheng, Siyue Liang, Junwei Yang, Bin Feng, Zequn Liu, Wei Ju, Zhiping Xiao, Ming Zhang
SMILES, a crucial textual representation of molecular structures, has garnered significant attention as a foundation for pre-trained language models (LMs). However, most existing pre-trained SMILES LMs focus solely on single-token-level supervision during pre-training, failing to fully leverage the substructural information of molecules. This limitation makes the pre-training task overly simplistic, preventing the models from capturing richer molecular semantic information. Moreover, during pre-training, these SMILES LMs only process corrupted SMILES inputs, never encountering any valid SMILES, which leads to a train-inference mismatch. To address these challenges, we propose SMI-Editor, a novel edit-based pre-trained SMILES LM. SMI-Editor disrupts substructures within a molecule at random and feeds the resulting SMILES back into the model, which then attempts to restore the original SMILES through an editing process. This approach not only introduces fragment-level training signals, but also enables the use of valid SMILES as inputs, allowing the model to learn how to reconstruct complete molecules from these incomplete structures. As a result, the model demonstrates improved scalability and an enhanced ability to capture fragment-level molecular information. Experimental results show that SMI-Editor achieves state-of-the-art performance across multiple downstream molecular tasks, and even outperforms several 3D molecular representation models.
Article 1253
Title@2025-06-08 (7): pFedSOP : Accelerating Training Of Personalized Federated Learning Using Second-Order Optimization
Title: pFedSOP : Accelerating Training Of Personalized Federated Learning Using Second-Order Optimization | pFedSOP : Beschleunigendes Training des personalisierten Föderierten Lernens unter Verwendung von Second-Order-Optimierung | pFedSOP: 利用二阶优化加快个人化联邦学习培训 2506.07159v1 |
Authors (2): Mrinmay Sen, Chalavadi Krishna Mohan
Personalized Federated Learning (PFL) enables clients to collaboratively train personalized models tailored to their individual objectives, addressing the challenge of model generalization in traditional Federated Learning (FL) due to high data heterogeneity. However, existing PFL methods often require increased communication rounds to achieve the desired performance, primarily due to slow training caused by the use of first-order optimization, which has linear convergence. Additionally, many of these methods increase local computation because of the additional data fed into the model during the search for personalized local models. One promising solution to this slow training is second-order optimization, known for its quadratic convergence. However, employing it in PFL is challenging because it requires the Hessian matrix and its inverse. In this paper, we propose pFedSOP, which efficiently utilizes second-order optimization in PFL to accelerate the training of personalized models and enhance performance with fewer communication rounds. Our approach first computes a personalized local gradient update using the Gompertz function-based normalized angle between local and global gradient updates, incorporating client-specific global information. We then use a regularized Fisher Information Matrix (FIM), computed from this personalized gradient update, as an approximation of the Hessian to update the personalized models. This FIM-based second-order optimization speeds up training with fewer communication rounds by addressing the challenges of the exact Hessian, and it avoids feeding additional data into the model during the search for personalized local models. Extensive experiments on heterogeneously partitioned image classification datasets with partial client participation demonstrate that pFedSOP outperforms state-of-the-art FL and PFL algorithms.
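To make the idea of using a regularized FIM in place of the Hessian concrete, the sketch below assumes the simplest possible form: a rank-one FIM built from a single gradient vector g, regularized as F = g g^T + lam * I. Under that assumption the Sherman-Morrison identity gives F^{-1} g = g / (lam + g^T g), so the second-order step needs no matrix at all. Whether pFedSOP uses exactly this form is not stated in the abstract; the function names and the parameters `lam` and `lr` are hypothetical.

```python
import numpy as np


def fim_step_direct(w, g, lam, lr=1.0):
    """Preconditioned step that explicitly builds F = g g^T + lam * I."""
    F = np.outer(g, g) + lam * np.eye(g.size)
    return w - lr * np.linalg.solve(F, g)


def fim_step_closed_form(w, g, lam, lr=1.0):
    """Same step via Sherman-Morrison: F^{-1} g = g / (lam + g^T g)."""
    return w - lr * g / (lam + g @ g)


rng = np.random.default_rng(0)
w0 = rng.normal(size=50)       # current model parameters (toy)
grad = rng.normal(size=50)     # stand-in for the personalized gradient
w_direct = fim_step_direct(w0, grad, lam=0.1)
w_closed = fim_step_closed_form(w0, grad, lam=0.1)
```

The closed form costs O(d) per step instead of the O(d^3) solve, which is the kind of saving that makes second-order updates practical on clients.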
Article 1254
Title@2025-06-08 (7): Syntactic Control of Language Models by Posterior Inference
Title: Syntactic Control of Language Models by Posterior Inference | Syntaktische Kontrolle von Sprachmodellen durch hintere Inferenz | 由Posideer 推断对语言模型的协同控制 2506.07154v1 |
Authors (4): Vicky Xefteri, Tim Vieira, Ryan Cotterell, Afra Amini
Controlling the syntactic structure of text generated by language models is valuable for applications requiring clarity, stylistic consistency, or interpretability, yet it remains a challenging task. In this paper, we argue that sampling algorithms based on posterior inference can effectively enforce a target constituency structure during generation. Our approach combines sequential Monte Carlo, which estimates the posterior distribution by sampling from a proposal distribution, with a syntactic tagger that ensures that each generated token aligns with the desired syntactic structure. Our experiments with GPT2 and Llama3-8B models show that with an appropriate proposal distribution, we can improve syntactic accuracy, increasing the F1 score from $12.31$ (GPT2-large) and $35.33$ (Llama3-8B) to about $93$ in both cases without compromising the language model’s fluency. These results underscore both the complexity of syntactic control and the effectiveness of sampling algorithms, offering a promising approach for applications where precise control over syntax is essential.
Article 1255
Title@2025-06-08 (7): Tensor Product Neural Networks for Functional ANOVA Model
Title: Tensor Product Neural Networks for Functional ANOVA Model | Tensor Produkt Neuronale Netzwerke für funktionales ANOVA-Modell | 功能ANOVA模型的神经网络 2502.15215v4 |
Authors (5): Seokhun Park, Insung Kong, Yongchan Choi, Chanmoo Park, Yongdai Kim
Interpretability for machine learning models is becoming more and more important as machine learning models become more complex. The functional ANOVA model, which decomposes a high-dimensional function into a sum of lower dimensional functions (commonly referred to as components), is one of the most popular tools for interpretable AI, and recently, various neural networks have been developed for estimating each component in the functional ANOVA model. However, such neural networks are highly unstable when estimating each component since the components themselves are not uniquely defined. That is, there are multiple functional ANOVA decompositions for a given function. In this paper, we propose a novel neural network which guarantees a unique functional ANOVA decomposition and thus is able to estimate each component stably and accurately. We call our proposed neural network ANOVA Tensor Product Neural Network (ANOVA-TPNN) since it is motivated by the tensor product basis expansion. Theoretically, we prove that ANOVA-TPNN can approximate any smooth function well. Empirically, we show that ANOVA-TPNN provides much more stable estimation of each component, and thus much more stable interpretation when training data and initial values of the model parameters vary, than existing neural networks do.
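For readers unfamiliar with the decomposition, the following is a generic statement of the functional ANOVA model and of why its components are not unique without a constraint. The notation ($\beta_0$, $f_j$, $f_{jk}$, $\mu_j$) is chosen here for illustration, and the identifiability condition shown is one common choice; the exact condition enforced by ANOVA-TPNN may differ.

```latex
% Functional ANOVA decomposition of f : R^p -> R
f(x) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
     + \sum_{1 \le j < k \le p} f_{jk}(x_j, x_k) + \cdots

% Non-uniqueness: for any constant c, the shifted components
%   \beta_0' = \beta_0 - c, \qquad f_1'(x_1) = f_1(x_1) + c
% give exactly the same f, so the components are not identified.

% A common identifiability (sum-to-zero) condition: for every
% component f_S and every index j \in S,
\int f_S(x_S) \, d\mu_j(x_j) = 0
```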
Article 1256
Title@2025-06-08 (7): Comba: Improving Bilinear RNNs with Closed-loop Control
Title: Comba: Improving Bilinear RNNs with Closed-loop Control | Comba: Bilineare RNNs mit Closed-Loop-Steuerung verbessern | Comba: 改进有闭环控制的双线区域网网 2506.02475v2 |
Authors (8): Jiaxi Hu, Yongqi Pan, Jusen Du, Disen Lan, Xiaqiang Tang, Qingsong Wen, Yuxuan Liang, Weigao Sun
Recent efficient sequence modeling methods such as Gated DeltaNet, TTT, and RWKV-7 have achieved performance improvements by supervising the recurrent memory management through the Delta learning rule. Unlike previous state-space models (e.g., Mamba) and gated linear attentions (e.g., GLA), these models introduce interactions between the recurrent state and the key vector, structurally resembling bilinear systems. In this paper, we first introduce the concept of Bilinear RNNs with a comprehensive analysis of the advantages and limitations of these models. Then, based on closed-loop control theory, we propose a novel Bilinear RNN variant named Comba, which adopts a scalar-plus-low-rank state transition, with both state feedback and output feedback corrections. We also implement a hardware-efficient chunk-wise parallel kernel in Triton and train models with 340M/1.3B parameters on a large-scale corpus. Comba demonstrates superior performance and computational efficiency in both language and vision modeling.
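The interaction between recurrent state and key vector mentioned above can be seen in the plain Delta-rule memory update, sketched below. This is the generic rule, S_t = S_{t-1} - beta * (S_{t-1} k_t - v_t) k_t^T, written equivalently as S_{t-1} (I - beta k_t k_t^T) + beta v_t k_t^T; it is not Comba's scalar-plus-low-rank transition or its feedback corrections, which the abstract does not spell out. Function and variable names are assumptions.

```python
import numpy as np


def delta_rule_update(S, k, v, beta):
    """One Delta-rule step on an associative memory.

    S: (d_v, d_k) state matrix; k: (d_k,) key; v: (d_v,) value.
    The state is multiplied by the key (S @ k), which is the
    state-key interaction that makes the recurrence bilinear.
    """
    prediction = S @ k                      # what the memory recalls for k
    return S - beta * np.outer(prediction - v, k)


rng = np.random.default_rng(0)
S_prev = rng.normal(size=(4, 6))
key = rng.normal(size=6)
key /= np.linalg.norm(key)                  # unit-norm key
value = rng.normal(size=4)

S_next = delta_rule_update(S_prev, key, value, beta=1.0)

# Equivalent "state transition" form of the same update.
S_alt = S_prev @ (np.eye(6) - 1.0 * np.outer(key, key)) + 1.0 * np.outer(value, key)
```

With a unit-norm key and beta = 1, the updated memory recalls the new value exactly for that key, which is the error-correcting behavior the Delta rule is named for.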
Article 1257
Title@2025-06-08 (7): How Does DPO Reduce Toxicity? A Mechanistic Neuron-Level Analysis
Title: How Does DPO Reduce Toxicity? A Mechanistic Neuron-Level Analysis | Wie reduziert DPO die Toxizität? Eine mechanistische Neuronen-Level-Analyse | DPO 如何减少毒性? 2411.06424v3 |
Authors (5): Yushi Yang, Filip Sondej, Harry Mayne, Andrew Lee, Adam Mahdi
Safety fine-tuning algorithms reduce harmful outputs in language models, yet their mechanisms remain under-explored. Direct Preference Optimization (DPO) is a popular choice of algorithm, but prior explanations, attributing its effects solely to dampened toxic neurons in the MLP layers, are incomplete. In this study, we analyse four language models (Llama-3.1-8B, Gemma-2-2B, Mistral-7B, GPT-2-Medium) and show that toxic neurons only account for 2.5% to 24% of DPO’s effects across models. Instead, DPO balances distributed activation shifts across all MLP neurons to create a net toxicity reduction. We attribute this reduction to four neuron groups, two aligned with reducing toxicity and two promoting anti-toxicity, whose combined effects replicate DPO across models. To further validate this understanding, we develop an activation editing method mimicking DPO through distributed shifts along a toxicity representation. This method outperforms DPO in reducing toxicity while preserving perplexity, without requiring any weight updates. Our work provides a mechanistic understanding of DPO and introduces an efficient, tuning-free alternative for safety fine-tuning.
Article 1258
Title@2025-06-08 (7): Quantile-Optimal Policy Learning under Unmeasured Confounding
Title: Quantile-Optimal Policy Learning under Unmeasured Confounding | Quantile-Optimal Policy Learning unter ungemessener Verwirrung | 在无法计量的混乱状态下进行量化最佳政策学习 2506.07140v1 |
Authors (5): Zhongren Chen, Siyu Chen, Zhengling Qi, Xiaohong Chen, Zhuoran Yang
We study quantile-optimal policy learning where the goal is to find a policy whose reward distribution has the largest $\alpha$-quantile for some $\alpha \in (0, 1)$. We focus on the offline setting whose generating process involves unobserved confounders. Such a problem suffers from three main challenges: (i) nonlinearity of the quantile objective as a functional of the reward distribution, (ii) unobserved confounding, and (iii) insufficient coverage of the offline dataset. To address these challenges, we propose a suite of causal-assisted policy learning methods that provably enjoy strong theoretical guarantees under mild conditions. In particular, to address (i) and (ii), using causal inference tools such as instrumental variables and negative controls, we propose to estimate the quantile objectives by solving nonlinear functional integral equations. Then we adopt a minimax estimation approach with nonparametric models to solve these integral equations, and propose to construct conservative policy estimates that address (iii). The final policy is the one that maximizes these pessimistic estimates. In addition, we propose a novel regularized policy learning method that is more amenable to computation. Finally, we prove that the policies learned by these methods are $\tilde{\mathscr{O}}(n^{-1/2})$ quantile-optimal under a mild coverage assumption on the offline dataset. Here, $\tilde{\mathscr{O}}(\cdot)$ omits poly-logarithmic factors. To the best of our knowledge, we propose the first sample-efficient policy learning algorithms for estimating the quantile-optimal policy in the presence of unmeasured confounding.
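The difference between a quantile objective and the usual mean objective can be seen on toy numbers. The sketch below compares two made-up reward samples by their mean and by their empirical $\alpha$-quantile. It deliberately ignores confounding and offline coverage, which are the hard parts the paper addresses; the reward values and the helper name `empirical_quantile_value` are invented for illustration.

```python
import numpy as np


def empirical_quantile_value(rewards, alpha):
    """Empirical alpha-quantile of a sample of rewards."""
    return float(np.quantile(np.asarray(rewards, dtype=float), alpha))


# Hypothetical reward samples for two policies.
rewards_risky = [0, 0, 0, 10, 10]   # high mean, heavy lower tail
rewards_safe = [3, 3, 3, 3, 3]      # lower mean, no bad outcomes

alpha = 0.25
mean_risky, mean_safe = np.mean(rewards_risky), np.mean(rewards_safe)
q_risky = empirical_quantile_value(rewards_risky, alpha)
q_safe = empirical_quantile_value(rewards_safe, alpha)
```

Here the risky policy wins on the mean while the safe policy wins on the 0.25-quantile, so the two objectives select different policies.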
Article 1259
Title@2025-06-08 (7): Evaluation of Machine Learning Models in Student Academic Performance Prediction
Title: Evaluation of Machine Learning Models in Student Academic Performance Prediction | Evaluation von Machine Learning Models in der studentischen Leistungsvorhersage | 评价学生学术表现预测中的机器学习模式 2506.08047v1 |
Authors (2): A. G. R. Sandeepa, Sanka Mohottala
This research investigates the use of machine learning methods to forecast students’ academic performance in a school setting. Student data with behavioral, academic, and demographic details were used in implementations with standard classical machine learning models, including a multi-layer perceptron classifier (MLPC). MLPC obtained a maximum test-set accuracy of 86.46% across all implementations. Under 10-fold cross validation, MLPC obtained an average accuracy of 79.58% on the test set and 99.65% on the train set. MLPC’s better performance over other machine learning models strongly suggests the potential use of neural networks as data-efficient models. The feature selection approach played a crucial role in improving performance, and multiple evaluation approaches were used in order to compare with existing literature. Explainable machine learning methods were utilized to demystify the black box models and to validate the feature selection approach.
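The 10-fold cross-validation protocol behind the averaged accuracies can be sketched as a plain index splitter. This is a minimal standard-library version written for illustration, not the authors' pipeline; the function name, the seed, and the sample count of 25 are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f_id, fold in enumerate(folds) if f_id != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(25, k=10))
```

A model is fit on each `train_idx` and scored on the matching `test_idx`, and the k scores are averaged. A large gap between average train and test accuracy, like the one reported above, is the usual sign of overfitting.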
Article 1260
Title@2025-06-08 (7): Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning
Title: Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning | Zuverlässige Kritiker: Monotonische Verbesserung und Konvergenzgarantien für die Stärkung des Lernens | 可靠的批评:加强学习的单体改进和融合保障 2506.07134v1 |
Authors (4): Eshwar S. R., Gugan Thoppe, Aditya Gopalan, Gal Dalal
Despite decades of research, it remains challenging to correctly use Reinforcement Learning (RL) algorithms with function approximation. A prime example is policy iteration, whose fundamental guarantee of monotonic improvement collapses even under linear function approximation. To address this issue, we introduce Reliable Policy Iteration (RPI). It replaces the common projection or Bellman-error minimization during policy evaluation with a Bellman-based constrained optimization. We prove that not only does RPI confer textbook monotonicity on its value estimates but these estimates also lower bound the true return. Also, their limit partially satisfies the unprojected Bellman equation, emphasizing RPI’s natural fit within RL. RPI is the first algorithm with such monotonicity and convergence guarantees under function approximation. For practical use, we provide a model-free variant of RPI that amounts to a novel critic. It can be readily integrated into primary model-free PI implementations such as DQN and DDPG. In classical control tasks, such RPI-enhanced variants consistently maintain their lower-bound guarantee while matching or surpassing the performance of all baseline methods.
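For context, the monotonic-improvement guarantee referred to above is the textbook property of exact, tabular policy iteration: each greedy improvement step yields a value function that is at least as large in every state. The sketch below demonstrates that baseline on a two-state MDP invented for illustration. It is not RPI and uses no function approximation, which is precisely the setting where the guarantee is known to break.

```python
import numpy as np


def policy_iteration(P, R, gamma=0.9, max_iters=50):
    """Exact tabular policy iteration.

    P: (A, S, S) transition probabilities, P[a, s, s'].
    R: (S, A) expected immediate rewards.
    Returns the final policy and the value function of each iterate.
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    history = []
    for _ in range(max_iters):
        # Policy evaluation: solve (I - gamma * P_pi) V = r_pi exactly.
        P_pi = P[policy, states]            # (S, S)
        r_pi = R[states, policy]            # (S,)
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        history.append(V)
        # Policy improvement: act greedily with respect to Q.
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, history


# Toy MDP: action 0 stays in place, action 1 switches state;
# only staying in state 1 pays a reward of 1.
P_toy = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[0.0, 1.0], [1.0, 0.0]]])
R_toy = np.array([[0.0, 0.0],
                  [1.0, 0.0]])
final_policy, value_history = policy_iteration(P_toy, R_toy)
```

On this example the value iterates never decrease in any state from one policy to the next.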
Article 1261
Title@2025-06-08 (7): DiC: Rethinking Conv3x3 Designs in Diffusion Models
Title: DiC: Rethinking Conv3x3 Designs in Diffusion Models | DiC: Conv3x3-Designs in Diffusionsmodellen neu denken | DiC: 重新思考传播模型中的Conv3x3 设计 2501.00603v2 |
Authors (6): Yuchuan Tian, Jing Han, Chengcheng Wang, Yuchen Liang, Chao Xu, Hanting Chen
Diffusion models have shown exceptional performance in visual generation tasks. Recently, these models have shifted from traditional U-Shaped CNN-Attention hybrid structures to fully transformer-based isotropic architectures. While these transformers exhibit strong scalability and performance, their reliance on complicated self-attention operations results in slow inference speeds. Contrary to these works, we rethink one of the simplest yet fastest modules in deep learning, the 3x3 convolution, to construct a scaled-up purely convolutional diffusion model. We first discover that an Encoder-Decoder Hourglass design outperforms scalable isotropic architectures for Conv3x3, but still underperforms our expectations. To further improve the architecture, we introduce sparse skip connections to reduce redundancy and improve scalability. Based on this architecture, we introduce conditioning improvements including stage-specific embeddings, mid-block condition injection, and conditional gating. These improvements lead to our proposed Diffusion CNN (DiC), which serves as a swift yet competitive diffusion architecture baseline. Experiments on various scales and settings show that DiC surpasses existing diffusion transformers by considerable margins in terms of performance while keeping a good speed advantage. Project page: https://github.com/YuchuanTian/DiC
Article 1262
Title@2025-06-08 (7): Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models
Title: Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models | Gleichwertige Denoiser können Graphen nicht kopieren: Richten Sie Ihre Graphen-Diffusionsmodelle aus | 等式 Denois 无法复制图表: 对齐您的图形扩散模型 2405.17656v2 |
Authors (5): Najwa Laabid, Severi Rissanen, Markus Heinonen, Arno Solin, Vikas Garg
Graph diffusion models, dominant in graph generative modeling, remain underexplored for graph-to-graph translation tasks like chemical reaction prediction. We demonstrate that standard permutation equivariant denoisers face fundamental limitations in these tasks due to their inability to break symmetries in noisy inputs. To address this, we propose aligning input and target graphs to break input symmetries while preserving permutation equivariance in non-matching graph portions. Using retrosynthesis (i.e., the task of predicting precursors for synthesis of a given target molecule) as our application domain, we show how alignment dramatically improves discrete diffusion model performance from 5% to a SOTA-matching 54.7% top-1 accuracy. Code is available at https://github.com/Aalto-QuML/DiffAlign.
Article 1263
Title@2025-06-08 (7): Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models
Title: Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models | Quality-Diversity Red-Teaming: Automatisierte Generierung von hochwertigen und unterschiedlichen Attackern für große Sprachmodelle | 质量-多样性红队:为大语言模式自动生成高品质和不同攻击者 2506.07121v1 |
Authors (8): Ren-Jian Wang, Ke Xue, Zeyu Qin, Ziniu Li, Sheng Tang, Hao-Tian Li, Shengcai Liu, Chao Qian
Ensuring safety of large language models (LLMs) is important. Red teaming–a systematic approach to identifying adversarial prompts that elicit harmful responses from target LLMs–has emerged as a crucial safety evaluation method. Within this framework, the diversity of adversarial prompts is essential for comprehensive safety assessments. We find that previous approaches to red-teaming may suffer from two key limitations. First, they often pursue diversity through simplistic metrics like word frequency or sentence embedding similarity, which may not capture meaningful variation in attack strategies. Second, the common practice of training a single attacker model restricts coverage across potential attack styles and risk categories. This paper introduces Quality-Diversity Red-Teaming (QDRT), a new framework designed to address these limitations. QDRT achieves goal-driven diversity through behavior-conditioned training and implements a behavioral replay buffer in an open-ended manner. Additionally, it trains multiple specialized attackers capable of generating high-quality attacks across diverse styles and risk categories. Our empirical evaluation demonstrates that QDRT generates attacks that are both more diverse and more effective against a wide range of target LLMs, including GPT-2, Llama-3, Gemma-2, and Qwen2.5. This work advances the field of LLM safety by providing a systematic and effective approach to automated red-teaming, ultimately supporting the responsible deployment of LLMs.
Article 1264
Title@2025-06-08 (7): A Hybrid Multi-Factor Network with Dynamic Sequence Modeling for Early Warning of Intraoperative Hypotension
Title: A Hybrid Multi-Factor Network with Dynamic Sequence Modeling for Early Warning of Intraoperative Hypotension | Hybrides Multi-Factor-Netzwerk mit dynamischer Sequenzmodellierung zur Frühwarnung von intraoperativer Hypotonie | 混合多要素网络,具有动态序列模型模型,以及早警告不合作水分的不合作状态; 2409.11064v4 |
Authors (4): Mingyue Cheng, Jintao Zhang, Zhiding Liu, Chunli Liu
Intraoperative hypotension (IOH) prediction using past physiological signals is crucial, as IOH may lead to inadequate organ perfusion and significantly elevate the risk of severe complications and mortality. However, current methods often rely on static modeling, overlooking the complex temporal dependencies and the inherently non-stationary nature of physiological signals. We propose a Hybrid Multi-Factor (HMF) network that formulates IOH prediction as a dynamic sequence forecasting task, explicitly capturing both temporal dependencies and physiological non-stationarity. We represent signal dynamics as multivariate time series and decompose them into trend and seasonal components, enabling separate modeling of long-term and periodic variations. Each component is encoded with a patch-based Transformer to balance computational efficiency and feature representation. To address distributional drift from evolving signals, we introduce a symmetric normalization mechanism. Experiments on both public and real-world clinical datasets show that HMF significantly outperforms competitive baselines. We hope HMF offers new insights into IOH prediction and ultimately promotes safer surgical care. Our code is available at https://github.com/Mingyue-Cheng/HMF.
Article 1265
Title@2025-06-08 (7): Conditional Denoising Meets Polynomial Modeling: A Flexible Decoupled Framework for Time Series Forecasting
Title: Conditional Denoising Meets Polynomial Modeling: A Flexible Decoupled Framework for Time Series Forecasting | Bedingtes Stören trifft auf Polynommodellierung: Ein flexibles entkoppeltes Framework für die Zeitreihenprognose | 满足多面性建模:时间序列预测灵活拆分框架 2410.13253v7 |
Authors (5): Jintao Zhang, Mingyue Cheng, Xiaoyu Tao, Zhiding Liu, Daoyu Wang
Time series forecasting models are becoming increasingly prevalent due to their critical role in decision-making across various domains. However, most existing approaches model temporal patterns in coupled form, often neglecting the distinction between their specific components. In particular, fluctuating patterns and smooth trends within time series exhibit distinct characteristics. In this work, to model complicated temporal patterns, we propose a Conditional Denoising Polynomial Modeling (CDPM) framework, where probabilistic diffusion models and deterministic linear models are trained end-to-end. Instead of modeling the coupled time series, CDPM decomposes it into trend and seasonal components and models them separately. To capture the fluctuating seasonal component, we employ a probabilistic diffusion model based on statistical properties from the historical window. For the smooth trend component, a module is proposed to enhance linear models by incorporating historical dependencies, thereby preserving underlying trends and mitigating noise distortion. Extensive experiments conducted on six benchmarks demonstrate the effectiveness of our framework, highlighting the potential of combining probabilistic and deterministic models. Our code is available at https://github.com/zjt-gpu/CDPM.
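The trend/seasonal split mentioned in this abstract can be illustrated with a centered moving average. The abstract does not say which decomposition CDPM uses, so the function name, window size, and padding rule below are assumptions for illustration.

```python
def moving_average_decompose(series, window=5):
    """Split a series into a smooth trend and a residual 'seasonal' part.

    The trend is a centered moving average (edge values are repeated as
    padding); the seasonal part is whatever the trend leaves behind.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd for a centered average")
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    trend = [sum(padded[i:i + window]) / window for i in range(len(series))]
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


# A linear ramp with an alternating +1/-1 fluctuation on top.
series = [float(t) + (1.0 if t % 2 == 0 else -1.0) for t in range(12)]
trend, seasonal = moving_average_decompose(series, window=5)
```

By construction `trend[i] + seasonal[i]` reproduces `series[i]`, so the two components can be modeled separately and summed for the final forecast.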
Article 1266
Title@2025-06-08 (7): Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings
Title: Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings | Auf dem Weg zu einer universellen Offline-Black-Box-Optimierung über Sprachmodell-Embeddings | 通过学习语言模型嵌入式嵌入,实现普遍离线黑箱优化 2506.07109v1 |
Authors (7): Rong-Xi Tan, Ming Chen, Ke Xue, Yao Wang, Yaoyuan Wang, Sheng Fu, Chao Qian
The pursuit of universal black-box optimization (BBO) algorithms is a longstanding goal. However, unlike domains such as language or vision, where scaling structured data has driven generalization, progress in offline BBO remains hindered by the lack of unified representations for heterogeneous numerical spaces. Thus, existing offline BBO approaches are constrained to single-task and fixed-dimensional settings, failing to achieve cross-domain universal optimization. Recent advances in language models (LMs) offer a promising path forward: their embeddings capture latent relationships in a unifying way, making universal optimization across different data types possible. In this paper, we discuss multiple potential approaches, including an end-to-end learning framework in the form of next-token prediction, as well as prioritizing the learning of latent spaces with strong representational capabilities. To validate the effectiveness of these methods, we collect offline BBO tasks and data from open-source academic works for training. Experiments demonstrate the universality and effectiveness of our proposed methods. Our findings suggest that unifying language model priors and learning a string embedding space can overcome traditional barriers in universal BBO, paving the way for general-purpose BBO algorithms. The code is provided at https://github.com/lamda-bbo/universal-offline-bbo.
Article 1267
Title@2025-06-08 (7): Filling the Missings: Spatiotemporal Data Imputation by Conditional Diffusion
Title: Filling the Missings: Spatiotemporal Data Imputation by Conditional Diffusion | Füllung der fehlenden Daten: Spatiotemporale Daten-Imputation durch Bedingte Diffusion | 填补失踪人员:通过有条件的传播来估计时空数据 2506.07099v1 |
Authors (5): Wenying He, Jieling Huang, Junhua Gu, Ji Zhang, Yude Bai
Missing data in spatiotemporal systems presents a significant challenge for modern applications, ranging from environmental monitoring to urban traffic management. The integrity of spatiotemporal data often deteriorates due to hardware malfunctions and software failures in real-world deployments. Current approaches based on machine learning and deep learning struggle to model the intricate interdependencies between spatial and temporal dimensions effectively and, more importantly, suffer from cumulative errors during the data imputation process, which propagate and amplify through iterations. To address these limitations, we propose CoFILL, a novel Conditional Diffusion Model for spatiotemporal data imputation. CoFILL builds on the inherent advantages of diffusion models to generate high-quality imputations without relying on potentially error-prone prior estimates. It incorporates an innovative dual-stream architecture that processes temporal and frequency domain features in parallel. By fusing these complementary features, CoFILL captures both rapid fluctuations and underlying patterns in the data, which enables more robust imputation. The extensive experiments reveal that CoFILL’s noise prediction network successfully transforms random noise into meaningful values that align with the true data distribution. The results also show that CoFILL outperforms state-of-the-art methods in imputation accuracy. The source code is publicly available at https://github.com/joyHJL/CoFILL.
Article 1268
Title@2025-06-08 (7): A Theoretical Characterization of Optimal Data Augmentations in Self-Supervised Learning
Title: A Theoretical Characterization of Optimal Data Augmentations in Self-Supervised Learning | Eine theoretische Charakterisierung von Optimal Data Augmentations im Selbst-überwachten Lernen | 自我监督学习中最佳数据增加的理论特征 2411.01767v4 |
Authors (3): Shlomo Libo Feigin, Maximilian Fleissner, Debarghya Ghoshdastidar
Data augmentations play an important role in the recent success of self-supervised learning (SSL). While augmentations are commonly understood to encode invariances between different views into the learned representations, this interpretation overlooks the impact of the pretraining architecture and suggests that SSL would require diverse augmentations which resemble the data to work well. However, these assumptions do not align with empirical evidence, encouraging further theoretical understanding to guide the principled design of augmentations in new domains. To this end, we use kernel theory to derive analytical expressions for data augmentations that achieve desired target representations after pretraining. We consider non-contrastive and contrastive losses, namely VICReg, Barlow Twins and the Spectral Contrastive Loss, and provide an algorithm to construct such augmentations. Our analysis shows that augmentations need not be similar to the data to learn useful representations, nor be diverse, and that the architecture has a significant impact on the optimal augmentations.
Article 1269
Title@2025-06-08 (7): Epidemic-guided deep learning for spatiotemporal forecasting of Tuberculosis outbreak
Title: Epidemic-guided deep learning for spatiotemporal forecasting of Tuberculosis outbreak | Epidemisch geführtes Deep Learning für die spatiotemporale Vorhersage des Tuberkuloseausbruchs | 流行病指导深层学习,以对结核病爆发进行时空预测 2502.10786v2 |
Authors (4): Madhab Barman, Madhurima Panja, Nachiketa Mishra, Tanujit Chakraborty
Tuberculosis (TB) remains a formidable global health challenge, driven by complex spatiotemporal transmission dynamics and influenced by factors such as population mobility and behavioral changes. We propose an Epidemic-Guided Deep Learning (EGDL) approach that fuses mechanistic epidemiological principles with advanced deep learning techniques to enhance early warning systems and intervention strategies for TB outbreaks. Our framework is built upon a modified networked Susceptible-Infectious-Recovered (MN-SIR) model augmented with a saturated incidence rate and graph Laplacian diffusion, capturing both long-term transmission dynamics and region-specific population mobility patterns. Compartmental model parameters are rigorously estimated using Bayesian inference via the Markov Chain Monte Carlo approach. Theoretical analysis leveraging the comparison principle and Green’s formula establishes global stability properties of the disease-free and endemic equilibria. Building on these epidemiological insights, we design two forecasting architectures, EGDL-Parallel and EGDL-Series, that integrate the mechanistic outputs of the MN-SIR model within deep neural networks. This integration mitigates the overfitting risks commonly encountered in data-driven methods and filters out noise inherent in surveillance data, resulting in reliable forecasts of real-world epidemic trends. Experiments conducted on TB incidence data from 47 prefectures in Japan and 31 provinces in mainland China demonstrate that our approach delivers robust and accurate predictions across multiple time horizons (short to medium-term forecasts), supporting its generalizability across regions with different population dynamics.
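To make the mechanistic part of this abstract concrete, the sketch below simulates a small networked SIR model with a saturated incidence term and graph Laplacian diffusion using forward Euler steps. The exact MN-SIR equations are not given in the abstract, so the functional form, the parameter names (`beta`, `alpha`, `gamma`, `d`, `dt`), and the three-region path graph are all assumptions.

```python
def laplacian(adjacency):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adjacency)
    return [[(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j]
             for j in range(n)] for i in range(n)]


def step(S, I, R, L, beta=0.5, alpha=2.0, gamma=0.1, d=0.05, dt=0.1):
    """One forward-Euler step of a networked SIR model (assumed form).

    Infection uses a saturated incidence beta * S * I / (1 + alpha * I);
    mobility between regions is modeled as diffusion, -d * (L @ x).
    """
    n = len(S)

    def diffuse(x, i):
        return -d * sum(L[i][j] * x[j] for j in range(n))

    new_S, new_I, new_R = [], [], []
    for i in range(n):
        infection = beta * S[i] * I[i] / (1.0 + alpha * I[i])
        new_S.append(S[i] + dt * (-infection + diffuse(S, i)))
        new_I.append(I[i] + dt * (infection - gamma * I[i] + diffuse(I, i)))
        new_R.append(R[i] + dt * (gamma * I[i] + diffuse(R, i)))
    return new_S, new_I, new_R


# Three regions on a path graph; the outbreak starts in region 0 only.
L = laplacian([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
S, I, R = [0.99, 1.0, 1.0], [0.01, 0.0, 0.0], [0.0, 0.0, 0.0]
for _ in range(200):
    S, I, R = step(S, I, R, L)
total_population = sum(S) + sum(I) + sum(R)
```

Because the Laplacian of an undirected graph has zero column sums, diffusion only moves population between regions, so the total stays constant while the infection spreads to regions that started with none.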
Article 1270
Title@2025-06-08 (7): Patient Similarity Computation for Clinical Decision Support: An Efficient Use of Data Transformation, Combining Static and Time Series Data
Title: Patient Similarity Computation for Clinical Decision Support: An Efficient Use of Data Transformation, Combining Static and Time Series Data | Patienten-Ähnlichkeitsrechnung für die klinische Entscheidungsunterstützung: Eine effiziente Nutzung von Datentransformationen, die Kombination von statischen und Zeitreihendaten | 临床决定支持:有效利用数据转换、静态和时间系列数据 2506.07092v1 |
Authors (4): Joydeb Kumar Sana, Mohammad M. Masud, M Sohel Rahman, M Saifur Rahman
Patient similarity computation (PSC) is a fundamental problem in healthcare informatics. The aim of patient similarity computation is to measure the similarity among patients according to their historical clinical records, which helps to improve clinical decision support. This paper presents a novel distributed patient similarity computation (DPSC) technique based on data transformation (DT) methods, utilizing an effective combination of time series and static data. Time series data are sensor-collected patient information, including metrics such as heart rate, blood pressure, oxygen saturation, and respiration. The static data are mainly patient background and demographic data, including age, weight, height, and gender. Static data are used for clustering the patients. Before the static data are fed to the machine learning model, adaptive Weight-of-Evidence (aWOE) and Z-score data transformation (DT) methods are applied, which improve prediction performance. In aWOE-based patient similarity models, sensitive patient information is processed using aWOE, which preserves the data privacy of the trained models. We use the Dynamic Time Warping (DTW) approach, which is robust and very popular, for time series similarity. However, DTW is not suitable for big data due to its significant computational run-time. To overcome this problem, distributed DTW computation is used in this study. For Coronary Artery Disease, our DT-based approach boosts prediction performance by as much as 11.4%, 10.20%, and 12.6% in terms of AUC, accuracy, and F-measure, respectively. In the case of Congestive Heart Failure (CHF), our proposed method achieves performance enhancement of up to 15.9%, 10.5%, and 21.9% for the same measures, respectively. The proposed method reduces the computation time by as much as 40%.
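For readers unfamiliar with DTW, a minimal single-machine sketch follows. It is the textbook quadratic-time recurrence with an absolute-difference cost, not the distributed variant the paper describes; the heart-rate values are made up for illustration.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


heart_rate_a = [60, 62, 65, 70, 65, 62]
heart_rate_b = [60, 60, 62, 65, 70, 65, 62]  # same shape, one sample late
shifted = dtw_distance(heart_rate_a, heart_rate_b)
different = dtw_distance(heart_rate_a, [90, 95, 99, 95, 90, 88])
```

The quadratic table is what makes plain DTW expensive on large cohorts, and is the motivation the abstract gives for distributing the pairwise computations.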
Article 1271
Title@2025-06-08 (7): Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models
Title: Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models | Stellen Sie sich vor zu hören: Auditory Knowledge Generation kann ein effektiver Assistent für Sprachmodelle sein | 想象一下:产生听力知识可以成为语言模式的有效助理。 2503.16853v2 |
Authors (3): Suho Yoo, Hyunjong Ok, Jaeho Lee
Language models pretrained on text-only corpora often struggle with tasks that require auditory commonsense knowledge. Previous work addresses this problem by augmenting the language model to retrieve knowledge from external audio databases. This approach has several limitations, such as the potential lack of relevant audio in databases and the high costs associated with constructing the databases. To address these issues, we propose Imagine to Hear, a novel approach that dynamically generates auditory knowledge using generative models. Our framework detects multiple audio-related textual spans from the given prompt and generates corresponding auditory knowledge. We develop several mechanisms to efficiently process multiple pieces of auditory knowledge, including a CLAP-based rejection sampler and a language-audio fusion module. Our experiments show that our method achieves state-of-the-art performance on AuditoryBench without relying on external databases, highlighting the effectiveness of our generation-based approach.
Article 1272
Title@2025-06-08 (7): State Entropy Regularization for Robust Reinforcement Learning
Title: State Entropy Regularization for Robust Reinforcement Learning | Staatliche Entropie-Regularisierung für robustes Stärkungslernen | 强力强化学习的状态元件常规化 2506.07085v1 |
Authors (6): Uri Koren, Yonatan Ashlag, Mirco Mutti, Esther Derman, Pierre-Luc Bacon, Shie Mannor
State entropy regularization has empirically shown better exploration and sample complexity in reinforcement learning (RL). However, its theoretical guarantees have not been studied. In this paper, we show that state entropy regularization improves robustness to structured and spatially correlated perturbations. These types of variation are common in transfer learning but often overlooked by standard robust RL methods, which typically focus on small, uncorrelated changes. We provide a comprehensive characterization of these robustness properties, including formal guarantees under reward and transition uncertainty, as well as settings where the method performs poorly. Much of our analysis contrasts state entropy with the widely used policy entropy regularization, highlighting their different benefits. Finally, from a practical standpoint, we illustrate that compared with policy entropy, the robustness advantages of state entropy are more sensitive to the number of rollouts used for policy evaluation.
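The quantity being regularized here is the entropy of the state visitation distribution, as opposed to the entropy of the action distribution at each state. A minimal sketch of an empirical estimate from rollouts is shown below; the state labels and trajectories are hypothetical, and practical implementations for continuous states typically use other estimators.

```python
import math
from collections import Counter


def entropy(probabilities):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def state_entropy(visited_states):
    """Entropy of the empirical state visitation distribution of a rollout."""
    counts = Counter(visited_states)
    total = len(visited_states)
    return entropy(c / total for c in counts.values())


narrow_rollout = ["s0", "s0", "s1", "s0"]  # keeps revisiting the same states
broad_rollout = ["s0", "s1", "s2", "s3"]   # covers four states evenly
narrow_h = state_entropy(narrow_rollout)
broad_h = state_entropy(broad_rollout)
```

Since the estimate is built from visited states, its quality depends on how many rollouts are collected, which is consistent with the sensitivity to rollout count noted at the end of the abstract.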
Article 1273
Title@2025-06-08 (7): Inverse Design of Metamaterials with Manufacturing-Guiding Spectrum-to-Structure Conditional Diffusion Model
Title: Inverse Design of Metamaterials with Manufacturing-Guiding Spectrum-to-Structure Conditional Diffusion Model | Inverses Design von Metamaterialien mit Fertigungsleit-Spektrum-zu-Struktur Bedingtes Diffusionsmodell | 具有 “ 制造-引导光谱到结构有条件扩散模型 “ 的元材料逆向设计 2506.07083v1 |
Authors (12): Jiawen Li, Jiang Guo, Yuanzhe Li, Zetian Mao, Jiaxing Shen, Tashi Xu, Diptesh Das, Jinming He, Run Hu, Yaerim Lee, Koji Tsuda, Junichiro Shiomi
Metamaterials are artificially engineered structures that manipulate electromagnetic waves, having optical properties absent in natural materials. Recently, machine learning for the inverse design of metamaterials has drawn attention. However, the highly nonlinear relationship between the metamaterial structures and optical behaviour, coupled with fabrication difficulties, poses challenges for using machine learning to design and manufacture complex metamaterials. Herein, we propose a general framework that implements customised spectrum-to-shape and size parameters to address one-to-many metamaterial inverse design problems using conditional diffusion models. Our method exhibits superior spectral prediction accuracy, generates a diverse range of patterns compared to other typical generative models, and offers valuable prior knowledge for manufacturing through the subsequent analysis of the diverse generated results, thereby facilitating the experimental fabrication of metamaterial designs. We demonstrate the efficacy of the proposed method by successfully designing and fabricating a free-form metamaterial with a tailored selective emission spectrum for thermal camouflage applications.
Article 1274
Title@2025-06-08 (7): On the Generalization of Data-Assisted Control in port-Hamiltonian Systems (DAC-pH)
Title: On the Generalization of Data-Assisted Control in port-Hamiltonian Systems (DAC-pH) | Zur Generalisierung der datengestützten Steuerung in port-Hamiltonischen Systemen (DAC-pH) | 港口-汉堡系统数据辅助控制普遍化(DAC-pH) 2506.07079v1 |
Authors (2): Mostafa Eslami, Maryam Babazadeh
This paper introduces a hypothetical hybrid control framework for port-Hamiltonian (p$\mathcal{H}$) systems, employing a dynamic decomposition based on Data-Assisted Control (DAC). The system’s evolution is split into two parts with fixed topology: the Right-Hand Side (RHS), an intrinsic Hamiltonian flow handling worst-case parametric uncertainties, and the Left-Hand Side (LHS), a dissipative/input flow addressing both structural and parametric uncertainties. A virtual port variable $\Pi$ serves as the interface between these two components. A nonlinear controller manages the intrinsic Hamiltonian flow, determining a desired port control value $\Pi_c$. Concurrently, Reinforcement Learning (RL) is applied to the dissipative/input flow to learn an agent that provides an optimal policy for mapping $\Pi_c$ to the actual system input. This hybrid approach effectively manages RHS uncertainties while preserving the system’s inherent structure. Key advantages include adjustable performance via LHS controller parameters, enhanced AI explainability and interpretability through the port variable $\Pi$, the ability to guarantee safety and state attainability with hard/soft constraints, reduced complexity in learning hypothesis classes compared to end-to-end solutions, and improved state/parameter estimation using LHS prior knowledge and the system Hamiltonian to address partial observability. The paper details the p$\mathcal{H}$ formulation, derives the decomposition, and presents the modular controller architecture. Beyond design, crucial aspects of stability and robustness analysis and synthesis are investigated, paving the way for deeper theoretical investigations. An application example, a pendulum with nonlinear dynamics, is simulated to demonstrate the approach’s empirical and phenomenological benefits for future research.
Article 1275
Title@2025-06-08 (7): AgentDrug: Utilizing Large Language Models in an Agentic Workflow for Zero-Shot Molecular Optimization
Title: AgentDrug: Utilizing Large Language Models in an Agentic Workflow for Zero-Shot Molecular Optimization | AgentDrug: Einsatz großer Sprachmodelle in einem Agentischen Workflow für die Null-Shot-Molekularoptimierung | A剂药物:利用大型语言模型进行零热分子优化的制剂工作流程中的大语言模型 2410.13147v9 |
Authors (3): Khiem Le, Ting Hua, Nitesh V. Chawla
Molecular optimization – modifying a given molecule to improve desired properties – is a fundamental task in drug discovery. While LLMs hold the potential to solve this task using natural language to drive the optimization, straightforward prompting achieves limited accuracy. In this work, we propose AgentDrug, an agentic workflow that leverages LLMs in a structured refinement process to achieve significantly higher accuracy. AgentDrug defines a nested refinement loop: the inner loop uses feedback from cheminformatics toolkits to validate molecular structures, while the outer loop guides the LLM with generic feedback and a gradient-based objective to steer the molecule toward property improvement. We evaluate AgentDrug on benchmarks with both single- and multi-property optimization under loose and strict thresholds. Results demonstrate significant performance gains over previous methods. With Qwen-2.5-3B, AgentDrug improves accuracy by 20.7% (loose) and 16.8% (strict) on six single-property tasks, and by 7.0% and 5.3% on eight multi-property tasks. With the larger model Qwen-2.5-7B, AgentDrug further improves accuracy on six single-property objectives by 28.9% (loose) and 29.0% (strict), and on eight multi-property objectives by 14.9% (loose) and 13.2% (strict).
Article 1276
Title@2025-06-08 (7): E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models
Title: E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models | E-BATS: Effiziente Backpropagation-freie Testzeitanpassung für Sprachstiftungsmodelle | EBATS: 语音基础模型的有效后支持-无测试-时间适应 2506.07078v1 |
Authors (6): Jiaheng Dong, Hong Jia, Soumyajit Chatterjee, Abhirup Ghosh, James Bailey, Ting Dang
Speech Foundation Models encounter significant performance degradation when deployed in real-world scenarios involving acoustic domain shifts, such as background noise and speaker accents. Test-time adaptation (TTA) has recently emerged as a viable strategy to address such domain shifts at inference time without requiring access to source data or labels. However, existing TTA approaches, particularly those relying on backpropagation, are memory-intensive, limiting their applicability in speech tasks and resource-constrained settings. Although backpropagation-free methods offer improved efficiency, existing ones exhibit poor accuracy. This is because they are predominantly developed for vision tasks, which fundamentally differ from speech task formulations, noise characteristics, and model architecture, posing unique transferability challenges. In this paper, we introduce E-BATS, the first Efficient BAckpropagation-free TTA framework designed explicitly for speech foundation models. E-BATS achieves a balance between adaptation effectiveness and memory efficiency through three key components: (i) lightweight prompt adaptation for a forward-pass-based feature alignment, (ii) a multi-scale loss to capture both global (utterance-level) and local distribution shifts (token-level) and (iii) a test-time exponential moving average mechanism for stable adaptation across utterances. Experiments conducted on four noisy speech datasets spanning sixteen acoustic conditions demonstrate consistent improvements, with 4.1%-13.5% accuracy gains over backpropagation-free baselines and 2.0-6.4 times GPU memory savings compared to backpropagation-based methods. By enabling scalable and robust adaptation under acoustic variability, this work paves the way for developing more efficient adaptation approaches for practical speech processing systems in real-world environments.
Article 1277
Title@2025-06-08 (7): Diversity from Human Feedback
Title: Diversity from Human Feedback | Vielfalt durch menschliches Feedback | 来自人类反馈的多样性 2310.06648v3 |
Authors (7): Ren-Jian Wang, Ke Xue, Yutong Wang, Peng Yang, Haobo Fu, Qiang Fu, Chao Qian
Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define the diversity measure is a longstanding problem. Many methods rely on expert experience to define a proper behavior space and then obtain the diversity measure, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that the behavior learned by DivHF is much more consistent with human requirements than the one learned by direct data-driven approaches without human feedback, and makes the final solutions more diverse under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.
Article 1278
Title@2025-06-08 (7): Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization
Title: Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization | Beschleunigung der 3D Gaussian Splatting mit neuraler Sortierung und achsenorientierter Rasterisierung | 加速 3D Gaussian 与神经排序和轴偏向光化加速 3D Gaussian 喷洒 2506.07069v1 |
Authors (10): Zhican Wang, Guanghui He, Dantong Liu, Lingjun Gao, Shell Xu Hu, Chen Zhang, Zhuoran Song, Nicholas Lane, Wayne Luk, Hongxiang Fan
3D Gaussian Splatting (3DGS) has recently gained significant attention for high-quality and efficient view synthesis, making it widely adopted in fields such as AR/VR, robotics, and autonomous driving. Despite its impressive algorithmic performance, real-time rendering on resource-constrained devices remains a major challenge due to tight power and area budgets. This paper presents an architecture-algorithm co-design to address these inefficiencies. First, we reveal substantial redundancy caused by repeated computation of common terms/expressions during the conventional rasterization. To resolve this, we propose axis-oriented rasterization, which pre-computes and reuses shared terms along both the X and Y axes through a dedicated hardware design, effectively reducing multiply-and-add (MAC) operations by up to 63%. Second, by identifying the resource and performance inefficiency of the sorting process, we introduce a novel neural sorting approach that predicts order-independent blending weights using an efficient neural network, eliminating the need for costly hardware sorters. A dedicated training framework is also proposed to improve its algorithmic stability. Third, to uniformly support rasterization and neural network inference, we design an efficient reconfigurable processing array that maximizes hardware utilization and throughput. Furthermore, we introduce a $\pi$-trajectory tile schedule, inspired by Morton encoding and Hilbert curve, to optimize Gaussian reuse and reduce memory access overhead. Comprehensive experiments demonstrate that the proposed design preserves rendering quality while achieving a speedup of $23.4\sim27.8\times$ and energy savings of $28.8\sim51.4\times$ compared to edge GPUs for real-world scenes. We plan to open-source our design to foster further development in this field.
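The abstract names Morton encoding as one inspiration for its tile schedule. The sketch below shows plain Morton (Z-order) encoding only, not the paper's $\pi$-trajectory schedule; the 4x4 tile grid and the bit width are assumptions for illustration.

```python
def morton_encode(x, y, bits=16):
    """Interleave the bits of x and y: x fills even positions, y odd ones."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


# Visit a 4x4 grid of tiles in Z-order instead of row by row.
tiles = sorted(((x, y) for y in range(4) for x in range(4)),
               key=lambda tile: morton_encode(*tile))
```

Tiles that are adjacent in this order tend to be adjacent on screen, which is why such space-filling orders are a common starting point for improving data reuse between consecutive tiles.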
Article 1279
Title@2025-06-08 (7): Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems
Title: Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems | Stabile Derivative Freie Gaussian Mischung Variationelle Schlussfolgerung für Bayesian Inverse Probleme | Bayes 反问题中稳定衍生自由高斯混集变异推论 2501.04259v3 |
Authors (5): Baojun Che, Yifan Chen, Zhenghao Huan, Daniel Zhengyu Huang, Weijie Wang
This paper is concerned with the approximation of probability distributions known up to normalization constants, with a focus on Bayesian inference for large-scale inverse problems in scientific computing. In this context, key challenges include costly repeated evaluations of forward models, multimodality, and inaccessible gradients for the forward model. To address them, we develop a variational inference framework that combines Fisher-Rao natural gradient with specialized quadrature rules to enable derivative free updates of Gaussian mixture variational families. The resulting method, termed Derivative Free Gaussian Mixture Variational Inference (DF-GMVI), guarantees covariance positivity and affine invariance, offering a stable and efficient framework for approximating complex posterior distributions. The effectiveness of DF-GMVI is demonstrated through numerical experiments on challenging scenarios, including distributions with multiple modes, infinitely many modes, and curved modes in spaces with up to 100 dimensions. The method’s practicality is further demonstrated in a large-scale application, where it successfully recovers the initial conditions of the Navier-Stokes equations from solution data at positive times.
Article 1280
Title@2025-06-08 (7): AdaSplash: Adaptive Sparse Flash Attention
Title: AdaSplash: Adaptive Sparse Flash Attention | AdaSplash: Adaptive Sparse Flash Achtung | AdaSplash: 适应性散散闪光关注 2502.12082v2 |
Authors (3): Nuno Gonçalves, Marcos Treviso, André F. T. Martins
The computational cost of softmax-based attention in transformers limits their applicability to long-context tasks. Adaptive sparsity, of which $\alpha$-entmax attention is an example, offers a flexible data-dependent alternative, but existing implementations are inefficient and do not leverage the sparsity to obtain runtime and memory gains. In this work, we propose AdaSplash, which combines the efficiency of GPU-optimized algorithms with the sparsity benefits of $\alpha$-entmax. We first introduce a hybrid Halley-bisection algorithm, resulting in a 7-fold reduction in the number of iterations needed to compute the $\alpha$-entmax transformation. Then, we implement custom Triton kernels to efficiently handle adaptive sparsity. Experiments with RoBERTa and ModernBERT for text classification and single-vector retrieval, along with GPT-2 for language modeling, show that our method achieves substantial improvements in runtime and memory efficiency compared to existing $\alpha$-entmax implementations. It approaches – and in some cases surpasses – the efficiency of highly optimized softmax implementations like FlashAttention-2, enabling long-context training while maintaining strong task performance.
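As background for the $\alpha$-entmax transformation, the sketch below computes it with plain bisection on the threshold $\tau$. It is a reference-style illustration in pure Python, not the hybrid Halley-bisection algorithm or the Triton kernels the abstract describes; the function name and iteration count are assumptions.

```python
def entmax_bisect(scores, alpha=1.5, iterations=50):
    """alpha-entmax via bisection on the threshold tau (requires alpha > 1).

    Finds tau such that
        sum_i max((alpha - 1) * z_i - tau, 0) ** (1 / (alpha - 1)) = 1.
    """
    if alpha <= 1.0:
        raise ValueError("this sketch requires alpha > 1")
    scaled = [(alpha - 1.0) * z for z in scores]
    exponent = 1.0 / (alpha - 1.0)

    def probs(tau):
        return [max(s - tau, 0.0) ** exponent for s in scaled]

    # At tau = max - 1 the largest entry alone contributes 1 (total >= 1);
    # at tau = max - n**(1 - alpha) every entry is at most 1/n (total <= 1).
    lo = max(scaled) - 1.0
    hi = max(scaled) - len(scaled) ** (1.0 - alpha)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if sum(probs(mid)) >= 1.0:
            lo = mid
        else:
            hi = mid
    p = probs(lo)
    total = sum(p)
    return [v / total for v in p]  # renormalize away the residual error


p = entmax_bisect([2.0, 1.0, -1.0], alpha=1.5)
```

Unlike softmax, low-scoring entries receive exactly zero probability, which is the sparsity an efficient kernel can exploit by skipping the corresponding work.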
Article 1281
Title@2025-06-08 (7): D2R: dual regularization loss with collaborative adversarial generation for model robustness
Title: D2R: dual regularization loss with collaborative adversarial generation for model robustness | D2R: Dualer Regularisierungsverlust mit kollaborativer konversatorischer Generation für Modell Robustheit | D2R:双重身份正规化损失,为示范性稳健而协作性对抗性一代人 2506.07056v1 |
Authors (6): Zhenyu Liu, Huizhi Liang, Rajiv Ranjan, Zhanxing Zhu, Vaclav Snasel, Varun Ojha
The robustness of Deep Neural Network models is crucial for defending models against adversarial attacks. Recent defense methods have employed collaborative learning frameworks to enhance model robustness. Two key limitations of existing methods are (i) insufficient guidance of the target model via loss functions and (ii) non-collaborative adversarial generation. We, therefore, propose a dual regularization loss (D2R Loss) method and a collaborative adversarial generation (CAG) strategy for adversarial training. D2R loss includes two optimization steps. The adversarial distribution and clean distribution optimizations enhance the target model’s robustness by leveraging the strengths of different loss functions obtained via a suitable function space exploration to focus more precisely on the target model’s distribution. CAG generates adversarial samples using a gradient-based collaboration between guidance and target models. We conducted extensive experiments on three benchmark datasets (CIFAR-10, CIFAR-100, and Tiny ImageNet) and two popular target models, WideResNet34-10 and PreActResNet18. Our results show that D2R loss with CAG produces highly robust models.
Article 1282
Title@2025-06-08 (7): Policy Gradient with Tree Search: Avoiding Local Optimas through Lookahead
Title: Policy Gradient with Tree Search: Avoiding Local Optimas through Lookahead | Policy Gradient mit Baumsuche: Lokale Optimas durch Lookahead vermeiden | 与树搜索的政策渐变:通过 Lookahead 避免本地 Oppima 2506.07054v1 |
Authors (6): Uri Koren, Navdeep Kumar, Uri Gadot, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor
Classical policy gradient (PG) methods in reinforcement learning frequently converge to suboptimal local optima, a challenge exacerbated in large or complex environments. This work investigates Policy Gradient with Tree Search (PGTS), an approach that integrates an $m$-step lookahead mechanism to enhance policy optimization. We provide theoretical analysis demonstrating that increasing the tree search depth $m$ monotonically reduces the set of undesirable stationary points and, consequently, improves the worst-case performance of any resulting stationary policy. Critically, our analysis accommodates practical scenarios where policy updates are restricted to states visited by the current policy, rather than requiring updates across the entire state space. Empirical evaluations on diverse MDP structures, including Ladder, Tightrope, and Gridworld environments, illustrate PGTS’s ability to exhibit “farsightedness,” navigate challenging reward landscapes, escape local traps where standard PG fails, and achieve superior solutions.
Article 1283
Title@2025-06-08 (7): Hyperbolic recurrent neural network as the first type of non-Euclidean neural quantum state ansatz
Title: Hyperbolic recurrent neural network as the first type of non-Euclidean neural quantum state ansatz | Hyperbolisches rezidivierendes neuronales Netzwerk als erste Art von nicht-euklidischen neuronalen Quantenzustandsansatz | 超双曲经常性神经网络,作为第一种非欧洲的神经量子状态 ansatz 2505.22083v2 |
Authors (1): H. L. Dao
In this work, we introduce the first type of non-Euclidean neural quantum state (NQS) ansatz, in the form of the hyperbolic GRU (a variant of recurrent neural networks (RNNs)), to be used in the Variational Monte Carlo method of approximating the ground state energy for quantum many-body systems. In particular, we examine the performances of NQS ansatzes constructed from both conventional or Euclidean RNN/GRU and from hyperbolic GRU in the prototypical settings of the one- and two-dimensional transverse field Ising models (TFIM) and the one-dimensional Heisenberg $J_1J_2$ and $J_1J_2J_3$ systems. By virtue of the fact that, for all of the experiments performed in this work, hyperbolic GRU can yield performances comparable to or better than Euclidean RNNs, which have been extensively studied in these settings in the literature, our work is a proof-of-concept for the viability of hyperbolic GRU as the first type of non-Euclidean NQS ansatz for quantum many-body systems. Furthermore, in settings where the Hamiltonian displays a clear hierarchical interaction structure, such as the 1D Heisenberg $J_1J_2$ & $J_1J_2J_3$ systems with the 1st, 2nd and even 3rd nearest neighbor interactions, our results show that hyperbolic GRU definitively outperforms its Euclidean version in all instances. The fact that these results are reminiscent of the established ones from natural language processing where hyperbolic GRU almost always outperforms Euclidean RNNs when the training data exhibit a tree-like or hierarchical structure leads us to hypothesize that hyperbolic GRU NQS ansatz would likely outperform Euclidean RNN/GRU NQS ansatz in quantum spin systems that involve different degrees of nearest neighbor interactions. Finally, with this work, we hope to initiate future studies of other types of non-Euclidean NQS beyond hyperbolic GRU.
Article 1284
Title@2025-06-08 (7): AHSG: Adversarial Attack on High-level Semantics in Graph Neural Networks
Title: AHSG: Adversarial Attack on High-level Semantics in Graph Neural Networks | AHSG: Adversarialer Angriff auf hochrangige Semantik in Graphen-Neuralen Netzwerken | AHSG:对图形神经网络高层语义的反向攻击 2412.07468v3 |
Authors (4): Kai Yuan, Jiahao Zhang, Yidi Wang, Xiaobing Pei
Adversarial attacks on Graph Neural Networks aim to perturb the performance of the learner by carefully modifying the graph topology and node attributes. Existing methods achieve attack stealthiness by constraining the modification budget and differences in graph properties. However, these methods typically disrupt task-relevant primary semantics directly, which results in low defensibility and detectability of the attack. In this paper, we propose an Adversarial Attack on High-level Semantics for Graph Neural Networks (AHSG), which is a graph structure attack model that ensures the retention of primary semantics. By combining latent representations with shared primary semantics, our model retains detectable attributes and relational patterns of the original graph while leveraging more subtle changes to carry out the attack. Then we use the Projected Gradient Descent algorithm to map the latent representations with attack effects to the adversarial graph. Through experiments on robust graph deep learning models equipped with defense strategies, we demonstrate that AHSG outperforms other state-of-the-art methods in attack effectiveness. Additionally, using Contextual Stochastic Block Models to detect the attacked graph further validates that our method preserves the primary semantics of the graph.
Article 1285
Title@2025-06-08 (7): When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models
Title: When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models | Wenn Aufmerksamkeit zusammenbricht: Wie degenerierte Ebenen in LLMs kleinere, stärkere Modelle aktivieren | 注意折叠时: LLM 中如何分解层的 LLM 启用更小、 更强模型 2404.08634v3 |
Authors (4): Sunny Sanyal, Ravid Shwartz-Ziv, Alexandros G. Dimakis, Sujay Sanghavi
Large Language Models (LLMs) rely on the transformer architecture and its self-attention mechanism to deliver strong performance across tasks. However, we uncover a structural inefficiency in standard pre-trained decoder-style LLMs: in many of the deeper layers, attention matrices frequently collapse to near rank-one, single-column patterns. We refer to these underutilized components as lazy layers, which are redundant and computationally inefficient. To address this, we propose Inheritune, a simple and effective training recipe for building smaller, more efficient, and high performing language models. Inheritune initializes a compact model by inheriting the useful early layers from a larger pre-trained model, then progressively retrains and expands it. Our experiments across multiple models and datasets show that Inheritune trained models, despite having significantly fewer layers, can match or even outperform their larger counterparts. This approach yields compact, performant models and offers a practical path for efficient language model compression. Code is available at https://github.com/sanyalsunny111/LLM-Inheritune
Article 1286
Title@2025-06-08 (7): FairPFN: A Tabular Foundation Model for Causal Fairness
Title: FairPFN: A Tabular Foundation Model for Causal Fairness | FairPFN: Ein Tabular Foundation Modell für Causal Fairness | FairPPFN: 因果关系公平性表示基金会模型 2506.07049v1 |
Authors (5): Jake Robertson, Noah Hollmann, Samuel Müller, Noor Awad, Frank Hutter
Machine learning (ML) systems are utilized in critical sectors, such as healthcare, law enforcement, and finance. However, these systems are often trained on historical data that contains demographic biases, leading to ML decisions that perpetuate or exacerbate existing social inequalities. Causal fairness provides a transparent, human-in-the-loop framework to mitigate algorithmic discrimination, aligning closely with legal doctrines of direct and indirect discrimination. However, current causal fairness frameworks hold a key limitation in that they assume prior knowledge of the correct causal model, restricting their applicability in complex fairness scenarios where causal models are unknown or difficult to identify. To bridge this gap, we propose FairPFN, a tabular foundation model pre-trained on synthetic causal fairness data to identify and mitigate the causal effects of protected attributes in its predictions. FairPFN’s key contribution is that it requires no knowledge of the causal model and still demonstrates strong performance in identifying and removing protected causal effects across a diverse set of hand-crafted and real-world scenarios relative to robust baseline methods. FairPFN paves the way for promising future research, making causal fairness more accessible to a wider variety of complex fairness problems.
Article 1287
Title@2025-06-08 (7): SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Title: SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | SalbeiAchtung: Genaue 8-Bit-Achtung für Plug-and-Play-Inferenzbeschleunigung | SageAttention: 精确注意8-Bit 加速插管和玩耍推论加速 2410.02367v8 |
Authors (6): Jintao Zhang, Jia Wei, Haofeng Huang, Pengle Zhang, Jun Zhu, Jianfei Chen
The transformer architecture predominates across various models. As the heart of the transformer, attention has a computational complexity of $O(N^2)$, compared to $O(N)$ for linear transformations. When handling large sequence lengths, attention becomes the primary time-consuming component. Although quantization has proven to be an effective method for accelerating model inference, existing quantization methods primarily focus on optimizing the linear layer. In response, we first analyze the feasibility of quantization in attention in detail. Following that, we propose SageAttention, a highly efficient and accurate quantization method for attention. The OPS (operations per second) of our approach outperforms FlashAttention2 and xformers by about 2.1 times and 2.7 times, respectively. SageAttention also achieves superior accuracy performance over FlashAttention3. Comprehensive experiments confirm that our approach incurs almost no end-to-end metrics loss across diverse models, including those for large language processing, image generation, and video generation. The codes are available at https://github.com/thu-ml/SageAttention.
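The abstract does not spell out the quantization scheme, so the following is only a generic sketch of symmetric 8-bit quantization applied to one query-key dot product, the basic operation inside the $QK^\top$ attention scores. It is not the SageAttention kernel; `quantize_int8` and `int8_dot` are hypothetical helper names, and per-vector scaling is an assumption made for illustration.

```python
def quantize_int8(vec):
    """Symmetric INT8 quantization: returns (integer codes, scale)."""
    max_abs = max(abs(v) for v in vec)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp so every code fits in the signed 8-bit range [-127, 127].
    codes = [max(-127, min(127, round(v / scale))) for v in vec]
    return codes, scale


def int8_dot(q, k):
    """Approximate dot(q, k) using integer codes and one rescale at the end."""
    q_codes, q_scale = quantize_int8(q)
    k_codes, k_scale = quantize_int8(k)
    acc = sum(a * b for a, b in zip(q_codes, k_codes))  # integer accumulation
    return acc * q_scale * k_scale


q = [0.5, -1.0, 0.25, 2.0]
k = [1.0, 0.5, -0.75, 0.1]
exact = sum(a * b for a, b in zip(q, k))
print("exact:", exact, "int8 approximation:", int8_dot(q, k))
```

The multiply-accumulate runs entirely on small integers and the floating-point scales are applied once at the end, which is why low-precision matrix units can speed up this step; the price is a rounding error that grows with the dynamic range of the inputs.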
Article 1288
Title@2025-06-08 (7): SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization
Title: SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization | SalbeiAchtung2: Effiziente Aufmerksamkeit mit gründlicher Ausreißerglättung und per-thread INT4 Quantisierung | SageAttention 2: 高效关注,以完全外流滑动和每英尺INT4 量化 2411.10958v6 |
Authors (6): Jintao Zhang, Haofeng Huang, Pengle Zhang, Jia Wei, Jun Zhu, Jianfei Chen
Although quantization for linear layers has been widely used, its application to accelerate the attention process remains limited. To further enhance the efficiency of attention computation compared to SageAttention while maintaining precision, we propose SageAttention2, which utilizes significantly faster 4-bit matrix multiplication (Matmul) alongside additional precision-enhancing techniques. First, we propose to quantize matrices $(Q, K)$ to INT4 in a hardware-friendly thread-level granularity and quantize matrices $(\widetilde P, V)$ to FP8. Second, we propose a method to smooth $Q$, enhancing the accuracy of INT4 $QK^\top$. Third, we propose a two-level accumulation strategy for $\widetilde PV$ to enhance the accuracy of FP8 $\widetilde PV$. The operations per second (OPS) of SageAttention2 surpass FlashAttention2 and xformers by about 3x and 4.5x on RTX4090, respectively. Moreover, SageAttention2 matches the speed of FlashAttention3(fp8) on the Hopper GPUs, while delivering much higher accuracy. Comprehensive experiments confirm that our approach incurs negligible end-to-end metrics loss across diverse models, including those for language, image, and video generation. The code is available at https://github.com/thu-ml/SageAttention.
Article 1289
Title@2025-06-08 (7): Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning
Title: Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning | Effiziente $Q$-Learning- und Actor-Critic-Methoden für robustes durchschnittliches Reward-Verstärkungs-Lernen | 高效益平均奖励强化学习的学习和行为-竞争方法 2506.07040v1 |
Authors (3): Yang Xu, Swetha Ganesh, Vaneet Aggarwal
We present the first $Q$-learning and actor-critic algorithms for robust average reward Markov Decision Processes (MDPs) with non-asymptotic convergence under contamination, TV distance and Wasserstein distance uncertainty sets. We show that the robust $Q$ Bellman operator is a strict contractive mapping with respect to a carefully constructed semi-norm with constant functions being quotiented out. This property supports a stochastic approximation update that learns the optimal robust $Q$ function in $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples. We also show that the same idea can be used for robust $Q$ function estimation, which can be further used for critic estimation. Coupling it with theories in robust policy mirror descent update, we present a natural actor-critic algorithm that attains an $\epsilon$-optimal robust policy in $\tilde{\mathcal{O}}(\epsilon^{-3})$ samples. These results advance the theory of distributionally robust reinforcement learning in the average reward setting.
Article 1290
Title@2025-06-08 (7): BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modeling
Title: BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modeling | BRIDGE: Bootstrapping-Text zur Steuerung der Time-Series-Generation über Multi-Agent iterative Optimierung und Diffusionsmodellierung | BRIDGE:通过多代理迭代优化和传播模型化控制时间- 系列生成的推进文本 2503.02445v4 |
Authors (8): Hao Li, Yu-Hao Huang, Chang Xu, Viktor Schlegel, Renhe Jiang, Riza Batista-Navarro, Goran Nenadic, Jiang Bian
Time-series Generation (TSG) is a prominent research area with broad applications in simulations, data augmentation, and counterfactual analysis. While existing methods have shown promise in unconditional single-domain TSG, real-world applications demand cross-domain approaches capable of controlled generation tailored to domain-specific constraints and instance-level requirements. In this paper, we argue that text can provide semantic insights, domain information and instance-specific temporal patterns, to guide and improve TSG. We introduce "Text-Controlled TSG", a task focused on generating realistic time series by incorporating textual descriptions. To address data scarcity in this setting, we propose a novel LLM-based Multi-Agent framework that synthesizes diverse, realistic text-to-TS datasets. Furthermore, we introduce BRIDGE, a hybrid text-controlled TSG framework that integrates semantic prototypes with text description for supporting domain-level guidance. This approach achieves state-of-the-art generation fidelity on 11 of 12 datasets, and improves controllability by up to 12% on MSE and 6% on MAE compared to generation without text input, highlighting its potential for generating tailored time-series data.
Article 1291
Title@2025-06-08 (7): Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond
Title: Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond | Learning-at-Criticality in großen Sprachmodellen für Quantenfeldtheorie und darüber hinaus | 量子实地理论及以后大语言模型的学习困难 2506.03703v2 |
Authors (7): Xiansheng Cai, Sihan Hu, Tao Wang, Yuan Huang, Pan Zhang, Youjin Deng, Kun Chen
Fundamental physics often confronts complex symbolic problems with few guiding exemplars or established principles. While artificial intelligence (AI) offers promise, its typical need for vast datasets to learn from hinders its use in these information-scarce frontiers. We introduce learning at criticality (LaC), a reinforcement learning (RL) scheme that tunes Large Language Models (LLMs) to a sharp learning transition, addressing this information scarcity. At this transition, LLMs achieve peak generalization from minimal data, exemplified by 7-digit base-7 addition – a test of nontrivial arithmetic reasoning. To elucidate this peak, we analyze a minimal concept-network model (CoNet) designed to capture the essence of how LLMs might link tokens. Trained on a single exemplar, this model also undergoes a sharp learning transition. This transition exhibits hallmarks of a second-order phase transition, notably power-law distributed solution path lengths. At this critical point, the system maximizes a "critical thinking pattern" crucial for generalization, enabled by the underlying scale-free exploration. This suggests LLMs reach peak performance by operating at criticality, where such explorative dynamics enable the extraction of underlying operational rules. We demonstrate LaC in quantum field theory: an 8B-parameter LLM, tuned to its critical point by LaC using a few exemplars of symbolic Matsubara sums, solves unseen, higher-order problems, significantly outperforming far larger models. LaC thus leverages critical phenomena, a physical principle, to empower AI for complex, data-sparse challenges in fundamental physics.
Article 1292
Title@2025-06-08 (7): Mixture Experts with Test-Time Self-Supervised Aggregation for Tabular Imbalanced Regression
Title: Mixture Experts with Test-Time Self-Supervised Aggregation for Tabular Imbalanced Regression | Mixtur-Experten mit selbstüberwachter Aggregation für tabellarische Imbalanced Regression | 用于制表平衡回归的测试时自我监督集成混合专家 2506.07033v1 |
Authors (4): Yung-Chien Wang, Kuang-Da Wang, Wei-Yao Wang, Wen-Chih Peng
Tabular data serve as a fundamental and ubiquitous representation of structured information in numerous real-world applications, e.g., finance and urban planning. In the realm of tabular imbalanced applications, data imbalance has been investigated in classification tasks with insufficient instances in certain labels, which impairs the model’s generalizability. However, the imbalance issue of tabular regression tasks is underexplored, and yet is critical due to unclear boundaries for continuous labels and simplifying assumptions in existing imbalance regression work, which often relies on known and balanced test distributions. Such assumptions may not hold in practice and can lead to performance degradation. To address these issues, we propose MATI: Mixture Experts with Test-Time Self-Supervised Aggregation for Tabular Imbalance Regression, featuring two key innovations: (i) the Region-Aware Mixture Expert, which adopts a Gaussian Mixture Model to capture the underlying related regions. The statistical information of each Gaussian component is then used to synthesize and train region-specific experts to capture the unique characteristics of their respective regions. (ii) Test-Time Self-Supervised Expert Aggregation, which dynamically adjusts region expert weights based on test data features to reinforce expert adaptation across varying test distributions. We evaluated MATI on four real-world tabular imbalance regression datasets, including house pricing, bike sharing, and age prediction. To reflect realistic deployment scenarios, we adopted three types of test distributions: a balanced distribution with uniform target frequencies, a normal distribution that follows the training data, and an inverse distribution that emphasizes rare target regions. On average across these three test distributions, MATI achieved a 7.1% improvement in MAE compared to existing methods.
Article 1293
Title@2025-06-08 (7): AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
Title: AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint | AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint | AlphaSteer:学习使用有原则的Null-空间限制的拒绝指导 2506.07022v1 |
Authors (9): Leheng Sheng, Changshuo Shen, Weixiang Zhao, Junfeng Fang, Xiaohao Liu, Zhenkai Liang, Xiang Wang, An Zhang, Tat-Seng Chua
As LLMs are increasingly deployed in real-world applications, ensuring their ability to refuse malicious prompts, especially jailbreak attacks, is essential for safe and reliable use. Recently, activation steering has emerged as an effective approach for enhancing LLM safety by adding a refusal direction vector to internal activations of LLMs during inference, which will further induce the refusal behaviors of LLMs. However, indiscriminately applying activation steering fundamentally suffers from the trade-off between safety and utility, since the same steering vector can also lead to over-refusal and degraded performance on benign prompts. Although prior efforts, such as vector calibration and conditional steering, have attempted to mitigate this trade-off, their lack of theoretical grounding limits their robustness and effectiveness. To better address the trade-off between safety and utility, we present a theoretically grounded and empirically effective activation steering method called AlphaSteer. Specifically, it considers activation steering as a learnable process with two principled learning objectives: utility preservation and safety enhancement. For utility preservation, it learns to construct a nearly zero vector for steering benign data, with the null-space constraints. For safety enhancement, it learns to construct a refusal direction vector for steering malicious data, with the help of linear regression. Experiments across multiple jailbreak attacks and utility benchmarks demonstrate the effectiveness of AlphaSteer, which significantly improves the safety of LLMs without compromising general capabilities. Our codes are available at https://github.com/AlphaLab-USTC/AlphaSteer.
Article 1294
Title@2025-06-08 (7): LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models
Title: LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models | LEANCODE: Modelle besser verstehen für Code-Vereinfachung von vortrainierten großen Sprachmodellen | LEANCODE: 更好地理解模式,以更好地简化培训前大语言模式的守则 2505.14759v3 |
Authors (5): Yan Wang, Ling Ding, Tien N Nguyen, Shaohua Wang, Yanan Zheng
Large Language Models for code often entail significant computational complexity, which grows significantly with the length of the input code sequence. We propose LeanCode for code simplification to reduce training and prediction time, leveraging code contexts in utilizing attention scores to represent the tokens’ importance. We advocate for the selective removal of tokens based on the average context-aware attention scores rather than average scores across all inputs. LeanCode uses the attention scores of 'CLS' tokens within the encoder for classification tasks, such as code search. It also employs the encoder-decoder attention scores to determine token significance for sequence-to-sequence tasks like code summarization. Our evaluation shows LeanCode’s superiority over the state-of-the-art methods DietCode and Slimcode, with improvements of 60% and 16% for code search, and 29% and 27% for code summarization, respectively.
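The core idea of removing low-importance tokens by attention score can be sketched as follows. This is a simplified illustration, not LeanCode itself: the function name `prune_tokens`, the `keep_ratio` parameter, and the example scores are all made up, and the scores are simply passed in rather than averaged per context from a model's attention heads.

```python
def prune_tokens(tokens, attention_scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving their original order."""
    if len(tokens) != len(attention_scores):
        raise ValueError("tokens and attention_scores must have equal length")
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank positions by score, then restore source order among the survivors.
    ranked = sorted(range(len(tokens)),
                    key=lambda i: attention_scores[i],
                    reverse=True)
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept]


tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]
scores = [0.2, 0.9, 0.05, 0.6, 0.01, 0.5, 0.04, 0.03]  # hypothetical values
print(prune_tokens(tokens, scores, keep_ratio=0.5))
# prints ['def', 'add', 'a', 'b']
```

Restoring the original order after ranking keeps the shortened sequence readable as code, so the downstream model sees a compressed but still ordered input.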
Article 1295
Title@2025-06-08 (7): Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation
Title: Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation | Struktur-agnostische Optimalität des doppelt robusten Lernens zur Abschätzung des Behandlungseffekts | Doubly 治疗效果估测工具的“治疗效果强力学习”的结构-不可知性优化 2402.14264v4 |
Authors (2): Jikai Jin, Vasilis Syrgkanis
Average treatment effect estimation is the most central problem in causal inference with application to numerous disciplines. While many estimation strategies have been proposed in the literature, the statistical optimality of these methods has still remained an open area of investigation, especially in regimes where these methods do not achieve parametric rates. In this paper, we adopt the recently introduced structure-agnostic framework of statistical lower bounds, which poses no structural properties on the nuisance functions other than access to black-box estimators that achieve some statistical estimation rate. This framework is particularly appealing when one is only willing to consider estimation strategies that use non-parametric regression and classification oracles as black-box sub-processes. Within this framework, we prove the statistical optimality of the celebrated and widely used doubly robust estimators for both the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT), as well as weighted variants of the former, which arise in policy evaluation.
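For readers unfamiliar with doubly robust estimation, the standard augmented inverse-propensity-weighted (AIPW) form of the ATE estimator can be sketched as below. The nuisance estimates (`propensity`, `mu1`, `mu0`) are assumed to come from black-box regression and classification oracles fitted elsewhere; the function name and the toy data are illustrative and not taken from the paper.

```python
def aipw_ate(y, t, propensity, mu1, mu0):
    """Doubly robust (AIPW) estimate of the average treatment effect.

    y          -- observed outcomes
    t          -- treatment indicators (0 or 1)
    propensity -- estimated P(T=1 | X) for each unit, strictly in (0, 1)
    mu1, mu0   -- estimated E[Y | X, T=1] and E[Y | X, T=0] for each unit
    """
    total = 0.0
    for yi, ti, ei, m1, m0 in zip(y, t, propensity, mu1, mu0):
        plug_in = m1 - m0
        # Residual corrections, reweighted by the inverse propensity.
        correction = ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1.0 - ei)
        total += plug_in + correction
    return total / len(y)


# Toy data: the outcome models are exact here, so the corrections vanish.
y = [3.0, 1.0, 4.0, 2.0]
t = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]
mu1 = [3.0, 2.0, 4.0, 3.0]
mu0 = [2.0, 1.0, 3.0, 2.0]
print(aipw_ate(y, t, e, mu1, mu0))
```

The estimator is called doubly robust because it remains consistent if either the outcome models or the propensity model is correctly specified: the correction term cancels the bias of a wrong outcome model when the propensities are right, and vanishes in expectation when the outcome models are right.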
Article 1296
Title@2025-06-08 (7): TABLET: Table Structure Recognition using Encoder-only Transformers
Title: TABLET: Table Structure Recognition using Encoder-only Transformers | TABELLET: Tabelle Struktur Anerkennung mit Encoder-only Transformers | TableT: 使用只使用编码器的变换器确认表格结构 2506.07015v1 |
Authors (2): Qiyu Hou, Jun Wang
To address the challenges of table structure recognition, we propose a novel Split-Merge-based top-down model optimized for large, densely populated tables. Our approach formulates row and column splitting as sequence labeling tasks, utilizing dual Transformer encoders to capture feature interactions. The merging process is framed as a grid cell classification task, leveraging an additional Transformer encoder to ensure accurate and coherent merging. By eliminating unstable bounding box predictions, our method reduces resolution loss and computational complexity, achieving high accuracy while maintaining fast processing speed. Extensive experiments on FinTabNet and PubTabNet demonstrate the superiority of our model over existing approaches, particularly in real-world applications. Our method offers a robust, scalable, and efficient solution for large-scale table recognition, making it well-suited for industrial deployment.
Article 1297
Title@2025-06-08 (7): Comparison of Lightweight Methods for Vehicle Dynamics-Based Driver Drowsiness Detection
Title: Comparison of Lightweight Methods for Vehicle Dynamics-Based Driver Drowsiness Detection | Vergleich von Leichtbaumethoden für Fahrzeugdynamik-basierte Fahrer-Drowsiness-Erkennung | 机动机动机动机动机动机动车机动机动车驾驶识别漂泊情况轻型方法比较 2506.07014v1 |
Authors (3): Yutaro Nakagama, Daisuke Ishii, Kazuki Yoshizoe
Driver drowsiness detection (DDD) prevents road accidents caused by driver fatigue. Vehicle dynamics-based DDD has been proposed as a method that is both economical and high performance. However, there are concerns about the reliability of performance metrics and the reproducibility of many of the existing methods. For instance, some previous studies seem to have a data leakage issue among training and test datasets, and many do not openly provide the datasets they used. To this end, this paper aims to compare the performance of representative vehicle dynamics-based DDD methods under a transparent and fair framework that uses a public dataset. We first develop a framework for extracting features from an open dataset by Aygun et al. and performing DDD with lightweight ML models; the framework is carefully designed to support a variety of configurations. Second, we implement three existing representative methods and a concise random forest (RF)-based method in the framework. Finally, we report the results of experiments to verify the reproducibility and clarify the performance of DDD based on common metrics. Among the evaluated methods, the RF-based method achieved the highest accuracy of 88%. Our findings point to issues inherent in DDD methods developed in a non-standard manner, and demonstrate a high-performance method implemented appropriately.
Article 1298
Title@2025-06-08 (7): Half-AVAE: Adversarial-Enhanced Factorized and Structured Encoder-Free VAE for Underdetermined Independent Component Analysis
Title: Half-AVAE: Adversarial-Enhanced Factorized and Structured Encoder-Free VAE for Underdetermined Independent Component Analysis | Half-AVAE: Adversarial-erhöhte faktorisierte und strukturierte Encoder-freie VAE für unterbestimmte unabhängige Komponentenanalyse | 半AVAE:用于未确定独立部件分析的反强化加工厂和结构化无编码器 VAE 2506.07011v1 |
Authors (2): Yuan-Hao Wei, Yan-Jie Sun
This study advances the Variational Autoencoder (VAE) framework by addressing challenges in Independent Component Analysis (ICA) under both determined and underdetermined conditions, focusing on enhancing the independence and interpretability of latent variables. Traditional VAEs map observed data to latent variables and back via an encoder-decoder architecture, but struggle with underdetermined ICA where the number of latent variables exceeds observed signals. The proposed Half Adversarial VAE (Half-AVAE) builds on the encoder-free Half-VAE framework, eliminating explicit inverse mapping to tackle underdetermined scenarios. By integrating adversarial networks and External Enhancement (EE) terms, Half-AVAE promotes mutual independence among latent dimensions, achieving factorized and interpretable representations. Experiments with synthetic signals demonstrate that Half-AVAE outperforms baseline models, including GP-AVAE and Half-VAE, in recovering independent components under underdetermined conditions, as evidenced by lower root mean square errors. The study highlights the flexibility of VAEs in variational inference, showing that encoder omission, combined with adversarial training and structured priors, enables effective solutions for complex ICA tasks, advancing applications in disentanglement, causal inference, and generative modeling.
Article 1299
Title@2025-06-08 (7): When do neural networks learn world models?
Title: When do neural networks learn world models? | Wann lernen neuronale Netzwerke Weltmodelle? | 神经网络何时学习世界模型? 2502.09297v4 |
Authors (3): Tianren Zhang, Guanyu Chen, Feng Chen
Humans develop world models that capture the underlying generation process of data. Whether neural networks can learn similar world models remains an open problem. In this work, we present the first theoretical results for this problem, showing that in a multi-task setting, models with a low-degree bias provably recover latent data-generating variables under mild assumptions, even if proxy tasks involve complex, non-linear functions of the latents. However, such recovery is sensitive to model architecture. Our analysis leverages Boolean models of task solutions via the Fourier-Walsh transform and introduces new techniques for analyzing invertible Boolean transforms, which may be of independent interest. We illustrate the algorithmic implications of our results and connect them to related research areas, including self-supervised learning, out-of-distribution generalization, and the linear representation hypothesis in large language models.
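As background for the Fourier-Walsh analysis and the notion of "low degree", the sketch below computes the Fourier-Walsh coefficients of a function on $\{-1,+1\}^n$ by brute force, using $\hat f(S) = \mathbb{E}_x[f(x)\prod_{i\in S} x_i]$. It is a textbook construction rather than the paper's technique, and the helper names `fourier_walsh` and `degree` are illustrative.

```python
from itertools import product


def fourier_walsh(f, n):
    """Map each subset S (tuple of indices) to its Fourier-Walsh coefficient."""
    points = list(product([-1, 1], repeat=n))
    coeffs = {}
    for mask in product([0, 1], repeat=n):
        subset = tuple(i for i, bit in enumerate(mask) if bit)
        total = 0.0
        for x in points:
            chi = 1  # parity character chi_S(x) = product of x_i for i in S
            for i in subset:
                chi *= x[i]
            total += f(x) * chi
        coeffs[subset] = total / len(points)
    return coeffs


def degree(coeffs, tol=1e-12):
    """Largest |S| carrying a non-negligible coefficient."""
    return max((len(s) for s, c in coeffs.items() if abs(c) > tol), default=0)


def majority(x):
    return 1 if sum(x) > 0 else -1


maj = fourier_walsh(majority, 3)
print({s: c for s, c in maj.items() if c != 0.0})
print("degree:", degree(maj))
```

For the 3-bit majority function the expansion is $\tfrac12(x_0 + x_1 + x_2) - \tfrac12 x_0 x_1 x_2$: most of the weight sits on degree-1 terms, which is the kind of structure a low-degree bias favors, while a parity function puts all of its weight on a single high-degree term.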
Article 1300
Title@2025-06-08 (7): LLM-HDR: Bridging LLM-based Perception and Self-Supervision for Unpaired LDR-to-HDR Image Reconstruction
Title: LLM-HDR: Bridging LLM-based Perception and Self-Supervision for Unpaired LDR-to-HDR Image Reconstruction | LLM-HDR: Überbrückung der LLM-basierten Wahrnehmung und Selbstüberwachung für ungepaarte LDR-zu-HDR-Bildrekonstruktion | LLM-HDR: 连接以LLM为基础的LLM感知和自我监督的未受重视的LDR到HDR图像重建 2410.15068v3 |
Authors (6): Hrishav Bakul Barua, Kalin Stefanov, Lemuel Lai En Che, Abhinav Dhall, KokSheik Wong, Ganesh Krishnasamy
The translation of Low Dynamic Range (LDR) to High Dynamic Range (HDR) images is an important computer vision task. There is a significant amount of research utilizing both conventional non-learning methods and modern data-driven approaches, focusing on using both single-exposed and multi-exposed LDR for HDR image reconstruction. However, most current state-of-the-art methods require high-quality paired {LDR,HDR} datasets for model training. In addition, there is limited literature on using unpaired datasets for this task, in which the model learns a mapping between the two domains, LDR and HDR, without paired examples. This paper proposes LLM-HDR, a method that integrates the perception of Large Language Models (LLM) into a modified semantic- and cycle-consistent adversarial architecture that utilizes unpaired {LDR,HDR} datasets for training. The method introduces novel artifact- and exposure-aware generators to address visual artifact removal and an encoder and loss to address semantic consistency, another under-explored topic. LLM-HDR is the first to use an LLM for the {LDR,HDR} translation task in a self-supervised setup. The method achieves state-of-the-art performance across several benchmark datasets and reconstructs high-quality HDR images. The official website of this work is available at: https://github.com/HrishavBakulBarua/LLM-HDR
Article 1301
Title@2025-06-08 (7): Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts
Title: Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts | Feynman-Kac Korrektoren in der Diffusion: Annealing, Anleitung und Produkt von Experten | 传播中Feynman-Kac的更正者: 2503.02819v2 |
Authors (9): Marta Skreta, Tara Akhound-Sadegh, Viktor Ohanesian, Roberto Bondesan, Alán Aspuru-Guzik, Arnaud Doucet, Rob Brekelmans, Alexander Tong, Kirill Neklyudov
While score-based generative models are the model of choice across diverse domains, there are limited tools available for controlling inference-time behavior in a principled manner, e.g. for composing multiple pretrained models. Existing classifier-free guidance methods use a simple heuristic to mix conditional and unconditional scores to approximately sample from conditional distributions. However, such methods do not approximate the intermediate distributions, necessitating additional "corrector" steps. In this work, we provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We derive a weighted simulation scheme which we call Feynman-Kac Correctors (FKCs) based on the celebrated Feynman-Kac formula by carefully accounting for terms in the appropriate partial differential equations (PDEs). To simulate these PDEs, we propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality. We empirically demonstrate the utility of our methods by proposing amortized sampling via inference-time temperature annealing, improving multi-objective molecule generation using pretrained models, and improving classifier-free guidance for text-to-image generation. Our code is available at https://github.com/martaskrt/fkc-diffusion.
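Illustrative sketch (not the authors' implementation): two generic building blocks referred to in the abstract are the classifier-free guidance heuristic for mixing scores and a resampling step of the kind used in Sequential Monte Carlo. The code below shows a common form of each. The guidance parameterization, the choice of systematic resampling, and all function names are assumptions; the FKC weights themselves are derived in the paper and are not reproduced here.

```python
import math
import random


def guided_score(score_cond, score_uncond, w):
    """Classifier-free guidance heuristic: start from the unconditional
    score and move toward the conditional score with guidance weight w
    (w = 0 gives the unconditional score, w = 1 the conditional one)."""
    return score_uncond + w * (score_cond - score_uncond)


def systematic_resample(log_weights, rng):
    """Return particle indices chosen by systematic resampling.

    log_weights are unnormalized log importance weights, one per particle.
    """
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]  # stable exponentiation
    total = sum(weights)
    n = len(weights)
    u0 = rng.random() / n            # single random offset shared by all strata
    indices = []
    j = 0
    cumulative = weights[0] / total
    for i in range(n):
        u = u0 + i / n
        while u > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j] / total
        indices.append(j)
    return indices


rng = random.Random(0)
print(guided_score(2.0, 1.0, 3.0))
print(systematic_resample([0.0, -1000.0, -1000.0], rng))
```

In an SMC sampler, the indices returned by `systematic_resample` would be used to duplicate high-weight particles and drop low-weight ones before the next simulation step.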
Article 1302
Title@2025-06-08 (7): Automatic Doubly Robust Forests
Title: Automatic Doubly Robust Forests | Automatic Doppelt robuste Wälder | Doubly 自动硬森林 2412.07184v2 |
Authors (4): Zhaomeng Chen, Junting Duan, Victor Chernozhukov, Vasilis Syrgkanis
This paper proposes the automatic Doubly Robust Random Forest (DRRF) algorithm for estimating the conditional expectation of a moment functional in the presence of high-dimensional nuisance functions. DRRF extends the automatic debiasing framework based on the Riesz representer to the conditional setting and enables nonparametric, forest-based estimation (Athey et al., 2019; Oprescu et al., 2019). In contrast to existing methods, DRRF does not require prior knowledge of the form of the debiasing term or impose restrictive parametric or semi-parametric assumptions on the target quantity. Additionally, it is computationally efficient in making predictions at multiple query points. We establish consistency and asymptotic normality results for the DRRF estimator under general assumptions, allowing for the construction of valid confidence intervals. Through extensive simulations in heterogeneous treatment effect (HTE) estimation, we demonstrate the superior performance of DRRF over benchmark approaches in terms of estimation accuracy, robustness, and computational efficiency.
Article 1303
Title@2025-06-08 (7): End-to-End Probabilistic Framework for Learning with Hard Constraints
Title: End-to-End Probabilistic Framework for Learning with Hard Constraints | End-to-End-Probabilistisches Rahmenwerk für das Lernen mit harten Einschränkungen | 最终到最终的艰苦制约因素学习概率框架 2506.07003v1 |
Authors (5): Utkarsh Utkarsh, Danielle C. Maddix, Ruijun Ma, Michael W. Mahoney, Yuyang Wang
We present a general purpose probabilistic forecasting framework, ProbHardE2E, to learn systems that can incorporate operational/physical constraints as hard requirements. ProbHardE2E enforces hard constraints by exploiting variance information in a novel way; and thus it is also capable of performing uncertainty quantification (UQ) on the model. Our methodology uses a novel differentiable probabilistic projection layer (DPPL) that can be combined with a wide range of neural network architectures. This DPPL allows the model to learn the system in an end-to-end manner, compared to other approaches where the constraints are satisfied either through a post-processing step or at inference. In addition, ProbHardE2E can optimize a strictly proper scoring rule, without making any distributional assumptions on the target, which enables it to obtain robust distributional estimates (in contrast to existing approaches that generally optimize likelihood-based objectives, which are heavily biased by their distributional assumptions and model choices); and it can incorporate a range of non-linear constraints (increasing the power of modeling and flexibility). We apply ProbHardE2E to problems in learning partial differential equations with uncertainty estimates and to probabilistic time-series forecasting, showcasing it as a broadly applicable general setup that connects these seemingly disparate domains.
Article 1304
Title@2025-06-08 (7): Towards Physics-informed Diffusion for Anomaly Detection in Trajectories
Title: Towards Physics-informed Diffusion for Anomaly Detection in Trajectories | Auf dem Weg zu physikinformierter Diffusion für Anomalienerkennung in Trajektorien | 走向在物理知情的情况下传播用于轨迹中的异常探测 2506.06999v1 |
Authors (6): Arun Sharma, Mingzhou Yang, Majid Farhadloo, Subhankar Ghosh, Bharat Jayaprakash, Shashi Shekhar
Given trajectory data, a domain-specific study area, and a user-defined threshold, we aim to find anomalous trajectories indicative of possible GPS spoofing (e.g., fake trajectory). The problem is societally important to curb illegal activities in international waters, such as unauthorized fishing and illicit oil transfers. The problem is challenging due to advances in AI-generated deepfakes (e.g., additive noise, fake trajectories) and the lack of an adequate number of labeled samples for ground-truth verification. Recent literature shows promising results for anomalous trajectory detection using generative models despite data sparsity. However, these methods do not consider fine-scale spatiotemporal dependencies and prior physical knowledge, resulting in higher false-positive rates. To address these limitations, we propose a physics-informed diffusion model that integrates kinematic constraints to identify trajectories that do not adhere to physical laws. Experimental results on real-world datasets in the maritime and urban domains show that the proposed framework results in higher prediction accuracy and lower estimation error rate for anomaly detection and trajectory generation methods, respectively. Our implementation is available at https://github.com/arunshar/Physics-Informed-Diffusion-Probabilistic-Model.
Article 1305
Title@2025-06-08 (7): What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding
Title: What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding | Was macht Vernunftmodelle anders? Folgen Sie der Vernunftführerin für effiziente Decodierung | 是什么让理性模型不同? 跟随高效配方的理性导师 2506.06998v1 |
Authors (7): Ming Li, Zhengyuan Yang, Xiyao Wang, Dianqi Li, Kevin Lin, Tianyi Zhou, Lijuan Wang
Large reasoning models (LRMs) achieve strong reasoning performance by emitting long chains of thought. Yet, these verbose traces slow down inference and often drift into unnecessary detail, known as the overthinking phenomenon. To better understand LRMs’ behavior, we systematically analyze the token-level misalignment between reasoning and non-reasoning models. While it is expected that their primary difference lies in the stylistic “thinking cues”, LRMs uniquely exhibit two pivotal, previously under-explored phenomena: a Global Misalignment Rebound, where their divergence from non-reasoning models persists or even grows as response length increases, and more critically, a Local Misalignment Diminish, where the misalignment concentrates at the “thinking cues” each sentence starts with but rapidly declines in the remainder of the sentence. Motivated by the Local Misalignment Diminish, we propose FoReaL-Decoding, a collaborative fast-slow thinking decoding method for cost-quality trade-off. In FoReaL-Decoding, a Leading model leads the first few tokens for each sentence, and then a weaker draft model completes the following tokens to the end of each sentence. FoReaL-Decoding adopts a stochastic gate to smoothly interpolate between the small and the large model. On four popular math-reasoning benchmarks (AIME24, GPQA-Diamond, MATH500, AMC23), FoReaL-Decoding reduces theoretical FLOPs by 30 to 50% and trims CoT length by up to 40%, while preserving 86 to 100% of model performance. These results establish FoReaL-Decoding as a simple, plug-and-play route to controllable cost-quality trade-offs in reasoning-centric tasks.
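Illustrative sketch (not the authors' implementation): the division of labor described above can be pictured as a per-sentence plan that assigns the first few token positions to the Leading model and the rest to the draft model, with a random gate deciding whether the Leading model is used at all for a given sentence. The per-sentence Bernoulli gate, the parameter names `lead_tokens` and `lead_prob`, and the function names are assumptions made for illustration.

```python
import random


def foreal_plan(sentence_lengths, lead_tokens, lead_prob, rng):
    """Assign every token position of every sentence to 'lead' or 'draft'.

    sentence_lengths: number of tokens in each sentence
    lead_tokens:      how many opening tokens the Leading model writes
    lead_prob:        probability that the gate lets the Leading model start
                      a sentence (assumed per-sentence Bernoulli gate)
    """
    plan = []
    for length in sentence_lengths:
        use_leader = rng.random() < lead_prob
        k = min(lead_tokens, length) if use_leader else 0
        plan.append(["lead"] * k + ["draft"] * (length - k))
    return plan


def leader_fraction(plan):
    """Fraction of all tokens produced by the Leading model."""
    tokens = [who for sentence in plan for who in sentence]
    return tokens.count("lead") / len(tokens) if tokens else 0.0


rng = random.Random(0)
plan = foreal_plan([12, 8, 20], lead_tokens=3, lead_prob=0.7, rng=rng)
print(leader_fraction(plan))
```

Setting `lead_prob` to 0 hands every token to the draft model, and setting it to 1 with a large `lead_tokens` hands every token to the Leading model, which is one way to read the interpolation between the small and the large model.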
Article 1306
Title@2025-06-08 (7): Sample Complexity of Diffusion Model Training Without Empirical Risk Minimizer Access
Title: Sample Complexity of Diffusion Model Training Without Empirical Risk Minimizer Access | Probenkomplexität des Diffusionsmodelltrainings ohne empirischen Risikominimizer-Zugang | 传播示范培训的复杂程度,没有经验风险最小化,没有经验风险最小化机会 2505.18344v2 |
Authors (5): Mudit Gaur, Prashant Trivedi, Sasidhar Kunapuli, Amrit Singh Bedi, Vaneet Aggarwal
Diffusion models have demonstrated state-of-the-art performance across vision, language, and scientific domains. Despite their empirical success, prior theoretical analyses of the sample complexity suffer from poor scaling with input data dimension or rely on unrealistic assumptions such as access to exact empirical risk minimizers. In this work, we provide a principled analysis of score estimation, establishing a sample complexity bound of $\widetilde{\mathcal{O}}(\epsilon^{-6})$. Our approach leverages a structured decomposition of the score estimation error into statistical, approximation, and optimization errors, enabling us to eliminate the exponential dependence on neural network parameters that arises in prior analyses. This is the first result to achieve sample complexity bounds without assuming access to the empirical risk minimizer of the score function estimation loss.
Article 1307
Title@2025-06-08 (7): STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings
Title: STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings | STAMP Ihr Inhalt: Nachweis der Datensatzmitgliedschaft über Watermarked Rephrasings | STAMP 您的内容: 通过水标修改方式证明数据集成员 2504.13416v2 |
Authors (3): Saksham Rastogi, Pratyush Maini, Danish Pruthi
Given how large parts of publicly available text are crawled to pretrain large language models (LLMs), data creators increasingly worry about the inclusion of their proprietary data for model training without attribution or licensing. Their concerns are also shared by benchmark curators whose test-sets might be compromised. In this paper, we present STAMP, a framework for detecting dataset membership, i.e., determining the inclusion of a dataset in the pretraining corpora of LLMs. Given an original piece of content, our proposal involves first generating multiple rephrases, each embedding a watermark with a unique secret key. One version is to be released publicly, while others are to be kept private. Subsequently, creators can compare model likelihoods between public and private versions using paired statistical tests to prove membership. We show that our framework can successfully detect contamination across four benchmarks which appear only once in the training data and constitute less than 0.001% of the total tokens, outperforming several contamination detection and dataset inference baselines. We verify that STAMP preserves both the semantic meaning and utility of the original data. We apply STAMP to two real-world scenarios to confirm the inclusion of paper abstracts and blog articles in the pretraining corpora.
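Illustrative sketch (not the authors' implementation): the comparison of model likelihoods between public and private versions can be framed as a paired test on per-document score differences. The code below computes a paired t statistic from two aligned lists of scores. The use of a t statistic, the scoring convention (higher means the model finds the text more likely), and the function name are assumptions; the paper's exact test is not specified in the abstract.

```python
import math
import statistics


def paired_t_statistic(public_scores, private_scores):
    """Paired t statistic for aligned per-document scores.

    public_scores[i] and private_scores[i] must refer to the same source
    document (public rephrase vs. a private rephrase or their average).
    A large positive value suggests the model prefers the public versions.
    """
    if len(public_scores) != len(private_scores):
        raise ValueError("score lists must be aligned and of equal length")
    diffs = [p - q for p, q in zip(public_scores, private_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; statistic undefined")
    return statistics.fmean(diffs) / (sd / math.sqrt(n))


# Made-up numbers for illustration only (e.g., average token log-likelihoods).
public = [-2.1, -1.9, -2.4, -2.0]
private = [-2.6, -2.5, -2.7, -2.8]
print(paired_t_statistic(public, private))
```

A p-value would then come from the t distribution with n - 1 degrees of freedom; `scipy.stats.ttest_rel` performs the same paired test and returns the p-value directly if SciPy is available.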
Article 1308
Title@2025-06-08 (7): Correcting for Position Bias in Learning to Rank: A Control Function Approach
Title: Correcting for Position Bias in Learning to Rank: A Control Function Approach | Korrektur für Position Bias in Learning to Rank: Ein Steuerungsfunktionsansatz | 校正学习排名中的比值位置:控制功能方法 2506.06989v1 |
Authors (3): Md Aminul Islam, Kathryn Vasilaky, Elena Zheleva
Implicit feedback data, such as user clicks, is commonly used in learning-to-rank (LTR) systems because it is easy to collect and it often reflects user preferences. However, this data is prone to various biases, and training an LTR system directly on biased data can result in suboptimal ranking performance. One of the most prominent and well-studied biases in implicit feedback data is position bias, which occurs because users are more likely to interact with higher-ranked documents regardless of their true relevance. In this paper, we propose a novel control function-based method that accounts for position bias in a two-stage process. The first stage uses exogenous variation from the residuals of the ranking process to correct for position bias in the second stage click equation. Unlike previous position bias correction methods, our method does not require knowledge of the click or propensity model and allows for nonlinearity in the underlying ranking model. Moreover, our method is general and allows for debiasing any state-of-the-art ranking algorithm by plugging it into the second stage. We also introduce a technique to debias validation clicks for hyperparameter tuning to select the optimal model in the absence of unbiased validation data. Experimental results demonstrate that our method outperforms state-of-the-art approaches in correcting for position bias.
Article 1309
Title@2025-06-08 (7): Fully Explainable Classification Models Using Hyperblocks
Title: Fully Explainable Classification Models Using Hyperblocks | Vollständig erklärbare Klassifikationsmodelle mit Hyperblocks | 使用超超分块的完全可解释的分类模型 2506.06986v1 |
Authors (3): Austin Snyder, Ryan Gallagher, Boris Kovalerchuk
Building on existing work with Hyperblocks, which classify data using minimum and maximum bounds for each attribute, we focus on enhancing interpretability, decreasing training time, and reducing model complexity without sacrificing accuracy. This system allows subject matter experts (SMEs) to directly inspect and understand the model’s decision logic without requiring extensive machine learning expertise. To reduce Hyperblock complexity while retaining performance, we introduce a suite of algorithms for Hyperblock simplification. These include removing redundant attributes, removing redundant blocks through overlap analysis, and creating disjunctive units. These methods eliminate unnecessary parameters, dramatically reducing model size without harming classification power. We increase robustness by introducing an interpretable fallback mechanism using k-Nearest Neighbor (k-NN) classifiers for points not covered by any block, ensuring complete data coverage while preserving model transparency. Our results demonstrate that interpretable models can scale to high-dimensional, large-volume datasets while maintaining competitive accuracy. On benchmark datasets such as WBC (9-D), we achieve strong predictive performance with significantly reduced complexity. On MNIST (784-D), our method continues to improve through tuning and simplification, showing promise as a transparent alternative to black-box models in domains where trust, clarity, and control are crucial.
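Illustrative sketch (not the authors' implementation): a Hyperblock, as described above, is an axis-aligned box given by a minimum and maximum bound per attribute together with a class label, and points that fall in no block are handed to a k-NN fallback. The class and function names, the first-match rule for overlapping blocks, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


class Hyperblock:
    """Axis-aligned box with per-attribute [low, high] bounds and a label."""

    def __init__(self, lows, highs, label):
        self.lows = lows
        self.highs = highs
        self.label = label

    def contains(self, x):
        return all(lo <= v <= hi for lo, v, hi in zip(self.lows, x, self.highs))


def classify(x, blocks, train_points, train_labels, k=3):
    """Return (label, source), where source is 'block' or 'knn'."""
    for block in blocks:                 # assumed rule: first matching block wins
        if block.contains(x):
            return block.label, "block"
    # Fallback: majority vote among the k nearest training points.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(x, train_points[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0], "knn"


blocks = [Hyperblock([0, 0], [1, 1], "A"), Hyperblock([2, 2], [3, 3], "B")]
points = [(0.5, 0.5), (0.2, 0.8), (2.5, 2.5), (2.8, 2.2)]
labels = ["A", "A", "B", "B"]
print(classify((0.5, 0.5), blocks, points, labels))
print(classify((2.9, 3.5), blocks, points, labels, k=1))
```

Because each decision is either "inside these bounds" or "closest to these labeled points", the source of every prediction can be reported alongside the label, which is the transparency property the abstract emphasizes.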
Article 1310
Title@2025-06-08 (7): Deep RL Needs Deep Behavior Analysis: Exploring Implicit Planning by Model-Free Agents in Open-Ended Environments
Title: Deep RL Needs Deep Behavior Analysis: Exploring Implicit Planning by Model-Free Agents in Open-Ended Environments | Deep RL braucht tiefe Verhaltensanalyse: Implizite Planung durch modellfreie Agenten in offenen Umgebungen erkunden | 深 RL 需要深 RL 深度行为分析:探索不限名额环境中无示范工具的隐性规划 2506.06981v1 |
Authors (8): Riley Simmons-Edler, Ryan P. Badman, Felix Baastad Berg, Raymond Chua, John J. Vastola, Joshua Lunger, William Qian, Kanaka Rajan
Understanding the behavior of deep reinforcement learning (DRL) agents – particularly as task and agent sophistication increase – requires more than simple comparison of reward curves, yet standard methods for behavioral analysis remain underdeveloped in DRL. We apply tools from neuroscience and ethology to study DRL agents in a novel, complex, partially observable environment, ForageWorld, designed to capture key aspects of real-world animal foraging – including sparse, depleting resource patches, predator threats, and spatially extended arenas. We use this environment as a platform for applying joint behavioral and neural analysis to agents, revealing detailed, quantitatively grounded insights into agent strategies, memory, and planning. Contrary to common assumptions, we find that model-free RNN-based DRL agents can exhibit structured, planning-like behavior purely through emergent dynamics – without requiring explicit memory modules or world models. Our results show that studying DRL agents like animals – analyzing them with neuroethology-inspired tools that reveal structure in both behavior and neural dynamics – uncovers rich structure in their learning dynamics that would otherwise remain invisible. We distill these tools into a general analysis framework linking core behavioral and representational features to diagnostic methods, which can be reused for a wide range of tasks and agents. As agents grow more complex and autonomous, bridging neuroscience, cognitive science, and AI will be essential – not just for understanding their behavior, but for ensuring safe alignment and maximizing desirable behaviors that are hard to measure via reward. We show how this can be done by drawing on lessons from how biological intelligence is studied.
Article 1311
Title@2025-06-08 (7): MoXGATE: Modality-aware cross-attention for multi-omic gastrointestinal cancer sub-type classification
Title: MoXGATE: Modality-aware cross-attention for multi-omic gastrointestinal cancer sub-type classification | MoXGATE: Modalitätsbewusste Cross-Attention für multi-omere gastrointestinale Tumor-Subtypklassifikation | MoXGATE:多工程胃肠胃癌亚型癌症分类模式 – – 意识交叉关注 2506.06980v1 |
Authors (5): Sajib Acharjee Dip, Uddip Acharjee Shuvo, Dipanwita Mallick, Abrar Rahman Abir, Liqing Zhang
Cancer subtype classification is crucial for personalized treatment and prognostic assessment. However, effectively integrating multi-omic data remains challenging due to the heterogeneous nature of genomic, epigenomic, and transcriptomic features. In this work, we propose Modality-Aware Cross-Attention MoXGATE, a novel deep-learning framework that leverages cross-attention and learnable modality weights to enhance feature fusion across multiple omics sources. Our approach effectively captures inter-modality dependencies, ensuring robust and interpretable integration. Through experiments on Gastrointestinal Adenocarcinoma (GIAC) and Breast Cancer (BRCA) datasets from TCGA, we demonstrate that MoXGATE outperforms existing methods, achieving 95% classification accuracy. Ablation studies validate the effectiveness of cross-attention over simple concatenation and highlight the importance of different omics modalities. Moreover, our model generalizes well to unseen cancer types, e.g., breast cancer, underscoring its adaptability. Key contributions include (1) a cross-attention-based multi-omic integration framework, (2) modality-weighted fusion for enhanced interpretability, (3) application of focal loss to mitigate data imbalance, and (4) validation across multiple cancer subtypes. Our results indicate that MoXGATE is a promising approach for multi-omic cancer subtype classification, offering improved performance and biological generalizability.
Article 1312
Title@2025-06-08 (7): Turb-L1: Achieving Long-term Turbulence Tracing By Tackling Spectral Bias
Title: Turb-L1: Achieving Long-term Turbulence Tracing By Tackling Spectral Bias | Turb-L1: Langfristige Turbulenzen erreichen, die durch das Greifen spektraler Bias verfolgt werden | Turb-L1:通过处理光辉双鱼,实现长期动荡追踪 2505.19038v2 |
Authors (10): Hao Wu, Yuan Gao, Ruiqi Shu, Zean Han, Fan Xu, Zhihong Zhu, Qingsong Wen, Xian Wu, Kun Wang, Xiaomeng Huang
Accurately predicting the long-term evolution of turbulence is crucial for advancing scientific understanding and optimizing engineering applications. However, existing deep learning methods face significant bottlenecks in long-term autoregressive prediction, which exhibit excessive smoothing and fail to accurately track complex fluid dynamics. Our extensive experimental and spectral analysis of prevailing methods provides an interpretable explanation for this shortcoming, identifying Spectral Bias as the core obstacle. Concretely, spectral bias is the inherent tendency of models to favor low-frequency, smooth features while overlooking critical high-frequency details during training, thus reducing fidelity and causing physical distortions in long-term predictions. Building on this insight, we propose Turb-L1, an innovative turbulence prediction method, which utilizes a Hierarchical Dynamics Synthesis mechanism within a multi-grid architecture to explicitly overcome spectral bias. It accurately captures cross-scale interactions and preserves the fidelity of high-frequency dynamics, enabling reliable long-term tracking of turbulence evolution. Extensive experiments on the 2D turbulence benchmark show that Turb-L1 demonstrates excellent performance: (I) In long-term predictions, it reduces Mean Squared Error (MSE) by $80.3\%$ and increases Structural Similarity (SSIM) by over $9\times$ compared to the SOTA baseline, significantly improving prediction fidelity. (II) It effectively overcomes spectral bias, accurately reproducing the full enstrophy spectrum and maintaining physical realism in high-wavenumber regions, thus avoiding the spectral distortions or spurious energy accumulation seen in other methods.
Article 1313
Title@2025-06-08 (7): Near Optimal Non-asymptotic Sample Complexity of 1-Identification
Title: Near Optimal Non-asymptotic Sample Complexity of 1-Identification | Nahe Optimale nicht-asymptotische Probenkomplexität der 1-Identifikation | 近于最优化的非不消毒样品 复杂度 1 - 识别 2506.06978v1 |
Authors (2): Zitian Li, Wang Chi Cheung
Motivated by an open direction in existing literature, we study the 1-identification problem, a fundamental multi-armed bandit formulation on pure exploration. The goal is to determine whether there exists an arm whose mean reward is at least a known threshold $\mu_0$, or to output None if the agent believes no such arm exists. The agent needs to guarantee its output is correct with probability at least $1-\delta$. Degenne & Koolen (2019) established the asymptotically tight sample complexity for the 1-identification problem, but commented that the non-asymptotic analysis remains unclear. We design a new algorithm, Sequential-Exploration-Exploitation (SEE), and conduct theoretical analysis from the non-asymptotic perspective. Novel to the literature, we achieve near optimality, in the sense of matching upper and lower bounds on the pulling complexity. The gap between the upper and lower bounds is up to a polynomial logarithmic factor. Numerical results also indicate the effectiveness of our algorithm compared to existing benchmarks.
Article 1314
Title@2025-06-08 (7): Post Reinforcement Learning Inference
Title: Post Reinforcement Learning Inference | Rückschlüsse nach der Verstärkung des Lernens | 强化后学习的推论 2302.08854v4 |
Authors (2): Vasilis Syrgkanis, Ruohan Zhan
We consider estimation and inference using data collected from reinforcement learning algorithms. These algorithms, characterized by their adaptive experimentation, interact with individual units over multiple stages, dynamically adjusting their strategies based on previous interactions. Our goal is to evaluate a counterfactual policy post-data collection and estimate structural parameters, like dynamic treatment effects, which can be used for credit assignment and determining the effect of earlier actions on final outcomes. Such parameters of interest can be framed as solutions to moment equations, but not minimizers of a population loss function, leading to Z-estimation approaches for static data. However, in the adaptive data collection environment of reinforcement learning, where algorithms deploy nonstationary behavior policies, standard estimators do not achieve asymptotic normality due to the fluctuating variance. We propose a weighted Z-estimation approach with carefully designed adaptive weights to stabilize the time-varying estimation variance. We identify proper weighting schemes to restore the consistency and asymptotic normality of the weighted Z-estimators for target parameters, which allows for hypothesis testing and constructing uniform confidence regions. Primary applications include dynamic treatment effect estimation and dynamic off-policy evaluation.
Article 1315
Title@2025-06-08 (7): UdonCare: Hierarchy Pruning for Unseen Domain Discovery in Predictive Healthcare
Title: UdonCare: Hierarchy Pruning for Unseen Domain Discovery in Predictive Healthcare | UdonCare: Hierarchie Pruning für Unsichtbare Domain Discovery in Predictive Healthcare | UdonCare: 预测保健中未见的域域发现 的分级预留机制 2506.06977v1 |
Authors (4): Pengfei Hu, Xiaoxue Han, Fei Wang, Yue Ning
Domain generalization has become a critical challenge in clinical prediction, where patient cohorts often exhibit shifting data distributions that degrade model performance. Typical domain generalization approaches struggle in real-world healthcare settings for two main reasons: (1) patient-specific domain labels are typically unavailable, making domain discovery especially difficult; (2) purely data-driven approaches overlook key clinical insights, leading to a gap in medical knowledge integration. To address these problems, we leverage hierarchical medical ontologies like the ICD-9-CM hierarchy to group diseases into higher-level categories and discover more flexible latent domains. In this paper, we introduce UdonCare, a hierarchy-guided framework that iteratively prunes fine-grained domains, encodes these refined domains, and applies a Siamese-type inference mechanism to separate domain-related signals from patient-level features. Experimental results on clinical datasets (MIMIC-III and MIMIC-IV) show that the proposed model achieves higher performance compared to other domain generalization baselines when substantial domain gaps are present, highlighting the untapped potential of medical knowledge for enhancing domain generalization in practical healthcare applications.
Article 1316
Title@2025-06-08 (7): Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning
Title: Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning | Lernen, durch verstärktes Lernen durch reward-gewichtete Feintuning zu klären | 学习通过通过奖励-加权微调加强学习来澄清问题 2506.06964v1 |
Authors (9): Subhojyoti Mukherjee, Viet Dac Lai, Raghavendra Addanki, Ryan Rossi, Seunghyun Yoon, Trung Bui, Anup Rao, Jayakumar Subramanian, Branislav Kveton
Question answering (QA) agents automatically answer questions posed in natural language. In this work, we learn to ask clarifying questions in QA agents. The key idea in our method is to simulate conversations that contain clarifying questions and learn from them using reinforcement learning (RL). To make RL practical, we propose and analyze offline RL objectives that can be viewed as reward-weighted supervised fine-tuning (SFT) and easily optimized in large language models. Our work stands in stark contrast to recently proposed methods, based on SFT and direct preference optimization, which have additional hyper-parameters and do not directly optimize rewards. We compare to these methods empirically and report gains in both optimized rewards and language quality.
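Illustrative sketch (not the authors' objective): one simple reading of "reward-weighted supervised fine-tuning" is a negative log-likelihood loss in which each simulated conversation is weighted by its reward. The code below computes such a loss from per-token log-probabilities. The exact objectives analyzed in the paper may differ, and the function name and the numbers are assumptions for illustration.

```python
def reward_weighted_nll(token_logprobs, rewards):
    """Average over conversations of -reward * sum of token log-probabilities.

    token_logprobs[i] holds the model's log-probabilities for the tokens of
    conversation i; rewards[i] is the scalar reward of that conversation.
    With all rewards equal to 1 this reduces to the usual SFT loss.
    """
    if len(token_logprobs) != len(rewards):
        raise ValueError("need one reward per conversation")
    if not rewards:
        raise ValueError("need at least one conversation")
    total = 0.0
    for logps, reward in zip(token_logprobs, rewards):
        total += -reward * sum(logps)
    return total / len(rewards)


# Made-up numbers: two simulated conversations, only the first was rewarded.
logprobs = [[-1.0, -2.0], [-0.5]]
print(reward_weighted_nll(logprobs, rewards=[1.0, 0.0]))
```

Because the reward only rescales a standard likelihood term, a loss of this shape can be minimized with an ordinary fine-tuning pipeline, which matches the abstract's point that the objectives are easily optimized in large language models.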
Article 1317
Title@2025-06-08 (7): Fitting Multilevel Factor Models
Title: Fitting Multilevel Factor Models | Passende Multilevel-Faktormodelle | 适合多级因数模型 2409.12067v3 |
Authors (3): Tetiana Parshakova, Trevor Hastie, Stephen Boyd
We examine a special case of the multilevel factor model, with covariance given by a multilevel low rank (MLR) matrix (Parshakova et al., 2023). We develop a novel, fast implementation of the expectation-maximization algorithm, tailored for multilevel factor models, to maximize the likelihood of the observed data. This method accommodates any hierarchical structure and maintains linear time and storage complexities per iteration. This is achieved through a new efficient technique for computing the inverse of the positive definite MLR matrix. We show that the inverse of a positive definite MLR matrix is also an MLR matrix with the same sparsity in factors, and we use the recursive Sherman-Morrison-Woodbury matrix identity to obtain the factors of the inverse. Additionally, we present an algorithm that computes the Cholesky factorization of an expanded matrix with linear time and space complexities, yielding the covariance matrix as its Schur complement. This paper is accompanied by an open-source package that implements the proposed methods.
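Illustrative sketch (not the authors' package): the Sherman-Morrison-Woodbury identity expresses the inverse of a "diagonal plus low rank" matrix through the inverse of the diagonal part. The code below shows the simplest rank-one case, inverting diag(d) + u u^T, and checks it against the identity matrix. The function names are assumptions, and the recursive multilevel version used in the paper is not reproduced here.

```python
def sherman_morrison_inverse(d, u):
    """Inverse of diag(d) + u u^T (d positive), as a dense list of lists.

    Uses (D + u u^T)^{-1} = D^{-1} - (D^{-1} u)(D^{-1} u)^T / (1 + u^T D^{-1} u).
    """
    n = len(d)
    z = [u[i] / d[i] for i in range(n)]                 # z = D^{-1} u
    denom = 1.0 + sum(u[i] * z[i] for i in range(n))    # 1 + u^T D^{-1} u
    inv = [[-z[i] * z[j] / denom for j in range(n)] for i in range(n)]
    for i in range(n):
        inv[i][i] += 1.0 / d[i]
    return inv


def matmul(a, b):
    """Plain dense matrix product, used only to check the result."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


d = [2.0, 4.0]
u = [1.0, 2.0]
matrix = [[d[i] * (i == j) + u[i] * u[j] for j in range(2)] for i in range(2)]
inverse = sherman_morrison_inverse(d, u)
print(matmul(matrix, inverse))  # approximately the 2x2 identity
```

Forming the dense inverse here costs quadratic time and is only for checking; the point of the identity is that products with the inverse need only `z` and `denom`, which take linear time to compute.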
Article 1318
Title@2025-06-08 (7): CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
Title: CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration | CE-CoLLM: Effiziente und adaptive Großsprachenmodelle durch Cloud-Edge-Kollaboration | CE-COLLM:通过云眼协作建立高效和适应性大语言模型 2411.02829v2 |
Authors (2): Hongpeng Jin, Yanzhao Wu
Large Language Models (LLMs) exhibit remarkable human-like predictive capabilities. However, it is challenging to deploy LLMs to provide efficient and adaptive inference services at the edge. This paper proposes a novel Cloud-Edge Collaboration framework for LLMs (CE-CoLLM) to tackle these challenges. First, we identify the transmission of LLM contextual data between the cloud and edge as a key performance bottleneck, which introduces substantial communication overhead that dominates overall inference latency and makes naive cloud-edge collaboration for LLMs inefficient. Second, we introduce a suite of novel techniques, including a latency-aware early exit mechanism and efficient cloud context management, into CE-CoLLM, which collectively reduce communication overhead and preserve LLM inference accuracy. Third, we design two adaptive inference modes to accommodate diverse edge environments: (1) a low-latency standalone edge inference mode that enables reliable edge-side independent LLM inference even under unstable network conditions, and (2) a high-accuracy cloud-edge collaborative inference mode that adaptively leverages cloud resources to enhance prediction accuracy. Extensive experiments on multiple benchmark datasets demonstrate that CE-CoLLM reduces overall inference time by up to 13.81% and offloads over 84.53% of the computational workload from the cloud to the edge, compared to conventional cloud-based LLM deployment, without sacrificing prediction accuracy. The code is provided on GitHub at https://github.com/mlsysx/CE-CoLLM.
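Illustrative sketch (not the authors' implementation): an early exit mechanism can be pictured as a confidence check on the logits produced at an intermediate layer on the edge device. If the top token is confident enough, it is emitted locally; otherwise the request is escalated to the cloud. Using the maximum softmax probability as the confidence measure, the threshold value, and the function names are assumptions; the latency-aware part of the paper's mechanism is not modeled here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(early_exit_logits, threshold):
    """Decide where the next token is produced.

    Returns ("edge", token_id) when the early-exit prediction is confident
    enough, and ("cloud", None) when the request should be escalated.
    """
    probs = softmax(early_exit_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return "edge", best
    return "cloud", None


print(route_token([5.0, 0.0, 0.0], threshold=0.9))
print(route_token([1.0, 1.0], threshold=0.9))
```

Raising the threshold sends more tokens to the cloud (favoring accuracy), while lowering it keeps more work on the edge (favoring latency), which mirrors the two inference modes described in the abstract.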
Article 1319
Title@2025-06-08 (7): Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs
Title: Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs | Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs | 超越自反应内核:历史驱动目标,实现高效的非线性非线性通用图形MCMC 2505.18300v2 |
Authors (3): Jie Hu, Yi-Ting Ma, Do Young Eun
We propose a history-driven target (HDT) framework in Markov Chain Monte Carlo (MCMC) to improve any random walk algorithm on discrete state spaces, such as general undirected graphs, for efficient sampling from target distribution $\boldsymbol{\mu}$. With broad applications in network science and distributed optimization, recent innovations like the self-repellent random walk (SRRW) achieve near-zero variance by prioritizing under-sampled states through transition kernel modifications based on past visit frequencies. However, SRRW’s reliance on explicit computation of transition probabilities for all neighbors at each step introduces substantial computational overhead, while its strict dependence on time-reversible Markov chains excludes advanced non-reversible MCMC methods. To overcome these limitations, instead of direct modification of transition kernel, HDT introduces a history-dependent target distribution $\boldsymbol{\pi}[\mathbf{x}]$ to replace the original target $\boldsymbol{\mu}$ in any graph sampler, where $\mathbf{x}$ represents the empirical measure of past visits. This design preserves lightweight implementation by requiring only local information between the current and proposed states and achieves compatibility with both reversible and non-reversible MCMC samplers, while retaining unbiased samples with target distribution $\boldsymbol{\mu}$ and near-zero variance performance. Extensive experiments in graph sampling demonstrate consistent performance gains, and a memory-efficient Least Recently Used (LRU) cache ensures scalability to large general graphs.
Article 1320
Title@2025-06-08 (7): Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression
Title: Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression | Sicherheitsbewusstes Verstärktes Lernen zur Kontrolle über risikosensitive Aktions-Value Iteration und quantitative Regression | 通过风险敏感行动增值和量回归,加强安全软件强化学习,以通过风险敏感行动控制 2506.06954v1 |
Authors (4): Clinton Enwerem, Aniruddh G. Puranic, John S. Baras, Calin Belta
Mainstream approximate action-value iteration reinforcement learning (RL) algorithms suffer from overestimation bias, leading to suboptimal policies in high-variance stochastic environments. Quantile-based action-value iteration methods reduce this bias by learning a distribution of the expected cost-to-go using quantile regression. However, ensuring that the learned policy satisfies safety constraints remains a challenge when these constraints are not explicitly integrated into the RL framework. Existing methods often require complex neural architectures or manual tradeoffs due to combined cost functions. To address this, we propose a risk-regularized quantile-based algorithm integrating Conditional Value-at-Risk (CVaR) to enforce safety without complex architectures. We also provide theoretical guarantees on the contraction properties of the risk-sensitive distributional Bellman operator in Wasserstein space, ensuring convergence to a unique cost distribution. Simulations of a mobile robot in a dynamic reach-avoid task show that our approach leads to more goal successes, fewer collisions, and better safety-performance trade-offs compared to risk-neutral methods.
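Conditional Value-at-Risk at level alpha is the mean of the worst (1 - alpha) fraction of a cost distribution, so it can be approximated directly from the quantile estimates that quantile regression produces. The following is a minimal sketch of that approximation, assuming equally weighted quantiles at midpoint levels; it is not the paper's risk-regularized algorithm, and `cvar_from_quantiles` is a hypothetical name.

```python
import numpy as np

def cvar_from_quantiles(quantiles, alpha):
    """Approximate CVaR_alpha of a cost from N equally weighted quantile estimates.

    quantiles: (N,) estimated cost quantiles, assumed to sit at the
               midpoint levels (i + 0.5) / N for i = 0..N-1.
    alpha: level in [0, 1); the result averages the quantiles whose
           level is at least alpha, i.e. the high-cost tail.
    """
    q = np.sort(np.asarray(quantiles, dtype=float))
    levels = (np.arange(q.size) + 0.5) / q.size
    tail = q[levels >= alpha]
    if tail.size == 0:          # alpha above the highest level: fall back to the worst quantile
        return float(q[-1])
    return float(tail.mean())

costs = np.arange(1.0, 11.0)                           # ten quantile estimates: 1, 2, ..., 10
risk_neutral = cvar_from_quantiles(costs, alpha=0.0)   # plain mean of all quantiles
risk_averse = cvar_from_quantiles(costs, alpha=0.8)    # mean of the two largest quantiles
```

With alpha = 0 the estimate reduces to the ordinary expected cost, which is the risk-neutral baseline the abstract compares against.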
Article 1321
Title@2025-06-07 (6): Is Your Training Pipeline Production-Ready? A Case Study in the Healthcare Domain
Title: Is Your Training Pipeline Production-Ready? A Case Study in the Healthcare Domain | Ist Ihr Training Pipeline Production-Ready? Eine Fallstudie im Bereich Healthcare | 你的训练管道生产-准备? 保健领域案例研究 2506.06946v1 |
Authors (5): Daniel Lawand, Lucas Quaresma, Roberto Bolgheroni, Alfredo Goldman, Renato Cordeiro Ferreira
Deploying a Machine Learning (ML) training pipeline into production requires robust software engineering practices. This differs significantly from experimental workflows. This experience report investigates this challenge in SPIRA, a project whose goal is to create an ML-Enabled System (MLES) to pre-diagnose respiratory insufficiency via speech analysis. The first version of SPIRA’s training pipeline lacked critical software quality attributes. This paper presents an overview of the MLES, then compares three versions of the architecture of the Continuous Training subsystem, which evolved from a Big Ball of Mud, to a Modular Monolith, towards Microservices, by adopting different design principles and patterns to enhance its maintainability, robustness, and extensibility. In this way, the paper seeks to offer insights for both ML Engineers tasked to productionize ML training pipelines and Data Scientists seeking to adopt MLOps practices.
Article 1322
Title@2025-06-07 (6): Polar Hierarchical Mamba: Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences
Title: Polar Hierarchical Mamba: Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences | Polar Hierarchical Mamba: Auf dem Weg zur Streaming-LiDAR-Objekterkennung mit Punktwolken als egozentrische Sequenzen | 极地分级马姆巴:向以点云为主心序列的分点云星探测星体径流激光雷达物体的方向迈进 2506.06944v1 |
Authors (3): Mellon M. Zhang, Glen Chou, Saibal Mukhopadhyay
Accurate and efficient object detection is essential for autonomous vehicles, where real-time perception requires low latency and high throughput. LiDAR sensors provide robust depth information, but conventional methods process full 360° scans in a single pass, introducing significant delay. Streaming approaches address this by sequentially processing partial scans in the native polar coordinate system, yet they rely on translation-invariant convolutions that are misaligned with polar geometry – resulting in degraded performance or requiring complex distortion mitigation. Recent Mamba-based state space models (SSMs) have shown promise for LiDAR perception, but only in the full-scan setting, relying on geometric serialization and positional embeddings that are memory-intensive and ill-suited to streaming. We propose Polar Hierarchical Mamba (PHiM), a novel SSM architecture designed for polar-coordinate streaming LiDAR. PHiM uses local bidirectional Mamba blocks for intra-sector spatial encoding and a global forward Mamba for inter-sector temporal modeling, replacing convolutions and positional encodings with distortion-aware, dimensionally-decomposed operations. PHiM sets a new state-of-the-art among streaming detectors on the Waymo Open Dataset, outperforming the previous best by 10% and matching full-scan baselines at twice the throughput. Code will be available at https://github.com/meilongzhang/Polar-Hierarchical-Mamba.
Article 1323
Title@2025-06-07 (6): Conditional Denoising Diffusion for ISAC Enhanced Channel Estimation in Cell-Free 6G
Title: Conditional Denoising Diffusion for ISAC Enhanced Channel Estimation in Cell-Free 6G | Bedingte störende Diffusion für ISAC-Verstärkte Kanalabschätzung in zellfreier 6G | 以无细胞6G方式为ISAC强化频道估计仪进行有条件的低排放传播 2506.06942v1 |
Authors (5): Mohammad Farzanullah, Han Zhang, Akram Bin Sediq, Ali Afana, Melike Erol-Kantarci
Cell-free Integrated Sensing and Communication (ISAC) aims to revolutionize 6th Generation (6G) networks. By combining distributed access points with ISAC capabilities, it boosts spectral efficiency, situational awareness, and communication reliability. Channel estimation is a critical step in cell-free ISAC systems to ensure reliable communication, but its performance is usually limited by challenges such as pilot contamination and noisy channel estimates. This paper presents a novel framework leveraging sensing information as a key input within a Conditional Denoising Diffusion Model (CDDM). In this framework, we integrate CDDM with a Multimodal Transformer (MMT) to enhance channel estimation in ISAC-enabled cell-free systems. The MMT encoder effectively captures inter-modal relationships between sensing and location data, enabling the CDDM to iteratively denoise and refine channel estimates. Simulation results demonstrate that the proposed approach achieves significant performance gains. As compared with Least Squares (LS) and Minimum Mean Squared Error (MMSE) estimators, the proposed model achieves normalized mean squared error (NMSE) improvements of 8 dB and 9 dB, respectively. Moreover, we achieve a 27.8% NMSE improvement compared to the traditional denoising diffusion model (TDDM), which does not incorporate sensing channel information. Additionally, the model exhibits higher robustness against pilot contamination and maintains high accuracy under challenging conditions, such as low signal-to-noise ratios (SNRs). According to the simulation results, the model performs well for users near sensing targets by leveraging the correlation between sensing and communication channels.
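For readers unfamiliar with the metric, normalized mean squared error is the squared estimation error divided by the energy of the true channel, usually reported in decibels. The sketch below shows that standard definition on toy values; the function name `nmse_db` and the example arrays are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def nmse_db(h_true, h_est):
    """NMSE in dB: 10 * log10(||h_est - h_true||^2 / ||h_true||^2)."""
    h_true = np.asarray(h_true)
    h_est = np.asarray(h_est)
    err = np.sum(np.abs(h_est - h_true) ** 2)
    ref = np.sum(np.abs(h_true) ** 2)
    return 10.0 * np.log10(err / ref)

# Toy example: every coefficient is over-estimated by 10 percent.
h_true = np.ones(4, dtype=complex)
h_est = 1.1 * h_true
toy_nmse = nmse_db(h_true, h_est)   # error energy is 1 percent of the channel energy

# A gain of g dB corresponds to dividing the linear NMSE by 10 ** (g / 10).
factor_8_db = 10 ** (8 / 10)
```

On this scale, an 8 dB improvement means the linear NMSE shrinks by a factor of roughly 6.3.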
Article 1324
Title@2025-06-07 (6): RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior
Title: RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior | RestoreGrad: Signal-Restaurierung mit Bedingten störenden Diffusions-Modellen mit gemeinsam gelernten Prior | RestAGrad:使用有条件的低度扩散模型和先前联合研究的传播模型,恢复信号恢复信号 2502.13574v3 |
Authors (7): Ching-Hua Lee, Chouchang Yang, Jaejin Cho, Yashas Malur Saidutta, Rakshith Sharma Srinivasa, Yilin Shen, Hongxia Jin
Denoising diffusion probabilistic models (DDPMs) can be utilized to recover a clean signal from its degraded observation(s) by conditioning the model on the degraded signal. The degraded signals are themselves contaminated versions of the clean signals; due to this correlation, they may encompass certain useful information about the target clean data distribution. However, existing adoption of the standard Gaussian as the prior distribution in turn discards such information when shaping the prior, resulting in sub-optimal performance. In this paper, we propose to improve conditional DDPMs for signal restoration by leveraging a more informative prior that is jointly learned with the diffusion model. The proposed framework, called RestoreGrad, seamlessly integrates DDPMs into the variational autoencoder (VAE) framework, taking advantage of the correlation between the degraded and clean signals to encode a better diffusion prior. On speech and image restoration tasks, we show that RestoreGrad demonstrates faster convergence (5-10 times fewer training steps) to achieve better quality of restored signals over existing DDPM baselines and improved robustness to using fewer sampling steps in inference time (2-2.5 times fewer), advocating the advantages of leveraging jointly learned prior for efficiency improvements in the diffusion process.
Article 1325
Title@2025-06-07 (6): The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Title: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | Die Illusion des Denkens: Die Stärken und Grenzen von Vernunftmodellen über das Lens of Problem Complexity verstehen | 思考的幻觉:通过问题复杂焦点了解理性模型的长处和局限性 2506.06941v1 |
Authors (6): Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar
Recent generations of language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established math and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from contamination and does not provide insights into the reasoning traces. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of complexity while maintaining consistent logical structures. This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs think. Through extensive experiments, we show that LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite a remaining token budget. By comparing LRMs with their standard LLM counterparts under the same inference compute, we identify three performance regimes: (1) low-complexity tasks where standard models outperform LRMs, (2) medium-complexity tasks where LRMs demonstrate an advantage, and (3) high-complexity tasks where both models face complete collapse. We find that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across scales. We also investigate the reasoning traces in more depth, studying the patterns of explored solutions and analyzing the models’ computational behavior, shedding light on their strengths and limitations and raising questions about their reasoning capabilities.
Article 1326
Title@2025-06-07 (6): Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More
Title: Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More | Schärfe-Dynamik im NN-Training mit einem minimalistischen Beispiel verstehen: Die Auswirkungen von Dataset-Schwierigkeiten, Tiefe, Stochastik und mehr | 在NN培训中以最微小的示例了解尖锐的动态:数据集难度、深度、储存性等的影响 2506.06940v1 |
Authors (3): Geonhui Yoo, Minhak Song, Chulhee Yun
When training deep neural networks with gradient descent, sharpness often increases – a phenomenon known as progressive sharpening – before saturating at the edge of stability. Although commonly observed in practice, the underlying mechanisms behind progressive sharpening remain poorly understood. In this work, we study this phenomenon using a minimalist model: a deep linear network with a single neuron per layer. We show that this simple model effectively captures the sharpness dynamics observed in recent empirical studies, offering a simple testbed to better understand neural network training. Moreover, we theoretically analyze how dataset properties, network depth, stochasticity of optimizers, and step size affect the degree of progressive sharpening in the minimalist model. We then empirically demonstrate how these theoretical insights extend to practical scenarios. This study offers a deeper understanding of sharpness dynamics in neural network training, highlighting the interplay between depth, training data, and optimizers.
Article 1327
Title@2025-06-07 (6): Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning
Title: Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning | Intelligente Anomalieerkennung für Lane-Rendering mit Transformer mit selbstüberwachtem Pre-Training und maßgeschneidertem Feintuning | 利用自我监督的训练前和定制的微调访问变形器,对利用自我监督的训练前和定制的微调访问前变形器进行巷路成形的智能异常异常探测 2312.04398v6 |
Authors (6): Yongqi Dong, Xingmin Lu, Ruohan Li, Wei Song, Bart van Arem, Haneen Farah
The burgeoning navigation services using digital maps provide great convenience to drivers. Nevertheless, the presence of anomalies in lane rendering map images occasionally introduces potential hazards, as such anomalies can be misleading to human drivers and consequently contribute to unsafe driving conditions. In response to this concern and to accurately and effectively detect the anomalies, this paper transforms lane rendering image anomaly detection into a classification problem and proposes a four-phase pipeline consisting of data pre-processing, self-supervised pre-training with the masked image modeling (MiM) method, customized fine-tuning using cross-entropy based loss with label smoothing, and post-processing to tackle it leveraging state-of-the-art deep learning techniques, especially those involving Transformer models. Various experiments verify the effectiveness of the proposed pipeline. Results indicate that the proposed pipeline exhibits superior performance in lane rendering image anomaly detection, and notably, the self-supervised pre-training with MiM can greatly enhance the detection accuracy while significantly reducing the total training time. For instance, employing the Swin Transformer with Uniform Masking as self-supervised pretraining (Swin-Trans-UM) yielded a heightened accuracy at 94.77% and an improved Area Under The Curve (AUC) score of 0.9743 compared with the pure Swin Transformer without pre-training (Swin-Trans) with an accuracy of 94.01% and an AUC of 0.9498. The fine-tuning epochs were dramatically reduced to 41 from the original 280. In conclusion, the proposed pipeline, with its incorporation of self-supervised pre-training using MiM and other advanced deep learning techniques, emerges as a robust solution for enhancing the accuracy and efficiency of lane rendering image anomaly detection in digital navigation systems.
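The fine-tuning phase described above relies on a cross-entropy loss with label smoothing. A minimal sketch of that loss for a single example follows, using the common variant that spreads a mass eps uniformly over all classes; the value eps = 0.1 and the function name are assumptions, since the abstract does not state the paper's settings.

```python
import numpy as np

def smoothed_cross_entropy(logits, target, eps=0.1):
    """Cross-entropy against a label-smoothed target for one example.

    The one-hot target is replaced by (1 - eps) on the true class plus
    eps / K spread over all K classes (eps = 0.1 is an assumed value).
    """
    logits = np.asarray(logits, dtype=float)
    k = logits.size
    shifted = logits - logits.max()                      # stabilise the softmax
    log_probs = shifted - np.log(np.exp(shifted).sum())  # log-softmax
    soft = np.full(k, eps / k)
    soft[target] += 1.0 - eps
    return float(-(soft * log_probs).sum())

uniform_loss = smoothed_cross_entropy([0.0, 0.0, 0.0, 0.0], target=1)
plain_loss = smoothed_cross_entropy([2.0, 0.0, 0.0], target=0, eps=0.0)
smooth_loss = smoothed_cross_entropy([2.0, 0.0, 0.0], target=0, eps=0.1)
```

Setting eps to zero recovers ordinary cross-entropy, so the smoothing strength can be tuned without changing the rest of a training loop.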
Article 1328
Title@2025-06-07 (6): Scaling over Scaling: Exploring Test-Time Scaling Plateau in Large Reasoning Models
Title: Scaling over Scaling: Exploring Test-Time Scaling Plateau in Large Reasoning Models | Skalierung über Skalierung: Untersuchung des Testzeit-Skalierungs-Plateau in Großmodellen | 缩放超过缩放: 探索大型理由模型中的试验时间缩放高台 2505.20522v2 |
Authors (5): Jian Wang, Boyan Zhu, Chak Tou Leong, Yongqi Li, Wenjie Li
Large reasoning models (LRMs) have exhibited the capacity of enhancing reasoning performance via internal test-time scaling. Building upon this, a promising direction is to further scale test-time compute to unlock even greater reasoning capabilities. However, as we push these scaling boundaries, systematically understanding the practical limits and achieving optimal resource allocation becomes a critical challenge. In this paper, we investigate the scaling plateau of test-time scaling and introduce the Test-Time Scaling Performance Model (TTSPM). We theoretically analyze two fundamental paradigms for such extended scaling, parallel scaling and sequential scaling, from a probabilistic modeling perspective. Our primary contribution is the derivation of the saturation point on the scaling budget for both strategies, identifying thresholds beyond which additional computation yields diminishing returns. Remarkably, despite their distinct mechanisms, both paradigms converge to a unified mathematical structure in their upper bounds. We empirically validate our theoretical findings on challenging reasoning benchmarks, including AIME, MATH-500, and GPQA, demonstrating the practical utility of these bounds for test-time resource allocation. We hope that this work provides insights into the cost-benefit trade-offs of test-time scaling, guiding the development of more resource-efficient inference strategies for large reasoning models.
Article 1329
Title@2025-06-07 (6): Rewriting the Budget: A General Framework for Black-Box Attacks Under Cost Asymmetry
Title: Rewriting the Budget: A General Framework for Black-Box Attacks Under Cost Asymmetry | Das Budget neu schreiben: Ein allgemeiner Rahmen für Black-Box-Angriffe unter Kosten-Asymmetrie | 重写预算:费用不对称下的黑牛袭击总框架 2506.06933v1 |
Authors (3): Mahdi Salmani, Alireza Abdollahpoorrostam, Seyed-Mohsen Moosavi-Dezfooli
Traditional decision-based black-box adversarial attacks on image classifiers aim to generate adversarial examples by slightly modifying input images while keeping the number of queries low, where each query involves sending an input to the model and observing its output. Most existing methods assume that all queries have equal cost. However, in practice, queries may incur asymmetric costs; for example, in content moderation systems, certain output classes may trigger additional review, enforcement, or penalties, making them more costly than others. While prior work has considered such asymmetric cost settings, effective algorithms for this scenario remain underdeveloped. In this paper, we propose a general framework for decision-based attacks under asymmetric query costs, which we refer to as asymmetric black-box attacks. We modify two core components of existing attacks: the search strategy and the gradient estimation process. Specifically, we propose Asymmetric Search (AS), a more conservative variant of binary search that reduces reliance on high-cost queries, and Asymmetric Gradient Estimation (AGREST), which shifts the sampling distribution to favor low-cost queries. We design efficient algorithms that minimize total attack cost by balancing different query types, in contrast to earlier methods such as stealthy attacks that focus only on limiting expensive (high-cost) queries. Our method can be integrated into a range of existing black-box attacks with minimal changes. We perform both theoretical analysis and empirical evaluation on standard image classification benchmarks. Across various cost regimes, our method consistently achieves lower total query cost and smaller perturbations than existing approaches, with improvements of up to 40% in some settings.
Article 1330
Title@2025-06-07 (6): Revisiting semi-supervised learning in the era of foundation models
Title: Revisiting semi-supervised learning in the era of foundation models | Halbbeaufsichtigtes Lernen im Zeitalter der Gründungsmodelle | 在基础模式时代重新审查半监督的学习 2503.09707v2 |
Authors (4): Ping Zhang, Zheda Mai, Quang-Huy Nguyen, Wei-Lun Chao
Semi-supervised learning (SSL) leverages abundant unlabeled data alongside limited labeled data to enhance learning. As vision foundation models (VFMs) increasingly serve as the backbone of vision applications, it remains unclear how SSL interacts with these pre-trained models. To address this gap, we develop new SSL benchmark datasets where frozen VFMs underperform and systematically evaluate representative SSL methods. We make a surprising observation: parameter-efficient fine-tuning (PEFT) using only labeled data often matches SSL performance, even without leveraging unlabeled data. This motivates us to revisit self-training, a conceptually simple SSL baseline, where we use the supervised PEFT model to pseudo-label unlabeled data for further training. To overcome the notorious issue of noisy pseudo-labels, we propose ensembling multiple PEFT approaches and VFM backbones to produce more robust pseudo-labels. Empirical results validate the effectiveness of this simple yet powerful approach, providing actionable insights into SSL with VFMs and paving the way for more scalable and practical semi-supervised learning in the era of foundation models.
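One simple way to picture the ensembling step is to average the class probabilities predicted by several fine-tuned models and keep only confident pseudo-labels. The sketch below does exactly that; the averaging rule, the 0.8 confidence threshold, and the function name are assumptions made for illustration and may differ from the procedure the paper actually uses.

```python
import numpy as np

def ensemble_pseudo_labels(prob_list, threshold=0.8):
    """Average per-model class probabilities and keep confident pseudo-labels.

    prob_list: list of (n_samples, n_classes) arrays, one per model
               (for example, one per PEFT method and backbone pair).
    Returns (labels, keep), where keep marks samples whose averaged
    top probability reaches the assumed confidence threshold.
    """
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    labels = mean_probs.argmax(axis=1)
    keep = mean_probs.max(axis=1) >= threshold
    return labels, keep

model_a = np.array([[0.9, 0.1], [0.6, 0.4]])
model_b = np.array([[0.8, 0.2], [0.3, 0.7]])
labels, keep = ensemble_pseudo_labels([model_a, model_b])
```

In the toy example the two models agree on the first sample, which is kept, and disagree on the second, which is filtered out rather than passed on as a noisy label.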
Article 1331
Title@2025-06-07 (6): BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Title: BLEUBERI: BLEU is a surprisingly effective reward for instruction following | BLEUBERI: BLEU ist eine überraschend effektive Belohnung für den Unterricht nach | BLEU是一种令人惊讶的有效奖赏 2505.11080v2 |
Authors (7): Yapei Chang, Yekyung Kim, Michael Krumdick, Amir Zadeh, Chuan Li, Chris Tanner, Mohit Iyyer
Reward models are central to aligning LLMs with human preferences, but they are costly to train, requiring large-scale human-labeled preference data and powerful pretrained LLM backbones. Meanwhile, the increasing availability of high-quality synthetic instruction-following datasets raises the question: can simpler, reference-based metrics serve as viable alternatives to reward models during RL-based alignment? In this paper, we show first that BLEU, a basic string-matching metric, surprisingly matches strong reward models in agreement with human preferences on general instruction-following datasets. Based on this insight, we develop BLEUBERI, a method that first identifies challenging instructions and then applies Group Relative Policy Optimization (GRPO) using BLEU directly as the reward function. We demonstrate that BLEUBERI-trained models are competitive with models trained via reward model-guided RL across four challenging instruction-following benchmarks and three different base language models. A human evaluation further supports that the quality of BLEUBERI model outputs is on par with those from reward model-aligned models. Moreover, BLEUBERI models generate outputs that are more factually grounded than competing methods. Overall, we show that given access to high-quality reference outputs (easily obtained via existing instruction-following datasets or synthetic data generation), string matching-based metrics are cheap yet effective proxies for reward models during alignment. We release our code and data at https://github.com/lilakk/BLEUBERI.
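To make the idea of a string-matching reward concrete, the sketch below computes a sentence-level BLEU score between a model output and a reference and returns it as a scalar. It is a simplified stand-in, assuming whitespace tokenization and add-one smoothing of the n-gram precisions; the authors' exact BLEU configuration is in their repository and may differ, and `bleu_reward` is a hypothetical name.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu_reward(candidate, reference, max_n=4):
    """Sentence-level BLEU in (0, 1], with add-one smoothed n-gram precisions."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_counts, r_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped overlap: each candidate n-gram counts at most as often as in the reference.
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = sum(c_counts.values())
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_prec)

reference = "the cat sat on the mat"
exact = bleu_reward("the cat sat on the mat", reference)
partial = bleu_reward("the cat sat on a rug", reference)
unrelated = bleu_reward("dogs bark loudly at night outside", reference)
```

Because the score only needs a reference string, it can be evaluated for every sampled response in a policy-optimization loop without running a learned reward model.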
Article 1332
Title@2025-06-07 (6): Analysis of Thompson Sampling for Controlling Unknown Linear Diffusion Processes
Title: Analysis of Thompson Sampling for Controlling Unknown Linear Diffusion Processes | Analyse der Thompson-Probenahme zur Kontrolle unbekannter linearer Diffusionsprozesse | Thompson 用于控制未知线性扩散过程的抽样分析 2206.09977v2 |
Authors (3): Mohamad Kazem Shirani Faradonbeh, Sadegh Shirani, Mohsen Bayati
Linear diffusion processes serve as canonical continuous-time models for dynamic decision-making under uncertainty. These systems evolve according to drift matrices that specify the instantaneous rates of change in the expected system state, while also experiencing continuous random disturbances modeled by Brownian noise. For instance, in medical applications such as artificial pancreas systems, the drift matrices represent the internal dynamics of glucose concentrations. Classical results in stochastic control provide optimal policies under perfect knowledge of the drift matrices. However, practical decision-making scenarios typically feature uncertainty about the drift; in medical contexts, such parameters are patient-specific and unknown, requiring adaptive policies for efficiently learning the drift matrices while ensuring system stability and optimal performance. We study the Thompson sampling (TS) algorithm for decision-making in linear diffusion processes with unknown drift matrices. For this algorithm that designs control policies as if samples from a posterior belief about the parameters fully coincide with the unknown truth, we establish efficiency. That is, Thompson sampling learns optimal control actions fast, incurring only a square-root of time regret, and also learns to stabilize the system in a short time period. To our knowledge, this is the first such result for TS in a diffusion process control problem. Moreover, our empirical simulations in three settings that involve blood-glucose and flight control demonstrate that TS significantly improves regret, compared to the state-of-the-art algorithms, suggesting it explores in a more guarded fashion. Our theoretical analysis includes characterization of a certain optimality manifold that relates the geometry of the drift matrices to the optimal control of the diffusion process, among others.
Article 1333
Title@2025-06-07 (6): Distributionally Robust Learning in Survival Analysis
Title: Distributionally Robust Learning in Survival Analysis | Verteilungsweise robustes Lernen in der Überlebensanalyse | 在生存分析中进行有力的分配学习 2506.01348v2 |
Authors (3): Yeping Jin, Lauren Wise, Ioannis Ch. Paschalidis
We introduce an innovative approach that incorporates Distributionally Robust Learning (DRL) into Cox regression to enhance the robustness and accuracy of survival predictions. By formulating a DRL framework with a Wasserstein distance-based ambiguity set, we develop a variant Cox model that is less sensitive to assumptions about the underlying data distribution and more resilient to model misspecification and data perturbations. By leveraging Wasserstein duality, we reformulate the original min-max DRL problem into a tractable regularized empirical risk minimization problem, which can be solved by exponential conic programming. We provide guarantees on the finite sample behavior of our DRL-Cox model. Moreover, through extensive simulations and real world case studies, we demonstrate that our regression model achieves superior performance in terms of prediction accuracy and robustness compared with traditional methods.
Article 1334
Title@2025-06-07 (6): Collapse-Proof Non-Contrastive Self-Supervised Learning
Title: Collapse-Proof Non-Contrastive Self-Supervised Learning | Kollaps-Proof des nicht kontrasiven Selbstüberwachten Lernens | 非封闭性自我监督学习的折叠结果 2410.04959v3 |
Authors (3): Emanuele Sansone, Tim Lebailly, Tinne Tuytelaars
We present a principled and simplified design of the projector and loss function for non-contrastive self-supervised learning based on hyperdimensional computing. We theoretically demonstrate that this design introduces an inductive bias that encourages representations to be simultaneously decorrelated and clustered, without explicitly enforcing these properties. This bias provably enhances generalization and suffices to avoid known training failure modes, such as representation, dimensional, cluster, and intracluster collapses. We validate our theoretical findings on image datasets, including SVHN, CIFAR-10, CIFAR-100, and ImageNet-100. Our approach effectively combines the strengths of feature decorrelation and cluster-based self-supervised learning methods, overcoming training failure modes while achieving strong generalization in clustering and linear classification tasks.
Article 1335
Title@2025-06-07 (6): Basis Transformers for Multi-Task Tabular Regression
Title: Basis Transformers for Multi-Task Tabular Regression | Basistransformatoren für Multi-Task-Tabellarische Regression | 多任务表格回归基础变换器 2506.06926v1 |
Authors (3): Wei Min Loh, Jiaqi Shang, Pascal Poupart
Dealing with tabular data is challenging due to partial information, noise, and heterogeneous structure. Existing techniques often struggle to simultaneously address key aspects of tabular data such as textual information, a variable number of columns, and unseen data without metadata besides column names. We propose a novel architecture, basis transformers, specifically designed to tackle these challenges while respecting inherent invariances in tabular data, including hierarchical structure and the representation of numeric values. We evaluate our design on a multi-task tabular regression benchmark, achieving an improvement of 0.338 in the median $R^2$ score and the lowest standard deviation across 34 tasks from the OpenML-CTR23 benchmark. Furthermore, our model has five times fewer parameters than the best-performing baseline and surpasses pretrained large language model baselines – even when initialized from randomized weights.
Article 1336
Title@2025-06-07 (6): The global convergence time of stochastic gradient descent in non-convex landscapes: Sharp estimates via large deviations
Title: The global convergence time of stochastic gradient descent in non-convex landscapes: Sharp estimates via large deviations | Die globale Konvergenzzeit stochastischer Gradientenabgänge in nicht konvexen Landschaften: Scharfe Schätzungen über große Abweichungen | 非沿海地貌中悬浮性梯度下降的全球趋同时间:因巨大偏差而得出的粗略估计 2503.16398v2 |
Authors (4): Waïss Azizian, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos
In this paper, we examine the time it takes for stochastic gradient descent (SGD) to reach the global minimum of a general, non-convex loss function. We approach this question through the lens of randomly perturbed dynamical systems and large deviations theory, and we provide a tight characterization of the global convergence time of SGD via matching upper and lower bounds. These bounds are dominated by the most “costly” set of obstacles that the algorithm may need to overcome in order to reach a global minimizer from a given initialization, coupling in this way the global geometry of the underlying loss landscape with the statistics of the noise entering the process. Finally, motivated by applications to the training of deep neural networks, we also provide a series of refinements and extensions of our analysis for loss functions with shallow local minima.
Article 1337
Title@2025-06-07 (6): Measurement to Meaning: A Validity-Centered Framework for AI Evaluation
Title: Measurement to Meaning: A Validity-Centered Framework for AI Evaluation | Messung zur Bedeutung: Ein gültigkeitszentrierter Rahmen für die AI-Bewertung | 衡量到意义:AI评价的有效性-中心框架 2505.10573v3 |
Authors (9): Olawale Salaudeen, Anka Reuel, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, Sanmi Koyejo
While the capabilities and utility of AI systems have advanced, rigorous norms for evaluating these systems have lagged. Grand claims, such as models achieving general reasoning capabilities, are supported with model performance on narrow benchmarks, like performance on graduate-level exam questions, which provide a limited and potentially misleading assessment. We provide a structured approach for reasoning about the types of evaluative claims that can be made given the available evidence. For instance, our framework helps determine whether performance on a mathematical benchmark is an indication of the ability to solve problems on math tests or instead indicates a broader ability to reason. Our framework is well-suited for the contemporary paradigm in machine learning, where various stakeholders provide measurements and evaluations that downstream users use to validate their claims and decisions. At the same time, our framework also informs the construction of evaluations designed to speak to the validity of the relevant claims. By leveraging psychometrics’ breakdown of validity, evaluations can prioritize the most critical facets for a given claim, improving empirical utility and decision-making efficacy. We illustrate our framework through detailed case studies of vision and language model evaluations, highlighting how explicitly considering validity strengthens the connection between evaluation evidence and the claims being made.
Article 1338
Title@2025-06-07 (6): Graph-Based Physics-Guided Urban PM2.5 Air Quality Imputation with Constrained Monitoring Data
Title: Graph-Based Physics-Guided Urban PM2.5 Air Quality Imputation with Constrained Monitoring Data | Graph-based Physics-Guided Urban PM2.5 Luftqualität Imputation mit eingeschränkten Überwachungsdaten | 基于图表的物理指导城市 PM2.5 空气质量与受控监测数据 2506.06917v1 |
Authors (5): Shangjie Du, Hui Wei, Dong Yoon Lee, Zhizhang Hu, Shijia Pan
This work introduces GraPhy, a graph-based, physics-guided learning framework for high-resolution and accurate air quality modeling in urban areas with limited monitoring data. Fine-grained air quality monitoring information is essential for reducing public exposure to pollutants. However, monitoring networks are often sparse in socioeconomically disadvantaged regions, limiting the accuracy and resolution of air quality modeling. To address this, we propose a physics-guided graph neural network architecture called GraPhy with layers and edge features designed specifically for low-resolution monitoring data. Experiments using data from California’s socioeconomically disadvantaged San Joaquin Valley show that GraPhy achieves the best overall performance as measured by mean squared error (MSE), mean absolute error (MAE), and R-squared value ($R^2$), improving performance by 9%-56% compared to various baseline models. Moreover, GraPhy consistently outperforms baselines across different spatial heterogeneity levels, demonstrating the effectiveness of our model design.
Article 1339
Title@2025-06-07 (6): Graph Neural Networks in Modern AI-aided Drug Discovery
Title: Graph Neural Networks in Modern AI-aided Drug Discovery | Graph Neuronale Netzwerke in der modernen KI-gestützten Drug Discovery | 现代人工智能辅助药物发现现代神经网络图 2506.06915v1 |
Authors (12): Odin Zhang, Haitao Lin, Xujun Zhang, Xiaorui Wang, Zhenxing Wu, Qing Ye, Weibo Zhao, Jike Wang, Kejun Ying, Yu Kang, Chang-yu Hsieh, Tingjun Hou
Graph neural networks (GNNs), as topology/structure-aware models within deep learning, have emerged as powerful tools for AI-aided drug discovery (AIDD). By directly operating on molecular graphs, GNNs offer an intuitive and expressive framework for learning the complex topological and geometric features of drug-like molecules, cementing their role in modern molecular modeling. This review provides a comprehensive overview of the methodological foundations and representative applications of GNNs in drug discovery, spanning tasks such as molecular property prediction, virtual screening, molecular generation, biomedical knowledge graph construction, and synthesis planning. Particular attention is given to recent methodological advances, including geometric GNNs, interpretable models, uncertainty quantification, scalable graph architectures, and graph generative frameworks. We also discuss how these models integrate with modern deep learning approaches, such as self-supervised learning, multi-task learning, meta-learning and pre-training. Throughout this review, we highlight the practical challenges and methodological bottlenecks encountered when applying GNNs to real-world drug discovery pipelines, and conclude with a discussion on future directions.
Article 1340
Title@2025-06-07 (6): Uncertainty Estimation on Graphs with Structure Informed Stochastic Partial Differential Equations
Title: Uncertainty Estimation on Graphs with Structure Informed Stochastic Partial Differential Equations | Ungewissheitsschätzung auf Graphen mit Struktur Informierte stochastische Partielle Differentialgleichungen | 具有结构、知情斯托卡部分差别的图图的不确定性估计 2506.06907v1 |
Authors (2): Fred Xu, Thomas Markovich
Graph Neural Networks have achieved impressive results across diverse network modeling tasks, but accurately estimating uncertainty on graphs remains difficult, especially under distributional shifts. Unlike traditional uncertainty estimation, graph-based uncertainty must account for randomness arising from both the graph’s structure and its label distribution, which adds complexity. In this paper, drawing an analogy between the evolution of a stochastic partial differential equation (SPDE) driven by a Matérn Gaussian process and message passing with GNN layers, we present a principled way to design a novel message passing scheme that incorporates spatial-temporal noise motivated by the Gaussian process approach to SPDEs. Our method simultaneously captures uncertainty across space and time and allows explicit control over the smoothness of the covariance kernel, thereby enhancing uncertainty estimates on graphs with both low and high label informativeness. Our extensive experiments on Out-of-Distribution (OOD) detection on graph datasets with varying label informativeness demonstrate the soundness of our model and its superiority over existing approaches.
Article 1341
Title@2025-06-07 (6): When Incentives Backfire, Data Stops Being Human
Title: When Incentives Backfire, Data Stops Being Human | Wenn Incentives Backfire, Daten aufhören, Mensch zu sein | 当鼓励反火时,数据停止成为人类 2502.07732v2 |
Authors (5): Sebastin Santy, Prasanta Bhattacharya, Manoel Horta Ribeiro, Kelsey Allen, Sewoong Oh
Progress in AI has relied on human-generated data, from annotator marketplaces to the wider Internet. However, the widespread use of large language models now threatens the quality and integrity of human-generated data on these very platforms. We argue that this issue goes beyond the immediate challenge of filtering AI-generated content – it reveals deeper flaws in how data collection systems are designed. Existing systems often prioritize speed, scale, and efficiency at the cost of intrinsic human motivation, leading to declining engagement and data quality. We propose that rethinking data collection systems to align with contributors’ intrinsic motivations – rather than relying solely on external incentives – can help sustain high-quality data sourcing at scale while maintaining contributor trust and long-term participation.
Article 1342
Title@2025-06-07 (6): Prediction-Powered Adaptive Shrinkage Estimation
Title: Prediction-Powered Adaptive Shrinkage Estimation | Vorhersage-Powered Adaptive Schrumpfung Schätzung | 预测力的适应性减缩估计值 2502.14166v2 |
Authors (2): Sida Li, Nikolaos Ignatiadis
Prediction-Powered Inference (PPI) is a powerful framework for enhancing statistical estimates by combining limited gold-standard data with machine learning (ML) predictions. While prior work has demonstrated PPI’s benefits for individual statistical problems, modern applications require answering numerous parallel statistical questions. We introduce Prediction-Powered Adaptive Shrinkage (PAS), a method that bridges PPI with empirical Bayes shrinkage to improve the estimation of multiple means. PAS debiases noisy ML predictions within each task and then borrows strength across tasks by using those same predictions as a reference point for shrinkage. The amount of shrinkage is determined by minimizing an unbiased estimate of risk, and we prove that this tuning strategy is asymptotically optimal. Experiments on both synthetic and real-world datasets show that PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications.
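As a rough illustration of the two ingredients named in this abstract, the sketch below computes a standard prediction-powered mean estimate (predictions on unlabeled data plus a bias correction from the labeled data) and then shrinks it toward the raw prediction mean. The function names and the fixed `weight` argument are assumptions made for illustration; in PAS the amount of shrinkage is tuned by minimizing an unbiased risk estimate, which is not reproduced here.

```python
from statistics import fmean


def ppi_mean(y_labeled, pred_labeled, pred_unlabeled):
    """Prediction-powered estimate of a mean.

    Averages the ML predictions on the unlabeled data, then adds a
    correction equal to the mean residual observed on the labeled data.
    """
    rectifier = fmean([y - f for y, f in zip(y_labeled, pred_labeled)])
    return fmean(pred_unlabeled) + rectifier


def shrink_toward(estimate, anchor, weight):
    """Convex combination of an estimate and a reference point.

    weight = 0 returns the estimate unchanged; weight = 1 returns the anchor.
    The weight here is a hypothetical fixed input, not a tuned quantity.
    """
    return (1.0 - weight) * estimate + weight * anchor


# Toy data for a single task (illustrative values only).
y_lab = [1.0, 2.0, 3.0]
f_lab = [1.5, 2.5, 3.5]          # predictions are biased upward by 0.5
f_unlab = [2.0, 3.0, 4.0, 5.0]

debiased = ppi_mean(y_lab, f_lab, f_unlab)
shrunk = shrink_toward(debiased, fmean(f_unlab), weight=0.25)
```

With many parallel tasks, one would call `ppi_mean` once per task and choose the shrinkage weight from the data rather than fixing it by hand.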
Article 1343
Title@2025-06-07 (6): An Adaptively Inexact Method for Bilevel Learning Using Primal-Dual Style Differentiation
Title: An Adaptively Inexact Method for Bilevel Learning Using Primal-Dual Style Differentiation | Eine adaptiv ungenaue Methode für das Zwei-Stufen-Lernen mit primär-dualer Stildifferenzierung | 使用纯-双式风格差异的双级学习适应性不精确方法 2412.06436v3 |
Authors (5): Lea Bogensperger, Matthias J. Ehrhardt, Thomas Pock, Mohammad Sadegh Salehi, Hok Shing Wong
We consider a bilevel learning framework for learning linear operators. In this framework, the learnable parameters are optimized via a loss function that also depends on the minimizer of a convex optimization problem (the lower-level problem). We utilize an iterative algorithm called 'piggyback' to compute the gradient of the loss and the minimizer of the lower-level problem. Given that the lower-level problem is solved numerically, the loss function and thus its gradient can only be computed inexactly. To estimate the accuracy of the computed hypergradient, we derive an a-posteriori error bound, which provides guidance for setting the tolerances of both the lower-level problem and the piggyback algorithm. To efficiently solve the upper-level optimization, we also propose an adaptive method for choosing a suitable step-size. To illustrate the proposed method, we consider a few learned regularizer problems, such as training an input-convex neural network.
Article 1344
Title@2025-06-07 (6): TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training
Title: TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training | TorchTitan: Ein-Stop PyTorch native Lösung für produktionsfertiges LLM-Vortraining | 火炬泰坦:当地一站式PyTorch当地生产即成LLM培训前的一站式PyTorch本地解决方案 2410.06511v3 |
Authors (13): Wanchao Liang, Tianyu Liu, Less Wright, Will Constable, Andrew Gu, Chien-Chin Huang, Iris Zhang, Wei Feng, Howard Huang, Junjie Wang, Sanket Purandare, Gokul Nadathur, Stratos Idreos
The development of large language models (LLMs) has been instrumental in advancing state-of-the-art natural language processing applications. Training LLMs with billions of parameters and trillions of tokens requires sophisticated distributed systems that enable composing and comparing several state-of-the-art techniques in order to efficiently scale across thousands of accelerators. However, existing solutions are complex, scattered across multiple libraries/repositories, lack interoperability, and are cumbersome to maintain. Thus, curating and empirically comparing training recipes require non-trivial engineering effort. This paper introduces TorchTitan, an open-source, PyTorch-native distributed training system that unifies state-of-the-art techniques, streamlining integration and reducing overhead. TorchTitan enables 3D parallelism in a modular manner with elastic scaling, providing comprehensive logging, checkpointing, and debugging tools for production-ready training. It also incorporates hardware-software co-designed solutions, leveraging features like Float8 training and SymmetricMemory. As a flexible test bed, TorchTitan facilitates custom recipe curation and comparison, allowing us to develop optimized training recipes for Llama 3.1 and provide guidance on selecting techniques for maximum efficiency based on our experiences. We thoroughly assess TorchTitan on the Llama 3.1 family of LLMs, spanning 8 billion to 405 billion parameters, and showcase its exceptional performance, modular composability, and elastic scalability. By stacking training optimizations, we demonstrate accelerations of 65.08% with 1D parallelism at the 128-GPU scale (Llama 3.1 8B), an additional 12.59% with 2D parallelism at the 256-GPU scale (Llama 3.1 70B), and an additional 30% with 3D parallelism at the 512-GPU scale (Llama 3.1 405B) on NVIDIA H100 GPUs over optimized baselines.
Article 1345
Title@2025-06-07 (6): NSD-Imagery: A benchmark dataset for extending fMRI vision decoding methods to mental imagery
Title: NSD-Imagery: A benchmark dataset for extending fMRI vision decoding methods to mental imagery | NSD-Imagery: Ein Benchmark-Datensatz zur Erweiterung von fMRI-Vision-Dekodierungsmethoden auf geistige Bilder | NSD- 想象力:将FMRI愿景解码方法扩展至精神图像的基准数据集 2506.06898v1 |
Authors (6): Reese Kneeland, Paul S. Scotti, Ghislain St-Yves, Jesse Breedlove, Kendrick Kay, Thomas Naselaris
We release NSD-Imagery, a benchmark dataset of human fMRI activity paired with mental images, to complement the existing Natural Scenes Dataset (NSD), a large-scale dataset of fMRI activity paired with seen images that enabled unprecedented improvements in fMRI-to-image reconstruction efforts. Recent models trained on NSD have been evaluated only on seen image reconstruction. Using NSD-Imagery, it is possible to assess how well these models perform on mental image reconstruction. This is a challenging generalization requirement because mental images are encoded in human brain activity with relatively lower signal-to-noise and spatial resolution; however, generalization from seen to mental imagery is critical for real-world applications in medical domains and brain-computer interfaces, where the desired information is always internally generated. We provide benchmarks for a suite of recent NSD-trained open-source visual decoding models (MindEye1, MindEye2, Brain Diffuser, iCNN, Takagi et al.) on NSD-Imagery, and show that the performance of decoding methods on mental images is largely decoupled from performance on vision reconstruction. We further demonstrate that architectural choices significantly impact cross-decoding performance: models employing simple linear decoding architectures and multimodal feature decoding generalize better to mental imagery, while complex architectures tend to overfit visual training data. Our findings indicate that mental imagery datasets are critical for the development of practical applications, and establish NSD-Imagery as a useful resource for better aligning visual decoding methods with this goal.
Article 1346
Title@2025-06-07 (6): Exploring Image Transforms derived from Eye Gaze Variables for Progressive Autism Diagnosis
Title: Exploring Image Transforms derived from Eye Gaze Variables for Progressive Autism Diagnosis | Erforschen von Bildtransformationen, abgeleitet von Eye Gaze-Variablen für die progressive Autismus-Diagnose | 探索从用于渐进自闭症诊断的眼盖变量中产生的图像变换 2506.09065v1 |
Authors (6): Abigail Copiaco, Christian Ritz, Yassine Himeur, Valsamma Eapen, Ammar Albanna, Wathiq Mansoor
The prevalence of Autism Spectrum Disorder (ASD) has surged rapidly over the past decade, posing significant challenges in communication, behavior, and focus for affected individuals. Current diagnostic techniques, though effective, are time-intensive, leading to high social and economic costs. This work introduces an AI-powered assistive technology designed to streamline ASD diagnosis and management, enhancing convenience for individuals with ASD and efficiency for caregivers and therapists. The system integrates transfer learning with image transforms derived from eye gaze variables to diagnose ASD. This facilitates and opens opportunities for in-home periodical diagnosis, reducing stress for individuals and caregivers, while also preserving user privacy through the use of image transforms. The accessibility of the proposed method also offers opportunities for improved communication between guardians and therapists, ensuring regular updates on progress and evolving support needs. Overall, the approach proposed in this work ensures timely, accessible diagnosis while protecting the subjects’ privacy, improving outcomes for individuals with ASD.
Article 1347
Title@2025-06-07 (6): LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
Title: LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models | LLM-SRBench: Ein neuer Benchmark für die Entdeckung wissenschaftlicher Gleichungen mit großen Sprachmodellen | LLM-SRBench:采用大语言模式科学等同发现新基准 2504.10415v2 |
Authors (6): Parshin Shojaee, Ngoc-Hieu Nguyen, Kazem Meidani, Amir Barati Farimani, Khoa D Doan, Chandan K Reddy
Scientific equation discovery is a fundamental task in the history of scientific progress, enabling the derivation of laws governing natural phenomena. Recently, Large Language Models (LLMs) have gained interest for this task due to their potential to leverage embedded scientific knowledge for hypothesis generation. However, evaluating the true discovery capabilities of these methods remains challenging, as existing benchmarks often rely on common equations that are susceptible to memorization by LLMs, leading to inflated performance metrics that do not reflect discovery. In this paper, we introduce LLM-SRBench, a comprehensive benchmark with 239 challenging problems across four scientific domains specifically designed to evaluate LLM-based scientific equation discovery methods while preventing trivial memorization. Our benchmark comprises two main categories: LSR-Transform, which transforms common physical models into less common mathematical representations to test reasoning beyond memorized forms, and LSR-Synth, which introduces synthetic, discovery-driven problems requiring data-driven reasoning. Through extensive evaluation of several state-of-the-art methods, using both open and closed LLMs, we find that the best-performing system so far achieves only 31.5% symbolic accuracy. These findings highlight the challenges of scientific equation discovery, positioning LLM-SRBench as a valuable resource for future research.
Article 1348
Title@2025-06-07 (6): Multimodal Integration of Longitudinal Noninvasive Diagnostics for Survival Prediction in Immunotherapy Using Deep Learning
Title: Multimodal Integration of Longitudinal Noninvasive Diagnostics for Survival Prediction in Immunotherapy Using Deep Learning | Multimodale Integration der langitudinalen Nichtinvasiven Diagnostik zur Überlebensvorhersage in der Immuntherapie mittels Deep Learning | 利用深层学习的免疫疗法中生存预测纵向非侵入性诊断的多模式综合 2411.18253v2 |
Authors (7): Melda Yeghaian, Zuhir Bodalal, Daan van den Broek, John B A G Haanen, Regina G H Beets-Tan, Stefano Trebeschi, Marcel A J van Gerven
Purpose: Immunotherapies have revolutionized the landscape of cancer treatments. However, our understanding of response patterns in advanced cancers treated with immunotherapy remains limited. By leveraging routinely collected noninvasive longitudinal and multimodal data with artificial intelligence, we could unlock the potential to transform immunotherapy for cancer patients, paving the way for personalized treatment approaches. Methods: In this study, we developed a novel artificial neural network architecture, multimodal transformer-based simple temporal attention (MMTSimTA) network, building upon a combination of recent successful developments. We integrated pre- and on-treatment blood measurements, prescribed medications and CT-based volumes of organs from a large pan-cancer cohort of 694 patients treated with immunotherapy to predict mortality at three, six, nine and twelve months. Different variants of our extended MMTSimTA network were implemented and compared to baseline methods incorporating intermediate and late fusion based integration methods. Results: The strongest prognostic performance was demonstrated using a variant of the MMTSimTA model with areas under the curve (AUCs) of $0.84 \pm 0.04$, $0.83 \pm 0.02$, $0.82 \pm 0.02$, and $0.81 \pm 0.03$ for 3-, 6-, 9-, and 12-month survival prediction, respectively. Discussion: Our findings show that integrating noninvasive longitudinal data using our novel architecture yields an improved multimodal prognostic performance, especially in short-term survival prediction. Conclusion: Our study demonstrates that multimodal longitudinal integration of noninvasive data using deep learning may offer a promising approach for personalized prognostication in immunotherapy-treated cancer patients.
Article 1349
Title@2025-06-07 (6): Scalable Gaussian Processes with Latent Kronecker Structure
Title: Scalable Gaussian Processes with Latent Kronecker Structure | Skalierbare Gaußprozesse mit latenter Kronecker-Struktur | 可缩放高斯进程 2506.06895v1 |
Authors (6): Jihao Andreas Lin, Sebastian Ament, Maximilian Balandat, David Eriksson, José Miguel Hernández-Lobato, Eytan Bakshy
Applying Gaussian processes (GPs) to very large datasets remains a challenge due to limited computational scalability. Matrix structures, such as the Kronecker product, can accelerate operations significantly, but their application commonly entails approximations or unrealistic assumptions. In particular, the most common path to creating a Kronecker-structured kernel matrix is by evaluating a product kernel on gridded inputs that can be expressed as a Cartesian product. However, this structure is lost if any observation is missing, breaking the Cartesian product structure, which frequently occurs in real-world data such as time series. To address this limitation, we propose leveraging latent Kronecker structure, by expressing the kernel matrix of observed values as the projection of a latent Kronecker product. In combination with iterative linear system solvers and pathwise conditioning, our method facilitates inference of exact GPs while requiring substantially fewer computational resources than standard iterative methods. We demonstrate that our method outperforms state-of-the-art sparse and variational GPs on real-world datasets with up to five million examples, including robotics, automated machine learning, and climate applications.
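The central idea in this abstract, writing the kernel matrix of the observed values as a projection P (K1 ⊗ K2) Pᵀ of a latent Kronecker product, can be illustrated with a small matrix-vector product that never forms the Kronecker product. This is only a sketch under assumptions: the helper names, the row-major grid indexing, and the toy matrices are hypothetical, and the iterative solvers and pathwise conditioning mentioned in the abstract are not shown.

```python
def matmul(A, B):
    """Dense matrix product for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def latent_kron_matvec(K1, K2, observed, v):
    """Compute P (K1 kron K2) P^T v without building the Kronecker product.

    observed: flat row-major indices of the observed cells on the n1 x n2 grid.
    v: one value per observed cell, in the same order as `observed`.
    """
    n1, n2 = len(K1), len(K2)
    # P^T v: scatter the observed values onto the full grid, zeros elsewhere.
    full = [0.0] * (n1 * n2)
    for idx, val in zip(observed, v):
        full[idx] = val
    X = [full[i * n2:(i + 1) * n2] for i in range(n1)]
    # Row-major identity: (K1 kron K2) vec(X) = vec(K1 X K2^T).
    Y = matmul(matmul(K1, X), transpose(K2))
    flat = [y for row in Y for y in row]
    # P: gather the entries at the observed cells.
    return [flat[idx] for idx in observed]


def dense_reference(K1, K2, observed, v):
    """Same product computed entry by entry, used only as a check."""
    n2 = len(K2)
    out = []
    for p in observed:
        i, k = divmod(p, n2)
        total = 0.0
        for q, val in zip(observed, v):
            j, l = divmod(q, n2)
            total += K1[i][j] * K2[k][l] * val
        out.append(total)
    return out


# Toy 2 x 3 grid with two cells missing (flat indices 1 and 4).
K1 = [[2.0, 1.0], [1.0, 2.0]]
K2 = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
observed = [0, 2, 3, 5]
v = [1.0, -1.0, 2.0, 0.5]

fast = latent_kron_matvec(K1, K2, observed, v)
slow = dense_reference(K1, K2, observed, v)
```

A product of this form is the basic operation an iterative linear solver would call repeatedly, which is why keeping the latent Kronecker structure matters even when observations are missing.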
Article 1350
Title@2025-06-07 (6): A Novel Neural Filter to Improve Accuracy of Neural Network Models of Dynamic Systems
Title: A Novel Neural Filter to Improve Accuracy of Neural Network Models of Dynamic Systems | Ein neuartiger neuraler Filter zur Verbesserung der Genauigkeit neuraler Netzwerkmodelle dynamischer Systeme | 提高动态系统神经网络模型准确性的新神经过滤器 2409.13654v2 |
Authors (3): Parham Oveissi, Turibius Rozario, Ankit Goel
The application of neural networks in modeling dynamic systems has become prominent due to their ability to estimate complex nonlinear functions. Despite their effectiveness, neural networks face challenges in long-term predictions, where the prediction error diverges over time, thus degrading their accuracy. This paper presents a neural filter to enhance the accuracy of long-term state predictions of neural network-based models of dynamic systems. Motivated by the extended Kalman filter, the neural filter combines the neural network state predictions with the measurements from the physical system to improve the estimated state’s accuracy. The neural filter’s improvements in prediction accuracy are demonstrated through applications to four nonlinear dynamical systems. Numerical experiments show that the neural filter significantly improves prediction accuracy and bounds the state estimate covariance, outperforming the neural network predictions. Furthermore, it is also shown that the accuracy of a poorly trained neural network model can be improved to the same level as that of an adequately trained neural network model, potentially decreasing the training cost and required data to train a neural network.
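To make the idea of combining a model's state prediction with a measurement concrete, here is the textbook scalar Kalman measurement update, which is the building block the extended Kalman filter generalizes. It is not the paper's neural filter: the function name, the assumption that the state is measured directly, and the numbers are all illustrative.

```python
def kalman_update(x_pred, p_pred, y_meas, r_meas):
    """Scalar Kalman measurement update for a directly measured state.

    x_pred, p_pred: predicted state and its variance (e.g. from a model).
    y_meas, r_meas: measurement and its noise variance.
    Returns the corrected state and its reduced variance.
    """
    gain = p_pred / (p_pred + r_meas)
    x_post = x_pred + gain * (y_meas - x_pred)
    p_post = (1.0 - gain) * p_pred
    return x_post, p_post


# Equal trust in prediction and measurement: the estimate lands halfway.
x_post, p_post = kalman_update(x_pred=1.0, p_pred=1.0, y_meas=2.0, r_meas=1.0)
```

In a filter built around a learned model, `x_pred` would come from the network's one-step prediction, and the corrected state would be fed back as the input for the next step.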
Article 1351
Title@2025-06-07 (6): Can In-Context Reinforcement Learning Recover From Reward Poisoning Attacks?
Title: Can In-Context Reinforcement Learning Recover From Reward Poisoning Attacks? | Kann In-Context-Verstärkung lernen, sich von Belohnungs-Vergiftungsangriffen zu erholen? | 文中加强学习能从受毒害袭击中恢复吗? 2506.06891v1 |
Authors (3): Paulius Sasnauskas, Yiğit Yalın, Goran Radanović
We study the corruption-robustness of in-context reinforcement learning (ICRL), focusing on the Decision-Pretrained Transformer (DPT, Lee et al., 2023). To address the challenge of reward poisoning attacks targeting the DPT, we propose a novel adversarial training framework, called Adversarially Trained Decision-Pretrained Transformer (AT-DPT). Our method simultaneously trains an attacker to minimize the true reward of the DPT by poisoning environment rewards, and a DPT model to infer optimal actions from the poisoned data. We evaluate the effectiveness of our approach against standard bandit algorithms, including robust baselines designed to handle reward contamination. Our results show that the proposed method significantly outperforms these baselines in bandit settings, under a learned attacker. We additionally evaluate AT-DPT on an adaptive attacker, and observe similar results. Furthermore, we extend our evaluation to the MDP setting, confirming that the robustness observed in bandit scenarios generalizes to more complex environments.
Article 1352
Title@2025-06-07 (6): FREE: Fast and Robust Vision Language Models with Early Exits
Title: FREE: Fast and Robust Vision Language Models with Early Exits | KOSTENLOS: Schnelle und robuste Vision Sprachmodelle mit frühen Ausgängen | FREE:有早期出入口的快速和强有力的愿景语言模式 2506.06884v1 |
Authors (2): Divya Jyoti Bajpai, Manjesh Kumar Hanawal
In recent years, Vision-Language Models (VLMs) have shown remarkable performance improvements in Vision-Language tasks. However, their large size poses challenges for real-world applications where inference latency is a concern. To tackle this issue, we propose employing Early Exit (EE) strategies in VLMs. However, training exit classifiers in VLMs is challenging, particularly with limited labeled training data. To address this, we introduce FREE, an adversarial training approach within a GAN-based framework. Here, each exit consists of a transformer layer and a classifier. The transformer layer is adversarially trained to produce feature representations similar to the final layer, while a feature classifier serves as the discriminator. Our method focuses on performing input-adaptive inference that increases inference speed with minimal drop in performance. Experimental results demonstrate the effectiveness of our approach in enhancing accuracy and model robustness by mitigating overthinking and the phenomenon of mid-crisis that we highlight. We experimentally validate that our method speeds up the inference process by more than 1.51x while retaining comparable performance. The source code is available at https://github.com/Div290/FREE.
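Input-adaptive inference with early exits can be sketched as a loop that stops at the first exit whose classifier is confident enough. The abstract does not state the exit criterion, so the softmax-confidence threshold used below is an assumption (a common choice for early-exit models), and the precomputed per-exit logits stand in for what a real model would compute layer by layer.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold):
    """Return (predicted class, index of the exit used).

    exit_logits: one list of class logits per exit, ordered shallow to deep.
    The last exit is always accepted, whatever its confidence.
    """
    last = len(exit_logits) - 1
    for depth, logits in enumerate(exit_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold or depth == last:
            return probs.index(confidence), depth


# Three exits: the first is unsure, the second is confident about class 1.
logits_per_exit = [[0.1, 0.2, 0.0], [0.0, 5.0, 0.0], [3.0, 0.0, 0.0]]
easy = early_exit_predict(logits_per_exit, threshold=0.9)
hard = early_exit_predict(logits_per_exit, threshold=0.999)
```

Raising the threshold pushes more inputs to deeper exits, which trades inference speed for the accuracy of the final layers.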
Article 1353
Title@2025-06-07 (6): LookAlike: Consistent Distractor Generation in Math MCQs
Title: LookAlike: Consistent Distractor Generation in Math MCQs | LookAlike: Konsistente Distraktor-Generation in Math MCQs | 类似外观: 在数学 MCQs 中, 一致的分隔器生成 2505.01903v2 |
Authors (5): Nisarg Parikh, Nigel Fernandez, Alexander Scarlatos, Simon Woodhead, Andrew Lan
Large language models (LLMs) are increasingly used to generate distractors for multiple-choice questions (MCQs), especially in domains like math education. However, existing approaches are limited in ensuring that the generated distractors are consistent with common student errors. We propose LookAlike, a method that improves error-distractor consistency via preference optimization. Our two main innovations are: (a) mining synthetic preference pairs from model inconsistencies, and (b) alternating supervised fine-tuning (SFT) with Direct Preference Optimization (DPO) to stabilize training. Unlike prior work that relies on heuristics or manually annotated preference data, LookAlike uses its own generation inconsistencies as dispreferred samples, thus enabling scalable and stable training. Evaluated on a real-world dataset of 1,400+ math MCQs, LookAlike achieves 51.6% accuracy in distractor generation and 57.2% in error generation under LLM-as-a-judge evaluation, outperforming an existing state-of-the-art method (45.6% / 47.7%). These improvements highlight the effectiveness of preference-based regularization and inconsistency mining for generating consistent math MCQ distractors at scale.
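For readers unfamiliar with Direct Preference Optimization, the per-pair loss it minimizes is short enough to write out. The sketch below is the standard DPO objective for a single preferred/dispreferred pair, given sequence log-probabilities under the policy and a frozen reference model; the argument names and the default `beta` are illustrative, and how LookAlike mines its pairs or alternates DPO with SFT is not shown.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the log-probability of a full response under the
    policy (logp_*) or the frozen reference model (ref_logp_*).
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) written as softplus(-margin), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: zero margin, loss equals log(2).
neutral = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Policy raises the chosen response relative to the reference: lower loss.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

In the setting described above, the dispreferred response would be one of the model's own inconsistent generations rather than a manually annotated sample.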
Article 1354
Title@2025-06-07 (6): Empowering COVID-19 Detection: Optimizing Performance Through Fine-Tuned EfficientNet Deep Learning Architecture
Title: Empowering COVID-19 Detection: Optimizing Performance Through Fine-Tuned EfficientNet Deep Learning Architecture | Empowering COVID-19 Detection: Optimierung der Leistung durch eine effiziente Net Deep Learning Architektur | 增强COVID-19探测能力:通过精巧高效网络深层学习架构优化业绩 2311.16593v2 |
Authors (5): Md. Alamin Talukder, Md. Abu Layek, Mohsin Kazi, Md Ashraf Uddin, Sunil Aryal
The worldwide COVID-19 pandemic has profoundly influenced the health and everyday experiences of individuals across the planet. It is a highly contagious respiratory disease requiring early and accurate detection to curb its rapid transmission. Initial testing methods primarily revolved around identifying the genetic composition of the coronavirus, exhibiting a relatively low detection rate and requiring a time-intensive procedure. To address this challenge, experts have suggested using radiological imagery, particularly chest X-rays, as a valuable approach within the diagnostic protocol. This study investigates the potential of leveraging radiographic imaging (X-rays) with deep learning algorithms to swiftly and precisely identify COVID-19 patients. The proposed approach improves detection accuracy by fine-tuning various established transfer learning models with appropriate layers. The experiments were conducted on a COVID-19 X-ray dataset containing 2000 images. The fine-tuned EfficientNetB4 model achieved an accuracy of 100% on this dataset, showcasing its potential as a robust COVID-19 detection model. Furthermore, EfficientNetB4 excelled at identifying lung disease on a chest X-ray dataset containing 4,350 images, achieving an accuracy of 99.17%, precision of 99.13%, recall of 99.16%, and F1-score of 99.14%. These results highlight the promise of fine-tuned transfer learning for efficient lung disease detection through medical imaging, especially with X-ray images. This research offers radiologists an effective means of aiding rapid and precise COVID-19 diagnosis and contributes valuable assistance for healthcare professionals in accurately identifying affected patients.
Article 1355
Title@2025-06-07 (6): Learnable Activation Functions in Physics-Informed Neural Networks for Solving Partial Differential Equations
Title: Learnable Activation Functions in Physics-Informed Neural Networks for Solving Partial Differential Equations | Lernbare Aktivierungsfunktionen in physikinformierten Neuronalen Netzwerken zur Lösung partieller Differentialgleichungen | 用于解决部分差异等量的物理内成神经网络中可学习的激活功能 2411.15111v3 |
Authors (2): Afrah Farea, Mustafa Serdar Celebi
We investigate learnable activation functions in Physics-Informed Neural Networks (PINNs) for solving Partial Differential Equations (PDEs): comparing traditional Multilayer Perceptrons (MLPs) with fixed and trainable activations against Kolmogorov-Arnold Networks (KANs) that employ learnable basis functions. While PINNs effectively incorporate physical laws into the learning process, they suffer from convergence and spectral bias problems, which limit their applicability to problems with rapid oscillations or sharp transitions. In this work, we study various activation and basis functions across diverse PDEs, including oscillatory, nonlinear wave, mixed-physics, and fluid dynamics problems. Using empirical Neural Tangent Kernel (NTK) analysis and Hessian eigenvalue decomposition, we assess convergence behavior, stability, and high-frequency approximation capacity. While KANs offer improved expressivity for capturing complex, high-frequency PDE solutions, they introduce new optimization challenges, especially in deeper networks. Our findings show that KANs face a curse of functional dimensionality, creating intractable optimization landscapes in deeper networks. Low spectral bias alone does not guarantee good performance; adaptive spectral bias approaches such as B-splines achieve optimal results by balancing global stability with local high-frequency resolution. Different PDE types require tailored strategies: smooth global activation functions excel for wave phenomena, while local adaptive activation functions suit problems with sharp transitions.
nan
Article 1356
Title@2025-06-07 (6): An Optimized Ensemble Deep Learning Model For Brain Tumor Classification
Title: An Optimized Ensemble Deep Learning Model For Brain Tumor Classification | Ein optimiertes Ensemble Deep Learning Modell für die Klassifizierung von Gehirntumoren | 大脑肿瘤分类最佳组合深层学习模型 2305.12844v3 |
Authors (3): Md. Alamin Talukder, Md. Manowarul Islam, Md Ashraf Uddin
Brain tumors present a grave risk to human life, demanding precise and timely diagnosis for effective treatment. Inaccurate identification of brain tumors can significantly diminish life expectancy, underscoring the critical need for precise diagnostic methods. Manual identification of brain tumors within vast Magnetic Resonance Imaging (MRI) image datasets is arduous and time-consuming. Thus, the development of a reliable deep learning (DL) model is essential to enhance diagnostic accuracy and ultimately save lives. This study introduces an innovative optimization-based deep ensemble approach employing transfer learning (TL) to efficiently classify brain tumors. Our methodology includes meticulous preprocessing, reconstruction of TL architectures, fine-tuning, and ensemble DL models utilizing weighted optimization techniques such as Genetic Algorithm-based Weight Optimization (GAWO) and Grid Search-based Weight Optimization (GSWO). Experimentation is conducted on the Figshare Contrast-Enhanced MRI (CE-MRI) brain tumor dataset, comprising 3064 images. Our approach achieves notable accuracy scores, with Xception, ResNet50V2, ResNet152V2, InceptionResNetV2, GAWO, and GSWO attaining 99.42%, 98.37%, 98.22%, 98.26%, 99.71%, and 99.76% accuracy, respectively. Notably, GSWO demonstrates superior accuracy, averaging 99.76% accuracy across five folds on the Figshare CE-MRI brain tumor dataset. The comparative analysis highlights the significant performance enhancement of our proposed model over existing counterparts. In conclusion, our optimized deep ensemble model exhibits exceptional accuracy in swiftly classifying brain tumors. Furthermore, it has the potential to assist neurologists and clinicians in making accurate and immediate diagnostic decisions.
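The weighted-ensemble idea behind a grid-search weight optimization can be illustrated with a minimal sketch. This is not the authors' GSWO implementation: the function names `weighted_ensemble` and `grid_search_weights`, the grid step, and the use of plain accuracy as the selection criterion are all assumptions made for illustration.

```python
import itertools
import numpy as np

def weighted_ensemble(probs, weights):
    """Weighted average of per-model class probabilities.

    probs:   array of shape (n_models, n_samples, n_classes)
    weights: array of shape (n_models,), assumed to sum to 1
    """
    return np.tensordot(weights, probs, axes=1)  # -> (n_samples, n_classes)

def grid_search_weights(probs, labels, step=0.25):
    """Try every weight vector on a coarse simplex grid (hypothetical helper)."""
    n_models = probs.shape[0]
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    best_weights, best_acc = None, -1.0
    for combo in itertools.product(grid, repeat=n_models):
        if not np.isclose(sum(combo), 1.0):
            continue  # keep only weight vectors that sum to 1
        w = np.array(combo)
        preds = weighted_ensemble(probs, w).argmax(axis=1)
        acc = float((preds == labels).mean())
        if acc > best_acc:
            best_weights, best_acc = w, acc
    return best_weights, best_acc

# Toy data: model 0 is always right, model 1 is always wrong.
labels = np.array([0, 1, 2])
toy_probs = np.stack([np.eye(3), np.roll(np.eye(3), 1, axis=1)])
best_w, best_acc = grid_search_weights(toy_probs, labels)
```

In practice the weights would be selected on held-out validation predictions rather than on the test fold, and the grid cost grows exponentially with the number of base models, which is one reason a genetic search is a natural alternative.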
nan
Article 1357
Title@2025-06-07 (6): Improved Sample Complexity for Private Nonsmooth Nonconvex Optimization
Title: Improved Sample Complexity for Private Nonsmooth Nonconvex Optimization | Verbesserte Probenkomplexität für private Nonsmooth Nonconvex-Optimierung | 改进私人非机动非康通非节流优化的样品复杂度 2410.05880v2 |
Authors (3): Guy Kornowski, Daogao Liu, Kunal Talwar
We study differentially private (DP) optimization algorithms for stochastic and empirical objectives which are neither smooth nor convex, and propose methods that return a Goldstein-stationary point with sample complexity bounds that improve on existing works. We start by providing a single-pass $(\epsilon,\delta)$-DP algorithm that returns an $(\alpha,\beta)$-stationary point as long as the dataset is of size $\widetilde{\Omega}(\sqrt{d}/\alpha\beta^{3}+d/\epsilon\alpha\beta^{2})$, which is $\Omega(\sqrt{d})$ times smaller than the algorithm of Zhang et al. [2024] for this task, where $d$ is the dimension. We then provide a multi-pass polynomial time algorithm which further improves the sample complexity to $\widetilde{\Omega}\left(d/\beta^2+d^{3/4}/\epsilon\alpha^{1/2}\beta^{3/2}\right)$, by designing a sample efficient ERM algorithm, and proving that Goldstein-stationary points generalize from the empirical loss to the population loss.
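For readers unfamiliar with the stationarity notion, the following uses the standard Goldstein convention, with $\alpha$ taken as the neighborhood radius and $\beta$ as the norm tolerance. The abstract does not spell out the definition, so this reading of the two parameters is an assumption. The Goldstein $\alpha$-subdifferential of $f$ at $x$ is

$$\partial_{\alpha} f(x) = \mathrm{conv}\Bigl(\bigcup_{y:\,\|y-x\|\le\alpha} \partial f(y)\Bigr),$$

and $x$ is an $(\alpha,\beta)$-stationary point when

$$\min_{g\in\partial_{\alpha} f(x)} \|g\| \le \beta.$$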
nan
Article 1358
Title@2025-06-07 (6): Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
Title: Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding | Weit und deutlich sehen: Halluzinationen in MLLMs mit Aufmerksamkeit abbauen | 远看远看得很清楚:在MLLLMS中减少幻觉,并注意固态解说 2505.16652v2 |
Authors (12): Feilong Tang, Chengzhi Liu, Zhongxing Xu, Ming Hu, Zelin Peng, Zhiwei Yang, Jionglong Su, Minquan Lin, Yifan Peng, Xuelian Cheng, Imran Razzak, Zongyuan Ge
Recent advancements in multimodal large language models (MLLMs) have significantly improved performance in visual question answering. However, they often suffer from hallucinations. In this work, hallucinations are categorized into two main types: initial hallucinations and snowball hallucinations. We argue that adequate contextual information can be extracted directly from the token interaction process. Inspired by causal inference in the decoding strategy, we propose to leverage causal masks to establish information propagation between multimodal tokens. The hypothesis is that insufficient interaction between those tokens may lead the model to rely on outlier tokens, overlooking dense and rich contextual cues. Therefore, we propose to intervene in the propagation process by tackling outlier tokens to enhance in-context inference. With this goal, we present FarSight, a versatile plug-and-play decoding strategy to reduce attention interference from outlier tokens merely by optimizing the causal mask. The heart of our method is effective token propagation. We design an attention register structure within the upper triangular matrix of the causal mask, dynamically allocating attention to capture attention diverted to outlier tokens. Moreover, a positional awareness encoding method with a diminishing masking rate is proposed, allowing the model to attend to further preceding tokens, especially for video sequence tasks. With extensive experiments, FarSight demonstrates significant hallucination-mitigating performance across different MLLMs on both image and video benchmarks, proving its effectiveness.
nan
Article 1359
Title@2025-06-07 (6): Learning Fluid-Structure Interaction Dynamics with Physics-Informed Neural Networks and Immersed Boundary Methods
Title: Learning Fluid-Structure Interaction Dynamics with Physics-Informed Neural Networks and Immersed Boundary Methods | Learning Fluid-Struktur-Interaktion Dynamik mit physikinformierten Neuronalen Netzwerken und eingetauchten Grenzmethoden | 与物理内成形神经网络和混合边界方法的互动动态 2505.18565v2 |
Authors (5): Afrah Farea, Saiful Khan, Reza Daryani, Emre Cenk Ersan, Mustafa Serdar Celebi
We introduce neural network architectures that combine physics-informed neural networks (PINNs) with the immersed boundary method (IBM) to solve fluid-structure interaction (FSI) problems. Our approach features two distinct architectures: a Single-FSI network with a unified parameter space, and an innovative Eulerian-Lagrangian network that maintains separate parameter spaces for fluid and structure domains. We study each architecture using standard Tanh and adaptive B-spline activation functions. Empirical studies on a 2D cavity flow problem involving a moving solid structure show that the Eulerian-Lagrangian architecture performs significantly better. The adaptive B-spline activation further enhances accuracy by providing locality-aware representation near boundaries. While our methodology shows promising results in predicting the velocity field, pressure recovery remains challenging due to the absence of explicit force-coupling constraints in the current formulation. Our findings underscore the importance of domain-specific architectural design and adaptive activation functions for modeling FSI problems within the PINN framework.
nan
Article 1360
Title@2025-06-07 (6): Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning
Title: Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning | Log-Sum-Exponential-Estimator für Off-Policy-Evaluierung und -Lernen | 用于非政策评价和学习的非政策评价和学习的日志- 2506.06873v1 |
Authors (6): Armin Behnamnia, Gholamali Aminian, Alireza Aghaei, Chengchun Shi, Vincent Y. F. Tan, Hamid R. Rabiee
Off-policy learning and evaluation leverage logged bandit feedback datasets, which contain context, action, propensity score, and feedback for each data point. These scenarios face significant challenges due to high variance and poor performance with low-quality propensity scores and heavy-tailed reward distributions. We address these issues by introducing a novel estimator based on the log-sum-exponential (LSE) operator, which outperforms traditional inverse propensity score estimators. Our LSE estimator demonstrates variance reduction and robustness under heavy-tailed conditions. For off-policy evaluation, we derive upper bounds on the estimator’s bias and variance. In the off-policy learning scenario, we establish bounds on the regret – the performance gap between our LSE estimator and the optimal policy – assuming bounded $(1+\epsilon)$-th moment of weighted reward. Notably, we achieve a convergence rate of $O(n^{-\epsilon/(1+ \epsilon)})$ for the regret bounds, where $\epsilon \in [0,1]$ and $n$ is the size of logged bandit feedback dataset. Theoretical analysis is complemented by comprehensive empirical evaluations in both off-policy learning and evaluation scenarios, confirming the practical advantages of our approach. The code for our estimator is available at the following link: https://github.com/armin-behnamnia/lse-offpolicy-learning.
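A minimal sketch of how a log-sum-exponential estimate differs from plain inverse propensity scoring (IPS). It assumes the estimator has the form $\frac{1}{\lambda}\log\bigl(\frac{1}{n}\sum_i e^{\lambda w_i r_i}\bigr)$ with $\lambda<0$ and importance weights $w_i$; the function names, the argument layout, and the default $\lambda$ are illustrative and are not taken from the linked repository.

```python
import numpy as np

def ips_estimate(rewards, target_probs, logging_probs):
    """Standard inverse propensity score estimate: mean of w_i * r_i."""
    w = target_probs / logging_probs
    return float(np.mean(w * rewards))

def lse_estimate(rewards, target_probs, logging_probs, lam=-1.0):
    """Assumed LSE form: (1/lam) * log(mean(exp(lam * w_i * r_i))), lam < 0."""
    if lam >= 0:
        raise ValueError("lam must be negative")
    z = lam * (target_probs / logging_probs) * rewards
    m = np.max(z)  # shift for a numerically stable log-mean-exp
    log_mean_exp = m + np.log(np.mean(np.exp(z - m)))
    return float(log_mean_exp / lam)

# Toy logged data with one very large importance weight (heavy tail).
rewards = np.array([1.0, 0.0, 1.0, 1.0])
target_probs = np.array([0.5, 0.5, 0.5, 0.9])
logging_probs = np.array([0.5, 0.5, 0.5, 0.01])
ips_value = ips_estimate(rewards, target_probs, logging_probs)
lse_value = lse_estimate(rewards, target_probs, logging_probs)
```

By Jensen's inequality the assumed form never exceeds the IPS estimate when $\lambda<0$, so a single huge weighted reward inflates it far less than it inflates the plain average.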
nan
Article 1361
Title@2025-06-07 (6): Watermark under Fire: A Robustness Evaluation of LLM Watermarking
Title: Watermark under Fire: A Robustness Evaluation of LLM Watermarking | Wasserzeichen unter Feuer: Eine Robustheitsbewertung von LLM Watermarking | 火灾下的水印:对LLM水标志的强力评价 2411.13425v3 |
Authors (5): Jiacheng Liang, Zian Wang, Lauren Hong, Shouling Ji, Ting Wang
Various watermarking methods ("watermarkers") have been proposed to identify LLM-generated texts; yet, due to the lack of unified evaluation platforms, many critical questions remain under-explored: i) What are the strengths/limitations of various watermarkers, especially their attack robustness? ii) How do various design choices impact their robustness? iii) How to optimally operate watermarkers in adversarial environments? To fill this gap, we systematize existing LLM watermarkers and watermark removal attacks, mapping out their design spaces. We then develop WaterPark, a unified platform that integrates 10 state-of-the-art watermarkers and 12 representative attacks. More importantly, by leveraging WaterPark, we conduct a comprehensive assessment of existing watermarkers, unveiling the impact of various design choices on their attack robustness. We further explore the best practices to operate watermarkers in adversarial environments. We believe our study sheds light on current LLM watermarking techniques while WaterPark serves as a valuable testbed to facilitate future research.
nan
Article 1362
Title@2025-06-07 (6): KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies
Title: KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies | KEA: Exploration durch proaktiv koordinierte Explorationsstrategien am Leben erhalten | KEA:通过主动协调的勘探战略使勘探活动保持活跃 2503.18234v2 |
Authors (4): Shih-Min Yang, Martin Magnusson, Johannes A. Stork, Todor Stoyanov
Soft Actor-Critic (SAC) has achieved notable success in continuous control tasks but struggles in sparse reward settings, where infrequent rewards make efficient exploration challenging. While novelty-based exploration methods address this issue by encouraging the agent to explore novel states, they are not trivial to apply to SAC. In particular, managing the interaction between novelty-based exploration and SAC’s stochastic policy can lead to inefficient exploration and redundant sample collection. In this paper, we propose KEA (Keeping Exploration Alive) which tackles the inefficiencies in balancing exploration strategies when combining SAC with novelty-based exploration. KEA integrates a novelty-augmented SAC with a standard SAC agent, proactively coordinated via a switching mechanism. This coordination allows the agent to maintain stochasticity in high-novelty regions, enhancing exploration efficiency and reducing repeated sample collection. We first analyze this potential issue in a 2D navigation task, and then evaluate KEA on the DeepSea hard-exploration benchmark as well as sparse reward control tasks from the DeepMind Control Suite. Compared to state-of-the-art novelty-based exploration baselines, our experiments show that KEA significantly improves learning efficiency and robustness in sparse reward setups.
nan
Article 1363
Title@2025-06-07 (6): SAFE: Finding Sparse and Flat Minima to Improve Pruning
Title: SAFE: Finding Sparse and Flat Minima to Improve Pruning | SAFE: Sparse und flache Minima finden, um Pruning zu verbessern | SAFE: 寻找粗糙和平坦的迷你小米来改善谨慎 2506.06866v1 |
Authors (4): Dongyeop Lee, Kwanhee Lee, Jinseok Chung, Namhoon Lee
Sparsifying neural networks often suffers from seemingly inevitable performance degradation, and it remains challenging to restore the original performance despite much recent progress. Motivated by recent studies in robust optimization, we aim to tackle this problem by finding subnetworks that are both sparse and flat at the same time. Specifically, we formulate pruning as a sparsity-constrained optimization problem where flatness is encouraged as an objective. We solve it explicitly via an augmented Lagrange dual approach and extend it further by proposing a generalized projection operation, resulting in novel pruning methods called SAFE and its extension, SAFE$^+$. Extensive evaluations on standard image classification and language modeling tasks reveal that SAFE consistently yields sparse networks with improved generalization performance, which compares competitively to well-established baselines. In addition, SAFE demonstrates resilience to noisy data, making it well-suited for real-world conditions.
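One way to write a sparsity-constrained problem with a flatness objective is the sharpness-aware form below. The symbols ($w$ for the weights, $f$ for the training loss, $\rho$ for the perturbation radius, $k$ for the number of nonzero weights allowed) are assumptions for illustration, and the abstract does not confirm that this is the exact formulation used:

$$\min_{w}\ \max_{\|\epsilon\|_2\le\rho} f(w+\epsilon) \quad \text{subject to} \quad \|w\|_0 \le k.$$

The inner maximization penalizes sharp minima, and the $\ell_0$ constraint enforces sparsity.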
nan
Article 1364
Title@2025-06-07 (6): Generalization and Robustness of the Tilted Empirical Risk
Title: Generalization and Robustness of the Tilted Empirical Risk | Verallgemeinerung und Robustheit des abgeschwächten Empirischen Risikos | 微薄经验风险的概括化和有力性 2409.19431v3 |
Authors (6): Gholamali Aminian, Amir R. Asadi, Tian Li, Ahmad Beirami, Gesine Reinert, Samuel N. Cohen
The generalization error (risk) of a supervised statistical learning algorithm quantifies its prediction ability on previously unseen data. Inspired by exponential tilting, Li et al. (2020) proposed the tilted empirical risk (TER) as a non-linear risk metric for machine learning applications such as classification and regression problems. In this work, we examine the generalization error of the tilted empirical risk in the robustness regime under negative tilt. Our first contribution is to provide uniform and information-theoretic bounds on the tilted generalization error, defined as the difference between the population risk and the tilted empirical risk, under negative tilt for unbounded loss functions with a bounded $(1+\epsilon)$-th moment for some $\epsilon\in(0,1]$, with a convergence rate of $O(n^{-\epsilon/(1+\epsilon)})$ where $n$ is the number of training samples, revealing a novel application for TER under no distribution shift. Secondly, we study the robustness of the tilted empirical risk with respect to noisy outliers at training time and provide theoretical guarantees under distribution shift for the tilted empirical risk. We empirically corroborate our findings in simple experimental setups where we evaluate our bounds to select the value of tilt in a data-driven manner.
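For reference, the tilted empirical risk of a hypothesis $h$ on samples $z_1,\dots,z_n$ with loss $\ell$ and tilt $\gamma\neq 0$ is usually written as follows (the symbol names are chosen here for illustration):

$$\hat{R}_{\gamma}(h) = \frac{1}{\gamma}\log\Bigl(\frac{1}{n}\sum_{i=1}^{n} e^{\gamma\,\ell(h, z_i)}\Bigr).$$

Negative tilt ($\gamma<0$) down-weights large losses, which is what makes the risk robust to outliers, and the ordinary empirical risk is recovered as $\gamma\to 0$. The tilted generalization error studied in the abstract is then the population risk $R(h)$ minus $\hat{R}_{\gamma}(h)$.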
nan
Article 1365
Title@2025-06-07 (6): Multimodal Spatial Language Maps for Robot Navigation and Manipulation
Title: Multimodal Spatial Language Maps for Robot Navigation and Manipulation | Multimodale räumliche Sprachkarten für Roboternavigation und -manipulation | 用于机器人导航和操纵的多模式空间语言地图 2506.06862v1 |
Authors (4): Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard
Grounding language to a navigating agent’s observations can leverage pretrained multimodal foundation models to match perceptions to object or event descriptions. However, previous approaches remain disconnected from environment mapping, lack the spatial precision of geometric maps, or neglect additional modality information beyond vision. To address this, we propose multimodal spatial language maps as a spatial map representation that fuses pretrained multimodal features with a 3D reconstruction of the environment. We build these maps autonomously using standard exploration. We present two instances of our maps, which are visual-language maps (VLMaps) and their extension to audio-visual-language maps (AVLMaps) obtained by adding audio information. When combined with large language models (LLMs), VLMaps can (i) translate natural language commands into open-vocabulary spatial goals (e.g., “in between the sofa and TV”) directly localized in the map, and (ii) be shared across different robot embodiments to generate tailored obstacle maps on demand. Building upon the capabilities above, AVLMaps extend VLMaps by introducing a unified 3D spatial representation integrating audio, visual, and language cues through the fusion of features from pretrained multimodal foundation models. This enables robots to ground multimodal goal queries (e.g., text, images, or audio snippets) to spatial locations for navigation. Additionally, the incorporation of diverse sensory inputs significantly enhances goal disambiguation in ambiguous environments. Experiments in simulation and real-world settings demonstrate that our multimodal spatial language maps enable zero-shot spatial and multimodal goal navigation and improve recall by 50% in ambiguous scenarios. These capabilities extend to mobile robots and tabletop manipulators, supporting navigation and interaction guided by visual, audio, and spatial cues.
nan
Article 1366
Title@2025-06-07 (6): Differentially Private Sparse Linear Regression with Heavy-tailed Responses
Title: Differentially Private Sparse Linear Regression with Heavy-tailed Responses | Unterschiedlich Private Sparse Lineare Regression mit stark gestaffelten Responses | 具有重尾量反应的 不同私人的粗线性递减 2506.06861v1 |
Authors (5): Xizhi Tian, Meng Ding, Touming Tao, Zihang Xiang, Di Wang
As a fundamental problem in machine learning and differential privacy (DP), DP linear regression has been extensively studied. However, most existing methods focus primarily on either regular data distributions or low-dimensional cases with irregular data. To address these limitations, this paper provides a comprehensive study of DP sparse linear regression with heavy-tailed responses in high-dimensional settings. In the first part, we introduce the DP-IHT-H method, which leverages the Huber loss and private iterative hard thresholding to achieve an estimation error bound of $\tilde{O}\Bigl( (s^*)^{\frac{1}{2}} \cdot \bigl(\frac{\log d}{n}\bigr)^{\frac{\zeta}{1 + \zeta}} + (s^*)^{\frac{1 + 2\zeta}{2 + 2\zeta}} \cdot \bigl(\frac{\log^2 d}{n \varepsilon}\bigr)^{\frac{\zeta}{1 + \zeta}} \Bigr)$ under the $(\varepsilon, \delta)$-DP model, where $n$ is the sample size, $d$ is the dimensionality, $s^*$ is the sparsity of the parameter, and $\zeta \in (0, 1]$ characterizes the tail heaviness of the data. In the second part, we propose DP-IHT-L, which further improves the error bound under additional assumptions on the response and achieves $\tilde{O}\Bigl(\frac{(s^*)^{3/2} \log d}{n \varepsilon}\Bigr)$. Compared to the first result, this bound is independent of the tail parameter $\zeta$. Finally, through experiments on synthetic and real-world datasets, we demonstrate that our methods outperform standard DP algorithms designed for "regular" data.
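The two non-private building blocks named in the abstract, Huber-loss gradients and iterative hard thresholding, can be sketched as below. This is deliberately not DP-IHT-H: the privacy mechanism (noise addition and private support selection) is omitted, and the function names, step size, and Huber threshold `tau` are assumptions chosen for illustration.

```python
import numpy as np

def huber_grad(residual, tau):
    """Derivative of the Huber loss w.r.t. the residual: clip to [-tau, tau]."""
    return np.clip(residual, -tau, tau)

def hard_threshold(theta, s):
    """Keep the s largest-magnitude entries of theta and zero the rest."""
    out = np.zeros_like(theta)
    if s > 0:
        keep = np.argsort(np.abs(theta))[-s:]
        out[keep] = theta[keep]
    return out

def huber_iht(X, y, s, tau=1.0, step=0.5, iters=200):
    """Non-private iterative hard thresholding on the Huber loss."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        residual = X @ theta - y
        grad = X.T @ huber_grad(residual, tau) / n  # clipped residuals bound outlier influence
        theta = hard_threshold(theta - step * grad, s)
    return theta

# Toy problem: 2-sparse signal in 10 dimensions, noiseless responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
theta_true = np.zeros(10)
theta_true[:2] = [2.0, -1.0]
theta_hat = huber_iht(X, X @ theta_true, s=2)
```

Clipping the residual is what limits the effect of heavy-tailed responses on each gradient step; a private variant would additionally have to bound and perturb each update.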
nan
Article 1367
Title@2025-06-07 (6): High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations
Title: High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations | High-Fidelity Scientific Simulation Surrogats über adaptive Implizite Neuraldarstellungen | 通过适应性隐隐性神经仪表显示的高纤维科学模拟模拟器 2506.06858v1 |
Authors (6): Ziwei Li, Yuhan Duan, Tianyu Xiong, Yi-Tang Chen, Wei-Lun Chao, Han-Wei Shen
Effective surrogate models are critical for accelerating scientific simulations. Implicit neural representations (INRs) offer a compact and continuous framework for modeling spatially structured data, but they often struggle with complex scientific fields exhibiting localized, high-frequency variations. Recent approaches address this by introducing additional features along rigid geometric structures (e.g., grids), but at the cost of flexibility and increased model size. In this paper, we propose a simple yet effective alternative: Feature-Adaptive INR (FA-INR). FA-INR leverages cross-attention to an augmented memory bank to learn flexible feature representations, enabling adaptive allocation of model capacity based on data characteristics, rather than rigid structural assumptions. To further improve scalability, we introduce a coordinate-guided mixture of experts (MoE) that enhances the specialization and efficiency of feature representations. Experiments on three large-scale ensemble simulation datasets show that FA-INR achieves state-of-the-art fidelity while significantly reducing model size, establishing a new trade-off frontier between accuracy and compactness for INR-based surrogates.
nan
Article 1368
Title@2025-06-07 (6): Free Random Projection for In-Context Reinforcement Learning
Title: Free Random Projection for In-Context Reinforcement Learning | Freie Zufallsprojektion für In-Context-Verstärkung | 内容强化学习自由随机投射 2504.06983v2 |
Authors (3): Tomohiro Hayase, Benoît Collins, Nakamasa Inoue
Hierarchical inductive biases are hypothesized to promote generalizable policies in reinforcement learning, as demonstrated by explicit hyperbolic latent representations and architectures. Therefore, a more flexible approach is to have these biases emerge naturally from the algorithm. We introduce Free Random Projection, an input mapping grounded in free probability theory that constructs random orthogonal matrices where hierarchical structure arises inherently. The free random projection integrates seamlessly into existing in-context reinforcement learning frameworks by encoding hierarchical organization within the input space without requiring explicit architectural modifications. Empirical results on multi-environment benchmarks show that free random projection consistently outperforms the standard random projection, leading to improvements in generalization. Furthermore, analyses within linearly solvable Markov decision processes and investigations of the spectrum of kernel random matrices reveal the theoretical underpinnings of free random projection’s enhanced performance, highlighting its capacity for effective adaptation in hierarchically structured state spaces.
nan
Article 1369
Title@2025-06-07 (6): JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data
Title: JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data | JESTR: Gemeinsame Einbettung von Weltraumtechnik für Ranking-Kandidatenmoleküle für die Annotation untargeted Metabolomics Data | JESTR: 联合嵌入空间技术,以评分非目标元元数据评分候选候选分子 2411.14464v3 |
Authors (4): Apurva Kalia, Yan Zhou Chen, Dilip Krishnan, Soha Hassoun
Motivation: A major challenge in metabolomics is annotation: assigning molecular structures to mass spectral fragmentation patterns. Despite recent advances in molecule-to-spectra and in spectra-to-molecular fingerprint prediction (FP), annotation rates remain low. Results: We introduce in this paper a novel paradigm (JESTR) for annotation. Unlike prior approaches that explicitly construct molecular fingerprints or spectra, JESTR leverages the insight that molecules and their corresponding spectra are views of the same data and effectively embeds their representations in a joint space. Candidate structures are ranked based on cosine similarity between the embeddings of query spectrum and each candidate. We evaluate JESTR against mol-to-spec and spec-to-FP annotation tools on three datasets. On average, for rank@[1-5], JESTR outperforms other tools by 23.6%-71.6%. We further demonstrate the strong value of regularization with candidate molecules during training, boosting rank@1 performance by 11.4% and enhancing the model’s ability to discern between target and candidate molecules. When comparing JESTR’s performance against that of publicly available pretrained models of SIRIUS and CFM-ID on appropriate subsets of MassSpecGym benchmark dataset, JESTR outperforms these tools by 31% and 238%, respectively. Through JESTR, we offer a novel promising avenue towards accurate annotation, therefore unlocking valuable insights into the metabolome.
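The ranking step described above, cosine similarity between a query-spectrum embedding and each candidate-molecule embedding, can be sketched as follows. The embeddings here are made-up vectors; how JESTR actually produces them is not shown, and `rank_candidates` is a hypothetical helper name.

```python
import numpy as np

def rank_candidates(query_emb, candidate_embs):
    """Return candidate indices sorted by descending cosine similarity.

    query_emb:      array of shape (dim,)
    candidate_embs: array of shape (n_candidates, dim)
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity of each candidate with the query
    order = np.argsort(-scores)
    return order, scores

# Toy joint space: candidate 1 points the same way as the query.
query = np.array([1.0, 0.0])
candidates = np.array([[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]])
order, scores = rank_candidates(query, candidates)
```

A rank@k metric then checks whether the true molecule's index appears among the first k entries of `order`.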
nan
Article 1370
Title@2025-06-07 (6): Curvature Enhanced Data Augmentation for Regression
Title: Curvature Enhanced Data Augmentation for Regression | Verstärkte Datenvergrößerung für Regression | 后回归加速度增强数据增强值 2506.06853v1 |
Authors (2): Ilya Kaufman Sirot, Omri Azencot
Deep learning models with a large number of parameters, often referred to as over-parameterized models, have achieved exceptional performance across various tasks. Despite concerns about overfitting, these models frequently generalize well to unseen data, thanks to effective regularization techniques, with data augmentation being among the most widely used. While data augmentation has shown great success in classification tasks using label-preserving transformations, its application in regression problems has received less attention. Recently, a novel manifold learning approach for generating synthetic data was proposed, utilizing a first-order approximation of the data manifold. Building on this foundation, we present a theoretical framework and practical tools for approximating and sampling general data manifolds. Furthermore, we introduce the Curvature-Enhanced Manifold Sampling (CEMS) method for regression tasks. CEMS leverages a second-order representation of the data manifold to enable efficient sampling and reconstruction of new data points. Extensive evaluations across multiple datasets and comparisons with state-of-the-art methods demonstrate that CEMS delivers superior performance in both in-distribution and out-of-distribution scenarios, while introducing only minimal computational overhead. Code is available at https://github.com/azencot-group/CEMS.
nan
Article 1371
Title@2025-06-07 (6): Active Preference Optimization for Sample Efficient RLHF
Title: Active Preference Optimization for Sample Efficient RLHF | Aktive Preference-Optimierung für Probeneffizienz RLHF | 抽样高效RLHF的主动首选最佳优化 2402.10500v3 |
Authors (4): Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury
Large Language Models (LLMs) aligned using Reinforcement Learning from Human Feedback (RLHF) have shown remarkable generation abilities in numerous tasks. However, collecting high-quality human preferences creates costly bottlenecks in practical deployments, and hence, training data are often budgeted. In these scenarios, it is crucial to collect training data (e.g., contexts, a pair of generations for each context, and a preference indicating which generation is better) carefully, yet most of the existing methods sample contexts uniformly at random from a given collection. Given this, under the Bradley-Terry-Luce preference model and with a small budget of training data, we show that uniform sampling of contexts could lead to a policy (i.e., an aligned model) that suffers a constant sub-optimality gap from the optimal policy. This highlights the need for an adaptive context sampling strategy for effective alignment under a small sample budget. To address this, we reformulate RLHF within the contextual preference bandit framework, treating generations as actions, and give a nearly complete characterization of the sub-optimality gap in terms of both lower and upper bounds. First, when the action set is a $d$-dimensional hypercube and the number of samples is $T$, we show an $\Omega(d/\sqrt{T})$ lower bound. Next, we propose an algorithm, Active Preference Optimization (APO), that iteratively collects preferences for the most uncertain contexts. We show that the sub-optimality gap of the policy learned via APO matches the lower bound up to a log factor and a non-linearity constant. Finally, we perform experiments on practical datasets to validate APO’s efficacy over existing methods, establishing it as a sample-efficient and cost-effective solution for LLM alignment.
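For reference, the Bradley-Terry-Luce model assumes that the probability of preferring generation $a$ over $a'$ in context $x$ depends only on the difference of their latent rewards. Writing $r$ for the reward function and $\sigma$ for the logistic function (notation chosen here, not taken from the abstract):

$$\Pr(a \succ a' \mid x) = \sigma\bigl(r(x,a) - r(x,a')\bigr) = \frac{e^{r(x,a)}}{e^{r(x,a)} + e^{r(x,a')}}.$$

The "non-linearity constant" in the upper bound plausibly relates to how flat $\sigma$ becomes for large reward gaps, although the abstract does not define it.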
nan
Article 1372
Title@2025-06-07 (6): A Statistical Framework for Model Selection in LSTM Networks
Title: A Statistical Framework for Model Selection in LSTM Networks | Ein statistischer Rahmen für die Modellauswahl in LSTM-Netzwerken | LSTM网络模式选择统计框架 2506.06840v1 |
Authors (1): Fahad Mostafa
Long Short-Term Memory (LSTM) neural network models have become the cornerstone for sequential data modeling in numerous applications, ranging from natural language processing to time series forecasting. Despite their success, the problem of model selection, including hyperparameter tuning, architecture specification, and regularization choice, remains largely heuristic and computationally expensive. In this paper, we propose a unified statistical framework for systematic model selection in LSTM networks. Our framework extends classical model selection ideas, such as information criteria and shrinkage estimation, to sequential neural networks. We define penalized likelihoods adapted to temporal structures, propose a generalized threshold approach for hidden state dynamics, and provide efficient estimation strategies using variational Bayes and approximate marginal likelihood methods. Several biomedical data-centric examples demonstrate the flexibility and improved performance of the proposed framework.
nan
Article 1373
Title@2025-06-07 (6): Harnessing Vision-Language Models for Time Series Anomaly Detection
Title: Harnessing Vision-Language Models for Time Series Anomaly Detection | Nutzung von Vision-Language-Modellen für Zeitreihenanomalienerkennung | 利用视觉语言模型进行时间序列异常探测 2506.06836v1 |
Authors (3): Zelin He, Sarah Alnegheimish, Matthew Reimherr
Time-series anomaly detection (TSAD) has played a vital role in a variety of fields, including healthcare, finance, and industrial monitoring. Prior methods, which mainly focus on training domain-specific models on numerical data, lack the visual-temporal reasoning capacity that human experts have to identify contextual anomalies. To fill this gap, we explore a solution based on vision language models (VLMs). Recent studies have shown the ability of VLMs for visual reasoning tasks, yet their direct application to time series has fallen short on both accuracy and efficiency. To harness the power of VLMs for TSAD, we propose a two-stage solution, with (1) ViT4TS, a vision-screening stage built on a relatively lightweight pretrained vision encoder, which leverages 2-D time-series representations to accurately localize candidate anomalies; (2) VLM4TS, a VLM-based stage that integrates global temporal context and VLM reasoning capacity to refine the detection upon the candidates provided by ViT4TS. We show that without any time-series training, VLM4TS outperforms time-series pretrained and from-scratch baselines in most cases, yielding a 24.6 percent improvement in F1-max score over the best baseline. Moreover, VLM4TS also consistently outperforms existing language-model-based TSAD methods and is on average 36 times more efficient in token usage.
nan
Article 1374
Title@2025-06-07 (6): LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models
Title: LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models | LLM-Attacken: Verbesserung der Closed-Loop Adversarial Scenario Generation für autonomes Fahren mit großen Sprachmodellen | LLM-攻击者:加强利用大语言模型自主驾驶的闭路反反向假想生成 2501.15850v2 |
Authors (4): Yuewen Mei, Tong Nie, Jian Sun, Ye Tian
Ensuring and improving the safety of autonomous driving systems (ADS) is crucial for the deployment of highly automated vehicles, especially in safety-critical events. Because such events are rare, adversarial scenario generation methods have been developed, in which the behaviors of traffic participants are manipulated to induce safety-critical events. However, existing methods still face two limitations. First, identification of the adversarial participant directly impacts the effectiveness of the generation, yet the complexity of real-world scenarios, with numerous participants and diverse behaviors, makes identification challenging. Second, the potential of generated safety-critical scenarios to continuously improve ADS performance remains underexplored. To address these issues, we propose LLM-attacker: a closed-loop adversarial scenario generation framework leveraging large language models (LLMs). Specifically, multiple LLM agents are designed and coordinated to identify optimal attackers. Then, the trajectories of the attackers are optimized to generate adversarial scenarios. These scenarios are iteratively refined based on the performance of ADS, forming a feedback loop to improve ADS. Experimental results show that LLM-attacker can create more dangerous scenarios than other methods, and an ADS trained with it achieves half the collision rate of one trained with normal scenarios. This indicates the ability of LLM-attacker to test and enhance the safety and robustness of ADS. Video demonstrations are provided at: https://drive.google.com/file/d/1Zv4V3iG7825oyiKbUwS2Y-rR0DQIE1ZA/view.
Article 1375
Title@2025-06-07 (6): FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-World LoRA
Title: FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-World LoRA | FedALT: Federated Fine-Tuning durch Adaptives lokales Training mit Rest-of-World LoRA | FedALT:通过与世界其余地区合作的适应性地方培训实现联邦退休 2503.11880v2 |
Authors (4): Jieming Bian, Lei Wang, Letian Zhang, Jie Xu
Fine-tuning large language models (LLMs) in federated settings enables privacy-preserving adaptation but suffers from cross-client interference due to model aggregation. Existing federated LoRA fine-tuning methods, primarily based on FedAvg, struggle with data heterogeneity, leading to harmful cross-client interference and suboptimal personalization. In this work, we propose FedALT, a novel personalized federated LoRA fine-tuning algorithm that fundamentally departs from FedAvg. Instead of using an aggregated model to initialize local training, each client continues training its individual LoRA while incorporating shared knowledge through a separate Rest-of-World (RoW) LoRA component. To effectively balance local adaptation and global information, FedALT introduces an adaptive mixer that dynamically learns input-specific weightings between the individual and RoW LoRA components, drawing conceptual foundations from the Mixture-of-Experts (MoE) paradigm. Through extensive experiments on NLP benchmarks, we demonstrate that FedALT significantly outperforms state-of-the-art personalized federated LoRA fine-tuning methods, achieving superior local adaptation without sacrificing computational efficiency.
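As a rough illustration of the mixing idea in this abstract (not the authors' implementation), the sketch below adds two low-rank updates, one individual and one Rest-of-World, to a frozen base layer and weights them with an input-dependent gate. The sigmoid gate, the convex-combination form, and the names `mixed_forward` and `gate_w` are assumptions made for this example.

```python
import math

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_delta(A, B, x):
    """Low-rank update B @ (A @ x), with A of shape (r, d) and B of shape (d_out, r)."""
    return matvec(B, matvec(A, x))

def mixed_forward(W0, individual, rest_of_world, gate_w, x):
    """Frozen base output plus an input-dependent blend of two LoRA updates.

    individual and rest_of_world are (A, B) pairs; gate_w is a hypothetical
    gating vector. Assumes moderate gate inputs (no overflow guard).
    """
    alpha = 1.0 / (1.0 + math.exp(-sum(g * v for g, v in zip(gate_w, x))))
    base = matvec(W0, x)
    d_ind = lora_delta(*individual, x)
    d_row = lora_delta(*rest_of_world, x)
    out = [b + alpha * i + (1.0 - alpha) * r for b, i, r in zip(base, d_ind, d_row)]
    return out, alpha
```

With a zero gate vector the two components are weighted equally; a trained gate would shift the weight per input.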
Article 1376
Title@2025-06-07 (6): The Currents of Conflict: Decomposing Conflict Trends with Gaussian Processes
Title: The Currents of Conflict: Decomposing Conflict Trends with Gaussian Processes | Die Strömungen des Konflikts: Konflikttrends mit gaussischen Prozessen zersetzen | 冲突流:与高斯进程冲突趋势的分解 2506.06828v1 |
Authors (1): Simon P. von der Maase
I present a novel approach to estimating the temporal and spatial patterns of violent conflict. I show how we can use highly temporally and spatially disaggregated data on conflict events in tandem with Gaussian processes to estimate temporospatial conflict trends. These trends can be studied to gain insight into conflict traps, diffusion and temporospatial conflict exposure in general; they can also be used to control for such phenomena in other estimation tasks; lastly, the approach allows us to extrapolate the estimated temporospatial conflict patterns into future temporal units, thus facilitating powerful, state-of-the-art conflict forecasts. Importantly, these results are achieved via a relatively parsimonious framework using only one data source: past conflict patterns.
Article 1377
Title@2025-06-07 (6): Data Shapley in One Training Run
Title: Data Shapley in One Training Run | Datenschädlichkeit in einem Trainingslauf | 一个培训运行中的数据形状 2406.11011v3 |
Authors (4): Jiachen T. Wang, Prateek Mittal, Dawn Song, Ruoxi Jia
Data Shapley provides a principled framework for attributing data’s contribution within machine learning contexts. However, existing approaches require re-training models on different data subsets, which is computationally intensive, foreclosing their application to large-scale models. Furthermore, they produce the same attribution score for any models produced by running the learning algorithm, meaning they cannot perform targeted attribution towards a specific model obtained from a single run of the algorithm. This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest. In its most efficient implementation, our technique incurs negligible additional runtime compared to standard model training. This dramatic efficiency improvement makes it possible to perform data attribution for the foundation model pretraining stage for the first time. We present several case studies that offer fresh insights into pretraining data’s contribution and discuss their implications for copyright in generative AI and pretraining data curation.
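To make the cost of the retraining-based formulation concrete, here is a minimal sketch of the classic exact Shapley computation, which averages each data point's marginal contribution over every ordering. This is the expensive baseline the abstract contrasts against, not In-Run Data Shapley itself; `toy_utility` and `WEIGHTS` are hypothetical stand-ins for "train a model on this subset and measure its performance".

```python
from itertools import permutations

def exact_shapley(players, utility):
    """Average each player's marginal contribution over all orderings."""
    values = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = utility(frozenset(coalition))
            coalition.add(p)
            values[p] += utility(frozenset(coalition)) - before
    return {p: v / len(orderings) for p, v in values.items()}

# Hypothetical utility: each point adds its weight; points 0 and 1 earn a bonus together.
WEIGHTS = {0: 1.0, 1: 2.0, 2: 3.0}

def toy_utility(subset):
    bonus = 2.0 if {0, 1} <= subset else 0.0
    return sum(WEIGHTS[p] for p in subset) + bonus

shapley = exact_shapley([0, 1, 2], toy_utility)
```

The number of orderings grows factorially with the dataset size, and in the real setting each utility call is a training run, which is why this form does not scale to large models.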
Article 1378
Title@2025-06-07 (6): ASPO: Constraint-Aware Bayesian Optimization for FPGA-based Soft Processors
Title: ASPO: Constraint-Aware Bayesian Optimization for FPGA-based Soft Processors | ASPO: Constraint-Aware Bayesian Optimierung für FPGA-basierte Softprozessoren | ASPO: 以FPGA为基础的软处理器的节压-Aware Bayesian优化 2506.06817v1 |
Authors (4): Haoran Wu, Ce Guo, Wayne Luk, Robert Mullins
Bayesian Optimization (BO) has shown promise in tuning processor design parameters. However, standard BO does not support constraints involving categorical parameters such as types of branch predictors and division circuits. In addition, optimization time of BO grows with processor complexity, which becomes increasingly significant especially for FPGA-based soft processors. This paper introduces ASPO, an approach that leverages disjunctive form to enable BO to handle constraints involving categorical parameters. Unlike existing methods that directly apply standard BO, the proposed ASPO method, for the first time, customizes the mathematical mechanism of BO to address challenges faced by soft-processor designs on FPGAs. Specifically, ASPO supports categorical parameters using a novel customized BO covariance kernel. It also accelerates the design evaluation procedure by penalizing the BO acquisition function with potential evaluation time and by reusing FPGA synthesis checkpoints from previously evaluated configurations. ASPO targets three soft processors: RocketChip, BOOM, and EL2 VeeR. The approach is evaluated based on seven RISC-V benchmarks. Results show that ASPO can reduce execution time for the "multiply" benchmark on the BOOM processor by up to 35% compared to the default configuration. Furthermore, it reduces design time for the BOOM processor by up to 74% compared to Boomerang, a state-of-the-art hardware-oriented BO approach.
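One common way to penalize an acquisition function by evaluation time is to divide expected improvement by the predicted cost of the evaluation. The sketch below shows that heuristic under a Gaussian posterior; the abstract does not state the exact penalty ASPO uses, so the division form and the name `time_penalized_ei` are assumptions.

```python
import math

def expected_improvement(mu, sigma, best):
    """Expected improvement for minimisation under a posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf

def time_penalized_ei(mu, sigma, best, predicted_seconds):
    """Improvement per unit of predicted evaluation time (assumed penalty form)."""
    return expected_improvement(mu, sigma, best) / predicted_seconds
```

Under this form, a candidate with slightly lower expected improvement can still be preferred when its synthesis run is predicted to finish much sooner.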
Article 1379
Title@2025-06-07 (6): Path Integral Optimiser: Global Optimisation via Neural Schrödinger-Föllmer Diffusion
Title: Path Integral Optimiser: Global Optimisation via Neural Schrödinger-Föllmer Diffusion | Path Integral Optimiser: Globale Optimierung über Neural Schrödinger-Föllmer Diffusion | 路径综合优化:通过Neural Schrödinger-Föllmer扩散实现全球优化 2506.06815v1 |
Authors (3): Max McGuinness, Eirik Fladmark, Francisco Vargas
We present an early investigation into the use of neural diffusion processes for global optimisation, focusing on Zhang et al.’s Path Integral Sampler. One can use the Boltzmann distribution to formulate optimization as solving a Schrödinger bridge sampling problem, then apply Girsanov’s theorem with a simple (single-point) prior to frame it in stochastic control terms, and compute the solution’s integral terms via a neural approximation (a Fourier MLP). We provide theoretical bounds for this optimiser, results on toy optimisation tasks, and a summary of the stochastic theory motivating the model. Ultimately, we found the optimiser to display promising per-step performance at optimisation tasks between 2 and 1,247 dimensions, but struggle to explore higher-dimensional spaces when faced with a 15.9k parameter model, indicating a need for work on adaptation in such environments.
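For reference, the Boltzmann distribution that turns minimisation into sampling is usually written as below. The symbols $f$ (objective), $\sigma$ (temperature) and $Z_\sigma$ (normalising constant, assumed finite) are notation chosen for this note rather than taken from the paper.

```latex
\pi_\sigma(x) \;=\; \frac{\exp\!\big(-f(x)/\sigma\big)}{Z_\sigma},
\qquad
Z_\sigma \;=\; \int \exp\!\big(-f(y)/\sigma\big)\,\mathrm{d}y .
```

As $\sigma \to 0$ the mass of $\pi_\sigma$ concentrates around the global minimisers of $f$, so drawing samples from $\pi_\sigma$ at a low temperature approximates solving the optimisation problem.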
Article 1380
Title@2025-06-07 (6): Homophily-Driven Sanitation View for Robust Graph Contrastive Learning
Title: Homophily-Driven Sanitation View for Robust Graph Contrastive Learning | Homophil-getriebene Sanitäransicht für robustes Graph Kontrastives Lernen | 强力图表对比学习的光极- 驱动光学卫生视图 2307.12555v2 |
Authors (4): Yulin Zhu, Xing Ai, Yevgeniy Vorobeychik, Kai Zhou
We investigate adversarial robustness of unsupervised Graph Contrastive Learning (GCL) against structural attacks. First, we provide a comprehensive empirical and theoretical analysis of existing attacks, revealing how and why they degrade the performance of GCL. Inspired by our analytic results, we present a robust GCL framework that integrates a homophily-driven sanitation view, which can be learned jointly with contrastive learning. A key challenge this poses, however, is the non-differentiable nature of the sanitation objective. To address this challenge, we propose a series of techniques to enable gradient-based end-to-end robust GCL. Moreover, we develop a fully unsupervised hyperparameter tuning method which, unlike prior approaches, does not require knowledge of node labels. We conduct extensive experiments to evaluate the performance of our proposed model, GCHS (Graph Contrastive Learning with Homophily-driven Sanitation View), against two state-of-the-art structural attacks on GCL. Our results demonstrate that GCHS consistently outperforms all state-of-the-art baselines in terms of the quality of generated node embeddings as well as performance on two important downstream tasks.
Article 1381
Title@2025-06-07 (6): Efficient Parallelization of Message Passing Neural Network Potentials for Large-scale Molecular Dynamics
Title: Efficient Parallelization of Message Passing Neural Network Potentials for Large-scale Molecular Dynamics | Effiziente Parallelisierung von Message-Passing-Neural Network-Potenziale für großmaßstäbliche molekulare Dynamiken | 大型分子动态信息传递神经网络潜力的有效平行 2505.06711v3 |
Authors (2): Junfan Xia, Bin Jiang
Machine learning potentials have achieved great success in accelerating atomistic simulations. Many of them rely on atom-centered local descriptors and are therefore naturally suited to parallelization. More recent message passing neural network (MPNN) models have demonstrated superior accuracy and become increasingly popular. However, efficiently parallelizing MPNN models across multiple nodes remains challenging, limiting their practical applications in large-scale simulations. Here, we propose an efficient parallel algorithm for MPNN models, in which additional data communication is minimized among local atoms only in each MP layer without redundant computation, thus scaling linearly with the layer number. Integrated with our recursively embedded atom neural network model, this algorithm demonstrates excellent strong scaling and weak scaling behaviors in several benchmark systems. This approach enables massive molecular dynamics simulations with MPNN models as fast as with strictly local models for over 100 million atoms, vastly extending the applicability of the MPNN potential to an unprecedented scale. This general parallelization framework can empower various MPNN models to efficiently simulate very large and complex systems.
Article 1382
Title@2025-06-07 (6): Fast Causal Discovery by Approximate Kernel-based Generalized Score Functions with Linear Computational Complexity
Title: Fast Causal Discovery by Approximate Kernel-based Generalized Score Functions with Linear Computational Complexity | Schnelle kausale Entdeckung durch ungefähre Kernel-basierte generalisierte Score-Funktionen mit linearer Computational Complexity | 以近端内核为基础的通用计分函数及线性比较复杂度 2412.17717v2 |
Authors (6): Yixin Ren, Haocheng Zhang, Yewei Xia, Hao Zhang, Jihong Guan, Shuigeng Zhou
Score-based causal discovery methods can effectively identify causal relationships by evaluating candidate graphs and selecting the one with the highest score. One popular class of scores is kernel-based generalized score functions, which can adapt to a wide range of scenarios and work well in practice because they circumvent assumptions about causal mechanisms and data distributions. Despite these advantages, kernel-based generalized score functions pose serious computational challenges in time and space, with a time complexity of $\mathcal{O}(n^3)$ and a memory complexity of $\mathcal{O}(n^2)$, where $n$ is the sample size. In this paper, we propose an approximate kernel-based generalized score function with $\mathcal{O}(n)$ time and space complexities by using a low-rank technique and designing a set of rules to handle the complex composite matrix operations required to calculate the score. We also develop sampling algorithms that handle different data types efficiently. Our extensive causal discovery experiments on both synthetic and real-world data demonstrate that, compared to the state-of-the-art method, our method not only significantly reduces computational costs but also achieves comparable accuracy, especially for large datasets.
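The abstract does not say which low-rank technique is used, so the sketch below swaps in random Fourier features purely to illustrate why a low-rank factorisation removes the $n \times n$ kernel matrix: each sample is mapped to a fixed number of features whose inner products approximate an RBF kernel. The function names, the one-dimensional inputs, and the feature count are assumptions for this example.

```python
import math
import random

def make_feature_map(num_features, bandwidth, seed=0):
    """Random Fourier features whose inner product approximates an RBF kernel."""
    rng = random.Random(seed)
    freqs = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(w * x + b) for w, b in zip(freqs, phases)]

    return features

def rbf_kernel(x, y, bandwidth):
    """Exact RBF kernel exp(-(x - y)^2 / (2 * bandwidth^2)) for scalar inputs."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

features = make_feature_map(num_features=2000, bandwidth=1.0)
zx, zy = features(0.3), features(1.1)
approx = sum(a * b for a, b in zip(zx, zy))   # low-rank estimate of k(0.3, 1.1)
exact = rbf_kernel(0.3, 1.1, bandwidth=1.0)
```

With $D$ features, the data are summarised by an $n \times D$ matrix $Z$, and products with the kernel matrix can be computed as $Z(Z^\top v)$ in $\mathcal{O}(nD)$ time and memory instead of forming $K$ explicitly.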
Article 1383
Title@2025-06-07 (6): IMPA-HGAE:Intra-Meta-Path Augmented Heterogeneous Graph Autoencoder
Title: IMPA-HGAE:Intra-Meta-Path Augmented Heterogeneous Graph Autoencoder | IMPA-HGAE:Intra-Meta-Path Augmented Heterogene Graph Autoencoder | IMPA-HGAE: 甲基板内增增异异异形图形自动编码器 2506.06809v1 |
Authors (4): Di Lin, Wanjing Ren, Xuanbin Li, Rui Zhang
Self-supervised learning (SSL) methods have been increasingly applied to diverse downstream tasks due to their superior generalization capabilities and low annotation costs. However, most existing heterogeneous graph SSL models convert heterogeneous graphs into homogeneous ones via meta-paths for training, which only leverages information from nodes at both ends of meta-paths while underutilizing the heterogeneous node information along the meta-paths. To address this limitation, this paper proposes a novel framework named IMPA-HGAE to enhance target node embeddings by fully exploiting internal node information along meta-paths. Experimental results validate that IMPA-HGAE achieves superior performance on heterogeneous datasets. Furthermore, this paper introduces innovative masking strategies to strengthen the representational capacity of generative SSL models on heterogeneous graph data. Additionally, this paper discusses the interpretability of the proposed method and potential future directions for generative self-supervised learning in heterogeneous graphs. This work provides insights into leveraging meta-path-guided structural semantics for robust representation learning in complex graph scenarios.
Article 1384
Title@2025-06-07 (6): Label-semantics Aware Generative Approach for Domain-Agnostic Multilabel Classification
Title: Label-semantics Aware Generative Approach for Domain-Agnostic Multilabel Classification | Label-Semantik Aware Generativer Ansatz für Domain-Agnostic Multilabel-Klassifikation | 域-不可知性多标签分类的认知生成方法 2506.06806v1 |
Authors (5): Subhendu Khatuya, Shashwat Naidu, Saptarshi Ghosh, Pawan Goyal, Niloy Ganguly
The explosion of textual data has made manual document classification increasingly challenging. To address this, we introduce a robust, efficient domain-agnostic generative model framework for multi-label text classification. Instead of treating labels as mere atomic symbols, our approach utilizes predefined label descriptions and is trained to generate these descriptions based on the input text. During inference, the generated descriptions are matched to the pre-defined labels using a finetuned sentence transformer. We integrate this with a dual-objective loss function, combining cross-entropy loss and cosine similarity of the generated sentences with the predefined target descriptions, ensuring both semantic alignment and accuracy. Our proposed model LAGAMC stands out for its parameter efficiency and versatility across diverse datasets, making it well-suited for practical applications. We demonstrate the effectiveness of our proposed model by achieving new state-of-the-art performances across all evaluated datasets, surpassing several strong baselines. We achieve improvements of 13.94% in Micro-F1 and 24.85% in Macro-F1 compared to the closest baseline across all datasets.
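A dual-objective loss of the kind described here can be sketched as token-level cross-entropy plus a penalty that shrinks as the generated sentence embedding aligns with the target description embedding. The `1 - cosine` penalty and the `weight` parameter are assumptions; the abstract only states that cross-entropy and cosine similarity are combined.

```python
import math

def cross_entropy(target_token_probs):
    """Mean negative log-likelihood assigned to the reference tokens."""
    return -sum(math.log(p) for p in target_token_probs) / len(target_token_probs)

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def dual_objective_loss(target_token_probs, generated_emb, target_emb, weight=0.5):
    """Cross-entropy plus a weighted (1 - cosine) alignment penalty (assumed form)."""
    similarity_penalty = 1.0 - cosine_similarity(generated_emb, target_emb)
    return cross_entropy(target_token_probs) + weight * similarity_penalty
```

The loss is zero only when the model puts full probability on the reference tokens and the two embeddings point in the same direction.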
Article 1385
Title@2025-06-07 (6): Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs
Title: Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs | Dialog ohne Grenzen: Konstant-Sized KV-Caches für erweiterte Reaktionen in LLMs | 无限制对话框: 普通 KV 宽度响应库( LLM ) 2503.00979v2 |
Authors (5): Ravi Ghadia, Avinash Kumar, Gaurav Jain, Prashant Nair, Poulami Das
Autoregressive Transformers rely on Key-Value (KV) caching to accelerate inference. However, the linear growth of the KV cache with context length leads to excessive memory consumption and bandwidth constraints. This bottleneck is particularly problematic in real-time applications – such as chatbots and interactive assistants – where low latency and high memory efficiency are critical. Existing methods drop distant tokens or compress states in a lossy manner, sacrificing accuracy by discarding vital context or introducing bias. We propose MorphKV, an inference-time technique that maintains a constant-sized KV cache while preserving accuracy. MorphKV balances long-range dependencies and local coherence during text generation. It eliminates early-token bias while retaining high-fidelity context by adaptively ranking tokens through correlation-aware selection. Unlike heuristic retention or lossy compression, MorphKV iteratively refines the KV cache via lightweight updates guided by attention patterns of recent tokens. This approach captures inter-token correlation with greater accuracy, crucial for tasks like content creation and code generation. Our studies on long-response tasks show 52.9% memory savings and 18.2% higher accuracy on average compared to state-of-the-art prior works, enabling efficient real-world deployment.
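One plausible reading of "a constant-sized cache guided by attention patterns of recent tokens" is sketched below: always keep a window of the most recent tokens, then fill the remaining budget with the older tokens those recent queries attend to most. The selection rule, the summed-attention score, and the names `select_kv_indices`, `window` and `budget` are assumptions, not MorphKV's actual algorithm.

```python
def select_kv_indices(recent_attention, num_tokens, window, budget):
    """Pick which token positions stay in a fixed-size KV cache.

    recent_attention[i][j] is the attention weight that the i-th recent query
    places on token j. Assumes budget >= window.
    """
    first_recent = max(0, num_tokens - window)
    recent = list(range(first_recent, num_tokens))
    older = range(first_recent)
    # Score each older token by the total attention it receives from recent queries.
    score = {j: sum(row[j] for row in recent_attention) for j in older}
    ranked = sorted(older, key=lambda j: score[j], reverse=True)
    kept_old = ranked[:budget - len(recent)]
    return sorted(kept_old) + recent

attention = [
    [0.10, 0.50, 0.00, 0.05, 0.05, 0.10, 0.10, 0.10],
    [0.00, 0.30, 0.00, 0.40, 0.10, 0.10, 0.05, 0.05],
]
kept = select_kv_indices(attention, num_tokens=8, window=2, budget=4)
```

Re-running the selection after each generated token keeps the cache at `budget` entries regardless of how long the response grows.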
Article 1386
Title@2025-06-07 (6): Prediction-Enhanced Monte Carlo: A Machine Learning View on Control Variate
Title: Prediction-Enhanced Monte Carlo: A Machine Learning View on Control Variate | Vorhersage-erweitert Monte Carlo: Eine Machine-Learning-Ansicht auf Steuerungsvariate | 预测增强的蒙特卡洛:关于控制Variatte的机械学习观点 2412.11257v3 |
Authors (10): Fengpei Li, Haoxian Chen, Jiahe Lin, Arkin Gupta, Xiaowei Tan, Honglei Zhao, Gang Xu, Yuriy Nevmyvaka, Agostino Capponi, Henry Lam
For many complex simulation tasks spanning areas such as healthcare, engineering, and finance, Monte Carlo (MC) methods are invaluable due to their unbiased estimates and precise error quantification. Nevertheless, Monte Carlo simulations often become computationally prohibitive, especially for nested, multi-level, or path-dependent evaluations lacking effective variance reduction techniques. While machine learning (ML) surrogates appear as natural alternatives, naive replacements typically introduce unquantifiable biases. We address this challenge by introducing Prediction-Enhanced Monte Carlo (PEMC), a framework that leverages modern ML models as learned predictors, using cheap and parallelizable simulation as features, to output unbiased evaluation with reduced variance and runtime. PEMC can also be viewed as a “modernized” view of control variates, where we consider the overall computation-cost-aware variance reduction instead of per-replication reduction, while bypassing the closed-form mean function requirement and maintaining the advantageous unbiasedness and uncertainty quantifiability of Monte Carlo. We illustrate PEMC’s broader efficacy and versatility through three examples: first, equity derivatives such as variance swaps under stochastic local volatility models; second, interest rate derivatives such as swaption pricing under the Heath-Jarrow-Morton (HJM) interest-rate model. Finally, we showcase PEMC in a socially significant context - ambulance dispatch and hospital load balancing - where accurate mortality rate estimates are key for ethically sensitive decision-making. Across these diverse scenarios, PEMC consistently reduces variance while preserving unbiasedness, highlighting its potential as a powerful enhancement to standard Monte Carlo baselines.
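The idea of using a learned predictor as a control variate without a closed-form mean can be sketched with a two-sample estimator: average the residual between the expensive simulation and the prediction on a small paired sample, then add the predictor's mean estimated from many cheap samples. This is a simplified illustration rather than the authors' implementation; `predictor`, `expensive_simulation`, and the toy distribution are hypothetical.

```python
import random
import statistics

def prediction_enhanced_estimate(paired_samples, cheap_features, predictor):
    """Mean residual on paired samples plus the predictor's mean on cheap samples."""
    residual = statistics.fmean(y - predictor(x) for x, y in paired_samples)
    baseline = statistics.fmean(predictor(x) for x in cheap_features)
    return residual + baseline

rng = random.Random(0)

def predictor(x):
    """Stands in for an ML model trained beforehand on separate data."""
    return x * x

def expensive_simulation(x):
    """Toy 'expensive' target whose true mean is 1/3 for x ~ Uniform(0, 1)."""
    return x * x + rng.gauss(0.0, 0.1)

paired = [(x, expensive_simulation(x)) for x in (rng.random() for _ in range(200))]
cheap = [rng.random() for _ in range(20000)]
estimate = prediction_enhanced_estimate(paired, cheap, predictor)
```

The estimator stays unbiased for any fixed predictor, provided the cheap features follow the same distribution as the paired ones; a better predictor only shrinks the variance of the residual term.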
Article 1387
Title@2025-06-07 (6): EasyFS: an Efficient Model-free Feature Selection Framework via Elastic Transformation of Features
Title: EasyFS: an Efficient Model-free Feature Selection Framework via Elastic Transformation of Features | EasyFS: Ein effizientes modellfreies Feature Selection Framework über die elastische Transformation von Features | EasyFS:通过地物的弹性变异,建立一个高效的无示范模式的地物选择框架 2402.05954v2 |
Authors (4): Jianming Lv, Sijun Xia, Depin Liang, Wei Chen
Traditional model-free feature selection methods treat each feature independently while disregarding the interrelationships among features, which leads to relatively poor performance compared with model-aware methods. To address this challenge, we propose an efficient model-free feature selection framework via elastic expansion and compression of the features, namely EasyFS, to achieve better performance than state-of-the-art model-aware methods while sharing the efficiency and flexibility of existing model-free methods. In particular, EasyFS expands the feature space by using a random non-linear projection network to form non-linear combinations of the original features, so as to model the interrelationships among the features and discover the most correlated features. Meanwhile, a novel redundancy measurement based on the change of coding rate is proposed for efficient filtering of redundant features. Comprehensive experiments on 21 different datasets show that EasyFS outperforms state-of-the-art methods by up to 10.9% in the regression tasks and 5.7% in the classification tasks while saving more than 94% of the time.
Article 1388
Title@2025-06-07 (6): Refining Adaptive Zeroth-Order Optimization at Ease
Title: Refining Adaptive Zeroth-Order Optimization at Ease | Verfeinerung adaptiver Zeroth-Order-Optimierung bei Leichtigkeit | Ease 的适应性零极优化 2502.01014v2 |
Authors (4): Yao Shu, Qixin Zhang, Kun He, Zhongxiang Dai
Recently, zeroth-order (ZO) optimization has come to play an essential role in scenarios where gradient information is inaccessible or unaffordable, such as black-box systems and resource-constrained environments. While existing adaptive methods such as ZO-AdaMM have shown promise, they are fundamentally limited by their underutilization of moment information during optimization, usually resulting in suboptimal convergence. To overcome these limitations, this paper introduces Refined Adaptive Zeroth-Order Optimization (R-AdaZO). Specifically, we first show the untapped variance reduction effect of the first moment estimate on ZO gradient estimation, which improves the accuracy and stability of ZO updates. We then refine the second moment estimate based on these variance-reduced gradient estimates to better capture the geometry of the optimization landscape, enabling a more effective scaling of ZO updates. Our rigorous theoretical analysis provides (a) the first analysis of the variance reduction achieved by the first moment estimate in ZO optimization, (b) a demonstration that the refined second moment estimate approximates its variance-free ideal more accurately, (c) the first variance-aware convergence framework for adaptive ZO methods, which may be of independent interest, and (d) faster convergence of R-AdaZO compared with existing baselines like ZO-AdaMM. Our extensive experiments, including synthetic problems, black-box adversarial attack, and memory-efficient fine-tuning of large language models (LLMs), further verify the superior convergence of R-AdaZO, indicating that R-AdaZO offers an improved solution for real-world ZO optimization challenges.
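The two ingredients named in this abstract can be sketched as follows: a function-value-only gradient estimate is smoothed by an exponential moving average (the first moment), and the second moment is then built from that smoothed estimate rather than from the raw, noisy one. This is a simplified reading, not the R-AdaZO algorithm as published; the two-point estimator, the hyperparameters, and the absence of bias correction are assumptions.

```python
import math
import random

def zo_gradient(f, x, rng, mu=1e-3):
    """Two-point random-direction estimate of the gradient of f at x."""
    u = [rng.gauss(0.0, 1.0) for _ in x]
    shifted = [xi + mu * ui for xi, ui in zip(x, u)]
    slope = (f(shifted) - f(x)) / mu
    return [slope * ui for ui in u]

def adaptive_zo_minimize(f, x0, steps=1000, lr=0.02, beta1=0.9, beta2=0.99,
                         eps=1e-8, seed=0):
    """Adam-style ZO loop whose second moment uses the averaged estimate m."""
    rng = random.Random(seed)
    x = list(x0)
    m = [0.0] * len(x)
    v = [0.0] * len(x)
    for _ in range(steps):
        g = zo_gradient(f, x, rng)
        # First moment: averaging noisy estimates reduces their variance.
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        # Second moment built from m instead of the raw estimate g.
        v = [beta2 * vi + (1 - beta2) * mi * mi for vi, mi in zip(v, m)]
        x = [xi - lr * mi / (math.sqrt(vi) + eps) for xi, mi, vi in zip(x, m, v)]
    return x

def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(xi * xi for xi in x)

start = [1.0] * 5
result = adaptive_zo_minimize(sphere, start)
```

Only function evaluations of `sphere` are used; no analytic gradient is ever computed.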
Article 1389
Title@2025-06-07 (6): Is Optimal Transport Necessary for Inverse Reinforcement Learning?
Title: Is Optimal Transport Necessary for Inverse Reinforcement Learning? | Ist ein optimaler Verkehr für das Inverse Verstärkungslernen notwendig? | 最佳交通是否为反强化学习所必需? 2506.06793v1 |
Authors (3): Zixuan Dong, Yumi Omori, Keith Ross
Inverse Reinforcement Learning (IRL) aims to recover a reward function from expert demonstrations. Recently, Optimal Transport (OT) methods have been successfully deployed to align trajectories and infer rewards. While OT-based methods have shown strong empirical results, they introduce algorithmic complexity and hyperparameter sensitivity, and require solving OT optimization problems. In this work, we challenge the necessity of OT in IRL by proposing two simple, heuristic alternatives: (1) Minimum-Distance Reward, which assigns rewards based on the nearest expert state regardless of temporal order; and (2) Segment-Matching Reward, which incorporates lightweight temporal alignment by matching agent states to corresponding segments in the expert trajectory. These methods avoid optimization, exhibit linear-time complexity, and are easy to implement. Through extensive evaluations across 32 online and offline benchmarks with three reinforcement learning algorithms, we show that our simple rewards match or outperform recent OT-based approaches. Our findings suggest that the core benefits of OT may arise from basic proximity alignment rather than its optimal coupling formulation, advocating for reevaluation of complexity in future IRL design.
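The two heuristic rewards can be sketched in a few lines. The version below uses negative Euclidean distance as the reward and splits the expert trajectory into equal-length segments indexed by the agent's time step; both choices, along with the function signatures, are assumptions about details the abstract leaves open.

```python
import math

def minimum_distance_reward(agent_state, expert_states):
    """Negative distance to the closest expert state, ignoring temporal order."""
    return -min(math.dist(agent_state, e) for e in expert_states)

def segment_matching_reward(agent_state, step, horizon, expert_states, num_segments):
    """Same idea, but only the expert segment aligned with the current step is searched.

    Assumes num_segments <= len(expert_states) so that no segment is empty.
    """
    n = len(expert_states)
    seg = min(step * num_segments // horizon, num_segments - 1)
    segment = expert_states[seg * n // num_segments:(seg + 1) * n // num_segments]
    return -min(math.dist(agent_state, e) for e in segment)

expert = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
```

For an agent at `(2.0, 1.0)`, the first function matches the expert state `(2.0, 0.0)` at any time step, while the second only allows that match once the agent has reached the second half of the episode. Each call scans the expert states once, so the cost per step is linear in the demonstration length.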
Article 1390
Title@2025-06-07 (6): Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models
Title: Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models | Ferret: Federated Full-Parameter Tuning auf Scale für große Sprachmodelle | Ferret:在大语言模型规模上采用联邦全票全票制表 2409.06277v3 |
Authors (5): Yao Shu, Wenyang Hu, See-Kiong Ng, Bryan Kian Hsiang Low, Fei Richard Yu
Large Language Models (LLMs) have become indispensable in numerous real-world applications. However, fine-tuning these models at scale, especially in federated settings where data privacy and communication efficiency are critical, presents significant challenges. Existing approaches often resort to parameter-efficient fine-tuning (PEFT) to mitigate communication overhead, but this typically comes at the cost of model accuracy. To this end, we propose federated full-parameter tuning at scale for LLMs (Ferret), the first first-order method with shared randomness to enable scalable full-parameter tuning of LLMs across decentralized data sources while maintaining competitive model accuracy. Ferret accomplishes this through three aspects: (i) it employs widely used first-order methods for efficient local updates; (ii) it projects these updates into a low-dimensional space to considerably reduce communication overhead; and (iii) it reconstructs local updates from this low-dimensional space with shared randomness to facilitate effective full-parameter global aggregation, ensuring fast convergence and competitive final performance. Our rigorous theoretical analyses and insights, along with extensive experiments, show that Ferret significantly enhances the scalability of existing federated full-parameter tuning approaches by achieving high computational efficiency, reduced communication overhead, and fast convergence, all while maintaining competitive model accuracy. Our implementation is available at https://github.com/allen4747/Ferret.
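Steps (ii) and (iii) can be illustrated with a small sketch of projection under shared randomness: client and server derive the same random directions from a shared seed, so the client only transmits a handful of coefficients and the server rebuilds an approximate full-size update. This is a generic random-projection example, not Ferret's actual scheme; the Gaussian directions, the averaging reconstruction, and all names are assumptions.

```python
import random

def random_directions(seed, k, dim):
    """k Gaussian directions of length dim, reproducible from the seed alone."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]

def project_update(update, seed, k):
    """Client side: compress a dim-sized update into k coefficients."""
    basis = random_directions(seed, k, len(update))
    return [sum(b * u for b, u in zip(direction, update)) for direction in basis]

def reconstruct_update(coeffs, seed, dim):
    """Server side: rebuild an approximate update using the same seed."""
    basis = random_directions(seed, len(coeffs), dim)
    return [sum(c * direction[j] for c, direction in zip(coeffs, basis)) / len(coeffs)
            for j in range(dim)]

update = [0.01 * (i % 7 - 3) for i in range(1000)]
coeffs = project_update(update, seed=42, k=50)
approx = reconstruct_update(coeffs, seed=42, dim=len(update))
```

Here 50 numbers travel over the network instead of 1000. The reconstruction is noisy for a single round, but with Gaussian directions it equals the true update in expectation, so averaging across clients and rounds recovers a useful descent direction.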
Article 1391
Title@2025-06-07 (6): FuncGNN: Learning Functional Semantics of Logic Circuits with Graph Neural Networks
Title: FuncGNN: Learning Functional Semantics of Logic Circuits with Graph Neural Networks | FuncGNN: Funktionelle Semantik von Logic Circuits mit Graph Neural Networks lernen | FuncGNN: 学习具有图形神经网络的逻辑电路功能性词义 2506.06787v1 |
Authors (1): Qiyun Zhao
As integrated circuit scale grows and design complexity rises, effective circuit representation helps support logic synthesis, formal verification, and other automated processes in electronic design automation. And-Inverter Graphs (AIGs), as a compact and canonical structure, are widely adopted for representing Boolean logic in these workflows. However, the increasing complexity and integration density of modern circuits introduce structural heterogeneity and global logic information loss in AIGs, posing significant challenges to accurate circuit modeling. To address these issues, we propose FuncGNN, which integrates hybrid feature aggregation to extract multi-granularity topological patterns, thereby mitigating structural heterogeneity and enhancing logic circuit representations. FuncGNN further introduces gate-aware normalization that adapts to circuit-specific gate distributions, improving robustness to structural heterogeneity. Finally, FuncGNN employs multi-layer integration to merge intermediate features across layers, effectively synthesizing local and global semantic information for comprehensive logic representations. Experimental results on two logic-level analysis tasks (i.e., signal probability prediction and truth-table distance prediction) demonstrate that FuncGNN outperforms existing state-of-the-art methods, achieving improvements of 2.06% and 18.71%, respectively, while reducing training time by approximately 50.6% and GPU memory usage by about 32.8%.
Article 1392
Title@2025-06-07 (6): Caterpillar GNN: Replacing Message Passing with Efficient Aggregation
Title: Caterpillar GNN: Replacing Message Passing with Efficient Aggregation | Caterpillar GNN: Ersetzen der Nachricht durch effiziente Aggregation | GNN: 以高效聚合取代信件传递 2506.06784v1 |
Authors (1): Marek Černý
Message-passing graph neural networks (MPGNNs) dominate modern graph learning, typically prioritizing maximal expressive power. In contrast, we introduce an efficient aggregation mechanism, deliberately trading off some expressivity for stronger and more structured aggregation capabilities. Our approach allows seamless scaling between classical message-passing and simpler methods based on colored or plain walks. We rigorously characterize the expressive power at each intermediate step using homomorphism counts from a hierarchy of generalized caterpillar graphs. Based on this foundation, we propose the Caterpillar GNN, whose robust graph-level aggregation enables it to successfully tackle a synthetic graph-level task specifically designed to be challenging for classical MPGNNs. Moreover, we demonstrate that, on real-world datasets, the Caterpillar GNN achieves comparable predictive performance while significantly reducing the number of nodes in the hidden layers of the computational graph.
Article 1393
Title@2025-06-07 (6): Interpretable LLMs for Credit Risk: A Systematic Review and Taxonomy
Title: Interpretable LLMs for Credit Risk: A Systematic Review and Taxonomy | Interpretierbare LLMs für Kreditrisiken: Systematische Überprüfung und Taxonomie | 信用风险:系统审查和分类 2506.04290v2 |
Authors (2): Muhammed Golec, Maha AlabdulJalil
Large Language Models (LLMs), which have developed in recent years, enable credit risk assessment through the analysis of financial texts such as analyst reports and corporate disclosures. This paper presents the first systematic review and taxonomy focusing on LLM-based approaches in credit risk estimation. Following the PRISMA research strategy, we selected 60 relevant papers published between 2020 and 2025, determined their basic model architectures, and examined the data used for scenarios such as credit default prediction and risk analysis. Since the main focus of the paper is interpretability, we classify concepts such as explainability mechanisms, chain-of-thought prompts and natural language justifications for LLM-based credit models. The taxonomy organizes the literature under four main headings: model architectures, data types, explainability mechanisms and application areas. Based on this analysis, we highlight the main future trends and research gaps for LLM-based credit scoring systems. This paper aims to be a reference paper for artificial intelligence and financial researchers.
Article 1394
Title@2025-06-07 (6): Feature-Based Instance Neighbor Discovery: Advanced Stable Test-Time Adaptation in Dynamic World
Title: Feature-Based Instance Neighbor Discovery: Advanced Stable Test-Time Adaptation in Dynamic World | Feature-Based Instance Neighbor Discovery: Erweiterte stabile Test-Time-Anpassung in dynamischer Welt | 以地貌为基础的实例邻国发现:动态世界中的高级稳定测试时间适应 2506.06782v1 |
Authors (7): Qinting Jiang, Chuyang Ye, Dongyan Wei, Bingli Wang, Yuan Xue, Jingyan Jiang, Zhi Wang
Despite progress, deep neural networks still suffer performance declines under distribution shifts between training and test domains, leading to a substantial decrease in Quality of Experience (QoE) for applications. Existing test-time adaptation (TTA) methods are challenged by dynamic, multiple test distributions within batches. We observe that feature distributions across different domains inherently cluster into distinct groups with varying means and variances. This divergence reveals a critical limitation of previous global normalization strategies in TTA, which inevitably distort the original data characteristics. Based on this insight, we propose Feature-based Instance Neighbor Discovery (FIND), which comprises three key components: Layer-wise Feature Disentanglement (LFD), Feature Aware Batch Normalization (FABN) and Selective FABN (S-FABN). LFD stably captures features with similar distributions at each layer by constructing graph structures, while FABN optimally combines source statistics with test-time distribution-specific statistics for robust feature representation. Finally, S-FABN determines which layers require feature partitioning and which can remain unified, thereby enhancing inference efficiency. Extensive experiments demonstrate that FIND significantly outperforms existing methods, achieving a 30% accuracy improvement in dynamic scenarios while maintaining computational efficiency.
Article 1395
Title@2025-06-07 (6): Continuous-Time SO(3) Forecasting with Savitzky–Golay Neural Controlled Differential Equations
Title: Continuous-Time SO(3) Forecasting with Savitzky–Golay Neural Controlled Differential Equations | Continuous-Time SO(3) Vorhersage mit Savitzky–Golay Neural gesteuerte Differentialgleichungen | 连续时间 SO(3) 使用Savitzky-Golay 控制神经控制的神经等量的预测 2506.06780v1 |
Authors (4): Lennart Bastian, Mohammad Rashed, Nassir Navab, Tolga Birdal
Tracking and forecasting the rotation of objects is fundamental in computer vision and robotics, yet SO(3) extrapolation remains challenging as (1) sensor observations can be noisy and sparse, (2) motion patterns can be governed by complex dynamics, and (3) application settings can demand long-term forecasting. This work proposes modeling continuous-time rotational object dynamics on $SO(3)$ using Neural Controlled Differential Equations guided by Savitzky-Golay paths. Unlike existing methods that rely on simplified motion assumptions, our method learns a general latent dynamical system of the underlying object trajectory while respecting the geometric structure of rotations. Experimental results on real-world data demonstrate compelling forecasting capabilities compared to existing approaches.
Article 1396
Title@2025-06-07 (6): Continuous Semi-Implicit Models
Title: Continuous Semi-Implicit Models | Kontinuierliche semiimplizite Modelle | 连续半集成模型 2506.06778v1 |
Authors (7): Longlin Yu, Jiajun Zha, Tong Yang, Tianyu Xie, Xiangyu Zhang, S. -H. Gary Chan, Cheng Zhang
Semi-implicit distributions have shown great promise in variational inference and generative modeling. Hierarchical semi-implicit models, which stack multiple semi-implicit layers, enhance the expressiveness of semi-implicit distributions and can be used to accelerate diffusion models given pretrained score networks. However, their sequential training often suffers from slow convergence. In this paper, we introduce CoSIM, a continuous semi-implicit model that extends hierarchical semi-implicit models into a continuous framework. By incorporating a continuous transition kernel, CoSIM enables efficient, simulation-free training. Furthermore, we show that CoSIM achieves consistency with a carefully designed transition kernel, offering a novel approach for multistep distillation of generative models at the distributional level. Extensive experiments on image generation demonstrate that CoSIM performs on par with or better than existing diffusion model acceleration methods, achieving superior performance on FD-DINOv2.
Article 1397
Title@2025-06-07 (6): Label Distribution Learning using the Squared Neural Family on the Probability Simplex
Title: Label Distribution Learning using the Squared Neural Family on the Probability Simplex | Label-Distribution-Lernen mit der Quadratischen Neural-Familie auf der Wahrscheinlichkeit Simplex | 利用 “ 概率简单 “ 上的平方神经家庭进行标签分布学习 2412.07324v2 |
Authors (3): Daokun Zhang, Russell Tsuchida, Dino Sejdinovic
Label distribution learning (LDL) provides a framework wherein a distribution over categories rather than a single category is predicted, with the aim of addressing ambiguity in labeled data. Existing research on LDL mainly focuses on the task of point estimation, i.e., finding an optimal distribution in the probability simplex conditioned on the given sample. In this paper, we propose a novel label distribution learning model SNEFY-LDL, which estimates a probability distribution of all possible label distributions over the simplex, by unleashing the expressive power of the recently introduced Squared Neural Family (SNEFY), a new class of tractable probability models. As a way to summarize the fitted model, we derive the closed-form label distribution mean, variance and covariance conditioned on the given sample, which can be used to predict the ground-truth label distributions, construct label distribution confidence intervals, and measure the correlations between different labels. Moreover, more information about the label distribution prediction uncertainties can be acquired from the modeled probability density function. Extensive experiments on conformal prediction, active learning and ensemble learning are conducted, verifying SNEFY-LDL’s great effectiveness in LDL uncertainty quantification. The source code of this paper is available at https://github.com/daokunzhang/SNEFY-LDL.
Article 1398
Title@2025-06-07 (6): Taming Wild Branches: Overcoming Hard-to-Predict Branches using the Bullseye Predictor
Title: Taming Wild Branches: Overcoming Hard-to-Predict Branches using the Bullseye Predictor | Wilde Zweige zähmen: Hard-to-Predict-Zweige mit dem Bullseye Predictor überwinden | 塔姆荒野分支:利用 “ 红心 “ 预测仪克服难以预防的分支 2506.06773v1 |
Authors (3): Emet Behrendt, Shing Wai Pun, Prashant J. Nair
Branch prediction is key to the performance of out-of-order processors. While the CBP-2016 winner TAGE-SC-L combines geometric-history tables, a statistical corrector, and a loop predictor, over half of its remaining mispredictions stem from a small set of hard-to-predict (H2P) branches. These branches occur under diverse global histories, causing repeated thrashing in TAGE and eviction before usefulness counters can mature. Prior work shows that simply enlarging the tables offers only marginal improvement. We augment a 159 KB TAGE-SC-L predictor with a 28 KB H2P-targeted subsystem called the Bullseye predictor. It identifies problematic PCs using a set-associative H2P Identification Table (HIT) and steers them to one of two branch-specific perceptrons, one indexed by hashed local history and the other by folded global history. A short trial phase tracks head-to-head accuracy in an H2P cache. A branch becomes perceptron-resident only if the perceptron’s sustained accuracy and output magnitude exceed dynamic thresholds, after which TAGE updates for that PC are suppressed to reduce pollution. The HIT, cache, and perceptron operate fully in parallel with TAGE-SC-L, providing higher fidelity on the H2P tail. This achieves an average MPKI of 3.4045 and CycWpPKI of 145.09.
Article 1399
Title@2025-06-07 (6): On Adversarial Robustness of Language Models in Transfer Learning
Title: On Adversarial Robustness of Language Models in Transfer Learning | Zur widerlichen Robustheit von Sprachmodellen im Transfer-Lernen | 转让学习中语言模式的反效果 2501.00066v2 |
Authors (4): Bohdan Turbal, Anastasiia Mazur, Jiaxu Zhao, Mykola Pechenizkiy
We investigate the adversarial robustness of LLMs in transfer learning scenarios. Through comprehensive experiments on multiple datasets (MBIB Hate Speech, MBIB Political Bias, MBIB Gender Bias) and various model architectures (BERT, RoBERTa, GPT-2, Gemma, Phi), we reveal that transfer learning, while improving standard performance metrics, often leads to increased vulnerability to adversarial attacks. Our findings demonstrate that larger models exhibit greater resilience to this phenomenon, suggesting a complex interplay between model size, architecture, and adaptation methods. Our work highlights the crucial need for considering adversarial robustness in transfer learning scenarios and provides insights into maintaining model security without compromising performance. These findings have significant implications for the development and deployment of LLMs in real-world applications where both performance and robustness are paramount.
Article 1400
Title@2025-06-07 (6): AfroBench: How Good are Large Language Models on African Languages?
Title: AfroBench: How Good are Large Language Models on African Languages? | AfroBench: Wie gut sind große Sprachmodelle auf afrikanischen Sprachen? | AfroBench:非洲语言大语言模式有多好? 2311.07978v5 |
Authors (7): Jessica Ojo, Odunayo Ogundepo, Akintunde Oladipo, Kelechi Ogueji, Jimmy Lin, Pontus Stenetorp, David Ifeoluwa Adelani
Large-scale multilingual evaluations, such as MEGA, often include only a handful of African languages due to the scarcity of high-quality evaluation data and the limited discoverability of existing African datasets. This lack of representation hinders comprehensive LLM evaluation across a diverse range of languages and tasks. To address these challenges, we introduce AfroBench – a multi-task benchmark for evaluating the performance of LLMs across 64 African languages, 15 tasks and 22 datasets. AfroBench consists of nine natural language understanding datasets, six text generation datasets, six knowledge and question answering tasks, and one mathematical reasoning task. We present results comparing the performance of prompting LLMs to fine-tuned baselines based on BERT and T5-style models. Our results suggest large gaps in performance between high-resource languages, such as English, and African languages across most tasks; but performance also varies based on the availability of monolingual data resources. Our findings confirm that performance on African languages continues to remain a hurdle for current LLMs, underscoring the need for additional efforts to close this gap. https://mcgill-nlp.github.io/AfroBench/
Article 1401
Title@2025-06-07 (6): Xeno-learning: knowledge transfer across species in deep learning-based spectral image analysis
Title: Xeno-learning: knowledge transfer across species in deep learning-based spectral image analysis | Xeno-Learning: Wissenstransfer über Arten hinweg in der Deep Learning-basierten Spektralbildanalyse | Xeno-学习:深学习光谱图象分析中的跨物种知识转让 2410.19789v2 |
Authors (19): Jan Sellner, Alexander Studier-Fischer, Ahmad Bin Qasim, Silvia Seidlitz, Nicholas Schreck, Minu Tizabi, Manuel Wiesenfarth, Annette Kopp-Schneider, Janne Heinecke, Jule Brandt, Samuel Knödler, Caelan Max Haney, Gabriel Salg, Berkin Özdemir, Maximilian Dietrich, Maurice Stephan Michel, Felix Nickel, Karl-Friedrich Kowalewski, Lena Maier-Hein
Novel optical imaging techniques, such as hyperspectral imaging (HSI) combined with machine learning-based (ML) analysis, have the potential to revolutionize clinical surgical imaging. However, these novel modalities face a shortage of large-scale, representative clinical data for training ML algorithms, while preclinical animal data is abundantly available through standardized experiments and allows for controlled induction of pathological tissue states, which is not ethically possible in patients. To leverage this situation, we propose a novel concept called “xeno-learning”, a cross-species knowledge transfer paradigm inspired by xeno-transplantation, where organs from a donor species are transplanted into a recipient species. Using a total of 13,874 HSI images from humans as well as porcine and rat models, we show that although spectral signatures of organs differ substantially across species, relative changes resulting from pathologies or surgical manipulation (e.g., malperfusion; injection of contrast agent) are comparable. Such changes learnt in one species can thus be transferred to a new species via a novel “physiology-based data augmentation” method, enabling the large-scale secondary use of preclinical animal data for humans. The resulting ethical, monetary, and performance benefits promise a high impact of the proposed knowledge transfer paradigm on future developments in the field.
Article 1402
Title@2025-06-07 (6): The OCR Quest for Generalization: Learning to recognize low-resource alphabets with model editing
Title: The OCR Quest for Generalization: Learning to recognize low-resource alphabets with model editing | OCR Quest for Generalization: Lernen, Low-Resource-Alphabets mit Modellbearbeitung zu erkennen | OCR 普遍化调查:学会以示范编辑方式识别低资源字母表 2506.06761v1 |
Authors (3): Adrià Molina Rodríguez, Oriol Ramos Terrades, Josep Lladós
Achieving robustness in recognition systems across diverse domains is crucial for their practical utility. While ample data availability is usually assumed, low-resource languages, such as those found in ancient manuscripts and non-Western languages, tend to be left out of massive pretraining and foundational techniques because they are underrepresented. In this work, we aim to build models that can generalize to new distributions of data, such as alphabets, faster than centralized fine-tuning strategies. To do so, we take advantage of recent advancements in model editing to enhance the incorporation of unseen scripts (low-resource learning). In contrast to state-of-the-art meta-learning, we showcase the effectiveness of domain merging in sparse distributions of data, agnostic to their relation to the overall distribution and without any other prototyping requirement. Even when using exactly the same training data, our experiments show significant performance boosts in transfer learning to new alphabets and in out-of-domain evaluation under challenging domain shifts, including historical ciphered texts and non-Latin scripts. This research contributes a novel approach to building models that can easily adopt underrepresented alphabets and, therefore, extend document recognition to a wider set of contexts and cultures.
Article 1403
Title@2025-06-07 (6): Bio-Inspired Classification: Combining Information Theory and Spiking Neural Networks – Influence of the Learning Rules
Title: Bio-Inspired Classification: Combining Information Theory and Spiking Neural Networks – Influence of the Learning Rules | Bio-Inspirierte Klassifikation: Verknüpfung von Informationstheorie und spikenden Neuronalen Netzwerken – Einfluss der Lernregeln | 生物启发分类:信息理论与Spiking神经网络相结合 – – 学习规则的影响 2506.06750v1 |
Authors (3): Zofia Rudnicka, Janusz Szczepanski, Agnieszka Pregowska
Training Spiking Neural Networks (SNNs) is challenging due to their unique properties, including temporal dynamics, non-differentiability of spike events, and sparse event-driven activations. In this paper, we broadly examine how the choice of learning algorithm, including bioinspired learning rules, influences classification accuracy. We propose a bioinspired classifier based on the combination of SNNs and Lempel-Ziv complexity (LZC). This approach combines the strengths of SNNs in temporal precision and biological realism with LZC’s structural complexity analysis, facilitating efficient and interpretable classification of spatiotemporal neural data. The classic backpropagation algorithm achieves excellent classification accuracy, but at an extremely high computational cost, which makes it impractical for real-time applications. Biologically inspired learning algorithms such as tempotron and SpikeProp provide increased computational efficiency while maintaining competitive classification performance, making them suitable for time-sensitive tasks. The results indicate that the selection of the most appropriate learning algorithm depends on the trade-off between classification accuracy and computational cost, as well as on application constraints.
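To make the Lempel-Ziv complexity (LZC) ingredient concrete, here is a minimal sketch that counts distinct phrases in a binarized spike train using a simple dictionary-based (LZ78-style) parsing. The function name `lz_phrase_count` and the string encoding of spikes are assumptions made for illustration; the abstract does not say which LZC variant or normalization the authors use.

```python
def lz_phrase_count(spike_train: str) -> int:
    """Count phrases in a dictionary-based (LZ78-style) parsing.

    `spike_train` is a string such as "0100110", where "1" marks a spike
    in a time bin. This encoding is a hypothetical choice for the sketch.
    """
    seen_phrases = set()
    current = ""
    count = 0
    for symbol in spike_train:
        current += symbol
        # A phrase ends as soon as it has not been seen before.
        if current not in seen_phrases:
            seen_phrases.add(current)
            count += 1
            current = ""
    # A trailing, already-seen fragment still counts as one phrase.
    if current:
        count += 1
    return count
```

Under this parsing, a constant train of ten zeros yields fewer phrases than an alternating train of the same length, which matches the intuition that more regular spike patterns have lower complexity.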
Article 1404
Title@2025-06-07 (6): Honey, I shrunk the hypothesis space (through logical preprocessing)
Title: Honey, I shrunk the hypothesis space (through logical preprocessing) | Schatz, ich habe den Hypothesenraum geschrumpft (durch logische Vorverarbeitung) | 亲爱的,我缩小了假设空间(通过逻辑预处理) 2506.06739v1 |
Authors (3): Andrew Cropper, Filipe Gouveia, David M. Cerna
Inductive logic programming (ILP) is a form of logical machine learning. The goal is to search a hypothesis space for a hypothesis that generalises training examples and background knowledge. We introduce an approach that ‘shrinks’ the hypothesis space before an ILP system searches it. Our approach uses background knowledge to find rules that cannot be in an optimal hypothesis regardless of the training examples. For instance, our approach discovers relationships such as “even numbers cannot be odd” and “prime numbers greater than 2 are odd”. It then removes violating rules from the hypothesis space. We implement our approach using answer set programming and use it to shrink the hypothesis space of a constraint-based ILP system. Our experiments on multiple domains, including visual reasoning and game playing, show that our approach can substantially reduce learning times whilst maintaining predictive accuracies. For instance, given just 10 seconds of preprocessing time, our approach can reduce learning times from over 10 hours to only 2 seconds.
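The pruning idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the paper implements its approach with answer set programming, whereas the representation below (rules as a head plus a set of body predicate names, and a hand-written list of mutually exclusive predicate pairs) is hypothetical. It covers only the "even numbers cannot be odd" kind of constraint, not implications such as "prime numbers greater than 2 are odd".

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

# A rule is (head predicate, set of body predicates); hypothetical encoding.
Rule = Tuple[str, FrozenSet[str]]


def shrink_hypothesis_space(
    rules: Iterable[Rule],
    exclusive_pairs: Iterable[Tuple[str, str]],
) -> List[Rule]:
    """Drop rules whose body contains two mutually exclusive predicates.

    Such a body can never be satisfied, so the rule cannot appear in an
    optimal hypothesis regardless of the training examples.
    """
    banned = {frozenset(pair) for pair in exclusive_pairs}
    kept = []
    for head, body in rules:
        violates = any(
            frozenset(pair) in banned
            for pair in combinations(sorted(body), 2)
        )
        if not violates:
            kept.append((head, body))
    return kept
```

For example, with `exclusive_pairs=[("even", "odd")]`, a candidate rule whose body contains both `even` and `odd` is removed before the search starts, while rules using only one of them are kept.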
Article 1405
Title@2025-06-07 (6): WorldLLM: Improving LLMs’ world modeling using curiosity-driven theory-making
Title: WorldLLM: Improving LLMs’ world modeling using curiosity-driven theory-making | WorldLLM: Verbesserung der Weltmodellierung von LLMs durch kuriositätsgetriebene Theoriebildung | WorldLLLM:利用由好奇力驱动的理论制造改进LLMs的世界模型 2506.06725v1 |
Authors (5): Guillaume Levy, Cedric Colas, Pierre-Yves Oudeyer, Thomas Carta, Clement Romac
Large Language Models (LLMs) possess general world knowledge but often struggle to generate precise predictions in structured, domain-specific contexts such as simulations. These limitations arise from their inability to ground their broad, unstructured understanding in specific environments. To address this, we present WorldLLM, a framework that enhances LLM-based world modeling by combining Bayesian inference and autonomous active exploration with reinforcement learning. WorldLLM leverages the in-context learning abilities of LLMs to guide an LLM-based world model’s predictions using natural language hypotheses given in its prompt. These hypotheses are iteratively refined through a Bayesian inference framework that leverages a second LLM as the proposal distribution given collected evidence. This evidence is collected using a curiosity-driven reinforcement learning policy that explores the environment to find transitions with a low log-likelihood under our LLM-based predictive model using the current hypotheses. By alternating between refining hypotheses and collecting new evidence, our framework autonomously drives continual improvement of the predictions. Our experiments demonstrate the effectiveness of WorldLLM in a textual game environment that requires agents to manipulate and combine objects. The framework not only enhances predictive accuracy, but also generates human-interpretable theories of environment dynamics.
Article 1406
Title@2025-06-07 (6): Directly Forecasting Belief for Reinforcement Learning with Delays
Title: Directly Forecasting Belief for Reinforcement Learning with Delays | Direkt voraussagender Glaube für verstärktes Lernen mit Verzögerungen | 直接预测学习迟迟加强学习的信念 2505.00546v2 |
Authors (9): Qingyuan Wu, Yuhui Wang, Simon Sinong Zhan, Yixuan Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang
Reinforcement learning (RL) with delays is challenging as sensory perceptions lag behind the actual events: the RL agent needs to estimate the real state of its environment based on past observations. State-of-the-art (SOTA) methods typically employ recursive, step-by-step forecasting of states. This can cause the accumulation of compounding errors. To tackle this problem, our novel belief estimation method, named Directly Forecasting Belief Transformer (DFBT), directly forecasts states from observations without incrementally estimating intermediate states step-by-step. We theoretically demonstrate that DFBT greatly reduces compounding errors of existing recursively forecasting methods, yielding stronger performance guarantees. In experiments with D4RL offline datasets, DFBT reduces compounding errors with remarkable prediction accuracy. DFBT’s capability to forecast state sequences also facilitates multi-step bootstrapping, thus greatly improving learning efficiency. On the MuJoCo benchmark, our DFBT-based method substantially outperforms SOTA baselines. Code is available at https://github.com/QingyuanWuNothing/DFBT.
Article 1407
Title@2025-06-07 (6): IQFM A Wireless Foundational Model for I/Q Streams in AI-Native 6G
Title: IQFM A Wireless Foundational Model for I/Q Streams in AI-Native 6G | IQFM Ein drahtloses Grundmodell für I/Q Streams in AI-Native 6G | AI-Native 6G的I/Q流无线无线基础模型 2506.06718v1 |
Authors (2): Omar Mashaal, Hatem Abou-Zeid
Foundational models have shown remarkable potential in natural language processing and computer vision, yet remain in their infancy in wireless communications. While a few efforts have explored image-based modalities such as channel state information (CSI) and frequency spectrograms, foundational models that operate directly on raw IQ data remain largely unexplored. This paper presents IQFM, the first I/Q signal foundational model for wireless communications. IQFM supports diverse tasks: modulation classification, angle-of-arrival (AoA), beam prediction, and RF fingerprinting, without heavy preprocessing or handcrafted features. We also introduce a task-aware augmentation strategy that categorizes transformations into core augmentations, such as cyclic time shifting, and task-specific augmentations. This strategy forms the basis for structured, task-dependent representation learning within a contrastive self-supervised learning (SSL) framework. Using this strategy, the lightweight encoder, pre-trained via SSL on over-the-air multi-antenna IQ data, achieves up to 99.67% and 65.45% accuracy on modulation and AoA classification, respectively, using only one labeled sample per class, outperforming supervised baselines by up to 7x and 145x. The model also generalizes to out-of-distribution tasks; when adapted to new tasks using only 500 samples per class and minimal parameter updates via LoRA, the same frozen encoder achieves 94.15% on beam prediction (vs. 89.53% supervised), 50.00% on RML2016a modulation classification (vs. 49.30%), and 96.05% on RF fingerprinting (vs. 96.64%). These results demonstrate the potential of raw IQ-based foundational models as efficient, reusable encoders for multi-task learning in AI-native 6G systems.
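Cyclic time shifting, the one core augmentation the abstract names, can be sketched as a rotation of the sample sequence. The function name and the use of a plain list of Python `complex` values for I/Q samples are assumptions for illustration, not the authors' implementation.

```python
from typing import List


def cyclic_time_shift(iq_samples: List[complex], shift: int) -> List[complex]:
    """Rotate an I/Q sequence in time; samples pushed off the end wrap around.

    A positive `shift` delays the signal, a negative one advances it.
    """
    if not iq_samples:
        return []
    k = shift % len(iq_samples)
    if k == 0:
        return list(iq_samples)
    return iq_samples[-k:] + iq_samples[:-k]
```

Because the rotation only reorders samples, the augmented view keeps every original I/Q value, which is presumably why such a transformation can be treated as task-independent.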
Article 1408
Title@2025-06-07 (6): Online SLA Decomposition: Enabling Real-Time Adaptation to Evolving Network Systems
Title: Online SLA Decomposition: Enabling Real-Time Adaptation to Evolving Network Systems | Online-SLA-Zersetzung: Echtzeit-Anpassung an sich entwickelnde Netzwerksysteme ermöglichen | SLA在线分解:使实时适应不断演变的网络系统 2408.08968v5 |
Authors (4): Cyril Shih-Huan Hsu, Danny De Vleeschauwer, Chrysa Papagianni, Paola Grosso
When a network slice spans multiple technology domains, it is crucial for each domain to uphold the End-to-End (E2E) Service Level Agreement (SLA) associated with the slice. Consequently, the E2E SLA must be properly decomposed into partial SLAs that are assigned to each domain involved. In a network slice management system with a two-level architecture, comprising an E2E service orchestrator and local domain controllers, we consider that the orchestrator has access only to historical data regarding the responses of local controllers to previous requests, and this information is used to construct a risk model for each domain. In this study, we extend our previous work by investigating the dynamic nature of real-world systems and introducing an online learning-decomposition framework to tackle the dynamicity. We propose a framework that continuously updates the risk models based on the most recent feedback. This approach leverages key components such as online gradient descent and FIFO memory buffers, which enhance the stability and robustness of the overall process. Our empirical study on an analytic model-based simulator demonstrates that the proposed framework outperforms the state-of-the-art static approach, delivering more accurate and resilient SLA decomposition under varying conditions and data limitations. Furthermore, we provide a comprehensive complexity analysis of the proposed solution.
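The combination of online gradient descent with a FIFO memory buffer can be illustrated with a toy per-domain risk model. Everything concrete here is an assumption for the sketch: the class name `OnlineRiskModel`, the one-dimensional logistic form of the acceptance probability, and the scalar `sla_budget` feature do not come from the paper.

```python
import math
from collections import deque


class OnlineRiskModel:
    """Toy logistic risk model refreshed from a bounded FIFO buffer."""

    def __init__(self, buffer_size: int = 50, learning_rate: float = 0.5):
        self.weight = 0.0
        self.bias = 0.0
        self.learning_rate = learning_rate
        # deque(maxlen=...) silently discards the oldest feedback when full.
        self.buffer = deque(maxlen=buffer_size)

    def acceptance_probability(self, sla_budget: float) -> float:
        """Estimated probability that the domain accepts this partial SLA."""
        z = self.weight * sla_budget + self.bias
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, sla_budget: float, accepted: bool) -> None:
        """Store new feedback, then take one gradient step on the buffer."""
        self.buffer.append((sla_budget, 1.0 if accepted else 0.0))
        grad_w = 0.0
        grad_b = 0.0
        for x, y in self.buffer:
            error = self.acceptance_probability(x) - y
            grad_w += error * x
            grad_b += error
        n = len(self.buffer)
        self.weight -= self.learning_rate * grad_w / n
        self.bias -= self.learning_rate * grad_b / n
```

The bounded buffer means that stale feedback ages out, so the model tracks a controller whose behaviour drifts over time instead of averaging over its whole history.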
Article 1409
Title@2025-06-07 (6): Towards Black-Box Membership Inference Attack for Diffusion Models
Title: Towards Black-Box Membership Inference Attack for Diffusion Models | Auf dem Weg zu Black-Box-Mitgliedschafts-Inferenz-Angriff für Diffusionsmodelle | 争取为扩散模型进行黑箱成员推导攻击 2405.20771v4 |
Authors (4): Jingwei Li, Jing Dong, Tianxing He, Jingzhao Zhang
Given the rising popularity of AI-generated art and the associated copyright concerns, identifying whether an artwork was used to train a diffusion model is an important research topic. This work approaches the problem from the membership inference attack (MIA) perspective. We first identify the limitation of applying existing MIA methods to proprietary diffusion models: the required access to internal U-nets. To address this problem, we introduce a novel membership inference attack method that uses only the image-to-image variation API and operates without access to the model’s internal U-net. Our method is based on the intuition that the model can more easily obtain an unbiased noise prediction estimate for images from the training set. By applying the API multiple times to the target image, averaging the outputs, and comparing the result to the original image, our approach can classify whether a sample was part of the training set. We validate our method using DDIM and Stable Diffusion setups and further extend both our approach and existing algorithms to the Diffusion Transformer architecture. Our experimental results consistently outperform previous methods.
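The query-average-compare procedure described above can be sketched as follows. This is a minimal illustration under assumptions: `variation_api` is a hypothetical callable standing in for a black-box image-to-image variation endpoint, images are flat lists of floats, and the mean squared error and fixed threshold are placeholder choices that the abstract does not specify.

```python
from typing import Callable, List

Image = List[float]


def membership_score(
    image: Image,
    variation_api: Callable[[Image], Image],
    num_queries: int = 8,
) -> float:
    """Mean squared error between the image and its averaged variations."""
    outputs = [variation_api(image) for _ in range(num_queries)]
    # Average the returned variations pixel by pixel.
    averaged = [sum(pixels) / num_queries for pixels in zip(*outputs)]
    return sum((a - x) ** 2 for a, x in zip(averaged, image)) / len(image)


def is_likely_member(
    image: Image,
    variation_api: Callable[[Image], Image],
    threshold: float,
    num_queries: int = 8,
) -> bool:
    """Flag the image as a training member if its score is below threshold."""
    return membership_score(image, variation_api, num_queries) < threshold
```

Averaging is what makes the comparison informative: if the outputs for a training image scatter around the original without bias, the noise cancels and the averaged image lands close to it, while a systematic offset survives averaging.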
Article 1410
Title@2025-06-07 (6): Understanding Nonlinear Implicit Bias via Region Counts in Input Space
Title: Understanding Nonlinear Implicit Bias via Region Counts in Input Space | Nichtlineare Implizite Bias über Regionszählungen im Eingaberaum verstehen | 通过输入空间中的区域计数理解非线性隐隐含比值 2505.11370v2 |
Authors (5): Jingwei Li, Jing Xu, Zifan Wang, Huishuai Zhang, Jingzhao Zhang
One explanation for the strong generalization ability of neural networks is implicit bias. Yet, the definition and mechanism of implicit bias in non-linear contexts remain poorly understood. In this work, we propose to characterize implicit bias by the count of connected regions in the input space with the same predicted label. Compared with parameter-dependent metrics (e.g., norm or normalized margin), region count can be better adapted to nonlinear, overparameterized models, because it is determined by the function mapping and is invariant to reparametrization. Empirically, we found that small region counts align with geometrically simple decision boundaries and correlate well with good generalization performance. We also observe that good hyper-parameter choices such as larger learning rates and smaller batch sizes can induce small region counts. We further establish the theoretical connections and explain how a larger learning rate can induce small region counts in neural networks.
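A simple way to get a feel for region counts is to restrict the input space to a line segment and count how often the predicted label changes along it. The sketch below does exactly that; restricting to a segment, the sampling resolution, and the function name are assumptions for illustration, and the abstract does not describe how the authors estimate the count.

```python
from typing import Callable, Sequence


def count_regions_on_segment(
    predict: Callable[[Sequence[float]], int],
    start: Sequence[float],
    end: Sequence[float],
    num_samples: int = 1000,
) -> int:
    """Count maximal runs of identical predicted labels between two inputs.

    Only the classifier's outputs are used, so the result does not depend
    on how the underlying function is parametrized.
    """
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    labels = []
    for i in range(num_samples):
        t = i / (num_samples - 1)
        point = [a + t * (b - a) for a, b in zip(start, end)]
        labels.append(predict(point))
    # Each label change along the path starts a new region.
    changes = sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
    return 1 + changes
```

A single threshold on the first coordinate gives two regions along a segment that crosses it, while a band-shaped decision rule gives three; finer sampling is needed to detect regions narrower than the sample spacing.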
Article 1411
Title@2025-06-07 (6): RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking
Title: RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking | RED QUEEN: Schutz großer Sprachmodelle gegen verschleierte Multi-Turn-Jailbreaking | RED QUEEN: 保护大型语言模型,防止隐蔽的多轮破门破门 2409.17458v2 |
Authors (6): Yifan Jiang, Kriti Aggarwal, Tanmay Laud, Kashif Munir, Jay Pujara, Subhabrata Mukherjee
The rapid progress of Large Language Models (LLMs) has opened up new opportunities across various domains and applications; yet it also presents challenges related to potential misuse. To mitigate such risks, red teaming has been employed as a proactive security measure to probe language models for harmful outputs via jailbreak attacks. However, current jailbreak attack approaches are single-turn with explicit malicious queries that do not fully capture the complexity of real-world interactions. In reality, users can engage in multi-turn interactions with LLM-based chat assistants, allowing them to conceal their true intentions in a more covert manner. To bridge this gap, we first propose a new jailbreak approach, RED QUEEN ATTACK. This method constructs a multi-turn scenario, concealing the malicious intent under the guise of preventing harm. We craft 40 scenarios that vary in the number of turns and select 14 harmful categories to generate 56k multi-turn attack data points. We conduct comprehensive experiments on the RED QUEEN ATTACK with four representative LLM families of different sizes. Our experiments reveal that all LLMs are vulnerable to RED QUEEN ATTACK, reaching 87.62% attack success rate on GPT-4o and 75.4% on Llama3-70B. Further analysis reveals that larger models are more susceptible to the RED QUEEN ATTACK, with multi-turn structures and concealment strategies contributing to its success. To prioritize safety, we introduce a straightforward mitigation strategy called RED QUEEN GUARD, which aligns LLMs to effectively counter adversarial attacks. This approach reduces the attack success rate to below 1% while maintaining the model’s performance across standard benchmarks. Full implementation and dataset are publicly accessible at https://github.com/kriti-hippo/red_queen.
Article 1412
Title@2025-06-07 (6): Do Protein Transformers Have Biological Intelligence?
Title: Do Protein Transformers Have Biological Intelligence? | Haben Proteintransformatoren biologische Intelligenz? | 蛋白质变异器有生物智能吗? 2506.06701v1 |
Authors (8): Fudong Lin, Wanrou Du, Jinchan Liu, Tarikul Milon, Shelby Meche, Wu Xu, Xiaoqi Qin, Xu Yuan
Deep neural networks, particularly Transformers, have been widely adopted for predicting the functional properties of proteins. In this work, we focus on exploring whether Protein Transformers can capture biological intelligence among protein sequences. To achieve our goal, we first introduce a protein function dataset, namely Protein-FN, providing over 9000 protein samples with meaningful labels. Second, we devise a new Transformer architecture, namely Sequence Protein Transformers (SPT), for computationally efficient protein function predictions. Third, we develop a novel Explainable Artificial Intelligence (XAI) technique called Sequence Score, which can efficiently interpret the decision-making processes of protein models, thereby overcoming the difficulty of deciphering biological intelligence hidden in Protein Transformers. Remarkably, even our smallest SPT-Tiny model, which contains only 5.4M parameters, demonstrates impressive predictive accuracy, achieving 94.3% on the Antibiotic Resistance (AR) dataset and 99.6% on the Protein-FN dataset, all accomplished by training from scratch. Besides, our Sequence Score technique helps reveal that our SPT models can discover several meaningful patterns underlying the sequence structures of protein data, with these patterns aligning closely with the domain knowledge in the biology community. We have officially released our Protein-FN dataset on Hugging Face Datasets at https://huggingface.co/datasets/Protein-FN/Protein-FN. Our code is available at https://github.com/fudong03/BioIntelligence.
Article 1413
Title@2025-06-07 (6): MarginSel : Max-Margin Demonstration Selection for LLMs
Title: MarginSel : Max-Margin Demonstration Selection for LLMs | MarginSel : Max-Margin-Demonstrationsauswahl für LLMs | MaginSel : LLMS 最大边框演示选择 2506.06699v1 |
Authors (4): Rajeev Bhatt Ambati, James Lester, Shashank Srivastava, Snigdha Chaturvedi
Large Language Models (LLMs) excel at few-shot learning via in-context learning (ICL). However, the effectiveness of ICL is often sensitive to the selection and ordering of demonstration examples. To address this, we present MarginSel: Max-Margin Demonstration Selection for LLMs, a two-step method that selects hard demonstration examples for the ICL prompt, adapting to each test instance. Our approach achieves 2-7% absolute improvement in F1-score across classification tasks, compared to a random selection of examples. We also provide theoretical insights and empirical evidence showing that MarginSel induces max-margin behavior in LLMs by effectively increasing the margin for hard examples, analogous to support vectors, thereby shifting the decision boundary in a beneficial direction.
Article 1414
Title@2025-06-07 (6): Contextual Experience Replay for Self-Improvement of Language Agents
Title: Contextual Experience Replay for Self-Improvement of Language Agents | Kontextuelles Erfahrungsreplay zur Selbstverbesserung von Sprachagenten | 语文代理人员自我改进的内事经验回顾 2506.06698v1 |
Authors (4): Yitao Liu, Chenglei Si, Karthik Narasimhan, Shunyu Yao
Large language model (LLM) agents have been applied to sequential decision-making tasks such as web navigation, but without any environment-specific experiences, they often fail in these complex tasks. Moreover, current LLM agents are not designed to continually learn from past experiences during inference time, which could be crucial for them to gain these environment-specific experiences. To address this, we propose Contextual Experience Replay (CER), a training-free framework to enable efficient self-improvement for language agents in their context window. Specifically, CER accumulates and synthesizes past experiences into a dynamic memory buffer. These experiences encompass environment dynamics and common decision-making patterns, allowing the agents to retrieve and augment themselves with relevant knowledge in new tasks, enhancing their adaptability in complex environments. We evaluate CER on the challenging WebArena and VisualWebArena benchmarks. On VisualWebArena, CER achieves a competitive performance of 31.9%. On WebArena, CER also gets a competitive average success rate of 36.7%, relatively improving the success rate of the GPT-4o agent baseline by 51.0%. We also conduct a comprehensive analysis of CER to demonstrate its efficiency and validity and to understand it better.
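The buffer-and-retrieve loop described above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the class `ExperienceBuffer`, its methods, and the word-overlap (Jaccard) retrieval score are all hypothetical stand-ins, and the abstract does not say how CER actually synthesizes or retrieves experiences.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    task: str      # description of a past task (hypothetical field)
    insight: str   # distilled lesson from that task (hypothetical field)


def _tokens(text):
    """Lowercased bag of words used for a crude similarity score."""
    return set(text.lower().split())


@dataclass
class ExperienceBuffer:
    """Toy in-context memory: store past insights, retrieve the most similar."""
    items: list = field(default_factory=list)

    def add(self, task, insight):
        self.items.append(Experience(task, insight))

    def retrieve(self, query, k=2):
        q = _tokens(query)

        def score(exp):
            # Jaccard overlap between query and stored task description.
            t = _tokens(exp.task)
            union = q | t
            return len(q & t) / len(union) if union else 0.0

        return sorted(self.items, key=score, reverse=True)[:k]

    def build_prompt(self, query, k=2):
        # Prepend retrieved insights to the new task; no model weights change.
        lines = [f"- {e.insight}" for e in self.retrieve(query, k)]
        return "Relevant past experience:\n" + "\n".join(lines) + f"\n\nTask: {query}"
```

Because everything happens in the prompt, a sketch like this stays training-free: the agent improves only through what is placed in its context window.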
Article 1415
Title@2025-06-07 (6): Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning
Title: Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning | Breaking Data Silos: Auf dem Weg zu offenen und skalierbaren Mobilitäts-Stiftungsmodellen durch generatives kontinuierliches Lernen | 突破数据硅:通过创造持续学习实现开放和可缩放的流动基金会模式 2506.06694v1 |
Authors (5): Yuan Yuan, Yukun Liu, Chonghua Han, Jie Feng, Yong Li
Foundation models have revolutionized fields such as natural language processing and computer vision by enabling general-purpose learning across diverse tasks and datasets. However, building analogous models for human mobility remains challenging due to the privacy-sensitive nature of mobility data and the resulting data silos across institutions. To bridge this gap, we propose MoveGCL, a scalable and privacy-preserving framework for training mobility foundation models via generative continual learning. Without sharing raw data, MoveGCL enables decentralized and progressive model evolution by replaying synthetic trajectories generated from a frozen teacher model, and reinforces knowledge retention through a tailored distillation strategy that mitigates catastrophic forgetting. To address the heterogeneity of mobility patterns, MoveGCL incorporates a Mixture-of-Experts Transformer with a mobility-aware expert routing mechanism, and employs a layer-wise progressive adaptation strategy to stabilize continual updates. Experiments on six real-world urban datasets demonstrate that MoveGCL achieves performance comparable to joint training and significantly outperforms federated learning baselines, while offering strong privacy protection. MoveGCL marks a crucial step toward unlocking foundation models for mobility, offering a practical blueprint for open, scalable, and privacy-preserving model development in the era of foundation models.
Article 1416
Title@2025-06-07 (6): From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
Title: From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications | Vom Tiefrang gradienten Subraumstabilisierung zu Low-Rank-Gewichten: Beobachtungen, Theorien und Anwendungen | 从低级梯度子空间稳定到低射速重量:观测、理论和应用 2407.11239v2 |
Authors (9): Ajay Jaiswal, Yifan Wang, Lu Yin, Shiwei Liu, Runjin Chen, Jiawei Zhao, Ananth Grama, Yuandong Tian, Zhangyang Wang
Large Language Models’ (LLMs) weight matrices can often be expressed in low-rank form with the potential to relax memory and compute resource requirements. Unlike prior efforts that focus on developing novel matrix decompositions, in this work we study the non-uniform low-rank properties of weight matrices in LLMs through the lens of stabilizing gradient subspace. First, we provide a theoretical framework to understand the stabilization of gradient subspaces through Hessian analysis. Second, we empirically establish an important relationship between gradient dynamics and low-rank expressiveness of weight matrices. Our findings reveal that different LLM components exhibit varying levels of converged low-rank structures, necessitating variable rank reduction across them to minimize drop in performance due to compression. Drawing on this result, we present Weight Low-Rank Projection (WeLore) that unifies weight compression and memory-efficient fine-tuning into one, in a data-agnostic and one-shot manner. When used as a compression technique, WeLore categorizes weight matrices into Low-rank Components (LRCs) and Non-Low-rank Components (N-LRCs) and suitably encodes them for minimum performance loss. Our gradient dynamics perspective illustrates that LRCs tend to have better fine-tuning capabilities and their standalone fine-tuning can closely mimic and sometimes outperform the training loss trajectory and performance of full fine-tuning with notable memory and compute footprint reduction. Code is available at https://github.com/VITA-Group/WeLore.
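To make the idea of non-uniform low-rank structure concrete, here is a small NumPy sketch based on the standard truncated SVD. It is not the WeLore procedure itself: the helper names and the 99% spectral-energy criterion are assumptions chosen for illustration, and the paper's actual rule for separating LRCs from N-LRCs is not given in the abstract.

```python
import numpy as np


def rank_for_energy(W, energy=0.99):
    """Smallest rank whose singular values keep `energy` of the squared spectrum."""
    s = np.linalg.svd(W, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(cum, energy)) + 1
    return min(r, len(s))  # guard against floating-point round-off at energy=1.0


def low_rank_factors(W, r):
    """Return A (m x r) and B (r x n) such that A @ B is the best rank-r fit to W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * s[:r]   # scale the first r left singular vectors
    B = Vt[:r]
    return A, B
```

A matrix that needs only a small rank to retain most of its spectrum can be stored as the two factors `A` and `B`, which together hold `r * (m + n)` numbers instead of `m * n`; a matrix with a flat spectrum gains little from this and is better left dense.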
Article 1417
Title@2025-06-07 (6): Learning Robust Heterogeneous Graph Representations via Contrastive-Reconstruction under Sparse Semantics
Title: Learning Robust Heterogeneous Graph Representations via Contrastive-Reconstruction under Sparse Semantics | Robuste Heterogene Graphendarstellungen über Kontrastive-Rekonstruktion unter Sparsemantik lernen | 在粗简语义下通过对比重建进行学习强强强异基因图示 2506.06682v1 |
Authors (4): Di Lin, Wanjing Ren, Xuanbin Li, Rui Zhang
In graph self-supervised learning, masked autoencoders (MAE) and contrastive learning (CL) are two prominent paradigms. MAE focuses on reconstructing masked elements, while CL maximizes similarity between augmented graph views. Recent studies highlight their complementarity: MAE excels at local feature capture, and CL at global information extraction. Hybrid frameworks for homogeneous graphs have been proposed, but face challenges in designing shared encoders to meet the semantic requirements of both tasks. In semantically sparse scenarios, CL struggles with view construction, and gradient imbalance between positive and negative samples persists. This paper introduces HetCRF, a novel dual-channel self-supervised learning framework for heterogeneous graphs. HetCRF uses a two-stage aggregation strategy to adapt embedding semantics, making it compatible with both MAE and CL. To address semantic sparsity, it enhances encoder output for view construction instead of relying on raw features, improving efficiency. Two positive sample augmentation strategies are also proposed to balance gradient contributions. Node classification experiments on four real-world heterogeneous graph datasets demonstrate that HetCRF outperforms state-of-the-art baselines. On datasets with missing node features, such as Aminer and Freebase, at a 40% label rate in node classification, HetCRF improves the Macro-F1 score by 2.75% and 2.2% respectively compared to the second-best baseline, validating its effectiveness and superiority.
Article 1418
Title@2025-06-07 (6): From Offline to Online Memory-Free and Task-Free Continual Learning via Fine-Grained Hypergradients
Title: From Offline to Online Memory-Free and Task-Free Continual Learning via Fine-Grained Hypergradients | Von Offline zu Online Memory-Free und Task-Free Continual Learning über feinkörnige Hypergradienten | 从离线到在线无记忆和无任务持续学习 2502.18762v2 |
Authors (4): Nicolas Michel, Maorong Wang, Jiangpeng He, Toshihiko Yamasaki
Continual Learning (CL) aims to learn from a non-stationary data stream where the underlying distribution changes over time. While recent advances have produced efficient memory-free methods in the offline CL (offCL) setting, where tasks are known in advance and data can be revisited, online CL (onCL) remains dominated by memory-based approaches. The transition from offCL to onCL is challenging, as many offline methods rely on (1) prior knowledge of task boundaries and (2) sophisticated scheduling or optimization schemes, both of which are unavailable when data arrives sequentially and can be seen only once. In this paper, we investigate the adaptation of state-of-the-art memory-free offCL methods to the online setting. We first show that augmenting these methods with lightweight prototypes significantly improves performance, albeit at the cost of increased Gradient Imbalance, which biases learning towards earlier tasks. To address this issue, we introduce Fine-Grained Hypergradients, an online mechanism for rebalancing gradient updates during training. Our experiments demonstrate that the synergy between prototype memory and hypergradient reweighting substantially enhances the performance of memory-free methods in onCL and surpasses onCL baselines. Code will be released upon acceptance.
Article 1419
Title@2025-06-07 (6): Interpretation of Deep Learning Model in Embryo Selection for In Vitro Fertilization (IVF) Treatment
Title: Interpretation of Deep Learning Model in Embryo Selection for In Vitro Fertilization (IVF) Treatment | Interpretation von Deep-Learning-Modell in der Embryo-Auswahl für die In-Vitro-Düngung (IVF) Behandlung | 体外受肥(IVF)治疗Embryo选择 Empryo的深学习模型解释 2506.06680v1 |
Authors (7): Radha Kodali, Venkata Rao Dhulipalla, Venkata Siva Kishor Tatavarty, Madhavi Nadakuditi, Bharadwaj Thiruveedhula, Suryanarayana Gunnam, Durga Prasad Bavirisetti
Infertility has a considerable impact on individuals’ quality of life, affecting them socially and psychologically, with projections indicating a rise in the upcoming years. In vitro fertilization (IVF) emerges as one of the primary techniques within economically developed nations, employed to address the rising problem of low fertility. Expert embryologists conventionally grade embryos by reviewing blastocyst images to select the best candidates for transfer, yet this process is time-consuming and inefficient. Blastocyst images provide a valuable resource for assessing embryo viability. In this study, we introduce an explainable artificial intelligence (XAI) framework for classifying embryos, employing a fusion of convolutional neural network (CNN) and long short-term memory (LSTM) architecture, referred to as CNN-LSTM. Utilizing deep learning, our model achieves high accuracy in embryo classification while maintaining interpretability through XAI.
Article 1420
Title@2025-06-07 (6): Rethinking Benign Overfitting in Two-Layer Neural Networks
Title: Rethinking Benign Overfitting in Two-Layer Neural Networks | Neudenken Benign Overfitting in Zwei-Ebenen-Neural-Netzwerke | 重新思考两层神经网络中的 “ 仁善 “ 改造 2502.11893v2 |
Authors (2): Ruichen Xu, Kexin Chen
Recent theoretical studies (Kou et al., 2023; Cao et al., 2022) have revealed a sharp phase transition from benign to harmful overfitting when the noise-to-feature ratio exceeds a threshold, a situation common in long-tailed data distributions where atypical data is prevalent. However, harmful overfitting rarely happens in overparameterized neural networks. Further experimental results suggested that memorization is necessary for achieving near-optimal generalization error in long-tailed data distributions (Feldman & Zhang, 2020). We argue that this discrepancy between theoretical predictions and empirical observations arises because previous feature-noise data models overlook the heterogeneous nature of noise across different data classes. In this paper, we refine the feature-noise data model by incorporating class-dependent heterogeneous noise and re-examine the overfitting phenomenon in neural networks. Through a comprehensive analysis of the training dynamics, we establish test loss bounds for the refined model. Our findings reveal that neural networks can leverage “data noise” to learn implicit features that improve the classification accuracy for long-tailed data. Our analysis also provides a training-free metric for evaluating data influence on test performance. Experimental validation on both synthetic and real-world datasets supports our theoretical results.
Article 1421
Title@2025-06-07 (6): Accounting for plasticity: An extension of inelastic Constitutive Artificial Neural Networks
Title: Accounting for plasticity: An extension of inelastic Constitutive Artificial Neural Networks | Bilanzierung der Plastizität: Eine Erweiterung der inelastischen konstitutiven künstlichen Neuronalen Netzwerke | 可塑性核算:无弹性建筑人工神经神经网络的扩展 2407.19326v2 |
Authors (3): Birte Boes, Jaan-Willem Simon, Hagen Holthusen
In this work, we extend the existing framework of inelastic constitutive artificial neural networks (iCANNs) by incorporating plasticity to increase their applicability to model more complex material behavior. The proposed approach ensures objectivity, material symmetry, and thermodynamic consistency, providing a robust basis for automatic model discovery of constitutive equations at finite strains. These are predicted by discovering formulations for the Helmholtz free energy and plastic potentials for the yield function and evolution equations in terms of feed-forward networks. Our framework captures both linear and nonlinear kinematic hardening behavior. Investigation of our model’s prediction showed that the extended iCANNs successfully predict both linear and nonlinear kinematic hardening behavior based on experimental and artificially generated datasets, showcasing promising capabilities of this framework. Nonetheless, challenges remain in discovering more complex yield criteria with tension-compression asymmetry and addressing deviations in experimental data at larger strains. Despite these limitations, the proposed framework provides a promising basis for incorporating plasticity into iCANNs, offering a platform for advancing in the field of automated model discovery.
Article 1422
Title@2025-06-07 (6): Flood-DamageSense: Multimodal Mamba with Multitask Learning for Building Flood Damage Assessment using SAR Remote Sensing Imagery
Title: Flood-DamageSense: Multimodal Mamba with Multitask Learning for Building Flood Damage Assessment using SAR Remote Sensing Imagery | Flood-DamageSense: Multimodales Mamba mit Multitasking-Lernen zur Erstellung von Hochwasserschäden mit Hilfe von SAR Remote Sensing Imagery | 水灾-灾情警报:利用SAR遥感图像学进行水灾损害评估的多任务学习的多模式Mamba 2506.06667v1 |
Authors (2): Yu-Hsuan Ho, Ali Mostafavi
Most post-disaster damage classifiers succeed only when destructive forces leave clear spectral or structural signatures – conditions rarely present after inundation. Consequently, existing models perform poorly at identifying flood-related building damages. The model presented in this study, Flood-DamageSense, addresses this gap as the first deep-learning framework purpose-built for building-level flood-damage assessment. The architecture fuses pre- and post-event SAR/InSAR scenes with very-high-resolution optical basemaps and an inherent flood-risk layer that encodes long-term exposure probabilities, guiding the network toward plausibly affected structures even when compositional change is minimal. A multimodal Mamba backbone with a semi-Siamese encoder and task-specific decoders jointly predicts (1) graded building-damage states, (2) floodwater extent, and (3) building footprints. Training and evaluation on Hurricane Harvey (2017) imagery from Harris County, Texas – supported by insurance-derived property-damage extents – show a mean F1 improvement of up to 19 percentage points over state-of-the-art baselines, with the largest gains in the frequently misclassified “minor” and “moderate” damage categories. Ablation studies identify the inherent-risk feature as the single most significant contributor to this performance boost. An end-to-end post-processing pipeline converts pixel-level outputs to actionable, building-scale damage maps within minutes of image acquisition. By combining risk-aware modeling with SAR’s all-weather capability, Flood-DamageSense delivers faster, finer-grained, and more reliable flood-damage intelligence to support post-disaster decision-making and resource allocation.
Article 1423
Title@2025-06-07 (6): Through the Gaps: Uncovering Tactical Line-Breaking Passes with Clustering
Title: Through the Gaps: Uncovering Tactical Line-Breaking Passes with Clustering | Durch die Lücken: Entdeckt taktische Linien-Breaking-Pässe mit Clustering | 穿透差距:以集束方式破开战术直线通道 2506.06666v1 |
Authors (2): Oktay Karakuş, Hasan Arkadaş
Line-breaking passes (LBPs) are crucial tactical actions in football, allowing teams to penetrate defensive lines and access high-value spaces. In this study, we present an unsupervised, clustering-based framework for detecting and analysing LBPs using synchronised event and tracking data from elite matches. Our approach models opponent team shape through vertical spatial segmentation and identifies passes that disrupt defensive lines within open play. Beyond detection, we introduce several tactical metrics, including the space build-up ratio (SBR) and two chain-based variants, LBPCh$^1$ and LBPCh$^2$, which quantify the effectiveness of LBPs in generating immediate or sustained attacking threats. We evaluate these metrics across teams and players in the 2022 FIFA World Cup, revealing stylistic differences in vertical progression and structural disruption. The proposed methodology is explainable, scalable, and directly applicable to modern performance analysis and scouting workflows.
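One way to picture the detection step is sketched below. It is a deliberately simplified, hypothetical version: defenders are grouped into lines with a plain 1-D k-means over their positions along the pitch length, and a pass counts as line-breaking when it travels from behind a line to beyond it. The function names, the choice of three lines, and the assumption that the attack runs towards increasing x are illustrative only; the paper's actual segmentation and open-play filtering are not specified in the abstract.

```python
import numpy as np


def defensive_lines(xs, n_lines=3, n_iter=50):
    """Cluster defender x-positions into `n_lines` lines (1-D k-means).

    Returns the sorted line positions (cluster means).
    """
    xs = np.asarray(xs, dtype=float)
    # Deterministic initialisation at evenly spaced quantiles.
    centers = np.quantile(xs, (np.arange(n_lines) + 0.5) / n_lines)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(xs[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([
            xs[labels == k].mean() if np.any(labels == k) else centers[k]
            for k in range(n_lines)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return np.sort(centers)


def lines_broken(pass_start_x, pass_end_x, line_xs):
    """Lines strictly between the pass origin and destination (attack towards +x)."""
    return [float(x) for x in line_xs if pass_start_x < x < pass_end_x]
```

For example, with ten outfield players forming lines near x = 60, 75 and 90, a pass played from x = 55 to x = 80 would be flagged as breaking the first two lines, while a short pass from x = 65 to x = 70 would break none.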
Article 1424
Title@2025-06-07 (6): SDP-CROWN: Efficient Bound Propagation for Neural Network Verification with Tightness of Semidefinite Programming
Title: SDP-CROWN: Efficient Bound Propagation for Neural Network Verification with Tightness of Semidefinite Programming | SDP-CROWN: Effiziente Bound Propagation für neurale Netzwerkverifikation mit Zähigkeit der semidefiniten Programmierung | SDP-CROWN: 有效促进神经网络核查和半无限期方案拟订的严格性 2506.06665v1 |
Authors (4): Hong-Ming Chiu, Hao Chen, Huan Zhang, Richard Y. Zhang
Neural network verifiers based on linear bound propagation scale impressively to massive models but can be surprisingly loose when neuron coupling is crucial. Conversely, semidefinite programming (SDP) verifiers capture inter-neuron coupling naturally, but their cubic complexity restricts them to only small models. In this paper, we propose SDP-CROWN, a novel hybrid verification framework that combines the tightness of SDP relaxations with the scalability of bound-propagation verifiers. At the core of SDP-CROWN is a new linear bound, derived via SDP principles, that explicitly captures $\ell_{2}$-norm-based inter-neuron coupling while adding only one extra parameter per layer. This bound can be integrated seamlessly into any linear bound-propagation pipeline, preserving the inherent scalability of such methods yet significantly improving tightness. In theory, we prove that our inter-neuron bound can be up to a factor of $\sqrt{n}$ tighter than traditional per-neuron bounds. In practice, when incorporated into the state-of-the-art $\alpha$-CROWN verifier, we observe markedly improved verification performance on large models with up to 65 thousand neurons and 2.47 million parameters, achieving tightness that approaches that of costly SDP-based methods.
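To see where a $\sqrt{n}$ factor can come from, consider a textbook comparison (an illustration under simplifying assumptions, not the paper's proof). Bound a linear function $c^\top x$ over an $\ell_2$ ball of radius $\rho$ in $\mathbb{R}^n$, and then over the enclosing box $\|x\|_\infty \le \rho$ that a per-coordinate relaxation would use. The symbols $c$, $\rho$ and $n$ are introduced here for illustration only.

```latex
\max_{\|x\|_2 \le \rho} c^\top x \;=\; \rho\,\|c\|_2,
\qquad
\max_{\|x\|_\infty \le \rho} c^\top x \;=\; \rho\,\|c\|_1,
\qquad
1 \;\le\; \frac{\|c\|_1}{\|c\|_2} \;\le\; \sqrt{n}.
```

The upper bound on the ratio follows from the Cauchy-Schwarz inequality and is attained when all entries of $c$ have equal magnitude, so ignoring the coupling between coordinates can loosen the bound by up to $\sqrt{n}$ in this toy setting.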
Article 1425
Title@2025-06-07 (6): SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL
Title: SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL | SLAC: Simulationsvorausgesetzter Latent Action Space für Ganzkörper-Real-World RL | SLAC: 模拟 - 预先准备的中转行动空间,用于整个模拟 - 模拟 - 现实世界RL 2506.04147v2 |
Authors (3): Jiaheng Hu, Peter Stone, Roberto Martín-Martín
Building capable household and industrial robots requires mastering the control of versatile, high-degree-of-freedom (DoF) systems such as mobile manipulators. While reinforcement learning (RL) holds promise for autonomously acquiring robot control policies, scaling it to high-DoF embodiments remains challenging. Direct RL in the real world demands both safe exploration and high sample efficiency, which are difficult to achieve in practice. Sim-to-real RL, on the other hand, is often brittle due to the reality gap. This paper introduces SLAC, a method that renders real-world RL feasible for complex embodiments by leveraging a low-fidelity simulator to pretrain a task-agnostic latent action space. SLAC trains this latent action space via a customized unsupervised skill discovery method designed to promote temporal abstraction, disentanglement, and safety, thereby facilitating efficient downstream learning. Once a latent action space is learned, SLAC uses it as the action interface for a novel off-policy RL algorithm to autonomously learn downstream tasks through real-world interactions. We evaluate SLAC against existing methods on a suite of bimanual mobile manipulation tasks, where it achieves state-of-the-art performance. Notably, SLAC learns contact-rich whole-body tasks in under an hour of real-world interactions, without relying on any demonstrations or hand-crafted behavior priors. More information, code, and videos are available at robo-rl.github.io.
Article 1426
Title@2025-06-07 (6): Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence
Title: Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence | Kurzlanges Adversarial Training hilft LLMs, langanhaltende Jailbreak Attacken zu verteidigen: Theoretische und empirische Beweise | 短期对面培训帮助长者捍卫长期越狱袭击:理论和经验证据 2502.04204v2 |
Authors (4): Shaopeng Fu, Liang Ding, Jingfeng Zhang, Di Wang
Jailbreak attacks against large language models (LLMs) aim to induce harmful behaviors in LLMs through carefully crafted adversarial prompts. To mitigate attacks, one way is to perform adversarial training (AT)-based alignment, i.e., training LLMs on some of the most adversarial prompts to help them learn how to behave safely under attacks. During AT, the length of adversarial prompts plays a critical role in the robustness of aligned LLMs. While long-length adversarial prompts during AT might lead to strong LLM robustness, their synthesis however is very resource-consuming, which may limit the application of LLM AT. This paper focuses on adversarial suffix jailbreak attacks and unveils that to defend against a jailbreak attack with an adversarial suffix of length $\Theta(M)$, it is enough to align LLMs on prompts with adversarial suffixes of length $\Theta(\sqrt{M})$. Theoretically, we analyze the adversarial in-context learning of linear transformers on linear regression tasks and prove a robust generalization bound for trained transformers. The bound depends on the term $\Theta(\sqrt{M_{\text{test}}}/M_{\text{train}})$, where $M_{\text{train}}$ and $M_{\text{test}}$ are the numbers of adversarially perturbed in-context samples during training and testing. Empirically, we conduct AT on popular open-source LLMs and evaluate their robustness against jailbreak attacks of different adversarial suffix lengths. Results confirm a positive correlation between the attack success rate and the ratio of the square root of the adversarial suffix length during jailbreaking to the length during AT. Our findings show that it is practical to defend against "long-length" jailbreak attacks via efficient "short-length" AT. The code is available at https://github.com/fshp971/adv-icl.
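The scaling in the bound can be checked with a few lines of arithmetic. The sketch below simply evaluates the term $\sqrt{M_{\text{test}}}/M_{\text{train}}$ and ignores the constants hidden inside $\Theta(\cdot)$; the function names and the `target` parameter are hypothetical and serve only to show that growing the training suffix length like the square root of the test suffix length keeps the term constant.

```python
import math


def bound_term(m_test, m_train):
    """The ratio sqrt(M_test) / M_train that appears in the bound (constants dropped)."""
    return math.sqrt(m_test) / m_train


def train_length_for(m_test, target=1.0):
    """Smallest integer M_train that keeps the ratio at or below `target`."""
    return math.ceil(math.sqrt(m_test) / target)
```

Under this simplification, quadrupling the attack suffix length from 100 to 400 only requires doubling the training suffix length from 10 to 20 to hold the ratio at 1.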
Article 1427
Title@2025-06-07 (6): Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning
Title: Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning | Q-Policy: Quantenverstärkte Politikevaluierung für skalierbares Stärkungslernen | Q-政策:为可扩展的强化学习加强政策评价 2505.11862v2 |
Authors (3): Kalyan Cherukuri, Aarav Lala, Yash Yardi
We propose Q-Policy, a hybrid quantum-classical reinforcement learning (RL) framework that mathematically accelerates policy evaluation and optimization by exploiting quantum computing primitives. Q-Policy encodes value functions in quantum superposition, enabling simultaneous evaluation of multiple state-action pairs via amplitude encoding and quantum parallelism. We introduce a quantum-enhanced policy iteration algorithm with provable polynomial reductions in sample complexity for the evaluation step, under standard assumptions. To demonstrate the technical feasibility and theoretical soundness of our approach, we validate Q-Policy on classical emulations of small discrete control tasks. Due to current hardware and simulation limitations, our experiments focus on showcasing proof-of-concept behavior rather than large-scale empirical evaluation. Our results support the potential of Q-Policy as a theoretical foundation for scalable RL on future quantum devices, addressing RL scalability challenges beyond classical approaches.
Article 1428
Title@2025-06-07 (6): Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning
Title: Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning | Pareto-Optimal Rewards von Noisy Preferences lernen: Ein Rahmen für multi-objektives Inverse-Verstärkung-Lernen | 从新偏爱中学习 Pareto- Opatimal 奖励:多目标反强化学习框架 2505.11864v2 |
Authors (2): Kalyan Cherukuri, Aarav Lala
As generative agents become increasingly capable, alignment of their behavior with complex human values remains a fundamental challenge. Existing approaches often simplify human intent through reduction to a scalar reward, overlooking the multi-faceted nature of human feedback. In this work, we introduce a theoretical framework for preference-based Multi-Objective Inverse Reinforcement Learning (MO-IRL), where human preferences are modeled as latent vector-valued reward functions. We formalize the problem of recovering a Pareto-optimal reward representation from noisy preference queries and establish conditions for identifying the underlying multi-objective structure. We derive tight sample complexity bounds for recovering $\epsilon$-approximations of the Pareto front and introduce a regret formulation to quantify suboptimality in this multi-objective setting. Furthermore, we propose a provably convergent algorithm for policy optimization using preference-inferred reward cones. Our results bridge the gap between practical alignment techniques and theoretical guarantees, providing a principled foundation for learning aligned behaviors in a high-dimensional and value-pluralistic environment.
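For readers unfamiliar with the Pareto front of vector-valued rewards, the following sketch shows the basic notion the abstract relies on: a candidate is Pareto-optimal if no other candidate is at least as good on every objective and strictly better on at least one. The function name and the toy data in the usage note are illustrative; this is the standard non-dominated filter, not the paper's recovery algorithm.

```python
import numpy as np


def pareto_front(rewards):
    """Indices of non-dominated rows, where larger is better on every objective."""
    R = np.asarray(rewards, dtype=float)
    keep = []
    for i, r in enumerate(R):
        # Row j dominates r if it is >= everywhere and > somewhere.
        dominated = np.any(np.all(R >= r, axis=1) & np.any(R > r, axis=1))
        if not dominated:
            keep.append(i)
    return keep
```

With two objectives and candidates `[[1, 5], [2, 4], [3, 3], [2, 2], [0, 6], [3, 1]]`, the candidates `[2, 2]` and `[3, 1]` are dominated by `[3, 3]`, and the remaining four form the front. Collapsing these vectors to a single scalar reward would pick one point and hide the trade-off among the others.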
Article 1429
Title@2025-06-07 (6): GraphRAG under Fire
Title: GraphRAG under Fire | GraphRAG unter Feuer | 火下的图形RAG 2501.14050v3 |
Authors (7): Jiacheng Liang, Yuhui Wang, Changjiang Li, Rongyi Zhu, Tanqiu Jiang, Neil Gong, Ting Wang
GraphRAG advances retrieval-augmented generation (RAG) by structuring external knowledge as multi-scale knowledge graphs, enabling language models to integrate both broad context and granular details in their generation. While GraphRAG has demonstrated success across domains, its security implications remain largely unexplored. To bridge this gap, this work examines GraphRAG’s vulnerability to poisoning attacks, uncovering an intriguing security paradox: existing RAG poisoning attacks are less effective under GraphRAG than conventional RAG, due to GraphRAG’s graph-based indexing and retrieval; yet, the same features also create new attack surfaces. We present GragPoison, a novel attack that exploits shared relations in the underlying knowledge graph to craft poisoning text capable of compromising multiple queries simultaneously. GragPoison employs three key strategies: (i) relation injection to introduce false knowledge, (ii) relation enhancement to amplify poisoning influence, and (iii) narrative generation to embed malicious content within coherent text. Empirical evaluation across diverse datasets and models shows that GragPoison substantially outperforms existing attacks in terms of effectiveness (up to 98% success rate) and scalability (using less than 68% poisoning text) on multiple variations of GraphRAG. We also explore potential defensive measures and their limitations, identifying promising directions for future research.
Article 1430
Title@2025-06-07 (6): Rescaled Influence Functions: Accurate Data Attribution in High Dimension
Title: Rescaled Influence Functions: Accurate Data Attribution in High Dimension | Reskalierte Einflussfunktionen: Genaue Datenzuweisung in hoher Dimension | 重新尺度影响函数: 高维度的准确数据归属 2506.06656v1 |
Authors (2): Ittai Rubinstein, Samuel B. Hopkins
How does the training data affect a model’s behavior? This is the question we seek to answer with data attribution. The leading practical approaches to data attribution are based on influence functions (IF). IFs utilize a first-order Taylor approximation to efficiently predict the effect of removing a set of samples from the training set without retraining the model, and are used in a wide variety of machine learning applications. However, especially in the high-dimensional regime ($\#\,\text{params} \geq \Omega(\#\,\text{samples})$), they are often imprecise and tend to underestimate the effect of sample removals, even for simple models such as logistic regression. We present rescaled influence functions (RIF), a new tool for data attribution which can be used as a drop-in replacement for influence functions, with little computational overhead but significant improvement in accuracy. We compare IF and RIF on a range of real-world datasets, showing that RIFs offer significantly better predictions in practice, and present a theoretical analysis explaining this improvement. Finally, we present a simple class of data poisoning attacks that would fool IF-based detections but would be detected by RIF.
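The underestimation described above can be seen in the simplest setting, ordinary least squares, where the exact leave-one-out change has a closed form. The sketch below is not the paper's RIF estimator; it uses the classical leverage identity only to show how far a first-order estimate can fall short when the number of parameters is comparable to the number of samples. The function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def loo_parameter_change(X, y, i):
    """Return (first-order IF estimate, exact change) in OLS parameters when sample i is removed."""
    H_inv = np.linalg.inv(X.T @ X)            # inverse Hessian of the squared loss (up to a constant)
    theta = H_inv @ X.T @ y                   # full-data OLS solution
    resid = y[i] - X[i] @ theta               # residual of the removed sample
    leverage = X[i] @ H_inv @ X[i]            # h_i, between 0 and 1
    delta_if = -H_inv @ X[i] * resid          # first-order (influence-function) estimate
    delta_exact = delta_if / (1.0 - leverage) # exact leave-one-out change (Sherman-Morrison)
    return delta_if, delta_exact

def retrain_without(X, y, i):
    """Ground truth: refit OLS without sample i and return the parameter change."""
    mask = np.arange(len(y)) != i
    theta_full = np.linalg.lstsq(X, y, rcond=None)[0]
    theta_loo = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    return theta_loo - theta_full

# Synthetic high-dimensional example (assumed sizes): d is comparable to n.
rng = np.random.default_rng(0)
n, d = 60, 40
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)

delta_if, delta_exact = loo_parameter_change(X, y, i=0)
delta_retrain = retrain_without(X, y, i=0)
print(np.linalg.norm(delta_if), np.linalg.norm(delta_exact), np.linalg.norm(delta_retrain))
```

Because the average leverage is d/n, the factor 1/(1 - h_i) grows as d approaches n, which is one way to read the abstract's remark that first-order estimates are too small in high dimension.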
Article 1431
Title@2025-06-07 (6): Explaining Risks: Axiomatic Risk Attributions for Financial Models
Title: Explaining Risks: Axiomatic Risk Attributions for Financial Models | Risiken erklären: Axiomatische Risikozuweisungen für Finanzmodelle | 解释风险:金融模型的逻辑风险归属 2506.06653v1 |
Authors (1): Dangxing Chen
In recent years, machine learning models have achieved great success at the expense of highly complex black-box structures. By using axiomatic attribution methods, we can fairly allocate the contributions of each feature, thus allowing us to interpret the model predictions. In high-risk sectors such as finance, risk is just as important as mean predictions. Throughout this work, we address the following risk attribution problem: how to fairly allocate the risk given a model with data? We demonstrate with analysis and empirical examples that risk can be well allocated by extending the Shapley value framework.
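As a small illustration of Shapley-style risk allocation, the sketch below computes exact Shapley values by enumerating orderings and applies them to a toy risk measure. The choice of portfolio variance as the risk function and the covariance matrix are assumptions for illustration; the abstract does not say which risk measures the paper uses.

```python
from itertools import permutations
from math import factorial

import numpy as np

def shapley_values(value_fn, n_players):
    """Exact Shapley values: average marginal contribution over all orderings."""
    phi = np.zeros(n_players)
    for order in permutations(range(n_players)):
        coalition = []
        prev = value_fn(coalition)
        for player in order:
            coalition.append(player)
            cur = value_fn(coalition)
            phi[player] += cur - prev   # marginal contribution of this player
            prev = cur
    return phi / factorial(n_players)

# Hypothetical covariance matrix of three risk factors.
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.01]])

def portfolio_variance(coalition):
    """Risk of a coalition: variance of the sum of its factors."""
    idx = list(coalition)
    if not idx:
        return 0.0
    return float(cov[np.ix_(idx, idx)].sum())

phi = shapley_values(portfolio_variance, 3)
print(phi, phi.sum(), portfolio_variance([0, 1, 2]))
```

For this particular risk function the allocation has a closed form: each factor receives its covariance with the total, that is, the corresponding row sum of `cov`, and the allocations add up to the total variance.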
Article 1432
Title@2025-06-07 (6): Protap: A Benchmark for Protein Modeling on Realistic Downstream Applications
Title: Protap: A Benchmark for Protein Modeling on Realistic Downstream Applications | Protap: Ein Benchmark für Proteinmodellierung auf realistischen Downstream-Anwendungen | Protap:关于现实下下游应用的蛋白质建模基准 2506.02052v2 |
Authors (10): Shuo Yan, Yuliang Yan, Bin Ma, Chenao Li, Haochun Tang, Jiahua Lu, Minhua Lin, Yuyuan Feng, Hui Xiong, Enyan Dai
Recently, extensive deep learning architectures and pretraining strategies have been explored to support downstream protein applications. Additionally, domain-specific models incorporating biological knowledge have been developed to enhance performance in specialized tasks. In this work, we introduce $\textbf{Protap}$, a comprehensive benchmark that systematically compares backbone architectures, pretraining strategies, and domain-specific models across diverse and realistic downstream protein applications. Specifically, Protap covers five applications: three general tasks and two novel specialized tasks, i.e., enzyme-catalyzed protein cleavage site prediction and targeted protein degradation, which are industrially relevant yet missing from existing benchmarks. For each application, Protap compares various domain-specific models and general architectures under multiple pretraining settings. Our empirical studies imply that: (i) Though large-scale pretraining encoders achieve great results, they often underperform supervised encoders trained on small downstream training sets. (ii) Incorporating structural information during downstream fine-tuning can match or even outperform protein language models pretrained on large-scale sequence corpora. (iii) Domain-specific biological priors can enhance performance on specialized downstream tasks. Code and datasets are publicly available at https://github.com/Trust-App-AI-Lab/protap.
Article 1433
Title@2025-06-07 (6): SAFER: A Calibrated Risk-Aware Multimodal Recommendation Model for Dynamic Treatment Regimes
Title: SAFER: A Calibrated Risk-Aware Multimodal Recommendation Model for Dynamic Treatment Regimes | SAFER: Ein kalibriertes multimodales Empfehlungsmodell für dynamische Behandlungssysteme | SAFER: 动态处理制度经校准的风险软件多式建议模型 2506.06649v1 |
Authors (4): Yishan Shen, Yuyang Ye, Hui Xiong, Yong Chen
Dynamic treatment regimes (DTRs) are critical to precision medicine, optimizing long-term outcomes through personalized, real-time decision-making in evolving clinical contexts, but require careful supervision for unsafe treatment risks. Existing efforts rely primarily on clinician-prescribed gold standards despite the absence of a known optimal strategy, and predominantly use structured EHR data without extracting valuable insights from clinical notes, limiting their reliability for treatment recommendations. In this work, we introduce SAFER, a calibrated risk-aware tabular-language recommendation framework for DTR that integrates both structured EHR and clinical notes, enabling them to learn from each other, and addresses inherent label uncertainty by assuming an ambiguous optimal treatment solution for deceased patients. Moreover, SAFER employs conformal prediction to provide statistical guarantees, ensuring safe treatment recommendations while filtering out uncertain predictions. Experiments on two publicly available sepsis datasets demonstrate that SAFER outperforms state-of-the-art baselines across multiple recommendation metrics and counterfactual mortality rate, while offering robust formal assurances. These findings underscore SAFER’s potential as a trustworthy and theoretically grounded solution for high-stakes DTR applications.
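The conformal-prediction step mentioned above can be sketched with generic split conformal classification. This is a textbook version, not SAFER's procedure: the nonconformity score (one minus the probability of the true class), the simulated classifier, and the rule of acting only on singleton sets are all assumptions made for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split conformal: finite-sample-corrected quantile of calibration scores."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]   # nonconformity of the true class
    k = min(n, int(np.ceil((n + 1) * (1 - alpha))))      # rank of the corrected quantile
    return np.sort(scores)[k - 1]

def prediction_sets(probs, qhat):
    """Boolean matrix: entry (i, c) is True if class c is in the set for sample i."""
    return (1.0 - probs) <= qhat

# Simulated, well-calibrated 4-class predictor (purely synthetic data).
rng = np.random.default_rng(0)
K = 4

def simulate(n):
    probs = rng.dirichlet(np.ones(K), size=n)
    labels = np.array([rng.choice(K, p=p) for p in probs])
    return probs, labels

cal_probs, cal_labels = simulate(1000)
test_probs, test_labels = simulate(2000)

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
sets = prediction_sets(test_probs, qhat)
coverage = sets[np.arange(len(test_labels)), test_labels].mean()
confident = sets.sum(axis=1) == 1   # assumed filtering rule: act only on singleton sets
print(f"coverage={coverage:.3f}, fraction kept={confident.mean():.3f}")
```

Under exchangeability of calibration and test data, the sets contain the true label with probability at least 1 - alpha on average; the filtering rule trades the number of recommendations issued for confidence in each one.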
Article 1434
Title@2025-06-07 (6): Spark Transformer: Reactivating Sparsity in FFN and Attention
Title: Spark Transformer: Reactivating Sparsity in FFN and Attention | Spark Transformer: Sparsamkeit in FFN und Aufmerksamkeit reaktivieren | Spark 变换器: 重新激活FFN 的公平性和注意 2506.06644v1 |
Authors (19): Chong You, Kan Wu, Zhipeng Jia, Lin Chen, Srinadh Bhojanapalli, Jiaxian Guo, Utku Evci, Jan Wassenberg, Praneeth Netrapalli, Jeremiah J. Willcock, Suvinay Subramanian, Felix Chern, Alek Andreev, Shreya Pathak, Felix Yu, Prateek Jain, David E. Culler, Henry M. Levy, Sanjiv Kumar
The discovery of the lazy neuron phenomenon in trained Transformers, where the vast majority of neurons in their feed-forward networks (FFN) are inactive for each token, has spurred tremendous interest in activation sparsity for enhancing large model efficiency. While notable progress has been made in translating such sparsity to wall-time benefits, modern Transformers have moved away from the ReLU activation function crucial to this phenomenon. Existing efforts on re-introducing activation sparsity often degrade model quality, increase parameter count, or complicate or slow down training. Sparse attention, the application of sparse activation to the attention mechanism, often faces similar challenges. This paper introduces the Spark Transformer, a novel architecture that achieves a high level of activation sparsity in both FFN and the attention mechanism while maintaining model quality, parameter count, and standard training procedures. Our method realizes sparsity via top-$k$ masking for explicit control over sparsity level. Crucially, we introduce statistical top-$k$, a hardware-accelerator-friendly, linear-time approximate algorithm that avoids costly sorting and mitigates significant training slowdown from standard top-$k$ operators. Furthermore, Spark Transformer reallocates existing FFN parameters and attention key embeddings to form a low-cost predictor for identifying activated entries. This design not only mitigates quality loss from enforced sparsity, but also enhances wall-time benefit. Pretrained with the Gemma-2 recipe, Spark Transformer demonstrates competitive performance on standard benchmarks while exhibiting significant sparsity: only 8% of FFN neurons are activated, and each token attends to a maximum of 256 tokens. This sparsity translates to a 2.5x reduction in FLOPs, leading to decoding wall-time speedups of up to 1.79x on CPU and 1.40x on GPU.
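One way to get a linear-time, sort-free approximation to top-$k$ masking is to estimate the cutoff from summary statistics. The sketch below assumes the activations in each row are roughly Gaussian and places the threshold at the corresponding normal quantile; this is an illustrative reading of the idea, and the paper's exact estimator may differ.

```python
from statistics import NormalDist

import numpy as np

def statistical_top_k_mask(x, k):
    """Approximate top-k mask along the last axis without sorting.

    Assumes each row of x is approximately Gaussian; requires 0 < k < x.shape[-1].
    """
    n = x.shape[-1]
    z = NormalDist().inv_cdf(1.0 - k / n)      # standard-normal quantile leaving a k/n upper tail
    mean = x.mean(axis=-1, keepdims=True)      # one pass over the data
    std = x.std(axis=-1, keepdims=True)        # another pass; still linear time
    return x >= mean + z * std

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10_000))
mask = statistical_top_k_mask(x, k=800)        # target: keep about 8% of entries
kept = mask.sum(axis=-1)
print(kept)                                    # counts per row, close to but not exactly 800
```

Unlike an exact top-$k$, the number of surviving entries is only approximately $k$ and depends on how well the Gaussian assumption holds.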
Article 1435
Title@2025-06-07 (6): GLASS: Guided Latent Slot Diffusion for Object-Centric Learning
Title: GLASS: Guided Latent Slot Diffusion for Object-Centric Learning | GLASS: Geführte Latent Slot Diffusion für Objekt-Centric Learning | GLASS: 向导中中流槽扩散,用于对象中心学习 2407.17929v2 |
Authors (3): Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth
Object-centric learning aims to decompose an input image into a set of meaningful object files (slots). These latent object representations enable a variety of downstream tasks. Yet, object-centric learning struggles on real-world datasets, which contain multiple objects of complex textures and shapes in natural everyday scenes. To address this, we introduce Guided Latent Slot Diffusion (GLASS), a novel slot-attention model that learns in the space of generated images and uses semantic and instance guidance modules to learn better slot embeddings for various downstream tasks. Our experiments show that GLASS surpasses state-of-the-art slot-attention methods by a wide margin on tasks such as (zero-shot) object discovery and conditional image generation for real-world scenes. Moreover, GLASS enables the first application of slot attention to the compositional generation of complex, realistic scenes.
Article 1436
Title@2025-06-07 (6): Zero-shot Generalist Graph Anomaly Detection with Unified Neighborhood Prompts
Title: Zero-shot Generalist Graph Anomaly Detection with Unified Neighborhood Prompts | Zero-shot Generalist Graph Anomalie Erkennung mit Unified Neighborhood Prompts | 使用统一邻里提示器进行异常探测 2410.14886v2 |
Authors (5): Chaoxi Niu, Hezhe Qiao, Changlu Chen, Ling Chen, Guansong Pang
Graph anomaly detection (GAD), which aims to identify nodes in a graph that significantly deviate from normal patterns, plays a crucial role in broad application domains. However, existing GAD methods are one-model-for-one-dataset approaches, i.e., they train a separate model for each graph dataset. This largely limits their applicability in real-world scenarios. To overcome this limitation, we propose UNPrompt, a novel zero-shot generalist GAD approach that trains a one-for-all detection model: a single GAD model is trained on one graph dataset and then generalizes effectively to detect anomalies in other graph datasets without any retraining or fine-tuning. The key insight in UNPrompt is that i) the predictability of latent node attributes can serve as a generalized anomaly measure and ii) generalized normal and abnormal graph patterns can be learned via latent node attribute prediction in a properly normalized node attribute space. UNPrompt achieves a generalist mode for GAD through two main modules: one module aligns the dimensionality and semantics of node attributes across different graphs via coordinate-wise normalization, while another module learns generalized neighborhood prompts that support the use of latent node attribute predictability as an anomaly score across different datasets. Extensive experiments on real-world GAD datasets show that UNPrompt significantly outperforms diverse competing methods under the generalist GAD setting, and that it is also clearly superior under the one-model-for-one-dataset setting. Code is available at https://github.com/mala-lab/UNPrompt.
Article 1437
Title@2025-06-07 (6): Non-Intrusive Load Monitoring Based on Image Load Signatures and Continual Learning
Title: Non-Intrusive Load Monitoring Based on Image Load Signatures and Continual Learning | Nicht-intrusive Lastüberwachung basierend auf Bildlastsignaturen und kontinuierlichem Lernen | 基于图像加载签名和不断学习的无侵入性负载监测 2506.06637v1 |
Authors (2): Olimjon Toirov, Wei Yu
Non-Intrusive Load Monitoring (NILM) identifies the operating status and energy consumption of each electrical device in a circuit by analyzing the electrical signals at the bus, which is of great significance for smart power management. However, complex and changing load combinations and application environments leave traditional NILM methods with poor feature robustness and insufficient model generalization. To this end, this paper proposes a new non-intrusive load monitoring method that integrates “image load signatures” and continual learning. The method converts multi-dimensional power signals such as current, voltage, and power factor into visual image load signatures and uses deep convolutional neural networks to identify and classify multiple devices; at the same time, self-supervised pre-training is introduced to improve feature generalization, and continual online learning strategies are used to overcome model forgetting and adapt to the emergence of new loads. The paper reports extensive experiments on high-sampling-rate load datasets and compares against a variety of existing methods and model variants. The results show that the proposed method achieves significant improvements in recognition accuracy.
Article 1438
Title@2025-06-07 (6): Vision-QRWKV: Exploring Quantum-Enhanced RWKV Models for Image Classification
Title: Vision-QRWKV: Exploring Quantum-Enhanced RWKV Models for Image Classification | Vision-QRWKV: Quantum-erweiterte RWKV-Modelle für die Bildklassifikation | 展望-QRWKV:探索以量为基础增强的RWKV图像分类模型 2506.06633v1 |
Authors (1): Chi-Sheng Chen
Recent advancements in quantum machine learning have shown promise in enhancing classical neural network architectures, particularly in domains involving complex, high-dimensional data. Building upon prior work in temporal sequence modeling, this paper introduces Vision-QRWKV, a hybrid quantum-classical extension of the Receptance Weighted Key Value (RWKV) architecture, applied for the first time to image classification tasks. By integrating a variational quantum circuit (VQC) into the channel mixing component of RWKV, our model aims to improve nonlinear feature transformation and enhance the expressive capacity of visual representations. We evaluate both classical and quantum RWKV models on a diverse collection of 14 medical and standard image classification benchmarks, including MedMNIST datasets, MNIST, and FashionMNIST. Our results demonstrate that the quantum-enhanced model outperforms its classical counterpart on a majority of datasets, particularly those with subtle or noisy class distinctions (e.g., ChestMNIST, RetinaMNIST, BloodMNIST). This study represents the first systematic application of quantum-enhanced RWKV in the visual domain, offering insights into the architectural trade-offs and future potential of quantum models for lightweight and efficient vision tasks.
Article 1439
Title@2025-06-07 (6): Arbitrarily-Conditioned Multi-Functional Diffusion for Multi-Physics Emulation
Title: Arbitrarily-Conditioned Multi-Functional Diffusion for Multi-Physics Emulation | Willkürlich bedingte multifunktionale Diffusion für multiphysikalische Emulation | 多物理模拟模拟多功能多功能扩散 2410.13794v2 |
Authors (5): Da Long, Zhitong Xu, Guang Yang, Akil Narayan, Shandian Zhe
Modern physics simulation often involves multiple functions of interest, and traditional numerical approaches are known to be complex and computationally costly. While machine learning-based surrogate models can offer significant cost reductions, most focus on a single task, such as forward prediction, and typically lack uncertainty quantification, an essential component in many applications. To overcome these limitations, we propose Arbitrarily-Conditioned Multi-Functional Diffusion (ACM-FD), a versatile probabilistic surrogate model for multi-physics emulation. ACM-FD can perform a wide range of tasks within a single framework, including forward prediction, various inverse problems, and simulating data for entire systems or subsets of quantities conditioned on others. Specifically, we extend the standard Denoising Diffusion Probabilistic Model (DDPM) for multi-functional generation by modeling noise as Gaussian processes (GP). We propose a random-mask based, zero-regularized denoising loss to achieve flexible and robust conditional generation. We induce a Kronecker product structure in the GP covariance matrix, substantially reducing the computational cost and enabling efficient training and sampling. We demonstrate the effectiveness of ACM-FD across several fundamental multi-physics systems.
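The computational benefit of a Kronecker-structured GP covariance can be sketched as follows: noise with covariance K1 ⊗ K2 is drawn from two small Cholesky factors, without ever forming the full matrix. Which axes the factors correspond to in ACM-FD, the RBF kernel, and the grid sizes are assumptions here, not details from the abstract.

```python
import numpy as np

def rbf(x, lengthscale=0.3, jitter=1e-6):
    """RBF kernel matrix on 1-D inputs, with jitter for numerical stability."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2) + jitter * np.eye(len(x))

def sample_kron_gp(L1, L2, rng):
    """Draw F with Cov(vec(F)) = (L1 L1^T) kron (L2 L2^T), using row-major vec.

    Cost is O(n1^3 + n2^3) for the factors instead of O((n1 * n2)^3) for the full matrix.
    """
    Z = rng.standard_normal((L1.shape[0], L2.shape[0]))
    return L1 @ Z @ L2.T, Z

K1 = rbf(np.linspace(0.0, 1.0, 6))   # e.g. covariance along one input axis (assumed)
K2 = rbf(np.linspace(0.0, 1.0, 5))   # e.g. covariance along a second axis (assumed)
L1, L2 = np.linalg.cholesky(K1), np.linalg.cholesky(K2)

rng = np.random.default_rng(0)
F, Z = sample_kron_gp(L1, L2, rng)
print(F.shape)
```

The sample is correct because, with row-major flattening, (L1 ⊗ L2) vec(Z) equals vec(L1 Z L2ᵀ), and (L1 ⊗ L2)(L1 ⊗ L2)ᵀ equals K1 ⊗ K2.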
Article 1440
Title@2025-06-07 (6): Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
Title: Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach | Wiederherstellung der Kalibrierung für ausgerichtete große Sprachmodelle: Ein Calibration-Aware Fine-Tuning-Ansatz | 统一大语言模型的重新校准:校准-软件精美调试方法 2505.01997v2 |
Authors (7): Jiancong Xiao, Bojian Hou, Zhanliang Wang, Ruochen Jin, Qi Long, Weijie J. Su, Li Shen
One of the key technologies for the success of Large Language Models (LLMs) is preference alignment. However, a notable side effect of preference alignment is poor calibration: while the pre-trained models are typically well-calibrated, LLMs tend to become poorly calibrated after alignment with human preferences. In this paper, we investigate why preference alignment affects calibration and how to address this issue. For the first question, we observe that the preference collapse issue in alignment undesirably generalizes to the calibration scenario, causing LLMs to exhibit overconfidence and poor calibration. To address this, we demonstrate the importance of fine-tuning with domain-specific knowledge to alleviate the overconfidence issue. To further analyze whether this affects the model’s performance, we categorize models into two regimes: calibratable and non-calibratable, defined by bounds of Expected Calibration Error (ECE). In the calibratable regime, we propose a calibration-aware fine-tuning approach to achieve proper calibration without compromising LLMs’ performance. However, as models are further fine-tuned for better performance, they enter the non-calibratable regime. For this case, we develop an EM-algorithm-based ECE regularization for the fine-tuning loss to maintain low calibration error. Extensive experiments validate the effectiveness of the proposed methods.
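For reference, Expected Calibration Error is commonly computed by binning predictions by confidence and averaging the gap between accuracy and confidence in each bin. The sketch below uses equal-width bins; the bin count and binning scheme are assumptions, since the abstract does not specify them.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: sum over bins of (bin weight) * |accuracy - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1])   # bin index in 0 .. n_bins - 1
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap                # weight by fraction of samples in the bin
    return ece

# An overconfident toy model: 90% confidence but only 60% accuracy.
overconfident = expected_calibration_error([0.9] * 5, [1, 1, 1, 0, 0])
# A calibrated toy model: 80% confidence and 80% accuracy.
calibrated = expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0])
print(overconfident, calibrated)
```

A well-calibrated model scores near zero; the overconfidence the abstract attributes to aligned LLMs shows up as bins where mean confidence exceeds accuracy.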
Article 1441
Title@2025-06-07 (6): Harnessing On-Device Large Language Model: Empirical Results and Implications for AI PC
Title: Harnessing On-Device Large Language Model: Empirical Results and Implications for AI PC | Nutzung von On-Device Large Language Modell: Empirische Ergebnisse und Implikationen für KI-PC | 利用大语言大语言模型:经验结果和对AIPC的影响 2505.15030v3 |
Authors (8): Qingyu Song, Peiyu Liao, Wenqian Zhao, Yiwen Wang, Shoubo Hu, Hui-Ling Zhen, Ning Jiang, Mingxuan Yuan
The increasing deployment of Large Language Models (LLMs) on edge devices, driven by model advancements and hardware improvements, offers significant privacy benefits. However, these on-device LLMs inherently face performance limitations due to reduced model capacity and necessary compression techniques. To address this, we introduce a systematic methodology (encompassing model capability, development efficiency, and system resources) for evaluating on-device LLMs. Our comprehensive evaluation, encompassing models from 0.5B to 14B parameters and seven post-training quantization (PTQ) methods on commodity laptops, yields several critical insights: 1) System-level metrics exhibit near-linear scaling with effective bits-per-weight (BPW). 2) A practical threshold exists around $\sim$3.5 effective BPW: larger models subjected to low-bit quantization consistently outperform smaller models using higher bit precision. 3) Quantization with low BPW incurs marginal accuracy loss but significant memory savings. 4) Power consumption on CPU is determined by low-level implementation specifics, with computation-intensive operations consuming more power than memory-intensive ones. These findings offer crucial insights and practical guidelines for the efficient deployment and optimized configuration of LLMs on resource-constrained edge devices. Our codebase is available at https://github.com/simmonssong/LLMOnDevice.
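A back-of-the-envelope calculation shows why a larger model at low effective BPW and a smaller model at higher precision can be compared at similar memory budgets. The formula counts weight storage only; the two model sizes are hypothetical picks within the range the abstract mentions, and KV cache, activations, and runtime overhead are ignored.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Hypothetical comparison at a similar memory budget.
large_low_bit = weight_memory_gb(14e9, 3.5)    # 14B parameters at about 3.5 effective BPW
small_high_bit = weight_memory_gb(7e9, 8.0)    # 7B parameters at 8 bits per weight
print(large_low_bit, small_high_bit)           # 6.125 and 7.0
```

Weight memory is exactly linear in BPW under this formula, which is consistent with the near-linear scaling of system-level metrics reported in the abstract.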
Article 1442
Title@2025-06-07 (6): Policy Filtration for RLHF to Mitigate Noise in Reward Models
Title: Policy Filtration for RLHF to Mitigate Noise in Reward Models | Politische Filtration für RLHF zur Mititation von Lärm in Prämienmodellen | 将RLHF政策归类为奖励模型中最小噪音的政策 2409.06957v5 |
Authors (7): Chuheng Zhang, Wei Shen, Li Zhao, Xuyun Zhang, Xiaolong Xu, Wanchun Dou, Jiang Bian
While direct policy optimization methods exist, pioneering LLMs are fine-tuned with reinforcement learning from human feedback (RLHF) to generate better responses under the supervision of a reward model learned from preference data. One major challenge of RLHF is the inaccuracy of the intermediate reward model, especially in tasks that require complex reasoning for the reward model to score a response. We find that the reliability of the reward model varies across responses assigned different rewards. This motivates us to filter out samples whose rewards may be unreliable in order to improve the signal-to-noise ratio during policy learning, resulting in Policy Filtration for Proximal Policy Optimization (PF-PPO). To choose a proper policy filtering strategy, we use the coefficient of determination ($R^2$) between the rewards and actual scores on filtered samples as the metric for finding promising strategies, since it measures how well the rewards filtered by PF-PPO indicate real performance. We provide extensive experiments to validate the effectiveness of PF-PPO in code generation and math reasoning tasks. In code generation, PF-PPO achieves state-of-the-art performance among 7-billion-parameter models on HumanEval (+7.9%), MBPP (+0.7%), and LeetCode Contest (+10.0%), a more challenging benchmark that we created. In math reasoning, PF-PPO yields performance increases across different reward models and benchmarks (Ape210K and CMATH). Code is available at https://github.com/DtYXs/verl/tree/pf-ppo.
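The selection metric described above can be sketched as the $R^2$ of a simple linear fit from rewards to actual scores. How the paper groups or bins filtered samples before computing $R^2$ is not stated in the abstract, so the linear fit and the sample values below are assumptions for illustration.

```python
import numpy as np

def r_squared(rewards, actual_scores):
    """Coefficient of determination of a least-squares line from rewards to actual scores."""
    rewards = np.asarray(rewards, dtype=float)
    actual = np.asarray(actual_scores, dtype=float)
    slope, intercept = np.polyfit(rewards, actual, 1)   # fit actual ~ slope * reward + intercept
    predicted = slope * rewards + intercept
    ss_res = ((actual - predicted) ** 2).sum()          # unexplained variation
    ss_tot = ((actual - actual.mean()) ** 2).sum()      # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical rewards and ground-truth scores for samples kept by one filtering strategy.
rewards = [0.1, 0.4, 0.5, 0.8, 0.9]
scores = [0.0, 0.5, 0.4, 0.9, 1.0]
print(r_squared(rewards, scores))
```

A filtering strategy whose surviving samples give an $R^2$ closer to 1 keeps rewards that track real performance more faithfully, which is the property the metric is meant to capture.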
Article 1443
Title@2025-06-07 (6): Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning
Title: Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning | Lehrplan Verstärkung Lernen von einfachen bis harten Aufgaben verbessert LLM Reasoning | 从易到困难的工作学习改进LLM 理由 2506.06632v1 |
Authors (11): Shubham Parashar, Shurui Gui, Xiner Li, Hongyi Ling, Sushil Vemuri, Blake Olson, Eric Li, Yu Zhang, James Caverlee, Dileep Kalathil, Shuiwang Ji
We aim to improve the reasoning capabilities of language models via reinforcement learning (RL). Recent RL post-trained models like DeepSeek-R1 have demonstrated reasoning abilities on mathematical and coding tasks. However, prior studies suggest that using RL alone to improve reasoning on inherently difficult tasks is less effective. Here, we draw inspiration from curriculum learning and propose to schedule tasks from easy to hard (E2H), allowing LLMs to build reasoning skills gradually. Our method is termed E2H Reasoner. Empirically, we observe that, although easy tasks are important initially, fading them out through appropriate scheduling is essential in preventing overfitting. Theoretically, we establish convergence guarantees for E2H Reasoner within an approximate policy iteration framework. We derive finite-sample complexity bounds and show that when tasks are appropriately decomposed and conditioned, learning through curriculum stages requires fewer total samples than direct learning. Experiments across multiple domains show that E2H Reasoner significantly improves the reasoning ability of small LLMs (1.5B to 3B), which otherwise struggle when trained with vanilla RL alone, highlighting the effectiveness of our method.
Article 1444
Title@2025-06-07 (6): Q-WSL: Optimizing Goal-Conditioned RL with Weighted Supervised Learning via Dynamic Programming
Title: Q-WSL: Optimizing Goal-Conditioned RL with Weighted Supervised Learning via Dynamic Programming | Q-WSL: Optimierung der zielkonditionierten RL mit gewichtetem betreutem Lernen durch dynamische Programmierung | Q-WSL:通过动态方案拟订,以加权监督的学习方式,优化目标有条件的RL 2410.06648v5 |
Authors (4): Xing Lei, Xuetao Zhang, Zifeng Zhuang, Donglin Wang
A novel class of advanced algorithms, termed Goal-Conditioned Weighted Supervised Learning (GCWSL), has recently emerged to tackle the challenges posed by sparse rewards in goal-conditioned reinforcement learning (RL). GCWSL consistently delivers strong performance across a diverse set of goal-reaching tasks due to its simplicity, effectiveness, and stability. However, GCWSL methods lack a crucial capability known as trajectory stitching, which is essential for learning optimal policies when faced with unseen skills during testing. This limitation becomes particularly pronounced when the replay buffer is predominantly filled with sub-optimal trajectories. In contrast, traditional TD-based RL methods, such as Q-learning, which utilize Dynamic Programming, do not face this issue but often experience instability due to the inherent difficulties in value function approximation. In this paper, we propose Q-learning Weighted Supervised Learning (Q-WSL), a novel framework designed to overcome the limitations of GCWSL by incorporating the strengths of Dynamic Programming found in Q-learning. Q-WSL leverages Dynamic Programming results to output the optimal action of (state, goal) pairs across different trajectories within the replay buffer. This approach synergizes the strengths of both Q-learning and GCWSL, effectively mitigating their respective weaknesses and enhancing overall performance. Empirical evaluations on challenging goal-reaching tasks demonstrate that Q-WSL surpasses other goal-conditioned approaches in terms of both performance and sample efficiency. Additionally, Q-WSL exhibits notable robustness in environments characterized by binary reward structures and environmental stochasticity.
Article 1445
Title@2025-06-07 (6): Certifiably Robust Model Evaluation in Federated Learning under Meta-Distributional Shifts
Title: Certifiably Robust Model Evaluation in Federated Learning under Meta-Distributional Shifts | Zertifiziert robuste Modellbewertung im Federated Learning unter Meta-Distributionsverschiebungen | 在混合分配变化下对联邦学习模式进行可验证的健全模式评价 2410.20250v2 |
Authors (3): Amir Najafi, Samin Mahdizadeh Sani, Farzan Farnia
We address the challenge of certifying the performance of a federated learning model on an unseen target network using only measurements from the source network that trained the model. Specifically, consider a source network “A” with $K$ clients, each holding private, non-IID datasets drawn from heterogeneous distributions, modeled as samples from a broader meta-distribution $\mu$. Our goal is to provide certified guarantees for the model’s performance on a different, unseen network “B”, governed by an unknown meta-distribution $\mu'$, assuming the deviation between $\mu$ and $\mu'$ is bounded either in Wasserstein distance or an $f$-divergence. We derive worst-case uniform guarantees for both the model’s average loss and its risk CDF, the latter corresponding to a novel, adversarially robust version of the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality. In addition, we show how the vanilla DKW bound enables principled certification of the model’s true performance on unseen clients within the same (source) network. Our bounds are efficiently computable, asymptotically minimax optimal, and preserve clients’ privacy. We also establish non-asymptotic generalization bounds that converge to zero as $K$ grows and the minimum per-client sample size exceeds $\mathcal{O}(\log K)$. Empirical evaluations confirm the practical utility of our bounds across real-world tasks. The project code is available at: github.com/samin-mehdizadeh/Robust-Evaluation-DKW
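The vanilla DKW bound mentioned above gives a uniform confidence band around an empirical CDF: with probability at least $1-\delta$, the true CDF lies within $\varepsilon = \sqrt{\ln(2/\delta)/(2n)}$ of the empirical one everywhere. The sketch below applies it to per-client losses treated as i.i.d. draws; the synthetic losses and the evaluation grid are assumptions, and the paper's robust variant for shifted meta-distributions is not reproduced here.

```python
import math

import numpy as np

def dkw_epsilon(n, delta):
    """Half-width of the DKW band for n i.i.d. samples at confidence level 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def empirical_cdf_band(losses, grid, delta=0.05):
    """Return (lower, ecdf, upper) evaluated on grid, clipped to [0, 1]."""
    losses = np.sort(np.asarray(losses, dtype=float))
    ecdf = np.searchsorted(losses, grid, side="right") / len(losses)
    eps = dkw_epsilon(len(losses), delta)
    return np.clip(ecdf - eps, 0.0, 1.0), ecdf, np.clip(ecdf + eps, 0.0, 1.0)

# Synthetic per-client losses for K = 200 clients.
rng = np.random.default_rng(0)
client_losses = rng.beta(2.0, 5.0, size=200)
grid = np.linspace(0.0, 1.0, 11)
lower, ecdf, upper = empirical_cdf_band(client_losses, grid, delta=0.05)
print(dkw_epsilon(200, 0.05))
```

The lower band at a loss level t certifies the fraction of unseen clients from the same network whose loss is at most t, up to the stated confidence.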
Article 1446
Title@2025-06-07 (6): Equilibrium Distribution for t-Distributed Stochastic Neighbor Embedding with Generalized Kernels
Title: Equilibrium Distribution for t-Distributed Stochastic Neighbor Embedding with Generalized Kernels | Gleichgewichtsverteilung für t-Distributed Stochastic Neighbor Einbettung mit generalisierten Kernen | 以一般内核嵌入的三散散散蒸散邻居间均匀分布 2505.24311v2 |
Authors (1): Yi Gu
T-distributed stochastic neighbor embedding (t-SNE) is a well-known algorithm for visualizing high-dimensional data by finding low-dimensional representations. In this paper, we study the convergence of t-SNE with generalized kernels and extend the results of Auffinger and Fletcher in 2023. Our work starts by giving a concrete formulation of generalized input and output kernels. Then we prove that under certain conditions, the t-SNE algorithm converges to an equilibrium distribution for a wide range of input and output kernels as the number of data points diverges.
Article 1447
Title@2025-06-07 (6): Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations
Title: Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations | Robuste Erlernbarkeit von Proben-kompressiblen Verteilungen unter Lärm oder störenden Störungen | 在噪音或对称干扰下抽样压缩分发的可有力学习性 2506.06613v1 |
Authors (2): Arefe Boushehrian, Amir Najafi
Learning distribution families over $\mathbb{R}^d$ is a fundamental problem in unsupervised learning and statistics. A central question in this setting is whether a given family of distributions possesses sufficient structure to be (at least) information-theoretically learnable and, if so, to characterize its sample complexity. In 2018, Ashtiani et al. reframed sample compressibility, originally due to Littlestone and Warmuth (1986), as a structural property of distribution classes, proving that it guarantees PAC-learnability. This discovery subsequently enabled a series of recent advancements in deriving nearly tight sample complexity bounds for various high-dimensional open problems. It has been further conjectured that the converse also holds: every learnable class admits a tight sample compression scheme. In this work, we establish that sample compressible families remain learnable even from perturbed samples, subject to a set of necessary and sufficient conditions. We analyze two models of data perturbation: (i) an additive independent noise model, and (ii) an adversarial corruption model, where an adversary manipulates a limited subset of the samples unknown to the learner. Our results are general and rely on minimal assumptions. We develop a perturbation-quantization framework that interfaces naturally with the compression scheme and leads to sample complexity bounds that scale gracefully with the noise level and corruption budget. As concrete applications, we establish new sample complexity bounds for learning finite mixtures of high-dimensional uniform distributions under both noise and adversarial perturbations, as well as for learning Gaussian mixture models from adversarially corrupted samples, resolving two open problems in the literature.
Article 1448
Title@2025-06-07 (6): Transferring Features Across Language Models With Model Stitching
Title: Transferring Features Across Language Models With Model Stitching | Übertragung von Funktionen über Sprachmodelle mit Modellstich | 使用模型裁剪的跨语言模型传输功能 2506.06609v1 |
Authors (4): Alan Chen, Jack Merullo, Alessandro Stolfo, Ellie Pavlick
In this work, we demonstrate that affine mappings between residual streams of language models are a cheap way to effectively transfer represented features between models. We apply this technique to transfer the weights of Sparse Autoencoders (SAEs) between models of different sizes to compare their representations. We find that small and large models learn highly similar representation spaces, which motivates training expensive components like SAEs on a smaller model and transferring them to a larger model to save FLOPs. For example, using a small-to-large transferred SAE as initialization can lead to 50% cheaper training runs when training SAEs on larger models. Next, we show that transferred probes and steering vectors can effectively recover ground truth performance. Finally, we dive deeper into feature-level transferability, finding that semantic and structural features transfer noticeably differently while specific classes of functional features have their roles faithfully mapped. Overall, our findings illustrate similarities and differences in the linear representation spaces of small and large models and demonstrate a method for improving the training efficiency of SAEs.
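A minimal sketch of fitting an affine map between two residual streams, assuming paired activations collected from both models on the same tokens. Ordinary least squares is used here as an assumption; the abstract does not say how the paper fits its mappings, and the array shapes and synthetic data are hypothetical.

```python
import numpy as np

def fit_affine_map(src, dst):
    """Least-squares W, b such that dst is approximately src @ W + b.

    src: (n_tokens, d_src) activations from the source model
    dst: (n_tokens, d_dst) activations from the target model on the same tokens
    """
    ones = np.ones((src.shape[0], 1))
    solution, *_ = np.linalg.lstsq(np.hstack([src, ones]), dst, rcond=None)
    return solution[:-1], solution[-1]       # W has shape (d_src, d_dst), b has shape (d_dst,)

# Synthetic stand-in for paired residual-stream activations.
rng = np.random.default_rng(0)
true_W = rng.standard_normal((16, 32))
true_b = rng.standard_normal(32)
src = rng.standard_normal((500, 16))
dst = src @ true_W + true_b

W, b = fit_affine_map(src, dst)
steering_vector = rng.standard_normal(16)
transferred = steering_vector @ W            # directions map through W only; the bias cancels
print(W.shape, b.shape, transferred.shape)
```

Because a steering vector is a difference of activations, the bias term drops out when it is mapped, while full activations (for example SAE inputs) need both W and b.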
Article 1449
Title@2025-06-07 (6): Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit
Title: Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit | Training-Free Tokenizer Transplantation über Orthogonal Passende Verfolgung | 通过矫形配对运动进行无培训的取暖器移植 2506.06607v1 |
Authors (2): Charles Goddard, Fernando Fernandes Neto
We present a training-free method to transplant tokenizers in pretrained large language models (LLMs) by reconstructing unseen token embeddings via Orthogonal Matching Pursuit (OMP). Specifically, we approximate each out-of-vocabulary token as a sparse linear combination of shared tokens, in two phases: first, compute each new token’s representation in the donor embedding space with a small dictionary of shared anchor tokens, then transfer these same sparse coefficients back into the base model’s embedding space. On two challenging cross-tokenizer tasks (Llama$\to$Mistral NeMo (12B) and Qwen$\to$Llama (1B)), we show that OMP achieves the best zero-shot preservation of the base model’s performance across multiple benchmarks, while other zero-shot approaches degrade significantly. Compared to baselines (zero-init, mean-init, and existing approaches like WECHSEL, FOCUS, ZETT), OMP consistently achieves the best overall performance, effectively bridging large tokenizer discrepancies without gradient updates. Our analysis further identifies mismatched numerical tokenization schemes as a critical challenge for preserving mathematical reasoning capabilities. This technique enables direct reuse of pretrained model weights with new tokenizers, facilitating cross-tokenizer knowledge distillation, speculative decoding, ensembling, merging, and domain-specific vocabulary adaptations. We integrate our method into the open-source mergekit-tokensurgeon tool for post hoc vocabulary realignment.
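The two phases described above can be sketched with a textbook greedy OMP. This is a minimal illustration under stated assumptions, not the mergekit-tokensurgeon implementation: the function names `omp` and `transplant`, the array layout, and the stopping tolerance are all hypothetical.

```python
import numpy as np

def omp(dictionary, target, k):
    """Greedy Orthogonal Matching Pursuit.

    dictionary: (n_atoms, dim) array whose rows are shared anchor embeddings.
    target:     (dim,) embedding to approximate.
    Returns (support, coeffs) using at most k atoms.
    """
    target = np.asarray(target, dtype=float)
    residual = target.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        scores = np.abs(dictionary @ residual)
        if support:
            scores[support] = -np.inf  # never reselect an atom
        support.append(int(np.argmax(scores)))
        # Re-fit all selected coefficients jointly by least squares.
        atoms = dictionary[support].T  # (dim, len(support))
        coeffs, *_ = np.linalg.lstsq(atoms, target, rcond=None)
        residual = target - atoms @ coeffs
        if np.linalg.norm(residual) < 1e-10:
            break
    return support, coeffs

def transplant(donor_anchors, base_anchors, donor_token, k=8):
    """Phase 1: sparse-code the token over donor anchors.
    Phase 2: reuse the same coefficients over the base model's anchors."""
    support, coeffs = omp(donor_anchors, donor_token, k)
    return base_anchors[support].T @ coeffs
```

Row `i` of `donor_anchors` and row `i` of `base_anchors` are assumed to be the embeddings of the same shared token in the two models; that alignment is what lets the coefficients carry over.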
Article 1450
Title@2025-06-07 (6): Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization
Title: Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization | Stacey: Förderung des stochastischen Steepest Descents durch beschleunigte Optimierung von $\ell_p$-Smooth Nonconvex | Stacey: 通过加速 $\ ell_ p$- Smooth 的非convex 优化, 推广斯托切性隐蔽源 2506.06606v1 |
Authors (6): Xinyu Luo, Cedar Site Bai, Bolian Li, Petros Drineas, Ruqi Zhang, Brian Bullins
While popular optimization methods such as SGD, AdamW, and Lion depend on steepest descent updates in either $\ell_2$ or $\ell_\infty$ norms, there remains a critical gap in handling the non-Euclidean structure observed in modern deep network training. In this work, we address this need by introducing a new accelerated $\ell_p$ steepest descent algorithm, called Stacey, which uses interpolated primal-dual iterate sequences to effectively navigate non-Euclidean smooth optimization tasks. In addition to providing novel theoretical guarantees for the foundations of our algorithm, we empirically compare our approach against these popular methods on tasks including image classification and large language model (LLM) pretraining, demonstrating both faster convergence and higher final accuracy. We further evaluate different values of $p$ across various models and datasets, underscoring the importance and efficiency of non-Euclidean approaches over standard Euclidean methods. Code can be found at https://github.com/xinyuluo8561/Stacey .
Article 1451
Title@2025-06-07 (6): Scoring the Unscorables: Cyber Risk Assessment Beyond Internet Scans
Title: Scoring the Unscorables: Cyber Risk Assessment Beyond Internet Scans | Bewertung der Unscorables: Cyber-Risikobewertung jenseits von Internet-Scans | CSCOW:互联网扫描之外的网络风险评估 2506.06604v1 |
Authors (3): Armin Sarabi, Manish Karir, Mingyan Liu
In this paper we present a study on using novel data types to perform cyber risk quantification by estimating the likelihood of a data breach. We demonstrate that it is feasible to build a highly accurate cyber risk assessment model using public and readily available technology signatures obtained from crawling an organization’s website. This approach overcomes the limitations of previous similar approaches that relied on large-scale IP address based scanning data, which suffers from incomplete/missing IP address mappings as well as the lack of such data for large numbers of small and medium-sized organizations (SMEs). In comparison to scan data, technology digital signature data is more readily available for millions of SMEs. Our study shows that there is a strong relationship between these technology signatures and an organization’s cybersecurity posture. In cross-validating our model using different cyber incident datasets, we also highlight the key differences between ransomware attack victims and the larger population of cyber incident and data breach victims.
Article 1452
Title@2025-06-07 (6): CAtCh: Cognitive Assessment through Cookie Thief
Title: CAtCh: Cognitive Assessment through Cookie Thief | CAtCh: Kognitive Bewertung durch Cookie Dieb | CATCH: 通过饼干小偷进行认知评估 2506.06603v1 |
Authors (8): Joseph T Colonel, Carolyn Hagler, Guiselle Wismer, Laura Curtis, Jacqueline Becker, Juan Wisnivesky, Alex Federman, Gaurav Pandey
Several machine learning algorithms have been developed for the prediction of Alzheimer’s disease and related dementia (ADRD) from spontaneous speech. However, none of these algorithms have been translated for the prediction of broader cognitive impairment (CI), which in some cases is a precursor and risk factor of ADRD. In this paper, we evaluated several speech-based open-source methods originally proposed for the prediction of ADRD, as well as methods from multimodal sentiment analysis for the task of predicting CI from patient audio recordings. Results demonstrated that multimodal methods outperformed unimodal ones for CI prediction, and that acoustics-based approaches performed better than linguistics-based ones. Specifically, interpretable acoustic features relating to affect and prosody were found to significantly outperform BERT-based linguistic features and interpretable linguistic features, respectively. All the code developed for this study is available at https://github.com/JTColonel/catch.
Article 1453
Title@2025-06-07 (6): Direct Prediction Set Minimization via Bilevel Conformal Classifier Training
Title: Direct Prediction Set Minimization via Bilevel Conformal Classifier Training | Direkte Vorhersage Einstellung Minimierung über Bilevel Conformal Classifier Training | 通过双级正规分类培训尽量减少直接预测设置 2506.06599v1 |
Authors (6): Yuanjie Shi, Hooman Shahrokhi, Xuesong Jia, Xiongzhi Chen, Janardhan Rao Doppa, Yan Yan
Conformal prediction (CP) is a promising uncertainty quantification framework which works as a wrapper around a black-box classifier to construct prediction sets (i.e., subsets of candidate classes) with provable guarantees. However, standard calibration methods for CP tend to produce large prediction sets, which makes them less useful in practice. This paper considers the problem of integrating conformal principles into the training process of deep classifiers to directly minimize the size of prediction sets. We formulate conformal training as a bilevel optimization problem and propose the {\em Direct Prediction Set Minimization (DPSM)} algorithm to solve it. The key insight behind DPSM is to minimize a measure of the prediction set size (upper level) that is conditioned on the learned quantile of conformity scores (lower level). We show that DPSM has a learning bound of $O(1/\sqrt{n})$ (with $n$ training samples), while prior conformal training methods based on stochastic approximation for the quantile have a bound of $\Omega(1/s)$ (with batch size $s$ and typically $s \ll \sqrt{n}$). Experiments on various benchmark datasets and deep models show that DPSM significantly outperforms the best prior conformal training baseline with $20.46\%\downarrow$ in the prediction set size and validates our theory.
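For context, the standard split-conformal wrapper that the abstract builds on can be sketched as follows. This shows only the calibration quantile and set construction, not the DPSM bilevel training itself; the conformity score `1 - p[y]` and the function names are assumptions chosen for illustration.

```python
import math

def conformal_quantile(cal_probs, cal_labels, alpha):
    """Calibrate a threshold on held-out data.

    cal_probs:  list of per-class probability lists from a black-box classifier.
    cal_labels: list of true class indices.
    Uses the nonconformity score 1 - p[true class] and the finite-sample
    corrected rank ceil((n + 1) * (1 - alpha)).
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return scores[rank - 1]

def prediction_set(probs, qhat):
    """Return all classes whose nonconformity score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]
```

The size of `prediction_set(...)` is the quantity that conformal training methods try to shrink, and `qhat` plays the role of the learned quantile at the lower level.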
Article 1454
Title@2025-06-07 (6): Scaling Inference-Efficient Language Models
Title: Scaling Inference-Efficient Language Models | Skalierung von Schlussfolgerungen-Effiziente Sprachmodelle | 扩展率-有效语言模型 2501.18107v2 |
Authors (3): Song Bian, Minghao Yan, Shivaram Venkataraman
Scaling laws are powerful tools to predict the performance of large language models. However, current scaling laws fall short of accounting for inference costs. In this work, we first show that model architecture affects inference latency, where models of the same size can have up to 3.5x difference in latency. To tackle this challenge, we modify the Chinchilla scaling laws to co-optimize the model parameter count, the number of training tokens, and the model architecture. Because models of similar training loss exhibit gaps in downstream evaluation, we also propose a novel method to train inference-efficient models based on the revised scaling laws. We perform extensive empirical studies to fit and evaluate our inference-aware scaling laws. We vary model parameters from 80M to 1B, training tokens from 1.6B to 30B, and model shapes, training 63 models. Guided by our inference-efficient scaling law and model selection method, we release the Morph-1B model, which improves inference latency by 1.8x while maintaining accuracy on downstream tasks compared to open-source models, pushing the Pareto frontier of accuracy-latency tradeoff. Notably, our experiments reveal that wider and shallower models can yield efficiency gains while preserving accuracy.
Article 1455
Title@2025-06-07 (6): From Model-Based and Adaptive Control to Evolving Fuzzy Control
Title: From Model-Based and Adaptive Control to Evolving Fuzzy Control | Von der modellbasierten und adaptiven Steuerung zur sich entwickelnden Fuzzy-Steuerung | 从模型和适应控制到演化模糊控制 2506.06594v1 |
Authors (3): Daniel Leite, Igor Škrjanc, Fernando Gomide
Evolving fuzzy systems build and adapt fuzzy models - such as predictors and controllers - by incrementally updating their rule-base structure from data streams. On the occasion of the 60-year anniversary of fuzzy set theory, commemorated during the Fuzz-IEEE 2025 event, this brief paper revisits the historical development and core contributions of classical fuzzy and adaptive modeling and control frameworks. It then highlights the emergence and significance of evolving intelligent systems in fuzzy modeling and control, emphasizing their advantages in handling nonstationary environments. Key challenges and future directions are discussed, including safety, interpretability, and principled structural evolution.
Article 1456
Title@2025-06-06 (5): Improving the Diffusability of Autoencoders
Title: Improving the Diffusability of Autoencoders | Verbesserung der Diffusibilität von Autoencodern | 提高自动编码器的易变性 2502.14831v3 |
Authors (8): Ivan Skorokhodov, Sharath Girish, Benran Hu, Willi Menapace, Yanyu Li, Rameen Abdal, Sergey Tulyakov, Aliaksandr Siarohin
Latent diffusion models have emerged as the leading approach for generating high-quality images and videos, utilizing compressed latent representations to reduce the computational burden of the diffusion process. While recent advancements have primarily focused on scaling diffusion backbones and improving autoencoder reconstruction quality, the interaction between these components has received comparatively less attention. In this work, we perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces, which are especially pronounced in the autoencoders with a large bottleneck channel size. We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders the generation quality. To mitigate the issue, we propose scale equivariance: a simple regularization strategy that aligns latent and RGB spaces across frequencies by enforcing scale equivariance in the decoder. It requires minimal code changes and only up to 20K autoencoder fine-tuning steps, yet significantly improves generation quality, reducing FID by 19% for image generation on ImageNet-1K $256^2$ and FVD by at least 44% for video generation on Kinetics-700 $17 \times 256^2$. The source code is available at https://github.com/snap-research/diffusability.
Article 1457
Title@2025-06-06 (5): Neighbour-Driven Gaussian Process Variational Autoencoders for Scalable Structured Latent Modelling
Title: Neighbour-Driven Gaussian Process Variational Autoencoders for Scalable Structured Latent Modelling | Nachbar-getriebene Gauß-Prozessvariationelle Autoencoder für skalierbare strukturierte Latentenmodellierung | 相邻的Driven Gaussian 用于可缩放结构晚期模拟模型的流程变异自动编码器 2505.16481v2 |
Authors (3): Xinxing Shi, Xiaoyu Jiang, Mauricio A. Álvarez
Gaussian Process (GP) Variational Autoencoders (VAEs) extend standard VAEs by replacing the fully factorised Gaussian prior with a GP prior, thereby capturing richer correlations among latent variables. However, performing exact GP inference in large-scale GPVAEs is computationally prohibitive, often forcing existing approaches to rely on restrictive kernel assumptions or large sets of inducing points. In this work, we propose a neighbour-driven approximation strategy that exploits local adjacencies in the latent space to achieve scalable GPVAE inference. By confining computations to the nearest neighbours of each data point, our method preserves essential latent dependencies, allowing more flexible kernel choices and mitigating the need for numerous inducing points. Through extensive experiments on tasks including representation learning, data imputation, and conditional generation, we demonstrate that our approach outperforms other GPVAE variants in both predictive performance and computational efficiency.
Article 1458
Title@2025-06-06 (5): Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures
Title: Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures | Globale Konvergenz von Gradienten EM für überparameterisierte Gauß-Mischungen | 用于超度测量高斯混合体的梯度 EM 全球趋同 2506.06584v1 |
Authors (4): Mo Zhou, Weihang Xu, Maryam Fazel, Simon S. Du
Learning Gaussian Mixture Models (GMMs) is a fundamental problem in machine learning, with the Expectation-Maximization (EM) algorithm and its popular variant gradient EM being arguably the most widely used algorithms in practice. In the exact-parameterized setting, where both the ground truth GMM and the learning model have the same number of components $m$, a vast line of work has aimed to establish rigorous recovery guarantees for EM. However, global convergence has only been proven for the case of $m=2$, and EM is known to fail to recover the ground truth when $m\geq 3$. In this paper, we consider the $\textit{over-parameterized}$ setting, where the learning model uses $n>m$ components to fit an $m$-component ground truth GMM. In contrast to the exact-parameterized case, we provide a rigorous global convergence guarantee for gradient EM. Specifically, for any well separated GMMs in general position, we prove that with only mild over-parameterization $n = \Omega(m\log m)$, randomly initialized gradient EM converges globally to the ground truth at a polynomial rate with polynomial samples. Our analysis proceeds in two stages and introduces a suite of novel tools for Gaussian Mixture analysis. We use Hermite polynomials to study the dynamics of gradient EM and employ tensor decomposition to characterize the geometric landscape of the likelihood loss. This is the first global convergence and recovery result for EM or Gradient EM beyond the special case of $m=2$.
Article 1459
Title@2025-06-06 (5): Demystifying Topological Message-Passing with Relational Structures: A Case Study on Oversquashing in Simplicial Message-Passing
Title: Demystifying Topological Message-Passing with Relational Structures: A Case Study on Oversquashing in Simplicial Message-Passing | Entmystifizieren Topologisches Message-Passing mit relationalen Strukturen: Eine Fallstudie zum Übersetzen in Simplicial Message-Passing | 解开神秘的地形信息与关系结构的粘贴:关于简单信息传送中过度隔绝的案例研究 2506.06582v1 |
Authors (5): Diaaeldin Taha, James Chapman, Marzieh Eidi, Karel Devriendt, Guido Montúfar
Topological deep learning (TDL) has emerged as a powerful tool for modeling higher-order interactions in relational data. However, phenomena such as oversquashing in topological message-passing remain understudied and lack theoretical analysis. We propose a unifying axiomatic framework that bridges graph and topological message-passing by viewing simplicial and cellular complexes and their message-passing schemes through the lens of relational structures. This approach extends graph-theoretic results and algorithms to higher-order structures, facilitating the analysis and mitigation of oversquashing in topological message-passing networks. Through theoretical analysis and empirical studies on simplicial networks, we demonstrate the potential of this framework to advance TDL.
Article 1460
Title@2025-06-06 (5): Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques
Title: Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques | Auf dem Weg zu einer effizienten Multi-LLM-Inferenz: Charakterisierung und Analyse von LLM Routing und Hierarchical Techniques | 实现高效的多LLLM 多级LLM推理:LLM转流和等级技术的特征和分析 2506.06579v1 |
Authors (5): Adarsh Prasad Behera, Jaya Prakash Champati, Roberto Morabito, Sasu Tarkoma, James Gross
Recent progress in Language Models (LMs) has dramatically advanced the field of natural language processing (NLP), with models excelling at tasks like text generation, summarization, and question answering. However, their inference remains computationally expensive and energy-intensive, especially in settings with limited hardware, power, or bandwidth. This makes it difficult to deploy LMs in mobile, edge, or cost-sensitive environments. To address these challenges, recent approaches have introduced multi-LLM intelligent model selection strategies that dynamically allocate computational resources based on query complexity – using lightweight models for simpler queries and escalating to larger models only when necessary. This survey explores two complementary strategies for efficient LLM inference: (i) routing, which selects the most suitable model based on the query, and (ii) cascading or hierarchical inference (HI), which escalates queries through a sequence of models until a confident response is found. Both approaches aim to reduce computation by using lightweight models for simpler tasks while offloading only when needed. We provide a comparative analysis of these techniques across key performance metrics, discuss benchmarking efforts, and outline open challenges. Finally, we outline future research directions to enable faster response times, adaptive model selection based on task complexity, and scalable deployment across heterogeneous environments, making LLM-based systems more efficient and accessible for real-world applications.
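The difference between the two strategies can be made concrete with a toy sketch. Everything here is hypothetical: the stand-in models return an `(answer, confidence)` pair, and the length-based router and the confidence threshold are placeholders for whatever a real system would learn or calibrate.

```python
def route(query, models, router):
    """Routing: choose a single model up front, using only the query."""
    chosen = router(query)
    answer, _confidence = models[chosen](query)
    return answer, chosen

def cascade(query, models, threshold):
    """Hierarchical inference: try models from cheapest to largest and
    stop at the first answer whose confidence clears the threshold."""
    last = len(models) - 1
    for index, model in enumerate(models):
        answer, confidence = model(query)
        if confidence >= threshold or index == last:
            return answer, index

# Toy stand-ins for a small and a large model (not real APIs).
def small_model(query):
    confidence = 0.9 if len(query) < 10 else 0.3
    return "small:" + query, confidence

def large_model(query):
    return "large:" + query, 0.95

def length_router(query):
    # Hypothetical router: send short queries to model 0, others to model 1.
    return 0 if len(query) < 10 else 1
```

Routing pays for exactly one model call but depends on the router guessing difficulty correctly; a cascade may pay for several calls but only escalates after seeing an unconfident answer.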
Article 1461
Title@2025-06-06 (5): Generalization of Geometric Graph Neural Networks with Lipschitz Loss Functions
Title: Generalization of Geometric Graph Neural Networks with Lipschitz Loss Functions | Generalisierung geometrischer Graphen-Neuralnetze mit Lipschitz-Verlustfunktionen | 具有Lipschitz损失功能的几何图神经网络的通用化 2409.05191v2 |
Authors (3): Zhiyang Wang, Juan Cervino, Alejandro Ribeiro
In this paper, we study the generalization capabilities of geometric graph neural networks (GNNs). We consider GNNs over a geometric graph constructed from a finite set of randomly sampled points over an embedded manifold with topological information captured. We prove a generalization gap between the optimal empirical risk and the optimal statistical risk of this GNN, which decreases with the number of sampled points from the manifold and increases with the dimension of the underlying manifold. This generalization gap ensures that the GNN trained on a graph on a set of sampled points can be utilized to process other unseen graphs constructed from the same underlying manifold. The most important observation is that the generalization capability can be realized with one large graph instead of being limited to the size of the graph as in previous results. The generalization gap is derived based on the non-asymptotic convergence result of a GNN on the sampled graph to the underlying manifold neural networks (MNNs). We verify this theoretical result with experiments on multiple real-world datasets.
Article 1462
Title@2025-06-06 (5): DISC: DISC: Dynamic Decomposition Improves LLM Inference Scaling
Title: DISC: DISC: Dynamic Decomposition Improves LLM Inference Scaling | DISC: DISC: Dynamische Zersetzung verbessert LLM-Inferenzskalierung | DISC: DISC: 动态分解改善LLM 推推法的扩大 2502.16706v2 |
Authors (9): Jonathan Light, Wei Cheng, Benjamin Riviere, Wu Yue, Masafumi Oyamada, Mengdi Wang, Yisong Yue, Santiago Paternain, Haifeng Chen
Inference scaling methods for large language models often work by breaking problems into steps or groups of tokens, then sampling and selecting the best next steps. However, these steps and their sizes are usually fixed or manually designed based on domain knowledge. We introduce dynamic decomposition, a method that adaptively and automatically breaks down solution and reasoning traces into manageable steps during inference. By allocating compute more effectively - especially by subdividing difficult steps and prioritizing their sampling - dynamic decomposition significantly boosts inference efficiency. Experiments on benchmarks like APPS, MATH, and LiveCodeBench show that dynamic decomposition outperforms fixed strategies such as token-level, sentence-level, and single-step decompositions, reducing the pass@10 error rate by 5.0%, 6.7%, and 10.5% respectively. These results show the promise of dynamic decomposition for improving a broad range of inference scaling techniques.
Article 1463
Title@2025-06-06 (5): A Manifold Perspective on the Statistical Generalization of Graph Neural Networks
Title: A Manifold Perspective on the Statistical Generalization of Graph Neural Networks | Eine vielfältige Perspektive auf die statistische Verallgemeinerung von Graphen-Neuralen Netzwerken | 图表神经网络统计一般化的多功能视角 2406.05225v6 |
Authors (3): Zhiyang Wang, Juan Cervino, Alejandro Ribeiro
Graph Neural Networks (GNNs) extend convolutional neural networks to operate on graphs. Despite their impressive performance in various graph learning tasks, the theoretical understanding of their generalization capability is still lacking. Previous GNN generalization bounds ignore the underlying graph structures, often leading to bounds that increase with the number of nodes – a behavior contrary to the one experienced in practice. In this paper, we take a manifold perspective to establish the statistical generalization theory of GNNs on graphs sampled from a manifold in the spectral domain. We prove, and confirm empirically, that the generalization bounds of GNNs decrease linearly with the size of the graphs in the logarithmic scale, and increase linearly with the spectral continuity constants of the filter functions. Notably, our theory explains both node-level and graph-level tasks. Our result has two implications: i) guaranteeing the generalization of GNNs to unseen data over manifolds; ii) providing insights into the practical design of GNNs, i.e., restrictions on the discriminability of GNNs are necessary to obtain a better generalization performance. We demonstrate our generalization bounds of GNNs using synthetic and multiple real-world datasets.
Article 1464
Title@2025-06-06 (5): Graph Persistence goes Spectral
Title: Graph Persistence goes Spectral | Grafik Persistenz geht Spektral | 图持久性走向谱化 2506.06571v1 |
Authors (3): Mattie Ji, Amauri H. Souza, Vikas Garg
Including intricate topological information (e.g., cycles) provably enhances the expressivity of message-passing graph neural networks (GNNs) beyond the Weisfeiler-Leman (WL) hierarchy. Consequently, Persistent Homology (PH) methods are increasingly employed for graph representation learning. In this context, recent works have proposed decorating classical PH diagrams with vertex and edge features for improved expressivity. However, due to their dependence on features, these methods still fail to capture basic graph structural information. In this paper, we propose SpectRe – a new topological descriptor for graphs that integrates spectral information into PH diagrams. Notably, SpectRe is strictly more expressive than existing descriptors on graphs. We also introduce notions of global and local stability to analyze existing descriptors and establish that SpectRe is locally stable. Finally, experiments on synthetic and real-world datasets demonstrate the effectiveness of SpectRe and its potential to enhance the capabilities of graph models in relevant learning tasks.
Article 1465
Title@2025-06-06 (5): Reconstructing Heterogeneous Biomolecules via Hierarchical Gaussian Mixtures and Part Discovery
Title: Reconstructing Heterogeneous Biomolecules via Hierarchical Gaussian Mixtures and Part Discovery | Rekonstruieren heterogene Biomoleküle über hierarchische Gauß-Mischungen und Teil-Entdeckung | 通过高山级等级混合体和部件发现,通过等级制混合体重新构造异基因生物分子 2506.09063v1 |
Authors (4): Shayan Shekarforoush, David B. Lindell, Marcus A. Brubaker, David J. Fleet
Cryo-EM is a transformational paradigm in molecular biology where computational methods are used to infer 3D molecular structure at atomic resolution from extremely noisy 2D electron microscope images. At the forefront of research is how to model the structure when the imaged particles exhibit non-rigid conformational flexibility and compositional variation where parts are sometimes missing. We introduce a novel 3D reconstruction framework with a hierarchical Gaussian mixture model, inspired in part by Gaussian Splatting for 4D scene reconstruction. In particular, the structure of the model is grounded in an initial process that infers a part-based segmentation of the particle, providing essential inductive bias in order to handle both conformational and compositional variability. The framework, called CryoSPIRE, is shown to reveal biologically meaningful structures on complex experimental datasets, and establishes a new state-of-the-art on CryoBench, a benchmark for cryo-EM heterogeneity methods.
Article 1466
Title@2025-06-06 (5): Securing Traffic Sign Recognition Systems in Autonomous Vehicles
Title: Securing Traffic Sign Recognition Systems in Autonomous Vehicles | Sicherung von Verkehrszeichenerkennungssystemen in autonomen Fahrzeugen | 保障机动车辆交通标志识别系统的安全 2506.06563v1 |
Authors (3): Thushari Hapuarachchi, Long Dang, Kaiqi Xiong
Deep Neural Networks (DNNs) are widely used for traffic sign recognition because they can automatically extract high-level features from images. These DNNs are trained on large-scale datasets obtained from unknown sources. Therefore, it is important to ensure that the models remain secure and are not compromised or poisoned during training. In this paper, we investigate the robustness of DNNs trained for traffic sign recognition. First, we perform error-minimizing attacks on DNNs used for traffic sign recognition by adding imperceptible perturbations to the training data. Then, we propose a data augmentation-based training method to mitigate these attacks. The proposed training method utilizes nonlinear transformations to disrupt the perturbations and improve the model robustness. We experiment with two well-known traffic sign datasets to demonstrate the severity of the attack and the effectiveness of our mitigation scheme. The error-minimizing attacks reduce the prediction accuracy of the DNNs from 99.90% to 10.6%. However, our mitigation scheme successfully restores the prediction accuracy to 96.05%. Moreover, our approach outperforms adversarial training in mitigating the error-minimizing attacks. Furthermore, we propose a detection model capable of identifying poisoned data even when the perturbations are imperceptible to human inspection. Our detection model achieves a success rate of over 99% in identifying the attack. This research highlights the need to employ advanced training methods for DNNs in traffic sign recognition systems to mitigate the effects of data poisoning attacks.
Article 1467
Title@2025-06-06 (5): Enhancing radioisotope identification in gamma spectra via supervised domain adaptation
Title: Enhancing radioisotope identification in gamma spectra via supervised domain adaptation | Verbesserung der Radioisotop-Identifikation in Gammaspektren durch Überwachung der Domänenanpassung | 通过受监督的域适应加强伽马光谱中放射性同位素识别 2412.07069v2 |
Authors (1): Peter Lalor
Machine learning methods in gamma spectroscopy have the potential to provide accurate, real-time classification of unknown radioactive samples. However, obtaining sufficient experimental training data is often prohibitively expensive and time-consuming, and models trained solely on simulated data can struggle to generalize to the unpredictable range of real-world operating scenarios. In this study, we explore how supervised domain adaptation techniques can improve radioisotope identification models by transferring knowledge between different data domains. We begin by pretraining a model for radioisotope identification using data from a synthetic source domain, and then fine-tune it for a new target domain that shares the same label space. Our analysis indicates that fine-tuned models significantly outperform those trained exclusively on source-domain data or solely on target-domain data, particularly in the intermediate data regime ($\approx 10^2$ to $10^5$ target training samples). This conclusion is consistent across four different machine learning architectures (MLP, CNN, Transformer, and LSTM). Furthermore, our findings show that fine-tuned Transformers yield a statistically significant improvement in test performance compared to the other architectures. Overall, this study serves as a proof of concept for applying supervised domain adaptation techniques to gamma spectroscopy in scenarios where experimental data is limited.
Article 1468
Title@2025-06-06 (5): ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
Title: ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models | ThinkEdit: Verdolmetschbare Gewichte bearbeiten, um übermäßig kurzes Denken in vernünftigen Modellen zu vermischen | 思考Edit:将可解释的重量编辑为 “ 解释性模型 “ 中模拟超短思考 2503.22048v3 |
Authors (3): Chung-En Sun, Ge Yan, Tsui-Wei Weng
Recent studies have shown that Large Language Models (LLMs) augmented with chain-of-thought (CoT) reasoning demonstrate impressive problem-solving abilities. However, in this work, we identify a recurring issue where these models occasionally generate overly short reasoning, leading to degraded performance on even simple mathematical problems. Specifically, we investigate how reasoning length is embedded in the hidden representations of reasoning models and its impact on accuracy. Our analysis reveals that reasoning length is governed by a linear direction in the representation space, allowing us to induce overly short reasoning by steering the model along this direction. Building on this insight, we introduce \textbf{\textit{ThinkEdit}}, a simple yet effective weight-editing approach to mitigate the issue of overly short reasoning. We first identify a small subset of attention heads (approximately 4%) that predominantly drive short reasoning behavior. We then edit the output projection weights of these heads to remove the short reasoning direction. With changes to only 0.2% of the model’s parameters, \textbf{\textit{ThinkEdit}} effectively reduces overly short reasoning and yields notable accuracy gains for short reasoning outputs (+6.39%), along with an overall improvement across multiple math benchmarks (+3.34%). Our findings provide new mechanistic insights into how reasoning length is controlled within LLMs and highlight the potential of fine-grained model interventions to improve reasoning quality. Our code is available at: https://github.com/Trustworthy-ML-Lab/ThinkEdit
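One common way to remove a direction from what a weight matrix can write is an orthogonal projection, sketched below. This is an assumption about the general form of such an edit, not the ThinkEdit code: it assumes the head's output projection maps into the residual stream and that the direction lives in that same space, and the name `remove_direction` is hypothetical.

```python
import math

def remove_direction(w_out, direction):
    """Return a copy of w_out whose outputs have no component along direction.

    w_out:     d_model x d_head matrix (list of rows) mapping a head's
               output into the residual stream.
    direction: length-d_model vector, e.g. an estimated short-reasoning
               direction (it is normalised here).
    """
    norm = math.sqrt(sum(x * x for x in direction))
    v = [x / norm for x in direction]
    d_model, d_head = len(w_out), len(w_out[0])
    # v^T W: how strongly each head dimension writes along v.
    v_w = [sum(v[i] * w_out[i][j] for i in range(d_model)) for j in range(d_head)]
    # W' = W - v (v^T W), which equals (I - v v^T) W.
    return [[w_out[i][j] - v[i] * v_w[j] for j in range(d_head)]
            for i in range(d_model)]
```

After the edit, `v^T W'` is zero up to rounding, so the edited head can no longer move the residual stream along `v`, while its output in every orthogonal direction is unchanged.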
Article 1469
Title@2025-06-06 (5): Rapid training of Hamiltonian graph networks without gradient descent
Title: Rapid training of Hamiltonian graph networks without gradient descent | Schnelles Training von Hamiltonschen Graphennetzwerken ohne Gradientenabstieg | 快速培训汉密尔顿图表网络,无梯度下降 2506.06558v1 |
Authors (4): Atamert Rahma, Chinmay Datar, Ana Cukarska, Felix Dietrich
Learning dynamical systems that respect physical symmetries and constraints remains a fundamental challenge in data-driven modeling. Integrating physical laws with graph neural networks facilitates principled modeling of complex N-body dynamics and yields accurate and permutation-invariant models. However, training graph neural networks with iterative, gradient-based optimization algorithms (e.g., Adam, RMSProp, LBFGS) often leads to slow training, especially for large, complex systems. In comparison to 15 different optimizers, we demonstrate that Hamiltonian Graph Networks (HGN) can be trained up to 600x faster–but with comparable accuracy–by replacing iterative optimization with random feature-based parameter construction. We show robust performance in diverse simulations, including N-body mass-spring systems in up to 3 dimensions with different geometries, while retaining essential physical invariances with respect to permutation, rotation, and translation. We reveal that even when trained on minimal 8-node systems, the model can generalize in a zero-shot manner to systems as large as 4096 nodes without retraining. Our work challenges the dominance of iterative gradient-descent-based optimization algorithms for training neural network models for physical systems.
Article 1470
Title@2025-06-06 (5): Infinity Search: Approximate Vector Search with Projections on q-Metric Spaces
Title: Infinity Search: Approximate Vector Search with Projections on q-Metric Spaces | Infinity Search: Ungefähre Vektorsuche mit Projektionen auf q-Metric Spaces | 无限搜索: 近似矢量搜索和 q- 计量空间预测 2506.06557v1 |
Authors (4): Antonio Pariente, Ignacio Hounie, Santiago Segarra, Alejandro Ribeiro
Despite the ubiquity of vector search applications, prevailing search algorithms overlook the metric structure of vector embeddings, treating it as a constraint rather than exploiting its underlying properties. In this paper, we demonstrate that in $q$-metric spaces, metric trees can leverage a stronger version of the triangle inequality to reduce comparisons for exact search. Notably, as $q$ approaches infinity, the search complexity becomes logarithmic. Therefore, we propose a novel projection method that embeds vector datasets with arbitrary dissimilarity measures into $q$-metric spaces while preserving the nearest neighbor. We propose to learn an approximation of this projection to efficiently transform query points to a space where Euclidean distances satisfy the desired properties. Our experimental results with text and image vector embeddings show that learning $q$-metric approximations enables classic metric tree algorithms – which typically underperform with high-dimensional data – to achieve competitive performance against state-of-the-art search methods.
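For orientation, a $q$-metric is commonly defined by the strengthened triangle inequality $d(x,z)^q \le d(x,y)^q + d(y,z)^q$, which is the ordinary triangle inequality at $q = 1$ and the ultrametric inequality $d(x,z) \le \max(d(x,y), d(y,z))$ in the limit $q \to \infty$. The sketch below only checks this property on a small dissimilarity matrix; the helper name and the two example matrices are illustrative assumptions, and the paper's projection method is not reproduced here.

```python
import math
from itertools import permutations

def satisfies_q_triangle(dist, q, tol=1e-12):
    """Check d(x,z) <= (d(x,y)^q + d(y,z)^q)^(1/q) for all ordered triples.

    dist: square symmetric matrix (list of lists) of dissimilarities.
    q: a number >= 1, or math.inf for the ultrametric (max) case.
    """
    n = len(dist)
    for x, y, z in permutations(range(n), 3):
        if math.isinf(q):
            bound = max(dist[x][y], dist[y][z])
        else:
            bound = (dist[x][y] ** q + dist[y][z] ** q) ** (1.0 / q)
        if dist[x][z] > bound + tol:
            return False
    return True

# Three collinear points at 0, 1, 2: a metric, but not a 2-metric.
line = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
# An ultrametric: in every triangle the two largest sides are equal.
ultra = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
```

Larger $q$ makes the bound tighter, which is what lets a metric tree discard more candidates per comparison.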
Article 1471
Title@2025-06-06 (5): SDN-Based False Data Detection With Its Mitigation and Machine Learning Robustness for In-Vehicle Networks
Title: SDN-Based False Data Detection With Its Mitigation and Machine Learning Robustness for In-Vehicle Networks | SDN-basierte falsche Datenerkennung mit seiner Abmilderung und Machine Learning Robustheit für In-Vehicle-Netzwerke | 基于SDN的假数据探测,其减轻和机械学习强力适用于车辆内网络 2506.06556v1 |
Authors (4): Long Dang, Thushari Hapuarachchi, Kaiqi Xiong, Yi Li
As the development of autonomous and connected vehicles advances, the complexity of modern vehicles increases, with numerous Electronic Control Units (ECUs) integrated into the system. In an in-vehicle network, these ECUs communicate with one another using a standard protocol called Controller Area Network (CAN). Securing communication among ECUs plays a vital role in maintaining the safety and security of the vehicle. This paper proposes a robust SDN-based False Data Detection and Mitigation System (FDDMS) for in-vehicle networks. Leveraging the unique capabilities of Software-Defined Networking (SDN), FDDMS is designed to monitor and detect false data injection attacks in real-time. Specifically, we focus on brake-related ECUs within an SDN-enabled in-vehicle network. First, we decode raw CAN data to create an attack model that illustrates how false data can be injected into the system. Then, FDDMS, incorporating a Long Short Term Memory (LSTM)-based detection model, is used to identify false data injection attacks. We further propose an effective variant of the DeepFool attack to evaluate the model’s robustness. To counter the impacts of four adversarial attacks, including the Fast gradient descent method, the Basic iterative method, DeepFool, and the DeepFool variant, we further enhance a re-training technique with a threshold-based selection strategy. Finally, a mitigation scheme is implemented to redirect attack traffic by dynamically updating flow rules through SDN. Our experimental results show that the proposed FDDMS is robust against adversarial attacks and effectively detects and mitigates false data injection attacks in real-time.
Article 1472
Title@2025-06-06 (5): Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features
Title: Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features | Theoretische Grundlagen des Deep Copula Klassifikators: Ein generativer Ansatz zur Modellierung abhängiger Merkmale | 深 Cocula 分类法理论基础:建模附属地貌的开创性方法 2505.22997v2 |
Authors (3): Agnideep Aich, Ashit Baran Aich, Bruce Wade
Traditional classifiers often assume feature independence or rely on overly simplistic relationships, leading to poor performance in settings where real-world dependencies matter. We introduce the Deep Copula Classifier (DCC), a generative model that separates the learning of each feature’s marginal distribution from the modeling of their joint dependence structure via neural network-parameterized copulas. For each class, lightweight neural networks are used to flexibly and adaptively capture feature interactions, making DCC particularly effective when classification is driven by complex dependencies. We establish that DCC converges to the Bayes-optimal classifier under standard conditions and provide explicit convergence rates of O(n^{-r/(2r + d)}) for r-smooth copula densities. Beyond theoretical guarantees, we outline several practical extensions, including high-dimensional scalability through vine and factor copula architectures, semi-supervised learning via entropy regularization, and online adaptation using streaming gradient methods. By unifying statistical rigor with the representational power of neural networks, DCC offers a mathematically grounded and interpretable framework for dependency-aware classification.
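The separation of marginals from dependence rests on Sklar's theorem. In notation chosen here for illustration (the symbols $f_{j\mid y}$, $F_{j\mid y}$, $c_y$, and $\pi_y$ are assumptions, not taken from the paper), the class-conditional density and the resulting plug-in Bayes rule can be written as:

```latex
\begin{align}
p(x \mid y) &= c_y\bigl(F_{1\mid y}(x_1), \dots, F_{d\mid y}(x_d)\bigr) \prod_{j=1}^{d} f_{j\mid y}(x_j), \\
\hat{y}(x) &= \arg\max_{y} \; \pi_y \, p(x \mid y).
\end{align}
```

Here $f_{j\mid y}$ and $F_{j\mid y}$ are the marginal density and CDF of feature $j$ within class $y$, $c_y$ is the class-specific copula density (the part the abstract describes as neural network-parameterized), and $\pi_y$ is the class prior.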
Article 1473
Title@2025-06-06 (5): GeoClip: Geometry-Aware Clipping for Differentially Private SGD
Title: GeoClip: Geometry-Aware Clipping for Differentially Private SGD | GeoClip: Geometry-Aware Clipping für unterschiedlich private SGD | GeoClip: 区分专用 SGD 的几何- 软件剪切 2506.06549v1 |
Authors (4): Atefeh Gilani, Naima Tasnim, Lalitha Sankar, Oliver Kosut
Differentially private stochastic gradient descent (DP-SGD) is the most widely used method for training machine learning models with provable privacy guarantees. A key challenge in DP-SGD is setting the per-sample gradient clipping threshold, which significantly affects the trade-off between privacy and utility. While recent adaptive methods improve performance by adjusting this threshold during training, they operate in the standard coordinate system and fail to account for correlations across the coordinates of the gradient. We propose GeoClip, a geometry-aware framework that clips and perturbs gradients in a transformed basis aligned with the geometry of the gradient distribution. GeoClip adaptively estimates this transformation using only previously released noisy gradients, incurring no additional privacy cost. We provide convergence guarantees for GeoClip and derive a closed-form solution for the optimal transformation that minimizes the amount of noise added while keeping the probability of gradient clipping under control. Experiments on both tabular and image datasets demonstrate that GeoClip consistently outperforms existing adaptive clipping methods under the same privacy budget.
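As a rough illustration of clipping and perturbing in a transformed basis, the sketch below maps each per-sample gradient through a linear transform, clips its norm there, sums, adds Gaussian noise, and maps the average back. The transform `m`, its inverse `m_inv`, the function names, and the noise scaling are hypothetical placeholders; how GeoClip estimates the transform and calibrates noise is not shown, and this sketch makes no privacy claim.

```python
import math
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def private_mean_gradient(grads, m, m_inv, clip_norm, noise_multiplier, rng):
    """Clip per-sample gradients in the basis defined by m, then add noise.

    grads: list of per-sample gradients in the standard basis.
    m, m_inv: hypothetical linear transform and its inverse.
    Noise std is noise_multiplier * clip_norm, applied to the clipped sum.
    """
    clipped_sum = [0.0] * len(grads[0])
    for g in grads:
        z = matvec(m, g)                                   # change of basis
        norm = math.sqrt(sum(x * x for x in z))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped_sum = [s + scale * x for s, x in zip(clipped_sum, z)]
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in clipped_sum]
    mean = [x / len(grads) for x in noisy]
    return matvec(m_inv, mean)                             # back to standard basis

rng = random.Random(0)
identity = [[1.0, 0.0], [0.0, 1.0]]
# With zero noise and the identity transform this is ordinary norm clipping.
plain = private_mean_gradient([[3.0, 4.0]], identity, identity, 1.0, 0.0, rng)
# A diagonal rescaling changes which gradients get clipped and by how much.
scaled = private_mean_gradient(
    [[4.0, 0.0]], [[0.5, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]], 1.0, 0.0, rng
)
```

With the identity transform the routine reduces to the standard DP-SGD clip-and-noise step, which makes the effect of a non-trivial basis easy to compare.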
Article 1474
Title@2025-06-06 (5): MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping
Title: MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping | MINT: Multimodale Instruktions-Tuning mit multimodaler Interaktions-Gruppierung | MINT: 与多式互动组合组合的多式指示调制 2506.02308v3 |
Authors (5): Xiaojun Shan, Qi Cao, Xing Han, Haofei Yu, Paul Pu Liang
Recent advances in multimodal foundation models have achieved state-of-the-art performance across a range of tasks. These breakthroughs are largely driven by new pre-training paradigms that leverage large-scale, unlabeled multimodal data, followed by instruction fine-tuning on curated labeled datasets and high-quality prompts. While there is growing interest in scaling instruction fine-tuning to ever-larger datasets in both quantity and scale, our findings reveal that simply increasing the number of instruction-tuning tasks does not consistently yield better performance. Instead, we observe that grouping tasks by the common interactions across modalities, such as discovering redundant shared information, prioritizing modality selection with unique information, or requiring synergistic fusion to discover new information from both modalities, encourages the models to learn transferrable skills within a group while suppressing interference from mismatched tasks. To this end, we introduce MINT, a simple yet surprisingly effective task-grouping strategy based on the type of multimodal interaction. We demonstrate that the proposed method greatly outperforms existing task grouping baselines for multimodal instruction tuning, striking an effective balance between generalization and specialization.
Article 1475
Title@2025-06-06 (5): Mixed-curvature decision trees and random forests
Title: Mixed-curvature decision trees and random forests | Entscheidungsbäume mit gemischter Krümmung und zufällige Wälder | 混合精度决定树和随机森林 2410.13879v3 |
Authors (6): Philippe Chlenski, Quentin Chu, Raiyan R. Khan, Kaizhu Du, Antonio Khalil Moretti, Itsik Pe’er
Decision trees (DTs) and their random forest (RF) extensions are workhorses of classification and regression in Euclidean spaces. However, algorithms for learning in non-Euclidean spaces are still limited. We extend DT and RF algorithms to product manifolds: Cartesian products of several hyperbolic, hyperspherical, or Euclidean components. Such manifolds handle heterogeneous curvature while still factorizing neatly into simpler components, making them compelling embedding spaces for complex datasets. Our novel angular reformulation respects manifold geometry while preserving the algorithmic properties that make decision trees effective. In the special cases of single-component manifolds, our method simplifies to its Euclidean or hyperbolic counterparts, or introduces hyperspherical DT algorithms, depending on the curvature. In benchmarks on a diverse suite of 57 classification, regression, and link prediction tasks, our product RFs ranked first on 29 tasks and came in the top 2 for 41. This highlights the value of product RFs as straightforward yet powerful new tools for data analysis in product manifolds. Code for our method is available at https://github.com/pchlenski/manify.
Article 1476
Title@2025-06-06 (5): GNNAnatomy: Rethinking Model-Level Explanations for Graph Neural Networks
Title: GNNAnatomy: Rethinking Model-Level Explanations for Graph Neural Networks | GNNAnatomy: Rethinking Model-Level Erklärungen für Graph Neuronale Netzwerke | GNNN 解剖:重新思考图形神经网络的模型级解释 2406.04548v3 |
Authors (4): Hsiao-Ying Lu, Yiran Li, Ujwal Pratap Krishna Kaluvakolanu Thyagarajan, Kwan-Liu Ma
Graph Neural Networks (GNNs) achieve outstanding performance across graph-based tasks but remain difficult to interpret. In this paper, we revisit foundational assumptions underlying model-level explanation methods for GNNs, namely: (1) maximizing classification confidence yields representative explanations, (2) a single explanation suffices for an entire class of graphs, and (3) explanations are inherently trustworthy. We identify pitfalls resulting from these assumptions: methods that optimize for classification confidence may overlook partially learned patterns; topological diversity across graph subsets within the same class is often underrepresented; and explanations alone offer limited support for building user trust when applied to new datasets or models. This paper introduces GNNAnatomy, a distillation-based method designed to generate explanations while avoiding these pitfalls. GNNAnatomy first characterizes graph topology using graphlets, a set of fundamental substructures. We then train a transparent multilayer perceptron surrogate to directly approximate GNN predictions based on the graphlet representations. By analyzing the weights assigned to each graphlet, we identify the most discriminative topologies, which serve as GNN explanations. To account for structural diversity within a class, GNNAnatomy generates explanations at the required granularity through an interface that supports human-AI teaming. This interface helps users identify subsets of graphs where distinct critical substructures drive class differentiation, enabling multi-grained explanations. Additionally, by enabling exploration and linking explanations back to input graphs, the interface fosters greater transparency and trust. We evaluate GNNAnatomy on both synthetic and real-world datasets through quantitative metrics and qualitative comparisons with state-of-the-art model-level explainable GNN methods.
Article 1477
Title@2025-06-06 (5): Direct Fisher Score Estimation for Likelihood Maximization
Title: Direct Fisher Score Estimation for Likelihood Maximization | Direkte Fisher-Score-Schätzung für Likelihood Maximierung | 渔业直接分数估计,以争取使机会最大化 2506.06542v1 |
Authors (4): Sherman Khoo, Yakun Wang, Song Liu, Mark Beaumont
We study the problem of likelihood maximization when the likelihood function is intractable but model simulations are readily available. We propose a sequential, gradient-based optimization method that directly models the Fisher score based on a local score matching technique which uses simulations from a localized region around each parameter iterate. By employing a linear parameterization to the surrogate score model, our technique admits a closed-form, least-squares solution. This approach yields a fast, flexible, and efficient approximation to the Fisher score, effectively smoothing the likelihood objective and mitigating the challenges posed by complex likelihood landscapes. We provide theoretical guarantees for our score estimator, including bounds on the bias introduced by the smoothing. Empirical results on a range of synthetic and real-world problems demonstrate the superior performance of our method compared to existing benchmarks.
Article 1478
Title@2025-06-06 (5): LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
Title: LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs | LabSafety Bench: Benchmarking LLMs zu Sicherheitsfragen in wissenschaftlichen Laboren | 实验室安全法官:科学实验室安全问题LLM女士基准基准 2410.14182v3 |
Authors (15): Yujun Zhou, Jingdong Yang, Yue Huang, Kehan Guo, Zoe Emory, Bikram Ghosh, Amita Bedar, Sujay Shekar, Zhenwen Liang, Pin-Yu Chen, Tian Gao, Werner Geyer, Nuno Moniz, Nitesh V Chawla, Xiangliang Zhang
Artificial Intelligence (AI) is revolutionizing scientific research, yet its growing integration into laboratory environments presents critical safety challenges. While large language models (LLMs) increasingly assist in tasks ranging from procedural guidance to autonomous experiment orchestration, an “illusion of understanding” may lead researchers to overestimate their reliability. Such overreliance is particularly dangerous in high-stakes laboratory settings, where failures in hazard identification or risk assessment can result in severe accidents. To address these concerns, we propose the Laboratory Safety Benchmark (LabSafety Bench), a comprehensive framework that evaluates large language models and vision language models (VLMs) on their ability to identify potential hazards, assess risks, and predict the consequences of unsafe actions in lab environments. LabSafety Bench comprises 765 multiple-choice questions aligned with US Occupational Safety and Health Administration (OSHA) protocols, along with 404 realistic laboratory scenarios featuring dual evaluation tasks: the Hazards Identification Test and the Consequence Identification Test, with 3128 open-ended questions in total. Evaluations across eight proprietary models, seven open-weight LLMs, and four VLMs reveal that, despite advanced performance on structured assessments, no model achieves the safety threshold required for reliable operation – none scoring above 70% on the Hazards Identification Test. Moreover, while proprietary models tend to excel in multiple-choice evaluations, their performance in open-ended, real-world scenario responses is comparable to that of open-source models. These findings underscore the urgent need for specialized evaluation frameworks to ensure the safe and responsible deployment of AI in laboratory settings.
Article 1479
Title@2025-06-06 (5): Rational Tuning of LLM Cascades via Probabilistic Modeling
Title: Rational Tuning of LLM Cascades via Probabilistic Modeling | Rationales Tuning von LLM-Kaskaden durch probabilistische Modellierung | 通过概率建模对LLM Casscades 进行合理测算 2501.09345v4 |
Authors (2): Michael J. Zellinger, Matt Thomson
Understanding the reliability of large language models (LLMs) has recently garnered significant attention. Given LLMs’ propensity to hallucinate, as well as their high sensitivity to prompt design, it is already challenging to predict the performance of an individual LLM. However, the problem becomes more complex for compound LLM systems such as cascades, where in addition to each model’s standalone performance, we must understand how the error rates of different models interact. In this paper, we present a probabilistic model for the joint performance distribution of a sequence of LLMs, which enables a framework for rationally tuning the confidence thresholds of an LLM cascade using continuous optimization. Compared to selecting confidence thresholds using Bayesian optimization, our parametric Markov-copula model yields more favorable error-cost trade-offs, improving the area under the error-cost curve by 4.3% on average for cascades with $k\geq 3$ models. In the low-sample regime with $n \leq 30$ training examples, the performance improvement widens to 10.2%, suggesting that our framework’s inductive assumptions about the interactions between the error rates of different LLMs enhance sample efficiency. Overall, our Markov-copula model provides a rational basis for tuning LLM cascade performance and points to the potential of probabilistic methods in analyzing systems of LLMs.
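To make the object being tuned concrete, here is a minimal sketch of a confidence-threshold cascade: each model answers in turn, and the query is deferred to the next model whenever the reported confidence falls below that model's threshold. The function name, the `(answer, confidence)` return convention, and the toy models are assumptions for illustration; the Markov-copula model and the optimization of the thresholds are not shown.

```python
def run_cascade(query, models, thresholds):
    """Route a query through models in order of increasing cost.

    models: list of callables, each returning (answer, confidence in [0, 1]).
    thresholds: one confidence threshold per model except the last,
        which always answers.
    Returns (answer, index of the model that produced it).
    """
    for i, model in enumerate(models[:-1]):
        answer, confidence = model(query)
        if confidence >= thresholds[i]:
            return answer, i          # confident enough: stop here
    answer, _ = models[-1](query)     # final model never defers
    return answer, len(models) - 1

# Toy stand-ins for three LLMs with fixed confidences.
small = lambda q: ("small", 0.4)
medium = lambda q: ("medium", 0.7)
large = lambda q: ("large", 0.9)
models = [small, medium, large]
```

Raising a threshold sends more queries to the later, costlier models, which is the error-cost trade-off that the thresholds control.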
Article 1480
Title@2025-06-06 (5): Hierarchical and Collaborative LLM-Based Control for Multi-UAV Motion and Communication in Integrated Terrestrial and Non-Terrestrial Networks
Title: Hierarchical and Collaborative LLM-Based Control for Multi-UAV Motion and Communication in Integrated Terrestrial and Non-Terrestrial Networks | Hierarchische und kollaborative LLM-basierte Steuerung für Multi-UAV-Bewegung und Kommunikation in integrierten terrestrischen und nicht-terrestrischen Netzwerken | 在综合陆地和非地球网络中以基于LLM(基于LLM)的基于控制系统,以控制多功能、无人驾驶航空器的移动和通信 2506.06532v1 |
Authors (4): Zijiang Yan, Hao Zhou, Jianhua Pei, Hina Tabassum
Unmanned aerial vehicles (UAVs) have been widely adopted in various real-world applications. However, the control and optimization of multi-UAV systems remain a significant challenge, particularly in dynamic and constrained environments. This work explores the joint motion and communication control of multiple UAVs operating within integrated terrestrial and non-terrestrial networks that include high-altitude platform stations (HAPS). Specifically, we consider an aerial highway scenario in which UAVs must accelerate, decelerate, and change lanes to avoid collisions and maintain overall traffic flow. Different from existing studies, we propose a novel hierarchical and collaborative method based on large language models (LLMs). In our approach, an LLM deployed on the HAPS performs UAV access control, while another LLM onboard each UAV handles motion planning and control. This LLM-based framework leverages the rich knowledge embedded in pre-trained models to enable both high-level strategic planning and low-level tactical decisions. This knowledge-driven paradigm holds great potential for the development of next-generation 3D aerial highway systems. Experimental results demonstrate that our proposed collaborative LLM-based method achieves higher system rewards, lower operational costs, and significantly reduced UAV collision rates compared to baseline approaches.
Article 1481
Title@2025-06-06 (5): Training-Free Constrained Generation With Stable Diffusion Models
Title: Training-Free Constrained Generation With Stable Diffusion Models | Schulungsfreie eingeschränkte Generation mit stabilen Diffusionsmodellen | 具有稳定传播模式的稳定传播模式的无培训、无培训的受训练一代 2502.05625v3 |
Authors (5): Stefano Zampini, Jacob K. Christopher, Luca Oneto, Davide Anguita, Ferdinando Fioretto
Stable diffusion models represent the state-of-the-art in data synthesis across diverse domains and hold transformative potential for applications in science and engineering, e.g., by facilitating the discovery of novel solutions and simulating systems that are computationally intractable to model explicitly. While there is increasing effort to incorporate physics-based constraints into generative models, existing techniques are either limited in their applicability to latent diffusion frameworks or lack the capability to strictly enforce domain-specific constraints. To address this limitation, this paper proposes a novel integration of stable diffusion models with constrained optimization frameworks, enabling the generation of outputs satisfying stringent physical and functional requirements. The effectiveness of this approach is demonstrated through material design experiments requiring adherence to precise morphometric properties, challenging inverse design tasks involving the generation of materials inducing specific stress-strain responses, and copyright-constrained content generation tasks.
Article 1482
Title@2025-06-06 (5): Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs
Title: Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs | Sharp Gap-Dependent Variance-Aware Regret Bounds für Tabular MDPs | 表列 MDP 的差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差幅 2506.06521v1 |
Authors (5): Shulun Chen, Runlong Zhou, Zihan Zhang, Maryam Fazel, Simon S. Du
We consider the gap-dependent regret bounds for episodic MDPs. We show that the Monotonic Value Propagation (MVP) algorithm achieves a variance-aware gap-dependent regret bound of \(\tilde{O}\left(\left(\sum_{\Delta_h(s,a)>0} \frac{H^2 \log K \land \mathtt{Var}_{\max}^{\text{c}}}{\Delta_h(s,a)} +\sum_{\Delta_h(s,a)=0}\frac{ H^2 \land \mathtt{Var}_{\max}^{\text{c}}}{\Delta_{\mathrm{min}}} + SAH^4 (S \lor H) \right) \log K\right),\) where $H$ is the planning horizon, $S$ is the number of states, $A$ is the number of actions, and $K$ is the number of episodes. Here, $\Delta_h(s,a) = V_h^*(s) - Q_h^*(s, a)$ represents the suboptimality gap and $\Delta_{\mathrm{min}} := \min_{\Delta_h (s,a) > 0} \Delta_h(s,a)$. The term $\mathtt{Var}_{\max}^{\text{c}}$ denotes the maximum conditional total variance, calculated as the maximum over all $(\pi, h, s)$ tuples of the expected total variance under policy $\pi$ conditioned on trajectories visiting state $s$ at step $h$. $\mathtt{Var}_{\max}^{\text{c}}$ characterizes the maximum randomness encountered when learning any $(h, s)$ pair. Our result stems from a novel analysis of the weighted sum of the suboptimality gap and can be potentially adapted for other algorithms. To complement the study, we establish a lower bound of \(\Omega \left( \sum_{\Delta_h(s,a)>0} \frac{H^2 \land \mathtt{Var}_{\max}^{\text{c}}}{\Delta_h(s,a)}\cdot \log K\right),\) demonstrating the necessity of dependence on $\mathtt{Var}_{\max}^{\text{c}}$ even when the maximum unconditional total variance (without conditioning on $(h, s)$) approaches zero.
Article 1483
Title@2025-06-06 (5): A Systematic Review of Poisoning Attacks Against Large Language Models
Title: A Systematic Review of Poisoning Attacks Against Large Language Models | Eine systematische Überprüfung von Giftangriffen gegen große Sprachmodelle | 系统审查针对大语言模式的中毒袭击 2506.06518v1 |
Authors (6): Neil Fendley, Edward W. Staley, Joshua Carney, William Redman, Marie Chau, Nathan Drenkow
With the widespread availability of pretrained Large Language Models (LLMs) and their training datasets, concerns about the security risks associated with their usage have increased significantly. One of these security risks is the threat of LLM poisoning attacks, where an attacker modifies some part of the LLM training process to cause the LLM to behave in a malicious way. As an emerging area of research, the current frameworks and terminology for LLM poisoning attacks are derived from earlier classification poisoning literature and are not fully equipped for generative LLM settings. We conduct a systematic review of published LLM poisoning attacks to clarify the security implications and address inconsistencies in terminology across the literature. We propose a comprehensive poisoning threat model that can categorize a wide range of LLM poisoning attacks. The poisoning threat model includes four poisoning attack specifications that define the logistics and manipulation strategies of an attack as well as six poisoning metrics used to measure key characteristics of an attack. Under our proposed framework, we organize our discussion of published LLM poisoning literature along four critical dimensions of LLM poisoning attacks: concept poisons, stealthy poisons, persistent poisons, and poisons for unique tasks, to better understand the current landscape of security risks.
Article 1484
Title@2025-06-06 (5): MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding
Title: MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding | MindLLM: Ein subject-agnostisches und vielseitiges Modell für fMRI-to-Text Decodierung | MindLLLM: FMRI 到文本代码的主体不可知性和易变性模型 2502.15786v2 |
Authors (6): Weikang Qiu, Zheng Huang, Haoyu Hu, Aosong Feng, Yujun Yan, Rex Ying
Decoding functional magnetic resonance imaging (fMRI) signals into text has been a key challenge in the neuroscience community, with the potential to advance brain-computer interfaces and uncover deeper insights into brain mechanisms. However, existing approaches often struggle with suboptimal predictive performance, limited task variety, and poor generalization across subjects. In response to this, we propose MindLLM, a model designed for subject-agnostic and versatile fMRI-to-text decoding. MindLLM consists of an fMRI encoder and an off-the-shelf LLM. The fMRI encoder employs a neuroscience-informed attention mechanism, which is capable of accommodating subjects with varying input shapes and thus achieves high-performance subject-agnostic decoding. Moreover, we introduce Brain Instruction Tuning (BIT), a novel approach that enhances the model’s ability to capture diverse semantic representations from fMRI signals, facilitating more versatile decoding. We evaluate MindLLM on comprehensive fMRI-to-text benchmarks. Results demonstrate that our model outperforms the baselines, improving downstream tasks by 12.0%, unseen subject generalization by 24.5%, and novel task adaptation by 25.0%. Furthermore, the attention patterns in MindLLM provide interpretable insights into its decision-making process.
Article 1485
Title@2025-06-06 (5): InstantFT: An FPGA-Based Runtime Subsecond Fine-tuning of CNN Models
Title: InstantFT: An FPGA-Based Runtime Subsecond Fine-tuning of CNN Models | InstantFT: Eine FPGA-basierte Laufzeit-Untersekunde Feinabstimmung von CNN-Modellen | 即时TFT:以FPGA为基础对有线电视新闻网模型进行FPGA运行时间第二级微调 2506.06505v1 |
Authors (2): Keisuke Sugiura, Hiroki Matsutani
Training deep neural networks (DNNs) requires significantly more computation and memory than inference, making runtime adaptation of DNNs challenging on resource-limited IoT platforms. We propose InstantFT, an FPGA-based method for ultra-fast CNN fine-tuning on IoT devices, by optimizing the forward and backward computations in parameter-efficient fine-tuning (PEFT). Experiments on datasets with concept drift demonstrate that InstantFT fine-tunes a pre-trained CNN 17.4x faster than existing Low-Rank Adaptation (LoRA)-based approaches, while achieving comparable accuracy. Our FPGA-based InstantFT reduces the fine-tuning time to just 0.36s and improves energy-efficiency by 16.3x, enabling on-the-fly adaptation of CNNs to non-stationary data distributions.
Article 1486
Title@2025-06-06 (5): Optimal Rates in Continual Linear Regression via Increasing Regularization
Title: Optimal Rates in Continual Linear Regression via Increasing Regularization | Optimale Raten in kontinuierlicher linearer Regression durch zunehmende Regularisierung | 通过增加正规化,持续线性回归的优化率 2506.06501v1 |
Authors (7): Ran Levinstein, Amit Attia, Matan Schliserman, Uri Sherman, Tomer Koren, Daniel Soudry, Itay Evron
We study realizable continual linear regression under random task orderings, a common setting for developing continual learning theory. In this setup, the worst-case expected loss after $k$ learning iterations admits a lower bound of $\Omega(1/k)$. However, prior work using an unregularized scheme has only established an upper bound of $O(1/k^{1/4})$, leaving a significant gap. Our paper proves that this gap can be narrowed, or even closed, using two frequently used regularization schemes: (1) explicit isotropic $\ell_2$ regularization, and (2) implicit regularization via finite step budgets. We show that these approaches, which are used in practice to mitigate forgetting, reduce to stochastic gradient descent (SGD) on carefully defined surrogate losses. Through this lens, we identify a fixed regularization strength that yields a near-optimal rate of $O(\log k / k)$. Moreover, formalizing and analyzing a generalized variant of SGD for time-varying functions, we derive an increasing regularization strength schedule that provably achieves an optimal rate of $O(1/k)$. This suggests that schedules that increase the regularization coefficient or decrease the number of steps per task are beneficial, at least in the worst case.
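The explicit regularization scheme can be sketched in a simplified setting. The assumptions here are ours: each task is a single sample $(x, y)$, the $\ell_2$ penalty pulls toward the previous iterate, and the schedule $\lambda_t = c \cdot t$ is a hypothetical increasing schedule chosen only for illustration (the paper's schedule is not given in the abstract). Under these assumptions each task has a closed-form minimizer.

```python
def regularized_step(w_prev, x, y, lam):
    """Minimize (x . w - y)^2 + lam * ||w - w_prev||^2 in closed form.

    The minimizer moves w_prev along x by (y - x . w_prev) / (||x||^2 + lam).
    Requires ||x||^2 + lam > 0.
    """
    residual = y - sum(a * b for a, b in zip(x, w_prev))
    step = residual / (sum(a * a for a in x) + lam)
    return [w + step * a for w, a in zip(w_prev, x)]

def continual_fit(tasks, dim, c=0.1):
    """Process tasks in order with a hypothetical schedule lam_t = c * t."""
    w = [0.0] * dim
    for t, (x, y) in enumerate(tasks, start=1):
        w = regularized_step(w, x, y, lam=c * t)
    return w

# Two realizable tasks generated by the (hypothetical) solution w* = [1, -1].
tasks = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)] * 50
w_final = continual_fit(tasks, dim=2)
```

With `lam = 0` a step fits the current task exactly; larger `lam` keeps the iterate closer to what earlier tasks produced, which is the forgetting trade-off the schedule controls.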
Article 1487
Title@2025-06-06 (5): Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms
Title: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms | Synthetische Problemgenerierung für die Vernunft über Qualität-Diversität-Algorithmen | 通过质量-多样性算法导致的原因产生合成问题 2506.06499v1 |
Authors (4): Alex Havrilla, Edward Hughes, Mikayel Samvelyan, Jacob Abernethy
Large language model (LLM) driven synthetic data generation has emerged as a powerful method for improving model reasoning capabilities. However, most methods either distill large state-of-the-art models into small students or use natural ground-truth problem statements to guarantee problem statement quality. This limits the scalability of these approaches to more complex and diverse problem domains. To address this, we present SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms, a novel approach for generating high-quality and diverse synthetic math problem and solution pairs using only a single model by measuring a problem’s solve-rate: a proxy for problem difficulty. Starting from a seed dataset of 7.5K samples, we generate over 20 million new problem-solution pairs. We show that filtering the generated data by difficulty and then fine-tuning the same model on the resulting data improves relative model performance by up to 24%. Additionally, we conduct ablations studying the impact of synthetic data quantity, quality and diversity on model generalization. We find that higher quality, as measured by problem difficulty, facilitates better in-distribution performance. Further, while generating diverse synthetic data does not as strongly benefit in-distribution performance, filtering for more diverse data facilitates more robust OOD generalization. We also confirm the existence of model and data scaling laws for synthetically generated problems, which positively benefit downstream model generalization.
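A solve-rate filter of the kind mentioned in this abstract could look like the sketch below. It assumes each problem carries a list of sampled answers and a reference answer to compare against; the dictionary keys, function names, and difficulty band are hypothetical, and the abstract does not say how SPARQ decides whether a sampled solution is correct.

```python
def solve_rate(attempts, reference):
    """Fraction of sampled answers that match the reference answer."""
    if not attempts:
        raise ValueError("need at least one attempt")
    return sum(a == reference for a in attempts) / len(attempts)

def filter_by_difficulty(problems, low, high):
    """Keep problems whose solve-rate lies in the closed interval [low, high]."""
    kept = []
    for p in problems:
        rate = solve_rate(p["attempts"], p["reference"])
        if low <= rate <= high:
            kept.append(p)
    return kept

problems = [
    {"id": "a", "attempts": ["4", "4", "5", "4"], "reference": "4"},  # rate 0.75
    {"id": "b", "attempts": ["7", "7", "7", "7"], "reference": "7"},  # rate 1.0
    {"id": "c", "attempts": ["1", "2", "3", "9"], "reference": "0"},  # rate 0.0
]
moderate = filter_by_difficulty(problems, low=0.25, high=0.8)
```

A lower solve-rate marks a harder problem, so a band such as [0.25, 0.8] drops both trivially easy items and ones the model never solves.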
Article 1488
Title@2025-06-06 (5): Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent
Title: Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent | Verbesserte Finite-Partikel-Konvergenzraten für Steinvariationsgradabszenz | 改进斯坦因变化而变的渐变后代的有限粒子聚合率 2409.08469v3 |
Authors (3): Sayan Banerjee, Krishnakumar Balasubramanian, Promit Ghosal
We provide finite-particle convergence rates for the Stein Variational Gradient Descent (SVGD) algorithm in the Kernelized Stein Discrepancy ($\mathsf{KSD}$) and Wasserstein-2 metrics. Our key insight is that the time derivative of the relative entropy between the joint density of $N$ particle locations and the $N$-fold product target measure, starting from a regular initial distribution, splits into a dominant 'negative part' proportional to $N$ times the expected $\mathsf{KSD}^2$ and a smaller 'positive part'. This observation leads to $\mathsf{KSD}$ rates of order $1/\sqrt{N}$, in both continuous and discrete time, providing a near optimal (in the sense of matching the corresponding i.i.d. rates) double exponential improvement over the recent result by Shi and Mackey (2024). Under mild assumptions on the kernel and potential, these bounds also grow polynomially in the dimension $d$. By adding a bilinear component to the kernel, the above approach is used to further obtain Wasserstein-2 convergence in continuous time. For the case of 'bilinear + Matérn' kernels, we derive Wasserstein-2 rates that exhibit a curse-of-dimensionality similar to the i.i.d. setting. We also obtain marginal convergence and long-time propagation of chaos results for the time-averaged particle laws.
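For readers unfamiliar with SVGD, the standard discrete-time particle update can be sketched as below. The Gaussian (RBF) kernel, the bandwidth, the step size, and the standard-normal target are illustrative assumptions; the paper's rates concern other kernel choices (for example bilinear + Matérn) that this sketch does not implement.

```python
import numpy as np

def svgd_step(x, grad_log_p, step, bandwidth):
    """One SVGD update for particles x of shape (N, d) with an RBF kernel."""
    diff = x[:, None, :] - x[None, :, :]            # diff[i, j] = x_i - x_j
    k = np.exp(-np.sum(diff ** 2, axis=-1) / bandwidth)
    # Driving term: sum_j k(x_j, x_i) * grad log p(x_j)
    drive = k @ grad_log_p(x)
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) = (2/h) * sum_j k_ij (x_i - x_j)
    repulse = (2.0 / bandwidth) * np.sum(k[:, :, None] * diff, axis=1)
    return x + step * (drive + repulse) / x.shape[0]

rng = np.random.default_rng(0)
initial = rng.normal(loc=3.0, size=(50, 2))   # particles start away from the target
particles = initial.copy()
for _ in range(500):
    # Target is a standard Gaussian, so grad log p(x) = -x.
    particles = svgd_step(particles, lambda z: -z, step=0.1, bandwidth=1.0)
```

The driving term pulls particles toward high-density regions of the target, while the repulsive term keeps them from collapsing onto a single point.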
Article 1489
Title@2025-06-06 (5): Synthetic Text Generation for Training Large Language Models via Gradient Matching
Title: Synthetic Text Generation for Training Large Language Models via Gradient Matching | Synthetische Textgenerierung für das Training großer Sprachmodelle über Gradient Matching | 通过 “ 渐级匹配 “ 培训大语言模型的合成文本生成 2502.17607v2 |
Authors (6): Dang Nguyen, Zeman Li, Mohammadhossein Bateni, Vahab Mirrokni, Meisam Razaviyayn, Baharan Mirzasoleiman
Synthetic data has the potential to improve the performance, training efficiency, and privacy of real training examples. Nevertheless, existing approaches for synthetic text generation are mostly heuristic and cannot generate human-readable text without compromising the privacy of real data, or provide performance guarantees for training Large Language Models (LLMs). In this work, we propose the first theoretically rigorous approach for generating synthetic human-readable text that provides convergence, performance, and privacy guarantees for fine-tuning LLMs on a target task. To do so, we leverage the Alternating Direction Method of Multipliers (ADMM), which iteratively optimizes the embeddings of synthetic examples to match the noisy gradient of the target training or validation data, and maps them to a sequence of text tokens with low perplexity. In doing so, the generated synthetic text guarantees convergence of the model to a close neighborhood of the solution obtained by fine-tuning on real data, while preserving the privacy of that data. Experiments on various classification tasks confirm the effectiveness of our proposed approach. Our code is available at https://github.com/BigML-CS-UCLA/GRADMM.
Article 1490
Title@2025-06-06 (5): Differentiable Quadratic Optimization For The Maximum Independent Set Problem
Title: Differentiable Quadratic Optimization For The Maximum Independent Set Problem | Unterschiedliche quadratische Optimierung für das maximale unabhängige Set-Problem | 最大独立集集问题可区别的二次二次曲线优化 2406.19532v7 |
Authors (7): Ismail Alkhouri, Cedric Le Denmat, Yingjie Li, Cunxi Yu, Jia Liu, Rongrong Wang, Alvaro Velasquez
Combinatorial Optimization (CO) addresses many important problems, including the challenging Maximum Independent Set (MIS) problem. Alongside exact and heuristic solvers, differentiable approaches have emerged, often using continuous relaxations of ReLU-based or quadratic objectives. Noting that an MIS in a graph is a Maximum Clique (MC) in its complement, we propose a new quadratic formulation for MIS by incorporating an MC term, improving convergence and exploration. We show that every maximal independent set corresponds to a local minimizer, derive conditions with respect to the MIS size, and characterize stationary points. To tackle the non-convexity of the objective, we propose optimizing several initializations in parallel using momentum-based gradient descent, complemented by an efficient MIS checking criterion derived from our theory. We dub our method as parallelized Clique-Informed Quadratic Optimization for MIS (pCQO-MIS). Our experimental results demonstrate the effectiveness of the proposed method compared to exact, heuristic, sampling, and data-centric approaches. Notably, our method avoids the out-of-distribution tuning and reliance on (un)labeled data required by data-centric methods, while achieving superior MIS sizes and competitive runtime relative to their inference time. Additionally, a key advantage of pCQO-MIS is that, unlike exact and heuristic solvers, the runtime scales only with the number of nodes in the graph, not the number of edges. Our code is available at the GitHub repository: https://github.com/ledenmat/pCQO-mis-benchmark/tree/refactor.
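As a rough illustration of the differentiable approach, the sketch below runs projected gradient descent from several random initializations on the classical quadratic relaxation $-\sum_v x_v + \gamma \sum_{(u,v) \in E} x_u x_v$ over $[0,1]^n$. It is an assumption-laden simplification: the clique (MC) term, the momentum, and the theory-derived checking criterion of pCQO-MIS are not reproduced, and `gamma`, `step`, `iters`, and `restarts` are hypothetical values.

```python
import numpy as np

def is_independent(nodes, edges):
    """True if no edge has both endpoints among the selected nodes."""
    chosen = set(nodes)
    return not any(u in chosen and v in chosen for u, v in edges)

def relaxed_mis(num_nodes, edges, gamma=2.0, step=0.1, iters=300, restarts=8, seed=0):
    """Projected gradient descent on -sum(x) + gamma * sum_{(u,v) in E} x_u x_v."""
    rng = np.random.default_rng(seed)
    adj = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    best = []
    for _ in range(restarts):
        x = rng.uniform(size=num_nodes)
        for _ in range(iters):
            grad = -1.0 + gamma * (adj @ x)          # gradient of the objective
            x = np.clip(x - step * grad, 0.0, 1.0)   # project back onto [0, 1]^n
        nodes = [int(i) for i in np.flatnonzero(x > 0.5)]
        # Keep only rounded solutions that are valid independent sets.
        if is_independent(nodes, edges) and len(nodes) > len(best):
            best = nodes
    return best

cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # 5-cycle: MIS size is 2
best = relaxed_mis(5, cycle)
```

Running several initializations reflects the non-convexity of the objective: different starting points can settle at different local minimizers.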
Article 1491
Title@2025-06-06 (5): RelGNN: Composite Message Passing for Relational Deep Learning
Title: RelGNN: Composite Message Passing for Relational Deep Learning | RelGNN: Composite-Botschaft für relationales Deep Learning | RelGNN:用于关系深深学习的综合信息传递 2502.06784v2 |
Authors (3): Tianlang Chen, Charilaos Kanatsoulis, Jure Leskovec
Predictive tasks on relational databases are critical in real-world applications spanning e-commerce, healthcare, and social media. To address these tasks effectively, Relational Deep Learning (RDL) encodes relational data as graphs, enabling Graph Neural Networks (GNNs) to exploit relational structures for improved predictions. However, existing RDL methods often overlook the intrinsic structural properties of the graphs built from relational databases, leading to modeling inefficiencies, particularly in handling many-to-many relationships. Here we introduce RelGNN, a novel GNN framework specifically designed to leverage the unique structural characteristics of the graphs built from relational databases. At the core of our approach is the introduction of atomic routes, which are simple paths that enable direct single-hop interactions between the source and destination nodes. Building upon these atomic routes, RelGNN designs new composite message passing and graph attention mechanisms that reduce redundancy, highlight key signals, and enhance predictive accuracy. RelGNN is evaluated on 30 diverse real-world tasks from Relbench (Fey et al., 2024), and achieves state-of-the-art performance on the vast majority of tasks, with improvements of up to 25%. Code is available at https://github.com/snap-stanford/RelGNN.
Article 1492
Title@2025-06-06 (5): Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
Title: Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks | Alternierende Gradientenströme: Eine Theorie des Feature-Lernens in zweischichtigen Neuronalen Netzwerken | 交错的渐变流:两层神经网络中的特色学习理论 2506.06489v1 |
Authors (8): Daniel Kunin, Giovanni Luca Marchetti, Feng Chen, Dhruva Karkada, James B. Simon, Michael R. DeWeese, Surya Ganguli, Nina Miolane
What features neural networks learn, and how, remains an open question. In this paper, we introduce Alternating Gradient Flows (AGF), an algorithmic framework that describes the dynamics of feature learning in two-layer networks trained from small initialization. Prior works have shown that gradient flow in this regime exhibits a staircase-like loss curve, alternating between plateaus where neurons slowly align to useful directions and sharp drops where neurons rapidly grow in norm. AGF approximates this behavior as an alternating two-step process: maximizing a utility function over dormant neurons and minimizing a cost function over active ones. AGF begins with all neurons dormant. At each round, a dormant neuron activates, triggering the acquisition of a feature and a drop in the loss. AGF quantifies the order, timing, and magnitude of these drops, matching experiments across architectures. We show that AGF unifies and extends existing saddle-to-saddle analyses in fully connected linear networks and attention-only linear transformers, where the learned features are singular modes and principal components, respectively. In diagonal linear networks, we prove AGF converges to gradient flow in the limit of vanishing initialization. Applying AGF to quadratic networks trained to perform modular addition, we give the first complete characterization of the training dynamics, revealing that networks learn Fourier features in decreasing order of coefficient magnitude. Altogether, AGF offers a promising step towards understanding feature learning in neural networks.
Article 1493
Title@2025-06-06 (5): Diversity of Transformer Layers: One Aspect of Parameter Scaling Laws
Title: Diversity of Transformer Layers: One Aspect of Parameter Scaling Laws | Diversity of Transformer Layers: Ein Aspekt der Parameterskalierungsgesetze | 变形层多样性:参数面积法的一个方面 2505.24009v2 |
Authors (6): Hidetaka Kamigaito, Ying Zhang, Jingun Kwon, Katsuhiko Hayashi, Manabu Okumura, Taro Watanabe
Transformers deliver outstanding performance across a wide range of tasks and are now a dominant backbone architecture for large language models (LLMs). Their task-solving performance is improved by increasing parameter size, as shown in recent studies on parameter scaling laws. Although recent mechanistic-interpretability studies have deepened our understanding of the internal behavior of Transformers by analyzing their residual stream, the relationship between these internal mechanisms and the parameter scaling laws remains unclear. To bridge this gap, we focus on layers and their size, which mainly determine the parameter size of Transformers. For this purpose, we first theoretically investigate the layers within the residual stream through a bias-diversity decomposition. The decomposition separates (i) bias, the error of each layer’s output from the ground truth, and (ii) diversity, which indicates how much the outputs of each layer differ from each other. Analyzing Transformers under this theory reveals that performance improves when individual layers make predictions close to the correct answer and remain mutually diverse. We show that diversity becomes especially critical when individual layers’ outputs are far from the ground truth. Finally, we introduce an information-theoretic diversity and present our main finding: adding layers enhances performance only when those layers behave differently, i.e., are diverse. We also reveal that the performance gains from increasing the number of layers exhibit submodularity: marginal improvements diminish as more layers are added, mirroring the logarithmic convergence predicted by the parameter scaling laws. Experiments on multiple semantic-understanding tasks with various LLMs empirically confirm the theoretical properties derived in this study.
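The flavor of a bias-diversity decomposition can be checked numerically with the classical squared-error identity for an averaged predictor: the error of the average equals the mean individual error minus the spread around the average. This is offered only as an analogue; the abstract does not state the paper's exact decomposition, and the "layer outputs" below are synthetic numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
target = 1.0
# Hypothetical scalar predictions contributed by six layers.
layer_outputs = rng.normal(loc=1.0, scale=0.5, size=6)

combined = layer_outputs.mean()
bias = np.mean((layer_outputs - target) ** 2)         # mean individual error
diversity = np.mean((layer_outputs - combined) ** 2)  # spread around the average
combined_error = (combined - target) ** 2
```

Because `combined_error` equals `bias - diversity`, more diverse outputs lower the combined error for a fixed level of individual error, which matches the qualitative claim in the abstract.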
Article 1494
Title@2025-06-06 (5): Membership Inference Attacks for Unseen Classes
Title: Membership Inference Attacks for Unseen Classes | Mitgliedschafts-Inferenz-Angriffe für ungesehene Klassen | 对不见人见的类类的推论攻击 2506.06488v1 |
Authors (4): Pratiksha Thaker, Neil Kale, Zhiwei Steven Wu, Virginia Smith
Shadow model attacks are the state-of-the-art approach for membership inference attacks on machine learning models. However, these attacks typically assume an adversary has access to a background (nonmember) data distribution that matches the distribution the target model was trained on. We initiate a study of membership inference attacks where the adversary or auditor cannot access an entire subclass from the distribution – a more extreme but realistic version of distribution shift than has been studied previously. In this setting, we first show that the performance of shadow model attacks degrades catastrophically, and then demonstrate the promise of another approach, quantile regression, that does not have the same limitations. We show that quantile regression attacks consistently outperform shadow model attacks in the class dropout setting – for example, quantile regression attacks achieve up to 11$\times$ the TPR of shadow models on the unseen class on CIFAR-100, and achieve nontrivial TPR on ImageNet even with 90% of training classes removed. We also provide a theoretical model that illustrates the potential and limitations of this approach.
Article 1495
Title@2025-06-06 (5): Neural-Augmented Kelvinlet: Real-Time Soft Tissue Deformation with Multiple Graspers
Title: Neural-Augmented Kelvinlet: Real-Time Soft Tissue Deformation with Multiple Graspers | Neural-Augmented Kelvinlet: Echtzeit-Soft Tissue-Deformation mit mehreren Graspers | 神经加速开尔文莱:实时软体组织变形,带多颗粒变形 2506.08043v1 |
Authors (15): Ashkan Shahbazi, Kyvia Pereira, Jon S. Heiselman, Elaheh Akbari, Annie C. Benson, Sepehr Seifi, Xinyuan Liu, Garrison L. Johnston, Erwin Terpstra, Anne Draaisma, Jan-Jaap Severes, Jie Ying Wu, Nabil Simaan, Michael L. Miga, Soheil Kolouri
Fast and accurate simulation of soft tissue deformation is a critical factor for surgical robotics and medical training. In this paper, we introduce a novel physics-informed neural simulator that approximates soft tissue deformations in a realistic and real-time manner. Our framework integrates Kelvinlet-based priors into neural simulators, making it the first approach to leverage Kelvinlets for residual learning and regularization in data-driven soft tissue modeling. By incorporating large-scale Finite Element Method (FEM) simulations of both linear and nonlinear soft tissue responses, our method improves neural network predictions across diverse architectures, enhancing accuracy and physical consistency while maintaining low latency for real-time performance. We demonstrate the effectiveness of our approach by performing accurate surgical maneuvers that simulate the use of standard laparoscopic tissue grasping tools with high fidelity. These results establish Kelvinlet-augmented learning as a powerful and efficient strategy for real-time, physics-aware soft tissue simulation in surgical applications.
Article 1496
Title@2025-06-06 (5): A Certified Unlearning Approach without Access to Source Data
Title: A Certified Unlearning Approach without Access to Source Data | Ein zertifizierter Unlearning-Ansatz ohne Zugriff auf Quelldaten | 未经认证的不学习方法,无法获得源数据 2506.06486v1 |
Authors (4): Umit Yigit Basaran, Sk Miraj Ahmed, Amit Roy-Chowdhury, Basak Guler
With the growing adoption of data privacy regulations, the ability to erase private or copyrighted information from trained models has become a crucial requirement. Traditional unlearning methods often assume access to the complete training dataset, which is unrealistic in scenarios where the source data is no longer available. To address this challenge, we propose a certified unlearning framework that enables effective data removal without access to the original training data samples. Our approach utilizes a surrogate dataset that approximates the statistical properties of the source data, allowing for controlled noise scaling based on the statistical distance between the two. While our theoretical guarantees assume knowledge of the exact statistical distance, practical implementations typically approximate this distance, resulting in potentially weaker but still meaningful privacy guarantees. This ensures strong guarantees on the model’s behavior post-unlearning while maintaining its overall utility. We establish theoretical bounds, introduce practical noise calibration techniques, and validate our method through extensive experiments on both synthetic and real-world datasets. The results demonstrate the effectiveness and reliability of our approach in privacy-sensitive settings.
Article 1497
Title@2025-06-06 (5): The Economic Dispatch of Power-to-Gas Systems with Deep Reinforcement Learning: Tackling the Challenge of Delayed Rewards with Long-Term Energy Storage
Title: The Economic Dispatch of Power-to-Gas Systems with Deep Reinforcement Learning: Tackling the Challenge of Delayed Rewards with Long-Term Energy Storage | Die wirtschaftliche Dispatch von Power-to-Gas-Systemen mit tiefem Ausbau-Lernen: Die Herausforderung von verzögerten Belohnungen mit langfristiger Energiespeicherung bewältigen | 以深入强化学习的方式经济发送电力到地电系统:利用长期能源储存应对延迟奖励的挑战 2506.06484v1 |
Authors (3): Manuel Sage, Khalil Al Handawi, Yaoyao Fiona Zhao
Power-to-Gas (P2G) technologies gain recognition for enabling the integration of intermittent renewables, such as wind and solar, into electricity grids. However, determining the most cost-effective operation of these systems is complex due to the volatile nature of renewable energy, electricity prices, and loads. Additionally, P2G systems are less efficient in converting and storing energy compared to battery energy storage systems (BESs), and the benefits of converting electricity into gas are not immediately apparent. Deep Reinforcement Learning (DRL) has shown promise in managing the operation of energy systems amidst these uncertainties. Yet, DRL techniques face difficulties with the delayed reward characteristic of P2G system operation. Previous research has mostly focused on short-term studies that look at the energy conversion process, neglecting the long-term storage capabilities of P2G. This study presents a new method by thoroughly examining how DRL can be applied to the economic operation of P2G systems, in combination with BESs and gas turbines, over extended periods. Through three progressively more complex case studies, we assess the performance of DRL algorithms, specifically Deep Q-Networks and Proximal Policy Optimization, and introduce modifications to enhance their effectiveness. These modifications include integrating forecasts, implementing penalties on the reward function, and applying strategic cost calculations, all aimed at addressing the issue of delayed rewards. Our findings indicate that while DRL initially struggles with the complex decision-making required for P2G system operation, the adjustments we propose significantly improve its capability to devise cost-effective operation strategies, thereby unlocking the potential for long-term energy storage in P2G technologies.
Article 1498
Title@2025-06-06 (5): Noise Consistency Regularization for Improved Subject-Driven Image Synthesis
Title: Noise Consistency Regularization for Improved Subject-Driven Image Synthesis | Lärmkonsistenz-Regularisierung für verbesserte themengetriebene Bildsynthese | 改进主题驱动图像合成改进目标驱动图像集的一致一致规范化 2506.06483v1 |
Authors (4): Yao Ni, Song Wen, Piotr Koniusz, Anoop Cherian
Fine-tuning Stable Diffusion enables subject-driven image synthesis by adapting the model to generate images containing specific subjects. However, existing fine-tuning methods suffer from two key issues: underfitting, where the model fails to reliably capture subject identity, and overfitting, where it memorizes the subject image and reduces background diversity. To address these challenges, we propose two auxiliary consistency losses for diffusion fine-tuning. First, a prior consistency regularization loss ensures that the predicted diffusion noise for prior (non-subject) images remains consistent with that of the pretrained model, improving fidelity. Second, a subject consistency regularization loss enhances the fine-tuned model’s robustness to latent codes modulated by multiplicative noise, helping to preserve subject identity while improving diversity. Our experimental results demonstrate that incorporating these losses into fine-tuning not only preserves subject identity but also enhances image diversity, outperforming DreamBooth in terms of CLIP scores, background variation, and overall visual quality.
Article 1499
Title@2025-06-06 (5): TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness
Title: TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness | TimeRecipe: Ein Time-Series-Prognose-Rezept über Benchmarking Module Level Wirksamkeit | 时间Recpe: 通过基准标准模块的实时序列预测食谱 2506.06482v1 |
Authors (6): Zhiyuan Zhao, Juntong Ni, Shangqing Xu, Haoxin Liu, Wei Jin, B. Aditya Prakash
Time-series forecasting is an essential task with wide real-world applications across domains. While recent advances in deep learning have enabled time-series forecasting models with accurate predictions, there remains considerable debate over which architectures and design components, such as series decomposition or normalization, are most effective under varying conditions. Existing benchmarks primarily evaluate models at a high level, offering limited insight into why certain designs work better. To mitigate this gap, we propose TimeRecipe, a unified benchmarking framework that systematically evaluates time-series forecasting methods at the module level. TimeRecipe conducts over 10,000 experiments to assess the effectiveness of individual components across a diverse range of datasets, forecasting horizons, and task settings. Our results reveal that exhaustive exploration of the design space can yield models that outperform existing state-of-the-art methods and uncover meaningful intuitions linking specific design choices to forecasting scenarios. Furthermore, we release a practical toolkit within TimeRecipe that recommends suitable model architectures based on these empirical insights. The benchmark is available at: https://github.com/AdityaLab/TimeRecipe.
Article 1500
Title@2025-06-06 (5): Unveiling Markov Heads in Pretrained Language Models for Offline Reinforcement Learning
Title: Unveiling Markov Heads in Pretrained Language Models for Offline Reinforcement Learning | Markov-Köpfe in vorgebildeten Sprachmodellen für Offline-Verstärkungslernen enthüllen | 离线强化学习培训前语言模式中未排行的Markov负责人 2409.06985v2 |
Authors (7): Wenhao Zhao, Qiushui Xu, Linjie Xu, Lei Song, Jinyu Wang, Chunlai Zhou, Jiang Bian
Recently, incorporating knowledge from pretrained language models (PLMs) into decision transformers (DTs) has generated significant attention in offline reinforcement learning (RL). These PLMs perform well in RL tasks, raising an intriguing question: what kind of knowledge from PLMs has been transferred to RL to achieve such good results? This work first examines this problem by analyzing each head quantitatively and identifies the Markov head, a crucial component that exists among the attention heads of PLMs. The Markov head places extreme attention on the last-input token and performs well only in short-term environments. Furthermore, we prove that this extreme attention cannot be changed by re-training the embedding layer or by fine-tuning. Inspired by our analysis, we propose a general method, GPT2-DTMA, which equips a pretrained DT with a Mixture of Attention (MoA) to accommodate diverse attention requirements during fine-tuning. Extensive experiments corroborate our theorems and demonstrate the effectiveness of GPT2-DTMA: it achieves comparable performance in short-term environments while significantly narrowing the performance gap in long-term environments.
Article 1501
Title@2025-06-06 (5): DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation
Title: DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation | DELT: Ein einfaches Diversity-orientiertes EarlyLate-Training für die Datensatzdestillation | DELT: 以简单多样性驱动的数据集蒸馏早期远程培训 2411.19946v2 |
Authors (4): Zhiqiang Shen, Ammar Sherif, Zeyuan Yin, Shitong Shao
Recent advances in dataset distillation have led to solutions in two main directions. The conventional batch-to-batch matching mechanism is ideal for small-scale datasets and includes bi-level optimization methods on models and syntheses, such as FRePo, RCIG, and RaT-BPTT, as well as other methods like distribution matching, gradient matching, and weight trajectory matching. Conversely, batch-to-global matching typifies decoupled methods, which are particularly advantageous for large-scale datasets. This approach has garnered substantial interest within the community, as seen in SRe$^2$L, G-VBSM, WMDD, and CDA. A primary challenge with the second approach is the lack of diversity among syntheses within each class, since samples are optimized independently and the same global supervision signals are reused across different synthetic images. In this study, we propose a new Diversity-driven EarlyLate Training (DELT) scheme to enhance the diversity of images in batch-to-global matching with less computation. Our approach is conceptually simple yet effective: it partitions predefined IPC samples into smaller subtasks and employs local optimizations to distill each subset into distributions from distinct phases, reducing the uniformity induced by the unified optimization process. These distilled images from the subtasks demonstrate effective generalization when applied to the entire task. We conduct extensive experiments on CIFAR, Tiny-ImageNet, ImageNet-1K, and its sub-datasets. Our approach outperforms the previous state-of-the-art by 2$\sim$5% on average across different datasets and IPCs (images per class), increasing diversity per class by more than 5% while reducing synthesis time by up to 39.3%, which improves training efficiency. Code is available at: https://github.com/VILA-Lab/DELT.
Article 1502
Title@2025-06-06 (5): Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage
Title: Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage | Kosteneffizientes LLM-Training mit Lifetime-Aware Tensor-Offloading über GPUDirect Storage | 通过GPUPUDirect 储存进行终身软件传感器脱载的成本效益高的LLM培训 2506.06472v1 |
Authors (7): Ziqi Yuan, Haoyang Zhang, Yirui Eric Zhou, Apoorve Mohan, I-Hsin Chung, Seetharami Seelam, Jian Huang
We present the design and implementation of a new lifetime-aware tensor offloading framework for GPU memory expansion using low-cost PCIe-based solid-state drives (SSDs). Our framework, TERAIO, is developed explicitly for large language model (LLM) training with multiple GPUs and multiple SSDs. Its design is driven by our observation that the active tensors take only a small fraction (1.7% on average) of allocated GPU memory in each LLM training iteration, while the inactive tensors are usually large and will not be used for a long period of time, creating ample opportunities for offloading/prefetching tensors to/from slow SSDs without stalling the GPU training process. TERAIO accurately estimates the lifetime (active period of time in GPU memory) of each tensor by profiling the first few iterations of the training process. Using this tensor lifetime analysis, TERAIO generates an optimized tensor offloading/prefetching plan and integrates it into the compiled LLM program via PyTorch. TERAIO has a runtime tensor migration engine that executes the offloading/prefetching plan via GPUDirect Storage, which allows direct tensor migration between GPUs and SSDs, alleviating the CPU bottleneck and maximizing SSD bandwidth utilization. In comparison with state-of-the-art studies such as ZeRO-Offload and ZeRO-Infinity, we show that TERAIO improves the training performance of various LLMs by 1.47x on average, and achieves 80.7% of the ideal performance assuming unlimited GPU memory.
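The lifetime-analysis idea can be pictured with a toy access trace. The sketch below is not TERAIO's profiler or planner: the trace format, the tensor names, and the `min_gap` threshold are hypothetical, and a real plan would also weigh tensor sizes and SSD bandwidth.

```python
def idle_intervals(trace):
    """Return (tensor, last_use_step, next_use_step) for each gap between uses."""
    last_seen = {}
    gaps = []
    for step, name in enumerate(trace):
        if name in last_seen and step - last_seen[name] > 1:
            gaps.append((name, last_seen[name], step))
        last_seen[name] = step
    return gaps

def offload_candidates(trace, min_gap):
    """Tensors idle for at least min_gap steps: offload after the last use,
    prefetch before the next use."""
    return [(name, start, end) for name, start, end in idle_intervals(trace)
            if end - start >= min_gap]

# Hypothetical per-iteration access order (forward pass, then backward pass).
trace = ["w1", "a1", "w2", "a2", "a2", "a1", "w1"]
plan = offload_candidates(trace, min_gap=5)
```

Here only `w1` is idle long enough to be worth moving out, which mirrors the observation that tensors unused for long stretches are the natural offloading targets.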
Article 1503
Title@2025-06-06 (5): SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation
Title: SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation | SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation | SIGMA:通过西布林-引导蒙特卡罗增殖改进大语言示范理由说明 2506.06470v1 |
Authors (7): Yanwei Ren, Haotian Zhang, Fuxiang Wu, Jiayan Qiu, Jiaxing Huang, Baosheng Yu, Liu Liu
Enhancing large language models by simply scaling up datasets has begun to yield diminishing returns, shifting the spotlight to data quality. Monte Carlo Tree Search (MCTS) has emerged as a powerful technique for generating high-quality chain-of-thought data, yet conventional approaches typically retain only the top-scoring trajectory from the search tree, discarding sibling nodes that often contain valuable partial insights, recurrent error patterns, and alternative reasoning strategies. This unconditional rejection of non-optimal reasoning branches may waste vast amounts of informative data in the whole search tree. We propose SIGMA (Sibling Guided Monte Carlo Augmentation), a novel framework that reintegrates these discarded sibling nodes to refine LLM reasoning. SIGMA forges semantic links among sibling nodes along each search path and applies a two-stage refinement: a critique model identifies overlooked strengths and weaknesses across the sibling set, and a revision model conducts text-based backpropagation to refine the top-scoring trajectory in light of this comparative feedback. By recovering and amplifying the underutilized but valuable signals from non-optimal reasoning branches, SIGMA substantially improves reasoning trajectories. On the challenging MATH benchmark, our SIGMA-tuned 7B model achieves 54.92% accuracy using only 30K samples, outperforming state-of-the-art models trained on 590K samples. This result highlights that our sibling-guided optimization not only significantly reduces data usage but also significantly boosts LLM reasoning.
Article 1504
Title@2025-06-06 (5): BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning
Title: BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning | BRiTE: Bootstrapping verstärkter Denkprozess zur Verbesserung von Sprachmodellen | Brite:强化强化思考程序以加强语言示范理由 2501.18858v2 |
Authors (13): Han Zhong, Yutong Yin, Shenao Zhang, Xiaojun Xu, Yuanxin Liu, Yifei Zuo, Zhihan Liu, Boyi Liu, Sirui Zheng, Hongyi Guo, Liwei Wang, Mingyi Hong, Zhaoran Wang
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, yet generating reliable reasoning processes remains a significant challenge. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model incorporating latent thinking processes and evaluation signals. Within this framework, we introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps. First, it generates high-quality rationales by approximating the optimal thinking process through reinforcement learning, using a novel reward shaping mechanism. Second, it enhances the base LLM by maximizing the joint probability of rationale generation with respect to the model’s parameters. Theoretically, we demonstrate BRiTE’s convergence at a rate of $1/T$ with $T$ representing the number of iterations. Empirical evaluations on math and coding benchmarks demonstrate that our approach consistently improves performance across different base models without requiring human-annotated thinking processes. In addition, BRiTE demonstrates superior performance compared to existing algorithms that bootstrap thinking processes using alternative methods such as rejection sampling, and can even match or exceed the results achieved through supervised fine-tuning with human-annotated data.
Article 1505
Title@2025-06-06 (5): Latent Thought Models with Variational Bayes Inference-Time Computation
Title: Latent Thought Models with Variational Bayes Inference-Time Computation | Latent Thought Models mit Variational Bayes Inferenz-Zeit-Computation | 具有变异性贝湾推推算-时间计算法的古老思想模型 2502.01567v2 |
Authors (11): Deqian Kong, Minglu Zhao, Dehong Xu, Bo Pang, Shu Wang, Edouardo Honig, Zhangzhang Si, Chuan Li, Jianwen Xie, Sirui Xie, Ying Nian Wu
We propose a novel class of language models, Latent Thought Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space. These latent thought vectors guide the autoregressive generation of ground tokens through a Transformer decoder. Training employs a dual-rate optimization process within the classical variational Bayes framework: fast learning of local variational parameters for the posterior distribution of latent vectors (inference-time computation), and slow learning of global decoder parameters. Empirical studies reveal that LTMs possess additional scaling dimensions beyond traditional Large Language Models (LLMs), such as the number of iterations in inference-time computation and number of latent thought vectors. Higher sample efficiency can be achieved by increasing training compute per token, with further gains possible by trading model size for more inference steps. Designed based on these scaling properties, LTMs demonstrate superior sample and parameter efficiency compared to autoregressive models and discrete diffusion models. They significantly outperform these counterparts in validation perplexity and zero-shot language modeling tasks. Additionally, LTMs exhibit emergent few-shot in-context reasoning capabilities that scale with model size, and achieve competitive performance in conditional and unconditional text generation.
Article 1506
Title@2025-06-06 (5): FairICP: Encouraging Equalized Odds via Inverse Conditional Permutation
Title: FairICP: Encouraging Equalized Odds via Inverse Conditional Permutation | FairICP: Förderung ausgeglichener Odds durch inverse Conditional Permutation | FairICP:通过反条件性变异鼓励平等奇数 2404.05678v4 |
Authors (2): Yuheng Lai, Leying Guan
$\textit{Equalized odds}$, an important notion of algorithmic fairness, aims to ensure that sensitive variables, such as race and gender, do not unfairly influence the algorithm’s prediction when conditioning on the true outcome. Despite rapid advancements, current research primarily focuses on equalized odds violations caused by a single sensitive attribute, leaving the challenge of simultaneously accounting for multiple attributes under-addressed. We bridge this gap by introducing an in-processing fairness-aware learning approach, FairICP, which integrates adversarial learning with a novel inverse conditional permutation scheme. FairICP offers a flexible and efficient scheme to promote equalized odds under fairness conditions described by complex and multi-dimensional sensitive attributes. The efficacy and adaptability of our method are demonstrated through both simulation studies and empirical analyses of real-world datasets.
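As an illustration of the fairness notion in this abstract (not of the FairICP method itself), the sketch below measures an equalized-odds gap for binary predictions: the largest difference between groups in the rate of positive predictions, conditioned on the true outcome. The function name and the toy data are assumptions made for the example. Group labels can be tuples such as (race, gender), which is one simple way to represent several sensitive attributes at once.

```python
from collections import defaultdict

def equalized_odds_gap(y_true, y_pred, group):
    """Largest between-group difference in P(y_pred = 1 | y_true = y, group),
    taken over both outcomes y in {0, 1}. Zero means equalized odds holds
    exactly on this sample."""
    # counts[(y, g)] = [number of positive predictions, number of examples]
    counts = defaultdict(lambda: [0, 0])
    for y, p, g in zip(y_true, y_pred, group):
        counts[(y, g)][0] += p
        counts[(y, g)][1] += 1

    gap = 0.0
    for y in (0, 1):
        rates = [pos / n for (yy, _), (pos, n) in counts.items() if yy == y]
        if rates:
            gap = max(gap, max(rates) - min(rates))
    return gap

# Hypothetical toy data: two groups, four examples each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(equalized_odds_gap(y_true, y_pred, group))
```

A predictor that reproduces the true labels has a gap of zero under this measure, since every group then has a true-positive rate of 1 and a false-positive rate of 0.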
Article 1507
Title@2025-06-06 (5): Towards Infant Sleep-Optimized Driving: Synergizing Wearable and Vehicle Sensing in Intelligent Cruise Control
Title: Towards Infant Sleep-Optimized Driving: Synergizing Wearable and Vehicle Sensing in Intelligent Cruise Control | Auf dem Weg zum schlafoptimierten Fahren für Säuglinge: Verwendbares und Fahrzeug sensibilisieren in intelligenter Kreuzfahrtsteuerung | 婴儿睡眠优化驾驶:在智能巡航控制中统一穿戴和车辆遥感 2506.06459v1 |
Authors (3): Ruitao Chen, Mozhang Guo, Jinge Li
Automated driving (AD) has substantially improved vehicle safety and driving comfort, but its impact on passenger well-being, particularly infant sleep, is not sufficiently studied. Sudden acceleration, abrupt braking, and sharp maneuvers can disrupt infant sleep, compromising both passenger comfort and parental convenience. To solve this problem, this paper explores the integration of reinforcement learning (RL) within AD to personalize driving behavior and optimally balance occupant comfort and travel efficiency. In particular, we propose an intelligent cruise control framework that adapts to varying driving conditions to enhance infant sleep quality by effectively synergizing wearable sensing and vehicle data. Long short-term memory (LSTM) and transformer-based neural networks are integrated with RL to model the relationship between driving behavior and infant sleep quality under diverse traffic and road conditions. Based on the sleep quality indicators from the wearable sensors, driving action data from vehicle controllers, and map data from map applications, the model dynamically computes the optimal driving aggressiveness level, which is subsequently translated into specific AD control strategies, e.g., the magnitude and frequency of acceleration, lane change, and overtaking. Simulation results demonstrate that the proposed solution significantly improves infant sleep quality compared to baseline methods, while preserving desirable travel efficiency.
Article 1508
Title@2025-06-06 (5): WISCA: A Consensus-Based Approach to Harmonizing Interpretability in Tabular Datasets
Title: WISCA: A Consensus-Based Approach to Harmonizing Interpretability in Tabular Datasets | WISCA: Ein auf Konsens basierender Ansatz zur Harmonisierung der Dolmetschbarkeit in Tabellendatensätzen | WISCA: 统一图表数据集解释性的协商一致办法 2506.06455v1 |
Authors (3): Antonio Jesús Banegas-Luna, Horacio Pérez-Sánchez, Carlos Martínez-Cortés
While predictive accuracy is often prioritized in machine learning (ML) models, interpretability remains essential in scientific and high-stakes domains. However, diverse interpretability algorithms frequently yield conflicting explanations, highlighting the need for consensus to harmonize results. In this study, six ML models were trained on six synthetic datasets with known ground truths, utilizing various model-agnostic interpretability techniques. Consensus explanations were generated using established methods and a novel approach: WISCA (Weighted Scaled Consensus Attributions), which integrates class probability and normalized attributions. WISCA consistently aligned with the most reliable individual method, underscoring the value of robust consensus strategies in improving explanation reliability.
Article 1509
Title@2025-06-06 (5): LETS Forecast: Learning Embedology for Time Series Forecasting
Title: LETS Forecast: Learning Embedology for Time Series Forecasting | LETS-Prognose: Lernembedologie für Zeitreihenprognosen | LETS预测: 学习时间序列预测的生态学 2506.06454v1 |
Authors (5): Abrar Majeedi, Viswanatha Reddy Gajjala, Satya Sai Srinath Namburi GNVV, Nada Magdi Elkordi, Yin Li
Real-world time series are often governed by complex nonlinear dynamics. Understanding these underlying dynamics is crucial for precise future prediction. While deep learning has achieved major success in time series forecasting, many existing approaches do not explicitly model the dynamics. To bridge this gap, we introduce DeepEDM, a framework that integrates nonlinear dynamical systems modeling with deep neural networks. Inspired by empirical dynamic modeling (EDM) and rooted in Takens’ theorem, DeepEDM presents a novel deep model that learns a latent space from time-delayed embeddings, and employs kernel regression to approximate the underlying dynamics, while leveraging efficient implementation of softmax attention and allowing for accurate prediction of future time steps. To evaluate our method, we conduct comprehensive experiments on synthetic data of nonlinear dynamical systems as well as real-world time series across domains. Our results show that DeepEDM is robust to input noise, and outperforms state-of-the-art methods in forecasting accuracy. Our code is available at: https://abrarmajeedi.github.io/deep_edm.
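The two classical ingredients named in this abstract, time-delayed embeddings and kernel regression, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DeepEDM architecture: there is no learned latent space, the kernel is a Gaussian written as a softmax over negative squared distances, and the function names, `temperature`, and the toy series are all hypothetical.

```python
import math

def delay_embed(series, dim, lag=1):
    """Return time-delay vectors [x_t, x_{t-lag}, ..., x_{t-(dim-1)*lag}]."""
    start = (dim - 1) * lag
    return [[series[t - k * lag] for k in range(dim)]
            for t in range(start, len(series))]

def kernel_forecast(series, dim=3, lag=1, temperature=0.1):
    """One-step-ahead forecast by softmax-weighted (Nadaraya-Watson) regression
    over past delay vectors."""
    vectors = delay_embed(series, dim, lag)
    query = vectors[-1]          # most recent state
    keys = vectors[:-1]          # past states whose successor is known
    start = (dim - 1) * lag
    values = [series[start + i + 1] for i in range(len(keys))]

    # Softmax over negative squared distances, i.e. a Gaussian kernel.
    scores = [-sum((q - k) ** 2 for q, k in zip(query, key)) / temperature
              for key in keys]
    top = max(scores)            # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

# Hypothetical periodic signal: after ..., 1, 2, 3 the next value is 0.
signal = [0, 1, 2, 3] * 5
print(kernel_forecast(signal))
```

The weighting step has the same shape as attention (query, keys, values, softmax), which is the connection to softmax attention that the abstract draws.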
Article 1510
Title@2025-06-06 (5): Tackling Feature and Sample Heterogeneity in Decentralized Multi-Task Learning: A Sheaf-Theoretic Approach
Title: Tackling Feature and Sample Heterogeneity in Decentralized Multi-Task Learning: A Sheaf-Theoretic Approach | Schlagen von Feature und Sample Heterogenität im dezentralisierten Multi-Task-Lernen: Ein schaftheoretischer Ansatz | 处理分散化多任务学习中的地貌特征和抽样差异:沙夫理论方法 2502.01145v2 |
Authors (3): Chaouki Ben Issaid, Praneeth Vepakomma, Mehdi Bennis
Federated multi-task learning (FMTL) aims to simultaneously learn multiple related tasks across clients without sharing sensitive raw data. However, in the decentralized setting, existing FMTL frameworks are limited in their ability to capture complex task relationships and handle feature and sample heterogeneity across clients. To address these challenges, we introduce a novel sheaf-theoretic-based approach for FMTL. By representing client relationships using cellular sheaves, our framework can flexibly model interactions between heterogeneous client models. We formulate the sheaf-based FMTL optimization problem using sheaf Laplacian regularization and propose the Sheaf-FMTL algorithm to solve it. We show that the proposed framework provides a unified view encompassing many existing federated learning (FL) and FMTL approaches. Furthermore, we prove that our proposed algorithm, Sheaf-FMTL, achieves a sublinear convergence rate in line with state-of-the-art decentralized FMTL algorithms. Extensive experiments show that although Sheaf-FMTL introduces computational and storage overhead due to the management of interaction maps, it achieves substantial communication savings in terms of transmitted bits when compared to decentralized FMTL baselines. This trade-off makes Sheaf-FMTL especially suitable for cross-silo FL scenarios, where managing model heterogeneity and ensuring communication efficiency are essential, and where clients have adequate computational resources.
Article 1511
Title@2025-06-06 (5): Canonical Autoregressive Generation
Title: Canonical Autoregressive Generation | Kanonische autoregressive Generation | Canonic 自动后退一代 2506.06446v1 |
Authors (4): Ivi Chatzi, Nina Corvelo Benz, Stratis Tsirtsis, Manuel Gomez-Rodriguez
State-of-the-art large language models are trained using large amounts of tokens derived from raw text using what is called a tokenizer. Crucially, the tokenizer determines the (token) vocabulary a model will use during inference as well as, in principle, the (token) language. This is because, while the token vocabulary may allow for different tokenizations of a string, the tokenizer always maps the string to only one of these tokenizations: the canonical tokenization. However, multiple lines of empirical evidence suggest that large language models do not always generate canonical token sequences, and this comes with several negative consequences. In this work, we first show that, to generate a canonical token sequence, a model needs to generate (partial) canonical token sequences at each step of the autoregressive generation process underpinning its functioning. Building upon this theoretical result, we introduce canonical sampling, a simple and efficient sampling method that precludes a given model from generating non-canonical token sequences. Further, we also show that, in comparison with standard sampling, the distribution of token sequences generated using canonical sampling is provably closer to the true distribution of token sequences used during training.
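To make the notion of a canonical token sequence concrete, the sketch below uses a toy greedy longest-match tokenizer. This is an assumption for illustration only: real tokenizers such as BPE behave differently, and the paper's canonical sampling procedure may be implemented in another way. A sequence is treated as canonical when re-tokenizing its decoded text reproduces it, and `allowed_next_tokens` (a hypothetical helper) filters the vocabulary so that the partial sequence stays canonical before a token is sampled.

```python
def tokenize(text, vocab):
    """Toy deterministic tokenizer: greedy longest-match over `vocab`."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return tokens

def is_canonical(tokens, vocab):
    """A sequence is canonical if re-tokenizing its text reproduces it."""
    return tokenize("".join(tokens), vocab) == list(tokens)

def allowed_next_tokens(prefix, vocab):
    """Tokens that keep the partial sequence canonical (filter before sampling)."""
    return sorted(t for t in vocab if is_canonical(list(prefix) + [t], vocab))

VOCAB = {"a", "b", "ab", "c"}
print(is_canonical(["ab", "c"], VOCAB))       # the tokenizer's own output for "abc"
print(is_canonical(["a", "b", "c"], VOCAB))   # same string, different tokenization
print(allowed_next_tokens(["a"], VOCAB))
```

In this toy vocabulary, the token "b" is excluded after the prefix ["a"], because the tokenizer would have emitted the single token "ab" for that text.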
Article 1512
Title@2025-06-06 (5): Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance
Title: Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance | Saffron-1: Auf dem Weg zu einem Skalierungsparadigma für die LLM-Sicherheitssicherung | 山红-1:努力为LLM公司安全保证建立一个推论范围扩增范例 2506.06444v1 |
Authors (5): Ruizhong Qiu, Gaotang Li, Tianxin Wei, Jingrui He, Hanghang Tong
Existing safety assurance research has primarily focused on training-phase alignment to instill safe behaviors into LLMs. However, recent studies have exposed these methods’ susceptibility to diverse jailbreak attacks. Concurrently, inference scaling has significantly advanced LLM reasoning capabilities but remains unexplored in the context of safety assurance. Addressing this gap, our work pioneers inference scaling for robust and effective LLM safety against emerging threats. We reveal that conventional inference scaling techniques, despite their success in reasoning tasks, perform poorly in safety contexts, even falling short of basic approaches like Best-of-N Sampling. We attribute this inefficiency to a newly identified challenge, the exploration–efficiency dilemma, arising from the high computational overhead associated with frequent process reward model (PRM) evaluations. To overcome this dilemma, we propose SAFFRON, a novel inference scaling paradigm tailored explicitly for safety assurance. Central to our approach is the introduction of a multifurcation reward model (MRM) that significantly reduces the required number of reward model evaluations. To operationalize this paradigm, we further propose: (i) a partial supervision training objective for MRM, (ii) a conservative exploration constraint to prevent out-of-distribution explorations, and (iii) a Trie-based key–value caching strategy that facilitates cache sharing across sequences during tree search. Extensive experiments validate the effectiveness of our method. Additionally, we publicly release our trained multifurcation reward model (Saffron-1) and the accompanying token-level safety reward dataset (Safety4M) to accelerate future research in LLM safety. Our code, model, and data are publicly available at https://github.com/q-rz/saffron , and our project homepage is at https://q-rz.github.io/p/saffron .
Article 1513
Title@2025-06-06 (5): Unlocking Chemical Insights: Superior Molecular Representations from Intermediate Encoder Layers
Title: Unlocking Chemical Insights: Superior Molecular Representations from Intermediate Encoder Layers | Entsperren von chemischen Erkenntnissen: Überlegene molekulare Darstellungen von Zwischen-Encoder-Schichten | 解锁化学透视:中间编码层的高级分子代表 2506.06443v1 |
Authors (1): Luis Pinto
Pretrained molecular encoders have become indispensable in computational chemistry for tasks such as property prediction and molecular generation. However, the standard practice of relying solely on final-layer embeddings for downstream tasks may discard valuable information. In this work, we challenge this convention by conducting a comprehensive layer-wise analysis of five diverse molecular encoders across 22 ADMET property prediction tasks. Our results demonstrate that embeddings from intermediate layers consistently outperform final-layer representations. Specifically, using fixed embeddings from the optimal intermediate layers improved downstream performance by an average of 5.4%, reaching gains up to 28.6%. Furthermore, finetuning up to these intermediate layers yielded even greater average improvements of 8.5%, with performance increases as high as 40.8%, achieving new state-of-the-art results on several benchmarks. Additionally, a strong positive correlation between fixed embedding performance and finetuning outcomes supports an efficient evaluate-then-finetune approach, enabling identification of optimal layers with reduced computational cost. These findings highlight the importance of exploring the full representational depth of molecular encoders to achieve substantial performance improvements and computational efficiency. The code is made publicly available at https://github.com/luispintoc/Unlocking-Chemical-Insights.
Article 1514
Title@2025-06-06 (5): A Framework of decision-relevant observability: Reinforcement Learning converges under relative ignorability
Title: A Framework of decision-relevant observability: Reinforcement Learning converges under relative ignorability | Ein Rahmen entscheidungsrelevanter Beobachtbarkeit: Verstärktes Lernen konvergiert unter relativer Entzündlichkeit | 与决策相关的可观察性框架:加强学习在相对忽视的情况下趋同 2504.07722v4 |
Authors (1): MaryLena Bleile
From clinical dosing algorithms to autonomous robots, sequential decision-making systems routinely operate with missing or incomplete data. Classical reinforcement learning theory, which is commonly used to solve sequential decision problems, assumes Markovian observability, which may not hold under partial observability. Causal inference paradigms formalise ignorability of missingness. We show these views can be unified and generalized in order to guarantee Q-learning convergence even when the Markov property fails. To do so, we introduce the concept of relative ignorability. Relative ignorability is a graphical-causal criterion which refines the requirements for accurate decision-making based on incomplete data. Theoretical results and simulations both reveal that non-Markovian stochastic processes whose missingness is relatively ignorable with respect to causal estimands can still be optimized using standard Reinforcement Learning algorithms. These results expand the theoretical foundations of safe, data-efficient AI to real-world environments where complete information is unattainable.
Article 1515
Title@2025-06-06 (5): Exploring Diffusion Transformer Designs via Grafting
Title: Exploring Diffusion Transformer Designs via Grafting | Erforschen von Diffusion Transformer Designs über Grafting | 探索通过Grafting 开发传播变异器设计 2506.05340v2 |
Authors (12): Keshigeyan Chandrasegaran, Michael Poli, Daniel Y. Fu, Dongjun Kim, Lea M. Hadzic, Manling Li, Agrim Gupta, Stefano Massaroli, Azalia Mirhoseini, Juan Carlos Niebles, Stefano Ermon, Li Fei-Fei
Designing model architectures requires decisions such as selecting operators (e.g., attention, convolution) and configurations (e.g., depth, width). However, evaluating the impact of these decisions on model quality requires costly pretraining, limiting architectural investigation. Inspired by how new software is built on existing code, we ask: can new architecture designs be studied using pretrained models? To this end, we present grafting, a simple approach for editing pretrained diffusion transformers (DiTs) to materialize new architectures under small compute budgets. Informed by our analysis of activation behavior and attention locality, we construct a testbed based on the DiT-XL/2 design to study the impact of grafting on model quality. Using this testbed, we develop a family of hybrid designs via grafting: replacing softmax attention with gated convolution, local attention, and linear attention, and replacing MLPs with variable expansion ratio and convolutional variants. Notably, many hybrid designs achieve good quality (FID: 2.38-2.64 vs. 2.27 for DiT-XL/2) using <2% pretraining compute. We then graft a text-to-image model (PixArt-Sigma), achieving a 1.43x speedup with less than a 2% drop in GenEval score. Finally, we present a case study that restructures DiT-XL/2 by converting every pair of sequential transformer blocks into parallel blocks via grafting. This reduces model depth by 2x and yields better quality (FID: 2.77) than other models of comparable depth. Together, we show that new diffusion model designs can be explored by grafting pretrained DiTs, with edits ranging from operator replacement to architecture restructuring. Code and grafted models: https://grafting.stanford.edu
Article 1516
Title@2025-06-06 (5): Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias
Title: Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias | Eigenspektrumanalyse von Neuronalen Netzwerken ohne Aspect Ratio Bias | 对无视比比比的神经网络进行微微光谱分析 2506.06280v1 |
Authors (4): Yuanzhe Hu, Kinshuk Goel, Vlad Killiakov, Yaoqing Yang
Diagnosing deep neural networks (DNNs) through the eigenspectrum of weight matrices has been an active area of research in recent years. At a high level, eigenspectrum analysis of DNNs involves measuring the heavytailness of the empirical spectral densities (ESD) of weight matrices. It provides insight into how well a model is trained and can guide decisions on assigning better layer-wise training hyperparameters. In this paper, we address a challenge associated with such eigenspectrum methods: the impact of the aspect ratio of weight matrices on estimated heavytailness metrics. We demonstrate that matrices of varying sizes (and aspect ratios) introduce a non-negligible bias in estimating heavytailness metrics, leading to inaccurate model diagnosis and layer-wise hyperparameter assignment. To overcome this challenge, we propose FARMS (Fixed-Aspect-Ratio Matrix Subsampling), a method that normalizes the weight matrices by subsampling submatrices with a fixed aspect ratio. Instead of measuring the heavytailness of the original ESD, we measure the average ESD of these subsampled submatrices. We show that measuring the heavytailness of these submatrices with the fixed aspect ratio can effectively mitigate the aspect ratio bias. We validate our approach across various optimization techniques and application domains that involve eigenspectrum analysis of weights, including image classification in computer vision (CV) models, scientific machine learning (SciML) model training, and large language model (LLM) pruning. Our results show that despite its simplicity, FARMS uniformly improves the accuracy of eigenspectrum analysis while enabling more effective layer-wise hyperparameter assignment in these application domains. In one of the LLM pruning experiments, FARMS reduces the perplexity of the LLaMA-7B model by 17.3% when compared with the state-of-the-art method.
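The subsampling idea described in this abstract can be sketched with the standard library alone. Several simplifications are assumptions of this example rather than details from the paper: submatrices are taken as sliding windows, and a scalar stand-in statistic is averaged instead of a full empirical spectral density, since computing eigenvalues would need a linear algebra library. All function names here are hypothetical.

```python
def fixed_aspect_submatrices(matrix, rows, cols, stride):
    """Yield every rows x cols window of `matrix` (a list of lists)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    for r in range(0, n_rows - rows + 1, stride):
        for c in range(0, n_cols - cols + 1, stride):
            yield [row[c:c + cols] for row in matrix[r:r + rows]]

def gram_trace(block):
    """Sum of squared entries, which equals trace(B^T B), the sum of the
    eigenvalues of B^T B. Stand-in only: a real analysis would fit a
    heavy-tail metric to the full eigenvalue spectrum."""
    return sum(x * x for row in block for x in row)

def averaged_statistic(matrix, rows, cols, stride, statistic=gram_trace):
    """Average `statistic` over all fixed-aspect-ratio submatrices."""
    values = [statistic(block)
              for block in fixed_aspect_submatrices(matrix, rows, cols, stride)]
    return sum(values) / len(values)

# Hypothetical 4 x 6 weight matrix, analysed through square 2 x 2 windows.
wide = [[1.0] * 6 for _ in range(4)]
print(averaged_statistic(wide, rows=2, cols=2, stride=2))
```

Because every window has the same shape, layers with different original aspect ratios are compared on the same footing, which is the bias the abstract says the method is meant to remove.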
Article 1517
Title@2025-06-06 (5): STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
Title: STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis | STARFlow: Skalierung von Latent Normalisierungsströmen für hochauflösende Bildsynthese | STARFlow: 增加高分辨率图像合成的中端正常化流程 2506.06276v1 |
Authors (10): Jiatao Gu, Tianrong Chen, David Berthelot, Huangjie Zheng, Yuyang Wang, Ruixiang Zhang, Laurent Dinh, Miguel Angel Bautista, Josh Susskind, Shuangfei Zhai
We present STARFlow, a scalable generative model based on normalizing flows that achieves strong performance in high-resolution image synthesis. The core of STARFlow is Transformer Autoregressive Flow (TARFlow), which combines the expressive power of normalizing flows with the structured modeling capabilities of Autoregressive Transformers. We first establish the theoretical universality of TARFlow for modeling continuous distributions. Building on this foundation, we introduce several key architectural and algorithmic innovations to significantly enhance scalability: (1) a deep-shallow design, wherein a deep Transformer block captures most of the model representational capacity, complemented by a few shallow Transformer blocks that are computationally efficient yet substantially beneficial; (2) modeling in the latent space of pretrained autoencoders, which proves more effective than direct pixel-level modeling; and (3) a novel guidance algorithm that significantly boosts sample quality. Crucially, our model remains an end-to-end normalizing flow, enabling exact maximum likelihood training in continuous spaces without discretization. STARFlow achieves competitive performance in both class-conditional and text-conditional image generation tasks, approaching state-of-the-art diffusion models in sample quality. To our knowledge, this work is the first successful demonstration of normalizing flows operating effectively at this scale and resolution.
Article 1518
Title@2025-06-06 (5): Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding
Title: Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding | Filmfakten und Fibs (MF$^2$): Ein Benchmark für langes Filmverständnis | 电影事实和纤维(MF$^2$):长期电影理解基准 2506.06275v1 |
Authors (31): Emmanouil Zaranis, António Farinhas, Saul Santos, Beatriz Canaverde, Miguel Moura Ramos, Aditya K Surikuchi, André Viveiros, Baohao Liao, Elena Bueno-Benito, Nithin Sivakumaran, Pavlo Vasylenko, Shoubin Yu, Sonal Sannigrahi, Wafaa Mohammed, Ben Peters, Danae Sánchez Villegas, Elias Stengel-Eskin, Giuseppe Attanasio, Jaehong Yoon, Stella Frank, Alessandro Suglia, Chrysoula Zerva, Desmond Elliott, Mariella Dimiccoli, Mohit Bansal, Oswald Lanz, Raffaella Bernardi, Raquel Fernández, Sandro Pezzelle, Vlad Niculae, André F. T. Martins
Despite recent progress in vision-language models (VLMs), holistic understanding of long-form video content remains a significant challenge, partly due to limitations in current benchmarks. Many focus on peripheral, “needle-in-a-haystack” details, encouraging context-insensitive retrieval over deep comprehension. Others rely on large-scale, semi-automatically generated questions (often produced by language models themselves) that are easier for models to answer but fail to reflect genuine understanding. In this paper, we introduce MF$^2$, a new benchmark for evaluating whether models can comprehend, consolidate, and recall key narrative information from full-length movies (50-170 minutes long). MF$^2$ includes over 50 full-length, open-licensed movies, each paired with manually constructed sets of claim pairs – one true (fact) and one plausible but false (fib), totalling over 850 pairs. These claims target core narrative elements such as character motivations and emotions, causal chains, and event order, and refer to memorable moments that humans can recall without rewatching the movie. Instead of multiple-choice formats, we adopt a binary claim evaluation protocol: for each pair, models must correctly identify both the true and false claims. This reduces biases like answer ordering and enables a more precise assessment of reasoning. Our experiments demonstrate that both open-weight and closed state-of-the-art models fall well short of human performance, underscoring the relative ease of the task for humans and their superior ability to retain and reason over critical narrative information – an ability current VLMs lack.
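The paired scoring rule in this abstract (a pair counts only when both the fact and the fib are judged correctly) can be sketched as below. The data layout, claim identifiers, and function name are assumptions for illustration; the benchmark's actual evaluation code may differ.

```python
def pairwise_accuracy(predictions, pairs):
    """Fraction of claim pairs judged fully correctly.

    predictions: dict mapping claim id -> bool (model says the claim is true)
    pairs: list of (fact_id, fib_id) tuples
    A pair is correct only if the fact is judged True AND the fib False.
    """
    correct = sum(1 for fact, fib in pairs
                  if predictions[fact] is True and predictions[fib] is False)
    return correct / len(pairs)

# Hypothetical claim ids. A model that answers True to everything is right
# on half of the individual claims but on none of the pairs.
pairs = [("fact1", "fib1"), ("fact2", "fib2")]
always_true = {"fact1": True, "fib1": True, "fact2": True, "fib2": True}
print(pairwise_accuracy(always_true, pairs))
```

Scoring at the pair level removes the credit that a constant or biased answering strategy would earn under per-claim accuracy.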
Article 1519
Title@2025-06-06 (5): Do Large Language Models Reason Causally Like Us? Even Better?
Title: Do Large Language Models Reason Causally Like Us? Even Better? | Haben große Sprachmodelle Grund ursächlich wie wir? Noch besser? | 大语言模型的原因和我们一样吗?更好吗? 2502.10215v2 |
Authors (4): Hanna M. Dettki, Brenden M. Lake, Charley M. Wu, Bob Rehder
Causal reasoning is a core component of intelligence. Large language models (LLMs) have shown impressive capabilities in generating human-like text, raising questions about whether their responses reflect true understanding or statistical patterns. We compared causal reasoning in humans and four LLMs using tasks based on collider graphs, rating the likelihood of a query variable occurring given evidence from other variables. LLMs’ causal inferences ranged from often nonsensical (GPT-3.5) to human-like to often more normatively aligned than those of humans (GPT-4o, Gemini-Pro, and Claude). Computational model fitting showed that one reason for GPT-4o, Gemini-Pro, and Claude’s superior performance is they didn’t exhibit the “associative bias” that plagues human causal reasoning. Nevertheless, even these LLMs did not fully capture subtler reasoning patterns associated with collider graphs, such as “explaining away”.
Article 1520
Title@2025-06-06 (5): Covering Number of Real Algebraic Varieties and Beyond: Improved Bounds and Applications
Title: Covering Number of Real Algebraic Varieties and Beyond: Improved Bounds and Applications | Abdeckung der Anzahl der realen algebraischen Sorten und darüber hinaus: Verbesserte Grenzen und Anwendungen | 覆盖实际代数差异数目及以后:改善边界和应用 2311.05116v4 |
Authors (2): Yifan Zhang, Joe Kileel
Covering numbers are a powerful tool used in the development of approximation algorithms, randomized dimension reduction methods, smoothed complexity analysis, and others. In this paper we prove upper bounds on the covering number of numerous sets in Euclidean space, namely real algebraic varieties, images of polynomial maps and semialgebraic sets in terms of the number of variables and degrees of the polynomials involved. The bounds remarkably improve the best known general bound by Yomdin-Comte, and our proof is much more straightforward. In particular, our result gives new bounds on the volume of the tubular neighborhood of the image of a polynomial map and a semialgebraic set, where results for varieties by Lotz and Basu-Lerario are not directly applicable. We illustrate the power of the result on three computational applications. Firstly, we derive a near-optimal bound on the covering number of tensors with low canonical polyadic (CP) rank, quantifying their approximation properties and filling in an important missing piece of theory for tensor dimension reduction and reconstruction. Secondly, we prove a bound on dimensionality reduction of images of polynomial maps via randomized sketching, which has direct applications to large scale polynomial optimization. Finally, we deduce generalization error bounds for deep neural networks with rational or ReLU activation functions, improving or matching the best known results in the machine learning literature while helping to quantify the impact of architecture choice on generalization error.
Article 1521
Title@2025-06-06 (5): Sketched Equivariant Imaging Regularization and Deep Internal Learning for Inverse Problems
Title: Sketched Equivariant Imaging Regularization and Deep Internal Learning for Inverse Problems | Sketched Equivariant Imaging Regularization und Deep Internal Learning für inverse Probleme | 用于反向问题的预言的等同成像正规化和深内部学习 2411.05771v4 |
Authors (3): Guixian Xu, Jinglai Li, Junqi Tang
Equivariant Imaging (EI) regularization has become the de-facto technique for unsupervised training of deep imaging networks, without any need for ground-truth data. Observing that the EI-based unsupervised training paradigm currently has significant computational redundancy leading to inefficiency in high-dimensional applications, we propose a sketched EI regularization which leverages randomized sketching techniques for acceleration. We apply our sketched EI regularization to develop an accelerated deep internal learning framework, which can be efficiently applied for test-time network adaptation. Additionally, for network adaptation tasks, we propose a parameter-efficient approach to accelerate both EI and Sketched-EI via optimizing only the normalization layers. Our numerical study on X-ray CT and multicoil magnetic resonance image reconstruction tasks demonstrates that our approach can achieve significant computational acceleration over the standard EI counterpart in the single-input setting and in network adaptation at test time.
Article 1522
Title@2025-06-06 (5): Normalizing Flows are Capable Generative Models
Title: Normalizing Flows are Capable Generative Models | Normalisierende Strömungen sind fähige Generative Modelle | 正常流动是能够生成的模型 2412.06329v3 |
Authors (10): Shuangfei Zhai, Ruixiang Zhang, Preetum Nakkiran, David Berthelot, Jiatao Gu, Huangjie Zheng, Tianrong Chen, Miguel Angel Bautista, Navdeep Jaitly, Josh Susskind
Normalizing Flows (NFs) are likelihood-based models for continuous inputs. They have demonstrated promising results on both density estimation and generative modeling tasks, but have received relatively little attention in recent years. In this work, we demonstrate that NFs are more powerful than previously believed. We present TarFlow: a simple and scalable architecture that enables highly performant NF models. TarFlow can be thought of as a Transformer-based variant of Masked Autoregressive Flows (MAFs): it consists of a stack of autoregressive Transformer blocks on image patches, alternating the autoregression direction between layers. TarFlow is straightforward to train end-to-end, and capable of directly modeling and generating pixels. We also propose three key techniques to improve sample quality: Gaussian noise augmentation during training, a post-training denoising procedure, and an effective guidance method for both class-conditional and unconditional settings. Putting these together, TarFlow sets new state-of-the-art results on likelihood estimation for images, beating the previous best methods by a large margin, and generates samples with quality and diversity comparable to diffusion models, for the first time with a stand-alone NF model. We make our code available at https://github.com/apple/ml-tarflow.
Article 1523
Title@2025-06-06 (5): Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens
Title: Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens | Reflect-then-Plan: Offline-Modellbasierte Planung durch eine doppelt Bayesian-Lens | 反映时计划:通过杜布里·巴耶斯冷镜头进行离线示范规划 2506.06261v1 |
Authors (5): Jihwan Jeong, Xiaoyu Wang, Jingmin Wang, Scott Sanner, Pascal Poupart
Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe but often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, restricting adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel doubly Bayesian offline model-based (MB) planning approach. RefPlan unifies uncertainty modeling and MB planning by recasting planning as Bayesian posterior estimation. At deployment, it updates a belief over environment dynamics using real-time observations, incorporating uncertainty into MB planning via marginalization. Empirical results on standard benchmarks show that RefPlan significantly improves the performance of conservative offline RL policies. In particular, RefPlan maintains robust performance under high epistemic uncertainty and limited data, while demonstrating resilience to changing environment dynamics, improving the flexibility, generalizability, and robustness of offline-learned policies.
Article 1524
Title@2025-06-06 (5): DPCore: Dynamic Prompt Coreset for Continual Test-Time Adaptation
Title: DPCore: Dynamic Prompt Coreset for Continual Test-Time Adaptation | DPCore: Dynamischer Prompt-Kernsatz für kontinuierliche Test-Zeit-Anpassung | DPCore: 持续试验-时间适应动态快速核心集 2406.10737v4 |
Authors (4): Yunbei Zhang, Akshay Mehra, Shuaicheng Niu, Jihun Hamm
Continual Test-Time Adaptation (CTTA) seeks to adapt source pre-trained models to continually changing, unseen target domains. While existing CTTA methods assume structured domain changes with uniform durations, real-world environments often exhibit dynamic patterns where domains recur with varying frequencies and durations. Current approaches, which adapt the same parameters across different domains, struggle in such dynamic conditions: they face convergence issues with brief domain exposures, risk forgetting previously learned knowledge, or misapply it to irrelevant domains. To remedy this, we propose DPCore, a method designed for robust performance across diverse domain change patterns while ensuring computational efficiency. DPCore integrates three key components: Visual Prompt Adaptation for efficient domain alignment, a Prompt Coreset for knowledge preservation, and a Dynamic Update mechanism that intelligently adjusts existing prompts for similar domains while creating new ones for substantially different domains. Extensive experiments on four benchmarks demonstrate that DPCore consistently outperforms various CTTA methods, achieving state-of-the-art performance in both structured and dynamic settings while reducing trainable parameters by 99% and computation time by 64% compared to previous approaches.
Article 1525
Title@2025-06-06 (5): PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time
Title: PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time | PersonaAgent: Wenn große Sprachmodell-Agenten Personalisierung zur Testzeit treffen | 人 员:当大语言示范媒介在试验时间达到个性化时 2506.06254v1 |
Authors (15): Weizhi Zhang, Xinyang Zhang, Chenwei Zhang, Liangwei Yang, Jingbo Shang, Zhepei Wei, Henry Peng Zou, Zijie Huang, Zhengyang Wang, Yifan Gao, Xiaoman Pan, Lian Xiong, Jingguo Liu, Philip S. Yu, Xian Li
Large Language Model (LLM) empowered agents have recently emerged as advanced paradigms that exhibit impressive capabilities in a wide range of domains and tasks. Despite their potential, current LLM agents often adopt a one-size-fits-all approach, lacking the flexibility to respond to users' varying needs and preferences. This limitation motivates us to develop PersonaAgent, the first personalized LLM agent framework designed to address versatile personalization tasks. Specifically, PersonaAgent integrates two complementary components: a personalized memory module that includes episodic and semantic memory mechanisms, and a personalized action module that enables the agent to perform tool actions tailored to the user. At the core, the persona (defined as a unique system prompt for each user) functions as an intermediary: it leverages insights from personalized memory to control agent actions, while the outcomes of these actions in turn refine the memory. Based on the framework, we propose a test-time user-preference alignment strategy that simulates the latest n interactions to optimize the persona prompt, ensuring real-time user preference alignment through textual loss feedback between simulated and ground-truth responses. Experimental evaluations demonstrate that PersonaAgent significantly outperforms other baseline methods by not only personalizing the action space effectively but also scaling during test-time real-world applications. These results underscore the feasibility and potential of our approach in delivering tailored, dynamic user experiences.
Article 1526
Title@2025-06-06 (5): Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Title: Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models | Sparse Autoencoder lernen monosemantische Funktionen in Vision-Language-Modellen | 在视觉语言模型中学习数学特征 2504.02821v2 |
Authors (5): Mateusz Pach, Shyamgopal Karthik, Quentin Bouniot, Serge Belongie, Zeynep Akata
Given that interpretability and steerability are crucial to AI safety, Sparse Autoencoders (SAEs) have emerged as a tool to enhance them in Large Language Models (LLMs). In this work, we extend the application of SAEs to Vision-Language Models (VLMs), such as CLIP, and introduce a comprehensive framework for evaluating monosemanticity at the neuron-level in vision representations. To ensure that our evaluation aligns with human perception, we propose a benchmark derived from a large-scale user study. Our experimental results reveal that SAEs trained on VLMs significantly enhance the monosemanticity of individual neurons, with sparsity and wide latents being the most influential factors. Notably, we demonstrate that applying SAE interventions on CLIP’s vision encoder directly steers multimodal LLM outputs (e.g., LLaVA), without any modifications to the underlying model. These findings emphasize the practicality and efficacy of SAEs as an unsupervised tool for enhancing both interpretability and control of VLMs. Code is available at https://github.com/ExplainableML/sae-for-vlm.
Article 1527
Title@2025-06-06 (5): Lagrangian-based Equilibrium Propagation: generalisation to arbitrary boundary conditions & equivalence with Hamiltonian Echo Learning
Title: Lagrangian-based Equilibrium Propagation: generalisation to arbitrary boundary conditions & equivalence with Hamiltonian Echo Learning | Lagrangian-basierte Equilibrium Propagation: Verallgemeinerung zu willkürlichen Randbedingungen & Äquivalenz mit Hamiltonian Echo Learning | Lagrangian 以拉格朗江为基础的平衡传播:对任意边界条件的概括化和与汉密尔顿回声学习的等同 2506.06248v1 |
Authors (4): Guillaume Pourcel, Debabrota Basu, Maxence Ernoult, Aditya Gilra
Equilibrium Propagation (EP) is a learning algorithm for training Energy-based Models (EBMs) on static inputs which leverages the variational description of their fixed points. Extending EP to time-varying inputs is a challenging problem, as the variational description must apply to the entire system trajectory rather than just fixed points, and careful consideration of boundary conditions becomes essential. In this work, we present Generalized Lagrangian Equilibrium Propagation (GLEP), which extends the variational formulation of EP to time-varying inputs. We demonstrate that GLEP yields different learning algorithms depending on the boundary conditions of the system, many of which are impractical for implementation. We then show that Hamiltonian Echo Learning (HEL) – which includes the recently proposed Recurrent HEL (RHEL) and the earlier known Hamiltonian Echo Backpropagation (HEB) algorithms – can be derived as a special case of GLEP. Notably, HEL is the only instance of GLEP we found that inherits the properties that make EP a desirable alternative to backpropagation for hardware implementations: it operates in a “forward-only” manner (i.e. using the same system for both inference and learning), it scales efficiently (requiring only two or more passes through the system regardless of model size), and enables local learning.
Article 1528
Title@2025-06-06 (5): Neural Responses to Affective Sentences Reveal Signatures of Depression
Title: Neural Responses to Affective Sentences Reveal Signatures of Depression | Neurale Reaktionen auf Affective Sentences Offenbarungssignaturen von Depressionen | 对消极判决的神经反应 反萧条征兆 2506.06244v1 |
Authors (16): Aditya Kommineni, Woojae Jeong, Kleanthis Avramidis, Colin McDaniel, Myzelle Hughes, Thomas McGee, Elsi Kaiser, Kristina Lerman, Idan A. Blank, Dani Byrd, Assal Habibi, B. Rael Cahn, Sudarsana Kadiri, Takfarinas Medani, Richard M. Leahy, Shrikanth Narayanan
Major Depressive Disorder (MDD) is a highly prevalent mental health condition, and a deeper understanding of its neurocognitive foundations is essential for identifying how core functions such as emotional and self-referential processing are affected. We investigate how depression alters the temporal dynamics of emotional processing by measuring neural responses to self-referential affective sentences using surface electroencephalography (EEG) in healthy and depressed individuals. Our results reveal significant group-level differences in neural activity during sentence viewing, suggesting disrupted integration of emotional and self-referential information in depression. A deep learning model trained on these responses achieves an area under the receiver operating characteristic curve (AUC) of 0.707 in distinguishing healthy from depressed participants, and 0.624 in differentiating depressed subgroups with and without suicidal ideation. Spatial ablations highlight anterior electrodes associated with semantic and affective processing as key contributors. These findings suggest stable, stimulus-driven neural signatures of depression that may inform future diagnostic tools.
Article 1529
Title@2025-06-06 (5): LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models
Title: LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models | LlavaGuard: Ein offenes VLM-basiertes Framework zur Sicherung von Vision-Datensätzen und -Modellen | LlavaGuard:基于VLM的公开保护视觉数据集和模型框架 2406.05113v3 |
Authors (5): Lukas Helff, Felix Friedrich, Manuel Brack, Kristian Kersting, Patrick Schramowski
This paper introduces LlavaGuard, a suite of VLM-based vision safeguards that address the critical need for reliable guardrails in the era of large-scale data and models. To this end, we establish a novel open framework, describing a customizable safety taxonomy, data preprocessing, augmentation, and training setup. To teach a VLM safeguard about safety, we further create a multimodal safety dataset with high-quality human expert annotations, where each image is labeled with a safety rating, category, and rationale. We also employ advanced augmentations to support context-specific assessments. The resulting LlavaGuard models, ranging from 0.5B to 7B, serve as a versatile tool for evaluating the safety compliance of visual content against flexible policies. In comprehensive experiments, LlavaGuard outperforms both state-of-the-art safeguards and VLMs in accuracy and in flexibly handling different policies. Additionally, we demonstrate LlavaGuard's performance in two real-world applications: large-scale dataset annotation and moderation of text-to-image models. We make our entire framework, including the dataset, model weights, and training code, publicly available.
Article 1530
Title@2025-06-06 (5): fairmetrics: An R package for group fairness evaluation
Title: fairmetrics: An R package for group fairness evaluation | fairmetrics: Ein R-Paket für die Bewertung von Gruppengerechtigkeit | 公平度:团体公平评估R包件 2506.06243v1 |
Authors (3): Benjamin Smith, Jianhui Gao, Jessica Gronsbell
Fairness is a growing area of machine learning (ML) that focuses on ensuring models do not produce systematically biased outcomes for specific groups, particularly those defined by protected attributes such as race, gender, or age. Evaluating fairness is a critical aspect of ML model development, as biased models can perpetuate structural inequalities. The {fairmetrics} R package offers a user-friendly framework for rigorously evaluating numerous group-based fairness criteria, including metrics based on independence (e.g., statistical parity), separation (e.g., equalized odds), and sufficiency (e.g., predictive parity). Group-based fairness criteria assess whether a model is equally accurate or well-calibrated across a set of predefined groups so that appropriate bias mitigation strategies can be implemented. {fairmetrics} provides both point and interval estimates for multiple metrics through a convenient wrapper function and includes an example dataset derived from the Medical Information Mart for Intensive Care, version II (MIMIC-II) database (Goldberger et al., 2000; Raffa, 2016).
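To make the three families of criteria concrete, the following Python sketch computes the per-group quantities they compare: the positive prediction rate (independence), the true and false positive rates (separation), and the positive predictive value (sufficiency). It does not use or reproduce the {fairmetrics} R API; the function name `group_rates` and the toy data are assumptions made for illustration, and no interval estimates are computed.

```python
from collections import defaultdict

def _safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")

def group_rates(y_true, y_pred, group):
    """Per-group rates from binary labels (0/1) and binary predictions (0/1)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for yt, yp, g in zip(y_true, y_pred, group):
        key = ("t" if yt == yp else "f") + ("p" if yp == 1 else "n")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        n = sum(c.values())
        rates[g] = {
            # Independence: compare positive prediction rates across groups.
            "positive_rate": _safe_div(c["tp"] + c["fp"], n),
            # Separation: compare TPR and FPR across groups.
            "tpr": _safe_div(c["tp"], c["tp"] + c["fn"]),
            "fpr": _safe_div(c["fp"], c["fp"] + c["tn"]),
            # Sufficiency: compare positive predictive value across groups.
            "ppv": _safe_div(c["tp"], c["tp"] + c["fp"]),
        }
    return rates

# Toy data (hypothetical): two groups of four individuals each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, group)
for g in sorted(rates):
    print(g, rates[g])
```

In this toy example both groups receive positive predictions at the same rate (0.5), so statistical parity holds, while the true positive rate is 0.5 for group A and 1.0 for group B, so equalized odds does not. This is why the criteria are usually reported side by side rather than singly.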
Article 1531
Title@2025-06-06 (5): TQml Simulator: Optimized Simulation of Quantum Machine Learning
Title: TQml Simulator: Optimized Simulation of Quantum Machine Learning | TQml Simulator: Optimierte Simulation des Quantum Machine Learning | TQml 模拟器: 量子机器学习优化模拟 2506.04891v2 |
Authors (5): Viacheslav Kuzmin, Basil Kyriacou, Mateusz Papierz, Mo Kordzanganeh, Alexey Melnikov
Hardware-efficient circuits employed in Quantum Machine Learning are typically composed of alternating layers of uniformly applied gates. High-speed numerical simulators for such circuits are crucial for advancing research in this field. In this work, we numerically benchmark universal and gate-specific techniques for simulating the action of layers of gates on quantum state vectors, aiming to accelerate the overall simulation of Quantum Machine Learning algorithms. Our analysis shows that the optimal simulation method for a given layer of gates depends on the number of qubits involved, and that a tailored combination of techniques can yield substantial performance gains in the forward and backward passes for a given circuit. Building on these insights, we developed a numerical simulator, named TQml Simulator, that employs the most efficient simulation method for each layer in a given circuit. We evaluated TQml Simulator on circuits constructed from standard gate sets, such as rotations and CNOTs, as well as on native gates from IonQ and IBM quantum processing units. In most cases, our simulator outperforms PennyLane's equivalent default_qubit simulator by up to a factor of 10, depending on the circuit, the number of qubits, the batch size of the input data, and the hardware used.
Article 1532
Title@2025-06-06 (5): Towards an Explainable Comparison and Alignment of Feature Embeddings
Title: Towards an Explainable Comparison and Alignment of Feature Embeddings | Auf dem Weg zu einem erklärbaren Vergleich und der Ausrichtung von Feature-Embeddings | 争取实现可解释的比较和地貌隐含物的一致 2506.06231v1 |
Authors (3): Mohammad Jalali, Bahar Dibaei Nia, Farzan Farnia
While several feature embedding models have been developed in the literature, comparisons of these embeddings have largely focused on their numerical performance in classification-related downstream applications. However, an interpretable comparison of different embeddings requires identifying and analyzing mismatches between sample groups clustered within the embedding spaces. In this work, we propose the Spectral Pairwise Embedding Comparison (SPEC) framework to compare embeddings and identify their differences in clustering a reference dataset. Our approach examines the kernel matrices derived from two embeddings and leverages the eigendecomposition of the difference kernel matrix to detect sample clusters that are captured differently by the two embeddings. We present a scalable implementation of this kernel-based approach, with computational complexity that grows linearly with the sample size. Furthermore, we introduce an optimization problem using this framework to align two embeddings, ensuring that clusters identified in one embedding are also captured in the other model. We provide numerical results demonstrating the application of SPEC to comparing and aligning embeddings on large-scale datasets such as ImageNet and MS-COCO. The code is available at https://github.com/mjalali/embedding-comparison.
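The core idea of examining the eigendecomposition of a difference kernel matrix can be sketched in a few lines of Python. This is a dense, quadratic-cost toy version, not the authors' scalable implementation: the Gaussian kernel, the bandwidth, the division by the sample size, and the function names are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(emb, bandwidth=1.0):
    """Dense Gaussian kernel matrix for the rows of emb."""
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth ** 2))

def difference_kernel_modes(emb_a, emb_b, bandwidth=1.0, top_k=1):
    """Top eigenpairs of (K_a - K_b) / n for two embeddings of the same samples."""
    n = emb_a.shape[0]
    diff = (gaussian_kernel(emb_a, bandwidth) - gaussian_kernel(emb_b, bandwidth)) / n
    eigvals, eigvecs = np.linalg.eigh(diff)  # symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:top_k]
    return eigvals[order], eigvecs[:, order]

# Toy data (hypothetical): embedding A collapses samples 0-2 into one cluster,
# while embedding B keeps all six samples far apart.
emb_a = np.array([[0, 0], [0, 0], [0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
emb_b = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [20, 0], [0, 20]], dtype=float)

top_vals, top_vecs = difference_kernel_modes(emb_a, emb_b)
# Samples with the largest weight in the leading eigenvector form the group
# that embedding A clusters but embedding B does not.
highlighted = sorted(np.argsort(-np.abs(top_vecs[:, 0]))[:3].tolist())
print(top_vals[0], highlighted)
```

Large positive eigenvalues point to groups clustered by the first embedding only; running the same routine with the arguments swapped would surface groups clustered by the second embedding only.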
Article 1533
Title@2025-06-06 (5): NeurNCD: Novel Class Discovery via Implicit Neural Representation
Title: NeurNCD: Novel Class Discovery via Implicit Neural Representation | NeurNCD: Novel Class Discovery über Implizite Neuraldarstellung | NeurNCD: 通过隐性神经代表方式发现新类 2506.06412v1 |
Authors (2): Junming Wang, Yi Shi
Discovering novel classes in open-world settings is crucial for real-world applications. Traditional explicit representations, such as object descriptors or 3D segmentation maps, are constrained by their discrete, hole-prone, and noisy nature, which hinders accurate novel class discovery. To address these challenges, we introduce NeurNCD, the first versatile and data-efficient framework for novel class discovery that employs the meticulously designed Embedding-NeRF model combined with KL divergence as a substitute for traditional explicit 3D segmentation maps to aggregate semantic embedding and entropy in visual embedding space. NeurNCD also integrates several key components, including feature query, feature modulation and clustering, facilitating efficient feature augmentation and information exchange between the pre-trained semantic segmentation network and implicit neural representations. As a result, our framework achieves superior segmentation performance in both open and closed-world settings without relying on densely labelled datasets for supervised training or human interaction to generate sparse label supervision. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches on the NYUv2 and Replica datasets.
Article 1534
Title@2025-06-06 (5): Fréchet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets
Title: Fréchet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets | Fréchet Radiomic Distance (FRD): Ein vielseitiges Metric für den Vergleich medizinischer Bilddatensätze | Fréchet放射性距离(FRD):用于比较医疗成像数据集的Versatile计量仪 2412.01496v2 |
Authors (19): Nicholas Konz, Richard Osuala, Preeti Verma, Yuwen Chen, Hanxue Gu, Haoyu Dong, Yaqian Chen, Andrew Marshall, Lidia Garrucho, Kaisar Kushibar, Daniel M. Lang, Gene S. Kim, Lars J. Grimm, John M. Lewin, James S. Duncan, Julia A. Schnabel, Oliver Diaz, Karim Lekadir, Maciej A. Mazurowski
Determining whether two sets of images belong to the same or different distributions or domains is a crucial task in modern medical image analysis and deep learning; for example, to evaluate the output quality of image generative models. Currently, metrics used for this task either rely on the (potentially biased) choice of some downstream task, such as segmentation, or adopt task-independent perceptual metrics (e.g., Fréchet Inception Distance/FID) from natural imaging, which we show insufficiently capture anatomical features. To this end, we introduce a new perceptual metric tailored for medical images, FRD (Fréchet Radiomic Distance), which utilizes standardized, clinically meaningful, and interpretable image features. We show that FRD is superior to other image distribution metrics for a range of medical imaging applications, including out-of-domain (OOD) detection, the evaluation of image-to-image translation (by correlating more with downstream task performance as well as anatomical consistency and realism), and the evaluation of unconditional image generation. Moreover, FRD offers additional benefits such as stability and computational efficiency at low sample sizes, sensitivity to image corruptions and adversarial attacks, feature interpretability, and correlation with radiologist-perceived image quality. Additionally, we address key gaps in the literature by presenting an extensive framework for the multifaceted evaluation of image similarity metrics in medical imaging, including the first large-scale comparative study of generative models for medical image translation, and release an accessible codebase to facilitate future research. Our results are supported by thorough experiments spanning a variety of datasets, modalities, and downstream tasks, highlighting the broad potential of FRD for medical image analysis.
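Fréchet-type metrics such as FID compare two feature sets by fitting a Gaussian to each and computing the Fréchet distance between the two Gaussians. The Python sketch below shows that computation on placeholder features. It is not the authors' released codebase: the radiomic feature extraction and any normalization FRD applies are omitted, and the random features and the name `frechet_distance` are assumptions made for illustration.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n_samples, n_features) arrays.

    d^2 = ||mu_a - mu_b||^2 + Tr(C_a) + Tr(C_b) - 2 Tr((C_a C_b)^{1/2})
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # For positive semi-definite covariances, the eigenvalues of C_a C_b are real
    # and non-negative, and Tr((C_a C_b)^{1/2}) is the sum of their square roots.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Placeholder features (hypothetical): 200 "images" with 4 features each.
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 4))
shifted = reference + 3.0  # same covariance, mean moved by 3 in every dimension

d_same = frechet_distance(reference, reference)
d_shift = frechet_distance(reference, shifted)
print(d_same, d_shift)
```

Because the shifted set has the same covariance as the reference, only the mean term contributes, so `d_shift` comes out at about 4 × 3² = 36 and `d_same` at about 0, up to floating-point error.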
Article 1535
Title@2025-06-06 (5): Corrector Sampling in Language Models
Title: Corrector Sampling in Language Models | Korrektor-Probenahme in Sprachmodellen | 语言模型中校正器样本 2506.06215v1 |
Authors (4): Itai Gat, Neta Shaul, Uriel Singer, Yaron Lipman
Autoregressive language models accumulate errors due to their fixed, irrevocable left-to-right token generation. To address this, we propose a new sampling method called Resample-Previous-Tokens (RPT). RPT mitigates error accumulation by iteratively revisiting and potentially replacing tokens in a window of previously generated text. This method can be integrated into existing autoregressive models, preserving their next-token-prediction quality and speed. Fine-tuning a pretrained 8B-parameter model with RPT for only 100B tokens resulted in ~10% relative improvements on reasoning and coding benchmarks compared to standard sampling.
Article 1536
Title@2025-06-06 (5): Model-Driven Graph Contrastive Learning
Title: Model-Driven Graph Contrastive Learning | Modellgetriebenes Graph Kontrastives Lernen | 模型驱动图反比学习 2506.06212v1 |
Authors (3): Ali Azizpour, Nicolas Zilberstein, Santiago Segarra
We propose MGCL, a model-driven graph contrastive learning (GCL) framework that leverages graphons (probabilistic generative models for graphs) to guide contrastive learning by accounting for the data's underlying generative process. GCL has emerged as a powerful self-supervised framework for learning expressive node or graph representations without relying on annotated labels, which are often scarce in real-world data. By contrasting augmented views of graph data, GCL has demonstrated strong performance across various downstream tasks, such as node and graph classification. However, existing methods typically rely on manually designed or heuristic augmentation strategies that are not tailored to the underlying data distribution and operate at the individual graph level, ignoring similarities among graphs generated from the same model. Conversely, in our proposed approach, MGCL first estimates the graphon associated with the observed data and then defines a graphon-informed augmentation process, enabling data-adaptive and principled augmentations. Additionally, for graph-level tasks, MGCL clusters the dataset and estimates a graphon per group, enabling contrastive pairs to reflect shared semantics and structure. Extensive experiments on benchmark datasets demonstrate that MGCL achieves state-of-the-art performance, highlighting the advantages of incorporating generative models into GCL.
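For readers unfamiliar with graphons, the following Python sketch shows the standard way a graph is sampled from one: each node draws a latent position uniformly from [0, 1], and each pair of nodes is connected with the probability the graphon assigns to their positions. It does not reproduce MGCL's graphon estimation or its augmentation procedure; `two_block_graphon` is a hypothetical graphon and the function names are assumptions made for illustration.

```python
import random

def sample_graph_from_graphon(graphon, num_nodes, seed=0):
    """Sample an undirected simple graph from a graphon W: [0,1]^2 -> [0,1]."""
    rng = random.Random(seed)
    # Each node gets a latent position drawn uniformly from [0, 1).
    latents = [rng.random() for _ in range(num_nodes)]
    edges = set()
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            # Connect i and j with probability W(u_i, u_j).
            if rng.random() < graphon(latents[i], latents[j]):
                edges.add((i, j))
    return latents, edges

def two_block_graphon(u, v):
    """Hypothetical graphon with two communities: dense within, sparse across."""
    same_block = (u < 0.5) == (v < 0.5)
    return 0.9 if same_block else 0.05

latents, edges = sample_graph_from_graphon(two_block_graphon, num_nodes=20, seed=1)
print(len(edges))
```

Two graphs sampled from the same graphon with different seeds share community structure without sharing edges, which is the kind of model-level similarity that graph-level contrastive pairs are meant to reflect.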
Article 1537
Title@2025-06-06 (5): Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning
Title: Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning | Pseudo-Etikettierung trifft Labelglättung für geräuscharmes partiales Label-Lernen | Pseudo 标签会满足新闻部分标签学习的标签平滑 2402.04835v3 |
Authors (3): Darshana Saravanan, Naresh Manwani, Vineet Gandhi
We motivate weakly supervised learning as an effective learning paradigm for problems where curating perfectly annotated datasets is expensive and may require domain expertise such as fine-grained classification. We focus on Partial Label Learning (PLL), a weakly-supervised learning paradigm where each training instance is paired with a set of candidate labels (partial label), one of which is the true label. Noisy PLL (NPLL) relaxes this constraint by allowing some partial labels to not contain the true label, enhancing the practicality of the problem. Our work centres on NPLL and presents a framework that initially assigns pseudo-labels to images by exploiting the noisy partial labels through a weighted nearest neighbour algorithm. These pseudo-label and image pairs are then used to train a deep neural network classifier with label smoothing. The classifier’s features and predictions are subsequently employed to refine and enhance the accuracy of pseudo-labels. We perform thorough experiments on seven datasets and compare against nine NPLL and PLL methods. We achieve state-of-the-art results in all studied settings from the prior literature, obtaining substantial gains in the simulated fine-grained benchmarks. Further, we show the promising generalisation capability of our framework in realistic, fine-grained, crowd-sourced datasets.
Article 1538
Title@2025-06-06 (5): Kinetics: Rethinking Test-Time Scaling Laws
Title: Kinetics: Rethinking Test-Time Scaling Laws | Kinetik: Überdenken von Test-Zeit-Skalierungsgesetzen | 动因:重新思考试验时间扩增法 2506.05333v2 |
Authors (6): Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Yang Zhou, Emma Strubell, Beidi Chen
We rethink test-time scaling laws from a practical efficiency perspective, revealing that the effectiveness of smaller models is significantly overestimated. Prior work, grounded in compute-optimality, overlooks critical memory access bottlenecks introduced by inference-time strategies (e.g., Best-of-$N$, long CoTs). Our holistic analysis, spanning models from 0.6B to 32B parameters, reveals a new Kinetics Scaling Law that better guides resource allocation by incorporating both computation and memory access costs. The Kinetics Scaling Law suggests that test-time compute is more effective when spent on models above a threshold than on smaller ones. A key reason is that in test-time scaling (TTS), attention, rather than parameter count, emerges as the dominant cost factor. Motivated by this, we propose a new scaling paradigm centered on sparse attention, which lowers per-token cost and enables longer generations and more parallel samples within the same resource budget. Empirically, we show that sparse attention models consistently outperform dense counterparts, achieving gains of over 60 points in low-cost regimes and over 5 points in high-cost regimes in problem-solving accuracy on AIME, including evaluations on state-of-the-art MoEs. These results suggest that sparse attention is essential, and increasingly important as more compute is invested, for realizing the full potential of test-time scaling, where, unlike training, accuracy has yet to saturate as a function of computation and continues to improve through increased generation. The code is available at https://github.com/Infini-AI-Lab/Kinetics.
Article 1539
Title@2025-06-06 (5): ByzSecAgg: A Byzantine-Resistant Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment
Title: ByzSecAgg: A Byzantine-Resistant Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment | ByzSecAgg: Ein Byzantinisch-Resistent Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment | ByzsecAgg:基于编码计算和矢量承诺的联邦学习拜占庭 – – 相对安全安全聚合计划 2302.09913v4 |
Authors (3): Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali, Giuseppe Caire
In this paper, we propose ByzSecAgg, an efficient secure aggregation scheme for federated learning that is resistant to Byzantine attacks and privacy leakage. Processing individual updates to manage adversarial behavior, while preserving the privacy of the data against colluding nodes, requires some sort of secure secret sharing. However, the communication load for secret sharing of long vectors of updates can be very high. In federated settings, where users are often edge devices with potential bandwidth constraints, excessive communication overhead is undesirable. ByzSecAgg solves this problem by partitioning local updates into smaller sub-vectors and sharing them using ramp secret sharing. However, this sharing method does not admit bilinear computations, such as pairwise distance calculations, which are needed for distance-based outlier-detection algorithms, and effective methods for mitigating Byzantine attacks. To overcome this issue, each user runs another round of ramp sharing, with a different embedding of the data in the sharing polynomial. This technique, motivated by ideas from coded computing, enables secure computation of pairwise distances. In addition, to maintain the integrity and privacy of the local update, ByzSecAgg also uses a vector commitment method, in which the commitment size remains constant (i.e., does not increase with the length of the local update), while simultaneously allowing verification of the secret sharing process. In terms of communication load, ByzSecAgg significantly outperforms the related baseline scheme, known as BREA.
Article 1540
Title@2025-06-06 (5): How to craft a deep reinforcement learning policy for wind farm flow control
Title: How to craft a deep reinforcement learning policy for wind farm flow control | Wie man eine tief greifende Lernpolitik für die Steuerung des Windparkflusses schafft | 如何为风力农场流动控制制定深入强化学习政策 2506.06204v1 |
Authors (5): Elie Kadoche, Pascal Bianchi, Florence Carton, Philippe Ciblat, Damien Ernst
Within wind farms, wake effects between turbines can significantly reduce overall energy production. Wind farm flow control encompasses methods designed to mitigate these effects through coordinated turbine control. Wake steering, for example, consists in intentionally misaligning certain turbines with the wind to optimize airflow and increase power output. However, designing a robust wake steering controller remains challenging, and existing machine learning approaches are limited to quasi-static wind conditions or small wind farms. This work presents a new deep reinforcement learning methodology to develop a wake steering policy that overcomes these limitations. Our approach introduces a novel architecture that combines graph attention networks and multi-head self-attention blocks, alongside a novel reward function and training strategy. The resulting model computes the yaw angle of each turbine, optimizing energy production in time-varying wind conditions. An empirical study conducted in a steady-state, low-fidelity simulation shows that our model requires approximately 10 times fewer training steps than a fully connected neural network and achieves more robust performance than a strong optimization baseline, increasing energy production by up to 14%. To the best of our knowledge, this is the first deep reinforcement learning-based wake steering controller to generalize effectively across any time-varying wind conditions in a low-fidelity, steady-state numerical simulation setting.
Article 1541
Title@2025-06-06 (5): MLOps with Microservices: A Case Study on the Maritime Domain
Title: MLOps with Microservices: A Case Study on the Maritime Domain | MLOps mit Microservices: Eine Fallstudie zum maritimen Bereich | 具有微服务的多边业务方案:海洋领域案例研究 2506.06202v1 |
Authors (3): Renato Cordeiro Ferreira, Rowanne Trapmann, Willem-Jan van den Heuvel
This case study describes challenges and lessons learned in building Ocean Guard: a Machine Learning-Enabled System (MLES) for anomaly detection in the maritime domain. First, the paper presents the system's specification and architecture. Ocean Guard was designed with a microservices architecture to enable multiple teams to work on the project in parallel. Then, the paper discusses how the developers adapted contract-based design to MLOps to achieve that goal. As an MLES, Ocean Guard employs code, model, and data contracts to establish guidelines between its services. This case study hopes to inspire software engineers, machine learning engineers, and data scientists to leverage similar approaches for their systems.
Article 1542
Title@2025-06-06 (5): Quantifying the Optimization and Generalization Advantages of Graph Neural Networks Over Multilayer Perceptrons
Title: Quantifying the Optimization and Generalization Advantages of Graph Neural Networks Over Multilayer Perceptrons | Quantifizierung der Optimierungs- und Generalisierungsvorteile von Graphen-Neuralnetzwerken gegenüber mehrschichtigen Perzeptronen | 量化多层倍感图像神经网络的优化和普及优势 2306.13926v3 |
Authors (5): Wei Huang, Yuan Cao, Haonan Wang, Xin Cao, Taiji Suzuki
Graph neural networks (GNNs) have demonstrated remarkable capabilities in learning from graph-structured data, often outperforming traditional Multilayer Perceptrons (MLPs) in numerous graph-based tasks. Although existing works have demonstrated the benefits of graph convolution through Laplacian smoothing, expressivity or separability, there remains a lack of quantitative analysis comparing GNNs and MLPs from an optimization and generalization perspective. This study aims to address this gap by examining the role of graph convolution through feature learning theory. Using a signal-noise data model, we conduct a comparative analysis of the optimization and generalization between two-layer graph convolutional networks (GCNs) and their MLP counterparts. Our approach tracks the trajectory of signal learning and noise memorization in GNNs, characterizing their post-training generalization. We reveal that GNNs significantly prioritize signal learning, thus enlarging the regime of low test error relative to MLPs by a factor of $D^{q-2}$, where $D$ denotes a node’s expected degree and $q$ is the power of the ReLU activation function with $q>2$. This finding highlights a substantial and quantitative discrepancy between GNNs and MLPs in terms of optimization and generalization, a conclusion further supported by our empirical simulations on both synthetic and real-world datasets.
Article 1543
Title@2025-06-06 (5): DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models
Title: DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models | DyGMamba: Langfristige zeitliche Abhängigkeiten auf kontinuierlich-zeitdynamischen Graphen mit State Space Models effizient modellieren | DyGMamba:利用国家空间模型以连续时动态图表作为有效模拟长期时间依赖模型 2408.04713v4 |
Authors (8): Zifeng Ding, Yifeng Li, Yuan He, Antonio Norelli, Jingcheng Wu, Volker Tresp, Michael Bronstein, Yunpu Ma
Learning useful representations for continuous-time dynamic graphs (CTDGs) is challenging, due to the concurrent need to span long node interaction histories and grasp nuanced temporal details. In particular, two problems emerge: (1) Encoding longer histories requires more computational resources, making it crucial for CTDG models to maintain low computational complexity to ensure efficiency; (2) Meanwhile, more powerful models are needed to identify and select the most critical temporal information within the extended context provided by longer histories. To address these problems, we propose a CTDG representation learning model named DyGMamba, originating from the popular Mamba state space model (SSM). DyGMamba first leverages a node-level SSM to encode the sequence of historical node interactions. Another time-level SSM is then employed to exploit the temporal patterns hidden in the historical graph, and its output is used to dynamically select the critical information from the interaction history. We validate DyGMamba experimentally on the dynamic link prediction task. The results show that our model achieves state-of-the-art performance in most cases. DyGMamba also maintains high efficiency in terms of computational resources, making it possible to capture long temporal dependencies with a limited computation budget.
Article 1544
Title@2025-06-06 (5): Transformative or Conservative? Conservation laws for ResNets and Transformers
Title: Transformative or Conservative? Conservation laws for ResNets and Transformers | Transformative oder konservative? Erhaltungsgesetze für ResNets und Transformer | 改革还是保守? ResNets和变异器的养护法 2506.06194v1 |
Authors (3): Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
While conservation laws in gradient flow training dynamics are well understood for (mostly shallow) ReLU and linear networks, their study remains largely unexplored for more practical architectures. This paper bridges this gap by deriving and analyzing conservation laws for modern architectures, with a focus on convolutional ResNets and Transformer networks. For this, we first show that basic building blocks such as ReLU (or linear) shallow networks, with or without convolution, have easily expressed conservation laws, and no more than the known ones. In the case of a single attention layer, we also completely describe all conservation laws, and we show that residual blocks have the same conservation laws as the same block without a skip connection. We then introduce the notion of conservation laws that depend only on a subset of parameters (corresponding e.g. to a pair of consecutive layers, to a residual block, or to an attention layer). We demonstrate that the characterization of such laws can be reduced to the analysis of the corresponding building block in isolation. Finally, we examine how these newly discovered conservation principles, initially established in the continuous gradient flow regime, persist under discrete optimization dynamics, particularly in the context of Stochastic Gradient Descent (SGD).
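For readers unfamiliar with the kind of conservation law mentioned above, the classical shallow-network case can be worked out in a few lines. The derivation below is a sketch of that known case only (scalar output, ReLU or linear activation), not of the paper's new results; the symbols $u_j$, $w_j$, $\sigma$ and $L$ are notation chosen here for illustration.

```latex
% Two-layer network with scalar output; \sigma is ReLU or the identity.
f_\theta(x) = \sum_j u_j \, \sigma(\langle w_j, x \rangle),
\qquad \dot\theta = -\nabla_\theta L(\theta) \quad \text{(gradient flow)}.

% Positive homogeneity gives \sigma'(t)\, t = \sigma(t) (almost everywhere), hence
\langle w_j, \nabla_{w_j} f_\theta(x) \rangle
  = u_j \, \sigma'(\langle w_j, x \rangle) \, \langle w_j, x \rangle
  = u_j \, \sigma(\langle w_j, x \rangle)
  = u_j \, \partial_{u_j} f_\theta(x).

% The same identity holds for any loss L that depends on \theta only through f_\theta, so
\frac{d}{dt}\left( \|w_j\|^2 - u_j^2 \right)
  = -2 \left( \langle w_j, \nabla_{w_j} L \rangle - u_j \, \partial_{u_j} L \right)
  = 0.
```

The quantity $\|w_j\|^2 - u_j^2$ therefore stays constant along the flow for every hidden neuron $j$.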
Article 1545
Title@2025-06-06 (5): ICU-TSB: A Benchmark for Temporal Patient Representation Learning for Unsupervised Stratification into Patient Cohorts
Title: ICU-TSB: A Benchmark for Temporal Patient Representation Learning for Unsupervised Stratification into Patient Cohorts | ICU-TSB: Benchmark für zeitliche Patientenrepräsentanz-Lernen zur unüberwachten Schichtung in Patientenkohorten | ICU-TSB: 用于在不受监督的情况下将患者分划入患者群体的时间病人代表制学习基准 2506.06192v1 |
Authors (5): Dimitrios Proios, Alban Bornet, Anthony Yazdani, Jose F Rodrigues Jr, Douglas Teodoro
Patient stratification, which identifies clinically meaningful subgroups, is essential for advancing personalized medicine through improved diagnostics and treatment strategies. Electronic health records (EHRs), particularly those from intensive care units (ICUs), contain rich temporal clinical data that can be leveraged for this purpose. In this work, we introduce ICU-TSB (Temporal Stratification Benchmark), the first comprehensive benchmark for evaluating patient stratification based on temporal patient representation learning using three publicly available ICU EHR datasets. A key contribution of our benchmark is a novel hierarchical evaluation framework utilizing disease taxonomies to measure the alignment of discovered clusters with clinically validated disease groupings. In our experiments with ICU-TSB, we compared statistical methods and several recurrent neural networks, including LSTM and GRU, for their ability to generate effective patient representations for subsequent clustering of patient trajectories. Our results demonstrate that temporal representation learning can rediscover clinically meaningful patient cohorts; nevertheless, it remains a challenging task, with V-measure scores reaching at most 0.46 at the top level of the taxonomy and at most 0.40 at the lowest level. To further enhance the practical utility of our findings, we also evaluate multiple strategies for assigning interpretable labels to the identified clusters. The experiments and benchmark are fully reproducible and available at https://github.com/ds4dh/CBMS2025stratification.
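The V-measure reported above is the harmonic mean of homogeneity and completeness, two entropy-based clustering scores. The sketch below computes it from scratch and scores one clustering against two taxonomy levels; the label strings and the "group/disease" naming scheme are hypothetical and not taken from the benchmark.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two equal-length label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    marginal = Counter(given)
    return -sum((c / n) * math.log(c / marginal[g]) for (g, _), c in joint.items())


def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness."""
    h_c, h_k = entropy(classes), entropy(clusters)
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical two-level taxonomy: "top-level group/fine-grained disease".
fine = ["resp/pneumonia", "resp/copd", "cardio/mi", "cardio/hf", "resp/copd", "cardio/mi"]
coarse = [label.split("/")[0] for label in fine]
clusters = [0, 0, 1, 1, 0, 1]  # made-up cluster assignments

top_level_score = v_measure(coarse, clusters)
lowest_level_score = v_measure(fine, clusters)
```

In this toy case the clusters match the top-level groups exactly but merge distinct fine-grained diseases, so the score at the lowest level is lower than at the top level.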
Article 1546
Title@2025-06-06 (5): Physics-Informed Neural Networks for Control of Single-Phase Flow Systems Governed by Partial Differential Equations
Title: Physics-Informed Neural Networks for Control of Single-Phase Flow Systems Governed by Partial Differential Equations | Physik-informierte Neuronale Netze zur Steuerung von einphasigen Flusssystemen, die durch partielle Differentialgleichungen gesteuert werden | 用于控制按部分差异等量划分的单阶段流动系统的物理内建神经网络 2506.06188v1 |
Authors (4): Luis Kin Miyatake, Eduardo Camponogara, Eric Aislan Antonelo, Alexey Pavlov
The modeling and control of single-phase flow systems governed by Partial Differential Equations (PDEs) present challenges, especially under transient conditions. In this work, we extend the Physics-Informed Neural Nets for Control (PINC) framework, originally proposed for modeling and controlling systems governed by Ordinary Differential Equations (ODEs) without the need for any labeled data, to the PDE case, particularly to single-phase incompressible and compressible flows, integrating neural networks with physical conservation laws. The PINC model for PDEs is structured into two stages: a steady-state network, which learns equilibrium solutions for a wide range of control inputs, and a transient network, which captures dynamic responses under time-varying boundary conditions. We propose a simplifying assumption that reduces the dimensionality of the spatial coordinate regarding the initial condition, allowing the efficient training of the PINC network. This simplification enables the derivation of optimal control policies using Model Predictive Control (MPC). We validate our approach through numerical experiments, demonstrating that the PINC model, which is trained exclusively using physical laws, i.e., without labeled data, accurately represents flow dynamics and enables real-time control applications. The results highlight PINC’s capability to efficiently approximate PDE solutions without requiring iterative solvers, making it a promising alternative for fluid flow monitoring and optimization in engineering applications.
Article 1547
Title@2025-06-06 (5): Antithetic Noise in Diffusion Models
Title: Antithetic Noise in Diffusion Models | Antithetischer Lärm bei Diffusionsmodellen | 扩散模型中的抗异静能 2506.06185v1 |
Authors (6): Jing Jia, Sifan Liu, Bowen Song, Wei Yuan, Liyue Shen, Guanyang Wang
We initiate a systematic study of antithetic initial noise in diffusion models. Across unconditional models trained on diverse datasets, text-conditioned latent-diffusion models, and diffusion-posterior samplers, we find that pairing each initial noise with its negation consistently yields strongly negatively correlated samples. To explain this phenomenon, we combine experiments and theoretical analysis, leading to a symmetry conjecture that the learned score function is approximately affine antisymmetric (odd symmetry up to a constant shift), and provide evidence supporting it. Leveraging this negative correlation, we enable two applications: (1) enhancing image diversity in models like Stable Diffusion without quality loss, and (2) sharpening uncertainty quantification (e.g., up to 90% narrower confidence intervals) when estimating downstream statistics. Building on these gains, we extend the two-point pairing to a randomized quasi-Monte Carlo estimator, which further improves estimation accuracy. Our framework is training-free, model-agnostic, and adds no runtime overhead.
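The variance reduction behind the narrower confidence intervals can be illustrated with the classical antithetic-variates estimator. The sketch below is a toy under stated assumptions: `statistic` is a hypothetical stand-in for "generate a sample from initial noise z, then measure a downstream statistic", and no diffusion model is involved.

```python
import math
import random
import statistics


def statistic(z):
    """Toy stand-in for 'run the sampler from noise z and measure the output'."""
    return math.exp(0.5 * z)


def independent_estimate(rng, n_pairs):
    """Average over 2 * n_pairs independent noise draws."""
    draws = [statistic(rng.gauss(0.0, 1.0)) for _ in range(2 * n_pairs)]
    return statistics.fmean(draws)


def antithetic_estimate(rng, n_pairs):
    """Average over n_pairs noise draws, each paired with its negation."""
    total = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        total += 0.5 * (statistic(z) + statistic(-z))
    return total / n_pairs


def estimator_variance(estimator, n_pairs=32, repeats=2000, seed=0):
    """Empirical variance of an estimator across repeated runs."""
    rng = random.Random(seed)
    return statistics.pvariance([estimator(rng, n_pairs) for _ in range(repeats)])


var_independent = estimator_variance(independent_estimate)
var_antithetic = estimator_variance(antithetic_estimate)
```

Both estimators use the same number of function evaluations. The antithetic one has lower variance whenever the two outputs in a pair are negatively correlated, which is the property the abstract reports for diffusion samples.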
Article 1548
Title@2025-06-06 (5): SemiOccam: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels
Title: SemiOccam: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels | SemiOccam: Ein robustes semi-überwachtes Bilderkennungsnetzwerk mit Sparse-Labels | 半 Occam: 使用粗略标签粗略标签的强力半半超图像识别网络 2506.03582v2 |
Authors (2): Rui Yann, Xianglei Xing
We present SemiOccam, an image recognition network that leverages semi-supervised learning in a highly efficient manner. Existing works often rely on complex training techniques and architectures, requiring hundreds of GPU hours for training, while their generalization ability when dealing with extremely limited labeled data remains to be improved. To address these limitations, we construct a hierarchical mixture density classification decision mechanism by optimizing mutual information between feature representations and target classes, compressing redundant information while retaining crucial discriminative components. Experimental results demonstrate that our method achieves state-of-the-art performance on various datasets when using a negligible number of labeled samples, and its simple architecture keeps training time at the minute level. Notably, this paper reveals a long-overlooked data leakage issue in the STL-10 dataset for semi-supervised learning tasks and removes duplicates to ensure the reliability of experimental results. We also release the deduplicated CleanSTL-10 dataset to facilitate fair and reliable research in future semi-supervised learning. Code available at https://github.com/Shu1L0n9/SemiOccam.
Article 1549
Title@2025-06-06 (5): A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization
Title: A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization | Eine theoretische Studie der (Hyper-)Selbstachtung durch das Lens of Interactions: Repräsentation, Ausbildung, Verallgemeinerung | 通过互动的镜头进行(Hyper)自我关注的理论研究:代表性、培训、普遍化 2506.06179v1 |
Authors (2): Muhammed Ustaomeroglu, Guannan Qu
Self-attention has emerged as a core component of modern neural architectures, yet its theoretical underpinnings remain elusive. In this paper, we study self-attention through the lens of interacting entities, ranging from agents in multi-agent reinforcement learning to alleles in genetic sequences, and show that single-layer linear self-attention can efficiently represent, learn, and generalize functions capturing pairwise interactions, including in out-of-distribution scenarios. Our analysis reveals that self-attention acts as a mutual interaction learner under minimal assumptions on the diversity of interaction patterns observed during training, thereby encompassing a wide variety of real-world domains. In addition, we validate our theoretical insights through experiments demonstrating that self-attention learns interaction functions and generalizes across both population distributions and out-of-distribution scenarios. Building on our theories, we introduce HyperFeatureAttention, a novel neural network module designed to learn couplings of different feature-level interactions between entities. Furthermore, we propose HyperAttention, a new module that extends beyond pairwise interactions to capture multi-entity dependencies, such as three-way, four-way, or general n-way interactions.
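Why a single linear self-attention layer is naturally a pairwise-interaction model can be seen by expanding its output row by row. The formula below is a sketch under common conventions (entities stacked as rows, no softmax, no scaling factor); the symbols $X$, $W_Q$, $W_K$, $W_V$ and $C$ are notation assumed here, not necessarily the paper's.

```latex
% X \in \mathbb{R}^{n \times d} stacks the entity embeddings x_1, \dots, x_n as rows.
\mathrm{LinAttn}(X) = (X W_Q)(X W_K)^\top (X W_V),
\qquad
\mathrm{LinAttn}(X)_i = \sum_{j=1}^{n} \left( x_i^\top C \, x_j \right) W_V^\top x_j,
\quad C := W_Q W_K^\top .
```

Each summand depends only on the pair $(x_i, x_j)$, so the output for entity $i$ is a sum of pairwise interaction terms weighted by the bilinear form $x_i^\top C \, x_j$.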
Article 1550
Title@2025-06-06 (5): CoxNTF: A New Approach for Joint Clustering and Prediction in Survival Analysis
Title: CoxNTF: A New Approach for Joint Clustering and Prediction in Survival Analysis | CoxNTF: Ein neuer Ansatz für Joint Clustering und Vorhersagen in der Überlebensanalyse | CoxNTF:在生存分析中联合集群和预测的新方法 2506.06411v1 |
Authors (3): Paul Fogel, Christophe Geissler, George Luta
The interpretation of the results of survival analysis often benefits from latent factor representations of baseline covariates. However, existing methods, such as Nonnegative Matrix Factorization (NMF), do not incorporate survival information, limiting their predictive power. We present CoxNTF, a novel approach that uses non-negative tensor factorization (NTF) to derive meaningful latent representations that are closely associated with survival outcomes. CoxNTF constructs a weighted covariate tensor in which survival probabilities derived from the Coxnet model are used to guide the tensorization process. Our results show that CoxNTF achieves survival prediction performance comparable to using Coxnet with the original covariates, while providing a structured and interpretable clustering framework. In addition, the new approach effectively handles feature redundancy, making it a powerful tool for joint clustering and prediction in survival analysis.
Article 1551
Title@2025-06-06 (5): Reusing Trajectories in Policy Gradients Enables Fast Convergence
Title: Reusing Trajectories in Policy Gradients Enables Fast Convergence | Wiederverwendung von Trajektorien in politischen Gradienten ermöglicht schnelle Konvergenz | 在政策梯度中重新使用轨迹以启用快速聚合 2506.06178v1 |
Authors (5): Alessandro Montenegro, Federico Mansutti, Marco Mussi, Matteo Papini, Alberto Maria Metelli
Policy gradient (PG) methods are a class of effective reinforcement learning algorithms, particularly when dealing with continuous control problems. These methods learn the parameters of parametric policies via stochastic gradient ascent, typically using on-policy trajectory data to estimate the policy gradient. However, such reliance on fresh data makes them sample-inefficient. Indeed, vanilla PG methods require $O(\epsilon^{-2})$ trajectories to reach an $\epsilon$-approximate stationary point. A common strategy to improve efficiency is to reuse off-policy information from past iterations, such as previous gradients or trajectories. While gradient reuse has received substantial theoretical attention, leading to improved rates of $O(\epsilon^{-3/2})$, the reuse of past trajectories remains largely unexplored from a theoretical perspective. In this work, we provide the first rigorous theoretical evidence that extensive reuse of past off-policy trajectories can significantly accelerate convergence in PG methods. We introduce a power mean correction to the multiple importance weighting estimator and propose RPG (Retrospective Policy Gradient), a PG algorithm that combines old and new trajectories for policy updates. Through a novel analysis, we show that, under established assumptions, RPG achieves a sample complexity of $\widetilde{O}(\epsilon^{-1})$, the best known rate in the literature. We further validate our approach empirically against PG methods with state-of-the-art rates.
Article 1552
Title@2025-06-06 (5): Improving choice model specification using reinforcement learning
Title: Improving choice model specification using reinforcement learning | Verbesserung der Wahlmodellspezifikation mittels Verstärkungslernen | 利用强化学习改进选择模式规格 2506.06410v1 |
Authors (3): Gabriel Nova, Sander van Cranenburgh, Stephane Hess
Discrete choice modelling is a theory-driven modelling framework for understanding and forecasting choice behaviour. To obtain behavioural insights, modellers test several competing model specifications in their attempts to discover the ‘true’ data generation process. This trial-and-error process requires expertise, is time-consuming, and relies on subjective theoretical assumptions. Although metaheuristics have been proposed to assist choice modellers, they treat model specification as a classic optimisation problem, relying on static strategies, applying predefined rules, and neglecting outcomes from previously estimated models. As a result, current metaheuristics struggle to prioritise promising search regions, adapt exploration dynamically, and transfer knowledge to other modelling tasks. To address these limitations, we introduce a deep reinforcement learning-based framework where an ‘agent’ specifies models by estimating them and receiving rewards based on goodness-of-fit and parsimony. Results demonstrate that the agent dynamically adapts its strategies to identify promising specifications across data generation processes, showing robustness and potential transferability, without prior domain knowledge.
Article 1553
Title@2025-06-06 (5): Emergent Response Planning in LLMs
Title: Emergent Response Planning in LLMs | Emergent Response Planning in LLMs | 新的LLMM 应急规划 2502.06258v2 |
Authors (5): Zhichen Dong, Zhanhui Zhou, Zhixuan Liu, Chao Yang, Chaochao Lu
In this work, we argue that large language models (LLMs), though trained to predict only the next token, exhibit emergent planning behaviors: their hidden representations encode future outputs beyond the next token. Through simple probing, we demonstrate that LLM prompt representations encode global attributes of their entire responses, including structure attributes (e.g., response length, reasoning steps), content attributes (e.g., character choices in storywriting, multiple-choice answers at the end of response), and behavior attributes (e.g., answer confidence, factual consistency). In addition to identifying response planning, we explore how it scales with model size across tasks and how it evolves during generation. The finding that LLMs plan ahead in their hidden representations suggests potential applications for improving transparency and generation control.
Article 1554
Title@2025-06-06 (5): A Riemannian Optimization Perspective of the Gauss-Newton Method for Feedforward Neural Networks
Title: A Riemannian Optimization Perspective of the Gauss-Newton Method for Feedforward Neural Networks | Riemannische Optimierungsperspektive der Gauß-Newton-Methode für neurale Netze | Fefforward神经网络的高斯-牛顿方法的优化观点 2412.14031v4 |
Authors (1): Semih Cayci
We analyze the convergence of Gauss-Newton dynamics for training neural networks with smooth activation functions. In the underparameterized regime, the Gauss-Newton gradient flow induces a Riemannian gradient flow on a low-dimensional, smooth, embedded submanifold of the Euclidean output space. Using tools from Riemannian optimization, we prove last-iterate convergence of the Riemannian gradient flow to the optimal in-class predictor at an exponential rate that is independent of the conditioning of the Gram matrix, without requiring explicit regularization. We further characterize the critical impacts of the neural network scaling factor and the initialization on the convergence behavior. In the overparameterized regime, we show that the Levenberg-Marquardt dynamics with an appropriately chosen damping schedule yield a fast convergence rate despite potentially ill-conditioned neural tangent kernel matrices, analogous to the underparameterized regime. These findings demonstrate the potential of Gauss-Newton methods for efficiently optimizing neural networks in the near-initialization regime, particularly in ill-conditioned problems where kernel and Gram matrices have small singular values.
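The damped Gauss-Newton (Levenberg-Marquardt) update mentioned above can be illustrated on a toy curve-fitting problem. The sketch below is a generic textbook-style discrete LM loop, not the continuous-time dynamics or the damping schedule analyzed in the paper: the model $y = a e^{bx}$, the halve/double damping rule, and all function names are assumptions made for illustration.

```python
import math


def residuals(theta, xs, ys):
    """Residuals of the toy model y = a * exp(b * x)."""
    a, b = theta
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]


def jacobian(theta, xs):
    """Rows (d r_i / d a, d r_i / d b) of the residual Jacobian."""
    a, b = theta
    return [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]


def lm_step(theta, xs, ys, damping):
    """One step: solve (J^T J + damping * I) delta = J^T r, return theta - delta."""
    r = residuals(theta, xs, ys)
    J = jacobian(theta, xs)
    g11 = sum(j[0] * j[0] for j in J) + damping
    g12 = sum(j[0] * j[1] for j in J)
    g22 = sum(j[1] * j[1] for j in J) + damping
    g1 = sum(j[0] * ri for j, ri in zip(J, r))
    g2 = sum(j[1] * ri for j, ri in zip(J, r))
    det = g11 * g22 - g12 * g12  # 2x2 system solved by Cramer's rule
    d1 = (g22 * g1 - g12 * g2) / det
    d2 = (g11 * g2 - g12 * g1) / det
    return (theta[0] - d1, theta[1] - d2)


def fit(xs, ys, theta=(1.0, 0.0), damping=1.0, iters=200):
    """Accept a step only if it lowers the squared loss; adapt damping accordingly."""
    loss = sum(r * r for r in residuals(theta, xs, ys))
    for _ in range(iters):
        cand = lm_step(theta, xs, ys, damping)
        cand_loss = sum(r * r for r in residuals(cand, xs, ys))
        if cand_loss < loss:
            theta, loss, damping = cand, cand_loss, damping * 0.5
        else:
            damping *= 2.0
    return theta


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(-1.0 * x) for x in xs]  # noise-free data with a = 2, b = -1
a_hat, b_hat = fit(xs, ys)
```

With zero damping the step reduces to plain Gauss-Newton; a larger damping value keeps the linear system well conditioned even when $J^\top J$ has small singular values, at the cost of shorter steps.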
Article 1555
Title@2025-06-06 (5): The Lock-in Hypothesis: Stagnation by Algorithm
Title: The Lock-in Hypothesis: Stagnation by Algorithm | Die Lock-in Hypothese: Stagnation durch Algorithmen | Lock-in 假说: 阿尔哥里特姆的停滞 2506.06166v1 |
Authors (4): Tianyi Alex Qiu, Zhonghao He, Tejasveer Chugh, Max Kleiman-Weiner
The training and deployment of large language models (LLMs) create a feedback loop with human users: models learn human beliefs from data, reinforce these beliefs with generated content, reabsorb the reinforced beliefs, and feed them back to users again and again. This dynamic resembles an echo chamber. We hypothesize that this feedback loop entrenches the existing values and beliefs of users, leading to a loss of diversity and potentially the lock-in of false beliefs. We formalize this hypothesis and test it empirically with agent-based LLM simulations and real-world GPT usage data. Analysis reveals sudden but sustained drops in diversity after the release of new GPT iterations, consistent with the hypothesized human-AI feedback loop. Code and data available at https://thelockinhypothesis.com
Article 1556
Title@2025-06-06 (5): Automated Journalistic Questions: A New Method for Extracting 5W1H in French
Title: Automated Journalistic Questions: A New Method for Extracting 5W1H in French | Automatisierte journalistische Fragen: Eine neue Methode zur Extraktion von 5W1H in Französisch | 自动新闻问题:以法文提取5W1H的新方法 2505.14804v2 |
Authors (3): Maxence Verhaverbeke, Julie A. Gramaccia, Richard Khoury
The 5W1H questions – who, what, when, where, why and how – are commonly used in journalism to ensure that an article describes events clearly and systematically. Answering them is a crucial prerequisite for tasks such as summarization, clustering, and news aggregation. In this paper, we design the first automated extraction pipeline to get 5W1H information from French news articles. To evaluate the performance of our algorithm, we also create a corpus of 250 Quebec news articles with 5W1H answers marked by four human annotators. Our results demonstrate that our pipeline performs as well in this task as the large language model GPT-4o.
Article 1557
Title@2025-06-06 (5): Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
Title: Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Nicht alle Rollouts sind nützlich: Down-Sampling Rollouts im LLM Verstärkungslernen | 并非所有的推出都有用:在LLM强化学习中进行下选推出 2504.13818v2 |
Authors (4): Yixuan Even Xu, Yash Savani, Fei Fang, Zico Kolter
Reinforcement learning with verifiable rewards (RLVR) has emerged as a powerful paradigm for enhancing reasoning capabilities in large language models. However, it is constrained by a fundamental asymmetry in computation and memory requirements: rollout generation is embarrassingly parallel and memory-light, whereas policy updates are communication-heavy and memory-intensive. To address this, we introduce PODS (Policy Optimization with Down-Sampling). PODS produces numerous rollouts in parallel, then trains on only an informative subset, preserving learning signals while slashing update cost. We instantiate PODS with max-variance down-sampling, a principled criterion that maximises reward diversity, and show that it admits an $O(n\log n)$ solution. Empirically, coupling PODS with Group Relative Policy Optimization (GRPO) achieves superior performance over standard GRPO across different reasoning benchmarks and hardware environments.
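One way an $O(n\log n)$ max-variance selection can work is sketched below. It rests on an assumption stated here rather than taken from the abstract: that some maximum-variance subset of size m consists of the k lowest and the m - k highest rewards, so after sorting only m + 1 candidates need to be scored with prefix sums. The function names and reward values are hypothetical, and the brute-force helper exists only to sanity-check tiny inputs.

```python
from itertools import accumulate, combinations
from statistics import pvariance


def max_variance_subset(rewards, m):
    """Indices of a size-m subset of `rewards` with maximal population variance.

    Assumption of this sketch: some optimal subset is made of the k lowest
    and the m - k highest rewards, so only m + 1 candidates are checked.
    """
    n = len(rewards)
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= len(rewards)")
    order = sorted(range(n), key=lambda i: rewards[i])  # O(n log n)
    vals = [rewards[i] for i in order]
    # Prefix sums of values and squared values make each candidate O(1).
    prefix = [0.0] + list(accumulate(vals))
    prefix_sq = [0.0] + list(accumulate(v * v for v in vals))

    best_k, best_var = 0, float("-inf")
    for k in range(m + 1):  # k lowest plus (m - k) highest
        hi_start = n - (m - k)
        total = prefix[k] + prefix[n] - prefix[hi_start]
        total_sq = prefix_sq[k] + prefix_sq[n] - prefix_sq[hi_start]
        var = total_sq / m - (total / m) ** 2
        if var > best_var:
            best_k, best_var = k, var
    return order[:best_k] + order[n - (m - best_k):]


def brute_force_max_variance(rewards, m):
    """Exhaustive reference value, only feasible for tiny inputs."""
    return max(pvariance(subset) for subset in combinations(rewards, m))


rewards = [0.1, 0.9, 0.5, 0.4, 0.95, 0.0, 0.55, 0.3]  # hypothetical rollout rewards
chosen = max_variance_subset(rewards, 4)
```

The returned indices identify the rollouts that would be kept for the policy update; the remaining rollouts are discarded before the memory-intensive training step.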
Article 1558
Title@2025-06-06 (5): ENMA: Tokenwise Autoregression for Generative Neural PDE Operators
Title: ENMA: Tokenwise Autoregression for Generative Neural PDE Operators | ENMA: Tokenwise Autoregression für generative neurale PDE-Betreiber | ENMA: 生成神经器 PDE 运算符的取向自动递减 2506.06158v1 |
Authors (4): Armand Kassaï Koupaï, Lise Le Boudec, Louis Serrano, Patrick Gallinari
Solving time-dependent parametric partial differential equations (PDEs) remains a fundamental challenge for neural solvers, particularly when generalizing across a wide range of physical parameters and dynamics. When data is uncertain or incomplete, as is often the case, a natural approach is to turn to generative models. We introduce ENMA, a generative neural operator designed to model spatio-temporal dynamics arising from physical phenomena. ENMA predicts future dynamics in a compressed latent space using a generative masked autoregressive transformer trained with a flow matching loss, enabling tokenwise generation. Irregularly sampled spatial observations are encoded into uniform latent representations via attention mechanisms and further compressed through a spatio-temporal convolutional encoder. This allows ENMA to perform in-context learning at inference time by conditioning on either past states of the target trajectory or auxiliary context trajectories with similar dynamics. The result is a robust and adaptable framework that generalizes to new PDE regimes and supports one-shot surrogate modeling of time-dependent parametric PDEs.
Article 1559
Title@2025-06-06 (5): Deconfounding Multi-Cause Latent Confounders: A Factor-Model Approach to Climate Model Bias Correction
Title: Deconfounding Multi-Cause Latent Confounders: A Factor-Model Approach to Climate Model Bias Correction | Deconfounding Multi-Cause Latent Confounders: Ein Faktor-Modell Ansatz zur Klimamodell Bias Korrektur | 多原因、中位本部混乱创始人:对气候模型比重纠正的因数模式办法 2408.12063v2 |
Authors (10): Wentao Gao, Jiuyong Li, Debo Cheng, Lin Liu, Jixue Liu, Thuc Duy Le, Xiaojing Du, Xiongren Chen, Yanchang Zhao, Yun Chen
Global Climate Models (GCMs) are crucial for predicting future climate changes by simulating the Earth systems. However, GCM outputs exhibit systematic biases due to model uncertainties, parameterization simplifications, and inadequate representation of complex climate phenomena. Traditional bias correction methods, which rely on historical observation data and statistical techniques, often neglect unobserved confounders, leading to biased results. This paper proposes a novel bias correction approach that uses both GCM and observational data to learn a factor model that captures multi-cause latent confounders. Inspired by recent advances in causality-based time series deconfounding, our method first constructs a factor model to learn latent confounders from historical data and then applies them to enhance the bias correction process using advanced time series forecasting models. The experimental results demonstrate significant improvements in the accuracy of precipitation outputs. By addressing unobserved confounders, our approach offers a robust and theoretically grounded solution for climate model bias correction.
Article 1560
Title@2025-06-06 (5): Graph Neural Network Generalization with Gaussian Mixture Model Based Augmentation
Title: Graph Neural Network Generalization with Gaussian Mixture Model Based Augmentation | Graphische Neural-Netzwerk-Verallgemeinerung mit Gauß-Mischungsmodell-basierter Augmentation | Gaussian 混合模型基底放大模型的神经网络通用 2411.08638v3 |
Authors (5): Yassine Abbahaddou, Fragkiskos D. Malliaros, Johannes F. Lutzeyer, Amine Mohamed Aboussalah, Michalis Vazirgiannis
Graph Neural Networks (GNNs) have shown great promise in tasks like node and graph classification, but they often struggle to generalize, particularly to unseen or out-of-distribution (OOD) data. These challenges are exacerbated when training data is limited in size or diversity. To address these issues, we introduce a theoretical framework using Rademacher complexity to compute a regret bound on the generalization error and then characterize the effect of data augmentation. This framework informs the design of GRATIN, an efficient graph data augmentation algorithm leveraging the capability of Gaussian Mixture Models (GMMs) to approximate any distribution. Our approach not only outperforms existing augmentation techniques in terms of generalization but also offers improved time complexity, making it highly suitable for real-world applications.
Article 1561
Title@2025-06-06 (5): ELEVATE-GenAI: Reporting Guidelines for the Use of Large Language Models in Health Economics and Outcomes Research: an ISPOR Working Group on Generative AI Report
Title: ELEVATE-GenAI: Reporting Guidelines for the Use of Large Language Models in Health Economics and Outcomes Research: an ISPOR Working Group on Generative AI Report | ELEVATE-GenAI: Berichterstattungsrichtlinien für den Einsatz von großen Sprachmodellen in der Gesundheitsökonomie und der Ergebnisforschung: eine ISPOR-Arbeitsgruppe für generative KI-Berichte | ELEVATE-GenAI:在卫生经济学和成果研究中使用大语言模式的报告准则:科学、科学、科学、科学、技术研究所创举报告工作组 2501.12394v2 |
Authors (8): Rachael L. Fleurence, Dalia Dawoud, Jiang Bian, Mitchell K. Higashi, Xiaoyan Wang, Hua Xu, Jagpreet Chhatwal, Turgay Ayer
Introduction: Generative artificial intelligence (AI), particularly large language models (LLMs), holds significant promise for Health Economics and Outcomes Research (HEOR). However, standardized reporting guidance for LLM-assisted research is lacking. This article introduces the ELEVATE GenAI framework and checklist, reporting guidelines specifically designed for HEOR studies involving LLMs. Methods: The framework was developed through a targeted literature review of existing reporting guidelines, AI evaluation frameworks, and expert input from the ISPOR Working Group on Generative AI. It comprises ten domains, including model characteristics, accuracy, reproducibility, and fairness and bias. The accompanying checklist translates the framework into actionable reporting items. To illustrate its use, the framework was applied to two published HEOR studies: one focused on systematic literature review tasks and the other on economic modeling. Results: The ELEVATE GenAI framework offers a comprehensive structure for reporting LLM-assisted HEOR research, while the checklist facilitates practical implementation. Its application to the two case studies demonstrates its relevance and usability across different HEOR contexts. Limitations: Although the framework provides robust reporting guidance, further empirical testing is needed to assess its validity, completeness, and usability, as well as its generalizability across diverse HEOR use cases. Conclusion: The ELEVATE GenAI framework and checklist address a critical gap by offering structured guidance for transparent, accurate, and reproducible reporting of LLM-assisted HEOR research. Future work will focus on extensive testing and validation to support broader adoption and refinement.
Article 1562
Title@2025-06-06 (5): Laplace Transform Based Low-Complexity Learning of Continuous Markov Semigroups
Title: Laplace Transform Based Low-Complexity Learning of Continuous Markov Semigroups | Laplace Transform-basierte Low-Complexity-Lernen von kontinuierlichen Markov-Halbgruppen | 连续马尔科夫半群体低复杂程度低学习 2410.14477v2 |
Authors (6): Vladimir R. Kostic, Karim Lounici, Hélène Halconruy, Timothée Devergne, Pietro Novelli, Massimiliano Pontil
Markov processes serve as a universal model for many real-world random processes. This paper presents a data-driven approach for learning these models through the spectral decomposition of the infinitesimal generator (IG) of the Markov semigroup. The unbounded nature of IGs complicates traditional methods such as vector-valued regression and Hilbert-Schmidt operator analysis. Existing techniques, including physics-informed kernel regression, are computationally expensive and limited in scope, with no recovery guarantees for transfer operator methods when the time-lag is small. We propose a novel method that leverages the IG’s resolvent, characterized by the Laplace transform of transfer operators. This approach is robust to time-lag variations, ensuring accurate eigenvalue learning even for small time-lags. Our statistical analysis applies to a broader class of Markov processes than current methods while reducing computational complexity from quadratic to linear in the state dimension. Finally, we illustrate the behaviour of our method in two experiments.
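For orientation, the standard identity linking the generator's resolvent to the Laplace transform of the semigroup can be written as below. The symbols $A_t$ (transfer operator at time $t$), $L$ (infinitesimal generator) and $\mu$ (shift parameter) are notation assumed here for illustration; the abstract does not fix them.

```latex
% Assumed notation: (A_t)_{t \ge 0} is the Markov semigroup, L its generator, \mu > 0.
\[
  (\mu I - L)^{-1} f \;=\; \int_0^{\infty} e^{-\mu t}\, A_t f \,\mathrm{d}t ,
  \qquad \mu > 0 .
\]
% If L f = \lambda f with \operatorname{Re}\lambda \le 0, the resolvent shares the eigenfunction:
\[
  (\mu I - L)^{-1} f \;=\; \frac{1}{\mu - \lambda}\, f .
\]
```

Because the resolvent is a bounded operator even when $L$ is not, its eigenvalues $1/(\mu - \lambda)$ can in principle be estimated first and then mapped back to $\lambda$.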
Article 1563
Title@2025-06-06 (5): LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws
Title: LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws | LLMs on the Line: Daten bestimmen die Loss-to-Loss-Skalierungsgesetze | 线上LLM:数据确定损失到损失的缩放法 2502.12120v2 |
Authors (5): Prasanna Mayilvahanan, Thaddäus Wiedemer, Sayak Mallick, Matthias Bethge, Wieland Brendel
Scaling laws guide the development of large language models (LLMs) by offering estimates for the optimal balance of model size, tokens, and compute. More recently, loss-to-loss scaling laws that relate losses across pretraining datasets and downstream tasks have emerged as a powerful tool for understanding and improving LLM performance. In this work, we investigate which factors most strongly influence loss-to-loss scaling. Our experiments reveal that the pretraining data and tokenizer determine the scaling trend. In contrast, model size, optimization hyperparameters, and even significant architectural differences, such as between transformer-based models like Llama and state-space models like Mamba, have limited impact. Consequently, practitioners should carefully curate suitable pretraining datasets for optimal downstream performance, while architectures and other settings can be freely optimized for training efficiency.
Article 1564
Title@2025-06-06 (5): carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks
Title: carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks | carps: Ein Rahmen für den Vergleich von N Hyperparameter-Optimierern auf M Benchmarks | 在M基准上比较超超参数最佳放大器的框架 2506.06143v1 |
Authors (16): Carolin Benjamins, Helena Graf, Sarah Segel, Difan Deng, Tim Ruhkopf, Leona Hennig, Soham Basu, Neeratyoy Mallik, Edward Bergman, Deyao Chen, François Clément, Matthias Feurer, Katharina Eggensperger, Frank Hutter, Carola Doerr, Marius Lindauer
Hyperparameter Optimization (HPO) is crucial for developing well-performing machine learning models. In order to ease prototyping and benchmarking of HPO methods, we propose carps, a benchmark framework for Comprehensive Automated Research Performance Studies that allows evaluating N optimizers on M benchmark tasks. In this first release of carps, we focus on the four most important HPO task types: blackbox, multi-fidelity, multi-objective and multi-fidelity-multi-objective. With 3,336 tasks from 5 community benchmark collections and 28 variants of 9 optimizer families, we offer the biggest go-to library to date to evaluate and compare HPO methods. The carps framework relies on a purpose-built, lightweight interface, gluing together optimizers and benchmark tasks. It also features an analysis pipeline, facilitating the evaluation of optimizers on benchmarks. However, navigating a huge number of tasks while developing and comparing methods can be computationally infeasible. To address this, we obtain a subset of representative tasks by minimizing the star discrepancy of the subset, in the space spanned by the full set. As a result, we propose an initial subset of 10 to 30 diverse tasks for each task type, and include functionality to re-compute subsets as more benchmarks become available, enabling efficient evaluations. We also establish a first set of baseline results on these tasks as a measure for future comparisons. With carps (https://www.github.com/automl/CARP-S), we make an important step in the standardization of HPO evaluation.
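The subset-selection idea can be illustrated with a minimal sketch. It assumes each task is already described by a point in the unit cube [0, 1]^d. The function names and the exhaustive search are hypothetical and are not the carps implementation, which would need a scalable optimizer for thousands of tasks.

```python
from itertools import combinations, product


def star_discrepancy(points):
    """Brute-force L-infinity star discrepancy of points in [0, 1]^d.

    Evaluates anchored boxes whose corners lie on the grid spanned by the
    point coordinates (plus 1.0), checking both open and closed boxes.
    """
    n = len(points)
    d = len(points[0])
    # Candidate corner coordinates per dimension.
    grids = [sorted({p[k] for p in points} | {1.0}) for k in range(d)]
    worst = 0.0
    for corner in product(*grids):
        volume = 1.0
        for c in corner:
            volume *= c
        open_count = sum(all(p[k] < corner[k] for k in range(d)) for p in points)
        closed_count = sum(all(p[k] <= corner[k] for k in range(d)) for p in points)
        worst = max(worst, volume - open_count / n, closed_count / n - volume)
    return worst


def most_representative_subset(points, k):
    """Exhaustively pick the k-subset with the smallest star discrepancy."""
    return min(combinations(points, k), key=star_discrepancy)


if __name__ == "__main__":
    tasks = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.8), (0.3, 0.9)]
    print(most_representative_subset(tasks, 2))
```

The exhaustive search is only practical for tiny inputs; it is used here to keep the objective visible.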
Article 1565
Title@2025-06-06 (5): Efficient Diffusion Models: A Survey
Title: Efficient Diffusion Models: A Survey | Effiziente Diffusionsmodelle: Eine Umfrage | 高效传播模型:调查 2502.06805v3 |
Authors (14): Hui Shen, Jingxuan Zhang, Boning Xiong, Rui Hu, Shoufa Chen, Zhongwei Wan, Xin Wang, Yu Zhang, Zixuan Gong, Guangyin Bao, Chaofan Tao, Yongfeng Huang, Ye Yuan, Mi Zhang
Diffusion models have emerged as powerful generative models capable of producing high-quality content such as images, videos, and audio, demonstrating their potential to revolutionize digital content creation. However, these capabilities come at the cost of significant computational resources and lengthy generation time, underscoring the critical need to develop efficient techniques for practical deployment. In this survey, we provide a systematic and comprehensive review of research on efficient diffusion models. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient diffusion model topics from the algorithm-level, system-level, and framework perspectives, respectively. We have also created a GitHub repository where we organize the papers featured in this survey at https://github.com/AIoT-MLSys-Lab/Efficient-Diffusion-Model-Survey. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of efficient diffusion model research and inspire them to contribute to this important and exciting field.
Article 1566
Title@2025-06-06 (5): Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models
Title: Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models | Tabelle-r1: Selbstbeaufsichtigtes und verstärktes Lernen für programmbasierte Tabellenveranlagung in kleinen Sprachmodellen | 表-r1:针对基于方案的小语言模式说明理由的表格进行自我监督和强化学习 2506.06137v1 |
Authors (9): Rihui Jin, Zheyu Xin, Xing Xie, Zuoyi Li, Guilin Qi, Yongrui Chen, Xinbang Dai, Tongtong Wu, Gholamreza Haffari
Table reasoning (TR) requires structured reasoning over semi-structured tabular data and remains challenging, particularly for small language models (SLMs, e.g., LLaMA-8B) due to their limited capacity compared to large LMs (LLMs, e.g., GPT-4o). To narrow this gap, we explore program-based TR (P-TR), which circumvents key limitations of text-based TR (T-TR), notably in numerical reasoning, by generating executable programs. However, applying P-TR to SLMs introduces two challenges: (i) vulnerability to heterogeneity in table layouts, and (ii) inconsistency in reasoning due to limited code generation capability. We propose Table-r1, a two-stage P-TR method designed for SLMs. Stage 1 introduces an innovative self-supervised learning task, Layout Transformation Inference, to improve tabular layout generalization from a programmatic view. Stage 2 adopts a mix-paradigm variant of Group Relative Policy Optimization, enhancing P-TR consistency while allowing dynamic fallback to T-TR when needed. Experiments on four TR benchmarks demonstrate that Table-r1 outperforms all SLM-based methods, achieving at least a 15% accuracy improvement over the base model (LLaMA-8B) across all datasets and reaching performance competitive with LLMs.
Article 1567
Title@2025-06-06 (5): Regret-Free Reinforcement Learning for LTL Specifications
Title: Regret-Free Reinforcement Learning for LTL Specifications | Bedauern-freie Verstärkungs-Lernen für LTL-Spezifikationen | 为远程学习具体要求而后悔地免费加强学习 2411.12019v2 |
Authors (3): Rupak Majumdar, Mahmoud Salamati, Sadegh Soudjani
Learning to control an unknown dynamical system with respect to high-level temporal specifications is an important problem in control theory. We present the first regret-free online algorithm for learning a controller for linear temporal logic (LTL) specifications for systems with unknown dynamics. We assume that the underlying (unknown) dynamics is modeled by a finite-state and action Markov decision process (MDP). Our core technical result is a regret-free learning algorithm for infinite-horizon reach-avoid problems on MDPs. For general LTL specifications, we show that the synthesis problem can be reduced to a reach-avoid problem once the graph structure is known. Additionally, we provide an algorithm for learning the graph structure, assuming knowledge of a minimum transition probability, which operates independently of the main regret-free algorithm. Our LTL controller synthesis algorithm provides sharp bounds on how close we are to achieving optimal behavior after a finite number of learning episodes. In contrast, previous algorithms for LTL synthesis only provide asymptotic guarantees, which give no insight into the transient performance during the learning phase.
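To make the reach-avoid objective concrete, the sketch below computes the maximum probability of reaching a goal set while avoiding an unsafe set on a small MDP with known transition probabilities. This is plain value iteration, not the regret-free learning algorithm of the paper (which assumes the dynamics are unknown); the toy MDP and all names are hypothetical.

```python
def reach_avoid_values(transitions, goal, unsafe, sweeps=1000, tol=1e-12):
    """Maximum reach-avoid probability per state via value iteration.

    transitions[state][action] is a list of (probability, next_state) pairs.
    Goal states have value 1, unsafe states value 0; both are absorbing.
    """
    values = {s: (1.0 if s in goal else 0.0) for s in transitions}
    for _ in range(sweeps):
        delta = 0.0
        for state, actions in transitions.items():
            if state in goal or state in unsafe:
                continue
            best = max(
                sum(p * values[nxt] for p, nxt in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    return values


# Hypothetical toy MDP: "safe" retries without risk, "risky" may hit the unsafe state.
toy_mdp = {
    "s0": {
        "safe": [(0.5, "goal"), (0.5, "s0")],
        "risky": [(0.9, "goal"), (0.1, "bad")],
    },
    "goal": {},
    "bad": {},
}

if __name__ == "__main__":
    print(reach_avoid_values(toy_mdp, goal={"goal"}, unsafe={"bad"}))
```

In the learning setting the transition probabilities would have to be estimated from episodes, which is where regret bounds become relevant.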
Article 1568
Title@2025-06-06 (5): VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters
Title: VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters | VisionTS: Visuelle Maskierte Autoencoder sind kostenlose Zero-Shot-Zeitreihen-Forecaster | 视野:视觉蒙面自动识别器是无热零热时间序列预测器 2408.17253v4 |
Authors (6): Mouxiang Chen, Lefei Shen, Zhuo Li, Xiaoyun Joy Wang, Jianling Sun, Chenghao Liu
Foundation models have emerged as a promising approach in time series forecasting (TSF). Existing approaches either repurpose large language models (LLMs) or build large-scale time series datasets to develop TSF foundation models for universal forecasting. However, these methods face challenges due to the severe cross-domain gap or in-domain heterogeneity. This paper explores a new road to building a TSF foundation model from rich, high-quality natural images. Our key insight is that a visual masked autoencoder, pre-trained on the ImageNet dataset, can naturally be a numeric series forecaster. By reformulating TSF as an image reconstruction task, we bridge the gap between image pre-training and TSF downstream tasks. Surprisingly, without further adaptation in the time series domain, the proposed VisionTS could achieve better zero-shot forecast performance than existing TSF foundation models. With fine-tuning for one epoch, VisionTS could further improve the forecasting and achieve state-of-the-art performance in most cases. Extensive experiments reveal intrinsic similarities between images and real-world time series, suggesting that visual models may offer a “free lunch” for TSF and highlight the potential for future cross-modality research. Our code is publicly available at https://github.com/Keytoyze/VisionTS.
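One way to picture the reformulation of forecasting as image reconstruction is to fold a univariate series into a 2D array by its period, so that each column holds one period. The sketch below shows only this folding step under that assumption; the function name is hypothetical, and the resizing, masking, and masked-autoencoder reconstruction used by VisionTS are not reproduced.

```python
def series_to_image(series, period):
    """Fold a 1D series into a period x n_periods grid scaled to [0, 1].

    Rows index the phase within a period and columns index successive periods.
    Leading samples that do not fill a whole period are dropped.
    """
    usable = len(series) - len(series) % period
    if usable == 0:
        raise ValueError("series is shorter than one period")
    trimmed = series[len(series) - usable:]
    low, high = min(trimmed), max(trimmed)
    scale = (high - low) or 1.0  # avoid division by zero for constant series
    columns = [trimmed[i:i + period] for i in range(0, usable, period)]
    return [[(col[row] - low) / scale for col in columns] for row in range(period)]


if __name__ == "__main__":
    for row in series_to_image([0, 1, 2, 3, 4, 5, 6], period=3):
        print(row)
```

Under this layout, forecasting the next periods would correspond to reconstructing additional columns on the right-hand side of the image.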
Article 1569
Title@2025-06-06 (5): Similarity Matching Networks: Hebbian Learning and Convergence Over Multiple Time Scales
Title: Similarity Matching Networks: Hebbian Learning and Convergence Over Multiple Time Scales | Ähnlichkeit passender Netzwerke: Hebbisches Lernen und Konvergenz über mehrere Zeitskalan | 相似性匹配网络: Hebbian 学习和多重时间尺度的趋同 2506.06134v1 |
Authors (3): Veronica Centorrino, Francesco Bullo, Giovanni Russo
A recent breakthrough in biologically-plausible normative frameworks for dimensionality reduction is based upon the similarity matching cost function and the low-rank matrix approximation problem. Despite clear biological interpretation, successful application in several domains, and experimental validation, a formal complete convergence analysis remains elusive. Building on this framework, we consider and analyze a continuous-time neural network, the similarity matching network, for principal subspace projection. Derived from a min-max-min objective, this biologically-plausible network consists of three coupled dynamics evolving at different time scales: neural dynamics, lateral synaptic dynamics, and feedforward synaptic dynamics at the fast, intermediate, and slow time scales, respectively. The feedforward and lateral synaptic dynamics consist of Hebbian and anti-Hebbian learning rules, respectively. By leveraging a multilevel optimization framework, we prove convergence of the dynamics in the offline setting. Specifically, at the first level (fast time scale), we show strong convexity of the cost function and global exponential convergence of the corresponding gradient-flow dynamics. At the second level (intermediate time scale), we prove strong concavity of the cost function and exponential convergence of the corresponding gradient-flow dynamics within the space of positive definite matrices. At the third and final level (slow time scale), we study a non-convex and non-smooth cost function, provide explicit expressions for its global minima, and prove almost sure convergence of the corresponding gradient-flow dynamics to the global minima. These results rely on two empirically motivated conjectures that are supported by thorough numerical experiments. Finally, we validate the effectiveness of our approach via a numerical example.
Article 1570
Title@2025-06-06 (5): Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions
Title: Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions | Verbesserung der numerischen Stabilität des Normalisierten gegenseitigen Informations-Schätzers über hohe Abmessungen | 提高高方面标准化相互信息模拟器的数值稳定 2410.07642v2 |
Authors (2): Marko Tuononen, Ville Hautamäki
Mutual information provides a powerful, general-purpose metric for quantifying the amount of shared information between variables. Estimating normalized mutual information using a k-Nearest Neighbor (k-NN) based approach involves the calculation of the scaling-invariant k-NN radius. Calculation of the radius suffers from numerical overflow when the joint dimensionality of the data becomes high, typically in the range of several hundred dimensions. To address this issue, we propose a logarithmic transformation technique that improves the numerical stability of the radius calculation in high-dimensional spaces. By applying the proposed transformation during the calculation of the radius, numerical overflow is avoided, and precision is maintained. The proposed transformation is validated through both theoretical analysis and empirical evaluation, demonstrating its ability to stabilize the calculation without compromising precision, increasing bias, or adding significant computational overhead, while also helping to maintain estimator variance.
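The general mechanism can be shown with a small sketch: a quantity of the form radius**dim overflows double precision once dim reaches a few hundred, whereas its logarithm stays small. The exact formula of the scaling-invariant k-NN radius is not given in the abstract, so the functions below are a hypothetical illustration of the log-domain trick only.

```python
import math


def naive_power(radius, dim):
    """Direct computation; raises OverflowError for large dim."""
    return radius ** dim


def log_power(radius, dim):
    """Same quantity in the log domain: log(radius ** dim) = dim * log(radius)."""
    return dim * math.log(radius)


def ratio_of_powers(r_num, r_den, dim):
    """(r_num / r_den) ** dim computed without forming either huge power."""
    return math.exp(log_power(r_num, dim) - log_power(r_den, dim))


if __name__ == "__main__":
    try:
        naive_power(50.0, 400)
    except OverflowError as err:
        print("naive computation failed:", err)
    print("log-domain value:", log_power(50.0, 400))
    print("ratio:", ratio_of_powers(50.0, 49.0, 400))
```

Intermediate results are kept as logarithms and exponentiated only at the end, when the final value is known to be representable.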
Article 1571
Title@2025-06-06 (5): Gradient Similarity Surgery in Multi-Task Deep Learning
Title: Gradient Similarity Surgery in Multi-Task Deep Learning | Gradient Ähnlichkeit Chirurgie in Multi-Task Deep Learning | 多任务深层学习中的渐渐相似性外科手术 2506.06130v1 |
Authors (4): Thomas Borsani, Andrea Rosani, Giuseppe Nicosia, Giuseppe Di Fatta
The multi-task learning (MTL) paradigm aims to simultaneously learn multiple tasks within a single model, capturing higher-level, more general hidden patterns that are shared by the tasks. In deep learning, a significant challenge in the backpropagation training process is the design of advanced optimisers to improve the convergence speed and stability of the gradient descent learning rule. In particular, in multi-task deep learning (MTDL) the multitude of tasks may generate potentially conflicting gradients that would hinder the concurrent convergence of the diverse loss functions. This challenge arises when the gradients of the task objectives have either different magnitudes or opposite directions, causing one or a few to dominate or to interfere with each other, thus degrading the training process. Gradient surgery methods address the problem by explicitly dealing with conflicting gradients, adjusting the overall gradient trajectory. This work introduces a novel gradient surgery method, the Similarity-Aware Momentum Gradient Surgery (SAM-GS), which provides an effective and scalable approach based on a gradient magnitude similarity measure to guide the optimisation process. The SAM-GS surgery adopts gradient equalisation and modulation of the first-order momentum. A series of experimental tests have shown the effectiveness of SAM-GS on synthetic problems and MTL benchmarks. Gradient magnitude similarity plays a crucial role in regularising gradient aggregation in MTDL for the optimisation of the learning process.
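A rough sketch of magnitude-aware aggregation follows. It uses a definition of gradient magnitude similarity that is common in the gradient-surgery literature, 2·|g_i|·|g_j| / (|g_i|² + |g_j|²); whether SAM-GS uses exactly this form is an assumption. The threshold and the equalisation rule are hypothetical, and the momentum modulation mentioned in the abstract is omitted.

```python
import math


def norm(grad):
    """Euclidean norm of a gradient given as a flat list of floats."""
    return math.sqrt(sum(x * x for x in grad))


def magnitude_similarity(g_i, g_j):
    """Value in [0, 1]; equals 1 when both gradients have the same norm."""
    n_i, n_j = norm(g_i), norm(g_j)
    denom = n_i ** 2 + n_j ** 2
    return 2 * n_i * n_j / denom if denom > 0 else 1.0


def equalise(grads):
    """Rescale every task gradient to the mean norm, keeping its direction."""
    norms = [norm(g) for g in grads]
    target = sum(norms) / len(norms)
    return [
        [x * target / n for x in g] if n > 0 else list(g)
        for g, n in zip(grads, norms)
    ]


def aggregate(grads, threshold=0.9):
    """Sum task gradients, equalising first if any pair is too dissimilar."""
    sims = [
        magnitude_similarity(a, b)
        for i, a in enumerate(grads)
        for b in grads[i + 1:]
    ]
    if sims and min(sims) < threshold:  # threshold is an illustrative choice
        grads = equalise(grads)
    return [sum(column) for column in zip(*grads)]


if __name__ == "__main__":
    print(aggregate([[1.0, 0.0], [0.0, 9.0]]))
```

Without equalisation the second task would dominate the summed update; after rescaling, both tasks contribute with equal magnitude.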
Article 1572
Title@2025-06-06 (5): Flow-Attentional Graph Neural Networks
Title: Flow-Attentional Graph Neural Networks | Fluss-aufmerksamer Graph Neuronale Netzwerke | 流动动态图图神经网络 2506.06127v1 |
Authors (4): Pascal Plettenberg, Dominik Köhler, Bernhard Sick, Josephine M. Thomas
Graph Neural Networks (GNNs) have become essential for learning from graph-structured data. However, existing GNNs do not consider the conservation law inherent in graphs associated with a flow of physical resources, such as electrical current in power grids or traffic in transportation networks, which can lead to reduced model performance. To address this, we propose flow attention, which adapts existing graph attention mechanisms to satisfy Kirchhoff's first law. Furthermore, we discuss how this modification influences the expressivity and identify sets of non-isomorphic graphs that can be discriminated by flow attention but not by standard attention. Through extensive experiments on two flow graph datasets (electronic circuits and power grids), we demonstrate that flow attention enhances the performance of attention-based GNNs on both graph-level classification and regression tasks.
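One plausible reading of the conservation constraint is that attention weights are normalised over the outgoing edges of each source node, so that whatever leaves a node is split among its successors and sums to one, rather than over the incoming edges of each target node as in standard graph attention. The sketch below contrasts the two normalisations under that assumption; it is not the authors' implementation, and all names are hypothetical.

```python
import math
from collections import defaultdict


def softmax_by_group(edges, scores, group_of):
    """Softmax over edge scores, normalised within groups of edges."""
    groups = defaultdict(list)
    for index, edge in enumerate(edges):
        groups[group_of(edge)].append(index)
    weights = [0.0] * len(edges)
    for members in groups.values():
        peak = max(scores[i] for i in members)  # subtract max for stability
        exps = {i: math.exp(scores[i] - peak) for i in members}
        total = sum(exps.values())
        for i in members:
            weights[i] = exps[i] / total
    return weights


def standard_attention(edges, scores):
    """Weights of the edges entering each target node sum to one."""
    return softmax_by_group(edges, scores, lambda edge: edge[1])


def flow_attention(edges, scores):
    """Weights of the edges leaving each source node sum to one."""
    return softmax_by_group(edges, scores, lambda edge: edge[0])


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2)]  # (source, target) pairs
    scores = [0.0, 0.0, 0.0]
    print("standard:", standard_attention(edges, scores))
    print("flow:    ", flow_attention(edges, scores))
```

With uniform scores, node 0 splits its output equally between its two successors under the flow normalisation, while under the standard normalisation node 2 averages its two inputs.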
Article 1573
Title@2025-06-06 (5): Convergence of linear programming hierarchies for Gibbs states of spin systems
Title: Convergence of linear programming hierarchies for Gibbs states of spin systems | Konvergenz von linearen Programmierhierarchien für Gibbs-Zustände von Spinsystemen | Gibbbs旋转系统状态线性编程分级组合 2506.06125v1 |
Authors (2): Hamza Fawzi, Omar Fawzi
We consider the problem of computing expectation values of local functions under the Gibbs distribution of a spin system. In particular, we study two families of linear programming hierarchies for this problem. The first hierarchy imposes local spin flip equalities and has been considered in the bootstrap literature in high energy physics. For this hierarchy, we prove fast convergence under a spatial mixing (decay of correlations) condition. This condition is satisfied for example above the critical temperature for Ising models on a $d$-dimensional grid. The second hierarchy is based on a Markov chain having the Gibbs state as a fixed point and has been studied in the optimization literature and more recently in the bootstrap literature. For this hierarchy, we prove fast convergence provided the Markov chain mixes rapidly. Both hierarchies lead to an $\varepsilon$-approximation for local expectation values using a linear program of size quasi-polynomial in $n/\varepsilon$, where $n$ is the total number of sites, provided the interactions can be embedded in a $d$-dimensional grid with constant $d$. Compared to standard Monte Carlo methods, an advantage of this approach is that it always (i.e., for any system) outputs rigorous upper and lower bounds on the expectation value of interest, without needing an a priori analysis of the convergence speed.
Article 1574
Title@2025-06-06 (5): Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library
Title: Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library | Verstärkte Lernoptimierung für großformatiges Lernen: Eine effiziente und benutzerfreundliche Scaling-Bibliothek | 大规模学习的强化学习优化:高效和方便用户的扩大规模图书馆 2506.06122v1 |
Authors (41): Weixun Wang, Shaopan Xiong, Gengru Chen, Wei Gao, Sheng Guo, Yancheng He, Ju Huang, Jiaheng Liu, Zhendong Li, Xiaoyang Li, Zichen Liu, Haizhou Zhao, Dakai An, Lunxi Cao, Qiyang Cao, Wanxi Deng, Feilei Du, Yiliang Gu, Jiahe Li, Xiang Li, Mingjie Liu, Yijia Luo, Zihe Liu, Yadao Wang, Pei Wang, Tianyuan Wu, Yanan Wu, Yuheng Zhao, Shuaibing Zhao, Jin Yang, Siran Yang, Yingshui Tan, Huimin Yi, Yuchi Xu, Yujin Yuan, Xingyao Zhang, Lin Qu, Wenbo Su, Wei Wang, Jiamang Wang, Bo Zheng
We introduce ROLL, an efficient, scalable, and user-friendly library designed for Reinforcement Learning Optimization for Large-scale Learning. ROLL caters to three primary user groups: tech pioneers aiming for cost-effective, fault-tolerant large-scale training, developers requiring flexible control over training workflows, and researchers seeking agile experimentation. ROLL is built upon several key modules to serve these user groups effectively. First, a single-controller architecture combined with an abstraction of the parallel worker simplifies the development of the training pipeline. Second, the parallel strategy and data transfer modules enable efficient and scalable training. Third, the rollout scheduler offers fine-grained management of each sample’s lifecycle during the rollout stage. Fourth, the environment worker and reward worker support rapid and flexible experimentation with agentic RL algorithms and reward designs. Finally, AutoDeviceMapping allows users to assign resources to different models flexibly across various stages.
Article 1575
Title@2025-06-06 (5): Training on Plausible Counterfactuals Removes Spurious Correlations
Title: Training on Plausible Counterfactuals Removes Spurious Correlations | Training auf Plausible Counterfactals entfernt spurlose Korrelationen | 关于可视反事实消除污损的培训 2505.16583v5 |
Authors (4): Shpresim Sadiku, Kartikeya Chitranshi, Hiroshi Kera, Sebastian Pokutta
Plausible counterfactual explanations (p-CFEs) are perturbations that minimally modify inputs to change classifier decisions while remaining plausible under the data distribution. In this study, we demonstrate that classifiers can be trained on p-CFEs labeled with induced incorrect target classes to classify unperturbed inputs with the original labels. While previous studies have shown that such learning is possible with adversarial perturbations, we extend this paradigm to p-CFEs. Interestingly, our experiments reveal that learning from p-CFEs is even more effective: the resulting classifiers achieve not only high in-distribution accuracy but also exhibit significantly reduced bias with respect to spurious correlations.
Article 1576
Title@2025-06-06 (5): Learners’ Languages
Title: Learners’ Languages | Sprachen der Lernenden | 学生语言 2103.01189v3 |
Authors (1): David I. Spivak
In “Backprop as functor”, the authors show that the fundamental elements of deep learning – gradient descent and backpropagation – can be conceptualized as a strong monoidal functor Para(Euc)$\to$Learn from the category of parameterized Euclidean spaces to that of learners, a category developed explicitly to capture parameter update and backpropagation. It was soon realized that there is an isomorphism Learn$\cong$Para(Slens), where Slens is the symmetric monoidal category of simple lenses as used in functional programming. In this note, we observe that Slens is a full subcategory of Poly, the category of polynomial functors in one variable, via the functor $A\mapsto Ay^A$. Using the fact that (Poly,$\otimes$) is monoidal closed, we show that a map $A\to B$ in Para(Slens) has a natural interpretation in terms of dynamical systems (more precisely, generalized Moore machines) whose interface is the internal-hom type $[Ay^A,By^B]$. Finally, we review the fact that the category p-Coalg of dynamical systems on any $p \in$ Poly forms a topos, and consider the logical propositions that can be stated in its internal language. We give gradient descent as an example, and we conclude by discussing some directions for future work.
Article 1577
Title@2025-06-06 (5): Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness
Title: Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness | Auf dem Weg zum Lebenszyklus Unlearning Commitment Management: Messung der Stichprobenebene Unlearning Completeness | 实现生命周期不学习承诺管理:衡量抽样一级不学习 2506.06112v1 |
Authors (5): Cheng-Long Wang, Qi Li, Zihang Xiang, Yinzhi Cao, Di Wang
Growing concerns over data privacy and security highlight the importance of machine unlearning, that is, removing specific data influences from trained models without full retraining. Techniques like Membership Inference Attacks (MIAs) are widely used to externally assess successful unlearning. However, existing methods face two key limitations: (1) maximizing MIA effectiveness (e.g., via online attacks) requires prohibitive computational resources, often exceeding retraining costs; (2) MIAs, designed for binary inclusion tests, struggle to capture granular changes in approximate unlearning. To address these challenges, we propose the Interpolated Approximate Measurement (IAM), a framework natively designed for unlearning inference. IAM quantifies sample-level unlearning completeness by interpolating the model’s generalization-fitting behavior gap on queried samples. IAM achieves strong performance in binary inclusion tests for exact unlearning and high correlation for approximate unlearning, and it scales to LLMs using just one pre-trained shadow model. We theoretically analyze how IAM’s scoring mechanism maintains performance efficiently. We then apply IAM to recent approximate unlearning algorithms, revealing general risks of both over-unlearning and under-unlearning, underscoring the need for stronger safeguards in approximate unlearning systems. The code is available at https://github.com/Happy2Git/Unlearning_Inference_IAM.
Article 1578
Title@2025-06-06 (5): Synthetic Tabular Data: Methods, Attacks and Defenses
Title: Synthetic Tabular Data: Methods, Attacks and Defenses | Synthetische Tabellendaten: Methoden, Angriffe und Verteidigungen | 合成图表数据:方法、攻击和防御 2506.06108v1 |
Authors (4): Graham Cormode, Samuel Maddock, Enayat Ullah, Shripad Gade
Synthetic data is often positioned as a solution to replace sensitive fixed-size datasets with a source of unlimited matching data, freed from privacy concerns. There has been much progress in synthetic data generation over the last decade, leveraging corresponding advances in machine learning and data analytics. In this survey, we cover the key developments and the main concepts in tabular synthetic data generation, including paradigms based on probabilistic graphical models and on deep learning. We provide background and motivation, before giving a technical deep-dive into the methodologies. We also address the limitations of synthetic data, by studying attacks that seek to retrieve information about the original sensitive data. Finally, we present extensions and open problems in this area.
Article 1579
Title@2025-06-06 (5): The Canary’s Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text
Title: The Canary’s Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text | Das Echo der Kanaren: Auditing Privacy Risks of LLM-generated Synthetic Text | 加那利的回声:LLM发明的合成文字的 审计隐私风险 2502.14921v2 |
Authors (5): Matthieu Meeus, Lukas Wutschitz, Santiago Zanella-Béguelin, Shruti Tople, Reza Shokri
How much information about training samples can be leaked through synthetic data generated by Large Language Models (LLMs)? Overlooking the subtleties of information flow in synthetic data generation pipelines can lead to a false sense of privacy. In this paper, we assume an adversary has access to some synthetic data generated by an LLM. We design membership inference attacks (MIAs) that target the training data used to fine-tune the LLM that is then used to synthesize data. The significant performance of our MIA shows that synthetic data leak information about the training data. Further, we find that canaries crafted for model-based MIAs are sub-optimal for privacy auditing when only synthetic data is released. Such out-of-distribution canaries have limited influence on the model’s output when prompted to generate useful, in-distribution synthetic data, which drastically reduces their effectiveness. To tackle this problem, we leverage the mechanics of auto-regressive models to design canaries with an in-distribution prefix and a high-perplexity suffix that leave detectable traces in synthetic data. This enhances the power of data-based MIAs and provides a better assessment of the privacy risks of releasing synthetic data generated by LLMs.
Article 1580
Title@2025-06-06 (5): Label-Context-Dependent Internal Language Model Estimation for CTC
Title: Label-Context-Dependent Internal Language Model Estimation for CTC | Label-Context-Dependent Interne Sprachmodellschätzung für CTC | Label-Ctext-依赖性内部语言模型 反恐委员会内部语言模型估算 2506.06096v1 |
Authors (4): Zijian Yang, Minh-Nghia Phan, Ralf Schlüter, Hermann Ney
Although connectionist temporal classification (CTC) has the label context independence assumption, it can still implicitly learn a context-dependent internal language model (ILM) due to modern powerful encoders. In this work, we investigate the implicit context dependency modeled in the ILM of CTC. To this end, we propose novel context-dependent ILM estimation methods for CTC based on knowledge distillation (KD) with theoretical justifications. Furthermore, we introduce two regularization methods for KD. We conduct experiments on Librispeech and TED-LIUM Release 2 datasets for in-domain and cross-domain evaluation, respectively. Experimental results show that context-dependent ILMs outperform the context-independent priors in cross-domain evaluation, indicating that CTC learns a context-dependent ILM. The proposed label-level KD with smoothing method surpasses other ILM estimation approaches, with more than 13% relative improvement in word error rate compared to shallow fusion.
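For context, the way an estimated ILM typically enters decoding can be sketched as a log-linear score: the external LM score is added (shallow fusion) and the ILM estimate is subtracted. The scales and the example log-probabilities below are made-up values for illustration, and the function names are hypothetical; the estimation of the ILM itself via knowledge distillation is not shown.

```python
def fused_score(log_p_ctc, log_p_lm, log_p_ilm, lm_scale=0.6, ilm_scale=0.4):
    """Log-linear combination: acoustic model + external LM - internal LM."""
    return log_p_ctc + lm_scale * log_p_lm - ilm_scale * log_p_ilm


def best_hypothesis(hypotheses, lm_scale=0.6, ilm_scale=0.4):
    """Pick the highest-scoring hypothesis.

    Each hypothesis is (text, log_p_ctc, log_p_lm, log_p_ilm).
    Setting ilm_scale=0 reduces the score to plain shallow fusion.
    """
    return max(
        hypotheses,
        key=lambda h: fused_score(h[1], h[2], h[3], lm_scale, ilm_scale),
    )[0]


if __name__ == "__main__":
    # Made-up scores: "a" is favoured by the source-domain ILM, "b" by the external LM.
    hyps = [("a", -1.0, -3.0, -0.5), ("b", -1.5, -2.5, -3.0)]
    print("shallow fusion:  ", best_hypothesis(hyps, ilm_scale=0.0))
    print("with ILM removed:", best_hypothesis(hyps, ilm_scale=0.4))
```

In this toy example, subtracting the ILM estimate changes the selected hypothesis, which is the kind of effect that matters in cross-domain evaluation.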
Article 1581
Title@2025-06-06 (5): Flexible Operator Fusion for Fast Sparse Transformer with Diverse Masking on GPU
Title: Flexible Operator Fusion for Fast Sparse Transformer with Diverse Masking on GPU | Flexible Operator Fusion für schnellen Sparse Transformer mit vielfältigem maskieren auf GPU | 具有 GPU 上多种掩码的快速分散变异器灵活操作器融合 2506.06095v1 |
Authors (9): Wenhao Dai, Haodong Deng, Mengfei Rong, Xinyu Yang, Hongyu Liu, Fangxin Liu, Hailong Yang, Weifeng Liu, Qingxiao Sun
Large language models are popular around the world due to their powerful understanding capabilities. Because the Transformer is the core component of LLMs, accelerating it through parallelization has become a hot research topic. Mask layers introduce sparsity into the Transformer to reduce calculations. However, previous work rarely focuses on the performance optimization of sparse Transformers. Moreover, rule-based mechanisms ignore the fusion opportunities of mixed-type operators and fail to adapt to various sequence lengths. To address the above problems, we propose STOF, a framework that incorporates optimizations for Sparse Transformer via flexible masking and operator fusion on GPU. We first unify the storage format and kernel implementation for the multi-head attention. Then, we map fusion schemes to compilation templates and determine the optimal parameter setting through a two-stage search engine. The experimental results show that compared to the state-of-the-art work, STOF achieves maximum speedups of 1.7x in MHA computation and 1.5x in end-to-end inference.
Article 1582
Title@2025-06-06 (5): On-board Mission Replanning for Adaptive Cooperative Multi-Robot Systems
Title: On-board Mission Replanning for Adaptive Cooperative Multi-Robot Systems | On-Board-Mission-Replanning für adaptive kooperative Multi-Roboter-Systeme | 特派团适应性合作多机器人系统重新规划 2506.06094v1 |
Authors (6): Elim Kwan, Rehman Qureshi, Liam Fletcher, Colin Laganier, Victoria Nockles, Richard Walters
Cooperative autonomous robotic systems have significant potential for executing complex multi-task missions across space, air, ground, and maritime domains. But they commonly operate in remote, dynamic and hazardous environments, requiring rapid in-mission adaptation without reliance on fragile or slow communication links to centralised compute. Fast, on-board replanning algorithms are therefore needed to enhance resilience. Reinforcement Learning shows strong promise for efficiently solving mission planning tasks when formulated as Travelling Salesperson Problems (TSPs), but existing methods: 1) are unsuitable for replanning, where agents do not start at a single location; 2) do not allow cooperation between agents; 3) are unable to model tasks with variable durations; or 4) lack practical considerations for on-board deployment. Here we define the Cooperative Mission Replanning Problem as a novel variant of multiple TSP with adaptations to overcome these issues, and develop a new encoder/decoder-based model using Graph Attention Networks and Attention Models to solve it effectively and efficiently. Using a simple example of cooperative drones, we show our replanner consistently (90% of the time) maintains performance within 10% of the state-of-the-art LKH3 heuristic solver, whilst running 85-370 times faster on a Raspberry Pi. This work paves the way for increased resilience in autonomous multi-agent systems.
Article 1583
Title@2025-06-06 (5): Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture
Title: Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture | Paradigmen der AI-Bewertung: Kartierungsziele, Methoden und Kultur | AI 评价:绘图目标、方法和文化 2502.15620v2 |
Authors (4): John Burden, Marko Tešić, Lorenzo Pacchiardi, José Hernández-Orallo
Research in AI evaluation has grown increasingly complex and multidisciplinary, attracting researchers with diverse backgrounds and objectives. As a result, divergent evaluation paradigms have emerged, often developing in isolation, adopting conflicting terminologies, and overlooking each other’s contributions. This fragmentation has led to insular research trajectories and communication barriers both among different paradigms and with the general public, contributing to unmet expectations for deployed AI systems. To help bridge this insularity, in this paper we survey recent work in the AI evaluation landscape and identify six main paradigms. We characterise major recent contributions within each paradigm across key dimensions related to their goals, methodologies and research cultures. By clarifying the unique combination of questions and approaches associated with each paradigm, we aim to increase awareness of the breadth of current evaluation approaches and foster cross-pollination between different paradigms. We also identify potential gaps in the field to inspire future research directions.
Article 1584
Title@2025-06-06 (5): HeavyWater and SimplexWater: Watermarking Low-Entropy Text Distributions
Title: HeavyWater and SimplexWater: Watermarking Low-Entropy Text Distributions | HeavyWater und SimplexWater: Wassermarkierung mit geringer Entropie Textverteilung | 重水和简易水:低英基文本分发的标注水 2506.06409v1 |
Authors (8): Dor Tsur, Carol Xuan Long, Claudio Mayrink Verdun, Hsiang Hsu, Chen-Fu Chen, Haim Permuter, Sajani Vithana, Flavio P. Calmon
Large language model (LLM) watermarks enable authentication of text provenance, curb misuse of machine-generated text, and promote trust in AI systems. Current watermarks operate by changing the next-token predictions output by an LLM. The updated (i.e., watermarked) predictions depend on random side information produced, for example, by hashing previously generated tokens. LLM watermarking is particularly challenging in low-entropy generation tasks - such as coding - where next-token predictions are near-deterministic. In this paper, we propose an optimization framework for watermark design. Our goal is to understand how to most effectively use random side information in order to maximize the likelihood of watermark detection and minimize the distortion of generated text. Our analysis informs the design of two new watermarks: HeavyWater and SimplexWater. Both watermarks are tunable, gracefully trading off detection accuracy against text distortion. They can also be applied to any LLM and are agnostic to side information generation. We examine the performance of HeavyWater and SimplexWater through several benchmarks, demonstrating that they can achieve high watermark detection accuracy with minimal compromise of text generation quality, particularly in the low-entropy regime. Our theoretical analysis also reveals surprising new connections between LLM watermarking and coding theory. The code implementation can be found at https://github.com/DorTsur/HeavyWater_SimplexWater
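The generic mechanism described above (side information derived by hashing previous tokens, then used to tilt next-token predictions) can be sketched as follows. This is a simple bit-based tilt chosen for illustration, not HeavyWater or SimplexWater; the names `side_info`, `watermark_probs`, `detection_score`, and the parameters `key`, `delta`, and `window` are all assumptions.

```python
import hashlib

def side_info(prev_tokens, key, vocab_size):
    """One pseudo-random bit per vocabulary entry, seeded by a hash of the context."""
    ctx = ",".join(map(str, prev_tokens))
    bits = []
    for tok in range(vocab_size):
        digest = hashlib.sha256(f"{key}|{ctx}|{tok}".encode()).digest()
        bits.append(digest[0] & 1)
    return bits

def watermark_probs(probs, bits, delta):
    """Multiply the probability of bit-1 tokens by (1 + delta), then renormalize.

    This multiplicative tilt is an illustrative assumption, not the paper's scheme.
    """
    tilted = [p * (1.0 + delta) if b else p for p, b in zip(probs, bits)]
    z = sum(tilted)
    return [t / z for t in tilted]

def detection_score(tokens, key, vocab_size, window=1):
    """Fraction of tokens whose recomputed side-information bit is 1."""
    hits, n = 0, 0
    for i in range(window, len(tokens)):
        bits = side_info(tokens[i - window:i], key, vocab_size)
        hits += bits[tokens[i]]
        n += 1
    return hits / n if n else 0.0
```

When one token already carries almost all of the probability mass, the tilt barely changes the distribution, which is one way to see why the low-entropy regime is hard for watermark detection.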
Article 1585
Title@2025-06-06 (5): LinGuinE: Longitudinal Guidance Estimation for Volumetric Lung Tumour Segmentation
Title: LinGuinE: Longitudinal Guidance Estimation for Volumetric Lung Tumour Segmentation | LinGuinE: Longitudinal Guidance Schätzung für Volumetrische Lungentumorsegmentierung | LinGuinE: 活性肺炎肿瘤分离的纵向指导估计 2506.06092v1 |
Authors (5): Nadine Garibli, Mayank Patwari, Bence Csiba, Yi Wei, Kostas Sidiropoulos
Segmentation of lung gross tumour volumes is an important first step in radiotherapy and surgical intervention, and is starting to play a role in assessing chemotherapy response. Response to a drug is measured by tracking tumour volumes across a series of CT scans acquired over a time period, i.e., a longitudinal study. However, there currently exist few solutions for automated or semi-automated longitudinal tumour segmentation. This paper introduces LinGuinE, an automated method to segment a longitudinal series of lung tumours. A radiologist must provide an initial input, indicating the location of the tumour in a CT scan at an arbitrary time point. LinGuinE samples points inside this tumour and propagates them to another time point using rigid registration. A click validity classifier selects points which still fall within the tumour; these are used to automatically create a segmentation in the new time point. We test LinGuinE on a dataset acquired from a phase 3 clinical trial for lung tumours and the publicly available 4-D lung CBCT dataset. We find that LinGuinE improves the Dice score on both test sets by over 20% (p < 0.05) across 63 longitudinal studies. We show that any time point can be used as a starting point, conduct ablation experiments, and find that our LinGuinE setup yields the best results on both test datasets.
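For readers unfamiliar with the Dice metric reported above, a minimal sketch of the standard Dice coefficient on flattened binary masks follows. The function name `dice` and the convention of returning 1.0 for two empty masks are assumptions for illustration; this is not code from the paper.

```python
def dice(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly; this convention is assumed here.
    return 1.0 if total == 0 else 2.0 * inter / total
```

For example, a prediction covering two voxels, one of which overlaps a one-voxel ground truth, scores 2 * 1 / (2 + 1).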
Article 1586
Title@2025-06-06 (5): Multidimensional Adaptive Coefficient for Inference Trajectory Optimization in Flow and Diffusion
Title: Multidimensional Adaptive Coefficient for Inference Trajectory Optimization in Flow and Diffusion | Multidimensionaler Adaptiver Koeffizient für die Inferenz-Trajektorie-Optimierung in Strömung und Diffusion | 流动与传播过程中的多维可调适系数 2404.14161v3 |
Authors (4): Dohoon Lee, Jaehyun Park, Hyunwoo J. Kim, Kyogu Lee
Flow and diffusion models have demonstrated strong performance and training stability across various tasks but lack two critical properties of simulation-based methods: freedom of dimensionality and adaptability to different inference trajectories. To address this limitation, we propose the Multidimensional Adaptive Coefficient (MAC), a plug-in module for flow and diffusion models that extends conventional unidimensional coefficients to multidimensional ones and enables inference trajectory-wise adaptation. MAC is trained via simulation-based feedback through adversarial refinement. Empirical results across diverse frameworks and datasets demonstrate that MAC enhances generative quality with high training efficiency. Consequently, our work offers a new perspective on inference trajectory optimality, encouraging future research to move beyond vector field design and to leverage training-efficient, simulation-based optimization.
Article 1587
Title@2025-06-06 (5): Multilevel neural simulation-based inference
Title: Multilevel neural simulation-based inference | Mehrstufige neuronale simulationsbasierte Inferenz | 多级神经模拟模拟推断 2506.06087v1 |
Authors (4): Yuga Hikida, Ayush Bharti, Niall Jeffrey, François-Xavier Briol
Neural simulation-based inference (SBI) is a popular set of methods for Bayesian inference when models are only available in the form of a simulator. These methods are widely used in the sciences and engineering, where writing down a likelihood can be significantly more challenging than constructing a simulator. However, the performance of neural SBI can suffer when simulators are computationally expensive, thereby limiting the number of simulations that can be performed. In this paper, we propose a novel approach to neural SBI which leverages multilevel Monte Carlo techniques for settings where several simulators of varying cost and fidelity are available. We demonstrate through both theoretical analysis and extensive experiments that our method can significantly enhance the accuracy of SBI methods given a fixed computational budget.
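The multilevel Monte Carlo idea referenced above can be illustrated with the classic telescoping estimator, which spends many samples on a cheap simulator and only a few on corrections from more expensive ones. This is a generic sketch of MLMC for an expectation, not the paper's training objective; `mlmc_estimate`, `simulators`, and `n_per_level` are hypothetical names.

```python
import random

def mlmc_estimate(simulators, n_per_level, rng):
    """Estimate E[f_L] as E[f_0] + sum over levels of E[f_l - f_{l-1}].

    simulators[l](u) runs the level-l simulator on shared randomness u, with
    level 0 the cheapest. Sharing u between the fine and coarse calls couples
    them, which keeps the variance of each correction term small.
    """
    total = 0.0
    for level, n in enumerate(n_per_level):
        acc = 0.0
        for _ in range(n):
            u = rng.random()
            fine = simulators[level](u)
            coarse = simulators[level - 1](u) if level > 0 else 0.0
            acc += fine - coarse
        total += acc / n
    return total
```

A typical allocation would pass a decreasing `n_per_level`, for instance `[1000, 100, 10]`, so that most of the budget goes to the cheapest simulator.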
Article 1588
Title@2025-06-06 (5): A Novel, Human-in-the-Loop Computational Grounded Theory Framework for Big Social Data
Title: A Novel, Human-in-the-Loop Computational Grounded Theory Framework for Big Social Data | Ein neuartiges, Mensch-in-the-Loop Computational Grounded Theory Framework für große soziale Daten | 大型社会数据新颖的、 “ 网上人 “ 计算基础理论大社会数据新理论框架 2506.06083v1 |
Authors (4): Lama Alqazlan, Zheng Fang, Michael Castelle, Rob Procter
The availability of big data has significantly influenced the possibilities and methodological choices for conducting large-scale behavioural and social science research. In the context of qualitative data analysis, a major challenge is that conventional methods require intensive manual labour and are often impractical to apply to large datasets. One effective way to address this issue is by integrating emerging computational methods to overcome scalability limitations. However, a critical concern for researchers is the trustworthiness of results when Machine Learning (ML) and Natural Language Processing (NLP) tools are used to analyse such data. We argue that confidence in the credibility and robustness of results depends on adopting a ‘human-in-the-loop’ methodology that is able to provide researchers with control over the analytical process, while retaining the benefits of using ML and NLP. With this in mind, we propose a novel methodological framework for Computational Grounded Theory (CGT) that supports the analysis of large qualitative datasets, while maintaining the rigour of established Grounded Theory (GT) methodologies. To illustrate the framework’s value, we present the results of testing it on a dataset collected from Reddit in a study aimed at understanding tutors’ experiences in the gig economy.
Article 1589
Title@2025-06-06 (5): Training Deep Learning Models with Norm-Constrained LMOs
Title: Training Deep Learning Models with Norm-Constrained LMOs | Ausbildung von Deep Learning-Modellen mit normgebundenen ABS | 与受诺姆约束的LMO培训的LMO培训深学习模式 2502.07529v2 |
Authors (6): Thomas Pethick, Wanyun Xie, Kimon Antonakopoulos, Zhenyu Zhu, Antonio Silveti-Falls, Volkan Cevher
In this work, we study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball. We propose a new stochastic family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems. The resulting update rule unifies several existing optimization methods under a single framework. Furthermore, we propose an explicit choice of norm for deep architectures, which, as a side benefit, leads to the transferability of hyperparameters across model sizes. Experimentally, we demonstrate significant speedups on nanoGPT training using our algorithm, Scion, without any reliance on Adam. The proposed method is memory-efficient, requiring only one set of model weights and one set of gradients, which can be stored in half-precision. The code is available at https://github.com/LIONS-EPFL/scion .
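As a rough illustration of a linear minimization oracle over a norm-ball, the sketch below gives the closed-form LMO for the ℓ∞ and ℓ2 balls and one unconstrained update step that moves along the LMO of an averaged gradient. It is a minimal sketch under stated assumptions, not the Scion implementation: the norm choices, the momentum form, and the names `lmo_linf`, `lmo_l2`, `lmo_step`, `lr`, and `alpha` are illustrative.

```python
import math

def lmo_linf(g, radius):
    """argmin of <g, d> over ||d||_inf <= radius, which is -radius * sign(g)."""
    return [-radius * ((x > 0) - (x < 0)) for x in g]

def lmo_l2(g, radius):
    """argmin of <g, d> over ||d||_2 <= radius, which is -radius * g / ||g||_2."""
    norm = math.sqrt(sum(x * x for x in g))
    return [0.0] * len(g) if norm == 0 else [-radius * x / norm for x in g]

def lmo_step(x, grad, momentum, lmo, radius, lr, alpha):
    """One unconstrained step: average the gradient, then move along the LMO output."""
    momentum = [(1 - alpha) * m + alpha * g for m, g in zip(momentum, grad)]
    direction = lmo(momentum, radius)
    x = [xi + lr * di for xi, di in zip(x, direction)]
    return x, momentum
```

With the ℓ∞ ball the step reduces to a sign-based update, and with the ℓ2 ball to a normalized gradient step, which hints at how one update rule can cover several existing methods depending on the norm.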
Article 1590
Title@2025-06-06 (5): Proximal Policy Distillation
Title: Proximal Policy Distillation | Proximale Politikdestillation | Proximal 政策蒸馏 2407.15134v2 |
Authors (1): Giacomo Spigler
We introduce Proximal Policy Distillation (PPD), a novel policy distillation method that integrates student-driven distillation and Proximal Policy Optimization (PPO) to increase sample efficiency and to leverage the additional rewards that the student policy collects during distillation. To assess the efficacy of our method, we compare PPD with two common alternatives, student-distill and teacher-distill, over a wide range of reinforcement learning environments that include discrete actions and continuous control (ATARI, Mujoco, and Procgen). For each environment and method, we perform distillation to a set of target student neural networks that are smaller, identical (self-distillation), or larger than the teacher network. Our findings indicate that PPD improves sample efficiency and produces better student policies compared to typical policy distillation approaches. Moreover, PPD demonstrates greater robustness than alternative methods when distilling policies from imperfect demonstrations. The code for the paper is released as part of a new Python library built on top of stable-baselines3 to facilitate policy distillation: 'sb3-distill'.
Article 1591
Title@2025-06-06 (5): System-Aware Unlearning Algorithms: Use Lesser, Forget Faster
Title: System-Aware Unlearning Algorithms: Use Lesser, Forget Faster | System-Aware Unlearning Algorithmen: Weniger verwenden, schneller vergessen | 系统提醒 取消学习算法 : 使用更少, 忘记更快 2506.06073v1 |
Authors (3): Linda Lu, Ayush Sekhari, Karthik Sridharan
Machine unlearning addresses the problem of updating a machine learning model/system trained on a dataset $S$ so that the influence of a set of deletion requests $U \subseteq S$ on the unlearned model is minimized. The gold standard definition of unlearning demands that the updated model, after deletion, be nearly identical to the model obtained by retraining. This definition is designed for a worst-case attacker (one who can recover not only the unlearned model but also the remaining data samples, i.e., $S \setminus U$). Such a stringent definition has made developing efficient unlearning algorithms challenging. However, such strong attackers are also unrealistic. In this work, we propose a new definition, system-aware unlearning, which aims to provide unlearning guarantees against an attacker that can at best only gain access to the data stored in the system for learning/unlearning requests and not all of $S\setminus U$. With this new definition, we use the simple intuition that if a system can store less to make its learning/unlearning updates, it can be more secure and update more efficiently against a system-aware attacker. Towards that end, we present an exact system-aware unlearning algorithm for linear classification using a selective sampling-based approach, and we generalize the method for classification with general function classes. We theoretically analyze the tradeoffs between deletion capacity, accuracy, memory, and computation time.
Article 1592
Title@2025-06-06 (5): ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis
Title: ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis | ProofAug: Effiziente Neuraltheorem-Probung mittels feinkörniger Proof-Strukturanalyse | 校对:通过精细的校对结构分析证明有效的神经理论 2501.18310v2 |
Authors (4): Haoxiong Liu, Jiacheng Sun, Zhenguo Li, Andrew C Yao
The synergy between deep learning models and traditional automation tools, such as built-in tactics of the proof assistant and off-the-shelf automated theorem provers, plays a crucial role in developing robust and efficient neural theorem provers (NTPs). However, for proof synthesis with LLMs, previous work applies automation tools either only when explicitly invoked by the model or at a single granularity level, failing to fully exploit their power. To solve this issue, we propose ProofAug, a procedure that equips LLMs with automation methods at various granularities through fine-grained structure analysis of model-generated proof proposals. ProofAug also serves as a versatile plug-and-play module that seamlessly integrates with any tree-search algorithm, enabling our construction of an efficient recursive proving (ERP) module to further enhance performance. The superiority of our method is validated on the miniF2F benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant. Notably, by additionally employing a mixed prompting strategy, we achieve a cumulative pass rate of 66.0% after curation of the dataset (61.9% for the original version) with 2100 queries to the model per problem (in contrast, the previous SOTA in Isabelle, Subgoal-XL, achieves only 56.1% using 16384 queries per problem). We also implement a Lean 4 version of ProofAug that can improve the pass@1 performance of Kimina-Prover-Preview-Distill-1.5B from 44.3% to 50.4% on miniF2F-test. Our code is available at https://github.com/haoxiongliu/ProofAug.
Article 1593
Title@2025-06-06 (5): Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models
Title: Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models | Bewertung von Intersektionalen Bias in Darstellungen von vortrainierten Bilderkennungsmodellen | 评估培训前图像识别模型代表性的跨部门偏见 2506.03664v2 |
Authors (2): Valerie Krug, Sebastian Stober
Deep Learning models have achieved remarkable success. Training them is often accelerated by building on top of pre-trained models, which poses the risk of perpetuating encoded biases. Here, we investigate biases in the representations of commonly used ImageNet classifiers for facial images while considering intersections of the sensitive variables age, race, and gender. To assess the biases, we use linear classifier probes and visualize activations as topographic maps. We find that representations in ImageNet classifiers particularly allow differentiation between ages. To a lesser degree, the models also appear to associate certain ethnicities and to distinguish genders in middle-aged groups.
Article 1594
Title@2025-06-06 (5): The Influence of the Memory Capacity of Neural DDEs on the Universal Approximation Property
Title: The Influence of the Memory Capacity of Neural DDEs on the Universal Approximation Property | Der Einfluss der Speicherkapazität von neuralen DDEs auf die universelle Annäherungseigenschaft | 神经DDE的记忆力对普遍接近财产的影响 2505.07244v2 |
Authors (2): Christian Kuehn, Sara-Viola Kuntz
Neural Ordinary Differential Equations (Neural ODEs), which are the continuous-time analog of Residual Neural Networks (ResNets), have gained significant attention in recent years. Similarly, Neural Delay Differential Equations (Neural DDEs) can be interpreted as an infinite depth limit of Densely Connected Residual Neural Networks (DenseResNets). In contrast to traditional ResNet architectures, DenseResNets are feed-forward networks that allow for shortcut connections across all layers. These additional connections introduce memory in the network architecture, as is typical in many modern architectures. In this work, we explore how the memory capacity in neural DDEs influences the universal approximation property. The key parameter for studying the memory capacity is the product $K \tau$ of the Lipschitz constant and the delay of the DDE. In the case of non-augmented architectures, where the network width is not larger than the input and output dimensions, neural ODEs and classical feed-forward neural networks cannot have the universal approximation property. We show that if the memory capacity $K\tau$ is sufficiently small, the dynamics of the neural DDE can be approximated by a neural ODE. Consequently, non-augmented neural DDEs with a small memory capacity also lack the universal approximation property. In contrast, if the memory capacity $K\tau$ is sufficiently large, we can establish the universal approximation property of neural DDEs for continuous functions. If the neural DDE architecture is augmented, we can expand the parameter regions in which universal approximation is possible. Overall, our results show that the infinite-dimensional phase space of DDEs with positive delay $\tau>0$ is not by itself sufficient to guarantee a direct jump to universal approximation; universal approximation holds only once the memory capacity $K\tau$ exceeds a certain threshold.
Article 1595
Title@2025-06-06 (5): Zero-Shot Detection of LLM-Generated Code via Approximated Task Conditioning
Title: Zero-Shot Detection of LLM-Generated Code via Approximated Task Conditioning | Zero-Shot-Erkennung des LLM-generierten Codes über eine angenäherte Aufgabenkonditionierung | 通过近似任务设置条件对LLM生成的代码进行零热检测 2506.06069v1 |
Authors (4): Maor Ashkenazi, Ofir Brenner, Tal Furman Shohet, Eran Treister
Detecting Large Language Model (LLM)-generated code is a growing challenge with implications for security, intellectual property, and academic integrity. We investigate the role of conditional probability distributions in improving zero-shot LLM-generated code detection, when considering both the code and the corresponding task prompt that generated it. Our key insight is that when evaluating the probability distribution of code tokens using an LLM, there is little difference between LLM-generated and human-written code. However, conditioning on the task reveals notable differences. This contrasts with natural language text, where differences exist even in the unconditional distributions. Leveraging this, we propose a novel zero-shot detection approach that approximates the original task used to generate a given code snippet and then evaluates token-level entropy under the approximated task conditioning (ATC). We further provide a mathematical intuition, contextualizing our method relative to previous approaches. ATC requires neither access to the generator LLM nor the original task prompts, making it practical for real-world applications. To the best of our knowledge, it achieves state-of-the-art results across benchmarks and generalizes across programming languages, including Python, CPP, and Java. Our findings highlight the importance of task-level conditioning for LLM-generated code detection. The supplementary materials and code are available at https://github.com/maorash/ATC, including the dataset gathering implementation, to foster further research in this area.
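The token-level entropy statistic mentioned above can be sketched in a few lines. In this sketch the per-token logits are taken as given; in practice they would come from a scoring LLM conditioned on an approximated task description, which is not modeled here. The names `token_entropy` and `mean_entropy` are hypothetical, and how the score is thresholded is left open.

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over one token's logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def mean_entropy(per_token_logits):
    """Average token entropy over a code snippet (one logit vector per token)."""
    return sum(token_entropy(l) for l in per_token_logits) / len(per_token_logits)
```

A uniform distribution over two tokens gives the maximum entropy log 2, while a sharply peaked distribution gives an entropy near zero.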
Article 1596
Title@2025-06-06 (5): Feedforward Few-shot Species Range Estimation
Title: Feedforward Few-shot Species Range Estimation | Einspeisung Wenig abgespuckte Arten Range-Schätzung | 向前进的微粒物种范围估计 2502.14977v2 |
Authors (9): Christian Lange, Max Hamilton, Elijah Cole, Alexander Shepard, Samuel Heinrich, Angela Zhu, Subhransu Maji, Grant Van Horn, Oisin Mac Aodha
Knowing where a particular species can or cannot be found on Earth is crucial for ecological research and conservation efforts. By mapping the spatial ranges of all species, we would obtain deeper insights into how global biodiversity is affected by climate change and habitat loss. However, accurate range estimates are only available for a relatively small proportion of all known species. For the majority of the remaining species, we typically only have a small number of records denoting the spatial locations where they have previously been observed. We outline a new approach for few-shot species range estimation to address the challenge of accurately estimating the range of a species from limited data. During inference, our model takes a set of spatial locations as input, along with optional metadata such as text or an image, and outputs a species encoding that can be used to predict the range of a previously unseen species in a feedforward manner. We evaluate our approach on two challenging benchmarks, where we obtain state-of-the-art range estimation performance, in a fraction of the compute time, compared to recent alternative approaches.
Article 1597
Title@2025-06-06 (5): HIGHT: Hierarchical Graph Tokenization for Molecule-Language Alignment
Title: HIGHT: Hierarchical Graph Tokenization for Molecule-Language Alignment | HIGHT: Hierarchische Graphen-Tokenisierung für die Ausrichtung von Molekülen | HIG: 分子-语言对齐的等级图形方位化 2406.14021v2 |
Authors (5): Yongqiang Chen, Quanming Yao, Juzheng Zhang, James Cheng, Yatao Bian
Recently, there has been a surge of interest in extending the success of large language models (LLMs) from texts to molecules. Most existing approaches adopt a graph neural network to represent a molecule as a series of node tokens for molecule-language alignment, which, however, overlooks the inherent hierarchical structures in molecules. Notably, higher-order molecular structures contain rich semantics of functional groups, which encode crucial biochemical functionalities of the molecules. We show that neglecting the hierarchical information in tokenization will lead to subpar molecule-language alignment and severe hallucination. To address this limitation, we propose HIerarchical GrapH Tokenization (HIGHT). HIGHT employs a hierarchical graph tokenizer that encodes the hierarchy of atom, motif, and molecular levels of informative tokens to improve the molecular perception of LLMs. HIGHT also adopts an augmented instruction tuning dataset, enriched with the hierarchical graph information, to further enhance the molecule-language alignment. Extensive experiments on 14 real-world benchmarks verify the effectiveness of HIGHT in reducing hallucination by 40% and show significant improvements in various molecule-language downstream tasks. The project is available at https://higraphllm.github.io/.
Article 1598
Title@2025-06-06 (5): Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large $p$
Title: Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large $p$ | Ab Initio Nichtparametrische Variable Auswahl für skalierbare symbolische Regression mit großen $p$ | 用于大额p美元可缩放的符号回归的非参数变量选择 2410.13681v2 |
Authors (2): Shengbin Ye, Meng Li
Symbolic regression (SR) is a powerful technique for discovering symbolic expressions that characterize nonlinear relationships in data, gaining increasing attention for its interpretability, compactness, and robustness. However, existing SR methods do not scale to datasets with a large number of input variables (referred to as extreme-scale SR), which is common in modern scientific applications. This "large $p$" setting, often accompanied by measurement error, leads to slow performance of SR methods and overly complex expressions that are difficult to interpret. To address this scalability challenge, we propose a method called PAN+SR, which combines a key idea of ab initio nonparametric variable selection with SR to efficiently pre-screen large input spaces and reduce search complexity while maintaining accuracy. The use of nonparametric methods eliminates model misspecification, supporting a strategy called parametric-assisted nonparametric (PAN). We also extend SRBench, an open-source benchmarking platform, by incorporating high-dimensional regression problems with various signal-to-noise ratios. Our results demonstrate that PAN+SR consistently enhances the performance of 19 contemporary SR methods, enabling several to achieve state-of-the-art performance on these challenging datasets.
Article 1599
Title@2025-06-06 (5): Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation
Title: Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation | Effiziente feinkörnige Anleitung für die Diffusionsmodellbasierte Symbolische Musikgenerierung | 以模型为基础的示范符号音乐制作的 扩散模型的精效微粒指导 2410.08435v3 |
Authors (5): Tingyu Zhu, Haoyu Liu, Ziyu Wang, Zhimin Jiang, Zeyu Zheng
Developing generative models to create or conditionally create symbolic music presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To address these challenges, we introduce an efficient Fine-Grained Guidance (FGG) approach within diffusion models. FGG guides the diffusion models to generate music that aligns more closely with the control and intent of expert composers, which is critical for improving the accuracy, listenability, and quality of generated music. This approach empowers diffusion models to excel in advanced applications such as improvisation and interactive music creation. We derive theoretical characterizations for both the challenges in symbolic music generation and the effects of the FGG approach. We provide numerical experiments and subjective evaluation to demonstrate the effectiveness of our approach. We have published a demo page to showcase performances, which enables real-time interactive generation.
Article 1600
Title@2025-06-06 (5): MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
Title: MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding | MedXpertQA: Benchmarking von medizinischem Verständnis auf Expertenebene | MedXperttQA:确定专家级医疗理由和理解基准 2501.18362v3 |
Authors (9): Yuxin Zuo, Shang Qu, Yifei Li, Zhangren Chen, Xuekai Zhu, Ermo Hua, Kaiyan Zhang, Ning Ding, Bowen Zhou
We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 questions spanning 17 specialties and 11 body systems. It includes two subsets, Text for text evaluation and MM for multimodal evaluation. Notably, MM introduces expert-level exam questions with diverse images and rich clinical information, including patient records and examination results, setting it apart from traditional medical multimodal benchmarks with simple QA pairs generated from image captions. MedXpertQA applies rigorous filtering and augmentation to address the insufficient difficulty of existing benchmarks like MedQA, and incorporates specialty board questions to improve clinical relevance and comprehensiveness. We perform data synthesis to mitigate data leakage risk and conduct multiple rounds of expert reviews to ensure accuracy and reliability. We evaluate 18 leading models on MedXpertQA. Moreover, medicine is deeply connected to real-world decision-making, providing a rich and representative setting for assessing reasoning abilities beyond mathematics and code. To this end, we develop a reasoning-oriented subset to facilitate the assessment of o1-like models. Code and data are available at: https://github.com/TsinghuaC3I/MedXpertQA
Article 1601
Title@2025-06-06 (5): TRUST: Test-time Resource Utilization for Superior Trustworthiness
Title: TRUST: Test-time Resource Utilization for Superior Trustworthiness | TRUST: Testzeit-Ressourcennutzung für höchste Vertrauenswürdigkeit | 信托基金:高可信赖性试验时间利用资源 2506.06048v1 |
Authors (2): Haripriya Harikumar, Santu Rana
Standard uncertainty estimation techniques, such as dropout, often struggle to clearly distinguish reliable predictions from unreliable ones. We attribute this limitation to noisy classifier weights, which, while not impairing overall class-level predictions, render finer-level statistics less informative. To address this, we propose a novel test-time optimization method that accounts for the impact of such noise to produce more reliable confidence estimates. The resulting confidence score defines a monotonic subset-selection function, where population accuracy consistently increases as samples with lower scores are removed, and it demonstrates superior performance in standard risk-based metrics such as AUSE and AURC. Additionally, our method effectively identifies discrepancies between training and test distributions, reliably differentiates in-distribution from out-of-distribution samples, and elucidates key differences between CNN and ViT classifiers across various vision datasets.
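To illustrate the subset-selection view described above, the sketch below computes a risk-coverage curve and one common discrete form of AURC (the mean selective risk over all coverage levels). It takes confidence scores as given and does not implement the paper's test-time optimization; `risk_coverage_curve` and `aurc` are hypothetical names.

```python
def risk_coverage_curve(confidences, correct):
    """Selective risk (error rate) among the n most confident samples, for n = 1..N."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    risks, errors = [], 0
    for n, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        risks.append(errors / n)
    return risks

def aurc(confidences, correct):
    """Area under the risk-coverage curve, taken here as the mean selective risk."""
    risks = risk_coverage_curve(confidences, correct)
    return sum(risks) / len(risks)
```

A score that ranks correct predictions above incorrect ones yields a curve that never decreases as coverage grows, and therefore a low AURC; a score that ranks them the other way yields a high one.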
Article 1602
Title@2025-06-06 (5): Diffusion-Based Hierarchical Graph Neural Networks for Simulating Nonlinear Solid Mechanics
Title: Diffusion-Based Hierarchical Graph Neural Networks for Simulating Nonlinear Solid Mechanics | Diffusionsbasierte Hierarchische Graphen-Neuralnetze zur Simulation nichtlinearer Festmechanik | 模拟非线性固体机械师 2506.06045v1 |
Authors (4): Tobias Würth, Niklas Freymuth, Gerhard Neumann, Luise Kärger
Graph-based learned simulators have emerged as a promising approach for simulating physical systems on unstructured meshes, offering speed and generalization across diverse geometries. However, they often struggle with capturing global phenomena, such as bending or long-range correlations, and suffer from error accumulation over long rollouts due to their reliance on local message passing and direct next-step prediction. We address these limitations by introducing the Rolling Diffusion-Batched Inference Network (ROBIN), a novel learned simulator that integrates two key innovations: (i) Rolling Diffusion, a parallelized inference scheme that amortizes the cost of diffusion-based refinement across physical time steps by overlapping denoising steps across a temporal window; and (ii) a Hierarchical Graph Neural Network built on algebraic multigrid coarsening, enabling multiscale message passing across different mesh resolutions. This architecture, implemented via Algebraic-hierarchical Message Passing Networks, captures both fine-scale local dynamics and global structural effects critical for phenomena like beam bending or multi-body contact. We validate ROBIN on challenging 2D and 3D solid mechanics benchmarks involving geometric, material, and contact nonlinearities. ROBIN achieves state-of-the-art accuracy on all tasks, substantially outperforming existing next-step learned simulators while reducing inference time by up to an order of magnitude compared to standard diffusion simulators.
Article 1603
Title@2025-06-06 (5): SDS-Net: Shallow-Deep Synergism-detection Network for infrared small target detection
Title: SDS-Net: Shallow-Deep Synergism-detection Network for infrared small target detection | SDS-Net: Shallow-Deep Synergism-Detektionsnetzwerk für Infrarot-Kleinzielerkennung | SDS-Net:红外小型目标探测浅水深协同-探测网络 2506.06042v1 |
Authors (5): Taoran Yue, Xiaojin Lu, Jiaxi Cai, Yuanping Chen, Shibing Chu
Current CNN-based infrared small target detection (IRSTD) methods generally overlook the heterogeneity between shallow and deep features, leading to inefficient collaboration between shallow fine-grained structural information and deep high-level semantic representations. Additionally, the dependency relationships and fusion mechanisms across different feature hierarchies lack systematic modeling, so the complementarity of multilevel features is not fully exploited. These limitations hinder IRSTD performance while incurring substantial computational costs. To address these challenges, this paper proposes a shallow-deep synergistic detection network (SDS-Net) that efficiently models multilevel feature representations to increase both the detection accuracy and computational efficiency in IRSTD tasks. SDS-Net introduces a dual-branch architecture that separately models the structural characteristics and semantic properties of features, effectively preserving shallow spatial details while capturing deep semantic representations, thereby achieving high-precision detection with significantly improved inference speed. Furthermore, the network incorporates an adaptive feature fusion module to dynamically model cross-layer feature correlations, enhancing overall feature collaboration and representation capability. Comprehensive experiments on three public datasets (NUAA-SIRST, NUDT-SIRST, and IRSTD-1K) demonstrate that SDS-Net outperforms state-of-the-art IRSTD methods while maintaining low computational complexity and high inference efficiency, showing superior detection performance and broad application prospects. Our code will be made public at https://github.com/PhysiLearn/SDS-Net.
Article 1604
Title@2025-06-06 (5): Tensor-to-Tensor Models with Fast Iterated Sum Features
Title: Tensor-to-Tensor Models with Fast Iterated Sum Features | Tensor-to-Tensor-Modelle mit schnell iterierten Sum-Features | 具有快速迭代超特性的电传感器到传感器模型 2506.06041v1 |
Authors (4): Joscha Diehl, Rasheed Ibraheem, Leonard Schmitz, Yue Wu
Data in the form of images or higher-order tensors is ubiquitous in modern deep learning applications. Owing to their inherent high dimensionality, the need for subquadratic layers processing such data is even more pressing than for sequence data. We propose a novel tensor-to-tensor layer with linear cost in the input size, utilizing the mathematical gadget of "corner trees" from the field of permutation counting. In particular, for order-two tensors, we provide an image-to-image layer that can be plugged into image processing pipelines. On the one hand, our method can be seen as a higher-order generalization of state-space models. On the other hand, it is based on a multiparameter generalization of the signature of iterated integrals (or sums). The proposed tensor-to-tensor concept is used to build a neural network layer called the Fast Iterated Sums (FIS) layer which integrates seamlessly with other layer types. We demonstrate the usability of the FIS layer with both classification and anomaly detection tasks. By replacing some layers of a smaller ResNet architecture with FIS, a similar accuracy (with a difference of only 0.1%) was achieved in comparison to a larger ResNet while reducing the number of trainable parameters and multi-add operations. The FIS layer was also used to build an anomaly detection model that achieved an average AUROC of 97.3% on the texture images of the popular MVTec AD dataset. The processing and modelling codes are publicly available at https://github.com/diehlj/fast-iterated-sums.
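To give a feel for why iterated sums admit linear cost, here is a minimal sketch of the one-parameter (sequence) case only. It is an assumption-laden analogue, not the corner-tree construction or the FIS layer from the paper: the function names are hypothetical, and the quantity computed is the order-two iterated sum over index pairs i < j.

```python
def iterated_sum_naive(x, y):
    """Order-two iterated sum over all index pairs i < j, in quadratic time."""
    n = len(x)
    return sum(x[i] * y[j] for i in range(n) for j in range(i + 1, n))


def iterated_sum_linear(x, y):
    """Same quantity in linear time, using a running prefix sum of x."""
    total, prefix = 0.0, 0.0
    for xi, yj in zip(x, y):
        total += prefix * yj  # prefix holds the sum of x[i] for all earlier i
        prefix += xi
    return total


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
print(iterated_sum_naive(x, y), iterated_sum_linear(x, y))
```

Both functions return the same value; the second visits each element once, which is the kind of saving a prefix-sum formulation provides.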
Article 1605
Title@2025-06-06 (5): Do-PFN: In-Context Learning for Causal Effect Estimation
Title: Do-PFN: In-Context Learning for Causal Effect Estimation | Do-PFN: In-Context-Lernen für die Schätzung des Kausaleffekts | DoPFN: 因果关系估算的理论内学习 2506.06039v1 |
Authors (6): Jake Robertson, Arik Reuter, Siyuan Guo, Noah Hollmann, Frank Hutter, Bernhard Schölkopf
Estimation of causal effects is critical to a range of scientific disciplines. Existing methods for this task either require interventional data, knowledge about the ground truth causal graph, or rely on assumptions such as unconfoundedness, restricting their applicability in real-world settings. In the domain of tabular machine learning, Prior-data fitted networks (PFNs) have achieved state-of-the-art predictive performance, having been pre-trained on synthetic data to solve tabular prediction problems via in-context learning. To assess whether this can be transferred to the harder problem of causal effect estimation, we pre-train PFNs on synthetic data drawn from a wide variety of causal structures, including interventions, to predict interventional outcomes given observational data. Through extensive experiments on synthetic case studies, we show that our approach allows for the accurate estimation of causal effects without knowledge of the underlying causal graph. We also perform ablation studies that elucidate Do-PFN’s scalability and robustness across datasets with a variety of causal characteristics.
Article 1606
Title@2025-06-06 (5): CRISP-NAM: Competing Risks Interpretable Survival Prediction with Neural Additive Models
Title: CRISP-NAM: Competing Risks Interpretable Survival Prediction with Neural Additive Models | CRISP-NAM: Konkurrenzfähige Risiken interpretierbare Überlebensvorhersage mit neuralen Additivenmodellen | CRIISP-NAM: 与神经添加模型相竞争的风险解释性生存预测 2505.21360v2 |
Authors (1): Dhanesh Ramachandram
Competing risks are crucial considerations in survival modelling, particularly in healthcare domains where patients may experience multiple distinct event types. We propose CRISP-NAM (Competing Risks Interpretable Survival Prediction with Neural Additive Models), an interpretable neural additive model for competing risks survival analysis which extends the neural additive architecture to model cause-specific hazards while preserving feature-level interpretability. Each feature contributes independently to risk estimation through dedicated neural networks, allowing for visualization of complex non-linear relationships between covariates and each competing risk. We demonstrate competitive performance on multiple datasets compared to existing approaches.
Article 1607
Title@2025-06-06 (5): A PID-Controlled Tensor Wheel Decomposition Model for Dynamic Link Prediction
Title: A PID-Controlled Tensor Wheel Decomposition Model for Dynamic Link Prediction | Ein PID-gesteuertes Tensor-Radzersetzungsmodell für dynamische Link-Vorhersage | PID-控制式天线仪式轮式动态链接预测分解模型 2505.14211v2 |
Authors (2): Qu Wang, Yan Xia
Link prediction in dynamic networks remains a fundamental challenge in network science, requiring the inference of potential interactions and their evolving strengths through spatiotemporal pattern analysis. Traditional static network methods have inherent limitations in capturing temporal dependencies and weight dynamics, while tensor-based methods offer a promising paradigm by encoding dynamic networks into high-order tensors to explicitly model multidimensional interactions across nodes and time. Among them, tensor wheel decomposition (TWD) stands out for its innovative topological structure, which decomposes high-order tensors into cyclic factors and core tensors to maintain structural integrity. To improve the prediction accuracy, this study introduces a PID-controlled tensor wheel decomposition (PTWD) model, which mainly adopts the following two ideas: 1) exploiting the representation power of TWD to capture the latent features of dynamic network topology and weight evolution, and 2) integrating the proportional-integral-derivative (PID) control principle into the optimization process to obtain a stable model parameter learning scheme. The performance on four real datasets verifies that the proposed PTWD model has more accurate link prediction capabilities compared to other models.
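The abstract does not say how the PID principle is wired into the tensor factor updates, so the following is only a generic discrete PID controller applied to an error signal. The class name, gains, and the idea of feeding the refined error to a learning step are assumptions for illustration, not the PTWD algorithm.

```python
class PIDError:
    """Discrete PID controller that refines a raw error signal (hypothetical helper)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0   # accumulated past errors (I term)
        self.previous = 0.0   # last error seen (for the D term)

    def refine(self, error):
        self.integral += error
        derivative = error - self.previous
        self.previous = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


pid = PIDError(kp=1.0, ki=0.1, kd=0.5)
first = pid.refine(2.0)   # uses the current error, its running sum, and its change
second = pid.refine(1.0)
print(first, second)
```

In a learning scheme of this kind, the refined error would replace the instantaneous prediction error in the parameter update, so that past and changing errors also influence each step.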
Article 1608
Title@2025-06-06 (5): Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling
Title: Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling | Neural Flow Diffusion Modelle: Erlernbarer Vorwärtsprozess für verbesserte Diffusionsmodellierung | 神经流流传播模型:改进传播模型的可学习前进进程 2404.12940v3 |
Authors (3): Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth
Conventional diffusion models typically rely on a fixed forward process, which implicitly defines complex marginal distributions over latent variables. This can often complicate the reverse process's task of learning generative trajectories, and results in costly inference for diffusion models. To address these limitations, we introduce Neural Flow Diffusion Models (NFDM), a novel framework that enhances diffusion models by supporting a broader range of forward processes beyond the standard Gaussian. We also propose a novel parameterization technique for learning the forward process. Our framework provides an end-to-end, simulation-free optimization objective, effectively minimizing a variational upper bound on the negative log-likelihood. Experimental results demonstrate NFDM's strong performance, evidenced by state-of-the-art likelihood estimation. Furthermore, we investigate NFDM's capacity for learning generative dynamics with specific characteristics, such as deterministic straight-line trajectories, and demonstrate how the framework may be adopted for learning bridges between two distributions. The results underscore NFDM's versatility and its potential for a wide range of applications.
Article 1609
Title@2025-06-06 (5): SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations
Title: SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations | SDE Matching: Skalierbares und simulationsfreies Training latenter stochastischer Differentialgleichungen | SDE 匹配:可缩放和模拟无模拟的静态碎裂差异等量模拟培训 2502.02472v2 |
Authors (3): Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth
The Latent Stochastic Differential Equation (SDE) is a powerful tool for time series and sequence modeling. However, training Latent SDEs typically relies on adjoint sensitivity methods, which depend on simulation and backpropagation through approximate SDE solutions and therefore limit scalability. In this work, we propose SDE Matching, a new simulation-free method for training Latent SDEs. Inspired by modern Score- and Flow Matching algorithms for learning generative dynamics, we extend these ideas to the domain of stochastic dynamics for time series and sequence modeling, eliminating the need for costly numerical simulations. Our results demonstrate that SDE Matching achieves performance comparable to adjoint sensitivity methods while drastically reducing computational complexity.
Article 1610
Title@2025-06-06 (5): Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification
Title: Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification | Probenspezifische Geräuschinjektion für die Diffusions-basierte Adversarialreinigung | 以传播为基础的反向净化的样本特定噪音注射 2506.06027v1 |
Authors (5): Yuhao Sun, Jiacheng Zhang, Zesheng Ye, Chaowei Xiao, Feng Liu
Diffusion-based purification (DBP) methods aim to remove adversarial noise from the input sample by first injecting Gaussian noise through a forward diffusion process, and then recovering the clean example through a reverse generative process. In the above process, how much Gaussian noise is injected into the input sample is key to the success of DBP methods; in existing methods, it is controlled by a constant noise level $t^*$ for all samples. In this paper, we discover that the optimal $t^*$ can indeed differ from sample to sample. Intuitively, the cleaner a sample is, the less noise should be injected, and vice versa. Motivated by this finding, we propose a new framework, called Sample-specific Score-aware Noise Injection (SSNI). Specifically, SSNI uses a pre-trained score network to estimate how much a data point deviates from the clean data distribution (i.e., score norms). Then, based on the magnitude of score norms, SSNI applies a reweighting function to adaptively adjust $t^*$ for each sample, achieving sample-specific noise injections. Empirically, incorporating our framework with existing DBP methods results in a notable improvement in both accuracy and robustness on CIFAR-10 and ImageNet-1K, highlighting the necessity to allocate distinct noise levels to different samples in DBP methods. Our code is available at: https://github.com/tmlr-group/SSNI.
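The idea of mapping a score norm to a per-sample noise level can be sketched as follows. The sigmoid form, the `midpoint` and `temperature` parameters, and the function name are all assumptions chosen for illustration; the abstract does not specify the reweighting function that SSNI actually uses.

```python
import math


def sample_specific_t(score_norm, t_star, midpoint, temperature=1.0):
    """Scale a reference noise level t_star by a sigmoid of the score norm.

    Hypothetical reweighting: larger score norms (samples further from the
    clean data distribution) receive more noise, smaller ones receive less.
    """
    weight = 1.0 / (1.0 + math.exp(-(score_norm - midpoint) / temperature))
    return t_star * 2.0 * weight  # equals t_star exactly at the midpoint


# Toy values (hypothetical): reference level 100, midpoint score norm 5.0.
for norm in (2.0, 5.0, 8.0):
    print(norm, sample_specific_t(norm, t_star=100.0, midpoint=5.0))
```

Under these assumptions a sample whose score norm sits at the midpoint keeps the reference level, while cleaner samples are diffused less and more perturbed samples are diffused more.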
Article 1611
Title@2025-06-06 (5): On Inverse Problems, Parameter Estimation, and Domain Generalization
Title: On Inverse Problems, Parameter Estimation, and Domain Generalization | Inverse Probleme, Parameterschätzung und Domain-Verallgemeinerung | 反向问题、参数估计和域通用化 2506.06024v1 |
Authors (1): Deborah Pereg
Signal restoration and inverse problems are key elements in most real-world data science applications. In the past decades, with the emergence of machine learning methods, inversion of measurements has become a popular step in almost all physical applications, which is normally executed prior to downstream tasks that often involve parameter estimation. In this work, we analyze the general problem of parameter estimation in an inverse problem setting. First, we address the domain-shift problem by re-formulating it in direct relation with the discrete parameter estimation analysis. We analyze a significant vulnerability in current attempts to enforce domain generalization, which we dub the Double Meaning Theorem. Our theoretical findings are experimentally illustrated for domain shift examples in image deblurring and speckle suppression in medical imaging. We then proceed to a theoretical analysis of parameter estimation given observed measurements before and after data processing involving an inversion of the observations. We compare this setting for invertible and non-invertible (degradation) processes. We distinguish between continuous and discrete parameter estimation, corresponding to regression and classification problems, respectively. Our theoretical findings align with the well-known information-theoretic data processing inequality, and to a certain degree question the common misconception that data processing for inversion, based on modern generative models that may often produce outstanding perceptual quality, will necessarily improve the subsequent parameter estimation objective. It is our hope that this paper will provide practitioners with deeper insights that may be leveraged in the future for the development of more efficient and informed strategic system planning, critical in safety-sensitive applications.
Article 1612
Title@2025-06-06 (5): Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation
Title: Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation | Begründung durch Ausführung: Vereinheitlichung von Prozess- und Ergebnisprämien für die Codegenerierung | 执行中的理由:代码生成的统一程序和结果奖励 2412.15118v2 |
Authors (8): Zhuohao Yu, Weizheng Gu, Yidong Wang, Xingru Jiang, Zhengran Zeng, Jindong Wang, Wei Ye, Shikun Zhang
Large Language Models excel at code generation yet struggle with complex programming tasks that demand sophisticated reasoning. To bridge this gap, traditional process supervision relies on learned reward models requiring costly training data and suffering from reward misalignment, while outcome supervision fails for complex tasks needing coordinated intermediate steps. We introduce Outcome Refining Process Supervision, which unifies process and outcome supervision by leveraging executable verification: a tree-structured search framework generates strategic alternatives, profiles execution metrics, and scores candidates via self-critique mechanisms that integrate runtime feedback with reasoning. Experiments across 5 models and 3 benchmarks show consistent gains, with 26.9% higher correctness and 42.2% improved code efficiency. The results demonstrate that ORPS enables LLMs to overcome local optima in code generation, suggesting a promising direction for combining verifiable outcomes with structured reasoning to tackle complex challenges. We open-source at: https://github.com/zhuohaoyu/ORPS
Article 1613
Title@2025-06-06 (5): Improving Customer Service with Automatic Topic Detection in User Emails
Title: Improving Customer Service with Automatic Topic Detection in User Emails | Verbesserung des Kundenservices mit automatischer Themenerkennung in Benutzer-E-Mails | 改进用户电子邮件自动专题检测客户服务 2502.19115v3 |
Authors (5): Bojana Bašaragin, Darija Medvecki, Gorana Gojić, Milena Oparnica, Dragiša Mišković
This study introduces a novel natural language processing pipeline that enhances customer service efficiency at Telekom Srbija, a leading Serbian telecommunications company, through automated email topic detection and labeling. Central to the pipeline is BERTopic, a modular framework that allows unsupervised topic modeling. After a series of preprocessing and postprocessing steps, we assign one of 12 topics and several additional labels to incoming emails, allowing customer service to filter and access them through a custom-made application. While applied to Serbian, the methodology is conceptually language-agnostic and can be readily adapted to other languages, particularly those that are low-resourced and morphologically rich. The system performance was evaluated by assessing the speed and correctness of the automatically assigned topics, with a weighted average processing time of 0.041 seconds per email and a weighted average F1 score of 0.96. The system now operates in the company’s production environment, streamlining customer service operations through automated email classification.
Article 1614
Title@2025-06-06 (5): Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems
Title: Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems | Unisoma: Ein Unified Transformer-basierter Solver für Multi-Solid-Systeme | Unisoma:多层系统统一变压器解决方案 2506.06021v1 |
Authors (5): Shilong Tao, Zhe Feng, Haonan Sun, Zhanxing Zhu, Yunhuai Liu
Multi-solid systems are foundational to a wide range of real-world applications, yet modeling their complex interactions remains challenging. Existing deep learning methods predominantly rely on implicit modeling, where the factors influencing solid deformation are not explicitly represented but are instead indirectly learned. However, as the number of solids increases, these methods struggle to accurately capture intricate physical interactions. In this paper, we introduce a novel explicit modeling paradigm that incorporates factors influencing solid deformation through structured modules. Specifically, we present Unisoma, a unified and flexible Transformer-based model capable of handling variable numbers of solids. Unisoma directly captures physical interactions using contact modules and an adaptive interaction allocation mechanism, and learns the deformation through a triplet relationship. Compared to implicit modeling techniques, explicit modeling is better suited to multi-solid systems with diverse coupling patterns, as it enables detailed treatment of each solid while preventing information blending and confusion. Experimentally, Unisoma achieves consistent state-of-the-art performance across seven well-established datasets and two complex multi-solid tasks. Code is available at https://github.com/therontau0054/Unisoma.
Article 1615
Title@2025-06-06 (5): One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy
Title: One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy | Ein Stein, zwei Vögel: Verstärkung der feindlichen Verteidigung durch die Linse der distributiven Diskrepanz | 一石,两鸟:通过分布性差异的镜头加强反向防御 2503.02169v2 |
Authors (4): Jiacheng Zhang, Benjamin I. P. Rubinstein, Jingfeng Zhang, Feng Liu
Statistical adversarial data detection (SADD) detects whether an upcoming batch contains adversarial examples (AEs) by measuring the distributional discrepancies between clean examples (CEs) and AEs. In this paper, we explore the strength of SADD-based methods by theoretically showing that minimizing distributional discrepancy can help reduce the expected loss on AEs. Despite these advantages, SADD-based methods have a potential limitation: they discard inputs that are detected as AEs, leading to the loss of useful information within those inputs. To address this limitation, we propose a two-pronged adversarial defense method, named Distributional-discrepancy-based Adversarial Defense (DAD). In the training phase, DAD first optimizes the test power of the maximum mean discrepancy (MMD) to derive MMD-OPT, which is a stone that kills two birds. MMD-OPT first serves as a guiding signal to minimize the distributional discrepancy between CEs and AEs to train a denoiser. Then, it serves as a discriminator to differentiate CEs and AEs during inference. Overall, in the inference stage, DAD consists of a two-pronged process: (1) directly feeding the detected CEs into the classifier, and (2) removing noise from the detected AEs by the distributional-discrepancy-based denoiser. Extensive experiments show that DAD outperforms current state-of-the-art (SOTA) defense methods by simultaneously improving clean and robust accuracy on CIFAR-10 and ImageNet-1K against adaptive white-box attacks. Codes are publicly available at: https://github.com/tmlr-group/DAD.
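For readers unfamiliar with the maximum mean discrepancy, the sketch below computes a plain biased estimate of squared MMD between two samples with a Gaussian kernel. It only illustrates the underlying quantity: the fixed bandwidth, the function names, and the toy data are assumptions, and the test-power optimization that yields MMD-OPT is not shown.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""
    def mean_kernel(p, q):
        total = sum(gaussian_kernel(a, b, bandwidth) for a in p for b in q)
        return total / (len(p) * len(q))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy batches (hypothetical): one near the origin, one shifted away from it.
clean = [(0.0,), (1.0,)]
shifted = [(3.0,), (4.0,)]
print(mmd_squared(clean, clean), mmd_squared(clean, shifted))
```

A value near zero indicates that the two batches look alike under the kernel, while a larger value signals a distributional discrepancy of the kind a detector would flag.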
Article 1616
Title@2025-06-06 (5): A Theoretical Justification for Asymmetric Actor-Critic Algorithms
Title: A Theoretical Justification for Asymmetric Actor-Critic Algorithms | Eine theoretische Rechtfertigung für asymmetrische Aktor-Kritische Algorithmen | 关于非对称精算作用者-批评算法的理论理由 2501.19116v2 |
Authors (3): Gaspard Lambrechts, Damien Ernst, Aditya Mahajan
In reinforcement learning for partially observable environments, many successful algorithms have been developed within the asymmetric learning paradigm. This paradigm leverages additional state information available at training time for faster learning. Although the proposed learning objectives are usually theoretically sound, these methods still lack a precise theoretical justification for their potential benefits. We propose such a justification for asymmetric actor-critic algorithms with linear function approximators by adapting a finite-time convergence analysis to this setting. The resulting finite-time bound reveals that the asymmetric critic eliminates error terms arising from aliasing in the agent state.
Article 1617
Title@2025-06-06 (5): A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces
Title: A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces | Ein Fisher-Rao Gradientenstrom für Entropie-regularisierte Markov-Entscheidungsprozesse in polnischen Räumen | 用于波兰空间的环丙正规化的Markov决策过程的Fisher-Rao梯度流 2310.02951v3 |
Authors (5): Bekzhan Kerimkulov, James-Michael Leahy, David Siska, Lukasz Szpruch, Yufei Zhang
We study the global convergence of a Fisher-Rao policy gradient flow for infinite-horizon entropy-regularised Markov decision processes with Polish state and action space. The flow is a continuous-time analogue of a policy mirror descent method. We establish the global well-posedness of the gradient flow and demonstrate its exponential convergence to the optimal policy. Moreover, we prove the flow is stable with respect to gradient evaluation, offering insights into the performance of a natural policy gradient flow with log-linear policy parameterisation. To overcome challenges stemming from the lack of the convexity of the objective function and the discontinuity arising from the entropy regulariser, we leverage the performance difference lemma and the duality relationship between the gradient and mirror descent flows. Our analysis provides a theoretical foundation for developing various discrete policy gradient algorithms.
Article 1618
Title@2025-06-06 (5): pLDDT-Predictor: High-speed Protein Screening Using Transformer and ESM2
Title: pLDDT-Predictor: High-speed Protein Screening Using Transformer and ESM2 | pLDDT-Predictor: High-Speed Protein Screening mit Transformer und ESM2 | pLDDT-预测:使用变压器和无害环境管理2进行高速蛋白素筛选 2410.21283v3 |
Authors (6): Joongwon Chae, Zhenyu Wang, Ijaz Gul, Jiansong Ji, Zhenglin Chen, Peiwu Qin
Recent advancements in protein structure prediction, particularly AlphaFold2, have revolutionized structural biology by achieving near-experimental accuracy (average RMSD < 1.5 Å). However, the computational demands of these models (approximately 30 minutes per protein on an RTX 4090) significantly limit their application in high-throughput protein screening. While large language models like ESM (Evolutionary Scale Modeling) have shown promise in extracting structural information directly from protein sequences, rapid assessment of protein structure quality for large-scale analyses remains a major challenge. We introduce pLDDT-Predictor, a high-speed protein screening tool that achieves a 250,000× speedup compared to AlphaFold2 by leveraging pre-trained ESM2 protein embeddings and a Transformer architecture. Our model predicts AlphaFold2's pLDDT (predicted Local Distance Difference Test) scores with a Pearson correlation of 0.7891 and processes proteins in just 0.007 seconds on average. Using a comprehensive dataset of 1.5 million diverse protein sequences (ranging from 50 to 2048 amino acids), we demonstrate that pLDDT-Predictor accurately classifies high-confidence structures (pLDDT > 70) with 91.2% accuracy and achieves an MSE of 84.8142 compared to AlphaFold2's predictions. The source code and pre-trained models are freely available at https://github.com/jw-chae/pLDDT_Predictor, enabling the research community to perform rapid, large-scale protein structure quality assessments.
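The headline figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted in the abstract; the variable names are illustrative, and reading the MSE as being on the pLDDT score scale is an assumption.

```python
import math

alphafold_seconds = 30 * 60   # "approximately 30 minutes per protein"
predictor_seconds = 0.007     # "0.007 seconds on average"

# Ratio of the two per-protein runtimes, to compare with the quoted speedup.
speedup = alphafold_seconds / predictor_seconds

# Root-mean-square error implied by the reported MSE of 84.8142.
rmse = math.sqrt(84.8142)

print(round(speedup), round(rmse, 2))
```

The runtime ratio comes out slightly above 250,000, consistent with the quoted speedup, and the reported MSE corresponds to a typical deviation of roughly nine pLDDT points.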
Article 1619
Title@2025-06-06 (5): Clone-Robust Weights in Metric Spaces: Handling Redundancy Bias for Benchmark Aggregation
Title: Clone-Robust Weights in Metric Spaces: Handling Redundancy Bias for Benchmark Aggregation | Klon-Robuste Gewichte in metrischen Räumen: Umgang mit Redundanz-Bias für Benchmark-Aggregation | 中度空间的克隆-罗布体重量:为基准聚合处理冗余琵琶湖 2502.03576v2 |
Authors (2): Damien Berriaud, Roger Wattenhofer
We are given a set of elements in a metric space. The distribution of the elements is arbitrary, possibly adversarial. Can we weigh the elements in a way that is resistant to such (adversarial) manipulations? This problem arises in various contexts. For instance, the elements could represent data points, requiring robust domain adaptation. Alternatively, they might represent tasks to be aggregated into a benchmark; or questions about personal political opinions in voting advice applications. This article introduces a theoretical framework for dealing with such problems. We propose clone-proof weighting functions as a solution concept. These functions distribute importance across elements of a set such that similar objects ("clones") share (some of) their weights, thus avoiding a potential bias introduced by their multiplicity. Our framework extends the maximum uncertainty principle to accommodate general metric spaces and includes a set of axioms – symmetry, continuity, and clone-proofness – that guide the construction of weighting functions. Finally, we address the existence of weighting functions satisfying our axioms in the significant case of Euclidean spaces and propose a general method for their construction.
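The clone-proofness idea can be illustrated in its simplest form, where clones are exact duplicates. The sketch below is not the paper's construction and does not address continuity for near-duplicates: the function name and the rule (equal weight per distinct point, split evenly among its copies) are assumptions.

```python
from collections import Counter


def duplicate_sharing_weights(points):
    """Give each distinct point equal weight and split it among exact copies."""
    counts = Counter(points)
    group_weight = 1.0 / len(counts)
    return [group_weight / counts[p] for p in points]


# Toy set (hypothetical): the point (0, 0) appears twice, (1, 1) once.
points = [(0, 0), (0, 0), (1, 1)]
print(duplicate_sharing_weights(points))

# Adding another copy of (0, 0) leaves the total weight at that location unchanged.
more_clones = points + [(0, 0)]
print(duplicate_sharing_weights(more_clones))
```

Under this rule, duplicating an element cannot inflate the importance of its location, which is the bias the weighting is meant to avoid.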
Article 1620
Title@2025-06-06 (5): LightGTS: A Lightweight General Time Series Forecasting Model
Title: LightGTS: A Lightweight General Time Series Forecasting Model | LightGTS: Ein leichtes allgemeines Zeitreihen-Vorhersagemodell | LightGTS:轻型一般时间序列预测模型 2506.06005v1 |
Authors (8): Yihang Wang, Yuying Qiu, Peng Chen, Yang Shu, Zhongwen Rao, Lujia Pan, Bin Yang, Chenjuan Guo
Existing works on general time series forecasting build foundation models with heavy model parameters through large-scale multi-source pre-training. These models achieve superior generalization ability across various datasets at the cost of significant computational burdens and limitations in resource-constrained scenarios. This paper introduces LightGTS, a lightweight general time series forecasting model designed from the perspective of consistent periodical modeling. To handle diverse scales and intrinsic periods in multi-source pre-training, we introduce Periodical Tokenization, which extracts consistent periodic patterns across different datasets with varying scales. To better utilize the periodicity in the decoding process, we further introduce Periodical Parallel Decoding, which leverages historical tokens to improve forecasting. Based on the two techniques above which fully leverage the inductive bias of periods inherent in time series, LightGTS uses a lightweight model to achieve outstanding performance on general time series forecasting. It achieves state-of-the-art forecasting performance on 9 real-world benchmarks in both zero-shot and full-shot settings with much better efficiency compared with existing time series foundation models.
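One way to picture period-aligned tokens is to cut a series into patches of one period each. This is only a sketch of that intuition: the function name, the choice to drop the oldest leftover values, and the assumption that the period is known in advance are hypothetical, and LightGTS's actual Periodical Tokenization may differ.

```python
def periodical_tokens(series, period):
    """Split a series into consecutive patches of exactly one period each.

    Leftover values at the start (the oldest ones) are dropped so that the
    most recent observations are always covered by a full token.
    """
    start = len(series) % period
    return [series[i:i + period] for i in range(start, len(series), period)]


# Toy series (hypothetical): ten readings with an assumed period of four steps.
readings = list(range(10))
print(periodical_tokens(readings, period=4))
```

With tokens defined per period rather than per fixed patch length, series sampled at different scales yield tokens that each span one cycle.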
Article 1621
Title@2025-06-06 (5): What Really is a Member? Discrediting Membership Inference via Poisoning
Title: What Really is a Member? Discrediting Membership Inference via Poisoning | Was ist wirklich ein Mitglied? Diskreditierende Mitgliedschaftsinferenz durch Vergiftung | 真正的成员是什么? 2506.06003v1 |
Authors (8): Neal Mangaokar, Ashish Hooda, Zhuohang Li, Bradley A. Malin, Kassem Fawaz, Somesh Jha, Atul Prakash, Amrita Roy Chowdhury
Membership inference tests aim to determine whether a particular data point was included in a language model’s training set. However, recent works have shown that such tests often fail under the strict definition of membership based on exact matching, and have suggested relaxing this definition to include semantic neighbors as members as well. In this work, we show that membership inference tests are still unreliable under this relaxation - it is possible to poison the training dataset in a way that causes the test to produce incorrect predictions for a target point. We theoretically reveal a trade-off between a test’s accuracy and its robustness to poisoning. We also present a concrete instantiation of this poisoning attack and empirically validate its effectiveness. Our results show that it can degrade the performance of existing tests to well below random.
nan
Article 1622
Title@2025-06-06 (5): LaDEEP: A Deep Learning-based Surrogate Model for Large Deformation of Elastic-Plastic Solids
Title: LaDEEP: A Deep Learning-based Surrogate Model for Large Deformation of Elastic-Plastic Solids | LaDEEP: Ein Deep Learning-basiertes Surrogate-Modell für große Deformation von elastischen-Plastischen Festkörpern | LaDEEP:一个基于深层学习的弹性固体大规模变形代金模型 2506.06001v1 |
Authors (5): Shilong Tao, Zhe Feng, Haonan Sun, Zhanxing Zhu, Yunhuai Liu
Scientific computing for large deformation of elastic-plastic solids is critical for numerous real-world applications. Classical numerical solvers rely primarily on local discrete linear approximation and are constrained by an inherent trade-off between accuracy and efficiency. Recently, deep learning models have achieved impressive progress in solving continuum mechanics problems. While previous models have explored various architectures and constructed coefficient-solution mappings, they are designed for general instances without considering specific problem properties, and they struggle to accurately handle complex elastic-plastic solids involving contact, loading and unloading. In this work, we take stretch bending, a popular metal fabrication technique, as our case study and introduce LaDEEP, a deep learning-based surrogate model for Large Deformation of Elastic-Plastic Solids. We encode the partitioned regions of the involved slender solids into a token sequence to maintain their essential order property. To characterize the physical process of the solid deformation, a two-stage Transformer-based module is designed to predict the deformation with the sequence of tokens as input. Empirically, LaDEEP is five orders of magnitude faster than finite element methods at comparable accuracy, and achieves a 20.47% relative improvement on average over other deep learning baselines. We have also deployed our model into a real-world industrial production system, and it has shown remarkable performance in both accuracy and efficiency.
nan
Article 1623
Title@2025-06-06 (5): Machine learning for in-situ composition mapping in a self-driving magnetron sputtering system
Title: Machine learning for in-situ composition mapping in a self-driving magnetron sputtering system | Maschinelles Lernen für die In-situ-Kompositionsmapping in einem selbstfahrenden Magnetron-Sputtersystem | 在自驾驶磁磁突飞系统内进行原地成像绘图的机器学习 2506.05999v1 |
Authors (5): Sanna Jarl, Jens Sjölund, Robert J. W. Frost, Anders Holst, Jonathan J. S. Scragg
Self-driving labs (SDLs), employing automation and machine learning (ML) to accelerate experimental procedures, have enormous potential in the discovery of new materials. However, in thin film science, SDLs are mainly restricted to solution-based synthetic methods which are easier to automate but cannot access the broad chemical space of inorganic materials. This work presents an SDL based on magnetron co-sputtering. We are using combinatorial frameworks, obtaining accurate composition maps on multi-element, compositionally graded thin films. This normally requires time-consuming ex-situ analysis prone to systematic errors. We present a rapid and calibration-free in-situ, ML driven approach to produce composition maps for arbitrary source combinations and sputtering conditions. We develop a method to predict the composition distribution in a multi-element combinatorial thin film, using in-situ measurements from quartz-crystal microbalance sensors placed in a sputter chamber. For a given source, the sensor readings are learned as a function of the sputtering pressure and magnetron power, through active learning using Gaussian processes (GPs). The final GPs are combined with a geometric model of the deposition flux distribution in the chamber, which allows interpolation of the deposition rates from each source, at any position across the sample. We investigate several acquisition functions for the ML procedure. A fully Bayesian GP - BALM (Bayesian active learning MacKay) - achieved the best performance, learning the deposition rates for a single source in 10 experiments. Prediction accuracy for co-sputtering composition distributions was verified experimentally. Our framework dramatically increases throughput by avoiding the need for extensive characterisation or calibration, thus demonstrating the potential of ML-guided SDLs to accelerate materials exploration.
nan
Article 1624
Title@2025-06-06 (5): RETENTION: Resource-Efficient Tree-Based Ensemble Model Acceleration with Content-Addressable Memory
Title: RETENTION: Resource-Efficient Tree-Based Ensemble Model Acceleration with Content-Addressable Memory | RETENTION: Ressourceneffizientes baumbasiertes Ensemblemodell Beschleunigung mit Content-Addressablem Speicher | RETENTION:利用内容可寻址存储器实现资源高效的树基集成模型加速 2506.05994v1 |
Authors (6): Yi-Chun Liao, Chieh-Lin Tsai, Yuan-Hao Chang, Camélia Slimani, Jalil Boukhobza, Tei-Wei Kuo
Although deep learning has demonstrated remarkable capabilities in learning from unstructured data, modern tree-based ensemble models remain superior in extracting relevant information and learning from structured datasets. While several efforts have been made to accelerate tree-based models, the inherent characteristics of the models pose significant challenges for conventional accelerators. Recent research leveraging content-addressable memory (CAM) offers a promising solution for accelerating tree-based models, yet existing designs suffer from excessive memory consumption and low utilization. This work addresses these challenges by introducing RETENTION, an end-to-end framework that significantly reduces CAM capacity requirement for tree-based model inference. We propose an iterative pruning algorithm with a novel pruning criterion tailored for bagging-based models (e.g., Random Forest), which minimizes model complexity while ensuring controlled accuracy degradation. Additionally, we present a tree mapping scheme that incorporates two innovative data placement strategies to alleviate the memory redundancy caused by the widespread use of don’t care states in CAM. Experimental results show that implementing the tree mapping scheme alone achieves $1.46\times$ to $21.30 \times$ better space efficiency, while the full RETENTION framework yields $4.35\times$ to $207.12\times$ improvement with less than 3% accuracy loss. These results demonstrate that RETENTION is highly effective in reducing CAM capacity requirement, providing a resource-efficient direction for tree-based model acceleration.
nan
Article 1625
Title@2025-06-06 (5): Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning
Title: Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning | Diffusions-Politiken für die Verallgemeinerung von Out-of-Distribution im Offline-Verstärkungs-Lernen | 在离线强化学习中推广分配外普遍化政策 2307.04726v4 |
Authors (3): Suzan Ece Ada, Erhan Oztop, Emre Ugur
Offline Reinforcement Learning (RL) methods leverage previous experiences to learn better policies than the behavior policy used for data collection. However, they face challenges handling distribution shifts due to the lack of online interaction during training. To this end, we propose a novel method named State Reconstruction for Diffusion Policies (SRDP) that incorporates state reconstruction feature learning in the recent class of diffusion policies to address the problem of out-of-distribution (OOD) generalization. Our method promotes learning of generalizable state representation to alleviate the distribution shift caused by OOD states. To illustrate the OOD generalization and faster convergence of SRDP, we design a novel 2D Multimodal Contextual Bandit environment and realize it on a 6-DoF real-world UR10 robot, as well as in simulation, and compare its performance with prior algorithms. In particular, we show the importance of the proposed state reconstruction via ablation studies. In addition, we assess the performance of our model on standard continuous control benchmarks (D4RL), namely the navigation of an 8-DoF ant and forward locomotion of half-cheetah, hopper, and walker2d, achieving state-of-the-art results. Finally, we demonstrate that our method can achieve 167% improvement over the competing baseline on a sparse continuous control navigation task where various regions of the state space are removed from the offline RL dataset, including the region encapsulating the goal.
nan
Article 1626
Title@2025-06-06 (5): Peri-LN: Revisiting Normalization Layer in the Transformer Architecture
Title: Peri-LN: Revisiting Normalization Layer in the Transformer Architecture | Peri-LN: Normalisierungsschicht in der Transformer-Architektur neu aufsuchen | Peri-LN: 变形器结构中重新审视正常化层 2502.02732v3 |
Authors (10): Jeonghoon Kim, Byeongchan Lee, Cheonbok Park, Yeontaek Oh, Beomjun Kim, Taehwan Yoo, Seongjin Shin, Dongyoon Han, Jinwoo Shin, Kang Min Yoo
Selecting a layer normalization (LN) strategy that stabilizes training and speeds convergence in Transformers remains difficult, even for today’s large language models (LLMs). We present a comprehensive analytical foundation for understanding how different LN strategies influence training dynamics in large-scale Transformers. Until recently, Pre-LN and Post-LN dominated practice despite their limitations in large-scale training. However, several open-source models have recently begun silently adopting a third strategy without much explanation. This strategy places normalization layers peripherally around sublayers, a design we term Peri-LN. While Peri-LN has demonstrated promising performance, its precise mechanisms and benefits remain almost unexplored. Our in-depth analysis delineates the distinct behaviors of LN strategies, showing how each placement shapes activation variance and gradient propagation. To validate our theoretical insight, we conduct extensive experiments on Transformers with up to 3.2B parameters, showing that Peri-LN consistently achieves more balanced variance growth, steadier gradient flow, and convergence stability. Our results suggest that Peri-LN warrants broader consideration for large-scale Transformer architectures, providing renewed insights into the optimal placement of LN.
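The three placements can be contrasted on a single residual block. The sketch below is a simplified reading of the abstract, under stated assumptions: it takes "peripherally around sublayers" to mean normalizing both the input and the output of the sublayer before the residual addition. Learnable gain and bias are omitted, and `sublayer` is a hypothetical stand-in for an attention or MLP module.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def sublayer(x):
    # Hypothetical module: a fixed scaling that inflates activation variance.
    return [3.0 * v for v in x]


def pre_ln(x, f):
    # x + f(LN(x)): the branch output is added without normalization.
    return [a + b for a, b in zip(x, f(layer_norm(x)))]


def post_ln(x, f):
    # LN(x + f(x)): the whole residual sum is normalized.
    return layer_norm([a + b for a, b in zip(x, f(x))])


def peri_ln(x, f):
    # x + LN(f(LN(x))): normalization on both sides of the sublayer (assumed form).
    return [a + b for a, b in zip(x, layer_norm(f(layer_norm(x))))]


x = [1.0, 2.0, 3.0, 4.0]
pre_branch = [o - i for o, i in zip(pre_ln(x, sublayer), x)]
peri_branch = [o - i for o, i in zip(peri_ln(x, sublayer), x)]
```

In this toy setting the Pre-LN branch carries the sublayer's inflated variance (about 9) into the residual stream, while the Peri-LN branch is renormalized to variance of about 1 before it is added.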
nan
Article 1627
Title@2025-06-06 (5): Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning
Title: Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning | Dynamische Mischung von Progressivem Parameter-Effizienter Expertenbibliothek für lebenslanges Roboterlernen | 终身机器人学习进步参数有效专家图书馆动态混合体 2506.05985v1 |
Authors (6): Yuheng Lei, Sitong Mao, Shunbo Zhou, Hongyuan Zhang, Xuelong Li, Ping Luo
A generalist agent must continuously learn and adapt throughout its lifetime, achieving efficient forward transfer while minimizing catastrophic forgetting. Previous work within the dominant pretrain-then-finetune paradigm has explored parameter-efficient fine-tuning for single-task adaptation, effectively steering a frozen pretrained model with a small number of parameters. However, in the context of lifelong learning, these methods rely on the impractical assumption of a test-time task identifier and restrict knowledge sharing among isolated adapters. To address these limitations, we propose Dynamic Mixture of Progressive Parameter-Efficient Expert Library (DMPEL) for lifelong robot learning. DMPEL progressively learns a low-rank expert library and employs a lightweight router to dynamically combine experts into an end-to-end policy, facilitating flexible behavior during lifelong adaptation. Moreover, by leveraging the modular structure of the fine-tuned parameters, we introduce coefficient replay to guide the router in accurately retrieving frozen experts for previously encountered tasks, thereby mitigating catastrophic forgetting. This method is significantly more storage- and computationally-efficient than applying demonstration replay to the entire policy. Extensive experiments on the lifelong manipulation benchmark LIBERO demonstrate that our framework outperforms state-of-the-art lifelong learning methods in success rates across continual adaptation, while utilizing minimal trainable parameters and storage.
nan
Article 1628
Title@2025-06-06 (5): Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
Title: Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations | Jacobian Sparse Autoencoder: Sparsify Computations, nicht nur Aktivierungen | Jacobian Sparse Autoencoders: 简化计算,而不仅仅是激活 2502.18147v2 |
Authors (4): Lucy Farnik, Tim Lawson, Conor Houghton, Laurence Aitchison
Sparse autoencoders (SAEs) have been successfully used to discover sparse and human-interpretable representations of the latent activations of LLMs. However, we would ultimately like to understand the computations performed by LLMs and not just their representations. The extent to which SAEs can help us understand computations is unclear because they are not designed to “sparsify” computations in any sense, only latent activations. To solve this, we propose Jacobian SAEs (JSAEs), which yield not only sparsity in the input and output activations of a given model component but also sparsity in the computation (formally, the Jacobian) connecting them. With a naive implementation, the Jacobians in LLMs would be computationally intractable due to their size. One key technical contribution is thus finding an efficient way of computing Jacobians in this setup. We find that JSAEs extract a relatively large degree of computational sparsity while preserving downstream LLM performance approximately as well as traditional SAEs. We also show that Jacobians are a reasonable proxy for computational sparsity because MLPs are approximately linear when rewritten in the JSAE basis. Lastly, we show that JSAEs achieve a greater degree of computational sparsity on pre-trained LLMs than on the equivalent randomized LLM. This shows that the sparsity of the computational graph appears to be a property that LLMs learn through training, and suggests that JSAEs might be more suitable for understanding learned transformer computations than standard SAEs.
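To make "sparsity in the Jacobian" concrete, the sketch below estimates the Jacobian of a small hand-written function by forward differences and reports its L1 norm and the fraction of zero entries, the kind of quantities a Jacobian-sparsity penalty would act on. Everything here is illustrative: `toy_component` is a hypothetical stand-in for a model component, and the finite-difference estimate is exactly the naive approach that does not scale, not the efficient computation the abstract refers to.

```python
def toy_component(x):
    # Hypothetical map from 3 inputs to 2 outputs; each output uses few inputs.
    return [2.0 * x[0], x[1] * x[2]]


def jacobian(f, x, h=1e-6):
    """Forward-difference estimate of the Jacobian J[i][j] = d f_i / d x_j."""
    fx = f(x)
    jac = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        shifted = list(x)
        shifted[j] += h
        f_shifted = f(shifted)
        for i in range(len(fx)):
            jac[i][j] = (f_shifted[i] - fx[i]) / h
    return jac


def l1_norm(jac):
    return sum(abs(v) for row in jac for v in row)


def zero_fraction(jac, tol=1e-8):
    entries = [v for row in jac for v in row]
    return sum(1 for v in entries if abs(v) < tol) / len(entries)


J = jacobian(toy_component, [1.0, 2.0, 3.0])
penalty = l1_norm(J)          # an L1 term of this kind encourages sparse Jacobians
sparsity = zero_fraction(J)   # share of input-output pairs with no local dependence
```

An L1 term of this form, added to a reconstruction loss, would push individual Jacobian entries toward zero, so that each output depends on only a few inputs.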
nan
Article 1629
Title@2025-06-06 (5): A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations
Title: A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations | Eine Übersicht über automatische Bewertungsmethoden zu Text-, Bild- und Sprachgenerierungen | 文字、视觉和语音一代自动评价方法调查 2506.10019v1 |
Authors (11): Tian Lan, Yang-Hao Zhou, Zi-Ao Ma, Fanshu Sun, Rui-Qing Sun, Junyu Luo, Rong-Cheng Tu, Heyan Huang, Chen Xu, Zhijing Wu, Xian-Ling Mao
Recent advances in deep learning have significantly enhanced generative AI capabilities across text, images, and audio. However, automatically evaluating the quality of these generated outputs presents ongoing challenges. Although numerous automatic evaluation methods exist, current research lacks a systematic framework that comprehensively organizes these methods across text, visual, and audio modalities. To address this issue, we present a comprehensive review and a unified taxonomy of automatic evaluation methods for generated content across all three modalities. We identify five fundamental paradigms that characterize existing evaluation approaches across these domains. Our analysis begins by examining evaluation methods for text generation, where techniques are most mature. We then extend this framework to image and audio generation, demonstrating its broad applicability. Finally, we discuss promising directions for future research in cross-modal evaluation methodologies.
nan
Article 1630
Title@2025-06-06 (5): HO-FMN: Hyperparameter Optimization for Fast Minimum-Norm Attacks
Title: HO-FMN: Hyperparameter Optimization for Fast Minimum-Norm Attacks | HO-FMN: Hyperparameter-Optimierung für schnelle Minimal-Norm-Angriffe | HO-FMN: 快速最低目标攻击的超参数优化 2407.08806v2 |
Authors (9): Raffaele Mura, Giuseppe Floris, Luca Scionis, Giorgio Piras, Maura Pintor, Ambra Demontis, Giorgio Giacinto, Battista Biggio, Fabio Roli
Gradient-based attacks are a primary tool to evaluate robustness of machine-learning models. However, many attacks tend to provide overly-optimistic evaluations as they use fixed loss functions, optimizers, step-size schedulers, and default hyperparameters. In this work, we tackle these limitations by proposing a parametric variation of the well-known fast minimum-norm attack algorithm, whose loss, optimizer, step-size scheduler, and hyperparameters can be dynamically adjusted. We re-evaluate 12 robust models, showing that our attack finds smaller adversarial perturbations without requiring any additional tuning. This also enables reporting adversarial robustness as a function of the perturbation budget, providing a more complete evaluation than that offered by fixed-budget attacks, while remaining efficient. We release our open-source code at https://github.com/pralab/HO-FMN.
nan
Article 1631
Title@2025-06-06 (5): AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification
Title: AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification | AMPED: Adaptive Multi-Objektive Projektion zum Ausgleich von Exploration und Geschicklichkeit Diversifizierung | AMPED: 兼顾探索与技能多样化的适应性多目标预测 2506.05980v1 |
Authors (6): Geonwoo Cho, Jaemoon Lee, Jaegyun Im, Subi Lee, Jihwan Lee, Sundong Kim
Skill-based reinforcement learning (SBRL) enables rapid adaptation in environments with sparse rewards by pretraining a skill-conditioned policy. Effective skill learning requires jointly maximizing both exploration and skill diversity. However, existing methods often face challenges in simultaneously optimizing for these two conflicting objectives. In this work, we propose a new method, Adaptive Multi-objective Projection for balancing Exploration and skill Diversification (AMPED), which explicitly addresses both exploration and skill diversification. We begin by conducting extensive ablation studies to identify and define a set of objectives that effectively capture the aspects of exploration and skill diversity, respectively. During the skill pretraining phase, AMPED introduces a gradient surgery technique to balance the objectives of exploration and skill diversity, mitigating conflicts and reducing reliance on heuristic tuning. In the subsequent fine-tuning phase, AMPED incorporates a skill selector module that dynamically selects suitable skills for downstream tasks, based on task-specific performance signals. Our approach achieves performance that surpasses SBRL baselines across various benchmarks. These results highlight the importance of explicitly harmonizing exploration and diversity and demonstrate the effectiveness of AMPED in enabling robust and generalizable skill learning. Project Page: https://geonwoo.me/amped/
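Gradient surgery is commonly implemented as a projection: when two objective gradients conflict (negative inner product), each is projected onto the plane orthogonal to the other before they are summed. The sketch below shows that generic, PCGrad-style rule for two objectives. Whether AMPED uses exactly this projection is an assumption, and the names `g_explore` and `g_diverse` are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_if_conflicting(g, other):
    """Remove from `g` its component along `other` when the two conflict."""
    d = dot(g, other)
    if d >= 0:                      # no conflict: leave the gradient untouched
        return list(g)
    scale = d / dot(other, other)   # other is non-zero here because d < 0
    return [a - scale * b for a, b in zip(g, other)]


def combine(g_explore, g_diverse):
    """Sum the two gradients after mutual conflict projection."""
    p_explore = project_if_conflicting(g_explore, g_diverse)
    p_diverse = project_if_conflicting(g_diverse, g_explore)
    return [a + b for a, b in zip(p_explore, p_diverse)]


g_explore = [1.0, 0.0]
g_diverse = [-1.0, 1.0]             # conflicts with g_explore (dot product = -1)
update = combine(g_explore, g_diverse)
```

After projection, neither gradient has a component that opposes the other, so the combined update no longer trades one objective against the other along the conflicting direction.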
nan
Article 1632
Title@2025-06-06 (5): Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning
Title: Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning | Abschwächung des katastrophalen Vergessens durch adaptive Transformer-Block-Erweiterung im föderierten Fine-Tuning | 通过自适应Transformer块扩展缓解联邦微调中的灾难性遗忘 2506.05977v1 |
Authors (6): Yujia Huo, Jianchun Liu, Hongli Xu, Zhenguo Ma, Shilong Wang, Liusheng Huang
Federated fine-tuning (FedFT) of large language models (LLMs) has emerged as a promising solution for adapting models to distributed data environments while ensuring data privacy. Existing FedFT methods predominantly utilize parameter-efficient fine-tuning (PEFT) techniques to reduce communication and computation overhead. However, they often fail to adequately address catastrophic forgetting, a critical challenge arising from continual adaptation in distributed environments. Traditional centralized fine-tuning methods, which are not designed for the heterogeneous and privacy-constrained nature of federated environments, struggle to mitigate this issue effectively. Moreover, the challenge is further exacerbated by significant variation in data distributions and device capabilities across clients, which leads to intensified forgetting and degraded model generalization. To tackle these issues, we propose FedBE, a novel FedFT framework that integrates an adaptive transformer block expansion mechanism with a dynamic trainable-block allocation strategy. Specifically, FedBE expands trainable blocks within the model architecture, structurally separating newly learned task-specific knowledge from the original pre-trained representations. Additionally, FedBE dynamically assigns these trainable blocks to clients based on their data distributions and computational capabilities. This enables the framework to better accommodate heterogeneous federated environments and enhances the generalization ability of the model. Extensive experiments show that, compared with existing federated fine-tuning methods, FedBE achieves 12-74% higher accuracy retention on general tasks after fine-tuning and a model convergence acceleration ratio of 1.9-3.1x without degrading the accuracy of downstream tasks.
nan
Article 1633
Title@2025-06-06 (5): Graph Deep Learning for Time Series Forecasting
Title: Graph Deep Learning for Time Series Forecasting | Graph Deep Learning für die Vorhersage von Zeitreihen | 用于时间序列预测的深深学习图 2310.15978v2 |
Authors (4): Andrea Cini, Ivan Marisca, Daniele Zambon, Cesare Alippi
Graph deep learning methods have become popular tools to process collections of correlated time series. Unlike traditional multivariate forecasting methods, graph-based predictors leverage pairwise relationships by conditioning forecasts on graphs spanning the time series collection. The conditioning takes the form of architectural inductive biases on the forecasting architecture, resulting in a family of models called spatiotemporal graph neural networks. These biases allow for training global forecasting models on large collections of time series while localizing predictions w.r.t. each element in the set (nodes) by accounting for correlations among them (edges). Recent advances in graph neural networks and deep learning for time series forecasting make the adoption of such a processing framework appealing and timely. However, most studies focus on refining existing architectures by exploiting modern deep-learning practices. Conversely, foundational and methodological aspects have not been subject to systematic investigation. To fill this void, this tutorial paper aims to introduce a comprehensive methodological framework formalizing the forecasting problem and providing design principles for graph-based predictors, as well as methods to assess their performance. In addition, together with an overview of the field, we provide design guidelines and best practices, as well as an in-depth discussion of open challenges and future directions.
nan
Article 1634
Title@2025-06-06 (5): On Measuring Long-Range Interactions in Graph Neural Networks
Title: On Measuring Long-Range Interactions in Graph Neural Networks | Bei der Messung von Langzeit-Interaktionen in Graphen-Neuralen Netzwerken | 关于在图形神经网络中测量长距离相互作用 2506.05971v1 |
Authors (5): Jacob Bamberger, Benjamin Gutteridge, Scott le Roux, Michael M. Bronstein, Xiaowen Dong
Long-range graph tasks – those dependent on interactions between distant nodes – are an open problem in graph neural network research. Real-world benchmark tasks, especially the Long Range Graph Benchmark, have become popular for validating the long-range capability of proposed architectures. However, this is an empirical approach that lacks both robustness and theoretical underpinning; a more principled characterization of the long-range problem is required. To bridge this gap, we formalize long-range interactions in graph tasks, introduce a range measure for operators on graphs, and validate it with synthetic experiments. We then leverage our measure to examine commonly used tasks and architectures, and discuss to what extent they are, in fact, long-range. We believe our work advances efforts to define and address the long-range problem on graphs, and that our range measure will aid evaluation of new datasets and architectures.
nan
Article 1635
Title@2025-06-06 (5): Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning
Title: Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning | Schrittweiser Übergang vom Bellman Optimality Operator zum Bellman Operator im Online-Verstärkungslernen | 从Bellman最佳运算符逐步过渡到在线强化学习中的Bellman运算符 2506.05968v1 |
Authors (5): Motoki Omura, Kazuki Ota, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada
For continuous action spaces, actor-critic methods are widely used in online reinforcement learning (RL). However, unlike RL algorithms for discrete actions, which generally model the optimal value function using the Bellman optimality operator, RL algorithms for continuous actions typically model Q-values for the current policy using the Bellman operator. These algorithms for continuous actions rely exclusively on policy updates for improvement, which often results in low sample efficiency. This study examines the effectiveness of incorporating the Bellman optimality operator into actor-critic frameworks. Experiments in a simple environment show that modeling optimal values accelerates learning but leads to overestimation bias. To address this, we propose an annealing approach that gradually transitions from the Bellman optimality operator to the Bellman operator, thereby accelerating learning while mitigating bias. Our method, combined with TD3 and SAC, significantly outperforms existing approaches across various locomotion and manipulation tasks, demonstrating improved performance and robustness to hyperparameters related to optimality.
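The two operators differ only in how the next-state value is formed: the Bellman optimality operator backs up the maximum over next actions, while the Bellman operator for a policy backs up the expectation under that policy. The sketch below uses a small discrete action set and a linear blend with weight `alpha` purely to illustrate a gradual transition between the two targets. The blend, the schedule, and all names are assumptions, not necessarily the annealing mechanism used in the paper.

```python
def optimality_target(reward, next_q, gamma):
    # Bellman optimality backup: r + gamma * max_a' Q(s', a')
    return reward + gamma * max(next_q)


def policy_target(reward, next_q, policy_probs, gamma):
    # Bellman backup for a fixed policy: r + gamma * E_{a' ~ pi} Q(s', a')
    expected = sum(p * q for p, q in zip(policy_probs, next_q))
    return reward + gamma * expected


def annealed_target(reward, next_q, policy_probs, gamma, alpha):
    """Blend of the two targets: alpha = 0 is optimality, alpha = 1 is on-policy."""
    return ((1.0 - alpha) * optimality_target(reward, next_q, gamma)
            + alpha * policy_target(reward, next_q, policy_probs, gamma))


def linear_schedule(step, total_steps):
    """Hypothetical schedule moving alpha from 0 to 1 over training."""
    return min(1.0, max(0.0, step / total_steps))


next_q = [1.0, 3.0]
probs = [0.5, 0.5]
early = annealed_target(1.0, next_q, probs, 0.9, linear_schedule(0, 100))
late = annealed_target(1.0, next_q, probs, 0.9, linear_schedule(100, 100))
```

Early in training the target follows the more optimistic optimality backup, and by the end it coincides with the on-policy backup, which is the qualitative behaviour the abstract describes.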
nan
Article 1636
Title@2025-06-06 (5): Preference Learning for AI Alignment: a Causal Perspective
Title: Preference Learning for AI Alignment: a Causal Perspective | Bevorzugtes Lernen für die AI-Ausrichtung: eine kausale Perspektive | AI一致的优先学习:从因果关系角度看问题 2506.05967v1 |
Authors (2): Katarzyna Kobalczyk, Mihaela van der Schaar
Reward modelling from preference data is a crucial step in aligning large language models (LLMs) with human values, requiring robust generalisation to novel prompt-response pairs. In this work, we propose to frame this problem in a causal paradigm, providing the rich toolbox of causality to identify the persistent challenges, such as causal misidentification, preference heterogeneity, and confounding due to user-specific factors. Inheriting from the literature of causal inference, we identify key assumptions necessary for reliable generalisation and contrast them with common data collection practices. We illustrate failure modes of naive reward models and demonstrate how causally-inspired approaches can improve model robustness. Finally, we outline desiderata for future research and practices, advocating targeted interventions to address inherent limitations of observational data.
nan
Article 1637
Title@2025-06-06 (5): Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities
Title: Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities | Auf dem Weg zu robusten multimodalen physiologischen Grundlagenmodellen: Umgang mit willkürlich fehlenden Modalitäten | 争取建立强有力的多式多模式生理基金会模型:处理任意失踪方式 2504.19596v2 |
Authors (4): Wei-Bang Jiang, Xi Fu, Yi Ding, Cuntai Guan
Multimodal physiological signals, such as EEG, ECG, EOG, and EMG, are crucial for healthcare and brain-computer interfaces. While existing methods rely on specialized architectures and dataset-specific fusion strategies, they struggle to learn universal representations that generalize across datasets and handle missing modalities at inference time. To address these issues, we propose PhysioOmni, a foundation model for multimodal physiological signal analysis that models both homogeneous and heterogeneous features to decouple multimodal signals and extract generic representations while maintaining compatibility with arbitrary missing modalities. PhysioOmni trains a decoupled multimodal tokenizer, enabling masked signal pre-training via modality-invariant and modality-specific objectives. To ensure adaptability to diverse and incomplete modality combinations, the pre-trained encoders undergo resilient fine-tuning with prototype alignment on downstream datasets. Extensive experiments on four downstream tasks, emotion recognition, sleep stage classification, motor prediction, and mental workload detection, demonstrate that PhysioOmni achieves state-of-the-art performance while maintaining strong robustness to missing modalities. Our code and model weights will be released.
nan
Article 1638
Title@2025-06-06 (5): Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology
Title: Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology | Subspezialitätsspezifisches Stiftungsmodell für intelligente Gastrointestinalpathologie | 智能气胃肠道病理学 2505.21928v2 |
Authors (27): Lianghui Zhu, Xitong Ling, Minxi Ouyang, Xiaoping Liu, Tian Guan, Mingxi Fu, Zhiqiang Cheng, Fanglei Fu, Maomao Zeng, Liming Liu, Song Duan, Qiang Huang, Ying Xiao, Jianming Li, Shanming Lu, Zhenghua Piao, Mingxi Zhu, Yibo Jin, Shan Xu, Qiming He, Yizhi Wang, Junru Cheng, Xuanyu Wang, Luxi Xie, Houqiang Li, Sufang Tian, Yonghong He
Gastrointestinal (GI) diseases represent a clinically significant burden, necessitating precise diagnostic approaches to optimize patient outcomes. Conventional histopathological diagnosis suffers from limited reproducibility and diagnostic variability. To overcome these limitations, we develop Digepath, a specialized foundation model for GI pathology. Our framework introduces a dual-phase iterative optimization strategy combining pretraining with fine-screening, specifically designed to address the detection of sparsely distributed lesion areas in whole-slide images. Digepath is pretrained on over 353 million multi-scale images from 210,043 H&E-stained slides of GI diseases. It attains state-of-the-art performance on 33 out of 34 tasks related to GI pathology, including pathological diagnosis, protein expression status prediction, gene mutation prediction, and prognosis evaluation. We further translate the intelligent screening module for early GI cancer and achieve near-perfect 99.70% sensitivity across nine independent medical institutions. This work not only advances AI-driven precision pathology for GI diseases but also bridges critical gaps in histopathological practice.
nan
Article 1639
Title@2025-06-06 (5): Certification for Differentially Private Prediction in Gradient-Based Training
Title: Certification for Differentially Private Prediction in Gradient-Based Training | Zertifizierung für unterschiedliche private Vorhersagen im gradient-based Training | 渐进式培训中差别私人预测认证 2406.13433v3 |
Authors (8): Matthew Wicker, Philip Sosnin, Igor Shilov, Adrianna Janik, Mark N. Müller, Yves-Alexandre de Montjoye, Adrian Weller, Calvin Tsay
We study private prediction where differential privacy is achieved by adding noise to the outputs of a non-private model. Existing methods rely on noise proportional to the global sensitivity of the model, often resulting in sub-optimal privacy-utility trade-offs compared to private training. We introduce a novel approach for computing dataset-specific upper bounds on prediction sensitivity by leveraging convex relaxation and bound propagation techniques. By combining these bounds with the smooth sensitivity mechanism, we significantly improve the privacy analysis of private prediction compared to global sensitivity-based approaches. Experimental results across real-world datasets in medical image classification and natural language processing demonstrate that our sensitivity bounds can be orders of magnitude tighter than global sensitivity. Our approach provides a strong basis for the development of novel privacy-preserving technologies.
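To make the role of sensitivity concrete, the following is a minimal sketch of output perturbation with the plain Laplace mechanism, where the noise scale is sensitivity divided by the privacy budget. It is an illustration only and not the paper's method: the function names and the numeric values are assumptions, and plugging a dataset-specific bound directly into the Laplace mechanism does not by itself yield a differential privacy guarantee, which is why the abstract pairs its bounds with the smooth sensitivity mechanism.

```python
import random


def noise_scale(sensitivity, epsilon):
    """Laplace scale b = sensitivity / epsilon (larger sensitivity -> more noise)."""
    if sensitivity < 0 or epsilon <= 0:
        raise ValueError("sensitivity must be >= 0 and epsilon must be > 0")
    return sensitivity / epsilon


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_prediction(score, sensitivity, epsilon, rng):
    """Release a model output with Laplace noise calibrated to `sensitivity`."""
    return score + laplace_noise(noise_scale(sensitivity, epsilon), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical numbers: a loose global bound versus a much tighter bound.
    for label, sensitivity in [("global", 1.0), ("tighter", 0.01)]:
        released = private_prediction(0.8, sensitivity, epsilon=0.5, rng=rng)
        print(label, noise_scale(sensitivity, 0.5), released)
```

The only point of the sketch is that the noise scale shrinks linearly with the sensitivity value, so a bound that is orders of magnitude tighter translates into correspondingly less noise at the same privacy budget.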
nan
Article 1640
Title@2025-06-06 (5): Diving into Self-Evolving Training for Multimodal Reasoning
Title: Diving into Self-Evolving Training for Multimodal Reasoning | Eintauchen in das Selbst-Evolving Training für multimodale Vernunft | 跳入多模式理由自我发展培训 2412.17451v3 |
Authors (6): Wei Liu, Junlong Li, Xiwen Zhang, Fan Zhou, Yu Cheng, Junxian He
Self-evolving training, where models iteratively learn from their own outputs, has emerged as a key approach for complex reasoning tasks, addressing the scarcity of high-quality chain-of-thought data. However, its effectiveness in multimodal reasoning, a domain more intricate than text-only reasoning, remains underexplored, and the understanding of critical factors in this training paradigm remains limited. Furthermore, a central challenge for this training method is performance saturation, which impedes further improvements and scalability. Inspired by reinforcement learning (RL), in this paper, we reframe self-evolving training for multimodal reasoning through the lens of RL, identifying three pivotal factors: Training Method, Reward Model, and Prompt Variation. Through systematic analysis, we establish relatively optimal design principles that significantly enhance multimodal reasoning capabilities. Moreover, delving deeper into training dynamics, we uncover the roots of saturation and propose a new automatic balancing mechanism to mitigate this limitation. Building on these insights, we propose M-STAR (Multimodal Self-evolving Training for Reasoning), a framework that achieves consistent performance gains across models of varying sizes and diverse benchmarks. All resources are made publicly available at https://mstar-lmm.github.io.
nan
Article 1641
Title@2025-06-06 (5): Applying XAI based unsupervised knowledge discovering for Operation modes in a WWTP. A real case: AQUAVALL WWTP
Title: Applying XAI based unsupervised knowledge discovering for Operation modes in a WWTP. A real case: AQUAVALL WWTP | Anwendung von XAI-basierten unüberwachten Kenntnissen für Betriebsmodi in einem WWTP. Ein echter Fall: AQUAVALL WWTP | 在WWTP中应用基于 XAI 的未经监督的操作模式知识发现 XAI 。 一个真实案例: AQUAVALL WWTP 2506.05958v1 |
Authors (5): Alicia Beneyto-Rodriguez, Gregorio I. Sainz-Palmero, Marta Galende-Hernández, María J. Fuente, José M. Cuenca
Water reuse is a key point when fresh water is a commodity in ever greater demand, but which is also becoming ever less available. Furthermore, the return of clean water to its natural environment is also mandatory. Therefore, wastewater treatment plants (WWTPs) are essential in any policy focused on these serious challenges. WWTPs are complex facilities which need to operate at their best to achieve their goals. Nowadays, they are largely monitored, generating large databases of historical data concerning their functioning over time. All this implies a large amount of embedded information which is not usually easy for plant managers to assimilate, correlate and understand; in other words, for them to know the global operation of the plant at any given time. At this point, intelligent and Machine Learning (ML) approaches can support that need, managing all the data and translating them into manageable, interpretable and explainable knowledge about how the WWTP is operating at a glance. Here, an eXplainable Artificial Intelligence (XAI) based methodology is proposed and tested for a real WWTP, in order to extract explainable service knowledge concerning the operation modes of the WWTP managed by AQUAVALL, which is the public service in charge of the integral water cycle in the City Council of Valladolid (Castilla y León, Spain). By applying well-known approaches of XAI and ML focused on the challenge of WWTP, it has been possible to summarize a large number of historical databases through a few explained operation modes of the plant in a low-dimensional data space, showing the variables and facility units involved in each case.
nan
Article 1642
Title@2025-06-06 (5): Inferring entropy production in many-body systems using nonequilibrium MaxEnt
Title: Inferring entropy production in many-body systems using nonequilibrium MaxEnt | Ableitung der Entropieproduktion in vielen Körpersystemen unter Verwendung von Nonequilibrium MaxEnt | 在使用无平衡的MaxEnt的多机体系统中,在多机体系统中产生导导导导酶 2505.10444v2 |
Authors (3): Miguel Aguilera, Sosuke Ito, Artemy Kolchinsky
We propose a method for inferring entropy production (EP) in high-dimensional stochastic systems, including many-body systems and non-Markovian systems with long memory. Standard techniques for estimating EP become intractable in such systems due to computational and statistical limitations. We infer trajectory-level EP and lower bounds on average EP by exploiting a nonequilibrium analogue of the Maximum Entropy principle, along with convex duality. Our approach uses only samples of trajectory observables (such as spatiotemporal correlation functions). It does not require reconstruction of high-dimensional probability distributions or rate matrices, nor any special assumptions such as discrete states or multipartite dynamics. It may be used to compute a hierarchical decomposition of EP, reflecting contributions from different kinds of interactions, and it has an intuitive physical interpretation as a thermodynamic uncertainty relation. We demonstrate its numerical performance on a disordered nonequilibrium spin model with 1000 spins and a large neural spike-train dataset.
nan
Article 1643
Title@2025-06-06 (5): Learning Deterministic Policies with Policy Gradients in Constrained Markov Decision Processes
Title: Learning Deterministic Policies with Policy Gradients in Constrained Markov Decision Processes | Deterministische Politik mit politischen Gradienten in eingeschränkten Markov-Entscheidungsprozessen lernen | 在受约束的Markov决策过程中,学习具有政策分级的政策确定性政策 2506.05953v1 |
Authors (5): Alessandro Montenegro, Leonardo Cesani, Marco Mussi, Matteo Papini, Alberto Maria Metelli
Constrained Reinforcement Learning (CRL) addresses sequential decision-making problems where agents are required to achieve goals by maximizing the expected return while meeting domain-specific constraints. In this setting, policy-based methods are widely used thanks to their advantages when dealing with continuous-control problems. These methods search in the policy space with an action-based or a parameter-based exploration strategy, depending on whether they learn the parameters of a stochastic policy or those of a stochastic hyperpolicy. We introduce an exploration-agnostic algorithm, called C-PG, which enjoys global last-iterate convergence guarantees under gradient domination assumptions. Furthermore, under specific noise models where the (hyper)policy is expressed as a stochastic perturbation of the actions or of the parameters of an underlying deterministic policy, we additionally establish global last-iterate convergence guarantees of C-PG to the optimal deterministic policy. This holds when learning a stochastic (hyper)policy and subsequently switching off the stochasticity at the end of training, thereby deploying a deterministic policy. Finally, we empirically validate both the action-based (C-PGAE) and parameter-based (C-PGPE) variants of C-PG on constrained control tasks, and compare them against state-of-the-art baselines, demonstrating their effectiveness, in particular when deploying deterministic policies after training.
nan
Article 1644
Title@2025-06-06 (5): How to explain grokking
Title: How to explain grokking | Wie man Grokking erklärt | 如何解释怪语 2412.18624v3 |
Authors (1): S. V. Kozyrev
An explanation of grokking (delayed generalization) in learning is given by modeling grokking with stochastic gradient Langevin dynamics (Brownian motion) and applying ideas from thermodynamics.
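For readers unfamiliar with the dynamics named above, here is a minimal sketch of a Langevin update on a toy one-dimensional loss: a gradient step plus Gaussian noise whose variance is set by a temperature. The quadratic loss, the step size, the temperature values, and the function names are all assumptions for illustration; the sketch uses exact rather than minibatch gradients and does not reproduce the paper's thermodynamic analysis.

```python
import math
import random


def sgld_step(theta, grad, lr, temperature, rng):
    """One Langevin update: theta - lr * grad + sqrt(2 * lr * T) * N(0, 1)."""
    noise = rng.gauss(0.0, 1.0)
    return theta - lr * grad + math.sqrt(2.0 * lr * temperature) * noise


def run_sgld(grad_fn, theta0, lr, temperature, steps, seed=0):
    """Iterate the update for a fixed number of steps and return the final parameter."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(steps):
        theta = sgld_step(theta, grad_fn(theta), lr, temperature, rng)
    return theta


if __name__ == "__main__":
    # Toy loss L(theta) = (theta - 3)^2 / 2, so the gradient is theta - 3.
    def grad(theta):
        return theta - 3.0

    print(run_sgld(grad, theta0=0.0, lr=0.1, temperature=0.0, steps=200))
    print(run_sgld(grad, theta0=0.0, lr=0.1, temperature=0.5, steps=200))
```

At zero temperature the update reduces to plain gradient descent; at positive temperature the parameter keeps fluctuating around the minimum, which is the Brownian-motion picture the abstract refers to.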
nan
Article 1645
Title@2025-06-06 (5): GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression
Title: GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression | GRASP: Redundante Ebenen durch adaptive Singular-Parameter für effiziente Modellkompression ersetzen | GRASP: 用高效模型压缩的适应性单质参数取代冗余层 2501.00339v3 |
Authors (6): Kainan Liu, Yong Zhang, Ning Cheng, Zhitao Li, Shaojun Wang, Jing Xiao
Recent studies have demonstrated that many layers are functionally redundant in large language models (LLMs), enabling model compression by removing these layers to reduce inference cost. While such approaches can improve efficiency, indiscriminate layer pruning often results in significant performance degradation. In this paper, we propose GRASP (Gradient-based Retention of Adaptive Singular Parameters), a novel compression framework that mitigates this issue by preserving sensitivity-aware singular values. Unlike direct layer pruning, GRASP leverages gradient-based attribution on a small calibration dataset to adaptively identify and retain critical singular components. By replacing redundant layers with only a minimal set of parameters, GRASP achieves efficient compression while maintaining strong performance with minimal overhead. Experiments across multiple LLMs show that GRASP consistently outperforms existing compression methods, achieving 90% of the original model’s performance under a 20% compression ratio.
nan
Article 1646
Title@2025-06-06 (5): BARK: A Fully Bayesian Tree Kernel for Black-box Optimization
Title: BARK: A Fully Bayesian Tree Kernel for Black-box Optimization | BARK: Ein vollbayesischer Baumkernel für Black-Box-Optimierung | 用于黑盒优化的全巴伊西亚树核心圈 2503.05574v2 |
Authors (5): Toby Boyne, Jose Pablo Folch, Robert M Lee, Behrang Shafei, Ruth Misener
We perform Bayesian optimization using a Gaussian process perspective on Bayesian Additive Regression Trees (BART). Our BART Kernel (BARK) uses tree agreement to define a posterior over piecewise-constant functions, and we explore the space of tree kernels using a Markov chain Monte Carlo approach. Where BART only samples functions, the resulting BARK model obtains samples of Gaussian processes defining distributions over functions, which allow us to build acquisition functions for Bayesian optimization. Our tree-based approach enables global optimization over the surrogate, even for mixed-feature spaces. Moreover, where many previous tree-based kernels provide uncertainty quantification over function values, our sampling scheme captures uncertainty over the tree structure itself. Our experiments show the strong performance of BARK on both synthetic and applied benchmarks, due to the combination of our fully Bayesian surrogate and the optimization procedure.
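The notion of tree agreement can be sketched as follows: two inputs are considered similar to the extent that the trees of an ensemble route them to the same leaf. This is one common way to build a kernel from trees and is shown only as an assumption-laden illustration; the nested-tuple tree encoding and the function names are hypothetical, and the paper's exact kernel and its MCMC sampling over tree structures are not reproduced here.

```python
def leaf_id(tree, x):
    """Route x through a tree given as nested tuples and return its leaf label.

    A node is either ("leaf", label) or ("split", feature, threshold, left, right).
    """
    node = tree
    while node[0] == "split":
        _, feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node[1]


def tree_agreement_kernel(trees, x1, x2):
    """Fraction of trees in which x1 and x2 fall into the same leaf."""
    if not trees:
        raise ValueError("need at least one tree")
    agree = sum(leaf_id(tree, x1) == leaf_id(tree, x2) for tree in trees)
    return agree / len(trees)


if __name__ == "__main__":
    # Two hypothetical depth-1 trees, splitting on feature 0 and feature 1.
    trees = [
        ("split", 0, 0.5, ("leaf", "a"), ("leaf", "b")),
        ("split", 1, 0.5, ("leaf", "c"), ("leaf", "d")),
    ]
    print(tree_agreement_kernel(trees, (0.2, 0.2), (0.3, 0.9)))
```

Because the kernel value depends only on leaf membership, the implied functions are piecewise constant, and the same code works unchanged for categorical features if the split test is replaced accordingly.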
nan
Article 1647
Title@2025-06-06 (5): AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML
Title: AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | AutoML-Agent: Ein Multi-Agent-LLM-Framework für Full-Pipeline-AutoML | 自动MAL- Agency: 全Pipeline 自动MLM 多边代理LLM 框架 2410.02958v2 |
Authors (3): Patara Trirat, Wonyong Jeong, Sung Ju Hwang
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline, such as optimal model search and hyperparameter tuning. Existing AutoML systems often require technical expertise to set up complex tools, which is in general time-consuming and requires a large amount of human effort. Therefore, recent works have started exploiting large language models (LLMs) to lessen such burden and increase the usability of AutoML frameworks via a natural language interface, allowing non-expert users to build their data-driven solutions. These methods, however, are usually designed only for a particular process in the AI development pipeline and do not efficiently use the inherent capacity of the LLMs. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML, i.e., from data retrieval to model deployment. AutoML-Agent takes users’ task descriptions, facilitates collaboration between specialized LLM agents, and delivers deployment-ready models. Unlike existing work, instead of devising a single plan, we introduce a retrieval-augmented planning strategy to enhance exploration to search for more optimal plans. We also decompose each plan into sub-tasks (e.g., data preprocessing and neural network design), each of which is solved by a specialized agent that we build via prompting and that executes in parallel, making the search process more efficient. Moreover, we propose a multi-stage verification to verify executed results and guide the code generation LLM in implementing successful solutions. Extensive experiments on seven downstream tasks using fourteen datasets show that AutoML-Agent achieves a higher success rate in automating the full AutoML process, yielding systems with good performance throughout the diverse domains.
nan
Article 1648
Title@2025-06-06 (5): Additive decomposition of one-dimensional signals using Transformers
Title: Additive decomposition of one-dimensional signals using Transformers | Additive Zersetzung von eindimensionalen Signalen mit Transformern | 使用变压器的单维信号的添加分解 2506.05942v1 |
Authors (4): Samuele Salti, Andrea Pinto, Alessandro Lanza, Serena Morigi
One-dimensional signal decomposition is a well-established and widely used technique across various scientific fields. It serves as a highly valuable pre-processing step for data analysis. While traditional decomposition techniques often rely on mathematical models, recent research suggests that applying the latest deep learning models to this problem presents an exciting, unexplored area with promising potential. This work presents a novel method for the additive decomposition of one-dimensional signals. We leverage the Transformer architecture to decompose signals into their constituent components: piece-wise constant, smooth (low-frequency oscillatory), textured (high-frequency oscillatory), and a noise component. Our model, trained on synthetic data, achieves excellent accuracy in modeling and decomposing input signals from the same distribution, as demonstrated by the experimental results.
nan
Article 1649
Title@2025-06-06 (5): SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
Title: SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models | SafeAuto: Knowledge-Enhanced Safe Autonomous Driving mit multimodalen Foundation-Modellen | SafeAuto:利用多模式基金会模式加强知识增强安全自主驾驶 2503.00211v2 |
Authors (6): Jiawei Zhang, Xuan Yang, Taiqi Wang, Yu Yao, Aleksandr Petiushko, Bo Li
Traditional autonomous driving systems often struggle to connect high-level reasoning with low-level control, leading to suboptimal and sometimes unsafe behaviors. Recent advances in multimodal large language models (MLLMs), which process both visual and textual data, offer an opportunity to unify perception and reasoning. However, effectively embedding precise safety knowledge into MLLMs for autonomous driving remains a significant challenge. To address this, we propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge. First, we introduce a Position-Dependent Cross-Entropy (PDCE) loss to improve low-level control signal predictions when values are represented as text. Second, to explicitly integrate safety knowledge, we develop a reasoning component that translates traffic rules into first-order logic (e.g., “red light $\implies$ stop”) and embeds them into a probabilistic graphical model (e.g., Markov Logic Network) to verify predicted actions using recognized environmental attributes. Additionally, our Multimodal Retrieval-Augmented Generation (RAG) model leverages video, control signals, and environmental attributes to learn from past driving experiences. Integrating PDCE, MLN, and Multimodal RAG, SafeAuto outperforms existing baselines across multiple datasets, enabling more accurate, reliable, and safer autonomous driving. The code is available at https://github.com/AI-secure/SafeAuto.
nan
Article 1650
Title@2025-06-06 (5): Comparative Analysis of Modern Machine Learning Models for Retail Sales Forecasting
Title: Comparative Analysis of Modern Machine Learning Models for Retail Sales Forecasting | Vergleichende Analyse von modernen Machine Learning-Modellen für die Prognose des Einzelhandels | 零售销售预测现代机器学习模型比较分析 2506.05941v1 |
Authors (4): Luka Hobor, Mario Brcic, Lidija Polutnik, Ante Kapetanovic
Accurate forecasting is key for all business planning. When estimated sales are too high, brick-and-mortar retailers may incur higher costs due to unsold inventories, higher labor and storage space costs, etc. On the other hand, when forecasts underestimate the level of sales, firms experience lost sales, shortages, and impact on the reputation of the retailer in their relevant market. Accurate forecasting presents a competitive advantage for companies. It facilitates the achievement of revenue and profit goals and execution of pricing strategy and tactics. In this study, we provide an exhaustive assessment of the forecasting models applied to a high-resolution brick-and-mortar retail dataset. Our forecasting framework addresses the problems found in retail environments, including intermittent demand, missing values, and frequent product turnover. We compare tree-based ensembles (such as XGBoost and LightGBM) and state-of-the-art neural network architectures (including N-BEATS, NHITS, and the Temporal Fusion Transformer) across various experimental settings. Our results show that localized modeling strategies, especially those using tree-based models on individual groups with non-imputed data, consistently deliver superior forecasting accuracy and computational efficiency. In contrast, neural models benefit from advanced imputation methods, yet still fall short in handling the irregularities typical of physical retail data. These results further practical understanding for model selection in retail environments and highlight the significance of data preprocessing to improve forecast performance.
nan
Article 1651
Title@2025-06-06 (5): Exponential Family Variational Flow Matching for Tabular Data Generation
Title: Exponential Family Variational Flow Matching for Tabular Data Generation | Exponentieller Familien-Variationsfluss passend zur tabellarischen Datengenerierung | 用于制表数据的家庭流动量变化 2506.05940v1 |
Authors (3): Andrés Guzmán-Cordero, Floor Eijkelboom, Jan-Willem van de Meent
While denoising diffusion and flow matching have driven major advances in generative modeling, their application to tabular data remains limited, despite its ubiquity in real-world applications. To this end, we develop TabbyFlow, a variational Flow Matching (VFM) method for tabular data generation. To apply VFM to data with mixed continuous and discrete features, we introduce Exponential Family Variational Flow Matching (EF-VFM), which represents heterogeneous data types using a general exponential family distribution. We hereby obtain an efficient, data-driven objective based on moment matching, enabling principled learning of probability paths over mixed continuous and discrete variables. We also establish a connection between variational flow matching and generalized flow matching objectives based on Bregman divergences. Evaluation on tabular data benchmarks demonstrates state-of-the-art performance compared to baselines.
nan
Article 1652
Title@2025-06-06 (5): Quantifying Adversarial Uncertainty in Evidential Deep Learning using Conflict Resolution
Title: Quantifying Adversarial Uncertainty in Evidential Deep Learning using Conflict Resolution | Quantifizierung von Unsicherheiten im Evidential Deep Learning mittels Konfliktlösung | 以解决冲突的方式量化证据深层学习中存在的逆差不确定性 2506.05937v1 |
Authors (3): Charmaine Barker, Daniel Bethell, Simos Gerasimou
Reliability of deep learning models is critical for deployment in high-stakes applications, where out-of-distribution (OOD) or adversarial inputs may lead to detrimental outcomes. Evidential Deep Learning (EDL), an efficient paradigm for uncertainty quantification, models predictions as Dirichlet distributions in a single forward pass. However, EDL is particularly vulnerable to adversarially perturbed inputs, making overconfident errors. Conflict-aware Evidential Deep Learning (C-EDL) is a lightweight post-hoc uncertainty quantification approach that mitigates these issues, enhancing adversarial and OOD robustness without retraining. C-EDL generates diverse, task-preserving transformations per input and quantifies representational disagreement to calibrate uncertainty estimates when needed. C-EDL’s conflict-aware prediction adjustment improves detection of OOD and adversarial inputs, maintaining high in-distribution accuracy and low computational overhead. Our experimental evaluation shows that C-EDL significantly outperforms state-of-the-art EDL variants and competitive baselines, achieving substantial reductions in coverage for OOD data (up to 55%) and adversarial data (up to 90%), across a range of datasets, attack types, and uncertainty metrics.
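As background for how a Dirichlet output yields an uncertainty estimate, the sketch below uses the subjective-logic formulation commonly used in evidential deep learning: non-negative per-class evidence e_k gives concentration alpha_k = e_k + 1, and the uncertainty mass is the number of classes divided by the Dirichlet strength. The function name and the example evidence vectors are assumptions; the sketch covers only this base computation, not C-EDL's input transformations or its conflict-aware adjustment.

```python
def edl_summary(evidence):
    """Turn non-negative class evidence into Dirichlet-based predictions.

    Returns (expected class probabilities, belief masses, uncertainty mass).
    """
    if not evidence or any(e < 0 for e in evidence):
        raise ValueError("evidence must be a non-empty list of non-negative values")
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]      # Dirichlet concentration parameters
    strength = sum(alpha)                    # Dirichlet strength S
    probs = [a / strength for a in alpha]    # expected probability per class
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength     # beliefs and uncertainty sum to 1
    return probs, beliefs, uncertainty


if __name__ == "__main__":
    print(edl_summary([0.0, 0.0, 0.0]))  # no evidence: maximal uncertainty
    print(edl_summary([9.0, 0.0, 0.0]))  # strong evidence for class 0
```

With no evidence the uncertainty mass equals 1 and the expected probabilities are uniform; as evidence accumulates for one class the uncertainty shrinks, which is also why adversarial inputs that elicit spurious evidence can produce overconfident errors.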
nan
Article 1653
Title@2025-06-06 (5): Deep Learning Weather Models for Subregional Ocean Forecasting: A Case Study on the Canary Current Upwelling System
Title: Deep Learning Weather Models for Subregional Ocean Forecasting: A Case Study on the Canary Current Upwelling System | Deep Learning Weather Models for Subregional Ocean Forecasting: Eine Fallstudie über das kanarische aktuelle Auftriebssystem | 次区域海洋预报的深学习气象模型:关于加那利洋流上升系统的个案研究 2505.24429v2 |
Authors (3): Giovanny A. Cuervo-Londoño, Javier Sánchez, Ángel Rodríguez-Santana
Oceanographic forecasting impacts various sectors of society by supporting environmental conservation and economic activities. Based on global circulation models, traditional forecasting methods are computationally expensive and slow, limiting their ability to provide rapid forecasts. Recent advances in deep learning offer faster and more accurate predictions, although these data-driven models are often trained with global data from numerical simulations, which may not reflect reality. The emergence of such models presents great potential for improving ocean prediction at a subregional domain. However, their ability to predict fine-scale ocean processes, like mesoscale structures, remains largely unknown. This work aims to adapt a graph neural network initially developed for global weather forecasting to improve subregional ocean prediction, specifically focusing on the Canary Current upwelling system. The model is trained with satellite data and compared to state-of-the-art physical ocean models to assess its performance in capturing ocean dynamics. Our results show that the deep learning model surpasses traditional methods in precision despite some challenges in upwelling areas. It demonstrated superior performance in reducing RMSE compared to ConvLSTM and the GLORYS reanalysis, particularly in regions with complex oceanic dynamics such as Cape Ghir, Cape Bojador, and Cape Blanc. The model achieved improvements of up to 26.5% relative to ConvLSTM and error reductions of up to 76% in 5-day forecasts compared to the GLORYS reanalysis at these critical locations, highlighting its enhanced capability to capture spatial variability and improve predictive accuracy in complex areas. These findings suggest the viability of adapting meteorological data-driven models for improving subregional medium-term ocean forecasting.
nan
Article 1654
Title@2025-06-06 (5): Machine Learning Predictions for Traffic Equilibria in Road Renovation Scheduling
Title: Machine Learning Predictions for Traffic Equilibria in Road Renovation Scheduling | Machine Learning Vorhersagen für Verkehrsequilibrien in der Road Renovation Scheduling | 道路翻新日程安排交通平衡的机器学习预测 2506.05933v1 |
Authors (4): Robbert Bosch, Wouter van Heeswijk, Patricia Rogetzer, Martijn Mes
Accurately estimating the impact of road maintenance schedules on traffic conditions is important because maintenance operations can substantially worsen congestion if not carefully planned. Reliable estimates allow planners to avoid excessive delays during periods of roadwork. Since the exact increase in congestion is difficult to predict analytically, traffic simulations are commonly used to assess the redistribution of the flow of traffic. However, when applied to long-term maintenance planning involving many overlapping projects and scheduling alternatives, these simulations must be run thousands of times, resulting in a significant computational burden. This paper investigates the use of machine learning-based surrogate models to predict network-wide congestion caused by simultaneous road renovations. We frame the problem as a supervised learning task, using one-hot encodings, engineered traffic features, and heuristic approximations. A range of linear, ensemble-based, probabilistic, and neural regression models is evaluated under an online learning framework in which data progressively becomes available. The experimental results show that the Costliest Subset Heuristic provides a reasonable approximation when limited training data is available, and that most regression models fail to outperform it, with the exception of XGBoost, which achieves substantially better accuracy. In overall performance, XGBoost significantly outperforms alternatives in a range of metrics, most strikingly Mean Absolute Percentage Error (MAPE) and Pinball loss, where it achieves a MAPE of 11% and outperforms the next-best model by 20% and 38% respectively. This modeling approach has the potential to reduce the computational burden of large-scale traffic assignment problems in maintenance planning.
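The two headline metrics can be written down directly. The sketch below gives the usual definitions of Mean Absolute Percentage Error and of the pinball (quantile) loss; the function names and the sample numbers are assumptions for illustration and are not taken from the paper's experiments.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def pinball_loss(actual, predicted, quantile):
    """Average pinball loss of quantile forecasts at level `quantile` in (0, 1)."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must lie strictly between 0 and 1")
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        diff = a - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(quantile * diff, (quantile - 1.0) * diff)
    return total / len(actual)


if __name__ == "__main__":
    print(mape([100.0, 200.0], [110.0, 180.0]))
    print(pinball_loss([10.0], [8.0], quantile=0.9))
    print(pinball_loss([10.0], [12.0], quantile=0.9))
```

At a high quantile level such as 0.9, under-predicting congestion is penalized nine times more heavily than over-predicting it, which is why the pinball loss is informative for the probabilistic models in the comparison.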
nan
Article 1655
Title@2025-06-06 (5): Progressive Data Dropout: An Embarrassingly Simple Approach to Faster Training
Title: Progressive Data Dropout: An Embarrassingly Simple Approach to Faster Training | Progressive Data Dropout: Ein verblüffend einfacher Ansatz zum schnelleren Training | 渐进数据辍学:快速培训的一个令人尴尬的简单方法 2505.22342v2 |
Authors (7): Shriram M S, Xinyue Hao, Shihao Hou, Yang Lu, Laura Sevilla-Lara, Anurag Arnab, Shreyank N Gowda
The success of the machine learning field has reliably depended on training on large datasets. While effective, this trend comes at an extraordinary cost. This is due to two deeply intertwined factors: the size of models and the size of datasets. While promising research efforts focus on reducing the size of models, the other half of the equation remains fairly mysterious. Indeed, it is surprising that the standard approach to training remains to iterate over and over, uniformly sampling the training dataset. In this paper we explore a series of alternative training paradigms that leverage insights from hard-data-mining and dropout, and that are simple enough to implement and use that they can become the new training standard. The proposed Progressive Data Dropout reduces the number of effective epochs to as little as 12.4% of the baseline. These savings do not come at any cost to accuracy. Surprisingly, the proposed method improves accuracy by up to 4.82%. Our approach requires no changes to model architecture or optimizer, and can be applied across standard training pipelines, thus posing an excellent opportunity for wide adoption. Code can be found here: https://github.com/bazyagami/LearningWithRevision
nan
Article 1656
Title@2025-06-06 (5): Can Masked Autoencoders Also Listen to Birds?
Title: Can Masked Autoencoders Also Listen to Birds? | Können maskierte Autoencoder auch Vögel hören? | 面具汽车人也能听鸟的话吗? 2504.12880v3 |
Authors (6): Lukas Rauch, René Heinrich, Ilyass Moummad, Alexis Joly, Bernhard Sick, Christoph Scholz
Masked Autoencoders (MAEs) have shown competitive results in audio classification by learning rich semantic representations through an efficient self-supervised reconstruction task. However, general-purpose models fail to generalize well when applied directly to fine-grained audio domains. Specifically, bird-sound classification requires distinguishing subtle inter-species differences and managing high intra-species acoustic variability, thereby revealing the performance limitations of general-domain Audio-MAE models. This work demonstrates that bridging this domain gap requires more than domain-specific pretraining data; adapting the entire training pipeline is crucial. We systematically revisit and adapt the pretraining recipe, fine-tuning methods, and frozen feature utilization to bird sounds using BirdSet, a large-scale bioacoustic dataset comparable to AudioSet. Our resulting Bird-MAE achieves new state-of-the-art results in BirdSet’s multi-label classification benchmark. Additionally, we introduce the parameter-efficient prototypical probing, enhancing the utility of frozen MAE representations and closely approaching fine-tuning performance in low-resource settings. Bird-MAE’s prototypical probes outperform linear probing by up to 37%$\text{p}$ in MAP and narrow the gap to fine-tuning to approximately 3.3%$\text{p}$ on average across BirdSet downstream tasks. Bird-MAE also demonstrates robust few-shot capabilities with prototypical probing in our newly established few-shot benchmark on BirdSet, highlighting the potential of tailored self-supervised learning pipelines for fine-grained audio domains.
nan
Article 1657
Title@2025-06-06 (5): GraphGPT: Generative Pre-trained Graph Eulerian Transformer
Title: GraphGPT: Generative Pre-trained Graph Eulerian Transformer | GraphGPT: Generative vortrainierte Graph Eulerian Transformer | 图GGPT: 未经培训的先产生图 欧勒里亚变形器 2401.00529v3 |
Authors (6): Qifang Zhao, Weidong Ren, Tianyu Li, Hong Liu, Xingsheng He, Xiaoxiao Xu
We introduce GraphGPT, a novel self-supervised generative pre-trained model for graph learning based on the Graph Eulerian Transformer (GET). First, we propose GET, which combines a standard transformer encoder or decoder architecture with an innovative graph-to-sequence transformation method. This method converts graphs or sampled subgraphs into sequences of tokens representing nodes, edges, and attributes in a reversible manner using Eulerian paths. We pre-train GET using either of the two self-supervised tasks: next-token prediction (NTP) and scheduled masked-token prediction (SMTP). The pre-trained model is then fine-tuned for downstream tasks such as graph-, edge-, and node-level prediction. Despite its simplicity, GraphGPT achieves performance comparable to or surpassing state-of-the-art methods on multiple large-scale Open Graph Benchmark (OGB) datasets. It demonstrates exceptional results on the molecular property prediction dataset PCQM4Mv2 and the protein-protein interaction dataset ogbl-ppa. Notably, generative pre-training enables scaling GraphGPT to 2 billion parameters while maintaining performance gains - a breakthrough that overcomes the scalability limitations of traditional Graph Neural Networks (GNNs) and prior graph transformers (GTs). To advance research in graph foundation models and facilitate scientific discovery in chemistry, materials science, and related fields, we will release the source code (https://github.com/alibaba/graph-gpt) and pre-trained checkpoints.
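The idea of serializing a graph along an Eulerian path can be sketched with Hierholzer's algorithm, which visits every edge exactly once so that the edge set can be read back from consecutive nodes in the sequence. This is a minimal illustration under stated assumptions, not GraphGPT's tokenizer: the function name is hypothetical, the graph is assumed to be connected and undirected with zero or two odd-degree nodes, and node or edge attributes are ignored.

```python
from collections import defaultdict


def eulerian_path(edges):
    """Return a node sequence that traverses every undirected edge exactly once.

    Assumes a connected graph; raises ValueError if the degree condition fails.
    """
    if not edges:
        return []
    adjacency = defaultdict(list)
    for edge_id, (u, v) in enumerate(edges):
        adjacency[u].append((v, edge_id))
        adjacency[v].append((u, edge_id))
    odd = [node for node in adjacency if len(adjacency[node]) % 2 == 1]
    if len(odd) not in (0, 2):
        raise ValueError("graph has no Eulerian path")
    start = odd[0] if odd else edges[0][0]
    used = [False] * len(edges)
    stack, path = [start], []
    while stack:
        node = stack[-1]
        # Drop edges that were already traversed from their other endpoint.
        while adjacency[node] and used[adjacency[node][-1][1]]:
            adjacency[node].pop()
        if adjacency[node]:
            neighbor, edge_id = adjacency[node].pop()
            used[edge_id] = True
            stack.append(neighbor)
        else:
            path.append(stack.pop())
    return path[::-1]


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant edge 2-3.
    print(eulerian_path([(0, 1), (1, 2), (2, 0), (2, 3)]))
```

Reading consecutive pairs of the returned sequence recovers the original edge list, which is what makes the transformation reversible; graphs that violate the degree condition would need extra handling, such as duplicating edges, before this sketch applies.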
nan
Article 1658
Title@2025-06-06 (5): UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning
Title: UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning | UDora: Ein einheitliches rotes Teaming-Framework gegen LLM-Agenten durch Dynamisches Entführen ihrer eigenen Vernunft | UDora:通过动态劫持LLM代理商的自身理由,针对这些代理商的统一红色团队框架 2503.01908v2 |
Authors (3): Jiawei Zhang, Shuang Yang, Bo Li
Large Language Model (LLM) agents equipped with external tools have become increasingly powerful for complex tasks such as web shopping, automated email replies, and financial trading. However, these advancements amplify the risks of adversarial attacks, especially when agents can access sensitive external functionalities. Nevertheless, manipulating LLM agents into performing targeted malicious actions or invoking specific tools remains challenging, as these agents extensively reason or plan before executing final actions. In this work, we present UDora, a unified red teaming framework designed for LLM agents that dynamically hijacks the agent’s reasoning processes to compel malicious behavior. Specifically, UDora first generates the model’s reasoning trace for the given task, then automatically identifies optimal points within this trace to insert targeted perturbations. The resulting perturbed reasoning is then used as a surrogate response for optimization. By iteratively applying this process, the LLM agent will then be induced to undertake designated malicious actions or to invoke specific malicious tools. Our approach demonstrates superior effectiveness compared to existing methods across three LLM agent datasets. The code is available at https://github.com/AI-secure/UDora.
nan
Article 1659
Title@2025-06-06 (5): Over-PINNs: Enhancing Physics-Informed Neural Networks via Higher-Order Partial Derivative Overdetermination of PDEs
Title: Over-PINNs: Enhancing Physics-Informed Neural Networks via Higher-Order Partial Derivative Overdetermination of PDEs | Über-PINNs: Verbesserung physikinformierter neuraler Netzwerke durch höhere Ordnung Partial Derivative Überbestimmung von PDEs | 超标准PINNs:通过高端部分衍生因素超过PDE的确定,加强物理内化神经网络 2506.05918v1 |
Authors (4): Wenxuan Huo, Qiang He, Gang Zhu, Weifeng Huang
Partial differential equations (PDEs) serve as the cornerstone of mathematical physics. In recent years, Physics-Informed Neural Networks (PINNs) have significantly reduced the dependence on large datasets by embedding physical laws directly into the training of neural networks. However, when dealing with complex problems, the accuracy of PINNs still has room for improvement. To address this issue, we introduce the Over-PINNs framework, which leverages automatic differentiation (AD) to generate higher-order auxiliary equations that impose additional physical constraints. These equations are incorporated as extra loss terms in the training process, effectively enhancing the model’s ability to capture physical information through an “overdetermined” approach. Numerical results illustrate that this method exhibits strong versatility in solving various types of PDEs. It achieves a significant improvement in solution accuracy without incurring substantial additional computational costs.
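As a worked illustration of the "overdetermined" idea, take the 1D heat equation as a stand-in PDE (an assumption; the abstract does not name its test problems). Differentiating the residual gives auxiliary equations that any sufficiently smooth solution must also satisfy, and each can enter the training objective as an extra loss term. The weights $\lambda_i$ and the choice of which derivatives to include are hypothetical.

```latex
\begin{align}
  \mathcal{R}[u] &= u_t - \alpha\, u_{xx} = 0, \\
  \partial_x \mathcal{R}[u] &= u_{tx} - \alpha\, u_{xxx} = 0, \\
  \partial_t \mathcal{R}[u] &= u_{tt} - \alpha\, u_{xxt} = 0, \\
  \mathcal{L}(\theta) &= \mathcal{L}_{\text{bc/ic}}
    + \lambda_0 \lVert \mathcal{R}[u_\theta] \rVert^2
    + \lambda_1 \lVert \partial_x \mathcal{R}[u_\theta] \rVert^2
    + \lambda_2 \lVert \partial_t \mathcal{R}[u_\theta] \rVert^2 .
\end{align}
```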
nan
Article 1660
Title@2025-06-06 (5): Rethinking Semi-supervised Segmentation Beyond Accuracy: Reliability and Robustness
Title: Rethinking Semi-supervised Segmentation Beyond Accuracy: Reliability and Robustness | Halbüberwachte Segmentierung jenseits der Genauigkeit neu denken: Zuverlässigkeit und Robustheit | 重新思考超越准确度的半监督分割:可靠性和强性 2506.05917v1 |
Authors (3): Steven Landgraf, Markus Hillemann, Markus Ulrich
Semantic segmentation is critical for scene understanding but demands costly pixel-wise annotations, attracting increasing attention to semi-supervised approaches to leverage abundant unlabeled data. While semi-supervised segmentation is often promoted as a path toward scalable, real-world deployment, it is astonishing that current evaluation protocols exclusively focus on segmentation accuracy, entirely overlooking reliability and robustness. These qualities, which ensure consistent performance under diverse conditions (robustness) and well-calibrated model confidences as well as meaningful uncertainties (reliability), are essential for safety-critical applications like autonomous driving, where models must handle unpredictable environments and avoid sudden failures at all costs. To address this gap, we introduce the Reliable Segmentation Score (RSS), a novel metric that combines predictive accuracy, calibration, and uncertainty quality measures via a harmonic mean. RSS penalizes deficiencies in any of its components, providing an easy and intuitive way of holistically judging segmentation models. Comprehensive evaluations of UniMatchV2 against its predecessor and a supervised baseline show that semi-supervised methods often trade reliability for accuracy. While out-of-domain evaluations demonstrate UniMatchV2’s robustness, they further expose persistent reliability shortcomings. We advocate for a shift in evaluation protocols toward more holistic metrics like RSS to better align semi-supervised learning research with real-world deployment needs.
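To see why a harmonic mean penalizes a deficiency in any single component, consider the sketch below. It assumes three component scores already normalized to (0, 1], with higher being better; the abstract does not give the exact definitions or normalization used by RSS, so the inputs here are purely illustrative.

```python
def harmonic_mean(scores):
    """Harmonic mean of scores in (0, 1]; returns 0.0 if any score is non-positive."""
    if any(s <= 0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)


# Hypothetical (accuracy, calibration, uncertainty-quality) scores.
balanced = harmonic_mean([0.8, 0.8, 0.8])
skewed = harmonic_mean([0.95, 0.95, 0.5])
```

Both inputs have the same arithmetic mean (0.8), but the skewed one scores lower under the harmonic mean because its weakest component dominates.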
nan
Article 1661
Title@2025-06-06 (5): Approximating Latent Manifolds in Neural Networks via Vanishing Ideals
Title: Approximating Latent Manifolds in Neural Networks via Vanishing Ideals | Approximierende latente Manifolds in neuralen Netzen über Vanishing Ideals | 借助消失的理念在神经网络中接近的中点修饰 2502.15051v2 |
Authors (4): Nico Pelleriti, Max Zimmer, Elias Wirth, Sebastian Pokutta
Deep neural networks have reshaped modern machine learning by learning powerful latent representations that often align with the manifold hypothesis: high-dimensional data lie on lower-dimensional manifolds. In this paper, we establish a connection between manifold learning and computational algebra by demonstrating how vanishing ideals can characterize the latent manifolds of deep networks. To that end, we propose a new neural architecture that (i) truncates a pretrained network at an intermediate layer, (ii) approximates each class manifold via polynomial generators of the vanishing ideal, and (iii) transforms the resulting latent space into linearly separable features through a single polynomial layer. The resulting models have significantly fewer layers than their pretrained baselines, while maintaining comparable accuracy, achieving higher throughput, and utilizing fewer parameters. Furthermore, drawing on spectral complexity analysis, we derive sharper theoretical guarantees for generalization, showing that our approach can in principle offer tighter bounds than standard deep networks. Numerical experiments confirm the effectiveness and efficiency of the proposed approach.
nan
Article 1662
Title@2025-06-06 (5): DeviceScope: An Interactive App to Detect and Localize Appliance Patterns in Electricity Consumption Time Series
Title: DeviceScope: An Interactive App to Detect and Localize Appliance Patterns in Electricity Consumption Time Series | DeviceScope: Eine interaktive App zur Erkennung und Lokalisierung von Appliance-Mustern in der Stromverbrauchs-Zeitreihe | 设备范围:在电力消耗时间序列中检测和本地化应用程序模式的交互式应用程序 2506.05912v1 |
Authors (4): Adrien Petralia, Paul Boniol, Philippe Charpentier, Themis Palpanas
In recent years, electricity suppliers have installed millions of smart meters worldwide to improve the management of the smart grid system. These meters collect a large amount of electrical consumption data to produce valuable information to help consumers reduce their electricity footprint. However, having non-expert users (e.g., consumers or sales advisors) understand these data and derive usage patterns for different appliances has become a significant challenge for electricity suppliers because these data record the aggregated behavior of all appliances. At the same time, ground-truth labels (which could train appliance detection and localization models) are expensive to collect and extremely scarce in practice. This paper introduces DeviceScope, an interactive tool designed to facilitate understanding smart meter data by detecting and localizing individual appliance patterns within a given time period. Our system is based on CamAL (Class Activation Map-based Appliance Localization), a novel weakly supervised approach for appliance localization that only requires the knowledge of the existence of an appliance in a household to be trained. This paper appeared in ICDE 2025.
nan
Article 1663
Title@2025-06-06 (5): CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
Title: CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness | CAPability: Ein umfassender Visual Caption Benchmark zur Bewertung von Korrektheit und Gründlichkeit | 能力:评估正确性和彻底性的综合视觉能力基准 2502.14914v3 |
Authors (12): Zhihang Liu, Chen-Wei Xie, Bin Wen, Feiwu Yu, Jixuan Chen, Pandeng Li, Boqiang Zhang, Nianzu Yang, Yinglu Li, Zuan Gao, Yun Zheng, Hongtao Xie
Visual captioning benchmarks have become outdated with the emergence of modern multimodal large language models (MLLMs), as the brief ground-truth sentences and traditional metrics fail to assess detailed captions effectively. While recent benchmarks attempt to address this by focusing on keyword extraction or object-centric evaluation, they remain limited to vague-view or object-view analyses and incomplete visual element coverage. In this paper, we introduce CAPability, a comprehensive multi-view benchmark for evaluating visual captioning across 12 dimensions spanning six critical views. We curate nearly 11K human-annotated images and videos with visual element annotations to evaluate the generated captions. CAPability stably assesses both the correctness and thoroughness of captions with “precision” and “hit” metrics. By converting annotations to QA pairs, we further introduce a heuristic metric, “know but cannot tell” ($K\bar{T}$), indicating a significant performance gap between QA and caption capabilities. Our work provides a holistic analysis of MLLMs’ captioning abilities, as we identify their strengths and weaknesses across various dimensions, guiding future research to enhance specific aspects of their capabilities.
nan
Article 1664
Title@2025-06-06 (5): Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants
Title: Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants | Brauchen wir Strategien zur Neuausrichtung? Eine theoretische und empirische Studie rund um SMOTE und deren Varianten | 我们需要重新平衡战略吗? 围绕SMOTE及其变体进行理论和经验研究。 2402.03819v5 |
Authors (3): Abdoulaye Sakho, Emmanuel Malherbe, Erwan Scornet
Synthetic Minority Oversampling Technique (SMOTE) is a common rebalancing strategy for handling imbalanced tabular data sets. However, few works analyze SMOTE theoretically. In this paper, we derive several non-asymptotic upper bounds on the SMOTE density. From these results, we prove that SMOTE (with default parameter) tends to copy the original minority samples asymptotically. We confirm and illustrate this first theoretical behavior empirically on a real-world data set. Furthermore, we prove that the SMOTE density vanishes near the boundary of the support of the minority class distribution. We then adapt SMOTE based on our theoretical findings to introduce two new variants. These strategies are compared on 13 tabular data sets with 10 state-of-the-art rebalancing procedures, including deep generative and diffusion models. One of our key findings is that, for most data sets, applying no rebalancing strategy is competitive in terms of predictive performance, whether with LightGBM, tuned random forests, or logistic regression. However, when the imbalance ratio is artificially augmented, one of our two modifications of SMOTE leads to promising predictive performance compared to SMOTE and other state-of-the-art strategies.
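For readers unfamiliar with the sampling step being analyzed, here is a minimal sketch of standard SMOTE generation: pick a minority point, pick one of its k nearest minority neighbors, and interpolate uniformly between them. The function name is hypothetical, k=5 is the commonly used default, and the paper's two new variants are not reproduced here.

```python
import math
import random


def smote_sample(minority, k=5, rng=None):
    """Draw one synthetic point on the segment between a minority point and a neighbor."""
    rng = rng or random.Random()
    x = rng.choice(minority)
    # k nearest minority neighbors of x, excluding x itself.
    others = [p for p in minority if p is not x]
    neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
    z = rng.choice(neighbors)
    w = rng.random()  # uniform interpolation weight in [0, 1)
    return tuple(a + w * (b - a) for a, b in zip(x, z))


# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
rng = random.Random(0)
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(100)]
```

Every synthetic point lies on a segment between two existing minority points, so samples never leave the convex hull of the minority class. That is consistent with the abstract's claim that the SMOTE density vanishes near the boundary of the support.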
nan
Article 1665
Title@2025-06-06 (5): A Driving Regime-Embedded Deep Learning Framework for Modeling Intra-Driver Heterogeneity in Multi-Scale Car-Following Dynamics
Title: A Driving Regime-Embedded Deep Learning Framework for Modeling Intra-Driver Heterogeneity in Multi-Scale Car-Following Dynamics | Ein Driving Regime-embedded Deep Learning Framework zur Modellierung von Intra-Driver Heterogenität in multi-Scale Car-Following Dynamics | 用于模拟多规模汽车跟踪动态中多规模汽车内部异质性模型的驱动系统(以制度为床位的深学习框架) 2506.05902v1 |
Authors (6): Shirui Zhou, Jiying Yan, Junfang Tian, Tao Wang, Yongfu Li, Shiquan Zhong
A fundamental challenge in car-following modeling lies in accurately representing the multi-scale complexity of driving behaviors, particularly the intra-driver heterogeneity where a single driver’s actions fluctuate dynamically under varying conditions. While existing models, both conventional and data-driven, address behavioral heterogeneity to some extent, they often emphasize inter-driver heterogeneity or rely on simplified assumptions, limiting their ability to capture the dynamic heterogeneity of a single driver under different driving conditions. To address this gap, we propose a novel data-driven car-following framework that systematically embeds discrete driving regimes (e.g., steady-state following, acceleration, cruising) into vehicular motion predictions. Leveraging high-resolution traffic trajectory datasets, the proposed hybrid deep learning architecture combines Gated Recurrent Units for discrete driving regime classification with Long Short-Term Memory networks for continuous kinematic prediction, unifying discrete decision-making processes and continuous vehicular dynamics to comprehensively represent inter- and intra-driver heterogeneity. Driving regimes are identified using a bottom-up segmentation algorithm and Dynamic Time Warping, ensuring robust characterization of behavioral states across diverse traffic scenarios. Comparative analyses demonstrate that the framework significantly reduces prediction errors for acceleration (maximum MSE improvement reached 58.47\%), speed, and spacing metrics while reproducing critical traffic phenomena, such as stop-and-go wave propagation and oscillatory dynamics.
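Dynamic Time Warping, which the abstract uses to help identify driving regimes, compares two sequences while allowing local stretching in time. The sketch below is the textbook dynamic-programming formulation with an absolute-difference cost. It is an illustration only; how the paper combines DTW with its bottom-up segmentation is not described in the abstract.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Same speed profile, but the second one lingers one step longer at the start.
profile_a = [0, 1, 2, 1, 0]
profile_b = [0, 0, 1, 2, 1, 0]
```

Here `dtw_distance(profile_a, profile_b)` is zero even though the sequences differ in length, which is the property that makes DTW useful for grouping segments with the same shape but different durations.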
nan
Article 1666
Title@2025-06-06 (5): State-Covering Trajectory Stitching for Diffusion Planners
Title: State-Covering Trajectory Stitching for Diffusion Planners | State-Covering Trajektorie Stich für Diffusionsplaner | 用于扩散规划器的 国家覆盖轨迹切换 2506.00895v2 |
Authors (2): Kyowoon Lee, Jaesik Choi
Diffusion-based generative models are emerging as powerful tools for long-horizon planning in reinforcement learning (RL), particularly with offline datasets. However, their performance is fundamentally limited by the quality and diversity of training data. This often restricts their generalization to tasks outside their training distribution or longer planning horizons. To overcome this challenge, we propose State-Covering Trajectory Stitching (SCoTS), a novel reward-free trajectory augmentation method that incrementally stitches together short trajectory segments, systematically generating diverse and extended trajectories. SCoTS first learns a temporal distance-preserving latent representation that captures the underlying temporal structure of the environment, then iteratively stitches trajectory segments guided by directional exploration and novelty to effectively cover and expand this latent space. We demonstrate that SCoTS significantly improves the performance and generalization capabilities of diffusion planners on offline goal-conditioned benchmarks requiring stitching and long-horizon reasoning. Furthermore, augmented trajectories generated by SCoTS significantly improve the performance of widely used offline goal-conditioned RL algorithms across diverse environments.
nan
Article 1667
Title@2025-06-06 (5): Few Labels are all you need: A Weakly Supervised Framework for Appliance Localization in Smart-Meter Series
Title: Few Labels are all you need: A Weakly Supervised Framework for Appliance Localization in Smart-Meter Series | Wenige Labels sind alles, was Sie brauchen: Ein schwach überwachtes Framework für die Gerätelokalisierung in der Smart-Meter-Serie | 少数标签是您所需要的:智能- 密特系列中受微弱监督的应用程序本地化框架 2506.05895v1 |
Authors (4): Adrien Petralia, Paul Boniol, Philippe Charpentier, Themis Palpanas
Improving smart grid system management is crucial in the fight against climate change, and enabling consumers to play an active role in this effort is a significant challenge for electricity suppliers. In this regard, millions of smart meters have been deployed worldwide in the last decade, recording the main electricity power consumed in individual households. This data produces valuable information that can help them reduce their electricity footprint; nevertheless, the collected signal aggregates the consumption of the different appliances running simultaneously in the house, making it difficult to apprehend. Non-Intrusive Load Monitoring (NILM) refers to the challenge of estimating the power consumption, pattern, or on/off state activation of individual appliances using the main smart meter signal. Recent methods proposed to tackle this task are based on a fully supervised deep-learning approach that requires both the aggregate signal and the ground truth of individual appliance power. However, such labels are expensive to collect and extremely scarce in practice, as they require conducting intrusive surveys in households to monitor each appliance. In this paper, we introduce CamAL, a weakly supervised approach for appliance pattern localization that only requires information on the presence of an appliance in a household to be trained. CamAL merges an ensemble of deep-learning classifiers combined with an explainable classification method to be able to localize appliance patterns. Our experimental evaluation, conducted on 4 real-world datasets, demonstrates that CamAL significantly outperforms existing weakly supervised baselines and that current SotA fully supervised NILM approaches require significantly more labels to reach CamAL performances. The source of our experiments is available at: https://github.com/adrienpetralia/CamAL. This paper appeared in ICDE 2025.
nan
Article 1668
Title@2025-06-06 (5): Policy Optimization for Continuous-time Linear-Quadratic Graphon Mean Field Games
Title: Policy Optimization for Continuous-time Linear-Quadratic Graphon Mean Field Games | Politikoptimierung für lineare-Quadratic Graphon Mean Field Games mit kontinuierlicher Zeit | 连续时线性二次曲线平面平面运动会的政策优化 2506.05894v1 |
Authors (2): Philipp Plank, Yufei Zhang
Multi-agent reinforcement learning, despite its popularity and empirical success, faces significant scalability challenges in large-population dynamic games. Graphon mean field games (GMFGs) offer a principled framework for approximating such games while capturing heterogeneity among players. In this paper, we propose and analyze a policy optimization framework for continuous-time, finite-horizon linear-quadratic GMFGs. Exploiting the structural properties of GMFGs, we design an efficient policy parameterization in which each player’s policy is represented as an affine function of their private state, with a shared slope function and player-specific intercepts. We develop a bilevel optimization algorithm that alternates between policy gradient updates for best-response computation under a fixed population distribution, and distribution updates using the resulting policies. We prove linear convergence of the policy gradient steps to best-response policies and establish global convergence of the overall algorithm to the Nash equilibrium. The analysis relies on novel landscape characterizations over infinite-dimensional policy spaces. Numerical experiments demonstrate the convergence and robustness of the proposed algorithm under varying graphon structures, noise levels, and action frequencies.
nan
Article 1669
Title@2025-06-06 (5): Open Your Eyes: Vision Enhances Message Passing Neural Networks in Link Prediction
Title: Open Your Eyes: Vision Enhances Message Passing Neural Networks in Link Prediction | Öffnen Sie Ihre Augen: Vision verbessert die Botschaft an neuralen Netzwerken in der Link-Vorhersage | ” 睁开眼睛:愿景在链接预测中加强信息传递神经网络 “ 2505.08266v3 |
Authors (8): Yanbin Wei, Xuehao Wang, Zhan Zhuang, Yang Chen, Shuhao Chen, Yulong Zhang, Yu Zhang, James Kwok
Message-passing graph neural networks (MPNNs) and structural features (SFs) are cornerstones for the link prediction task. However, as a common and intuitive mode of understanding, the potential of visual perception has been overlooked in the MPNN community. For the first time, we equip MPNNs with vision structural awareness by proposing an effective framework called Graph Vision Network (GVN), along with a more efficient variant (E-GVN). Extensive empirical results demonstrate that with the proposed frameworks, GVN consistently benefits from the vision enhancement across seven link prediction datasets, including challenging large-scale graphs. Such improvements are compatible with existing state-of-the-art (SOTA) methods and GVNs achieve new SOTA results, thereby underscoring a promising novel direction for link prediction.
nan
Article 1670
Title@2025-06-06 (5): An Uncertainty-Aware ED-LSTM for Probabilistic Suffix Prediction
Title: An Uncertainty-Aware ED-LSTM for Probabilistic Suffix Prediction | Eine unsichere ED-LSTM für probabilistische Suffix-Vorhersage | 用于概率后置物后置物预测的不确定性( ED-LSTM) 的不确定性警告 ED-LSTM 2505.21339v2 |
Authors (3): Henryk Mustroph, Michel Kunkler, Stefanie Rinderle-Ma
Suffix prediction of business processes forecasts the remaining sequence of events until process completion. Current approaches focus on predicting the most likely suffix, representing a single scenario. However, when the future course of a process is subject to uncertainty and high variability, the expressiveness of such a single scenario can be limited, since other possible scenarios, which together may have a higher overall probability, are overlooked. To address this limitation, we propose probabilistic suffix prediction, a novel approach that approximates a probability distribution of suffixes. The proposed approach is based on an Uncertainty-Aware Encoder-Decoder LSTM (U-ED-LSTM) and a Monte Carlo (MC) suffix sampling algorithm. We capture epistemic uncertainties via MC dropout and aleatoric uncertainties as learned loss attenuation. This technical report presents a comprehensive evaluation of the probabilistic suffix prediction approach’s predictive performance and calibration under three different hyperparameter settings, using four real-life and one artificial event log. The results show that: i) probabilistic suffix prediction can outperform most likely suffix prediction, and the U-ED-LSTM has reasonable predictive performance; and ii) the model’s predictions are well calibrated.
nan
Article 1671
Title@2025-06-06 (5): Variational Inference for Quantum HyperNetworks
Title: Variational Inference for Quantum HyperNetworks | Variationelle Schlussfolgerung für Quantum HyperNetworks | 量子超网络的变异推推推 2506.05888v1 |
Authors (5): Luca Nepote, Alix Lhéritier, Nicolas Bondoux, Marios Kountouris, Maurizio Filippone
Binary Neural Networks (BiNNs), which employ single-bit precision weights, have emerged as a promising solution to reduce memory usage and power consumption while maintaining competitive performance in large-scale systems. However, training BiNNs remains a significant challenge due to the limitations of conventional training algorithms. Quantum HyperNetworks offer a novel paradigm for enhancing the optimization of BiNN by leveraging quantum computing. Specifically, a Variational Quantum Algorithm is employed to generate binary weights through quantum circuit measurements, while key quantum phenomena such as superposition and entanglement facilitate the exploration of a broader solution space. In this work, we establish a connection between this approach and Bayesian inference by deriving the Evidence Lower Bound (ELBO), when direct access to the output distribution is available (i.e., in simulations), and introducing a surrogate ELBO based on the Maximum Mean Discrepancy (MMD) metric for scenarios involving implicit distributions, as commonly encountered in practice. Our experimental results demonstrate that the proposed methods outperform standard Maximum Likelihood Estimation (MLE), improving trainability and generalization.
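The Maximum Mean Discrepancy mentioned above can be estimated from samples alone, which is why it suits implicit distributions. Below is a minimal sketch of the biased (V-statistic) estimator of squared MMD with a Gaussian kernel on scalar samples. The kernel choice and bandwidth are assumptions, and the sketch does not show how the paper builds its surrogate ELBO from this quantity.

```python
import math


def rbf(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate: E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]."""
    def mean_kernel(us, vs):
        total = sum(rbf(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


near_zero = [0.0, 0.1, -0.1]
near_three = [3.0, 3.1, 2.9]
```

The estimate is zero when both sample sets are identical and grows as they separate, so it can serve as a sample-based discrepancy between a model's output distribution and a target.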
nan
Article 1672
Title@2025-06-06 (5): NILMFormer: Non-Intrusive Load Monitoring that Accounts for Non-Stationarity
Title: NILMFormer: Non-Intrusive Load Monitoring that Accounts for Non-Stationarity | NILMFormer: Non-Intrusive Load Monitoring, die Konten für Non-Stationarität | NILMFormer: 非入侵性账户的无侵入性负载监测 2506.05880v1 |
Authors (4): Adrien Petralia, Philippe Charpentier, Youssef Kadhi, Themis Palpanas
Millions of smart meters have been deployed worldwide, collecting the total power consumed by individual households. Based on these data, electricity suppliers offer their clients energy monitoring solutions to provide feedback on the consumption of their individual appliances. Historically, such estimates have relied on statistical methods that use coarse-grained total monthly consumption and static customer data, such as appliance ownership. Non-Intrusive Load Monitoring (NILM) is the problem of disaggregating a household’s collected total power consumption to retrieve the consumed power for individual appliances. Current state-of-the-art (SotA) solutions for NILM are based on deep-learning (DL) and operate on subsequences of an entire household consumption reading. However, the non-stationary nature of real-world smart meter data leads to a drift in the data distribution within each segmented window, which significantly affects model performance. This paper introduces NILMFormer, a Transformer-based architecture that incorporates a new subsequence stationarization/de-stationarization scheme to mitigate the distribution drift and that uses a novel positional encoding that relies only on the subsequence’s timestamp information. Experiments with 4 real-world datasets show that NILMFormer significantly outperforms the SotA approaches. Our solution has been deployed as the backbone algorithm for EDF’s (Électricité de France) consumption monitoring service, delivering detailed insights to millions of customers about their individual appliances’ power consumption. This paper appeared in KDD 2025.
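One common way to realize a stationarization/de-stationarization pair is per-window normalization: subtract each window's mean and divide by its standard deviation before the model, then invert the transform on the output. The sketch below assumes that generic scheme; the abstract does not specify NILMFormer's exact formulation, and the function names are hypothetical.

```python
import statistics


def stationarize(window, eps=1e-8):
    """Normalize one window to zero mean and (roughly) unit variance; keep the stats."""
    mu = statistics.fmean(window)
    sigma = statistics.pstdev(window)
    normalized = [(x - mu) / (sigma + eps) for x in window]
    return normalized, (mu, sigma)


def destationarize(values, stats, eps=1e-8):
    """Map model-space values back to the original scale of the window."""
    mu, sigma = stats
    return [v * (sigma + eps) + mu for v in values]


# Hypothetical aggregate power readings (watts) for one window.
window = [120.0, 135.0, 900.0, 880.0, 140.0]
normalized, stats = stationarize(window)
restored = destationarize(normalized, stats)
```

Windows recorded at very different baseline loads map to a comparable range before the model sees them, and the stored statistics allow predictions to be returned in watts.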
nan
Article 1673
Title@2025-06-06 (5): Infinite-Dimensional Diffusion Models
Title: Infinite-Dimensional Diffusion Models | Infinite-Dimensional Diffusionsmodelle | 无限不同传播模型 2302.10130v3 |
Authors (4): Jakiw Pidstrigach, Youssef Marzouk, Sebastian Reich, Sven Wang
Diffusion models have had a profound impact on many application areas, including those where data are intrinsically infinite-dimensional, such as images or time series. The standard approach is first to discretize and then to apply diffusion models to the discretized data. While such approaches are practically appealing, the performance of the resulting algorithms typically deteriorates as discretization parameters are refined. In this paper, we instead directly formulate diffusion-based generative models in infinite dimensions and apply them to the generative modelling of functions. We prove that our formulations are well posed in the infinite-dimensional setting and provide dimension-independent distance bounds from the sample to the target measure. Using our theory, we also develop guidelines for the design of infinite-dimensional diffusion models. For image distributions, these guidelines are in line with current canonical choices. For other distributions, however, we can improve upon these canonical choices. We demonstrate these results both theoretically and empirically, by applying the algorithms to data distributions on manifolds and to distributions arising in Bayesian inverse problems or simulation-based inference.
nan
Article 1674
Title@2025-06-06 (5): A projection-based framework for gradient-free and parallel learning
Title: A projection-based framework for gradient-free and parallel learning | Ein projektionsbasierter Rahmen für gradientenfreies und paralleles Lernen | 以预测为基础的无梯度和平行学习框架 2506.05878v1 |
Authors (4): Andreas Bergmeister, Manish Krishan Lal, Stefanie Jegelka, Suvrit Sra
We present a feasibility-seeking approach to neural network training. This mathematical optimization framework is distinct from conventional gradient-based loss minimization and uses projection operators and iterative projection algorithms. We reformulate training as a large-scale feasibility problem: finding network parameters and states that satisfy local constraints derived from its elementary operations. Training then involves projecting onto these constraints, a local operation that can be parallelized across the network. We introduce PJAX, a JAX-based software framework that enables this paradigm. PJAX composes projection operators for elementary operations, automatically deriving the solution operators for the feasibility problems (akin to autodiff for derivatives). It inherently supports GPU/TPU acceleration, provides a familiar NumPy-like API, and is extensible. We train diverse architectures (MLPs, CNNs, RNNs) on standard benchmarks using PJAX, demonstrating its functionality and generality. Our results show that this approach is a compelling alternative to gradient-based training, with clear advantages in parallelism and the ability to handle non-differentiable operations.
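The basic mechanism behind iterative projection algorithms can be shown on a toy feasibility problem: find a point that lies both on a hyperplane and inside a box by projecting onto each set in turn. This sketch uses plain alternating projections on two convex sets and does not use or imitate the PJAX API. The constraint sets are hypothetical stand-ins for the per-operation constraints the abstract describes.

```python
def project_hyperplane(x, a, b):
    """Closest point to x on the hyperplane {z : a . z = b}."""
    scale = (sum(ai * xi for ai, xi in zip(a, x)) - b) / sum(ai * ai for ai in a)
    return [xi - scale * ai for ai, xi in zip(a, x)]


def project_box(x, lo, hi):
    """Closest point to x in the box [lo, hi]^n (coordinate-wise clipping)."""
    return [min(max(xi, lo), hi) for xi in x]


def alternating_projections(x, a, b, lo, hi, iters=200):
    """Alternate projections onto the two sets, with no gradients involved."""
    for _ in range(iters):
        x = project_box(project_hyperplane(x, a, b), lo, hi)
    return x


# Find a point with x0 + x1 = 1 and both coordinates in [0, 1].
solution = alternating_projections([5.0, -3.0], a=[1.0, 1.0], b=1.0, lo=0.0, hi=1.0)
```

Each projection only needs the definition of its own constraint set, which is what makes this style of method local and easy to parallelize, and it never requires a derivative of the constraint.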
nan
Article 1675
Title@2025-06-06 (5): Interpretable Clustering Ensemble
Title: Interpretable Clustering Ensemble | Interpretiertes Cluster-Ensemble | 可解释的集群组合组合 2506.05877v1 |
Authors (5): Hang Lv, Lianyu Hu, Mudi Jiang, Xinying Liu, Zengyou He
Clustering ensemble has emerged as an important research topic in the field of machine learning. Although numerous methods have been proposed to improve clustering quality, most existing approaches overlook the need for interpretability in high-stakes applications. In domains such as medical diagnosis and financial risk assessment, algorithms must not only be accurate but also interpretable to ensure transparent and trustworthy decision-making. Therefore, to address the lack of interpretable algorithms in the field of clustering ensemble, we propose the first interpretable clustering ensemble algorithm in the literature. By treating base partitions as categorical variables, our method constructs a decision tree in the original feature space and uses a statistical association test to guide the tree-building process. Experimental results demonstrate that our algorithm achieves comparable performance to state-of-the-art (SOTA) clustering ensemble methods while maintaining an additional feature of interpretability. To the best of our knowledge, this is the first interpretable algorithm specifically designed for clustering ensemble, offering a new perspective for future research in interpretable clustering.
nan
Article 1676
Title@2025-06-06 (5): BestServe: Serving Strategies with Optimal Goodput in Collocation and Disaggregation Architectures
Title: BestServe: Serving Strategies with Optimal Goodput in Collocation and Disaggregation Architectures | BestServe: Servierstrategien mit optimalem Goodput in Kollokations- und Disaggregationsarchitekturen | 最佳服务:在合用和拆分结构中采用最优优利用的优良战略 2506.05871v1 |
Authors (6): Xiannan Hu, Tianyou Zeng, Xiaoming Yuan, Liwei Song, Guangyuan Zhang, Bangzheng He
Serving large language models (LLMs) to millions of users requires efficient resource allocation and parallelism strategies. Finding such a strategy is a labor-intensive trial-and-error process. We present BestServe, a novel framework for ranking serving strategies by estimating goodput under various operating scenarios. Supporting both collocated and disaggregated architectures, BestServe leverages an inference simulator built on an adapted roofline model and CPU-GPU dispatch dynamics. Our framework determines the optimal strategy in minutes on a single standard CPU, eliminating the need for costly benchmarking, while achieving predictions within a $20\%$ error margin. Its lightweight design and strong extensibility make it practical for rapid deployment planning.
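A roofline model estimates an operation's latency as the larger of its compute time and its memory-traffic time. The sketch below is the textbook version with made-up hardware numbers; BestServe's adaptations and its CPU-GPU dispatch modeling are not described in the abstract and are not represented here.

```python
def roofline_time(flops, bytes_moved, peak_flops, mem_bandwidth):
    """Lower-bound latency in seconds: compute-bound or memory-bound, whichever is slower."""
    return max(flops / peak_flops, bytes_moved / mem_bandwidth)


# Hypothetical accelerator: 100 TFLOP/s peak compute, 1 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
MEM_BANDWIDTH = 1e12

# Decode-like step: little arithmetic per byte read, so memory traffic dominates.
decode_time = roofline_time(2e9, 2e9, PEAK_FLOPS, MEM_BANDWIDTH)

# Prefill-like step: same bytes, far more arithmetic, so compute dominates.
prefill_time = roofline_time(2e12, 2e9, PEAK_FLOPS, MEM_BANDWIDTH)
```

Because the estimate needs only operation counts and two hardware constants, many candidate serving strategies can be scored on a CPU without running the model.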
Article 1677
Title@2025-06-06 (5): Loss Functions for Predictor-based Neural Architecture Search
Title: Loss Functions for Predictor-based Neural Architecture Search | Verlustfunktionen für vorausschauende neurale Architektursuche | 预测基于神经建筑搜索的损失函数 2506.05869v1 |
Authors (4): Han Ji, Yuqi Feng, Jiahao Fan, Yanan Sun
Evaluation is a critical but costly procedure in neural architecture search (NAS). Performance predictors have been widely adopted to reduce evaluation costs by directly estimating architecture performance. The effectiveness of predictors is heavily influenced by the choice of loss functions. While traditional predictors employ regression loss functions to evaluate the absolute accuracy of architectures, recent approaches have explored various ranking-based loss functions, such as pairwise and listwise ranking losses, to focus on the ranking of architecture performance. Despite their success in NAS, the effectiveness and characteristics of these loss functions have not been thoroughly investigated. In this paper, we conduct the first comprehensive study on loss functions in performance predictors, categorizing them into three main types: regression, ranking, and weighted loss functions. Specifically, we assess eight loss functions using a range of NAS-relevant metrics on 13 tasks across five search spaces. Our results reveal that specific categories of loss functions can be effectively combined to enhance predictor-based NAS. Furthermore, our findings could provide practical guidance for selecting appropriate loss functions for various tasks. We hope this work provides meaningful insights to guide the development of loss functions for predictor-based methods in the NAS community.
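To make the distinction between regression and ranking losses concrete, here is a minimal NumPy sketch. It contrasts mean squared error with one common pairwise ranking loss (a hinge on score differences). The specific hinge form and the `margin` parameter are assumptions chosen for illustration; the abstract does not list which eight losses the study evaluates.

```python
import numpy as np


def mse_loss(pred, target):
    """Regression loss: penalizes errors in absolute accuracy."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return float(np.mean((pred - target) ** 2))


def pairwise_hinge_loss(pred, target, margin=0.05):
    """Pairwise ranking loss: penalizes wrongly ordered pairs only.

    For every pair (i, j) where architecture i is truly better than j,
    the predicted score of i should exceed that of j by `margin`.
    """
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    losses = []
    for i in range(len(pred)):
        for j in range(len(pred)):
            if target[i] > target[j]:
                losses.append(max(0.0, margin - (pred[i] - pred[j])))
    return float(np.mean(losses)) if losses else 0.0


# Predictions far from the true accuracies but in the correct order.
true_acc = [0.7, 0.8, 0.9]
pred_acc = [0.1, 0.2, 0.3]
print(mse_loss(pred_acc, true_acc))             # large: values are off
print(pairwise_hinge_loss(pred_acc, true_acc))  # zero: ranking is right
```

The example shows why the choice matters for NAS: a predictor can rank architectures perfectly while still having a large regression error.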
Article 1678
Title@2025-06-06 (5): Stealix: Model Stealing via Prompt Evolution
Title: Stealix: Model Stealing via Prompt Evolution | Stealix: Modelldiebstahl über Prompt Evolution | Stenyix: 通过快速进化偷盗模型 2506.05867v1 |
Authors (4): Zhixiong Zhuang, Hui-Po Wang, Maria-Irina Nicolae, Mario Fritz
Model stealing poses a significant security risk in machine learning by enabling attackers to replicate a black-box model without access to its training data, thus jeopardizing intellectual property and exposing sensitive information. Recent methods that use pre-trained diffusion models for data synthesis improve efficiency and performance but rely heavily on manually crafted prompts, limiting automation and scalability, especially for attackers with little expertise. To assess the risks posed by open-source pre-trained models, we propose a more realistic threat model that eliminates the need for prompt design skills or knowledge of class names. In this context, we introduce Stealix, the first approach to perform model stealing without predefined prompts. Stealix uses two open-source pre-trained models to infer the victim model’s data distribution, and iteratively refines prompts through a genetic algorithm, progressively improving the precision and diversity of synthetic images. Our experimental results demonstrate that Stealix significantly outperforms other methods, even those with access to class names or fine-grained prompts, while operating under the same query budget. These findings highlight the scalability of our approach and suggest that the risks posed by pre-trained generative models in model stealing may be greater than previously recognized.
Article 1679
Title@2025-06-06 (5): Enhancing pretraining efficiency for medical image segmentation via transferability metrics
Title: Enhancing pretraining efficiency for medical image segmentation via transferability metrics | Steigerung der Vorschulungseffizienz für die Segmentierung medizinischer Bilder durch Übertragbarkeitsmetriken | 通过可转让性指标,提高医疗图象分形的训练前效率 2410.18677v2 |
Authors (3): Gábor Hidy, Bence Bakos, András Lukács
In medical image segmentation tasks, the scarcity of labeled training data poses a significant challenge when training deep neural networks. When using U-Net-style architectures, it is common practice to address this problem by pretraining the encoder part on a large general-purpose dataset like ImageNet. However, these methods are resource-intensive and do not guarantee improved performance on the downstream task. In this paper we investigate a variety of training setups on medical image segmentation datasets, using ImageNet-pretrained models. By examining over 300 combinations of models, datasets, and training methods, we find that shorter pretraining often leads to better results on the downstream task, providing further evidence for the well-known fact that the accuracy of the model on ImageNet is a poor indicator of downstream performance. As our main contribution, we introduce a novel transferability metric, based on contrastive learning, that measures how robustly a pretrained model is able to represent the target data. In contrast to other transferability scores, our method is applicable to the case of transferring from ImageNet classification to medical image segmentation. We apply our robustness score by measuring it throughout the pretraining phase to indicate when the model weights are optimal for downstream transfer. This reduces pretraining time and improves results on the target task.
Article 1680
Title@2025-06-06 (5): Conditioning Diffusions Using Malliavin Calculus
Title: Conditioning Diffusions Using Malliavin Calculus | Konditionierung Diffusionen mit Malliavin Calculus | 使用 Malliavin Calculus 有条件的传播 2504.03461v2 |
Authors (5): Jakiw Pidstrigach, Elizabeth Baker, Carles Domingo-Enrich, George Deligiannidis, Nikolas Nüsken
In generative modelling and stochastic optimal control, a central computational task is to modify a reference diffusion process to maximise a given terminal-time reward. Most existing methods require this reward to be differentiable, using gradients to steer the diffusion towards favourable outcomes. However, in many practical settings, like diffusion bridges, the reward is singular, taking an infinite value if the target is hit and zero otherwise. We introduce a novel framework, based on Malliavin calculus and centred around a generalisation of the Tweedie score formula to nonlinear stochastic differential equations, that enables the development of methods robust to such singularities. This allows our approach to handle a broad range of applications, like diffusion bridges, or adding conditional controls to an already trained diffusion model. We demonstrate that our approach offers stable and reliable training, outperforming existing techniques. As a byproduct, we also introduce a novel score matching objective. Our loss functions are formulated such that they could readily be extended to manifold-valued and infinite dimensional diffusions.
Article 1681
Title@2025-06-06 (5): Improved Allergy Wheal Detection for the Skin Prick Automated Test Device
Title: Improved Allergy Wheal Detection for the Skin Prick Automated Test Device | Verbesserte Allergikererkennung für das Automatisierte Testgerät Skin Prick | 改进皮肤皮板自动自动测试装置的过敏 wHeal 检测 2506.05862v1 |
Authors (24): Rembert Daems, Sven Seys, Valérie Hox, Adam Chaker, Glynnis De Greve, Winde Lemmens, Anne-Lise Poirrier, Eline Beckers, Zuzana Diamant, Carmen Dierickx, Peter W. Hellings, Caroline Huart, Claudia Jerin, Mark Jorissen, Hanne Oscé, Karolien Roux, Mark Thompson, Sophie Tombu, Saartje Uyttebroek, Andrzej Zarowski, Senne Gorris, Laura Van Gerven, Dirk Loeckx, Thomas Demeester
Background: The skin prick test (SPT) is the gold standard for diagnosing sensitization to inhalant allergies. The Skin Prick Automated Test (SPAT) device was designed for increased consistency in test results, and captures 32 images to be jointly used for allergy wheal detection and delineation, which leads to a diagnosis. Materials and Methods: Using SPAT data from $868$ patients with suspected inhalant allergies, we designed an automated method to detect and delineate wheals on these images. To this end, $10,416$ wheals were manually annotated by drawing detailed polygons along the edges. The unique data-modality of the SPAT device, with $32$ images taken under distinct lighting conditions, requires a custom-made approach. Our proposed method consists of two parts: a neural network component that segments the wheals on the pixel level, followed by an algorithmic and interpretable approach for detecting and delineating the wheals. Results: We evaluate the performance of our method on a hold-out validation set of $217$ patients. As a baseline we use a single conventionally lighted image per SPT as input to our method. Conclusion: Using the $32$ SPAT images under various lighting conditions offers a considerably higher accuracy than a single image in conventional, uniform light.
Article 1682
Title@2025-06-06 (5): Wavelet-based Disentangled Adaptive Normalization for Non-stationary Times Series Forecasting
Title: Wavelet-based Disentangled Adaptive Normalization for Non-stationary Times Series Forecasting | Wavelet-basiert Disentangled Adaptive Normalisierung für nicht-stationäre Zeitreihen Forecasting | 用于非静止时报系列预测的不固定时报调适正常化 2506.05857v1 |
Authors (9): Junpeng Lin, Tian Lan, Bo Zhang, Ke Lin, Dandan Miao, Huiru He, Jiantao Ye, Chen Zhang, Yan-fu Li
Forecasting non-stationary time series is a challenging task because their statistical properties often change over time, making it hard for deep models to generalize well. Instance-level normalization techniques can help address shifts in temporal distribution. However, most existing methods overlook the multi-component nature of time series, where different components exhibit distinct non-stationary behaviors. In this paper, we propose Wavelet-based Disentangled Adaptive Normalization (WDAN), a model-agnostic framework designed to address non-stationarity in time series forecasting. WDAN uses discrete wavelet transforms to break down the input into low-frequency trends and high-frequency fluctuations. It then applies tailored normalization strategies to each part. For trend components that exhibit strong non-stationarity, we apply first-order differencing to extract stable features used for predicting normalization parameters. Extensive experiments on multiple benchmarks demonstrate that WDAN consistently improves forecasting accuracy across various backbone models. Code is available at this repository: https://github.com/MonBG/WDAN.
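The decomposition step described above can be sketched with a one-level Haar transform, the simplest discrete wavelet transform. The choice of the Haar wavelet and a single level is an assumption made for brevity; the abstract does not say which wavelet or depth WDAN uses, and the official implementation is in the linked repository. The sketch splits a series into a low-frequency trend and high-frequency fluctuations, then applies first-order differencing to the trend.

```python
import numpy as np


def haar_dwt(x):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(x, float)
    if len(x) % 2:
        raise ValueError("series length must be even for this sketch")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-frequency trend
    detail = (even - odd) / np.sqrt(2)   # high-frequency fluctuation
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt; reconstructs the original series."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out


# A series with a strong linear trend (non-stationary mean).
series = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0])
trend, fluct = haar_dwt(series)

# First-order differencing turns the drifting trend into a flat signal.
trend_diff = np.diff(trend)
print(trend, trend_diff)
```

For this linear series the differenced trend is constant, which is the kind of stable feature the abstract describes using to predict normalization parameters.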
Article 1683
Title@2025-06-06 (5): TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research
Title: TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research | TinySQL: Ein progressiver Text-zu-SQL-Datensatz für die mechanistische Interpretationsforschung | TinySQL: 用于机械解释性研究的渐进文本到SQL数据集 2503.12730v3 |
Authors (6): Abir Harrasse, Philip Quirke, Clement Neo, Dhruv Nathawani, Luke Marks, Amir Abdullah
Mechanistic interpretability research faces a gap between analyzing simple circuits in toy tasks and discovering features in large models. To bridge this gap, we propose text-to-SQL generation as an ideal task to study, as it combines the formal structure of toy tasks with real-world complexity. We introduce TinySQL, a synthetic dataset, progressing from basic to advanced SQL operations, and train models ranging from 33M to 1B parameters to establish a comprehensive testbed for interpretability. We apply multiple complementary interpretability techniques, including Edge Attribution Patching and Sparse Autoencoders, to identify minimal circuits and components supporting SQL generation. We compare circuits for different SQL subskills, evaluating their minimality, reliability, and identifiability. Finally, we conduct a layerwise logit lens analysis to reveal how models compose SQL queries across layers: from intent recognition to schema resolution to structured generation. Our work provides a robust framework for probing and comparing interpretability methods in a structured, progressively complex setting.
Article 1684
Title@2025-06-06 (5): BoA: Attention-aware Post-training Quantization without Backpropagation
Title: BoA: Attention-aware Post-training Quantization without Backpropagation | BoA: Aufmerksamkeitsbewusste Nachschulung Quantisierung ohne Backpropagation | BoA: 注意培训后量化而不反言反言 2406.13474v3 |
Authors (6): Junhan Kim, Ho-young Kim, Eulrang Cho, Chungman Lee, Joonyoung Kim, Yongkweon Jeon
Post-training quantization (PTQ) is a promising solution for deploying large language models (LLMs) on resource-constrained devices. Early methods developed for small-scale networks, such as ResNet, rely on gradient-based optimization, which becomes impractical for hyper-scale LLMs with billions of parameters. While recently proposed backpropagation-free or transformation-based methods alleviate this issue, they ignore inter-layer interactions or use the naive nearest-rounding-based quantized weight assignment to save the heavy computational cost of weight optimization. In this paper, we introduce a novel backpropagation-free PTQ algorithm that optimizes quantized weights by considering inter-layer dependencies. The key innovation is the development of attention-aware Hessian matrices that capture inter-layer interactions within the attention module. Extensive experiments demonstrate that our approach not only outperforms existing weight quantization methods but also shows good synergy with conventional methods to suppress activation outliers, leading to state-of-the-art weight-activation quantization performance. The code will be available at https://github.com/SamsungLabs/BoA.
Article 1685
Title@2025-06-06 (5): Training-Free Query Optimization via LLM-Based Plan Similarity
Title: Training-Free Query Optimization via LLM-Based Plan Similarity | Schulungsfreie Abfrage Optimierung über LLM-basierte Plan-Ähnlichkeit | 通过LLM-基于LLM-计划相似性实现无培训查询优化 2506.05853v1 |
Authors (3): Nikita Vasilenko, Alexander Demin, Vladimir Boorlakov
Large language model (LLM) embeddings offer a promising new avenue for database query optimization. In this paper, we explore how pre-trained execution plan embeddings can guide SQL query execution without the need for additional model training. We introduce LLM-PM (LLM-based Plan Mapping), a framework that embeds the default execution plan of a query, finds its k nearest neighbors among previously executed plans, and recommends database hintsets based on neighborhood voting. A lightweight consistency check validates the selected hint, while a fallback mechanism searches the full hint space when needed. Evaluated on the JOB-CEB benchmark using OpenGauss, LLM-PM achieves an average query latency reduction of 21%. This work highlights the potential of LLM-powered embeddings to deliver practical improvements in query performance and opens new directions for training-free, embedding-based optimizer guidance systems.
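The nearest-neighbor voting step can be sketched as follows. This is a minimal illustration, assuming cosine similarity over plan embeddings and a simple majority vote; the abstract does not specify the distance metric, the voting rule, or the hintset names, so `recommend_hintset`, the toy two-dimensional embeddings, and the hint labels are all hypothetical.

```python
from collections import Counter

import numpy as np


def recommend_hintset(query_emb, plan_embs, best_hintsets, k=3):
    """Return (hintset, vote_share) from the k most similar past plans.

    Assumptions: cosine similarity and majority voting.
    """
    q = np.asarray(query_emb, float)
    P = np.asarray(plan_embs, float)
    q = q / np.linalg.norm(q)
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    sims = P @ q                           # cosine similarity per plan
    nearest = np.argsort(-sims)[:k]        # indices of the k nearest
    votes = Counter(best_hintsets[i] for i in nearest)
    hint, count = votes.most_common(1)[0]
    return hint, count / k


# Toy embeddings of previously executed plans and their best hintsets.
plans = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
hints = ["no_nestloop", "no_nestloop", "default", "default"]
print(recommend_hintset([1.0, 0.05], plans, hints, k=3))
```

The returned vote share could feed a consistency check of the kind the abstract mentions, for example by falling back to a wider search when the neighbors disagree.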
Article 1686
Title@2025-06-06 (5): Regional, Lattice and Logical Representations of Neural Networks
Title: Regional, Lattice and Logical Representations of Neural Networks | Regionale, gitter- und logische Vertretungen neuraler Netzwerke | 神经网络的区域、拉蒂斯和逻辑代表 2506.05834v1 |
Authors (2): Sandro Preto, Marcelo Finger
A possible path to the interpretability of neural networks is to (approximately) represent them in the regional format of piecewise linear functions, where regions of inputs are associated with linear functions computing the network outputs. We present an algorithm for the translation of feedforward neural networks with ReLU activation functions in hidden layers and truncated identity activation functions in the output layer. We also empirically investigate the complexity of the regional representations produced by our method for neural networks of varying sizes. Lattice and logical representations of neural networks follow straightforwardly from regional representations as long as they satisfy a specific property, so we empirically investigate to what extent the translations produced by our algorithm satisfy this property.
Article 1687
Title@2025-06-06 (5): Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning
Title: Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning | Mit dem Pfeil der Zeit lernen: Hyperbolische Geometrie für rückseitig-kompatibles Repräsentationslernen | 沿时间之箭学习:反向比较代表学习的超单几何学 2506.05826v1 |
Authors (8): Ngoc Bui, Menglin Yang, Runjin Chen, Leonardo Neves, Mingxuan Ju, Rex Ying, Neil Shah, Tong Zhao
Backward-compatible representation learning enables updated models to integrate seamlessly with existing ones, avoiding the need to reprocess stored data. Despite recent advances, existing compatibility approaches in Euclidean space neglect the uncertainty in the old embedding model and force the new model to reconstruct outdated representations regardless of their quality, thereby hindering the learning process of the new model. In this paper, we propose to switch perspectives to hyperbolic geometry, where we treat time as a natural axis for capturing a model’s confidence and evolution. By lifting embeddings into hyperbolic space and constraining updated embeddings to lie within the entailment cone of the old ones, we maintain generational consistency across models while accounting for uncertainties in the representations. To further enhance compatibility, we introduce a robust contrastive alignment loss that dynamically adjusts alignment weights based on the uncertainty of the old embeddings. Experiments validate the superiority of the proposed method in achieving compatibility, paving the way for more resilient and adaptable machine learning systems.
Article 1688
Title@2025-06-06 (5): Domain Generalizable Knowledge Tracing via Concept Aggregation and Relation-Based Attention
Title: Domain Generalizable Knowledge Tracing via Concept Aggregation and Relation-Based Attention | Domain Generalizable Knowledge Tracing über Konzeptaggregation und relationsbasierte Aufmerksamkeit | 通过概念聚合和基于关系的关注寻找可普遍适用的知识 2407.02547v2 |
Authors (5): Yuquan Xie, Shengtao Peng, Wanqi Yang, Ming Yang, Yang Gao
Knowledge Tracing (KT) is a critical task in online education systems, aiming to monitor students’ knowledge states throughout a learning period. Common KT approaches involve predicting the probability of a student correctly answering the next question based on their exercise history. However, these methods often suffer from performance degradation when faced with the scarcity of student interactions in new education systems. To address this, we leverage student interactions from existing education systems to mitigate performance degradation caused by limited training data. Nevertheless, these interactions exhibit significant differences since they are derived from different education systems. To address this issue, we propose a domain generalization approach for knowledge tracing, where existing education systems are considered source domains, and new education systems with limited data are considered target domains. Additionally, we design a domain-generalizable knowledge tracing framework (DGKT) that can be applied to any KT model. Specifically, we present a concept aggregation approach designed to reduce conceptual disparities within sequences of student interactions from diverse domains. To further mitigate domain discrepancies, we introduce a novel normalization module called Sequence Instance Normalization (SeqIN). Moreover, to fully leverage exercise information, we propose a new knowledge tracing model tailored for the domain generalization KT task, named Domain-Generalizable Relation-based Knowledge Tracing (DGRKT). Extensive experiments across five benchmark datasets demonstrate that the proposed method performs well despite limited training data.
Article 1689
Title@2025-06-06 (5): SAGE: A Framework of Precise Retrieval for RAG
Title: SAGE: A Framework of Precise Retrieval for RAG | SAGE: Ein Rahmen für präzises Retrieval für RAG | SAGE:RAG的精确检索框架 2503.01713v3 |
Authors (3): Jintao Zhang, Guoliang Li, Jinyang Su
Retrieval-augmented generation (RAG) has demonstrated significant proficiency in conducting question-answering (QA) tasks within a specified corpus. Nonetheless, numerous failure instances of RAG in QA still exist. These failures are not solely attributable to the limitations of Large Language Models (LLMs); instead, they predominantly arise from the retrieval of inaccurate information for LLMs due to two limitations: (1) Current RAG methods segment the corpus without considering semantics, making it difficult to find relevant context due to impaired correlation between questions and the segments. (2) There is a trade-off between missing essential context when less context is retrieved and including irrelevant context when more is retrieved. In this paper, we introduce a RAG framework (SAGE) to overcome these limitations. First, to address segmentation that ignores semantics, we propose to train a semantic segmentation model. This model is trained to segment the corpus into semantically complete chunks. Second, to ensure that only the most relevant chunks are retrieved while the irrelevant ones are ignored, we design a chunk selection algorithm to dynamically select chunks based on the decreasing speed of the relevance score, leading to a more relevant selection. Third, to further ensure the precision of the retrieved chunks, we propose letting LLMs assess whether retrieved chunks are excessive or lacking and then adjust the amount of context accordingly. Experiments show that SAGE outperforms baselines by 61.25% in the quality of QA on average. Moreover, by avoiding retrieving noisy context, SAGE lowers the cost of the tokens consumed in LLM inference and achieves a 49.41% enhancement in cost efficiency on average. Additionally, our work offers valuable insights for boosting RAG.
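The idea of selecting chunks "based on the decreasing speed of the relevance score" can be illustrated with a small sketch. It keeps chunks in descending score order and stops at the first sharp relative drop. The stopping rule, the `max_drop` threshold, and the `min_k` floor are assumptions for illustration; the abstract does not give SAGE's exact criterion.

```python
def select_chunks(scores, max_drop=0.3, min_k=1):
    """Return indices of chunks to keep, in descending relevance order.

    Assumption: stop at the first relative score drop above `max_drop`,
    after always keeping the top `min_k` chunks.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = order[:min_k]
    for prev, cur in zip(order[min_k - 1:], order[min_k:]):
        if scores[prev] <= 0:
            break
        drop = (scores[prev] - scores[cur]) / scores[prev]
        if drop > max_drop:
            break                      # relevance falls off sharply here
        selected.append(cur)
    return selected


# Three closely scored chunks followed by a sharp fall-off.
relevance = [0.2, 0.9, 0.85, 0.8, 0.1]
print(select_chunks(relevance))  # → [1, 2, 3]
```

Compared with a fixed top-k, this kind of rule adapts the amount of context per question: it retrieves more when many chunks are similarly relevant and less when relevance falls off quickly.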
Article 1690
Title@2025-06-06 (5): SageAttention2++: A More Efficient Implementation of SageAttention2
Title: SageAttention2++: A More Efficient Implementation of SageAttention2 | SageAttention2++: Effizientere Umsetzung von SageAttention2 | SageAttention2++:更有效地实施SageAttention2 2505.21136v3 |
Authors (8): Jintao Zhang, Xiaoming Xu, Jia Wei, Haofeng Huang, Pengle Zhang, Chendong Xiang, Jun Zhu, Jianfei Chen
The efficiency of attention is critical because its time complexity grows quadratically with sequence length. SageAttention2 addresses this by utilizing quantization to accelerate matrix multiplications (Matmul) in attention. To further accelerate SageAttention2, we propose to utilize the faster instruction of FP8 Matmul accumulated in FP16. The instruction is 2x faster than the FP8 Matmul used in SageAttention2. Our experiments show that SageAttention2++ achieves a 3.9x speedup over FlashAttention while maintaining the same attention accuracy as SageAttention2. This means SageAttention2++ effectively accelerates various models, including those for language, image, and video generation, with negligible end-to-end metrics loss. The code will be available at https://github.com/thu-ml/SageAttention.
Article 1691
Title@2025-06-06 (5): Non-Asymptotic Length Generalization
Title: Non-Asymptotic Length Generalization | Nicht-Asymptotische Längenverallgemeinerung | 非负时长一般化 2506.03085v2 |
Authors (3): Thomas Chen, Tengyu Ma, Zhiyuan Li
Length generalization is the ability of a learning algorithm to learn a hypothesis which generalizes to longer inputs than the inputs in the training set. In this paper, we provide provable guarantees of length generalization for various classes of functions in an idealized setting. First, we formalize the framework of non-asymptotic length generalization, which requires a computable upper bound for the minimum input length that guarantees length generalization, as a function of the complexity of the ground-truth function under some given complexity measure. We refer to this minimum input length required for length generalization as the length complexity. We show that the Minimum-Complexity Interpolator learning algorithm achieves optimal length complexity. We further show that whether a function class admits non-asymptotic length generalization is equivalent to the decidability of its language equivalence problem, which implies that there is no computable upper bound for the length complexity of Context-Free Grammars. On the positive side, we show that the length complexity of Deterministic Finite Automata is $2n - 2$ where $n$ is the number of states of the ground-truth automaton. Our main results are upper bounds on the length complexity for a subset of a transformer-related function class called C-RASP (Yang & Chiang, 2024). We show that the length complexity of 1-layer C-RASP functions is $O(T^2)$ when the ground-truth function has precision $T$, and that the length complexity of 2-layer C-RASP functions is $O(T^{O(K)})$ when the ground-truth function has precision $T$ and $K$ heads.
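The $2n - 2$ figure for Deterministic Finite Automata can be made tangible with a brute-force check. The sketch below compares two complete DFAs on every string up to a given length. It relies on the classical distinguishability bound (two inequivalent complete DFAs with at most $n$ states each disagree on some string of length at most $2n - 2$), which is consistent with the stated result but is not the paper's proof. The dictionary encoding of a DFA and the two example automata are hypothetical.

```python
from itertools import product


def accepts(dfa, s):
    """Run a complete DFA on string s and report acceptance."""
    state = dfa["start"]
    for ch in s:
        state = dfa["delta"][(state, ch)]
    return state in dfa["accept"]


def agree_up_to(dfa_a, dfa_b, alphabet, max_len):
    """Return (True, None) if both DFAs agree on all strings of length
    <= max_len, else (False, shortest_found_counterexample)."""
    for length in range(max_len + 1):
        for s in product(alphabet, repeat=length):
            if accepts(dfa_a, s) != accepts(dfa_b, s):
                return False, "".join(s)
    return True, None


# Two 2-state DFAs over {a, b} (n = 2, so 2n - 2 = 2).
even_a = {  # accepts strings with an even number of 'a'
    "start": 0, "accept": {0},
    "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
}
not_end_a = {  # accepts strings that do not end in 'a'
    "start": 0, "accept": {0},
    "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0},
}

n = 2
print(agree_up_to(even_a, not_end_a, "ab", 2 * n - 2))  # → (False, 'aa')
```

In this example the two automata agree on every string of length at most 1 and first differ at length 2, so training data limited to shorter strings could not tell them apart.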
Article 1692
Title@2025-06-06 (5): Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants
Title: Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants | Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants | 高有机铜剂在 lvLiNGAM 中的因果效应识别 2506.05202v2 |
Authors (5): Daniele Tramontano, Yaroslav Kivva, Saber Salehkaleybar, Mathias Drton, Negar Kiyavash
This paper investigates causal effect identification in latent variable Linear Non-Gaussian Acyclic Models (lvLiNGAM) using higher-order cumulants, addressing two prominent setups that are challenging in the presence of latent confounding: (1) a single proxy variable that may causally influence the treatment and (2) underspecified instrumental variable cases where fewer instruments exist than treatments. We prove that causal effects are identifiable with a single proxy or instrument and provide corresponding estimation methods. Experimental results demonstrate the accuracy and robustness of our approaches compared to existing methods, advancing the theoretical and practical understanding of causal inference in linear systems with latent confounders.
Article 1693
Title@2025-06-06 (5): Taming Knowledge Conflicts in Language Models
Title: Taming Knowledge Conflicts in Language Models | Zähmen von Wissenskonflikten in Sprachmodellen | 在语言模式中讨论知识冲突 2503.10996v2 |
Authors (3): Gaotang Li, Yuzhong Chen, Hanghang Tong
Language Models (LMs) often encounter knowledge conflicts when parametric memory contradicts contextual knowledge. Previous works attribute this conflict to the interplay between “memory heads” and “context heads”, attention heads assumed to promote either memory or context exclusively. In this study, we go beyond this fundamental assumption by uncovering a critical phenomenon we term the superposition of contextual information and parametric memory, where highly influential attention heads simultaneously contribute to both memory and context. Building upon this insight, we propose Just Run Twice (JuICE), a test-time attention intervention method that steers LMs toward either parametric beliefs or contextual knowledge without requiring fine-tuning. JuICE identifies a set of reliable attention heads and leverages a dual-run approach to mitigate the superposition effects. Extensive experiments across 11 datasets and 6 model architectures demonstrate that JuICE sets the new state-of-the-art performance and robust generalization, achieving significant and consistent improvement across different domains under various conflict types. Finally, we theoretically analyze knowledge conflict and the superposition of contextual information and parametric memory in attention heads, which further elucidates the effectiveness of JuICE in these settings. Our code is available at https://github.com/GaotangLi/JUICE.
Article 1694
Title@2025-06-06 (5): Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models
Title: Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models | Boolesche Matrix-Logik-Programmierung für aktives Lernen von Genfunktionen in metabolischen Netzwerkmodellen im Genommaßstab | 在基因组规模代谢网络模型中积极学习基因功能的布尔矩阵逻辑逻辑编程 2405.06724v4 |
Authors (4): Lun Ai, Stephen H. Muggleton, Shi-Shun Liang, Geoff S. Baldwin
Reasoning about hypotheses and updating knowledge through empirical observations are central to scientific discovery. In this work, we applied logic-based machine learning methods to drive biological discovery by guiding experimentation. Genome-scale metabolic network models (GEMs) - comprehensive representations of metabolic genes and reactions - are widely used to evaluate genetic engineering of biological systems. However, GEMs often fail to accurately predict the behaviour of genetically engineered cells, primarily due to incomplete annotations of gene interactions. The task of learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To efficiently predict using GEM, we describe a novel approach called Boolean Matrix Logic Programming (BMLP) by leveraging Boolean matrices to evaluate large logic programs. We developed a new system, $BMLP_{active}$, which guides cost-effective experimentation and uses interpretable logic programs to encode a state-of-the-art GEM of a model bacterial organism. Notably, $BMLP_{active}$ successfully learned the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models to reliably engineer biological systems for producing useful compounds. It offers a realistic approach to creating a self-driving lab for biological discovery, which would then facilitate microbial engineering for practical applications.
Article 1695
Title@2025-06-06 (5): SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference
Title: SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference | SpargeAchtung: Genaue und trainingsfreie Sparse Achtung Beschleunigen jeder Modellinferenz | 隔离注意:准确和无培训的分散注意加速任何典型推断 2502.18137v5 |
Authors (7): Jintao Zhang, Chendong Xiang, Haofeng Huang, Jia Wei, Haocheng Xi, Jun Zhu, Jianfei Chen
An efficient attention implementation is essential for large models because attention has quadratic time complexity. Fortunately, attention commonly exhibits sparsity, i.e., many values in the attention map are near zero, allowing for the omission of corresponding computations. Many studies have utilized the sparse pattern to accelerate attention. However, most existing works focus on optimizing attention within specific models by exploiting certain sparse patterns of the attention map. A universal sparse attention that guarantees both the speedup and end-to-end performance of diverse models remains elusive. In this paper, we propose SpargeAttn, a universal sparse and quantized attention for any model. Our method uses a two-stage online filter: in the first stage, we rapidly and accurately predict the attention map, allowing some matrix multiplications in attention to be skipped. In the second stage, we design an online softmax-aware filter that incurs no extra overhead and further skips some matrix multiplications. Experiments show that our method significantly accelerates diverse models, including language, image, and video generation, without sacrificing end-to-end metrics. The code is available at https://github.com/thu-ml/SpargeAttn.
Article 1696
Title@2025-06-06 (5): Overcoming Challenges of Partial Client Participation in Federated Learning : A Comprehensive Review
Title: Overcoming Challenges of Partial Client Participation in Federated Learning : A Comprehensive Review | Herausforderungen der Teilkundenbeteiligung am Föderierten Lernen überwinden : Ein umfassender Überblick | 克服部分客户参与联邦学习的挑战:全面审查 2506.02887v2 |
Authors (4): Mrinmay Sen, Shruti Aparna, Rohit Agarwal, Chalavadi Krishna Mohan
Federated Learning (FL) is a learning mechanism that falls under the distributed training umbrella, which collaboratively trains a shared global model without disclosing the raw data from different clients. This paper presents an extensive survey on the impact of partial client participation in federated learning. While much of the existing research focuses on addressing issues such as generalization, robustness, and fairness caused by data heterogeneity under the assumption of full client participation, limited attention has been given to the practical and theoretical challenges arising from partial client participation, which is common in real-world scenarios. This survey provides an in-depth review of existing FL methods designed to cope with partial client participation. We offer a comprehensive analysis supported by theoretical insights and empirical findings, along with a structured categorization of these methods, highlighting their respective advantages and disadvantages.
Article 1697
Title@2025-06-06 (5): BOLD: Boolean Logic Deep Learning
Title: BOLD: Boolean Logic Deep Learning | BOLD: Boolean Logic Deep Learning | 布尔逻辑深层学习 2405.16339v2 |
Authors (5): Van Minh Nguyen, Cristian Ocampo, Aymen Askri, Louis Leconte, Ba-Hien Tran
Deep learning is computationally intensive, with significant efforts focused on reducing arithmetic complexity, particularly regarding energy consumption dominated by data movement. While existing literature emphasizes inference, training is considerably more resource-intensive. This paper proposes a novel mathematical principle by introducing the notion of Boolean variation such that neurons made of Boolean weights and inputs can be trained – for the first time – efficiently in the Boolean domain using Boolean logic instead of gradient descent and real arithmetic. We explore its convergence, conduct extensive experimental benchmarking, and provide consistent complexity evaluation by considering chip architecture, memory hierarchy, dataflow, and arithmetic precision. Our approach achieves baseline full-precision accuracy in ImageNet classification and surpasses state-of-the-art results in semantic segmentation, with notable performance in image super-resolution, and natural language understanding with transformer-based models. Moreover, it significantly reduces energy consumption during both training and inference.
Article 1698
Title@2025-06-06 (5): Positional Encoding meets Persistent Homology on Graphs
Title: Positional Encoding meets Persistent Homology on Graphs | Positionale Kodierung trifft Persistente Homologie auf Graphen | 定位编码与图纸上的常同理学相识 2506.05814v1 |
Authors (3): Yogesh Verma, Amauri H. Souza, Vikas Garg
The local inductive bias of message-passing graph neural networks (GNNs) hampers their ability to exploit key structural information (e.g., connectivity and cycles). Positional encoding (PE) and Persistent Homology (PH) have emerged as two promising approaches to mitigate this issue. PE schemes endow GNNs with location-aware features, while PH methods enhance GNNs with multiresolution topological features. However, a rigorous theoretical characterization of the relative merits and shortcomings of PE and PH has remained elusive. We bridge this gap by establishing that neither paradigm is more expressive than the other, providing novel constructions where one approach fails but the other succeeds. Our insights inform the design of a novel learnable method, PiPE (Persistence-informed Positional Encoding), which is provably more expressive than both PH and PE. PiPE demonstrates strong performance across a variety of tasks (e.g., molecule property prediction, graph classification, and out-of-distribution generalization), thereby advancing the frontiers of graph representation learning. Code is available at https://github.com/Aalto-QuML/PIPE.
Article 1699
Title@2025-06-06 (5): E-LDA: Toward Interpretable LDA Topic Models with Strong Guarantees in Logarithmic Parallel Time
Title: E-LDA: Toward Interpretable LDA Topic Models with Strong Guarantees in Logarithmic Parallel Time | E-LDA: Auf dem Weg zu interpretierbaren LDA-Themenmodellen mit starken Garantien in logarithmischer Parallelzeit | E-LDA:争取在对数平行时间建立具有有力保证的可解释的LDA专题模型 2506.07747v1 |
Authors (1): Adam Breuer
In this paper, we provide the first practical algorithms with provable guarantees for the problem of inferring the topics assigned to each document in an LDA topic model. This is the primary inference problem for many applications of topic models in social science, data exploration, and causal inference settings. We obtain this result by showing a novel non-gradient-based, combinatorial approach to estimating topic models. This yields algorithms that converge to near-optimal posterior probability in logarithmic parallel computation time (adaptivity) – exponentially faster than any known LDA algorithm. We also show that our approach can provide interpretability guarantees such that each learned topic is formally associated with a known keyword. Finally, we show that unlike alternatives, our approach can maintain the independence assumptions necessary to use the learned topic model for downstream causal inference methods that allow researchers to study topics as treatments. In terms of practical performance, our approach consistently returns solutions of higher semantic quality than solutions from state-of-the-art LDA algorithms, neural topic models, and LLM-based topic models across a diverse range of text datasets and evaluation parameters.
Article 1700
Title@2025-06-06 (5): Unintended Harms of Value-Aligned LLMs: Psychological and Empirical Insights
Title: Unintended Harms of Value-Aligned LLMs: Psychological and Empirical Insights | Unbeabsichtigte Schäden an wertorientierten LLMs: Psychologische und empirische Erkenntnisse | 价值观一致的LMMs:心理和实证透视的意外伤害 2506.06404v1 |
Authors (6): Sooyung Choi, Jaehyeok Lee, Xiaoyuan Yi, Jing Yao, Xing Xie, JinYeong Bak
The application scope of Large Language Models (LLMs) continues to expand, leading to increasing interest in personalized LLMs that align with human values. However, aligning these models with individual values raises significant safety concerns, as certain values may correlate with harmful information. In this paper, we identify specific safety risks associated with value-aligned LLMs and investigate the psychological principles behind these challenges. Our findings reveal two key insights. (1) Value-aligned LLMs are more prone to harmful behavior compared to non-fine-tuned models and exhibit slightly higher risks in traditional safety evaluations than other fine-tuned models. (2) These safety issues arise because value-aligned LLMs genuinely generate text according to the aligned values, which can amplify harmful outcomes. Using a dataset with detailed safety categories, we find significant correlations between value alignment and safety risks, supported by psychological hypotheses. This study offers insights into the “black box” of value alignment and proposes in-context alignment methods to enhance the safety of value-aligned LLMs.
Article 1701
Title@2025-06-06 (5): ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think
Title: ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think | ZeroFlow: Katastrophisches Vergessen zu überwinden ist einfacher als Sie denken | 零花:克服灾难的遗忘比你想象的容易 2501.01045v4 |
Authors (7): Tao Feng, Wei Li, Didi Zhu, Hangjie Yuan, Wendi Zheng, Dan Zhang, Jie Tang
Backpropagation provides a generalized configuration for overcoming catastrophic forgetting. Optimizers such as SGD and Adam are commonly used for weight updates in continual learning and continual pre-training. However, access to gradient information is not always feasible in practice due to black-box APIs, hardware constraints, or non-differentiable systems, a challenge we refer to as the gradient bans. To bridge this gap, we introduce ZeroFlow, the first benchmark designed to evaluate gradient-free optimization algorithms for overcoming forgetting. ZeroFlow examines a suite of forward pass-based methods across various algorithms, forgetting scenarios, and datasets. Our results show that forward passes alone can be sufficient to mitigate forgetting. We uncover novel optimization principles that highlight the potential of forward pass-based methods in mitigating forgetting, managing task conflicts, and reducing memory demands. Additionally, we propose new enhancements that further improve forgetting resistance using only forward passes. This work provides essential tools and insights to advance the development of forward-pass-based methods for continual learning.
Article 1702
Title@2025-06-06 (5): Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model
Title: Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model | Neuraler Zusammenbruch in kumulativen Linkmodellen für Ordinalregression: Eine Analyse mit ungezwungenem Feature Model | Ordinal 倒退累积链接模型中的神经折叠:含有不受限制特性模型的分析 2506.05801v1 |
Authors (3): Chuang Ma, Tomoyuki Obuchi, Toshiyuki Tanaka
A phenomenon known as “Neural Collapse (NC)” in deep classification tasks, in which the penultimate-layer features and the final classifiers exhibit an extremely simple geometric structure, has recently attracted considerable attention, with the expectation that it can deepen our understanding of how deep neural networks behave. The Unconstrained Feature Model (UFM) has been proposed to explain NC theoretically, and a growing body of work extends NC to tasks other than classification and leverages it for practical applications. In this study, we investigate whether a similar phenomenon arises in deep Ordinal Regression (OR) tasks by combining the cumulative link model for OR with the UFM. We show that a phenomenon we call Ordinal Neural Collapse (ONC) indeed emerges and is characterized by the following three properties: (ONC1) all optimal features in the same class collapse to their within-class mean when regularization is applied; (ONC2) these class means align with the classifier, meaning that they collapse onto a one-dimensional subspace; (ONC3) the optimal latent variables (corresponding to logits or preactivations in classification tasks) are aligned according to the class order, and in particular, in the zero-regularization limit, a highly local and simple geometric relationship emerges between the latent variables and the threshold values. We prove these properties analytically within the UFM framework with fixed threshold values and corroborate them empirically across a variety of datasets. We also discuss how these insights can be leveraged in OR, highlighting the use of fixed thresholds.
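For readers unfamiliar with the cumulative link model mentioned above, the following is a minimal sketch of how a scalar latent variable and a set of ordered thresholds give class probabilities. It assumes a logistic link and hypothetical names (`cumulative_link_probs`, `thresholds`); the abstract does not state which link function or parameterization the authors use.

```python
import math


def sigmoid(x):
    """Logistic function, used here as the (assumed) link."""
    return 1.0 / (1.0 + math.exp(-x))


def cumulative_link_probs(z, thresholds):
    """Return P(y = k) for k = 0..K-1 given latent z and K-1 sorted thresholds.

    Under the assumed logistic link, P(y <= k) = sigmoid(b_k - z).
    """
    # Pad the cumulative distribution with 0 (below the first class) and 1.
    cdf = [0.0] + [sigmoid(b - z) for b in thresholds] + [1.0]
    # Class probabilities are differences of consecutive cumulative values.
    return [cdf[k + 1] - cdf[k] for k in range(len(cdf) - 1)]


probs = cumulative_link_probs(z=0.3, thresholds=[-1.0, 0.0, 1.0])
```

With fixed thresholds, as highlighted in the abstract, only the latent `z` is learned, so the class order is encoded entirely by where `z` falls relative to the thresholds.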
Article 1703
Title@2025-06-06 (5): Uncertainty Propagation in the Fast Fourier Transform
Title: Uncertainty Propagation in the Fast Fourier Transform | Ungewissheitsverbreitung in der schnellen Fourier-Transformation | 快速傅里叶变形中的不确定性传播 2504.10136v2 |
Authors (3): Luca Schmid, Charlotte Muth, Laurent Schmalen
We address the problem of uncertainty propagation in the discrete Fourier transform by modeling the fast Fourier transform as a factor graph. Building on this representation, we propose an efficient framework for approximate Bayesian inference using belief propagation (BP) and expectation propagation, extending its applicability beyond Gaussian assumptions. By leveraging an appropriate BP message representation and a suitable schedule, our method achieves stable convergence with accurate mean and variance estimates. Numerical experiments in representative scenarios from communications demonstrate the practical potential of the proposed framework for uncertainty-aware inference in probabilistic systems operating across both time and frequency domain.
Article 1704
Title@2025-06-06 (5): Option Pricing Using Ensemble Learning
Title: Option Pricing Using Ensemble Learning | Optionspreise mit Ensemble-Lernen | 使用集合学习的选项定价 2506.05799v1 |
Authors (2): Zeyuan Li, Qingdao Huang
Ensemble learning is characterized by flexibility, high precision, and refined structure. As a critical component within computational finance, option pricing with machine learning requires both high predictive accuracy and reduced structural complexity, features that align well with the inherent advantages of ensemble learning. This paper investigates the application of ensemble learning to option pricing, and conducts a comparative analysis with classical machine learning models to assess their performance in terms of accuracy, local feature extraction, and robustness to noise. A novel experimental strategy is introduced, leveraging parameter transfer across experiments to improve robustness and realism in financial simulations. Building upon this strategy, an evaluation mechanism is developed that incorporates a scoring strategy and a weighted evaluation strategy explicitly emphasizing the foundational role of financial theory. This mechanism embodies an orderly integration of theoretical finance and computational methods. In addition, the study examines the interaction between the sliding window technique and noise, revealing nuanced patterns that suggest a potential connection relevant to ongoing research in machine learning and data science.
Article 1705
Title@2025-06-06 (5): Maximum Entropy Reinforcement Learning with Diffusion Policy
Title: Maximum Entropy Reinforcement Learning with Diffusion Policy | Maximales Entropie-Verstärkung-Lernen mit Diffusionspolitik | 采用传播政策进行最大内容强化学习 2502.11612v3 |
Authors (3): Xiaoyi Dong, Jian Cheng, Xi Sheryl Zhang
The Soft Actor-Critic (SAC) algorithm with a Gaussian policy has become a mainstream implementation for realizing the Maximum Entropy Reinforcement Learning (MaxEnt RL) objective, which incorporates entropy maximization to encourage exploration and enhance policy robustness. While the Gaussian policy performs well on simpler tasks, its exploration capacity and potential performance in complex multi-goal RL environments are limited by its inherent unimodality. In this paper, we employ the diffusion model, a powerful generative model capable of capturing complex multimodal distributions, as the policy representation to fulfill the MaxEnt RL objective, developing a method named MaxEnt RL with Diffusion Policy (MaxEntDP). Our method enables efficient exploration and brings the policy closer to the optimal MaxEnt policy. Experimental results on Mujoco benchmarks show that MaxEntDP outperforms the Gaussian policy and other generative models within the MaxEnt RL framework, and performs comparably to other state-of-the-art diffusion-based online RL algorithms. Our code is available at https://github.com/diffusionyes/MaxEntDP.
Article 1706
Title@2025-06-06 (5): EqCollide: Equivariant and Collision-Aware Deformable Objects Neural Simulator
Title: EqCollide: Equivariant and Collision-Aware Deformable Objects Neural Simulator | EqCollide: Gleichwertige und kollisionsfähige verformbare Objekte Neural Simulator | EqCollide: 等同和相撞-可变形物体神经模拟器 2506.05797v1 |
Authors (4): Qianyi Chen, Tianrun Gao, Chenbo Jiang, Tailin Wu
Simulating collisions of deformable objects is a fundamental yet challenging task due to the complexity of modeling solid mechanics and multi-body interactions. Existing data-driven methods often suffer from lack of equivariance to physical symmetries, inadequate handling of collisions, and limited scalability. Here we introduce EqCollide, the first end-to-end equivariant neural fields simulator for deformable objects and their collisions. We propose an equivariant encoder to map object geometry and velocity into latent control points. A subsequent equivariant Graph Neural Network-based Neural Ordinary Differential Equation models the interactions among control points via collision-aware message passing. To reconstruct velocity fields, we query a neural field conditioned on control point features, enabling continuous and resolution-independent motion predictions. Experimental results show that EqCollide achieves accurate, stable, and scalable simulations across diverse object configurations, and our model achieves 24.34% to 35.82% lower rollout MSE even compared with the best-performing baseline model. Furthermore, our model could generalize to more colliding objects and extended temporal horizons, and stay robust to input transformed with group action.
Article 1707
Title@2025-06-06 (5): TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
Title: TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining | TiC-LM: Ein Web-Scale-Benchmark für zeitkontinuierliches LLM-Vortraining | TIC-LM:培训前时间-持续性LLM网络基准 2504.02107v3 |
Authors (11): Jeffrey Li, Mohammadreza Armandpour, Iman Mirzadeh, Sachin Mehta, Vaishaal Shankar, Raviteja Vemulapalli, Samy Bengio, Oncel Tuzel, Mehrdad Farajtabar, Hadi Pouransari, Fartash Faghri
Large Language Models (LLMs) trained on historical web data inevitably become outdated. We investigate evaluation strategies and update methods for LLMs as new data becomes available. We introduce a web-scale dataset for time-continual pretraining of LLMs derived from 114 dumps of Common Crawl (CC) - orders of magnitude larger than previous continual language modeling benchmarks. We also design time-stratified evaluations across both general CC data and specific domains (Wikipedia, StackExchange, and code documentation) to assess how well various continual learning methods adapt to new data while retaining past knowledge. Our findings demonstrate that, on general CC data, autoregressive meta-schedules combined with a fixed-ratio replay of older data can achieve comparable held-out loss to re-training from scratch, while requiring significantly less computation (2.6x). However, the optimal balance between incorporating new data and replaying old data differs as replay is crucial to avoid forgetting on generic web data but less so on specific domains.
Article 1708
Title@2025-06-06 (5): Exploiting Similarity for Computation and Communication-Efficient Decentralized Optimization
Title: Exploiting Similarity for Computation and Communication-Efficient Decentralized Optimization | Ausnutzung von Ähnlichkeiten für die Berechnung und Kommunikation-Effiziente dezentralisierte Optimierung | 利用计算和通信效率-权力下放优化的相似性 2506.05791v1 |
Authors (4): Yuki Takezawa, Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich
Reducing communication complexity is critical for efficient decentralized optimization. The proximal decentralized optimization (PDO) framework is particularly appealing, as methods within this framework can exploit functional similarity among nodes to reduce communication rounds. Specifically, when local functions at different nodes are similar, these methods achieve faster convergence with fewer communication steps. However, existing PDO methods often require highly accurate solutions to subproblems associated with the proximal operator, resulting in significant computational overhead. In this work, we propose the Stabilized Proximal Decentralized Optimization (SPDO) method, which achieves state-of-the-art communication and computational complexities within the PDO framework. Additionally, we refine the analysis of existing PDO methods by relaxing subproblem accuracy requirements and leveraging average functional similarity. Experimental results demonstrate that SPDO significantly outperforms existing methods.
Article 1709
Title@2025-06-06 (5): Robust sensor fusion against on-vehicle sensor staleness
Title: Robust sensor fusion against on-vehicle sensor staleness | Robuste Sensor-Fusion gegen Stillstand des Sensors im Fahrzeug | 压制车辆传感器悬浮剂的强感传感器聚变 2506.05780v1 |
Authors (5): Meng Fan, Yifan Zuo, Patrick Blaes, Harley Montgomery, Subhasis Das
Sensor fusion is crucial for a performant and robust Perception system in autonomous vehicles, but sensor staleness, where data from different sensors arrives with varying delays, poses significant challenges. Temporal misalignment between sensor modalities leads to inconsistent object state estimates, severely degrading the quality of trajectory predictions that are critical for safety. We present a novel and model-agnostic approach to address this problem via (1) a per-point timestamp offset feature (for LiDAR and radar both relative to camera) that enables fine-grained temporal awareness in sensor fusion, and (2) a data augmentation strategy that simulates realistic sensor staleness patterns observed in deployed vehicles. Our method is integrated into a perspective-view detection model that consumes sensor data from multiple LiDARs, radars and cameras. We demonstrate that while a conventional model shows significant regressions when one sensor modality is stale, our approach reaches consistently good performance across both synchronized and stale conditions.
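The two ingredients named in the abstract, a per-point timestamp offset feature and a staleness augmentation, can be illustrated roughly as follows. This is only a sketch under stated assumptions: the point layout `(x, y, z, t)`, the function names, and the uniform delay distribution are all hypothetical, and a real pipeline would presumably substitute an older sensor sweep rather than merely relabel timestamps.

```python
import random


def add_time_offsets(points, camera_time):
    """Replace each point's timestamp with its offset from the camera timestamp."""
    return [(x, y, z, t - camera_time) for (x, y, z, t) in points]


def simulate_staleness(points, max_delay, rng):
    """Shift one sensor's timestamps back by a single random delay.

    The uniform distribution is an assumption; the abstract refers to
    staleness patterns observed in deployed vehicles.
    """
    delay = rng.uniform(0.0, max_delay)
    return [(x, y, z, t - delay) for (x, y, z, t) in points]


# Two hypothetical LiDAR points captured shortly after a camera frame at t = 10.0 s.
lidar = [(1.0, 2.0, 0.5, 10.02), (1.5, 2.1, 0.4, 10.04)]
stale = simulate_staleness(lidar, max_delay=0.1, rng=random.Random(0))
features = add_time_offsets(stale, camera_time=10.0)
```

Feeding the offset as an explicit feature lets the downstream model see how old each point is instead of assuming all modalities are synchronized.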
Article 1710
Title@2025-06-06 (5): Pegasus: A Universal Framework for Scalable Deep Learning Inference on the Dataplane
Title: Pegasus: A Universal Framework for Scalable Deep Learning Inference on the Dataplane | Pegasus: Ein universelles Framework für skalierbare Deep Learning-Schlussfolgerungen auf dem Dataplane | Pegasus:数据飞机可缩放深学习推理通用框架 2506.05779v1 |
Authors (12): Yinchao Zhang, Su Yao, Yong Feng, Kang Chen, Tong Li, Zhuotao Liu, Yi Zhao, Lexuan Zhang, Xiangyu Gao, Feng Xiong, Qi Li, Ke Xu
The paradigm of Intelligent DataPlane (IDP) embeds deep learning (DL) models on the network dataplane to enable intelligent traffic analysis at line-speed. However, the current use of the match-action table (MAT) abstraction on the dataplane is misaligned with DL inference, leading to several key limitations, including accuracy degradation, limited scale, and lack of generality. This paper proposes Pegasus to address these limitations. Pegasus translates DL operations into three dataplane-oriented primitives to achieve generality: Partition, Map, and SumReduce. Specifically, Partition “divides” high-dimensional features into multiple low-dimensional vectors, making them more suitable for the dataplane; Map “conquers” computations on the low-dimensional vectors in parallel with the technique of fuzzy matching, while SumReduce “combines” the computation results. Additionally, Pegasus employs Primitive Fusion to merge computations, improving scalability. Finally, Pegasus adopts full precision weights with fixed-point activations to improve accuracy. Our implementation on a P4 switch demonstrates that Pegasus can effectively support various types of DL models, including Multi-Layer Perceptron (MLP), Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), and AutoEncoder models on the dataplane. Meanwhile, Pegasus outperforms state-of-the-art approaches with an average accuracy improvement of up to 22.8%, along with up to 248x larger model size and 212x larger input scale.
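The Partition, Map, and SumReduce primitives described above can be sketched for a single dot product. This is an illustrative approximation only: nearest-centroid lookup stands in for the "fuzzy matching" mentioned in the abstract, the precomputed tables stand in for match-action tables, and all names (`partition`, `nearest`, `build_table`, `infer`) are hypothetical rather than taken from the Pegasus implementation.

```python
def partition(vec, size):
    """Partition: split a high-dimensional vector into low-dimensional chunks."""
    return [vec[i:i + size] for i in range(0, len(vec), size)]


def nearest(chunk, centroids):
    """Index of the centroid closest to the chunk (a stand-in for fuzzy matching)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(chunk, centroids[j])),
    )


def build_table(weight_chunk, centroids):
    """Precompute the partial dot product of every centroid with one weight chunk."""
    return [sum(c * w for c, w in zip(cent, weight_chunk)) for cent in centroids]


def infer(x, w, centroids, size):
    """Approximate dot(x, w) using table lookups only at inference time."""
    tables = [build_table(wc, centroids) for wc in partition(w, size)]
    # Map: each input chunk is matched to a centroid and looked up independently.
    partials = [tables[i][nearest(xc, centroids)]
                for i, xc in enumerate(partition(x, size))]
    # SumReduce: combine the partial results.
    return sum(partials)


centroids = [(0, 0), (0, 1), (1, 0), (1, 1)]
approx = infer([0.9, 0.1, 0.2, 1.1], [2, 3, 4, 5], centroids, size=2)
```

Here the exact dot product is 8.4 while the lookup-based result is 7, which shows the accuracy cost of coarse matching; a finer set of centroids narrows that gap at the price of larger tables.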
Article 1711
Title@2025-06-06 (5): Exploring Dark Knowledge under Various Teacher Capacities and Addressing Capacity Mismatch
Title: Exploring Dark Knowledge under Various Teacher Capacities and Addressing Capacity Mismatch | Dunkles Wissen unter verschiedenen Lehrer-Kapazitäten erforschen und Kapazitäts-Mismatch angehen | 在各种教师能力之下探索黑暗知识,解决能力不足问题 2405.13078v2 |
Authors (3): Wen-Shu Fan, Xin-Chun Li, De-Chuan Zhan
Knowledge Distillation (KD) could transfer the “dark knowledge” of a well-performing yet large neural network to a weaker but lightweight one. From the view of output logits and softened probabilities, this paper goes deeper into the dark knowledge provided by teachers with different capacities. Two fundamental observations are: (1) a larger teacher tends to produce probability vectors with lower distinction among non-ground-truth classes; (2) teachers with different capacities are basically consistent in their cognition of relative class affinity. Through abundant experimental studies we verify these observations and provide in-depth empirical explanations for them. We argue that the distinctness among incorrect classes embodies the essence of dark knowledge. A larger and more accurate teacher lacks this distinctness, which hampers its teaching ability compared to a smaller teacher, ultimately leading to the peculiar phenomenon named “capacity mismatch”. Building on this insight, this paper explores multiple simple yet effective ways to address capacity mismatch, achieving superior experimental results compared to previous approaches.
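To make "softened probabilities" and "distinction among non-ground-truth classes" concrete, here is a small sketch. The temperature softmax is the standard KD construction; `non_target_spread` is a hypothetical measure chosen for illustration and is not necessarily the statistic used in the paper.

```python
import math


def softened_probs(logits, temperature):
    """Softmax over logits divided by a temperature (standard in KD)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def non_target_spread(probs, target):
    """Range of the renormalized probabilities over the incorrect classes.

    Hypothetical measure: a small value means the teacher barely
    distinguishes among the wrong classes.
    """
    rest = [p for i, p in enumerate(probs) if i != target]
    total = sum(rest)
    rest = [p / total for p in rest]
    return max(rest) - min(rest)


# Two hypothetical teachers that agree on the correct class (index 0).
flat_teacher = softened_probs([5.0, 1.0, 0.9], temperature=2.0)
varied_teacher = softened_probs([5.0, 2.0, 0.0], temperature=2.0)
```

In this toy example `flat_teacher` assigns nearly equal mass to the two wrong classes, so its spread is small, which mirrors the abstract's claim about what larger teachers tend to produce.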
Article 1712
Title@2025-06-06 (5): ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation
Title: ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation | ActionPiece: Kontextuelle Tokenisierung von Handlungssequenzen für eine generative Empfehlung | ActionPiece: 产生建议的背景化行动序列 2502.13581v2 |
Authors (8): Yupeng Hou, Jianmo Ni, Zhankui He, Noveen Sachdeva, Wang-Cheng Kang, Ed H. Chi, Julian McAuley, Derek Zhiyuan Cheng
Generative recommendation (GR) is an emerging paradigm where user actions are tokenized into discrete token patterns and autoregressively generated as predictions. However, existing GR models tokenize each action independently, assigning the same fixed tokens to identical actions across all sequences without considering contextual relationships. This lack of context-awareness can lead to suboptimal performance, as the same action may hold different meanings depending on its surrounding context. To address this issue, we propose ActionPiece to explicitly incorporate context when tokenizing action sequences. In ActionPiece, each action is represented as a set of item features. Given the action sequence corpora, we construct the vocabulary by merging feature patterns as new tokens, based on their co-occurrence frequency both within individual sets and across adjacent sets. Considering the unordered nature of feature sets, we further introduce set permutation regularization, which produces multiple segmentations of action sequences with the same semantics. Our code is available at: https://github.com/google-deepmind/action_piece.
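The vocabulary construction step, counting how often feature pairs co-occur within a set and across adjacent sets, can be sketched as below. This is a simplified illustration: it uses plain unweighted counts and hypothetical feature strings, whereas the abstract does not specify how the two kinds of co-occurrence are weighted or how merges are applied afterwards.

```python
from collections import Counter
from itertools import combinations, product


def count_pairs(sequence):
    """Count feature-pair co-occurrences in a sequence of feature sets.

    Pairs are counted both within each action's set and between
    adjacent actions. Equal weighting of the two is an assumption.
    """
    counts = Counter()
    # Within-set pairs: every unordered pair of features of one action.
    for feats in sequence:
        for a, b in combinations(sorted(feats), 2):
            counts[(a, b)] += 1
    # Cross-set pairs: one feature from each of two adjacent actions.
    for left, right in zip(sequence, sequence[1:]):
        for a, b in product(left, right):
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
    return counts


# Two hypothetical actions, each an unordered set of item features.
sequence = [{"brand:A", "cat:shoe"}, {"brand:A", "cat:sock"}]
counts = count_pairs(sequence)
```

In a BPE-style loop, the most frequent pair would then be merged into a new token and the counts recomputed; that loop is omitted here.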
Article 1713
Title@2025-06-06 (5): Evaluating Neuron Explanations: A Unified Framework with Sanity Checks
Title: Evaluating Neuron Explanations: A Unified Framework with Sanity Checks | Neuron-Erklärungen bewerten: Ein einheitliches Framework mit Sanitätskontrollen | 评估内向解释:一个统一框架,检查是否合乎情理 2506.05774v1 |
Authors (3): Tuomas Oikarinen, Ge Yan, Tsui-Wei Weng
Understanding the function of individual units in a neural network is an important building block for mechanistic interpretability. This is often done by generating a simple text explanation of the behavior of individual neurons or units. For these explanations to be useful, we must understand how reliable and truthful they are. In this work we unify many existing explanation evaluation methods under one mathematical framework. This allows us to compare existing evaluation metrics, understand the evaluation pipeline with increased clarity and apply existing statistical methods on the evaluation. In addition, we propose two simple sanity checks on the evaluation metrics and show that many commonly used metrics fail these tests and do not change their score after massive changes to the concept labels. Based on our experimental and theoretical results, we propose guidelines that future evaluations should follow and identify a set of reliable evaluation metrics.
Article 1714
Title@2025-06-06 (5): SGD Jittering: A Training Strategy for Robust and Accurate Model-Based Architectures
Title: SGD Jittering: A Training Strategy for Robust and Accurate Model-Based Architectures | SGD Jittering: Eine Schulungsstrategie für robuste und präzise modellbasierte Architekturen | SGD JGT JUGT JIGT: 强健和准确的建模建筑培训战略 2410.14667v3 |
Authors (2): Peimeng Guan, Mark A. Davenport
Inverse problems aim to reconstruct unseen data from corrupted or perturbed measurements. While most work focuses on improving reconstruction quality, generalization accuracy and robustness are equally important, especially for safety-critical applications. Model-based architectures (MBAs), such as loop unrolling methods, are considered more interpretable and achieve better reconstructions. Empirical evidence suggests that MBAs are more robust to perturbations than black-box solvers, but the accuracy-robustness tradeoff in MBAs remains underexplored. In this work, we propose a simple yet effective training scheme for MBAs, called SGD jittering, which injects noise iteration-wise during reconstruction. We theoretically demonstrate that SGD jittering not only generalizes better than the standard mean squared error training but is also more robust to average-case attacks. We validate SGD jittering using denoising toy examples, seismic deconvolution, and single-coil MRI reconstruction. Both SGD jittering and its SPGD extension yield cleaner reconstructions for out-of-distribution data and demonstrate enhanced robustness against adversarial attacks.
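The idea of injecting noise iteration-wise can be illustrated on a plain unrolled gradient descent for a least-squares data term. This sketch omits the learned component of a real model-based architecture and the training loop entirely, and where exactly the noise enters each iteration is an assumption; the function name and parameters are hypothetical.

```python
import random


def unrolled_gd(y, A, steps, lr, sigma=0.0, rng=None):
    """Unrolled gradient descent on 0.5 * ||A x - y||^2 with optional jitter.

    When sigma > 0, zero-mean Gaussian noise is added to the iterate
    after every step (assumed placement of the jitter).
    """
    rows, cols = len(A), len(A[0])
    x = [0.0] * cols
    for _ in range(steps):
        # Residual r = A x - y and gradient A^T r.
        r = [sum(A[i][j] * x[j] for j in range(cols)) - y[i] for i in range(rows)]
        grad = [sum(A[i][j] * r[i] for i in range(rows)) for j in range(cols)]
        x = [x[j] - lr * grad[j] for j in range(cols)]
        if sigma > 0.0:
            x = [xj + rng.gauss(0.0, sigma) for xj in x]
    return x


A = [[1.0, 0.0], [0.0, 1.0]]  # identity forward operator, i.e. a denoising toy problem
y = [1.0, 2.0]
clean = unrolled_gd(y, A, steps=50, lr=0.5)
jittered = unrolled_gd(y, A, steps=50, lr=0.5, sigma=0.05, rng=random.Random(0))
```

During training, the jittered iterates would be passed through the learned modules so that the network sees perturbed intermediate reconstructions; at test time `sigma` would typically be set to zero.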
Article 1715
Title@2025-06-06 (5): Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks
Title: Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks | Visualisierung, Nachdenken und Abbau der verlorenen Landschaft tiefer neuraler Netzwerke | 深神经网络丧失的地貌景观 2405.12493v2 |
Authors (4): Yichu Xu, Xin-Chun Li, Lan Li, De-Chuan Zhan
The loss landscape of deep neural networks (DNNs) is commonly considered complex and wildly fluctuating. However, an interesting observation is that the loss surfaces plotted along Gaussian noise directions are almost v-basin ones with the perturbed model lying on the basin. This motivates us to rethink whether the 1D or 2D subspace could cover more complex local geometry structures, and how to mine the corresponding perturbation directions. This paper systematically and gradually categorizes the 1D curves from simple to complex, including v-basin, v-side, w-basin, w-peak, and vvv-basin curves. Notably, the latter two types are already hard to obtain via the intuitive construction of specific perturbation directions, and we need to propose proper mining algorithms to plot the corresponding 1D curves. Combining these 1D directions, various types of 2D surfaces are visualized such as the saddle surfaces and the bottom of a bottle of wine that are only shown by demo functions in previous works. Finally, we propose theoretical insights from the lens of the Hessian matrix to explain several of the observed phenomena.
Article 1716
Title@2025-06-06 (5): Predicting ICU In-Hospital Mortality Using Adaptive Transformer Layer Fusion
Title: Predicting ICU In-Hospital Mortality Using Adaptive Transformer Layer Fusion | Vorhersage der Krankenhaussterblichkeit der Intensivstation mittels adaptiver Transformer Layer Fusion | 利用适应性变异器图层融合预测伊斯兰法院联盟在医院内的死亡率 2506.04924v2 |
Authors (19): Han Wang, Ruoyun He, Guoguang Lao, Ting Liu, Hejiao Luo, Changqi Qin, Hongying Luo, Junmin Huang, Zihan Wei, Lu Chen, Yongzhi Xu, Ziqian Bi, Junhao Song, Tianyang Wang, Chia Xin Liang, Xinyuan Song, Huafeng Liu, Junfeng Hao, Chunjie Tian
Early identification of high-risk ICU patients is crucial for directing limited medical resources. We introduce ALFIA (Adaptive Layer Fusion with Intelligent Attention), a modular, attention-based architecture that jointly trains LoRA (Low-Rank Adaptation) adapters and an adaptive layer-weighting mechanism to fuse multi-layer semantic features from a BERT backbone. Trained on our rigorous cw-24 (CriticalWindow-24) benchmark, ALFIA surpasses state-of-the-art tabular classifiers in AUPRC while preserving a balanced precision-recall profile. The embeddings produced by ALFIA’s fusion module, capturing both fine-grained clinical cues and high-level concepts, enable seamless pairing with GBDTs (CatBoost/LightGBM) as ALFIA-boost, and deep neural networks as ALFIA-nn, yielding additional performance gains. Our experiments confirm ALFIA’s superior early-warning performance. By operating directly on routine clinical text, it furnishes clinicians with a convenient yet robust tool for risk stratification and timely intervention in critical-care settings.
nan
Article 1717
Title@2025-06-06 (5): AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation
Title: AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation | AANet: Virtuelles Screening unter struktureller Unsicherheit über Alignment und Aggregation | AANet:通过对齐和聚合在结构不确定性下进行虚拟筛选 2506.05768v1 |
Authors (8): Wenyu Zhu, Jianhui Wang, Bowen Gao, Yinjun Jia, Haichuan Tan, Ya-Qin Zhang, Wei-Ying Ma, Yanyan Lan
Virtual screening (VS) is a critical component of modern drug discovery, yet most existing methods, whether physics-based or deep learning-based, are developed around holo protein structures with known ligand-bound pockets. Consequently, their performance degrades significantly on apo or predicted structures such as those from AlphaFold2, which are more representative of real-world early-stage drug discovery, where pocket information is often missing. In this paper, we introduce an alignment-and-aggregation framework to enable accurate virtual screening under structural uncertainty. Our method comprises two core components: (1) a tri-modal contrastive learning module that aligns representations of the ligand, the holo pocket, and cavities detected from structures, thereby enhancing robustness to pocket localization error; and (2) a cross-attention based adapter for dynamically aggregating candidate binding sites, enabling the model to learn from activity data even without precise pocket annotations. We evaluated our method on a newly curated benchmark of apo structures, where it significantly outperforms state-of-the-art methods in the blind apo setting, improving the early enrichment factor (EF1%) from 11.75 to 37.19. Notably, it also maintains strong performance on holo structures. These results demonstrate the promise of our approach in advancing first-in-class drug discovery, particularly in scenarios lacking experimentally resolved protein-ligand complexes.
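To make the EF1% figure above concrete, the following is a minimal sketch of the commonly used enrichment-factor definition: the hit rate among the top-ranked fraction of compounds divided by the hit rate over the whole library. The function name, the toy library, and the rounding of the top slice are illustrative assumptions; the paper's exact evaluation protocol may differ.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """EF at `fraction`: hit rate in the top-ranked slice / overall hit rate."""
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    # Size of the top slice (assumption: truncate, but keep at least one compound).
    n_top = max(1, int(n_total * fraction))
    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits_top = sum(is_active[i] for i in ranked[:n_top])
    return (hits_top * n_total) / (n_top * n_actives)

# Hypothetical library: 200 compounds, 10 actives, two of them ranked first.
scores = [float(200 - i) for i in range(200)]
is_active = [1 if i < 2 or 100 <= i < 108 else 0 for i in range(200)]
ef1 = enrichment_factor(scores, is_active)
```

In this toy library the top 1% holds two compounds, both active, against a base rate of 10 in 200, so `ef1` is 20. A random ranking would give a value near 1.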
Article 1718
Title@2025-06-06 (5): On the Query Complexity of Verifier-Assisted Language Generation
Title: On the Query Complexity of Verifier-Assisted Language Generation | Zur Fragekomplexität der sprachlichen Generierung von Prüfern | 关于验证者协助语言生成的查询复杂程度 2502.12123v2 |
Authors (6): Edoardo Botta, Yuchen Li, Aashay Mehta, Jordan T. Ash, Cyril Zhang, Andrej Risteski
Recently, a plethora of works have proposed inference-time algorithms (e.g., best-of-n) that incorporate verifiers to assist the generation process. Their quality-efficiency trade-offs have been empirically benchmarked on a variety of constrained generation tasks, but the algorithmic design landscape is still poorly understood. In this paper, we develop a mathematical framework for reasoning about constrained generation using a pre-trained language model generator oracle and a process verifier, which can decide whether a prefix can be extended to a string that satisfies the constraints of choice. We show that even in very simple settings, access to a verifier can turn an intractable problem (information-theoretically or computationally) into a tractable one. In fact, we show that even simple algorithms, like tokenwise rejection sampling, can benefit significantly from access to a verifier. Empirically, we show that a natural modification of tokenwise rejection sampling, in which the sampler is allowed to “backtrack” (i.e., erase the final few generated tokens), has robust and substantive benefits over natural baselines (e.g., (blockwise) rejection sampling and nucleus sampling) in terms of computational efficiency, accuracy, and diversity.
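As a rough illustration of tokenwise rejection sampling with backtracking, the sketch below uses a toy setup that is entirely assumed: a two-token vocabulary, a uniform stand-in for the generator oracle, and a process verifier for the constraint "no two consecutive b tokens". It is not the authors' algorithm, only one simple instance of the idea.

```python
import random

VOCAB = ["a", "b"]
TARGET_LEN = 8

def generator_oracle(prefix, rng):
    """Hypothetical stand-in for a language model: uniform next-token sampling."""
    return rng.choice(VOCAB)

def process_verifier(prefix):
    """Toy process verifier: a prefix is extendable iff it contains no 'bb'."""
    return "bb" not in "".join(prefix)

def sample_with_backtracking(rng, backtrack=2, max_queries=10_000):
    """Grow a string token by token; on rejection, erase the last few tokens."""
    prefix, queries = [], 0
    while len(prefix) < TARGET_LEN:
        if queries >= max_queries:
            raise RuntimeError("query budget exhausted")
        token = generator_oracle(prefix, rng)
        queries += 1  # one generator call plus one verifier call per attempt
        if process_verifier(prefix + [token]):
            prefix.append(token)
        else:
            # Rejected: drop up to `backtrack` accepted tokens and resume.
            del prefix[max(0, len(prefix) - backtrack):]
    return "".join(prefix), queries

sample, n_queries = sample_with_backtracking(random.Random(0))
```

Setting `backtrack=0` recovers plain tokenwise rejection sampling, where a rejected token is simply resampled. `n_queries` gives a simple handle on the query cost discussed in the abstract.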
Article 1719
Title@2025-06-06 (5): BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning
Title: BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning | BiTrajDiff: Bidirektionale Trajektorien-Generation mit Diffusionsmodellen für Offline-Verstärkung | BiTrajDiff:双向轨迹生成,带有离线强化学习传播模型 2506.05762v1 |
Authors (6): Yunpeng Qing, Shuo Chen, Yixiao Chi, Shunyu Liu, Sixu Lin, Changqing Zou
Recent advances in offline Reinforcement Learning (RL) have proven that effective policy learning can benefit from imposing conservative constraints on pre-collected datasets. However, such static datasets often exhibit distribution bias, resulting in limited generalizability. To address this limitation, a straightforward solution is data augmentation (DA), which leverages generative models to enrich the data distribution. Despite the promising results, current DA techniques focus solely on reconstructing future trajectories from given states, while ignoring the exploration of history transitions that reach them. This single-direction paradigm inevitably hinders the discovery of diverse behavior patterns, especially those leading to critical states that may have yielded high-reward outcomes. In this work, we introduce Bidirectional Trajectory Diffusion (BiTrajDiff), a novel DA framework for offline RL that models both future and history trajectories from any intermediate state. Specifically, we decompose the trajectory generation task into two independent yet complementary diffusion processes: one generating forward trajectories to predict future dynamics, and the other generating backward trajectories to trace essential history transitions. BiTrajDiff can efficiently leverage critical states as anchors to expand into potentially valuable yet underexplored regions of the state space, thereby facilitating dataset diversity. Extensive experiments on the D4RL benchmark suite demonstrate that BiTrajDiff achieves superior performance compared to other advanced DA methods across various offline RL backbones.
Article 1720
Title@2025-06-06 (5): Mapping correlations and coherence: adjacency-based approach to data visualization and regularity discovery
Title: Mapping correlations and coherence: adjacency-based approach to data visualization and regularity discovery | Mapping-Korrelationen und Kohärenz: Adjazenz-basierter Ansatz zur Datenvisualisierung und Regelmäßigkeits-Entdeckung | 绘图的相互关系和一致性:基于对应的可视化和定期发现数据方法 2506.05758v1 |
Authors (1): Guang-Xing Li
The development of science has been transforming humanity’s view of nature for centuries. Effectively observing structures and patterns to discover regularities in data is a key step toward theory-building. With increasingly complex data being obtained, revealing regularities systematically has become a challenge. Correlation is one of the most commonly used and effective ways to describe regularities in data, yet for complex patterns, spatial inhomogeneity and complexity can often undermine the correlations. We present an algorithm to derive maps representing the type and degree of correlations, taking the two-fold symmetry of the correlation vector into full account using the Stokes parameter. The method allows for a spatially resolved view of the nature and strength of correlations between physical quantities. In the correlation view, a region can often be separated into subregions with different types of correlations. Subregions correspond to physical regimes for physical systems, or to climate zones for climate maps. The simplicity of the method makes it widely applicable to a variety of data, and its correlation-based approach makes the map particularly useful in revealing regularities in physical systems and the like. As a new and efficient approach to representing data, the method should facilitate the development of new computational approaches to regularity discovery.
Article 1721
Title@2025-06-06 (5): FlowOE: Imitation Learning with Flow Policy from Ensemble RL Experts for Optimal Execution under Heston Volatility and Concave Market Impacts
Title: FlowOE: Imitation Learning with Flow Policy from Ensemble RL Experts for Optimal Execution under Heston Volatility and Concave Market Impacts | FlowOE: Imitation Learning with Flow Policy von Ensemble RL Experts for Optimal Execution unter Heston Volatility und Concave Market Impacts | FlowOE:在赫斯顿波动和集中市场影响下最优化执行的集合RL专家以流动政策进行模拟学习 2506.05755v1 |
Authors (2): Yang Li, Zhi Chen
Optimal execution in financial markets refers to the process of strategically transacting a large volume of assets over a period to achieve the best possible outcome by balancing the trade-off between market impact costs and timing or volatility risks. Traditional optimal execution strategies, such as static Almgren-Chriss models, often prove suboptimal in dynamic financial markets. This paper proposes flowOE, a novel imitation learning framework based on flow matching models, to address these limitations. FlowOE learns from a diverse set of traditional expert strategies and adaptively selects the most suitable expert behavior for prevailing market conditions. A key innovation is the incorporation of a refining loss function during the imitation process, enabling flowOE not only to mimic but also to improve upon the learned expert actions. To the best of our knowledge, this work is the first to apply flow matching models to a stochastic optimal execution problem. Empirical evaluations across various market conditions demonstrate that flowOE significantly outperforms both the specifically calibrated expert models and other traditional benchmarks, achieving higher profits with reduced risk. These results underscore the practical applicability and potential of flowOE to enhance adaptive optimal execution.
Article 1722
Title@2025-06-06 (5): Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective
Title: Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective | Beschränkte Probenahmen für Sprachmodelle sollten einfach sein: Eine MCMC-Perspektive | 语言模式抽样应容易:MCMC视角 2506.05754v1 |
Authors (6): Emmanuel Anaya Gonzalez, Sairam Vaidya, Kanghee Park, Ruyi Ji, Taylor Berg-Kirkpatrick, Loris D’Antoni
Constrained decoding enables Language Models (LMs) to produce samples that provably satisfy hard constraints. However, existing constrained-decoding approaches often distort the underlying model distribution, a limitation that is especially problematic in applications like program fuzzing, where one wants to generate diverse and valid program inputs for testing purposes. We propose a new constrained sampling framework based on Markov Chain Monte Carlo (MCMC) that simultaneously satisfies three core desiderata: constraint satisfying (every sample satisfies the constraint), monotonically converging (the sampling process converges to the true conditional distribution), and efficient (high-quality samples emerge in few steps). Our method constructs a proposal distribution over valid outputs and applies a Metropolis-Hastings acceptance criterion based on the LM’s likelihood, ensuring principled and efficient exploration of the constrained space. Empirically, our sampler outperforms existing methods on both synthetic benchmarks and real-world program fuzzing tasks.
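The Metropolis-Hastings idea described above can be sketched on a toy problem. Everything here is an assumption made for illustration: a three-token "language model" with independent token probabilities, the constraint "contains at least one b", and a uniform proposal over the valid strings. The paper's proposal distribution is presumably more sophisticated.

```python
import itertools
import math
import random

# Hypothetical toy LM: tokens are independent with fixed probabilities.
TOKEN_LOGP = {"a": math.log(0.7), "b": math.log(0.3)}

def lm_logprob(s):
    """Log-likelihood of string s under the toy LM."""
    return sum(TOKEN_LOGP[t] for t in s)

def satisfies_constraint(s):
    """Toy hard constraint: the string must contain at least one 'b'."""
    return "b" in s

# All length-3 strings that satisfy the constraint (the proposal's support).
VALID = ["".join(p) for p in itertools.product("ab", repeat=3)
         if satisfies_constraint("".join(p))]

def mh_chain(n_steps, rng):
    """Metropolis-Hastings with a uniform (hence symmetric) proposal over VALID."""
    current = rng.choice(VALID)
    samples = []
    for _ in range(n_steps):
        proposal = rng.choice(VALID)
        log_ratio = lm_logprob(proposal) - lm_logprob(current)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current = proposal
        samples.append(current)
    return samples

samples = mh_chain(20000, random.Random(0))
```

Because every proposal is drawn from the valid set, each state of the chain satisfies the constraint. The acceptance rule makes the chain's stationary distribution the LM distribution conditioned on the constraint, rather than a distorted version of it.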
Article 1723
Title@2025-06-06 (5): Integrating Spatiotemporal Features in LSTM for Spatially Informed COVID-19 Hospitalization Forecasting
Title: Integrating Spatiotemporal Features in LSTM for Spatially Informed COVID-19 Hospitalization Forecasting | Integration von Spatiotemporalen Features in LSTM für spatially Informed COVID-19 Hospitalization Forecasting | LSTM综合地貌特征纳入LSTM系统,用于空间知情COVID-19医院化预测 2506.05752v1 |
Authors (5): Zhongying Wang, Thoai D. Ngo, Hamidreza Zoraghein, Benjamin Lucas, Morteza Karimzadeh
The COVID-19 pandemic’s severe impact highlighted the need for accurate, timely hospitalization forecasting to support effective healthcare planning. However, most forecasting models struggled, especially during variant surges, when they were needed most. This study introduces a novel Long Short-Term Memory (LSTM) framework for forecasting daily state-level incident hospitalizations in the United States. We present a spatiotemporal feature, Social Proximity to Hospitalizations (SPH), derived from Facebook’s Social Connectedness Index to improve forecasts. SPH serves as a proxy for interstate population interaction, capturing transmission dynamics across space and time. Our parallel LSTM architecture captures both short- and long-term temporal dependencies, and our multi-horizon ensembling strategy balances consistency and forecasting error. Evaluation against COVID-19 Forecast Hub ensemble models during the Delta and Omicron surges reveals the superiority of our model. On average, our model surpasses the ensemble by 27, 42, 54, and 69 hospitalizations per state on the $7^{th}$, $14^{th}$, $21^{st}$, and $28^{th}$ forecast days, respectively, during the Omicron surge. Data-ablation experiments confirm SPH’s predictive power, highlighting its effectiveness in enhancing forecasting models. This research not only advances hospitalization forecasting but also underscores the significance of spatiotemporal features, such as SPH, in refining predictive performance when modeling the complex dynamics of infectious disease spread.
Article 1724
Title@2025-06-06 (5): Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models
Title: Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models | Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models | 低Rank适应的计算限制(LORA)变换模型的精确调整 2406.03136v2 |
Authors (5): Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu
We study the computational limits of Low-Rank Adaptation (LoRA) for finetuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup. This allows us to (i) identify a phase transition behavior of efficiency assuming the Strong Exponential Time Hypothesis (SETH), and (ii) prove the existence of almost linear algorithms by controlling the LoRA update computation term by term. For the former, we identify a sharp transition in the efficiency of all possible rank-$r$ LoRA update algorithms for transformers, based on specific norms resulting from the multiplications of the input sequence $X$, pretrained weights ${W^\star}$, and adapter matrices $\alpha B A/r$. Specifically, we derive a shared upper bound threshold for such norms, and show that efficient (sub-quadratic) approximation algorithms of LoRA exist only below this threshold. For the latter, we prove the existence of almost linear approximation algorithms for LoRA adaptation by utilizing the hierarchical low-rank structures of LoRA gradients and approximating the gradients with a series of chained low-rank approximations. To showcase our theory, we consider two practical scenarios: partial (e.g., only $W_V$ and $W_Q$) and full adaptations (e.g., $W_Q$, $W_V$, and $W_K$) of weights in attention heads.
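For reference, the adapter term $\alpha B A/r$ mentioned above corresponds to the usual LoRA parameterization of an adapted weight matrix. The shapes below, with $d$ standing for the weight dimension, are assumptions chosen for illustration rather than details taken from the abstract:

$$
W = W^\star + \frac{\alpha}{r} B A, \qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times d},\quad r \ll d .
$$

Only $A$ and $B$ are trained while $W^\star$ stays frozen, so the update $W - W^\star$ has rank at most $r$.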
Article 1725
Title@2025-06-06 (5): Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance
Title: Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance | Effiziente Online-RFT mit Plug-and-Play-LLM-Richtern: Entsperren State-of-the-Art-Performance | 与 “ 插插 “ 和 “ Play LLM “ 法官一起高效在线RFT:解锁最新业绩 2506.05748v1 |
Authors (2): Rudransh Agnihotri, Ananya Pandey
Reward-model training is the cost bottleneck in modern Reinforcement Learning from Human Feedback (RLHF) pipelines, often requiring tens of billions of parameters and an offline preference-tuning phase. In the proposed method, a frozen, instruction-tuned 7B LLM is augmented with only a one-line JSON rubric and a rank-16 LoRA adapter (affecting just 0.8% of the model’s parameters), enabling it to serve as a complete substitute for the previously used heavyweight evaluation models. The plug-and-play judge achieves 96.2% accuracy on RewardBench, outperforming specialized reward networks ranging from 27B to 70B parameters. Additionally, it allows a 7B actor to outperform the top 70B DPO baseline, which scores 61.8%, by achieving 92% exact-match accuracy on GSM-8K using online PPO. Thorough ablations indicate that (i) six in-context demonstrations deliver the majority of the zero-to-few-shot improvements (+2pp), and (ii) the LoRA adapter effectively closes the remaining gap, particularly in the safety and adversarial Chat-Hard segments. The proposed work introduces HH-Rationales, a subset of 10,000 pairs from Anthropic HH-RLHF accompanied by human-generated justifications, to examine interpretability. GPT-4 scoring indicates that our LoRA judge attains approximately 9/10 in similarity to human explanations, while zero-shot judges score around 5/10. These results indicate that the combination of prompt engineering and a tiny LoRA adapter produces a cost-effective, transparent, and easily adjustable reward function, removing the offline phase while achieving new state-of-the-art outcomes for both static evaluation and online RLHF.
Article 1726
Title@2025-06-06 (5): SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models
Title: SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models | SPRINT: Interleaved Planung und parallelisierte Ausführung in vernünftigen Modellen ermöglichen | SPRINT: 在理由说明模式中促进跨离规划和平行执行 2506.05745v1 |
Authors (6): Emil Biju, Shayan Talaei, Zhemin Huang, Mohammadreza Pourreza, Azalia Mirhoseini, Amin Saberi
Large reasoning models (LRMs) excel at complex reasoning tasks but typically generate lengthy sequential chains-of-thought, resulting in long inference times before arriving at the final answer. To address this challenge, we introduce SPRINT, a novel post-training and inference-time framework designed to enable LRMs to dynamically identify and exploit opportunities for parallelization during their reasoning process. SPRINT incorporates an innovative data curation pipeline that reorganizes natural language reasoning trajectories into structured rounds of long-horizon planning and parallel execution. By fine-tuning LRMs on a small amount of such curated data, the models learn to dynamically identify independent subtasks within extended reasoning processes and effectively execute them in parallel. Through extensive evaluations, we show that the models fine-tuned with the SPRINT framework match the performance of reasoning models on complex domains such as mathematics while generating up to ~39% fewer sequential tokens on problems requiring more than 8000 output tokens. Finally, we observe consistent results transferred to two out-of-distribution tasks of GPQA and Countdown with up to 45% and 65% reduction in average sequential tokens for longer reasoning trajectories, while achieving the performance of the fine-tuned reasoning model.
Article 1727
Title@2025-06-06 (5): In-context denoising with one-layer transformers: connections between attention and associative memory retrieval
Title: In-context denoising with one-layer transformers: connections between attention and associative memory retrieval | In-Context-Denoising mit Einschichttransformatoren: Verbindungen zwischen Aufmerksamkeit und assoziativem Speicherabruf | 与一层变压器的自通脱色:注意和联动内存检索之间的联系 2502.05164v2 |
Authors (3): Matthew Smart, Alberto Bietti, Anirvan M. Sengupta
We introduce in-context denoising, a task that refines the connection between attention-based architectures and dense associative memory (DAM) networks, also known as modern Hopfield networks. Using a Bayesian framework, we show theoretically and empirically that certain restricted denoising problems can be solved optimally even by a single-layer transformer. We demonstrate that a trained attention layer processes each denoising prompt by performing a single gradient descent update on a context-aware DAM energy landscape, where context tokens serve as associative memories and the query token acts as an initial state. This one-step update yields better solutions than exact retrieval of either a context token or a spurious local minimum, providing a concrete example of DAM networks extending beyond the standard retrieval paradigm. Overall, this work solidifies the link between associative memory and attention mechanisms first identified by Ramsauer et al., and demonstrates the relevance of associative memory models in the study of in-context learning.
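The link between a single gradient step and attention can be checked numerically. The sketch below assumes the standard log-sum-exp energy of modern Hopfield networks, $E(s) = \tfrac{1}{2}\lVert s\rVert^2 - \beta^{-1}\log\sum_i \exp(\beta\, x_i^\top s)$, with the context tokens $x_i$ as memories and a step size of 1. The paper's context-aware energy and trained weights may differ, so this only illustrates the general mechanism.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    total = sum(e)
    return [x / total for x in e]

def energy(s, context, beta):
    """Assumed DAM energy: 0.5*||s||^2 - (1/beta) * logsumexp(beta * <x_i, s>)."""
    scores = [beta * dot(x, s) for x in context]
    m = max(scores)
    lse = m + math.log(sum(math.exp(v - m) for v in scores))
    return 0.5 * dot(s, s) - lse / beta

def energy_grad(s, context, beta):
    """Gradient of the energy: s minus the softmax-weighted average of memories."""
    w = softmax([beta * dot(x, s) for x in context])
    readout = [sum(wi * x[k] for wi, x in zip(w, context)) for k in range(len(s))]
    return [sk - rk for sk, rk in zip(s, readout)]

def attention(query, context, beta):
    """Softmax attention with the context tokens acting as both keys and values."""
    w = softmax([beta * dot(x, query) for x in context])
    return [sum(wi * x[k] for wi, x in zip(w, context)) for k in range(len(query))]

# Hypothetical context tokens (memories) and a noisy query (initial state).
context = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [0.9, 0.2]
beta = 2.0

# One gradient descent step of size 1, starting from the query token.
step = [q - g for q, g in zip(query, energy_grad(query, context, beta))]
```

With step size 1 the $s$ term cancels, so `step` coincides with `attention(query, context, beta)`. The update lands on a softmax-weighted mixture of context tokens rather than on any single stored token, consistent with the abstract's point about going beyond exact retrieval.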
Article 1728
Title@2025-06-06 (5): FinSage: A Multi-aspect RAG System for Financial Filings Question Answering
Title: FinSage: A Multi-aspect RAG System for Financial Filings Question Answering | FinSage: Ein Multi-Aspect-RAG-System für Finanzfilialen Frageantworten | FinSage:财务申报问题解答的多层RAG系统 2504.14493v3 |
Authors (16): Xinyu Wang, Jijun Chi, Zhenghan Tai, Tung Sum Thomas Kwok, Muzhi Li, Zhuhong Li, Hailin He, Yuchen Hua, Peng Lu, Suyuchen Wang, Yihong Wu, Jerry Huang, Jingrui Tian, Fengran Mo, Yufei Cui, Ling Zhou
Leveraging large language models in real-world settings often requires domain-specific data and tools in order to follow the complex regulations that govern acceptable use. Within financial sectors, modern enterprises increasingly rely on Retrieval-Augmented Generation (RAG) systems to address complex compliance requirements in financial document workflows. However, existing solutions struggle to account for the inherent heterogeneity of data (e.g., text, tables, diagrams) and the evolving nature of regulatory standards used in financial filings, leading to compromised accuracy in critical information extraction. We propose FinSage, a multi-aspect RAG framework tailored for regulatory compliance analysis in multi-modal financial documents. FinSage introduces three innovative components: (1) a multi-modal pre-processing pipeline that unifies diverse data formats and generates chunk-level metadata summaries, (2) a multi-path sparse-dense retrieval system augmented with query expansion (HyDE) and metadata-aware semantic search, and (3) a domain-specialized re-ranking module fine-tuned via Direct Preference Optimization (DPO) to prioritize compliance-critical content. Extensive experiments demonstrate that FinSage achieves an impressive recall of 92.51% on 75 expert-curated questions and surpasses the best baseline method on the FinanceBench question answering datasets by 24.06% in accuracy. Moreover, FinSage has been successfully deployed as a financial question-answering agent in online meetings, where it has already served more than 1,200 people.
Article 1729
Title@2025-06-06 (5): ResearchTown: Simulator of Human Research Community
Title: ResearchTown: Simulator of Human Research Community | ResearchTown: Simulator der Human Research Community | 研究对象:人类研究界模拟者 2412.17767v2 |
Authors (8): Haofei Yu, Zhaochen Hong, Zirui Cheng, Kunlun Zhu, Keyang Xuan, Jinwei Yao, Tao Feng, Jiaxuan You
Large Language Models (LLMs) have demonstrated remarkable potential in scientific domains, yet a fundamental question remains unanswered: Can we simulate human research communities with LLMs? Addressing this question can deepen our understanding of the processes behind idea brainstorming and inspire the automatic discovery of novel scientific insights. In this work, we propose ResearchTown, a multi-agent framework for research community simulation. Within this framework, the human research community is simplified as an agent-data graph, where researchers and papers are represented as agent-type and data-type nodes, respectively, and connected based on their collaboration relationships. We also introduce TextGNN, a text-based inference framework that models various research activities (e.g., paper reading, paper writing, and review writing) as special forms of a unified message-passing process on the agent-data graph. To evaluate the quality of the research community simulation, we present ResearchBench, a benchmark that uses a node-masking prediction task for scalable and objective assessment based on similarity. Our experiments reveal three key findings: (1) ResearchTown can provide a realistic simulation of collaborative research activities, including paper writing and review writing; (2) ResearchTown can maintain robust simulation with multiple researchers and diverse papers; (3) ResearchTown can generate interdisciplinary research ideas that potentially inspire pioneering research directions.
Article 1730
Title@2025-06-06 (5): Generalized Incremental Learning under Concept Drift across Evolving Data Streams
Title: Generalized Incremental Learning under Concept Drift across Evolving Data Streams | Allgemeines Inkrementelles Lernen unter Konzept Drift über Evolving Data Streams | 贯穿不断演变的数据流的 “ 概念漂流 “ 概念下的普通递增学习 2506.05736v1 |
Authors (3): En Yu, Jie Lu, Guangquan Zhang
Real-world data streams exhibit inherent non-stationarity characterized by concept drift, posing significant challenges for adaptive learning systems. While existing methods address isolated distribution shifts, they overlook the critical co-evolution of label spaces and distributions under limited supervision and persistent uncertainty. To address this, we formalize Generalized Incremental Learning under Concept Drift (GILCD), characterizing the joint evolution of distributions and label spaces in open-environment streaming contexts, and propose a novel framework called Calibrated Source-Free Adaptation (CSFA). First, CSFA introduces a training-free prototype calibration mechanism that dynamically fuses emerging prototypes with base representations, enabling stable new-class identification without optimization overhead. Second, we design a novel source-free adaptation algorithm, i.e., Reliable Surrogate Gap Sharpness-aware (RSGS) minimization. It integrates sharpness-aware perturbation loss optimization with surrogate gap minimization, while employing entropy-based uncertainty filtering to discard unreliable samples. This mechanism ensures robust distribution alignment and mitigates generalization degradation caused by uncertainties. Therefore, CSFA establishes a unified framework for stable adaptation to evolving semantics and distributions in open-world streaming scenarios. Extensive experiments validate the superior performance and effectiveness of CSFA compared to state-of-the-art approaches.
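The entropy-based uncertainty filtering step mentioned above can be sketched as follows. This is a generic illustration, not the RSGS procedure itself: the function names, the toy predictions, and the threshold of 0.6 times the maximum entropy are all assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def filter_reliable(batch_probs, max_entropy):
    """Return indices of samples whose prediction entropy is at most max_entropy."""
    return [i for i, p in enumerate(batch_probs) if entropy(p) <= max_entropy]

# Hypothetical softmax outputs for three unlabeled stream samples (3 classes).
batch = [
    [0.97, 0.02, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # near-uniform, highly uncertain
    [0.80, 0.15, 0.05],  # moderately confident
]
# Assumed threshold: a fixed fraction of the maximum entropy log(num_classes).
kept = filter_reliable(batch, max_entropy=0.6 * math.log(3))
```

Only the retained samples would then contribute to the adaptation loss. Here the near-uniform prediction is discarded and the other two are kept.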
Article 1731
Title@2025-06-06 (5): Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness
Title: Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness | Vergessen LLMs wirklich? Bewertung des Unlernens mit Wissenskorrelation und Vertrauensbewusstsein | LLMs真的会忘却吗?用知识关联和信心意识来评价不学习 2506.05735v1 |
Authors (12): Rongzhe Wei, Peizhi Niu, Hans Hao-Hsun Hsu, Ruihan Wu, Haoteng Yin, Mohsen Ghassemi, Yifan Li, Vamsi K. Potluru, Eli Chien, Kamalika Chaudhuri, Olgica Milenkovic, Pan Li
Machine unlearning techniques aim to mitigate unintended memorization in large language models (LLMs). However, existing approaches predominantly focus on the explicit removal of isolated facts, often overlooking latent inferential dependencies and the non-deterministic nature of knowledge within LLMs. Consequently, facts presumed forgotten may persist implicitly through correlated information. To address these challenges, we propose a knowledge unlearning evaluation framework that more accurately captures the implicit structure of real-world knowledge by representing relevant factual contexts as knowledge graphs with associated confidence scores. We further develop an inference-based evaluation protocol leveraging powerful LLMs as judges; these judges reason over the extracted knowledge subgraph to determine unlearning success. Our LLM judges utilize carefully designed prompts and are calibrated against human evaluations to ensure their trustworthiness and stability. Extensive experiments on our newly constructed benchmark demonstrate that our framework provides a more realistic and rigorous assessment of unlearning performance. Moreover, our findings reveal that current evaluation strategies tend to overestimate unlearning effectiveness. Our code is publicly available at https://github.com/Graph-COM/Knowledge_Unlearning.git.
Article 1732
Title@2025-06-06 (5): Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity
Title: Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity | Nicht-konvexe Matrix-Erfassung: Brechen der quadratischen Rank-Schranke in der Probenkomplexität | 非曲线矩阵表感测:打破样本复杂程度的二次级屏障 2408.13276v3 |
Authors (2): Dominik Stöger, Yizhe Zhu
For the problem of reconstructing a low-rank matrix from a few linear measurements, two classes of algorithms have been widely studied in the literature: convex approaches based on nuclear norm minimization, and non-convex approaches that use factorized gradient descent. Under certain statistical model assumptions, it is known that nuclear norm minimization recovers the ground truth as soon as the number of samples scales linearly with the number of degrees of freedom of the ground truth. In contrast, while non-convex approaches are computationally less expensive, existing recovery guarantees assume that the number of samples scales at least quadratically with the rank $r$ of the ground-truth matrix. In this paper, we close this gap by showing that the non-convex approaches can be as efficient as nuclear norm minimization in terms of sample complexity. Namely, we consider the problem of reconstructing a positive semidefinite matrix from a few Gaussian measurements. We show that factorized gradient descent with spectral initialization converges to the ground truth with a linear rate as soon as the number of samples scales with $\Omega(rd\kappa^2)$, where $d$ is the dimension, and $\kappa$ is the condition number of the ground-truth matrix. This improves the previous rank-dependence in the sample complexity of non-convex matrix factorization from quadratic to linear. Our proof relies on a probabilistic decoupling argument, where we show that the gradient descent iterates are only weakly dependent on the individual entries of the measurement matrices. We expect that our proof technique is of independent interest for other non-convex problems.
Article 1733
Title@2025-06-06 (5): GroMo: Plant Growth Modeling with Multiview Images
Title: GroMo: Plant Growth Modeling with Multiview Images | GroMo: Plant Growth Modeling mit Multiview-Bildern | Gromo:以多视角图像建模的植物增长模型 2503.06608v2 |
Authors (8): Ruchi Bhatt, Shreya Bansal, Amanpreet Chander, Rupinder Kaur, Malya Singh, Mohan Kankanhalli, Abdulmotaleb El Saddik, Mukesh Kumar Saini
Understanding plant growth dynamics is essential for applications in agriculture and plant phenotyping. We present the Growth Modelling (GroMo) challenge, which is designed for two primary tasks: (1) plant age prediction and (2) leaf count estimation, both essential for crop monitoring and precision agriculture. For this challenge, we introduce GroMo25, a dataset with images of four crops: radish, okra, wheat, and mustard. Each crop consists of multiple plants (p1, p2, …, pn) captured over different days (d1, d2, …, dm) and categorized into five levels (L1, L2, L3, L4, L5). Each plant is captured from 24 different angles with a 15-degree gap between images. Participants are required to perform both tasks for all four crops with these multiview images. We propose a Multiview Vision Transformer (MVVT) model for the GroMo challenge and evaluate the crop-wise performance on GroMo25. MVVT reports an average MAE of 7.74 for age prediction and an MAE of 5.52 for leaf count. The GroMo Challenge aims to advance plant phenotyping research by encouraging innovative solutions for tracking and predicting plant growth. The GitHub repository is publicly available at https://github.com/mriglab/GroMo-Plant-Growth-Modeling-with-Multiview-Images.
Article 1734
Title@2025-06-06 (5): Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization
Title: Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization | Provable Complexity Verbesserung von AdaGrad über SGD: obere und untere Bounds in stochastischen Non-Convex-Optimierung | AdaGrad的复杂度高于SGD的改善:在斯托切性非Convex优化中的上下界 2406.04592v3 |
Authors (3): Ruichen Jiang, Devyani Maladkar, Aryan Mokhtari
Adaptive gradient methods, such as AdaGrad, are among the most successful optimization algorithms for neural network training. While these methods are known to achieve better dimensional dependence than stochastic gradient descent (SGD) for stochastic convex optimization under favorable geometry, the theoretical justification for their success in stochastic non-convex optimization remains elusive. In fact, under standard assumptions of Lipschitz gradients and bounded noise variance, it is known that SGD is worst-case optimal in terms of finding a near-stationary point with respect to the $l_2$-norm, making further improvements impossible. Motivated by this limitation, we introduce refined assumptions on the smoothness structure of the objective and the gradient noise variance, which better suit the coordinate-wise nature of adaptive gradient methods. Moreover, we adopt the $l_1$-norm of the gradient as the stationarity measure, as opposed to the standard $l_2$-norm, to align with the coordinate-wise analysis and obtain tighter convergence guarantees for AdaGrad. Under these new assumptions and the $l_1$-norm stationarity measure, we establish an upper bound on the convergence rate of AdaGrad and a corresponding lower bound for SGD. In particular, we identify non-convex settings in which the iteration complexity of AdaGrad is favorable over SGD and show that, for certain configurations of problem parameters, it outperforms SGD by a factor of $d$, where $d$ is the problem dimension. To the best of our knowledge, this is the first result to demonstrate a provable gain of adaptive gradient methods over SGD in a non-convex setting. We also present supporting lower bounds, including one specific to AdaGrad and one applicable to general deterministic first-order methods, showing that our upper bound for AdaGrad is tight and unimprovable up to a logarithmic factor under certain conditions.
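The contrast between the coordinate-wise AdaGrad update and plain SGD, and between the $l_1$ and $l_2$ stationarity measures, can be sketched in a few lines of standard-library Python. This is an illustrative sketch of the textbook diagonal AdaGrad rule, not the exact variant analyzed in the paper; the function names and default hyperparameters are assumptions. The two norms satisfy $\|g\|_2 \le \|g\|_1 \le \sqrt{d}\,\|g\|_2$, which is why the choice of measure matters for dimension dependence.

```python
import math

def adagrad_step(x, g, accum, lr=0.1, eps=1e-8):
    """One diagonal-AdaGrad update.

    accum holds the running sum of squared gradients per coordinate,
    so every coordinate gets its own effective step size.
    """
    new_accum = [a + gi * gi for a, gi in zip(accum, g)]
    new_x = [xi - lr * gi / (math.sqrt(a) + eps)
             for xi, gi, a in zip(x, g, new_accum)]
    return new_x, new_accum

def sgd_step(x, g, lr=0.1):
    """One SGD update: a single step size shared by all coordinates."""
    return [xi - lr * gi for xi, gi in zip(x, g)]

def l1_norm(g):
    return sum(abs(gi) for gi in g)

def l2_norm(g):
    return math.sqrt(sum(gi * gi for gi in g))
```

On the first step from a zero accumulator, AdaGrad moves each coordinate by roughly `lr` regardless of gradient scale, whereas SGD moves each coordinate in proportion to its gradient entry.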
Article 1735
Title@2025-06-06 (5): Large Language Models are Good Relational Learners
Title: Large Language Models are Good Relational Learners | Große Sprachmodelle sind gute relationale Lernende | 大语言模型是好关系学习者 2506.05725v1 |
Authors (3): Fang Wu, Vijay Prakash Dwivedi, Jure Leskovec
Large language models (LLMs) have demonstrated remarkable capabilities across various domains, yet their application to relational deep learning (RDL) remains underexplored. Existing approaches adapt LLMs by traversing relational links between entities in a database and converting the structured data into flat text documents. Still, this text-based serialization disregards critical relational structures, introduces redundancy, and often exceeds standard LLM context lengths. We introduce Rel-LLM, a novel architecture that utilizes a graph neural network (GNN)-based encoder to generate structured relational prompts for LLMs within a retrieval-augmented generation (RAG) framework. Unlike traditional text-based serialization approaches, our method preserves the inherent relational structure of databases while enabling LLMs to effectively process and reason over complex entity relationships. Specifically, the GNN encoder extracts a local subgraph around an entity to build feature representations that contain relevant entity relationships and temporal dependencies. These representations are transformed into structured prompts using a denormalization process, effectively allowing the LLM to reason over relational structures. Through extensive experiments, we demonstrate that Rel-LLM outperforms existing methods on key RDL tasks, offering a scalable and efficient approach to integrating LLMs with structured data sources. Code is available at https://github.com/smiles724/Rel-LLM.
Article 1736
Title@2025-06-06 (5): Any-Class Presence Likelihood for Robust Multi-Label Classification with Abundant Negative Data
Title: Any-Class Presence Likelihood for Robust Multi-Label Classification with Abundant Negative Data | Jede-Klasse-Präsenz-Wahrscheinlichkeit für robuste Multi-Label-Klassifikation mit überschwänglichen Negativdaten | 具有大量负值数据的强性多标签分类中存在任何类别的可能性 2506.05721v1 |
Authors (5): Dumindu Tissera, Omar Awadallah, Muhammad Umair Danish, Ayan Sadhu, Katarina Grolinger
Multi-label Classification (MLC) assigns an instance to one or more non-exclusive classes. A challenge arises when the dataset contains a large proportion of instances with no assigned class, referred to as negative data, which can overwhelm the learning process and hinder the accurate identification and classification of positive instances. Nevertheless, it is common in MLC applications such as industrial defect detection, agricultural disease identification, and healthcare diagnosis to encounter large amounts of negative data. Assigning a separate negative class to these instances further complicates the learning objective and introduces unnecessary redundancies. To address this challenge, we redesign standard MLC loss functions by deriving a likelihood of any class being present, formulated by a normalized weighted geometric mean of the predicted class probabilities. We introduce a regularization parameter that controls the relative contribution of the absent class probabilities to the any-class presence likelihood in positive instances. The any-class presence likelihood complements the multi-label learning by encouraging the network to become more aware of implicit positive instances and improve the label classification within those positive instances. Experiments on large-scale datasets with negative data: SewerML, modified COCO, and ChestX-ray14, across various networks and base loss functions show that our loss functions consistently improve MLC performance of their standard loss counterparts, achieving gains of up to 6.01 percentage points in F1, 8.06 in F2, and 3.11 in mean average precision, all without additional parameters or computational complexity. Code available at: https://github.com/ML-for-Sensor-Data-Western/gmean-mlc
Article 1737
Title@2025-06-06 (5): Optimization Proxies using Limited Labeled Data and Training Time – A Semi-Supervised Bayesian Neural Network Approach
Title: Optimization Proxies using Limited Labeled Data and Training Time – A Semi-Supervised Bayesian Neural Network Approach | Optimierungs-Proxies mit begrenzten beschrifteten Daten und Trainingszeit – Ein semi-überwachter Bayesian Neural Network-Ansatz | 使用有限标签数据和培训时间的有限标签数据和培训时间 – – 半超模贝耶斯神经网络方法 2410.03085v3 |
Authors (5): Parikshit Pareek, Abhijith Jayakumar, Kaarthik Sundar, Deepjyoti Deka, Sidhant Misra
Constrained optimization problems arise in various engineering systems such as inventory management and power grids. Standard deep neural network (DNN)-based machine learning proxies are ineffective in practical settings where labeled data is scarce and training times are limited. We propose a semi-supervised Bayesian Neural Network (BNN)-based optimization proxy for this complex regime, wherein training commences in a sandwiched fashion, alternating between a supervised learning step for minimizing cost and an unsupervised learning step for enforcing constraint feasibility. We show that the proposed semi-supervised BNN outperforms DNN architectures on important non-convex constrained optimization problems from energy network operations, achieving up to a tenfold reduction in expected maximum equality gap and halving the inequality gaps. Further, the BNN’s ability to provide posterior samples is leveraged to construct practically meaningful probabilistic confidence bounds on performance using limited validation data, unlike prior methods. The implementation code for this study is available at: https://github.com/kaarthiksundar/BNN-OPF/.
Article 1738
Title@2025-06-06 (5): Grokking Beyond the Euclidean Norm of Model Parameters
Title: Grokking Beyond the Euclidean Norm of Model Parameters | Grokking jenseits der euklidischen Norm von Modellparametern | 示范参数欧洲标准 2506.05718v1 |
Authors (3): Pascal Jr Tikeng Notsawo, Guillaume Dumas, Guillaume Rabusseau
Grokking refers to a delayed generalization following overfitting when optimizing artificial neural networks with gradient-based methods. In this work, we demonstrate that grokking can be induced by regularization, either explicit or implicit. More precisely, we show that when there exists a model with a property $P$ (e.g., sparse or low-rank weights) that generalizes on the problem of interest, gradient descent with a small but non-zero regularization of $P$ (e.g., $\ell_1$ or nuclear norm regularization) results in grokking. This extends previous work showing that small non-zero weight decay induces grokking. Moreover, our analysis shows that over-parameterization by adding depth makes it possible to grok or ungrok without explicitly using regularization, which is impossible in shallow cases. We further show that the $\ell_2$ norm is not a reliable proxy for generalization when the model is regularized toward a different property $P$, as the $\ell_2$ norm grows in many cases where no weight decay is used, but the model generalizes anyway. We also show that grokking can be amplified solely through data selection, with any other hyperparameter fixed.
Article 1739
Title@2025-06-06 (5): Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning
Title: Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning | Ensemble Elastic DQN: Ein neuartiger Mehrschritt-Ensemble-Ansatz zur Bewältigung von Überschätzungen im tiefen wertbasierten Verstärkungslernen | 共合大 Elastic DQN: 一种新颖的多步共步办法,以解决深重价值强化学习中的高估问题 2506.05716v1 |
Authors (5): Adrian Ly, Richard Dazeley, Peter Vamplew, Francisco Cruz, Sunil Aryal
While many algorithmic extensions to Deep Q-Networks (DQN) have been proposed, there remains limited understanding of how different improvements interact. In particular, multi-step and ensemble style extensions have shown promise in reducing overestimation bias, thereby improving sample efficiency and algorithmic stability. In this paper, we introduce a novel algorithm called Ensemble Elastic Step DQN (EEDQN), which unifies ensembles with elastic step updates to stabilise algorithmic performance. EEDQN is designed to address two major challenges in deep reinforcement learning: overestimation bias and sample efficiency. We evaluated EEDQN against standard and ensemble DQN variants across the MinAtar benchmark, a set of environments that emphasise behavioral learning while reducing representational complexity. Our results show that EEDQN achieves consistently robust performance across all tested environments, outperforming baseline DQN methods and matching or exceeding state-of-the-art ensemble DQNs in final returns on most of the MinAtar environments. These findings highlight the potential of systematically combining algorithmic improvements and provide evidence that ensemble and multi-step methods, when carefully integrated, can yield substantial gains.
Article 1740
Title@2025-06-06 (5): Mirage: A Multi-Level Superoptimizer for Tensor Programs
Title: Mirage: A Multi-Level Superoptimizer for Tensor Programs | Mirage: Ein Multi-Level-Superoptimizer für Tensor-Programme | 幻影:向导方案多层次超强激励器 2405.05751v3 |
Authors (10): Mengdi Wu, Xinhao Cheng, Shengyu Liu, Chunan Shi, Jianan Ji, Kit Ao, Praveen Velliengiri, Xupeng Miao, Oded Padon, Zhihao Jia
We introduce Mirage, the first multi-level superoptimizer for tensor programs. A key idea in Mirage is $\mu$Graphs, a uniform representation of tensor programs at the kernel, thread block, and thread levels of the GPU compute hierarchy. $\mu$Graphs enable Mirage to discover novel optimizations that combine algebraic transformations, schedule transformations, and generation of new custom kernels. To navigate the large search space, Mirage introduces a pruning technique based on abstraction that significantly reduces the search space and provides a certain optimality guarantee. To ensure that the optimized $\mu$Graph is equivalent to the input program, Mirage introduces a probabilistic equivalence verification procedure with strong theoretical guarantees. Our evaluation shows that Mirage outperforms existing approaches by up to 3.3$\times$ even for DNNs that are widely used and heavily optimized. Mirage is publicly available at https://github.com/mirage-project/mirage.
Article 1741
Title@2025-06-06 (5): Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation
Title: Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation | Kommen Sie zusammen, aber nicht jetzt: Eine progressive Strategie, um Low-Rank-Anpassung zu fördern | 齐心合力,但现在不是现在:一个推进低Rank适应的渐进战略 2506.05713v1 |
Authors (12): Zhan Zhuang, Xiequn Wang, Wei Li, Yulong Zhang, Qiushi Huang, Shuhao Chen, Xuehao Wang, Yanbin Wei, Yuhe Nie, Kede Ma, Yu Zhang, Ying Wei
Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models, yet it often locks adapters into suboptimal minima near their initialization. This hampers model generalization and limits downstream operators such as adapter merging and pruning. Here, we propose CoTo, a progressive training strategy that gradually increases adapters’ activation probability over the course of fine-tuning. By stochastically deactivating adapters, CoTo encourages more balanced optimization and broader exploration of the loss landscape. We provide a theoretical analysis showing that CoTo promotes layer-wise dropout stability and linear mode connectivity, and we adopt a cooperative-game approach to quantify each adapter’s marginal contribution. Extensive experiments demonstrate that CoTo consistently boosts single-task performance, enhances multi-task merging accuracy, improves pruning robustness, and reduces training overhead, all while remaining compatible with diverse LoRA variants. Code is available at https://github.com/zwebzone/coto.
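The idea of raising each adapter's activation probability during fine-tuning can be illustrated with a short sketch. The abstract does not specify the schedule, so the linear ramp, the `ramp_frac` and `p_start` parameters, and both function names below are assumptions made purely for illustration; the authors' repository is the reference for the actual method.

```python
import random

def activation_prob(step, total_steps, ramp_frac=0.75, p_start=0.0):
    """Hypothetical linear ramp: probability rises from p_start to 1.0
    over the first ramp_frac of training, then stays at 1.0."""
    ramp_steps = max(1, int(ramp_frac * total_steps))
    frac = min(1.0, step / ramp_steps)
    return p_start + (1.0 - p_start) * frac

def sample_active_adapters(num_adapters, p, rng):
    """Independently keep each adapter active with probability p.
    An inactive adapter would simply be skipped in the forward pass."""
    return [rng.random() < p for _ in range(num_adapters)]
```

Early in training most adapters are dropped on any given step, so no single adapter can dominate; by the end of the ramp all adapters are always active, matching standard LoRA inference.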
Article 1742
Title@2025-06-06 (5): A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
Title: A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models | Eine Umfrage über Sparse Autoencoder: Interpretation der internen Mechanismen großer Sprachmodelle | 粗略自动编码器调查:解释大语言模式内部机制 2503.05613v2 |
Authors (7): Dong Shu, Xuansheng Wu, Haiyan Zhao, Daking Rai, Ziyu Yao, Ninghao Liu, Mengnan Du
Large Language Models (LLMs) have transformed natural language processing, yet their internal mechanisms remain largely opaque. Recently, mechanistic interpretability has attracted significant attention from the research community as a means to understand the inner workings of LLMs. Among various mechanistic interpretability approaches, Sparse Autoencoders (SAEs) have emerged as a promising method due to their ability to disentangle the complex, superimposed features within LLMs into more interpretable components. This paper presents a comprehensive survey of SAEs for interpreting and understanding the internal workings of LLMs. Our major contributions include: (1) exploring the technical framework of SAEs, covering basic architecture, design improvements, and effective training strategies; (2) examining different approaches to explaining SAE features, categorized into input-based and output-based explanation methods; (3) discussing evaluation methods for assessing SAE performance, covering both structural and functional metrics; and (4) investigating real-world applications of SAEs in understanding and manipulating LLM behaviors.
Article 1743
Title@2025-06-06 (5): Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application
Title: Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application | Latent Diffusion Modellbasierter Denoisierungsempfänger für 6G Semantische Kommunikation: Von der stochastischen Differentialtheorie zur Anwendung | 用于 6G 语义通讯: 从斯托卡差异理论到应用的 6G 语义通讯的 以 DEM 为基础的前传播模型模型 2506.05710v1 |
Authors (4): Xiucheng Wang, Honggang Jia, Nan Cheng, Dusit Niyato
In this paper, a novel semantic communication framework empowered by generative artificial intelligence (GAI) is proposed, specifically leveraging the capabilities of diffusion models (DMs). A rigorous theoretical foundation is established based on stochastic differential equations (SDEs), which elucidates the denoising properties of DMs in mitigating additive white Gaussian noise (AWGN) in latent semantic representations. Crucially, a closed-form analytical relationship between the signal-to-noise ratio (SNR) and the denoising timestep is derived, enabling the optimal selection of diffusion parameters for any given channel condition. To address the distribution mismatch between the received signal and the DM’s training data, a mathematically principled scaling mechanism is introduced, ensuring robust performance across a wide range of SNRs without requiring model fine-tuning. Built upon this theoretical insight, we develop a latent diffusion model (LDM)-based semantic transceiver, wherein a variational autoencoder (VAE) is employed for efficient semantic compression, and a pretrained DM serves as a universal denoiser. Notably, the proposed architecture is fully training-free at inference time, offering high modularity and compatibility with large-scale pretrained LDMs. This design inherently supports zero-shot generalization and mitigates the challenges posed by out-of-distribution inputs. Extensive experimental evaluations demonstrate that the proposed framework significantly outperforms conventional neural-network-based semantic communication baselines, particularly under low SNR conditions and distributional shifts, thereby establishing a promising direction for GAI-driven robust semantic transmission in future 6G systems.
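The link between channel SNR and denoising timestep can be illustrated with the standard variance-preserving forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, whose per-step SNR is $\bar\alpha_t / (1-\bar\alpha_t)$. The sketch below assumes a unit-power latent and an AWGN channel $y = x + n$; scaling $y$ by $\sqrt{\bar\alpha_t}$ with $\bar\alpha_t = \mathrm{SNR}/(1+\mathrm{SNR})$ then matches the forward-process marginal. The paper's own closed form may differ, and the linear beta schedule and function names here are assumptions.

```python
import math

def alpha_bar_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule
    (a common DDPM default, assumed here for illustration)."""
    out, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def match_timestep(snr_db, alpha_bars):
    """Return (t, scale): the timestep whose forward-process SNR is
    closest to the channel SNR, and the factor to scale y by."""
    snr = 10 ** (snr_db / 10)
    target = snr / (1 + snr)  # solves alpha_bar / (1 - alpha_bar) = snr
    t = min(range(len(alpha_bars)),
            key=lambda i: abs(alpha_bars[i] - target))
    return t, math.sqrt(alpha_bars[t])
```

A noisier channel (lower SNR) maps to a later timestep, so the pretrained denoiser runs more reverse steps; a clean channel maps to a timestep near zero.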
Article 1744
Title@2025-06-06 (5): Understanding Memorization in Generative Models via Sharpness in Probability Landscapes
Title: Understanding Memorization in Generative Models via Sharpness in Probability Landscapes | Gedächtnis in Generativen Modellen über Schärfe in Wahrscheinlichkeitslandschaften verstehen | 通过在概率景观中的尖锐化理解生成模型中的记忆化 2412.04140v3 |
Authors (3): Dongjae Jeon, Dueun Kim, Albert No
In this paper, we introduce a geometric framework to analyze memorization in diffusion models through the sharpness of the log probability density. We mathematically justify a previously proposed score-difference-based memorization metric by demonstrating its effectiveness in quantifying sharpness. Additionally, we propose a novel memorization metric that captures sharpness at the initial stage of image generation in latent diffusion models, offering early insights into potential memorization. Leveraging this metric, we develop a mitigation strategy that optimizes the initial noise of the generation process using a sharpness-aware regularization term.
Article 1745
Title@2025-06-06 (5): Online Detection of LLM-Generated Texts via Sequential Hypothesis Testing by Betting
Title: Online Detection of LLM-Generated Texts via Sequential Hypothesis Testing by Betting | Online-Erkennung von LLM-generierten Texten durch sequentielle Hypothesenprüfung durch Wetten | 通过打赌按顺序假设测试在线检测LLM发光文本 2410.22318v3 |
Authors (2): Can Chen, Jun-Kun Wang
Developing algorithms to differentiate between machine-generated texts and human-written texts has garnered substantial attention in recent years. Existing methods in this direction typically concern an offline setting where a dataset containing a mix of real and machine-generated texts is given upfront, and the task is to determine whether each sample in the dataset is from a large language model (LLM) or a human. However, in many practical scenarios, sources such as news websites, social media accounts, and online forums publish content in a streaming fashion. Therefore, in this online scenario, how to quickly and accurately determine whether the source is an LLM with strong statistical guarantees is crucial for these media or platforms to function effectively and prevent the spread of misinformation and other potential misuse of LLMs. To tackle the problem of online detection, we develop an algorithm based on the techniques of sequential hypothesis testing by betting that not only builds upon and complements existing offline detection techniques but also enjoys statistical guarantees, which include a controlled false positive rate and the expected time to correctly identify a source as an LLM. Experiments were conducted to demonstrate the effectiveness of our method.
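The general mechanism of sequential testing by betting can be sketched as follows. This is a simplified one-sided version, not the paper's algorithm: it assumes each incoming text receives a detector score in $[0, 1]$ (higher meaning more machine-like), takes as null hypothesis that the conditional mean score is at most `mu0`, and uses a fixed bet size rather than an adaptive one. The function name and all parameter values are hypothetical.

```python
def betting_detector(scores, mu0=0.5, alpha=0.05, bet=0.5):
    """Flag the source as an LLM once the bettor's wealth reaches 1/alpha.

    Under the null, the wealth process is a nonnegative supermartingale,
    so by Ville's inequality the false positive rate is at most alpha.
    Returns (stopping_time, wealth); stopping_time is None if never flagged.
    """
    # Keeps every factor 1 + bet * (s - mu0) nonnegative for s in [0, 1].
    assert 0.0 <= bet <= 1.0 / mu0
    wealth = 1.0
    for t, s in enumerate(scores, start=1):
        wealth *= 1.0 + bet * (s - mu0)
        if wealth >= 1.0 / alpha:
            return t, wealth
    return None, wealth
```

With a constant score of 0.9, the wealth grows by a factor of 1.2 per text and crosses the threshold of 20 at the 17th text; with scores below `mu0` the wealth shrinks and the source is never flagged.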
Article 1746
Title@2025-06-06 (5): Action-Adaptive Continual Learning: Enabling Policy Generalization under Dynamic Action Spaces
Title: Action-Adaptive Continual Learning: Enabling Policy Generalization under Dynamic Action Spaces | Action-Adaptive Continual Learning: Politische Generalisierung unter dynamischen Aktionsräumen aktivieren | 行动促进性持续学习:动态行动空间下的扶持性政策概括化 2506.05702v1 |
Authors (7): Chaofan Pan, Jiafen Liu, Yanhua Li, Linbo Xiong, Fan Min, Wei Wei, Xin Yang
Continual Learning (CL) is a powerful tool that enables agents to learn a sequence of tasks, accumulating knowledge learned in the past and using it for problem-solving or future task learning. However, existing CL methods often assume that the agent’s capabilities remain static within dynamic environments, which doesn’t reflect real-world scenarios where capabilities dynamically change. This paper introduces a new and realistic problem: Continual Learning with Dynamic Capabilities (CL-DC), posing a significant challenge for CL agents: How can policy generalization across different action spaces be achieved? Inspired by the cortical functions, we propose an Action-Adaptive Continual Learning framework (AACL) to address this challenge. Our framework decouples the agent’s policy from the specific action space by building an action representation space. For a new action space, the encoder-decoder of action representations is adaptively fine-tuned to maintain a balance between stability and plasticity. Furthermore, we release a benchmark based on three environments to validate the effectiveness of methods for CL-DC. Experimental results demonstrate that our framework outperforms popular methods by generalizing the policy across action spaces.
Article 1747
Title@2025-06-06 (5): Statistically Valid Post-Deployment Monitoring Should Be Standard for AI-Based Digital Health
Title: Statistically Valid Post-Deployment Monitoring Should Be Standard for AI-Based Digital Health | Statistisch gültige Überwachung nach der Bereitstellung sollte Standard für KI-basierte digitale Gesundheit sein | 统计上有效的部署后监测应作为AI 数字健康的标准 2506.05701v1 |
Authors (4): Pavel Dolin, Weizhi Li, Gautam Dasarathy, Visar Berisha
This position paper argues that post-deployment monitoring in clinical AI is underdeveloped and proposes statistically valid and label-efficient testing frameworks as a principled foundation for ensuring reliability and safety in real-world deployment. A recent review found that only 9% of FDA-registered AI-based healthcare tools include a post-deployment surveillance plan. Existing monitoring approaches are often manual, sporadic, and reactive, making them ill-suited for the dynamic environments in which clinical models operate. We contend that post-deployment monitoring should be grounded in label-efficient and statistically valid testing frameworks, offering a principled alternative to current practices. We use the term “statistically valid” to refer to methods that provide explicit guarantees on error rates (e.g., Type I/II error), enable formal inference under pre-defined assumptions, and support reproducibility, all features that align with regulatory requirements. Specifically, we propose that the detection of changes in the data and model performance degradation should be framed as distinct statistical hypothesis testing problems. Grounding monitoring in statistical rigor ensures a reproducible and scientifically sound basis for maintaining the reliability of clinical AI systems. Importantly, it also opens new research directions for the technical community, spanning theory, methods, and tools for statistically principled detection, attribution, and mitigation of post-deployment model failures in real-world settings.
Article 1748
Title@2025-06-06 (5): Knowledge Retention for Continual Model-Based Reinforcement Learning
Title: Knowledge Retention for Continual Model-Based Reinforcement Learning | Wissensvorbehalt für kontinuierliches modellbasiertes Verstärkungslernen | 继续保留知识,以不断保持知识,进行基于模式的不断强化学习 2503.04256v4 |
Authors (4): Yixiang Sun, Haotian Fu, Michael Littman, George Konidaris
We propose DRAGO, a novel approach for continual model-based reinforcement learning aimed at improving the incremental development of world models across a sequence of tasks that differ in their reward functions but not the state space or dynamics. DRAGO comprises two key components: Synthetic Experience Rehearsal, which leverages generative models to create synthetic experiences from past tasks, allowing the agent to reinforce previously learned dynamics without storing data, and Regaining Memories Through Exploration, which introduces an intrinsic reward mechanism to guide the agent toward revisiting relevant states from prior tasks. Together, these components enable the agent to maintain a comprehensive and continually developing world model, facilitating more effective learning and adaptation across diverse environments. Empirical evaluations demonstrate that DRAGO is able to preserve knowledge across tasks, achieving superior performance in various continual learning scenarios.
Article 1749
Title@2025-06-06 (5): Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning
Title: Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning | Allgemeines LLM-Lernen von Graphen-Synthetischen Daten mit Verstärkungs-Lernen | 利用强化学习学习图形合成数据学习 2506.00845v2 |
Authors (6): Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xinyun Liu, Yulia Tsvetkov
Previous research has sought to enhance the graph reasoning capabilities of LLMs by supervised fine-tuning on synthetic graph data. While these efforts led to specialized LLMs better at solving graph algorithm problems, we don’t need LLMs for shortest path: we need generalization from synthetic graph data to real-world tasks with implicit graph structures. In this work, we propose to unlock generalizable learning of graph synthetic data with reinforcement learning. We first design solution-based and process-based rewards for synthetic graph problems: instead of rigidly memorizing response patterns as in direct fine-tuning, we posit that RL would help LLMs grasp the essentials underlying graph reasoning and alleviate overfitting. We employ RL algorithms such as GRPO and DPO, aligning both off-the-shelf LLMs and LLMs fine-tuned on synthetic graph data. We then compare them against existing settings on both in-domain synthetic tasks and out-of-domain real-world tasks with implicit graph structures such as multi-hop QA, structured planning, and more. Extensive experiments demonstrate that our RL recipe leads to statistically significant improvement on 5 datasets, with an average gain of 12.9% over baseline settings. Further analysis reveals that process-based rewards consistently outperform solution-based rewards and that mixing synthetic and real-world task data yields potential gains, while compositionality and explainable intermediate steps remain a critical challenge even after RL.
Article 1750
Title@2025-06-06 (5): Graph Attention Networks Unleashed: A Fast and Explainable Vulnerability Assessment Framework for Microgrids
Title: Graph Attention Networks Unleashed: A Fast and Explainable Vulnerability Assessment Framework for Microgrids | Graphen-Achtungsnetzwerke unleashed: Ein schneller und erklärbarer Sicherheitsbewertungsrahmen für Mikrogrids | 未剥离的图示注意网络:微电网快速和可解释的脆弱性评估框架 2503.00786v2 |
Authors (5): Wei Liu, Tao Zhang, Chenhui Lin, Kaiwen Li, Rui Wang
Independent microgrids are crucial for supplying electricity by combining distributed energy resources and loads in scenarios like isolated islands and field combat. Fast and accurate assessments of microgrid vulnerability against intentional attacks or natural disasters are essential for effective risk prevention and design optimization. However, conventional Monte Carlo simulation (MCS) methods are computationally expensive and time-consuming, while existing machine learning-based approaches often lack accuracy and explainability. To address these challenges, this study proposes a fast and explainable vulnerability assessment framework that integrates MCS with a graph attention network enhanced by self-attention pooling (GAT-S). MCS generates training data, while the GAT-S model learns the structural and electrical characteristics of the microgrid and further assesses its vulnerability intelligently. The GAT-S improves explainability and computational efficiency by dynamically assigning attention weights to critical nodes. Comprehensive experimental evaluations across various microgrid configurations demonstrate that the proposed framework provides accurate vulnerability assessments, achieving a mean squared error as low as 0.001 and real-time responsiveness within 1 second while delivering explainable results.
Article 1751
Title@2025-06-06 (5): Being Strong Progressively! Enhancing Knowledge Distillation of Large Language Models through a Curriculum Learning Framework
Title: Being Strong Progressively! Enhancing Knowledge Distillation of Large Language Models through a Curriculum Learning Framework | Progressiv stark sein! Erweiterung der Wissensdestillation von großen Sprachmodellen durch einen Curriculum Learning Framework | 通过课程学习框架加强大语言模式的知识提炼 2506.05695v1 |
Authors (2): Lingyuan Liu, Mengxiang Zhang
Knowledge Distillation (KD) compresses large language models (LLMs) by transferring the teacher model’s capabilities to a smaller student model, reducing inference cost and memory usage while maintaining performance. However, existing KD methods for LLMs often fail to prevent significant shifts in the student model’s distribution during training, leading to issues such as catastrophic forgetting, mode collapse, and training-inference mismatch. To address these challenges, we propose a novel, plug-in curriculum learning framework inspired by the strength training principle of “progressive overload” (POCL), which can be seamlessly integrated into existing white-box KD approaches with minimal computational overhead. The framework comprises two core components: (1) a difficulty measurer that ranks and partitions training samples from easy to hard, and (2) a training scheduler that incrementally introduces these subsets into the distillation process at fixed intervals while applying loss functions with progressively rising temperatures. By starting with the easiest samples and progressively increasing the difficulty, the approach enhances both the stability and efficiency of learning. Extensive experiments in instruction-following settings demonstrate that POCL consistently improves the performance of distilled student models across various white-box KD methods and model families. Our findings highlight the effectiveness of sorted training samples in KD for LLMs. More generally, our work demonstrates how to structure training data within the KD process to enhance the stability and performance of distilled LLMs.
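The two components described in this abstract, a difficulty ranking and a staged scheduler with rising temperature, can be sketched as below. The abstract does not state how subsets are combined or which temperatures are used, so the cumulative pools, the linear temperature ramp from `t_start` to `t_end`, and the function name `build_curriculum` are assumptions for illustration only.

```python
def build_curriculum(samples, difficulty, num_stages=3,
                     t_start=1.0, t_end=2.0):
    """Return a list of (training_pool, temperature) pairs, one per stage.

    samples are ranked easy-to-hard by the difficulty callable, split
    into num_stages chunks, and each stage adds the next chunk to the
    pool (cumulative pools are an assumption of this sketch).
    """
    ranked = sorted(samples, key=difficulty)
    size = -(-len(ranked) // num_stages)  # ceiling division
    stages = []
    for k in range(num_stages):
        pool = ranked[: min(len(ranked), (k + 1) * size)]
        if num_stages == 1:
            temp = t_start
        else:
            temp = t_start + (t_end - t_start) * k / (num_stages - 1)
        stages.append((pool, temp))
    return stages
```

A training loop would run the chosen white-box KD loss on each stage's pool in turn, passing that stage's temperature to the loss.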
Article 1752
Title@2025-06-06 (5): Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs
Title: Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs | Direct Behavior Optimization: Entsperren des Potenzials leichter LLMs | 直接行为优化:释放轻度LMT的潜力 2506.06401v1 |
Authors (7): Hongming Yang, Shi Lin, Jun Shao, Changting Lin, Donghai Zhu, Meng Han, Qinglei Kong
Lightweight Large Language Models (LwLLMs) are reduced-parameter, optimized models designed to run efficiently on consumer-grade hardware, offering significant advantages in resource efficiency, cost-effectiveness, and data privacy. However, these models often struggle with limited inference and reasoning capabilities, which restrict their performance on complex tasks and limit their practical applicability. Moreover, existing prompt optimization methods typically rely on extensive manual effort or the meta-cognitive abilities of state-of-the-art LLMs, making them less effective for LwLLMs. To address these challenges, we introduce DeBoP, a new Direct Behavior Optimization Paradigm that originates from the Chain-of-Thought (CoT) prompting technique. Unlike CoT prompting, DeBoP is an automatic optimization method that directly optimizes the behavior of LwLLMs. In particular, DeBoP transforms the optimization of complex prompts into the optimization of discrete, quantifiable execution sequences using a gradient-free Monte Carlo Tree Search. We evaluate DeBoP on seven challenging tasks where state-of-the-art LLMs excel but LwLLMs generally underperform. Experimental results demonstrate that DeBoP significantly outperforms recent prompt optimization methods on most tasks. In particular, DeBoP-optimized LwLLMs surpass GPT-3.5 on most tasks while reducing computational time by approximately 60% compared to other automatic prompt optimization methods.
Article 1753
Title@2025-06-06 (5): Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning
Title: Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning | Gerade genug denken: Effiziente Vernunft mit adaptiven Längenstrafen Stärkungslernen | ” 思考够了 “ : “ 适应性延长刑罚强化学习的高效理性 “ 2506.05256v2 |
Authors (7): Violet Xiang, Chase Blagden, Rafael Rafailov, Nathan Lile, Sang Truong, Chelsea Finn, Nick Haber
Large reasoning models (LRMs) achieve higher performance on challenging reasoning tasks by generating more tokens at inference time, but this verbosity often wastes computation on easy problems. Existing solutions, including supervised finetuning on shorter traces, user-controlled budgets, or RL with uniform penalties, either require data curation, manual configuration, or treat all problems alike regardless of difficulty. We introduce Adaptive Length Penalty (ALP), a reinforcement learning objective tailoring generation length to per-prompt solve rate. During training, ALP monitors each prompt’s online solve rate through multiple rollouts and adds a differentiable penalty whose magnitude scales inversely with that rate, so confident (easy) prompts incur a high cost for extra tokens while hard prompts remain unhindered. Posttraining DeepScaleR-1.5B with ALP cuts average token usage by 50% without significantly dropping performance. Relative to fixed-budget and uniform penalty baselines, ALP redistributes its reduced budget more intelligently by cutting compute on easy prompts and reallocating saved tokens to difficult ones, delivering higher accuracy on the hardest problems with higher cost.
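A per-prompt length penalty of this kind might look like the sketch below. It follows the reading in which easy (high solve rate) prompts pay more per extra token and hard prompts pay little. The reward form `correct - beta * length * weight`, the coefficient `beta`, and the `1/K` floor on the weight are assumptions for illustration, not details confirmed by the abstract.

```python
from typing import List, Sequence


def alp_rewards(
    correct: Sequence[int],  # 1 if a rollout solved the prompt, else 0
    lengths: Sequence[int],  # token count of each rollout
    beta: float = 1e-3,      # hypothetical penalty coefficient
) -> List[float]:
    """Rewards for K rollouts of one prompt with a solve-rate-scaled length penalty."""
    k = len(correct)
    solve_rate = sum(correct) / k
    # Assumed floor so that never-solved prompts still carry a small penalty.
    weight = max(solve_rate, 1.0 / k)
    return [float(c) - beta * n * weight for c, n in zip(correct, lengths)]


if __name__ == "__main__":
    lengths = [100, 200, 100, 100]
    print(alp_rewards([1, 1, 1, 1], lengths))  # easy prompt: full penalty weight
    print(alp_rewards([0, 0, 0, 1], lengths))  # hard prompt: reduced penalty weight
```

With the same rollout lengths, the easy prompt is charged four times as much per token as the hard one, which is the redistribution of budget the abstract describes.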
Article 1754
Title@2025-06-06 (5): Simmering: Sufficient is better than optimal for training neural networks
Title: Simmering: Sufficient is better than optimal for training neural networks | Simmering: Ausreichend ist besser als optimal für die Ausbildung neuronaler Netzwerke | 浸泡:对培训神经网络而言,充足比最佳的更好 2410.19912v2 |
Authors (3): Irina Babayan, Hazhir Aliahmadi, Greg van Anders
The broad range of neural network training techniques that invoke optimization but rely on ad hoc modification for validity suggests that optimization-based training is misguided. Shortcomings of optimization-based training are brought to particularly strong relief by the problem of overfitting, where naive optimization produces spurious outcomes. The broad success of neural networks for modelling physical processes has prompted advances that are based on inverting the direction of investigation and treating neural networks as if they were physical systems in their own right. These successes raise the question of whether broader, physical perspectives could motivate the construction of improved training algorithms. Here, we introduce simmering, a physics-based method that trains neural networks to generate weights and biases that are merely “good enough”, but which, paradoxically, outperforms leading optimization-based approaches. Using classification and regression examples we show that simmering corrects neural networks that are overfit by Adam, and show that simmering avoids overfitting if deployed from the outset. Our results question optimization as a paradigm for neural network training, and leverage information-geometric arguments to point to the existence of classes of sufficient training algorithms that do not take optimization as their starting point.
Article 1755
Title@2025-06-06 (5): TracLLM: A Generic Framework for Attributing Long Context LLMs
Title: TracLLM: A Generic Framework for Attributing Long Context LLMs | TracLLM: Ein generisches Rahmenwerk für die Zuweisung von LLMs mit langem Kontext | TracLLM: 长期LMLM授标通用框架 2506.04202v2 |
Authors (4): Yanting Wang, Wei Zou, Runpeng Geng, Jinyuan Jia
Long context large language models (LLMs) are deployed in many real-world applications such as RAG, agents, and broad LLM-integrated applications. Given an instruction and a long context (e.g., documents, PDF files, webpages), a long context LLM can generate an output grounded in the provided context, aiming to provide more accurate, up-to-date, and verifiable outputs while reducing hallucinations and unsupported claims. This raises a research question: how to pinpoint the texts (e.g., sentences, passages, or paragraphs) in the context that contribute most to or are responsible for the output generated by an LLM? This process, which we call context traceback, has various real-world applications, such as 1) debugging LLM-based systems, 2) conducting post-attack forensic analysis for attacks (e.g., prompt injection attacks, knowledge corruption attacks) on an LLM, and 3) highlighting knowledge sources to enhance the trust of users towards outputs generated by LLMs. When applied to context traceback for long context LLMs, existing feature attribution methods such as Shapley have sub-optimal performance and/or incur a large computational cost. In this work, we develop TracLLM, the first generic context traceback framework tailored to long context LLMs. Our framework can improve the effectiveness and efficiency of existing feature attribution methods. To improve the efficiency, we develop an informed-search-based algorithm in TracLLM. We also develop contribution score ensemble/denoising techniques to improve the accuracy of TracLLM. Our evaluation results show TracLLM can effectively identify texts in a long context that lead to the output of an LLM. Our code and data are at: https://github.com/Wang-Yanting/TracLLM.
Article 1756
Title@2025-06-06 (5): RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models
Title: RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models | RoSTE: Ein effizienter Quantization-Aware-überwachter Feintuning-Ansatz für große Sprachmodelle | RoSTE:对大语言模式采用高效率的量化软件监督微调方法 2502.09003v3 |
Authors (7): Quan Wei, Chung-Yiu Yau, Hoi-To Wai, Yang Katie Zhao, Dongyeop Kang, Youngsuk Park, Mingyi Hong
Supervised fine-tuning is a standard method for adapting pre-trained large language models (LLMs) to downstream tasks. Quantization has been recently studied as a post-training technique for efficient LLM deployment. To obtain quantized fine-tuned LLMs, conventional pipelines would first fine-tune the pre-trained models, followed by post-training quantization. This often yields suboptimal performance as it fails to leverage the synergy between fine-tuning and quantization. To effectively realize low-bit quantization of weights, activations and KV caches in LLMs, we propose an algorithm named Rotated Straight-Through-Estimator (RoSTE), which combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy that identifies an effective rotation configuration to reduce activation outliers. We provide theoretical insights on RoSTE by analyzing its prediction error when applied to an overparameterized least-squares quantized training problem. Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration. Experiments on Pythia, Qwen and Llama models of different sizes demonstrate the effectiveness of RoSTE. Compared to existing post-SFT quantization baselines, our method consistently achieves superior performance across various tasks and different LLM architectures. Our code is available at https://github.com/OptimAI-Lab/RoSTE.
Article 1757
Title@2025-06-06 (5): Sparse Autoencoders, Again?
Title: Sparse Autoencoders, Again? | Sparse Autoencoder, schon wieder? | 又是粗糙的自动编码器吗? 2506.04859v2 |
Authors (4): Yin Lu, Xuening Zhu, Tong He, David Wipf
Is there really much more to say about sparse autoencoders (SAEs)? Autoencoders in general, and SAEs in particular, represent deep architectures that are capable of modeling low-dimensional latent structure in data. Such structure could reflect, among other things, correlation patterns in large language model activations, or complex natural image manifolds. And yet despite the wide-ranging applicability, there have been relatively few changes to SAEs beyond the original recipe from decades ago, namely, standard deep encoder/decoder layers trained with a classical/deterministic sparse regularizer applied within the latent space. One possible exception is the variational autoencoder (VAE), which adopts a stochastic encoder module capable of producing sparse representations when applied to manifold data. In this work we formalize underappreciated weaknesses with both canonical SAEs, as well as analogous VAEs applied to similar tasks, and propose a hybrid alternative model that circumvents these prior limitations. In terms of theoretical support, we prove that global minima of our proposed model recover certain forms of structured data spread across a union of manifolds. Meanwhile, empirical evaluations on synthetic and real-world datasets substantiate the efficacy of our approach in accurately estimating underlying manifold dimensions and producing sparser latent representations without compromising reconstruction error. In general, we are able to exceed the performance of equivalent-capacity SAEs and VAEs, as well as recent diffusion models where applicable, within domains such as images and language model activation patterns.
Article 1758
Title@2025-06-06 (5): Learning Design-Score Manifold to Guide Diffusion Models for Offline Optimization
Title: Learning Design-Score Manifold to Guide Diffusion Models for Offline Optimization | Learning Design-Score Manifold zur Anleitung von Diffusionsmodellen für Offline-Optimierung | 指导离线优化推广模式 2506.05680v1 |
Authors (6): Tailin Zhou, Zhilin Chen, Wenlong Lyu, Zhitang Chen, Danny H. K. Tsang, Jun Zhang
Optimizing complex systems, from discovering therapeutic drugs to designing high-performance materials, remains a fundamental challenge across science and engineering, as the underlying rules are often unknown and costly to evaluate. Offline optimization aims to optimize designs for target scores using pre-collected datasets without system interaction. However, conventional approaches may fail beyond training data, predicting inaccurate scores and generating inferior designs. This paper introduces ManGO, a diffusion-based framework that learns the design-score manifold, capturing the design-score interdependencies holistically. Unlike existing methods that treat design and score spaces in isolation, ManGO unifies forward prediction and backward generation, attaining generalization beyond training data. Key to this is its derivative-free guidance for conditional generation, coupled with adaptive inference-time scaling that dynamically optimizes denoising paths. Extensive evaluations demonstrate that ManGO outperforms 24 single- and 10 multi-objective optimization methods across diverse domains, including synthetic tasks, robot control, material design, DNA sequence, and real-world engineering optimization.
Article 1759
Title@2025-06-06 (5): Topology-aware Neural Flux Prediction Guided by Physics
Title: Topology-aware Neural Flux Prediction Guided by Physics | Topologie-bewusst Neurale Flux-Vorhersage geleitet von der Physik | 物理引导 2506.05676v1 |
Authors (4): Haoyang Jiang, Jindong Wang, Xingquan Zhu, Yi He
Graph Neural Networks (GNNs) often struggle to preserve high-frequency components of nodal signals when dealing with directed graphs. Such components are crucial for modeling flow dynamics; without them, a traditional GNN tends to treat a graph's forward and reverse topologies as equal. To make GNNs sensitive to those high-frequency components, and thereby capable of capturing detailed topological differences, this paper proposes a novel framework that combines 1) explicit difference matrices that model directional gradients and 2) implicit physical constraints that enforce message passing within GNNs to be consistent with natural laws. Evaluations on two real-world directed graph datasets, namely a water flux network and an urban traffic flow network, demonstrate the effectiveness of our proposal.
Article 1760
Title@2025-06-06 (5): LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models
Title: LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models | LauraTSE: Target Speaker Extraction mit Auto-Regressive Decoder-Only Language Models | LARTSE: 使用自动递退解码器单一语言模式将议长抽取作为目标 2504.07402v2 |
Authors (3): Beilong Tang, Bang Zeng, Ming Li
We propose LauraTSE, an Auto-Regressive Decoder-Only Language Model for Target Speaker Extraction built upon the LauraGPT backbone. LauraTSE employs a small-scale auto-regressive decoder-only language model that generates the initial layers of the target speech’s discrete codec representations from the continuous embeddings of both the mixture and reference speech. These outputs serve as coarse-grained predictions. To refine them, a one-step encoder-only language model reconstructs the full codec representation by integrating information from both the mixture and the reference speech, adding fine-grained details. Our approach achieves superior or comparable performance to existing TSE models. Additionally, we conduct ablation studies to investigate the data scalability and the contribution of the encoder-only model.
Article 1761
Title@2025-06-06 (5): Kernel Density Machines
Title: Kernel Density Machines | Kerndichtemaschinen | 内核密度机 2504.21419v2 |
Authors (2): Damir Filipovic, Paul Schneider
We introduce kernel density machines (KDM), a nonparametric estimator of a Radon–Nikodym derivative, based on reproducing kernel Hilbert spaces. KDM applies to general probability measures on countably generated measurable spaces under minimal assumptions. For computational efficiency, we incorporate a low-rank approximation with precisely controlled error that grants scalability to large-sample settings. We provide rigorous theoretical guarantees, including asymptotic consistency, a functional central limit theorem, and finite-sample error bounds, establishing a strong foundation for practical use. Empirical results based on simulated and real data demonstrate the efficacy and precision of KDM.
Article 1762
Title@2025-06-06 (5): Contextually Guided Transformers via Low-Rank Adaptation
Title: Contextually Guided Transformers via Low-Rank Adaptation | Kontextuell geführte Transformer über Low-Rank-Anpassung | 通过低辐射适应实现上下文向导变形器 2506.05672v1 |
Authors (4): Andrey Zhmoginov, Jihwan Lee, Max Vladymyrov, Mark Sandler
Large Language Models (LLMs) based on Transformers excel at text processing, but their reliance on prompts for specialized behavior introduces computational overhead. We propose a modification to a Transformer architecture that eliminates the need for explicit prompts by learning to encode context into the model’s weights. Our Contextually Guided Transformer (CGT) model maintains a contextual summary at each sequence position, allowing it to update the weights on the fly based on the preceding context. This approach enables the model to self-specialize, effectively creating a tailored model for processing information following a given prefix. We demonstrate the effectiveness of our method on synthetic in-context learning tasks and language modeling benchmarks. Furthermore, we introduce techniques for enhancing the interpretability of the learned contextual representations, drawing connections to Variational Autoencoders and promoting smoother, more consistent context encoding. This work offers a novel direction for efficient and adaptable language modeling by integrating context directly into the model’s architecture.
Article 1763
Title@2025-06-06 (5): BAQ: Efficient Bit Allocation Quantization for Large Language Models
Title: BAQ: Efficient Bit Allocation Quantization for Large Language Models | BAQ: Effiziente Bitallokation Quantisierung für große Sprachmodelle | BAQ: 大语言模式的高效比分配比量量化 2506.05664v1 |
Authors (4): Chao Zhang, Li Wang, Samson Lasaulce, Merouane Debbah
Post-training model quantization is a widely adopted technique for reducing the memory and computational costs of large language models (LLMs). However, most existing methods rely on uniform or heuristic bitwidth assignments, failing to account for the nonuniform sensitivity of weights to quantization noise. In this paper, we propose a novel framework for allocating quantization bitwidths based on sensitivity metrics derived from a Hessian proxy. We make key assumptions, which allow the layer/component-wise loss function to be expressed as an explicit function of the bitwidths. This enables a neat formulation of the bit allocation problem as a convex optimization task, whose closed-form solution adapts precision across weights to minimize the layer-wise quantization loss. Inspecting the solution provides several insights (such as the equal-loss structure), which are then exploited to design the proposed BAQ (Bit Allocation Quantization) algorithm. The proposed algorithm achieves a good trade-off between loss minimization and complexity and allows BAQ to be integrated into standard quantization pipelines with minimal overhead. Experimental results show that BAQ consistently outperforms GPTQ, achieving up to 56$\times$ lower perplexity at the same bitwidth on large language models ranging from 125M to 30B parameters. Leveraging our analytical results derived from solving the optimal bit allocation problem, we also provide a theoretical explanation for the observed gains. All code for this paper is available at https://github.com/CSU-ModelCompression/BAQ.
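To make the closed-form, equal-loss idea concrete, here is a sketch based on the classical high-resolution bit-allocation result rather than on BAQ's exact formulation. It assumes each weight group `i` contributes a loss of `s_i * 2**(-2 * b_i)`, where `s_i` is a sensitivity value, and that the average bitwidth is fixed. The function names, the loss model, and the continuous (unrounded) bitwidths are assumptions for illustration.

```python
import math
from typing import List, Sequence


def allocate_bits(sensitivities: Sequence[float], avg_bits: float) -> List[float]:
    """Minimize sum_i s_i * 2**(-2*b_i) subject to mean(b_i) == avg_bits.

    Lagrangian solution: b_i = avg_bits + 0.5 * log2(s_i / geometric_mean(s)).
    Bitwidths are continuous here; a real pipeline would round and clip them.
    """
    n = len(sensitivities)
    log_geo_mean = sum(math.log2(s) for s in sensitivities) / n
    return [avg_bits + 0.5 * (math.log2(s) - log_geo_mean) for s in sensitivities]


def per_group_loss(sensitivities: Sequence[float], bits: Sequence[float]) -> List[float]:
    """Loss contribution of each group under the assumed loss model."""
    return [s * 2.0 ** (-2.0 * b) for s, b in zip(sensitivities, bits)]


if __name__ == "__main__":
    sens = [1.0, 4.0, 16.0]
    bits = allocate_bits(sens, avg_bits=4.0)
    print(bits)                        # more sensitive groups receive more bits
    print(per_group_loss(sens, bits))  # every group ends up with the same loss
```

Under this model the optimum equalizes the loss across groups, which is one way to read the equal-loss structure mentioned in the abstract.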
Article 1764
Title@2025-06-06 (5): CoopetitiveV: Leveraging LLM-powered Coopetitive Multi-Agent Prompting for High-quality Verilog Generation
Title: CoopetitiveV: Leveraging LLM-powered Coopetitive Multi-Agent Prompting for High-quality Verilog Generation | CoopetitiveV: LLM-powered Coopetitive Multi-Agent für hochwertige Verilog-Generation | 协作V:利用LLM-动力协同协作的多方协作促进高品质活性一代 2412.11014v2 |
Authors (9): Zhendong Mi, Renming Zheng, Haowen Zhong, Yue Sun, Seth Kneeland, Sayan Moitra, Ken Kutzer, Zhaozhuo Xu, Shaoyi Huang
Recent advances in agentic LLMs have demonstrated great capabilities in Verilog code generation. However, existing approaches use either LLM-assisted single-agent prompting or cooperation-only multi-agent learning, which leads to: (i) a degeneration issue in single-agent learning, characterized by diminished error detection and correction capabilities; and (ii) error propagation in cooperation-only multi-agent learning, where erroneous information from an earlier agent is propagated to later agents through prompts, which can make the later agents generate buggy code. In this paper, we propose an LLM-based coopetitive multi-agent prompting framework, in which the agents not only collaborate with each other to form the generation pipeline, but also create a healthy competitive mechanism to improve the generation quality. Our experimental results show that the coopetitive multi-agent framework can effectively mitigate the degeneration risk and reduce error propagation while improving code error correction capabilities, resulting in higher quality Verilog code generation. The effectiveness of our approach is validated through extensive experiments. On the VerilogEval Machine and Human datasets, CoopetitiveV+GPT-4 achieves 99.2% and 99.1% pass@10 scores, respectively. On RTLLM, CoopetitiveV+GPT-4 obtains 100% syntax and 99.9% functionality pass@5 scores.
Article 1765
Title@2025-06-06 (5): Emulating compact binary population synthesis simulations with robust uncertainty quantification and model comparison: Bayesian normalizing flows
Title: Emulating compact binary population synthesis simulations with robust uncertainty quantification and model comparison: Bayesian normalizing flows | Emulieren kompakter binärer Populationssynthesesimulationen mit robuster Unsicherheitsquantifizierung und Modellvergleich: Bayesische Normalisierungsströme | 以稳健的不确定性量化和模型比较来模拟精密的二元人口合成模拟,并进行有力的不确定性量化和模型比较: 2506.05657v1 |
Authors (1): Anarya Ray
Population synthesis simulations of compact binary coalescences (CBCs) play a crucial role in extracting astrophysical insights from an ensemble of gravitational wave (GW) observations. However, realistic simulations are costly to implement for a dense grid of initial conditions. Normalizing flows can emulate the distribution functions of a simulated population of binary parameters and thereby enable empirical constraints on the astrophysical initial conditions and branching fractions of various formation channels given data from a catalog of GW observations. They can also be used for data amplification in sparse regions of the CBC parameter space to guide the development of phenomenological population models for rarely synthesizable systems with components in theorized mass gaps, without having to simulate a prohibitively large number of binaries. But flow predictions are fraught with uncertainties, especially for sparse training sets. In this work I develop a method for quantifying and marginalizing uncertainties in the emulators by introducing the Bayesian Normalizing flow, a conditional density estimator constructed from Bayesian neural networks. Using the exact likelihood function associated with density estimators I sample the posterior distribution of flow parameters with suitably chosen priors to quantify and marginalize over flow uncertainties. I demonstrate the accuracy, calibration, and data-amplification impacts of the estimated uncertainties for simulations of binary black hole populations formed through common envelope evolution. I outline applications of the methodology in simulation-based inference from growing GW catalogs and sketch other uses for general simulation-based approaches in GW astronomy.
Article 1766
Title@2025-06-06 (5): Federated Foundation Model for GI Endoscopy Images
Title: Federated Foundation Model for GI Endoscopy Images | Federiertes Stiftungsmodell für GI-Endoskopie-Bilder | GI 内窥镜图像联邦基金会模型 2505.24108v2 |
Authors (8): Alina Devkota, Annahita Amireskandari, Joel Palko, Shyam Thakkar, Donald Adjeroh, Xiajun Jiang, Binod Bhattarai, Prashnna K. Gyawali
Gastrointestinal (GI) endoscopy is essential in identifying GI tract abnormalities in order to detect diseases in their early stages and improve patient outcomes. Although deep learning has shown success in supporting GI diagnostics and decision-making, these models require curated datasets with labels that are expensive to acquire. Foundation models offer a promising solution by learning general-purpose representations, which can be finetuned for specific tasks, overcoming data scarcity. Developing foundation models for medical imaging holds significant potential, but the sensitive and protected nature of medical data presents unique challenges. Foundation model training typically requires extensive datasets, and while hospitals generate large volumes of data, privacy restrictions prevent direct data sharing, making foundation model training infeasible in most scenarios. In this work, we propose a federated learning (FL) framework for training foundation models for gastroendoscopy imaging, enabling data to remain within local hospital environments while contributing to a shared model. We explore several established FL algorithms, assessing their suitability for training foundation models without relying on task-specific labels, conducting experiments in both homogeneous and heterogeneous settings. We evaluate the trained foundation model on three critical downstream tasks (classification, detection, and segmentation) and demonstrate that it achieves improved performance across all tasks, highlighting the effectiveness of our approach in a federated, privacy-preserving setting.
Article 1767
Title@2025-06-06 (5): Learning to Weight Parameters for Data Attribution
Title: Learning to Weight Parameters for Data Attribution | Gewichtungsparameter für die Zuordnung von Daten lernen | 数据归属的重度参数学习 2506.05647v1 |
Authors (4): Shuangqi Li, Hieu Le, Jingyi Xu, Mathieu Salzmann
We study data attribution in generative models, aiming to identify which training examples most influence a given output. Existing methods achieve this by tracing gradients back to training data. However, they typically treat all network parameters uniformly, ignoring the fact that different layers encode different types of information and may thus draw information differently from the training set. We propose a method that models this by learning parameter importance weights tailored for attribution, without requiring labeled data. This allows the attribution process to adapt to the structure of the model, capturing which training examples contribute to specific semantic aspects of an output, such as subject, style, or background. Our method improves attribution accuracy across diffusion models and enables fine-grained insights into how outputs borrow from training data.
Article 1768
Title@2025-06-06 (5): Computational bottlenecks for denoising diffusions
Title: Computational bottlenecks for denoising diffusions | Computationale Engpässe für die Disseminierung von Diffusionen | 消除传播传播的计算瓶颈 2503.08028v2 |
Authors (2): Andrea Montanari, Viet Vu
Denoising diffusions sample from a probability distribution $\mu$ in $\mathbb{R}^d$ by constructing a stochastic process $({\hat{\boldsymbol x}}_t:t\ge 0)$ in $\mathbb{R}^d$ such that ${\hat{\boldsymbol x}}_0$ is easy to sample, but the distribution of ${\hat{\boldsymbol x}}_T$ at large $T$ approximates $\mu$. The drift ${\boldsymbol m}:\mathbb{R}^d\times\mathbb{R}\to\mathbb{R}^d$ of this diffusion process is learned by minimizing a score-matching objective. Is every probability distribution $\mu$, for which sampling is tractable, also amenable to sampling via diffusions? We provide evidence to the contrary by studying a probability distribution $\mu$ for which sampling is easy, but the drift of the diffusion process is intractable, under a popular conjecture on information-computation gaps in statistical estimation. We show that there exist drifts that are superpolynomially close to the optimum value (among polynomial time drifts) and yet yield samples with distribution that is very far from the target one.
Article 1769
Title@2025-06-06 (5): Pseudo Relevance Feedback is Enough to Close the Gap Between Small and Large Dense Retrieval Models
Title: Pseudo Relevance Feedback is Enough to Close the Gap Between Small and Large Dense Retrieval Models | Pseudo Relevanz Feedback reicht aus, um die Lücke zwischen kleinen und großen Dense-Retrieval-Modellen zu schließen | Peedo相关性反馈足以弥合小型和大型常量检索模型之间的差距 2503.14887v2 |
Authors (4): Hang Li, Xiao Wang, Bevan Koopman, Guido Zuccon
Scaling dense retrievers to larger large language model (LLM) backbones has been a dominant strategy for improving their retrieval effectiveness. However, this has substantial cost implications: larger backbones require more expensive hardware (e.g. GPUs with more memory) and lead to higher indexing and querying costs (latency, energy consumption). In this paper, we challenge this paradigm by introducing PromptPRF, a feature-based pseudo-relevance feedback (PRF) framework that enables small LLM-based dense retrievers to achieve effectiveness comparable to much larger models. PromptPRF uses LLMs to extract query-independent, structured and unstructured features (e.g., entities, summaries, chain-of-thought keywords, essay) from top-ranked documents. These features are generated offline and integrated into dense query representations via prompting, enabling efficient retrieval without additional training. Unlike prior methods such as GRF, which rely on online, query-specific generation and sparse retrieval, PromptPRF decouples feedback generation from query processing and supports dense retrievers in a fully zero-shot setting. Experiments on TREC DL and BEIR benchmarks demonstrate that PromptPRF consistently improves retrieval effectiveness and offers favourable cost-effectiveness trade-offs. We further present ablation studies to understand the role of positional feedback and analyse the interplay between feature extractor size, PRF depth, and model performance. Our findings demonstrate that with effective PRF design, scaling the retriever is not always necessary, narrowing the gap between small and large models while reducing inference cost.
Article 1770
Title@2025-06-06 (5): Interpretable Enzyme Function Prediction via Residue-Level Detection
Title: Interpretable Enzyme Function Prediction via Residue-Level Detection | Interpretierbare Enzymfunktionsvorhersage über Rückstandserkennung | 通过残留水平探测的可解释酶功能预测 2501.05644v2 |
Authors (4): Zhao Yang, Bing Su, Jiahao Chen, Ji-Rong Wen
Predicting multiple functions labeled with Enzyme Commission (EC) numbers from the enzyme sequence is of great significance but remains a challenge due to its sparse multi-label classification nature, i.e., each enzyme is typically associated with only a few labels out of more than 6000 possible EC numbers. However, existing machine learning algorithms generally learn a fixed global representation for each enzyme to classify all functions; as a result, they lack interpretability, and the fine-grained information of some function-specific local residue fragments may be overwhelmed. Here we present an attention-based framework, namely ProtDETR (Protein Detection Transformer), by casting enzyme function prediction as a detection problem. It uses a set of learnable functional queries to adaptively extract different local representations from the sequence of residue-level features for predicting different EC numbers. ProtDETR not only significantly outperforms existing deep learning-based enzyme function prediction methods, but also provides a new interpretable perspective on automatically detecting different local regions for identifying different functions through cross-attentions between queries and residue-level features. Code is available at https://github.com/yangzhao1230/ProtDETR.
Article 1771
Title@2025-06-06 (5): Where is the signal in tokenization space?
Title: Where is the signal in tokenization space? | Wo ist das Signal im Tokenisationsraum? | 象征空间的信号在哪里? 2408.08541v2 |
Authors (5): Renato Lui Geh, Honghua Zhang, Kareem Ahmed, Benjie Wang, Guy Van den Broeck
Large Language Models (LLMs) are typically shipped with tokenizers that deterministically encode text into so-called canonical token sequences, to which the LLMs assign probability values. One common assumption is that the probability of a piece of text is the probability of its canonical token sequence. However, the tokenization of a string is not unique: e.g., the Llama2 tokenizer encodes Tokens as [Tok,ens], but [Tok,en,s] also represents the same text. In this paper, we study non-canonical tokenizations. We prove that, given a string, it is computationally hard to find the most likely tokenization for an autoregressive LLM, as well as to compute the marginal probability over all possible tokenizations. We then show how the marginal is, in most cases, indistinguishable from the canonical probability. Surprisingly, we then empirically demonstrate the existence of a significant amount of signal hidden within tokenization space. Notably, by simply aggregating the probabilities of non-canonical tokenizations, we achieve improvements across a range of LLM evaluation benchmarks for a variety of architectures, including transformers and state space models.
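As a hedged illustration of why the tokenization of a string is not unique, the sketch below enumerates every segmentation of a string over a toy vocabulary. The vocabulary `TOY_VOCAB` and the helper `all_tokenizations` are assumptions made for this example; they are not the Llama2 tokenizer or the authors' code, and no model probabilities are computed.

```python
from functools import lru_cache


def all_tokenizations(text, vocab):
    """Return every way to split `text` into tokens drawn from `vocab`."""
    vocab = frozenset(vocab)

    @lru_cache(maxsize=None)
    def split_from(start):
        # Base case: the whole string is consumed, one empty continuation.
        if start == len(text):
            return ((),)
        results = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in vocab:
                for rest in split_from(end):
                    results.append((piece,) + rest)
        return tuple(results)

    return list(split_from(0))


# Hypothetical toy vocabulary, chosen to mirror the "Tokens" example.
TOY_VOCAB = {"Tok", "ens", "en", "s"}
tokenizations = all_tokenizations("Tokens", TOY_VOCAB)
for tokens in tokenizations:
    print(tokens)
```

With this toy vocabulary the function returns the two segmentations mentioned in the abstract. With a realistic vocabulary the number of segmentations can grow exponentially in the string length, which gives some intuition for why summing over all of them is expensive.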
Article 1772
Title@2025-06-06 (5): Hierarchical Refinement: Optimal Transport to Infinity and Beyond
Title: Hierarchical Refinement: Optimal Transport to Infinity and Beyond | Hierarchische Verfeinerung: Optimaler Transport zur Unendlichkeit und darüber hinaus | 梯级精炼:向无限及以后的最佳运输 2503.03025v2 |
Authors (4): Peter Halmos, Julian Gold, Xinhao Liu, Benjamin J. Raphael
Optimal transport (OT) has enjoyed great success in machine learning as a principled way to align datasets via a least-cost correspondence, driven in large part by the runtime efficiency of the Sinkhorn algorithm (Cuturi, 2013). However, Sinkhorn has quadratic space complexity in the number of points, limiting scalability to larger datasets. Low-rank OT achieves linear space complexity but, by definition, cannot compute a one-to-one correspondence between points. When the optimal transport problem is an assignment problem between datasets, an optimal mapping, known as the Monge map, is guaranteed to be a bijection. In this setting, we show that the factors of an optimal low-rank coupling co-cluster each point with its image under the Monge map. We leverage this invariant to derive an algorithm, Hierarchical Refinement (HiRef), that dynamically constructs a multiscale partition of each dataset using low-rank OT subproblems, culminating in a bijective coupling. Hierarchical Refinement uses linear space and has log-linear runtime, retaining the space advantage of low-rank OT while overcoming its limited resolution. We demonstrate the advantages of Hierarchical Refinement on several datasets, including ones containing over a million points, scaling full-rank OT to problems previously beyond Sinkhorn’s reach.
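To make the assignment setting concrete, here is a small sketch for equal-size one-dimensional point sets with squared-distance cost. It is neither HiRef nor Sinkhorn: it simply checks by brute force that the optimal coupling is a one-to-one map and that, in this 1D special case, it coincides with matching points in sorted order. The data and function names are illustrative assumptions.

```python
from itertools import permutations


def brute_force_assignment(xs, ys):
    """Try every bijection xs[i] -> ys[perm[i]] and keep the cheapest one."""
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(xs))):
        cost = sum((xs[i] - ys[perm[i]]) ** 2 for i in range(len(xs)))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


def sorted_assignment(xs, ys):
    """Match the k-th smallest x to the k-th smallest y (valid in 1D only)."""
    order_x = sorted(range(len(xs)), key=lambda i: xs[i])
    order_y = sorted(range(len(ys)), key=lambda j: ys[j])
    perm = [None] * len(xs)
    for i, j in zip(order_x, order_y):
        perm[i] = j
    return tuple(perm)


# Toy data (assumed for illustration).
xs = [0.9, 0.1, 0.5, 0.3]
ys = [1.2, 2.0, 0.0, 0.7]

optimal_perm, optimal_cost = brute_force_assignment(xs, ys)
print(optimal_perm, sorted_assignment(xs, ys))
```

The brute-force search costs factorial time, so it is only useful as a correctness check on tiny inputs; the sorting shortcut does not extend to higher dimensions, which is where scalable solvers are needed.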
Article 1773
Title@2025-06-06 (5): Autocomp: LLM-Driven Code Optimization for Tensor Accelerators
Title: Autocomp: LLM-Driven Code Optimization for Tensor Accelerators | Autocomp: LLM-gesteuerte Code-Optimierung für Tensor-Beschleuniger | 自动comp: LLM- Driven 代码对 Tensor 加速器的优化 2505.18574v2 |
Authors (4): Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao
Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today’s computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains challenging, leaving much of their potential underutilized. Recently, large language models (LLMs), trained on large amounts of code, have shown significant promise in code generation and optimization tasks, but generating low-resource languages like specialized tensor accelerator code still poses a significant challenge. We tackle this challenge with Autocomp, an approach that empowers accelerator programmers to leverage domain knowledge and hardware feedback to optimize code via an automated LLM-driven search. We accomplish this by: 1) formulating each optimization pass as a structured two-phase prompt, divided into planning and code generation phases, 2) inserting domain knowledge during planning via a concise and adaptable optimization menu, and 3) integrating correctness and performance metrics from hardware as feedback at each search iteration. Across three categories of representative workloads and two different accelerators, we demonstrate that Autocomp-optimized code runs 5.6x (GEMM) and 2.7x (convolution) faster than the vendor-provided library, and outperforms expert-level hand-tuned code by 1.4x (GEMM), 1.1x (convolution), and 1.3x (fine-grained linear algebra). Additionally, we demonstrate that optimization schedules generated from Autocomp can be reused across similar tensor operations, improving speedups by up to 24% under a fixed sample budget.
Article 1774
Title@2025-06-06 (5): Estimating stationary mass, frequency by frequency
Title: Estimating stationary mass, frequency by frequency | Schätzung der stationären Masse, Häufigkeit nach Häufigkeit | 估计固定质量,按频率分列的频率 2503.12808v3 |
Authors (3): Milind Nakul, Vidya Muthukumar, Ashwin Pananjady
Suppose we observe a trajectory of length $n$ from an exponentially $\alpha$-mixing stochastic process over a finite but potentially large state space. We consider the problem of estimating the probability mass placed by the stationary distribution of any such process on elements that occur with a certain frequency in the observed sequence. We estimate this vector of probabilities in total variation distance, showing universal consistency in $n$ and recovering known results for i.i.d. sequences as special cases. Our proposed methodology – implementable in linear time – carefully combines the plug-in (or empirical) estimator with a recently-proposed modification of the Good–Turing estimator called WingIt, which was originally developed for Markovian sequences. En route to controlling the error of our estimator, we develop new performance bounds on WingIt and the plug-in estimator for exponentially $\alpha$-mixing stochastic processes. Importantly, the extensively used method of Poissonization can no longer be applied in our non i.i.d. setting, and so we develop complementary tools – including concentration inequalities for a natural self-normalized statistic of mixing sequences – that may prove independently useful in the design and analysis of estimators for related problems. Simulation studies corroborate our theoretical findings.
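For orientation, the sketch below computes the two classical ingredients named in the abstract in their textbook i.i.d. form: the plug-in estimate and the Good–Turing estimate of the total probability mass on symbols that occur exactly r times. It does not implement WingIt or the combined estimator of the paper, and the helper names and the toy sequence are assumptions.

```python
from collections import Counter


def frequency_of_frequencies(sequence):
    """Map r -> number of distinct symbols that occur exactly r times."""
    return Counter(Counter(sequence).values())


def plug_in_mass(sequence, r):
    """Empirical mass on symbols seen exactly r times: r * N_r / n."""
    n_r = frequency_of_frequencies(sequence)[r]
    return r * n_r / len(sequence)


def good_turing_mass(sequence, r):
    """Classical Good-Turing estimate of the same mass: (r + 1) * N_{r+1} / n."""
    n_r_plus_1 = frequency_of_frequencies(sequence)[r + 1]
    return (r + 1) * n_r_plus_1 / len(sequence)


# Toy sequence: a occurs 5 times, b and r twice, c and d once (n = 11).
sample = list("abracadabra")
print(good_turing_mass(sample, 0))  # missing-mass estimate, N_1 / n
print(plug_in_mass(sample, 1), good_turing_mass(sample, 1))
```

Both functions run in time linear in the sequence length. For r = 0 the plug-in estimate is always zero, which is one reason a Good–Turing-style correction is needed for rarely seen or unseen symbols.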
Article 1775
Title@2025-06-06 (5): Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer
Title: Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer | Longitudinal Targeted Minimal Loss-basierte Abschätzung mit zeitlich-differenzierter Heterogener Transformer | 与时间差异变化异质变异器的纵向定值最低损失估计值 2404.04399v2 |
Authors (8): Toru Shirakawa, Yi Li, Yulun Wu, Sky Qiu, Yuxuan Li, Mingduo Zhao, Hiroyasu Iso, Mark van der Laan
We propose Deep Longitudinal Targeted Minimum Loss-based Estimation (Deep LTMLE), a novel approach to estimate the counterfactual mean of outcome under dynamic treatment policies in longitudinal problem settings. Our approach utilizes a transformer architecture with heterogeneous type embedding trained using temporal-difference learning. After obtaining an initial estimate using the transformer, we follow the targeted minimum loss-based likelihood estimation (TMLE) framework to statistically correct for the bias commonly associated with machine learning algorithms. Our method also facilitates statistical inference by providing 95% confidence intervals grounded in asymptotic statistical theory. Simulation results demonstrate our method’s superior performance over existing approaches, particularly in complex, long time-horizon scenarios. It remains effective in small-sample, short-duration contexts, matching the performance of asymptotically efficient estimators. To demonstrate our method in practice, we apply it to estimate counterfactual mean outcomes for standard versus intensive blood pressure management strategies in a real-world cardiovascular epidemiology cohort study.
Article 1776
Title@2025-06-06 (5): Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones
Title: Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones | Projektierbare Modelle: One-Shot-Generation von kleinen Spezialtransformatoren aus großen Modellen | 集成项目模型:大型一型小型专门化变形器的单制制式 2506.05641v1 |
Authors (3): Andrey Zhmoginov, Jihwan Lee, Mark Sandler
Modern Foundation Models (FMs) are typically trained on corpora spanning a wide range of different data modalities, topics and downstream tasks. Utilizing these models can be very computationally expensive and is out of reach for most consumer devices. Furthermore, most of the broad FM knowledge may actually be irrelevant for a specific task at hand. Here we explore a technique for mapping parameters of a large Transformer to parameters of a smaller specialized model. By making this transformation task-specific, we aim to capture a narrower scope of the knowledge needed for performing a specific task by a smaller model. We study our method on image modeling tasks, showing that performance of generated models exceeds that of universal conditional models.
Article 1777
Title@2025-06-05 (4): A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition
Title: A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition | Ein fiktiver Q&A-Datensatz für das Studieren von Memorization und Knowledge Acquisition | 用于研究记忆化和知识获取的字形 A 数据集 2506.05639v1 |
Authors (5): John Kirchenbauer, Janny Mongkolsupawan, Yuxin Wen, Tom Goldstein, Daphne Ippolito
When language models are trained on textual data, they acquire both knowledge about the structure of language and knowledge of facts about the world. At inference time, their knowledge of facts can be leveraged to solve interesting problems and perform useful knowledge work for users. It is well known that language models can memorize long sequences from their training data verbatim. However, it is much less well understood how language models memorize facts seen during training. In this work, we propose a new dataset to specifically empower researchers to study the dual processes of fact memorization and verbatim sequence memorization. The dataset consists of synthetically-generated, webtext-like documents about fictional events, as well as question-answer pairs about the events. We conduct training experiments showing how synthetic data about fictional events can be effective in teasing apart different forms of memorization. We also document the challenges in effectively building realistic, fictional synthetic data.
Article 1778
Title@2025-06-05 (4): Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks’ Internal Representations
Title: Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks’ Internal Representations | Erklärbare Konzeptgenerierung durch Vision-Sprachvorliebe Lernen zum Verständnis der internen Repräsentationen neuraler Netzwerke | 通过愿景-语言优先学习,了解神经网络内部代表网络,可解释的概念生成 2408.13438v3 |
Authors (3): Aditya Taparia, Som Sagar, Ransalu Senanayake
Understanding the inner representation of a neural network helps users improve models. Concept-based methods have become a popular choice for explaining deep neural networks post-hoc because, unlike most other explainable AI techniques, they can be used to test high-level visual “concepts” that are not directly related to feature attributes. For instance, the concept of “stripes” is important to classify an image as a zebra. Concept-based explanation methods, however, require practitioners to guess and manually collect multiple candidate concept image sets, making the process labor-intensive and prone to overlooking important concepts. Addressing this limitation, in this paper, we frame concept image set creation as an image generation problem. However, since naively using a standard generative model does not result in meaningful concepts, we devise a reinforcement learning-based preference optimization (RLPO) algorithm that fine-tunes a vision-language generative model from approximate textual descriptions of concepts. Through a series of experiments, we demonstrate our method’s ability to efficiently and reliably articulate diverse concepts that are otherwise challenging to craft manually.
Article 1779
Title@2025-06-05 (4): CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP
Title: CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP | CLIPErase: Effizientes Entlernen von Visual-Textual Associations in CLIP | CLIPErase:在CLIP中有效不学习视觉-语言协会 2410.23330v2 |
Authors (6): Tianyu Yang, Lisen Dai, Xiangqi Wang, Minhao Cheng, Yapeng Tian, Xiangliang Zhang
Machine unlearning (MU) has gained significant attention as a means to remove specific data from trained models without requiring a full retraining process. While progress has been made in unimodal domains like text and image classification, unlearning in multimodal models remains relatively underexplored. In this work, we address the unique challenges of unlearning in CLIP, a prominent multimodal model that aligns visual and textual representations. We introduce CLIPErase, a novel approach that disentangles and selectively forgets both visual and textual associations, ensuring that unlearning does not compromise model performance. CLIPErase consists of three key modules: a Forgetting Module that disrupts the associations in the forget set, a Retention Module that preserves performance on the retain set, and a Consistency Module that maintains consistency with the original model. Extensive experiments on the CIFAR-100 and Flickr30K datasets across four CLIP downstream tasks demonstrate that CLIPErase effectively forgets designated associations in zero-shot tasks for multimodal samples, while preserving the model’s performance on the retain set after unlearning.
Article 1780
Title@2025-06-05 (4): Bayesian Inference for Correlated Human Experts and Classifiers
Title: Bayesian Inference for Correlated Human Experts and Classifiers | Bayesische Schlussfolgerung für korrelierte menschliche Experten und Klassifikatoren | 与相关人类专家和分类有关连的人类专家和分类人的贝叶斯推论 2506.05636v1 |
Authors (5): Markelle Kelly, Alex Boyd, Sam Showalter, Mark Steyvers, Padhraic Smyth
Applications of machine learning often involve making predictions based on both model outputs and the opinions of human experts. In this context, we investigate the problem of querying experts for class label predictions, using as few human queries as possible, and leveraging the class probability estimates of pre-trained classifiers. We develop a general Bayesian framework for this problem, modeling expert correlation via a joint latent representation, enabling simulation-based inference about the utility of additional expert queries, as well as inference of posterior distributions over unobserved expert labels. We apply our approach to two real-world medical classification problems, as well as to CIFAR-10H and ImageNet-16H, demonstrating substantial reductions relative to baselines in the cost of querying human experts while maintaining high prediction accuracy.
Article 1781
Title@2025-06-05 (4): AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking
Title: AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking | AdaReasoner: Adaptive Reasoning ermöglicht flexibleres Denken | AdaReasoner:适应性理性使思考更灵活 2505.17312v2 |
Authors (7): Xiangqi Wang, Yue Huang, Yanbo Wang, Xiaonan Luo, Kehan Guo, Yujun Zhou, Xiangliang Zhang
LLMs often need effective configurations, like temperature and reasoning steps, to handle tasks requiring sophisticated reasoning and problem-solving, ranging from joke generation to mathematical reasoning. Existing prompting approaches usually adopt general-purpose, fixed configurations that work ‘well enough’ across tasks but seldom achieve task-specific optimality. To address this gap, we introduce AdaReasoner, an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations for tasks requiring different types of thinking. AdaReasoner is trained using a reinforcement learning (RL) framework, combining a factorized action space with a targeted exploration strategy, along with a pretrained reward model to optimize the policy model for reasoning configurations with only a few-shot guide. AdaReasoner is backed by theoretical guarantees and experiments demonstrating fast convergence and a sublinear policy gap. Across six different LLMs and a variety of reasoning tasks, it consistently outperforms standard baselines, preserves out-of-distribution robustness, and yields gains on knowledge-intensive tasks through tailored prompts.
Article 1782
Title@2025-06-05 (4): AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization
Title: AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization | AutoQD: Automatische Entdeckung von unterschiedlichen Verhalten mit Qualitäts-Diversitäts-Optimierung | 自动QD: 以质量- 多样性优化自动发现多种行为 2506.05634v1 |
Authors (2): Saeed Hedayatian, Stefanos Nikolaidis
Quality-Diversity (QD) algorithms have shown remarkable success in discovering diverse, high-performing solutions, but rely heavily on hand-crafted behavioral descriptors that constrain exploration to predefined notions of diversity. Leveraging the equivalence between policies and occupancy measures, we present a theoretically grounded approach to automatically generate behavioral descriptors by embedding the occupancy measures of policies in Markov Decision Processes. Our method, AutoQD, leverages random Fourier features to approximate the Maximum Mean Discrepancy (MMD) between policy occupancy measures, creating embeddings whose distances reflect meaningful behavioral differences. A low-dimensional projection of these embeddings that captures the most behaviorally significant dimensions is then used as the behavioral descriptor for off-the-shelf QD methods. We prove that our embeddings converge to true MMD distances between occupancy measures as the number of sampled trajectories and embedding dimensions increase. Through experiments in multiple continuous control tasks, we demonstrate AutoQD’s ability to discover diverse policies without predefined behavioral descriptors, presenting a well-motivated alternative to prior methods in unsupervised Reinforcement Learning and QD optimization. Our approach opens new possibilities for open-ended learning and automated behavior discovery in sequential decision-making settings without requiring domain-specific knowledge.
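The general idea of approximating MMD with random Fourier features can be sketched as follows, assuming a Gaussian kernel and treating each policy as a bag of visited states. This is a generic textbook construction, not the AutoQD implementation; the function names, the bandwidth, the feature count, and the synthetic "policies" are all assumptions.

```python
import math
import random


def make_rff(dim_in, num_features, bandwidth, rng):
    """Random Fourier feature map for the kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    weights = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim_in)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def embed(x):
        return [scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return embed


def mean_embedding(samples, embed):
    """Average feature vector of a sample set (an empirical kernel mean embedding)."""
    vectors = [embed(x) for x in samples]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def approx_mmd(samples_a, samples_b, embed):
    """Euclidean distance between mean embeddings, approximating the MMD."""
    mu_a = mean_embedding(samples_a, embed)
    mu_b = mean_embedding(samples_b, embed)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_a, mu_b)))


rng = random.Random(0)
embed = make_rff(dim_in=2, num_features=256, bandwidth=1.0, rng=rng)

# Hypothetical state samples: two policies visiting similar states, one visiting distant states.
policy_a = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
policy_a2 = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
policy_b = [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(200)]

print(approx_mmd(policy_a, policy_a2, embed))
print(approx_mmd(policy_a, policy_b, embed))
```

The mean embedding is a fixed-length vector per policy, so distances between policies reduce to ordinary Euclidean distances; a further low-dimensional projection of these vectors, as described in the abstract, is not shown here.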
Article 1783
Title@2025-06-05 (4): The TESS Ten Thousand Catalog: 10,001 uniformly-vetted and -validated Eclipsing Binary Stars detected in Full-Frame Image data by machine learning and analyzed by citizen scientists
Title: The TESS Ten Thousand Catalog: 10,001 uniformly-vetted and -validated Eclipsing Binary Stars detected in Full-Frame Image data by machine learning and analyzed by citizen scientists | Der TESS Zehntausend-Katalog: 10,001 gleichmäßig getestete und validierte Eclipsing Binary Stars in Full-Frame-Bilddaten durch maschinelles Lernen erkannt und von Bürgerwissenschaftlern analysiert | TESS 10 000万个目录:10 001个统一审查的和经过验证的二进制星,通过机器学习和由公民科学家分析,在全框架图像数据中检测到 2506.05631v1 |
Authors (39): Veselin B. Kostov, Brian P. Powell, Aline U. Fornear, Marco Z. Di Fraia, Robert Gagliano, Thomas L. Jacobs, Julien S. de Lambilly, Hugo A. Durantini Luca, Steven R. Majewski, Mark Omohundro, Jerome Orosz, Saul A. Rappaport, Ryan Salik, Donald Short, William Welsh, Svetoslav Alexandrov, Cledison Marcos da Silva, Erika Dunning, Gerd Guhne, Marc Huten, Michiharu Hyogo, Davide Iannone, Sam Lee, Christian Magliano, Manya Sharma, Allan Tarr, John Yablonsky, Sovan Acharya, Fred Adams, Thomas Barclay, Benjamin T. Montet, Susan Mullally, Greg Olmschenk, Andrej Prsa, Elisa Quintana, Robert Wilson, Hasret Balcioglu, Ethan Kruse, the Eclipsing Binary Patrol Collaboration
The Transiting Exoplanet Survey Satellite (TESS) has surveyed nearly the entire sky in Full-Frame Image mode with a time resolution of 200 seconds to 30 minutes and a temporal baseline of at least 27 days. In addition to the primary goal of discovering new exoplanets, TESS is exceptionally capable of detecting variable stars, in particular short-period eclipsing binaries, which are relatively common, make up a few percent of all stars, and represent powerful astrophysical laboratories for deep investigations of stellar formation and evolution. We combed Sectors 1-82 of TESS Full-Frame Image data searching for eclipsing binary stars using a neural network that identified ~1.2 million stars with eclipse-like features. Of these, we have performed an in-depth analysis on ~60,000 targets using automated methods and manual inspection by citizen scientists. Here we present a catalog of 10,001 uniformly-vetted and -validated eclipsing binary stars that passed all our ephemeris and photocenter tests, as well as complementary visual inspection. Of these, 7,936 are new eclipsing binaries while the remaining 2,065 are known systems for which we update the published ephemerides. We outline the detection and analysis of the targets, discuss the properties of the sample, and highlight potentially interesting systems. Finally, we also provide a list of ~900,000 unvetted and unvalidated targets for which the neural network found eclipse-like features with a score higher than 0.9, and for which there are no known eclipsing binaries within a sky-projected separation of a TESS pixel (~21 arcsec).
Article 1784
Title@2025-06-05 (4): A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series
Title: A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series | Ein verlässlicher Rahmen für die Mensch-in-the-Loop-Anomalie-Erkennung in der Zeitreihe | 时间序列中人类在Loop异常探测的可靠框架 2405.03234v3 |
Authors (4): Ziquan Deng, Xiwei Xuan, Kwan-Liu Ma, Zhaodan Kong
Time series anomaly detection is a critical machine learning task for numerous applications, such as finance, healthcare, and industrial systems. However, even high-performing models may exhibit potential issues such as biases, leading to unreliable outcomes and misplaced confidence. While model explanation techniques, particularly visual explanations, offer valuable insights by elucidating which inputs a model attributes its decisions to, many limitations still exist: they are primarily instance-based and not scalable across the dataset, and they provide one-directional information from the model to the human side, lacking a mechanism for users to address detected issues. To fill these gaps, we introduce HILAD, a novel framework designed to foster a dynamic and bidirectional collaboration between humans and AI for enhancing anomaly detection models in time series. Through our visual interface, HILAD empowers domain experts to detect, interpret, and correct unexpected model behaviors at scale. Our evaluation through user studies with two models and three time series datasets demonstrates the effectiveness of HILAD, which fosters a deeper model understanding, immediate corrective actions, and model reliability enhancement.
Article 1785
Title@2025-06-05 (4): GP-MoLFormer-Sim: Test Time Molecular Optimization through Contextual Similarity Guidance
Title: GP-MoLFormer-Sim: Test Time Molecular Optimization through Contextual Similarity Guidance | GP-MoLFormer-Sim: Testzeit-Molekularoptimierung durch kontextuelle Ähnlichkeitsführung | GP-MoLFormer-Sim:通过背景相似性指导试验时间分子优化 2506.05628v1 |
Authors (7): Jiri Navratil, Jarret Ross, Payel Das, Youssef Mroueh, Samuel C Hoffman, Vijil Chenthamarakshan, Brian Belgodere
The ability to design molecules while preserving similarity to a target molecule and/or property is crucial for various applications in drug discovery, chemical design, and biology. In this paper, we introduce an efficient training-free method for navigating and sampling from the molecular space with a generative Chemical Language Model (CLM), while using the molecular similarity to the target as a guide. Our method leverages the contextual representations learned from the CLM itself to estimate the molecular similarity, which is then used to adjust the autoregressive sampling strategy of the CLM. At each step of the decoding process, the method tracks the distance of the current generations from the target and updates the logits to encourage the preservation of similarity in generations. We implement the method using a recently proposed ~47M-parameter SMILES-based CLM, GP-MoLFormer, and therefore refer to the method as GP-MoLFormer-Sim, which enables a test-time update of the deep generative policy to reflect the contextual similarity to a set of guide molecules. The method is further integrated into a genetic algorithm (GA) and tested on a set of standard molecular optimization benchmarks involving property optimization, molecular rediscovery, and structure-based drug design. Results show that GP-MoLFormer-Sim combined with GA (GP-MoLFormer-Sim+GA) outperforms existing training-free baseline methods when the oracle remains black-box. The findings in this work are a step forward in understanding and guiding the generative mechanisms of CLMs.
Article 1786
Title@2025-06-05 (4): Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency
Title: Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency | Grundlegende Grenzen der schnellen Tuning Transformer: Universalität, Kapazität und Effizienz | 迅速提款变换者的根本限制:普遍性、能力和效益 2411.16525v2 |
Authors (6): Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani, Chenyang Li, Zhao Song, Han Liu
We investigate the statistical and computational limits of prompt tuning for transformer-based foundation models. Our key contributions are that prompt tuning on single-head transformers with only a single self-attention layer (i) is universal, and (ii) supports efficient (even almost-linear time) algorithms under the Strong Exponential Time Hypothesis (SETH). Statistically, we prove that prompt tuning on such simplest possible transformers yields universal approximators for sequence-to-sequence Lipschitz functions. In addition, we provide an exponential-in-$dL$ and -in-$(1/\epsilon)$ lower bound on the required soft-prompt tokens for prompt tuning to memorize any dataset with 1-layer, 1-head transformers. Computationally, we identify a phase transition in the efficiency of prompt tuning, determined by the norm of the soft-prompt-induced keys and queries, and provide an upper bound criterion. Beyond this criterion, no sub-quadratic (efficient) algorithm for prompt tuning exists under SETH. Within this criterion, we showcase our theory by proving the existence of almost-linear time prompt tuning inference algorithms. These fundamental limits provide important necessary conditions for designing expressive and efficient prompt tuning methods for practitioners.
Article 1787
Title@2025-06-05 (4): Theoretical Analysis of Positional Encodings in Transformer Models: Impact on Expressiveness and Generalization
Title: Theoretical Analysis of Positional Encodings in Transformer Models: Impact on Expressiveness and Generalization | Theoretische Analyse von Positionskodierungen in Transformer-Modellen: Auswirkungen auf Expressivität und Generalisierung | 变换模型中位置编码理论分析:对表现力和普遍化的影响 2506.06398v1 |
Authors (1): Yin Li
Positional encodings are a core part of transformer-based models, enabling processing of sequential data without recurrence. This paper presents a theoretical framework to analyze how various positional encoding methods, including sinusoidal, learned, relative, and bias-based methods like Attention with Linear Biases (ALiBi), impact a transformer’s expressiveness, generalization ability, and extrapolation to longer sequences. Expressiveness is defined via function approximation, generalization bounds are established using Rademacher complexity, and new encoding methods based on orthogonal functions, such as wavelets and Legendre polynomials, are proposed. The extrapolation capacity of existing and proposed encodings is analyzed, extending ALiBi’s biasing approach to a unified theoretical context. Experimental evaluation on synthetic sequence-to-sequence tasks shows that orthogonal transform-based encodings outperform traditional sinusoidal encodings in generalization and extrapolation. This work addresses a critical gap in transformer theory, providing insights for design choices in natural language processing, computer vision, and other transformer applications.
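As a concrete reference point for the bias-based family mentioned above, the following sketch builds the additive attention bias used by ALiBi for one head: a penalty that grows linearly with the query-key distance, scaled by a per-head slope. It assumes a causal (left-to-right) model and a head count that is a power of two; the function names are illustrative, and none of the orthogonal-function encodings proposed in the paper are implemented here.

```python
import math


def alibi_slopes(num_heads):
    """Geometric head slopes 2^(-8k/n) for k = 1..n (assumes n is a power of two)."""
    return [2.0 ** (-8.0 * (k + 1) / num_heads) for k in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Bias added to attention logits: -slope * (i - j) for keys j <= i, -inf otherwise."""
    bias = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            # Future positions are masked out (causal attention is an assumption here).
            row.append(-slope * (i - j) if j <= i else -math.inf)
        bias.append(row)
    return bias


slopes = alibi_slopes(8)          # 1/2, 1/4, ..., 1/256
bias = alibi_bias(4, slopes[0])   # bias matrix for the first head
for row in bias:
    print(row)
```

Because the bias depends only on the offset i - j, the same formula applies unchanged to sequences longer than those seen in training, which is the property that makes this family relevant to the extrapolation analysis.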
Article 1788
Title@2025-06-05 (4): Two-dimensional Taxonomy for N-ary Knowledge Representation Learning Methods
Title: Two-dimensional Taxonomy for N-ary Knowledge Representation Learning Methods | Zweidimensionale Taxonomie für N-are Wissensdarstellung Lernmethoden | Nary知识代表制学习方法二维分类学 2506.05626v1 |
Authors (3): Xiaohua Lu, Liubov Tupikina, Mehwish Alam
Real-world knowledge can take various forms, including structured, semi-structured, and unstructured data. Among these, knowledge graphs are a form of structured human knowledge that integrates heterogeneous data sources into structured representations but typically reduces complex n-ary relations to simple triples, thereby losing higher-order relational details. In contrast, hypergraphs naturally represent n-ary relations with hyperedges, which directly connect multiple entities together. Yet hypergraph representation learning often overlooks entity roles in hyperedges, limiting fine-grained semantic modelling. To address these issues, knowledge hypergraphs and hyper-relational knowledge graphs combine the advantages of knowledge graphs and hypergraphs to better capture the complex structures and role-specific semantics of real-world knowledge. This survey provides a comprehensive review of methods handling n-ary relational data, covering the literature on both knowledge hypergraphs and hyper-relational knowledge graphs. We propose a two-dimensional taxonomy: the first dimension categorises models based on their methodology, i.e., translation-based models, tensor factorisation-based models, deep neural network-based models, logic rules-based models, and hyperedge expansion-based models. The second dimension classifies models according to their awareness of entity roles and positions in n-ary relations, dividing them into aware-less, position-aware, and role-aware approaches. Finally, we discuss existing datasets and negative sampling strategies, and outline open challenges to inspire future research.
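The information loss from reducing n-ary relations to triples can be seen in a small sketch. It stores two role-aware n-ary facts, flattens them into binary triples around one anchor entity, and shows that a different pairing of prize and year produces exactly the same triples. The `NaryFact` class, the role names, and the flattening scheme are assumptions for illustration and do not correspond to any specific model in the survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaryFact:
    """A role-aware n-ary fact: a relation plus (role, entity) pairs."""
    relation: str
    roles: tuple  # ((role_name, entity), ...)


def to_binary_triples(fact, anchor_role):
    """Flatten one n-ary fact into (anchor, role, entity) triples."""
    anchor = dict(fact.roles)[anchor_role]
    return {(anchor, role, entity)
            for role, entity in fact.roles if role != anchor_role}


def flatten(facts, anchor_role="laureate"):
    """Union of the binary triples of all facts."""
    triples = set()
    for fact in facts:
        triples |= to_binary_triples(fact, anchor_role)
    return triples


def award(prize, year):
    return NaryFact("award_received",
                    (("laureate", "Marie Curie"), ("prize", prize), ("year", year)))


true_facts = [award("Nobel Prize in Physics", 1903),
              award("Nobel Prize in Chemistry", 1911)]
swapped_facts = [award("Nobel Prize in Physics", 1911),
                 award("Nobel Prize in Chemistry", 1903)]

print(flatten(true_facts) == flatten(swapped_facts))  # True: triples cannot tell them apart
print(set(true_facts) == set(swapped_facts))          # False: n-ary facts still differ
```

Keeping the role names alongside the entities is what the role-aware approaches in the second taxonomy dimension rely on; a position-aware variant would keep only the order of the entities.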
Article 1789
Title@2025-06-05 (4): Heterogeneous Sequel-Aware Graph Neural Networks for Sequential Learning
Title: Heterogeneous Sequel-Aware Graph Neural Networks for Sequential Learning | Heterogene Sequel-Aware Graph Neuronale Netzwerke für sequentielles Lernen | 用于序列学习的异异质Sequel-Aware 图像神经网络 2506.05625v1 |
Authors (3): Anushka Tiwari, Haimonti Dutta, Shahrzad Khanizadeh
Graph-based recommendation systems use higher-order user and item embeddings for next-item predictions. Dynamically adding collaborative signals from neighbors helps to use similar users’ preferences during learning. While item-item correlations and their impact on recommendations have been studied, the efficacy of temporal item sequences for recommendations is much less explored. In this paper, we examine temporal item sequence (sequel-aware) embeddings along with higher-order user embeddings and show that sequel-aware Graph Neural Networks have better (or comparable) recommendation performance than graph-based recommendation systems that do not consider sequel information. Extensive empirical results comparing Heterogeneous Sequel-aware Graph Neural Networks (HSAL-GNNs) to other algorithms for sequential learning (such as transformers, graph neural networks, auto-encoders) are presented on three synthetic and three real-world datasets. Our results indicate that the incorporation of sequence information from items greatly enhances recommendations.
Article 1790
Title@2025-06-05 (4): Distributional Matrix Completion via Nearest Neighbors in the Wasserstein Space
Title: Distributional Matrix Completion via Nearest Neighbors in the Wasserstein Space | Verteilungsmatrix Fertigstellung über die nächsten Nachbarn im Wassersteinraum | 通过瓦塞斯坦空间近距离邻居完成分布矩阵 2410.13112v2 |
Authors (4): Jacob Feitelberg, Kyuseong Choi, Anish Agarwal, Raaz Dwivedi
We study the problem of distributional matrix completion: Given a sparsely observed matrix of empirical distributions, we seek to impute the true distributions associated with both observed and unobserved matrix entries. This is a generalization of traditional matrix completion, where the observations per matrix entry are scalar-valued. To do so, we utilize tools from optimal transport to generalize the nearest neighbors method to the distributional setting. Under a suitable latent factor model on probability distributions, we establish that our method recovers the distributions in the Wasserstein metric. We demonstrate through simulations that our method (i) provides better distributional estimates for an entry compared to using observed samples for that entry alone, (ii) yields accurate estimates of distributional quantities such as standard deviation and value-at-risk, and (iii) inherently supports heteroscedastic distributions. In addition, we demonstrate our method on a real-world dataset of quarterly earnings prediction distributions. We also prove novel asymptotic results for Wasserstein barycenters over one-dimensional distributions.
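The nearest-neighbor idea in this abstract can be illustrated with a minimal sketch. It assumes one-dimensional empirical distributions with the same number of samples per entry, so that the 2-Wasserstein distance and barycenter reduce to operations on sorted samples. The function names, the `radius` threshold, and the row-based neighbor rule are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def w2_squared(x, y):
    """Squared 2-Wasserstein distance between two 1-D empirical
    distributions with equally many samples (sorted-sample formula)."""
    return float(np.mean((np.sort(x) - np.sort(y)) ** 2))

def barycenter(samples):
    """W2 barycenter of equal-size 1-D empirical distributions:
    average the sorted samples, i.e. average the quantile functions."""
    return np.mean([np.sort(s) for s in samples], axis=0)

def impute(data, i, j, radius):
    """Estimate the distribution at entry (i, j).

    data[r][c] is a 1-D sample array, or None if the entry is unobserved.
    Rows whose average squared W2 distance to row i (over commonly observed
    columns other than j) is at most `radius` count as neighbors.
    """
    neighbors = []
    for r, row in enumerate(data):
        if r == i or row[j] is None:
            continue
        shared = [c for c in range(len(row))
                  if c != j and row[c] is not None and data[i][c] is not None]
        if not shared:
            continue
        dist = np.mean([w2_squared(data[i][c], row[c]) for c in shared])
        if dist <= radius:
            neighbors.append(row[j])
    # Return None when no row qualifies as a neighbor.
    return barycenter(neighbors) if neighbors else None
```

Averaging sorted samples is only valid in one dimension with equal sample counts; other settings would need a general optimal transport solver.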
Article 1791
Title@2025-06-05 (4): Active Learning of Piecewise Gaussian Process Surrogates
Title: Active Learning of Piecewise Gaussian Process Surrogates | Aktives Lernen des stückweisen Gaußschen Prozesses Surrogats | 正在积极学习小费高斯进程代得国家 2301.08789v3 |
Authors (6): Chiwoo Park, Robert Waelder, Bonggwon Kang, Benji Maruyama, Soondo Hong, Robert Gramacy
Active learning of Gaussian process (GP) surrogates has been useful for optimizing experimental designs for physical/computer simulation experiments, and for steering data acquisition schemes in machine learning. In this paper, we develop a method for active learning of piecewise, Jump GP surrogates. Jump GPs are continuous within, but discontinuous across, regions of a design space, as required for applications spanning autonomous materials design, configuration of smart factory systems, and many others. Although our active learning heuristics are appropriated from strategies originally designed for ordinary GPs, we demonstrate that additionally accounting for model bias, as opposed to the usual model uncertainty, is essential in the Jump GP context. Toward that end, we develop an estimator for bias and variance of Jump GP models. Illustrations, and evidence of the advantage of our proposed methods, are provided on a suite of synthetic benchmarks, and real-simulation experiments of varying complexity.
Article 1792
Title@2025-06-05 (4): EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM?
Title: EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM? | EnsemW2S: Kann ein Ensemble von LLMs dazu genutzt werden, eine stärkere LLM zu erhalten? | EnsemW2S:能否利用捆绑的LLMs获得更强大的LMm? 2410.04571v2 |
Authors (7): Aakriti Agrawal, Mucong Ding, Zora Che, Chenghao Deng, Anirudh Satheesh, John Langford, Furong Huang
How can we harness the collective capabilities of multiple Large Language Models (LLMs) to create an even more powerful model? This question forms the foundation of our research, where we propose an innovative approach to weak-to-strong (w2s) generalization, a critical problem in AI alignment. Our work introduces an easy-to-hard (e2h) framework for studying the feasibility of w2s generalization, where weak models trained on simpler tasks collaboratively supervise stronger models on more complex tasks. This setup mirrors real-world challenges, where direct human supervision is limited. To achieve this, we develop a novel AdaBoost-inspired ensemble method, demonstrating that an ensemble of weak supervisors can enhance the performance of stronger LLMs across classification and generative tasks on difficult QA datasets. In several cases, our ensemble approach matches the performance of models trained on ground-truth data, establishing a new benchmark for w2s generalization. We observe an improvement of up to 14% over existing baselines and average improvements of 5% and 4% for binary classification and generative tasks, respectively. This research points to a promising direction for enhancing AI through collective supervision, especially in scenarios where labeled data is sparse or insufficient.
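For readers unfamiliar with the AdaBoost scheme that inspires this ensemble, the following sketch shows classic discrete AdaBoost vote weighting for fixed binary classifiers. It is the textbook algorithm, not the paper's EnsemW2S method; the function names and the {-1, +1} label encoding are assumptions made for illustration.

```python
import numpy as np

def adaboost_weights(predictions, labels):
    """Classic discrete AdaBoost vote weights for fixed weak classifiers.

    predictions: (n_models, n_samples) array with entries in {-1, +1}
    labels:      (n_samples,) array with entries in {-1, +1}
    Returns one vote weight (alpha) per model.
    """
    n_models, n_samples = predictions.shape
    w = np.full(n_samples, 1.0 / n_samples)      # sample weights
    alphas = np.zeros(n_models)
    for m in range(n_models):
        wrong = predictions[m] != labels
        # Weighted error, clipped to keep the logarithm finite.
        err = np.clip(np.sum(w[wrong]), 1e-12, 1 - 1e-12)
        alphas[m] = 0.5 * np.log((1 - err) / err)
        # Up-weight samples this model got wrong, down-weight the rest.
        w *= np.exp(-alphas[m] * labels * predictions[m])
        w /= w.sum()
    return alphas

def ensemble_predict(predictions, alphas):
    """Weighted majority vote of the weak classifiers."""
    return np.sign(alphas @ predictions)
```

Each weak model's vote grows with its accuracy on the samples that earlier models found hard, which is how a set of individually imperfect supervisors can yield a more reliable combined label.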
Article 1793
Title@2025-06-05 (4): Exploring Representations and Interventions in Time Series Foundation Models
Title: Exploring Representations and Interventions in Time Series Foundation Models | Exploring Repräsentationen und Interventionen in Time Series Foundation Models | 探讨在时间系列基础模型中的代表性和干预措施 2409.12915v5 |
Authors (5): Michał Wiliński, Mononito Goswami, Willa Potosnak, Nina Żukowska, Artur Dubrawski
Time series foundation models (TSFMs) promise to be powerful tools for a wide range of applications. However, their internal representations and learned concepts are still not well understood. In this study, we investigate the structure and redundancy of representations across various TSFMs, examining the self-similarity of model layers within and across different model sizes. This analysis reveals block-like redundancy in the representations, which can be utilized for informed pruning to improve inference speed and efficiency. Additionally, we explore the concepts learned by these models - such as periodicity and trends - and how these can be manipulated through latent space steering to influence model behavior. Our experiments show that steering interventions can introduce new features, e.g., adding periodicity or trends to signals that initially lacked them. These findings underscore the value of representational analysis for optimizing models and demonstrate how conceptual steering offers new possibilities for more controlled and efficient time series analysis with TSFMs.
Article 1794
Title@2025-06-05 (4): Toward Efficient Kernel-Based Solvers for Nonlinear PDEs
Title: Toward Efficient Kernel-Based Solvers for Nonlinear PDEs | Auf dem Weg zu effizienten Kernel-basierten Lösern für nichtlineare PDEs | 实现非线性PDE 高效的以核心为基础的内核解决问题器 2410.11165v4 |
Authors (6): Zhitong Xu, Da Long, Yiming Xu, Guang Yang, Shandian Zhe, Houman Owhadi
We introduce a novel kernel learning framework toward efficiently solving nonlinear partial differential equations (PDEs). In contrast to the state-of-the-art kernel solver that embeds differential operators within kernels, posing challenges with a large number of collocation points, our approach eliminates these operators from the kernel. We model the solution using a standard kernel interpolation form and differentiate the interpolant to compute the derivatives. Our framework obviates the need for complex Gram matrix construction between solutions and their derivatives, allowing for a straightforward implementation and scalable computation. As an instance, we allocate the collocation points on a grid and adopt a product kernel, which yields a Kronecker product structure in the interpolation. This structure enables us to avoid computing the full Gram matrix, reducing costs and scaling efficiently to a large number of collocation points. We provide a proof of the convergence and rate analysis of our method under appropriate regularity assumptions. In numerical experiments, we demonstrate the advantages of our method in solving several benchmark PDEs.
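The Kronecker product structure mentioned in this abstract can be illustrated as follows. The sketch assumes a two-dimensional grid and a product of one-dimensional RBF kernels; the kernel choice, length scale, and function names are illustrative assumptions rather than the authors' implementation. The point is that a linear system with the Gram matrix K1 ⊗ K2 can be solved with two small solves, without forming the full matrix.

```python
import numpy as np

def rbf(x, lengthscale):
    """1-D RBF Gram matrix for the points in x."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def kron_solve(K1, K2, F):
    """Solve (K1 ⊗ K2) vec(A) = vec(F) without forming the Kronecker product.

    With row-major vec, (K1 ⊗ K2) vec(A) = vec(K1 A K2^T),
    so A = K1^{-1} F K2^{-T}.
    F holds the right-hand side on the grid, shape (len(K1), len(K2)).
    """
    right = np.linalg.solve(K2, F.T).T   # F K2^{-T}
    return np.linalg.solve(K1, right)    # K1^{-1} F K2^{-T}
```

For an n1 × n2 grid, the dense system has (n1·n2)² Gram entries, while this route only touches the n1 × n1 and n2 × n2 factors.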
Article 1795
Title@2025-06-05 (4): Population-Proportional Preference Learning from Human Feedback: An Axiomatic Approach
Title: Population-Proportional Preference Learning from Human Feedback: An Axiomatic Approach | Bevölkerung-proportionale Präferenz Lernen aus menschlichem Feedback: Ein axiomatischer Ansatz | 从人类反馈中学习:一种不言而喻的方法 2506.05619v1 |
Authors (4): Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, Pablo A. Parrilo
Conventional preference learning methods often prioritize opinions held more widely when aggregating preferences from multiple evaluators. This may result in policies that are biased in favor of some types of opinions or groups. The objective of this paper is to develop a novel preference learning framework capable of aligning aggregate opinions and policies proportionally with the true population distribution of evaluator preferences. Our approach infers the feasible set of evaluator population distributions directly from pairwise comparison data. Using these estimates, the algorithm constructs a policy that satisfies foundational axioms from social choice theory, namely monotonicity and Pareto efficiency, as well as our newly introduced axioms of population-proportional representation and population-bounded robustness. We propose a soft-max relaxation method that smoothly trades off population-proportional representation against the selection of the Condorcet winner (which beats all other options in pairwise comparisons). Finally, we validate the effectiveness and scalability of our approach through experiments on both tabular recommendation tasks and large-scale language model alignment.
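The Condorcet winner referred to above has a simple operational definition, sketched below for a matrix of pairwise preference shares. The matrix layout and the function name are assumptions for illustration; the paper's soft-max relaxation is not reproduced here.

```python
def condorcet_winner(pref):
    """Return the index of the Condorcet winner, or None if there is none.

    pref[i][j] is the share of evaluators who prefer option i to option j.
    A Condorcet winner beats every other option in a pairwise majority vote.
    """
    n = len(pref)
    for i in range(n):
        if all(pref[i][j] > 0.5 for j in range(n) if j != i):
            return i
    return None  # e.g. when majority preferences form a cycle
```

A winner need not exist: with cyclic majorities (A beats B, B beats C, C beats A) the function returns `None`.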
Article 1796
Title@2025-06-05 (4): Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
Title: Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | Rollout Roulette: Ein probabilistischer Schlussfolgerungsansatz zur Inferenz-Zeit-Skalierung von LLMs mit partikelbasierten Monte-Carlo-Methoden | 滚轮推出:使用以粒子为基础的蒙特卡洛方法对LLMs的推论-时间尺度的概率推推方法 2502.01618v4 |
Authors (5): Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, Akash Srivastava
Large language models (LLMs) have achieved significant performance gains via scaling up model sizes and/or data. However, recent evidence suggests diminishing returns from such approaches, motivating scaling the computation spent at inference time. Existing inference-time scaling methods, usually with reward models, cast the task as a search problem, which tends to be vulnerable to reward hacking as a consequence of approximation errors in reward models. In this paper, we instead cast inference-time scaling as a probabilistic inference task and leverage sampling-based techniques to explore the typical set of the state distribution of a state-space model with an approximate likelihood, rather than optimize for its mode directly. We propose a novel inference-time scaling approach by adapting particle-based Monte Carlo methods to this task. Our empirical evaluation demonstrates that our methods have a 4-16x better scaling rate over our deterministic search counterparts on various challenging mathematical reasoning tasks. Using our approach, we show that Qwen2.5-Math-1.5B-Instruct can surpass GPT-4o accuracy in only 4 rollouts, while Qwen2.5-Math-7B-Instruct scales to o1 level accuracy in only 32 rollouts. Our work not only presents an effective method for inference-time scaling, but also connects the rich literature in probabilistic inference with inference-time scaling of LLMs to develop more robust algorithms in future work. Code, videos, and further information are available at https://probabilistic-inference-scaling.github.io.
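A generic particle-based sampler of the kind this abstract adapts can be sketched as below. Here `extend` stands in for generating the next step of a partial solution and `score` stands in for a reward model; both are hypothetical callables supplied by the caller. Multinomial resampling with softmax weights is one common choice and is an assumption, not necessarily the paper's exact weighting.

```python
import numpy as np

def resample(particles, scores, rng):
    """Multinomial resampling with probabilities softmax(scores)."""
    w = np.exp(scores - np.max(scores))   # subtract max for stability
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return [particles[i] for i in idx]

def particle_search(extend, score, n_particles, n_steps, seed=0):
    """Propagate partial solutions, weight them, and resample.

    extend(p, rng) -> new partial solution (must not mutate p)
    score(p)       -> float, higher is better
    """
    rng = np.random.default_rng(seed)
    particles = [[] for _ in range(n_particles)]
    for _ in range(n_steps):
        particles = [extend(p, rng) for p in particles]
        scores = np.array([score(p) for p in particles], dtype=float)
        particles = resample(particles, scores, rng)
    return particles
```

Unlike greedy or beam search, low-scoring particles survive with nonzero probability, which is what makes the sampler less sensitive to errors in the scoring function.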
Article 1797
Title@2025-06-05 (4): LFA applied to CNNs: Efficient Singular Value Decomposition of Convolutional Mappings by Local Fourier Analysis
Title: LFA applied to CNNs: Efficient Singular Value Decomposition of Convolutional Mappings by Local Fourier Analysis | LFA für CNNs: Effiziente Singularwertzerlegung von konvolutionären Mappings durch lokale Fourier-Analysen | LFA适用于有线电视新闻网:通过局部Fourier分析对革命绘图法的高效单项价值分解 2506.05617v1 |
Authors (3): Antonia van Betteray, Matthias Rottmann, Karsten Kahl
The singular values of convolutional mappings encode interesting spectral properties, which can be used, e.g., to improve generalization and robustness of convolutional neural networks as well as to facilitate model compression. However, the computation of singular values is typically very resource-intensive. The naive approach involves unrolling the convolutional mapping along the input and channel dimensions into a large and sparse two-dimensional matrix, making the exact calculation of all singular values infeasible due to hardware limitations. In particular, this is true for matrices that represent convolutional mappings with large inputs and a high number of channels. Existing efficient methods leverage the fast Fourier transform (FFT) to transform convolutional mappings into the frequency domain, enabling the computation of singular values for matrices representing convolutions with larger input and channel dimensions. For a constant number of channels in a given convolution, an FFT can compute N singular values with O(N log N) complexity. In this work, we propose an approach of complexity O(N) based on local Fourier analysis, which additionally exploits the shift invariance of convolutional operators. We provide a theoretical analysis of our algorithm’s runtime and validate its efficiency through numerical experiments. Our results demonstrate that our proposed method is scalable and offers a practical solution to calculate the entire set of singular values - along with the corresponding singular vectors if needed - for high-dimensional convolutional mappings.
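The FFT-based baseline described in this abstract can be sketched in a few lines, assuming a circular (periodic) convolution on an n × n input and a kernel stored with shape (k, k, c_in, c_out). This illustrates the existing FFT approach, not the proposed O(N) local Fourier analysis method; the function name and kernel layout are assumptions.

```python
import numpy as np

def conv_singular_values(kernel, n):
    """Singular values of a circular 2-D convolution on n x n inputs.

    kernel: array of shape (k, k, c_in, c_out)
    Returns an array of shape (n, n, min(c_in, c_out)): for every spatial
    frequency, the singular values of the c_in x c_out matrix obtained
    from the 2-D FFT of the zero-padded kernel.
    """
    # Zero-pad the kernel to n x n and transform the two spatial axes.
    transforms = np.fft.fft2(kernel, s=(n, n), axes=(0, 1))
    # Batched SVD over the trailing (c_in, c_out) matrices.
    return np.linalg.svd(transforms, compute_uv=False)
```

The Fourier transform block-diagonalizes the periodic convolution, so the full spectrum is the union of the per-frequency spectra and no large unrolled matrix is needed.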
Article 1798
Title@2025-06-05 (4): When Maximum Entropy Misleads Policy Optimization
Title: When Maximum Entropy Misleads Policy Optimization | Wenn maximale Entropie die Politikoptimierung verfälscht | 当最大 Entroppy 错误引导政策优化 2506.05615v1 |
Authors (3): Ruipeng Zhang, Ya-Chien Chang, Sicun Gao
The Maximum Entropy Reinforcement Learning (MaxEnt RL) framework is a leading approach for achieving efficient learning and robust performance across many RL tasks. However, MaxEnt methods have also been shown to struggle with performance-critical control problems in practice, where non-MaxEnt algorithms can successfully learn. In this work, we analyze how the trade-off between robustness and optimality affects the performance of MaxEnt algorithms in complex control tasks: while entropy maximization enhances exploration and robustness, it can also mislead policy optimization, leading to failure in tasks that require precise, low-entropy policies. Through experiments on a variety of control problems, we concretely demonstrate this misleading effect. Our analysis leads to a better understanding of how to balance reward design and entropy maximization in challenging control problems.
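The tension described above can be seen in a one-step toy problem. For a single decision with action values Q, the policy that maximizes expected value plus α times entropy is the softmax of Q/α. The sketch below uses that standard result; the temperature name `alpha` and the toy values are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def maxent_policy(q_values, alpha):
    """Entropy-regularized optimal policy for a one-step problem:
    pi(a) proportional to exp(Q(a) / alpha)."""
    z = np.asarray(q_values, dtype=float) / alpha
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

def expected_reward(q_values, alpha):
    """Expected value of acting with the MaxEnt policy."""
    q = np.asarray(q_values, dtype=float)
    return float(maxent_policy(q, alpha) @ q)

# Toy task needing a precise action: one action succeeds, nine fail.
q_precise = [1.0] + [-1.0] * 9
```

With a large temperature the policy spreads probability over the nine failing actions, so its expected reward on `q_precise` falls well below the best achievable value; lowering `alpha` concentrates it on the single good action.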
Article 1799
Title@2025-06-05 (4): Extracting Interpretable Logic Rules from Graph Neural Networks
Title: Extracting Interpretable Logic Rules from Graph Neural Networks | Extrahieren von interpretierbaren Logic Rules aus Graph Neural Networks | 从图形神经网络中提取可解释逻辑规则 2503.19476v2 |
Authors (5): Chuqin Geng, Ziyu Zhao, Zhaoyue Wang, Haolin Ye, Xujie Si
Graph neural networks (GNNs) operate over both input feature spaces and combinatorial graph structures, making it challenging to understand the rationale behind their predictions. As GNNs gain widespread popularity and demonstrate success across various domains, such as drug discovery, studying their interpretability has become a critical task. To address this, many explainability methods have been proposed, with recent efforts shifting from instance-specific explanations to global concept-based explainability. However, these approaches face several limitations, such as relying on predefined concepts and explaining only a limited set of patterns. To address this, we propose a novel framework, LOGICXGNN, for extracting interpretable logic rules from GNNs. LOGICXGNN is model-agnostic, efficient, and data-driven, eliminating the need for predefined concepts. More importantly, it can serve as a rule-based classifier and even outperform the original neural models. Its interpretability facilitates knowledge discovery, as demonstrated by its ability to extract detailed and accurate chemistry knowledge that is often overlooked by existing methods. Another key advantage of LOGICXGNN is its ability to generate new graph instances in a controlled and transparent manner, offering significant potential for applications such as drug design. We empirically demonstrate these merits through experiments on real-world datasets such as MUTAG and BBBP.
Article 1800
Title@2025-06-05 (4): Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning
Title: Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning | Reasoning Towards Fairness: Bias in Sprachmodellen durch reasoning-guided Fine-Tuning abmildern | 追求公平的理由:通过有理、有理、有指导的微调减少语言模式中的偏见 2504.05632v3 |
Authors (3): Sanchit Kabra, Akshita Jha, Chandan K. Reddy
Recent advances in large-scale generative language models have shown that reasoning capabilities can significantly improve model performance across a variety of tasks. However, the impact of reasoning on a model’s ability to mitigate stereotypical responses remains largely underexplored. In this work, we investigate the crucial relationship between a model’s reasoning ability and fairness, and ask whether improved reasoning capabilities can mitigate harmful stereotypical responses, especially those arising due to shallow or flawed reasoning. We conduct a comprehensive evaluation of multiple open-source LLMs, and find that larger models with stronger reasoning abilities exhibit substantially lower stereotypical bias on existing fairness benchmarks. Building on this insight, we introduce ReGiFT – Reasoning Guided Fine-Tuning, a novel approach that extracts structured reasoning traces from advanced reasoning models and infuses them into models that lack such capabilities. We use only general-purpose reasoning and do not require any fairness-specific supervision for bias mitigation. Notably, we see that models fine-tuned using ReGiFT not only improve fairness relative to their non-reasoning counterparts but also outperform advanced reasoning models on fairness benchmarks. We also analyze how variations in the correctness of the reasoning traces and their length influence model fairness and their overall performance. Our findings highlight that enhancing reasoning capabilities is an effective, fairness-agnostic strategy for mitigating stereotypical bias caused by reasoning flaws.
Article 1801
Title@2025-06-05 (4): Robust and Efficient Transfer Learning via Supernet Transfer in Warm-started Neural Architecture Search
Title: Robust and Efficient Transfer Learning via Supernet Transfer in Warm-started Neural Architecture Search | Robustes und effizientes Transfer-Lernen über Supernet-Transfer in der neuralen Architektur | 通过热启动的神经结构搜索中的超级网传输进行强力和高效传输学习 2407.20279v2 |
Authors (2): Prabhant Singh, Joaquin Vanschoren
Hand-designing Neural Networks is a tedious process that requires significant expertise. Neural Architecture Search (NAS) frameworks offer a very useful and popular solution that helps to democratize AI. However, these NAS frameworks are often computationally expensive to run, which limits their applicability and accessibility. In this paper, we propose a novel transfer learning approach, capable of effectively transferring pretrained supernets based on Optimal Transport or multi-dataset pretraining. This method can be generally applied to NAS methods based on Differentiable Architecture Search (DARTS). Through extensive experiments across dozens of image classification tasks, we demonstrate that transferring pretrained supernets in this way can not only drastically speed up the supernet training which then finds optimal models (3 to 5 times faster on average), but even yield models that outperform those found when running DARTS methods from scratch. We also observe positive transfer to almost all target datasets, making it very robust. Besides drastically improving the applicability of NAS methods, this also opens up new applications for continual learning and related fields.
Article 1802
Title@2025-06-05 (4): Adversarial Tokenization
Title: Adversarial Tokenization | Adversariale Tokenisierung | 对抗感应器化 2503.02174v2 |
Authors (3): Renato Lui Geh, Zilei Shao, Guy Van den Broeck
Current LLM pipelines account for only one possible tokenization for a given string, ignoring exponentially many alternative tokenizations during training and inference. For example, the standard Llama3 tokenization of penguin is [p,enguin], yet [peng,uin] is another perfectly valid alternative. In this paper, we show that despite LLMs being trained solely on one tokenization, they still retain semantic understanding of other tokenizations, raising questions about their implications in LLM safety. Put succinctly, we answer the following question: can we adversarially tokenize an obviously malicious string to evade safety and alignment restrictions? We show that not only is adversarial tokenization an effective yet previously neglected axis of attack, but it is also competitive against existing state-of-the-art adversarial approaches without changing the text of the harmful request. We empirically validate this exploit across three state-of-the-art LLMs and adversarial datasets, revealing a previously unknown vulnerability in subword models.
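The claim that a string admits many tokenizations can be made concrete with a small enumeration. The vocabulary below is a toy, hypothetical one chosen to reproduce the penguin example; it is not the Llama3 vocabulary, and the function is not the authors' attack.

```python
from functools import lru_cache

def tokenizations(text, vocab):
    """Enumerate every way to split `text` into tokens from `vocab`."""
    vocab = frozenset(vocab)

    @lru_cache(maxsize=None)
    def split(start):
        # All tokenizations of text[start:].
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in vocab:
                results.extend([piece] + rest for rest in split(end))
        return results

    return split(0)

# Toy vocabulary (hypothetical), chosen to match the example in the text.
toy_vocab = {"p", "enguin", "peng", "uin", "pen", "guin"}
```

With `toy_vocab`, `tokenizations("penguin", toy_vocab)` yields three segmentations, including `["p", "enguin"]` and `["peng", "uin"]`. With a realistic subword vocabulary the count grows exponentially with string length, so a full enumeration like this is only practical for short strings.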
Article 1803
Title@2025-06-05 (4): Model-Based Exploration in Monitored Markov Decision Processes
Title: Model-Based Exploration in Monitored Markov Decision Processes | Modellbasierte Exploration in überwachten Markov-Entscheidungsprozessen | 在监测的Markov决策过程中进行基于模型的探索 2502.16772v4 |
Authors (4): Alireza Kazemipour, Simone Parisi, Matthew E. Taylor, Michael Bowling
A tenet of reinforcement learning is that the agent always observes rewards. However, this is not true in many realistic settings, e.g., a human observer may not always be available to provide rewards, sensors may be limited or malfunctioning, or rewards may be inaccessible during deployment. Monitored Markov decision processes (Mon-MDPs) have recently been proposed to model such settings. However, existing Mon-MDP algorithms have several limitations: they do not fully exploit the problem structure, cannot leverage a known monitor, lack worst-case guarantees for ‘unsolvable’ Mon-MDPs without specific initialization, and offer only asymptotic convergence proofs. This paper makes three contributions. First, we introduce a model-based algorithm for Mon-MDPs that addresses these shortcomings. The algorithm employs two instances of model-based interval estimation: one to ensure that observable rewards are reliably captured, and another to learn the minimax-optimal policy. Second, we empirically demonstrate the advantages. We show faster convergence than prior algorithms in over four dozen benchmarks, and even more dramatic improvement when the monitoring process is known. Third, we present the first finite-sample bound on performance. We show convergence to a minimax-optimal policy even when some rewards are never observable.
Article 1804
Title@2025-06-05 (4): An Optimal Cascade Feature-Level Spatiotemporal Fusion Strategy for Anomaly Detection in CAN Bus
Title: An Optimal Cascade Feature-Level Spatiotemporal Fusion Strategy for Anomaly Detection in CAN Bus | Eine optimale Kaskade-Feature-Level-Spatiotemporale Fusionsstrategie für Anomalieerkennung im CAN-Bus | CAN公共汽车异常探测的最佳最佳岩岩岩层地貌级超光时聚变战略 2501.18821v3 |
Authors (5): Mohammad Fatahi, Danial Sadrian Zadeh, Benyamin Ghojogh, Behzad Moshiri, Otman Basir
Intelligent transportation systems (ITS) play a pivotal role in modern infrastructure but face security risks due to the broadcast-based nature of the in-vehicle Controller Area Network (CAN) buses. While numerous machine learning models and strategies have been proposed to detect CAN anomalies, existing approaches lack robustness evaluations and fail to comprehensively detect attacks because they focus on only a subset of the dominant structures of anomalies. To overcome these limitations, the current study proposes a cascade feature-level spatiotemporal fusion framework that integrates the spatial features and temporal features through a two-parameter genetic algorithm (2P-GA)-optimized cascade architecture to cover all dominant structures of anomalies. Extensive paired t-test analysis confirms that the model achieves an AUC-ROC of 0.9987, demonstrating robust anomaly detection capabilities. The Spatial Module improves the precision by approximately 4%, while the Temporal Module compensates for recall losses, ensuring high true positive rates. The proposed framework detects all attack types with 100% accuracy on the CAR-HACKING dataset, outperforming state-of-the-art methods. This study provides a validated, robust solution for real-world CAN security challenges.
Article 1805
Title@2025-06-05 (4): FaCTR: Factorized Channel-Temporal Representation Transformers for Efficient Time Series Forecasting
Title: FaCTR: Factorized Channel-Temporal Representation Transformers for Efficient Time Series Forecasting | FaCTR: Factorized Channel-Temporal Representation Transformers für effiziente Zeitreihenprognosen | FACTR: 高效时间序列预报的有条频道-临时代表制变革器 2506.05597v1 |
Authors (2): Yash Vijay, Harini Subramanyan
While Transformers excel in language and vision, where inputs are semantically rich and exhibit univariate dependency structures, their architectural complexity leads to diminishing returns in time series forecasting. Time series data is characterized by low per-timestep information density and complex dependencies across channels and covariates, requiring conditioning on structured variable interactions. To address this mismatch and overparameterization, we propose FaCTR, a lightweight spatiotemporal Transformer with an explicitly structural design. FaCTR injects dynamic, symmetric cross-channel interactions, modeled via a low-rank Factorization Machine, into temporally contextualized patch embeddings through a learnable gating mechanism. It further encodes static and dynamic covariates for multivariate conditioning. Despite its compact design, FaCTR achieves state-of-the-art performance on eleven public forecasting benchmarks spanning both short-term and long-term horizons, with its largest variant using only about 400K parameters, on average 50x smaller than competitive spatiotemporal transformer baselines. In addition, its structured design enables interpretability through cross-channel influence scores, an essential requirement for real-world decision-making. Finally, FaCTR supports self-supervised pretraining, positioning it as a compact yet versatile foundation for downstream time series tasks.
Article 1806
Title@2025-06-05 (4): Zero-shot protein stability prediction by inverse folding models: a free energy interpretation
Title: Zero-shot protein stability prediction by inverse folding models: a free energy interpretation | Null-Shot-Proteinstabilitätsprognose durch inverse Faltmodelle: eine freie Energieinterpretation | 通过反折叠模型进行零射蛋白稳定性预测:免费能源解释 2506.05596v1 |
Authors (7): Jes Frellsen, Maher M. Kassem, Tone Bengtsen, Lars Olsen, Kresten Lindorff-Larsen, Jesper Ferkinghoff-Borg, Wouter Boomsma
Inverse folding models have proven to be highly effective zero-shot predictors of protein stability. Despite this success, the link between the amino acid preferences of an inverse folding model and the free-energy considerations underlying thermodynamic stability remains incompletely understood. A better understanding would not only be of theoretical interest, but could also provide the basis for stronger zero-shot stability prediction. In this paper, we take steps to clarify the free-energy foundations of inverse folding models. Our derivation reveals the standard practice of likelihood ratios as a simplistic approximation and suggests several paths towards better estimates of the relative stability. We empirically assess these approaches and demonstrate that considerable gains in zero-shot performance can be achieved with fairly simple means.
Article 1807
Title@2025-06-05 (4): Unsupervised Latent Pattern Analysis for Estimating Type 2 Diabetes Risk in Undiagnosed Populations
Title: Unsupervised Latent Pattern Analysis for Estimating Type 2 Diabetes Risk in Undiagnosed Populations | Unüberwachte Latent Pattern Analyse zur Schätzung des Typ-2-Diabetes-Risikos in nicht diagnostizierten Populationen | 未经监督的对未诊断的人群2型糖尿病风险估算的 2505.21824v2 |
Authors (3): Praveen Kumar, Vincent T. Metzger, Scott A. Malec
The global prevalence of diabetes, particularly type 2 diabetes mellitus (T2DM), is rapidly increasing, posing significant health and economic challenges. T2DM not only disrupts blood glucose regulation but also damages vital organs such as the heart, kidneys, eyes, nerves, and blood vessels, leading to substantial morbidity and mortality. In the US alone, the economic burden of diagnosed diabetes exceeded $400 billion in 2022. Early detection of individuals at risk is critical to mitigating these impacts. While machine learning approaches for T2DM prediction are increasingly adopted, many rely on supervised learning, which is often limited by the lack of confirmed negative cases. To address this limitation, we propose a novel unsupervised framework that integrates Non-negative Matrix Factorization (NMF) with statistical techniques to identify individuals at risk of developing T2DM. Our method identifies latent patterns of multimorbidity and polypharmacy among diagnosed T2DM patients and applies these patterns to estimate the T2DM risk in undiagnosed individuals. By leveraging data-driven insights from comorbidity and medication usage, our approach provides an interpretable and scalable solution that can assist healthcare providers in implementing timely interventions, ultimately improving patient outcomes and potentially reducing the future health and economic burden of T2DM.
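Non-negative Matrix Factorization, the core tool named in this abstract, can be sketched with the classic multiplicative update rules for the Frobenius objective. The patient-by-condition matrix layout, the rank, and the function name are illustrative assumptions; the statistical risk-scoring step the authors build on top is not reproduced.

```python
import numpy as np

def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix X (patients x features) as W @ H.

    Uses multiplicative updates, which keep W and H non-negative.
    Rows of H can be read as latent co-occurrence patterns; rows of W
    give each patient's loading on those patterns.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because every factor is non-negative, each pattern is an additive combination of conditions or medications, which is what makes the latent patterns comparatively easy to interpret.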
Article 1808
Title@2025-06-05 (4): Nonlinear Causal Discovery through a Sequential Edge Orientation Approach
Title: Nonlinear Causal Discovery through a Sequential Edge Orientation Approach | Nichtlineare Causal Discovery durch einen sequentiellen Edge-Orientierungsansatz | 通过序列边缘方向方法的非线性因果发现 2506.05590v1 |
Authors (2): Stella Huang, Qing Zhou
Recent advances have established the identifiability of a directed acyclic graph (DAG) under additive noise models (ANMs), spurring the development of various causal discovery methods. However, most existing methods make restrictive model assumptions, rely heavily on general independence tests, or require substantial computational time. To address these limitations, we propose a sequential procedure to orient undirected edges in a completed partial DAG (CPDAG), representing an equivalence class of DAGs, by leveraging the pairwise additive noise model (PANM) to identify their causal directions. We prove that this procedure can recover the true causal DAG assuming a restricted ANM. Building on this result, we develop a novel constraint-based algorithm for learning causal DAGs under nonlinear ANMs. Given an estimated CPDAG, we develop a ranking procedure that sorts undirected edges by their adherence to the PANM, which defines an evaluation order of the edges. To determine the edge direction, we devise a statistical test that compares the log-likelihood values, evaluated with respect to the competing directions, of a sub-graph comprising just the candidate nodes and their identified parents in the partial DAG. We further establish the structural learning consistency of our algorithm in the large-sample limit. Extensive experiments on synthetic and real-world datasets demonstrate that our method is computationally efficient, robust to model misspecification, and consistently outperforms many existing nonlinear DAG learning methods.
Article 1809
Title@2025-06-05 (4): MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark
Title: MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark | MMTU: Ein massiver Multi-Task-Tabellen-Verstands- und Orientierungs-Benchmark | MMTU: 大规模多任务表格理解和说明理由基准 2506.05587v1 |
Authors (9): Junjie Xing, Yeye He, Mengyu Zhou, Haoyu Dong, Shi Han, Lingjiao Chen, Dongmei Zhang, Surajit Chaudhuri, H. V. Jagadish
Tables and table-based use cases play a crucial role in many important real-world applications, such as spreadsheets, databases, and computational notebooks, which traditionally require expert-level users like data engineers, data analysts, and database administrators to operate. Although LLMs have shown remarkable progress in working with tables (e.g., in spreadsheet and database copilot scenarios), comprehensive benchmarking of such capabilities remains limited. In contrast to an extensive and growing list of NLP benchmarks, evaluations of table-related tasks are scarce, and narrowly focus on tasks like NL-to-SQL and Table-QA, overlooking the broader spectrum of real-world tasks that professional users face. This gap limits our understanding and model progress in this important area. In this work, we introduce MMTU, a large-scale benchmark with over 30K questions across 25 real-world table tasks, designed to comprehensively evaluate models' ability to understand, reason about, and manipulate real tables at the expert level. These tasks are drawn from decades’ worth of computer science research on tabular data, with a focus on complex table tasks faced by professional users. We show that MMTU requires a combination of skills – including table understanding, reasoning, and coding – that remain challenging for today’s frontier models, where even frontier reasoning models like OpenAI o4-mini and DeepSeek R1 score only around 60%, suggesting significant room for improvement. We highlight key findings in our evaluation using MMTU and hope that this benchmark drives further advances in understanding and developing foundation models for structured data processing and analysis. Our code and data are available at https://github.com/MMTU-Benchmark/MMTU and https://huggingface.co/datasets/MMTU-benchmark/MMTU.
Article 1810
Title@2025-06-05 (4): CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions
Title: CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions | CoFrNets: Interpretierte Neuralarchitektur Inspiriert von fortgesetzten Brüchen | CoFrNets:受连续分数启发的可解释神经结构 2506.05586v1 |
Authors (6): Isha Puri, Amit Dhurandhar, Tejaswini Pedapati, Kartikeyan Shanmugam, Dennis Wei, Kush R. Varshney
In recent years there has been a considerable amount of research on local post hoc explanations for neural networks. However, work on building interpretable neural architectures has been relatively sparse. In this paper, we present a novel neural architecture, CoFrNet, inspired by the form of continued fractions which are known to have many attractive properties in number theory, such as fast convergence of approximations to real numbers. We show that CoFrNets can be efficiently trained as well as interpreted leveraging their particular functional form. Moreover, we prove that such architectures are universal approximators based on a proof strategy that is different than the typical strategy used to prove universal approximation results for neural networks based on infinite width (or depth), which is likely to be of independent interest. We experiment on nonlinear synthetic functions and are able to accurately model as well as estimate feature attributions and even higher order terms in some cases, which is a testament to the representational power as well as interpretability of such architectures. To further showcase the power of CoFrNets, we experiment on seven real datasets spanning tabular, text and image modalities, and show that they are either comparable or significantly better than other interpretable models and multilayer perceptrons, sometimes approaching the accuracies of state-of-the-art models.
Article 1811
Title@2025-06-05 (4): TabFlex: Scaling Tabular Learning to Millions with Linear Attention
Title: TabFlex: Scaling Tabular Learning to Millions with Linear Attention | TabFlex: Tabulares Lernen zu Millionen mit linearer Aufmerksamkeit skalieren | TabFlex: 以线性关注方式向百万上百万上百万人推广表格学习 2506.05584v1 |
Authors (4): Yuchen Zeng, Tuan Dinh, Wonjun Kang, Andreas C Mueller
Leveraging the in-context learning (ICL) capability of Large Language Models (LLMs) for tabular classification has gained significant attention for its training-free adaptability across diverse datasets. Recent advancements, like TabPFN, excel in small-scale tabular datasets but struggle to scale for large and complex datasets. Our work enhances the efficiency and scalability of TabPFN for larger datasets by incorporating linear attention mechanisms as a scalable alternative to quadratic-complexity self-attention. Our model, TabFlex, efficiently handles tabular datasets with thousands of features and hundreds of classes, scaling seamlessly to millions of samples. For instance, TabFlex processes the poker-hand dataset with over a million samples in just 5 seconds. Our extensive evaluations demonstrate that TabFlex can achieve over a 2x speedup compared to TabPFN and a 1.5x speedup over XGBoost, outperforming 25 tested baselines in terms of efficiency across a diverse range of datasets. Furthermore, TabFlex remains highly effective on large-scale datasets, delivering strong performance with significantly reduced computational costs, especially when combined with data-efficient techniques such as dimensionality reduction and data sampling.
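As a rough illustration of why linear attention avoids the quadratic cost of softmax self-attention, the sketch below computes non-causal kernelized attention in pure Python. The feature map `phi` (elu(x) + 1) and the function names are assumptions made for illustration; the abstract does not say which linear attention variant TabFlex uses.

```python
import math


def phi(x):
    # Positive feature map elu(x) + 1 (an assumed, commonly used choice).
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(queries, keys, values):
    """Non-causal linear attention over lists of vectors.

    Keys and values are summarized once, then every query reads that
    summary, so the cost is O(n * d_k * d_v) and no n x n matrix is built.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k_j) v_j^T
    k_sum = [0.0] * d_k                     # running sum of phi(k_j)
    for k, v in zip(keys, values):
        fk = [phi(x) for x in k]
        for a in range(d_k):
            k_sum[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]
    outputs = []
    for q in queries:
        fq = [phi(x) for x in q]
        norm = sum(fq[a] * k_sum[a] for a in range(d_k))  # always positive
        outputs.append(
            [sum(fq[a] * kv[a][b] for a in range(d_k)) / norm for b in range(d_v)]
        )
    return outputs
```

Because the attention weights are normalized, attending over a single key returns that key's value unchanged, which is a quick sanity check for the sketch.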
Article 1812
Title@2025-06-05 (4): Conformal Prediction Adaptive to Unknown Subpopulation Shifts
Title: Conformal Prediction Adaptive to Unknown Subpopulation Shifts | Konforme Vorhersage Anpassung an unbekannte Subpopulationsverschiebungen | 适应未知亚人口变化 2506.05583v1 |
Authors (4): Nien-Shao Wang, Duygu Nur Yaldiz, Yavuz Faruk Bakman, Sai Praneeth Karimireddy
Conformal prediction is widely used to equip black-box machine learning models with uncertainty quantification enjoying formal coverage guarantees. However, these guarantees typically break down in the presence of distribution shifts, where the data distribution at test time differs from the training (or calibration-time) distribution. In this work, we address subpopulation shifts, where the test environment exhibits an unknown and differing mixture of subpopulations compared to the calibration data. We propose new methods that provably adapt conformal prediction to such shifts, ensuring valid coverage without requiring explicit knowledge of subpopulation structure. Our algorithms scale to high-dimensional settings and perform effectively in realistic machine learning tasks. Extensive experiments on vision (with vision transformers) and language (with large language models) benchmarks demonstrate that our methods reliably maintain coverage and control risk in scenarios where standard conformal prediction fails.
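For context, the standard split conformal baseline that the abstract says fails under shift can be sketched as follows. This is the textbook procedure, not the authors' adaptive method; the function names and the use of absolute-residual scores are assumptions for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def prediction_interval(point_prediction, q_hat):
    # Symmetric interval for absolute-residual scores |y - f(x)|.
    return (point_prediction - q_hat, point_prediction + q_hat)
```

The coverage guarantee of this baseline relies on calibration and test points being exchangeable, which is exactly what a change in the subpopulation mixture breaks.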
Article 1813
Title@2025-06-05 (4): On the Importance of Gaussianizing Representations
Title: On the Importance of Gaussianizing Representations | Über die Bedeutung gaussianisierender Repräsentationen | 关于 “ 高机构化 “ 代表的重要性 2505.00685v2 |
Authors (2): Daniel Eftekhari, Vardan Papyan
The normal distribution plays a central role in information theory - it is at the same time the best-case signal and worst-case noise distribution, has the greatest representational capacity of any distribution, and offers an equivalence between uncorrelatedness and independence for joint distributions. Accounting for the mean and variance of activations throughout the layers of deep neural networks has had a significant effect on facilitating their effective training, but seldom has a prescription for precisely what distribution these activations should take, and how this might be achieved, been offered. Motivated by the information-theoretic properties of the normal distribution, we address this question and concurrently present normality normalization: a novel normalization layer which encourages normality in the feature representations of neural networks using the power transform and employs additive Gaussian noise during training. Our experiments comprehensively demonstrate the effectiveness of normality normalization, with regard to its generalization performance on an array of widely used model and dataset combinations, its strong performance across various common factors of variation such as model width, depth, and training minibatch size, its suitability for usage wherever existing normalization layers are conventionally used, and as a means of improving model robustness to random perturbations.
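To make the two ingredients named above concrete, here is a minimal sketch of a power transform followed by additive Gaussian noise. It assumes the Yeo-Johnson form of the power transform and a fixed `lam`; the abstract does not state which transform is used or how its parameter is chosen, and the names `yeo_johnson` and `gaussianize` are hypothetical.

```python
import math
import random


def yeo_johnson(y, lam):
    """Yeo-Johnson power transform of a scalar (assumed variant)."""
    if y >= 0:
        if lam == 0:
            return math.log(y + 1.0)
        return ((y + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log(1.0 - y)
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def gaussianize(activations, lam, noise_std=0.0, rng=random):
    # Transform each activation, then add zero-mean Gaussian noise.
    # noise_std=0.0 corresponds to no noise (e.g. at evaluation time).
    return [yeo_johnson(a, lam) + rng.gauss(0.0, noise_std) for a in activations]
```

With `lam = 1` the transform is the identity, so any departure of `lam` from 1 measures how strongly the activations are being reshaped.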
Article 1814
Title@2025-06-05 (4): The Impact of Inference Acceleration on Bias of LLMs
Title: The Impact of Inference Acceleration on Bias of LLMs | Die Auswirkungen von Inferenzbeschleunigung auf LLM-Bias | 推推加速对LLMM的偏见的影响 2410.22118v3 |
Authors (4): Elisabeth Kirsten, Ivan Habernal, Vedant Nanda, Muhammad Bilal Zafar
The last few years have seen unprecedented advances in the capabilities of Large Language Models (LLMs). These advancements promise to benefit a vast array of application domains. However, due to their immense size, performing inference with LLMs is both costly and slow. Consequently, a plethora of recent work has proposed strategies to enhance inference efficiency, e.g., quantization, pruning, and caching. These acceleration strategies reduce the inference cost and latency, often by several factors, while maintaining much of the predictive performance measured via common benchmarks. In this work, we explore another critical aspect of LLM performance: demographic bias in model generations due to inference acceleration optimizations. Using a wide range of metrics, we probe bias in model outputs from a number of angles. Analysis of outputs before and after inference acceleration shows significant change in bias. Worryingly, these bias effects are complex and unpredictable. A combination of an acceleration strategy and bias type may show little bias change in one model but may lead to a large effect in another. Our results highlight a need for in-depth and case-by-case evaluation of model bias after a model has been modified to accelerate inference.
Article 1815
Title@2025-06-05 (4): Collaborative Learning in Agentic Systems: A Collective AI is Greater Than the Sum of Its Parts
Title: Collaborative Learning in Agentic Systems: A Collective AI is Greater Than the Sum of Its Parts | Gemeinsames Lernen in Agentensystemen: Eine kollektive KI ist größer als die Summe ihrer Teile | 危险系统合作学习:集体AI大于其各部分的总和 2506.05577v1 |
Authors (10): Saptarshi Nath, Christos Peridis, Eseoghene Benjamin, Xinran Liu, Soheil Kolouri, Peter Kinnell, Zexin Li, Cong Liu, Shirin Dora, Andrea Soltoggio
Agentic AI has gained significant interest as a research paradigm focused on autonomy, self-directed learning, and long-term reliability of decision making. Real-world agentic systems operate in decentralized settings on a large set of tasks or data distributions with constraints such as limited bandwidth, asynchronous execution, and the absence of a centralized model or even common objectives. We posit that exploiting previously learned skills, task similarities, and communication capabilities in a collective of agentic AI are challenging but essential elements to enabling scalability, open-endedness, and beneficial collaborative learning dynamics. In this paper, we introduce Modular Sharing and Composition in Collective Learning (MOSAIC), an agentic algorithm that allows multiple agents to independently solve different tasks while also identifying, sharing, and reusing useful machine-learned knowledge, without coordination, synchronization, or centralized control. MOSAIC combines three mechanisms: (1) modular policy composition via neural network masks, (2) cosine similarity estimation using Wasserstein embeddings for knowledge selection, and (3) asynchronous communication and policy integration. Results on a set of RL benchmarks show that MOSAIC has a greater sample efficiency than isolated learners, i.e., it learns significantly faster, and in some cases, finds solutions to tasks that cannot be solved by isolated learners. The collaborative learning and sharing dynamics are also observed to result in the emergence of ideal curricula of tasks, from easy to hard. These findings support the case for collaborative learning in agentic systems to achieve better and continuously evolving performance both at the individual and collective levels.
Article 1816
Title@2025-06-05 (4): When can in-context learning generalize out of task distribution?
Title: When can in-context learning generalize out of task distribution? | Wann kann das In-Context-Lernen aus der Aufgabenverteilung verallgemeinern? | 什么时候能通俗易懂地了解任务分配的概况? 2506.05574v1 |
Authors (4): Chase Goddard, Lindsay M. Smith, Vudtiwat Ngampruetikorn, David J. Schwab
In-context learning (ICL) is a remarkable capability of pretrained transformers that allows models to generalize to unseen tasks after seeing only a few examples. We investigate empirically the conditions necessary on the pretraining distribution for ICL to emerge and generalize out-of-distribution. Previous work has focused on the number of distinct tasks necessary in the pretraining dataset. Here, we use a different notion of task diversity to study the emergence of ICL in transformers trained on linear functions. We find that as task diversity increases, transformers undergo a transition from a specialized solution, which exhibits ICL only within the pretraining task distribution, to a solution which generalizes out of distribution to the entire task space. We also investigate the nature of the solutions learned by the transformer on both sides of the transition, and observe similar transitions in nonlinear regression problems. We construct a phase diagram to characterize how our concept of task diversity interacts with the number of pretraining tasks. In addition, we explore how factors such as the depth of the model and the dimensionality of the regression problem influence the transition.
Article 1817
Title@2025-06-05 (4): Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning
Title: Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning | Ravan: Multi-Head Low-Rank Anpassung für Federated Fine-Tuning | Ravan:针对联邦罚款的多级低兰克适应措施 2506.05568v1 |
Authors (4): Arian Raje, Baris Askin, Divyansh Jhunjhunwala, Gauri Joshi
Large language models (LLMs) have not yet effectively leveraged the vast amounts of edge-device data, and federated learning (FL) offers a promising paradigm to collaboratively fine-tune LLMs without transferring private edge data to the cloud. To operate within the computation and communication constraints of edge devices, recent literature on federated fine-tuning of LLMs proposes the use of low-rank adaptation (LoRA) and similar parameter-efficient methods. However, LoRA-based methods suffer from accuracy degradation in FL settings, primarily because of data and computational heterogeneity across clients. We propose Ravan, an adaptive multi-head LoRA method that balances parameter efficiency and model expressivity by reparameterizing the weight updates as the sum of multiple LoRA heads $s_i\mathbf{B}_i\mathbf{H}_i\mathbf{A}_i$ in which only the core matrices $\mathbf{H}_i$ and their lightweight scaling factors $s_i$ are trained. These trainable scaling factors let the optimization focus on the most useful heads, recovering a higher-rank approximation of the full update without increasing the number of communicated parameters since clients upload $s_i\mathbf{H}_i$ directly. Experiments on vision and language benchmarks show that Ravan improves test accuracy by 2-8% over prior parameter-efficient baselines, making it a robust and scalable solution for federated fine-tuning of LLMs.
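The reparameterization described above, a weight update formed as a sum of heads $s_i B_i H_i A_i$, can be sketched in a few lines. This is only an illustration of the algebra under assumed shapes (B_i is d_out x r, H_i is r x r, A_i is r x d_in); the helper names are hypothetical, and initialization, training, and federated aggregation are not shown.

```python
def matmul(x, y):
    # Plain nested-list matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def multi_head_lora_update(heads):
    """Sum s_i * B_i @ H_i @ A_i over heads.

    heads: list of (s, B, H, A) tuples. In the scheme the abstract
    describes, only s and H would be trained; B and A stay fixed.
    """
    d_out = len(heads[0][1])
    d_in = len(heads[0][3][0])
    delta = [[0.0] * d_in for _ in range(d_out)]
    for s, B, H, A in heads:
        term = matmul(matmul(B, H), A)
        for i in range(d_out):
            for j in range(d_in):
                delta[i][j] += s * term[i][j]
    return delta
```

Since each head contributes a rank-r term with its own fixed B_i and A_i, the sum can reach a rank higher than r even though only the small r x r cores and scalars change.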
Article 1818
Title@2025-06-05 (4): Partially-Supervised Neural Network Model For Quadratic Multiparametric Programming
Title: Partially-Supervised Neural Network Model For Quadratic Multiparametric Programming | Teilweise überwachtes neurales Netzwerkmodell für die quadratische multiparametrische Programmierung | 半二次多参数方案拟订部分运行的神经网络模型 2506.05567v1 |
Authors (3): Fuat Can Beylunioglu, Mehrdad Pirnia, P. Robert Duimering
Neural Networks (NN) with ReLU activation functions are used to model multiparametric quadratic optimization problems (mp-QP) in diverse engineering applications. Researchers have suggested leveraging the piecewise affine property of deep NN models to solve mp-QP with linear constraints, which also exhibit piecewise affine behaviour. However, traditional deep NN applications to mp-QP fall short of providing optimal and feasible predictions, even when trained on large datasets. This study proposes a partially-supervised NN (PSNN) architecture that directly represents the mathematical structure of the global solution function. In contrast to generic NN training approaches, the proposed PSNN method derives a large proportion of model weights directly from the mathematical properties of the optimization problem, producing more accurate solutions despite significantly smaller training data sets. Many energy management problems are formulated as QP, so we apply the proposed approach to energy systems (specifically DC optimal power flow) to demonstrate proof of concept. Model performance in terms of solution accuracy and speed of predictions was compared against a commercial solver and a generic Deep NN model based on classical training. Results show KKT sufficient conditions for PSNN consistently outperform generic NN architectures with classical training using far less data, including when tested on extreme, out-of-training distribution test data. Given its speed advantages over traditional solvers, the PSNN model can quickly produce optimal and feasible solutions within a second for millions of input parameters sampled from a distribution of stochastic demands and renewable generator dispatches, which can be used for simulations and long term planning.
Article 1819
Title@2025-06-05 (4): Applying Informer for Option Pricing: A Transformer-Based Approach
Title: Applying Informer for Option Pricing: A Transformer-Based Approach | Informer für Optionspreise anwenden: Ein transformerbasierter Ansatz | 备选定价应用内线:基于变换器的方法 2506.05565v1 |
Authors (2): Feliks Bańka, Jarosław A. Chudziak
Accurate option pricing is essential for effective trading and risk management in financial markets, yet it remains challenging due to market volatility and the limitations of traditional models like Black-Scholes. In this paper, we investigate the application of the Informer neural network for option pricing, leveraging its ability to capture long-term dependencies and dynamically adjust to market fluctuations. This research contributes to the field of financial forecasting by introducing Informer’s efficient architecture to enhance prediction accuracy and provide a more adaptable and resilient framework compared to existing methods. Our results demonstrate that Informer outperforms traditional approaches in option pricing, advancing the capabilities of data-driven financial forecasting in this domain.
Article 1820
Title@2025-06-05 (4): Relational Conformal Prediction for Correlated Time Series
Title: Relational Conformal Prediction for Correlated Time Series | Beziehungskonforme Vorhersage für korrelierte Zeitreihen | Cor相关时间序列的通货膨胀正规预测 2502.09443v2 |
Authors (5): Andrea Cini, Alexander Jenkins, Danilo Mandic, Cesare Alippi, Filippo Maria Bianchi
We address the problem of uncertainty quantification in time series forecasting by exploiting observations at correlated sequences. Relational deep learning methods leveraging graph representations are among the most effective tools for obtaining point estimates from spatiotemporal data and correlated time series. However, the problem of exploiting relational structures to estimate the uncertainty of such predictions has been largely overlooked in the same context. To this end, we propose a novel distribution-free approach based on the conformal prediction framework and quantile regression. Despite the recent applications of conformal prediction to sequential data, existing methods operate independently on each target time series and do not account for relationships among them when constructing the prediction interval. We fill this void by introducing a novel conformal prediction method based on graph deep learning operators. Our approach, named Conformal Relational Prediction (CoRel), does not require the relational structure (graph) to be known a priori and can be applied on top of any pre-trained predictor. Additionally, CoRel includes an adaptive component to handle non-exchangeable data and changes in the input time series. Our approach provides accurate coverage and achieves state-of-the-art uncertainty quantification in relevant benchmarks.
Article 1821
Title@2025-06-05 (4): Mixed-Precision Conjugate Gradient Solvers with RL-Driven Precision Tuning
Title: Mixed-Precision Conjugate Gradient Solvers with RL-Driven Precision Tuning | Mixed-Precision Conjugate Gradient Solver mit RL-getriebener Präzisionstuning | RL-Driven 精确调试混合精密共分梯度溶剂 2504.14268v4 |
Authors (1): Xinye Chen
This paper presents a novel reinforcement learning (RL) framework for dynamically optimizing numerical precision in the preconditioned conjugate gradient (CG) method. By modeling precision selection as a Markov Decision Process (MDP), we employ Q-learning to adaptively assign precision levels to key operations, striking an optimal balance between computational efficiency and numerical accuracy, while ensuring stability through double-precision scalar computations and residual computing. In practice, the algorithm is trained on a set of data and subsequently performs inference for precision selection on out-of-sample data, without requiring re-analysis or retraining for new datasets. This enables the method to adapt seamlessly to new problem instances without the computational overhead of recalibration. Our results demonstrate the effectiveness of RL in enhancing the solver’s performance, marking the first application of RL to mixed-precision numerical methods. The findings highlight the approach’s practical advantages, robustness, and scalability, providing valuable insights into its integration with iterative solvers and paving the way for AI-driven advancements in scientific computing.
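The Q-learning component can be illustrated with a generic tabular update. The action set `PRECISIONS`, the string-valued states, and the reward are placeholders assumed for this sketch; the abstract does not specify how states, actions, or rewards are actually defined.

```python
import random

PRECISIONS = ("fp16", "fp32", "fp64")  # hypothetical action set


def choose_precision(q_table, state, epsilon, rng=random):
    # Epsilon-greedy selection over the precision levels.
    if rng.random() < epsilon:
        return rng.choice(PRECISIONS)
    row = q_table.get(state, {})
    return max(PRECISIONS, key=lambda a: row.get(a, 0.0))


def q_update(q_table, state, action, reward, next_state, lr=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + lr * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    """
    row = q_table.setdefault(state, {})
    next_row = q_table.get(next_state, {})
    best_next = max(next_row.get(a, 0.0) for a in PRECISIONS)
    old = row.get(action, 0.0)
    row[action] = old + lr * (reward + gamma * best_next - old)
    return row[action]
```

After training, calling `choose_precision` with `epsilon=0.0` gives the purely greedy policy, which matches the abstract's description of inference on new problems without retraining.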
Article 1822
Title@2025-06-05 (4): DART-Vetter: A Deep LeARning Tool for automatic triage of exoplanet candidates
Title: DART-Vetter: A Deep LeARning Tool for automatic triage of exoplanet candidates | DART-Vetter: Ein Deep LeARning Tool für die automatische Triage von Exoplaneten-Kandidaten | DART – – 设计者:一个用于自动分流异丙型板块候选物的深深LeARning工具 2506.05556v1 |
Authors (15): Stefano Fiscale, Laura Inno, Alessandra Rotundi, Angelo Ciaramella, Alessio Ferone, Christian Magliano, Luca Cacciapuoti, Veselin Kostov, Elisa Quintana, Giovanni Covone, Maria Teresa Muscari Tomajoli, Vito Saggese, Luca Tonietti, Antonio Vanzanella, Vincenzo Della Corte
In the identification of new planetary candidates in transit surveys, the employment of Deep Learning models proved to be essential to efficiently analyse a continuously growing volume of photometric observations. To further improve the robustness of these models, it is necessary to exploit the complementarity of data collected from different transit surveys such as NASA’s Kepler, Transiting Exoplanet Survey Satellite (TESS), and, in the near future, the ESA PLAnetary Transits and Oscillation of stars (PLATO) mission. In this work, we present a Deep Learning model, named DART-Vetter, able to distinguish planetary candidates (PC) from false positive signals (NPC) detected by any potential transiting survey. DART-Vetter is a Convolutional Neural Network that processes only the light curves folded on the period of the relative signal, featuring a simpler and more compact architecture with respect to other triaging and/or vetting models available in the literature. We trained and tested DART-Vetter on several datasets of publicly available and homogeneously labelled TESS and Kepler light curves in order to prove the effectiveness of our model. Despite its simplicity, DART-Vetter achieves highly competitive triaging performance, with a recall rate of 91% on an ensemble of TESS and Kepler data, when compared to Exominer and Astronet-Triage. Its compact, open-source, and easy-to-replicate architecture makes DART-Vetter a particularly useful tool for automating triaging procedures or assisting human vetters, showing a discrete generalization on TCEs with Multiple Event Statistic (MES) > 20 and orbital period < 50 days.
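The model input described above is a light curve folded on the signal period. A minimal sketch of that folding step is shown below; the function name, the `epoch` parameter, and the choice to center the transit at phase 0 are assumptions, and the binning and normalization a real pipeline would apply are omitted.

```python
def fold_light_curve(times, fluxes, period, epoch=0.0):
    """Return (phase, flux) pairs sorted by phase.

    Phase lies in [-0.5, 0.5), so an event occurring at `epoch`
    (and every `period` after it) lands at phase 0.
    """
    folded = []
    for t, f in zip(times, fluxes):
        phase = ((t - epoch) / period + 0.5) % 1.0 - 0.5
        folded.append((phase, f))
    folded.sort(key=lambda pair: pair[0])
    return folded
```

Folding stacks every transit of a periodic signal on top of the others, which is why a compact network can work from this single view.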
Article 1823
Title@2025-06-05 (4): Multivariate Temporal Regression at Scale: A Three-Pillar Framework Combining ML, XAI, and NLP
Title: Multivariate Temporal Regression at Scale: A Three-Pillar Framework Combining ML, XAI, and NLP | Multivariate Temporale Regression im Maßstab: Ein Drei-Pillar-Framework kombiniert ML, XAI und NLP | 规模上多变时间回落:三边框架组合 ML、 XAI 和 NLP 2504.02151v2 |
Authors (2): Jiztom Kavalakkatt Francis, Matthew J Darr
This paper introduces a novel framework that accelerates the discovery of actionable relationships in high-dimensional temporal data by integrating machine learning (ML), explainable AI (XAI), and natural language processing (NLP) to enhance data quality and streamline workflows. Traditional methods often fail to recognize complex temporal relationships, leading to noisy, redundant, or biased datasets. Our approach combines ML-driven pruning to identify and mitigate low-quality samples, XAI-based interpretability to validate critical feature interactions, and NLP for future contextual validation, reducing the time required to uncover actionable insights by 40-60%. Evaluated on real-world agricultural and synthetic datasets, the framework significantly improves performance metrics (e.g., MSE, R2, MAE) and computational efficiency, with hardware-agnostic scalability across diverse platforms. While long-term real-world impacts (e.g., cost savings, sustainability gains) are pending, this methodology provides an immediate pathway to accelerate data-centric AI in dynamic domains like agriculture and energy, enabling faster iteration cycles for domain experts.
Article 1824
Title@2025-06-05 (4): Online Conformal Model Selection for Nonstationary Time Series
Title: Online Conformal Model Selection for Nonstationary Time Series | Online-Konforme Modellauswahl für nichtstationäre Zeitreihen | 非固定时间系列在线非正式模型选择 2506.05544v1 |
Authors (2): Shibo Li, Yao Zheng
This paper introduces the MPS (Model Prediction Set), a novel framework for online model selection for nonstationary time series. Classical model selection methods, such as information criteria and cross-validation, rely heavily on the stationarity assumption and often fail in dynamic environments which undergo gradual or abrupt changes over time. Yet real-world data are rarely stationary, and model selection under nonstationarity remains a largely open problem. To tackle this challenge, we combine conformal inference with model confidence sets to develop a procedure that adaptively selects models best suited to the evolving dynamics at any given time. Concretely, the MPS updates in real time a confidence set of candidate models that covers the best model for the next time period with a specified long-run probability, while adapting to nonstationarity of unknown forms. Through simulations and real-world data analysis, we demonstrate that MPS reliably and efficiently identifies optimal models under nonstationarity, an essential capability lacking in offline methods. Moreover, MPS frequently produces high-quality sets with small cardinality, whose evolution offers deeper insights into changing dynamics. As a generic framework, MPS accommodates any data-generating process, data structure, model class, training method, and evaluation metric, making it broadly applicable across diverse problem settings.
Article 1825
Title@2025-06-05 (4): Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data
Title: Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data | Agentomics-ML: Autonomes Machine Learning Experimentation Agent für Genomische und Transkriptionsdaten | ML:基因组和转基因数据自动机械学习实验代理 2506.05542v1 |
Authors (9): Vlastimil Martinek, Andrea Gariboldi, Dimosthenis Tzimotoudis, Aitor Alberdi Escudero, Edward Blake, David Cechak, Luke Cassar, Alessandro Balestrucci, Panagiotis Alexiou
The adoption of machine learning (ML) and deep learning methods has revolutionized molecular medicine by driving breakthroughs in genomics, transcriptomics, drug discovery, and biological systems modeling. The increasing quantity, multimodality, and heterogeneity of biological datasets demand automated methods that can produce generalizable predictive models. Recent developments in large language model-based agents have shown promise for automating end-to-end ML experimentation on structured benchmarks. However, when applied to heterogeneous computational biology datasets, these methods struggle with generalization and success rates. Here, we introduce Agentomics-ML, a fully autonomous agent-based system designed to produce a classification model and the necessary files for reproducible training and inference. Our method follows predefined steps of an ML experimentation process, repeatedly interacting with the file system through Bash to complete individual steps. Once an ML model is produced, training and validation metrics provide scalar feedback to a reflection step to identify issues such as overfitting. This step then creates verbal feedback for future iterations, suggesting adjustments to steps such as data representation, model architecture, and hyperparameter choices. We have evaluated Agentomics-ML on several established genomic and transcriptomic benchmark datasets and show that it outperforms existing state-of-the-art agent-based methods in both generalization and success rates. While state-of-the-art models built by domain experts still lead in absolute performance on the majority of the computational biology datasets used in this work, Agentomics-ML narrows the gap for fully autonomous systems and achieves state-of-the-art performance on one of the used benchmark datasets. The code is available at https://github.com/BioGeMT/Agentomics-ML.
Article 1826
Title@2025-06-05 (4): SocialDF: Benchmark Dataset and Detection Model for Mitigating Harmful Deepfake Content on Social Media Platforms
Title: SocialDF: Benchmark Dataset and Detection Model for Mitigating Harmful Deepfake Content on Social Media Platforms | SocialDF: Benchmark Dataset und Detection Model zur Minderung schädlicher Deepfake-Inhalte auf Social Media-Plattformen | 社会DF:社会媒体平台减少有害深层内容的基准数据集和检测模式 2506.05538v1 |
Authors (6): Arnesh Batra, Anushk Kumar, Jashn Khemani, Arush Gumber, Arhan Jain, Somil Gupta
The rapid advancement of deep generative models has significantly improved the realism of synthetic media, presenting both opportunities and security challenges. While deepfake technology has valuable applications in entertainment and accessibility, it has emerged as a potent vector for misinformation campaigns, particularly on social media. Existing detection frameworks struggle to distinguish between benign and adversarially generated deepfakes engineered to manipulate public perception. To address this challenge, we introduce SocialDF, a curated dataset reflecting real-world deepfake challenges on social media platforms. This dataset encompasses high-fidelity deepfakes sourced from various online ecosystems, ensuring broad coverage of manipulative techniques. We propose a novel LLM-based multi-factor detection approach that combines facial recognition, automated speech transcription, and a multi-agent LLM pipeline to cross-verify audio-visual cues. Our methodology emphasizes robust, multi-modal verification techniques that incorporate linguistic, behavioral, and contextual analysis to effectively discern synthetic media from authentic content.
nan
Article 1827
Title@2025-06-05 (4): Faster Approx. Top-K: Harnessing the Full Power of Two Stages
Title: Faster Approx. Top-K: Harnessing the Full Power of Two Stages | Schneller ca. Top-K: Die volle Kraft zweier Etappen nutzen | 快速近似。 最高K级: 利用两个阶段的全功率 2506.04165v2 |
Authors (5): Yashas Samaga, Varun Yerram, Spandana Raj Babbula, Prateek Jain, Praneeth Netrapalli
We consider the Top-$K$ selection problem, which aims to identify the largest-$K$ elements from an array. Top-$K$ selection arises in many machine learning algorithms and often becomes a bottleneck on accelerators, which are optimized for dense matrix multiplications. To address this problem, Chern et al. (2022) proposed a fast two-stage approximate Top-$K$ algorithm: (i) partition the input array and select the top-$1$ element from each partition, (ii) sort this smaller subset and return the top $K$ elements. In this paper, we consider a generalized version of this algorithm, where the first stage selects top-$K'$ elements, for some $1 \leq K' \leq K$, from each partition. Our contributions are as follows: (i) we derive an expression for the expected recall of this generalized algorithm and show that choosing $K' > 1$ with fewer partitions in the first stage reduces the input size to the second stage more effectively while maintaining the same expected recall as the original algorithm, (ii) we derive a bound on the expected recall for the original algorithm in Chern et al. (2022) that is provably tighter by a factor of $2$ than the one in that paper, and (iii) we implement our algorithm on Cloud TPUv5e and achieve around an order-of-magnitude speedup over the original algorithm without sacrificing recall on real-world tasks.
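The generalized two-stage scheme described in this abstract can be sketched in a few lines of NumPy. This is only an illustrative CPU sketch, not the authors' TPU implementation: the function name `two_stage_approx_top_k`, the equal-size contiguous partitioning, and the parameter names are assumptions made for the example.

```python
import numpy as np

def two_stage_approx_top_k(x, k, num_partitions, k_prime=1):
    """Approximate top-k: per-partition top-k_prime, then an exact top-k on the survivors.

    Assumes len(x) is divisible by num_partitions (a simplification for this sketch).
    Returns (values, indices) sorted by decreasing value.
    """
    x = np.asarray(x)
    n = x.shape[0]
    if n % num_partitions != 0:
        raise ValueError("len(x) must be divisible by num_partitions")
    part_size = n // num_partitions
    if not 1 <= k_prime <= part_size:
        raise ValueError("k_prime must be between 1 and the partition size")
    if num_partitions * k_prime < k:
        raise ValueError("stage 1 must keep at least k candidates")

    parts = x.reshape(num_partitions, part_size)

    # Stage 1: keep the k_prime largest entries of every partition (unordered).
    local_idx = np.argpartition(parts, part_size - k_prime, axis=1)[:, part_size - k_prime:]
    candidates = np.take_along_axis(parts, local_idx, axis=1).ravel()
    global_idx = (local_idx + np.arange(num_partitions)[:, None] * part_size).ravel()

    # Stage 2: sort the much smaller candidate set and return its top k.
    order = np.argsort(candidates)[::-1][:k]
    return candidates[order], global_idx[order]
```

With `k_prime = k` the result is exact, because every global top-$K$ element is among the top $K$ of its own partition; smaller `k_prime` values trade recall for a smaller second-stage input.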
nan
Article 1828
Title@2025-06-05 (4): A new pathway to generative artificial intelligence by minimizing the maximum entropy
Title: A new pathway to generative artificial intelligence by minimizing the maximum entropy | Ein neuer Weg zur generativen künstlichen Intelligenz durch Minimierung der maximalen Entropie | 通过最大限度地减少最大倍增 实现人工智能基因化的新途径 2502.13287v2 |
Authors (2): Mattia Miotto, Lorenzo Monacelli
Generative artificial intelligence has revolutionized society. Current models are trained by minimizing the distance between the produced data and the training set. Consequently, development is plateauing, as they are intrinsically data-hungry and challenging to direct during the generative process. To overcome these limitations, we introduce a paradigm shift through a framework where we do not fit the training set but find the most informative yet least noisy representation of the data, simultaneously minimizing the entropy to reduce noise and maximizing it to remain unbiased via adversarial training. The result is a general physics-driven model, which is data-efficient and flexible, permitting control and influence over the generative process. Benchmarking shows that our approach outperforms variational autoencoders. We demonstrate the method's effectiveness in generating images, even with limited training data, and its unprecedented capability to customize the generation process a posteriori without any fine-tuning or retraining.
nan
Article 1829
Title@2025-06-05 (4): MCMC-Correction of Score-Based Diffusion Models for Model Composition
Title: MCMC-Correction of Score-Based Diffusion Models for Model Composition | MCMC-Korrektion von Score-basierten Diffusionsmodellen für die Modellzusammensetzung | MC-基于记分传播模型模型的示范构成的校正 2307.14012v3 |
Authors (5): Anders Sjöberg, Jakob Lindqvist, Magnus Önnheim, Mats Jirstrand, Lennart Svensson
Diffusion models can be parameterized in terms of either a score or an energy function. The energy parameterization is attractive as it enables sampling procedures such as Markov Chain Monte Carlo (MCMC) that incorporate a Metropolis-Hastings (MH) correction step based on energy differences between proposed samples. Such corrections can significantly improve sampling quality, particularly in the context of model composition, where pre-trained models are combined to generate samples from novel distributions. Score-based diffusion models, on the other hand, are more widely adopted and come with a rich ecosystem of pre-trained models. However, they do not, in general, define an underlying energy function, making MH-based sampling inapplicable. In this work, we address this limitation by retaining the score parameterization and introducing a novel MH-like acceptance rule based on line integration of the score function. This allows the reuse of existing diffusion models while still combining the reverse process with various MCMC techniques, viewed as an instance of annealed MCMC. Through experiments on synthetic and real-world data, we show that our MH-like samplers offer comparable improvements to those obtained with energy-based models, without requiring explicit energy parameterization.
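The idea of replacing an energy difference with a line integral of the score can be illustrated as follows. When the score is the gradient of a log-density, $\log p(x') - \log p(x) = \int_0^1 s(x + t(x'-x)) \cdot (x'-x)\,dt$, so the acceptance ratio can be approximated by numerical quadrature along the segment. The sketch below assumes a symmetric random-walk proposal and a trapezoidal rule; both choices, and all function names, are assumptions for illustration and not necessarily the scheme used in the paper.

```python
import numpy as np

def log_density_ratio_from_score(score, x, x_new, num_points=8):
    """Approximate log p(x_new) - log p(x) by integrating the score along a straight line."""
    ts = np.linspace(0.0, 1.0, num_points)
    delta = x_new - x
    # Integrand: directional component of the score along the segment.
    vals = np.array([np.dot(score(x + t * delta), delta) for t in ts])
    dt = ts[1] - ts[0]
    # Composite trapezoidal rule, written out explicitly.
    return dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

def mh_like_step(score, x, step_size, rng, num_points=8):
    """One random-walk step with an MH-like accept/reject based only on the score."""
    proposal = x + step_size * rng.standard_normal(x.shape)
    log_ratio = log_density_ratio_from_score(score, x, proposal, num_points)
    if np.log(rng.uniform()) < min(0.0, log_ratio):
        return proposal, True
    return x, False
```

For a standard Gaussian, whose score is `lambda z: -z`, the integrand is linear in `t`, so the trapezoidal estimate matches the true log-density ratio $\tfrac{1}{2}(\lVert x\rVert^2 - \lVert x'\rVert^2)$ up to floating-point error.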
nan
Article 1830
Title@2025-06-05 (4): Spectral Graph Neural Networks are Incomplete on Graphs with a Simple Spectrum
Title: Spectral Graph Neural Networks are Incomplete on Graphs with a Simple Spectrum | Spektrale Graphen-Neuralnetze sind auf Graphen mit einem einfachen Spektrum unvollständig | 光谱图形神经网络在带有简单光谱的图形上不完整 2506.05530v1 |
Authors (4): Snir Hordan, Maya Bechler-Speicher, Gur Lifshitz, Nadav Dym
Spectral features are widely incorporated within Graph Neural Networks (GNNs) to improve their expressive power, or their ability to distinguish among non-isomorphic graphs. One popular example is the usage of graph Laplacian eigenvectors for positional encoding in MPNNs and Graph Transformers. The expressive power of such Spectrally-enhanced GNNs (SGNNs) is usually evaluated via the k-WL graph isomorphism test hierarchy and homomorphism counting. Yet, these frameworks align poorly with the graph spectra, yielding limited insight into SGNNs’ expressive power. We leverage a well-studied paradigm of classifying graphs by their largest eigenvalue multiplicity to introduce an expressivity hierarchy for SGNNs. We then prove that many SGNNs are incomplete even on graphs with distinct eigenvalues. To mitigate this deficiency, we adapt rotation equivariant neural networks to the graph spectra setting to propose a method to provably improve SGNNs’ expressivity on simple spectrum graphs. We empirically verify our theoretical claims via an image classification experiment on the MNIST Superpixel dataset and eigenvector canonicalization on graphs from ZINC.
nan
Article 1831
Title@2025-06-05 (4): Avoiding Death through Fear Intrinsic Conditioning
Title: Avoiding Death through Fear Intrinsic Conditioning | Vermeiden des Todes durch Angst Intrinsische Konditionierung | 避免因恐惧而死亡 2506.05529v1 |
Authors (4): Rodney Sanchez, Ferat Sahin, Alexander Ororbia, Jamison Heard
Biological and psychological concepts have inspired reinforcement learning algorithms to create new complex behaviors that expand agents’ capacity. These behaviors can be seen in the rise of techniques like goal decomposition, curriculum, and intrinsic rewards, which have paved the way for these complex behaviors. One limitation in evaluating these methods is the requirement for engineered extrinsic rewards for realistic environments. A central challenge in engineering the necessary reward function(s) comes from these environments containing states that carry high negative rewards, but provide no feedback to the agent. Death is one such stimulus that fails to provide direct feedback to the agent. In this work, we introduce an intrinsic reward function inspired by early amygdala development and produce this intrinsic reward through a novel memory-augmented neural network (MANN) architecture. We show how this intrinsic motivation serves to deter exploration of terminal states and results in avoidance behavior similar to fear conditioning observed in animals. Furthermore, we demonstrate how modifying a threshold where the fear response is active produces a range of behaviors that are described under the paradigm of general anxiety disorders (GADs). We demonstrate this behavior in the Miniworld Sidewalk environment, which provides a partially observable Markov decision process (POMDP) and a sparse reward with a non-descriptive terminal condition, i.e., death. In effect, this study results in a biologically-inspired neural architecture and framework for fear conditioning paradigms; we empirically demonstrate avoidance behavior in a constructed agent that is able to solve environments with non-descriptive terminal conditions.
nan
Article 1832
Title@2025-06-05 (4): MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning
Title: MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning | MORSE-500: Ein programmierbarer Video-Benchmark zum Stress-Test Multimodale Reasoning | MORSE-500:压力测试多模式理由方案可控制的视频基准 2506.05523v1 |
Authors (13): Zikui Cai, Andrew Wang, Anirudh Satheesh, Ankit Nakhawa, Hyunwoo Jae, Keenan Powell, Minghui Liu, Neel Jay, Sungbin Oh, Xiyao Wang, Yongyuan Liang, Tom Goldstein, Furong Huang
Despite rapid advances in vision-language models (VLMs), current benchmarks for multimodal reasoning fall short in three key dimensions. First, they overwhelmingly rely on static images, failing to capture the temporal complexity of real-world environments. Second, they narrowly focus on mathematical problem-solving, neglecting the broader spectrum of reasoning skills – including abstract, physical, planning, spatial, and temporal capabilities – required for robust multimodal intelligence. Third, many benchmarks quickly saturate, offering limited headroom for diagnosing failure modes or measuring continued progress. We introduce MORSE-500 (Multimodal Reasoning Stress-test Environment), a video benchmark composed of 500 fully scripted clips with embedded questions spanning six complementary reasoning categories. Each instance is programmatically generated using deterministic Python scripts (via Manim, Matplotlib, MoviePy), generative video models, and curated real footage. This script-driven design allows fine-grained control over visual complexity, distractor density, and temporal dynamics – enabling difficulty to be scaled systematically as models improve. Unlike static benchmarks that become obsolete once saturated, MORSE-500 is built to evolve: its controllable generation pipeline supports the creation of arbitrarily challenging new instances, making it ideally suited for stress-testing next-generation models. Initial experiments with state-of-the-art systems – including Gemini 2.5 Pro and OpenAI o3, which represent the strongest available at the time, alongside strong open-source models – reveal substantial performance gaps across all categories, with particularly large deficits in abstract and planning tasks. We release the full dataset, generation scripts, and evaluation harness to support transparent, reproducible, and forward-looking multimodal reasoning research.
nan
Article 1833
Title@2025-06-05 (4): The Complexity of Learning Sparse Superposed Features with Feedback
Title: The Complexity of Learning Sparse Superposed Features with Feedback | Die Komplexität des Lernens Sparse überlagerte Funktionen mit Feedback | 具有反馈的学习偏散超强特征的复杂性 2502.05407v3 |
Authors (1): Akash Kumar
The success of deep networks is crucially attributed to their ability to capture latent features within a representation space. In this work, we investigate whether the underlying learned features of a model can be efficiently retrieved through feedback from an agent, such as a large language model (LLM), in the form of relative triplet comparisons. These features may represent various constructs, including dictionaries in LLMs or a covariance matrix of Mahalanobis distances. We analyze the feedback complexity associated with learning a feature matrix in sparse settings. Our results establish tight bounds when the agent is permitted to construct activations and demonstrate strong upper bounds in sparse scenarios when the agent’s feedback is limited to distributional information. We validate our theoretical findings through experiments on two distinct applications: feature recovery from Recursive Feature Machines and dictionary extraction from sparse autoencoders trained on Large Language Models.
nan
Article 1834
Title@2025-06-05 (4): Learning to Recover: Dynamic Reward Shaping with Wheel-Leg Coordination for Fallen Robots
Title: Learning to Recover: Dynamic Reward Shaping with Wheel-Leg Coordination for Fallen Robots | Learning to Recover: Dynamische Belohnungsformung mit Rad-Leg-Koordination für gefallene Roboter | 学习恢复: 堕落机器人使用轮式轴协调进行动态回授形状 2506.05516v1 |
Authors (5): Boyuan Deng, Luca Rossini, Jin Wang, Weijie Wang, Nikolaos Tsagarakis
Adaptive recovery from fall incidents is an essential skill for the practical deployment of wheeled-legged robots, which uniquely combine the agility of legs with the speed of wheels for rapid recovery. However, traditional methods relying on preplanned recovery motions, simplified dynamics or sparse rewards often fail to produce robust recovery policies. This paper presents a learning-based framework integrating Episode-based Dynamic Reward Shaping and curriculum learning, which dynamically balances exploration of diverse recovery maneuvers with precise posture refinement. An asymmetric actor-critic architecture accelerates training by leveraging privileged information in simulation, while noise-injected observations enhance robustness against uncertainties. We further demonstrate that synergistic wheel-leg coordination reduces joint torque consumption by 15.8% and 26.2% and improves stabilization through energy transfer mechanisms. Extensive evaluations on two distinct quadruped platforms achieve recovery success rates up to 99.1% and 97.8% without platform-specific tuning. The supplementary material is available at https://boyuandeng.github.io/L2R-WheelLegCoordination/
nan
Article 1835
Title@2025-06-05 (4): Winner-takes-all for Multivariate Probabilistic Time Series Forecasting
Title: Winner-takes-all for Multivariate Probabilistic Time Series Forecasting | Winner-takes-all für Multivariate probabilistische Zeitreihen-Prognose | 多变量概率性时间序列预测的赢取全胜 2506.05515v1 |
Authors (3): Adrien Cortés, Rémi Rehm, Victor Letzelter
We introduce TimeMCL, a method leveraging the Multiple Choice Learning (MCL) paradigm to forecast multiple plausible time series futures. Our approach employs a neural network with multiple heads and utilizes the Winner-Takes-All (WTA) loss to promote diversity among predictions. MCL has recently gained attention due to its simplicity and ability to address ill-posed and ambiguous tasks. We propose an adaptation of this framework for time-series forecasting, presenting it as an efficient method to predict diverse futures, which we relate to its implicit quantization objective. We provide insights into our approach using synthetic data and evaluate it on real-world time series, demonstrating its promising performance at a light computational cost.
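The Winner-Takes-All objective mentioned above can be written compactly: among the forecasts produced by the heads, only the one closest to the observed future contributes to the loss. The NumPy sketch below assumes a mean-squared-error distance and an array layout of `(heads, time, dims)`; both are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def winner_takes_all_loss(hypotheses, target):
    """Return (loss, winner index) for K candidate forecasts of one series.

    hypotheses: array of shape (K, T, D), one forecast per head.
    target:     array of shape (T, D), the observed future.
    """
    hypotheses = np.asarray(hypotheses, dtype=float)
    target = np.asarray(target, dtype=float)
    # Mean squared error of every head against the same target.
    per_head_error = ((hypotheses - target[None]) ** 2).mean(axis=(1, 2))
    # Only the best head defines the loss (and would receive the gradient in training).
    winner = int(per_head_error.argmin())
    return float(per_head_error[winner]), winner
```

Because heads that lose are not penalized for a given sample, different heads are free to specialize on different plausible futures, which is the source of the diversity the abstract refers to.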
nan
Article 1836
Title@2025-06-05 (4): Geometric and Physical Constraints Synergistically Enhance Neural PDE Surrogates
Title: Geometric and Physical Constraints Synergistically Enhance Neural PDE Surrogates | Geometrische und physikalische Einschränkungen Synergistisch verbessern neurale PDE-Überlagerungen | 同步增强神经PDE代号 2506.05513v1 |
Authors (2): Yunfei Huang, David S. Greenberg
Neural PDE surrogates can improve the cost-accuracy tradeoff of classical solvers, but often generalize poorly to new initial conditions and accumulate errors over time. Physical and symmetry constraints have shown promise in closing this performance gap, but existing techniques for imposing these inductive biases are incompatible with the staggered grids commonly used in computational fluid dynamics. Here we introduce novel input and output layers that respect physical laws and symmetries on the staggered grids, and for the first time systematically investigate how these constraints, individually and in combination, affect the accuracy of PDE surrogates. We focus on two challenging problems: shallow water equations with closed boundaries and decaying incompressible turbulence. Compared to strong baselines, symmetries and physical constraints consistently improve performance across tasks, architectures, autoregressive prediction steps, accuracy measures, and network sizes. Symmetries are more effective than physical constraints, but surrogates with both performed best, even compared to baselines with data augmentation or pushforward training, while themselves benefiting from the pushforward trick. Doubly-constrained surrogates also generalize better to initial conditions and durations beyond the range of the training data, and more accurately predict real-world ocean currents.
nan
Article 1837
Title@2025-06-05 (4): Investigating Non-Transitivity in LLM-as-a-Judge
Title: Investigating Non-Transitivity in LLM-as-a-Judge | Untersuchung der Nichtübertragbarkeit in LLM-as-a-Richter | 调查LLM-as-A法官的非过渡性 2502.14074v3 |
Authors (4): Yi Xu, Laura Ruis, Tim Rocktäschel, Robert Kirk
Automatic evaluation methods based on large language models (LLMs) are emerging as the standard tool for assessing the instruction-following abilities of LLM-based agents. The most common method in this paradigm, pairwise comparisons with a baseline model, critically depends on the assumption of transitive preferences. However, the validity of this assumption remains largely unexplored. In this study, we investigate the presence of non-transitivity within the AlpacaEval framework and analyze its effects on model rankings. We find that LLM judges exhibit non-transitive preferences, leading to rankings that are sensitive to the choice of the baseline model. To mitigate this issue, we show that round-robin tournaments combined with Bradley-Terry models of preference can produce more reliable rankings. Notably, our method increases both the Spearman correlation and the Kendall correlation with Chatbot Arena (95.0% -> 96.4% and 82.1% -> 86.3% respectively). To address the computational cost of round-robin tournaments, we propose Swiss-Wise Iterative Matchmaking (Swim) tournaments, using a dynamic matching strategy to capture the benefits of round-robin tournaments while maintaining computational efficiency.
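To make the round-robin plus Bradley-Terry idea concrete, the sketch below fits Bradley-Terry strengths from a matrix of pairwise win counts using the classical minorization-maximization update. The function name, the win-matrix layout, and the stopping rule are assumptions for illustration; the paper's actual fitting procedure may differ.

```python
import numpy as np

def fit_bradley_terry(wins, num_iters=200, tol=1e-10):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins[i, j] is the number of times model i was preferred over model j.
    Assumes every model has at least one win and one loss, so the estimate is finite.
    Returns strengths normalized to sum to 1.
    """
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    games = wins + wins.T              # comparisons played per pair
    total_wins = wins.sum(axis=1)
    p = np.full(n, 1.0 / n)
    for _ in range(num_iters):
        denom = games / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)   # a model is never compared with itself
        new_p = total_wins / denom.sum(axis=1)
        new_p /= new_p.sum()
        converged = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if converged:
            break
    return p
```

Sorting models by the fitted strengths gives a single ranking that uses all pairwise outcomes, rather than depending on comparisons against one fixed baseline.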
nan
Article 1838
Title@2025-06-05 (4): How can Diffusion Models Evolve into Continual Generators?
Title: How can Diffusion Models Evolve into Continual Generators? | Wie können sich Diffusionsmodelle zu kontinuierlichen Generatoren entwickeln? | 传播模型如何发展成持续发电机? 2505.11936v2 |
Authors (3): Jingren Liu, Zhong Ji, Xiangyu Chen
While diffusion models have achieved remarkable success in static data generation, their deployment in streaming or continual learning (CL) scenarios faces a major challenge: catastrophic forgetting (CF), where newly acquired generative capabilities overwrite previously learned ones. To systematically address this, we introduce a formal Continual Diffusion Generation (CDG) paradigm that characterizes and redefines CL in the context of generative diffusion models. Prior efforts often adapt heuristic strategies from continual classification tasks but lack alignment with the underlying diffusion process. In this work, we develop the first theoretical framework for CDG by analyzing cross-task dynamics in diffusion-based generative modeling. Our analysis reveals that the retention and stability of generative knowledge across tasks are governed by three key consistency criteria: inter-task knowledge consistency (IKC), unconditional knowledge consistency (UKC), and label knowledge consistency (LKC). Building on these insights, we propose Continual Consistency Diffusion (CCD), a principled framework that integrates these consistency objectives into training via hierarchical loss terms $\mathcal{L}_{IKC}$, $\mathcal{L}_{UKC}$, and $\mathcal{L}_{LKC}$. This promotes effective knowledge retention while enabling the assimilation of new generative capabilities. Extensive experiments on four benchmark datasets demonstrate that CCD achieves state-of-the-art performance under continual settings, with substantial gains in Mean Fidelity (MF) and Incremental Mean Fidelity (IMF), particularly in tasks with rich cross-task knowledge overlap.
nan
Article 1839
Title@2025-06-05 (4): The Generative Leap: Sharp Sample Complexity for Efficiently Learning Gaussian Multi-Index Models
Title: The Generative Leap: Sharp Sample Complexity for Efficiently Learning Gaussian Multi-Index Models | Der Generative Sprung: Scharfe Probenkomplexität für effizientes Lernen Gaussischer Multi-Index-Modelle | 产生式滑坡:高效学习高斯多指标模型的简单抽样复杂程度 2506.05500v1 |
Authors (3): Alex Damian, Jason D. Lee, Joan Bruna
In this work we consider generic Gaussian Multi-index models, in which the labels only depend on the (Gaussian) $d$-dimensional inputs through their projection onto a low-dimensional $r = O_d(1)$ subspace, and we study efficient agnostic estimation procedures for this hidden subspace. We introduce the generative leap exponent $k^\star$, a natural extension of the generative exponent from [Damian et al.’24] to the multi-index setting. We first show that a sample complexity of $n=\Theta(d^{1 \vee k^\star/2})$ is necessary in the class of algorithms captured by the Low-Degree-Polynomial framework. We then establish that this sample complexity is also sufficient, by giving an agnostic sequential estimation procedure (that is, requiring no prior knowledge of the multi-index model) based on a spectral U-statistic over appropriate Hermite tensors. We further compute the generative leap exponent for several examples including piecewise linear functions (deep ReLU networks with bias), and general deep neural networks (with $r$-dimensional first hidden layer).
nan
Article 1840
Title@2025-06-05 (4): Multidimensional Analysis of Specific Language Impairment Using Unsupervised Learning Through PCA and Clustering
Title: Multidimensional Analysis of Specific Language Impairment Using Unsupervised Learning Through PCA and Clustering | Multidimensionale Analyse der spezifischen Sprachbeeinträchtigung durch unbeaufsichtigtes Lernen durch PCA und Clustering | 利用无监督的学习,通过常设仲裁院和集群,对特定语言障碍进行多层面分析 2506.05498v1 |
Authors (1): Niruthiha Selvanayagam
Specific Language Impairment (SLI) affects approximately 7 percent of children, presenting as isolated language deficits despite normal cognitive abilities, sensory systems, and supportive environments. Traditional diagnostic approaches often rely on standardized assessments, which may overlook subtle developmental patterns. This study aims to identify natural language development trajectories in children with and without SLI using unsupervised machine learning techniques, providing insights for early identification and targeted interventions. Narrative samples from 1,163 children aged 4-16 years across three corpora (Conti-Ramsden 4, ENNI, and Gillam) were analyzed using Principal Component Analysis (PCA) and clustering. A total of 64 linguistic features were evaluated to uncover developmental trajectories and distinguish linguistic profiles. Two primary clusters emerged: (1) high language production with low SLI prevalence, and (2) limited production but higher syntactic complexity with higher SLI prevalence. Additionally, boundary cases exhibited intermediate traits, supporting a continuum model of language abilities. Findings suggest SLI manifests primarily through reduced production capacity rather than syntactic complexity deficits. The results challenge categorical diagnostic frameworks and highlight the potential of unsupervised learning techniques for refining diagnostic criteria and intervention strategies.
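The analysis pipeline described above (dimensionality reduction followed by clustering) can be sketched with NumPy alone. The abstract does not name the clustering algorithm, so k-means with a deterministic farthest-point initialization is an assumption made for this example, as are the function names; the input would be a children-by-features matrix such as the 64 linguistic features mentioned.

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the top principal components (via SVD of centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans(Z, k, num_iters=50):
    """Plain Lloyd's k-means with farthest-point initialization (deterministic)."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[dist.argmax()])
    centers = np.array(centers)
    for _ in range(num_iters):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

In practice the features would usually be standardized before PCA so that measures on large scales (for example word counts) do not dominate the components; that step is omitted here for brevity.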
nan
Article 1841
Title@2025-06-05 (4): Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models
Title: Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models | Conformal Prediction Beyond the Seen: Eine fehlende Massenperspektive für Unsicherheitsquantifizierung in generativen Modellen | 可见之外的非正式预测:在生成模型中不确定性量化的缺失质量角度 2506.05497v1 |
Authors (4): Sima Noorani, Shayan Kiyani, George Pappas, Hamed Hassani
Uncertainty quantification (UQ) is essential for safe deployment of generative AI models such as large language models (LLMs), especially in high-stakes applications. Conformal prediction (CP) offers a principled uncertainty quantification framework, but classical methods focus on regression and classification, relying on geometric distances or softmax scores: tools that presuppose structured outputs. We depart from this paradigm by studying CP in a query-only setting, where prediction sets must be constructed solely from finite queries to a black-box generative model, introducing a new trade-off between coverage, test-time query budget, and informativeness. We introduce Conformal Prediction with Query Oracle (CPQ), a framework characterizing the optimal interplay between these objectives. Our finite-sample algorithm is built on two core principles: one governs the optimal query policy, and the other defines the optimal mapping from queried samples to prediction sets. Remarkably, both are rooted in the classical missing mass problem in statistics. Specifically, the optimal query policy depends on the rate of decay, or the derivative, of the missing mass, for which we develop a novel estimator. Meanwhile, the optimal mapping hinges on the missing mass itself, which we estimate using Good-Turing estimators. We then turn our focus to implementing our method for language models, where outputs are vast, variable, and often underspecified. Fine-grained experiments on three real-world open-ended tasks and two LLMs show CPQ's applicability to any black-box LLM and highlight: (1) the individual contribution of each principle to CPQ performance, and (2) CPQ's ability to yield significantly more informative prediction sets than existing conformal methods for language uncertainty quantification.
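The classical Good-Turing estimate of the missing mass, which the abstract builds on, is simple to state: the probability that the next sample is something not yet seen is estimated by the fraction of samples that occurred exactly once. The helper below is a minimal sketch of that textbook estimator only; it is not the paper's CPQ procedure, and the function name is an assumption.

```python
from collections import Counter

def good_turing_missing_mass(samples):
    """Good-Turing estimate of the total probability of outcomes not yet observed.

    samples: a non-empty list of hashable outcomes (e.g. canonicalized model responses).
    """
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    # Number of distinct outcomes seen exactly once.
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(samples)
```

For example, if seven queries to a model return `a, a, b, c, c, c, d`, two outcomes (`b` and `d`) are singletons, so the estimated missing mass is 2/7.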
nan
Article 1842
Title@2025-06-05 (4): Learning-Augmented Hierarchical Clustering
Title: Learning-Augmented Hierarchical Clustering | Learning-Augmented Hierarchical Clustering | 学习增强的等级集群 2506.05495v1 |
Authors (4): Vladimir Braverman, Jon C. Ergun, Chen Wang, Samson Zhou
Hierarchical clustering (HC) is an important data analysis technique in which the goal is to recursively partition a dataset into a tree-like structure while grouping together similar data points at each level of granularity. Unfortunately, for many of the proposed HC objectives, hardness-of-approximation results pose strong barriers to efficient approximation algorithms. Thus, we consider the problem of hierarchical clustering given auxiliary information from natural oracles. Specifically, we focus on a splitting oracle which, when provided with a triplet of vertices $(u,v,w)$, answers (possibly erroneously) the pairs of vertices whose lowest common ancestor includes all three vertices in an optimal tree, i.e., identifying which vertex "splits away" from the others. Using such an oracle, we obtain the following results: (i) a polynomial-time algorithm that outputs a hierarchical clustering tree with $O(1)$-approximation to the Dasgupta objective (Dasgupta [STOC’16]); (ii) a near-linear time algorithm that outputs a hierarchical clustering tree with $(1-o(1))$-approximation to the Moseley-Wang objective (Moseley and Wang [NeurIPS’17]). Under the plausible Small Set Expansion Hypothesis, no polynomial-time algorithm can achieve any constant approximation for Dasgupta’s objective or $(1-C)$-approximation for the Moseley-Wang objective for some constant $C>0$. As such, our results demonstrate that the splitting oracle enables algorithms to outperform standard HC approaches and overcome hardness constraints. Furthermore, our approaches extend to sublinear settings, in which we show new streaming and PRAM algorithms for HC with improved guarantees.
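For readers unfamiliar with the Dasgupta objective referenced above, it charges each pair of points its similarity weight times the number of leaves under the pair's lowest common ancestor, so similar points should be separated as low in the tree as possible. The sketch below evaluates that cost for a binary tree given as nested tuples; the tree encoding and the `frozenset`-keyed weight dictionary are assumptions chosen for the example, and the code evaluates a given tree rather than implementing the paper's oracle-based algorithms.

```python
def dasgupta_cost(tree, weight):
    """Dasgupta cost of a binary hierarchical clustering tree.

    tree:   a leaf label, or a 2-tuple (left_subtree, right_subtree).
    weight: dict mapping frozenset({i, j}) to the similarity of leaves i and j
            (missing pairs count as similarity 0).
    """
    def visit(node):
        # Returns (leaves under node, cost accumulated inside node's subtree).
        if not isinstance(node, tuple):
            return [node], 0.0
        left_leaves, left_cost = visit(node[0])
        right_leaves, right_cost = visit(node[1])
        size = len(left_leaves) + len(right_leaves)
        # Pairs split at this node have this node as their lowest common ancestor.
        cross = sum(
            weight.get(frozenset((i, j)), 0.0)
            for i in left_leaves
            for j in right_leaves
        )
        return left_leaves + right_leaves, left_cost + right_cost + size * cross
    return visit(tree)[1]
```

With similarities $w_{01}=3$, $w_{02}=1$, $w_{12}=1$, the tree `((0, 1), 2)` costs $3\cdot 2 + (1+1)\cdot 3 = 12$, while `((0, 2), 1)` costs $1\cdot 2 + (3+1)\cdot 3 = 14$, so merging the most similar pair first is cheaper.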
nan
Article 1843
Title@2025-06-05 (4): GENIUS: A Generative Framework for Universal Multimodal Search
Title: GENIUS: A Generative Framework for Universal Multimodal Search | GENIUS: Ein generatives Framework für die universelle multimodale Suche | GENNIUS: 通用多模式搜索创创框架 2503.19868v2 |
Authors (6): Sungyeon Kim, Xinliang Zhu, Xiaofan Lin, Muhammet Bastan, Douglas Gray, Suha Kwak
Generative retrieval is an emerging approach in information retrieval that generates identifiers (IDs) of target data based on a query, providing an efficient alternative to traditional embedding-based retrieval methods. However, existing models are task-specific and fall short of embedding-based retrieval in performance. This paper proposes GENIUS, a universal generative retrieval framework supporting diverse tasks across multiple modalities and domains. At its core, GENIUS introduces modality-decoupled semantic quantization, transforming multimodal data into discrete IDs encoding both modality and semantics. Moreover, to enhance generalization, we propose a query augmentation that interpolates between a query and its target, allowing GENIUS to adapt to varied query forms. Evaluated on the M-BEIR benchmark, it surpasses prior generative methods by a clear margin. Unlike embedding-based retrieval, GENIUS consistently maintains high retrieval speed across database sizes, with competitive performance across multiple benchmarks. With additional re-ranking, GENIUS often achieves results close to those of embedding-based methods while preserving efficiency.
nan
Article 1844
Title@2025-06-05 (4): Sentiment Analysis in Learning Management Systems Understanding Student Feedback at Scale
Title: Sentiment Analysis in Learning Management Systems Understanding Student Feedback at Scale | Sentiment Analysis in Lernmanagement-Systemen Verständnis des Feedbacks von Studierenden auf Scale | 学习管理系统对学习管理系统的情感分析 了解学生的大规模反馈 2506.05490v1 |
Authors (1): Mohammed Almutairi
In the wake of the Covid-19 pandemic, the educational paradigm has experienced a major shift from traditional in-person learning to online platforms. This change in learning conventions has affected teacher-student interaction, especially non-verbal communication. The absence of non-verbal communication has led to a reliance on verbal feedback, which has diminished the efficacy of the educational experience. This paper explores the integration of sentiment analysis into learning management systems (LMS) to bridge the student-teacher gap by offering an alternative approach to interpreting student feedback beyond its verbal context. The research involves data preparation, feature selection, and the development of a deep neural network model encompassing word embedding, LSTM, and attention mechanisms. This model is compared against a logistic regression baseline to evaluate its efficacy in understanding student feedback. The study aims to bridge the communication gap between instructors and students in online learning environments, offering insights into the emotional context of student feedback and ultimately improving the quality of online education.
nan
Article 1845
Title@2025-06-05 (4): Initial Model Incorporation for Deep Learning FWI: Pretraining or Denormalization?
Title: Initial Model Incorporation for Deep Learning FWI: Pretraining or Denormalization? | Ursprüngliche Modellinkorporation für Deep Learning FWI: Vor- oder Entnormierung? | FWI: 培训前或非正规化? 2506.05484v1 |
Authors (4): Ruihua Chen, Bangyu Wu, Meng Li, Kai Yang
Full waveform inversion (FWI) reparameterized by a neural network that represents the subsurface properties has emerged as an effective unsupervised learning framework, which can invert stably even with an inaccurate starting model. It updates the trainable neural network parameters instead of updating the subsurface model directly. There are primarily two ways to embed the prior knowledge of the initial model into neural networks: pretraining and denormalization. Pretraining first regulates the neural network's parameters by fitting the initial velocity model; denormalization directly adds the outputs of the network to the initial model without pretraining. In this letter, we systematically investigate the influence of these two ways of incorporating the initial model into neural network reparameterized FWI. We demonstrate that pretraining requires inverting the model perturbation based on a constant velocity value (mean) with a two-stage implementation. It leads to a complex workflow and inconsistency of objective functions in the two-stage process, causing the network parameters to become inactive and lose plasticity. Experimental results demonstrate that denormalization can simplify workflows, accelerate convergence, and enhance inversion accuracy compared with pretraining.
Article 1846
Title@2025-06-05 (4): ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
Title: ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting | ODE-GS: Latente ODEs für dynamische Szene Extrapolation mit 3D Gaussian Splatting | ODE-GS: 带有 3D Gaussian Splatting 3D 的动态场点外推法 2506.05480v1 |
Authors (5): Daniel Wang, Patrick Rim, Tian Tian, Alex Wong, Ganesh Sundaramoorthi
We present ODE-GS, a novel method that unifies 3D Gaussian Splatting with latent neural ordinary differential equations (ODEs) to forecast dynamic 3D scenes far beyond the time span seen during training. Existing neural rendering systems - whether NeRF- or 3DGS-based - embed time directly in a deformation network and therefore excel at interpolation but collapse when asked to predict the future, where timestamps are strictly out-of-distribution. ODE-GS eliminates this dependency: after learning a high-fidelity, time-conditioned deformation model for the training window, we freeze it and train a Transformer encoder that summarizes past Gaussian trajectories into a latent state whose continuous evolution is governed by a neural ODE. Numerical integration of this latent flow yields smooth, physically plausible Gaussian trajectories that can be queried at any future instant and rendered in real time. Coupled with a variational objective and a lightweight second-derivative regularizer, ODE-GS attains state-of-the-art extrapolation on D-NeRF and NVFI benchmarks, improving PSNR by up to 10 dB and halving perceptual error (LPIPS) relative to the strongest baselines. Our results demonstrate that continuous-time latent dynamics are a powerful, practical route to photorealistic prediction of complex 3D scenes.
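The phrase "numerical integration of this latent flow" can be pictured with a minimal sketch. The code below uses a fixed-step Euler integrator and a toy decay dynamics as a stand-in for a learned neural ODE; the names `euler_integrate` and `toy_dynamics`, the step count, and the choice of Euler (rather than whatever solver the authors use) are assumptions made purely for illustration.

```python
import math


def euler_integrate(dynamics, z0, t0, t1, steps):
    """Integrate dz/dt = dynamics(z) from t0 to t1 with fixed-step Euler."""
    z = list(z0)
    dt = (t1 - t0) / steps
    for _ in range(steps):
        dz = dynamics(z)
        z = [zi + dt * dzi for zi, dzi in zip(z, dz)]
    return z


def toy_dynamics(z):
    """Stand-in for a learned network: simple exponential decay dz/dt = -z."""
    return [-zi for zi in z]


# Latent state summarizing the past (made-up values), queried one time unit ahead.
z_past = [1.0, 2.0]
z_future = euler_integrate(toy_dynamics, z_past, t0=0.0, t1=1.0, steps=1000)

# For this toy system the exact solution is z_past * exp(-1).
exact = [zi * math.exp(-1.0) for zi in z_past]
```

Because the latent state evolves in continuous time, the same integrator can be stopped at any future instant, which is the property the abstract relies on for querying beyond the training window.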
Article 1847
Title@2025-06-05 (4): Learning-Augmented Algorithms for MTS with Bandit Access to Multiple Predictors
Title: Learning-Augmented Algorithms for MTS with Bandit Access to Multiple Predictors | Lerngesteigerte Algorithmen für MTS mit Bandit-Zugriff auf mehrere Vorhersager | 多边贸易体系的学习强化比值,使强盗能够接触多种预测者 2506.05479v1 |
Authors (2): Matei Gabriel Coşa, Marek Eliáš
We consider the following problem: We are given $\ell$ heuristics for Metrical Task Systems (MTS), where each might be tailored to a different type of input instances. While processing an input instance received online, we are allowed to query the action of only one of the heuristics at each time step. Our goal is to achieve performance comparable to the best of the given heuristics. The main difficulty of our setting comes from the fact that the cost paid by a heuristic at time $t$ cannot be estimated unless the same heuristic was also queried at time $t-1$. This is related to Bandit Learning against memory bounded adversaries (Arora et al., 2012). We show how to achieve regret of $O(\text{OPT}^{2/3})$ and prove a tight lower bound based on the construction of Dekel et al. (2013).
Article 1848
Title@2025-06-05 (4): Inference-Time Hyper-Scaling with KV Cache Compression
Title: Inference-Time Hyper-Scaling with KV Cache Compression | Inferenz-Zeit-Hyper-Skalierung mit KV-Cache-Kompression | 与 KV 缓存压缩一起的推断时间超大缩放 2506.05345v1 |
Authors (4): Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot, Edoardo M. Ponti
Inference-time scaling trades efficiency for increased reasoning accuracy by generating longer or more parallel sequences. However, in Transformer LLMs, generation cost is bottlenecked by the size of the key-value (KV) cache, rather than the number of generated tokens. Hence, we explore inference-time hyper-scaling: by compressing the KV cache, we can generate more tokens within the same compute budget and further improve the accuracy of scaled inference. The success of this approach, however, hinges on the ability of compression methods to preserve accuracy even at high compression ratios. To make hyper-scaling practical, we introduce Dynamic Memory Sparsification (DMS), a novel method for sparsifying KV caches that only requires 1K training steps to achieve 8$\times$ compression, while maintaining better accuracy than training-free sparse attention. Instead of prematurely discarding cached tokens, DMS delays token eviction, implicitly merging representations and preserving critical information. We demonstrate the effectiveness of inference-time hyper-scaling with DMS on multiple families of LLMs, showing that it boosts accuracy for comparable inference runtime and memory load. For instance, we enhance Qwen-R1 32B by an average of 9.1 points on AIME 24, 7.6 on GPQA, and 9.6 on LiveCodeBench across compute budgets.
Article 1849
Title@2025-06-05 (4): Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets
Title: Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets | Warum LLM Sicherheits-Guardrails nach Feinabstimmung zusammenbrechen: Eine Ähnlichkeitsanalyse zwischen Alignment- und Feinabstimmungs-Datensätzen | 为何LLM 安全护卫车在微调后倒塌:对准和微调数据集之间的相似性分析 2506.05346v1 |
Authors (7): Lei Hsiung, Tianyu Pang, Yung-Chen Tang, Linyue Song, Tsung-Yi Ho, Pin-Yu Chen, Yaoqing Yang
Recent advancements in large language models (LLMs) have underscored their vulnerability to safety alignment jailbreaks, particularly when subjected to downstream fine-tuning. However, existing mitigation strategies primarily focus on reactively addressing jailbreak incidents after safety guardrails have been compromised, removing harmful gradients during fine-tuning, or continuously reinforcing safety alignment throughout fine-tuning. As such, they tend to overlook a critical upstream factor: the role of the original safety-alignment data. This paper therefore investigates the degradation of safety guardrails through the lens of representation similarity between upstream alignment datasets and downstream fine-tuning tasks. Our experiments demonstrate that high similarity between these datasets significantly weakens safety guardrails, making models more susceptible to jailbreaks. Conversely, low similarity between these two types of datasets yields substantially more robust models and thus reduces harmfulness score by up to 10.33%. By highlighting the importance of upstream dataset design in the building of durable safety guardrails and reducing real-world vulnerability to jailbreak attacks, these findings offer actionable insights for fine-tuning service providers.
Article 1850
Title@2025-06-05 (4): Search Arena: Analyzing Search-Augmented LLMs
Title: Search Arena: Analyzing Search-Augmented LLMs | Sucharena: Analyse von Such-Augmentierten LLMs | 搜索竞技场: 分析搜索推荐LMS 2506.05334v1 |
Authors (11): Mihran Miroyan, Tsung-Han Wu, Logan King, Tianle Li, Jiayi Pan, Xinyan Hu, Wei-Lin Chiang, Anastasios N. Angelopoulos, Trevor Darrell, Narges Norouzi, Joseph E. Gonzalez
Search-augmented language models combine web search with Large Language Models (LLMs) to improve response groundedness and freshness. However, analyzing these systems remains challenging: existing datasets are limited in scale and narrow in scope, often constrained to static, single-turn, fact-checking questions. In this work, we introduce Search Arena, a crowd-sourced, large-scale, human-preference dataset of over 24,000 paired multi-turn user interactions with search-augmented LLMs. The dataset spans diverse intents and languages, and contains full system traces with around 12,000 human preference votes. Our analysis reveals that user preferences are influenced by the number of citations, even when the cited content does not directly support the attributed claims, uncovering a gap between perceived and actual credibility. Furthermore, user preferences vary across cited sources, revealing that community-driven platforms are generally preferred and static encyclopedic sources are not always appropriate and reliable. To assess performance across different settings, we conduct cross-arena analyses by testing search-augmented LLMs in a general-purpose chat environment and conventional LLMs in search-intensive settings. We find that web search does not degrade and may even improve performance in non-search settings; however, the quality in search settings is significantly affected if solely relying on the model’s parametric knowledge. We open-sourced the dataset to support future research in this direction. Our dataset and code are available at: https://github.com/lmarena/search-arena.
Article 1851
Title@2025-06-05 (4): AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
Title: AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning | AceReason-Nemotron: Mathematische und Code-Reasonierung durch Stärkungslernen | AceReson-Nepron:通过强化学习推进数学和守则 2505.16400v3 |
Authors (8): Yang Chen, Zhuolin Yang, Zihan Liu, Chankyu Lee, Peng Xu, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping
Despite recent progress in large-scale reinforcement learning (RL) for reasoning, the training recipe for building high-performing reasoning models remains elusive. Key implementation details of frontier models, such as DeepSeek-R1, including data curation strategies and RL training recipe, are often omitted. Moreover, recent research indicates distillation remains more effective than RL for smaller models. In this work, we demonstrate that large-scale RL can significantly enhance the reasoning capabilities of strong, small- and mid-sized models, achieving results that surpass those of state-of-the-art distillation-based models. We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first training on math-only prompts, then on code-only prompts. Notably, we find that math-only RL not only significantly enhances the performance of strong distilled models on math benchmarks (e.g., +14.6% / +17.2% on AIME 2025 for the 7B / 14B models), but also code reasoning tasks (e.g., +6.8% / +5.8% on LiveCodeBench for the 7B / 14B models). In addition, extended code-only RL iterations further improve performance on code benchmarks with minimal or no degradation in math results. We develop a robust data curation pipeline to collect challenging prompts with high-quality, verifiable answers and test cases to enable verification-based RL across both domains. Finally, we identify key experimental insights, including curriculum learning with progressively increasing response lengths and the stabilizing effect of on-policy parameter updates. We find that RL not only elicits the foundational reasoning capabilities acquired during pretraining and supervised fine-tuning (e.g., distillation), but also pushes the limits of the model’s reasoning ability, enabling it to solve problems that were previously unsolvable.
Article 1852
Title@2025-06-05 (4): Zeroth-Order Optimization Finds Flat Minima
Title: Zeroth-Order Optimization Finds Flat Minima | Nullte-Order-Optimierung findet flache Minima | 零阶优化发现平底微米 2506.05454v1 |
Authors (6): Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Michael Muehlebach, Niao He
Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization theory focuses on convergence to an arbitrary stationary point, but less is known about the implicit regularization that provides a fine-grained characterization of which particular solutions are finally reached. We show that zeroth-order optimization with the standard two-point estimator favors solutions with a small trace of the Hessian, a quantity widely used in previous work to distinguish between sharp and flat minima. We further provide convergence rates of zeroth-order optimization to approximate flat minima for convex and sufficiently smooth functions, where flat minima are defined as the minimizers that achieve the smallest trace of the Hessian among all optimal solutions. Experiments on binary classification tasks with convex losses and language model fine-tuning support our theoretical findings.
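For readers unfamiliar with the two-point estimator mentioned above, here is a minimal sketch of one common form: probe the function at two symmetric points along a random Gaussian direction and scale that direction by the finite difference. The smoothing radius `mu`, the step size, the iteration count, and the quadratic test function are illustrative assumptions, not settings from the paper.

```python
import random


def two_point_gradient(f, x, mu=1e-3, rng=random):
    """Estimate the gradient of f at x using only two function evaluations."""
    u = [rng.gauss(0.0, 1.0) for _ in x]                # random probe direction
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    slope = (f(x_plus) - f(x_minus)) / (2.0 * mu)       # directional finite difference
    return [slope * ui for ui in u]


def quadratic(x):
    """Toy convex objective with its minimum at the origin."""
    return sum(xi * xi for xi in x)


rng = random.Random(0)
x = [1.0, -2.0, 0.5]
for _ in range(2000):
    g = two_point_gradient(quadratic, x, rng=rng)
    x = [xi - 0.01 * gi for xi, gi in zip(x, g)]        # plain gradient-style update

final_value = quadratic(x)
```

No derivative of `quadratic` is ever computed; the update direction comes entirely from function values, which is what makes the method usable when gradients are unavailable.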
Article 1853
Title@2025-06-05 (4): The broader spectrum of in-context learning
Title: The broader spectrum of in-context learning | Das breitere Spektrum des In-Context-Lernens | 广义的内通学习 2412.03782v3 |
Authors (4): Andrew Kyle Lampinen, Stephanie C. Y. Chan, Aaditya K. Singh, Murray Shanahan
The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning within a much broader spectrum of meta-learned in-context learning. Indeed, we suggest that any distribution of sequences in which context non-trivially decreases loss on subsequent predictions can be interpreted as eliciting a kind of in-context learning. We suggest that this perspective helps to unify the broad set of in-context abilities that language models exhibit – such as adapting to tasks from instructions or role play, or extrapolating time series. This perspective also sheds light on potential roots of in-context learning in lower-level processing of linguistic dependencies (e.g. coreference or parallel structures). Finally, taking this perspective highlights the importance of generalization, which we suggest can be studied along several dimensions: not only the ability to learn something novel, but also flexibility in learning from different presentations, and in applying what is learned. We discuss broader connections to past literature in meta-learning and goal-conditioned agents, and other perspectives on learning and adaptation. We close by suggesting that research on in-context learning should consider this broader spectrum of in-context capabilities and types of generalization.
Article 1854
Title@2025-06-05 (4): Admissibility of Completely Randomized Trials: A Large-Deviation Approach
Title: Admissibility of Completely Randomized Trials: A Large-Deviation Approach | Zulässigkeit von völlig randomisierten Prüfungen: Ein Ansatz mit großer Abweichung | 完全随机审理的可受理性:一个大型减缓办法 2506.05329v1 |
Authors (3): Guido Imbens, Chao Qin, Stefan Wager
When an experimenter has the option of running an adaptive trial, is it admissible to ignore this option and run a non-adaptive trial instead? We provide a negative answer to this question in the best-arm identification problem, where the experimenter aims to allocate measurement efforts judiciously to confidently deploy the most effective treatment arm. We find that, whenever there are at least three treatment arms, there exist simple adaptive designs that universally and strictly dominate non-adaptive completely randomized trials. This dominance is characterized by a notion called efficiency exponent, which quantifies a design’s statistical efficiency when the experimental sample is large. Our analysis focuses on the class of batched arm elimination designs, which progressively eliminate underperforming arms at pre-specified batch intervals. We characterize simple sufficient conditions under which these designs universally and strictly dominate completely randomized trials. These results resolve the second open problem posed in Qin [2022].
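To make "batched arm elimination" concrete, the sketch below simulates a generic version: each batch is split evenly among the arms still active, and the arm with the lowest empirical mean is dropped at the end of each batch. The Bernoulli outcomes, the even split, the drop-one-per-batch rule, and all names and numbers are assumptions for illustration; the specific designs analyzed in the paper may differ.

```python
import random


def batched_arm_elimination(arm_means, batch_size, rng):
    """Return (selected arm index, pulls per arm) after eliminating one arm per batch."""
    n_arms = len(arm_means)
    active = list(range(n_arms))
    successes = [0] * n_arms
    pulls = [0] * n_arms
    while len(active) > 1:
        per_arm = batch_size // len(active)             # even split among surviving arms
        for arm in active:
            for _ in range(per_arm):
                if rng.random() < arm_means[arm]:       # simulated Bernoulli outcome
                    successes[arm] += 1
                pulls[arm] += 1
        # Drop the arm with the lowest empirical success rate so far.
        worst = min(active, key=lambda a: successes[a] / pulls[a])
        active.remove(worst)
    return active[0], pulls


# Three hypothetical treatment arms with well-separated success probabilities.
best_arm, pulls = batched_arm_elimination([0.1, 0.5, 0.9], batch_size=300,
                                          rng=random.Random(0))
```

Compared with a completely randomized trial of the same size, the second batch here concentrates measurements on the two arms that are still plausible winners, which is the intuition behind adaptive designs of this kind.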
Article 1855
Title@2025-06-05 (4): Seeing the Invisible: Machine learning-Based QPI Kernel Extraction via Latent Alignment
Title: Seeing the Invisible: Machine learning-Based QPI Kernel Extraction via Latent Alignment | Das Unsichtbare sehen: Maschinelles Lernen basierte QPI-Kernel-Extraktion über Latent Alignment | 可见:机器学习 QPI Kernel 采掘,通过Lient对齐 2506.05325v1 |
Authors (6): Yingshuai Ji, Haomin Zhuang, Matthew Toole, James McKenzie, Xiaolong Liu, Xiangliang Zhang
Quasiparticle interference (QPI) imaging is a powerful tool for probing electronic structures in quantum materials, but extracting the single-scatterer QPI pattern (i.e., the kernel) from a multi-scatterer image remains a fundamentally ill-posed inverse problem. In this work, we propose the first AI-based framework for QPI kernel extraction. We introduce a two-step learning strategy that decouples kernel representation learning from observation-to-kernel inference. In the first step, we train a variational autoencoder to learn a compact latent space of scattering kernels. In the second step, we align the latent representation of QPI observations with those of the pre-learned kernels using a dedicated encoder. This design enables the model to infer kernels robustly even under complex, entangled scattering conditions. We construct a diverse and physically realistic QPI dataset comprising 100 unique kernels and evaluate our method against a direct one-step baseline. Experimental results demonstrate that our approach achieves significantly higher extraction accuracy, and improved generalization to unseen kernels.
Article 1856
Title@2025-06-05 (4): LSM-2: Learning from Incomplete Wearable Sensor Data
Title: LSM-2: Learning from Incomplete Wearable Sensor Data | LSM-2: Lernen von unvollständigen Wearable Sensordaten | LSM-2:学习不全穿感应器数据 2506.05321v1 |
Authors (25): Maxwell A. Xu, Girish Narayanswamy, Kumar Ayush, Dimitris Spathis, Shun Liao, Shyam A. Tailor, Ahmed Metwally, A. Ali Heydari, Yuwei Zhang, Jake Garrison, Samy Abdel-Ghaffar, Xuhai Xu, Ken Gu, Jacob Sunshine, Ming-Zher Poh, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Mark Malhotra, Shwetak Patel, Yuzhe Yang, James M. Rehg, Xin Liu, Daniel McDuff
Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM), a novel SSL approach that learns robust representations directly from incomplete data without requiring explicit imputation. AIM’s core novelty lies in its use of learnable mask tokens to model both existing (“inherited”) and artificially introduced missingness, enabling it to robustly handle fragmented real-world data during inference. Pre-trained on an extensive dataset of 40M hours of day-long multimodal sensor data, our LSM-2 with AIM achieves the best performance across a diverse range of tasks, including classification, regression and generative modeling. Furthermore, LSM-2 with AIM exhibits superior scaling performance, and critically, maintains high performance even under targeted missingness scenarios, reflecting clinically coherent patterns, such as the diagnostic value of nighttime biosignals for hypertension prediction. This makes AIM a more reliable choice for real-world wearable data applications.
Article 1857
Title@2025-06-05 (4): Generalizable, real-time neural decoding with hybrid state-space models
Title: Generalizable, real-time neural decoding with hybrid state-space models | Verallgemeinerbare, Echtzeit-Neuraldekodierung mit hybriden State-Space-Modellen | 与混合状态空间模型的通用实时神经解码 2506.05320v1 |
Authors (7): Avery Hee-Woon Ryoo, Nanda H. Krishna, Ximeng Mao, Mehdi Azabou, Eva L. Dyer, Matthew G. Perich, Guillaume Lajoie
Real-time decoding of neural activity is central to neuroscience and neurotechnology applications, from closed-loop experiments to brain-computer interfaces, where models are subject to strict latency constraints. Traditional methods, including simple recurrent neural networks, are fast and lightweight but often struggle to generalize to unseen data. In contrast, recent Transformer-based approaches leverage large-scale pretraining for strong generalization performance, but typically have much larger computational requirements and are not always suitable for low-resource or real-time settings. To address these shortcomings, we present POSSM, a novel hybrid architecture that combines individual spike tokenization via a cross-attention module with a recurrent state-space model (SSM) backbone to enable (1) fast and causal online prediction on neural activity and (2) efficient generalization to new sessions, individuals, and tasks through multi-dataset pretraining. We evaluate POSSM’s decoding performance and inference speed on intracortical decoding of monkey motor tasks, and show that it extends to clinical applications, namely handwriting and speech decoding in human subjects. Notably, we demonstrate that pretraining on monkey motor-cortical recordings improves decoding performance on the human handwriting task, highlighting the exciting potential for cross-species transfer. In all of these tasks, we find that POSSM achieves decoding accuracy comparable to state-of-the-art Transformers, at a fraction of the inference cost (up to 9x faster on GPU). These results suggest that hybrid SSMs are a promising approach to bridging the gap between accuracy, inference speed, and generalization when training neural decoders for real-time, closed-loop applications.
Article 1858
Title@2025-06-05 (4): Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay
Title: Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay | Verbesserung der Dateneffizienz für LLM-Verstärkung Feinabstimmung durch problemorientierte Online-Datenauswahl und Rollout-Replay | 提高LLLM数据效率,通过困难目标在线数据选择和推出重播提高LLM强化微调的数据效率 2506.05316v1 |
Authors (7): Yifan Sun, Jingyan Shen, Yibin Wang, Tianyu Chen, Zhendong Wang, Mingyuan Zhou, Huan Zhang
Reinforcement learning (RL) has become an effective approach for fine-tuning large language models (LLMs), particularly to enhance their reasoning capabilities. However, RL fine-tuning remains highly resource-intensive, and existing work has largely overlooked the problem of data efficiency. In this paper, we propose two techniques to improve data efficiency in LLM RL fine-tuning: difficulty-targeted online data selection and rollout replay. We introduce the notion of adaptive difficulty to guide online data selection, prioritizing questions of moderate difficulty that are more likely to yield informative learning signals. To estimate adaptive difficulty efficiently, we develop an attention-based framework that requires rollouts for only a small reference set of questions. The adaptive difficulty of the remaining questions is then estimated based on their similarity to this set. To further reduce rollout cost, we introduce a rollout replay mechanism that reuses recent rollouts, lowering per-step computation while maintaining stable updates. Extensive experiments across 6 LLM-dataset combinations show that our method reduces RL fine-tuning time by 25% to 65% to reach the same level of performance as the original GRPO algorithm.
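The selection step described above, prioritizing questions of moderate difficulty, might look like the following sketch. It assumes difficulty is summarized by an estimated pass rate per question and that "moderate" means closest to a target of 0.5; the function name, the target value, and the example numbers are hypothetical, and the attention-based difficulty estimator from the paper is not reproduced here.

```python
def select_moderate_questions(estimated_pass_rate, k, target=0.5):
    """Pick the k questions whose estimated pass rate is closest to `target`."""
    ranked = sorted(estimated_pass_rate,
                    key=lambda q: abs(estimated_pass_rate[q] - target))
    return ranked[:k]


# Made-up estimates: q1 is nearly always failed, q4 nearly always solved.
estimated_pass_rate = {"q1": 0.05, "q2": 0.45, "q3": 0.60, "q4": 0.98}
selected = select_moderate_questions(estimated_pass_rate, k=2)
```

Questions that are almost always solved or almost always failed yield rollouts with nearly identical rewards, so they carry little learning signal; that is the rationale the abstract gives for favoring the middle of the range.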
Article 1859
Title@2025-06-05 (4): Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models
Title: Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models | Engropisches Lernen eingeschränkt: Ein primäres Rahmenwerk für große Sprachmodelle | 未学习:大语言模式的原始-双重框架 2506.05314v1 |
Authors (5): Taha Entesari, Arman Hatami, Rinat Khaziev, Anil Ramakrishna, Mahyar Fazlyab
Large Language Models (LLMs) deployed in real-world settings increasingly face the need to unlearn sensitive, outdated, or proprietary information. Existing unlearning methods typically formulate forgetting and retention as a regularized trade-off, combining both objectives into a single scalarized loss. This often leads to unstable optimization and degraded performance on retained data, especially under aggressive forgetting. We propose a new formulation of LLM unlearning as a constrained optimization problem: forgetting is enforced via a novel logit-margin flattening loss that explicitly drives the output distribution toward uniformity on a designated forget set, while retention is preserved through a hard constraint on a separate retain set. Compared to entropy-based objectives, our loss is softmax-free, numerically stable, and maintains non-vanishing gradients, enabling more efficient and robust optimization. We solve the constrained problem using a scalable primal-dual algorithm that exposes the trade-off between forgetting and retention through the dynamics of the dual variable. Evaluations on the TOFU and MUSE benchmarks across diverse LLM architectures demonstrate that our approach consistently matches or exceeds state-of-the-art baselines, effectively removing targeted information while preserving downstream utility.
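The primal-dual idea can be shown on a toy problem that has nothing to do with language models: minimize (x - 2)^2 subject to x <= 1. Here the objective loosely plays the role of the forgetting loss and the constraint the role of the retention requirement; that mapping, the step size, and the function name are assumptions for illustration only, and the paper's logit-margin flattening loss is not implemented.

```python
def primal_dual(steps=5000, lr=0.05):
    """Gradient descent on x and projected gradient ascent on the multiplier lam."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * (x - 2.0) + lam        # d/dx of (x - 2)^2 + lam * (x - 1)
        violation = x - 1.0                   # positive when the constraint is violated
        # Simultaneous update; the multiplier is kept non-negative.
        x, lam = x - lr * grad_x, max(0.0, lam + lr * violation)
    return x, lam


x_star, lam_star = primal_dual()
```

The iterates settle near x = 1 with multiplier 2: the dual variable grows while the constraint is violated and stops growing once it is met, so its value indicates how strongly the constraint is pulling against the objective.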
Article 1860
Title@2025-06-05 (4): Learning normalized image densities via dual score matching
Title: Learning normalized image densities via dual score matching | Normalisierte Bilddichten durch Dual-Score-Matching lernen | 通过双分比比对学习图像密度正常化 2506.05310v1 |
Authors (3): Florentin Guth, Zahra Kadkhodaie, Eero P Simoncelli
Learning probability models from data is at the heart of many machine learning endeavors, but is notoriously difficult due to the curse of dimensionality. We introduce a new framework for learning \emph{normalized} energy (log probability) models that is inspired by diffusion generative models, which rely on networks optimized to estimate the score. We modify a score network architecture to compute an energy while preserving its inductive biases. The gradient of this energy network with respect to its input image is the score of the learned density, which can be optimized using a denoising objective. Importantly, the gradient with respect to the noise level provides an additional score that can be optimized with a novel secondary objective, ensuring consistent and normalized energies across noise levels. We train an energy network with this \emph{dual} score matching objective on the ImageNet64 dataset, and obtain a cross-entropy (negative log likelihood) value comparable to the state of the art. We further validate our approach by showing that our energy model \emph{strongly generalizes}: estimated log probabilities are nearly independent of the specific images in the training set. Finally, we demonstrate that both image probability and dimensionality of local neighborhoods vary significantly with image content, in contrast with traditional assumptions such as concentration of measure or support on a low-dimensional manifold.
Article 1861
Title@2025-06-05 (4): ProRefine: Inference-time Prompt Refinement with Textual Feedback
Title: ProRefine: Inference-time Prompt Refinement with Textual Feedback | ProRefine: Inferenz-Zeit Prompt Verfeinerung mit Text-Feedback | ProRefine: 用文字反馈迅速改进推论-时间 2506.05305v1 |
Authors (5): Deepak Pandita, Tharindu Cyril Weerasooriya, Ankit Parag Shah, Christopher M. Homan, Wei Wei
Agentic workflows, where multiple AI agents collaborate to accomplish complex tasks like reasoning or planning, are becoming increasingly prevalent. However, these workflows often suffer from error propagation and sub-optimal performance, largely due to poorly designed prompts that fail to effectively guide individual agents. This is a critical problem because it limits the reliability and scalability of these powerful systems. We introduce ProRefine, an innovative inference-time prompt optimization method that leverages textual feedback from large language models (LLMs) to address this challenge. ProRefine dynamically refines prompts for multi-step reasoning tasks without additional training or ground truth labels. Evaluated on five benchmark mathematical reasoning datasets, ProRefine significantly surpasses zero-shot Chain-of-Thought baselines by 3 to 37 percentage points. This approach not only boosts accuracy but also allows smaller models to match the performance of larger ones, highlighting its potential for efficient and scalable AI deployment, and democratizing access to high-performing AI.
Article 1862
Title@2025-06-05 (4): Power Law Guided Dynamic Sifting for Efficient Attention
Title: Power Law Guided Dynamic Sifting for Efficient Attention | Power Law geführtes Dynamic Sifting für effiziente Aufmerksamkeit | 有效关注动力法指导动态筛选 2506.05300v1 |
Authors (3): Nirav Koley, Prajwal Singhania, Abhinav Bhatele
Efficient inference on GPUs using large language models remains challenging due to memory bandwidth limitations, particularly during data transfers between High Bandwidth Memory (HBM) and SRAM in attention computations. Approximate attention methods address this issue by reducing computational and memory overhead but often rely on expensive top-$k$ operations, which perform poorly on GPUs. We propose SiftAttention, a novel approximate attention method that replaces the top-$k$ step with a computationally efficient element-wise filtering operation based on a threshold value. Our intuition for doing this is based on our empirical observation that the $\tau$-th quantile of attention scores follows a predictable power-law over sequential generation steps. Exploiting this insight, our approach dynamically estimates a threshold value per prompt at each generation step. Only attention scores above this threshold and their corresponding value vectors are loaded/used to compute the attention output, reducing data movement between HBM and SRAM. Our evaluation demonstrates that SiftAttention preserves model quality better than existing approximate attention methods while reducing memory bandwidth usage when loading value vectors.
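A small sketch may help contrast threshold filtering with a top-k step. The code below keeps only the entries whose score exceeds a threshold and computes attention over that subset for a single query. Several choices are assumptions rather than details from the paper: filtering is applied to pre-softmax scores, the kept weights are renormalized, the power law has the generic form a * step ** b with hypothetical coefficients, and the single highest score is used as a fallback when nothing passes the threshold.

```python
import math


def power_law_threshold(step, a, b):
    """Hypothetical threshold schedule: a * step ** b, with a and b fitted elsewhere."""
    return a * step ** b


def threshold_attention(scores, values, threshold):
    """Attention output for one query using only entries with score > threshold."""
    kept = [i for i, s in enumerate(scores) if s > threshold]   # element-wise filter
    if not kept:                                                # fallback: best entry only
        kept = [max(range(len(scores)), key=scores.__getitem__)]
    top = max(scores[i] for i in kept)
    weights = [math.exp(scores[i] - top) for i in kept]         # numerically stable softmax
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, kept):
        for d, v in enumerate(values[i]):                       # only kept value rows are read
            out[d] += (w / total) * v
    return out, kept


scores = [2.0, 0.1, 1.5, -1.0]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
output, kept = threshold_attention(scores, values, threshold=1.0)
```

The filter is a single comparison per score with no sorting or selection pass, and value rows that fail the test are never touched, which corresponds to the reduced data movement the abstract describes.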
Article 1863
Title@2025-06-05 (4): Control Tax: The Price of Keeping AI in Check
Title: Control Tax: The Price of Keeping AI in Check | Steuerregelung: Der Preis der AII-Kontrolle | 控制税:检查人工智能检查价格 2506.05296v1 |
Authors (4): Mikhail Terekhov, Zhen Ning David Liu, Caglar Gulcehre, Samuel Albanie
The rapid integration of agentic AI into high-stakes real-world applications requires robust oversight mechanisms. The emerging field of AI Control (AIC) aims to provide such an oversight mechanism, but practical adoption depends heavily on implementation overhead. To study this problem better, we introduce the notion of Control tax – the operational and financial cost of integrating control measures into AI pipelines. Our work makes three key contributions to the field of AIC: (1) we introduce a theoretical framework that quantifies the Control Tax and maps classifier performance to safety assurances; (2) we conduct comprehensive evaluations of state-of-the-art language models in adversarial settings, where attacker models insert subtle backdoors into code while monitoring models attempt to detect these vulnerabilities; and (3) we provide empirical financial cost estimates for control protocols and develop optimized monitoring strategies that balance safety and cost-effectiveness while accounting for practical constraints like auditing budgets. Our framework enables practitioners to make informed decisions by systematically connecting safety guarantees with their costs, advancing AIC through principled economic feasibility assessment across different deployment contexts.
Article 1864
Title@2025-06-05 (4): A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search
Title: A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search | Ein glattes Meer hat nie eine geschickte $\texttt{SAILOR}$: Robuste Imitation über das Lernen zu suchen | 平滑的海海从未造出技能用$\ textt{ SAILOR}$: 通过学习搜索进行强力模仿 2506.05294v1 |
Authors (8): Arnav Kumar Jain, Vibhakar Mohta, Subin Kim, Atiksh Bhardwaj, Juntao Ren, Yunhai Feng, Sanjiban Choudhury, Gokul Swamy
The fundamental limitation of the behavioral cloning (BC) approach to imitation learning is that it only teaches an agent what the expert did at states the expert visited. This means that when a BC agent makes a mistake which takes them out of the support of the demonstrations, they often don’t know how to recover from it. In this sense, BC is akin to giving the agent the fish – giving them dense supervision across a narrow set of states – rather than teaching them to fish: to be able to reason independently about achieving the expert’s outcome even when faced with unseen situations at test-time. In response, we explore learning to search (L2S) from expert demonstrations, i.e. learning the components required to, at test time, plan to match expert outcomes, even after making a mistake. These include (1) a world model and (2) a reward model. We carefully ablate the set of algorithmic and design decisions required to combine these and other components for stable and sample/interaction-efficient learning of recovery behavior without additional human corrections. Across a dozen visual manipulation tasks from three benchmarks, our approach $\texttt{SAILOR}$ consistently out-performs state-of-the-art Diffusion Policies trained via BC on the same data. Furthermore, scaling up the amount of demonstrations used for BC by 5-10$\times$ still leaves a performance gap. We find that $\texttt{SAILOR}$ can identify nuanced failures and is robust to reward hacking. Our code is available at https://github.com/arnavkj1995/SAILOR .
Article 1865
Title@2025-06-05 (4): Learning Beyond Experience: Generalizing to Unseen State Space with Reservoir Computing
Title: Learning Beyond Experience: Generalizing to Unseen State Space with Reservoir Computing | Learning Beyond Experience: Mit Reservoir Computing den Zustand des Weltraums verallgemeinern | 超越经验的学习:利用储量计算,向不见得的国家空间推广 2506.05292v1 |
Authors (3): Declan A. Norton, Yuanzhao Zhang, Michelle Girvan
Machine learning techniques offer an effective approach to modeling dynamical systems solely from observed data. However, without explicit structural priors – built-in assumptions about the underlying dynamics – these techniques typically struggle to generalize to aspects of the dynamics that are poorly represented in the training data. Here, we demonstrate that reservoir computing – a simple, efficient, and versatile machine learning framework often used for data-driven modeling of dynamical systems – can generalize to unexplored regions of state space without explicit structural priors. First, we describe a multiple-trajectory training scheme for reservoir computers that supports training across a collection of disjoint time series, enabling effective use of available training data. Then, applying this training scheme to multistable dynamical systems, we show that RCs trained on trajectories from a single basin of attraction can achieve out-of-domain generalization by capturing system behavior in entirely unobserved basins.
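The multiple-trajectory idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a standard echo-state reservoir, a per-trajectory state reset with a washout period, and a ridge-regression readout, all of which are hypothetical choices (as are the names `run_reservoir`, `train_multi_trajectory`, and the toy trajectories).

```python
import numpy as np

def run_reservoir(inputs, W_in, W):
    """Drive the reservoir from a zero state and return the state at every step."""
    r = np.zeros(W.shape[0])
    states = []
    for u_t in inputs:
        r = np.tanh(W @ r + W_in @ u_t)  # assumed tanh update, no leak term
        states.append(r)
    return np.array(states)

def train_multi_trajectory(trajectories, W_in, W, washout, ridge):
    """Fit one readout on states pooled from several disjoint time series."""
    R, Y = [], []
    for traj in trajectories:
        # Restart the reservoir for each trajectory so no state leaks across series.
        states = run_reservoir(traj[:-1], W_in, W)
        R.append(states[washout:])        # drop the initial transient
        Y.append(traj[1:][washout:])      # one-step-ahead targets
    R, Y = np.vstack(R), np.vstack(Y)
    # Ridge regression: W_out = (R^T R + ridge * I)^{-1} R^T Y, transposed.
    return np.linalg.solve(R.T @ R + ridge * np.eye(R.shape[1]), R.T @ Y).T

rng = np.random.default_rng(0)
n_res, dim = 50, 2
W_in = rng.uniform(-0.5, 0.5, size=(n_res, dim))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius 0.9

t = np.linspace(0, 20, 400)
traj_a = np.stack([np.sin(t), np.cos(t)], axis=1)
traj_b = 0.5 * np.stack([np.sin(t + 1.0), np.cos(t + 1.0)], axis=1)
W_out = train_multi_trajectory([traj_a, traj_b], W_in, W, washout=50, ridge=1e-6)
```

Pooling the post-washout states before the single linear solve is what lets short, disconnected series contribute to one readout.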
Article 1866
Title@2025-06-05 (4): Stable Vision Concept Transformers for Medical Diagnosis
Title: Stable Vision Concept Transformers for Medical Diagnosis | Stabile Vision Konzepttransformatoren für medizinische Diagnose | 医学诊断的稳定远景构想变异器 2506.05286v1 |
Authors (6): Lijie Hu, Songning Lai, Yuan Hua, Shu Yang, Jingfeng Zhang, Di Wang
Transparency is a paramount concern in the medical field, prompting researchers to delve into the realm of explainable AI (XAI). Among these XAI methods, Concept Bottleneck Models (CBMs) aim to restrict the model’s latent space to human-understandable high-level concepts by generating a conceptual layer for extracting conceptual features, which has drawn much attention recently. However, existing methods rely solely on concept features to determine the model’s predictions, which overlooks the intrinsic feature embeddings within medical images. To address this utility gap between the original models and concept-based models, we propose Vision Concept Transformer (VCT). Furthermore, despite their benefits, CBMs have been found to negatively impact model performance and fail to provide stable explanations when faced with input perturbations, which limits their application in the medical field. To address this faithfulness issue, this paper further proposes the Stable Vision Concept Transformer (SVCT) based on VCT, which leverages the vision transformer (ViT) as its backbone and incorporates a conceptual layer. SVCT employs conceptual features to enhance decision-making capabilities by fusing them with image features and ensures model faithfulness through the integration of Denoised Diffusion Smoothing. Comprehensive experiments on four medical datasets demonstrate that our VCT and SVCT maintain accuracy while remaining interpretable compared to baselines. Furthermore, even when subjected to perturbations, our SVCT model consistently provides faithful explanations, thus meeting the needs of the medical field.
Article 1867
Title@2025-06-05 (4): Fast-DataShapley: Neural Modeling for Training Data Valuation
Title: Fast-DataShapley: Neural Modeling for Training Data Valuation | Fast-DataShapley: Neurale Modellierung für Trainingsdatenbewertung | 快速数据库:培训数据评估神经模型 2506.05281v1 |
Authors (7): Haifeng Sun, Yu Xiong, Runze Wu, Xinyu Cai, Changjie Fan, Lan Zhang, Xiang-Yang Li
The value and copyright of training data are crucial in the artificial intelligence industry. Service platforms should protect data providers’ legitimate rights and fairly reward them for their contributions. Shapley value, a potent tool for evaluating contributions, outperforms other methods in theory, but its computational overhead escalates exponentially with the number of data providers. Recent works based on Shapley values attempt to mitigate computation complexity by approximation algorithms. However, they need to retrain for each test sample, leading to intolerable costs. We propose Fast-DataShapley, a one-pass training method that leverages the weighted least squares characterization of the Shapley value to train a reusable explainer model with real-time reasoning speed. Given new test samples, no retraining is required to calculate the Shapley values of the training data. Additionally, we propose three methods with theoretical guarantees to reduce training overhead from two aspects: the approximate calculation of the utility function and the group calculation of the training data. We analyze time complexity to show the efficiency of our methods. The experimental evaluations on various image datasets demonstrate superior performance and efficiency compared to baselines. Specifically, the performance is improved to more than 2.5 times, and the explainer’s training speed can be increased by two orders of magnitude.
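To make the exponential cost mentioned above concrete, here is a minimal sketch of the exact Shapley value computed by enumerating every coalition. It is not Fast-DataShapley itself; the provider names and the `toy_utility` function are hypothetical stand-ins for "model performance when trained on this subset of providers".

```python
from itertools import combinations
from math import factorial

def shapley_values(players, utility):
    """Exact Shapley values; visits all 2^(n-1) coalitions for each player."""
    n = len(players)
    values = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for size in range(n):
            # Probability that exactly `size` specific others precede p in a random order.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                values[p] += weight * (utility(s | {p}) - utility(s))
    return values

def toy_utility(coalition):
    """Hypothetical utility: additive contributions plus a synergy between a and b."""
    base = {"a": 1.0, "b": 2.0, "c": 3.0}
    bonus = 1.5 if {"a", "b"} <= coalition else 0.0
    return sum(base[p] for p in coalition) + bonus

values = shapley_values(["a", "b", "c"], toy_utility)
print(values)
```

The synergy bonus is split evenly between `a` and `b`, and the values sum to the utility of the full coalition. The nested loops are the part that stops scaling as the number of providers grows, which is the cost that approximation and amortized approaches aim to avoid.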
Article 1868
Title@2025-06-05 (4): How to Unlock Time Series Editing? Diffusion-Driven Approach with Multi-Grained Control
Title: How to Unlock Time Series Editing? Diffusion-Driven Approach with Multi-Grained Control | Wie man Zeitreihenbearbeitung entsperrt? Diffusion-getriebener Ansatz mit mehrkörniger Steuerung | 如何解锁时间序列编辑 ? 2506.05276v1 |
Authors (7): Hao Yu, Chu Xin Cheng, Runlong Yu, Yuyang Ye, Shiwei Tong, Zhaofeng Liu, Defu Lian
Recent advances in time series generation have shown promise, yet controlling properties in generated sequences remains challenging. Time Series Editing (TSE), which makes precise modifications while preserving temporal coherence, must handle both point-level constraints and segment-level controls, which current methods struggle to provide. We introduce the CocktailEdit framework to enable simultaneous, flexible control across different types of constraints. This framework combines two key mechanisms: a confidence-weighted anchor control for point-wise constraints and a classifier-based control for managing statistical properties such as sums and averages over segments. Our methods achieve precise local control during the denoising inference stage while maintaining temporal coherence and integrating seamlessly with any conditionally trained diffusion-based time series model. Extensive experiments across diverse datasets and models demonstrate its effectiveness. Our work bridges the gap between pure generative modeling and real-world time series editing needs, offering a flexible solution for human-in-the-loop time series generation and editing. The code and demo are provided for validation.
Article 1869
Title@2025-06-05 (4): Tight analyses of first-order methods with error feedback
Title: Tight analyses of first-order methods with error feedback | Enge Analysen von First-Order-Methoden mit Fehlerrückmeldung | 利用错误反馈对一级处理方法进行严格分析 2506.05271v1 |
Authors (3): Daniel Berg Thomsen, Adrien Taylor, Aymeric Dieuleveut
Communication between agents often constitutes a major computational bottleneck in distributed learning. One of the most common mitigation strategies is to compress the information exchanged, thereby reducing communication overhead. To counteract the degradation in convergence associated with compressed communication, error feedback schemes – most notably $\mathrm{EF}$ and $\mathrm{EF}^{21}$ – were introduced. In this work, we provide a tight analysis of both of these methods. Specifically, we find the Lyapunov function that yields the best possible convergence rate for each method – with matching lower bounds. This principled approach yields sharp performance guarantees and enables a rigorous, apples-to-apples comparison between $\mathrm{EF}$, $\mathrm{EF}^{21}$, and compressed gradient descent. Our analysis is carried out in a simplified yet representative setting, which allows for clean theoretical insights and fair comparison of the underlying mechanisms.
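For readers unfamiliar with the mechanism, the classic error-feedback update ($\mathrm{EF}$, not $\mathrm{EF}^{21}$) can be sketched on a single worker as below. This is an illustrative sketch only: the top-$k$ compressor, the diagonal quadratic objective, the step size, and the function names are assumptions chosen for the example, not the setting analyzed in the paper.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_gradient_descent(grad, x0, step, k, iters):
    """Compressed gradient descent with error feedback.

    Each round compresses (carried error + step * gradient), applies the
    compressed update, and stores what the compressor dropped for next round.
    """
    x = x0.astype(float).copy()
    e = np.zeros_like(x)
    for _ in range(iters):
        p = e + step * grad(x)   # what we would like to send
        c = top_k(p, k)          # what is actually sent
        x -= c
        e = p - c                # residual carried to the next round
    return x

# Toy objective f(x) = 0.5 * x^T A x with a diagonal A (hypothetical example).
A = np.diag([1.0, 2.0, 3.0, 4.0])
grad = lambda x: A @ x
x_final = ef_gradient_descent(grad, np.ones(4), step=0.1, k=1, iters=200)
print(np.linalg.norm(x_final))
```

With `k` equal to the dimension the compressor is the identity, the error stays at zero, and the loop reduces to plain gradient descent.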
Article 1870
Title@2025-06-05 (4): Context is Key: A Benchmark for Forecasting with Essential Textual Information
Title: Context is Key: A Benchmark for Forecasting with Essential Textual Information | Kontext ist der Schlüssel: Ein Benchmark für Vorhersagen mit wesentlichen Textinformationen | 背景是关键:用基本文字信息预测的基准 2410.18959v4 |
Authors (11): Andrew Robert Williams, Arjun Ashok, Étienne Marcotte, Valentina Zantedeschi, Jithendaraa Subramanian, Roland Riachi, James Requeima, Alexandre Lacoste, Irina Rish, Nicolas Chapados, Alexandre Drouin
Forecasting is a critical task in decision-making across numerous domains. While historical numerical data provide a start, they fail to convey the complete context for reliable and accurate predictions. Human forecasters frequently rely on additional information, such as background knowledge and constraints, which can efficiently be communicated through natural language. However, in spite of recent progress with LLM-based forecasters, their ability to effectively integrate this textual information remains an open question. To address this, we introduce “Context is Key” (CiK), a time-series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context, requiring models to integrate both modalities; crucially, every task in CiK requires understanding textual context to be solved successfully. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters, and propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark. Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings. This benchmark aims to advance multimodal forecasting by promoting models that are both accurate and accessible to decision-makers with varied technical expertise. The benchmark can be visualized at https://servicenow.github.io/context-is-key-forecasting/v0/.
Article 1871
Title@2025-06-05 (4): Learning long range dependencies through time reversal symmetry breaking
Title: Learning long range dependencies through time reversal symmetry breaking | Längere Abhängigkeiten durch Zeitumkehr-Symmetrie-Bruch lernen | 通过时间逆向对称断裂,学习远距离依赖性 2506.05259v1 |
Authors (2): Guillaume Pourcel, Maxence Ernoult
Deep State Space Models (SSMs) reignite physics-grounded compute paradigms, as RNNs could natively be embodied into dynamical systems. This calls for dedicated learning algorithms obeying core physical principles, with efficient techniques to simulate these systems and guide their design. We propose Recurrent Hamiltonian Echo Learning (RHEL), an algorithm which provably computes loss gradients as finite differences of physical trajectories of non-dissipative, Hamiltonian systems. In ML terms, RHEL only requires three “forward passes” irrespective of model size, without explicit Jacobian computation and without incurring any variance in the gradient estimation. Motivated by the physical realization of our algorithm, we first introduce RHEL in continuous time and demonstrate its formal equivalence with the continuous adjoint state method. To facilitate the simulation of Hamiltonian systems trained by RHEL, we propose a discrete-time version of RHEL which is equivalent to Backpropagation Through Time (BPTT) when applied to a class of recurrent modules which we call Hamiltonian Recurrent Units (HRUs). This setting allows us to demonstrate the scalability of RHEL by generalizing these results to hierarchies of HRUs, which we call Hamiltonian SSMs (HSSMs). We apply RHEL to train HSSMs with linear and nonlinear dynamics on a variety of time-series tasks ranging from mid-range to long-range classification and regression with sequence length reaching $\sim 50k$. We show that RHEL consistently matches the performance of BPTT across all models and tasks. This work opens new doors for the design of scalable, energy-efficient physical systems endowed with self-learning capabilities for sequence modelling.
Article 1872
Title@2025-06-05 (4): Continual Learning from Simulated Interactions via Multitask Prospective Rehearsal for Bionic Limb Behavior Modeling
Title: Continual Learning from Simulated Interactions via Multitask Prospective Rehearsal for Bionic Limb Behavior Modeling | Kontinuierliches Lernen von Simulierten Interaktionen über Multitask Prospective Probe für Bionic Limb Behavior Modeling | 通过多塔斯克双立架行为模拟模型预演预演预演,从模拟互动中不断学习 2405.01114v4 |
Authors (5): Sharmita Dey, Benjamin Paassen, Sarath Ravindran Nair, Sabri Boughorbel, Arndt F. Schilling
Lower limb amputations and neuromuscular impairments severely restrict mobility, necessitating advancements beyond conventional prosthetics. While motorized bionic limbs show promise, their effectiveness depends on replicating the dynamic coordination of human movement across diverse environments. In this paper, we introduce a model for human behavior in the context of bionic prosthesis control. Our approach leverages human locomotion demonstrations to learn the synergistic coupling of the lower limbs, enabling the prediction of the kinematic behavior of a missing limb during tasks such as walking, climbing inclines, and stairs. We propose a multitasking, continually adaptive model that anticipates and refines movements over time. At the core of our method is a technique called multitask prospective rehearsal, which anticipates and synthesizes future movements based on the previous prediction and employs a corrective mechanism for subsequent predictions. Our evolving architecture merges lightweight, task-specific modules on a shared backbone, ensuring both specificity and scalability. We validate our model through experiments on real-world human gait datasets, including transtibial amputees, across a wide range of locomotion tasks. Results demonstrate that our approach consistently outperforms baseline models, particularly in scenarios with distributional shifts, adversarial perturbations, and noise.
Article 1873
Title@2025-06-05 (4): Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning
Title: Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning | Konservative Klassifikatoren tun konsequent gut mit Verbesserung Agenten: Charakterisierung statistischer und Online-Lernen | 保守的分类机构与改进机构保持一贯的很好:将统计和在线学习定性为特征 2506.05252v1 |
Authors (2): Dravyansh Sharma, Alec Sun
Machine learning is now ubiquitous in societal decision-making, for example in evaluating job candidates or loan applications, and it is increasingly important to take into account how classified agents will react to the learning algorithms. The majority of recent literature on strategic classification has focused on reducing and countering deceptive behaviors by the classified agents, but recent work of Attias et al. identifies surprising properties of learnability when the agents genuinely improve in order to attain the desirable classification, such as smaller generalization error than standard PAC-learning. In this paper we characterize so-called learnability with improvements across multiple new axes. We introduce an asymmetric variant of minimally consistent concept classes and use it to provide an exact characterization of proper learning with improvements in the realizable setting. While prior work studies learnability only under general, arbitrary agent improvement regions, we give positive results for more natural Euclidean ball improvement sets. In particular, we characterize improper learning under a mild generative assumption on the data distribution. We further show how to learn in more challenging settings, achieving lower generalization error under well-studied bounded noise models and obtaining mistake bounds in realizable and agnostic online learning. We resolve open questions posed by Attias et al. for both proper and improper learning.
Article 1874
Title@2025-06-05 (4): On the Convergence of Gradient Descent on Learning Transformers with Residual Connections
Title: On the Convergence of Gradient Descent on Learning Transformers with Residual Connections | Über die Konvergenz des gradienten Abstiegs auf Lerntransformatoren mit residualen Verbindungen | 关于有残余连接的学习变异器的 “ 渐渐后代 “ 趋同 2506.05249v1 |
Authors (3): Zhen Qin, Jinxin Zhou, Zhihui Zhu
Transformer models have emerged as fundamental tools across various scientific and engineering disciplines, owing to their outstanding performance in diverse applications. Despite this empirical success, the theoretical foundations of Transformers remain relatively underdeveloped, particularly in understanding their training dynamics. Existing research predominantly examines isolated components–such as self-attention mechanisms and feedforward networks–without thoroughly investigating the interdependencies between these components, especially when residual connections are present. In this paper, we aim to bridge this gap by analyzing the convergence behavior of a structurally complete yet single-layer Transformer, comprising self-attention, a feedforward network, and residual connections. We demonstrate that, under appropriate initialization, gradient descent exhibits a linear convergence rate, where the convergence speed is determined by the minimum and maximum singular values of the output matrix from the attention layer. Moreover, our analysis reveals that residual connections serve to ameliorate the ill-conditioning of this output matrix, an issue stemming from the low-rank structure imposed by the softmax operation, thereby promoting enhanced optimization stability. We also extend our theoretical findings to a multi-layer Transformer architecture, confirming the linear convergence rate of gradient descent under suitable initialization. Empirical results corroborate our theoretical insights, illustrating the beneficial role of residual connections in promoting convergence stability.
Article 1875
Title@2025-06-05 (4): Rethinking LLM Advancement: Compute-Dependent and Independent Paths to Progress
Title: Rethinking LLM Advancement: Compute-Dependent and Independent Paths to Progress | LLM-Fortschritt neu denken: Berechenbare und unabhängige Wege zum Fortschritt | 重新思考LLM进步:计算依赖性和独立前进之路 2505.04075v2 |
Authors (5): Jack Sanderson, Teddy Foley, Spencer Guo, Anqi Qu, Henry Josephson
Regulatory efforts to govern large language model (LLM) development have predominantly focused on restricting access to high-performance computational resources. This study evaluates the efficacy of such measures by examining whether LLM capabilities can advance through algorithmic innovation in compute-constrained environments. We propose a novel framework distinguishing compute-dependent innovations–which yield disproportionate benefits at high compute–from compute-independent innovations, which improve efficiency across compute scales. The impact is quantified using Compute-Equivalent Gain (CEG). Experimental validation with nanoGPT models confirms that compute-independent advancements yield significant performance gains (e.g., with combined CEG up to $3.5\times$) across the tested scales. In contrast, compute-dependent advancements were detrimental to performance at smaller experimental scales, but showed improved CEG (on par with the baseline) as model size increased, a trend consistent with their definition of yielding primary benefits at higher compute. Crucially, these findings indicate that restrictions on computational hardware, while potentially slowing LLM progress, are insufficient to prevent all capability gains driven by algorithmic advancements. We argue that effective AI oversight must therefore incorporate mechanisms for understanding, anticipating, and potentially guiding algorithmic research, moving beyond a singular focus on hardware. The proposed framework also serves as an analytical tool for forecasting AI progress.
Article 1876
Title@2025-06-05 (4): A theoretical framework for overfitting in energy-based modeling
Title: A theoretical framework for overfitting in energy-based modeling | Ein theoretischer Rahmen für die Überrüstung in energiebasierte Modellierung | 以能源为基础的建模的超套理论框架 2501.19158v3 |
Authors (4): Giovanni Catania, Aurélien Decelle, Cyril Furtlehner, Beatriz Seoane
We investigate the impact of limited data on training pairwise energy-based models for inverse problems aimed at identifying interaction networks. Utilizing the Gaussian model as a testbed, we dissect training trajectories across the eigenbasis of the coupling matrix, exploiting the independent evolution of eigenmodes and revealing that the learning timescales are tied to the spectral decomposition of the empirical covariance matrix. We see that optimal points for early stopping arise from the interplay between these timescales and the initial conditions of training. Moreover, we show that finite data corrections can be accurately modeled through asymptotic random matrix theory calculations and provide the counterpart of generalized cross-validation in the energy-based model context. Our analytical framework extends to binary-variable maximum-entropy pairwise models with minimal variations. These findings offer strategies to control overfitting in discrete-variable models through empirical shrinkage corrections, improving the management of overfitting in energy-based generative models. Finally, we propose a generalization to arbitrary energy-based models by deriving the neural tangent kernel dynamics of the score function under the score-matching algorithm.
Article 1877
Title@2025-06-05 (4): Robust Moment Identification for Nonlinear PDEs via a Neural ODE Approach
Title: Robust Moment Identification for Nonlinear PDEs via a Neural ODE Approach | Robuste Moment-Identifikation für nichtlineare PDEs über einen neuralen ODE-Ansatz | 通过神经极光模式方法确定非线性PDE的强劲势头 2506.05245v1 |
Authors (4): Shaoxuan Chen, Su Yang, Panayotis G. Kevrekidis, Wei Zhu
We propose a data-driven framework for learning reduced-order moment dynamics from PDE-governed systems using Neural ODEs. In contrast to derivative-based methods like SINDy, which necessitate densely sampled data and are sensitive to noise, our approach based on Neural ODEs directly models moment trajectories, enabling robust learning from sparse and potentially irregular time series. Using the nonlinear Schrödinger equation as an application platform, the framework accurately recovers governing moment dynamics when closure is available, even with limited and irregular observations. For systems without analytical closure, we introduce a data-driven coordinate transformation strategy based on Stiefel manifold optimization, enabling the discovery of low-dimensional representations in which the moment dynamics become closed, facilitating interpretable and reliable modeling. We also explore cases where a closure model is not known, such as a Fisher-KPP reaction-diffusion system. Here we demonstrate that Neural ODEs can still effectively approximate the unclosed moment dynamics and achieve superior extrapolation accuracy compared to physical-expert-derived ODE models. This advantage remains robust even under sparse and irregular sampling, highlighting the method’s robustness in data-limited settings. Our results highlight the Neural ODE framework as a powerful and flexible tool for learning interpretable, low-dimensional moment dynamics in complex PDE-governed systems.
Article 1878
Title@2025-06-05 (4): Aligning Latent Spaces with Flow Priors
Title: Aligning Latent Spaces with Flow Priors | Latente Räume mit Flow Priors ausrichten | 将局部空间与流动前置空间对齐 2506.05240v1 |
Authors (5): Yizhuo Li, Yuying Ge, Yixiao Ge, Ying Shan, Ping Luo
This paper presents a novel framework for aligning learnable latent spaces to arbitrary target distributions by leveraging flow-based generative models as priors. Our method first pretrains a flow model on the target features to capture the underlying distribution. This fixed flow model subsequently regularizes the latent space via an alignment loss, which reformulates the flow matching objective to treat the latents as optimization targets. We formally prove that minimizing this alignment loss establishes a computationally tractable surrogate objective for maximizing a variational lower bound on the log-likelihood of latents under the target distribution. Notably, the proposed method eliminates computationally expensive likelihood evaluations and avoids ODE solving during optimization. As a proof of concept, we demonstrate in a controlled setting that the alignment loss landscape closely approximates the negative log-likelihood of the target distribution. We further validate the effectiveness of our approach through large-scale image generation experiments on ImageNet with diverse target distributions, accompanied by detailed discussions and ablation studies. With both theoretical and empirical validation, our framework paves a new way for latent space alignment.
Article 1879
Title@2025-06-05 (4): Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit
Title: Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit | Sparse Autoencoder bewerten: Vom shallow Design bis hin zum passenden Pursuit | 评估粗略自动校对器:从浅色设计到匹配追求 2506.05239v1 |
Authors (5): Valérie Costa, Thomas Fel, Ekdeep Singh Lubana, Bahareh Tolooshams, Demba Ba
Sparse autoencoders (SAEs) have recently become central tools for interpretability, leveraging dictionary learning principles to extract sparse, interpretable features from neural representations whose underlying structure is typically unknown. This paper evaluates SAEs in a controlled setting using MNIST, which reveals that current shallow architectures implicitly rely on a quasi-orthogonality assumption that limits the ability to extract correlated features. To move beyond this, we introduce a multi-iteration SAE by unrolling Matching Pursuit (MP-SAE), enabling the residual-guided extraction of correlated features that arise in hierarchical settings such as handwritten digit generation while guaranteeing monotonic improvement of the reconstruction as more atoms are selected.
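The monotonic-improvement property mentioned above comes from classical Matching Pursuit, which can be sketched as follows. This is the textbook greedy algorithm over a fixed random dictionary, not the learned MP-SAE from the paper; the dictionary size, signal, and function name are hypothetical.

```python
import numpy as np

def matching_pursuit(x, D, n_iters):
    """Greedy sparse coding of x over dictionary D, whose columns are unit-norm atoms."""
    residual = x.astype(float).copy()
    codes = np.zeros(D.shape[1])
    norms = [np.linalg.norm(residual)]
    for _ in range(n_iters):
        corr = D.T @ residual              # correlation of every atom with the residual
        j = int(np.argmax(np.abs(corr)))   # pick the best-matching atom
        codes[j] += corr[j]
        residual -= corr[j] * D[:, j]      # remove that atom's contribution
        norms.append(np.linalg.norm(residual))
    return codes, residual, norms

rng = np.random.default_rng(0)
D = rng.normal(size=(16, 32))
D /= np.linalg.norm(D, axis=0, keepdims=True)  # normalize atoms to unit length
x = rng.normal(size=16)
codes, residual, norms = matching_pursuit(x, D, n_iters=8)
print(norms)
```

Because each step subtracts the projection of the residual onto a unit-norm atom, the squared residual norm drops by exactly `corr[j] ** 2`, so the reconstruction error never increases. Selecting on the residual rather than on the raw input is also what allows correlated atoms to be picked up in later iterations.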
Article 1880
Title@2025-06-05 (4): Timing is Important: Risk-aware Fund Allocation based on Time-Series Forecasting
Title: Timing is Important: Risk-aware Fund Allocation based on Time-Series Forecasting | Timing ist wichtig: Risiko-aware Fund Allokation basierend auf Time-Series Forecasting | 时间选择很重要:根据时间-系列预测进行有风险的基金分配 2505.24835v2 |
Authors (9): Fuyuan Lyu, Linfeng Du, Yunpeng Weng, Qiufang Ying, Zhiyan Xu, Wen Zou, Haolun Wu, Xiuqiang He, Xing Tang
Fund allocation has been an increasingly important problem in the financial domain. In reality, we aim to allocate the funds to buy certain assets within a certain future period. Naive solutions such as prediction-only or Predict-then-Optimize approaches suffer from goal mismatch. Additionally, the introduction of the SOTA time series forecasting model inevitably introduces additional uncertainty in the predicted result. To solve both problems mentioned above, we introduce a Risk-aware Time-Series Predict-and-Allocate (RTS-PnO) framework, which holds no prior assumption on the forecasting models. Such a framework contains three features: (i) end-to-end training with objective alignment measurement, (ii) adaptive forecasting uncertainty calibration, and (iii) agnosticism towards forecasting models. The evaluation of RTS-PnO is conducted over both online and offline experiments. For offline experiments, eight datasets from three categories of financial applications are used: Currency, Stock, and Cryptos. RTS-PnO consistently outperforms other competitive baselines. The online experiment is conducted on the Cross-Border Payment business at FiT, Tencent, and an 8.4% decrease in regret is witnessed when compared with the product-line approach. The code for the offline experiment is available at https://github.com/fuyuanlyu/RTS-PnO.
Article 1881
Title@2025-06-05 (4): Blink of an eye: a simple theory for feature localization in generative models
Title: Blink of an eye: a simple theory for feature localization in generative models | Blink des Auges: eine einfache Theorie zur Featurelokalisierung in generativen Modellen | 眨眼:在基因模型中地貌化的简单理论 2502.00921v2 |
Authors (3): Marvin Li, Aayush Karan, Sitan Chen
Large language models can exhibit unexpected behavior in the blink of an eye. In a recent computer use demo, a language model switched from coding to Googling pictures of Yellowstone, and these sudden shifts in behavior have also been observed in reasoning patterns and jailbreaks. This phenomenon is not unique to autoregressive models: in diffusion models, key features of the final output are decided in narrow “critical windows” of the generation process. In this work we develop a simple, unifying theory to explain this phenomenon using the formalism of stochastic localization samplers. We show that it emerges generically as the generation process localizes to a sub-population of the distribution it models. While critical windows have been studied at length in diffusion models, existing theory heavily relies on strong distributional assumptions and the particulars of Gaussian diffusion. In contrast to existing work, our theory (1) applies to autoregressive and diffusion models; (2) makes no distributional assumptions; (3) quantitatively improves previous bounds even when specialized to diffusions; and (4) requires basic tools and no stochastic calculus or statistical-physics-based machinery. We also identify an intriguing connection to the all-or-nothing phenomenon from statistical inference. Finally, we validate our predictions empirically for LLMs and find that critical windows often coincide with failures in problem solving for various math and reasoning benchmarks.
Article 1882
Title@2025-06-05 (4): Stochastic Poisson Surface Reconstruction with One Solve using Geometric Gaussian Processes
Title: Stochastic Poisson Surface Reconstruction with One Solve using Geometric Gaussian Processes | Stochastische Poisson-Oberflächenrekonstruktion mit einer Lösung mit geometrischen Gauß-Prozessen | 利用几何高斯进程进行一索一索的斯托查斯·普瓦松地面重建 2503.19136v2 |
Authors (4): Sidhanth Holalkere, David S. Bindel, Silvia Sellán, Alexander Terenin
Poisson Surface Reconstruction is a widely-used algorithm for reconstructing a surface from an oriented point cloud. To facilitate applications where only partial surface information is available, or scanning is performed sequentially, a recent line of work proposes to incorporate uncertainty into the reconstructed surface via Gaussian process models. The resulting algorithms first perform Gaussian process interpolation, then solve a set of volumetric partial differential equations globally in space, resulting in a computationally expensive two-stage procedure. In this work, we apply recently-developed techniques from geometric Gaussian processes to combine interpolation and surface reconstruction into a single stage, requiring only one linear solve per sample. The resulting reconstructed surface samples can be queried locally in space, without the use of problem-dependent volumetric meshes or grids. These capabilities enable one to (a) perform probabilistic collision detection locally around the region of interest, (b) perform ray casting without evaluating points not on the ray’s trajectory, and (c) perform next-view planning on a per-ray basis. They also do not require one to approximate kernel matrix inverses with diagonal matrices as part of intermediate computations, unlike prior methods. Results show that our approach provides a cleaner, more-principled, and more-flexible stochastic surface reconstruction pipeline.
Article 1883
Title@2025-06-05 (4): MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
Title: MesaNet: Sequence Modeling by Locally Optimal Test-Time Training | MesaNet: Sequenzmodellierung durch lokal optimale Test-Time-Schulung | MesaNet:通过当地最佳试验时间培训进行序列建模 2506.05233v1 |
Authors (16): Johannes von Oswald, Nino Scherrer, Seijin Kobayashi, Luca Versari, Songlin Yang, Maximilian Schlegel, Kaitlin Maile, Yanick Schimpf, Oliver Sieberling, Alexander Meulemans, Rif A. Saurous, Guillaume Lajoie, Charlotte Frenkel, Razvan Pascanu, Blaise Agüera y Arcas, João Sacramento
Sequence modeling is currently dominated by causal transformer architectures that use softmax self-attention. Although widely adopted, transformers require memory and compute that scale linearly with sequence length during inference. A recent stream of work linearized the softmax operation, resulting in powerful recurrent neural network (RNN) models with constant memory and compute costs, such as DeltaNet, Mamba or xLSTM. These models can be unified by noting that their recurrent layer dynamics can all be derived from an in-context regression objective, approximately optimized through an online learning rule. Here, we join this line of work and introduce a numerically stable, chunkwise parallelizable version of the recently proposed Mesa layer (von Oswald et al., 2024), and study it in language modeling at the billion-parameter scale. This layer again stems from an in-context loss, but one that is now minimized to optimality at every time point using a fast conjugate gradient solver. Through an extensive suite of experiments, we show that optimal test-time training enables reaching lower language modeling perplexity and higher downstream benchmark performance than previous RNNs, especially on tasks requiring long context understanding. This performance gain comes at the cost of additional FLOPs spent during inference time. Our results are therefore intriguingly related to recent trends of increasing test-time compute to improve performance, here by spending compute to solve sequential optimization problems within the neural network itself.
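To make the "in-context loss minimized with a conjugate gradient solver" idea concrete, the sketch below solves a small ridge-regression problem over in-context (key, value) pairs with a plain conjugate gradient loop and uses the solution to answer a query. It is a simplified illustration, not the Mesa layer itself: scalar values, the regularizer `lam`, and the helper names are assumptions made here for clarity.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def conjugate_gradient(A, b, num_iters=50, tol=1e-12):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # first search direction
    rs = dot(r, r)
    for _ in range(num_iters):
        if rs < tol:       # squared residual norm is small enough
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def in_context_ridge_predict(keys, values, query, lam=0.1):
    """Predict the value for `query` from in-context (key, value) pairs.

    Assumed objective (illustrative): min_w sum_i (v_i - w.k_i)^2 + lam * |w|^2.
    The prediction w.q equals sum_i v_i * (k_i . H^{-1} q), H = sum_i k_i k_i^T + lam * I.
    """
    d = len(query)
    H = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    for k in keys:
        for i in range(d):
            for j in range(d):
                H[i][j] += k[i] * k[j]
    x = conjugate_gradient(H, query)   # x = H^{-1} q, one linear solve per query
    return sum(v * dot(k, x) for k, v in zip(keys, values))
```

With keys `[[1, 0], [0, 1], [1, 1]]` and values generated by the weights `[2, -1]`, a query `[3, 4]` should come out close to 2 for a small `lam`. The extra cost relative to a fixed-size recurrent update is the iterative solve, which matches the abstract's remark about additional inference-time compute.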
Article 1884
Title@2025-06-05 (4): Causal Discovery from Conditionally Stationary Time Series
Title: Causal Discovery from Conditionally Stationary Time Series | Causal Discovery aus Conditionally Stationary Time Series | 从有条件的静止时间序列中发现的因果发现 2110.06257v4 |
Authors (7): Carles Balsells-Rodas, Xavier Sumba, Tanmayee Narendra, Ruibo Tu, Gabriele Schweikert, Hedvig Kjellstrom, Yingzhen Li
Causal discovery, i.e., inferring underlying causal relationships from observational data, is highly challenging for AI systems. In a time series modeling context, traditional causal discovery methods mainly consider constrained scenarios with fully observed variables and/or data from stationary time series. We develop a causal discovery approach to handle a wide class of nonstationary time series that are conditionally stationary, where the nonstationary behaviour is modeled as stationarity conditioned on a set of latent state variables. Named State-Dependent Causal Inference (SDCI), our approach is able to recover the underlying causal dependencies, with provable identifiability for the state-dependent causal structures. Empirical experiments on nonlinear particle interaction data and gene regulatory networks demonstrate SDCI’s superior performance over baseline causal discovery methods. Improved results over non-causal RNNs on modeling NBA player movements demonstrate the potential of our method and motivate the use of causality-driven methods for forecasting.
Article 1885
Title@2025-06-05 (4): Progressive Tempering Sampler with Diffusion
Title: Progressive Tempering Sampler with Diffusion | Progressive Temperierung mit Diffusion | 具有扩散作用的渐进性温度采样器 2506.05231v1 |
Authors (7): Severi Rissanen, RuiKang OuYang, Jiajun He, Wenlin Chen, Markus Heinonen, Arno Solin, José Miguel Hernández-Lobato
Recent research has focused on designing neural samplers that amortize the process of sampling from unnormalized densities. However, despite significant advancements, they still fall short of the state-of-the-art MCMC approach, Parallel Tempering (PT), when it comes to the efficiency of target evaluations. On the other hand, unlike a well-trained neural sampler, PT yields only dependent samples and needs to be rerun – at considerable computational cost – whenever new samples are required. To address these weaknesses, we propose the Progressive Tempering Sampler with Diffusion (PTSD), which trains diffusion models sequentially across temperatures, leveraging the advantages of PT to improve the training of neural samplers. We also introduce a novel method to combine high-temperature diffusion models to generate approximate lower-temperature samples, which are minimally refined using MCMC and used to train the next diffusion model. PTSD enables efficient reuse of sample information across temperature levels while generating well-mixed, uncorrelated samples. Our method significantly improves target evaluation efficiency, outperforming diffusion-based neural samplers.
Article 1886
Title@2025-06-05 (4): Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts
Title: Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts | Diagonales Batching löst Parallelismus in recurrenten Speichertransformatoren für lange Kontexte auf | 对角对角拔 解锁长期常用内存变换器中的平行主义 2506.05229v1 |
Authors (5): Danil Sivtsov, Ivan Rodkin, Gleb Kuzmin, Yuri Kuratov, Ivan Oseledets
Transformer models struggle with long-context inference due to their quadratic time and linear memory complexity. Recurrent Memory Transformers (RMTs) offer a solution by reducing the asymptotic cost to linear time and constant memory usage. However, their memory update mechanism leads to sequential execution, causing a performance bottleneck. We introduce Diagonal Batching, a scheduling scheme that unlocks parallelism across segments in RMTs while preserving exact recurrence. This approach eliminates the sequential constraint, enabling efficient GPU inference even for single long-context inputs without complex batching and pipelining techniques. Because the technique is purely a run-time computation reordering, existing RMT models can adopt it with no retraining. Applied to a LLaMA-1B ARMT model, Diagonal Batching yields a 3.3x speedup over standard full-attention LLaMA-1B and a 1.8x speedup over the sequential RMT implementation on 131,072-token sequences. By removing the sequential bottleneck, Diagonal Batching reduces inference cost and latency, thereby strengthening RMTs as a practical solution for real-world, long-context applications.
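One way to picture a diagonal schedule is as a wavefront over a grid of (segment, layer) cells. The sketch below assumes, for illustration only, that each cell depends on the same segment at the previous layer and on the previous segment at the same layer; under that assumption all cells with equal segment + layer index are independent and can be grouped into one batch. The function names are hypothetical, and this is not the authors' implementation.

```python
def diagonal_schedule(num_segments, num_layers):
    """Group (segment, layer) cells into waves; every cell in wave d has segment + layer == d."""
    waves = []
    for d in range(num_segments + num_layers - 1):
        wave = [(s, d - s) for s in range(num_segments) if 0 <= d - s < num_layers]
        waves.append(wave)
    return waves

def respects_dependencies(waves):
    """Check the assumed dependencies: (s, l) needs (s - 1, l) and (s, l - 1) to be finished."""
    done = set()
    for wave in waves:
        for s, l in wave:
            if s > 0 and (s - 1, l) not in done:
                return False
            if l > 0 and (s, l - 1) not in done:
                return False
        done.update(wave)  # the whole wave completes before the next one starts
    return True

if __name__ == "__main__":
    for step, wave in enumerate(diagonal_schedule(num_segments=4, num_layers=3)):
        print(step, wave)
```

For 4 segments and 3 layers this gives 6 waves instead of 12 strictly sequential cell evaluations, while the order of dependent computations is unchanged, which is consistent with the abstract's point that the method is a pure reordering.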
Article 1887
Title@2025-06-05 (4): The Lessons of Developing Process Reward Models in Mathematical Reasoning
Title: The Lessons of Developing Process Reward Models in Mathematical Reasoning | Die Lehren aus der Entwicklung von Prozess-Reward-Modellen in mathematischer Reasoning | 数学理由中发展进程奖励模型的经验教训 2501.07301v2 |
Authors (9): Zhenru Zhang, Chujie Zheng, Yangzhen Wu, Beichen Zhang, Runji Lin, Bowen Yu, Dayiheng Liu, Jingren Zhou, Junyang Lin
Process Reward Models (PRMs) have emerged as a promising approach to process supervision in the mathematical reasoning of Large Language Models (LLMs), aiming to identify and mitigate intermediate errors in the reasoning process. However, the development of effective PRMs faces significant challenges, particularly in data annotation and evaluation methodologies. In this paper, through extensive experiments, we demonstrate that commonly used Monte Carlo (MC) estimation-based data synthesis for PRMs typically yields inferior performance and generalization compared to LLM-as-a-judge and human annotation methods. MC estimation relies on completion models to evaluate current-step correctness, leading to inaccurate step verification. Furthermore, we identify potential biases in conventional Best-of-N (BoN) evaluation strategies for PRMs: (1) Unreliable policy models generate responses with correct answers but flawed processes, leading to a misalignment between the evaluation criteria of BoN and the PRM objective of process verification. (2) The tolerance of PRMs for such responses leads to inflated BoN scores. (3) Existing PRMs have a significant proportion of minimum scores concentrated on the final answer steps, revealing a shift from process-based to outcome-based assessment in BoN-optimized PRMs. To address these challenges, we develop a consensus filtering mechanism that effectively integrates MC estimation with LLM-as-a-judge, and we advocate a more comprehensive evaluation framework that combines response-level and step-level metrics. Based on these mechanisms, we significantly improve both model performance and data efficiency in the BoN evaluation and the step-wise error identification task. Finally, we release a new state-of-the-art PRM that outperforms existing open-source alternatives and provide practical guidelines for future research in building process supervision models.
Article 1888
Title@2025-06-05 (4): A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values
Title: A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values | Ein einheitliches Rahmenwerk für wahrscheinlich effiziente Algorithmen zur Schätzung von Shapley-Werten | 用于估计损耗值的可预见高效算法统一框架 2506.05216v1 |
Authors (10): Tyler Chen, Akshay Seshadri, Mattia J. Villani, Pradeep Niroula, Shouvanik Chakrabarti, Archan Ray, Pranav Deshpande, Romina Yalovetzky, Marco Pistoia, Niraj Kumar
Shapley values have emerged as a critical tool for explaining which features impact the decisions made by machine learning models. However, computing exact Shapley values is difficult, generally requiring an exponential (in the feature dimension) number of model evaluations. To address this, many model-agnostic randomized estimators have been developed, the most influential and widely used being the KernelSHAP method (Lundberg & Lee, 2017). While related estimators such as unbiased KernelSHAP (Covert & Lee, 2021) and LeverageSHAP (Musco & Witter, 2025) are known to satisfy theoretical guarantees, bounds for KernelSHAP have remained elusive. We describe a broad and unified framework that encompasses KernelSHAP and related estimators constructed using sampling strategies both with and without replacement. We then prove strong non-asymptotic theoretical guarantees that apply to all estimators from our framework. This provides, to the best of our knowledge, the first theoretical guarantees for KernelSHAP and sheds further light on tradeoffs between existing estimators. Through comprehensive benchmarking on small- and medium-dimensional datasets for Decision-Tree models, we validate our approach against exact Shapley values, consistently achieving low mean squared error with modest sample sizes. Furthermore, we make specific implementation improvements to enable scalability of our methods to high-dimensional datasets. Our methods, tested on datasets such as MNIST and CIFAR10, provide consistently better results compared to the KernelSHAP library.
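As a rough illustration of why exact computation is expensive and what a randomized estimator trades away, the sketch below computes exact Shapley values for a toy three-feature value function by enumerating all orderings, then approximates them by sampling random permutations. This is a generic permutation-sampling estimator, not KernelSHAP or any estimator from the paper; the value function `value` and its weights are hypothetical.

```python
import itertools
import random

def value(coalition):
    """Hypothetical value function: additive weights plus one pairwise interaction."""
    weights = {0: 1.0, 1: 2.0, 2: 3.0}
    v = sum(weights[i] for i in coalition)
    if 0 in coalition and 1 in coalition:
        v += 1.0  # features 0 and 1 interact
    return v

def marginal_contributions(order):
    """Marginal contribution of each feature when features are added in `order`."""
    contrib = {}
    seen = set()
    for i in order:
        before = value(seen)
        seen.add(i)
        contrib[i] = value(seen) - before
    return contrib

def exact_shapley(n):
    """Average marginal contributions over all n! orderings (exponential cost)."""
    phi = [0.0] * n
    perms = list(itertools.permutations(range(n)))
    for perm in perms:
        for i, c in marginal_contributions(perm).items():
            phi[i] += c
    return [p / len(perms) for p in phi]

def sampled_shapley(n, num_samples, seed=0):
    """Monte Carlo estimate: average over randomly drawn orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(num_samples):
        order = list(range(n))
        rng.shuffle(order)
        for i, c in marginal_contributions(order).items():
            phi[i] += c
    return [p / num_samples for p in phi]
```

For this toy function the exact values are 1.5, 2.5 and 3.0: the interaction bonus is split evenly between features 0 and 1. The sampled estimate approaches these values as `num_samples` grows, and the kind of non-asymptotic guarantee discussed in the abstract is about quantifying that error for a given sample budget.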
Article 1889
Title@2025-06-05 (4): Learning Theory of Decentralized Robust Kernel-Based Learning Algorithm
Title: Learning Theory of Decentralized Robust Kernel-Based Learning Algorithm | Lerntheorie des dezentralisierten robusten Kernel-basierten Lernalgorithmus | 分散的强力核心学习学习理论 2506.05215v1 |
Authors (1): Zhan Yu
We propose a new decentralized robust kernel-based learning algorithm within the framework of reproducing kernel Hilbert space (RKHS) by utilizing a networked system that can be represented as a connected graph. The robust loss function $\mathcal{L}_\sigma$, induced by a windowing function $W$ and a robustness scaling parameter $\sigma>0$, can encompass a broad spectrum of robust losses. Consequently, the proposed algorithm effectively provides a unified decentralized learning framework for robust regression, which fundamentally differs from the existing distributed robust kernel learning schemes, all of which are divide-and-conquer based. We rigorously establish the learning theory and offer a comprehensive convergence analysis for the algorithm. We show that each local robust estimator generated from the decentralized algorithm can be utilized to approximate the regression function. Based on kernel-based integral operator techniques, we derive general high-confidence convergence bounds for each local approximating sequence in terms of the mean square distance, RKHS norm, and generalization error, respectively. Moreover, we provide rigorous selection rules for local sample size and show that, under properly selected step size and scaling parameter $\sigma$, the decentralized robust algorithm can achieve optimal learning rates (up to logarithmic factors) in both norms. The parameter $\sigma$ is shown to be essential for enhancing robustness while also ensuring favorable convergence behavior. The intrinsic connection among decentralization, sample selection, robustness of the algorithm, and its convergence is clearly reflected.
Article 1890
Title@2025-06-05 (4): Mitigating Degree Bias Adaptively with Hard-to-Learn Nodes in Graph Contrastive Learning
Title: Mitigating Degree Bias Adaptively with Hard-to-Learn Nodes in Graph Contrastive Learning | Degree Bias im Graph Contrastive Learning adaptiv mit schwer zu erlernenden Knoten abmildern | 与图表对比学习中难以识别的节点相适应 2506.05214v1 |
Authors (5): Jingyu Hu, Hongbo Bo, Jun Hong, Xiaowei Liu, Weiru Liu
Graph Neural Networks (GNNs) often suffer from degree bias in node classification tasks, where prediction performance varies across nodes with different degrees. Several approaches, which adopt Graph Contrastive Learning (GCL), have been proposed to mitigate this bias. However, the limited number of positive pairs and the equal weighting of all positives and negatives in GCL still lead to low-degree nodes acquiring insufficient and noisy information. This paper proposes the Hardness Adaptive Reweighted (HAR) contrastive loss to mitigate degree bias. It adds more positive pairs by leveraging node labels and adaptively weights positive and negative pairs based on their learning hardness. In addition, we develop an experimental framework named SHARP to extend HAR to a broader range of scenarios. Both our theoretical analysis and experiments validate the effectiveness of SHARP. The experimental results across four datasets show that SHARP achieves better performance against baselines at both global and degree levels.
Article 1891
Title@2025-06-05 (4): Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data
Title: Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data | Verstärktes Lernen für individuelle Optimale Politik aus heterogenen Daten | 利用不同不同数据为个人最佳政策强化学习 2505.09496v2 |
Authors (3): Rui Miao, Babak Shahbaba, Annie Qu
Offline reinforcement learning (RL) aims to find optimal policies in dynamic environments in order to maximize the expected total rewards by leveraging pre-collected data. Learning from heterogeneous data is one of the fundamental challenges in offline RL. Traditional methods focus on learning an optimal policy for all individuals with pre-collected data from a single episode or homogeneous batch episodes, and thus, may result in a suboptimal policy for a heterogeneous population. In this paper, we propose an individualized offline policy optimization framework for heterogeneous time-stationary Markov decision processes (MDPs). The proposed heterogeneous model with individual latent variables enables us to efficiently estimate the individual Q-functions, and our Penalized Pessimistic Personalized Policy Learning (P4L) algorithm guarantees a fast rate on the average regret under a weak partial coverage assumption on behavior policies. In addition, our simulation studies and a real data application demonstrate the superior numerical performance of the proposed method compared with existing methods.
Article 1892
Title@2025-06-05 (4): Unleashing The Power of Pre-Trained Language Models for Irregularly Sampled Time Series
Title: Unleashing The Power of Pre-Trained Language Models for Irregularly Sampled Time Series | Lösen der Macht von vortrainierten Sprachmodellen für unregelmäßig gemusterte Zeitreihen | 为不定期抽样抽样时间序列释放培训前语言模型的力量 2408.08328v2 |
Authors (4): Weijia Zhang, Chenlong Yin, Hao Liu, Hui Xiong
Pre-trained Language Models (PLMs), such as ChatGPT, have significantly advanced the field of natural language processing. This progress has inspired a series of innovative studies that explore the adaptation of PLMs to time series analysis, intending to create a unified foundation model that addresses various time series analytical tasks. However, these efforts predominantly focus on Regularly Sampled Time Series (RSTS), neglecting the unique challenges posed by Irregularly Sampled Time Series (ISTS), which are characterized by uneven sampling intervals and prevalent missing data. To bridge this gap, this work takes the first step in exploring the potential of PLMs for ISTS analysis. We begin by investigating the effect of various methods for representing ISTS, aiming to maximize the efficacy of PLMs in the analysis. Furthermore, we propose a unified PLM-based framework, named ISTS-PLM, to address diverse ISTS analytical tasks. It integrates novel time-aware and variable-aware PLMs tailored to tackle the intractable intra- and inter-time series modeling in ISTS. Finally, extensive experiments on a comprehensive benchmark demonstrate that the ISTS-PLM, utilizing a structured and effective series-based representation for ISTS, consistently achieves state-of-the-art performance across various analytical tasks, such as classification, interpolation, extrapolation, few-shot and zero-shot learning scenarios, spanning scientific domains like healthcare, biomechanics, and climate science.
Article 1893
Title@2025-06-05 (4): The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
Title: The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text | The Common Pile v0.1: Ein 8TB-Datensatz von Public Domain und Openly Licensed Text | Pile v0.1:公共域和公开许可文本的 8TB数据集 2506.05209v1 |
Authors (27): Nikhil Kandpal, Brian Lester, Colin Raffel, Sebastian Majstorovic, Stella Biderman, Baber Abbasi, Luca Soldaini, Enrico Shippole, A. Feder Cooper, Aviya Skowron, John Kirchenbauer, Shayne Longpre, Lintang Sutawika, Alon Albalak, Zhenlin Xu, Guilherme Penedo, Loubna Ben Allal, Elie Bakouch, John David Pressman, Honglu Fan, Dashiell Stander, Guangyu Song, Aaron Gokaslan, Tom Goldstein, Brian R. Bartoldson, Bhavya Kailkhura, Tyler Murray
Large language models (LLMs) are typically trained on enormous quantities of unlicensed text, a practice that has led to scrutiny due to possible intellectual property infringement and ethical concerns. Training LLMs on openly licensed text presents a first step towards addressing these issues, but prior data collection efforts have yielded datasets too small or low-quality to produce performant LLMs. To address this gap, we collect, curate, and release the Common Pile v0.1, an eight-terabyte collection of openly licensed text designed for LLM pretraining. The Common Pile comprises content from 30 sources that span diverse domains including research papers, code, books, encyclopedias, educational materials, audio transcripts, and more. Crucially, we validate our efforts by training two 7-billion-parameter LLMs on text from the Common Pile: Comma v0.1-1T and Comma v0.1-2T, trained on 1 and 2 trillion tokens respectively. Both models attain performance competitive with LLMs trained on unlicensed text with similar computational budgets, such as Llama 1 and 2 7B. In addition to releasing the Common Pile v0.1 itself, we also release the code used in its creation as well as the training mixture and checkpoints for the Comma v0.1 models.
Article 1894
Title@2025-06-05 (4): One Wave To Explain Them All: A Unifying Perspective On Feature Attribution
Title: One Wave To Explain Them All: A Unifying Perspective On Feature Attribution | Eine Welle, um sie alle zu erklären: Eine verbindende Perspektive auf Feature Attribution | 一波一波解释全部:一个统一化的地物归属观 2410.01482v2 |
Authors (4): Gabriel Kasmi, Amandine Brunetto, Thomas Fel, Jayneel Parekh
Feature attribution methods aim to improve the transparency of deep neural networks by identifying the input features that influence a model’s decision. Pixel-based heatmaps have become the standard for attributing features to high-dimensional inputs, such as images, audio representations, and volumes. While intuitive and convenient, these pixel-based attributions fail to capture the underlying structure of the data. Moreover, the choice of domain for computing attributions has often been overlooked. This work demonstrates that the wavelet domain allows for informative and meaningful attributions. It handles any input dimension and offers a unified approach to feature attribution. Our method, the Wavelet Attribution Method (WAM), leverages the spatial and scale-localized properties of wavelet coefficients to provide explanations that capture both the where and what of a model’s decision-making process. We show that WAM quantitatively matches or outperforms existing gradient-based methods across multiple modalities, including audio, images, and volumes. Additionally, we discuss how WAM bridges attribution with broader aspects of model robustness and transparency. Project page: https://gabrielkasmi.github.io/wam/
Article 1895
Title@2025-06-05 (4): Trustworthiness Preservation by Copies of Machine Learning Systems
Title: Trustworthiness Preservation by Copies of Machine Learning Systems | Vertrauenswürdigkeit Erhaltung durch Kopien von Machine Learning Systemen | 机器学习系统副本保存可信赖性 2506.05203v1 |
Authors (2): Leonardo Ceragioli, Giuseppe Primiero
A common practice in ML systems development concerns the training of the same model under different data sets, and the use of the same (training and test) sets for different learning models. The first case is a desirable practice for identifying high-quality and unbiased training conditions. The latter case coincides with the search for optimal models under a common dataset for training. These differently obtained systems have been considered akin to copies. In the quest for responsible AI, a legitimate but rarely investigated question is how to verify that trustworthiness is preserved by copies. In this paper we introduce a calculus to model and verify probabilistic complex queries over data and define four distinct notions: Justifiably, Equally, Weakly and Almost Trustworthy, which can be checked by analysing the (partial) behaviour of the copy with respect to its original. We provide a study of the relations between these notions of trustworthiness, and how they compose with each other and under logical operations. The aim is to offer a computational tool to check the trustworthiness of possibly complex systems copied from an original whose behaviour is known.
Article 1896
Title@2025-06-05 (4): Transformers Meet In-Context Learning: A Universal Approximation Theory
Title: Transformers Meet In-Context Learning: A Universal Approximation Theory | Transformer treffen auf In-Context Learning: Eine universelle Annäherungstheorie | 变革者满足内文学习:通用近似理论 2506.05200v1 |
Authors (5): Gen Li, Yuchen Jiao, Yu Huang, Yuting Wei, Yuxin Chen
Modern large language models are capable of in-context learning, the ability to perform new tasks at inference time using only a handful of input-output examples in the prompt, without any fine-tuning or parameter updates. We develop a universal approximation theory to better understand how transformers enable in-context learning. For any class of functions (each representing a distinct task), we demonstrate how to construct a transformer that, without any further weight updates, can perform reliable prediction given only a few in-context examples. In contrast to much of the recent literature that frames transformers as algorithm approximators – i.e., constructing transformers to emulate the iterations of optimization algorithms as a means to approximate solutions of learning problems – our work adopts a fundamentally different approach rooted in universal function approximation. This alternative approach offers approximation guarantees that are not constrained by the effectiveness of the optimization algorithms being approximated, thereby extending far beyond convex problems and linear function classes. Our construction sheds light on how transformers can simultaneously learn general-purpose representations and adapt dynamically to in-context examples.
Article 1897
Title@2025-06-05 (4): Quantifying Cross-Modality Memorization in Vision-Language Models
Title: Quantifying Cross-Modality Memorization in Vision-Language Models | Quantifizierung der Cross-Modality-Memorisierung in Vision-Language-Modellen | 设想语言模型中量化跨模式的计量 2506.05198v1 |
Authors (6): Yuxin Wen, Yangsibo Huang, Tom Goldstein, Ravi Kumar, Badih Ghazi, Chiyuan Zhang
Understanding what and how neural networks memorize during training is crucial, both from the perspective of unintentional memorization of potentially sensitive information and from the standpoint of effective knowledge acquisition for real-world, knowledge-intensive tasks. While previous studies primarily investigate memorization within a single modality, such as text memorization in large language models or image memorization in diffusion models, unified multimodal models are becoming increasingly prevalent in practical applications. In this work, we focus on the unique characteristics of cross-modality memorization and conduct a systematic study centered on vision-language models. To facilitate controlled experiments, we first introduce a synthetic persona dataset comprising diverse synthetic person images and textual descriptions. We quantify factual knowledge memorization and cross-modal transferability by training models on a single modality and evaluating their performance in the other. Our results reveal that facts learned in one modality transfer to the other, but a significant gap exists between recalling information in the source and target modalities. Furthermore, we observe that this gap persists across various scenarios, including more capable models, machine unlearning, and the multi-hop case. Finally, we propose a baseline method to mitigate this challenge. We hope our study can inspire future research on developing more robust multimodal learning techniques to enhance cross-modal transferability.
Article 1898
Title@2025-06-05 (4): Abnormal component analysis
Title: Abnormal component analysis | Abnorme Komponentenanalyse | 异常元件分析 2312.16139v2 |
Authors (3): Romain Valla, Pavlo Mozharovskyi, Florence d’Alché-Buc
At the crossroads of machine learning and data analysis, anomaly detection aims to identify observations that exhibit abnormal behaviour. Be it measurement errors, disease development, severe weather, production quality defects or failed equipment, financial fraud or crisis events, their timely identification and isolation constitute an important task in almost any area of industry and science. While a substantial body of literature is devoted to the detection of anomalies, little attention is paid to their explanation. This is mostly due to the intrinsically unsupervised nature of the task and the non-robustness of exploratory methods such as principal component analysis (PCA). We introduce a new statistical tool dedicated to exploratory analysis of abnormal observations using data depth as a score. Abnormal component analysis (ACA for short) is a method that searches for a low-dimensional data representation that best visualises and explains anomalies. This low-dimensional representation not only distinguishes groups of anomalies better than state-of-the-art methods, but also provides an explanation for anomalies that is linear in the variables and thus easily interpretable. In a comparative simulation and real-data study, ACA also proves advantageous for anomaly analysis compared with methods present in the literature.
Article 1899
Title@2025-06-05 (4): Locality Preserving Markovian Transition for Instance Retrieval
Title: Locality Preserving Markovian Transition for Instance Retrieval | Lokalität Erhaltung der Markovschen Übergang für Instance Retrieval | 将Markovian 过渡保留为事件检索 2506.05196v1 |
Authors (5): Jifei Luo, Wenzheng Wu, Hantao Yao, Lu Yu, Changsheng Xu
Diffusion-based re-ranking methods are effective in modeling the data manifolds through similarity propagation in affinity graphs. However, positive signals tend to diminish over several steps away from the source, reducing discriminative power beyond local regions. To address this issue, we introduce the Locality Preserving Markovian Transition (LPMT) framework, which employs a long-term thermodynamic transition process with multiple states for accurate manifold distance measurement. The proposed LPMT first integrates diffusion processes across separate graphs using Bidirectional Collaborative Diffusion (BCD) to establish strong similarity relationships. Afterwards, Locality State Embedding (LSE) encodes each instance into a distribution for enhanced local consistency. These distributions are interconnected via the Thermodynamic Markovian Transition (TMT) process, enabling efficient global retrieval while maintaining local effectiveness. Experimental results across diverse tasks confirm the effectiveness of LPMT for instance retrieval.
Article 1900
Title@2025-06-05 (4): Chaotic Hedging with Iterated Integrals and Neural Networks
Title: Chaotic Hedging with Iterated Integrals and Neural Networks | Chaotische Absicherung mit iterierten Integralen und neuralen Netzwerken | 使用循环综合器和神经网络的冷酷杂乱和神经网络 2209.10166v4 |
Authors (2): Ariel Neufeld, Philipp Schmocker
In this paper, we derive an $L^p$-chaos expansion based on iterated Stratonovich integrals with respect to a given exponentially integrable continuous semimartingale. By omitting the orthogonality of the expansion, we show that every $p$-integrable functional, $p \in [1,\infty)$, can be approximated by a finite sum of iterated Stratonovich integrals. Using (possibly random) neural networks as integrands, we therefore obtain universal approximation results for $p$-integrable financial derivatives in the $L^p$-sense. Moreover, we can approximately solve the $L^p$-hedging problem (coinciding for $p = 2$ with the quadratic hedging problem), where the approximating hedging strategy can be computed in closed form within a short runtime.
Article 1901
Title@2025-06-05 (4): Counterfactual reasoning: an analysis of in-context emergence
Title: Counterfactual reasoning: an analysis of in-context emergence | Gegenfaktische Argumentation: eine Analyse der Entstehung von Inkontexten | 反事实推理:对内源出现的分析 2506.05188v1 |
Authors (3): Moritz Miller, Bernhard Schölkopf, Siyuan Guo
Large-scale neural language models (LMs) exhibit remarkable performance in in-context learning: the ability to learn from and reason about the input context on the fly without parameter updates. This work studies in-context counterfactual reasoning in language models, that is, predicting the consequences of changes under hypothetical scenarios. We focus on studying a well-defined synthetic setup: a linear regression task that requires noise abduction, where accurate prediction is based on inferring and copying the contextual noise from factual observations. We show that language models are capable of counterfactual reasoning in this controlled setup and provide insights that counterfactual reasoning for a broad class of functions can be reduced to a transformation on in-context observations; we find that self-attention, model depth, and data diversity in pre-training drive performance in Transformers. More interestingly, our findings extend beyond regression tasks and show that Transformers can perform noise abduction on sequential data, providing preliminary evidence on the potential for counterfactual story generation. Our code is available at https://github.com/moXmiller/counterfactual-reasoning.git .
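The noise-abduction step can be shown on a single data point. The sketch below assumes a scalar linear model y = beta * x + u with a known coefficient `beta`; this is an illustrative simplification of the setup described in the abstract, and the function name is hypothetical.

```python
def counterfactual_prediction(beta, x_factual, y_factual, x_counterfactual):
    """Abduction-action-prediction for the assumed model y = beta * x + u."""
    # Abduction: recover the noise term that explains the factual observation.
    noise = y_factual - beta * x_factual
    # Action and prediction: change the input, keep ("copy") the same noise.
    return beta * x_counterfactual + noise

if __name__ == "__main__":
    # Factual observation: x = 1.0 gave y = 2.5 under beta = 2.0, so the noise is 0.5.
    print(counterfactual_prediction(beta=2.0, x_factual=1.0, y_factual=2.5, x_counterfactual=3.0))
```

A plain regression prediction for the new input would ignore the noise and return beta * x; the counterfactual answer differs by exactly the abducted noise, which is why the task requires carrying information over from the factual observation.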
nan
Article 1902
Title@2025-06-05 (4): TreeRPO: Tree Relative Policy Optimization
Title: TreeRPO: Tree Relative Policy Optimization | TreeRPO: Baum Relative Politikoptimierung | TreeRPO:树相对政策优化 2506.05183v1 |
Authors (6): Zhicheng Yang, Zhijiang Guo, Yinya Huang, Xiaodan Liang, Yiwei Wang, Jing Tang
Large Language Models (LLMs) have shown remarkable reasoning capabilities through Reinforcement Learning with Verifiable Rewards (RLVR) methods. However, a key limitation of existing approaches is that rewards defined at the full trajectory level provide insufficient guidance for optimizing the intermediate steps of a reasoning process. To address this, we introduce TreeRPO, a novel method that estimates the mathematical expectations of rewards at various reasoning steps using tree sampling. Unlike prior methods that rely on a separate step reward model, TreeRPO directly estimates these rewards through this sampling process. Building on the group-relative reward training mechanism of GRPO, TreeRPO innovatively computes rewards based on step-level groups generated during tree sampling. This advancement allows TreeRPO to produce fine-grained and dense reward signals, significantly enhancing the learning process and overall performance of LLMs. Experimental results demonstrate that our TreeRPO algorithm substantially improves the average Pass@1 accuracy of Qwen-2.5-Math on test benchmarks, increasing it from 19.0% to 35.5%. Furthermore, TreeRPO significantly outperforms GRPO by 2.9% in performance while simultaneously reducing the average response length by 18.1%, showcasing its effectiveness and efficiency. Our code will be available at https://github.com/yangzhch6/TreeRPO.
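The idea of step-level, group-relative rewards from tree sampling can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the dictionary-based tree layout, `node_value`, `group_relative_advantages`, and `leaf` are hypothetical names, a step's expected reward is assumed to be the mean of the verifiable rewards of the completions sampled beneath it, and sibling steps are assumed to form one normalization group in the GRPO style (subtract the group mean, divide by the group standard deviation).

```python
from statistics import mean, pstdev


def node_value(node):
    """Estimate a step's expected reward as the mean reward beneath it."""
    if not node["children"]:
        return node["reward"]
    return mean(node_value(child) for child in node["children"])


def group_relative_advantages(values, eps=1e-8):
    """Normalize sibling values: subtract the group mean, divide by the std."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def leaf(reward):
    """A finished completion carrying a verifiable 0/1 reward."""
    return {"reward": reward, "children": []}


# Hypothetical tree: two candidate first steps, each with two sampled completions.
root = {"children": [
    {"children": [leaf(1.0), leaf(1.0)]},
    {"children": [leaf(1.0), leaf(0.0)]},
]}
step_values = [node_value(child) for child in root["children"]]
advantages = group_relative_advantages(step_values)
```

In this toy tree the first step leads to a correct answer in both completions and the second in only one, so the first step receives a positive advantage and the second a negative one without any separate step reward model.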
nan
Article 1903
Title@2025-06-05 (4): AI-powered Contextual 3D Environment Generation: A Systematic Review
Title: AI-powered Contextual 3D Environment Generation: A Systematic Review | KI-powered Contextual 3D Environment Generation: Eine systematische Überprüfung | 3D环境代:系统审查 2506.05449v1 |
Authors (2): Miguel Silva, Alexandre Valle de Carvalho
The generation of high-quality 3D environments is crucial for industries such as gaming, virtual reality, and cinema, yet remains resource-intensive due to the reliance on manual processes. This study performs a systematic review of existing generative AI techniques for 3D scene generation, analyzing their characteristics, strengths, limitations, and potential for improvement. By examining state-of-the-art approaches, it presents key challenges such as scene authenticity and the influence of textual inputs. Special attention is given to how AI can blend different stylistic domains while maintaining coherence, the impact of training data on output quality, and the limitations of current models. In addition, this review surveys existing evaluation metrics for assessing realism and explores how industry professionals incorporate AI into their workflows. The findings of this study aim to provide a comprehensive understanding of the current landscape and serve as a foundation for future research on AI-driven 3D content generation. Key findings include that advanced generative architectures enable high-quality 3D content creation at a high computational cost, effective multi-modal integration techniques like cross-attention and latent space alignment facilitate text-to-3D tasks, and the quality and diversity of training data combined with comprehensive evaluation metrics are critical to achieving scalable, robust 3D scene generation.
nan
Article 1904
Title@2025-06-05 (4): Associative Memory and Generative Diffusion in the Zero-noise Limit
Title: Associative Memory and Generative Diffusion in the Zero-noise Limit | Assoziatives Gedächtnis und Generative Diffusion in der Null-Rausch-Grenze | 零噪音限制中的联合内存和生成扩散 2506.05178v1 |
Authors (2): Joshua Hess, Quaid Morris
Connections between generative diffusion and continuous-state associative memory models are studied. Morse-Smale dynamical systems are emphasized as universal approximators of gradient-based associative memory models and diffusion models as white-noise perturbed systems thereof. Universal properties of associative memory that follow from this description are described and used to characterize a generic transition from generation to memory as noise levels diminish. Structural stability inherited by Morse-Smale flows is shown to imply a notion of stability for diffusions at vanishing noise levels. Applied to one- and two-parameter families of gradients, this indicates stability at all but isolated points of associative memory learning landscapes and the learning and generation landscapes of diffusion models with gradient drift in the zero-noise limit, at which small sets of generic bifurcations characterize qualitative transitions between stable systems. Examples illustrating the characterization of these landscapes by sequences of these bifurcations are given, along with a structural stability criterion for classic and modern Hopfield networks (equivalently, the attention mechanism).
nan
Article 1905
Title@2025-06-05 (4): Can Large Language Models Understand Intermediate Representations in Compilers?
Title: Can Large Language Models Understand Intermediate Representations in Compilers? | Können große Sprachmodelle Zwischendarstellungen in Compilern verstehen? | 大语言模式能理解《汇编者》的中间代表吗? 2502.06854v2 |
Authors (7): Hailong Jiang, Jianfeng Zhu, Yao Wan, Bo Fang, Hongyu Zhang, Ruoming Jin, Qiang Guan
Intermediate Representations (IRs) play a critical role in compiler design and program analysis, yet their comprehension by Large Language Models (LLMs) remains underexplored. In this paper, we present an explorative empirical study evaluating the capabilities of six state-of-the-art LLMs: GPT-4, GPT-3, DeepSeek, Gemma 2, Llama 3, and Code Llama, in understanding IRs. Specifically, we assess model performance across four core tasks: control flow graph reconstruction, decompilation, code summarization, and execution reasoning. While LLMs exhibit competence in parsing IR syntax and identifying high-level structures, they consistently struggle with instruction-level reasoning, especially in control flow reasoning, loop handling, and dynamic execution. Common failure modes include misinterpreting branching instructions, omitting critical operations, and relying on heuristic reasoning rather than precise instruction-level logic. Our findings highlight the need for IR-specific enhancements in LLM design. We recommend fine-tuning on structured IR datasets and integrating control-flow-sensitive architectures to improve model effectiveness. All experimental data and source code are publicly available at
nan
Article 1906
Title@2025-06-05 (4): Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks
Title: Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks | Verallgemeinerungen durch Meta-Learned Model Representations: PAC-Bayes und Sample Compression Hypernetworks | 通过Met-Learn模型代表:PAC-Bayes和抽样压缩超网络的通用环 2410.13577v3 |
Authors (5): Benjamin Leblanc, Mathieu Bazinet, Nathaniel D’Amours, Alexandre Drouin, Pascal Germain
Both PAC-Bayesian and Sample Compress learning frameworks are instrumental for deriving tight (non-vacuous) generalization bounds for neural networks. We leverage these results in a meta-learning scheme, relying on a hypernetwork that outputs the parameters of a downstream predictor from a dataset input. The originality of our approach lies in the investigated hypernetwork architectures that encode the dataset before decoding the parameters: (1) a PAC-Bayesian encoder that expresses a posterior distribution over a latent space, (2) a Sample Compress encoder that selects a small sample of the dataset input along with a message from a discrete set, and (3) a hybrid between both approaches motivated by a new Sample Compress theorem handling continuous messages. The latter theorem exploits the pivotal information transiting at the encoder-decoder junction in order to compute generalization guarantees for each downstream predictor obtained by our meta-learning scheme.
nan
Article 1907
Title@2025-06-05 (4): Goal-Oriented Time-Series Forecasting: Foundation Framework Design
Title: Goal-Oriented Time-Series Forecasting: Foundation Framework Design | Zielorientierte Zeitreihenplanung: Grundlagenrahmengestaltung | 目标导向的时间系列预测:基础框架设计 2504.17493v2 |
Authors (7): Luca-Andrei Fechete, Mohamed Sana, Fadhel Ayed, Nicola Piovesan, Wenjie Li, Antonio De Domenico, Tareq Si Salem
Traditional time-series forecasting often focuses only on minimizing prediction errors, ignoring the specific requirements of the real-world applications that employ the forecasts. This paper presents a new training methodology, which allows a forecasting model to dynamically adjust its focus based on the importance of forecast ranges specified by the end application. Unlike previous methods that fix these ranges beforehand, our training approach breaks down predictions over the entire signal range into smaller segments, which are then dynamically weighted and combined to produce accurate forecasts within a region of interest. We tested our method on standard datasets, including a new wireless communication dataset, and found that it not only improves prediction accuracy but also enhances the performance of the end application employing the forecasting model. This research provides a basis for creating forecasting systems that better connect prediction and decision-making in various practical applications.
nan
Article 1908
Title@2025-06-05 (4): Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests
Title: Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests | Bewertung der Robustheit von Deep Reinforcement Learning für die autonome Oberflächenfahrzeugsteuerung in Feldversuchen | 评价实地测试中用于自动水陆车辆控制的深强化学习的强力 2505.10033v2 |
Authors (4): Luis F. W. Batista, Stéphanie Aravecchia, Seth Hutchinson, Cédric Pradalier
Despite significant advancements in Deep Reinforcement Learning (DRL) for Autonomous Surface Vehicles (ASVs), their robustness in real-world conditions, particularly under external disturbances, remains insufficiently explored. In this paper, we evaluate the resilience of a DRL-based agent designed to capture floating waste under various perturbations. We train the agent using domain randomization and evaluate its performance in real-world field tests, assessing its ability to handle unexpected disturbances such as asymmetric drag and an off-center payload. We assess the agent’s performance under these perturbations in both simulation and real-world experiments, quantifying performance degradation and benchmarking it against an MPC baseline. Results indicate that the DRL agent performs reliably despite significant disturbances. Along with the open-source release of our implementation, we provide insights into effective training strategies, real-world challenges, and practical considerations for deploying DRL-based ASV controllers.
nan
Article 1909
Title@2025-06-05 (4): Detection-Driven Object Count Optimization for Text-to-Image Diffusion Models
Title: Detection-Driven Object Count Optimization for Text-to-Image Diffusion Models | Detektionsgesteuerte Objektanzahl-Optimierung für Text-zu-Bild-Diffusionsmodelle | 文本到图像传播模型最佳化 2408.11721v2 |
Authors (4): Oz Zafar, Yuval Cohen, Lior Wolf, Idan Schwartz
Accurately controlling object count in text-to-image generation remains a key challenge. Supervised methods often fail, as training data rarely covers all count variations. Methods that manipulate the denoising process to add or remove objects can help; however, they still require labeled data, limit robustness and image quality, and rely on a slow, iterative process. Pre-trained differentiable counting models that rely on soft object density summation exist and could steer generation, but employing them presents three main challenges: (i) they are pre-trained on clean images, making them less effective during denoising steps that operate on noisy inputs; (ii) they are not robust to viewpoint changes; and (iii) optimization is computationally expensive, requiring repeated model evaluations per image. We propose a new framework that uses pre-trained object counting techniques and object detectors to guide generation. First, we optimize a counting token using an outer-loop loss computed on fully generated images. Second, we introduce a detection-driven scaling term that corrects errors caused by viewpoint and proportion shifts, among other factors, without requiring backpropagation through the detection model. Third, we show that the optimized parameters can be reused for new prompts, removing the need for repeated optimization. Our method provides efficiency through token reuse, flexibility via compatibility with various detectors, and accuracy with improved counting across diverse object categories.
nan
Article 1910
Title@2025-06-05 (4): SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows
Title: SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows | SESaMo: Symmetrie-verstärkende stochastische Modulation für normalisierende Strömungen | SESaMo: 正常流动的对称性-强化斯托调动 2505.19619v2 |
Authors (4): Janik Kreit, Dominic Schuh, Kim A. Nicoli, Lena Funcke
Deep generative models have recently garnered significant attention across various fields, from physics to chemistry, where sampling from unnormalized Boltzmann-like distributions represents a fundamental challenge. In particular, autoregressive models and normalizing flows have become prominent due to their appealing ability to yield closed-form probability densities. Moreover, it is well-established that incorporating prior knowledge - such as symmetries - into deep neural networks can substantially improve training performance. In this context, recent advances have focused on developing symmetry-equivariant generative models, achieving remarkable results. Building upon these foundations, this paper introduces Symmetry-Enforcing Stochastic Modulation (SESaMo). Similar to equivariant normalizing flows, SESaMo enables the incorporation of inductive biases (e.g., symmetries) into normalizing flows through a novel technique called stochastic modulation. This approach enhances the flexibility of the generative model, allowing it to effectively learn a variety of exact and broken symmetries. Our numerical experiments benchmark SESaMo in different scenarios, including an 8-Gaussian mixture model and physically relevant field theories, such as the $\phi^4$ theory and the Hubbard model.
nan
Article 1911
Title@2025-06-05 (4): Federated Isolation Forest for Efficient Anomaly Detection on Edge IoT Systems
Title: Federated Isolation Forest for Efficient Anomaly Detection on Edge IoT Systems | Föderierter Isolationswald für effiziente Anomalienerkennung an Edge IoT-Systemen | 在边缘IOT系统中高效异常探测的联邦隔离林 2506.05138v1 |
Authors (3): Pavle Vasiljevic, Milica Matic, Miroslav Popovic
Recently, federated learning frameworks such as Python TestBed for Federated Learning Algorithms and MicroPython TestBed for Federated Learning Algorithms have emerged to tackle user privacy concerns and efficiency in embedded systems. Even more recently, an efficient federated anomaly detection algorithm, FLiForest, based on Isolation Forests has been developed, offering a low-resource, unsupervised method well-suited for edge deployment and continuous learning. In this paper, we present an application of Isolation Forest-based temperature anomaly detection, developed using the previously mentioned federated learning frameworks, aimed at small edge devices and IoT systems running MicroPython. The system has been experimentally evaluated, achieving over 96% accuracy in distinguishing normal from abnormal readings and above 78% precision in detecting anomalies across all tested configurations, while maintaining a memory usage below 160 KB during model training. These results highlight its suitability for resource-constrained environments and edge systems, while upholding federated learning principles of data privacy and collaborative learning.
nan
Article 1912
Title@2025-06-05 (4): Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning
Title: Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning | Training Dynamics zugrunde liegende Sprachmodellskalierungsgesetze: Verlustverschleierung und Null-Summe-Lernen | 培训动态基础语言示范缩写法:损失减速和零苏姆学习 2506.05447v1 |
Authors (5): Andrei Mircea, Supriyo Chakraborty, Nima Chitsazan, Irina Rish, Ekaterina Lobacheva
This work aims to understand how scaling improves language models, specifically in terms of training dynamics. We find that language models undergo loss deceleration early in training: an abrupt slowdown in the rate of loss improvement, resulting in piecewise linear behaviour of the loss curve in log-log space. Scaling up the model mitigates this transition by (1) decreasing the loss at which deceleration occurs, and (2) improving the log-log rate of loss improvement after deceleration. We attribute loss deceleration to a type of degenerate training dynamics we term zero-sum learning (ZSL). In ZSL, per-example gradients become systematically opposed, leading to destructive interference in per-example changes in loss. As a result, improving loss on one subset of examples degrades it on another, bottlenecking overall progress. Loss deceleration and ZSL provide new insights into the training dynamics underlying language model scaling laws, and could potentially be targeted directly to improve language models independent of scale. We make our code and artefacts available at: https://github.com/mirandrom/zsl
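One way to make "destructive interference in per-example changes in loss" concrete is sketched below. The function name `destructive_interference` and the specific ratio are illustrative assumptions and may not match the definition used in the paper: the sketch compares the magnitude of the net loss change with the sum of the magnitudes of the per-example changes, so that 0 means all examples move in the same direction and 1 means gains and losses cancel exactly.

```python
def destructive_interference(loss_changes):
    """Return 1 - |sum of changes| / sum of |changes| for per-example loss changes."""
    total_abs = sum(abs(d) for d in loss_changes)
    if total_abs == 0:
        return 0.0  # no change at all, so nothing cancels
    return 1.0 - abs(sum(loss_changes)) / total_abs


# All examples improve together: no interference.
aligned = destructive_interference([-0.5, -0.5])

# One example improves exactly as much as another degrades: zero-sum step.
zero_sum = destructive_interference([-0.5, 0.5])
```

Under this assumed measure, a training step in the zero-sum regime leaves the average loss nearly unchanged even though individual examples change substantially.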
nan
Article 1913
Title@2025-06-05 (4): DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning
Title: DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning | DiCoRe: Erweitern der Null-Shot-Erkennung durch Divergent-Convergent LLM Reasoning | DiCore: 通过差异-说服者LLM 合理性,加强零射事件探测 2506.05128v1 |
Authors (5): Tanmay Parekh, Kartik Mehta, Ninareh Mehrabi, Kai-Wei Chang, Nanyun Peng
Zero-shot Event Detection (ED), the task of identifying event mentions in natural language text without any training data, is critical for document understanding in specialized domains. Understanding the complex event ontology, extracting domain-specific triggers from the passage, and structuring them appropriately overloads and limits the utility of Large Language Models (LLMs) for zero-shot ED. To this end, we propose DiCoRe, a divergent-convergent reasoning framework that decouples the task of ED using Dreamer and Grounder. Dreamer encourages divergent reasoning through open-ended event discovery, which helps to boost event coverage. Conversely, Grounder introduces convergent reasoning to align the free-form predictions with the task-specific instructions using finite-state machine guided constrained decoding. Additionally, an LLM-Judge verifies the final outputs to ensure high precision. Through extensive experiments on six datasets across five domains and nine LLMs, we demonstrate how DiCoRe consistently outperforms prior zero-shot, transfer-learning, and reasoning baselines, achieving 4-7% average F1 gains over the best baseline – establishing DiCoRe as a strong zero-shot ED framework.
nan
Article 1914
Title@2025-06-05 (4): Membership Inference Attacks on Sequence Models
Title: Membership Inference Attacks on Sequence Models | Membership Inferenz Attacks on Sequence Models | 对序列模型的攻击 2506.05126v1 |
Authors (4): Lorenzo Rossi, Michael Aerni, Jie Zhang, Florian Tramèr
Sequence models, such as Large Language Models (LLMs) and autoregressive image generators, have a tendency to memorize and inadvertently leak sensitive information. While this tendency has critical legal implications, existing tools are insufficient to audit the resulting risks. We hypothesize that those tools’ shortcomings are due to mismatched assumptions. Thus, we argue that effectively measuring privacy leakage in sequence models requires leveraging the correlations inherent in sequential generation. To illustrate this, we adapt a state-of-the-art membership inference attack to explicitly model within-sequence correlations, thereby demonstrating how a strong existing attack can be naturally extended to suit the structure of sequence models. Through a case study, we show that our adaptations consistently improve the effectiveness of memorization audits without introducing additional computational costs. Our work hence serves as an important stepping stone toward reliable memorization audits for large sequence models.
nan
Article 1915
Title@2025-06-05 (4): Jailbreak Attack Initializations as Extractors of Compliance Directions
Title: Jailbreak Attack Initializations as Extractors of Compliance Directions | Jailbreak Attack Initialisierungen als Extraktoren für Compliance-Anweisungen | 作为 “ 遵守指示 “ 的提取器 2502.09755v2 |
Authors (5): Amit Levi, Rom Himelstein, Yaniv Nemcovsky, Avi Mendelson, Chaim Baskin
Safety-aligned LLMs respond to prompts with either compliance or refusal, each corresponding to distinct directions in the model’s activation space. Recent works show that initializing attacks via self-transfer from other prompts significantly enhances their performance. However, the underlying mechanisms of these initializations remain unclear, and attacks utilize arbitrary or hand-picked initializations. This work shows that each gradient-based jailbreak attack and its subsequent initialization gradually converge to a single compliance direction that suppresses refusal, thereby enabling an efficient transition from refusal to compliance. Based on this insight, we propose CRI, an initialization framework that aims to project unseen prompts further along compliance directions. We demonstrate our approach on multiple attacks, models, and datasets, achieving an increased attack success rate (ASR) and reduced computational overhead, highlighting the fragility of safety-aligned LLMs. A reference implementation is available at: https://amit1221levi.github.io/CRI-Jailbreak-Init-LLMs-evaluation.
nan
Article 1916
Title@2025-06-05 (4): Nonlinear Causal Discovery for Grouped Data
Title: Nonlinear Causal Discovery for Grouped Data | Nichtlineare kausale Entdeckung für gruppierte Daten | 分组数据的非线性构造发现 2506.05120v1 |
Authors (3): Konstantin Göbler, Tobias Windisch, Mathias Drton
Inferring cause-effect relationships from observational data has gained significant attention in recent years, but most methods are limited to scalar random variables. In many important domains, including neuroscience, psychology, social science, and industrial manufacturing, the causal units of interest are groups of variables rather than individual scalar measurements. Motivated by these applications, we extend nonlinear additive noise models to handle random vectors, establishing a two-step approach for causal graph learning: First, infer the causal order among random vectors. Second, perform model selection to identify the best graph consistent with this order. We introduce effective and novel solutions for both steps in the vector case, demonstrating strong performance in simulations. Finally, we apply our method to real-world assembly line data with partial knowledge of causal ordering among variable groups.
nan
Article 1917
Title@2025-06-05 (4): Exploring Adversarial Watermarking in Transformer-Based Models: Transferability and Robustness Against Defense Mechanism for Medical Images
Title: Exploring Adversarial Watermarking in Transformer-Based Models: Transferability and Robustness Against Defense Mechanism for Medical Images | Erforschung von adversarialen Wasserzeichen in transformerbasierten Modellen: Übertragbarkeit und Robustheit gegen Abwehrmechanismen für medizinische Bilder | 探索以变换模式为基础的变换模型中的反对水标记:医疗图像防御机制的可转移性和强力 2506.06389v1 |
Authors (5): Rifat Sadik, Tanvir Rahman, Arpan Bhattacharjee, Bikash Chandra Halder, Ismail Hossain
Deep learning models have shown remarkable success in dermatological image analysis, offering potential for automated skin disease diagnosis. Convolutional neural network (CNN) based architectures have long been popular and successful in computer vision (CV) tasks such as skin image recognition, generation, and video analysis. With the emergence of transformer-based models, however, CV tasks are now increasingly carried out with these models. Vision Transformers (ViTs) are one such family of transformer-based models that has shown success in computer vision; they use self-attention mechanisms to achieve state-of-the-art performance across various tasks. However, their reliance on global attention mechanisms makes them susceptible to adversarial perturbations. This paper investigates the susceptibility of ViTs for medical images to adversarial watermarking, a method that adds so-called imperceptible perturbations in order to fool models. By generating adversarial watermarks through Projected Gradient Descent (PGD), we examine the transferability of such attacks to CNNs and analyze the performance of a defense mechanism, adversarial training. Results indicate that while performance is not compromised for clean images, ViTs become much more vulnerable to adversarial attacks, with accuracy dropping to as low as 27.6%. Nevertheless, adversarial training raises it up to 90.0%.
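The Projected Gradient Descent procedure named in the abstract can be sketched on a toy model. This is a minimal illustration, assuming a two-feature logistic classifier in place of a ViT and an L-infinity perturbation budget; `sigmoid`, `loss_gradient`, `sign`, `pgd_attack`, and all parameter values are hypothetical and do not come from the paper. Each iteration takes a signed gradient step that increases the loss and then projects the input back into the eps-ball around the original.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_attack(w, b, x0, y, eps=0.1, alpha=0.02, steps=10):
    """L-infinity PGD: ascend the loss, then project into the eps-ball around x0."""
    x = list(x0)
    for _ in range(steps):
        grad = loss_gradient(w, b, x, y)
        # Signed ascent step on the loss.
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, grad)]
        # Projection: clip each coordinate to [x0 - eps, x0 + eps].
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
    return x


# Toy classifier and a correctly classified input with label 1.
w, b = [2.0, -1.0], 0.0
x0, y = [0.5, 0.5], 1
x_adv = pgd_attack(w, b, x0, y)
```

Adversarial training, the defense evaluated in the abstract, would feed inputs such as `x_adv` back into the training loop with their true labels.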
nan
Article 1918
Title@2025-06-05 (4): Learning Rock Pushability on Rough Planetary Terrain
Title: Learning Rock Pushability on Rough Planetary Terrain | Rock Pushability lernen auf rauem planetarischen Gelände | 环绕行星地的学习岩岩推力 2505.09833v2 |
Authors (3): Tuba Girgin, Emre Girgin, Cagri Kilic
In the context of mobile navigation in unstructured environments, the predominant approach entails the avoidance of obstacles. The prevailing path planning algorithms are contingent upon deviating from the intended path for an indefinite duration and returning to the closest point on the route after the obstacle is left behind spatially. However, avoiding an obstacle on a path that will be used repeatedly by multiple agents can hinder long-term efficiency and lead to a lasting reliance on an active path planning system. In this study, we propose an alternative approach to mobile navigation in unstructured environments by leveraging the manipulation capabilities of a robotic manipulator mounted on top of a mobile robot. Our proposed framework integrates exteroceptive and proprioceptive feedback to assess the push affordance of obstacles, facilitating their repositioning rather than avoidance. While our preliminary visual estimation takes into account the characteristics of both the obstacle and the surface it relies on, the push affordance estimation module exploits the force feedback obtained by interacting with the obstacle via a robotic manipulator as the guidance signal. The objective of our navigation approach is to enhance the efficiency of routes utilized by multiple agents over extended periods by reducing the overall time spent by a fleet in environments where autonomous infrastructure development is imperative, such as lunar or Martian surfaces.
nan
Article 1919
Title@2025-06-05 (4): Survey on the Evaluation of Generative Models in Music
Title: Survey on the Evaluation of Generative Models in Music | Umfrage zur Bewertung generativer Modelle in der Musik | 音乐创创模型评价调查 2506.05104v1 |
Authors (6): Alexander Lerch, Claire Arthur, Nick Bryan-Kinns, Corey Ford, Qianyi Sun, Ashvala Vinay
Research on generative systems in music has seen considerable attention and growth in recent years. A variety of attempts have been made to systematically evaluate such systems. We provide an interdisciplinary review of the common evaluation targets, methodologies, and metrics for the evaluation of both system output and model usability, covering subjective and objective approaches, qualitative and quantitative approaches, as well as empirical and computational methods. We discuss the advantages and challenges of such approaches from a musicological, an engineering, and an HCI perspective.
nan
Article 1920
Title@2025-06-05 (4): Structural Alignment Improves Graph Test-Time Adaptation
Title: Structural Alignment Improves Graph Test-Time Adaptation | Struktural Alignment verbessert Graph Test-Time Anpassung | 结构调整改进图示测试时间适应 2502.18334v3 |
Authors (4): Hans Hao-Hsun Hsu, Shikun Liu, Han Zhao, Pan Li
Graph-based learning excels at capturing interaction patterns in diverse domains like recommendation, fraud detection, and particle physics. However, its performance often degrades under distribution shifts, especially those altering network connectivity. Current methods to address these shifts typically require retraining with the source dataset, which is often infeasible due to computational or privacy limitations. We introduce Test-Time Structural Alignment (TSA), a novel algorithm for Graph Test-Time Adaptation (GTTA) that aligns graph structures during inference without accessing the source data. Grounded in a theoretical understanding of graph data distribution shifts, TSA employs three synergistic strategies: uncertainty-aware neighborhood weighting to accommodate neighbor label distribution shifts, adaptive balancing of self-node and aggregated neighborhood representations based on their signal-to-noise ratio, and decision boundary refinement to correct residual label and feature shifts. Extensive experiments on synthetic and real-world datasets demonstrate TSA’s consistent outperformance of both non-graph TTA methods and state-of-the-art GTTA baselines.
nan
Article 1921
Title@2025-06-05 (4): Privacy Amplification Through Synthetic Data: Insights from Linear Regression
Title: Privacy Amplification Through Synthetic Data: Insights from Linear Regression | Datenschutzverstärkung durch synthetische Daten: Einblicke aus linearer Regression | 通过合成数据放大隐私:线性回归的洞视 2506.05101v1 |
Authors (4): Clément Pierquin, Aurélien Bellet, Marc Tommasi, Matthieu Boussard
Synthetic data inherits the differential privacy guarantees of the model used to generate it. Additionally, synthetic data may benefit from privacy amplification when the generative model is kept hidden. While empirical studies suggest this phenomenon, a rigorous theoretical understanding is still lacking. In this paper, we investigate this question through the well-understood framework of linear regression. First, we establish negative results showing that if an adversary controls the seed of the generative model, a single synthetic data point can leak as much information as releasing the model itself. Conversely, we show that when synthetic data is generated from random inputs, releasing a limited number of synthetic data points amplifies privacy beyond the model’s inherent guarantees. We believe our findings in linear regression can serve as a foundation for deriving more general bounds in the future.
nan
Article 1922
Title@2025-06-05 (4): The Impossibility of Fair LLMs
Title: The Impossibility of Fair LLMs | Die Unmöglichkeit fairer LLMs | 公平专利Ms的不可行性 2406.03198v2 |
Authors (5): Jacy Anthis, Kristian Lum, Michael Ekstrand, Avi Feller, Chenhao Tan
The rise of general-purpose artificial intelligence (AI) systems, particularly large language models (LLMs), has raised pressing moral questions about how to reduce bias and ensure fairness at scale. Researchers have documented a sort of “bias” in the significant correlations between demographics (e.g., race, gender) in LLM prompts and responses, but it remains unclear how LLM fairness could be evaluated with more rigorous definitions, such as group fairness or fair representations. We analyze a variety of technical fairness frameworks and find inherent challenges in each that make the development of a fair LLM intractable. We show that each framework either does not logically extend to the general-purpose AI context or is infeasible in practice, primarily due to the large amounts of unstructured training data and the many potential combinations of human populations, use cases, and sensitive attributes. These inherent challenges would persist for general-purpose AI, including LLMs, even if empirical challenges, such as limited participatory input and limited measurement methods, were overcome. Nonetheless, fairness will remain an important type of model evaluation, and there are still promising research directions, particularly the development of standards for the responsibility of LLM developers, context-specific evaluations, and methods of iterative, participatory, and AI-assisted evaluation that could scale fairness across the diverse contexts of modern human-AI interaction.
Article 1923
Title@2025-06-05 (4): Semi-Implicit Variational Inference via Kernelized Path Gradient Descent
Title: Semi-Implicit Variational Inference via Kernelized Path Gradient Descent | Halbimplizite Variationsableitung über kernalisierten Pfad gradienten Abstieg | 通过内核化路径梯度梯度 2506.05088v1 |
Authors (3): Tobias Pielok, Bernd Bischl, David Rügamer
Semi-implicit variational inference (SIVI) is a powerful framework for approximating complex posterior distributions, but training with the Kullback-Leibler (KL) divergence can be challenging due to high variance and bias in high-dimensional settings. While current state-of-the-art semi-implicit variational inference methods, particularly Kernel Semi-Implicit Variational Inference (KSIVI), have been shown to work in high dimensions, training remains moderately expensive. In this work, we propose a kernelized KL divergence estimator that stabilizes training through nonparametric smoothing. To further reduce the bias, we introduce an importance sampling correction. We provide a theoretical connection to the amortized version of the Stein variational gradient descent, which estimates the score gradient via Stein’s identity, showing that both methods minimize the same objective, but our semi-implicit approach achieves lower gradient variance. In addition, our method’s bias in function space is benign, leading to more stable and efficient optimization. Empirical results demonstrate that our method outperforms or matches state-of-the-art SIVI methods in both performance and training efficiency.
Article 1924
Title@2025-06-05 (4): Autoformulation of Mathematical Optimization Models Using LLMs
Title: Autoformulation of Mathematical Optimization Models Using LLMs | Autoformulierung mathematischer Optimierungsmodelle mit LLMs | 使用LLMM 的数学优化模型自动格式化 2411.01679v2 |
Authors (4): Nicolás Astorga, Tennison Liu, Yuanzhang Xiao, Mihaela van der Schaar
Mathematical optimization is fundamental to decision-making across diverse domains, from operations research to healthcare. Yet, translating real-world problems into optimization models remains a difficult task, often demanding specialized expertise. This paper approaches the problem of autoformulation: the automated creation of solver-ready optimization models from natural language problem descriptions. We identify three core challenges of autoformulation: (1) the vast, problem-dependent hypothesis space, (2) efficient and diverse exploration of this space under uncertainty, and (3) evaluation of formulation correctness against problem description. To address these challenges, we present a novel method leveraging Large Language Models (LLMs) with Monte-Carlo Tree Search, exploiting the hierarchical nature of optimization modeling to generate and systematically explore possible formulations. To enhance search efficiency, we introduce symbolic pruning to eliminate trivially equivalent search paths (branches), and employ LLM-based evaluation of partial formulations to guide search. Empirical analysis on linear and mixed-integer programming benchmarks demonstrates our method’s effectiveness, with significant performance gains from both LLM-based value estimation and symbolic pruning techniques.
Article 1925
Title@2025-06-05 (4): EMBER2024 – A Benchmark Dataset for Holistic Evaluation of Malware Classifiers
Title: EMBER2024 – A Benchmark Dataset for Holistic Evaluation of Malware Classifiers | EMBER2024 – Ein Benchmark-Datensatz für die ganzheitliche Auswertung von Malware-Klassifikatoren | 2024年9月24日 – – 用于全面评估软件分类器基准数据集 2506.05074v1 |
Authors (8): Robert J. Joyce, Gideon Miller, Phil Roth, Richard Zak, Elliott Zaresky-Williams, Hyrum Anderson, Edward Raff, James Holt
A lack of accessible data has historically restricted malware analysis research, and practitioners have relied heavily on datasets provided by industry sources to advance. Existing public datasets are limited by narrow scope: most include files targeting a single platform, have labels supporting just one type of malware classification task, and make no effort to capture the evasive files that make malware detection difficult in practice. We present EMBER2024, a new dataset that enables holistic evaluation of malware classifiers. Created in collaboration with the authors of EMBER2017 and EMBER2018, the EMBER2024 dataset includes hashes, metadata, feature vectors, and labels for more than 3.2 million files from six file formats. Our dataset supports the training and evaluation of machine learning models on seven malware classification tasks, including malware detection, malware family classification, and malware behavior identification. EMBER2024 is the first to include a collection of malicious files that initially went undetected by a set of antivirus products, creating a “challenge” set to assess classifier performance against evasive malware. This work also introduces EMBER feature version 3, with added support for several new feature types. We are releasing the EMBER2024 dataset to promote reproducibility and empower researchers in the pursuit of new malware research topics.
Article 1926
Title@2025-06-05 (4): GoRA: Gradient-driven Adaptive Low Rank Adaptation
Title: GoRA: Gradient-driven Adaptive Low Rank Adaptation | GoRA: Gradient-getriebene Adaptive Low-Rank-Anpassung | GARA:逐步驱动的适应性低级别适应 2502.12171v2 |
Authors (7): Haonan He, Peng Ye, Yuchen Ren, Yuan Yuan, Luyang Zhou, Shucun Ju, Lei Chen
Low-Rank Adaptation (LoRA) is a crucial method for efficiently fine-tuning large language models (LLMs), with its effectiveness influenced by two key factors: rank selection and weight initialization. While numerous LoRA variants have been proposed to improve performance by addressing one of these aspects, they often compromise usability or computational efficiency. In this paper, we analyze and identify the core limitations of existing approaches and propose a novel framework – GoRA (Gradient-driven Adaptive Low Rank Adaptation) – that simultaneously adapts both the rank and initialization strategy within a unified framework. GoRA leverages gradient information during training to dynamically assign optimal ranks and initialize low-rank adapter weights in an adaptive manner. To our knowledge, GoRA is the first method that not only addresses the limitations of prior approaches – which often focus on either rank selection or initialization in isolation – but also unifies both aspects within a single framework, enabling more effective and efficient adaptation. Extensive experiments across various architectures and modalities show that GoRA consistently outperforms existing LoRA-based methods while preserving the efficiency of vanilla LoRA. For example, when fine-tuning Llama3.1-8B-Base for mathematical reasoning, GoRA achieves a 5.13-point improvement over standard LoRA and even outperforms full fine-tuning by 2.05 points under high-rank settings.
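For readers unfamiliar with the baseline, the sketch below shows the vanilla LoRA parameterization that GoRA builds on. It is not GoRA itself: the abstract does not specify how gradients are turned into ranks or initial weights. The function name `lora_forward`, the shapes, and the `alpha / r` scaling follow the common LoRA convention and are assumptions here.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Vanilla LoRA layer: y = x W^T + (alpha / r) * x A^T B^T.

    W (d_out, d_in) is the frozen pretrained weight; A (r, d_in) and
    B (d_out, r) are the trainable low-rank adapter factors.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 8, 4
W = rng.normal(size=(d_out, d_in))        # frozen weight
A = rng.normal(size=(r, d_in)) * 0.01     # down-projection, small random init
B = np.zeros((d_out, r))                  # up-projection, zero init: adapter starts as a no-op
x = rng.normal(size=(2, d_in))

y0 = lora_forward(x, W, A, B, alpha=8.0)  # equals the frozen layer's output at initialization

# After training, B is non-zero; the effective weight update has rank at most r.
B_trained = rng.normal(size=(d_out, r))   # stand-in for trained values
delta = (8.0 / r) * B_trained @ A
rank = np.linalg.matrix_rank(delta)
```

Both quantities the abstract says GoRA adapts are visible here as fixed choices: the rank `r` and the initial values of `A` and `B`.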
Article 1927
Title@2025-06-05 (4): UnHiPPO: Uncertainty-aware Initialization for State Space Models
Title: UnHiPPO: Uncertainty-aware Initialization for State Space Models | UnHiPPO: Unsichere Initialisierung für State Space Models | UNHPPPO:国家空间模型的不确定性初始化 2506.05065v1 |
Authors (3): Marten Lienen, Abdullah Saydemir, Stephan Günnemann
State space models are emerging as a dominant model class for sequence problems, with many relying on the HiPPO framework to initialize their dynamics. However, HiPPO fundamentally assumes data to be noise-free, an assumption often violated in practice. We extend the HiPPO theory with measurement noise and derive an uncertainty-aware initialization for state space model dynamics. In our analysis, we interpret HiPPO as a linear stochastic control problem where the data enters as a noise-free control signal. We then reformulate the problem so that the data become noisy outputs of a latent system and arrive at an alternative dynamics initialization that infers the posterior of this latent system from the data without increasing runtime. Our experiments show that our initialization improves the resistance of state-space models to noise both at training and inference time. Find our implementation at https://cs.cit.tum.de/daml/unhippo.
Article 1928
Title@2025-06-05 (4): NIMO: a Nonlinear Interpretable MOdel
Title: NIMO: a Nonlinear Interpretable MOdel | NIMO: ein nichtlinear interpretierbarer MOdel | NIIMO:非线性解释元体 2506.05059v1 |
Authors (3): Shijian Xu, Marcello Massimo Negri, Volker Roth
Neural networks (NNs) have achieved tremendous success over the past decade, yet they are still extremely difficult to interpret. In contrast, linear models are less expressive but offer inherent interpretability. Linear coefficients are interpretable as the marginal effect of a feature on the prediction, assuming all other features are kept fixed. To combine the benefits of both approaches, we introduce NIMO (Nonlinear Interpretable MOdel). The key idea is to define a model where the NN is designed to learn nonlinear corrections to the linear model predictions, while also maintaining the original interpretability of the linear coefficients. We develop an optimization algorithm based on profile likelihood that optimizes over the NN parameters while updating the linear coefficients analytically. By relying on adaptive ridge regression, we can also easily incorporate sparsity constraints. We show empirically that we can recover the underlying linear coefficients while significantly improving the predictive accuracy. Compared to other hybrid interpretable approaches, our model is the only one that actually maintains the same interpretability of linear coefficients as in linear models. We also achieve higher performance on various regression and classification settings.
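The analytic coefficient update can be illustrated with ordinary closed-form ridge regression: once the network's output is held fixed, the best linear coefficients have a closed-form solution. The sketch below assumes a hypothetical multiplicative correction as a stand-in for the network, since the abstract does not say how NIMO couples the NN to the coefficients. `ridge_update`, `correction`, and `X_eff` are illustrative names, and plain ridge is used instead of the adaptive variant.

```python
import numpy as np

def ridge_update(X_eff, y, lam=1e-3):
    """Closed-form ridge solution: beta = (X^T X + lam I)^{-1} X^T y."""
    d = X_eff.shape[1]
    return np.linalg.solve(X_eff.T @ X_eff + lam * np.eye(d), X_eff.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([2.0, -1.0, 0.5])

# Hypothetical stand-in for a fixed network output: a per-feature
# multiplicative correction close to 1 (not the paper's architecture).
correction = 1.0 + 0.1 * np.tanh(X[:, ::-1])
X_eff = X * correction                    # effective design matrix with the NN held fixed
y = X_eff @ beta_true                     # noise-free targets for the demo

beta_hat = ridge_update(X_eff, y, lam=1e-8)
```

In a profile-likelihood style loop, a step like this would alternate with gradient updates of the network parameters.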
Article 1929
Title@2025-06-05 (4): Mini Diffuser: Fast Multi-task Diffusion Policy Training Using Two-level Mini-batches
Title: Mini Diffuser: Fast Multi-task Diffusion Policy Training Using Two-level Mini-batches | Mini Diffusor: Schnelles Multi-Task Diffusion Policy Training mit zweistufigen Mini-Batches | Mini Diffuser:利用两级小型水槽快速多任务传播政策培训 2505.09430v2 |
Authors (4): Yutong Hu, Pinhao Song, Kehan Wen, Renaud Detry
We present a method that reduces, by an order of magnitude, the time and memory needed to train multi-task vision-language robotic diffusion policies. This improvement arises from a previously underexplored distinction between action diffusion and the image diffusion techniques that inspired it: In image generation, the target is high-dimensional. By contrast, in action generation, the dimensionality of the target is comparatively small, and only the image condition is high-dimensional. Our approach, Mini Diffuser, exploits this asymmetry by introducing two-level minibatching, which pairs multiple noised action samples with each vision-language condition, instead of the conventional one-to-one sampling strategy. To support this batching scheme, we introduce architectural adaptations to the diffusion transformer that prevent information leakage across samples while maintaining full conditioning access. In RLBench simulations, Mini Diffuser achieves 95% of the performance of state-of-the-art multi-task diffusion policies, while using only 5% of the training time and 7% of the memory. Real-world experiments further validate that Mini Diffuser preserves the key strengths of diffusion-based policies, including the ability to model multimodal action distributions and produce behavior conditioned on diverse perceptual inputs. Code is available at mini-diffuse-actor.github.io
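The batching idea can be sketched as follows: each expensive vision-language condition is encoded once, and several independently noised copies of its low-dimensional action target are trained against it. This is a minimal illustration, not the authors' code. It assumes a standard DDPM forward process, and `two_level_batch`, the linear beta schedule, and the sizes are hypothetical choices.

```python
import numpy as np

def two_level_batch(actions, num_noised, alpha_bar, rng):
    """Pair each of B conditions with num_noised noised copies of its action.

    actions: (B, D) clean action targets, one per vision-language condition.
    Returns noised actions (B, K, D), timesteps (B, K), and noise (B, K, D).
    """
    B, D = actions.shape
    t = rng.integers(0, len(alpha_bar), size=(B, num_noised))  # independent timestep per copy
    eps = rng.normal(size=(B, num_noised, D))                  # independent noise per copy
    ab = alpha_bar[t][..., None]                               # (B, K, 1)
    noised = np.sqrt(ab) * actions[:, None, :] + np.sqrt(1.0 - ab) * eps
    return noised, t, eps

rng = np.random.default_rng(0)
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 100))     # assumed linear beta schedule
actions = rng.normal(size=(4, 7))                              # 4 conditions, 7-D actions
noised, t, eps = two_level_batch(actions, num_noised=8, alpha_bar=alpha_bar, rng=rng)
```

With 4 conditions and 8 noised copies each, the denoiser sees 32 training targets for the cost of encoding 4 images. The architectural changes that stop the copies from leaking information to each other are not shown.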
Article 1930
Title@2025-06-05 (4): Neurosymbolic Association Rule Mining from Tabular Data
Title: Neurosymbolic Association Rule Mining from Tabular Data | Neurosymbolische Assoziationsregel Mining aus tabellarischen Daten | 利用表格数据进行常规采矿 2504.19354v2 |
Authors (3): Erkan Karabulut, Paul Groth, Victoria Degeler
Association Rule Mining (ARM) is the task of mining patterns among data features in the form of logical rules, with applications across a myriad of domains. However, high-dimensional datasets often result in an excessive number of rules, increasing execution time and negatively impacting downstream task performance. Managing this rule explosion remains a central challenge in ARM research. To address this, we introduce Aerial+, a novel neurosymbolic ARM method. Aerial+ leverages an under-complete autoencoder to create a neural representation of the data, capturing associations between features. It extracts rules from this neural representation by exploiting the model’s reconstruction mechanism. Extensive evaluations on five datasets against seven baselines demonstrate that Aerial+ achieves state-of-the-art results by learning more concise, high-quality rule sets with full data coverage. When integrated into rule-based interpretable machine learning models, Aerial+ significantly reduces execution time while maintaining or improving accuracy.
Article 1931
Title@2025-06-05 (4): A Compressive-Expressive Communication Framework for Compositional Representations
Title: A Compressive-Expressive Communication Framework for Compositional Representations | Ein kompressiv-expressiver Kommunikationsrahmen für kompositorische Repräsentationen | 组成代表制压缩-高压沟通框架 2501.19182v3 |
Authors (4): Rafael Elberg, Felipe del Rio, Mircea Petrache, Denis Parra
Compositional generalization, the ability to interpret novel combinations of familiar elements, is a hallmark of human cognition and language. Despite recent advances, deep neural networks still struggle to acquire this property reliably. In this work, we introduce CELEBI (Compressive-Expressive Language Emergence through a discrete Bottleneck and Iterated learning), a novel self-supervised framework for inducing compositionality in learned representations from pre-trained models, through a reconstruction-based communication game between a sender and a receiver. Building on theories of language emergence, we integrate three mechanisms that jointly promote compressibility, expressivity, and efficiency in the emergent language. First, interactive decoding incentivizes intermediate reasoning by requiring the receiver to produce partial reconstructions after each symbol. Second, a reconstruction-based imitation phase, inspired by iterated learning, trains successive generations of agents to imitate reconstructions rather than messages, enforcing a tighter communication bottleneck. Third, pairwise distance maximization regularizes message diversity by encouraging high distances between messages, with formal links to entropy maximization. Our method significantly improves both the efficiency and compositionality of the learned messages on the Shapes3D and MPI3D datasets, surpassing prior discrete communication frameworks in both reconstruction accuracy and topographic similarity. This work provides new theoretical and empirical evidence for the emergence of structured, generalizable communication protocols from simplicity-based inductive biases.
Article 1932
Title@2025-06-05 (4): Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Title: Optimizing Anytime Reasoning via Budget Relative Policy Optimization | Optimierung jederzeit über Budget-Relational-Policy-Optimierung | 通过预算相对政策优化优化,优化任何时间的理由 2505.13438v2 |
Authors (6): Penghui Qi, Zichen Liu, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin
Scaling test-time compute is crucial for enhancing the reasoning capabilities of large language models (LLMs). Existing approaches typically employ reinforcement learning (RL) to maximize a verifiable reward obtained at the end of reasoning traces. However, such methods optimize only the final performance under a large and fixed token budget, which hinders efficiency in both training and deployment. In this work, we present a novel framework, AnytimeReasoner, to optimize anytime reasoning performance, which aims to improve token efficiency and the flexibility of reasoning under varying token budget constraints. To achieve this, we truncate the complete thinking process to fit within sampled token budgets from a prior distribution, compelling the model to summarize the optimal answer for each truncated thinking for verification. This introduces verifiable dense rewards into the reasoning process, facilitating more effective credit assignment in RL optimization. We then optimize the thinking and summary policies in a decoupled manner to maximize the cumulative reward. Additionally, we introduce a novel variance reduction technique, Budget Relative Policy Optimization (BRPO), to enhance the robustness and efficiency of the learning process when reinforcing the thinking policy. Empirical results in mathematical reasoning tasks demonstrate that our method consistently outperforms GRPO across all thinking budgets under various prior distributions, enhancing both training and token efficiency.
Article 1933
Title@2025-06-05 (4): Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic
Title: Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic | Causal Policy Learning in Verstärkung Lernen: Hintertür-angepasste Soft-Actor-Kritik | 强化学习中因果政策学习:后门调整后软动作器-批评 2506.05445v1 |
Authors (5): Thanh Vinh Vo, Young Lee, Haozhe Ma, Chien Lu, Tze-Yun Leong
Hidden confounders that influence both states and actions can bias policy learning in reinforcement learning (RL), leading to suboptimal or non-generalizable behavior. Most RL algorithms ignore this issue, learning policies from observational trajectories based solely on statistical associations rather than causal effects. We propose DoSAC (Do-Calculus Soft Actor-Critic with Backdoor Adjustment), a principled extension of the SAC algorithm that corrects for hidden confounding via causal intervention estimation. DoSAC estimates the interventional policy $\pi(a | \mathrm{do}(s))$ using the backdoor criterion, without requiring access to true confounders or causal labels. To achieve this, we introduce a learnable Backdoor Reconstructor that infers pseudo-past variables (previous state and action) from the current state to enable backdoor adjustment from observational data. This module is integrated into a soft actor-critic framework to compute both the interventional policy and its entropy. Empirical results on continuous control benchmarks show that DoSAC outperforms baselines under confounded settings, with improved robustness, generalization, and policy reliability.
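As a reminder of what backdoor adjustment computes, the sketch below applies the textbook formula $\pi(a | \mathrm{do}(s)) = \sum_z \pi(a | s, z)\, p(z)$ to small discrete tables. It only illustrates the adjustment itself: DoSAC works with continuous states and a learned reconstructor for the adjustment variables, which are not modelled here. The name `backdoor_adjust` and the table sizes are assumptions.

```python
import numpy as np

def backdoor_adjust(p_a_given_s_z, p_z):
    """Discrete backdoor adjustment.

    p_a_given_s_z: (S, Z, A) table of pi(a | s, z).
    p_z:           (Z,) marginal over the adjustment variable z.
    Returns an (S, A) table of pi(a | do(s)) = sum_z pi(a | s, z) p(z).
    """
    return np.einsum("sza,z->sa", p_a_given_s_z, p_z)

rng = np.random.default_rng(0)
S, Z, A = 3, 4, 2
p_a_given_s_z = rng.dirichlet(np.ones(A), size=(S, Z))  # each [s, z] row is a distribution over a
p_z = rng.dirichlet(np.ones(Z))                         # marginal, not conditioned on s

pi_do = backdoor_adjust(p_a_given_s_z, p_z)
```

The adjustment differs from the observational policy in that `z` is averaged with its marginal `p(z)` rather than with `p(z | s)`.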
Article 1934
Title@2025-06-05 (4): A Selective Homomorphic Encryption Approach for Faster Privacy-Preserving Federated Learning
Title: A Selective Homomorphic Encryption Approach for Faster Privacy-Preserving Federated Learning | Ein selektiver homomorpher Verschlüsselungsansatz für schnelleres Datenschutz-Erhalten von Federated Learning | 为更快的隐私-保护联邦学习采取选择性单态加密方法 2501.12911v4 |
Authors (2): Abdulkadir Korkmaz, Praveen Rao
Federated learning (FL) has emerged as a critical approach for privacy-preserving machine learning in healthcare, allowing collaborative model training across decentralized medical datasets without exchanging clients’ data. However, current security implementations for these systems face a fundamental trade-off: rigorous cryptographic protections like fully homomorphic encryption (FHE) impose prohibitive computational overhead, while lightweight alternatives risk data leakage through model updates. To address this issue, we present FAS (Fast and Secure Federated Learning), a novel approach that strategically combines selective homomorphic encryption, differential privacy, and bitwise scrambling to achieve robust security without compromising practical usability. Our approach eliminates the need for model pretraining phases while dynamically protecting high-risk model parameters through layered encryption and obfuscation. We implemented FAS using the Flower framework and evaluated it on a cluster of eleven physical machines. Our approach was up to 90% faster than applying FHE on the model weights. In addition, we eliminated the computational overhead that is required by competitors such as FedML-HE and MaskCrypt. Our approach was up to 1.5x faster than the competitors while achieving comparable security results. Experimental evaluations on medical imaging datasets confirm that FAS maintains similar security results to conventional FHE against gradient inversion attacks while preserving diagnostic model accuracy. These results position FAS as a practical solution for latency-sensitive healthcare applications where both privacy preservation and computational efficiency are requirements.
Article 1935
Title@2025-06-05 (4): iN2V: Bringing Transductive Node Embeddings to Inductive Graphs
Title: iN2V: Bringing Transductive Node Embeddings to Inductive Graphs | iN2V: Transduktive Knoteneinbettungen in induktive Graphen bringen | iN2V: 将感应节点嵌入式带入感应图 2506.05039v1 |
Authors (2): Nicolas Lell, Ansgar Scherp
Shallow node embeddings like node2vec (N2V) can be used for nodes without features or to supplement existing features with structure-based information. Embedding methods like N2V are limited in their application on new nodes, which restricts them to the transductive setting where the entire graph, including the test nodes, is available during training. We propose inductive node2vec (iN2V), which combines a post-hoc procedure to compute embeddings for nodes unseen during training and modifications to the original N2V training procedure to prepare the embeddings for this post-hoc procedure. We conduct experiments on several benchmark datasets and demonstrate that iN2V is an effective approach to bringing transductive embeddings to an inductive setting. Using iN2V embeddings improves node classification by 1 point on average, with up to 6 points of improvement depending on the dataset and the number of unseen nodes. Our iN2V is a plug-in approach to create new or enrich existing embeddings. It can also be combined with other embedding methods, making it a versatile approach for inductive node representation learning. Code to reproduce the results is available at https://github.com/Foisunt/iN2V .
Article 1936
Title@2025-06-05 (4): Scalable Multi-Agent Offline Reinforcement Learning and the Role of Information
Title: Scalable Multi-Agent Offline Reinforcement Learning and the Role of Information | Skalierbares Mehr-Agent-Offline-Verstärkungs-Lernen und die Rolle der Information | 可缩放的多代理机构脱线强化学习和信息的作用 2502.11260v2 |
Authors (3): Riccardo Zamboni, Enrico Brunetti, Marcello Restelli
Offline Reinforcement Learning (RL) focuses on learning policies solely from a batch of previously collected data, offering the potential to leverage such datasets effectively without the need for costly or risky active exploration. While recent advances in Offline Multi-Agent RL (MARL) have shown promise, most existing methods either rely on large datasets jointly collected by all agents or agent-specific datasets collected independently. The former approach ensures strong performance but raises scalability concerns, while the latter emphasizes scalability at the expense of performance guarantees. In this work, we propose a novel scalable routine for both dataset collection and offline learning. Agents first collect diverse datasets coherently with a pre-specified information-sharing network and subsequently learn coherent localized policies without requiring either full observability or falling back to complete decentralization. We theoretically demonstrate that this structured approach allows a multi-agent extension of the seminal Fitted Q-Iteration (FQI) algorithm to globally converge, with high probability, to near-optimal policies. The convergence is subject to error terms that depend on the informativeness of the shared information. Furthermore, we show how this approach allows us to bound the inherent error of the supervised-learning phase of FQI with the mutual information between shared and unshared information. Our algorithm, SCAlable Multi-agent FQI (SCAM-FQI), is then evaluated on a distributed decision-making problem. The empirical results align with our theoretical findings, supporting the effectiveness of SCAM-FQI in achieving a balance between scalability and policy performance.
Article 1937
Title@2025-06-05 (4): TIMING: Temporality-Aware Integrated Gradients for Time Series Explanation
Title: TIMING: Temporality-Aware Integrated Gradients for Time Series Explanation | TIMING: Temporality-Aware integrierte Gradienten für die Zeitreihe Erklärung | TIMING: 时间序列解释时空软件集成梯度 2506.05035v1 |
Authors (3): Hyeongwon Jang, Changhun Kim, Eunho Yang
Recent explainable artificial intelligence (XAI) methods for time series primarily estimate point-wise attribution magnitudes, while overlooking the directional impact on predictions, leading to suboptimal identification of significant points. Our analysis shows that conventional Integrated Gradients (IG) effectively capture critical points with both positive and negative impacts on predictions. However, current evaluation metrics fail to assess this capability, as they inadvertently cancel out opposing feature contributions. To address this limitation, we propose novel evaluation metrics, Cumulative Prediction Difference (CPD) and Cumulative Prediction Preservation (CPP), to systematically assess whether attribution methods accurately identify significant positive and negative points in time series XAI. Under these metrics, conventional IG outperforms recent counterparts. However, directly applying IG to time series data may lead to suboptimal outcomes, as generated paths ignore temporal relationships and introduce out-of-distribution samples. To overcome these challenges, we introduce TIMING, which enhances IG by incorporating temporal awareness while maintaining its theoretical properties. Extensive experiments on synthetic and real-world time series benchmarks demonstrate that TIMING outperforms existing time series XAI baselines. Our code is available at https://github.com/drumpt/TIMING.
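The sketch below shows conventional Integrated Gradients, the baseline that TIMING modifies, on a toy differentiable function. It is not the TIMING method and has no temporal awareness. The straight-line path, the midpoint-rule approximation, and the quadratic toy model with analytic gradient `grad_f` are illustrative assumptions.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=64):
    """Conventional IG along the straight path from baseline to x.

    IG_i = (x_i - b_i) * integral_0^1 dF/dx_i(b + a (x - b)) da,
    approximated with a midpoint Riemann sum over `steps` points.
    """
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)   # (steps, d)
    grads = np.stack([grad_fn(p) for p in path])         # gradient at each path point
    return (x - baseline) * grads.mean(axis=0)

# Toy model with signed feature effects: f(x) = sum_i w_i * x_i^2.
w = np.array([1.0, -2.0, 0.5])
f = lambda x: float(np.sum(w * x**2))
grad_f = lambda x: 2.0 * w * x

x = np.array([1.0, 1.0, 2.0])
baseline = np.zeros(3)
attr = integrated_gradients(grad_f, x, baseline)
```

The attributions keep their sign (the second feature is negative), which is the directional information the abstract says magnitude-only methods and cancelling metrics miss. They also sum to `f(x) - f(baseline)`, the completeness property of IG.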
Article 1938
Title@2025-06-05 (4): Identifying and Understanding Cross-Class Features in Adversarial Training
Title: Identifying and Understanding Cross-Class Features in Adversarial Training | Identifizieren und Verstehen von Cross-Class-Features im Adversarial Training | 确定和理解反向培训的跨系列特点 2506.05032v1 |
Authors (3): Zeming Wei, Yiwen Guo, Yisen Wang
Adversarial training (AT) has been considered one of the most effective methods for making deep neural networks robust against adversarial attacks, while the training mechanisms and dynamics of AT remain open research problems. In this paper, we present a novel perspective on studying AT through the lens of class-wise feature attribution. Specifically, we identify the impact of a key family of features on AT that are shared by multiple classes, which we call cross-class features. These features are typically useful for robust classification, which we offer theoretical evidence to illustrate through a synthetic data model. Through systematic studies across multiple model architectures and settings, we find that during the initial stage of AT, the model tends to learn more cross-class features until the best robustness checkpoint. As AT further squeezes the training robust loss and causes robust overfitting, the model tends to make decisions based on more class-specific features. Based on these discoveries, we further provide a unified view of two existing properties of AT, including the advantage of soft-label training and robust overfitting. Overall, these insights refine the current understanding of AT mechanisms and provide new perspectives on studying them. Our code is available at https://github.com/PKU-ML/Cross-Class-Features-AT.
Article 1939
Title@2025-06-05 (4): Artificial Intelligence Should Genuinely Support Clinical Reasoning and Decision Making To Bridge the Translational Gap
Title: Artificial Intelligence Should Genuinely Support Clinical Reasoning and Decision Making To Bridge the Translational Gap | Künstliche Intelligenz sollte die klinische Vernunft und Entscheidungsfindung wirklich unterstützen, um die translationale Lücke zu überbrücken | 人工情报应真正支持临床理由和决策,以弥补翻译差距 2506.05030v1 |
Authors (3): Kacper Sokol, James Fackler, Julia E Vogt
Artificial intelligence promises to revolutionise medicine, yet its impact remains limited because of the pervasive translational gap. We posit that the prevailing technology-centric approaches underpin this challenge, rendering such systems fundamentally incompatible with clinical practice, specifically diagnostic reasoning and decision making. Instead, we propose a novel sociotechnical conceptualisation of data-driven support tools designed to complement doctors’ cognitive and epistemic activities. Crucially, it prioritises real-world impact over superhuman performance on inconsequential benchmarks.
Article 1940
Title@2025-06-05 (4): Bottlenecked Transformers: Periodic KV Cache Abstraction for Generalised Reasoning
Title: Bottlenecked Transformers: Periodic KV Cache Abstraction for Generalised Reasoning | Engpässe Transformer: Regelmäßige KV-Cache Abstraktion für generalisierte Vernunft | 装瓶式变压器: 用于一般理由的周期 KV 缓存摘要 2505.16950v2 |
Authors (5): Adnan Oomerjee, Zafeirios Fountas, Zhongwei Yu, Haitham Bou-Ammar, Jun Wang
Despite their impressive capabilities, Large Language Models struggle with generalisation beyond their training distribution, often exhibiting sophisticated pattern interpolation rather than true abstract reasoning (extrapolation). In this work, we approach this limitation through the lens of Information Bottleneck (IB) theory, which posits that model generalisation emerges from an optimal balance between input compression and retention of predictive information in latent representations. We prove using IB theory that decoder-only Transformers are inherently constrained in their ability to form task-optimal sequence representations. We then use this result to demonstrate that periodic global transformation of the internal sequence-level representations (KV cache) is a necessary computational step for improving Transformer generalisation in reasoning tasks. Based on these theoretical insights, we propose a modification to the Transformer architecture, in the form of an additional module that globally rewrites the KV cache at periodic intervals, shifting its capacity away from memorising input prefixes and toward encoding features most useful for predicting future tokens. Our model delivers substantial gains on mathematical reasoning benchmarks, outperforming both vanilla Transformers with up to 3.5x more parameters, as well as heuristic-driven pruning mechanisms for cache compression. Our approach can be seen as a principled generalisation of existing KV-cache compression methods; whereas such methods focus solely on compressing input representations, they often do so at the expense of retaining predictive information, and thus their capabilities are inherently bounded by those of an unconstrained model. This establishes a principled framework to manipulate Transformer memory using information theory, addressing fundamental reasoning limitations that scaling alone cannot overcome.
Article 1941
Title@2025-06-05 (4): Tuning the Right Foundation Models is What you Need for Partial Label Learning
Title: Tuning the Right Foundation Models is What you Need for Partial Label Learning | Tuning the right foundation models is what you need for Partial Label Learning | 向右基建模投注是部分标签学习所需要的 2506.05027v1 |
Authors (4): Kuang He, Wei Tang, Tong Wei, Min-Ling Zhang
Partial label learning (PLL) seeks to train generalizable classifiers from datasets with inexact supervision, a common challenge in real-world applications. Existing studies have developed numerous approaches to progressively refine and recover ground-truth labels by training convolutional neural networks. However, limited attention has been given to foundation models that offer transferrable representations. In this work, we empirically conduct comprehensive evaluations of 11 foundation models across 13 PLL approaches on 8 benchmark datasets under 3 PLL scenarios. We further propose PartialCLIP, an efficient fine-tuning framework for foundation models in PLL. Our findings reveal that current PLL approaches tend to 1) achieve significant performance gains when using foundation models, 2) exhibit remarkably similar performance to each other, 3) maintain stable performance across varying ambiguity levels, while 4) are susceptible to foundation model selection and adaptation strategies. Additionally, we demonstrate the efficacy of text-embedding classifier initialization and effective candidate label filtering using zero-shot CLIP. Our experimental results and analysis underscore the limitations of current PLL approaches and provide valuable insights for developing more generalizable PLL models. The source code can be found at https://github.com/SEU-hk/PartialCLIP.
Article 1942
Title@2025-06-05 (4): A unified weighting framework for evaluating nearest neighbour classification
Title: A unified weighting framework for evaluating nearest neighbour classification | Ein einheitlicher Gewichtungsrahmen für die Bewertung der nächsten Nachbarklassifikation | 评价近邻分类的统一加权框架 2311.16872v3 |
Authors (3): Oliver Urs Lenz, Henri Bollaert, Chris Cornelis
We present the first comprehensive and large-scale evaluation of classical (NN), fuzzy (FNN) and fuzzy rough (FRNN) nearest neighbour classification. We standardise existing proposals for nearest neighbour weighting with kernel functions, applied to the distance values and/or ranks of the nearest neighbours of a test instance. In particular, we show that the theoretically optimal Samworth weights converge to a kernel. Kernel functions are closely related to fuzzy negation operators, and we propose a new kernel based on Yager negation. We also consider various distance and scaling measures, which we show can be related to each other. Through a systematic series of experiments on 85 real-life classification datasets, we find that NN, FNN and FRNN all perform best with Boscovich distance, and that NN and FRNN perform best with a combination of Samworth rank- and distance-weights and scaling by the mean absolute deviation around the median ($r_1$), the standard deviation ($r_2$) or the semi-interquartile range ($r_{\infty}^*$), while FNN performs best with only Samworth distance-weights and $r_1$- or $r_2$-scaling. However, NN achieves comparable performance with Yager-$\frac{1}{2}$ distance-weights, which are simpler to implement than a combination of Samworth distance- and rank-weights. Finally, FRNN generally outperforms NN, which in turn performs systematically better than FNN.
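As a rough illustration of kernel-based distance weighting, the sketch below classifies a query point with weights from a kernel built on the Yager negation $N_w(x) = (1 - x^w)^{1/w}$ with $w = \frac{1}{2}$. Several things here are assumptions made for the example rather than the paper's definitions: Boscovich distance is taken to mean the city-block (L1) distance, distances are rescaled to $[0, 1]$ by dividing by the distance of the $(k+1)$-th neighbour, and all function names are hypothetical.

```python
def boscovich(a, b):
    """City-block (L1) distance between two feature tuples (assumed meaning)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def yager_kernel(d, w=0.5):
    """Yager negation (1 - d^w)^(1/w) applied to a distance clipped to [0, 1]."""
    d = min(max(d, 0.0), 1.0)
    return (1.0 - d ** w) ** (1.0 / w)


def weighted_nn_predict(train, query, k=3, w=0.5):
    """Predict a label by summing kernel weights of the k nearest neighbours.

    train is a list of (features, label) pairs.
    """
    ranked = sorted(((boscovich(x, query), y) for x, y in train),
                    key=lambda pair: pair[0])
    # Assumed rescaling: divide by the (k+1)-th neighbour distance, so the
    # k neighbours that vote all receive a strictly positive weight.
    scale = ranked[min(k, len(ranked) - 1)][0] or 1.0
    scores = {}
    for dist, label in ranked[:k]:
        scores[label] = scores.get(label, 0.0) + yager_kernel(dist / scale, w)
    return max(scores, key=scores.get)


train = [((0.0, 0.0), "a"), ((3.0, 3.0), "b"), ((3.0, 4.0), "b"), ((10.0, 10.0), "c")]
prediction = weighted_nn_predict(train, (0.5, 0.5), k=3)
```

In this toy dataset an unweighted 3-NN vote would favour "b" (two of the three neighbours), whereas the distance weights let the single much closer "a" neighbour win.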
Article 1943
Title@2025-06-05 (4): GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs
Title: GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs | GRU: Abbau des Kompromisses zwischen Unlearning und Retention für LLMs | GRU:减少LLMM的不学习与保留之间的取舍 2503.09117v3 |
Authors (7): Yue Wang, Qizhou Wang, Feng Liu, Wei Huang, Yali Du, Xiaojiang Du, Bo Han
Large language model (LLM) unlearning plays an essential role in removing privacy- and copyright-related responses, which is crucial for the legal and safe application of LLMs. However, the pursuit of complete unlearning often comes at a substantial cost to general functionality, leading to a notorious trade-off between unlearning and retention. This motivates us to explore enhanced unlearning schemes that can mitigate this trade-off. Specifically, we propose Gradient Rectified Unlearning (GRU), an improved framework that regulates the directions of gradient updates during the unlearning procedure such that their side effects on other, unrelated responses are minimized. GRU is general and easy to implement, demonstrating practical effectiveness across a variety of well-established unlearning benchmarks.
Article 1944
Title@2025-06-05 (4): U-NetMN and SegNetMN: Modified U-Net and SegNet models for bimodal SAR image segmentation
Title: U-NetMN and SegNetMN: Modified U-Net and SegNet models for bimodal SAR image segmentation | U-NetMN und SegNetMN: Modifizierte U-Net- und SegNet-Modelle für bimodale SAR-Bildsegmentierung | U-NetMN和SegNetMN:经过修改的双式合成孔径雷达图像分离 U-Net和SegNet模型和SegNet模型 2506.05444v1 |
Authors (6): Marwane Kzadri, Franco Alberto Cardillo, Nanée Chahinian, Carole Delenne, Renaud Hostache, Jamal Riffi
Segmenting Synthetic Aperture Radar (SAR) images is crucial for many remote sensing applications, particularly water body detection. However, deep learning-based segmentation models often face challenges related to convergence speed and stability, mainly due to the complex statistical distribution of this type of data. In this study, we evaluate the impact of mode normalization on two widely used semantic segmentation models, U-Net and SegNet. Specifically, we integrate mode normalization to reduce convergence time while maintaining the performance of the baseline models. Experimental results demonstrate that mode normalization significantly accelerates convergence. Furthermore, cross-validation results indicate that normalized models exhibit increased stability across different zones. These findings highlight the effectiveness of normalization in improving computational efficiency and generalization in SAR image segmentation.
Article 1945
Title@2025-06-05 (4): On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Title: On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding | Über Expressive Power von Schleiftransformatoren: Theoretische Analyse und Verbesserung über die Zeitschrittkodierung | 循环变换器的表达力:通过时间步骤编码进行理论分析和加强 2410.01405v7 |
Authors (2): Kevin Xu, Issei Sato
Looped Transformers provide advantages in parameter efficiency, computational capabilities, and generalization for reasoning tasks. However, their expressive power regarding function approximation remains underexplored. In this paper, we establish the approximation rate of Looped Transformers by defining the modulus of continuity for sequence-to-sequence functions. This reveals a limitation specific to the looped architecture, which motivates incorporating scaling parameters for each loop, conditioned on timestep encoding. Experiments validate the theoretical results, showing that increasing the number of loops enhances performance, with further gains achieved through the timestep encoding.
Article 1946
Title@2025-06-05 (4): Stein Variational Evolution Strategies
Title: Stein Variational Evolution Strategies | Strategien der Steinvariational Evolution | Stein变异演变战略 2410.10390v2 |
Authors (3): Cornelius V. Braun, Robert T. Lange, Marc Toussaint
Stein Variational Gradient Descent (SVGD) is a highly efficient method to sample from an unnormalized probability distribution. However, the SVGD update relies on gradients of the log-density, which may not always be available. Existing gradient-free versions of SVGD make use of simple Monte Carlo approximations or gradients from surrogate distributions, both with limitations. To improve gradient-free Stein variational inference, we combine SVGD steps with evolution strategy (ES) updates. Our results demonstrate that the resulting algorithm generates high-quality samples from unnormalized target densities without requiring gradient information. Compared to prior gradient-free SVGD methods, we find that the integration of the ES update in SVGD significantly improves the performance on multiple challenging benchmark problems.
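For orientation, the standard SVGD update for particles $x_1, \dots, x_n$, kernel $k$, target density $p$ and step size $\epsilon$ is written out below. It is the textbook form, not a formula taken from this abstract; it shows that the score $\nabla \log p$ is the only place where gradient information enters, which is the term gradient-free variants have to replace or approximate.

```latex
x_i \leftarrow x_i + \epsilon \, \hat{\phi}(x_i), \qquad
\hat{\phi}(x) = \frac{1}{n} \sum_{j=1}^{n}
  \Big[ \underbrace{k(x_j, x) \, \nabla_{x_j} \log p(x_j)}_{\text{attraction (needs the score)}}
      + \underbrace{\nabla_{x_j} k(x_j, x)}_{\text{repulsion (kernel only)}} \Big]
```

Because $\nabla \log p$ does not depend on the normalizing constant of $p$, the update works with unnormalized densities; the repulsion term depends only on the kernel and needs no target gradients at all.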
Article 1947
Title@2025-06-05 (4): Controlling Summarization Length Through EOS Token Weighting
Title: Controlling Summarization Length Through EOS Token Weighting | Kontrolle der Zusammenfassung Länge durch EOS Token Gewichtung | 控制通过 EOS 过 EOS 键权重加权的控控总和长度 2506.05017v1 |
Authors (4): Zeno Belligoli, Emmanouil Stergiadis, Eran Fainman, Ilya Gusev
Controlling the length of generated text can be crucial in various text-generation tasks, including summarization. Existing methods often require complex model alterations, limiting compatibility with pre-trained models. We address these limitations by developing a simple approach for controlling the length of automatic text summaries by increasing the importance of correctly predicting the EOS token in the cross-entropy loss computation. The proposed methodology is agnostic to architecture and decoding algorithms and orthogonal to other inference-time techniques to control the generation length. We test it with encoder-decoder models and modern GPT-style LLMs and show that this method can control generation length, often without affecting the quality of the summary.
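The idea of up-weighting the EOS token in the cross-entropy loss can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name, the weight value of 5.0, and the choice to normalise by the sum of weights are all assumptions.

```python
import math


def eos_weighted_cross_entropy(logits, targets, eos_id, eos_weight=5.0):
    """Token-level cross-entropy where positions whose target is EOS count more.

    logits: list of per-position score lists; targets: list of token ids.
    """
    total, norm = 0.0, 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normaliser.
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
        nll = log_z - row[target]
        weight = eos_weight if target == eos_id else 1.0
        total += weight * nll
        norm += weight
    # Assumed convention: weighted mean over positions.
    return total / norm


# The model is confident in token 0 at both positions, but position 2 should be EOS (id 2).
logits = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
targets = [0, 2]
plain_loss = eos_weighted_cross_entropy(logits, targets, eos_id=2, eos_weight=1.0)
weighted_loss = eos_weighted_cross_entropy(logits, targets, eos_id=2, eos_weight=5.0)
```

With the larger weight, the missed EOS dominates the loss, so gradient steps push harder toward ending the sequence at the right position.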
Article 1948
Title@2025-06-05 (4): A new architecture of high-order deep neural networks that learn martingales
Title: A new architecture of high-order deep neural networks that learn martingales | Eine neue Architektur von hochordentlich tiefen neuronalen Netzwerken, die martingales lernen | 一个由高阶深层神经网络组成的新结构,能够学习马丁鱼 2505.03789v2 |
Authors (2): Syoiti Ninomiya, Yuming Ma
A new deep-learning neural network architecture based on high-order weak approximation algorithms for stochastic differential equations (SDEs) is proposed. The architecture enables the efficient learning of martingales by deep learning models. The behaviour of deep neural networks based on this architecture, when applied to the problem of pricing financial derivatives, is also examined. The core of this new architecture lies in the high-order weak approximation algorithms of the explicit Runge–Kutta type, wherein the approximation is realised solely through iterative compositions and linear combinations of vector fields of the target SDEs.
Article 1949
Title@2025-06-05 (4): Multi-Point Proximity Encoding For Vector-Mode Geospatial Machine Learning
Title: Multi-Point Proximity Encoding For Vector-Mode Geospatial Machine Learning | Multi-Point Proximity Encoding für Vector-Mode Geospatial Machine Learning | 矢量模式地理空间机器学习多点近似编码 2506.05016v1 |
Authors (1): John Collins
Vector-mode geospatial data – points, lines, and polygons – must be encoded into an appropriate form in order to be used with traditional machine learning and artificial intelligence models. Encoding methods attempt to represent a given shape as a vector that captures its essential geometric properties. This paper presents an encoding method based on scaled distances from a shape to a set of reference points within a region of interest. The method, MultiPoint Proximity (MPP) encoding, can be applied to any type of shape, enabling the parameterization of machine learning models with encoded representations of vector-mode geospatial features. We show that MPP encoding possesses the desirable properties of shape-centricity and continuity, can be used to differentiate spatial objects based on their geometric features, and can capture pairwise spatial relationships with high precision. In all cases, MPP encoding is shown to perform better than an alternative method based on rasterization.
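A minimal sketch of the encoding idea follows: a shape is represented by its scaled distances to a fixed grid of reference points. The exponential scaling $e^{-d/s}$, the regular grid layout, and every function name are assumptions for illustration; the abstract only states that scaled distances to reference points are used. The sketch handles points and polylines (or polygon boundaries) given as vertex lists and does not treat polygon interiors.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp to its end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def shape_distance(p, vertices):
    """Distance from p to a point (one vertex) or a polyline (several vertices)."""
    if len(vertices) == 1:
        return math.hypot(p[0] - vertices[0][0], p[1] - vertices[0][1])
    return min(point_segment_distance(p, a, b)
               for a, b in zip(vertices, vertices[1:]))


def reference_grid(xmin, ymin, xmax, ymax, n):
    """n x n grid of reference points covering the region of interest (n >= 2)."""
    return [(xmin + i * (xmax - xmin) / (n - 1), ymin + j * (ymax - ymin) / (n - 1))
            for j in range(n) for i in range(n)]


def mpp_encode(vertices, refs, scale):
    """Encode a shape as exp(-distance / scale) for each reference point."""
    return [math.exp(-shape_distance(r, vertices) / scale) for r in refs]


refs = reference_grid(0.0, 0.0, 1.0, 1.0, 2)
point_code = mpp_encode([(0.0, 0.0)], refs, scale=1.0)
line_code = mpp_encode([(0.0, 0.0), (1.0, 0.0)], refs, scale=1.0)
```

Every shape type yields a vector of the same length (one entry per reference point), which is what lets points, lines, and polygons share one downstream model.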
Article 1950
Title@2025-06-05 (4): Towards Reasonable Concept Bottleneck Models
Title: Towards Reasonable Concept Bottleneck Models | Auf dem Weg zu vernünftigen Konzeptmodellen für Engpässe | 迈向合理概念的瓶颈模式 2506.05014v1 |
Authors (4): Nektarios Kalampalikis, Kavya Gupta, Georgi Vitanov, Isabel Valera
In this paper, we propose $\textbf{C}$oncept $\textbf{REA}$soning $\textbf{M}$odels (CREAM), a novel family of Concept Bottleneck Models (CBMs) that: (i) explicitly encode concept-concept (${\texttt{C-C}}$) and concept-task (${\texttt{C$\rightarrow$Y}}$) relationships to enforce a desired model reasoning; and (ii) use a regularized side-channel to achieve competitive task performance, while keeping high concept importance. Specifically, CREAM architecturally embeds (bi)directed concept-concept and concept-to-task relationships specified by a human expert, while severing undesired information flows (e.g., to handle mutually exclusive concepts). Moreover, CREAM integrates a black-box side-channel that is regularized to encourage task predictions to be grounded in the relevant concepts, thereby utilizing the side-channel only when necessary to enhance performance. Our experiments show that: (i) CREAM mainly relies on concepts while achieving task performance on par with black-box models; and (ii) the embedded ${\texttt{C-C}}$ and ${\texttt{C$\rightarrow$Y}}$ relationships ease model interventions and mitigate concept leakage.
Article 1951
Title@2025-06-05 (4): Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
Title: Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity | 1568 Tokens in einen einzigen Vektor und wieder zurück krammen: Die Grenzen der Einbettung von Raumkapazität erkunden | 将1568吨撞成单一矢量和后向:探索嵌入空间能力的极限 2502.13063v2 |
Authors (4): Yuri Kuratov, Mikhail Arkhipov, Aydar Bulatov, Mikhail Burtsev
A range of recent works addresses the problem of compressing a sequence of tokens into a shorter sequence of real-valued vectors to be used as inputs instead of token embeddings or a key-value cache. These approaches focus on reducing the amount of compute in existing language models rather than minimizing the number of bits needed to store text. Despite relying on powerful models as encoders, the maximum attainable lossless compression ratio is typically not higher than x10. This fact is highly intriguing because, in theory, the maximum information capacity of large real-valued vectors is far beyond the presented rates even for 16-bit precision and a modest vector size. In this work, we explore the limits of compression by replacing the encoder with a per-sample optimization procedure. We show that vectors with compression ratios up to x1500 exist, which highlights a gap of two orders of magnitude between existing and practically attainable solutions. Furthermore, we empirically show that the compression limits are determined not by the length of the input but by the amount of uncertainty to be reduced, namely, the cross-entropy loss on this sequence without any conditioning. The obtained limits highlight the substantial gap between the theoretical capacity of input embeddings and their practical utilization, suggesting significant room for optimization in model design.
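The "in theory" capacity claim can be made concrete with back-of-the-envelope arithmetic: a vector of `dim` values at 16-bit precision holds `16 * dim` raw bits, while one token from a vocabulary of size `V` needs at most `log2(V)` bits. The sketch below computes that naive bound; the vector size and vocabulary size are hypothetical values chosen for the example, not figures from the paper.

```python
import math


def vector_capacity_bits(dim, bits_per_value=16):
    """Raw number of bits that a dim-dimensional vector can store."""
    return dim * bits_per_value


def max_tokens_per_vector(dim, vocab_size, bits_per_value=16):
    """Naive upper bound on tokens per vector, assuming log2(vocab_size) bits per token."""
    return vector_capacity_bits(dim, bits_per_value) / math.log2(vocab_size)


# Hypothetical model: 4096-dimensional embeddings and a 65,536-token vocabulary.
capacity = vector_capacity_bits(4096)
bound = max_tokens_per_vector(4096, 2 ** 16)
```

Under these assumed sizes the bound is 65,536 bits, or 4,096 tokens per vector, which is the sense in which ratios of around x10 sit far below what the representation could hold in principle.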
Article 1952
Title@2025-06-05 (4): QiMeng: Fully Automated Hardware and Software Design for Processor Chip
Title: QiMeng: Fully Automated Hardware and Software Design for Processor Chip | QiMeng: Vollautomatisierte Hardware und Software-Design für Prozessor Chip | QiMeng:处理芯片全自动硬件和软件设计 2506.05007v1 |
Authors (20): Rui Zhang, Yuanbo Wen, Shuyao Cheng, Di Huang, Shaohui Peng, Jiaming Guo, Pengwei Jin, Jiacheng Zhao, Tianrui Ma, Yaoyu Zhu, Yifan Hao, Yongwei Zhao, Shengwen Liang, Ying Wang, Xing Hu, Zidong Du, Huimin Cui, Ling Li, Qi Guo, Yunji Chen
Processor chip design technology serves as a key frontier driving breakthroughs in computer science and related fields. With the rapid advancement of information technology, conventional design paradigms face three major challenges: the physical constraints of fabrication technologies, the escalating demands for design resources, and the increasing diversity of ecosystems. Automated processor chip design has emerged as a transformative solution to address these challenges. While recent breakthroughs in Artificial Intelligence (AI), particularly Large Language Model (LLM) techniques, have opened new possibilities for fully automated processor chip design, substantial challenges remain in establishing domain-specific LLMs for processor chip design. In this paper, we propose QiMeng, a novel system for fully automated hardware and software design of processor chips. QiMeng comprises three hierarchical layers. In the bottom layer, we construct a domain-specific Large Processor Chip Model (LPCM) that introduces novel designs in architecture, training, and inference to address key challenges such as the knowledge representation gap, data scarcity, correctness assurance, and the enormous solution space. In the middle layer, leveraging the LPCM’s knowledge representation and inference capabilities, we develop the Hardware Design Agent and the Software Design Agent to automate the design of hardware and software for processor chips. Currently, several components of QiMeng have been completed and successfully applied in various top-layer applications, demonstrating significant advantages and providing a feasible solution for efficient, fully automated hardware/software design of processor chips. Future research will focus on integrating all components and performing iterative top-down and bottom-up design processes to establish a comprehensive QiMeng system.
Article 1953
Title@2025-06-05 (4): Monotonic anomaly detection
Title: Monotonic anomaly detection | Monotonische Anomalieerkennung | 单声异常探测 2410.23158v2 |
Authors (2): Oliver Urs Lenz, Matthijs van Leeuwen
Semi-supervised anomaly detection is based on the principle that potential anomalies are those records that look different from normal training data. However, in some cases we are specifically interested in anomalies that correspond to high attribute values (or low, but not both). We present two asymmetrical distance measures that take this monotonicity into account: ramp distance and signed distance. Through experiments on synthetic and real-life datasets, we show that ramp distance increases anomaly detection performance over the traditional absolute distance. While signed distance also performs well on synthetic data, it performs substantially worse on real-life datasets. We argue that this is a consequence of the fact that when using signed distance, low values of certain attributes automatically compensate for high values of other attributes, such that anomaly detection is reduced to computing the total sum of attribute values, which is too simplistic in practice.
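The compensation effect described above can be seen in a small example. The definitions below are a plausible reading of the abstract and should be treated as assumptions: signed distance sums the raw per-attribute differences between a test record and a training record, while ramp distance sums only the positive parts. The paper's exact definitions may differ.

```python
def absolute_distance(x, y):
    """Traditional symmetric distance: sum of absolute attribute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def signed_distance(x, y):
    """Assumed form: sum of raw differences, so deficits cancel excesses."""
    return sum(a - b for a, b in zip(x, y))


def ramp_distance(x, y):
    """Assumed form: only attributes where x exceeds y contribute."""
    return sum(max(a - b, 0.0) for a, b in zip(x, y))


test_record = (5.0, 0.0)    # high on the first attribute, low on the second
normal_record = (1.0, 4.0)

d_abs = absolute_distance(test_record, normal_record)
d_signed = signed_distance(test_record, normal_record)
d_ramp = ramp_distance(test_record, normal_record)
```

Here the signed distance is zero because the low second attribute cancels the high first attribute, whereas the ramp distance still registers the excess of 4 on the first attribute.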
Article 1954
Title@2025-06-05 (4): Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games
Title: Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games | Vorsichtiger Optimismus: Ein Meta-Algorithmus für fast-konstantes Bedauern in den allgemeinen Spielen | 谨慎乐观主义:普通运动会中近距离遗憾的代算算法 2506.05005v1 |
Authors (3): Ashkan Soleymani, Georgios Piliouras, Gabriele Farina
Recent work [Soleymani et al., 2025] introduced a variant of Optimistic Multiplicative Weights Updates (OMWU) that adaptively controls the learning pace in a dynamic, non-monotone manner, achieving new state-of-the-art regret minimization guarantees in general games. In this work, we demonstrate that no-regret learning acceleration through adaptive pacing of the learners is not an isolated phenomenon. We introduce \emph{Cautious Optimism}, a framework for substantially faster regularized learning in general games. Cautious Optimism takes as input any instance of Follow-the-Regularized-Leader (FTRL) and outputs an accelerated no-regret learning algorithm by pacing the underlying FTRL with minimal computational overhead. Importantly, we retain uncoupledness (learners do not need to know other players’ utilities). Cautious Optimistic FTRL achieves near-optimal $O_T(\log T)$ regret in diverse self-play (mixing-and-matching regularizers) while preserving the optimal $O(\sqrt{T})$ regret in adversarial scenarios. In contrast to prior works (e.g. Syrgkanis et al. [2015], Daskalakis et al. [2021]), our analysis does not rely on monotonic step-sizes, showcasing a novel route for fast learning in general games.
Article 1955
Title@2025-06-05 (4): UniPTMs: The First Unified Multi-type PTM Site Prediction Model via Master-Slave Architecture-Based Multi-Stage Fusion Strategy and Hierarchical Contrastive Loss
Title: UniPTMs: The First Unified Multi-type PTM Site Prediction Model via Master-Slave Architecture-Based Multi-Stage Fusion Strategy and Hierarchical Contrastive Loss | UniPTMs: Das erste einheitliche Multi-Type PTM Site Prediction Model über Master-Slave Architecture-basierte Multi-Stage Fusionsstrategie und hierarchischen Kontrastverlust | UNIPTMs:第一个单一多型PTM多型PTM站点预测模型,通过基于万能结构的建筑多阶段融合战略和等级对立损失 2506.05443v1 |
Authors (6): Yiyu Lin, Yan Wang, You Zhou, Xinye Ni, Jiahui Wu, Sen Yang
As a core mechanism of epigenetic regulation in eukaryotes, protein post-translational modifications (PTMs) require precise prediction to decipher dynamic life activity networks. To address the limitations of existing deep learning models in cross-modal feature fusion, domain generalization, and architectural optimization, this study proposes UniPTMs: the first unified framework for multi-type PTM prediction. The framework innovatively establishes a “Master-Slave” dual-path collaborative architecture: The master path dynamically integrates high-dimensional representations of protein sequences, structures, and evolutionary information through a Bidirectional Gated Cross-Attention (BGCA) module, while the slave path optimizes feature discrepancies and recalibration between structural and traditional features using a Low-Dimensional Fusion Network (LDFN). Complemented by a Multi-scale Adaptive convolutional Pyramid (MACP) for capturing local feature patterns and a Bidirectional Hierarchical Gated Fusion Network (BHGFN) enabling multi-level feature integration across paths, the framework employs a Hierarchical Dynamic Weighting Fusion (HDWF) mechanism to intelligently aggregate multimodal features. Enhanced by a novel Hierarchical Contrastive loss function for feature consistency optimization, UniPTMs demonstrates significant performance improvements (3.2%-11.4% MCC and 4.2%-14.3% AP increases) over state-of-the-art models across five modification types and transcends the Single-Type Prediction Paradigm. To strike a balance between model complexity and performance, we have also developed a lightweight variant named UniPTMs-mini.
Article 1956
Title@2025-06-05 (4): FPTQuant: Function-Preserving Transforms for LLM Quantization
Title: FPTQuant: Function-Preserving Transforms for LLM Quantization | FPTQuant: Funktionsschonende Transformationen für die LLM-Quantisierung | FPTQuant: 用于LLM量化的功能保存变换 2506.04985v1 |
Authors (4): Boris van Breugel, Yelysei Bondarenko, Paul Whatmough, Markus Nagel
Large language models (LLMs) require substantial compute, and thus energy, at inference time. While quantizing weights and activations is effective at improving efficiency, naive quantization of LLMs can significantly degrade performance due to large-magnitude outliers. This paper describes FPTQuant, which introduces four novel, lightweight, and expressive function-preserving transforms (FPTs) to facilitate quantization of transformers: (1) a mergeable pre-RoPE transform for queries and keys, (2) a mergeable transform for values, (3) a mergeable scaling transform within the MLP block, and (4) a cheap, dynamic scaling transform. By leveraging the equivariances and independencies inherent to canonical transformer operation, we designed these FPTs to maintain the model’s function while shaping the intermediate activation distributions to be more quantization friendly. FPTQuant requires no custom kernels and adds virtually no overhead during inference. The FPTs are trained both locally to reduce outliers, and end-to-end such that the outputs of the quantized and full-precision models match. FPTQuant enables static INT4 quantization with minimal overhead and shows SOTA speed-up of up to 3.9 times over FP. Empirically, FPTQuant has an excellent accuracy-speed trade-off – it performs on par with or exceeds most prior work and shows only slightly lower accuracy than a method that is up to 29% slower.
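To see why a transform can be "mergeable" and function-preserving, consider the value path of attention: the output is $A \, (X W_v) \, W_o$, so any invertible matrix $T$ can be folded into the weights as $W_v T$ and $T^{-1} W_o$ without changing the result. The sketch below checks this numerically in plain Python. It is a generic illustration of the principle, not one of FPTQuant's specific transforms; all matrices and names are made up for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_output(probs, x, w_v, w_o):
    """Value path of attention: probs @ (x @ w_v) @ w_o."""
    values = matmul(x, w_v)
    return matmul(matmul(probs, values), w_o)


probs = [[1.0, 0.0], [0.25, 0.75]]              # attention probabilities (rows sum to 1)
x = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]          # two tokens, three features
w_v = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]      # value projection
w_o = [[1.0, -1.0], [0.5, 2.0]]                 # output projection
t = [[2.0, 1.0], [1.0, 1.0]]                    # hypothetical invertible transform
t_inv = [[1.0, -1.0], [-1.0, 2.0]]              # its exact inverse

# Merge the transform into the weights offline; no extra work at inference time.
w_v_merged = matmul(w_v, t)
w_o_merged = matmul(t_inv, w_o)

original = attention_output(probs, x, w_v, w_o)
transformed = attention_output(probs, x, w_v_merged, w_o_merged)
max_diff = max(abs(a - b)
               for row_a, row_b in zip(original, transformed)
               for a, b in zip(row_a, row_b))
```

The intermediate values `x @ w_v_merged` differ from `x @ w_v`, which is the point: the transform is free to reshape the activations that get quantized while the layer's output stays the same.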
Article 1957
Title@2025-06-05 (4): Agentic AI for Intent-Based Industrial Automation
Title: Agentic AI for Intent-Based Industrial Automation | Agentische KI für Intent-Based Industrial Automation | 用于内在工业自动化的AAA剂 2506.04980v1 |
Authors (2): Marcos Lima Romero, Ricardo Suyama
The recent development of Agentic AI systems, empowered by autonomous large language model (LLM) agents with planning and tool-usage capabilities, enables new possibilities for the evolution of industrial automation and reduces the complexity introduced by Industry 4.0. This work proposes a conceptual framework that integrates Agentic AI with the intent-based paradigm, originally developed in network research, to simplify human-machine interaction (HMI) and better align automation systems with the human-centric, sustainable, and resilient principles of Industry 5.0. Based on intent-based processing, the framework allows human operators to express high-level business or operational goals in natural language, which are decomposed into actionable components. These intents are broken into expectations, conditions, targets, context, and information that guide sub-agents equipped with specialized tools to execute domain-specific tasks. A proof of concept was implemented using the CMAPSS dataset and Google Agent Developer Kit (ADK), demonstrating the feasibility of intent decomposition, agent orchestration, and autonomous decision-making in predictive maintenance scenarios. The results confirm the potential of this approach to reduce technical barriers and enable scalable, intent-driven automation, despite data quality and explainability concerns.
Article 1958
Title@2025-06-05 (4): Entropy-based Training Methods for Scalable Neural Implicit Sampler
Title: Entropy-based Training Methods for Scalable Neural Implicit Sampler | Entropie-basierte Trainingsmethoden für skalierbare neurale Implizite Sampler | 用于可缩缩的神经隐隐性取样器的基于英信的培训方法 2306.04952v2 |
Authors (3): Weijian Luo, Boya Zhang, Zhihua Zhang
Efficiently sampling from un-normalized target distributions is a fundamental problem in scientific computing and machine learning. Traditional approaches such as Markov Chain Monte Carlo (MCMC) guarantee asymptotically unbiased samples from such distributions but suffer from computational inefficiency, particularly when dealing with high-dimensional targets, as they require numerous iterations to generate a batch of samples. In this paper, we introduce an efficient and scalable neural implicit sampler that overcomes these limitations. The implicit sampler can generate large batches of samples with low computational costs by leveraging a neural transformation that directly maps easily sampled latent vectors to target samples without the need for iterative procedures. To train the neural implicit samplers, we introduce two novel methods: the KL training method and the Fisher training method. The former method minimizes the Kullback-Leibler divergence, while the latter minimizes the Fisher divergence between the sampler and the target distributions. By employing the two training methods, we effectively optimize the neural implicit samplers to learn and generate from the desired target distribution. To demonstrate the effectiveness, efficiency, and scalability of our proposed samplers, we evaluate them on three sampling benchmarks with different scales.
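For reference, the two divergences named above have the following standard forms, with $q_\theta$ the density of the sampler's outputs and $p$ the target, known only up to a constant as $p(x) = \tilde{p}(x)/Z$. These are textbook definitions, and the paper's exact conventions (direction of the KL, constant factors) may differ.

```latex
D_{\mathrm{KL}}(q_\theta \,\|\, p)
  = \mathbb{E}_{x \sim q_\theta}\!\left[ \log q_\theta(x) - \log \tilde{p}(x) \right] + \log Z,
\qquad
D_{\mathrm{F}}(q_\theta \,\|\, p)
  = \mathbb{E}_{x \sim q_\theta}\!\left[ \left\| \nabla_x \log q_\theta(x) - \nabla_x \log \tilde{p}(x) \right\|_2^2 \right]
```

In both cases the unknown normalizer is harmless: $\log Z$ is an additive constant in the KL divergence and vanishes under the gradient in the Fisher divergence, so either objective can be minimized with access to $\tilde{p}$ alone.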
Article 1959
Title@2025-06-05 (4): Deep histological synthesis from mass spectrometry imaging for multimodal registration
Title: Deep histological synthesis from mass spectrometry imaging for multimodal registration | Tiefe histologische Synthese aus Massenspektrometrie-Bildgebung für multimodale Registrierung | 从用于多式联运注册的质谱成像成像中产生深度神学合成 2506.05441v1 |
Authors (4): Kimberley M. Bird, Xujiong Ye, Alan M. Race, James M. Brown
Registration of histological and mass spectrometry imaging (MSI) allows for more precise identification of structural changes and chemical interactions in tissue. With histology and MSI having entirely different image formation processes and dimensionalities, registration of the two modalities remains an ongoing challenge. This work proposes a solution that synthesises histological images from MSI, using a pix2pix model, to effectively enable unimodal registration. Preliminary results show promising synthetic histology images with limited artifacts, achieving increases in mutual information (MI) and structural similarity index measures (SSIM) of +0.924 and +0.419, respectively, compared to a baseline U-Net model. Our source code is available on GitHub: https://github.com/kimberley/MIUA2025.
Article 1960
Title@2025-06-05 (4): Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences
Title: Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences | Diff-Instruct++: Ein-Schritt-Training Text-zu-Bild-Generator-Modell zur Ausrichtung mit menschlichen Einstellungen | Diff- Instrucct+++:培训一步骤文本到图像生成模型,使之与人类偏好相一致 2410.18881v2 |
Authors (1): Weijian Luo
One-step text-to-image generator models offer advantages such as fast inference, flexible architectures, and state-of-the-art generation performance. In this paper, we study the problem of aligning one-step generator models with human preferences for the first time. Inspired by the success of reinforcement learning from human feedback (RLHF), we formulate the alignment problem as maximizing expected human reward functions while adding an Integral Kullback-Leibler divergence term to prevent the generator from diverging. By overcoming technical challenges, we introduce Diff-Instruct++ (DI++), the first fast-converging and image data-free human preference alignment method for one-step text-to-image generators. We also introduce novel theoretical insights, showing that using CFG for diffusion distillation is secretly doing RLHF with DI++. This finding offers understanding and potential contributions to future research involving CFG. In the experiment sections, we align both UNet-based and DiT-based one-step generators using DI++, which use Stable Diffusion 1.5 and PixelArt-$\alpha$ as the reference diffusion processes. The resulting DiT-based one-step text-to-image model achieves a strong Aesthetic Score of 6.19 and an Image Reward of 1.24 on the COCO validation prompt dataset. It also achieves a leading Human Preference Score (HPSv2.0) of 28.48, outperforming other open-sourced models such as Stable Diffusion XL, DMD2, SD-Turbo, as well as PixelArt-$\alpha$. Both theoretical contributions and empirical evidence indicate that DI++ is a strong human-preference alignment approach for one-step text-to-image models. The homepage of the paper is https://github.com/pkulwj1994/diff_instruct_pp.
Article 1961
Title@2025-06-05 (4): Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule
Title: Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule | Pilotieren von strukturbasiertem Drug Design über Modalitätsspezifische Optimale Planung | 通过特定方式和特定最佳安排表进行基于结构的药物设计 2505.07286v2 |
Authors (8): Keyue Qiu, Yuxuan Song, Zhehuan Fan, Peidong Liu, Zhe Zhang, Mingyue Zheng, Hao Zhou, Wei-Ying Ma
Structure-Based Drug Design (SBDD) is crucial for identifying bioactive molecules. Recent deep generative models face challenges in geometric structure modeling. A major bottleneck lies in the twisted probability path of multi-modalities – continuous 3D positions and discrete 2D topologies – which jointly determine molecular geometries. By establishing the fact that noise schedules decide the Variational Lower Bound (VLB) for the twisted probability path, we propose the VLB-Optimal Scheduling (VOS) strategy in this under-explored area, which optimizes VLB as a path integral for SBDD. Our model effectively enhances molecular geometries and interaction modeling, achieving a state-of-the-art PoseBusters passing rate of 95.9% on CrossDock, more than 10% improvement upon strong baselines, while maintaining high affinities and robust intramolecular validity evaluated on a held-out test set. Code is available at https://github.com/AlgoMole/MolCRAFT.
Article 1962
Title@2025-06-05 (4): Efficient Time Series Processing for Transformers and State-Space Models through Token Merging
Title: Efficient Time Series Processing for Transformers and State-Space Models through Token Merging | Effiziente Zeitreihenverarbeitung für Transformatoren und State-Space-Modelle durch Token Merging | 通过 Token 合并对变形器和国家空间模型的有效时间序列处理 2405.17951v3 |
Authors (4): Leon Götz, Marcel Kollovieh, Stephan Günnemann, Leo Schwinn
Despite recent advances in subquadratic attention mechanisms or state-space models, processing long token sequences still imposes significant computational requirements. Token merging has emerged as a solution to increase computational efficiency in computer vision architectures. In this work, we perform the first investigations of token merging in time series analysis on both transformers and state-space models. We further introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, achieving two major benefits: a) Local merging can adjust its computational complexity from quadratic to linear based on the neighborhood size to effectively scale to long sequences; b) Local merging is the first causal merging scheme enabling token merging in transformer decoders. Further, we identify spectral properties of the input data that reliably predict the potential benefits of local merging without requiring evaluation on downstream tasks. Our comprehensive empirical evaluation demonstrates that local merging offers substantial efficiency gains with minimal impact on accuracy, achieving up to 5400% acceleration on the recently proposed Chronos foundation model.
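A heavily simplified sketch of merging within a local neighbourhood is given below: each token is compared only with its immediate successor, and the pair is averaged when their cosine similarity is high. This corresponds to the smallest possible neighbourhood and is not the paper's algorithm; the greedy pairing rule, the averaging, the threshold, and the function names are all assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def merge_adjacent(tokens, threshold=0.9):
    """Greedily average neighbouring token pairs that are similar enough."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and cosine(tokens[i], tokens[i + 1]) >= threshold:
            # Merge the pair into its mean and skip past both tokens.
            merged.append([(a + b) / 2.0 for a, b in zip(tokens[i], tokens[i + 1])])
            i += 2
        else:
            merged.append(list(tokens[i]))
            i += 1
    return merged


tokens = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [-1.0, 0.0]]
shorter = merge_adjacent(tokens)
```

Because every token is compared with a single neighbour, the number of similarity computations grows linearly with sequence length, in contrast to comparing all token pairs, which grows quadratically.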
nan
Article 1963
Title@2025-06-05 (4): Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models
Title: Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models | Gemeinsame Interventionseffekte von Single-Variablen-Interventionen in Additiven-Modellen lernen | 在 “ 补充模式 “ 中单一可变干预措施的 学习联合干预效应 2506.04945v1 |
Authors (3): Armin Kekić, Sergio Hernan Garrido Mejia, Bernhard Schölkopf
Estimating causal effects of joint interventions on multiple variables is crucial in many domains, but obtaining data from such simultaneous interventions can be challenging. Our study explores how to learn joint interventional effects using only observational data and single-variable interventions. We present an identifiability result for this problem, showing that for a class of nonlinear additive outcome mechanisms, joint effects can be inferred without access to joint interventional data. We propose a practical estimator that decomposes the causal effect into confounded and unconfounded contributions for each intervention variable. Experiments on synthetic data demonstrate that our method achieves performance comparable to models trained directly on joint interventional data, outperforming a purely observational estimator.
nan
Article 1964
Title@2025-06-05 (4): Weak Generative Sampler to Efficiently Sample Invariant Distribution of Stochastic Differential Equation
Title: Weak Generative Sampler to Efficiently Sample Invariant Distribution of Stochastic Differential Equation | Schwacher Generativer Sampler zur effizienten Probeninvariantverteilung der stochastischen Differentialgleichung | 用于有效取样的弱生成采样器,以有效采样、不变化地分配蒸蒸蒸式差别等式的微变式采样器 2405.19256v2 |
Authors (4): Zhiqiang Cai, Yu Cao, Yuanfei Huang, Xiang Zhou
Sampling invariant distributions from an Itô diffusion process presents a significant challenge in stochastic simulation. Traditional numerical solvers for stochastic differential equations require both a fine step size and a lengthy simulation period, resulting in biased and correlated samples. Current deep learning-based methods solve the stationary Fokker–Planck equation to determine the invariant probability density function in the form of deep neural networks, but they generally do not directly address the problem of sampling from the computed density function. In this work, we introduce a framework that employs a weak generative sampler (WGS) to directly generate independent and identically distributed (iid) samples induced by a transformation map derived from the stationary Fokker–Planck equation. Our proposed loss function is based on the weak form of the Fokker–Planck equation, integrating normalizing flows to characterize the invariant distribution and facilitate sample generation from a base distribution. Our randomized test function circumvents the need for min-max optimization in the traditional weak formulation. Our method requires neither the computationally intensive calculation of the Jacobian determinant nor the invertibility of the transformation map. A crucial component of our framework is the adaptively chosen family of test functions in the form of Gaussian kernel functions with centers related to the generated data samples. Experimental results on several benchmark examples demonstrate the effectiveness and scalability of our method, which offers both low computational costs and excellent capability in exploring multiple metastable states.
nan
Article 1965
Title@2025-06-05 (4): David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training
Title: David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training | David und Goliath: Kleines Ein-Schritt-Modell schlägt große Diffusion mit Partitur nach dem Training | David和Goliath:小型一步骤模式比大规模传播更胜于记分后培训 2410.20898v3 |
Authors (4): Weijian Luo, Colin Zhang, Debing Zhang, Zhengyang Geng
We propose Diff-Instruct* (DI*), a data-efficient post-training approach for one-step text-to-image generative models that improves their alignment with human preferences without requiring image data. Our method frames alignment as online reinforcement learning from human feedback (RLHF), which optimizes the one-step model to maximize human reward functions while being regularized to stay close to a reference diffusion process. Unlike traditional RLHF approaches, which rely on the Kullback-Leibler divergence as the regularization, we introduce a novel general score-based divergence regularization that substantially improves performance as well as post-training stability. Although the general score-based RLHF objective is intractable to optimize, we derive a strictly equivalent tractable loss function in theory that can efficiently compute its gradient for optimization. We introduce DI*-SDXL-1step, a 2.6B one-step text-to-image model at a resolution of $1024\times 1024$, post-trained from DMD2 w.r.t. SDXL. Our 2.6B DI*-SDXL-1step model outperforms the 50-step 12B FLUX-dev model in ImageReward, PickScore, and CLIP score on the Parti prompts benchmark while using only 1.88% of the inference time. This result clearly shows that with proper post-training, the small one-step model is capable of beating huge multi-step diffusion models. Our model is open-sourced at this link: https://github.com/pkulwj1994/diff_instruct_star. We hope our findings can contribute to human-centric machine learning techniques.
nan
Article 1966
Title@2025-06-05 (4): Can Transformers Do Enumerative Geometry?
Title: Can Transformers Do Enumerative Geometry? | Können Transformer enumerative Geometrie? | 变形人能做数字几何吗? 2408.14915v3 |
Authors (3): Baran Hashemi, Roderic G. Corominas, Alessandro Giacchetto
How can Transformers model and learn enumerative geometry? What is a robust procedure for using Transformers in abductive knowledge discovery within a mathematician-machine collaboration? In this work, we introduce a Transformer-based approach to computational enumerative geometry, specifically targeting the computation of $\psi$-class intersection numbers on the moduli space of curves. By reformulating the problem as a continuous optimization task, we compute intersection numbers across a wide value range from $10^{-45}$ to $10^{45}$. To capture the recursive nature inherent in these intersection numbers, we propose the Dynamic Range Activator (DRA), a new activation function that enhances the Transformer’s ability to model recursive patterns and handle severe heteroscedasticity. Given precision requirements for computing the intersections, we quantify the uncertainty of the predictions using Conformal Prediction with a dynamic sliding window adaptive to the partitions of equivalent number of marked points. To the best of our knowledge, there has been no prior work on modeling recursive functions with such high variance and factorial growth. Beyond simply computing intersection numbers, we explore the enumerative “world-model” of Transformers. Our interpretability analysis reveals that the network is implicitly modeling the Virasoro constraints in a purely data-driven manner. Moreover, through abductive hypothesis testing, probing, and causal inference, we uncover evidence of an emergent internal representation of the large-genus asymptotics of $\psi$-class intersection numbers. These findings suggest that the network internalizes the parameters of the asymptotic closed-form and the polynomiality phenomenon of intersection numbers in a non-linear manner. This opens up new possibilities in inferring asymptotic closed-form expressions directly from a limited amount of data.
nan
Article 1967
Title@2025-06-05 (4): Energentic Intelligence: From Self-Sustaining Systems to Enduring Artificial Life
Title: Energentic Intelligence: From Self-Sustaining Systems to Enduring Artificial Life | Energentische Intelligenz: Von selbsterhaltenden Systemen zum Ausharren des künstlichen Lebens | 能源情报:从自我维持系统到人命持续 2506.04916v1 |
Authors (1): Atahan Karagoz
This paper introduces Energentic Intelligence, a class of autonomous systems defined not by task performance, but by their capacity to sustain themselves through internal energy regulation. Departing from conventional reward-driven paradigms, these agents treat survival, that is, maintaining functional operation under fluctuating energetic and thermal conditions, as the central objective. We formalize this principle through an energy-based utility function and a viability-constrained survival horizon, and propose a modular architecture that integrates energy harvesting, thermal regulation, and adaptive computation into a closed-loop control system. A simulated environment demonstrates the emergence of stable, resource-aware behavior without external supervision. Together, these contributions provide a theoretical and architectural foundation for deploying autonomous agents in resource-volatile settings where persistence must be self-regulated and infrastructure cannot be assumed.
nan
Article 1968
Title@2025-06-05 (4): MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
Title: MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining | MiMo: Entsperren des vernünftigen Potenzials des Sprachmodells – Von der Vorschulung zur Nachschulung | 米莫:释放语文模式的理性潜力 – – 从预培训到员额培训 2505.07608v2 |
Authors (65): LLM-Core Xiaomi, :, Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, Chenhong He, Dong Zhang, Duo Zhang, Guoan Wang, Hao Tian, Haochen Zhao, Heng Qu, Hongshen Xu, Jun Shi, Kainan Bao, Kai Fang, Kang Zhou, Kangyang Zhou, Lei Li, Menghang Zhu, Nuo Chen, Qiantong Wang, Shaohui Liu, Shicheng Li, Shuhao Gu, Shuhuai Ren, Shuo Liu, Sirui Deng, Weiji Zhuang, Weiwei Lv, Wenyu Yang, Xin Zhang, Xing Yong, Xing Zhang, Xingchen Song, Xinzhe Xu, Xu Wang, Yihan Yan, Yu Tu, Yuanyuan Tian, Yudong Wang, Yue Yu, Zhenru Lin, Zhichao Song, Zihao Yue
We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model’s reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with an additional Multi-Token Prediction objective for enhanced performance and accelerated inference speed. During post-training, we curate a dataset of 130K verifiable mathematics and programming problems for reinforcement learning, integrating a test-difficulty-driven code-reward scheme to alleviate sparse-reward issues and employing strategic data resampling to stabilize training. Extensive evaluations show that MiMo-7B-Base possesses exceptional reasoning potential, outperforming even much larger 32B models. The final RL-tuned model, MiMo-7B-RL, achieves superior performance on mathematics, code and general reasoning tasks, surpassing the performance of OpenAI o1-mini. The model checkpoints are available at https://github.com/xiaomimimo/MiMo.
nan
Article 1969
Title@2025-06-05 (4): Dissecting Long Reasoning Models: An Empirical Study
Title: Dissecting Long Reasoning Models: An Empirical Study | Modelle der langen Vernunft zu entschlüsseln: Eine empirische Studie | 解析长期理由模型:经验研究 2506.04913v1 |
Authors (8): Yongyu Mu, Jiali Zeng, Bei Li, Xinyan Guan, Fandong Meng, Jie Zhou, Tong Xiao, Jingbo Zhu
Despite recent progress in training long-context reasoning models via reinforcement learning (RL), several open questions and counterintuitive behaviors remain. This work focuses on three key aspects: (1) We systematically analyze the roles of positive and negative samples in RL, revealing that positive samples mainly facilitate data fitting, whereas negative samples significantly enhance generalization and robustness. Interestingly, training solely on negative samples can rival standard RL training performance. (2) We identify substantial data inefficiency in group relative policy optimization, where over half of the samples yield zero advantage. To address this, we explore two straightforward strategies, including relative length rewards and offline sample injection, to better leverage these data and enhance reasoning efficiency and capability. (3) We investigate unstable performance across various reasoning models and benchmarks, attributing instability to uncertain problems with ambiguous outcomes, and demonstrate that multiple evaluation runs mitigate this issue.
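The zero-advantage observation in point (2) can be illustrated with a short sketch of a group-normalized advantage, assuming the commonly used form in which each sampled response's reward is standardized within its group. The function name and the `eps` stabilizer are illustrative choices, not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same prompt.

    Assumption: advantage_i = (r_i - mean(r)) / (std(r) + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group where every response is correct (or every response is wrong)
# carries no learning signal: all advantages are zero.
all_correct = group_relative_advantages([1, 1, 1, 1])

# A mixed group yields non-zero advantages of both signs.
mixed = group_relative_advantages([1, 0, 1, 0])
```

Under this form, any group whose rewards are all identical contributes zero advantage for every sample, which is one way such inefficiency can arise.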
nan
Article 1970
Title@2025-06-05 (4): When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models
Title: When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models | Wenn LLMs denken lügen: Enthüllen der strategischen Täuschung in Repräsentationen von Vernunftmodellen | 当思考LLLM Lie:在解释理由模型代表中消除战略欺骗时 2506.04909v1 |
Authors (3): Kai Wang, Yihao Zhang, Meng Sun
The honesty of large language models (LLMs) is a critical alignment challenge, especially as advanced systems with chain-of-thought (CoT) reasoning may strategically deceive humans. Unlike traditional honesty issues on LLMs, which could possibly be explained as some kind of hallucination, those models’ explicit thought paths enable us to study strategic deception: goal-driven, intentional misinformation where reasoning contradicts outputs. Using representation engineering, we systematically induce, detect, and control such deception in CoT-enabled LLMs, extracting “deception vectors” via Linear Artificial Tomography (LAT) for 89% detection accuracy. Through activation steering, we achieve a 40% success rate in eliciting context-appropriate deception without explicit prompts, unveiling the specific honesty-related issue of reasoning models and providing tools for trustworthy AI alignment.
nan
Article 1971
Title@2025-06-05 (4): Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy
Title: Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy | Trainieren Sie einen Sparse Autoencoder über mehrere Sparsity-Haushalte, um Dolmetschbarkeit und Genauigkeit zu erhalten | 跨多级预算培训一个简单自动编码员,以维护可解释性和准确性 2505.24473v2 |
Authors (7): Nikita Balagansky, Yaroslav Aksenov, Daniil Laptev, Vadim Kurochkin, Gleb Gerasimov, Nikita Koryagin, Daniil Gavrilov
Sparse Autoencoders (SAEs) have proven to be powerful tools for interpreting neural networks by decomposing hidden representations into disentangled, interpretable features via sparsity constraints. However, conventional SAEs are constrained by the fixed sparsity level chosen during training; meeting different sparsity requirements therefore demands separate models and increases the computational footprint during both training and evaluation. We introduce a novel training objective, HierarchicalTopK, which trains a single SAE to optimise reconstructions across multiple sparsity levels simultaneously. Experiments with Gemma-2 2B demonstrate that our approach achieves Pareto-optimal trade-offs between sparsity and explained variance, outperforming traditional SAEs trained at individual sparsity levels. Further analysis shows that HierarchicalTopK preserves high interpretability scores even at higher sparsity. The proposed objective thus closes an important gap between flexibility and interpretability in SAE design.
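One way to picture an objective that covers several sparsity levels at once is to average the reconstruction error obtained when only the top-k latents are kept, over a set of budgets k. The sketch below makes that assumption explicit; the exact form of HierarchicalTopK is not given in the abstract, and `topk_reconstruction`, `multi_budget_loss`, and `budgets` are hypothetical names.

```python
def topk_reconstruction(latents, decoder, k):
    """Reconstruct the input from the k largest latent activations only.

    `decoder[i]` is the decoder direction (a list of floats) for latent i.
    """
    top = sorted(range(len(latents)), key=lambda i: latents[i], reverse=True)[:k]
    dim = len(decoder[0])
    x_hat = [0.0] * dim
    for i in top:
        for d in range(dim):
            x_hat[d] += latents[i] * decoder[i][d]
    return x_hat


def multi_budget_loss(x, latents, decoder, budgets):
    """Mean squared reconstruction error, averaged over the sparsity budgets.

    Assumption: every budget contributes equally to the loss.
    """
    total = 0.0
    for k in budgets:
        x_hat = topk_reconstruction(latents, decoder, k)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(budgets)


# Identity decoder: keeping 1 latent loses the second coordinate, keeping 2 loses nothing.
loss = multi_budget_loss(
    x=[3.0, 1.0],
    latents=[3.0, 1.0],
    decoder=[[1.0, 0.0], [0.0, 1.0]],
    budgets=[1, 2],
)
```

In a real SAE the latents would come from a learned encoder and the loss would be minimized by gradient descent; the sketch only shows how a single set of weights can be scored at several budgets in one pass.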
nan
Article 1972
Title@2025-06-05 (4): Are all models wrong? Fundamental limits in distribution-free empirical model falsification
Title: Are all models wrong? Fundamental limits in distribution-free empirical model falsification | Sind alle Modelle falsch? Grundlegende Grenzen in der verteilungsfreien empirischen Modellfälschung | 所有模型是否都错了?无分销经验模型伪造的基本限制 2502.06765v2 |
Authors (3): Manuel M. Müller, Yuetian Luo, Rina Foygel Barber
In statistics and machine learning, when we train a fitted model on available data, we typically want to ensure that we are searching within a model class that contains at least one accurate model – that is, we would like to ensure an upper bound on the model class risk (the lowest possible risk that can be attained by any model in the class). However, it is also of interest to establish lower bounds on the model class risk, for instance so that we can determine whether our fitted model is at least approximately optimal within the class, or so that we can decide whether the model class is unsuitable for the particular task at hand. Particularly in the setting of interpolation learning, where machine learning models are trained to reach zero error on the training data, we might ask if, at the very least, a positive lower bound on the model class risk is possible – or are we unable to detect that “all models are wrong”? In this work, we answer these questions in a distribution-free setting by establishing a model-agnostic, fundamental hardness result for the problem of constructing a lower bound on the best test error achievable over a model class, and examine its implications on specific model classes such as tree-based methods and linear regression.
nan
Article 1973
Title@2025-06-05 (4): Leveraging Diffusion Models for Parameterized Quantum Circuit Generation
Title: Leveraging Diffusion Models for Parameterized Quantum Circuit Generation | Nutzung von Diffusionsmodellen für die parameterisierte Quantum Circuit Generation | 利用可计量量子电路生成的传播模型 2505.20863v2 |
Authors (4): Daniel Barta, Darya Martyniuk, Johannes Jung, Adrian Paschke
Quantum computing holds immense potential, yet its practical success depends on multiple factors, including advances in quantum circuit design. In this paper, we introduce a generative approach based on denoising diffusion models (DMs) to synthesize parameterized quantum circuits (PQCs). Extending the recent diffusion model pipeline of Fürrutter et al. [1], our model effectively conditions the synthesis process, enabling the simultaneous generation of circuit architectures and their continuous gate parameters. We demonstrate our approach in synthesizing PQCs optimized for generating high-fidelity Greenberger-Horne-Zeilinger (GHZ) states and achieving high accuracy in quantum machine learning (QML) classification tasks. Our results indicate a strong generalization across varying gate sets and scaling qubit counts, highlighting the versatility and computational efficiency of diffusion-based methods. This work illustrates the potential of generative models as a powerful tool for accelerating and optimizing the design of PQCs, supporting the development of more practical and scalable quantum applications.
nan
Article 1974
Title@2025-06-05 (4): Scalable Multi-Output Gaussian Processes with Stochastic Variational Inference
Title: Scalable Multi-Output Gaussian Processes with Stochastic Variational Inference | Skalierbare Multi-Output Gaußsche Prozesse mit stochastischer Variationsableitung | 可缩放多输出高斯进程,带有斯托卡变动推断 2407.02476v2 |
Authors (4): Xiaoyu Jiang, Sokratia Georgaka, Magnus Rattray, Mauricio A. Álvarez
The Multi-Output Gaussian Process (MOGP) is a popular tool for modelling data from multiple sources. A typical choice to build a covariance function for a MOGP is the Linear Model of Coregionalization (LMC), which parametrically models the covariance between outputs. The Latent Variable MOGP (LV-MOGP) generalises this idea by modelling the covariance between outputs using a kernel applied to latent variables, one per output, leading to a flexible MOGP model that allows efficient generalization to new outputs with few data points. Computational complexity in LV-MOGP grows linearly with the number of outputs, which makes it unsuitable for problems with a large number of outputs. In this paper, we propose a stochastic variational inference approach for the LV-MOGP that allows mini-batches for both inputs and outputs, making computational complexity per training iteration independent of the number of outputs.
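To make the parametric covariance between outputs concrete, the sketch below shows the single-kernel special case of the LMC (often called the intrinsic coregionalization model), where the covariance between output d at input x and output d' at input x' is B[d][d'] * k(x, x') with B = W W^T + diag(kappa). The RBF kernel, the one-dimensional inputs, and all names here are illustrative assumptions rather than details from the paper.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def coregionalization_matrix(W, kappa):
    """B = W W^T + diag(kappa): a positive semi-definite matrix over outputs."""
    n_out, rank = len(W), len(W[0])
    return [
        [
            sum(W[i][q] * W[j][q] for q in range(rank)) + (kappa[i] if i == j else 0.0)
            for j in range(n_out)
        ]
        for i in range(n_out)
    ]


def lmc_cov(x, d, x2, d2, B, lengthscale=1.0):
    """Covariance between output d at x and output d2 at x2."""
    return B[d][d2] * rbf(x, x2, lengthscale)


B = coregionalization_matrix(W=[[1.0], [2.0]], kappa=[0.0, 0.0])
cross = lmc_cov(0.3, 0, 0.3, 1, B)  # covariance between outputs 0 and 1 at the same input
```

The LV-MOGP described above replaces the free matrix B with a kernel evaluated on one latent variable per output, which is what lets it generalise to outputs not seen during training.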
nan
Article 1975
Title@2025-06-05 (4): Full-Parameter Continual Pretraining of Gemma2: Insights into Fluency and Domain Knowledge
Title: Full-Parameter Continual Pretraining of Gemma2: Insights into Fluency and Domain Knowledge | Full-Parameter Continual Pretraining von Gemma2: Einblicke in Fluency und Domain Knowledge | Gemma2:深入了解流能和广域知识 2505.05946v2 |
Authors (3): Vytenis Šliogeris, Povilas Daniušis, Artūras Nakvosas
In this technical report, we empirically investigate the relationship between linguistic fluency and domain knowledge in the context of continual learning with large language models (LLMs). Specifically, we enhance the linguistic fluency of the Gemma2 LLM for the Lithuanian language by autoregressively pretraining its full parameter set on the first 10% of the Lithuanian language component of the CulturaX dataset. To prevent catastrophic forgetting of the model’s existing domain knowledge, we apply Elastic Weight Consolidation (EWC), leveraging Fisher information estimated using data from the Massive Multitask Language Understanding (MMLU) benchmark. In the post-training evaluations, we assess linguistic fluency through perplexity and evaluate domain knowledge using accuracy on a suite of language understanding benchmarks, including ARC-Easy, Belebele, GSM8K, HellaSwag, MMLU, TruthfulQA, and Winogrande, in both English and Lithuanian. The empirical results demonstrate that EWC not only mitigates catastrophic forgetting by preserving the model’s performance in terms of both linguistic fluency and domain knowledge but also improves or maintains these capabilities for the newly added Lithuanian language. These findings highlight the potential for more efficient adaptation of general-purpose LLMs to under-represented languages without requiring access to the original training data. The accompanying codebase is openly accessible at https://github.com/Neurotechnology/LLM_EWC.
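For readers unfamiliar with EWC, the regularizer in its standard diagonal-Fisher form is a quadratic penalty that pulls each parameter toward its previous value, weighted by the estimated Fisher information. The sketch below uses flat Python lists and hypothetical names (`ewc_penalty`, `lam`); how the report estimates the Fisher values or sets the penalty strength is not stated in the abstract.

```python
def ewc_penalty(theta, theta_star, fisher, lam):
    """Standard diagonal EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    theta      -- current parameters
    theta_star -- parameters before continual pretraining
    fisher     -- diagonal Fisher information estimates, one per parameter
    lam        -- penalty strength (hypothetical hyperparameter name)
    """
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, fisher)
    )


def total_loss(task_loss, theta, theta_star, fisher, lam):
    """New-task loss plus the EWC penalty that discourages forgetting."""
    return task_loss + ewc_penalty(theta, theta_star, fisher, lam)


# Only the first parameter moved, so only its Fisher weight contributes.
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 5.0], lam=1.0)
```

Parameters with large Fisher values are costly to move, so updates concentrate on parameters the previous task was less sensitive to.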
nan
Article 1976
Title@2025-06-05 (4): Gaussian Process Diffeomorphic Statistical Shape Modelling Outperforms Angle-Based Methods for Assessment of Hip Dysplasia
Title: Gaussian Process Diffeomorphic Statistical Shape Modelling Outperforms Angle-Based Methods for Assessment of Hip Dysplasia | Gaussian Prozess Diffeomorphe statistische Form Modellierung übertrifft Winkel-basierte Methoden zur Beurteilung von Hüftdysplasie | Gausian 进程 外形外形光效光效模型统计形态底部形态评估方法 2506.04886v1 |
Authors (6): Allen Paul, George Grammatopoulos, Adwaye Rambojun, Neill D. F. Campbell, Harinderjit S. Gill, Tony Shardlow
Dysplasia is a recognised risk factor for osteoarthritis (OA) of the hip; early diagnosis of dysplasia is important to provide opportunities for surgical interventions aimed at reducing the risk of hip OA. We have developed a pipeline for semi-automated classification of dysplasia using volumetric CT scans of patients’ hips and a minimal set of clinically annotated landmarks, combining the framework of the Gaussian Process Latent Variable Model with diffeomorphism to create a statistical shape model, which we termed the Gaussian Process Diffeomorphic Statistical Shape Model (GPDSSM). We used 192 CT scans, 100 for model training and 92 for testing. The GPDSSM effectively distinguishes dysplastic samples from controls while also highlighting regions of the underlying surface that show dysplastic variations. As well as improving classification accuracy compared to angle-based methods (AUC 96.2% vs 91.2%), the GPDSSM can save time for clinicians by removing the need to manually measure angles and interpret 2D scans for possible markers of dysplasia.
nan
Article 1977
Title@2025-06-05 (4): Self-Tuning Spectral Clustering for Speaker Diarization
Title: Self-Tuning Spectral Clustering for Speaker Diarization | Selbst-Tuning Spectral Clustering für Sprecher-Diarisierung | 为议长进行自我引导光谱群集 2410.00023v2 |
Authors (4): Nikhil Raghav, Avisek Gupta, Md Sahidullah, Swagatam Das
Spectral clustering has proven effective in grouping speech representations for speaker diarization tasks, although post-processing the affinity matrix remains difficult due to the need for careful tuning before constructing the Laplacian. In this study, we present a novel pruning algorithm for creating a sparse affinity matrix, called spectral clustering on p-neighborhood retained affinity matrix (SC-pNA). Our method improves on node-specific fixed neighbor selection by allowing a variable number of neighbors, eliminating the need for external tuning data as the pruning parameters are derived directly from the affinity matrix. SC-pNA does so by identifying two clusters in every row of the initial affinity matrix and retaining only the top p% of similarity scores from the cluster containing the larger similarities. Spectral clustering is performed subsequently, with the number of clusters determined as the maximum eigengap. Experimental results on the challenging DIHARD-III dataset highlight the superiority of SC-pNA, which is also computationally more efficient than existing auto-tuning approaches. Our implementations are available at https://github.com/nikhilraghav29/SC-pNA.
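The row-wise pruning step can be sketched as follows, under assumptions that go beyond the abstract: the two clusters in a row are found with an exact one-dimensional 2-means split, and the number of retained scores is rounded up. The helper names are hypothetical, and the linked repository should be consulted for the actual procedure; the later spectral clustering and eigengap steps are omitted.

```python
import math


def _sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def higher_cluster(row):
    """Split a row (at least two values) into two clusters; return the higher one.

    Assumption: exact 1-D 2-means, found by trying every cut of the sorted values.
    """
    s = sorted(row)
    best_cost, best_cut = float("inf"), 1
    for cut in range(1, len(s)):
        cost = _sse(s[:cut]) + _sse(s[cut:])
        if cost < best_cost:
            best_cost, best_cut = cost, cut
    return s[best_cut:]


def prune_row(row, p):
    """Keep the top p% of scores from the higher-similarity cluster; zero the rest.

    Ties at the threshold are all kept, so slightly more than p% may survive.
    """
    high = higher_cluster(row)
    keep = max(1, math.ceil(len(high) * p / 100))
    threshold = sorted(high, reverse=True)[keep - 1]
    return [v if v >= threshold else 0.0 for v in row]


pruned = prune_row([0.9, 0.8, 0.85, 0.1, 0.2, 0.15], p=50)
```

Because the cluster boundary is computed per row, the number of neighbors kept varies from row to row, which is the variable-neighbor behaviour the abstract contrasts with fixed neighbor selection.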
nan
Article 1978
Title@2025-06-05 (4): Multiplicative Dynamic Mode Decomposition
Title: Multiplicative Dynamic Mode Decomposition | Multiplikative Dynamische Moduszersetzung | 多倍复制动态模式分解 2405.05334v2 |
Authors (2): Nicolas Boullé, Matthew J. Colbrook
Koopman operators are infinite-dimensional operators that linearize nonlinear dynamical systems, facilitating the study of their spectral properties and enabling the prediction of the time evolution of observable quantities. Recent methods have aimed to approximate Koopman operators while preserving key structures. However, approximating Koopman operators typically requires a dictionary of observables to capture the system’s behavior in a finite-dimensional subspace. The selection of these functions is often heuristic, may result in the loss of spectral information, and can severely complicate structure preservation. This paper introduces Multiplicative Dynamic Mode Decomposition (MultDMD), which enforces the multiplicative structure inherent in the Koopman operator within its finite-dimensional approximation. Leveraging this multiplicative property, we guide the selection of observables and define a constrained optimization problem for the matrix approximation, which can be efficiently solved. MultDMD presents a structured approach to finite-dimensional approximations and can more accurately reflect the spectral properties of the Koopman operator. We elaborate on the theoretical framework of MultDMD, detailing its formulation, optimization strategy, and convergence properties. The efficacy of MultDMD is demonstrated through several examples, including the nonlinear pendulum, the Lorenz system, and fluid dynamics data, where we demonstrate its remarkable robustness to noise.
nan
Article 1979
Title@2025-06-05 (4): Multi-granularity Knowledge Transfer for Continual Reinforcement Learning
Title: Multi-granularity Knowledge Transfer for Continual Reinforcement Learning | Multi-Granularitäts-Wissenstransfer für kontinuierliches Stärkungslernen | 用于持续加强学习的多族知识转让 2401.15098v3 |
Authors (7): Chaofan Pan, Lingfei Ren, Yihui Feng, Linbo Xiong, Wei Wei, Yonghao Li, Xin Yang
Continual reinforcement learning (CRL) empowers RL agents with the ability to learn a sequence of tasks, accumulating knowledge learned in the past and using the knowledge for problem solving or future task learning. However, existing methods often focus on transferring fine-grained knowledge across similar tasks, which neglects the multi-granularity structure of human cognitive control, resulting in insufficient knowledge transfer across diverse tasks. To enhance coarse-grained knowledge transfer, we propose a novel framework called MT-Core (as shorthand for Multi-granularity knowledge Transfer for Continual reinforcement learning). MT-Core has a key characteristic of multi-granularity policy learning: 1) a coarse-grained policy formulation for utilizing the powerful reasoning ability of the large language model (LLM) to set goals, and 2) a fine-grained policy learning through RL which is oriented by the goals. We also construct a new policy library (knowledge base) to store policies that can be retrieved for multi-granularity knowledge transfer. Experimental results demonstrate the superiority of the proposed MT-Core in handling diverse CRL tasks versus popular baselines.
nan
Article 1980
Title@2025-06-05 (4): kTULA: A Langevin sampling algorithm with improved KL bounds under super-linear log-gradients
Title: kTULA: A Langevin sampling algorithm with improved KL bounds under super-linear log-gradients | kTULA: Ein Langevin-Sampling-Algorithmus mit verbesserten KL-Grenzen unter superlinearen Log-Gradienten | kTULA: Langevin 取样算法,在超线性日志梯度下改进 KL 界限 2506.04878v1 |
Authors (3): Iosif Lytras, Sotirios Sabanis, Ying Zhang
Motivated by applications in deep learning, where the global Lipschitz continuity condition is often not satisfied, we examine the problem of sampling from distributions with super-linearly growing log-gradients. We propose a novel tamed Langevin dynamics-based algorithm, called kTULA, to solve the aforementioned sampling problem, and provide a theoretical guarantee for its performance. More precisely, we establish a non-asymptotic convergence bound in Kullback-Leibler (KL) divergence with the best-known rate of convergence equal to $2-\overline{\epsilon}$, $\overline{\epsilon}>0$, which significantly improves relevant results in existing literature. This enables us to obtain an improved non-asymptotic error bound in Wasserstein-2 distance, which can be used to further derive a non-asymptotic guarantee for kTULA to solve the associated optimization problems. To illustrate the applicability of kTULA, we apply the proposed algorithm to the problem of sampling from a high-dimensional double-well potential distribution and to an optimization problem involving a neural network. We show that our main results can be used to provide theoretical guarantees for the performance of kTULA.
nan
Article 1981
Title@2025-06-05 (4): There Was Never a Bottleneck in Concept Bottleneck Models
Title: There Was Never a Bottleneck in Concept Bottleneck Models | Es gab nie einen Engpass in Konzept Engpässe Modelle | ” 概念瓶颈模型 “ 从未出现过瓶颈问题 2506.04877v1 |
Authors (3): Antonio Almudévar, José Miguel Hernández-Lobato, Alfonso Ortega
Deep learning representations are often difficult to interpret, which can hinder their deployment in sensitive applications. Concept Bottleneck Models (CBMs) have emerged as a promising approach to mitigate this issue by learning representations that support target task performance while ensuring that each component predicts a concrete concept from a predefined set. In this work, we argue that CBMs do not impose a true bottleneck: the fact that a component can predict a concept does not guarantee that it encodes only information about that concept. This shortcoming raises concerns regarding interpretability and the validity of intervention procedures. To overcome this limitation, we propose Minimal Concept Bottleneck Models (MCBMs), which incorporate an Information Bottleneck (IB) objective to constrain each representation component to retain only the information relevant to its corresponding concept. This IB is implemented via a variational regularization term added to the training loss. As a result, MCBMs support concept-level interventions with theoretical guarantees, remain consistent with Bayesian principles, and offer greater flexibility in key design choices.
nan
Article 1982
Title@2025-06-05 (4): Aligning Multimodal Representations through an Information Bottleneck
Title: Aligning Multimodal Representations through an Information Bottleneck | Multimodale Repräsentationen durch einen Informationsengpass ausrichten | 通过信息瓶颈协调多式联运代表机构 2506.04870v1 |
Authors (5): Antonio Almudévar, José Miguel Hernández-Lobato, Sameer Khurana, Ricard Marxer, Alfonso Ortega
Contrastive losses have been extensively used as a tool for multimodal representation learning. However, it has been empirically observed that their use is not effective for learning an aligned representation space. In this paper, we argue that this phenomenon is caused by the presence of modality-specific information in the representation space. Although some of the most widely used contrastive losses maximize the mutual information between representations of both modalities, they are not designed to remove the modality-specific information. We give a theoretical description of this problem through the lens of the Information Bottleneck Principle. We also empirically analyze how different hyperparameters affect the emergence of this phenomenon in a controlled experimental setup. Finally, we propose a regularization term in the loss function that is derived by means of a variational approximation and aims to increase the representational alignment. We analyze the advantages of including this regularization term in a set of controlled experiments and real-world applications.
Article 1983
Title@2025-06-05 (4): LLMs for sensory-motor control: Combining in-context and iterative learning
Title: LLMs for sensory-motor control: Combining in-context and iterative learning | LLMs für die sensorisch-motorische Steuerung: Kombination von Kontext und iterativem Lernen | 感官运动控制LLMs:将内文与迭接学习相结合 2506.04867v1 |
Authors (2): Jônata Tyska Carvalho, Stefano Nolfi
We propose a method that enables large language models (LLMs) to control embodied agents by directly mapping continuous observation vectors to continuous action vectors. Initially, the LLMs generate a control strategy based on a textual description of the agent, its environment, and the intended goal. This strategy is then iteratively refined through a learning process in which the LLMs are repeatedly prompted to improve the current strategy, using performance feedback and sensory-motor data collected during its evaluation. The method is validated on classic control tasks from the Gymnasium library and the inverted pendulum task from the MuJoCo library. In most cases, it successfully identifies optimal or high-performing solutions by integrating symbolic knowledge derived through reasoning with sub-symbolic sensory-motor data gathered as the agent interacts with its environment.
Article 1984
Title@2025-06-05 (4): Fokker-Planck to Callan-Symanzik: evolution of weight matrices under training
Title: Fokker-Planck to Callan-Symanzik: evolution of weight matrices under training | Fokker-Planck zu Callan-Symanzik: Entwicklung von Gewichtsmatrizen im Training | Fokker-Planck to Callan-Symanzik:培训中重量矩阵的演变 2501.09659v2 |
Authors (3): Wei Bu, Uri Kol, Ziming Liu
The dynamical evolution of a neural network during training has been an incredibly fascinating subject of study. First-principles derivation of the generic evolution of variables in statistical physics systems has proved useful when used to describe training dynamics conceptually, which in practice means numerically solving equations such as the Fokker-Planck equation. Simulating entire networks inevitably runs into the curse of dimensionality. In this paper, we utilize Fokker-Planck to simulate the probability density evolution of individual weight matrices in the bottleneck layers of a simple 2-bottleneck-layered auto-encoder and compare the theoretical evolutions against the empirical ones by examining the output data distributions. We also derive physically relevant partial differential equations such as the Callan-Symanzik and Kardar-Parisi-Zhang equations from the dynamical equation we have.
Article 1985
Title@2025-06-05 (4): Improving AI-generated music with user-guided training
Title: Improving AI-generated music with user-guided training | Verbesserung der KI-generierten Musik durch benutzergeführtes Training | 通过用户指导培训,改进AI制制的音乐 2506.04852v1 |
Authors (5): Vishwa Mohan Singh, Sai Anirudh Aryasomayajula, Ahan Chatterjee, Beste Aydemir, Rifat Mehreen Amin
AI music generation has advanced rapidly, with models like diffusion and autoregressive algorithms enabling high-fidelity outputs. These tools can alter styles, mix instruments, or isolate them. Since sound can be visualized as spectrograms, image-generation algorithms can be applied to generate novel music. However, these algorithms are typically trained on fixed datasets, which makes it challenging for them to interpret and respond to user input accurately. This is especially problematic because music is highly subjective and requires a level of personalization that image generation does not provide. In this work, we propose a human-computation approach to gradually improve the performance of these algorithms based on user interactions. The human-computation element involves aggregating and selecting user ratings to use as the loss function for fine-tuning the model. We employ a genetic algorithm that incorporates user feedback to enhance the baseline performance of a model initially trained on a fixed dataset. The effectiveness of this approach is measured by the average increase in user ratings with each iteration. In the pilot test, the first iteration showed an average rating increase of 0.2 compared to the baseline. The second iteration further improved upon this, achieving an additional increase of 0.39 over the first iteration.
Article 1986
Title@2025-06-05 (4): An Unsupervised Framework for Dynamic Health Indicator Construction and Its Application in Rolling Bearing Prognostics
Title: An Unsupervised Framework for Dynamic Health Indicator Construction and Its Application in Rolling Bearing Prognostics | Ein unbeaufsichtigtes Framework für die Konstruktion dynamischer Gesundheitsindikatoren und deren Anwendung in der Wälzlagerprognostik | 一个无人监督的动态健康指标建设框架及其在滚轴预测器中的应用 2506.05438v1 |
Authors (4): Tongda Sun, Chen Yin, Huailiang Zheng, Yining Dong
The health indicator (HI) plays a key role in degradation assessment and prognostics of rolling bearings. Although various HI construction methods have been investigated, most of them rely on expert knowledge for feature extraction and overlook the dynamic information hidden in sequential degradation processes, which limits the ability of the constructed HI to represent degradation trends and support prognostics. To address these concerns, a novel dynamic HI that considers HI-level temporal dependence is constructed through an unsupervised framework. Specifically, a degradation feature learning module composed of a skip-connection-based autoencoder first maps raw signals to a representative degradation feature space (DFS) to automatically extract essential degradation features without the need for expert knowledge. Subsequently, in this DFS, a new HI-generating module embedded with an inner HI-prediction block is proposed for dynamic HI construction, where the temporal dependence between past and current HI states is guaranteed and modeled explicitly. On this basis, the dynamic HI captures the inherent dynamic contents of the degradation process, ensuring its effectiveness for degradation tendency modeling and future degradation prognostics. The experimental results on two bearing lifecycle datasets demonstrate that the proposed HI construction method outperforms comparison methods, and the constructed dynamic HI is superior for prognostic tasks.
Article 1987
Title@2025-06-05 (4): Learning pure quantum states (almost) without regret
Title: Learning pure quantum states (almost) without regret | Lernen von reinen Quantenzuständen (fast) ohne Bedauern | 学习纯量度状态(几乎) 2406.18370v2 |
Authors (3): Josep Lumbreras, Mikhail Terekhov, Marco Tomamichel
We initiate the study of sample-optimal quantum state tomography with minimal disturbance to the samples. Can we efficiently learn a precise description of a quantum state through sequential measurements of samples while at the same time making sure that the post-measurement state of the samples is only minimally perturbed? Defining regret as the cumulative disturbance of all samples, the challenge is to find a balance between the most informative sequence of measurements on the one hand and measurements incurring minimal regret on the other. Here we answer this question for qubit states by exhibiting a protocol that for pure states achieves maximal precision while incurring a regret that grows only polylogarithmically with the number of samples, a scaling that we show to be optimal.
Article 1988
Title@2025-06-05 (4): Hybrid deep convolution model for lung cancer detection with transfer learning
Title: Hybrid deep convolution model for lung cancer detection with transfer learning | Hybrides Deep-Convolution-Modell für Lungenkrebserkennung mit Transfer-Lernen | 肺癌检测与转移学习的混合深变深混合模型 2501.02785v2 |
Authors (4): Sugandha Saxena, S. N. Prasad, Ashwin M Polnaya, Shweta Agarwala
Advances in healthcare research have significantly enhanced our understanding of disease mechanisms, diagnostic precision, and therapeutic options. Yet, lung cancer remains one of the leading causes of cancer-related mortality worldwide due to challenges in early and accurate diagnosis. While current lung cancer detection models show promise, there is considerable potential for further improving the accuracy for timely intervention. To address this challenge, we introduce a hybrid deep convolution model leveraging transfer learning, named the Maximum Sensitivity Neural Network (MSNN). MSNN is designed to improve the precision of lung cancer detection by refining sensitivity and specificity. This model has surpassed existing deep learning approaches through experimental validation, achieving an accuracy of 98% and a sensitivity of 97%. By overlaying sensitivity maps onto lung Computed Tomography (CT) scans, it enables the visualization of regions most indicative of malignant or benign classifications. This innovative method demonstrates exceptional performance in distinguishing lung cancer with minimal false positives, thereby enhancing the accuracy of medical diagnoses.
Article 1989
Title@2025-06-05 (4): Event Classification of Accelerometer Data for Industrial Package Monitoring with Embedded Deep Learning
Title: Event Classification of Accelerometer Data for Industrial Package Monitoring with Embedded Deep Learning | Event-Klassifikation von Accelerometer-Daten für die industrielle Paketüberwachung mit eingebettetem Deep Learning | 用于以嵌入深层学习方式监测工业一揽子措施的加速计数据的事件分类 2506.05435v1 |
Authors (6): Manon Renault, Hamoud Younes, Hugo Tessier, Ronan Le Roy, Bastien Pasdeloup, Mathieu Léonardon
Package monitoring is an important topic in industrial applications, with significant implications for operational efficiency and ecological sustainability. In this study, we propose an approach that employs an embedded system, placed on reusable packages, to detect their state (on a Forklift, in a Truck, or in an undetermined location). We aim to design a system with a lifespan of several years, corresponding to the lifespan of reusable packages. Our analysis demonstrates that maximizing device lifespan requires minimizing wake time. We propose a pipeline that includes data processing, training, and evaluation of the deep learning model designed for imbalanced, multiclass time series data collected from an embedded sensor. The method uses a one-dimensional Convolutional Neural Network architecture to classify accelerometer data from the IoT device. Before training, two data augmentation techniques are tested to solve the imbalance problem of the dataset: the Synthetic Minority Oversampling TEchnique and the ADAptive SYNthetic sampling approach. After training, compression techniques are applied to obtain a small model size. On the considered two-class problem, the methodology yields a precision of 94.54% for the first class and 95.83% for the second class, while compression techniques reduce the model size by a factor of four. The trained model is deployed on the IoT device, where it operates with a power consumption of 316 mW during inference.
Article 1990
Title@2025-06-05 (4): From EHRs to Patient Pathways: Scalable Modeling of Longitudinal Health Trajectories with LLMs
Title: From EHRs to Patient Pathways: Scalable Modeling of Longitudinal Health Trajectories with LLMs | Von EHRs zu Patientenpfaden: Skalierbare Modellierung von langitudinalen Gesundheits-Trajektorien mit LLMs | 从EHRs到患者途径:可扩展的长纵向健康轨迹模型与LLMs 2506.04831v1 |
Authors (5): Chantal Pellegrini, Ege Özsoy, David Bani-Harouni, Matthias Keicher, Nassir Navab
Healthcare systems face significant challenges in managing and interpreting vast, heterogeneous patient data for personalized care. Existing approaches often focus on narrow use cases with a limited feature space, overlooking the complex, longitudinal interactions needed for a holistic understanding of patient health. In this work, we propose a novel approach to patient pathway modeling by transforming diverse electronic health record (EHR) data into a structured representation and designing a holistic pathway prediction model, EHR2Path, optimized to predict future health trajectories. Further, we introduce a novel summary mechanism that embeds long-term temporal context into topic-specific summary tokens, improving performance over text-only models, while being much more token-efficient. EHR2Path demonstrates strong performance in both next time-step prediction and longitudinal simulation, outperforming competitive baselines. It enables detailed simulations of patient trajectories, inherently targeting diverse evaluation tasks, such as forecasting vital signs, lab test results, or length-of-stay, opening a path towards predictive and personalized healthcare.
Article 1991
Title@2025-06-05 (4): Reading Recognition in the Wild
Title: Reading Recognition in the Wild | Leseerkennung in der Wildnis | 野生生物的阅读识别 2505.24848v2 |
Authors (15): Charig Yang, Samiul Alam, Shakhrul Iman Siam, Michael J. Proulx, Lambert Mathias, Kiran Somasundaram, Luis Pesqueira, James Fort, Sheroze Sheriffdeen, Omkar Parkhi, Carl Ren, Mi Zhang, Yuning Chai, Richard Newcombe, Hyo Jin Kim
To enable egocentric contextual AI in always-on smart glasses, it is crucial to be able to keep a record of the user’s interactions with the world, including during reading. In this paper, we introduce a new task of reading recognition to determine when the user is reading. We first introduce the first-of-its-kind large-scale multimodal Reading in the Wild dataset, containing 100 hours of reading and non-reading videos in diverse and realistic scenarios. We then identify three modalities (egocentric RGB, eye gaze, head pose) that can be used to solve the task, and present a flexible transformer model that performs the task using these modalities, either individually or combined. We show that these modalities are relevant and complementary to the task, and investigate how to efficiently and effectively encode each modality. Additionally, we show the usefulness of this dataset towards classifying types of reading, extending current reading understanding studies conducted in constrained settings to larger scale, diversity and realism.
Article 1992
Title@2025-06-05 (4): Supervised Graph Contrastive Learning for Gene Regulatory Network
Title: Supervised Graph Contrastive Learning for Gene Regulatory Network | Überwachtes Graph Kontrastives Lernen für Gene Regulatory Network | 受监督的基因监管网络图表对比性学习 2505.17786v2 |
Authors (5): Sho Oshima, Yuji Okamoto, Taisei Tosaki, Ryosuke Kojima, Yasushi Okuno
Graph representation learning is effective for obtaining a meaningful latent space utilizing the structure of graph data and is widely applied, including to biological networks. In particular, Graph Contrastive Learning (GCL) has emerged as a powerful self-supervised method that relies on applying perturbations to graphs for data augmentation. However, when applied to biological networks such as Gene Regulatory Networks (GRNs), existing GCL methods overlook meaningful biologically relevant perturbations, e.g., gene knockdowns. In this study, we introduce SupGCL (Supervised Graph Contrastive Learning), a novel GCL method for GRNs that directly incorporates biological perturbations derived from gene knockdown experiments as the supervision. SupGCL mathematically extends existing GCL methods that utilize non-biological perturbations to probabilistic models that introduce actual biological gene perturbation utilizing gene knockdown data. Using the GRN representation obtained by our proposed method, our aim is to improve the performance of biological downstream tasks such as patient hazard prediction and disease subtype classification (graph-level task), and gene function classification (node-level task). We applied SupGCL on real GRN datasets derived from patients with multiple types of cancer, and in all experiments SupGCL achieves better performance than state-of-the-art baselines.
Article 1993
Title@2025-06-05 (4): Fool the Stoplight: Realistic Adversarial Patch Attacks on Traffic Light Detectors
Title: Fool the Stoplight: Realistic Adversarial Patch Attacks on Traffic Light Detectors | Tor das Stopplicht: Realistische Widersacher-Angriffe auf Verkehrslicht-Detektoren | 愚笨的暗灯:现实的对抗性对交通灯光探测器的补丁攻击 2506.04823v1 |
Authors (5): Svetlana Pavlitska, Jamie Robb, Nikolai Polley, Melih Yazgan, J. Marius Zöllner
Realistic adversarial attacks on various camera-based perception tasks of autonomous vehicles have been successfully demonstrated so far. However, only a few works considered attacks on traffic light detectors. This work shows how CNNs for traffic light detection can be attacked with printed patches. We propose a threat model, where each instance of a traffic light is attacked with a patch placed under it, and describe a training strategy. We demonstrate successful adversarial patch attacks in universal settings. Our experiments show realistic targeted red-to-green label-flipping attacks and attacks on pictogram classification. Finally, we perform a real-world evaluation with printed patches and demonstrate attacks in the lab settings with a mobile traffic light for construction sites and in a test area with stationary traffic lights. Our code is available at https://github.com/KASTEL-MobilityLab/attacks-on-traffic-light-detection.
Article 1994
Title@2025-06-05 (4): Offline Multi-agent Reinforcement Learning via Score Decomposition
Title: Offline Multi-agent Reinforcement Learning via Score Decomposition | Offline-Multi-Agenten-Verstärkung Lernen über Score-Dekomposition | 通过分数分解进行离线多智能体强化学习 2505.05968v2 |
Authors (5): Dan Qiao, Wenhao Li, Shanchao Yang, Hongyuan Zha, Baoxiang Wang
Offline cooperative multi-agent reinforcement learning (MARL) faces unique challenges due to distributional shifts, particularly stemming from the high dimensionality of joint action spaces and the presence of out-of-distribution joint action selections. In this work, we highlight that a fundamental challenge in offline MARL arises from the multi-equilibrium nature of cooperative tasks, which induces a highly multimodal joint behavior policy space coupled with heterogeneous-quality behavior data. This makes it difficult for individual policy regularization to align with a consistent coordination pattern, leading to the policy distribution shift problems. To tackle this challenge, we design a sequential score function decomposition method that distills per-agent regularization signals from the joint behavior policy, which induces coordinated modality selection under decentralized execution constraints. Then we leverage a flexible diffusion-based generative model to learn these score functions from multimodal offline data, and integrate them into joint-action critics to guide policy updates toward high-reward, in-distribution regions under a shared team reward. Our approach achieves state-of-the-art performance across multiple particle environments and Multi-agent MuJoCo benchmarks consistently. To the best of our knowledge, this is the first work to explicitly address the distributional gap between offline and online MARL, paving the way for more generalizable offline policy-based MARL methods.
Article 1995
Title@2025-06-05 (4): LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning
Title: LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning | LogicPuzzleRL: Robuste mathematische Reasoning in LLMs durch Verstärkungslernen kultivieren | LogicPuzzleRL:通过强化学习在LLMs中培养强力数学理由 2506.04821v1 |
Authors (10): Zhen Hao Wong, Jingwen Deng, Runming He, Zirong Chen, Qijie You, Hejun Dong, Hao Liang, Chengyu Shen, Bin Cui, Wentao Zhang
Large language models (LLMs) excel at many supervised tasks but often struggle with structured reasoning in unfamiliar settings. This discrepancy suggests that standard fine-tuning pipelines may instill narrow, domain-specific heuristics rather than fostering general-purpose thinking strategies. In this work, we propose a “play to learn” framework that fine-tunes LLMs through reinforcement learning on a suite of seven custom logic puzzles, each designed to cultivate distinct reasoning skills such as constraint propagation, spatial consistency, and symbolic deduction. Using a reinforcement learning setup with verifiable rewards, models receive binary feedback based on puzzle correctness, encouraging iterative, hypothesis-driven problem solving. We demonstrate that this training approach significantly improves out-of-distribution performance on a range of mathematical benchmarks, especially for mid-difficulty problems that require multi-step reasoning. Analyses across problem categories and difficulty levels reveal that puzzle training promotes transferable reasoning routines, strengthening algebraic manipulation, geometric inference, and combinatorial logic, while offering limited gains on rote or highly specialized tasks. These findings show that reinforcement learning over logic puzzles reshapes the internal reasoning of LLMs, enabling more robust and compositional generalization without relying on task-specific symbolic tools.
Article 1996
Title@2025-06-05 (4): Distributional encoding for Gaussian process regression with qualitative inputs
Title: Distributional encoding for Gaussian process regression with qualitative inputs | Verteilungskodierung für Gaußsche Prozessregression mit qualitativen Inputs | 带有定性投入的高斯进程回归的分布编码 2506.04813v1 |
Authors (1): Sébastien Da Veiga
Gaussian Process (GP) regression is a popular and sample-efficient approach for many engineering applications, where observations are expensive to acquire, and is also a central ingredient of Bayesian optimization (BO), a highly prevailing method for the optimization of black-box functions. However, when all or some input variables are categorical, building a predictive and computationally efficient GP remains challenging. Starting from the naive target encoding idea, where the original categorical values are replaced with the mean of the target variable for that category, we propose a generalization based on distributional encoding (DE) which makes use of all samples of the target variable for a category. To handle this type of encoding inside the GP, we build upon recent results on characteristic kernels for probability distributions, based on the maximum mean discrepancy and the Wasserstein distance. We also discuss several extensions for classification, multi-task learning and incorporation of auxiliary information. Our approach is validated empirically, and we demonstrate state-of-the-art predictive performance on a variety of synthetic and real-world datasets. DE is naturally complementary to recent advances in BO over discrete and mixed spaces.
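The contrast between naive target encoding and distributional encoding can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the helper names, the toy data, the restriction to equal-size samples per category, and the Gaussian-type kernel built on the squared one-dimensional Wasserstein-2 distance. Categories "a" and "b" share the same target mean, so mean encoding cannot separate them, whereas a distance between their empirical target distributions can.

```python
import math
from collections import defaultdict

def group_targets(categories, targets):
    """Collect all observed target values for each category."""
    groups = defaultdict(list)
    for c, y in zip(categories, targets):
        groups[c].append(y)
    return dict(groups)

def mean_encoding(groups):
    """Naive target encoding: one number (the target mean) per category."""
    return {c: sum(ys) / len(ys) for c, ys in groups.items()}

def wasserstein2_squared(ys_a, ys_b):
    """Squared 1-D Wasserstein-2 distance between equal-size empirical samples.

    For equal sample sizes in one dimension, the optimal coupling
    matches sorted values (assumption: equal sizes, for simplicity).
    """
    assert len(ys_a) == len(ys_b)
    pairs = zip(sorted(ys_a), sorted(ys_b))
    return sum((a - b) ** 2 for a, b in pairs) / len(ys_a)

def category_kernel(ys_a, ys_b, lengthscale=1.0):
    """Illustrative kernel between two categories via their target samples."""
    return math.exp(-wasserstein2_squared(ys_a, ys_b) / (2.0 * lengthscale ** 2))

categories = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
targets = [0.0, 1.0, 2.0, 1.0, 1.0, 1.0, 0.0, 1.0, 2.0]

groups = group_targets(categories, targets)
means = mean_encoding(groups)                        # all three means coincide
k_ab = category_kernel(groups["a"], groups["b"])     # different spread, so < 1
k_ac = category_kernel(groups["a"], groups["c"])     # identical samples, so 1
```

In a GP, a kernel of this kind would replace the categorical input dimension; the choice of distance and lengthscale here is only one possible instance.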
Article 1997
Title@2025-06-05 (4): Uniform convergence of the smooth calibration error and its relationship with functional gradient
Title: Uniform convergence of the smooth calibration error and its relationship with functional gradient | Einheitliche Konvergenz des glatten Kalibrierfehlers und seines Verhältnisses mit dem funktionellen Gradienten | 平稳校准误差及其与功能梯度的关系统一汇合 2505.19396v2 |
Authors (2): Futoshi Futami, Atsushi Nitanda
Calibration is a critical requirement for reliable probabilistic prediction, especially in high-risk applications. However, the theoretical understanding of which learning algorithms can simultaneously achieve high accuracy and good calibration remains limited, and many existing studies provide empirical validation or a theoretical guarantee in restrictive settings. To address this issue, in this work, we focus on the smooth calibration error (CE) and provide a uniform convergence bound, showing that the smooth CE is bounded by the sum of the smooth CE over the training dataset and a generalization gap. We further prove that the functional gradient of the loss function can effectively control the training smooth CE. Based on this framework, we analyze three representative algorithms: gradient boosting trees, kernel boosting, and two-layer neural networks. For each, we derive conditions under which both classification and calibration performances are simultaneously guaranteed. Our results offer new theoretical insights and practical guidance for designing reliable probabilistic models with provable calibration guarantees.
Article 1998
Title@2025-06-05 (4): Adaptive Preconditioners Trigger Loss Spikes in Adam
Title: Adaptive Preconditioners Trigger Loss Spikes in Adam | Adaptive Preconditioners Trigger Verlust Spikes in Adam | 亚当的触发器失落斯派克 2506.04805v1 |
Authors (9): Zhiwei Bai, Zhangchen Zhou, Jiajie Zhao, Xiaolong Li, Zhiyu Li, Feiyu Xiong, Hongkang Yang, Yaoyu Zhang, Zhi-Qin John Xu
Loss spikes emerge commonly during training across neural networks of varying architectures and scales when using the Adam optimizer. In this work, we investigate the underlying mechanism responsible for Adam spikes. While previous explanations attribute these phenomena to the lower-loss-as-sharper characteristics of the loss landscape, our analysis reveals that Adam’s adaptive preconditioners themselves can trigger spikes. Specifically, we identify a critical regime where squared gradients become substantially smaller than the second-order moment estimates, causing the latter to undergo a $\beta_2$-exponential decay and to respond sluggishly to current gradient information. This mechanism can push the maximum eigenvalue of the preconditioned Hessian beyond the classical stability threshold $2/\eta$ for a sustained period, inducing instability. This instability further leads to an alignment between the gradient and the maximum eigendirection, and a loss spike occurs precisely when the gradient-directional curvature exceeds $2/\eta$. We verify this mechanism through extensive experiments on fully connected networks, convolutional networks, and Transformer architectures.
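The decay regime described above can be reproduced with a minimal sketch, assuming the standard Adam second-moment update v_t = beta2 * v_{t-1} + (1 - beta2) * g_t^2. The value of beta2, the gradient magnitudes, and the step counts are illustrative assumptions, not values from the paper. Once squared gradients are far smaller than v, the estimate shrinks roughly like beta2^k, so the per-coordinate scaling 1/sqrt(v) grows steadily even though the gradients themselves are tiny.

```python
def second_moment_trace(grads, beta2=0.999, v0=0.0):
    """Return Adam's second-moment estimate after each scalar gradient."""
    v = v0
    trace = []
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * g * g
        trace.append(v)
    return trace

# Phase 1: large gradients build up v.
# Phase 2: gradients drop sharply, so g^2 << v and v decays geometrically.
grads = [1.0] * 5000 + [1e-4] * 2000
trace = second_moment_trace(grads)

v_before = trace[4999]                 # estimate at the end of phase 1
v_after = trace[-1]                    # estimate after 2000 small-gradient steps
predicted = v_before * 0.999 ** 2000   # pure beta2-exponential decay

# Growth factor of the 1/sqrt(v) preconditioner during phase 2.
precond_growth = (v_before / v_after) ** 0.5
```

In a one-dimensional picture with curvature h, the preconditioned curvature scales like h / sqrt(v), so this growth is the kind of mechanism that can push it past 2/eta; the sketch does not model the loss itself.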
Article 1999
Title@2025-06-05 (4): Augmented Invertible Koopman Autoencoder for long-term time series forecasting
Title: Augmented Invertible Koopman Autoencoder for long-term time series forecasting | Augmented Invertible Koopman Autoencoder für langfristige Zeitreihen | 用于长期时间序列预报的可逆 Koopman 自动编码器 2503.12930v2 |
Authors (5): Anthony Frion, Lucas Drumetz, Mauro Dalla Mura, Guillaume Tochon, Abdeldjalil Aïssa-El-Bey
Following the introduction of Dynamic Mode Decomposition and its numerous extensions, many neural autoencoder-based implementations of the Koopman operator have recently been proposed. This class of methods appears to be of interest for modeling dynamical systems, either through direct long-term prediction of the evolution of the state or as a powerful embedding for downstream methods. In particular, a recent line of work has developed invertible Koopman autoencoders (IKAEs), which provide an exact reconstruction of the input state thanks to their analytically invertible encoder, based on coupling layer normalizing flow models. We identify that the conservation of the dimension imposed by the normalizing flows is a limitation for the IKAE models, and thus we propose to augment the latent state with a second, non-invertible encoder network. This results in our new model: the Augmented Invertible Koopman AutoEncoder (AIKAE). We demonstrate the relevance of the AIKAE through a series of long-term time series forecasting experiments, on satellite image time series as well as on a benchmark involving predictions based on a large lookback window of observations.
Article 2000
Title@2025-06-05 (4): LotusFilter: Fast Diverse Nearest Neighbor Search via a Learned Cutoff Table
Title: LotusFilter: Fast Diverse Nearest Neighbor Search via a Learned Cutoff Table | LotusFilter: Schnelles Diverses Naheste Nachbarschaftssuche über einen erfahrenen Cutoff-Tisch | LotusFilter:通过计算截止表快速搜索近邻 2506.04790v1 |
Authors (1): Yusuke Matsui
Approximate nearest neighbor search (ANNS) is an essential building block for applications like RAG but can sometimes yield results that are overly similar to each other. In certain scenarios, search results should be similar to the query and yet diverse. We propose LotusFilter, a post-processing module to diversify ANNS results. We precompute a cutoff table summarizing vectors that are close to each other. During the filtering, LotusFilter greedily looks up the table to delete redundant vectors from the candidates. We demonstrated that the LotusFilter operates fast (0.02 [ms/query]) in settings resembling real-world RAG applications, utilizing features such as OpenAI embeddings. Our code is publicly available at https://github.com/matsui528/lotf.
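The two stages described above (a precomputed cutoff table, then a greedy lookup-based filter) can be sketched as follows. This is a simplified illustration under stated assumptions, not the released implementation: the function names, the squared-L2 threshold `eps`, the brute-force table construction, and the dictionary-of-sets layout are all hypothetical.

```python
def build_cutoff_table(vectors, eps):
    """Map each vector id to the ids of other vectors with squared L2 distance < eps."""
    table = {i: set() for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            d2 = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            if d2 < eps:
                table[i].add(j)
                table[j].add(i)
    return table

def diverse_filter(candidates, table, k):
    """Greedily keep candidates in ranked order, skipping ids close to a kept one."""
    kept, banned = [], set()
    for c in candidates:
        if c in banned:
            continue
        kept.append(c)
        banned |= table[c]      # table lookup only; no distance computation here
        if len(kept) == k:
            break
    return kept

vectors = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0)]
table = build_cutoff_table(vectors, eps=0.05)

# Candidates as returned by an ANNS index, best match first.
result = diverse_filter([0, 1, 2, 3, 4], table, k=3)
```

Here ids 1 and 3 are dropped because they lie within the threshold of the already kept ids 0 and 2. The filtering step touches only the table, which is consistent with the low per-query cost reported in the abstract.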
Article 2001
Title@2025-06-05 (4): Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques
Title: Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques | Auf dem Weg zur multimodalen Fusion LLM-Centric: Eine Studie über Integrationsstrategien und -techniken | 走向LLM-Centric Multimods 融合:关于一体化战略和技术的调查 2506.04788v1 |
Authors (4): Jisu An, Junseok Lee, Jeoungeun Lee, Yongseok Son
The rapid progress of Multimodal Large Language Models (MLLMs) has transformed the AI landscape. These models combine pre-trained LLMs with various modality encoders. This integration requires a systematic understanding of how different modalities connect to the language backbone. Our survey presents an LLM-centric analysis of current approaches. We examine methods for transforming and aligning diverse modal inputs into the language embedding space. This addresses a significant gap in existing literature. We propose a classification framework for MLLMs based on three key dimensions. First, we examine architectural strategies for modality integration. This includes both the specific integration mechanisms and the fusion level. Second, we categorize representation learning techniques as either joint or coordinate representations. Third, we analyze training paradigms, including training strategies and objective functions. By examining 125 MLLMs developed between 2021 and 2025, we identify emerging patterns in the field. Our taxonomy provides researchers with a structured overview of current integration techniques. These insights aim to guide the development of more robust multimodal integration strategies for future models built on pre-trained foundations.
Article 2002
Title@2025-06-05 (4): Kernel $k$-Medoids as General Vector Quantization
Title: Kernel $k$-Medoids as General Vector Quantization | Kernel $k$-Medoids als General Vector Quantization | 作为一般矢量量化的内核 $k$- 中值 2506.04786v1 |
Authors (3): Thore Gerlach, Sascha Mücke, Christian Bauckhage
Vector Quantization (VQ) is a widely used technique in machine learning and data compression, valued for its simplicity and interpretability. Among hard VQ methods, $k$-medoids clustering and Kernel Density Estimation (KDE) approaches represent two prominent yet seemingly unrelated paradigms – one distance-based, the other rooted in probability density matching. In this paper, we investigate their connection through the lens of Quadratic Unconstrained Binary Optimization (QUBO). We compare a heuristic QUBO formulation for $k$-medoids, which balances centrality and diversity, with a principled QUBO derived from minimizing Maximum Mean Discrepancy in KDE-based VQ. Surprisingly, we show that the KDE-QUBO is a special case of the $k$-medoids-QUBO under mild assumptions on the kernel’s feature map. This reveals a deeper structural relationship between these two approaches and provides new insight into the geometric interpretation of the weighting parameters used in QUBO formulations for VQ.
nan
Article 2003
Title@2025-06-05 (4): Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward
Title: Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward | Prefix Grouper: Effiziente GRPO-Schulung durch Shared-Prefix Forward | 前缀分组器:通过共享前缀前向计算实现高效的GRPO训练 2506.05433v1 |
Authors (8): Zikang Liu, Tongtian Yue, Yepeng Tang, Longteng Guo, Junxian Cai, Qingbin Liu, Xi Chen, Jing Liu
Group Relative Policy Optimization (GRPO) enhances policy learning by computing gradients from relative comparisons among candidate outputs that share a common input prefix. Despite its effectiveness, GRPO introduces substantial computational overhead when processing long shared prefixes, which must be redundantly encoded for each group member. This inefficiency becomes a major scalability bottleneck in long-context learning scenarios. We propose Prefix Grouper, an efficient GRPO training algorithm that eliminates redundant prefix computation via a Shared-Prefix Forward strategy. In particular, by restructuring self-attention into two parts, our method enables the shared prefix to be encoded only once, while preserving full differentiability and compatibility with end-to-end training. We provide both theoretical and empirical evidence that Prefix Grouper is training-equivalent to standard GRPO: it yields identical forward outputs and backward gradients, ensuring that the optimization dynamics and final policy performance remain unchanged. Empirically, our experiments confirm that Prefix Grouper achieves consistent results while significantly reducing the computational cost of training, particularly in long-prefix scenarios. The proposed method is fully plug-and-play: it is compatible with existing GRPO-based architectures and can be seamlessly integrated into current training pipelines as a drop-in replacement, requiring no structural modifications and only minimal changes to input construction and attention computation. Prefix Grouper enables the use of larger group sizes under the same computational budget, thereby improving the scalability of GRPO to more complex tasks and larger models. Code is now available at https://github.com/johncaged/PrefixGrouper
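The equivalence claimed above can be illustrated with a minimal single-head, single-layer NumPy sketch. It is not the authors' implementation: the helper `causal_attention`, the tensor sizes, and the random query/key/value matrices are all assumptions made for illustration. The sketch encodes the shared prefix once, lets each group member's suffix queries attend over the prefix keys plus its own suffix keys, and compares the result with the baseline that re-encodes the prefix for every member.

```python
import numpy as np

def causal_attention(q, k, v, q_offset=0):
    """Single-head softmax attention with a causal mask.

    Query row i sits at absolute position q_offset + i and may attend
    to key positions <= q_offset + i.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    q_pos = np.arange(q.shape[0])[:, None] + q_offset
    k_pos = np.arange(k.shape[0])[None, :]
    scores = np.where(k_pos <= q_pos, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d, prefix_len, suffix_len, group_size = 8, 5, 3, 4  # hypothetical sizes

# Already-projected q/k/v for the shared prefix and for each member's suffix.
pq, pk, pv = (rng.normal(size=(prefix_len, d)) for _ in range(3))
suffixes = [tuple(rng.normal(size=(suffix_len, d)) for _ in range(3))
            for _ in range(group_size)]

# Baseline: the prefix is re-encoded once per group member.
baseline = []
for sq, sk, sv in suffixes:
    q = np.concatenate([pq, sq])
    k = np.concatenate([pk, sk])
    v = np.concatenate([pv, sv])
    baseline.append(causal_attention(q, k, v))

# Shared-prefix variant: prefix self-attention is computed a single time,
# then each suffix attends over [prefix keys; its own suffix keys].
prefix_out = causal_attention(pq, pk, pv)
shared = []
for sq, sk, sv in suffixes:
    k = np.concatenate([pk, sk])
    v = np.concatenate([pv, sv])
    suffix_out = causal_attention(sq, k, v, q_offset=prefix_len)
    shared.append(np.concatenate([prefix_out, suffix_out]))

max_diff = max(np.abs(a - b).max() for a, b in zip(baseline, shared))
print("max |baseline - shared| =", max_diff)
```

The two paths agree up to floating-point rounding because, under a causal mask, prefix positions never attend to suffix tokens, so their outputs do not depend on which group member follows.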
nan
Article 2004
Title@2025-06-05 (4): Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards
Title: Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards | Verbesserte Regret Bounds für lineare Banditen mit schwerfälligen Belohnungen | 使用重负奖励的线性强盗的遗憾弹道改善 2506.04775v1 |
Authors (3): Artin Tajdini, Jonathan Scarlett, Kevin Jamieson
We study stochastic linear bandits with heavy-tailed rewards, where the rewards have a finite $(1+\epsilon)$-absolute central moment bounded by $\upsilon$ for some $\epsilon \in (0,1]$. We improve both upper and lower bounds on the minimax regret compared to prior work. When $\upsilon = \mathcal{O}(1)$, the best prior known regret upper bound is $\tilde{\mathcal{O}}(d T^{\frac{1}{1+\epsilon}})$. While a lower bound with the same scaling has been given, it relies on a construction using $\upsilon = \mathcal{O}(d)$, and adapting the construction to the bounded-moment regime with $\upsilon = \mathcal{O}(1)$ yields only a $\Omega(d^{\frac{\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$ lower bound. This matches the known rate for multi-armed bandits and is generally loose for linear bandits, in particular being $\sqrt{d}$ below the optimal rate in the finite-variance case ($\epsilon = 1$). We propose a new elimination-based algorithm guided by experimental design, which achieves regret $\tilde{\mathcal{O}}(d^{\frac{1+3\epsilon}{2(1+\epsilon)}} T^{\frac{1}{1+\epsilon}})$, thus improving the dependence on $d$ for all $\epsilon \in (0,1)$ and recovering a known optimal result for $\epsilon = 1$. We also establish a lower bound of $\Omega(d^{\frac{2\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$, which strictly improves upon the multi-armed bandit rate and highlights the hardness of heavy-tailed linear bandit problems. For finite action sets, we derive similarly improved upper and lower bounds for regret. Finally, we provide action-set-dependent regret upper bounds showing that for some geometries, such as $l_p$-norm balls for $p \le 1 + \epsilon$, we can further reduce the dependence on $d$, and we can handle infinite-dimensional settings via the kernel trick, in particular establishing new regret bounds for the Matérn kernel that are the first to be sublinear for all $\epsilon \in (0, 1]$.
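The exponent comparisons in this abstract are easy to check with exact arithmetic. The short sketch below only evaluates the exponents of $d$ and $T$ quoted above (ignoring logarithmic factors and constants); the function names `d_exponents` and `t_exponent` are illustrative and do not come from the paper.

```python
from fractions import Fraction

def d_exponents(eps):
    """Exponents of d in the regret bounds quoted in the abstract."""
    eps = Fraction(eps)
    return {
        "prior_upper": Fraction(1),                    # d * T^{1/(1+eps)}
        "new_upper": (1 + 3 * eps) / (2 * (1 + eps)),  # proposed algorithm
        "new_lower": 2 * eps / (1 + eps),              # new lower bound
        "mab_lower": eps / (1 + eps),                  # multi-armed bandit rate
    }

def t_exponent(eps):
    """Exponent of T, shared by all of the bounds above."""
    return 1 / (1 + Fraction(eps))

for eps in (Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    exps = d_exponents(eps)
    summary = ", ".join(f"{name}={value}" for name, value in exps.items())
    print(f"eps={eps}: T^{t_exponent(eps)}; d exponents: {summary}")
```

For $\epsilon = 1$ the new upper and lower exponents of $d$ both equal 1 while the multi-armed bandit exponent is 1/2, which is the $\sqrt{d}$ gap mentioned above. For $\epsilon < 1$ the new upper exponent is strictly below 1, since $1 + 3\epsilon < 2 + 2\epsilon$ exactly when $\epsilon < 1$.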
nan
Article 2005
Title@2025-06-05 (4): Fine-Grained Interpretation of Political Opinions in Large Language Models
Title: Fine-Grained Interpretation of Political Opinions in Large Language Models | Feinkörnige Interpretation politischer Meinungen in großen Sprachmodellen | 以大语言模式对政治意见的精细解释 2506.04774v1 |
Authors (4): Jingyu Hu, Mengyue Yang, Mengnan Du, Weiru Liu
Studies of LLMs’ political opinions mainly rely on evaluations of their open-ended responses. Recent work indicates that there is a misalignment between LLMs’ responses and their internal intentions. This motivates us to probe LLMs’ internal mechanisms and help uncover their internal political states. Additionally, we found that the analysis of LLMs’ political opinions often relies on single-axis concepts, which can lead to concept confounds. In this work, we extend the single-axis analysis to multiple dimensions and apply interpretable representation engineering techniques for more transparent LLM political concept learning. Specifically, we designed a four-dimensional political learning framework and constructed a corresponding dataset for fine-grained political concept vector learning. These vectors can be used to detect and intervene in LLM internals. Experiments are conducted on eight open-source LLMs with three representation engineering techniques. Results show these vectors can disentangle political concept confounds. Detection tasks validate the semantic meaning of the vectors and show good generalization and robustness in OOD settings. Intervention experiments show these vectors can intervene in LLMs to generate responses with different political leanings.
nan
Article 2006
Title@2025-06-05 (4): Inclusive, Differentially Private Federated Learning for Clinical Data
Title: Inclusive, Differentially Private Federated Learning for Clinical Data | Inklusives, differenziert privates Federated Learning für klinische Daten | 包容性、差异化私联校临床数据学习 2505.22108v2 |
Authors (10): Santhosh Parampottupadam, Melih Coşğun, Sarthak Pati, Maximilian Zenk, Saikat Roy, Dimitrios Bounias, Benjamin Hamm, Sinem Sav, Ralf Floca, Klaus Maier-Hein
Federated Learning (FL) offers a promising approach for training clinical AI models without centralizing sensitive patient data. However, its real-world adoption is hindered by challenges related to privacy, resource constraints, and compliance. Existing Differential Privacy (DP) approaches often apply uniform noise, which disproportionately degrades model performance, even among well-compliant institutions. In this work, we propose a novel compliance-aware FL framework that enhances DP by adaptively adjusting noise based on quantifiable client compliance scores. Additionally, we introduce a compliance scoring tool based on key healthcare and security standards to promote secure, inclusive, and equitable participation across diverse clinical settings. Extensive experiments on public datasets demonstrate that integrating under-resourced, less compliant clinics with highly regulated institutions yields accuracy improvements of up to 15% over traditional FL. This work advances FL by balancing privacy, compliance, and performance, making it a viable solution for real-world clinical workflows in global healthcare.
nan
Article 2007
Title@2025-06-05 (4): Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective
Title: Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective | Der Stabilitäts-Plastizitäts-Trade-off im kontinuierlichen Lernen aus architektonischer Perspektive neu denken | 从建筑学的视角重新思考持续学习中稳定-弹性权衡取舍 2506.03951v2 |
Authors (4): Aojun Lu, Hangjie Yuan, Tao Feng, Yanan Sun
The quest for Continual Learning (CL) seeks to empower neural networks with the ability to learn and adapt incrementally. Central to this pursuit is addressing the stability-plasticity dilemma, which involves striking a balance between two conflicting objectives: preserving previously learned knowledge and acquiring new knowledge. While numerous CL methods aim to achieve this trade-off, they often overlook the impact of network architecture on stability and plasticity, restricting the trade-off to the parameter level. In this paper, we delve into the conflict between stability and plasticity at the architectural level. We reveal that under an equal parameter constraint, deeper networks exhibit better plasticity, while wider networks are characterized by superior stability. To address this architectural-level dilemma, we introduce a novel framework denoted Dual-Arch, which serves as a plug-in component for CL. This framework leverages the complementary strengths of two distinct and independent networks: one dedicated to plasticity and the other to stability. Each network is designed with a specialized and lightweight architecture, tailored to its respective objective. Extensive experiments demonstrate that Dual-Arch enhances the performance of existing CL methods while being up to 87% more compact in terms of parameters. Code: https://github.com/byyx666/Dual-Arch.
nan
Article 2008
Title@2025-06-05 (4): PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling
Title: PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling | PCDVQ: Verbesserung der Vector Quantization für große Sprachmodelle über Polar Coordinate Entkopplung | PCDVQ:通过极地协调脱钩,加强大语言模型的矢量量化 2506.05432v1 |
Authors (6): Yuxuan Yue, Zukang Xu, Zhihang Yuan, Dawei Yang, Jianglong Wu, Liqiang Nie
Large Language Models (LLMs) face significant challenges in edge deployment due to their massive parameter scale. Vector Quantization (VQ), a clustering-based quantization method, is a prevalent solution to this issue because it supports extremely low bit widths (even 2-bit) while retaining considerable accuracy. Since a vector is a quantity in mathematics and physics that has both direction and magnitude, existing VQ methods typically quantize the two in a coupled manner. However, we find that direction exhibits significantly greater sensitivity to quantization than magnitude. For instance, when separately clustering the directions and magnitudes of weight vectors in LLaMA-2-7B, the accuracy drops on zero-shot tasks are 46.5\% and 2.3\%, respectively. This gap increases further as the number of clustering centers is reduced. Further, Euclidean distance, a common metric to assess vector similarities in current VQ works, places greater emphasis on reducing the magnitude error. This property is contrary to the above finding, unavoidably leading to larger quantization errors. To this end, this paper proposes Polar Coordinate Decoupled Vector Quantization (PCDVQ), an effective and efficient VQ framework consisting of two key modules: 1) Polar Coordinate Decoupling (PCD), which transforms vectors into their polar coordinate representations and performs independent quantization of the direction and magnitude parameters; 2) Distribution Aligned Codebook Construction (DACC), which optimizes the direction and magnitude codebooks in accordance with the source distribution. Experimental results show that PCDVQ outperforms baseline methods at 2-bit level by at least 1.5\% zero-shot accuracy, establishing a novel paradigm for accurate and highly compressed LLMs.
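The decoupling idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the PCDVQ method itself: it represents direction by a unit vector rather than by polar angles, and both codebooks are hypothetical stand-ins (random unit vectors and empirical quantiles) rather than the distribution-aligned codebooks described above. The function names are assumptions as well.

```python
import numpy as np

def polar_decouple(w):
    """Split each row into a unit direction and a scalar magnitude.

    Assumes no row of w is the zero vector.
    """
    mag = np.linalg.norm(w, axis=1)
    return w / mag[:, None], mag

def quantize(w, dir_codebook, mag_codebook):
    """Quantize direction and magnitude independently."""
    direction, mag = polar_decouple(w)
    # Direction: nearest codeword by cosine similarity (codebook rows are unit-norm).
    dir_idx = np.argmax(direction @ dir_codebook.T, axis=1)
    # Magnitude: nearest scalar codeword.
    mag_idx = np.argmin(np.abs(mag[:, None] - mag_codebook[None, :]), axis=1)
    return dir_idx, mag_idx

def dequantize(dir_idx, mag_idx, dir_codebook, mag_codebook):
    """Rebuild vectors as (magnitude codeword) * (direction codeword)."""
    return dir_codebook[dir_idx] * mag_codebook[mag_idx][:, None]

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 8))                      # toy "weight vectors"

# Hypothetical codebooks, for illustration only.
dir_codebook = rng.normal(size=(64, 8))
dir_codebook /= np.linalg.norm(dir_codebook, axis=1, keepdims=True)
_, mags = polar_decouple(w)
mag_codebook = np.quantile(mags, np.linspace(0.05, 0.95, 4))

dir_idx, mag_idx = quantize(w, dir_codebook, mag_codebook)
w_hat = dequantize(dir_idx, mag_idx, dir_codebook, mag_codebook)
print("relative reconstruction error:",
      np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```

Keeping two separate index streams lets the bit budget be divided unevenly, for example spending more bits on the direction codebook, which the abstract reports to be the more quantization-sensitive component.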
nan
Article 2009
Title@2025-06-05 (4): OpenGT: A Comprehensive Benchmark For Graph Transformers
Title: OpenGT: A Comprehensive Benchmark For Graph Transformers | OpenGT: Ein umfassender Benchmark für Graph Transformer | OpenGT: 图表变形器综合基准 2506.04765v1 |
Authors (6): Jiachen Tang, Zhonghao Wang, Sirui Chen, Sheng Zhou, Jiawei Chen, Jiajun Bu
Graph Transformers (GTs) have recently demonstrated remarkable performance across diverse domains. By leveraging attention mechanisms, GTs are capable of modeling long-range dependencies and complex structural relationships beyond local neighborhoods. However, their applicable scenarios are still underexplored, which highlights the need to identify when and why they excel. Furthermore, unlike GNNs, which predominantly rely on message-passing mechanisms, GTs exhibit a diverse design space in areas such as positional encoding, attention mechanisms, and graph-specific adaptations. Yet, it remains unclear which of these design choices are truly effective and under what conditions. As a result, the community currently lacks a comprehensive benchmark and library to promote a deeper understanding and further development of GTs. To address this gap, this paper introduces OpenGT, a comprehensive benchmark for Graph Transformers. OpenGT enables fair comparisons and multidimensional analysis by establishing standardized experimental settings and incorporating a broad selection of state-of-the-art GNNs and GTs. Our benchmark evaluates GTs from multiple perspectives, encompassing diverse tasks and datasets with varying properties. Through extensive experiments, our benchmark has uncovered several critical insights, including the difficulty of transferring models across task levels, the limitations of local attention, the efficiency trade-offs in several models, the application scenarios of specific positional encodings, and the preprocessing overhead of some positional encodings. We aspire for this work to establish a foundation for future graph transformer research emphasizing fairness, reproducibility, and generalizability. We have developed an easy-to-use library OpenGT for training and evaluating existing GTs. The benchmark code is available at https://github.com/eaglelab-zju/OpenGT.
nan
Article 2010
Title@2025-06-05 (4): Log-Linear Attention
Title: Log-Linear Attention | Log-Linear-Achtung | 日志边注意 2506.04761v1 |
Authors (6): Han Guo, Songlin Yang, Tarushii Goel, Eric P. Xing, Tri Dao, Yoon Kim
The attention mechanism in Transformers is an important primitive for accurate and scalable sequence modeling. Its quadratic-compute and linear-memory complexity however remain significant bottlenecks. Linear attention and state-space models enable linear-time, constant-memory sequence modeling and can moreover be trained efficiently through matmul-rich parallelization across sequence length. However, at their core these models are still RNNs, and thus their use of a fixed-size hidden state to model the context is a fundamental limitation. This paper develops log-linear attention, an attention mechanism that balances linear attention’s efficiency and the expressiveness of softmax attention. Log-linear attention replaces the fixed-size hidden state with a logarithmically growing set of hidden states. We show that with a particular growth function, log-linear attention admits a similarly matmul-rich parallel form whose compute cost is log-linear in sequence length. Log-linear attention is a general framework and can be applied on top of existing linear attention variants. As case studies, we instantiate log-linear variants of two recent architectures – Mamba-2 and Gated DeltaNet – and find they perform well compared to their linear-time variants.
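To make the "logarithmically growing set of hidden states" concrete, the sketch below partitions the prefix of each position into power-of-two buckets read off from the binary representation of the position, in the style of a Fenwick tree. The abstract does not specify the growth function, so this partition, the function names, and the per-bucket weights are assumptions for illustration only. Each bucket is summarized by a linear-attention-style state (the sum of key-value outer products), so a query at position `t` reads at most about `log2(t) + 1` states.

```python
import numpy as np

def prefix_buckets(t):
    """Split positions [0, t) into power-of-two buckets, largest first.

    The number of buckets equals the number of set bits in t, which is
    at most floor(log2(t)) + 1.
    """
    buckets, start = [], 0
    for bit in reversed(range(t.bit_length())):
        size = 1 << bit
        if t & size:
            buckets.append((start, start + size))
            start += size
    return buckets

def read_out(q, K, V, t, weights=None):
    """Weighted sum over per-bucket states S_b = sum_i k_i v_i^T."""
    buckets = prefix_buckets(t)
    if weights is None:
        weights = np.ones(len(buckets))   # hypothetical per-bucket weights
    out = np.zeros(V.shape[1])
    for w, (s, e) in zip(weights, buckets):
        state = K[s:e].T @ V[s:e]          # (d_k, d_v) summary of one bucket
        out += w * (q @ state)
    return out

rng = np.random.default_rng(0)
K, V = rng.normal(size=(16, 4)), rng.normal(size=(16, 3))
q = rng.normal(size=4)

print(prefix_buckets(13))
print(read_out(q, K, V, t=13))
```

With all bucket weights equal to one, the read-out reduces to unnormalized linear attention over the whole prefix. Giving different weights to different bucket sizes is what would let such a scheme treat recent and distant context at different resolutions.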
nan
Article 2011
Title@2025-06-05 (4): Robustness Evaluation for Video Models with Reinforcement Learning
Title: Robustness Evaluation for Video Models with Reinforcement Learning | Robustheitsbewertung für Videomodelle mit Verstärkungslernen | 强化学习录像模型的强力评价 2506.05431v1 |
Authors (8): Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Antonio Guillen, Ricardo Luna Gutierrez, Soumyendu Sarkar
Evaluating the robustness of video classification models is very challenging, especially when compared to image-based models. The added temporal dimension significantly increases complexity and computational cost. One of the key challenges is to keep the perturbations to a minimum while still inducing misclassification. In this work, we propose a multi-agent reinforcement learning approach (spatial and temporal) that cooperatively learns to identify the given video’s sensitive spatial and temporal regions. The agents consider temporal coherence in generating fine perturbations, leading to a more effective and visually imperceptible attack. Our method outperforms the state-of-the-art solutions on the Lp metric and the average queries. Our method enables custom distortion types, making the robustness evaluation more relevant to the use case. We extensively evaluate 4 popular models for video action recognition on two popular datasets, HMDB-51 and UCF-101.
nan
Article 2012
Title@2025-06-05 (4): Multi-Layer GRPO: Enhancing Reasoning and Self-Correction in Large Language Models
Title: Multi-Layer GRPO: Enhancing Reasoning and Self-Correction in Large Language Models | Multi-Layer GRPO: Verbessern von Vernunft und Selbstkorrektion in großen Sprachmodellen | 多语言GROPO:加强大语言模式的理性和自我校正 2506.04746v1 |
Authors (4): Fei Ding, Baiqiao Wang, Zijian Zeng, Youwei Wang
The Group Relative Policy Optimization (GRPO) algorithm has demonstrated considerable success in enhancing the reasoning capabilities of large language models (LLMs), as evidenced by DeepSeek-R1. However, the absence of intermediate supervision in GRPO frequently leads to inefficient exploration dynamics. A single error in a complex reasoning chain can invalidate the entire solution, resulting in abrupt reward vanishing and compromising training stability. To address these challenges, we propose MGRPO (Multi-layer GRPO). MGRPO operates in two layers: the first layer employs standard GRPO to generate an initial response. This response, along with the original query, is then fed into a second-layer GRPO process. This second layer is specifically trained to identify and correct errors in the initial response, effectively creating a self-correction loop. This mechanism provides implicit process-level supervision by rewarding successful error correction, without requiring an explicit, densely-annotated reward model. Experimental results on several mathematical reasoning benchmarks demonstrate that MGRPO significantly outperforms standard GRPO, achieving superior performance by fostering both reasoning and self-correction abilities.
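The reward-vanishing problem mentioned above can be seen directly in the group-relative advantage that GRPO uses, where each response's reward is standardized against the other responses sampled for the same query. The sketch below is a generic illustration of that normalization, not the MGRPO training code; the function name, the `eps` stabilizer, and the example reward vectors are assumptions.

```python
import numpy as np

def group_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of sampled responses."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical outcome rewards for one query (1 = final answer correct).
mixed_group = [1.0, 0.0, 0.0, 1.0]
all_wrong_group = [0.0, 0.0, 0.0, 0.0]

print(group_advantages(mixed_group))       # non-zero learning signal
print(group_advantages(all_wrong_group))   # all zeros: no gradient signal
```

When every response in a group fails, for instance because each chain contains a single error, all advantages are zero and the update carries no information. A second layer that rewards successful correction of such responses gives those groups a chance to produce non-identical rewards again.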
nan
Article 2013
Title@2025-06-05 (4): Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models
Title: Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models | Text-zu-CAD-Generierung durch visuelles Feedback in großen Sprachmodellen | 通过将视觉反馈引入大语言模型,将生成的文字引入CAD 2501.19054v3 |
Authors (4): Ruiyu Wang, Yu Yuan, Shizhao Sun, Jiang Bian
Creating Computer-Aided Design (CAD) models requires significant expertise and effort. Text-to-CAD, which converts textual descriptions into CAD parametric sequences, is crucial in streamlining this process. Recent studies have utilized ground-truth parametric sequences, known as sequential signals, as supervision to achieve this goal. However, CAD models are inherently multimodal, comprising parametric sequences and corresponding rendered visual objects. Moreover, the rendering process from parametric sequences to visual objects is many-to-one. Therefore, both sequential and visual signals are critical for effective training. In this work, we introduce CADFusion, a framework that uses Large Language Models (LLMs) as the backbone and alternates between two training stages: the sequential learning (SL) stage and the visual feedback (VF) stage. In the SL stage, we train LLMs using ground-truth parametric sequences, enabling the generation of logically coherent parametric sequences. In the VF stage, we reward parametric sequences that render into visually preferred objects and penalize those that do not, allowing LLMs to learn how rendered visual objects are perceived and evaluated. These two stages alternate throughout the training, ensuring balanced learning and preserving benefits of both signals. Experiments demonstrate that CADFusion significantly improves performance, both qualitatively and quantitatively.
nan
Article 2014
Title@2025-06-05 (4): Focus On This, Not That! Steering LLMs with Adaptive Feature Specification
Title: Focus On This, Not That! Steering LLMs with Adaptive Feature Specification | Fokus auf diese, nicht das! Lenkung LLMs mit adaptiven Feature Specification | 聚焦于此, 不是那个! 指导性带适应性特征规格的LLMS 2410.22944v4 |
Authors (5): Tom A. Lamb, Adam Davies, Alasdair Paren, Philip H. S. Torr, Francesco Pinto
Despite the success of Instruction Tuning (IT) in training large language models (LLMs), such models often leverage spurious or biased features learnt from their training data and can become misaligned, leading to undesired behaviours. While existing techniques can steer model behaviour at inference-time, they are often post-hoc and do not embed steering as an intrinsic model feature. In this work, we introduce Focus Instruction Tuning (FIT), which trains LLMs to condition their responses by focusing on specific features whilst ignoring others, leading to different behaviours based on what features are specified. Across diverse benchmarks, we demonstrate that FIT: (i) successfully steers behaviour at inference time; (ii) increases robustness by amplifying core task signals and down-weighting spurious cues; (iii) mitigates social bias by suppressing demographic attributes; and (iv) generalises under distribution shifts and to previously unseen focus features. FIT therefore offers a lightweight, intrinsic mechanism for building more robust, fair, and easily controllable LLMs.
nan
Article 2015
Title@2025-06-05 (4): Model-based Neural Data Augmentation for sub-wavelength Radio Localization
Title: Model-based Neural Data Augmentation for sub-wavelength Radio Localization | Modellbasierte Neural Data Augmentation für die Subwellenlängen-Radiolokalisierung | 以模型为基础的亚波长无线电无线电下波长无线电神经数据扩增本地化模型 2506.06387v1 |
Authors (6): Baptiste Chatelier, Vincent Corlay, Musa Furkan Keskin, Matthieu Crussière, Henk Wymeersch, Luc Le Magoarou
The increasing deployment of large antenna arrays at base stations has significantly improved the spatial resolution and localization accuracy of radio-localization methods. However, traditional signal processing techniques struggle in complex radio environments, particularly in scenarios dominated by non line of sight (NLoS) propagation paths, resulting in degraded localization accuracy. Recent developments in machine learning have facilitated the development of machine learning-assisted localization techniques, enhancing localization accuracy in complex radio environments. However, these methods often involve substantial computational complexity during both the training and inference phases. This work extends the well-established fingerprinting-based localization framework by simultaneously reducing its memory requirements and improving its accuracy. Specifically, a model-based neural network is used to learn the location-to-channel mapping, and then serves as a generative neural channel model. This generative model augments the fingerprinting comparison dictionary while reducing the memory requirements. The proposed method outperforms fingerprinting baselines by achieving sub-wavelength localization accuracy, even in NLoS environments. Remarkably, it offers an improvement by several orders of magnitude in localization accuracy, while simultaneously reducing memory requirements by an order of magnitude compared to classical fingerprinting methods.
nan
Article 2016
Title@2025-06-05 (4): The Role of Diversity in In-Context Learning for Large Language Models
Title: The Role of Diversity in In-Context Learning for Large Language Models | Die Rolle der Vielfalt im In-Context-Lernen für große Sprachmodelle | 多样性在为大语言模式进行内文学习方面的作用 2505.19426v2 |
Authors (3): Wenyang Xiao, Haoyu Zhao, Lingxiao Huang
In-context learning (ICL) is a crucial capability of current large language models (LLMs), where the selection of examples plays a key role in performance. While most existing approaches focus on selecting the most similar examples to the query, the impact of diversity in example selection remains underexplored. We systematically investigate the role of diversity in in-context example selection through experiments across a range of tasks, from sentiment classification to more challenging math and code problems. Experiments on Llama-3.1, Gemma-2, and Mistral-v0.3 families of models show that diversity-aware selection methods improve performance, particularly on complex tasks like math and code, and enhance robustness to out-of-distribution queries. To support these findings, we introduce a theoretical framework that explains the benefits of incorporating diversity in in-context example selection.
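One common way to trade similarity against diversity when choosing in-context examples is greedy maximal marginal relevance (MMR). The sketch below is a generic illustration of that idea and is not claimed to be one of the selection methods evaluated in the paper; the function `mmr_select`, the trade-off parameter `lam`, and the toy two-dimensional embeddings are all assumptions.

```python
import numpy as np

def mmr_select(query, candidates, k, lam=0.5):
    """Greedily pick k candidate indices, balancing relevance and redundancy.

    Score = lam * cos(candidate, query)
            - (1 - lam) * max cos(candidate, already selected).
    """
    def unit(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q = unit(np.asarray(query, dtype=float))
    c = unit(np.asarray(candidates, dtype=float))
    relevance = c @ q
    selected, remaining = [], list(range(len(c)))
    while remaining and len(selected) < k:
        if selected:
            redundancy = (c[remaining] @ c[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        scores = lam * relevance[remaining] - (1 - lam) * redundancy
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy embeddings: two near-duplicate examples and one distinct example.
query = [1.0, 0.0]
candidates = [[1.0, 0.05], [1.0, 0.06], [0.8, -0.6]]

print(mmr_select(query, candidates, k=2, lam=1.0))  # similarity only
print(mmr_select(query, candidates, k=2, lam=0.5))  # diversity-aware
```

Setting `lam=1.0` recovers pure nearest-neighbor selection, which here picks the two near-duplicates. Lowering `lam` penalizes candidates that resemble examples already chosen, so the second slot goes to the distinct example instead.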
nan
Article 2017
Title@2025-06-05 (4): SPHINX: Structural Prediction using Hypergraph Inference Network
Title: SPHINX: Structural Prediction using Hypergraph Inference Network | SPHINX: Structural Prediction mit Hypergraph Inference Network | SPHINX:利用高光推断网络进行结构预测 2410.03208v2 |
Authors (2): Iulia Duta, Pietro Liò
The importance of higher-order relations is widely recognized in a large number of real-world systems. However, annotating them is a tedious and sometimes impossible task. Consequently, current approaches for data modelling either ignore the higher-order interactions altogether or simplify them into pairwise connections. In order to facilitate higher-order processing, even when a hypergraph structure is not available, we introduce Structural Prediction using Hypergraph Inference Network (SPHINX), a model that learns to infer a latent hypergraph structure in an unsupervised way, solely from the final node-level signal. The model consists of a soft, differentiable clustering method used to sequentially predict, for each hyperedge, the probability distribution over the nodes and a sampling algorithm that converts them into an explicit hypergraph structure. We show that the recent advancement in $k$-subset sampling represents a suitable tool for producing discrete hypergraph structures, addressing some of the training instabilities exhibited by prior works. The resulting model can generate the higher-order structure necessary for any modern hypergraph neural network, facilitating the capture of higher-order interaction in domains where annotating them is difficult. Through extensive ablation studies and experiments conducted on two challenging datasets for trajectory prediction, we demonstrate that our model is capable of inferring suitable latent hypergraphs, that are interpretable and enhance the final performance.
nan
Article 2018
Title@2025-06-05 (4): Automated Modeling Method for Pathloss Model Discovery
Title: Automated Modeling Method for Pathloss Model Discovery | Automatisierte Modellierungsmethode für Pathloss Model Discovery | 病理模型发现自动建模方法 2505.23383v2 |
Authors (5): Ahmad Anaqreh, Shih-Kai Chou, Mihael Mohorčič, Thomas Lagkas, Carolina Fortuna
Modeling propagation is the cornerstone for designing and optimizing next-generation wireless systems, particularly in the 5G-and-beyond era. Traditional modeling methods have long relied on statistics-based techniques to characterize propagation behavior across different environments. With the expansion of wireless communication systems, there is a growing demand for methods that guarantee the accuracy and interpretability of modeling. Artificial intelligence (AI)-based techniques, in particular, are increasingly being adopted to overcome this challenge, although interpretability is not assured with most of these methods. Inspired by recent advancements in AI, this paper proposes a novel approach that accelerates the discovery of path loss models while maintaining interpretability. The proposed method automates the formulation, evaluation, and refinement of the model, facilitating the discovery of the model. We examine two techniques: one based on Deep Symbolic Regression, offering full interpretability, and the second based on Kolmogorov-Arnold Networks, providing two levels of interpretability. Both approaches are evaluated on two synthetic and two real-world datasets. Our results show that Kolmogorov-Arnold Networks achieve a coefficient of determination $R^2$ close to 1 with minimal prediction error, while Deep Symbolic Regression generates compact models with moderate accuracy. Moreover, on the selected examples, we demonstrate that automated methods outperform traditional methods, achieving up to 75% reduction in prediction errors, offering accurate and explainable solutions with potential to increase the efficiency of discovering next-generation path loss models.
nan
Article 2019
Title@2025-06-05 (4): Scaling Trends in Language Model Robustness
Title: Scaling Trends in Language Model Robustness | Skalierungstrends in der Robustheit von Sprachmodellen | 语言模型强度的增缩趋势 2407.18213v5 |
Authors (8): Nikolaus Howe, Ian McKenzie, Oskar Hollinsworth, Michał Zajac, Tom Tseng, Aaron Tucker, Pierre-Luc Bacon, Adam Gleave
Increasing model size has unlocked a dazzling array of capabilities in modern language models. At the same time, even frontier models remain vulnerable to jailbreaks and prompt injections, despite concerted efforts to make them robust. As both attack and defense gain access to more compute, and as models become larger, what happens to robustness? We argue that answering this question requires a \emph{scaling} approach, which we employ in an extensive study of language model robustness across several classification tasks, model families, and adversarial attacks. We find that in the absence of explicit safety training, larger models are not consistently more robust; however, scale improves sample efficiency in adversarial training, though it worsens compute efficiency. Further, we find that increasing attack compute smoothly improves attack success rate against both undefended and adversarially trained models. Finally, after exploring robustness transfer across attacks and threat models, we combine attack and defense scaling rates to study the offense-defense balance. We find that while attack scaling outpaces adversarial training across all models studied, larger adversarially trained models might give defense the advantage in the long run. These results underscore the utility of the scaling lens, and provide a paradigm for evaluating future attacks and defenses on frontier models.
nan
Article 2020
Title@2025-06-05 (4): $μ$nit Scaling: Simple and Scalable FP8 LLM Training
Title: $μ$nit Scaling: Simple and Scalable FP8 LLM Training | $μ$nit Scaling: Einfache und skalierbare FP8 LLM-Schulung | $m$nit 缩放:简单和可缩放的FP8LLM培训 2502.05967v3 |
Authors (4): Saaketh Narayan, Abhay Gupta, Mansheej Paul, Davis Blalock
Large Language Model training with 8-bit floating point (FP8) formats promises significant efficiency improvements, but reduced numerical precision makes training challenging. It is currently possible to train in FP8 only if one is willing to tune various hyperparameters, reduce model scale, or accept the overhead of computing dynamic scale factors. We demonstrate simple, scalable FP8 training that requires no dynamic scaling factors or special hyperparameters, even at large model sizes. Our method, $\mu$nit Scaling ($\mu$S), also enables simple hyperparameter transfer across model widths, matched numerics across training and inference, and other desirable properties. $\mu$nit Scaling is straightforward to implement, consisting of a set of minimal interventions based on a first-principles analysis of common transformer operations. We validate our method by training models from 1B to 13B parameters, performing all hidden linear layer computations in FP8. We achieve quality equal to higher precision baselines while also training up to 33% faster.
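The motivation for static, width-independent scaling can be shown with a small NumPy experiment. This is only a sketch of the general unit-scaling idea (unit-variance weights combined with a fixed `1/sqrt(fan_in)` output scale) and is an assumption about the flavor of intervention involved, not the paper's exact recipe. It runs in ordinary floating point and does not emulate FP8, and the function name and sizes are hypothetical.

```python
import numpy as np

def linear_output_std(width, scaled, n_tokens=512, n_out=64, seed=0):
    """Std of x @ W for unit-variance x and W, with or without static scaling."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_tokens, width))   # unit-variance activations
    w = rng.normal(size=(width, n_out))      # unit-variance weights
    y = x @ w
    if scaled:
        y = y / np.sqrt(width)               # fixed scale, known before training
    return float(y.std())

for width in (256, 1024, 4096):
    print(width,
          "unscaled std:", round(linear_output_std(width, scaled=False), 2),
          "scaled std:", round(linear_output_std(width, scaled=True), 2))
```

Without the scale factor the output standard deviation grows like `sqrt(width)`, so a narrow-range format would need per-tensor dynamic scale factors to avoid overflow. With the fixed factor the outputs stay near unit variance at every width, which is consistent with the abstract's claims of no dynamic scaling and simple hyperparameter transfer across widths.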
Article 2021
Title@2025-06-05 (4): Coordinated Robustness Evaluation Framework for Vision-Language Models
Title: Coordinated Robustness Evaluation Framework for Vision-Language Models | Koordinierte Robustheitsbewertungsrahmen für Vision-Language-Modelle | 愿景-语言模型协调强力评价框架 2506.05429v1 |
Authors (8): Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Antonio Guillen, Ricardo Luna Gutierrez, Soumyendu Sarkar
Vision-language models, which integrate computer vision and natural language processing capabilities, have demonstrated significant advancements in tasks such as image captioning and visual question answering. However, similar to traditional models, they are susceptible to small perturbations, posing a challenge to their robustness, particularly in deployment scenarios. Evaluating the robustness of these models requires perturbations in both the vision and language modalities to learn their inter-modal dependencies. In this work, we train a generic surrogate model that can take both image and text as input and generate a joint representation, which is further used to generate adversarial perturbations for both the text and image modalities. This coordinated attack strategy is evaluated on visual question answering and visual reasoning datasets using various state-of-the-art vision-language models. Our results indicate that the proposed strategy outperforms other multi-modal attacks and single-modality attacks from the recent literature, and demonstrate its effectiveness in compromising the robustness of several state-of-the-art pre-trained multi-modal models such as instruct-BLIP, ViLT and others.
Article 2022
Title@2025-06-05 (4): Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks
Title: Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks | Regularisierte KL-Divergenz für gut definierte Funktions-Raum-Varianten-Schlussfolgerung in Bayesischen neuronalen Netzwerken | Bayesian神经网络中定义精准的功能-空间变动推论的正规化 KL-KL-Diverence 2406.04317v3 |
Authors (2): Tristan Cinquin, Robert Bamler
Bayesian neural networks (BNN) promise to combine the predictive performance of neural networks with principled uncertainty modeling important for safety-critical systems and decision making. However, posterior uncertainty estimates depend on the choice of prior, and finding informative priors in weight-space has proven difficult. This has motivated variational inference (VI) methods that pose priors directly on the function generated by the BNN rather than on weights. In this paper, we address a fundamental issue with such function-space VI approaches pointed out by Burt et al. (2020), who showed that the objective function (ELBO) is negative infinite for most priors of interest. Our solution builds on generalized VI (Knoblauch et al., 2019) with the regularized KL divergence (Quang, 2019) and is, to the best of our knowledge, the first well-defined variational objective for function-space inference in BNNs with Gaussian process (GP) priors. Experiments show that our method incorporates the properties specified by the GP prior on synthetic and small real-world data sets, and provides competitive uncertainty estimates for regression, classification and out-of-distribution detection compared to BNN baselines with both function and weight-space priors.
Article 2023
Title@2025-06-05 (4): Adapt before Continual Learning
Title: Adapt before Continual Learning | Anpassung vor dem kontinuierlichen Lernen | 在持续学习前适应 2506.03956v2 |
Authors (5): Aojun Lu, Tao Feng, Hangjie Yuan, Chunhui Ding, Yanan Sun
Continual Learning (CL) seeks to enable neural networks to incrementally acquire new knowledge (plasticity) while retaining existing knowledge (stability). While pre-trained models (PTMs) have become pivotal in CL, prevailing approaches freeze the PTM backbone to preserve stability, limiting their plasticity, particularly when encountering significant domain gaps in incremental tasks. Conversely, sequentially finetuning the entire PTM risks catastrophic forgetting of generalizable knowledge, exposing a critical stability-plasticity trade-off. To address this challenge, we propose Adapting PTMs before the core CL process (ACL), a novel framework that refines the PTM backbone through a plug-and-play adaptation phase before learning each new task with existing CL approaches (e.g., prompt tuning). ACL enhances plasticity by aligning embeddings with their original class prototypes while distancing them from others, theoretically and empirically shown to balance stability and plasticity. Extensive experiments demonstrate that ACL significantly improves CL performance across benchmarks and integrated methods, offering a versatile solution for PTM-based CL. Code is available at https://github.com/byyx666/ACL_code.
Article 2024
Title@2025-06-05 (4): FlowDAS: A Stochastic Interpolant-based Framework for Data Assimilation
Title: FlowDAS: A Stochastic Interpolant-based Framework for Data Assimilation | FlowDAS: Ein stochastisches Interpolant-basiertes Framework für die Datenassimilierung | FlowDAS:一个基于数据同化的存储式内插框架 2501.16642v2 |
Authors (5): Siyi Chen, Yixuan Jia, Qing Qu, He Sun, Jeffrey A Fessler
Data assimilation (DA) integrates observations with a dynamical model to estimate states of PDE-governed systems. Model-driven methods (e.g., Kalman, particle) presuppose full knowledge of the true dynamics, which is not always satisfied in practice, while purely data-driven solvers learn a deterministic mapping between observations and states and therefore miss the intrinsic stochasticity of real processes. Recent score-based diffusion models learn a global diffusion prior and model the stochastic dynamics well, showing new potential for DA. However, their all-at-once generation, rather than step-by-step transition, limits their performance on highly complex stochastic processes and lacks physical interpretability. To tackle these drawbacks, we introduce FlowDAS, a generative DA framework that uses stochastic interpolants to directly learn state transition dynamics and achieve step-by-step transition to better model the real dynamics. We also incorporate observations into the framework, which better suits the DA setting. Directly learning the underlying dynamics from collected data removes restrictive dynamical assumptions, and conditioning on observations at each interpolation step yields stable, measurement-consistent forecasts. Experiments on Lorenz-63, Navier-Stokes super-resolution/sparse-observation scenarios, and large-scale weather forecasting – where dynamics are partly or wholly unknown – show that FlowDAS surpasses model-driven methods, neural operators, and score-based baselines in accuracy and physical plausibility.
Article 2025
Title@2025-06-05 (4): Wyckoff Transformer: Generation of Symmetric Crystals
Title: Wyckoff Transformer: Generation of Symmetric Crystals | Wyckoff Transformer: Erzeugung symmetrischer Kristalle | Wyckoff 变形器: 生成对称晶体 2503.02407v4 |
Authors (7): Nikita Kazeev, Wei Nong, Ignat Romanov, Ruiming Zhu, Andrey Ustyuzhanin, Shuya Yamazaki, Kedar Hippalgaonkar
A crystal's symmetry plays a fundamental role in determining its physical, chemical, and electronic properties such as electrical and thermal conductivity, optical and polarization behavior, and mechanical strength. Almost all known crystalline materials have internal symmetry. However, this is often inadequately addressed by existing generative models, making the consistent generation of stable and symmetrically valid crystal structures a significant challenge. We introduce WyFormer, a generative model that directly tackles this by formally conditioning on space group symmetry. It achieves this by using Wyckoff positions as the basis for an elegant, compressed, and discrete structure representation. To model the distribution, we develop a permutation-invariant autoregressive model based on a Transformer encoder without positional encoding. Extensive experimentation demonstrates WyFormer’s compelling combination of attributes: it achieves best-in-class symmetry-conditioned generation, incorporates a physics-motivated inductive bias, produces structures with competitive stability, predicts material properties with competitive accuracy even without atomic coordinates, and exhibits unparalleled inference speed.
Article 2026
Title@2025-06-05 (4): Using In-Context Learning for Automatic Defect Labelling of Display Manufacturing Data
Title: Using In-Context Learning for Automatic Defect Labelling of Display Manufacturing Data | Verwendung von In-Context-Learning zur automatischen Defekt-Etikettierung von Display-Produktionsdaten | 使用内文学习自动去除显示器制造数据标签 2506.04717v1 |
Authors (5): Babar Hussain, Qiang Liu, Gang Chen, Bihai She, Dahai Yu
This paper presents an AI-assisted auto-labeling system for display panel defect detection that leverages in-context learning capabilities. We adopt and enhance the SegGPT architecture with several domain-specific training techniques and introduce a scribble-based annotation mechanism to streamline the labeling process. Our two-stage training approach, validated on industrial display panel datasets, demonstrates significant improvements over the baseline model, achieving an average IoU increase of 0.22 and a 14% improvement in recall across multiple product types, while maintaining approximately 60% auto-labeling coverage. Experimental results show that models trained on our auto-labeled data match the performance of those trained on human-labeled data, offering a practical solution for reducing manual annotation efforts in industrial inspection systems.
Article 2027
Title@2025-06-05 (4): Infinitesimal Higher-Order Spectral Variations in Rectangular Real Random Matrices
Title: Infinitesimal Higher-Order Spectral Variations in Rectangular Real Random Matrices | Infinitesimale Spektralvariationen höherer Ordnung in rektangulären realen Zufallsmatrizen | 矩形真实随机矩阵中无尽的高端天文光谱变化 2506.03764v2 |
Authors (1): Róisín Luo
We present a theoretical framework for deriving the general $n$-th order Fréchet derivatives of singular values in real rectangular matrices, by leveraging reduced resolvent operators from Kato’s analytic perturbation theory for self-adjoint operators. Deriving closed-form expressions for higher-order derivatives of singular values is notoriously challenging through standard matrix-analysis techniques. To overcome this, we treat a real rectangular matrix as a compact operator on a finite-dimensional Hilbert space, and embed the rectangular matrix into a block self-adjoint operator so that non-symmetric perturbations are captured. Applying Kato’s asymptotic eigenvalue expansion to this construction, we obtain a general, closed-form expression for the infinitesimal $n$-th order spectral variations. Specializing to $n=2$ and deploying on a Kronecker-product representation with matrix convention yield the Hessian of a singular value, which is not found in the literature. By bridging abstract operator-theoretic perturbation theory with matrices, our framework equips researchers with a practical toolkit for higher-order spectral sensitivity studies in random matrix applications (e.g., adversarial perturbation in deep learning).
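The abstract does not spell out the block self-adjoint embedding. A standard construction with this property, assumed here for illustration, is the Jordan-Wielandt matrix. For $A \in \mathbb{R}^{m \times n}$ with singular triplets $(\sigma_i, u_i, v_i)$, that is $A v_i = \sigma_i u_i$ and $A^{\top} u_i = \sigma_i v_i$:

```latex
\mathcal{A} \;=\; \begin{pmatrix} 0 & A \\ A^{\top} & 0 \end{pmatrix}
\in \mathbb{R}^{(m+n)\times(m+n)},
\qquad
\mathcal{A} \begin{pmatrix} u_i \\ \pm v_i \end{pmatrix}
= \begin{pmatrix} \pm A v_i \\ A^{\top} u_i \end{pmatrix}
= \pm\,\sigma_i \begin{pmatrix} u_i \\ \pm v_i \end{pmatrix}.
```

The symmetric matrix $\mathcal{A}$ therefore has eigenvalues $\pm\sigma_i$ (plus $|m-n|$ zeros), which is why eigenvalue perturbation theory for self-adjoint operators can be brought to bear on singular values.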
Article 2028
Title@2025-06-05 (4): Bandit Multiclass List Classification
Title: Bandit Multiclass List Classification | Bandit-Listenklassifikation für mehrere Klassen | 强盗多级名单分类 2502.09257v2 |
Authors (2): Liad Erez, Tomer Koren
We study the problem of multiclass list classification with (semi-)bandit feedback, where input examples are mapped into subsets of size $m$ of a collection of $K$ possible labels. In each round of the interaction, the learner observes feedback consisting of the predicted labels which lie in some underlying set of ground truth labels associated with the given example. Our main result is for the $(\varepsilon,\delta)$-PAC variant of the problem for which we design an algorithm that returns an $\varepsilon$-optimal hypothesis with high probability using a sample complexity of $\widetilde{O} \big( (\mathrm{poly}(K/m) + sm / \varepsilon^2) \log ( | H | /\delta) \big)$ where $H$ is the underlying (finite) hypothesis class and $s$ is an upper bound on the number of true labels for a given example. This bound improves upon known bounds for combinatorial semi-bandits whenever $s \ll K$. Moreover, in the regime where $s = O(1)$ the leading terms in our bound match the corresponding full-information rates, implying that bandit feedback essentially comes at no cost. Our PAC learning algorithm is also computationally efficient given access to an ERM oracle for $H$. In the special case of single-label classification corresponding to $s=m=1$, we prove a sample complexity bound of $O \big((K^7 + 1/\varepsilon^2)\log ( | H | /\delta)\big)$ which improves upon recent results in this scenario (Erez et al. '24). Additionally, we consider the regret minimization setting where data can be generated adversarially, and establish a regret bound of $\widetilde O( | H | + \sqrt{smT \log | H | })$. Our results generalize and extend prior work in the simpler single-label setting (Erez et al. '24), and apply more generally to contextual combinatorial semi-bandit problems with $s$-sparse rewards.
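The feedback model in the second sentence can be made concrete with a few lines of Python. This is only a sketch of the interaction protocol as the abstract describes it, not of the learning algorithm; the function name `semi_bandit_feedback` and the example values are hypothetical.

```python
def semi_bandit_feedback(predicted, true_labels):
    """Return the predicted labels that are correct.

    The learner sees only this intersection; true labels it did not
    predict stay hidden.
    """
    return set(predicted) & set(true_labels)

K, m = 10, 3                  # K possible labels, lists of size m
true_labels = {2, 7}          # s = 2 ground-truth labels, unknown to the learner
predicted = [1, 2, 5]         # the learner's list for this round

feedback = semi_bandit_feedback(predicted, true_labels)
print(feedback)               # {2}: label 7 is correct but was not predicted, so it is not revealed
```

Under full-information feedback the learner would see all of `true_labels`; the abstract's claim is that, for $s = O(1)$, losing that extra information does not change the leading terms of the sample complexity.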
Article 2029
Title@2025-06-05 (4): UNO: Unlearning via Orthogonalization in Generative models
Title: UNO: Unlearning via Orthogonalization in Generative models | UNO: Entlernen über Orthogonalisierung in generativen Modellen | UNO:在创制模型中,通过正正孔化重新学习 2506.04712v1 |
Authors (2): Pinak Mandal, Georg A. Gottwald
As generative models become increasingly powerful and pervasive, the ability to unlearn specific data, whether due to privacy concerns, legal requirements, or the correction of harmful content, has become increasingly important. Unlike in conventional training, where data are accumulated and knowledge is reinforced, unlearning aims to selectively remove the influence of particular data points without costly retraining from scratch. To be effective and reliable, such algorithms need to achieve (i) forgetting of the undesired data, (ii) preservation of the quality of the generation, (iii) preservation of the influence of the desired training data on the model parameters, and (iv) a small number of training steps. We propose fast unlearning algorithms based on loss gradient orthogonalization. We show that our algorithms are able to forget data while maintaining the fidelity of the original model. Using MNIST and CelebA data, we demonstrate that our algorithms achieve orders of magnitude faster unlearning times than their predecessors, such as gradient surgery.
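The abstract does not give the update rule. As an assumption about what loss gradient orthogonalization could look like in its simplest form, the sketch below removes from the forget-set gradient its component along the retain-set gradient, so that a first-order step along the result leaves the retain loss unchanged. The names `orthogonalize`, `g_forget` and `g_retain` are hypothetical, and the vectors are toy values rather than real model gradients.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(g_forget, g_retain, eps=1e-12):
    """Project g_forget onto the orthogonal complement of g_retain."""
    denom = dot(g_retain, g_retain)
    if denom < eps:           # retain gradient is ~zero: nothing to project out
        return list(g_forget)
    coeff = dot(g_forget, g_retain) / denom
    return [gf - coeff * gr for gf, gr in zip(g_forget, g_retain)]

g_forget = [1.0, 2.0, 3.0]    # gradient of the loss on data to be forgotten
g_retain = [1.0, 0.0, 1.0]    # gradient of the loss on data to be kept

step = orthogonalize(g_forget, g_retain)
print(step)                   # [-1.0, 2.0, 1.0]
print(dot(step, g_retain))    # 0.0: no first-order change in the retain loss
```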
Article 2030
Title@2025-06-05 (4): Explicit Density Approximation for Neural Implicit Samplers Using a Bernstein-Based Convex Divergence
Title: Explicit Density Approximation for Neural Implicit Samplers Using a Bernstein-Based Convex Divergence | Explizite Dichte-Annäherung für neurale Implizite Sampler mit einer Bernstein-basierten Konvex-Divergenz | 使用伯恩斯坦-以伯尔斯坦为基础的电汇变异的神经隐性采样器的显性密度近似 2506.04700v1 |
Authors (4): José Manuel de Frutos, Manuel A. Vázquez, Pablo M. Olmos, Joaquín Míguez
Rank-based statistical metrics, such as the invariant statistical loss (ISL), have recently emerged as robust and practically effective tools for training implicit generative models. In this work, we introduce dual-ISL, a novel likelihood-free objective for training implicit generative models that interchanges the roles of the target and model distributions in the ISL framework, yielding a convex optimization problem in the space of model densities. We prove that the resulting rank-based discrepancy $d_K$ is i) continuous under weak convergence and with respect to the $L^1$ norm, and ii) convex in its first argument, properties not shared by classical divergences such as KL or Wasserstein distances. Building on this, we develop a theoretical framework that interprets $d_K$ as an $L^2$-projection of the density ratio $q = p/\tilde p$ onto a Bernstein polynomial basis, from which we derive exact bounds on the truncation error, precise convergence rates, and a closed-form expression for the truncated density approximation. We further extend our analysis to the multivariate setting via random one-dimensional projections, defining a sliced dual-ISL divergence that retains both convexity and continuity. We empirically show that these theoretical advantages translate into practical ones. Specifically, across several benchmarks dual-ISL converges more rapidly, delivers markedly smoother and more stable training, and more effectively prevents mode collapse than classical ISL and other leading implicit generative methods, while also providing an explicit density approximation.
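The rank statistic underlying ISL-style losses can be illustrated directly: for each real observation, count how many of $K$ simulated samples fall below it. When the model and target distributions coincide (and are continuous), that count is uniform on $\{0, \dots, K\}$; a mismatch skews the histogram. The sketch below uses Gaussians as stand-ins and hypothetical function names. It shows the rank statistic only, not the dual-ISL objective or its Bernstein-basis analysis.

```python
import random

def rank_histogram(real_samples, sample_model, K, rng):
    """For each real sample, count how many of K model samples lie below it."""
    counts = [0] * (K + 1)
    for y in real_samples:
        fakes = [sample_model(rng) for _ in range(K)]
        rank = sum(1 for f in fakes if f < y)
        counts[rank] += 1
    return counts

def max_dev_from_uniform(counts):
    """Largest gap between an observed rank frequency and the uniform value."""
    total = sum(counts)
    return max(abs(c / total - 1 / len(counts)) for c in counts)

rng = random.Random(0)
K, n = 4, 5000
real = [rng.gauss(0.0, 1.0) for _ in range(n)]

matched = rank_histogram(real, lambda r: r.gauss(0.0, 1.0), K, rng)
shifted = rank_histogram(real, lambda r: r.gauss(1.5, 1.0), K, rng)

print(matched, round(max_dev_from_uniform(matched), 3))  # roughly flat histogram
print(shifted, round(max_dev_from_uniform(shifted), 3))  # mass piles up at rank 0
```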
Article 2031
Title@2025-06-05 (4): Enhanced Drought Analysis in Bangladesh: A Machine Learning Approach for Severity Classification Using Satellite Data
Title: Enhanced Drought Analysis in Bangladesh: A Machine Learning Approach for Severity Classification Using Satellite Data | Verbesserte Trockenheitsanalyse in Bangladesch: Ein maschineller Lernansatz für die Severity-Klassifikation mittels Satellitendaten | 孟加拉国加强干旱分析:利用卫星数据进行重力分类的机械学习方法 2506.04696v1 |
Authors (3): Tonmoy Paul, Mrittika Devi Mati, Md. Mahmudul Islam
Drought poses a pervasive environmental challenge in Bangladesh, impacting agriculture, socio-economic stability, and food security due to its unique geographic and anthropogenic vulnerabilities. Traditional drought indices, such as the Standardized Precipitation Index (SPI) and Palmer Drought Severity Index (PDSI), often overlook crucial factors like soil moisture and temperature, limiting their resolution. Moreover, current machine learning models applied to drought prediction have been underexplored in the context of Bangladesh, lacking a comprehensive integration of satellite data across multiple districts. To address these gaps, we propose a satellite data-driven machine learning framework to classify drought across 38 districts of Bangladesh. Using unsupervised algorithms like K-means and Bayesian Gaussian Mixture for clustering, followed by classification models such as KNN, Random Forest, Decision Tree, and Naive Bayes, the framework integrates weather data (humidity, soil moisture, temperature) from 2012 to 2024. This approach successfully classifies drought severity into different levels. It also reveals significant variability in drought vulnerability across regions, which highlights the aptitude of machine learning models for identifying and predicting drought conditions.
Article 2032
Title@2025-06-05 (4): On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models
Title: On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models | Über den Mechanismus der bewussten Musterauswahl im Ausbau des Lernens für Sprachmodelle | 语文模式强化学习中选择理由选择模式机制 2506.04695v1 |
Authors (3): Xingwu Chen, Tianle Li, Difan Zou
Reinforcement learning (RL) has demonstrated remarkable success in enhancing model capabilities, including instruction-following, preference learning, and reasoning. Yet despite its empirical successes, the mechanisms by which RL improves reasoning abilities remain poorly understood. We present a systematic study of Reinforcement Learning with Verifiable Rewards (RLVR), showing that its primary benefit comes from optimizing the selection of existing reasoning patterns. Through extensive experiments, we demonstrate that RLVR-trained models preferentially adopt high-success-rate reasoning patterns while mostly maintaining stable performance on individual patterns. We further develop theoretical analyses on the convergence and training dynamics of RLVR based on a simplified question-reason-answer model. We study the gradient flow and show that RLVR can indeed find the solution that selects the reasoning pattern with the highest success rate. In addition, our theoretical results reveal two distinct regimes regarding the convergence of RLVR training: (1) rapid convergence for models with relatively strong initial reasoning capabilities versus (2) slower optimization dynamics for weaker models. Furthermore, we show that the slower optimization for weaker models can be mitigated by applying supervised fine-tuning (SFT) before RLVR, when using a feasibly high-quality SFT dataset. We validate the theoretical findings through extensive experiments. This work advances our theoretical understanding of RL’s role in LLM fine-tuning and offers insights for further enhancing reasoning capabilities.
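The two convergence regimes can be reproduced in a toy stand-in for the question-reason-answer model, assumed here purely for illustration: a softmax policy over a handful of reasoning patterns, each with a fixed success rate, trained by exact gradient ascent on expected reward. The function names, success rates, learning rate and initial logits are all hypothetical choices, not values from the paper.

```python
import math

def softmax(logits):
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def steps_to_select_best(logits, success_rates, lr=1.0, target=0.9, max_steps=100_000):
    """Count gradient-ascent steps until the best pattern has probability >= target."""
    logits = list(logits)
    best = max(range(len(success_rates)), key=lambda i: success_rates[i])
    for step in range(max_steps):
        probs = softmax(logits)
        if probs[best] >= target:
            return step
        expected = sum(p * r for p, r in zip(probs, success_rates))
        # Exact gradient of expected reward w.r.t. logit i is p_i * (r_i - expected).
        logits = [l + lr * p * (r - expected)
                  for l, p, r in zip(logits, probs, success_rates)]
    return max_steps

success_rates = [0.2, 0.5, 0.8]   # per-pattern probability of a correct answer
strong_init = [0.0, 0.0, 1.0]     # already leans toward the best pattern
weak_init = [3.0, 0.0, -3.0]      # starts out favouring the worst pattern

strong_steps = steps_to_select_best(strong_init, success_rates)
weak_steps = steps_to_select_best(weak_init, success_rates)
print(strong_steps, weak_steps)   # the weak start needs noticeably more steps
```

In this picture, SFT before RLVR corresponds to moving the initial logits toward `strong_init` before the policy-gradient phase begins.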
Article 2033
Title@2025-06-05 (4): Influence Functions for Edge Edits in Non-Convex Graph Neural Networks
Title: Influence Functions for Edge Edits in Non-Convex Graph Neural Networks | Einflussfunktionen für Kantenbearbeitungen in nicht konvexen Graphen-Neuralnetzwerken | 非Convex 图形神经网络中的边缘编辑的影响函数 2506.04694v1 |
Authors (6): Jaeseung Heo, Kyeongheung Yun, Seokwon Yoon, MoonJeong Park, Jungseul Ok, Dongwoo Kim
Understanding how individual edges influence the behavior of graph neural networks (GNNs) is essential for improving their interpretability and robustness. Graph influence functions have emerged as promising tools to efficiently estimate the effects of edge deletions without retraining. However, existing influence prediction methods rely on strict convexity assumptions, exclusively consider the influence of edge deletions while disregarding edge insertions, and fail to capture changes in message propagation caused by these modifications. In this work, we propose a proximal Bregman response function specifically tailored for GNNs, relaxing the convexity requirement and enabling accurate influence prediction for standard neural network architectures. Furthermore, our method explicitly accounts for message propagation effects and extends influence prediction to both edge deletions and insertions in a principled way. Experiments with real-world datasets demonstrate accurate influence predictions for different characteristics of GNNs. We further demonstrate that the influence function is versatile in applications such as graph rewiring and adversarial attacks.
Article 2034
Title@2025-06-05 (4): Towards Better Generalization via Distributional Input Projection Network
Title: Towards Better Generalization via Distributional Input Projection Network | Auf dem Weg zu einer besseren Verallgemeinerung über das Distributional Input Projection Network | 通过分配投入预测网络更好地实现普遍化 2506.04690v1 |
Authors (4): Yifan Hao, Yanxin Lu, Xinwei Shen, Tong Zhang
As overparameterized models become increasingly prevalent, training loss alone offers limited insight into generalization performance. While smoothness has been linked to improved generalization across various settings, directly enforcing smoothness in neural networks remains challenging. To address this, we introduce Distributional Input Projection Networks (DIPNet), a novel framework that projects inputs into learnable distributions at each layer. This distributional representation induces a smoother loss landscape with respect to the input, promoting better generalization. We provide theoretical analysis showing that DIPNet reduces both local smoothness measures and the Lipschitz constant of the network, contributing to improved generalization performance. Empirically, we validate DIPNet across a wide range of architectures and tasks, including Vision Transformers (ViTs), Large Language Models (LLMs), ResNet and MLPs. Our method consistently enhances test performance under standard settings, adversarial attacks, out-of-distribution inputs, and reasoning benchmarks. We demonstrate that the proposed input projection strategy can be seamlessly integrated into existing models, providing a general and effective approach for boosting generalization performance in modern deep learning.
Article 2035
Title@2025-06-05 (4): Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models
Title: Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models | Recycling the Web: Eine Methode zur Verbesserung der Vorschulung von Daten Qualität und Menge für Sprachmodelle | 网上再循环:提高语文模式培训前数据质量和数量的方法 2506.04689v1 |
Authors (7): Thao Nguyen, Yang Li, Olga Golovneva, Luke Zettlemoyer, Sewoong Oh, Ludwig Schmidt, Xian Li
Scaling laws predict that the performance of large language models improves with increasing model size and data size. In practice, pre-training has been relying on massive web crawls, using almost all data sources publicly available on the internet so far. However, this pool of natural data does not grow at the same rate as the compute supply. Furthermore, the availability of high-quality texts is even more limited: data filtering pipelines often remove up to 99% of the initial web scrapes to achieve state-of-the-art performance. To address the “data wall” of pre-training scaling, our work explores ways to transform and recycle data discarded in existing filtering processes. We propose REWIRE, REcycling the Web with guIded REwrite, a method to enrich low-quality documents so that they could become useful for training. This in turn allows us to increase the representation of synthetic data in the final pre-training set. Experiments at 1B, 3B and 7B scales of the DCLM benchmark show that mixing high-quality raw texts and our rewritten texts leads to improvements of 1.0, 1.3 and 2.5 percentage points respectively across 22 diverse tasks, compared to training on only filtered web data. Training on the raw-synthetic data mix is also more effective than having access to 2x web data. Through further analysis, we demonstrate that about 82% of the mixed-in texts come from transforming lower-quality documents that would otherwise be discarded. REWIRE also outperforms related approaches of generating synthetic data, including Wikipedia-style paraphrasing, question-answer synthesizing and knowledge extraction. These results suggest that recycling web texts holds the potential for being a simple and effective approach for scaling pre-training data.
Article 2036
Title@2025-06-05 (4): UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules
Title: UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules | UniSim: Ein einheitlicher Simulator für zeitgesteuerte Dynamik von Biomolekülen | UniSim: 生物分子时间分析分析动态的统一模拟器 2506.03157v2 |
Authors (3): Ziyang Yu, Wenbing Huang, Yang Liu
Molecular Dynamics (MD) simulations are essential for understanding the atomic-level behavior of molecular systems, giving insights into their transitions and interactions. However, classical MD techniques are limited by the trade-off between accuracy and efficiency, while recent deep learning-based improvements have mostly focused on single-domain molecules, lacking transferability to unfamiliar molecular systems. Therefore, we propose \textbf{Uni}fied \textbf{Sim}ulator (UniSim), which leverages cross-domain knowledge to enhance the understanding of atomic interactions. First, we employ a multi-head pretraining approach to learn a unified atomic representation model from a large and diverse set of molecular data. Then, based on the stochastic interpolant framework, we learn the state transition patterns over long timesteps from MD trajectories, and introduce a force guidance module for rapidly adapting to different chemical environments. Our experiments demonstrate that UniSim achieves highly competitive performance across small molecules, peptides, and proteins.
Article 2037
Title@2025-06-05 (4): Model Predictive Control is Almost Optimal for Restless Bandit
Title: Model Predictive Control is Almost Optimal for Restless Bandit | Modell Predictive Control ist fast optimal für Restless Bandit | 模型预测控制几乎是无休止强盗的最佳模式 2410.06307v2 |
Authors (2): Nicolas Gast, Dheeraj Narasimha
We consider the discrete time infinite horizon average reward restless Markovian bandit (RMAB) problem. We propose a \emph{model predictive control} based non-stationary policy with a rolling computational horizon $\tau$. At each time-slot, this policy solves a $\tau$ horizon linear program whose first control value is kept as a control for the RMAB. Our solution requires minimal assumptions and quantifies the loss in optimality in terms of $\tau$ and the number of arms, $N$. We show that its sub-optimality gap is $O(1/\sqrt{N})$ in general, and $\exp(-\Omega(N))$ under a local-stability condition. Our proof is based on a framework from dynamic control known as \emph{dissipativity}. Our solution is easy to implement and performs very well in practice when compared to the state of the art. Further, both our solution and our proof methodology can easily be generalized to more general constrained MDP settings and should thus be of great interest to the burgeoning RMAB community.
Article 2038
Title@2025-06-05 (4): Neural Discovery in Mathematics: Do Machines Dream of Colored Planes?
Title: Neural Discovery in Mathematics: Do Machines Dream of Colored Planes? | Neural Discovery in der Mathematik: Träumen Maschinen von farbigen Flugzeugen? | 数学中的神经发现:机器梦到彩色平面吗? 2501.18527v3 |
Authors (5): Konrad Mundinger, Max Zimmer, Aldo Kiem, Christoph Spiegel, Sebastian Pokutta
We demonstrate how neural networks can drive mathematical discovery through a case study of the Hadwiger-Nelson problem, a long-standing open problem at the intersection of discrete geometry and extremal combinatorics that is concerned with coloring the plane while avoiding monochromatic unit-distance pairs. Using neural networks as approximators, we reformulate this mixed discrete-continuous geometric coloring problem with hard constraints as an optimization task with a probabilistic, differentiable loss function. This enables gradient-based exploration of admissible configurations, which most significantly led to the discovery of two novel six-colorings, providing the first improvement in thirty years to the off-diagonal variant of the original problem. Here, we establish the underlying machine learning approach used to obtain these results and demonstrate its broader applicability through additional numerical insights.
Article 2039
Title@2025-06-05 (4): Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction
Title: Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction | Diffusion mit einem sprachlichen Kompass: Steuerung der Generation klinisch plausibler Zukunft sMRI-Darstellungen für frühe MCI-Konversionsvorhersage | 语言指南的传播:指导为早期MCI转换预测生成临床可视未来SMRI代表 2506.05428v1 |
Authors (4): Zhihao Tang, Chaozhuo Li, Litian Zhang, Xi Zhang
Early prediction of Mild Cognitive Impairment (MCI) conversion is hampered by a trade-off between immediacy (making fast predictions from a single baseline sMRI) and accuracy (leveraging longitudinal scans to capture disease progression). We propose MCI-Diff, a diffusion-based framework that synthesizes clinically plausible future sMRI representations directly from baseline data, achieving both real-time risk assessment and high predictive performance. First, a multi-task sequence reconstruction strategy trains a shared denoising network on interpolation and extrapolation tasks to handle irregular follow-up sampling and learn robust latent trajectories. Second, an LLM-driven “linguistic compass” is introduced for clinical plausibility sampling: generated feature candidates are quantized, tokenized, and scored by a fine-tuned language model conditioned on expected structural biomarkers, guiding autoregressive generation toward realistic disease patterns. Experiments on ADNI and AIBL cohorts show that MCI-Diff outperforms state-of-the-art baselines, improving early conversion accuracy by 5-12%.
Article 2040
Title@2025-06-05 (4): Urania: Differentially Private Insights into AI Use
Title: Urania: Differentially Private Insights into AI Use | Urania: Unterschiedliche private Einblicke in die KI-Nutzung | Urania: 差异私人透视转化为 AI 使用 2506.04681v1 |
Authors (11): Daogao Liu, Edith Cohen, Badih Ghazi, Peter Kairouz, Pritish Kamath, Alexander Knop, Ravi Kumar, Pasin Manurangsi, Adam Sealfon, Da Yu, Chiyuan Zhang
We introduce $Urania$, a novel framework for generating insights about LLM chatbot interactions with rigorous differential privacy (DP) guarantees. The framework employs a private clustering mechanism and innovative keyword extraction methods, including frequency-based, TF-IDF-based, and LLM-guided approaches. By leveraging DP tools such as clustering, partition selection, and histogram-based summarization, $Urania$ provides end-to-end privacy protection. Our evaluation assesses lexical and semantic content preservation, pair similarity, and LLM-based metrics, benchmarking against a non-private Clio-inspired pipeline (Tamkin et al., 2024). Moreover, we develop a simple empirical privacy evaluation that demonstrates the enhanced robustness of our DP pipeline. The results show the framework’s ability to extract meaningful conversational insights while maintaining stringent user privacy, effectively balancing data utility with privacy preservation.
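As a rough illustration of histogram-based release with partition selection, the sketch below adds Laplace noise to keyword counts and publishes only keywords whose noisy count clears a threshold. It is not the Urania pipeline: the function name, the assumption that each conversation contributes exactly one keyword (so counts have sensitivity 1), and the parameter values are all hypothetical. Thresholding of this kind gives an (epsilon, delta)-style guarantee whose delta depends on the threshold; that calibration is omitted here.

```python
import numpy as np
from collections import Counter

def dp_keyword_histogram(keywords, epsilon, threshold, rng):
    """Noisy keyword counts with threshold-based partition selection.

    keywords: one keyword per conversation (assumed, so sensitivity is 1).
    """
    counts = Counter(keywords)
    released = {}
    for key in sorted(counts):  # fixed order keeps runs reproducible
        noisy = counts[key] + rng.laplace(scale=1.0 / epsilon)
        if noisy >= threshold:  # rare keywords are suppressed entirely
            released[key] = noisy
    return released
```

With a small `epsilon` the noise grows and more borderline keywords are dropped, which is the utility cost the abstract's evaluation is measuring.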
Article 2041
Title@2025-06-05 (4): The cost of ensembling: is it always worth combining?
Title: The cost of ensembling: is it always worth combining? | Die Kosten für die Montage: lohnt es sich immer zu kombinieren? | 集合的成本:总值得合并吗? 2506.04677v1 |
Authors (1): Marco Zanotti
Given the continuous increase in dataset sizes and the complexity of forecasting models, the trade-off between forecast accuracy and computational cost is emerging as an extremely relevant topic, especially in the context of ensemble learning for time series forecasting. To assess it, we evaluated ten base models and eight ensemble configurations across two large-scale retail datasets (M5 and VN1), considering both point and probabilistic accuracy under varying retraining frequencies. We showed that ensembles consistently improve forecasting performance, particularly in probabilistic settings. However, these gains come at a substantial computational cost, especially for larger, accuracy-driven ensembles. We found that reducing retraining frequency significantly lowers costs, with minimal impact on accuracy, particularly for point forecasts. Moreover, efficiency-driven ensembles offer a strong balance, achieving competitive accuracy with considerably lower costs compared to accuracy-optimized combinations. Most importantly, small ensembles of two or three models are often sufficient to achieve near-optimal results. These findings provide practical guidelines for deploying scalable and cost-efficient forecasting systems, supporting the broader goals of sustainable AI in forecasting. Overall, this work shows that careful ensemble design and retraining strategy selection can yield accurate, robust, and cost-effective forecasts suitable for real-world applications.
Article 2042
Title@2025-06-05 (4): Gen-n-Val: Agentic Image Data Generation and Validation
Title: Gen-n-Val: Agentic Image Data Generation and Validation | Gen-n-Val: Gen-n-Val: Agentische Bilddatengenerierung und -validierung | Gen-n-Val: 代理图像数据生成和校验 2506.04676v1 |
Authors (5): Jing-En Huang, I-Sheng Fang, Tzuhsuan Huang, Chih-Yu Wang, Jun-Cheng Chen
Recently, Large Language Models (LLMs) and Vision Large Language Models (VLLMs) have demonstrated impressive performance as agents across various tasks, while data scarcity and label noise remain significant challenges in computer vision tasks such as object detection and instance segmentation. A common solution for resolving these issues is to generate synthetic data. However, current synthetic data generation methods struggle with issues such as multiple objects per mask, inaccurate segmentation, and incorrect category labels, limiting their effectiveness. To address these issues, we introduce Gen-n-Val, a novel agentic data generation framework that leverages Layer Diffusion (LD), LLMs, and VLLMs to produce high-quality, single-object masks and diverse backgrounds. Gen-n-Val consists of two agents: (1) The LD prompt agent, an LLM, optimizes prompts for LD to generate high-quality foreground instance images and segmentation masks. These optimized prompts ensure the generation of single-object synthetic data with precise instance masks and clean backgrounds. (2) The data validation agent, a VLLM, filters out low-quality synthetic instance images. The system prompts for both agents are refined through TextGrad. Additionally, we use image harmonization to combine multiple instances within scenes. Compared to state-of-the-art synthetic data approaches like MosaicFusion, our approach reduces invalid synthetic data from 50% to 7% and improves performance by 1% mAP on rare classes in COCO instance segmentation with YOLOv9c and YOLO11m. Furthermore, Gen-n-Val shows significant improvements (7.1% mAP) over YOLO-Worldv2-M in open-vocabulary object detection benchmarks with YOLO11m. Moreover, Gen-n-Val improves the performance of YOLOv9 and YOLO11 families in instance segmentation and object detection.
Article 2043
Title@2025-06-05 (4): Psi-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models
Title: Psi-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models | Psi-Sampler: Erste Partikel-Probenahme für SMC-basierte Inferenz-Zeit-Belohnung in Score-Modellen | Psi-Sampler: 分数模型中基于 SMC 的推理-时间回报比对初始粒子抽样 2506.01320v2 |
Authors (4): Taehoon Yoon, Yunhong Min, Kyeongmin Yeo, Minhyuk Sung
We introduce $\Psi$-Sampler, an SMC-based framework incorporating pCNL-based initial particle sampling for effective inference-time reward alignment with a score-based generative model. Inference-time reward alignment with score-based generative models has recently gained significant traction, following a broader paradigm shift from pre-training to post-training optimization. At the core of this trend is the application of Sequential Monte Carlo (SMC) to the denoising process. However, existing methods typically initialize particles from the Gaussian prior, which inadequately captures reward-relevant regions and results in reduced sampling efficiency. We demonstrate that initializing from the reward-aware posterior significantly improves alignment performance. To enable posterior sampling in high-dimensional latent spaces, we introduce the preconditioned Crank-Nicolson Langevin (pCNL) algorithm, which combines dimension-robust proposals with gradient-informed dynamics. This approach enables efficient and scalable posterior sampling and consistently improves performance across various reward alignment tasks, including layout-to-image generation, quantity-aware generation, and aesthetic-preference generation, as demonstrated in our experiments. Project Webpage: https://psi-sampler.github.io/
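To make the idea of dimension-robust posterior sampling concrete, here is a minimal sketch of the plain preconditioned Crank-Nicolson (pCN) sampler, not the gradient-informed pCNL variant the abstract introduces. It assumes a standard Gaussian prior and a target proportional to exp(reward(x) / alpha) times that prior; `reward`, `alpha`, and `beta` are illustrative names and values, not the authors' settings.

```python
import numpy as np

def pcn_sample(reward, x0, n_steps, beta, alpha, rng):
    """Plain pCN chain targeting exp(reward(x) / alpha) * N(x; 0, I).

    The proposal leaves the Gaussian prior invariant, so the accept ratio
    only involves the reward term. pCNL would add a gradient drift (omitted).
    """
    x = np.array(x0, dtype=float)
    log_tilt = reward(x) / alpha
    accepted = 0
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        proposal = np.sqrt(1.0 - beta**2) * x + beta * noise
        log_tilt_prop = reward(proposal) / alpha
        # Metropolis-Hastings test on the reward difference only
        if np.log(rng.uniform()) < log_tilt_prop - log_tilt:
            x, log_tilt = proposal, log_tilt_prop
            accepted += 1
    return x, accepted / n_steps
```

The final state of such a chain could serve as one initial particle for SMC in place of a raw prior draw; how the paper combines the two steps is not specified in the abstract.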
Article 2044
Title@2025-06-05 (4): TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization
Title: TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization | TANGO: Clustering mit Typicality-Aware Nonlocal Mode-Seeking und Graph-Cut-Optimierung | TANGO:以非本地模式搜索和图集优化组合组合 2408.10084v2 |
Authors (3): Haowen Ma, Zhiguo Long, Hua Meng
Density-based mode-seeking methods generate a \emph{density-ascending dependency} from low-density points towards higher-density neighbors. Current mode-seeking methods identify modes by breaking some dependency connections, but they rely heavily on local data characteristics and require case-by-case threshold settings or human intervention to be effective across datasets. To address this issue, we introduce a novel concept called \emph{typicality}, which explores the \emph{locally defined} dependency from a \emph{global} perspective to quantify how confidently a point can be considered a mode. We devise an algorithm that effectively and efficiently identifies modes with the help of the global-view typicality. To implement and validate our idea, we design a clustering method called TANGO, which not only leverages typicality to detect modes, but also utilizes graph-cut with an improved \emph{path-based similarity} to aggregate data into the final clusters. Moreover, this paper also provides some theoretical analysis on the proposed algorithm. Experimental results on several synthetic and extensive real-world datasets demonstrate the effectiveness and superiority of TANGO. The code is available at https://github.com/SWJTU-ML/TANGO_code.
Article 2045
Title@2025-06-05 (4): MTPNet: Multi-Grained Target Perception for Unified Activity Cliff Prediction
Title: MTPNet: Multi-Grained Target Perception for Unified Activity Cliff Prediction | MTPNet: Multi-Grained Target Perception für Unified Activity Cliff Prediction | MTPNet: 统一活动悬崖预测的多重目标感知 2506.05427v1 |
Authors (5): Zishan Shu, Yufan Deng, Hongyu Zhang, Zhiwei Nie, Jie Chen
Activity cliff prediction is a critical task in drug discovery and material design. Existing computational methods are limited to handling single binding targets, which restricts the applicability of these prediction models. In this paper, we present the Multi-Grained Target Perception network (MTPNet) to incorporate the prior knowledge of interactions between the molecules and their target proteins. Specifically, MTPNet is a unified framework for activity cliff prediction, which consists of two components: Macro-level Target Semantic (MTS) guidance and Micro-level Pocket Semantic (MPS) guidance. In this way, MTPNet dynamically optimizes molecular representations through multi-grained protein semantic conditions. To our knowledge, this is the first work to employ receptor proteins as guiding information to effectively capture critical interaction details. Extensive experiments on 30 representative activity cliff datasets demonstrate that MTPNet significantly outperforms previous approaches, achieving an average RMSE improvement of 18.95% on top of several mainstream GNN architectures. Overall, MTPNet internalizes interaction patterns through conditional deep learning to achieve unified predictions of activity cliffs, helping to accelerate compound optimization and design. Codes are available at: https://github.com/ZishanShu/MTPNet.
Article 2046
Title@2025-06-05 (4): FedAPM: Federated Learning via ADMM with Partial Model Personalization
Title: FedAPM: Federated Learning via ADMM with Partial Model Personalization | FedAPM: Federated Learning via ADMM mit Teilmodell-Personalisierung | FedAPP:通过具有部分示范性人格化的ADMMD学习联合会 2506.04672v1 |
Authors (9): Shengkun Zhu, Feiteng Nie, Jinshan Zeng, Sheng Wang, Yuan Sun, Yuan Yao, Shangfeng Chen, Quanqing Xu, Chuanhui Yang
In federated learning (FL), the assumption that datasets from different devices are independent and identically distributed (i.i.d.) often does not hold due to user differences, and the presence of various data modalities across clients makes using a single model impractical. Personalizing certain parts of the model can effectively address these issues by allowing those parts to differ across clients, while the remaining parts serve as a shared model. However, we found that partial model personalization may exacerbate client drift (each client’s local model diverges from the shared model), thereby reducing the effectiveness and efficiency of FL algorithms. We propose an FL framework based on the alternating direction method of multipliers (ADMM), referred to as FedAPM, to mitigate client drift. We construct the augmented Lagrangian function by incorporating first-order and second-order proximal terms into the objective, with the second-order term providing fixed correction and the first-order term offering compensatory correction between the local and shared models. Our analysis demonstrates that FedAPM, by using explicit estimates of the Lagrange multiplier, is more stable and efficient in terms of convergence compared to other FL frameworks. We establish the global convergence of FedAPM training from arbitrary initial points to a stationary point, achieving three types of rates: constant, linear, and sublinear, under mild assumptions. We conduct experiments using four heterogeneous and multimodal datasets with different metrics to validate the performance of FedAPM. Specifically, FedAPM achieves faster and more accurate convergence, outperforming the SOTA methods with average improvements of 12.3% in test accuracy, 16.4% in F1 score, and 18.0% in AUC while requiring fewer communication rounds.
Article 2047
Title@2025-06-05 (4): Noise-Resistant Label Reconstruction Feature Selection for Partial Multi-Label Learning
Title: Noise-Resistant Label Reconstruction Feature Selection for Partial Multi-Label Learning | Noise-Resistent Label Reconstruction Feature Selection for Partial Multi-Label Learning | 部分多标签学习选择部分多标签学习的恢复标签功能 2506.04669v1 |
Authors (4): Wanfu Gao, Hanlin Pan, Qingqi Han, Kunpeng Liu
The “Curse of dimensionality” is prevalent across various data patterns, which increases the risk of model overfitting and leads to a decline in model classification performance. However, few studies have focused on this issue in Partial Multi-label Learning (PML), where each sample is associated with a set of candidate labels, at least one of which is correct. Existing PML methods addressing this problem are mainly based on the low-rank assumption. However, the low-rank assumption is difficult to satisfy in practical situations and may lead to a loss of high-dimensional information. Furthermore, we find that existing methods are poor at identifying positive labels, which is important in real-world scenarios. In this paper, a PML feature selection method is proposed that considers two important characteristics of the dataset: the noise-resistance of label relationships and label connectivity. Our proposed method utilizes the noise-resistance of label relationships to disambiguate labels. Then the learning process is designed through the reformed low-rank assumption. Finally, representative labels are found through label connectivity, and the weight matrix is reconstructed to select features that identify these labels well. The experimental results on benchmark datasets demonstrate the superiority of the proposed method.
Article 2048
Title@2025-06-05 (4): Mixture-of-Experts Meets In-Context Reinforcement Learning
Title: Mixture-of-Experts Meets In-Context Reinforcement Learning | Mixture-of-Experts trifft auf In-Context-Verstärktes Lernen | 混合专家会议 文中加强学习 2506.05426v1 |
Authors (7): Wenhao Wu, Fuhong Liu, Haoru Li, Zican Hu, Daoyi Dong, Chunlin Chen, Zhi Wang
In-context reinforcement learning (ICRL) has emerged as a promising paradigm for adapting RL agents to downstream tasks through prompt conditioning. However, two notable challenges remain in fully harnessing in-context learning within RL domains: the intrinsic multi-modality of the state-action-reward data and the diverse, heterogeneous nature of decision tasks. To tackle these challenges, we propose T2MIR (Token- and Task-wise MoE for In-context RL), an innovative framework that introduces architectural advances of mixture-of-experts (MoE) into transformer-based decision models. T2MIR substitutes the feedforward layer with two parallel layers: a token-wise MoE that captures distinct semantics of input tokens across multiple modalities, and a task-wise MoE that routes diverse tasks to specialized experts for managing a broad task distribution with alleviated gradient conflicts. To enhance task-wise routing, we introduce a contrastive learning method that maximizes the mutual information between the task and its router representation, enabling more precise capture of task-relevant information. The outputs of the two MoE components are concatenated and fed into the next layer. Comprehensive experiments show that T2MIR significantly improves in-context learning capacity and outperforms various types of baselines. We bring the potential and promise of MoE to ICRL, offering a simple and scalable architectural enhancement to advance ICRL one step closer toward achievements in language and vision communities. Our code is available at https://github.com/NJU-RL/T2MIR.
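The token-wise routing idea can be sketched in a few lines. The following is a generic top-k mixture-of-experts layer with linear experts, written under assumptions of my own (softmax gating, renormalised top-k weights, the names `moe_layer`, `router_w`, `expert_ws`); it is not taken from the T2MIR code and leaves out the task-wise branch and the contrastive router objective.

```python
import numpy as np

def moe_layer(tokens, router_w, expert_ws, top_k=1):
    """Route each token to its top-k experts and mix their outputs.

    tokens:    (n_tokens, d_in)
    router_w:  (d_in, n_experts) gating weights
    expert_ws: (n_experts, d_in, d_out), one linear map per expert
    """
    logits = tokens @ router_w
    logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
    gates = np.exp(logits)
    gates /= gates.sum(axis=1, keepdims=True)
    top = np.argsort(-gates, axis=1)[:, :top_k]   # chosen expert indices
    out = np.zeros((tokens.shape[0], expert_ws.shape[2]))
    for i in range(tokens.shape[0]):
        weights = gates[i, top[i]]
        weights = weights / weights.sum()         # renormalise over chosen experts
        for w, e in zip(weights, top[i]):
            out[i] += w * (tokens[i] @ expert_ws[e])
    return out, top
```

In the architecture described above, a second layer of the same shape would route on a task representation instead of individual tokens, and the two outputs would be concatenated.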
Article 2049
Title@2025-06-05 (4): Multiple Invertible and Partial-Equivariant Function for Latent Vector Transformation to Enhance Disentanglement in VAEs
Title: Multiple Invertible and Partial-Equivariant Function for Latent Vector Transformation to Enhance Disentanglement in VAEs | Multiple Invertible und Partial-Equivariant Funktion für Latent Vector Transformation zur Verbesserung der Disentanglementierung in VAEs | 用于中子矢量变异的多重可垂直和部分等同函数,以强化 VAEs 的分解 2502.03740v2 |
Authors (3): Hee-Jun Jung, Jaehyoung Jeong, Kangil Kim
Disentanglement learning is a core issue for understanding and re-using trained information in Variational AutoEncoder (VAE), and effective inductive bias has been reported as a key factor. However, the actual implementation of such bias is still vague. In this paper, we propose a novel method, called Multiple Invertible and partial-equivariant transformation (MIPE-transformation), to inject inductive bias by 1) guaranteeing the invertibility of latent-to-latent vector transformation while preserving a certain portion of equivariance of input-to-latent vector transformation, called Invertible and partial-equivariant transformation (IPE-transformation), 2) extending the form of prior and posterior in VAE frameworks to an unrestricted form through a learnable conversion to an approximated exponential family, called Exponential Family conversion (EF-conversion), and 3) integrating multiple units of IPE-transformation and EF-conversion, and their training. In experiments on 3D Cars, 3D Shapes, and dSprites datasets, MIPE-transformation improves the disentanglement performance of state-of-the-art VAEs.
Article 2050
Title@2025-06-05 (4): Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning
Title: Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning | Latent Safety-Constrained Policy Approach für sicheres Offline-Verstärkungslernen | 安全离线强化安全学习受安保约束的政策办法 2412.08794v2 |
Authors (4): Prajwal Koirala, Zhanhong Jiang, Soumik Sarkar, Cody Fleming
In safe offline reinforcement learning (RL), the objective is to develop a policy that maximizes cumulative rewards while strictly adhering to safety constraints, utilizing only offline data. Traditional methods often face difficulties in balancing these constraints, leading to either diminished performance or increased safety risks. We address these issues with a novel approach that begins by learning a conservatively safe policy through the use of Conditional Variational Autoencoders, which model the latent safety constraints. Subsequently, we frame this as a Constrained Reward-Return Maximization problem, wherein the policy aims to optimize rewards while complying with the inferred latent safety constraints. This is achieved by training an encoder with a reward-Advantage Weighted Regression objective within the latent constraint space. Our methodology is supported by theoretical analysis, including bounds on policy performance and sample complexity. Extensive empirical evaluation on benchmark datasets, including challenging autonomous driving scenarios, demonstrates that our approach not only maintains safety compliance but also excels in cumulative reward optimization, surpassing existing methods. Additional visualizations provide further insights into the effectiveness and underlying mechanisms of our approach.
Article 2051
Title@2025-06-05 (4): Can Artificial Intelligence Trade the Stock Market?
Title: Can Artificial Intelligence Trade the Stock Market? | Kann Künstliche Intelligenz den Aktienmarkt handeln? | 人工智能贸易 股市? 2506.04658v1 |
Authors (2): Jędrzej Maskiewicz, Paweł Sakowski
The paper explores the use of Deep Reinforcement Learning (DRL) in stock market trading, focusing on two algorithms, Double Deep Q-Network (DDQN) and Proximal Policy Optimization (PPO), and compares them with a Buy and Hold benchmark. It evaluates these algorithms on three currency pairs, the S&P 500 index, and Bitcoin, using daily data from 2019-2023. The results demonstrate DRL’s effectiveness in trading and its ability to manage risk by strategically avoiding trades in unfavorable conditions, providing a substantial edge in risk-adjusted returns over classical approaches based on supervised learning.
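For readers unfamiliar with DDQN, the sketch below shows the standard Double DQN bootstrap target: the online network picks the next action and the target network evaluates it. The function and argument names are illustrative, and nothing here reflects the paper's trading environment or hyperparameters.

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN targets for a batch of transitions.

    q_online_next, q_target_next: (batch, n_actions) Q-values at the next state
    from the online and target networks respectively.
    """
    best_actions = np.argmax(q_online_next, axis=1)              # select with online net
    next_values = q_target_next[np.arange(len(rewards)), best_actions]  # evaluate with target net
    return rewards + gamma * (1.0 - dones) * next_values         # no bootstrap after terminal steps
```

Splitting selection from evaluation is what reduces the overestimation bias of plain Q-learning targets.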
Article 2052
Title@2025-06-05 (4): The Oversmoothing Fallacy: A Misguided Narrative in GNN Research
Title: The Oversmoothing Fallacy: A Misguided Narrative in GNN Research | Der überwältigende Fehler: Ein fehlgeleitetes Narrativ in der GNN-Forschung | 《压倒性代价:GNN研究中的误导性叙述》 2506.04653v1 |
Authors (5): MoonJeong Park, Sunghyun Choi, Jaeseung Heo, Eunhyeok Park, Dongwoo Kim
Oversmoothing has been recognized as a main obstacle to building deep Graph Neural Networks (GNNs), limiting the performance. This position paper argues that the influence of oversmoothing has been overstated and advocates for a further exploration of deep GNN architectures. Given the three core operations of GNNs, aggregation, linear transformation, and non-linear activation, we show that prior studies have mistakenly confused oversmoothing with the vanishing gradient, caused by transformation and activation rather than aggregation. Our finding challenges prior beliefs about oversmoothing being unique to GNNs. Furthermore, we demonstrate that classical solutions such as skip connections and normalization enable the successful stacking of deep GNN layers without performance degradation. Our results clarify misconceptions about oversmoothing and shed new light on the potential of deep GNNs.
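The three operations and the skip-connection remedy named above can be written out as a single layer. This is a minimal sketch assuming GCN-style symmetric normalisation with self-loops and a ReLU activation; the abstract does not fix these choices, and the function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalisation D^{-1/2} (A + I) D^{-1/2} (an assumed choice)."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gnn_layer(h, a_hat, w, skip=True):
    """Aggregation (a_hat @ h), linear transformation (@ w), activation (ReLU)."""
    out = np.maximum(a_hat @ h @ w, 0.0)
    # The skip path carries the input through unchanged, so stacking layers
    # cannot erase node features even when the transformed branch shrinks.
    return h + out if skip else out
```

Stacking `gnn_layer` many times with and without `skip` is one simple way to probe how much of any degradation comes from the transformation and activation steps rather than from aggregation.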
Article 2053
Title@2025-06-05 (4): Active Learning with Variational Quantum Circuits for Quantum Process Tomography
Title: Active Learning with Variational Quantum Circuits for Quantum Process Tomography | Aktives Lernen mit unterschiedlichen Quantenkreisen für die Quantenprozesstomographie | 使用量子过程地形图谱学量子变量电路进行主动学习 2412.20925v2 |
Authors (3): Jiaqi Yang, Xiaohua Xu, Wei Xie
Quantum process tomography (QPT) is a fundamental tool for fully characterizing quantum systems. It relies on querying a set of quantum states as input to the quantum process. Previous QPT methods typically employ a straightforward strategy for randomly selecting quantum states, overlooking differences in informativeness among them. In this work, we propose a general active learning (AL) framework that adaptively selects the most informative subset of quantum states for reconstruction. We design and evaluate various AL algorithms and provide practical guidelines for selecting suitable methods in different scenarios. In particular, we introduce a learning framework that leverages the widely-used variational quantum circuits (VQCs) to perform the QPT task and integrate our AL algorithms into the query step. We demonstrate our algorithms by reconstructing the unitary quantum processes resulting from random quantum circuits with up to seven qubits. Numerical results show that our AL algorithms achieve significantly improved reconstruction, and the improvement increases with the size of the underlying quantum system. Our work opens new avenues for further advancing existing QPT methods.
Article 2054
Title@2025-06-05 (4): Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction
Title: Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction | Neural Network Reprogrammierbarkeit: Ein einheitliches Thema auf Modellumprogrammierung, Prompt-Tuning und Prompt-Instruktion | 神经网络重组:关于模式重组、即时提款和即时指示的统一主题 2506.04650v1 |
Authors (7): Zesheng Ye, Chengyi Cai, Ruijiang Dong, Jianzhong Qi, Lei Feng, Pin-Yu Chen, Feng Liu
As large-scale pre-trained foundation models continue to expand in size and capability, efficiently adapting them to specific downstream tasks has become increasingly critical. Despite substantial progress, existing adaptation approaches have evolved largely in isolation, without a clear understanding of their interrelationships. This survey introduces neural network reprogrammability as a unifying framework that bridges mainstream model adaptation techniques (model reprogramming, prompt tuning, and prompt instruction), previously fragmented research areas that nonetheless converge on a shared principle: repurposing a pre-trained model by manipulating information at the interfaces while keeping the model parameters frozen. These methods exploit neural networks’ sensitivity to manipulation on different interfaces, be it through perturbing inputs, inserting tokens into intermediate layers, or providing task-specific examples in context, to redirect model behaviors towards desired outcomes. We then present a taxonomy that categorizes such information manipulation-based adaptation approaches across four key dimensions: manipulation format (fixed or learnable), location (interfaces where manipulations occur), operator (how they are applied), and output alignment requirement (post-processing needed to align outputs with downstream tasks). Notably, this framework applies consistently across data modalities, independent of specific model architectures. Moreover, viewing established techniques like in-context learning and chain-of-thought prompting through this lens reveals both their theoretical connections and practical distinctions. We further analyze remaining technical challenges and ethical considerations, positioning neural network reprogrammability as a fundamental paradigm for efficient model adaptation. We lastly identify promising research directions emerging from this integrative viewpoint.
Article 2055
Title@2025-06-05 (4): Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing
Title: Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing | Nicht alle Optionen sind gleich: Textuelle Optionsgewichtung für Token-Efficient LLM-basierte Wissensverfolgung | 并非所有选项都创建为等: Token- Efficient LLM 知识追踪的文本选项比重 2410.12872v2 |
Authors (4): JongWoo Kim, SeongYeub Chu, Bryan Wong, Mun Yi
Large Language Models (LLMs) have recently emerged as promising tools for knowledge tracing (KT) due to their strong reasoning and generalization abilities. While recent LLM-based KT methods have proposed new prompt formats, they struggle to represent the full interaction histories of example learners within a single prompt during in-context learning (ICL), resulting in limited scalability and high computational cost under token constraints. In this work, we present LLM-based Option-weighted Knowledge Tracing (LOKT), a simple yet effective framework that encodes the interaction histories of example learners in context as textual categorical option weights (TCOW). TCOW are semantic labels (e.g., “inadequate”) assigned to the options selected by learners when answering questions, enhancing the interpretability of LLMs. Experiments on multiple-choice datasets show that LOKT outperforms existing non-LLM and LLM-based KT models in both cold-start and warm-start settings. Moreover, LOKT enables scalable and cost-efficient inference, achieving strong performance even under strict token constraints. Our code is available at https://anonymous.4open.science/r/LOKT_model-3233.
Article 2056
Title@2025-06-05 (4): ActivePusher: Active Learning and Planning with Residual Physics for Nonprehensile Manipulation
Title: ActivePusher: Active Learning and Planning with Residual Physics for Nonprehensile Manipulation | ActivePusher: Aktives Lernen und Planen mit Residual Physics für nichtprähensile Manipulation | 主动推进器: 主动学习和规划,对非传染病操纵进行残余物理研究与规划 2506.04646v1 |
Authors (3): Zhuoyun Zhong, Seyedali Golestaneh, Constantinos Chamzas
Planning with learned dynamics models offers a promising approach toward real-world, long-horizon manipulation, particularly in nonprehensile settings such as pushing or rolling, where accurate analytical models are difficult to obtain. Although learning-based methods hold promise, collecting training data can be costly and inefficient, as it often relies on randomly sampled interactions that are not necessarily the most informative. To address this challenge, we propose ActivePusher, a novel framework that combines residual-physics modeling with kernel-based uncertainty-driven active learning to focus data acquisition on the most informative skill parameters. Additionally, ActivePusher seamlessly integrates with model-based kinodynamic planners, leveraging uncertainty estimates to bias control sampling toward more reliable actions. We evaluate our approach in both simulation and real-world environments and demonstrate that it improves data efficiency and planning success rates compared to baseline methods.
Article 2057
Title@2025-06-05 (4): Inference economics of language models
Title: Inference economics of language models | Schlußfolgerung Wirtschaftlichkeit von Sprachmodellen | 语言模式的推论经济学 2506.04645v1 |
Authors (1): Ege Erdil
We develop a theoretical model that addresses the economic trade-off between cost per token and serial token generation speed when deploying LLMs for inference at scale. Our model takes into account arithmetic, memory bandwidth, network bandwidth, and latency constraints, and optimizes over different parallelism setups and batch sizes to find the ones that maximize serial inference speed at a given cost per token. We use the model to compute Pareto frontiers of serial speed versus cost per token for popular language models.
Article 2058
Title@2025-06-05 (4): MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
Title: MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning | MoDoMoDo: Multi-Domain-Datenmischungen für multimodales LLM-Verstärkungslernen | MoDoMoMoDoDo:多式LLM强化学习多功能数据混合体 2505.24871v2 |
Authors (10): Yiqing Liang, Jielin Qiu, Wenhao Ding, Zuxin Liu, James Tompkin, Mengdi Xu, Mengzhou Xia, Zhengzhong Tu, Laixi Shi, Jiacheng Zhu
Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a powerful paradigm for post-training large language models (LLMs), achieving state-of-the-art performance on tasks with structured, verifiable answers. Applying RLVR to Multimodal LLMs (MLLMs) presents significant opportunities but is complicated by the broader, heterogeneous nature of vision-language tasks that demand nuanced visual, logical, and spatial capabilities. As such, training MLLMs using RLVR on multiple datasets could be beneficial but creates challenges with conflicting objectives from interaction among diverse datasets, highlighting the need for optimal dataset mixture strategies to improve generalization and reasoning. We introduce a systematic post-training framework for Multimodal LLM RLVR, featuring a rigorous data mixture problem formulation and benchmark implementation. Specifically, (1) We developed a multimodal RLVR framework for multi-dataset post-training by curating a dataset that contains different verifiable vision-language problems and enabling multi-domain online RL learning with different verifiable rewards; (2) We proposed a data mixture strategy that learns to predict the RL fine-tuning outcome from the data mixture distribution, and consequently optimizes the best mixture. Comprehensive experiments showcase that multi-domain RLVR training, when combined with mixture prediction strategies, can significantly boost MLLM general reasoning capacities. Our best mixture improves the post-trained model’s accuracy on out-of-distribution benchmarks by an average of 5.24% compared to the same model post-trained with uniform data mixture, and by a total of 20.74% compared to the pre-finetuning baseline.
Article 2059
Title@2025-06-05 (4): Adversarial Combinatorial Semi-bandits with Graph Feedback
Title: Adversarial Combinatorial Semi-bandits with Graph Feedback | Adversariale Kombinatoriale Halbbänder mit Graph Feedback | 带有图图反馈的半斜面 2502.18826v4 |
Authors (1): Yuxiao Wen
In combinatorial semi-bandits, a learner repeatedly selects from a combinatorial decision set of arms, receives the realized sum of rewards, and observes the rewards of the individual selected arms as feedback. In this paper, we extend this framework to include \emph{graph feedback}, where the learner observes the rewards of all neighboring arms of the selected arms in a feedback graph $G$. We establish that the optimal regret over a time horizon $T$ scales as $\widetilde{\Theta}(S\sqrt{T}+\sqrt{\alpha ST})$, where $S$ is the size of the combinatorial decisions and $\alpha$ is the independence number of $G$. This result interpolates between the known regrets $\widetilde\Theta(S\sqrt{T})$ under full information (i.e., $G$ is complete) and $\widetilde\Theta(\sqrt{KST})$ under the semi-bandit feedback (i.e., $G$ has only self-loops), where $K$ is the total number of arms. A key technical ingredient is to realize a convexified action using a random decision vector with negative correlations. We also show that online stochastic mirror descent (OSMD) that only realizes convexified actions in expectation is suboptimal.
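As a quick sanity check of the interpolation claim, the two endpoint cases can be worked out from the stated rate. This is only a sketch, assuming each decision selects $S$ distinct arms so that $1 \le S \le K$; logarithmic factors hidden by $\widetilde{\Theta}$ are ignored.

```latex
% Full information: G is complete, so the independence number is \alpha = 1.
\[
  S\sqrt{T} + \sqrt{\alpha S T}
  = S\sqrt{T} + \sqrt{ST}
  \le 2 S\sqrt{T}
  \qquad (\text{since } S \ge 1),
\]
% so the rate is \widetilde{\Theta}(S\sqrt{T}).

% Semi-bandit feedback: G has only self-loops, so \alpha = K.
\[
  S\sqrt{T} + \sqrt{K S T}
  \le 2\sqrt{K S T}
  \qquad (\text{since } S \le K \implies S\sqrt{T} = \sqrt{S \cdot S T} \le \sqrt{K S T}),
\]
% so the rate is \widetilde{\Theta}(\sqrt{KST}).
```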
Article 2060
Title@2025-06-05 (4): An analytic theory of creativity in convolutional diffusion models
Title: An analytic theory of creativity in convolutional diffusion models | Eine analytische Theorie der Kreativität in konvolutionären Diffusionsmodellen | 革命扩散模型中的创造性分析理论 2412.20292v2 |
Authors (2): Mason Kamb, Surya Ganguli
We obtain an analytic, interpretable and predictive theory of creativity in convolutional diffusion models. Indeed, score-matching diffusion models can generate highly original images that lie far from their training data. However, optimal score-matching theory suggests that these models should only be able to produce memorized training examples. To reconcile this theory-experiment gap, we identify two simple inductive biases, locality and equivariance, that: (1) induce a form of combinatorial creativity by preventing optimal score-matching; (2) result in fully analytic, completely mechanistically interpretable, local score (LS) and equivariant local score (ELS) machines that, (3) after calibrating a single time-dependent hyperparameter, can quantitatively predict the outputs of trained convolution-only diffusion models (like ResNets and UNets) with high accuracy (median $r^2$ of $0.95, 0.94, 0.94, 0.96$ for our top model on CIFAR10, FashionMNIST, MNIST, and CelebA). Our model reveals a locally consistent patch mosaic mechanism of creativity, in which diffusion models create exponentially many novel images by mixing and matching different local training set patches at different scales and image locations. Our theory also partially predicts the outputs of pre-trained self-attention-enabled UNets (median $r^2 \sim 0.77$ on CIFAR10), revealing an intriguing role for attention in carving out semantic coherence from local patch mosaics.
Article 2061
Title@2025-06-05 (4): Composing Agents to Minimize Worst-case Risk
Title: Composing Agents to Minimize Worst-case Risk | Komponieren von Agenten zur Minimierung des schlimmsten Risikos | 尽量减少最坏情况风险的混合剂 2506.04632v1 |
Authors (2): Guruprerana Shabadi, Rajeev Alur
From software development to robot control, modern agentic systems decompose complex objectives into a sequence of subtasks and choose a set of specialized AI agents to complete them. We formalize an agentic workflow as a directed acyclic graph, called an agent graph, where edges represent AI agents and paths correspond to feasible compositions of agents. When deploying these systems in the real world, we need to choose compositions of agents that not only maximize the task success, but also minimize risk where the risk captures requirements like safety, fairness, and privacy. This additionally requires carefully analyzing the low-probability (tail) behaviors of compositions of agents. In this work, we consider worst-case risk minimization over the set of feasible agent compositions. We define worst-case risk as the tail quantile – also known as value-at-risk – of the loss distribution of the agent composition where the loss quantifies the risk associated with agent behaviors. We introduce an efficient algorithm that traverses the agent graph and finds a near-optimal composition of agents by approximating the value-at-risk via a union bound and dynamic programming. Furthermore, we prove that the approximation is near-optimal asymptotically for a broad class of practical loss functions. To evaluate our framework, we consider a suite of video game-like control benchmarks that require composing several agents trained with reinforcement learning and demonstrate our algorithm’s effectiveness in approximating the value-at-risk and identifying the optimal agent composition.
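The union-bound idea mentioned above can be illustrated with a small sketch. It is not the paper's algorithm: it assumes the loss of a composition is the sum of per-agent losses, that the tail budget `delta` is split evenly across agents, and that per-agent loss samples are available. The names `empirical_var`, `union_bound_var`, and `best_composition` are hypothetical.

```python
import math


def empirical_var(losses, level):
    """Empirical value-at-risk: smallest sample with at least `level` of samples <= it."""
    ordered = sorted(losses)
    index = max(math.ceil(level * len(ordered)) - 1, 0)
    return ordered[index]


def union_bound_var(agent_losses, delta):
    """Upper-bound the (1 - delta) quantile of the summed loss.

    Assumption: total loss is additive over agents. By the union bound,
    P(sum_i L_i > sum_i q_i) <= sum_i P(L_i > q_i), so giving each agent a
    tail budget of delta / k and summing its quantile q_i is conservative.
    """
    per_agent = delta / len(agent_losses)
    return sum(empirical_var(losses, 1.0 - per_agent) for losses in agent_losses)


def best_composition(candidates, delta):
    """Pick the candidate composition (name -> list of per-agent samples) with the lowest bound."""
    return min(candidates, key=lambda name: union_bound_var(candidates[name], delta))


# Toy usage with made-up loss samples for two feasible compositions.
candidates = {
    "path_a": [[1, 2, 3, 4], [10, 20, 30, 40]],
    "path_b": [[5, 5, 5, 5], [6, 6, 6, 6]],
}
print(best_composition(candidates, delta=0.5))
```

Enumerating candidates is only practical for small agent graphs; the abstract states that the authors instead combine the bound with dynamic programming over the graph.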
Article 2062
Title@2025-06-05 (4): PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
Title: PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models | Vergifteter Papagei: Subtile Daten vergiften Angriffe auf Elicit Copyright-Verletzung Inhalt aus großen Sprachmodellen | 有毒Parrot:大语言模型中侵犯版权内容的隐含数据毒害攻击 2503.07697v2 |
Authors (8): Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang
As the capabilities of large language models (LLMs) continue to expand, their usage has become increasingly prevalent. However, as reflected in numerous ongoing lawsuits regarding LLM-generated content, addressing copyright infringement remains a significant challenge. In this paper, we introduce PoisonedParrot: the first stealthy data poisoning attack that induces an LLM to generate copyrighted content even when the model has not been directly trained on the specific copyrighted material. PoisonedParrot integrates small fragments of copyrighted text into the poison samples using an off-the-shelf LLM. Despite its simplicity, evaluated in a wide range of experiments, PoisonedParrot is surprisingly effective at priming the model to generate copyrighted content with no discernible side effects. Moreover, we discover that existing defenses are largely ineffective against our attack. Finally, we make the first attempt at mitigating copyright-infringement poisoning attacks by proposing a defense: ParrotTrap. We encourage the community to explore this emerging threat model further.
Article 2063
Title@2025-06-05 (4): Constructive Symbolic Reinforcement Learning via Intuitionistic Logic and Goal-Chaining Inference
Title: Constructive Symbolic Reinforcement Learning via Intuitionistic Logic and Goal-Chaining Inference | Konstruktives symbolisches Stärkungslernen durch intuitionistische Logik und zielführende Schlussfolgerung | 通过神学逻辑和目标 – – 测深推断法进行建设性建构性符号强化学习 2506.05422v1 |
Authors (1): Andrei T. Patrascu
We introduce a novel learning and planning framework that replaces traditional reward-based optimisation with constructive logical inference. In our model, actions, transitions, and goals are represented as logical propositions, and decision-making proceeds by building constructive proofs under intuitionistic logic. This method ensures that state transitions and policies are accepted only when supported by verifiable preconditions – eschewing probabilistic trial-and-error in favour of guaranteed logical validity. We implement a symbolic agent operating in a structured gridworld, where reaching a goal requires satisfying a chain of intermediate subgoals (e.g., collecting keys to open doors), each governed by logical constraints. Unlike conventional reinforcement learning agents, which require extensive exploration and suffer from unsafe or invalid transitions, our constructive agent builds a provably correct plan through goal chaining, condition tracking, and knowledge accumulation. Empirical comparison with Q-learning demonstrates that our method achieves perfect safety, interpretable behaviour, and efficient convergence with no invalid actions, highlighting its potential for safe planning, symbolic cognition, and trustworthy AI. This work presents a new direction for reinforcement learning grounded not in numeric optimisation, but in constructive logic and proof theory.
Article 2064
Title@2025-06-05 (4): On The Sample Complexity Bounds In Bilevel Reinforcement Learning
Title: On The Sample Complexity Bounds In Bilevel Reinforcement Learning | Auf der Probe Komplexität Grenzen im Zwei-Ebenen-Verstärkungs-Lernen | 关于在双级强化学习中的抽样复杂度交错 2503.17644v3 |
Authors (5): Mudit Gaur, Utsav Singh, Amrit Singh Bedi, Raghu Pasupathu, Vaneet Aggarwal
Bilevel reinforcement learning (BRL) has emerged as a powerful framework for aligning generative models, yet its theoretical foundations, especially sample complexity bounds, remain underexplored. In this work, we present the first sample complexity bound for BRL, establishing a rate of $\mathcal{O}(\epsilon^{-3})$ in continuous state-action spaces. Traditional MDP analysis techniques do not extend to BRL due to its nested structure and non-convex lower-level problems. We overcome these challenges by leveraging the Polyak-Łojasiewicz (PL) condition and the MDP structure to obtain closed-form gradients, enabling tight sample complexity analysis. Our analysis also extends to general bi-level optimization settings with non-convex lower levels, where we achieve state-of-the-art sample complexity results of $\mathcal{O}(\epsilon^{-3})$, improving upon existing bounds of $\mathcal{O}(\epsilon^{-6})$. Additionally, we address the computational bottleneck of hypergradient estimation by proposing a fully first-order, Hessian-free algorithm suitable for large-scale problems.
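For readers unfamiliar with the PL condition, its standard textbook form for a minimization problem is shown below. The symbols $f$, $f^\star$, and $\mu$ are introduced here for illustration; the exact form used in the paper (for example, for a maximized lower-level objective) may differ.

```latex
% A differentiable function f with minimum value f^\star satisfies the
% Polyak-Lojasiewicz (PL) condition with constant \mu > 0 if, for all x,
\[
  \tfrac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu \bigl( f(x) - f^\star \bigr).
\]
% PL does not require convexity, yet it implies that every stationary point
% is a global minimizer, which is why it is useful for non-convex lower levels.
```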
Article 2065
Title@2025-06-05 (4): MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation
Title: MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation | MAVL: Ein mehrsprachiger Audio-Video-Text Datensatz für animierte Song-Übersetzung | MAVL: 动动歌曲翻译多语种视听歌词数据集 2505.18614v2 |
Authors (4): Woohyun Cho, Youngmin Kim, Sunghyun Lee, Youngjae Yu
Lyrics translation requires both accurate semantic transfer and preservation of musical rhythm, syllabic structure, and poetic style. In animated musicals, the challenge intensifies due to alignment with visual and auditory cues. We introduce Multilingual Audio-Video Lyrics Benchmark for Animated Song Translation (MAVL), the first multilingual, multimodal benchmark for singable lyrics translation. By integrating text, audio, and video, MAVL enables richer and more expressive translations than text-only approaches. Building on this, we propose Syllable-Constrained Audio-Video LLM with Chain-of-Thought (SylAVL-CoT), which leverages audio-video cues and enforces syllabic constraints to produce natural-sounding lyrics. Experimental results demonstrate that SylAVL-CoT significantly outperforms text-based models in singability and contextual accuracy, emphasizing the value of multimodal, multilingual approaches for lyrics translation.
Article 2066
Title@2025-06-05 (4): Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos
Title: Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos | Ungewissheitsbasiertes Extensibles Codebook für diskretes Federated Learning in heterogenen Datensilos | 以不确定性为基础的异异种数据过滤器中分辨联邦学习扩展守则手册 2402.18888v4 |
Authors (3): Tianyi Zhang, Yu Cao, Dianbo Liu
Federated learning (FL), aimed at leveraging vast distributed datasets, confronts a crucial challenge: the heterogeneity of data across different silos. While previous studies have explored discrete representations to enhance model generalization across minor distributional shifts, these approaches often struggle to adapt to new data silos with significantly divergent distributions. In response, we have identified that models derived from FL exhibit markedly increased uncertainty when applied to data silos with unfamiliar distributions. Consequently, we propose an innovative yet straightforward iterative framework, termed Uncertainty-Based Extensible-Codebook Federated Learning (UEFL). This framework dynamically maps latent features to trainable discrete vectors, assesses the uncertainty, and specifically extends the discretization dictionary or codebook for silos exhibiting high uncertainty. Our approach aims to simultaneously enhance accuracy and reduce uncertainty by explicitly addressing the diversity of data distributions, all while maintaining minimal computational overhead in environments characterized by heterogeneous data silos. Extensive experiments across multiple datasets demonstrate that UEFL outperforms state-of-the-art methods, achieving significant improvements in accuracy (by 3%–22.1%) and uncertainty reduction (by 38.83%–96.24%). The source code is available at https://github.com/destiny301/uefl.
Article 2067
Title@2025-06-05 (4): Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
Title: Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? | Können Wasserzeichen große Sprachmodelle die urheberrechtlich geschützte Textgenerierung verhindern und Trainingsdaten ausblenden? | 水标记大语言模型能否防止有版权的文本生成和隐藏培训数据? 2407.17417v3 |
Authors (9): Michael-Andrei Panaitescu-Liess, Zora Che, Bang An, Yuancheng Xu, Pankayaraj Pathmanathan, Souradip Chakraborty, Sicheng Zhu, Tom Goldstein, Furong Huang
Large Language Models (LLMs) have demonstrated impressive capabilities in generating diverse and contextually rich text. However, concerns regarding copyright infringement arise as LLMs may inadvertently produce copyrighted material. In this paper, we first investigate the effectiveness of watermarking LLMs as a deterrent against the generation of copyrighted texts. Through theoretical analysis and empirical evaluation, we demonstrate that incorporating watermarks into LLMs significantly reduces the likelihood of generating copyrighted content, thereby addressing a critical concern in the deployment of LLMs. However, we also find that watermarking can have unintended consequences on Membership Inference Attacks (MIAs), which aim to discern whether a sample was part of the pretraining dataset and may be used to detect copyright violations. Surprisingly, we find that watermarking adversely affects the success rate of MIAs, complicating the task of detecting copyrighted text in the pretraining dataset. These results reveal the complex interplay between different regulatory measures, which may impact each other in unforeseen ways. Finally, we propose an adaptive technique to improve the success rate of a recent MIA under watermarking. Our findings underscore the importance of developing adaptive methods to study critical problems in LLMs with potential legal implications.
Article 2068
Title@2025-06-05 (4): Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning
Title: Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning | Bedauern-Optimal Q-Lernen mit niedrigen Kosten für ein-Agenten und Federated Verstärkungs-Lernen | 单一机构和联邦强化学习的低费用学习 2506.04626v1 |
Authors (3): Haochen Zhang, Zhong Zheng, Lingzhou Xue
Motivated by real-world settings where data collection and policy deployment – whether for a single agent or across multiple agents – are costly, we study the problem of on-policy single-agent reinforcement learning (RL) and federated RL (FRL) with a focus on minimizing burn-in costs (the sample sizes needed to reach near-optimal regret) and policy switching or communication costs. In parallel finite-horizon episodic Markov Decision Processes (MDPs) with $S$ states and $A$ actions, existing methods either require superlinear burn-in costs in $S$ and $A$ or fail to achieve logarithmic switching or communication costs. We propose two novel model-free RL algorithms – Q-EarlySettled-LowCost and FedQ-EarlySettled-LowCost – that are the first in the literature to simultaneously achieve: (i) the best near-optimal regret among all known model-free RL or FRL algorithms, (ii) low burn-in cost that scales linearly with $S$ and $A$, and (iii) logarithmic policy switching cost for single-agent RL or communication cost for FRL. Additionally, we establish gap-dependent theoretical guarantees for both regret and switching/communication costs, improving or matching the best-known gap-dependent bounds.
Article 2069
Title@2025-06-05 (4): Static Word Embeddings for Sentence Semantic Representation
Title: Static Word Embeddings for Sentence Semantic Representation | Statische Wort-Einbettungen für Satz semantische Darstellung | 判决语义代表的静态单词嵌入 2506.04624v1 |
Authors (5): Takashi Wada, Yuki Hirakawa, Ryotaro Shimizu, Takahiro Kawashima, Yuki Saito
We propose new static word embeddings optimised for sentence semantic representation. We first extract word embeddings from a pre-trained Sentence Transformer, and improve them with sentence-level principal component analysis, followed by either knowledge distillation or contrastive learning. During inference, we represent sentences by simply averaging word embeddings, which requires little computational cost. We evaluate models on both monolingual and cross-lingual tasks and show that our model substantially outperforms existing static models on sentence semantic tasks, and even rivals a basic Sentence Transformer model (SimCSE) on some data sets. Lastly, we perform a variety of analyses and show that our method successfully removes word embedding components that are irrelevant to sentence semantics, and adjusts the vector norms based on the influence of words on sentence semantics.
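The inference step described above (averaging static word embeddings) is simple enough to sketch in a few lines. The embedding table, its values, and the function names below are made up for illustration; the paper's actual embeddings are distilled from a Sentence Transformer, which is not reproduced here.

```python
import math


def sentence_embedding(tokens, table):
    """Average the static vectors of all tokens found in `table` (unknown tokens are skipped)."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        raise ValueError("no known tokens in sentence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 2-dimensional table (hypothetical values).
table = {"cats": [1.0, 0.0], "purr": [0.0, 1.0], "dogs": [1.0, 0.0]}
s1 = sentence_embedding(["cats", "purr"], table)
s2 = sentence_embedding(["dogs", "bark"], table)  # "bark" is out of vocabulary
print(s1, s2, cosine(s1, s2))
```

Because the vector norms carry weight in a plain average, a word with a larger norm influences the sentence vector more, which matches the abstract's remark that the method adjusts norms according to each word's influence on sentence semantics.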
Article 2070
Title@2025-06-05 (4): Solving Inverse Problems via Diffusion-Based Priors: An Approximation-Free Ensemble Sampling Approach
Title: Solving Inverse Problems via Diffusion-Based Priors: An Approximation-Free Ensemble Sampling Approach | Inverse Probleme durch Diffusionsbasierte Priorien lösen: Ein annäherungsfreier Ensemble-Probenansatz | 通过以传播为基础的前例解决反向问题:一种近似-无聚合集成样样样的方法 2506.03979v2 |
Authors (5): Haoxuan Chen, Yinuo Ren, Martin Renqiang Min, Lexing Ying, Zachary Izzo
Diffusion models (DMs) have proven to be effective in modeling high-dimensional distributions, leading to their widespread adoption for representing complex priors in Bayesian inverse problems (BIPs). However, current DM-based posterior sampling methods proposed for solving common BIPs rely on heuristic approximations to the generative process. To exploit the generative capability of DMs and avoid such approximations, we propose an ensemble-based algorithm that performs posterior sampling without the use of heuristic approximations. Our algorithm is motivated by existing works that combine DM-based methods with the sequential Monte Carlo (SMC) method. By examining how the prior evolves through the diffusion process encoded by the pre-trained score function, we derive a modified partial differential equation (PDE) governing the evolution of the corresponding posterior distribution. This PDE includes a modified diffusion term and a reweighting term, which can be simulated via stochastic weighted particle methods. Theoretically, we prove that the error relative to the true posterior distribution can be bounded in terms of the training error of the pre-trained score function and the number of particles in the ensemble. Empirically, we validate our algorithm on several inverse problems in imaging to show that our method gives more accurate reconstructions compared to existing DM-based methods.
Article 2071
Title@2025-06-05 (4): Proactive Model Adaptation Against Concept Drift for Online Time Series Forecasting
Title: Proactive Model Adaptation Against Concept Drift for Online Time Series Forecasting | Proaktive Modellanpassung gegen Konzept Drift für Online-Zeitreihenprognosen | 针对在线时间序列预测的 “ 动态动态 “ 概念的前瞻性模型适应 2412.08435v4 |
Authors (2): Lifan Zhao, Yanyan Shen
Time series forecasting always faces the challenge of concept drift, where data distributions evolve over time, leading to a decline in forecast model performance. Existing solutions are based on online learning, which continually organizes recent time series observations as new training samples and updates model parameters according to the forecasting feedback on recent data. However, they overlook a critical issue: the ground-truth future values of each sample become available only after the forecast horizon has passed. This delay creates a temporal gap between the training samples and the test sample. Our empirical analysis reveals that the gap can introduce concept drift, causing forecast models to adapt to outdated concepts. In this paper, we present Proceed, a novel proactive model adaptation framework for online time series forecasting. Proceed first estimates the concept drift between the recently used training samples and the current test sample. It then employs an adaptation generator to efficiently translate the estimated drift into parameter adjustments, proactively adapting the model to the test sample. To enhance the generalization capability of the framework, Proceed is trained on diverse synthetic concept drifts. Extensive experiments on five real-world datasets across various forecast models demonstrate that Proceed brings more performance improvements than the state-of-the-art online learning methods, significantly facilitating forecast models’ resilience against concept drifts. Code is available at https://github.com/SJTU-DMTai/OnlineTSF.
Article 2072
Title@2025-06-05 (4): Efficiently Serving Large Multimodal Models Using EPD Disaggregation
Title: Efficiently Serving Large Multimodal Models Using EPD Disaggregation | Effizientes Servieren großer multimodaler Modelle mit EPD-Disaggregation | 利用EPD拆分有效服务大型多模式模式 2501.05460v3 |
Authors (12): Gursimran Singh, Xinglu Wang, Yifan Hu, Timothy Yu, Linzi Xing, Wei Jiang, Zhefeng Wang, Xiaolong Bai, Yi Li, Ying Xiong, Yong Zhang, Zhenan Fan
Large Multimodal Models (LMMs) extend Large Language Models (LLMs) by handling diverse inputs such as images, audio, and video, but at the cost of adding a multimodal encoding stage that increases both computational and memory overhead. This step negatively affects key Service Level Objectives (SLOs), such as time to first token (TTFT) and time per output token (TPOT). We introduce Encode-Prefill-Decode (EPD) Disaggregation, a novel framework that separates the encoding, prefill, and decode stages onto dedicated resources. Unlike current systems, which bundle encoding and prefill together, our approach decouples these steps, unlocking new opportunities and optimizations. These include a mechanism to cache multimedia tokens for efficient transfer, a novel way to parallelize the encoding load within a request, a module for optimal resource allocation for disaggregated serving, and a novel role-switching method to handle changing workload characteristics. Experimental evaluations with popular LMMs show substantial gains in memory efficiency (up to 15x lower peak memory utilization), batch sizes (up to 22x larger), 10x more images per request, and 2.2x larger KV caches. Furthermore, it leads to significant improvements in SLO attainment (up to 90-100% improvement) and TTFT (up to 71% reduction), compared to systems that do not disaggregate. The code is available at https://github.com/vbdi/epdserve.
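The two latency metrics named above can be computed from per-token timestamps. The sketch below uses the common definitions (TTFT as the delay until the first output token, TPOT as the mean gap between subsequent output tokens); the paper's exact measurement procedure is not given in the abstract, and the function names are hypothetical.

```python
def time_to_first_token(request_time, token_times):
    """TTFT: delay between request arrival and the first generated token."""
    return token_times[0] - request_time


def time_per_output_token(token_times):
    """TPOT: mean gap between consecutive output tokens after the first."""
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure TPOT")
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)


# Toy trace in seconds: request at t=0.0, tokens emitted at the times below.
token_times = [0.5, 0.75, 1.0, 1.25]
print(time_to_first_token(0.0, token_times))   # encoding + prefill latency
print(time_per_output_token(token_times))      # decode-stage latency per token
```

In a multimodal request, TTFT includes the encoding stage as well as prefill, which is why the abstract reports it separately when arguing for disaggregating the encoder.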
Article 2073
Title@2025-06-05 (4): Exploring bidirectional bounds for minimax-training of Energy-based models
Title: Exploring bidirectional bounds for minimax-training of Energy-based models | Erforschung bidirektionaler Grenzen für Minimax-Training von energiebasierten Modellen | 探讨能源模型小型培训的双向双向界限 2506.04609v1 |
Authors (6): Cong Geng, Jia Wang, Li Chen, Zhiyong Gao, Jes Frellsen, Søren Hauberg
Energy-based models (EBMs) estimate unnormalized densities in an elegant framework, but they are generally difficult to train. Recent work has linked EBMs to generative adversarial networks by noting that they can be trained through a minimax game using a variational lower bound. To avoid the instabilities caused by minimizing a lower bound, we propose to instead work with bidirectional bounds, meaning that we maximize a lower bound and minimize an upper bound when training the EBM. We investigate four different bounds on the log-likelihood derived from different perspectives. We derive lower bounds based on the singular values of the generator Jacobian and on mutual information. To upper bound the negative log-likelihood, we consider a gradient penalty-like bound, as well as one based on diffusion processes. In all cases, we provide algorithms for evaluating the bounds. We compare the different bounds to investigate the pros and cons of the different approaches. Finally, we demonstrate that the use of bidirectional bounds stabilizes EBM training and yields high-quality density estimation and sample generation.
Article 2074
Title@2025-06-05 (4): ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning
Title: ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning | ADG: Ambient Diffusion-Guided Dataset Recovery für Korruption-Robust Offline-Verstärkung Lernen | ADG: 为腐败 – – ROBT离线强化学习而进行渗透性传播、指导性数据集恢复 2505.23871v2 |
Authors (8): Zeyuan Liu, Zhihe Yang, Jiawei Xu, Rui Yang, Jiafei Lyu, Baoxiang Wang, Yunjian Xu, Xiu Li
Real-world datasets collected from sensors or human inputs are prone to noise and errors, posing significant challenges for applying offline reinforcement learning (RL). While existing methods have made progress in addressing corrupted actions and rewards, they remain insufficient for handling corruption in high-dimensional state spaces and for cases where multiple elements in the dataset are corrupted simultaneously. Diffusion models, known for their strong denoising capabilities, offer a promising direction for this problem, but their tendency to overfit noisy samples limits their direct applicability. To overcome this, we propose Ambient Diffusion-Guided Dataset Recovery (ADG), a novel approach that pioneers the use of diffusion models to tackle data corruption in offline RL. First, we introduce Ambient Denoising Diffusion Probabilistic Models (DDPM) from approximated distributions, which enable learning on partially corrupted datasets with theoretical guarantees. Second, we use the noise-prediction property of Ambient DDPM to distinguish between clean and corrupted data, and then use the clean subset to train a standard DDPM. Third, we employ the trained standard DDPM to refine the previously identified corrupted data, enhancing data quality for subsequent offline RL training. A notable strength of ADG is its versatility: it can be seamlessly integrated with any offline RL algorithm. Experiments on a range of benchmarks, including MuJoCo, Kitchen, and Adroit, demonstrate that ADG effectively mitigates the impact of corrupted data and improves the robustness of offline RL under various noise settings, achieving state-of-the-art results.
Article 2075
Title@2025-06-05 (4): Ignoring Directionality Leads to Compromised Graph Neural Network Explanations
Title: Ignoring Directionality Leads to Compromised Graph Neural Network Explanations | Ignorieren der Richtungalität führt zu kompromissierten Graphen-Neural-Netzwerk-Erklärungen | 忽略方向引导导致压缩的图形神经网络解释 2506.04608v1 |
Authors (3): Changsheng Sun, Xinke Li, Jin Song Dong
Graph Neural Networks (GNNs) are increasingly used in critical domains, where reliable explanations are vital for supporting human decision-making. However, the common practice of graph symmetrization discards directional information, leading to significant information loss and misleading explanations. Our analysis demonstrates how this practice compromises explanation fidelity. Through theoretical and empirical studies, we show that preserving directional semantics significantly improves explanation quality, ensuring more faithful insights for human decision-makers. These findings highlight the need for direction-aware GNN explainability in security-critical applications.
Article 2076
Title@2025-06-05 (4): MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball
Title: MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball | MVP-Shapley: Featurebasierte Modellierung für die Bewertung des wertvollsten Spielers im Basketball | MVP-Shaplay:评估篮球中最有价值的玩家的基于地物的模型模型 2506.04602v1 |
Authors (8): Haifeng Sun, Yu Xiong, Runze Wu, Kai Wang, Lan Zhang, Changjie Fan, Shaojie Tang, Xiang-Yang Li
The burgeoning growth of the esports and multiplayer online gaming community has highlighted the critical importance of evaluating the Most Valuable Player (MVP). Establishing an explainable and practical MVP evaluation method is very challenging. In our study, we specifically focus on play-by-play data, which records related events during the game, such as assists and points. We aim to address the challenges by introducing a new MVP evaluation framework, denoted as MVP-Shapley, which leverages Shapley values. This approach encompasses feature processing, win-loss model training, Shapley value allocation, and MVP ranking determination based on players’ contributions. Additionally, we optimize our algorithm to align with expert voting results from the perspective of causality. Finally, we substantiated the efficacy of our method through validation using the NBA dataset and the Dunk City Dynasty dataset and implemented online deployment in the industry.
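To make the Shapley value allocation step concrete, here is a minimal exact computation for a tiny cooperative game. This is a generic illustration, not the paper's pipeline: the `value` function stands in for the trained win-loss model, and the player names and contribution numbers are invented.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all join orders.

    `value` maps a frozenset of players to a real number (hypothetical stand-in
    for a win-probability model). Cost is O(n!), so this suits toy sizes only.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


# Toy additive game: each player's contribution is independent of teammates.
contribution = {"guard": 1.0, "forward": 2.0, "center": 3.0}
scores = shapley_values(list(contribution), lambda team: sum(contribution[p] for p in team))
mvp = max(scores, key=scores.get)
print(scores, mvp)
```

In an additive game the Shapley value of each player equals its own contribution, which makes the toy example easy to verify by hand; real play-by-play models have interactions between players, and practical systems typically approximate the average by sampling orderings.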
Article 2077
Title@2025-06-05 (4): Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets
Title: Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets | Skalierungsgesetze für einen robusten Vergleich von Open Foundation Language-Vision-Modellen und -Datensätzen | 开放基础语言-视觉模型和数据集强力比较法 2506.04598v1 |
Authors (7): Marianna Nezhurina, Tomer Porian, Giovanni Pucceti, Tommie Kerssies, Romain Beaumont, Mehdi Cherti, Jenia Jitsev
In studies of transferable learning, scaling laws are obtained for various important foundation models to predict their properties and performance at larger scales. We show here how scaling law derivation can also be used for model and dataset comparison, allowing one to decide which procedure is to be preferred for pre-training. For the first time, full scaling laws based on dense measurements across a wide span of model and samples-seen scales are derived for two important language-vision learning procedures, CLIP and MaMMUT, which use either a contrastive-only loss or a combined contrastive and captioning (text-generative) loss. Ensuring sufficient prediction accuracy for held-out points, we use the derived scaling laws to compare both models, obtaining evidence for MaMMUT’s stronger improvement with scale and better sample efficiency than standard CLIP. To strengthen the validity of the comparison, we show scaling laws for various downstream tasks (classification, retrieval, and segmentation) and for different open datasets (DataComp, DFN and Re-LAION), consistently observing the same trends. We show that the comparison can also be performed when deriving scaling laws with a constant learning rate schedule, reducing compute cost. Accurate derivation of scaling laws thus provides a means to perform model and dataset comparison across scale spans, avoiding misleading conclusions based on measurements from single reference scales only, and paving the road for systematic comparison and improvement of open foundation models and the datasets used to create them. We release all the pre-trained models with their intermediate checkpoints, including openMaMMUT-L/14, which achieves 80.3% zero-shot ImageNet-1k accuracy, trained on 12.8B samples from DataComp-1.4B. Code for reproducing the experiments in the paper and raw experiment data can be found at https://github.com/LAION-AI/scaling-laws-for-comparison.
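The idea of comparing two training procedures through fitted scaling laws, rather than at a single scale, can be sketched as follows. The abstract does not state the functional form used, so a plain power law $y = a\,x^{b}$ fitted by least squares in log-log space is assumed here, and the error-rate numbers are invented purely for illustration.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y); returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


def predict(a, b, x):
    """Extrapolate the fitted law to a new scale x."""
    return a * x ** b


# Hypothetical error rates at three compute scales for two procedures.
scales = [1.0, 10.0, 100.0]
a1, b1 = fit_power_law(scales, [2.0, 0.2, 0.02])    # steeper decay
a2, b2 = fit_power_law(scales, [1.0, 0.5, 0.25])    # better at small scale, flatter
print(predict(a1, b1, 1000.0), predict(a2, b2, 1000.0))
```

The toy numbers show why a single reference scale can mislead: the second procedure looks better at the smallest scale, while the fitted exponents indicate that the first one overtakes it as scale grows.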
Article 2078
Title@2025-06-05 (4): Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
Title: Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem | Lösen des vernünftigen Potenzials von vortrainierten LLMs durch Kritik Feinsteuerung auf ein Problem | 通过Critique对一个问题的微调,释放预先培训的LLMs的理据潜力 2506.03295v2 |
Authors (5): Yubo Wang, Ping Nie, Kai Zou, Lijun Wu, Wenhu Chen
We have witnessed that strong LLMs like Qwen-Math, MiMo, and Phi-4 possess immense reasoning potential inherited from the pre-training stage. With reinforcement learning (RL), these models can improve dramatically on reasoning tasks. Recent studies have shown that even RL on a single problem can unleash these models’ reasoning capabilities. However, RL is not only expensive but also unstable. Even one-shot RL requires hundreds of GPU hours. This raises a critical question: Is there a more efficient way to unleash the reasoning potential of these powerful base LLMs? In this work, we demonstrate that Critique Fine-Tuning (CFT) on only one problem can effectively unleash the reasoning potential of LLMs. Our method constructs critique data by collecting diverse model-generated solutions to a single problem and using teacher LLMs to provide detailed critiques. We fine-tune Qwen and Llama family models, ranging from 1.5B to 14B parameters, on the CFT data and observe significant performance gains across diverse reasoning tasks. For example, with just 5 GPU hours of training, Qwen-Math-7B-CFT shows an average improvement of 15% on six math benchmarks and 16% on three logic reasoning benchmarks. These results are comparable to, or even surpass, the results from RL while using 20x less compute. Ablation studies reveal the robustness of one-shot CFT across different prompt problems. These results highlight one-shot CFT as a simple, general, and compute-efficient approach to unleashing the reasoning capabilities of modern LLMs.
Article 2079
Title@2025-06-05 (4): Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable
Title: Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable | Sicherheitssteuer: Sicherheitsausrichtung macht Ihre großen vernünftigen Modelle weniger vernünftig | 安全税:安全协调使你的大理由模型降低合理性 2503.00555v2 |
Authors (7): Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Zachary Yahn, Yichang Xu, Ling Liu
Safety alignment is an important procedure before the official deployment of a Large Language Model (LLM). While safety alignment has been extensively studied for LLMs, there is still a large research gap for Large Reasoning Models (LRMs), which are equipped with improved reasoning capability. In this paper, we systematically examine a simplified pipeline for producing safety-aligned LRMs. With our evaluation of various LRMs, we deliver two main findings: i) Safety alignment can be applied to an LRM to restore its safety capability. ii) Safety alignment leads to a degradation of the reasoning capability of LRMs. The two findings show that there exists a trade-off between reasoning and safety capability with the sequential LRM production pipeline. The discovered trade-off, which we name Safety Tax, should shed light on future endeavors of safety research on LRMs. As a by-product, we curate a dataset called DirectRefusal, which might serve as an alternative dataset for safety alignment. Our source code is available at https://github.com/git-disl/Safety-Tax.
Article 2080
Title@2025-06-05 (4): Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification
Title: Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification | Sicher: Mathematische Reasoning in großen Sprachmodellen durch retrospektive stufenweise Verifizierung verbessern | 安全:通过回溯性逐步认识正式核查,加强大语言模型中的数学理由 2506.04592v1 |
Authors (10): Chengwu Liu, Ye Yuan, Yichun Yin, Yan Xu, Xin Xu, Zaoyu Chen, Yasheng Wang, Lifeng Shang, Qun Liu, Ming Zhang
Chain-of-Thought (CoT) prompting has become the de facto method to elicit reasoning capabilities from large language models (LLMs). However, to mitigate hallucinations in CoT that are notoriously difficult to detect, current methods such as process reward models (PRMs) or self-consistency operate as opaque boxes and do not provide checkable evidence for their judgments, possibly limiting their effectiveness. To address this issue, we draw inspiration from the idea that “the gold standard for supporting a mathematical claim is to provide a proof”. We propose a retrospective, step-aware formal verification framework $Safe$. Rather than assigning arbitrary scores, we strive to articulate mathematical claims in formal mathematical language Lean 4 at each reasoning step and provide formal proofs to identify hallucinations. We evaluate our framework $Safe$ across multiple language models and various mathematical datasets, demonstrating a significant performance improvement while offering interpretable and verifiable evidence. We also propose $FormalStep$ as a benchmark for step correctness theorem proving with $30,809$ formal statements. To the best of our knowledge, our work represents the first endeavor to utilize formal mathematical language Lean 4 for verifying natural language content generated by LLMs, aligning with the reason why formal mathematical languages were created in the first place: to provide a robust foundation for hallucination-prone human-written proofs.
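To make the idea of step-level formal checking concrete, here is a toy Lean 4 sketch. It is not taken from the paper or from its benchmark: the arithmetic claims are invented for illustration, and a real pipeline would need an auto-formalization step to produce such statements from natural-language reasoning.

```lean
-- A reasoning step claiming "12 * 3 + 4 = 40" becomes a checkable statement.
example : 12 * 3 + 4 = 40 := by decide

-- A hallucinated step claiming "12 * 3 + 4 = 42" cannot be proved;
-- instead, its negation can be.
example : ¬ (12 * 3 + 4 = 42) := by decide
```

The point of the sketch is that each step yields either a proof or a failure to prove, which is evidence a reader can check, unlike an opaque score.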
Article 2081
Title@2025-06-05 (4): Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret
Title: Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret | Thompson Sampling und UCB verbinden: Auf dem Weg zu effizienteren Kompromissen zwischen Datenschutz und Bedauern | 连接Thompson抽样和UCB:实现更高效的隐私和遗憾之间的权衡取舍 2505.02383v2 |
Authors (5): Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias Lécuyer, Nidhi Hegde
We address differentially private stochastic bandit problems by exploring the deep connections among Thompson Sampling with Gaussian priors, Gaussian mechanisms, and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private bandit algorithm that enables a trade-off between privacy and regret. DP-TS-UCB satisfies $ \tilde{O} \left(T^{0.25(1-\alpha)}\right)$-GDP and enjoys an $O \left(K\ln^{\alpha+1}(T)/\Delta \right)$ regret bound, where $\alpha \in [0,1]$ controls the trade-off between privacy and regret. Theoretically, our DP-TS-UCB relies on anti-concentration bounds of Gaussian distributions and links exploration mechanisms in Thompson Sampling-based algorithms and Upper Confidence Bound-based algorithms, which may be of independent interest.
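Substituting the two endpoints of $\alpha$ into the stated bounds shows how the trade-off behaves. This is plain substitution into the expressions quoted in the abstract, with no additional result assumed; recall that a smaller GDP parameter means a stronger privacy guarantee.

```latex
\begin{align*}
\alpha = 0:&\quad \tilde{O}\!\left(T^{0.25}\right)\text{-GDP},
  &\text{regret } O\!\left(K\ln(T)/\Delta\right) \\
\alpha = 1:&\quad \tilde{O}\!\left(T^{0}\right) = \tilde{O}(1)\text{-GDP},
  &\text{regret } O\!\left(K\ln^{2}(T)/\Delta\right)
\end{align*}
```

Moving from $\alpha = 0$ to $\alpha = 1$ therefore removes the polynomial growth in $T$ of the GDP parameter at the cost of one extra $\ln(T)$ factor in the regret bound.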
Article 2082
Title@2025-06-05 (4): Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
Title: Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization | Adressierung von Konzept-Mislabeling in Konzept-Bottleneck-Modellen durch Preference-Optimierung | 通过优先优化处理概念瓶颈模式中的概念误贴标签问题 2504.18026v3 |
Authors (4): Emiliano Penaloza, Tianyue H. Zhang, Laurent Charlin, Mateo Espinosa Zarlenga
Concept Bottleneck Models (CBMs) propose to enhance the trustworthiness of AI systems by constraining their decisions on a set of human-understandable concepts. However, CBMs typically assume that datasets contain accurate concept labels, an assumption often violated in practice, which we show can significantly degrade performance (by 25% in some cases). To address this, we introduce the Concept Preference Optimization (CPO) objective, a new loss function based on Direct Preference Optimization, which effectively mitigates the negative impact of concept mislabeling on CBM performance. We provide an analysis of key properties of the CPO objective, showing it directly optimizes for the concept’s posterior distribution, and contrast it against Binary Cross Entropy (BCE), demonstrating that CPO is inherently less sensitive to concept noise. We empirically confirm our analysis by finding that CPO consistently outperforms BCE on three real-world datasets, both with and without added label noise. We make our code available on GitHub.
Article 2083
Title@2025-06-05 (4): AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment
Title: AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment | AdvBDGen: Adversarially Fortified Prompt-Spezifische Fuzzy Backdoor Generator gegen LLM Ausrichtung | AdvBDGen:抗LLM对齐的反增强的快速特制烟雾后门发电机 2410.11283v3 |
Authors (4): Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess, Furong Huang
With the growing adoption of reinforcement learning with human feedback (RLHF) for aligning large language models (LLMs), the risk of backdoor installation during alignment has increased, leading to unintended and harmful behaviors. Existing backdoor triggers are typically limited to fixed word patterns, making them detectable during data cleaning and easily removable post-poisoning. In this work, we explore the use of prompt-specific paraphrases as backdoor triggers, enhancing their stealth and resistance to removal during LLM alignment. We propose AdvBDGen, an adversarially fortified generative fine-tuning framework that automatically generates prompt-specific backdoors that are effective, stealthy, and transferable across models. AdvBDGen employs a generator-discriminator pair, fortified by an adversary, to ensure the installability and stealthiness of backdoors. It enables the crafting and successful installation of complex triggers using as little as 3% of the fine-tuning data. Once installed, these backdoors can jailbreak LLMs during inference, demonstrate improved stability against perturbations compared to traditional constant triggers, and are more challenging to remove. These findings underscore an urgent need for the research community to develop more robust defenses against adversarial backdoor threats in LLM alignment.
Article 2084
Title@2025-06-05 (4): Learning Hyperparameters via a Data-Emphasized Variational Objective
Title: Learning Hyperparameters via a Data-Emphasized Variational Objective | Hyperparameter über ein datenverstärktes Variationsziel lernen | 通过数据-按单位分列的变异目标学习超超参数 2502.01861v2 |
Authors (3): Ethan Harvey, Mikhail Petrov, Michael C. Hughes
When training large flexible models on limited data, avoiding overfitting is a practical concern. Common grid search or smarter search methods rely on expensive separate runs at each candidate hyperparameter while carving out a validation set that reduces available training data. In this paper, we consider direct gradient-based learning of regularization hyperparameters on the full training set via the evidence lower bound (“ELBo”) objective from Bayesian variational methods. We focus on scenarios where the model is over-parameterized for flexibility while the approximate posterior is chosen to be Gaussian with isotropic covariance for tractability, even though it cannot match the true posterior exactly. In such scenarios, we find the ELBo prioritizes posteriors that match the prior variance, which leads to severely underfitting the data. Instead, we recommend a data-emphasized ELBo that upweights the influence of the data likelihood relative to the prior. In Bayesian transfer learning of classifiers for text and images, our method reduces 88+ hour grid searches of past work to under 3 hours while delivering comparable accuracy. We further demonstrate how our approach enables efficient yet accurate approximations of Gaussian processes with learnable length-scale kernels.
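The contrast between the standard objective and a data-emphasized one can be written out as below. The notation is an assumption made for this sketch: $q$ is the approximate posterior, $\eta$ denotes the regularization hyperparameters of the prior, and $\kappa$ is a hypothetical name for the upweighting factor. The abstract does not state how the weight is chosen.

```latex
\begin{align*}
\text{ELBo:}\quad
  \mathcal{L}(q, \eta) &=
  \mathbb{E}_{q(\theta)}\!\left[\log p(y \mid \theta)\right]
  - \mathrm{KL}\!\left(q(\theta)\,\|\,p_{\eta}(\theta)\right) \\
\text{data-emphasized:}\quad
  \mathcal{L}_{\kappa}(q, \eta) &=
  \kappa\,\mathbb{E}_{q(\theta)}\!\left[\log p(y \mid \theta)\right]
  - \mathrm{KL}\!\left(q(\theta)\,\|\,p_{\eta}(\theta)\right),
  \qquad \kappa > 1
\end{align*}
```

Setting $\kappa = 1$ recovers the standard ELBo; a larger $\kappa$ makes the likelihood term weigh more against the KL term, which is what "upweights the influence of the data likelihood relative to the prior" describes.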
Article 2085
Title@2025-06-05 (4): Contrastive Visual Data Augmentation
Title: Contrastive Visual Data Augmentation | Kontrastive Bilddatenvergrößerung | 对比视觉数据增强 2502.17709v2 |
Authors (9): Yu Zhou, Bingxuan Li, Mohan Tang, Xiaomeng Jin, Te-Lin Wu, Kuan-Hao Huang, Heng Ji, Kai-Wei Chang, Nanyun Peng
Large multimodal models (LMMs) often struggle to recognize novel concepts, as they rely on pre-trained knowledge and have limited ability to capture subtle visual details. Domain-specific knowledge gaps in training also make them prone to confusing visually similar, commonly misrepresented, or low-resource concepts. To help LMMs better align nuanced visual features with language, improving their ability to recognize and reason about novel or rare concepts, we propose a Contrastive visual Data Augmentation (CoDA) strategy. CoDA extracts key contrastive textual and visual features of target concepts against the known concepts they are misrecognized as, and then uses multimodal generative models to produce targeted synthetic data. Automatic filtering of extracted features and augmented images is implemented to guarantee their quality, as verified by human annotators. We show the effectiveness and efficiency of CoDA on low-resource concept and diverse scene recognition datasets including iNaturalist and SUN. We additionally collect NovelSpecies, a benchmark dataset consisting of newly discovered animal species that are guaranteed to be unseen by LMMs. LLaVA-1.6 1-shot updating results on these three datasets show that CoDA significantly improves over SOTA visual data augmentation strategies, with absolute accuracy gains of 12.3% (NovelSpecies), 5.1% (SUN), and 6.0% (iNat).
Article 2086
Title@2025-06-05 (4): StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation
Title: StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation | StatsMerging: Statistik-geführtes Modell Zusammenführen über Task-Spezifische Lehrer-Destillation | StatsMerging:通过特定任务教师蒸馏将统计指导模型合并 2506.04567v1 |
Authors (3): Ranjith Merugu, Bryan Bo Cao, Shubham Jain
Model merging has emerged as a promising solution to accommodate multiple large models within constrained memory budgets. We present StatsMerging, a novel lightweight learning-based model merging method guided by weight distribution statistics without requiring ground truth labels or test samples. StatsMerging offers three key advantages: (1) It uniquely leverages singular values from singular value decomposition (SVD) to capture task-specific weight distributions, serving as a proxy for task importance to guide task coefficient prediction; (2) It employs a lightweight learner StatsMergeLearner to model the weight distributions of task-specific pre-trained models, improving generalization and enhancing adaptation to unseen samples; (3) It introduces Task-Specific Teacher Distillation for merging vision models with heterogeneous architectures, a merging learning paradigm that avoids costly ground-truth labels by task-specific teacher distillation. Notably, we present two types of knowledge distillation, (a) distilling knowledge from task-specific models to StatsMergeLearner; and (b) distilling knowledge from models with heterogeneous architectures prior to merging. Extensive experiments across eight tasks demonstrate the effectiveness of StatsMerging. Our results show that StatsMerging outperforms state-of-the-art techniques in terms of overall accuracy, generalization to unseen tasks, and robustness to image quality variations.
Article 2087
Title@2025-06-05 (4): Clustering and Median Aggregation Improve Differentially Private Inference
Title: Clustering and Median Aggregation Improve Differentially Private Inference | Clustering und Median Aggregation verbessern unterschiedliche private Schlussfolgerungen | 群集和中中聚合改善差别私人推断 2506.04566v1 |
Authors (7): Kareem Amin, Salman Avestimehr, Sara Babakniya, Alex Bie, Weiwei Kong, Natalia Ponomareva, Umar Syed
Differentially private (DP) language model inference is an approach for generating private synthetic text. A sensitive input example is used to prompt an off-the-shelf large language model (LLM) to produce a similar example. Multiple examples can be aggregated together to formally satisfy the DP guarantee. Prior work creates inference batches by sampling sensitive inputs uniformly at random. We show that uniform sampling degrades the quality of privately generated text, especially when the sensitive examples concern heterogeneous topics. We remedy this problem by clustering the input data before selecting inference batches. Next, we observe that clustering also leads to more similar next-token predictions across inferences. We use this insight to introduce a new algorithm that aggregates next token statistics by privately computing medians instead of averages. This approach leverages the fact that the median has decreased local sensitivity when next token predictions are similar, allowing us to state a data-dependent and ex-post DP guarantee about the privacy properties of this algorithm. Finally, we demonstrate improvements in terms of representativeness metrics (e.g., MAUVE) as well as downstream task performance. We show that our method produces high-quality synthetic data at significantly lower privacy cost than a previous state-of-the-art method.
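The intuition behind preferring medians when next-token predictions are similar can be illustrated with a small, non-private sketch. It only compares how far the mean and the median move when one input in a batch is replaced; it adds no noise and is not the paper's algorithm. The score values are invented for illustration.

```python
from statistics import mean, median

def shift_after_replacement(values, index, new_value, aggregate):
    """How far an aggregate moves when one input is swapped for another value."""
    changed = list(values)
    changed[index] = new_value
    return abs(aggregate(changed) - aggregate(values))

# Hypothetical scores for one candidate token across a well-clustered batch.
scores = [0.50, 0.52, 0.51, 0.49, 0.53]

# Replace one example with an extreme score of 1.0 and compare the two aggregates.
mean_shift = shift_after_replacement(scores, 0, 1.0, mean)
median_shift = shift_after_replacement(scores, 0, 1.0, median)
```

When the inputs are tightly clustered, the median moves only as far as the gap to a neighbouring value, while the mean moves by the size of the change divided by the batch size. That gap is what a data-dependent guarantee can exploit.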
Article 2088
Title@2025-06-05 (4): ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
Title: ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration | ReCalKV: Low-Rank KV Cache-Kompression über Head Reordering und Offline-Kalibrierung | ReCalKV: 通过头重排序和离线校准进行低Rank KV缓存压缩 2505.24357v2 |
Authors (6): Xianglong Yan, Zhiteng Li, Tianao Zhang, Linghe Kong, Yulun Zhang, Xiaokang Yang
Large language models (LLMs) have achieved remarkable performance, yet their capability on long-context reasoning is often constrained by the excessive memory required to store the Key-Value (KV) cache. This makes KV cache compression an essential step toward enabling efficient long-context reasoning. Recent methods have explored reducing the hidden dimensions of the KV cache, but many introduce additional computation through projection layers or suffer from significant performance degradation under high compression ratios. To address these challenges, we propose ReCalKV, a post-training KV cache compression method that reduces the hidden dimensions of the KV cache. We develop distinct compression strategies for Keys and Values based on their different roles and varying importance in the attention mechanism. For Keys, we propose Head-wise Similarity-aware Reordering (HSR), which clusters similar heads and applies grouped SVD to the key projection matrix, reducing additional computation while preserving accuracy. For Values, we propose Offline Calibration and Matrix Fusion (OCMF) to preserve accuracy without extra computational overhead. Experiments show that ReCalKV outperforms existing low-rank compression methods, achieving high compression ratios with minimal performance loss. The code and models will be available at: https://github.com/XIANGLONGYAN/ReCalKV.
Article 2089
Title@2025-06-05 (4): OpenThoughts: Data Recipes for Reasoning Models
Title: OpenThoughts: Data Recipes for Reasoning Models | OpenThoughts: Datenrezepte für sinnvolle Modelle | OpenThoughts: 用于解释模型的数据折录 2506.04178v2 |
Authors (50): Etash Guha, Ryan Marten, Sedrick Keh, Negin Raoof, Georgios Smyrnis, Hritik Bansal, Marianna Nezhurina, Jean Mercat, Trung Vu, Zayne Sprague, Ashima Suvarna, Benjamin Feuer, Liangyu Chen, Zaid Khan, Eric Frankel, Sachin Grover, Caroline Choi, Niklas Muennighoff, Shiye Su, Wanjia Zhao, John Yang, Shreyas Pimpalgaonkar, Kartik Sharma, Charlie Cheng-Jie Ji, Yichuan Deng, Sarah Pratt, Vivek Ramanujan, Jon Saad-Falcon, Jeffrey Li, Achal Dave, Alon Albalak, Kushal Arora, Blake Wulfe, Chinmay Hegde, Greg Durrett, Sewoong Oh, Mohit Bansal, Saadia Gabriel, Aditya Grover, Kai-Wei Chang, Vaishaal Shankar, Aaron Gokaslan, Mike A. Merrill, Tatsunori Hashimoto, Yejin Choi, Jenia Jitsev, Reinhard Heckel, Maheswaran Sathiamoorthy, Alexandros G. Dimakis, Ludwig Schmidt
Reasoning models have made rapid progress on many benchmarks involving math, code, and science. Yet, there are still many open questions about the best training recipes for reasoning since state-of-the-art models often rely on proprietary datasets with little to no public information available. To address this, the goal of the OpenThoughts project is to create open-source datasets for training reasoning models. After initial explorations, our OpenThoughts2-1M dataset led to OpenThinker2-32B, the first model trained on public reasoning data to match DeepSeek-R1-Distill-32B on standard reasoning benchmarks such as AIME and LiveCodeBench. We then improve our dataset further by systematically investigating each step of our data generation pipeline with 1,000+ controlled experiments, which led to OpenThoughts3. Scaling the pipeline to 1.2M examples and using QwQ-32B as teacher yields our OpenThoughts3-7B model, which achieves state-of-the-art results: 53% on AIME 2025, 51% on LiveCodeBench 06/24-01/25, and 54% on GPQA Diamond - improvements of 15.3, 17.2, and 20.5 percentage points compared to the DeepSeek-R1-Distill-Qwen-7B. All of our datasets and models are available on https://openthoughts.ai.
Article 2090
Title@2025-06-05 (4): Investigating Distributions of Telecom Adapted Sentence Embeddings for Document Retrieval
Title: Investigating Distributions of Telecom Adapted Sentence Embeddings for Document Retrieval | Untersuchung der Verteilung von Telecom-adaptierten Satz-Einbindungen für Dokumentenwiederherstellung | 用于文件检索的经调整的远程判刑嵌入表的调查分发情况 2406.12336v3 |
Authors (7): Sujoy Roychowdhury, Sumit Soman, Ranjani Hosakere Gireesha, Vansh Chhabra, Neeraj Gunda, Subhadip Bandyopadhyay, Sai Krishna Bala
A plethora of sentence embedding models makes it challenging to choose one, especially for technical domains rich with specialized vocabulary. In this work, we domain-adapt embeddings using telecom data for question answering. We evaluate embeddings obtained from publicly available models and their domain-adapted variants on both point retrieval accuracies and their 95% confidence intervals. We establish a systematic method to obtain thresholds for similarity scores for different embeddings. As expected, we observe that fine-tuning improves mean bootstrapped accuracies. We also observe that it results in tighter confidence intervals, which further improve when pre-training is preceded by fine-tuning. We introduce metrics which measure the distributional overlaps of top-$K$, correct, and random document similarities with the question. Further, we show that these metrics are correlated with retrieval accuracy and similarity thresholds. Recent literature shows conflicting effects of isotropy on retrieval accuracies. Our experiments establish that the isotropy of embeddings (as measured by two independent state-of-the-art isotropy metric definitions) is poorly correlated with retrieval performance. We show that embeddings for domain-specific sentences have little overlap with those for domain-agnostic ones, and fine-tuning moves them further apart. Based on our results, we provide recommendations for use of our methodology and metrics by researchers and practitioners.
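Reporting a bootstrapped mean accuracy together with a 95% confidence interval can be sketched as follows. This is a generic percentile bootstrap under assumptions chosen for illustration: `hits` is a hypothetical list of per-question outcomes (1 if the correct document was retrieved, 0 otherwise), and the number of resamples and the seed are arbitrary. The paper's exact protocol may differ.

```python
import random

def bootstrap_accuracy_ci(hits, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap mean accuracy and confidence interval."""
    rng = random.Random(seed)
    n = len(hits)
    # Resample the per-question outcomes with replacement and record each mean.
    means = sorted(sum(rng.choices(hits, k=n)) / n for _ in range(n_resamples))
    lower = means[int((1 - level) / 2 * n_resamples)]
    upper = means[int((1 + level) / 2 * n_resamples) - 1]
    return sum(means) / n_resamples, lower, upper

# Hypothetical evaluation: 80 of 100 questions retrieved the right document.
hits = [1] * 80 + [0] * 20
boot_mean, ci_low, ci_high = bootstrap_accuracy_ci(hits)
```

A tighter interval (`ci_high - ci_low`) for one embedding model than another indicates more stable retrieval accuracy under resampling, which is the comparison the abstract describes.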
Article 2091
Title@2025-06-05 (4): LEDRO: LLM-Enhanced Design Space Reduction and Optimization for Analog Circuits
Title: LEDRO: LLM-Enhanced Design Space Reduction and Optimization for Analog Circuits | LEDRO: LLM-Enhanced Design Raumreduzierung und Optimierung für analoge Schaltungen | LEDRO: LLM-加强设计空间减少和模拟电路优化 2411.12930v2 |
Authors (4): Dimple Vijay Kochar, Hanrui Wang, Anantha Chandrakasan, Xin Zhang
Traditional approaches for designing analog circuits are time-consuming and require significant human expertise. Existing automation efforts using methods like Bayesian Optimization (BO) and Reinforcement Learning (RL) are sub-optimal and costly to generalize across different topologies and technology nodes. In our work, we introduce a novel approach, LEDRO, utilizing Large Language Models (LLMs) in conjunction with optimization techniques to iteratively refine the design space for analog circuit sizing. LEDRO is highly generalizable compared to other RL and BO baselines, eliminating the need for design annotation or model training for different topologies or technology nodes. We conduct a comprehensive evaluation of our proposed framework and baseline on 22 different Op-Amp topologies across four FinFET technology nodes. Results demonstrate the superior performance of LEDRO as it outperforms our best baseline by an average of 13% FoM improvement with 2.15x speed-up on low complexity Op-Amps and 48% FoM improvement with 1.7x speed-up on high complexity Op-Amps. This highlights LEDRO’s effective performance, efficiency, and generalizability.
Article 2092
Title@2025-06-05 (4): What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Title: What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective | Was in LLM-Schichten passiert ist, wenn es um schnelles gegen langsames Denken geht: Eine gradiente Perspektive | 训练快速与慢思考:渐进视角时 LLM 图层中发生的情况 2410.23743v2 |
Authors (3): Ming Li, Yanhong Li, Tianyi Zhou
What makes a difference in the post-training of LLMs? We investigate the training patterns of different layers in large language models (LLMs) through the lens of the gradient. We are specifically interested in how fast vs. slow thinking affects the layer-wise gradients, given the recent popularity of training LLMs on reasoning paths such as chain-of-thoughts (CoT) and process rewards. In our study, fast thinking without CoT leads to larger gradients and larger differences of gradients across layers than slow thinking (Detailed CoT), indicating the learning stability brought by the latter. Additionally, we study whether the gradient patterns can reflect the correctness of responses when training different LLMs using slow vs. fast thinking paths. The results show that the gradients of slow thinking can distinguish correct and irrelevant reasoning paths. As a comparison, we conduct similar gradient analyses on non-reasoning knowledge learning tasks, on which, however, trivially increasing the response length does not lead to similar behaviors of slow thinking. Our study strengthens fundamental understandings of LLM training and sheds novel insights on its efficiency and stability, which pave the way towards building a generalizable System-2 agent. Our code, data, and gradient statistics can be found in: https://github.com/MingLiiii/Layer_Gradient.
Article 2093
Title@2025-06-05 (4): Non-linear Multi-objective Optimization with Probabilistic Branch and Bound
Title: Non-linear Multi-objective Optimization with Probabilistic Branch and Bound | Nichtlineare Multiobjektive Optimierung mit probabilistischem Zweig und Bound | 非线性多目标优化与概率分支和圆 2506.04554v1 |
Authors (2): Hao Huang, Zelda B. Zabinsky
A multiple objective simulation optimization algorithm named Multiple Objective Probabilistic Branch and Bound with Single Observation (MOPBnB(so)) is presented for approximating the Pareto optimal set and the associated efficient frontier for stochastic multi-objective optimization problems. MOPBnB(so) evaluates a noisy function exactly once at any solution and uses neighboring solutions to estimate the objective functions, in contrast to a variant that uses multiple replications at a solution to estimate the objective functions. A finite-time performance analysis for deterministic multi-objective problems provides a bound on the probability that MOPBnB(so) captures the Pareto optimal set. Asymptotic convergence of MOPBnB(so) on stochastic problems is derived, in that the algorithm captures the Pareto optimal set and the estimations converge to the true objective function values. Numerical results reveal that the variant with multiple replications is extremely intensive in terms of computational resources compared to MOPBnB(so). In addition, numerical results show that MOPBnB(so) outperforms a genetic algorithm NSGA-II on test problems.
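The object the algorithm approximates, the Pareto optimal set, can be made concrete with a short sketch. This shows only the definition of non-dominated points for a minimization problem over a finite list of candidates; it is not MOPBnB(so) itself, which works on a continuous domain with noisy evaluations. The sample points are invented for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points):
    """Keep the points that no other point dominates (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical objective vectors (f1, f2), both to be minimized.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
front = pareto_front(candidates)  # [(1, 5), (2, 2), (5, 1)]
```

Here `(3, 3)` and `(4, 4)` are dropped because `(2, 2)` is better in both objectives, while the three remaining points each win on at least one objective against every other point.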
Article 2094
Title@2025-06-05 (4): Unsupervised Machine Learning for Scientific Discovery: Workflow and Best Practices
Title: Unsupervised Machine Learning for Scientific Discovery: Workflow and Best Practices | Unüberwachtes maschinelles Lernen für wissenschaftliche Entdeckung: Workflow und Best Practices | 为科学发现进行无人监督的机器学习:工作流程和最佳做法 2506.04553v1 |
Authors (4): Andersen Chang, Tiffany M. Tang, Tarek M. Zikry, Genevera I. Allen
Unsupervised machine learning is widely used to mine large, unlabeled datasets to make data-driven discoveries in critical domains such as climate science, biomedicine, astronomy, chemistry, and more. However, despite its widespread utilization, there is a lack of standardization in unsupervised learning workflows for making reliable and reproducible scientific discoveries. In this paper, we present a structured workflow for using unsupervised learning techniques in science. We highlight and discuss best practices starting with formulating validatable scientific questions, conducting robust data preparation and exploration, using a range of modeling techniques, performing rigorous validation by evaluating the stability and generalizability of unsupervised learning conclusions, and promoting effective communication and documentation of results to ensure reproducible scientific discoveries. To illustrate our proposed workflow, we present a case study from astronomy, seeking to refine globular clusters of Milky Way stars based upon their chemical composition. Our case study highlights the importance of validation and illustrates how the benefits of a carefully-designed workflow for unsupervised learning can advance scientific discovery.
Article 2095
Title@2025-06-05 (4): Policy learning “without” overlap: Pessimism and generalized empirical Bernstein’s inequality
Title: Policy learning “without” overlap: Pessimism and generalized empirical Bernstein’s inequality | Politisches Lernen “ohne” überschneidet sich: Pessimismus und generalisierte empirische Ungleichheit Bernsteins | 政策学习“无重叠”重叠:悲观主义和普遍经验型伯恩斯坦的不平等 2212.09900v4 |
Authors (4): Ying Jin, Zhimei Ren, Zhuoran Yang, Zhaoran Wang
This paper studies offline policy learning, which aims at utilizing observations collected a priori (from either fixed or adaptively evolving behavior policies) to learn an optimal individualized decision rule that achieves the best overall outcomes for a given population. Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded. As one has no control over the data collection process, this assumption can be unrealistic in many situations, especially when the behavior policies are allowed to evolve over time with diminishing propensities for certain actions. In this paper, we propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) – instead of point estimates – of the policy values. The LCBs are constructed using knowledge of the behavior policies for collecting the offline data. Without assuming any uniform overlap condition, we establish a data-dependent upper bound for the suboptimality of our algorithm, which only depends on (i) the overlap for the optimal policy, and (ii) the complexity of the policy class we optimize over. As an implication, for adaptively collected data, we ensure efficient policy learning as long as the propensities for optimal actions are lower bounded over time, while those for suboptimal ones are allowed to diminish arbitrarily fast. In our theoretical analysis, we develop a new self-normalized type concentration inequality for inverse-propensity-weighting estimators, generalizing the well-known empirical Bernstein’s inequality to unbounded and non-i.i.d. data. We complement our theory with an efficient optimization algorithm via Majorization-Minimization and policy tree search, as well as extensive simulation studies and real-world applications that demonstrate the efficacy of PPL.
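The difference between optimizing a point estimate and optimizing a lower confidence bound can be sketched with a toy example. Several simplifications are assumed here: the policies are context-free (each always plays one action), the value estimate is plain inverse-propensity weighting, and the penalty is a simple standard-error term with a hypothetical multiplier `beta`. The paper's LCB is built from its generalized empirical Bernstein inequality, which this sketch does not reproduce.

```python
import math

def ipw_terms(logs, action):
    """Per-sample IPW terms for the policy that always plays `action`."""
    return [(reward / prop if logged == action else 0.0)
            for logged, reward, prop in logs]

def lower_confidence_bound(logs, action, beta=2.0):
    """IPW value estimate minus beta standard errors (a stand-in penalty)."""
    terms = ipw_terms(logs, action)
    n = len(terms)
    estimate = sum(terms) / n
    variance = sum((t - estimate) ** 2 for t in terms) / (n - 1)
    return estimate - beta * math.sqrt(variance / n)

def choose(logs, actions, beta):
    """Pick the action with the best (penalized) value; beta=0 is greedy."""
    return max(actions, key=lambda a: lower_confidence_bound(logs, a, beta))

# Hypothetical logged data: (action, reward, propensity of the logged action).
# Action "B" was rarely explored, so its estimate rests on two heavily weighted samples.
logs = [("A", 0.6, 0.9)] * 18 + [("B", 1.0, 0.1)] * 2

greedy_choice = choose(logs, ["A", "B"], beta=0.0)
pessimistic_choice = choose(logs, ["A", "B"], beta=2.0)
```

The greedy rule picks the poorly covered action because its point estimate is higher, while the pessimistic rule prefers the well-covered one. This mirrors why the abstract's guarantee depends only on the overlap for the optimal policy.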
Article 2096
Title@2025-06-05 (4): Communication Efficient Adaptive Model-Driven Quantum Federated Learning
Title: Communication Efficient Adaptive Model-Driven Quantum Federated Learning | Kommunikation Effizientes adaptives modellgetriebenes Quantum-Federiertes Lernen | 适应性强、适应性强的示范模式-量子联邦学习 2506.04548v1 |
Authors (2): Dev Gurung, Shiva Raj Pokhrel
Training with huge datasets and a large number of participating devices leads to bottlenecks in federated learning (FL). Furthermore, the challenges of heterogeneity between multiple FL clients affect the overall performance of the system. In a quantum federated learning (QFL) context, we address these three main challenges: i) training bottlenecks from massive datasets, ii) the involvement of a substantial number of devices, and iii) non-IID data distributions. We introduce a model-driven quantum federated learning algorithm (mdQFL) to tackle these challenges. Our proposed approach is efficient and adaptable to various factors, including different numbers of devices. To the best of our knowledge, it is the first to explore training and update personalization, as well as test generalization within a QFL setting, which can be applied to other FL scenarios. We evaluated the efficiency of the proposed mdQFL framework through extensive experiments under diverse non-IID data heterogeneity conditions using various datasets within the Qiskit environment. Our results demonstrate a nearly 50% decrease in total communication costs while maintaining or, in some cases, exceeding the accuracy of the final model and consistently improving local model training compared to the standard QFL baseline. Moreover, our experimental evaluation thoroughly explores the QFL and mdQFL algorithms, along with several influencing factors. In addition, we present a theoretical analysis to clarify the complexities of the proposed algorithm. The experimental code is available at 1.
Article 2097
Title@2025-06-05 (4): $K$-MSHC: Unmasking Minimally Sufficient Head Circuits in Large Language Models with Experiments on Syntactic Classification Tasks
Title: $K$-MSHC: Unmasking Minimally Sufficient Head Circuits in Large Language Models with Experiments on Syntactic Classification Tasks | $K$-MSHC: Entlarvung minimal ausreichender Kopfkreise in großen Sprachmodellen mit Experimenten zu syntaktischen Klassifizierungsaufgaben | USK美元-MSHC:在以同步分类任务实验的大型语言模型中,以最小化的足够大语言电路总电路 2505.12268v2 |
Authors (3): Pratim Chowdhary, Peter Chin, Deepernab Chakrabarty
Understanding which neural components drive specific capabilities in mid-sized language models ($\leq$10B parameters) remains a key challenge. We introduce the $(\bm{K}, \epsilon)$-Minimum Sufficient Head Circuit ($K$-MSHC), a methodology to identify minimal sets of attention heads crucial for classification tasks as well as Search-K-MSHC, an efficient algorithm for discovering these circuits. Applying our Search-K-MSHC algorithm to Gemma-9B, we analyze three syntactic task families: grammar acceptability, arithmetic verification, and arithmetic word problems. Our findings reveal distinct task-specific head circuits, with grammar tasks predominantly utilizing early layers, word problems showing pronounced activity in both shallow and deep regions, and arithmetic verification demonstrating a more distributed pattern across the network. We discover non-linear circuit overlap patterns, where different task pairs share computational components at varying levels of importance. While grammar and arithmetic share many “weak” heads, arithmetic and word problems share more consistently critical “strong” heads. Importantly, we find that each task maintains dedicated “super-heads” with minimal cross-task overlap, suggesting that syntactic and numerical competencies emerge from specialized yet partially reusable head circuits.
Article 2098
Title@2025-06-05 (4): ATLaS: Agent Tuning via Learning Critical Steps
Title: ATLaS: Agent Tuning via Learning Critical Steps | ATLaS: Agent Tuning über Learning Critical Steps | 通过学习关键步骤进行测试的代理 2503.02197v2 |
Authors (6): Zhixun Chen, Ming Li, Yuxuan Huang, Yali Du, Meng Fang, Tianyi Zhou
Large Language Model (LLM) agents have demonstrated remarkable generalization capabilities across multi-domain tasks. Existing agent tuning approaches typically employ supervised finetuning on entire expert trajectories. However, behavior-cloning of full trajectories can introduce expert bias and weaken generalization to states not covered by the expert data. Additionally, critical steps, such as planning, complex reasoning for intermediate subtasks, and strategic decision-making, are essential to success in agent tasks, so learning these steps is the key to improving LLM agents. For more effective and efficient agent tuning, we propose ATLaS that identifies the critical steps in expert trajectories and finetunes LLMs solely on these steps with reduced costs. By steering the training’s focus to a few critical steps, our method mitigates the risk of overfitting entire trajectories and promotes generalization across different environments and tasks. In extensive experiments, an LLM finetuned on only 30% critical steps selected by ATLaS outperforms the LLM finetuned on all steps and recent open-source LLM agents. ATLaS maintains and improves base LLM skills as generalist agents interacting with diverse environments.
Article 2099
Title@2025-06-05 (4): General Intelligence Requires Reward-based Pretraining
Title: General Intelligence Requires Reward-based Pretraining | Allgemeine Intelligenz erfordert eine prämienbasierte Vorausbildung | 以奖励为基础的训练前训练 2502.19402v2 |
Authors (4): Seungwook Han, Jyothish Pari, Samuel J. Gershman, Pulkit Agrawal
Large Language Models (LLMs) have demonstrated impressive real-world utility, exemplifying artificial useful intelligence (AUI). However, their ability to reason adaptively and robustly – the hallmarks of artificial general intelligence (AGI) – remains fragile. While LLMs seemingly succeed in commonsense reasoning, programming, and mathematics, they struggle to generalize algorithmic understanding across novel contexts. Our experiments with algorithmic tasks in esoteric programming languages reveal that LLMs’ reasoning overfits to the training data and is limited in its transferability. We hypothesize that the core issue underlying such limited transferability is the coupling of reasoning and knowledge in LLMs. To transition from AUI to AGI, we propose disentangling knowledge and reasoning through three key directions: (1) pretraining to reason using RL from scratch as an alternative to the widely used next-token prediction pretraining, (2) using a curriculum of synthetic tasks to ease the learning of a reasoning prior for RL that can then be transferred to natural language tasks, and (3) learning more generalizable reasoning functions using a small context window to reduce the exploitation of spurious correlations between tokens. Such a reasoning system coupled with a trained retrieval system and a large external memory bank as a knowledge store can overcome several limitations of existing architectures at learning to reason in novel scenarios.
Article 2100
Title@2025-06-05 (4): Combinatorial Reinforcement Learning with Preference Feedback
Title: Combinatorial Reinforcement Learning with Preference Feedback | Kombinatorisches Stärkungslernen mit Präferenz-Feedback | 结合强化学习与优先反馈 2502.10158v3 |
Authors (2): Joongkyu Lee, Min-hwan Oh
In this paper, we consider combinatorial reinforcement learning with preference feedback, where a learning agent sequentially offers an action (an assortment of multiple items) to a user, whose preference feedback follows a multinomial logistic (MNL) model. This framework allows us to model real-world scenarios, particularly those involving long-term user engagement, such as in recommender systems and online advertising. However, this framework faces two main challenges: (1) the unknown value of each item, unlike traditional MNL bandits that only address single-step preference feedback, and (2) the difficulty of ensuring optimism while maintaining tractable assortment selection in the combinatorial action space with unknown values. In this paper, we assume a contextual MNL preference model, where the mean utilities are linear, and the value of each item is approximated by a general function. We propose an algorithm, MNL-VQL, that addresses these challenges, making it both computationally and statistically efficient. As a special case, for linear MDPs (with the MNL preference feedback), we establish the first regret lower bound in this framework and show that MNL-VQL achieves nearly minimax-optimal regret. To the best of our knowledge, this is the first work to provide statistical guarantees in combinatorial RL with preference feedback.
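For readers unfamiliar with MNL feedback, the following is a minimal sketch of how choice probabilities over an offered assortment are typically computed. The outside ("no choice") option with utility 0 and the helper names `linear_utility` and `mnl_choice_probabilities` are assumptions for illustration; the abstract does not specify these details.

```python
import math

def linear_utility(features, theta):
    """Mean utility of one item under a linear contextual model (assumed form)."""
    return sum(f * t for f, t in zip(features, theta))

def mnl_choice_probabilities(utilities):
    """Return (item_probs, no_choice_prob) for an offered assortment.

    utilities: mean utilities of the offered items. The outside option is
    assumed to have utility 0, as is common in MNL bandit formulations.
    """
    m = max([0.0] + list(utilities))  # shift exponents for numerical stability
    weights = [math.exp(u - m) for u in utilities]
    outside = math.exp(-m)
    total = outside + sum(weights)
    return [w / total for w in weights], outside / total
```

Because every offered item competes with every other item and with the outside option, adding an item to the assortment changes all choice probabilities, which is one reason assortment selection is combinatorial.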
Article 2101
Title@2025-06-05 (4): hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation
Title: hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation | hdl2v: Ein Code-Übersetzungsdatensatz für verbesserte LLM Verilog-Generierung | hdl2v: 用于强化LLM Verilog 生成的代码翻译数据集 2506.04544v1 |
Authors (6): Charles Hong, Brendan Roberts, Huijae An, Alex Um, Advay Ratan, Yakun Sophia Shao
Large language models (LLMs) are playing an increasingly large role in domains such as code generation, including hardware code generation, where Verilog is the key language. However, the amount of publicly available Verilog code pales in comparison to the amount of code available for software languages like Python. In this work, we present hdl2v (“HDL-to-Verilog”), a dataset which seeks to increase the amount of available human-written Verilog data by translating or compiling three other hardware description languages - VHDL, Chisel, and PyMTL3 - to Verilog. Furthermore, we demonstrate the value of hdl2v in enhancing LLM Verilog generation by improving performance of a 32 billion-parameter open-weight model by up to 23% (pass@10) in VerilogEvalV2, without utilizing any data augmentation or knowledge distillation from larger models. We also show hdl2v’s ability to boost the performance of a data augmentation-based fine-tuning approach by 63%. Finally, we characterize and analyze our dataset to better understand which characteristics of HDL-to-Verilog datasets can be expanded upon in future work for even better performance.
Article 2102
Title@2025-06-05 (4): Neural MJD: Neural Non-Stationary Merton Jump Diffusion for Time Series Prediction
Title: Neural MJD: Neural Non-Stationary Merton Jump Diffusion for Time Series Prediction | Neurale MJD: Neurale nicht-stationäre Merton Jump Diffusion für Zeitreihenvorhersage | 神经MJD: 用于时间序列预测的神经非立体Merton跳跃扩散 2506.04542v1 |
Authors (4): Yuanpei Gao, Qi Yan, Yan Leng, Renjie Liao
While deep learning methods have achieved strong performance in time series prediction, their black-box nature and inability to explicitly model underlying stochastic processes often limit their generalization to non-stationary data, especially in the presence of abrupt changes. In this work, we introduce Neural MJD, a neural network based non-stationary Merton jump diffusion (MJD) model. Our model explicitly formulates forecasting as a stochastic differential equation (SDE) simulation problem, combining a time-inhomogeneous Itô diffusion to capture non-stationary stochastic dynamics with a time-inhomogeneous compound Poisson process to model abrupt jumps. To enable tractable learning, we introduce a likelihood truncation mechanism that caps the number of jumps within small time intervals and provide a theoretical error bound for this approximation. Additionally, we propose an Euler-Maruyama with restart solver, which achieves a provably lower error bound in estimating expected states and reduced variance compared to the standard solver. Experiments on both synthetic and real-world datasets demonstrate that Neural MJD consistently outperforms state-of-the-art deep learning and statistical learning methods.
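As background for the SDE view above, here is a minimal sketch of a standard Euler-Maruyama simulation of a classical Merton jump diffusion in log space. It assumes constant (not neural or time-varying) coefficients and at most one jump per step; it is the plain baseline solver, not the paper's restart variant. All parameter names are illustrative.

```python
import math
import random

def simulate_mjd_path(s0, mu, sigma, lam, jump_mean, jump_std,
                      horizon, n_steps, rng):
    """Simulate one Merton jump diffusion path with Euler-Maruyama steps.

    Log-jump sizes are Gaussian with mean jump_mean and std jump_std. At most
    one jump is allowed per step (probability lam * dt), so lam * dt <= 1.
    """
    dt = horizon / n_steps
    # Compensator so that jumps do not change the expected growth rate mu.
    k = math.exp(jump_mean + 0.5 * jump_std ** 2) - 1.0
    drift = (mu - 0.5 * sigma ** 2 - lam * k) * dt
    log_s = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        # Diffusion part: Gaussian increment scaled by sqrt(dt).
        log_s += drift + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Jump part: Bernoulli approximation of the Poisson arrival.
        if rng.random() < lam * dt:
            log_s += rng.gauss(jump_mean, jump_std)
        path.append(math.exp(log_s))
    return path
```

Capping the number of jumps per small interval, as done here with a single Bernoulli draw, is the same kind of simplification the abstract refers to when it bounds the jump count for tractability.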
Article 2103
Title@2025-06-05 (4): Chronoamperometry with Room-Temperature Ionic Liquids: Sub-Second Inference Techniques
Title: Chronoamperometry with Room-Temperature Ionic Liquids: Sub-Second Inference Techniques | Chronoamperometrie mit Ionenflüssigkeiten der Raumtemperatur: Sub-Second Inferenztechniken | 与室-室-温度-不态液体:二次二次推断技术 2506.04540v1 |
Authors (1): Kordel K. France
Chronoamperometry (CA) is a fundamental electrochemical technique used for quantifying redox-active species. However, in room-temperature ionic liquids (RTILs), the high viscosity and slow mass transport often lead to extended measurement durations. This paper presents a novel mathematical regression approach that reduces CA measurement windows to under 1 second, significantly faster than previously reported methods, which typically require 1-4 seconds or longer. By applying an inference algorithm to the initial transient current response, this method accurately predicts steady-state electrochemical parameters without requiring additional hardware modifications. The approach is validated through comparison with standard chronoamperometric techniques and is demonstrated to maintain reasonable accuracy while dramatically reducing data acquisition time. The implications of this technique are explored in analytical chemistry, sensor technology, and battery science, where rapid electrochemical quantification is critical. Our technique is focused on enabling faster multiplexing of chronoamperometric measurements for rapid olfactory and electrochemical analysis.
Article 2104
Title@2025-06-05 (4): Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models
Title: Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models | Intelligent4DSE: Optimierung der High-Level-Synthese Design Space Exploration mit Graph Neural Networks und großen Sprachmodellen | 智能4DSE:利用图形神经网络和大语言模型优化高级合成设计空间探索 2504.19649v2 |
Authors (4): Lei Xu, Shanshan Wang, Emmanuel Casseau, Chenglong Xiao
High-level synthesis (HLS) design space exploration (DSE) is an optimization process in electronic design automation (EDA) that systematically explores high-level design configurations to achieve Pareto-optimal hardware implementations balancing performance, area, and power (PPA). To optimize this process, HLS prediction tasks often employ message-passing neural networks (MPNNs), leveraging complex architectures to achieve high accuracy. These predictors serve as evaluators in the DSE process, effectively bypassing the time-consuming estimations traditionally required by HLS tools. However, existing models often prioritize structural complexity and minimization of training loss, overlooking task-specific characteristics. Additionally, while evolutionary algorithms are widely used in DSE, they typically require extensive domain-specific knowledge to design effective crossover and mutation operators. To address these limitations, we propose CoGNNs-LLMEA, a framework that integrates a graph neural network with task-adaptive message passing and a large language model-enhanced evolutionary algorithm. As a predictive model, CoGNNs directly leverages intermediate representations generated from source code after compiler front-end processing, enabling prediction of quality of results (QoR) without invoking HLS tools. Due to its strong adaptability to tasks, CoGNNs can be tuned to predict post-HLS and post-implementation outcomes, effectively bridging the gap between high-level abstractions and physical implementation characteristics. CoGNNs achieves state-of-the-art prediction accuracy in post-HLS QoR prediction, reducing mean prediction errors by 2.8$\times$ for latency and 3.4$\times$ for resource utilization compared to baseline models.
Article 2105
Title@2025-06-05 (4): Olfactory Inertial Odometry: Sensor Calibration and Drift Compensation
Title: Olfactory Inertial Odometry: Sensor Calibration and Drift Compensation | Olfactory Inertial Odometrie: Sensorkalibrierung und Driftkompensation | 腐蚀性惰性测量:传感器校准和漂流赔偿 2506.04539v1 |
Authors (4): Kordel K. France, Ovidiu Daescu, Anirban Paul, Shalini Prasad
Visual inertial odometry (VIO) is a process for fusing visual and kinematic data to understand a machine’s state in a navigation task. Olfactory inertial odometry (OIO) is an analog to VIO that fuses signals from gas sensors with inertial data to help a robot navigate by scent. Gas dynamics and environmental factors introduce disturbances into olfactory navigation tasks that can make OIO difficult to facilitate. With our work here, we define a process for calibrating a robot for OIO that generalizes to several olfaction sensor types. Our focus is specifically on calibrating OIO for centimeter-level accuracy in localizing an odor source on a slow-moving robot platform to demonstrate use cases in robotic surgery and touchless security screening. We demonstrate our process for OIO calibration on a real robotic arm and show how this calibration improves performance over a cold-start olfactory navigation task.
Article 2106
Title@2025-06-05 (4): Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs
Title: Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs | Projektionsbasierte Lyapunov-Methode für voll heterogene schwach gekoppelte MDPs | 以投射为基础的Lyapunov 方法,用于完全多元的微软相混合的 MDP 2502.06072v4 |
Authors (3): Xiangcheng Zhang, Yige Hong, Weina Wang
Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, an efficiently computable policy achieves an $O(1/\sqrt{N})$ optimality gap in the long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our main technical innovation is the construction of projection-based Lyapunov functions that certify the convergence of rewards and costs to an optimal region, even under full heterogeneity.
Article 2107
Title@2025-06-05 (4): NOBLE – Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models
Title: NOBLE – Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models | NOBLE – Neuronenoperator mit biologisch informierten Latent-Embeddings zur Erfassung experimenteller Variabilität in biologischen Neuronenmodellen | NOBL – – 神经操作器,具有生物知情的中程嵌入装置,以捕获生物中子模型中的实验变异性 2506.04536v1 |
Authors (6): Luca Ghafourpour, Valentin Duruisseaux, Bahareh Tolooshams, Philip H. Wong, Costas A. Anastassiou, Anima Anandkumar
Characterizing the diverse computational properties of human neurons via multimodal electrophysiological, transcriptomic, and morphological data provides the foundation for constructing and validating bio-realistic neuron models that can advance our understanding of fundamental mechanisms underlying brain function. However, current modeling approaches remain constrained by the limited availability and intrinsic variability of experimental neuronal data. To capture variability, ensembles of deterministic models are often used, but are difficult to scale as model generation requires repeating computationally expensive optimization for each neuron. While deep learning is becoming increasingly relevant in this space, it fails to capture the full biophysical complexity of neurons, their nonlinear voltage dynamics, and variability. To address these shortcomings, we introduce NOBLE, a neural operator framework that learns a mapping from a continuous frequency-modulated embedding of interpretable neuron features to the somatic voltage response induced by current injection. Trained on data generated from biophysically realistic neuron models, NOBLE predicts distributions of neural dynamics accounting for the intrinsic experimental variability. Unlike conventional bio-realistic neuron models, interpolating within the embedding space offers models whose dynamics are consistent with experimentally observed responses. NOBLE is the first scaled-up deep learning framework validated on real experimental data, enabling efficient generation of synthetic neurons that exhibit trial-to-trial variability and achieve a $4200\times$ speedup over numerical solvers. To this end, NOBLE captures fundamental neural properties, opening the door to a better understanding of cellular composition and computations, neuromorphic architectures, large-scale brain circuits, and general neuroAI applications.
Article 2108
Title@2025-06-05 (4): On the Emergence of Position Bias in Transformers
Title: On the Emergence of Position Bias in Transformers | Über die Entstehung der Position Bias in Transformatoren | 变换者中立场偏见的出现 2502.01951v2 |
Authors (4): Xinyi Wu, Yifei Wang, Stefanie Jegelka, Ali Jadbabaie
Recent studies have revealed various manifestations of position bias in transformer architectures, from the “lost-in-the-middle” phenomenon to attention sinks, yet a comprehensive theoretical understanding of how attention masks and positional encodings shape these biases remains elusive. This paper presents a graph-theoretic framework for analyzing position bias in multi-layer attention. Modeling attention masks as directed graphs, we quantify how tokens interact with contextual information based on their sequential positions. We uncover two key insights: First, causal masking inherently biases attention toward earlier positions, as tokens in deeper layers attend to increasingly more contextualized representations of earlier tokens. Second, we characterize the competing effects of the causal mask and relative positional encodings, such as the decay mask and rotary positional encoding (RoPE): while both mechanisms introduce distance-based decay within individual attention maps, their aggregate effect across multiple attention layers – coupled with the causal mask – leads to a trade-off between the long-term decay effects and the cumulative importance of early sequence positions. Through controlled numerical experiments, we not only validate our theoretical findings but also reproduce position biases observed in real-world LLMs. Our framework offers a principled foundation for understanding positional biases in transformers, shedding light on the complex interplay of attention mechanism components and guiding more informed architectural design.
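The first insight can be illustrated with a toy calculation. The sketch below assumes uniform attention under a causal mask, identical layers, and no residual connections or value projections. These are simplifications chosen for illustration, not the paper's setting. Under them, stacking layers amounts to multiplying lower-triangular row-stochastic matrices.

```python
def uniform_causal_attention(n):
    """Attention matrix where token i attends uniformly to tokens 0..i."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain square matrix product for lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def influence_after_layers(n, depth):
    """Product of `depth` identical attention matrices.

    Entry [i][j] is the weight of input token j in the representation of
    token i after `depth` layers of pure attention mixing.
    """
    attn = uniform_causal_attention(n)
    total = attn
    for _ in range(depth - 1):
        total = matmul(attn, total)
    return total
```

In this toy model the last token's weight on the first input token grows with depth, while its weight on nearby late tokens shrinks, which mirrors the drift toward early positions described in the abstract.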
Article 2109
Title@2025-06-05 (4): HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training
Title: HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training | HALOS: Hierarchische asynchrone lokale SGD über langsame Netzwerke für geo-distributed Large Language Model Training | HALOS:地理分布大语言模式培训的跨慢网的等级非同步本地 SGD 2506.04531v1 |
Authors (8): Geon-Woo Kim, Junbo Li, Shashidhar Gandham, Omar Baldonado, Adithya Gangidi, Pavan Balaji, Zhangyang Wang, Aditya Akella
Training large language models (LLMs) increasingly relies on geographically distributed accelerators, causing prohibitive communication costs across regions and uneven utilization of heterogeneous hardware. We propose HALoS, a hierarchical asynchronous optimization framework that tackles these issues by introducing local parameter servers (LPSs) within each region and a global parameter server (GPS) that merges updates across regions. This hierarchical design minimizes expensive inter-region communication, reduces straggler effects, and leverages fast intra-region links. We provide a rigorous convergence analysis for HALoS under non-convex objectives, including theoretical guarantees on the role of hierarchical momentum in asynchronous training. Empirically, HALoS attains up to 7.5x faster convergence than synchronous baselines in geo-distributed LLM training and improves upon existing asynchronous methods by up to 2.1x. Crucially, HALoS preserves the model quality of fully synchronous SGD – matching or exceeding accuracy on standard language modeling and downstream benchmarks – while substantially lowering total training time. These results demonstrate that hierarchical, server-side update accumulation and global model merging are powerful tools for scalable, efficient training of new-era LLMs in heterogeneous, geo-distributed environments.
Article 2110
Title@2025-06-05 (4): Self-Predictive Dynamics for Generalization of Vision-based Reinforcement Learning
Title: Self-Predictive Dynamics for Generalization of Vision-based Reinforcement Learning | Selbstvorhersagbare Dynamiken für die Generalisierung des visionsbasierten Verstärkungslernens | 普及基于愿景的强化学习的自我预测动态 2506.05418v1 |
Authors (3): Kyungsoo Kim, Jeongsoo Ha, Yusung Kim
Vision-based reinforcement learning requires efficient and robust representations of image-based observations, especially when the images contain distracting (task-irrelevant) elements such as shadows, clouds, and light. It becomes more important if those distractions are not exposed during training. We design a Self-Predictive Dynamics (SPD) method to extract task-relevant features efficiently, even in unseen observations after training. SPD uses weak and strong augmentations in parallel, and learns representations by predicting inverse and forward transitions across the two-way augmented versions. In a set of MuJoCo visual control tasks and an autonomous driving task (CARLA), SPD outperforms previous studies in complex observations, and significantly improves the generalization performance for unseen observations. Our code is available at https://github.com/unigary/SPD.
Article 2111
Title@2025-06-05 (4): Hierarchical Implicit Neural Emulators
Title: Hierarchical Implicit Neural Emulators | Hierarchische Implizite Neuralemulatoren | 等级隐性神经隐性模拟器 2506.04528v1 |
Authors (7): Ruoxi Jiang, Xiao Zhang, Karan Jakhar, Peter Y. Lu, Pedram Hassanzadeh, Michael Maire, Rebecca Willett
Neural PDE solvers offer a powerful tool for modeling complex dynamical systems, but often struggle with error accumulation over long time horizons and maintaining stability and physical consistency. We introduce a multiscale implicit neural emulator that enhances long-term prediction accuracy by conditioning on a hierarchy of lower-dimensional future state representations. Drawing inspiration from the stability properties of numerical implicit time-stepping methods, our approach leverages predictions several steps ahead in time at increasing compression rates for next-timestep refinements. By actively adjusting the temporal downsampling ratios, our design enables the model to capture dynamics across multiple granularities and enforce long-range temporal coherence. Experiments on turbulent fluid dynamics show that our method achieves high short-term accuracy and produces long-term stable forecasts, significantly outperforming autoregressive baselines while adding minimal computational overhead.
Article 2112
Title@2025-06-05 (4): Uncovering Memorization Effect in the Presence of Spurious Correlations
Title: Uncovering Memorization Effect in the Presence of Spurious Correlations | Uncovering Memorization Effekt in der Gegenwart von Purous Correlations | 在存在净腐蚀物时未覆盖的记忆化效应 2501.00961v3 |
Authors (6): Chenyu You, Haocheng Dai, Yifei Min, Jasjeet S. Sekhon, Sarang Joshi, James S. Duncan
Machine learning models often rely on simple spurious features – patterns in training data that correlate with targets but are not causally related to them, like image backgrounds in foreground classification. This reliance typically leads to imbalanced test performance across minority and majority groups. In this work, we take a closer look at the fundamental cause of such imbalanced performance through the lens of memorization, which refers to the ability to predict accurately on atypical examples (minority groups) in the training set while failing to achieve the same accuracy on the test set. This paper systematically shows the ubiquitous existence of spurious features in a small set of neurons within the network, providing the first-ever evidence that memorization may contribute to imbalanced group performance. Through three experimental sources of converging empirical evidence, we find that a small subset of neurons or channels memorizes minority-group information. Inspired by these findings, we hypothesize that spurious memorization, concentrated within a small subset of neurons, plays a key role in driving imbalanced group performance. To further substantiate this hypothesis, we show that eliminating these unnecessary spurious memorization patterns via a novel framework during training can significantly affect the model performance on minority groups. Our experimental results across various architectures and benchmarks offer new insights on how neural networks encode core and spurious knowledge, laying the groundwork for future research in demystifying robustness to spurious correlation.
Article 2113
Title@2025-06-05 (4): Perturbative Gradient Training: A novel training paradigm for bridging the gap between deep neural networks and physical reservoir computing
Title: Perturbative Gradient Training: A novel training paradigm for bridging the gap between deep neural networks and physical reservoir computing | Perturbative Gradient Training: Ein neuartiges Trainingsparadigma zur Überbrückung der Kluft zwischen tiefen neuronalen Netzwerken und physikalischen Reservoir Computing | 循环渐进培训:弥合深神经网络与物理储油层计算之间的差距的新培训模式 2506.04523v1 |
Authors (3): Cliff B. Abbott, Mark Elo, Dmytro A. Bozhko
We introduce Perturbative Gradient Training (PGT), a novel training paradigm that overcomes a critical limitation of physical reservoir computing: the inability to perform backpropagation due to the black-box nature of physical reservoirs. Drawing inspiration from perturbation theory in physics, PGT uses random perturbations in the network’s parameter space to approximate gradient updates using only forward passes. We demonstrate the feasibility of this approach on both simulated neural network architectures, including a dense network and a transformer model with a reservoir layer, and on experimental hardware using a magnonic auto-oscillation ring as the physical reservoir. Our results show that PGT can achieve performance comparable to that of standard backpropagation methods in cases where backpropagation is impractical or impossible. PGT represents a promising step toward integrating physical reservoirs into deeper neural network architectures and achieving significant energy efficiency gains in AI training.
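The idea of approximating gradient updates from forward passes alone can be sketched with a generic random-perturbation (finite-difference) estimator. This is an illustration of the general technique under assumed hyperparameters (`eps`, `n_samples`, `lr`), not necessarily the update rule used by PGT; `loss_fn` stands in for any black-box forward evaluation, such as a physical reservoir.

```python
import random

def perturbative_gradient(loss_fn, params, eps, n_samples, rng):
    """Estimate the gradient of loss_fn at params using forward passes only."""
    base = loss_fn(params)
    grad = [0.0] * len(params)
    for _ in range(n_samples):
        # Random Gaussian direction in parameter space.
        delta = [rng.gauss(0.0, 1.0) for _ in params]
        perturbed = [p + eps * d for p, d in zip(params, delta)]
        # Finite-difference estimate of the directional derivative.
        slope = (loss_fn(perturbed) - base) / eps
        for i, d in enumerate(delta):
            grad[i] += slope * d / n_samples
    return grad

def train(loss_fn, params, lr=0.05, steps=200, eps=1e-3, n_samples=8, seed=0):
    """Gradient descent driven entirely by the perturbative estimate."""
    rng = random.Random(seed)
    params = list(params)
    for _ in range(steps):
        g = perturbative_gradient(loss_fn, params, eps, n_samples, rng)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params
```

Each estimate costs `n_samples + 1` forward evaluations and no backward pass, which is the trade-off that makes this family of methods applicable when backpropagation through the system is unavailable.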
Article 2114
Title@2025-06-05 (4): EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles
Title: EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles | EnsemW2S: Verbesserung der Schwach-zu-Strong-Verallgemeinerung mit großsprachigen Modellensembles | EnsemW2S:用大语言模型组合加强弱至强的通用化 2505.21959v2 |
Authors (9): Aakriti Agrawal, Mucong Ding, Zora Che, Chenghao Deng, Anirudh Satheesh, Bang An, Bayan Bruss, John Langford, Furong Huang
With Large Language Models (LLMs) rapidly approaching and potentially surpassing human-level performance, it has become imperative to develop approaches capable of effectively supervising and enhancing these powerful models using smaller, human-level models exposed to only human-level data. We address this critical weak-to-strong (W2S) generalization challenge by proposing a novel method aimed at improving weak experts, by training on the same limited human-level data, enabling them to generalize to complex, super-human-level tasks. Our approach, called EnsemW2S, employs a token-level ensemble strategy that iteratively combines multiple weak experts, systematically addressing the shortcomings identified in preceding iterations. By continuously refining these weak models, we significantly enhance their collective ability to supervise stronger student models. We extensively evaluate the generalization performance of both the ensemble of weak experts and the subsequent strong student model across in-distribution (ID) and out-of-distribution (OOD) datasets. For OOD, we specifically introduce question difficulty as an additional dimension for defining distributional shifts. Our empirical results demonstrate notable improvements, achieving 4% and 3.2% improvements on ID datasets and up to 6% and 2.28% on OOD datasets for experts and student models, respectively, underscoring the effectiveness of our proposed method in advancing W2S generalization.
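A token-level ensemble can be sketched as a weighted mixture of the experts' next-token distributions. The sketch below assumes each expert exposes a token-to-probability mapping and that expert weights are given; how EnsemW2S actually derives and updates those weights across iterations is not stated in the abstract, so the weights here are hypothetical inputs.

```python
def ensemble_next_token(distributions, weights):
    """Weighted mixture of per-expert next-token distributions.

    distributions: list of dicts mapping token -> probability, one per expert.
    weights: non-negative expert weights (hypothetical inputs, for example
    derived from each expert's validation accuracy).
    """
    total_w = sum(weights)
    mixed = {}
    for dist, w in zip(distributions, weights):
        for token, p in dist.items():
            mixed[token] = mixed.get(token, 0.0) + (w / total_w) * p
    return mixed

def greedy_token(distributions, weights):
    """Pick the most probable token under the mixed distribution."""
    mixed = ensemble_next_token(distributions, weights)
    return max(mixed, key=mixed.get)
```

Combining at the token level, rather than voting on whole answers, lets a stronger expert override a weaker one at individual decoding steps while still benefiting from the others elsewhere in the sequence.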