• 00 06-12 (4) Rethinking Losses for Diffusion Bridge Samplers Rückdenken von Verlusten für Diffusionsbrücken-Probenehmer 重新思考扩散桥采样器的损失 2506.10982v1
  • 01 06-12 Fine-Grained Perturbation Guidance via Attention Head Selection Feinkörnige Störungsführung über Aufmerksamkeitskopfauswahl 通过 “ 关注负责人甄选 “ 指导 2506.10978v1
  • 02 06-12 AutoMind: Adaptive Knowledgeable Agent for Automated Data Science AutoMind: Adaptives Knowledgeable Agent für automatisierte Datenwissenschaft 自动Mind:自动数据科学适应性知识代理 2506.10974v1
  • 03 06-12 Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning Prinzipierte Ansätze zur Erweiterung neuronaler Architekturen auf Funktionsräume für das Operator-Lernen 将神经架构扩展至函数空间以用于算子学习的原则性方法 2506.10973v1
  • 04 06-12 Farseer: A Refined Scaling Law in Large Language Models Farseer: Ein verfeinertes Skalierungsgesetz in großen Sprachmodellen Farseer:大语言模式中经过完善的扩大法 2506.10972v1
  • 05 06-12 What Exactly Does Guidance Do in Masked Discrete Diffusion Models Was genau tut Anleitung in maskierten diskreten Diffusionsmodellen? 在隐蔽的分辨扩散模型中,指导到底做什么? 2506.10971v1
  • 06 06-12 SoK: Watermarking for AI-Generated Content SoK: Wasserzeichen für KI-generierten Inhalt SoK: 用于 AI 生成内容的水标志 2411.18479v3
  • 07 06-12 SpectralAR: Spectral Autoregressive Visual Generation SpectralAR: Spectral Autoregressive Visual Generation 光谱AR:光谱自动递减视觉生成 2506.10962v1
  • 08 06-12 ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark ChinesischHarm-Bench: Ein chinesischer schädlicher Content Detection Benchmark 中中汉禁区:中国有害内容检测基准 2506.10960v1
  • 09 06-12 Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods In-Context-Lernen über strukturierte Manifolds verstehen: Aufmerksamkeit auf Kernel-Methoden überbrücken 理解关于结构式操纵的内文学习:弥合对核心方法的注意 2506.10959v1
  • 10 06-12 AssistanceZero: Scalably Solving Assistance Games AssistanceZero: Skalably Solving Assistance Games 援助零:可升级解决援助运动会 2504.07091v2
  • 11 06-12 ReGuidance: A Simple Diffusion Wrapper for Boosting Sample Quality on Hard Inverse Problems ReGuidance: Ein einfacher Diffusions-Wrapper zur Steigerung der Probenqualität bei harten inversen Problemen 重新指导:一个简单的传播包包,用于提高反向问题样本质量 2506.10955v1
  • 12 06-12 Build the web for agents, not agents for the web Erstellen Sie das Web für Agenten, nicht Agenten für das Web 为代理者而不是网络代理者建立网络 2506.10953v1
  • 13 06-12 Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training Domain2Vec: Vectorizing Datasets, um die optimale Datenmischung ohne Training zu finden 域2Vec: 将数据集矢量化,以查找未经过培训的最佳数据混合体 2506.10952v1
  • 14 06-12 Execution Guided Line-by-Line Code Generation Ausführen Geführte Zeilen-für-Linien-Code-Generierung 逐行逐行制导 2506.10948v1
  • 15 06-12 Spurious Rewards: Rethinking Training Signals in RLVR Spurious Rewards: Rethinking Trainingssignale in RLVR 净利奖:在RLVR重新思考培训信号 2506.10947v1
  • 16 06-12 GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models GUARD: Geführtes Verlernen und Beibehalten über Datenattribution für große Sprachmodelle GUARD:通过数据归因对大语言模型进行引导式遗忘与保留 2506.10946v1
  • 17 06-12 Coupled reaction and diffusion governing interface evolution in solid-state batteries Gekoppelte Reaktion und Diffusion, die die Schnittstellenentwicklung in Festkörperbatterien regeln 关于固体状态电池界面演变的组合反应与扩散 2506.10944v1
  • 18 06-12 TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems TimberStrike: Datensatz-Rekonstruktion Angriff Enthüllen der Privatsphäre Leckage in Federated Tree-Based Systems 木材三角:联邦树基系统中数据集重建攻击清除隐私渗漏 2506.07605v2
  • 19 06-12 Self-Adapting Language Models Selbstanpassende Sprachmodelle 自我改进语言模式 2506.10943v1
  • 20 06-12 VINCIE: Unlocking In-context Image Editing from Video VINCIE: Im Kontext Bildbearbeitung von Video entsperren VINCIE: 从视频中解锁 Incontext 图像编辑 2506.10941v1
  • 21 06-12 Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction Entwicklung eines leistungsfähigen Rahmens für die Sprachemotionserkennung unter naturalistischen Bedingungen Herausforderung für die emotionale Attributvorhersage 制定一个高绩效框架,以便在自然条件下对情感属性预测的自然条件挑战中承认言论情感 2506.10930v1
  • 22 06-12 Improving LLM Safety Alignment with Dual-Objective Optimization Verbesserung der LLM-Sicherheitsausrichtung mit Dual-Ziel-Optimierung 提高LLM安全一致性,实现双目标优化 2503.03710v2
  • 23 06-12 On feature selection in double-imbalanced data settings: a Random Forest approach Zur Feature-Auswahl in doppelt unausgewogenen Datenszenarien: ein Random-Forest-Ansatz 双重不平衡数据设置中的特征选择:随机森林方法 2506.10929v1
  • 24 06-12 Investigating the Relationship Between Physical Activity and Tailored Behavior Change Messaging: Connecting Contextual Bandit with Large Language Models Untersuchung der Beziehung zwischen körperlicher Aktivität und maßgeschneidertem Verhalten Veränderung Messaging: Verknüpfung von Kontextbandit mit großen Sprachmodellen 调查身体活动与因地制宜的行为变化通信之间的关系:将环境强盗与大语言模式联系起来 2506.07275v2
  • 25 06-12 Robustly Improving LLM Fairness in Realistic Settings via Interpretability Robuste Verbesserung der LLM-Fairness in realistischen Szenarien durch Interpretierbarkeit 通过可解释性在现实环境中强有力地提高LLM公平性 2506.10922v1
  • 26 06-12 Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization Zerlegung von MLP-Aktivierungen in interpretierbare Merkmale mittels semi-nichtnegativer Matrixfaktorisierung 通过半非负矩阵分解将MLP激活分解为可解释特征 2506.10920v1
  • 27 06-12 Sequential-Parallel Duality in Prefix Scannable Models Sequential-Parallel-Dualität in voreingestellten Scannable-Modellen 前缀扫描可扫描模型中的序列- Parallel 品质 2506.10918v1
  • 28 06-12 M4V: Multi-Modal Mamba for Text-to-Video Generation M4V: Multi-Modal Mamba für die Text-zu-Video-Generierung M4V: 用于文本到视频一代的多式Mamba 2506.10915v1
  • 29 06-12 Foundation Models for Causal Inference via Prior-Data Fitted Networks Gründungsmodelle für die Kausalableitung über Prior-Data Fitted Networks 通过先前数据适合型网络进行因果关系推断的模型基础 2506.10914v1
  • 30 06-12 NoLoCo: No-all-reduce Low Communication Training Method for Large Models NoLoCo: Nein-alles-Reduce Low Communication Training Method for Large Models NoLOCO: 大型模型的低通信培训方法 2506.10911v1
  • 31 06-12 Probably Approximately Correct Labels Wahrscheinlich ungefähr korrekte Etiketten 可能是近似正确标签 2506.10908v1
  • 32 06-12 Demystifying Spectral Feature Learning for Instrumental Variable Regression Entmystifizierung des spektralen Merkmalslernens für die Instrumentvariablen-Regression 揭秘工具变量回归中的谱特征学习 2506.10899v1
  • 33 06-12 The Diffusion Duality Die Diffusionsdualität 传播质量 2506.10892v1
  • 34 06-12 PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play PLAY2PROMPT: Zero-shot Tool Instruction Optimierung für LLM Agenten über Tool Play PLAY2PROMOPT: 通过工具游戏优化LLM代理器的零射工具指令 2503.14432v2
  • 35 06-12 Lattice Climber Attack: Adversarial attacks for randomized mixtures of classifiers Lattice Climber Attack: Adversariale Angriffe für randomisierte Mischungen von Klassifikatoren Lattice 攀爬者攻击: 随机混合分类器的反向攻击 2506.10888v1
  • 36 06-12 Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers Verallgemeinerung oder Halluzination? Verstehen von Out-of-Context-Reasoning in Transformers 通化还是幻觉? 理解变异器的逻辑外原因 2506.10887v1
  • 37 06-12 Three iterations of $(d-1)$-WL test distinguish non isometric clouds of $d$-dimensional points Drei Iterationen von $(d-1)$-WL-Test unterscheiden nicht-isometrische Wolken von $d$-dimensionalen Punkten 三次(d-1-1美元-WL)的迭代(美元-美元-WL)测试区分非等度云(美元-维点) 2303.12853v4
  • 38 06-12 A Goemans-Williamson type algorithm for identifying subcohorts in clinical trials Ein Goemans-Williamson-Algorithmus zur Identifizierung von Subkohorten in klinischen Studien Goemans- Williamson 型算法,用于在临床试验中识别亚科 2506.10879v1
  • 39 06-12 Data-Driven Prediction of Dynamic Interactions Between Robot Appendage and Granular Material Datengestützte Vorhersage dynamischer Interaktionen zwischen Roboter-Anhang und Granular Material 机器人附加和颗粒材料之间动态相互作用的数据驱动预测 2506.10875v1
  • 40 06-12 The Gittins Index: A Design Principle for Decision-Making Under Uncertainty Der Gittins Index: Ein Design-Prinzip für Entscheidungsfindung unter Unsicherheit Gittins指数:不确定性下决策的设计原则 2506.10872v1
  • 41 06-12 Unveiling the Role of Randomization in Multiclass Adversarial Classification: Insights from Graph Theory Enthüllen der Rolle der Randomisierung in der Mehrklassen-Adversarial-Klassifikation: Einsichten aus der Graphentheorie 打破随机化在多级反对向分类中的作用:图表理论的透视 2503.14299v2
  • 42 06-12 Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization Lebensfähigkeit künftiger Maßnahmen: Robuste Sicherheit im Stärkungslernen durch Entropieregularisierung 未来行动的可行性:通过英信正规化加强学习的有力安全 2506.10871v1
  • 43 06-12 mLaSDI: Multi-stage latent space dynamics identification mLaSDI: Mehrstufige latente Raumdynamik-Identifikation mLASDI:多阶段潜在空间动态识别 2506.09207v2
  • 44 06-12 Latent Action Learning Requires Supervision in the Presence of Distractors Latent Action Learning erfordert Überwachung in der Gegenwart von Distraktoren 长期行动学习要求对有减压器的存在进行监督 2502.00379v5
  • 45 06-12 Multi-group Uncertainty Quantification for Long-form Text Generation Multi-Gruppen-Unsicherheits-Quantifizierung für langformige Textgenerierung 长式文本生成的不确定性量化 2407.21057v2
  • 46 06-12 Breaking Distortion-free Watermarks in Large Language Models Breaking Distortion-free Watermarks in großen Sprachmodellen 大语言模型中无扭曲的无水印 2502.18608v2
  • 47 06-12 Debiasing Watermarks for Large Language Models via Maximal Coupling Debiasing Wasserzeichen für große Sprachmodelle über Maximal Coupling 通过Maximal Coupling为大语言模型减少对水标记的偏差 2411.11203v2
  • 48 06-12 Sample Complexity and Representation Ability of Test-time Scaling Paradigms Probenkomplexität und Darstellungsfähigkeit von Testzeit-Skalierparadigmen 试验时间缩放模型的复杂性和代表性 2506.05295v2
  • 49 06-12 OmniFluids: Unified Physics Pre-trained Modeling of Fluid Dynamics OmniFluids: Unified Physics Pre-trained Modeling of Fluid Dynamics OmniFluids: 流体动力学统一物理预培训模拟模型 2506.10862v1
  • 50 06-12 Energy-Efficient Deep Learning for Traffic Classification on Microcontrollers Energieeffizientes Deep Learning für die Verkehrsklassifizierung an Mikrocontrollern 关于微控制器交通分类的节能深层学习 2506.10851v1
  • 51 06-12 GraphThought: Graph Combinatorial Optimization with Thought Generation GraphThought: Graph Combinatorial Optimization mit Thought Generation 图图图图图图: 与新一代人相比的图形组合式优化 2502.11607v2
  • 52 06-12 Accelerating Diffusion Large Language Models with SlowFast: The Three Golden Principles Beschleunigen von Diffusion Großsprachige Modelle mit SlowFast: Die drei goldenen Prinzipien 以慢速加速传播大语言模式:三大金原则 2506.10848v1
  • 53 06-12 Advanced fraud detection using machine learning models: enhancing financial transaction security Fortgeschrittene Betrugserkennung mit maschinellen Lernmodellen: Erhöhung der finanziellen Transaktionssicherheit 利用机器学习模式发现高级欺诈:加强金融交易安全 2506.10842v1
  • 54 06-12 Simplicity bias and optimization threshold in two-layer ReLU networks Einfachheits-Bias und Optimierungsschwelle in zweischichtigen ReLU-Netzwerken 双层ReLU网络中的简化偏差和优化阈值 2410.02348v2
  • 55 06-12 Efficiency Robustness of Dynamic Deep Learning Systems Effizienz Robustheit dynamischer Deep Learning Systeme 动态深深学习系统的效率强强 2506.10831v1
  • 56 06-12 On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves Auf der Geometrie der Receiver-Betriebscharakteristik und Precision-Recall-Kurven 收件者操作特征和精确召回曲线几何 2504.02169v2
  • 57 06-12 Learning richness modulates equality reasoning in neural networks Lernreichtum moduliert Gleichstellungsabwägungen in neuronalen Netzwerken 学习丰富 适应神经网络的平等推理 2503.09781v2
  • 58 06-12 Empirical and computer-aided robustness analysis of long-step and accelerated methods in smooth convex optimization Empirische und computergestützte Robustheitsanalyse von langen und beschleunigten Methoden in reibungsloser konvexer Optimierung 经验分析和计算机辅助的稳健性分析,分析如何以长步骤和加速方法顺利优化顺流优化 2506.09730v2
  • 59 06-12 General targeted machine learning for modern causal mediation analysis Allgemeines gezieltes maschinelles Lernen für moderne kausale Mediationsanalyse 现代因果调解分析通用定向机器学习 2408.14620v2
  • 60 06-12 Detecting High-Stakes Interactions with Activation Probes Ermitteln von High-Stakes-Interaktionen mit Aktivierungssonden 与活性检测检测检测检测高吸量相互作用 2506.10805v1
  • 61 06-12 Near-Optimal Algorithms for Constrained k-Center Clustering with Instance-level Background Knowledge Nahezu optimale Algorithmen für eingeschränktes k-Center-Clustering mit Hintergrundwissen auf Instanzebene 具有实例级背景知识的约束 k-Center 聚类的近似最优算法 2401.12533v4
  • 62 06-12 Dense Associative Memory with Epanechnikov Energy Dichtes assoziatives Gedächtnis mit Epanechnikov Energie Epanechnikov能源的密集联合记忆 2506.10801v1
  • 63 06-12 Divide-Fuse-Conquer: Eliciting “Aha Moments” in Multi-Scenario Games Divide-Fuse-Conquer: Eliciting “Aha Momente” in Multi-Szenario-Spiele 分裂-裂变:在多种场景运动会中激发“哈动力” 2505.16401v4
  • 64 06-12 Adaptive Federated LoRA in Heterogeneous Wireless Networks with Independent Sampling Adaptives Federated LoRA in heterogenen drahtlosen Netzwerken mit unabhängiger Probenahme 具有独立抽样调查的多源无线网络中的联邦适应性 2505.23555v2
  • 65 06-12 Mimicking Human Intuition: Cognitive Belief-Driven Reinforcement Learning Mimikking Human Intuition: Kognitives Glauben-getriebenes Stärkungslernen 模拟人类感知:认知信仰 – – 强化学习 2410.01739v3
  • 66 06-12 Incentivizing Quality Text Generation via Statistical Contracts Förderung der Erzeugung von Qualitätstexten durch statistische Verträge 通过统计合同鼓励质量制文本 2406.11118v2
  • 67 06-12 SlotPi: Physics-informed Object-centric Reasoning Models SlotPi: Physik-informierte objektzentrierte Vernunftmodelle SlotPi:物理学-知情的物体中心理由模型 2506.10778v1
  • 68 06-12 Monotone Classification with Relative Approximations Monotone Klassifizierung mit relativen Approximationen 具有相对近似的单调分类 2506.10775v1
  • 69 06-12 Skillful joint probabilistic weather forecasting from marginals Treffsichere gemeinsame probabilistische Wettervorhersage aus Randverteilungen 基于边缘分布的高技巧联合概率天气预报 2506.10772v1
  • 70 06-12 OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems OPT-BENCH: LLM Agent auf großräumigen Suchräumen bewerten Optimierungsprobleme OOP-BENCH:评估大型搜索空间优化问题LLM代理 2506.10764v1
  • 71 06-12 Great Models Think Alike and this Undermines AI Oversight Große Modelle denken ähnlich, und dies untergräbt die KI-Aufsicht 优秀的模型想法相似,而这会削弱AI监督 2502.04313v2
  • 72 06-12 Persistent Topological Features in Large Language Models Persistente topologische Merkmale in großen Sprachmodellen 大语言模型中的持久拓扑特征 2410.11042v2
  • 73 06-12 Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering Neural bei ArchEHR-QA 2025: Agentische Prompt-Optimierung für evidenzbasierte klinische Fragebeantwortung Neural 参加 ArchEHR-QA 2025:面向循证临床问答的智能体提示优化 2506.10751v1
  • 74 06-12 Graphical Transformation Models Graphische Transformationsmodelle 图形转换模型 2503.17845v3
  • 75 06-12 A User’s Guide to Sampling Strategies for Sliced Optimal Transport Leitfaden für Anwender zur Probenahme von Strategien für den optimalen Transport 剪切最佳运输抽样战略用户指南 2502.02275v4
  • 76 06-12 Hybrid Bernstein Normalizing Flows for Flexible Multivariate Density Regression with Interpretable Marginals Hybride Bernstein-Normalisierungsströmungen für flexible Multivariate Dichteregression mit interpretierbaren Marginalen 具有可解释边际作用的灵活多变密度递减的 伯恩斯坦混合 伯恩斯坦 2505.14164v2
  • 77 06-12 BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery BioNeMo Framework: eine modulare, leistungsstarke Bibliothek für die Entwicklung von KI-Modellen in der Wirkstoffentdeckung BioNemo框架:一个模块化、高绩效图书馆,用于在药物发现方面开发AI模型 2411.10548v3
  • 78 06-12 PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models PREMISE: Skalierbare und strategische Prompt-Optimierung für effiziente mathematische Reasoning in großen Modellen PREMISE:大规模模型中高效数学理由的可扩展和战略快速优化 2506.10716v1
  • 79 06-12 Testing Generalizability in Causal Inference Prüfung der Verallgemeinerbarkeit bei Kausalzusammenhang 因果关系中的通用性 2411.03021v2
  • 80 06-12 Continual Hyperbolic Learning of Instances and Classes Kontinuierliches Hyperbolisches Lernen von Instanzen und Klassen 不断对各种事件和课程进行双曲学习 2506.10710v1
  • 81 06-12 Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models Anreize für eine fortgeschrittene Instruktions-Folge von großen Sprachmodellen 为采用大语言模式的高级指示提供激励理由 2506.01413v2
  • 82 06-12 ConTextTab: A Semantics-Aware Tabular In-Context Learner ConTextTab: Ein Semantik-Bewusst Tabular In-Context Learner ConTextTab : 语义学软件制表器 2506.10707v1
  • 83 06-12 Preserving Task-Relevant Information Under Linear Concept Removal Task-Relevante Informationen unter linearen Konzeptentfernung speichern 在线性概念删除下保留任务相关信息 2506.10703v1
  • 84 06-12 SNR and Resource Adaptive Deep JSCC for Distributed IoT Image Classification SNR und Resource Adaptive Deep JSCC für verteilte IoT-Bildklassifikation SNR与资源适应性深部JSC,用于分布的IoT图像分类 2506.10699v1
  • 85 06-12 A hierarchical approach for assessing the vulnerability of tree-based classification models to membership inference attack Ein hierarchischer Ansatz zur Bewertung der Verwundbarkeit baumbasierter Klassifikationsmodelle für den Schlussfolgerungsangriff der Mitgliedschaft 评估树树分类模式易受成员推推推攻击的脆弱性的等级办法 2502.09396v2
  • 86 06-12 FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems FedRAG: Ein Rahmen für Systeme der Feinsteuerung von Retrieval-Augmented Generation FFRAG: 微调取回系统框架 2506.09200v2
  • 87 06-12 Scalable unsupervised feature selection via weight stability Skalierbare unüberwachte Feature-Auswahl über Gewichtsstabilität 通过重量稳定性进行可缩放的、不受监督的特征选择 2506.06114v2
  • 88 06-12 STOAT: Spatial-Temporal Probabilistic Causal Inference Network STOAT: Raum-Temporal Probabilistic Causal Inference Network STOAT: 空间-临时概率因果关系网络 2506.09544v2
  • 89 06-12 Learning hidden cascades via classification Versteckte Kaskaden durch Klassifizierung lernen 通过分类学习隐藏的级联 2505.11228v2
  • 90 06-12 Large Language Models for Detection of Life-Threatening Texts Große Sprachmodelle zur Erkennung lebensbedrohlicher Texte 探测生命威胁文字的长语言大语言模型 2506.10687v1
  • 91 06-12 Croppable Knowledge Graph Embedding Croppable Knowledge Graph Embedding 可裁剪知识图嵌入 2407.02779v2
  • 92 06-12 Implicit Language Models are RNNs: Balancing Parallelization and Expressivity Implizite Sprachmodelle sind RNNs: Ausgleichende Parallelisierung und Expressivität 隐含语言模式是RNNs:平衡平行和表达。 2502.07827v3
  • 93 06-12 SR-Reward: Taking The Path More Traveled SR-Reward: Den Weg mehr beschreiten SR-奖励:走上更多旅行的道路 2501.02330v3
  • 94 06-12 Saturation Self-Organizing Map Karte zur Selbstorganisation der Sättigung 饱和自组织地图 2506.10680v1
  • 95 06-12 VeriContaminated: Assessing LLM-Driven Verilog Coding for Data Contamination VeriContaminiert: Bewertung von LLM-getriebener Verilog Coding für Datenkontamination 受污染:评估数据污染的LLM-Driven Verilog编码 2503.13572v3
  • 96 06-12 Practical Improvements of A/B Testing with Off-Policy Estimation Praktische Verbesserungen der A/B-Prüfung mit Off-Policy-Abschätzung 实际改进A/B测试和非政策性估算的非政策性A/B测试 2506.10677v1
  • 97 06-12 RmGPT: A Foundation Model with Generative Pre-trained Transformer for Fault Diagnosis and Prognosis in Rotating Machinery RmGPT: Ein Fundamentmodell mit generativem vortrainiertem Transformer für Fehlerdiagnose und Prognose in rotierenden Maschinen RMGPT: 轮调机械系统失灵诊断和预预报的预培训前生成变形器基金会模型 2409.17604v2
  • 98 06-12 Automated Generation of Precedence Graphs in Digital Value Chains for Automotive Production Automatisierte Generierung von Precedence Graphen in digitalen Wertschöpfungsketten für die Automobilproduktion 自动生成汽车生产数字价值链中的优先性图 2504.19835v2
  • 99 06-12 Diffusion-Free Graph Generation with Next-Scale Prediction Diffusionsfreie Graphengenerierung mit Next-Scale-Vorhersage 下期期期预测的无传播图制作 2503.23612v2
  • 100 06-12 Logarithmic Smoothing for Adaptive PAC-Bayesian Off-Policy Learning Logarithmische Glättung für adaptives PAC-Bayesisches Off-Policy-Lernen PAC-Bayesian适应性PAC-Bayesian离政策学习对数平滑 2506.10664v1
  • 101 06-12 Pushing the Limits of Extreme Weather: Constructing Extreme Heatwave Storylines with Differentiable Climate Models Die Grenzen des extremen Wetters sprengen: Extreme Hitzewellen-Geschichten mit differenzierbaren Klimamodellen konstruieren 推推极端天气的极限:用不同的气候模型构建极端热浪线 2506.10660v1
  • 102 06-12 CoRT: Code-integrated Reasoning within Thinking CoRT: Code-integrierte Vernunft im Denken CORT: 思考中守则综合理由 2506.09820v2
  • 103 06-12 TSFM-Bench: A Comprehensive and Unified Benchmark of Foundation Models for Time Series Forecasting TSFM-Bench: Ein umfassender und einheitlicher Benchmark von Basismodellen für die Zeitreihenprognose TSFM-Bench:时间序列预测基础模型的全面统一基准 2410.11802v6
  • 104 06-12 Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference Graph-dependent regret Bounds in Multi-Armed Bandits mit Interferenz 多武装有干扰力的多武装强力土匪的图形- 依赖性 Regret Regret 界圈 2503.07555v2
  • 105 06-12 Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes Robuste, unüberwachte Anpassung eines Spracherkenners mittels Entropieminimierung und Sprechercodes 使用熵最小化和说话人编码对语音识别器进行稳健的无监督自适应 2506.10653v1
  • 106 06-12 Data Shifts Hurt CoT: A Theoretical Study Data Shifts Hurt CoT: Eine theoretische Studie Data Shifts Hurt CoT:理论研究 2506.10647v1
  • 107 06-12 Flexible Tails for Normalizing Flows Flexible Endlagen für Normalisierung von Strömungen 流动正常化的灵活反面 2406.16971v2
  • 108 06-12 Structure and asymptotic preserving deep neural surrogates for uncertainty quantification in multiscale kinetic equations Struktur und asymptotische Erhaltung tiefer neuronaler Surrogate zur Unsicherheitsquantifizierung in multiskaligen kinetischen Gleichungen 用于多尺度动动能方程中不确定性量化的深海神经代代体 2506.10636v1
  • 109 06-12 Deploying Open-Source Large Language Models: A performance Analysis Bereitstellung von Open-Source-Großsprachenmodellen: Eine Leistungsanalyse 部署开放源码大语言模式:业绩分析 2409.14887v4
  • 110 06-12 Hessian Geometry of Latent Space in Generative Models Hessische Geometrie des Latentenraums in generativen Modellen 创创模型中的深层空间的赫西德几何学 2506.10632v1
  • 111 06-12 Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs Zeitreihenprognosen als Begründung: Ein langsam denkender Ansatz mit verstärkten LLMs 以时间序列预测作为理由:与强化的LLMs采取慢思考方式 2506.10630v1
  • 112 06-12 Task Adaptation from Skills: Information Geometry, Disentanglement, and New Objectives for Unsupervised Reinforcement Learning Aufgabenanpassung aus Kompetenzen: Informationsgeometrie, Entflechtung und neue Ziele für unbeaufsichtigtes Weiterbildungslernen 适应技能的任务:信息几何、分离和未受监督的加强学习新目标 2506.10629v1
  • 113 06-12 Leveraging Low-rank Factorizations of Conditional Correlation Matrices in Graph Learning Leveraging Low-rank Factorizations of Conditional Correlation Matrices in Graph Learning 在图表学习中利用条件性关联矩阵的低级别定级 2506.10628v1
  • 114 06-12 Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation Vortraining generativer Flussnetzwerke mit kostengünstigen Belohnungen für die molekulare Graphengenerierung 使用低成本奖励预训练生成流网络以生成分子图 2503.06337v4
  • 115 06-12 SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis SDialog: Ein Python-Toolkit für die Synthetische Dialog-Generierung und -Analyse Sidialog:合成对话生成和分析的Python工具包 2506.10622v1
  • 116 06-12 Assessing the Resilience of Automotive Intrusion Detection Systems to Adversarial Manipulation Bewertung der Resilienz von Intrusions-Detektionssystemen für die Adversarialmanipulation 评估汽车入侵探测系统对冲操纵的复原力 2506.10620v1
  • 117 06-12 Universal Neural Optimal Transport Universeller neuronaler optimaler Transport 通用神经最优传输 2212.00133v6
  • 118 06-12 Deep Learning-Based Digitization of Overlapping ECG Images with Open-Source Python Code Deep Learning-based Digitalisierung von überlappenden EKG-Bildern mit Open-Source-Python-Code 使用开放源码的 ECG 重叠图像的深学习数字化 2506.10617v1
  • 119 06-12 Non-stationary Online Learning for Curved Losses: Improved Dynamic Regret via Mixability Nichtstationäres Online-Lernen für gekrümmte Verluste: Verbessertes dynamisches Regret durch Mischbarkeit 曲面损失的非平稳在线学习:通过可混合性改进动态遗憾 2506.10616v1
  • 120 06-12 Data Driven Diagnosis for Large Cyber-Physical-Systems with Minimal Prior Information Datenangetriebene Diagnose für große Cyber-Physical-Systeme mit minimalen Vorinformationen 具有最起码原始信息的大型网络物理系统数据驱动诊断 2506.10613v1
  • 121 06-12 Density Ratio Estimation with Conditional Probability Paths Schätzung des Dichte-Verhältnisses mit bedingten Wahrscheinlichkeitspfaden 以有条件概率路径估算密度比率 2502.02300v3
  • 122 06-12 Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models Inferenz-Zeit-Dekomposition von Aktivierungen (ITDA): Ein skalierbarer Ansatz zur Interpretation großer Sprachmodelle 触发(ITDA):解释大语言模式的可扩展办法 2505.17769v2
  • 123 06-12 Distortion-Aware Brushing for Reliable Cluster Analysis in Multidimensional Projections Distortion-Aware Bürsten für zuverlässige Clusteranalyse in mehrdimensionalen Projektionen 为多层面预测中的可靠集集分析 2201.06379v2
  • 124 06-12 TransMLA: Multi-Head Latent Attention Is All You Need TransMLA: Mehrkopf-Latent-Achtung ist alles, was Sie brauchen TransMLA: 多党领袖关注是你们需要的 2502.07864v5
  • 125 06-12 ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization ConfPO: Ausnutzen des politischen Modells Vertrauen für kritische Token-Auswahl in Präferenz-Optimierung 召集:利用政策模范信心在优先最佳化中选择关键物优选标准 2506.08712v2
  • 126 06-12 Generative Uncertainty in Diffusion Models Generative Unsicherheit bei Diffusionsmodellen 扩散模型中产生的不确定性 2502.20946v2
  • 127 06-12 Size-adaptive Hypothesis Testing for Fairness Size-adaptive Hypothesenprüfung auf Fairness 公平性假设测试 2506.10586v1
  • 128 06-12 MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark MAYA: Inkonsistenzen in generativem Passwort ansprechen Raten über einen einheitlichen Benchmark MAYA:通过统一基准解决生成密码猜猜不一致问题 2504.16651v2
  • 129 06-12 Bandit Convex Optimisation Bandit Convex Optimierung Bandit Convex 优化优化 2402.06535v4
  • 130 06-12 Graph Neural Networks for Automatic Addition of Optimizing Components in Printed Circuit Board Schematics Graphische Neuralnetze zur automatischen Ergänzung von Optimierungskomponenten in gedruckten Schaltplanen 打印电路板图示板自动增加优化部件的神经网络 2506.10577v1
  • 131 06-12 Evolutionary Prediction Games Evolutionäre Vorhersage Spiele 演进预测运动会 2503.03401v2
  • 132 06-12 Identifiability Challenges in Sparse Linear Ordinary Differential Equations Identifizierbarkeitsherausforderungen in Sparse Lineare Standarddifferenzgleichungen 分散线性普通差别等分法的可识别性挑战 2506.09816v2
  • 133 06-12 Box-Constrained Softmax Function and Its Application for Post-Hoc Calibration Box-Constrained Softmax-Funktion und seine Anwendung für die Post-Hoc-Kalibrierung Box- Contract Softmax 函数及其热后校准应用程序 2506.10572v1
  • 134 06-12 On the role of non-linear latent features in bipartite generative neural networks Zur Rolle nichtlinearer latenter Merkmale in bipartitischen generativen neuronalen Netzwerken 关于非线性潜伏特征在双边基因神经网络中的作用 2506.10552v1
  • 135 06-12 Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods Kostenlose Bewertung von Record-Level-Datensicherheitsrisiken durch Artifact-basierte Methoden 采用人工方法进行自由记录隐私风险评估 2411.05743v3
  • 136 06-12 Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models Obliviate: Effiziente Unvergesslichkeit für den Schutz geistigen Eigentums in großen Sprachmodellen 默认:在大语言模式中有效统一保护知识产权 2502.15010v2
  • 137 06-12 Exploring Performance-Complexity Trade-Offs in Sound Event Detection Models Erforschung von Leistungs-Komplexitäts-Trade-Offs in Sound Event Detection-Modellen 探索健全事件探测模型中的性能-复杂交易-便利交易 2503.11373v2
  • 138 06-12 Data-driven Day Ahead Market Prices Forecasting: A Focus on Short Training Set Windows datengetriebener Tag voraus Marktpreise Prognose: Ein Fokus auf Kurztraining-Set Windows 头期市场价格预测:关注短期培训成套窗口 2506.10536v1
  • 139 06-12 Equivariant Neural Diffusion for Molecule Generation Equivariante Neuraldiffusion für die Molekülgeneration 分子代的等量神经元传播 2506.10532v1
  • 140 06-12 Decision Making under the Exponential Family: Distributionally Robust Optimisation with Bayesian Ambiguity Sets Entscheidungsfindung unter der Exponential-Familie: Verteilungsstarke Optimierung mit Bayesian Ambiguity Sets 根据 “ 名人家庭 “ 下的决策:与贝ysian 暗形组的分布式强力优化 2411.16829v2
  • 141 06-12 Learning in Budgeted Auctions with Spacing Objectives Lernen in budgetierten Auktionen mit unterschiedlichen Zielen 具有间距目标的预算拍卖中的学习 2411.04843v2
  • 142 06-12 QuXAI: Explainers for Hybrid Quantum Machine Learning Models QuXAI: Erklärer für Hybrid-Quantum-Maschinen-Lernmodelle QuXAI:混合量子机器学习模型的解释器 2505.10167v3
  • 143 06-12 Macro Graph of Experts for Billion-Scale Multi-Task Recommendation Makrograph der Experten für Milliarden-Scale Multi-Task-Empfehlung 10亿多任务建议十亿多阶段专家宏观图 2506.10520v1
  • 144 06-12 A Weighted Loss Approach to Robust Federated Learning under Data Heterogeneity Ein gewichtiger Verlust Ansatz für robustes Federated Learning unter Daten Heterogenität 数据异质下联邦强力学习的加权损失方法 2506.09824v2
  • 145 06-12 Subgraph Gaussian Embedding Contrast for Self-Supervised Graph Representation Learning Subgraph Gaussian Einbettungskontrast für selbstüberwachtes Graphen-Darstellungslernen 自支持图表代表制学习的 Subgraph Gaussian 嵌入式对比对比度 2505.23529v2
  • 146 06-12 Differentially private and decentralized randomized power method Differentiell private und dezentrale randomisierte Potenzmethode 差分隐私的去中心化随机幂法 2411.01931v3
  • 147 06-12 Large Scale Multi-Task Bayesian Optimization with Large Language Models Große Multi-Task Bayesian Optimierung mit großen Sprachmodellen 使用大语言模式的大型多语类多语类多语类多语类最佳优化 2503.08131v2
  • 148 06-12 Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation Qualität über Menge: Steigerung der Dateneffizienz durch ensembled multimodale Daten-Curation 质量高于数量:通过集合多模式数据曲线提高数据效率 2502.08211v2
  • 149 06-12 A Crack in the Bark: Leveraging Public Knowledge to Remove Tree-Ring Watermarks Ein Riss in der Bark: Benutzt öffentliches Wissen, um Baum-Ring-Wasserzeichen zu entfernen 瓶中的裂缝:利用公共知识来清除树状水印 2506.10502v1
  • 150 06-12 BugGen: A Self-Correcting Multi-Agent LLM Pipeline for Realistic RTL Bug Synthesis BugGen: Eine selbstkorrigierende LLM-Pipeline für eine realistische RTL-Bug-Synthese BugGen: 现实的 RTL 错误合成自更正多 Agency LLM 管道 2506.10501v1
  • 151 06-12 Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG Lag nicht, RAG: Training-freie Adversarial Detection mit RAG 不要拉格,RAG:使用RAG进行无训练的反向探测 2504.04858v2
  • 152 06-12 SHORE: A Long-term User Lifetime Value Prediction Model in Digital Games SHORE: Ein langfristiges User Lifetime Value Prediction Model in digitalen Spielen SHORE: 数字游戏中长期用户终身价值预测模型 2506.10487v1
  • 153 06-12 A Conjoint Graph Representation Learning Framework for Hypertension Comorbidity Risk Prediction Ein konjoint Graph Representation Learning Framework für Hypertonie Comorbidity Risk Prediction 超高频风险预测联合图表代表学习框架 2505.05094v2
  • 154 06-12 Zero-Shot Offline Imitation Learning via Optimal Transport Zero-Shot Offline-Imitation Lernen über optimalen Transport 通过最佳交通方式进行零热离线模拟学习 2410.08751v3
  • 155 06-12 Engagement-Driven Content Generation with Large Language Models Engagement-getriebene Content-Generierung mit großen Sprachmodellen 具有大语言模式的 “ 互动互动 “ 式内容生成 2411.13187v5
  • 156 06-12 Prediction of steady states in a marine ecosystem model by a machine learning technique Vorhersage von stetigen Zuständen in einem marinen Ökosystemmodell durch eine Maschinelle Lerntechnik 通过机械学习技术预测海洋生态系统模型中稳定的状态 2506.10475v1
  • 157 06-12 PASCO (PArallel Structured COarsening): an overlay to speed up graph clustering algorithms PASCO (PArallel Structured COarsening): ein Overlay, um Graph Clustering Algorithmen zu beschleunigen PASCO(PARALLEL 结构化COseting):加速图形组合算法的重叠 2412.13592v2
  • 158 06-12 Glimpse: Generalized Locality for Scalable and Robust CT Glimpse: Generalisierte Lokalität für skalierbare und robuste CT Glimpse: 可缩放和强力CT的通用位置 2401.00816v3
  • 159 06-12 Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts Auf dem Weg zur robusten multimodalen Emotionserkennung unter fehlenden Modalitäten und Verteilungsverschiebungen 争取在缺失模式和分销转移模式下强有力地承认多模式情感 2506.10452v1
  • 160 06-12 Extending AALpy with Passive Learning: A Generalized State-Merging Approach AALpy mit passivem Lernen erweitern: Ein generalisierter State-Merging-Ansatz 用被动学习扩展AALpy:一种广义的状态合并方法 2506.06333v2
  • 161 06-12 MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices MNN-LLM: Eine generische Inferenz-Engine für schnelle großsprachliche Modellbereitstellung auf mobilen Geräten MNN-LLM:移动设备快速大语言模型部署通用推断引擎 2506.10443v1
  • 162 06-12 Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts Experten-Rennen: Eine flexible Routing-Strategie für Scaling Diffusion Transformer mit Mixture von Experten 专家种族:与专家混合的扩大传播变异器灵活机动战略 2503.16057v3
  • 163 06-12 MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning MasHost baut alles: Autonomes Multi-Agenten-System, das durch Verstärkungslernen gesteuert wird 以强化学习为导向的多机构自治系统 2506.08507v2
  • 164 06-12 From Features to Graphs: Exploring Graph Structures and Pairwise Interactions via GNNs Von Features zu Graphen: Erforschen von Graphenstrukturen und paarweise Interaktionen über GNNs 从地貌到图表:通过GNNs探索图表结构和对称互动 2502.13471v2
  • 165 06-12 System Identification Using Kolmogorov-Arnold Networks: A Case Study on Buck Converters Systemidentifizierung mit Kolmogorov-Arnold Networks: Eine Fallstudie zu Buck Convertern 使用Kolmogorov-Arnold网络的系统识别系统:关于巴克转换器的案例研究 2506.10434v1
  • 166 06-12 Measuring Semantic Information Production in Generative Diffusion Models Messung der semantischen Informationsproduktion in generativen Diffusionsmodellen 生物传播模型中计量语义信息生产 2506.10433v1
  • 167 06-12 CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning CheMatAgent: Verbesserung von LLMs für Chemie und Materialwissenschaft durch baumsuchebasiertes Tool Learning CheMatAgent:通过植树搜索工具学习加强化学和材料科学LLMs 2506.07551v2
  • 168 06-12 Multi-dimensional Autoscaling of Processing Services: A Comparison of Agent-based Methods Mehrdimensionale Autoskalierung von Verarbeitungsdienstleistungen: Ein Vergleich von agentenbasierten Methoden 处理服务多维多维自动升级:以代理为基础的方法比较 2506.10420v1
  • 169 06-12 Data-Driven Soil Organic Carbon Sampling: Integrating Spectral Clustering with Conditioned Latin Hypercube Optimization Datengetriebene Bodenorganische Kohlenstoff-Probenahme: Integration von Spektral-Clustering mit bedingter lateinischer Hyperkube-Optimierung 数据驱动土壤有机碳取样:将光谱集成与有条件的拉丁超立方体优化结合 2506.10419v1
  • 170 06-12 Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation Verbesserter Algorithmus für tiefes aktives Lernen unter Unwucht durch Optimale Trennung 通过最佳离职,在平衡下改善深层积极学习的比值 2312.09196v4
  • 171 06-12 Towards Graph-Based Privacy-Preserving Federated Learning: ModelNet – A ResNet-based Model Classification Dataset Auf dem Weg zu einem graphisch basierten Datenschutz-Erhaltungs-Federated Learning: ModelNet – Ein resNet-basierter Modellklassifikations-Datensatz 走向基于图表的隐私-保留联邦学习:模型网 – – 以ResNet为基础的分类模型数据集 2506.00476v2
  • 172 06-12 Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series Zeit-IMM: Ein Datensatz und Benchmark für irreguläre multimodale Multivariate Zeitreihen 时间-IMM:非正常多式联运多变时间序列的数据集和基准 2506.10412v1
  • 173 06-12 DeepGDel: Deep Learning-based Gene Deletion Prediction Framework for Growth-Coupled Production in Genome-Scale Metabolic Models DeepGDel: Deep Learning-basierte Gene Deletion Prediction Framework für wachstumsverbundene Produktion in Genom-Scale Metabolic-Modellen 深层GDel:在基因组-规模元元模型中实现增长和混合生产以深学习为基础的基因删除预测框架 2504.06316v2
  • 174 06-12 IoTGeM: Generalizable Models for Behaviour-Based IoT Attack Detection IoTGeM: Generalisierbare Modelle für die IoT-Angriffserkennung auf Verhaltensbasis IoTGeM:基于行为的IoT攻击探测通用模型 2401.01343v2
  • 175 06-12 An energy-efficient learning solution for the Agile Earth Observation Satellite Scheduling Problem Eine energieeffiziente Lernlösung für das agile Erdbeobachtungssatelliten-Scheduling-Problem Agile地球观测卫星排程问题的一个节能学习解决方案 2503.04803v2
  • 176 06-12 PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier PAG: Multi-Turn verstärkt LLM Selbstkorrektion mit Politik als Generativer Prüfer PAG: 多发强化LLM自我校正,政策作为产生验证 2506.10406v1
  • 177 06-12 Generative Algorithms for Wildfire Progression Reconstruction from Multi-Modal Satellite Active Fire Measurements and Terrain Height Generative Algorithmen für Wildfire-Progressions-Rekonstruktion durch Multi-Modal Satellite-Aktivfeuer-Messungen und Geländehöhe 从多模式卫星活性火灾测量和地面高度生成的野火进量重建 2506.10404v1
  • 178 06-12 Time To Impeach LLM-as-a-Judge: Programs are the Future of Evaluation Time To Impeach LLM-as-a-Richter: Programme sind die Zukunft der Evaluation 弹劾LLM作为法官的时间:方案就是评价的未来 2506.10403v1
  • 179 06-12 RsGCN: Rescaling Enhances Generalization of GCNs for Solving Scalable Traveling Salesman Problems RsGCN: Rescaling verbessert die Generalisierung von GCNs zur Lösung skalierbarer Traveling-Salesman-Probleme RsGCN: 重缩放提升GCN的泛化能力,以求解可扩展的旅行商问题 2506.00533v2
  • 180 06-12 TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning TooBadRL: Trigger-Optimierung zur Steigerung der Effektivität von Backdoor-Angriffen auf Deep Verstärkung Learning TooBadRL:在深强化学习中提高后门袭击效果的触发最佳优化 2506.09562v2
  • 181 06-12 EQA-RM: A Generative Embodied Reward Model with Test-time Scaling EQA-RM: Generatives Embodied-Reward-Modell mit Testzeitskalierung EQA-RM:带有测试时缩放的生成式具身奖励模型 2506.10389v1
  • 182 06-12 Measuring Representational Shifts in Continual Learning: A Linear Transformation Perspective Messen von Repräsentationsverschiebungen im kontinuierlichen Lernen: Eine lineare Transformationsperspektive 持续学习中的衡量代表性变化:线性转变前景 2505.20970v3
  • 183 06-12 Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning Hierarchische Latentenfähigkeiten von Sprachmodellen über das kausale Repräsentationslernen entdecken 通过因果代表制学习发现语言模式的分级本端能力 2506.10378v1
  • 184 06-12 A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Ein minimalistischer Ansatz für LLM-Reasoning: von Rejection Sampling zu Reinforce LLM推理的极简方法:从拒绝采样到Reinforce 2504.11343v2
  • 185 06-12 Revisiting Transformers with Insights from Image Filtering Neubetrachtung von Transformern mit Einsichten aus der Bildfilterung 以图像滤波的洞见重新审视Transformer 2506.10371v1
  • 186 06-12 Can We Infer Confidential Properties of Training Data from LLMs? Können wir vertrauliche Eigenschaften von Trainingsdaten von LLMs ableiten? 我们能否从LLMS中推断培训数据的机密性? 2506.10364v1
  • 187 06-12 Permutation-Based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data Permutationsbasierter Rank-Test in der Gegenwart von Diskretisierung und Anwendung in der kausalen Entdeckung mit gemischten Daten 在与混合数据相混合的Causal Discovery中存在分化和应用 2501.18990v2
  • 188 06-12 Advanced deep architecture pruning using single filter performance Fortschrittliche Tiefenarchitektur-Beschneidung mit Single-Filter-Performance 使用单一过滤性能进行高级深高级结构修剪 2501.12880v2
  • 189 06-12 Dimension-Independent Kernel ε-Covers Dimensionsunabhängige Kernel-ε-Covers 与维度无关的核ε-覆盖 2306.16516v2
  • 190 06-12 Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success Demonstration von Multi-Suction Item Picking at Scale über multimodales Lernen des Pick-Erfolgs 通过拣选成功率的多模态学习,演示大规模多吸盘物品拣选 2506.10359v1
  • 191 06-12 TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree TreeLoRA: Effizientes kontinuierliches Lernen über schichtweise LoRAs, geleitet von einem Hierarchischen Gradienten-Similaritätsbaum TreeLORA:通过层-维兹 LoRAs 有效持续学习,由等级梯度梯度- 硅化树指导 2506.10355v1
  • 192 06-12 History-Aware Neural Operator: Robust Data-Driven Constitutive Modeling of Path-Dependent Materials History-Aware Neuronal Operator: Robuste datengetriebene konstitutive Modellierung von pfadabhängigen Materialien 历史软件神经操作器:路径依赖材料的强有力数据驱动建模模型 2506.10352v1
  • 193 06-12 PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications PhysNav-DG: Ein neuartiges adaptives Framework für robuste VLM-Sensor-Fusion in Navigationsanwendungen PhysNav-DG:导航应用中的强力VLM-传感器融合新适应框架 2505.01881v2
  • 194 06-12 Enhancing Glass Defect Detection with Diffusion Models: Addressing Imbalanced Datasets in Manufacturing Quality Control Verbesserung der Glasdefekterkennung mit Diffusionsmodellen: Adressierung unausgewogener Datensätze in der Fertigungsqualitätskontrolle 利用传播模型加强玻璃破损检测:在制造业质量控制中解决数据集不平衡问题 2505.03134v2
  • 195 06-12 Decoding for Punctured Convolutional and Turbo Codes: A Deep Learning Solution for Protocols Compliance Dekodierung für punktierte Konvolutions- und Turbo-Codes: Eine Deep-Learning-Lösung für Protokolle Compliance 钻穿的革命代码和涡轮代码解码:遵守议定书的深学习解决方案 2502.15475v2
  • 196 06-12 PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation PhysioWave: Multi-Scale Wavelet-Transformer für Physiologische Signaldarstellung PhysioWave: 生理信号代表的多阶段波盘转换器 2506.10351v1
  • 197 06-12 LightKG: Efficient Knowledge-Aware Recommendations with Simplified GNN Architecture LightKG: Effiziente wissensbasierte Empfehlungen mit vereinfachter GNN-Architektur LightKG: 具有简化 GNN 架构的高效知识软件建议 2506.10347v1
  • 198 06-12 Technical Report with Proofs for A Full Picture in Conformance Checking: Efficiently Summarizing All Optimal Alignments Technischer Bericht mit Beweisen für ein vollständiges Bild in Übereinstimmungsprüfung: Alle optimalen Ausrichtungen effizient zusammenfassen 技术报告,附有符合要求检查中完整图片的校对证明:高效地对齐所有最佳对齐 2506.10345v1
  • 199 06-12 Provably Learning from Language Feedback Beweisbares Lernen aus Sprachfeedback 可证明地从语言反馈中学习 2506.10341v1
  • 200 06-12 Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Towards Privacy-Preserving Distributed Intelligence in AR/VR/MR Multimodale Multi-Task Federated Foundation Models für Extended-Reality-Systeme der nächsten Generation: Auf dem Weg zu datenschutzerhaltender verteilter Intelligenz in AR/VR/MR 面向下一代扩展现实系统的多模态多任务联邦基础模型:迈向AR/VR/MR中的隐私保护分布式智能 2506.05683v3
  • 201 06-12 On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention On-the-Fly Adaptive Destillation von Transformer auf Dual-State Lineare Aufmerksamkeit 将Transformer实时自适应蒸馏为双状态线性注意力 2506.09316v2
  • 202 06-12 Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs Amulett: Neuausrichtung während der Testzeit für Personalisierte Präferenzanpassung von LLMs 缩略图:在试验期间重新对准,以适应LLMM的个性化偏好 2502.19148v3
  • 203 06-12 Air in Your Neighborhood: Fine-Grained AQI Forecasting Using Mobile Sensor Data Luft in Ihrer Nachbarschaft: Feinkörnige AQI-Prognose mit mobilen Sensordaten 利用移动感应器数据进行精密的AQI预报 2506.10332v1
  • 204 06-12 CompilerDream: Learning a Compiler World Model for General Code Optimization CompilerDream: Lernen eines Compiler-Weltmodells für die allgemeine Code-Optimierung 汇编者:学习编纂者世界通用守则优化模式 2404.16077v3
  • 205 06-12 Towards Scalable SOAP Note Generation: A Weakly Supervised Multimodal Framework Auf dem Weg zu einer skalierbaren SOAP-Note Generation: Ein schwach überwachter multimodaler Rahmen 向可缩放的 SOAP 笔记本生成方向迈进: 受微弱监督的多模式框架 2506.10328v1
  • 206 06-12 A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon Ein Benchmark für die Verallgemeinerung unterschiedlicher Teamstrategien im wettbewerbsfähigen Pokémon 普凯蒙竞争中全面推广不同团队战略的基准 2506.10326v1
  • 207 06-12 SWDL: Stratum-Wise Difference Learning with Deep Laplacian Pyramid for Semi-Supervised 3D Intracranial Hemorrhage Segmentation SWDL: Stratum-Wise Difference Learning mit tiefer Laplacian Pyramide für halbüberwachte 3D-Intrakranialblutungssegmentierung SWDL: 与深拉拉拉比西亚金字塔为半超3D内出血分块进行平流和深拉拉拉西亚金字塔差异学习 2506.10325v1
  • 208 06-12 DiffUMI: Training-Free Universal Model Inversion via Unconditional Diffusion for Face Recognition DiffUMI: Schulungsfreie universelle Modellinversion über Bedingungslose Diffusion für die Gesichtserkennung DiffUMI:通过无条件传播面相承认,无培训通用模式翻版 2504.18015v2
  • 209 06-12 DeePoly: A High-Order Accuracy Scientific Machine Learning Framework for Function Approximation and Solving PDE DeePoly: Ein High-Order Accuracy Scientific Machine Learning Framework für die Funktionsannäherung und Lösung von PDE Deepoly: 功能近似和溶解PDE高分精确度科学机器学习框架 2506.04613v2
  • 210 06-12 Policy-Based Trajectory Clustering in Offline Reinforcement Learning Policy-based Trajektorie Clustering im Offline-Verstärkungslernen 以政策为基础的离线强化学习轨迹集群 2506.09202v2
  • 211 06-12 PerfTracker: Online Performance Troubleshooting for Large-scale Model Training in Production PerfTracker: Online-Performance-Fehlersuche für großformatige Modellschulungen in der Produktion PerfTracker:大规模生产示范培训在线绩效问题解决 2506.08528v3
  • 212 06-12 Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting Aufwärtsgewichtung einfache Proben in Feinabstimmung Mitigates Vergessen 高重量加重量易加重量样本 2502.02797v2
  • 213 06-12 PyLO: Towards Accessible Learned Optimizers in PyTorch PyLO: Auf dem Weg zu zugänglich gelernten Optimierern in PyTorch PyLO:在皮托尔奇努力实现无障碍、可获取的优化 2506.10315v1
  • 214 06-12 Detecting Sockpuppetry on Wikipedia Using Meta-Learning Sockpuppetry auf Wikipedia mit Meta-Learning erkennen 利用元学习检测维基百科上的马甲账号 2506.10314v1
  • 215 06-12 Collaborative Min-Max Regret in Grouped Multi-Armed Bandits Kollaborativer Min-Max-Regret in gruppierten Multi-Armed Bandits 分组多臂老虎机中的协作最小最大遗憾 2506.10313v1
  • 216 06-12 Agnostic Smoothed Online Learning without Knowledge of the Base Measure Agnostisches geglättetes Online-Lernen ohne Kenntnis des Basismaßes 缺乏基础措施知识的Agnostic平滑在线学习 2410.05124v3
  • 217 06-12 Self-learning signal classifier for decameter coherent scatter radars Selbstlernender Signalklassifikator für Dekameter-Konsistenzstreuradar 10度连续散射雷达自学信号分类器 2506.10305v1
  • 218 06-12 The Alignment Trap: Complexity Barriers Die Alignment-Falle: Komplexitätsbarrieren 协调陷阱:复杂障碍 2506.10304v1
  • 219 06-12 Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training ReLU-Netzwerke dazu bringen, bei der Initialisierung und während des Trainings exponentiell viele lineare Regionen aufzuweisen 促使ReLU网络在初始化和训练期间呈现指数级数量的线性区域 2311.18022v7
  • 220 06-12 Distributionally-Constrained Adversaries in Online Learning Verteilt bedingte Widersacher im Online-Lernen 在线学习中经分发的、经过培训的在线学习论文 2506.10293v1
  • 221 06-12 Heterogeneous Multi-Agent Reinforcement Learning for Distributed Channel Access in WLANs Heterogenes Multi-Agenten-Verstärkungslernen für verteilten Kanalzugang in WLANs 不同多机构强化学习,用于在局域网中分配的频道接入 2412.14218v2
  • 222 06-12 Context Is Not Comprehension Kontext ist nicht Verständnis 上下文不等于理解 2506.04907v4
  • 223 06-12 ClusterUCB: Efficient Gradient-Based Data Selection for Targeted Fine-Tuning of LLMs ClusterUCB: Effiziente Gradient-basierte Datenauswahl für gezielte Feinabstimmung von LLMs ClusterUCB: 面向LLM定向微调的高效梯度数据选择 2506.10288v1
  • 224 06-12 Graph-MLLM: Harnessing Multimodal Large Language Models for Multimodal Graph Learning Graph-MLLM: Multimodale große Sprachmodelle für multimodales Graphenlernen nutzen 图图-MLLM:利用多模式多模式大语言模型进行多模式图表学习 2506.10282v1
  • 225 06-12 Noise Balance and Stationary Distribution of Stochastic Gradient Descent Geräuschausgleich und Stationäre Verteilung des stochastischen Gradientenabstiegs 沙粒渐变后裔的噪音平衡和固定分布 2308.06671v2
  • 226 06-12 Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty Leichtes Datenset Pruning ohne vollständiges Training über Beispiel Schwierigkeit und Vorhersage Ungewissheit 轻量数据集不通过实例困难和预测不确定性进行充分培训 2502.06905v3
  • 227 06-12 VQC-MLPNet: An Unconventional Hybrid Quantum-Classical Architecture for Scalable and Robust Quantum Machine Learning VQC-MLPNet: Eine unkonventionelle hybride Quantum-Klassische Architektur für skalierbares und robustes Quantum Machine Learning VQC-MLPNet:用于可缩放和强力量量机器学习的非常规混合量子分类结构 2506.10275v1
  • 228 06-12 TimeBridge: Better Diffusion Prior Design with Bridge Models for Time Series Generation TimeBridge: Besseres Diffusions-Vorentwurf mit Bridge-Modellen für die Zeitreihenerstellung TimeBridge: 更好的传播前设计,具有时间序列生成的桥梁模型 2408.06672v2
  • 229 06-12 GD doesn’t make the cut: Three ways that non-differentiability affects neural network training GD macht nicht den Schnitt: Drei Wege, dass Nicht-Unterschiedlichkeit das neuronale Netzwerk-Training beeinflusst GD没有做出削减: 三种无差异的方式影响神经网络培训 2401.08426v9
  • 230 06-12 Predicting function of evolutionarily implausible DNA sequences Vorhersagefunktion evolutionär unglaubwürdiger DNA-Sequenzen 进化性DNA序列的预测功能 2506.10271v1
  • 231 06-12 Interior-Point Vanishing Problem in Semidefinite Relaxations for Neural Network Verification Interior-Point Vanishing Problem in semidefinite Entspannung für Neural Network Verifikation 神经网络核查半无限期放松措施中的内部-点消失问题 2506.10269v1
  • 232 06-12 Do Language Models Have Bayesian Brains? Distinguishing Stochastic and Deterministic Decision Patterns within Large Language Models Haben Sprachmodelle Bayesische Gehirne? Unterscheidung stochastischer und deterministischer Entscheidungsmuster innerhalb großer Sprachmodelle 语言模型是否具有贝叶斯大脑?区分大语言模型中的随机性与确定性决策模式 2506.10268v1
  • 233 06-12 Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update Heavy-Tailed Linear Bandits: Huber-Regression mit One-Pass-Update 重困线性强盗:用单页更新的 “ Huber Refer “ 后退 2503.00419v2
  • 234 06-12 Meta-learning Representations for Learning from Multiple Annotators Meta-Learning-Darstellungen zum Lernen von mehreren Annotatoren 从多个宣传员那里学习的元学习代表 2506.10259v1
  • 235 06-12 Elucidating the Design Space of Multimodal Protein Language Models Erklärung des Gestaltungsraums multimodaler Protein-Sprachmodelle 说明多模式蛋白质语言模型的设计空间 2504.11454v3
  • 236 06-12 OmniSage: Large Scale, Multi-Entity Heterogeneous Graph Representation Learning OmniSage: Large Scale, Multi-Entity Heterogenes Graph Representation Learning 总括:大规模、多实体异质图代表制学习 2504.17811v3
  • 237 06-12 Qronos: Correcting the Past by Shaping the Future… in Post-Training Quantization Qronos: Die Vergangenheit korrigieren, indem sie die Zukunft gestalten… in der Post-Training-Quantisierung Qronos:在培训后量化中通过塑造未来来纠正过去 2505.11695v2
  • 238 06-11 (3) Worth Their Weight: Randomized and Regularized Block Kaczmarz Algorithms without Preprocessing Gewicht wert: Randomisierte und Regularisierte Blockkaczmarz-Algorithmen ohne Vorverarbeitung 重量:无预处理的随机和正规化卡茨马尔兹理算区块 2502.00882v2
  • 239 06-11 A new type of federated clustering: A non-model-sharing approach Eine neue Art von föderierten Clustern: Ein nicht-Modell-Sharing-Ansatz 新型的联盟联合集群:非模式共享办法 2506.10244v1
  • 240 06-11 PointNet with KAN versus PointNet with MLP for 3D Classification and Segmentation of Point Sets PointNet mit KAN versus PointNet mit MLP für 3D Klassifizierung und Segmentierung von Punktsätzen 与 KAN 和 PointNet 和 MLP 相对的点网点网点,用于 3D 的点集分类和分划 2410.10084v3
  • 241 06-11 Function-Guided Conditional Generation Using Protein Language Models with Adapters Funktionsgeführte Conditional Generation Verwendung von Protein Language Models mit Adaptern 使用配有适应器的蛋白语言模型的 函数引导有条件生成 2410.03634v2
  • 242 06-11 Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods Prompt Angriffe Enthüllen Oberflächliche Wissensentfernung in unlearning Methoden 以不学习方法清除知识 2506.10236v1
  • 243 06-11 LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation LaMAGIC2: Erweiterte Schaltungsformulierungen für sprachmodellbasierte analoge Topologie-Generierung LaMAGIC2:语言模拟模拟模拟地形生成的先进电路配制 2506.10235v1
  • 244 06-11 Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices Große Sprachmodelle sichern: Bedrohungen, Schwachstellen und verantwortungsvolle Praktiken 保障大语言模式:威胁、脆弱性和负责任做法 2403.12503v2
  • 245 06-11 ScoreMix: Improving Face Recognition via Score Composition in Diffusion Generators ScoreMix: Verbesserung der Gesichtserkennung über Score-Komposition in Diffusionsgeneratoren 分数组合:通过传播发电机的分分数构成改善面对面的承认 2506.10226v1
  • 246 06-11 Meta-learning Optimizers for Communication-Efficient Learning Meta-Learning-Optimierer für Kommunikations-Effizientes Lernen 通信-高效学习的元学习优化工具 2312.02204v2
  • 247 06-11 Neural Networks Generalize on Low Complexity Data Neurale Netzwerke verallgemeinern auf Daten mit geringer Komplexität 通用低复杂度数据神经网络 2409.12446v3
  • 248 06-11 The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks Die Auswirkungen von Feature Scaling im maschinellen Lernen: Auswirkungen auf Regressions- und Klassifizierungsaufgaben 机械学习中的特质增强效果:对倒退和分类任务的影响 2506.08274v2
  • 249 06-11 SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design SALAD: Systematische Bewertung von Machine Unlearning im LLM-gestützten Hardware-Design SALAD: 系统评估LLM辅助硬件设计中的机器遗忘 2506.02089v2
  • 250 06-11 Cross-Learning Between ECG and PCG: Exploring Common and Exclusive Characteristics of Bimodal Electromechanical Cardiac Waveforms Cross-Learning zwischen EKG und PCG: Erforschung gemeinsamer und exklusiver Eigenschaften bimodaler elektromechanischer Herzwellenformen ECG和PCG之间的交叉学习:探索双式电动机械心脏病波形的常见和独有特点 2506.10212v1
  • 251 06-11 Failure Modes of LLMs for Causal Reasoning on Narratives Failure Modes von LLMs für die ursächliche Begründung von Narrativen 以叙述为由解释原因的LLMs失败模式 2410.23884v4
  • 252 06-11 SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models SealQA: Anhebung der Messlatte für die Vernunft in Search-Augmented Language Models SealQA: 提高搜索增强语言模式中的原因栏 2506.01062v2
  • 253 06-11 ETS: Efficient Tree Search for Inference-Time Scaling ETS: Effiziente Baumsuche nach Inferenz-Zeitskalierung ETS: 高效树搜索推断时间缩放 2502.13575v2
  • 254 06-11 Towards Fair Representation: Clustering and Consensus Auf dem Weg zu einer fairen Vertretung: Clustering und Konsens 实现公平代表性:集群和共识 2506.08673v2
  • 255 06-11 AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent AWP: Activation-Aware Weight Pruning und Quantisierung mit projiziertem Gradient Descent AWP: 与预测的渐变人种相比,活性-敏量 体重 稳重和量化 2506.10205v1
  • 256 06-11 VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use VTool-R1: VLMs lernen mit Bildern zu denken, indem sie mehr über multimodale Werkzeugnutzung lernen VTool-R1:VLMs通过多模式工具使用强化学习学习如何用图像思考 2505.19255v3
  • 257 06-11 Prompt Variability Effects On LLM Code Generation Auswirkungen der Prompt-Variabilität auf die LLM-Code-Generierung 提示词变化对LLM代码生成的影响 2506.10204v1
  • 258 06-11 From Neural Representations to Interpretable Logic Rules Von neuralen Vertretungen zu interpretierbaren Logischen Regeln 从神经代表到可解释逻辑规则 2501.08281v2
  • 259 06-11 Learning Geometric Invariant Features for Classification of Vector Polygons with Graph Message-passing Neural Network Lernen Geometrische Invariante Funktionen zur Klassifizierung von Vektor-Polygonen mit Graph Message-passing Neural Network 具有图形信件通过神经网络的矢量多边形分类 2407.04334v2
  • 260 06-11 Privacy-aware Berrut Approximated Coded Computing for Federated Learning Privacy-aware Berrut Approximated Coded Computing für Federated Learning 面向联邦学习的隐私感知Berrut近似编码计算 2405.01704v3
  • 261 06-11 DynaSubVAE: Adaptive Subgrouping for Scalable and Robust OOD Detection DynaSubVAE: Adaptive Subgruppierung für skalierbare und robuste OOD-Erkennung DynaSubVAE:可缩放和强力OOD探测适应性分组 2506.10200v1
  • 262 06-11 Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance Objekt-Halluzination in großen Visions-Sprachen-Modellen durch Bild-umrundete Anleitung abmildern 通过图像四舍五入指南在大型视觉语言模型中减轻物体幻觉 2402.08680v2
  • 263 06-11 Exploring Topological and Localization Phenomena in SSH Chains under Generalized AAH Modulation: A Computational Approach Erforschung topologischer und Lokalisierungsphänomene in SSH-Ketten unter generalisierter AAH-Modulation: Ein rechnerischer Ansatz 根据普遍AAAH变动:一种计算方法,在SSH链链中探索地形学和本地化现象 2506.10195v1
  • 264 06-11 Multi-task Representation Learning for Mixed Integer Linear Programming Multi-Task-Darstellungs-Lernen für gemischte Integer-Linear-Programmierung 混合整数线性线性方案规划多任务代表学习 2412.14409v2
  • 265 06-11 Improving Oral Cancer Outcomes Through Machine Learning and Dimensionality Reduction Verbesserung der Ergebnisse von Oralkrebs durch maschinelles Lernen und Reduzierung der Dimensionalität 通过机器学习和减少分量来改善口腔癌结果 2506.10189v1
  • 266 06-11 Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce Zukunft der Arbeit mit KI-Agenten: Auditing Automation und Augmentation Potential in der gesamten US-Arbeitskraft 未来与AI代理机构的合作:全美国员工队伍的审计自动化和增强潜力 2506.06576v2
  • 267 06-11 Balans: Multi-Armed Bandits-based Adaptive Large Neighborhood Search for Mixed-Integer Programming Problem Balans: Multi-Armed Bandits-basierte adaptive Großnachbarschaft Suche nach gemischt-integer-Programmierungsproblem Balans:多武装强盗基于适应性的大型邻里搜索混合内插方案拟订问题 2412.14382v2
  • 268 06-11 Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment Skalierbare nicht-equivariante 3D-Molekül-Erzeugung über Rotational Alignment 通过旋转对齐可缩放的非等式 3D 分子生成 2506.10186v1
  • 269 06-11 Optimizing Genetic Algorithms with Multilayer Perceptron Networks for Enhancing TinyFace Recognition Optimierung genetischer Algorithmen mit mehrschichtigen Perceptron-Netzwerken zur Verbesserung der TinyFace-Erkennung 利用增强微面识别的多层倍感人网络优化遗传算法 2506.10184v1
  • 270 06-11 A Comparative Study of Machine Learning Techniques for Early Prediction of Diabetes Eine vergleichende Studie über Techniken des maschinellen Lernens zur Frühvorhersage von Diabetes 早期预测糖尿病的机器学习技术比较研究 2506.10180v1
  • 271 06-11 Geometric Regularity in Deterministic Sampling of Diffusion-based Generative Models Geometrische Regelmäßigkeit bei der Deterministischen Probenahme von Diffusions-basierten Generativen Modellen 以传播为基础的创造模型的确定性抽样的几何规律性 2506.10177v1
  • 272 06-11 SPARKE: Scalable Prompt-Aware Diversity Guidance in Diffusion Models via RKE Score SPARKE: Skalierbares Prompt-Aware Diversity Guidance in Diffusionsmodellen über RKE-Score SPARKE:通过RKE分数在传播模型中提供可缩放的即时快速软件多样性指导 2506.10173v1
  • 273 06-11 Learning a Gaussian Mixture for Sparsity Regularization in Inverse Problems Eine Gaußsche Mischung für Sparsity-Regularisierung in inversen Problemen lernen 学习高斯混合模型,以在反问题中实现稀疏正则化 2401.16612v2
  • 274 06-11 Momentum Multi-Marginal Schrödinger Bridge Matching Momentum Multi-Marginal Schrödinger Bridge Matching 动量多边际薛定谔桥匹配 2506.10168v1
  • 275 06-11 Wasserstein Barycenter Soft Actor-Critic Wasserstein Barycenter Soft-Actor-Critic Wasserstein重心软演员-评论家 2506.10167v1
  • 276 06-11 The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset Der PNPL-Wettbewerb 2025: Spracherkennung und Phoneme-Klassifizierung im LibriBrain-Datensatz 2025年PNPL竞争:Libribrain数据集中的语音探测和电话分类 2506.10165v1
  • 277 06-11 Probabilistic Variational Contrastive Learning Probabilistisches Variationelles Kontrastives Lernen 概率概率变化差异性差异性学习 2506.10159v1
  • 278 06-11 Measuring Corporate Human Capital Disclosures: Lexicon, Data, Code, and Research Opportunities Corporate Human Capital Disclosures messen: Lexikon, Daten, Code und Forschungsmöglichkeiten 衡量公司人力资本披露:词汇、数据、代码和研究机会 2506.10155v1
  • 279 06-11 Analyzing Emotions in Bangla Social Media Comments Using Machine Learning and LIME Analysieren von Emotionen in Bangla Social Media Kommentare mit maschinellem Lernen und LIME 利用机器学习和LIME分析孟加拉社会媒体评论中的情感 2506.10154v1
  • 280 06-11 Attention on flow control: transformer-based reinforcement learning for lift regulation in highly disturbed flows Aufmerksamkeit auf Strömungssteuerung: Transformer-basiertes Bewehrungslernen zur Heberegulierung in stark gestörten Strömungen 注意流动控制:以变压器为基础的增强力学习如何在高度动荡的流量中进行升降管制 2506.10153v1
  • 281 06-11 Balanced Hyperbolic Embeddings Are Natural Out-of-Distribution Detectors Ausgewogene hyperbolische Einbettungen sind natürliche Out-of-Distribution-Detektoren 平衡的双曲嵌入器是天然流散探测器 2506.10146v1
  • 282 06-11 Physiological-Model-Based Neural Network for Heart Rate Estimation during Daily Physical Activities Physiologisch-Modellbasiertes Neuronales Netzwerk zur Schätzung der Herzfrequenz während täglicher körperlicher Aktivitäten 日常体育活动中心电率估计的生理-基于模型的神经网络 2506.10144v1
  • 283 06-11 Diffusion prior as a direct regularization term for FWI Diffusions-Prior als direkter Regularisierungsterm für FWI 将扩散先验作为FWI的直接正则化项 2506.10141v1
  • 284 06-11 Survival Analysis as Imprecise Classification with Trainable Kernels Überlebensanalyse als ungenaue Klassifizierung mit lernfähigen Kernen 进行生存分析,将生存分析作为可训练内核的不精确分类 2506.10140v1
  • 285 06-11 Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban Lernte Suche interpretieren: Ein Übergangsmodell und eine Wertfunktion in einer RNN zu finden, die Sokoban spielt 解释学识搜索:在播放 Sokoban 的 RNN 中找到过渡模式和价值函数 2506.10138v1
  • 286 06-11 Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning Selbstprädiktive Darstellungen für die kombinatorische Generalisierung im Verhaltensklonen 行为克隆综合通用的自我预测代表 2506.10137v1
  • 287 06-11 Provable Sim-to-Real Transfer via Offline Domain Randomization Provable Sim-to-Real-Übertragung über Offline-Domain Randomisierung 通过离线网域随机转接 2506.10133v1
  • 288 06-11 ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs ViCrit: Eine überprüfbare Verstärkungs-Lern-Proxy-Aufgabe für visuelle Wahrnehmung in VLMs 虚拟: VLM 视觉感知的可核查强化学习代用任务 2506.10128v1
  • 289 06-11 Generative Modeling with Diffusion Generative Modellierung mit Diffusion 利用传播生成模型 2412.10948v2
  • 290 06-11 Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms Treffen Sie mich am Arm: Das kooperative Multi-Armed Bandits Problem mit geteilten Armen 与我在 “ 武器:合作的多武装强盗与可分享武器问题 “ 聚会 2506.10127v1
  • 291 06-11 Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds’ Annotated Imagery Peer-Ranked Precision: Erstellung eines Basisdatensatzes für feinfühlige Visionsmodelle aus DataSeeds kommentierten Bildern 同行式精确度:从数据种子附加图象中为精美的愿景模型创建基础数据集 2506.05673v3
  • 292 06-11 GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments GRAIL: Ein Benchmark für GRAPH ActIve Learning in dynamischen Sensing-Umgebungen GRAIL: 动态遥感环境中GRaph Active学习基准 2506.10120v1
  • 293 06-11 Detecção da Psoríase Utilizando Visão Computacional: Uma Abordagem Comparativa Entre CNNs e Vision Transformers Psoriasis-Erkennung mittels Computer Vision: Ein vergleichender Ansatz zwischen CNNs und Vision Transformers 利用计算机视觉检测银屑病:CNN与Vision Transformer的比较研究 2506.10119v1
  • 294 06-11 NnD: Diffusion-based Generation of Physically-Nonnegative Objects NnD: Diffusionsbasierte Generierung physikalisch-nonnegativer Objekte NND: 以传播为主的物理-氮化物体的生成 2506.10112v1
  • 295 06-11 AI5GTest: AI-Driven Specification-Aware Automated Testing and Validation of 5G O-RAN Components AI5GTest: AI-Driven Specification-Aware Automatisierte Prüfung und Validierung von 5G O-RAN Komponenten AI5GTest: AI驱动的5G O-RAN组件规范感知自动化测试与验证 2506.10111v1
  • 296 06-11 A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning Ein einheitliches Framework, um Symmetrie im maschinellen Lernen zu stärken, zu entdecken und zu fördern 实施、发现和促进机器学习对称的统一框架 2311.00212v3
  • 297 06-11 Learning to Collaborate Over Graphs: A Selective Federated Multi-Task Learning Approach Lernen, über Graphen zu kollaborieren: Ein selektiver, federführender Lernansatz mit mehreren Aufgaben 学习 “ 合作翻图:选择性联邦多任务学习方法 “ 2506.10102v1
  • 298 06-11 VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis VISTA: Vision-Language-Schlussfolgerung für eine trainingsfreie Analyse der Stock-Zeitreihen VISTA:无培训-库存无培训-时间-系列分析的远景-语言推断 2505.18570v3
  • 299 06-11 Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes Grundlegende Grenzen des Lernens Hochdimensionale Vereinfachungen in Noisy Regimes 吵闹制度中学习高维简单主义的基本限制 2506.10101v1
  • 300 06-11 Token-Efficient RL for LLM Reasoning Token-Efficient RL für LLM-Vernunft 用于LLM推理的Token-Efficient RL 2504.20834v4
  • 301 06-11 Estimating the Joint Probability of Scenario Parameters with Gaussian Mixture Copula Models Schätzung der gemeinsamen Wahrscheinlichkeit von Szenario-Parametern mit Gaussian Mixture Copula-Modellen 与高山混合混合胶合合构件模型共同估计设想参数的 联合概率 2506.10098v1
  • 302 06-11 Unsupervised Deep Clustering of MNIST with Triplet-Enhanced Convolutional Autoencoders Unüberwachtes Deep Clustering von MNIST mit Triplet-Enhanced Convolutional Autoencodern 在无人监督下,MNIST与三维强化的革命自动自动编码器一起深入集群 2506.10094v1
  • 303 06-11 Efficient kernelized bandit algorithms via exploration distributions Effiziente kernisierte Bandit-Algorithmen über Explorations-Distributionen 通过勘探分布分配进行高效率的内脏化土匪算法 2506.10091v1
  • 304 06-11 Assortment Optimization for Patient-Provider Matching Sortimentsoptimierung für Patient-Provider-Matching 患者-供应者匹配的组群优化 2502.10353v2
  • 305 06-11 Optimizing Latent Dimension Allocation in Hierarchical VAEs: Balancing Attenuation and Information Retention for OOD Detection Optimierung der Latent Dimension Allocation in Hierarchical VAEs: Abgleichende Dämpfung und Informationsretention für OOD Detection 优化在等级性VAEs中的中局量分配:平衡减速和信息保留以探测OOD 2506.10089v1
  • 306 06-11 M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Predictive Embedding Architecture M3-JEPA: Multimodale Ausrichtung über Multi-Gate MoE auf Basis der Joint-Predictive Embedding Architecture M3-JEPA:通过基于联合预防嵌入结构的多门教育部实现多式联运协调 2409.05929v5
  • 307 06-11 Patient-Specific Deep Reinforcement Learning for Automatic Replanning in Head-and-Neck Cancer Proton Therapy Patientenspezifisches Tiefen-Verstärkungs-Lernen zur automatischen Neuplanung in der Kopf-Neck-Krebs-Protonen-Therapie 在头部和颈部癌症质子治疗中进行自动再规划的 具体病人深强化学习 2506.10073v1
  • 308 06-11 Textual Bayes: Quantifying Uncertainty in LLM-Based Systems Textbuchten: Quantifizierung der Unsicherheit in LLM-basierten Systemen 以LLM为基础的系统中的量化不确定性 2506.10060v1
  • 309 06-11 DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos DGS-LRM: 实时变形 3D Gaussian 重建独视视频 2506.09997v1
  • 310 06-11 Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling Flipping gegen alle Ungeraden: LLM-Münze Flip-Bias durch verbalisierte Ablehnungsstichproben reduzieren 逆势翻转:通过言语化拒绝采样减少LLM掷硬币偏差 2506.09998v1
  • 311 06-11 EVINET: Towards Open-World Graph Learning via Evidential Reasoning Network EVINET: Auf dem Weg zum Open-World Graph Learning über Evidential Reasoning Network EVINET:通过证明理由网络走向开放世界图表学习 2506.07288v2
  • 312 06-11 Text-Aware Image Restoration with Diffusion Models Text-Aware Bildrestaurierung mit Diffusionsmodellen 带有传播模型的文本软件图像恢复 2506.09993v1
  • 313 06-11 Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Multiverse: Ihre Sprachmodelle entscheiden geheim, wie die Generation parallelisiert und zusammengeführt werden kann 多种语言:您的语言模型秘密决定如何平行和合并一代 2506.09991v1
  • 314 06-11 Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation Chain-of-Action: Trajektorie Autoregressive Modellierung für die Robotermanipulation 操作链:机器人操纵的轨迹自动递减模型模型 2506.09990v1
  • 315 06-11 Spectral Image Tokenizer Spektraler Bild-Tokenizer 频谱图像调试器 2412.09607v2
  • 316 06-11 EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits EditInspector: Ein Benchmark für die Auswertung von textgeführten Bildbearbeitungen 编辑监视器:文本指导图像编辑评价基准 2506.09988v1
  • 317 06-11 Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs Omni-DPO: Ein Dual-Perspektiven-Paradigma für dynamisches Preference-Lernen von LLMs Omni-DPO:一个双双面范式,以动态方式学习LLMs 2506.10054v1
  • 318 06-11 A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs Ein Shortcut-aware-Video-QA-Benchmark für physisches Verständnis über minimale Video-Paare 通过最小视频对等器进行身体理解的捷径视距视频QA基准 2506.09987v1
  • 319 06-11 V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning V-JEPA 2: Selbstüberwachte Videomodelle ermöglichen Verständnis, Vorhersage und Planung V-JEPA2:自我监督的视频模型,促进理解、预测和规划 2506.09985v1
  • 320 06-11 Trustworthy AI: Safety, Bias, and Privacy – A Survey Vertrauenswürdige KI: Sicherheit, Bias und Privatsphäre – Eine Umfrage 值得信赖的人工智能:安全、偏见和隐私 – 综述 2502.10450v2
  • 321 06-11 Resa: Transparent Reasoning Models via SAEs Resa: Transparente Begründungsmodelle über SAE Resa:通过SAEs建立透明说明理由模型 2506.09967v1
  • 322 06-11 Gradient Aligned Regression via Pairwise Losses Gradient Aligned Regression über Paarweise Verluste 通过对等损失逐步走向一致后退 2402.06104v6
  • 323 06-11 Griffin: Towards a Graph-Centric Relational Database Foundation Model Griffin: Auf dem Weg zu einem Graph-Centric Relational Database Foundation Modell 格里芬:走向图形中心关系数据库基础模型 2505.05568v2
  • 324 06-11 Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models Provable Benefits of Unüberwachte Pre-Training and Transfer Learning via Single-Index Models 通过单一指标模式进行的未受监督的培训和转让前学习的惠益 2502.16849v2
  • 325 06-11 Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy Kvasir-VQA-x1: Ein multimodaler Datensatz für medizinische Vernunft und robustes MedVQA in der gastrointestinalen Endoskopie Kvasir-VQA-x1:一个用于医疗理由的多模式数据集和气肠内肠内镜检查中的强力MDVQA 2506.09958v1
  • 326 06-11 Canonical Latent Representations in Conditional Diffusion Models Kanonische Latentendarstellungen in bedingten Diffusionsmodellen 有条件扩散模型中卡尼卡内端表示器 2506.09955v1
  • 327 06-11 Scalable First-order Method for Certifying Optimal k-Sparse GLMs Skalierbare Methode erster Ordnung zur Zertifizierung optimaler k-Spar-GLMs 可缩放的核证最佳 k- Sparse GLM 最佳 GLM 的可缩放第一等级方法 2502.09502v3
  • 328 06-11 7B Fully Open Source Moxin-LLM/VLM – From Pretraining to GRPO-based Reinforcement Learning Enhancement 7B Vollständig offene Quelle Moxin-LLM/VLM – Vom Vortraining zur GRPO-basierten Verstärkungs-Lernverbesserung 7B 完全开放源代码 Moxin-LLM/VLM – 从预培训到基于GRPO的强化学习加强 2412.06845v5
  • 329 06-11 Curvature Tuning: Provable Training-free Model Steering From a Single Parameter Curvature Tuning: Vorstellbare trainingsfreie Modelllenkung aus einem einzigen Parameter 曲线图示:从单一参数看可实现的无培训示范指导 2502.07783v4
  • 330 06-11 The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability Die Komplexität der Online-Strategischen Entscheidungsfindung mit Informationen Asymmetrie und Wissenstransportierbarkeit 在线战略决策与信息不对称和知识可传输性 2506.09940v1
  • 331 06-11 Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order Leveraging Coordinate Momentum in SignSGD und Muon: Speicheroptimierte Null-Order 利用 SignSGD 和 Muon 中的调控坐标动态: 内存适应的零点 2506.04430v2
  • 332 06-11 Limits of nonlinear and dispersive fiber propagation for an optical fiber-based extreme learning machine Grenzen der nichtlinearen und dispersiven Faserausbreitung für eine optische Faser-basierte extreme Lernmaschine 光纤极端学习机器的非线性和分散性纤维传播限制 2503.03649v3
  • 333 06-11 CaLMQA: Exploring culturally specific long-form question answering across 23 languages CaLMQA: Erforschen von kulturell spezifischen Langzeitfragen in 23 Sprachen CALMQA:探讨以23种语言回答文化上特有的长式问题 2406.17761v3
  • 334 06-11 Generate-then-Verify: Reconstructing Data from Limited Published Statistics Generate-then-Verify: Rekonstruieren von Daten aus begrenzten veröffentlichten Statistiken 先生成后核查:从有限的已发布统计数据中重建数据 2504.21199v2
  • 335 06-11 Discovering Forbidden Topics in Language Models Verbotene Themen in Sprachmodellen entdecken 发现语言模型中的禁止专题 2505.17441v3
  • 336 06-11 RNE: a plug-and-play framework for diffusion density estimation and inference-time control RNE: ein Plug-and-Play-Framework für Diffusionsdichteschätzung und Inferenzzeitkontrolle RNE:用于传播密度估计和推断时间控制的一个插件和操作框架 2506.05668v2
  • 337 06-11 Bayesian Probabilistic Matrix Factorization Bayesische probabilistische Matrix-Fabrikisierung Bayesian 概率矩阵 2506.09928v1
  • 338 06-11 Mamba time series forecasting with uncertainty quantification Mamba-Zeitreihenvorhersage mit Unsicherheitsquantifizierung Mamba时间序列预测,带有不确定性量化的Mamba时间序列 2503.10873v2
  • 339 06-11 Dynamic Diffusion Schrödinger Bridge in Astrophysical Observational Inversions Dynamische Diffusion Schrödinger Brücke in astrophysikalischen Beobachtungsinversionen 天体物理观测反转中的动态扩散Srödinger桥 2506.08065v2
  • 340 06-11 Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning Apollo: A Posteriori Label-Only Mitgliedschaft Inferenz Angriff auf Maschine Unlearning 阿波罗:一个后选标签 - 唯一的会员 推论攻击 机器退出学习 2506.09923v1
  • 341 06-11 Logits-Based Finetuning Logits-based Finetuning 基于登录的微调 2505.24461v2
  • 342 06-11 Optimal Noise Reduction in Dense Mixed-Membership Stochastic Block Models under Diverging Spiked Eigenvalues Condition Optimale Lärmreduzierung in Dichte Mischmitgliedschaft Stochastic Block Models unter Diverging Spiked Eigenvalues Zustand 在四分五裂的斯派派值条件下,减少高密度混合混合成员制碎块模型的最佳噪音 2307.14530v3
  • 343 06-11 Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity Adam exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity Adam 通过逐坐标自适应性利用损失地貌的 $\ell_\infty$ 几何 2410.08198v3
  • 344 06-11 Temperature Optimization for Bayesian Deep Learning Temperaturoptimierung für Bayesian Deep Learning Bayesian深层学习的温度优化 2410.05757v2
  • 345 06-11 PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants PersonaLens: Ein Benchmark für die Personalisierungsbewertung in Conversational AI Assistants 人际关系:对话性AI助理个人化评价基准 2506.09902v1
  • 346 06-11 “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) “Was sind meine Optionen?”: Erklären von RL-Agenten mit unterschiedlichen nahezu optimalen Alternativen (erweitert) “我有什么选择?”:解释具有多种近最佳替代方法的RL代理(扩展) 2506.09901v1
  • 347 06-11 The Remarkable Robustness of LLMs: Stages of Inference? Die bemerkenswerte Robustheit von LLMs: Stufen der Schlussfolgerung? LLMS的显著威力:推论阶段? 2406.19384v2
  • 348 06-11 A look at adversarial attacks on radio waveforms from discrete latent space Ein Blick auf feindliche Angriffe auf Radiowellenformen aus diskretem latenten Raum 审视对来自离散潜空空间的无线电波形的对抗性攻击 2506.09896v1
  • 349 06-11 Causal Climate Emulation with Bayesian Filtering Causal Klima Emulation mit Bayesian Filtering 以贝耶斯过滤器模拟因果气候 2506.09891v1
  • 350 06-11 Learning single-index models via harmonic decomposition Ein-Index-Modelle über harmonische Zersetzung lernen 通过口腔分解学习单指数模型 2506.09887v1
  • 351 06-11 OmniJet-$α_C$: Learning point cloud calorimeter simulations using generative transformers OmniJet-$α_C$: Simulationen von Punktwolkenkalorimetern mit generativen Transformatoren OmniJet-$α_C$: 使用生成式Transformer学习点云量能器模拟 2501.05534v2
  • 352 06-11 Leveraging data-driven weather models for improving numerical weather prediction skill through large-scale spectral nudging Nutzung von datengesteuerten Wettermodellen zur Verbesserung der numerischen Wettervorhersagefähigkeit durch großflächige spektrale Nudging 利用数据驱动天气模型,通过大型光谱裸露提高数字天气预测技能 2407.06100v3
  • 353 06-11 UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching UmbraTTS: Text-zu-Sprechen an Umweltkontexte anpassen mit Flow Matching UmbratTS:用流动匹配使文字语音适应环境环境环境 2506.09874v1
  • 354 06-11 Private Aggregation for Byzantine-Resilient Heterogeneous Federated Learning Private Aggregation für Byzantinisch-Resilient Heterogenes Federated Learning Byzantine-抗后异异质联邦学习私营聚合 2506.09870v1
  • 355 06-11 Machine Learning-Based Classification of Oils Using Dielectric Properties and Microwave Resonant Sensing Machine Learning-based Klassifizierung von Ölen mit dielektrischen Eigenschaften und Mikrowellen-Resonant Sensing 利用电力特性和微波共振遥感对石油进行机 机 动 学习 分类 2506.09867v1
  • 356 06-11 Conformal Prediction as Bayesian Quadrature Konforme Vorhersage als bayesische Quadratur Bayesian 夸度的非正规预测 2502.13228v2
  • 357 06-11 Guided Graph Compression for Quantum Graph Neural Networks Geführte Graphenkompression für Quantum Graphen-Neural-Netzwerke 量图神经网络的向导图形压缩 2506.09862v1
  • 358 06-11 ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models ImageChain: Advancing Sequential Image-to-Text Reasoning in multimodalen großen Sprachmodellen 图像Chain:在多模式大语言模型中推进按顺序排列的图像到文字理由 2502.19409v2
  • 359 06-11 Tractable hierarchies of convex relaxations for polynomial optimization on the nonnegative orthant Tractable Hierarchien konvexer Entspannungen zur polynomialen Optimierung auf dem nicht negativen Orthanten 用于非阴性恒星多边优化的 convex 放松的可跟踪的等级结构 2209.06175v2
  • 360 06-11 FinTSBridge: A New Evaluation Suite for Real-world Financial Prediction with Advanced Time Series Models FinTSBridge: Eine neue Evaluierungs-Suite für reale finanzielle Vorhersagen mit Modellen der fortgeschrittenen Zeitreihen FinTSBridge:具有先进时间序列模型的实时世界金融预测新评价套件 2503.06928v2
  • 361 06-11 Advancing Exchange Rate Forecasting: Leveraging Machine Learning and AI for Enhanced Accuracy in Global Financial Markets Advancing Exchange Rate Forecasting: Nutzung von Machine Learning und KI für verbesserte Genauigkeit in den globalen Finanzmärkten 推进汇率预测:利用机器学习和人工智能提高全球金融市场的预测准确性 2506.09851v1
  • 362 06-11 Product of Experts with LLMs: Boosting Performance on ARC Is a Matter of Perspective Produkt von Experten mit LLMs: Leistungssteigerung auf ARC ist eine Frage der Perspektive 结合LLM的专家乘积:提升ARC上的表现是一个视角问题 2505.07859v2
  • 363 06-11 Reinforcement Learning from Human Feedback Verstärktes Lernen aus menschlichem Feedback 从人类反馈中学习 2504.12501v2
  • 364 06-11 A Deep Generative Model for the Simulation of Discrete Karst Networks Ein tiefes Generatives Modell für die Simulation diskreter Karst-Netzwerke Discrete Karst 网络模拟深度生成模型 2506.09832v1
  • 365 06-11 Multi-Variable Batch Bayesian Optimization in Materials Research: Synthetic Data Analysis of Noise Sensitivity and Problem Landscape Effects Multivariable Batch Bayesian Optimierung in der Materialforschung: Synthetische Datenanalyse von Lärmempfindlichkeit und Problemlandschaftseffekten 材料研究中多种可变批量贝耶斯最佳优化:噪音敏感度和问题景观效应合成数据分析 2504.03943v2
  • 366 06-11 Online Discovery of Simulation Models for Evolving Business Processes (Extended Version) Online Discovery of Simulation Models for Evolving Business Processes (Erweiterte Version) 不断演变的业务流程模拟模型在线发现(扩展版) 2506.10049v1
  • 367 06-11 Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation Sparser, besser, schneller, stärker: Sparsity Detection für effiziente automatische Differenzierung Sparser、更好、更快、更有力:为高效自动差异而进行分级检测 2501.17737v2
  • 368 06-11 Metritocracy: Representative Metrics for Lite Benchmarks Metritokratie: Repräsentative Metrics für Lite Benchmarks 宏观经济:利特基准的代议制衡量标准 2506.09813v1
  • 369 06-11 Generalizing Supervised Contrastive learning: A Projection Perspective Verallgemeinern des Überwachten Kontrastivlernens: Eine Projektionsperspektive 普及受监督的反竞争学习:预测视角 2506.09810v1
  • 370 06-11 Network Dynamics-Based Framework for Understanding Deep Neural Networks Network Dynamics-based Framework zum Verständnis tiefer neuraler Netzwerke 了解深神经网络的动态框架 2501.02436v3
  • 371 06-11 Using Shapley interactions to understand how models use structure Shapley-Interaktionen nutzen, um zu verstehen, wie Modelle Struktur verwenden 利用模拟互动来了解模型如何使用结构 2403.13106v2
  • 372 06-11 Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy Automatische Behandlungsplanung mit Verstärkungstraining für eine hoch dosierte Prostata Brachytherapie 利用高剂量前列腺布拉奇疗法强化学习进行自动治疗规划 2506.09805v1
  • 373 06-11 Regularizing Learnable Feature Extraction for Automatic Speech Recognition Regularisieren von Lern-Feature-Extraktion für automatische Spracherkennung 将自动语音识别的可读功能抽取器正规化 2506.09804v1
  • 374 06-11 Devil’s Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols Teufelshand: Daten vergiften Angriffe auf lokal private Graphen-Lernprotokolle 魔鬼之手:对本地私人图案学习程序的数据毒害攻击 2506.09803v1
  • 375 06-11 LogProber: Disentangling confidence from contamination in LLM responses LogProber: Entwirren des Vertrauens in LLM-Antworten 日志Prober:解除对LLM反应中污染的信心 2408.14352v2
  • 376 06-11 Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction Einschließlich sprachlicher Einschränkungen aus der externen Wissensquelle für Audio-Visual Target Speech Extraction 将外部知识来源的语言限制纳入音像目标演讲摘录 2506.09792v1
  • 377 06-11 DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images DANCE: Deep Learning-Assisted Analyse von Proteinsequenzen mit Chaos verbesserte Kaleidoskopische Bilder 成就:利用混乱强化卡莱多斯图象对蛋白序列进行深入学习辅助分析 2409.06694v3
  • 378 06-11 Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing Darstellung Shattering in Transformers: Synthetische Studie mit Wissensbearbeitung 在变形器中代表变形器:带有知识编辑的合成研究 2410.17194v5
  • 379 06-11 A theoretical framework for self-supervised contrastive learning for continuous dependent data Ein theoretischer Rahmen für selbstüberwachtes kontrastives Lernen für kontinuierliche abhängige Daten 为持续依赖数据进行自我监督对比式学习的理论框架 2506.09785v1
  • 380 06-11 On the Similarities of Embeddings in Contrastive Learning Über die Ähnlichkeiten von Einbettungen im kontrastiven Lernen 关于差异学习中的嵌入相似性 2506.09781v1
  • 381 06-11 Cross-Channel Unlabeled Sensing over a Union of Signal Subspaces Cross-Channel-Unmarkierte Sensing über eine Union von Signal-Subräumen 对信号子空间联盟的交叉通道无标签遥感 2506.09773v1
  • 382 06-11 Load-Aware Training Scheduling for Model Circulation-based Decentralized Federated Learning Load-Aware-Trainingsplanung für Modellzirkulationsbasiertes dezentralisiertes Federated Learning 模拟环环流分散式联邦学习模式的负载软件培训计划 2506.09769v1
  • 383 06-11 Anomaly Detection via Autoencoder Composite Features and NCE Anomalieerkennung über Autoencoder Composite-Features und NCE 通过自动编码器合成特征和 NCE 异常探测 2502.01920v2
  • 384 06-11 Learning to Optimize Package Picking for Large-Scale, Real-World Robot Induction Optimierung der Auswahl von Paketen für großformatige, real-Welt-Roboter-Induktion 学习优化大型、现实世界机器人上岗软件包选择 2506.09765v1
  • 385 06-11 Alice and the Caterpillar: A more descriptive null model for assessing data mining results Alice und die Caterpillar: Ein beschreibendes Nullmodell zur Bewertung der Ergebnisse des Data Mining 爱丽丝和毛毛虫:评估数据开采结果的更具描述性的无效模式 2506.09764v1
  • 386 06-11 Generating Likely Counterfactuals Using Sum-Product Networks Wahrscheinliche Counterfactuals mit Sum-Product-Netzwerken generieren 使用总和生产网络产生可能的反事实 2401.14086v5
  • 387 06-11 STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization STAR: Vielfältiges Lernen von Roboter-Fähigkeit Abstraktionen durch Rotations-Augmented Vector Quantization STAR:通过旋转-增强矢量量化,学习多式机器人技能变异 2506.03863v2
  • 388 06-11 Feature Engineering for Agents: An Adaptive Cognitive Architecture for Interpretable ML Monitoring Feature Engineering for Agents: Eine adaptive Kognitive Architektur für interpretierbare ML-Überwachung 代理物特征工程:用于可解释ML监测的适应性认知结构 2506.09742v1
  • 389 06-11 Byzantine-Resilient Decentralized Multi-Armed Bandits Byzantinisch-Resiliente dezentralisierte Multi-Armed Banditen 拜占庭抗力强的分散式多武装强盗 2310.07320v2
  • 390 06-11 Towards Multi-modal Graph Large Language Model Auf dem Weg zum multimodalen Graphenmodell für große Sprachen 走向多式大语言数字模型 2506.09738v1
  • 391 06-11 AtmosMJ: Revisiting Gating Mechanism for AI Weather Forecasting Beyond the Year Scale AtmosMJ: Über die Jahresskala hinausgehende KI-Wettervorhersagen über das Gateing-Mechanismus hinaus AtmosMJ: 重新研究国际气象预报系统在年规模以后的天气预报的确定机制 2506.09733v1
  • 392 06-11 Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts Auf dem Weg zu Basismodellen für die dynamische Systemrekonstruktion: Hierarchisches Meta-Learning über die Mischung von Experten 建立动态系统重建基础模式:通过专家混合学习 2502.05335v2
  • 393 06-11 Reevaluating Meta-Learning Optimization Algorithms Through Contextual Self-Modulation Neubewertung von Meta-Learning-Optimierungsalgorithmen durch kontextuelle Selbst-Modulation 通过上下文自体调整重新评价元-收益优化优化化的元-收益优化化 2410.01655v2
  • 394 06-11 Auto-Compressing Networks Autokomprimierende Netzwerke 自动压缩网络 2506.09714v1
  • 395 06-11 Training-Free Voice Conversion with Factorized Optimal Transport Schulungsfreie Sprachumwandlung mit Factorized Optimal Transport 以优化运输为要素的免费培训语音转换 2506.09709v1
  • 396 06-11 LLM2TEA: Agentic AI Designer Finds Innovative Objects with Generative Evolutionary Multitasking LLM2TEA: Agentischer AI-Designer findet innovative Objekte mit generativem evolutionären Multitasking LLM2TEA: ARI 设计师发现具有引演进进化多任务的创新对象 2406.14917v2
  • 397 06-11 On the Importance of Embedding Norms in Self-Supervised Learning Über die Bedeutung der Einbettung von Normen in das Selbstüberwachte Lernen 关于自我监督学习中嵌入规范的重要性 2502.09252v2
  • 398 06-11 TRIDENT: Temporally Restricted Inference via DFA-Enhanced Neural Traversal TRIDENT: Vorübergehend eingeschränkte Schlussfolgerung über DFA-verstärktes Neuraltraversal 通过DFA-增强的神经轨迹轨迹临时受限制的引文 2506.09701v1
  • 399 06-11 SLEEPYLAND: trust begins with fair evaluation of automatic sleep staging models SLEEPYLAND: Vertrauen beginnt mit einer fairen Bewertung automatischer Schlafstadien-Modelle SLEEPYLAND:信任始于对自动睡眠分期模型的公平评价 2506.08574v2
  • 400 06-11 Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment Jenseits von Bradley-Terry-Modellen: Ein allgemeines Präferenzmodell für Sprachmodellausrichtung 超越布拉德利-泰瑞模式:语文模式统一通用优先模式 2410.02197v3
  • 401 06-11 Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation Auto-Regressive vs Flow-Matching: Eine vergleichende Studie zur Modellierung von Paradigmen für die Text-zu-Musik-Generierung 自动递减与流动配接: 文本到音乐一代模型模型比较研究 2506.08570v2
  • 402 06-11 Share Secrets for Privacy: Confidential Forecasting with Vertical Federated Learning Share Secrets for Privacy: Vertrauliche Vorhersagen mit vertikalem Federated Learning 私隐共享秘密:与垂直联邦学习有关的保密预报 2405.20761v2
  • 403 06-11 Adding simple structure at inference improves Vision-Language Compositionality Einfache Struktur bei Inferenz hinzufügen verbessert die Vision-Sprachenkompositionalität 在推论中添加简单结构改进视野-语言的构成性 2506.09691v1
  • 404 06-11 Wasserstein Hypergraph Neural Network Wasserstein Hypergraph Neuronales Netzwerk Vasserstein 高音神经网络 2506.09682v1
  • 405 06-11 Assessing the Quality of Denoising Diffusion Models in Wasserstein Distance: Noisy Score and Optimal Bounds Bewertung der Qualität des Denoisierens von Diffusionsmodellen in Wasserstein Abstand: Geräuschpegel und optimale Bounds 评估瓦西尔斯坦距离中Disoising扩散模型的质量:噪音分数和最佳区区 2506.09681v1
  • 406 06-11 SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting SplitLoRA: Balance Stabilität und Plastizität im kontinuierlichen Lernen durch gradienten Raum Splitting Split LoRA:通过逐步空间分割在持续学习中平衡稳定和可塑性 2505.22370v3
  • 407 06-11 Wavelet Scattering Transform and Fourier Representation for Offline Detection of Malicious Clients in Federated Learning Wavelet Scattering Transform und Fourier-Darstellung zur Offline-Erkennung von bösartigen Kunden im Federated Learning 联邦学习联盟恶意客户离线检测分波散散变形和Fourier代表处 2506.09674v1
  • 408 06-11 Optimizing Shortfall Risk Metric for Learning Regression Models Optimierung von Shortfall Risk Metric zum Lernen von Regressionsmodellen 优化 “ 短缺风险 “ 风险计量模型的学习倒退模型 2505.17777v3
  • 409 06-11 CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization CROW: Beseitigung von Backdoors aus großen Sprachmodellen über interne Konsistenzregularisierung CROW:通过内部一致性规范化,消除大语言模式中的后门 2411.12768v2
  • 410 06-11 CINeMA: Conditional Implicit Neural Multi-Modal Atlas for a Spatio-Temporal Representation of the Perinatal Brain CINeMA: Bedingter Impliziter Neuraler Multi-Modaler Atlas für eine räumlich-zeitliche Darstellung des Perinatalen Gehirns CINEMA: 围产期大脑空间时/时代表性有条件的隐性神经隐性多模式图集 2506.09668v1
  • 411 06-11 Learning Time-Varying Multi-Region Brain Communications via Scalable Markovian Gaussian Processes Lernen von zeitvariierenden Multi-Region Gehirnkommunikation über skalierbare Markovian Gaussian Prozesse 通过可缩放的马尔科维扬高斯进程进行学习、改变时间的多区域脑交流 2407.00397v5
  • 412 06-11 Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement Anmerkungsfreie MIDI-to-Audio-Synthese über Konkatenative Synthese und Generative Veredelung 通过分类合成和生产改进进行无说明的MIDI到研究综合综述 2410.16785v2
  • 413 06-11 SyncFed: Time-Aware Federated Learning through Explicit Timestamping and Synchronization SyncFed: Time-Aware-Federated Learning durch explizites Zeitstempeln und Synchronisieren 同步Fed:通过明确时间印记和同步化进行时间软件联邦学习 2506.09660v1
  • 414 06-11 Intent Factored Generation: Unleashing the Diversity in Your Language Model Intent Factored Generation: Die Vielfalt in Ihrem Sprachmodell entfesseln 内在因数代:在你的语言模式中释放多样性 2506.09659v1
  • 415 06-11 DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy DipLLM: Feinsteuerungs-LLM für strategische Entscheidungsfindung in der Diplomatie DipLLM: 外交战略决策的精细推荐LLM 2506.09655v1
  • 416 06-11 Improving Discriminator Guidance in Diffusion Models Verbesserung der Orientierungshilfen für Diskriminatoren bei Diffusionsmodellen 改进传播模式中反对者的指导 2503.16117v2
  • 417 06-11 Revisiting the Equivalence of Bayesian Neural Networks and Gaussian Processes: On the Importance of Learning Activations Die Gleichwertigkeit Bayesischer neuraler Netzwerke und Gaussian-Prozesse: Über die Bedeutung von Lernaktivierungen 重新审视贝耶斯神经网络和高斯进程等效的贝耶斯神经网络和高斯进程:关于学习活动的重要性 2410.15777v3
  • 418 06-11 Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension Diskrepanzen sind Tugend: Schwach-zu-starke Verallgemeinerung durch Lens der Intrinsischen Dimension 差异是道德:通过内分泌尺寸的透镜对电压的微弱普遍化 2502.05075v4
  • 419 06-11 Pruning Spurious Subgraphs for Graph Out-of-Distribtuion Generalization Pruning Spurious Subgraphs für Graph Out-of-Distribtuion Verallgemeinerung 平面平面平面平面图图解裂平面图集集 2506.05957v3
  • 420 06-11 No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces Keine Aufgabe nach links: Isotropes Modell, das sich mit gemeinsamen und aufgabenspezifischen Subräumen vermischt 未留下任务 : 与共同和任务特定子空间合并的异形模型 2502.04959v3
  • 421 06-11 HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios HopaDIFF: Ganzheitlich-Partial Aware Fourier Conditioned Diffusion für die Referenzierung von Human Action Segmentierung in Multi-Personen-Szenarien HopaDIFF: 在多重人的情况中为提及人类行动分部分而传播的全方位全方位认识 2506.09650v1
  • 422 06-11 Scaling Laws for Uncertainty in Deep Learning Skalierungsgesetze für Ungewissheit im Deep Learning 强化深层学习不确定性法律 2506.09648v1
  • 423 06-11 Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls Verteilungs- und adversariell robuste Logistische Regression über Intersecting Wasserstein Balls 通过交叉瓦塞施泰因球的分布式和反向强劲后勤后退 2407.13625v4
  • 424 06-11 Real-Time Network Traffic Forecasting with Missing Data: A Generative Model Approach Echtzeit-Netzverkehr mit fehlenden Daten: Ein generatives Modellansatz 缺少数据实时网络流量预测:生成模式方法 2506.09647v1
  • 425 06-11 Generalizable and Fast Surrogates: Model Predictive Control of Articulated Soft Robots using Physics-Informed Neural Networks Generalisierbare und schnelle Surrogate: Modellvorhersagesteuerung von artikulierten Soft Robots mit physikinformierten Neuronalen Netzwerken 通用和快速超速超速:利用物理成形神经网络对人工软体机器人进行模拟预测控制 2502.01916v2
  • 426 06-11 Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering Effizientes und generalisierbares Graph Retriever für Wissens-Graph-Frageantworten lernen 知识拼图问题解答的学习效率和通用图表 2506.09645v1
  • 427 06-11 Evasion Attacks Against Bayesian Predictive Models Evasionsangriffe gegen bayesische Vorhersagemodelle 攻击贝耶斯人预测模型 2506.09640v1
  • 428 06-11 FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models FedVLMBench: 视觉语言模型联邦微调基准 2506.09638v1
  • 429 06-11 Model Attribution and Detection of Synthetic Speech via Vocoder Fingerprints Modellzuweisung und Erkennung von Synthetischer Sprache über Vocoder-Fingerabdrücke 通过 Vocoder 指纹识别合成演讲的模型归属和检测 2411.14013v2
  • 430 06-11 In-Context Bias Propagation in LLM-Based Tabular Data Generation In-Context Bias Propagation in der LLM-basierten tabellarischen Datengenerierung 以LLM为基础的制表数据生成中的Intext Bias 2506.09630v1
  • 431 06-11 Holistic Uncertainty Estimation For Open-Set Recognition Ganzheitliche Ungewissheitsschätzung für Open-Set-Erkennung Open-Set 识别的全局性不确定性估算 2408.14229v2
  • 432 06-11 GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras GLGENN: Eine neuartige Parameter-Licht-Equivariante Neural Networks-Architektur auf Basis von Clifford Geometrischen Algebras GLGENN: 以克里夫德几何代数为基础的新参数-日等均质神经网络建筑 2506.09625v1
  • 433 06-11 SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot SparseSSM: Effiziente selektiv strukturierte State Space-Modelle können in One-Shot beschnitten werden SparseSSM: 高效的选择性结构化状态空间模型可一次性剪枝 2506.09613v1
  • 434 06-11 Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism Verständnis der Kompetenzlücke in recurrenten Sprachmodellen: Die Rolle des Sammel- und Aggregatmechanismus 了解经常语文模式中的技能差距:集合和集合机制的作用 2504.18574v2
  • 435 06-11 Accelerating Large-Scale Regularized High-Order Tensor Recovery Beschleunigung der großflächigen Regularisierung der High-Order Tensor Erholung 加速大规模正规化高阶天文恢复 2506.09594v1
  • 436 06-11 Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting Physics-informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting 用于全球次季对季预报的计算机化远程连接软件变换器 2506.08049v2
  • 437 06-11 Metric-Entropy Limits on the Approximation of Nonlinear Dynamical Systems Metrisch-Entropie-Grenzen zur Annäherung nichtlinearer dynamischer Systeme 非线-线性动态系统接近度的中子-内基限值 2407.01250v2
  • 438 06-11 Beyond Overconfidence: Foundation Models Redefine Calibration in Deep Neural Networks Beyond Overconfidency: Foundation Models Neufine Calibration in Deep Neural Networks 超越自信:深神经网络中的模型重新校准基金会 2506.09593v1
  • 439 06-11 Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis Decoding Knowledge Attribution in Mixture-of-Experts: Ein Rahmenwerk der Basic-Refinement-Kollaboration und Effizienzanalyse 在混合专家中解码知识的配置:基础改进协作和效率分析框架 2505.24593v2
  • 440 06-11 Human-like object concept representations emerge naturally in multimodal large language models Mensch-ähnliche Objekt-Konzept-Darstellungen entstehen natürlich in multimodalen großen Sprachmodellen 在多式大型语言模式中自然出现类似人类的物体概念表述 2407.01067v3
  • 441 06-11 Feasibility Study of CNNs and MLPs for Radiation Heat Transfer in 2-D Furnaces with Spectrally Participative Gases Machbarkeitsstudie von CNNs und MLPs für den Strahlungswärmetransfer in 2-D-Öfen mit Spektrally Participative Gasen 关于有线电视新闻网和多频多频卫星在2-D发热中用光谱参与气体进行辐射热传导的有线电视新闻网和 MLP的可行性研究 2506.08033v2
  • 442 06-11 MOORL: A Framework for Integrating Offline-Online Reinforcement Learning MOORL: Ein Rahmen für die Integration von Offline-Online-Verstärkungslernen MOORL: 整合离线-在线强化学习的框架 2506.09574v1
  • 443 06-11 Mechanistic PDE Networks for Discovery of Governing Equations Mechanistische PDE-Netzwerke für die Entdeckung von Regierenden Gleichungen 发现管理等分的机械 PDE 网络 2502.18377v2
  • 444 06-11 From Symbolic to Neural and Back: Exploring Knowledge Graph-Large Language Model Synergies Von Symbolisch zu Neural und Zurück: Erforschen von Wissen Graphen-Große Sprachmodell-Synergien 从符号到神经和后背:探索知识图集-大型语言模式协同作用 2506.09566v1
  • 445 06-11 Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics Funktion Normalisierung verhindert Zusammenbruch nicht kontrastiver Lerndynamiken 防止非争议学习动态的崩溃 2309.16109v2
  • 446 06-11 Traceable LLM-based validation of statements in knowledge graphs Rückführbare LLM-basierte Validierung von Aussagen in Wissensgraphen 以可追踪的LLM为基础,对知识图中的报表进行可追踪的LLM验证 2409.07507v2
  • 447 06-11 On Finetuning Tabular Foundation Models Über Finetuning Tabular Foundation Modelle 关于微调表表基础模型 2506.08982v2
  • 448 06-11 Large Language Models Miss the Multi-Agent Mark Große Sprachmodelle verfehlen das Multi-Agenten-Ziel 大语言模型未达到多智能体的要求 2505.21298v2
  • 449 06-11 Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means Robuste Verteilte Schätzung: Erweiterung von Gossip-Algorithmen auf Rangfolge und Trimmmittel 强有力的分布分布式估算:将Gossip的数值扩大至排名和缩略语 2505.17836v5
  • 450 06-11 DeepExtractor: Time-domain reconstruction of signals and glitches in gravitational wave data with deep learning DeepExtractor: Zeit-Domänen-Rekonstruktion von Signalen und Störungen in Gravitationswellendaten mit Deep Learning 深度提取器:用深层学习重建引力波数据信号和小故障的时间- 域重建 2501.18423v3
  • 451 06-11 Differentially Private Federated $k$-Means Clustering with Server-Side Data Differentiell privates föderiertes $k$-Means-Clustering mit serverseitigen Daten 利用服务器端数据的差分隐私联邦 $k$-均值聚类 2506.05408v2
  • 452 06-11 The Causal Information Bottleneck and Optimal Causal Variable Abstractions Die Kausalinformation Engpass und Optimale Kausalvariable Abstraktionen 因果信息瓶颈和最佳因果变量抽象 2410.00535v5
  • 453 06-11 Identifiable Latent Bandits: Leveraging observational data for personalized decision-making Identifizierbare Latent Bandits: Nutzung von Beobachtungsdaten für personalisierte Entscheidungen 利用观测数据进行个性化决策 2407.16239v4
  • 454 06-11 Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information Eng gekoppelte LiDAR-IMU-Leg-Odometrie mit Online-Lektionskinematik, die Fußtaktile Informationen enthält LIDAR-IMU-Leg Odo量度与包含脚触觉信息的在线学习腿脚力学 2506.09548v1
  • 455 06-11 Design Patterns for Securing LLM Agents against Prompt Injections Designmuster zur Sicherung von LLM-Agenten gegen sofortige Injektionen 防止及时注射的LLM代理物的安全设计模式 2506.08837v2
  • 456 06-11 On the Privacy Risks of Spiking Neural Networks: A Membership Inference Analysis Über die Datenschutzrisiken von Spiking Neuronal Networks: Eine Schlussfolgerungsanalyse für die Mitgliedschaft 《关于间谍神经网络的隐私风险:成员推断分析》 2502.13191v4
  • 457 06-11 Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models Athena: Multimodale Vernunft mit dateneffizienten Prozess-Reward-Modellen verbessern Athena:利用数据高效流程奖励模式加强多式联运理由 2506.09532v1
  • 458 06-11 Neural Functions for Learning Periodic Signal Neurale Funktionen für das Lernen von periodischem Signal 学习周期信号的神经功能 2506.09526v1
  • 459 06-11 Bias Detection via Maximum Subgroup Discrepancy Bias-Erkennung über maximale Untergruppe Diskrepanz 通过最大分组差异探测比亚斯 2502.02221v2
  • 460 06-11 LLM-Powered CPI Prediction Inference with Online Text Time Series LLM-Powered CPI Prediction Inferenz mit Online Text Time Series LLM 授权的CPI 在线文本时间序列预测推论 2506.09516v1
  • 461 06-11 A Survey on the Role of Artificial Intelligence and Machine Learning in 6G-V2X Applications Eine Umfrage zur Rolle von Künstlicher Intelligenz und maschinellem Lernen in 6G-V2X Anwendungen 关于6G-V2X应用中人工情报和机器学习的作用的调查 2506.09512v1
  • 462 06-11 Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design Effizientes Preference-Based-Verstärkungslernen: Randomized Exploration trifft auf experimentelles Design 高效的以优惠为基础的强化学习:随机探索与实验设计 2506.09508v1
  • 463 06-11 A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes Eine einheitliche Theorie der Kompositionalität, Modularität und Interpretationsfähigkeit in Markov-Entscheidungsprozessen 马尔科夫决定程序的组成、模式和解释统一理论 2506.09499v1
  • 464 06-11 LEMUR Neural Network Dataset: Towards Seamless AutoML LEMUR Neural Network Dataset: Auf dem Weg zur nahtlosen AutoML LEMUR 神经网络数据集: 走向无缝自动ML 2504.10552v2
  • 465 06-11 EnerBridge-DPO: Energy-Guided Protein Inverse Folding with Markov Bridges and Direct Preference Optimization EnerBridge-DPO: Energiegeführtes Protein inverses Falten mit Markov-Brücken und direkter Präferenzoptimierung EnerBridge-DPO:以Markov桥和直接优化为首的热导蛋白质反向折叠 2506.09496v1
  • 466 06-11 Bridging Online Behavior and Clinical Insight: A Longitudinal LLM-based Study of Suicidality on YouTube Reveals Novel Digital Markers Bridging Online Behavior and Clinical Insight: Eine längsseitige LLM-basierte Studie zur Suizidalität auf YouTube enthüllt neuartige digitale Marker 架桥在线行为和临床透视:YouTube Reveals新数字标记上基于纵向LLM的自杀性纵向LM研究 2506.09495v1
  • 467 06-11 MIRA: Medical Time Series Foundation Model for Real-World Health Data MIRA: Medical Time Series Foundation Modell für real-World Gesundheitsdaten 医疗时间系列基金会实际世界卫生数据模型 2506.07584v2
  • 468 06-11 Minimax optimality of deep neural networks on dependent data via PAC-Bayes bounds Minimax-Optimalität von tiefen neuronalen Netzwerken auf abhängigen Daten über PAC-Bayes-Grenzen 通过PAC-Bayes界限依靠数据建立深海神经网络的最小最佳程度 2410.21702v2
  • 469 06-11 BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation BemaGANv2: Eine Tutorial- und Vergleichsstudie von GAN-basierten Vocodern für die Langzeit-Audiogenerierung BemaGANv2: 以GAN为基地的长期音频生成Vocoders的教学和比较调查 2506.09487v1
  • 470 06-11 Plug-and-Play image restoration with Stochastic deNOising REgularization Plug-and-Play-Bildrestaurierung mit stochastischer Entnoisierung der Regularisierung 利用斯托卡式脱氧化还溶性恢复插件和布板图像 2402.01779v3
  • 471 06-11 Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling Auf dem Weg zu einem einheitlichen und verlustfreien Latent-Raum für 3D-Molekular-Latent-Diffusionsmodell 迈向3D分子中子中子扩散模型的 统一和无损失的冷冻空间 2503.15567v3
  • 472 06-11 Convergence Conditions for Stochastic Line Search Based Optimization of Over-parametrized Models Konvergenzbedingungen für stochastische Liniensuche auf Basis der Optimierung von überparametrisierten Modellen 存储线搜索的一致条件 优化过度平衡模型的优化 2408.03199v2
  • 473 06-11 On a few pitfalls in KL divergence gradient estimation for RL Bei einigen Fallen bei KL-Divergenzgradientenschätzung für RL KL RL差差差梯度估算的几个陷阱 2506.09477v1
  • 474 06-11 Adjusting Model Size in Continual Gaussian Processes: How Big is Big Enough? Modellgröße in kontinuierlichen Gauß-Prozessen anpassen: Wie groß ist groß genug? 调整连续高斯进程模型大小:大到多大? 2408.07588v4
  • 475 06-11 A Reinforcement Learning Approach for RIS-aided Fair Communications Ein stärkerer Lernansatz für RIS-gestützte Fair Communications 以强化学习方法促进RIRS援助的公平交流 2506.06344v2
  • 476 06-11 Forecasting high-impact research topics via machine learning on evolving knowledge graphs Prognose hochwirksamer Forschungsthemen durch maschinelles Lernen auf sich entwickelnden Wissensgraphen 通过对不断演变的知识图表进行机器学习,预测高影响研究专题 2402.08640v4
  • 477 06-11 ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning ReinFlow: Feinsteuerungs-Flow Matching-Politik mit Online-Verstärkungs-Lernen ReinFlow: 与在线强化学习匹配流动政策的微调 2505.22094v3
  • 478 06-11 Bayesian Neural Scaling Law Extrapolation with Prior-Fitted Networks Bayesian Neural Scaling Law Extrapolation mit Prior-Fitted-Netzwerke Bayesian神经扩增法与事先确定网络的外推法 2505.23032v2
  • 479 06-11 Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms Auf dem Weg zur Überbrückung des Lohn-Generation-Gaps in direkten Ausrichtungsalgorithmen 争取弥合直接调整数值中的奖励-评分差距 2506.09457v1
  • 480 06-11 NDCG-Consistent Softmax Approximation with Accelerated Convergence NDCG-Konsistente Softmax-Annäherung mit beschleunigter Konvergenz NDCG-Consistent Softmax 与加速汇合相近 2506.09454v1
  • 481 06-11 MADCluster: Model-agnostic Anomaly Detection with Self-supervised Clustering Network MADCluster: Modell-agnostische Anomalieerkennung mit selbstüberwachtem Clustering-Netzwerk MADCluster:使用自监管的集群网进行模型-不可知异常探测 2505.16223v5
  • 482 06-11 SafeCoT: Improving VLM Safety with Minimal Reasoning SafeCoT: VLM-Sicherheit mit minimaler Vernunft verbessern SafeCot: 提高VLM安全性,并采用最低合理理由 2506.08399v2
  • 483 06-11 Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform Verschlimmerungen von LLM-Embedding-Sequenzen lernen: Glas-Reformation 学习LLM 嵌入序列的学习障碍:染色玻璃变形 2506.09452v1
  • 484 06-11 Safe Screening Rules for Group SLOPE Sichere Screening-Regeln für Gruppen-SLOPE Group SLOPE 的安全筛选规则 2506.09451v1
  • 485 06-11 Modified K-means Algorithm with Local Optimality Guarantees Modifizierter K-Mittel-Algorithmus mit lokalen Optimierungsgarantien 使用本地最佳度保障修改 K 平均值 2506.06990v2
  • 486 06-11 Attention-Bayesian Hybrid Approach to Modular Multiple Particle Tracking Aufmerksamkeits-Bayesischer Hybrid-Ansatz für modulares multiples Partikel-Tracking 模块多粒子跟踪的注意 – – 巴伊斯混合办法 2506.09441v1
  • 487 06-11 Optimization over Sparse Support-Preserving Sets: Two-Step Projection with Global Optimality Guarantees Optimierung über Sparse Support-Preserving Sets: Zwei-Schritt-Projektion mit globalen Optimierungsgarantien 使用全球最佳保证的两步投影 2506.08558v2
  • 488 06-11 Generalization Error Analysis for Attack-Free and Byzantine-Resilient Decentralized Learning with Data Heterogeneity Generalisierungsfehleranalyse für Angriffsfreies und Byzantinisch-Resilientes dezentralisiertes Lernen mit Daten Heterogenität 为无攻击攻击和有数据差异化的抗拜占庭分散化学习提供通用错误分析 2506.09438v1
  • 489 06-11 Discovering Physics Laws of Dynamical Systems via Invariant Function Learning Physikgesetze dynamischer Systeme durch Invariant Function Learning entdecken 通过动态功能学习发现动态系统的物理法 2502.04495v2
  • 490 06-11 Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling Lohnüberoptimierung bei Direktausrichtungsalgorithmen mit Wichtigkeits-Sampling abmildern 在具有重要性抽样调查的直接对齐数值中,减缓回调过度优化 2506.08681v2
  • 491 06-11 When Is Diversity Rewarded in Cooperative Multi-Agent Learning? Wann wird Vielfalt im kooperativen Multi-Agenten-Lernen belohnt? 当多样性在多机构合作学习中得到回报吗? 2506.09434v1
  • 492 06-11 Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training Milderung von sauberen Korrelationen in LLMs durch Kausalität-Bewusstsein nach dem Training 通过Causaliity-Aware 培训后在LLMs中减轻净污损 2506.09433v1
  • 493 06-11 Through a Steerable Lens: Magnifying Neural Network Interpretability via Phase-Based Extrapolation Durch eine Steerable Lens: Vergrößerung der Neuralen Netzwerk-Interpretabilität über phasenbasierte Extrapolation 通过固态透镜:通过基于分阶段的外推法使神经网络放大解释 2506.02300v3
  • 494 06-11 CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis CTPD: Cross-Modal Temporal Pattern Discovery für verbesserte multimodale elektronische Gesundheitsdatenanalyse CTPD: 增强多模式电子健康记录分析的跨模式时间模式发现 2411.00696v2
  • 495 06-11 Weakly Supervised Multiple Instance Learning for Whale Call Detection and Temporal Localization in Long-Duration Passive Acoustic Monitoring Schwach überwachtes Multiple Instance-Lernen für Whale Call Detection und zeitliche Lokalisierung in der langdauerhaften passiven akustischen Überwachung 在长期被动听力监测中,受监督的捕鲸呼叫探测和时间定位多案例学习能力薄弱 2502.20838v2
  • 496 06-11 Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models Folgen Sie der Energie, finden Sie den Weg: Riemannsche Metrics aus energiebasierten Modellen 遵循能源,寻找道路:以能源为基础的模型的里曼尼度量仪 2505.18230v2
  • 497 06-11 Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Vertrauen ist alles, was Sie brauchen: Wenig heißes RL Feintuning von Sprachmodellen 信心是你们需要的:很少点点RL精细的语言模式 2506.06395v3
  • 498 06-11 Improving the performance of optical inverse design of multilayer thin films using CNN-LSTM tandem neural networks Verbesserung der Leistungsfähigkeit des optischen inversen Designs von Mehrschicht-Dünnfilmen mit CNN-LSTM Tandem-Neuronalen Netzwerken 利用CNN-LSTM同步神经网络改进多层薄薄膜光反光设计性能 2506.10044v1
  • 499 06-11 Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation Zeiteinheitliche Diffusionspolitik mit Aktionsdiskriminierung für die Robotermanipulation 对机器人操纵行动歧视的、时间统一的统一传播政策 2506.09422v1
  • 500 06-11 A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy Ein Aufruf für kollaborative Intelligenz: Warum Menschen-Agenten-Systeme der KI-Autonomie vorausgehen sollten 呼吁合作情报:为什么人力-物力系统应先于自主 2506.09420v1
  • 501 06-11 NestQuant: Nested Lattice Quantization for Matrix Products and LLMs NestQuant: Nested Lattice Quantization für Matrix-Produkte und LLMs NestQuant: 母体产品和LLMs的Nasted Lattice量化 2502.09720v2
  • 502 06-11 DeepForm: Reasoning Large Language Model for Communication System Formulation DeepForm: Großes Sprachmodell für die Formulierung von Kommunikationssystemen 深形式:开发通信系统的理由大语言模式 2506.08551v2
  • 503 06-11 Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors Wer die Störung gestartet hat, sollte sie beenden: Leiten datenfreies Modell Zusammenführen über Task-Vektoren 开始干预者应当结束干预的任何人:指导通过任务矢量合并无数据模式 2503.08099v2
  • 504 06-11 Arrival Control in Quasi-Reversible Queueing Systems: Optimization and Reinforcement Learning Ankunftskontrolle in Quasi-Reversible Queueing Systemen: Optimierung und Verstärkungslernen 准可更新排队系统抵达控制:优化和强化学习 2505.16353v2
  • 505 06-11 Code-Switching Curriculum Learning for Multilingual Transfer in LLMs Code-Switching Curriculum Learning für den mehrsprachigen Transfer in LLMs 以LLMM为多种语言转让进行代码转换课程学习 2411.02460v2
  • 506 06-11 Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations Sim-to-Real-Causal-Transfer: Ein metrischer Lernansatz zu kausal-aware Interaktionsdarstellungen 简单到实际因果转移:从计量学习方法进行体能互动演示 2312.04540v2
  • 507 06-11 Scoop-and-Toss: Dynamic Object Collection for Quadrupedal Systems Scoop-and-Toss: Dynamische Objektsammlung für vierbeinige Systeme Scoop-and-Toss:四足系统的动态物体收集 2506.09406v1
  • 508 06-11 Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps Feinabstimmungs-Diffusions-Politiken mit Backpropagation durch Diffusion Zeitschritte 微调通过扩散时间步骤进行反向宣传的传播政策 2505.10482v2
  • 509 06-11 What Can RL Bring to VLA Generalization? An Empirical Study Was kann RL zur VLA-Verallgemeinerung bringen? Eine empirische Studie RL能带给VLA的概括化带来什么?经验研究。 2505.19789v2
  • 510 06-11 AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI) AdversariaL attacK sAfety aLIgnment(ALKALI): Sicherung von LLMs durch GRACE: Geometrische Darstellung-Bewusst Kontrastive Verbesserung- Einführung von Adversarial Vulnerability Quality Index (AVQI) AdversariaL attacK sAfety aLIgnment(ALKALI):通过GRACE来保护LLMs:几何代表-提高软件对抗性增强-采用反脆弱性质量指数(AVQI) 2506.08885v2
  • 511 06-11 Chem42: a Family of chemical Language Models for Target-aware Ligand Generation Chem42: eine Familie chemischer Sprachmodelle für die zielbewusste Ligand-Generation Chem42: “ 目标觉知长者 “ 化学语言模式家庭 2503.16563v2
  • 512 06-11 Synergizing Reinforcement Learning and Genetic Algorithms for Neural Combinatorial Optimization Synergisierendes Verstärkungslernen und genetische Algorithmen für die neurale Kombinatorialoptimierung 实现神经组合优化的强化强化学习和遗传数值同步化 2506.09404v1
  • 513 06-11 Pre-trained Large Language Models Learn Hidden Markov Models In-context Vortrainierte große Sprachmodelle lernen versteckte Markov-Modelle In-Kontext 受过培训的大型语言模型学习隐藏Markov模型 2506.07298v2
  • 514 06-11 LLM Enhancers for GNNs: An Analysis from the Perspective of Causal Mechanism Identification LLM Enhancers for GNNs: Eine Analyse aus der Perspektive der Identifizierung von Kausalmechanismen GNN的LLM增强者:从因果关系机制的确定角度进行分析 2505.08265v3
  • 515 06-11 A theoretical basis for model collapse in recursive training Eine theoretische Grundlage für Modellkollaps in rekursiver Ausbildung 循环培训模式崩溃的理论基础 2506.09401v1
  • 516 06-11 Efficient Prediction of SO(3)-Equivariant Hamiltonian Matrices via SO(2) Local Frames Effiziente Vorhersage von SO(3)-Equivarienten Hamiltonian Matrices über SO(2) Local Frames 通过SO(2)地方框架有效预测SO(3)-等质汉密尔顿母体 2506.09398v1
  • 517 06-11 The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent Der Computational Advantage of Depth: High-Dimensional Hierarchical Funktionen mit gradienten Abstieg lernen 深度的计算优势:学习带有渐变后代的高多元等级功能 2502.13961v3
  • 518 06-11 SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving SLED: Ein spekulatives LLM-Decoding-Framework für effizientes Edge Serving SLED: 有效边缘服务投机性LLM代谢框架 2506.09397v1
  • 519 06-11 A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities Ein genauerer Blick auf TabPFN v2: seine Stärken verstehen und seine Fähigkeiten erweitern 更仔细地看看TabPFN v 2:了解其力量并扩大其能力 2502.17361v2
  • 520 06-11 MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models MedChat: Multi-Agenten-Framework für multimodale Diagnosen mit großen Sprachmodellen MedChat:使用大语言模式的多语言多模式诊断多机构框架 2506.07400v2
  • 521 06-11 Spatial-RAG: Spatial Retrieval Augmented Generation for Real-World Geospatial Reasoning Questions Raum-RAG: Raum-Retrieval Augmented Generation für Real-World Geospatial Reasoning Fragen 空间-RAG:世界现实地理空间原因问题空间检索增加一代 2502.18470v5
  • 522 06-11 Coil2Coil: Self-supervised MR image denoising using phased-array coil images Coil2Coil: Selbstüberwachte MR-Bildentrauschung mit Hilfe von Phased-Array-Coil-Bildern Coil2Coil:使用相控阵线圈图像的自监督 MR 图像去噪 2208.07552v2
  • 523 06-11 Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation Die neuen Diffusionsmodelle: Vom Generativen Pre-Training bis zur Ein-Schritt-Generation 重新审视传播模式:从产生培训前培训到单步一代 2506.09376v1
  • 524 06-11 Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining Modality-Balancing Preference Optimierung großer multimodaler Modelle durch disversariale Negative Mining 通过反反向负负采矿优化大型多式联运模型 2506.08022v2
  • 525 06-11 Effective Regularization Through Loss-Function Metalearning Effektive Regularisierung durch Loss-Function Metalearning 通过损失 – – 职能计算学习实现有效正规化 2010.00788v5
  • 526 06-11 Low-resource domain adaptation while minimizing energy and hardware resource consumption Low-Resource-Domänenanpassung bei Minimierung des Energie- und Hardware-Ressourcenverbrauchs 低资源领域的适应,同时尽量减少能源和硬件资源的消耗 2506.08433v2
  • 527 06-11 TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation TimeDART: Ein Diffusion Autoregressiver Transformer für selbstüberwachte Zeitreihendarstellung TimeDART: 自动控制时间序列代表的传播自动递减变动器 2410.05711v5
  • 528 06-11 LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization LPO: Auf dem Weg zu einer präzisen Interaktion der GUI-Agenten über Standortpräferenzoptimierung LPO:通过位置优化实现准确的图形界面代理互动 2506.09373v1
  • 529 06-11 Learnable Spatial-Temporal Positional Encoding for Link Prediction Erlernbare räumlich-temporale Positionskodierung für Link Prediction 链接预测的空间-临时定位编码 2506.08309v2
  • 530 06-11 Anomaly Detection and Generation with Diffusion Models: A Survey Anomalieerkennung und -generierung mit Diffusionsmodellen: Eine Umfrage 使用传播模型进行异常检测和生成:调查 2506.09368v1
  • 531 06-11 An Online Learning Approach to Prompt-based Selection of Generative Models and LLMs Ein Online-Lernansatz zur sofortigen Auswahl generativer Modelle und LLMs 以在线学习方式迅速选择创创模式和LLMs 2410.13287v4
  • 532 06-11 SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending SkillBlender: Auf dem Weg zu einer vielseitigen humanoiden Ganzkörperloko-Manipulation durch Skill Blending 技术人才库:通过技能混合处理,努力通过技能混合处理,实现全体操控和操纵 2506.09366v1
  • 533 06-11 TextSquare: Scaling up Text-Centric Visual Instruction Tuning TextSquare: Skalieren von Text-Centric Visual Instruction Tuning TextSquare: 增强文本内容的视觉教学图示 2404.12803v3
  • 534 06-11 Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models Automatische Pseudo-Harmful Prompt Generation zur Bewertung falscher Ablehnungen in großen Sprachmodellen 自动生成用于评价大语言模式虚假拒绝的优多有害快速自动生成 2409.00598v2
  • 535 06-11 Generalized Lie Symmetries in Physics-Informed Neural Operators Generalisierte Lie-Symmetrien in physikinformierten Neuraloperatoren 物理内成形神经操作器中的一般谎言对称 2502.00373v2
  • 536 06-11 PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs PDE-Controller: LLMs für Autoformalisierung und Begründung von PDEs PDE – – 主计长:PDE的自动化正规化和原因说明 2502.00963v2
  • 537 06-11 Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation Autoregressive Adversariale Nachschulung für die Echtzeit-Interactive Videogenerierung 实时互动互动视频制作培训后自动递减反反反反向自动 2506.09350v1
  • 538 06-11 TS-PIELM: Time-Stepping Physics-Informed Extreme Learning Machine Facilitates Soil Consolidation Analyses TS-PIELM: Zeitschnelle Physik-informierte extreme Lernmaschine erleichtert Bodenkonsolidierungsanalysen TS-PIELM: 时间制备物理系统一体化极端学习机器促进土壤巩固分析 2506.08381v2
  • 539 06-11 Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis Warum sind Web-KI-Agenten anfälliger als Standalone LLMs? Eine Sicherheitsanalyse 为什么互联网上AI代理机构比独立的LLMs更加脆弱? 2502.20383v2
  • 540 06-11 Adversarial Surrogate Risk Bounds for Binary Classification Adversarial Surrogate Risk Bounds für Binäre Klassifizierung 二元分类的反向代谢风险临界值 2506.09348v1
  • 541 06-11 ErrorEraser: Unlearning Data Bias for Improved Continual Learning ErrorEraser: Unlearning Data Bias für verbessertes kontinuierliches Lernen ErrorEraser: 消除数据偏差以改进持续学习 2506.09347v1
  • 542 06-11 Ming-Omni: A Unified Multimodal Model for Perception and Generation Ming-Omni: Ein einheitliches multimodales Modell für Wahrnehmung und Erzeugung Ming-Omni:感知和一代的统一多模式模型 2506.09344v1
  • 543 06-11 RePO: Replay-Enhanced Policy Optimization RePO: Replay-erweiterte Politikoptimierung REPO: 重放强化政策优化 2506.09340v1
  • 544 06-11 AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin AsFT: Verankerungssicherheit während LLM-Feinsteuerung innerhalb eines schmalen Sicherheitsbeckens AsFT:LLM在狭窄安全盆地内精密出海期间安整安全 2506.08473v2
  • 545 06-11 Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference Sentence-Level-Reward-Modell kann besser für die Ausrichtung LLM aus menschlichen Präferenzen Generalisieren 判决一级奖赏模式可以普遍化,更有利于将人类特惠的LLM与LM相匹配 2503.04793v4
  • 546 06-11 Know What You Don’t Know: Uncertainty Calibration of Process Reward Models Wissen, was Sie nicht wissen: Unsicherheit Kalibrierung von Prozess-Reward-Modellen 你不知道你不知道什么: 过程奖励模型的不确定性校准 2506.09338v1
  • 547 06-11 STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation STAMImputer: Raum-Temporale Aufmerksamkeit MoE für Verkehrsdaten Imputation STAMIMUTER: 交通数据截读的时/时注意MoE 2506.08054v2
  • 548 06-11 Natural Language Guided Ligand-Binding Protein Design Natürliche Sprache geführt Ligand-Binding Protein Design Ligand-Binding 蛋白设计 2506.09332v1
  • 549 06-11 SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space SWAT-NN: Simultangewichte und Architekturtraining für neurale Netzwerke im Latent Space SWAT-NN: 低空空间神经网络的同质重量和建筑培训 2506.08270v2
  • 550 06-11 Your Agent Can Defend Itself against Backdoor Attacks Ihr Agent kann sich gegen Hintertürangriffe verteidigen 你的特工可以保护自己 免受幕后攻击 2506.08336v2
  • 551 06-11 polyGen: A Learning Framework for Atomic-level Polymer Structure Generation polyGen: Ein Lernrahmen für die atomare Polymerstrukturbildung 聚金:原子级聚合体结构生成学习框架 2504.17656v3
  • 552 06-11 Low-Loss Space in Neural Networks is Continuous and Fully Connected Low-Loss-Raum in neuralen Netzwerken ist kontinuierlich und vollständig verbunden 神经网络中的低损失空间是连续和完全连接的 2505.02604v3
  • 553 06-11 Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies Beschleunigung der LLM-Inferenz mit verlustfreier spekulativer Dekodierung von Algorithmen für heterogene Vokabeln 加速LLM 超异基因词汇表无损失投机代谢值的加速LLM推导值 2502.05202v3
  • 554 06-11 Monte Carlo Tree Diffusion for System 2 Planning Monte Carlo Tree Diffusion für System 2 Planung 用于系统2规划的蒙特卡洛树传播 2502.07202v4
  • 555 06-11 On The Impact of Merge Request Deviations on Code Review Practices Über die Auswirkungen von Merge Request Abweichungen auf Code-Review-Praktiken 合并请求对守则审查惯例的影响 2506.08860v2
  • 556 06-11 Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models Schulungsfreie Anleitung über die Unterscheidungsfähigkeit: Skalierbare Pfadsteuerung mit Baumsuche in Diffusions- und Flussmodellen 超越差异的无培训指导:在传播和流动模型中进行植树搜索的可扩展路径指导 2502.11420v3
  • 557 06-11 Alzheimer’s Dementia Detection Using Perplexity from Paired Large Language Models Alzheimer’s Dementia Detection Using Perplexity from Paired Large Language Models 阿兹海默氏病痴呆症检测,使用Paired大语言模型的复度 2506.09315v1
  • 558 06-11 Surrogate models to optimize plasma assisted atomic layer deposition in high aspect ratio features Surrogate-Modelle zur Optimierung der plasmaunterstützten Atomschichtabscheidung in Eigenschaften mit hohem Seitenverhältnis 利用代用模型优化高方位比率特征的等离子体辅助原子层沉积 2506.09313v1
  • 559 06-11 What is the Cost of Differential Privacy for Deep Learning-Based Trajectory Generation? Was sind die Kosten der unterschiedlichen Privatsphäre für Deep Learning-Based Trajektorie Generation? 深层学习轨迹世代的不同隐私成本是多少? 2506.09312v1
  • 560 06-11 Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions Rewind-to-Delete: Zertifiziertes Maschinen-Entlernen für nicht konvexe Funktionen 倒回到删除: 认证机器不学习非convex函数 2409.09778v4
  • 561 06-11 A Geometric Modeling of Occam’s Razor in Deep Learning Eine geometrische Modellierung von Occams Razor im Deep Learning Occam的深层学习剃刀的几何建模 1905.11027v9
  • 562 06-11 PARAFAC2-based Coupled Matrix and Tensor Factorizations with Constraints PARAFAC2-basierte gekoppelte Matrix- und Tensor-Faktorisierungen mit Einschränkungen 基于PARAFAC2的带约束耦合矩阵与张量分解 2406.12338v2
  • 563 06-11 Neuromorphic Optical Tracking and Imaging of Randomly Moving Targets through Strongly Scattering Media Neuromorphes optisches Tracking und Bildgebung von zufällig sich bewegenden Zielen durch stark streuende Medien 通过强烈散射媒体对随机移动目标进行神经形态光学跟踪和成像 2501.03874v2
  • 564 06-11 Lower Bounds for Learning Quantum States with Single-Copy Measurements Untere Grenzen für das Lernen von Quantenzuständen mit Single-Copy-Messungen 具有单复度测量法的低下学习区次数国家 2207.14438v3
  • 565 06-10 (2) Lightweight Object Detection Using Quantized YOLOv4-Tiny for Emergency Response in Aerial Imagery Leichtgewichtige Objekterkennung mit Quantisiertem YOLOv4-Tiny für Notfallreaktion in Luftbildern 在空中图像中,使用量化的YOLOv4-Tiny用于应急反应的轻重量物体探测 2506.09299v1
  • 566 06-10 WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles WaKA: Datenzuweisung unter Verwendung von K-Nächstgelegenen Nachbarn und Mitgliedschafts-Privacy Principles WaKA:使用K-近邻和成员隐私权原则的数据归属 2411.01357v3
  • 567 06-10 Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems Künstliche Intelligenz für die Wissenschaft in Quanten-, Atom- und Kontinuumsystemen 量子、原子学和连续系统科学人造情报 2307.08423v5
  • 568 06-10 Causal Graph Recovery in Neuroimaging through Answer Set Programming Causal Graph Recovery in Neuroimaging durch Answer Set Programmierung 通过 “ 解答组 “ 方案编制实现神经成形过程中的因果图恢复 2506.09286v1
  • 569 06-10 ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs Skalierbare HD: Skalierbare und hochdurchsatzfähige Hyperdimensionale Computing-Inferenz auf Multi-Core-CPUs 可缩缩放的HD:可缩放的和高通向的多孔数多维电子计算引文 2506.09282v1
  • 570 06-10 TTrace: Lightweight Error Checking and Diagnosis for Distributed Training TTrace: Leichte Fehlerprüfung und Diagnose für verteiltes Training TTrace:轻量度错误检查和分布式培训诊断 2506.09280v1
  • 571 06-10 Ambient Diffusion Omni: Training Good Models with Bad Data Ambient Diffusion Omni: Gute Modelle mit schlechten Daten trainieren 环境扩散全云:培训不良数据良好模型 2506.10038v1
  • 572 06-10 A Topic Modeling Analysis of Stigma Dimensions, Social, and Related Behavioral Circumstances in Clinical Notes Among Patients with HIV Thematische Modellierung von Stigma-Dimensionen, sozialen und damit zusammenhängenden Verhaltensumständen bei Patienten mit HIV 艾滋病毒携带者临床说明中的污名、社会和相关行为环境的典型问题专题分析 2506.09279v1
  • 573 06-10 UFM: A Simple Path towards Unified Dense Correspondence with Flow UFM: Ein einfacher Weg zur Unified Dense Correspondence mit Flow UFM:通向统一思维与流动对应的简单路径 2506.09278v1
  • 574 06-10 EMMA: Efficient Visual Alignment in Multi-Modal LLMs EMMA: Effiziente visuelle Ausrichtung in Multi-Modal LLMs EMMA:多模式LMS中高效视觉协调 2410.02080v2
  • 575 06-10 Learning The Minimum Action Distance Die minimale Aktionsdistanz lernen 学习最低行动距离 2506.09276v1
  • 576 06-10 Amortized Inference of Causal Models via Conditional Fixed-Point Iterations Amortisierte Schlussfolgerung von Kausalmodellen über bedingte Fixed-Point-Iterationen 通过有条件的固定点定点迭代对因果模型进行摊销 2410.06128v3
  • 577 06-10 RocketPPA: Code-Level Power, Performance, and Area Prediction via LLM and Mixture of Experts RocketPPA: Code-Level Power, Performance und Area Prediction über LLM und Mixture of Experts 火箭式PPPA:通过LLM和专家混合进行代码级动力、性能和地区预测 2503.21971v3
  • 578 06-10 G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration G-Sim: Generative Simulationen mit großen Sprachmodellen und gradientfreier Kalibrierung G-Sim:使用大语言模型和梯度自由校准的生成模拟 2506.09272v1
  • 579 06-10 Uncertainty Prioritized Experience Replay Ungewissheit Priorisierte Erfahrung Replay 不确定性确定性优先经验重放 2506.09270v1
  • 580 06-10 A Stable Whitening Optimizer for Efficient Neural Network Training Ein stabiler Whitening Optimizer für effizientes Neural Network Training 高效神经网络培训的稳定白电优化器 2506.07254v2
  • 581 06-10 Dataset Properties Shape the Success of Neuroimaging-Based Patient Stratification: A Benchmarking Analysis Across Clustering Algorithms Datensatzeigenschaften gestalten den Erfolg der Neuroimaging-basierten Patientenschichtung: Eine Benchmarking-Analyse über Cluster-Algorithmen hinweg 数据集属性形状基于神经成形的患者分层成功率:对集束数值的基准分析 2503.12066v2
  • 582 06-10 Archon: An Architecture Search Framework for Inference-Time Techniques Archon: Ein Architektur-Suchrahmen für Inferenz-Zeit-Techniken Archon: 推断-时间技术建筑搜索框架 2409.15254v6
  • 583 06-10 CFMI: Flow Matching for Missing Data Imputation CFMI: Flow Matching für fehlende Daten-Imputation CFMI: 缺失数据计算流程匹配 2506.09258v1
  • 584 06-10 Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Kommentar zur Illusion des Denkens: Die Stärken und Grenzen von Vernunftmodellen über das Lens of Problem Complexity verstehen 关于 “ 思考的幻觉:通过问题复杂焦点了解理性模型的长处和局限性 “ 的评论 2506.09250v1
  • 585 06-10 Simulation-based Inference for High-dimensional Data using Surjective Sequential Neural Likelihood Estimation Simulationsbasierte Inferenz für hochdimensionale Daten mittels Surjektiv-Sequentieller Neural Likelihood-Abschätzung 使用粒子序列神经神经近似近似性估计法对高维数据进行基于模拟的推论 2308.01054v3
  • 586 06-10 Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation 在多式工业数据库检索增强型多式工业数据库中有条件监测援助 2506.09247v1
  • 587 06-10 A Generative Framework for Predictive Modeling of Multiple Chronic Conditions Using Graph Variational Autoencoder and Bandit-Optimized Graph Neural Network Ein generatives Framework für die vorausschauende Modellierung mehrerer chronischer Zustände mit Graph Variational Autoencoder und Bandit-Optimiertes Graphen-Neural-Netzwerk 利用图表变异自动编码器和强盗优化的图形神经网络对多种慢性条件进行预测模拟的生成框架 2409.13671v3
  • 588 06-10 Root Cause Attribution of Delivery Risks via Causal Discovery with Reinforcement Learning Ursachenzuweisung von Lieferrisiken durch kausale Entdeckung mit Verstärkungslernen 利用强化学习发现原因和强化学习,确定分娩风险的根源原因 2408.05860v3
  • 589 06-10 Electroencephalogram Emotion Recognition via AUC Maximization Elektroenzephalogramm-Emotionserkennung über AUC-Maximierung 通过AUC最大化进行脑电图情感识别 2408.08979v3
  • 590 06-10 PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies PatchGuard: Widerwärtig robuste Anomalieerkennung und Lokalisierung durch Vision Transformer und Pseudoanomalien PatchGuard:通过视觉变形器和假冒异常器反向强力异常检测和定位 2506.09237v1
  • 591 06-10 FastLloyd: Federated, Accurate, Secure, and Tunable $k$-Means Clustering with Differential Privacy FastLloyd: Federated, Exakt, Secure, and Tunable $k$-Means Clustering with Differential Privacy 联邦、准确、安全和金枪鱼用$k$-配有不同隐私的组合型工具 2405.02437v3
  • 592 06-10 LieRE: Lie Rotational Positional Encodings LieRE: Lie Rotational Positional Encodings 谎言: 谎言旋转位置编码 2406.10322v4
  • 593 06-10 SoK: Machine Unlearning for Large Language Models SoK: Maschinelles Verlernen für große Sprachmodelle SoK:大语言模型的机器脱学 2506.09227v1
  • 594 06-10 The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound Die Wirksamkeit von Semantik-erhaltenden Transformationen im selbstüberwachten Lernen für medizinischen Ultraschall 医学超声波自监测学习中语义保留变异的功效 2504.07904v2
  • 595 06-10 Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs Robuste Geräuschdämpfung durch adaptive Bündelung von Transformer-Ausgängen 通过调适式组合变异器产出,通过调适式组合组合,强有力噪音减速 2506.09215v1
  • 596 06-10 Revisiting Graph Projections for Effective Complementary Product Recommendation Überarbeiten von Graph Projektionen für eine effektive ergänzende Produktempfehlung 修订有效补充产品建议书的图表预测 2506.09209v1
  • 597 06-10 A Comparative Study of Conventional and Tripolar EEG for High-Performance Reach-to-Grasp BCI Systems Vergleichende Studie des konventionellen und tripolaren EEG für Hochleistungs-Reach-to-Grasp BCI-Systeme 利用常规和三极EEG进行高绩效接近格拉斯普到格拉斯普BCI系统比较研究 2402.09448v3
  • 598 06-10 FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models FLoRIST: Singular Value Thresholding für effizientes und präzises Federated Fine-Tuning von großen Sprachmodellen FLoRIST:为高效率和准确的联邦大语言模型的精密使用而保持单一价值 2506.09199v1
  • 599 06-10 Multimodal Pragmatic Jailbreak on Text-to-image Models Multimodaler Pragmatischer Jailbreak zu Text-to-Image-Modellen 关于文本到图像模型的多式实用监狱 2409.19149v2
  • 600 06-10 Certified Unlearning for Neural Networks Zertifiziertes Verlernen für neurale Netzwerke 神经网络认证的不学习 2506.06985v2
  • 601 06-10 LaDCast: A Latent Diffusion Model for Medium-Range Ensemble Weather Forecasting LaDCast: Ein latentes Diffusionsmodell für mittelgroße Ensemble-Wettervorhersage LaDCast: 中频集合天气预报的延迟传播模型 2506.09193v1
  • 602 06-10 Multi-Task Reward Learning from Human Ratings Multi-Task-Reward-Lernen aus menschlichen Bewertungen 从人类评级中学习多任务奖励 2506.09183v1
  • 603 06-10 Pointwise confidence estimation in the non-linear $\ell^2$-regularized least squares Punktweise Vertrauensschätzung in den nichtlinearen $\ell^2$-regularisierten kleinsten Quadraten 在非线性$\ell2美元正规化最低平方中点性信心估计值 2506.07088v2
  • 604 06-10 Spatial Reasoning with Denoising Models Raumgerechtes Denken mit vernarbenden Modellen 空间原因与排减模式 2502.21075v2
  • 605 06-10 Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism Roboter-Gated Interaktives Imitation-Lernen mit adaptivem Interventionsmechanismus 采用适应性干预机制的机器人自制交互式模拟学习 2506.09176v1
  • 606 06-10 Multivariate Long-term Time Series Forecasting with Fourier Neural Filter Multivariate Langzeit-Zeitreihenprognose mit Fourier-Neuralfilter 使用 Fourier神经过滤器预测多变长期时间序列 2506.09174v1
  • 607 06-10 The Curious Language Model: Strategic Test-Time Information Acquisition Das Kuriose Sprachmodell: Strategische Test-Zeit-Informationsbeschaffung 好奇语言模式:战略试验时间信息采集 2506.09173v1
  • 608 06-10 MultiNet: An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models MultiNet: Ein Open Source Software Toolkit & Benchmark Suite für die Bewertung und Anpassung multimodaler Aktionsmodelle 多网:多模式行动模式评价和适应评估的开放源软件工具包 基准套件 2506.09172v1
  • 609 06-10 Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search Verbesserung der LLM-Agentenplanung mit In-Context Learning über Atomic Fact Augmentation und Lookahead Search 通过原子事实增强和目光头搜索改进LLM Agent 规划,通过原子事实增强和目光头搜索进行文中学习 2506.09171v1
  • 610 06-10 ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning ProteinZero: Selbstverbessernde Proteinerzeugung über Online-Verstärkungslernen 蛋白零:通过在线强化学习改善蛋白质生成 2506.07459v2
  • 611 06-10 Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models Ist die Lärmkonditionierung notwendig? Eine einheitliche Theorie der Bedingungslosen Graphen-Diffusionsmodelle 是否有必要设定噪音条件? 无条件图形扩散模型的统一理论 2505.22935v2
  • 612 06-10 Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes Skalierbare Spatiotemporale Inferenz mit Biased Scan Aufmerksamkeit Transformer Neural Prozesse 可缩缩缩的 Spatoto时推理 2506.09163v1
  • 613 06-10 ICONS: Influence Consensus for Vision-Language Data Selection ICONS: Einflusskonsens für Vision-Sprachen-Datenauswahl ICONS: 愿景语言数据选择影响共识 2501.00654v3
  • 614 06-10 FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts FC-Attack:通过自动发光流程图制作的破监狱多式多式大语言模型 2502.21059v2
  • 615 06-10 QuEST: Stable Training of LLMs with 1-Bit Weights and Activations QuEST: Stabiles Training von LLMs mit 1-Bit Gewichten und Aktivierungen QuEST:对具有1-比分重量和激活作用的LLMs进行稳定培训 2502.05003v2
  • 616 06-10 TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval TRACE: Erdungszeitreihen im Kontext für multimodale Einbettung und Retrieval TRACE: 在多式嵌入和检索背景下确定时间序列 2506.09114v1
  • 617 06-10 Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations Verständnis von Aufgabenvektoren im In-Context-Lernen: Emergenz, Funktionalität und Einschränkungen 了解内文学习中的任务矢量:新出现、功能和局限性 2506.09048v1
  • 618 06-10 Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation Agentische Neuronale Netzwerke: Selbstständige Multi-Agenten-Systeme über textuelle Backpropagation 动态神经网络:通过文字反向分析实现自我演进的多行为者系统 2506.09046v1
  • 619 06-10 Evolutionary Policy Optimization Optimierung der Evolutionspolitik 演变政策优化 2503.19037v2
  • 620 06-10 The Decoupled Risk Landscape in Performative Prediction Die entkoppelte Risikolandschaft in der Performativen Vorhersage 性能预测中脱钩的风险景观 2506.09044v1
  • 621 06-10 FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed FZOO: Schneller Zeroth-Order-Optimierer für feintunende große Sprachmodelle in Richtung Adam-Scale-Geschwindigkeit FZOO: 向亚当规模速度的微调大语言模型最佳最佳优选器 2506.09034v1
  • 622 06-10 Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning Router-R1: Lehren von LLMs Multi-Round Routing und Aggregation durch Verstärkungslernen 路由-R1路由-R1路由:教学LLMS 2506.09033v1
  • 623 06-10 Optimal Spectral Transitions in High-Dimensional Multi-Index Models Optimale Spektralübergänge in High-Dimensional Multi-Index-Modellen 高多维多指数模型中最佳光谱转换 2502.02545v2
  • 624 06-10 JuStRank: Benchmarking LLM Judges for System Ranking JuStRank: Benchmarking der LLM-Richter für Systemranking JuStRank:为系统评分确定LLM法官基准 2412.09569v2
  • 625 06-10 Spectral Estimators for Multi-Index Models: Precise Asymptotics and Optimal Weak Recovery Spektrale Schätzwerte für Multi-Index-Modelle: Präzise Asymptotik und optimale schwache Erholung 多 Index 模型的光谱光谱模拟器:精确的无症状和最佳微弱恢复 2502.01583v2
  • 626 06-10 Diffuse and Disperse: Image Generation with Representation Regularization Diffuse und Disperse: Bildgenerierung mit Repräsentationsregularisierung Diffuse & diffperse: 形象生成,有代表性的规范化 2506.09027v1
  • 627 06-10 e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs e3: Erforschen lernen ermöglicht Extrapolation von Test-Time Compute für LLMs e3: 学习探索以利对LLMM的试验时间计算进行外推计算 2506.09026v1
  • 628 06-10 DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging DIsoN: Dezentrale Isolationsnetzwerke für Out-of-Distribution Detection in Medical Imaging DISON:医疗成像中分散分布式隔离探测网 2506.09024v1
  • 629 06-10 Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing Beweisbar kostensensitive gegnerische Verteidigung durch Randomized Glättung 通过随机滑动平滑,进行成本敏感度高的反反向防御 2310.08732v3
  • 630 06-10 Edit Flows: Flow Matching with Edit Operations Flows bearbeiten: Flow passend zu Bearbeitungsoperationen 编辑流程: 与编辑操作匹配的流程 2506.09018v1
  • 631 06-10 SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning SPEED-RL: Schnellere Schulung von Vernunftmodellen über Online Curriculum Learning SPEED-RL:通过在线课程学习更快地培训推理模型 2506.09016v1
  • 632 06-10 Effective Data Pruning through Score Extrapolation Effektive Datenkorrektur durch Score-Extrapolation 通过计分法外推法有效调节数据 2506.09010v1
  • 633 06-10 Scalable Equilibrium Sampling with Sequential Boltzmann Generators Skalierbare Equilibrium-Probenahme mit sequenziellen Boltzmann-Generatoren 配有序列波尔茨曼发电机的可缩缩平衡取样器 2502.18462v2
  • 634 06-10 Branched Schrödinger Bridge Matching Verzweigte Schrödingerbrücke passend 分割的 Schrödinger 桥桥配对 2506.09007v1
  • 635 06-10 CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model CodeBrain: Bridging Entkoppelter Tokenizer und Multi-Scale-Architektur für EEG-Stiftungsmodell 编码系统:为EEG基金会模式建立连接拆分调控器和多规模结构 2506.09110v1
  • 636 06-10 SensorLM: Learning the Language of Wearable Sensors SensorLM: Die Sprache der tragbaren Sensoren lernen 传感器LM:学习穿装传感器的语言 2506.09108v1
  • 637 06-10 Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models Effiziente Ausrichtung der medizinischen Visionssprache durch Anpassung maskierter Visionsmodelle 通过调适蒙面愿景模型,提高医疗愿景-语言协调效率 2506.08990v1
  • 638 06-10 SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning SwS: Selbstbewusste Schwäche-getriebene Problemsynthese bei der Stärkung des Lernens für LLM-Reasoning SWS:自觉的弱点驱动的问题综述 2506.08989v1
  • 639 06-10 Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models Quamba2: Robustes und skalierbares Post-Training Quantization Framework für selektive State Space Models Quamba2:有选择性的国家空间模型强力和可扩展的培训后量化框架 2503.22879v3
  • 640 06-10 PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips PrisonBreak: Gefängnisbrechende große Sprachmodelle mit weniger als fünfundzwanzig gezielten Bit-Flips 监狱破碎:破禁大语言模式,目标位翻页小于25个目标小于25个 2412.07192v2
  • 641 06-10 Bias Analysis in Unconditional Image Generative Models Bias-Analyse in bedingungslosen Bildgenerativen Modellen 无条件图像生成模型中的偏见分析 2506.09106v1
  • 642 06-10 DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models DiffLM: Ansteuerbare Synthetische Datengenerierung über Diffusions-Sprachenmodelle DiffLM: 通过传播语言模型生成可控合成数据 2411.03250v2
  • 643 06-10 Exact Upper and Lower Bounds for the Output Distribution of Neural Networks with Random Inputs Exakte obere und untere Grenzen für die Ausgabeverteilung von neuralen Netzwerken mit Random Inputs 带有随机输入的神经网络输出分布精确的上下上下曲线 2502.11672v2
  • 644 06-10 Propositional Logic for Probing Generalization in Neural Networks Propositionelle Logik zur Probing-Verallgemeinerung in neuralen Netzwerken 神经网络普遍化的预示逻辑 2506.08978v1
  • 645 06-10 Tailored Architectures for Time Series Forecasting: Evaluating Deep Learning Models on Gaussian Process-Generated Data Maßgeschneiderte Architekturen für die Prognose von Zeitreihen: Bewertung von Deep Learning-Modellen auf gaussischen prozessgenerierten Daten 时间序列预测:评价高斯进程生成数据的深学习模型 2506.08977v1
  • 646 06-10 Calibrated Physics-Informed Uncertainty Quantification Kalibrierung der physikinformierten Unsicherheit Quantifizierung 校准物理 - 内部不确定性量化 2502.04406v2
  • 647 06-10 On Large-scale Evaluation of Embedding Models for Knowledge Graph Completion Zur groß angelegten Bewertung von Einbettungsmodellen für die Wissensgraphenvervollständigung 大规模评价知识图完成情况嵌入模型 2504.08970v2
  • 648 06-10 Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers Vortrainierte Sprachmodelle lernen bemerkenswert genaue Darstellungen von Zahlen 培训前语文模式的精确数字表示 2506.08966v1
  • 649 06-10 GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO GFRIEND: Generatives, nur wenige Schuss Belohnungsableitung durch EfficieNt DPO GFRIEND: 通过公平竞争和消费者保护局生成的微小的报偿推理 2506.08965v1
  • 650 06-10 Gumbel-max List Sampling for Distribution Coupling with Multiple Samples Gumbel-max-Listenstichproben für die Verteilungskopplung mit mehreren Proben 与多个样本相混合的分布式组合的 Gumbel- max 列表列表抽样 2506.05632v2
  • 651 06-10 BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly 双形:双体几何大会的学习协作能力 2506.06221v2
  • 652 06-10 Towards Robust Deep Reinforcement Learning against Environmental State Perturbation Auf dem Weg zu einem robusten, tiefen Stärkungslernen gegen Umweltzustandsstörung 努力开展强有力的深入强化学习,防止环境状态的干扰 2506.08961v1
  • 653 06-10 MetaTT: A Global Tensor-Train Adapter for Parameter-Efficient Fine-Tuning MetaTT: Ein globaler Tensor-Train-Adapter für parametereffizientes Feintuning MetaTT: 参数有效精美调试全球Tensor-Train适应器 2506.09105v1
  • 654 06-10 IntTrajSim: Trajectory Prediction for Simulating Multi-Vehicle driving at Signalized Intersections IntTrajSim: Trajektorienvorhersage für das Simulieren von Multi-Fahrzeug-Fahren bei Signalisierten Schnitten IntTrajSim: 信号交界处模拟多车辆驾驶轨迹预测 2506.08957v1
  • 655 06-10 Unifying Block-wise PTQ and Distillation-based QAT for Progressive Quantization toward 2-bit Instruction-Tuned LLMs Vereinheitlichen Blockweise PTQ und Destillationsbasiertes QAT für die progressive Quantisierung in Richtung 2-Bit Instruction-Tuned LLMs 用于逐步量化至2位制导二位制导LMM 的QAT 统一块式 PTQ 和基于蒸馏的QAT 统一PTQ 和 蒸馏基QAT 2506.09104v1
  • 656 06-10 Data Augmentation For Small Object using Fast AutoAugment Datenvergrößerung für kleine Objekte mit schneller AutoAugment 使用快速自动增强的小型对象数据增强 2506.08956v1
  • 657 06-10 Segment Concealed Objects with Incomplete Supervision Segmentverdeckte Objekte mit unvollständiger Überwachung 带不完全监督的隐藏物体 2506.08955v1
  • 658 06-10 Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction Protriever: End-to-End-Unterscheidbare Proteinhomologie Suche nach Fitness-Vorhersage 远期:最终到最终差异可辨别的蛋白质同理学研究,以预测适合性。 2506.08954v1
  • 659 06-10 Multi-SpaCE: Multi-Objective Subsequence-based Sparse Counterfactual Explanations for Multivariate Time Series Classification Multi-SpaCE: Multi-Objective Subsequence-based Sparse Counterfactual Erklärungen für multivariate Zeitreihenklassifikation 多变量时间序列分类的多目标子序列、基于多目标子序列的简单反事实解释 2501.04009v2
  • 660 06-10 Epistemic Uncertainty in Conformal Scores: A Unified Approach Epistemische Ungewissheit in konformen Partituren: Ein einheitlicher Ansatz 非正式分数中的不确定性:统一办法 2502.06995v2
  • 661 06-10 Implicit Neural Representations for Chemical Reaction Paths Implizite neurale Darstellungen für chemische Reaktionswege 化学反应路径的隐性神经图示 2502.15843v2
  • 662 06-10 KARMA: A Multilevel Decomposition Hybrid Mamba Framework for Multivariate Long-Term Time Series Forecasting KARMA: Ein Multilevel-Dekompositions-Hybrid-Mamba-Framework für multivariate Langzeitreihenprognosen KARMA: 多变长期时间序列预测多层次分解混合 Mamba 框架 2506.08939v1
  • 663 06-10 BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models BioLangFusion: Multimodale Fusion von DNA, mRNA und Protein Language Models BioLangFusion:DNA、MRNA和蛋白素语言模型的多式联运 2506.08936v1
  • 664 06-10 Can A Gamer Train A Mathematical Reasoning Model? Kann ein Gamer ein mathematisches Reasoning-Modell trainieren? 赌徒能训练数学理论模型吗? 2506.08935v1
  • 665 06-10 Local MDI+: Local Feature Importances for Tree-Based Models Lokale MDI+: lokale Feature-Wichtigkeiten für baumbasierte Modelle 本地 MDI+: 树基模型的本地特点重要性 2506.08928v1
  • 666 06-10 Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap Multimodale Unüberwachte Domain-Verallgemeinerung durch Herausholen über die Modalitätslücke 通过跨越模式差距的检索实现多式多式不受监督域化 2402.04416v3
  • 667 06-10 PropMEND: Hypernetworks for Knowledge Propagation in LLMs PropMEND: Hypernetworks für die Wissensverbreitung in LLMs PropMEND: 以LLMMs方式传播知识的超网络 2506.08920v1
  • 668 06-10 Provably Accurate Adaptive Sampling for Collocation Points in Physics-informed Neural Networks Beweisbar genaue adaptive Probenahme für Kollokationspunkte in physikinformierten Neuronalen Netzwerken 物理知情神经网络定位点的准确适应性抽样 2504.00910v2
  • 669 06-10 Enhancing generalizability of model discovery across parameter space with multi-experiment equation learning (ME-EQL) Verbesserung der Verallgemeinerbarkeit der Modellentdeckung über den Parameterraum mit Multi-Experiment-Gleichung Lernen (ME-EQL) 通过多实验方程学习(ME-EQL)加强跨参数空间模型发现的一般性 2506.08916v1
  • 670 06-10 Federated Generalised Variational Inference: A Robust Probabilistic Federated Learning Framework Federated Generalised Variational Inference: Ein robustes probabilistisches Federated Learning Framework 联邦通用多变性推断:强力概率联邦学习框架 2502.00846v3
  • 671 06-10 Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU Implementierung von Keyword-Spoting auf dem MCUX947 Microcontroller mit integrierter NPU 执行关于MCUX947微型控制器与集成 NPU 2506.08911v1
  • 672 06-10 BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning BEAST: Effiziente Tokenisierung von B-Splines kodierten Aktionssequenzen für das Imitationslernen BEAST: 模拟学习的B-S级编码行动序列的有效招接 2506.06072v2
  • 673 06-10 Intention-Conditioned Flow Occupancy Models Intentionskonditionierte Flow-Belegungsmodelle 有意有条件流动占用模式 2506.08902v1
  • 674 06-10 Feature Shift Localization Network Feature Shift Localization Network 本地化网络 2506.09101v1
  • 675 06-10 Generalization analysis of an unfolding network for analysis-based Compressed Sensing Generalisierungsanalyse eines sich entfaltenden Netzwerks für analysebasiertes Komprimiertes Sensing 对正在展开的基于分析的压缩遥感压缩分析网络的概括化分析 2303.05582v3
  • 676 06-10 Improved Variational Inference in Discrete VAEs using Error Correcting Codes Verbesserte Variationale Schlussfolgerungen in diskreten VAEs mit Fehlerkorrekturcodes 使用错误校正代码的差错 VAEs 中改进的变量引用 2410.07840v2
  • 677 06-10 Real-Time Cascade Mitigation in Power Systems Using Influence Graph Improved by Reinforcement Learning Real-Time Cascade Mitigation in Stromsystemen mit Einfluss Graph verbessert durch Verstärkung Learning 利用强化学习所改进的影响图,在电力系统中实时连锁减缓电源系统 2506.08893v1
  • 678 06-10 SeerAttention-R: Sparse Attention Adaptation for Long Reasoning SeerAttention-R: Sparse-Attention-Anpassung für langes Nachdenken SeerAttention-R: 偏差的注意力调整以适应长期理由 2506.08889v1
  • 679 06-10 InfoDPCCA: Information-Theoretic Dynamic Probabilistic Canonical Correlation Analysis InfoDPCCA: Informationstheoretische dynamische probabilistische kanonische Korrelationsanalyse InfoDPCCA: 信息理论动态动态概率能力与卡尼诺相关关系分析 2506.08884v1
  • 680 06-10 Filling in the Blanks: Applying Data Imputation in incomplete Water Metering Data Füllen der Blanks: Anwenden von Daten Imputation in unvollständigen Wasserzählerdaten 填充空白:在不完整的水计量数据中应用数据计算 2506.08882v1
  • 681 06-10 Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models Pivoting Factorization: Eine kompakte Meta-Low-Rank-Darstellung von Sparsität für effiziente Schlussfolgerungen in großen Sprachmodellen 重力因素化:大语言模型中高效引引的公平代表性的契约元低率代表 2501.19090v2
  • 682 06-10 Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks Sparse Spectral Training und Inferenz auf Euklidische und Hyperbolische Neuronale Netzwerke Euclidean 和超双曲神经网络的 广度光谱培训和推断 2405.15481v2
  • 683 06-10 Adapting to Heterophilic Graph Data with Structure-Guided Neighbor Discovery Anpassung an heterophile Graphendaten mit strukturgeführter Neighbor Discovery 与结构引导邻居发现发现器相适应以适应结构辅助邻居发现器的热对数图形数据 2506.08871v1
  • 684 06-10 Low-Rank Augmented Implicit Neural Representation for Unsupervised High-Dimensional Quantitative MRI Reconstruction Low-Rank Augmented Implizite Neural Repräsentanz für unüberwachte hochdimensionelle Quantitative MRT-Rekonstruktion 低Rank 低强度增强的内隐隐含神经力代表,用于未受监督的高多样性定量MRI重建 2506.09100v1
  • 685 06-10 k-NN as a Simple and Effective Estimator of Transferability k-NN als einfacher und effektiver Schätzwert für die Übertragbarkeit k-NN作为可转让性的一个简单而有效的模拟器 2503.18528v2
  • 686 06-10 Deep reinforcement learning for near-deterministic preparation of cubic- and quartic-phase gates in photonic quantum computing Tiefes Verstärkungslernen für die nahezu deterministische Vorbereitung von kubischen und quartic-Phasentoren im photonischen Quanten-Computing 深入强化学习,为光子量衡算的立方和临界阶段大门进行近乎决定性的准备 2506.07859v2
  • 687 06-10 Cooperative and Collaborative Multi-Task Semantic Communication for Distributed Sources Kooperative und kollaborative Multi-Task Semantische Kommunikation für verteilte Quellen 合作与合作 多任务、跨语义交流促进分配源的传播 2411.02150v2
  • 688 06-10 StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams StreamSplat: Auf dem Weg zu Online Dynamische 3D-Rekonstruktion aus unkalibrierten Videostreams StreamSplat:从未经校正的视频流走向在线动态3D重建 2506.08862v1
  • 689 06-10 DIME: Diffusion-Based Maximum Entropy Reinforcement Learning DIME: Diffusionsbasiertes maximales Entropie-Verstärkungs-Lernen DIME: 以传播为基础的最大助爆强化学习 2502.02316v2
  • 690 06-10 Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers Zu groß zu denken: Kapazität, Erinnerung und Verallgemeinerung in vortrainierten Transformern 能力、记忆和在培训前变异器中的普及化 2506.09099v1
  • 691 06-10 On the Hardness of Sampling from Mixture Distributions via Langevin Dynamics Über die Härte der Probenahme aus Mischungsverteilungen über Langevin Dynamics 使用 Langevin Directives 进行混音发行的样本的强度 2406.02017v3
  • 692 06-10 Learning surrogate equations for the analysis of an agent-based cancer model Erlernen von Surrogatgleichungen für die Analyse eines agentenbasierten Krebsmodells 用于分析基于制剂的癌症模式的学习替代方程 2503.01718v2
  • 693 06-10 Agile Reinforcement Learning for Real-Time Task Scheduling in Edge Computing Agiles Verstärkungslernen für Echtzeit-Aufgabenplanung im Edge Computing 边边计算中实时任务排期 2506.08850v1
  • 694 06-10 IMAGIC-500: IMputation benchmark on A Generative Imaginary Country (500k samples) IMAGIC-500: Imputations-Benchmark für ein generatives Imaginary Country (500k-Proben) IMAGIC-500: 创造想象力国家(500公里样本)的估测基准 2506.08844v1
  • 695 06-10 Quantum Architecture Search with Unsupervised Representation Learning Quantenarchitektur Suche mit unüberwachtem Repräsentationslernen 以无人监督的代表制学习方式搜索量子结构 2401.11576v5
  • 696 06-10 Robot Pouring: Identifying Causes of Spillage and Selecting Alternative Action Parameters Using Probabilistic Actual Causation Robot-Pouring: Identifizierung von Ursachen von Spillage und Auswahl alternativer Aktionsparameter mittels probabilistischer tatsächlicher Schadensursachen 机器人定位:利用概率实际原因查明螺旋原因和选择替代行动参数 2502.09395v3
  • 697 06-10 Nearly Optimal Differentially Private ReLU Regression Fast optimal unterschiedliche private ReLU-Regression 接近最佳差异化私自 ReLU 回归 2503.06009v2
  • 698 06-10 Relational decomposition for program synthesis Verhältnismäßige Zersetzung für die Programmsynthese 用于方案合成的通货膨胀分解 2408.12212v3
  • 699 06-10 Stepsize anything: A unified learning rate schedule for budgeted-iteration training Stepsize anything: Ein einheitlicher Lernraten-Zeitplan für Training mit budgetierten Iterationen 采取任何步骤:为列入预算的就地培训统一学习率时间表 2505.24452v2
  • 700 06-10 EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations EquivaMap: LLM-Leveraging für automatische Äquivalenzprüfung von Optimierungsformulierungen Equivamap: 利用LMLMs对优化配方进行自动等效检查 2502.14760v2
  • 701 06-10 High-Dimensional Independence Testing via Maximum and Average Distance Correlations High-Dimensional Independence Testing über maximale und mittlere Distanzkorrelationen 通过最大和平均距离交错进行高度多功能独立测试 2001.01095v4
  • 702 06-10 Activated LoRA: Fine-tuned LLMs for Intrinsics Aktivierte LoRA: Feinjustierte LLMs für die Intrinsik 活性LORA: 用于内分泌的微调LMLM 2504.12397v4
  • 703 06-10 Spectral invariance and maximality properties of the frequency spectrum of quantum neural networks Spektrale Invarianz und maximale Eigenschaften des Frequenzspektrums quantenneuraler Netzwerke 量子神经网络频谱的光谱性和最大度性 2402.14515v3
  • 704 06-10 Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO Eintauchen in RL für Bildgenerierung mit CoT: Eine Studie über DPO vs. GRPO 与COT一起为图像生成而跳入RL:关于DPO与GROP公司的研究 2505.17017v2
  • 705 06-10 In Praise of Stubbornness: An Empirical Case for Cognitive-Dissonance Aware Continual Update of Knowledge in LLMs In Lob der Stubbornness: Ein empirischer Fall für Kognitiv-Dissonanz Aware Continual Update of Knowledge in LLMs 赞美固态:关于知识在LLM中不断更新的认知与不相识经验案例 2502.04390v2
  • 706 06-10 Improving the Noise Estimation of Latent Neural Stochastic Differential Equations Verbesserung der Lärmschätzung latenter neuraler stochastischer Differentialgleichungen 改进对内心神经斯托卡式差异的噪音估计 2412.17499v2
  • 707 06-10 Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval Optimierte Text-Embedding-Modelle und Benchmarks für die Amharische Passage Retrieval 阿姆光通过通过检索的最佳文本嵌入模型和基准 2505.19356v2
  • 708 06-10 syren-baryon: Analytic emulators for the impact of baryons on the matter power spectrum Syren-Baryon: Analytische Emulatoren für den Einfluss von Baryonen auf das Materie-Leistungsspektrum syren-baryon: 用于分析男爵对物质能量频谱影响的分析模拟器 2506.08783v1
  • 709 06-10 LT-PINN: Lagrangian Topology-conscious Physics-informed Neural Network for Boundary-focused Engineering Optimization LT-PINN: Lagrangian Topologie-bewusste physik-informierte Neuronales Netzwerk für boundary-focused Engineering Optimization LT-PINN:Lagrangian 地形 – – 具有意识的地形 – – 物理意识 – – 以边界为重点的工程优化知情神经网络 2506.06300v2
  • 710 06-10 Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models Landsat-Bench: Datensätze und Benchmarks für Landsat-Stiftungsmodelle Landsat-Bench:Landsat基金会模型数据集和基准 2506.08780v1
  • 711 06-10 Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch Silencing Empowerment, Allowing Bigotry: Auditierung der Moderation von Hass Rede auf Twitch 静默赋权,允许偏见:审计对抽搐者仇恨言论的温和程度 2506.07667v2
  • 712 06-10 Data-Driven Discovery of PDEs via the Adjoint Method Datengestützte Entdeckung von PDEs über die Adjoint-Methode 通过 “ 联合方法 “ 发现项目设计书的数据驱动数据 2401.17177v4
  • 713 06-10 TextAtari: 100K Frames Game Playing with Language Agents TextAtari: 100K Frames Spiel mit Sprachagenten TextAtari: 100K 框架游戏与语言代理游戏 2506.04098v2
  • 714 06-10 Paths to Causality: Finding Informative Subgraphs Within Knowledge Graphs for Knowledge-Based Causal Discovery Pfade zur Kausalität: Informative Subgraphen innerhalb von Wissensgraphen finden für wissensbasierte Kausale Entdeckung 因果路径:在知识型因果发现的知识图中查找信息子集 2506.08771v1
  • 715 06-10 CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries CLARIFY: Kontrastive Präferenz Verstärkungslernen für untangling Ambiguous Queries CLARIFY: 拼凑不清的问询中竞争优惠强化学习 2506.00388v3
  • 716 06-10 Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting Kann langsam denkende LLMs Vernunft im Laufe der Zeit? Empirische Studien in Zeitreihen Prognose 时间序列预测的经验研究 2505.24511v2
  • 717 06-10 Detecting malignant dynamics on very few blood sample using signature coefficients Erkennung maligner Dynamik an sehr wenigen Blutproben mit Signaturkoeffizienten 利用签字系数检测极少数血液样本中的恶性动态 2506.09097v1
  • 718 06-10 On the Stability of the Jacobian Matrix in Deep Neural Networks Zur Stabilität der Jacobian Matrix in tiefen neuralen Netzwerken 深神经网络Jacobian矩阵的稳定 2506.08764v1
  • 719 06-10 EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements EDINET-Bench: Bewertung von LLMs zu komplexen Finanzaufgaben anhand japanischer Finanzausweise EDINET-Bench:利用日本财务报表评估复杂金融任务LMs 2506.08762v1
  • 720 06-10 Intra-Trajectory Consistency for Reward Modeling Intra-Trajektorien-Konsistenz für Prämienmodellierung 奖励建模在轨内兼容性 2506.09096v1
  • 721 06-10 Enhancing Accuracy and Maintainability in Nuclear Plant Data Retrieval: A Function-Calling LLM Approach Over NL-to-SQL Verbesserung der Genauigkeit und Aufrechterhaltung des Datenrücklaufs von Kernkraftwerken: Ein funktionsorientierter LLM-Ansatz über NL-zu-SQL 提高核电厂数据检索的准确性和可维护性:NL-SQL的功能计算LLM法 2506.08757v1
  • 722 06-10 Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction Denken vs. Tun: Agenten, die Grund durch Skalierung Test-Zeit-Interaktion 思考与做:通过扩大试验时间互动而有理由的代理商 2506.07976v2
  • 723 06-10 Understanding High-Dimensional Bayesian Optimization High-Dimensional Bayesian Optimierung verstehen 理解高多样性巴耶斯最佳优化 2502.09198v2
  • 724 06-10 Superposed Parameterised Quantum Circuits Überlagerte parametrische Quantenkreise 量子电路 2506.08749v1
  • 725 06-10 Towards Secure and Private Language Models for Nuclear Power Plants Auf dem Weg zu sicheren und privaten Sprachmodellen für Kernkraftwerke 争取实现核电厂安全和私用语言模式 2506.08746v1
  • 726 06-10 Bridging RDF Knowledge Graphs with Graph Neural Networks for Semantically-Rich Recommender Systems Bridging RDF Knowledge Graphs mit Graph Neural Networks für Semantically-Rich Recommender Systeme 将RDF知识图与Smantic-Rich建议系统图形神经网络连接起来 2506.08743v1
  • 727 06-10 Urban Incident Prediction with Graph Neural Networks: Integrating Government Ratings and Crowdsourced Reports Urban Incident Prediction mit Graph Neural Networks: Integration von Regierungsbewertungen und Crowdsourced Reports 与图形神经网络城市事件预测:整合政府评级和众包报告 2506.08740v1
  • 728 06-10 Exploration by Random Reward Perturbation Exploration durch zufällige Belohnung Störung 利用随机奖励干扰探测 2506.08737v1
  • 729 06-10 scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data scSSL-Bench: Benchmarking Selbstüberwachtes Lernen für Single-Cell-Daten scSSL-Bench:为单一分类数据制定自我监督学习基准 2506.10031v1
  • 730 06-10 Fusing Global and Local: Transformer-CNN Synergy for Next-Gen Current Estimation Fusing Global und Local: Transformer-CNN-Synergie für die aktuelle Schätzung von Next-Gen 全球和局部引信: 变异器-CNN 用于下金当前估算的协同效应 2504.07996v2
  • 731 06-10 Flexible and Efficient Drift Detection without Labels Flexible und effiziente Drift-Erkennung ohne Etiketten 无标签的灵活而高效的漂流探测 2506.08734v1
  • 732 06-10 Mitigating fairwashing using Two-Source Audits Fairwashing durch Zwei-Quellen-Audits abmildern 利用双重来源审计减少洗水 2305.13883v2
  • 733 06-10 Breaking the ICE: Exploring promises and challenges of benchmarks for Inference Carbon & Energy estimation for LLMs Breaking the ICE: Erforschen von Versprechungen und Herausforderungen von Benchmarks für Inferenz-Kohlenstoff- & Energieschätzungen für LLMs 打破ICE:探索LLMM的碳和能源估算基准的许诺和挑战 2506.08727v1
  • 734 06-10 Stop Misusing t-SNE and UMAP for Visual Analytics Stoppen Sie die missbräuchliche Verwendung von t-SNE und UMAP für Visual Analytics 停止滥用 t- SNE 和视觉分析分析 UMAP 2506.08725v1
  • 735 06-10 VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs? VIST-GPT: Sich in der Ära des Visual Storytelling mit LLMs niederlassen? 使用LLMS的视觉故事时代? 2504.19267v3
  • 736 06-10 Strain Problems got you in a Twist? Try StrainRelief: A Quantum-Accurate Tool for Ligand Strain Calculations Strain Probleme haben Sie in einem Twist? Versuchen Sie StrainRelief: Ein Quantum-Accurate-Tool für Ligand Strain Berechnungen Strain Relief: Ligand Strain 计算 的量子精确工具 2503.13352v2
  • 737 06-10 TimeWak: Temporal Chained-Hashing Watermark for Time Series Data TimeWak: Temporales Ketten-Hashing-Wasserzeichen für Zeitreihendaten TimeWak:用于时间序列数据的时链套装水印 2506.06407v2
  • 738 06-10 ASIDE: Architectural Separation of Instructions and Data in Language Models ASIDE: Architektonische Trennung von Instruktionen und Daten in Sprachmodellen ASIDE:语言模型中指令与数据的架构分离 2503.10566v3
  • 739 06-10 A Survey on Quantum Machine Learning: Current Trends, Challenges, Opportunities, and the Road Ahead Eine Umfrage zum Quantum Machine Learning: Aktuelle Trends, Herausforderungen, Chancen und die Zukunft 关于 “ 量子机器学习:当前趋势、挑战、机遇和路前道路 “ 的调查 2310.10315v4
  • 740 06-10 Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data Intelligenter zusammenführen, besser verallgemeinern: Modell verbessern Zusammenführen auf OOD-Daten 合并的 “ 聪明者 “ 、 “ 更普遍化:加强OOD数据模型合并 “ 2506.09093v1
  • 741 06-10 Variational Autoencoder-Based Approach to Latent Feature Analysis on Efficient Representation of Power Load Monitoring Data Variationeller Autoencoder-basierter Ansatz zur Latent Feature Analyse zur effizienten Darstellung von Power Load Monitoring Daten 以变式自动编码器为基础,对电荷载量监测数据高效代表情况进行长期特征分析的变式自动编码器方法 2506.08698v1
  • 742 06-10 Self-Training Elicits Concise Reasoning in Large Language Models Selbst-Trainings-Elizite schließen Vernunft in großen Sprachmodellen ein 大语言模式中的简明理由 2502.20122v3
  • 743 06-10 CUDA-LLM: LLMs Can Write Efficient CUDA Kernels CUDA-LLM: LLMs können effiziente CUDA-Kernel schreiben CUDA-LLM:LLM能够编写高效的CUDA内核 2506.09092v1
  • 744 06-10 Variational Inference Optimized Using the Curved Geometry of Coupled Free Energy Variationelle Schlussfolgerung optimiert mit der gekrümmten Geometrie der gekoppelten freien Energie 使用共同自由能源曲线几何法优化 2506.09091v1
  • 745 06-10 Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning? Optuna vs Code Llama: Sind LLMs ein neues Paradigma für Hyperparameter-Tuning? Optuna 与 Code Llama:LLMs是超参数图例的新范式吗? 2504.06006v3
  • 746 06-10 RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling RuleReasoner: Verstärkte regelbasiertes Reasoning via Domain-aware Dynamic Sampling 规则响应器:通过Domain-aware动态抽样加强基于规则的理由说明 2506.08672v1
  • 747 06-10 sparseGeoHOPCA: A Geometric Solution to Sparse Higher-Order PCA Without Covariance Estimation sparseGeoHOPCA: Eine geometrische Lösung für spärliche PCA höherer Ordnung ohne Kovarianzschätzung sparseGeoHOPCA:无需协方差估计的稀疏高阶PCA几何解法 2506.08670v1
  • 748 06-10 Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search Mit Blueprints und Prompt Template Search die Fähigkeit kleiner Sprachmodelle zu verbessern 提高有蓝图和快速模板搜索的小型语言模式的理性能力 2506.08669v1
  • 749 06-10 Optimizing Learned Image Compression on Scalar and Entropy-Constraint Quantization Optimierung der Bildkompression auf Scalar- und Entropie-Konstraint-Quantisierung 优化 Scalar 和 Entropy- Contrpy- 约束量化的学习图像压缩 2506.08662v1
  • 750 06-10 Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness Auf dem Weg zu robuster Echtzeit-Multivariate-Zeitreihenprognose: Ein einheitliches Framework für Abhängigkeit, Asynchronität und Missingness 走向强劲的现实世界多变时间序列预测:关于依赖、同步和失踪的统一框架 2506.08660v1
  • 751 06-10 Integrating Asynchronous AdaBoost into Federated Learning: Five Real World Applications Integration von Asynchron AdaBoost in das Federated Learning: Fünf Anwendungen in der realen Welt 将AdaBoost亚洲同步体纳入联邦学习:五个真实世界应用 2506.09090v1
  • 752 06-10 Scalable Graph Attention-based Instance Selection via Mini-Batch Sampling and Hierarchical Hashing Skalierbare Grafik aufmerksamkeitsbasierte Instanzenauswahl über Mini-Batch-Sampling und Hierarchical Hashing 通过小型批量抽样和等级类散列选择基于可缩放图表的注意点选案 2502.20293v2
  • 753 06-10 When Simple Model Just Works: Is Network Traffic Classification in Crisis? Wenn einfaches Modell gerade funktioniert: Ist Netzwerkverkehrsklassifikation in der Krise? ” 简单模型 “ 何时起作用:网络交通分类是否处于危机之中? 2506.08655v1
  • 754 06-10 A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck Ein Privacy-Preserving Föderated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck 保护隐私的联邦学习框架,以普及CBCT, 将合成CT翻译在头部和颈部 2506.08654v1
  • 755 06-10 Compositional Causal Reasoning Evaluation in Language Models Kompositorische kausale Reasoning-Bewertung in Sprachmodellen 语言模型中的组合因果推理评估 2503.04556v4
  • 756 06-10 JoFormer (Journey-based Transformer): Theory and Empirical Analysis on the Tiny Shakespeare Dataset JoFormer (Journey-basierter Transformer): Theorie und empirische Analyse auf dem Tiny-Shakespeare-Datensatz JoFormer(基于旅程的Transformer):在Tiny Shakespeare数据集上的理论与实证分析 2506.08652v1
  • 757 06-10 BridgeNet: A Hybrid, Physics-Informed Machine Learning Framework for Solving High-Dimensional Fokker-Planck Equations BridgeNet: Hybrides, physikinformiertes Machine Learning Framework zur Lösung hochdimensionaler Fokker-Planck-Gleichungen BridgeNet:用于解决高二分法克-普朗克赤道的混合、物理成形机械学习框架 2506.04354v3
  • 758 06-10 TableDreamer: Progressive and Weakness-guided Data Synthesis from Scratch for Table Instruction Tuning TableDreamer: Progressive und schwache Datensynthese von Scratch für die Table Instruction Tuning 表Dreamer:Scratch为表格指示图纸提供的渐进和弱点制导数据合成 2506.08646v1
  • 759 06-10 Fusing Cross-modal and Uni-modal Representations: A Kronecker Product Approach Durchgängige cross-modale und uni-modale Darstellungen: Ein Kronecker-Produktansatz 引信的跨模式和统一模式代表:克罗马克产品方法 2506.08645v1
  • 760 06-10 Semi-gradient DICE for Offline Constrained Reinforcement Learning Halbgradienter DICE für Offline-Beschränktes Verstärkungslernen 用于非线外约束训练强化学习的半级DICE 2506.08644v1
  • 761 06-10 An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation Eine Analyse von Hyperparameter-Optimierungsmethoden für die retrieval Augmented Generation 对回收增加的一代的超光子辐射仪最佳优化方法的分析 2505.03452v2
  • 762 06-10 Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers Zeitreihendarstellungen für die Klassifizierung liegen in vorgebildeten Visionstransformatoren versteckt 未受过训练的愿景变异者中隐藏的分类隐隐藏物的 时间序列 2506.08641v1
  • 763 06-10 CASE: Contrastive Activation for Saliency Estimation CASE: Kontrastive Aktivierung für Salienzschätzung CASE:用于显著性估计的对比激活 2506.07327v2
  • 764 06-10 Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth Effizientes Feintuning Quantisierter Modelle über adaptive Rank und Bitwidth 通过适应级和Bitwidth对量化模型进行高效率的微调 2505.03802v3
  • 765 06-10 Discovering Continuous-Time Memory-Based Symbolic Policies using Genetic Programming Kontinuierliche-Zeit-Speicher-basierte symbolische Richtlinien mit genetischer Programmierung entdecken 利用遗传方案拟定办法发现基于连续时间记忆的象征性政策 2406.02765v6
  • 766 06-10 HSG-12M: A Large-Scale Spatial Multigraph Dataset HSG-12M: Ein großräumiger multigraphischer Datensatz HSG-12M:大型空间多光谱数据集 2506.08618v1
  • 767 06-10 Predicting and Understanding College Student Mental Health with Interpretable Machine Learning Vorhersagen und Verständnis College Student Mental Health mit Interpretable Machine Learning 预测和理解学院学生心理健康与可解释机器学习 2503.08002v3
  • 768 06-10 Generalizing while preserving monotonicity in comparison-based preference learning models Verallgemeinern unter Wahrung der Monotonizität in vergleichsbasierten Präferenz-Lernmodellen 在以比较为基础的优惠学习模式中,在保持单一性的同时,实现普遍化 2506.08616v1
  • 769 06-10 PatchTrAD: A Patch-Based Transformer focusing on Patch-Wise Reconstruction Error for Time Series Anomaly Detection PatchTrAD: Ein Patch-basierter Transformer mit Schwerpunkt auf Patch-Wise-Reconstruction-Fehler für Zeitreihenanomalienerkennung PatchTrAD:一种基于补丁的Transformer,侧重于用于时间序列异常检测的逐补丁重建误差 2504.08827v2
  • 770 06-10 A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices Ein zweistufiges Datenauswahl-Framework für dateneffiziente Modellschulungen auf Edge-Geräten 边缘装置数据有效模型培训两阶段数据选择框架 2505.16563v2
  • 771 06-10 CENTAUR: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference CENTAUR: Die Unmögliche Dreifaltigkeit von Datenschutz, Effizienz und Leistung in der Datenschutz-Bewahrung Transformer-Schlussfolgerung CENTAUR:在隐私保护变换人推断中弥合隐私、效率和性能这三方面不可能实现的三重问题 2412.10652v2
  • 772 06-10 Spiking Neural Models for Decision-Making Tasks with Learning Spiralische Neuralmodelle für Entscheidungs-Making-Aufgaben mit Lernen 学习决策任务Spiking神经模型 2506.09087v1
  • 773 06-10 Sample Efficient Demonstration Selection for In-Context Learning Beispiel Effiziente Demonstrationsauswahl für das In-Context-Lernen 内部学习的高效示范选择 2506.08607v1
  • 774 06-10 Flow Matching Meets PDEs: A Unified Framework for Physics-Constrained Generation Flow Matching trifft auf PDEs: Ein einheitliches Framework für physikbedingte Generationen 流动匹配符合PDEs:受物理约束的一代的统一框架 2506.08604v1
  • 775 06-10 CALT: A Library for Computer Algebra with Transformer CALT: Eine Bibliothek für Computer Algebra mit Transformer CALT:一个带有变换器的计算机代数库 2506.08600v1
  • 776 06-10 Model-based learning for joint channel estimation and hybrid MIMO precoding Modellbasiertes Lernen für gemeinsame Kanalschätzung und hybride MIMO-Vorkodierung 用于联合信道估计和混合MIMO预编码的基于模型的学习 2505.04255v2
  • 777 06-10 Solving excited states for long-range interacting trapped ions with neural networks Lösen von angeregten Zuständen für Langstrecken-Interagieren von gefangenen Ionen mit neuronalen Netzwerken 用神经网络解决长距离互动嵌入离子的兴奋状态 2506.08594v1
  • 778 06-10 Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings Dichte Retriever können bei einfachen Abfragen scheitern: Enthüllen des Granularitäts-Dilemmas der Einbettungen 简单查询: 读取嵌入器的颗粒性三角形 2506.08592v1
  • 779 06-10 Diversity-Guided MLP Reduction for Efficient Large Vision Transformers Diversity-geführte MLP-Reduktion für effiziente Großbildtransformatoren 为高效的大型愿景变异器而减少由多样性引导的减量 2506.08591v1
  • 780 06-10 How transformers learn structured data: insights from hierarchical filtering Wie Transformatoren strukturierte Daten lernen: Erkenntnisse aus der hierarchischen Filterung 变压器如何学习结构化数据:从上层过滤器获得的洞察力 2408.15138v3
  • 781 06-10 Diffusion-based Time Series Forecasting for Sewerage Systems Diffusionsbasierte Zeitreihenprognosen für Kanalisationssysteme 排污系统时间序列预报 2506.08577v1
  • 782 06-10 FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL FlickerFusion: Intra-Trajektorie Domain Generalizing Multi-Agent RL FlickerFusion: 磁盘内域域 通用多代理 RL 2410.15876v4
  • 783 06-10 How Malicious AI Swarms Can Threaten Democracy Wie bösartige KI-Schwärme die Demokratie bedrohen können 恶意AI集群如何威胁民主 2506.06299v2
  • 784 06-10 The Geometries of Truth Are Orthogonal Across Tasks Die Geometrien der Wahrheit sind orthogonal über Aufgaben hinweg 真理的地貌不同任务 2506.08572v1
  • 785 06-10 Asymptotic Normality of Infinite Centered Random Forests -Application to Imbalanced Classification Asymptotische Normalität von unendlichen zentrierten Zufallswäldern -Anwendung auf eine unausgewogene Klassifizierung 无限集中的随机森林的零散状态—-对平衡分类的适用 2506.08548v1
  • 786 06-10 LLM-ML Teaming: Integrated Symbolic Decoding and Gradient Search for Valid and Stable Generative Feature Transformation LLM-ML Teaming: Integrierte Symbolische Dekodierung und Gradientensuche nach gültiger und stabiler Generativer Feature Transformation LLM-ML 团队化:综合符号解码和渐进搜索,以进行有效和稳定的生成地貌转型 2506.09085v1
  • 787 06-10 Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models Verbesserte Ganzseitenoptimierung durch gemischt-körnige Belohnungsmechanismen-angepasste Sprachmodelle 通过混合发奖机制-成熟语言模式优化全页 2506.09084v1
  • 788 06-10 Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models Auf dem Weg zu einem autonomen Stärkungslernen für die real-weltliche Robotermanipulation mit großen Sprachmodellen 争取以大语言模式进行现实世界机器人操纵的自主强化学习 2503.04280v4
  • 789 06-10 Structured Variational $D$-Decomposition for Accurate and Stable Low-Rank Approximation Strukturierte variationelle $D$-Dekomposition für genaue und stabile Low-Rank-Annäherung 用于精确且稳定的低秩近似的结构化变分 $D$-分解 2506.08535v1
  • 790 06-10 Regularized Langevin Dynamics for Combinatorial Optimization Regularisierte Langevin-Dynamik für kombinatorische Optimierung 固定的用于组合优化的兰埃文动态 2502.00277v2
  • 791 06-10 Federated Learning: From Theory to Practice Föderiertes Lernen: Von der Theorie zur Praxis 联邦学习:从理论到实践 2505.19183v2
  • 792 06-10 Robust Evolutionary Multi-Objective Network Architecture Search for Reinforcement Learning (EMNAS-RL) Robuste evolutionäre multiobjektive Netzwerkarchitektur Suche nach Verstärkungslernen (EMNAS-RL) 强劲进化多目标网络强化学习建筑搜索(EMNAS-RL) 2506.08533v1
  • 793 06-10 Efficient Robust Conformal Prediction via Lipschitz-Bounded Networks Effiziente robuste konforme Vorhersage über Lipschitz-gebundene Netzwerke 通过Lipschitz-Bounds网络进行非正式预测 2506.05434v2
  • 794 06-10 Leveraging chaos in the training of artificial neural networks Das Chaos in der Ausbildung künstlicher neuronaler Netze abbauen 在培训人工神经网络时利用混乱 2506.08523v1
  • 795 06-10 FEDTAIL: Federated Long-Tailed Domain Generalization with Sharpness-Guided Gradient Matching FEDTAIL: Federated Long-Tailed Domain Generalization mit Sharpness-Guided Gradient Matching FEDTAIL: 采用尖锐引导梯度匹配法的联邦长期失败域 2506.08518v1
  • 796 06-10 Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering Meta-Adaptive Prompt-Destillation für wenige-heiße visuelle Frage-Antworten 用于少数浅光视觉问题解答的元适应性快速蒸馏 2506.06905v2
  • 797 06-10 NeurIPS 2024 ML4CFD Competition: Results and Retrospective Analysis NeurIPS 2024 ML4CFD Wettbewerb: Ergebnisse und Retrospektivenanalyse NeurIPS 2024 ML4CFD竞争:结果和回溯性分析 2506.08516v1
  • 798 06-10 DiffGradCAM: A Universal Class Activation Map Resistant to Adversarial Training DiffGradCAM: Eine universelle Aktivierungskarte der Klasse, die dem adversarialen Training standhält DiffGradCAM: 通用级启动地图抗反向培训 2506.08514v1
  • 799 06-10 Human-Aligned Image Models Improve Visual Decoding from the Brain Menschlich ausgerichtete Imagemodelle verbessern die visuelle Dekodierung aus dem Gehirn 人与人之间的图像模型改进大脑的视觉解码 2502.03081v3
  • 800 06-10 Explaining, Fast and Slow: Abstraction and Refinement of Provable Explanations Erklären, Schnell und Langsam: Abstraktion und Verfeinerung wahrnehmbarer Erklärungen 解释、快速和缓慢:可允许解释的抽象和完善 2506.08505v1
  • 801 06-10 CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations CoMuMDR: Code-gemischter multimodaler Multidomain-Korpus für Diskurs paRsing in Gesprächen CoMuMDR:用于对话语篇解析的语码混合多模态多领域语料库 2506.08504v1
  • 802 06-10 Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning Memory, Benchmark & Robots: Ein Benchmark für die Lösung komplexer Aufgaben mit Verstärkungslernen 记忆、基准和机器人:用强化学习解决复杂任务的基准 2502.10550v2
  • 803 06-10 DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs In Konflikte geraten: In suchgesteigerten LLMs widersprüchliche Quellen erkennen und bekämpfen 钻入冲突:发现和解决搜索中的冲突源 2506.08500v1
  • 804 06-10 Chip Placement with Diffusion Models Chipplatzierung mit Diffusionsmodellen 与传播模型一起放置芯片 2407.12282v3
  • 805 06-10 The Face of Populism: Examining Differences in Facial Emotional Expressions of Political Leaders Using Machine Learning Das Gesicht des Populismus: Prüfung von Unterschieden in Gesichtsgefühlen politischer Führer mit Hilfe von maschinellem Lernen 民粹主义的面孔:利用机器学习研究政治领袖的情感表达方式的差别 2304.09914v5
  • 806 06-10 Thermodynamically Consistent Latent Dynamics Identification for Parametric Systems Thermodynamisch konsistente Latent Dynamics-Identifikation für parametrische Systeme 参数系统热动力一致的冷流动态识别 2506.08475v1
  • 807 06-10 Penalty Learning for Optimal Partitioning using Multilayer Perceptron Penalty Learning für optimale Partitionierung mit Multilayer Perceptron 使用多层 Perceptron 进行最佳分区的惩罚学习 2408.00856v4
  • 808 06-10 A Survey of the Self Supervised Learning Mechanisms for Vision Transformers Eine Übersicht über die selbstüberwachten Lernmechanismen für Visionstransformatoren 展望变革者自我监督学习机制调查 2408.17059v5
  • 809 06-10 How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation Wie richten und ergänzen Bilder LiDAR? Auf dem Weg zu einer harmonisierten multimodalen 3D-Panoptischen Segmentierung 图像如何对齐和补充 LiDAR ? 2505.18956v2
  • 810 06-10 AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models AVA-Bench: Atomic Visual Ability Benchmark für Vision Foundation Modelle AVA-Bench:远景基础模型的原子视觉能力基准 2506.09082v1
  • 811 06-10 MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature MAC: Eine effiziente Gradienten-Vorkonditionierung mit der mittleren Aktivierung Ungefähre Krümmung MAC: 使用平均振动作用的高效渐进式预设设备 2506.08464v1
  • 812 06-10 How to Provably Improve Return Conditioned Supervised Learning? Wie man wahrscheinlich verbessern Rückkehr bedingtes Überwachtes Lernen? 如何改善受监督的学习? 2506.08463v1
  • 813 06-10 MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning MOBODY: Modellbasierte Off-Dynamik Offline-Verstärkung Lernen MOBODY: 示范非药物非药物脱线强化学习 2506.08460v1
  • 814 06-10 Textual Unlearning Gives a False Sense of Unlearning Textuelles Nichtlernen gibt einen falschen Sinn für das Nichtlernen 文字不学习给人一个错误的不学习感 2406.13348v3
  • 815 06-10 The interplay of robustness and generalization in quantum machine learning Das Zusammenspiel von Robustheit und Verallgemeinerung im quantenmechanischen Lernen 量子机器学习中强力和一般化的相互作用 2506.08455v1
  • 816 06-10 An Explainable Vision Transformer with Transfer Learning Combined with Support Vector Machine Based Efficient Drought Stress Identification Ein erklärbarer Vision Transformer mit Transfer Learning kombiniert mit Unterstützung Vector Maschine basierte effiziente Drought Stress-Identifikation 一个可解释的愿景变异器,与转让学习相结合,与基于支持矢量的高效干旱应激反应机的识别 2407.21666v2
  • 817 06-10 Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data Stabilisierendes Kontrastives RL: Techniken für das Roboterziel, das von Offline-Daten aus erreicht wird 稳定违规RL:利用离线数据实现机器人目标的技术 2306.03346v3
  • 818 06-10 Systematic and Efficient Construction of Quadratic Unconstrained Binary Optimization Forms for High-order and Dense Interactions Systematische und effiziente Konstruktion von Quadratic Unconstrained Binary Optimization Forms für High-Order und Dense Interaktionen 系统、高效地为高阶和高调互动和高调互动系统构建 “ 中度、无限制的 “ 优化二进制表格 2506.08448v1
  • 819 06-10 Understanding Bias Reinforcement in LLM Agents Debate Verständnis der Bias-Verstärkung in LLM-Agenten-Debatte 理解LLM智能体辩论中的偏见强化 2503.16814v3
  • 820 06-10 A Framework for Controllable Multi-objective Learning with Annealed Stein Variational Hypernetworks Ein Rahmen für kontrollierbares multiobjektives Lernen mit Annealed Stein Variational Hypernetworks 与Annaaled Stein变异超网络的可控多目标学习框架 2506.06715v2
  • 821 06-10 Time-Aware World Model for Adaptive Prediction and Control Time-Aware-Weltmodell für adaptive Vorhersage und Kontrolle 适应性预测和控制世界模型 2506.08441v1
  • 822 06-10 Refiner: Data Refining against Gradient Leakage Attacks in Federated Learning Raffinerie: Datenveredelung gegen gradiente Leckage-Angriffe im Federated Learning 完善:完善数据,防止联邦学习中逐渐泄漏事件 2212.02042v3
  • 823 06-10 Scaling Laws in Linear Regression: Compute, Parameters, and Data Skalierungsgesetze in linearer Regression: Berechnung, Parameter und Daten 线性回归法的扩展法:计算、参数和数据 2406.08466v3
  • 824 06-10 Learning to Lead: Incentivizing Strategic Agents in the Dark Führen lernen: Anreize für strategische Agenten im Dunkeln 学习铅:在黑暗中激励战略剂 2506.08438v1
  • 825 06-10 Olica: Efficient Structured Pruning of Large Language Models without Retraining Olica: Effizientes strukturiertes Beschneiden großer Sprachmodelle ohne Umschulung Olica:无再培训的大型语言模型高效结构稳健 2506.08436v1
  • 826 06-10 Boosting Gradient Leakage Attacks: Data Reconstruction in Realistic FL Settings Förderung gradienter Leckagenangriffe: Datenwiederaufbau in realistischen FL-Einstellungen 推动渐变渗漏袭击:在现实的FL设置中进行数据重建 2506.08435v1
  • 827 06-10 FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making FinHEAR: Human Expertise und adaptive Risiko-Bewusst Temporale Begründung für Finanzentscheidungen FinHEAR: 金融决策的人类专门知识和适应性风险—-风险警报时间原因 2506.09080v1
  • 828 06-10 Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings Verstärkte Konvergenzraten für die nichtkonvexe Optimierung durch Reduktionsmappings 通过减少绘图实现非电流优化的夏端趋同率 2506.08428v1
  • 829 06-10 HASFL: Heterogeneity-aware Split Federated Learning over Edge Computing Systems HASFL: Heterogenität bewusst Split Federated Learning over Edge Computing Systems HASFL: 超越边缘电子计算系统的异异质-认知异质分化联邦学习系统 2506.08426v1
  • 830 06-10 Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy Mic-hackathon 2024: Hackathon zum maschinellen Lernen für Elektronen- und Rastersondenmikroskopie Mic-hackathon 2024:关于电子和扫描探测器机器学习的Hackathon 2506.08423v1
  • 831 06-10 Dense ReLU Neural Networks for Temporal-spatial Model Dense ReLU Neuronale Netzwerke für zeitlich-räumliches Modell 时空空间模型神经网络 2411.09961v8
  • 832 06-10 DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization DisCO: Große Reasoning-Modelle mit diskriminativer eingeschränkter Optimierung stärken DisCO:利用判别式约束优化强化大型推理模型 2505.12366v2
  • 833 06-10 Online Learning-guided Learning Rate Adaptation via Gradient Alignment Online Learning-guided Learning Rate Anpassung über Gradient Alignment 通过 “ 渐进协调 “ 调整,调整在线学习指导学习率 2506.08419v1
  • 834 06-10 Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood Offline RL mit glatter OOD-Verallgemeinerung in der konvexen Hülle und ihrer Nachbarschaft 在凸包及其邻域内具有平滑OOD泛化的离线RL 2506.08417v1
  • 835 06-10 DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data DISCO balanciert die Skalen: Adaptives Domain- und Schwierigkeitsgrad-Bewusst-Verstärkung-Lernen auf unausgewogenen Daten DISCO 平衡规模:关于数据不平衡的适应性域域和困难性软件强化学习 2505.15074v2
  • 836 06-10 Improved Scaling Laws in Linear Regression via Data Reuse Verbesserte Skalierungsgesetze bei linearer Regression über Datenwiederverwertung 通过数据再利用改进线性回归通过数据再利用的扩展法 2506.08415v1
  • 837 06-10 Learning to Hear Broken Motors: Signature-Guided Data Augmentation for Induction-Motor Diagnostics Lernen, gebrochene Motoren zu hören: Signatur-geführte Datenvergrößerung für Induktions-Motor-Diagnostik 学习听听破碎汽车:诱导-摩托汽车诊断的签名指导数据增强 2506.08412v1
  • 838 06-10 Mutual-Taught for Co-adapting Policy and Reward Models Gegenseitiges Lernen für Ko-Anpassungspolitik und Prämienmodelle 共同适应政策和奖励模式相互培训 2506.06292v2
  • 839 06-10 FUSE: Measure-Theoretic Compact Fuzzy Set Representation for Taxonomy Expansion FUSE: Maß-theoretische Kompakt-Fuzzy-Set-Darstellung für Taxonomie-Expansion FUSE: 测量-理论结合模糊集束代表扩大分类学 2506.08409v1
  • 840 06-10 Towards Generalized Source Tracing for Codec-Based Deepfake Speech Auf dem Weg zu einer generalisierten Quellenverfolgung für codec-basierte Deepfake-Rede 努力实现以基于编码的 “ 深层假言 “ 为依据的 “ 普遍源源追踪 “ 目标 2506.07294v2
  • 841 06-10 RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation RLHS: Fehlausrichtung in RLHF mit Hindsight Simulation abmildern RLHS: 减轻RLHF的差错,采用速光模拟模拟 2501.08617v3
  • 842 06-10 Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry Sparse Training von Zufallsinitialisierung: Ausrichten von Lotteriekartenmasken mit Gewichtssymmetrie 随机初始化的分散训练: 使用重量对称对调彩票票面罩 2505.05143v2
  • 843 06-10 mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks mSTEB: Massive mehrsprachige Bewertung von LLMs zu Sprach- und Textaufgaben mSTEB:LLM在语音和文本任务上的大规模多语言评估 2506.08400v1
  • 844 06-10 Spatiotemporal deep learning models for detection of rapid intensification in cyclones Geotemporale Deep-Learning-Modelle zur Erkennung einer schnellen Verstärkung bei Zyklonen 探测气旋风暴迅速加剧的斯波时深深学习模式 2506.08397v1
  • 845 06-10 From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models Von Pixeln zu Prädikaten: Learning Symbolic World Models via Pretrained Vision-Language Models 从像素到预言:通过预先培训的视觉语言模型学习象征性世界模式 2501.00296v3
  • 846 06-10 Learning-based density-equalizing map Learning-based Dichte-equalizing Karte 基于学习的密度平等图 2506.10027v1
  • 847 06-10 Reinforcement Learning Teachers of Test Time Scaling Verstärktes Lernen von Lehrern der Testzeitskalierung 测试时间尺度强化学习教师 2506.08388v1
  • 848 06-10 Network Threat Detection: Addressing Class Imbalanced Data with Deep Forest Netzwerk Threat Detection: Adressierung von unausgeglichenen Daten der Klasse mit tiefem Wald 网络威胁探测:处理深森林的分类平衡数据 2506.08383v1
  • 849 06-10 Reinforce LLM Reasoning through Multi-Agent Reflection Verstärkung der LLM-Vernunft durch Multi-Agent-Reflexion 通过多方机构反省说明理由 2506.08379v1
  • 850 06-10 A Probabilistic Framework for Imputing Genetic Distances in Spatiotemporal Pathogen Models Ein probabilistischer Rahmen für die Imputierung genetischer Entfernungen in Spatiotemporal Pathogen-Modellen 关于将遗传距离算入时地病原体模型的概率框架 2506.09076v1
  • 851 06-10 AlphaFold Database Debiasing for Robust Inverse Folding AlphaFold Datenbank-Debiasing für robuste Inverse Faltung AlphaFold 数据库对硬性反向折叠的偏移 2506.08365v1
  • 852 06-10 Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(ε^{-4/7})$ Second-Order Oracle Complexity Lösen von Convex-Concave-Problemen mit $\tilde{\mathcal{O}}(ε^{-4/7})$ Oracle-Komplexität zweiter Ordnung 以 $\tilde{\mathcal{O}}(ε^{-4/7})$ 的二阶预言机复杂度求解凸-凹问题 2506.08362v1
  • 853 06-10 NysAct: A Scalable Preconditioned Gradient Descent using Nystrom Approximation NysAct: Ein skalierbarer vorkonditionierter Gradient Abstieg mit Nystrom-Annäherung NysAct: 使用Nystrom 接近法的可缩放的预设条件渐变源 2506.08360v1
  • 854 06-10 An Adaptive Method Stabilizing Activations for Enhanced Generalization Eine adaptive Methode zur Stabilisierung von Aktivierungen für verstärkte Generalisierung 增强普遍化的适应方法稳定活动 2506.08353v1
  • 855 06-10 Differentially Private Relational Learning with Entity-level Privacy Guarantees Unterschiedlich privates relationales Lernen mit Datenschutzgarantien auf Körperschaftsebene 与实体一级的隐私保障有区别的私人关系学习 2506.08347v1
  • 856 06-10 Re4MPC: Reactive Nonlinear MPC for Multi-model Motion Planning via Deep Reinforcement Learning Re4MPC: Reaktives nichtlineares MPC für Multi-Model-Bewegungsplanung über Deep Reinforcement Learning Re4MPC:通过深强化学习进行多模式运动规划的反活动非线性非线性MPC 2506.08344v1
  • 857 06-10 Dynamical System Optimization Dynamische Systemoptimierung 动态系统优化 2506.08340v1
  • 858 06-10 Equivalence of stochastic and deterministic policy gradients Gleichwertigkeit stochastischer und deterministischer politischer Gradienten 政策梯度和确定性政策梯度等同 2505.23244v2
  • 859 06-10 midr: Learning from Black-Box Models by Maximum Interpretation Decomposition midr: Von Black-Box-Modellen lernen durch maximale Interpretationszersetzung 中间体: 通过最大解析分解从黑ox 模型中学习 2506.08338v1
  • 860 06-10 A Simple Analysis of Discretization Error in Diffusion Models Eine einfache Analyse des Diskretisierungsfehlers in Diffusionsmodellen 简单分析扩散模型中的分解错误 2506.08337v1
  • 861 06-10 Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer Zero-shot Meta-Learning für Tabular Prediction Aufgaben mit adversarially Pre-trained Transformer 采用未受过训练的变换器进行表列预测任务 2502.04573v2
  • 862 06-10 Generation from Noisy Examples Generierung aus lauteren Beispielen 从吵闹中生成 2501.04179v2
  • 863 06-10 Graph Prompting for Graph Learning Models: Recent Advances and Future Directions Graph Prompting für Graph Learning Models: Aktuelle Fortschritte und zukünftige Richtungen 图示学习模型提示图示:最近的进展和未来方向 2506.08326v1
  • 864 06-10 Model-Free Kernel Conformal Depth Measures Algorithm for Uncertainty Quantification in Regression Models in Separable Hilbert Spaces Modellfreier Kernel Conformal Depth misst Algorithmen zur Unsicherheitsquantifizierung in Regressionsmodellen in trennbaren Hilbert-Räumen 在塞可分离的希尔伯特空间的回归模型中,不确定性量化不确定性的模型-无模型核心常规 常规深度测量标准 2506.08325v1
  • 865 06-10 RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors und Natural Language 关于音频、视频、嵌入传感器和自然语言的问答答案 2505.17114v2
  • 866 06-10 TPP-LLM: Modeling Temporal Point Processes by Efficiently Fine-Tuning Large Language Models TPP-LLM: Modellierung von Temporal Point-Prozessen durch effizientes Feintuning großer Sprachmodelle TPP-LLM:通过高效微调大语言模式模拟时点进程 2410.02062v2
  • 867 06-10 Curse of High Dimensionality Issue in Transformer for Long-context Modeling Fluch der Hochdimensionalitätsfrage im Transformer für die Langkontextmodellierung 变异器中高多维度问题的诅咒,用于长期建模 2505.22107v3
  • 868 06-10 Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion Warum Diffusion arbeitet maskieren: Zustand auf dem Sprungplan für verbesserte diskrete Diffusion 为何遮掩传播工作有效:改进传播的跳跳时间表上的条件 2506.08316v1
  • 869 06-10 Private Evolution Converges Private Evolutionskonvergenzen 私人进化集市 2506.08312v1
  • 870 06-10 Continuous Temporal Learning of Probability Distributions via Neural ODEs with Applications in Continuous Glucose Monitoring Data Kontinuierliches Temporales Lernen der Wahrscheinlichkeitsverteilungen über neurale ODEs mit Anwendungen in kontinuierlichen Glukoseüberwachungsdaten 不断从时间上学习通过有持续甘蔗监测数据应用的神经代码对概率分布进行概率分布的连续时间学习 2505.08698v2
  • 871 06-10 GRAM: Generalization in Deep RL with a Robust Adaptation Module GRAM: Verallgemeinerung in Deep RL mit einem robusten Anpassungsmodul GRAM: 带有强力适应模块的深 RL 常规化 2412.04323v2
  • 872 06-10 H$^2$GFM: Towards unifying Homogeneity and Heterogeneity on Text-Attributed Graphs H$^2$GFM: Auf dem Weg zur Vereinheitlichung von Homogenität und Heterogenität auf textattribuierten Graphen H$^2$GFM:迈向统一文本属性图上的同质性与异质性 2506.08298v1
  • 873 06-09 (1) From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information? Von passiv zur aktiven Vernunft: Können große Sprachmodelle unter unvollständigen Informationen die richtigen Fragen stellen? 从被动到积极理由:大语言模式能够在不完整信息下询问正确的问题吗? 2506.08295v1
  • 874 06-09 From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium Von der Debatte zum Equilibrium: Glaube-getriebener Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium 从辩论到平衡:通过巴伊西亚纳什平衡进行 2506.08292v1
  • 875 06-09 Digital Twin Synchronization: Bridging the Sim-RL Agent to a Real-Time Robotic Additive Manufacturing Control Digitale Twin-Synchronisierung: Überbrücken des Sim-RL Agents zu einer Echtzeit-Roboter-Zusatz-Fertigungssteuerung 数字双双同步同步:将Sim-RL代理连接到实时机器人添加剂制造控制 2501.18016v2
  • 876 06-09 Edge Computing based Human-Robot Cognitive Fusion: A Medical Case Study in the Autism Spectrum Disorder Therapy Edge Computing basierte human-Roboter Kognitive Fusion: Eine medizinische Fallstudie in der Autismus-Spektrum-Störungstherapie 以边缘计算机为基础的人类-机器人认知共生:自闭症频谱病理医学案例研究 2401.00776v2
  • 877 06-09 Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers Präzise hochdimensionale Asymptotik für die Quantifizierung heterogener Übertragungen 用于量化异基因转移的精确高多元性高活性 2010.11750v5
  • 878 06-09 Voice Impression Control in Zero-Shot TTS Sprach-Impression-Steuerung in Zero-Shot TTS 零样本TTS中的语音印象控制 2506.05688v2
  • 879 06-09 Seeing Voices: Generating A-Roll Video from Audio with Mirage Stimmen sehen: A-Roll-Video aus Audio mit Mirage generieren 观看声音:从《幻影之音》制作一首歌曲视频 2506.08279v1
  • 880 06-09 Instruction-Tuned Video-Audio Models Elucidate Functional Specialization in the Brain Instruction-Tuned Video-Audio Modelle Elucidate Funktionelle Spezialisierung im Gehirn 脑中精明功能性专业 2506.08277v1
  • 881 06-09 SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device SnapGen-V: Ein Fünf-Sekunden-Video innerhalb von fünf Sekunden auf einem mobilen Gerät erzeugen SnapGen-V:在移动设备上5秒内生成5秒视频 2412.10494v2
  • 882 06-09 LEANN: A Low-Storage Vector Index LEANN: Ein speicherarmer Vektorindex LEANN:低压力矢量指数 2506.08276v1
  • 883 06-09 Exploring Microstructural Dynamics in Cryptocurrency Limit Order Books: Better Inputs Matter More Than Stacking Another Hidden Layer Erforschung mikrostruktureller Dynamiken in Kryptowährung Limit Order Books: Bessere Eingaben sind wichtiger als das Stapeln einer anderen versteckten Ebene 探索加密货币限制顺序书中的微观结构动态:更好的投入比另层隐藏层更重要 2506.05764v2
  • 884 06-09 DualDynamics: Synergizing Implicit and Explicit Methods for Robust Irregular Time Series Analysis DualDynamics: Synergetische Implizite und explizite Methoden zur robusten irregulären Zeitreihenanalyse 双动态: 优化非常规时间序列分析的隐含和显性方法 2401.04979v6
  • 885 06-09 Navigating the Latent Space Dynamics of Neural Models Navigation der latenten Raumdynamik von Neuralmodellen 导航内壳模型的冷层空间动态 2505.22785v2
  • 886 06-09 Comprehensive Review of Neural Differential Equations for Time Series Analysis Umfassende Überprüfung neuraler Differentialgleichungen für die Zeitreihenanalyse 全面审查用于时间序列分析的神经差异等同 2502.09885v2
  • 887 06-09 Universal Differential Equations for Scientific Machine Learning of Node-Wise Battery Dynamics in Smart Grids Universelle Differentialgleichungen für das wissenschaftliche maschinelle Lernen von Node-Wise-Batteriedynamiken in Smart Grids 智能网格节点电池动态科学机器学习通用差异等量 2506.08272v1
  • 888 06-09 Why Gradients Rapidly Increase Near the End of Training Warum Gradienten am Ende der Ausbildung rasch zunehmen 为何在培训接近结束时梯度迅速增加 2506.02285v2
  • 889 06-09 Sparse Interpretable Deep Learning with LIES Networks for Symbolic Regression Sparse Interpretable Deep Learning mit LIES-Netzwerken für symbolische Regression 与 LIES 网络共享的粗化可解释深空学习,用于符号回归 2506.08267v1
  • 890 06-09 Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints Verstärktes Lernen aus menschlichem Feedback mit hochkonfidenziellen Sicherheitseinschränkungen 在高度信任安全限制下,从人类反馈中加强学习 2506.08266v1
  • 891 06-09 Learning-Based Multiuser Scheduling in MIMO-OFDM Systems with Hybrid Beamforming Lernbasiertes Multiuser-Scheduling in MIMO-OFDM-Systemen mit Hybrid-Beamforming 采用混合波束成形的MIMO-OFDM系统中基于学习的多用户调度 2506.08263v1
  • 892 06-09 SHIELD: Secure Hypernetworks for Incremental Expansion Learning Defense SHIELD: Sichere Hypernetzwerke für inkrementale Expansions-Learning Defense SHIELD: 安全超网络,以逐步扩大学习防御 2506.08255v1
  • 893 06-09 Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction Selbstüberwachte Transformer als iterative Lösungsverbesserungen für Einschränkungen Zufriedenheit 自我主宰的变形器作为抑制满意度的循环解决方案改进器 2502.15794v2
  • 894 06-09 TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation TRAVEL: Training-freies Retrieval und Ausrichtung für Vision-und-Sprachen-Navigation TRAVEL:无培训检索和协调愿景和语言导航 2502.07306v2
  • 895 06-09 Approximation-Aware Bayesian Optimization Angemessene Bayesian-Optimierung 近似- Aware Bayesian 优化优化 2406.04308v2
  • 896 06-09 General Loss Functions Lead to (Approximate) Interpolation in High Dimensions Allgemeine Verlustfunktionen führen zu (nahezu) Interpolation in hohen Dimensionen 导致(近近)高层面内插(近) 2303.07475v2
  • 897 06-09 Parameter-free approximate equivariance for tasks with finite group symmetry Parameterfreie ungefähre Äquivarianz für Aufgaben mit endlicher Gruppensymmetrie 有限组对称任务无参数 2506.08244v1
  • 898 06-09 Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic Temporalisierendes Vertrauen: Bewertung von Ketten-of-Thought-Reasoning mit Signal Temporal Logic 时间化信任度:对根据信号时空逻辑进行的研究链原因的评价 2506.08243v1
  • 899 06-09 Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations Umgang mit den bösen Zwillingen: Zufällige Augmentation verbessern, indem Katastrophenvergessen von verschiedenen Augmentationen adressiert wird 对付邪恶双胞胎:通过解决灾难性地忘记多种增加现象,改善随机增加现象 2506.08240v1
  • 900 06-09 MF-LAL: Drug Compound Generation Using Multi-Fidelity Latent Space Active Learning MF-LAL: Arzneimittel-Compound-Generierung mit Multi-Fidelity Latent Space Active Learning MF-LAL:利用多功能低空空间主动学习生成药物化合物 2410.11226v3
  • 901 06-09 Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework Gewährleistung der Zuverlässigkeit von kuratierten EHR-Daten: Die Validierung der Genauigkeit für LLM/ML-extrahierte Informationen und Daten (VALID) Framework 确保封闭式电子HR-驱动数据可靠性:LLM/ML-提取信息和数据(VALID)框架的准确性验证 2506.08231v1
  • 902 06-09 Scaling Laws of Motion Forecasting and Planning – A Technical Report Skalierungsgesetze der Bewegungsvorhersage und Planung – Ein technischer Bericht 扩大运动预测和规划法 – – 技术报告 2506.08228v1
  • 903 06-09 Mondrian: Transformer Operators via Domain Decomposition Mondrian: Transformer Operators via Domain Decomposition Mondrian:通过域分解的变形操作器 2506.08226v1
  • 904 06-09 Harnessing the Continuous Structure: Utilizing the First-order Approach in Online Contract Design Kontinuierliche Struktur: Nutzung des First-Order-Ansatzes im Online-Contract Design 利用连续结构:在网上合同设计中使用一等办法 2403.07143v3
  • 905 06-09 What makes an Ensemble (Un) Interpretable? Was macht ein Ensemble (Un) Interpretierbar? 集合物怎么可以解释呢? 2506.08216v1
  • 906 06-09 Scalable Meta-Learning via Mixed-Mode Differentiation Skalierbares Meta-Learning über Mixed-Mode Differenzierung 通过混合模式差异的可缩放元学习 2505.00793v2
  • 907 06-09 Distinguishing Cause from Effect with Causal Velocity Models Bezeichnende Ursache von Wirkung mit kausalen Velocity-Modellen 区别因与因果速率模型的影响的原因 2502.05122v2
  • 908 06-09 A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation Eine umfassende Studie von Decoder-Only LLMs für die Text-to-Image-Generierung 综合研究 “ 文字到图像生成的开发者 – – 唯一 “ 有限LLMs “ 综合研究 2506.08210v1
  • 909 06-09 A Machine Learning Approach to Generate Residual Stress Distributions using Sparse Characterization Data in Friction-Stir Processed Parts Ein maschineller Lernansatz zur Generierung residualer Stressverteilungen mit Sparse Charakterisierungsdaten in Friction-Stir Prozessteilen 利用摩擦-标准处理部件中的扭曲特征数据生成残余应激反应分布的机械学习方法 2506.08205v1
  • 910 06-09 Correlated Noise Mechanisms for Differentially Private Learning Correlated Noise Mechanisms for Differentially Private Learning 差别私人学习相关噪音机制 2506.08201v1
  • 911 06-09 Tight Lower Bounds and Improved Convergence in Performative Prediction Enge niedrigere Grenzen und verbesserte Konvergenz in der Performativen Vorhersage 性能预测的下下角拉紧和更好趋同 2412.03671v2
  • 912 06-09 Interpreting Agent Behaviors in Reinforcement-Learning-Based Cyber-Battle Simulation Platforms Verdolmetschung von Agenten-Behaviors bei verstärkten, Learning-basierten Cyber-Battle-Simulationsplattformen 强化-以学习为基础的网络战模拟平台中解释代理行为行为 2506.08192v1
  • 913 06-09 Adapting Prediction Sets to Distribution Shifts Without Labels Anpassung von Vorhersagen an Verteilungsverschiebungen ohne Etiketten 调整预测集以适应无标签的分布转换 2406.01416v2
  • 914 06-09 Innate-Values-driven Reinforcement Learning based Cognitive Modeling Angeborene Werte-getriebene Verstärkungs-Lernen auf Kognitive Modellierung 基于认知模型的强化学习模型 2411.09160v2
  • 915 06-09 Innate-Values-driven Reinforcement Learning based Cooperative Multi-Agent Cognitive Modeling Angeborene Werte-getriebene Verstärkung Learning basierte kooperative Multi-Agent Kognitive Modellierung 以基于强化的学习为基础的合作多代理共认型建模 2401.05572v2
  • 916 06-09 FedGA-Tree: Federated Decision Tree using Genetic Algorithm FedGA-Baum: Föderierter Entscheidungsbaum mit genetischem Algorithmus FedGA-Tree:使用遗传算法的联邦决策树 2506.08176v1
  • 917 06-09 xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs xGen-MM-Vid (BLIP-3-Video): Sie benötigen nur 32 Zeichen, um ein Video auch in VLMs darzustellen xGen-MM-Vid (BLIP-3-Video):您只需32个调子即可在 VLMs 中代表视频 2410.16267v2
  • 918 06-09 Grouped Discrete Representation for Object-Centric Learning Gruppierte diskrete Darstellung für Objekt-Zentrisches Lernen 用于对象中心学习的分组差异代表 2411.02299v2
  • 919 06-09 Federated Learning on Stochastic Neural Networks Föderiertes Lernen in stochastischen Neuronalen Netzwerken 存储神经神经网络方面的联邦学习 2506.08169v1
  • 920 06-09 SILK: Smooth InterpoLation frameworK for motion in-betweening A Simplified Computational Approach SILK: Smooth InterpoLation frameworK für Motion In-Betweening: Ein vereinfachter rechnerischer Ansatz SILK:用于运动中间帧生成的平滑插值框架:一种简化的计算方法 2506.09075v1
  • 921 06-09 UniVarFL: Uniformity and Variance Regularized Federated Learning for Heterogeneous Data UniVarFL: Uniformität und Varianz Regularisiertes Federated Learning für heterogene Daten UniVarFL: 统一和差异校正的联邦异种数据学习 2506.08167v1
  • 922 06-09 BLUR: A Bi-Level Optimization Approach for LLM Unlearning BLUR: Ein Bi-Level-Optimierungsansatz für LLM-Unlearning BLUR: LLM 重新学习的双级优化办法 2506.08164v1
  • 923 06-09 A Metrics-Oriented Architectural Model to Characterize Complexity on Machine Learning-Enabled Systems Ein metrisch ausgerichtetes architektonisches Modell zur Charakterisierung von Komplexität auf maschinell lernfähigen Systemen 以计量为主的建筑建筑模型,以明确机械学习系统的复杂性 2506.08153v1
  • 924 06-09 Towards a Mechanistic Explanation of Diffusion Model Generalization Auf dem Weg zu einer mechanistischen Erklärung der Verallgemeinerung des Diffusionsmodells 争取对传播模型通用化进行机械解释 2411.19339v3
  • 925 06-09 Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models Mehrsprachige Hate Speech Detection in sozialen Medien mit translationsbasierten Ansätzen mit großen Sprachmodellen 在社会媒体中使用基于翻译的方法和大语言模式的多语言社会媒体多语言仇恨言论探测 2506.08147v1
  • 926 06-09 A Theory of Machine Understanding via the Minimum Description Length Principle Eine Theorie des Maschinenverständnisses über das Prinzip der Mindestbeschreibungslänge 通过最低描述长度原则理解机器理论 2504.00395v3
  • 927 06-09 Fully data-driven inverse hyperelasticity with hyper-network neural ODE fields Vollständig datengetriebene inverse Hyperelastizität mit hyper-network neuronalen ODE-Feldern 由全数据驱动的全数据驱动的超反超弹性,具有超网络神经极极光字段 2506.08146v1
  • 928 06-09 Accelerating Spectral Clustering under Fairness Constraints Beschleunigung des Spektralclustering unter Fairness-Beschränkungen 加速在公平制约下聚居 2506.08143v1
  • 929 06-09 AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists AutoSDT: Scaling Data-Driven Discovery Aufgaben gegenüber offenen Mitwissenschaftlern AutoSDT: 扩大数据驱动发现任务, 以建立开放的共同科学家 2506.08140v1
  • 930 06-09 Nearness of Neighbors Attention for Regression in Supervised Finetuning Nähe der Nachbarn Achtung vor der Regression in überwachter Finetuning 邻近邻居的近邻 在监督的微调中注意后退 2506.08139v1
  • 931 06-09 Learning Efficient Representations of Neutrino Telescope Events Effiziente Darstellungen von Neutrino-Teleskop-Veranstaltungen lernen 中子望远镜活动学习效率高的代表性 2410.13148v2
  • 932 06-09 UnCLe: Benchmarking Unsupervised Continual Learning for Depth Completion UnCLe: Benchmarking unüberwachtes kontinuierliches Lernen für die Tiefenvervollständigung UnCLe:为深度完成确定不受监督的连续学习基准 2410.18074v4
  • 933 06-09 Directed Exploration in Reinforcement Learning from Linear Temporal Logic Direkte Exploration bei der Verstärkung Lernen von linearen Temporal Logic 直接探索从线上时间逻辑中加强学习 2408.09495v2
  • 934 06-09 Constrained Pareto Set Identification with Bandit Feedback Beschränkte Pareto-Set-Identifikation mit Bandit-Feedback 被约束的 Pareto 设置 识别强盗反馈 2506.08127v1
  • 935 06-09 Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning Bingo: Effiziente Begründung von LLMs durch dynamisches und auf Bedeutung basierendes Verstärkungslernen 宾果:通过以动态和重要内容为基础的强化学习,提高LLMs高效率的理据能力 2506.08125v1
  • 936 06-09 Continuous Policy and Value Iteration for Stochastic Control Problems and Its Convergence Kontinuierliche Politik und Wert Iteration für stochastische Kontrollprobleme und deren Konvergenz 斯托卡控制问题及其趋同的持续政策和价值迭代 2506.08121v1
  • 937 06-09 Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting Benchmarking-Modelle der vortrainierten Zeitreihen für die Strompreisprognose 确定电力价格预测培训前时间序列模型的基准 2506.08113v1
  • 938 06-09 Selective Matching Losses – Not All Scores Are Created Equal Selektive Matching Losses – Nicht alle Scores werden gleich erstellt 选择性匹配损失 – – 并非所有分数都等于创建 2506.04446v2
  • 939 06-09 Positional Attention: Expressivity and Learnability of Algorithmic Computation Positionale Aufmerksamkeit: Expressivität und Erlernbarkeit algorithmischer Computation 位置注意: 算法的可表达性和可理解性 2410.01686v3
  • 940 06-09 StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets StableMTL:从部分附加说明合成数据集中重新定位多任务学习的远程传播模型 2506.08013v1
  • 941 06-09 Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion Selbsterzwingen: Überbrückung der Zug-Testlücke in autoregressiver Video-Diffusion 自我强迫:弥合自动递减视频传播中的培训-测试差距 2506.08009v1
  • 942 06-09 Hidden in plain sight: VLMs overlook their visual representations Versteckt in freier Sicht: VLMs übersehen ihre visuellen Darstellungen 隐藏在普通视线之下:VLMs忽略了它们的视觉形象。 2506.08008v1
  • 943 06-09 Reparameterized LLM Training via Orthogonal Equivalence Transformation Reparameterisiertes LLM-Training über Orthogonale Äquivalenztransformation 通过正正对等转化进行修复性磁力LLM培训 2506.08001v1
  • 944 06-09 MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation MADFormer: Mixed Autoregressive und Diffusion Transformer für die kontinuierliche Bildgenerierung MADFormer: 用于连续图像生成的混合自动递减和扩散变异器 2506.07999v1
  • 945 06-09 Generative Modeling of Weights: Generalization or Memorization? Generative Modellierung von Gewichten: Verallgemeinerung oder Erinnerung? 产生重力建模:普遍化还是记忆化? 2506.07998v1
  • 946 06-09 Domain Switching on the Pareto Front: Multi-Objective Deep Kernel Learning in Automated Piezoresponse Force Microscopy Domain-Umschaltung der Pareto-Front: Multi-Objektives Deep Kernel-Lernen in automatisierter Piezoresponse-Kraftmikroskopie Pareto Front 域域切换:在自动 Piezo响应力量中进行多目标深核心学习的显微镜 2506.08073v1
  • 947 06-09 PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation PhantomWiki: On-Demand-Datensätze für die Bewertung von Vernunft und Retrieval PhantomWiki: 用于理性和检索评价的现场数据集 2502.20377v2
  • 948 06-09 When Two LLMs Debate, Both Think They’ll Win Wenn zwei LLMs diskutieren, denken beide, dass sie gewinnen werden 当两个LLM 辩论, 双方都认为他们会赢 2505.19184v3
  • 949 06-09 Rethinking Crowd-Sourced Evaluation of Neuron Explanations Rethinking Crowd-Sourced Evaluation von Neuron Erklärungen 重新思考人群-源对内向解释的重新思考 2506.07985v1
  • 950 06-09 CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray CXR-LT 2024: Eine MICCAI-Herausforderung bei Langzeit-, Multilabel- und Zero-Shot-Erkrankungsklassifizierung aus der Bruströntgenuntersuchung CXR-LT 2024:MICCAI对胸前X光中长尾、多标签和零发疾病分类的挑战 2506.07984v1
  • 951 06-09 Real-time Localization of a Soccer Ball from a Single Camera Echtzeit Lokalisierung eines Fußballballs aus einer einzigen Kamera 单摄像头足球球的实时本地化 2506.07981v1
  • 952 06-09 Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation Bipartite Ranking von mehreren Labels: Auf Verlust Versus Label Aggregation 多标签的双边排名:关于损失的Versus Label聚合 2504.11284v2
  • 953 06-09 Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator Realistischer städtischer Verkehrsgenerator mit dezentralisiertem Federated Learning für den SUMO-Simulator 使用分散化的联邦学习来模拟SUMO的模拟器进行现实化城市交通发电机 2506.07980v1
  • 954 06-09 Automated Capability Discovery via Foundation Model Self-Exploration Automatisierte Capability Discovery über die Selbsterkundung von Fundamentmodellen 通过基金会自我探索示范模式自动发现能力 2502.07577v3
  • 955 06-09 Hyperpruning: Efficient Search through Pruned Variants of Recurrent Neural Networks Leveraging Lyapunov Spectrum Hyperpruning: Effiziente Suche durch beschnittene Varianten von recurrenten Neuronalen Netzwerken, die Lyapunov Spectrum nutzen 超光速运行:通过经常神经网络的审慎变式有效搜索,利用Lyapunov光谱网 2506.07975v1
  • 956 06-09 Intelligent Offloading in Vehicular Edge Computing: A Comprehensive Review of Deep Reinforcement Learning Approaches and Architectures Intelligentes Offloading im Fahrzeug Edge Computing: Eine umfassende Überprüfung von Deep Reinforcement-Lernansätzen und Architekturen 在车辆边缘计算机中卸载:对深强化学习方法和架构的全面审查 2502.06963v2
  • 957 06-09 HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization HeuriGym: Agentischer Benchmark für LLM-gekrempelte Heuristik in der kombinatorischen Optimierung HeuriGym: 组合优化中LLM-减压希力学的代理基准 2506.07972v1
  • 958 06-09 MIB: A Mechanistic Interpretability Benchmark MIB: Ein Mechanistischer Interpretations-Benchmark MIB: 机械解释基准 2504.13151v2
  • 959 06-09 A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling Ein zweiphasiges Deep-Learning-Framework für adaptive Zeitschritte in der High-Speed-Flow-Modellierung 高频流建模适应性时间排行的两阶段深学习框架 2506.07969v1
  • 960 06-09 Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs 物理知情神经溶解器中发现聚合物的神经内核内核分析:PIKANs诉PINNs 2506.07958v1
  • 961 06-09 Distillation Robustifies Unlearning Destillation stärkt das Unlearning 蒸馏使遗忘学习更稳健 2506.06278v2
  • 962 06-09 Language Models over Canonical Byte-Pair Encodings Sprachmodelle über Canonical Byte-Pair Encodings Canonical 字节对称编码语言模型 2506.07956v1
  • 963 06-09 Discrete and Continuous Difference of Submodular Minimization Diskreter und kontinuierlicher Unterschied der Submodularen Minimierung 子模块最小化的分质和连续差异 2506.07952v1
  • 964 06-09 Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization Easy2Hard-Bench: Standardisierte Schwierigkeits-Labels für Profiling LLM Performance und Generalisierung Easy2Hard-Bench:用于分析LLM性能和通用的标准化困难标签 2409.18433v2
  • 965 06-09 Cost-Optimal Active AI Model Evaluation Kostenoptimale, aktive KI-Modellbewertung AI模型评价 2506.07949v1
  • 966 06-09 TokenBreak: Bypassing Text Classification Models Through Token Manipulation TokenBreak: Textklassifikationsmodelle durch Token-Manipulation umgehen Token Break: 通过调制调控绕过文本分类模型 2506.07948v1
  • 967 06-09 Info-Coevolution: An Efficient Framework for Data Model Coevolution Info-Coevolution: Ein effizienter Rahmen für die Datenmodellkoevolution 信息革命:数据模型革命的有效框架 2506.08070v1
  • 968 06-09 Gradients: When Markets Meet Fine-tuning – A Distributed Approach to Model Optimisation Gradienten: Wenn Märkte Feinsteuerung treffen – Ein verteilter Ansatz zur Modelloptimierung 渐进:市场达到微调时 – – 模型优化的分配办法 2506.07940v1
  • 969 06-09 Reliably detecting model failures in deployment without labels Zuverlässige Erkennung von Modellfehlern im Einsatz ohne Etiketten 可靠地发现无标签部署中的模型失败 2506.05047v2
  • 970 06-09 Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models Mimicking oder Reasoning: Multi-Modal In-Context Learning in Vision-Language-Modellen neu denken 模拟或解释:在视觉语言模型中重新思考多模式内学习 2506.07936v1
  • 971 06-09 Ensemble-Based Survival Models with the Self-Attended Beran Estimator Predictions Ensemble-basierte Überlebensmodelle mit den selbstbestimmten Beran-Estimator-Vorhersagen 与自行预测的贝兰估算器合成生存模型 2506.07933v1
  • 972 06-09 Predicting Bad Goods Risk Scores with ARIMA Time Series: A Novel Risk Assessment Approach Risiko-Scores mit ARIMA Time Series voraussagen: Ein neuartiger Ansatz zur Risikobewertung ARIMA 时间系列:新风险评估方法 2502.16520v3
  • 973 06-09 Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor Squeeze3D: Ihr Modell der 3D-Generation ist geheim ein extremer Neuralkompressor Squeeze3D: 您的 3D 代的模型是一个秘密的 极端神经压缩器 2506.07932v1
  • 974 06-09 Missing Data Imputation by Reducing Mutual Information with Rectified Flows Fehlende Daten-Imputation durch Verringerung gegenseitiger Informationen mit rektifizierten Strömen 通过减少相互信息与纠正流流 2505.11749v2
  • 975 06-09 A Generative Physics-Informed Reinforcement Learning-Based Approach for Construction of Representative Drive Cycle Ein generativer physik-informierter, lernbasierter Ansatz zur Konstruktion repräsentativer Antriebszyklen 为建造具有代表性的驱动周期而采用基于学习的创形物理综合强化强化学习方法 2506.07929v1
  • 976 06-09 Solving Inequality Proofs with Large Language Models Ungleichungsbeweise mit großen Sprachmodellen lösen 以大语言模型求解不等式证明 2506.07927v1
  • 977 06-09 A Comparative Study of U-Net Architectures for Change Detection in Satellite Images Eine vergleichende Studie über U-Net-Architekturen zur Erkennung von Veränderungen in Satellitenbildern U-Net图象变化探测网络结构比较研究 2506.07925v1
  • 978 06-09 W4S4: WaLRUS Meets S4 for Long-Range Sequence Modeling W4S4: WaLRUS trifft S4 für lange Sequenzmodellierung W4S4:WALRUS在长程序列建模中与S4相匹配 2506.07920v1
  • 979 06-09 Uncovering the Functional Roles of Nonlinearity in Memory Enthüllen der funktionalen Rollen der Nichtlinearität im Gedächtnis 发挥记忆中非线性的作用 2506.07919v1
  • 980 06-09 CausalPFN: Amortized Causal Effect Estimation via In-Context Learning CausalPFN: Amortisierte Abschätzung des Kausaleffekts durch In-Context Learning CausalPFN: 通过内文学习对因果效应的摊销估计 2506.07918v1
  • 981 06-09 Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces Diffuse Alles: Multimodale Diffusionsmodelle auf willkürlichen Staatsräumen Diffuse Everything:关于任意国家空间的多模式扩散模型 2506.07903v1
  • 982 06-09 FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling FunDiff: Diffusionsmodelle über Funktionsräume für physikinformierte Generative Modellierung FunDiff:物理成形生成模型的功能空间上的传播模型 2506.07902v1
  • 983 06-09 MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs MEMOIR: Lebenslange Modellbearbeitung mit minimalem Überschreiben und informierter Aufbewahrung für LLMs MEMOIR: 长寿命模型编辑,以最小覆盖和知情保留LLMs 2506.07899v1
  • 984 06-09 GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution GaussianVAE: Adaptive Lerndynamik von 3D Gaussianern für High-Fidelity Super-Resolution GaussianVAE:面向高保真超分辨率的3D高斯自适应学习动力学 2506.07897v1
  • 985 06-09 Learning to Stop: Deep Learning for Mean Field Optimal Stopping Stoppen lernen: Deep Learning für das mittlere Feld Optimales Stoppen 学习到停止:为平均字段进行深度学习 2410.08850v2
  • 986 06-09 SMRS: advocating a unified reporting standard for surrogate models in the artificial intelligence era SMRS: Für einen einheitlichen Berichterstattungsstandard für Surrogatmodelle im Zeitalter der künstlichen Intelligenz SMRS:倡导人工智能时代代用模型的统一报告标准 2502.06753v2
  • 987 06-09 SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark SoK: Datenrekonstruktion greift gegen Modelle des maschinellen Lernens an: Definition, Metrics und Benchmark SoK:对机器学习模式进行数据重建攻击:定义、计量和基准 2506.07888v1
  • 988 06-09 SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection SimLTD: Einfache, überwachte und halbüberwachte Langzeitobjekterkennung SimLTD: 简单监视和半监视长途远航物体探测 2412.20047v3
  • 989 06-09 Schauder Bases for $C[0, 1]$ Using ReLU, Softplus and Two Sigmoidal Functions Schauder-Basen für $C[0, 1]$ mit ReLU, Softplus und zwei Sigmoidal-Funktionen 使用 ReLU、Softplus 和两个 S 型函数的 $C[0, 1]$ 的 Schauder 基 2506.07884v1
  • 990 06-09 Diffusion Counterfactual Generation with Semantic Abduction Diffusion Gegenfaktische Generation mit semantischer Abduktion 以语义绑架方式传播反实际生成 2506.07883v1
  • 991 06-09 On Hypothesis Transfer Learning of Functional Linear Models Über Hypothesentransfer Lernen von funktionalen Linearmodellen 关于职能线性模型学习的假设转移 2206.04277v5
  • 992 06-09 GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity GradSkip: Kommunikationsbeschleunigte lokale Gradientenmethoden mit besserer Computational Complexity GradSkip:通讯加速的地方渐进法,具有更好的计算复杂度 2210.16402v3
  • 993 06-09 SIFBench: An Extensive Benchmark for Fatigue Analysis SIFBench: Ein umfassender Benchmark für die Ermüdungsanalyse SIFBench:关于 “ 致命分析 “ 的广泛基准 2506.01173v2
  • 994 06-09 Algorithmic Aspects of Strategic Trading Algorithmische Aspekte des strategischen Handels 战略贸易的算法方面 2502.07606v2
  • 995 06-09 TS-RSR: A provably efficient approach for batch Bayesian Optimization TS-RSR: Ein nachweislich effizienter Ansatz für die Batch Bayesian Optimierung TS-RSR:对Bayesian的批量最佳化采取可衡量有效办法 2403.04764v4
  • 996 06-09 Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models? Können Hessian-Based Insights Fehlerdiagnosen in aufmerksamkeitsbasierten Modellen unterstützen? 以海珊为基地的洞察能支持以关注为基础的模型中的过失诊断吗? 2506.07871v1
  • 997 06-09 Accurate INT8 Training Through Dynamic Block-Level Fallback Genaues INT8-Training durch dynamischen Block-Level-Fallback 通过动态区档级后退进行准确的INT8培训 2503.08040v3
  • 998 06-09 FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity FreeGave: 3D Physik Lernen aus dynamischen Videos von Gaussian Velocity FreeGave: Gaussian 速度从动态视频学习 3D物理 2506.07865v1
  • 999 06-09 Lightweight Sequential Transformers for Blood Glucose Level Prediction in Type-1 Diabetes Leichte Sequentielle Transformer für Blutglukose-Level-Vorhersage bei Typ-1-Diabetes 一型糖尿病血凝胶水平预测的轻量量 轻序式变压器 2506.07864v1
  • 1000 06-09 VIVAT: Virtuous Improving VAE Training through Artifact Mitigation VIVAT: Tugendhafte Verbesserung der VAE-Ausbildung durch Artefakt-Minderung VIVAT:通过减少非手工行为,提高VAE培训的道德水平 2506.07863v1
  • 1001 06-09 PropEnc: A Property Encoder for Graph Neural Networks PropEnc: Ein Eigenschafts-Encoder für Graph-Neural-Netzwerke PropEnc: 图形神经网络的属性编码器 2409.11554v3
  • 1002 06-09 Fairness Overfitting in Machine Learning: An Information-Theoretic Perspective Fairness Overfitting in Machine Learning: Eine informationstheoretische Perspektive 在机器学习中过分适应公平:信息理论视角 2506.07861v1
  • 1003 06-09 LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds LogoSP: Lokale-globale Gruppierung von Superpoints für unbeaufsichtigte semantische Segmentierung von 3D-Punkt-Clouds LogoSP: 3D点云无监督的语义分割超点地方-全球组别 2506.07857v1
  • 1004 06-09 Residual Reweighted Conformal Prediction for Graph Neural Networks Residual Reweighted Conformal Prediction for Graph Neural Networks 图形神经网络的重新加权常规预测 2506.07854v1
  • 1005 06-09 Conditional Local Independence Testing with Application to Dynamic Causal Discovery Conditional Local Independence Testing mit Anwendung auf Dynamische Kausal-Discovery 有条件的局部独立测试,应用到动态因子发现 2506.07844v1
  • 1006 06-09 Jarzynski Reweighting and Sampling Dynamics for Training Energy-Based Models: Theoretical Analysis of Different Transition Kernels Jarzynski Umgewichtungs- und Probenahmedynamiken für das Training energiebasierter Modelle: Theoretische Analyse verschiedener Übergangskerne Jarzynski 培训以能源为基础的模型的加权和抽样动态:对不同过渡核心的理论分析 2506.07843v1
  • 1007 06-09 FedX: Adaptive Model Decomposition and Quantization for IoT Federated Learning FedX: Adaptive Modellzersetzung und Quantisierung für IoT-Federated Learning FedX:面向物联网联邦学习的适应性模型分解和量化 2504.12849v3
  • 1008 06-09 A Cognac Shot To Forget Bad Memories: Corrective Unlearning for Graph Neural Networks Ein Cognac, um schlechte Erinnerungen zu vergessen: Korrektives Lernen für Graphen-Neural-Netzwerke 为忘记坏记忆而拍摄的科格纳克白兰地:为图形神经网络进行矫正性学习 2412.00789v4
  • 1009 06-09 C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition C3T: Grenzüberschreitender Transfer durch Zeit für sensorgestützte menschliche Aktivitätserkennung C3T: 以传感器为基础的人类活动识别跨时间跨模式转让 2407.16803v3
  • 1010 06-09 Efficient and Generalized end-to-end Autonomous Driving System with Latent Deep Reinforcement Learning and Demonstrations Effizientes und generalisiertes End-to-End Autonomes Fahrsystem mit Latent Deep Reinforcement Learning und Demonstrationen 高效和普遍通用的端到端自动驾驶系统,配有深深层深强化学习和示范 2401.11792v7
  • 1011 06-09 Beyond Numeric Rewards: In-Context Dueling Bandits with LLM Agents Über Numerische Belohnungen hinaus: In-Context-Duellbanditen mit LLM-Agenten 除数值奖励之外:与LLM代理商进行文中评分的强盗 2407.01887v4
  • 1012 06-09 Improving large language models with concept-aware fine-tuning Verbesserung großer Sprachmodelle mit konzeptbewusster Feinsteuerung 改进大语言模式,进行概念意识微调 2506.07833v1
  • 1013 06-09 Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information Dezentralisierung des Mehr-Agenten-Verstärkungs-Lernens mit zeitlichen Kausalinformationen 利用时间原因信息分散多机构强化学习 2506.07829v1
  • 1014 06-09 R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation R3D2: Realistische 3D Asset Insertion über Diffusion für autonome Fahrsimulation R3D2:现实的 3D 资产通过传播插入资产,用于自动驾驶模拟 2506.07826v1
  • 1015 06-09 Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation Beschleunigung von Diffusionsmodellen in Offline-RL über die Reward-Aware-Konsistenz-Trajektoriendestillation 通过奖励-软件兼容性轨迹蒸馏在离线 RL 中加速扩散模型 2506.07822v1
  • 1016 06-09 Accelerating Constrained Sampling: A Large Deviations Approach Beschränkte Probenahme beschleunigen: Ein Ansatz über große Abweichungen 加速受控抽样:大偏离方法 2506.07816v1
  • 1017 06-09 A weighted quantum ensemble of homogeneous quantum classifiers Ein gewichtetes Quantenensemble homogener Quantenklassifikatoren 一份加权量量的加权总量,由同质量分类器组成 2506.07810v1
  • 1018 06-09 Identifiable Object Representations under Spatial Ambiguities Identifizierbare Objektdarstellungen unter räumlichen Ambiguitäten 空间模糊度下可识别物体的表示 2506.07806v1
  • 1019 06-09 Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability Verbesserung der adversarialen Robustheit mit konformer Vorhersage: Ein Rahmen für eine garantierte Model-Zuverlässigkeit 增强以非正式预测增强反向能力:保证模型可靠性框架 2506.07804v1
  • 1020 06-09 Fine-grained Hierarchical Crop Type Classification from Integrated Hyperspectral EnMAP Data and Multispectral Sentinel-2 Time Series: A Large-scale Dataset and Dual-stream Transformer Method Feinkörnige Hierarchische Sortenklassifikation aus integrierten Hyperspektralen EnMAP-Daten und Multispektralen Sentinel-2 Zeitreihen: Eine großflächige Datensatz- und Dual-Stream-Transformer-Methode 从综合超高光谱 EnMAP 数据和多谱哨兵-2时间序列:大型数据集和双流变异器方法中精细测的等级级作物类型分类 2506.06155v2
  • 1021 06-09 Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness Widersacher mit Anreizen: Eine strategische Alternative zur widerspenstigen Robustheit 具有奖励措施的逆向:反反向强力战略替代方案 2406.11458v3
  • 1022 06-09 MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification MultiMatch: Multihead-Konsistenzregularisierung passend zur semi-überwachten Textklassifikation 多匹配: 用于半有效文本分类的多标题一致性规则化 2506.07801v1
  • 1023 06-09 Generalized Interpolating Discrete Diffusion Generalisierte interpolierende diskrete Diffusion 广义插值离散扩散 2503.04482v2
  • 1024 06-09 LLM Unlearning Should Be Form-Independent LLM Unlearning sollte formunabhängig sein LLM 重新学习应是独立的形式 2506.07795v1
  • 1025 06-09 High-Dimensional Learning in Finance High-Dimensional Learning in Finance 金融领域高层次学习 2506.03780v2
  • 1026 06-09 Text-to-LoRA: Instant Transformer Adaption Text-zu-LoRA: Instant Transformer Adaption 文本到LORA:即时变换器适应 2506.06105v2
  • 1027 06-09 Detecting Out-of-Distribution Objects through Class-Conditioned Inpainting Nicht-Verteilungsobjekte durch klassenkonditionierte Inpainting erkennen 通过按类别修饰油漆检测分销物体 2402.03292v3
  • 1028 06-09 Trial and Trust: Addressing Byzantine Attacks with Comprehensive Defense Strategy Prozess und Vertrauen: Byzantinische Angriffe mit umfassender Verteidigungsstrategie angehen 审判和信任:以全面防卫战略对付拜占庭袭击 2505.07614v2
  • 1029 06-09 Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger Re-Ranking Reasoning Kontext mit Baumsuche macht große Vision-Sprache Modelle stärker 以树木搜索为重新排列的原因说明背景,使大型视觉语言模型更强大 2506.07785v1
  • 1030 06-09 Is poisoning a real threat to LLM alignment? Maybe more so than you think Ist Vergiftung eine echte Bedrohung für LLM Ausrichtung? Vielleicht mehr, als Sie denken 中毒是否真正威胁到LLM对齐? 2406.12091v4
  • 1031 06-09 WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection WWAggr: Ein Fenster Wasserstein-basierte Aggregation für Ensemble Change Point Detection WWAggr:一个基于窗口的瓦西施泰因聚合群,用于聚合变化点探测 2506.08066v1
  • 1032 06-09 Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics Die Grenzen der Low-Bit-Optimierer drücken: Fokus auf EMA-Dynamik 推动低比最佳优化因素的极限:注重EMA动态 2505.00347v2
  • 1033 06-09 Trend-Aware Fashion Recommendation with Visual Segmentation and Semantic Similarity Trend-Aware Fashion Empfehlung mit Visueller Segmentierung und semantischer Ähnlichkeit 具有视觉分解和语义相似性的趋势软件时装建议 2506.07773v1
  • 1034 06-09 Diffusion Models-Aided Uplink Channel Estimation for RIS-Assisted Systems Diffusion Models-Aided Uplink Channel Abschätzung für RIS-Assistente Systeme IRIS辅助系统传播模型辅助上链接频道估计 2506.07770v1
  • 1035 06-09 Clustered Federated Learning via Embedding Distributions Clustered Federated Learning durch Einbettung von Distributionen 通过嵌入式分发分发方式进行分组联邦学习 2506.07769v1
  • 1036 06-09 Quickest Causal Change Point Detection by Adaptive Intervention Schnellste ursächliche Punktänderungserkennung durch adaptive Intervention 通过适应干预措施快速发现因果变化点 2506.07760v1
  • 1037 06-09 Agent Semantics, Semantic Spacetime, and Graphical Reasoning Agent Semantics, Semantische Raumzeit und Graphische Vernunft 语义学、语义空间时间和图形解释 2506.07756v1
  • 1038 06-09 Comparing Credit Risk Estimates in the Gen-AI Era Vergleich der Schätzungen des Kreditrisikos in der Ära der Gen-AI 比较GEN-AI时代的信用风险估算 2506.07754v1
  • 1039 06-09 An Overview of the Burer-Monteiro Method for Certifiable Robot Perception Ein Überblick über die Burer-Monteiro-Methode für zertifizierbare Roboterwahrnehmung 关于可核证的机器人感知的Burer-Monteiro方法概览 2410.00117v2
  • 1040 06-09 DualCast: A Model to Disentangle Aperiodic Events from Traffic Series DualCast: Ein Modell, um Aperiodische Ereignisse aus der Verkehrsreihe zu entwirren DualCast: 将周期性事件与交通系列分开的模型 2411.18286v2
  • 1041 06-09 Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning Graph-Assistente Stiche für Offline-Hierarchisches Verstärkungslernen 离线高层强化学习的图表辅助细化 2506.07744v1
  • 1042 06-09 An end-to-end attention-based approach for learning on graphs Ein durchgängiger, auf der Aufmerksamkeit basierender Ansatz für das Lernen auf Diagrammen 以端对端关注为基础的图表学习方法 2402.10793v3
  • 1043 06-09 When, Where and Why to Average Weights? Wann, wo und warum zu Durchschnittsgewichten? 何时、何地和为何平均体重? 2502.06761v2
  • 1044 06-09 Language Embedding Meets Dynamic Graph: A New Exploration for Neural Architecture Representation Learning Spracheinbettung trifft auf Dynamische Grafik: Eine neue Exploration für neurale Architektur-Repräsentations-Lernen 语言嵌入与动态图表相匹配:神经结构代表制学习的新探索 2506.07735v1
  • 1045 06-09 Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild Robuste 3D-Form-Rekonstruktion in Zero-Shot aus einem einzigen Bild in der Wildnis 从野生的单一图像中进行零热零位的强力 3D 形状重建 2403.14539v3
  • 1046 06-09 Profiling Electric Vehicles via Early Charging Voltage Patterns Profilieren von Elektrofahrzeugen über frühe Ladespannungsmuster 通过早期充电压模式分析电动车辆 2506.07714v1
  • 1047 06-09 Representation Bending for Large Language Model Safety Darstellungsbiegen für große Sprachmodellsicherheit 大语文示范语文安全示范语文代表名单 2504.01550v2
  • 1048 06-09 Statistical Taylor Expansion Statistische Taylor-Erweiterung 扩大泰勒统计 2410.01223v8
  • 1049 06-09 Evaluating Robustness in Latent Diffusion Models via Embedding Level Augmentation Bewertung der Robustheit in latenten Diffusionsmodellen durch Einbetten von Level Augmentation 通过嵌入级递增评估前流传播模型的强力 2506.07706v1
  • 1050 06-09 Towards a Small Language Model Lifecycle Framework Auf dem Weg zu einem Rahmen für den Lebenszyklus eines kleinen Sprachmodells 建立一个小型语言模拟生命周期框架 2506.07695v1
  • 1051 06-09 Training Superior Sparse Autoencoders for Instruct Models Training Superior Sparse Autoencoder für Instruct-Modelle 培训高级Sparass Autoencors 教学模型高级导师 2506.07691v1
  • 1052 06-09 Rao-Blackwellised Reparameterisation Gradients Rao-Blackwellised Reparameterisation Gradienten Rao-黑-黑- 恢复度梯度 2506.07687v1
  • 1053 06-09 On Fitting Flow Models with Large Sinkhorn Couplings Auf Einbau-Flow-Modelle mit großen Sinkhorn-Kupplungen 配有大型Sinkhorn组合体的 “ 适合流程模型 “ 2506.05526v2
  • 1054 06-09 Forests for Differences: Robust Causal Inference Beyond Parametric DiD Wälder für Unterschiede: Robuste Kausalableitung jenseits parametrischer DiD 差异的森林:超出参数二D的强力因果推断 2505.09706v2
  • 1055 06-09 Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models Unlösbare Problemerkennung: Robuste Bewertung des Verständnisses für große multimodale Modelle 无法解决的问题探测:大型多式联运模型的强有力理解评估 2403.20331v4
  • 1056 06-09 How Benchmark Prediction from Fewer Data Misses the Mark Wie Benchmark-Voraussagen von weniger Daten das Zeichen vermissen 从更少数据中的基准预测如何错过标记 2506.07673v1
  • 1057 06-09 Data-driven inventory management for new products: An adjusted Dyna-$Q$ approach with transfer learning Datengesteuertes Bestandsmanagement für neue Produkte: Ein angepasster Dyna-$Q$-Ansatz mit Transfer-Lernen 新产品数据驱动的库存管理:经调整的转让学习Dyna-$Q$办法 2501.08109v4
  • 1058 06-09 ProARD: progressive adversarial robustness distillation: provide wide range of robust students ProARD: progressive konversarische Robustheitsdestillation: für eine breite Palette robuster Studenten ProARD: 进步对抗性强力蒸馏:提供各种强健学生 2506.07666v1
  • 1059 06-09 The Universality Lens: Why Even Highly Over-Parametrized Models Learn Well Die Universalitätslinse: Warum selbst hoch überparametrisierte Modelle gut lernen 普世透镜:为什么甚至高超美化模型学习良好 2506.07661v1
  • 1060 06-09 Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks Bewertung von Dialect Fairness und Robustheit von großen Sprachmodellen in mit Gründen versehenen Aufgaben 评估大语言模式在说明理由任务中的公平性和强度 2410.11005v3
  • 1061 06-09 Feature Statistics with Uncertainty Help Adversarial Robustness Feature Statistik mit Unsicherheit Hilfe Widersprüchliche Robustheit 具有不确定性的特征统计 2503.20583v2
  • 1062 06-09 Trajectory World Models for Heterogeneous Environments Trajektoriale Weltmodelle für heterogene Umgebungen 不同不同环境世界模型 2502.01366v2
  • 1063 06-09 Training Articulatory Inversion Models for Interspeaker Consistency Training Artikulatorische Inversionsmodelle für die Konsistenz von Interspeakern 跨口间连通性人工转换模型培训 2505.20529v3
  • 1064 06-09 HieraEdgeNet: A Multi-Scale Edge-Enhanced Framework for Automated Pollen Recognition HieraEdgeNet: Multi-Scale Edge-Enhanced Framework für automatisierte Pollenerkennung HieraEdgeNet:一个多层边缘强化自动承认花粉框架 2506.07637v1
  • 1065 06-09 RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors RAID: Ein Datensatz für die Prüfung der adversarialen Robustheit von KI-generierten Bilddetektoren 用于测试AI-光学图像探测器的反向强力的数据集 2506.03988v3
  • 1066 06-09 Can Perplexity Predict Fine-tuning Performance? An Investigation of Tokenization Effects on Sequential Language Models for Nepali Kann Perplexity Fine-Tuning Performance voraussagen? Eine Untersuchung von Tokenization-Effekten auf sequentielle Sprachmodelle für Nepali 重复性预测微调性能能否? 对尼泊尔的序列语言模式的收缩性影响调查 2404.18071v2
  • 1067 06-09 AI-based Framework for Robust Model-Based Connector Mating in Robotic Wire Harness Installation KI-basiertes Framework für robuste modellbasierte Steckverbinder-Matierung in der Roboterdraht-Harness-Installation 以AI为基础的机器人电线装置中强有力的模型连接器配对框架 2503.09409v2
  • 1068 06-09 Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range Tasks Rückkehr von ChebNet: Verständnis und Verbesserung eines überblickten GNN bei langfristigen Aufgaben ChebNet的返回:了解和改进对长期任务GNN的忽视 2506.07624v1
  • 1069 06-09 LoRMA: Low-Rank Multiplicative Adaptation for LLMs LoRMA: Multiplikative Anpassung mit niedrigem Rank für LLMs LoRMA:LLMML的低Rank多复制性适应 2506.07621v1
  • 1070 06-09 Retrieval-Augmented Generation as Noisy In-Context Learning: A Unified Theory and Risk Bounds Retrieval-Augmented Generation as Noisy In-Context Learning: Eine einheitliche Theorie und Risikogrenzen 作为有噪音的文内学习:统一理论和风险波 2506.03100v3
  • 1071 06-09 The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning Der Catechol-Benchmark: Zeitreihen-Lösungsauswahldaten für wenige Aufnahmen maschinelles Lernen Catechol基准:几发机器学习的时间序列溶剂选择数据 2506.07619v1
  • 1072 06-09 FuXi-Air: Urban Air Quality Forecasting Based on Emission-Meteorology-Pollutant multimodal Machine Learning FuXi-Air: Vorhersage der städtischen Luftqualität auf der Grundlage von Emissions-Meteorologie-Pollutant multimodalen maschinellen Lernens FuXi-Air:基于排放-气象-污染-多式联运机学习的城市空气质量预测 2506.07616v1
  • 1073 06-09 Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds Poisson Midpoint-Methode für Log Concave Sampling: Jenseits der starken Fehler unteren Bounds 日志集中取样的 Poisson 中点方法: 超越强误差, 下界 2506.07614v1
  • 1074 06-09 APE: Selective Fine-tuning with Acceptance Criteria for Language Model Adaptation APE: Selektive Feinabstimmung mit Annahmekriterien für Sprachmodellanpassung APE:有选择地调整语言模式适应接受标准 2505.19912v2
  • 1075 06-09 Link Prediction with Relational Hypergraphs Link Vorhersage mit relationalen Hypergraphen 将预测与关系电报联系起来 2402.04062v3
  • 1076 06-09 SWAG: Long-term Surgical Workflow Prediction with Generative-based Anticipation SWAG: Langzeit-Surgical Workflow Prediction mit generativer Antizipation SWAG: 长期外科工作流动预测,以产生为基础预测 2412.18849v3
  • 1077 06-09 Tree-Sliced Wasserstein Distance with Nonlinear Projection Baumgeschliffene Wasserstein-Distanz mit nichtlinearer Projektion 用非线性投影绘制的树切瓦西尔斯坦距离 2505.00968v2
  • 1078 06-09 Learning from Double Positive and Unlabeled Data for Potential-Customer Identification Lernen aus doppelt positiven und unmarkierten Daten für die potenzielle Kundenidentifikation 学习用于潜在海关识别的双正和无标签数据 2506.00436v2
  • 1079 06-09 How Expressive are Knowledge Graph Foundation Models? Wie ausdrucksstark sind Wissensgraphen-Stiftungsmodelle? 知识图基础模型的表达性如何? 2502.13339v2
  • 1080 06-09 Revisiting Non-Acyclic GFlowNets in Discrete Environments Nicht-azyklische GFlowNets in diskreten Umgebungen erneut besuchen 重新审视环境混乱中的非周期性GFlowNets 2502.07735v2
  • 1081 06-09 TwinBreak: Jailbreaking LLM Security Alignments based on Twin Prompts TwinBreak: Jailbreaking LLM Security Alignments basierend auf Twin Prompts 双突破:基于双提示的监狱破碎LLM安保调整 2506.07596v1
  • 1082 06-09 Tree-Sliced Wasserstein Distance: A Geometric Perspective Baumgeschliffene Wasserstein-Distanz: Eine geometrische Perspektive 树切瓦西尔斯坦距离:几何视角 2406.13725v3
  • 1083 06-09 Exploiting Curvature in Online Convex Optimization with Delayed Feedback Ausnutzen von Kurven in Online Convex-Optimierung mit verzögertem Feedback 利用在线 Convex 优化使用延迟反馈的在线 Convex 优化中的曲线曲线 2506.07595v1
  • 1084 06-09 Lite-RVFL: A Lightweight Random Vector Functional-Link Neural Network for Learning Under Concept Drift Lite-RVFL: Ein leichter Zufallsvektor Funktions-Link Neuronales Netzwerk zum Lernen unter Konzept Drift Lite-RVFL: 轻量随机随机矢量功能链接神经网络,用于在概念驱动下学习 2506.08063v1
  • 1085 06-09 FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning FairDICE: Fairness-getriebenes Offline-Multi-Ziel-Verstärkungs-Lernen FairDICE: 公平发展离线多目标强化学习 2506.08062v1
  • 1086 06-09 Explore the vulnerability of black-box models via diffusion models Entdecken Sie die Schwachstelle von Black-Box-Modellen über Diffusionsmodelle 通过推广模式探索黑盒模式的脆弱性 2506.07590v1
  • 1087 06-09 Attention with Trained Embeddings Provably Selects Important Tokens Aufmerksamkeit bei trainierten Einbettungen wählt wahrscheinlich wichtige Token aus 与经过训练的嵌入器的关注 2505.17282v2
  • 1088 06-09 PrunePEFT: Iterative Hybrid Pruning for Parameter-Efficient Fine-tuning of LLMs PrunePEFT: Iterative Hybride Pruning für eine parametereffiziente Feinabstimmung von LLMs PrunePEFT: LLMs 参数有效微调过渡混合节制 2506.07587v1
  • 1089 06-09 Theoretical Limitations of Ensembles in the Age of Overparameterization Theoretische Einschränkungen von Ensembles im Zeitalter der Überparameterisierung 过分参数化时代集合群的理论局限性 2410.16201v2
  • 1090 06-09 Aircraft Trajectory Dataset Augmentation in Latent Space Flugbahndatensatz Augmentation im Latent Space 低空空间轨迹数据集增强 2506.07585v1
  • 1091 06-09 FedCGD: Collective Gradient Divergence Optimized Scheduling for Wireless Federated Learning FedCGD: Kollektive Gradient Divergenz optimierte Planung für drahtloses Federated Learning FedCGD:无线联邦学习集体渐进变异优化计划 2506.07581v1
  • 1092 06-09 HSF: Defending against Jailbreak Attacks with Hidden State Filtering HSF: Verteidigung gegen Jailbreak Attacken mit versteckten Zustand Filterung HSF: 防范以隐蔽国家过滤手段侵入监狱的袭击 2409.03788v2
  • 1093 06-09 Denoising the Future: Top-p Distributions for Moving Through Time Denoising the Future: Top-P-Distributionen für die Zeit 摒弃未来:跨越时空的顶级发行品 2506.07578v1
  • 1094 06-09 Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models Uncertainty-o: Ein Modell-agnostisches Framework zur Enthüllung von Uncertainty in großen multimodalen Modellen 不确定性 – – o:大型多式联运模型中不确定性的不确定性稳定的一个示范 – – 不可知框架 2506.07575v1
  • 1095 06-09 Attacking Attention of Foundation Models Disrupts Downstream Tasks Angriff auf die Aufmerksamkeit von Stiftungsmodellen bricht nachgelagerte Aufgaben aus 攻击基金会模型的注意 破坏下流任务 2506.05394v2
  • 1096 06-09 AI Scientists Fail Without Strong Implementation Capability KI-Wissenschaftler scheitern ohne starke Umsetzungsfähigkeit AI 缺乏强有力的执行能力的科学家失败 2506.01372v2
  • 1097 06-09 Complex Physics-Informed Neural Network Komplexes physik-informiertes Neuronales Netzwerk 复杂物理内化神经网络 2502.04917v2
  • 1098 06-09 A Simplifying and Learnable Graph Convolutional Attention Network for Unsupervised Knowledge Graphs Alignment Ein einfacheres und erlernbares Graph Convolutional Attention Network für unüberwachtes Wissen Graphen Alignment 一个用于不受监督知识图表对齐的简化和可学习的图变注意网络 2410.13263v2
  • 1099 06-09 AQUATIC-Diff: Additive Quantization for Truly Tiny Compressed Diffusion Models AQUATIC-Diff: Additive Quantisierung für wirklich kleine komprimierte Diffusionsmodelle AQUATIC-Diff: 真正微缩压缩扩散模型补充量化 2506.05960v2
  • 1100 06-09 Improving Memory Efficiency for Training KANs via Meta Learning Verbesserung der Speichereffizienz für die Schulung von KANs über Meta Learning 通过Meta学习提高培训KANs的记忆效率 2506.07549v1
  • 1101 06-09 Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques Eliciting Fine-Tuned Transformer Fähigkeiten durch Inferenz-Zeit-Techniken 通过推断-时间技术的精密导精度变形器能力 2506.08060v1
  • 1102 06-09 Output-Constrained Decision Trees Output-beschränkte Entscheidungsbäume 受产出约束的决策树 2405.15314v3
  • 1103 06-09 Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling Heartcare Suite: Multidimensionales Verständnis von EKG mit Raw Multi-Lead Signal Modeling 心脏护理套套件:以原始多牵头信号建模对ECG的多层面理解 2506.05831v2
  • 1104 06-09 Flowing Datasets with Wasserstein over Wasserstein Gradient Flows Flowing Datasets mit Wasserstein über Wasserstein Gradient Flows 与瓦森斯坦对瓦森斯坦梯流的流数据集 2506.07534v1
  • 1105 06-09 Learning What Reinforcement Learning Can’t: Interleaved Online Fine-Tuning for Hardest Questions Lernen, was Verstärktes Lernen nicht kann: Interleaved Online Fine-Tuning für härteste Fragen 学习强化学习无法做到的东西:针对最难问题的交错在线微调 2506.07527v1
  • 1106 06-09 Test-time Correlation Alignment Prüfzeitkorrelation Ausrichtung 测试时关联对齐 2505.00533v2
  • 1107 06-09 Towards Achieving Perfect Multimodal Alignment Auf dem Weg zu einer perfekten multimodalen Ausrichtung 实现完美多式联运协调 2503.15352v2
  • 1108 06-09 Addressing Correlated Latent Exogenous Variables in Debiased Recommender Systems Adressierung korrelierter latenter exogener Variablen in debiased Recommender Systems 处理受损害建议系统中与气候相关的低端外部源变量 2506.07517v1
  • 1109 06-09 SecFormer: Fast and Accurate Privacy-Preserving Inference for Transformer Models via SMPC SecFormer: Schnelle und genaue Datenschutz-Bewahrung von Schlussfolgerungen für Transformer-Modelle über SMPC SecFormer:通过SMPC为变形模型快速和准确的隐私保护推断 2401.00793v5
  • 1110 06-09 Beyond Closure Models: Learning Chaotic-Systems via Physics-Informed Neural Operators Beyond Closure Models: Chaotische Systeme über physikinformierte Neuraloperatoren lernen 超越封闭模型:通过物理内成形神经操作员学习毛质系统 2408.05177v4
  • 1111 06-09 Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models Eliciting In-Context Retrieval und Reasoning für Lang-Kontext Große Sprachmodelle 长式长式大语言模型的全文检索和理由 2501.08248v3
  • 1112 06-09 Reinforcement Learning via Implicit Imitation Guidance Verstärkung Lernen über Implizite Imitation Anleitung 通过隐含模拟指导加强学习 2506.07505v1
  • 1113 06-09 CaliciBoost: Performance-Driven Evaluation of Molecular Representations for Caco-2 Permeability Prediction CaliciBoost: Performance-getriebene Auswertung von molekularen Darstellungen für Caco-2 Permeabilitätsvorhersage CaliciBoost:对Caco-2 渗透性预测的分子代表量进行实绩-驱动评价 2506.08059v1
  • 1114 06-09 BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction BatteryLife: Ein umfassender Datensatz und Benchmark für die Vorhersage der Akkulaufzeit 电池寿命:电池寿命预测综合数据集和基准 2502.18807v6
  • 1115 06-09 Scalable Sobolev IPM for Probability Measures on a Graph Skalierbares Sobolev IPM für Wahrscheinlichkeitsmessungen auf einem Graphen 用于对图表进行概率测量的可缩放 Sobolev IPM 2502.00737v2
  • 1116 06-09 Graph-of-Causal Evolution: Challenging Chain-of-Model for Reasoning Graph-of-Causal Evolution: Herausfordernde Modellkette für die Vernunft Causal 进化图:有挑战性的理由模型链 2506.07501v1
  • 1117 06-09 Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks Mind the Gap: Entfernen der Diskretisierungslücke in differenzierbaren Logic Gate-Netzwerken 牢记差距:消除不同逻辑门网的分化差距 2506.07500v1
  • 1118 06-09 Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization Low-Dimension-to-High-Dimension Verallgemeinerung und ihre Implikationen für Länge Verallgemeinerung 低差异至高差异通用化及其对延长通用化的影响 2410.08898v2
  • 1119 06-09 Explicit Preference Optimization: No Need for an Implicit Reward Model Explizite Preference-Optimierung: Keine Notwendigkeit für ein Implizites Prämienmodell 明确的优惠最优化:没有必要采用隐性奖赏模式 2506.07492v1
  • 1120 06-09 Binary Classifier Optimization for Large Language Model Alignment Binärklassifikator-Optimierung für großsprachliche Modellausrichtung 大语言模型对齐的二进制分类器优化 2404.04656v2
  • 1121 06-09 CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model für die Kontextoptimierung CoCoA-Mix:环境优化组合模型 2506.07484v1
  • 1122 06-09 Premise Selection for a Lean Hammer Premise Auswahl für einen Lean Hammer 精精精锤预选 2506.07477v1
  • 1123 06-09 FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models FRED: Flexible Reduction-Distribution Interconnect und Communication Implementation für Wafer-Scale Distributed Training von DNN-Modellen FRED: DNN模型的宽度分布式分散式培训灵活弹性吸收-分散式互联和通信实施 2406.19580v2
  • 1124 06-09 Deep Ridgelet Transform and Unified Universality Theorem for Deep and Shallow Joint-Group-Equivariant Machines Deep Ridgelet Transform und Unified Universality Theorem für tiefe und flache Gelenk-Gruppe-Equivariante Maschinen 深脊变异和统一通用深海和浅水联合小组-平等合用机器理论 2405.13682v5
  • 1125 06-09 Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models Verfolgung beweglicher Ziele mit Online-Selbstspiel-Verstärkung Lernen für sicherere Sprachmodelle 利用在线加强自身能力学习,建立更安全语言模式,以追踪移动目标 2506.07468v1
  • 1126 06-09 Physics-Informed Deep Learning Model for Line-integral Diagnostics Across Fusion Devices Physik-informiertes Deep-Learning-Modell für linienintegrierte Diagnosen über Fusionsgeräte 横穿融合装置的直线整体诊断法深学习模型 2412.00087v3
  • 1127 06-09 Circumventing Backdoor Space via Weight Symmetry Umgehung des Backdoor-Raums über die Gewichtssymmetrie 通过重量对称法环绕后门空间 2506.07467v1
  • 1128 06-09 Parameter-Efficient Fine-Tuning of State Space Models Parameter-Effizient Feintuning von State Space Models 国家空间模型有效精美设计 2410.09016v3
  • 1129 06-09 KScope: A Framework for Characterizing the Knowledge Status of Language Models KScope: Ein Rahmen für die Charakterisierung des Wissensstatus von Sprachmodellen KScope: 语言模式知识状况描述框架 2506.07458v1
  • 1130 06-09 When Style Breaks Safety: Defending Language Models Against Superficial Style Alignment Wenn Stil die Sicherheit bricht: Verteidigung von Sprachmodellen gegen hocheffiziente Stilausrichtung 当样式打破安全时: 防御语言模型, 对抗超优样式对齐 2506.07452v1
  • 1131 06-09 Extending Epistemic Uncertainty Beyond Parameters Would Assist in Designing Reliable LLMs Die Erweiterung der epistemischen Unsicherheit über Parameter hinaus würde helfen, zuverlässige LLMs zu konzipieren 超越参数的不确定性延伸有助于设计可靠的长效磁环 2506.07448v1
  • 1132 06-09 One Fits All: Learning Fair Graph Neural Networks for Various Sensitive Attributes Eins passt auf alle: Faire Graphen-Neural-Netzwerke für verschiedene Sensitive Attribute lernen ” 一个适合所有人 “ :学习各种敏感属性的公平图表神经网络 2406.13544v3
  • 1133 06-09 Resilience of Rademacher chaos of low degree Resilienz von Rademacher Chaos von niedrigem Grad Rademacher低程度混乱的复原能力 2402.10504v5
  • 1134 06-09 Position: We Need Responsible, Application-Driven (RAD) AI Research Position: Wir brauchen verantwortungsbewusste, anwendungsgetriebene (RAD) KI-Forschung 立场:我们需要负责任、应用驱动(RAD)AI研究 2505.04104v2
  • 1135 06-09 Straight-Line Diffusion Model for Efficient 3D Molecular Generation Geradliniges Diffusionsmodell für effiziente 3D-Molekulargeneration 高效 3D 分子生成的直线扩散模型 2503.02918v2
  • 1136 06-09 Federated In-Context Learning: Iterative Refinement for Improved Answer Quality Föderiertes In-Context-Lernen: Iterative Veredelung für verbesserte Antwortqualität 联邦内部学习:改进答复质量的过渡性改进 2506.07440v1
  • 1137 06-09 Fast Geometric Embedding for Node Influence Maximization Schnelle geometrische Einbettung für Knotenbeeinflussung Maximierung 节点影响最大化快速几何嵌入 2506.07435v1
  • 1138 06-09 Cartridges: Lightweight and general-purpose long context representations via self-study Patronen: Leichte und universelle lange Kontextdarstellungen durch Selbststudium Cartridges:轻量和一般用途长背景介绍,通过自学 2506.06266v2
  • 1139 06-09 Multi-agent Architecture Search via Agentic Supernet Multi-Agent Architektur Suche über Agentic Supernet 通过 Agric Supernet 多剂机构建筑搜索 2502.04180v2
  • 1140 06-09 FlashDMoE: Fast Distributed MoE in a Single Kernel FlashDMoE: Schnell verteiltes MoE in einem einzigen Kernel FlashDMoE: 在一个单一核心中快速分布的教育部 2506.04667v2
  • 1141 06-09 State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models State-offset Tuning: State-based Parameter-Efficient Fine-Tuning für State Space Models 以国家为基础的国家空间模型参数 – – 有效精美设计 2503.03499v2
  • 1142 06-09 HeTa: Relation-wise Heterogeneous Graph Foundation Attack Model HeTa: Verhältnisweises Heterogenes Graph Foundation Attack Model HeTa: 关系性异异异图形基金会攻击模型 2506.07428v1
  • 1143 06-09 Power-Law Decay Loss for Large Language Model Finetuning: A Theory Perspective Macht-Rechts-Dekay Verlust für große Sprachmodell Finetuning: Eine Theorie-Perspektive 大语言模型微调的功率法减缩损失:理论视角 2505.16900v5
  • 1144 06-09 DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning DOMAIN: Mildly COnservative Model-BAsed Offline Verstärkungslernen DOMAIN: 低度防腐性示范性软体外表强化学习 2309.08925v4
  • 1145 06-09 IF-GUIDE: Influence Function-Guided Detoxification of LLMs IF-GUIDE: Einfluss funktionsgeführte Entgiftung von LLMs IF-GUIDE: LLMs的受控功能解毒作用影响 2506.01790v2
  • 1146 06-09 Evidential Spectrum-Aware Contrastive Learning for OOD Detection in Dynamic Graphs Evidential Spectrum-Aware Kontrastives Lernen für OOD Detection in dynamischen Graphen 动态图中 OOD 探测的 证据性光谱- 软件差异性学习 2506.07417v1
  • 1147 06-09 LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments LiteVLM: Eine Bildverarbeitungs-Modell-Inferenz-Pipeline mit geringer Latenz für ressourcenbeschränkte Umgebungen LiteVLM:资源受限环境的低寿命愿景-语言模型推导管道 2506.07416v1
  • 1148 06-09 Variational Supervised Contrastive Learning Variationelles Überwachtes Kontrastuelles Lernen 差异监督反舞弊学习 2506.07413v1
  • 1149 06-09 GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models GANQ: GPU-Adaptive nicht einheitliche Quantisierung für große Sprachmodelle GANQ: 大语言模型的GPU自适应非统一量化 2501.12956v3
  • 1150 06-09 Structured Pruning for Diverse Best-of-N Reasoning Optimization Strukturierte Prunings für die unterschiedliche Best-of-N-Reasoning-Optimierung 结构审慎,实现最佳理由的多样化最佳最佳理由的优化 2506.03978v2
  • 1151 06-09 Fractional-order Jacobian Matrix Differentiation and Its Application in Artificial Neural Networks Fraktions-Order Jacobian Matrix Differenzierung und seine Anwendung in künstlichen neuralen Netzwerken Jacobian 矩阵差异及其在人造神经网络中的应用 2506.07408v1
  • 1152 06-09 Anomaly Detection and Early Warning Mechanism for Intelligent Monitoring Systems in Multi-Cloud Environments Based on LLM Anomalieerkennungs- und Frühwarnmechanismus für intelligente Überwachungssysteme in Multi-Cloud-Umgebungen auf Basis von LLM 以LLM为基础在多城市环境中建立智能监测系统的异常探测和预警机制 2506.07407v1
  • 1153 06-09 InverseScope: Scalable Activation Inversion for Interpreting Large Language Models InverseScope: Skalierbare Aktivierungs-Inversion zum Dolmetschen großer Sprachmodelle 反向范围: 用于解释大语言模型的可缩放活动转换 2506.07406v1
  • 1154 06-09 RiemannFormer: A Framework for Attention in Curved Spaces RiemannFormer: Ein Rahmenwerk für Aufmerksamkeit in geschwungenen Räumen RiemannFormer: 曲线空间的注意框架 2506.07405v1
  • 1155 06-09 Outlier-weighed Layerwise Sampling for LLM Fine-tuning Ausgewägte schichtweise Probenahme für LLM-Feinabstimmung LLM 精密调试 2405.18380v3
  • 1156 06-09 Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions Numerische Untersuchung der Sequenzmodellierungstheorie mit steuerbaren Speicherfunktionen 使用可控内存函数对序列模拟理论进行数字调查 2506.05678v2
  • 1157 06-09 G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems G-Memory: Hierarchischer Speicher für Multi-Agent-Systeme G-记忆:为多机构系统追踪等级记忆 2506.07398v1
  • 1158 06-09 Nonparametric Modern Hopfield Models Nichtparametrische moderne Hopfield-Modelle 非参数现代跳地模型 2404.03900v2
  • 1159 06-09 From Static to Adaptive Defense: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks Von der statischen zur adaptiven Verteidigung: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks 从静态到适应性防卫:联邦多要求深入强化学习-驱动移动目标防御,防止UAV Swarm网络的DoS攻击 2506.07392v1
  • 1160 06-09 Protecting Deep Learning Model Copyrights with Adversarial Example-Free Reuse Detection Schutz von Deep-Learning-Modell-Urheberrechten mit zweifelhafter Beispiel-freier Wiederverwertungserkennung 保护深学习模式版权,进行反反对学性实例自由再利用探测 2407.03883v2
  • 1161 06-09 Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert Lärm-Robustheit durch Lärm: Asymmetrische LoRA-Anpassung mit Giftungsexperte 噪音造成的噪音破坏:与中毒专家一起进行非对称 LoRA适应 2505.23868v3
  • 1162 06-09 Moment Alignment: Unifying Gradient and Hessian Matching for Domain Generalization Moment Alignment: Vereinheitlichung von Gradienten und Hessischen Matching für Domain Generalization 时速对齐: 统一渐变和赫西相匹配域域通用化 2506.07378v1
  • 1163 06-09 Adapter Naturally Serves as Decoupler for Cross-Domain Few-Shot Semantic Segmentation Adapter dient natürlich als Entkoppler für die Cross-Domain-Few-Shot Semantic Segmentierung 自然适应器作为跨域鲜热语义分割的解码器 2506.07376v1
  • 1164 06-09 Numerical Analysis of HiPPO-LegS ODE for Deep State Space Models Numerische Analyse von HiPPO-LegS ODE für Deep State Space Modelle HiPPO-LegS深海国家空间模型ODE的数值分析 2412.08595v2
  • 1165 06-09 A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment Eine umfassende Umfrage in LLM(-Agent) Full Stack Sicherheit: Daten, Schulung und Bereitstellung 用LLM(-代理)全堆安全:数据、培训和部署进行的全面调查 2504.15585v4
  • 1166 06-09 Theoretical Benefit and Limitation of Diffusion Language Model Theoretischer Nutzen und Begrenzung des Diffusions-Sprachmodells 传播语言模式理论效益和限制 2502.09622v2
  • 1167 06-09 MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing MoE-GPS: Guidlines für Vorhersagestrategie für dynamische Expertenduplikation in MoE Load Balancing MoE-GPS:教育部负载平衡中动态专家重叠预测战略指南 2506.07366v1
  • 1168 06-09 Multiple Object Stitching for Unsupervised Representation Learning Mehrere Objektstiche für unüberwachtes Repräsentationslernen 用于无人监督的代表学习的多对象拼接 2506.07364v1
  • 1169 06-09 Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding Stochastische nichtlineare Steuerung über finite-dimensionale Spektral-Dynamische Einbettung 通过有限维谱光谱动态嵌入器控制非线性控件 2304.03907v5
  • 1170 06-09 Lightweight Joint Audio-Visual Deepfake Detection via Single-Stream Multi-Modal Learning Framework Leichte gemeinsame Audio-Visuelle Deepfake-Erkennung über Multi-Modal-Lernrahmen mit Single-Stream 通过单层多模式学习框架,通过单一链式多模式学习框架,联合探测音频-视觉深海假相 2506.07358v1
  • 1171 06-09 CBAM-STN-TPS-YOLO: Enhancing Agricultural Object Detection through Spatially Adaptive Attention Mechanisms CBAM-STN-TPS-YOLO: Verbesserung der Erkennung landwirtschaftlicher Objekte durch räumlich adaptive Aufmerksamkeitsmechanismen CBAM-STN-TPS-YOLO:通过空间适应性关注机制加强农业物体的探测 2506.07357v1
  • 1172 06-09 Unveiling and Addressing Pseudo Forgetting in Large Language Models Enthüllung und Adressierung von Pseudo Forgetting in großen Sprachmodellen 在大语言模式中统一和解决 “ 优于 “ 的遗忘 2411.11932v2
  • 1173 06-09 SALT: A Lightweight Model Adaptation Method for Closed Split Computing Environments SALT: Eine leichte Modellanpassungsmethode für geschlossene Split Computing-Umgebungen SALT: 用于封闭式拆分计算环境的轻量模型适应方法 2506.07355v1
  • 1174 06-09 Task Generalization With AutoRegressive Compositional Structure: Can Learning From $D$ Tasks Generalize to $D^{T}$ Tasks? Aufgabenverallgemeinerung mit autoregressiver Kompositionsstruktur: Kann man von $D$ Aufgaben zu $D^{T}$ Aufgaben verallgemeinern? 具有自回归组合结构的任务泛化:从 $D$ 个任务中学习能否泛化到 $D^{T}$ 个任务? 2502.08991v2
  • 1175 06-09 Decentralized Optimization on Compact Submanifolds by Quantized Riemannian Gradient Tracking Dezentralisierte Optimierung auf kompakten Submanifolds durch Quantized Riemannian Gradient Tracking 量化的里伊曼尼梯度跟踪对契约子部件的优化 2506.07351v1
  • 1176 06-09 Distributed Risk-Sensitive Safety Filters for Uncertain Discrete-Time Systems Verteilte risikosensible Sicherheitsfilter für unsichere Diskretzeitsysteme 用于不确定分辨时间系统的分散式风险敏感度安全过滤器 2506.07347v1
  • 1177 06-09 ST-GraphNet: A Spatio-Temporal Graph Neural Network for Understanding and Predicting Automated Vehicle Crash Severity ST-GraphNet: Ein räumlich-temporales Neuralnetzwerk für das Verständnis und die Vorhersage von Automatisiertem Fahrzeug-Crash-Severity ST-GraphNet:一个了解和预测自动车辆撞车严重性的空间-时时图神经网络 2506.08051v1
  • 1178 06-09 A Comprehensive Survey on Artificial Intelligence for Complex Network: Potential, Methodology and Application Eine umfassende Umfrage zu Künstlicher Intelligenz für komplexes Netzwerk: Potenzial, Methodik und Anwendung 复杂网络人工情报综合调查:潜力、方法和应用 2402.16887v2
  • 1179 06-09 On the Impact of Uncertainty and Calibration on Likelihood-Ratio Membership Inference Attacks Über die Auswirkungen von Unsicherheit und Kalibrierung auf Likelihood-Ratio-Mitgliedschafts-Inferenzangriffe 不确定性和校准对潜在――拉蒂奥成员推论攻击的影响 2402.10686v5
  • 1180 06-09 Calibrated Value-Aware Model Learning with Probabilistic Environment Models Kalibriertes Value-Aware-Modell-Lernen mit probabilistischen Umweltmodellen 使用概率环境模型校准的价值软件模型学习 2505.22772v2
  • 1181 06-09 Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models Membership Inferenz Attack sollte weiter zu Verteilungsstatistiken für destillierte Generative Modelle 成员攻击的推论应转向已蒸馏生成模型的分发统计数据 2502.02970v2
  • 1182 06-09 Real-Time Execution of Action Chunking Flow Policies Real-Time Ausführung von Maßnahmen Chunking Flow Richtlinien 实时执行 “ 快速行动 “ 流动政策 2506.07339v1
  • 1183 06-09 Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models Graph-KV: Durchbrechen der Sequenz durch Einspritzen struktureller Biasen in große Sprachmodelle 图图-KV:通过向大语言模型输入结构分界来打破序列 2506.07334v1
  • 1184 06-09 Model Generalization on Text Attribute Graphs: Principles with Large Language Models Modellverallgemeinerung auf Text-Attribut-Grafiken: Prinzipien mit großen Sprachmodellen 文本属性图的模型通用化:大语言模型原则 2502.11836v2
  • 1185 06-09 JavelinGuard: Low-Cost Transformer Architectures for LLM Security JavelinGuard: Low-Cost-Transformer-Architekturen für LLM-Sicherheit JavelinGuard:低成本变革结构促进LLM安全 2506.07330v1
  • 1186 06-09 Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning Dynamische Planung für Fahrzeug-zu-Fahrzeug-Kommunikation verbessertes Federated Learning 车辆到车辆的交通通信增强联邦学习的动态日程安排 2406.17470v2
  • 1187 06-08 (7) Mobility-Aware Asynchronous Federated Learning with Dynamic Sparsification Mobilität-Bewusst Asynchrones Federated Learning mit dynamischer Sparsifikation 具有动态分级功能的流动意识非同步联邦学习 2506.07328v1
  • 1188 06-08 Reward Model Interpretability via Optimal and Pessimal Tokens Reward Model Interpretierbarkeit über Optimale und Pessimal Token 通过最佳和微量调制调来解释性 2506.07326v1
  • 1189 06-08 Devanagari Digit Recognition using Quantum Machine Learning Devanagari Digit Recognition mit Quantum Machine Learning Devanagari Digit 使用量子机器学习识别 2506.09069v1
  • 1190 06-08 DEF: Diffusion-augmented Ensemble Forecasting DEF: Diffusionsgenerierte Ensemble-Prognose DEF: 扩散增强的集合预测 2506.07324v1
  • 1191 06-08 Watermarking Language Models with Error Correcting Codes Wasserzeichen-Sprachenmodelle mit Fehler-Korrektur-Codes 错误校正代码的划水语言模型 2406.10281v4
  • 1192 06-08 Generative Modeling of Networked Time-Series via Transformer Architectures Generative Modellierung von vernetzten Zeitreihen über Transformer-Architekturen 通过变换器架构生成网络化时序建模 2506.07312v1
  • 1193 06-08 Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference Paged Attention trifft auf FlexAttention: Freischalten der Long-Context-Effizienz in der eingesetzten Inferenz Paged Attention 遇上 FlexAttention:在部署推理中解锁长上下文效率 2506.07311v1
  • 1194 06-08 PASS: Private Attributes Protection with Stochastic Data Substitution PASS: Schutz privater Attribute mit stochastischer Datenersetzung PASS: 私人财产保护,用存储数据替代数据 2506.07308v1
  • 1195 06-08 Highly Fast Text Segmentation With Pairwise Markov Chains Hochschnelle Textsegmentierung mit paarweise Markov-Ketten 配对马科夫链条的快速文本分割 2102.11037v2
  • 1196 06-08 Sharpness-Aware Teleportation on Riemannian Manifolds Scharfheitsbewusste Teleportation auf Riemannschen Manifolds Riemannian Manifolds 上的移动 2309.17215v2
  • 1197 06-08 Uncertainty-Aware Strategies: A Model-Agnostic Framework for Robust Financial Optimization through Subsampling Unsichere Strategien: Ein modell-agnostischer Rahmen für robuste finanzielle Optimierung durch Unterstichproben 不确定性 – – 软件战略:通过子取样实现稳健金融优化的模型 – – 不可确定框架 2506.07299v1
  • 1198 06-08 MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache MiniKV: Die Grenzen der LLM-Inferenz durch 2-Bit-Layer-Diskriminativ-KV-Cache drücken MiniKV: 通过2-Bit 层分解 KV 缓存推进LLM 推断的限度 2411.18077v3
  • 1199 06-08 On the kernel learning problem Zum Problem des Kernel-Lernens 关于内核学习问题 2502.11665v2
  • 1200 06-08 Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI Multi-Step Guided Diffusion für die Bildwiederherstellung an Edge-Geräten: Auf dem Weg zur leichten Wahrnehmung in körpernaher KI 多脚制导扩散,促进边缘设备图像恢复:向Embodied AI 中的轻量度感知 2506.07286v1
  • 1201 06-08 Defending Against Diverse Attacks in Federated Learning Through Consensus-Based Bi-Level Optimization Verteidigung gegen verschiedene Angriffe im Federated Learning durch Konsens-basierte Bi-Level-Optimierung 通过基于共识的双层优化在联邦学习中防范多种攻击 2412.02535v2
  • 1202 06-08 Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback 超前优先优化(EGPO):超越纳什最后的融合(Nash) 从人类反馈中学习 2503.08942v2
  • 1203 06-08 Selective Prompt Anchoring for Code Generation Selektive Prompt-Ankerung für die Code-Generierung 用于代码生成的选择性提示锚定 2408.09121v5
  • 1204 06-08 Machine Learning Should Maximize Welfare, but Not by (Only) Maximizing Accuracy Maschinelles Lernen sollte Wohlfahrt maximieren, aber nicht durch (nur) Maximierung der Genauigkeit 机器学习应该最大限度地提高福利,但不能(仅仅)最大限度地提高准确度 2502.11981v2
  • 1205 06-08 Tokenized Bandit for LLM Decoding and Alignment Tokenized Bandit für LLM Decodierung und Ausrichtung 用于 LLM 解码和对齐的词元化老虎机 2506.07276v1
  • 1206 06-08 A Cramér-von Mises Approach to Incentivizing Truthful Data Sharing Ein Cramér-von-Mises-Ansatz zur Förderung der wahrheitsgemäßen Datenfreigabe 激励真相数据共享的Cramér-von Mises方法 2506.07272v1
  • 1207 06-08 Machine Learning-Based Self-Localization Using Internal Sensors for Automating Bulldozers Maschinelle Learning-basierte Selbstlokalisierung mittels interner Sensoren zur Automatisierung von Bulldozern 利用内部传感器实现推推推车自动化 2506.07271v1
  • 1208 06-08 PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference PecSched: Präventive und effiziente Clusterplanung für LLM-Inferenz 预测:先发制人和高效率的LLM推理群集安排 2409.15104v2
  • 1209 06-08 Universality of conformal prediction under the assumption of randomness Universalität der konformen Vorhersage unter der Annahme der Zufälligkeit 假设随机性的情况下一致预测的普遍性 2502.19254v2
  • 1210 06-08 FDC: Fast KV Dimensionality Compression for Efficient LLM Inference FDC: Schnelle KV-Dimensionalitätskompression für effiziente LLM-Inferenz FDC: 高效 LLM 推断快速 KV 尺寸压缩 2408.04107v3
  • 1211 06-08 Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods Ensemble-MIX: Verbesserung der Probeneffizienz in Multi-Agent RL mit Ensemble-Methoden 组合-MIX:利用组合方法提高多代理RL的取样效率 2506.02841v2
  • 1212 06-08 BG-HOP: A Bimanual Generative Hand-Object Prior BG-HOP: Bimanuelle Generative Hand-Objekt-Vorhersage BG-HOP: 双体生成手向对象前置 2506.09068v1
  • 1213 06-08 Counterfactual inference in sequential experiments Gegenfaktische Inferenz in sequentiellen Experimenten 连续实验中的反事实推论 2202.06891v5
  • 1214 06-08 RADAR: Recall Augmentation through Deferred Asynchronous Retrieval RADAR: Erinnerung an Augmentation durch verzögerte asynchrone Retrieval RADAR:通过推迟的非同步回溯性回溯性增量 2506.07261v1
  • 1215 06-08 ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition ALINE: Gemeinsame Amortisation für Bayesian Inferenz und aktive Datenerfassung AlINE: 贝耶斯推断和主动数据采集联合摊销 2506.07259v1
  • 1216 06-08 Can the Rookies Cut the Tough Cookie? Exploring the Use of LLMs for SQL Equivalence Checking Können die Rookies das harte Cookie ausschneiden? Erforschen der Verwendung von LLMs für SQL-Equivalenzprüfung Rookies 能够切开硬饼干吗? 探索 SQL 等效检查使用 LLMs 的情况 2412.05561v2
  • 1217 06-08 AMPO: Active Multi-Preference Optimization for Self-play Preference Selection AMPO: Aktive Multi-Preference-Optimierung für Selbstspiel-Preference-Auswahl AMPO: 主动的自玩首选多优先优化 2502.18293v2
  • 1218 06-08 DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products DeltaProduct: Verbesserung des State-Tracking in linearen RNNs über Householder-Produkte DeltaProduct:通过 Householder 乘积改进线性 RNN 的状态跟踪 2502.10297v5
  • 1219 06-08 Horizon Reduction Makes RL Scalable Horizontreduzierung macht RL skalierbar 降低地平线使RL可缩放 2506.04168v2
  • 1220 06-08 Regretful Decisions under Label Noise Bedauerliche Entscheidungen im Rahmen von Label Noise 标签噪音下的遗憾决定 2504.09330v2
  • 1221 06-08 Unraveling Token Prediction Refinement and Identifying Essential Layers in Language Models Unraveling Token Prediction Verfeinerung und Identifizierung wesentlicher Ebenen in Sprachmodellen 在语言模型中统一地改进和确定关键层 2501.15054v2
  • 1222 06-08 Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search Unsichere LLM-basierte Suche: Quantitative Analyse und Minderung von Sicherheitsrisiken in der AI-Websuche 不安全的LLM-基于LLM的搜索:AI网络搜索中安全风险的定量分析和减轻 2502.04951v2
  • 1223 06-08 Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy Imperatives Lernen: Ein selbstüberwachtes neuro-symbolisches Lernrahmenwerk für Roboterautonomie 必要学习:自我监督的机器人自主的神经-双向学习框架 2406.16087v6
  • 1224 06-08 InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference InfoSEM: Ein tiefgehendes Generatives Modell mit informativen Prioren für Gene Regulatory Network Inferenz InfoSEM: 基因监管网络推论具有信息前置信息的深生成模型 2503.04483v2
  • 1225 06-08 Improving the Efficiency of Long Document Classification using Sentence Ranking Approach Verbesserung der Effizienz der Langdokumentklassifikation mittels Sentence-Ranking-Ansatz 采用判决分级办法提高长文件分类的效率 2506.07248v1
  • 1226 06-08 Promoting Ensemble Diversity with Interactive Bayesian Distributional Robustness for Fine-tuning Foundation Models Förderung der Ensemblevielfalt mit interaktiver bayesischer Distributions Robustheit für Feinsteuerungs-Stiftungsmodelle 促进混合多样性与互动的贝叶斯分配强力促进微调基础模型 2506.07247v1
  • 1227 06-08 Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs Übertaktung LLM-Reasoning: Überwachung und Kontrolle von Denkpfadlängen in LLMs LLM 解释:监测和控制LLM中思考长度 2506.07240v1
  • 1228 06-08 Learn as Individuals, Evolve as a Team: Multi-agent LLMs Adaptation in Embodied Environments Lernen als Individuen, Evolve als Team: Multi-Agent LLMs Anpassung in körpereigenen Umgebungen 作为个人学习,作为一个团队参与:多剂LMs在渗透环境中的适应 2506.07232v1
  • 1229 06-08 VARSHAP: Addressing Global Dependency Problems in Explainable AI with Variance-Based Local Feature Attribution VARSHAP: Bewältigung globaler Abhängigkeitsprobleme bei erklärbarer KI mit varianzbasierter lokaler Feature-Attribution VARSHAP: 解决全球依赖性问题,在可解释的AI 中与基于差异的地方特点 2506.07229v1
  • 1230 06-08 A spring-block theory of feature learning in deep neural networks Eine Feder-Block-Theorie des Feature-Lernens in tiefen neuronalen Netzwerken 深度神经网络中特征学习的弹簧-滑块理论 2407.19353v3
  • 1231 06-08 Targeting relative risk heterogeneity with causal forests Ausrichtung der relativen Risiko Heterogenität mit ursächlichen Wäldern 确定相对风险目标,确定与因果森林的异质性 2309.15793v3
  • 1232 06-08 VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data VORTEX: Ein räumliches Computing-Framework für optimierte Drohnentelemetrie-Extraktion aus First-Person-View-Flugdaten VORTEX:从第一人称视角飞行数据中优化提取无人机遥测的空间计算框架 2412.18505v2
  • 1233 06-08 Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts Diversifizierung des Expertenwissens für Task-Agnostic Pruning in Sparse Mixture-of-Experts 使专家在Sparse Mixture 专家中的任务不可知性调节专家知识多样化 2407.09590v4
  • 1234 06-08 Advancing Multimodal Reasoning Capabilities of Multimodal Large Language Models via Visual Perception Reward Mehr multimodale Begründungsfähigkeiten multimodaler Großsprachenmodelle durch Visual Perception Reward 通过视觉感知奖励促进多式大语言模型的多式多式能力 2506.07218v1
  • 1235 06-08 Decoupled Data Consistency with Diffusion Purification for Image Restoration Entkoppelte Datenkonsistenz mit Diffusionsreinigung für Bildrestaurierung 数据与用于图像恢复的传播净化相兼容 2403.06054v6
  • 1236 06-08 RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts RepLoRA: Reparameterisieren der Low-Rank-Anpassung über die Perspektive der Mischung von Experten RepLORA:通过专家混合的视角,恢复低成本适应的计量 2502.03044v2
  • 1237 06-08 Tilted Sharpness-Aware Minimization Gekippte Schärfe-Bewusst-Minimierung 静液尖锐度- 最小化 2410.22656v2
  • 1238 06-08 Randomness, exchangeability, and conformal prediction Zufälligkeit, Austauschbarkeit und konforme Vorhersage 随机性、易交换性以及符合性预测 2501.11689v3
  • 1239 06-08 Improving Generalization with Flat Hilbert Bayesian Inference Verbesserung der Generalisierung durch flache Hilbert Bayesische Schlussfolgerung 利用Flat Hilbert Bayesian推论改进普遍化 2410.04196v2
  • 1240 06-08 Another look at inference after prediction Ein weiterer Blick auf die Schlussfolgerung nach der Vorhersage 另一种观点是预测之后的推论 2411.19908v4
  • 1241 06-08 Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching Audio-Synthesizer-Inversion in symmetrischen Parameterräumen mit ungefähr gleichmässigem Flow-Matching 音频合成器在对称参数空间的反转, 大约等差流匹配 2506.07199v1
  • 1242 06-08 GGBall: Graph Generative Model on Poincaré Ball GGBall: Graph Generatives Modell auf Poincaré Ball GGBall:poincaré球球的图形生成模型 2506.07198v1
  • 1243 06-08 Finite-PINN: A Physics-Informed Neural Network with Finite Geometric Encoding for Solid Mechanics Finite-PINN: Ein physikinformiertes Neuronales Netzwerk mit Finite Geometrischer Encoding für feste Mechanik Finite-PINN:一个物理化神经网络,其中含有固体机械学的有限几何编码 2412.09453v2
  • 1244 06-08 Analyzing Breast Cancer Survival Disparities by Race and Demographic Location: A Survival Analysis Approach Analyse der Überlebensdisparitäten von Brustkrebs nach Rasse und demographischer Lage: Ein Ansatz zur Überlebensanalyse 按种族和人口地点分析乳腺癌生存差异:生存分析方法 2506.07191v1
  • 1245 06-08 ST-USleepNet: A Spatial-Temporal Coupling Prominence Network for Multi-Channel Sleep Staging ST-USleepNet: Ein Raum-Temporal-Koppelungs-Prominenz-Netzwerk für Multi-Channel-Sleep-Staging ST-USSLSLSINT:多通道睡眠调节空间-时际聚合显要网络 2408.11884v3
  • 1246 06-08 Learning based on neurovectors for tabular data: a new neural network approach Lernen auf Basis von Neurovektoren für tabellarische Daten: ein neuer neuronaler Netzwerkansatz 基于用于表格数据的神经动力学的学习:新的神经网络方法 2506.07185v1
  • 1247 06-08 Regularized Adaptive Graph Learning for Large-Scale Traffic Forecasting Regularisiertes Adaptives Graphenlernen für großräumige Verkehrsprognosen 用于大型交通流量预测的正规化适应性图表学习 2506.07179v1
  • 1248 06-08 RULE: Reinforcement UnLEarning Achieves Forget-Retain Pareto Optimality RULE: Verstärkung des Unerfahrenen erreicht Vergessen-Behalten Pareto-Optimalität RULE: 强化不放功率达到遗忘-保留最佳程度 2506.07171v1
  • 1249 06-08 Efficient Text-Attributed Graph Learning through Selective Annotation and Graph Alignment Effizientes Text-Attributes Graphenlernen durch selektive Annotation und Graphenausrichtung 通过选择性标注和图对齐实现高效的文本属性图学习 2506.07168v1
  • 1250 06-08 AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models AMoPO: Adaptive Multi-Objektive Preference-Optimierung ohne Prämienmodelle und Referenzmodelle AMOP: 适应性、多目标、不奖励模式和参考模式的优化 2506.07165v1
  • 1251 06-08 ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models ExLM: Nachdenken über die Auswirkungen von [MASK] Token in maskierten Sprachmodellen EXLM:重新思考[MASK]语调模式中[MASK]语调的影响 2501.13397v5
  • 1252 06-08 SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision SMI-Editor: Bearbeiten-basiertes SMILES-Sprachenmodell mit Fragment-Level-Überwachung SMI-编辑器:基于编辑的SMILES语言模式,带有零散监督级别 2412.05569v2
  • 1253 06-08 pFedSOP : Accelerating Training Of Personalized Federated Learning Using Second-Order Optimization pFedSOP : Beschleunigendes Training des personalisierten Föderierten Lernens unter Verwendung von Second-Order-Optimierung pFedSOP: 利用二阶优化加快个人化联邦学习培训 2506.07159v1
  • 1254 06-08 Syntactic Control of Language Models by Posterior Inference Syntaktische Kontrolle von Sprachmodellen durch hintere Inferenz 由Posideer 推断对语言模型的协同控制 2506.07154v1
  • 1255 06-08 Tensor Product Neural Networks for Functional ANOVA Model Tensor Produkt Neuronale Netzwerke für funktionales ANOVA-Modell 功能ANOVA模型的神经网络 2502.15215v4
  • 1256 06-08 Comba: Improving Bilinear RNNs with Closed-loop Control Comba: Bilineare RNNs mit Closed-Loop-Steuerung verbessern Comba: 改进有闭环控制的双线区域网网 2506.02475v2
  • 1257 06-08 How Does DPO Reduce Toxicity? A Mechanistic Neuron-Level Analysis Wie reduziert DPO die Toxizität? Eine mechanistische Neuronen-Level-Analyse DPO 如何减少毒性? 神经元层面的机制分析 2411.06424v3
  • 1258 06-08 Quantile-Optimal Policy Learning under Unmeasured Confounding Quantile-Optimal Policy Learning unter ungemessener Verwirrung 在无法计量的混乱状态下进行量化最佳政策学习 2506.07140v1
  • 1259 06-08 Evaluation of Machine Learning Models in Student Academic Performance Prediction Evaluation von Machine Learning Models in der studentischen Leistungsvorhersage 评价学生学术表现预测中的机器学习模式 2506.08047v1
  • 1260 06-08 Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning Zuverlässige Kritiker: Monotonische Verbesserung und Konvergenzgarantien für die Stärkung des Lernens 可靠的批评:加强学习的单体改进和融合保障 2506.07134v1
  • 1261 06-08 DiC: Rethinking Conv3x3 Designs in Diffusion Models DiC: Conv3x3-Designs in Diffusionsmodellen neu denken DiC: 重新思考传播模型中的Conv3x3 设计 2501.00603v2
  • 1262 06-08 Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models Gleichwertige Denoiser können Graphen nicht kopieren: Richten Sie Ihre Graphen-Diffusionsmodelle aus 等式 Denois 无法复制图表: 对齐您的图形扩散模型 2405.17656v2
  • 1263 06-08 Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models Quality-Diversity Red-Teaming: Automatisierte Generierung von hochwertigen und unterschiedlichen Attackern für große Sprachmodelle 质量-多样性红队:为大语言模式自动生成高品质和不同攻击者 2506.07121v1
  • 1264 06-08 A Hybrid Multi-Factor Network with Dynamic Sequence Modeling for Early Warning of Intraoperative Hypotension Hybrides Multi-Factor-Netzwerk mit dynamischer Sequenzmodellierung zur Frühwarnung von intraoperativer Hypotonie 混合多要素网络,具有动态序列模型模型,以及早警告不合作水分的不合作状态; 2409.11064v4
  • 1265 06-08 Conditional Denoising Meets Polynomial Modeling: A Flexible Decoupled Framework for Time Series Forecasting Bedingtes Stören trifft auf Polynommodellierung: Ein flexibles entkoppeltes Framework für die Zeitreihenprognose 满足多面性建模:时间序列预测灵活拆分框架 2410.13253v7
  • 1266 06-08 Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings Auf dem Weg zu einer universellen Offline-Black-Box-Optimierung über Sprachmodell-Embeddings 通过学习语言模型嵌入式嵌入,实现普遍离线黑箱优化 2506.07109v1
  • 1267 06-08 Filling the Missings: Spatiotemporal Data Imputation by Conditional Diffusion Füllung der fehlenden Daten: Spatiotemporale Daten-Imputation durch Bedingte Diffusion 填补失踪人员:通过有条件的传播来估计时空数据 2506.07099v1
  • 1268 06-08 A Theoretical Characterization of Optimal Data Augmentations in Self-Supervised Learning Eine theoretische Charakterisierung von Optimal Data Augmentations im Selbst-überwachten Lernen 自我监督学习中最佳数据增加的理论特征 2411.01767v4
  • 1269 06-08 Epidemic-guided deep learning for spatiotemporal forecasting of Tuberculosis outbreak Epidemisch geführtes Deep Learning für die spatiotemporale Vorhersage des Tuberkuloseausbruchs 流行病指导深层学习,以对结核病爆发进行时空预测 2502.10786v2
  • 1270 06-08 Patient Similarity Computation for Clinical Decision Support: An Efficient Use of Data Transformation, Combining Static and Time Series Data Patienten-Ähnlichkeitsrechnung für die klinische Entscheidungsunterstützung: Eine effiziente Nutzung von Datentransformationen, die Kombination von statischen und Zeitreihendaten 临床决定支持:有效利用数据转换、静态和时间系列数据 2506.07092v1
  • 1271 06-08 Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models Stellen Sie sich vor zu hören: Auditory Knowledge Generation kann ein effektiver Assistent für Sprachmodelle sein 想象一下:产生听力知识可以成为语言模式的有效助理。 2503.16853v2
  • 1272 06-08 State Entropy Regularization for Robust Reinforcement Learning Staatliche Entropie-Regularisierung für robustes Stärkungslernen 强力强化学习的状态元件常规化 2506.07085v1
  • 1273 06-08 Inverse Design of Metamaterials with Manufacturing-Guiding Spectrum-to-Structure Conditional Diffusion Model Inverses Design von Metamaterialien mit Fertigungsleit-Spektrum-zu-Struktur Bedingtes Diffusionsmodell 具有 “ 制造-引导光谱到结构有条件扩散模型 “ 的元材料逆向设计 2506.07083v1
  • 1274 06-08 On the Generalization of Data-Assisted Control in port-Hamiltonian Systems (DAC-pH) Zur Generalisierung der datengestützten Steuerung in port-Hamiltonschen Systemen (DAC-pH) 端口哈密顿系统中数据辅助控制的泛化(DAC-pH) 2506.07079v1
  • 1275 06-08 AgentDrug: Utilizing Large Language Models in an Agentic Workflow for Zero-Shot Molecular Optimization AgentDrug: Einsatz großer Sprachmodelle in einem Agentischen Workflow für die Null-Shot-Molekularoptimierung A剂药物:利用大型语言模型进行零热分子优化的制剂工作流程中的大语言模型 2410.13147v9
  • 1276 06-08 E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models E-BATS: Effiziente Backpropagation-freie Testzeitanpassung für Sprachstiftungsmodelle EBATS: 语音基础模型的有效后支持-无测试-时间适应 2506.07078v1
  • 1277 06-08 Diversity from Human Feedback Vielfalt durch menschliches Feedback 来自人类反馈的多样性 2310.06648v3
  • 1278 06-08 Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization Beschleunigung des 3D Gaussian Splatting mit neuraler Sortierung und achsenorientierter Rasterisierung 利用神经排序和面向轴的光栅化加速 3D Gaussian Splatting 2506.07069v1
  • 1279 06-08 Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems Stabile Derivative Freie Gaussian Mischung Variationelle Schlussfolgerung für Bayesian Inverse Probleme Bayes 反问题中稳定衍生自由高斯混集变异推论 2501.04259v3
  • 1280 06-08 AdaSplash: Adaptive Sparse Flash Attention AdaSplash: Adaptive Sparse Flash Attention AdaSplash: 自适应稀疏 Flash Attention 2502.12082v2
  • 1281 06-08 D2R: dual regularization loss with collaborative adversarial generation for model robustness D2R: Dualer Regularisierungsverlust mit kollaborativer konversatorischer Generation für Modell Robustheit D2R:双重身份正规化损失,为示范性稳健而协作性对抗性一代人 2506.07056v1
  • 1282 06-08 Policy Gradient with Tree Search: Avoiding Local Optimas through Lookahead Policy Gradient mit Baumsuche: Lokale Optimas durch Lookahead vermeiden 与树搜索的政策渐变:通过 Lookahead 避免本地 Oppima 2506.07054v1
  • 1283 06-08 Hyperbolic recurrent neural network as the first type of non-Euclidean neural quantum state ansatz Hyperbolisches rekurrentes neuronales Netzwerk als erste Art von nicht-euklidischem neuronalem Quantenzustandsansatz 双曲循环神经网络:第一类非欧几里得神经量子态拟设 2505.22083v2
  • 1284 06-08 AHSG: Adversarial Attack on High-level Semantics in Graph Neural Networks AHSG: Adversarialer Angriff auf hochrangige Semantik in Graphen-Neuralen Netzwerken AHSG:对图形神经网络高层语义的反向攻击 2412.07468v3
  • 1285 06-08 When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models Wenn Aufmerksamkeit zusammenbricht: Wie degenerierte Ebenen in LLMs kleinere, stärkere Modelle aktivieren 注意折叠时: LLM 中如何分解层的 LLM 启用更小、 更强模型 2404.08634v3
  • 1286 06-08 FairPFN: A Tabular Foundation Model for Causal Fairness FairPFN: Ein Tabular Foundation Modell für Causal Fairness FairPPFN: 因果关系公平性表示基金会模型 2506.07049v1
  • 1287 06-08 SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration SageAttention: Genaue 8-Bit-Attention für Plug-and-Play-Inferenzbeschleunigung SageAttention: 用于即插即用推理加速的精确 8-Bit 注意力 2410.02367v8
  • 1288 06-08 SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization SageAttention2: Effiziente Attention mit gründlicher Ausreißerglättung und Per-Thread-INT4-Quantisierung SageAttention2: 具有彻底离群值平滑和逐线程 INT4 量化的高效注意力 2411.10958v6
  • 1289 06-08 Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning Effiziente $Q$-Learning- und Actor-Critic-Methoden für robustes durchschnittliches Reward-Verstärkungs-Lernen 高效益平均奖励强化学习的学习和行为-竞争方法 2506.07040v1
  • 1290 06-08 BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modeling BRIDGE: Bootstrapping-Text zur Steuerung der Time-Series-Generation über Multi-Agent iterative Optimierung und Diffusionsmodellierung BRIDGE:通过多代理迭代优化和传播模型化控制时间- 系列生成的推进文本 2503.02445v4
  • 1291 06-08 Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond Learning-at-Criticality in großen Sprachmodellen für Quantenfeldtheorie und darüber hinaus 量子实地理论及以后大语言模型的学习困难 2506.03703v2
  • 1292 06-08 Mixture Experts with Test-Time Self-Supervised Aggregation for Tabular Imbalanced Regression Mixtur-Experten mit selbstüberwachter Aggregation für tabellarische Imbalanced Regression 用于制表平衡回归的测试时自我监督集成混合专家 2506.07033v1
  • 1293 06-08 AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint AlphaSteer:学习使用有原则的Null-空间限制的拒绝指导 2506.07022v1
  • 1294 06-08 LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models LEANCODE: Modelle besser verstehen für Code-Vereinfachung von vortrainierten großen Sprachmodellen LEANCODE: 更好地理解模式,以更好地简化培训前大语言模式的守则 2505.14759v3
  • 1295 06-08 Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation Struktur-agnostische Optimalität des doppelt robusten Lernens zur Abschätzung des Behandlungseffekts Doubly 治疗效果估测工具的“治疗效果强力学习”的结构-不可知性优化 2402.14264v4
  • 1296 06-08 TABLET: Table Structure Recognition using Encoder-only Transformers TABELLET: Tabelle Struktur Anerkennung mit Encoder-only Transformers TableT: 使用只使用编码器的变换器确认表格结构 2506.07015v1
  • 1297 06-08 Comparison of Lightweight Methods for Vehicle Dynamics-Based Driver Drowsiness Detection Vergleich von Leichtbaumethoden für Fahrzeugdynamik-basierte Fahrer-Drowsiness-Erkennung 机动机动机动机动机动机动车机动机动车驾驶识别漂泊情况轻型方法比较 2506.07014v1
  • 1298 06-08 Half-AVAE: Adversarial-Enhanced Factorized and Structured Encoder-Free VAE for Underdetermined Independent Component Analysis Half-AVAE: Adversarial-erhöhte faktorisierte und strukturierte Encoder-freie VAE für unterbestimmte unabhängige Komponentenanalyse 半AVAE:用于未确定独立部件分析的反强化加工厂和结构化无编码器 VAE 2506.07011v1
  • 1299 06-08 When do neural networks learn world models? Wann lernen neuronale Netzwerke Weltmodelle? 神经网络何时学习世界模型? 2502.09297v4
  • 1300 06-08 LLM-HDR: Bridging LLM-based Perception and Self-Supervision for Unpaired LDR-to-HDR Image Reconstruction LLM-HDR: Überbrückung der LLM-basierten Wahrnehmung und Selbstüberwachung für ungepaarte LDR-zu-HDR-Bildrekonstruktion LLM-HDR: 连接以LLM为基础的LLM感知和自我监督的未受重视的LDR到HDR图像重建 2410.15068v3
  • 1301 06-08 Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts Feynman-Kac Korrektoren in der Diffusion: Annealing, Anleitung und Produkt von Experten 扩散中的 Feynman-Kac 校正器:退火、引导与专家乘积 2503.02819v2
  • 1302 06-08 Automatic Doubly Robust Forests Automatic Doppelt robuste Wälder Doubly 自动硬森林 2412.07184v2
  • 1303 06-08 End-to-End Probabilistic Framework for Learning with Hard Constraints End-to-End-Probabilistisches Rahmenwerk für das Lernen mit harten Einschränkungen 最终到最终的艰苦制约因素学习概率框架 2506.07003v1
  • 1304 06-08 Towards Physics-informed Diffusion for Anomaly Detection in Trajectories Auf dem Weg zu physikinformierter Diffusion für Anomalienerkennung in Trajektorien 走向在物理知情的情况下传播用于轨迹中的异常探测 2506.06999v1
  • 1305 06-08 What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding Was macht Vernunftmodelle anders? Folgen Sie der Vernunftführerin für effiziente Decodierung 是什么让理性模型不同? 跟随高效配方的理性导师 2506.06998v1
  • 1306 06-08 Sample Complexity of Diffusion Model Training Without Empirical Risk Minimizer Access Probenkomplexität des Diffusionsmodelltrainings ohne empirischen Risikominimizer-Zugang 无需经验风险最小化器访问的扩散模型训练的样本复杂度 2505.18344v2
  • 1307 06-08 STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings STAMP Ihr Inhalt: Nachweis der Datensatzmitgliedschaft über Watermarked Rephrasings STAMP 您的内容: 通过水标修改方式证明数据集成员 2504.13416v2
  • 1308 06-08 Correcting for Position Bias in Learning to Rank: A Control Function Approach Korrektur für Position Bias in Learning to Rank: Ein Steuerungsfunktionsansatz 校正学习排名中的比值位置:控制功能方法 2506.06989v1
  • 1309 06-08 Fully Explainable Classification Models Using Hyperblocks Vollständig erklärbare Klassifikationsmodelle mit Hyperblocks 使用超超分块的完全可解释的分类模型 2506.06986v1
  • 1310 06-08 Deep RL Needs Deep Behavior Analysis: Exploring Implicit Planning by Model-Free Agents in Open-Ended Environments Deep RL braucht tiefe Verhaltensanalyse: Implizite Planung durch modellfreie Agenten in offenen Umgebungen erkunden 深度强化学习需要深度行为分析:探索开放式环境中无模型智能体的隐式规划 2506.06981v1
  • 1311 06-08 MoXGATE: Modality-aware cross-attention for multi-omic gastrointestinal cancer sub-type classification MoXGATE: Modalitätsbewusste Cross-Attention für multi-omere gastrointestinale Tumor-Subtypklassifikation MoXGATE:多工程胃肠胃癌亚型癌症分类模式 – – 意识交叉关注 2506.06980v1
  • 1312 06-08 Turb-L1: Achieving Long-term Turbulence Tracing By Tackling Spectral Bias Turb-L1: Langfristige Turbulenzen erreichen, die durch das Greifen spektraler Bias verfolgt werden Turb-L1:通过处理光辉双鱼,实现长期动荡追踪 2505.19038v2
  • 1313 06-08 Near Optimal Non-asymptotic Sample Complexity of 1-Identification Nahe Optimale nicht-asymptotische Probenkomplexität der 1-Identifikation 近于最优化的非不消毒样品 复杂度 1 - 识别 2506.06978v1
  • 1314 06-08 Post Reinforcement Learning Inference Rückschlüsse nach der Verstärkung des Lernens 强化后学习的推论 2302.08854v4
  • 1315 06-08 UdonCare: Hierarchy Pruning for Unseen Domain Discovery in Predictive Healthcare UdonCare: Hierarchie Pruning für Unsichtbare Domain Discovery in Predictive Healthcare UdonCare: 预测保健中未见的域域发现 的分级预留机制 2506.06977v1
  • 1316 06-08 Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning Lernen, durch verstärktes Lernen durch reward-gewichtete Feintuning zu klären 学习通过通过奖励-加权微调加强学习来澄清问题 2506.06964v1
  • 1317 06-08 Fitting Multilevel Factor Models Passende Multilevel-Faktormodelle 适合多级因数模型 2409.12067v3
  • 1318 06-08 CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration CE-CoLLM: Effiziente und adaptive Großsprachenmodelle durch Cloud-Edge-Kollaboration CE-COLLM:通过云眼协作建立高效和适应性大语言模型 2411.02829v2
  • 1319 06-08 Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs 超越自反应内核:历史驱动目标,实现高效的非线性非线性通用图形MCMC 2505.18300v2
  • 1320 06-08 Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression Sicherheitsbewusstes Verstärktes Lernen zur Kontrolle über risikosensitive Aktions-Value Iteration und quantitative Regression 通过风险敏感行动增值和量回归,加强安全软件强化学习,以通过风险敏感行动控制 2506.06954v1
  • 1321 06-07 (6) Is Your Training Pipeline Production-Ready? A Case Study in the Healthcare Domain Ist Ihr Training Pipeline Production-Ready? Eine Fallstudie im Bereich Healthcare 你的训练管道生产-准备? 保健领域案例研究 2506.06946v1
  • 1322 06-07 Polar Hierarchical Mamba: Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences Polar Hierarchical Mamba: Auf dem Weg zur Streaming-LiDAR-Objekterkennung mit Punktwolken als egozentrische Sequenzen 极地分级马姆巴:向以点云为主心序列的分点云星探测星体径流激光雷达物体的方向迈进 2506.06944v1
  • 1323 06-07 Conditional Denoising Diffusion for ISAC Enhanced Channel Estimation in Cell-Free 6G Bedingte störende Diffusion für ISAC-Verstärkte Kanalabschätzung in zellfreier 6G 以无细胞6G方式为ISAC强化频道估计仪进行有条件的低排放传播 2506.06942v1
  • 1324 06-07 RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior RestoreGrad: Signal-Restaurierung mit Bedingten störenden Diffusions-Modellen mit gemeinsam gelernten Prior RestAGrad:使用有条件的低度扩散模型和先前联合研究的传播模型,恢复信号恢复信号 2502.13574v3
  • 1325 06-07 The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Die Illusion des Denkens: Die Stärken und Grenzen von Vernunftmodellen über das Lens of Problem Complexity verstehen 思考的幻觉:通过问题复杂焦点了解理性模型的长处和局限性 2506.06941v1
  • 1326 06-07 Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More Schärfe-Dynamik im NN-Training mit einem minimalistischen Beispiel verstehen: Die Auswirkungen von Dataset-Schwierigkeiten, Tiefe, Stochastik und mehr 在NN培训中以最微小的示例了解尖锐的动态:数据集难度、深度、储存性等的影响 2506.06940v1
  • 1327 06-07 Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning Intelligente Anomalieerkennung für Lane-Rendering mit Transformer mit selbstüberwachtem Pre-Training und maßgeschneidertem Feintuning 利用经自监督预训练和定制微调的Transformer进行车道渲染的智能异常检测 2312.04398v6
  • 1328 06-07 Scaling over Scaling: Exploring Test-Time Scaling Plateau in Large Reasoning Models Skalierung über Skalierung: Untersuchung des Testzeit-Skalierungs-Plateau in Großmodellen 缩放超过缩放: 探索大型理由模型中的试验时间缩放高台 2505.20522v2
  • 1329 06-07 Rewriting the Budget: A General Framework for Black-Box Attacks Under Cost Asymmetry Das Budget neu schreiben: Ein allgemeiner Rahmen für Black-Box-Angriffe unter Kosten-Asymmetrie 重写预算:费用不对称下的黑牛袭击总框架 2506.06933v1
  • 1330 06-07 Revisiting semi-supervised learning in the era of foundation models Halbbeaufsichtigtes Lernen im Zeitalter der Gründungsmodelle 在基础模式时代重新审查半监督的学习 2503.09707v2
  • 1331 06-07 BLEUBERI: BLEU is a surprisingly effective reward for instruction following BLEUBERI: BLEU ist eine überraschend effektive Belohnung für den Unterricht nach BLEU是一种令人惊讶的有效奖赏 2505.11080v2
  • 1332 06-07 Analysis of Thompson Sampling for Controlling Unknown Linear Diffusion Processes Analyse der Thompson-Probenahme zur Kontrolle unbekannter linearer Diffusionsprozesse Thompson 用于控制未知线性扩散过程的抽样分析 2206.09977v2
  • 1333 06-07 Distributionally Robust Learning in Survival Analysis Verteilungsweise robustes Lernen in der Überlebensanalyse 在生存分析中进行有力的分配学习 2506.01348v2
  • 1334 06-07 Collapse-Proof Non-Contrastive Self-Supervised Learning Kollapssicheres nicht-kontrastives selbstüberwachtes Lernen 防坍塌的非对比自监督学习 2410.04959v3
  • 1335 06-07 Basis Transformers for Multi-Task Tabular Regression Basistransformatoren für Multi-Task-Tabellarische Regression 多任务表格回归基础变换器 2506.06926v1
  • 1336 06-07 The global convergence time of stochastic gradient descent in non-convex landscapes: Sharp estimates via large deviations Die globale Konvergenzzeit stochastischer Gradientenabgänge in nicht konvexen Landschaften: Scharfe Schätzungen über große Abweichungen 非沿海地貌中悬浮性梯度下降的全球趋同时间:因巨大偏差而得出的粗略估计 2503.16398v2
  • 1337 06-07 Measurement to Meaning: A Validity-Centered Framework for AI Evaluation Messung zur Bedeutung: Ein gültigkeitszentrierter Rahmen für die AI-Bewertung 衡量到意义:AI评价的有效性-中心框架 2505.10573v3
  • 1338 06-07 Graph-Based Physics-Guided Urban PM2.5 Air Quality Imputation with Constrained Monitoring Data Graph-based Physics-Guided Urban PM2.5 Luftqualität Imputation mit eingeschränkten Überwachungsdaten 基于图的物理引导城市 PM2.5 空气质量插补(监测数据受限) 2506.06917v1
  • 1339 06-07 Graph Neural Networks in Modern AI-aided Drug Discovery Graph Neuronale Netzwerke in der modernen KI-gestützten Drug Discovery 现代人工智能辅助药物发现现代神经网络图 2506.06915v1
  • 1340 06-07 Uncertainty Estimation on Graphs with Structure Informed Stochastic Partial Differential Equations Ungewissheitsschätzung auf Graphen mit Struktur Informierte stochastische Partielle Differentialgleichungen 具有结构、知情斯托卡部分差别的图图的不确定性估计 2506.06907v1
  • 1341 06-07 When Incentives Backfire, Data Stops Being Human Wenn Incentives Backfire, Daten aufhören, Mensch zu sein 当鼓励反火时,数据停止成为人类 2502.07732v2
  • 1342 06-07 Prediction-Powered Adaptive Shrinkage Estimation Vorhersage-Powered Adaptive Schrumpfung Schätzung 预测力的适应性减缩估计值 2502.14166v2
  • 1343 06-07 An Adaptively Inexact Method for Bilevel Learning Using Primal-Dual Style Differentiation Eine adaptiv ungenaue Methode für das Zwei-Stufen-Lernen mit primär-dualer Stildifferenzierung 使用纯-双式风格差异的双级学习适应性不精确方法 2412.06436v3
  • 1344 06-07 TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training TorchTitan: Ein-Stop PyTorch native Lösung für produktionsfertiges LLM-Vortraining TorchTitan:面向生产就绪LLM预训练的一站式PyTorch原生解决方案 2410.06511v3
  • 1345 06-07 NSD-Imagery: A benchmark dataset for extending fMRI vision decoding methods to mental imagery NSD-Imagery: Ein Benchmark-Datensatz zur Erweiterung von fMRI-Vision-Dekodierungsmethoden auf geistige Bilder NSD- 想象力:将FMRI愿景解码方法扩展至精神图像的基准数据集 2506.06898v1
  • 1346 06-07 Exploring Image Transforms derived from Eye Gaze Variables for Progressive Autism Diagnosis Erforschen von Bildtransformationen, abgeleitet von Eye Gaze-Variablen für die progressive Autismus-Diagnose 探索从用于渐进自闭症诊断的眼盖变量中产生的图像变换 2506.09065v1
  • 1347 06-07 LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models LLM-SRBench: Ein neuer Benchmark für die Entdeckung wissenschaftlicher Gleichungen mit großen Sprachmodellen LLM-SRBench:采用大语言模式科学等同发现新基准 2504.10415v2
  • 1348 06-07 Multimodal Integration of Longitudinal Noninvasive Diagnostics for Survival Prediction in Immunotherapy Using Deep Learning Multimodale Integration der longitudinalen nichtinvasiven Diagnostik zur Überlebensvorhersage in der Immuntherapie mittels Deep Learning 利用深层学习的免疫疗法中生存预测纵向非侵入性诊断的多模式综合 2411.18253v2
  • 1349 06-07 Scalable Gaussian Processes with Latent Kronecker Structure Skalierbare Gaußprozesse mit latenter Kronecker-Struktur 具有潜在克罗内克结构的可扩展高斯过程 2506.06895v1
  • 1350 06-07 A Novel Neural Filter to Improve Accuracy of Neural Network Models of Dynamic Systems Ein neuartiger neuraler Filter zur Verbesserung der Genauigkeit neuraler Netzwerkmodelle dynamischer Systeme 提高动态系统神经网络模型准确性的新神经过滤器 2409.13654v2
  • 1351 06-07 Can In-Context Reinforcement Learning Recover From Reward Poisoning Attacks? Kann In-Context-Verstärkung lernen, sich von Belohnungs-Vergiftungsangriffen zu erholen? 文中加强学习能从受毒害袭击中恢复吗? 2506.06891v1
  • 1352 06-07 FREE: Fast and Robust Vision Language Models with Early Exits FREE: Schnelle und robuste Vision-Sprachmodelle mit frühen Ausgängen FREE:具有早退机制的快速且鲁棒的视觉语言模型 2506.06884v1
  • 1353 06-07 LookAlike: Consistent Distractor Generation in Math MCQs LookAlike: Konsistente Distraktor-Generation in Math MCQs 类似外观: 在数学 MCQs 中, 一致的分隔器生成 2505.01903v2
  • 1354 06-07 Empowering COVID-19 Detection: Optimizing Performance Through Fine-Tuned EfficientNet Deep Learning Architecture Empowering COVID-19 Detection: Optimierung der Leistung durch eine effiziente Net Deep Learning Architektur 增强COVID-19探测能力:通过精巧高效网络深层学习架构优化业绩 2311.16593v2
  • 1355 06-07 Learnable Activation Functions in Physics-Informed Neural Networks for Solving Partial Differential Equations Lernbare Aktivierungsfunktionen in physikinformierten Neuronalen Netzwerken zur Lösung partieller Differentialgleichungen 用于解决部分差异等量的物理内成神经网络中可学习的激活功能 2411.15111v3
  • 1356 06-07 An Optimized Ensemble Deep Learning Model For Brain Tumor Classification Ein optimiertes Ensemble Deep Learning Modell für die Klassifizierung von Gehirntumoren 大脑肿瘤分类最佳组合深层学习模型 2305.12844v3
  • 1357 06-07 Improved Sample Complexity for Private Nonsmooth Nonconvex Optimization Verbesserte Probenkomplexität für private Nonsmooth Nonconvex-Optimierung 改进私人非机动非康通非节流优化的样品复杂度 2410.05880v2
  • 1358 06-07 Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding Weit und deutlich sehen: Halluzinationen in MLLMs mit Attention Causal Decoding abbauen 看得远、看得清:通过注意力因果解码减轻MLLM中的幻觉 2505.16652v2
  • 1359 06-07 Learning Fluid-Structure Interaction Dynamics with Physics-Informed Neural Networks and Immersed Boundary Methods Learning Fluid-Struktur-Interaktion Dynamik mit physikinformierten Neuronalen Netzwerken und eingetauchten Grenzmethoden 与物理内成形神经网络和混合边界方法的互动动态 2505.18565v2
  • 1360 06-07 Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning Log-Sum-Exponential-Estimator für Off-Policy-Evaluierung und -Lernen 用于离策略评估和学习的对数-求和-指数估计器 2506.06873v1
  • 1361 06-07 Watermark under Fire: A Robustness Evaluation of LLM Watermarking Wasserzeichen unter Feuer: Eine Robustheitsbewertung von LLM Watermarking 火灾下的水印:对LLM水标志的强力评价 2411.13425v3
  • 1362 06-07 KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies KEA: Exploration durch proaktiv koordinierte Explorationsstrategien am Leben erhalten KEA:通过主动协调的勘探战略使勘探活动保持活跃 2503.18234v2
  • 1363 06-07 SAFE: Finding Sparse and Flat Minima to Improve Pruning SAFE: Sparse und flache Minima finden, um Pruning zu verbessern SAFE:寻找稀疏且平坦的极小值以改进剪枝 2506.06866v1
  • 1364 06-07 Generalization and Robustness of the Tilted Empirical Risk Verallgemeinerung und Robustheit des abgeschwächten Empirischen Risikos 微薄经验风险的概括化和有力性 2409.19431v3
  • 1365 06-07 Multimodal Spatial Language Maps for Robot Navigation and Manipulation Multimodale räumliche Sprachkarten für Roboternavigation und -manipulation 用于机器人导航和操纵的多模式空间语言地图 2506.06862v1
  • 1366 06-07 Differentially Private Sparse Linear Regression with Heavy-tailed Responses Unterschiedlich Private Sparse Lineare Regression mit stark gestaffelten Responses 具有重尾量反应的 不同私人的粗线性递减 2506.06861v1
  • 1367 06-07 High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations High-Fidelity Scientific Simulation Surrogats über adaptive Implizite Neuraldarstellungen 通过适应性隐隐性神经仪表显示的高纤维科学模拟模拟器 2506.06858v1
  • 1368 06-07 Free Random Projection for In-Context Reinforcement Learning Freie Zufallsprojektion für In-Context-Verstärkung 内容强化学习自由随机投射 2504.06983v2
  • 1369 06-07 JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data JESTR: Gemeinsame Einbettung von Weltraumtechnik für Ranking-Kandidatenmoleküle für die Annotation untargeted Metabolomics Data JESTR: 联合嵌入空间技术,以评分非目标元元数据评分候选候选分子 2411.14464v3
  • 1370 06-07 Curvature Enhanced Data Augmentation for Regression Verstärkte Datenvergrößerung für Regression 后回归加速度增强数据增强值 2506.06853v1
  • 1371 06-07 Active Preference Optimization for Sample Efficient RLHF Aktive Preference-Optimierung für Probeneffizienz RLHF 抽样高效RLHF的主动首选最佳优化 2402.10500v3
  • 1372 06-07 A Statistical Framework for Model Selection in LSTM Networks Ein statistischer Rahmen für die Modellauswahl in LSTM-Netzwerken LSTM网络模式选择统计框架 2506.06840v1
  • 1373 06-07 Harnessing Vision-Language Models for Time Series Anomaly Detection Nutzung von Vision-Language-Modellen für Zeitreihenanomalienerkennung 利用视觉语言模型进行时间序列异常探测 2506.06836v1
  • 1374 06-07 LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models LLM-attacker: Verbesserung der Closed-Loop Adversarial Scenario Generation für autonomes Fahren mit großen Sprachmodellen LLM-attacker:利用大语言模型加强自动驾驶的闭环对抗场景生成 2501.15850v2
  • 1375 06-07 FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-World LoRA FedALT: Federated Fine-Tuning durch Adaptives lokales Training mit Rest-of-World LoRA FedALT:通过与世界其余地区合作的适应性地方培训实现联邦退休 2503.11880v2
  • 1376 06-07 The Currents of Conflict: Decomposing Conflict Trends with Gaussian Processes Die Strömungen des Konflikts: Konflikttrends mit gaussischen Prozessen zersetzen 冲突流:与高斯进程冲突趋势的分解 2506.06828v1
  • 1377 06-07 Data Shapley in One Training Run Data Shapley in einem Trainingslauf 单次训练运行中的Data Shapley 2406.11011v3
  • 1378 06-07 ASPO: Constraint-Aware Bayesian Optimization for FPGA-based Soft Processors ASPO: Constraint-Aware Bayesian Optimierung für FPGA-basierte Softprozessoren ASPO:面向基于FPGA的软处理器的约束感知贝叶斯优化 2506.06817v1
  • 1379 06-07 Path Integral Optimiser: Global Optimisation via Neural Schrödinger-Föllmer Diffusion Path Integral Optimiser: Globale Optimierung über Neural Schrödinger-Föllmer Diffusion 路径综合优化:通过Neural Schrödinger-Föllmer扩散实现全球优化 2506.06815v1
  • 1380 06-07 Homophily-Driven Sanitation View for Robust Graph Contrastive Learning Homophil-getriebene Sanitäransicht für robustes Graph Kontrastives Lernen 强力图表对比学习的光极- 驱动光学卫生视图 2307.12555v2
  • 1381 06-07 Efficient Parallelization of Message Passing Neural Network Potentials for Large-scale Molecular Dynamics Effiziente Parallelisierung von Message-Passing-Neural Network-Potenziale für großmaßstäbliche molekulare Dynamiken 大型分子动态信息传递神经网络潜力的有效平行 2505.06711v3
  • 1382 06-07 Fast Causal Discovery by Approximate Kernel-based Generalized Score Functions with Linear Computational Complexity Schnelle kausale Entdeckung durch ungefähre Kernel-basierte generalisierte Score-Funktionen mit linearer Computational Complexity 以近端内核为基础的通用计分函数及线性比较复杂度 2412.17717v2
  • 1383 06-07 IMPA-HGAE:Intra-Meta-Path Augmented Heterogeneous Graph Autoencoder IMPA-HGAE:Intra-Meta-Path Augmented Heterogene Graph Autoencoder IMPA-HGAE: 甲基板内增增异异异形图形自动编码器 2506.06809v1
  • 1384 06-07 Label-semantics Aware Generative Approach for Domain-Agnostic Multilabel Classification Label-Semantik Aware Generativer Ansatz für Domain-Agnostic Multilabel-Klassifikation 域-不可知性多标签分类的认知生成方法 2506.06806v1
  • 1385 06-07 Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs Dialog ohne Grenzen: Konstant-Sized KV-Caches für erweiterte Reaktionen in LLMs 无限制对话框: 普通 KV 宽度响应库( LLM ) 2503.00979v2
  • 1386 06-07 Prediction-Enhanced Monte Carlo: A Machine Learning View on Control Variate Vorhersage-erweitert Monte Carlo: Eine Machine-Learning-Ansicht auf Steuerungsvariate 预测增强的蒙特卡洛:关于控制Variatte的机械学习观点 2412.11257v3
  • 1387 06-07 EasyFS: an Efficient Model-free Feature Selection Framework via Elastic Transformation of Features EasyFS: Ein effizientes modellfreies Feature Selection Framework über die elastische Transformation von Features EasyFS:通过特征弹性变换实现的高效无模型特征选择框架 2402.05954v2
  • 1388 06-07 Refining Adaptive Zeroth-Order Optimization at Ease Verfeinerung adaptiver Zeroth-Order-Optimierung bei Leichtigkeit Ease 的适应性零极优化 2502.01014v2
  • 1389 06-07 Is Optimal Transport Necessary for Inverse Reinforcement Learning? Ist ein optimaler Verkehr für das Inverse Verstärkungslernen notwendig? 最佳交通是否为反强化学习所必需? 2506.06793v1
  • 1390 06-07 Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models Ferret: Federated Full-Parameter Tuning auf Scale für große Sprachmodelle Ferret:在大语言模型规模上采用联邦全票全票制表 2409.06277v3
  • 1391 06-07 FuncGNN: Learning Functional Semantics of Logic Circuits with Graph Neural Networks FuncGNN: Funktionelle Semantik von Logic Circuits mit Graph Neural Networks lernen FuncGNN: 学习具有图形神经网络的逻辑电路功能性词义 2506.06787v1
  • 1392 06-07 Caterpillar GNN: Replacing Message Passing with Efficient Aggregation Caterpillar GNN: Ersetzen von Message Passing durch effiziente Aggregation Caterpillar GNN:以高效聚合取代消息传递 2506.06784v1
  • 1393 06-07 Interpretable LLMs for Credit Risk: A Systematic Review and Taxonomy Interpretierbare LLMs für Kreditrisiken: Systematische Überprüfung und Taxonomie 面向信用风险的可解释LLM:系统综述与分类法 2506.04290v2
  • 1394 06-07 Feature-Based Instance Neighbor Discovery: Advanced Stable Test-Time Adaptation in Dynamic World Feature-Based Instance Neighbor Discovery: Erweiterte stabile Test-Time-Anpassung in dynamischer Welt 以地貌为基础的实例邻国发现:动态世界中的高级稳定测试时间适应 2506.06782v1
  • 1395 06-07 Continuous-Time SO(3) Forecasting with Savitzky–Golay Neural Controlled Differential Equations Continuous-Time SO(3) Vorhersage mit Savitzky–Golay Neural gesteuerte Differentialgleichungen 连续时间 SO(3) 使用Savitzky-Golay 控制神经控制的神经等量的预测 2506.06780v1
  • 1396 06-07 Continuous Semi-Implicit Models Kontinuierliche semiimplizite Modelle 连续半集成模型 2506.06778v1
  • 1397 06-07 Label Distribution Learning using the Squared Neural Family on the Probability Simplex Label-Distribution-Lernen mit der Quadratischen Neural-Familie auf der Wahrscheinlichkeit Simplex 利用 “ 概率简单 “ 上的平方神经家庭进行标签分布学习 2412.07324v2
  • 1398 06-07 Taming Wild Branches: Overcoming Hard-to-Predict Branches using the Bullseye Predictor Wilde Zweige zähmen: Hard-to-Predict-Zweige mit dem Bullseye Predictor überwinden 塔姆荒野分支:利用 “ 红心 “ 预测仪克服难以预防的分支 2506.06773v1
  • 1399 06-07 On Adversarial Robustness of Language Models in Transfer Learning Zur widerlichen Robustheit von Sprachmodellen im Transfer-Lernen 转让学习中语言模式的反效果 2501.00066v2
  • 1400 06-07 AfroBench: How Good are Large Language Models on African Languages? AfroBench: Wie gut sind große Sprachmodelle auf afrikanischen Sprachen? AfroBench:非洲语言大语言模式有多好? 2311.07978v5
  • 1401 06-07 Xeno-learning: knowledge transfer across species in deep learning-based spectral image analysis Xeno-Learning: Wissenstransfer über Arten hinweg in der Deep Learning-basierten Spektralbildanalyse Xeno-学习:深学习光谱图象分析中的跨物种知识转让 2410.19789v2
  • 1402 06-07 The OCR Quest for Generalization: Learning to recognize low-resource alphabets with model editing OCR Quest for Generalization: Lernen, Low-Resource-Alphabets mit Modellbearbeitung zu erkennen OCR 普遍化调查:学会以示范编辑方式识别低资源字母表 2506.06761v1
  • 1403 06-07 Bio-Inspired Classification: Combining Information Theory and Spiking Neural Networks – Influence of the Learning Rules Bio-Inspirierte Klassifikation: Verknüpfung von Informationstheorie und spikenden Neuronalen Netzwerken – Einfluss der Lernregeln 生物启发分类:信息理论与Spiking神经网络相结合 – – 学习规则的影响 2506.06750v1
  • 1404 06-07 Honey, I shrunk the hypothesis space (through logical preprocessing) Schatz, ich habe den Hypothesenraum geschrumpft (durch logische Vorverarbeitung) 亲爱的,我缩小了假设空间(通过逻辑预处理) 2506.06739v1
  • 1405 06-07 WorldLLM: Improving LLMs’ world modeling using curiosity-driven theory-making WorldLLM: Verbesserung der Weltmodellierung von LLMs durch neugiergetriebene Theoriebildung WorldLLM:利用好奇心驱动的理论构建改进LLM的世界建模 2506.06725v1
  • 1406 06-07 Directly Forecasting Belief for Reinforcement Learning with Delays Direkt voraussagender Glaube für verstärktes Lernen mit Verzögerungen 直接预测学习迟迟加强学习的信念 2505.00546v2
  • 1407 06-07 IQFM A Wireless Foundational Model for I/Q Streams in AI-Native 6G IQFM Ein drahtloses Grundmodell für I/Q Streams in AI-Native 6G AI-Native 6G的I/Q流无线无线基础模型 2506.06718v1
  • 1408 06-07 Online SLA Decomposition: Enabling Real-Time Adaptation to Evolving Network Systems Online-SLA-Zersetzung: Echtzeit-Anpassung an sich entwickelnde Netzwerksysteme ermöglichen SLA在线分解:使实时适应不断演变的网络系统 2408.08968v5
  • 1409 06-07 Towards Black-Box Membership Inference Attack for Diffusion Models Auf dem Weg zu Black-Box-Mitgliedschafts-Inferenz-Angriff für Diffusionsmodelle 争取为扩散模型进行黑箱成员推导攻击 2405.20771v4
  • 1410 06-07 Understanding Nonlinear Implicit Bias via Region Counts in Input Space Nichtlineare Implizite Bias über Regionszählungen im Eingaberaum verstehen 通过输入空间中的区域计数理解非线性隐隐含比值 2505.11370v2
  • 1411 06-07 RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking RED QUEEN: Schutz großer Sprachmodelle gegen verschleierte Multi-Turn-Jailbreaking RED QUEEN: 保护大型语言模型,防止隐蔽的多轮破门破门 2409.17458v2
  • 1412 06-07 Do Protein Transformers Have Biological Intelligence? Haben Proteintransformatoren biologische Intelligenz? 蛋白质变异器有生物智能吗? 2506.06701v1
  • 1413 06-07 MarginSel : Max-Margin Demonstration Selection for LLMs MarginSel : Max-Margin-Demonstrationsauswahl für LLMs MarginSel:面向LLM的最大间隔示例选择 2506.06699v1
  • 1414 06-07 Contextual Experience Replay for Self-Improvement of Language Agents Kontextuelles Erfahrungsreplay zur Selbstverbesserung von Sprachagenten 语文代理人员自我改进的内事经验回顾 2506.06698v1
  • 1415 06-07 Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning Breaking Data Silos: Auf dem Weg zu offenen und skalierbaren Mobilitäts-Stiftungsmodellen durch generatives kontinuierliches Lernen 突破数据硅:通过创造持续学习实现开放和可缩放的流动基金会模式 2506.06694v1
  • 1416 06-07 From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications Vom Tiefrang gradienten Subraumstabilisierung zu Low-Rank-Gewichten: Beobachtungen, Theorien und Anwendungen 从低级梯度子空间稳定到低射速重量:观测、理论和应用 2407.11239v2
  • 1417 06-07 Learning Robust Heterogeneous Graph Representations via Contrastive-Reconstruction under Sparse Semantics Robuste Heterogene Graphendarstellungen über Kontrastive-Rekonstruktion unter Sparsemantik lernen 在粗简语义下通过对比重建进行学习强强强异基因图示 2506.06682v1
  • 1418 06-07 From Offline to Online Memory-Free and Task-Free Continual Learning via Fine-Grained Hypergradients Von Offline zu Online Memory-Free und Task-Free Continual Learning über feinkörnige Hypergradienten 从离线到在线无记忆和无任务持续学习 2502.18762v2
  • 1419 06-07 Interpretation of Deep Learning Model in Embryo Selection for In Vitro Fertilization (IVF) Treatment Interpretation von Deep-Learning-Modell in der Embryo-Auswahl für die In-Vitro-Düngung (IVF) Behandlung 体外受肥(IVF)治疗Embryo选择 Empryo的深学习模型解释 2506.06680v1
  • 1420 06-07 Rethinking Benign Overfitting in Two-Layer Neural Networks Neudenken Benign Overfitting in Zwei-Ebenen-Neural-Netzwerke 重新思考两层神经网络中的 “ 仁善 “ 改造 2502.11893v2
  • 1421 06-07 Accounting for plasticity: An extension of inelastic Constitutive Artificial Neural Networks Bilanzierung der Plastizität: Eine Erweiterung der inelastischen konstitutiven künstlichen Neuronalen Netzwerke 可塑性核算:无弹性建筑人工神经神经网络的扩展 2407.19326v2
  • 1422 06-07 Flood-DamageSense: Multimodal Mamba with Multitask Learning for Building Flood Damage Assessment using SAR Remote Sensing Imagery Flood-DamageSense: Multimodales Mamba mit Multitasking-Lernen zur Erstellung von Hochwasserschäden mit Hilfe von SAR Remote Sensing Imagery 水灾-灾情警报:利用SAR遥感图像学进行水灾损害评估的多任务学习的多模式Mamba 2506.06667v1
  • 1423 06-07 Through the Gaps: Uncovering Tactical Line-Breaking Passes with Clustering Durch die Lücken: Entdeckt taktische Linien-Breaking-Pässe mit Clustering 穿透差距:以集束方式破开战术直线通道 2506.06666v1
  • 1424 06-07 SDP-CROWN: Efficient Bound Propagation for Neural Network Verification with Tightness of Semidefinite Programming SDP-CROWN: Effiziente Bound Propagation für neurale Netzwerkverifikation mit Zähigkeit der semidefiniten Programmierung SDP-CROWN: 有效促进神经网络核查和半无限期方案拟订的严格性 2506.06665v1
  • 1425 06-07 SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL SLAC: Simulationsvortrainierter Latent Action Space für Ganzkörper-Real-World RL SLAC:面向全身真实世界RL的仿真预训练潜在动作空间 2506.04147v2
  • 1426 06-07 Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence Kurzlanges Adversarial Training hilft LLMs, langanhaltende Jailbreak Attacken zu verteidigen: Theoretische und empirische Beweise 短期对面培训帮助长者捍卫长期越狱袭击:理论和经验证据 2502.04204v2
  • 1427 06-07 Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning Q-Policy: Quantenverstärkte Politikevaluierung für skalierbares Stärkungslernen Q-政策:为可扩展的强化学习加强政策评价 2505.11862v2
  • 1428 06-07 Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning Pareto-Optimal Rewards von Noisy Preferences lernen: Ein Rahmen für multi-objektives Inverse-Verstärkung-Lernen 从含噪偏好中学习帕累托最优奖励:多目标逆强化学习框架 2505.11864v2
  • 1429 06-07 GraphRAG under Fire GraphRAG unter Feuer 火下的图形RAG 2501.14050v3
  • 1430 06-07 Rescaled Influence Functions: Accurate Data Attribution in High Dimension Reskalierte Einflussfunktionen: Genaue Datenzuweisung in hoher Dimension 重新尺度影响函数: 高维度的准确数据归属 2506.06656v1
  • 1431 06-07 Explaining Risks: Axiomatic Risk Attributions for Financial Models Risiken erklären: Axiomatische Risikozuweisungen für Finanzmodelle 解释风险:金融模型的逻辑风险归属 2506.06653v1
  • 1432 06-07 Protap: A Benchmark for Protein Modeling on Realistic Downstream Applications Protap: Ein Benchmark für Proteinmodellierung auf realistischen Downstream-Anwendungen Protap:关于现实下下游应用的蛋白质建模基准 2506.02052v2
  • 1433 06-07 SAFER: A Calibrated Risk-Aware Multimodal Recommendation Model for Dynamic Treatment Regimes SAFER: Ein kalibriertes risikobewusstes multimodales Empfehlungsmodell für dynamische Behandlungsregime SAFER:面向动态治疗方案的经校准的风险感知多模态推荐模型 2506.06649v1
  • 1434 06-07 Spark Transformer: Reactivating Sparsity in FFN and Attention Spark Transformer: Sparsity in FFN und Attention reaktivieren Spark Transformer:重新激活FFN和注意力中的稀疏性 2506.06644v1
  • 1435 06-07 GLASS: Guided Latent Slot Diffusion for Object-Centric Learning GLASS: Geführte Latent Slot Diffusion für Objekt-Centric Learning GLASS: 向导中中流槽扩散,用于对象中心学习 2407.17929v2
  • 1436 06-07 Zero-shot Generalist Graph Anomaly Detection with Unified Neighborhood Prompts Zero-shot Generalist Graph Anomalie Erkennung mit Unified Neighborhood Prompts 使用统一邻里提示器进行异常探测 2410.14886v2
  • 1437 06-07 Non-Intrusive Load Monitoring Based on Image Load Signatures and Continual Learning Nicht-intrusive Lastüberwachung basierend auf Bildlastsignaturen und kontinuierlichem Lernen 基于图像加载签名和不断学习的无侵入性负载监测 2506.06637v1
  • 1438 06-07 Vision-QRWKV: Exploring Quantum-Enhanced RWKV Models for Image Classification Vision-QRWKV: Erforschung quantenverstärkter RWKV-Modelle für die Bildklassifikation Vision-QRWKV:探索用于图像分类的量子增强RWKV模型 2506.06633v1
  • 1439 06-07 Arbitrarily-Conditioned Multi-Functional Diffusion for Multi-Physics Emulation Willkürlich bedingte multifunktionale Diffusion für multiphysikalische Emulation 多物理模拟模拟多功能多功能扩散 2410.13794v2
  • 1440 06-07 Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach Wiederherstellung der Kalibrierung für ausgerichtete große Sprachmodelle: Ein Calibration-Aware Fine-Tuning-Ansatz 统一大语言模型的重新校准:校准-软件精美调试方法 2505.01997v2
  • 1441 06-07 Harnessing On-Device Large Language Model: Empirical Results and Implications for AI PC Nutzung von On-Device Large Language Modell: Empirische Ergebnisse und Implikationen für KI-PC 利用大语言大语言模型:经验结果和对AIPC的影响 2505.15030v3
  • 1442 06-07 Policy Filtration for RLHF to Mitigate Noise in Reward Models Policy-Filtration für RLHF zur Minderung von Rauschen in Belohnungsmodellen 面向RLHF的策略过滤,以减轻奖励模型中的噪声 2409.06957v5
  • 1443 06-07 Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning Lehrplan Verstärkung Lernen von einfachen bis harten Aufgaben verbessert LLM Reasoning 从易到困难的工作学习改进LLM 理由 2506.06632v1
  • 1444 06-07 Q-WSL: Optimizing Goal-Conditioned RL with Weighted Supervised Learning via Dynamic Programming Q-WSL: Optimierung der zielkonditionierten RL mit gewichtetem betreutem Lernen durch dynamische Programmierung Q-WSL:通过动态方案拟订,以加权监督的学习方式,优化目标有条件的RL 2410.06648v5
  • 1445 06-07 Certifiably Robust Model Evaluation in Federated Learning under Meta-Distributional Shifts Zertifiziert robuste Modellbewertung im Federated Learning unter Meta-Distributionsverschiebungen 在混合分配变化下对联邦学习模式进行可验证的健全模式评价 2410.20250v2
  • 1446 06-07 Equilibrium Distribution for t-Distributed Stochastic Neighbor Embedding with Generalized Kernels Gleichgewichtsverteilung für t-Distributed Stochastic Neighbor Einbettung mit generalisierten Kernen 以一般内核嵌入的三散散散蒸散邻居间均匀分布 2505.24311v2
  • 1447 06-07 Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations Robuste Erlernbarkeit von Proben-kompressiblen Verteilungen unter Lärm oder störenden Störungen 在噪音或对称干扰下抽样压缩分发的可有力学习性 2506.06613v1
  • 1448 06-07 Transferring Features Across Language Models With Model Stitching Übertragung von Funktionen über Sprachmodelle mit Modellstich 使用模型裁剪的跨语言模型传输功能 2506.06609v1
  • 1449 06-07 Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit Training-Free Tokenizer Transplantation über Orthogonal Passende Verfolgung 通过矫形配对运动进行无培训的取暖器移植 2506.06607v1
  • 1450 06-07 Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization Stacey: Förderung des stochastischen Steepest Descents durch beschleunigte Optimierung von $\ell_p$-Smooth Nonconvex Stacey:通过加速的 $\ell_p$-光滑非凸优化推进随机最速下降 2506.06606v1
  • 1451 06-07 Scoring the Unscorables: Cyber Risk Assessment Beyond Internet Scans Bewertung der Unscorables: Cyber-Risikobewertung jenseits von Internet-Scans 为不可评分者评分:超越互联网扫描的网络风险评估 2506.06604v1
  • 1452 06-07 CAtCh: Cognitive Assessment through Cookie Thief CAtCh: Kognitive Bewertung durch Cookie Dieb CAtCh: 通过饼干小偷进行认知评估 2506.06603v1
  • 1453 06-07 Direct Prediction Set Minimization via Bilevel Conformal Classifier Training Direkte Vorhersage Einstellung Minimierung über Bilevel Conformal Classifier Training 通过双级正规分类培训尽量减少直接预测设置 2506.06599v1
  • 1454 06-07 Scaling Inference-Efficient Language Models Skalierung von Schlussfolgerungen-Effiziente Sprachmodelle 扩展率-有效语言模型 2501.18107v2
  • 1455 06-07 From Model-Based and Adaptive Control to Evolving Fuzzy Control Von der modellbasierten und adaptiven Steuerung zur sich entwickelnden Fuzzy-Steuerung 从模型和适应控制到演化模糊控制 2506.06594v1
  • 1456 06-06 (5) Improving the Diffusability of Autoencoders Verbesserung der Diffusibilität von Autoencodern 提高自动编码器的易变性 2502.14831v3
  • 1457 06-06 Neighbour-Driven Gaussian Process Variational Autoencoders for Scalable Structured Latent Modelling Nachbar-getriebene Gauß-Prozessvariationelle Autoencoder für skalierbare strukturierte Latentenmodellierung 相邻的Driven Gaussian 用于可缩放结构晚期模拟模型的流程变异自动编码器 2505.16481v2
  • 1458 06-06 Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures Globale Konvergenz von Gradienten EM für überparameterisierte Gauß-Mischungen 用于超度测量高斯混合体的梯度 EM 全球趋同 2506.06584v1
  • 1459 06-06 Demystifying Topological Message-Passing with Relational Structures: A Case Study on Oversquashing in Simplicial Message-Passing Entmystifizieren Topologisches Message-Passing mit relationalen Strukturen: Eine Fallstudie zum Übersetzen in Simplicial Message-Passing 解开神秘的地形信息与关系结构的粘贴:关于简单信息传送中过度隔绝的案例研究 2506.06582v1
  • 1460 06-06 Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques Auf dem Weg zu einer effizienten Multi-LLM-Inferenz: Charakterisierung und Analyse von LLM Routing und Hierarchical Techniques 迈向高效的多LLM推理:LLM路由与层级技术的特征刻画和分析 2506.06579v1
  • 1461 06-06 Generalization of Geometric Graph Neural Networks with Lipschitz Loss Functions Generalisierung geometrischer Graphen-Neuralnetze mit Lipschitz-Verlustfunktionen 具有Lipschitz损失功能的几何图神经网络的通用化 2409.05191v2
  • 1462 06-06 DISC: DISC: Dynamic Decomposition Improves LLM Inference Scaling DISC: DISC: Dynamische Zersetzung verbessert LLM-Inferenzskalierung DISC: DISC: 动态分解改善LLM 推推法的扩大 2502.16706v2
  • 1463 06-06 A Manifold Perspective on the Statistical Generalization of Graph Neural Networks Eine vielfältige Perspektive auf die statistische Verallgemeinerung von Graphen-Neuralen Netzwerken 图表神经网络统计一般化的多功能视角 2406.05225v6
  • 1464 06-06 Graph Persistence goes Spectral Graph-Persistenz wird spektral 图持久性走向谱化 2506.06571v1
  • 1465 06-06 Reconstructing Heterogeneous Biomolecules via Hierarchical Gaussian Mixtures and Part Discovery Rekonstruieren heterogene Biomoleküle über hierarchische Gauß-Mischungen und Teil-Entdeckung 通过高山级等级混合体和部件发现,通过等级制混合体重新构造异基因生物分子 2506.09063v1
  • 1466 06-06 Securing Traffic Sign Recognition Systems in Autonomous Vehicles Sicherung von Verkehrszeichenerkennungssystemen in autonomen Fahrzeugen 保障机动车辆交通标志识别系统的安全 2506.06563v1
  • 1467 06-06 Enhancing radioisotope identification in gamma spectra via supervised domain adaptation Verbesserung der Radioisotop-Identifikation in Gammaspektren durch Überwachung der Domänenanpassung 通过受监督的域适应加强伽马光谱中放射性同位素识别 2412.07069v2
  • 1468 06-06 ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models ThinkEdit: Interpretierbare Gewichtsbearbeitung zur Minderung übermäßig kurzen Denkens in Reasoning-Modellen ThinkEdit:通过可解释的权重编辑缓解推理模型中过短的思考 2503.22048v3
  • 1469 06-06 Rapid training of Hamiltonian graph networks without gradient descent Schnelles Training von Hamiltonschen Graphennetzwerken ohne Gradientenabstieg 快速培训汉密尔顿图表网络,无梯度下降 2506.06558v1
  • 1470 06-06 Infinity Search: Approximate Vector Search with Projections on q-Metric Spaces Infinity Search: Ungefähre Vektorsuche mit Projektionen auf q-Metric Spaces 无限搜索: 近似矢量搜索和 q- 计量空间预测 2506.06557v1
  • 1471 06-06 SDN-Based False Data Detection With Its Mitigation and Machine Learning Robustness for In-Vehicle Networks SDN-basierte falsche Datenerkennung mit seiner Abmilderung und Machine Learning Robustheit für In-Vehicle-Netzwerke 基于SDN的假数据探测,其减轻和机械学习强力适用于车辆内网络 2506.06556v1
  • 1472 06-06 Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features Theoretische Grundlagen des Deep Copula Klassifikators: Ein generativer Ansatz zur Modellierung abhängiger Merkmale 深度Copula分类器的理论基础:对相依特征建模的生成式方法 2505.22997v2
  • 1473 06-06 GeoClip: Geometry-Aware Clipping for Differentially Private SGD GeoClip: Geometry-Aware Clipping für unterschiedlich private SGD GeoClip: 区分专用 SGD 的几何- 软件剪切 2506.06549v1
  • 1474 06-06 MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping MINT: Multimodale Instruktions-Tuning mit multimodaler Interaktions-Gruppierung MINT: 与多式互动组合组合的多式指示调制 2506.02308v3
  • 1475 06-06 Mixed-curvature decision trees and random forests Entscheidungsbäume mit gemischter Krümmung und zufällige Wälder 混合精度决定树和随机森林 2410.13879v3
  • 1476 06-06 GNNAnatomy: Rethinking Model-Level Explanations for Graph Neural Networks GNNAnatomy: Rethinking Model-Level Erklärungen für Graph Neuronale Netzwerke GNNN 解剖:重新思考图形神经网络的模型级解释 2406.04548v3
  • 1477 06-06 Direct Fisher Score Estimation for Likelihood Maximization Direkte Fisher-Score-Schätzung für Likelihood Maximierung 渔业直接分数估计,以争取使机会最大化 2506.06542v1
  • 1478 06-06 LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs LabSafety Bench: Benchmarking LLMs zu Sicherheitsfragen in wissenschaftlichen Laboren 实验室安全法官:科学实验室安全问题LLM女士基准基准 2410.14182v3
  • 1479 06-06 Rational Tuning of LLM Cascades via Probabilistic Modeling Rationales Tuning von LLM-Kaskaden durch probabilistische Modellierung 通过概率建模对LLM Casscades 进行合理测算 2501.09345v4
  • 1480 06-06 Hierarchical and Collaborative LLM-Based Control for Multi-UAV Motion and Communication in Integrated Terrestrial and Non-Terrestrial Networks Hierarchische und kollaborative LLM-basierte Steuerung für Multi-UAV-Bewegung und Kommunikation in integrierten terrestrischen und nicht-terrestrischen Netzwerken 地面与非地面一体化网络中基于LLM的多无人机运动与通信分层协同控制 2506.06532v1
  • 1481 06-06 Training-Free Constrained Generation With Stable Diffusion Models Trainingsfreie eingeschränkte Generierung mit Stable-Diffusion-Modellen 基于 Stable Diffusion 模型的免训练约束生成 2502.05625v3
  • 1482 06-06 Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs Sharp Gap-Dependent Variance-Aware Regret Bounds für Tabular MDPs 表格型 MDP 的尖锐间隙相关、方差感知遗憾界 2506.06521v1
  • 1483 06-06 A Systematic Review of Poisoning Attacks Against Large Language Models Eine systematische Überprüfung von Giftangriffen gegen große Sprachmodelle 系统审查针对大语言模式的中毒袭击 2506.06518v1
  • 1484 06-06 MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding MindLLM: Ein subject-agnostisches und vielseitiges Modell für fMRI-to-Text Decodierung MindLLLM: FMRI 到文本代码的主体不可知性和易变性模型 2502.15786v2
  • 1485 06-06 InstantFT: An FPGA-Based Runtime Subsecond Fine-tuning of CNN Models InstantFT: FPGA-basierte Feinabstimmung von CNN-Modellen zur Laufzeit in unter einer Sekunde InstantFT:基于FPGA的CNN模型运行时亚秒级微调 2506.06505v1
  • 1486 06-06 Optimal Rates in Continual Linear Regression via Increasing Regularization Optimale Raten in kontinuierlicher linearer Regression durch zunehmende Regularisierung 通过增加正规化,持续线性回归的优化率 2506.06501v1
  • 1487 06-06 Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms Synthetische Problemgenerierung für die Vernunft über Qualität-Diversität-Algorithmen 通过质量-多样性算法导致的原因产生合成问题 2506.06499v1
  • 1488 06-06 Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent Verbesserte Finite-Partikel-Konvergenzraten für Steinvariationsgradabszenz 改进斯坦因变化而变的渐变后代的有限粒子聚合率 2409.08469v3
  • 1489 06-06 Synthetic Text Generation for Training Large Language Models via Gradient Matching Synthetische Textgenerierung für das Training großer Sprachmodelle über Gradient Matching 通过 “ 渐级匹配 “ 培训大语言模型的合成文本生成 2502.17607v2
  • 1490 06-06 Differentiable Quadratic Optimization For The Maximum Independent Set Problem Unterschiedliche quadratische Optimierung für das maximale unabhängige Set-Problem 最大独立集集问题可区别的二次二次曲线优化 2406.19532v7
  • 1491 06-06 RelGNN: Composite Message Passing for Relational Deep Learning RelGNN: Composite-Botschaft für relationales Deep Learning RelGNN:用于关系深深学习的综合信息传递 2502.06784v2
  • 1492 06-06 Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks Alternierende Gradientenströme: Eine Theorie des Feature-Lernens in zweischichtigen Neuronalen Netzwerken 交错的渐变流:两层神经网络中的特色学习理论 2506.06489v1
  • 1493 06-06 Diversity of Transformer Layers: One Aspect of Parameter Scaling Laws Diversity of Transformer Layers: Ein Aspekt der Parameterskalierungsgesetze 变形层多样性:参数面积法的一个方面 2505.24009v2
  • 1494 06-06 Membership Inference Attacks for Unseen Classes Mitgliedschafts-Inferenz-Angriffe für ungesehene Klassen 对不见人见的类类的推论攻击 2506.06488v1
  • 1495 06-06 Neural-Augmented Kelvinlet: Real-Time Soft Tissue Deformation with Multiple Graspers Neural-Augmented Kelvinlet: Echtzeit-Soft Tissue-Deformation mit mehreren Graspers 神经加速开尔文莱:实时软体组织变形,带多颗粒变形 2506.08043v1
  • 1496 06-06 A Certified Unlearning Approach without Access to Source Data Ein zertifizierter Unlearning-Ansatz ohne Zugriff auf Quelldaten 未经认证的不学习方法,无法获得源数据 2506.06486v1
  • 1497 06-06 The Economic Dispatch of Power-to-Gas Systems with Deep Reinforcement Learning: Tackling the Challenge of Delayed Rewards with Long-Term Energy Storage Der wirtschaftliche Dispatch von Power-to-Gas-Systemen mit Deep Reinforcement Learning: Die Herausforderung verzögerter Belohnungen bei langfristiger Energiespeicherung bewältigen 基于深度强化学习的电转气系统经济调度:应对长期储能带来的延迟奖励挑战 2506.06484v1
  • 1498 06-06 Noise Consistency Regularization for Improved Subject-Driven Image Synthesis Rauschkonsistenz-Regularisierung für verbesserte subjektgetriebene Bildsynthese 用于改进主体驱动图像合成的噪声一致性正则化 2506.06483v1
  • 1499 06-06 TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness TimeRecipe: Ein Time-Series-Prognose-Rezept über Benchmarking Module Level Wirksamkeit 时间Recpe: 通过基准标准模块的实时序列预测食谱 2506.06482v1
  • 1500 06-06 Unveiling Markov Heads in Pretrained Language Models for Offline Reinforcement Learning Markov-Köpfe in vorgebildeten Sprachmodellen für Offline-Verstärkungslernen enthüllen 离线强化学习培训前语言模式中未排行的Markov负责人 2409.06985v2
  • 1501 06-06 DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation DELT: Ein einfaches Diversity-orientiertes EarlyLate-Training für die Datensatzdestillation DELT: 以简单多样性驱动的数据集蒸馏早期远程培训 2411.19946v2
  • 1502 06-06 Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage Kosteneffizientes LLM-Training mit Lifetime-Aware Tensor-Offloading über GPUDirect Storage 通过GPUPUDirect 储存进行终身软件传感器脱载的成本效益高的LLM培训 2506.06472v1
  • 1503 06-06 SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation SIGMA:通过西布林-引导蒙特卡罗增殖改进大语言示范理由说明 2506.06470v1
  • 1504 06-06 BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning BRiTE: Bootstrapping verstärkter Denkprozess zur Verbesserung von Sprachmodellen Brite:强化强化思考程序以加强语言示范理由 2501.18858v2
  • 1505 06-06 Latent Thought Models with Variational Bayes Inference-Time Computation Latent Thought Models mit Variational Bayes Inferenz-Zeit-Computation 具有变异性贝湾推推算-时间计算法的古老思想模型 2502.01567v2
  • 1506 06-06 FairICP: Encouraging Equalized Odds via Inverse Conditional Permutation FairICP: Förderung ausgeglichener Odds durch inverse Conditional Permutation FairICP:通过反条件性变异鼓励平等奇数 2404.05678v4
  • 1507 06-06 Towards Infant Sleep-Optimized Driving: Synergizing Wearable and Vehicle Sensing in Intelligent Cruise Control Auf dem Weg zum schlafoptimierten Fahren für Säuglinge: Verwendbares und Fahrzeug sensibilisieren in intelligenter Kreuzfahrtsteuerung 婴儿睡眠优化驾驶:在智能巡航控制中统一穿戴和车辆遥感 2506.06459v1
  • 1508 06-06 WISCA: A Consensus-Based Approach to Harmonizing Interpretability in Tabular Datasets WISCA: Ein auf Konsens basierender Ansatz zur Harmonisierung der Dolmetschbarkeit in Tabellendatensätzen WISSCA: 统一图表数据集解释性的协商一致办法 2506.06455v1
  • 1509 06-06 LETS Forecast: Learning Embedology for Time Series Forecasting LETS-Prognose: Lernembedologie für Zeitreihenprognosen LETS预测: 学习时间序列预测的生态学 2506.06454v1
  • 1510 06-06 Tackling Feature and Sample Heterogeneity in Decentralized Multi-Task Learning: A Sheaf-Theoretic Approach Schlagen von Feature und Sample Heterogenität im dezentralisierten Multi-Task-Lernen: Ein schaftheoretischer Ansatz 处理分散化多任务学习中的地貌特征和抽样差异:沙夫理论方法 2502.01145v2
  • 1511 06-06 Canonical Autoregressive Generation Kanonische autoregressive Generation Canonic 自动后退一代 2506.06446v1
  • 1512 06-06 Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Saffron-1: Auf dem Weg zu einem Skalierungsparadigma für die LLM-Sicherheitssicherung 山红-1:努力为LLM公司安全保证建立一个推论范围扩增范例 2506.06444v1
  • 1513 06-06 Unlocking Chemical Insights: Superior Molecular Representations from Intermediate Encoder Layers Entsperren von chemischen Erkenntnissen: Überlegene molekulare Darstellungen von Zwischen-Encoder-Schichten 解锁化学透视:中间编码层的高级分子代表 2506.06443v1
  • 1514 06-06 A Framework of decision-relevant observability: Reinforcement Learning converges under relative ignorability Ein Rahmen entscheidungsrelevanter Beobachtbarkeit: Reinforcement Learning konvergiert unter relativer Ignorierbarkeit 与决策相关的可观察性框架:强化学习在相对可忽略性下收敛 2504.07722v4
  • 1515 06-06 Exploring Diffusion Transformer Designs via Grafting Erforschen von Diffusion Transformer Designs über Grafting 探索通过Garfting 开发传播变异器设计 2506.05340v2
  • 1516 06-06 Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias Eigenspektrumanalyse von Neuronalen Netzwerken ohne Aspect Ratio Bias 对无视比比比的神经网络进行微微光谱分析 2506.06280v1
  • 1517 06-06 STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis STARFlow: Skalierung von Latent Normalisierungsströmen für hochauflösende Bildsynthese STARRFlow: 增加高分辨率图像合成的中端正常化流程 2506.06276v1
  • 1518 06-06 Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding Filmfakten und Fibs (MF$^2$): Ein Benchmark für langes Filmverständnis 电影事实与谎言(MF$^2$):长电影理解基准 2506.06275v1
  • 1519 06-06 Do Large Language Models Reason Causally Like Us? Even Better? Haben große Sprachmodelle Grund ursächlich wie wir? Noch besser? 大语言模型的原因和我们一样吗?更好吗? 2502.10215v2
  • 1520 06-06 Covering Number of Real Algebraic Varieties and Beyond: Improved Bounds and Applications Abdeckung der Anzahl der realen algebraischen Sorten und darüber hinaus: Verbesserte Grenzen und Anwendungen 覆盖实际代数差异数目及以后:改善边界和应用 2311.05116v4
  • 1521 06-06 Sketched Equivariant Imaging Regularization and Deep Internal Learning for Inverse Problems Sketched Equivariant Imaging Regularization und Deep Internal Learning für inverse Probleme 用于反向问题的预言的等同成像正规化和深内部学习 2411.05771v4
  • 1522 06-06 Normalizing Flows are Capable Generative Models Normalisierende Strömungen sind fähige Generative Modelle 正常流动是能够生成的模型 2412.06329v3
  • 1523 06-06 Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens Reflect-then-Plan: Offline-Modellbasierte Planung durch eine doppelt Bayesian-Lens 反映时计划:通过杜布里·巴耶斯冷镜头进行离线示范规划 2506.06261v1
  • 1524 06-06 DPCore: Dynamic Prompt Coreset for Continual Test-Time Adaptation DPCore: Dynamischer Prompt-Kernsatz für kontinuierliche Test-Zeit-Anpassung DPCore: 持续试验-时间适应动态快速核心集 2406.10737v4
  • 1525 06-06 PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time PersonaAgent: Wenn große Sprachmodell-Agenten Personalisierung zur Testzeit treffen 人 员:当大语言示范媒介在试验时间达到个性化时 2506.06254v1
  • 1526 06-06 Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models Sparse Autoencoder lernen monosemantische Funktionen in Vision-Language-Modellen 在视觉语言模型中学习数学特征 2504.02821v2
  • 1527 06-06 Lagrangian-based Equilibrium Propagation: generalisation to arbitrary boundary conditions & equivalence with Hamiltonian Echo Learning Lagrangian-basierte Equilibrium Propagation: Verallgemeinerung zu willkürlichen Randbedingungen & Äquivalenz mit Hamiltonian Echo Learning Lagrangian 以拉格朗江为基础的平衡传播:对任意边界条件的概括化和与汉密尔顿回声学习的等同 2506.06248v1
  • 1528 06-06 Neural Responses to Affective Sentences Reveal Signatures of Depression Neurale Reaktionen auf Affective Sentences Offenbarungssignaturen von Depressionen 对消极判决的神经反应 反萧条征兆 2506.06244v1
  • 1529 06-06 LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models LlavaGuard: Ein offenes VLM-basiertes Framework zur Sicherung von Vision-Datensätzen und -Modellen LlavaGuard:基于VLM的公开保护视觉数据集和模型框架 2406.05113v3
  • 1530 06-06 fairmetrics: An R package for group fairness evaluation fairmetrics: Ein R-Paket für die Bewertung von Gruppengerechtigkeit 公平度:团体公平评估R包件 2506.06243v1
  • 1531 06-06 TQml Simulator: Optimized Simulation of Quantum Machine Learning TQml Simulator: Optimierte Simulation des Quantum Machine Learning TQml 模拟器: 量子机器学习优化模拟 2506.04891v2
  • 1532 06-06 Towards an Explainable Comparison and Alignment of Feature Embeddings Auf dem Weg zu einem erklärbaren Vergleich und der Ausrichtung von Feature-Embeddings 争取实现可解释的比较和地貌隐含物的一致 2506.06231v1
  • 1533 06-06 NeurNCD: Novel Class Discovery via Implicit Neural Representation NeurNCD: Novel Class Discovery über Implizite Neuraldarstellung NeurNCD: 通过隐性神经代表方式发现新类 2506.06412v1
  • 1534 06-06 Fréchet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets Fréchet Radiomic Distance (FRD): Ein vielseitiges Metric für den Vergleich medizinischer Bilddatensätze Fréchet放射性距离(FRD):用于比较医疗成像数据集的Versatile计量仪 2412.01496v2
  • 1535 06-06 Corrector Sampling in Language Models Korrektor-Probenahme in Sprachmodellen 语言模型中校正器样本 2506.06215v1
  • 1536 06-06 Model-Driven Graph Contrastive Learning Modellgetriebenes Graph Kontrastives Lernen 模型驱动图反比学习 2506.06212v1
  • 1537 06-06 Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning Pseudo-Etikettierung trifft Labelglättung für geräuscharmes partiales Label-Lernen Pseudo 标签会满足新闻部分标签学习的标签平滑 2402.04835v3
  • 1538 06-06 Kinetics: Rethinking Test-Time Scaling Laws Kinetik: Überdenken von Test-Zeit-Skalierungsgesetzen 动因:重新思考试验时间扩增法 2506.05333v2
  • 1539 06-06 ByzSecAgg: A Byzantine-Resistant Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment ByzSecAgg: Ein Byzantinisch-Resistent Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment ByzsecAgg:基于编码计算和矢量承诺的联邦学习拜占庭 – – 相对安全安全聚合计划 2302.09913v4
  • 1540 06-06 How to craft a deep reinforcement learning policy for wind farm flow control Wie man eine tief greifende Lernpolitik für die Steuerung des Windparkflusses schafft 如何为风力农场流动控制制定深入强化学习政策 2506.06204v1
  • 1541 06-06 MLOps with Microservices: A Case Study on the Maritime Domain MLOps mit Microservices: Eine Fallstudie zum maritimen Bereich 具有微服务的多边业务方案:海洋领域案例研究 2506.06202v1
  • 1542 06-06 Quantifying the Optimization and Generalization Advantages of Graph Neural Networks Over Multilayer Perceptrons Quantifizierung der Optimierungs- und Generalisierungsvorteile von Graphen-Neuralnetzwerken gegenüber mehrschichtigen Perzeptronen 量化多层倍感图像神经网络的优化和普及优势 2306.13926v3
  • 1543 06-06 DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models DyGMamba: Langfristige zeitliche Abhängigkeiten auf kontinuierlich-zeitdynamischen Graphen mit State Space Models effizient modellieren DyGMamba:利用国家空间模型以连续时动态图表作为有效模拟长期时间依赖模型 2408.04713v4
  • 1544 06-06 Transformative or Conservative? Conservation laws for ResNets and Transformers Transformative oder konservative? Erhaltungsgesetze für ResNets und Transformer 改革还是保守? ResNets和变异器的养护法 2506.06194v1
  • 1545 06-06 ICU-TSB: A Benchmark for Temporal Patient Representation Learning for Unsupervised Stratification into Patient Cohorts ICU-TSB: Benchmark für zeitliche Patientenrepräsentanz-Lernen zur unüberwachten Schichtung in Patientenkohorten ICU-TSB: 用于在不受监督的情况下将患者分划入患者群体的时间病人代表制学习基准 2506.06192v1
  • 1546 06-06 Physics-Informed Neural Networks for Control of Single-Phase Flow Systems Governed by Partial Differential Equations Physik-informierte Neuronale Netze zur Steuerung von einphasigen Flusssystemen, die durch partielle Differentialgleichungen gesteuert werden 用于控制按部分差异等量划分的单阶段流动系统的物理内建神经网络 2506.06188v1
  • 1547 06-06 Antithetic Noise in Diffusion Models Antithetischer Lärm bei Diffusionsmodellen 扩散模型中的抗异静能 2506.06185v1
  • 1548 06-06 SemiOccam: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels SemiOccam: Ein robustes semi-überwachtes Bilderkennungsnetzwerk mit Sparse-Labels SemiOccam:使用稀疏标签的鲁棒半监督图像识别网络 2506.03582v2
  • 1549 06-06 A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization Eine theoretische Studie der (Hyper-)Selbstachtung durch das Lens of Interactions: Repräsentation, Ausbildung, Verallgemeinerung 通过互动的镜头进行(Hyper)自我关注的理论研究:代表性、培训、普遍化 2506.06179v1
  • 1550 06-06 CoxNTF: A New Approach for Joint Clustering and Prediction in Survival Analysis CoxNTF: Ein neuer Ansatz für Joint Clustering und Vorhersagen in der Überlebensanalyse CoxNTF:在生存分析中联合集群和预测的新方法 2506.06411v1
  • 1551 06-06 Reusing Trajectories in Policy Gradients Enables Fast Convergence Wiederverwendung von Trajektorien in politischen Gradienten ermöglicht schnelle Konvergenz 在政策梯度中重新使用轨迹以启用快速聚合 2506.06178v1
  • 1552 06-06 Improving choice model specification using reinforcement learning Verbesserung der Wahlmodellspezifikation mittels Verstärkungslernen 利用强化学习改进选择模式规格 2506.06410v1
  • 1553 06-06 Emergent Response Planning in LLMs Emergent Response Planning in LLMs 新的LLMM 应急规划 2502.06258v2
  • 1554 06-06 A Riemannian Optimization Perspective of the Gauss-Newton Method for Feedforward Neural Networks Riemannische Optimierungsperspektive der Gauß-Newton-Methode für neurale Netze Fefforward神经网络的高斯-牛顿方法的优化观点 2412.14031v4
  • 1555 06-06 The Lock-in Hypothesis: Stagnation by Algorithm Die Lock-in Hypothese: Stagnation durch Algorithmen Lock-in 假说: 阿尔哥里特姆的停滞 2506.06166v1
  • 1556 06-06 Automated Journalistic Questions: A New Method for Extracting 5W1H in French Automatisierte journalistische Fragen: Eine neue Methode zur Extraktion von 5W1H in Französisch 自动新闻问题:以法文提取5W1H的新方法 2505.14804v2
  • 1557 06-06 Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning Nicht alle Rollouts sind nützlich: Down-Sampling Rollouts im LLM Verstärkungslernen 并非所有的推出都有用:在LLM强化学习中进行下选推出 2504.13818v2
  • 1558 06-06 ENMA: Tokenwise Autoregression for Generative Neural PDE Operators ENMA: Tokenwise Autoregression für generative neurale PDE-Betreiber ENMA: 生成神经器 PDE 运算符的取向自动递减 2506.06158v1
  • 1559 06-06 Deconfounding Multi-Cause Latent Confounders: A Factor-Model Approach to Climate Model Bias Correction Deconfounding Multi-Cause Latent Confounders: Ein Faktor-Modell Ansatz zur Klimamodell Bias Korrektur 多原因、中位本部混乱创始人:对气候模型比重纠正的因数模式办法 2408.12063v2
  • 1560 06-06 Graph Neural Network Generalization with Gaussian Mixture Model Based Augmentation Graphische Neural-Netzwerk-Verallgemeinerung mit Gauß-Mischungsmodell-basierter Augmentation Gaussian 混合模型基底放大模型的神经网络通用 2411.08638v3
  • 1561 06-06 ELEVATE-GenAI: Reporting Guidelines for the Use of Large Language Models in Health Economics and Outcomes Research: an ISPOR Working Group on Generative AI Report ELEVATE-GenAI: Berichterstattungsrichtlinien für den Einsatz von großen Sprachmodellen in der Gesundheitsökonomie und der Ergebnisforschung: ein Bericht der ISPOR-Arbeitsgruppe für generative KI ELEVATE-GenAI:卫生经济学与结果研究中使用大语言模型的报告准则:ISPOR生成式人工智能工作组报告 2501.12394v2
  • 1562 06-06 Laplace Transform Based Low-Complexity Learning of Continuous Markov Semigroups Laplace Transform-basierte Low-Complexity-Lernen von kontinuierlichen Markov-Halbgruppen 连续马尔科夫半群体低复杂程度低学习 2410.14477v2
  • 1563 06-06 LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws LLMs on the Line: Daten bestimmen die Loss-to-Loss-Skalierungsgesetze 线上LLM:数据确定损失到损失的缩放法 2502.12120v2
  • 1564 06-06 carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks carps: Ein Rahmen für den Vergleich von N Hyperparameter-Optimierern auf M Benchmarks 在M基准上比较超超参数最佳放大器的框架 2506.06143v1
  • 1565 06-06 Efficient Diffusion Models: A Survey Effiziente Diffusionsmodelle: Eine Umfrage 高效传播模型:调查 2502.06805v3
  • 1566 06-06 Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models Tabelle-r1: Selbstbeaufsichtigtes und verstärktes Lernen für programmbasierte Tabellenveranlagung in kleinen Sprachmodellen 表-r1:针对基于方案的小语言模式说明理由的表格进行自我监督和强化学习 2506.06137v1
  • 1567 06-06 Regret-Free Reinforcement Learning for LTL Specifications Regret-freies Reinforcement Learning für LTL-Spezifikationen 面向LTL规范的无遗憾强化学习 2411.12019v2
  • 1568 06-06 VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters VisionTS: Visuelle Maskierte Autoencoder sind kostenlose Zero-Shot-Zeitreihen-Forecaster 视野:视觉蒙面自动识别器是无热零热时间序列预测器 2408.17253v4
  • 1569 06-06 Similarity Matching Networks: Hebbian Learning and Convergence Over Multiple Time Scales Ähnlichkeit passender Netzwerke: Hebbisches Lernen und Konvergenz über mehrere Zeitskalan 相似性匹配网络: Hebbian 学习和多重时间尺度的趋同 2506.06134v1
  • 1570 06-06 Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions Verbesserung der numerischen Stabilität des Normalisierten gegenseitigen Informations-Schätzers über hohe Abmessungen 提高高方面标准化相互信息模拟器的数值稳定 2410.07642v2
  • 1571 06-06 Gradient Similarity Surgery in Multi-Task Deep Learning Gradient Ähnlichkeit Chirurgie in Multi-Task Deep Learning 多任务深层学习中的渐渐相似性外科手术 2506.06130v1
  • 1572 06-06 Flow-Attentional Graph Neural Networks Fluss-aufmerksamer Graph Neuronale Netzwerke 流动动态图图神经网络 2506.06127v1
  • 1573 06-06 Convergence of linear programming hierarchies for Gibbs states of spin systems Konvergenz von linearen Programmierhierarchien für Gibbs-Zustände von Spinsystemen Gibbbs旋转系统状态线性编程分级组合 2506.06125v1
  • 1574 06-06 Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library Verstärkte Lernoptimierung für großformatiges Lernen: Eine effiziente und benutzerfreundliche Scaling-Bibliothek 大规模学习的强化学习优化:高效和方便用户的扩大规模图书馆 2506.06122v1
  • 1575 06-06 Training on Plausible Counterfactuals Removes Spurious Correlations Training auf Plausible Counterfactals entfernt spurlose Korrelationen 关于可视反事实消除污损的培训 2505.16583v5
  • 1576 06-06 Learners’ Languages Sprachen der Lernenden 学生语言 2103.01189v3
  • 1577 06-06 Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness Auf dem Weg zum Lebenszyklus Unlearning Commitment Management: Messung der Stichprobenebene Unlearning Completeness 实现生命周期不学习承诺管理:衡量抽样一级不学习 2506.06112v1
  • 1578 06-06 Synthetic Tabular Data: Methods, Attacks and Defenses Synthetische Tabellendaten: Methoden, Angriffe und Verteidigungen 合成图表数据:方法、攻击和防御 2506.06108v1
  • 1579 06-06 The Canary’s Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text Das Echo der Kanaren: Auditing Privacy Risks of LLM-generated Synthetic Text 加那利的回声:LLM发明的合成文字的 审计隐私风险 2502.14921v2
  • 1580 06-06 Label-Context-Dependent Internal Language Model Estimation for CTC Label-Context-Dependent Interne Sprachmodellschätzung für CTC 面向CTC的标签上下文相关内部语言模型估计 2506.06096v1
  • 1581 06-06 Flexible Operator Fusion for Fast Sparse Transformer with Diverse Masking on GPU Flexible Operator Fusion für schnellen Sparse Transformer mit vielfältigem maskieren auf GPU 具有 GPU 上多种掩码的快速分散变异器灵活操作器融合 2506.06095v1
  • 1582 06-06 On-board Mission Replanning for Adaptive Cooperative Multi-Robot Systems On-Board-Mission-Replanning für adaptive kooperative Multi-Roboter-Systeme 特派团适应性合作多机器人系统重新规划 2506.06094v1
  • 1583 06-06 Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture Paradigmen der AI-Bewertung: Kartierungsziele, Methoden und Kultur AI 评价:绘图目标、方法和文化 2502.15620v2
  • 1584 06-06 HeavyWater and SimplexWater: Watermarking Low-Entropy Text Distributions HeavyWater und SimplexWater: Wassermarkierung mit geringer Entropie Textverteilung 重水和简易水:低英基文本分发的标注水 2506.06409v1
  • 1585 06-06 LinGuinE: Longitudinal Guidance Estimation for Volumetric Lung Tumour Segmentation LinGuinE: Longitudinal Guidance Schätzung für Volumetrische Lungentumorsegmentierung LinGuinE: 活性肺炎肿瘤分离的纵向指导估计 2506.06092v1
  • 1586 06-06 Multidimensional Adaptive Coefficient for Inference Trajectory Optimization in Flow and Diffusion Multidimensionaler Adaptiver Koeffizient für die Inferenz-Trajektorie-Optimierung in Strömung und Diffusion 流动与传播过程中的多维可调适系数 2404.14161v3
  • 1587 06-06 Multilevel neural simulation-based inference Mehrstufige neuronale simulationsbasierte Inferenz 多级神经模拟模拟推断 2506.06087v1
  • 1588 06-06 A Novel, Human-in-the-Loop Computational Grounded Theory Framework for Big Social Data Ein neuartiges, Mensch-in-the-Loop Computational Grounded Theory Framework für große soziale Daten 面向社会大数据的新型人在回路计算扎根理论框架 2506.06083v1
  • 1589 06-06 Training Deep Learning Models with Norm-Constrained LMOs Training von Deep-Learning-Modellen mit normbeschränkten LMOs 使用范数约束LMO训练深度学习模型 2502.07529v2
  • 1590 06-06 Proximal Policy Distillation Proximale Politikdestillation Proximal 政策蒸馏 2407.15134v2
  • 1591 06-06 System-Aware Unlearning Algorithms: Use Lesser, Forget Faster System-Aware Unlearning Algorithmen: Weniger verwenden, schneller vergessen 系统提醒 取消学习算法 : 使用更少, 忘记更快 2506.06073v1
  • 1592 06-06 ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis ProofAug: Effiziente Neuraltheorem-Probung mittels feinkörniger Proof-Strukturanalyse 校对:通过精细的校对结构分析证明有效的神经理论 2501.18310v2
  • 1593 06-06 Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models Bewertung von Intersektionalen Bias in Darstellungen von vortrainierten Bilderkennungsmodellen 评估培训前图像识别模型代表性的跨部门偏见 2506.03664v2
  • 1594 06-06 The Influence of the Memory Capacity of Neural DDEs on the Universal Approximation Property Der Einfluss der Speicherkapazität von neuralen DDEs auf die universelle Annäherungseigenschaft 神经DDE的记忆力对普遍接近财产的影响 2505.07244v2
  • 1595 06-06 Zero-Shot Detection of LLM-Generated Code via Approximated Task Conditioning Zero-Shot-Erkennung des LLM-generierten Codes über eine angenäherte Aufgabenkonditionierung 通过近似任务设置条件对LLM生成的代码进行零热检测 2506.06069v1
  • 1596 06-06 Feedforward Few-shot Species Range Estimation Einspeisung Wenig abgespuckte Arten Range-Schätzung 向前进的微粒物种范围估计 2502.14977v2
  • 1597 06-06 HIGHT: Hierarchical Graph Tokenization for Molecule-Language Alignment HIGHT: Hierarchische Graphen-Tokenisierung für die Ausrichtung von Molekülen HIG: 分子-语言对齐的等级图形方位化 2406.14021v2
  • 1598 06-06 Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large $p$ Ab Initio Nichtparametrische Variable Auswahl für skalierbare symbolische Regression mit großen $p$ 面向大 $p$ 可扩展符号回归的从头算非参数变量选择 2410.13681v2
  • 1599 06-06 Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation Effiziente feinkörnige Anleitung für die Diffusionsmodellbasierte Symbolische Musikgenerierung 基于扩散模型的符号音乐生成的高效细粒度引导 2410.08435v3
  • 1600 06-06 MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding MedXpertQA: Benchmarking von medizinischem Verständnis auf Expertenebene MedXperttQA:确定专家级医疗理由和理解基准 2501.18362v3
  • 1601 06-06 TRUST: Test-time Resource Utilization for Superior Trustworthiness TRUST: Testzeit-Ressourcennutzung für höchste Vertrauenswürdigkeit 信托基金:高可信赖性试验时间利用资源 2506.06048v1
  • 1602 06-06 Diffusion-Based Hierarchical Graph Neural Networks for Simulating Nonlinear Solid Mechanics Diffusionsbasierte Hierarchische Graphen-Neuralnetze zur Simulation nichtlinearer Festmechanik 用于模拟非线性固体力学的基于扩散的分层图神经网络 2506.06045v1
  • 1603 06-06 SDS-Net: Shallow-Deep Synergism-detection Network for infrared small target detection SDS-Net: Shallow-Deep Synergism-Detektionsnetzwerk für Infrarot-Kleinzielerkennung SDS-Net:红外小型目标探测浅水深协同-探测网络 2506.06042v1
  • 1604 06-06 Tensor-to-Tensor Models with Fast Iterated Sum Features Tensor-to-Tensor-Modelle mit schnell iterierten Sum-Features 具有快速迭代超特性的电传感器到传感器模型 2506.06041v1
  • 1605 06-06 Do-PFN: In-Context Learning for Causal Effect Estimation Do-PFN: In-Context-Lernen für die Schätzung des Kausaleffekts DoPFN: 因果关系估算的理论内学习 2506.06039v1
  • 1606 06-06 CRISP-NAM: Competing Risks Interpretable Survival Prediction with Neural Additive Models CRISP-NAM: Konkurrenzfähige Risiken interpretierbare Überlebensvorhersage mit neuralen Additivenmodellen CRIISP-NAM: 与神经添加模型相竞争的风险解释性生存预测 2505.21360v2
  • 1607 06-06 A PID-Controlled Tensor Wheel Decomposition Model for Dynamic Link Prediction Ein PID-gesteuertes Tensor-Radzersetzungsmodell für dynamische Link-Vorhersage PID-控制式天线仪式轮式动态链接预测分解模型 2505.14211v2
  • 1608 06-06 Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling Neural Flow Diffusion Modelle: Erlernbarer Vorwärtsprozess für verbesserte Diffusionsmodellierung 神经流流传播模型:改进传播模型的可学习前进进程 2404.12940v3
  • 1609 06-06 SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations SDE Matching: Skalierbares und simulationsfreies Training latenter stochastischer Differentialgleichungen SDE 匹配:可缩放和模拟无模拟的静态碎裂差异等量模拟培训 2502.02472v2
  • 1610 06-06 Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification Probenspezifische Geräuschinjektion für die Diffusions-basierte Adversarialreinigung 以传播为基础的反向净化的样本特定噪音注射 2506.06027v1
  • 1611 06-06 On Inverse Problems, Parameter Estimation, and Domain Generalization Inverse Probleme, Parameterschätzung und Domain-Verallgemeinerung 反向问题、参数估计和域通用化 2506.06024v1
  • 1612 06-06 Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation Begründung durch Ausführung: Vereinheitlichung von Prozess- und Ergebnisprämien für die Codegenerierung 执行中的理由:代码生成的统一程序和结果奖励 2412.15118v2
  • 1613 06-06 Improving Customer Service with Automatic Topic Detection in User Emails Verbesserung des Kundenservices mit automatischer Themenerkennung in Benutzer-E-Mails 改进用户电子邮件自动专题检测客户服务 2502.19115v3
  • 1614 06-06 Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems Unisoma: Ein Unified Transformer-basierter Solver für Multi-Solid-Systeme Unisoma:多层系统统一变压器解决方案 2506.06021v1
  • 1615 06-06 One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy Ein Stein, zwei Vögel: Verstärkung der feindlichen Verteidigung durch die Linse der distributiven Diskrepanz 一石,两鸟:通过分布性差异的镜头加强反向防御 2503.02169v2
  • 1616 06-06 A Theoretical Justification for Asymmetric Actor-Critic Algorithms Eine theoretische Rechtfertigung für asymmetrische Aktor-Kritische Algorithmen 关于非对称精算作用者-批评算法的理论理由 2501.19116v2
  • 1617 06-06 A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces Ein Fisher-Rao Gradientenstrom für Entropie-regularisierte Markov-Entscheidungsprozesse in polnischen Räumen 用于波兰空间的环丙正规化的Markov决策过程的Fisher-Rao梯度流 2310.02951v3
  • 1618 06-06 pLDDT-Predictor: High-speed Protein Screening Using Transformer and ESM2 pLDDT-Predictor: High-Speed Protein Screening mit Transformer und ESM2 pLDDT-预测:使用变压器和无害环境管理2进行高速蛋白素筛选 2410.21283v3
  • 1619 06-06 Clone-Robust Weights in Metric Spaces: Handling Redundancy Bias for Benchmark Aggregation Klon-Robuste Gewichte in metrischen Räumen: Umgang mit Redundanz-Bias für Benchmark-Aggregation 中度空间的克隆-罗布体重量:为基准聚合处理冗余琵琶湖 2502.03576v2
  • 1620 06-06 LightGTS: A Lightweight General Time Series Forecasting Model LightGTS: Ein leichtes allgemeines Zeitreihen-Vorhersagemodell LightGTS:轻型一般时间序列预测模型 2506.06005v1
  • 1621 06-06 What Really is a Member? Discrediting Membership Inference via Poisoning Was ist wirklich ein Mitglied? Diskreditierende Mitgliedschaftsinferenz durch Vergiftung 成员究竟是什么?通过投毒质疑成员推断 2506.06003v1
  • 1622 06-06 LaDEEP: A Deep Learning-based Surrogate Model for Large Deformation of Elastic-Plastic Solids LaDEEP: Ein Deep Learning-basiertes Surrogate-Modell für große Deformation von elastischen-Plastischen Festkörpern LaDEEEP:一个基于深层学习的弹性固体大规模变形代金模型 2506.06001v1
  • 1623 06-06 Machine learning for in-situ composition mapping in a self-driving magnetron sputtering system Maschinelles Lernen für die In-situ-Kompositionsmapping in einem selbstfahrenden Magnetron-Sputtersystem 在自驾驶磁磁突飞系统内进行原地成像绘图的机器学习 2506.05999v1
  • 1624 06-06 RETENTION: Resource-Efficient Tree-Based Ensemble Model Acceleration with Content-Addressable Memory RETENTION: Ressourceneffizientes baumbasiertes Ensemblemodell Beschleunigung mit Content-Addressablem Speicher RETENTION:基于内容可寻址存储器的资源高效树集成模型加速 2506.05994v1
  • 1625 06-06 Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning Diffusions-Politiken für die Verallgemeinerung von Out-of-Distribution im Offline-Verstärkungs-Lernen 在离线强化学习中推广分配外普遍化政策 2307.04726v4
  • 1626 06-06 Peri-LN: Revisiting Normalization Layer in the Transformer Architecture Peri-LN: Normalisierungsschicht in der Transformer-Architektur neu aufsuchen Peri-LN: 变形器结构中重新审视正常化层 2502.02732v3
  • 1627 06-06 Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning Dynamische Mischung von Progressivem Parameter-Effizienter Expertenbibliothek für lebenslanges Roboterlernen 终身机器人学习进步参数有效专家图书馆动态混合体 2506.05985v1
  • 1628 06-06 Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations Jacobian Sparse Autoencoder: Sparsify Computations, nicht nur Aktivierungen Jacobian Sparass Autoencoders: 简化计算,而不仅仅是激活 2502.18147v2
  • 1629 06-06 A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations Eine Übersicht über automatische Bewertungsmethoden zu Text-, Bild- und Sprachgenerierungen 文字、视觉和语音一代自动评价方法调查 2506.10019v1
  • 1630 06-06 HO-FMN: Hyperparameter Optimization for Fast Minimum-Norm Attacks HO-FMN: Hyperparameter-Optimierung für schnelle Minimal-Norm-Angriffe HO-FMN: 快速最低目标攻击的超参数优化 2407.08806v2
  • 1631 06-06 AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification AMPED: Adaptive Multi-Objektive Projektion zum Ausgleich von Exploration und Geschicklichkeit Diversifizierung AMPED: 兼顾探索与技能多样化的适应性多目标预测 2506.05980v1
  • 1632 06-06 Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning Abmilderung des katastrophalen Vergessens mit adaptiver Transformer-Block-Erweiterung im föderierten Fine-Tuning 在联邦微调中通过自适应Transformer块扩展缓解灾难性遗忘 2506.05977v1
  • 1633 06-06 Graph Deep Learning for Time Series Forecasting Graph Deep Learning für die Vorhersage von Zeitreihen 用于时间序列预测的深深学习图 2310.15978v2
  • 1634 06-06 On Measuring Long-Range Interactions in Graph Neural Networks Bei der Messung von Langzeit-Interaktionen in Graphen-Neuralen Netzwerken 关于在图形神经网络中测量长距离相互作用 2506.05971v1
  • 1635 06-06 Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning Schrittweiser Übergang vom Bellman Optimality Operator zum Bellman Operator im Online-Verstärkungslernen 从Bellman最佳运算符逐步过渡到在线强化学习中的Bellman运算符 2506.05968v1
  • 1636 06-06 Preference Learning for AI Alignment: a Causal Perspective Bevorzugtes Lernen für die AI-Ausrichtung: eine kausale Perspektive AI一致的优先学习:从因果关系角度看问题 2506.05967v1
  • 1637 06-06 Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities Auf dem Weg zu robusten multimodalen physiologischen Grundlagenmodellen: Umgang mit willkürlich fehlenden Modalitäten 争取建立强有力的多式多模式生理基金会模型:处理任意失踪方式 2504.19596v2
  • 1638 06-06 Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology Subspezialitätsspezifisches Stiftungsmodell für intelligente Gastrointestinalpathologie 智能气胃肠道病理学 2505.21928v2
  • 1639 06-06 Certification for Differentially Private Prediction in Gradient-Based Training Zertifizierung für unterschiedliche private Vorhersagen im gradient-based Training 渐进式培训中差别私人预测认证 2406.13433v3
  • 1640 06-06 Diving into Self-Evolving Training for Multimodal Reasoning Eintauchen in das Selbst-Evolving Training für multimodale Vernunft 跳入多模式理由自我发展培训 2412.17451v3
  • 1641 06-06 Applying XAI based unsupervised knowledge discovering for Operation modes in a WWTP. A real case: AQUAVALL WWTP Anwendung von XAI-basierten unüberwachten Kenntnissen für Betriebsmodi in einem WWTP. Ein echter Fall: AQUAVALL WWTP 在WWTP中应用基于 XAI 的未经监督的操作模式知识发现 XAI 。 一个真实案例: AQUAVALL WWTP 2506.05958v1
  • 1642 06-06 Inferring entropy production in many-body systems using nonequilibrium MaxEnt Ableitung der Entropieproduktion in vielen Körpersystemen unter Verwendung von Nonequilibrium MaxEnt 使用非平衡MaxEnt推断多体系统中的熵产生 2505.10444v2
  • 1643 06-06 Learning Deterministic Policies with Policy Gradients in Constrained Markov Decision Processes Deterministische Politik mit politischen Gradienten in eingeschränkten Markov-Entscheidungsprozessen lernen 在受约束的Markov决策过程中,学习具有政策分级的政策确定性政策 2506.05953v1
  • 1644 06-06 How to explain grokking Wie man Grokking erklärt 如何解释怪语 2412.18624v3
  • 1645 06-06 GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression GRASP: Redundante Ebenen durch adaptive Singular-Parameter für effiziente Modellkompression ersetzen GRASP: 用高效模型压缩的适应性单质参数取代冗余层 2501.00339v3
  • 1646 06-06 BARK: A Fully Bayesian Tree Kernel for Black-box Optimization BARK: Ein vollbayesischer Baumkernel für Black-Box-Optimierung 用于黑盒优化的全巴伊西亚树核心圈 2503.05574v2
  • 1647 06-06 AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML AutoML-Agent: Ein Multi-Agent-LLM-Framework für Full-Pipeline-AutoML AutoML-Agent: 面向全流程AutoML的多智能体LLM框架 2410.02958v2
  • 1648 06-06 Additive decomposition of one-dimensional signals using Transformers Additive Zersetzung von eindimensionalen Signalen mit Transformern 使用变压器的单维信号的添加分解 2506.05942v1
  • 1649 06-06 SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models SafeAuto: Knowledge-Enhanced Safe Autonomous Driving mit multimodalen Foundation-Modellen SafeAuto:利用多模式基金会模式加强知识增强安全自主驾驶 2503.00211v2
  • 1650 06-06 Comparative Analysis of Modern Machine Learning Models for Retail Sales Forecasting Vergleichende Analyse von modernen Machine Learning-Modellen für die Prognose des Einzelhandels 零售销售预测现代机器学习模型比较分析 2506.05941v1
  • 1651 06-06 Exponential Family Variational Flow Matching for Tabular Data Generation Exponentieller Familien-Variationsfluss passend zur tabellarischen Datengenerierung 用于制表数据的家庭流动量变化 2506.05940v1
  • 1652 06-06 Quantifying Adversarial Uncertainty in Evidential Deep Learning using Conflict Resolution Quantifizierung von Unsicherheiten im Evidential Deep Learning mittels Konfliktlösung 以解决冲突的方式量化证据深层学习中存在的逆差不确定性 2506.05937v1
  • 1653 06-06 Deep Learning Weather Models for Subregional Ocean Forecasting: A Case Study on the Canary Current Upwelling System Deep Learning Weather Models for Subregional Ocean Forecasting: Eine Fallstudie über das kanarische aktuelle Auftriebssystem 次区域海洋预报的深学习气象模型:关于加那利洋流上升系统的个案研究 2505.24429v2
  • 1654 06-06 Machine Learning Predictions for Traffic Equilibria in Road Renovation Scheduling Machine Learning Vorhersagen für Verkehrsequilibrien in der Road Renovation Scheduling 道路翻新日程安排交通平衡的机器学习预测 2506.05933v1
  • 1655 06-06 Progressive Data Dropout: An Embarrassingly Simple Approach to Faster Training Progressive Data Dropout: Ein verblüffend einfacher Ansatz zum schnelleren Training 渐进数据辍学:快速培训的一个令人尴尬的简单方法 2505.22342v2
  • 1656 06-06 Can Masked Autoencoders Also Listen to Birds? Können maskierte Autoencoder auch Vögel hören? 面具汽车人也能听鸟的话吗? 2504.12880v3
  • 1657 06-06 GraphGPT: Generative Pre-trained Graph Eulerian Transformer GraphGPT: Generative vortrainierte Graph Eulerian Transformer 图GGPT: 未经培训的先产生图 欧勒里亚变形器 2401.00529v3
  • 1658 06-06 UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning UDora: Ein einheitliches rotes Teaming-Framework gegen LLM-Agenten durch Dynamisches Entführen ihrer eigenen Vernunft UDora:通过动态劫持LLM代理商的自身理由,针对这些代理商的统一红色团队框架 2503.01908v2
  • 1659 06-06 Over-PINNs: Enhancing Physics-Informed Neural Networks via Higher-Order Partial Derivative Overdetermination of PDEs Über-PINNs: Verbesserung physikinformierter neuraler Netzwerke durch höhere Ordnung Partial Derivative Überbestimmung von PDEs 超标准PINNs:通过高端部分衍生因素超过PDE的确定,加强物理内化神经网络 2506.05918v1
  • 1660 06-06 Rethinking Semi-supervised Segmentation Beyond Accuracy: Reliability and Robustness Halbüberwachte Segmentierung jenseits der Genauigkeit neu denken: Zuverlässigkeit und Robustheit 重新思考超越准确度的半监督分割:可靠性和强性 2506.05917v1
  • 1661 06-06 Approximating Latent Manifolds in Neural Networks via Vanishing Ideals Approximierende latente Manifolds in neuralen Netzen über Vanishing Ideals 借助消失的理念在神经网络中接近的中点修饰 2502.15051v2
  • 1662 06-06 DeviceScope: An Interactive App to Detect and Localize Appliance Patterns in Electricity Consumption Time Series DeviceScope: Eine interaktive App zur Erkennung und Lokalisierung von Appliance-Mustern in der Stromverbrauchs-Zeitreihe 设备范围:在电力消耗时间序列中检测和本地化应用程序模式的交互式应用程序 2506.05912v1
  • 1663 06-06 CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness CAPability: Ein umfassender Visual Caption Benchmark zur Bewertung von Korrektheit und Gründlichkeit 能力:评估正确性和彻底性的综合视觉能力基准 2502.14914v3
  • 1664 06-06 Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants Brauchen wir Strategien zur Neuausrichtung? Eine theoretische und empirische Studie rund um SMOTE und deren Varianten 我们需要重新平衡战略吗? 围绕SMOTE及其变体进行理论和经验研究。 2402.03819v5
  • 1665 06-06 A Driving Regime-Embedded Deep Learning Framework for Modeling Intra-Driver Heterogeneity in Multi-Scale Car-Following Dynamics Ein Driving Regime-embedded Deep Learning Framework zur Modellierung von Intra-Driver Heterogenität in multi-Scale Car-Following Dynamics 用于模拟多规模汽车跟踪动态中多规模汽车内部异质性模型的驱动系统(以制度为床位的深学习框架) 2506.05902v1
  • 1666 06-06 State-Covering Trajectory Stitching for Diffusion Planners State-Covering Trajektorie Stich für Diffusionsplaner 用于扩散规划器的 国家覆盖轨迹切换 2506.00895v2
  • 1667 06-06 Few Labels are all you need: A Weakly Supervised Framework for Appliance Localization in Smart-Meter Series Wenige Labels sind alles, was Sie brauchen: Ein schwach überwachtes Framework für die Gerätelokalisierung in der Smart-Meter-Serie 少数标签是您所需要的:智能- 密特系列中受微弱监督的应用程序本地化框架 2506.05895v1
  • 1668 06-06 Policy Optimization for Continuous-time Linear-Quadratic Graphon Mean Field Games Politikoptimierung für lineare-Quadratic Graphon Mean Field Games mit kontinuierlicher Zeit 连续时线性二次曲线平面平面运动会的政策优化 2506.05894v1
  • 1669 06-06 Open Your Eyes: Vision Enhances Message Passing Neural Networks in Link Prediction Öffnen Sie Ihre Augen: Vision verbessert die Botschaft an neuralen Netzwerken in der Link-Vorhersage ” 睁开眼睛:愿景在链接预测中加强信息传递神经网络 “ 2505.08266v3
  • 1670 06-06 An Uncertainty-Aware ED-LSTM for Probabilistic Suffix Prediction Eine unsichere ED-LSTM für probabilistische Suffix-Vorhersage 用于概率后缀预测的不确定性感知ED-LSTM 2505.21339v2
  • 1671 06-06 Variational Inference for Quantum HyperNetworks Variationelle Schlussfolgerung für Quantum HyperNetworks 量子超网络的变分推断 2506.05888v1
  • 1672 06-06 NILMFormer: Non-Intrusive Load Monitoring that Accounts for Non-Stationarity NILMFormer: Non-Intrusive Load Monitoring, die Konten für Non-Stationarität NILMFormer: 非入侵性账户的无侵入性负载监测 2506.05880v1
  • 1673 06-06 Infinite-Dimensional Diffusion Models Infinite-Dimensional Diffusionsmodelle 无限不同传播模型 2302.10130v3
  • 1674 06-06 A projection-based framework for gradient-free and parallel learning Ein projektionsbasierter Rahmen für gradientenfreies und paralleles Lernen 以预测为基础的无梯度和平行学习框架 2506.05878v1
  • 1675 06-06 Interpretable Clustering Ensemble Interpretiertes Cluster-Ensemble 可解释的集群组合组合 2506.05877v1
  • 1676 06-06 BestServe: Serving Strategies with Optimal Goodput in Collocation and Disaggregation Architectures BestServe: Servierstrategien mit optimalem Goodput in Kollokations- und Disaggregationsarchitekturen 最佳服务:在合用和拆分结构中采用最优优利用的优良战略 2506.05871v1
  • 1677 06-06 Loss Functions for Predictor-based Neural Architecture Search Verlustfunktionen für vorausschauende neurale Architektursuche 预测基于神经建筑搜索的损失函数 2506.05869v1
  • 1678 06-06 Stealix: Model Stealing via Prompt Evolution Stealix: Modelldiebstahl über Prompt Evolution Stealix: 通过快速进化偷盗模型 2506.05867v1
  • 1679 06-06 Enhancing pretraining efficiency for medical image segmentation via transferability metrics Steigerung der Vorschulungseffizienz für die Segmentierung medizinischer Bilder durch Übertragbarkeitsmetriken 通过可转让性指标,提高医疗图象分形的训练前效率 2410.18677v2
  • 1680 06-06 Conditioning Diffusions Using Malliavin Calculus Konditionierung Diffusionen mit Malliavin Calculus 使用 Malliavin Calculus 有条件的传播 2504.03461v2
  • 1681 06-06 Improved Allergy Wheal Detection for the Skin Prick Automated Test Device Verbesserte Allergikererkennung für das Automatisierte Testgerät Skin Prick 改进皮肤皮板自动自动测试装置的过敏 wHeal 检测 2506.05862v1
  • 1682 06-06 Wavelet-based Disentangled Adaptive Normalization for Non-stationary Times Series Forecasting Wavelet-basiert Disentangled Adaptive Normalisierung für nicht-stationäre Zeitreihen Forecasting 用于非静止时报系列预测的不固定时报调适正常化 2506.05857v1
  • 1683 06-06 TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research TinySQL: Ein progressiver Text-zu-SQL-Datensatz für die mechanistische Interpretationsforschung TinySQL: 用于机械解释性研究的渐进文本到SQL数据集 2503.12730v3
  • 1684 06-06 BoA: Attention-aware Post-training Quantization without Backpropagation BoA: Aufmerksamkeitsbewusste Nachschulung Quantisierung ohne Backpropagation BoA: 注意培训后量化而不反言反言 2406.13474v3
  • 1685 06-06 Training-Free Query Optimization via LLM-Based Plan Similarity Schulungsfreie Abfrage Optimierung über LLM-basierte Plan-Ähnlichkeit 通过LLM-基于LLM-计划相似性实现无培训查询优化 2506.05853v1
  • 1686 06-06 Regional, Lattice and Logical Representations of Neural Networks Regionale, gitter- und logische Vertretungen neuraler Netzwerke 神经网络的区域、拉蒂斯和逻辑代表 2506.05834v1
  • 1687 06-06 Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning Mit dem Pfeil der Zeit lernen: Hyperbolische Geometrie für rückseitig-kompatibles Repräsentationslernen 沿时间之箭学习:反向比较代表学习的超单几何学 2506.05826v1
  • 1688 06-06 Domain Generalizable Knowledge Tracing via Concept Aggregation and Relation-Based Attention Domain Generalizable Knowledge Tracing über Konzeptaggregation und relationsbasierte Aufmerksamkeit 通过概念聚合和基于关系的关注寻找可普遍适用的知识 2407.02547v2
  • 1689 06-06 SAGE: A Framework of Precise Retrieval for RAG SAGE: Ein Rahmen für präzises Retrieval für RAG SAGE:RAG的精确检索框架 2503.01713v3
  • 1690 06-06 SageAttention2++: A More Efficient Implementation of SageAttention2 SageAttention2++: Effizientere Umsetzung von SageAttention2 SageAttention2++:更有效地实施SageAttention2 2505.21136v3
  • 1691 06-06 Non-Asymptotic Length Generalization Nicht-Asymptotische Längenverallgemeinerung 非负时长一般化 2506.03085v2
  • 1692 06-06 Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants 高有机铜剂在 lvLiNGAM 中的因果效应识别 2506.05202v2
  • 1693 06-06 Taming Knowledge Conflicts in Language Models Zähmen von Wissenskonflikten in Sprachmodellen 在语言模式中讨论知识冲突 2503.10996v2
  • 1694 06-06 Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models Boolesche Matrix-Logik-Programmierung für aktives Lernen von Genfunktionen in metabolischen Netzwerkmodellen im Genommaßstab 在基因组规模代谢网络模型中积极学习基因功能的布尔矩阵逻辑逻辑编程 2405.06724v4
  • 1695 06-06 SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference SpargeAttention: Genaue und trainingsfreie Sparse Attention zur Beschleunigung jeder Modellinferenz SpargeAttention:准确且无需训练的稀疏注意力,加速任意模型推理 2502.18137v5
  • 1696 06-06 Overcoming Challenges of Partial Client Participation in Federated Learning : A Comprehensive Review Herausforderungen der Teilkundenbeteiligung am Föderierten Lernen überwinden : Ein umfassender Überblick 克服部分客户参与联邦学习的挑战:全面审查 2506.02887v2
  • 1697 06-06 BOLD: Boolean Logic Deep Learning BOLD: Boolean Logic Deep Learning 布尔逻辑深层学习 2405.16339v2
  • 1698 06-06 Positional Encoding meets Persistent Homology on Graphs Positionale Kodierung trifft Persistente Homologie auf Graphen 定位编码与图纸上的常同理学相识 2506.05814v1
  • 1699 06-06 E-LDA: Toward Interpretable LDA Topic Models with Strong Guarantees in Logarithmic Parallel Time E-LDA: Auf dem Weg zu interpretierbaren LDA-Themenmodellen mit starken Garantien in logarithmischer Parallelzeit E-LDA:争取在对数平行时间建立具有有力保证的可解释的LDA专题模型 2506.07747v1
  • 1700 06-06 Unintended Harms of Value-Aligned LLMs: Psychological and Empirical Insights Unbeabsichtigte Schäden an wertorientierten LLMs: Psychologische und empirische Erkenntnisse 价值观一致的LLMs:心理和实证透视的意外伤害 2506.06404v1
  • 1701 06-06 ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think ZeroFlow: Katastrophisches Vergessen zu überwinden ist einfacher als Sie denken 零花:克服灾难的遗忘比你想象的容易 2501.01045v4
  • 1702 06-06 Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model Neuraler Zusammenbruch in kumulativen Linkmodellen für Ordinalregression: Eine Analyse mit ungezwungenem Feature Model Ordinal 倒退累积链接模型中的神经折叠:含有不受限制特性模型的分析 2506.05801v1
  • 1703 06-06 Uncertainty Propagation in the Fast Fourier Transform Ungewissheitsverbreitung in der schnellen Fourier-Transformation 快速傅里叶变形中的不确定性传播 2504.10136v2
  • 1704 06-06 Option Pricing Using Ensemble Learning Optionspreise mit Ensemble-Lernen 使用集合学习的选项定价 2506.05799v1
  • 1705 06-06 Maximum Entropy Reinforcement Learning with Diffusion Policy Maximales Entropie-Verstärkung-Lernen mit Diffusionspolitik 采用传播政策进行最大内容强化学习 2502.11612v3
  • 1706 06-06 EqCollide: Equivariant and Collision-Aware Deformable Objects Neural Simulator EqCollide: Gleichwertige und kollisionsfähige verformbare Objekte Neural Simulator EqCollide: 等同和相撞-可变形物体神经模拟器 2506.05797v1
  • 1707 06-06 TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining TiC-LM: Ein Web-Scale-Benchmark für zeitkontinuierliches LLM-Vortraining TiC-LM:培训前时间-持续性LLM网络基准 2504.02107v3
  • 1708 06-06 Exploiting Similarity for Computation and Communication-Efficient Decentralized Optimization Ausnutzung von Ähnlichkeiten für die Berechnung und Kommunikation-Effiziente dezentralisierte Optimierung 利用计算和通信效率-权力下放优化的相似性 2506.05791v1
  • 1709 06-06 Robust sensor fusion against on-vehicle sensor staleness Robuste Sensor-Fusion gegen Stillstand des Sensors im Fahrzeug 压制车辆传感器悬浮剂的强感传感器聚变 2506.05780v1
  • 1710 06-06 Pegasus: A Universal Framework for Scalable Deep Learning Inference on the Dataplane Pegasus: Ein universelles Framework für skalierbare Deep Learning-Schlussfolgerungen auf dem Dataplane Pegasus:数据飞机可缩放深学习推理通用框架 2506.05779v1
  • 1711 06-06 Exploring Dark Knowledge under Various Teacher Capacities and Addressing Capacity Mismatch Dunkles Wissen unter verschiedenen Lehrer-Kapazitäten erforschen und Kapazitäts-Mismatch angehen 在各种教师能力之下探索黑暗知识,解决能力不足问题 2405.13078v2
  • 1712 06-06 ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation ActionPiece: Kontextuelle Tokenisierung von Handlungssequenzen für eine generative Empfehlung ActionPiece: 产生建议的背景化行动序列 2502.13581v2
  • 1713 06-06 Evaluating Neuron Explanations: A Unified Framework with Sanity Checks Neuron-Erklärungen bewerten: Ein einheitliches Framework mit Sanitätskontrollen 评估内向解释:一个统一框架,检查是否合乎情理 2506.05774v1
  • 1714 06-06 SGD Jittering: A Training Strategy for Robust and Accurate Model-Based Architectures SGD Jittering: Eine Schulungsstrategie für robuste und präzise modellbasierte Architekturen SGD Jittering: 强健和准确的基于模型的架构的培训战略 2410.14667v3
  • 1715 06-06 Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks Visualisierung, Nachdenken und Abbau der verlorenen Landschaft tiefer neuraler Netzwerke 深神经网络丧失的地貌景观 2405.12493v2
  • 1716 06-06 Predicting ICU In-Hospital Mortality Using Adaptive Transformer Layer Fusion Vorhersage der Krankenhaussterblichkeit der Intensivstation mittels adaptiver Transformer Layer Fusion 利用适应性变异器图层融合预测伊斯兰法院联盟在医院内的死亡率 2506.04924v2
  • 1717 06-06 AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation AANet: Virtuelles Screening unter struktureller Unsicherheit über Alignment und Aggregation AANet:通过对齐和聚合在结构不确定性下进行虚拟筛选 2506.05768v1
  • 1718 06-06 On the Query Complexity of Verifier-Assisted Language Generation Zur Fragekomplexität der sprachlichen Generierung von Prüfern 关于验证者协助语言生成的查询复杂程度 2502.12123v2
  • 1719 06-06 BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning BiTrajDiff: Bidirektionale Trajektorien-Generation mit Diffusionsmodellen für Offline-Verstärkung BiTrajDiff:双向轨迹生成,带有离线强化学习传播模型 2506.05762v1
  • 1720 06-06 Mapping correlations and coherence: adjacency-based approach to data visualization and regularity discovery Mapping-Korrelationen und Kohärenz: Adjazenz-basierter Ansatz zur Datenvisualisierung und Regelmäßigkeits-Entdeckung 绘图的相互关系和一致性:基于对应的可视化和定期发现数据方法 2506.05758v1
  • 1721 06-06 FlowOE: Imitation Learning with Flow Policy from Ensemble RL Experts for Optimal Execution under Heston Volatility and Concave Market Impacts FlowOE: Imitation Learning with Flow Policy von Ensemble RL Experts for Optimal Execution unter Heston Volatility und Concave Market Impacts FlowOE:在赫斯顿波动和集中市场影响下最优化执行的集合RL专家以流动政策进行模拟学习 2506.05755v1
  • 1722 06-06 Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective Beschränkte Probenahmen für Sprachmodelle sollten einfach sein: Eine MCMC-Perspektive 语言模式抽样应容易:MCMC视角 2506.05754v1
  • 1723 06-06 Integrating Spatiotemporal Features in LSTM for Spatially Informed COVID-19 Hospitalization Forecasting Integration von Spatiotemporalen Features in LSTM für spatially Informed COVID-19 Hospitalization Forecasting LSTM综合地貌特征纳入LSTM系统,用于空间知情COVID-19医院化预测 2506.05752v1
  • 1724 06-06 Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models 低Rank适应的计算限制(LORA)变换模型的精确调整 2406.03136v2
  • 1725 06-06 Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance Effiziente Online-RFT mit Plug-and-Play-LLM-Richtern: Entsperren State-of-the-Art-Performance 与 “ 插插 “ 和 “ Play LLM “ 法官一起高效在线RFT:解锁最新业绩 2506.05748v1
  • 1726 06-06 SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models SPRINT: Interleaved Planung und parallelisierte Ausführung in vernünftigen Modellen ermöglichen SPRINT: 在理由说明模式中促进跨离规划和平行执行 2506.05745v1
  • 1727 06-06 In-context denoising with one-layer transformers: connections between attention and associative memory retrieval In-Context-Denoising mit Einschichttransformatoren: Verbindungen zwischen Aufmerksamkeit und assoziativem Speicherabruf 与一层变压器的自通脱色:注意和联动内存检索之间的联系 2502.05164v2
  • 1728 06-06 FinSage: A Multi-aspect RAG System for Financial Filings Question Answering FinSage: Ein Multi-Aspect-RAG-System für Finanzfilialen Frageantworten FinSage:财务申报问题解答的多层RAG系统 2504.14493v3
  • 1729 06-06 ResearchTown: Simulator of Human Research Community ResearchTown: Simulator der Human Research Community 研究对象:人类研究界模拟者 2412.17767v2
  • 1730 06-06 Generalized Incremental Learning under Concept Drift across Evolving Data Streams Allgemeines Inkrementelles Lernen unter Konzept Drift über Evolving Data Streams 贯穿不断演变的数据流的 “ 概念漂流 “ 概念下的普通递增学习 2506.05736v1
  • 1731 06-06 Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness Vergessen LLMs wirklich? Bewertung des Unlernens mit Wissenskorrelation und Vertrauensbewusstsein LLMs真的会忘却吗?用知识关联和信心意识来评价不学习 2506.05735v1
  • 1732 06-06 Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity Nicht-konvexe Matrix-Erfassung: Brechen der quadratischen Rank-Schranke in der Probenkomplexität 非曲线矩阵表感测:打破样本复杂程度的二次级屏障 2408.13276v3
  • 1733 06-06 GroMo: Plant Growth Modeling with Multiview Images GroMo: Plant Growth Modeling mit Multiview-Bildern GroMo:以多视角图像建模的植物增长模型 2503.06608v2
  • 1734 06-06 Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization Provable Complexity Verbesserung von AdaGrad über SGD: obere und untere Bounds in stochastischen Non-Convex-Optimierung AdaGrad的复杂度高于SGD的改善:在斯托切性非Convex优化中的上下界 2406.04592v3
  • 1735 06-06 Large Language Models are Good Relational Learners Große Sprachmodelle sind gute relationale Lernende 大语言模型是好关系学习者 2506.05725v1
  • 1736 06-06 Any-Class Presence Likelihood for Robust Multi-Label Classification with Abundant Negative Data Jede-Klasse-Präsenz-Wahrscheinlichkeit für robuste Multi-Label-Klassifikation mit überschwänglichen Negativdaten 具有大量负值数据的强性多标签分类中存在任何类别的可能性 2506.05721v1
  • 1737 06-06 Optimization Proxies using Limited Labeled Data and Training Time – A Semi-Supervised Bayesian Neural Network Approach Optimierungs-Proxies mit begrenzten beschrifteten Daten und Trainingszeit – Ein semi-überwachter Bayesian Neural Network-Ansatz 使用有限标签数据和培训时间的优化代理 – 半监督贝叶斯神经网络方法 2410.03085v3
  • 1738 06-06 Grokking Beyond the Euclidean Norm of Model Parameters Grokking jenseits der euklidischen Norm von Modellparametern 示范参数欧洲标准 2506.05718v1
  • 1739 06-06 Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning Ensemble Elastic DQN: Ein neuartiger Mehrschritt-Ensemble-Ansatz zur Bewältigung von Überschätzungen im tiefen wertbasierten Verstärkungslernen 共合大 Elastic DQN: 一种新颖的多步共步办法,以解决深重价值强化学习中的高估问题 2506.05716v1
  • 1740 06-06 Mirage: A Multi-Level Superoptimizer for Tensor Programs Mirage: Ein Multi-Level-Superoptimizer für Tensor-Programme 幻影:向导方案多层次超强激励器 2405.05751v3
  • 1741 06-06 Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation Kommen Sie zusammen, aber nicht jetzt: Eine progressive Strategie, um Low-Rank-Anpassung zu fördern 齐心合力,但现在不是现在:一个推进低Rank适应的渐进战略 2506.05713v1
  • 1742 06-06 A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models Eine Umfrage über Sparse Autoencoder: Interpretation der internen Mechanismen großer Sprachmodelle 粗略自动编码器调查:解释大语言模式内部机制 2503.05613v2
  • 1743 06-06 Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application Latent Diffusion Modellbasierter Denoisierungsempfänger für 6G Semantische Kommunikation: Von der stochastischen Differentialtheorie zur Anwendung 用于 6G 语义通讯: 从斯托卡差异理论到应用的 6G 语义通讯的 以 DEM 为基础的前传播模型模型 2506.05710v1
  • 1744 06-06 Understanding Memorization in Generative Models via Sharpness in Probability Landscapes Gedächtnis in Generativen Modellen über Schärfe in Wahrscheinlichkeitslandschaften verstehen 通过在概率景观中的尖锐化理解生成模型中的记忆化 2412.04140v3
  • 1745 06-06 Online Detection of LLM-Generated Texts via Sequential Hypothesis Testing by Betting Online-Erkennung von LLM-generierten Texten durch sequentielle Hypothesenprüfung durch Wetten 通过打赌按顺序假设测试在线检测LLM发光文本 2410.22318v3
  • 1746 06-06 Action-Adaptive Continual Learning: Enabling Policy Generalization under Dynamic Action Spaces Action-Adaptive Continual Learning: Politische Generalisierung unter dynamischen Aktionsräumen aktivieren 行动促进性持续学习:动态行动空间下的扶持性政策概括化 2506.05702v1
  • 1747 06-06 Statistically Valid Post-Deployment Monitoring Should Be Standard for AI-Based Digital Health Statistisch gültige Überwachung nach der Bereitstellung sollte Standard für KI-basierte digitale Gesundheit sein 统计上有效的部署后监测应作为AI 数字健康的标准 2506.05701v1
  • 1748 06-06 Knowledge Retention for Continual Model-Based Reinforcement Learning Wissensvorbehalt für kontinuierliches modellbasiertes Verstärkungslernen 面向持续的基于模型的强化学习的知识保留 2503.04256v4
  • 1749 06-06 Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning Allgemeines LLM-Lernen von Graphen-Synthetischen Daten mit Verstärkungs-Lernen 利用强化学习学习图形合成数据学习 2506.00845v2
  • 1750 06-06 Graph Attention Networks Unleashed: A Fast and Explainable Vulnerability Assessment Framework for Microgrids Graphen-Achtungsnetzwerke unleashed: Ein schneller und erklärbarer Sicherheitsbewertungsrahmen für Mikrogrids 未剥离的图示注意网络:微电网快速和可解释的脆弱性评估框架 2503.00786v2
  • 1751 06-06 Being Strong Progressively! Enhancing Knowledge Distillation of Large Language Models through a Curriculum Learning Framework Progressiv stark sein! Erweiterung der Wissensdestillation von großen Sprachmodellen durch einen Curriculum Learning Framework 通过课程学习框架加强大语言模式的知识提炼 2506.05695v1
  • 1752 06-06 Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs Direct Behavior Optimization: Entsperren des Potenzials leichter LLMs 直接行为优化:释放轻量级LLM的潜力 2506.06401v1
  • 1753 06-06 Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning Gerade genug denken: Effiziente Vernunft mit adaptiven Längenstrafen Stärkungslernen ” 思考够了 “ : “ 适应性延长刑罚强化学习的高效理性 “ 2506.05256v2
  • 1754 06-06 Simmering: Sufficient is better than optimal for training neural networks Simmering: Ausreichend ist besser als optimal für die Ausbildung neuronaler Netzwerke 浸泡:对培训神经网络而言,充足比最佳的更好 2410.19912v2
  • 1755 06-06 TracLLM: A Generic Framework for Attributing Long Context LLMs TracLLM: Ein generisches Rahmenwerk für die Zuweisung von LLMs mit langem Kontext TracLLM: 长期LMLM授标通用框架 2506.04202v2
  • 1756 06-06 RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models RoSTE: Ein effizienter Quantization-Aware-überwachter Feintuning-Ansatz für große Sprachmodelle RoSTE:对大语言模式采用高效率的量化软件监督微调方法 2502.09003v3
  • 1757 06-06 Sparse Autoencoders, Again? Sparse Autoencoder, schon wieder? 又是粗糙的自动编码器吗? 2506.04859v2
  • 1758 06-06 Learning Design-Score Manifold to Guide Diffusion Models for Offline Optimization Learning Design-Score Manifold zur Anleitung von Diffusionsmodellen für Offline-Optimierung 指导离线优化推广模式 2506.05680v1
  • 1759 06-06 Topology-aware Neural Flux Prediction Guided by Physics Topologie-bewusst Neurale Flux-Vorhersage geleitet von der Physik 物理引导 2506.05676v1
  • 1760 06-06 LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models LauraTSE: Target Speaker Extraction mit Auto-Regressive Decoder-Only Language Models LauraTSE: 使用自动递退解码器单一语言模式将议长抽取作为目标 2504.07402v2
  • 1761 06-06 Kernel Density Machines Kerndichtemaschinen 内核密度机 2504.21419v2
  • 1762 06-06 Contextually Guided Transformers via Low-Rank Adaptation Kontextuell geführte Transformer über Low-Rank-Anpassung 通过低辐射适应实现上下文向导变形器 2506.05672v1
  • 1763 06-06 BAQ: Efficient Bit Allocation Quantization for Large Language Models BAQ: Effiziente Bitallokation Quantisierung für große Sprachmodelle BAQ: 大语言模式的高效比分配比量量化 2506.05664v1
  • 1764 06-06 CoopetitiveV: Leveraging LLM-powered Coopetitive Multi-Agent Prompting for High-quality Verilog Generation CoopetitiveV: LLM-powered Coopetitive Multi-Agent für hochwertige Verilog-Generation 协作V:利用LLM-动力协同协作的多方协作促进高品质活性一代 2412.11014v2
  • 1765 06-06 Emulating compact binary population synthesis simulations with robust uncertainty quantification and model comparison: Bayesian normalizing flows Emulieren kompakter binärer Populationssynthesesimulationen mit robuster Unsicherheitsquantifizierung und Modellvergleich: Bayesische Normalisierungsströme 以稳健的不确定性量化和模型比较来模拟精密的二元人口合成模拟:贝叶斯归一化流 2506.05657v1
  • 1766 06-06 Federated Foundation Model for GI Endoscopy Images Federiertes Stiftungsmodell für GI-Endoskopie-Bilder GI 内窥镜图像联邦基金会模型 2505.24108v2
  • 1767 06-06 Learning to Weight Parameters for Data Attribution Gewichtungsparameter für die Zuordnung von Daten lernen 数据归属的重度参数学习 2506.05647v1
  • 1768 06-06 Computational bottlenecks for denoising diffusions Rechnerische Engpässe bei entrauschenden Diffusionen 去噪扩散的计算瓶颈 2503.08028v2
  • 1769 06-06 Pseudo Relevance Feedback is Enough to Close the Gap Between Small and Large Dense Retrieval Models Pseudo-Relevanz-Feedback reicht aus, um die Lücke zwischen kleinen und großen Dense-Retrieval-Modellen zu schließen 伪相关反馈足以弥合小型和大型稠密检索模型之间的差距 2503.14887v2
  • 1770 06-06 Interpretable Enzyme Function Prediction via Residue-Level Detection Interpretierbare Enzymfunktionsvorhersage über Rückstandserkennung 通过残留水平探测的可解释酶功能预测 2501.05644v2
  • 1771 06-06 Where is the signal in tokenization space? Wo ist das Signal im Tokenisationsraum? 象征空间的信号在哪里? 2408.08541v2
  • 1772 06-06 Hierarchical Refinement: Optimal Transport to Infinity and Beyond Hierarchische Verfeinerung: Optimaler Transport zur Unendlichkeit und darüber hinaus 梯级精炼:向无限及以后的最佳运输 2503.03025v2
  • 1773 06-06 Autocomp: LLM-Driven Code Optimization for Tensor Accelerators Autocomp: LLM-gesteuerte Code-Optimierung für Tensor-Beschleuniger 自动comp: LLM- Driven 代码对 Tensor 加速器的优化 2505.18574v2
  • 1774 06-06 Estimating stationary mass, frequency by frequency Schätzung der stationären Masse, Häufigkeit nach Häufigkeit 估计固定质量,按频率分列的频率 2503.12808v3
  • 1775 06-06 Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer Longitudinal Targeted Minimal Loss-basierte Abschätzung mit zeitlich-differenzierter Heterogener Transformer 与时间差异变化异质变异器的纵向定值最低损失估计值 2404.04399v2
  • 1776 06-06 Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones Projektierbare Modelle: One-Shot-Generation von kleinen Spezialtransformatoren aus großen Modellen 集成项目模型:大型一型小型专门化变形器的单制制式 2506.05641v1
  • 1777 06-05 (4) A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition Ein fiktiver Q&A-Datensatz für das Studieren von Memorization und Knowledge Acquisition 用于研究记忆化和知识获取的虚构问答数据集 2506.05639v1
  • 1778 06-05 Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks’ Internal Representations Erklärbare Konzeptgenerierung durch Vision-Sprachvorliebe Lernen zum Verständnis der internen Repräsentationen neuraler Netzwerke 通过愿景-语言优先学习,了解神经网络内部代表网络,可解释的概念生成 2408.13438v3
  • 1779 06-05 CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP CLIPErase: Effizientes Entlernen von Visual-Textual Associations in CLIP CLIPErase:在CLIP中有效不学习视觉-语言协会 2410.23330v2
  • 1780 06-05 Bayesian Inference for Correlated Human Experts and Classifiers Bayesische Schlussfolgerung für korrelierte menschliche Experten und Klassifikatoren 与相关人类专家和分类有关连的人类专家和分类人的贝叶斯推论 2506.05636v1
  • 1781 06-05 AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking AdaReasoner: Adaptive Reasoning ermöglicht flexibleres Denken AdaReasoner:适应性理性使思考更灵活 2505.17312v2
  • 1782 06-05 AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization AutoQD: Automatische Entdeckung von unterschiedlichen Verhalten mit Qualitäts-Diversitäts-Optimierung 自动QD: 以质量- 多样性优化自动发现多种行为 2506.05634v1
  • 1783 06-05 The TESS Ten Thousand Catalog: 10,001 uniformly-vetted and -validated Eclipsing Binary Stars detected in Full-Frame Image data by machine learning and analyzed by citizen scientists Der TESS Zehntausend-Katalog: 10,001 gleichmäßig getestete und validierte Eclipsing Binary Stars in Full-Frame-Bilddaten durch maschinelles Lernen erkannt und von Bürgerwissenschaftlern analysiert TESS 10 000万个目录:10 001个统一审查的和经过验证的二进制星,通过机器学习和由公民科学家分析,在全框架图像数据中检测到 2506.05631v1
  • 1784 06-05 A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series Ein verlässlicher Rahmen für die Mensch-in-the-Loop-Anomalie-Erkennung in der Zeitreihe 时间序列中人类在Loop异常探测的可靠框架 2405.03234v3
  • 1785 06-05 GP-MoLFormer-Sim: Test Time Molecular Optimization through Contextual Similarity Guidance GP-MoLFormer-Sim: Testzeit-Molekularoptimierung durch kontextuelle Ähnlichkeitsführung GP-MoLFormer-Sim:通过背景相似性指导试验时间分子优化 2506.05628v1
  • 1786 06-05 Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency Grundlegende Grenzen des Prompt-Tunings von Transformern: Universalität, Kapazität und Effizienz Transformer提示微调的基本极限:通用性、容量和效率 2411.16525v2
  • 1787 06-05 Theoretical Analysis of Positional Encodings in Transformer Models: Impact on Expressiveness and Generalization Theoretische Analyse von Positionskodierungen in Transformer-Modellen: Auswirkungen auf Expressivität und Generalisierung 变换模型中位置编码理论分析:对表现力和普遍化的影响 2506.06398v1
  • 1788 06-05 Two-dimensional Taxonomy for N-ary Knowledge Representation Learning Methods Zweidimensionale Taxonomie für N-are Wissensdarstellung Lernmethoden Nary知识代表制学习方法二维分类学 2506.05626v1
  • 1789 06-05 Heterogeneous Sequel-Aware Graph Neural Networks for Sequential Learning Heterogene Sequel-Aware Graph Neuronale Netzwerke für sequentielles Lernen 用于序列学习的异异质Sequel-Aware 图像神经网络 2506.05625v1
  • 1790 06-05 Distributional Matrix Completion via Nearest Neighbors in the Wasserstein Space Verteilungsmatrix Fertigstellung über die nächsten Nachbarn im Wassersteinraum 通过瓦塞斯坦空间近距离邻居完成分布矩阵 2410.13112v2
  • 1791 06-05 Active Learning of Piecewise Gaussian Process Surrogates Aktives Lernen des stückweisen Gaußschen Prozesses Surrogats 正在积极学习小费高斯进程代得国家 2301.08789v3
  • 1792 06-05 EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM? EnsemW2S: Kann ein Ensemble von LLMs dazu genutzt werden, eine stärkere LLM zu erhalten? EnsemW2S:能否利用捆绑的LLMs获得更强大的LMm? 2410.04571v2
  • 1793 06-05 Exploring Representations and Interventions in Time Series Foundation Models Exploring Repräsentationen und Interventionen in Time Series Foundation Models 探讨在时间系列基础模型中的代表性和干预措施 2409.12915v5
  • 1794 06-05 Toward Efficient Kernel-Based Solvers for Nonlinear PDEs Auf dem Weg zu effizienten Kernel-basierten Lösern für nichtlineare PDEs 实现非线性PDE 高效的以核心为基础的内核解决问题器 2410.11165v4
  • 1795 06-05 Population-Proportional Preference Learning from Human Feedback: An Axiomatic Approach Bevölkerung-proportionale Präferenz Lernen aus menschlichem Feedback: Ein axiomatischer Ansatz 从人类反馈中学习:一种不言而喻的方法 2506.05619v1
  • 1796 06-05 Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods Rollout Roulette: Ein probabilistischer Schlussfolgerungsansatz zur Inferenz-Zeit-Skalierung von LLMs mit partikelbasierten Monte-Carlo-Methoden 滚轮推出:使用以粒子为基础的蒙特卡洛方法对LLMs的推论-时间尺度的概率推推方法 2502.01618v4
  • 1797 06-05 LFA applied to CNNs: Efficient Singular Value Decomposition of Convolutional Mappings by Local Fourier Analysis LFA für CNNs: Effiziente Singulärwertzerlegung von Faltungsabbildungen durch lokale Fourier-Analyse LFA应用于CNN:通过局部傅里叶分析对卷积映射进行高效奇异值分解 2506.05617v1
  • 1798 06-05 When Maximum Entropy Misleads Policy Optimization Wenn maximale Entropie die Politikoptimierung verfälscht 当最大 Entroppy 错误引导政策优化 2506.05615v1
  • 1799 06-05 Extracting Interpretable Logic Rules from Graph Neural Networks Extrahieren von interpretierbaren Logic Rules aus Graph Neural Networks 从图形神经网络中提取可解释逻辑规则 2503.19476v2
  • 1800 06-05 Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning Reasoning Towards Fairness: Bias in Sprachmodellen durch reasoning-guided Fine-Tuning abmildern 追求公平的理由:通过有理、有理、有指导的微调减少语言模式中的偏见 2504.05632v3
  • 1801 06-05 Robust and Efficient Transfer Learning via Supernet Transfer in Warm-started Neural Architecture Search Robustes und effizientes Transfer-Lernen über Supernet-Transfer in der neuralen Architektur 通过热启动的神经结构搜索中的超级网传输进行强力和高效传输学习 2407.20279v2
  • 1802 06-05 Adversarial Tokenization Adversariale Tokenisierung 对抗感应器化 2503.02174v2
  • 1803 06-05 Model-Based Exploration in Monitored Markov Decision Processes Modellbasierte Exploration in überwachten Markov-Entscheidungsprozessen 在监测的Markov决策过程中进行基于模型的探索 2502.16772v4
  • 1804 06-05 An Optimal Cascade Feature-Level Spatiotemporal Fusion Strategy for Anomaly Detection in CAN Bus Eine optimale Kaskade-Feature-Level-Spatiotemporale Fusionsstrategie für Anomalieerkennung im CAN-Bus CAN公共汽车异常探测的最佳最佳岩岩岩层地貌级超光时聚变战略 2501.18821v3
  • 1805 06-05 FaCTR: Factorized Channel-Temporal Representation Transformers for Efficient Time Series Forecasting FaCTR: Factorized Channel-Temporal Representation Transformers für effiziente Zeitreihenprognosen FACTR: 高效时间序列预报的有条频道-临时代表制变革器 2506.05597v1
  • 1806 06-05 Zero-shot protein stability prediction by inverse folding models: a free energy interpretation Null-Shot-Proteinstabilitätsprognose durch inverse Faltmodelle: eine freie Energieinterpretation 通过反折叠模型进行零射蛋白稳定性预测:免费能源解释 2506.05596v1
  • 1807 06-05 Unsupervised Latent Pattern Analysis for Estimating Type 2 Diabetes Risk in Undiagnosed Populations Unüberwachte Latent Pattern Analyse zur Schätzung des Typ-2-Diabetes-Risikos in nicht diagnostizierten Populationen 未经监督的对未诊断的人群2型糖尿病风险估算的 2505.21824v2
  • 1808 06-05 Nonlinear Causal Discovery through a Sequential Edge Orientation Approach Nichtlineare Causal Discovery durch einen sequentiellen Edge-Orientierungsansatz 通过序列边缘方向方法的非线性因果发现 2506.05590v1
  • 1809 06-05 MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark MMTU: Ein massiver Multi-Task-Tabellen-Verstands- und Orientierungs-Benchmark MMTU: 大规模多任务表格理解和说明理由基准 2506.05587v1
  • 1810 06-05 CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions CoFrNets: Interpretierte Neuralarchitektur Inspiriert von fortgesetzten Brüchen CoFrNets:受连续分数启发的可解释神经结构 2506.05586v1
  • 1811 06-05 TabFlex: Scaling Tabular Learning to Millions with Linear Attention TabFlex: Tabulares Lernen zu Millionen mit linearer Aufmerksamkeit skalieren TabFlex: 以线性关注方式向百万上百万上百万人推广表格学习 2506.05584v1
  • 1812 06-05 Conformal Prediction Adaptive to Unknown Subpopulation Shifts Konforme Vorhersage Anpassung an unbekannte Subpopulationsverschiebungen 适应未知亚人口变化 2506.05583v1
  • 1813 06-05 On the Importance of Gaussianizing Representations Über die Bedeutung gaussianisierender Repräsentationen 关于 “ 高机构化 “ 代表的重要性 2505.00685v2
  • 1814 06-05 The Impact of Inference Acceleration on Bias of LLMs Die Auswirkungen von Inferenzbeschleunigung auf LLM-Bias 推推加速对LLMM的偏见的影响 2410.22118v3
  • 1815 06-05 Collaborative Learning in Agentic Systems: A Collective AI is Greater Than the Sum of Its Parts Gemeinsames Lernen in Agentensystemen: Eine kollektive KI ist größer als die Summe ihrer Teile 危险系统合作学习:集体AI大于其各部分的总和 2506.05577v1
  • 1816 06-05 When can in-context learning generalize out of task distribution? Wann kann das In-Context-Lernen aus der Aufgabenverteilung verallgemeinern? 什么时候能通俗易懂地了解任务分配的概况? 2506.05574v1
  • 1817 06-05 Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning Ravan: Multi-Head Low-Rank Anpassung für Federated Fine-Tuning Ravan:面向联邦微调的多头低秩适应 2506.05568v1
  • 1818 06-05 Partially-Supervised Neural Network Model For Quadratic Multiparametric Programming Teilweise überwachtes neurales Netzwerkmodell für die quadratische multiparametrische Programmierung 半二次多参数方案拟订部分运行的神经网络模型 2506.05567v1
  • 1819 06-05 Applying Informer for Option Pricing: A Transformer-Based Approach Informer für Optionspreise anwenden: Ein transformerbasierter Ansatz 备选定价应用内线:基于变换器的方法 2506.05565v1
  • 1820 06-05 Relational Conformal Prediction for Correlated Time Series Beziehungskonforme Vorhersage für korrelierte Zeitreihen Cor相关时间序列的通货膨胀正规预测 2502.09443v2
  • 1821 06-05 Mixed-Precision Conjugate Gradient Solvers with RL-Driven Precision Tuning Mixed-Precision Conjugate Gradient Solver mit RL-getriebener Präzisionstuning RL-Driven 精确调试混合精密共分梯度溶剂 2504.14268v4
  • 1822 06-05 DART-Vetter: A Deep LeARning Tool for automatic triage of exoplanet candidates DART-Vetter: Ein Deep LeARning Tool für die automatische Triage von Exoplaneten-Kandidaten DART – – 设计者:一个用于自动分流异丙型板块候选物的深深LeARning工具 2506.05556v1
  • 1823 06-05 Multivariate Temporal Regression at Scale: A Three-Pillar Framework Combining ML, XAI, and NLP Multivariate Temporale Regression im Maßstab: Ein Drei-Pillar-Framework kombiniert ML, XAI und NLP 规模上多变时间回落:三边框架组合 ML、 XAI 和 NLP 2504.02151v2
  • 1824 06-05 Online Conformal Model Selection for Nonstationary Time Series Online-Konforme Modellauswahl für nichtstationäre Zeitreihen 非固定时间系列在线非正式模型选择 2506.05544v1
  • 1825 06-05 Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data Agentomics-ML: Autonomes Machine Learning Experimentation Agent für Genomische und Transkriptionsdaten ML:基因组和转基因数据自动机械学习实验代理 2506.05542v1
  • 1826 06-05 SocialDF: Benchmark Dataset and Detection Model for Mitigating Harmful Deepfake Content on Social Media Platforms SocialDF: Benchmark Dataset und Detection Model zur Minderung schädlicher Deepfake-Inhalte auf Social Media-Plattformen 社会DF:社会媒体平台减少有害深层内容的基准数据集和检测模式 2506.05538v1
  • 1827 06-05 Faster Approx. Top-K: Harnessing the Full Power of Two Stages Schneller ca. Top-K: Die volle Kraft zweier Etappen nutzen 快速近似。 最高K级: 利用两个阶段的全功率 2506.04165v2
  • 1828 06-05 A new pathway to generative artificial intelligence by minimizing the maximum entropy Ein neuer Weg zur generativen künstlichen Intelligenz durch Minimierung der maximalen Entropie 通过最大限度地减少最大倍增 实现人工智能基因化的新途径 2502.13287v2
  • 1829 06-05 MCMC-Correction of Score-Based Diffusion Models for Model Composition MCMC-Korrektion von Score-basierten Diffusionsmodellen für die Modellzusammensetzung MC-基于记分传播模型模型的示范构成的校正 2307.14012v3
  • 1830 06-05 Spectral Graph Neural Networks are Incomplete on Graphs with a Simple Spectrum Spektrale Graphen-Neuralnetze sind auf Graphen mit einem einfachen Spektrum unvollständig 光谱图形神经网络在带有简单光谱的图形上不完整 2506.05530v1
  • 1831 06-05 Avoiding Death through Fear Intrinsic Conditioning Vermeiden des Todes durch Angst Intrinsische Konditionierung 避免因恐惧而死亡 2506.05529v1
  • 1832 06-05 MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning MORSE-500: Ein programmierbarer Video-Benchmark zum Stress-Test Multimodale Reasoning MORSE-500:压力测试多模式理由方案可控制的视频基准 2506.05523v1
  • 1833 06-05 The Complexity of Learning Sparse Superposed Features with Feedback Die Komplexität des Lernens Sparse überlagerte Funktionen mit Feedback 具有反馈的学习偏散超强特征的复杂性 2502.05407v3
  • 1834 06-05 Learning to Recover: Dynamic Reward Shaping with Wheel-Leg Coordination for Fallen Robots Learning to Recover: Dynamische Belohnungsformung mit Rad-Leg-Koordination für gefallene Roboter 学习恢复: 堕落机器人使用轮式轴协调进行动态回授形状 2506.05516v1
  • 1835 06-05 Winner-takes-all for Multivariate Probabilistic Time Series Forecasting Winner-takes-all für Multivariate probabilistische Zeitreihen-Prognose 多变量概率性时间序列预测的赢取全胜 2506.05515v1
  • 1836 06-05 Geometric and Physical Constraints Synergistically Enhance Neural PDE Surrogates Geometrische und physikalische Einschränkungen Synergistisch verbessern neurale PDE-Überlagerungen 同步增强神经PDE代号 2506.05513v1
  • 1837 06-05 Investigating Non-Transitivity in LLM-as-a-Judge Untersuchung der Nicht-Transitivität in LLM-as-a-Judge 研究LLM-as-a-Judge中的非传递性 2502.14074v3
  • 1838 06-05 How can Diffusion Models Evolve into Continual Generators? Wie können sich Diffusionsmodelle zu kontinuierlichen Generatoren entwickeln? 传播模型如何发展成持续发电机? 2505.11936v2
  • 1839 06-05 The Generative Leap: Sharp Sample Complexity for Efficiently Learning Gaussian Multi-Index Models Der Generative Sprung: Scharfe Probenkomplexität für effizientes Lernen Gaussischer Multi-Index-Modelle 产生式滑坡:高效学习高斯多指标模型的简单抽样复杂程度 2506.05500v1
  • 1840 06-05 Multidimensional Analysis of Specific Language Impairment Using Unsupervised Learning Through PCA and Clustering Multidimensionale Analyse der spezifischen Sprachbeeinträchtigung durch unbeaufsichtigtes Lernen durch PCA und Clustering 利用无监督的学习,通过常设仲裁院和集群,对特定语言障碍进行多层面分析 2506.05498v1
  • 1841 06-05 Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models Conformal Prediction Beyond the Seen: Eine fehlende Massenperspektive für Unsicherheitsquantifizierung in generativen Modellen 可见之外的非正式预测:在生成模型中不确定性量化的缺失质量角度 2506.05497v1
  • 1842 06-05 Learning-Augmented Hierarchical Clustering Learning-Augmented Hierarchical Clustering 学习增强的等级集群 2506.05495v1
  • 1843 06-05 GENIUS: A Generative Framework for Universal Multimodal Search GENIUS: Ein generatives Framework für die universelle multimodale Suche GENNIUS: 通用多模式搜索创创框架 2503.19868v2
  • 1844 06-05 Sentiment Analysis in Learning Management Systems Understanding Student Feedback at Scale Sentiment Analysis in Lernmanagement-Systemen Verständnis des Feedbacks von Studierenden auf Scale 学习管理系统对学习管理系统的情感分析 了解学生的大规模反馈 2506.05490v1
  • 1845 06-05 Initial Model Incorporation for Deep Learning FWI: Pretraining or Denormalization? Ursprüngliche Modellinkorporation für Deep Learning FWI: Vor- oder Entnormierung? FWI: 培训前或非正规化? 2506.05484v1
  • 1846 06-05 ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting ODE-GS: Latente ODEs für dynamische Szene Extrapolation mit 3D Gaussian Splatting ODE-GS: 带有 3D Gaussian Splatting 3D 的动态场点外推法 2506.05480v1
  • 1847 06-05 Learning-Augmented Algorithms for MTS with Bandit Access to Multiple Predictors Lerngesteigerte Algorithmen für MTS mit Bandit-Zugriff auf mehrere Vorhersager 多边贸易体系的学习强化比值,使强盗能够接触多种预测者 2506.05479v1
  • 1848 06-05 Inference-Time Hyper-Scaling with KV Cache Compression Inferenz-Zeit-Hyper-Skalierung mit KV-Cache-Kompression 与 KV 缓存压缩一起的推断时间超大缩放 2506.05345v1
  • 1849 06-05 Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets Warum LLM Sicherheits-Guardrails nach Feinabstimmung zusammenbrechen: Eine Ähnlichkeitsanalyse zwischen Alignment- und Feinabstimmungs-Datensätzen 为何LLM 安全护卫车在微调后倒塌:对准和微调数据集之间的相似性分析 2506.05346v1
  • 1850 06-05 Search Arena: Analyzing Search-Augmented LLMs Sucharena: Analyse von Such-Augmentierten LLMs 搜索竞技场: 分析搜索推荐LMS 2506.05334v1
  • 1851 06-05 AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning AceReason-Nemotron: Mathematische und Code-Reasonierung durch Stärkungslernen AceReson-Nepron:通过强化学习推进数学和守则 2505.16400v3
  • 1852 06-05 Zeroth-Order Optimization Finds Flat Minima Nullte-Order-Optimierung findet flache Minima 零阶优化发现平底微米 2506.05454v1
  • 1853 06-05 The broader spectrum of in-context learning Das breitere Spektrum des In-Context-Lernens 广义的内通学习 2412.03782v3
  • 1854 06-05 Admissibility of Completely Randomized Trials: A Large-Deviation Approach Zulässigkeit von völlig randomisierten Prüfungen: Ein Ansatz mit großer Abweichung 完全随机审理的可受理性:一个大型减缓办法 2506.05329v1
  • 1855 06-05 Seeing the Invisible: Machine learning-Based QPI Kernel Extraction via Latent Alignment Das Unsichtbare sehen: Maschinelles Lernen basierte QPI-Kernel-Extraktion über Latent Alignment 可见:机器学习 QPI Kernel 采掘,通过Lient对齐 2506.05325v1
  • 1856 06-05 LSM-2: Learning from Incomplete Wearable Sensor Data LSM-2: Lernen von unvollständigen Wearable Sensordaten LSM-2:学习不全穿感应器数据 2506.05321v1
  • 1857 06-05 Generalizable, real-time neural decoding with hybrid state-space models Verallgemeinerbare, Echtzeit-Neuraldekodierung mit hybriden State-Space-Modellen 与混合状态空间模型的通用实时神经解码 2506.05320v1
  • 1858 06-05 Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay Verbesserung der Dateneffizienz für LLM-Verstärkung Feinabstimmung durch problemorientierte Online-Datenauswahl und Rollout-Replay 提高LLLM数据效率,通过困难目标在线数据选择和推出重播提高LLM强化微调的数据效率 2506.05316v1
  • 1859 06-05 Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models Eingeschränktes entropisches Verlernen: Ein Primal-Dual-Rahmenwerk für große Sprachmodelle 受约束的熵遗忘:大语言模型的原始-对偶框架 2506.05314v1
  • 1860 06-05 Learning normalized image densities via dual score matching Normalisierte Bilddichten durch Dual-Score-Matching lernen 通过双分比比对学习图像密度正常化 2506.05310v1
  • 1861 06-05 ProRefine: Inference-time Prompt Refinement with Textual Feedback ProRefine: Inferenz-Zeit Prompt Verfeinerung mit Text-Feedback ProRefine: 用文字反馈迅速改进推论-时间 2506.05305v1
  • 1862 06-05 Power Law Guided Dynamic Sifting for Efficient Attention Power Law geführtes Dynamic Sifting für effiziente Aufmerksamkeit 有效关注动力法指导动态筛选 2506.05300v1
  • 1863 06-05 Control Tax: The Price of Keeping AI in Check Kontrollsteuer: Der Preis, KI in Schach zu halten 控制税:约束人工智能的代价 2506.05296v1
  • 1864 06-05 A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search Ein glattes Meer hat nie eine geschickte $\texttt{SAILOR}$: Robuste Imitation über das Lernen zu suchen 平滑的海海从未造出技能用$\ textt{ SAILOR}$: 通过学习搜索进行强力模仿 2506.05294v1
  • 1865 06-05 Learning Beyond Experience: Generalizing to Unseen State Space with Reservoir Computing Learning Beyond Experience: Mit Reservoir Computing den Zustand des Weltraums verallgemeinern 超越经验的学习:利用储量计算,向不见得的国家空间推广 2506.05292v1
  • 1866 06-05 Stable Vision Concept Transformers for Medical Diagnosis Stabile Vision Konzepttransformatoren für medizinische Diagnose 医学诊断的稳定远景构想变异器 2506.05286v1
  • 1867 06-05 Fast-DataShapley: Neural Modeling for Training Data Valuation Fast-DataShapley: Neurale Modellierung für Trainingsdatenbewertung 快速数据库:培训数据评估神经模型 2506.05281v1
  • 1868 06-05 How to Unlock Time Series Editing? Diffusion-Driven Approach with Multi-Grained Control Wie man Zeitreihenbearbeitung entsperrt? Diffusion-getriebener Ansatz mit mehrkörniger Steuerung 如何解锁时间序列编辑 ? 2506.05276v1
  • 1869 06-05 Tight analyses of first-order methods with error feedback Enge Analysen von First-Order-Methoden mit Fehlerrückmeldung 利用错误反馈对一级处理方法进行严格分析 2506.05271v1
  • 1870 06-05 Context is Key: A Benchmark for Forecasting with Essential Textual Information Kontext ist der Schlüssel: Ein Benchmark für Vorhersagen mit wesentlichen Textinformationen 背景是关键:用基本文字信息预测的基准 2410.18959v4
  • 1871 06-05 Learning long range dependencies through time reversal symmetry breaking Längere Abhängigkeiten durch Zeitumkehr-Symmetrie-Bruch lernen 通过时间逆向对称断裂,通过时间逆向对称断裂,学习远距离依赖性 2506.05259v1
  • 1872 06-05 Continual Learning from Simulated Interactions via Multitask Prospective Rehearsal for Bionic Limb Behavior Modeling Kontinuierliches Lernen von Simulierten Interaktionen über Multitask Prospective Probe für Bionic Limb Behavior Modeling 通过多塔斯克双立架行为模拟模型预演预演预演,从模拟互动中不断学习 2405.01114v4
  • 1873 06-05 Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning Konservative Klassifikatoren tun konsequent gut mit Verbesserung Agenten: Charakterisierung statistischer und Online-Lernen 保守的分类机构与改进机构保持一贯的很好:将统计和在线学习定性为特征 2506.05252v1
  • 1874 06-05 On the Convergence of Gradient Descent on Learning Transformers with Residual Connections Über die Konvergenz des gradienten Abstiegs auf Lerntransformatoren mit residualen Verbindungen 关于有残余连接的学习变异器的 “ 渐渐后代 “ 趋同 2506.05249v1
  • 1875 06-05 Rethinking LLM Advancement: Compute-Dependent and Independent Paths to Progress LLM-Fortschritt neu denken: Berechenbare und unabhängige Wege zum Fortschritt 重新思考LLM进步:计算依赖性和独立前进之路 2505.04075v2
  • 1876 06-05 A theoretical framework for overfitting in energy-based modeling Ein theoretischer Rahmen für die Überanpassung in der energiebasierten Modellierung 基于能量的建模中过拟合的理论框架 2501.19158v3
  • 1877 06-05 Robust Moment Identification for Nonlinear PDEs via a Neural ODE Approach Robuste Moment-Identifikation für nichtlineare PDEs über einen neuralen ODE-Ansatz 通过神经极光模式方法确定非线性PDE的强劲势头 2506.05245v1
  • 1878 06-05 Aligning Latent Spaces with Flow Priors Latente Räume mit Flow Priors ausrichten 将局部空间与流动前置空间对齐 2506.05240v1
  • 1879 06-05 Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit Sparse Autoencoder bewerten: Vom shallow Design bis hin zum passenden Pursuit 评估粗略自动校对器:从浅色设计到匹配追求 2506.05239v1
  • 1880 06-05 Timing is Important: Risk-aware Fund Allocation based on Time-Series Forecasting Timing ist wichtig: Risiko-aware Fund Allokation basierend auf Time-Series Forecasting 时间选择很重要:根据时间-系列预测进行有风险的基金分配 2505.24835v2
  • 1881 06-05 Blink of an eye: a simple theory for feature localization in generative models Blink des Auges: eine einfache Theorie zur Featurelokalisierung in generativen Modellen 眨眼:在基因模型中地貌化的简单理论 2502.00921v2
  • 1882 06-05 Stochastic Poisson Surface Reconstruction with One Solve using Geometric Gaussian Processes Stochastische Poisson-Oberflächenrekonstruktion mit einer Lösung mit geometrischen Gauß-Prozessen 利用几何高斯进程进行一索一索的斯托查斯·普瓦松地面重建 2503.19136v2
  • 1883 06-05 MesaNet: Sequence Modeling by Locally Optimal Test-Time Training MesaNet: Sequenzmodellierung durch lokal optimale Test-Time-Schulung MesaNet:通过当地最佳试验时间培训进行序列建模 2506.05233v1
  • 1884 06-05 Causal Discovery from Conditionally Stationary Time Series Causal Discovery aus Conditionally Stationary Time Series 从有条件的静止时间序列中发现的因果发现 2110.06257v4
  • 1885 06-05 Progressive Tempering Sampler with Diffusion Progressive Temperierung mit Diffusion 具有扩散作用的渐进性温度采样器 2506.05231v1
  • 1886 06-05 Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts Diagonales Batching löst Parallelismus in recurrenten Speichertransformatoren für lange Kontexte auf 对角对角拔 解锁长期常用内存变换器中的平行主义 2506.05229v1
  • 1887 06-05 The Lessons of Developing Process Reward Models in Mathematical Reasoning Die Lehren aus der Entwicklung von Prozess-Reward-Modellen in mathematischer Reasoning 数学理由中发展进程奖励模型的经验教训 2501.07301v2
  • 1888 06-05 A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values Ein einheitliches Rahmenwerk für wahrscheinlich effiziente Algorithmen zur Schätzung von Shapley-Werten 用于估计损耗值的可预见高效算法统一框架 2506.05216v1
  • 1889 06-05 Learning Theory of Decentralized Robust Kernel-Based Learning Algorithm Lerntheorie des dezentralisierten robusten Kernel-basierten Lernalgorithmus 分散的强力核心学习学习理论 2506.05215v1
  • 1890 06-05 Mitigating Degree Bias Adaptively with Hard-to-Learn Nodes in Graph Contrastive Learning Degree Bias im Graph Contrastive Learning adaptiv mit schwer zu erlernenden Knoten abmildern 与图表对比学习中难以识别的节点相适应 2506.05214v1
  • 1891 06-05 Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data Verstärktes Lernen für individuelle Optimale Politik aus heterogenen Daten 利用不同不同数据为个人最佳政策强化学习 2505.09496v2
  • 1892 06-05 Unleashing The Power of Pre-Trained Language Models for Irregularly Sampled Time Series Lösen der Macht von vortrainierten Sprachmodellen für unregelmäßig gemusterte Zeitreihen 为不定期抽样抽样时间序列释放培训前语言模型的力量 2408.08328v2
  • 1893 06-05 The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text The Common Pile v0.1: Ein 8TB-Datensatz von Public Domain und Openly Licensed Text Pile v0.1:公共域和公开许可文本的 8TB数据集 2506.05209v1
  • 1894 06-05 One Wave To Explain Them All: A Unifying Perspective On Feature Attribution Eine Welle, um sie alle zu erklären: Eine verbindende Perspektive auf Feature Attribution 一波一波解释全部:一个统一化的地物归属观 2410.01482v2
  • 1895 06-05 Trustworthiness Preservation by Copies of Machine Learning Systems Vertrauenswürdigkeit Erhaltung durch Kopien von Machine Learning Systemen 机器学习系统副本保存可信赖性 2506.05203v1
  • 1896 06-05 Transformers Meet In-Context Learning: A Universal Approximation Theory Transformer treffen auf In-Context Learning: Eine universelle Annäherungstheorie 变革者满足内文学习:通用近似理论 2506.05200v1
  • 1897 06-05 Quantifying Cross-Modality Memorization in Vision-Language Models Quantifizierung der Cross-Modality-Memorisierung in Vision-Language-Modellen 设想语言模型中量化跨模式的计量 2506.05198v1
  • 1898 06-05 Abnormal component analysis Abnorme Komponentenanalyse 异常元件分析 2312.16139v2
  • 1899 06-05 Locality Preserving Markovian Transition for Instance Retrieval Lokalität Erhaltung der Markovschen Übergang für Instance Retrieval 将Markovian 过渡保留为事件检索 2506.05196v1
  • 1900 06-05 Chaotic Hedging with Iterated Integrals and Neural Networks Chaotische Absicherung mit iterierten Integralen und neuralen Netzwerken 使用循环综合器和神经网络的冷酷杂乱和神经网络 2209.10166v4
  • 1901 06-05 Counterfactual reasoning: an analysis of in-context emergence Gegenfaktische Argumentation: eine Analyse der Entstehung von Inkontexten 反事实推理:对内源出现的分析 2506.05188v1
  • 1902 06-05 TreeRPO: Tree Relative Policy Optimization TreeRPO: Baum Relative Politikoptimierung TreeRPO:树相对政策优化 2506.05183v1
  • 1903 06-05 AI-powered Contextual 3D Environment Generation: A Systematic Review KI-powered Contextual 3D Environment Generation: Eine systematische Überprüfung 3D环境代:系统审查 2506.05449v1
  • 1904 06-05 Associative Memory and Generative Diffusion in the Zero-noise Limit Assoziatives Gedächtnis und Generative Diffusion in der Null-Rausch-Grenze 零噪音限制中的联合内存和生成扩散 2506.05178v1
  • 1905 06-05 Can Large Language Models Understand Intermediate Representations in Compilers? Können große Sprachmodelle Zwischendarstellungen in Compilern verstehen? 大语言模式能理解《汇编者》的中间代表吗? 2502.06854v2
  • 1906 06-05 Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks Verallgemeinerungen durch Meta-Learned Model Representations: PAC-Bayes und Sample Compression Hypernetworks 通过Met-Learn模型代表:PAC-Bayes和抽样压缩超网络的通用环 2410.13577v3
  • 1907 06-05 Goal-Oriented Time-Series Forecasting: Foundation Framework Design Zielorientierte Zeitreihenplanung: Grundlagenrahmengestaltung 目标导向的时间系列预测:基础框架设计 2504.17493v2
  • 1908 06-05 Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests Bewertung der Robustheit von Deep Reinforcement Learning für die autonome Oberflächenfahrzeugsteuerung in Feldversuchen 评价实地测试中用于自动水陆车辆控制的深强化学习的强力 2505.10033v2
  • 1909 06-05 Detection-Driven Object Count Optimization for Text-to-Image Diffusion Models Detektionsgesteuerte Objektanzahl-Optimierung für Text-zu-Bild-Diffusionsmodelle 文本到图像传播模型最佳化 2408.11721v2
  • 1910 06-05 SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows SESaMo: Symmetrie-verstärkende stochastische Modulation für normalisierende Strömungen SESaMo: 正常流动的对称性-强化斯托调动 2505.19619v2
  • 1911 06-05 Federated Isolation Forest for Efficient Anomaly Detection on Edge IoT Systems Föderierter Isolationswald für effiziente Anomalienerkennung an Edge IoT-Systemen 在边缘IOT系统中高效异常探测的联邦隔离林 2506.05138v1
  • 1912 06-05 Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning Training Dynamics zugrunde liegende Sprachmodellskalierungsgesetze: Verlustverlangsamung und Nullsummen-Lernen 语言模型缩放定律背后的训练动态:损失减速和零和学习 2506.05447v1
  • 1913 06-05 DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning DiCoRe: Erweitern der Null-Shot-Erkennung durch Divergent-Convergent LLM Reasoning DiCore: 通过差异-说服者LLM 合理性,加强零射事件探测 2506.05128v1
  • 1914 06-05 Membership Inference Attacks on Sequence Models Membership Inferenz Attacks on Sequence Models 对序列模型的攻击 2506.05126v1
  • 1915 06-05 Jailbreak Attack Initializations as Extractors of Compliance Directions Jailbreak Attack Initialisierungen als Extraktoren für Compliance-Anweisungen 作为 “ 遵守指示 “ 的提取器 2502.09755v2
  • 1916 06-05 Nonlinear Causal Discovery for Grouped Data Nichtlineare kausale Entdeckung für gruppierte Daten 分组数据的非线性构造发现 2506.05120v1
  • 1917 06-05 Exploring Adversarial Watermarking in Transformer-Based Models: Transferability and Robustness Against Defense Mechanism for Medical Images Erforschung von adversarialen Wasserzeichen in transformerbasierten Modellen: Übertragbarkeit und Robustheit gegen Abwehrmechanismen für medizinische Bilder 探索以变换模式为基础的变换模型中的反对水标记:医疗图像防御机制的可转移性和强力 2506.06389v1
  • 1918 06-05 Learning Rock Pushability on Rough Planetary Terrain Rock Pushability lernen auf rauem planetarischen Gelände 环绕行星地的学习岩岩推力 2505.09833v2
  • 1919 06-05 Survey on the Evaluation of Generative Models in Music Umfrage zur Bewertung generativer Modelle in der Musik 音乐创创模型评价调查 2506.05104v1
  • 1920 06-05 Structural Alignment Improves Graph Test-Time Adaptation Struktural Alignment verbessert Graph Test-Time Anpassung 结构调整改进图示测试时间适应 2502.18334v3
  • 1921 06-05 Privacy Amplification Through Synthetic Data: Insights from Linear Regression Datenschutzverstärkung durch synthetische Daten: Einblicke aus linearer Regression 通过合成数据放大隐私:线性回归的洞视 2506.05101v1
  • 1922 06-05 The Impossibility of Fair LLMs Die Unmöglichkeit fairer LLMs 公平LLM的不可能性 2406.03198v2
  • 1923 06-05 Semi-Implicit Variational Inference via Kernelized Path Gradient Descent Halbimplizite Variationsableitung über kernalisierten Pfad gradienten Abstieg 通过内核化路径梯度梯度 2506.05088v1
  • 1924 06-05 Autoformulation of Mathematical Optimization Models Using LLMs Autoformulierung mathematischer Optimierungsmodelle mit LLMs 使用LLM的数学优化模型自动建模 2411.01679v2
  • 1925 06-05 EMBER2024 – A Benchmark Dataset for Holistic Evaluation of Malware Classifiers EMBER2024 – Ein Benchmark-Datensatz für die ganzheitliche Auswertung von Malware-Klassifikatoren EMBER2024 – 用于全面评估恶意软件分类器的基准数据集 2506.05074v1
  • 1926 06-05 GoRA: Gradient-driven Adaptive Low Rank Adaptation GoRA: Gradient-getriebene Adaptive Low-Rank-Anpassung GoRA:梯度驱动的自适应低秩适应 2502.12171v2
  • 1927 06-05 UnHiPPO: Uncertainty-aware Initialization for State Space Models UnHiPPO: Unsicherheitsbewusste Initialisierung für State Space Models UnHiPPO:状态空间模型的不确定性感知初始化 2506.05065v1
  • 1928 06-05 NIMO: a Nonlinear Interpretable MOdel NIMO: ein nichtlinear interpretierbarer MOdel NIMO:非线性可解释模型 2506.05059v1
  • 1929 06-05 Mini Diffuser: Fast Multi-task Diffusion Policy Training Using Two-level Mini-batches Mini Diffusor: Schnelles Multi-Task Diffusion Policy Training mit zweistufigen Mini-Batches Mini Diffuser:利用两级小型水槽快速多任务传播政策培训 2505.09430v2
  • 1930 06-05 Neurosymbolic Association Rule Mining from Tabular Data Neurosymbolische Assoziationsregel Mining aus tabellarischen Daten 利用表格数据进行常规采矿 2504.19354v2
  • 1931 06-05 A Compressive-Expressive Communication Framework for Compositional Representations Ein kompressiv-expressiver Kommunikationsrahmen für kompositorische Repräsentationen 组成代表制压缩-高压沟通框架 2501.19182v3
  • 1932 06-05 Optimizing Anytime Reasoning via Budget Relative Policy Optimization Optimierung jederzeit über Budget-Relational-Policy-Optimierung 通过预算相对政策优化优化,优化任何时间的理由 2505.13438v2
  • 1933 06-05 Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic Causal Policy Learning in Verstärkung Lernen: Hintertür-angepasste Soft-Actor-Kritik 强化学习中因果政策学习:后门调整后软动作器-批评 2506.05445v1
  • 1934 06-05 A Selective Homomorphic Encryption Approach for Faster Privacy-Preserving Federated Learning Ein selektiver homomorpher Verschlüsselungsansatz für schnelleres Datenschutz-Erhalten von Federated Learning 为更快的隐私-保护联邦学习采取选择性单态加密方法 2501.12911v4
  • 1935 06-05 iN2V: Bringing Transductive Node Embeddings to Inductive Graphs iN2V: Transduktive Knoteneinbettungen in induktive Graphen bringen iN2V: 将感应节点嵌入式带入感应图 2506.05039v1
  • 1936 06-05 Scalable Multi-Agent Offline Reinforcement Learning and the Role of Information Skalierbares Mehr-Agent-Offline-Verstärkungs-Lernen und die Rolle der Information 可缩放的多代理机构脱线强化学习和信息的作用 2502.11260v2
  • 1937 06-05 TIMING: Temporality-Aware Integrated Gradients for Time Series Explanation TIMING: Temporality-Aware integrierte Gradienten für die Zeitreihe Erklärung TIMING: 时间序列解释时空软件集成梯度 2506.05035v1
  • 1938 06-05 Identifying and Understanding Cross-Class Features in Adversarial Training Identifizieren und Verstehen von Cross-Class-Features im Adversarial Training 确定和理解反向培训的跨系列特点 2506.05032v1
  • 1939 06-05 Artificial Intelligence Should Genuinely Support Clinical Reasoning and Decision Making To Bridge the Translational Gap Künstliche Intelligenz sollte die klinische Vernunft und Entscheidungsfindung wirklich unterstützen, um die translationale Lücke zu überbrücken 人工情报应真正支持临床理由和决策,以弥补翻译差距 2506.05030v1
  • 1940 06-05 Bottlenecked Transformers: Periodic KV Cache Abstraction for Generalised Reasoning Engpässe Transformer: Regelmäßige KV-Cache Abstraktion für generalisierte Vernunft 装瓶式变压器: 用于一般理由的周期 KV 缓存摘要 2505.16950v2
  • 1941 06-05 Tuning the Right Foundation Models is What you Need for Partial Label Learning Tuning the right foundation models is what you need for Partial Label Learning 向右基建模投注是部分标签学习所需要的 2506.05027v1
  • 1942 06-05 A unified weighting framework for evaluating nearest neighbour classification Ein einheitlicher Gewichtungsrahmen für die Bewertung der nächsten Nachbarklassifikation 评价近邻分类的统一加权框架 2311.16872v3
  • 1943 06-05 GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs GRU: Abbau des Kompromisses zwischen Unlearning und Retention für LLMs GRU:缓解LLM遗忘与保留之间的权衡 2503.09117v3
  • 1944 06-05 U-NetMN and SegNetMN: Modified U-Net and SegNet models for bimodal SAR image segmentation U-NetMN und SegNetMN: Modifizierte U-Net- und SegNet-Modelle für bimodale SAR-Bildsegmentierung U-NetMN和SegNetMN:用于双模态SAR图像分割的改进U-Net和SegNet模型 2506.05444v1
  • 1945 06-05 On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding Über Expressive Power von Schleiftransformatoren: Theoretische Analyse und Verbesserung über die Zeitschrittkodierung 循环变换器的表达力:通过时间步骤编码进行理论分析和加强 2410.01405v7
  • 1946 06-05 Stein Variational Evolution Strategies Strategien der Steinvariational Evolution Stein变异演变战略 2410.10390v2
  • 1947 06-05 Controlling Summarization Length Through EOS Token Weighting Kontrolle der Zusammenfassungslänge durch EOS-Token-Gewichtung 通过 EOS 词元加权控制摘要长度 2506.05017v1
  • 1948 06-05 A new architecture of high-order deep neural networks that learn martingales Eine neue Architektur von hochordentlich tiefen neuronalen Netzwerken, die martingales lernen 一个由高阶深层神经网络组成的新结构,能够学习马丁鱼 2505.03789v2
  • 1949 06-05 Multi-Point Proximity Encoding For Vector-Mode Geospatial Machine Learning Multi-Point Proximity Encoding für Vector-Mode Geospatial Machine Learning 矢量模式地理空间机器学习多点近似编码 2506.05016v1
  • 1950 06-05 Towards Reasonable Concept Bottleneck Models Auf dem Weg zu vernünftigen Konzeptmodellen für Engpässe 迈向合理概念的瓶颈模式 2506.05014v1
  • 1951 06-05 Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity 1568 Tokens in einen einzigen Vektor und wieder zurück krammen: Die Grenzen der Einbettung von Raumkapazität erkunden 将1568吨撞成单一矢量和后向:探索嵌入空间能力的极限 2502.13063v2
  • 1952 06-05 QiMeng: Fully Automated Hardware and Software Design for Processor Chip QiMeng: Vollautomatisierte Hardware und Software-Design für Prozessor Chip QiMeng:处理芯片全自动硬件和软件设计 2506.05007v1
  • 1953 06-05 Monotonic anomaly detection Monotonische Anomalieerkennung 单声异常探测 2410.23158v2
  • 1954 06-05 Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games Vorsichtiger Optimismus: Ein Meta-Algorithmus für fast-konstantes Bedauern in den allgemeinen Spielen 谨慎乐观主义:普通运动会中近距离遗憾的代算算法 2506.05005v1
  • 1955 06-05 UniPTMs: The First Unified Multi-type PTM Site Prediction Model via Master-Slave Architecture-Based Multi-Stage Fusion Strategy and Hierarchical Contrastive Loss UniPTMs: Das erste einheitliche Multi-Type PTM Site Prediction Model über Master-Slave Architecture-basierte Multi-Stage Fusionsstrategie und hierarchischen Kontrastverlust UniPTMs:第一个统一的多类型PTM位点预测模型,基于主从架构的多阶段融合策略和层次对比损失 2506.05443v1
  • 1956 06-05 FPTQuant: Function-Preserving Transforms for LLM Quantization FPTQuant: Funktionsschonende Transformationen für die LLM-Quantisierung FPTQuant: 用于LLM量化的功能保存变换 2506.04985v1
  • 1957 06-05 Agentic AI for Intent-Based Industrial Automation Agentische KI für Intent-Based Industrial Automation 用于内在工业自动化的AAA剂 2506.04980v1
  • 1958 06-05 Entropy-based Training Methods for Scalable Neural Implicit Sampler Entropie-basierte Trainingsmethoden für skalierbare neurale Implizite Sampler 用于可缩缩的神经隐隐性取样器的基于英信的培训方法 2306.04952v2
  • 1959 06-05 Deep histological synthesis from mass spectrometry imaging for multimodal registration Tiefe histologische Synthese aus Massenspektrometrie-Bildgebung für multimodale Registrierung 从用于多式联运注册的质谱成像成像中产生深度神学合成 2506.05441v1
  • 1960 06-05 Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences Diff-Instruct++: Ein-Schritt-Training Text-zu-Bild-Generator-Modell zur Ausrichtung mit menschlichen Einstellungen Diff-Instruct++:培训一步骤文本到图像生成模型,使之与人类偏好相一致 2410.18881v2
  • 1961 06-05 Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule Pilotieren von strukturbasiertem Drug Design über Modalitätsspezifische Optimale Planung 通过特定方式和特定最佳安排表进行基于结构的药物设计 2505.07286v2
  • 1962 06-05 Efficient Time Series Processing for Transformers and State-Space Models through Token Merging Effiziente Zeitreihenverarbeitung für Transformatoren und State-Space-Modelle durch Token Merging 通过 Token 合并对变形器和国家空间模型的有效时间序列处理 2405.17951v3
  • 1963 06-05 Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models Gemeinsame Interventionseffekte von Single-Variablen-Interventionen in Additiven-Modellen lernen 在 “ 补充模式 “ 中单一可变干预措施的 学习联合干预效应 2506.04945v1
  • 1964 06-05 Weak Generative Sampler to Efficiently Sample Invariant Distribution of Stochastic Differential Equation Schwacher Generativer Sampler zur effizienten Probeninvariantverteilung der stochastischen Differentialgleichung 用于有效取样的弱生成采样器,以有效采样、不变化地分配蒸蒸蒸式差别等式的微变式采样器 2405.19256v2
  • 1965 06-05 David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training David und Goliath: Kleines Ein-Schritt-Modell schlägt große Diffusion mit Partitur nach dem Training David和Goliath:小型一步骤模式比大规模传播更胜于记分后培训 2410.20898v3
  • 1966 06-05 Can Transformers Do Enumerative Geometry? Können Transformer enumerative Geometrie? 变形人能做数字几何吗? 2408.14915v3
  • 1967 06-05 Energentic Intelligence: From Self-Sustaining Systems to Enduring Artificial Life Energentische Intelligenz: Von selbsterhaltenden Systemen zum Ausharren des künstlichen Lebens 能源情报:从自我维持系统到人命持续 2506.04916v1
  • 1968 06-05 MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining MiMo: Entsperren des vernünftigen Potenzials des Sprachmodells – Von der Vorschulung zur Nachschulung 米莫:释放语文模式的理性潜力 – – 从预培训到员额培训 2505.07608v2
  • 1969 06-05 Dissecting Long Reasoning Models: An Empirical Study Modelle der langen Vernunft zu entschlüsseln: Eine empirische Studie 解析长期理由模型:经验研究 2506.04913v1
  • 1970 06-05 When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models Wenn LLMs denken lügen: Enthüllen der strategischen Täuschung in Repräsentationen von Vernunftmodellen 当思考LLLM Lie:在解释理由模型代表中消除战略欺骗时 2506.04909v1
  • 1971 06-05 Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy Trainieren Sie einen Sparse Autoencoder über mehrere Sparsity-Haushalte, um Dolmetschbarkeit und Genauigkeit zu erhalten 跨多级预算培训一个简单自动编码员,以维护可解释性和准确性 2505.24473v2
  • 1972 06-05 Are all models wrong? Fundamental limits in distribution-free empirical model falsification Sind alle Modelle falsch? Grundlegende Grenzen in der verteilungsfreien empirischen Modellfälschung 所有模型是否都错了?无分销经验模型伪造的基本限制 2502.06765v2
  • 1973 06-05 Leveraging Diffusion Models for Parameterized Quantum Circuit Generation Nutzung von Diffusionsmodellen für die parameterisierte Quantum Circuit Generation 利用可计量量子电路生成的传播模型 2505.20863v2
  • 1974 06-05 Scalable Multi-Output Gaussian Processes with Stochastic Variational Inference Skalierbare Multi-Output Gaußsche Prozesse mit stochastischer Variationsableitung 可缩放多输出高斯进程,带有斯托卡变动推断 2407.02476v2
  • 1975 06-05 Full-Parameter Continual Pretraining of Gemma2: Insights into Fluency and Domain Knowledge Full-Parameter Continual Pretraining von Gemma2: Einblicke in Fluency und Domain Knowledge Gemma2:深入了解流能和广域知识 2505.05946v2
  • 1976 06-05 Gaussian Process Diffeomorphic Statistical Shape Modelling Outperforms Angle-Based Methods for Assessment of Hip Dysplasia Gaussian Prozess Diffeomorphe statistische Form Modellierung übertrifft Winkel-basierte Methoden zur Beurteilung von Hüftdysplasie Gausian 进程 外形外形光效光效模型统计形态底部形态评估方法 2506.04886v1
  • 1977 06-05 Self-Tuning Spectral Clustering for Speaker Diarization Selbst-Tuning Spectral Clustering für Sprecher-Diarisierung 为议长进行自我引导光谱群集 2410.00023v2
  • 1978 06-05 Multiplicative Dynamic Mode Decomposition Multiplikative Dynamische Moduszersetzung 多倍复制动态模式分解 2405.05334v2
  • 1979 06-05 Multi-granularity Knowledge Transfer for Continual Reinforcement Learning Multi-Granularitäts-Wissenstransfer für kontinuierliches Stärkungslernen 用于持续加强学习的多族知识转让 2401.15098v3
  • 1980 06-05 kTULA: A Langevin sampling algorithm with improved KL bounds under super-linear log-gradients kTULA: Ein Langevin-Sampling-Algorithmus mit verbesserten KL-Grenzen unter superlinearen Log-Gradienten kTULA: Langevin 取样算法,在超线性日志梯度下改进 KL 界限 2506.04878v1
  • 1981 06-05 There Was Never a Bottleneck in Concept Bottleneck Models Es gab nie einen Engpass in Konzept Engpässe Modelle ” 概念瓶颈模型 “ 从未出现过瓶颈问题 2506.04877v1
  • 1982 06-05 Aligning Multimodal Representations through an Information Bottleneck Multimodale Repräsentationen durch einen Informationsengpass ausrichten 通过信息瓶颈协调多式联运代表机构 2506.04870v1
  • 1983 06-05 LLMs for sensory-motor control: Combining in-context and iterative learning LLMs für die sensorisch-motorische Steuerung: Kombination von Kontext und iterativem Lernen 感官运动控制LLMs:将内文与迭接学习相结合 2506.04867v1
  • 1984 06-05 Fokker-Planck to Callan-Symanzik: evolution of weight matrices under training Fokker-Planck zu Callan-Symanzik: Entwicklung von Gewichtsmatrizen im Training Fokker-Planck to Callan-Symanzik:培训中重量矩阵的演变 2501.09659v2
  • 1985 06-05 Improving AI-generated music with user-guided training Verbesserung der KI-generierten Musik durch benutzergeführtes Training 通过用户指导培训,改进AI制制的音乐 2506.04852v1
  • 1986 06-05 An Unsupervised Framework for Dynamic Health Indicator Construction and Its Application in Rolling Bearing Prognostics Ein unbeaufsichtigtes Framework für die Konstruktion dynamischer Gesundheitsindikatoren und deren Anwendung in der Wälzlagerprognostik 一个无人监督的动态健康指标建设框架及其在滚轴预测器中的应用 2506.05438v1
  • 1987 06-05 Learning pure quantum states (almost) without regret Lernen von reinen Quantenzuständen (fast) ohne Bedauern 学习纯量度状态(几乎) 2406.18370v2
  • 1988 06-05 Hybrid deep convolution model for lung cancer detection with transfer learning Hybrides Deep-Convolution-Modell für Lungenkrebserkennung mit Transfer-Lernen 肺癌检测与转移学习的混合深变深混合模型 2501.02785v2
  • 1989 06-05 Event Classification of Accelerometer Data for Industrial Package Monitoring with Embedded Deep Learning Event-Klassifikation von Accelerometer-Daten für die industrielle Paketüberwachung mit eingebettetem Deep Learning 用于以嵌入深层学习方式监测工业一揽子措施的加速计数据的事件分类 2506.05435v1
  • 1990 06-05 From EHRs to Patient Pathways: Scalable Modeling of Longitudinal Health Trajectories with LLMs Von EHRs zu Patientenpfaden: Skalierbare Modellierung von langitudinalen Gesundheits-Trajektorien mit LLMs 从EHRs到患者途径:可扩展的长纵向健康轨迹模型与LLMs 2506.04831v1
  • 1991 06-05 Reading Recognition in the Wild Leseerkennung in der Wildnis 野生生物的阅读识别 2505.24848v2
  • 1992 06-05 Supervised Graph Contrastive Learning for Gene Regulatory Network Überwachtes Graph Kontrastives Lernen für Gene Regulatory Network 受监督的基因监管网络图表对比性学习 2505.17786v2
  • 1993 06-05 Fool the Stoplight: Realistic Adversarial Patch Attacks on Traffic Light Detectors Täusche die Ampel: Realistische Adversarial-Patch-Angriffe auf Ampeldetektoren 欺骗红绿灯:针对交通信号灯检测器的现实对抗补丁攻击 2506.04823v1
  • 1994 06-05 Offline Multi-agent Reinforcement Learning via Score Decomposition Offline-Multi-Agenten-Verstärkungslernen über Score-Dekomposition 通过计分分解进行离线多智能体强化学习 2505.05968v2
  • 1995 06-05 LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning LogicPuzzleRL: Robuste mathematische Reasoning in LLMs durch Verstärkungslernen kultivieren LogicPuzzleRL:通过强化学习在LLMs中培养强力数学理由 2506.04821v1
  • 1996 06-05 Distributional encoding for Gaussian process regression with qualitative inputs Verteilungskodierung für Gaußsche Prozessregression mit qualitativen Inputs 带有定性投入的高斯进程回归的分布编码 2506.04813v1
  • 1997 06-05 Uniform convergence of the smooth calibration error and its relationship with functional gradient Einheitliche Konvergenz des glatten Kalibrierfehlers und seines Verhältnisses mit dem funktionellen Gradienten 平稳校准误差及其与功能梯度的关系统一汇合 2505.19396v2
  • 1998 06-05 Adaptive Preconditioners Trigger Loss Spikes in Adam Adaptive Preconditioners Trigger Verlust Spikes in Adam 亚当的触发器失落斯派克 2506.04805v1
  • 1999 06-05 Augmented Invertible Koopman Autoencoder for long-term time series forecasting Augmented Invertible Koopman Autoencoder für langfristige Zeitreihen 用于长期时间序列预报的可逆 Koopman 自动编码器 2503.12930v2
  • 2000 06-05 LotusFilter: Fast Diverse Nearest Neighbor Search via a Learned Cutoff Table LotusFilter: Schnelles Diverses Naheste Nachbarschaftssuche über einen erfahrenen Cutoff-Tisch LotusFilter:通过计算截止表快速搜索近邻 2506.04790v1
  • 2001 06-05 Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques Auf dem Weg zur multimodalen Fusion LLM-Centric: Eine Studie über Integrationsstrategien und -techniken 走向LLM-Centric Multimods 融合:关于一体化战略和技术的调查 2506.04788v1
  • 2002 06-05 Kernel $k$-Medoids as General Vector Quantization Kernel $k$-Medoids als General Vector Quantization 作为一般矢量量化的内核 $k$- 中值 2506.04786v1
  • 2003 06-05 Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward Prefix Grouper: Effiziente GRPO-Schulung durch Shared-Prefix Forward Prefix Grouper:通过共享前缀前向传播实现高效GRPO训练 2506.05433v1
  • 2004 06-05 Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards Verbesserte Regret Bounds für lineare Banditen mit schwerfälligen Belohnungen 使用重负奖励的线性强盗的遗憾弹道改善 2506.04775v1
  • 2005 06-05 Fine-Grained Interpretation of Political Opinions in Large Language Models Feinkörnige Interpretation politischer Meinungen in großen Sprachmodellen 以大语言模式对政治意见的精细解释 2506.04774v1
  • 2006 06-05 Inclusive, Differentially Private Federated Learning for Clinical Data Inklusives, differenziert privates Federated Learning für klinische Daten 包容性、差异化私联校临床数据学习 2505.22108v2
  • 2007 06-05 Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective Der Stabilitäts-Plastizitäts-Trade-off im kontinuierlichen Lernen aus architektonischer Perspektive neu denken 从建筑学的视角重新思考持续学习中稳定-弹性权衡取舍 2506.03951v2
  • 2008 06-05 PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling PCDVQ: Verbesserung der Vector Quantization für große Sprachmodelle über Polar Coordinate Entkopplung PCDVQ:通过极地协调脱钩,加强大语言模型的矢量量化 2506.05432v1
  • 2009 06-05 OpenGT: A Comprehensive Benchmark For Graph Transformers OpenGT: Ein umfassender Benchmark für Graph Transformer OpenGT: 图表变形器综合基准 2506.04765v1
  • 2010 06-05 Log-Linear Attention Log-Linear-Achtung 日志边注意 2506.04761v1
  • 2011 06-05 Robustness Evaluation for Video Models with Reinforcement Learning Robustheitsbewertung für Videomodelle mit Verstärkungslernen 强化学习录像模型的强力评价 2506.05431v1
  • 2012 06-05 Multi-Layer GRPO: Enhancing Reasoning and Self-Correction in Large Language Models Multi-Layer GRPO: Verbessern von Vernunft und Selbstkorrektion in großen Sprachmodellen 多语言GROPO:加强大语言模式的理性和自我校正 2506.04746v1
  • 2013 06-05 Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models Text-zu-CAD-Generierung durch visuelles Feedback in großen Sprachmodellen 通过将视觉反馈引入大语言模型,将生成的文字引入CAD 2501.19054v3
  • 2014 06-05 Focus On This, Not That! Steering LLMs with Adaptive Feature Specification Fokus auf diese, nicht das! Lenkung LLMs mit adaptiven Feature Specification 聚焦于此, 不是那个! 指导性带适应性特征规格的LLMS 2410.22944v4
  • 2015 06-05 Model-based Neural Data Augmentation for sub-wavelength Radio Localization Modellbasierte Neural Data Augmentation für die Subwellenlängen-Radiolokalisierung 用于亚波长无线电定位的基于模型的神经数据增强 2506.06387v1
  • 2016 06-05 The Role of Diversity in In-Context Learning for Large Language Models Die Rolle der Vielfalt im In-Context-Lernen für große Sprachmodelle 多样性在为大语言模式进行内文学习方面的作用 2505.19426v2
  • 2017 06-05 SPHINX: Structural Prediction using Hypergraph Inference Network SPHINX: Structural Prediction mit Hypergraph Inference Network SPHINX:利用高光推断网络进行结构预测 2410.03208v2
  • 2018 06-05 Automated Modeling Method for Pathloss Model Discovery Automatisierte Modellierungsmethode für Pathloss Model Discovery 病理模型发现自动建模方法 2505.23383v2
  • 2019 06-05 Scaling Trends in Language Model Robustness Skalierungstrends in der Robustheit von Sprachmodellen 语言模型强度的增缩趋势 2407.18213v5
  • 2020 06-05 $μ$nit Scaling: Simple and Scalable FP8 LLM Training $μ$nit Scaling: Einfache und skalierbare FP8 LLM-Schulung $μ$nit 缩放:简单和可缩放的FP8 LLM培训 2502.05967v3
  • 2021 06-05 Coordinated Robustness Evaluation Framework for Vision-Language Models Koordinierte Robustheitsbewertungsrahmen für Vision-Language-Modelle 愿景-语言模型协调强力评价框架 2506.05429v1
  • 2022 06-05 Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks Regularisierte KL-Divergenz für gut definierte Funktions-Raum-Varianten-Schlussfolgerung in Bayesischen neuronalen Netzwerken Bayesian神经网络中定义精准的功能-空间变动推论的正规化 KL-KL-Diverence 2406.04317v3
  • 2023 06-05 Adapt before Continual Learning Anpassung vor dem kontinuierlichen Lernen 在持续学习前适应 2506.03956v2
  • 2024 06-05 FlowDAS: A Stochastic Interpolant-based Framework for Data Assimilation FlowDAS: Ein stochastisches Interpolant-basiertes Framework für die Datenassimilierung FlowDAS:一个基于数据同化的存储式内插框架 2501.16642v2
  • 2025 06-05 Wyckoff Transformer: Generation of Symmetric Crystals Wyckoff Transformer: Erzeugung symmetrischer Kristalle Wyckoff 变形器: 生成对称晶体 2503.02407v4
  • 2026 06-05 Using In-Context Learning for Automatic Defect Labelling of Display Manufacturing Data Verwendung von In-Context-Learning zur automatischen Defekt-Etikettierung von Display-Produktionsdaten 使用内文学习自动去除显示器制造数据标签 2506.04717v1
  • 2027 06-05 Infinitesimal Higher-Order Spectral Variations in Rectangular Real Random Matrices Infinitesimale Spektralvariationen höherer Ordnung in rektangulären realen Zufallsmatrizen 矩形真实随机矩阵中无尽的高端天文光谱变化 2506.03764v2
  • 2028 06-05 Bandit Multiclass List Classification Bandit-Listenklassifikation für mehrere Klassen 强盗多级名单分类 2502.09257v2
  • 2029 06-05 UNO: Unlearning via Orthogonalization in Generative models UNO: Entlernen über Orthogonalisierung in generativen Modellen UNO:在创制模型中,通过正正孔化重新学习 2506.04712v1
  • 2030 06-05 Explicit Density Approximation for Neural Implicit Samplers Using a Bernstein-Based Convex Divergence Explizite Dichte-Annäherung für neurale Implizite Sampler mit einer Bernstein-basierten Konvex-Divergenz 使用伯恩斯坦-以伯尔斯坦为基础的电汇变异的神经隐性采样器的显性密度近似 2506.04700v1
  • 2031 06-05 Enhanced Drought Analysis in Bangladesh: A Machine Learning Approach for Severity Classification Using Satellite Data Verbesserte Trockenheitsanalyse in Bangladesch: Ein maschineller Lernansatz für die Severity-Klassifikation mittels Satellitendaten 孟加拉国加强干旱分析:利用卫星数据进行重力分类的机械学习方法 2506.04696v1
  • 2032 06-05 On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models Über den Mechanismus der bewussten Musterauswahl im Ausbau des Lernens für Sprachmodelle 语文模式强化学习中选择理由选择模式机制 2506.04695v1
  • 2033 06-05 Influence Functions for Edge Edits in Non-Convex Graph Neural Networks Einflussfunktionen für Kantenbearbeitungen in nicht konvexen Graphen-Neuralnetzwerken 非Convex 图形神经网络中的边缘编辑的影响函数 2506.04694v1
  • 2034 06-05 Towards Better Generalization via Distributional Input Projection Network Auf dem Weg zu einer besseren Verallgemeinerung über das Distributional Input Projection Network 通过分配投入预测网络更好地实现普遍化 2506.04690v1
  • 2035 06-05 Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models Recycling the Web: Eine Methode zur Verbesserung der Vorschulung von Daten Qualität und Menge für Sprachmodelle 网上再循环:提高语文模式培训前数据质量和数量的方法 2506.04689v1
  • 2036 06-05 UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules UniSim: Ein einheitlicher Simulator für zeitgesteuerte Dynamik von Biomolekülen UniSim: 生物分子时间分析分析动态的统一模拟器 2506.03157v2
  • 2037 06-05 Model Predictive Control is Almost Optimal for Restless Bandit Modell Predictive Control ist fast optimal für Restless Bandit 模型预测控制几乎是无休止强盗的最佳模式 2410.06307v2
  • 2038 06-05 Neural Discovery in Mathematics: Do Machines Dream of Colored Planes? Neural Discovery in der Mathematik: Träumen Maschinen von farbigen Flugzeugen? 数学中的神经发现:机器梦到彩色平面吗? 2501.18527v3
  • 2039 06-05 Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction Diffusion mit einem sprachlichen Kompass: Steuerung der Generation klinisch plausibler Zukunft sMRI-Darstellungen für frühe MCI-Konversionsvorhersage 语言指南的传播:指导为早期MCI转换预测生成临床可视未来SMRI代表 2506.05428v1
  • 2040 06-05 Urania: Differentially Private Insights into AI Use Urania: Unterschiedliche private Einblicke in die KI-Nutzung Urania: 差异私人透视转化为 AI 使用 2506.04681v1
  • 2041 06-05 The cost of ensembling: is it always worth combining? Die Kosten für die Montage: lohnt es sich immer zu kombinieren? 集合的成本:总值得合并吗? 2506.04677v1
  • 2042 06-05 Gen-n-Val: Agentic Image Data Generation and Validation Gen-n-Val: Agentische Bilddatengenerierung und -validierung Gen-n-Val: 代理图像数据生成和校验 2506.04676v1
  • 2043 06-05 Psi-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models Psi-Sampler: Erste Partikel-Probenahme für SMC-basierte Inferenz-Zeit-Belohnung in Score-Modellen Psi-Sampler: 分数模型中基于 SMC 的推理-时间回报比对初始粒子抽样 2506.01320v2
  • 2044 06-05 TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization TANGO: Clustering mit Typicality-Aware Nonlocal Mode-Seeking und Graph-Cut-Optimierung TANGO:以非本地模式搜索和图集优化组合组合 2408.10084v2
  • 2045 06-05 MTPNet: Multi-Grained Target Perception for Unified Activity Cliff Prediction MTPNet: Multi-Grained Target Perception für Unified Activity Cliff Prediction MTPNet: 统一活动悬崖预测的多重目标感知 2506.05427v1
  • 2046 06-05 FedAPM: Federated Learning via ADMM with Partial Model Personalization FedAPM: Federated Learning via ADMM mit Teilmodell-Personalisierung FedAPM:通过ADMM进行部分模型个性化的联邦学习 2506.04672v1
  • 2047 06-05 Noise-Resistant Label Reconstruction Feature Selection for Partial Multi-Label Learning Noise-Resistent Label Reconstruction Feature Selection for Partial Multi-Label Learning 部分多标签学习选择部分多标签学习的恢复标签功能 2506.04669v1
  • 2048 06-05 Mixture-of-Experts Meets In-Context Reinforcement Learning Mixture-of-Experts trifft auf In-Context-Verstärktes Lernen 混合专家会议 文中加强学习 2506.05426v1
  • 2049 06-05 Multiple Invertible and Partial-Equivariant Function for Latent Vector Transformation to Enhance Disentanglement in VAEs Multiple Invertible und Partial-Equivariant Funktion für Latent Vector Transformation zur Verbesserung der Disentanglementierung in VAEs 用于中子矢量变异的多重可垂直和部分等同函数,以强化 VAEs 的分解 2502.03740v2
  • 2050 06-05 Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning Latent Safety-Constrained Policy Approach für sicheres Offline-Verstärkungslernen 安全离线强化安全学习受安保约束的政策办法 2412.08794v2
  • 2051 06-05 Can Artificial Intelligence Trade the Stock Market? Kann Künstliche Intelligenz den Aktienmarkt handeln? 人工智能贸易 股市? 2506.04658v1
  • 2052 06-05 The Oversmoothing Fallacy: A Misguided Narrative in GNN Research Der überwältigende Fehler: Ein fehlgeleitetes Narrativ in der GNN-Forschung 《压倒性代价:GNN研究中的误导性叙述》 2506.04653v1
  • 2053 06-05 Active Learning with Variational Quantum Circuits for Quantum Process Tomography Aktives Lernen mit unterschiedlichen Quantenkreisen für die Quantenprozesstomographie 使用量子过程地形图谱学量子变量电路进行主动学习 2412.20925v2
  • 2054 06-05 Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction Neural Network Reprogrammierbarkeit: Ein einheitliches Thema auf Modellumprogrammierung, Prompt-Tuning und Prompt-Instruktion 神经网络重组:关于模式重组、即时提款和即时指示的统一主题 2506.04650v1
  • 2055 06-05 Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing Nicht alle Optionen sind gleich: Textuelle Optionsgewichtung für Token-Efficient LLM-basierte Wissensverfolgung 并非所有选项都创建为等: Token- Efficient LLM 知识追踪的文本选项比重 2410.12872v2
  • 2056 06-05 ActivePusher: Active Learning and Planning with Residual Physics for Nonprehensile Manipulation ActivePusher: Aktives Lernen und Planen mit Residual Physics für nichtprähensile Manipulation 主动推进器: 主动学习和规划,对非传染病操纵进行残余物理研究与规划 2506.04646v1
  • 2057 06-05 Inference economics of language models Schlußfolgerung Wirtschaftlichkeit von Sprachmodellen 语言模式的推论经济学 2506.04645v1
  • 2058 06-05 MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning MoDoMoDo: Multi-Domain-Datenmischungen für multimodales LLM-Verstärkungslernen MoDoMoDo:多式LLM强化学习多功能数据混合体 2505.24871v2
  • 2059 06-05 Adversarial Combinatorial Semi-bandits with Graph Feedback Adversariale Kombinatoriale Halbbänder mit Graph Feedback 带有图图反馈的半斜面 2502.18826v4
  • 2060 06-05 An analytic theory of creativity in convolutional diffusion models Eine analytische Theorie der Kreativität in konvolutionären Diffusionsmodellen 革命扩散模型中的创造性分析理论 2412.20292v2
  • 2061 06-05 Composing Agents to Minimize Worst-case Risk Komponieren von Agenten zur Minimierung des schlimmsten Risikos 尽量减少最坏情况风险的混合剂 2506.04632v1
  • 2062 06-05 PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models Vergifteter Papagei: Subtile Daten vergiften Angriffe auf Elicit Copyright-Verletzung Inhalt aus großen Sprachmodellen 有毒Parrot:大语言模型中侵犯版权内容的隐含数据毒害攻击 2503.07697v2
  • 2063 06-05 Constructive Symbolic Reinforcement Learning via Intuitionistic Logic and Goal-Chaining Inference Konstruktives symbolisches Stärkungslernen durch intuitionistische Logik und zielführende Schlussfolgerung 通过神学逻辑和目标 – – 测深推断法进行建设性建构性符号强化学习 2506.05422v1
  • 2064 06-05 On The Sample Complexity Bounds In Bilevel Reinforcement Learning Auf der Probe Komplexität Grenzen im Zwei-Ebenen-Verstärkungs-Lernen 关于在双级强化学习中的抽样复杂度交错 2503.17644v3
  • 2065 06-05 MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation MAVL: Ein mehrsprachiger Audio-Video-Text Datensatz für animierte Song-Übersetzung MAVL: 动动歌曲翻译多语种视听歌词数据集 2505.18614v2
  • 2066 06-05 Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos Ungewissheitsbasiertes Extensibles Codebook für diskretes Federated Learning in heterogenen Datensilos 以不确定性为基础的异异种数据过滤器中分辨联邦学习扩展守则手册 2402.18888v4
  • 2067 06-05 Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? Können Wasserzeichen große Sprachmodelle die urheberrechtlich geschützte Textgenerierung verhindern und Trainingsdaten ausblenden? 水标记大语言模型能否防止有版权的文本生成和隐藏培训数据? 2407.17417v3
  • 2068 06-05 Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning Bedauern-Optimal Q-Lernen mit niedrigen Kosten für ein-Agenten und Federated Verstärkungs-Lernen 单一机构和联邦强化学习的低费用学习 2506.04626v1
  • 2069 06-05 Static Word Embeddings for Sentence Semantic Representation Statische Wort-Einbettungen für Satz semantische Darstellung 判决语义代表的静态单词嵌入 2506.04624v1
  • 2070 06-05 Solving Inverse Problems via Diffusion-Based Priors: An Approximation-Free Ensemble Sampling Approach Inverse Probleme durch Diffusionsbasierte Priorien lösen: Ein annäherungsfreier Ensemble-Probenansatz 通过以传播为基础的前例解决反向问题:一种近似-无聚合集成样样样的方法 2506.03979v2
  • 2071 06-05 Proactive Model Adaptation Against Concept Drift for Online Time Series Forecasting Proaktive Modellanpassung gegen Konzept Drift für Online-Zeitreihenprognosen 针对在线时间序列预测的 “ 动态动态 “ 概念的前瞻性模型适应 2412.08435v4
  • 2072 06-05 Efficiently Serving Large Multimodal Models Using EPD Disaggregation Effizientes Servieren großer multimodaler Modelle mit EPD-Disaggregation 利用EPD拆分有效服务大型多模式模式 2501.05460v3
  • 2073 06-05 Exploring bidirectional bounds for minimax-training of Energy-based models Erforschung bidirektionaler Grenzen für Minimax-Training von energiebasierten Modellen 探讨能源模型小型培训的双向双向界限 2506.04609v1
  • 2074 06-05 ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning ADG: Ambient Diffusion-Guided Dataset Recovery für Korruption-Robust Offline-Verstärkung Lernen ADG: 为腐败 – – ROBT离线强化学习而进行渗透性传播、指导性数据集恢复 2505.23871v2
  • 2075 06-05 Ignoring Directionality Leads to Compromised Graph Neural Network Explanations Ignorieren der Richtungalität führt zu kompromissierten Graphen-Neural-Netzwerk-Erklärungen 忽略方向引导导致压缩的图形神经网络解释 2506.04608v1
  • 2076 06-05 MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball MVP-Shapley: Featurebasierte Modellierung für die Bewertung des wertvollsten Spielers im Basketball MVP-Shapley:评估篮球最有价值球员的基于特征的建模 2506.04602v1
  • 2077 06-05 Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets Skalierungsgesetze für einen robusten Vergleich von Open Foundation Language-Vision-Modellen und -Datensätzen 开放基础语言-视觉模型和数据集强力比较法 2506.04598v1
  • 2078 06-05 Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem Lösen des vernünftigen Potenzials von vortrainierten LLMs durch Kritik Feinsteuerung auf ein Problem 通过Critique对一个问题的微调,释放预先培训的LLMs的理据潜力 2506.03295v2
  • 2079 06-05 Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable Sicherheitssteuer: Sicherheitsausrichtung macht Ihre großen vernünftigen Modelle weniger vernünftig 安全税:安全协调使你的大理由模型降低合理性 2503.00555v2
  • 2080 06-05 Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification Sicher: Mathematische Reasoning in großen Sprachmodellen durch retrospektive stufenweise Verifizierung verbessern 安全:通过回溯性逐步认识正式核查,加强大语言模型中的数学理由 2506.04592v1
  • 2081 06-05 Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret Thompson Sampling und UCB verbinden: Auf dem Weg zu effizienteren Kompromissen zwischen Datenschutz und Bedauern 连接Thompson抽样和UCB:实现更高效的隐私和遗憾之间的权衡取舍 2505.02383v2
  • 2082 06-05 Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization Adressierung von Konzept-Mislabeling in Konzept-Bottleneck-Modellen durch Preference-Optimierung 通过优先优化处理概念瓶颈模式中的概念误贴标签问题 2504.18026v3
  • 2083 06-05 AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment AdvBDGen: Adversarially Fortified Prompt-Spezifische Fuzzy Backdoor Generator gegen LLM Ausrichtung AdvBDGen:抗LLM对齐的反增强的快速特制烟雾后门发电机 2410.11283v3
  • 2084 06-05 Learning Hyperparameters via a Data-Emphasized Variational Objective Hyperparameter über ein datenverstärktes Variationsziel lernen 通过数据-按单位分列的变异目标学习超超参数 2502.01861v2
  • 2085 06-05 Contrastive Visual Data Augmentation Kontrastive Bilddatenvergrößerung 对比视觉数据增强 2502.17709v2
  • 2086 06-05 StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation StatsMerging: Statistik-geführtes Modell Zusammenführen über Task-Spezifische Lehrer-Destillation StatsMerging:通过特定任务教师蒸馏将统计指导模型合并 2506.04567v1
  • 2087 06-05 Clustering and Median Aggregation Improve Differentially Private Inference Clustering und Median Aggregation verbessern unterschiedliche private Schlussfolgerungen 群集和中中聚合改善差别私人推断 2506.04566v1
  • 2088 06-05 ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration ReCalKV: Low-Rank KV Cache-Kompression über Head Reordering und Offline-Kalibrierung ReCalKV: 通过头重排序和离线校准进行低Rank KV缓存压缩 2505.24357v2
  • 2089 06-05 OpenThoughts: Data Recipes for Reasoning Models OpenThoughts: Datenrezepte für Reasoning-Modelle OpenThoughts: 用于推理模型的数据配方 2506.04178v2
  • 2090 06-05 Investigating Distributions of Telecom Adapted Sentence Embeddings for Document Retrieval Untersuchung der Verteilung von Telecom-adaptierten Satz-Einbindungen für Dokumentenwiederherstellung 用于文件检索的经调整的远程判刑嵌入表的调查分发情况 2406.12336v3
  • 2091 06-05 LEDRO: LLM-Enhanced Design Space Reduction and Optimization for Analog Circuits LEDRO: LLM-Enhanced Design Raumreduzierung und Optimierung für analoge Schaltungen LEDRO: LLM-加强设计空间减少和模拟电路优化 2411.12930v2
  • 2092 06-05 What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective Was in LLM-Schichten passiert ist, wenn es um schnelles gegen langsames Denken geht: Eine gradiente Perspektive 训练快速与慢思考:渐进视角时 LLM 图层中发生的情况 2410.23743v2
  • 2093 06-05 Non-linear Multi-objective Optimization with Probabilistic Branch and Bound Nichtlineare Multiobjektive Optimierung mit probabilistischem Zweig und Bound 非线性多目标优化与概率分支和圆 2506.04554v1
  • 2094 06-05 Unsupervised Machine Learning for Scientific Discovery: Workflow and Best Practices Unüberwachtes maschinelles Lernen für wissenschaftliche Entdeckung: Workflow und Best Practices 为科学发现进行无人监督的机器学习:工作流程和最佳做法 2506.04553v1
  • 2095 06-05 Policy learning “without” overlap: Pessimism and generalized empirical Bernstein’s inequality Politisches Lernen “ohne” überschneidet sich: Pessimismus und generalisierte empirische Ungleichheit Bernsteins 政策学习“无重叠”重叠:悲观主义和普遍经验型伯恩斯坦的不平等 2212.09900v4
  • 2096 06-05 Communication Efficient Adaptive Model-Driven Quantum Federated Learning Kommunikation Effizientes adaptives modellgetriebenes Quantum-Federiertes Lernen 适应性强、适应性强的示范模式-量子联邦学习 2506.04548v1
  • 2097 06-05 $K$-MSHC: Unmasking Minimally Sufficient Head Circuits in Large Language Models with Experiments on Syntactic Classification Tasks $K$-MSHC: Entlarvung minimal ausreichender Kopfkreise in großen Sprachmodellen mit Experimenten zu syntaktischen Klassifizierungsaufgaben $K$-MSHC:揭示大型语言模型中的最小充分注意力头回路,并在句法分类任务上进行实验 2505.12268v2
  • 2098 06-05 ATLaS: Agent Tuning via Learning Critical Steps ATLaS: Agent Tuning über Learning Critical Steps 通过学习关键步骤进行测试的代理 2503.02197v2
  • 2099 06-05 General Intelligence Requires Reward-based Pretraining Allgemeine Intelligenz erfordert eine prämienbasierte Vorausbildung 以奖励为基础的训练前训练 2502.19402v2
  • 2100 06-05 Combinatorial Reinforcement Learning with Preference Feedback Kombinatorisches Stärkungslernen mit Präferenz-Feedback 结合强化学习与优先反馈 2502.10158v3
  • 2101 06-05 hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation hdl2v: Ein Code-Übersetzungsdatensatz für verbesserte LLM Verilog-Generierung hdl2v: 用于强化LLM Verilog 生成的代码翻译数据集 2506.04544v1
  • 2102 06-05 Neural MJD: Neural Non-Stationary Merton Jump Diffusion for Time Series Prediction Neurale MJD: Neurale nicht-stationäre Merton Jump Diffusion für Zeitreihenvorhersage 神经MJD: 用于时间序列预测的神经非立体Merton跳跃扩散 2506.04542v1
  • 2103 06-05 Chronoamperometry with Room-Temperature Ionic Liquids: Sub-Second Inference Techniques Chronoamperometrie mit Ionenflüssigkeiten der Raumtemperatur: Sub-Second Inferenztechniken 与室-室-温度-不态液体:二次二次推断技术 2506.04540v1
  • 2104 06-05 Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models Intelligent4DSE: Optimierung der High-Level-Synthese Design Space Exploration mit Graph Neural Networks und großen Sprachmodellen 智能4DSE:利用图形神经网络和大语言模型优化高级合成设计空间探索 2504.19649v2
  • 2105 06-05 Olfactory Inertial Odometry: Sensor Calibration and Drift Compensation Olfactory Inertial Odometrie: Sensorkalibrierung und Driftkompensation 腐蚀性惰性测量:传感器校准和漂流赔偿 2506.04539v1
  • 2106 06-05 Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs Projektionsbasierte Lyapunov-Methode für voll heterogene schwach gekoppelte MDPs 以投射为基础的Lyapunov 方法,用于完全多元的微软相混合的 MDP 2502.06072v4
  • 2107 06-05 NOBLE – Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models NOBLE – Neuronenoperator mit biologisch informierten Latent-Embeddings zur Erfassung experimenteller Variabilität in biologischen Neuronenmodellen NOBLE – 具有生物学信息潜在嵌入的神经算子,用于捕捉生物神经元模型中的实验变异性 2506.04536v1
  • 2108 06-05 On the Emergence of Position Bias in Transformers Über die Entstehung der Position Bias in Transformatoren 变换者中立场偏见的出现 2502.01951v2
  • 2109 06-05 HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training HALoS: Hierarchische asynchrone lokale SGD über langsame Netzwerke für geo-distributed Large Language Model Training HALoS:地理分布大语言模式培训的跨慢网的等级非同步本地 SGD 2506.04531v1
  • 2110 06-05 Self-Predictive Dynamics for Generalization of Vision-based Reinforcement Learning Selbstvorhersagbare Dynamiken für die Generalisierung des visionsbasierten Verstärkungslernens 普及基于愿景的强化学习的自我预测动态 2506.05418v1
  • 2111 06-05 Hierarchical Implicit Neural Emulators Hierarchische Implizite Neuralemulatoren 等级隐性神经隐性模拟器 2506.04528v1
  • 2112 06-05 Uncovering Memorization Effect in the Presence of Spurious Correlations Uncovering Memorization Effekt in der Gegenwart von Spurious Correlations 揭示虚假相关性存在时的记忆效应 2501.00961v3
  • 2113 06-05 Perturbative Gradient Training: A novel training paradigm for bridging the gap between deep neural networks and physical reservoir computing Perturbative Gradient Training: Ein neuartiges Trainingsparadigma zur Überbrückung der Kluft zwischen tiefen neuronalen Netzwerken und physikalischen Reservoir Computing 循环渐进培训:弥合深神经网络与物理储油层计算之间的差距的新培训模式 2506.04523v1
  • 2114 06-05 EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles EnsemW2S: Verbesserung der Schwach-zu-Strong-Verallgemeinerung mit großsprachigen Modellensembles EnsemW2S:用大语言模型组合加强弱至强的通用化 2505.21959v2

Article 0

Title@2025-06-12 (4): Rethinking Losses for Diffusion Bridge Samplers

Title: Rethinking Losses for Diffusion Bridge Samplers Rückdenken von Verlusten für Diffusionsbrücken-Probenehmer 重新思考扩散桥采样器的损失 2506.10982v1

Authors (5): Sebastian Sanokowski, Lukas Gruber, Christoph Bartmann, Sepp Hochreiter, Sebastian Lehner

Diffusion bridges are a promising class of deep-learning methods for sampling from unnormalized distributions. Recent works show that the Log Variance (LV) loss consistently outperforms the reverse Kullback-Leibler (rKL) loss when using the reparametrization trick to compute rKL-gradients. While the on-policy LV loss yields identical gradients to the rKL loss when combined with the log-derivative trick for diffusion samplers with non-learnable forward processes, this equivalence does not hold for diffusion bridges or when diffusion coefficients are learned. Based on this insight, we argue that for diffusion bridges the LV loss does not represent an optimization objective that can be motivated like the rKL loss via the data processing inequality. Our analysis shows that employing the rKL loss with the log-derivative trick (rKL-LD) not only avoids these conceptual problems but also consistently outperforms the LV loss. Experimental results with different types of diffusion bridges on challenging benchmarks show that samplers trained with the rKL-LD loss achieve better performance. From a practical perspective, we find that rKL-LD requires significantly less hyperparameter optimization and yields more stable training behavior.

扩散桥是一个很有希望的深层学习方法类别,从非正常分布分布中取样。最近的工程显示,日志差异(LV)损失在使用重新校正技巧计算 rKL 梯度时,始终优于反向 Kullback- Leiber (rKL) 损失。虽然在政策上LV 损失产生与 RKL 损失相同的梯度,如果与逻辑衍生技巧(rKL-LD)相结合,不仅避免了这些概念问题,而且始终超越了LV 损失。在具有挑战性的基准上,不同种类的传播桥梁的实验结果显示,通过 RKLL-LD损失培训的采样者要达到更高的性能。从实际角度看,我们发现使用RKLLL损失的优化不是最强的,而是更稳定地进行高的优化。
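The gradient equivalence mentioned in the abstract (on-policy LV versus rKL with the log-derivative trick, for a sampler whose forward process is not learned) can be checked numerically on a toy case. The following is only a minimal sketch under strong assumptions: a single-step Gaussian sampler q_mu = N(mu, 1) stands in for a diffusion sampler, the unnormalized target `log_p_tilde` is hypothetical, and all function names are illustrative rather than taken from the paper. It does not cover the diffusion-bridge case, where the abstract states the equivalence breaks down.

```python
import numpy as np

def log_q(x, mu):
    # Log-density of the toy sampler q_mu = N(mu, 1).
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2.0 * np.pi)

def log_p_tilde(x):
    # Hypothetical unnormalized target: N(2, 0.5^2) without its normalizer.
    return -0.5 * ((x - 2.0) / 0.5) ** 2

def score_mu(x, mu):
    # d/dmu log q_mu(x) for the unit-variance Gaussian.
    return x - mu

def rkl_ld_grad(x, mu):
    # Log-derivative (REINFORCE-style) estimate of d/dmu KL(q_mu || p),
    # using the batch mean of the log-ratio as a baseline.
    w = log_q(x, mu) - log_p_tilde(x)
    return np.mean((w - w.mean()) * score_mu(x, mu))

def lv_grad(x, mu):
    # Gradient of 0.5 * Var[w] over the batch, with the samples x held fixed
    # (on-policy log-variance loss).
    w = log_q(x, mu) - log_p_tilde(x)
    g = score_mu(x, mu)  # dw/dmu at fixed x
    return np.mean((w - w.mean()) * (g - g.mean()))

rng = np.random.default_rng(0)
mu = 0.3
x = mu + rng.standard_normal(4096)  # on-policy samples from q_mu
g_rkl = rkl_ld_grad(x, mu)
g_lv = lv_grad(x, mu)
```

In this toy setting the two estimates agree up to floating-point error, because subtracting the batch mean of the score does not change the sum once the log-ratio is already centered. The factor 0.5 in front of the variance is a convention chosen here so that the two gradients match exactly.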


Article 1

Title@2025-06-12 (4): Fine-Grained Perturbation Guidance via Attention Head Selection

Title: Fine-Grained Perturbation Guidance via Attention Head Selection Feinkörnige Störungsführung über Aufmerksamkeitskopfauswahl 通过 “ 关注负责人甄选 “ 指导 2506.10978v1

Authors (10): Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Minjae Kim, Jaewon Min, Wooseok Jang, Saungwu Lee, Sayak Paul, Susung Hong, Seungryong Kim

Recent guidance methods in diffusion models steer reverse sampling by perturbing the model to construct an implicit weak model and guide generation away from it. Among these approaches, attention perturbation has demonstrated strong empirical performance in unconditional scenarios where classifier-free guidance is not applicable. However, existing attention perturbation methods lack principled approaches for determining where perturbations should be applied, particularly in Diffusion Transformer (DiT) architectures where quality-relevant computations are distributed across layers. In this paper, we investigate the granularity of attention perturbations, ranging from the layer level down to individual attention heads, and discover that specific heads govern distinct visual concepts such as structure, style, and texture quality. Building on this insight, we propose “HeadHunter”, a systematic framework for iteratively selecting attention heads that align with user-centric objectives, enabling fine-grained control over generation quality and visual attributes. In addition, we introduce SoftPAG, which linearly interpolates each selected head’s attention map toward an identity matrix, providing a continuous knob to tune perturbation strength and suppress artifacts. Our approach not only mitigates the oversmoothing issues of existing layer-level perturbation but also enables targeted manipulation of specific visual styles through compositional head selection. We validate our method on modern large-scale DiT-based text-to-image models including Stable Diffusion 3 and FLUX.1, demonstrating superior performance in both general quality enhancement and style-specific guidance. Our work provides the first head-level analysis of attention perturbation in diffusion models, uncovering interpretable specialization within attention layers and enabling practical design of effective perturbation strategies.

传播模型的最近指导方法通过扰动模型来引导反向取样,以构建一个隐含的薄弱模型并引导其生成。在这些方法中,人们的注意力扰动在无条件的假设情景中表现出了很强的经验性表现,这些假设情景中不适用无分类指导。然而,现有的注意力扰动方法缺乏确定应在何处应用扰动的系统化框架,特别是在与质量相关的计算分布在多个层次的Difult 变异器(DiT)结构中。在本文中,我们调查了从层到个人关注头的注意力扰动颗粒性变化,发现具体的头部在结构、风格和质质质质量质量质量等不同的视觉概念上表现得非常强。基于这一洞察,我们建议“头动”是一个系统化框架,用于迭代地选择与用户中心目标一致的注意力头部结构,从而能够对生成质量和视觉属性进行细化控制。 此外,我们引入了SoftPAG, 将每个选定的头部关注度图以直线度为首,提供持续 knob 以调动的振动性流动性水平和压压动性结构型结构的深度结构。 我们的方法只能在视觉结构内部进行着的升级的深度上进行着的升级的升级,我们现有的视觉结构结构结构的升级,我们通过特定的调整,只能测测测测测测测测测测测测测测测测测测测测测测测测,我们现有的系统, 。
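The interpolation that the abstract attributes to SoftPAG (moving a selected head's attention map linearly toward the identity matrix) can be written in a few lines. This is a minimal sketch, not the authors' implementation: the function name `soft_identity_interpolation`, the scalar `u`, and the random 4-token attention map are assumptions made for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_identity_interpolation(attn, u):
    # attn: (tokens, tokens) row-stochastic attention map of one head.
    # u in [0, 1]: 0 leaves the head unchanged, 1 replaces it with the identity.
    eye = np.eye(attn.shape[-1])
    return (1.0 - u) * attn + u * eye

rng = np.random.default_rng(0)
attn = softmax(rng.standard_normal((4, 4)))  # hypothetical attention map
perturbed = soft_identity_interpolation(attn, 0.3)
```

Because both the attention map and the identity are row-stochastic, every interpolated map is row-stochastic as well, which is what makes `u` usable as a continuous strength knob.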


Article 2

Title@2025-06-12 (4): AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

Title: AutoMind: Adaptive Knowledgeable Agent for Automated Data Science AutoMind: Adaptives Knowledgeable Agent für automatisierte Datenwissenschaft 自动Mind:自动数据科学适应性知识代理 2506.10974v1

Authors (9): Yixin Ou, Yujie Luo, Jingsheng Zheng, Lanning Wei, Shuofei Qiao, Jintian Zhang, Da Zheng, Huajun Chen, Ningyu Zhang

Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind delivers superior performance versus state-of-the-art baselines. Additional analyses confirm favorable effectiveness, efficiency, and qualitative solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science.

大型语言模型(LLM)代理商在解决现实世界数据科学问题方面表现出了巨大的潜力。LLM驱动的数据科学代理商承诺使整个机器学习管道自动化,然而其真实世界的有效性仍然有限。现有框架依赖于僵硬、预先定义的工作流程和不灵活的编码战略;因此,它们仅擅长于相对简单、古老的问题,未能捕捉人类从业者带来复杂、创新任务的经验专长。在这项工作中,我们引入了AutoMind(AutoMind)(一个适应性、知识丰富的LLM(LM)代理商)框架,通过三项关键进步克服了这些缺陷:(1) 一种成熟的专家知识基础,使该代理商具有领域专家知识;(2) 一种具有代理知识的树搜索算法,从战略上探索可能的解决方案;(3) 一种自我调整的编码战略,根据任务的复杂性动态地定制生成代码。对两个自动化数据科学基准的评估表明,AutoMind(AutoMind)能够提供优异的绩效、效率和质量解决方案质量,强调AutMind(Autmind)是迈向完全自动化数据科学的高效和稳健健捷的一步。


Article 3

Title@2025-06-12 (4): Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning

Title: Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning Prinzipierte Ansätze zur Erweiterung neuraler Architekturen auf Funktionsräume für das Bediener-Lernen 将神经结构扩展至功能空间以便操作员学习的原则办法 2506.10973v1

Authors (7): Julius Berner, Miguel Liu-Schiaffini, Jean Kossaifi, Valentin Duruisseaux, Boris Bonev, Kamyar Azizzadenesheli, Anima Anandkumar

A wide range of scientific problems, such as those described by continuous-time dynamical systems and partial differential equations (PDEs), are naturally formulated on function spaces. While function spaces are typically infinite-dimensional, deep learning has predominantly advanced through applications in computer vision and natural language processing that focus on mappings between finite-dimensional spaces. Such fundamental disparities in the nature of the data have kept neural networks from achieving the level of success in scientific applications that they have seen in other fields. Neural operators are a principled way to generalize neural networks to mappings between function spaces, offering a pathway to replicate deep learning’s transformative impact on scientific problems. For instance, neural operators can learn solution operators for entire classes of PDEs, e.g., physical systems with different boundary conditions, coefficient functions, and geometries. A key factor in deep learning’s success has been the careful engineering of neural architectures through extensive empirical testing. Translating these neural architectures into neural operators allows operator learning to enjoy these same empirical optimizations. However, prior neural operator architectures have often been introduced as standalone models, not directly derived as extensions of existing neural network architectures. In this paper, we identify and distill the key principles for constructing practical implementations of mappings between infinite-dimensional function spaces. Using these principles, we propose a recipe for converting several popular neural architectures into neural operators with minimal modifications. This paper aims to guide practitioners through this process and details the steps to make neural operators work in practice. Our code can be found at https://github.com/neuraloperator/NNs-to-NOs

功能空间一般是无限的,但深层次的学习主要通过计算机视觉和自然语言处理的应用,侧重于在有限空间之间绘制地图。这些数据性质上的根本差异限制了神经网络在科学应用方面取得与其它领域相似的成功水平。神经操作员是将神经网络推广到功能空间间绘图的一条原则性途径,为复制深层次学习对科学问题的影响提供了途径。例如,神经操作员可以学习整个类别PDE的解决方案操作员,例如,具有不同边界条件、系数功能和地理特征的物理系统。深层次学习的关键因素之一是通过广泛的实验测试对神经结构进行仔细的工程设计。将这些神经结构转换成神经操作员,使操作员能够学习这些在功能空间之间的实验性优化。然而,以前的神经操作员结构常常被引入为独立模型,而不是直接在构建现有神经网络结构的扩展中,通过这些核心结构的扩展,我们通过这些软质操作员和软质结构的扩展,我们找到了这些软质操作员的模型。


Article 4

Title@2025-06-12 (4): Farseer: A Refined Scaling Law in Large Language Models

Title: Farseer: A Refined Scaling Law in Large Language Models Farseer: Ein verfeinertes Skalierungsgesetz in großen Sprachmodellen Farseer:大语言模式中经过完善的扩大法 2506.10972v1

Authors (11): Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang

Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing a model loss surface $L(N,D)$, Farseer achieves a significantly better fit to empirical data than prior laws (e.g., Chinchilla’s law). Our methodology yields accurate, robust, and highly generalizable predictions, demonstrating excellent extrapolation capabilities, improving upon Chinchilla’s law by reducing extrapolation error by 433%. This allows for the reliable evaluation of competing training strategies across all $(N,D)$ settings, enabling conclusions from small-scale ablation studies to be confidently extrapolated to predict large-scale performance. Furthermore, Farseer provides new insights into optimal compute allocation, better reflecting the nuanced demands of modern LLM training. To validate our approach, we trained an extensive suite of approximately 1,000 LLMs across diverse scales and configurations, consuming roughly 3 million NVIDIA H100 GPU hours. We are comprehensively open-sourcing all models, data, results, and logs at https://github.com/Farseer-Scaling-Law/Farseer to foster further research.

培训大型语言模型(LLMS)的费用太高,令人望而却步,造成了一个关键的缩小差距,因为小规模实验的洞察力往往无法向资源密集型生产系统转移,从而阻碍有效的创新。为了弥补这一差距,我们引入了Farseer, 这是一项新颖和经过改进的缩略法, 提供了更高的全尺度预测准确度。通过系统地构建一个模型损失表面值$(N,D),Farseer比以往的法律(例如Chinchilla的法律)更符合经验数据。我们的方法得出了准确、有力和高度普遍化的预测,显示了极好的外推法能力,通过减少433的外推误改进了Chinchilla的法律。这样,我们就可以可靠地评估所有(N,D)环境中相互竞争的培训战略,提供更高的预测性预测性预测性能。此外,Farseer提供了对最佳的计算分配的新见解,更好地反映现代LMM培训的精细要求。为了验证我们的方法,我们培训了整个方法,在各种规模和配置上大约1 000 LLMS-LMS-IA全面研算。
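For readers unfamiliar with loss surfaces $L(N,D)$, the sketch below shows the Chinchilla-style parametric form that the abstract uses as its point of comparison, together with a relative extrapolation error. It does not reproduce Farseer's own functional form, which the abstract does not state. The coefficient values are placeholders chosen for illustration, not fitted values from either work, and the function names are hypothetical.

```python
def chinchilla_form_loss(n_params, n_tokens, E, A, B, alpha, beta):
    # L(N, D) = E + A / N^alpha + B / D^beta
    # E: irreducible loss; the other terms shrink with model size N and data size D.
    return E + A / n_params**alpha + B / n_tokens**beta

def relative_error(predicted, observed):
    # Relative error of a predicted loss against a measured one.
    return abs(predicted - observed) / observed

# Placeholder coefficients (assumptions for illustration only).
coeffs = dict(E=1.7, A=400.0, B=400.0, alpha=0.34, beta=0.28)

small = chinchilla_form_loss(1e8, 2e9, **coeffs)
large = chinchilla_form_loss(1e10, 2e11, **coeffs)
```

A scaling law is judged by fitting such a surface on small $(N,D)$ settings and then measuring `relative_error` on held-out larger settings.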


Article 5

Title@2025-06-12 (4): What Exactly Does Guidance Do in Masked Discrete Diffusion Models

Title: What Exactly Does Guidance Do in Masked Discrete Diffusion Models Was genau tut Anleitung in maskierten diskreten Diffusionsmodellen? 在隐蔽的分辨扩散模型中,指导到底做什么? 2506.10971v1

Authors (3): He Ye, Rojas Kevin, Tao Molei

We study masked discrete diffusion models with classifier-free guidance (CFG). Assuming neither score error nor discretization error, we derive an explicit solution to the guided reverse dynamics, so that the way guidance influences the sampling behavior can be precisely characterized. When the full data distribution is a mixture over classes and the goal is to sample from a specific class, guidance amplifies class-specific regions while suppressing regions shared with other classes. This effect depends on the guidance strength $w$ and induces distinct covariance structures in the sampled distribution. Notably, we observe quantitatively different behaviors in $1$D and $2$D. We also show that for large $w$, the decay rate of the total variation ($\mathrm{TV}$) along the reverse dynamics is double-exponential in $w$ for both $1$D and $2$D. These findings highlight the role of guidance, not just in shaping the output distribution, but also in controlling the dynamics of the sampling trajectory. Our theoretical analysis is supported by experiments that illustrate the geometric effects of guidance and its impact on convergence.

我们研究带无分类器引导(CFG)的掩码离散扩散模型。在假设既无得分误差也无离散化误差的前提下,我们推导出引导反向动力学的显式解,从而可以精确刻画引导如何影响采样行为。当完整数据分布是各类别的混合、目标是从某一特定类别采样时,引导会放大该类别特有的区域,同时抑制与其他类别共享的区域。这一效应取决于引导强度 $w$,并在采样分布中诱导出不同的协方差结构。值得注意的是,我们在 $1$D 和 $2$D 中观察到了定量上不同的行为。我们还证明,当 $w$ 较大时,在 $1$D 和 $2$D 两种情形下,总变差($\mathrm{TV}$)沿反向动力学的衰减速率关于 $w$ 都是双指数的。这些发现凸显了引导的作用:它不仅塑造输出分布,还控制采样轨迹的动态。我们的理论分析得到了实验的支持,这些实验展示了引导的几何效应及其对收敛的影响。
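The amplify-and-suppress effect described in the abstract can be illustrated on a three-state toy distribution. The sketch assumes the common CFG convention in which the guided distribution is proportional to p(x | c)^(1+w) / p(x)^w; the paper may parameterize $w$ differently, and it analyzes the full reverse dynamics rather than only this tilted target. The states, class distributions, and the function name `guided_distribution` are hypothetical.

```python
import numpy as np

def guided_distribution(p_cond, p_uncond, w):
    # Tilted distribution: p_guided(x) proportional to p_cond(x)^(1+w) / p_uncond(x)^w,
    # restricted to the support of p_cond.
    logits = np.full_like(p_cond, -np.inf, dtype=float)
    mask = p_cond > 0
    logits[mask] = (1.0 + w) * np.log(p_cond[mask]) - w * np.log(p_uncond[mask])
    logits -= logits.max()
    probs = np.exp(logits)
    return probs / probs.sum()

# State 0 occurs only in class A, state 1 is shared, state 2 occurs only in class B.
p_a = np.array([0.5, 0.5, 0.0])
p_b = np.array([0.0, 0.5, 0.5])
p_full = 0.5 * p_a + 0.5 * p_b  # equal-weight mixture over the two classes

unguided = guided_distribution(p_a, p_full, 0.0)
guided = guided_distribution(p_a, p_full, 1.0)
strongly_guided = guided_distribution(p_a, p_full, 3.0)
```

In this toy case the mass on the class-specific state 0 equals 2^w / (2^w + 1), so it grows from 1/2 at w = 0 toward 1 as w increases, while the shared state 1 is suppressed.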


Article 6

Title@2025-06-12 (4): SoK: Watermarking for AI-Generated Content

Title: SoK: Watermarking for AI-Generated Content SoK: Wasserzeichen für KI-generierten Inhalt SoK: 用于 AI 生成内容的水标志 2411.18479v3

Authors (14): Xuandong Zhao, Sam Gunn, Miranda Christ, Jaiden Fairoze, Andres Fabrega, Nicholas Carlini, Sanjam Garg, Sanghyun Hong, Milad Nasr, Florian Tramer, Somesh Jha, Lei Li, Yu-Xiang Wang, Dawn Song

As the outputs of generative AI (GenAI) techniques improve in quality, it becomes increasingly challenging to distinguish them from human-created content. Watermarking schemes are a promising approach to address the problem of distinguishing between AI and human-generated content. These schemes embed hidden signals within AI-generated content to enable reliable detection. While watermarking is not a silver bullet for addressing all risks associated with GenAI, it can play a crucial role in enhancing AI safety and trustworthiness by combating misinformation and deception. This paper presents a comprehensive overview of watermarking techniques for GenAI, beginning with the need for watermarking from historical and regulatory perspectives. We formalize the definitions and desired properties of watermarking schemes and examine the key objectives and threat models for existing approaches. Practical evaluation strategies are also explored, providing insights into the development of robust watermarking techniques capable of resisting various attacks. Additionally, we review recent representative works, highlight open challenges, and discuss potential directions for this emerging field. By offering a thorough understanding of watermarking in GenAI, this work aims to guide researchers in advancing watermarking methods and applications, and support policymakers in addressing the broader implications of GenAI.

由于基因化的AI(GenAI)技术在质量上有所改进,因此将其与人类创造的内容区别开来已变得日益具有挑战性。水标记计划是解决区别AI和人类生成的内容问题的一个很有希望的方法。这些计划将隐藏的信号嵌入AI生成的内容中,以便能够进行可靠的检测。虽然水标记并不是解决与GenAI有关的所有风险的银弹,但它可以通过打击错误和欺骗行为,在提高AI的安全和可信度方面发挥关键作用。本文件全面概述了GenAI的水标记技术,从历史和监管角度出发,为GenAI提供一个水标记技术的全面概览。我们正式确定水标记计划的定义和预期特性,并审查现有方法的关键目标和威胁模式。我们还探讨了实际评估战略,为开发能够抵御各种袭击的稳健的水标记技术提供了深刻的见解。此外,我们审查最近的代表著作,突出公开的挑战,并讨论这一新兴领域的潜在方向。通过全面了解GenAI的水标记,这项工作旨在指导研究人员推进水标记方法和应用,并支持决策者解决GenAI的更广泛影响。


Article 7

Title@2025-06-12 (4): SpectralAR: Spectral Autoregressive Visual Generation

Title: SpectralAR: Spectral Autoregressive Visual Generation SpectralAR: Spectral Autoregressive Visual Generation 光谱AR:光谱自动递减视觉生成 2506.10962v1

Authors (6): Yuanhui Huang, Weiliang Chen, Wenzhao Zheng, Yueqi Duan, Jie Zhou, Jiwen Lu

Autoregressive visual generation has garnered increasing attention due to its scalability and compatibility with other modalities compared with diffusion models. Most existing methods construct visual sequences as spatial patches for autoregressive generation. However, image patches are inherently parallel, contradicting the causal nature of autoregressive modeling. To address this, we propose a Spectral AutoRegressive (SpectralAR) visual generation framework, which realizes causality for visual sequences from the spectral perspective. Specifically, we first transform an image into ordered spectral tokens with Nested Spectral Tokenization, representing lower to higher frequency components. We then perform autoregressive generation in a coarse-to-fine manner with the sequences of spectral tokens. By considering different levels of detail in images, our SpectralAR achieves both sequence causality and token efficiency without bells and whistles. We conduct extensive experiments on ImageNet-1K for image reconstruction and autoregressive generation, and SpectralAR achieves 3.02 gFID with only 64 tokens and 310M parameters. Project page: https://huang-yh.github.io/spectralar/.

与扩散模型相比,自动递减的视觉生成由于其可变性和与其他模式的兼容性而日益引起注意。大多数现有方法将视觉序列构建为自动递减生成的空间补丁。然而,图像补丁本质上是平行的,与自动递减模型的因果关系性质相矛盾。为了解决这个问题,我们提议一个光谱自动递减生成框架,从光谱角度认识视觉序列的因果关系。具体地说,我们首先将图像转换成带有Nested Spectoral Tokenization(代表低频至高频组件)的定购光谱符号。然后,我们用光谱符号序列的相向偏移方式进行自动递增生成。通过考虑图像中不同的详细程度,我们的SpectrameralAR在没有钟和哨子的情况下实现序列因果关系和象征性效率。我们用图像Net-1K进行广泛的实验,用于图像重建和自动递增生成,而SpecturalAR只实现3.02 gFID,只有64个符号和3100M参数。项目页面: https://huang-ygio/slamgioalgioal.
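The coarse-to-fine ordering of spectral content that the abstract relies on can be illustrated with a plain 2-D Fourier transform. This sketch is not the paper's Nested Spectral Tokenization; it only shows, under the assumption of a fixed FFT basis and a random test image, that adding progressively higher frequency bands refines a reconstruction monotonically. The helper names are hypothetical.

```python
import numpy as np

def radial_frequency(shape):
    # Radial frequency magnitude of every FFT coefficient (cycles per pixel).
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.sqrt(fy**2 + fx**2)

def lowpass_reconstruct(image, cutoff):
    # Keep only coefficients with radial frequency <= cutoff, then invert.
    spectrum = np.fft.fft2(image)
    mask = radial_frequency(image.shape) <= cutoff
    return np.fft.ifft2(spectrum * mask).real

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))  # stand-in for an image channel
cutoffs = [0.1, 0.25, 0.5, 1.0]        # from coarse to fine
errors = [np.linalg.norm(image - lowpass_reconstruct(image, c)) for c in cutoffs]
```

Each larger cutoff keeps a superset of the previous coefficients, so the reconstruction error never increases, and the last cutoff keeps every coefficient.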


Article 8

Title@2025-06-12 (4): ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark

Title: ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark ChinesischHarm-Bench: Ein chinesischer schädlicher Content Detection Benchmark 中中汉禁区:中国有害内容检测基准 2506.10960v1

Authors (10): Kangwei Liu, Siyuan Cheng, Bozhong Tian, Xiaozhuan Liang, Yuyang Yin, Meng Han, Ningyu Zhang, Bryan Hooi, Xi Chen, Shumin Deng

Large language models (LLMs) have been increasingly applied to automated harmful content detection tasks, assisting moderators in identifying policy violations and improving the overall efficiency and accuracy of content review. However, existing resources for harmful content detection are predominantly focused on English, with Chinese datasets remaining scarce and often limited in scope. We present a comprehensive, professionally annotated benchmark for Chinese content harm detection, which covers six representative categories and is constructed entirely from real-world data. Our annotation process further yields a knowledge rule base that provides explicit expert knowledge to assist LLMs in Chinese harmful content detection. In addition, we propose a knowledge-augmented baseline that integrates both human-annotated knowledge rules and implicit knowledge from large language models, enabling smaller models to achieve performance comparable to state-of-the-art LLMs. Code and data are available at https://github.com/zjunlp/ChineseHarm-bench.

大型语言模型(LLMS)已越来越多地用于自动识别有害内容的任务,协助主持人查明违反政策的情况,提高内容审查的总体效率和准确性;然而,用于检测有害内容的现有资源主要集中于英语,中国数据集仍然稀缺,而且范围往往有限;我们为中国内容伤害检测提供了一个全面、专业的附加说明的基准,该基准涵盖六个代表性类别,完全以现实世界数据为基础;我们的批注过程还产生了一个知识规则基础,该知识基础提供明确的专家知识知识,协助查找中国有害内容的LMS;此外,我们提出一个知识强化基线,将人附加说明的知识规则和大型语言模型的隐含知识结合起来,使较小的模型能够达到与最先进的LMS相近的性能;可在https://github.com/zjunp/中国Harm-bench查阅数据。


Article 9

Title@2025-06-12 (4): Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods

Title: Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods In-Context-Lernen über strukturierte Manifolds verstehen: Aufmerksamkeit auf Kernel-Methoden überbrücken 理解关于结构式操纵的内文学习:弥合对核心方法的注意 2506.10959v1

Authors (4): Zhaiming Shen, Alexander Hsu, Rongjie Lai, Wenjing Liao

While in-context learning (ICL) has achieved remarkable success in natural language and vision domains, its theoretical understanding, particularly in the context of structured geometric data, remains unexplored. In this work, we initiate a theoretical study of ICL for regression of Hölder functions on manifolds. By establishing a novel connection between the attention mechanism and classical kernel methods, we derive generalization error bounds in terms of the prompt length and the number of training tasks. When a sufficient number of training tasks are observed, transformers give rise to the minimax regression rate of Hölder functions on manifolds, which scales exponentially with the intrinsic dimension of the manifold, rather than the ambient space dimension. Our result also characterizes how the generalization error scales with the number of training tasks, shedding light on the complexity of transformers as in-context algorithm learners. Our findings provide foundational insights into the role of geometry in ICL and novel tools to study ICL of nonlinear models.

尽管上下文学习(ICL)在自然语言和视觉领域取得了显著成功,但其理论理解,尤其是在结构化几何数据情形下,仍有待探索。在这项工作中,我们首次对流形上 Hölder 函数回归的 ICL 展开理论研究。通过在注意力机制与经典核方法之间建立新的联系,我们得到了以提示长度和训练任务数量表示的泛化误差界。当观察到足够多的训练任务时,Transformer 可以达到流形上 Hölder 函数的极小极大回归速率,该速率对流形的内蕴维数而非环境空间维数呈指数依赖。我们的结果还刻画了泛化误差如何随训练任务数量变化,揭示了 Transformer 作为上下文算法学习者的复杂度。我们的发现为几何在 ICL 中的作用提供了基础性的见解,并为研究非线性模型的 ICL 提供了新工具。
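One standard way to see a link between attention and kernel methods is that softmax attention over unit-norm keys and queries coincides with Nadaraya-Watson regression under a Gaussian kernel. The sketch below checks this identity numerically. It is an illustration of that well-known connection only; whether the paper's construction takes exactly this form is an assumption, and the data, the target function, and the temperature `tau` are hypothetical.

```python
import numpy as np

def softmax_attention_predict(query, keys, values, tau):
    # One attention read-out: softmax over scaled dot products, then a weighted average.
    scores = keys @ query / tau
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights /= weights.sum()
    return weights @ values

def gaussian_kernel_regression(query, keys, values, tau):
    # Nadaraya-Watson estimator with kernel exp(-||q - k||^2 / (2 * tau)).
    sq_dist = np.sum((keys - query) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2.0 * tau))
    weights /= weights.sum()
    return weights @ values

rng = np.random.default_rng(0)
keys = rng.standard_normal((32, 3))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)  # in-context inputs on the unit sphere
values = np.sin(3.0 * keys[:, 0])                     # hypothetical regression targets
query = rng.standard_normal(3)
query /= np.linalg.norm(query)
tau = 0.1

attention_output = softmax_attention_predict(query, keys, values, tau)
kernel_output = gaussian_kernel_regression(query, keys, values, tau)
```

The two outputs match because, for unit vectors, ||q - k||^2 = 2 - 2 q·k, so the Gaussian kernel differs from exp(q·k / tau) only by a constant factor that cancels in the normalization.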


Article 10

Title@2025-06-12 (4): AssistanceZero: Scalably Solving Assistance Games

Title: AssistanceZero: Scalably Solving Assistance Games AssistanceZero: Skalably Solving Assistance Games 援助零:可升级解决援助运动会 2504.07091v2

Authors (8): Cassidy Laidlaw, Eli Bronstein, Timothy Guo, Dylan Feng, Lukas Berglund, Justin Svegliato, Stuart Russell, Anca Dragan

Assistance games are a promising alternative to reinforcement learning from human feedback (RLHF) for training AI assistants. Assistance games resolve key drawbacks of RLHF, such as incentives for deceptive behavior, by explicitly modeling the interaction between assistant and user as a two-player game where the assistant cannot observe their shared goal. Despite their potential, assistance games have only been explored in simple settings. Scaling them to more complex environments is difficult because it requires both solving intractable decision-making problems under uncertainty and accurately modeling human users’ behavior. We present the first scalable approach to solving assistance games and apply it to a new, challenging Minecraft-based assistance game with over $10^{400}$ possible goals. Our approach, AssistanceZero, extends AlphaZero with a neural network that predicts human actions and rewards, enabling it to plan under uncertainty. We show that AssistanceZero outperforms model-free RL algorithms and imitation learning in the Minecraft-based assistance game. In a human study, our AssistanceZero-trained assistant significantly reduces the number of actions participants take to complete building tasks in Minecraft. Our results suggest that assistance games are a tractable framework for training effective AI assistants in complex environments. Our code and models are available at https://github.com/cassidylaidlaw/minecraft-building-assistance-game.

辅助博弈(assistance game)是训练 AI 助手时替代基于人类反馈的强化学习(RLHF)的一种有前景的方法。辅助博弈将助手与用户之间的交互显式建模为一个双人博弈,其中助手无法观察到双方共同的目标,从而解决了 RLHF 的关键缺陷,例如对欺骗行为的激励。尽管潜力可观,辅助博弈此前只在简单环境中得到过探索。将其扩展到更复杂的环境十分困难,因为这既需要求解不确定性下难以处理的决策问题,又需要准确建模人类用户的行为。我们提出了首个可扩展的辅助博弈求解方法,并将其应用于一个全新的、具有挑战性的基于 Minecraft 的辅助博弈,其可能的目标超过 $10^{400}$ 个。我们的方法 AssistanceZero 在 AlphaZero 的基础上加入了一个预测人类动作和奖励的神经网络,使其能够在不确定性下进行规划。我们证明,在基于 Minecraft 的辅助博弈中,AssistanceZero 优于无模型 RL 算法和模仿学习。在一项人类研究中,经 AssistanceZero 训练的助手显著减少了参与者在 Minecraft 中完成建造任务所需的动作数。我们的结果表明,辅助博弈是在复杂环境中训练高效 AI 助手的一个可行框架。我们的代码和模型可在 https://github.com/cassidylaidlaw/minecraft-building-assistance-game 获取。


Article 11

Title@2025-06-12 (4): ReGuidance: A Simple Diffusion Wrapper for Boosting Sample Quality on Hard Inverse Problems

Title: ReGuidance: A Simple Diffusion Wrapper for Boosting Sample Quality on Hard Inverse Problems ReGuidance: Ein einfacher Diffusions-Wrapper zur Steigerung der Probenqualität bei harten inversen Problemen 重新指导:一个简单的传播包包,用于提高反向问题样本质量 2506.10955v1

Authors (3): Aayush Karan, Kulin Shah, Sitan Chen

There has been a flurry of activity around using pretrained diffusion models as informed data priors for solving inverse problems, and more generally around steering these models using reward models. Training-free methods like diffusion posterior sampling (DPS) and its many variants have offered flexible heuristic algorithms for these tasks, but when the reward is not informative enough, e.g., in hard inverse problems with low signal-to-noise ratio, these techniques veer off the data manifold, failing to produce realistic outputs. In this work, we devise a simple wrapper, ReGuidance, for boosting both the sample realism and reward achieved by these methods. Given a candidate solution $\hat{x}$ produced by an algorithm of the user’s choice, we propose inverting the solution by running the unconditional probability flow ODE in reverse starting from $\hat{x}$, and then using the resulting latent as an initialization for DPS. We evaluate our wrapper on hard inverse problems like large box in-painting and super-resolution with high upscaling. Whereas state-of-the-art baselines visibly fail, we find that applying our wrapper on top of these baselines significantly boosts sample quality and measurement consistency. We complement these findings with theory proving that on certain multimodal data distributions, ReGuidance simultaneously boosts the reward and brings the candidate solution closer to the data manifold. To our knowledge, this constitutes the first rigorous algorithmic guarantee for DPS.

使用预先培训的传播模型,作为解决反向问题的知情数据前导,以及更一般地以奖励模型指导这些模型。 推广远端抽样(DPS)及其许多变量等无培训方法为这些任务提供了灵活的超光速算法,但是当奖励不够充分时,例如,在信号到噪音比率低的硬反向问题中,这些技术从数据元流中跳出,未能产生现实产出。在这项工作中,我们设计了一个简单的包装、重新指导,用以提升这些方法所取得的样板真实主义和奖励。鉴于根据用户选择的算法产生的一个候选人解决方案$\hat{x}及其许多变量,我们建议通过运行无条件概率流来扭转这些方法,从$\hat{x}开始,然后利用由此产生的潜值来启动DPS。我们首先评估关于硬反向反向的问题的包装,例如大型的纸箱和超分辨率,以高尺度来提升这些方法所取得的真实性真实性和超清晰度。 在通过用户选择的算算法的精度基准中,我们发现一些最精确性的数据,我们用最精确的精确的精确的基值模型来测量数据,我们用这些最精确的精确的精确的模型来测量模型分析结果。
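
The two-stage recipe in the abstract (invert the candidate with the unconditional probability flow ODE, then restart guided sampling from that latent) can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a 1-D two-mode Gaussian prior with an exact score, the noise schedule sigma(t) = t, an identity measurement with Gaussian noise, and an ODE-style DPS guidance term computed by finite differences. All function names and constants are hypothetical.

```python
import math

# Assumed toy prior: equal mixture of N(-M, S^2) and N(+M, S^2).
M, S, T = 2.0, 1.0, 5.0

def score(x, t):
    """Exact score of the prior convolved with N(0, t^2)."""
    v = S * S + t * t
    la = -(x - M) ** 2 / (2 * v)
    lb = -(x + M) ** 2 / (2 * v)
    mx = max(la, lb)  # subtract the max for numerical stability
    wa, wb = math.exp(la - mx), math.exp(lb - mx)
    return (wa * (M - x) + wb * (-M - x)) / ((wa + wb) * v)

def denoise(x, t):
    """Tweedie estimate of the clean sample given a noisy one."""
    return x + t * t * score(x, t)

def velocity(x, t, y=None, sigma_y=1.0, eps=1e-4):
    """Probability flow drift for sigma(t) = t, plus optional DPS-style guidance."""
    s = score(x, t)
    if y is not None:
        # Gradient of log N(y; denoise(x, t), sigma_y^2) by central differences.
        def loglik(z):
            return -(y - denoise(z, t)) ** 2 / (2 * sigma_y ** 2)
        s += (loglik(x + eps) - loglik(x - eps)) / (2 * eps)
    return -t * s

def integrate(x, t0, t1, steps=400, **kw):
    """Classical RK4 integration of the ODE from time t0 to time t1."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = velocity(x, t, **kw)
        k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h, **kw)
        k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h, **kw)
        k4 = velocity(x + h * k3, t + h, **kw)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

def invert(x_hat):
    """Stage 1: run the unconditional ODE from data time 0 to noise time T."""
    return integrate(x_hat, 0.0, T)

def sample(latent, y=None):
    """Run the ODE from T back to 0, guided by y when it is given."""
    return integrate(latent, T, 0.0, y=y)

def reguidance(x_hat, y):
    """Stage 2: restart guided sampling from the inverted latent."""
    return sample(invert(x_hat), y=y)

x_hat = 1.2                      # candidate from any base solver (assumed value)
round_trip = sample(invert(x_hat))
refined = reguidance(x_hat, y=2.0)
```

Without guidance the round trip returns the candidate up to integration error, which is what makes the inverted latent a meaningful initialization; the guided pass then moves the sample according to both the prior score and the measurement.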


Article 12

Title@2025-06-12 (4): Build the web for agents, not agents for the web

Title: Build the web for agents, not agents for the web Erstellen Sie das Web für Agenten, nicht Agenten für das Web 为代理者而不是网络代理者建立网络 2506.10953v1

Authors (4): Xing Han Lù, Gaurav Kamath, Marius Mosbach, Siva Reddy

Recent advancements in Large Language Models (LLMs) and multimodal counterparts have spurred significant interest in developing web agents – AI systems capable of autonomously navigating and completing tasks within web environments. While holding tremendous promise for automating complex web interactions, current approaches face substantial challenges due to the fundamental mismatch between human-designed interfaces and LLM capabilities. Current methods struggle with the inherent complexity of web inputs, whether processing massive DOM trees, relying on screenshots augmented with additional information, or bypassing the user interface entirely through API interactions. This position paper advocates for a paradigm shift in web agent research: rather than forcing web agents to adapt to interfaces designed for humans, we should develop a new interaction paradigm specifically optimized for agentic capabilities. To this end, we introduce the concept of an Agentic Web Interface (AWI), an interface specifically designed for agents to navigate a website. We establish six guiding principles for AWI design, emphasizing safety, efficiency, and standardization, to account for the interests of all primary stakeholders. This reframing aims to overcome fundamental limitations of existing interfaces, paving the way for more efficient, reliable, and transparent web agent design, which will be a collaborative effort involving the broader ML community.

大语言模型(LLMS)和多式联运对应方最近的进展引起了人们对开发网络代理器的极大兴趣 – – 能够自主浏览和完成网络环境中任务的AI系统。在拥有使复杂的网络互动自动化的巨大希望的同时,目前的方法面临巨大的挑战,因为人设计的界面与LLM能力之间根本的不匹配。目前的方法与网络投入的内在复杂性挣扎,无论是处理大型DOM树,依靠通过额外信息增加的截图,还是完全通过API互动绕过用户界面。这份立场文件倡导网络代理器研究的范式转变:而不是迫使网络代理器适应为人类设计的界面,我们应开发一种新的互动模式,专门优化了代理能力。为此,我们引入了AWI(AWI)概念,这是一个专门为代理器浏览网站而设计的界面。我们为AWI设计制定了六项指导原则,强调安全、效率和标准化,以顾及所有主要利益攸关方的利益。这一重新配置的目的是克服现有界面的基本限制,为更有效、可靠和透明的网络代理器设计铺平道路,这将扩大社区的协作努力。


Article 13

Title@2025-06-12 (4): Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training

Title: Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training Domain2Vec: Vectorizing Datasets, um die optimale Datenmischung ohne Training zu finden 域2Vec: 将数据集矢量化,以查找未经过培训的最佳数据混合体 2506.10952v1

Authors (4): Mozhi Zhang, Howe Tissue, Lu Wang, Xipeng Qiu

We introduce Domain2Vec, a novel approach that decomposes any dataset into a linear combination of several meta-domains, a new concept designed to capture the key underlying features of datasets. Domain2Vec maintains a vocabulary of meta-domains and uses a classifier to decompose any given dataset into a domain vector that corresponds to a distribution over this vocabulary. These domain vectors enable the identification of the optimal data mixture for language model (LM) pretraining in a training-free manner under the Distribution Alignment Assumption (DA$^{2}$), which suggests that when the data distributions of the training set and the validation set are better aligned, a lower validation loss is achieved. Moreover, Domain2Vec can be seamlessly integrated into previous works to model the relationship between domain vectors and LM performance, greatly enhancing the efficiency and scalability of previous methods. Extensive experiments demonstrate that Domain2Vec helps find the data mixture that enhances downstream task performance with minimal computational overhead. Specifically, Domain2Vec achieves the same validation loss on Pile-CC using only 51.5% of the computation required when training on the original mixture of The Pile dataset. Under equivalent compute budget, Domain2Vec improves downstream performance by an average of 2.83%.

我们提出 Domain2Vec,这是一种将任意数据集分解为若干元域(meta-domain)线性组合的新方法;元域是一个旨在刻画数据集关键底层特征的新概念。Domain2Vec 维护一个元域词表,并使用分类器将任意给定数据集分解为一个域向量,该向量对应于该词表上的一个分布。在分布对齐假设(Distribution Alignment Assumption,DA$^{2}$)下,这些域向量使我们能够以无需训练的方式找到语言模型(LM)预训练的最优数据混合;该假设认为,训练集与验证集的数据分布对齐得越好,验证损失就越低。此外,Domain2Vec 可以无缝集成到已有工作中,用于建模域向量与 LM 性能之间的关系,从而大幅提升已有方法的效率和可扩展性。大量实验表明,Domain2Vec 能以极小的计算开销找到提升下游任务性能的数据混合。具体而言,Domain2Vec 仅需在 The Pile 数据集原始混合上训练所需计算量的 51.5%,即可在 Pile-CC 上达到相同的验证损失。在同等计算预算下,Domain2Vec 平均将下游性能提升 2.83%。
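
To make the domain-vector idea concrete, here is a minimal sketch under stated assumptions: the classifier outputs are given as per-sample probability lists, the alignment measure is an L1 distance, and the mixture is found by a coarse grid search over the simplex. None of these choices is confirmed by the abstract, and the dataset vectors are made-up illustrative numbers.

```python
from itertools import product

def domain_vector(sample_probs):
    """Average per-sample meta-domain probabilities into one dataset-level vector."""
    n, k = len(sample_probs), len(sample_probs[0])
    return [sum(p[j] for p in sample_probs) / n for j in range(k)]

def mix(weights, vectors):
    """Domain vector of a weighted mixture of datasets."""
    k = len(vectors[0])
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(k)]

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def best_mixture(train_vectors, val_vector, grid=20):
    """Grid-search simplex weights whose mixed vector best matches validation."""
    n = len(train_vectors)
    best_dist, best_w = None, None
    for combo in product(range(grid + 1), repeat=n - 1):
        if sum(combo) > grid:
            continue
        w = [c / grid for c in combo] + [(grid - sum(combo)) / grid]
        dist = l1(mix(w, train_vectors), val_vector)
        if best_dist is None or dist < best_dist:
            best_dist, best_w = dist, w
    return best_w

# Hypothetical domain vectors over three meta-domains.
web = domain_vector([[0.9, 0.05, 0.05], [0.7, 0.15, 0.15]])
code = [0.1, 0.8, 0.1]
maths = [0.1, 0.1, 0.8]
validation = [0.45, 0.45, 0.1]

weights = best_mixture([web, code, maths], validation)
```

Under the Distribution Alignment Assumption, the mixture whose domain vector is closest to the validation set's vector is the one expected to give the lowest validation loss, so no model has to be trained to rank candidate mixtures.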


Article 14

Title@2025-06-12 (4): Execution Guided Line-by-Line Code Generation

Title: Execution Guided Line-by-Line Code Generation Ausführen Geführte Zeilen-für-Linien-Code-Generierung 逐行逐行制导 2506.10948v1

Authors (3): Boaz Lavon, Shahar Katz, Lior Wolf

We present a novel approach to neural code generation that incorporates real-time execution signals into the language model generation process. While large language models (LLMs) have demonstrated impressive code generation capabilities, they typically do not utilize execution feedback during inference, a critical signal that human programmers regularly leverage. Our method, Execution-Guided Classifier-Free Guidance (EG-CFG), dynamically incorporates execution signals as the model generates code, providing line-by-line feedback that guides the generation process toward executable solutions. EG-CFG employs a multi-stage process: first, we conduct beam search to sample candidate program completions for each line; second, we extract execution signals by executing these candidates against test cases; and finally, we incorporate these signals into the prompt during generation. By maintaining consistent signals across tokens within the same line and refreshing signals at line boundaries, our approach provides coherent guidance while preserving syntactic structure. Moreover, the method naturally supports native parallelism at the task level in which multiple agents operate in parallel, exploring diverse reasoning paths and collectively generating a broad set of candidate solutions. Our experiments across diverse coding tasks demonstrate that EG-CFG significantly improves code generation performance compared to standard approaches, achieving state-of-the-art results across various levels of complexity, from foundational problems to challenging competitive programming tasks. Our code is available at: https://github.com/boazlavon/eg_cfg

我们提出了一种将实时执行信号纳入语言模式生成过程的神经代码生成新颖方法,将实时执行信号纳入语言模式生成过程。大型语言模型(LLMs)展示了令人印象深刻的代码生成能力,而大型语言模型(LLMs)则通常没有在推断过程中使用执行反馈,而这是人类程序员经常利用的关键信号。我们的方法,即执行-指导分类-免责指南(EG-CFG),在模型生成代码时动态整合执行信号,提供逐行反馈,引导生成可执行解决方案的生成过程。EG-CFG采用多阶段程序:首先,我们进行抽样搜索,为每行选取候选人方案完成样本;第二,我们通过执行这些候选人测试案例来提取执行信号;最后,我们将这些信号纳入到生成过程中的快速信号。我们的方法,即执行-指导了同一行内的一致性信号,在保持同步结构结构的同时,提供了一致的指导。此外,该方法自然地支持任务层面的本土平行平行运行,探索不同的推理路径,并集体生成广泛的候选人解决方案。我们在不同的网络-CFLEG-C-C-C-C-CF-CFS-CFD-CFD 不同层次上实现具有挑战性的业绩-CFD-CFD-CFD-CFD-CFD的代码层次上的各种实验,从不同的标准化方法,从不同的标准-CFDM-CFDM 和不同层次上,从不同的标准-CFD-CFD-CFDM 的层次,从不同的实验,从不同的标准化方法到到到不同层次,从不同的标准-CFD。
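
The guidance step named in the method's title can be sketched with the generic classifier-free guidance formula on next-token log-probabilities. This is an illustration rather than the authors' code: it assumes the "unconditional" logits come from the plain prompt, the "conditional" logits come from a prompt that also contains execution feedback, and `gamma` is a hypothetical guidance strength.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

def cfg_distribution(uncond_logits, cond_logits, gamma):
    """Classifier-free guidance: extrapolate from unconditional toward conditional."""
    lu = log_softmax(uncond_logits)
    lc = log_softmax(cond_logits)
    mixed = [u + gamma * (c - u) for u, c in zip(lu, lc)]
    return [math.exp(v) for v in log_softmax(mixed)]

# Toy three-token vocabulary; execution feedback favours token 0.
plain = [1.0, 1.0, 0.0]
with_feedback = [2.0, 0.5, 0.0]

p_plain = cfg_distribution(plain, with_feedback, gamma=0.0)
p_cond = cfg_distribution(plain, with_feedback, gamma=1.0)
p_guided = cfg_distribution(plain, with_feedback, gamma=3.0)
```

With `gamma = 0` the plain distribution is recovered and with `gamma = 1` the feedback-conditioned one; larger values push probability further toward tokens the execution signal supports.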


Article 15

Title@2025-06-12 (4): Spurious Rewards: Rethinking Training Signals in RLVR

Title: Spurious Rewards: Rethinking Training Signals in RLVR Spurious Rewards: Rethinking Trainingssignale in RLVR 净利奖:在RLVR重新思考培训信号 2506.10947v1

Authors (14): Rulin Shao, Shuyue Stella Li, Rui Xin, Scott Geng, Yiping Wang, Sewoong Oh, Simon Shaolei Du, Nathan Lambert, Sewon Min, Ranjay Krishna, Yulia Tsvetkov, Hannaneh Hajishirzi, Pang Wei Koh, Luke Zettlemoyer

We show that reinforcement learning with verifiable rewards (RLVR) can elicit strong mathematical reasoning in certain models even with spurious rewards that have little, no, or even negative correlation with the correct answer. For example, RLVR improves MATH-500 performance for Qwen2.5-Math-7B in absolute points by 21.4% (random reward), 13.8% (format reward), 24.1% (incorrect label), 26.0% (1-shot RL), and 27.1% (majority voting) – nearly matching the 29.1% gained with ground truth rewards. However, the spurious rewards that work for Qwen often fail to yield gains with other model families like Llama3 or OLMo2. In particular, we find code reasoning – thinking in code without actual code execution – to be a distinctive Qwen2.5-Math behavior that becomes significantly more frequent after RLVR, from 65% to over 90%, even with spurious rewards. Overall, we hypothesize that, given the lack of useful reward signal, RLVR must somehow be surfacing useful reasoning representations learned during pretraining, although the exact mechanism remains a topic for future work. We suggest that future RLVR research should possibly be validated on diverse models rather than a single de facto choice, as we show that it is easy to get significant performance gains on Qwen models even with completely spurious reward signals.

我们显示,通过可核查的奖励(RLVR)强化学习可以在某些模型中引起强烈的数学推理,即使虚假的奖励很少、没有,甚至与正确的答案有负相关关系。例如,RLVR将Quwen2.5-Matth-7B的MATH-500性能绝对提高21.4%(随机奖赏)、13.8%(标准奖赏)、24.1%(标准奖赏)、24.1%(不正确的标签)、26.0%(1发RLL)和27.1%(多数投票) – – 几乎与获得的29.1%的实情奖状相匹配。然而,为Quenyerywen工作往往无法与Llama3 或OLMo2等其他模范式家庭取得收益的虚假奖状。特别是,我们发现代码推理学原理 – – 在代码中思考而不实际的代码执行 – – 是一种独特的Qwen2.5-Math行为,在RVR后,从65%到90%以上的奖状,甚至有惊人的奖状。总体而言,我们假设,由于缺少有用的奖状奖状的奖状信号,RVLV 一定的推理陈述在前学到了有用的推理陈述,虽然确切的机制是我们未来的研究成绩可能是一个重要的题目,但只是的标志,在将来的标志性能的标志。


Article 16

Title@2025-06-12 (4): GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models

Title: GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models GUARD: Geführtes Lernen und Zurückhalten über Datenzuweisung für große Sprachmodelle GUARD:通过大语言模式数据归称制,指导学习和保留 2506.10946v1

Authors (7): Evelyn Ma, Duo Zhou, Peizhi Niu, Huiting Zhou, Huan Zhang, Olgica Milenkovic, S. Rasoul Etesami

Unlearning in large language models (LLMs) is becoming increasingly important due to regulatory compliance, copyright protection, and privacy concerns. However, a key challenge in LLM unlearning is unintended forgetting, where the removal of specific data inadvertently impairs the utility of the model and its retention of valuable, desired information. While prior work has primarily focused on architectural innovations, the influence of data-level factors on unlearning performance remains underexplored. As a result, existing methods often suffer from degraded retention when forgetting high-impact data. To address this, we propose GUARD, a novel framework for Guided Unlearning And Retention via Data attribution. At its core, GUARD introduces a lightweight proxy data attribution metric tailored for LLM unlearning, which quantifies the “alignment” between the forget and retain sets while remaining computationally efficient. Building on this, we design a novel unlearning objective that assigns adaptive, nonuniform unlearning weights to samples, inversely proportional to their proxy attribution scores. Through such a reallocation of unlearning power, GUARD mitigates unintended losses in retention. We provide rigorous theoretical guarantees that GUARD significantly enhances retention while maintaining forgetting metrics comparable to prior methods. Extensive experiments on the TOFU benchmark across multiple LLM architectures demonstrate that GUARD substantially improves utility preservation while ensuring effective unlearning. Notably, GUARD reduces utility sacrifice on the Retain Set by up to 194.92% in terms of Truth Ratio when forgetting 10% of the training data.

在大型语言模型(LLMS)中,不学于大型语言模型(LLM)正变得日益重要。然而,LLM(LLM)中的一个关键挑战是无意忘记,因为删除具体数据会无意地损害模型的效用和保存宝贵、期望的信息。虽然以前的工作主要侧重于建筑创新,但数据层面因素对不学业绩的影响仍然未得到充分探讨。因此,在忘记高影响数据时,现有方法往往会受到退化保留。为了解决这个问题,我们提议GUARD(GUARD)为通过数据归属引导不学和保留建立一个新的框架。在核心方面,GUARD为LM(LM)不学习推出一个轻型代用代用数据归属指标,该指标使遗忘和保留各套之间的“匹配”在计算上保持效率的同时被量化。在此基础上,我们设计了一个新的不学习目标,即给样本分配适应性、非单一的不学习权重,与其代用分数的分数反比。我们提议,GURD(GARD)提出一个严格的理论保证为LM(GR)在大幅提升历史标定值的同时,同时,使R(DR)R(DR)的R)BILILILILIL)标准在以往的升级中大幅改进。同时,同时,同时,使R(BILVILBIBIBR(R)的不比重)的保存。
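
The weighting rule stated in the abstract (unlearning weights inversely proportional to proxy attribution scores) can be sketched as below. The normalization to a sum of one, the assumption that scores are strictly positive, and the function names are illustrative choices, not details taken from the paper.

```python
def unlearning_weights(attribution_scores, eps=1e-8):
    """Weights inversely proportional to (assumed positive) attribution scores."""
    inverse = [1.0 / (s + eps) for s in attribution_scores]
    total = sum(inverse)
    return [v / total for v in inverse]

def weighted_forget_loss(per_sample_losses, weights):
    """Nonuniform forget objective: a weighted sum of per-sample losses."""
    return sum(w * loss for w, loss in zip(weights, per_sample_losses))

# Hypothetical proxy scores: a higher score means stronger alignment with the retain set.
scores = [1.0, 2.0, 4.0]
weights = unlearning_weights(scores)
loss = weighted_forget_loss([0.7, 0.7, 0.7], weights)
```

Forget samples that align strongly with the retain set receive less unlearning pressure, which is the mechanism the abstract credits for reduced collateral damage to retention.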


Article 17

Title@2025-06-12 (4): Coupled reaction and diffusion governing interface evolution in solid-state batteries

Title: Coupled reaction and diffusion governing interface evolution in solid-state batteries Gekoppelte Reaktion und Diffusion, die die Schnittstellenentwicklung in Festkörperbatterien regeln 关于固体状态电池界面演变的组合反应与扩散 2506.10944v1

Authors (7): Jingxuan Ding, Laura Zichi, Matteo Carli, Menghang Wang, Albert Musaelian, Yu Xie, Boris Kozinsky

Understanding and controlling the atomistic-level reactions governing the formation of the solid-electrolyte interphase (SEI) is crucial for the viability of next-generation solid state batteries. However, challenges persist due to difficulties in experimentally characterizing buried interfaces and limits in simulation speed and accuracy. We conduct large-scale explicit reactive simulations with quantum accuracy for a symmetric battery cell, {\symcell}, enabled by active learning and deep equivariant neural network interatomic potentials. To automatically characterize the coupled reactions and interdiffusion at the interface, we formulate and use unsupervised classification techniques based on clustering in the space of local atomic environments. Our analysis reveals the formation of a previously unreported crystalline disordered phase, Li$_2$S$_{0.72}$P$_{0.14}$Cl$_{0.14}$, in the SEI, that evaded previous predictions based purely on thermodynamics, underscoring the importance of explicit modeling of full reaction and transport kinetics. Our simulations agree with and explain experimental observations of the SEI formations and elucidate the Li creep mechanisms, critical to dendrite initiation, characterized by significant Li motion along the interface. Our approach is to create a digital twin from first principles, without adjustable parameters fitted to experiment. As such, it offers capabilities to gain insights into atomistic dynamics governing complex heterogeneous processes in solid-state synthesis and electrochemistry.

了解和控制关于形成固态电路交替器(SEI)的原子级反应对于下一代固态电池的可行性至关重要。然而,由于在实验性地定性掩埋界面方面的困难以及模拟速度和精确度方面的限制,挑战依然存在。我们在SEI进行大规模明确的反应模拟,对对称电池电池电池(thsymcell})进行量度精确度的模拟,通过积极学习和深度等异质神经网络间生潜能来进行这种模拟。要自动确定界面上的各种反应和相互融合的特征,我们根据当地原子环境空间的组合,制定和使用不受监督的分类技术。我们的分析显示,一个先前未报告的晶状干扰的阶段的形成,即:Li$2$=0.72}P$0.14}0.14美元。 在SEI中,通过纯粹基于热力学过程的预测而回避以前的预测,强调对全面反应和运输动态进行明确模拟的重要性。我们的模拟同意并解释对SEI的实验的实验性观测结果,从重要的磁性结构到我们开始的精确性机制。


Article 18

Title@2025-06-12 (4): TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems

Title: TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems TimberStrike: Datensatz-Rekonstruktion Angriff Enthüllen der Privatsphäre Leckage in Federated Tree-Based Systems 木材三角:联邦树基系统中数据集重建攻击清除隐私渗漏 2506.07605v2

Authors (5): Marco Di Gennaro, Giovanni De Lucia, Stefano Longari, Stefano Zanero, Michele Carminati

Federated Learning has emerged as a privacy-oriented alternative to centralized Machine Learning, enabling collaborative model training without direct data sharing. While extensively studied for neural networks, the security and privacy implications of tree-based models remain underexplored. This work introduces TimberStrike, an optimization-based dataset reconstruction attack targeting horizontally federated tree-based models. Our attack, carried out by a single client, exploits the discrete nature of decision trees by using split values and decision paths to infer sensitive training data from other clients. We evaluate TimberStrike on State-of-the-Art federated gradient boosting implementations across multiple frameworks, including Flower, NVFlare, and FedTree, demonstrating their vulnerability to privacy breaches. On a publicly available stroke prediction dataset, TimberStrike consistently reconstructs between 73.05% and 95.63% of the target dataset across all implementations. We further analyze Differential Privacy, showing that while it partially mitigates the attack, it also significantly degrades model performance. Our findings highlight the need for privacy-preserving mechanisms specifically designed for tree-based Federated Learning systems, and we provide preliminary insights into their design.

联邦学习联合会已成为中央机构学习的以隐私为导向的替代方案,有利于合作模式培训,而没有直接分享数据。尽管对神经网络进行了广泛研究,但基于树的模型对安全和隐私的影响仍未得到充分探讨。这项工作引入了TaultStrike,这是以横向结合的树为基础的模型为对象的基于优化的数据元重建攻击。我们由一个客户进行的攻击,利用决策树的离散性质,利用不同的价值和决定路径从其他客户处推断敏感培训数据。我们评估了木材在包括Flower、NVFFlare和FedTre在内的多个框架的州级联盟梯度促进实施方面发生的碰撞,展示了它们易受隐私破坏的脆弱性。在公开提供的中风预测数据集中,木材Strike持续地重建了所有执行过程中目标数据集的73.05 %至95.63%。我们进一步分析差异隐私,表明它虽然部分减轻了攻击,但也显著地降低了模型性。我们的调查结果突出表明需要专门为基于树木的联邦学习系统设计的隐私保护机制,我们提供了初步的见解。


Article 19

Title@2025-06-12 (4): Self-Adapting Language Models

Title: Self-Adapting Language Models Selbstanpassende Sprachmodelle 自我改进语言模式 2506.10943v1

Authors (6): Adam Zweiger, Jyothish Pari, Han Guo, Ekin Akyürek, Yoon Kim, Pulkit Agrawal

Large language models (LLMs) are powerful but static; they lack mechanisms to adapt their weights in response to new tasks, knowledge, or examples. We introduce Self-Adapting LLMs (SEAL), a framework that enables LLMs to self-adapt by generating their own finetuning data and update directives. Given a new input, the model produces a self-edit, a generation that may restructure the information in different ways, specify optimization hyperparameters, or invoke tools for data augmentation and gradient-based updates. Through supervised finetuning (SFT), these self-edits result in persistent weight updates, enabling lasting adaptation. To train the model to produce effective self-edits, we use a reinforcement learning loop with the downstream performance of the updated model as the reward signal. Unlike prior approaches that rely on separate adaptation modules or auxiliary networks, SEAL directly uses the model’s own generation to control its adaptation process. Experiments on knowledge incorporation and few-shot generalization show that SEAL is a promising step toward language models capable of self-directed adaptation. Our website and code are available at https://jyopari.github.io/posts/seal.

大型语言模型(LLMS)是强大但静态的;它们缺乏根据新的任务、知识或实例调整其权重的机制;我们引入了自我开发的LLMS(SEAL),这个框架使LLMS能够通过生成自己的微调数据和更新指令进行自我改造。根据新的投入,该模型产生自编辑的一代,可以以不同方式重组信息,指定最优化的超参数,或为数据扩增和梯度更新援引工具。通过监管的微调(SFT),这些自编辑导致持续更新重量,使得能够进行持久的适应。为制作有效自编辑而培训模型,我们使用一个强化学习循环,将更新后的模型的下游性表现作为奖励信号。与以前依赖单独的适应模块或辅助网络的方法不同,SEAL直接使用模型的生成来控制其适应进程。关于知识整合的实验和几张图的概括化显示SEAL是朝着能够自导适应的语言模型迈出的有希望的一步。我们的网站和代码可以在 https://jyopari.github.gitual/s。


Article 20

Title@2025-06-12 (4): VINCIE: Unlocking In-context Image Editing from Video

Title: VINCIE: Unlocking In-context Image Editing from Video VINCIE: Im Kontext Bildbearbeitung von Video entsperren VINCIE: 从视频中解锁 Incontext 图像编辑 2506.10941v1

Authors (10): Leigang Qu, Feng Cheng, Ziyan Yang, Qi Zhao, Shanchuan Lin, Yichun Shi, Yicong Li, Wenjie Wang, Tat-Seng Chua, Lu Jiang

In-context image editing aims to modify images based on a contextual sequence comprising text and previously generated images. Existing methods typically depend on task-specific pipelines and expert models (e.g., segmentation and inpainting) to curate training data. In this work, we explore whether an in-context image editing model can be learned directly from videos. We introduce a scalable approach to annotate videos as interleaved multimodal sequences. To effectively learn from this data, we design a block-causal diffusion transformer trained on three proxy tasks: next-image prediction, current segmentation prediction, and next-segmentation prediction. Additionally, we propose a novel multi-turn image editing benchmark to advance research in this area. Extensive experiments demonstrate that our model exhibits strong in-context image editing capabilities and achieves state-of-the-art results on two multi-turn image editing benchmarks. Despite being trained exclusively on videos, our model also shows promising abilities in multi-concept composition, story generation, and chain-of-editing applications.

文本内图像编辑旨在根据由文本和先前生成的图像组成的背景序列修改图像。 现有方法通常取决于任务特定管道和专家模型( 如分解和油漆)来整理培训数据。 在这项工作中,我们探讨能否直接从视频中学习文本内图像编辑模型; 我们采用可扩缩的方法将视频注释作为相互脱节的多式联运序列; 为了有效地从这些数据中学习, 我们设计了一个块- 螺旋扩散变压器, 就三种代理任务进行了培训: 下一个图像预测, 当前分解预测, 和下一个分解预测。 此外, 我们提出一个新的多方向图像编辑基准, 以推进这一领域的研究。 广泛的实验表明, 我们的模式展示了超强的文本图像编辑能力, 在两个多方向图像编辑基准上取得了最新的结果。 尽管我们只接受过视频培训, 我们的模式还展示了多概念构成、 故事生成和链编辑应用程序方面的有前途的能力。


Article 21

Title@2025-06-12 (4): Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction

Title: Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction Entwicklung eines leistungsfähigen Rahmens für die Sprachemotionserkennung unter naturalistischen Bedingungen Herausforderung für die emotionale Attributvorhersage 制定一个高绩效框架,以便在自然条件下对情感属性预测的自然条件挑战中承认言论情感 2506.10930v1

Authors (4): Thanathai Lertpetchpun, Tiantian Feng, Dani Byrd, Shrikanth Narayanan

Speech emotion recognition (SER) in naturalistic conditions presents a significant challenge for the speech processing community. Challenges include disagreement in labeling among annotators and imbalanced data distributions. This paper presents a reproducible framework that achieves superior (top 1) performance in the Emotion Recognition in Naturalistic Conditions Challenge (IS25-SER Challenge) - Task 2, evaluated on the MSP-Podcast dataset. Our system is designed to tackle the aforementioned challenges through multimodal learning, multi-task learning, and imbalanced data handling. Specifically, our best system is trained by adding text embeddings, predicting gender, and including "Other" (O) and "No Agreement" (X) samples in the training set. Our system’s results secured both first and second places in the IS25-SER Challenge, and the top performance was achieved by a simple two-system ensemble.

语言处理社区面临的重大挑战是自然条件下的言语情绪识别(SER)对语言处理社区来说是一个重大挑战,挑战包括记录者之间在标签上的分歧和数据分布不平衡。本文件提供了一个可复制的框架,在自然条件下的情感识别挑战(IS25-SER挑战)中取得了优异(顶部1)性能。任务2,对MSP-Podcast数据集进行了评估。我们的系统旨在通过多式学习、多任务学习和不平衡的数据处理来应对上述挑战。具体地说,我们的最佳系统通过添加文本嵌入、预测性别以及将“其他”和“不协议”样本纳入培训集来培训。我们的系统结果确保了IS25-SER挑战的第一和第二位,而顶级的功能是通过简单的两系统共体实现的。


Article 22

Title@2025-06-12 (4): Improving LLM Safety Alignment with Dual-Objective Optimization

Title: Improving LLM Safety Alignment with Dual-Objective Optimization Verbesserung der LLM-Sicherheitsausrichtung mit Dual-Ziel-Optimierung 提高LLM安全一致性,实现双目标优化 2503.03710v2

Authors (7): Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song

Existing training-time safety alignment techniques for large language models (LLMs) remain vulnerable to jailbreak attacks. Direct preference optimization (DPO), a widely deployed alignment method, exhibits limitations in both experimental and theoretical contexts as its loss function proves suboptimal for refusal learning. Through gradient-based analysis, we identify these shortcomings and propose an improved safety alignment that disentangles DPO objectives into two components: (1) robust refusal training, which encourages refusal even when partial unsafe generations are produced, and (2) targeted unlearning of harmful knowledge. This approach significantly increases LLM robustness against a wide range of jailbreak attacks, including prefilling, suffix, and multi-turn attacks across both in-distribution and out-of-distribution scenarios. Furthermore, we introduce a method to emphasize critical refusal tokens by incorporating a reward-based token-level weighting mechanism for refusal learning, which further improves the robustness against adversarial exploits. Our research also suggests that robustness to jailbreak attacks is correlated with token distribution shifts in the training process and internal representations of refusal and harmful tokens, offering valuable directions for future research in LLM safety alignment. The code is available at https://github.com/wicai24/DOOR-Alignment

对大型语言模型(LLMS)的现有培训时间安全调整技术仍然容易受到越狱攻击。直接偏好优化(DPO)是一种广泛采用的调整方法,在实验和理论背景下都表现出了局限性,因为其损失功能证明对拒绝学习而言并不理想。我们通过基于梯度的分析,查明这些缺陷,并提议改进安全调整,将DPO的目标分为两个部分:(1) 强有力的拒绝培训,鼓励即使在产生部分不安全的一代人的情况下也予以拒绝,(2) 有针对性的不学习有害知识。这一方法大大提高了LM对一系列广泛的越狱攻击的稳健性,包括预先填补、后缀和在分配和分配之外两种情况下的多发攻击。此外,我们引入了一种强调关键拒绝标志的方法,即采用基于奖励的象征性加权机制来学习拒绝学习,从而进一步提高对抗对抗敌对剥削的稳健性。我们的研究还表明,对监狱破伤攻击的稳健性与培训过程中象征性的分发变化以及拒绝和有害标志的内部表现有关,为LM安全调整的未来研究提供了宝贵的方向。该代码可在 https://Giusub/DOB/DO.
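
For reference, the standard DPO objective that the abstract critiques can be written as below. The notation follows the common presentation of DPO and is not taken from this abstract: $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ a frozen reference model, $\beta$ a temperature, $\sigma$ the logistic function, and $(x, y_w, y_l)$ a prompt with a preferred and a dispreferred response.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

In a safety setting, $y_w$ would typically be a refusal and $y_l$ a harmful completion. Because the loss depends only on the difference between the two log-ratios, a single term entangles both behaviors, which is presumably what the proposed split into refusal training and targeted unlearning addresses.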


Article 23

Title@2025-06-12 (4): On feature selection in double-imbalanced data settings: a Random Forest approach

Title: On feature selection in double-imbalanced data settings: a Random Forest approach Zur Feature-Auswahl in doppelt ausgeglichenen Dateneinstellungen: ein Random Forest-Ansatz 在双重平衡数据设置中选择功能的功能:随机森林办法 2506.10929v1

Authors (1): Fabio Demaria

Feature selection is a critical step in high-dimensional classification tasks, particularly under challenging conditions of double imbalance, namely settings characterized by both class imbalance in the response variable and dimensional asymmetry in the data $(n \gg p)$. In such scenarios, traditional feature selection methods applied to Random Forests (RF) often yield unstable or misleading importance rankings. This paper proposes a novel thresholding scheme for feature selection based on minimal depth, which exploits the tree topology to assess variable relevance. Extensive experiments on simulated and real-world datasets demonstrate that the proposed approach produces more parsimonious and accurate subsets of variables compared to conventional minimal depth-based selection. The method provides a practical and interpretable solution for variable selection in RF under double imbalance conditions.

地物选择是高层次分类任务的关键步骤,特别是在具有挑战性的双重不平衡条件下,即在数据$(n\gg p)中以反应变数和尺寸不对称两种等级不平衡为特点的设置。在这种情况下,适用于随机森林的传统地物选择方法往往产生不稳定或误导重要性的排名。本文提出基于最低深度的地物选择新颖的门槛计划,利用树本学评估可变关联性。模拟和实际世界数据集的广泛实验表明,与传统的最低深度选择相比,拟议的方法产生了更相似和准确的变量子集。这种方法为在双重失衡条件下选择俄罗斯联邦的变量提供了实用和可解释的解决方案。
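
Minimal depth, the quantity the abstract builds on, is the depth of the shallowest node in a tree that splits on a given variable. The sketch below computes it on trees represented as nested dictionaries and applies a plain cutoff on the forest average. The tree encoding, the `missing_depth` penalty for trees that never use a variable, and the fixed threshold are assumptions for illustration; the paper's own thresholding scheme is not described in the abstract.

```python
def minimal_depth(node, feature, depth=0):
    """Depth of the shallowest split on `feature`, or None if it never splits."""
    if not node or "feature" not in node:  # leaf node
        return None
    if node["feature"] == feature:
        return depth
    below = [minimal_depth(node.get(side), feature, depth + 1)
             for side in ("left", "right")]
    below = [d for d in below if d is not None]
    return min(below) if below else None

def mean_minimal_depth(forest, feature, missing_depth):
    """Average minimal depth over trees, penalising trees that skip the feature."""
    depths = [minimal_depth(tree, feature) for tree in forest]
    return sum(missing_depth if d is None else d for d in depths) / len(depths)

def select_features(forest, features, threshold, missing_depth):
    """Keep features whose mean minimal depth does not exceed the threshold."""
    return [f for f in features
            if mean_minimal_depth(forest, f, missing_depth) <= threshold]

leaf = {}
tree_a = {"feature": "age",
          "left": {"feature": "bmi", "left": leaf, "right": leaf},
          "right": leaf}
tree_b = {"feature": "bmi",
          "left": leaf,
          "right": {"feature": "age", "left": leaf, "right": leaf}}
forest = [tree_a, tree_b]

selected = select_features(forest, ["age", "bmi", "glucose"],
                           threshold=1.0, missing_depth=3)
```

Variables that split near the root partition large parts of the sample and so tend to be more relevant, which is why a small minimal depth is read as high importance.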


Article 24

Title@2025-06-12 (4): Investigating the Relationship Between Physical Activity and Tailored Behavior Change Messaging: Connecting Contextual Bandit with Large Language Models

Title: Investigating the Relationship Between Physical Activity and Tailored Behavior Change Messaging: Connecting Contextual Bandit with Large Language Models Untersuchung der Beziehung zwischen körperlicher Aktivität und maßgeschneidertem Verhalten Veränderung Messaging: Verknüpfung von Kontextbandit mit großen Sprachmodellen 调查身体活动与因地制宜的行为变化通信之间的关系:将环境强盗与大语言模式联系起来 2506.07275v2

Authors (7): Haochen Song, Dominik Hofer, Rania Islambouli, Laura Hawkins, Ananya Bhattacharjee, Meredith Franklin, Joseph Jay Williams

Machine learning approaches, such as contextual multi-armed bandit (cMAB) algorithms, offer a promising strategy to reduce sedentary behavior by delivering personalized interventions to encourage physical activity. However, cMAB algorithms typically require large participant samples to learn effectively and may overlook key psychological factors that are not explicitly encoded in the model. In this study, we propose a hybrid approach that combines cMAB for selecting intervention types with large language models (LLMs) to personalize message content. We evaluate four intervention types: behavioral self-monitoring, gain-framed, loss-framed, and social comparison, each delivered as a motivational message aimed at increasing motivation for physical activity and daily step count. Message content is further personalized using dynamic contextual factors including daily fluctuations in self-efficacy, social influence, and regulatory focus. Over a seven-day trial, participants receive daily messages assigned by one of four models: cMAB alone, LLM alone, combined cMAB with LLM personalization (cMABxLLM), or equal randomization (RCT). Outcomes include daily step count and message acceptance, assessed via ecological momentary assessments (EMAs). We apply a causal inference framework to evaluate the effects of each model. Our findings offer new insights into the complementary roles of LLM-based personalization and cMAB adaptation in promoting physical activity through personalized behavioral messaging.

在这项研究中,我们提出一种混合方法,将选择干预类型与大语言模型(LLMs)相结合,使信息内容个性化。我们评价了四种干预类型:行为自我监测、获取框架、损失框架和社会比较,每个类型都作为一种激励信息,目的是增加体力活动和日常步骤计数的动力。信息内容利用动态背景因素进一步个性化,包括自觉、社会影响和监管重点方面的日常波动。在七天的试验中,与会者收到四种模式之一的每日信息:单是CMAB、LLM、与LM个人化(cMABxLM)、或平等随机化(RCT)。结果包括每日步骤计数和信息接收,通过个人瞬间判断评估评估个人行为。我们运用了每个周期性结论,我们运用了个人行为分析模型,我们运用了每个周期性结论性评估。


Article 25

Title@2025-06-12 (4): Robustly Improving LLM Fairness in Realistic Settings via Interpretability

Title: Robustly Improving LLM Fairness in Realistic Settings via Interpretability Robuste Verbesserung der LLM Fairness in realistischen Einstellungen durch Dolmetschbarkeit 通过可解释性在现实环境中强有力地提高LLM公平性 2506.10922v1

Authors (2): Adam Karvonen, Samuel Marks

Large language models (LLMs) are increasingly deployed in high-stakes hiring applications, making decisions that directly impact people’s careers and livelihoods. While prior studies suggest simple anti-bias prompts can eliminate demographic biases in controlled evaluations, we find these mitigations fail when realistic contextual details are introduced. We address these failures through internal bias mitigation: by identifying and neutralizing sensitive attribute directions within model activations, we achieve robust bias reduction across all tested scenarios. Across leading commercial (GPT-4o, Claude 4 Sonnet, Gemini 2.5 Flash) and open-source models (Gemma-2 27B, Gemma-3, Mistral-24B), we find that adding realistic context such as company names, culture descriptions from public careers pages, and selective hiring constraints (e.g., "only accept candidates in the top 10%") induces significant racial and gender biases (up to 12% differences in interview rates). When these biases emerge, they consistently favor Black over White candidates and female over male candidates across all tested models and scenarios. Moreover, models can infer demographics and become biased from subtle cues like college affiliations, with these biases remaining invisible even when inspecting the model’s chain-of-thought reasoning. To address these limitations, our internal bias mitigation identifies race and gender-correlated directions and applies affine concept editing at inference time. Despite using directions from a simple synthetic dataset, the intervention generalizes robustly, consistently reducing bias to very low levels (typically under 1%, always below 2.5%) while largely maintaining model performance. Our findings suggest that practitioners deploying LLMs for hiring should adopt more realistic evaluation methodologies and consider internal mitigation strategies for equitable outcomes.

大型语言模型(LLM)越来越多地被部署到高风险的招聘应用中,做出直接影响人们职业和生计的决定。虽然先前的研究表明,简单的反偏见提示可以在受控评估中消除人口统计偏差,但我们发现,一旦引入现实的背景细节,这些缓解措施就会失效。我们通过内部偏差缓解来解决这些失败:通过识别并中和模型激活中的敏感属性方向,我们在所有测试场景中实现了稳健的偏差降低。在领先的商业模型(GPT-4o、Claude 4 Sonnet、Gemini 2.5 Flash)和开源模型(Gemma-2 27B、Gemma-3、Mistral-24B)中,我们发现,加入公司名称、来自公开招聘页面的文化描述以及选择性招聘限制(例如“只接受排名前10%的候选人”)等现实背景,会引发显著的种族和性别偏差(面试率差异最高达12%)。当这些偏差出现时,在所有测试的模型和场景中,它们始终偏向黑人候选人而非白人候选人、偏向女性候选人而非男性候选人。此外,模型可以从大学隶属关系等细微线索中推断人口统计特征并产生偏差,而即使检查模型的思维链推理,这些偏差也不可见。为了解决这些局限,我们的内部偏差缓解方法识别与种族和性别相关的方向,并在推理时应用仿射概念编辑。尽管所用方向来自一个简单的合成数据集,该干预仍能稳健地泛化,始终将偏差降至很低的水平(通常低于1%,始终低于2.5%),同时基本保持模型性能。我们的研究结果表明,将LLM用于招聘的从业者应采用更贴近现实的评估方法,并考虑采用内部缓解策略以实现公平的结果。
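
The abstract above mentions finding attribute-correlated directions in activations and applying affine concept editing at inference time. The following is a minimal sketch of one simple reading of that idea, not the authors' implementation: the direction is taken as the normalized difference of two group means, and an activation is shifted along it until its projection equals that of the midpoint between the means. The function names, the choice of midpoint as the target, and the toy 2-D "activations" are all assumptions made for illustration.

```python
import math

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def concept_direction(group_a, group_b):
    """Unit difference-in-means direction and the midpoint of the two group means."""
    mu_a, mu_b = mean(group_a), mean(group_b)
    diff = [a - b for a, b in zip(mu_a, mu_b)]
    norm = math.sqrt(dot(diff, diff))
    direction = [d / norm for d in diff]
    midpoint = [(a + b) / 2 for a, b in zip(mu_a, mu_b)]
    return direction, midpoint

def affine_edit(x, direction, midpoint):
    """Shift x along `direction` so its projection matches the midpoint's projection.

    Components orthogonal to `direction` are left unchanged.
    """
    shift = dot(x, direction) - dot(midpoint, direction)
    return [xi - shift * di for xi, di in zip(x, direction)]

# Toy, hand-made 2-D "activations" (illustrative only, not real model data).
group_a = [[2.0, 1.0], [4.0, 3.0]]
group_b = [[-1.0, 1.0], [-3.0, 3.0]]
direction, midpoint = concept_direction(group_a, group_b)
edited_a = affine_edit(group_a[1], direction, midpoint)
edited_b = affine_edit(group_b[0], direction, midpoint)
```

After the edit, both toy vectors have the same coordinate along the concept direction while their remaining coordinate is untouched, which is the property the sketch is meant to show.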


Article 26

Title@2025-06-12 (4): Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization

Title: Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization Dekomponieren von MLP-Aktivierungen in Interpretierbare Funktionen über semi-Nonnegative Matrix-Fabrikisierung 通过半氮基矩阵化系数化,将劳动和生产部的分解活动转化为可解释性特征 2506.10920v1

Authors (3): Or Shafran, Atticus Geiger, Mor Geva

A central goal for mechanistic interpretability has been to identify the right units of analysis in large language models (LLMs) that causally explain their outputs. While early work focused on individual neurons, evidence that neurons often encode multiple concepts has motivated a shift toward analyzing directions in activation space. A key question is how to find directions that capture interpretable features in an unsupervised manner. Current methods rely on dictionary learning with sparse autoencoders (SAEs), commonly trained over residual stream activations to learn directions from scratch. However, SAEs often struggle in causal evaluations and lack intrinsic interpretability, as their learning is not explicitly tied to the computations of the model. Here, we tackle these limitations by directly decomposing MLP activations with semi-nonnegative matrix factorization (SNMF), such that the learned features are (a) sparse linear combinations of co-activated neurons, and (b) mapped to their activating inputs, making them directly interpretable. Experiments on Llama 3.1, Gemma 2 and GPT-2 show that SNMF-derived features outperform SAEs and a strong supervised baseline (difference-in-means) on causal steering, while aligning with human-interpretable concepts. Further analysis reveals that specific neuron combinations are reused across semantically related features, exposing a hierarchical structure in the MLP’s activation space. Together, these results position SNMF as a simple and effective tool for identifying interpretable features and dissecting concept representations in LLMs.

机制可解释性的一个核心目标,是在大型语言模型(LLM)中找到能够因果地解释其输出的正确分析单元。早期工作侧重于单个神经元,但神经元往往编码多个概念的证据,促使研究转向分析激活空间中的方向。一个关键问题是如何以无监督的方式找到能够捕捉可解释特征的方向。目前的方法依赖于使用稀疏自编码器(SAE)的字典学习,通常在残差流激活上训练,从零开始学习方向。然而,SAE在因果评估中往往表现不佳,并且缺乏内在可解释性,因为其学习过程并未与模型的计算明确挂钩。在这里,我们通过半非负矩阵分解(SNMF)直接分解MLP激活来解决这些局限,使学到的特征(a)是共同激活神经元的稀疏线性组合,并且(b)映射到其激活输入,从而可以直接解释。在Llama 3.1、Gemma 2和GPT-2上的实验表明,SNMF得到的特征在因果引导上优于SAE和一个强有监督基线(均值差),同时与人类可解释的概念相一致。进一步的分析显示,特定的神经元组合在语义相关的特征之间被重复使用,揭示了MLP激活空间中的层级结构。总之,这些结果表明SNMF是识别可解释特征和剖析LLM中概念表示的一个简单而有效的工具。


Article 27

Title@2025-06-12 (4): Sequential-Parallel Duality in Prefix Scannable Models

Title: Sequential-Parallel Duality in Prefix Scannable Models Sequential-Parallel-Dualität in voreingestellten Scannable-Modellen 前缀扫描可扫描模型中的序列- Parallel 品质 2506.10918v1

Authors (6): Morris Yau, Sharut Gupta, Valerie Engelmayer, Kazuki Irie, Stefanie Jegelka, Jacob Andreas

Modern neural sequence models are designed to meet the dual mandate of parallelizable training and fast sequential inference. Recent developments have given rise to various models, such as Gated Linear Attention (GLA) and Mamba, that achieve such “sequential-parallel duality.” This raises a natural question: can we characterize the full class of neural sequence models that support near-constant-time parallel evaluation and linear-time, constant-space sequential inference? We begin by describing a broad class of such models, state space models, as those whose state updates can be computed using the classic parallel prefix scan algorithm with a custom associative aggregation operator. We then define a more general class, Prefix-Scannable Models (PSMs), by relaxing the state aggregation operator to allow arbitrary (potentially non-associative) functions such as softmax attention. This generalization unifies many existing architectures, including element-wise RNNs (e.g., Mamba) and linear transformers (e.g., GLA, Mamba2, mLSTM), while also introducing new models with softmax-like operators that achieve O(1) amortized compute per token and log(N) memory for sequence length N. We empirically evaluate such models on illustrative small-scale language modeling and canonical synthetic tasks, including state tracking and associative recall. Empirically, we find that PSMs retain the expressivity of transformer-based architectures while matching the inference efficiency of state space models, in some cases exhibiting better length generalization than either.

现代神经序列模型的设计旨在同时满足可并行训练和快速顺序推理的双重要求。最近的发展催生了多种实现这种“顺序-并行对偶性”的模型,例如门控线性注意力(GLA)和Mamba。这引出了一个自然的问题:我们能否刻画支持近常数时间并行评估和线性时间、常数空间顺序推理的全部神经序列模型?我们首先描述一大类这样的模型,即状态空间模型,其状态更新可以使用经典的并行前缀扫描算法配合自定义的结合性聚合算子来计算。随后,我们通过放宽状态聚合算子、允许任意(可能非结合性的)函数(例如softmax注意力),定义了一个更一般的类别:前缀可扫描模型(PSM)。这一推广统一了许多现有架构,包括逐元素RNN(例如Mamba)和线性Transformer(例如GLA、Mamba2、mLSTM),同时还引入了带有类softmax算子的新模型,这些模型对长度为N的序列实现每个token O(1)的摊销计算量和log(N)的内存。我们在示例性的小规模语言建模和经典合成任务(包括状态跟踪和关联回忆)上对这类模型进行了实证评估。实验发现,PSM保留了基于Transformer的架构的表达能力,同时达到状态空间模型的推理效率,在某些情况下还表现出优于两者的长度泛化能力。
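
The abstract's starting point, state updates computed by a prefix scan with an associative aggregation operator, can be illustrated on the simplest case. The sketch below assumes a scalar linear recurrence h_t = a_t * h_{t-1} + b_t (an illustrative stand-in, not a model from the paper) and checks that a Hillis-Steele style scan over the pairs (a_t, b_t) reproduces the sequential states. In a real implementation each round of the scan would run in parallel; here the rounds are plain Python loops.

```python
def combine(left, right):
    """Compose two affine maps h -> a*h + b, applying `left` first, then `right`.

    This operator is associative, which is what makes a prefix scan valid.
    """
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def sequential_states(a, b, h0=0.0):
    """Reference: run the recurrence one step at a time (constant state)."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out

def scan_states(a, b, h0=0.0):
    """Inclusive Hillis-Steele scan: about log2(n) rounds, each round parallelizable."""
    elems = list(zip(a, b))
    n = len(elems)
    step = 1
    while step < n:
        nxt = list(elems)
        for i in range(step, n):
            # Prepend the summary of the `step` elements ending at i - step.
            nxt[i] = combine(elems[i - step], elems[i])
        elems = nxt
        step *= 2
    # elems[t] now holds the composed map for steps 0..t; apply it to h0.
    return [A * h0 + B for A, B in elems]

# Illustrative coefficients (arbitrary values chosen for the example).
a = [0.5, 2.0, -1.0, 0.25, 3.0]
b = [1.0, 0.0, 2.0, -1.0, 0.5]
seq = sequential_states(a, b, h0=1.0)
par = scan_states(a, b, h0=1.0)
```

The same operator serves both modes: the loop in `sequential_states` is the inference-time view, and the scan is the training-time view of the same computation.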


Article 28

Title@2025-06-12 (4): M4V: Multi-Modal Mamba for Text-to-Video Generation

Title: M4V: Multi-Modal Mamba for Text-to-Video Generation M4V: Multi-Modal Mamba für die Text-zu-Video-Generierung M4V: 用于文本到视频一代的多式Mamba 2506.10915v1

Authors (8): Jiancheng Huang, Gengwei Zhang, Zequn Jie, Siyu Jiao, Yinlong Qian, Ling Chen, Yunchao Wei, Lin Ma

Text-to-video generation has significantly enriched content creation and holds the potential to evolve into powerful world simulators. However, modeling the vast spatiotemporal space remains computationally demanding, particularly when employing Transformers, which incur quadratic complexity in sequence processing and thus limit practical applications. Recent advancements in linear-time sequence modeling, particularly the Mamba architecture, offer a more efficient alternative. Nevertheless, its plain design limits its direct applicability to multi-modal and spatiotemporal video generation tasks. To address these challenges, we introduce M4V, a Multi-Modal Mamba framework for text-to-video generation. Specifically, we propose a multi-modal diffusion Mamba (MM-DiM) block that enables seamless integration of multi-modal information and spatiotemporal modeling through a multi-modal token re-composition design. As a result, the Mamba blocks in M4V reduce FLOPs by 45% compared to the attention-based alternative when generating videos at 768×1280 resolution. Additionally, to mitigate the visual quality degradation in long-context autoregressive generation processes, we introduce a reward learning strategy that further enhances per-frame visual realism. Extensive experiments on text-to-video benchmarks demonstrate M4V’s ability to produce high-quality videos while significantly lowering computational costs. Code and models will be publicly available at https://huangjch526.github.io/M4V_project.

文本到视频生成极大地丰富了内容创作,并有潜力演变为强大的世界模拟器。然而,对庞大的时空空间进行建模在计算上仍然要求很高,尤其是在使用Transformer时,其序列处理具有二次复杂度,从而限制了实际应用。线性时间序列建模的最新进展,尤其是Mamba架构,提供了更高效的替代方案。不过,其朴素的设计限制了它在多模态和时空视频生成任务中的直接适用性。为应对这些挑战,我们提出了M4V,一个用于文本到视频生成的多模态Mamba框架。具体而言,我们提出了多模态扩散Mamba(MM-DiM)模块,通过多模态token重组设计,实现多模态信息与时空建模的无缝整合。因此,在生成768×1280分辨率的视频时,M4V中的Mamba模块与基于注意力的替代方案相比将FLOPs降低了45%。此外,为缓解长上下文自回归生成过程中的视觉质量退化,我们引入了一种奖励学习策略,进一步提升每帧的视觉真实感。在文本到视频基准上的大量实验表明,M4V能够生成高质量视频,同时显著降低计算成本。代码和模型将在 https://huangjch526.github.io/M4V_project 公开。


Article 29

Title@2025-06-12 (4): Foundation Models for Causal Inference via Prior-Data Fitted Networks

Title: Foundation Models for Causal Inference via Prior-Data Fitted Networks Gründungsmodelle für die Kausalableitung über Prior-Data Fitted Networks 通过先前数据适合型网络进行因果关系推断的模型基础 2506.10914v1

Authors (4): Yuchen Ma, Dennis Frauen, Emil Javurek, Stefan Feuerriegel

Prior-data fitted networks (PFNs) have recently been proposed as a promising way to train tabular foundation models. PFNs are transformers that are pre-trained on synthetic data generated from a prespecified prior distribution and that enable Bayesian inference through in-context learning. In this paper, we introduce CausalFM, a comprehensive framework for training PFN-based foundation models in various causal inference settings. First, we formalize the construction of Bayesian priors for causal inference based on structural causal models (SCMs) in a principled way and derive necessary criteria for the validity of such priors. Building on this, we propose a novel family of prior distributions using causality-inspired Bayesian neural networks that enable CausalFM to perform Bayesian causal inference in various settings, including back-door, front-door, and instrumental variable adjustment. Finally, we instantiate CausalFM and explicitly train a foundation model for estimating conditional average treatment effects (CATEs) using back-door adjustment. We show that CausalFM performs competitively for CATE estimation using various synthetic and semi-synthetic benchmarks. In sum, our framework can be used as a general recipe to train foundation models for various causal inference settings. In contrast to the current state-of-the-art in causal inference, CausalFM offers a novel paradigm with the potential to fundamentally change how practitioners perform causal inference in medicine, economics, and other disciplines.

先验数据拟合网络(PFN)最近被提出作为训练表格基础模型的一种有前景的方法。PFN是在由预先指定的先验分布生成的合成数据上预训练的Transformer,能够通过上下文学习实现贝叶斯推断。在本文中,我们提出了CausalFM,一个用于在各种因果推断设定下训练基于PFN的基础模型的综合框架。首先,我们以有原则的方式形式化了基于结构因果模型(SCM)的因果推断贝叶斯先验的构造,并推导出此类先验有效性的必要条件。在此基础上,我们提出了一族新的先验分布,使用受因果性启发的贝叶斯神经网络,使CausalFM能够在多种设定下进行贝叶斯因果推断,包括后门调整、前门调整和工具变量调整。最后,我们实例化了CausalFM,并明确训练了一个使用后门调整估计条件平均处理效应(CATE)的基础模型。我们在多个合成和半合成基准上表明,CausalFM在CATE估计上具有竞争力。总之,我们的框架可以作为在各种因果推断设定下训练基础模型的通用方案。与当前因果推断的最新方法相比,CausalFM提供了一种新范式,有潜力从根本上改变从业者在医学、经济学和其他学科中进行因果推断的方式。


Article 30

Title@2025-06-12 (4): NoLoCo: No-all-reduce Low Communication Training Method for Large Models

Title: NoLoCo: No-all-reduce Low Communication Training Method for Large Models NoLoCo: Nein-alles-Reduce Low Communication Training Method for Large Models NoLOCO: 大型模型的低通信培训方法 2506.10911v1

Authors (5): Jari Kolehmainen, Nikolay Blagoev, John Donaghy, Oğuzhan Ersoy, Christopher Nies

Training large language models is generally done via optimization methods on clusters containing tens of thousands of accelerators, communicating over a high-bandwidth interconnect. Scaling up these clusters is expensive and can become impractical, imposing limits on the size of models that can be trained. Several recent studies have proposed training methods that are less communication intensive, avoiding the need for a highly connected compute cluster. These state-of-the-art low-communication training methods still employ a synchronization step for model parameters, which, when performed over all model replicas, can become costly on a low-bandwidth network. In this work, we propose a novel optimization method, NoLoCo, that does not explicitly synchronize all model parameters during training and, as a result, does not require any collective communication. NoLoCo implicitly synchronizes model weights via a novel variant of the Nesterov momentum optimizer by partially averaging model weights with a randomly selected other one. We provide both a theoretical convergence analysis for our proposed optimizer and empirical results from language model training. We benchmark NoLoCo on a wide range of accelerator counts and model sizes, from 125M to 6.8B parameters. Our method requires significantly less communication overhead than fully sharded data parallel training or even the widely used low-communication training method DiLoCo. The synchronization step itself is estimated to be one order of magnitude faster than the all-reduce used in DiLoCo for a few hundred accelerators training over the internet. NoLoCo also has no global blocking communication, which reduces accelerator idle time. Compared to DiLoCo, we also observe up to 4% faster convergence across a wide range of model sizes and accelerator counts.

训练大型语言模型通常是在包含数万个加速器的集群上通过优化方法完成的,这些加速器通过高带宽互连进行通信。扩大这些集群的规模代价高昂,并且可能变得不切实际,从而限制了可训练模型的规模。最近的若干研究提出了通信强度较低的训练方法,避免了对高度互连的计算集群的需求。这些最先进的低通信训练方法仍然对模型参数采用同步步骤,当在所有模型副本上执行时,该步骤在低带宽网络上可能代价高昂。在这项工作中,我们提出了一种新的优化方法NoLoCo,它在训练期间不显式同步所有模型参数,因此不需要任何集合通信。NoLoCo通过Nesterov动量优化器的一种新变体,将模型权重与随机选择的另一组模型权重做部分平均,从而隐式地同步模型权重。我们为所提出的优化器提供了理论收敛性分析以及语言模型训练的实证结果。我们在广泛的加速器数量和模型规模(从125M到6.8B参数)上对NoLoCo进行了基准测试。我们的方法所需的通信开销显著低于完全分片数据并行训练,甚至低于广泛使用的低通信训练方法DiLoCo。对于通过互联网训练的数百个加速器,同步步骤本身估计比DiLoCo中使用的all-reduce快一个数量级。NoLoCo也没有任何全局阻塞通信,这减少了加速器的空闲时间。与DiLoCo相比,我们还在广泛的模型规模和加速器数量上观察到最高快4%的收敛速度。


Article 31

Title@2025-06-12 (4): Probably Approximately Correct Labels

Title: Probably Approximately Correct Labels Wahrscheinlich ungefähr korrekte Etiketten 可能是近似正确标签 2506.10908v1

Authors (3): Emmanuel J. Candès, Andrew Ilyas, Tijana Zrnic

Obtaining high-quality labeled datasets is often costly, requiring either extensive human annotation or expensive experiments. We propose a method that supplements such “expert” labels with AI predictions from pre-trained models to construct labeled datasets more cost-effectively. Our approach results in probably approximately correct labels: with high probability, the overall labeling error is small. This solution enables rigorous yet efficient dataset curation using modern AI models. We demonstrate the benefits of the methodology through text annotation with large language models, image labeling with pre-trained vision models, and protein folding analysis with AlphaFold.

获取高质量的标签数据集往往成本高昂,需要广泛的人类批注或昂贵的实验。我们提出了一个方法来补充这种“专家”标签,加上从经过培训的模型中预测的人工智能标签,以便以更具成本效益的方式构建标签数据集。我们的方法可以大致正确的标签:在高概率的情况下,总体标签错误很小。这个方法可以使用现代的AI模型进行严格而高效的数据集弯曲。我们通过使用大型语言模型的文本注释、与经过培训的视觉模型进行图像标签以及与阿尔法福尔德进行蛋白质折叠分析来证明这种方法的好处。


Article 32

Title@2025-06-12 (4): Demystifying Spectral Feature Learning for Instrumental Variable Regression

Title: Demystifying Spectral Feature Learning for Instrumental Variable Regression Entmystifizieren von Spektralfunktionen für instrumentelle variable Regression 为乐器变量回归而解开神秘的光谱特征学习 2506.10899v1

Authors (5): Dimitri Meunier, Antoine Moulin, Jakub Wornbard, Vladimir R. Kostic, Arthur Gretton

We address the problem of causal effect estimation in the presence of hidden confounders, using nonparametric instrumental variable (IV) regression. A leading strategy employs spectral features - that is, learned features spanning the top eigensubspaces of the operator linking treatments to instruments. We derive a generalization error bound for a two-stage least squares estimator based on spectral features, and gain insights into the method’s performance and failure modes. We show that performance depends on two key factors, leading to a clear taxonomy of outcomes. In a good scenario, the approach is optimal. This occurs with strong spectral alignment, meaning the structural function is well-represented by the top eigenfunctions of the conditional operator, coupled with this operator’s slow eigenvalue decay, indicating a strong instrument. Performance degrades in a bad scenario: spectral alignment remains strong, but rapid eigenvalue decay (indicating a weaker instrument) demands significantly more samples for effective feature learning. Finally, in the ugly scenario, weak spectral alignment causes the method to fail, regardless of the eigenvalues’ characteristics. Our synthetic experiments empirically validate this taxonomy.

我们使用非参数工具变量(IV)回归来处理存在隐藏混杂因素时的因果效应估计问题。一种主流策略采用谱特征,即张成连接处理与工具的算子的顶部特征子空间的学习特征。我们为基于谱特征的两阶段最小二乘估计器推导了泛化误差界,并由此深入了解该方法的性能和失效模式。我们表明,性能取决于两个关键因素,从而得到清晰的结果分类。在“好”的情形下,该方法是最优的。这发生在谱对齐较强(即结构函数能被条件算子的顶部特征函数很好地表示)且该算子的特征值衰减缓慢(表明工具较强)的时候。在“坏”的情形下性能下降:谱对齐仍然较强,但特征值的快速衰减(表明工具较弱)使有效的特征学习需要多得多的样本。最后,在“丑”的情形下,谱对齐较弱导致该方法失败,无论特征值的特性如何。我们的合成实验从经验上验证了这一分类。
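
The estimator analysed in the abstract above is a two-stage least squares (2SLS) procedure built on learned spectral features. As a much simpler point of reference, the sketch below shows plain scalar 2SLS with a linear first stage, not the nonparametric spectral-feature version studied in the paper. The data are a tiny hand-constructed example (an assumption for illustration) in which a hidden confounder `u` biases ordinary least squares, while the instrument `z` is uncorrelated with `u`.

```python
def ols(x, y):
    """Slope and intercept of the least-squares line y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def two_stage_least_squares(z, x, y):
    """Stage 1: regress treatment x on instrument z. Stage 2: regress y on fitted x."""
    gamma, gamma0 = ols(z, x)
    x_hat = [gamma * zi + gamma0 for zi in z]
    return ols(x_hat, y)

# Hand-constructed toy data: true causal effect of x on y is 2.0.
z = [-1.0, -1.0, 1.0, 1.0]                       # instrument
u = [-1.0, 1.0, -1.0, 1.0]                       # hidden confounder, orthogonal to z
x = [zi + ui for zi, ui in zip(z, u)]            # treatment depends on z and u
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]

naive_slope, _ = ols(x, y)                       # biased by the confounder
iv_slope, _ = two_stage_least_squares(z, x, y)   # uses only variation driven by z
```

On this toy data the naive regression overestimates the effect, while the two-stage estimate recovers the coefficient used to generate `y`, because the first stage keeps only the part of `x` explained by the instrument.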


Article 33

Title@2025-06-12 (4): The Diffusion Duality

Title: The Diffusion Duality Die Diffusionsdualität 传播质量 2506.10892v1

Authors (6): Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, Volodymyr Kuleshov

Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, doubling training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm unlocks few-step generation in diffusion language models by accelerating sampling by two orders of magnitude. We provide the code and model checkpoints on the project page: http://s-sahoo.github.io/duo

均匀状态离散扩散模型因其固有的自我纠错能力而有望实现快速文本生成。然而,它们的表现通常不如自回归模型和掩码扩散模型。在这项工作中,我们利用一个关键洞察来缩小这一性能差距:均匀状态扩散过程自然地产生于底层的高斯扩散。我们的方法Duo将高斯扩散中的强大技术迁移过来,以同时改进训练和采样。首先,我们引入了一种由高斯过程引导的课程学习策略,通过降低方差使训练速度翻倍。采用课程学习训练的模型在7个基准中的3个上,零样本困惑度超过了自回归模型。其次,我们提出了离散一致性蒸馏,将一致性蒸馏从连续情形改造到离散情形。该算法将采样加速两个数量级,从而在扩散语言模型中实现少步生成。我们在项目页面上提供代码和模型检查点: http://s-sahoo.github.io/duo


Article 34

Title@2025-06-12 (4): PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play

Title: PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play PLAY2PROMPT: Zero-shot Tool Instruction Optimierung für LLM Agenten über Tool Play PLAY2PROMOPT: 通过工具游戏优化LLM代理器的零射工具指令 2503.14432v2

Authors (5): Wei Fang, Yang Zhang, Kaizhi Qian, James Glass, Yada Zhu

Large language models (LLMs) are increasingly integrated with specialized external tools, yet many tasks demand zero-shot tool usage with minimal or noisy documentation. Existing solutions rely on manual rewriting or labeled data for validation, making them inapplicable in true zero-shot settings. To address these challenges, we propose PLAY2PROMPT, an automated framework that systematically “plays” with each tool to explore its input-output behaviors. Through this iterative trial-and-error process, PLAY2PROMPT refines tool documentation and generates usage examples without any labeled data. These examples not only guide LLM inference but also serve as validation to further enhance tool utilization. Extensive experiments on real-world tasks demonstrate that PLAY2PROMPT significantly improves zero-shot tool performance across both open and closed models, offering a scalable and effective solution for domain-specific tool integration.

大型语言模型(LLMS)日益与专门的外部工具融合在一起,但许多任务要求使用零光工具,且文件很少或吵闹。现有解决方案依靠人工重写或标签数据进行验证,使其在真正的零光环境中不适用。为了应对这些挑战,我们提议PLAY2PROMPT,这是一个系统“玩耍”的自动框架,每个工具都可以探索其输入-输出行为。通过这个迭接的试验-操作程序,PLAY2PROMOT改进了工具文档,并生成了使用示例,而没有任何标签数据。这些示例不仅指导LLLM推断,而且还作为进一步增强工具利用的验证。关于现实世界任务的广泛实验表明,PLAY2PROMOT大大改善了开放和封闭模式的零光工具性,为特定域工具整合提供了可扩展的有效解决方案。


Article 35

Title@2025-06-12 (4): Lattice Climber Attack: Adversarial attacks for randomized mixtures of classifiers

Title: Lattice Climber Attack: Adversarial attacks for randomized mixtures of classifiers Lattice Climber Attack: Adversariale Angriffe für randomisierte Mischungen von Klassifikatoren Lattice 攀爬者攻击: 随机混合分类器的反向攻击 2506.10888v1

Authors (3): Lucas Gnecco-Heredia, Benjamin Negrevergne, Yann Chevaleyre

Finite mixtures of classifiers (a.k.a. randomized ensembles) have been proposed as a way to improve robustness against adversarial attacks. However, existing attacks have been shown to not suit this kind of classifier. In this paper, we discuss the problem of attacking a mixture in a principled way and introduce two desirable properties of attacks based on a geometrical analysis of the problem (effectiveness and maximality). We then show that existing attacks do not meet both of these properties. Finally, we introduce a new attack called lattice climber attack with theoretical guarantees in the binary linear setting, and demonstrate its performance by conducting experiments on synthetic and real datasets.

提议将精密的分类器混合物(a.k.a. 随机组合)作为提高抵御对抗性攻击的力度的一种方法。然而,现有攻击已证明不适合这种分类器。在本文中,我们讨论以原则性方式攻击混合物的问题,根据对问题(有效性和最大性)的几何分析,引入两种可取的攻击特性。然后,我们表明现有的攻击不符合这两种特性。最后,我们用二进制线性线性环境的理论保证引进了一个新的攻击,称为“范拉蒂斯登山者攻击”),并通过对合成和真实数据集进行实验来证明其性能。


Article 36

Title@2025-06-12 (4): Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

Title: Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers Verallgemeinerung oder Halluzination? Verstehen von Out-of-Context-Reasoning in Transformers 通化还是幻觉? 理解变异器的逻辑外原因 2506.10887v1

Authors (8): Yixiao Huang, Hanlin Zhu, Tianyu Guo, Jiantao Jiao, Somayeh Sojoudi, Michael I. Jordan, Stuart Russell, Song Mei

Large language models (LLMs) can acquire new knowledge through fine-tuning, but this process exhibits a puzzling duality: models can generalize remarkably from new facts, yet are also prone to hallucinating incorrect information. However, the reasons for this phenomenon remain poorly understood. In this work, we argue that both behaviors stem from a single mechanism known as out-of-context reasoning (OCR): the ability to deduce implications by associating concepts, even those without a causal link. Our experiments across five prominent LLMs confirm that OCR indeed drives both generalization and hallucination, depending on whether the associated concepts are causally related. To build a rigorous theoretical understanding of this phenomenon, we then formalize OCR as a synthetic factual recall task. We empirically show that a one-layer single-head attention-only transformer with factorized output and value matrices can learn to solve this task, while a model with combined weights cannot, highlighting the crucial role of matrix factorization. Our theoretical analysis shows that the OCR capability can be attributed to the implicit bias of gradient descent, which favors solutions that minimize the nuclear norm of the combined output-value matrix. This mathematical structure explains why the model learns to associate facts and implications with high sample efficiency, regardless of whether the correlation is causal or merely spurious. Ultimately, our work provides a theoretical foundation for understanding the OCR phenomenon, offering a new lens for analyzing and mitigating undesirable behaviors from knowledge injection.

大型语言模型(LLM)可以通过微调获得新知识,但这一过程呈现出令人费解的两面性:模型能够从新事实中出色地泛化,却也容易幻觉出错误信息。然而,这一现象的原因仍未得到很好的理解。在这项工作中,我们认为这两种行为都源于同一种机制,即上下文外推理(OCR):通过关联概念来推断蕴含结论的能力,即使这些概念之间没有因果联系。我们在五个知名LLM上的实验证实,OCR确实同时驱动泛化和幻觉,具体取决于被关联的概念是否存在因果关系。为了对这一现象建立严格的理论理解,我们随后将OCR形式化为一个合成的事实回忆任务。我们通过实验表明,输出矩阵和值矩阵分解的单层单头纯注意力Transformer能够学会解决该任务,而权重合并的模型则不能,这凸显了矩阵分解的关键作用。我们的理论分析表明,OCR能力可归因于梯度下降的隐式偏置,它偏好使合并后的输出-值矩阵核范数最小的解。这一数学结构解释了为什么无论相关性是因果的还是仅仅是虚假的,模型都能以很高的样本效率学会将事实与蕴含结论关联起来。最终,我们的工作为理解OCR现象提供了理论基础,为分析和缓解知识注入带来的不良行为提供了新的视角。


Article 37

Title@2025-06-12 (4): Three iterations of $(d-1)$-WL test distinguish non isometric clouds of $d$-dimensional points

Title: Three iterations of $(d-1)$-WL test distinguish non isometric clouds of $d$-dimensional points Drei Iterationen von $(d-1)$-WL-Test unterscheiden nicht-isometrische Wolken von $d$-dimensionalen Punkten 三次(d-1-1美元-WL)的迭代(美元-美元-WL)测试区分非等度云(美元-维点) 2303.12853v4

Authors (5): Valentino Delle Rose, Alexander Kozachinskiy, Cristóbal Rojas, Mircea Petrache, Pablo Barceló

The Weisfeiler–Lehman (WL) test is a fundamental iterative algorithm for checking isomorphism of graphs. It has also been observed that it underlies the design of several graph neural network architectures, whose capabilities and performance can be understood in terms of the expressive power of this test. Motivated by recent developments in machine learning applications to datasets involving three-dimensional objects, we study when the WL test is complete for clouds of Euclidean points represented by complete distance graphs, i.e., when it can distinguish, up to isometry, any arbitrary such cloud. How many dimensions of the Weisfeiler–Lehman test are enough to distinguish any two non-isometric point clouds in $d$-dimensional Euclidean space, assuming that these point clouds are given as complete graphs labeled by distances between the points? This question is important for understanding which architectures of graph neural networks are capable of fully exploiting the spatial structure of a point cloud. Our main result states that the $(d-1)$-dimensional WL test is complete for point clouds in $d$-dimensional Euclidean space, for any $d\ge 2$, and that only three iterations of the test suffice. We also observe that the $d$-dimensional WL test only requires one iteration to achieve completeness. Our paper thus provides a complete understanding of the 3-dimensional case: it was shown in previous works that 1-WL is not complete in $\mathbb{R}^3$, and we show that 2-WL is complete there. We also strengthen the lower bound for 1-WL by showing that it is unable to recognize planar point clouds in $\mathbb{R}^3$. Finally, we show that 2-WL is not complete in $\mathbb{R}^6$, leaving as an open question whether it is complete in $\mathbb{R}^{d}$ for $d = 4,5$.

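Weisfeiler–Lehman(WL)测试是检查图同构的一种基本迭代算法。人们还注意到,它是若干图神经网络架构设计的基础,这些架构的能力和性能可以用该测试的表达能力来理解。受机器学习在涉及三维物体的数据集上的最新应用的启发,我们研究WL测试何时对由完全距离图表示的欧氏点云是完备的,即何时它能够在等距意义下区分任意这样的点云。需要多少维的Weisfeiler–Lehman测试才足以区分 $d$ 维欧氏空间中任意两个非等距的点云(假设这些点云以点间距离标注的完全图给出)?这个问题对于理解哪些图神经网络架构能够充分利用点云的空间结构十分重要。我们的主要结果表明,对任意 $d\ge 2$,$(d-1)$维WL测试对 $d$ 维欧氏空间中的点云是完备的,并且只需三次迭代。我们还注意到,$d$维WL测试只需一次迭代即可达到完备性。因此,我们的论文完整地解决了三维情形:先前的工作表明1-WL在 $\mathbb{R}^3$ 中不完备,而我们证明2-WL在其中是完备的。我们还加强了1-WL的下界,证明它无法识别 $\mathbb{R}^3$ 中的平面点云。最后,我们证明2-WL在 $\mathbb{R}^6$ 中不完备,并留下一个开放问题:它在 $d = 4,5$ 时的 $\mathbb{R}^{d}$ 中是否完备。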
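
To make the object of study concrete, here is a minimal sketch of the 1-dimensional WL test (color refinement) run on the complete distance graph of a point cloud. It is only the baseline variant, which the abstract notes is not complete in $\mathbb{R}^3$; the paper's results concern the $(d-1)$-dimensional test. The function names and the example clouds are assumptions for illustration. Integer coordinates and squared distances are used so that comparisons are exact.

```python
def squared_distance(p, q):
    """Exact squared Euclidean distance for integer coordinates."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def wl_fingerprint(points, iterations=3):
    """1-WL color refinement on the complete distance graph of `points`.

    Each round replaces a point's color by (old color, sorted multiset of
    (squared distance, neighbour color)). The sorted list of final colors is
    invariant under isometries and relabelling of the points.
    """
    n = len(points)
    colors = [0] * n
    for _ in range(iterations):
        colors = [
            (
                colors[i],
                tuple(sorted(
                    (squared_distance(points[i], points[j]), colors[j])
                    for j in range(n) if j != i
                )),
            )
            for i in range(n)
        ]
    return sorted(colors)

# Cloud B is cloud A after a coordinate permutation and a translation (an isometry).
cloud_a = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
cloud_b = [(5, 5, 5), (5, 6, 5), (5, 5, 7), (8, 5, 5)]
# Cloud C moves one point, so it is not isometric to A.
cloud_c = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 4)]

same = wl_fingerprint(cloud_a) == wl_fingerprint(cloud_b)
different = wl_fingerprint(cloud_a) != wl_fingerprint(cloud_c)
```

Equal fingerprints are necessary for two clouds to be isometric but, for 1-WL, not sufficient in general; completeness is exactly the question of when equal fingerprints also imply isometry.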


Article 38

Title@2025-06-12 (4): A Goemans-Williamson type algorithm for identifying subcohorts in clinical trials

Title: A Goemans-Williamson type algorithm for identifying subcohorts in clinical trials Ein Goemans-Williamson-Algorithmus zur Identifizierung von Subkohorten in klinischen Studien Goemans- Williamson 型算法,用于在临床试验中识别亚科 2506.10879v1

Authors (1): Pratik Worah

We design an efficient algorithm that outputs a linear classifier for identifying homogeneous subsets (equivalently, subcohorts) from large inhomogeneous datasets. Our theoretical contribution is a rounding technique, similar to that of Goemans and Williamson (1994), that approximates the optimal solution of the underlying optimization problem within a factor of $0.82$. As an application, we use our algorithm to design a simple test that can identify homogeneous subcohorts of patients, composed mainly of metastatic cases, from the RNA microarray dataset for breast cancer by Curtis et al. (2012). Furthermore, we also use the test output by the algorithm to systematically identify subcohorts of patients in which statistically significant changes in methylation levels of tumor suppressor genes co-occur with statistically significant changes in nuclear receptor expression. Identifying such homogeneous subcohorts of patients can be useful for the discovery of disease pathways and therapeutics specific to the subcohort.

我们设计了一个高效的算法,从大型异质数据集中输出一个线性分类器,用于识别同质子集(相当于亚异质)。我们的理论贡献是一种四舍五入的技术,类似于戈曼斯和威廉姆森(1994年)的技术,它近似于0.82美元系数内潜在优化问题的最佳解决办法。作为一种应用,我们用我们的算法设计一个简单的测试,可以识别病人的同质次组,主要包括Curtis等人(2012年)的乳腺癌RNA微粒数据集中的同质次组。此外,我们还利用算法的测试输出系统识别病人的次组群,其中肿瘤抑制基因的甲基化水平发生了统计学上的重大变化,同时对核感应器表达方式也发生了统计上的重大变化。确定病人的同质次组群可以有助于发现子科特的疾病路径和治疗方法。


Article 39

Title@2025-06-12 (4): Data-Driven Prediction of Dynamic Interactions Between Robot Appendage and Granular Material

Title: Data-Driven Prediction of Dynamic Interactions Between Robot Appendage and Granular Material Datengestützte Vorhersage dynamischer Interaktionen zwischen Roboter-Anhang und Granular Material 机器人附加和颗粒材料之间动态相互作用的数据驱动预测 2506.10875v1

Authors (4): Guanjin Wang, Xiangxue Zhao, Shapour Azarm, Balakumar Balachandran

An alternative data-driven modeling approach has been proposed and employed to gain fundamental insights into robot motion interaction with granular terrain at certain length scales. The approach is based on an integration of dimension reduction (Sequentially Truncated Higher-Order Singular Value Decomposition), surrogate modeling (Gaussian Process), and data assimilation techniques (Reduced Order Particle Filter). This approach can be used online and is based on offline data, obtained from the offline collection of high-fidelity simulation data and a set of sparse experimental data. The results have shown that orders of magnitude reduction in computational time can be obtained from the proposed data-driven modeling approach compared with physics-based high-fidelity simulations. With only simulation data as input, the data-driven prediction technique can generate predictions with accuracy comparable to that of simulations. With both simulation data and sparse physical experimental measurements as input, the data-driven approach with its embedded data assimilation techniques has the potential to outperform high-fidelity simulations alone for long-horizon predictions. In addition, it is demonstrated that the data-driven modeling approach can also reproduce the scaling relationship recovered by physics-based simulations for maximum resistive forces, which may indicate its general predictability beyond a case-by-case basis. The results are expected to help robot navigation and exploration in unknown and complex terrains during both online and offline phases.

本文提出并采用了一种替代性的数据驱动建模方法,以在特定长度尺度上获得对机器人运动与颗粒地形相互作用的基本认识。该方法基于降维(顺序截断高阶奇异值分解)、代理建模(高斯过程)和数据同化技术(降阶粒子滤波器)的整合。该方法可以在线使用,并以离线数据为基础,这些数据来自离线采集的高保真仿真数据和一组稀疏的实验数据。结果表明,与基于物理的高保真仿真相比,所提出的数据驱动建模方法可以将计算时间降低若干个数量级。仅以仿真数据作为输入时,数据驱动预测技术可以生成精度与仿真相当的预测。同时以仿真数据和稀疏的物理实验测量作为输入时,内嵌数据同化技术的数据驱动方法在长时域预测上有潜力优于单纯的高保真仿真。此外,数据驱动建模方法还能重现基于物理的仿真所得到的最大阻力的标度关系,这可能表明其预测能力具有超出个案的普遍性。这些结果有望在在线和离线阶段帮助机器人在未知和复杂地形中导航和探索。


Article 40

Title@2025-06-12 (4): The Gittins Index: A Design Principle for Decision-Making Under Uncertainty

Title: The Gittins Index: A Design Principle for Decision-Making Under Uncertainty Der Gittins Index: Ein Design-Prinzip für Entscheidungsfindung unter Unsicherheit Gittins指数:不确定性下决策的设计原则 2506.10872v1

Authors (2): Ziv Scully, Alexander Terenin

The Gittins index is a tool that optimally solves a variety of decision-making problems involving uncertainty, including multi-armed bandit problems, minimizing mean latency in queues, and search problems like the Pandora’s box model. However, despite the above examples and later extensions thereof, the space of problems that the Gittins index can solve perfectly optimally is limited, and its definition is rather subtle compared to those of other multi-armed bandit algorithms. As a result, the Gittins index is often regarded as being primarily a concept of theoretical importance, rather than a practical tool for solving decision-making problems. The aim of this tutorial is to demonstrate that the Gittins index can be fruitfully applied to practical problems. We start by giving an example-driven introduction to the Gittins index, then walk through several examples of problems it solves - some optimally, some suboptimally but still with excellent performance. Two practical highlights in the latter category are applying the Gittins index to Bayesian optimization, and applying the Gittins index to minimizing tail latency in queues.

Gittins指数是一个工具,它能最好地解决各种涉及不确定性的决策问题,包括多臂土匪问题,最大限度地减少队列中的平均潜伏,以及Pandora箱式模型等搜索问题。然而,尽管有上述例子和后来的扩展,Gittins指数完全可以最佳解决的问题空间是有限的,其定义与其他多臂土匪算法相比相当微妙。因此,Gittins指数常常被视为主要具有理论重要性的概念,而不是解决决策问题的实用工具。这个教程的目的是证明Gittins指数可以富有成效地应用于实际问题。我们首先以实例为驱动地介绍Gittins指数,然后通过几个问题的例子,有些是最佳的,有些是次近的,但表现仍然很好。后一类的两个实际亮点是将Gittins指数应用于Bayesian 优化,并应用Gittins索引来尽量减少排队尾拖。
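As a small illustration of the Pandora's box case mentioned in the abstract, the sketch below computes a reservation value for a closed box, which in that model is commonly identified with the box's Gittins index. It assumes a discrete reward distribution and a positive opening cost; the function names `expected_excess` and `reservation_value` are hypothetical and not taken from the tutorial.

```python
from typing import Sequence


def expected_excess(values: Sequence[float], probs: Sequence[float], g: float) -> float:
    """E[max(X - g, 0)] for a discrete reward X."""
    return sum(p * max(v - g, 0.0) for v, p in zip(values, probs))


def reservation_value(values: Sequence[float], probs: Sequence[float],
                      cost: float, tol: float = 1e-9) -> float:
    """Solve E[max(X - g, 0)] = cost for g by bisection.

    The left-hand side is continuous and non-increasing in g, so bisection
    applies. Assumption: cost > 0 and probs sum to 1.
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    lo = min(values) - cost  # here the expected excess is at least `cost`
    hi = max(values)         # here the expected excess is 0 < cost
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_excess(values, probs, mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# A box paying 0 or 10 with equal probability, costing 1 to open:
# 0.5 * (10 - g) = 1 gives g = 8.
index = reservation_value([0.0, 10.0], [0.5, 0.5], cost=1.0)
```

Under the classical index rule for this model, boxes are opened in decreasing order of this value, and the search stops once the best reward seen so far is at least as large as every remaining box's value.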


Article 41

Title@2025-06-12 (4): Unveiling the Role of Randomization in Multiclass Adversarial Classification: Insights from Graph Theory

Title: Unveiling the Role of Randomization in Multiclass Adversarial Classification: Insights from Graph Theory Enthüllen der Rolle der Randomisierung in der Mehrklassen-Adversarial-Klassifikation: Einsichten aus der Graphentheorie 打破随机化在多级反对向分类中的作用:图表理论的透视 2503.14299v2

Authors (6): Lucas Gnecco-Heredia, Matteo Sammut, Muni Sreenivas Pydi, Rafael Pinot, Benjamin Negrevergne, Yann Chevaleyre

Randomization as a means to improve the adversarial robustness of machine learning models has recently attracted significant attention. Unfortunately, much of the theoretical analysis so far has focused on binary classification, providing only limited insights into the more complex multiclass setting. In this paper, we take a step toward closing this gap by drawing inspiration from the field of graph theory. Our analysis focuses on discrete data distributions, allowing us to cast the adversarial risk minimization problems within the well-established framework of set packing problems. By doing so, we are able to identify three structural conditions on the support of the data distribution that are necessary for randomization to improve robustness. Furthermore, we are able to construct several data distributions where (contrary to binary classification) switching from a deterministic to a randomized solution significantly reduces the optimal adversarial risk. These findings highlight the crucial role randomization can play in enhancing robustness to adversarial attacks in multiclass classification.

作为改善机器学习模式对抗性强健性的一种手段,随机化最近引起了人们的极大关注。 不幸的是,迄今为止的理论分析大多侧重于二进制分类,只对更复杂的多级设置提供了有限的洞察力。在本文中,我们通过从图形理论领域获得灵感,朝着缩小这一差距迈出了一步。我们的分析侧重于离散的数据分布,从而使我们能够将对抗性风险最小化的问题置于既定的成套包装问题框架之内。通过这样做,我们得以确定支持数据分配的三个结构性条件,而这是随机化提高稳健性所必需的。此外,我们能够在从确定性(从二进制分类到二进制分类)转换为随机化解决方案的地方建立若干数据分布,从而大大降低了最佳的对抗性风险。这些结论强调了随机化在多级分类中增强对抗性攻击的稳健性方面可以发挥的关键作用。


Article 42

Title@2025-06-12 (4): Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization

Title: Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization Lebensfähigkeit künftiger Maßnahmen: Robuste Sicherheit im Stärkungslernen durch Entropieregularisierung 未来行动的可行性:通过英信正规化加强学习的有力安全 2506.10871v1

Authors (4): Pierre-François Massiani, Alexander von Rohr, Lukas Haverbeck, Sebastian Trimpe

Despite the many recent advances in reinforcement learning (RL), the question of learning policies that robustly satisfy state constraints under unknown disturbances remains open. In this paper, we offer a new perspective on achieving robust safety by analyzing the interplay between two well-established techniques in model-free RL: entropy regularization, and constraints penalization. We reveal empirically that entropy regularization in constrained RL inherently biases learning toward maximizing the number of future viable actions, thereby promoting constraints satisfaction robust to action noise. Furthermore, we show that by relaxing strict safety constraints through penalties, the constrained RL problem can be approximated arbitrarily closely by an unconstrained one and thus solved using standard model-free RL. This reformulation preserves both safety and optimality while empirically improving resilience to disturbances. Our results indicate that the connection between entropy regularization and robustness is a promising avenue for further empirical and theoretical investigation, as it enables robust safety in RL through simple reward shaping.

尽管在加强学习(RL)方面最近取得了许多进展,但学习政策如何在未知扰动下强有力地满足国家制约的问题仍然有待解决。在本文件中,我们通过分析无模式RL中两种成熟技术的相互作用,提供了实现稳健安全的新视角:即:变本加厉的正规化和约束性处罚。我们从经验中发现,受限制的RL的诱导性正规化在学习最大限度地增加未来可行行动的数量方面有着内在的偏向性,从而促使对行动噪音产生强烈的满意度。此外,我们表明,通过通过惩罚放松严格的安全限制,受限制的RL问题可以任意地被一个不受约束的束缚的束缚性监管者所近似,从而使用标准的无模式RL来解决。这种重整既能维护安全和最佳性,又在经验上提高抗扰动的复原力。我们的结果表明,变本和稳健之间的联系是进一步进行实证和理论性调查的一个有希望的途径,因为它通过简单的奖励方式使RL的稳健安全。


Article 43

Title@2025-06-12 (4): mLaSDI: Multi-stage latent space dynamics identification

Title: mLaSDI: Multi-stage latent space dynamics identification mLaSDI: Mehrstufige latente Raumdynamik-Identifikation mLASDI:多阶段潜在空间动态识别 2506.09207v2

Authors (3): William Anderson, Seung Whan Chung, Youngsoo Choi

Determining accurate numerical solutions of partial differential equations (PDEs) is an important task in many scientific disciplines. However, solvers can be computationally expensive, leading to the development of reduced-order models (ROMs). Recently, Latent Space Dynamics Identification (LaSDI) was proposed as a data-driven, non-intrusive ROM framework. LaSDI compresses the training data using an autoencoder and learns a system of user-chosen ordinary differential equations (ODEs), which govern the latent space dynamics. This allows for rapid predictions by interpolating and evolving the low-dimensional ODEs in the latent space. While LaSDI has produced effective ROMs for numerous problems, the autoencoder can have difficulty accurately reconstructing training data while also satisfying the imposed dynamics in the latent space, particularly in complex or high-frequency regimes. To address this, we propose multi-stage Latent Space Dynamics Identification (mLaSDI). With mLaSDI, several autoencoders are trained sequentially in stages, where each autoencoder learns to correct the error of the previous stages. We find that applying mLaSDI with small autoencoders results in lower prediction and reconstruction errors, while also reducing training time compared to LaSDI.

确定部分差异方程式(PDEs)的准确数字解决方案在许多科学学科中是一项重要任务。然而,解答器在计算上可能费用高昂,导致开发了减少顺序模型(ROMs)。最近,LaST空间动态识别(LASDI)被提议为数据驱动的非侵入性ROM框架。LASDI使用自动编码器压缩培训数据,并学习一个管理潜在空间动态的用户选择的普通差异方程式系统(MLASDI),这样就可以通过对潜层低维代号进行内插和演化来进行快速预测。虽然LASDI为许多问题制作了有效的ROM,但自动编码器可能难以准确重建培训数据,同时也满足了潜在空间,特别是复杂或高频系统中的强制动态。为了解决这个问题,我们提出了多阶段的低端空间动态识别(MLASDI)系统。使用MSDI系统,对几个自动编码器进行了连续培训,使每个小型解码器都学会如何纠正前几个阶段的错误。我们发现,在将LASDRADS在进行低级的重建时,同时将LADDRADS在降低时间预测中进行。


Article 44

Title@2025-06-12 (4): Latent Action Learning Requires Supervision in the Presence of Distractors

Title: Latent Action Learning Requires Supervision in the Presence of Distractors Latent Action Learning erfordert Überwachung in der Gegenwart von Distraktoren 长期行动学习要求对有减压器的存在进行监督 2502.00379v5

Authors (7): Alexander Nikulin, Ilya Zisman, Denis Tarasov, Nikita Lyubaykin, Andrei Polubarov, Igor Kiselev, Vladislav Kurenkov

Recently, latent action learning, pioneered by Latent Action Policies (LAPO), has shown remarkable pre-training efficiency on observation-only data, offering potential for leveraging vast amounts of video available on the web for embodied AI. However, prior work has focused on distractor-free data, where changes between observations are primarily explained by ground-truth actions. Unfortunately, real-world videos contain action-correlated distractors that may hinder latent action learning. Using the Distracting Control Suite (DCS), we empirically investigate the effect of distractors on latent action learning and demonstrate that LAPO struggles in such scenarios. We propose LAOM, a simple LAPO modification that improves the quality of latent actions by 8x, as measured by linear probing. Importantly, we show that providing supervision with ground-truth actions, for as little as 2.5% of the full dataset, during latent action learning improves downstream performance by 4.2x on average. Our findings suggest that integrating supervision during Latent Action Model (LAM) training is critical in the presence of distractors, challenging the conventional pipeline of first learning a LAM and only then decoding from latent to ground-truth actions.

最近,由Lient Action Policy(LAPO)率先推出的潜在行动学习在只观察数据上表现出了惊人的训练前效率,为利用网上大量视频进行体现AI提供了潜力。然而,先前的工作侧重于分散式数据,观测之间的变化主要通过地面真相行动来解释。不幸的是,真实世界的视频含有可能阻碍潜在行动学习的与行动有关的分散性。我们利用干扰控制套件,实证地调查了分散器对潜在行动学习的影响,并展示了LAPO在这种情景中的斗争。我们建议LAOM,这是一个简单的LAPO修改,按照线性研究的测量,将潜在行动的质量提高8x。重要的是,我们表明,在潜在行动学习过程中,对地面真相行动的监督,只有2.5 % , 平均而言,4.2x的下游表现会提高。我们的调查结果表明,在使用远程行动模型(LAM)培训时,综合监督对于分散式的培训对于转移器的存在至关重要,对最初学习LAM的常规管道提出了挑战,然后才从潜至地面的分解。


Article 45

Title@2025-06-12 (4): Multi-group Uncertainty Quantification for Long-form Text Generation

Title: Multi-group Uncertainty Quantification for Long-form Text Generation Multi-Gruppen-Unsicherheits-Quantifizierung für langformige Textgenerierung 长式文本生成的不确定性量化 2407.21057v2

Authors (2): Terrance Liu, Zhiwei Steven Wu

While past works have shown how uncertainty quantification can be applied to large language model (LLM) outputs, the question of whether resulting uncertainty guarantees still hold within sub-groupings of data remains open. In our work, given some long-form text generated by an LLM, we study uncertainty at both the level of individual claims contained within the output (via calibration) and across the entire output itself (via conformal prediction). Using biography generation as a testbed for this study, we derive a set of (demographic) attributes (e.g., whether some text describes a man or woman) for each generation to form such “subgroups” of data. We find that although canonical methods for both types of uncertainty quantification perform well when measuring across the entire dataset, such guarantees break down when examining particular subgroups. Having established this issue, we invoke group-conditional methods for uncertainty quantification – multicalibration and multivalid conformal prediction – and find that across a variety of approaches, additional subgroup information consistently improves calibration and conformal prediction within subgroups (while crucially retaining guarantees across the entire dataset). As the problems of calibration, conformal prediction, and their multi-group counterparts have not been extensively explored in the context of long-form text generation, we consider these results to form a benchmark for this setting.

过去的工作表明,不确定性的量化如何适用于大型语言模型(LLM)产出,由此产生的不确定性的保证是否仍然保留在数据分组内的问题仍然未解决。在我们的工作中,鉴于LLM产生的一些长式文本,我们研究了产出(校准)和整个产出本身(校准)中包含的个别索赔的不确定性水平(通过一致的预测),利用传记生成作为本研究的测试台,我们为每代人得出一套(人口)属性(例如,一些文字是否描述一个男人或女人),以形成这种“子群”的数据。我们发现,尽管这两种不确定性量化的可归纳方法在测量整个数据集时效果良好,但在检查特定分组时,这种保证会破裂。我们确定了这一问题,我们援引了不确定性量化的群点 – – 多重校准和多valid一致的预测 – – 并发现,在各种方法中,额外的分组信息始终在改进校准和一致的预测(同时关键地在整个数据集中保留着保证)。由于校准、符合这些版本的预测和多式结果的组问题不是我们所探讨的长式的对应。


Article 46

Title@2025-06-12 (4): Breaking Distortion-free Watermarks in Large Language Models

Title: Breaking Distortion-free Watermarks in Large Language Models Breaking Distortion-free Watermarks in großen Sprachmodellen 大语言模型中无扭曲的无水印 2502.18608v2

Authors (8): Shayleen Reynolds, Hengzhi He, Dung Daniel T. Ngo, Saheed Obitayo, Niccolò Dalmasso, Guang Cheng, Vamsi K. Potluru, Manuela Veloso

In recent years, LLM watermarking has emerged as an attractive safeguard against AI-generated content, with promising applications in many real-world domains. However, there are growing concerns that the current LLM watermarking schemes are vulnerable to expert adversaries wishing to reverse-engineer the watermarking mechanisms. Prior work in breaking or stealing LLM watermarks mainly focuses on the distribution-modifying algorithm of Kirchenbauer et al. (2023), which perturbs the logit vector before sampling. In this work, we focus on reverse-engineering the other prominent LLM watermarking scheme, distortion-free watermarking (Kuditipudi et al. 2024), which preserves the underlying token distribution by using a hidden watermarking key sequence. We demonstrate that, even under a more sophisticated watermarking scheme, it is possible to compromise the LLM and carry out a spoofing attack, i.e. generate a large number of (potentially harmful) texts that can be attributed to the original watermarked LLM. Specifically, we propose using adaptive prompting and a sorting-based algorithm to accurately recover the underlying secret key for watermarking the LLM. Our empirical findings on LLAMA-3.1-8B-Instruct, Mistral-7B-Instruct, Gemma-7b, and OPT-125M challenge the current theoretical claims on the robustness and usability of the distortion-free watermarking techniques.

近几年来,LLM 水的标记已成为防止AI产生的含量的有吸引力的保障,在许多现实世界域中应用了很有希望,然而,人们越来越担心目前的LLM水的标记办法对希望逆向设计水的标记机制的专家对手来说是脆弱的。以前破坏或偷LLM水的标记主要是Kirchenbauer等人(2023年)的分布调节算法,这种算法在取样之前会破坏对对对正对流矢量的干扰。在这项工作中,我们侧重于反向设计其他著名的LM水的标记办法,即无扭曲的水标记(Kuditipudi等人,2024年),这种办法通过使用隐藏的水标记关键序列来保持基本象征性的分布。我们证明,即使是在更复杂的水标记办法下,也有可能损害LMM和Spofing攻击,即产生大量(可能有害的)案文,可归因于原始的水标记LMM的理论。具体地说,我们建议使用适应性迅速和基于排序的算法,以精确地恢复我们关于当前主要关键LMMM的标记和MMM的精确的精确的精确的对目前进行水的磁标记。


Article 47

Title@2025-06-12 (4): Debiasing Watermarks for Large Language Models via Maximal Coupling

Title: Debiasing Watermarks for Large Language Models via Maximal Coupling Debiasing Wasserzeichen für große Sprachmodelle über Maximal Coupling 通过Maximal Coupling为大语言模型减少对水标记的偏差 2411.11203v2

Authors (5): Yangxinyu Xie, Xiang Li, Tanwi Mallick, Weijie J. Su, Ruixun Zhang

Watermarking language models is essential for distinguishing between human and machine-generated text and thus maintaining the integrity and trustworthiness of digital communication. We present a novel green/red list watermarking approach that partitions the token set into "green" and "red" lists, subtly increasing the generation probability for green tokens. To correct token distribution bias, our method employs maximal coupling, using a uniform coin flip to decide whether to apply bias correction, with the result embedded as a pseudorandom watermark signal. Theoretical analysis confirms this approach’s unbiased nature and robust detection capabilities. Experimental results show that it outperforms prior techniques by preserving text quality while maintaining high detectability, and it demonstrates resilience to targeted modifications aimed at improving text quality. This research provides a promising watermarking solution for language models, balancing effective detection with minimal impact on text quality.

水标识语言模型对于区分人类和机器生成的文本,从而保持数字通信的完整性和可信赖性至关重要。我们展示了一种新的绿色/红列表水标识方法,将标牌分割成“绿色”和“红”列表,小于增加绿色标牌的生成概率。为了纠正象征性分布偏差,我们的方法采用了最大程度的组合,使用统一硬币翻转来确定是否应用偏向纠正,结果嵌入伪随机水标识。理论分析证实了这一方法的公正性和稳健的检测能力。实验结果表明,它通过在保持高可探测性的同时维护文本质量,优于先前的技术,并展示了对旨在改善文本质量的有针对性的修改的复原力。这一研究为语言模型提供了一个有希望的标注水解决方案,平衡了有效检测,对文本质量的影响最小。


Article 48

Title@2025-06-12 (4): Sample Complexity and Representation Ability of Test-time Scaling Paradigms

Title: Sample Complexity and Representation Ability of Test-time Scaling Paradigms Probenkomplexität und Darstellungsfähigkeit von Testzeit-Skalierparadigmen 试验时间缩放模型的复杂性和代表性 2506.05295v2

Authors (8): Baihe Huang, Shanda Li, Tianhao Wu, Yiming Yang, Ameet Talwalkar, Kannan Ramchandran, Michael I. Jordan, Jiantao Jiao

Test-time scaling paradigms have significantly advanced the capabilities of large language models (LLMs) on complex tasks. Despite their empirical success, theoretical understanding of the sample efficiency of various test-time strategies – such as self-consistency, best-of-$n$, and self-correction – remains limited. In this work, we first establish a separation result between two repeated sampling strategies: self-consistency requires $\Theta(1/\Delta^2)$ samples to produce the correct answer, while best-of-$n$ only needs $\Theta(1/\Delta)$, where $\Delta < 1$ denotes the probability gap between the correct and second most likely answers. Next, we present an expressiveness result for the self-correction approach with verifier feedback: it enables Transformers to simulate online learning over a pool of experts at test time. Therefore, a single Transformer architecture can provably solve multiple tasks without prior knowledge of the specific task associated with a user query, extending the representation theory of Transformers from single-task to multi-task settings. Finally, we empirically validate our theoretical results, demonstrating the practical effectiveness of self-correction methods.

测试时间缩放范式极大地提高了大型语言模型(LLMS)在复杂任务方面的能力。尽管它们取得了经验上的成功,但对于各种测试时间战略(如自我一致性、最佳美元和自我校正等)的抽样效率的理论理解仍然有限。在这项工作中,我们首先将两种重复的抽样战略区分开来:自我一致性需要$\Tata(1/Delta)2美元样本来得出正确的答案,而最佳美元只需要$@theta(1/Delta)$($-n)$($-n)$($-delta < 1美元)就表示正确和第二最有可能的答案之间的概率差距。接下来,我们展示了自我校验者反馈的自我校正方法的清晰性结果:它使变换者能够在测试时间在一个专家库中模拟在线学习。因此,单一的变换器结构可以在不事先了解与用户查询有关的具体任务的情况下以可调和方式解决多项任务,将变换器的表述理论从单项到多任务设置之间的概率差差。最后,我们验证了实际的自我校正方法。
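The gap between the two repeated-sampling strategies can be illustrated with a toy simulation. This is only a sketch under strong assumptions that are not from the paper: answers are drawn i.i.d. from a fixed categorical distribution, index 0 is the correct answer, and best-of-$n$ has access to a perfect verifier, so it succeeds whenever the correct answer appears at least once. The function name `simulate` is hypothetical.

```python
import random
from collections import Counter
from typing import Sequence, Tuple


def simulate(probs: Sequence[float], n: int, trials: int, seed: int = 0) -> Tuple[float, float]:
    """Return (self-consistency accuracy, best-of-n accuracy).

    probs[0] is the probability of the correct answer; the gap Delta in the
    abstract corresponds to probs[0] minus the second-largest probability.
    """
    rng = random.Random(seed)
    answers = list(range(len(probs)))
    sc_hits = 0
    bon_hits = 0
    for _ in range(trials):
        samples = rng.choices(answers, weights=probs, k=n)
        # Self-consistency: return the most frequent sampled answer.
        majority = Counter(samples).most_common(1)[0][0]
        sc_hits += majority == 0
        # Best-of-n with an idealized verifier: correct if sampled at all.
        bon_hits += 0 in samples
    return sc_hits / trials, bon_hits / trials


# Small gap (Delta = 0.05): majority voting needs many more samples.
sc_acc, bon_acc = simulate([0.40, 0.35, 0.25], n=10, trials=2000)
```

Both strategies are scored on the same samples in each trial, so in this toy setting best-of-$n$ can never do worse than majority voting; the interesting quantity is how quickly each accuracy approaches 1 as `n` grows for a given gap.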


Article 49

Title@2025-06-12 (4): OmniFluids: Unified Physics Pre-trained Modeling of Fluid Dynamics

Title: OmniFluids: Unified Physics Pre-trained Modeling of Fluid Dynamics OmniFluids: Unified Physics Pre-trained Modeling of Fluid Dynamics OmniFluids: 流体动力学统一物理预培训模拟模型 2506.10862v1

Authors (6): Rui Zhang, Qi Meng, Han Wan, Yang Liu, Zhi-Ming Ma, Hao Sun

High-fidelity and efficient simulation of fluid dynamics drives progress in various scientific and engineering applications. Traditional computational fluid dynamics methods offer strong interpretability and guaranteed convergence, but rely on fine spatial and temporal meshes, incurring prohibitive computational costs. Physics-informed neural networks (PINNs) and neural operators aim to accelerate PDE solvers using deep learning techniques. However, PINNs require extensive retraining and careful tuning, and purely data-driven operators demand large labeled datasets. Hybrid physics-aware methods embed numerical discretizations into network architectures or loss functions, but achieve only marginal speed gains and become unstable when balancing coarse priors against high-fidelity measurements. To this end, we introduce OmniFluids, a unified physics pre-trained operator learning framework that integrates physics-only pre-training, coarse-grid operator distillation, and few-shot fine-tuning, which enables fast inference and accurate prediction under limited or zero data supervision. For architectural design, the key components of OmniFluids include a mixture of operators, a multi-frame decoder, and factorized Fourier layers, which enable efficient and scalable modeling of diverse physical tasks while maintaining seamless integration with physics-based supervision. Across a broad range of two- and three-dimensional benchmarks, OmniFluids significantly outperforms state-of-the-art AI-driven methods in flow field reconstruction and turbulence statistics accuracy, delivers 10-100x speedups compared to classical solvers, and accurately recovers unknown physical parameters from sparse, noisy data. This work establishes a new paradigm for efficient and generalizable surrogate modeling in complex fluid systems under limited data availability.

传统计算流体动力学方法提供强大的解释性和有保障的趋同,但依赖细微的时空模模,导致令人难以承受的计算成本。物理知情神经网络(PINNs)和神经操作员的目标是利用深层学习技术加速PDE解答器。然而,PINN需要广泛的再培训和仔细调整,而纯数据驱动操作员则需要大量贴标签的数据集。混合物理认知方法将数字分解参数嵌入网络结构或损失功能,但实现边际速度增益,在平衡粗缩的先前时间和高纤维度测量时变得不稳定。为此,我们引入了OmniFluids、统一的物理预训练操作员学习框架,将仅物理学前培训、粗电离电网操作员蒸馏和微微微调结合起来,从而可以在有限或零数据监督下快速推断和准确预测模型。对于建筑设计、OmniFlud的精确流流数据组合的关键部分包括操作员的混合体、多框架拆解的重建、以及四层集成的系统,这三维基化的实地操作的流流化和四层的系统可以保证在一般数据系统中进行高效的流流流流流中执行。


Article 50

Title@2025-06-12 (4): Energy-Efficient Deep Learning for Traffic Classification on Microcontrollers

Title: Energy-Efficient Deep Learning for Traffic Classification on Microcontrollers Energieeffizientes Deep Learning für die Verkehrsklassifizierung an Mikrocontrollern 关于微控制器交通分类的节能深层学习 2506.10851v1

Authors (4): Adel Chehade, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino

In this paper, we present a practical deep learning (DL) approach for energy-efficient traffic classification (TC) on resource-limited microcontrollers, which are widely used in IoT-based smart systems and communication networks. Our objective is to balance accuracy, computational efficiency, and real-world deployability. To that end, we develop a lightweight 1D-CNN, optimized via hardware-aware neural architecture search (HW-NAS), which achieves 96.59% accuracy on the ISCX VPN-NonVPN dataset with only 88.26K parameters, a 20.12K maximum tensor size, and 10.08M floating-point operations (FLOPs). Moreover, it generalizes across various TC tasks, with accuracies ranging from 94% to 99%. To enable deployment, the model is quantized to INT8, suffering only a marginal 1-2% accuracy drop relative to its Float32 counterpart. We evaluate real-world inference performance on two microcontrollers: the high-performance STM32F746G-DISCO and the cost-sensitive Nucleo-F401RE. The deployed model achieves inference latencies of 31.43ms and 115.40ms, with energy consumption of 7.86 mJ and 29.10 mJ per inference, respectively. These results demonstrate the feasibility of on-device encrypted traffic analysis, paving the way for scalable, low-power IoT security solutions.

在本文中,我们介绍了对资源有限的微控制器进行节能交通分类(TC)的实际深入学习(DL)方法,该方法在基于IoT的智能系统和通信网络中广泛使用。我们的目标是平衡准确性、计算效率和实际世界可部署性。为此,我们开发了一个轻量的1D-CNN,通过硬件智能神经结构搜索优化1D-CNN(HW-NAS),该方法在ISCX VPN-NONVPN数据集上实现了96.59%的准确性,该数据仅具有88.26K参数,20.12K最高拉高尺寸和10.08M浮动点操作(FLOPs)。此外,该方法在各种技术合作任务中广泛使用,从94%到99%不等。为了能够部署,该模型仅通过硬件智能神经结构搜索(HW-NAS), 其神经结构搜索中只有1至2%的精确度下降。我们评估了两个微量控制器的实时性能表现:高性STMF746G-DISCO和10M浮标定点操作操作(FLEL-CL-401M401),分别在31-F4401号部署的安全性安全性安全分析中实现了高效度的SBS-CLMLMF-40和成本性流流/40MLVLVLVA结果。
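The latency and energy figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below assumes that each energy figure covers exactly the reported inference window, so that energy divided by latency gives an average power draw; the helper `average_power_mw` is hypothetical and not part of the paper.

```python
def average_power_mw(energy_mj: float, latency_ms: float) -> float:
    """Average power in milliwatts: mJ / ms equals W, times 1000 for mW."""
    return energy_mj / latency_ms * 1000.0


# Figures as reported in the abstract (latency in ms, energy in mJ).
boards = {
    "STM32F746G-DISCO": (31.43, 7.86),
    "Nucleo-F401RE": (115.40, 29.10),
}

for name, (latency_ms, energy_mj) in boards.items():
    power = average_power_mw(energy_mj, latency_ms)
    print(f"{name}: about {power:.0f} mW average during inference")
```

Under that assumption both boards come out at roughly a quarter of a watt, which suggests the energy difference between them is driven almost entirely by the difference in latency.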


Article 51

Title@2025-06-12 (4): GraphThought: Graph Combinatorial Optimization with Thought Generation

Title: GraphThought: Graph Combinatorial Optimization with Thought Generation GraphThought: Graph Combinatorial Optimization mit Thought Generation 图图图图图图: 与新一代人相比的图形组合式优化 2502.11607v2

Authors (9): Zixiao Huang, Lifeng Guo, Wenhao Li, Junjie Sheng, Chuyun Shen, Haosheng Chen, Bo Jin, Changhong Lu, Xiangfeng Wang

Graph combinatorial optimization (GCO) problems are central to domains like logistics and bioinformatics. While traditional solvers dominate, large language models (LLMs) offer new possibilities for structured reasoning, yet struggle with complex GCO tasks requiring rigorous combinatorial analysis and multi-step deduction, often producing hallucinated steps. We first formalize the Optimal Thoughts Design (OTD) problem, which provides a structured guidance for producing high-quality intermediate reasoning steps. Building on this formulation, we introduce GraphThought, a novel framework that generates effective reasoning sequences through either heuristic-guided forward search or solver-aligned backward reasoning. By fine-tuning LLMs on these structured thought sequences, we develop Llama-GT, an 8B-parameter model that achieves state-of-the-art performance on the GraphArena benchmark, outperforming significantly larger models like DeepSeek-V3. Our results demonstrate that when scaffolded with structured reasoning priors, principled thought generation can significantly enhance LLM performance on GCO tasks without requiring increased model scale.

图形组合优化(GCO)问题是物流和生物信息学等领域的核心。虽然传统解决方案占主导地位,但大型语言模型(LLMs)为结构化推理提供了新的可能性,但与复杂的GCO任务挣扎,这些任务需要严格的组合分析和多步扣减,常常产生幻觉步骤。我们首先正式确定最佳思维设计(OTD)问题,它为产生高质量的中间推理步骤提供了结构化指导。基于这一提法,我们引入了GaphThought,这是一个通过超常引导前搜索或求解后向推理来产生有效推理序列的新框架。通过对这些结构化思维序列进行微调LLMs,我们开发了Llama-GT,8B参数模型,在GagArenna基准上达到最新表现,比DeepSeek-V3等大得多的模型表现要好得多。我们的结果显示,在以结构推理的推理推理推理推理推理推理推理推理法时,原则思想生成能够大大提高GCO任务的LM性能。


Article 52

Title@2025-06-12 (4): Accelerating Diffusion Large Language Models with SlowFast: The Three Golden Principles

Title: Accelerating Diffusion Large Language Models with SlowFast: The Three Golden Principles Beschleunigen von Diffusion Großsprachige Modelle mit SlowFast: Die drei goldenen Prinzipien 以慢速加速传播大语言模式:三大金原则 2506.10848v1

Authors (5): Qingyan Wei, Yaojie Zhang, Zhiyuan Liu, Dongrui Liu, Linfeng Zhang

Diffusion-based language models (dLLMs) have emerged as a promising alternative to traditional autoregressive LLMs by enabling parallel token generation and significantly reducing inference latency. However, existing sampling strategies for dLLMs, such as confidence-based or semi-autoregressive decoding, often suffer from static behavior, leading to suboptimal efficiency and limited flexibility. In this paper, we propose SlowFast Sampling, a novel dynamic sampling strategy that adaptively alternates between exploratory and accelerated decoding stages. Our method is guided by three golden principles: certainty principle, convergence principle, and positional principle, which govern when and where tokens can be confidently and efficiently decoded. We further integrate our strategy with dLLM-Cache to reduce redundant computation. Extensive experiments across benchmarks and models show that SlowFast Sampling achieves up to 15.63$\times$ speedup on LLaDA with minimal accuracy drop, and up to 34.22$\times$ when combined with caching. Notably, our approach outperforms strong autoregressive baselines like LLaMA3 8B in throughput, demonstrating that well-designed sampling can unlock the full potential of dLLMs for fast and high-quality generation.

以扩散为基础的语言模型(dLLMS)通过平行象征性生成和大量减少推导延迟度,成为传统自动递增的LLMS(dLLMS)的一个有希望的替代方法,它使平行象征性生成和显著减少推导延迟度,然而,现有的dLLMS取样战略,例如信心基或半自动递减解码,往往受到静态行为的影响,导致效率低下和灵活性有限。在本文中,我们建议慢速取样,这是一种新的动态抽样战略,在探索阶段和加速解码阶段之间互换。我们的方法以三项黄金原则为指导:确定性原则、趋同原则和定位原则,这些原则在何时和何地规范标志可以自信和高效地解码。我们进一步将我们的战略与dLLLMM-Case整合,以减少多余的计算。跨基准和模型的广泛实验表明,慢速采样在LLADADA中达到15.631美元,在精确下降后达到34.22\时间,加固。值得注意的是,我们的方法超越了像LLMAMA3和快速制模8B的快速制成的完整自动采样的潜力。


Article 53

Title@2025-06-12 (4): Advanced fraud detection using machine learning models: enhancing financial transaction security

Title: Advanced fraud detection using machine learning models: enhancing financial transaction security Fortgeschrittene Betrugserkennung mit maschinellen Lernmodellen: Erhöhung der finanziellen Transaktionssicherheit 利用机器学习模式发现高级欺诈:加强金融交易安全 2506.10842v1

Authors (10): Nudrat Fariha, Md Nazmuddin Moin Khan, Md Iqbal Hossain, Syed Ali Reza, Joy Chakra Bortty, Kazi Sharmin Sultana, Md Shadidur Islam Jawad, Saniah Safat, Md Abdul Ahad, Maksuda Begum

The rise of digital payments has accelerated the need for intelligent and scalable systems to detect fraud. This research presents an end-to-end, feature-rich machine learning framework for detecting credit card transaction anomalies and fraud using real-world data. The study begins by merging transactional, cardholder, merchant, and merchant category datasets from a relational database to create a unified analytical view. Through the feature engineering process, we extract behavioural signals such as average spending, deviation from historical patterns, transaction timing irregularities, and category frequency metrics. These features are enriched with temporal markers such as hour, day of week, and weekend indicators to expose latent patterns that indicate fraudulent behaviour. Exploratory data analysis reveals contextual transaction trends across all the dataset features. Using the transactional data, we train and evaluate a range of unsupervised models: Isolation Forest, One-Class SVM, and a deep autoencoder trained to reconstruct normal behavior. These models flag the top 1% of reconstruction errors as outliers. PCA visualizations illustrate each model's ability to separate anomalies in a two-dimensional latent space. We further segment the transaction landscape using K-Means clustering and DBSCAN to identify dense clusters of normal activity and isolate sparse, suspicious regions.

数字付款的上升加快了对智能和可扩缩系统的需求,以发现欺诈行为。这一研究为发现信用卡交易异常和欺诈,提供了一个端到端的、内容丰富的机器学习框架,以利用真实世界的数据来发现信用卡交易异常和欺诈。研究首先将交易、持卡人、商人和商类数据集从一个关系数据库中合并起来,以形成统一的分析观点。我们通过特征工程过程,提取了平均支出、偏离历史模式、交易时间异常和类别频率尺度等行为信号。这些特征增加了时间标记,如小时、每周一天和周末指标,以暴露显示显示欺诈行为的所有潜在模式。探索性数据分析揭示了所有数据集特征的背景交易趋势。我们利用交易数据,培训和评价一系列不受监督的模式:隔离森林、一类SVM和受过重建正常行为的训练的深层自动编码器。这些模型将重建错误的前1%标为外围值。五氯苯的可视化说明每一种模型都有能力将异常现象分解成一个二维的隐蔽空间。我们使用K-MEAs的正常集群和DBS-CAN活动区域。
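The rule of flagging the top 1% of reconstruction errors as outliers can be sketched as follows. This is a minimal illustration that assumes the per-transaction errors have already been computed by some model; the function name `flag_top_fraction` and the toy error values are hypothetical.

```python
from typing import List, Sequence


def flag_top_fraction(errors: Sequence[float], fraction: float = 0.01) -> List[int]:
    """Return the indices of the samples with the largest errors.

    Flags round(len(errors) * fraction) samples, and at least one.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(errors) * fraction))
    ranked = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return sorted(ranked[:k])


# 200 toy transactions: 198 with small error, 2 with large error.
toy_errors = [0.1] * 198 + [5.0, 7.0]
outliers = flag_top_fraction(toy_errors, fraction=0.01)
```

A fixed fraction always flags the same share of transactions regardless of how many are actually fraudulent, so the 1% figure acts as an alert budget rather than an estimate of the fraud rate.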


Article 54

Title@2025-06-12 (4): Simplicity bias and optimization threshold in two-layer ReLU networks

Title: Simplicity bias and optimization threshold in two-layer ReLU networks Einfachheits-Bias und Optimierungsschwelle in zweischichtigen ReLU-Netzwerken 双层ReLU网络中的简化偏差和优化阈值 2410.02348v2

Authors (2): Etienne Boursier, Nicolas Flammarion

Understanding generalization of overparametrized neural networks remains a fundamental challenge in machine learning. Most of the literature studies generalization from an interpolation point of view, taking convergence of parameters towards a global minimum of the training loss for granted. While overparametrized architectures indeed interpolate the data for typical classification tasks, this interpolation paradigm no longer seems valid for more complex tasks such as in-context learning or diffusion. Instead, for such tasks, it has been empirically observed that trained models go from global minima to spurious local minima of the training loss as the number of training samples becomes larger than some level we call the optimization threshold. While the former yield poor generalization to the true population loss, the latter were observed to actually correspond to the minimiser of this true loss. This paper theoretically explores this phenomenon in the context of two-layer ReLU networks. We demonstrate that, despite overparametrization, networks often converge toward simpler solutions rather than interpolating the training data, which can lead to a drastic improvement in the test loss relative to interpolating solutions. Our analysis relies on the so-called early alignment phase, during which neurons align towards specific directions. This directional alignment, which occurs in the early stage of training, leads to a simplicity bias, wherein the network approximates the ground truth model without converging to the global minimum of the training loss. Our results suggest that this bias, resulting in an optimization threshold beyond which interpolation is no longer reached, is beneficial and enhances the generalization of trained models.

对超均分的神经神经网络的普遍理解仍然是机器学习中的一项根本挑战。大多数文献大多研究从内推角度的概括性,将参数的趋同性从全球范围的最低培训损失归结到一个全球最低水平。虽然过分对称性结构实际上将典型分类任务的数据相互交织,但这种内推式模式似乎不再适用于更复杂的任务,如文中学习或传播。对于这些任务,从经验上看,经过培训的模型往往从全球最低水平到虚假的当地培训损失最低水平,因为培训样本的数量比我们所谓的优化临界水平要大得多。虽然前者的参数趋同性参数对真正的人口损失造成较低的普遍程度,但人们认为后者实际上与真正损失的最小程度相对应。本文从理论上讲,在两个层次的RELU网络中探索了这种现象。我们证明,尽管这种偏差性,网络往往趋向于更简单的模式,而不是对培训数据进行内推,这可能导致测试损失与内置解决方案的急剧改善。我们的分析依靠的是,在如此早期的神经调整阶段,在进行这种特定的调整过程中,在进行这种调整的早期的阶段,在进行这种特定的调整,在进行中,在进行这种调整过程中,在进行某种特定的调整。


Article 55

Title@2025-06-12 (4): Efficiency Robustness of Dynamic Deep Learning Systems

Title: Efficiency Robustness of Dynamic Deep Learning Systems Effizienz Robustheit dynamischer Deep Learning Systeme 动态深深学习系统的效率强强 2506.10831v1

Authors (7): Ravishka Rathnasuriya, Tingxi Li, Zexin Xu, Zihe Song, Mirazul Haque, Simin Chen, Wei Yang

Deep Learning Systems (DLSs) are increasingly deployed in real-time applications, including those in resource-constrained environments such as mobile and IoT devices. To address efficiency challenges, Dynamic Deep Learning Systems (DDLSs) adapt inference computation based on input complexity, reducing overhead. While this dynamic behavior improves efficiency, it also introduces new attack surfaces. In particular, efficiency adversarial attacks exploit these dynamic mechanisms to degrade system performance. This paper systematically explores the efficiency robustness of DDLSs, presenting the first comprehensive taxonomy of efficiency attacks. We categorize these attacks based on three dynamic behaviors: (i) attacks on dynamic computations per inference, (ii) attacks on dynamic inference iterations, and (iii) attacks on dynamic output production for downstream tasks. Through an in-depth evaluation, we analyze adversarial strategies that target DDLS efficiency and identify key challenges in securing these systems. In addition, we investigate existing defense mechanisms, demonstrating their limitations against increasingly popular efficiency attacks and the necessity for novel mitigation strategies to secure future adaptive DDLSs.

为了应对效率挑战,动态深层学习系统(DLS)根据投入复杂度调整推论计算,降低管理费用。虽然这种动态行为提高了效率,但这种行为引入了新的攻击表面。特别是,效率对抗性攻击利用了这些动态机制来降低系统性能。本文系统地探索DLS的效率强度,首次对效率攻击进行全面分类。我们根据三种动态行为对这些攻击进行了分类:(一) 攻击动态计算,根据动态推断,(二) 攻击动态推断迭代,(三) 攻击下游任务动态产出生产,通过深入评估,我们分析针对DLS效率的对抗性战略,并找出保障这些系统的关键性挑战。此外,我们调查现有的防御机制,表明它们对日益流行的效率攻击的局限性,以及有必要制定新的缓解战略,以确保今后的DLS适应性攻击。


Article 56

Title@2025-06-12 (4): On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves

Title: On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves Auf der Geometrie der Receiver-Betriebscharakteristik und Precision-Recall-Kurven 收件者操作特征和精确召回曲线几何 2504.02169v2

Authors (1): Reza Sameni

We study the geometry of Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves in binary classification problems. The key finding is that many of the most commonly used binary classification metrics are merely functions of the composition function $G := F_p \circ F_n^{-1}$, where $F_p(\cdot)$ and $F_n(\cdot)$ are the class-conditional cumulative distribution functions of the classifier scores in the positive and negative classes, respectively. This geometric perspective facilitates the selection of operating points, understanding the effect of decision thresholds, and comparison between classifiers. It also helps explain how the shapes and geometry of ROC/PR curves reflect classifier behavior, providing objective tools for building classifiers optimized for specific applications with context-specific constraints. We further explore the conditions for classifier dominance, present analytical and numerical examples demonstrating the effects of class separability and variance on ROC and PR geometries, and derive a link between the positive-to-negative class leakage function $G(\cdot)$ and the Kullback–Leibler divergence. The framework highlights practical considerations, such as model calibration, cost-sensitive optimization, and operating point selection under real-world capacity constraints, enabling more informed approaches to classifier deployment and decision-making.

在二元分类问题中,我们研究了接收者操作特征(ROC)曲线和精确率-召回率(PR)曲线的几何性质。关键结论是,许多最常用的二元分类指标只是复合函数 $G := F_p \circ F_n^{-1}$ 的函数,其中 $F_p(\cdot)$ 和 $F_n(\cdot)$ 分别是分类器得分在正类和负类中的类条件累积分布函数。这一几何视角有助于选择操作点、理解决策阈值的影响以及比较不同分类器。它还有助于解释ROC/PR曲线的形状和几何如何反映分类器行为,为构建在特定情境约束下针对具体应用优化的分类器提供客观工具。我们进一步探讨了分类器占优的条件,给出了解析和数值实例以说明类别可分性和方差对ROC与PR几何的影响,并推导了正到负类泄漏函数 $G(\cdot)$ 与Kullback–Leibler散度之间的联系。该框架强调了模型校准、成本敏感优化以及现实容量约束下的操作点选择等实际考虑,从而为分类器部署和决策提供更有依据的方法。
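
As a rough illustration of the geometric view in the abstract above (not code from the paper), the sketch below builds an empirical ROC curve directly from the two class-conditional CDFs. It assumes the convention that a sample is predicted positive when its score exceeds the threshold $t$, so that FPR$(t) = 1 - F_n(t)$ and TPR$(t) = 1 - F_p(t)$; for continuous, strictly increasing CDFs, eliminating $t$ then gives TPR $= 1 - G(1 - \text{FPR})$. The helper names `ecdf`, `roc_points` and `auc` are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a sample as a callable t -> P(score <= t)."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n


def roc_points(neg_scores, pos_scores):
    """ROC points (FPR, TPR), predicting positive when score > threshold."""
    F_n, F_p = ecdf(neg_scores), ecdf(pos_scores)
    thresholds = sorted(set(neg_scores) | set(pos_scores))
    points = [(1.0, 1.0)]  # threshold below every score: everything is positive
    for t in thresholds:
        points.append((1.0 - F_n(t), 1.0 - F_p(t)))
    return sorted(points)  # the largest threshold contributes (0, 0)


def auc(points):
    """Trapezoidal area under ROC points sorted by FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


neg = [0.1, 0.2, 0.35, 0.4]   # scores of the negative class
pos = [0.3, 0.6, 0.7, 0.9]    # scores of the positive class
roc = roc_points(neg, pos)
area = auc(roc)
```

On this toy data the area equals the fraction of (positive, negative) pairs that are ranked correctly, 14 out of 16.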


Article 57

Title@2025-06-12 (4): Learning richness modulates equality reasoning in neural networks

Title: Learning richness modulates equality reasoning in neural networks Lernreichtum moduliert Gleichstellungsabwägungen in neuronalen Netzwerken 学习丰富 适应神经网络的平等推理 2503.09781v2

Authors (2): William L. Tong, Cengiz Pehlevan

Equality reasoning is ubiquitous and purely abstract: sameness or difference may be evaluated no matter the nature of the underlying objects. As a result, same-different (SD) tasks have been extensively studied as a starting point for understanding abstract reasoning in humans and across animal species. With the rise of neural networks that exhibit striking apparent proficiency for abstractions, equality reasoning in these models has also gained interest. Yet despite extensive study, conclusions about equality reasoning vary widely and with little consensus. To clarify the underlying principles in learning SD tasks, we develop a theory of equality reasoning in multi-layer perceptrons (MLP). Following observations in comparative psychology, we propose a spectrum of behavior that ranges from conceptual to perceptual outcomes. Conceptual behavior is characterized by task-specific representations, efficient learning, and insensitivity to spurious perceptual details. Perceptual behavior is characterized by strong sensitivity to spurious perceptual details, accompanied by the need for exhaustive training to learn the task. We develop a mathematical theory to show that an MLP’s behavior is driven by learning richness. Rich-regime MLPs exhibit conceptual behavior, whereas lazy-regime MLPs exhibit perceptual behavior. We validate our theoretical findings in vision SD experiments, showing that rich feature learning promotes success by encouraging hallmarks of conceptual behavior. Overall, our work identifies feature learning richness as a key parameter modulating equality reasoning, and suggests that equality reasoning in humans and animals may similarly depend on learning richness in neural circuits.

平等推理是无处不在的,纯粹抽象的:无论基本对象的性质如何,都可能评价相同或差异。因此,作为理解人类和动物物种抽象推理的起点,对相同或差异(SD)任务进行了广泛的研究。随着神经网络的兴起,明显精准抽象,这些模型的平等推理也引起了兴趣。尽管进行了广泛的研究,关于平等推理的结论大相径庭,而且很少达成共识。为了澄清学习SD任务的基本原则,我们在多层感知机(MLP)中发展了一种平等推理理论。在比较心理学的观察之后,我们提出了从概念到感知结果的多种行为。概念行为的特点有:特定任务的表现、高效的学习、对虚假感知细节的不敏感性。感知行为的特点是对虚假感知细节高度敏感,同时需要详尽的训练才能学会任务。我们开发了一个数学理论,表明MLP的行为由学习的丰富程度驱动。丰富机制(rich regime)的MLP表现出概念行为,而惰性机制(lazy regime)的MLP表现出感知行为。我们在视觉SD实验中验证了理论发现,表明丰富的特征学习通过促进概念行为的特征而带来成功。总体而言,我们的工作将特征学习的丰富程度确定为调节平等推理的关键参数,并表明人类和动物的平等推理可能同样取决于神经回路中学习的丰富程度。


Article 58

Title@2025-06-12 (4): Empirical and computer-aided robustness analysis of long-step and accelerated methods in smooth convex optimization

Title: Empirical and computer-aided robustness analysis of long-step and accelerated methods in smooth convex optimization Empirische und computergestützte Robustheitsanalyse von langen und beschleunigten Methoden in reibungsloser konvexer Optimierung 经验分析和计算机辅助的稳健性分析,分析如何以长步骤和加速方法顺利优化顺流优化 2506.09730v2

Authors (2): Pierre Vernimmen, François Glineur

This work assesses both empirically and theoretically, using the performance estimation methodology, how robust different first-order optimization methods are when subject to relative inexactness in their gradient computations. Relative inexactness occurs, for example, when compressing the gradient using fewer bits of information, which happens when dealing with large-scale problems on GPUs. Three major families of methods are analyzed: constant step gradient descent, long-step methods, and accelerated methods. The latter two are first shown to be theoretically not robust to inexactness. Then, a semi-heuristic shortening factor is introduced to improve their theoretical guarantees. All methods are subsequently tested on a concrete inexact problem, with two different types of relative inexactness, and it is observed that both accelerated methods are much more robust than expected, and that the shortening factor significantly helps the long-step methods. In the end, all shortened methods appear to be promising, even in this inexact setting.

这项工作利用业绩估计方法,从经验上和理论上评估了在梯度计算中相对不精确时,不同一级优化方法的稳健程度如何。例如,在使用较少的信息块压缩梯度时,就会出现相对不准确的情况,这在处理GPU的大规模问题时发生。对方法的三大类进行了分析:不变的梯度梯度下降、长步骤方法和加速方法。后两类方法首先显示在理论上不精确。然后,采用了半超速缩短系数来改进其理论保障。所有方法随后都经过一个具体的不精确问题的测试,有两种不同的相对不准确性,据观察,两种加速方法都比预期的强得多,缩短系数对长步骤方法大有帮助。最后,所有缩短的方法似乎在理论上都不可靠,即使在这种精确的环境下也是有希望的。
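
To make the notion of relative inexactness concrete, here is a minimal sketch (an illustration, not the authors' experimental setup). It runs constant-step gradient descent on a diagonal quadratic $f(x) = \tfrac12 \sum_i \lambda_i x_i^2$ with a perturbed gradient $\tilde g$ satisfying $\|\tilde g - \nabla f(x)\| \le \varepsilon \|\nabla f(x)\|$. The perturbation model (independent per-coordinate rescaling), the `step_scale` parameter standing in for a shortening factor, and all names are assumptions made for this example.

```python
import random


def inexact_gradient(grad, eps, rng):
    """Rescale each coordinate by a factor in [1 - eps, 1 + eps],
    which guarantees ||g_tilde - g|| <= eps * ||g||."""
    return [g * (1.0 + eps * rng.uniform(-1.0, 1.0)) for g in grad]


def gradient_descent(eigs, x0, eps, steps, step_scale=1.0, seed=0):
    """Minimize f(x) = 0.5 * sum(l_i * x_i^2) with step size step_scale / L.

    Returns the final objective value."""
    rng = random.Random(seed)
    L = max(eigs)  # smoothness constant of the quadratic
    x = list(x0)
    for _ in range(steps):
        grad = [l * xi for l, xi in zip(eigs, x)]
        g_tilde = inexact_gradient(grad, eps, rng)
        x = [xi - (step_scale / L) * gi for xi, gi in zip(x, g_tilde)]
    return 0.5 * sum(l * xi * xi for l, xi in zip(eigs, x))


eigs, x0 = [1.0, 10.0], [1.0, 1.0]          # initial objective is 5.5
f_exact = gradient_descent(eigs, x0, eps=0.0, steps=50)
f_inexact = gradient_descent(eigs, x0, eps=0.3, steps=50)
```

Plain gradient descent keeps converging here despite 30% relative error; the abstract's point is that long-step and accelerated schemes are less forgiving in theory unless their steps are shortened.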


Article 59

Title@2025-06-12 (4): General targeted machine learning for modern causal mediation analysis

Title: General targeted machine learning for modern causal mediation analysis Allgemeines gezieltes maschinelles Lernen für moderne kausale Mediationsanalyse 现代因果调解分析通用定向机器学习 2408.14620v2

Authors (4): Richard Liu, Nicholas T. Williams, Kara E. Rudolph, Iván Díaz

Causal mediation analyses investigate the mechanisms through which causes exert their effects, and are therefore central to scientific progress. The literature on the non-parametric definition and identification of mediational effects in rigorous causal models has grown significantly in recent years, and there has been important progress in addressing challenges in the interpretation and identification of such effects. Despite great progress on the causal inference front, statistical methodology for non-parametric estimation has lagged behind, with few or no methods available for tackling non-parametric estimation in the presence of multiple, continuous, or high-dimensional mediators. In this paper we show that the identification formulas for six popular non-parametric approaches to mediation analysis proposed in recent years can be recovered from just two statistical estimands. We leverage this finding to propose an all-purpose one-step estimation algorithm that can be coupled with machine learning in any mediation study that uses any of these six definitions of mediation. The estimators have desirable properties, such as $\sqrt{n}$-convergence and asymptotic normality. Estimating the first-order correction for the one-step estimator requires estimation of complex density ratios on the potentially high-dimensional mediators, a challenge that is solved using recent advancements in so-called Riesz learning. We illustrate the properties of our methods in a simulation study and illustrate their use on real data to estimate the extent to which pain management practices mediate the total effect of having a chronic pain disorder on opioid use disorder.

因果中介分析研究原因发挥作用的机制,因此是科学进步的核心。近年来,关于在严格因果模型中非参数地定义和识别中介效应的文献显著增长,在解释和识别这类效应方面也取得了重要进展。尽管因果推断方面进展巨大,非参数估计的统计方法仍然落后,在存在多个、连续或高维中介变量的情况下,可用的方法很少甚至没有。在本文中,我们表明,近年来提出的六种流行的非参数中介分析方法的识别公式可以仅从两个统计估计目标(estimand)中恢复。我们利用这一发现提出了一种通用的一步估计算法,可在使用这六种中介定义中任何一种的中介研究中与机器学习相结合。这些估计量具有理想的性质,例如 $\sqrt{n}$ 收敛性和渐近正态性。估计一步估计量的一阶修正需要估计潜在高维中介变量上的复杂密度比,这一难题可借助所谓Riesz学习的最新进展加以解决。我们通过模拟研究展示了方法的性质,并在真实数据上说明其用途,以估计疼痛管理做法在多大程度上中介了慢性疼痛障碍对阿片类药物使用障碍的总效应。


Article 60

Title@2025-06-12 (4): Detecting High-Stakes Interactions with Activation Probes

Title: Detecting High-Stakes Interactions with Activation Probes Ermitteln von High-Stakes-Interaktionen mit Aktivierungssonden 与活性检测检测检测检测高吸量相互作用 2506.10805v1

Authors (7): Alex McKenzie, Urja Pawar, Phil Blandfort, William Bankes, David Krueger, Ekdeep Singh Lubana, Dmitrii Krasheninnikov

Monitoring is an important aspect of safely deploying Large Language Models (LLMs). This paper examines activation probes for detecting “high-stakes” interactions – where the text indicates that the interaction might lead to significant harm – as a critical, yet underexplored, target for such monitoring. We evaluate several probe architectures trained on synthetic data, and find them to exhibit robust generalization to diverse, out-of-distribution, real-world data. Probes’ performance is comparable to that of prompted or finetuned medium-sized LLM monitors, while offering computational savings of six orders of magnitude. Our experiments also highlight the potential of building resource-aware hierarchical monitoring systems, where probes serve as an efficient initial filter and flag cases for more expensive downstream analysis. We release our novel synthetic dataset and codebase to encourage further study.

本文审查了用于探测“高接触”相互作用的激活探测器 – – 文本指出这种相互作用可能导致重大伤害 – – 作为这种监测的关键但探索不足的目标。我们评估了几个经过合成数据培训的探测结构,发现这些结构能够展示出对多样化的、分布外的、真实世界数据进行有力概括的特征。探头的性能与激励或微调的中等规模LLM监测器的性能相当,同时提供了六个微量级的计算节余。我们的实验还突显了建立有资源意识的等级监测系统的潜力,在这种系统中,探测器作为高效的初始过滤器和标志案例,用于更昂贵的下游分析。我们发行了我们新的合成数据集和代码库,以鼓励进一步研究。
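
For readers unfamiliar with activation probes, the following is a minimal sketch of the simplest kind: a logistic-regression probe trained on fixed activation vectors. It is an assumption-laden toy, not one of the probe architectures evaluated in the paper. The 3-dimensional "activations" are synthetic Gaussians in which one direction carries the label, and `train_probe` and `probe_score` are hypothetical names.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def probe_score(w, b, act):
    """Probability assigned by the linear probe to the 'high-stakes' class."""
    return sigmoid(sum(wi * ai for wi, ai in zip(w, act)) + b)


def train_probe(acts, labels, lr=0.5, epochs=200):
    """Fit logistic regression with full-batch gradient descent."""
    d, n = len(acts[0]), len(acts)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for a, y in zip(acts, labels):
            err = probe_score(w, b, a) - y   # gradient of the log-loss w.r.t. the logit
            for j in range(d):
                gw[j] += err * a[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


# Synthetic stand-in for model activations: dimension 0 encodes the label.
rng = random.Random(0)
acts, labels = [], []
for _ in range(50):
    acts.append([rng.gauss(1.5, 0.5), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
    labels.append(1)
    acts.append([rng.gauss(-1.5, 0.5), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
    labels.append(0)

w, b = train_probe(acts, labels)
accuracy = sum((probe_score(w, b, a) > 0.5) == bool(y)
               for a, y in zip(acts, labels)) / len(acts)
```

At inference time such a probe costs one dot product per example on activations the model has already computed, which is where the large savings over a separate LLM monitor would come from.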


Article 61

Title@2025-06-12 (4): Near-Optimal Algorithms for Constrained k-Center Clustering with Instance-level Background Knowledge

Title: Near-Optimal Algorithms for Constrained k-Center Clustering with Instance-level Background Knowledge Optimale Algorithmen für eingeschränktes k-Center-Clustering mit Instance-Level-Hintergrundwissen 受约束的 k- Center 集成实情背景知识的近最佳 K- Center 算法 2401.12533v4

Authors (5): Longkun Guo, Chaoqi Jia, Kewen Liao, Zhigang Lu, Minhui Xue

Center-based clustering has attracted significant research interest from both theory and practice. In many practical applications, input data often contain background knowledge that can be used to improve clustering results. In this work, we build on widely adopted $k$-center clustering and model its input background knowledge as must-link (ML) and cannot-link (CL) constraint sets. However, most clustering problems including $k$-center are inherently $\mathcal{NP}$-hard, while the more complex constrained variants are known to face more severe approximation and computation barriers that significantly limit their applicability. By employing a suite of techniques including reverse dominating sets, linear programming (LP) integral polyhedron, and LP duality, we arrive at the first efficient approximation algorithm for constrained $k$-center with the best possible ratio of 2. We also construct competitive baseline algorithms and empirically evaluate our approximation algorithm against them on a variety of real datasets. The results validate our theoretical findings and demonstrate the great advantages of our algorithm in terms of clustering cost, clustering quality, and running time.

以中心为主的集群在理论和实践两方面都引起了重要的研究兴趣。在许多实际应用中,输入数据往往包含可用于改进集群结果的背景知识。在这项工作中,我们以广泛采用的以美元为核心的集群为基础,并以其输入背景知识作为必须链接(ML)和不能链接(CL)的制约组合模型进行建模。然而,包括美元中枢在内的大多数集群问题本质上是美元硬的,而已知较复杂的受限制变量会遭受严重近似和计算障碍,从而严重限制其适用性。通过使用一系列技术,包括反向主导数据集、线性编程(LP)整体多元元和LP的双重性,我们达到了限制美元中枢的第一个有效近似算法,其最佳比率为2。 我们还建立了竞争性基线算法,并用经验评估我们根据各种真实数据集进行的近似算法。结果证实了我们的理论结论,并展示了我们的算法在组合成本、组合质量和运行时间方面的巨大优势。
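
For context on the ratio of 2 mentioned above, the sketch below shows the classic farthest-first (Gonzalez) greedy, which is a well-known 2-approximation for plain, unconstrained $k$-center. It is not the paper's algorithm: the must-link and cannot-link constraints, the reverse dominating sets and the LP machinery are not reproduced here, and the function name is hypothetical.

```python
import math


def farthest_first(points, k):
    """Greedy k-center: repeatedly add the point farthest from the current centers.

    Returns the chosen centers and the resulting clustering radius."""
    centers = [points[0]]
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers, max(dist)


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers, radius = farthest_first(points, k=2)
```

Cannot-link constraints in particular break this greedy, since two points forced into different clusters may both be closest to the same center; that is one reason the constrained variant is harder.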


Article 62

Title@2025-06-12 (4): Dense Associative Memory with Epanechnikov Energy

Title: Dense Associative Memory with Epanechnikov Energy Dichtes assoziatives Gedächtnis mit Epanechnikov Energie Epanechnikov能源的密集联合记忆 2506.10801v1

Authors (5): Benjamin Hoover, Zhaoyang Shi, Krishnakumar Balasubramanian, Dmitry Krotov, Parikshit Ram

We propose a novel energy function for Dense Associative Memory (DenseAM) networks, the log-sum-ReLU (LSR), inspired by optimal kernel density estimation. Unlike the common log-sum-exponential (LSE) function, LSR is based on the Epanechnikov kernel and enables exact memory retrieval with exponential capacity without requiring exponential separation functions. Moreover, it introduces abundant additional emergent local minima while preserving perfect pattern recovery – a characteristic previously unseen in DenseAM literature. Empirical results show that LSR energy has significantly more local minima (memories) that have comparable log-likelihood to LSE-based models. Analysis of LSR’s emergent memories on image datasets reveals a degree of creativity and novelty, hinting at this method’s potential for both large-scale memory storage and generative tasks.

我们为密集联想记忆(DenseAM)网络提出了一种新的能量函数,即受最优核密度估计启发的log-sum-ReLU(LSR)。与常用的log-sum-exponential(LSE)函数不同,LSR基于Epanechnikov核,能够以指数级容量实现精确的记忆检索,而不需要指数型分离函数。此外,它在保持完美模式恢复的同时引入了大量额外的涌现(emergent)局部极小值,这是DenseAM文献中前所未见的特性。实证结果表明,LSR能量具有明显更多的局部极小值(记忆),其对数似然与基于LSE的模型相当。对LSR在图像数据集上的涌现记忆的分析揭示了一定程度的创造性和新颖性,暗示了该方法在大规模记忆存储和生成任务两方面的潜力。
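
The contrast between the two energies can be seen in one dimension. The sketch below assumes the forms $E_{\mathrm{LSE}}(x) = -\tfrac{1}{\beta}\log\sum_\mu \exp(-\tfrac{\beta}{2}(x-\xi_\mu)^2)$ and $E_{\mathrm{LSR}}(x) = -\tfrac{1}{\beta}\log\big(\epsilon + \sum_\mu \mathrm{ReLU}(1-\tfrac{\beta}{2}(x-\xi_\mu)^2)\big)$; the abstract does not spell these out, so the exact parameterisation (including the small constant $\epsilon$) is an assumption. Because the Epanechnikov kernel has compact support, a well-separated stored pattern is an exact minimum of the LSR energy, whereas the LSE minimum is pulled slightly toward the other memory.

```python
import math


def lse_energy(x, memories, beta):
    """Log-sum-exponential energy (Gaussian kernel)."""
    total = sum(math.exp(-0.5 * beta * (x - m) ** 2) for m in memories)
    return -math.log(total) / beta


def lsr_energy(x, memories, beta, eps=1e-9):
    """Log-sum-ReLU energy (Epanechnikov kernel); eps keeps the log finite
    outside every kernel's support. Assumed form, see the note above."""
    total = sum(max(0.0, 1.0 - 0.5 * beta * (x - m) ** 2) for m in memories)
    return -math.log(eps + total) / beta


memories = [0.0, 1.0]
beta = 8.0        # kernel support radius is sqrt(2 / beta) = 0.5
delta = 1e-3

# LSR: the stored pattern at 0 is lower in energy than its neighbours on both sides.
lsr_at_memory = lsr_energy(0.0, memories, beta)
lsr_right = lsr_energy(delta, memories, beta)
lsr_left = lsr_energy(-delta, memories, beta)

# LSE: stepping slightly toward the other memory lowers the energy,
# so the minimum is not exactly at the stored pattern.
lse_at_memory = lse_energy(0.0, memories, beta)
lse_right = lse_energy(delta, memories, beta)
```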


Article 63

Title@2025-06-12 (4): Divide-Fuse-Conquer: Eliciting “Aha Moments” in Multi-Scenario Games

Title: Divide-Fuse-Conquer: Eliciting “Aha Moments” in Multi-Scenario Games Divide-Fuse-Conquer: Eliciting “Aha Momente” in Multi-Szenario-Spiele 分裂-裂变:在多种场景运动会中激发“哈动力” 2505.16401v4

Authors (8): Xiaoqing Zhang, Huabin Zheng, Ang Lv, Yuhan Liu, Zirui Song, Xiuying Chen, Rui Yan, Flood Sung

Large language models (LLMs) have been observed to suddenly exhibit advanced reasoning abilities during reinforcement learning (RL), resembling an “aha moment” triggered by simple outcome-based rewards. While RL has proven effective in eliciting such breakthroughs in tasks involving mathematics, coding, and vision, it faces significant challenges in multi-scenario games. The diversity of game rules, interaction modes, and environmental complexities often leads to policies that perform well in one scenario but fail to generalize to others. Simply combining multiple scenarios during training introduces additional challenges, such as training instability and poor performance. To overcome these challenges, we propose Divide-Fuse-Conquer, a framework designed to enhance generalization in multi-scenario RL. This approach starts by heuristically grouping games based on characteristics such as rules and difficulties. Specialized models are then trained for each group to excel at the games in that group; this is what we refer to as the divide step. Next, we fuse model parameters from different groups as a new model, and continue training it for multiple groups, until the scenarios in all groups are conquered. Experiments across 18 TextArena games show that Qwen2.5-32B-Align trained with the Divide-Fuse-Conquer strategy reaches a performance level comparable to Claude3.5, achieving 7 wins and 4 draws. We hope our approach can inspire future research on using reinforcement learning to improve the generalization of LLMs.

大型语言模型(LLMS)被观察到在强化学习(RL)期间突然展示了先进的推理能力,类似于简单的基于结果的奖励触发了“aha moment ” 的“aha moment ” 。虽然RL已证明在数学、编码和愿景等任务中取得突破是有效的,但它在多角度的游戏中面临重大挑战。游戏规则、互动模式和环境复杂性的多样性往往导致在一种情景中表现良好,但未能推广到其他情景。仅仅在培训期间将多种情景结合起来就带来了额外的挑战,例如培训不稳定和业绩不佳。为了克服这些挑战,我们提议了一个旨在强化多角度RL中通用化的框架。这个方法始于基于规则和困难等特点的超理论性组合游戏。然后,我们为每个群体培训了在集团游戏中优异的模型,我们称之为分化步骤。接下来,我们将不同群体的模式参数融合为新的模式,并继续为多个群体提供培训,直到所有群体的情况都被征服。在18个TextArnaF游戏中进行实验,通过经过培训的升级的升级战略来提升我们未来的学习水平。


Article 64

Title@2025-06-12 (4): Adaptive Federated LoRA in Heterogeneous Wireless Networks with Independent Sampling

Title: Adaptive Federated LoRA in Heterogeneous Wireless Networks with Independent Sampling Adaptives Federated LoRA in heterogenen drahtlosen Netzwerken mit unabhängiger Probenahme 具有独立抽样调查的多源无线网络中的联邦适应性 2505.23555v2

Authors (7): Yanzhao Hou, Jiaxiang Geng, Boyu Li, Xiaofeng Tao, Juncheng Wang, Xiaodong Xu, Bing Luo

Federated LoRA has emerged as a promising technique for efficiently fine-tuning large language models (LLMs) on distributed devices by reducing the number of trainable parameters. However, existing approaches often overlook the theoretical and practical implications of system and data heterogeneity, thereby failing to optimize the overall training efficiency, particularly in terms of wall-clock time. In this paper, we propose an adaptive federated LoRA strategy with independent client sampling to minimize the convergence wall-clock time of federated fine-tuning under both computation and communication heterogeneity. We first derive a new convergence bound for federated LoRA with arbitrary and independent client sampling, notably without requiring the stringent bounded gradient assumption. Then, we introduce an adaptive bandwidth allocation scheme that accounts for heterogeneous client resources and system bandwidth constraints. Based on the derived theory, we formulate and solve a non-convex optimization problem to jointly determine the LoRA sketching ratios and sampling probabilities, aiming to minimize wall-clock convergence time. An efficient and low-complexity algorithm is developed to approximate the solution. Finally, extensive experiments demonstrate that our approach significantly reduces wall-clock training time compared to state-of-the-art methods across various models and datasets.

通过减少可训练参数的数量,联邦洛拉联盟已成为高效微调分布式设备上大型语言模型(LLMs)的一个很有希望的技术,通过减少可训练参数的数量,可以有效地微调分布式设备上的大型语言模型(LLMs),但是,现有的方法往往没有适当地忽视系统和数据差异的理论和实践影响,从而未能优化总体培训效率,特别是墙时时段的培训效率。在本文件中,我们提出了一个适应性的联邦洛拉联盟战略,通过独立客户抽样,尽量减少计算和通信差异性两种情况下联合微调的同步时间。我们首先为具有任意和独立客户抽样的联邦洛拉公司找到新的趋同点,特别是不需要严格的封闭梯度假设。然后,我们引入了适应性带宽分配计划,考虑到各种客户资源和系统带宽限制。根据推理,我们制定并解决非凝固型优化问题,共同确定洛拉的草图比例和取样概率,目的是最大限度地减少墙时段的趋同时间。我们制定了高效和低兼容性的算法,以近解决方案。最后,广泛的实验表明我们的做法大大缩短了各种壁点培训时间和不同状态的数据。
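
Independent client sampling means each client $i$ joins a round with its own probability $q_i$, independently of the others. The abstract does not state the aggregation rule, so the sketch below uses the standard inverse-probability reweighting as an assumption: scaling each sampled update by $w_i / q_i$ makes the aggregate an unbiased estimate of the full-participation average $\sum_i w_i \Delta_i$. Updates are scalars here for brevity, and all names are hypothetical.

```python
from itertools import product


def aggregate(updates, weights, probs, sampled):
    """Inverse-probability-weighted aggregate over the sampled clients."""
    return sum(w * u / q
               for u, w, q, s in zip(updates, weights, probs, sampled) if s)


def expected_aggregate(updates, weights, probs):
    """Exact expectation of the aggregate, enumerating every participation pattern."""
    total = 0.0
    for mask in product([0, 1], repeat=len(updates)):
        p = 1.0
        for s, q in zip(mask, probs):
            p *= q if s else 1.0 - q   # clients are sampled independently
        total += p * aggregate(updates, weights, probs, mask)
    return total


updates = [1.0, -2.0, 4.0]    # per-client updates (scalars for illustration)
weights = [0.5, 0.3, 0.2]     # data-size weights, summing to 1
probs = [0.9, 0.5, 0.2]       # heterogeneous participation probabilities

full_participation = sum(w * u for w, u in zip(weights, updates))
expectation = expected_aggregate(updates, weights, probs)
```

Rarely sampled clients get large weights $w_i / q_i$, which inflates variance; trading that variance against per-round wall-clock time is the kind of choice the paper's optimization over sampling probabilities addresses.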


Article 65

Title@2025-06-12 (4): Mimicking Human Intuition: Cognitive Belief-Driven Reinforcement Learning

Title: Mimicking Human Intuition: Cognitive Belief-Driven Reinforcement Learning Mimikking Human Intuition: Kognitives Glauben-getriebenes Stärkungslernen 模拟人类感知:认知信仰 – – 强化学习 2410.01739v3

Authors (3): Xingrui Gu, Guanren Qiao, Chuyi Jiang

Traditional reinforcement learning (RL) methods mainly rely on trial-and-error exploration, often lacking mechanisms to guide agents toward more informative decision-making and struggling to leverage past experiences, resulting in low sample efficiency. To overcome this issue, we propose an innovative framework inspired by cognitive principles: Cognitive Belief-Driven Reinforcement Learning (CBD-RL). By incorporating cognitive heuristics, CBD-RL transforms conventional trial-and-error learning into a more structured and guided learning paradigm, simulating the human reasoning process. This framework’s core is a belief system that optimizes action probabilities by integrating feedback with prior experience, thus enhancing decision making under uncertainty. It also organizes state-action pairs into meaningful categories, promoting generalization and improving sample efficiency. The concrete implementations of this framework, CBDQ, CBDPPO, and CBDSAC, demonstrate superior performance in discrete and continuous action spaces in diverse environments such as Atari and MuJoCo. By bridging cognitive science and reinforcement learning, this research opens a new avenue for developing RL systems that are more interpretable, efficient, and cognitively inspired.

传统强化学习(RL)方法主要依赖试探和试探探索,往往缺乏机制来引导代理人进行信息更加丰富的决策,并努力利用过去的经验,从而降低抽样效率。为了克服这一问题,我们提议了一个由认知原则启发的创新框架:认知信仰驱动强化学习(CBD-RL),通过将认知超常性学纳入一个更有条理、更有指导的学习模式,CBD-RL将常规试探和试探学习转化成一个更有条理的学习模式,模拟人类推理过程。这个框架的核心是一个信仰系统,它通过将反馈与以往的经验结合起来,优化行动概率,从而增强在不确定性下的决策。它还将州际行动对口组织成有意义的类别,促进总体化并提高抽样效率。这个框架(CBDQ、CBOPPO和CDSAC)的具体实施展示了在诸如Atari和MuJoCo等不同环境中的离散和连续行动空间的优异性表现。通过连接认知科学和强化学习,这一研究为开发更易解释、高效和有认知灵感的RL系统开辟了一条新途径。


Article 66

Title@2025-06-12 (4): Incentivizing Quality Text Generation via Statistical Contracts

Title: Incentivizing Quality Text Generation via Statistical Contracts Förderung der Erzeugung von Qualitätstexten durch statistische Verträge 通过统计合同鼓励质量制文本 2406.11118v2

Authors (3): Eden Saig, Ohad Einav, Inbal Talgam-Cohen

While the success of large language models (LLMs) increases demand for machine-generated text, current pay-per-token pricing schemes create a misalignment of incentives known in economics as moral hazard: Text-generating agents have strong incentive to cut costs by preferring a cheaper model over the cutting-edge one, and this can be done “behind the scenes” since the agent performs inference internally. In this work, we approach this issue from an economic perspective, by proposing a pay-for-performance, contract-based framework for incentivizing quality. We study a principal-agent game where the agent generates text using costly inference, and the contract determines the principal’s payment for the text according to an automated quality evaluation. Since standard contract theory is inapplicable when internal inference costs are unknown, we introduce cost-robust contracts. As our main theoretical contribution, we characterize optimal cost-robust contracts through a direct correspondence to optimal composite hypothesis tests from statistics, generalizing a result of Saig et al. (NeurIPS’23). We evaluate our framework empirically by deriving contracts for a range of objectives and LLM evaluation benchmarks, and find that cost-robust contracts sacrifice only a marginal increase in objective value compared to their cost-aware counterparts.

虽然大型语言模型(LLMs)的成功增加了对机器生成文本的需求,但目前的按成本计价计划却造成了对经济学中已知的道德危害的激励因素的不匹配: 文本制作商通过选择更便宜的模型而不是尖端模型而强烈地鼓励削减成本,而且这样做可以“在幕后”进行,因为该代理商在内部进行推断。在这项工作中,我们从经济角度处理该问题,提出一个基于合同的按业绩计酬框架来激励质量。我们研究了一种主要代理商游戏,即代理商使用昂贵的推断生成文本,而合同根据自动质量评估确定文本的本金付款。由于标准合同理论在内部推论成本未知的情况下不适用,我们引入了成本-破坏合同。作为我们的主要理论贡献,我们通过直接对应统计中的最佳综合假设测试,概括赛格等人的结果,我们通过将成本与成本比比比,我们仅将成本比对目标和成本比比的牺牲合同来评估我们的框架。(NeurIPS23)。


Article 67

Title@2025-06-12 (4): SlotPi: Physics-informed Object-centric Reasoning Models

Title: SlotPi: Physics-informed Object-centric Reasoning Models SlotPi: Physik-informierte objektzentrierte Vernunftmodelle SlotPi:物理学-知情的物体中心理由模型 2506.10778v1

Authors (11): Jian Li, Wan Han, Ning Lin, Yu-Liang Zhan, Ruizhi Chengze, Haining Wang, Yi Zhang, Hongsheng Liu, Zidong Wang, Fan Yu, Hao Sun

Understanding and reasoning about dynamics governed by physical laws through visual observation, akin to human capabilities in the real world, poses significant challenges. Currently, object-centric dynamic simulation methods, which emulate human behavior, have achieved notable progress but overlook two critical aspects: 1) the integration of physical knowledge into models. Humans gain physical insights by observing the world and apply this knowledge to accurately reason about various dynamic scenarios; 2) the validation of model adaptability across diverse scenarios. Real-world dynamics, especially those involving fluids and objects, demand models that not only capture object interactions but also simulate fluid flow characteristics. To address these gaps, we introduce SlotPi, a slot-based physics-informed object-centric reasoning model. SlotPi integrates a physical module based on Hamiltonian principles with a spatio-temporal prediction module for dynamic forecasting. Our experiments highlight the model’s strengths in tasks such as prediction and Visual Question Answering (VQA) on benchmark and fluid datasets. Furthermore, we have created a real-world dataset encompassing object interactions, fluid dynamics, and fluid-object interactions, on which we validated our model’s capabilities. The model’s robust performance across all datasets underscores its strong adaptability, laying a foundation for developing more advanced world models.

通过视觉观测,与现实世界中的人类能力相近,对物理法规范的动态的理解和推理提出了重大挑战。目前,以物体为中心的动态模拟方法,与人类行为相似,取得了显著的进展,但忽略了两个关键方面:(1)物理知识融入模型;人类通过观察世界而获得物理洞察力,并将这种知识应用于对各种动态假设的准确理解;(2)对不同情景模型的适应性进行验证;(2)真实世界动态,特别是涉及流体和物体的动态,要求不仅捕捉物体相互作用,而且模拟流体流体特征的模型。为弥补这些差距,我们引入了SlotPi,即基于空格的物理知情物理学物体中心推理模型。SlotPi将基于汉密尔顿原则的物理模块与用于动态预测的时空预测模块结合起来。我们的实验突出模型在基准和流体数据集等任务中的优势。此外,我们创建了一个包含物体相互作用、流体动力动态和流体反射体相互作用的实时数据集,我们以此验证了我们模型的先进能力。模型展示了一种强大的世界模型的基础。


Article 68

Title@2025-06-12 (4): Monotone Classification with Relative Approximations

Title: Monotone Classification with Relative Approximations Einfarbige Klassifizierung mit relativen Annäherungen 相对相近的单体分类 2506.10775v1

Authors (1): Yufei Tao

In monotone classification, the input is a multi-set $P$ of points in $\mathbb{R}^d$, each associated with a hidden label from $\{-1, 1\}$. The goal is to identify a monotone function $h$, which acts as a classifier, mapping from $\mathbb{R}^d$ to $\{-1, 1\}$ with a small error, measured as the number of points $p \in P$ whose labels differ from the function values $h(p)$. The cost of an algorithm is defined as the number of points having their labels revealed. This article presents the first study on the lowest cost required to find a monotone classifier whose error is at most $(1 + \epsilon) \cdot k^*$, where $\epsilon \ge 0$ and $k^*$ is the minimum error achieved by an optimal monotone classifier – in other words, the error is allowed to exceed the optimal by at most a relative factor. Nearly matching upper and lower bounds are presented for the full range of $\epsilon$. All previous work on the problem can only achieve an error higher than the optimal by an absolute factor.

在单调分类中,输入是 $\mathbb{R}^d$ 中点的多重集 $P$,每个点都带有一个取自 $\{-1, 1\}$ 的隐藏标签。目标是找到一个单调函数 $h$ 作为分类器,它将 $\mathbb{R}^d$ 映射到 $\{-1, 1\}$,并且误差较小;误差定义为标签与函数值 $h(p)$ 不同的点 $p \in P$ 的数量。算法的成本定义为其标签被揭示的点的数量。本文首次研究了找到误差至多为 $(1 + \epsilon) \cdot k^*$ 的单调分类器所需的最低成本,其中 $\epsilon \ge 0$,$k^*$ 是最优单调分类器达到的最小误差;换句话说,误差最多只能超出最优值一个相对因子。本文针对 $\epsilon$ 的全部取值范围给出了几乎匹配的上界和下界。此前关于该问题的所有工作只能达到比最优值高出一个绝对因子的误差。
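
The definitions of error and $k^*$ can be made concrete in one dimension, where every monotone classifier is a threshold rule $h(x) = 1$ if $x \ge \tau$ and $-1$ otherwise. The sketch below computes the error of such a rule and the optimum $k^*$ by brute force. It reveals every label, so it says nothing about the probing cost that the paper actually studies; it only illustrates the quantities in the $(1+\epsilon)\cdot k^*$ guarantee. All function names are hypothetical.

```python
def error(points, tau):
    """Number of points whose label differs from h(x) = 1 if x >= tau else -1."""
    return sum(1 for x, y in points if (1 if x >= tau else -1) != y)


def optimal_error(points):
    """k*: the minimum error over all 1-D monotone (threshold) classifiers."""
    # It suffices to try each distinct coordinate, plus +inf (predict -1 everywhere).
    candidates = sorted({x for x, _ in points}) + [float("inf")]
    return min(error(points, tau) for tau in candidates)


def is_relative_approx(err, k_star, eps):
    """Check the guarantee err <= (1 + eps) * k*."""
    return err <= (1.0 + eps) * k_star


points = [(1, -1), (2, 1), (3, -1), (4, 1), (5, 1)]   # (coordinate, label)
k_star = optimal_error(points)
```

Note that when $k^* = 0$ a relative guarantee forces a perfect classifier, whereas an absolute (additive) guarantee would not; this is one way to see why the relative setting is more demanding.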


Article 69

Title@2025-06-12 (4): Skillful joint probabilistic weather forecasting from marginals

Title: Skillful joint probabilistic weather forecasting from marginals Qualifizierte gemeinsame probabilistische Wettervorhersage von Randgruppen 来自边缘地带的有技能的联合概率天气预报 2506.10772v1

Authors (11): Ferran Alet, Ilan Price, Andrew El-Kadi, Dominic Masters, Stratis Markou, Tom R. Andersson, Jacklynn Stott, Remi Lam, Matthew Willson, Alvaro Sanchez-Gonzalez, Peter Battaglia

Machine learning (ML)-based weather models have rapidly risen to prominence due to their greater accuracy and speed than traditional forecasts based on numerical weather prediction (NWP), recently outperforming traditional ensembles in global probabilistic weather forecasting. This paper presents FGN, a simple, scalable and flexible modeling approach which significantly outperforms the current state-of-the-art models. FGN generates ensembles via learned model-perturbations with an ensemble of appropriately constrained models. It is trained directly to minimize the continuous ranked probability score (CRPS) of per-location forecasts. It produces state-of-the-art ensemble forecasts as measured by a range of deterministic and probabilistic metrics, makes skillful ensemble tropical cyclone track predictions, and captures joint spatial structure despite being trained only on marginals.

机器学习(ML)的天气模型由于比基于数字天气预测(NWP)的传统预测更加精确和速度,在近期全球概率天气预报中表现优于传统组合,因此迅速变得突出,本文件介绍了FGN这一简单、可扩展和灵活的模型方法,该方法大大优于目前最先进的模型;FGN通过经学习的模型干扰和一系列适当受限制的模型生成集合;它直接接受培训,以尽量降低每个地点预测的连续级概率分数(CRPS);它根据一系列确定性和概率指标进行最先进的共同预测;使高技能的热带气旋轨迹预测变得多才多艺;尽管只受过边际学培训,但它捕捉了联合空间结构。
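
Since the training objective above is the CRPS of per-location (marginal) forecasts, a small sketch of the standard ensemble estimator may help: $\mathrm{CRPS} \approx \frac{1}{M}\sum_i |x_i - y| - \frac{1}{2M^2}\sum_{i,j} |x_i - x_j|$ for ensemble members $x_1, \dots, x_M$ and observation $y$. Whether FGN uses exactly this estimator or a bias-corrected ("fair") variant is not stated in the abstract, so treat the choice as an assumption; the function name is hypothetical.

```python
def ensemble_crps(members, obs):
    """Ensemble estimate of the CRPS for one scalar forecast target.

    The first term rewards members close to the observation; the second
    subtracts half the mean pairwise spread, so a collapsed ensemble is
    scored like a deterministic forecast (plain absolute error)."""
    m = len(members)
    skill = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return skill - spread


two_member = ensemble_crps([1.0, 3.0], 2.0)   # 1.0 - 0.5
perfect = ensemble_crps([2.0], 2.0)           # deterministic and correct
point_miss = ensemble_crps([5.0], 2.0)        # reduces to absolute error
```

The score is computed independently per location and variable, which is why it only constrains marginals; the abstract's claim is that joint spatial structure nevertheless emerges.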


Article 70

Title@2025-06-12 (4): OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems

Title: OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems OPT-BENCH: LLM Agent auf großräumigen Suchräumen bewerten Optimierungsprobleme OPT-BENCH:评估大型搜索空间优化问题LLM代理 2506.10764v1

Authors (7): Xiaozhe Li, Jixuan Chen, Xinyu Fang, Shengyuan Ding, Haodong Duan, Qingwen Liu, Kai Chen

Large Language Models (LLMs) have shown remarkable capabilities in solving diverse tasks. However, their proficiency in iteratively optimizing complex solutions through learning from previous feedback remains insufficiently explored. To bridge this gap, we present OPT-BENCH, a comprehensive benchmark designed to evaluate LLM agents on large-scale search space optimization problems. OPT-BENCH includes 20 real-world machine learning tasks sourced from Kaggle and 10 classical NP problems, offering a diverse and challenging environment for assessing LLM agents on iterative reasoning and solution refinement. To enable rigorous evaluation, we introduce OPT-Agent, an end-to-end optimization framework that emulates human reasoning when tackling complex problems by generating, validating, and iteratively improving solutions through leveraging historical feedback. Through extensive experiments on 9 state-of-the-art LLMs from 6 model families, we analyze the effects of optimization iterations, temperature settings, and model architectures on solution quality and convergence. Our results demonstrate that incorporating historical context significantly enhances optimization performance across both ML and NP tasks. All datasets, code, and evaluation tools are open-sourced to promote further research in advancing LLM-driven optimization and iterative reasoning. Project page: https://github.com/OliverLeeXZ/OPT-BENCH

大型语言模型(LLM)在解决各种任务方面表现出了非凡的能力。然而,它们通过从以往反馈中学习来迭代优化复杂解决方案的能力仍未得到充分探讨。为了弥补这一差距,我们提出了OPT-BENCH,这是一个旨在评估LLM代理在大规模搜索空间优化问题上表现的综合基准。OPT-BENCH包括20个来自Kaggle的真实世界机器学习任务和10个经典NP问题,为评估LLM代理的迭代推理和解决方案改进能力提供了多样而具有挑战性的环境。为了进行严格的评估,我们引入了OPT-Agent,这是一个端到端的优化框架,它通过利用历史反馈来生成、验证和迭代改进解决方案,从而在处理复杂问题时模拟人类推理。通过对来自6个模型家族的9个最先进LLM进行广泛实验,我们分析了优化迭代次数、温度设置和模型架构对解决方案质量和收敛的影响。结果表明,纳入历史上下文可显著提升ML和NP任务上的优化性能。所有数据集、代码和评估工具均已开源,以促进LLM驱动的优化和迭代推理方面的进一步研究。项目页面:https://github.com/OliverLeeXZ/OPT-BENCH


Article 71

Title@2025-06-12 (4): Great Models Think Alike and this Undermines AI Oversight

Title: Great Models Think Alike and this Undermines AI Oversight Große Modelle denken ähnlich und dies unterminiert AI Oversight 伟大的模特儿们想着类似的想法 和这枚地下地雷 AI监督 2502.04313v2

Authors (9): Shashwat Goel, Joschka Struber, Ilze Amanda Auzina, Karuna K Chandra, Ponnurangam Kumaraguru, Douwe Kiela, Ameya Prabhu, Matthias Bethge, Jonas Geiping

As Language Model (LM) capabilities advance, evaluating and supervising them at scale is getting harder for humans. There is hope that other language models can automate both these tasks, which we refer to as “AI Oversight”. We study how model similarity affects both aspects of AI oversight by proposing Chance Adjusted Probabilistic Agreement (CAPA): a metric for LM similarity based on overlap in model mistakes. Using CAPA, we first show that LLM-as-a-judge scores favor models similar to the judge, generalizing recent self-preference results. Then, we study training on LM annotations, and find that complementary knowledge between the weak supervisor and strong student model plays a crucial role in gains from “weak-to-strong generalization”. As model capabilities increase, it becomes harder to find their mistakes, and we might defer more to AI oversight. However, we observe a concerning trend – model mistakes are becoming more similar with increasing capabilities, pointing to risks from correlated failures. Our work underscores the importance of reporting and correcting for model similarity, especially in the emerging paradigm of AI oversight.

随着语言模型(LM)能力的进步、评估和在规模上监督这些能力对人类来说越来越困难。希望其他语言模型能够使这两项任务自动化,我们称之为“AI 监督 ” 。我们研究模型相似性如何影响AI监督的两方面,提出“CAPA 机会调整概率协议(CAPA) : 一种基于模式错误重叠的LM相似性衡量标准。我们使用CAPA,我们首先显示LLM-as-a-judge 评分类似法官的偏爱模式,概括最近的自我参照结果。然后,我们研究LM 说明,并在弱的主管和强的学生模型之间找到互补知识,在“Weak-en-strong-processional化”的收益中发挥着关键作用。随着模型能力的增长,发现错误变得更加困难,我们可能更偏重于AI监督。然而,我们观察到一种趋势 — 模型错误正越来越接近于日益增强的能力,指出了相关失败的风险。我们的工作强调了报告和纠正模式相似性的重要性,特别是在新兴的AI 监督模式中。
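
The idea of a chance-adjusted agreement score can be illustrated with a minimal sketch. The function below is a Cohen's-kappa-style statistic over two models' predicted labels; it is an assumption made for illustration and not the paper's CAPA definition, which the abstract describes as probabilistic and based on overlap in mistakes. The name `chance_adjusted_agreement` is hypothetical.

```python
from collections import Counter


def chance_adjusted_agreement(preds_a, preds_b):
    """Kappa-style agreement between two models' predicted labels.

    Illustrative stand-in only; not the CAPA metric from the paper.
    """
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("need two equally sized, non-empty prediction lists")
    n = len(preds_a)
    # Fraction of items on which both models give the same answer.
    observed = sum(a == b for a, b in zip(preds_a, preds_b)) / n
    # Agreement expected if the two models answered independently
    # with their own marginal label frequencies.
    freq_a, freq_b = Counter(preds_a), Counter(preds_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both models always emit the same single label
    return (observed - expected) / (1.0 - expected)
```

A value near 0 means the two models agree no more often than chance would predict, while a value near 1 means they agree far more often than chance, which is the situation the abstract flags as a risk for correlated failures.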


Article 72

Title@2025-06-12 (4): Persistent Topological Features in Large Language Models

Title: Persistent Topological Features in Large Language Models Persistente Topologische Features in großen Sprachmodellen 大语言模型中的持久拓扑特征 2410.11042v2

Authors (6): Yuri Gardinazzi, Karthik Viswanathan, Giada Panerai, Alessio Ansuini, Alberto Cazzaniga, Matteo Biagetti

Understanding the decision-making processes of large language models is critical given their widespread applications. To achieve this, we aim to connect a formal mathematical framework – zigzag persistence from topological data analysis – with practical and easily applicable algorithms. Zigzag persistence is particularly effective for characterizing data as it dynamically transforms across model layers. Within this framework, we introduce topological descriptors that measure how topological features, $p$-dimensional holes, persist and evolve throughout the layers. Unlike methods that assess each layer individually and then aggregate the results, our approach directly tracks the full evolutionary path of these features. This offers a statistical perspective on how prompts are rearranged and their relative positions changed in the representation space, providing insights into the system’s operation as an integrated whole. To demonstrate the expressivity and applicability of our framework, we highlight how sensitive these descriptors are to different models and a variety of datasets. As a showcase application to a downstream task, we use zigzag persistence to establish a criterion for layer pruning, achieving results comparable to state-of-the-art methods while preserving the system-level perspective.

理解大语言模型的决策过程至关重要,因为其应用十分广泛。为此,我们力求将一个形式化的数学框架(拓扑数据分析中的zigzag持久性)与实用且易于应用的算法联系起来。Zigzag持久性特别适合刻画在模型各层之间动态变换的数据。在此框架内,我们引入拓扑描述符,用来度量拓扑特征($p$维洞)如何在各层中持续和演化。不同于对每一层单独评估再汇总结果的方法,我们的方法直接跟踪这些特征的完整演化路径。这提供了一种统计视角,说明提示在表示空间中如何被重新排列及其相对位置如何变化,从而把系统作为一个整体来洞察其运作。为了展示我们框架的表达能力和适用性,我们强调这些描述符对不同模型和各种数据集的敏感度。作为下游任务的示范应用,我们用zigzag持久性建立层剪枝准则,在保持系统层面视角的同时,取得与最新方法相当的结果。


Article 73

Title@2025-06-12 (4): Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering

Title: Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering Neural bei ArchEHR-QA 2025: Agentische Prompt-Optimierung für evidenzbasierte klinische Fragebeantwortung Neural在ArchEHR-QA 2025:面向基于证据的临床问答的智能体式提示优化 2506.10751v1

Authors (6): Sai Prasanna Teja Reddy Bogireddy, Abrar Majeedi, Viswanatha Reddy Gajjala, Zhuoyan Xu, Siddhant Rai, Vaishnav Potlapalli

Automated question answering (QA) over electronic health records (EHRs) can bridge critical information gaps for clinicians and patients, yet it demands both precise evidence retrieval and faithful answer generation under limited supervision. In this work, we present Neural, the runner-up in the BioNLP 2025 ArchEHR-QA shared task on evidence-grounded clinical QA. Our proposed method decouples the task into (1) sentence-level evidence identification and (2) answer synthesis with explicit citations. For each stage, we automatically explore the prompt space with DSPy’s MIPROv2 optimizer, jointly tuning instructions and few-shot demonstrations on the development set. A self-consistency voting scheme further improves evidence recall without sacrificing precision. On the hidden test set, our method attains an overall score of 51.5, placing second while outperforming standard zero-shot and few-shot prompting by over 20 and 10 points, respectively. These results indicate that data-driven prompt optimization is a cost-effective alternative to model fine-tuning for high-stakes clinical QA, advancing the reliability of AI assistants in healthcare.

对电子健康记录自动回答问题(QA)可以弥补临床医生和病人的关键信息差距,但要求准确的证据检索和在有限监督下忠实地生成答案。在这项工作中,我们介绍BioNLP 2025年ArchEHR-QA的后排神经系统,共同执行有证据根据的临床质量分析任务。我们建议的方法将这项任务分为(1) 判决一级证据识别和(2) 回答合成,并明确引用。在每一个阶段,我们自动探索与DSPy’s MIPROv2优化器、联合调整指示和对成套开发的微小的演示的迅速空间。自我一致投票计划可以进一步改进证据召回,而不牺牲精确性。在隐藏的测试中,我们的方法达到总分51.5分,处于第二个阶段,同时超过标准零发数和几发点,分别超过20分和10分。这些结果表明,数据驱动的快速优化是高取临床质量分析模型的成本效益替代方法,提高AI助理在医疗保健方面的可靠性。
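
The self-consistency voting step for evidence identification can be sketched as follows. This is a minimal illustration that assumes each sampled run returns a list of selected sentence ids; the function name `vote_evidence` and the `min_votes` threshold are hypothetical, and the abstract does not state the actual voting rule.

```python
from collections import Counter


def vote_evidence(runs, min_votes):
    """Keep sentence ids that were selected in at least `min_votes` runs.

    `runs` is a list of lists of sentence ids, one list per sampled run.
    """
    # Count each sentence at most once per run, even if a run repeats it.
    counts = Counter(sid for run in runs for sid in set(run))
    return sorted(sid for sid, votes in counts.items() if votes >= min_votes)


# Three sampled runs over the same note; sentences 2 and 5 win a majority.
selected = vote_evidence([[1, 2, 5], [2, 5], [2, 3]], min_votes=2)
```

Lowering `min_votes` trades precision for recall, which is the knob a scheme of this kind exposes.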


Article 74

Title@2025-06-12 (4): Graphical Transformation Models

Title: Graphical Transformation Models Graphische Transformationsmodelle 图形转换模型 2503.17845v3

Authors (4): Matthias Herp, Johannes Brachem, Michael Altenbuchinger, Thomas Kneib

Graphical Transformation Models (GTMs) are introduced as a novel approach to effectively model multivariate data with intricate marginals and complex dependency structures non-parametrically, while maintaining interpretability through the identification of varying conditional independencies. GTMs extend multivariate transformation models by replacing the Gaussian copula with a custom-designed multivariate transformation, offering two major advantages. Firstly, GTMs can capture more complex interdependencies using penalized splines, which also provide an efficient regularization scheme. Secondly, we demonstrate how to approximately regularize GTMs using a lasso penalty towards pairwise conditional independencies, akin to Gaussian graphical models. The model’s robustness and effectiveness are validated through simulations, showcasing its ability to accurately learn parametric vine copulas and identify conditional independencies. Additionally, the model is applied to a benchmark astrophysics dataset, where the GTM demonstrates favorable performance compared to non-parametric vine copulas in learning complex multivariate distributions.

图形转换模型(GTMs)被引入为一种新的方法,以有效模拟具有复杂边际和复杂依赖结构的多变量数据,而不以参数为依据,同时通过识别不同的有条件依赖性来保持解释性。GTMs扩展了多变量转换模型,以定制设计的多变量变换模式取代Gaussian 椰子,提供了两个主要的优势。首先,GTMs可以使用惩罚性的样条来捕捉更复杂的相互依存关系,这也提供了一个有效的正规化计划。第二,我们展示了如何使用拉索罚款来大致规范GTMs,使其适应与高斯图形模型类似的双向有条件依赖性。该模型的坚固性和有效性通过模拟得到验证,展示其准确学习参数性松动和确定有条件的互不依赖性的能力。此外,该模型还用于一个基准的天体物理学数据集,其中GTM在学习复杂的多变量分布中显示优于非参数相交配方的功能。


Article 75

Title@2025-06-12 (4): A User’s Guide to Sampling Strategies for Sliced Optimal Transport

Title: A User’s Guide to Sampling Strategies for Sliced Optimal Transport Leitfaden für Anwender zur Probenahme von Strategien für den optimalen Transport 剪切最佳运输抽样战略用户指南 2502.02275v4

Authors (3): Keanu Sisouk, Julie Delon, Julien Tierny

This paper serves as a user’s guide to sampling strategies for sliced optimal transport. We provide reminders and additional regularity results on the Sliced Wasserstein distance. We detail the construction methods, generation time complexity, theoretical guarantees, and conditions for each strategy. Additionally, we provide insights into their suitability for sliced optimal transport in theory. Extensive experiments on both simulated and real-world data offer a representative comparison of the strategies, culminating in practical recommendations for their best usage.

本文作为用户对切片最佳运输的抽样战略的指南,我们提供了关于切片瓦森斯坦距离的提醒和额外规律性结果,我们详细介绍了每个战略的建造方法、生成时间复杂性、理论保障和条件。此外,我们深入了解这些方法在理论上是否适合切片最佳运输。关于模拟数据和现实世界数据的广泛实验对战略进行了有代表性的比较,最终提出了最佳使用的实际建议。
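
For orientation, the baseline sampling strategy (i.i.d. uniform directions on the sphere) can be sketched in a few lines. This is a minimal Monte Carlo estimate of the Sliced Wasserstein distance between two equally sized point clouds; the function names and defaults are assumptions for illustration, and the other strategies compared in the paper are not reproduced here.

```python
import math
import random


def random_direction(dim, rng):
    """Draw a uniform direction on the unit sphere by normalising a Gaussian."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def sliced_wasserstein(xs, ys, n_slices=200, p=2, seed=0):
    """Monte Carlo Sliced Wasserstein-p distance for equal-size samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes two equally sized, non-empty samples")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_slices):
        theta = random_direction(dim, rng)
        # Project both clouds onto the direction; in 1D the optimal
        # transport plan simply matches sorted values.
        px = sorted(sum(a * t for a, t in zip(x, theta)) for x in xs)
        py = sorted(sum(a * t for a, t in zip(y, theta)) for y in ys)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_slices) ** (1.0 / p)
```

Replacing `random_direction` with a deterministic or low-discrepancy point set on the sphere is where alternative sampling strategies would plug in.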


Article 76

Title@2025-06-12 (4): Hybrid Bernstein Normalizing Flows for Flexible Multivariate Density Regression with Interpretable Marginals

Title: Hybrid Bernstein Normalizing Flows for Flexible Multivariate Density Regression with Interpretable Marginals Hybride Bernstein-Normalisierungsströmungen für flexible Multivariate Dichteregression mit interpretierbaren Marginalen 具有可解释边际作用的灵活多变密度递减的 伯恩斯坦混合 伯恩斯坦 2505.14164v2

Authors (4): Marcel Arpogaus, Thomas Kneib, Thomas Nagler, David Rügamer

Density regression models allow a comprehensive understanding of data by modeling the complete conditional probability distribution. While flexible estimation approaches such as normalizing flows (NF) work particularly well in multiple dimensions, interpreting the input-output relationship of such models is often difficult, due to the black-box character of deep learning models. In contrast, existing statistical methods for multivariate outcomes such as multivariate conditional transformation models (MCTM) are restricted in flexibility and are often not expressive enough to represent complex multivariate probability distributions. In this paper, we combine MCTM with state-of-the-art and autoregressive NF to leverage the transparency of MCTM for modeling interpretable feature effects on the marginal distributions in the first step and the flexibility of neural-network-based NF techniques to account for complex and non-linear relationships in the joint data distribution. We demonstrate our method’s versatility in various numerical experiments and compare it with MCTM and other NF models on both simulated and real-world data.

密度回归模型通过对完全的有条件概率分布进行建模,使得人们能够全面理解数据。灵活的估算方法,例如正常流动(NF)在多个方面特别有效,但是由于深层学习模型的黑盒特性,解释这些模型的输入-产出关系往往很困难。相比之下,多种变式有条件转换模型(MCTM)等现有多变量结果统计方法的灵活性有限,而且往往不够清晰,无法代表复杂的多变量概率分布。在本文中,我们将MCTTM与最新和自动递增的NF结合起来,以便利用MTM的透明度来模拟第一阶段边际分布的可解释特点效应,以及神经网络的NFT技术的灵活性,以便在联合数据分布中说明复杂和非线性关系。我们展示了我们的方法在各种数字实验中的多功能性,并在模拟数据和实际数据中与MTM和其他NF模型进行比较。


Article 77

Title@2025-06-12 (4): BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery

Title: BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery BioNeMo Framework: eine modulare, leistungsstarke Bibliothek für die Entwicklung von KI-Modellen in der Wirkstoffentdeckung BioNemo框架:一个模块化、高绩效图书馆,用于在药物发现方面开发AI模型 2411.10548v3

Authors (93): Peter St. John, Dejun Lin, Polina Binder, Malcolm Greaves, Vega Shah, John St. John, Adrian Lange, Patrick Hsu, Rajesh Illango, Arvind Ramanathan, Anima Anandkumar, David H Brookes, Akosua Busia, Abhishaike Mahajan, Stephen Malina, Neha Prasad, Sam Sinai, Lindsay Edwards, Thomas Gaudelet, Cristian Regep, Martin Steinegger, Burkhard Rost, Alexander Brace, Kyle Hippe, Luca Naef, Keisuke Kamata, George Armstrong, Kevin Boyd, Zhonglin Cao, Han-Yi Chou, Simon Chu, Allan dos Santos Costa, Sajad Darabi, Eric Dawson, Kieran Didi, Cong Fu, Mario Geiger, Michelle Gill, Darren J Hsu, Gagan Kaushik, Maria Korshunova, Steven Kothen-Hill, Youhan Lee, Meng Liu, Micha Livne, Zachary McClure, Jonathan Mitchell, Alireza Moradzadeh, Ohad Mosafi, Youssef Nashed, Saee Paliwal, Yuxing Peng, Sara Rabhi, Farhad Ramezanghorbani, Danny Reidenbach, Camir Ricketts, Brian C Roland, Kushal Shah, Tyler Shimko, Hassan Sirelkhatim, Savitha Srinivasan, Abraham C Stern, Dorota Toczydlowska, Srimukh Prasad Veccham, Niccolò Alberto Elia Venanzi, Anton Vorontsov, Jared Wilber, Isabel Wilkinson, Wei Jing Wong, Eva Xue, Cory Ye, Xin Yu, Yang Zhang, Guoqing Zhou, Becca Zandstein, Alejandro Chacòn, Prashant Sohani, Maximilian Stadler, Christian Hundt, Feiwen Zhu, Christian Dallago, Bruno Trentini, Emine Kucukbenli, Saee Paliwal, Timur Rvachov, Eddie Calleja, Johnny Israeli, Harry Clifford, Risto Haukioja, Nicholas Haemel, Kyle Tretina, Neha Tadimeti, Anthony B Costa

Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLM) training on hundreds of graphical processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models across hundreds of GPUs. Its modular design allows the integration of individual components, such as data loaders, into existing workflows and is open to community contributions. We detail technical features of the BioNeMo Framework through use cases such as pLM pre-training and fine-tuning. On 256 NVIDIA A100s, BioNeMo Framework trains a three billion parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use.

人工智能模型的编码生物学和化学正在开辟通向高通量和高质量硅药物开发的新途径,然而,它们的培训越来越依赖计算尺度,最近对数百个图形处理器进行了蛋白质语言模型培训;我们引入了生物Nemo框架,以便利在数百个全球动力器中进行计算生物学和化学AI模型的培训;其模块设计允许将单个组成部分,如数据处理器等纳入现有工作流程,并允许社区作出贡献;我们通过使用诸如PLM预培训和微调等案例,详细介绍了生物Nemo框架的技术特点;关于256个NVIDIA A100s,生物Nemo框架在4.2天里对30亿个参数BERT基PLM进行了为期1万亿多个符号的培训;生物Nemo框架是开放的,可供每个人免费使用。
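
The throughput implied by the reported run can be worked out with simple arithmetic. The sketch below takes the figures from the abstract (256 GPUs, 4.2 days, one trillion tokens); since the abstract says "over one trillion tokens", the result is a lower bound, and the helper name is hypothetical.

```python
def tokens_per_gpu_second(total_tokens, num_gpus, days):
    """Average number of tokens processed per GPU per second."""
    seconds = days * 24 * 60 * 60
    return total_tokens / (num_gpus * seconds)


# Figures from the abstract: 1e12 tokens, 256 A100 GPUs, 4.2 days.
rate = tokens_per_gpu_second(1e12, 256, 4.2)
print(f"{rate:,.0f} tokens per GPU-second (lower bound)")
```

This puts the average at a little over ten thousand tokens per GPU per second for the three-billion-parameter model.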


Article 78

Title@2025-06-12 (4): PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models

Title: PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models PREMISE: Skalierbare und strategische Prompt-Optimierung für effiziente mathematische Reasoning in großen Modellen PREMISE:大规模模型中高效数学理由的可扩展和战略快速优化 2506.10716v1

Authors (3): Ye Yu, Yaoning Yu, Haohan Wang

Large reasoning models (LRMs) such as Claude 3.7 Sonnet and OpenAI o1 achieve strong performance on mathematical benchmarks using lengthy chain-of-thought (CoT) reasoning, but the resulting traces are often unnecessarily verbose. This inflates token usage and cost, limiting deployment in latency-sensitive or API-constrained settings. We introduce PREMISE (PRompt-based Efficient Mathematical Inference with Strategic Evaluation), a prompt-only framework that reduces reasoning overhead without modifying model weights. PREMISE combines trace-level diagnostics with gradient-inspired prompt optimization to minimize redundant computation while preserving answer accuracy. The approach jointly optimizes brevity and correctness through a multi-objective textual search that balances token length and answer validity. Unlike prior work, PREMISE runs in a single-pass black-box interface, so it can be applied directly to commercial LLMs. On GSM8K, SVAMP, and Math500 we match or exceed baseline accuracy ($96\%\rightarrow96\%$ with Claude, $91\%\rightarrow92\%$ with Gemini) while reducing reasoning tokens by up to $87.5\%$ and cutting dollar cost by $69$–$82\%$. These results show that prompt-level optimization is a practical and scalable path to efficient LRM inference without compromising reasoning quality.

大型推理模型(LRMs),如Claude 3.7 Sonnet 和 OpenAI o1,利用冗长的思维链(CoT)推理,在数学基准上取得很强的成绩,但由此产生的推理痕迹往往过于冗长。这会抬高token用量和成本,限制在对延时敏感或受API限制的环境下的部署。我们引入了PREMISE(PRompt-based Efficient Mathematical Inference with Strategic Evaluation),一个在不改变模型权重的情况下减少推理开销的纯提示框架。PREMISE将痕迹级诊断与受梯度启发的提示优化结合起来,以尽量减少多余的计算,同时保持回答准确性。这种方法通过平衡token长度和答案有效性的多目标文本搜索,共同优化简洁性和正确性。与以前的工作不同,PREMISE在单次通过的黑箱界面中运行,因此可以直接应用于商业LLM。在GSM8K、SVAMP和Math500上,我们达到或超过基线准确率(Claude为 $96\%\rightarrow96\%$,Gemini为 $91\%\rightarrow92\%$),同时将推理token减少多达 $87.5\%$,并将美元成本降低 $69$–$82\%$。这些结果表明,提示级优化是实现高效LRM推理而不损害推理质量的实用且可扩展的途径。


Article 79

Title@2025-06-12 (4): Testing Generalizability in Causal Inference

Title: Testing Generalizability in Causal Inference Prüfung der Verallgemeinerbarkeit bei Kausalzusammenhang 因果关系中的通用性 2411.03021v2

Authors (3): Daniel de Vassimon Manela, Linying Yang, Robin J. Evans

Ensuring robust model performance in diverse real-world scenarios requires addressing generalizability across domains with covariate shifts. However, no formal procedure exists for statistically evaluating generalizability in machine learning algorithms. Existing predictive metrics like mean squared error (MSE) help to quantify the relative performance between models, but do not directly answer whether a model can or cannot generalize. To address this gap in the domain of causal inference, we propose a systematic framework for statistically evaluating the generalizability of high-dimensional causal inference models. Our approach uses the frugal parameterization to flexibly simulate from fully and semi-synthetic causal benchmarks, offering a comprehensive evaluation for both mean and distributional regression methods. Grounded in real-world data, our method ensures more realistic evaluations, which is often missing in current work relying on simplified datasets. Furthermore, using simulations and statistical testing, our framework is robust and avoids over-reliance on conventional metrics, providing statistical safeguards for decision making.

在不同现实世界情景中确保稳健的模型性能需要解决不同领域的通用性,并发生共变变化。然而,在从统计角度评估机器学习算法的通用性方面,没有正式的程序。现有的预测性指标,如平均平方差(MSE)有助于量化模型之间的相对性能,但并不直接回答模型能否或不能概括化。为了解决因果推断领域的这一差距,我们提出了一个系统框架,用于统计评估高维因果推理模型的通用性。我们的方法利用节制参数,从充分和半合成因果基准中进行灵活模拟,为平均值和分布回归方法提供综合评价。基于现实世界数据,我们的方法确保了更现实的评估,而目前的工作往往依靠简化的数据集,而这种评估往往缺乏。此外,我们的框架使用模拟和统计测试,是稳健健的,避免过度依赖常规指标,为决策提供统计保障。


Article 80

Title@2025-06-12 (4): Continual Hyperbolic Learning of Instances and Classes

Title: Continual Hyperbolic Learning of Instances and Classes Kontinuierliches Hyperbolisches Lernen von Instanzen und Klassen 不断对各种事件和课程进行双曲学习 2506.10710v1

Authors (6): Melika Ayoughi, Mina Ghadimi Atigh, Mohammad Mahdi Derakhshani, Cees G. M. Snoek, Pascal Mettes, Paul Groth

Continual learning has traditionally focused on classifying either instances or classes, but real-world applications, such as robotics and self-driving cars, require models to handle both simultaneously. To mirror real-life scenarios, we introduce the task of continual learning of instances and classes, at the same time. This task challenges models to adapt to multiple levels of granularity over time, which requires balancing fine-grained instance recognition with coarse-grained class generalization. In this paper, we identify that classes and instances naturally form a hierarchical structure. To model these hierarchical relationships, we propose HyperCLIC, a continual learning algorithm that leverages hyperbolic space, which is uniquely suited for hierarchical data due to its ability to represent tree-like structures with low distortion and compact embeddings. Our framework incorporates hyperbolic classification and distillation objectives, enabling the continual embedding of hierarchical relations. To evaluate performance across multiple granularities, we introduce continual hierarchical metrics. We validate our approach on EgoObjects, the only dataset that captures the complexity of hierarchical object recognition in dynamic real-world environments. Empirical results show that HyperCLIC operates effectively at multiple granularities with improved hierarchical generalization.

持续学习传统上侧重于对实例或类别进行分类, 但现实世界应用, 如机器人和自驾驶汽车等, 需要同时使用模型来处理两者。 为了镜像现实生活情景, 我们同时引入持续学习实例和类别的任务。 这个任务挑战模型要适应颗粒的多重水平, 需要用粗微分分层分类法的概括性来平衡微微分体识别。 在本文中, 我们确定该类别和情况自然形成等级结构。 为了模拟这些等级关系, 我们建议使用超超CLIC, 一种持续学习的算法, 利用超单向空间, 因为它能够代表像树一样的结构, 且低扭曲和紧凑嵌入, 因而特别适合等级数据 。 我们的框架包含超单向分类和蒸馏目标, 使等级关系能够持续嵌入。 为了评估多颗粒性的性, 我们引入了连续的等级测量标准。 我们验证了我们在EgoObjects上的方法, 唯一的数据集能捕捉到动态真实世界环境中分级物体识别的复杂性 。 磁性CLIC 的结果显示, 系统在多级上有效操作的压性 。
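
Why hyperbolic space suits hierarchies can be seen from its distance function. The sketch below computes the geodesic distance in the Poincaré ball, one common model of hyperbolic space; the abstract does not say which model HyperCLIC uses, so this choice is an assumption for illustration.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincaré ball."""
    sq_norm_u = sum(c * c for c in u)
    sq_norm_v = sum(c * c for c in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Distances blow up near the boundary, which leaves exponentially
    # growing room for the leaves of a tree.
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)
```

In embeddings of this kind, coarse classes typically sit near the origin and fine-grained instances near the boundary, so a single space can hold both levels of granularity.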


Article 81

Title@2025-06-12 (4): Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

Title: Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models Anreize für eine fortgeschrittene Instruktions-Folge von großen Sprachmodellen 为采用大语言模式的高级指示提供激励理由 2506.01413v2

Authors (9): Yulei Qin, Gang Li, Zongyi Li, Zihan Xu, Yuchen Shi, Zhekai Lin, Xiao Cui, Ke Li, Xing Sun

Existing large language models (LLMs) face challenges in following complex instructions, especially when multiple constraints are present and organized in parallel, chained, and branching structures. One intuitive solution, namely chain-of-thought (CoT), is expected to universally improve capabilities of LLMs. However, we find that the vanilla CoT exerts a negative impact on performance due to its superficial reasoning pattern of simply paraphrasing the instructions. It fails to peel back the compositions of constraints for identifying their relationship across hierarchies of types and dimensions. To this end, we propose a systematic method to boost LLMs in dealing with complex instructions via incentivizing reasoning for test-time compute scaling. First, we start from the decomposition of complex instructions under existing taxonomies and propose a reproducible data acquisition method. Second, we exploit reinforcement learning (RL) with verifiable rule-centric reward signals to cultivate reasoning specifically for instruction following. We address the shallow, non-essential nature of reasoning under complex instructions via sample-wise contrast for superior CoT enforcement. We also exploit behavior cloning of experts to facilitate steady distribution shift from fast-thinking LLMs to skillful reasoners. Extensive evaluations on seven comprehensive benchmarks confirm the validity of the proposed method, where a 1.5B LLM achieves 11.74% gains with performance comparable to an 8B LLM. Codes and data are available at https://github.com/yuleiqin/RAIF.

现有大型语言模型(LLMs)在遵循复杂指令方面面临挑战,特别是当存在多种约束并以并行、链式和分支结构组织时。一个直观的解决方案,即思维链(CoT),被期望能普遍提高LLM的能力。然而,我们发现,原始CoT由于其只是简单复述指令的肤浅推理模式,对性能产生负面影响;它未能剖析约束的构成,以识别它们在类型和维度层级之间的关系。为此,我们提出一种系统的方法,通过激励推理以实现测试时计算扩展,来提升LLM处理复杂指令的能力。首先,我们从现有分类法下复杂指令的分解出发,提出一种可复现的数据获取方法。第二,我们利用带有可验证的、以规则为中心的奖励信号的强化学习(RL),专门为指令遵循培养推理能力。我们通过样本级对比来解决复杂指令下推理浅薄、非本质的问题,以更好地强化CoT。我们还利用专家行为克隆,促进从快速思考的LLM向熟练推理者的稳定分布转移。在七个综合基准上的广泛评估证实了所提方法的有效性,其中1.5B LLM取得了11.74%的提升,性能可与8B LLM相当。代码和数据见 https://github.com/yuleiqin/RAIF。


Article 82

Title@2025-06-12 (4): ConTextTab: A Semantics-Aware Tabular In-Context Learner

Title: ConTextTab: A Semantics-Aware Tabular In-Context Learner ConTextTab: Ein Semantik-Bewusst Tabular In-Context Learner ConTextTab : 语义学软件制表器 2506.10707v1

Authors (4): Marco Spinaci, Marek Polewczyk, Maximilian Schambach, Sam Thelin

Tabular in-context learning (ICL) has recently achieved state-of-the-art (SOTA) performance on several tabular prediction tasks. Previously restricted to classification problems on small tables, recent advances such as TabPFN and TabICL have extended its use to larger datasets. While being architecturally efficient and well-adapted to tabular data structures, current table-native ICL architectures, being trained exclusively on synthetic data, do not fully leverage the rich semantics and world knowledge contained in real-world tabular data. On another end of this spectrum, tabular ICL models based on pretrained large language models such as TabuLa-8B integrate deep semantic understanding and world knowledge but are only able to make use of a small amount of context due to inherent architectural limitations. With the aim to combine the best of both these worlds, we introduce ConTextTab, integrating semantic understanding and alignment into a table-native ICL framework. By employing specialized embeddings for different data modalities and by training on large-scale real-world tabular data, our model is competitive with SOTA across a broad set of benchmarks while setting a new standard on the semantically rich CARTE benchmark.

最近的一些进展,如TabPFN和TabICL等,已经将其用于更大的数据集。虽然目前的表型ICL结构在结构上效率高,适应了表格数据结构,但完全以合成数据为主的培训,目前的表型ICL结构没有充分利用真实世界表格数据中所包含的丰富的语义和世界知识。在这一频谱的另一端,基于诸如TabuLa-8B等预先培训的大型语言模型的ICL模型,将深度语义理解和世界知识整合在一起,但由于固有的建筑限制,我们只能利用少量的环境。为了将这两个世界的最佳数据组合起来,我们引入了ConTextTAB,将语义理解和一致性纳入表格型ICL框架。通过对不同数据模式的专门嵌入和大规模真实世界表格数据的培训,我们的模型与SOTA具有竞争力,跨越一套广泛的标准基准,同时制定了新的标准CARSAR标准。


Article 83

Title@2025-06-12 (4): Preserving Task-Relevant Information Under Linear Concept Removal

Title: Preserving Task-Relevant Information Under Linear Concept Removal Task-Relevante Informationen unter linearen Konzeptentfernung speichern 在线性概念删除下保留任务相关信息 2506.10703v1

Authors (3): Floris Holstege, Shauli Ravfogel, Bram Wouters

Modern neural networks often encode unwanted concepts alongside task-relevant information, leading to fairness and interpretability concerns. Existing post-hoc approaches can remove undesired concepts but often degrade useful signals. We introduce SPLICE (Simultaneous Projection for LInear concept removal and Covariance prEservation), which eliminates sensitive concepts from representations while exactly preserving their covariance with a target label. SPLICE achieves this via an oblique projection that “splices out” the unwanted direction yet protects important label correlations. Theoretically, it is the unique solution that removes linear concept predictability and maintains target covariance with minimal embedding distortion. Empirically, SPLICE outperforms baselines on benchmarks such as Bias in Bios and Winobias, removing protected attributes while minimally damaging main-task information.

现代神经网络往往将不必要的概念与任务相关的信息编码起来,从而导致对公平和可解释性的关切。 现有的热后方法可以消除不理想的概念,但往往会降低有用的信号。 我们引入了 SPLICE - 同步预测, 用于Learar 概念的清除和共性观测, 从而消除敏感概念的表达方式, 同时完全保持与目标标签的共性。 SPLICE通过模糊的预测实现这一点, 即“ 防止” 不必要的方向, 同时又保护重要的标签关联性。 理论上, 这是消除线性概念可预测性并维持目标变量, 且最小的嵌入扭曲的独特解决方案 。 简而言之, SPLICE 超越了生物和维诺比亚等基准的基线, 去除受保护的属性, 同时又尽可能减少主要任务信息 。
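
The two constraints named in the abstract (remove the concept direction, keep the covariance with the label) can be illustrated with a rank-one oblique projection. The sketch below assumes a scalar concept and a scalar label, each summarised by its covariance vector with the embedding; it is one simple construction that satisfies both constraints and is not claimed to be the paper's minimal-distortion solution. All names are hypothetical.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def make_oblique_projection(cov_xz, cov_xy):
    """Return x -> P x with P cov_xz = 0 and P cov_xy = cov_xy.

    cov_xz: covariance of the embedding with the unwanted concept.
    cov_xy: covariance of the embedding with the target label.
    """
    yy = dot(cov_xy, cov_xy)
    if yy == 0.0:
        raise ValueError("label covariance must be non-zero in this sketch")
    # b is the part of cov_xz orthogonal to cov_xy, so b . cov_xy = 0.
    scale = dot(cov_xz, cov_xy) / yy
    b = [z - scale * y for z, y in zip(cov_xz, cov_xy)]
    denom = dot(b, cov_xz)
    if abs(denom) < 1e-12:
        raise ValueError("concept and label directions are collinear")

    def project(x):
        # P = I - cov_xz b^T / (b^T cov_xz): projects along cov_xz
        # onto the hyperplane orthogonal to b.
        coeff = dot(b, x) / denom
        return [xi - coeff * zi for xi, zi in zip(x, cov_xz)]

    return project
```

Unlike an orthogonal projection that simply removes `cov_xz`, the map is oblique: it projects along `cov_xz` but onto a hyperplane chosen so that `cov_xy` is left untouched.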


Article 84

Title@2025-06-12 (4): SNR and Resource Adaptive Deep JSCC for Distributed IoT Image Classification

Title: SNR and Resource Adaptive Deep JSCC for Distributed IoT Image Classification SNR und Resource Adaptive Deep JSCC für verteilte IoT-Bildklassifikation SNR与资源适应性深部JSC,用于分布的IoT图像分类 2506.10699v1

Authors (2): Ali Waqas, Sinem Coleri

Sensor-based local inference at IoT devices faces severe computational limitations, often requiring data transmission over noisy wireless channels for server-side processing. To address this, split-network Deep Neural Network (DNN) based Joint Source-Channel Coding (JSCC) schemes are used to extract and transmit relevant features instead of raw data. However, most existing methods rely on fixed network splits and static configurations, lacking adaptability to varying computational budgets and channel conditions. In this paper, we propose a novel SNR- and computation-adaptive distributed CNN framework for wireless image classification across IoT devices and edge servers. We introduce a learning-assisted intelligent Genetic Algorithm (LAIGA) that efficiently explores the CNN hyperparameter space to optimize network configuration under a given FLOPs constraint and a given SNR. LAIGA intelligently discards infeasible network configurations that exceed the computational budget at the IoT device. It also benefits from Random Forests based learning assistance to avoid a thorough exploration of the hyperparameter space and to induce application-specific bias in candidate optimal configurations. Experimental results demonstrate that the proposed framework outperforms fixed-split architectures and existing SNR-adaptive methods, especially under low SNR and limited computational resources. We achieve a 10% increase in classification accuracy compared to the existing JSCC based SNR-adaptive multilayer framework at an SNR as low as -10 dB across a range of available computational budgets (1M to 70M FLOPs) at the IoT device.

物联网(IoT)设备上基于传感器的本地推理面临严重的计算限制,常常需要通过有噪声的无线信道传输数据,以便在服务器端处理。为此,基于分割网络深度神经网络(DNN)的联合信源信道编码(JSCC)方案被用来提取和传输相关特征,而不是原始数据。然而,大多数现有方法依赖固定的网络分割和静态配置,缺乏对不同计算预算和信道条件的适应性。在本文中,我们提出一个新的SNR和计算自适应的分布式CNN框架,用于跨IoT设备和边缘服务器的无线图像分类。我们引入了学习辅助的智能遗传算法(LAIGA),它高效地探索CNN超参数空间,以便在给定的FLOPs约束和给定的SNR下优化网络配置。LAIGA智能地丢弃超出IoT设备计算预算的不可行网络配置。它还受益于基于随机森林的学习辅助,以避免对超参数空间进行彻底搜索,并在候选最优配置中引入特定于应用的偏置。实验结果表明,所提框架优于固定分割架构和现有的SNR自适应方法,特别是在低SNR和计算资源有限的情况下。与现有基于JSCC的SNR自适应多层框架相比,在低至 -10 dB 的SNR下,在IoT设备的一系列可用计算预算(1M至70M FLOPs)范围内,我们的分类准确率提高了10%。


Article 85

Title@2025-06-12 (4): A hierarchical approach for assessing the vulnerability of tree-based classification models to membership inference attack

Title: A hierarchical approach for assessing the vulnerability of tree-based classification models to membership inference attack Ein hierarchischer Ansatz zur Bewertung der Verwundbarkeit baumbasierter Klassifikationsmodelle für den Schlussfolgerungsangriff der Mitgliedschaft 评估树树分类模式易受成员推推推攻击的脆弱性的等级办法 2502.09396v2

Authors (2): Richard J. Preen, Jim Smith

Machine learning models can inadvertently expose confidential properties of their training data, making them vulnerable to membership inference attacks (MIA). While numerous evaluation methods exist, many require computationally expensive processes, such as training multiple shadow models. This article presents two new complementary approaches for efficiently identifying vulnerable tree-based models: an ante-hoc analysis of hyperparameter choices and a post-hoc examination of trained model structure. While these new methods cannot certify whether a model is safe from MIA, they provide practitioners with a means to significantly reduce the number of models that need to undergo expensive MIA assessment through a hierarchical filtering approach. More specifically, it is shown that the rank order of disclosure risk for different hyperparameter combinations remains consistent across datasets, enabling the development of simple, human-interpretable rules for identifying relatively high-risk models before training. While this ante-hoc analysis cannot determine absolute safety since this also depends on the specific dataset, it allows the elimination of unnecessarily risky configurations during hyperparameter tuning. Additionally, computationally inexpensive structural metrics serve as indicators of MIA vulnerability, providing a second filtering stage to identify risky models after training but before conducting expensive attacks. Empirical results show that hyperparameter-based risk prediction rules can achieve high accuracy in predicting the most at risk combinations of hyperparameters across different tree-based model types, while requiring no model training. Moreover, target model accuracy is not seen to correlate with privacy risk, suggesting opportunities to optimise model configurations for both performance and privacy.

机器学习模型可能无意中暴露其培训数据的保密性质,使其容易受到会员推价攻击(MIA) 。虽然存在许多评价方法,但许多评价方法需要计算昂贵的过程,例如培训多个影子模型。本篇文章提出了两种新的补充方法,以有效识别脆弱的树型模型:超光谱选择的抗热分析和经过培训的模型结构的后热检查。虽然这些新方法无法证明模型是否安全地来自MIA,但它们为从业者提供了一种手段,以大幅减少需要通过等级过滤法进行昂贵的隐私权评估的模型数量。更具体地说,这表明不同超光度组合的披露风险等级顺序在数据集之间仍然一致,从而得以制定简单、人为互换规则,在培训前确定相对高风险的模式。虽然这种超光谱分析不能确定绝对安全性,因为这也取决于具体的数据集,但允许在超光光度调整期间消除不必要的模型配置。此外,计算价格低廉的结构指标是MIA脆弱性的指标,提供第二个过滤阶段,用以在进行高度预测之前确定风险性模型,同时在进行高风险预测之前进行高风险预测。


Article 86

Title@2025-06-12 (4): FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems

Title: FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems FedRAG: Ein Rahmen für Systeme der Feinsteuerung von Retrieval-Augmented Generation FFRAG: 微调取回系统框架 2506.09200v2

Authors (8): Val Andrei Fajardo, David B. Emerson, Amandeep Singh, Veronica Chatrath, Marcelo Lotif, Ravi Theja, Alex Cheung, Izuki Matsuba

Retrieval-augmented generation (RAG) systems have been shown to be effective in addressing many of the drawbacks of relying solely on the parametric memory of large language models. Recent work has demonstrated that RAG systems can be improved via fine-tuning of their retriever and generator models. In this work, we introduce FedRAG, a framework for fine-tuning RAG systems across centralized and federated architectures. FedRAG supports state-of-the-art fine-tuning methods, offering a simple and intuitive interface and a seamless conversion from centralized to federated training tasks. FedRAG is also deeply integrated with the modern RAG ecosystem, filling a critical gap in available tools.

事实证明,回收-提款发电系统(RAG)对于解决完全依赖大型语言模型的参数记忆的许多缺点是有效的,最近的工作表明,通过微调其检索器和发电机模型,可以改进RAG系统。在这项工作中,我们引入FedRAG,这是对中央和联合结构的RAG系统进行微调的框架。 FedRAG支持最先进的微调方法,提供一个简单和直观的界面,并将中央化培训任务顺利地转换为联合培训任务。 FedRAG还与现代RAG生态系统紧密结合,填补了现有工具中的一个关键空白。


Article 87

Title@2025-06-12 (4): Scalable unsupervised feature selection via weight stability

Title: Scalable unsupervised feature selection via weight stability Skalierbare unüberwachte Feature-Auswahl über Gewichtsstabilität 通过重量稳定性进行可缩放的、不受监督的特征选择 2506.06114v2

Authors (2): Xudong Zhang, Renato Cordeiro de Amorim

Unsupervised feature selection is critical for improving clustering performance in high-dimensional data, where irrelevant features can obscure meaningful structure. In this work, we introduce the Minkowski weighted $k$-means++, a novel initialisation strategy for the Minkowski Weighted $k$-means. Our initialisation selects centroids probabilistically using feature relevance estimates derived from the data itself. Building on this, we propose two new feature selection algorithms, FS-MWK++, which aggregates feature weights across a range of Minkowski exponents to identify stable and informative features, and SFS-MWK++, a scalable variant based on subsampling. We support our approach with a theoretical guarantee under mild assumptions and extensive experiments showing that our methods consistently outperform existing alternatives. Our software can be found at https://github.com/xzhang4-ops1/FSMWK.

无监督特征选择对于提升高维数据的聚类性能至关重要,因为在高维数据中,无关特征可能掩盖有意义的结构。在这项工作中,我们提出了 Minkowski Weighted $k$-means++,这是 Minkowski Weighted $k$-means 的一种新的初始化策略。我们的初始化利用从数据本身得到的特征相关性估计,以概率方式选择质心。在此基础上,我们提出了两种新的特征选择算法:FS-MWK++,它汇总一系列 Minkowski 指数下的特征权重,以识别稳定且信息丰富的特征;以及 SFS-MWK++,一种基于子采样的可扩展变体。我们给出了温和假设下的理论保证,并通过大量实验表明我们的方法始终优于现有方法。我们的软件见 https://github.com/xzhang4-ops1/FSMWK 。
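
As a rough illustration of the probabilistic seeding described in the abstract above, the sketch below applies $k$-means++-style sampling under a feature-weighted Minkowski distance. It is only a sketch under stated assumptions: the weight vector `w` is supplied by hand here (the paper derives relevance estimates from the data), and the names `weighted_minkowski` and `seed_centroids` are hypothetical, not taken from the authors' software.

```python
import numpy as np

def weighted_minkowski(x, c, w, p):
    """Feature-weighted Minkowski distance sum_v w_v^p * |x_v - c_v|^p (no p-th root)."""
    return np.sum((w ** p) * np.abs(x - c) ** p, axis=-1)

def seed_centroids(X, k, w, p, rng):
    """k-means++-style seeding: each new centroid is a data point sampled with
    probability proportional to its weighted distance to the nearest chosen centroid.
    Assumption: w is a fixed, user-supplied weight vector."""
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]
    for _ in range(k - 1):
        d = np.min([weighted_minkowski(X, c, w, p) for c in centroids], axis=0)
        probs = d / d.sum()
        centroids.append(X[rng.choice(n, p=probs)])
    return np.array(centroids)
```

Features with small weights contribute little to the distance, so they have little influence on which points are likely to be picked as seeds.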


Article 88

Title@2025-06-12 (4): STOAT: Spatial-Temporal Probabilistic Causal Inference Network

Title: STOAT: Spatial-Temporal Probabilistic Causal Inference Network STOAT: Raum-Temporal Probabilistic Causal Inference Network STOAT: 空间-临时概率因果关系网络 2506.09544v2

Authors (4): Yang Yang, Du Yin, Hao Xue, Flora Salim

Spatial-temporal causal time series (STC-TS) involve region-specific temporal observations driven by causally relevant covariates and interconnected across geographic or network-based spaces. Existing methods often model spatial and temporal dynamics independently and overlook causality-driven probabilistic forecasting, limiting their predictive power. To address this, we propose STOAT (Spatial-Temporal Probabilistic Causal Inference Network), a novel framework for probabilistic forecasting in STC-TS. The proposed method extends a causal inference approach by incorporating a spatial relation matrix that encodes interregional dependencies (e.g., proximity or connectivity), enabling spatially informed causal effect estimation. The resulting latent series are processed by deep probabilistic models to estimate the parameters of the distributions, enabling calibrated uncertainty modeling. We further explore multiple output distributions (e.g., Gaussian, Student's $t$, Laplace) to capture region-specific variability. Experiments on COVID-19 data across six countries demonstrate that STOAT outperforms state-of-the-art probabilistic forecasting models (DeepAR, DeepVAR, Deep State Space Model, etc.) in key metrics, particularly in regions with strong spatial dependencies. By bridging causal inference and geospatial probabilistic forecasting, STOAT offers a generalizable framework for complex spatial-temporal tasks, such as epidemic management.

时空因果时间序列(STC-TS)是指由因果相关的协变量驱动、并在地理空间或网络空间中相互关联的区域特定时间观测。现有方法往往独立地对空间和时间动态建模,并忽视因果驱动的概率预测,从而限制了其预测能力。为了解决这一问题,我们提出 STOAT(Spatial-Temporal Probabilistic Causal Inference Network),这是一个用于 STC-TS 概率预测的新框架。该方法通过引入编码区域间依赖关系(例如邻近性或连通性)的空间关系矩阵来扩展因果推断方法,从而实现具有空间信息的因果效应估计。所得到的潜在序列由深度概率模型处理,以估计分布的参数,从而实现校准的不确定性建模。我们进一步探索了多种输出分布(例如高斯分布、Student's $t$ 分布、拉普拉斯分布),以刻画区域特定的变异性。在六个国家的 COVID-19 数据上的实验表明,STOAT 在关键指标上优于最先进的概率预测模型(DeepAR、DeepVAR、Deep State Space Model 等),在空间依赖性较强的地区尤为明显。通过将因果推断与地理空间概率预测相结合,STOAT 为流行病管理等复杂的时空任务提供了一个可推广的框架。


Article 89

Title@2025-06-12 (4): Learning hidden cascades via classification

Title: Learning hidden cascades via classification Versteckte Kaskaden durch Klassifizierung lernen 通过分类学习隐藏的级联 2505.11228v2

Authors (5): Derrick Gilchrist Edward Manoharan, Anubha Goel, Alexandros Iosifidis, Henri Hansen, Juho Kanniainen

The spreading dynamics in social networks are often studied under the assumption that individuals' statuses, whether informed or infected, are fully observable. However, in many real-world situations, such statuses, which are crucial for determining an individual's potential to further spread the infection, remain unobservable. While this final status is hidden, intermediate indicators such as symptoms of infection are observable and provide important insights into the spread process. We propose a partial-observability-aware machine learning framework to learn the characteristics of the spreading model. We term the method Distribution Classification; it uses the power of classifiers to infer the underlying transmission dynamics. We evaluate our method on two types of synthetic networks and extend the study to a real-world insider trading network. Results show that the method performs well, especially on complex networks with high cyclic connectivity, supporting its utility in analyzing real-world spreading phenomena where direct observation of individual statuses is not possible.

社会网络的蔓延动态往往在以下假设下研究:个人状况,无论是知情还是受感染,都是完全可见的;然而,在许多现实世界局势中,这种状况仍然是不可观察的,这对于确定个人进一步传播感染的可能性至关重要;虽然这种最终状况是隐蔽的,但传染症状等中间指标是可观测的,对传播过程提供了重要的洞察力;我们提议了一个部分易观察性机器学习框架,以了解传播模式的特点;我们使用了方法分配分类,利用分类者的力量来推断潜在的传播动态;我们评估了两种合成网络的方法,并将研究扩大到真实世界的内部交易网络;结果显示,该方法运行良好,特别是在具有高度循环连接的复杂网络上,支持其用于分析无法直接观察个人状况的真实世界传播现象。


Article 90

Title@2025-06-12 (4): Large Language Models for Detection of Life-Threatening Texts

Title: Large Language Models for Detection of Life-Threatening Texts Große Sprachmodelle zur Erkennung lebensbedrohlicher Texte 探测生命威胁文字的长语言大语言模型 2506.10687v1

Authors (3): Thanh Thi Nguyen, Campbell Wilson, Janis Dalins

Detecting life-threatening language is essential for safeguarding individuals in distress, promoting mental health and well-being, and preventing potential harm and loss of life. This paper presents an effective approach to identifying life-threatening texts using large language models (LLMs) and compares them with traditional methods such as bag of words, word embedding, topic modeling, and Bidirectional Encoder Representations from Transformers. We fine-tune three open-source LLMs (Gemma, Mistral, and Llama-2), using their 7B parameter variants, on different datasets constructed with class balance, imbalance, and extreme imbalance scenarios. Experimental results demonstrate strong performance of LLMs against traditional methods. More specifically, the Mistral and Llama-2 models are the top performers in both balanced and imbalanced data scenarios, while Gemma is slightly behind. We employ the upsampling technique to deal with the imbalanced data scenarios and demonstrate that while this method benefits traditional approaches, it does not have as much impact on LLMs. This study demonstrates the great potential of LLMs for real-world life-threatening language detection problems.

检测威胁生命的语言对于保护处于困境中的个人、促进心理健康和福祉以及防止潜在伤害和生命损失至关重要,本文件展示了使用大型语言模型(LLMs)确定威胁生命的文本的有效方法,并将这些文本与传统方法进行比较,如变异器的单词包、字嵌入、主题模型和双向编码显示器等传统方法。我们微小地运用高端技术处理不平衡的数据假设情况,并表明虽然这种方法有利于传统方法,但对Llama-2 系统的影响不大。这项研究表明LLMs在现实世界威胁生命的语言探测问题方面有很大的潜力。
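
The upsampling step mentioned in the abstract above can be illustrated with plain random oversampling: minority-class examples are duplicated at random until every class matches the largest one. This is a minimal sketch under that assumption; the abstract does not say which upsampling variant the authors use, and the function name `upsample` is hypothetical.

```python
import random
from collections import defaultdict

def upsample(texts, labels, seed=0):
    """Random oversampling: duplicate minority-class examples (with replacement)
    until each class has as many examples as the largest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((text, label) for text in items + extra)
    rng.shuffle(balanced)
    return balanced
```

Oversampling should be applied to the training split only, so that duplicated examples never leak into the evaluation data.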


Article 91

Title@2025-06-12 (4): Croppable Knowledge Graph Embedding

Title: Croppable Knowledge Graph Embedding Croppable Knowledge Graph Embedding 可裁剪知识图嵌入 2407.02779v2

Authors (6): Yushan Zhu, Wen Zhang, Zhiqiang Liu, Mingyang Chen, Lei Liang, Huajun Chen

Knowledge Graph Embedding (KGE) is a common approach for Knowledge Graphs (KGs) in AI tasks. Embedding dimensions depend on application scenarios. Requiring a new dimension means training a new KGE model from scratch, increasing cost and limiting efficiency and flexibility. In this work, we propose a novel KGE training framework, MED. A single training run yields a croppable KGE model for multiple scenarios with different dimensional needs. Sub-models of the required dimensions can be directly cropped and used without extra training. In MED, we propose a mutual learning mechanism to improve the low-dimensional sub-models and make high-dimensional sub-models retain the low-dimensional sub-models' capacity, an evolutionary improvement mechanism to help the high-dimensional sub-models master the triples that the low-dimensional sub-models cannot, and a dynamic loss weight to adaptively balance the multiple losses. Experiments on 4 KGE models across 4 standard KG completion datasets, 3 real-world scenarios using a large-scale KG, and an extension of MED to the BERT language model demonstrate its effectiveness, high efficiency, and flexible extensibility.

知识嵌入图(KGE)是AI任务中知识图(KGS)的一种常见方法。嵌入维度取决于应用情景。 要求新的维度意味着从零开始培训一个新的KGE模型,增加成本并限制效率和灵活性。 在这项工作中,我们提议了一个新的KGE培训框架MED。它允许一种培训为具有不同维度需要的多种情景获得可裁成的KGE模型。要求层面的子模型可以在不经过额外培训的情况下直接裁剪和使用。在MED中,我们提议了一个相互学习机制,以改善低维次模型,并使高维次模型保持低维次模型的能力,这是一种进化改进机制,以促进高维次模型掌握低维子模型无法掌握的三重模型,以及动态损失权重,以适应性平衡多重损失。在4个标准KG的完成数据集中对4个KGE模型进行实验,3个真实世界模型使用大型KG,并将MED扩展至BERT语言模型,展示其有效性、高效率和灵活性。
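
To make the idea of "cropping" a sub-model concrete, the sketch below assumes that a sub-model of dimension `dim` is simply the first `dim` coordinates of each trained embedding, and scores triples with a TransE-style distance. Both choices are illustrative assumptions; the abstract does not specify how MED lays out dimensions or which scoring functions are involved, and the names `crop` and `transe_score` are hypothetical.

```python
import numpy as np

def crop(embeddings, dim):
    """Return a lower-dimensional sub-model by keeping the first `dim` coordinates.
    Assumption: sub-models are prefixes of the full embedding vectors."""
    return embeddings[:, :dim]

def transe_score(head, relation, tail):
    """TransE-style plausibility score: higher (closer to 0) means more plausible."""
    return -np.linalg.norm(head + relation - tail, axis=-1)
```

Under this assumption, serving a smaller model for a resource-constrained scenario needs no retraining, only a slice of the stored embedding tables.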


Article 92

Title@2025-06-12 (4): Implicit Language Models are RNNs: Balancing Parallelization and Expressivity

Title: Implicit Language Models are RNNs: Balancing Parallelization and Expressivity Implizite Sprachmodelle sind RNNs: Ausgleichende Parallelisierung und Expressivität 隐含语言模式是RNNs:平衡平行和表达。 2502.07827v3

Authors (6): Mark Schöne, Babak Rahmani, Heiner Kremer, Fabian Falck, Hitesh Ballani, Jannes Gladrow

State-space models (SSMs) and transformers dominate the language modeling landscape. However, they are constrained to a lower computational complexity than classical recurrent neural networks (RNNs), limiting their expressivity. In contrast, RNNs lack parallelization during training, raising fundamental questions about the trade-off between parallelization and expressivity. We propose implicit SSMs, which iterate a transformation until convergence to a fixed point. Theoretically, we show that implicit SSMs implement the non-linear state-transitions of RNNs. Empirically, we find that only approximate fixed-point convergence suffices, enabling the design of a scalable training curriculum that largely retains parallelization, with full convergence required only for a small subset of tokens. Our approach demonstrates superior state-tracking capabilities on regular languages, surpassing transformers and SSMs. We further scale implicit SSMs to natural language reasoning tasks and pretraining of large-scale language models up to 1.3B parameters on 207B tokens, representing, to our knowledge, the largest implicit model trained to date. Notably, our implicit models outperform their explicit counterparts on standard benchmarks. Our code is publicly available at http://github.com/microsoft/implicit_languagemodels.

国家空间模型(SSM)和变压器在语言建模格局中占主导地位,然而,它们被限制在比古典经常性神经网络(RNN)更低的计算复杂性下,限制其表达性。相比之下,在培训期间,RNNN缺乏平行性,对平行性和表达性之间的平衡提出了根本性问题。我们提出了隐含的 SMS,在向一个固定点趋同之前,它会转换。理论上,我们显示隐含的SMS实施了非线性状态的RNNs过渡。我们经常发现,只有近乎固定点的趋同,才能设计一个可扩展的培训课程,基本上保持平行化,只对一小部分象征物要求完全趋同。我们的方法显示了在常规语言、超变压器和SMM等上的高级国家跟踪能力。我们进一步将隐含的SMS扩大到自然语言推理任务,并预先培训大型语言模型,在207B标志上达到1.3B参数,据我们所知,是迄今所训练的最大隐含型模型。 值得注意的是,我们隐含的模型超越了他们在标准模模模/Anglimal标准基准上的明确对应者。我们代码是公开的。
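
The phrase "iterate a transformation until convergence to a fixed point" can be illustrated with a toy example. The sketch below repeatedly applies a `tanh` update to a hidden state until it stops changing; it is not the paper's architecture, and the update rule, the tolerance, and the name `fixed_point_state` are assumptions chosen for illustration.

```python
import numpy as np

def fixed_point_state(x, W, U, tol=1e-8, max_iter=200):
    """Iterate h <- tanh(W h + U x) until successive iterates differ by less than tol.
    Returns the (approximate) fixed point and the number of iterations used.
    Convergence is guaranteed here only when W is a contraction (spectral norm < 1)."""
    h = np.zeros(W.shape[0])
    for i in range(max_iter):
        h_next = np.tanh(W @ h + U @ x)
        if np.max(np.abs(h_next - h)) < tol:
            return h_next, i + 1
        h = h_next
    return h, max_iter
```

Loosening `tol` or lowering `max_iter` gives an approximate fixed point at lower cost, which is the kind of trade-off the abstract refers to when it says approximate convergence suffices.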


Article 93

Title@2025-06-12 (4): SR-Reward: Taking The Path More Traveled

Title: SR-Reward: Taking The Path More Traveled SR-Reward: Den Weg mehr beschreiten SR-奖励:走上更多旅行的道路 2501.02330v3

Authors (4): Seyed Mahdi B. Azad, Zahra Padar, Gabriel Kalweit, Joschka Boedecker

In this paper, we propose a novel method for learning reward functions directly from offline demonstrations. Unlike traditional inverse reinforcement learning (IRL), our approach decouples the reward function from the learner's policy, eliminating the adversarial interaction typically required between the two. This results in a more stable and efficient training process. Our reward function, called SR-Reward, leverages successor representation (SR) to encode a state based on expected future states' visitation under the demonstration policy and transition dynamics. By utilizing the Bellman equation, SR-Reward can be learned concurrently with most reinforcement learning (RL) algorithms without altering the existing training pipeline. We also introduce a negative sampling strategy to mitigate overestimation errors by reducing rewards for out-of-distribution data, thereby enhancing robustness. This strategy inherently introduces a conservative bias into RL algorithms that employ the learned reward. We evaluate our method on the D4RL benchmark, achieving competitive results compared to offline RL algorithms with access to true rewards and imitation learning (IL) techniques like behavioral cloning. Moreover, our ablation studies on data size and quality reveal the advantages and limitations of SR-Reward as a proxy for true rewards.

在本文中,我们提出了一种从离线演示中直接学习奖励功能的新颖方法。与传统的反向强化学习(IRL)不同,我们的方法将奖励功能从学习者的政策中分离出来,消除两者之间通常需要的对抗性互动。这导致一个更加稳定和高效的培训过程。我们的奖励功能,称为\ textit{SR-Rerward},利用后续代表(SR),根据示范政策和过渡动态,根据预期的未来国家访问来将一个国家编码。通过使用贝尔曼方程式,SR-Rward可以与大多数强化学习(RL)算法同时学习,而不改变现有的培训管道。我们还采用了负面的抽样战略,通过减少分配外数据的奖励来减少过高估计错误,从而增强稳健性。我们的奖励功能是将保守的偏差引入使用学到的奖励的RL算法。我们用D4RL基准的方法取得了竞争性的结果,而从离线的RL算法可以取得真正的奖赏和模仿学习(IL)技术,例如行为性克隆。此外,我们关于数据规模和质量奖赏的真正关系研究,揭示了数据的优势。
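
For readers unfamiliar with the successor representation (SR), the tabular case makes the Bellman structure explicit: for a policy-induced transition matrix $P$ and discount $\gamma$, the SR matrix satisfies $M = I + \gamma P M$, so $M = (I - \gamma P)^{-1}$. The sketch below computes this closed form for a small chain. It illustrates the SR only, not the paper's reward function or its learning procedure, and the function name is hypothetical.

```python
import numpy as np

def successor_representation(P, gamma):
    """Tabular SR: M[s, s'] is the expected discounted number of visits to s'
    when starting from s and following the transition matrix P.
    Solves (I - gamma * P) M = I, i.e. the Bellman equation M = I + gamma * P M."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, np.eye(n))
```

Each row of `M` sums to $1/(1-\gamma)$, since the discounted visit counts over all states must add up to the total discounted time.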


Article 94

Title@2025-06-12 (4): Saturation Self-Organizing Map

Title: Saturation Self-Organizing Map Karte zur Selbstorganisation der Sättigung 饱和自组织地图 2506.10680v1

Authors (2): Igor Urbanik, Paweł Gajewski

Continual learning poses a fundamental challenge for neural systems, which often suffer from catastrophic forgetting when exposed to sequential tasks. Self-Organizing Maps (SOMs), despite their interpretability and efficiency, are not immune to this issue. In this paper, we introduce Saturation Self-Organizing Maps (SatSOM), an extension of SOMs designed to improve knowledge retention in continual learning scenarios. SatSOM incorporates a novel saturation mechanism that gradually reduces the learning rate and neighborhood radius of neurons as they accumulate information. This effectively freezes well-trained neurons and redirects learning to underutilized areas of the map.

持续学习给神经系统带来了根本性的挑战,神经系统在接连工作时往往会遭受灾难性的遗忘,而自我组织的地图尽管可以解释和效率高,但不能幸免于这一问题,在本文件中,我们引入了饱和自组织地图(SatSOM)——SOM(SatSOM)——扩展SOM(SOM),目的是在不断学习的情景中提高知识保留率。SatSOM包含一种新的饱和机制,在神经元积累信息时逐渐降低学习率和周边周边周边。这有效地冻结了受过良好训练的神经元,并将学习转向地图中未充分利用的地区。
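
The saturation idea described above can be sketched for a one-dimensional map in which every neuron carries its own learning rate and neighborhood radius, and both shrink each time the neuron takes part in an update. The specific decay rule (`rate`), the Gaussian neighborhood, and the name `saturating_som_step` are assumptions made for illustration; the abstract does not give the authors' exact update equations.

```python
import numpy as np

def saturating_som_step(weights, lr, sigma, x, rate=0.1, min_sigma=1e-3):
    """One SOM update on a 1-D map with per-neuron learning rates and radii.
    weights: (n, d) prototypes; lr, sigma: (n,) per-neuron values; x: (d,) input.
    Neurons that participate in the update have their lr and sigma reduced
    (a hypothetical saturation rule), so frequently used neurons gradually freeze."""
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))  # best matching unit
    grid = np.arange(len(weights))
    influence = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma[bmu] ** 2))
    new_weights = weights + (lr * influence)[:, None] * (x - weights)
    shrink = 1.0 - rate * influence
    new_lr = lr * shrink
    new_sigma = np.maximum(sigma * shrink, min_sigma)
    return new_weights, new_lr, new_sigma
```

Because the best matching unit has the largest influence, its learning rate decays fastest, while distant, rarely used neurons stay plastic for later tasks.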


Article 95

Title@2025-06-12 (4): VeriContaminated: Assessing LLM-Driven Verilog Coding for Data Contamination

Title: VeriContaminated: Assessing LLM-Driven Verilog Coding for Data Contamination VeriContaminiert: Bewertung von LLM-getriebener Verilog Coding für Datenkontamination 受污染:评估数据污染的LLM-Driven Verilog编码 2503.13572v3

Authors (8): Zeng Wang, Minghao Shao, Jitendra Bhandari, Likhitha Mankali, Ramesh Karri, Ozgur Sinanoglu, Muhammad Shafique, Johann Knechtel

Large Language Models (LLMs) have revolutionized code generation, achieving exceptional results on various established benchmarking frameworks. However, concerns about data contamination (where benchmark data inadvertently leaks into pre-training or fine-tuning datasets) raise questions about the validity of these evaluations. While this issue is known to limit the industrial adoption of LLM-driven software engineering, hardware coding has received little to no attention regarding these risks. For the first time, we analyze state-of-the-art (SOTA) evaluation frameworks for Verilog code generation (VerilogEval and RTLLM), using established methods for contamination detection (CCD and Min-K% Prob). We cover SOTA commercial and open-source LLMs (CodeGen2.5, Minitron 4b, Mistral 7b, phi-4 mini, LLaMA-{1,2,3.1}, GPT-{2,3.5,4o}, Deepseek-Coder, and CodeQwen 1.5), in baseline and fine-tuned models (RTLCoder and Verigen). Our study confirms that data contamination is a critical concern. We explore mitigations and the resulting trade-offs for code quality vs. fairness (i.e., reducing contamination toward unbiased benchmarking).

大型语言模型(LLMS)使代码生成发生了革命性变化,在各种既定基准框架(VerilogEval和RTLLM)上取得了特殊成果,然而,对数据污染的关注(基准数据无意中泄漏到培训前或微调数据集中)引起了关于这些评估有效性的问题,尽管这个问题众所周知,限制LLM驱动软件工程的工业采用,硬件编码在这些风险方面几乎没有引起任何注意。我们首次在基线和微调模型(RTLCD和Verigen)中分析了Verilog码生成的最新(SOTA)评价框架(VerilogEval和RTLLLM),我们的研究证实,SOTA的商业和开放源码软件(CodeGen2.5、Minitron 4b、Mistral 7b、phy-4 mini、LLAMA-1,2,3.1}、GPT-{2,3.5,4o}、Deepseekk-Coder和DCQwen)中,数据污染质量和标准化标准(RTLCoder和Vergen)是一个重要的关切。我们的研究确认数据污染是降低质量和标准。
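
Min-K% Prob, one of the contamination detectors named above, scores a text by averaging the log-probabilities of its least likely tokens: text seen during training tends to contain fewer very-low-probability tokens, so its score is higher. The sketch below assumes the per-token log-probabilities have already been obtained from the model under test; the function name and the default `k` are illustrative choices, not values from the paper.

```python
def min_k_percent_prob(token_logprobs, k=0.2):
    """Average log-probability of the k fraction of tokens with the lowest
    log-probability. Higher (closer to 0) scores suggest the text is more
    likely to have been seen during training."""
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n
```

In practice the score is compared against a threshold calibrated on text known to be unseen; that calibration step is not shown here.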


Article 96

Title@2025-06-12 (4): Practical Improvements of A/B Testing with Off-Policy Estimation

Title: Practical Improvements of A/B Testing with Off-Policy Estimation Praktische Verbesserungen der A/B-Prüfung mit Off-Policy-Abschätzung 实际改进A/B测试和非政策性估算的非政策性A/B测试 2506.10677v1

Authors (3): Otmane Sakhi, Alexandre Gilotte, David Rohde

We address the problem of A/B testing, a widely used protocol for evaluating the potential improvement achieved by a new decision system compared to a baseline. This protocol segments the population into two subgroups, each exposed to a version of the system, and estimates the improvement as the difference between the measured effects. In this work, we demonstrate that the commonly used difference-in-means estimator, while unbiased, can be improved. We introduce a family of unbiased off-policy estimators that achieves lower variance than the standard approach. Among this family, we identify the estimator with the lowest variance. The resulting estimator is simple, and offers substantial variance reduction when the two tested systems exhibit similarities. Our theoretical analysis and experimental results validate the effectiveness and practicality of the proposed method.

我们处理A/B测试问题,这是用来评估新决策系统与基线相比可能实现的改进的一个广泛应用的协议。这个协议将人口分成两个分组,每个分组都接触一个版本的系统,并将改进作为衡量效果之间的差别来估计。在这项工作中,我们证明,通常使用的中值差异估测器虽然没有偏见,但可以改进。我们引入了一个不带偏见的非政策性估计器组成的大家庭,其差异小于标准方法。在这个大家庭中,我们找出了差异最小的估测器。由此得出的估测器很简单,当两个测试的系统显示出相似之处时,其差异就会大为减少。我们的理论分析和实验结果证实了拟议方法的有效性和实用性。
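
The baseline that the abstract above sets out to improve, the difference-in-means estimator, is simple enough to state in a few lines. The sketch below computes the estimated effect and the usual estimate of its variance from the outcomes of the two subgroups. It covers only this standard baseline; the off-policy estimators proposed in the paper are not reproduced, and the function name is hypothetical.

```python
from statistics import mean, variance

def difference_in_means(outcomes_a, outcomes_b):
    """Standard A/B estimate: mean outcome under B minus mean outcome under A,
    together with the estimated variance of that difference
    (sum of per-group sample variances divided by group sizes)."""
    effect = mean(outcomes_b) - mean(outcomes_a)
    var = variance(outcomes_a) / len(outcomes_a) + variance(outcomes_b) / len(outcomes_b)
    return effect, var
```

The variance term is what a lower-variance unbiased estimator would reduce: with the same traffic, a smaller variance yields tighter confidence intervals on the measured improvement.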


Article 97

Title@2025-06-12 (4): RmGPT: A Foundation Model with Generative Pre-trained Transformer for Fault Diagnosis and Prognosis in Rotating Machinery

Title: RmGPT: A Foundation Model with Generative Pre-trained Transformer for Fault Diagnosis and Prognosis in Rotating Machinery RmGPT: Ein Fundamentmodell mit generativem vortrainiertem Transformer für Fehlerdiagnose und Prognose in rotierenden Maschinen RMGPT: 轮调机械系统失灵诊断和预预报的预培训前生成变形器基金会模型 2409.17604v2

Authors (8): Yilin Wang, Yifei Yu, Kong Sun, Peixuan Lei, Yuxuan Zhang, Enrico Zio, Aiguo Xia, Yuanxiang Li

In industry, the reliability of rotating machinery is critical for production efficiency and safety. Current methods of Prognostics and Health Management (PHM) often rely on task-specific models, which face significant challenges in handling diverse datasets with varying signal characteristics, fault modes and operating conditions. Inspired by advancements in generative pretrained models, we propose RmGPT, a unified model for diagnosis and prognosis tasks. RmGPT introduces a novel generative token-based framework, incorporating Signal Tokens, Prompt Tokens, Time-Frequency Task Tokens and Fault Tokens to handle heterogeneous data within a unified model architecture. We leverage self-supervised learning for robust feature extraction and introduce a next signal token prediction pretraining strategy, alongside efficient prompt learning for task-specific adaptation. Extensive experiments demonstrate that RmGPT significantly outperforms state-of-the-art algorithms, achieving near-perfect accuracy in diagnosis tasks and exceptionally low errors in prognosis tasks. Notably, RmGPT excels in few-shot learning scenarios, achieving 82% accuracy in 16-class one-shot experiments, highlighting its adaptability and robustness. This work establishes RmGPT as a powerful PHM foundation model for rotating machinery, advancing the scalability and generalizability of PHM solutions. Code is available at: https://github.com/Pandalin98/RmGPT.

在工业中,旋转机械的可靠性对生产效率和安全至关重要。当前的故障预测与健康管理(PHM)方法往往依赖任务特定的模型,这些模型在处理信号特征、故障模式和运行工况各异的多样化数据集时面临重大挑战。受生成式预训练模型进展的启发,我们提出 RmGPT,一个用于诊断和预测任务的统一模型。RmGPT 引入了一种新的基于 token 的生成式框架,包含 Signal Token、Prompt Token、Time-Frequency Task Token 和 Fault Token,以便在统一的模型架构中处理异构数据。我们利用自监督学习进行稳健的特征提取,并引入下一信号 token 预测的预训练策略,同时采用高效的提示学习进行任务特定的适配。大量实验表明,RmGPT 显著优于最先进的算法,在诊断任务中达到接近完美的准确率,在预测任务中误差极低。值得注意的是,RmGPT 在少样本学习场景中表现出色,在 16 类单样本实验中达到 82% 的准确率,凸显了其适应性和稳健性。这项工作确立了 RmGPT 作为旋转机械领域强大的 PHM 基础模型的地位,提升了 PHM 解决方案的可扩展性和泛化能力。代码见 https://github.com/Pandalin98/RmGPT 。


Article 98

Title@2025-06-12 (4): Automated Generation of Precedence Graphs in Digital Value Chains for Automotive Production

Title: Automated Generation of Precedence Graphs in Digital Value Chains for Automotive Production Automatisierte Generierung von Precedence Graphen in digitalen Wertschöpfungsketten für die Automobilproduktion 自动生成汽车生产数字价值链中的优先性图 2504.19835v2

Authors (2): Cornelius Hake, Christian Friedrich

This study examines the digital value chain in automotive manufacturing, focusing on the identification, software flashing, customization, and commissioning of electronic control units in vehicle networks. A novel precedence graph design is proposed to optimize this process chain using an automated scheduling algorithm, which combines structured data extraction from heterogeneous sources via natural language processing and classification techniques with mixed integer linear programming for efficient graph generation. The results show significant improvements in key metrics. The algorithm reduces the number of production stations equipped with expensive hardware and software to execute digital value chain processes, while also increasing capacity utilization through efficient scheduling and reduced idle time. Task parallelization is optimized, resulting in streamlined workflows and increased throughput. Compared to the traditional scheduling method, the automated approach has reduced preparation time by 50% and reduced scheduling activities, as it now takes two minutes to create the precedence graph. The flexibility of the algorithm’s constraints allows for vehicle-specific configurations while maintaining high responsiveness, eliminating backup stations and facilitating the integration of new topologies. Automated scheduling significantly outperforms manual methods in efficiency, functionality, and adaptability.

这项研究审查了汽车制造业的数字价值链,重点是车辆网络电子控制单位的识别、软件闪烁、定制和调试,建议采用新的优先图表设计,采用自动排期算法优化这一流程链,将通过自然语言处理和分类技术从不同来源按结构提取的数据与用于高效生成图形的混合整数线性编程相结合,结果显示关键指标有了重大改进;算法减少了配备了昂贵硬件和软件以执行数字价值链流程的生产站的数量,同时通过高效排期和减少闲置时间提高了能力利用率;优化了任务平行化,从而简化了工作流程和吞吐量。与传统的排期法相比,自动化方法减少了50%的排期,减少了排期活动,因为现在需要两分钟才能创建排期图。算法的局限性使得车辆特定配置具有灵活性,同时保持高度的响应能力,消除备份站,并促进新表的整合。自动排期表在效率、功能和适应性方面大大优于手工方法。
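
A precedence graph records which tasks must finish before others may start, and grouping tasks into stages that can run in parallel is the simplest thing one can do with it. The sketch below uses Python's standard `graphlib` module for that grouping. It is only an illustration of the data structure: the paper's scheduler combines NLP-based data extraction with mixed integer linear programming, which is not reproduced here, and the task names in the usage example are hypothetical.

```python
from graphlib import TopologicalSorter

def parallel_stages(predecessors):
    """Group tasks into stages; every task in a stage has all of its
    predecessors in earlier stages, so tasks within a stage can run in parallel.
    predecessors: dict mapping each task to the set of tasks it depends on."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

For example, `parallel_stages({"flash": {"identify"}, "customize": {"flash"}, "commission": {"flash"}})` places `identify` first, `flash` second, and the two remaining tasks together in a final stage.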


Article 99

Title@2025-06-12 (4): Diffusion-Free Graph Generation with Next-Scale Prediction

Title: Diffusion-Free Graph Generation with Next-Scale Prediction Diffusionsfreie Graphengenerierung mit Next-Scale-Vorhersage 下期期期预测的无传播图制作 2503.23612v2

Authors (6): Samuel Belkadi, Steve Hong, Marian Chen, Miruna Cretu, Charles Harris, Pietro Lio

Autoregressive models excel in efficiency and plug directly into the transformer ecosystem, delivering robust generalization, predictable scalability, and seamless workflows such as fine-tuning and parallelized training. However, they require an explicit sequence order, which contradicts the unordered nature of graphs. In contrast, diffusion models maintain permutation invariance and enable one-shot generation but require up to thousands of denoising steps and additional features for expressivity, leading to high computational costs. Inspired by recent breakthroughs in image generation, especially the success of visual autoregressive methods, we propose MAG, a novel diffusion-free graph generation framework based on next-scale prediction. By leveraging a hierarchy of latent representations, the model progressively generates scales of the entire graph without the need for explicit node ordering. Experiments on both generic and molecular graph datasets demonstrated the potential of this method, achieving inference speedups of up to three orders of magnitude over state-of-the-art methods, while preserving high-quality generation.

自动递减模型在效率方面非常出色,直接插入变压器生态系统,提供了强有力的通用、可预测的缩放性和无缝工作流程,如微调和平行培训。然而,它们需要明确的序列顺序,这与图形的无顺序性质相矛盾。相反,扩散模型保持了一闪不动的生成,但需要多达数千个分流步骤和额外特征来进行表达,从而导致高计算成本。在图像生成方面最近出现的突破,特别是视觉自动递增方法的成功,我们建议MAG,一个基于下尺度预测的新型无扩散图形生成框架。模型通过利用潜伏代表结构的等级,逐渐生成整个图形的尺度,而不需要明确的节点排序。对通用和分子图形数据集的实验显示了这种方法的潜力,在保持高质量生成的同时,在将高质量生成的同时,实现最高至三个级的超标度加速。


Article 100

Title@2025-06-12 (4): Logarithmic Smoothing for Adaptive PAC-Bayesian Off-Policy Learning

Title: Logarithmic Smoothing for Adaptive PAC-Bayesian Off-Policy Learning Logarithmische Glättung für adaptives PAC-Bayesisches Off-Policy-Lernen PAC-Bayesian适应性PAC-Bayesian离政策学习对数平滑 2506.10664v1

Authors (2): Maxime Haddouche, Otmane Sakhi

Off-policy learning serves as the primary framework for learning optimal policies from logged interactions collected under a static behavior policy. In this work, we investigate the more practical and flexible setting of adaptive off-policy learning, where policies are iteratively refined and re-deployed to collect higher-quality data. Building on the success of PAC-Bayesian learning with Logarithmic Smoothing (LS) in static settings, we extend this framework to the adaptive scenario using tools from online PAC-Bayesian theory. Furthermore, we demonstrate that a principled adjustment to the LS estimator naturally accommodates multiple rounds of deployment and yields faster convergence rates under mild conditions. Our method matches the performance of leading offline approaches in static settings, and significantly outperforms them when intermediate policy deployments are allowed. Empirical evaluations across diverse scenarios highlight both the advantages of adaptive data collection and the strength of the PAC-Bayesian formulation.

离政策学习是学习从静态行为政策下收集的登录互动中获取最佳政策的最佳政策的主要框架。 在这项工作中,我们调查适应性离政策学习的更实际和灵活环境,在这种环境中,对政策进行迭代改进和重新部署,以收集更高质量的数据。在PAC-Bayesian在静态环境中与对数平滑(LS)成功学习的基础上,我们利用在线PAC-Bayesian理论的工具,将这一框架扩大到适应情景。此外,我们还表明,对LS测算仪进行有原则的调整,自然会适应多轮部署,并在温和条件下产生更快的趋同率。我们的方法与静态环境中主要离线方法的绩效相匹配,并在允许中期政策部署时大大优于这些绩效。 不同情景的实证评估强调了适应性数据收集的优势和PAC-Bayesian公式的力度。


Article 101

Title@2025-06-12 (4): Pushing the Limits of Extreme Weather: Constructing Extreme Heatwave Storylines with Differentiable Climate Models

Title: Pushing the Limits of Extreme Weather: Constructing Extreme Heatwave Storylines with Differentiable Climate Models Die Grenzen des extremen Wetters sprengen: Extreme Hitzewellen-Geschichten mit differenzierbaren Klimamodellen konstruieren 推推极端天气的极限:用不同的气候模型构建极端热浪线 2506.10660v1

Authors (2): Tim Whittaker, Alejandro Di Luca

Understanding the plausible upper bounds of extreme weather events is essential for risk assessment in a warming climate. Existing methods, based on large ensembles of physics-based models, are often computationally expensive or lack the fidelity needed to simulate rare, high-impact extremes. Here, we present a novel framework that leverages a differentiable hybrid climate model, NeuralGCM, to optimize initial conditions and generate physically consistent worst-case heatwave trajectories. Applied to the 2021 Pacific Northwest heatwave, our method produces temperature anomalies up to 3.7 $^\circ$C above the most extreme member of a 75-member ensemble. These trajectories feature intensified atmospheric blocking and amplified Rossby wave patterns, which are hallmarks of severe heat events. Our results demonstrate that differentiable climate models can efficiently explore the upper tails of event likelihoods, providing a powerful new approach for constructing targeted storylines of extreme weather under climate change.

了解极端天气事件可信的上层界限对于在气候变暖情况下进行风险评估至关重要。基于基于物理模型的大型组合的现有方法往往计算成本昂贵或缺乏模拟稀有、高影响极端所需的忠诚性。在这里,我们提出了一个新的框架,利用不同的混合气候模型NeuralGCM来优化初始条件和产生物理上一致的最差热波轨迹。在2021年太平洋西北热浪中,我们的方法产生了温度异常,超过75个成员组合中最极端的成员3.7 $circ$C。这些轨迹特征是大气阻塞加剧,并放大了严重热事件的罗斯比波模式。我们的结果表明,不同的气候模型能够有效地探索事件可能性的上尾部,为构建气候变化下极端天气的定向故事线提供了强有力的新办法。


Article 102

Title@2025-06-12 (4): CoRT: Code-integrated Reasoning within Thinking

Title: CoRT: Code-integrated Reasoning within Thinking CoRT: Code-integrierte Vernunft im Denken CORT: 思考中守则综合理由 2506.09820v2

Authors (11): Chengpeng Li, Zhengyang Tang, Ziniu Li, Mingfeng Xue, Keqin Bao, Tian Ding, Ruoyu Sun, Benyou Wang, Xiang Wang, Junyang Lin, Dayiheng Liu

Large Reasoning Models (LRMs) like o1 and DeepSeek-R1 have shown remarkable progress in natural language reasoning with long chain-of-thought (CoT), yet they remain inefficient or inaccurate when handling complex mathematical operations. Addressing these limitations through computational tools (e.g., computation libraries and symbolic solvers) is promising, but it introduces a technical challenge: Code Interpreter (CI) brings external knowledge beyond the model's internal text representations, so the direct combination is not efficient. This paper introduces CoRT, a post-training framework for teaching LRMs to leverage CI effectively and efficiently. As a first step, we address the data scarcity issue by synthesizing code-integrated reasoning data through Hint-Engineering, which strategically inserts different hints at appropriate positions to optimize LRM-CI interaction. We manually create 30 high-quality samples, upon which we post-train models ranging from 1.5B to 32B parameters, with supervised fine-tuning, rejection fine-tuning and reinforcement learning. Our experimental results demonstrate that Hint-Engineering models achieve 4% and 8% absolute improvements on DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-1.5B respectively, across five challenging mathematical reasoning datasets. Furthermore, Hint-Engineering models use about 30% fewer tokens for the 32B model and 50% fewer tokens for the 1.5B model compared with the natural language models. The models and code are available at https://github.com/ChengpengLi1003/CoRT.

o1 和 DeepSeek-R1 等大型推理模型(LRM)在使用长思维链(CoT)的自然语言推理方面取得了显著进展,但在处理复杂的数学运算时仍然效率低下或不够准确。通过计算工具(例如计算库和符号求解器)来解决这些局限很有前景,但也带来了技术挑战:代码解释器(CI)引入了模型内部文本表示之外的外部知识,因此直接结合的效率不高。本文提出 CoRT,这是一个用于教会 LRM 有效且高效地利用 CI 的后训练框架。第一步,我们通过 Hint-Engineering 合成代码集成推理数据来解决数据稀缺问题,该方法有策略地在适当位置插入不同的提示,以优化 LRM 与 CI 的交互。我们手工构造了 30 个高质量样本,并在此基础上通过监督微调、拒绝微调和强化学习对 1.5B 到 32B 参数的模型进行后训练。实验结果表明,在五个具有挑战性的数学推理数据集上,Hint-Engineering 模型在 DeepSeek-R1-Distill-Qwen-32B 和 DeepSeek-R1-Distill-Qwen-1.5B 上分别取得了 4% 和 8% 的绝对提升。此外,与自然语言模型相比,Hint-Engineering 模型在 32B 模型上少用约 30% 的 token,在 1.5B 模型上少用约 50% 的 token。模型和代码见 https://github.com/ChengpengLi1003/CoRT 。


Article 103

Title@2025-06-12 (4): TSFM-Bench: A Comprehensive and Unified Benchmark of Foundation Models for Time Series Forecasting

Title: TSFM-Bench: A Comprehensive and Unified Benchmark of Foundation Models for Time Series Forecasting TSFM-Bench: Ein umfassender und einheitlicher Benchmark von Basismodellen für die Zeitreihenprognose TSFM-Bench:时间序列预测基础模型的全面统一基准 2410.11802v6

Authors (11): Zhe Li, Xiangfei Qiu, Peng Chen, Yihang Wang, Hanyin Cheng, Yang Shu, Jilin Hu, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Bin Yang

Time Series Forecasting (TSF) is a key functionality in numerous fields, such as financial investment, weather services, and energy management. Although increasingly capable TSF methods are emerging, many of them require domain-specific data collection and model training and do not generalize well when applied in other domains. Time Series Foundation Models (TSFMs) that are pre-trained on massive heterogeneous time series data aim to overcome these limitations. The prospects for generalizability have spurred the development of a new generation of TSFMs. This study proposes a benchmark, TSFM-Bench, to facilitate comprehensive and unified evaluation of TSFMs. TSFM-Bench covers a wide range of TSFMs, including those based on large language models and those pre-trained on time series data. TSFM-Bench supports multiple forecasting scenarios, including zero-shot, few-shot, and full-shot, enabling assessment across the full range of adaptation strategies. TSFM-Bench also provides standardized experimental protocols for critical evaluation processes such as dataset splitting, loading, normalization, and few-shot sampling, facilitating consistency and fairness. We report on an extensive evaluation of TSFMs across a diverse range of datasets spanning multiple domains and exhibiting varied statistical characteristics. Specifically, we identify the pros, cons, and inherent limitations of existing TSFMs, and we propose potential directions for new model designs.

时间序列预测(TSF)是金融投资、天气服务和能源管理等许多领域的关键功能,尽管技术战略框架方法越来越具有能力,但其中许多方法需要具体领域的数据收集和模型培训,在应用其他领域时没有很好地推广。时间系列基础模型(TSFM)预先培训了大规模不同时间序列数据,目的是克服这些限制。一般性的前景刺激了新一代技术战略框架的发展。本研究报告提出了一套基准,即TSFM-Bench,以促进对技术供应链的全面和统一评价。TSFM-Bench涵盖广泛的技术战略框架,包括基于大语言模型和事先接受过时间序列数据培训的模型。TSFM-Bench支持多种预测情景,包括零射、几发和全射,有利于对各种适应战略进行评估。TSFM-Bench还提出了一套标准化的示范性实验程序,用于关键评估进程,例如数据集的分离、装载、正常化和少发抽样取样,促进一致性和公平性。我们报告关于广泛评价大规模评估战略领域和多种统计模式的动态,我们提出了各种统计模式,并提出了各种统计模式,并提出了我们提出了各种统计模式,并展示了多种模式。
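
Two of the protocol steps listed above, dataset splitting and normalization, are common sources of unfair comparisons, so a concrete convention helps. The sketch below shows one widely used choice: a chronological train/validation/test split with z-score statistics computed on the training part only. The split fractions and the function name are assumptions for illustration and are not taken from TSFM-Bench.

```python
import numpy as np

def chrono_split_normalize(series, train_frac=0.7, val_frac=0.1):
    """Split a 1-D series chronologically and z-score every part with the
    mean and standard deviation of the training part only, so that no
    information from the validation or test period leaks into preprocessing."""
    n = len(series)
    i = round(n * train_frac)
    j = i + round(n * val_frac)
    train, val, test = series[:i], series[i:j], series[j:]
    mu, sd = train.mean(), train.std()
    sd = sd if sd > 0 else 1.0  # guard against constant training data
    return tuple((part - mu) / sd for part in (train, val, test))
```

Fixing such details once in the benchmark, rather than per model, is what makes results comparable across methods.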


Article 104

Title@2025-06-12 (4): Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference

Title: Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference Graph-dependent regret Bounds in Multi-Armed Bandits mit Interferenz 多武装有干扰力的多武装强力土匪的图形- 依赖性 Regret Regret 界圈 2503.07555v2

Authors (3): Fateme Jamshidi, Mohammad Shahverdikondori, Negar Kiyavash

We study multi-armed bandits under network interference, where each unit’s reward depends on its own treatment and those of its neighbors in a given graph. This induces an exponentially large action space, making standard approaches computationally impractical. We propose a novel algorithm that uses the local graph structure to minimize regret. We derive a graph-dependent upper bound on cumulative regret that improves over prior work. Additionally, we provide the first lower bounds for bandits with arbitrary network interference, where each bound involves a distinct structural property of the graph. These bounds show that for both dense and sparse graphs, our algorithm is nearly optimal, with matching upper and lower bounds up to logarithmic factors. When the interference graph is unknown, a variant of our algorithm is Pareto optimal: no algorithm can uniformly outperform it across all instances. We complement our theoretical results with numerical experiments, showing that our approach outperforms the baseline methods.

在网络干扰下,我们研究多武装强盗,每个单位的奖赏取决于自己的待遇和某个图表中邻居的待遇。这引出了一个指数大的行动空间,使得标准方法在计算上变得不切实际。我们建议了一个使用本地图形结构的新型算法,以最大限度地减少遗憾。我们从累积的遗憾中得出一个依靠图形的上限,从而改善先前的工作。此外,我们为强盗提供了第一个较低的下限,即任意网络干扰,每个捆绑都涉及图的特殊结构属性。这些界限表明,对于稠密和稀疏的图表来说,我们的算法几乎是最佳的,与对数因素相匹配。当干扰图为未知时,我们的算法的变式是Pareto最佳的:没有一种算法可以在所有实例中一致地超越它。我们用数字实验来补充我们的理论结果,表明我们的方法比基线方法要好。


Article 105

Title@2025-06-12 (4): Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes

Title: Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes Robuste, unüberwachte Anpassung eines Spracherkennungsgeräts mit Entropie-Minimierungs- und Lautsprechercodes 使用磁最小化和演讲人守则的演讲者演讲者 2506.10653v1

Authors (4): Rogier C. van Dalen, Shucong Zhang, Titouan Parcollet, Sourav Bhattacharya

Speech recognisers usually perform optimally only in a specific environment and need to be adapted to work well in another. For adaptation to a new speaker, there is often too little data for fine-tuning to be robust, and that data is usually unlabelled. This paper proposes a combination of approaches to make adaptation to a single minute of data robust. First, instead of estimating the adaptation parameters with cross-entropy on a single error-prone hypothesis or “pseudo-label”, this paper proposes a novel loss function, the conditional entropy over complete hypotheses. Using multiple hypotheses makes adaptation more robust to errors in the initial recognition. Second, a “speaker code” characterises a speaker in a vector short enough that it requires little data to estimate. On a far-field noise-augmented version of Common Voice, the proposed scheme yields a 20% relative improvement in word error rate with one minute of adaptation data, rising to 29% with 10 minutes.

语音识别器通常只在特定环境中最优化地发挥作用, 并且需要在另一个环境中适应效果良好。 为了适应新扬声器, 通常只有太少的数据可以微调, 数据通常不贴标签。 本文提出将适应方法组合起来, 使数据达到单分钟的稳健度。 首先, 本文没有在单一的错误易变假设或“ 假名” 上用跨点滴子来估计适应参数, 而是提出一个新的损失功能, 即有条件的对完整假称的诱变。 使用多个假设使适应能力更能适应初始识别错误。 其次, “ 扩音码” 的特性使一个矢量短, 需要很少数据来估计。 在远方的常见声音版本中, 所拟议的方案在适应数据一分钟的字错误率上产生20%的相对改善, 增加10分钟到29 % 。
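The conditional-entropy loss mentioned in this abstract can be illustrated with a minimal sketch. It assumes the hypothesis space is approximated by a short n-best list with one log-score per complete hypothesis; the function name `hypothesis_entropy` and the example scores are hypothetical, and the paper may compute the quantity differently (for instance over a lattice).

```python
import numpy as np

def hypothesis_entropy(log_scores):
    """Entropy (in nats) of the posterior over a list of complete hypotheses."""
    log_scores = np.asarray(log_scores, dtype=float)
    # Normalise the scores over the hypothesis list (stable log-softmax).
    log_post = log_scores - np.logaddexp.reduce(log_scores)
    post = np.exp(log_post)
    return float(-(post * log_post).sum())

# Hypothetical n-best log-scores: one confident list, one maximally uncertain list.
confident = hypothesis_entropy([0.0, -10.0, -12.0])
uncertain = hypothesis_entropy([0.0, 0.0, 0.0])
print(f"confident: {confident:.4f} nats, uncertain: {uncertain:.4f} nats")
```

Minimising this quantity pushes the recogniser towards a sharper posterior without committing to a single pseudo-label, which is the robustness argument the abstract makes.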


Article 106

Title@2025-06-12 (4): Data Shifts Hurt CoT: A Theoretical Study

Title: Data Shifts Hurt CoT: A Theoretical Study Data Shifts Hurt CoT: Eine theoretische Studie Data Shifts Hurt CoT:理论研究 2506.10647v1

Authors (3): Lang Yin, Debangshu Banerjee, Gagandeep Singh

Chain of Thought (CoT) has been applied to various large language models (LLMs) and proven to be effective in improving the quality of outputs. In recent studies, transformers are proven to have absolute upper bounds in terms of expressive power, and consequently, they cannot solve many computationally difficult problems. However, empowered by CoT, transformers are proven to be able to solve some difficult problems effectively, such as the $k$-parity problem. Nevertheless, those works rely on two imperative assumptions: (1) identical training and testing distribution, and (2) corruption-free training data with correct reasoning steps. However, in the real world, these assumptions do not always hold. Although the risks of data shifts have caught attention, our work is, to the best of our knowledge, the first to rigorously study the exact harm caused by such shifts. Focusing on the $k$-parity problem, in this work we investigate the joint impact of two types of data shifts, distribution shifts and data poisoning, on the quality of trained models obtained by a well-established CoT decomposition. In addition to revealing a surprising phenomenon that CoT leads to worse performance on learning parity than directly generating the prediction, our technical results also give a rigorous and comprehensive explanation of the mechanistic reasons for such impact.

运用于各种大型语言模型(LLMS),并证明在提高产出质量方面行之有效。在最近的研究中,变压器被证明在表达力方面具有绝对的上限,因此无法解决许多计算上的困难问题。然而,由于CoT的赋权,变压器被证明能够有效解决一些棘手问题,例如美元差异问题。然而,这些工作依赖于两个紧迫的假设:(1) 相同的培训和测试分布,和(2) 无腐败的培训数据,并有正确的推理步骤。然而,在现实世界中,这些假设并非始终有效。尽管数据变化的风险已经引起注意,但我们的工作是首先严格研究这种变化对我们知识的最佳影响所造成的确切损害。在这项工作中,我们注重美元差异问题,我们调查两类数据变化的共同影响:分布变化和数据中毒,以及由完善的Cot分解法获得的经过培训的模式的质量。此外,我们还揭示出一种令人吃惊的现象,即CoT导致学习平等性表现比直接产生严格的预测结果。
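For readers unfamiliar with the $k$-parity problem named in this abstract, the sketch below contrasts predicting the parity directly with producing it through intermediate XOR steps. The running-XOR chain is only one possible step-by-step decomposition and is an assumption here; the decomposition analysed in the paper may differ. The names `parity_direct`, `parity_chain`, and the example secret set are hypothetical.

```python
from functools import reduce
from operator import xor

def parity_direct(bits, secret):
    """Parity of the bits at the secret index set, computed in one shot."""
    return reduce(xor, (bits[i] for i in secret), 0)

def parity_chain(bits, secret):
    """Running XORs, one intermediate reasoning step per secret index."""
    steps, acc = [], 0
    for i in secret:
        acc ^= bits[i]
        steps.append(acc)
    return steps

bits = [1, 0, 1, 1, 0, 1]
secret = [0, 2, 5]                  # hypothetical secret index set, k = 3
direct = parity_direct(bits, secret)
chain = parity_chain(bits, secret)
print(direct, chain)
```

The last element of the chain equals the direct answer; a corrupted intermediate step would propagate to every later step, which gives some intuition for why poisoned reasoning traces can hurt.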


Article 107

Title@2025-06-12 (4): Flexible Tails for Normalizing Flows

Title: Flexible Tails for Normalizing Flows Flexible Endlagen für Normalisierung von Strömungen 流动正常化的灵活反面 2406.16971v2

Authors (2): Tennessee Hickling, Dennis Prangle

Normalizing flows are a flexible class of probability distributions, expressed as transformations of a simple base distribution. A limitation of standard normalizing flows is representing distributions with heavy tails, which arise in applications to both density estimation and variational inference. A popular current solution to this problem is to use a heavy tailed base distribution. We argue this can lead to poor performance due to the difficulty of optimising neural networks, such as normalizing flows, under heavy tailed input. We propose an alternative, “tail transform flow” (TTF), which uses a Gaussian base distribution and a final transformation layer which can produce heavy tails. Experimental results show this approach outperforms current methods, especially when the target distribution has large dimension or tail weight.

正常化流是一种灵活的概率分布类别,表现为简单基分布的转换。标准正常流的局限性代表重尾的分布,这种分布出现在密度估计和变异推论的应用中。当前最受欢迎的解决办法是使用重尾基分布。我们争辩说,这可能导致性能不佳,因为很难优化神经网络,例如,在重尾投入下流的正常化。我们提出了另一种选择,即“尾变流 ” (TTF),它使用高山基分布和最终变形层,可以产生重尾。实验结果显示,这种方法优于当前的方法,特别是在目标分布具有大尺寸或尾重量的情况下。


Article 108

Title@2025-06-12 (4): Structure and asymptotic preserving deep neural surrogates for uncertainty quantification in multiscale kinetic equations

Title: Structure and asymptotic preserving deep neural surrogates for uncertainty quantification in multiscale kinetic equations Struktur und asymptotische Erhaltung tiefer neuronaler Surrogate zur Unsicherheitsquantifizierung in multiskaligen kinetischen Gleichungen 用于多尺度动动能方程中不确定性量化的深海神经代代体 2506.10636v1

Authors (3): Wei Chen, Giacomo Dimarco, Lorenzo Pareschi

The high dimensionality of kinetic equations with stochastic parameters poses major computational challenges for uncertainty quantification (UQ). Traditional Monte Carlo (MC) sampling methods, while widely used, suffer from slow convergence and high variance, which become increasingly severe as the dimensionality of the parameter space grows. To accelerate MC sampling, we adopt a multiscale control variates strategy that leverages low-fidelity solutions from simplified kinetic models to reduce variance. To further improve sampling efficiency and preserve the underlying physics, we introduce surrogate models based on structure and asymptotic preserving neural networks (SAPNNs). These deep neural networks are specifically designed to satisfy key physical properties, including positivity, conservation laws, entropy dissipation, and asymptotic limits. By training the SAPNNs on low-fidelity models and enriching them with selected high-fidelity samples from the full Boltzmann equation, our method achieves significant variance reduction while maintaining physical consistency and asymptotic accuracy. The proposed methodology enables efficient large-scale prediction in kinetic UQ and is validated across both homogeneous and nonhomogeneous multiscale regimes. Numerical results demonstrate improved accuracy and computational efficiency compared to standard MC techniques.

使用传统的蒙特卡洛(MC)采样方法,虽然广泛使用,但缓慢趋同和高度差异,随着参数空间的维度增加而变得日益严重。为了加速MC采样,我们采取了一个多尺度的控制变异战略,利用简化动动能模型的低纤维溶液来减少差异。为了进一步提高采样效率和保护基础物理学,我们采用了基于结构和无药保存神经网络的代谢模型(SAPNNS)。这些深神经网络专门设计以满足关键物理特性,包括现实性、保护法、消化和无药限制。通过对SAPNN(S)进行低纤维模型培训,并用全波尔茨曼方程式中选定的高纤维化样本丰富这些网络,我们的方法在保持物理一致性和无药性精确性的同时实现了显著的差异减少。拟议的方法使得在运动UQ(SAPNNNS)中进行高效的大规模预测,并且经过同质和非遗传性、非遗传性计算方法的验证,从而展示了标准化和非遗传性结果。
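The variance-reduction idea behind control variates, which this abstract builds on, can be sketched with a single cheap surrogate. This is a textbook single-level estimator under toy assumptions, not the multiscale SAPNN scheme of the paper: `high` stands in for an expensive high-fidelity quantity, `low` for a correlated low-fidelity one whose mean is known exactly, and all names are hypothetical.

```python
import numpy as np

def control_variate_mean(high, low, low_mean):
    """Estimate E[high] using a correlated surrogate `low` with known mean."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    cov = np.cov(high, low)                  # 2x2 sample covariance matrix
    lam = cov[0, 1] / cov[1, 1]              # variance-minimising coefficient
    estimate = high.mean() - lam * (low.mean() - low_mean)
    return float(estimate), float(lam)

rng = np.random.default_rng(0)
z = rng.uniform(size=2000)                   # random parameter samples
high = np.exp(z)                             # toy "expensive" model, E = e - 1
low = 1.0 + z                                # toy "cheap" model, E = 1.5 exactly
estimate, lam = control_variate_mean(high, low, low_mean=1.5)
print(f"plain MC: {high.mean():.5f}  control variate: {estimate:.5f}")
```

The more strongly the surrogate correlates with the high-fidelity output, the smaller the variance of the corrected estimator, which is why a physically consistent surrogate is valuable.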


Article 109

Title@2025-06-12 (4): Deploying Open-Source Large Language Models: A performance Analysis

Title: Deploying Open-Source Large Language Models: A performance Analysis Bereitstellung von Open-Source-Großsprachenmodellen: Eine Leistungsanalyse 部署开放源码大语言模式:业绩分析 2409.14887v4

Authors (3): Yannis Bendi-Ouis, Dan Dutartre, Xavier Hinaut

Since the release of ChatGPT in November 2022, large language models (LLMs) have seen considerable success, including in the open-source community, with many open-weight models available. However, the requirements to deploy such a service are often unknown and difficult to evaluate in advance. To facilitate this process, we conducted numerous tests at the Centre Inria de l’Université de Bordeaux. In this article, we propose a comparison of the performance of several models of different sizes (mainly Mistral and LLaMa) depending on the available GPUs, using vLLM, a Python library designed to optimize the inference of these models. Our results provide valuable information for private and public groups wishing to deploy LLMs, allowing them to evaluate the performance of different models based on their available hardware. This study thus contributes to facilitating the adoption and use of these large language models in various application domains.

自2022年11月公布ChattGPT以来,大型语言模式(LLMS)取得了相当大的成功,包括在开放源码社区,有许多开放量级模式,然而,部署这种服务的要求往往不为人所知,而且难以事先评估,为便利这一进程,我们在Bordeaux大学中心进行了多次测试,在本篇文章中,我们提议比较几种不同规模的模型(主要是Mistral和LLLaMa)的性能,这取决于现有GPUs的性能,使用VLLM(一个旨在优化这些模型推断的Python图书馆),我们的成果为希望部署LMS的私人和公共团体提供了宝贵的信息,使他们能够根据它们现有的硬件评估不同模型的性能,从而帮助在不同应用领域采用和使用这些大型语言模式。


Article 110

Title@2025-06-12 (4): Hessian Geometry of Latent Space in Generative Models

Title: Hessian Geometry of Latent Space in Generative Models Hessische Geometrie des Latentenraums in generativen Modellen 创创模型中的深层空间的赫西德几何学 2506.10632v1

Authors (4): Alexander Lobashev, Dmitry Guskov, Maria Larchenko, Mikhail Tamm

This paper presents a novel method for analyzing the latent space geometry of generative models, including statistical physics models and diffusion models, by reconstructing the Fisher information metric. The method approximates the posterior distribution of latent variables given generated samples and uses this to learn the log-partition function, which defines the Fisher metric for exponential families. Theoretical convergence guarantees are provided, and the method is validated on the Ising and TASEP models, outperforming existing baselines in reconstructing thermodynamic quantities. Applied to diffusion models, the method reveals a fractal structure of phase transitions in the latent space, characterized by abrupt changes in the Fisher metric. We demonstrate that while geodesic interpolations are approximately linear within individual phases, this linearity breaks down at phase boundaries, where the diffusion model exhibits a divergent Lipschitz constant with respect to the latent space. These findings provide new insights into the complex structure of diffusion model latent spaces and their connection to phenomena like phase transitions. Our source code is available at https://github.com/alobashev/hessian-geometry-of-diffusion-models.

本文介绍了一种新颖的方法,用以分析包括统计物理模型和传播模型在内的基因模型的潜在空间几何,方法是通过重建渔业信息度量来分析渔业信息度量,该方法近似了所生成样本潜在变量的后部分布,并用来学习记录分配功能,该功能为指数型家庭定义了渔业指标。提供了理论趋同保证,该方法在Ising和TASEP模型上得到了验证,超过了重建热力学数量的现有基线。该方法适用于扩散模型,该方法揭示了以渔业指标突变为特征的潜层空间阶段转变的分形结构。我们表明,虽然地深线间间相互作用在个别阶段中大致线性,但这种线性在阶段间断,在潜在空间方面,扩散模型显示利普施奇兹的常数。这些发现为扩散模型潜在空间的复杂结构及其与阶段过渡等现象的联系提供了新的见解。我们的源代码可在http://github.com/alobashev/hesian-geologisat-drigation-difift-diful-mod-mography-mod-mod-mod-mod-mod-modals.)。
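The link between the log-partition function and the Fisher metric that this abstract relies on is a standard fact about exponential families. In the notation below (the symbols $h$, $T$, $A$, and $\theta$ are chosen here for illustration and are not taken from the paper), the metric is the Hessian of the log-partition function:

$$
p(x \mid \theta) = h(x)\,\exp\big(\langle \theta, T(x) \rangle - A(\theta)\big),
\qquad
g_{ij}(\theta) = \frac{\partial^2 A(\theta)}{\partial \theta_i \, \partial \theta_j}
= \operatorname{Cov}_{\theta}\big(T_i(x),\, T_j(x)\big).
$$

Learning $A(\theta)$ therefore suffices to recover the metric, and abrupt changes in its second derivatives are what would show up as phase boundaries.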


Article 111

Title@2025-06-12 (4): Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs

Title: Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs Zeitreihenprognosen als Begründung: Ein langsam denkender Ansatz mit verstärkten LLMs 以时间序列预测作为理由:与强化的LLMs采取慢思考方式 2506.10630v1

Authors (7): Yucong Luo, Yitong Zhou, Mingyue Cheng, Jiahao Wang, Daoyu Wang, Tingyue Pan, Jintao Zhang

To advance time series forecasting (TSF), various methods have been proposed to improve prediction accuracy, evolving from statistical techniques to data-driven deep learning architectures. Despite their effectiveness, most existing methods still adhere to a fast-thinking paradigm that relies on extracting historical patterns and mapping them to future values as its core modeling philosophy, lacking an explicit thinking process that incorporates intermediate time series reasoning. Meanwhile, emerging slow-thinking LLMs (e.g., OpenAI-o1) have shown remarkable multi-step reasoning capabilities, offering an alternative way to overcome these issues. However, prompt engineering alone presents several limitations, including high computational cost, privacy risks, and limited capacity for in-depth domain-specific time series reasoning. To address these limitations, a more promising approach is to train LLMs to develop slow-thinking capabilities and acquire strong time series reasoning skills. For this purpose, we propose Time-R1, a two-stage reinforcement fine-tuning framework designed to enhance the multi-step reasoning ability of LLMs for time series forecasting. Specifically, the first stage conducts supervised fine-tuning for warmup adaptation, while the second stage employs reinforcement learning to improve the model’s generalization ability. In particular, we design a fine-grained multi-objective reward specifically for time series forecasting, and then introduce GRIP (group-based relative importance for policy optimization), which leverages non-uniform sampling to further encourage and optimize the model’s exploration of effective reasoning paths. Experiments demonstrate that Time-R1 significantly improves forecast performance across diverse datasets.

为推动时间序列预测(TSF),提出了各种提高预测准确性的方法,从统计技术演变为数据驱动的深层学习结构。尽管这些方法具有效力,但大多数现有方法仍然坚持快速思维模式,以提取历史模式并将其映射成未来价值观作为其核心模型哲学,缺乏包含中间时间序列推理的明确思维过程。与此同时,新出现的低思维长效LMS(例如OpenAI-o1)显示出显著的多步推理能力,为克服这些问题提供了替代方法。然而,仅迅速工程本身就提出了若干限制,包括高计算成本、隐私风险和深度特定域时间序列推理能力有限。为了克服这些限制,一个更有希望的方法是培训LLMS发展缓慢的思维能力,并获得强大的时间序列推理技能。为此,我们提议了一个两阶段强化微调框架,目的是提高LMS模型在时间序列预测方面的多步推理能力。1 第一阶段对暖化适应进行了监督的微调整,第二阶段则利用强化学习来提高模型的精确性精确度预测,从而具体地推理学地推理,从而推理地推理地推算出一个不比性地推理的GRRRRRR的模型。


Article 112

Title@2025-06-12 (4): Task Adaptation from Skills: Information Geometry, Disentanglement, and New Objectives for Unsupervised Reinforcement Learning

Title: Task Adaptation from Skills: Information Geometry, Disentanglement, and New Objectives for Unsupervised Reinforcement Learning Aufgabenanpassung aus Kompetenzen: Informationsgeometrie, Entflechtung und neue Ziele für unbeaufsichtigtes Weiterbildungslernen 适应技能的任务:信息几何、分离和未受监督的加强学习新目标 2506.10629v1

Authors (6): Yucheng Yang, Tianyi Zhou, Qiang He, Lei Han, Mykola Pechenizkiy, Meng Fang

Unsupervised reinforcement learning (URL) aims to learn general skills for unseen downstream tasks. Mutual Information Skill Learning (MISL) addresses URL by maximizing the mutual information between states and skills but lacks sufficient theoretical analysis, e.g., how well its learned skills can initialize a downstream task’s policy. Our new theoretical analysis in this paper shows that the diversity and separability of learned skills are fundamentally critical to downstream task adaptation but MISL does not necessarily guarantee these properties. To complement MISL, we propose a novel disentanglement metric LSEPIN. Moreover, we build an information-geometric connection between LSEPIN and downstream task adaptation cost. For better geometric properties, we investigate a new strategy that replaces the KL divergence in information geometry with Wasserstein distance. We extend the geometric analysis to it, which leads to a novel skill-learning objective WSEP. It is theoretically justified to be helpful to downstream task adaptation and it is capable of discovering more initial policies for downstream tasks than MISL. We finally propose another Wasserstein distance-based algorithm PWSEP that can theoretically discover all optimal initial policies.

无人监督的强化学习(URL)旨在学习隐性下游任务的一般技能。 互信技能学习(MISL)通过最大限度地增加国家与技能之间的相互信息,处理URL,但缺乏足够的理论分析,例如,其所学技能在多大程度上能够启动下游任务的政策。 本文中的新理论分析表明,学习技能的多样性和分离性对于下游任务适应至关重要,但MISL不一定保证这些属性。 为了补充MISL, 我们提议了一个新的分解性指标LSEPIN。 此外, 我们在LSEPIN和下游任务适应成本之间建立了信息地理学联系。 为了更好的几何特性,我们调查了一种新的战略,用瓦塞斯坦距离取代KLL在信息几何方面的差异。 我们将地理学分析扩展到它,从而导致一个新的技能学习目标WSEP。 从理论上讲,帮助下游任务适应是有道理的,并且能够发现比MISL更多的下游任务初始政策。 我们最后建议了另一个瓦列斯特斯坦远程算算算法 PWESEP, 能够从理论上发现所有最佳初始政策。


Article 113

Title@2025-06-12 (4): Leveraging Low-rank Factorizations of Conditional Correlation Matrices in Graph Learning

Title: Leveraging Low-rank Factorizations of Conditional Correlation Matrices in Graph Learning Leveraging Low-rank Factorizations of Conditional Correlation Matrices in Graph Learning 在图表学习中利用条件性关联矩阵的低级别定级 2506.10628v1

Authors (4): Thu Ha Phi, Alexandre Hippert-Ferrer, Florent Bouchard, Arnaud Breloy

This paper addresses the problem of learning an undirected graph from data gathered at each node. Within the graph signal processing framework, the topology of such a graph can be linked to the support of the conditional correlation matrix of the data. The corresponding graph learning problem then scales with the square of the number of variables (nodes), which is usually problematic in high dimensions. To tackle this issue, we propose a graph learning framework that leverages a low-rank factorization of the conditional correlation matrix. To solve the resulting optimization problems, we derive the tools required to apply Riemannian optimization techniques to this particular structure. The proposal is then particularized to a low-rank constrained counterpart of the GLasso algorithm, i.e., the penalized maximum likelihood estimation of a Gaussian graphical model. Experiments on synthetic and real data show that a very efficient dimension-versus-performance trade-off can be achieved with this approach.

本文探讨从每个节点收集的数据中学习一个非方向图的问题。 在图形信号处理框架内, 此图的表层可以与数据有条件相关矩阵的支持联系起来。 相应的图层学习问题然后将变量数( 节点) 的方形按比例排列为通常在大范围内存在问题的变量数( 节点) 。 为了解决这一问题, 我们提议了一个图形学习框架, 利用条件相关矩阵的低等级因子化。 为了解决由此产生的优化问题, 我们获取了将里曼尼亚优化技术应用于这一特定结构所需的工具。 然后, 该提案被具体指定为GLasso 算法的低级别限制对应方, 即高斯图形模型的最大受限估计可能性。 实验合成和真实数据证据表明, 使用这种方法可以实现非常高效的维度- 反效果- 贸易。


Article 114

Title@2025-06-12 (4): Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation

Title: Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation Pretraining Generative Flow Networks mit teuren Belohnungen für die molekulare Graphengenerierung 为分子图生成建立具有不昂贵奖励的 预培训产生流动网络 2503.06337v4

Authors (5): Mohit Pandey, Gopeshh Subbaraj, Artem Cherkasov, Martin Ester, Emmanuel Bengio

Generative Flow Networks (GFlowNets) have recently emerged as a suitable framework for generating diverse and high-quality molecular structures by learning from rewards treated as unnormalized distributions. Previous works in this framework often restrict exploration by using predefined molecular fragments as building blocks, limiting the chemical space that can be accessed. In this work, we introduce Atomic GFlowNets (A-GFNs), a foundational generative model leveraging individual atoms as building blocks to explore drug-like chemical space more comprehensively. We propose an unsupervised pre-training approach using drug-like molecule datasets, which teaches A-GFNs about inexpensive yet informative molecular descriptors such as drug-likeliness, topological polar surface area, and synthetic accessibility scores. These properties serve as proxy rewards, guiding A-GFNs towards regions of chemical space that exhibit desirable pharmacological properties. We further implement a goal-conditioned finetuning process, which adapts A-GFNs to optimize for specific target properties. In this work, we pretrain A-GFN on a subset of the ZINC dataset, and by employing robust evaluation metrics we show the effectiveness of our approach compared to other relevant baseline methods for a wide range of drug design tasks. The code is accessible at https://github.com/diamondspark/AGFN.

最近,出现了一种利用个体原子作为组成部分来更全面地探索类似毒品的化学空间的基础性基因化模型(GFlowNets),通过学习作为非标准化分布分布的奖励,形成多样化和高质量分子结构的合适框架。本框架以前的工作往往限制勘探,使用预先定义的分子碎片作为构件,限制可以进入的化学空间。在这项工作中,我们引入了Atoro-GFlowNets(A-GFNs),这是利用个体原子作为建筑块来更全面地探索类似毒品的化学空间的基础性基因化模型。我们建议采用一种不受监督的训练前方法,使用类似毒品的分子数据集,教A-GFNs使用廉价但信息丰富的分子描述器,例如类似毒品、表层极地表面积和合成的可获取性分数。这些特性作为代理奖赏,指导A-GFNs到具有适当药性能的化学空间区域。我们进一步实施一个目标性调整过程,使A-GFNs适应于特定的目标性特性。在这项工作中,我们先将A-GFN系统用于ZINC数据基点的一个子组的一组,然后在使用可靠的基准/RAFDAFS范围上,在使用其他设计方法上显示我们可获取性基准设计范围时,我们的相关设计方法的范围。


Article 115

Title@2025-06-12 (4): SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis

Title: SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis SDialog: Ein Python-Toolkit für die Synthetische Dialog-Generierung und -Analyse Sidialog:合成对话生成和分析的Python工具包 2506.10622v1

Authors (3): Sergio Burdisso, Esaú Villatoro-Tello, Petr Motlicek

The advancement of conversational AI systems relies on the availability of high-quality, flexible, and reproducible synthetic dialogues for training, evaluation, and benchmarking. SDialog is a modular, extensible Python toolkit designed to address the challenges of synthetic dialogue generation and analysis. By leveraging instruction-tuned Large Language Models (LLMs), SDialog provides abstractions for personas, orchestration, and scenario management, enabling the creation of realistic, diverse, and controllable conversational data for research and development. SDialog supports workflows such as multi-agent simulation and scenario-driven generation, and represents a step forward in the standardization of tools and frameworks for synthetic data generation, a crucial advancement for ensuring reproducibility in today’s fast-evolving research landscape.

Sdialog是一个模块化的、可推广的Python工具包,旨在应对合成对话生成和分析的挑战。 Sdialog通过利用受指导的大型语言模型(LLMs),为个人、交响和情景管理提供抽象信息,为研究和开发创造现实、多样和可控制的交谈数据。Sdialog支持多试模拟和情景生成等工作流程,并代表了合成数据生成工具和框架标准化的一步,这是确保当今快速变化的研究格局中再生的关键进步。


Article 116

Title@2025-06-12 (4): Assessing the Resilience of Automotive Intrusion Detection Systems to Adversarial Manipulation

Title: Assessing the Resilience of Automotive Intrusion Detection Systems to Adversarial Manipulation Bewertung der Resilienz von Intrusions-Detektionssystemen für die Adversarialmanipulation 评估汽车入侵探测系统对冲操纵的复原力 2506.10620v1

Authors (4): Stefano Longari, Paolo Cerracchio, Michele Carminati, Stefano Zanero

The security of modern vehicles has become increasingly important, with the controller area network (CAN) bus serving as a critical communication backbone for various Electronic Control Units (ECUs). The absence of robust security measures in CAN, coupled with the increasing connectivity of vehicles, makes them susceptible to cyberattacks. While intrusion detection systems (IDSs) have been developed to counter such threats, they are not foolproof. Adversarial attacks, particularly evasion attacks, can manipulate inputs to bypass detection by IDSs. This paper extends our previous work by investigating the feasibility and impact of gradient-based adversarial attacks performed with different degrees of knowledge against automotive IDSs. We consider three scenarios: white-box (attacker with full system knowledge), grey-box (partial system knowledge), and the more realistic black-box (no knowledge of the IDS’ internal workings or data). We evaluate the effectiveness of the proposed attacks against state-of-the-art IDSs on two publicly available datasets. Additionally, we study the effect of the adversarial perturbation on the attack impact and evaluate real-time feasibility by precomputing evasive payloads for timed injection based on bus traffic. Our results demonstrate that, besides attacks being challenging due to the automotive domain constraints, their effectiveness is strongly dependent on the dataset quality, the target IDS, and the attacker’s degree of knowledge.

现代车辆的安全已变得日益重要,控制区网络(CAN)公共汽车是各种电子控制单位的关键通信主干线。由于CAN缺乏强有力的安全措施,加上车辆的连通性日益增强,因此这些车辆很容易受到网络攻击。虽然已经开发了入侵探测系统(IDS)来对付这种威胁,但这些系统并非易犯的。反向攻击,特别是逃避攻击,可以操纵输入,绕过IDS的探测。本文扩展了我们以前的工作,调查了基于梯度的对冲攻击的可行性和影响,对汽车IDS进行了不同程度的知识攻击。我们考虑了三种情景:白箱(具有全系统知识的攻击者)、灰盒(部分系统知识)和更加现实的黑盒(对IDS的内部工作或数据一无所知)。我们评估了拟议对State of the state-the art-arted IDS的攻击的有效性。此外,我们研究了对攻击袭击的对抗性透视线对攻击影响的影响,并通过预先分析对实时可行性进行评估。我们考虑的是:白箱(具有全系统知识的攻击)、灰盒(部分系统知识)以及更现实的黑盒(没有时间定位数据系统),我们对攻击的定位数据系统测试质量限制。


Article 117

Title@2025-06-12 (4): Universal Neural Optimal Transport

Title: Universal Neural Optimal Transport Universeller Neural Optimaler Verkehr 通用神经优化运输 2212.00133v6

Authors (4): Jonathan Geuter, Gregor Kornhardt, Ingimar Tomasson, Vaios Laschos

Optimal Transport (OT) problems are a cornerstone of many applications, but solving them is computationally expensive. To address this problem, we propose UNOT (Universal Neural Optimal Transport), a novel framework capable of accurately predicting (entropic) OT distances and plans between discrete measures for a given cost function. UNOT builds on Fourier Neural Operators, a universal class of neural networks that map between function spaces and that are discretization-invariant, which enables our network to process measures of variable resolutions. The network is trained adversarially using a second, generating network and a self-supervised bootstrapping loss. We ground UNOT in an extensive theoretical framework. Through experiments on Euclidean and non-Euclidean domains, we show that our network not only accurately predicts OT distances and plans across a wide range of datasets, but also captures the geometry of the Wasserstein space correctly. Furthermore, we show that our network can be used as a state-of-the-art initialization for the Sinkhorn algorithm with speedups of up to $7.4\times$, significantly outperforming existing approaches.

最佳运输(OT)问题是许多应用的基石,但解决它们的费用是计算成本高昂的。为了解决这个问题,我们提议UNOT(通用神经最佳运输),这是一个能够准确预测特定成本功能离散措施(即机能性)OT距离和计划的新框架。UNOT建在Fourier神经操作器上,这是在功能空间之间绘制图的通用神经网络类别,并且是分解和异变的神经网络,使我们的网络能够处理变量分辨率的测量。这个网络是用第二个工具进行对抗性训练的,产生网络和自我监督的靴子损失。我们把UNOT置于一个广泛的理论框架内。我们通过在Euclidean和非EClidean域的实验,我们表明我们的网络不仅准确预测了在广泛数据集中的OT距离和计划,而且还正确地捕捉了瓦列斯坦空间的几何测量。此外,我们显示我们的网络可以作为Sinkhorn算法的尖端初始化方法,其速度高达7.4美元。
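The abstract reports speedups when a predicted solution is used to initialize the Sinkhorn algorithm. The sketch below shows the mechanism with a plain NumPy Sinkhorn loop that accepts an optional initial scaling vector. As a stand-in for a network prediction it simply reuses the converged scaling vector from a cold run, which is an assumption made for illustration; the function name, the regularisation value `eps`, and the toy measures are hypothetical and unrelated to the UNOT code.

```python
import numpy as np

def sinkhorn(mu, nu, cost, eps=0.5, v_init=None, tol=1e-8, max_iter=10_000):
    """Entropic OT between discrete measures; returns (plan, v, iterations)."""
    K = np.exp(-cost / eps)                        # Gibbs kernel
    v = np.ones_like(nu) if v_init is None else v_init.copy()
    for it in range(1, max_iter + 1):
        u = mu / (K @ v)                           # rescale to match row marginals
        v = nu / (K.T @ u)                         # rescale to match column marginals
        row_err = np.abs(u * (K @ v) - mu).sum()   # remaining row-marginal violation
        if row_err < tol:
            break
    plan = u[:, None] * K * v[None, :]
    return plan, v, it

x = np.linspace(0.0, 1.0, 5)
cost = (x[:, None] - x[None, :]) ** 2              # squared Euclidean cost
mu = np.full(5, 0.2)
nu = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

plan, v_star, n_cold = sinkhorn(mu, nu, cost)      # default initialization
_, _, n_warm = sinkhorn(mu, nu, cost, v_init=v_star)  # "predicted" initialization
print(f"cold start: {n_cold} iterations, warm start: {n_warm} iterations")
```

A good initial scaling vector leaves fewer iterations to run, which is the source of the speedup a learned predictor can offer.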


Article 118

Title@2025-06-12 (4): Deep Learning-Based Digitization of Overlapping ECG Images with Open-Source Python Code

Title: Deep Learning-Based Digitization of Overlapping ECG Images with Open-Source Python Code Deep Learning-based Digitalisierung von überlappenden EKG-Bildern mit Open-Source-Python-Code 使用开放源码的 ECG 重叠图像的深学习数字化 2506.10617v1

Authors (4): Reza Karbasi, Masoud Rahimi, Abdol-Hossein Vahabie, Hadi Moradi

This paper addresses the persistent challenge of accurately digitizing paper-based electrocardiogram (ECG) recordings, with a particular focus on robustly handling single leads compromised by signal overlaps, a common yet under-addressed issue in existing methodologies. We propose a two-stage pipeline designed to overcome this limitation. The first stage employs a U-Net based segmentation network, trained on a dataset enriched with overlapping signals and fortified with custom data augmentations, to accurately isolate the primary ECG trace. The subsequent stage converts this refined binary mask into a time-series signal using established digitization techniques, enhanced by an adaptive grid detection module for improved versatility across different ECG formats and scales. Our experimental results demonstrate the efficacy of our approach. The U-Net architecture achieves an IoU of 0.87 for the fine-grained segmentation task. Crucially, our proposed digitization method yields superior performance compared to a well-established baseline technique across both non-overlapping and challenging overlapping ECG samples. For non-overlapping signals, our method achieved a Mean Squared Error (MSE) of 0.0010 and a Pearson Correlation Coefficient (rho) of 0.9644, compared to 0.0015 and 0.9366, respectively, for the baseline. On samples with signal overlap, our method achieved an MSE of 0.0029 and a rho of 0.9641, significantly improving upon the baseline’s 0.0178 and 0.8676. This work demonstrates an effective strategy to significantly enhance digitization accuracy, especially in the presence of signal overlaps, thereby laying a strong foundation for the reliable conversion of analog ECG records into analyzable digital data for contemporary research and clinical applications. The implementation is publicly available at this GitHub repository: https://github.com/masoudrahimi39/ECG-code.

本文讨论了将基于纸张的心电图(ECG)记录进行准确数字化的持久挑战,特别侧重于严格处理因信号重叠而受损的单项线索,这是现有方法中常见但处理不足的问题。我们提出一个旨在克服这一限制的两阶段管道。第一阶段使用基于U-Net的分解网络,该网的分解网络经过培训,该网的数据集经过重叠信号的丰富和定制数据增强,以准确隔离主要的ECG追踪。随后阶段利用既定的数字化技术,将这一经过改进的双面遮罩转换成一个时间序列信号,并辅之以一个适应性电网检测模块,以改善不同ECGF格式和尺度的多功能。我们的实验结果显示了我们的方法的有效性。U-Net结构为精细分解任务实现了0.87的IOU。 值得注意的是,我们提议的数字化方法在非重叠和具有挑战性的ECGH样本中,将这一改进的双面双面双面双层遮固的双层遮掩罩。 对于非重叠的信号,我们的方法在0.00和Pearson-Correxl的服务器的转换应用中, 将这一基基基比为0.9-0.00的基比的基数据, 提高了我们的数据基基基比为0.16的基比为0.9的基的基比的基比的基基基比的基基比提高了。
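The two metrics quoted in this abstract, MSE and the Pearson correlation coefficient, can be computed as in the sketch below. The signals are synthetic placeholders (a sine wave plus noise) rather than ECG data, and the helper names are hypothetical; the repository linked above may compute these metrics differently, for example after resampling or alignment.

```python
import numpy as np

def mse(reference, digitised):
    """Mean squared error between two equally sampled signals."""
    reference = np.asarray(reference, dtype=float)
    digitised = np.asarray(digitised, dtype=float)
    return float(np.mean((reference - digitised) ** 2))

def pearson(reference, digitised):
    """Pearson correlation coefficient (rho) between two signals."""
    return float(np.corrcoef(reference, digitised)[0, 1])

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
reference = np.sin(2 * np.pi * 5 * t)                 # placeholder ground truth
digitised = reference + rng.normal(0.0, 0.05, t.size) # placeholder reconstruction
mse_val, rho = mse(reference, digitised), pearson(reference, digitised)
print(f"MSE = {mse_val:.4f}, rho = {rho:.4f}")
```

MSE is sensitive to amplitude and offset errors, while rho only measures shape agreement, so reporting both gives a fuller picture of digitization quality.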


Article 119

Title@2025-06-12 (4): Non-stationary Online Learning for Curved Losses: Improved Dynamic Regret via Mixability

Title: Non-stationary Online Learning for Curved Losses: Improved Dynamic Regret via Mixability Non-stationäres Online-Lernen für geschwungene Verluste: Dynamischer Streit durch Mischbarkeit verbessert 非静止的曲线损失在线学习:通过混合改进动态迟缓 2506.10616v1

Authors (3): Yu-Jie Zhang, Peng Zhao, Masashi Sugiyama

Non-stationary online learning has drawn much attention in recent years. Despite considerable progress, dynamic regret minimization has primarily focused on convex functions, leaving the functions with stronger curvature (e.g., squared or logistic loss) underexplored. In this work, we address this gap by showing that the regret can be substantially improved by leveraging the concept of mixability, a property that generalizes exp-concavity to effectively capture loss curvature. Let $d$ denote the dimensionality and $P_T$ the path length of comparators that reflects the environmental non-stationarity. We demonstrate that an exponential-weight method with fixed-share updates achieves an $\mathcal{O}(d T^{1/3} P_T^{2/3} \log T)$ dynamic regret for mixable losses, improving upon the best-known $\mathcal{O}(d^{10/3} T^{1/3} P_T^{2/3} \log T)$ result (Baby and Wang, 2021) in $d$. More importantly, this improvement arises from a simple yet powerful analytical framework that exploits the mixability, which avoids the Karush-Kuhn-Tucker-based analysis required by existing work.

近些年来,非静止在线学习引起了人们的极大关注。尽管取得了相当大的进步,但动态最小化的遗憾主要集中于 convex 功能,使得这些功能的曲线(如平方或后勤损失)没有得到充分探讨。在这项工作中,我们通过利用混合化概念,表明这一遗憾可以大大改善,因为混合化概念是一种通用化的属性,可以有效捕捉损失曲线。让美元表示方位和美元方块的路径长度,反映环境非静止化的参照方块。我们证明,使用固定共享更新的指数加权法可以取得美元(d T1/3}(d T2/3}P_T_P_T$)。更重要的是,这种改进产生于简单而有力的分析框架,通过利用现有的模型(Baby和Wang, 2021, 美元),避免了目前这种组合化分析。
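The exponential-weight method with fixed-share updates named in this abstract can be sketched in its classical finite-expert form. This is a simplification and an assumption: the paper works with mixable losses over a continuous decision space, and the learning rate `eta`, share rate `alpha`, and toy loss sequence below are hypothetical values chosen for illustration.

```python
import numpy as np

def fixed_share(losses, eta=1.0, alpha=0.05):
    """Run fixed-share exponential weights; row t holds the weights used at round t."""
    n_rounds, n_experts = losses.shape
    w = np.full(n_experts, 1.0 / n_experts)
    history = np.empty((n_rounds, n_experts))
    for t in range(n_rounds):
        history[t] = w
        w = w * np.exp(-eta * losses[t])         # exponential-weight step
        w /= w.sum()
        w = (1.0 - alpha) * w + alpha / n_experts  # fixed-share: mix with uniform
    return history

# Non-stationary toy environment: the best expert switches halfway through.
losses = np.zeros((100, 2))
losses[:50, 1] = 1.0    # expert 0 is best in the first half
losses[50:, 0] = 1.0    # expert 1 is best in the second half
history = fixed_share(losses)
print(history[49], history[99])
```

The uniform mixing step keeps every expert's weight bounded away from zero, so the learner can recover quickly after the switch; this is the mechanism that lets regret be measured against a changing comparator sequence.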


Article 120

Title@2025-06-12 (4): Data Driven Diagnosis for Large Cyber-Physical-Systems with Minimal Prior Information

Title: Data Driven Diagnosis for Large Cyber-Physical-Systems with Minimal Prior Information Datenangetriebene Diagnose für große Cyber-Physical-Systeme mit minimalen Vorinformationen 具有最起码原始信息的大型网络物理系统数据驱动诊断 2506.10613v1

Authors (6): Henrik Sebastian Steude, Alexander Diedrich, Ingo Pill, Lukas Moddemann, Daniel Vranješ, Oliver Niggemann

Diagnostic processes for complex cyber-physical systems often require extensive prior knowledge in the form of detailed system models or comprehensive training data. However, obtaining such information poses a significant challenge. To address this issue, we present a new diagnostic approach that operates with minimal prior knowledge, requiring only a basic understanding of subsystem relationships and data from nominal operations. Our method combines a neural network-based symptom generator, which employs subsystem-level anomaly detection, with a new graph diagnosis algorithm that leverages minimal causal relationship information between subsystems, information that is typically available in practice. Our experiments with fully controllable simulated datasets show that our method includes the true causal component in its diagnosis set for 82% of all cases while effectively reducing the search space in 73% of the scenarios. Additional tests on the real-world Secure Water Treatment dataset showcase the approach’s potential for practical scenarios. Our results thus highlight our approach’s potential for practical applications with large and complex cyber-physical systems where limited prior knowledge is available.

复杂的网络物理体系的诊断过程往往需要以详细系统模型或全面培训数据的形式提供广泛的先前知识。然而,获取此类信息是一个重大挑战。为了解决这一问题,我们提出了一个新的诊断方法,它以最起码的先前知识运作,只需要对子系统关系和名义操作数据有基本了解。我们的方法将神经网络症状生成器(使用子系统一级异常探测法)与新的图表诊断算法(利用通常可以实际得到的子系统信息之间最小的因果关系信息)结合起来。我们用完全可控的模拟数据集进行的实验表明,我们的方法包括所有案例的正确因果关系,而其诊断集为82页c.,同时有效地减少了情景的搜索空间。对现实世界安全水处理数据集的额外测试展示了该方法在实际情景方面的潜力。因此,我们的结果突出表明了我们的方法在使用大型和复杂的网络物理系统方面的潜在实际应用潜力,而先前的知识有限。


Article 121

Title@2025-06-12 (4): Density Ratio Estimation with Conditional Probability Paths

Title: Density Ratio Estimation with Conditional Probability Paths Schätzung des Dichte-Verhältnisses mit bedingten Wahrscheinlichkeitspfaden 以有条件概率路径估算密度比率 2502.02300v3

Authors (5): Hanlin Yu, Arto Klami, Aapo Hyvärinen, Anna Korba, Omar Chehab

Density ratio estimation in high dimensions can be reframed as integrating a certain quantity, the time score, over probability paths which interpolate between the two densities. In practice, the time score has to be estimated based on samples from the two densities. However, existing methods for this problem remain computationally expensive and can yield inaccurate estimates. Inspired by recent advances in generative modeling, we introduce a novel framework for time score estimation, based on a conditioning variable. Choosing the conditioning variable judiciously enables a closed-form objective function. We demonstrate that, compared to previous approaches, our approach results in faster learning of the time score and competitive or better estimation accuracies of the density ratio on challenging tasks. Furthermore, we establish theoretical guarantees on the error of the estimated density ratio.

高维度的密度比率估计可被重新设定为结合一定数量、时间评分、超过两种密度之间相互交错的概率路径。 实际上,时间评分必须根据两种密度的样本来估计。然而,这一问题的现有方法仍然计算昂贵,可以得出不准确的估计数。在基因模型的最近进展的启发下,我们引入了一个基于调节变数的时间评分新框架。选择调节变数明智地选择了封闭式目标功能。我们证明,与以往方法相比,我们的方法可以更快地了解时间评分和竞争性或更好估计挑战性任务密度比率的准确性。此外,我们还为估计密度比率的错误建立了理论保障。
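
The identity behind this abstract, that the log density ratio equals the integral of the time score along a path between the two densities, can be checked numerically on a toy case. The sketch below is not the authors' estimator: it assumes a one-dimensional Gaussian path `p_t = N(t*mu, 1)` whose time score is known in closed form, whereas in practice the time score has to be learned from samples. All function names are hypothetical.

```python
def time_score(x, t, mu):
    # d/dt log p_t(x) for the assumed path p_t = N(t * mu, 1)
    return (x - t * mu) * mu


def log_ratio_by_integration(x, mu, steps=1000):
    # Trapezoidal rule for the integral of the time score over t in [0, 1]
    h = 1.0 / steps
    total = 0.5 * (time_score(x, 0.0, mu) + time_score(x, 1.0, mu))
    for k in range(1, steps):
        total += time_score(x, k * h, mu)
    return total * h


def log_ratio_exact(x, mu):
    # log N(x; mu, 1) - log N(x; 0, 1); the normalising constants cancel
    return -0.5 * (x - mu) ** 2 + 0.5 * x ** 2


if __name__ == "__main__":
    x, mu = 1.3, 2.0
    print(log_ratio_by_integration(x, mu), log_ratio_exact(x, mu))
```

Both functions should agree up to floating-point error here, because the integrand is linear in `t`.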


Article 122

Title@2025-06-12 (4): Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models

Title: Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models Inferenz-Zeit-Dekomposition von Aktivierungen (ITDA): Ein skalierbarer Ansatz zur Interpretation großer Sprachmodelle 触发(ITDA):解释大语言模式的可扩展办法 2505.17769v2

Authors (3): Patrick Leask, Neel Nanda, Noura Al Moubayed

Sparse autoencoders (SAEs) are a popular method for decomposing Large Language Model (LLM) activations into interpretable latents. However, due to their substantial training cost, most academic research uses open-source SAEs which are only available for a restricted set of models of up to 27B parameters. SAE latents are also learned from a dataset of activations, which means they do not transfer between models. Motivated by relative representation similarity measures, we introduce Inference-Time Decomposition of Activations (ITDA) models, an alternative method for decomposing language model activations. To train an ITDA, we greedily construct a dictionary of language model activations on a dataset of prompts, selecting those activations which were worst approximated by matching pursuit on the existing dictionary. ITDAs can be trained in just 1% of the time required for SAEs, using 1% of the data. This allowed us to train ITDAs on Llama-3.1 70B and 405B on a single consumer GPU. ITDAs can achieve similar reconstruction performance to SAEs on some target LLMs, but generally incur a performance penalty. However, ITDA dictionaries enable cross-model comparisons, and a simple Jaccard similarity index on ITDA dictionaries outperforms existing methods like CKA, SVCCA, and relative representation similarity metrics. ITDAs provide a cheap alternative to SAEs where computational resources are limited, or when cross-model comparisons are necessary. Code available at https://github.com/pleask/itda.

将大型朗格模型(LLM)激活成可解释的潜值,这是将大型朗格模型(LLM)激活成大型朗格模型的流行方法。然而,由于培训费用高昂,大多数学术研究都使用开放源的SAE系统,而只有一套范围有限的、不超过27B参数的模型才有开放源的SAE系统。还从一个启动数据集中学习SAE的潜值,这意味着它们不会在模型之间转移。受相对代表性相似措施的驱动,我们引入了将活动模型(ITDA)激活成可解释的语言模型激活的替代方法。为了培训ITDA,我们贪婪地在提示数据集上建立语言模型的词典启动,选择最接近于现有字典中最差的启动。ITDA系统可以仅用1%的时间对SAE系统进行训练,使用1%的数据。这使我们能够在Llama-31-70B和405B的替代方法中培训ITDA系统,但在单个消费者GPU中,ITDA系统可以实现类似的重建业绩,在SADA上普遍地使IDA系统能使ICAS格式的S得到类似的业绩。
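
The greedy dictionary construction described in the abstract can be illustrated with a small sketch. This is not the released ITDA code: the relative-error threshold, the fixed number of matching-pursuit steps, and all function names are assumptions made for illustration. An activation is added to the dictionary only when matching pursuit on the current dictionary reconstructs it poorly.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def matching_pursuit_residual(x, dictionary, n_steps):
    """Greedily subtract the best-matching unit-norm atom n_steps times."""
    residual = list(x)
    for _ in range(n_steps):
        if not dictionary:
            break
        coeffs = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(coeffs[i]))
        residual = [r - coeffs[best] * a for r, a in zip(residual, dictionary[best])]
    return residual


def build_dictionary(activations, n_steps=2, threshold=0.1):
    """Add an activation as a new atom when it is poorly approximated (assumed rule)."""
    dictionary = []
    for x in activations:
        norm_sq = dot(x, x)
        if norm_sq == 0:
            continue  # skip zero vectors, which cannot be normalised
        residual = matching_pursuit_residual(x, dictionary, n_steps)
        relative_error = dot(residual, residual) / norm_sq
        if relative_error > threshold:
            dictionary.append(normalize(x))
    return dictionary


if __name__ == "__main__":
    acts = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 3]]
    print(len(build_dictionary(acts)))
```

Because the atoms are stored activations rather than learned weights, building the dictionary needs only forward passes, which is consistent with the low training cost the abstract reports.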


Article 123

Title@2025-06-12 (4): Distortion-Aware Brushing for Reliable Cluster Analysis in Multidimensional Projections

Title: Distortion-Aware Brushing for Reliable Cluster Analysis in Multidimensional Projections Distortion-Aware Bürsten für zuverlässige Clusteranalyse in mehrdimensionalen Projektionen 为多层面预测中的可靠集集分析 2201.06379v2

Authors (7): Hyeon Jeon, Michaël Aupetit, Soohyun Lee, Kwon Ko, Youngtaek Kim, Ghulam Jilani Quadri, Jinwook Seo

Brushing is a common interaction technique in 2D scatterplots, allowing users to select clustered points within a continuous, enclosed region for further analysis or filtering. However, applying conventional brushing to 2D representations of multidimensional (MD) data, i.e., Multidimensional Projections (MDPs), can lead to unreliable cluster analysis due to MDP-induced distortions that inaccurately represent the cluster structure of the original MD data. To alleviate this problem, we introduce a novel brushing technique for MDPs called Distortion-aware brushing. As users perform brushing, Distortion-aware brushing corrects distortions around the currently brushed points by dynamically relocating points in the projection, pulling data points close to the brushed points in MD space while pushing distant ones apart. This dynamic adjustment helps users brush MD clusters more accurately, leading to more reliable cluster analysis. Our user studies with 24 participants show that Distortion-aware brushing significantly outperforms previous brushing techniques for MDPs in accurately separating clusters in the MD space and remains robust against distortions. We further demonstrate the effectiveness of our technique through two use cases: (1) conducting cluster analysis of geospatial data and (2) interactively labeling MD clusters.

在 2D 散射点中,浏览是一种常见的互动技术,使用户能够在连续、封闭的封闭区域中选择聚集点,以便进一步分析或过滤。然而,对多维(MD)数据的2D表示方式,即多维预测(MDP)数据(MDP)应用常规的刷刷,可能导致由于MDP引起的扭曲而导致群集分析不可靠,这种扭曲不准确地代表原始MD数据的群集结构。为了缓解这一问题,我们为MDP引进了一种叫作扭曲的刷刷新技术。随着用户在投影中通过动态移动点对目前刷出的点进行刷新,扭曲意识的刷新,将数据点拉近MD空间的刷新点,同时将数据拉离遥远的点。这种动态调整有助于用户更准确地刷刷MD组,从而导致更可靠的群集分析。我们与24名参与者进行的用户研究表明,扭曲觉觉的刷新方法大大偏离了MD 空间中精确分离组合的MD 空间中微调技术,并保持稳健防止扭曲。我们进一步通过两个互动案例展示了我们技术的有效性:进行MD 类集 标签分析。


Article 124

Title@2025-06-12 (4): TransMLA: Multi-Head Latent Attention Is All You Need

Title: TransMLA: Multi-Head Latent Attention Is All You Need TransMLA: Mehrkopf-Latent-Achtung ist alles, was Sie brauchen TransMLA: 多党领袖关注是你们需要的 2502.07864v5

Authors (6): Fanxu Meng, Pingzhi Tang, Xiaojuan Tang, Zengwei Yao, Xing Sun, Muhan Zhang

In this paper, we present TransMLA, a framework that seamlessly converts any GQA-based pre-trained model into an MLA-based model. Our approach enables direct compatibility with DeepSeek’s codebase, allowing these models to fully leverage DeepSeek-specific optimizations such as vLLM and SGLang. By compressing 93% of the KV cache in LLaMA-2-7B, TransMLA achieves a 10.6x inference speedup at an 8K context length while preserving meaningful output quality. Additionally, the model requires only 6 billion tokens for fine-tuning to regain performance on par with the original across multiple benchmarks. TransMLA offers a practical solution for migrating GQA-based models to the MLA structure. When combined with DeepSeek’s advanced features, such as FP8 quantization and Multi-Token Prediction, even greater inference acceleration can be realized.

在本文中,我们展示了 TransMLA, 这个框架可以无缝地将基于GQA的预培训模式转换成基于司法协助的模式。 我们的方法可以直接与DeepSeek的代码库兼容, 使这些模型能够充分利用DepSeek特有的优化, 如 vLLM 和SGlan。 通过压缩LalaMA-2-7B 中93%的 KV缓存, TransMLA 可以在保留有意义的产出质量的同时以8K 的上下文长度加速10.6x 的推断速度。 此外, 该模型只需要60亿个符号进行微调, 以恢复与最初的多个基准相同的业绩。 TransMLA 提供了将基于GQA 的模型迁移到司法协助结构的实用解决方案。 当与DeepSeek的先进功能, 如FP8 量化和多方向预测相结合时, 甚至可以实现更大的推论加速。


Article 125

Title@2025-06-12 (4): ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization

Title: ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization ConfPO: Ausnutzen des politischen Modells Vertrauen für kritische Token-Auswahl in Präferenz-Optimierung 召集:利用政策模范信心在优先最佳化中选择关键物优选标准 2506.08712v2

Authors (5): Hee Suk Yoon, Eunseop Yoon, Mark Hasegawa-Johnson, Sungwoong Kim, Chang D. Yoo

We introduce ConfPO, a method for preference learning in Large Language Models (LLMs) that identifies and optimizes preference-critical tokens based solely on the training policy’s confidence, without requiring any auxiliary models or compute. Unlike prior Direct Alignment Algorithms (DAAs) such as Direct Preference Optimization (DPO), which uniformly adjust all token probabilities regardless of their relevance to preference, ConfPO focuses optimization on the most impactful tokens. This targeted approach improves alignment quality while mitigating overoptimization (i.e., reward hacking) by using the KL divergence budget more efficiently. In contrast to recent token-level methods that rely on credit-assignment models or AI annotators, raising concerns about scalability and reliability, ConfPO is simple, lightweight, and model-free. Experimental results on challenging alignment benchmarks, including AlpacaEval 2 and Arena-Hard, demonstrate that ConfPO consistently outperforms uniform DAAs across various LLMs, delivering better alignment with zero additional computational overhead.

我们引入了ConpPO(ConfPO),这是在大语言模型中进行优先学习的一种方法,它仅基于培训政策的信心,而无需任何辅助模型或计算,而确定和优化偏爱关键符号。与以前直接对齐的算法(DAAs)(DPO)不同,前者统一调整所有象征性概率,而不论其与偏好的相关性如何,ConfPO(Fonpo)侧重于最有影响力的标码。这一有针对性的方法提高了匹配质量,同时通过使用 KL 差异预算更有效地减少过度优化(即奖励黑客),同时通过使用 KL 差异预算来降低过度优化(即奖励) 。与最近采用依赖信用分配模型或AI Antiers(AI ) 的象征性方法相比, ConfPO(Compo) 是简单、轻和无型的。关于具有挑战性的校准基准(包括AlpacaEval 2 和 Arena-Hard)的实验结果显示,Conpo(Confo) 始终优于不同LMS(DAA) 的一致的DA) ,与零的额外计算管理。


Article 126

Title@2025-06-12 (4): Generative Uncertainty in Diffusion Models

Title: Generative Uncertainty in Diffusion Models Generative Unsicherheit bei Diffusionsmodellen 扩散模型中产生的不确定性 2502.20946v2

Authors (6): Metod Jazbec, Eliot Wong-Toi, Guoxuan Xia, Dan Zhang, Eric Nalisnick, Stephan Mandt

Diffusion models have recently driven significant breakthroughs in generative modeling. While state-of-the-art models produce high-quality samples on average, individual samples can still be low quality. Detecting such samples without human inspection remains a challenging task. To address this, we propose a Bayesian framework for estimating generative uncertainty of synthetic samples. We outline how to make Bayesian inference practical for large, modern generative models and introduce a new semantic likelihood (evaluated in the latent space of a feature extractor) to address the challenges posed by high-dimensional sample spaces. Through our experiments, we demonstrate that the proposed generative uncertainty effectively identifies poor-quality samples and significantly outperforms existing uncertainty-based methods. Notably, our Bayesian framework can be applied post-hoc to any pretrained diffusion or flow matching model (via the Laplace approximation), and we propose simple yet effective techniques to minimize its computational overhead during sampling.

最近,传播模型在基因模型方面取得了重大突破。尽管最先进的模型平均产生高质量的样本,但单个样本的质量仍然很低。在未经人类检查的情况下检测这些样本仍然是一项艰巨的任务。为了解决这个问题,我们提议建立一个贝叶斯框架来估计合成样本的基因变异性。我们概述了如何使贝叶斯的推论在大型现代基因模型中切实可行,并引入一种新的语义可能性(在特征提取的潜质空间中进行评估)来应对高维样本空间构成的挑战。通过我们的实验,我们证明拟议的基因变异性有效地识别了质量差的样本,大大超越了现有的基于不确定性的方法。值得注意的是,我们的贝叶斯框架可以应用在热后应用任何未经训练的传播或流动匹配模型(通过拉帕热近光线),我们提出了在取样过程中尽量减少其计算间接费用的简单而有效的技术。


Article 127

Title@2025-06-12 (4): Size-adaptive Hypothesis Testing for Fairness

Title: Size-adaptive Hypothesis Testing for Fairness Size-adaptive Hypothesenprüfung auf Fairness 公平性假设测试 2506.10586v1

Authors (5): Antonio Ferrara, Francesco Cozzi, Alan Perotti, André Panisson, Francesco Bonchi

Determining whether an algorithmic decision-making system discriminates against a specific demographic typically involves comparing a single point estimate of a fairness metric against a predefined threshold. This practice is statistically brittle: it ignores sampling error and treats small demographic subgroups the same as large ones. The problem intensifies in intersectional analyses, where multiple sensitive attributes are considered jointly, giving rise to a larger number of smaller groups. As these groups become more granular, the data representing them becomes too sparse for reliable estimation, and fairness metrics yield excessively wide confidence intervals, precluding meaningful conclusions about potential unfair treatments. In this paper, we introduce a unified, size-adaptive, hypothesis-testing framework that turns fairness assessment into an evidence-based statistical decision. Our contribution is twofold. (i) For sufficiently large subgroups, we prove a Central-Limit result for the statistical parity difference, leading to analytic confidence intervals and a Wald test whose type-I (false positive) error is guaranteed at level $\alpha$. (ii) For the long tail of small intersectional groups, we derive a fully Bayesian Dirichlet-multinomial estimator; Monte-Carlo credible intervals are calibrated for any sample size and naturally converge to Wald intervals as more data becomes available. We validate our approach empirically on benchmark datasets, demonstrating how our tests provide interpretable, statistically rigorous decisions under varying degrees of data availability and intersectionality.

确定算法决策系统是否歧视特定人口,通常需要比较一个单一点的公平衡量标准估计值,与预先确定的阈值相比较。这种做法在统计上是模糊的:它忽略了抽样错误,把小人口分组与大分组一样对待。在交叉分析中,问题加剧,因为多重敏感属性被一并考虑,导致更多的小群体。随着这些群体变得更加细微,它们所代表的数据变得过于稀少,无法可靠地估算,公平衡量标准产生过度广泛的信任间隔,从而无法对潜在的不公平待遇作出有意义的结论。在本文中,我们引入了一个统一、大小适应性的假设测试框架,将公平评估转化为基于证据的统计决定。我们的贡献是双重的。 对于足够大的分组,我们证明是统计均等差异的一个中度-弹性结果,导致分析性的信任间隔,以及衡量其类型一(偏差)的误差以美元表示肯定的正差,从而导致对潜在不公平待遇进行过度的交叉分析。 (二)对于小型交叉小组的长尾巴伊西亚-迪里斯特利德·迪利斯特利斯特-迪勒(Dalbloral-alblationalblationalblationalblationalblationalblationalalalalblock)) 数据测试,我们得出了一种完全的基比基比基比重的标准化数据。
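
The two regimes in the abstract can be sketched for a binary outcome and two groups. This is an illustrative simplification, not the authors' framework: the Wald test uses the usual normal approximation for a difference of proportions, and the Bayesian part assumes independent uniform Beta(1, 1) priors per group (the two-category special case of a Dirichlet-multinomial model). Function names and defaults are hypothetical.

```python
import math
import random


def wald_test_spd(pos_a, n_a, pos_b, n_b):
    """Two-sided Wald test of H0: statistical parity difference = 0."""
    p_a, p_b = pos_a / n_a, pos_b / n_b
    spd = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se == 0:
        raise ValueError("standard error is zero; the Wald test is undefined")
    z = spd / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return spd, z, p_value


def credible_interval_spd(pos_a, n_a, pos_b, n_b, draws=20000, level=0.95, seed=0):
    """Monte Carlo credible interval for the SPD under Beta(1, 1) priors (assumed)."""
    rng = random.Random(seed)
    samples = sorted(
        rng.betavariate(1 + pos_a, 1 + n_a - pos_a)
        - rng.betavariate(1 + pos_b, 1 + n_b - pos_b)
        for _ in range(draws)
    )
    lo_idx = int(round((1 - level) / 2 * draws))
    hi_idx = int(round((1 + level) / 2 * draws)) - 1
    return samples[lo_idx], samples[hi_idx]


if __name__ == "__main__":
    print(wald_test_spd(600, 1000, 500, 1000))  # large groups: analytic test
    print(credible_interval_spd(3, 5, 2, 5))    # tiny groups: wide interval
```

With five observations per group the interval should be wide enough to include zero, which is the behaviour the abstract describes for the long tail of small intersectional groups.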


Article 128

Title@2025-06-12 (4): MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark

Title: MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark MAYA: Inkonsistenzen in generativem Passwort ansprechen Raten über einen einheitlichen Benchmark MAYA:通过统一基准解决生成密码猜猜不一致问题 2504.16651v2

Authors (4): William Corrias, Fabio De Gaspari, Dorjan Hitaj, Luigi V. Mancini

Recent advances in generative models have led to their application in password guessing, with the aim of replicating the complexity, structure, and patterns of human-created passwords. Despite their potential, inconsistencies and inadequate evaluation methodologies in prior research have hindered meaningful comparisons and a comprehensive, unbiased understanding of their capabilities. This paper introduces MAYA, a unified, customizable, plug-and-play benchmarking framework designed to facilitate the systematic characterization and benchmarking of generative password-guessing models in the context of trawling attacks. Using MAYA, we conduct a comprehensive assessment of six state-of-the-art approaches, which we re-implemented and adapted to ensure standardization. Our evaluation spans eight real-world password datasets and covers an exhaustive set of advanced testing scenarios, totaling over 15,000 compute hours. Our findings indicate that these models effectively capture different aspects of human password distribution and exhibit strong generalization capabilities. However, their effectiveness varies significantly with long and complex passwords. Through our evaluation, sequential models consistently outperform other generative architectures and traditional password-guessing tools, demonstrating unique capabilities in generating accurate and complex guesses. Moreover, the diverse password distributions learned by the models enable a multi-model attack that outperforms the best individual model. By releasing MAYA, we aim to foster further research, providing the community with a new tool to consistently and reliably benchmark generative password-guessing models. Our framework is publicly available at https://github.com/williamcorrias/MAYA-Password-Benchmarking.

最近基因化模型的进展导致其应用于密码猜测,目的是复制人类创造的密码的复杂性、结构和模式。尽管这些密码具有潜力,但以往研究中的不一致和不充分的评价方法妨碍了有意义的比较和对其能力的全面、公正的理解。本文介绍了一个统一的、可定制的、插接和游戏基准框架MAYA, 目的是便利在拖网攻击中系统化地貌密码猜测模型的定性和基准化。我们利用MAYA,对六种最先进的方法进行了全面评估,我们重新实施和调整,以确保标准化。我们的评价跨越八个真实世界密码数据集,涵盖一套详尽的先进测试情景,总共15 000个计算小时。我们的研究结果表明,这些模型有效地捕捉了人类密码分布的不同方面,并展示了强大的概括化能力。然而,它们的效力因长而复杂地密码而大不相同。通过我们的评估,顺序模型始终优于其他的基因化框架和传统的密码调查工具,展示了在创建准确和复杂的基因化模型方面的独特能力。


Article 129

Title@2025-06-12 (4): Bandit Convex Optimisation

Title: Bandit Convex Optimisation Bandit Convex Optimierung Bandit Convex 优化优化 2402.06535v4

Authors (1): Tor Lattimore

Bandit convex optimisation is a fundamental framework for studying zeroth-order convex optimisation. This book covers the many tools used for this problem, including cutting plane methods, interior point methods, continuous exponential weights, gradient descent and online Newton step. The nuances between the many assumptions and setups are explained. Although there is not much truly new here, some existing tools are applied in novel ways to obtain new algorithms. A few bounds are improved in minor ways.

Bandit convex 优化是研究零顺序 convex 优化的基本框架。 这本书涵盖了用于这一问题的许多工具, 包括切割平面方法、 内点方法、 连续指数权重、 梯度下行和在线 牛顿 步骤。 许多假设和设置之间的细微差别得到了解释。 虽然这里没有多少真正的新事物, 但有些现有工具正在以新颖的方式应用, 以获得新的算法。 一些界限稍有改进 。


Article 130

Title@2025-06-12 (4): Graph Neural Networks for Automatic Addition of Optimizing Components in Printed Circuit Board Schematics

Title: Graph Neural Networks for Automatic Addition of Optimizing Components in Printed Circuit Board Schematics Graphische Neuralnetze zur automatischen Ergänzung von Optimierungskomponenten in gedruckten Schaltplanen 打印电路板图示板自动增加优化部件的神经网络 2506.10577v1

Authors (4): Pascal Plettenberg, André Alcalde, Bernhard Sick, Josephine M. Thomas

The design and optimization of Printed Circuit Board (PCB) schematics are crucial for the development of high-quality electronic devices. One important task is to optimize drafts by adding components that improve the robustness and reliability of the circuit, e.g., pull-up resistors or decoupling capacitors. Since there is a shortage of skilled engineers and manual optimizations are very time-consuming, these best practices are often neglected. However, this typically leads to higher costs for troubleshooting in later development stages as well as shortened product life cycles, resulting in an increased amount of electronic waste that is difficult to recycle. Here, we present an approach for automating the addition of new components into PCB schematics by representing them as bipartite graphs and utilizing a node pair prediction model based on Graph Neural Networks (GNNs). We apply our approach to three highly relevant PCB design optimization tasks and compare the performance of several popular GNN architectures on real-world datasets labeled by human experts. We show that GNNs can solve these problems with high accuracy and demonstrate that our approach offers the potential to automate PCB design optimizations in a time- and cost-efficient manner.

印刷电路委员会(PCB)示意图的设计和优化对于发展高质量的电子装置至关重要。因此,一项重要任务是通过增加能够提高电路的稳健性和可靠性的部件,例如拉动阻力器或脱钩电容器,优化草稿。由于缺少熟练的工程师和人工优化非常耗时,这些最佳做法往往被忽视。然而,这通常会导致后期开发阶段的故障排除成本较高,产品寿命周期缩短,导致电子废物数量增加,难以回收。在这里,我们提出一种方法,将新部件添加到多氯联苯示意图上,作为双向图,并使用基于“神经网络”图的结对预测模型。我们采用的方法是三项高度相关的多氯联苯设计优化任务,并将人类专家所标注的实时数据集上若干广受欢迎的GNN结构的性能加以比较。我们表明,GNN可以以高准确性的方式解决这些问题,并表明我们的方法能够以高效的方式为多氯联苯设计自动化的升级。


Article 131

Title@2025-06-12 (4): Evolutionary Prediction Games

Title: Evolutionary Prediction Games Evolutionäre Vorhersage Spiele 演进预测运动会 2503.03401v2

Authors (2): Eden Saig, Nir Rosenfeld

When a prediction algorithm serves a collection of users, disparities in prediction quality are likely to emerge. If users respond to accurate predictions by increasing engagement, inviting friends, or adopting trends, repeated learning creates a feedback loop that shapes both the model and the population of its users. In this work, we introduce evolutionary prediction games, a framework grounded in evolutionary game theory which models such feedback loops as natural-selection processes among groups of users. Our theoretical analysis reveals a gap between idealized and real-world learning settings: In idealized settings with unlimited data and computational power, repeated learning creates competition and promotes competitive exclusion across a broad class of behavioral dynamics. However, under realistic constraints such as finite data, limited compute, or risk of overfitting, we show that stable coexistence and mutualistic symbiosis between groups become possible. We analyze these possibilities in terms of their stability and feasibility, present mechanisms that can sustain their existence, and empirically demonstrate our findings.

当预测算法用于收集用户时,预测质量的差异可能会出现。如果用户通过增加参与、邀请朋友或采用趋势来回应准确的预测,反复学习会产生影响模型和用户人口的反馈循环。在这项工作中,我们引入进化预测游戏,这是一个基于进化游戏理论的框架,它模拟了用户群体之间自然选择过程等反馈循环。我们的理论分析揭示了理想化和现实世界学习环境之间的差距:在理想化环境中,有无限的数据和计算能力,反复学习在广泛的行为动态中产生竞争,促进竞争性排斥。然而,在诸如有限数据、有限计算或过度适应风险等现实的限制下,我们显示群体之间能够实现稳定的共存和相互共生关系。我们从稳定性和可行性的角度分析这些可能性,现有的机制能够维持它们的存在,并用经验展示我们的调查结果。


Article 132

Title@2025-06-12 (4): Identifiability Challenges in Sparse Linear Ordinary Differential Equations

Title: Identifiability Challenges in Sparse Linear Ordinary Differential Equations Identifizierbarkeitsherausforderungen in Sparse Lineare Standarddifferenzgleichungen 分散线性普通差别等分法的可识别性挑战 2506.09816v2

Authors (3): Cecilia Casolo, Sören Becker, Niki Kilbertus

Dynamical systems modeling is a core pillar of scientific inquiry across natural and life sciences. Increasingly, dynamical system models are learned from data, rendering identifiability a paramount concept. For systems that are not identifiable from data, no guarantees can be given about their behavior under new conditions and inputs, or about possible control mechanisms to steer the system. It is known in the community that “linear ordinary differential equations (ODE) are almost surely identifiable from a single trajectory.” However, this only holds for dense matrices. The sparse regime remains underexplored, despite its practical relevance with sparsity arising naturally in many biological, social, and physical systems. In this work, we address this gap by characterizing the identifiability of sparse linear ODEs. Contrary to the dense case, we show that sparse systems are unidentifiable with a positive probability in practically relevant sparsity regimes and provide lower bounds for this probability. We further study empirically how this theoretical unidentifiability manifests in state-of-the-art methods to estimate linear ODEs from data. Our results corroborate that sparse systems are also practically unidentifiable. Theoretical limitations are not resolved through inductive biases or optimization dynamics. Our findings call for rethinking what can be expected from data-driven dynamical system modeling and allow for quantitative assessments of how much to trust a learned linear ODE.

动态系统建模是自然科学和生命科学科学研究的核心支柱。 动态系统模型越来越多地从数据中学习, 使识别性成为最高概念。 对于无法从数据中识别的系统, 无法保证其在新的条件和投入下的行为, 或可能的控制机制来指导系统。 社区知道“ 线性普通差异方程式( ODE) 几乎可以肯定地从单一轨迹中识别出来 ” 。 然而, 这只能维持在密度基质上。 稀疏的系统仍然没有得到充分探讨, 尽管它与许多生物、社会和物理系统中自然产生的弥漫性具有实际相关性。 在这项工作中, 我们通过说明稀疏线性线性模型来弥补这一差距。 与密集的情况相反, 我们显示, 稀疏的系统是无法识别的,在实际相关的宽度制度中具有积极的可能性,并且为这种可能性提供了较低的界限。 我们进一步研究了这种理论性不可辨别的方法是如何在从数据中估计线性 ODEs 。 我们的结果证实, 稀疏的系统也是几乎无法辨别的模型。 理论限制使得我们所期望的动态的动态的系统能够被重新定位, 成为一种动态的动态分析。
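
A small numerical check can make the dense-versus-sparse contrast concrete. The sketch below is not taken from the paper. It assumes a standard criterion from the identifiability literature: for `x' = A x` observed from a single initial state `x0`, the matrix `A` is identifiable when the Krylov vectors `x0, A x0, ..., A^(n-1) x0` are linearly independent. The function names and example matrices are hypothetical.

```python
def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def rank(rows, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        if rk == len(m):
            break
        pivot = max(range(rk, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk


def krylov_rank(A, x0):
    """Rank of [x0, A x0, ..., A^(n-1) x0]; full rank is the assumed criterion."""
    vectors = [list(x0)]
    for _ in range(len(x0) - 1):
        vectors.append(mat_vec(A, vectors[-1]))
    return rank(vectors)


if __name__ == "__main__":
    dense = [[-1.0, 2.0], [0.5, -3.0]]
    sparse = [[-1.0, 0.0], [0.0, -2.0]]
    print(krylov_rank(dense, [1.0, 1.0]))   # full rank in this example
    print(krylov_rank(sparse, [1.0, 0.0]))  # rank-deficient in this example
```

In the sparse example the zeros in `A` keep the trajectory inside a one-dimensional invariant subspace, so the second diagonal entry leaves no trace in the data.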


Article 133

Title@2025-06-12 (4): Box-Constrained Softmax Function and Its Application for Post-Hoc Calibration

Title: Box-Constrained Softmax Function and Its Application for Post-Hoc Calibration Box-Constrained Softmax-Funktion und seine Anwendung für die Post-Hoc-Kalibrierung Box- Contract Softmax 函数及其热后校准应用程序 2506.10572v1

Authors (4): Kyohei Atarashi, Satoshi Oyama, Hiromi Arai, Hisashi Kashima

Controlling the output probabilities of softmax-based models is a common problem in modern machine learning. Although the $\mathrm{Softmax}$ function provides soft control via its temperature parameter, it lacks the ability to enforce hard constraints, such as box constraints, on output probabilities, which can be critical in certain applications requiring reliable and trustworthy models. In this work, we propose the box-constrained softmax ($\mathrm{BCSoftmax}$) function, a novel generalization of the $\mathrm{Softmax}$ function that explicitly enforces lower and upper bounds on output probabilities. While $\mathrm{BCSoftmax}$ is formulated as the solution to a box-constrained optimization problem, we develop an exact and efficient computation algorithm for $\mathrm{BCSoftmax}$. As a key application, we introduce two post-hoc calibration methods based on $\mathrm{BCSoftmax}$. The proposed methods mitigate underconfidence and overconfidence in predictive models by learning the lower and upper bounds of the output probabilities or logits after model training, thereby enhancing reliability in downstream decision-making tasks. We demonstrate the effectiveness of our methods experimentally using the TinyImageNet, CIFAR-100, and 20NewsGroups datasets, achieving improvements in calibration metrics.

在现代机器学习中,控制软负式模型的输出概率是一个常见的问题。 尽管 $\ mathrm{ Softmax} 函数通过其温度参数提供软控制, 但它缺乏对产出概率实施硬性限制的能力, 如框限制, 对要求可靠和可信赖模型的某些应用程序来说, 产出概率可能至关重要。 在这项工作中, 我们提出盒式软式控制( mathrm{ BCSoftmax} $) 功能, 一种对 $\ mathrm{ BCftmax} 功能的新的概括化。 虽然 $\ mathrm{ BCftmax} 函数通过其温度参数提供软性控制。 虽然 $\ mathrm{ BCSoftmax} 函数通过箱式控制优化问题的解决方案来实施硬性限制产出概率。 我们为 $\ mathrm{ BCBCSoftmax} 功能, 我们引入了两种基于 $\ mathrimm{ BCSoftmax} 功能的后校准校准方法。 在预测性模型中减少和过度信任性和过度性, 我们的模型中, 正在学习的下层分析中, , 在测试后, 在测试中, 在测试中, 测试中, 在测试中, 在测试中, 在测试中, 在测试后演练后演算中, 校程中, 校程中, 校程中, 校程中, 校程中, 我们的校正。
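
To make the box-constrained formulation concrete, here is a minimal sketch under stated assumptions. It assumes BCSoftmax is the maximiser of the usual entropy-regularised objective (the one whose solution on the simplex is the ordinary softmax) with added bounds `lower <= p <= upper`, at temperature 1. Under that assumption the optimality conditions give `p_i = clip(exp(x_i - nu), lower_i, upper_i)` for a scalar `nu` chosen so the probabilities sum to one. The sketch finds `nu` by plain bisection, which is a substitute for, not a reproduction of, the exact algorithm the authors develop. All names are hypothetical.

```python
import math


def bc_softmax(logits, lower, upper, iters=200):
    """Box-constrained softmax via bisection on the normaliser nu (assumed form)."""
    if sum(lower) > 1 or sum(upper) < 1:
        raise ValueError("infeasible bounds: need sum(lower) <= 1 <= sum(upper)")
    if any(l < 0 or u > 1 or l > u for l, u in zip(lower, upper)):
        raise ValueError("each bound pair must satisfy 0 <= lower <= upper <= 1")

    def probs(nu):
        return [
            min(u, max(l, math.exp(x - nu)))
            for x, l, u in zip(logits, lower, upper)
        ]

    # The sum of probs(nu) is non-increasing in nu, so bisection applies.
    lo = min(logits) - 1.0   # every entry is clipped at its upper bound here
    hi = max(logits) + 50.0  # every entry is (almost) at its lower bound here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return probs(0.5 * (lo + hi))


if __name__ == "__main__":
    x = [2.0, 1.0, 0.0]
    print(bc_softmax(x, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))  # ordinary softmax
    print(bc_softmax(x, [0.0, 0.0, 0.2], [0.5, 1.0, 1.0]))  # bounds active
```

With trivial bounds the result should match the ordinary softmax. With active bounds, the clipped entries sit exactly on their bound and the remaining mass is shared among the other entries in proportion to `exp(x_i)`.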


Article 134

Title@2025-06-12 (4): On the role of non-linear latent features in bipartite generative neural networks

Title: On the role of non-linear latent features in bipartite generative neural networks Zur Rolle nichtlinearer latenter Merkmale in bipartitischen generativen neuronalen Netzwerken 关于非线性潜伏特征在双边基因神经网络中的作用 2506.10552v1

Authors (4): Tony Bonnaire, Giovanni Catania, Aurélien Decelle, Beatriz Seoane

We investigate the phase diagram and memory retrieval capabilities of bipartite energy-based neural networks, namely Restricted Boltzmann Machines (RBMs), as a function of the prior distribution imposed on their hidden units - including binary, multi-state, and ReLU-like activations. Drawing connections to the Hopfield model and employing analytical tools from statistical physics of disordered systems, we explore how the architectural choices and activation functions shape the thermodynamic properties of these models. Our analysis reveals that standard RBMs with binary hidden nodes and extensive connectivity suffer from reduced critical capacity, limiting their effectiveness as associative memories. To address this, we examine several modifications, such as introducing local biases and adopting richer hidden unit priors. These adjustments restore ordered retrieval phases and markedly improve recall performance, even at finite temperatures. Our theoretical findings, supported by finite-size Monte Carlo simulations, highlight the importance of hidden unit design in enhancing the expressive power of RBMs.

我们调查两边基于能源的神经网络,即限制波尔茨曼机器(RBM)的阶段图和记忆检索能力,这是先前对其隐蔽单元(包括二进制、多状态和类似RLU的激活)的分布功能。我们从无序系统的统计物理学中绘制与霍普菲尔德模型的连接并利用这些系统的分析工具,我们探索建筑选择和激活功能如何影响这些模型的热力特性。我们的分析表明,带有二进制隐藏节点和广泛连接的标准按级按级按级按级按级按级按级按级按级按级按级按级按级按级计算,限制了它们作为连带记忆的效力。为了解决这个问题,我们研究了几项修改,例如引入地方偏差和采用较富的隐藏单元前期等。这些调整恢复了定级的检索阶段,并显著地改进了业绩,即使在有限的温度下。我们的理论发现,在限定规模的蒙特卡洛模拟中,突出了隐藏单位设计在加强按级按级按级计调制制时的重要性。


Article 135

Title@2025-06-12 (4): Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods

Title: Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods Kostenlose Bewertung von Record-Level-Datensicherheitsrisiken durch Artifact-basierte Methoden 采用人工方法进行自由记录隐私风险评估 2411.05743v3

Authors (4): Joseph Pollock, Igor Shilov, Euodia Dodd, Yves-Alexandre de Montjoye

Membership inference attacks (MIAs) are widely used to empirically assess privacy risks in machine learning models, both providing model-level vulnerability metrics and identifying the most vulnerable training samples. State-of-the-art methods, however, require training hundreds of shadow models with the same architecture as the target model. This makes the computational cost of assessing the privacy of models prohibitive for many practical applications, particularly when used iteratively as part of the model development process and for large models. We propose a novel approach for identifying the training samples most vulnerable to membership inference attacks by analyzing artifacts naturally available during the training process. Our method, Loss Trace Interquartile Range (LT-IQR), analyzes per-sample loss trajectories collected during model training to identify high-risk samples without requiring any additional model training. Through experiments on standard benchmarks, we demonstrate that LT-IQR achieves 92% precision@k=1% in identifying the samples most vulnerable to state-of-the-art MIAs. This result holds across datasets and model architectures with LT-IQR outperforming both traditional vulnerability metrics, such as loss, and lightweight MIAs using few shadow models. We also show LT-IQR to accurately identify points vulnerable to multiple MIA methods and perform ablation studies. We believe LT-IQR enables model developers to identify vulnerable training samples, for free, as part of the model development process. Our results emphasize the potential of artifact-based methods to efficiently evaluate privacy risks.

成员推导攻击(MIAs)被广泛用于实证评估机器学习模型中的隐私风险,既提供模型级别的脆弱度度度指标,又确定最易受到伤害的培训样本。然而,最先进的方法要求培训数百个影子模型,与目标模型的架构相同。这使得评估模型隐私的计算成本对许多实际应用来说令人望而却步,特别是在作为模型开发过程的一部分和大型模型的迭代使用时。我们提出一种新的方法,通过分析培训过程中自然可得的工艺品,确定最易成为会员推断袭击的样本。我们的方法,即损失追踪国际间交易范围(LT-IQR),分析在模型培训期间收集的每个抽样损失轨迹,以确定高风险样品,而不需要任何额外的示范培训。我们通过对标准基准的实验,证明LT-IQR实现了92%的精确度@k=1%的精确度,通过分析最易受到伤害的模式样本,分析在培训过程中自然可以获得的文物。我们的方法,LT-IR(LT-I)超越了传统的脆弱度标点,同时进行模拟的光度和模型的模型评估。我们通过对低比标准的模型的模型的模型,还显示了损失评估方法。
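
The core computation named in the abstract, an interquartile range over each sample's loss trace, can be sketched with the standard library. This is not the authors' implementation: the choice of quantile method, the assumption that a larger IQR signals higher vulnerability, and all names are illustrative guesses.

```python
import statistics


def loss_trace_iqr(trace):
    """Interquartile range of one sample's per-epoch losses."""
    q1, _, q3 = statistics.quantiles(trace, n=4, method="inclusive")
    return q3 - q1


def most_vulnerable(loss_traces, fraction=0.01):
    """Return the top `fraction` of sample ids ranked by loss-trace IQR.

    Assumes (for illustration) that a larger IQR means higher risk.
    """
    scores = {sid: loss_trace_iqr(trace) for sid, trace in loss_traces.items()}
    k = max(1, int(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    traces = {
        "a": [1.0, 1.0, 1.0, 1.0],
        "b": [3.0, 2.0, 1.0, 0.1],
        "c": [0.5, 0.4, 0.4, 0.3],
    }
    print(most_vulnerable(traces, fraction=0.34))
```

The per-sample losses are a by-product of ordinary training, so a ranking of this kind needs no shadow models, which is the cost argument the abstract makes.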


Article 136

Title@2025-06-12 (4): Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models

Title: Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models Obliviate: Effiziente Unvergesslichkeit für den Schutz geistigen Eigentums in großen Sprachmodellen 默认:在大语言模式中有效统一保护知识产权 2502.15010v2

Authors (2): Mark Russinovich, Ahmed Salem

Recent copyright agreements between AI companies and content creators underscore the need for fine-grained control over language models’ ability to reproduce copyrighted text. Existing defenses, ranging from aggressive unlearning to simplistic output filters, either sacrifice model utility or inadequately address verbatim leakage. We introduce Obliviate, a lightweight post-training method that surgically suppresses exact reproduction of specified sequences while preserving semantic understanding. Obliviate first identifies memorized passages and then, for each target token, minimally adjusts the model’s output distribution via a Kullback-Leibler divergence penalty to drive down the probability of exact reproduction. Simultaneously, we enforce a consistency loss on non-target tokens to retain the model’s fluency and task performance. We evaluate Obliviate on four popular 6-8B-parameter models (LLaMA-3.1, LLaMA-3.1-Instruct, Qwen-2.5, and Yi-1.5) using synthetic memorization benchmarks and organic copyrighted excerpts (e.g., Moby Dick, Frankenstein, Alice in Wonderland and Les Miserables). Across all settings, Obliviate reduces verbatim recall by two orders of magnitude (e.g., from hundreds of words to fewer than 12) while degrading downstream accuracy by at most 1% on HellaSwag, MMLU, TruthfulQA, and Winogrande. Furthermore, we benchmark Obliviate against different unlearning and copyright techniques using the MUSE and CoTaEval benchmarks. These results position Obliviate as a practical, high-fidelity solution for copyright compliance in deployed LLMs.

AI公司和内容创作者之间最近达成的版权协议强调,需要对语言模型复制版权文本的能力进行细微的控制。现有的防御措施包括:从主动的不学习到简单的产出过滤器,无论是牺牲模型效用还是不当的邮件渗漏。我们采用了轻巧的培训后训练方法Obliviate,在保持语义理解的同时,用外科方式抑制特定序列的复制。首先确定记忆化的段落,然后对每个目标标志,通过Kullback-Lebel差价罚款,对模型的输出分配进行最小调整,以降低精确复制的概率。与此同时,我们对非目标标志实施一致性损失,以保持模型的流畅和任务性。我们用四种受欢迎的 6-8B 参数模型(LLLAMA-3、LalaMA-3.1-Instruct、Qwen-2.5和Yy-1.5) 评估了四种受欢迎的6-8B参数模型(LLO-I-Instruct、Q-2.5和Y-1.5),用合成的数学标准基准和有机版权基准(例如,Moby Dick、Francel、Al-Al-Al-Al-Lislate 和Lislation) 和最低级的准确性排序,用两种标准,用最低级标准,用最低的顺序和最低的顺序,将这些结果和最低的精确性。
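The two loss terms named in the abstract (a KL penalty on target tokens and a consistency loss on non-target tokens) might be combined roughly as in the sketch below. Everything specific here is an assumption made for illustration: the frozen reference model, the `floor` probability cap, the direction of the KL terms, and the `weight` parameter are not taken from the paper.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    """KL(p || q) for each row of two (positions, vocab) arrays."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def suppressed_target(ref_probs, token_ids, floor=1e-3):
    """Reference distribution with the memorized token capped at `floor`.

    The remaining tokens are rescaled so each row still sums to one.
    """
    rows = np.arange(ref_probs.shape[0])
    old = ref_probs[rows, token_ids]
    new = np.minimum(old, floor)
    scale = (1.0 - new) / np.maximum(1.0 - old, 1e-12)
    target = ref_probs * scale[:, None]
    target[rows, token_ids] = new
    return target

def unmemorization_loss(model_logits, ref_logits, token_ids, target_mask, weight=1.0):
    """Forget term on target positions plus consistency term elsewhere."""
    model = softmax(model_logits)
    ref = softmax(ref_logits)
    mask = np.asarray(target_mask, dtype=bool)
    forget_all = kl(suppressed_target(ref, token_ids), model)
    keep_all = kl(ref, model)
    forget = forget_all[mask].mean() if mask.any() else 0.0
    keep = keep_all[~mask].mean() if (~mask).any() else 0.0
    return float(forget + weight * keep)

# Two positions, vocabulary of three tokens; position 0 is a target token.
ref_logits = np.array([[5.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
token_ids = np.array([0, 1])
loss = unmemorization_loss(ref_logits, ref_logits, token_ids, np.array([True, False]))
```

When the model still matches the reference, the consistency term is zero and the whole loss comes from the target position, which is the behaviour one would expect before any fine-tuning step.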


Article 137

Title@2025-06-12 (4): Exploring Performance-Complexity Trade-Offs in Sound Event Detection Models

Title: Exploring Performance-Complexity Trade-Offs in Sound Event Detection Models Erforschung von Leistungs-Komplexitäts-Trade-Offs in Sound Event Detection-Modellen 探索健全事件探测模型中的性能-复杂交易-便利交易 2503.11373v2

Authors (5): Tobias Morocutti, Florian Schmid, Jonathan Greif, Francesco Foscarin, Gerhard Widmer

We target the problem of developing new low-complexity networks for the sound event detection task. Our goal is to meticulously analyze the performance-complexity trade-off, aiming to be competitive with the large state-of-the-art models, at a fraction of the computational requirements. We find that low-complexity convolutional models previously proposed for audio tagging can be effectively adapted for event detection (which requires frame-wise prediction) by adjusting convolutional strides, removing the global pooling, and, importantly, adding a sequence model before the (now frame-wise) classification heads. Systematic experiments reveal that the best choice for the sequence model type depends on which complexity metric is most important for the given application. We also investigate the impact of enhanced training strategies such as knowledge distillation. In the end, we show that combined with an optimized training strategy, we can reach event detection performance comparable to state-of-the-art transformers while requiring only around 5% of the parameters. We release all our pre-trained models and the code for reproducing this work to support future research in low-complexity sound event detection at https://github.com/theMoro/EfficientSED.

我们的目标是在计算要求的一小部分时间里,仔细分析性能-复杂程度权衡,目的是与大型最新模型进行竞争。我们发现,以前为音频标记提议的低复杂性变异性模型可以有效地适应于事件探测(这需要以框架为根据的预测),方法是调整连带进步,消除全球集合,并且重要的是,在(现为框架的)分类头之前添加一个序列模型。系统实验表明,对序列模型类型的最佳选择取决于对特定应用最重要的哪些复杂度衡量标准。我们还调查了知识蒸馏等强化培训战略的影响。最后,我们表明,与优化培训战略相结合,我们可以达到与最先进变异型器相匹配的事件探测性能,同时只需要5%的参数。我们发布了我们所有预先培训的模型和重新制作这项工作的代码,以支持未来低复杂度声音事件检测的研究,网址为 https://github.com/theMoro/EfficientSED。


Article 138

Title@2025-06-12 (4): Data-driven Day Ahead Market Prices Forecasting: A Focus on Short Training Set Windows

Title: Data-driven Day Ahead Market Prices Forecasting: A Focus on Short Training Set Windows datengetriebener Tag voraus Marktpreise Prognose: Ein Fokus auf Kurztraining-Set Windows 头期市场价格预测:关注短期培训成套窗口 2506.10536v1

Authors (6): Vasilis Michalakopoulos, Christoforos Menos-Aikateriniadis, Elissaios Sarmas, Antonis Zakynthinos, Pavlos S. Georgilakis, Dimitris Askounis

This study investigates the performance of machine learning models in forecasting electricity Day-Ahead Market (DAM) prices using short historical training windows, with a focus on detecting seasonal trends and price spikes. We evaluate four models, namely LSTM with Feed Forward Error Correction (FFEC), XGBoost, LightGBM, and CatBoost, across three European energy markets (Greece, Belgium, Ireland) using feature sets derived from ENTSO-E forecast data. Training window lengths range from 7 to 90 days, allowing assessment of model adaptability under constrained data availability. Results indicate that LightGBM consistently achieves the highest forecasting accuracy and robustness, particularly with 45 and 60 day training windows, which balance temporal relevance and learning depth. Furthermore, LightGBM demonstrates superior detection of seasonal effects and peak price events compared to LSTM and other boosting models. These findings suggest that short-window training approaches, combined with boosting methods, can effectively support DAM forecasting in volatile, data-scarce environments.

这项研究调查了利用短期历史培训窗口预测电日市场价格的机器学习模型的绩效,重点是发现季节趋势和价格涨幅;我们评估了四个模型,即具有进料错误校正的LSTM(FFEC)、XGBost、LightGBM和CatBoost(希腊、比利时、爱尔兰)三个欧洲能源市场,利用ENTSO-E预报数据生成的成套特征;培训窗口长度从7天到90天不等,以便评估受数据限制的模型适应性;结果显示,灯GBM始终在预测准确性和稳健性方面达到最高水平,特别是45天和60天的培训窗口,这些窗口平衡了时间相关性和学习深度;此外,灯GBM显示,与LSTM和其他推进模型相比,对季节效应和高峰价格事件的探测率较高。这些研究结果表明,短窗口培训方法与推进方法相结合,可以有效支持DAM在波动、数据侵蚀环境中的预报。
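The short-training-window protocol described above can be sketched as a rolling evaluation in which each test day is forecast from only the preceding `window_days` days. The mean-price forecaster below is a deliberately trivial placeholder for the boosting and LSTM models in the study, and both function names are hypothetical.

```python
import numpy as np

def short_window_splits(num_days, window_days):
    """Yield (train_day_range, test_day) pairs for a rolling window."""
    for test_day in range(window_days, num_days):
        yield range(test_day - window_days, test_day), test_day

def evaluate_window(daily_prices, window_days):
    """Mean absolute error of a placeholder forecaster under a rolling window.

    The forecaster is the mean price over the training window; a real study
    would fit a model such as LightGBM on the same slice instead.
    """
    prices = np.asarray(daily_prices, dtype=float)
    errors = []
    for train_days, test_day in short_window_splits(len(prices), window_days):
        forecast = prices[train_days.start:train_days.stop].mean()
        errors.append(abs(forecast - prices[test_day]))
    return float(np.mean(errors))

# Compare several window lengths on a synthetic weekly price pattern.
days = np.arange(120)
prices = 50.0 + 10.0 * np.sin(2 * np.pi * days / 7)
errors_by_window = {w: evaluate_window(prices, w) for w in (7, 45, 90)}
```

Sweeping `window_days` over the 7 to 90 day range mentioned in the abstract gives one error figure per window length, which is the kind of comparison the study reports.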


Article 139

Title@2025-06-12 (4): Equivariant Neural Diffusion for Molecule Generation

Title: Equivariant Neural Diffusion for Molecule Generation Equivariante Neuraldiffusion für die Molekülgeneration 分子代的等量神经元传播 2506.10532v1

Authors (4): François Cornet, Grigory Bartosh, Mikkel N. Schmidt, Christian A. Naesseth

We introduce Equivariant Neural Diffusion (END), a novel diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. Compared to current state-of-the-art equivariant diffusion models, the key innovation in END lies in its learnable forward process for enhanced generative modelling. Rather than being pre-specified, the forward process is parameterized through a time- and data-dependent transformation that is equivariant to rigid transformations. Through a series of experiments on standard molecule generation benchmarks, we demonstrate the competitive performance of END compared to several strong baselines for both unconditional and conditional generation.

我们引入了异质神经分解(END),这是3D分子生成的新型扩散模型,与欧clidean变异具有等同性。与当前最新的等异性扩散模型相比,END的关键创新在于其可学习的强化基因建模前期进程。远前进程不是预先规定的,而是通过一个与僵化变异等同的时间和数据化变异来参数化。通过一系列标准分子生成基准实验,我们展示了END的竞争性性能,而无条件和有条件的生成则有几条强有力的基线。


Article 140

Title@2025-06-12 (4): Decision Making under the Exponential Family: Distributionally Robust Optimisation with Bayesian Ambiguity Sets

Title: Decision Making under the Exponential Family: Distributionally Robust Optimisation with Bayesian Ambiguity Sets Entscheidungsfindung unter der Exponential-Familie: Verteilungsstarke Optimierung mit Bayesian Ambiguity Sets 根据 “ 名人家庭 “ 下的决策:与贝ysian 暗形组的分布式强力优化 2411.16829v2

Authors (3): Charita Dellaporta, Patrick O’Hara, Theodoros Damoulas

Decision making under uncertainty is challenging as the data-generating process (DGP) is often unknown. Bayesian inference proceeds by estimating the DGP through posterior beliefs on the model’s parameters. However, minimising the expected risk under these beliefs can lead to suboptimal decisions due to model uncertainty or limited, noisy observations. To address this, we introduce Distributionally Robust Optimisation with Bayesian Ambiguity Sets (DRO-BAS) which hedges against model uncertainty by optimising the worst-case risk over a posterior-informed ambiguity set. We provide two such sets, based on posterior expectations (DRO-BAS(PE)) or posterior predictives (DRO-BAS(PP)) and prove that both admit, under conditions, strong dual formulations leading to efficient single-stage stochastic programs which are solved with a sample average approximation. For DRO-BAS(PE) this covers all conjugate exponential family members while for DRO-BAS(PP) this is shown under conditions on the predictive’s moment generating function. Our DRO-BAS formulations outperform existing Bayesian DRO on the Newsvendor problem and achieve faster solve times with comparable robustness on the Portfolio problem.

在不确定情况下作出决策具有挑战性,因为数据生成过程(DGP)往往未知。贝叶斯推断通过模型参数的后验信念来估计DGP。然而,由于模型不确定性或有限且含噪的观测,在这些信念下最小化期望风险可能导致次优决策。为此,我们提出了基于贝叶斯模糊集的分布鲁棒优化(DRO-BAS),通过在由后验信息确定的模糊集上优化最坏情况风险来对冲模型不确定性。我们给出两种这样的集合,分别基于后验期望(DRO-BAS(PE))和后验预测分布(DRO-BAS(PP)),并证明二者在一定条件下都存在强对偶形式,从而得到可用样本平均近似求解的高效单阶段随机规划。对于DRO-BAS(PE),这涵盖所有共轭指数族成员;对于DRO-BAS(PP),该结论在预测分布的矩母函数满足一定条件时成立。我们的DRO-BAS形式在Newsvendor问题上优于现有的贝叶斯DRO,并在Portfolio问题上以相当的稳健性实现了更快的求解时间。
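The abstract mentions solving the resulting programs with a sample average approximation (SAA). As a point of reference, the sketch below applies plain, non-robust SAA to a toy Newsvendor problem; it does not implement the DRO-BAS worst-case objective, and the cost parameters, candidate grid, and function names are assumptions chosen for illustration.

```python
import numpy as np

def newsvendor_cost(order, demand, unit_cost=1.0, price=3.0):
    """Purchase cost minus sales revenue for one order quantity."""
    return unit_cost * order - price * np.minimum(order, demand)

def saa_order(demand_samples, candidates, unit_cost=1.0, price=3.0):
    """Order quantity that minimises the average cost over the samples."""
    demand = np.asarray(demand_samples, dtype=float)
    average_costs = [
        newsvendor_cost(q, demand, unit_cost, price).mean() for q in candidates
    ]
    return candidates[int(np.argmin(average_costs))]

# Ten equally weighted demand scenarios and integer order candidates.
demand_samples = np.arange(1, 11)
best_order = saa_order(demand_samples, candidates=list(range(1, 11)))
```

A distributionally robust variant would replace the plain sample average with a worst case over an ambiguity set, which is where the dual formulations in the paper come in.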


Article 141

Title@2025-06-12 (4): Learning in Budgeted Auctions with Spacing Objectives

Title: Learning in Budgeted Auctions with Spacing Objectives Lernen in budgetierten Auktionen mit unterschiedlichen Zielen 具有间距目标的预算拍卖中的学习 2411.04843v2

Authors (6): Giannis Fikioris, Robert Kleinberg, Yoav Kolumbus, Raunak Kumar, Yishay Mansour, Éva Tardos

In many repeated auction settings, participants care not only about how frequently they win but also how their winnings are distributed over time. This problem arises in various practical domains where avoiding congested demand is crucial, such as online retail sales and compute services, as well as in advertising campaigns that require sustained visibility over time. We introduce a simple model of this phenomenon, modeling it as a budgeted auction where the value of a win is a concave function of the time since the last win. This implies that for a given number of wins, even spacing over time is optimal. We also extend our model and results to the case when not all wins result in “conversions” (realization of actual gains), and the probability of conversion depends on a context. The goal is to maximize and evenly space conversions rather than just wins. We study the optimal policies for this setting in second-price auctions and offer learning algorithms for the bidders that achieve low regret against the optimal bidding policy in a Bayesian online setting. Our main result is a computationally efficient online learning algorithm that achieves $\tilde O(\sqrt T)$ regret. We achieve this by showing that an infinite-horizon Markov decision process (MDP) with the budget constraint in expectation is essentially equivalent to our problem, even when limiting that MDP to a very small number of states. The algorithm achieves low regret by learning a bidding policy that chooses bids as a function of the context and the system’s state, which will be the time elapsed since the last win (or conversion). We show that state-independent strategies incur linear regret even without uncertainty of conversions. We complement this by showing that there are state-independent strategies that, while still having linear regret, achieve a $(1-\frac 1 e)$ approximation to the optimal reward.

在许多反复的拍卖环境中, 参与者不仅关心他们赢得的频率, 也关心他们赢得的得分如何随时间而分配。 这个问题出现在各种实际领域, 避免需求紧要, 比如在线零售销售和计算服务, 以及需要长期持续可见的广告运动中。 我们引入了这一现象的简单模式, 将赢的价值作为自上次赢以来时间的混搭功能, 以预算为模式的拍卖模式。 这意味着对于某个特定数目的赢者来说, 甚至独立时间的间隔是最佳的。 我们还将我们的模型和结果推广到并非所有赢得“ 转换” ( 实际收益的实现) 的关键领域, 以及转换的可能性取决于一个环境。 我们的目标是在二价拍卖中研究这个环境的最佳政策, 并且给竞拍者提供学习算法, 在Bayesian 的网络环境下, 我们的主要结果是计算到一个高效的在线学习算法, 实现O( sqrent T) 的变换值( 实现实际收益的变值 ) , 而转换的概率概率取决于一个环境 。 我们从一个无限的算算法, 将显示我们预算的回报的变数 的算算法, 的算算法将显示一个比重的变数, 的策略将显示我们的预算 的算法 的 的 的算法是 的 的 的 。
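The claim that even spacing is optimal for a fixed number of wins follows from concavity (Jensen's inequality). A small worked example makes this concrete; the square-root value function and the win times are assumptions chosen only for illustration.

```python
import math

def total_value(win_times, value=math.sqrt):
    """Sum of value(gap) over all wins, where gap is the time since the last win.

    Time starts at 0, so the first gap is the time of the first win.
    """
    total, last = 0.0, 0
    for t in win_times:
        total += value(t - last)
        last = t
    return total

# Four wins over a horizon of 16 rounds, with the last win at round 16 in both cases.
even = total_value([4, 8, 12, 16])    # four gaps of length 4
clumped = total_value([1, 2, 3, 16])  # three gaps of length 1, one of length 13
```

Both schedules use the same number of wins and the same total elapsed time, yet the evenly spaced one collects more value because the concave function penalises the single long gap more than it rewards the short ones.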


Article 142

Title@2025-06-12 (4): QuXAI: Explainers for Hybrid Quantum Machine Learning Models

Title: QuXAI: Explainers for Hybrid Quantum Machine Learning Models QuXAI: Erklärer für Hybrid-Quantum-Maschinen-Lernmodelle QuXAI:混合量子机器学习模型的解释器 2505.10167v3

Authors (6): Saikat Barua, Mostafizur Rahman, Shehenaz Khaled, Md Jafor Sadek, Rafiul Islam, Shahnewaz Siddique

The emergence of hybrid quantum-classical machine learning (HQML) models opens new horizons of computational intelligence, but their fundamental complexity frequently leads to black-box behavior that undermines transparency and reliability in their application. Although XAI for quantum systems is still in its infancy, a major research gap is evident in robust global and local explainability approaches designed for HQML architectures that employ quantized feature encoding followed by classical learning. This gap is the focus of this work, which introduces QuXAI, a framework based upon Q-MEDLEY, an explainer of feature importance in these hybrid systems. Our approach entails the creation of HQML models incorporating quantum feature maps and the use of Q-MEDLEY, which combines feature-based inferences, preserves the quantum transformation stage, and visualizes the resulting attributions. Our results show that Q-MEDLEY delineates influential classical aspects in HQML models, separates their noise, and competes well against established XAI techniques in classical validation settings. Ablation studies further expose the virtues of the composite structure used in Q-MEDLEY. The implications of this work are important, as it provides a route to improve the interpretability and reliability of HQML models, thus promoting greater confidence and safer, more responsible use of quantum-enhanced AI technology. Our code and experiments are open-sourced at: https://github.com/GitsSaikat/QuXAI

混合量子古典机学(HQML)模式的出现开启了计算智能的新视野,但其基本复杂性经常导致黑盒行为,破坏其应用的透明度和可靠性。虽然量子系统XAI仍然处于初级阶段,但为使用量化特征编码的HQML结构设计的强大的全球和地方解释方法明显存在重大研究差距,这些结构采用量化特征编码,随后是古典学习。这种差距是这项工作的焦点,它引入了QuXAI(QuXAI)(基于Q-MEDLEY(Q-MEDLEY)的框架),这是解释这些混合系统重要性的一个解释器。我们的模型需要创建包含量子特征图的黑盒行为,使用Q-MEDLEY(Q-MEDAY)模型,该模型结合基于推断的特性,保存量子转换阶段和由此产生的属性的可视化分析方法。我们的结果显示,Q-MEDLE(Q)中具有影响力的经典 XAI(Q-MEDL)技术的开放技术,并与之竞争。更显著地暴露了在Q-MEDLA/Q(HLAIAY)中所使用的复合结构结构的美度的美度,因此,可以更安全地解释。在提高的模型中进行更安全性、更安全的模型。


Article 143

Title@2025-06-12 (4): Macro Graph of Experts for Billion-Scale Multi-Task Recommendation

Title: Macro Graph of Experts for Billion-Scale Multi-Task Recommendation Makrograph der Experten für Milliarden-Scale Multi-Task-Empfehlung 10亿多任务建议十亿多阶段专家宏观图 2506.10520v1

Authors (9): Hongyu Yao, Zijin Hong, Hao Chen, Yuanchen Bei, Zhiqing Li, Qijie Shen, Zuobin Ying, Huan Gong, Feiran Huang

Graph-based multi-task learning at billion-scale presents a significant challenge, as different tasks correspond to distinct billion-scale graphs. Traditional multi-task learning methods often neglect these graph structures, relying solely on individual user and item embeddings. However, disregarding graph structures overlooks substantial potential for improving performance. In this paper, we introduce the Macro Graph of Experts (MGOE) framework, the first approach capable of leveraging macro graph embeddings to capture task-specific macro features while modeling the correlations between task-specific experts. Specifically, we propose the concept of a Macro Graph Bottom, which, for the first time, enables multi-task learning models to incorporate graph information effectively. We design the Macro Prediction Tower to dynamically integrate macro knowledge across tasks. MGOE has been deployed at scale, powering multi-task learning for the homepage of a leading billion-scale recommender system. Extensive offline experiments conducted on three public benchmark datasets demonstrate its superiority over state-of-the-art multi-task learning methods, establishing MGOE as a breakthrough in multi-task graph-based recommendation. Furthermore, online A/B tests confirm the superiority of MGOE in billion-scale recommender systems.

在10亿尺度上,基于图表的多任务学习是一个重大挑战,因为不同的任务与不同的10亿尺度图表相对应。传统的多任务学习方法往往忽视这些图表结构,只依靠单个用户和项目嵌入。然而,无视图表结构忽略了改善业绩的巨大潜力。在本文中,我们引入了宏观专家图(MGOE)框架,这是能够利用宏观图嵌入来捕捉具体任务宏观特征的第一个方法,同时模拟具体任务专家之间的关联。具体地说,我们提出了宏观图底部的概念,首次使多任务学习模型能够有效地纳入图表信息。我们设计宏观预测塔,以动态地将宏观知识贯穿各项任务。MGOE已经大规模部署,为领先的10亿尺度建议系统的主页提供多任务学习动力。在三个公共基准数据集上进行的广泛离线实验表明它优于最先进的多任务学习方法,将MGOE确定为多任务图表制建议中的突破。此外,在线A/B测试确认了MGO系统在多任务图表基础上的优势。


Article 144

Title@2025-06-12 (4): A Weighted Loss Approach to Robust Federated Learning under Data Heterogeneity

Title: A Weighted Loss Approach to Robust Federated Learning under Data Heterogeneity Ein gewichtiger Verlust Ansatz für robustes Federated Learning unter Daten Heterogenität 数据异质下联邦强力学习的加权损失方法 2506.09824v2

Authors (5): Johan Erbani, Sonia Ben Mokhtar, Pierre-Edouard Portier, Elod Egyed-Zsigmond, Diana Nurbakova

Federated learning (FL) is a machine learning paradigm that enables multiple data holders to collaboratively train a machine learning model without sharing their training data with external parties. In this paradigm, workers locally update a model and share with a central server their updated gradients (or model parameters). While FL seems appealing from a privacy perspective, it opens a number of threats from a security perspective as (Byzantine) participants can contribute poisonous gradients (or model parameters) harming model convergence. Byzantine-resilient FL addresses this issue by ensuring that the training proceeds as if Byzantine participants were absent. To this end, common strategies ignore outlier gradients during model aggregation, assuming that Byzantine gradients deviate more from honest gradients than honest gradients do from each other. However, in heterogeneous settings, honest gradients may differ significantly, making it difficult to distinguish honest outliers from Byzantine ones. In this paper, we introduce the Worker Label Alignment Loss (WoLA), a weighted loss that aligns honest worker gradients despite data heterogeneity, which facilitates the identification of Byzantines’ gradients. This approach significantly outperforms state-of-the-art methods in heterogeneous settings. We provide both theoretical insights and empirical evidence of its effectiveness.

联邦学习(FL)是一种机器学习模式,它使多个数据持有者能够在不与外部各方分享培训数据的情况下合作培训机器学习模式,而无需与外部各方分享其培训数据。在这个模式中,工人在当地更新一个模型,并与中央服务器共享其更新的梯度(或模型参数)。虽然FL从隐私角度似乎吸引,但从安全角度打开了一些威胁,因为(Byzantine)参与者可以造成有毒梯度(或模型参数)损害模型趋同。Byzantine-Relishive FL(Byzantine-Relishive FL)解决这个问题,确保培训成果与Byzantine参与者不存在一样。为此,共同战略在模型汇总期间忽略了外部梯度,假设Byzantine梯度偏离的诚实梯度多于诚实梯度。然而,在多种环境里,诚实梯度的梯度可能差异很大,难以区分诚实的离差者与Byzantine的模型趋同者。在本文中,我们介绍了WoLA, 一种加权损失使诚实的工人梯度梯度与数据具有高度的梯度,这为我们提供了模型化的理论推导路。
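The abstract notes that common Byzantine-resilient strategies ignore outlier gradients during aggregation. One standard aggregator of that kind is the coordinate-wise median, sketched below on toy gradients; this is a generic robust-aggregation example, not the WoLA loss itself, and the numbers are made up for illustration.

```python
import numpy as np

def coordinate_wise_median(gradients):
    """Aggregate worker gradients by taking the median of each coordinate."""
    return np.median(np.asarray(gradients, dtype=float), axis=0)

# Three honest workers with similar gradients and one Byzantine worker.
honest = [[1.0, -1.0], [1.1, -0.9], [0.9, -1.1]]
byzantine = [[100.0, 100.0]]

robust = coordinate_wise_median(honest + byzantine)
naive = np.mean(np.asarray(honest + byzantine), axis=0)
```

The median stays close to the honest gradients while the plain mean is dragged far away by the single poisoned update. The difficulty the paper targets is that, under data heterogeneity, honest gradients themselves spread out, so an aggregator like this one has a harder time separating them from Byzantine ones.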


Article 145

Title@2025-06-12 (4): Subgraph Gaussian Embedding Contrast for Self-Supervised Graph Representation Learning

Title: Subgraph Gaussian Embedding Contrast for Self-Supervised Graph Representation Learning Subgraph Gaussian Einbettungskontrast für selbstüberwachtes Graphen-Darstellungslernen 自支持图表代表制学习的 Subgraph Gaussian 嵌入式对比对比度 2505.23529v2

Authors (3): Shifeng Xie, Aref Einizade, Jhony H. Giraldo

Graph Representation Learning (GRL) is a fundamental task in machine learning, aiming to encode high-dimensional graph-structured data into low-dimensional vectors. Self-Supervised Learning (SSL) methods are widely used in GRL because they can avoid expensive human annotation. In this work, we propose a novel Subgraph Gaussian Embedding Contrast (SubGEC) method. Our approach introduces a subgraph Gaussian embedding module, which adaptively maps subgraphs to a structured Gaussian space, ensuring the preservation of input subgraph characteristics while generating subgraphs with a controlled distribution. We then employ optimal transport distances, more precisely the Wasserstein and Gromov-Wasserstein distances, to effectively measure the similarity between subgraphs, enhancing the robustness of the contrastive learning process. Extensive experiments across multiple benchmarks demonstrate that SubGEC outperforms or presents competitive performance against state-of-the-art approaches. Our findings provide insights into the design of SSL methods for GRL, emphasizing the importance of the distribution of the generated contrastive pairs.

图形教学( GRL) 是机器学习的一项基本任务, 目的是将高方图结构数据编码为低维矢量。 自我支持学习( SSL) 方法在 GRL 中被广泛使用, 因为它们可以避免昂贵的人类批注。 在这项工作中, 我们提出一个新的Subgraph Gausian 嵌入对比( SubGEC) 方法。 我们的方法引入了子集子集成模块, 该模块将子集成成到结构化的高斯空间, 确保在生成受控分布的子集时保存输入子集特性。 然后我们使用最佳的运输距离, 更精确地说, 瓦瑟斯坦 和 Gromov- Wasserstein 的距离, 以有效测量子集之间的相似性, 增强对比性学习过程的稳健性。 跨多个基准的大规模实验表明, 血压~ 外形或显示与最新技术方法相比具有竞争力的性能。 我们的发现为 GRSL 设计 SL 方法提供了洞察 设计 方法的洞察力, , 强调生成对比配对分布的重要性 。


Article 146

Title@2025-06-12 (4): Differentially private and decentralized randomized power method

Title: Differentially private and decentralized randomized power method Differential private und dezentrale randomisierte Energiemethode 区别对待的私人和分散的随机电源方法 2411.01931v3

Authors (5): Julien Nicolas, César Sabater, Mohamed Maouche, Sonia Ben Mokhtar, Mark Coates

The randomized power method has gained significant interest due to its simplicity and efficient handling of large-scale spectral analysis and recommendation tasks. However, its application to large datasets containing personal information (e.g., web interactions, search history, personal tastes) raises critical privacy problems. This paper addresses these issues by proposing enhanced privacy-preserving variants of the method. First, we propose a variant that reduces the amount of the noise required in current techniques to achieve Differential Privacy (DP). More precisely, we refine the privacy analysis so that the Gaussian noise variance no longer grows linearly with the target rank, achieving the same DP guarantees with strictly less noise. Second, we adapt our method to a decentralized framework in which data is distributed among multiple users. The decentralized protocol strengthens privacy guarantees with no accuracy penalty and a low computational and communication overhead. Our results include the provision of tighter convergence bounds for both the centralized and decentralized versions, and an empirical comparison with previous work using real recommendation datasets.

随机电动方法由于简单和高效地处理大型光谱分析和建议任务而引起了极大的兴趣,然而,它适用于包含个人信息的大型数据集(例如网络互动、搜索历史、个人口味),引起了严重的隐私问题。本文件通过提出方法的强化隐私保护变体来解决这些问题。首先,我们提出了一个变体,减少当前技术为实现差异隐私(DP)所需噪音的数量。更准确地说,我们完善隐私分析,使高西亚噪音不再与目标级别成直线变化,在严格减少噪音的情况下实现相同的DP保障。第二,我们调整我们的方法,使之适应一个分散化的框架,在多个用户之间分配数据。分散化的协议加强了隐私保障,没有准确的罚款,计算和通信管理低。我们的成果包括提供集中和分散化版本的更紧密的连接线,以及用真实的建议数据集对先前的工作进行经验性比较。
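For orientation, a randomized power method with Gaussian noise added at each iteration can be sketched as follows. The `noise_std` parameter is a free knob in this sketch and is not calibrated to any particular differential privacy guarantee; the calibration is exactly what the paper's analysis addresses. The function name and the toy matrix are assumptions.

```python
import numpy as np

def noisy_power_method(matrix, rank, iterations, noise_std, seed=0):
    """Approximate the top-`rank` subspace of a symmetric matrix.

    Each iteration multiplies by the matrix, adds i.i.d. Gaussian noise,
    and re-orthonormalises the result with a QR decomposition.
    """
    rng = np.random.default_rng(seed)
    n = matrix.shape[0]
    basis, _ = np.linalg.qr(rng.standard_normal((n, rank)))
    for _ in range(iterations):
        product = matrix @ basis + noise_std * rng.standard_normal((n, rank))
        basis, _ = np.linalg.qr(product)
    return basis

# Toy matrix with a clearly dominant first eigenvector.
toy = np.diag([10.0, 1.0, 0.1])
basis = noisy_power_method(toy, rank=1, iterations=50, noise_std=0.0)
```

With `noise_std=0.0` this reduces to the ordinary randomized power method; increasing the noise trades accuracy of the recovered subspace for privacy, which is why reducing the required noise matters.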


Article 147

Title@2025-06-12 (4): Large Scale Multi-Task Bayesian Optimization with Large Language Models

Title: Large Scale Multi-Task Bayesian Optimization with Large Language Models Große Multi-Task Bayesian Optimierung mit großen Sprachmodellen 使用大语言模式的大型多语类多语类多语类多语类最佳优化 2503.08131v2

Authors (10): Yimeng Zeng, Natalie Maus, Haydn Thomas Jones, Jeffrey Tao, Fangping Wan, Marcelo Der Torossian Torres, Cesar de la Fuente-Nunez, Ryan Marcus, Osbert Bastani, Jacob R. Gardner

In multi-task Bayesian optimization, the goal is to leverage experience from optimizing existing tasks to improve the efficiency of optimizing new ones. While approaches using multi-task Gaussian processes or deep kernel transfer exist, the performance improvement is marginal when scaling beyond a moderate number of tasks. We introduce a novel approach leveraging large language models (LLMs) to learn from, and improve upon, previous optimization trajectories, scaling to approximately 1500 distinct tasks. Specifically, we propose a feedback loop in which an LLM is fine-tuned on the high quality solutions to specific tasks found by Bayesian optimization (BO). This LLM is then used to generate initialization points for future BO searches for new tasks. The trajectories of these new searches provide additional training data for fine-tuning the LLM, completing the loop. We evaluate our method on two distinct domains: database query optimization and antimicrobial peptide design. Results demonstrate that our approach creates a positive feedback loop, where the LLM’s generated initializations gradually improve, leading to better optimization performance. As this feedback loop continues, we find that the LLM is eventually able to generate solutions to new tasks in just a few shots that are better than the solutions produced “from scratch” by Bayesian optimization while simultaneously requiring significantly fewer oracle calls.

在多功能贝叶斯优化中,目标是利用现有任务的经验,优化现有任务,提高优化新任务的效率。 虽然使用多任务高斯进程或深内核传输的方法存在,但绩效改进在超出中度任务范围以外时是微不足道的。 我们引入了一种创新方法,利用大语言模型(LLMs)从以往的优化轨迹中学习并改进,缩小到大约1500项不同的任务。 具体地说,我们提议了一个反馈循环,让一个LLM对巴耶斯优化(BO)发现的具体任务的高质量解决方案进行微调。这个LM随后被用来为未来BO搜索新任务生成初始化点。这些新搜索的轨迹提供了额外的培训数据,用于微调LM,完成这一循环。我们在两个不同的领域评估了我们的方法:数据库查询优化和抗微缩胶条纹设计。结果显示,我们的方法创造了积极的反馈循环,在这个循环中,LLM的初始化逐渐改进,导致更好的优化绩效。 由于这一反馈循环继续存在,我们发现LM公司最终能够产生比新的解决方案的更好。


Article 148

Title@2025-06-12 (4): Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation

Title: Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation Qualität über Menge: Steigerung der Dateneffizienz durch ensembled multimodale Daten-Curation 质量高于数量:通过集合多模式数据曲线提高数据效率 2502.08211v2

Authors (7): Jinda Xu, Yuhao Song, Daming Wang, Weiwei Zhao, Minghua Chen, Kangliang Chen, Qinya Li

In an era overwhelmed by vast amounts of data, the effective curation of web-crawl datasets is essential for optimizing model performance. This paper tackles the challenges associated with the unstructured and heterogeneous nature of such datasets. Traditional heuristic curation methods often inadequately capture complex features, resulting in biases and the exclusion of relevant data. We introduce an advanced, learning-driven approach, Ensemble Curation Of DAta ThroUgh Multimodal Operators (EcoDatum), incorporating a novel quality-guided deduplication method to ensure balanced feature distributions. EcoDatum strategically integrates various unimodal and multimodal data curation operators within a weak supervision ensemble framework, utilizing automated optimization to score each data point effectively. EcoDatum, which significantly improves the data curation quality and efficiency, outperforms existing state-of-the-art (SOTA) techniques, ranked 1st on the DataComp leaderboard, with an average performance score of 0.182 across 38 diverse evaluation datasets. This represents a 28% improvement over the DataComp baseline method, demonstrating its effectiveness in improving dataset curation and model training efficiency.

在一个被大量数据所淹没的时代,有效校正网络拖网数据集对于优化模型性能至关重要。本文件应对与这类数据集的无结构性和多样性性质有关的挑战。传统的超常调解析方法往往不能充分捕捉复杂的特征,导致偏向和排斥相关数据。我们引入了一种先进的、由学习驱动的方法,即Data ThroUgh多式操作员(EcoDatum)的集合曲线,其中包括一种新型的质量引导解析方法,以确保均衡地分布特征。EcoDatum在战略上将各种单式和多式联运数据调控操作员纳入一个薄弱的监管整体框架,利用自动化优化来有效评分每个数据点。EcoDatum大大改进了数据调控的质量和效率,超越了现有的状态(SOTA)技术,在数据组合领导板上排名第1位,在38个不同的评估数据集中平均业绩评分为0.182。这代表了数据比较基准方法的28%的改进,显示了其在改进数据整理模型和培训效率方面的有效性。


Article 149

Title@2025-06-12 (4): A Crack in the Bark: Leveraging Public Knowledge to Remove Tree-Ring Watermarks

Title: A Crack in the Bark: Leveraging Public Knowledge to Remove Tree-Ring Watermarks Ein Riss in der Bark: Benutzt öffentliches Wissen, um Baum-Ring-Wasserzeichen zu entfernen 瓶中的裂缝:利用公共知识来清除树状水印 2506.10502v1

Authors (2): Junhua Lin, Marc Juarez

We present a novel attack specifically designed against Tree-Ring, a watermarking technique for diffusion models known for its high imperceptibility and robustness against removal attacks. Unlike previous removal attacks, which rely on strong assumptions about attacker capabilities, our attack only requires access to the variational autoencoder that was used to train the target diffusion model, a component that is often publicly available. By leveraging this variational autoencoder, the attacker can approximate the model’s intermediate latent space, enabling more effective surrogate-based attacks. Our evaluation shows that this approach leads to a dramatic reduction in the AUC of the Tree-Ring detector’s ROC and PR curves, decreasing from 0.993 to 0.153 and from 0.994 to 0.385, respectively, while maintaining high image quality. Notably, our attacks outperform existing methods that assume full access to the diffusion model. These findings highlight the risk of reusing public autoencoders to train diffusion models, a threat not considered by current industry practices. Furthermore, the results suggest that the Tree-Ring detector’s precision, a metric that has been overlooked by previous evaluations, falls short of the requirements for real-world deployment.

我们展示了一种新颖的攻击,专门针对树林,一种以其高度不可感知性和强力来对付清除攻击的传播模型。与以往的清除攻击相比,我们的攻击仅仅需要使用用于训练目标扩散模型的变异自动编码器,这是经常公开使用的部件。通过利用这种变异自动编码器,攻击者可以接近模型的中间潜伏空间,从而能够更有效地进行代孕攻击。我们的评估表明,这一方法导致AUC的植树探测器ROC和PR曲线大幅下降,分别从0.993降至0.13和0.994降至0.385,同时保持高图像质量。值得注意的是,我们的攻击超出了现有方法,无法完全进入扩散模型。这些调查结果突出表明了重新使用公共自动编码器来训练扩散模型的风险 – – 目前工业实践没有考虑到这一威胁。此外,结果显示,树林探测器的精确度是以前评估所忽略的、实际世界部署要求的短处。


Article 150

Title@2025-06-12 (4): BugGen: A Self-Correcting Multi-Agent LLM Pipeline for Realistic RTL Bug Synthesis

Title: BugGen: A Self-Correcting Multi-Agent LLM Pipeline for Realistic RTL Bug Synthesis BugGen: Eine selbstkorrigierende LLM-Pipeline für eine realistische RTL-Bug-Synthese BugGen: 现实的 RTL 错误合成自更正多 Agency LLM 管道 2506.10501v1

Authors (7): Surya Jasper, Minh Luu, Evan Pan, Aakash Tyagi, Michael Quinn, Jiang Hu, David Kebo Houngninou

Hardware complexity continues to strain verification resources, motivating the adoption of machine learning (ML) methods to improve debug efficiency. However, ML-assisted debugging critically depends on diverse and scalable bug datasets, which existing manual or automated bug insertion methods fail to reliably produce. We introduce BugGen, a first of its kind, fully autonomous, multi-agent pipeline leveraging Large Language Models (LLMs) to systematically generate, insert, and validate realistic functional bugs in RTL. BugGen partitions modules, selects mutation targets via a closed-loop agentic architecture, and employs iterative refinement and rollback mechanisms to ensure syntactic correctness and functional detectability. Evaluated across five OpenTitan IP blocks, BugGen produced 500 unique bugs with 94% functional accuracy and achieved a throughput of 17.7 validated bugs per hour, over five times faster than typical manual expert insertion. Additionally, BugGen identified 104 previously undetected bugs in OpenTitan regressions, highlighting its utility in exposing verification coverage gaps. Compared against Certitude, BugGen demonstrated over twice the syntactic accuracy, deeper exposure of testbench blind spots, and more functionally meaningful and complex bug scenarios. Furthermore, when these BugGen-generated datasets were employed to train ML-based failure triage models, we achieved high classification accuracy (88.1%-93.2%) across different IP blocks, confirming the practical utility and realism of generated bugs. BugGen thus provides a scalable solution for generating high-quality bug datasets, significantly enhancing verification efficiency and ML-assisted debugging.

硬件复杂程度继续使核查资源紧张,促使采用机器学习(ML)方法来提高调试效率。然而,ML协助的调试关键取决于多种且可缩放的错误数据集,而现有的人工或自动错误插入方法无法可靠地生成这些数据集。我们引入了BugGen,这是同类的首个完全自主的多剂管道管道,利用大语言模型(LLLMS)系统生成、插入和验证了RTL. BugGen 分区模块中切合实际的功能错误,通过闭路代理结构选择了突变目标,并使用了迭代性改进和回滚回机制,以确保同步性效用正确性和功能可探测性。在五个 OpenTitan IP IP区中,BugGen 生成了500个独特的错误,其功能准确性为94%,并实现了17.7个经验证的错误,比典型的手动专家插入速度快5倍。此外,BugG在 OpenTeral-Timan Registration中发现了104个先前未检测过的错误,突出的错误在揭示核查范围上产生漏洞。 对比,BBBBBG 80,BBBen 和滚化的精确度的精确度为两倍的两倍。当我们使用了双向了两次地展示的精确度提高了的精确度测测测测测测测测测测算时,这些测测测测测测测测测测测测测测算。


Article 151

Title@2025-06-12 (4): Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG

Title: Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG Lag nicht, RAG: Training-freie Adversarial Detection mit RAG 不要拉格,RAG:使用RAG进行无训练的反向探测 2504.04858v2

Authors (4): Roie Kazoom, Raz Lapid, Moshe Sipper, Ofer Hadar

Adversarial patch attacks pose a major threat to vision systems by embedding localized perturbations that mislead deep models. Traditional defense methods often require retraining or fine-tuning, making them impractical for real-world deployment. We propose a training-free Visual Retrieval-Augmented Generation (VRAG) framework that integrates Vision-Language Models (VLMs) for adversarial patch detection. By retrieving visually similar patches and images that resemble stored attacks in a continuously expanding database, VRAG performs generative reasoning to identify diverse attack types, all without additional training or fine-tuning. We extensively evaluate open-source large-scale VLMs, including Qwen-VL-Plus, Qwen2.5-VL-72B, and UI-TARS-72B-DPO, alongside Gemini-2.0, a closed-source model. Notably, the open-source UI-TARS-72B-DPO model achieves up to 95 percent classification accuracy, setting a new state-of-the-art for open-source adversarial patch detection. Gemini-2.0 attains the highest overall accuracy, 98 percent, but remains closed-source. Experimental results demonstrate VRAG’s effectiveness in identifying a variety of adversarial patches with minimal human annotation, paving the way for robust, practical defenses against evolving adversarial patch attacks.

对抗性补丁攻击通过嵌入误导深度模型的局部扰动,对视觉系统构成重大威胁。传统防御方法往往需要重新训练或微调,因而难以在现实世界中部署。我们提出了一个无需训练的视觉检索增强生成(VRAG)框架,该框架结合视觉-语言模型(VLM)进行对抗性补丁检测。VRAG在一个不断扩充的数据库中检索与已存储攻击相似的补丁和图像,并通过生成式推理识别多种攻击类型,全程无需额外训练或微调。我们广泛评估了开源大型VLM,包括Qwen-VL-Plus、Qwen2.5-VL-72B和UI-TARS-72B-DPO,以及闭源模型Gemini-2.0。值得注意的是,开源的UI-TARS-72B-DPO模型达到了最高95%的分类准确率,刷新了开源对抗性补丁检测的最佳水平。Gemini-2.0的总体准确率最高,达到98%,但仍为闭源模型。实验结果表明,VRAG能够在极少人工标注的情况下有效识别多种对抗性补丁,为抵御不断演变的对抗性补丁攻击提供了稳健而实用的防御途径。
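The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes that patches are already represented as embedding vectors and that similarity is measured by cosine similarity; the abstract does not state which embedding or metric VRAG uses, so the function names, labels, and vectors below are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, database, top_k=2):
    """Return the labels of the top_k stored entries most similar to the query.

    `database` is a list of (label, embedding) pairs; both the labels and the
    embeddings are illustrative placeholders, not data from the paper.
    """
    ranked = sorted(database, key=lambda item: cosine(query, item[1]), reverse=True)
    return [label for label, _ in ranked[:top_k]]


# Hypothetical stored attack examples and one clean example.
database = [
    ("patch_a", [0.9, 0.1]),
    ("clean", [0.0, 1.0]),
    ("patch_b", [0.7, 0.7]),
]
neighbours = retrieve([1.0, 0.0], database, top_k=2)
```

In a pipeline of this kind, the retrieved neighbours would be passed to the VLM as context for its decision; that step is omitted here.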


Article 152

Title@2025-06-12 (4): SHORE: A Long-term User Lifetime Value Prediction Model in Digital Games

Title: SHORE: A Long-term User Lifetime Value Prediction Model in Digital Games SHORE: Ein langfristiges User Lifetime Value Prediction Model in digitalen Spielen SHORE: 数字游戏中长期用户终身价值预测模型 2506.10487v1

Authors (6): Shuaiqi Sun, Congde Yuan, Haoqiang Yang, Mengzhuo Guo, Guiying Wei, Jiangbo Tian

In digital gaming, long-term user lifetime value (LTV) prediction is essential for monetization strategy, yet presents major challenges due to delayed payment behavior, sparse early user data, and the presence of high-value outliers. While existing models typically rely on either short-cycle observations or strong distributional assumptions, such approaches often underestimate long-term value or suffer from poor robustness. To address these issues, we propose SHort-cycle auxiliary with Order-preserving REgression (SHORE), a novel LTV prediction framework that integrates short-horizon predictions (e.g., LTV-15 and LTV-30) as auxiliary tasks to enhance long-cycle targets (e.g., LTV-60). SHORE also introduces a hybrid loss function combining order-preserving multi-class classification and a dynamic Huber loss to mitigate the influence of zero-inflation and outlier payment behavior. Extensive offline and online experiments on real-world datasets demonstrate that SHORE significantly outperforms existing baselines, achieving a 47.91% relative reduction in prediction error in online deployment. These results highlight SHORE’s practical effectiveness and robustness in industrial-scale LTV prediction for digital games.

在数字赌博中,长期用户寿命值(LTV)预测对于货币化战略至关重要,但是由于延迟支付行为、早期用户数据稀少以及存在高值外值,带来了重大挑战。虽然现有模型通常依赖短周期观测或强有力的分配假设,但这类方法往往低估长期价值或缺乏稳健性。为了解决这些问题,我们提议采用“Shoort周期辅助”和“命令保护回流(SHORE)”,这是一个新的LTV预测框架,将短视预测(例如LTV-15和LTV-30)作为辅助任务,以加强长周期目标(例如LTV-60),从而构成重大挑战。SHORE还引入了混合损失功能,结合了维持订单的多级分类和动态枢纽损失,以减轻零通胀和外部支付行为的影响。在现实世界数据集上的广泛的离线和在线实验表明,SHORE大大超越了现有基线(例如LTVE),实现了在线部署中预测误差47.91%的相对减少。这些结果突出表明了SHORTV的实用性预测。
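To illustrate why a Huber-type loss limits the influence of outlier payments, here is a minimal sketch of the standard Huber loss with a fixed threshold `delta`. The abstract does not define its "dynamic" variant or the order-preserving classification term, so neither is reproduced here; the function names are illustrative.

```python
def huber_loss(pred, target, delta=1.0):
    """Standard Huber loss: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def mean_huber(preds, targets, delta=1.0):
    """Average Huber loss over paired predictions and targets."""
    return sum(huber_loss(p, t, delta) for p, t in zip(preds, targets)) / len(preds)


# A single extreme payer: the squared-error penalty grows quadratically,
# while the Huber penalty grows only linearly with the error.
squared_penalty = 0.5 * (100.0 - 0.0) ** 2
huber_penalty = huber_loss(100.0, 0.0, delta=1.0)
```

Because the gradient of the linear branch is bounded by `delta`, a few high-value users cannot dominate the update in the way they would under a squared-error loss.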


Article 153

Title@2025-06-12 (4): A Conjoint Graph Representation Learning Framework for Hypertension Comorbidity Risk Prediction

Title: A Conjoint Graph Representation Learning Framework for Hypertension Comorbidity Risk Prediction Ein konjoint Graph Representation Learning Framework für Hypertonie Comorbidity Risk Prediction 超高频风险预测联合图表代表学习框架 2505.05094v2

Authors (3): Leming Zhou, Zuo Wang, Zhixuan Duan

The comorbidities of hypertension impose a heavy burden on patients and society. Early identification is necessary for prompt intervention, but it remains a challenging task. This study addresses this challenge by combining joint graph learning with network analysis. We develop a Conjoint Graph Representation Learning (CGRL) framework that: a) constructs two networks based on disease coding, the patient network and the disease difference network, and generates three comorbidity network features from the basic difference network to capture the potential relationship between comorbidities and risk diseases; b) incorporates computational structure intervention and feature representation learning to predict the risks of diabetes and coronary heart disease in patients; and c) analyzes comorbidity patterns and explores pathways of disease progression, which may reveal the pathogenesis of diabetes and coronary heart disease. The results show that the network features extracted from the difference network are important, and that the proposed framework predicts more accurately than other strong models.

高血压的发病率给病人和社会带来沉重的负担。早期诊断对于及时干预来说是必要的,但它仍然是一项艰巨的任务。本研究旨在通过将联合图表学习与网络分析相结合来应对这一挑战。我们受这一发现驱动,开发了一个共同图表代表学习框架:(a) 建立基于疾病编码的两个网络,包括病人网络和疾病差异网络。三个发病率网络功能基于基本差异网络产生,以捕捉并发症和风险疾病之间的潜在关系;(b) 纳入计算结构干预和学习特征代表,开发了高血压中心,以预测糖尿病和慢性心脏病在病人中的风险;以及(c) 分析并发症模式并探索疾病演变的途径、糖尿病和冠心病病病病的病理病理病理发。结果显示,基于差异网络提取的网络特征很重要,我们提出的框架比其他强型模型更准确地预测。


Article 154

Title@2025-06-12 (4): Zero-Shot Offline Imitation Learning via Optimal Transport

Title: Zero-Shot Offline Imitation Learning via Optimal Transport Zero-Shot Offline-Imitation Lernen über optimalen Transport 通过最佳交通方式进行零热离线模拟学习 2410.08751v3

Authors (5): Thomas Rupf, Marco Bagatella, Nico Gürtler, Jonas Frey, Georg Martius

Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector, and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent’s immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks. The code is available at https://github.com/martius-lab/zilot.

零光模拟学习算法有望从测试时的单一演示中产生复制无形行为的希望。 现有的实用方法将专家演示视为一个目标序列, 能够与高级目标选择器进行模仿, 以及一个低级目标限制政策。 但是, 这个框架可能会受到近似行为的影响: 代理人为实现个别目标而立即采取的行动可能会破坏长期目标。 我们引入了一种新的方法, 通过直接优化与模仿学习所固有的占用匹配的目标来缓解这一问题。 我们提议将一个有目标条件的价值函数提升到占居之间的距离, 而后者又可以通过一个学习的世界模型进行近似。 由此产生的方法可以从离线、 亚最佳的数据中学习, 并且能够进行非显微、 零光的模仿, 正如我们在复杂、 连续的基准中所显示的那样。 该代码可以在 https://github.com/ martius-lab/zilot 上查阅 。


Article 155

Title@2025-06-12 (4): Engagement-Driven Content Generation with Large Language Models

Title: Engagement-Driven Content Generation with Large Language Models Engagement-getriebene Content-Generierung mit großen Sprachmodellen 具有大语言模式的 “ 互动互动 “ 式内容生成 2411.13187v5

Authors (5): Erica Coppolillo, Federico Cinus, Marco Minici, Francesco Bonchi, Giuseppe Manco

Large Language Models (LLMs) demonstrate significant persuasive capabilities in one-on-one interactions, but their influence within social networks, where interconnected users and complex opinion dynamics pose unique challenges, remains underexplored. This paper addresses the research question: Can LLMs generate meaningful content that maximizes user engagement on social networks? To answer this, we propose a pipeline using reinforcement learning with simulated feedback, where the network’s response to LLM-generated content (i.e., the reward) is simulated through a formal engagement model. This approach bypasses the temporal cost and complexity of live experiments, enabling an efficient feedback loop between the LLM and the network under study. It also allows control over endogenous factors such as the LLM’s position within the social network and the distribution of opinions on a given topic. Our approach is adaptive to the opinion distribution of the underlying network and agnostic to the specifics of the engagement model, which is embedded as a plug-and-play component. Such flexibility makes it suitable for more complex engagement tasks and interventions in computational social science. Using our framework, we analyze the performance of LLMs in generating social engagement under different conditions, showcasing their full potential in this task. The experimental code is publicly available at https://github.com/mminici/Engagement-Driven-Content-Generation.

大型语言模型(LLMS)在一对一互动中展示了相当的说服能力,但在社交网络中,用户相互联系和复杂的观点动态构成独特的挑战,其影响仍然未得到充分探讨。本文论述研究问题: =emph{Can LLMS产生有意义的内容,使用户在社交网络上的参与最大化?}为此,我们提议利用模拟反馈的强化学习管道,让网络对LLM产生的内容(即奖赏)的反应通过正式参与模式模拟。这种方法绕过现场实验的时间成本和复杂性,使得LLM和正在研究的网络之间能够有一个高效的反馈循环。它还能够控制本地因素,如LLMM在社交网络中的位置和对某个特定主题的意见的传播。我们的方法是适应基础网络的舆论分布,对参与模式的具体内容(即奖赏)的认知,该模式被嵌入为“Con-lay”部分。这种灵活性使得它适合于更复杂的参与任务和计算社会科学中的干预。我们用这个框架来分析LLMMMMS/LG公司在不同的条件下现有的实验性任务。


Article 156

Title@2025-06-12 (4): Prediction of steady states in a marine ecosystem model by a machine learning technique

Title: Prediction of steady states in a marine ecosystem model by a machine learning technique Vorhersage von stetigen Zuständen in einem marinen Ökosystemmodell durch eine Maschinelle Lerntechnik 通过机械学习技术预测海洋生态系统模型中稳定的状态 2506.10475v1

Authors (2): Sarker Miraz Mahfuz, Thomas Slawig

We used precomputed steady states obtained by a spin-up for a global marine ecosystem model as training data to build a mapping from the small number of biogeochemical model parameters onto the three-dimensional converged steady annual cycle. The mapping was performed by a conditional variational autoencoder (CVAE) with mass correction. Applied to test data, the prediction obtained by the CVAE already gives a reasonably good approximation of the steady states obtained by a regular spin-up. However, the predictions do not reach the same level of annual periodicity as those obtained in the original spin-up data. Thus, we took the predictions as initial values for a spin-up. We could show that the number of necessary iterations, corresponding to model years, to reach a prescribed stopping criterion in the spin-up could be significantly reduced compared to the use of the originally uniform, constant initial value. The amount of reduction depends on the applied stopping criterion, which measures the periodicity of the solution. The savings in needed iterations and, thus, computing time for the spin-up range from 50 to 95%, depending on the stopping criterion for the spin-up. We compared these results with the use of the mean of the training data as an initial value. We found that this also accelerates the spin-up, but only by a much lower factor.

我们用一个全球海洋生态系统模型的附带值获得的预测稳定状态作为培训数据,从少量生物地球化学模型参数到三维趋同的稳定的年周期进行绘图。该绘图由一个有条件的变异自动coder(CVAE)进行,并进行质量校正。我们应用了测试数据,显示CVAE获得的预测已经对通过定期旋转获得的稳定状态提供了合理的良好近似值。然而,预测并不达到与原始旋转数据中获得的周期值相同的年度周期水平。因此,我们将这些预测作为递增的初步值。我们可以显示,必要的迭代数(与模型年相对应)与质量校正校准相符,与最初的统一、不变的初始值相比,可能大大降低。降低的幅度取决于应用的停止标准,衡量解决办法的周期性。需要的年周期性节省量,因此,旋转周期的时间从50到95°不等,这取决于循环值的停止值,与模型年相比,我们也可以表明,必要的迭代数的数量可以达到一定的停止标准,与模型值相比,与最初的统一、不变的初始值相比,我们用这个加速值比较了这些系数的结果。我们还比较了这一数值。
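The effect of a better initial value on the number of spin-up iterations can be illustrated with a toy fixed-point iteration. This is only a sketch: `toy_model` is a hypothetical scalar contraction standing in for one model year, and the stopping criterion (difference between successive iterates) is an assumed stand-in for the periodicity measure used in the paper.

```python
def spin_up(step, x0, tol=1e-8, max_iter=100_000):
    """Iterate x <- step(x) until successive iterates differ by less than tol.

    Returns the final iterate and the number of iterations used.
    """
    x = x0
    for n in range(1, max_iter + 1):
        x_new = step(x)
        if abs(x_new - x) < tol:
            return x_new, n
        x = x_new
    raise RuntimeError("spin-up did not converge")


def toy_model(x):
    # Hypothetical contraction with fixed point 10.0 (one "model year").
    return 0.9 * x + 1.0


# Constant initial value versus an initial value already close to the steady state.
steady_constant, iters_constant = spin_up(toy_model, x0=0.0)
steady_predicted, iters_predicted = spin_up(toy_model, x0=9.9)
```

Both runs reach the same steady state, but the run started near the fixed point needs fewer iterations, which mirrors the role the CVAE prediction plays as an initial value.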


Article 157

Title@2025-06-12 (4): PASCO (PArallel Structured COarsening): an overlay to speed up graph clustering algorithms

Title: PASCO (PArallel Structured COarsening): an overlay to speed up graph clustering algorithms PASCO (PArallel Structured COarsening): ein Overlay, um Graph Clustering Algorithmen zu beschleunigen PASCO(PARALLEL 结构化COseting):加速图形组合算法的重叠 2412.13592v2

Authors (7): Etienne Lasalle, Rémi Vaudaine, Titouan Vayer, Pierre Borgnat, Rémi Gribonval, Paulo Gonçalves, Màrton Karsai

Clustering the nodes of a graph is a cornerstone of graph analysis and has been extensively studied. However, some popular methods are not suitable for very large graphs: e.g., spectral clustering requires the computation of the spectral decomposition of the Laplacian matrix, which is not applicable for large graphs with a large number of communities. This work introduces PASCO, an overlay that accelerates clustering algorithms. Our method consists of three steps: (1) we compute several independent small graphs representing the input graph by applying an efficient and structure-preserving coarsening algorithm; (2) a clustering algorithm is run in parallel on each small graph, providing several partitions of the initial graph; (3) these partitions are aligned and combined with an optimal transport method to output the final partition. The PASCO framework is based on two key contributions: a novel global algorithm structure designed to enable parallelization and a fast, empirically validated graph coarsening algorithm that preserves structural properties. We demonstrate the strong performance of PASCO in terms of computational efficiency, structural preservation, and output partition quality, evaluated on both synthetic and real-world graph datasets.

将图形的节点分组为图表分析的基石,并且已经进行了广泛的研究。但是,一些流行的方法不适合非常大的图表:例如,光谱集群要求计算拉普拉西亚矩阵的光谱分解,而拉普拉西亚矩阵不适用于有大量社区的大图。这项工作介绍了PASCO, 这是一种加速组合算法的叠加。我们的方法包括三个步骤:1-我们通过应用高效和结构保存的粗略算法,计算代表输入图的几个独立的小图。2-A组算法是平行在每一个小图上运行的,并提供了初始图的若干分区。3-这些分区与输出最终分区的最佳运输方法一致并结合。PASCO框架基于两种关键贡献:一种新的全球算法结构,旨在促成平行化,一种经经验验证的快速图表分解算法,以保存结构特性。我们从计算效率、结构保存和产出分区质量的角度展示了1个PASCO在合成和现实世界图表数据集上的巨大表现。


Article 158

Title@2025-06-12 (4): Glimpse: Generalized Locality for Scalable and Robust CT

Title: Glimpse: Generalized Locality for Scalable and Robust CT Glimpse: Generalisierte Lokalität für skalierbare und robuste CT Glumpse: 可缩放和强力CT的通用位置 2401.00816v3

Authors (4): AmirEhsan Khorashadizadeh, Valentin Debarnot, Tianlin Liu, Ivan Dokmanić

Deep learning has become the state-of-the-art approach to medical tomographic imaging. A common approach is to feed the result of a simple inversion, for example the backprojection, to a multiscale convolutional neural network (CNN) which computes the final reconstruction. Despite good results on in-distribution test data, this often results in overfitting certain large-scale structures and poor generalization on out-of-distribution (OOD) samples. Moreover, the memory and computational complexity of multiscale CNNs scale unfavorably with image resolution, making them impractical for application at realistic clinical resolutions. In this paper, we introduce Glimpse, a local coordinate-based neural network for computed tomography which reconstructs a pixel value by processing only the measurements associated with the neighborhood of the pixel. Glimpse significantly outperforms successful CNNs on OOD samples, while achieving comparable or better performance on in-distribution test data and maintaining a memory footprint almost independent of image resolution; 5GB memory suffices to train on 1024x1024 images which is orders of magnitude less than CNNs. Glimpse is fully differentiable and can be used plug-and-play in arbitrary deep learning architectures, enabling feats such as correcting miscalibrated projection orientations. Our implementation and Google Colab demo can be accessed at https://github.com/swing-research/Glimpse.

深度学习已成为医学断层成像的最先进方法。常见的做法是将简单反演(例如反投影)的结果输入多尺度卷积神经网络(CNN),由其计算最终重建结果。尽管在分布内测试数据上效果良好,这种做法往往会过拟合某些大尺度结构,并且在分布外(OOD)样本上泛化能力较差。此外,多尺度CNN的内存占用和计算复杂度随图像分辨率增长过快,难以在真实的临床分辨率下应用。本文提出Glimpse,一种用于计算机断层成像的基于局部坐标的神经网络,它仅处理与像素邻域相关的测量数据来重建该像素的值。Glimpse在OOD样本上显著优于成功的CNN,同时在分布内测试数据上取得相当或更好的性能,并且内存占用几乎与图像分辨率无关;在1024x1024图像上训练只需5GB内存,比CNN低几个数量级。Glimpse完全可微,可以即插即用地嵌入任意深度学习架构,从而实现校正未校准的投影方向等功能。我们的实现和Google Colab演示可在 https://github.com/swing-research/Glimpse 获取。


Article 159

Title@2025-06-12 (4): Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts

Title: Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts Auf dem Weg zur robusten multimodalen Emotionserkennung unter fehlenden Modalitäten und Verteilungsverschiebungen 争取在缺失模式和分销转移模式下强有力地承认多模式情感 2506.10452v1

Authors (5): Guowei Zhong, Ruohong Huan, Mingzhen Wu, Ronghua Liang, Peng Chen

Recent advancements in Multimodal Emotion Recognition (MER) face challenges in addressing both modality missing and Out-Of-Distribution (OOD) data simultaneously. Existing methods often rely on specific models or introduce excessive parameters, which limits their practicality. To address these issues, we propose a novel robust MER framework, Causal Inference Distiller (CIDer), and introduce a new task, Random Modality Feature Missing (RMFM), to generalize the definition of modality missing. CIDer integrates two key components: a Model-Specific Self-Distillation (MSSD) module and a Model-Agnostic Causal Inference (MACI) module. MSSD enhances robustness under the RMFM task through a weight-sharing self-distillation approach applied across low-level features, attention maps, and high-level representations. Additionally, a Word-level Self-aligned Attention Module (WSAM) reduces computational complexity, while a Multimodal Composite Transformer (MCT) facilitates efficient multimodal fusion. To tackle OOD challenges, MACI employs a tailored causal graph to mitigate label and language biases using a Multimodal Causal Module (MCM) and fine-grained counterfactual texts. Notably, MACI can independently enhance OOD generalization with minimal additional parameters. Furthermore, we also introduce the new repartitioned MER OOD datasets. Experimental results demonstrate that CIDer achieves robust performance in both RMFM and OOD scenarios, with fewer parameters and faster training compared to state-of-the-art methods. The implementation of this work is publicly accessible at https://github.com/gw-zhong/CIDer.

多模态情感识别(MER)的最新进展在同时应对模态缺失和分布外(OOD)数据方面仍面临挑战。现有方法往往依赖特定模型或引入过多参数,从而限制了其实用性。为了解决这些问题,我们提出了一个新的稳健MER框架,即因果推断蒸馏器(Causal Inference Distiller,CIDer),并引入一项新任务,即随机模态特征缺失(RMFM),以推广模态缺失的定义。CIDer包含两个关键组件:模型特定自蒸馏(MSSD)模块和模型无关因果推断(MACI)模块。MSSD通过在低层特征、注意力图和高层表示上应用权重共享的自蒸馏方法,增强RMFM任务下的稳健性。此外,词级自对齐注意力模块(WSAM)降低了计算复杂度,多模态复合Transformer(MCT)则促进了高效的多模态融合。为应对OOD挑战,MACI采用定制的因果图,利用多模态因果模块(MCM)和细粒度反事实文本来减轻标签偏差和语言偏差。值得注意的是,MACI只需极少的额外参数即可独立增强OOD泛化能力。此外,我们还引入了重新划分的MER OOD数据集。实验结果表明,与最先进的方法相比,CIDer在RMFM和OOD场景中均取得了稳健的性能,且参数更少、训练更快。本工作的实现已公开于 https://github.com/gw-zhong/CIDer。


Article 160

Title@2025-06-12 (4): Extending AALpy with Passive Learning: A Generalized State-Merging Approach

Title: Extending AALpy with Passive Learning: A Generalized State-Merging Approach AALpy mit passivem Lernen erweitern: Ein generalisierter Staats-Merging-Ansatz 扩大被动学习的AALPY:普遍化的国家-老龄化办法 2506.06333v2

Authors (2): Benjamin von Berg, Bernhard K. Aichernig

AALpy is a well-established open-source automata learning library written in Python with a focus on active learning of systems with IO behavior. It provides a wide range of state-of-the-art algorithms for different automaton types ranging from fully deterministic to probabilistic automata. In this work, we present the recent addition of a generalized implementation of an important method from the domain of passive automata learning: state-merging in the red-blue framework. Using a common internal representation for different automaton types allows for a general and highly configurable implementation of the red-blue framework. We describe how to define and execute state-merging algorithms using AALpy, which reduces the implementation effort for state-merging algorithms mainly to the definition of compatibility criteria and scoring. This aids the implementation of both existing and novel algorithms. In particular, defining some existing state-merging algorithms from the literature with AALpy only takes a few lines of code.

AALpy 是一个以 Python 写成的成熟的开放源码自动磁体学习图书馆, 重点是积极学习具有 IO 行为的系统。 它为从完全确定性到概率性不同类型的不同自动磁体提供了一系列最先进的算法, 从完全确定性到概率性自动磁体。 在这项工作中, 我们介绍了最近从被动自动磁体学习领域普遍采用重要方法的附加情况: 红蓝色框架中的国家合并。 使用不同的自动磁体类型的共同内部代表可以普遍和高度可配置地实施红蓝色框架。 我们描述如何定义和执行使用 AALPy 的州合并算法, 从而减少国家合并算法的实施努力, 主要是确定兼容性标准和评分。 这有利于实施现有和新颖的算法。 特别是, 从文献中用 AALpy 来定义某些现有的州合并算法只需要几行代码。


Article 161

Title@2025-06-12 (4): MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices

Title: MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices MNN-LLM: Eine generische Inferenz-Engine für schnelle großsprachliche Modellbereitstellung auf mobilen Geräten MNN-LLM:移动设备快速大语言模型部署通用推断引擎 2506.10443v1

Authors (7): Zhaode Wang, Jingbang Yang, Xinyu Qian, Shiwen Xing, Xiaotang Jiang, Chengfei Lv, Shengyu Zhang

Large language models (LLMs) have demonstrated exceptional performance across a variety of tasks. However, their substantial scale leads to significant computational resource consumption during inference, resulting in high costs. Consequently, edge device inference presents a promising solution. The primary challenges of edge inference include memory usage and inference speed. This paper introduces MNN-LLM, a framework specifically designed to accelerate the deployment of large language models on mobile devices. MNN-LLM addresses the runtime characteristics of LLMs through model quantization and DRAM-Flash hybrid storage, effectively reducing memory usage. It rearranges weights and inputs based on mobile CPU instruction sets and GPU characteristics while employing strategies such as multicore load balancing, mixed-precision floating-point operations, and geometric computations to enhance performance. Notably, MNN-LLM achieves up to an 8.6x speed increase compared to current mainstream LLM-specific frameworks.

大型语言模型(LLMS)在各种任务中表现出非凡的性能,然而,它们的巨大规模导致在推论期间大量计算资源消耗,从而导致高成本。因此,边缘装置推论提出了一个有希望的解决办法。边缘推论的主要挑战包括内存使用和推论速度。本文介绍了MNN-LLM,这是一个专门旨在加速在移动设备上部署大型语言模型的框架。MNN-LLM通过模型四分化和DRAM-Flash混合储存处理LMS的运行时间特点,有效地减少了记忆的用量。它根据移动式CPU教学组和GPU特性重新安排了重量和投入,同时采用了多核心负载平衡、混合精度浮动点操作和几何计算等战略,以提高性能。值得注意的是,MNN-LLM比当前的主流LM特定框架提高了8.6x的速度。
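As a rough illustration of how quantization reduces memory usage, the sketch below applies generic symmetric per-tensor int8 quantization to a list of weights. This is a textbook scheme chosen for illustration only; the abstract does not specify which quantization scheme MNN-LLM uses, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid a zero scale
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each weight is now stored in one byte instead of four (float32),
# at the cost of a rounding error of at most about scale / 2 per weight.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The trade-off shown here, smaller storage in exchange for bounded rounding error, is the general reason quantization helps on memory-constrained devices.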


Article 162

Title@2025-06-12 (4): Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Title: Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts Experten-Rennen: Eine flexible Routing-Strategie für Scaling Diffusion Transformer mit Mixture von Experten 专家种族:与专家混合的扩大传播变异器灵活机动战略 2503.16057v3

Authors (7): Yike Yuan, Ziyu Wang, Zihao Huang, Defa Zhu, Xun Zhou, Jingyi Yu, Qiyang Min

Diffusion models have emerged as a mainstream framework in visual generation. Building upon this success, the integration of Mixture of Experts (MoE) methods has shown promise in enhancing model scalability and performance. In this paper, we introduce Race-DiT, a novel MoE model for diffusion transformers with a flexible routing strategy, Expert Race. By allowing tokens and experts to compete together and select the top candidates, the model learns to dynamically assign experts to critical tokens. Additionally, we propose per-layer regularization to address challenges in shallow layer learning, and router similarity loss to prevent mode collapse, ensuring better expert utilization. Extensive experiments on ImageNet validate the effectiveness of our approach, showcasing significant performance gains and promising scaling properties.

在视觉生成中,传播模式已成为主流框架。在这一成功的基础上,整合专家混合方法在加强模型缩放性和性能方面显示出了希望。在本文中,我们引入了 “ 种族DIT “ ,这是用于传播变压器的新型的 “ 种族DIT “ 模型,这是一个具有灵活路由战略的传播变压器的新颖的 “ 种族DIT “ 模型。通过允许象征性品和专家共同竞争和挑选顶级候选人,该模型学会动态地将专家分配到关键标志上。此外,我们提议对每个层次进行正规化,以应对浅层学习方面的挑战,以及路由性相似性损失,以防止模式崩溃,确保专家得到更好的利用。关于图像网络的广泛实验验证了我们的方法的有效性,展示了在有希望提升性能的同时取得的重大绩效收益。
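One way to read "tokens and experts compete together" is a single top-k selection over all (token, expert) scores, instead of a separate top-k per token. The sketch below implements that reading; it is an assumption based on the abstract alone, and the function name and score values are hypothetical.

```python
def race_route(scores, k):
    """Select the k highest-scoring (token, expert) pairs over the whole matrix.

    `scores[t][e]` is the router score of expert e for token t. Returns a dict
    mapping each token index to the list of experts assigned to it, so tokens
    may receive several experts or none.
    """
    pairs = [(s, t, e) for t, row in enumerate(scores) for e, s in enumerate(row)]
    pairs.sort(key=lambda p: p[0], reverse=True)
    assignment = {t: [] for t in range(len(scores))}
    for _, t, e in pairs[:k]:
        assignment[t].append(e)
    return assignment


# Three tokens, two experts, a global budget of three activations.
scores = [
    [0.9, 0.8],  # token 0 scores highly for both experts
    [0.1, 0.2],  # token 1 scores low everywhere
    [0.7, 0.3],  # token 2 scores highly for expert 0
]
routing = race_route(scores, k=3)
```

Under this reading, the compute budget flows to the tokens with the highest scores: token 0 receives both experts while token 1 receives none, which a fixed per-token top-k could not express.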


Article 163

Title@2025-06-12 (4): MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning

Title: MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning MasHost baut alles: Autonomes Multi-Agenten-System, das durch Verstärkungslernen gesteuert wird 以强化学习为导向的多机构自治系统 2506.08507v2

Authors (8): Kuo Yang, Xingjie Yang, Linhui Yu, Qing Xu, Yan Fang, Xu Wang, Zhengyang Zhou, Yang Wang

Large Language Model (LLM)-driven Multi-agent systems (Mas) have recently emerged as a powerful paradigm for tackling complex real-world tasks. However, existing Mas construction methods typically rely on manually crafted interaction mechanisms or heuristic rules, introducing human biases and constraining the autonomous ability. Even with recent advances in adaptive Mas construction, existing systems largely remain within the paradigm of semi-autonomous patterns. In this work, we propose MasHost, a Reinforcement Learning (RL)-based framework for autonomous and query-adaptive Mas design. By formulating Mas construction as a graph search problem, our proposed MasHost jointly samples agent roles and their interactions through a unified probabilistic sampling mechanism. Beyond the accuracy and efficiency objectives pursued in prior works, we introduce component rationality as an additional and novel design principle in Mas. To achieve this multi-objective optimization, we propose Hierarchical Relative Policy Optimization (HRPO), a novel RL strategy that collaboratively integrates group-relative advantages and action-wise rewards. To our knowledge, our proposed MasHost is the first RL-driven framework for autonomous Mas graph construction. Extensive experiments on six benchmarks demonstrate that MasHost consistently outperforms most competitive baselines, validating its effectiveness, efficiency, and structure rationality.

大型语言模型(LLM)驱动的多试剂系统(Mas)最近成为处理复杂的现实世界任务的一个强有力的范例,然而,现有的Mas建筑方法通常依赖人工设计的互动机制或超常规则,引入人类偏见并限制自主能力。即使最近在适应性Mas建设方面有所进展,现有系统在很大程度上仍然处于半自治模式范式的范式之内。在这项工作中,我们提议以MasHost为主的强化学习框架(RL)为基础,用于自主和调试性Mas设计。通过将Mas建筑设计成图表搜索问题,我们提议的Mashost联合样本代理作用及其相互作用通过统一的概率抽样机制进行。除了在以前的工程中追求的准确性和效率目标外,我们还引入了部分合理性,作为新的设计原则。为了实现这一多目标优化,我们提议了高分级相对优化(HRPO),这是一个新型的RL战略,将群体优势和行动角度的奖赏结合起来。我们所拟议的Mashost是第一个由RL驱动的自主结构最有竞争力的标准,在连续的马斯最有竞争力的结构上展示的基线。


Article 164

Title@2025-06-12 (4): From Features to Graphs: Exploring Graph Structures and Pairwise Interactions via GNNs

Title: From Features to Graphs: Exploring Graph Structures and Pairwise Interactions via GNNs Von Features zu Graphen: Erforschen von Graphenstrukturen und paarweise Interaktionen über GNNs 从地貌到图表:通过GNNs探索图表结构和对称互动 2502.13471v2

Authors (4): Phaphontee Yamchote, Saw Nay Htet Win, Chainarong Amornbunchornvej, Thanapon Noraset

Feature interaction is crucial in predictive machine learning models, as it captures the relationships between features that influence model performance. In this work, we focus on pairwise interactions and investigate their importance in constructing feature graphs for Graph Neural Networks (GNNs). We leverage existing GNN models and tools to explore the relationship between feature graph structures and their effectiveness in modeling interactions. Through experiments on synthesized datasets, we uncover that edges between interacting features are important for enabling GNNs to model feature interactions effectively. We also observe that including non-interaction edges can act as noise, degrading model performance. Furthermore, we provide theoretical support for sparse feature graph selection using the Minimum Description Length (MDL) principle. We prove that feature graphs retaining only necessary interaction edges yield a more efficient and interpretable representation than complete graphs, aligning with Occam’s Razor. Our findings offer both theoretical insights and practical guidelines for designing feature graphs that improve the performance and interpretability of GNN models.

在预测性机器学习模型中,特性互动至关重要,因为它能捕捉影响模型性能的特征之间的关系。在这项工作中,我们注重双向互动,并调查其在为图形神经网络(GNN)构建特征图形方面的重要性。我们利用现有的GNN模型和工具来探索特征图形结构之间的关系及其建模互动的有效性。通过综合数据集实验,我们发现互动特征之间的边缘对于使GNN能够有效地模拟特征互动非常重要。我们还注意到,包括非互动边缘可以起到噪音、降低模式性能的作用。此外,我们提供理论支持,支持使用最低描述长度原则选择稀有特征图形。我们证明,只保留必要互动边缘的特征图形能够比完整的图形产生比完整图形更高效和可解释的描述性。我们的调查结果为设计改进GNN模型的性能和可解释性提供了理论洞察力和实用指南。


Article 165

Title@2025-06-12 (4): System Identification Using Kolmogorov-Arnold Networks: A Case Study on Buck Converters

Title: System Identification Using Kolmogorov-Arnold Networks: A Case Study on Buck Converters Systemidentifizierung mit Kolmogorov-Arnold Networks: Eine Fallstudie zu Buck Convertern 使用Kolmogorov-Arnold网络的系统识别系统:关于巴克转换器的案例研究 2506.10434v1

Authors (3): Nart Gashi, Panagiotis Kakosimos, George Papafotiou

Kolmogorov-Arnold Networks (KANs) are emerging as a powerful framework for interpretable and efficient system identification in dynamic systems. By leveraging the Kolmogorov-Arnold representation theorem, KANs enable function approximation through learnable activation functions, offering improved scalability, accuracy, and interpretability compared to traditional neural networks. This paper investigates the application of KANs to model and analyze the dynamics of a buck converter system, focusing on state-space parameter estimation along with discovering the system equations. Using simulation data, the methodology involves approximating state derivatives with KANs, constructing interpretable state-space representations, and validating these models through numerical experiments. The results demonstrate the ability of KANs to accurately identify system dynamics, verify model consistency, and detect parameter changes, providing valuable insights into their applicability for system identification in modern industrial systems.

Kolmogorov-Arnold Networks (KANs) 正在成为动态系统中可解释和高效系统识别的强大框架。 通过利用 Kolmogorov-Arnold 演示理论, KANs 能够通过可学习的激活功能实现功能近似, 提供比传统神经网络更好的可缩放性、 准确性和可解释性。 本文调查了 KANs 用于模拟和分析高压转换器系统的动态, 重点是州空间参数估计以及发现系统方程式。 使用模拟数据, 该方法涉及将国家衍生物与 KANs相匹配, 构建可解释的状态- 空间表达方式, 通过数字实验验证这些模型。 结果表明 KANs 能够准确识别系统动态、 校验模型一致性和检测参数变化, 提供有价值的洞察其适用于现代工业系统中系统识别系统的实用性。


Article 166

Title@2025-06-12 (4): Measuring Semantic Information Production in Generative Diffusion Models

Title: Measuring Semantic Information Production in Generative Diffusion Models Messung der semantischen Informationsproduktion in generativen Diffusionsmodellen 生物传播模型中计量语义信息生产 2506.10433v1

Authors (4): Florian Handke, Félix Koulischer, Gabriel Raya, Luca Ambrogioni

It is well known that semantic and structural features of the generated images emerge at different times during the reverse dynamics of diffusion, a phenomenon that has been connected to physical phase transitions in magnets and other materials. In this paper, we introduce a general information-theoretic approach to measure when these class-semantic “decisions” are made during the generative process. By using an online formula for the optimal Bayesian classifier, we estimate the conditional entropy of the class label given the noisy state. We then determine the time intervals corresponding to the highest information transfer between noisy states and class labels using the time derivative of the conditional entropy. We demonstrate our method on one-dimensional Gaussian mixture models and on DDPM models trained on the CIFAR10 dataset. As expected, we find that the semantic information transfer is highest in the intermediate stages of diffusion while vanishing during the final stages. However, we found sizable differences between the entropy rate profiles of different classes, suggesting that different “semantic decisions” are located at different intermediate times.

众所周知, 生成图像的语义和结构特征在不同时间的反向扩散动态中出现, 这种现象与磁体和其他材料的物理阶段转变有关。 在本文中, 我们引入了一种一般的信息理论方法, 以测量这些类类的语义“ 决定” 是在基因化过程中作出的。 我们通过使用最佳贝叶斯语分类器的在线公式, 估计了等级标签的有条件的酶值, 以吵闹的状态为条件。 然后, 我们用有条件的 entropy 衍生物的时间来决定噪音状态和类标签之间最高信息传输的时间间隔。 我们用单维高斯语混合模型和在 CIFAR10 数据集培训的 DDPM 模型来演示我们的方法 。 正如预期的那样, 我们发现语义信息传输在传播的中间阶段最高, 在最后阶段消失。 然而, 我们发现不同等级的 的 伦比率剖面剖面图有相当大的差异, 表明不同的“ 命令” 位于不同的中间时期 。
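The central quantity, the conditional entropy of the class label given the noisy state, can be estimated in closed form for a toy one-dimensional case. The sketch below assumes two equally likely classes located at `+mean` and `-mean` and Gaussian noise of standard deviation `noise_std`, which stands in for the noise level at a given diffusion time; it uses a Monte Carlo average and is not the online formula from the paper.

```python
import math
import random


def posterior_plus(x, mean, noise_std):
    """Bayes posterior P(y=+1 | x) for x ~ N(y * mean, noise_std^2), equal priors."""
    z = 2.0 * mean * x / noise_std ** 2  # log-odds of class +1
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative log-odds
    return e / (1.0 + e)


def binary_entropy(p):
    """Entropy in nats of a two-class distribution (p, 1 - p)."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


def conditional_entropy(mean, noise_std, n_samples=5000, seed=0):
    """Monte Carlo estimate of H(y | x) under the toy two-class model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        y = rng.choice((-1.0, 1.0))
        x = y * mean + rng.gauss(0.0, noise_std)
        total += binary_entropy(posterior_plus(x, mean, noise_std))
    return total / n_samples


h_noisy = conditional_entropy(mean=1.0, noise_std=10.0)  # early, very noisy state
h_clean = conditional_entropy(mean=1.0, noise_std=0.1)   # late, nearly clean state
```

Evaluating `conditional_entropy` on a grid of noise levels and taking finite differences would give a toy analogue of the entropy rate profile discussed in the abstract.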


Article 167

Title@2025-06-12 (4): CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning

Title: CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning CheMatAgent: Verbesserung von LLMs für Chemie und Materialwissenschaft durch baumsuchebasiertes Tool Learning CheMatAgent:通过植树搜索工具学习加强化学和材料科学LLMs 2506.07551v2

Authors (10): Mengsong Wu, YaFei Wang, Yidong Ming, Yuqi An, Yuwei Wan, Wenliang Chen, Binbin Lin, Yuqiang Li, Tong Xie, Dongzhan Zhou

Large language models (LLMs) have recently demonstrated promising capabilities in chemistry tasks while still facing challenges due to outdated pretraining knowledge and the difficulty of incorporating specialized chemical expertise. To address these issues, we propose an LLM-based agent that synergistically integrates 137 external chemical tools, ranging from basic information retrieval to complex reaction predictions, and a dataset curation pipeline to generate the dataset ChemToolBench, which facilitates both effective tool selection and precise parameter filling during fine-tuning and evaluation. We introduce a Hierarchical Evolutionary Monte Carlo Tree Search (HE-MCTS) framework, enabling independent optimization of tool planning and execution. By leveraging self-generated data, our approach supports step-level fine-tuning (FT) of the policy model and training of task-adaptive PRM and ORM that surpass GPT-4o. Experimental evaluations demonstrate that our approach significantly improves performance in Chemistry QA and discovery tasks, offering a robust solution to integrate specialized tools with LLMs for advanced chemical applications. All datasets and code are available at https://github.com/AI4Chem/ChemistryAgent .

大型语言模型(LLMs)最近在化学任务中展现出可观的能力,但由于预训练知识过时以及难以融入专门的化学知识,仍然面临挑战。为了解决这些问题,我们提出了一个基于LLM的智能体,它协同整合了137个外部化学工具(涵盖从基本信息检索到复杂反应预测),以及一条用于生成数据集ChemToolBench的数据集构建流程,该数据集有助于在微调和评估期间进行有效的工具选择和精确的参数填充。我们引入了分层进化蒙特卡洛树搜索(HE-MCTS)框架,使工具规划与执行能够独立优化。我们的方法利用自生成的数据,支持对策略模型进行步骤级微调(FT),并训练出超过GPT-4o的任务自适应PRM和ORM。实验评估表明,我们的方法显著提升了化学问答和发现任务的性能,为将专业工具与LLM整合以用于高级化学应用提供了稳健的解决方案。所有数据集和代码均可在 https://github.com/AI4Chem/ChemistryAgent 获取。


Article 168

Title@2025-06-12 (4): Multi-dimensional Autoscaling of Processing Services: A Comparison of Agent-based Methods

Title: Multi-dimensional Autoscaling of Processing Services: A Comparison of Agent-based Methods Mehrdimensionale Autoskalierung von Verarbeitungsdienstleistungen: Ein Vergleich von agentenbasierten Methoden 处理服务多维多维自动升级:以代理为基础的方法比较 2506.10420v1

Authors (5): Boris Sedlak, Alireza Furutanpey, Zihang Wang, Víctor Casamayor Pujol, Schahram Dustdar

Edge computing breaks with traditional autoscaling due to strict resource constraints, thus motivating more flexible scaling behaviors using multiple elasticity dimensions. This work introduces an agent-based autoscaling framework that dynamically adjusts both hardware resources and internal service configurations to maximize requirements fulfillment in constrained environments. We compare four types of scaling agents: Active Inference, Deep Q Network, Analysis of Structural Knowledge, and Deep Active Inference, using two real-world processing services running in parallel: YOLOv8 for visual recognition and OpenCV for QR code detection. Results show that all agents achieve acceptable SLO performance with varying convergence patterns. While the Deep Q Network benefits from pre-training, the structural analysis converges quickly, and the deep active inference agent combines theoretical foundations with practical scalability advantages. Our findings provide evidence for the viability of multi-dimensional agent-based autoscaling for edge environments and encourage future work in this research direction.

由于严格的资源限制,计算断层与传统的自动计算断裂,因此,利用多种弹性维度鼓励更灵活的缩放行为。这项工作引入了一个基于代理的自动缩放框架,对硬件资源和内部服务配置进行动态调整,以最大限度地满足受限制环境中的要求。我们比较了四种类型的缩放剂:主动推论、深Q网络、结构知识分析和深活性推理,同时运行两个真实世界的处理服务:用于视觉识别的YOLOv8和用于QR代码检测的 OpenCV。结果显示,所有代理都实现了可接受的 SLO 性能,并有各种不同的趋同模式。虽然深Q网络从培训前获益,但结构分析迅速汇合,深活性推理剂将理论基础与实用的可伸缩性优势结合起来。我们的调查结果为边缘环境基于多维体的自动调整的可行性提供了证据,并鼓励今后在这一研究方向开展工作。


Article 169

Title@2025-06-12 (4): Data-Driven Soil Organic Carbon Sampling: Integrating Spectral Clustering with Conditioned Latin Hypercube Optimization

Title: Data-Driven Soil Organic Carbon Sampling: Integrating Spectral Clustering with Conditioned Latin Hypercube Optimization Datengetriebene Bodenorganische Kohlenstoff-Probenahme: Integration von Spektral-Clustering mit bedingter lateinischer Hyperkube-Optimierung 数据驱动土壤有机碳取样:将光谱集成与有条件的拉丁超立方体优化结合 2506.10419v1

Authors (3): Weiying Zhao, Aleksei Unagaev, Natalia Efremova

Soil organic carbon (SOC) monitoring often relies on selecting representative field sampling locations based on environmental covariates. We propose a novel hybrid methodology that integrates spectral clustering, an unsupervised machine learning technique, with conditioned Latin hypercube sampling (cLHS) to enhance the representativeness of SOC sampling. In our approach, spectral clustering partitions the study area into $K$ homogeneous zones using multivariate covariate data, and cLHS is then applied within each zone to select sampling locations that collectively capture the full diversity of environmental conditions. This hybrid spectral-cLHS method ensures that even minor but important environmental clusters are sampled, addressing a key limitation of vanilla cLHS, which can overlook such areas. We demonstrate on a real SOC mapping dataset that spectral-cLHS provides more uniform coverage of covariate feature space and spatial heterogeneity than standard cLHS. This improved sampling design has the potential to yield more accurate SOC predictions by providing better-balanced training data for machine learning models.

土壤有机碳(SOC)监测通常依赖于根据环境协变量选择有代表性的野外采样点。我们提出一种新的混合方法,将谱聚类(一种无监督机器学习技术)与条件拉丁超立方抽样(cLHS)相结合,以提高SOC采样的代表性。在我们的方法中,谱聚类利用多变量协变量数据将研究区划分为 $K$ 个同质区,然后在每个区内应用cLHS选择采样点,使其共同覆盖环境条件的全部多样性。这种混合的 spectral-cLHS 方法确保即使是较小但重要的环境聚类也会被采样,从而解决了普通cLHS可能忽略此类区域的关键局限。我们在一个真实的SOC制图数据集上证明,spectral-cLHS 比标准cLHS对协变量特征空间和空间异质性的覆盖更为均匀。这种改进的采样设计通过为机器学习模型提供更均衡的训练数据,有望产生更准确的SOC预测。
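
The two-stage idea in this abstract (partition into zones, then sample inside every zone) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the zone labels are assumed to come from a spectral clustering step that is not shown, and a simple quantile-stratified pick on a single covariate stands in for full cLHS. The function names `allocate_per_zone` and `select_sites` are hypothetical.

```python
import random
from collections import defaultdict


def allocate_per_zone(zone_labels, n_total):
    """Give every zone one sample, then share the rest in proportion to zone size."""
    sizes = defaultdict(int)
    for zone in zone_labels:
        sizes[zone] += 1
    zones = sorted(sizes)
    if n_total < len(zones):
        raise ValueError("need at least one sample per zone")
    remaining = n_total - len(zones)
    quotas = {z: remaining * sizes[z] / len(zone_labels) for z in zones}
    alloc = {z: 1 + int(quotas[z]) for z in zones}
    # Hand out the samples lost to rounding, largest fractional part first.
    leftover = n_total - sum(alloc.values())
    by_fraction = sorted(zones, key=lambda z: quotas[z] - int(quotas[z]), reverse=True)
    for z in by_fraction[:leftover]:
        alloc[z] += 1
    return alloc


def select_sites(zone_labels, covariate, n_total, seed=0):
    """Pick site indices zone by zone (simplified stand-in for cLHS, one covariate)."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for idx, zone in enumerate(zone_labels):
        members[zone].append(idx)
    chosen = []
    for zone, k in allocate_per_zone(zone_labels, n_total).items():
        ordered = sorted(members[zone], key=lambda i: covariate[i])
        k = min(k, len(ordered))  # a tiny zone cannot supply more sites than it has
        for s in range(k):
            # One random site from each equal-count stratum of the covariate.
            stratum = ordered[s * len(ordered) // k:(s + 1) * len(ordered) // k]
            chosen.append(rng.choice(stratum))
    return chosen
```

Guaranteeing one site per zone before the proportional split is what keeps small clusters from being skipped, which is the limitation of plain cLHS that the abstract points to.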


Article 170

Title@2025-06-12 (4): Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation

Title: Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation Verbesserter Algorithmus für tiefes aktives Lernen unter Unwucht durch Optimale Trennung 通过最佳离职,在平衡下改善深层积极学习的比值 2312.09196v4

Authors (4): Shyam Nuggehalli, Jifan Zhang, Lalit Jain, Robert Nowak

Class imbalance severely impacts machine learning performance on minority classes in real-world applications. While various solutions exist, active learning offers a fundamental fix by strategically collecting balanced, informative labeled examples from abundant unlabeled data. We introduce DIRECT, an algorithm that identifies class separation boundaries and selects the most uncertain nearby examples for annotation. By reducing the problem to one-dimensional active learning, DIRECT leverages established theory to handle batch labeling and label noise – another common challenge in data annotation that particularly affects active learning methods. Our work presents the first comprehensive study of active learning under both class imbalance and label noise. Extensive experiments on imbalanced datasets show DIRECT reduces annotation costs by over 60% compared to state-of-the-art active learning methods and over 80% versus random sampling, while maintaining robustness to label noise.

类别不平衡严重影响现实应用中机器学习在少数类上的性能。虽然存在各种解决办法,但主动学习通过从大量未标注数据中有策略地收集均衡且信息量大的标注样本,提供了根本性的解决方案。我们提出了DIRECT算法,它识别类别分离边界,并选择边界附近最不确定的样本进行标注。通过将问题归约为一维主动学习,DIRECT利用已有理论来处理批量标注和标签噪声,后者是数据标注中另一个常见挑战,对主动学习方法影响尤甚。我们的工作首次对类别不平衡和标签噪声同时存在情况下的主动学习进行了全面研究。在不平衡数据集上的大量实验表明,与最先进的主动学习方法相比,DIRECT将标注成本降低了60%以上,与随机抽样相比降低了80%以上,同时保持了对标签噪声的稳健性。
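
To make the "reduce to one-dimensional active learning" step concrete, the following Python sketch shows one simple reading of it: each example is mapped to a scalar score, a separating threshold is fitted on the labeled scores, and the unlabeled examples closest to that threshold are queried. This is an illustrative simplification, not the DIRECT algorithm itself; `best_threshold` and `select_queries` are hypothetical names, and the sketch ignores label noise and batching.

```python
def best_threshold(scores, labels):
    """Threshold t minimising errors of the rule 'score >= t means positive'."""
    xs = sorted(set(scores))
    # Candidate cuts: below all scores, between neighbours, above all scores.
    candidates = [xs[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    candidates.append(xs[-1] + 1.0)

    def errors(t):
        return sum((s >= t) != bool(y) for s, y in zip(scores, labels))

    return min(candidates, key=errors)


def select_queries(unlabeled_scores, threshold, budget):
    """Indices of the unlabeled examples nearest the threshold (most uncertain)."""
    ranked = sorted(
        range(len(unlabeled_scores)),
        key=lambda i: abs(unlabeled_scores[i] - threshold),
    )
    return ranked[:budget]
```

Querying near the estimated boundary tends to pull in examples from both sides of it, which is how a boundary-focused strategy can gather a more balanced labeled set than uniform sampling.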


Article 171

Title@2025-06-12 (4): Towards Graph-Based Privacy-Preserving Federated Learning: ModelNet – A ResNet-based Model Classification Dataset

Title: Towards Graph-Based Privacy-Preserving Federated Learning: ModelNet – A ResNet-based Model Classification Dataset Auf dem Weg zu einem graphisch basierten Datenschutz-Erhaltungs-Federated Learning: ModelNet – Ein resNet-basierter Modellklassifikations-Datensatz 走向基于图表的隐私-保留联邦学习:模型网 – – 以ResNet为基础的分类模型数据集 2506.00476v2

Authors (2): Abhisek Ray, Lukas Esterle

Federated Learning (FL) has emerged as a powerful paradigm for training machine learning models across distributed data sources while preserving data locality. However, the privacy of local data is always a pivotal concern and has received a lot of attention in recent research on the FL regime. Moreover, the lack of domain heterogeneity and client-specific segregation in the benchmarks remains a critical bottleneck for rigorous evaluation. In this paper, we introduce ModelNet, a novel image classification dataset constructed from the embeddings extracted from a pre-trained ResNet50 model. First, we modify the CIFAR100 dataset into three client-specific variants, considering three domain heterogeneities (homogeneous, heterogeneous, and random). Subsequently, we train each client-specific subset of all three variants on the pre-trained ResNet50 model to save model parameters. In addition to multi-domain image data, we propose a new hypothesis to define the FL algorithm that can access the anonymized model parameters to preserve the local privacy in a more effective manner compared to existing ones. ModelNet is designed to simulate realistic FL settings by incorporating non-IID data distributions and client diversity design principles in the mainframe for both conventional and futuristic graph-driven FL algorithms. The three variants are ModelNet-S, ModelNet-D, and ModelNet-R, which are based on homogeneous, heterogeneous, and random data settings, respectively. To the best of our knowledge, we are the first to propose a cross-environment client-specific FL dataset along with the graph-based variant. Extensive experiments based on domain shifts and aggregation strategies show the effectiveness of the above variants, making it a practical benchmark for classical and graph-based FL research. The dataset and related code are available online.

联邦学习(FL)已成为一种强大的范式,可在保持数据本地性的同时跨分布式数据源训练机器学习模型。然而,本地数据的隐私始终是一个关键问题,在近期关于FL的研究中受到了大量关注。此外,基准中缺乏域异质性和客户端特定划分,仍然是严格评估的关键瓶颈。在本文中,我们引入了ModelNet,这是一个由预训练ResNet50模型提取的嵌入构建的新型图像分类数据集。首先,我们将CIFAR100数据集修改为三种客户端特定变体,考虑三种域异质性(同质、异质和随机)。随后,我们在预训练的ResNet50模型上训练所有三种变体的每个客户端特定子集,以保存模型参数。除了多域图像数据之外,我们还提出了一个新的假设来定义FL算法,该算法可以访问匿名化的模型参数,从而比现有方法更有效地保护本地隐私。ModelNet旨在通过纳入非独立同分布(non-IID)数据分布和客户端多样性设计原则来模拟真实的FL场景,同时适用于传统的和未来的图驱动FL算法。三种变体分别为ModelNet-S、ModelNet-D和ModelNet-R,分别基于同质、异质和随机数据设置。据我们所知,我们首次提出了跨环境的客户端特定FL数据集及其基于图的变体。基于域偏移和聚合策略的大量实验表明了上述变体的有效性,使其成为经典FL和基于图的FL研究的实用基准。数据集和相关代码可在线获取。


Article 172

Title@2025-06-12 (4): Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series

Title: Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series Zeit-IMM: Ein Datensatz und Benchmark für irreguläre multimodale Multivariate Zeitreihen 时间-IMM:非正常多式联运多变时间序列的数据集和基准 2506.10412v1

Authors (7): Ching Chang, Jeehyun Hwang, Yidan Shi, Haixin Wang, Wen-Chih Peng, Tien-Fu Chen, Wei Wang

Time series data in real-world applications such as healthcare, climate modeling, and finance are often irregular, multimodal, and messy, with varying sampling rates, asynchronous modalities, and pervasive missingness. However, existing benchmarks typically assume clean, regularly sampled, unimodal data, creating a significant gap between research and real-world deployment. We introduce Time-IMM, a dataset specifically designed to capture cause-driven irregularity in multimodal multivariate time series. Time-IMM represents nine distinct types of time series irregularity, categorized into trigger-based, constraint-based, and artifact-based mechanisms. Complementing the dataset, we introduce IMM-TSF, a benchmark library for forecasting on irregular multimodal time series, enabling asynchronous integration and realistic evaluation. IMM-TSF includes specialized fusion modules, including a timestamp-to-text fusion module and a multimodality fusion module, which support both recency-aware averaging and attention-based integration strategies. Empirical results demonstrate that explicitly modeling multimodality on irregular time series data leads to substantial gains in forecasting performance. Time-IMM and IMM-TSF provide a foundation for advancing time series analysis under real-world conditions. The dataset is publicly available at https://www.kaggle.com/datasets/blacksnail789521/time-imm/data, and the benchmark library can be accessed at https://anonymous.4open.science/r/IMMTSF_NeurIPS2025.

在现实世界应用中的时间序列数据,如医疗保健、气候模型和融资,往往不规则、多模式和混乱,抽样率不同,不统一的方式和普遍缺失;然而,现有基准通常假定清洁、定期抽样、单一方式的数据,造成研究和现实世界部署之间的巨大差距;我们引入了时间-IMM,这是一个数据集,专门用来记录多式多变时间序列中由原因驱动的不规则性;时间-IMM是9种不同的时间序列,分为触发型、约束型和工艺型机制;补充数据集,我们引入IMM-TSF,这是一个用于预测不规则的多式联运时间序列的基准图书馆,能够实现非同步整合和现实评估;IMM-TSM-TSF包括专门的聚合模块,包括一个时间戳至文字融合模块和一个基于多式多变时间序列的多式联运模块;时间-观测平均和关注型整合型战略。


Article 173

Title@2025-06-12 (4): DeepGDel: Deep Learning-based Gene Deletion Prediction Framework for Growth-Coupled Production in Genome-Scale Metabolic Models

Title: DeepGDel: Deep Learning-based Gene Deletion Prediction Framework for Growth-Coupled Production in Genome-Scale Metabolic Models DeepGDel: Deep Learning-basierte Gene Deletion Prediction Framework für wachstumsverbundene Produktion in Genom-Scale Metabolic-Modellen 深层GDel:在基因组-规模元元模型中实现增长和混合生产以深学习为基础的基因删除预测框架 2504.06316v2

Authors (2): Ziwei Yang, Takeyuki Tamura

In genome-scale constraint-based metabolic models, gene deletion strategies are crucial for achieving growth-coupled production, where cell growth and target metabolite production are simultaneously achieved. While computational methods for calculating gene deletions have been widely explored and contribute to developing gene deletion strategy databases, current approaches are limited in leveraging new data-driven paradigms, such as machine learning, for more efficient strain design. Therefore, it is necessary to propose a fundamental framework for this objective. In this study, we first formulate the problem of gene deletion strategy prediction and then propose a framework for predicting gene deletion strategies for growth-coupled production in genome-scale metabolic models. The proposed framework leverages deep learning algorithms to learn and integrate sequential gene and metabolite data representation, enabling the automatic gene deletion strategy prediction. Computational experiment results demonstrate the feasibility of the proposed framework, showing substantial improvements over baseline methods. Specifically, the proposed framework achieves a 14.69%, 22.52%, and 13.03% increase in overall accuracy across three metabolic models of different scales under study, while maintaining balanced precision and recall in predicting gene deletion statuses. The source code and examples for the framework are publicly available at https://github.com/MetNetComp/DeepGDel.

在基因组规模的基于约束的代谢模型中,基因删除策略对于实现生长耦合生产至关重要,即同时实现细胞生长和目标代谢物的生产。虽然计算基因删除的方法已得到广泛探索,并有助于构建基因删除策略数据库,但现有方法在利用机器学习等新的数据驱动范式进行更高效的菌株设计方面仍然有限。因此,有必要为这一目标提出一个基础框架。在本研究中,我们首先形式化了基因删除策略预测问题,然后提出了一个用于预测基因组规模代谢模型中生长耦合生产的基因删除策略的框架。该框架利用深度学习算法来学习并整合序列化的基因和代谢物数据表示,从而实现基因删除策略的自动预测。计算实验结果证明了该框架的可行性,相比基线方法有显著改进。具体而言,该框架在所研究的三个不同规模的代谢模型上,总体准确率分别提高了14.69%、22.52%和13.03%,同时在预测基因删除状态时保持了均衡的精确率和召回率。该框架的源代码和示例可在 https://github.com/MetNetComp/DeepGDel 公开获取。


Article 174

Title@2025-06-12 (4): IoTGeM: Generalizable Models for Behaviour-Based IoT Attack Detection

Title: IoTGeM: Generalizable Models for Behaviour-Based IoT Attack Detection IoTGeM: Generalisierbare Modelle für die IoT-Angrifferkennung auf Verhaltensbasis IotGeM:基于行为的IoT攻击探测通用模型 2401.01343v2

Authors (3): Kahraman Kostas, Mike Just, Michael A. Lones

Previous research on behavior-based attack detection for networks of IoT devices has resulted in machine learning models whose ability to adapt to unseen data is limited and often not demonstrated. This paper presents IoTGeM, an approach for modeling IoT network attacks that focuses on generalizability, yet also leads to better detection and performance. We first introduce an improved rolling window approach for feature extraction. To reduce overfitting, we then apply a multi-step feature selection process where a Genetic Algorithm (GA) is uniquely guided by exogenous feedback from a separate, independent dataset. To prevent common data leaks that have limited previous models, we build and test our models using strictly isolated train and test datasets. The resulting models are rigorously evaluated using a diverse portfolio of machine learning algorithms and datasets. Our window-based models demonstrate superior generalization compared to traditional flow-based models, particularly when tested on unseen datasets. On these stringent, cross-dataset tests, IoTGeM achieves F1 scores of 99% for ACK, HTTP, SYN, MHD, and PS attacks, as well as a 94% F1 score for UDP attacks. Finally, we build confidence in the models by using the SHAP (SHapley Additive exPlanations) explainable AI technique, allowing us to identify the specific features that underlie the accurate detection of attacks.

先前针对物联网(IoT)设备网络的基于行为的攻击检测研究所产生的机器学习模型,适应未见数据的能力有限,而且往往未经验证。本文提出了IoTGeM,这是一种对IoT网络攻击进行建模的方法,它侧重于泛化能力,同时也带来了更好的检测效果和性能。我们首先引入一种改进的滚动窗口方法进行特征提取。为了减少过拟合,我们随后采用多步特征选择流程,其中遗传算法(GA)由来自一个单独的独立数据集的外部反馈加以引导。为了防止限制了以往模型的常见数据泄漏,我们使用严格隔离的训练和测试数据集来构建和测试模型。所得模型使用多种机器学习算法和数据集进行了严格评估。与传统的基于流的模型相比,我们基于窗口的模型表现出更优的泛化能力,尤其是在未见数据集上测试时。在这些严格的跨数据集测试中,IoTGeM对ACK、HTTP、SYN、MHD和PS攻击取得了99%的F1分数,对UDP攻击取得了94%的F1分数。最后,我们使用SHAP(SHapley Additive exPlanations)可解释人工智能技术来建立对模型的信任,从而识别出准确检测攻击所依赖的具体特征。
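
As a rough illustration of rolling-window feature extraction, the Python sketch below computes per-packet statistics over the most recent `window` packets rather than over a whole flow. The choice of packet size as the raw signal, the window length, and the name `rolling_features` are assumptions made for the example; the abstract does not specify the features the paper actually uses.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_features(packet_sizes, window=3):
    """For each packet, summarise the sizes of the last `window` packets seen so far."""
    recent = deque(maxlen=window)  # the oldest packet drops out automatically
    rows = []
    for size in packet_sizes:
        recent.append(size)
        rows.append({
            "mean": mean(recent),
            "std": pstdev(recent),
            "count": len(recent),
        })
    return rows
```

For example, `rolling_features([10, 20, 30, 40], window=3)` yields four rows, and the last row summarises only the packets of size 20, 30 and 40.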


Article 175

Title@2025-06-12 (4): An energy-efficient learning solution for the Agile Earth Observation Satellite Scheduling Problem

Title: An energy-efficient learning solution for the Agile Earth Observation Satellite Scheduling Problem Eine energieeffiziente Lernlösung für das agile Erdbeobachtungssatelliten-Scheduling-Problem Agile地球观测卫星排程问题的一个节能学习解决方案 2503.04803v2

Authors (3): Antonio M. Mercado-Martínez, Beatriz Soret, Antonio Jurado-Navas

The Agile Earth Observation Satellite Scheduling Problem (AEOSSP) entails finding the subset of observation targets to be scheduled along the satellite’s orbit while meeting operational constraints of time, energy and memory. The problem of deciding what and when to observe is inherently complex, and becomes even more challenging when considering several issues that compromise the quality of the captured images, such as cloud occlusion, atmospheric turbulence, and image resolution. This paper presents a Deep Reinforcement Learning (DRL) approach for addressing the AEOSSP with time-dependent profits, integrating these three factors to optimize the use of energy and memory resources. The proposed method involves a dual decision-making process: selecting the sequence of targets and determining the optimal observation time for each. Our results demonstrate that the proposed algorithm reduces the capture of images that fail to meet quality requirements by more than 60% and consequently decreases energy waste from attitude maneuvers by up to 78%, all while maintaining strong observation performance.

敏捷的地球观测卫星规划问题(AEOSSP)意味着在满足时间、能量和记忆等操作限制的同时,在卫星轨道上寻找预定的观测目标子集。决定观测什么和何时观测的问题本身就十分复杂,在考虑影响所捕获图像质量的一些问题时甚至更具挑战性,如云层隔离、大气动荡和图像解析。本文件介绍了一种深度强化学习(DRL)方法,以基于时间的利润解决AEOSSP,将这三个因素结合起来,以优化能源和记忆资源的使用。拟议方法涉及一个双重决策程序:选择目标序列和确定每个目标的最佳观测时间。我们的结果表明,拟议的算法减少了未能达到质量要求的60%以上的图像的捕获,从而将能源浪费从姿态动作减少到78 %,所有这些都保持了强大的观测性能。


Article 176

Title@2025-06-12 (4): PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier

Title: PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier PAG: Multi-Turn verstärkt LLM Selbstkorrektion mit Politik als Generativer Prüfer PAG: 多发强化LLM自我校正,政策作为产生验证 2506.10406v1

Authors (8): Yuhua Jiang, Yuwen Xiong, Yufeng Yuan, Chao Xin, Wenyuan Xu, Yu Yue, Qianchuan Zhao, Lin Yan

Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks, yet they still struggle to reliably verify the correctness of their own outputs. Existing solutions to this verification challenge often depend on separate verifier models or require multi-stage self-correction training pipelines, which limit scalability. In this paper, we propose Policy as Generative Verifier (PAG), a simple and effective framework that empowers LLMs to self-correct by alternating between policy and verifier roles within a unified multi-turn reinforcement learning (RL) paradigm. Distinct from prior approaches that always generate a second attempt regardless of model confidence, PAG introduces a selective revision mechanism: the model revises its answer only when its own generative verification step detects an error. This verify-then-revise workflow not only alleviates model collapse but also jointly enhances both reasoning and verification abilities. Extensive experiments across diverse reasoning benchmarks highlight PAG’s dual advancements: as a policy, it enhances direct generation and self-correction accuracy; as a verifier, its self-verification outperforms self-consistency.

大型语言模型(LLMS)在复杂的推理任务中表现出了令人印象深刻的能力,然而,它们仍然在努力可靠地核实其自身产出的正确性。这一核查挑战的现有解决办法往往取决于单独的核查模型,或需要多阶段自我校正培训管道,这限制了可缩放性。在本文中,我们提议政策作为“创制验证器”(PAG),这是一个简单而有效的框架,它使LLMs能够通过在统一的多方向强化学习模式中交替政策和验证员的作用来自我纠正。与以往的做法不同,以往的做法总是产生第二次尝试,而不管模型是否有信心。 PAG采用了选择性的修订机制:模型只有在其自身的基因核查步骤发现错误时才修改其答复。这一校验后工作流程不仅缓解了模型的崩溃,而且还共同增强了推理和核查能力。 不同推理基准的广泛实验突出PAG的双重进步:作为一种政策,它能加强直接生成和自我校正的准确性;作为校验者,其自我核查超越了自我一致性。
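
The selective revision mechanism described above (revise only when the model's own verification flags an error) can be sketched as a small inference-time loop in Python. The three callables `generate`, `verify` and `revise` are hypothetical stand-ins for calls to the same underlying model in its policy and verifier roles; the reinforcement learning training that PAG relies on is not shown.

```python
def verify_then_revise(question, generate, verify, revise, max_turns=3):
    """Return the final answer and the list of attempts made along the way."""
    answer = generate(question)
    attempts = [answer]
    for _ in range(max_turns - 1):
        # Verifier role: stop as soon as the current answer is judged correct.
        if verify(question, answer):
            break
        # Policy role: produce a new attempt only after a detected error.
        answer = revise(question, answer)
        attempts.append(answer)
    return answer, attempts
```

Unlike a pipeline that always generates a second attempt, this loop leaves an answer untouched whenever verification passes, so a correct first attempt is never overwritten.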


Article 177

Title@2025-06-12 (4): Generative Algorithms for Wildfire Progression Reconstruction from Multi-Modal Satellite Active Fire Measurements and Terrain Height

Title: Generative Algorithms for Wildfire Progression Reconstruction from Multi-Modal Satellite Active Fire Measurements and Terrain Height Generative Algorithmen für Wildfire-Progressions-Rekonstruktion durch Multi-Modal Satellite-Aktivfeuer-Messungen und Geländehöhe 从多模式卫星活性火灾测量和地面高度生成的野火进量重建 2506.10404v1

Authors (11): Bryan Shaddy, Brianna Binder, Agnimitra Dasgupta, Haitong Qin, James Haley, Angel Farguell, Kyle Hilburn, Derek V. Mallia, Adam Kochanski, Jan Mandel, Assad Oberai

Increasing wildfire occurrence has spurred growing interest in wildfire spread prediction. However, even the most complex wildfire models diverge from observed progression during multi-day simulations, motivating the need for data assimilation. A useful approach to assimilating measurement data into complex coupled atmosphere-wildfire models is to estimate wildfire progression from measurements and use this progression to develop a matching atmospheric state. In this study, an approach is developed for estimating fire progression from VIIRS active fire measurements, GOES-derived ignition times, and terrain height data. A conditional Generative Adversarial Network is trained with simulations of historic wildfires from the atmosphere-wildfire model WRF-SFIRE, thus allowing incorporation of WRF-SFIRE physics into estimates. Fire progression is succinctly represented by fire arrival time, and measurements for training are obtained by applying an approximate observation operator to WRF-SFIRE solutions, eliminating the need for satellite data during training. The model is trained on tuples of fire arrival times, measurements, and terrain, and, once trained, leverages measurements of real fires and corresponding terrain data to generate samples of fire arrival times. The approach is validated on five Pacific US wildfires, with results compared against high-resolution perimeters measured via aircraft, finding an average Sorensen-Dice coefficient of 0.81. The influence of terrain height on the arrival time inference is also evaluated, and it is observed that terrain has minimal influence when the inference is conditioned on satellite measurements.

野火发生率的上升激发了人们对野火蔓延预测日益增长的兴趣。然而,即使是最复杂的野火模型,在多日模拟中也会偏离观测到的蔓延过程,因此需要数据同化。将测量数据同化到复杂的大气-野火耦合模型中的一种有效方法,是根据测量估计野火的蔓延过程,并利用该过程生成与之匹配的大气状态。本研究开发了一种方法,利用VIIRS活跃火点测量、由GOES导出的起火时间以及地形高度数据来估计火势蔓延。我们使用大气-野火模型WRF-SFIRE对历史野火的模拟来训练条件生成对抗网络,从而将WRF-SFIRE的物理过程纳入估计。火势蔓延以火到达时间简洁地表示,训练所用的测量数据通过对WRF-SFIRE解应用近似观测算子获得,因而训练期间无需卫星数据。模型在火到达时间、测量和地形组成的元组上训练,训练完成后利用真实火灾的测量和相应的地形数据生成火到达时间的样本。该方法在美国太平洋地区的五场野火上进行了验证,并与通过飞机测得的高分辨率火场边界进行比较,平均Sorensen-Dice系数为0.81。我们还评估了地形高度对到达时间推断的影响,发现当推断以卫星测量为条件时,地形的影响极小。
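
The Sorensen-Dice coefficient used for validation is defined for two sets $A$ and $B$ as $2|A \cap B| / (|A| + |B|)$. The short Python sketch below applies it to burned areas represented as sets of grid cells; that representation and the function name `dice_coefficient` are assumptions for illustration, not the paper's evaluation code.

```python
def dice_coefficient(predicted_cells, observed_cells):
    """Sorensen-Dice overlap between two collections of burned grid cells."""
    a, b = set(predicted_cells), set(observed_cells)
    if not a and not b:
        return 1.0  # two empty areas agree perfectly by convention
    return 2 * len(a & b) / (len(a) + len(b))
```

A value of 1 means the predicted and observed burned areas coincide exactly, and 0 means they share no cells.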


Article 178

Title@2025-06-12 (4): Time To Impeach LLM-as-a-Judge: Programs are the Future of Evaluation

Title: Time To Impeach LLM-as-a-Judge: Programs are the Future of Evaluation Time To Impeach LLM-as-a-Richter: Programme sind die Zukunft der Evaluation 弹劾LLM作为法官的时间:方案就是评价的未来 2506.10403v1

Authors (3): Tzu-Heng Huang, Harit Vishwakarma, Frederic Sala

Large language models (LLMs) are widely used to evaluate the quality of LLM generations and responses, but this leads to significant challenges: high API costs, uncertain reliability, inflexible pipelines, and inherent biases. To address these, we introduce PAJAMA (Program-As-a-Judge for Automated Model Assessment), a new alternative that uses LLMs to synthesize executable judging programs instead of directly scoring responses. These synthesized programs can be stored and run locally, costing orders of magnitude less while providing interpretable and auditable judging logic that can be easily adapted. Program-based judges mitigate biases, improving judgment consistency by 15.83% and reducing biased responses by 23.7% on average compared to a Qwen2.5-14B-based LLM-as-a-judge. When program judgments are distilled into a model, PAJAMA outperforms LLM-as-a-judge on the challenging CHAT-HARD subset of RewardBench, outperforming metrics by 2.19% on Prometheus and 8.67% on the JudgeLM dataset, all at three orders of magnitude lower cost.

大型语言模型(LLMs)被广泛用于评估LLM生成内容和回复的质量,但这带来了重大挑战:API成本高、可靠性不确定、流程不灵活以及固有偏见。为了解决这些问题,我们引入了PAJAMA(Program-As-a-Judge for Automated Model Assessment),这是一种新的替代方案,它使用LLM合成可执行的评判程序,而不是直接对回复打分。这些合成的程序可以在本地存储和运行,成本低几个数量级,同时提供可解释、可审计且易于调整的评判逻辑。与基于Qwen2.5-14B的LLM-as-a-judge相比,基于程序的评判器可以减轻偏见,平均将判断一致性提高15.83%,并将有偏回复减少23.7%。当把程序判断蒸馏到模型中时,PAJAMA在RewardBench具有挑战性的CHAT-HARD子集上优于LLM-as-a-judge,在Prometheus数据集上的指标高出2.19%,在JudgeLM数据集上高出8.67%,而成本低三个数量级。
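
To give a feel for what an executable judging program might look like, here is a hedged Python sketch. The rubric (length, topical overlap, no refusal) and the weights are invented for illustration and are not taken from PAJAMA; in the described approach such a program would be synthesized by an LLM once and then run locally. `judge_program` and `pairwise_verdict` are hypothetical names.

```python
import re


def judge_program(question, response):
    """Hypothetical synthesized rubric: score a response between 0 and 1."""
    score = 0.0
    # Criterion 1: the response is more than a fragment.
    if len(response.split()) >= 5:
        score += 0.4
    # Criterion 2: it reuses at least one longer word from the question.
    q_words = {w for w in re.findall(r"[a-z]+", question.lower()) if len(w) > 3}
    r_words = set(re.findall(r"[a-z]+", response.lower()))
    if q_words & r_words:
        score += 0.4
    # Criterion 3: it is not a refusal.
    if not re.search(r"\b(i cannot|i can't)\b", response.lower()):
        score += 0.2
    return score


def pairwise_verdict(question, response_a, response_b):
    """Compare two responses with the same program; return 'A', 'B' or 'tie'."""
    a = judge_program(question, response_a)
    b = judge_program(question, response_b)
    if a == b:
        return "tie"
    return "A" if a > b else "B"
```

Because every criterion is an explicit line of code, a reviewer can audit or adjust the judging logic directly, and swapping the order of the two responses only ever swaps the label of the verdict.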


Article 179

Title@2025-06-12 (4): RsGCN: Rescaling Enhances Generalization of GCNs for Solving Scalable Traveling Salesman Problems

Title: RsGCN: Rescaling Enhances Generalization of GCNs for Solving Scalable Traveling Salesman Problems RsGCN: Rescaling verbessert die Generalisierung von GCNs zur Lösung skalierbarer reisender Salesman-Probleme RsGCN: 提高全球氯化萘的通用化,以解决可缩放旅行销售员问题 2506.00533v2

Authors (4): Junquan Huang, Zong-Gan Chen, Yuncheng Jiang, Zhi-Hui Zhan

Neural traveling salesman problem (TSP) solvers face two critical challenges: poor generalization for scalable TSPs and high training costs. To address these challenges, we propose a new Rescaling Graph Convolutional Network (RsGCN). Focusing on the scale-dependent features (i.e., features varied with problem scales) related to nodes and edges that influence the sensitivity of GCNs to the problem scales, a Rescaling Mechanism in RsGCN enhances the generalization capability by (1) rescaling adjacent nodes to construct a subgraph with a uniform number of adjacent nodes for each node across various scales of TSPs, which stabilizes the graph message aggregation; (2) rescaling subgraph edges to adjust the lengths of subgraph edges to the same magnitude, which maintains numerical consistency. In addition, an efficient training strategy with a mixed-scale dataset and bidirectional loss is used in RsGCN. To fully exploit the heatmaps generated by RsGCN, we design an efficient post-search algorithm termed Re2Opt, in which a reconstruction process based on adaptive weight is incorporated to help avoid local optima. Based on a combined architecture of RsGCN and Re2Opt, our solver achieves remarkable generalization and low training cost: with only 3 epochs of training on the mixed-scale dataset containing instances with up to 100 nodes, it can be generalized successfully to 10K-node instances without any fine-tuning. Extensive experiments demonstrate our state-of-the-art performance across uniform distribution instances of 9 different scales from 20 to 10K nodes and 78 real-world instances from TSPLIB, while requiring the fewest learnable parameters and training epochs among neural competitors.

神经旅行商问题(TSP)求解器面临两个关键挑战:对不同规模TSP的泛化能力差,以及训练成本高。为应对这些挑战,我们提出了一种新的重缩放图卷积网络(RsGCN)。针对与节点和边相关、会影响GCN对问题规模敏感性的规模相关特征(即随问题规模变化的特征),RsGCN中的重缩放机制通过以下方式增强泛化能力:(1)重缩放相邻节点,为不同规模TSP中的每个节点构建相邻节点数量一致的子图,从而稳定图消息聚合;(2)重缩放子图边,将子图边的长度调整到同一量级,从而保持数值一致性。此外,RsGCN采用了一种结合混合规模数据集和双向损失的高效训练策略。为了充分利用RsGCN生成的热图,我们设计了一种名为Re2Opt的高效后搜索算法,其中引入了基于自适应权重的重构过程,以帮助避免局部最优。基于RsGCN与Re2Opt的组合架构,我们的求解器实现了出色的泛化能力和较低的训练成本:仅在包含至多100个节点实例的混合规模数据集上训练3个epoch,即可在不做任何微调的情况下成功泛化到10K节点的实例。大量实验表明,我们在从20到10K节点的9种不同规模的均匀分布实例以及来自TSPLIB的78个真实实例上取得了最先进的性能,同时在神经求解器中所需的可学习参数和训练epoch最少。
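
The two rescaling steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RsGCN implementation: the subgraph of each node is taken to be its `k` nearest neighbours, and edge lengths are normalised by the longest edge in that subgraph, which is one plausible way to bring lengths "to the same magnitude" (the abstract does not give the exact normalisation). The name `rescaled_knn_subgraphs` is hypothetical.

```python
import math


def rescaled_knn_subgraphs(coords, k):
    """For every node, return its k nearest neighbours with normalised edge lengths."""
    subgraphs = []
    for i, p in enumerate(coords):
        # Step 1: a fixed number of neighbours per node, whatever the instance size.
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(coords) if j != i
        )[:k]
        # Step 2: divide by the longest edge so lengths fall in (0, 1].
        longest = nearest[-1][0] or 1.0  # guard against coincident points
        subgraphs.append([(j, d / longest) for d, j in nearest])
    return subgraphs
```

With this normalisation the output does not change when all coordinates are multiplied by a constant, which is the kind of scale independence the rescaling mechanism aims for.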


Article 180

Title@2025-06-12 (4): TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning

Title: TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning TooBadRL: Trigger-Optimierung zur Steigerung der Effektivität von Backdoor-Angriffen auf Deep Verstärkung Learning TooBadRL:在深强化学习中提高后门袭击效果的触发最佳优化 2506.09562v2

Authors (4): Songze Li, Mingxuan Zhang, Kang Wei, Shouling Ji

Deep reinforcement learning (DRL) has achieved remarkable success in a wide range of sequential decision-making domains, including robotics, healthcare, smart grids, and finance. Recent research demonstrates that attackers can efficiently exploit system vulnerabilities during the training phase to execute backdoor attacks, producing malicious actions when specific trigger patterns are present in the state observations. However, most existing backdoor attacks rely primarily on simplistic and heuristic trigger configurations, overlooking the potential efficacy of trigger optimization. To address this gap, we introduce TooBadRL (Trigger Optimization to Boost Effectiveness of Backdoor Attacks on DRL), the first framework to systematically optimize DRL backdoor triggers along three critical axes, i.e., temporal, spatial, and magnitude. Specifically, we first introduce a performance-aware adaptive freezing mechanism for injection timing. Then, we formulate dimension selection as a cooperative game, utilizing Shapley value analysis to identify the most influential state variable for the injection dimension. Furthermore, we propose a gradient-based adversarial procedure to optimize the injection magnitude under environment constraints. Evaluations on three mainstream DRL algorithms and nine benchmark tasks show that TooBadRL significantly improves attack success rates, while ensuring minimal degradation of normal task performance. These results highlight the previously underappreciated importance of principled trigger optimization in DRL backdoor attacks. The source code of TooBadRL can be found at https://github.com/S3IC-Lab/TooBadRL.

深度强化学习(DRL)在机器人、医疗保健、智能电网和金融等广泛的序列决策领域取得了显著成功。近期研究表明,攻击者可以在训练阶段有效利用系统漏洞实施后门攻击,使模型在状态观测中出现特定触发模式时产生恶意动作。然而,大多数现有后门攻击主要依赖简单的启发式触发配置,忽视了触发优化的潜在效力。为弥补这一空白,我们引入了TooBadRL(Trigger Optimization to Boost Effectiveness of Backdoor Attacks on DRL),这是第一个沿时间、空间和幅度三个关键维度系统优化DRL后门触发器的框架。具体而言,我们首先为注入时机引入一种性能感知的自适应冻结机制。然后,我们将维度选择表述为合作博弈,利用Shapley值分析确定对注入维度最有影响的状态变量。此外,我们提出了一种基于梯度的对抗过程,在环境约束下优化注入幅度。在三种主流DRL算法和九个基准任务上的评估表明,TooBadRL显著提高了攻击成功率,同时确保正常任务性能的下降最小。这些结果凸显了有原则的触发优化在DRL后门攻击中此前被低估的重要性。TooBadRL的源代码可在 https://github.com/S3IC-Lab/TooBadRL 获取。
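
Shapley value analysis, which the abstract uses to rank state variables, assigns each player in a cooperative game its average marginal contribution over all orderings of the players. The Python sketch below computes exact Shapley values for a small generic game; it only illustrates the attribution concept and says nothing about how TooBadRL defines its value function. `shapley_values` and the toy `value` function are hypothetical, and exact enumeration is only practical for a handful of players.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            extended = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(extended) - value(coalition)
            coalition = extended
    return {p: t / len(orderings) for p, t in totals.items()}
```

In an additive game, where a coalition is worth the sum of its members' weights, each player's Shapley value equals its own weight, which gives a quick sanity check for the function.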


Article 181

Title@2025-06-12 (4): EQA-RM: A Generative Embodied Reward Model with Test-time Scaling

Title: EQA-RM: A Generative Embodied Reward Model with Test-time Scaling EQA-RM: Generatives Embmund-Reward-Modell mit Testzeitskalierung EQA-RM:带有试验时间缩放的产生性内嵌式奖赏模型 2506.10389v1

Authors (3): Yuhang Chen, Zhen Tan, Tianlong Chen

Reward Models (RMs), vital for large model alignment, are underexplored for complex embodied tasks like Embodied Question Answering (EQA) where nuanced evaluation of agents’ spatial, temporal, and logical understanding is critical yet not considered by generic approaches. We introduce EQA-RM, a novel generative multimodal reward model specifically architected for EQA, trained via our innovative Contrastive Group Relative Policy Optimization (C-GRPO) strategy to learn fine-grained behavioral distinctions. The generative nature of EQA-RM provides interpretable, structured reward feedback (beyond simple scalars), uniquely enabling test-time scaling to dynamically adjust evaluation granularity, from concise scores to detailed critiques of reasoning and grounding, at inference without retraining. Concurrently, we introduce EQARewardBench, a new benchmark built on OpenEQA for standardized EQA reward model assessment. Demonstrating high sample efficiency, EQA-RM (fine-tuning Qwen2-VL-2B-Instruct) achieves 61.9% accuracy on EQA-RM-Bench with only 700 samples, outperforming strong proprietary baselines, including Gemini-2.5-Flash, GPT-4o, Claude-3.5-Haiku, and open-sourced state-of-the-art models such as RoVRM and VisualPRM. The code and dataset can be found at https://github.com/UNITES-Lab/EQA-RM.

奖励模型(RM)对大模型对齐至关重要,但在具身问答(EQA)等复杂具身任务中尚未得到充分探索;在这类任务中,对智能体的空间、时间和逻辑理解进行细致评估至关重要,而通用方法并未考虑这一点。我们引入了EQA-RM,这是一种专为EQA设计的新型生成式多模态奖励模型,通过我们创新的对比式组相对策略优化(C-GRPO)策略进行训练,以学习细粒度的行为差异。EQA-RM的生成式特性提供了可解释的结构化奖励反馈(不限于简单标量),从而独特地支持测试时缩放,可在推理时无需重新训练即动态调整评估粒度,从简洁的分数到对推理和落地(grounding)的详细评析。同时,我们引入了EQARewardBench,这是一个基于OpenEQA构建的新基准,用于标准化的EQA奖励模型评估。EQA-RM(微调Qwen2-VL-2B-Instruct)展现了很高的样本效率,仅用700个样本就在EQA-RM-Bench上达到61.9%的准确率,优于强大的专有基线(包括Gemini-2.5-Flash、GPT-4o、Claude-3.5-Haiku)以及RoVRM和VisualPRM等开源的最先进模型。代码和数据集可在 https://github.com/UNITES-Lab/EQA-RM 获取。


Article 182

Title@2025-06-12 (4): Measuring Representational Shifts in Continual Learning: A Linear Transformation Perspective

Title: Measuring Representational Shifts in Continual Learning: A Linear Transformation Perspective Messen von Repräsentationsverschiebungen im kontinuierlichen Lernen: Eine lineare Transformationsperspektive 持续学习中的衡量代表性变化:线性转变前景 2505.20970v3

Authors (3): Joonkyu Kim, Yejin Kim, Jy-yong Sohn

In continual learning scenarios, catastrophic forgetting of previously learned tasks is a critical issue, making it essential to effectively measure such forgetting. Recently, there has been growing interest in focusing on representation forgetting, the forgetting measured at the hidden layer. In this paper, we provide the first theoretical analysis of representation forgetting and use this analysis to better understand the behavior of continual learning. First, we introduce a new metric called representation discrepancy, which measures the difference between representation spaces constructed by two snapshots of a model trained through continual learning. We demonstrate that our proposed metric serves as an effective surrogate for the representation forgetting while remaining analytically tractable. Second, through mathematical analysis of our metric, we derive several key findings about the dynamics of representation forgetting: the forgetting occurs more rapidly to a higher degree as the layer index increases, while increasing the width of the network slows down the forgetting process. Third, we support our theoretical findings through experiments on real image datasets, including Split-CIFAR100 and ImageNet1K.

在持续学习的情景中,灾难性地忘记以往学到的任务是一个关键问题,因此,必须有效地衡量如何忘记这些任务。最近,人们越来越关注注重代表的遗忘,在隐蔽层测量的忘记。在本文中,我们首次对代表的遗忘进行了理论分析,并利用这一分析更好地了解持续学习的行为。首先,我们引入了一个新的指标,称为代表差异,以衡量通过持续学习所培训的模型的两张快照所构建的代表性空间之间的差异。我们证明,我们提议的衡量标准是代表的遗忘的有效替代物,同时仍然可以分析地进行。第二,我们通过数学分析我们的衡量标准,得出关于代表的动态的一些关键结论:随着分层指数的增加,遗忘现象会更快地发生,同时提高网络的宽度,减缓了遗忘过程。第三,我们通过对真实图像数据集的实验,包括Splid-CIFAR100和图像网络1K,支持我们的理论发现。


Article 183

Title@2025-06-12 (4): Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning

Title: Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning Hierarchische Latentenfähigkeiten von Sprachmodellen über das kausale Repräsentationslernen entdecken 通过因果代表制学习发现语言模式的分级本端能力 2506.10378v1

Authors (4): Jikai Jin, Vasilis Syrgkanis, Sham Kakade, Hanlin Zhang

Faithful evaluation of language model capabilities is crucial for deriving actionable insights that can inform model development. However, rigorous causal evaluations in this domain face significant methodological challenges, including complex confounding effects and prohibitive computational costs associated with extensive retraining. To tackle these challenges, we propose a causal representation learning framework wherein observed benchmark performance is modeled as a linear transformation of a few latent capability factors. Crucially, these latent factors are identified as causally interrelated after appropriately controlling for the base model as a common confounder. Applying this approach to a comprehensive dataset encompassing over 1500 models evaluated across six benchmarks from the Open LLM Leaderboard, we identify a concise three-node linear causal structure that reliably explains the observed performance variations. Further interpretation of this causal structure provides substantial scientific insights beyond simple numerical rankings: specifically, we reveal a clear causal direction starting from general problem-solving capabilities, advancing through instruction-following proficiency, and culminating in mathematical reasoning ability. Our results underscore the essential role of carefully controlling base model variations during evaluation, a step critical to accurately uncovering the underlying causal relationships among latent model capabilities.

对语言模型能力的忠实评价对于得出可用于模型开发的可操作的洞察力至关重要。然而,这一领域的严格的因果评估面临着重大的方法挑战,包括复杂的混乱效应和与广泛再培训有关的令人难以承受的计算费用。为了应对这些挑战,我们提议了一个因果代表性学习框架,其中所观察到的基准业绩以几个潜在能力因素的线性转变为模型。关键是,这些潜在因素在适当控制基础模型后被确定为因果性关联性,而作为共同的混淆因素。将这一方法应用于一个包含1500多个模型的综合数据集,该数据集由开放 LLM 领头板的六个基准进行评估,我们确定了一个简洁的三点线性因果结构,可靠地解释了观察到的业绩差异。对这一因果结构的进一步解释提供了超出简单数字排序以外的大量科学洞察力:具体地说,我们揭示了一个明确的因果性方向,从一般解决问题的能力出发,通过以熟练的教学方式推进,以数学推理能力为顶峰。我们的结果强调了在评估期间仔细控制基础模型变换的基本作用,这是准确发现潜在模型能力之间潜在因果关系的关键一步。


Article 184

Title@2025-06-12 (4): A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

Title: A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Ein minimalistischer Ansatz zur LLM-Vernunft: von der Abstoßung zur Verstärkung 从拒绝抽样到强化 2504.11343v2

Authors (11): Wei Xiong, Jiarui Yao, Yuhui Xu, Bo Pang, Lei Wang, Doyen Sahoo, Junnan Li, Nan Jiang, Tong Zhang, Caiming Xiong, Hanze Dong

Reinforcement learning (RL) has become a prevailing approach for fine-tuning large language models (LLMs) on complex reasoning tasks. Among recent methods, GRPO stands out for its empirical success in training models such as DeepSeek-R1, yet the sources of its effectiveness remain poorly understood. In this work, we revisit GRPO from a reinforce-like algorithm perspective and analyze its core components. Surprisingly, we find that a simple rejection sampling baseline, RAFT, which trains only on positively rewarded samples, yields performance competitive with GRPO and PPO. Our ablation studies reveal that GRPO’s main advantage arises from discarding prompts with entirely incorrect responses, rather than from its reward normalization. Motivated by this insight, we propose Reinforce-Rej, a minimal extension of policy gradient that filters both entirely incorrect and entirely correct samples. Reinforce-Rej improves KL efficiency and stability, serving as a lightweight yet effective alternative to more complex RL algorithms. We advocate RAFT as a robust and interpretable baseline, and suggest that future advances should focus on more principled designs for incorporating negative samples, rather than relying on them indiscriminately. Our findings provide guidance for future work in reward-based LLM post-training.

加强学习已成为在复杂的推理任务上微调大型语言模型(LLMs)的普遍做法。在最近的方法中,GROP在DeepSeek-R1等培训模型中表现出了成功的经验,但是其效力的来源仍然不甚为人理解。在这项工作中,我们从强化的算法角度重新研究GROP,并分析其核心组成部分。令人惊讶的是,我们发现一个简单的拒绝抽样基准,RAFT,它仅对积极奖励的样本进行培训,比GROPO和PPPO具有竞争性性能。我们的通货膨胀研究表明,GROPO的主要优势来自于完全错误的反应,而不是其奖赏正常化。我们建议加强-Rej,这是政策梯度的最小延伸,可以筛选完全不正确和完全正确的样本。加强-REj提高了KL的效率和稳定性,作为较复杂的RL算法的较轻但有效的替代方法。我们主张RAFT为强有力和可解释的基线,并建议今后的进展应侧重于更有原则性的设计,以纳入负面的样品,而不是依赖我们未来的草本性训练。
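
The two sample-selection rules named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes binary rewards (1 = correct, 0 = incorrect) grouped by prompt, and the function names `raft_filter` and `reinforce_rej_filter` are hypothetical.

```python
from typing import Dict, List


def raft_filter(groups: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """RAFT-style selection: per prompt, keep indices of positively rewarded samples."""
    return {
        prompt: [i for i, reward in enumerate(rewards) if reward == 1]
        for prompt, rewards in groups.items()
    }


def reinforce_rej_filter(groups: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Reinforce-Rej-style selection: drop prompts whose samples are
    all incorrect or all correct, keep the mixed ones unchanged."""
    return {
        prompt: rewards
        for prompt, rewards in groups.items()
        if 0 < sum(rewards) < len(rewards)
    }


# Hypothetical rewards for four sampled responses per prompt.
rewards = {
    "prompt_a": [1, 1, 1, 1],  # entirely correct: filtered by Reinforce-Rej
    "prompt_b": [0, 0, 0, 0],  # entirely incorrect: filtered by Reinforce-Rej
    "prompt_c": [1, 0, 0, 1],  # mixed: kept
}
kept = reinforce_rej_filter(rewards)
positives = raft_filter(rewards)
```

Only the mixed prompt survives the Reinforce-Rej rule, while the RAFT rule keeps a prompt's positive samples and leaves an all-incorrect prompt with nothing to train on.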


Article 185

Title@2025-06-12 (4): Revisiting Transformers with Insights from Image Filtering

Title: Revisiting Transformers with Insights from Image Filtering Übertragen von Transformern mit Einsichten aus der Bildfilterung 以图像过滤透视的洞视重新审视变形器 2506.10371v1

Authors (3): Laziz U. Abdullaev, Maksim Tkachenko, Tan M. Nguyen

The self-attention mechanism, a cornerstone of Transformer-based state-of-the-art deep learning architectures, is largely heuristic-driven and fundamentally challenging to interpret. Establishing a robust theoretical foundation to explain its remarkable success and limitations has therefore become an increasingly prominent focus in recent research. Some notable directions have explored understanding self-attention through the lens of image denoising and nonparametric regression. While promising, existing frameworks still lack a deeper mechanistic interpretation of various architectural components that enhance self-attention, both in its original formulation and subsequent variants. In this work, we aim to advance this understanding by developing a unifying image processing framework, capable of explaining not only the self-attention computation itself but also the role of components such as positional encoding and residual connections, including numerous later variants. We also pinpoint potential distinctions between the two concepts building upon our framework, and make effort to close this gap. We introduce two independent architectural modifications within transformers. While our primary objective is interpretability, we empirically observe that image processing-inspired modifications can also lead to notably improved accuracy and robustness against data contamination and adversaries across language and vision tasks as well as better long sequence understanding.

自我注意机制是以变异器为基础的最先进的深层次学习结构的基石,它在很大程度上是超自然学驱动的,对解释具有根本的挑战性。因此,建立坚实的理论基础来解释其显著的成功和局限性,已成为最近研究的一个日益突出的重点。一些值得注意的方向探索了通过图像脱色和非对称回归的视角来理解自我注意。虽然前景看好,但现有框架仍然缺乏对各种建筑构件的更深层次机械化解释,这些构件在最初的构件和随后的变式中都加强了自我注意。在这项工作中,我们的目标是通过开发一个统一的图像处理框架来推进这种理解,不仅能够解释自我注意计算本身,而且能够解释定位编码和剩余连接等构件的作用,包括许多后来的变式。我们还确定了在我们框架内建立的两个概念之间的潜在区别,并努力缩小这一差距。我们在变异体中引入了两个独立的建筑修饰。虽然我们的首要目标是可解释性,但我们从经验上认为,受启发的修改还能够显著地提高准确性和稳健性,防止数据在语言和愿景上相互交叉和长远理解。


Article 186

Title@2025-06-12 (4): Can We Infer Confidential Properties of Training Data from LLMs?

Title: Can We Infer Confidential Properties of Training Data from LLMs? Können wir vertrauliche Eigenschaften von Trainingsdaten von LLMs ableiten? 我们能否从LLMS中推断培训数据的机密性? 2506.10364v1

Authors (4): Penguin Huang, Chhavi Yadav, Ruihan Wu, Kamalika Chaudhuri

Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets to support applications in fields such as healthcare, finance, and law. These fine-tuning datasets often have sensitive and confidential dataset-level properties – such as patient demographics or disease prevalence – that are not intended to be revealed. While prior work has studied property inference attacks on discriminative models (e.g., image classification models) and generative models (e.g., GANs for image data), it remains unclear if such attacks transfer to LLMs. In this work, we introduce PropInfer, a benchmark task for evaluating property inference in LLMs under two fine-tuning paradigms: question-answering and chat-completion. Built on the ChatDoctor dataset, our benchmark includes a range of property types and task configurations. We further propose two tailored attacks: a prompt-based generation attack and a shadow-model attack leveraging word frequency signals. Empirical evaluations across multiple pretrained LLMs show the success of our attacks, revealing a previously unrecognized vulnerability in LLMs.

大型语言模型(LLMS)日益在特定领域数据集上进行微调,以支持保健、金融和法律等领域的应用;这些微调数据集往往具有敏感和保密的数据集级特性,如病人人口或疾病流行,这些性质并不打算披露;虽然以前的工作研究过对歧视模型(例如图像分类模型)和基因模型(例如图像数据GANs)进行的财产推断攻击,但仍然不清楚这种攻击是否转移到LMS;在这项工作中,我们引入PropInfer,这是在两个微调模式下评估LLMS财产推断的基准任务:回答问题和完成聊天。在ChatDoctor数据集上,我们的基准包括一系列财产类型和任务配置。我们进一步建议了两种有针对性的攻击:即时生成攻击和影子模型攻击,利用文字频率信号。在多个受过训练的LMS中进行的经验性评价显示我们攻击的成功,揭示了LMS中先前未确认的脆弱性。


Article 187

Title@2025-06-12 (4): Permutation-Based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data

Title: Permutation-Based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data Permutationsbasierter Rank-Test in der Gegenwart von Diskretisierung und Anwendung in der kausalen Entdeckung mit gemischten Daten 在与混合数据相混合的Causal Discovery中存在分化和应用 2501.18990v2

Authors (9): Xinshuai Dong, Ignavier Ng, Boyang Sun, Haoyue Dai, Guang-Yuan Hao, Shunxing Fan, Peter Spirtes, Yumou Qiu, Kun Zhang

Recent advances have shown that statistical tests for the rank of cross-covariance matrices play an important role in causal discovery. These rank tests include partial correlation tests as special cases and provide further graphical information about latent variables. Existing rank tests typically assume that all the continuous variables can be perfectly measured, and yet, in practice many variables can only be measured after discretization. For example, in psychometric studies, the continuous level of certain personality dimensions of a person can only be measured after being discretized into order-preserving options such as disagree, neutral, and agree. Motivated by this, we propose Mixed data Permutation-based Rank Test (MPRT), which properly controls the statistical errors even when some or all variables are discretized. Theoretically, we establish the exchangeability and estimate the asymptotic null distribution by permutations; as a consequence, MPRT can effectively control the Type I error in the presence of discretization while previous methods cannot. Empirically, our method is validated by extensive experiments on synthetic data and real-world data to demonstrate its effectiveness as well as applicability in causal discovery.

最近的进展表明,跨变量矩阵等级的统计测试在因果发现中起着重要作用,这些等级测试包括作为特殊案例的部分相关测试,并提供关于潜在变量的进一步图形信息。现有的等级测试通常假定所有连续变量都可以完全测量,但实际上许多变量只能在分解后才能测量。例如,在心理测定研究中,一个人某些个性层面的连续水平只有在分解成不同、中立和同意等按顺序保存选项之后才能测量。我们为此提出混合数据跨位测试(MPRT),即使某些或所有变量是分解的,也能够适当控制统计错误。理论上,我们确定可互换性,并估计通过分解的无损分布;因此,MPRT能够有效控制存在离解时的I型错误,而以前的方法则无法。我们的方法通过对合成数据和真实世界数据的广泛实验加以验证,以证明其有效性和在因果发现中的适用性。
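
The permutation idea behind such tests can be illustrated with a generic example. The sketch below is not MPRT: it is a plain permutation test for dependence between one continuous variable and one discretized variable, using the absolute Pearson correlation as the statistic. The three-level `discretize` thresholds, the sample size, and all names are assumptions made for illustration.

```python
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x: Sequence[float], y: Sequence[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Estimate the null distribution of |r| by shuffling y, which is
    valid when the pairs are exchangeable under independence."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm: List[float] = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def discretize(v: float) -> int:
    """Map a continuous score to ordered options: disagree, neutral, agree."""
    return 0 if v < -0.5 else (1 if v <= 0.5 else 2)


rng = random.Random(1)
latent = [rng.gauss(0.0, 1.0) for _ in range(200)]
x = [z + rng.gauss(0.0, 0.5) for z in latent]              # measured directly
y = [discretize(z + rng.gauss(0.0, 0.5)) for z in latent]  # observed only after discretization
p_dependent = permutation_p_value(x, y)
```

Because shuffling preserves the marginal distribution of the discretized variable, the null distribution is estimated without assuming that the variable is continuous or Gaussian.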


Article 188

Title@2025-06-12 (4): Advanced deep architecture pruning using single filter performance

Title: Advanced deep architecture pruning using single filter performance Fortschrittliche Tiefenarchitektur-Beschneidung mit Single-Filter-Performance 使用单一过滤性能进行高级深高级结构修剪 2501.12880v2

Authors (6): Yarden Tzach, Yuval Meir, Ronit D. Gross, Ofek Tevet, Ella Koresh, Ido Kanter

Pruning the parameters and structure of neural networks reduces the computational complexity, energy consumption, and latency during inference. Recently, a novel underlying mechanism for successful deep learning (DL) was presented, based on a method that quantitatively measures the single filter performance in each layer of a DL architecture and offering a comprehensive account of how deep learning works. This statistical-mechanics-inspired viewpoint makes it possible to reveal the macroscopic behavior of the entire network from the microscopic performance of each filter and the filters' cooperative behavior. Herein, we demonstrate how this understanding paves the path to high quenched dilution of the convolutional layers of deep architectures without affecting their overall accuracy using applied filter cluster connections (AFCC). AFCC is exemplified on VGG-11 and EfficientNet-B0 architectures trained on CIFAR-100, and its high pruning outperforms other techniques using the same pruning magnitude. Additionally, this technique is broadened to single nodal performance and high pruning of fully connected layers, suggesting a possible implementation to considerably reduce the complexity of over-parameterized AI tasks.

保护神经网络的参数和结构可以降低计算复杂性、能源消耗和推论期间的潜伏。最近,根据定量测量DL结构每一层的单一过滤性能的方法,提出了一个新的深层成功学习基本机制(DL),并提出了新的深层学习工作的综合机制。这一统计机制启发的观点能够揭示整个网络的宏观行为,从每个过滤器的微镜性能及其合作行为中揭示出。在这里,我们展示了这一理解如何铺平了深层建筑的快速稀释,而不会影响其总体准确性,使用应用的过滤性集束连接(AFCC),AFCC以VG-11和高效的Net-B0为范例,在CIFAR-100上培训的架构及其高运行率超过使用相同运行强度的其他技术。此外,这一技术已扩大到单一节度性表现和高度连接的层,建议可能实施,以大大降低过度分离的AI任务的复杂性。


Article 189

Title@2025-06-12 (4): Dimension-Independent Kernel ε-Covers

Title: Dimension-Independent Kernel ε-Covers Dimensionsunabhängiger Kernel ε-Covers 独立的内核 - 中央 2306.16516v2

Authors (2): Jeff M. Phillips, Hasan Pourmahmood-Aghababa

We introduce the notion of an $\varepsilon$-cover for a kernel range space. A kernel range space concerns a set of points $X \subset \mathbb{R}^d$ and the space of all queries by a fixed kernel (e.g., a Gaussian kernel $K(p,\cdot) = \exp(-\|p-\cdot\|^2)$, where $p \in \mathbb{R}^d$). For a point set $X$ of size $n$, a query returns a vector of values $R_p \in \mathbb{R}^n$, where the $i$th coordinate is $(R_p)_i = K(p,x_i)$ for $x_i \in X$. An $\varepsilon$-cover is a subset of points $Q \subset \mathbb{R}^d$ such that for any $p \in \mathbb{R}^d$, $\frac{1}{n} \|R_p - R_q\|_1 \leq \varepsilon$ for some $q \in Q$. This is a smooth analog of Haussler’s notion of $\varepsilon$-covers for combinatorial range spaces (e.g., defined by subsets of points within a ball query) where the resulting vectors $R_p$ are in $\{0,1\}^n$ instead of $[0,1]^n$. The kernel versions of these range spaces show up in data analysis tasks where the coordinates may be uncertain or imprecise, and hence one wishes to add some flexibility in the notion of inside and outside of a query range. Our main result is that, unlike combinatorial range spaces, the size of kernel $\varepsilon$-covers is independent of the input size $n$ and dimension $d$. We obtain a bound of $2^{\tilde{O}(1/\varepsilon^2)}$, where $\tilde{O}(f(1/\varepsilon))$ hides log factors in $(1/\varepsilon)$ that can depend on the kernel. This implies that by relaxing the notion of boundaries in range queries, eventually the curse of dimensionality disappears, and may help explain the success of machine learning in very high dimensions. We also complement this result with a lower bound of almost $(1/\varepsilon)^{\Omega(1/\varepsilon)}$, showing the exponential dependence on $1/\varepsilon$ is necessary.
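
The definition above can be checked numerically on a toy instance. The sketch below is an illustration of the definition only, not the paper's construction: it assumes the Gaussian kernel from the abstract, and the point sets `X` and `Q` and all function names are made up.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def gaussian_kernel(p: Point, x: Point) -> float:
    """K(p, x) = exp(-||p - x||^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(p, x)))


def kernel_vector(p: Point, X: Sequence[Point]) -> List[float]:
    """R_p: the vector of kernel values of query p against every x_i in X."""
    return [gaussian_kernel(p, x) for x in X]


def l1_gap(p: Point, q: Point, X: Sequence[Point]) -> float:
    """(1/n) * ||R_p - R_q||_1 for the point set X of size n."""
    rp, rq = kernel_vector(p, X), kernel_vector(q, X)
    return sum(abs(a - b) for a, b in zip(rp, rq)) / len(X)


def is_covered(p: Point, Q: Sequence[Point], X: Sequence[Point], eps: float) -> bool:
    """True if some q in Q approximates query p within eps."""
    return any(l1_gap(p, q, X) <= eps for q in Q)


X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # data points (illustrative)
Q = [(0.0, 0.0), (1.0, 1.0)]              # candidate cover (illustrative)

near = is_covered((0.05, 0.0), Q, X, eps=0.1)   # close to a cover point
far = is_covered((10.0, 10.0), Q, X, eps=0.1)   # far from X and from Q
```

The far query has $R_p \approx 0$, so it is left uncovered here only because `Q` contains no point that is also far from `X`. A single such point would cover every distant query at once, which gives some intuition for why the cover size need not grow with $n$ or $d$.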



Article 190

Title@2025-06-12 (4): Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success

Title: Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success Demonstration von Multi-Suction Item Picking at Scale über Multi-Modal Lernen von Pick-Erfolg 通过多模式学习选取成功模式, 演示通过选取成功多模式学习在缩放中选择的多测试项目 2506.10359v1

Authors (9): Che Wang, Jeroen van Baar, Chaitanya Mitash, Shuai Li, Dylan Randle, Weiyao Wang, Sumedh Sontakke, Kostas E. Bekris, Kapil Katyal

This work demonstrates how autonomously learning aspects of robotic operation from sparsely-labeled, real-world data of deployed, engineered solutions at industrial scale can yield solutions that achieve improved performance. Specifically, it focuses on multi-suction robot picking and performs a comprehensive study on the application of multi-modal visual encoders for predicting the success of candidate robotic picks. Picking diverse items from unstructured piles is an important and challenging task for robot manipulation in real-world settings, such as warehouses. Methods for picking from clutter must work for an open set of items while simultaneously meeting latency constraints to achieve high throughput. The demonstrated approach utilizes multiple input modalities, such as RGB, depth and semantic segmentation, to estimate the quality of candidate multi-suction picks. The strategy is trained from real-world item picking data, with a combination of multimodal pretraining and finetuning. The manuscript provides comprehensive experimental evaluation performed over a large item-picking dataset, an item-picking dataset targeted to include partial occlusions, and a package-picking dataset, which focuses on containers, such as boxes and envelopes, instead of unpackaged items. The evaluation measures performance for different item configurations, pick scenes, and object types. Ablations help to understand the effects of in-domain pretraining, the impact of different modalities and the importance of finetuning. These ablations reveal not only the importance of training over multiple modalities but also the ability of models to learn the relationship between modalities during pretraining, so that during finetuning and inference only a subset of them can be used as input.

这项工作展示了机器人操作的自发学习方面如何从低标签、真实世界的部署数据、工业规模的工程解决方案中自主学习机器人操作的方方面面能够提供提高性能的解决方案。 具体地说, 它侧重于多制制式机器人选取, 并全面研究多式视觉解析器用于预测候选机器人选取的成功。 从非结构化的堆叠中选取各种物品, 是一个重要而具有挑战性的任务, 在现实世界环境中, 如仓库, 机器人操作中选取。 从杂交中选取数据的方法必须用于一套开放的物品, 同时满足延缓限制, 以达到高通量。 演示的方法利用多种输入模式, 如 RGB、 深度和 语义分解分解, 来估计候选多式视觉选取器的质量。


Article 191

Title@2025-06-12 (4): TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree

Title: TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree TreeLoRA: Effizientes kontinuierliches Lernen über schichtweise LoRAs, geleitet von einem Hierarchischen Gradienten-Similaritätsbaum TreeLORA:通过层-维兹 LoRAs 有效持续学习,由等级梯度梯度- 硅化树指导 2506.10355v1

Authors (5): Yu-Yang Qian, Yuan-Ze Xu, Zhen-Yu Zhang, Peng Zhao, Zhi-Hua Zhou

Many real-world applications collect data in a streaming environment, where learning tasks are encountered sequentially. This necessitates continual learning (CL) to update models online, enabling adaptation to new tasks while preserving past knowledge to prevent catastrophic forgetting. Nowadays, with the flourishing of large pre-trained models (LPMs), efficiency has become increasingly critical for CL, due to their substantial computational demands and growing parameter sizes. In this paper, we introduce TreeLoRA (K-D Tree of Low-Rank Adapters), a novel approach that constructs layer-wise adapters by leveraging hierarchical gradient similarity to enable efficient CL, particularly for LPMs. To reduce the computational burden of task similarity estimation, we employ bandit techniques to develop an algorithm based on lower confidence bounds to efficiently explore the task structure. Furthermore, we use sparse gradient updates to facilitate parameter optimization, making the approach better suited for LPMs. Theoretical analysis is provided to justify the rationale behind our approach, and experiments on both vision transformers (ViTs) and large language models (LLMs) demonstrate the effectiveness and efficiency of our approach across various domains, including vision and natural language processing tasks.

许多现实世界应用程序在不断流出的环境中收集数据,学习任务依次相遇。这需要不断学习(CL),在网上更新模型,使适应新任务,同时保留过去的知识,防止灾难性的遗忘。如今,随着大量预先培训的模型(LPMs)的蓬勃发展,效率对于CL越来越重要,因为它们有大量的计算需求,而且参数大小不断增大。在本文中,我们引入了TrealoRA(K-D 低兰克适应器树),这是一种新颖的方法,它利用等级梯度相似的等级梯度来构建多层次适应器,从而使得高效率的CLLL,特别是LPMs。为了减少任务相似性估算的计算负担,我们使用了土匪技术来开发一种基于较低信任度的算法,以便有效地探索任务结构。此外,我们使用稀有的梯度更新法来方便参数的优化,使LPMs更适合LPMs。我们的方法有合理性,并且对视觉变异器和大语言模型进行实验,以展示我们的方法的效益和效率,包括视觉和自然语言处理任务。


Article 192

Title@2025-06-12 (4): History-Aware Neural Operator: Robust Data-Driven Constitutive Modeling of Path-Dependent Materials

Title: History-Aware Neural Operator: Robust Data-Driven Constitutive Modeling of Path-Dependent Materials History-Aware Neuronal Operator: Robuste datengetriebene konstitutive Modellierung von pfadabhängigen Materialien 历史软件神经操作器:路径依赖材料的强有力数据驱动建模模型 2506.10352v1

Authors (3): Binyao Guo, Zihan Lin, QiZhi He

This study presents an end-to-end learning framework for data-driven modeling of path-dependent inelastic materials using neural operators. The framework is built on the premise that irreversible evolution of material responses, governed by hidden dynamics, can be inferred from observable data. We develop the History-Aware Neural Operator (HANO), an autoregressive model that predicts path-dependent material responses from short segments of recent strain-stress history without relying on hidden state variables, thereby overcoming self-consistency issues commonly encountered in recurrent neural network (RNN)-based models. Built on a Fourier-based neural operator backbone, HANO enables discretization-invariant learning. To enhance its ability to capture both global loading patterns and critical local path dependencies, we embed a hierarchical self-attention mechanism that facilitates multiscale feature extraction. Beyond ensuring self-consistency, HANO mitigates sensitivity to initial hidden states, a commonly overlooked issue that can lead to instability in recurrent models when applied to generalized loading paths. By modeling stress-strain evolution as a continuous operator rather than relying on fixed input-output mappings, HANO naturally accommodates varying path discretizations and exhibits robust performance under complex conditions, including irregular sampling, multi-cycle loading, noisy data, and pre-stressed states. We evaluate HANO on two benchmark problems: elastoplasticity with hardening and progressive anisotropic damage in brittle solids. Results show that HANO consistently outperforms baseline models in predictive accuracy, generalization, and robustness. With its demonstrated capabilities, HANO provides an effective data-driven surrogate for simulating inelastic materials and is well-suited for integration with classical numerical solvers.

本研究为利用神经操作员对依赖路径的弹性材料进行数据驱动建模提供了一个端到端学习的框架。框架的前提是,从可见的数据中可以推断出由隐藏的动态调节的材料反应不可逆转的演变。我们开发了历史软件神经操作员(HANO),这是一个自动递减模型,它预测从最近紧张历史的短片中产生基于路径的材料反应,而不必依赖隐藏的状态变量,从而克服经常的内向型网络(RNNN)基于神经运行的模型中常见的自我一致性问题。在基于四流的神经操作员骨干上,HANO能够让不易的内向性学习。为了提高它捕捉全球装货模式和关键的本地路径依赖的能力,我们建立了有利于多级特征提取的等级自控机制。除了确保自身的一致性之外,HANO还减轻了对初始隐藏状态的敏感度,一个通常被忽视的问题,在应用通用的内装能力时导致经常模型的不稳定性。


Article 193

Title@2025-06-12 (4): PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications

Title: PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications PhysNav-DG: Ein neuartiges adaptives Framework für robuste VLM-Sensor-Fusion in Navigationsanwendungen PhysNav-DG:导航应用中的强力VLM-传感器融合新适应框架 2505.01881v2

Authors (2): Trisanth Srinivasan, Santosh Patapati

Robust navigation in diverse environments and domains requires both accurate state estimation and transparent decision making. We present PhysNav-DG, a novel framework that integrates classical sensor fusion with the semantic power of vision-language models. Our dual-branch architecture predicts navigation actions from multi-sensor inputs while simultaneously generating detailed chain-of-thought explanations. A modified Adaptive Kalman Filter dynamically adjusts its noise parameters based on environmental context. It leverages several streams of raw sensor data along with semantic insights from models such as LLaMA 3.2 11B and BLIP-2. To evaluate our approach, we introduce the MD-NEX Benchmark, a novel multi-domain dataset that unifies indoor navigation, autonomous driving, and social navigation tasks with ground-truth actions and human-validated explanations. Extensive experiments and ablations show that PhysNav-DG improves navigation success rates by over 20% and achieves high efficiency, with explanations that are both highly grounded and clear. This work connects high-level semantic reasoning and geometric planning for safer and more trustworthy autonomous systems.

在不同环境和领域进行强力导航既需要准确的状态估计,也需要透明的决策。 我们展示了PhysNav-DG, 这个将古典传感器与视觉语言模型的语义力融合在一起的新框架。 我们的双部门架构从多传感器投入中预测导航行动,同时产生详细的思维链解释。 一个修改过的适应性卡尔曼过滤器根据环境背景动态调整其噪音参数。 它利用了多种原始传感器数据流以及LalaMA 3. 2 11B 和 BLIP-2 等模型的语义洞察。 为了评估我们的方法,我们引入了MD-NEX基准,这是一套新颖的多维数据集,它以地面导航、自主驾驶和社会导航动作和人性解释为一体。 广泛的实验和推理显示PhysNav-DG将导航成功率提高20%以上,并实现了高效率,其解释基础和清晰。 这项工作将高层次的语义推理推理和几何规划连接了更安全和更可靠的自主系统。
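
To make "dynamically adjusts its noise parameters based on environmental context" concrete, here is a minimal scalar sketch. It is not the paper's modified filter: it assumes a one-dimensional random-walk state, and `context_scale` is a hypothetical stand-in for whatever semantic signal inflates or deflates the measurement noise.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    x: float       # state estimate
    p: float       # variance of the state estimate
    q: float       # process noise variance
    r_base: float  # nominal measurement noise variance

    def step(self, z: float, context_scale: float = 1.0) -> float:
        """Fuse measurement z and return the Kalman gain that was used.

        context_scale > 1 means the context suggests the sensor is less
        reliable, so the measurement noise is inflated for this step.
        """
        self.p += self.q                  # predict (random-walk model)
        r = self.r_base * context_scale   # context-adapted measurement noise
        k = self.p / (self.p + r)         # Kalman gain
        self.x += k * (z - self.x)        # correct the estimate
        self.p *= 1.0 - k                 # shrink the variance
        return k


clear = ScalarKalman(x=0.0, p=1.0, q=0.01, r_base=0.1)
foggy = ScalarKalman(x=0.0, p=1.0, q=0.01, r_base=0.1)
k_clear = clear.step(z=1.0, context_scale=1.0)
k_foggy = foggy.step(z=1.0, context_scale=10.0)
```

With the same measurement, the filter that is told the context is unreliable applies a smaller gain and moves its estimate less toward the reading.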


Article 194

Title@2025-06-12 (4): Enhancing Glass Defect Detection with Diffusion Models: Addressing Imbalanced Datasets in Manufacturing Quality Control

Title: Enhancing Glass Defect Detection with Diffusion Models: Addressing Imbalanced Datasets in Manufacturing Quality Control Verbesserung der Glasdefekterkennung mit Diffusionsmodellen: Adressierung unausgewogener Datensätze in der Fertigungsqualitätskontrolle 利用传播模型加强玻璃破损检测:在制造业质量控制中解决数据集不平衡问题 2505.03134v2

Authors (3): Sajjad Rezvani Boroujeni, Hossein Abedi, Tom Bush

Visual defect detection in industrial glass manufacturing remains a critical challenge due to the low frequency of defective products, leading to imbalanced datasets that limit the performance of deep learning models and computer vision systems. This paper presents a novel approach using Denoising Diffusion Probabilistic Models (DDPMs) to generate synthetic defective glass product images for data augmentation, effectively addressing class imbalance issues in manufacturing quality control and automated visual inspection. The methodology significantly enhances image classification performance of standard CNN architectures (ResNet50V2, EfficientNetB0, and MobileNetV2) in detecting anomalies by increasing the minority class representation. Experimental results demonstrate substantial improvements in key machine learning metrics, particularly in recall for defective samples across all tested deep neural network architectures while maintaining perfect precision. The most dramatic improvement was observed in ResNet50V2’s overall classification accuracy, which increased from seventy-eight percent to ninety-three percent when trained with the augmented data. This work provides a scalable, cost effective approach to enhancing automated defect detection in glass manufacturing that can potentially be extended to other industrial quality assurance systems and industries with similar class imbalance challenges.

工业玻璃制造中的视觉缺陷检测由于有缺陷产品的频率低,导致数据集失衡,限制了深层学习模型和计算机视觉系统的性能,因此,在工业玻璃制造中发现有缺陷仍然是一项重大挑战。本文介绍了一种新颖的方法,使用Denoising Difmission 概率模型(DDPMs)生成合成有缺陷的玻璃产品图像,用于数据增强,有效解决制造质量控制和自动视觉检查中的阶级不平衡问题。该方法通过增加少数阶层的代表性,大大提高了有线电视新闻新闻网标准结构(ResNet50V2, effectiveNetB0, 和 MovedNetV2)在发现反常现象方面的图像分类性能。实验结果表明,关键机器学习指标有重大改进,特别是在所有经过测试的深神经网络结构中召回有缺陷的样本,同时保持完全精确性。在ResNet50V2的总体分类精确度中观察到了最显著的改进,在接受强化数据培训后,从78%增加到93%。这项工作为改进玻璃制造中的自动缺陷检测提供了一种可扩展到其他工业质量保证系统和有类似阶级不平衡挑战的可推广的可推广的、成本有效的方法。


Article 195

Title@2025-06-12 (4): Decoding for Punctured Convolutional and Turbo Codes: A Deep Learning Solution for Protocols Compliance

Title: Decoding for Punctured Convolutional and Turbo Codes: A Deep Learning Solution for Protocols Compliance Dekodierung für punktierte Konvolutions- und Turbo-Codes: Eine Deep-Learning-Lösung für Protokolle Compliance 钻穿的革命代码和涡轮代码解码:遵守议定书的深学习解决方案 2502.15475v2

Authors (2): Yongli Yan, Linglong Dai

Neural network-based decoding methods have shown promise in enhancing error correction performance, but traditional approaches struggle with the challenges posed by punctured codes. In particular, these methods fail to address the complexities of variable code rates and the need for protocol compatibility. This paper presents a unified Long Short-Term Memory (LSTM)-based decoding architecture specifically designed to overcome these challenges. The proposed method unifies punctured convolutional and Turbo codes. A puncture embedding mechanism integrates puncturing patterns directly into the network, enabling seamless adaptation to varying code rates, while balanced bit error rate training ensures robustness across different code lengths, rates, and channels, maintaining protocol flexibility. Extensive simulations in Additive White Gaussian Noise and Rayleigh fading channels demonstrate that the proposed approach outperforms conventional decoding techniques, providing significant improvements in decoding accuracy and robustness. These results underscore the potential of LSTM-based decoding as a promising solution for next-generation artificial intelligence powered communication systems.

以神经网络为基础的解码方法在提高错误校正性能方面显示出了希望,但传统方法在克服被刺破的代码所带来的挑战方面挣扎,特别是,这些方法未能解决可变代码率的复杂性和对协议兼容性的需求。本文件介绍了一个专门为克服这些挑战而设计的基于统一长期内存(LSTM)的解码结构。拟议方法统一了穿透性脉冲和涡轮编码。一个穿刺嵌入机制将刺破模式直接纳入网络,使得能够无缝地适应不同的代码率,而平衡的位数错误率培训确保了不同代码长度、费率和渠道的稳健性,并保持了协议的灵活性。在Additive White Gaussian Noise和Rayleigh 的大规模模拟表明,拟议方法超越了常规解码技术,大大改进了解码准确性和稳健性。这些结果突出表明了基于LSTM解码的解码作为下一代人工智能通信系统有希望的解决办法的潜力。
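
For readers unfamiliar with puncturing, the sketch below shows how a puncturing pattern raises the code rate and how a receiver can mark the deleted positions before decoding. It is a generic illustration, not the paper's puncture embedding: the coded bits and the pattern are made up, and erasures are represented as `None`.

```python
from typing import List, Optional, Sequence


def puncture(coded: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Transmit only the coded bits whose (cyclic) pattern entry is 1."""
    return [bit for i, bit in enumerate(coded) if pattern[i % len(pattern)] == 1]


def depuncture(received: Sequence[int], pattern: Sequence[int],
               n_coded: int) -> List[Optional[int]]:
    """Re-insert erasures (None) at the punctured positions."""
    stream = iter(received)
    return [
        next(stream) if pattern[i % len(pattern)] == 1 else None
        for i in range(n_coded)
    ]


# Assumed example: 4 information bits encoded at rate 1/2 into 8 coded bits.
coded = [1, 0, 1, 1, 0, 0, 1, 0]
pattern = [1, 1, 1, 0]  # keep 3 of every 4 coded bits, so 4 info bits use 6 sent bits (rate 2/3)

sent = puncture(coded, pattern)
restored = depuncture(sent, pattern, len(coded))
```

A decoder that supports several code rates has to know which positions were erased, which is why the pattern itself is useful side information.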


Article 196

Title@2025-06-12 (4): PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation

Title: PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation PhysioWave: Multi-Scale Wavelet-Transformer für Physiologische Signaldarstellung PhysioWave: 生理信号代表的多阶段波盘转换器 2506.10351v1

Authors (6): Yanlong Chen, Mattia Orlandi, Pierangelo Maria Rapa, Simone Benatti, Luca Benini, Yawei Li

Physiological signals are often corrupted by motion artifacts, baseline drift, and other low-SNR disturbances, which pose significant challenges for analysis. Additionally, these signals exhibit strong non-stationarity, with sharp peaks and abrupt changes that evolve continuously, making them difficult to represent using traditional time-domain or filtering methods. To address these issues, a novel wavelet-based approach for physiological signal analysis is presented, aiming to capture multi-scale time-frequency features in various physiological signals. Leveraging this technique, two large-scale pretrained models specific to EMG and ECG are introduced for the first time, achieving superior performance and setting new baselines in downstream tasks. Additionally, a unified multi-modal framework is constructed by integrating a pretrained EEG model, where each modality is guided through its dedicated branch and fused via learnable weighted fusion. This design effectively addresses challenges such as low signal-to-noise ratio, high inter-subject variability, and device mismatch, outperforming existing methods on multi-modal tasks. The proposed wavelet-based architecture lays a solid foundation for analysis of diverse physiological signals, while the multi-modal design points to next-generation physiological signal processing with potential impact on wearable health monitoring, clinical diagnostics, and broader biomedical applications.

此外,这些信号表现出强烈的非常态性,其峰值和突变不断变化,使得它们难以代表使用传统的时空或过滤方法。为了解决这些问题,提出了一种新的基于波盘的生理信号分析方法,旨在捕捉各种生理信号中的多尺度时间频率特征。利用这一技术,首次引入了针对环境管理集团和ECG的两种大规模预先培训的模型,实现了优异性,为下游任务设定了新的基线。此外,通过整合预先培训的EEEG模型,构建了一个统一的多模式框架,其中每种模式都通过其专用分支提供指导,并通过可学习的加权聚合而结合。这一设计有效地应对了诸如信号到噪音比率低、高跨质变异和装置不匹配等挑战,超过了关于多模式任务的现有方法。拟议的波盘基结构为分析多种生理信号奠定了坚实的基础,同时在多模式、更广泛的临床诊断性诊断性应用方面,同时将多模式设计点用于后期的生理影响处理。


Article 197

Title@2025-06-12 (4): LightKG: Efficient Knowledge-Aware Recommendations with Simplified GNN Architecture

Title: LightKG: Efficient Knowledge-Aware Recommendations with Simplified GNN Architecture LightKG: Effiziente wissensbasierte Empfehlungen mit vereinfachter GNN-Architektur LightKG: 具有简化 GNN 架构的高效知识软件建议 2506.10347v1

Authors (5): Yanhui Li, Dongxia Wang, Zhu Sun, Haonan Zhang, Huizhong Guo

Recently, Graph Neural Networks (GNNs) have become the dominant approach for Knowledge Graph-aware Recommender Systems (KGRSs) due to their proven effectiveness. Building upon GNN-based KGRSs, Self-Supervised Learning (SSL) has been incorporated to address the sparsity issue, leading to longer training time. However, through extensive experiments, we reveal that: (1) compared to other KGRSs, the existing GNN-based KGRSs fail to keep their superior performance under sparse interactions even with SSL; (2) more complex models tend to perform worse in sparse interaction scenarios, and complex mechanisms, such as attention, can be detrimental as they often increase learning difficulty. Inspired by these findings, we propose LightKG, a simple yet powerful GNN-based KGRS to address sparsity issues. LightKG includes a simplified GNN layer that encodes directed relations as scalar pairs rather than dense embeddings and employs a linear aggregation framework, greatly reducing the complexity of GNNs. Additionally, LightKG incorporates an efficient contrastive layer to implement SSL. It directly minimizes the node similarity in the original graph, avoiding the time-consuming subgraph generation and comparison required in previous SSL methods. Experiments on four benchmark datasets show that LightKG outperforms 12 competitive KGRSs in both sparse and dense scenarios while significantly reducing training time. Specifically, it surpasses the best baselines by an average of 5.8% in recommendation accuracy and saves 84.3% of training time compared to KGRSs with SSL. Our code is available at https://github.com/1371149/LightKG.

最近,图形神经网络(GNN)因其证明的有效性而成为了知识图形显示建议系统(KGRS)的主导方法。在基于GNN的KGRS、自增强学习系统(SSL)的基础上,已经纳入一个简单而强大的GNNKGS,以解决模糊性问题,导致培训时间延长。然而,通过广泛的实验,我们发现:(1)与其他KGRS相比,现有的GNNS的GGGGS没有在即使与SSL的微薄互动中保持其优异性。(2) 更复杂的模型往往在稀少的互动假设和复杂机制(如关注机制)中表现得更差,因为它们往往会增加学习困难。根据这些发现,我们提出了LightKG,这是一个简单但又强大的GNNNGG系统,以解决模糊性问题。 LightKG包括一个简化的GNNNS图层,该图将关系作为卡路配方而不是密集的嵌入,并采用线性汇总框架,大大降低GNNSNS的复杂度。此外,LKG在SLV的高效对比层结构中,将SLSLS的节级培训比时间与S的精确时间比。在以往的G中将SBERG标准中直接显示SBRRRR的模型中,将SBR的原始数据格式缩小了。
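
The "scalar pairs plus linear aggregation" idea in the LightKG abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function name, the `(forward, backward)` layout of the scalar pair, and the degree normalization are hypothetical and are not taken from the LightKG code base.

```python
def scalar_relation_layer(embeddings, triples, relation_scalars):
    """One linear aggregation step over (head, relation, tail) triples.

    relation_scalars[r] is a hypothetical pair (forward, backward):
    `forward` scales the message sent head -> tail and `backward`
    scales the message sent tail -> head. No dense relation embedding
    and no nonlinearity is used.
    """
    dim = len(next(iter(embeddings.values())))
    sums = {node: [0.0] * dim for node in embeddings}
    degree = {node: 0 for node in embeddings}
    for head, relation, tail in triples:
        forward, backward = relation_scalars[relation]
        for j in range(dim):
            sums[tail][j] += forward * embeddings[head][j]
            sums[head][j] += backward * embeddings[tail][j]
        degree[head] += 1
        degree[tail] += 1
    # Average the incoming messages; isolated nodes keep a zero vector.
    return {
        node: [value / max(degree[node], 1) for value in sums[node]]
        for node in embeddings
    }


embeddings = {"user": [1.0, 0.0], "item": [0.0, 1.0]}
triples = [("user", "likes", "item")]
updated = scalar_relation_layer(embeddings, triples, {"likes": (0.5, 2.0)})
```

Because each relation contributes only two scalars, the parameter count per relation is constant regardless of the embedding dimension, which is one plausible reading of the complexity reduction the abstract describes.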


Article 198

Title@2025-06-12 (4): Technical Report with Proofs for A Full Picture in Conformance Checking: Efficiently Summarizing All Optimal Alignments

Title: Technical Report with Proofs for A Full Picture in Conformance Checking: Efficiently Summarizing All Optimal Alignments Technischer Bericht mit Beweisen für ein vollständiges Bild in Übereinstimmungsprüfung: Alle optimalen Ausrichtungen effizient zusammenfassen 技术报告,附有符合要求检查中完整图片的校对证明:高效地对齐所有最佳对齐 2506.10345v1

Authors (3): Philipp Bär, Moe T. Wynn, Sander J. J. Leemans

This technical report provides proofs for the claims in the paper “A Full Picture in Conformance Checking: Efficiently Summarizing All Optimal Alignments”.

本技术报告为“符合要求的完整图片检查:有效概括所有最佳调整”文件中的索赔提供了证据。


Article 199

Title@2025-06-12 (4): Provably Learning from Language Feedback

Title: Provably Learning from Language Feedback Wahrscheinlich von Sprachfeedback lernen 从语言反馈中学习 2506.10341v1

Authors (6): Wanqiao Xu, Allen Nie, Ruijie Zheng, Aditya Modi, Adith Swaminathan, Ching-An Cheng

Interactively learning from observation and language feedback is an increasingly studied area driven by the emergence of large language model (LLM) agents. While impressive empirical demonstrations have been shown, so far a principled framing of these decision problems remains lacking. In this paper, we formalize the Learning from Language Feedback (LLF) problem, assert sufficient assumptions to enable learning despite latent rewards, and introduce $\textit{transfer eluder dimension}$ as a complexity measure to characterize the hardness of LLF problems. We show that transfer eluder dimension captures the intuition that information in the feedback changes the learning complexity of the LLF problem. We demonstrate cases where learning from rich language feedback can be exponentially faster than learning from reward. We develop a no-regret algorithm, called $\texttt{HELiX}$, that provably solves LLF problems through sequential interactions, with performance guarantees that scale with the transfer eluder dimension of the problem. Across several empirical domains, we show that $\texttt{HELiX}$ performs well even when repeatedly prompting LLMs does not work reliably. Our contributions mark a first step towards designing principled interactive learning algorithms from generic language feedback.

从观察和语言反馈中相互学习是一个日益受大型语言模式(LLM)代理商的出现所驱动的研究领域。虽然已经展示了令人印象深刻的经验示范,但迄今为止仍然缺乏对这些决定问题的有原则框架。在本文中,我们将语言反馈(LLLF)问题正式化,提出足够的假设,使学习成为可能得到的回报,并引入$\textit{transfer eluder situle} $,作为衡量LLF问题难度的复杂尺度。我们表明,传输私利的维度能够捕捉到反馈中的信息改变LLLF问题学习复杂性的直觉。我们展示了从丰富的语言反馈中学习的速度可能比从奖励中学习的速度快得多的案例。我们开发了一个名为$\ textt{HELX}的无约束算法,这通过连续的相互作用可以解决LLLF问题,而绩效保证与LF问题的转移 Eluder 维度的规模。在几个经验领域,我们显示, $\ textt{HELX} 能够很好地完成LLMS的学习问题的复杂性。我们的贡献标志着设计从通用语言的交互式反射法的第一步。


Article 200

Title@2025-06-12 (4): Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Towards Privacy-Preserving Distributed Intelligence in AR/VR/MR

Title: Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Towards Privacy-Preserving Distributed Intelligence in AR/VR/MR Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Auf dem Weg zu datenschutzerhaltender verteilter Intelligenz in AR/VR/MR 多模式、多模式、多任务、多任务联邦基金会下一代扩展现实系统模式:争取在AR/VR/MR/MR中发布隐私保护分配情报 2506.05683v3

Authors (5): Fardis Nadimi, Payam Abdisarabshali, Kasra Borazjani, Jacob Chakareski, Seyyedali Hosseinalipour

Extended reality (XR) systems, which consist of virtual reality (VR), augmented reality (AR), and mixed reality (MR), offer a transformative interface for immersive, multi-modal, and embodied human-computer interaction. In this paper, we envision that multi-modal multi-task (M3T) federated foundation models (FedFMs) can offer transformative capabilities for XR systems through integrating the representational strength of M3T foundation models (FMs) with the privacy-preserving model training principles of federated learning (FL). We present a modular architecture for FedFMs, which entails different coordination paradigms for model training and aggregations. Central to our vision is the codification of XR challenges that affect the implementation of FedFMs under the SHIFT dimensions: (1) Sensor and modality diversity, (2) Hardware heterogeneity and system-level constraints, (3) Interactivity and embodied personalization, (4) Functional/task variability, and (5) Temporality and environmental variability. We illustrate the manifestation of these dimensions across a set of emerging and anticipated applications of XR systems. Finally, we propose evaluation metrics, dataset requirements, and design tradeoffs necessary for the development of resource-aware FedFMs in XR. This perspective aims to chart the technical and conceptual foundations for context-aware privacy-preserving intelligence in the next generation of XR systems.

由虚拟现实(VR)、强化现实(AR)和混合现实(XR)组成的扩展现实(XR)系统为隐性、多模式和体现人的计算机互动提供了一个变革界面,在本文件中,我们设想,多模式多任务(M3T)联合基础模型(FedFMs)能够通过将M3T基础模型(FMs)的代表性强度和体现的个人化、(4)功能/任务变异性以及(5)时间和环境变异性相结合,为XRFM提供一个模块架构,这需要为模型培训和聚合提供不同的协调模式。我们愿景的核心是编纂影响SHIFT维度下FMS执行的XR挑战:(1) 传感器和模式多样性,(2) 硬软件异质和系统级限制,(3) 互动和体现个人化,(4) 功能/任务变异性,以及(5) 时间和环境变异性。我们展示了这些层面在XR系统一系列新兴和预期的应用中的表现。我们愿景的核心是:在SHIFFFM系统下,为下一个数据库设计、数据格式设计目标,我们提议了XFM格式设计技术图表的下一个格式要求。


Article 201

Title@2025-06-12 (4): On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention

Title: On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention On-the-Fly Adaptive Destillation von Transformer auf Dual-State Lineare Aufmerksamkeit 向两国线性注意对变压器进行实时调整性蒸馏 2506.09316v2

Authors (5): Yeonju Ro, Zhenyu Zhang, Souvik Kundu, Zhangyang Wang, Aditya Akella

Large language models (LLMs) excel at capturing global token dependencies via self-attention but face prohibitive compute and memory costs on lengthy inputs. While sub-quadratic methods (e.g., linear attention) can reduce these costs, they often degrade accuracy due to overemphasizing recent tokens. In this work, we first propose dual-state linear attention (DSLA), a novel design that maintains two specialized hidden states, one for preserving historical context and one for tracking recency, thereby mitigating the short-range bias typical of linear-attention architectures. To further balance efficiency and accuracy under dynamic workload conditions, we introduce DSLA-Serve, an online adaptive distillation framework that progressively replaces Transformer layers with DSLA layers at inference time, guided by a sensitivity-based layer ordering. DSLA-Serve uses a chained fine-tuning strategy to ensure that each newly converted DSLA layer remains consistent with previously replaced layers, preserving the overall quality. Extensive evaluations on commonsense reasoning, long-context QA, and text summarization demonstrate that DSLA-Serve yields 2.3x faster inference than Llama2-7B and 3.0x faster than the hybrid Zamba-7B, while retaining comparable performance across downstream tasks. Our ablation studies show that DSLA’s dual states capture both global and local dependencies, addressing the historical-token underrepresentation seen in prior linear-attention methods. Code is available at https://github.com/utnslab/DSLA-Serve.

大型语言模型(LLMS)擅长通过自我注意捕捉全球象征性依赖性,但在冗长投入下却面临着令人望而却步的计算和记忆成本。尽管次二次计算方法(如线性关注)可以降低这些费用,但由于过度强调最近的一些象征性,这些方法往往会降低准确性。在这项工作中,我们首先提议双州线性关注(DSLA),这是保持两种保存历史背景的专门隐蔽状态的新设计,也是追踪反应的新型设计,从而减轻线性关注结构典型的短距离偏差。为了在动态工作量条件下进一步平衡效率和准确性,我们引入DSLA-Serve,这是一个在线适应性蒸馏框架,在基于敏感性的层次下,以DSLA-Servoration层取代变异的变异层。 DSLA-Servervey 使用一个链式的微调战略,以确保每个新转换的DSLA层与先前更换的层保持一致,从而保持总体质量。对通用逻辑推理学、长直线性QA和文字平衡进行广泛的评价,我们在LSLSLA-ServiewServiewSDSLx公司在LA-SLA-ServiewSLSLxSBSerview中比SLSBSBSBServiax前的成绩上更快更快的成绩上更快。
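
The two-hidden-state idea from the DSLA abstract can be sketched as a recurrent linear attention with one slowly decaying state and one quickly decaying state. This is a minimal sketch under stated assumptions: the decay constants, the fixed `mix` coefficient, and the plain outer-product update are hypothetical choices for illustration, not the update rule used by DSLA.

```python
def dual_state_linear_attention(queries, keys, values,
                                slow_decay=0.99, fast_decay=0.5, mix=0.5):
    """Causal linear attention with two running key-value states.

    `history` decays slowly and retains old tokens; `recency` decays
    quickly and tracks recent tokens. Memory is O(d_k * d_v) and does
    not grow with sequence length.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    history = [[0.0] * d_v for _ in range(d_k)]
    recency = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        for i in range(d_k):
            for j in range(d_v):
                update = k[i] * v[j]  # outer product k v^T
                history[i][j] = slow_decay * history[i][j] + update
                recency[i][j] = fast_decay * recency[i][j] + update
        # Read out with the query from a fixed blend of both states.
        outputs.append([
            sum(q[i] * (mix * history[i][j] + (1.0 - mix) * recency[i][j])
                for i in range(d_k))
            for j in range(d_v)
        ])
    return outputs


outputs = dual_state_linear_attention(
    queries=[[1.0], [1.0]], keys=[[1.0], [1.0]], values=[[2.0], [0.0]]
)
```

In the second step of this toy input, the slow state still holds 0.99 of the first token's contribution while the fast state holds only 0.5, which is the kind of separation between historical context and recency that the abstract describes.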


Article 202

Title@2025-06-12 (4): Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Title: Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs Amulett: Neuausrichtung während der Testzeit für Personalisierte Präferenzanpassung von LLMs 缩略图:在试验期间重新对准,以适应LLMM的个性化偏好 2502.19148v3

Authors (8): Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang

How to align large language models (LLMs) with user preferences from a static general dataset has been frequently studied. However, user preferences are usually personalized, changing, and diverse regarding culture, values, or time. This leads to the problem that the actual user preferences often do not coincide with those trained by the model developers in the practical use of LLMs. Since we cannot collect enough data and retrain for every demand, researching efficient real-time preference adaptation methods based on the backbone LLMs during test time is important. To this end, we introduce Amulet, a novel, training-free framework that formulates the decoding process of every token as a separate online learning problem with the guidance of simple user-provided prompts, thus enabling real-time optimization to satisfy users’ personalized preferences. To reduce the computational cost brought by this optimization process for each token, we additionally provide a closed-form solution for each iteration step of the optimization process, thereby reducing the computational time cost to a negligible level. The detailed experimental results demonstrate that Amulet can achieve significant performance improvements in rich settings with combinations of different LLMs, datasets, and user preferences, while maintaining acceptable computational efficiency.

经常研究如何使大型语言模型(LLMS)与静态一般数据集的用户偏好相一致,然而,用户偏好通常是个性化的、变化的,在文化、价值或时间方面是多种多样的,这导致一个问题,即实际用户偏好往往与模型开发者在实际使用LLMs方面所培训的偏好不相吻合。由于我们无法为每项需求收集足够的数据和再培训,因此在测试期间根据主干LLMS研究有效的实时偏好适应方法十分重要。为此,我们引入了Amulet,这是一个创新的、无培训的框架,它以简单的用户提供的提示为指导,将每个符号的解码进程作为一个单独的在线学习问题,从而使得实时优化能够满足用户的个人偏好。为了减少这种优化进程为每种符号带来的计算成本,我们为优化过程的每个迭代步骤提供额外的封闭式解决方案,从而将计算时间成本降至微不足道的水平。我们详细的实验结果表明,Amulett可以在丰富的环境中实现显著的性改进,同时将不同的LMS、数据设置和用户偏好,同时保持可接受的效率。


Article 203

Title@2025-06-12 (4): Air in Your Neighborhood: Fine-Grained AQI Forecasting Using Mobile Sensor Data

Title: Air in Your Neighborhood: Fine-Grained AQI Forecasting Using Mobile Sensor Data Luft in Ihrer Nachbarschaft: Feinkörnige AQI-Prognose mit mobilen Sensordaten 利用移动感应器数据进行精密的AQI预报 2506.10332v1

Authors (1): Aaryam Sharma

Air pollution has become a significant health risk in developing countries. While governments routinely publish air-quality index (AQI) data to track pollution, these values fail to capture the local reality, as sensors are often very sparse. In this paper, we address this gap by predicting AQI in 1 km^2 neighborhoods, using the AirDelhi dataset as an example. Using spatio-temporal GNNs, we surpass existing work by 71.654 MSE, a 79% reduction, even on unseen coordinates. New insights about AQI, such as the existence of strong repetitive short-term patterns and changing spatial relations, are also discovered. The code is available on GitHub.

在发展中国家,空气污染已成为严重的健康风险。虽然政府定期公布空气质量指数(AQI)数据以跟踪污染,但这些数值未能反映当地现实,因为传感器往往非常稀少。 在本文中,我们用AirDelhi数据集的例子预测AQI在1平方公里的相邻地区存在这一差距。使用时空GNNs,我们比现有工程减少71.654 MSE79%,即使是在看不见的坐标上。还发现了关于AQI的新见解,例如存在强烈重复的短期模式和不断变化的空间关系。该代码可在GitHub上查阅。


Article 204

Title@2025-06-12 (4): CompilerDream: Learning a Compiler World Model for General Code Optimization

Title: CompilerDream: Learning a Compiler World Model for General Code Optimization CompilerDream: Lernen eines Compiler-Weltmodells für die allgemeine Code-Optimierung 汇编者:学习编纂者世界通用守则优化模式 2404.16077v3

Authors (5): Chaoyi Deng, Jialong Wu, Ningya Feng, Jianmin Wang, Mingsheng Long

Effective code optimization in compilers is crucial for computer and software engineering. The success of these optimizations primarily depends on the selection and ordering of the optimization passes applied to the code. While most compilers rely on a fixed sequence of optimization passes, current methods to find the optimal sequence either employ impractically slow search algorithms or learning methods that struggle to generalize to code unseen during training. We introduce CompilerDream, a model-based reinforcement learning approach to general code optimization. CompilerDream comprises a compiler world model that accurately simulates the intrinsic properties of optimization passes and an agent trained on this model to produce effective optimization strategies. By training on a large-scale program dataset, CompilerDream is equipped to serve as a general code optimizer across various application scenarios and source-code languages. Our extensive experiments first highlight CompilerDream’s strong optimization capabilities for autotuning, where it leads the CompilerGym leaderboard. More importantly, the zero-shot generalization ability of the large-scale trained compiler world model and agent excels across diverse datasets, surpassing LLVM’s built-in optimizations and other state-of-the-art methods in both settings of value prediction and end-to-end code optimization.

编译器的有效代码优化对于计算机和软件工程至关重要。 这些优化的成功主要取决于对代码应用的优化通行证的选择和顺序。 虽然大多数编译器依赖固定的优化通行证序列, 但当前寻找最佳序列的方法要么采用不切实际的慢速搜索算法, 要么采用在培训期间难以普及到无法理解的代码的学习方法。 我们引入了基于模型的强化学习方法CapilrDream, 这是一种基于模型的强化学习方法, 用于一般代码优化。 编译器Dream 包括一个编译器世界模型, 准确模拟优化通行证的内在特性, 以及一个经过培训的代理人, 以制作有效的优化战略。 虽然大多数编译器依赖一个大型程序数据集的培训, 但通过对编译器的数据集进行培训, 将编译器Dream 的强大优化能力 首次凸显了 , 并引导了编译器Gym 领导板。 更重要的是, 大规模经过培训的编译器世界模型和代理器的零光化能力, 超越了各种数据集, 超越了LLVM的终极值和最优化预测设置中的其他状态。


Article 205

Title@2025-06-12 (4): Towards Scalable SOAP Note Generation: A Weakly Supervised Multimodal Framework

Title: Towards Scalable SOAP Note Generation: A Weakly Supervised Multimodal Framework Auf dem Weg zu einer skalierbaren SOAP-Note Generation: Ein schwach überwachter multimodaler Rahmen 向可缩放的 SOAP 笔记本生成方向迈进: 受微弱监督的多模式框架 2506.10328v1

Authors (3): Sadia Kamal, Tim Oates, Joy Wan

Skin carcinoma is the most prevalent form of cancer globally, accounting for over $8 billion in annual healthcare expenditures. In clinical settings, physicians document patient visits using detailed SOAP (Subjective, Objective, Assessment, and Plan) notes. However, manually generating these notes is labor-intensive and contributes to clinician burnout. In this work, we propose a weakly supervised multimodal framework to generate clinically structured SOAP notes from limited inputs, including lesion images and sparse clinical text. Our approach reduces reliance on manual annotations, enabling scalable, clinically grounded documentation while alleviating clinician burden and reducing the need for large annotated data. Our method achieves performance comparable to GPT-4o, Claude, and DeepSeek Janus Pro across key clinical relevance metrics. To evaluate clinical quality, we introduce two novel metrics, MedConceptEval and Clinical Coherence Score (CCS), which assess semantic alignment with expert medical concepts and input features, respectively.

皮肤癌是全球最常见的癌症形式,每年的保健支出超过80亿美元。在临床环境中,医生使用详细的SOAP(目标、目标、评估和计划)注释来记录病人的探视。然而,人工生成这些笔记是劳动密集型的,有助于临床烧伤。在这项工作中,我们提出一个监督不力的多式联运框架,从有限的投入(包括损耗图像和稀少的临床文本)中产生临床结构化的SOAP笔记。我们的方法减少了对手动说明的依赖,允许可缩放、临床基础文件,同时减轻临床负担和减少对大量附加说明数据的需求。我们的方法取得了与GPT-4o、Claude和DeepSeek Janus Pro相类似的业绩。为了评估临床质量,我们引入了两个新型的计量标准MedCeptionEval和临床一致性计数,分别评估与专家医疗概念和输入特征的语义调整。


Article 206

Title@2025-06-12 (4): A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon

Title: A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon Ein Benchmark für die Verallgemeinerung unterschiedlicher Teamstrategien im wettbewerbsfähigen Pokémon 普凯蒙竞争中全面推广不同团队战略的基准 2506.10326v1

Authors (5): Cameron Angliss, Jiaxun Cui, Jiaheng Hu, Arrasy Rahman, Peter Stone

Developing AI agents that can robustly adapt to dramatically different strategic landscapes without retraining is a central challenge for multi-agent learning. Pokémon Video Game Championships (VGC) is a domain with an extraordinarily large space of possible team configurations of approximately $10^{139}$ - far larger than those of Dota or Starcraft. The highly discrete, combinatorial nature of team building in Pokémon VGC causes optimal strategies to shift dramatically depending on both the team being piloted and the opponent’s team, making generalization uniquely challenging. To advance research on this problem, we introduce VGC-Bench: a benchmark that provides critical infrastructure, standardizes evaluation protocols, and supplies human-play datasets and a range of baselines - from large-language-model agents and behavior cloning to reinforcement learning and empirical game-theoretic methods such as self-play, fictitious play, and double oracle. In the restricted setting where an agent is trained and evaluated on a single-team configuration, our methods are able to win against a professional VGC competitor. We extensively evaluated all baseline methods over progressively larger team sets and find that even the best-performing algorithm in the single-team setting struggles at scaling up as team size grows. Thus, policy generalization across diverse team strategies remains an open challenge for the community. Our code is open sourced at https://github.com/cameronangliss/VGC-Bench.

在不再培训的情况下,发展能够强有力地适应完全不同的战略景观的AI代理机构是多试探学习的一个中心挑战。 Pok\ emamon Vegle General Campales(VGC)是一个非常庞大的领域,拥有大约10139美元(比Dota或Starcraft要大得多)的可能团队配置空间,远大于Dota或Starcraft。在Pok'emon VGC的团队建设中,高度离散、组合性强的团队建设导致最佳战略的急剧转变,取决于正在试点的团队和对手团队,使普遍性具有独特的挑战性。为了推进对这一问题的研究,我们引入VGC-Bench:一个提供关键基础设施的基准,将评估协议标准化,并提供人类游戏数据集和一系列基线 — 从大型语言模范代理和行为克隆到强化学习和实验性游戏-理论方法,如自玩游戏、虚构游戏和双形或变形。在限制环境中培训和评价一个代理机构,我们的方法能够战胜专业VGC竞争者。我们广泛评估VGC-Bench:我们广泛评估了所有基线方法,在团队规模上超越了整个团队的团队规模上,在团队中不断提升了整个团队的游戏中,在团队中不断演进进进进进进进进进进进进进进进进进进进进进进进式的系统。


Article 207

Title@2025-06-12 (4): SWDL: Stratum-Wise Difference Learning with Deep Laplacian Pyramid for Semi-Supervised 3D Intracranial Hemorrhage Segmentation

Title: SWDL: Stratum-Wise Difference Learning with Deep Laplacian Pyramid for Semi-Supervised 3D Intracranial Hemorrhage Segmentation SWDL: Stratum-Wise Difference Learning mit tiefer Laplacian Pyramide für halbüberwachte 3D-Intrakranialblutungssegmentierung SWDL: 与深拉拉拉比西亚金字塔为半超3D内出血分块进行平流和深拉拉拉西亚金字塔差异学习 2506.10325v1

Authors (6): Cheng Wang, Siqi Chen, Donghua Mi, Yang Chen, Yudong Zhang, Yinsheng Li

Recent advances in medical imaging have established deep learning-based segmentation as the predominant approach, though it typically requires large amounts of manually annotated data. However, obtaining annotations for intracranial hemorrhage (ICH) remains particularly challenging due to the tedious and costly labeling process. Semi-supervised learning (SSL) has emerged as a promising solution to address the scarcity of labeled data, especially in volumetric medical image segmentation. Unlike conventional SSL methods that primarily focus on high-confidence pseudo-labels or consistency regularization, we propose SWDL-Net, a novel SSL framework that exploits the complementary advantages of Laplacian pyramid and deep convolutional upsampling. The Laplacian pyramid excels at edge sharpening, while deep convolutions enhance detail precision through flexible feature mapping. Our framework achieves superior segmentation of lesion details and boundaries through a difference learning mechanism that effectively integrates these complementary approaches. Extensive experiments on a 271-case ICH dataset and public benchmarks demonstrate that SWDL-Net outperforms current state-of-the-art methods in scenarios with only 2% labeled data. Additional evaluations on the publicly available Brain Hemorrhage Segmentation Dataset (BHSD) with 5% labeled data further confirm the superiority of our approach. Code and data have been released at https://github.com/SIAT-CT-LAB/SWDL.

医学成像(SSL)的近期进展将深层次的基于学习的分割作为主要方法,尽管通常需要大量人工附加说明的数据。然而,由于标签过程既乏味又昂贵,因此获得室内出血(ICH)的说明仍然特别困难。半监督的学习(SSL)已成为解决标签数据稀缺问题的一个大有希望的解决办法,特别是在体积图解析方面。与主要侧重于高信任伪标签或一致性正规化的常规SSL方法不同,我们提议SWDL-Net(SWDL-Net)是一个新的SSL框架,它利用了拉普拉西金字塔和深刻的革命性上层取样的互补优势。 Laplacian金字塔在精锐化方面表现突出,而深层的演进化则通过灵活的地貌图绘制提高了细节的精确度。我们的框架通过一种差异学习机制实现腐蚀细节和界限的更优分化,有效地融合了这些互补的方法。关于271个案件ICH数据集和公共基准的广泛实验表明,SWD-AB网络在假设中超越了目前状态的状态方法,在SBRial-ral-labalalalation方法中,只确认了我们已公开标定的MS/MISD数据格式的数据。
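
The Laplacian pyramid that the SWDL-Net abstract builds on can be illustrated in one dimension. This is a deliberately simplified sketch: it uses a 1D signal, pairwise averaging for downsampling, and sample repetition for upsampling, all of which are assumptions for illustration and not the 3D network described in the paper.

```python
def downsample(signal):
    """Halve the resolution by averaging adjacent pairs."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by repeating each sample."""
    return [value for value in signal for _ in range(2)]


def build_laplacian_pyramid(signal, levels):
    """Return (detail_bands, coarse); len(signal) must be divisible by 2**levels."""
    bands = []
    current = list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        # The band stores what the coarse level cannot represent (edges).
        bands.append([a - b for a, b in zip(current, upsample(coarse))])
        current = coarse
    return bands, current


def reconstruct(bands, coarse):
    """Invert the pyramid by adding each detail band back, coarse to fine."""
    current = list(coarse)
    for band in reversed(bands):
        current = [u + d for u, d in zip(upsample(current), band)]
    return current


signal = [0, 0, 0, 8, 8, 8, 8, 8]
bands, coarse = build_laplacian_pyramid(signal, levels=2)
restored = reconstruct(bands, coarse)
```

The finest band is nonzero only around the step in the signal, which reflects the edge-emphasizing behavior the abstract attributes to the Laplacian pyramid, and the reconstruction recovers the input exactly.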


Article 208

Title@2025-06-12 (4): DiffUMI: Training-Free Universal Model Inversion via Unconditional Diffusion for Face Recognition

Title: DiffUMI: Training-Free Universal Model Inversion via Unconditional Diffusion for Face Recognition DiffUMI: Schulungsfreie universelle Modellinversion über Bedingungslose Diffusion für die Gesichtserkennung DiffUMI:通过无条件传播面相承认,无培训通用模式翻版 2504.18015v2

Authors (4): Hanrui Wang, Shuo Wang, Chun-Shien Lu, Isao Echizen

Face recognition technology presents serious privacy risks due to its reliance on sensitive and immutable biometric data. To address these concerns, such systems typically convert raw facial images into embeddings, which are traditionally viewed as privacy-preserving. However, model inversion attacks challenge this assumption by reconstructing private facial images from embeddings, highlighting a critical vulnerability in face recognition systems. Most existing inversion methods require training a separate generator for each target model, making them computationally intensive. In this work, we introduce DiffUMI, a diffusion-based universal model inversion attack that requires no additional training. DiffUMI is the first approach to successfully leverage unconditional face generation without relying on model-specific generators. It surpasses state-of-the-art attacks by 15.5% and 9.82% in success rate on standard and privacy-preserving face recognition systems, respectively. Furthermore, we propose a novel use of out-of-domain detection (OODD), demonstrating for the first time that model inversion can differentiate between facial and non-facial embeddings using only the embedding space.

面部识别技术由于依赖敏感和不可改变的生物鉴别数据而带来严重的隐私风险。 为解决这些关切,这类系统通常将原始面部图像转换成嵌入器,传统上被视为隐私保护。 然而,模型反向攻击通过将私人面部图像从嵌入器中重建出来,在面部识别系统中突出一个关键弱点,从而挑战这一假设。 大多数现有的反向方法要求为每个目标模型培训一个单独的生成器,使其在计算上集聚。 在这项工作中,我们引入了DiffUMI, 一种基于扩散的通用反向模型攻击,不需要额外培训。 DiffUMI是成功利用无条件面部生成而无需依赖模型特定生成器的第一个方法。 它在标准上和隐私面部识别系统中分别超过最先进的攻击率15.5%和9.82%。 此外,我们提议新使用外部检测(OODDD), 首次展示了模型能够仅使用嵌入空间区分面部和非面部嵌入器。


Article 209

Title@2025-06-12 (4): DeePoly: A High-Order Accuracy Scientific Machine Learning Framework for Function Approximation and Solving PDE

Title: DeePoly: A High-Order Accuracy Scientific Machine Learning Framework for Function Approximation and Solving PDE DeePoly: Ein High-Order Accuracy Scientific Machine Learning Framework für die Funktionsannäherung und Lösung von PDE Deepoly: 功能近似和溶解PDE高分精确度科学机器学习框架 2506.04613v2

Authors (2): Li Liu, Heng Yong

Recently, machine learning methods have gained significant traction in scientific computing, particularly for solving Partial Differential Equations (PDEs). However, methods based on deep neural networks (DNNs) often lack convergence guarantees and computational efficiency compared to traditional numerical schemes. This work introduces DeePoly, a novel framework that transforms the solution paradigm from pure non-convex parameter optimization to a two-stage approach: first employing a DNN to capture complex global features, followed by linear space optimization with combined DNN-extracted features (Scoper) and polynomial basis functions (Sniper). This strategic combination leverages the complementary strengths of both methods – DNNs excel at approximating complex global features (i.e., high-gradient features) and stabilize the polynomial approximation while polynomial bases provide high-precision local corrections with convergence guarantees. Theoretical analysis and numerical experiments demonstrate that this approach significantly enhances both high-order accuracy and efficiency across diverse problem types while maintaining mesh-free and scheme-free properties. This paper also serves as a theoretical exposition for the open-source project DeePoly.

最近,机器学习方法在科学计算中获得了显著的牵引力,特别是在解决部分差异(PDE)方面。然而,基于深神经网络(DNNS)的方法往往缺乏趋同保障和与传统数字方法相比的计算效率。这项工作介绍了Deepolly,这是一个将解决方案范式从纯非Convex参数优化转变为两阶段方法的新框架:首先使用DNN来捕捉复杂的全球特征,然后是线性空间优化,同时结合DNN抽取的特征(Scoper)和多元基函数(Sniper),这种战略组合利用了两种方法的互补优势 – – DNNS在相近复杂的全球特征(即高位特征)方面的优势,稳定多元近距离,而多元基础则提供高精度的地方校正,同时提供趋同保证。理论分析和数字实验表明,这一方法极大地提高了不同类型问题高端的准确度和效率,同时保持无网格和无计划特性。这份文件还作为开放源项目Deeppoly的理论演示。


Article 210

Title@2025-06-12 (4): Policy-Based Trajectory Clustering in Offline Reinforcement Learning

Title: Policy-Based Trajectory Clustering in Offline Reinforcement Learning Policy-based Trajektorie Clustering im Offline-Verstärkungslernen 以政策为基础的离线强化学习轨迹集群 2506.09202v2

Authors (3): Hao Hu, Xinqi Wang, Simon Shaolei Du

We introduce a novel task of clustering trajectories from offline reinforcement learning (RL) datasets, where each cluster center represents the policy that generated its trajectories. By leveraging the connection between the KL-divergence of offline trajectory distributions and a mixture of policy-induced distributions, we formulate a natural clustering objective. To solve this, we propose Policy-Guided K-means (PG-Kmeans) and Centroid-Attracted Autoencoder (CAAE). PG-Kmeans iteratively trains behavior cloning (BC) policies and assigns trajectories based on policy generation probabilities, while CAAE resembles the VQ-VAE framework by guiding the latent representations of trajectories toward the vicinity of specific codebook entries to achieve clustering. Theoretically, we prove the finite-step convergence of PG-Kmeans and identify a key challenge in offline trajectory clustering: the inherent ambiguity of optimal solutions due to policy-induced conflicts, which can result in multiple equally valid but structurally distinct clusterings. Experimentally, we validate our methods on the widely used D4RL dataset and custom GridWorld environments. Our results show that both PG-Kmeans and CAAE effectively partition trajectories into meaningful clusters. They offer a promising framework for policy-based trajectory clustering, with broad applications in offline RL and beyond.

为了解决这个问题,我们提出了将离线强化学习(RL)数据集的轨迹分组的新任务,即每个集群中心代表产生其轨迹的政策。通过利用离线轨迹分布的KL光度与政策驱动分布的混合分布之间的连接,我们制定了自然集群目标。为了解决这个问题,我们提出了政策指南K手段(PG-Kpoles)和Centrid-Atrate Autoencoder(CAAE),PG-Kpoles反复培训行为克隆(BC)政策,并根据政策生成概率分配轨迹。CAAE类似于VQ-VAE框架,指导轨道在特定代码集条目附近的潜在表达方式,以达到集群。理论上,我们证明PG-Kpoles(PG-Kpoles)和Coide-Aloadroadroadal 集群的有限阶段融合,并找出一个关键挑战:政策引发的冲突导致多重的、但结构上不同的组合,而结构上则可以使R-VQ-VA的应用程序与VV-VA(R)框架相似,我们有效地验证了我们G-G Greal-G Greal-G Groal-I-G)在数据库提供的结果。
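
The alternating structure described for PG-Kmeans (fit one behavior-cloning policy per cluster, then reassign each trajectory to the policy most likely to have generated it) can be sketched in a tabular setting. The count-based policy with Laplace smoothing, the function names, and the stopping rule are assumptions made for this illustration, not the authors' implementation.

```python
import math


def fit_bc_policy(trajectories, n_states, n_actions, smoothing=1.0):
    """Tabular behavior cloning: smoothed action frequencies per state."""
    counts = [[smoothing] * n_actions for _ in range(n_states)]
    for trajectory in trajectories:
        for state, action in trajectory:
            counts[state][action] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(trajectory, policy):
    """Log-probability that `policy` generates the trajectory's actions."""
    return sum(math.log(policy[s][a]) for s, a in trajectory)


def policy_guided_kmeans(trajectories, k, n_states, n_actions, init, max_iters=50):
    """Alternate policy fitting and likelihood-based reassignment."""
    assignment = list(init)
    for _ in range(max_iters):
        policies = [
            fit_bc_policy(
                [t for t, c in zip(trajectories, assignment) if c == j],
                n_states, n_actions,
            )
            for j in range(k)
        ]
        new_assignment = [
            max(range(k), key=lambda j: log_likelihood(t, policies[j]))
            for t in trajectories
        ]
        if new_assignment == assignment:  # assignments are stable: stop
            break
        assignment = new_assignment
    return assignment, policies


# Three trajectories that always take action 0, three that always take action 1.
trajectories = [[(0, 0)] * 4] * 3 + [[(0, 1)] * 4] * 3
assignment, policies = policy_guided_kmeans(
    trajectories, k=2, n_states=1, n_actions=2, init=[0, 1, 0, 1, 0, 1]
)
```

Starting from a deliberately mixed initial assignment, the loop separates the two groups after one reassignment and then stops because the assignment no longer changes.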


Article 211

Title@2025-06-12 (4): PerfTracker: Online Performance Troubleshooting for Large-scale Model Training in Production

Title: PerfTracker: Online Performance Troubleshooting for Large-scale Model Training in Production PerfTracker: Online-Performance-Fehlersuche für großformatige Modellschulungen in der Produktion PerfTracker:大规模生产示范培训在线绩效问题解决 2506.08528v3

Authors (13): Yu Guan, Zhiyu Yin, Haoyu Chen, Sheng Cheng, Chaojie Yang, Kun Qian, Tianyin Xu, Yang Zhang, Hanyu Zhao, Yong Li, Wei Lin, Dennis Cai, Ennan Zhai

Troubleshooting performance problems of large model training (LMT) is immensely challenging, due to unprecedented scales of modern GPU clusters, the complexity of software-hardware interactions, and the data intensity of the training process. Existing troubleshooting approaches designed for traditional distributed systems or datacenter networks fall short and can hardly apply to real-world training systems. In this paper, we present PerfTracker, the first online troubleshooting system utilizing fine-grained profiling, to diagnose performance issues of large-scale model training in production. PerfTracker can diagnose performance issues rooted in both hardware (e.g., GPUs and their interconnects) and software (e.g., Python functions and GPU operations). It scales to LMT on modern GPU clusters. PerfTracker effectively summarizes runtime behavior patterns of fine-grained LMT functions via online profiling, and leverages differential observability to localize the root cause with minimal production impact. PerfTracker has been deployed as a production service for large-scale GPU clusters of O(10,000) GPUs (product homepage https://help.aliyun.com/zh/pai/user-guide/perftracker-online-performance-analysis-diagnostic-tool). It has been used to diagnose a variety of difficult performance issues.

大型模型培训(LMT)的故障排除问题非常艰巨,原因是现代GPU集群规模空前,软件硬件互动的复杂性,以及培训过程的数据强度。为传统分布式系统或数据中心网络设计的现有故障排除方法不足,难以适用于现实世界培训系统。本文介绍PerfTracker,这是第一个使用微小分析的在线故障排除系统,用于诊断生产中大规模模型培训的绩效问题。 PerfTracker可以诊断硬件(例如,GPUs及其内部连接)和软件(例如,Python功能和GPU操作)两方面的绩效问题。在现代GPU组或数据中心中,现有故障排除方法不足,难以适用于现实世界培训系统。在本文中,我们介绍PerfTracker,这是第一个使用微小分析模型分析的在线故障排除系统,用来分析大规模GPUG(例如,10,000)的硬件组合及其内部连接)和软件(例如,Python 功能-stall-stall) GPUS-stall a has hasimal-hillagemental-deviewal ASyalmental a.


Article 212

Title@2025-06-12 (4): Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting

Title: Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting Aufwärtsgewichtung einfache Proben in Feinabstimmung Mitigates Vergessen 高重量加重量易加重量样本 2502.02797v2

Authors (5): Sunny Sanyal, Hayden Prairie, Rudrajit Das, Ali Kavis, Sujay Sanghavi

Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities, a phenomenon known as “catastrophic forgetting”. This is especially an issue when one does not have access to the data and recipe used to develop the pre-trained model. Under this constraint, most existing methods for mitigating forgetting are inapplicable. To address this challenge, we propose a sample weighting scheme for the fine-tuning data solely based on the pre-trained model’s losses. Specifically, we upweight the easy samples on which the pre-trained model’s loss is low and vice versa to limit the drift from the pre-trained model. Our approach is orthogonal and yet complementary to existing methods; while such methods mostly operate on parameter or gradient space, we concentrate on the sample space. We theoretically analyze the impact of fine-tuning with our method in a linear setting, showing that it stalls learning in a certain subspace which inhibits overfitting to the target task. We empirically demonstrate the efficacy of our method on both language and vision tasks. As an example, when fine-tuning Gemma 2 2B on MetaMathQA, our method results in only a $0.8\%$ drop in accuracy on GSM8K (another math dataset) compared to standard fine-tuning, while preserving $5.4\%$ more accuracy on the pre-training datasets. Our code is publicly available at https://github.com/sanyalsunny111/FLOW_finetuning .

在下游任务上微调预训练模型往往会削弱其原有能力,这一现象被称为“灾难性遗忘”。当无法获得开发预训练模型所用的数据和训练方案时,这个问题尤为突出。在这一限制下,大多数现有的缓解遗忘的方法都不适用。为了应对这一挑战,我们提出一种仅基于预训练模型损失的微调数据样本加权方案。具体而言,我们提高预训练模型损失较低的简单样本的权重,反之则降低权重,以限制模型相对预训练模型的漂移。我们的方法与现有方法正交且互补:现有方法大多作用于参数空间或梯度空间,而我们关注样本空间。我们在线性设定下从理论上分析了使用该方法进行微调的影响,表明它会使某个子空间中的学习停滞,从而抑制对目标任务的过拟合。我们在语言和视觉任务上通过实验验证了该方法的有效性。例如,在MetaMathQA上微调Gemma 2 2B时,与标准微调相比,我们的方法在GSM8K(另一个数学数据集)上的准确率仅下降 $0.8\%$,同时在预训练数据集上多保留 $5.4\%$ 的准确率。我们的代码公开于 https://github.com/sanyalsunny111/FLOW_finetuning 。
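The weighting idea in the abstract above (upweight samples on which the pre-trained model's loss is low) can be illustrated with a minimal sketch. The abstract does not state the exact weighting function, so the exponential form, the median-based temperature, and the names `flow_style_weights` and `weighted_loss` are assumptions made here for illustration; they are not taken from the authors' repository.

```python
import math


def flow_style_weights(pretrained_losses, temperature=None):
    """Map per-sample pre-trained losses to weights (low loss -> high weight).

    Assumption: weight = exp(-loss / temperature). The abstract only says
    that easy samples are upweighted; the exact form is hypothetical.
    """
    if temperature is None:
        # Assumption: default the temperature to the median pre-trained loss.
        ordered = sorted(pretrained_losses)
        n = len(ordered)
        mid = n // 2
        temperature = ordered[mid] if n % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return [math.exp(-loss / temperature) for loss in pretrained_losses]


def weighted_loss(current_losses, weights):
    """Weighted average of the current fine-tuning losses.

    The weights are computed once from the frozen pre-trained model and
    then treated as constants during fine-tuning.
    """
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, current_losses)) / total


# Hypothetical per-sample losses of the pre-trained model on fine-tuning data.
weights = flow_style_weights([0.1, 1.0, 3.0])
```

In a real training loop the same weights would multiply the per-sample losses of each mini-batch before the reduction; whether and how the weights are normalized is a design choice this sketch does not settle.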


Article 213

Title@2025-06-12 (4): PyLO: Towards Accessible Learned Optimizers in PyTorch

Title: PyLO: Towards Accessible Learned Optimizers in PyTorch PyLO: Auf dem Weg zu zugänglich gelernten Optimierern in PyTorch PyLO:在皮托尔奇努力实现无障碍、可获取的优化 2506.10315v1

Authors (6): Paul Janson, Benjamin Therien, Quentin Anthony, Xiaolong Huang, Abhinav Moudgil, Eugene Belilovsky

Learned optimizers have been an active research topic over the past decade, with increasing progress toward practical, general-purpose optimizers that can serve as drop-in replacements for widely used methods like Adam. However, recent advances – such as VeLO, which was meta-trained for 4000 TPU-months – remain largely inaccessible to the broader community, in part due to their reliance on JAX and the absence of user-friendly packages for applying the optimizers after meta-training. To address this gap, we introduce PyLO, a PyTorch-based library that brings learned optimizers to the broader machine learning community through familiar, widely adopted workflows. Unlike prior work focused on synthetic or convex tasks, our emphasis is on applying learned optimization to real-world large-scale pre-training tasks. Our release includes a CUDA-accelerated version of the small_fc_lopt learned optimizer architecture from (Metz et al., 2022a), delivering substantial speedups – from 39.36 to 205.59 samples/sec throughput for training ViT B/16 with batch size 32. PyLO also allows us to easily combine learned optimizers with existing optimization tools such as learning rate schedules and weight decay. When doing so, we find that learned optimizers can substantially benefit. Our code is available at https://github.com/Belilovsky-Lab/pylo

过去十年来,学习型优化器(learned optimizers)一直是活跃的研究课题,并在实用的通用优化器方面不断取得进展,这类优化器可以直接替代Adam等广泛使用的方法。然而,最近的进展(例如经过4000个TPU月元训练的VeLO)对更广泛的社区而言基本上仍难以使用,部分原因是它们依赖JAX,而且缺少在元训练之后应用这些优化器的易用软件包。为了弥补这一差距,我们推出PyLO,这是一个基于PyTorch的库,通过大家熟悉且广泛采用的工作流程,把学习型优化器带给更广泛的机器学习社区。与以往侧重于合成任务或凸任务的工作不同,我们的重点是把学习型优化应用于真实世界的大规模预训练任务。我们的发布包含 (Metz et al., 2022a) 中 small_fc_lopt 学习型优化器架构的CUDA加速版本,带来显著加速:以批量大小32训练ViT B/16时,吞吐量从39.36提升到205.59样本/秒。PyLO还使我们能够轻松地把学习型优化器与学习率调度、权重衰减等现有优化工具结合起来;这样做时,我们发现学习型优化器可以显著受益。我们的代码见 https://github.com/Belilovsky-Lab/pylo


Article 214

Title@2025-06-12 (4): Detecting Sockpuppetry on Wikipedia Using Meta-Learning

Title: Detecting Sockpuppetry on Wikipedia Using Meta-Learning Sockepuppetry auf Wikipedia erkennen Mit Meta-Learning 在维基百科上用元学习探测袜子布料 2506.10314v1

Authors (2): Luc Raszewski, Christine De Kock

Malicious sockpuppet detection on Wikipedia is critical to preserving access to reliable information on the internet and preventing the spread of disinformation. Prior machine learning approaches rely on stylistic and meta-data features, but do not prioritise adaptability to author-specific behaviours. As a result, they struggle to effectively model the behaviour of specific sockpuppet-groups, especially when text data is limited. To address this, we propose the application of meta-learning, a machine learning technique designed to improve performance in data-scarce settings by training models across multiple tasks. Meta-learning optimises a model for rapid adaptation to the writing style of a new sockpuppet-group. Our results show that meta-learning significantly enhances the precision of predictions compared to pre-trained models, marking an advancement in combating sockpuppetry on open editing platforms. We release a new dataset of sockpuppet investigations to foster future research in both sockpuppetry and meta-learning fields.

维基百科上的恶性袜子探测对保存在互联网上获取可靠信息的机会和防止虚假信息传播至关重要。 先前的机器学习方法依赖于文体和元数据特征,但并不优先考虑作者特定行为。 结果,他们努力有效地模拟特定袜子布偶群体的行为,特别是在文本数据有限的情况下。 为了解决这个问题,我们提议应用元学习,即一种机器学习技术,通过多种任务的培训模型来改善数据卡片环境中的性能。 元学习选修模式是快速适应一个新的袜子布偶团体的写作风格的模式。 我们的结果表明,与预先培训的模式相比,元学习极大地提高了预测的准确性,标志着在打击开放编辑平台上的袜子布偶方面有所进步。 我们发布了一套新数据集,由袜子调查提供,以促进今后在袜子和元学习领域的研究。


Article 215

Title@2025-06-12 (4): Collaborative Min-Max Regret in Grouped Multi-Armed Bandits

Title: Collaborative Min-Max Regret in Grouped Multi-Armed Bandits Kollaborativer Min-Max-Streit in gruppierten Multi-Armed Bandits 联合多武装强盗团体中负负负负 2506.10313v1

Authors (2): Moïse Blanchard, Vineet Goyal

We study the impact of sharing exploration in multi-armed bandits in a grouped setting where a set of groups have overlapping feasible action sets [Baek and Farias ‘24]. In this grouped bandit setting, groups share reward observations, and the objective is to minimize the collaborative regret, defined as the maximum regret across groups. This naturally captures applications in which one aims to balance the exploration burden between groups or populations – it is known that standard algorithms can lead to significantly imbalanced exploration cost between groups. We address this problem by introducing an algorithm Col-UCB that dynamically coordinates exploration across groups. We show that Col-UCB achieves both optimal minimax and instance-dependent collaborative regret up to logarithmic factors. These bounds are adaptive to the structure of shared action sets between groups, providing insights into when collaboration yields significant benefits over each group learning their best action independently.

我们研究在分组设定下多臂老虎机中共享探索的影响,其中一组群体具有相互重叠的可行动作集 [Baek and Farias ‘24]。在这种分组老虎机设定中,各群体共享奖励观测,目标是最小化协作遗憾,即各群体遗憾中的最大值。这自然刻画了需要在群体或人群之间平衡探索负担的应用;已知标准算法可能导致各群体之间的探索成本严重失衡。我们通过引入在各群体之间动态协调探索的算法Col-UCB来解决这一问题。我们证明,Col-UCB在对数因子意义下同时达到最优的极小极大协作遗憾和依赖实例的协作遗憾。这些界能够自适应于群体之间共享动作集的结构,从而揭示协作在何时比各群体独立学习各自的最优动作带来显著收益。


Article 216

Title@2025-06-12 (4): Agnostic Smoothed Online Learning without Knowledge of the Base Measure

Title: Agnostic Smoothed Online Learning without Knowledge of the Base Measure Agnostisches geglättetes Online-Lernen ohne Kenntnis des Basismaßes 缺乏基础措施知识的Agnostic平滑在线学习 2410.05124v3

Authors (1): Moïse Blanchard

Classical results in statistical learning typically consider two extreme data-generating models: i.i.d. instances from an unknown distribution, or fully adversarial instances, often much more challenging statistically. To bridge the gap between these models, recent work introduced the smoothed framework, in which at each iteration an adversary generates instances from a distribution constrained to have density bounded by $\sigma^{-1}$ compared to some fixed base measure $\mu$. This framework interpolates between the i.i.d. and adversarial cases, depending on the value of $\sigma$. For the classical online prediction problem, most prior results in smoothed online learning rely on the arguably strong assumption that the base measure $\mu$ is known to the learner, contrasting with standard settings in the PAC learning or consistency literature. We consider the general agnostic problem in which the base measure is unknown and values are arbitrary. Along this direction, Block et al. showed that empirical risk minimization has sublinear regret under the well-specified assumption. We propose an algorithm R-Cover based on recursive coverings which is the first to guarantee sublinear regret for agnostic smoothed online learning without prior knowledge of $\mu$. For classification, we prove that R-Cover has adaptive regret $\tilde O(\sqrt{dT/\sigma})$ for function classes with VC dimension $d$, which is optimal up to logarithmic factors. For regression, we establish that R-Cover has sublinear oblivious regret for function classes with polynomial fat-shattering dimension growth.

统计学习的经典结果通常考虑两种极端的数据生成模型:来自未知分布的独立同分布(i.i.d.)样本,或完全对抗性的样本,后者在统计上往往困难得多。为了弥合这两种模型之间的差距,近期工作引入了平滑框架:在每一轮中,对手从某个分布中生成样本,该分布相对于某个固定基测度 $\mu$ 的密度以 $\sigma^{-1}$ 为界。该框架根据 $\sigma$ 的取值在i.i.d.情形与对抗情形之间插值。对于经典的在线预测问题,平滑在线学习的大多数已有结果都依赖于一个可以说相当强的假设,即学习者已知基测度 $\mu$,这与PAC学习或一致性文献中的标准设定不同。我们考虑基测度未知且取值任意的一般不可知(agnostic)问题。沿着这一方向,Block等人证明了在模型设定正确(well-specified)的假设下,经验风险最小化具有次线性遗憾。我们提出基于递归覆盖的算法R-Cover,它是第一个在事先不知道 $\mu$ 的情况下为不可知平滑在线学习保证次线性遗憾的算法。对于分类问题,我们证明R-Cover对VC维为 $d$ 的函数类具有自适应遗憾 $\tilde O(\sqrt{dT/\sigma})$,在对数因子意义下是最优的。对于回归问题,我们证明R-Cover对fat-shattering维数呈多项式增长的函数类具有次线性的不经意(oblivious)遗憾。


Article 217

Title@2025-06-12 (4): Self-learning signal classifier for decameter coherent scatter radars

Title: Self-learning signal classifier for decameter coherent scatter radars Selbstlernender Signalklassifikator für Dekameter-Konsistenzstreuradar 10度连续散射雷达自学信号分类器 2506.10305v1

Authors (2): Oleg Berngardt, Ivan Lavygin

The paper presents a method for automatically constructing a classifier for processed data obtained by decameter coherent scatter radars. The method is based only on the radar data obtained, the results of automatic modeling of radio wave propagation in the ionosphere, and mathematical criteria for estimating the quality of the models. The final classifier is the model trained on data obtained by 12 radars of the SuperDARN and SECIRA networks over two years for each radar. The number of model coefficients is 2669. For the classification, the model uses both the calculated parameters of radio wave propagation in the model ionosphere and the parameters directly measured by the radar. Radio wave elevation measurements at each radar were calibrated using signals scattered by meteor trails. The analysis showed that the optimal number of classes in the data is 37, of which 25 are frequently observed. From these, the analysis made it possible to choose 14 classes that are confidently separated in other variants of model training. A preliminary interpretation of 10 of them was carried out. The dynamics of observation of the various classes and their dependence on the geographical latitude of the radars at different levels of solar and geomagnetic activity are presented, and they are shown not to contradict known physical mechanisms. The analysis showed that the most important parameters for identifying the classes are the shape of the signal ray-tracing trajectory in its second half, the ray-traced scattering height, and the Doppler velocity measured by the radar.

论文展示了一种方法,用于自动构建一个分类器,用于对经处理的数据进行分解,该方法仅以获得的雷达数据、电离层无线电波传播自动建模结果和估计模型质量的数学标准为基础;最后的分类器是经12个雷达(SuperDARN和SECIRA网络的12个雷达)在两年中为每个雷达获得的数据培训的模型;模型系数数为2669;对于分类,模型使用了模型电离层无线电波传播的计算参数和雷达直接测量的参数;对每个雷达的无线电波高度测量进行了校准,使用了流星踪分散信号;分析表明,数据中最优的等级数为37个,其中25个经常观测;分析使得有可能从中选择14个类别,这些类别在其他模型培训变式中自信地分开;对其中10个模型进行了初步解释;对不同等级的观测动态及其对不同级别雷达地理纬度的依赖度的动态和由雷达直接测量的参数;对每个雷达进行校准的频率测量,每个雷达的测量是使用流道分散的信号信号波散的信号;分析显示,其半级与已知的轨道分析显示,其物理结构与已知的轨道与已知的轨道对比是相反。


Article 218

Title@2025-06-12 (4): The Alignment Trap: Complexity Barriers

Title: The Alignment Trap: Complexity Barriers Die Alignment-Falle: Komplexitätsbarrieren 协调陷阱:复杂障碍 2506.10304v1

Authors (1): Jasper Yao

We establish fundamental computational complexity barriers to verifying AI safety as system capabilities scale. Our main results show that for AI systems with expressiveness EXP$(m)$ above a critical threshold $\tau$, safety verification requires exponential time and is coNP-complete. We formalize the Capability-Risk Scaling (CRS) dynamic, which demonstrates how increasing AI capability drives societal safety requirements toward perfection, creating an inescapable tension with verification complexity. Through four core theorems, we prove that (1) verification complexity grows exponentially with system expressiveness, (2) safe policies comprise at most a $2^{-2^m}$ fraction of the policy space, (3) no finite set of alignment techniques can provide universal coverage, and (4) robust safety properties form measure-zero sets for neural networks. These results characterize an “intractability gap” where practical safety requirements fall within the region of computational intractability. We conclude by presenting a strategic trilemma: AI development must either constrain system complexity to maintain verifiable safety, accept unverifiable risks while scaling capabilities, or develop fundamentally new safety paradigms beyond verification. Our work provides the first systematic complexity-theoretic analysis of AI alignment and establishes rigorous bounds that any safety approach must confront. A formal verification of the core theorems in Lean4 is currently in progress.

我们通过系统能力规模,为核查AI安全设置了基本的计算复杂性障碍。我们的主要结果表明,对于清晰度 EXP$(m)$(m)超过临界阈值的AI系统,安全核查需要指数化的时间,并且已经完全完成。我们正式确定了CRS(CRS)动态,这表明了增强AI能力如何推动社会安全要求达到完美,造成无法避免的紧张与核查复杂性。我们通过四个核心理论,证明(1) 核查复杂性随着系统的清晰度而急剧增长,(2) 安全政策最多包含政策空间的2-2m}美元份额,(3) 没有一套有限的校准技术能够提供普遍覆盖,(4) 神经网络的稳健的安全性度为措施-零套。这些结果具有“可选性差距”的特点,因为实际的安全要求在计算不易失当的区域范围内。我们通过提出战略三重模型来得出结论: AI 开发必须要么限制系统复杂性,以保持可核查的安全,接受无法核查的风险,而同时提升能力,或者开发超出核查范围的基本新的安全模式。我们的工作提供了当前系统性复杂度分析,即严格地进行常规的AI调整。


Article 219

Title@2025-06-12 (4): Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training

Title: Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training ReLU-Netzwerke zusammenführen, um bei der Initialisierung und während des Trainings viele lineare Regionen exponieren zu lassen 在初始化和培训期间,将ReLU网络强制推至展示极多线性线性区域 2311.18022v7

Authors (3): Max Milkert, David Hyde, Forrest Laine

In a neural network with ReLU activations, the number of piecewise linear regions in the output can grow exponentially with depth. However, this is highly unlikely to happen when the initial parameters are sampled randomly, which therefore often leads to the use of networks that are unnecessarily large. To address this problem, we introduce a novel parameterization of the network that restricts its weights so that a depth $d$ network produces exactly $2^d$ linear regions at initialization and maintains those regions throughout training under the parameterization. This approach allows us to learn approximations of convex, one dimensional functions that are several orders of magnitude more accurate than their randomly initialized counterparts. We further demonstrate a preliminary extension of our construction to multidimensional and non-convex functions, allowing the technique to replace traditional dense layers in various architectures.

在使用 ReLU 激活的神经网络中,输出中的片断线性区域的数量会随着深度而成倍增长。 但是,当初始参数随机抽样时,这极不可能发生,因此往往导致使用不必要的大网络。为了解决这个问题,我们引入了网络的新参数化,限制其重量,以便一个深度的美元网络在初始化时能够产生精确的2d美元线性区域,并在整个培训过程中在参数化下保持这些区域。这个方法使我们能够学习二次曲线的近似值,即比随机初始化的对等系统更精确的数个数量级的一维函数。我们进一步展示了我们建设的多维和非曲线功能的初步扩展,使得技术能够取代各种结构中传统的密度层。
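The claim that a depth $d$ ReLU network can realize $2^d$ linear regions can be checked on the classic sawtooth construction, in which a two-unit ReLU "tent" layer is composed with itself. This is the well-known textbook example, not the parameterization proposed in the paper; the function names below are hypothetical and the inputs are restricted to $[0, 1]$.

```python
def relu(x):
    return max(x, 0.0)


def tent(x):
    # One hidden layer with two ReLU units:
    # equals 2x on [0, 0.5] and 2 - 2x on [0.5, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def sawtooth(x, depth):
    # Compose the tent layer `depth` times.
    for _ in range(depth):
        x = tent(x)
    return x


def count_linear_regions(depth):
    # Breakpoints of the composed map are multiples of 2**-depth, so a dyadic
    # grid with spacing 2**-(depth + 2) puts every breakpoint on a grid point.
    # All values are dyadic rationals, so the float arithmetic here is exact.
    n = 2 ** (depth + 2)
    ys = [sawtooth(i / n, depth) for i in range(n + 1)]
    slopes = [(ys[i + 1] - ys[i]) * n for i in range(n)]
    changes = sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)
    return changes + 1


regions_by_depth = {d: count_linear_regions(d) for d in range(1, 7)}
```

Each extra layer doubles the number of pieces, which is the behavior randomly initialized networks are, per the abstract, highly unlikely to reach.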


Article 220

Title@2025-06-12 (4): Distributionally-Constrained Adversaries in Online Learning

Title: Distributionally-Constrained Adversaries in Online Learning Verteilt bedingte Widersacher im Online-Lernen 在线学习中经分发的、经过培训的在线学习论文 2506.10293v1

Authors (2): Moïse Blanchard, Samory Kpotufe

There has been much recent interest in understanding the continuum from adversarial to stochastic settings in online learning, with various frameworks including smoothed settings proposed to bridge this gap. We consider the more general and flexible framework of distributionally constrained adversaries in which instances are drawn from distributions chosen by an adversary within some constrained distribution class [RST11]. Compared to smoothed analysis, we consider general distributional classes which allows for a fine-grained understanding of learning settings between fully stochastic and fully adversarial for which a learner can achieve non-trivial regret. We give a characterization for which distribution classes are learnable in this context against both oblivious and adaptive adversaries, providing insights into the types of interplay between the function class and distributional constraints on adversaries that enable learnability. In particular, our results recover and generalize learnability for known smoothed settings. Further, we show that for several natural function classes including linear classifiers, learning can be achieved without any prior knowledge of the distribution class – in other words, a learner can simultaneously compete against any constrained adversary within learnable distribution classes.

最近人们非常关心理解在线学习中从对立式到对立式的连续过程,包括提出平滑的设置等各种框架,以弥合这一差距。我们考虑到分布上受限制的对手较一般和灵活的框架,在这种框架中,从一些受限制的分配类[RST11]中由对手选择的分布中产生。与平滑的分析相比,我们考虑一般分配类,这种分配类可以细微地理解完全随机和完全对立式的学习环境,使学习者能够实现非三重式的遗憾。我们给出了在这种背景下可以学习哪些分销类的特征,既能与盲目对手和适应性对手对立,又能够洞察功能类和对对手的分布限制之间的相互作用类型。特别是,我们的结果恢复了已知的平稳环境,并普遍地学习。此外,我们表明,对于几个自然功能类,包括线性分类,学习可以在不事先了解分配类的情况下实现 – 换句话说,学习者可以同时与任何受限制的对手在可学习的分配类内进行竞争。


Article 221

Title@2025-06-12 (4): Heterogeneous Multi-Agent Reinforcement Learning for Distributed Channel Access in WLANs

Title: Heterogeneous Multi-Agent Reinforcement Learning for Distributed Channel Access in WLANs Heterogenes Multi-Agenten-Verstärkungslernen für verteilten Kanalzugang in WLANs 不同多机构强化学习,用于在局域网中分配的频道接入 2412.14218v2

Authors (6): Jiaming Yu, Le Liang, Chongtao Guo, Ziyang Guo, Shi Jin, Geoffrey Ye Li

This paper investigates the use of multi-agent reinforcement learning (MARL) to address distributed channel access in wireless local area networks. In particular, we consider the challenging yet more practical case where the agents heterogeneously adopt value-based or policy-based reinforcement learning algorithms to train the model. We propose a heterogeneous MARL training framework, named QPMIX, which adopts a centralized training with distributed execution paradigm to enable heterogeneous agents to collaborate. Moreover, we theoretically prove the convergence of the proposed heterogeneous MARL method when using the linear value function approximation. Our method maximizes the network throughput and ensures fairness among stations, therefore, enhancing the overall network performance. Simulation results demonstrate that the proposed QPMIX algorithm improves throughput, mean delay, delay jitter, and collision rates compared with conventional carrier-sense multiple access with collision avoidance (CSMA/CA) mechanism in the saturated traffic scenario. Furthermore, the QPMIX algorithm is robust in unsaturated and delay-sensitive traffic scenarios. It coexists well with the conventional CSMA/CA mechanism and promotes cooperation among heterogeneous agents.

本文调查多试剂强化学习(MARL)的使用,以解决无线局域网中分散的频道接入问题。我们特别审议了一个具有挑战性的、更实际的情况,即代理商采用多种不同的基于价值或基于政策的强化学习算法来培训模型。我们提议了一个称为QPMIX的多式MARL培训框架,采用集中培训,采用分布式执行模式,使不同代理商能够合作。此外,我们从理论上证明,在使用线性值函数近似时,拟议的多元MARL方法已经趋于一致。我们的方法使网络通过量最大化,确保各站之间的公平性,从而加强了整个网络的性能。模拟结果表明,与饱和交通情景下的常规承运人多发式避免碰撞(CSMA/CA)机制相比,拟议的QPMIX算法改进了吞吐量、平均延迟、延迟和碰撞率。此外,QPMIX算法在不饱和迟缓性反应性交通情景下十分稳健。它与常规的CSMA/CA机制共存,并促进各不同物剂之间的合作。


Article 222

Title@2025-06-12 (4): Context Is Not Comprehension

Title: Context Is Not Comprehension Kontext ist nicht verständlich 背景不令人理解 2506.04907v4

Authors (2): Alex Pan, Mary-Anne Williams

The dominant way of judging Large Language Models (LLMs) has been to ask how well they can recall explicit facts from very long inputs. While today’s best models achieve near perfect recall, this masks a harder skill: performing multi-step reasoning and tracking intermediate state that never appears verbatim. We introduce Verbose ListOps (VLO), a benchmark that embeds deterministic ListOps computations inside narrative camouflage and, crucially, allows step-level evaluation of every intermediate result. Experiments show that models which solve raw ListOps with approximately 100% accuracy collapse on VLO after only 10,000 tokens. By exposing where a model’s reasoning chain first diverges, VLO moves assessment beyond sheer context length and toward genuine comprehension. VLO’s generation pipeline is task-agnostic: it can weave any deterministically verifiable reasoning schema – arithmetic, symbolic, abductive, inductive or defeasible – into narrative form. This makes VLO a reusable test-bed for the next wave of reasoning-centric model designs, not merely those with step-explicit scaffolds.

评判大语言模型(LLM)的主流方式一直是考察它们能否从很长的输入中回忆出明确的事实。虽然当今最好的模型已接近完美召回,但这掩盖了一项更难的能力:进行多步推理,并跟踪从未以原文形式出现的中间状态。我们提出Verbose ListOps(VLO),这一基准把确定性的ListOps计算嵌入叙事伪装之中,并且关键的是,允许对每个中间结果进行步骤级评估。实验表明,能以约100%准确率求解原始ListOps的模型,在仅1万个token之后就会在VLO上崩溃。通过揭示模型推理链最先出现偏离的位置,VLO使评估不再局限于上下文长度,而是转向真正的理解。VLO的生成流程与任务无关:它可以把任何可确定性验证的推理模式(算术、符号、溯因、归纳或可废止推理)编织成叙事形式。这使VLO成为下一波以推理为中心的模型设计的可复用测试平台,而不仅限于带有显式步骤脚手架的模型。
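Step-level evaluation of a ListOps computation can be sketched as a small evaluator that records every intermediate result, plus a helper that reports where a model's claimed intermediate values first diverge. This is a hedged illustration of the idea only: it covers the raw bracketed ListOps syntax with three operators (MAX, MIN, and SM for sum modulo 10), it does not reproduce VLO's narrative generation pipeline, and the names `evaluate_with_trace` and `first_divergence` are hypothetical.

```python
OPS = {
    "MAX": max,
    "MIN": min,
    "SM": lambda xs: sum(xs) % 10,  # sum modulo 10
}


def tokenize(expr):
    # Split brackets from operators and digits, e.g. "[MAX 2 [MIN 4 7]]".
    return expr.replace("[", " [ ").replace("]", " ] ").split()


def evaluate_with_trace(expr):
    """Return (value, trace); trace holds (op, args, result), innermost first."""
    tokens = tokenize(expr)
    trace = []
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        if tok != "[":
            pos += 1
            return int(tok)
        op = tokens[pos + 1]
        pos += 2
        args = []
        while tokens[pos] != "]":
            args.append(parse())
        pos += 1  # consume the closing bracket
        result = OPS[op](args)
        trace.append((op, tuple(args), result))
        return result

    return parse(), trace


def first_divergence(predicted_results, trace):
    """Index of the first intermediate result that disagrees, else None."""
    for i, (pred, (_, _, gold)) in enumerate(zip(predicted_results, trace)):
        if pred != gold:
            return i
    return None


value, trace = evaluate_with_trace("[MAX 2 9 [MIN 4 7] 0]")
```

Comparing a model's reported intermediate values against `trace` gives the kind of per-step signal the abstract describes, rather than a single final-answer check.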


Article 223

Title@2025-06-12 (4): ClusterUCB: Efficient Gradient-Based Data Selection for Targeted Fine-Tuning of LLMs

Title: ClusterUCB: Efficient Gradient-Based Data Selection for Targeted Fine-Tuning of LLMs ClusterUCB: Effiziente Gradient-basierte Datenauswahl für gezielte Feinsteuerung von LLMs COCUCB: 高效率的逐步数据选择,以便有针对性地微调LLMM 2506.10288v1

Authors (6): Zige Wang, Qi Zhu, Fei Mi, Minghui Xu, Ruochun Jin, Wenjing Yang

Gradient-based data influence approximation has been leveraged to select useful data samples in the supervised fine-tuning of large language models. However, the computation of gradients throughout the fine-tuning process requires too many resources to be feasible in practice. In this paper, we propose an efficient gradient-based data selection framework with clustering and a modified Upper Confidence Bound (UCB) algorithm. Based on the intuition that data samples with similar gradient features will have similar influences, we first perform clustering on the training data pool. Then, we frame the inter-cluster data selection as a constrained computing budget allocation problem and consider it a multi-armed bandit problem. A modified UCB algorithm is leveraged to solve this problem. Specifically, during the iterative sampling process, historical data influence information is recorded to directly estimate the distributions of each cluster, and a cold start is adopted to balance exploration and exploitation. Experimental results on various benchmarks show that our proposed framework, ClusterUCB, can achieve comparable results to the original gradient-based data selection methods while greatly reducing computing consumption.

在大型语言模型的监督下微调中,已利用基于渐变的数据影响近似值来选择有用的数据样本,但在整个微调过程中计算梯度需要太多的资源才能实际可行。在本文件中,我们建议采用高效的梯度数据选择框架,同时采用集群和修改的超信任量计算法。基于具有类似梯度特征的数据样本具有类似影响的直觉,我们首先在培训数据库中进行分组。然后,我们将集群数据选择视为一个有限的计算预算分配问题,并将其视为一个多臂强盗问题。修改后的UCB算法被用来解决这一问题。具体地说,在迭代抽样过程中,记录历史数据影响信息,直接估计每个组的分布情况,并采用一个寒冷的起点来平衡探索和利用。各种基准的实验结果表明,我们提议的Crob CUCB框架可以取得与原始梯度数据选择方法的类似结果,同时大大减少计算消耗量。
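The pipeline described above (cluster the pool, treat clusters as bandit arms, spend a fixed influence-computation budget, start with a cold-start round) can be sketched as follows. The sketch uses the standard UCB1 index, not the paper's modified UCB, and it assumes the clusters are already given; `cluster_ucb_select`, `influence`, `cold_start`, and `c` are hypothetical names introduced for illustration.

```python
import math


def cluster_ucb_select(clusters, influence, budget, cold_start=1, c=1.0):
    """Spend `budget` influence evaluations across clusters with UCB1.

    clusters:  dict mapping cluster name -> list of sample ids
    influence: callable(sample_id) -> float, the expensive gradient-based
               influence score (a stand-in here)
    Returns (scored samples sorted by influence, pulls per cluster).
    """
    pools = {k: list(v) for k, v in clusters.items()}
    counts = {k: 0 for k in pools}
    sums = {k: 0.0 for k in pools}
    scored = []

    def pull(k):
        sample = pools[k].pop()  # stand-in for drawing a random member
        score = influence(sample)
        counts[k] += 1
        sums[k] += score
        scored.append((sample, score))

    # Cold start: evaluate a few samples from every cluster first.
    for k in pools:
        for _ in range(cold_start):
            if pools[k] and len(scored) < budget:
                pull(k)

    def ucb(k, t):
        if counts[k] == 0:
            return math.inf
        mean = sums[k] / counts[k]
        return mean + c * math.sqrt(2.0 * math.log(t) / counts[k])

    # Main loop: pull the cluster with the highest upper confidence bound.
    while len(scored) < budget:
        candidates = [k for k in pools if pools[k]]
        if not candidates:
            break
        t = len(scored)
        pull(max(candidates, key=lambda k: ucb(k, t)))

    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored, counts


# Toy pool: ids below 100 are "useful" (influence 1.0), the rest are not.
toy_clusters = {"good": list(range(50)), "bad": list(range(100, 150))}
selected, pulls = cluster_ucb_select(
    toy_clusters, lambda i: 1.0 if i < 100 else 0.0, budget=30
)
```

The budget is spent mostly on the cluster whose sampled influence is high, which is the saving the abstract attributes to avoiding gradient computation for the whole pool.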


Article 224

Title@2025-06-12 (4): Graph-MLLM: Harnessing Multimodal Large Language Models for Multimodal Graph Learning

Title: Graph-MLLM: Harnessing Multimodal Large Language Models for Multimodal Graph Learning Graph-MLLM: Multimodale große Sprachmodelle für multimodales Graphenlernen nutzen 图图-MLLM:利用多模式多模式大语言模型进行多模式图表学习 2506.10282v1

Authors (6): Jiajin Liu, Dongzhe Fan, Jiacheng Shen, Chuanhao Ji, Daochen Zha, Qiaoyu Tan

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in representing and understanding diverse modalities. However, they typically focus on modality alignment in a pairwise manner while overlooking structural relationships across data points. Integrating multimodality with structured graph information (i.e., multimodal graphs, MMGs) is essential for real-world applications such as social networks, healthcare, and recommendation systems. Existing MMG learning methods fall into three paradigms based on how they leverage MLLMs: Encoder, Aligner, and Predictor. MLLM-as-Encoder focuses on enhancing graph neural networks (GNNs) via multimodal feature fusion; MLLM-as-Aligner aligns multimodal attributes in language or hidden space to enable LLM-based graph reasoning; MLLM-as-Predictor treats MLLMs as standalone reasoners with in-context learning or fine-tuning. Despite their advances, the MMG field lacks a unified benchmark to fairly evaluate across these approaches, making it unclear what progress has been made. To bridge this gap, we present Graph-MLLM, a comprehensive benchmark for multimodal graph learning by systematically evaluating these three paradigms across six datasets with different domains. Through extensive experiments, we observe that jointly considering the visual and textual attributes of the nodes benefits graph learning, even when using pre-trained text-to-image alignment models (e.g., CLIP) as encoders. We also find that converting visual attributes into textual descriptions further improves performance compared to directly using visual inputs. Moreover, we observe that fine-tuning MLLMs on specific MMGs can achieve state-of-the-art results in most scenarios, even without explicit graph structure information. We hope that our open-sourced library will facilitate rapid, equitable evaluation and inspire further innovative research in this field.

多式大语言模型(MLLM)在体现和理解多种模式方面表现出了非凡的能力,然而,它们通常侧重于模式调整,以配对方式调整,同时忽略各数据点之间的结构关系。将多式信息与结构图形信息(即多式图表、MMGs)相结合,对于社会网络、医疗保健和建议系统等现实世界应用至关重要。现有的MMG学习方法基于如何利用MLMM(MLMM)的三种模式:Encoder、Aleger和Ans-Encorder。MLM-as-Encorder(GNNNs)侧重于通过多式功能组合加强图形直观神经网络(GNNS);MLLM-Aligner(MLM-Aligner)在语言或隐蔽空间将多式图形信息模型属性整合在一起,使LLMMM-MM(MM)结构推理推理;MLM-S-PLM(M)将MLM(MLM(M)作为独立解或微调调解的理论)作为独立解释的理论的理论解释。MLMLMLM(我们在这方面的更深入的更深入评估,我们甚至更深入的理论)实地评估。MLILM(我们更深入的理论)将利用了这些直观的文本,我们更深入的理论的理论,我们更深入的理论,我们更深入的实地的理论,我们观察的实地评估。


Article 225

Title@2025-06-12 (4): Noise Balance and Stationary Distribution of Stochastic Gradient Descent

Title: Noise Balance and Stationary Distribution of Stochastic Gradient Descent Geräuschausgleich und Stationäre Verteilung des stochastischen Gradientenabstiegs 沙粒渐变后裔的噪音平衡和固定分布 2308.06671v2

Authors (3): Liu Ziyin, Hongchao Li, Masahito Ueda

Stochastic gradient descent (SGD) is the algorithm we use to train neural networks. However, it remains poorly understood how SGD navigates the highly nonlinear and degenerate loss landscape of a neural network. In this work, we show that the minibatch noise of SGD regularizes the solution towards a noise-balanced solution whenever the loss function contains a rescaling parameter symmetry. Because the difference between a simple diffusion process and SGD dynamics is the most significant when symmetries are present, our theory implies that the loss function symmetries constitute an essential probe of how SGD works. We then apply this result to derive the stationary distribution of stochastic gradient flow for a diagonal linear network with arbitrary depth and width. The stationary distribution exhibits complicated nonlinear phenomena such as phase transitions, broken ergodicity, and fluctuation inversion. These phenomena are shown to exist uniquely in deep networks, implying a fundamental difference between deep and shallow models.

用于培养神经网络的算法(SGD)是用于培养神经网络的算法。 但是,它仍然不太清楚 SGD是如何导航神经网络高度非线性和退化性损失景观的。 在这项工作中,我们显示SGD的微小噪声在损失函数包含一个伸缩参数对称功能时,会规范对噪音平衡解决办法的解决方案。由于简单扩散过程与SGD动态之间的差别在出现对称时最为显著,我们的理论暗示损失函数对称构成了SGD如何工作的基本探测。 然后我们应用这一结果来为具有任意深度和宽度的对角线网络生成静态的斜度流分布。 静止分布显示了复杂的非线性现象, 如阶段转换、 折断的垂直和波动。 这些现象在深层网络中表现出独特的存在, 意味着深度和浅度模型之间的根本差异 。
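A rescaling parameter symmetry can be made concrete with a two-parameter toy loss. The loss $L(u, w) = \ell(uw)$ below is an assumption chosen for illustration, not the paper's general setting; the short derivation shows why noise-free gradient flow cannot by itself pick out a balanced solution, which is the gap the abstract says minibatch noise fills.

```latex
% Assumed toy loss with a rescaling symmetry:
L(u, w) = \ell(uw), \qquad L(\lambda u, \lambda^{-1} w) = L(u, w) \quad \text{for all } \lambda > 0.

% Gradient flow on (u, w):
\dot{u} = -\ell'(uw)\, w, \qquad \dot{w} = -\ell'(uw)\, u.

% The imbalance u^2 - w^2 is conserved along the flow:
\frac{d}{dt}\left(u^2 - w^2\right) = 2u\dot{u} - 2w\dot{w}
  = -2\,\ell'(uw)\, uw + 2\,\ell'(uw)\, uw = 0.
```

Since $u^2 - w^2$ stays at its initial value under gradient flow, any drift of this quantity during SGD training has to come from the gradient noise.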


Article 226

Title@2025-06-12 (4): Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty

Title: Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty Leichtes Datenset Pruning ohne vollständiges Training über Beispiel Schwierigkeit und Vorhersage Ungewissheit 轻量数据集不通过实例困难和预测不确定性进行充分培训 2502.06905v3

Authors (4): Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun

Recent advances in deep learning rely heavily on massive datasets, leading to substantial storage and training costs. Dataset pruning aims to alleviate this demand by discarding redundant examples. However, many existing methods require training a model with a full dataset over a large number of epochs before being able to prune the dataset, which ironically makes the pruning process more expensive than just training the model on the entire dataset. To overcome this limitation, we introduce a Difficulty and Uncertainty-Aware Lightweight (DUAL) score, which aims to identify important samples from the early training stage by considering both example difficulty and prediction uncertainty. To address the catastrophic accuracy drop at extreme pruning ratios, we further propose ratio-adaptive sampling using the Beta distribution. Experiments on various datasets and learning scenarios, such as image classification with label noise and image corruption, and model architecture generalization, demonstrate the superiority of our method over previous state-of-the-art (SOTA) approaches. Specifically, on ImageNet-1k, our method reduces the time cost for pruning to 66% of that of previous methods while achieving SOTA performance, specifically 60% test accuracy at a 90% pruning ratio. On CIFAR datasets, the time cost is reduced to just 15% while maintaining SOTA performance.

最近深层学习的进展严重依赖大量数据集,导致大量储存和培训费用。数据集修剪的目的是通过丢弃冗余实例来缓解这种需求。然而,许多现有方法需要培训一个模型,在大量时代中拥有完整的数据集,然后才能将数据集精化,讽刺的是,这使得修剪过程比在整个数据集中仅对模型进行训练要贵得多。为了克服这一限制,我们引入了困难和不确定的软件轻量分(DUAL),目的是通过考虑示例困难和预测不确定性,从早期培训阶段找出重要样本,以便从早期培训阶段找出重要样本,从而通过考虑示例困难和预测不确定性来缓解这一需求。要解决在极端修剪裁时灾难性的精确度下降的问题,我们进一步建议使用Beta分布来进行一个适合比例的抽样。实验了各种数据集和学习情景,例如标签噪音和图像腐败的图像分类,以及模型结构化,表明我们的方法优于以往的状态-艺术(SOTA)方法。具体地说,我们的方法旨在降低与先前方法相比的运行时间成本为66%,同时实现SOTA的精确率率率为15,而SOTRA的运行为90。
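A score that combines example difficulty with prediction uncertainty from early training can be sketched as below. The abstract does not give the formula, so everything here is an assumption: difficulty is taken as one minus the mean predicted probability of the true class, uncertainty as the standard deviation of that probability, both computed over sliding windows of epochs and multiplied. The name `dual_style_score` and the `window` parameter are hypothetical.

```python
import statistics


def dual_style_score(true_class_probs, window=3):
    """Difficulty-times-uncertainty score from early-epoch predictions.

    true_class_probs: the model's predicted probability of the correct
    label for one example, recorded once per early training epoch.
    """
    if window < 2 or len(true_class_probs) < window:
        raise ValueError("need window >= 2 and at least `window` epochs")
    scores = []
    for start in range(len(true_class_probs) - window + 1):
        chunk = true_class_probs[start:start + window]
        difficulty = 1.0 - statistics.fmean(chunk)  # low confidence = hard
        uncertainty = statistics.stdev(chunk)       # fluctuation over epochs
        scores.append(difficulty * uncertainty)
    return statistics.fmean(scores)


# An example learned immediately and stably versus one that keeps flipping.
stable_easy = dual_style_score([0.99, 0.99, 0.99, 0.99, 0.99, 0.99])
fluctuating = dual_style_score([0.2, 0.6, 0.3, 0.7, 0.2, 0.6])
```

Under these assumptions a pruning step would keep the highest-scoring examples; the ratio-adaptive Beta sampling mentioned in the abstract is not reproduced here.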


Article 227

Title@2025-06-12 (4): VQC-MLPNet: An Unconventional Hybrid Quantum-Classical Architecture for Scalable and Robust Quantum Machine Learning

Title: VQC-MLPNet: An Unconventional Hybrid Quantum-Classical Architecture for Scalable and Robust Quantum Machine Learning VQC-MLPNet: Eine unkonventionelle hybride Quantum-Klassische Architektur für skalierbares und robustes Quantum Machine Learning VQC-MLPNet:用于可缩放和强力量量机器学习的非常规混合量子分类结构 2506.10275v1

Authors (4): Jun Qi, Chao-Han Yang, Pin-Yu Chen, Min-Hsiu Hsieh

Variational Quantum Circuits (VQCs) offer a novel pathway for quantum machine learning, yet their practical application is hindered by inherent limitations such as constrained linear expressivity, optimization challenges, and acute sensitivity to quantum hardware noise. This work introduces VQC-MLPNet, a scalable and robust hybrid quantum-classical architecture designed to overcome these obstacles. By innovatively employing quantum circuits to dynamically generate parameters for classical Multi-Layer Perceptrons (MLPs) via amplitude encoding and parameterized quantum operations, VQC-MLPNet substantially expands representation capabilities and augments training stability. We provide rigorous theoretical guarantees via statistical learning techniques and Neural Tangent Kernel analysis, explicitly deriving upper bounds on approximation, uniform deviation, and optimization errors. These theoretical insights demonstrate exponential improvements in representation capacity relative to quantum circuit depth and the number of qubits, providing clear computational advantages over standalone quantum circuits and existing hybrid quantum architectures. Our theoretical claims are empirically corroborated through extensive experiments, including classifying semiconductor quantum-dot charge states and predicting genomic transcription factor binding sites, demonstrating resilient performance even under realistic IBM quantum noise simulations. This research establishes a theoretically sound and practically robust framework, advancing the frontiers of quantum-enhanced learning for unconventional computing paradigms in the Noisy Intermediate-Scale Quantum era and beyond.

QQC-MLPNet(VQC-MLPNet)为量子机器学习提供了一条全新的路径,然而,其实际应用却受到内在限制的阻碍,例如限制直线表达、优化挑战、对量子硬件噪音的敏感度等固有限制。这项工作引入了VQC-MLPNet,这是一个可升级和强大的混合量子古典结构,旨在克服这些障碍。通过创新利用量子电路,动态地为古典多低视谱(MLP)生成参数,通过振幅编码和参数化量子操作,VQC-MLPNet(VQC-MLPNet)大大扩大了代表性能力,增强了培训稳定性。我们通过统计学习技术和神经中内内内内内内内内内内核分析,提供严格的理论保障,明确得出关于近似、统一偏差和优化误差的上界限。这些理论洞察显示,相对于量电路深度和Qibitis数量结构(ML)而言,代表性明显具有计算优势。我们的理论主张通过广泛的实验得到证实,包括将半导师型量控股价充量充量充量充量的状态,并预测在现实的代平级平级平级研究中平级平级平级平级平级的平级结构结构结构结构结构内,建立具有实际的平质性平质性平质的平级研究基础的平质性平标。


Article 228

Title@2025-06-12 (4): TimeBridge: Better Diffusion Prior Design with Bridge Models for Time Series Generation

Title: TimeBridge: Better Diffusion Prior Design with Bridge Models for Time Series Generation TimeBridge: Besseres Diffusions-Vorentwurf mit Bridge-Modellen für die Zeitreihenerstellung TimeBridge: 更好的传播前设计,具有时间序列生成的桥梁模型 2408.06672v2

Authors (5): Jinseong Park, Seungyun Lee, Woojin Jeong, Yujin Choi, Jaewook Lee

Time series generation is widely used in real-world applications such as simulation, data augmentation, and hypothesis testing. Recently, diffusion models have emerged as the de facto approach to time series generation, enabling diverse synthesis scenarios. However, the fixed standard-Gaussian diffusion prior may be ill-suited to properties of general time series data, such as temporal order and fixed points. In this paper, we propose TimeBridge, a framework that flexibly synthesizes time series data by using diffusion bridges to learn paths between a chosen prior and the data distribution. We then explore several prior designs tailored to time series synthesis. Our framework covers (i) data- and time-dependent priors for unconditional generation and (ii) scale-preserving priors for conditional generation. Experiments show that our framework with data-driven priors outperforms standard diffusion models on time series generation.

时间序列生成被广泛用于模拟、数据扩增和假设测试等现实世界应用。最近,扩散模型被作为时间序列生成的实际方法出现,从而产生了多种合成设想。然而,以前固定的标准-Gausian扩散可能不适合一般时间序列数据,如时间顺序和固定点。在本文中,我们提议了TimeBridge,这是一个通过使用传播桥梁来灵活合成时间序列数据的框架,以学习选定先前数据分布与数据分布之间的路径。然后我们探索了以前为时间序列合成而定制的若干设计。我们的框架涵盖(一) 无条件生成的数据和时间前期,以及(二) 有条件生成的缩放前期。实验表明,我们以数据驱动的前期生成框架在时间序列生成方面优于标准扩散模型。
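
As a rough illustration of the bridge idea in the abstract above, the sketch below samples an intermediate state of a Brownian bridge pinned to a data series at t = 0 and to a prior sample at t = 1. It is not the TimeBridge implementation: the function name `sample_bridge`, the noise scale `sigma`, and the toy data-dependent prior (the mean of the series) are assumptions made only for this example.

```python
import numpy as np

def sample_bridge(x_data, x_prior, t, sigma, rng):
    """Draw x_t from a Brownian bridge pinned at x_data (t=0) and x_prior (t=1).

    The marginal is Gaussian with mean (1 - t) * x_data + t * x_prior
    and standard deviation sigma * sqrt(t * (1 - t)).
    """
    mean = (1.0 - t) * x_data + t * x_prior
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(x_data.shape)

rng = np.random.default_rng(0)
x_data = np.sin(np.linspace(0.0, 2.0 * np.pi, 24))   # toy time series
x_prior = np.full_like(x_data, x_data.mean())        # hypothetical data-dependent prior
x_mid = sample_bridge(x_data, x_prior, t=0.5, sigma=0.1, rng=rng)
```

Because the bridge variance vanishes at both endpoints, the sample equals the data at t = 0 and the prior at t = 1, which is what lets the prior be chosen freely instead of being fixed to a standard Gaussian.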


Article 229

Title@2025-06-12 (4): GD doesn’t make the cut: Three ways that non-differentiability affects neural network training

Title: GD doesn’t make the cut: Three ways that non-differentiability affects neural network training GD macht nicht den Schnitt: Drei Wege, dass Nicht-Unterschiedlichkeit das neuronale Netzwerk-Training beeinflusst GD没有做出削减: 三种无差异的方式影响神经网络培训 2401.08426v9

Authors (1): Siddharth Krishna Kumar

This paper critically examines the fundamental distinctions between gradient methods applied to non-differentiable functions (NGDMs) and classical gradient descents (GDs) for differentiable functions, revealing significant gaps in current deep learning optimization theory. We demonstrate that NGDMs exhibit markedly different convergence properties compared to GDs, strongly challenging the applicability of extensive neural network convergence literature based on $L$-smoothness to non-smooth neural networks. Our analysis reveals paradoxical behavior of NGDM solutions for $L_{1}$-regularized problems, where increasing regularization counterintuitively leads to larger $L_{1}$ norms of optimal solutions. This finding calls into question widely adopted $L_{1}$ penalization techniques for network pruning. We further challenge the common assumption that optimization algorithms like RMSProp behave similarly in differentiable and non-differentiable contexts. Expanding on the Edge of Stability phenomenon, we demonstrate its occurrence in a broader class of functions, including Lipschitz continuous convex differentiable functions. This finding raises important questions about its relevance and interpretation in non-convex, non-differentiable neural networks, particularly those using ReLU activations. Our work identifies critical misunderstandings of NGDMs in influential literature, stemming from an overreliance on strong smoothness assumptions. These findings necessitate a reevaluation of optimization dynamics in deep learning, emphasizing the crucial need for more nuanced theoretical foundations in analyzing these complex systems.

本文批判性地考察了应用于不可微函数的梯度方法(NGDM)与用于可微函数的经典梯度下降(GD)之间的根本区别,揭示了当前深度学习优化理论中的重大空白。我们证明,NGDM 表现出与 GD 明显不同的收敛性质,这对基于 $L$-光滑性的大量神经网络收敛文献在非光滑神经网络上的适用性提出了强烈质疑。我们的分析揭示了 NGDM 在 $L_{1}$ 正则化问题上的解具有悖论性行为:增大正则化强度反而会导致最优解的 $L_{1}$ 范数更大。这一发现使人们对网络剪枝中广泛采用的 $L_{1}$ 惩罚技术产生疑问。我们进一步质疑了一个常见假设,即 RMSProp 等优化算法在可微和不可微情形下表现相似。在稳定性边缘(Edge of Stability)现象的基础上,我们证明该现象出现在更广泛的函数类中,包括 Lipschitz 连续的凸可微函数。这一发现引出了关于其在非凸、不可微神经网络(尤其是使用 ReLU 激活的网络)中的相关性和解释的重要问题。我们的工作指出了有影响力的文献中对 NGDM 的关键误解,这些误解源于对强光滑性假设的过度依赖。这些发现要求重新评估深度学习中的优化动力学,并强调在分析这些复杂系统时亟需更细致的理论基础。
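
A toy contrast may help make the convergence claim concrete. The sketch below is not taken from the paper; it simply runs the same constant-step update on a smooth function, f(x) = x^2 / 2, and on a non-differentiable one, f(x) = |x|, using a subgradient for the latter. The helper names `run_gd` and `sign` and the step size 0.3 are choices made for this illustration.

```python
def sign(x):
    """Return -1, 0 or 1; used as a subgradient of |x|."""
    return (x > 0) - (x < 0)

def run_gd(grad, x0, step, n_steps):
    """Apply x <- x - step * grad(x) and record every iterate."""
    x = x0
    history = [x]
    for _ in range(n_steps):
        x = x - step * grad(x)
        history.append(x)
    return history

# Smooth case: f(x) = x^2 / 2, gradient x. Iterates shrink geometrically.
smooth = run_gd(lambda x: x, x0=1.0, step=0.3, n_steps=100)

# Non-smooth case: f(x) = |x|, subgradient sign(x). Iterates end up
# hopping back and forth across the kink at 0 instead of settling on it.
nonsmooth = run_gd(lambda x: float(sign(x)), x0=1.0, step=0.3, n_steps=100)
```

With a fixed step size, the smooth run converges to the minimizer, while the non-smooth run keeps oscillating within one step length of it. An iterate that never lands exactly on zero is also a simple reason to be careful when reading sparsity off $L_{1}$-penalized weights trained this way.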


Article 230

Title@2025-06-12 (4): Predicting function of evolutionarily implausible DNA sequences

Title: Predicting function of evolutionarily implausible DNA sequences Vorhersagefunktion evolutionär unglaubwürdiger DNA-Sequenzen 进化性DNA序列的预测功能 2506.10271v1

Authors (3): Shiyu Jiang, Xuyin Liu, Zitong Jerry Wang

Genomic language models (gLMs) show potential for generating novel, functional DNA sequences for synthetic biology, but doing so requires them to learn not just evolutionary plausibility, but also sequence-to-function relationships. We introduce a set of prediction tasks called Nullsettes, which assess a model’s ability to predict loss-of-function mutations created by translocating key control elements in synthetic expression cassettes. Across 12 state-of-the-art models, we find that mutation effect prediction performance strongly correlates with the predicted likelihood of the nonmutant. Furthermore, the range of likelihood values predictive of strong model performance is highly dependent on sequence length. Our work highlights the importance of considering both sequence likelihood and sequence length when using gLMs for mutation effect prediction.

基因组语言模型(gLMs)显示出为合成生物学生成新型、功能性DNA序列的潜力,但这样做要求它们不仅了解进化的可信任性,而且还了解序列到功能关系。我们引入了一套称为Nullsettes的预测任务,评估模型预测通过合成表达式磁带中关键控制要素的转换产生的功能性突变的能力。在12个最先进的模型中,我们发现变异效应预测性能与预测的非变异性的可能性密切相关。此外,预测强型性能的可能性值范围高度取决于序列长度。我们的工作强调了在使用 gLMs进行突变效应预测时考虑序列概率和序列长度的重要性。


Article 231

Title@2025-06-12 (4): Interior-Point Vanishing Problem in Semidefinite Relaxations for Neural Network Verification

Title: Interior-Point Vanishing Problem in Semidefinite Relaxations for Neural Network Verification Interior-Point Vanishing Problem in semidefinite Entspannung für Neural Network Verifikation 神经网络核查半无限期放松措施中的内部-点消失问题 2506.10269v1

Authors (4): Ryota Ueda, Takami Sato, Ken Kobayashi, Kazuhide Nakata

Semidefinite programming (SDP) relaxation has emerged as a promising approach for neural network verification, offering tighter bounds than other convex relaxation methods for deep neural networks (DNNs) with ReLU activations. However, we identify a critical limitation in the SDP relaxation when applied to deep networks: interior-point vanishing, which leads to the loss of strict feasibility – a crucial condition for the numerical stability and optimality of SDP. Through rigorous theoretical and empirical analysis, we demonstrate that as the depth of DNNs increases, strict feasibility is likely to be lost, creating a fundamental barrier to scaling SDP-based verification. To address interior-point vanishing, we design and investigate five solutions to enhance the feasibility conditions of the verification problem. Our methods can successfully solve 88% of the problems that could not be solved by existing methods, accounting for 41% of the total. Our analysis also reveals that the valid constraints on the lower and upper bounds of each ReLU unit, traditionally inherited from prior work without solid justification, are not merely unhelpful but actually harmful to the problem’s feasibility. This work provides valuable insights into the fundamental challenges of SDP-based DNN verification and offers practical solutions to improve its applicability to deeper neural networks, contributing to the development of more reliable and secure systems with DNNs.

在神经网络核查方面,半终点编程的放松已成为一个有希望的神经网络核查方法,为深神经网络的激活提供了比其他孔雀放松方法更紧的界限。然而,我们发现,在应用到深网络时,SDP放松是一个关键限制:内点消失,导致失去严格的可行性 – – 这是SDP数字稳定性和最佳性的一个关键条件。通过严格的理论和经验分析,我们表明,随着DNN的深度增加,严格的可行性有可能丧失,为扩大SDP的核查造成一个基本障碍。为解决内部点消失的问题,我们设计和调查五项解决办法,以加强核查问题的可行性条件。我们的方法可以成功地解决现有方法无法解决的88%的问题,占总数的41%。我们的分析还表明,对ReLU每个单位的下层和上层界限的有效限制,传统上是没有坚实理由从先前工作中继承的,但实际上不仅不可靠,而且对扩大基于SDP的核查范围构成根本挑战,而且对问题的实际可行性也有害。这项工作为SNPR更深入的核查提供了宝贵的见解,为SD的系统提供了更可靠的安全性的挑战。


Article 232

Title@2025-06-12 (4): Do Language Models Have Bayesian Brains? Distinguishing Stochastic and Deterministic Decision Patterns within Large Language Models

Title: Do Language Models Have Bayesian Brains? Distinguishing Stochastic and Deterministic Decision Patterns within Large Language Models Haben Sprachmodelle Bayesische Gehirne? Beeindruckende stochastische und deterministische Entscheidungsmuster innerhalb großer Sprachmodelle 语言模式是否具有贝耶斯人脑? 区分大语言模式中的斯托卡和决定性决定模式 2506.10268v1

Authors (2): Andrea Yaoyun Cui, Pengfei Yu

Language models are essentially probability distributions over token sequences. Auto-regressive models generate sentences by iteratively computing and sampling from the distribution of the next token. This iterative sampling introduces stochasticity, leading to the assumption that language models make probabilistic decisions, similar to sampling from unknown distributions. Building on this assumption, prior research has used simulated Gibbs sampling, inspired by experiments designed to elicit human priors, to infer the priors of language models. In this paper, we revisit a critical question: Do language models possess Bayesian brains? Our findings show that under certain conditions, language models can exhibit near-deterministic decision-making, such as producing maximum likelihood estimates, even with a non-zero sampling temperature. This challenges the sampling assumption and undermines previous methods for eliciting human-like priors. Furthermore, we demonstrate that without proper scrutiny, a system with deterministic behavior undergoing simulated Gibbs sampling can converge to a “false prior.” To address this, we propose a straightforward approach to distinguish between stochastic and deterministic decision patterns in Gibbs sampling, helping to prevent the inference of misleading language model priors. We experiment on a variety of large language models to identify their decision patterns under various circumstances. Our results provide key insights into the decision making of large language models.

语言模型基本上是象征性序列的概率分布。 自动递减模型通过迭代计算和采样从下一个符号的分布中产生句子。 迭代抽样引入了随机性, 从而假设语言模型作出概率性决定, 类似于未知分布的抽样。 基于这一假设, 先前的研究使用了模拟Gibs抽样, 受旨在提取人类前科的实验的启发, 并参照了语言模型的前身。 在本文中, 我们重新审视了一个关键问题 : 语言模型是否拥有巴耶斯人的大脑? 我们的研究结果显示, 在某些条件下, 语言模型可以展示出近乎决定性的决策决策, 例如生成最大概率估计, 甚至是非零采样温度。 这挑战了取样假设, 并破坏了先前的人类类似模式。 此外, 我们证明, 未经适当审查, 模拟吉布抽样抽样的具有确定性行为的系统可以归结为“ 假象 ” 。 为了解决这个问题, 我们建议一种直截了的方法来区分随机和确定性决定性的决定模式, 有助于防止在大量语言模型中做出误导性判断。


Article 233

Title@2025-06-12 (4): Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update

Title: Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update Heavy-Tailed Linear Bandits: Huber-Regression mit One-Pass-Update 重困线性强盗:用单页更新的 “ Huber Refer “ 后退 2503.00419v2

Authors (4): Jing Wang, Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou

We study stochastic linear bandits with heavy-tailed noise. Two principled strategies for handling heavy-tailed noise, truncation and median-of-means, have been introduced to heavy-tailed bandits. Nonetheless, these methods rely on specific noise assumptions or bandit structures, limiting their applicability to general settings. The recent work [Huang et al., 2024] develops a soft truncation method via adaptive Huber regression to address these limitations. However, their method incurs undesirable computational costs: it requires storing all historical data and performing a full pass over these data at each round. In this paper, we propose a one-pass algorithm based on the online mirror descent framework. Our method updates using only current data at each round, reducing the per-round computational cost from $\mathcal{O}(t \log T)$ to $\mathcal{O}(1)$ with respect to current round $t$ and the time horizon $T$, and achieves a near-optimal and variance-aware regret of order $\widetilde{\mathcal{O}}\big(d T^{\frac{1-\epsilon}{2(1+\epsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$ where $d$ is the dimension and $\nu_t^{1+\epsilon}$ is the $(1+\epsilon)$-th central moment of reward at round $t$.

我们研究带有重尾噪声的随机线性赌博机问题。截断和均值中位数(median-of-means)这两种处理重尾噪声的原则性策略已被引入重尾赌博机。然而,这些方法依赖于特定的噪声假设或赌博机结构,限制了其在一般情形下的适用性。近期工作 [Huang et al., 2024] 通过自适应 Huber 回归提出了一种软截断方法来解决这些局限。但该方法的计算开销不理想:它需要存储全部历史数据,并在每一轮对这些数据做一次完整遍历。本文提出一种基于在线镜像下降框架的单遍(one-pass)算法。我们的方法在每一轮仅使用当前数据进行更新,将每轮计算成本从 $\mathcal{O}(t \log T)$ 降低到 $\mathcal{O}(1)$(相对于当前轮次 $t$ 和时间范围 $T$),并取得近似最优且方差感知的遗憾界 $\widetilde{\mathcal{O}}\big(d T^{\frac{1-\epsilon}{2(1+\epsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$,其中 $d$ 为维度,$\nu_t^{1+\epsilon}$ 为第 $t$ 轮奖励的 $(1+\epsilon)$ 阶中心矩。
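
To show what "one-pass" means in practice, here is a minimal sketch of an estimator that touches only the current observation in each round. It uses a plain online gradient step on the Huber loss, which is a simplification and not the online mirror descent update of the paper; the names `huber_grad` and `one_pass_update`, the threshold `tau`, the learning rate, and the Student-t noise are all assumptions made for this example.

```python
import numpy as np

def huber_grad(residual, tau):
    """Derivative of the Huber loss in the residual: identity inside [-tau, tau], clipped outside."""
    return np.clip(residual, -tau, tau)

def one_pass_update(theta, x, reward, tau=1.0, lr=0.01):
    """Update the estimate from the current (x, reward) pair only; no history is stored."""
    residual = x @ theta - reward
    return theta - lr * huber_grad(residual, tau) * x

rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])   # hypothetical ground-truth parameter
theta = np.zeros(3)
for _ in range(5000):
    x = rng.normal(size=3)                # feature vector of the chosen arm
    noise = rng.standard_t(df=3)          # heavy-tailed reward noise
    theta = one_pass_update(theta, x, x @ theta_true + noise)
```

Clipping the residual bounds the influence of any single heavy-tailed reward, and the per-round work here does not grow with the round index, in contrast to refitting a Huber regression on all past data.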


Article 234

Title@2025-06-12 (4): Meta-learning Representations for Learning from Multiple Annotators

Title: Meta-learning Representations for Learning from Multiple Annotators Meta-Learning-Darstellungen zum Lernen von mehreren Annotatoren 从多个宣传员那里学习的元学习代表 2506.10259v1

Authors (5): Atsutoshi Kumagai, Tomoharu Iwata, Taishi Nishiyama, Yasutoshi Ida, Yasuhiro Fujiwara

We propose a meta-learning method for learning from multiple noisy annotators. In many applications such as crowdsourcing services, labels for supervised learning are given by multiple annotators. Since the annotators have different skills or biases, the given labels can be noisy. To learn accurate classifiers, existing methods require a large amount of noisily annotated data. However, sufficient data might be unavailable in practice. To overcome the lack of data, the proposed method uses labeled data obtained in different but related tasks. The proposed method embeds each example in tasks to a latent space by using a neural network and constructs a probabilistic model for learning a task-specific classifier while estimating annotators’ abilities on the latent space. This neural network is meta-learned to improve the expected test classification performance when the classifier is adapted to a given small amount of annotated data. This classifier adaptation is performed by maximizing the posterior probability via the expectation-maximization (EM) algorithm. Since each step in the EM algorithm is easily computed in closed form and is differentiable, the proposed method can efficiently backpropagate the loss through the EM algorithm to meta-learn the neural network. We show the effectiveness of our method with real-world datasets with synthetic noise and real-world crowdsourcing datasets.

我们建议了一种向多个吵闹的告示员学习的元学习方法。 在诸如众包服务等许多应用中,监督学习的标签由多个说明员提供。 由于批注员有不同的技能或偏差, 给标签会很吵。 要学习准确的分类师, 现有的方法需要许多吵闹的附加说明的数据。 但是, 在实践中可能缺乏足够的数据。 为了克服数据缺乏的问题, 提议的方法使用在不同但相关的任务中获得的标签数据。 拟议的方法将每个例子嵌入任务的潜在空间中, 使用神经网络, 并构建一种在估计潜在空间内特定任务分类员的能力时学习特定分类员的概率模型。 这个神经网络是元化的, 以提高预期的测试分类性性能, 当分类员适应给定少量附加说明的数据时。 这个分类员的调整是通过期望- 最大化的算法来进行。 由于EM 算法的每个步骤很容易以封闭的形式和不同的方式计算, 拟议的方法可以有效地将一个特定分类员在潜在空间中学习一个任务分类员的能力模型。 这个神经网络的计算出真正的损失, 通过我们 EM 的合成的算算算算方法可以有效地用真实数据系统来显示真实的数据- 。


Article 235

Title@2025-06-12 (4): Elucidating the Design Space of Multimodal Protein Language Models

Title: Elucidating the Design Space of Multimodal Protein Language Models Erklärung des Gestaltungsraums multimodaler Protein-Sprachmodelle 说明多模式蛋白质语言模型的设计空间 2504.11454v3

Authors (8): Cheng-Yen Hsieh, Xinyou Wang, Daiheng Zhang, Dongyu Xue, Fei Ye, Shujian Huang, Zaixiang Zheng, Quanquan Gu

Multimodal protein language models (PLMs) integrate sequence and token-based structural information, serving as a powerful foundation for protein modeling, generation, and design. However, the reliance on tokenizing 3D structures into discrete tokens causes substantial loss of fidelity about fine-grained structural details and correlations. In this paper, we systematically elucidate the design space of multimodal PLMs to overcome their limitations. We identify tokenization loss and inaccurate structure token predictions by the PLMs as major bottlenecks. To address these, our proposed design space covers improved generative modeling, structure-aware architectures and representation learning, and data exploration. Our advancements approach finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling. The effective design methods dramatically improve the structure generation diversity and, notably, the folding abilities of our 650M model, reducing the RMSD from 5.52 to 2.36 on the PDB test set, even outperforming 3B baselines and performing on par with specialized folding models. Project page and code: https://bytedance.github.io/dplm/dplm-2.1/.

多式蛋白语言模型(PLM)结合了序列和基于象征性的结构信息,作为蛋白质建模、生成和设计的一个强有力的基础。然而,依赖象征性的三维结构作为离散的象征物造成了对精细结构细节和相关性的忠实性的巨大损失。在本文件中,我们系统地阐明了多式蛋白语言模型的设计空间,以克服其局限性。我们确定PLM模型的象征性损失和不准确的结构象征性预测是主要的瓶颈。为解决这些问题,我们拟议的设计空间包括改良的基因模型、结构认知架构和代表性学习以及数据探索。我们的进步方法的精细化监督,表明基于象征性的多式MLMs能够实现强有力的结构建模。有效的设计方法极大地改进了我们650M模型的结构多样性,特别是将PDB测试的RMSD从5.52降至2.36,甚至超过3B基线和与专门折叠模型相匹配。项目页和代码:https://byteance.gith.io/dplm/dplm-2.1/。


Article 236

Title@2025-06-12 (4): OmniSage: Large Scale, Multi-Entity Heterogeneous Graph Representation Learning

Title: OmniSage: Large Scale, Multi-Entity Heterogeneous Graph Representation Learning OmniSage: Large Scale, Multi-Entity Heterogenes Graph Representation Learning 总括:大规模、多实体异质图代表制学习 2504.17811v3

Authors (8): Anirudhan Badrinath, Alex Yang, Kousik Rajesh, Prabhat Agarwal, Jaewon Yang, Haoyu Chen, Jiajing Xu, Charles Rosenberg

Representation learning, a task of learning latent vectors to represent entities, is a key task in improving search and recommender systems in web applications. Various representation learning methods have been developed, including graph-based approaches for relationships among entities, sequence-based methods for capturing the temporal evolution of user activities, and content-based models for leveraging text and visual content. However, the development of a unifying framework that integrates these diverse techniques to support multiple applications remains a significant challenge. This paper presents OmniSage, a large-scale representation framework that learns universal representations for a variety of applications at Pinterest. OmniSage integrates graph neural networks with content-based models and user sequence models by employing multiple contrastive learning tasks to effectively process graph data, user sequence data, and content signals. To support the training and inference of OmniSage, we developed an efficient infrastructure capable of supporting Pinterest graphs with billions of nodes. The universal representations generated by OmniSage have significantly enhanced user experiences on Pinterest, leading to an approximate 2.5% increase in sitewide repins (saves) across five applications. This paper highlights the impact of unifying representation learning methods, and we make the model code publicly available at https://github.com/pinterest/atg-research/tree/main/omnisage.

代表学习是学习潜在载体以代表实体的一项任务,是改进网络应用程序搜索和建议系统的关键任务。已经开发了各种代表学习方法,包括实体间关系的图形化方法、用户活动时间演变的顺序化方法、利用文字和视觉内容的基于内容的模式。然而,开发一个整合这些不同技术以支持多种应用的统一框架,仍是一项重大挑战。本文介绍了OmniSage,这是一个大型代表框架,它学习各种应用在兴趣中的通用代表。OmniSage将图形神经网络与基于内容的模式和用户序列模型相结合,为此采用多种对比性学习任务来有效处理图表数据、用户序列数据和内容信号。为了支持OmniSage的培训和推断,我们开发了一个高效的基础设施,能够支持有数十亿个节点的利息图表。OmniSage的通用表述极大地增强了用户在兴趣方面的经验,导致五个应用程序的全网站(savises)调控器增加了大约2.5%。本文着重介绍了统一代表性/在线学习方法的影响。


Article 237

Title@2025-06-12 (4): Qronos: Correcting the Past by Shaping the Future… in Post-Training Quantization

Title: Qronos: Correcting the Past by Shaping the Future… in Post-Training Quantization Qronos: Die Vergangenheit korrigieren, indem sie die Zukunft gestalten… in der Post-Training-Quantisierung Qronos:在培训后量化中通过塑造未来来纠正过去 2505.11695v2

Authors (4): Shihao Zhang, Haoyu Zhang, Ian Colbert, Rayan Saab

We introduce Qronos – a new state-of-the-art post-training quantization algorithm that sequentially rounds and updates neural network weights. Qronos not only explicitly corrects errors due to both weight and activation quantization, but also errors resulting from quantizing previous layers. Our iterative algorithm is based on an interpretable and disciplined optimization framework that subsumes and surpasses existing data-driven approaches. At each step, Qronos alternates between error correction and diffusion via optimal update rules. Importantly, we prove that Qronos admits an efficient implementation that uses the Cholesky decomposition for solving least-squares problems. We also demonstrate that Qronos is compatible with existing transformation techniques such as Hadamard-based incoherence processing and weight-activation scaling equalization, among others. We evaluate Qronos using recent autoregressive language generation models in the Llama3 family; Qronos consistently outperforms previous state-of-the-art adaptive rounding methods when quantizing the weights, activations, and/or KV caches.

我们引入了Qronos – – 一种新的最先进的培训后量化算法,它依次循环并更新神经网络重量。 Qronos不仅明确纠正由于重量和激活量化造成的错误, 而且还纠正了前几层的量化所产生的错误。 我们的迭代算法基于一个可解释和有纪律的优化框架, 该框架将分解并超过现有的数据驱动方法。 每一步, Qronos通过最佳更新规则将错误更正与传播相交。 重要的是, 我们证明 Qronos 承认了一种高效的实施, 使用Choolesky分解法解决最小方位问题。 我们还表明, Qronos 与现有的变异技术兼容, 例如基于 Hadamad 的异位处理和重振动比例均衡等 。 我们使用最近的Llama3 家族的反向语言生成模型来评估Qronos ; Qronos 一贯地超越了在对重重力、 激活和/ 或 KV 缓存中进行四轮回调的旧方法。


Article 238

Title@2025-06-11 (3): Worth Their Weight: Randomized and Regularized Block Kaczmarz Algorithms without Preprocessing

Title: Worth Their Weight: Randomized and Regularized Block Kaczmarz Algorithms without Preprocessing Gewicht wert: Randomisierte und Regularisierte Blockkaczmarz-Algorithmen ohne Vorverarbeitung 重量:无预处理的随机和正规化卡茨马尔兹理算区块 2502.00882v2

Authors (3): Gil Goldshlager, Jiang Hu, Lin Lin

Due to the ever growing amounts of data leveraged for machine learning and scientific computing, it is increasingly important to develop algorithms that sample only a small portion of the data at a time. In the case of linear least-squares, the randomized block Kaczmarz method (RBK) is an appealing example of such an algorithm, but its convergence is only understood under sampling distributions that require potentially prohibitively expensive preprocessing steps. To address this limitation, we analyze RBK when the data is sampled uniformly, showing that its iterates converge in a Monte Carlo sense to a $\textit{weighted}$ least-squares solution. Unfortunately, for general problems the condition number of the weight matrix and the variance of the iterates can become arbitrarily large. We control these issues by incorporating regularization into the RBK iterations, yielding the regularized algorithm ReBlocK. Numerical experiments including examples arising from natural gradient optimization demonstrate that ReBlocK can outperform both RBK and minibatch stochastic gradient descent for inconsistent problems with rapidly decaying singular values.

由于机器学习和科学计算所利用的数据数量不断增加,因此越来越有必要发展一种算法,每次只对一小部分数据进行抽样。对于线性最小方块而言,随机的卡茨马尔兹区块方法(Kaczmarz)是这种算法的引人注意的例子,但它的趋同只能通过抽样分布来理解,而抽样分布则可能需要花费巨大得多的预处理步骤。为了解决这一限制,我们在对数据进行统一取样时对RBK进行了分析,表明其循环在蒙特卡洛感中汇合到美元(textit{加权}美元)最低方块的溶液中。不幸的是,对于一般问题,重量矩阵的条件数量和迭代国的差异可能会变得任意大。我们通过将正规化纳入RBK迭代法,产生正规化的算法ReBlocK. Numerical实验,包括自然梯度优化所产生的实例,表明ReBlocK和微型批次梯度梯度梯度梯度梯度梯度梯度梯度梯度下降,因为问题与指数的迅速衰减。
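
The iteration described in the abstract above can be sketched in a few lines. The update used here, x <- x + A_S^T (A_S A_S^T + lam I)^{-1} (b_S - A_S x) with a uniformly sampled row block S, is one common way to write a regularized block Kaczmarz step and is an assumption of this example; the exact scaling of the regularizer in ReBlocK may differ, and the function name `reblock_sketch` is hypothetical.

```python
import numpy as np

def reblock_sketch(A, b, block_size, lam, n_iters, rng):
    """Regularized block Kaczmarz with uniformly sampled row blocks (no preprocessing)."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(n_iters):
        rows = rng.choice(m, size=block_size, replace=False)   # uniform block sample
        A_S, b_S = A[rows], b[rows]
        # The regularized Gram matrix stays well conditioned even if A_S is nearly rank deficient.
        gram = A_S @ A_S.T + lam * np.eye(block_size)
        x = x + A_S.T @ np.linalg.solve(gram, b_S - A_S @ x)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
x_true = rng.normal(size=10)
b = A @ x_true                      # consistent system, chosen so the result is easy to check
x_hat = reblock_sketch(A, b, block_size=5, lam=1e-2, n_iters=500, rng=rng)
```

The demo uses a consistent system, where the iterates should approach the exact solution. The case of interest in the abstract is the inconsistent one, where the iterates fluctuate and only their average is meaningful; setting `lam` to zero recovers an unregularized block step that requires `A_S @ A_S.T` to be invertible.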


Article 239

Title@2025-06-11 (3): A new type of federated clustering: A non-model-sharing approach

Title: A new type of federated clustering: A non-model-sharing approach Eine neue Art von föderierten Clustern: Ein nicht-Modell-Sharing-Ansatz 新型的联盟联合集群:非模式共享办法 2506.10244v1

Authors (8): Yuji Kawamata, Kaoru Kamijo, Maki Kihira, Akihiro Toyoda, Tomoru Nakayama, Akira Imakura, Tetsuya Sakurai, Yukihiko Okada

In recent years, the growing need to leverage sensitive data across institutions has led to increased attention on federated learning (FL), a decentralized machine learning paradigm that enables model training without sharing raw data. However, existing FL-based clustering methods, known as federated clustering, typically assume simple data partitioning scenarios such as horizontal or vertical splits, and cannot handle more complex distributed structures. This study proposes data collaboration clustering (DC-Clustering), a novel federated clustering method that supports clustering over complex data partitioning scenarios where horizontal and vertical splits coexist. In DC-Clustering, each institution shares only intermediate representations instead of raw data, ensuring privacy preservation while enabling collaborative clustering. The method allows flexible selection between k-means and spectral clustering, and achieves final results with a single round of communication with the central server. We conducted extensive experiments using synthetic and open benchmark datasets. The results show that our method achieves clustering performance comparable to centralized clustering where all data are pooled. DC-Clustering addresses an important gap in current FL research by enabling effective knowledge discovery from distributed heterogeneous data. Its practical properties – privacy preservation, communication efficiency, and flexibility – make it a promising tool for privacy-sensitive domains such as healthcare and finance.

近年来,各机构利用敏感数据的日益需要,导致对联合学习(FL)这一分散式机器学习模式的日益重视,这种分散式的机械学习模式使示范培训得以不共享原始数据。然而,现有的基于FL的集群方法(称为Federate Group)通常采取简单的数据分割方案,例如横向或纵向分割,无法处理更复杂的分布结构。本研究报告提议数据合作集群(DC-Clustering),这是一种新的联合式集群方法,支持在横向和纵向分裂共存的复杂数据分割假设中进行集群。在DC-Clustering中,每个机构只共享中间显示器而不是原始数据,确保隐私保护,同时促成合作集群。这种方法允许灵活选择 k- means和光谱集群,并通过与中央服务器的单轮通信取得最终结果。我们利用合成和开放基准数据集进行了广泛的实验。结果显示,我们的方法的组合性业绩与集中组合性(所有数据都聚集在一起)相近。DC-Clustering 解决当前FL研究中的一个重要差距,即能够从分布的混杂数据中有效地发现知识。它的实际性质 – – 隐私保护、通信效率和灵活性 – – – 使它成为一种有希望的工具。


Article 240

Title@2025-06-11 (3): PointNet with KAN versus PointNet with MLP for 3D Classification and Segmentation of Point Sets

Title: PointNet with KAN versus PointNet with MLP for 3D Classification and Segmentation of Point Sets PointNet mit KAN versus PointNet mit MLP für 3D Klassifizierung und Segmentierung von Punktsätzen 与 KAN 和 PointNet 和 MLP 相对的点网点网点,用于 3D 的点集分类和分划 2410.10084v3

Authors (1): Ali Kashefi

Kolmogorov-Arnold Networks (KANs) have recently gained attention as an alternative to traditional Multilayer Perceptrons (MLPs) in deep learning frameworks. KANs have been integrated into various deep learning architectures such as convolutional neural networks, graph neural networks, and transformers, with their performance evaluated. However, their effectiveness within point-cloud-based neural networks remains unexplored. To address this gap, we incorporate KANs into PointNet for the first time to evaluate their performance on 3D point cloud classification and segmentation tasks. Specifically, we introduce PointNet-KAN, built upon two key components. First, it employs KANs instead of traditional MLPs. Second, it retains the core principle of PointNet by using shared KAN layers and applying symmetric functions for global feature extraction, ensuring permutation invariance with respect to the input features. In traditional MLPs, the goal is to train the weights and biases with fixed activation functions; however, in KANs, the goal is to train the activation functions themselves. We use Jacobi polynomials to construct the KAN layers. We extensively and systematically evaluate PointNet-KAN across various polynomial degrees and special types such as the Lagrange, Chebyshev, and Gegenbauer polynomials. Our results show that PointNet-KAN achieves competitive performance compared to PointNet with MLPs on benchmark datasets for 3D object classification and part and semantic segmentation, despite employing a shallower and simpler network architecture. We also study a hybrid PointNet model incorporating both KAN and MLP layers. We hope this work serves as a foundation and provides guidance for integrating KANs, as an alternative to MLPs, into more advanced point cloud processing architectures.

Kolmogorov-Arnold 网络( KANs) 最近作为传统多层天体( MLPs) 的替代方案,在深层学习框架中受到关注。 KANs 已被整合到各种深层学习结构中, 如 convolual 神经网络、图形神经网络和变压器, 并且对其性能进行了评估。 但是, 在基于点球的神经网络中, KANPs 仍然没有被挖掘出来。 为了解决这一差距, 我们首次将 KANs 纳入 PortNet 网络, 以评价其在3D 点的云级分类和分解任务上的表现。 具体地说, 我们引入了点Net- KAN- KAN, 以两个关键组成部分为基础建立。 首先, 它使用 KANs 而不是传统的 MLPs 。 第二, 它保留了点网络的核心原则, 使用共享 KAN 的层次, 并应用对全方位功能模型, 确保输入输入输入输入。 在传统的 MLPPS 中, 我们的目标是将重重度和偏重点用于 IMAN 和 Calal IM 。
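
The two ingredients named in the abstract above, a KAN layer shared across points and a symmetric pooling function, can be sketched as follows. This is a simplified NumPy illustration rather than the PointNet-KAN code: it uses Legendre polynomials (the Jacobi special case with both parameters equal to zero), squashes inputs with tanh so they fall in the polynomial domain, and uses random instead of trained coefficients. The class name `SharedPolyKANLayer` and the helper `global_feature` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

class SharedPolyKANLayer:
    """KAN-style layer: each input-output edge carries a learnable polynomial."""

    def __init__(self, in_dim, out_dim, degree, rng):
        self.degree = degree
        # One coefficient per (input feature, output feature, polynomial order).
        self.coef = rng.normal(scale=0.1, size=(in_dim, out_dim, degree + 1))

    def __call__(self, points):
        z = np.tanh(points)                          # map features into (-1, 1)
        basis = legendre.legvander(z, self.degree)   # (num_points, in_dim, degree + 1)
        # Sum the edge polynomials over input features; the same weights serve every point.
        return np.einsum("nid,iod->no", basis, self.coef)

def global_feature(layer, points):
    """Symmetric max pooling over points gives a permutation-invariant descriptor."""
    return layer(points).max(axis=0)

rng = np.random.default_rng(0)
layer = SharedPolyKANLayer(in_dim=3, out_dim=16, degree=4, rng=rng)
cloud = rng.normal(size=(128, 3))                    # toy point cloud with xyz coordinates
feat = global_feature(layer, cloud)
feat_shuffled = global_feature(layer, cloud[rng.permutation(128)])
```

Reordering the points leaves the pooled feature unchanged, because the layer acts on each point independently and the maximum does not depend on order. What is trained in such a layer is the set of polynomial coefficients, that is, the activation functions themselves rather than weights behind a fixed activation.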


Article 241

Title@2025-06-11 (3): Function-Guided Conditional Generation Using Protein Language Models with Adapters

Title: Function-Guided Conditional Generation Using Protein Language Models with Adapters Funktionsgeführte Conditional Generation Verwendung von Protein Language Models mit Adaptern 使用配有适应器的蛋白语言模型的 函数引导有条件生成 2410.03634v2

Authors (4): Jason Yang, Aadyot Bhatnagar, Jeffrey A. Ruffolo, Ali Madani

The conditional generation of proteins with desired functions is a key goal for generative models. Existing methods based on prompting of protein language models (PLMs) can generate proteins conditioned on a target functionality, such as a desired enzyme family. However, these methods are limited to simple, tokenized conditioning and have not been shown to generalize to unseen functions. In this study, we propose ProCALM (Protein Conditionally Adapted Language Model), an approach for the conditional generation of proteins using adapters to PLMs. While previous methods have used adapters for structure-conditioned generation from PLMs, our implementation of ProCALM involves finetuning ProGen2 to condition generation based on versatile representations of protein function, e.g., enzyme family, taxonomy, or natural language descriptions. ProCALM matches or exceeds the performance of existing methods at conditional sequence generation from target functions. Impressively, it can also generalize to rare and unseen functions. Overall, ProCALM is a flexible and computationally efficient approach, and we expect that it can be extended to a wide range of generative language models.

具有理想功能的有条件蛋白质的生成是基因模型的一个关键目标; 以刺激蛋白语言模型(PLMs)为基础的现有方法可以产生以目标功能为条件的蛋白质,例如理想的酶系; 然而,这些方法仅限于简单、象征性的调节,没有被显示为可概括的功能; 在这项研究中,我们提议了ProCALM(有条件适应语言模型),一种使用适应器生成有条件蛋白质的方法; 虽然以前的方法已经使用适应器从PLMs中产生结构成型的适应器,但我们实施ProCALM(ProCALM)需要根据蛋白质功能的多功能(例如:酶族、分类或自然语言描述)对ProGen2进行微调,以条件生成。ProCALM在设定目标功能的有条件序列生成时匹配或超过现有方法的功能。 令人印象深刻地说,它也可以概括到稀有和看不见的功能。 总体来说,ProCALM是一种灵活和计算高效的方法,我们期望它能够扩大到广泛的基因化语言模型。


Article 242

Title@2025-06-11 (3): Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods

Title: Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods Prompt Angriffe Enthüllen Oberflächliche Wissensentfernung in unlearning Methoden 以不学习方法清除知识 2506.10236v1

Authors (4): Yeonwoo Jang, Shariqah Hossain, Ashwin Sreevatsa, Diogo Cruz

In this work, we show that some machine unlearning methods may fail when subjected to straightforward prompt attacks. We systematically evaluate eight unlearning techniques across three model families, and employ output-based, logit-based, and probe analysis to determine to what extent supposedly unlearned knowledge can be retrieved. While methods like RMU and TAR demonstrate robust unlearning, ELM remains vulnerable to specific prompt attacks (e.g., Hindi filler text in original prompt recovering 57.3% accuracy). Our logit analysis also confirms that unlearned models are generally not hiding knowledge by modifying the way the answer is formatted, as the correlation between output and logit accuracy is strong. These results challenge prevailing assumptions about unlearning effectiveness and highlight the need for evaluation frameworks that can reliably distinguish between true knowledge removal and superficial output suppression. We also make our evaluation framework publicly available so that prompting techniques for retrieving unlearned knowledge can be evaluated easily.

在这项工作中,我们发现,当受到直截了当的迅速攻击时,一些机器的不学习方法可能会失败。我们系统地评估了三个模式家庭8种不学习技术,并采用了基于产出的、基于登录的和基于调查的分析,以确定在多大程度上可以检索到据称没有获得的知识。RMU和TAR等方法显示的是强有力的不学习,而ELM则仍然容易受到具体的迅速攻击(例如,印地语填充器的原始快速恢复57.3%的准确性)。我们的对日志的分析还证实,由于产出和登录准确性之间的关联性很强,未学习模式通常不会通过改变答案的格式来隐藏知识。这些结果挑战了关于不学习有效性的普遍假设,并强调了评价框架的必要性,这种框架能够可靠地区分真正的知识删除和表面禁止产出。我们还公开提供我们的评价框架,以方便地评价检索不学习知识的快速技术。


Article 243

Title@2025-06-11 (3): LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation

Title: LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation LaMAGIC2: Erweiterte Schaltungsformulierungen für sprachmodellbasierte analoge Topologie-Generierung LaMAGIC2:语言模拟模拟模拟地形生成的先进电路配制 2506.10235v1

Authors (5): Chen-Chia Chang, Wan-Hsuan Lin, Yikang Shen, Yiran Chen, Xin Zhang

Automation of analog topology design is crucial due to customized requirements of modern applications with heavily manual engineering efforts. The state-of-the-art work applies a sequence-to-sequence approach and supervised finetuning on language models to generate topologies given user specifications. However, its circuit formulation is inefficient due to $O(|V|^2)$ token length and suffers from low precision sensitivity to numeric inputs. In this work, we introduce LaMAGIC2, a succinct float-input canonical formulation with identifier (SFCI) for language model-based analog topology generation. SFCI addresses these challenges by improving component-type recognition through identifier-based representations, reducing token length complexity to $O(|V|)$, and enhancing numeric precision sensitivity for better performance under tight tolerances. Our experiments demonstrate that LaMAGIC2 achieves 34% higher success rates under a tight tolerance of 0.01 and 10X lower MSEs compared to a prior method. LaMAGIC2 also exhibits better transferability for circuits with more vertices with up to 58.5% improvement. These advancements establish LaMAGIC2 as a robust framework for analog topology generation.

模拟地形设计自动化至关重要,因为现代应用需要大量手工工程,因此模拟地形设计自动化至关重要。最先进的工作采用顺序到顺序的方法,对语言模型进行有监督的微调,以产生符合用户规格的地形。然而,由于O(V 2)象征性长度,电路配制效率低下,对数字输入的精确度低。在这项工作中,我们引入了LaMAGIC2, 一种简明的、具有基于语言模型的模拟地形生成识别标志(SFCI)的浮式输入式集成式配方。SFCI通过基于识别特征的表示方式改进部件类型识别,降低O(V ) 的象征性复杂性,提高数字精度,以便在紧紧的容度下提高性能。我们的实验表明,LaMAGIC2在0.0和10X较低的MSE的严格耐受容度下,取得了34%更高的成功率。LaMAGIC2还表明,具有更高程度58.5%改进的脊椎的电路路路的可转移性。这些进步使LAMAGIC2成为了最高生成结构的坚固框架。


Article 244

Title@2025-06-11 (3): Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices

Title: Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices Große Sprachmodelle sichern: Bedrohungen, Schwachstellen und verantwortungsvolle Praktiken 保障大语言模式:威胁、脆弱性和负责任做法 2403.12503v2

Authors (5): Sara Abdali, Richard Anarfi, CJ Barberan, Jia He, Erfan Shayegani

Large language models (LLMs) have significantly transformed the landscape of Natural Language Processing (NLP). Their impact extends across a diverse spectrum of tasks, revolutionizing how we approach language understanding and generation. Nevertheless, alongside their remarkable utility, LLMs introduce critical security and risk considerations. These challenges warrant careful examination to ensure responsible deployment and safeguard against potential vulnerabilities. This research paper thoroughly investigates security and privacy concerns related to LLMs from five thematic perspectives: security and privacy concerns, vulnerabilities against adversarial attacks, potential harms caused by misuses of LLMs, mitigation strategies to address these challenges, and the limitations of current strategies. Lastly, the paper recommends promising avenues for future research to enhance the security and risk management of LLMs.

大型语言模型(LLMs)极大地改变了自然语言处理(NLP)的格局,其影响跨越了各种各样的任务,改变了我们对待语言理解和几代人的方式,然而,LLMs除了具有非凡的效用外,还引入了重要的安全和风险考虑,这些挑战值得仔细研究,以确保负责任地部署和防范潜在的脆弱性,本研究文件从五个主题角度彻底调查与LLMs有关的安全和隐私问题:安全和隐私关切、对抗性攻击的脆弱性、滥用LLMs可能造成的伤害、应对这些挑战的缓解战略,同时查明目前战略的局限性;最后,该文件建议今后开展研究以加强LLMs的安全和风险管理的可行途径。


Article 245

Title@2025-06-11 (3): ScoreMix: Improving Face Recognition via Score Composition in Diffusion Generators

Title: ScoreMix: Improving Face Recognition via Score Composition in Diffusion Generators ScoreMix: Verbesserung der Gesichtserkennung über Score-Komposition in Diffusionsgeneratoren 分数组合:通过传播发电机的分分数构成改善面对面的承认 2506.10226v1

Authors (2): Parsa Rahimi, Sebastien Marcel

In this paper, we propose ScoreMix, a novel yet simple data augmentation strategy leveraging the score compositional properties of diffusion models to enhance discriminator performance, particularly under scenarios with limited labeled data. By convexly mixing the scores from different class-conditioned trajectories during diffusion sampling, we generate challenging synthetic samples that significantly improve discriminative capabilities in all studied benchmarks. We systematically investigate class-selection strategies for mixing and discover that greater performance gains arise when combining classes distant in the discriminator’s embedding space, rather than close in the generator’s condition space. Moreover, we empirically show that, under standard metrics, the correlation between the generator’s learned condition space and the discriminator’s embedding space is minimal. Our approach achieves notable performance improvements without extensive parameter searches, demonstrating practical advantages for training discriminative models while effectively mitigating problems regarding collections of large datasets. Paper website: https://parsa-ra.github.io/scoremix

在本文中,我们提出ScoreMix,这是一种新颖而简单的数据增强策略,它利用扩散模型的分数组合特性来提高判别器性能,特别是在标注数据有限的情况下。我们在扩散采样过程中对来自不同类别条件轨迹的分数进行凸组合,生成具有挑战性的合成样本,在所有研究的基准上显著提高了判别能力。我们系统地研究了用于混合的类别选择策略,发现将判别器嵌入空间中相距较远的类别组合起来,比组合生成器条件空间中相近的类别带来更大的性能提升。此外,我们通过实验表明,在标准度量下,生成器学到的条件空间与判别器嵌入空间之间的相关性极小。我们的方法无需大量参数搜索即可取得显著的性能提升,展示了训练判别模型的实际优势,同时有效缓解了大规模数据集收集方面的问题。论文网站: https://parsa-ra.github.io/scoremix
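The convex score mixing described in the ScoreMix abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: two 1-D Gaussians stand in for class-conditioned score networks, and plain Langevin dynamics stands in for the diffusion sampler, so this is not the authors' implementation.

```python
import random

def gaussian_score(x, mean, var=1.0):
    """Score (gradient of the log-density) of a 1-D Gaussian.
    Hypothetical stand-in for a class-conditioned score network."""
    return -(x - mean) / var

def mixed_score(x, mean_a, mean_b, lam):
    """Convex combination of two class-conditioned scores, lam in [0, 1]."""
    return lam * gaussian_score(x, mean_a) + (1.0 - lam) * gaussian_score(x, mean_b)

def sample(mean_a, mean_b, lam, steps=500, step_size=0.05, noise=True, seed=0):
    """Unadjusted Langevin dynamics driven by the mixed score."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)
    for _ in range(steps):
        x += step_size * mixed_score(x, mean_a, mean_b, lam)
        if noise:
            x += (2.0 * step_size) ** 0.5 * rng.gauss(0.0, 1.0)
    return x
```

With `noise=False` the iterate settles at the point where the mixed score vanishes, which for these unit-variance Gaussians is the convex combination of the two class means. That makes the effect of `lam` easy to inspect.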


Article 246

Title@2025-06-11 (3): Meta-learning Optimizers for Communication-Efficient Learning

Title: Meta-learning Optimizers for Communication-Efficient Learning Meta-Learning-Optimierer für Kommunikations-Effizientes Lernen 通信-高效学习的元学习优化工具 2312.02204v2

Authors (5): Charles-Étienne Joseph, Benjamin Thérien, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky

Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches compute multiple gradient steps locally on each worker, before averaging model parameters, helping relieve the critical communication bottleneck in distributed deep learning training. Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art adaptive optimizers for deep learning. In this work, we investigate if the recent progress in the emerging area of learned optimizers can potentially close this gap in homogeneous data and homogeneous device settings while remaining communication-efficient. Specifically, we meta-learn how to perform global updates given an update from local SGD iterations. Our results demonstrate that learned optimizers can substantially outperform local SGD and its sophisticated variants while maintaining their communication efficiency. Our learned optimizers can even generalize to unseen and much larger datasets and architectures, including ImageNet and ViTs, and to unseen modalities such as language modeling. We therefore show the potential of learned optimizers for improving communication-efficient distributed learning.

SGD, 特别是本地 SGD 的通信效率变异器近年来受到极大关注。 这些方法在平均模型参数之前,在当地计算每个工人的多重梯度步骤,帮助减轻分布式深层学习培训中关键的通信瓶颈。 虽然提出了许多变异器,但有时它们可能落后于最先进的适应性优化系统,以便深层学习。在这项工作中,我们调查在新兴的有知识的优化器领域最近取得的进展能否在保持通信效率的同时,缩小同质数据和同质设备设置方面的差距。具体地说,我们利用本地SGD的更新,将如何进行全球更新。我们的成果表明,学习的优化器在保持通信效率的同时,可以大大优于本地的 SGD及其尖端变异器。我们学到的优化器甚至可以概括为看不见的和更大的数据集和结构,包括图像网和ViT,以及语言模型等隐形模式。 因此,我们展示了学习的优化器在改进传播效率的学习方面的潜力。
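For readers unfamiliar with local SGD, a minimal sketch of the baseline may help: each worker takes several gradient steps on its own shard, and parameters are averaged once per round. The 1-D least-squares loss, the shard layout, and all names below are illustrative assumptions. The paper's contribution, as the abstract describes it, is to replace the plain averaging step with a meta-learned global update, which this sketch does not attempt.

```python
def local_sgd(shards, rounds=50, local_steps=4, lr=0.1):
    """Plain local SGD on a 1-D least-squares problem y ~ w * x.

    shards: one list of (x, y) pairs per worker (hypothetical toy data).
    Workers communicate once per round instead of once per gradient step.
    """
    w_global = 0.0
    for _ in range(rounds):
        local_models = []
        for shard in shards:
            w = w_global  # every worker starts the round from the global model
            for _ in range(local_steps):
                grad = sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)
                w -= lr * grad
            local_models.append(w)
        # The only communication step: average the workers' parameters.
        w_global = sum(local_models) / len(local_models)
    return w_global

# Homogeneous toy data: both shards follow y = 3 * x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5), (1.5, 4.5)]]
w_hat = local_sgd(shards)
```

With `local_steps=4`, the workers exchange parameters four times less often than synchronous SGD would for the same number of gradient steps.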


Article 247

Title@2025-06-11 (3): Neural Networks Generalize on Low Complexity Data

Title: Neural Networks Generalize on Low Complexity Data Neurale Netzwerke verallgemeinern auf Daten mit geringer Komplexität 通用低复杂度数据神经网络 2409.12446v3

Authors (2): Sourav Chatterjee, Timothy Sudijono

We show that feedforward neural networks with ReLU activation generalize on low complexity data, suitably defined. Given i.i.d. data generated from a simple programming language, the minimum description length (MDL) feedforward neural network which interpolates the data generalizes with high probability. We define this simple programming language, along with a notion of description length of such networks. We provide several examples on basic computational tasks, such as checking primality of a natural number. For primality testing, our theorem shows the following and more. Suppose that we draw an i.i.d. sample of $n$ numbers uniformly at random from $1$ to $N$. For each number $x_i$, let $y_i = 1$ if $x_i$ is a prime and $0$ if it is not. Then, the interpolating MDL network accurately answers, with probability $1- O((\ln N)/n)$, whether a newly drawn number between $1$ and $N$ is a prime or not. Note that the network is not designed to detect primes; minimum description learning discovers a network which does so. Extensions to noisy data are also discussed, suggesting that MDL neural network interpolators can demonstrate tempered overfitting.

我们证明,在适当定义的低复杂度数据上,采用ReLU激活的前馈神经网络能够泛化。给定由一种简单编程语言生成的独立同分布(i.i.d.)数据,对数据进行插值的最小描述长度(MDL)前馈神经网络以高概率泛化。我们定义了这种简单编程语言,并给出了此类网络描述长度的概念。我们提供了若干基本计算任务的例子,例如检验自然数是否为素数。对于素性检验,我们的定理表明如下结论(以及更多):假设我们从 $1$ 到 $N$ 中均匀随机地抽取 $n$ 个数构成独立同分布样本。对每个数 $x_i$,若 $x_i$ 为素数则令 $y_i = 1$,否则令 $y_i = 0$。那么,插值MDL网络能以 $1- O((\ln N)/n)$ 的概率准确判断从 $1$ 到 $N$ 中新抽取的数是否为素数。注意,该网络并非为检测素数而设计;最小描述学习发现了能够做到这一点的网络。文中还讨论了对含噪数据的扩展,表明MDL神经网络插值器可以表现出温和过拟合(tempered overfitting)。
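The data-generating setup in the primality example is simple enough to write down. The sketch below only builds the labeled i.i.d. sample $(x_i, y_i)$ described in the abstract. The function names and the seed are illustrative assumptions, and the MDL network itself is not modeled.

```python
import random

def is_prime(x):
    """Trial-division primality check, used only to produce ground-truth labels."""
    if x < 2:
        return False
    i = 2
    while i * i <= x:
        if x % i == 0:
            return False
        i += 1
    return True

def draw_sample(n, N, seed=0):
    """Draw n numbers i.i.d. uniformly from {1, ..., N} and label each one:
    y = 1 if x is prime, else y = 0."""
    rng = random.Random(seed)
    xs = [rng.randint(1, N) for _ in range(n)]
    return [(x, 1 if is_prime(x) else 0) for x in xs]

training_set = draw_sample(n=1000, N=10_000)
```

Under the stated theorem, the error probability of the interpolating MDL network on a fresh draw scales as $O((\ln N)/n)$, so for fixed $N$ a larger sample `n` tightens the guarantee.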


Article 248

Title@2025-06-11 (3): The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks

Title: The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks Die Auswirkungen von Feature Scaling im maschinellen Lernen: Auswirkungen auf Regressions- und Klassifizierungsaufgaben 机械学习中的特质增强效果:对倒退和分类任务的影响 2506.08274v2

Authors (7): João Manoel Herrera Pinheiro, Suzana Vilas Boas de Oliveira, Thiago Henrique Segreto Silva, Pedro Antonio Rabelo Saraiva, Enzo Ferreira de Souza, Leonardo André Ambrosio, Marcelo Becker

This research addresses the critical lack of comprehensive studies on feature scaling by systematically evaluating 12 scaling techniques - including several less common transformations - across 14 different Machine Learning algorithms and 16 datasets for classification and regression tasks. We meticulously analyzed impacts on predictive performance (using metrics such as accuracy, MAE, MSE, and $R^2$) and computational costs (training time, inference time, and memory usage). Key findings reveal that while ensemble methods (such as Random Forest and gradient boosting models like XGBoost, CatBoost and LightGBM) demonstrate robust performance largely independent of scaling, other widely used models such as Logistic Regression, SVMs, TabNet, and MLPs show significant performance variations highly dependent on the chosen scaler. This extensive empirical analysis, with all source code, experimental results, and model parameters made publicly available to ensure complete transparency and reproducibility, offers model-specific crucial guidance to practitioners on the need for an optimal selection of feature scaling techniques.

这项研究通过系统评估14种不同的机器学习算法和16个数据集用于分类和回归任务的12种缩放技术(包括若干较不常见的变换)和16个数据集,解决了严重缺乏关于特征缩放的全面研究的问题。我们仔细分析了对预测性业绩的影响(使用精确度、MAE、MSE和$R%2美元等指标)和计算成本(培训时间、推算时间和记忆使用)和计算成本的影响(培训时间、试验结果和记忆使用)。主要调查结果显示,尽管混合方法(如随机森林和梯度加速模型,如XGBoost、CatBoost和LightGBM)显示强健的性能基本上独立于缩放,但其他广泛使用的模式(如物流递增、SVMS、TabNet和MLPs)显示,显著的性能差异在很大程度上取决于选定的缩放尺度。这一广泛的实证分析提供了所有源代码、实验结果和模型参数,以确保完全透明和可复制性,向从业人员提供关于最佳选择特征缩放技术需要的模型的关键指导。
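To make concrete what a "scaler" does, here is a minimal sketch of two of the most common transformations, standardization and min-max scaling, in pure Python. The function names are illustrative assumptions and are not taken from the study's released code. The statistics are fitted on the training column only and then reused for test data, which is the usual way to avoid leakage.

```python
from statistics import mean, pstdev

def fit_standard(train_column):
    """Estimate (mean, population std) on the training column only."""
    return mean(train_column), pstdev(train_column)

def apply_standard(column, mu, sigma):
    """Z-score a column with previously fitted statistics.
    A constant training column (sigma == 0) maps to all zeros."""
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]

def fit_min_max(train_column):
    """Estimate (min, max) on the training column only."""
    return min(train_column), max(train_column)

def apply_min_max(column, lo, hi):
    """Map the fitted range [lo, hi] to [0, 1]; unseen values may fall outside."""
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

train = [10.0, 20.0, 30.0, 40.0]
test = [25.0, 50.0]
z_test = apply_standard(test, *fit_standard(train))
m_test = apply_min_max(test, *fit_min_max(train))
```

Tree-based ensembles split on thresholds, so monotone rescalings like these leave their decisions largely unchanged. Distance- and gradient-based models see different geometry after scaling, which is consistent with the sensitivity pattern the abstract reports.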


Article 249

Title@2025-06-11 (3): SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design

Title: SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design SALAD: Systematische Bewertung von Machine Unlearning im LLM-gestützten Hardware-Design SALAD: LLM辅助硬件设计中机器遗忘的系统评估 2506.02089v2

Authors (9): Zeng Wang, Minghao Shao, Rupesh Karn, Likhitha Mankali, Jitendra Bhandari, Ramesh Karri, Ozgur Sinanoglu, Muhammad Shafique, Johann Knechtel

Large Language Models (LLMs) offer transformative capabilities for hardware design automation, particularly in Verilog code generation. However, they also pose significant data security challenges, including Verilog evaluation data contamination, intellectual property (IP) design leakage, and the risk of malicious Verilog generation. We introduce SALAD, a comprehensive assessment that leverages machine unlearning to mitigate these threats. Our approach enables the selective removal of contaminated benchmarks, sensitive IP and design artifacts, or malicious code patterns from pre-trained LLMs, all without requiring full retraining. Through detailed case studies, we demonstrate how machine unlearning techniques effectively reduce data security risks in LLM-aided hardware design.

大型语言模型(LLMS)为硬件设计自动化提供了变革能力,特别是在Verilog代码生成方面,但也带来了数据安全方面的重大挑战,包括Verilog评价数据污染、知识产权设计渗漏和恶意Verilog生成的风险。我们引入了SALAD(SALAD)综合评估,该评估利用机器退出学习来减轻这些威胁。我们的方法使得有选择地将受污染的基准、敏感的IP和设计文物或恶意代码模式从预先培训的LMS中去除,而无需充分再培训。通过详细的案例研究,我们展示了机器不学习技术如何有效减少LLM辅助硬件设计的数据安全风险。


Article 250

Title@2025-06-11 (3): Cross-Learning Between ECG and PCG: Exploring Common and Exclusive Characteristics of Bimodal Electromechanical Cardiac Waveforms

Title: Cross-Learning Between ECG and PCG: Exploring Common and Exclusive Characteristics of Bimodal Electromechanical Cardiac Waveforms Cross-Learning zwischen EKG und PCG: Erforschung gemeinsamer und exklusiver Eigenschaften bimodaler elektromechanischer Herzwellenformen ECG和PCG之间的交叉学习:探索双式电动机械心脏病波形的常见和独有特点 2506.10212v1

Authors (4): Sajjad Karimi, Amit J. Shah, Gari D. Clifford, Reza Sameni

Simultaneous electrocardiography (ECG) and phonocardiogram (PCG) provide a comprehensive, multimodal perspective on cardiac function by capturing the heart’s electrical and mechanical activities, respectively. However, the distinct and overlapping information content of these signals, as well as their potential for mutual reconstruction and biomarker extraction, remains incompletely understood, especially under varying physiological conditions and across individuals. In this study, we systematically investigate the common and exclusive characteristics of ECG and PCG using the EPHNOGRAM dataset of simultaneous ECG-PCG recordings during rest and exercise. We employ a suite of linear and nonlinear machine learning models, including non-causal LSTM networks, to reconstruct each modality from the other and analyze the influence of causality, physiological state, and cross-subject variability. Our results demonstrate that nonlinear models, particularly non-causal LSTM, provide superior reconstruction performance, with reconstructing ECG from PCG proving more tractable than the reverse. Exercise and cross-subject scenarios present significant challenges, but envelope-based modeling that utilizes instantaneous amplitude features substantially improves cross-subject generalizability for cross-modal learning. Furthermore, we demonstrate that clinically relevant ECG biomarkers, such as fiducial points and QT intervals, can be estimated from PCG in cross-subject settings. These findings advance our understanding of the relationship between electromechanical cardiac modalities, in terms of both waveform characteristics and the timing of cardiac events, with potential applications in novel multimodal cardiac monitoring technologies.

同步采集的心电图(ECG)和心音图(PCG)分别记录心脏的电活动和机械活动,为心脏功能提供了全面的多模态视角。然而,这些信号各自独有和相互重叠的信息内容,以及它们在相互重建和生物标志物提取方面的潜力,仍未得到充分理解,尤其是在不同生理状态下和不同个体之间。在这项研究中,我们利用EPHNOGRAM数据集中静息和运动期间同步记录的ECG-PCG数据,系统地研究ECG和PCG的共有特征和独有特征。我们采用一系列线性和非线性机器学习模型(包括非因果LSTM网络),由一种模态重建另一种模态,并分析因果性、生理状态和跨受试者差异的影响。结果表明,非线性模型,特别是非因果LSTM,具有更优的重建性能,而且由PCG重建ECG比反向重建更容易。运动状态和跨受试者场景带来了显著挑战,但利用瞬时幅度特征的基于包络的建模显著提高了跨模态学习的跨受试者泛化能力。此外,我们证明,在跨受试者设置下,可以由PCG估计具有临床意义的ECG生物标志物,例如基准点和QT间期。这些发现从波形特征和心脏事件时序两方面加深了我们对心脏机电模态之间关系的理解,并有望应用于新型多模态心脏监测技术。


Article 251

Title@2025-06-11 (3): Failure Modes of LLMs for Causal Reasoning on Narratives

Title: Failure Modes of LLMs for Causal Reasoning on Narratives Failure Modes von LLMs für die ursächliche Begründung von Narrativen 以叙述为由解释原因的LLMs失败模式 2410.23884v4

Authors (5): Khurram Yamin, Shantanu Gupta, Gaurav R. Ghosal, Zachary C. Lipton, Bryan Wilder

In this work, we investigate the causal reasoning abilities of large language models (LLMs) through the representative problem of inferring causal relationships from narratives. We find that even state-of-the-art language models rely on unreliable shortcuts, both in terms of the narrative presentation and their parametric knowledge. For example, LLMs tend to determine causal relationships based on the topological ordering of events (i.e., earlier events cause later ones), resulting in lower performance whenever events are not narrated in their exact causal order. Similarly, we demonstrate that LLMs struggle with long-term causal reasoning and often fail when the narratives are long and contain many events. Additionally, we show LLMs appear to rely heavily on their parametric knowledge at the expense of reasoning over the provided narrative. This degrades their abilities whenever the narrative opposes parametric knowledge. We extensively validate these failure modes through carefully controlled synthetic experiments, as well as evaluations on real-world narratives. Finally, we observe that explicitly generating a causal graph generally improves performance while naive chain-of-thought is ineffective. Collectively, our results distill precise failure modes of current state-of-the-art models and can pave the way for future techniques to enhance causal reasoning in LLMs.

在这项工作中,我们调查大型语言模型(LLMs)的因果推理能力,方法是通过从叙述中推断因果关系这一具有代表性的问题来推断大语言模型(LLMs)的因果推理能力;我们发现,即使是最先进的语言模型也依赖不可靠的捷径,无论是叙述性陈述还是其参数知识;例如,LLMs往往根据事件的地形顺序确定因果关系(即早期事件导致后来的事件),因此,如果事件没有按其确切的因果顺序加以说明,其性能就会下降;同样,我们证明LLMs与长期因果推理斗争,当叙述很长且包含许多事件时往往失败。此外,我们显示LLMs似乎严重依赖其准度知识,而牺牲了对所提供的叙述性陈述的推理。这削弱了他们的能力,每当叙述性说明反对参数知识时,我们通过仔细控制的合成实验以及实际世界叙述性叙述性来广泛验证这些失败模式。最后,我们指出,明确产生因果关系的图表通常会改善业绩,而天真的思维链是无效的。从整体上看,我们的结果会保持准确的失败模式,从而强化了当前磁号模型的推算方法,可以加强未来的推导法。


Article 252

Title@2025-06-11 (3): SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models

Title: SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models SealQA: Anhebung der Messlatte für die Vernunft in Search-Augmented Language Models SealQA: 提高搜索增强语言模式中的原因栏 2506.01062v2

Authors (6): Thinh Pham, Nguyen Nguyen, Pratibha Zunjare, Weiyuan Chen, Yu-Min Tseng, Tu Vu

We introduce SealQA, a new challenge benchmark for evaluating SEarch-Augmented Language models on fact-seeking questions where web search yields conflicting, noisy, or unhelpful results. SealQA comes in three flavors: (1) Seal-0 (main) and (2) Seal-Hard, which assess factual accuracy and reasoning capabilities, with Seal-0 focusing on the most challenging questions where chat models (e.g., GPT-4.1) typically achieve near-zero accuracy; and (3) LongSeal, which extends SealQA to test long-context, multi-document reasoning in “needle-in-a-haystack” settings. Our evaluation reveals critical limitations in current models: Even frontier LLMs perform poorly across all SealQA flavors. On Seal-0, frontier agentic models equipped with tools like o3 and o4-mini achieve only 17.1% and 6.3% accuracy, respectively, at their best reasoning efforts. We find that advanced reasoning models such as DeepSeek-R1-671B and o3-mini are highly vulnerable to noisy search results. Notably, increasing test-time compute does not yield reliable gains across o3-mini, o4-mini, and o3, with performance often plateauing or even declining early. Additionally, while recent models are less affected by the “lost-in-the-middle” issue, they still fail to reliably identify relevant documents in LongSeal when faced with numerous distractors. To facilitate future work, we release SealQA at huggingface.co/datasets/vtllms/sealqa.

我们引入了Search-Auged 语言模型, 用于评估网络搜索结果相互冲突、吵闹或无益的实况调查问题的新挑战基准。 SealQA 以三种口味出现:(1) Seal-0 (main) 和(2) Seal-Hard, 评估真实准确性和推理能力, 由Seal-0 侧重于聊天模式(例如GPT-4.1)通常达到接近零准确度的最具有挑战性的问题;(3) LongSeal, 将SealQA 扩展至测试“需要-a-haystack” 设置中的长相文本、多文档推理。 我们的评估揭示了当前模式中的关键局限性:(1) 甚至边境LLMS在所有Seal 口味上表现不佳。 在Seil-0, 配有O3 和 o4 工具的边境代理模型分别只达到17.1% 和6.3% 准确度, 其最佳推理工作。 我们发现, DeepSeek- R1-671B 和 o3 Mini 等高级推理模型仍然极易受到冷测的搜索结果的伤害。 。 。 明显, 不断的测试- 分析- 和不断下降的收益 , 无法产生 。


Article 253

Title@2025-06-11 (3): ETS: Efficient Tree Search for Inference-Time Scaling

Title: ETS: Efficient Tree Search for Inference-Time Scaling ETS: Effiziente Baumsuche nach Inferenz-Zeitskalierung ETS: 高效树搜索推断时间缩放 2502.13575v2

Authors (10): Coleman Hooper, Sehoon Kim, Suhong Moon, Kerem Dilmen, Monishwaran Maheswaran, Nicholas Lee, Michael W. Mahoney, Sophia Shao, Kurt Keutzer, Amir Gholami

Test-time compute scaling has emerged as a new axis along which to improve model accuracy, where additional computation is used at inference time to allow the model to think longer for more challenging problems. One promising approach for test-time compute scaling is search against a process reward model, where a model generates multiple potential candidates at each step of the search, and these partial trajectories are then scored by a separate reward model in order to guide the search process. The diversity of trajectories in the tree search process affects the accuracy of the search, since increasing diversity promotes more exploration. However, this diversity comes at a cost, as divergent trajectories have less KV sharing, which means they consume more memory and slow down the search process. Previous search methods either do not perform sufficient exploration, or else explore diverse trajectories but have high latency. We address this challenge by proposing Efficient Tree Search (ETS), which promotes KV sharing by pruning redundant trajectories while maintaining necessary diverse trajectories. ETS incorporates a linear programming cost model to promote KV cache sharing by penalizing the number of nodes retained, while incorporating a semantic coverage term into the cost model to ensure that we retain trajectories which are semantically different. We demonstrate how ETS can achieve 1.8$\times$ reduction in average KV cache size during the search process, leading to 1.4$\times$ increased throughput relative to prior state-of-the-art methods, with minimal accuracy degradation and without requiring any custom kernel implementation. Code is available at: https://github.com/SqueezeAILab/ETS.

测试时计算缩放是一个新的轴轴,沿着这个轴来提高模型的准确性,在这个轴上,在推论时间使用额外的计算方法,让模型更长时间思考更具有挑战性的问题。一个有希望的测试时计算缩放方法就是对一个过程奖励模型进行搜索,一个模型在搜索的每一步都产生多种潜在候选人,然后这些部分轨迹通过一个单独的奖赏模型来评分,以指导搜索进程。树上搜索过程中的轨迹多样性影响到搜索的准确性,因为增加的多样化有助于更多的探索。然而,这种多样性是成本产生的,因为不同的轨迹较少地让模型对更具挑战性的问题进行思考。不同的轨迹显示,这意味着它们消耗更多的记忆力和减慢搜索过程。 以前的搜索方法要么没有进行充分的探索,或者探索不同的轨迹图,但是我们提出高效的树木搜索(ETS),通过运行多余的美元轨迹来促进 KV的共享,同时保持必要的最低轨迹。 ETS 包含一个线性编程成本模型,用来促进KV缓存的缓存, 将我们保留在平时, 平时可以保留一个稳定的搜索时间里程。
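The link between trajectory diversity and KV cache size can be seen in a small sketch. It assumes, purely for illustration, that each search step is one node of a prefix tree and that a node's KV entries are stored once and shared by every trajectory passing through it. ETS itself selects trajectories with a linear-programming cost model plus a semantic coverage term, which is not reproduced here.

```python
def kv_nodes(trajectories):
    """Count distinct prefix-tree nodes across trajectories.
    Each distinct prefix corresponds to KV entries that must be kept in memory."""
    prefixes = set()
    for traj in trajectories:
        for depth in range(1, len(traj) + 1):
            prefixes.add(tuple(traj[:depth]))
    return len(prefixes)

def unshared_nodes(trajectories):
    """KV cost if no prefix were shared (every trajectory stored separately)."""
    return sum(len(traj) for traj in trajectories)

# Hypothetical step identifiers: two trajectories share two steps, one diverges early.
beams = [("s1", "s2a", "s3a"), ("s1", "s2a", "s3b"), ("s1", "s2b", "s3c")]
shared_cost = kv_nodes(beams)        # nodes actually stored with prefix sharing
naive_cost = unshared_nodes(beams)   # nodes stored without any sharing
```

Pruning the early-diverging third trajectory removes two nodes, while pruning one of the two late-diverging trajectories removes only one. This is the tension the abstract describes: diverse trajectories help exploration but cost more memory.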


Article 254

Title@2025-06-11 (3): Towards Fair Representation: Clustering and Consensus

Title: Towards Fair Representation: Clustering and Consensus Auf dem Weg zu einer fairen Vertretung: Clustering und Konsens 实现公平代表性:集群和共识 2506.08673v2

Authors (5): Diptarka Chakraborty, Kushagra Chatterjee, Debarati Das, Tien Long Nguyen, Romina Nobahari

Consensus clustering, a fundamental task in machine learning and data analysis, aims to aggregate multiple input clusterings of a dataset, potentially based on different non-sensitive attributes, into a single clustering that best represents the collective structure of the data. In this work, we study this fundamental problem through the lens of fair clustering, as introduced by Chierichetti et al. [NeurIPS’17], which incorporates the disparate impact doctrine to ensure proportional representation of each protected group in the dataset within every cluster. Our objective is to find a consensus clustering that is not only representative but also fair with respect to specific protected attributes. To the best of our knowledge, we are the first to address this problem and provide a constant-factor approximation. As part of our investigation, we examine how to minimally modify an existing clustering to enforce fairness – an essential postprocessing step in many clustering applications that require fair representation. We develop an optimal algorithm for datasets with equal group representation and near-linear time constant factor approximation algorithms for more general scenarios with different proportions of two group sizes. We complement our approximation result by showing that the problem is NP-hard for two unequal-sized groups. Given the fundamental nature of this problem, we believe our results on Closest Fair Clustering could have broader implications for other clustering problems, particularly those for which no prior approximation guarantees exist for their fair variants.

集成共识是机器学习和数据分析的一项根本任务,其宗旨是将可能基于不同非敏感属性的数据集的多重输入群集汇总成一个最能代表数据集体结构的单一组群。在这项工作中,我们通过由Chierichetti et al.[NeurIPS’17]介绍的公平组群透镜研究这一根本问题,它包含了不同的影响理论,以确保每个受保护群体在每个组群的数据集中的比例代表比例。我们的目标是找到一个不仅具有代表性的共识组群,而且在特定受保护属性方面也是公平的。我们最了解的是,我们首先处理这一问题,并提供一个常态点近似。作为我们调查的一部分,我们研究如何尽量减少现有组群集的公平化镜头,以落实公平性,这是许多组群应用中一个基本的后处理步骤,需要公平代表性。我们为每个组群群中具有平等代表性的数据集和近线时常系数缩算法制定了一种最佳算法,用于两个组规模不同的更一般假设。我们通过显示问题的严重性,我们首先处理这一问题,提供不变的近似结果。作为我们调查的一部分,我们无法对两个不平等的组群体产生更深刻的影响。
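The fairness notion referenced in the abstract, proportional representation of each protected group within every cluster, can be checked mechanically. The sketch below is an illustrative assumption about how one might verify it for a given clustering. It does not implement the paper's approximation algorithms, and all names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def is_fair(clustering, group_of):
    """Return True if every (non-empty) cluster has the same group proportions
    as the whole dataset.

    clustering: list of clusters, each a list of point ids.
    group_of:   dict mapping point id -> protected-group label.
    """
    totals = Counter(group_of.values())
    n = len(group_of)
    for cluster in clustering:
        counts = Counter(group_of[p] for p in cluster)
        for group, total in totals.items():
            # Exact rational comparison avoids floating-point ties.
            if Fraction(counts[group], len(cluster)) != Fraction(total, n):
                return False
    return True

group_of = {1: "red", 2: "blue", 3: "red", 4: "blue"}
balanced = is_fair([[1, 2], [3, 4]], group_of)     # each cluster is half red, half blue
segregated = is_fair([[1, 3], [2, 4]], group_of)   # clusters are single-colored
```

A check like this is the constraint side of the problem. The optimization side, finding the closest fair clustering to a given one, is what the abstract reports to be NP-hard for two unequal-sized groups.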


Article 255

Title@2025-06-11 (3): AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent

Title: AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent AWP: Activation-Aware Weight Pruning und Quantisierung mit projiziertem Gradient Descent AWP: 与预测的渐变人种相比,活性-敏量 体重 稳重和量化 2506.10205v1

Authors (5): Jing Liu, Toshiaki Koike-Akino, Ye Wang, Hassan Mansour, Matthew Brand

To address the enormous size of Large Language Models (LLMs), model compression methods, such as quantization and pruning, are often deployed, especially on edge devices. In this work, we focus on layer-wise post-training quantization and pruning. Drawing connections between activation-aware weight pruning and sparse approximation problems, and motivated by the success of Iterative Hard Thresholding (IHT), we propose a unified method for Activation-aware Weight pruning and quantization via Projected gradient descent (AWP). Our experiments demonstrate that AWP outperforms state-of-the-art LLM pruning and quantization methods. Theoretical convergence guarantees of the proposed method for pruning are also provided.

为解决大语言模型(LLMs)的巨大规模问题,通常会采用模型压缩方法,如量化和裁剪等,特别是在边缘设备上。在这项工作中,我们侧重于从层到层的训练后量化和裁剪。在激活-觉察重量裁剪和稀疏近近似问题之间建立联系,并受迭代硬推力控股的成功推动,我们提出了一种统一的方法,通过预测梯度下降进行活化-觉察重量裁剪和四分化。我们的实验表明,AWP优于最先进的LLM裁剪裁剪和四分化方法。我们还提供了拟议裁剪方法的理论趋同保证。
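The connection the abstract draws between activation-aware pruning and sparse approximation can be sketched with Iterative Hard Thresholding: take a gradient step on the layer-output error, then project back onto the set of $k$-sparse weights. The code below is a simplified illustration under several assumptions: it handles a single output row rather than a full weight matrix, covers pruning only (no quantization), and uses a conservative step size chosen here for convenience. It should not be read as the AWP algorithm itself.

```python
def hard_threshold(w, k):
    """Projection onto k-sparse vectors: keep the k largest-magnitude entries."""
    keep = set(sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:k])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]

def output_error(X, w, w_dense):
    """Squared error between the outputs X @ w and X @ w_dense."""
    err = 0.0
    for row in X:
        diff = sum(x * (a - b) for x, a, b in zip(row, w, w_dense))
        err += diff * diff
    return err

def prune_pgd(X, w_dense, k, steps=200):
    """Projected gradient descent (IHT) on the output error under a k-sparsity constraint.

    X: calibration activations, one row per sample (assumed not all zero).
    """
    d = len(w_dense)
    # Step size 1/L, with the smoothness constant L bounded by 2 * ||X||_F^2.
    lr = 1.0 / (2.0 * sum(x * x for row in X for x in row))
    w = hard_threshold(w_dense, k)  # start from plain magnitude pruning
    for _ in range(steps):
        grad = [0.0] * d
        for row in X:
            resid = sum(x * (a - b) for x, a, b in zip(row, w, w_dense))
            for j in range(d):
                grad[j] += 2.0 * resid * row[j]
        w = hard_threshold([w[j] - lr * grad[j] for j in range(d)], k)
    return w
```

Because the iteration starts from magnitude pruning and the step size does not exceed $1/L$, the output error should not increase from one iterate to the next. The surviving weights are adjusted to compensate for the removed ones, which is what makes the method activation-aware.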


Article 256

Title@2025-06-11 (3): VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use

Title: VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use VTool-R1: VLMs lernen mit Bildern zu denken, indem sie mehr über multimodale Werkzeugnutzung lernen VTool-R1:VLMs通过多模式工具使用强化学习学习如何用图像思考 2505.19255v3

Authors (9): Mingyuan Wu, Jingcheng Yang, Jize Jiang, Meitang Li, Kaizhuo Yan, Hanchao Yu, Minjia Zhang, Chengxiang Zhai, Klara Nahrstedt

Reinforcement Learning Finetuning (RFT) has significantly advanced the reasoning capabilities of large language models (LLMs) by enabling long chains of thought, self-correction, and effective tool use. While recent works attempt to extend RFT to vision-language models (VLMs), these efforts largely produce text-only reasoning conditioned on static image inputs, falling short of true multimodal reasoning in the response. In contrast, test-time methods like Visual Sketchpad incorporate visual steps but lack training mechanisms. We introduce VTool-R1, the first framework that trains VLMs to generate multimodal chains of thought by interleaving text and intermediate visual reasoning steps. VTool-R1 integrates Python-based visual editing tools into the RFT process, enabling VLMs to learn when and how to generate visual reasoning steps that benefit final reasoning. Trained with outcome-based rewards tied to task accuracy, our approach elicits strategic visual tool use for reasoning without relying on process-based supervision. Experiments on structured visual question answering over charts and tables show that VTool-R1 enhances reasoning performance by teaching VLMs to “think with images” and generate multimodal chains of thought with tools.

通过长期思维链、自我校正和有效工具的使用,大型语言模型(LLMS)的推理能力有了显著提高,最近的工作试图将RFT推广到视觉语言模型(VLMS),但这些努力基本上产生了仅以静态图像输入为条件的文本推理,没有达到真正的多式推理。相比之下,视觉Strachpad等试验时间方法包含视觉步骤,但缺乏培训机制。我们引入VTool-R1,这是培训VLMS通过互读文本和中间视觉推理步骤产生多式思维链的第一个框架。VTool-R1将Python的视觉编辑工具纳入RFTA进程,使VLMS能够学习何时以及如何产生有利于最终推理的视觉推理步骤。经过基于结果的奖励与任务精度挂钩,我们的方法引出战略视觉工具用于推理,而无需基于程序的监督。通过图表和表格对结构直观问题进行实验,显示VTool-R1通过教授VLMS公司“用图像思考”和产生多式思维链。


Article 257

Title@2025-06-11 (3): Prompt Variability Effects On LLM Code Generation

Title: Prompt Variability Effects On LLM Code Generation Veränderliche Auswirkungen auf die LLM-Code-Generierung 对LLM 代码生成的迅速易变性效应 2506.10204v1

Authors (5): Andrei Paleyes, Radzim Sendyka, Diana Robinson, Christian Cabrera, Neil D. Lawrence

Code generation is one of the most active areas of application of Large Language Models (LLMs). While LLMs lower barriers to writing code and accelerate the development process, the overall quality of generated programs depends on the quality of the given prompts. Specifically, the functionality and quality of generated code can be sensitive to the user’s background and familiarity with software development. It is therefore important to quantify an LLM’s sensitivity to variations in the input. To this end, we propose a synthetic evaluation pipeline for code generation with LLMs, as well as a systematic persona-based evaluation approach to expose qualitative differences in LLM responses depending on the prospective user’s background. Both proposed methods are completely independent of specific programming tasks and LLMs, and thus are widely applicable. We provide experimental evidence illustrating the utility of our methods and share our code for the benefit of the community.

生成代码是应用大语言模型最活跃的领域之一。虽然LLMs公司降低写法码和加速发展进程的障碍,但生成程序的总体质量取决于给定提示的质量。具体地说,生成代码的功能和质量对用户的背景和软件开发的熟悉度十分敏感,因此,必须量化LLM公司对投入差异的敏感度。为此,我们提议为与LLMs公司生成代码建立一个合成评价管道,并采用基于人的系统评价方法,以揭示LLM公司根据潜在用户背景做出的反应的质量差异。两种拟议方法都完全独立于具体的方案编制任务和LMS公司,因此广泛适用。我们提供实验性证据,说明我们的方法的效用,并分享我们的代码,以造福社区。


Article 258

Title@2025-06-11 (3): From Neural Representations to Interpretable Logic Rules

Title: From Neural Representations to Interpretable Logic Rules Von neuralen Vertretungen zu interpretierbaren Logischen Regeln 从神经代表到可解释逻辑规则 2501.08281v2

Authors (5): Chuqin Geng, Xiaojie Xu, Anqi Xing, Ziyu Zhao, Xujie Si

As deep neural networks continue to excel across various domains, their black-box nature has raised concerns about transparency and trust. In particular, interpretability has become increasingly essential for applications that demand high safety and knowledge rigor, such as drug discovery, autonomous driving, and genomics. However, progress in understanding even the simplest deep neural networks - such as fully connected networks - has been limited, despite their role as foundational elements in state-of-the-art models like ResNet and Transformer. In this paper, we address this challenge by introducing NeuroLogic, a novel approach for decoding interpretable logic rules from neural networks. NeuroLogic leverages neural activation patterns to capture the model’s critical decision-making processes, translating them into logical rules represented by hidden predicates. Thanks to its flexible design in the grounding phase, NeuroLogic can be adapted to a wide range of neural networks. For simple fully connected neural networks, hidden predicates can be grounded in certain split patterns of original input features to derive decision-tree-like rules. For large, complex vision neural networks, NeuroLogic grounds hidden predicates into high-level visual concepts that are understandable to humans. Our empirical study demonstrates that NeuroLogic can extract global and interpretable rules from state-of-the-art models such as ResNet, a task at which existing work struggles. We believe NeuroLogic can help pave the way for understanding the black-box nature of neural networks.

由于深层神经网络继续跨越多个领域,它们的黑箱性质引起了对透明度和信任的担忧。 特别是,对于需要高度安全和知识固定的应用程序,例如药物发现、自主驾驶和基因组学,解释性已变得越来越重要。 然而,对即使是最简单的深层神经网络 — — 如完全连接的网络 — — 的理解进展也很有限,尽管这些网络在ResNet和变异器等最先进的模型中扮演着基本元素的作用。 在本文中,我们通过引入“神经实验室”来应对这一挑战,这是将神经网络中可解释的逻辑规则解译为新颖的方法。 神经实验室网络利用神经激活模式来捕捉捉摸模型的关键决策过程,将其转化为隐蔽的上游规则。 由于在地面阶段的灵活设计,NeroLologic可以适应广泛的神经网络网络网络网络。 对于简单完全连接的神经网络来说,隐藏的上游数据可以植根于某些原始输入模式的分解模式中,从而得出类似决策层规则。 对于大型、复杂的神经网络网络网络网络网络网络网络网络网络的动态激活模式来说,可以将神经激活模式转换成隐含的内行法。


Article 259

Title@2025-06-11 (3): Learning Geometric Invariant Features for Classification of Vector Polygons with Graph Message-passing Neural Network

Title: Learning Geometric Invariant Features for Classification of Vector Polygons with Graph Message-passing Neural Network Lernen Geometrische Invariante Funktionen zur Klassifizierung von Vektor-Polygonen mit Graph Message-passing Neural Network 具有图形信件通过神经网络的矢量多边形分类 2407.04334v2

Authors (3): Zexian Huang, Kourosh Khoshelham, Martin Tomko

Geometric shape classification of vector polygons remains a challenging task in spatial analysis. Previous studies have primarily focused on deep learning approaches for rasterized vector polygons, while the study of discrete polygon representations and corresponding learning methods remains underexplored. In this study, we investigate a graph-based representation of vector polygons and propose a simple graph message-passing framework, PolyMP, along with its densely self-connected variant, PolyMP-DSC, to learn more expressive and robust latent representations of polygons. This framework hierarchically captures self-looped graph information and learns geometric-invariant features for polygon shape classification. Through extensive experiments, we demonstrate that combining a permutation-invariant graph message-passing neural network with a densely self-connected mechanism achieves robust performance on benchmark datasets, including synthetic glyphs and real-world building footprints, outperforming several baseline methods. Our findings indicate that PolyMP and PolyMP-DSC effectively capture expressive geometric features that remain invariant under common transformations, such as translation, rotation, scaling, and shearing, while also being robust to trivial vertex removals. Furthermore, we highlight the strong generalization ability of the proposed approach, enabling the transfer of learned geometric features from synthetic glyph polygons to real-world building footprints.

矢量多边形的几何形状分类仍然是空间分析中一项艰巨的任务。先前的研究主要侧重于对弧式矢量多边形的深度学习方法,而对离散多边形和相应的学习方法的研究仍未得到充分探讨。在本研究中,我们调查矢量多边形的图形代表,并提议一个简单的图表信息传递框架,即PolyMP及其密集的自我连接变体PolyMP-DSC,以学习更多表达和强大的多边形的潜在表现。这个框架从等级上捕捉自我透视图信息,并学习多边形分类的几何异性特征。通过广泛的实验,我们证明,将偏差-异式图形信息传递神经网络与密集的自我连接机制相结合,在基准数据集(包括合成晶体和真实世界建筑足迹)上取得了强有力的业绩,超过了几个基线方法。我们的研究结果表明, PolyMP 和 PolyMP-DSC 有效捕捉到在共同变异状态下仍然存在的显性地球度特征,例如翻译、旋转、缩放、缩缩和缩放等几异变异性特性特征。同时,还提议将精准的合成地球平面方法转化为学习。
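The graph view of a polygon described in the abstract above can be illustrated with a minimal sketch. It is not the PolyMP architecture: there are no learned weights, and the function names (`polygon_to_graph`, `mean_message_pass`, `sum_pool`) are hypothetical. It only shows, under these assumptions, how a ring of vertices becomes a graph with self-loops and why a permutation-invariant readout does not depend on which vertex the ring starts from.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Edge = Tuple[int, int]


def polygon_to_graph(vertices: List[Point]) -> Tuple[List[Point], List[Edge]]:
    """Turn a closed polygon ring into node features and directed edges.

    Each vertex is linked to its two ring neighbours (both directions)
    and to itself (self-loop). Assumes at least three vertices.
    """
    n = len(vertices)
    edges: List[Edge] = []
    for i in range(n):
        j = (i + 1) % n
        edges.append((i, j))  # forward along the ring
        edges.append((j, i))  # backward along the ring
        edges.append((i, i))  # self-loop
    return list(vertices), edges


def mean_message_pass(features: List[Point], edges: List[Edge]) -> List[Point]:
    """One round of message passing with mean aggregation (no learned weights)."""
    n = len(features)
    sums = [[0.0, 0.0] for _ in range(n)]
    counts = [0] * n
    for src, dst in edges:
        sums[dst][0] += features[src][0]
        sums[dst][1] += features[src][1]
        counts[dst] += 1
    return [(s[0] / c, s[1] / c) for s, c in zip(sums, counts)]


def sum_pool(features: List[Point]) -> Point:
    """Permutation-invariant graph readout: sum over all nodes."""
    return (sum(f[0] for f in features), sum(f[1] for f in features))
```

Feeding the same square with a different starting vertex through `polygon_to_graph`, `mean_message_pass`, and `sum_pool` gives the same pooled vector up to floating-point rounding. Invariance to rotation, scaling, or shearing is a separate property that this sketch does not attempt to show.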


Article 260

Title@2025-06-11 (3): Privacy-aware Berrut Approximated Coded Computing for Federated Learning

Title: Privacy-aware Berrut Approximated Coded Computing for Federated Learning Privacy-aware Berrut Ungefähre Coded Computing für Federated Learning Berrut Berrut Berrut 接近的联邦学习联合会编码计算器 2405.01704v3

Authors (3): Xavier Martínez Luaña, Rebeca P. Díaz Redondo, Manuel Fernández Veiga

Federated Learning (FL) is a strategy that enables the collaborative training of an AI model among different data owners without revealing their private datasets. Even so, FL has privacy vulnerabilities that prior work has tried to overcome with techniques such as Differential Privacy (DP), Homomorphic Encryption (HE), or Secure Multi-Party Computation (SMPC). However, these techniques have important drawbacks that might narrow their range of application: difficulty working with non-linear functions and performing large matrix multiplications, and high communication and computational costs to manage semi-honest nodes. In this context, we propose a solution to guarantee privacy in FL schemes that simultaneously solves the previously mentioned problems. Our proposal is based on Berrut Approximated Coded Computing, a technique from the Coded Distributed Computing paradigm, adapted to a Secret Sharing configuration, to provide input privacy to FL in a scalable way. It can be applied to computing non-linear functions and treats the special case of distributed matrix multiplication, a key primitive at the core of many automated learning tasks. Because of these characteristics, it could be applied in a wide range of FL scenarios, since it is independent of the machine learning models or aggregation algorithms used in the FL scheme. We analyze the achieved privacy and complexity of our solution, and extensive numerical results show a good trade-off between privacy and precision.

联邦学习(FL)是一个有趣的战略,它使不同数据所有者能够在不透露其私人数据集的情况下对AI模型进行合作培训,从而能够在不同数据所有者之间对AI模型进行合作培训。即使如此,FL也有一些隐私脆弱性,通过应用差异隐私(DP)、单态加密(HE)或安全多党计算(SMPC)等技术,试图克服这些隐私脆弱性。然而,这些技术有一些重要的缺点,可能缩小其应用范围:使用非线性功能和操作大型矩阵倍增和高通信和计算成本以管理半正统节点。在这方面,我们提出了一个解决方案,以保障FL的隐私,同时解决上述问题。我们的提案基于Berrut Ap相近的编码计算(He)技术,这是根据编码分配的计算机模型技术,适应了秘密共享配置,以可缩放的方式为FL提供输入隐私。它可用于计算非线性功能,并处理分布式矩阵倍增的特例,这是许多自动学习任务的核心的原始关键。由于这些特性,因此,我们的提案可以广泛应用FL的精确性模型,因此,因此,我们可以将采用一种独立的模型和Falalalalalalalalal al comma ex ex exal laction laction laction lades。


Article 261

Title@2025-06-11 (3): DynaSubVAE: Adaptive Subgrouping for Scalable and Robust OOD Detection

Title: DynaSubVAE: Adaptive Subgrouping for Scalable and Robust OOD Detection DynaSubVAE: Adaptive Subgruppierung für skalierbare und robuste OOD-Erkennung DynaSubVAE:可缩放和强力OOD探测适应性分组 2506.10200v1

Authors (4): Tina Behrouzi, Sana Tonekaboni, Rahul G. Krishnan, Anna Goldenberg

Real-world observational data often contain existing or emerging heterogeneous subpopulations that deviate from global patterns. The majority of models tend to overlook these underrepresented groups, leading to inaccurate or even harmful predictions. Existing solutions often rely on detecting these samples as Out-of-domain (OOD) rather than adapting the model to new emerging patterns. We introduce DynaSubVAE, a Dynamic Subgrouping Variational Autoencoder framework that jointly performs representation learning and adaptive OOD detection. Unlike conventional approaches, DynaSubVAE evolves with the data by dynamically updating its latent structure to capture new trends. It leverages a novel non-parametric clustering mechanism, inspired by Gaussian Mixture Models, to discover and model latent subgroups based on embedding similarity. Extensive experiments show that DynaSubVAE achieves competitive performance in both near-OOD and far-OOD detection, and excels in class-OOD scenarios where an entire class is missing during training. We further illustrate that our dynamic subgrouping mechanism outperforms standalone clustering methods such as GMM and KMeans++ in terms of both OOD accuracy and regret precision.

与传统方法不同,DynaSubVAE与数据不同,通过动态更新其潜在结构来捕捉新趋势。它利用高斯混合模型的新型非参数组群机制,在嵌入相似性的基础上发现和模拟潜在组群。 广泛的实验显示,DynaSubVAE在近OOD和远OODD探测中都取得了竞争性业绩,并在整个班级在培训中都缺少的类类OOD情景中取得了优异性。我们进一步说明,我们动态分组机制超越了诸如GMM和KIMeans++等独立组群方法,例如GMM和KIMeans+,在OODG和PR精确度方面都是如此。


Article 262

Title@2025-06-11 (3): Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance

Title: Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance Objekt-Halluzination in großen Visions-Sprachen-Modellen durch Bild-umrundete Anleitung abmildern 通过图像四舍五入指南在大型视觉语言模型中减轻物体幻觉 2402.08680v2

Authors (4): Linxi Zhao, Yihe Deng, Weitong Zhang, Quanquan Gu

The advancement of Large Vision-Language Models (LVLMs) has increasingly highlighted the critical issue of their tendency to hallucinate non-existing objects in the images. To address this issue, previous works focused on using specially curated datasets or powerful LLMs to rectify the outputs of LVLMs. However, these approaches require either costly training or fine-tuning, or API access to proprietary LLMs for post-generation correction. In response to these limitations, we propose Mitigating hallucinAtion via image-gRounded guIdaNcE (MARINE), a framework that is both training-free and API-free. MARINE effectively and efficiently reduces object hallucinations during inference by introducing image-grounded guidance to LVLMs. This is achieved by leveraging open-source vision models to extract object-level information, thereby enhancing the precision of LVLM-generated content. Our framework’s flexibility further allows for the integration of multiple vision models, enabling more reliable and robust object-level guidance. Through comprehensive evaluations across 5 popular LVLMs with diverse evaluation metrics and benchmarks, we demonstrate the effectiveness of MARINE, which even outperforms existing fine-tuning-based methods. Remarkably, it reduces hallucinations consistently in GPT-4V-assisted evaluation while maintaining the detailedness of LVLMs’ generations. We release our code at https://github.com/Linxi-ZHAO/MARINE.

为解决这一问题,以往的工作重点是利用特别整理的数据集或强大的LLMS来纠正LVMs的产出。然而,这些方法需要成本高的培训或微调,或者需要API获得专有的LMS来进行生成后校正。为了应对这些限制,我们提议通过图像定位古尔达恩克(MARINE)来缓解幻觉,这是一个既无培训又无API的框架。MARINE通过向LVLMs引入基于图像的指导来切实有效地减少物体的幻觉。这是通过利用公开来源的视觉模型来提取目标级信息,从而提高LVMM内容的精确度。我们的框架的灵活性进一步允许通过图像模型的整合,使得更可靠和可靠的对象级指导。通过对5个大众LVLMS系统进行全面评价,同时采用不同的评价基准和基准,从而在LVMS-LMS-DMS/REDRA规则的不断调整中,我们通过GLMS-RISMS-remartal 的升级方法来降低其有效性。


Article 263

Title@2025-06-11 (3): Exploring Topological and Localization Phenomena in SSH Chains under Generalized AAH Modulation: A Computational Approach

Title: Exploring Topological and Localization Phenomena in SSH Chains under Generalized AAH Modulation: A Computational Approach Erforschung topologischer und Lokalisierungsphänomene in SSH-Ketten unter generalisierter AAH-Modulation: Ein rechnerischer Ansatz 根据普遍AAAH变动:一种计算方法,在SSH链链中探索地形学和本地化现象 2506.10195v1

Authors (2): Souvik Ghosh, Sayak Roy

The Su-Schrieffer-Heeger (SSH) model serves as a canonical example of a one-dimensional topological insulator, yet its behavior under more complex, realistic conditions remains a fertile ground for research. This paper presents a comprehensive computational investigation into generalized SSH models, exploring the interplay between topology, quasi-periodic disorder, non-Hermiticity, and time-dependent driving. Using exact diagonalization and specialized numerical solvers, we map the system’s phase space through its spectral properties and localization characteristics, quantified by the Inverse Participation Ratio (IPR). We demonstrate that while the standard SSH model exhibits topologically protected edge states, these are destroyed by a localization transition induced by strong Aubry-André-Harper (AAH) modulation. Further, we employ unsupervised machine learning (PCA) to autonomously classify the system’s phases, revealing that strong localization can obscure underlying topological signatures. Extending the model beyond Hermiticity, we uncover the non-Hermitian skin effect, a dramatic localization of all bulk states at a boundary. Finally, we apply a periodic Floquet drive to a topologically trivial chain, successfully engineering a Floquet topological insulator characterized by the emergence of anomalous edge states at the boundaries of the quasi-energy zone. These findings collectively provide a multi-faceted view of the rich phenomena hosted in generalized 1D topological systems.

Su-Schrieffer- Heeger (SSH) 模型是一个单维表层绝缘模型的典型例子,但该模型在更复杂、更现实的条件下的行为仍然是研究的肥沃土壤。本文展示了对普通SSH模型的全面计算调查,探索了地形学、半周期性混乱、非周期性以及时间驱动之间的相互作用。我们使用精确的分解和专门的数字解算器,通过光谱特性和本地化特征绘制了系统的阶段空间图,由反向参与比率(IPR)进行量化。我们证明,虽然标准SSH模型在更复杂、更现实的条件下的行为仍然是受表面保护的边缘状态,但这些状态却由于由强力AUBry-Andrèe-Harper(AAAAH)调控的本地化过程而导致的局部化转变而遭到破坏。此外,我们采用不受监督的机器学习(PCA) 来对系统的各个阶段进行自主分类,表明强的本地化可以模糊基本的地表层特征特征。将模型扩展到Hermitici(I),我们发现非人类的内层结构结构结构的表面,我们发现在1级一级的地表层上层的表面的表面上层的表面的表面结构上层,我们成功地将一个稳定的地层的地压压压压压压的地压压压在1的地压的地压在最上。
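The exact-diagonalization and IPR workflow mentioned in the abstract above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the onsite cosine term with strength `lam` is the textbook AAH form, used here as an assumed stand-in because the abstract does not spell out the paper's generalized modulation.

```python
import numpy as np


def ssh_hamiltonian(n_cells, v, w, lam=0.0, beta=(np.sqrt(5) - 1) / 2, phi=0.0):
    """Open-boundary SSH chain with an optional AAH-style onsite potential.

    v: intracell hopping, w: intercell hopping.
    lam, beta, phi: strength, incommensurate frequency, and phase of the
    assumed onsite modulation lam * cos(2*pi*beta*n + phi).
    """
    n_sites = 2 * n_cells
    h = np.zeros((n_sites, n_sites))
    for i in range(n_sites - 1):
        t = v if i % 2 == 0 else w  # alternate intracell / intercell bonds
        h[i, i + 1] = h[i + 1, i] = t
    sites = np.arange(n_sites)
    h[sites, sites] = lam * np.cos(2 * np.pi * beta * sites + phi)
    return h


def spectrum_and_ipr(h):
    """Diagonalize h and return (energies, IPR per eigenstate).

    IPR = sum_n |psi_n|^4 for a normalized state: it is of order 1/N for
    an extended state and of order 1 for a strongly localized one.
    """
    energies, states = np.linalg.eigh(h)  # eigenvectors are the columns
    ipr = np.sum(np.abs(states) ** 4, axis=0)
    return energies, ipr


if __name__ == "__main__":
    energies, ipr = spectrum_and_ipr(ssh_hamiltonian(40, v=0.5, w=1.0))
    near_zero = np.abs(energies) < 1e-6
    print("near-zero modes:", int(near_zero.sum()))
    print("IPR of near-zero modes:", ipr[near_zero])
```

With `v < w` and `lam = 0` the chain is in the topological phase and hosts two near-zero-energy edge modes whose IPR is far above that of the bulk states. Swapping `v` and `w` removes them, and a large `lam` raises the IPR of the whole spectrum, which is the localization signature the abstract refers to.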


Article 264

Title@2025-06-11 (3): Multi-task Representation Learning for Mixed Integer Linear Programming

Title: Multi-task Representation Learning for Mixed Integer Linear Programming Multi-Task-Darstellungs-Lernen für gemischte Integer-Linear-Programmierung 混合整数线性线性方案规划多任务代表学习 2412.14409v2

Authors (3): Junyang Cai, Taoan Huang, Bistra Dilkina

Mixed Integer Linear Programs (MILPs) are highly flexible and powerful tools for modeling and solving complex real-world combinatorial optimization problems. Recently, machine learning (ML)-guided approaches have demonstrated significant potential in improving MILP-solving efficiency. However, these methods typically rely on separate offline data collection and training processes, which limits their scalability and adaptability. This paper introduces the first multi-task learning framework for ML-guided MILP solving. The proposed framework provides MILP embeddings helpful in guiding MILP solving across solvers (e.g., Gurobi and SCIP) and across tasks (e.g., Branching and Solver configuration). Through extensive experiments on three widely used MILP benchmarks, we demonstrate that our multi-task learning model performs similarly to specialized models within the same distribution. Moreover, it significantly outperforms them in generalization across problem sizes and tasks.

混合线性程序(MILP)是模拟和解决复杂的现实世界组合优化问题的非常灵活和有力的工具。最近,机器学习(ML)指导方法在提高MILP解决效率方面显示出巨大潜力。然而,这些方法通常依靠单独的离线数据收集和培训程序,这限制了其可缩放性和适应性。本文介绍了ML引导的MILP解决的第一个多任务学习框架。拟议框架提供了MILP嵌入器,有助于指导MILP解决解决解决解决解决解决解决解决所有问题(如Gurobi和SCIP)和跨任务(如分流和溶解器配置)的问题。通过对三大广泛使用的MILP基准进行广泛试验,我们证明我们的多任务学习模式与同一分布范围内的专门模型类似。此外,它大大超越了在问题大小和任务之间的通用模式。


Article 265

Title@2025-06-11 (3): Improving Oral Cancer Outcomes Through Machine Learning and Dimensionality Reduction

Title: Improving Oral Cancer Outcomes Through Machine Learning and Dimensionality Reduction Verbesserung der Ergebnisse von Oralkrebs durch maschinelles Lernen und Reduzierung der Dimensionalität 通过机器学习和减少分量来改善口腔癌结果 2506.10189v1

Authors (3): Mohammad Subhi Al-Batah, Muhyeeddin Alqaraleh, Mowafaq Salem Alzboon

Oral cancer presents a formidable challenge in oncology, necessitating early diagnosis and accurate prognosis to enhance patient survival rates. Recent advancements in machine learning and data mining have revolutionized traditional diagnostic methodologies, providing sophisticated and automated tools for differentiating between benign and malignant oral lesions. This study presents a comprehensive review of cutting-edge data mining methodologies, including Neural Networks, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and ensemble learning techniques, specifically applied to the diagnosis and prognosis of oral cancer. Through a rigorous comparative analysis, our findings reveal that Neural Networks surpass other models, achieving an impressive classification accuracy of 93.6% in predicting oral cancer. Furthermore, we underscore the potential benefits of integrating feature selection and dimensionality reduction techniques to enhance model performance. These insights underscore the significant promise of advanced data mining techniques in bolstering early detection, optimizing treatment strategies, and ultimately improving patient outcomes in the realm of oral oncology.

口腔癌在肿瘤学方面提出了艰巨的挑战,需要早期诊断和准确预测,以提高患者存活率。最近机器学习和数据挖掘方面的进步使传统诊断方法发生了革命性的变化,为区分良性和恶性口腔损伤提供了尖端和自动化的工具。本研究报告全面审查了尖端数据挖掘方法,包括神经网络、K-Nearest Nearbearbors(KNNN)、支持矢量机(SVM)和联合学习技术,特别适用于口腔癌的诊断和诊断。通过严格的比较分析,我们的调查结果显示神经网络超越了其他模型,在预测口腔癌症方面达到了令人印象深刻的分类精确度93.6 % 。此外,我们强调将特征选择和减少维度技术结合起来以提高模型性能的潜在好处。这些见解强调了先进的数据挖掘技术在支持早期检测、优化治疗战略以及最终改善口腔肿瘤领域的患者结果方面的巨大前景。


Article 266

Title@2025-06-11 (3): Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce

Title: Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce Zukunft der Arbeit mit KI-Agenten: Auditing Automation und Augmentation Potential in der gesamten US-Arbeitskraft 未来与AI代理机构的合作:全美国员工队伍的审计自动化和增强潜力 2506.06576v2

Authors (7): Yijia Shao, Humishka Zope, Yucheng Jiang, Jiaxin Pei, David Nguyen, Erik Brynjolfsson, Diyi Yang

The rapid rise of compound AI systems (a.k.a., AI agents) is reshaping the labor market, raising concerns about job displacement, diminished human agency, and overreliance on automation. Yet, we lack a systematic understanding of the evolving landscape. In this paper, we address this gap by introducing a novel auditing framework to assess which occupational tasks workers want AI agents to automate or augment, and how those desires align with the current technological capabilities. Our framework features an audio-enhanced mini-interview to capture nuanced worker desires and introduces the Human Agency Scale (HAS) as a shared language to quantify the preferred level of human involvement. Using this framework, we construct the WORKBank database, building on the U.S. Department of Labor’s O*NET database, to capture preferences from 1,500 domain workers and capability assessments from AI experts across over 844 tasks spanning 104 occupations. Jointly considering the desire and technological capability divides tasks in WORKBank into four zones: Automation “Green Light” Zone, Automation “Red Light” Zone, R&D Opportunity Zone, Low Priority Zone. This highlights critical mismatches and opportunities for AI agent development. Moving beyond a simple automate-or-not dichotomy, our results reveal diverse HAS profiles across occupations, reflecting heterogeneous expectations for human involvement. Moreover, our study offers early signals of how AI agent integration may reshape the core human competencies, shifting from information-focused skills to interpersonal ones. These findings underscore the importance of aligning AI agent development with human desires and preparing workers for evolving workplace dynamics.

复合AI系统(a.k.a.a.a.,AI代理商)的迅速上升正在改变劳动力市场,使人们对就业转移、人力机构衰减和过度依赖自动化等问题感到关切。然而,我们缺乏对不断变化的形势的系统理解。在本文件中,我们通过引入新的审计框架来弥补这一差距,以评估哪些职业任务工人希望AI代理商自动化或扩大,以及这些工人的愿望如何与当前的技术能力相一致。我们的框架将强化的小型访问功能用于捕捉微妙的工人愿望,并将人类机构规模作为量化人类参与偏好程度的共同语言。我们利用这一框架,在劳工部O*NET数据库的基础上,建立WorkBank数据库,从1,500个领域工人的偏好和AI代理商的能力评估,涵盖超过104种职业的844项任务。我们共同考虑将工作班克的任务分为四个区域:自动化的“绿色光区”、自动化的“亮点区”的“人类机构规模”,将人类机构规模作为量化人类参与程度的通用语言。我们利用这个框架构建了Horb Bank数据库数据库,从A.


Article 267

Title@2025-06-11 (3): Balans: Multi-Armed Bandits-based Adaptive Large Neighborhood Search for Mixed-Integer Programming Problem

Title: Balans: Multi-Armed Bandits-based Adaptive Large Neighborhood Search for Mixed-Integer Programming Problem Balans: Multi-Armed Bandits-basierte adaptive Großnachbarschaft Suche nach gemischt-integer-Programmierungsproblem Balans:多武装强盗基于适应性的大型邻里搜索混合内插方案拟订问题 2412.14382v2

Authors (3): Junyang Cai, Serdar Kadioglu, Bistra Dilkina

Mixed-integer programming (MIP) is a powerful paradigm for modeling and solving various important combinatorial optimization problems. Recently, learning-based approaches have shown a potential to speed up MIP solving via offline training that then guides important design decisions during the search. However, a significant drawback of these methods is their heavy reliance on offline training, which requires collecting training datasets and computationally costly training epochs yet offers only limited generalization to unseen (larger) instances. In this paper, we propose Balans, an adaptive meta-solver for MIPs with online learning capability that does not require any supervision or a priori training. At its core, Balans is based on adaptive large-neighborhood search, operating on top of an MIP solver by successive applications of destroy and repair neighborhood operators. During the search, the selection among different neighborhood definitions is guided on the fly for the instance at hand via multi-armed bandit algorithms. Our extensive experiments on hard optimization instances show that Balans offers significant performance gains over the default MIP solver, is better than committing to any single best neighborhood, and improves over the state-of-the-art large-neighborhood search for MIPs. Finally, we release Balans as highly configurable, MIP-solver-agnostic, open-source software.

混合内插程序(MIP)是建模和解决各种重要的组合优化问题的强大范例。最近,基于学习的方法显示出通过离线培训加快MIP解决离线培训的潜力,从而在搜索过程中指导重要的设计决定。然而,这些方法的一个重大缺陷是严重依赖离线培训,这需要收集培训数据集和计算成本高昂的培训时代,但仅对看不见(大)案例提供有限的概括化。在本文中,我们提议Balans,这是具有无需任何监督或优先培训的在线学习能力的MIP的适应性元软件。在其核心方面,Balans基于适应性大型邻里搜索,通过连续的摧毁和修理社区运营商应用在MIP解决方案解决方案解决方案解决方案解决方案解决方案解决方案的顶端运作。在搜索过程中,不同街区定义的选择以多种手边际(大)算法为指导。我们在硬性优化实例上的广泛实验显示,Balans为默认的MIP解决方案解决方案解决方案解决方案解决方案解决方案解决方案解决方案提供显著的绩效收益,比致力于任何单一的最佳邻里搜索工具更好,我们最终将高端的搜索系统升级。
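The bandit-guided choice among neighborhood operators described in the abstract above can be illustrated with a small UCB1 loop. This is a sketch under stated assumptions, not the Balans API: the operator names are hypothetical, no MIP solver is called, and the reward (1 if the chosen neighborhood improved the incumbent, else 0) is simulated from fixed success probabilities.

```python
import math
import random


def ucb1_select(counts, totals, t):
    """Pick an arm by UCB1: try each arm once, then maximize mean + bonus."""
    for arm, c in enumerate(counts):
        if c == 0:
            return arm
    scores = [
        totals[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_operator_bandit(success_prob, rounds, seed=0):
    """Simulate online selection among neighborhood operators.

    success_prob maps a (hypothetical) operator name to the assumed
    probability that applying it improves the incumbent solution.
    Returns how many times each operator was selected.
    """
    rng = random.Random(seed)
    names = list(success_prob)
    counts = [0] * len(names)
    totals = [0.0] * len(names)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, totals, t)
        # Simulated reward; in a real search this would come from the solver.
        reward = 1.0 if rng.random() < success_prob[names[arm]] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return dict(zip(names, counts))


if __name__ == "__main__":
    probs = {"operator_a": 0.2, "operator_b": 0.5, "operator_c": 0.8}
    print(run_operator_bandit(probs, rounds=2000))
```

Because the statistics are updated after every round, the selection adapts to the instance at hand without any offline training, which is the property the abstract emphasizes. Other bandit policies (epsilon-greedy, Thompson sampling) would slot into `ucb1_select` in the same way.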


Article 268

Title@2025-06-11 (3): Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment

Title: Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment Skalierbare nicht-equivariante 3D-Molekül-Erzeugung über Rotational Alignment 通过旋转对齐可缩放的非等式 3D 分子生成 2506.10186v1

Authors (2): Yuhui Ding, Thomas Hofmann

Equivariant diffusion models have achieved impressive performance in 3D molecule generation. These models incorporate Euclidean symmetries of 3D molecules by utilizing an SE(3)-equivariant denoising network. However, specialized equivariant architectures limit the scalability and efficiency of diffusion models. In this paper, we propose an approach that relaxes such equivariance constraints. Specifically, our approach learns a sample-dependent SO(3) transformation for each molecule to construct an aligned latent space. A non-equivariant diffusion model is then trained over the aligned representations. Experimental results demonstrate that our approach performs significantly better than previously reported non-equivariant models. It yields sample quality comparable to state-of-the-art equivariant diffusion models and offers improved training and sampling efficiency. Our code is available at https://github.com/skeletondyh/RADM

在 3D 分子 生成 中, 等式扩散模型取得了令人印象深刻的性能。 这些模型通过使用 SE(3) 等式除尘网络, 包含了 3D 分子 的 Euclide 的对称性。 然而, 专业的等式结构限制了 扩散模型的可缩放性和效率 。 在本文中, 我们提出了一种放松这种等式限制的方法 。 具体地说, 我们的方法为每个分子学习了一种基于样本的 SO(3) 转换, 以构建一个匹配的潜在空间 。 然后, 对一个非 等式扩散模型进行了相应的培训 。 实验结果显示, 我们的方法比先前报告的非等式模型要好得多 。 它产生与最先进的等式扩散模型相近的样本质量, 并提供更好的培训和取样效率 。 我们的代码可以在 https://github.com/skeletondyh/RADM 上查阅 。


Article 269

Title@2025-06-11 (3): Optimizing Genetic Algorithms with Multilayer Perceptron Networks for Enhancing TinyFace Recognition

Title: Optimizing Genetic Algorithms with Multilayer Perceptron Networks for Enhancing TinyFace Recognition Optimierung genetischer Algorithmen mit mehrschichtigen Perceptron-Netzwerken zur Verbesserung der TinyFace-Erkennung 利用增强微面识别的多层倍感人网络优化遗传算法 2506.10184v1

Authors (3): Mohammad Subhi Al-Batah, Mowafaq Salem Alzboon, Muhyeeddin Alqaraleh

This study empirically examines MLP networks through rigorous, methodical experimentation on three diverse datasets: TinyFace, Heart Disease, and Iris. The study includes three key methods: a) baseline training using the default settings for the Multi-Layer Perceptron (MLP), b) feature selection using Genetic Algorithm (GA) based refinement, and c) Principal Component Analysis (PCA) based dimension reduction. The results show important information on how such techniques affect performance. While PCA showed benefits in low-dimensional and noise-free datasets, GA consistently increased accuracy in complex datasets by accurately identifying critical features. Comparison reveals that feature selection and dimensionality reduction play interdependent roles in enhancing MLP performance. The study contributes to the literature on feature engineering and neural network parameter optimization, offering practical guidelines for a wide range of machine learning tasks.

研究概览:研究包括三个关键方法:(a) 利用多湖 Perceptron(MLP)的默认设置进行基线培训;(b) 利用基于基因变异(GA)的精细进行特征选择;(c) 基于主要组成部分分析(PCA)的尺寸降低。研究结果显示了有关这些技术如何影响性能的重要信息。尽管五氯苯甲醚通过准确识别关键特征,在低维和无噪音数据集方面显示出在GA持续提高复杂数据集的准确性的好处。比较表明,特征选择和减少维度在提高MLP性能方面发挥着相互依存的作用。该研究有助于关于特征工程学和神经网络参数优化的文献,为广泛的机器学习任务提供了实用指南。


Article 270

Title@2025-06-11 (3): A Comparative Study of Machine Learning Techniques for Early Prediction of Diabetes

Title: A Comparative Study of Machine Learning Techniques for Early Prediction of Diabetes Eine vergleichende Studie über Techniken des maschinellen Lernens zur Frühvorhersage von Diabetes 早期预测糖尿病的机器学习技术比较研究 2506.10180v1

Authors (5): Mowafaq Salem Alzboon, Mohammad Al-Batah, Muhyeeddin Alqaraleh, Ahmad Abuashour, Ahmad Fuad Bader

In many nations, diabetes is becoming a significant health problem, and early identification and control are crucial. Using machine learning algorithms to predict diabetes has yielded encouraging results. Using the Pima Indians Diabetes dataset, this study attempts to evaluate the efficacy of several machine-learning methods for diabetes prediction. The collection includes information on 768 patients, such as their ages, BMIs, and glucose levels. The techniques assessed are Logistic Regression, Decision Tree, Random Forest, k-Nearest Neighbors, Naive Bayes, Support Vector Machine, Gradient Boosting, and Neural Network. The findings indicate that the Neural Network algorithm performed the best, with an accuracy of 78.57 percent, followed by the Random Forest method, with an accuracy of 76.30 percent. The study implies that machine learning algorithms can aid diabetes prediction and be an efficient early detection tool.

在许多国家,糖尿病正在成为一个严重的健康问题,早期识别和控制至关重要。使用机器学习算法来预测糖尿病已经产生了令人鼓舞的结果。使用皮马印第安人糖尿病数据集,这项研究试图评估糖尿病预测的若干机器学习方法的功效。收集包括768名病人的信息,如他们的年龄、BMIs和葡萄糖水平。评估的技术是后勤递增、决定树、随机森林、K-Near邻居、Naive Bayes、支持矢量机、Gradient Bobsting和Neural 网络。 研究结果显示神经网络算法最出色,精确度为78.57%,其次是随机森林法,精确度为76.30%。研究意味着机器学习算法可以帮助糖尿病的预测,并成为高效的早期检测工具。


Article 271

Title@2025-06-11 (3): Geometric Regularity in Deterministic Sampling of Diffusion-based Generative Models

Title: Geometric Regularity in Deterministic Sampling of Diffusion-based Generative Models Geometrische Regelmäßigkeit bei der Deterministischen Probenahme von Diffusions-basierten Generativen Modellen 以传播为基础的创造模型的确定性抽样的几何规律性 2506.10177v1

Authors (4): Defang Chen, Zhenyu Zhou, Can Wang, Siwei Lyu

Diffusion-based generative models employ stochastic differential equations (SDEs) and their equivalent probability flow ordinary differential equations (ODEs) to establish a smooth transformation between complex high-dimensional data distributions and tractable prior distributions. In this paper, we reveal a striking geometric regularity in the deterministic sampling dynamics: each simulated sampling trajectory lies within an extremely low-dimensional subspace, and all trajectories exhibit an almost identical “boomerang” shape, regardless of the model architecture, applied conditions, or generated content. We characterize several intriguing properties of these trajectories, particularly under closed-form solutions based on kernel-estimated data modeling. We also demonstrate a practical application of the discovered trajectory regularity by proposing a dynamic programming-based scheme to better align the sampling time schedule with the underlying trajectory structure. This simple strategy requires minimal modification to existing ODE-based numerical solvers, incurs negligible computational overhead, and achieves superior image generation performance, especially in regions with only $5 \sim 10$ function evaluations.

在本文中,我们揭示了确定性取样动态中惊人的几何规律性:每个模拟采样轨迹都位于一个极低维次空间之内,而所有轨迹都呈现出几乎相同的“boomerang”形状,而不论模型结构、应用条件或生成的内容如何。我们对这些轨迹的一些令人感兴趣的特性作了定性,特别是在基于内核估计数据模型的封闭式解决办法下。我们还展示了所发现的轨迹规律性的实际应用,为此提出了一个动态的基于规划的计划,以更好地将采样时间表与基本轨迹结构相协调。这一简单战略要求对现有以ODE为基础的数字解答器进行最低限度的修改,产生微不足道的计算间接费用,并取得更好的图像生成性能,特别是在只有5美元或10美元的功能评价的区域。


Article 272

Title@2025-06-11 (3): SPARKE: Scalable Prompt-Aware Diversity Guidance in Diffusion Models via RKE Score

Title: SPARKE: Scalable Prompt-Aware Diversity Guidance in Diffusion Models via RKE Score SPARKE: Skalierbares Prompt-Aware Diversity Guidance in Diffusionsmodellen über RKE-Score SPARKE:通过RKE分数在传播模型中提供可缩放的即时快速软件多样性指导 2506.10173v1

Authors (4): Mohammad Jalali, Haoyu Lei, Amin Gohari, Farzan Farnia

Diffusion models have demonstrated remarkable success in high-fidelity image synthesis and prompt-guided generative modeling. However, ensuring adequate diversity in generated samples of prompt-guided diffusion models remains a challenge, particularly when the prompts span a broad semantic spectrum and the diversity of generated data needs to be evaluated in a prompt-aware fashion across semantically similar prompts. Recent methods have introduced guidance via diversity measures to encourage more varied generations. In this work, we extend the diversity measure-based approaches by proposing the Scalable Prompt-Aware Rényi Kernel Entropy Diversity Guidance (SPARKE) method for prompt-aware diversity guidance. SPARKE utilizes conditional entropy for diversity guidance, which dynamically conditions diversity measurement on similar prompts and enables prompt-aware diversity control. While the entropy-based guidance approach enhances prompt-aware diversity, its reliance on the matrix-based entropy scores poses computational challenges in large-scale generation settings. To address this, we focus on the special case of Conditional latent RKE Score Guidance, reducing entropy computation and gradient-based optimization complexity from the $O(n^3)$ of general entropy measures to $O(n)$. The reduced computational complexity allows for diversity-guided sampling over potentially thousands of generation rounds on different prompts. We numerically test the SPARKE method on several text-to-image diffusion models, demonstrating that the proposed method improves the prompt-aware diversity of the generated data without incurring significant computational costs. We release our code on the project page: https://mjalali.github.io/SPARKE

然而,确保快速引导扩散模型的生成样本具有充分的多样性仍然是一项挑战,特别是当闪光覆盖广泛的语义频谱,而生成的数据的多样性需要通过语义相似的信号进行快速觉悟式的评价。最近的方法通过多样性措施引入了指南,鼓励更多代人。在这项工作中,我们推广基于多样性的计量方法,为快速觉悟的多样化指南提出了可缩用快速引导的Kennal Entropy Explication Guidelines(SPARKE)方法。 SPARKE利用有条件的昆虫用于多样性指南,动态地为类似的语义频谱频谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱谱,从而在大规模生成环境中提出计算挑战。为了解决这个问题,我们把重点放在了“Contial livestal RKE recostorational Goal” 指南的特殊案例, 降低“Otropreal demal” rograductional-roduction roductional-roductionsalationalational-slational roupalational roupationalationalationalation-salationalationalationalationalationalation-slational daldal-s) 。
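To make the diversity score behind the abstract above concrete, here is a minimal sketch of an order-2 Rényi kernel entropy (RKE) style score. It assumes the common formulation in which the score is the inverse squared Frobenius norm of the normalized kernel matrix, $1 / \|K/n\|_F^2$, with a Gaussian kernel. The function names and the bandwidth `sigma` are illustrative choices, and the paper's conditional, prompt-aware variant is not reproduced here.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel with k(x, x) = 1."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def rke_score(samples, sigma=1.0):
    """Order-2 RKE-style diversity score: n^2 / sum_ij k(x_i, x_j)^2.

    Equals 1 / ||K/n||_F^2 for the kernel matrix K. Ranges from 1
    (all samples identical) up to n (all samples mutually dissimilar).
    """
    n = len(samples)
    frob_sq = 0.0
    for x in samples:
        for y in samples:
            frob_sq += gaussian_kernel(x, y, sigma) ** 2
    return (n * n) / frob_sq


if __name__ == "__main__":
    collapsed = [(0.0, 0.0)] * 5
    spread = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
    print(rke_score(collapsed), rke_score(spread))
```

The Frobenius-norm form needs only pairwise kernel values and no eigendecomposition, which is what makes the order-2 case cheaper than general matrix-based entropies. How the paper arrives at its stated $O(n)$ cost is not spelled out in the abstract and is not modeled in this sketch.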


Article 273

Title@2025-06-11 (3): Learning a Gaussian Mixture for Sparsity Regularization in Inverse Problems

Title: Learning a Gaussian Mixture for Sparsity Regularization in Inverse Problems Eine Gaußische Mischung für Sparsamkeitsregularisierung in inversen Problemen lernen 学习高斯混音,以在反向问题中实现分级规范化 2401.16612v2

Authors (4): Giovanni S. Alberti, Luca Ratti, Matteo Santacesaria, Silvia Sciutto

In inverse problems, it is widely recognized that the incorporation of a sparsity prior yields a regularization effect on the solution. This approach is grounded on the a priori assumption that the unknown can be appropriately represented in a basis with a limited number of significant components, while most coefficients are close to zero. This occurrence is frequently observed in real-world scenarios, such as with piecewise smooth signals. In this study, we propose a probabilistic sparsity prior formulated as a mixture of degenerate Gaussians, capable of modeling sparsity with respect to a generic basis. Under this premise, we design a neural network that can be interpreted as the Bayes estimator for linear inverse problems. Additionally, we put forth both a supervised and an unsupervised training strategy to estimate the parameters of this network. To evaluate the effectiveness of our approach, we conduct a numerical comparison with commonly employed sparsity-promoting regularization techniques, namely LASSO, group LASSO, iterative hard thresholding, and sparse coding/dictionary learning. Notably, our reconstructions consistently exhibit lower mean square error values across all 1D datasets utilized for the comparisons, even in cases where the datasets significantly deviate from a Gaussian mixture model.

反之,人们广泛认识到,将表面偏差纳入之前的偏差性会给解决办法带来一种正规化效果。这一方法基于先验的假设,即未知因素可以在数量有限的重要组成部分的基础上得到适当代表,而大多数系数接近于零。这种情况经常在现实世界的情景中观察到,例如以片状光滑信号观察到。在这项研究中,我们提出一种概率偏差,先作为退化的高斯人的一种混合体来配制,能够模拟一般基础的过度性。在这个前提下,我们设计了一个神经网络,可以被解释为对线性反向问题进行拜斯估计。此外,我们提出了一个监督和非监督的培训战略来估计这个网络的参数。为了评估我们的方法的有效性,我们与常用的偏差性刺激规范化技术进行了数字比较,即LASSO、LASSO组、迭代硬阈定门槛和稀有的编码/字典学习。值得注意的是,我们的重建工作始终显示,在所有1美元D的混合物模型中,与用于比较的数据差得多。


Article 274

Title@2025-06-11 (3): Momentum Multi-Marginal Schrödinger Bridge Matching

Title: Momentum Multi-Marginal Schrödinger Bridge Matching Momentum Multi-Marginal Schrödinger Brücke passend 多波盘桥配桥 2506.10168v1

Authors (4): Panagiotis Theodoropoulos, Augustinos D. Saravanos, Evangelos A. Theodorou, Guan-Horng Liu

Understanding complex systems by inferring trajectories from sparse sample snapshots is a fundamental challenge in a wide range of domains, e.g., single-cell biology, meteorology, and economics. Despite advancements in Bridge and Flow matching frameworks, current methodologies rely on pairwise interpolation between adjacent snapshots. This hinders their ability to capture long-range temporal dependencies and potentially affects the coherence of the inferred trajectories. To address these issues, we introduce Momentum Multi-Marginal Schrödinger Bridge Matching (3MSBM), a novel matching framework that learns smooth measure-valued splines for stochastic systems that satisfy multiple positional constraints. This is achieved by lifting the dynamics to phase space and generalizing stochastic bridges to be conditioned on several points, forming a multi-marginal conditional stochastic optimal control problem. The underlying dynamics are then learned by minimizing a variational objective, having fixed the path induced by the multi-marginal conditional bridge. As a matching approach, 3MSBM learns transport maps that preserve intermediate marginals throughout training, significantly improving convergence and scalability. Extensive experimentation in a series of real-world applications validates the superior performance of 3MSBM compared to existing methods in capturing complex dynamics with temporal dependencies, opening new avenues for training matching frameworks in multi-marginal settings.

通过从少样的样片中推断出轨迹来理解复杂系统,这是一系列广泛领域,例如单细胞生物学、气象学和经济学领域的一项根本挑战。尽管在大桥和流程匹配框架方面有所进步,但目前的方法依赖相邻快照之间对齐的内插。这妨碍了它们捕捉长距离时间依赖的能力,并有可能影响推断轨迹的一致性。为了解决这些问题,我们引入了\ textbf{Mmomentum 多重悬界Schr"大桥匹配(3MSBMBM})这一新型匹配框架,以学习满足多种定位限制的随机系统平滑度估量值螺旋丝。尽管在大桥和流动匹配框架方面有所进步,但目前的方法是:通过提升空间的动态,将随机桥梁连接到若干点,形成多边际的有条件的新偏移最佳控制问题。然后通过尽量减少变异性目标来学习基本动态,同时将多边框桥桥牵引导出的道路固定在(3MSBM)中。作为匹配方法的匹配方法,3MSBBM系列的运输路径与在大幅改进现有双向性动态定位的移动模型的模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模拟模型中学习,从而在大大保持现有伸缩图。


Article 275

Title@2025-06-11 (3): Wasserstein Barycenter Soft Actor-Critic

Title: Wasserstein Barycenter Soft Actor-Critic Wasserstein Barycenter Soft-Actor-Kritik 瓦塞斯泰因·巴里研究中心 2506.10167v1

Authors (2): Zahra Shahrooei, Ali Baheri

Deep off-policy actor-critic algorithms have emerged as the leading framework for reinforcement learning in continuous control domains. However, most of these algorithms suffer from poor sample efficiency, especially in environments with sparse rewards. In this paper, we take a step towards addressing this issue by providing a principled directed exploration strategy. We propose the Wasserstein Barycenter Soft Actor-Critic (WBSAC) algorithm, which benefits from a pessimistic actor for temporal difference learning and an optimistic actor to promote exploration. This is achieved by using the Wasserstein barycenter of the pessimistic and optimistic policies as the exploration policy and adjusting the degree of exploration throughout the learning process. We compare WBSAC with state-of-the-art off-policy actor-critic algorithms and show that WBSAC is more sample-efficient on MuJoCo continuous control tasks.

在连续控制领域,深层的外政策行为者-批评算法已成为加强学习的主要框架,但是,大多数这些算法都因抽样效率低下而受到影响,特别是在回报微薄的环境中;在本文件中,我们通过提供有原则的定向勘探战略,为解决这一问题迈出了一步;我们提出了瓦瑟斯坦·巴里森·巴里森特·索夫特·演员-批评(WBSAC)算法,该算法得益于一个在时间差异方面学习的悲观行为者,以及一个促进探索的乐观行为者。这是通过利用悲观和乐观政策的瓦塞斯坦温温中心作为探索政策,并在整个学习过程中调整勘探程度来实现的。我们把WBSAC与最先进的非政策行为者-批评性算法进行比较,并表明WBSAC在 MuJoco连续控制任务上更具样本效率。
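
As an illustration of the exploration policy described in this abstract, the sketch below computes the 2-Wasserstein barycenter of two policies under the assumption that both are diagonal Gaussians; in that case the barycenter is again a diagonal Gaussian whose mean and standard deviation are the weighted averages of the inputs. The function names and the `weight_o` knob are hypothetical, and the abstract does not state that WBSAC uses this exact parameterization.

```python
import random


def gaussian_w2_barycenter(mean_p, std_p, mean_o, std_o, weight_o):
    """W2 barycenter of two diagonal Gaussians (assumed policy form).

    mean_p/std_p: pessimistic policy, mean_o/std_o: optimistic policy.
    weight_o in [0, 1] is a hypothetical knob for the degree of exploration.
    """
    if not 0.0 <= weight_o <= 1.0:
        raise ValueError("weight_o must lie in [0, 1]")
    w_p = 1.0 - weight_o
    # For diagonal Gaussians, means and standard deviations interpolate linearly.
    mean = [w_p * mp + weight_o * mo for mp, mo in zip(mean_p, mean_o)]
    std = [w_p * sp + weight_o * so for sp, so in zip(std_p, std_o)]
    return mean, std


def sample_action(mean, std, rng):
    """Draw one exploratory action from the barycenter policy."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


# Usage: lean halfway toward the optimistic actor, then sample an action.
rng = random.Random(0)
mean, std = gaussian_w2_barycenter([0.0, 1.0], [0.1, 0.2], [1.0, 3.0], [0.5, 0.4], 0.5)
action = sample_action(mean, std, rng)
```

Setting `weight_o` to 0 recovers the pessimistic policy and 1 recovers the optimistic one, so a schedule over this weight is one simple way to adjust exploration during training.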


Article 276

Title@2025-06-11 (3): The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset

Title: The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset Der PNPL-Wettbewerb 2025: Spracherkennung und Phoneme-Klassifizierung im LibriBrain-Datensatz 2025年PNPL竞争:Libribrain数据集中的语音探测和电话分类 2506.10165v1

Authors (18): Gilad Landau, Miran Özdogan, Gereon Elvers, Francesco Mantegna, Pratik Somaiya, Dulhan Jayalath, Luisa Kurth, Teyun Kwon, Brendan Shillingford, Greg Farquhar, Minqi Jiang, Karim Jerbi, Hamza Abdelhedi, Yorguin Mantilla Ramos, Caglar Gulcehre, Mark Woolrich, Natalie Voets, Oiwi Parker Jones

The advance of speech decoding from non-invasive brain data holds the potential for profound societal impact. Among its most promising applications is the restoration of communication to paralysed individuals affected by speech deficits such as dysarthria, without the need for high-risk surgical interventions. The ultimate aim of the 2025 PNPL competition is to produce the conditions for an “ImageNet moment” or breakthrough in non-invasive neural decoding, by harnessing the collective power of the machine learning community. To facilitate this vision we present the largest within-subject MEG dataset recorded to date (LibriBrain) together with a user-friendly Python library (pnpl) for easy data access and integration with deep learning frameworks. For the competition we define two foundational tasks (i.e. Speech Detection and Phoneme Classification from brain data), complete with standardised data splits and evaluation metrics, illustrative benchmark models, online tutorial code, a community discussion board, and public leaderboard for submissions. To promote accessibility and participation the competition features a Standard track that emphasises algorithmic innovation, as well as an Extended track that is expected to reward larger-scale computing, accelerating progress toward a non-invasive brain-computer interface for speech.

从非侵入性大脑数据中解码言论的推进具有深刻的社会影响的潜力,其中最有希望的应用是恢复通信,使受言论赤字影响的个人瘫痪,例如听障者,不需要高风险外科手术干预。2025年PNPL竞争的最终目的是通过利用机器学习界的集体力量,创造“图像网时刻”或非侵入性神经解码突破的条件。为了促进这一愿景,我们展示了迄今为止记录的最大主体内主MEG数据集(利布里布雷恩),以及一个方便用户的Python图书馆(ppl),以便容易获得数据并融入深层学习框架。关于竞争,我们定义了两项基本任务(即语音探测和脑数据的电话分类),完成于标准化的数据分解和评估衡量标准、说明性基准模型、在线辅导代码、社区讨论平台以及公众提交材料的首选板。促进无障碍和参与竞争,其特点是强调算法创新的标准轨道,并将其作为升级的语音界面,用于加速进行非升级的升级的计算机计算。


Article 277

Title@2025-06-11 (3): Probabilistic Variational Contrastive Learning

Title: Probabilistic Variational Contrastive Learning Probabilistisches Variationelles Kontrastives Lernen 概率概率变化差异性差异性学习 2506.10159v1

Authors (3): Minoh Jeong, Seonho Kim, Alfred Hero

Deterministic embeddings learned by contrastive learning (CL) methods such as SimCLR and SupCon achieve state-of-the-art performance but lack a principled mechanism for uncertainty quantification. We propose Variational Contrastive Learning (VCL), a decoder-free framework that maximizes the evidence lower bound (ELBO) by interpreting the InfoNCE loss as a surrogate reconstruction term and adding a KL divergence regularizer to a uniform prior on the unit hypersphere. We model the approximate posterior $q_\theta(z \mid x)$ as a projected normal distribution, enabling the sampling of probabilistic embeddings. Our two instantiations, VSimCLR and VSupCon, replace deterministic embeddings with samples from $q_\theta(z \mid x)$ and incorporate a normalized KL term into the loss. Experiments on multiple benchmarks demonstrate that VCL mitigates dimensional collapse, enhances mutual information with class labels, and matches or outperforms deterministic baselines in classification accuracy, all the while providing meaningful uncertainty estimates through the posterior model. VCL thus equips contrastive learning with a probabilistic foundation, serving as a new basis for contrastive approaches.

SimCLR 和 SupCon 等对比学习方法所学的确定性嵌入方法,例如SimCLR 和 SupCon , 取得了最先进的性能,但却缺乏一个用于量化不确定性的原则机制。我们建议采用差异性对比学习(VCL),这是一个无编码性框架,通过将InfoNCE 损失解释为替代重建术语,在单位超视谱中将一个标准化的 KL 术语纳入一个统一的前题,从而最大限度地增加证据约束性(ELBO ) 。我们将近似后端 $Qtheta (zxx) 的数据作为预测正常分布的模型,以便能够取样概率性嵌入。我们的两个即时- VimCLR 和 VSupCon 的确定性嵌入点与 $qótheta (zxx ) 的样本的样本混集(ELBOB) , 并在损失中加入一个标准化的 KLL 术语。在多个基准实验中表明VCLL 能够缓解尺寸崩溃, 用类标签加强相互信息, 和匹配或超出确定性基准, 分类精确性基准, , 并同时提供新的对比基础。
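
To make the projected normal posterior mentioned in this abstract concrete, the sketch below draws a Gaussian vector and projects it onto the unit hypersphere by dividing by its norm, which is the defining construction of a projected normal sample. The diagonal covariance and the function name are assumptions for illustration; the abstract does not specify how the encoder parameterizes the posterior.

```python
import math
import random


def sample_projected_normal(mu, sigma, rng):
    """Sample z = x / ||x|| with x ~ N(mu, diag(sigma^2)).

    mu and sigma stand in for encoder outputs (hypothetical names);
    a diagonal covariance is assumed here.
    """
    x = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("degenerate sample with zero norm")
    # Projection onto the unit hypersphere.
    return [v / norm for v in x]


# Usage: a small sigma yields embeddings concentrated around mu / ||mu||.
rng = random.Random(0)
z = sample_projected_normal([1.0, 0.0, 0.0], [0.1, 0.1, 0.1], rng)
```

Larger `sigma` values spread the samples over more of the sphere, which is the sense in which the posterior can express embedding uncertainty.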


Article 278

Title@2025-06-11 (3): Measuring Corporate Human Capital Disclosures: Lexicon, Data, Code, and Research Opportunities

Title: Measuring Corporate Human Capital Disclosures: Lexicon, Data, Code, and Research Opportunities Corporate Human Capital Disclosures messen: Lexikon, Daten, Code und Forschungsmöglichkeiten 衡量公司人力资本披露:词汇、数据、代码和研究机会 2506.10155v1

Authors (3): Elizabeth Demers, Victor Xiaoqi Wang, Kean Wu

Human capital (HC) is increasingly important to corporate value creation. Unlike other assets, however, HC is not currently subject to well-defined measurement or disclosure rules. We use a machine learning algorithm (word2vec) trained on a confirmed set of HC disclosures to develop a comprehensive list of HC-related keywords classified into five subcategories (DEI; health and safety; labor relations and culture; compensation and benefits; and demographics and other) that capture the multidimensional nature of HC management. We share our lexicon, corporate HC disclosures, and the Python code used to develop the lexicon, and we provide detailed examples of using our data and code, including for fine-tuning a BERT model. Researchers can use our HC lexicon (or modify the code to capture another construct of interest) with their samples of corporate communications to address pertinent HC questions. We close with a discussion of future research opportunities related to HC management and disclosure.

与其它资产不同,HC目前不受明确界定的测量或披露规则的约束。 我们使用经过经证实的HC披露方法培训的机器学习算法(word2vec)来制定一份全面的HC相关关键词清单,这些关键词可分为五个亚类(DI;卫生和安全;劳动关系和文化;报酬与福利;人口统计等),它们反映了HC管理的多层面性质。我们分享了我们的词汇、公司HC披露和用于开发词汇的Python代码,我们提供了使用我们数据和代码的详细例子,包括微调BERT模型。研究人员可以使用我们的HC词汇(或修改代码以获取另一个利益结构)及其公司通信样本来解决相关的HC问题。我们最后讨论了与HC管理和披露有关的未来研究机会。


Article 279

Title@2025-06-11 (3): Analyzing Emotions in Bangla Social Media Comments Using Machine Learning and LIME

Title: Analyzing Emotions in Bangla Social Media Comments Using Machine Learning and LIME Analysieren von Emotionen in Bangla Social Media Kommentare mit maschinellem Lernen und LIME 利用机器学习和LIME分析孟加拉社会媒体评论中的情感 2506.10154v1

Authors (5): Bidyarthi Paul, SM Musfiqur Rahman, Dipta Biswas, Md. Ziaul Hasan, Md. Zahid Hossain

Research on understanding emotions in written language continues to expand, especially for understudied languages with distinctive regional expressions and cultural features, such as Bangla. This study examines emotion analysis using 22,698 social media comments from the EmoNoBa dataset. For language analysis, we employ machine learning models: Linear SVM, KNN, and Random Forest with n-gram data from a TF-IDF vectorizer. We additionally investigated how PCA affects the reduction of dimensionality. Moreover, we utilized a BiLSTM model and AdaBoost to improve decision trees. To make our machine learning models easier to understand, we used LIME to explain the predictions of the AdaBoost classifier, which uses decision trees. With the goal of advancing sentiment analysis in languages with limited resources, our work examines various techniques to find efficient techniques for emotion identification in Bangla.

有关理解书面语言情感的研究在继续扩大,特别是针对研究不足的语言,这些语言具有独特的区域表现形式和文化特征,如孟加拉语。本研究利用EmonoBa数据集的22,698份社交媒体评论来审查情感分析。在语言分析中,我们采用了机器学习模型:Linear SVM、KNN和随机森林,使用来自TF-IDF矢量器的n-gram数据。我们进一步调查了五氯苯如何影响维度的减少。此外,我们利用了BILSTM模型和AdaBoost来改进决策树。为了使机器学习模型更容易理解,我们利用LIME来解释AdaBoost分类师的预测,因为后者使用决策树。为了在资源有限的情况下推进语言的情绪分析,我们的工作研究了各种技术,以便在孟加拉语中找到有效的情感识别技术。


Article 280

Title@2025-06-11 (3): Attention on flow control: transformer-based reinforcement learning for lift regulation in highly disturbed flows

Title: Attention on flow control: transformer-based reinforcement learning for lift regulation in highly disturbed flows Aufmerksamkeit auf Strömungssteuerung: Transformer-basiertes Bewehrungslernen zur Heberegulierung in stark gestörten Strömungen 注意流动控制:以变压器为基础的增强力学习如何在高度动荡的流量中进行升降管制 2506.10153v1

Authors (2): Zhecheng Liu, Jeff D. Eldredge

A linear flow control strategy designed for weak disturbances may not remain effective in sequences of strong disturbances due to nonlinear interactions, but it is sensible to leverage it for developing a better strategy. In the present study, we propose a transformer-based reinforcement learning (RL) framework to learn an effective control strategy for regulating aerodynamic lift in gust sequences via pitch control. The transformer addresses the challenge of partial observability from limited surface pressure sensors. We demonstrate that the training can be accelerated with two techniques – pretraining with an expert policy (here, linear control) and task-level transfer learning (here, extending a policy trained on isolated gusts to multiple gusts). We show that the learned strategy outperforms the best proportional control, with the performance gap widening as the number of gusts increases. The control strategy learned in an environment with a small number of successive gusts is shown to effectively generalize to an environment with an arbitrarily long sequence of gusts. We investigate the pivot configuration and show that quarter-chord pitching control can achieve superior lift regulation with substantially less control effort compared to mid-chord pitching control. Through a decomposition of the lift, we attribute this advantage to the dominant added-mass contribution accessible via quarter-chord pitching. The success on multiple configurations shows the generalizability of the proposed transformer-based RL framework, which offers a promising approach to solve more computationally demanding flow control problems when combined with the proposed acceleration techniques.

针对微弱扰动设计的线性流动控制战略,由于非线性互动,在强烈扰动序列中可能不会继续有效,但由于非线性互动而使线性流动控制战略有效,但利用它来制定更好的战略是明智的。在本研究中,我们提议了一个基于变压器的强化学习(RL)框架,以学习一种有效的控制战略,以通过声控控制来调节古斯特序列中的空气动力提升。变压器应对从有限的表面压力传感器中部分可视化的挑战。我们证明,可以通过两种技术 – – 采用专家政策(这里,线性控制)和任务级转移学习(这里,将受过孤立螺旋式训练的政策扩大到多个螺旋型) – – 加快培训速度。我们表明,学习的变压式战略超越了最佳比例控制(RL)框架,随着气压数量的增加,性能差距扩大。 在一个环境中位性压式压力传感器的拟议组合中,我们调查了分弦式配置,并表明,四弦级调控控能能够实现更高级的升升调调,而我们通过中位式的递增压式控制优势。


Article 281

Title@2025-06-11 (3): Balanced Hyperbolic Embeddings Are Natural Out-of-Distribution Detectors

Title: Balanced Hyperbolic Embeddings Are Natural Out-of-Distribution Detectors Ausgewogene hyperbolische Einbettungen sind natürliche Out-of-Distribution-Detektoren 平衡的双曲嵌入器是天然流散探测器 2506.10146v1

Authors (3): Tejaswi Kasarla, Max van Spengler, Pascal Mettes

Out-of-distribution recognition forms an important and well-studied problem in deep learning, with the goal to filter out samples that do not belong to the distribution on which a network has been trained. The conclusion of this paper is simple: a good hierarchical hyperbolic embedding is preferred for discriminating in- and out-of-distribution samples. We introduce Balanced Hyperbolic Learning. We outline a hyperbolic class embedding algorithm that jointly optimizes for hierarchical distortion and balancing between shallow and wide subhierarchies. We then use the class embeddings as hyperbolic prototypes for classification on in-distribution data. We outline how to generalize existing out-of-distribution scoring functions to operate with hyperbolic prototypes. Empirical evaluations across 13 datasets and 13 scoring functions show that our hyperbolic embeddings outperform existing out-of-distribution approaches when trained on the same data with the same backbones. We also show that our hyperbolic embeddings outperform other hyperbolic approaches, beat state-of-the-art contrastive methods, and natively enable hierarchical out-of-distribution generalization.

分布外识别是深度学习中一个重要且被广泛研究的问题,其目标是过滤掉不属于网络训练分布的样本。本文的结论很简单:良好的层次化双曲嵌入更适合区分分布内与分布外样本。我们提出了平衡双曲学习(Balanced Hyperbolic Learning)。我们给出一种双曲类别嵌入算法,它同时针对层次失真以及浅层与宽层子层次结构之间的平衡进行优化。随后,我们将类别嵌入用作双曲原型,对分布内数据进行分类,并说明如何推广现有的分布外评分函数,使其能够与双曲原型配合使用。在 13 个数据集和 13 种评分函数上的实验评估表明,在使用相同数据和相同骨干网络训练时,我们的双曲嵌入优于现有的分布外方法。我们还表明,我们的双曲嵌入优于其他双曲方法,超过了最先进的对比学习方法,并且天然支持层次化的分布外泛化。
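
The idea of classifying against hyperbolic prototypes can be sketched as follows, assuming the Poincaré ball model of hyperbolic space (the abstract does not say which model is used). The distance formula is the standard Poincaré distance; using the distance to the nearest prototype as an out-of-distribution score is only one hypothetical choice and is not claimed to be one of the paper's 13 scoring functions.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)


def nearest_prototype(embedding, prototypes):
    """Return (label, distance) of the closest class prototype.

    prototypes: dict mapping class label -> point in the Poincare ball.
    """
    distances = {label: poincare_distance(embedding, p) for label, p in prototypes.items()}
    label = min(distances, key=distances.get)
    return label, distances[label]


# Usage: a larger nearest-prototype distance can serve as an OOD score (assumption).
prototypes = {"cat": [0.5, 0.0], "dog": [-0.5, 0.0]}
label, score = nearest_prototype([0.4, 0.1], prototypes)
```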


Article 282

Title@2025-06-11 (3): Physiological-Model-Based Neural Network for Heart Rate Estimation during Daily Physical Activities

Title: Physiological-Model-Based Neural Network for Heart Rate Estimation during Daily Physical Activities Physiologisch-Modellbasiertes Neuronales Netzwerk zur Schätzung der Herzfrequenz während täglicher körperlicher Aktivitäten 日常体育活动中心电率估计的生理-基于模型的神经网络 2506.10144v1

Authors (5): Yaowen Zhang, Libera Fresiello, Peter H. Veltink, Dirk W. Donker, Ying Wang

Heart failure (HF) poses a significant global health challenge, with early detection offering opportunities for improved outcomes. Abnormalities in heart rate (HR), particularly during daily activities, may serve as early indicators of HF risk. However, existing HR monitoring tools for HF detection are limited by their reliance on population-based averages. The estimation of individualized HR serves as a dynamic digital twin, enabling precise tracking of cardiac health biomarkers. Current HR estimation methods, categorized into physiologically-driven and purely data-driven models, struggle with efficiency and interpretability. This study introduces a novel physiological-model-based neural network (PMB-NN) framework for HR estimation based on oxygen uptake (VO2) data during daily physical activities. The framework was trained and tested on individual datasets from 12 participants engaged in activities including resting, cycling, and running. By embedding physiological constraints, which were derived from our proposed simplified human movement physiological model (PM), into the neural network training process, the PMB-NN model adheres to human physiological principles while achieving high estimation accuracy, with a median $R^2$ score of 0.8 and an RMSE of 8.3 bpm. Comparative statistical analysis demonstrates that the PMB-NN achieves performance on par with the benchmark neural network model while significantly outperforming the traditional physiological model (p=0.002). In addition, our PMB-NN is adept at identifying personalized parameters of the PM, enabling the PM to generate reasonable HR estimates. Combined with a precise VO2 estimation system derived from body movements, the proposed framework opens the possibility of personalized, real-time cardiac monitoring during daily physical activities.

心衰竭(HF)是全球健康的一个重大挑战,早期发现为改进结果提供了机会。心衰竭(HR)的常态(HR),特别是在日常活动中,可能是高频风险的早期指标。然而,现有的高频检测的HR监测工具因其在基于人口的平均数上的可靠性而受到限制。个人化的HR估算是一个动态数字双,有利于准确跟踪心脏健康生物标志。目前的HR估算方法,分为生理驱动和纯数据驱动的模式,在效率和可解释性上挣扎。这项研究引入了一个基于生理摄入氧量(VO2)数据的新颖的基于生理模型的神经网络(PMB-NN)框架。在日常物理活动中,对参与包括休息、骑自行车和运行等活动的12名参与者的个人数据集进行了培训和测试。通过将我们提议的简化人类运动生理模型(PMF)纳入神经网络培训进程,PMB-NN模型符合人类生理原则,同时实现高估精度的准确度估算值,从摄入0.8的RMB值,而IMF的逻辑值值为8.3 IMB 个人-MB 运行基准进行大幅的对比分析。


Article 283

Title@2025-06-11 (3): Diffusion prior as a direct regularization term for FWI

Title: Diffusion prior as a direct regularization term for FWI Diffusion vor der direkten Regularisierung für FWI 之前作为家庭福利基金会直接正规化期的传播 2506.10141v1

Authors (3): Yuke Xie, Hervé Chauris, Nicolas Desassis

Diffusion models have recently shown promise as powerful generative priors for inverse problems. However, conventional applications require solving the full reverse diffusion process and operating on noisy intermediate states, which poses challenges for physics-constrained computational seismic imaging. In particular, such instability is pronounced in non-linear solvers like those used in Full Waveform Inversion (FWI), where wave propagation through noisy velocity fields can lead to numerical artifacts and poor inversion quality. In this work, we propose a simple yet effective framework that directly integrates a pretrained Denoising Diffusion Probabilistic Model (DDPM) as a score-based generative diffusion prior into FWI through a score rematching strategy. Unlike traditional diffusion approaches, our method avoids the reverse diffusion sampling and needs fewer iterations. We operate the image inversion entirely in the clean image space, eliminating the need to operate through noisy velocity models. The generative diffusion prior can be introduced as a simple regularization term in the standard FWI update rule, requiring minimal modification to existing FWI pipelines. This promotes stable wave propagation and can improve convergence behavior and inversion quality. Numerical experiments suggest that the proposed method offers enhanced fidelity and robustness compared to conventional and GAN-based FWI approaches, while remaining practical and computationally efficient for seismic imaging and other inverse problem tasks.

然而,常规应用需要解决完全反向扩散过程,并在吵闹的中间状态上运行,这给物理上受限制的计算地震成像带来了挑战。特别是,这种不稳定在非线性溶液中十分明显,如在全波变换(FWI)中所使用的那种非线性溶液中十分明显,因为通过超速田的波浪传播可能导致数字文物和低反向质量。在这项工作中,我们提议了一个简单而有效的框架,直接将事先经过训练的Denoising Difmission Abability 模型(DDPM)整合为一种基于分的遗传传播模型,在FWI之前,通过分数再配比战略将之作为一种基于分的遗传传播模型传播。与传统的传播方法不同,我们的方法避免了反向扩散抽样,需要更少的迭代。我们完全在清洁的图像空间操作这种图像,不需要通过噪音速度模型操作。在FWIFWI标准更新规则中引入前的基因化扩散,这需要对现有FWI输管管管进行最低限度的修改。这可以促进稳定的波传播,并且改进趋近感行为和内化质量,同时将其他的GWIWILIA试验提出改进。


Article 284

Title@2025-06-11 (3): Survival Analysis as Imprecise Classification with Trainable Kernels

Title: Survival Analysis as Imprecise Classification with Trainable Kernels Überlebensanalyse als ungenaue Klassifizierung mit lernfähigen Kernen 进行生存分析,将生存分析作为可训练内核的不精确分类 2506.10140v1

Authors (3): Andrei V. Konstantinov, Vlada A. Efremenko, Lev V. Utkin

Survival analysis is a fundamental tool for modeling time-to-event data in healthcare, engineering, and finance, where censored observations pose significant challenges. While traditional methods like the Beran estimator offer nonparametric solutions, they often struggle with complex data structures and heavy censoring. This paper introduces three novel survival models, iSurvM (the imprecise Survival model based on Mean likelihood functions), iSurvQ (the imprecise Survival model based on the Quantiles of likelihood functions), and iSurvJ (the imprecise Survival model based on Joint learning), that combine imprecise probability theory with attention mechanisms to handle censored data without parametric assumptions. The first idea behind the models is to represent censored observations by interval-valued probability distributions for each instance over the time intervals between event moments. The second idea is to employ kernel-based Nadaraya-Watson regression with trainable attention weights for computing the imprecise probability distribution over time intervals for the entire dataset. The third idea is to consider three decision strategies for training, which correspond to the three proposed models. Experiments on synthetic and real datasets demonstrate that the proposed models, especially iSurvJ, consistently outperform the Beran estimator in terms of both accuracy and computational complexity. Codes implementing the proposed models are publicly available.

生存分析是保健、工程和金融领域时间到活动数据模型的基本工具,在这种模型中,经过审查的观察提出了重大挑战。贝兰天文仪等传统方法提供了非参数性的解决办法,但往往与复杂的数据结构和繁琐的审查程序相冲突。本文介绍了三种新的生存模型,即iSurvM(基于平均概率功能的不精确生存模型)、iSurvQ(基于概率功能量化的不精确生存模型)和iSurvJ(基于联合学习的不精确生存模型),这些模型结合了不准确概率理论和关注机制,以不带参数性假设的方式处理受审查的数据。模型的第一个想法是代表通过时间间隔间间隔间间隔对每个案例的受审查概率分布进行观察。第二个想法是采用基于内核的纳达拉亚-瓦特森(基于概率功能的不精确生存模型)和iSurvJ(基于联合学习的不精确生存模型),用以计算整个数据集的不精确时间间隔的概率分布。第三个想法是考虑三种培训决策战略,这与拟议的三个模型相匹配。对合成和真实的精确度假设和真实的精确度模型进行实验,从提议的公开计算,特别是从提议的模型显示。
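
The Nadaraya-Watson regression with attention weights mentioned in this abstract can be sketched as a softmax over negative squared distances between a query instance and the training instances. The Gaussian kernel, the scalar targets, and the `temperature` parameter (standing in for whatever is trainable in the paper) are assumptions for illustration only; the paper's models operate on interval-valued probabilities rather than scalars.

```python
import math


def attention_weights(query, keys, temperature=1.0):
    """Softmax of negative squared distances, i.e. normalized Gaussian kernel weights."""
    scores = [-sum((q - k) ** 2 for q, k in zip(query, key)) / temperature for key in keys]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def nadaraya_watson(query, keys, values, temperature=1.0):
    """Kernel-weighted average of the training targets."""
    weights = attention_weights(query, keys, temperature)
    return sum(w * v for w, v in zip(weights, values))


# Usage: the prediction is pulled toward targets of nearby training points.
prediction = nadaraya_watson([0.2], [[0.0], [1.0], [2.0]], [0.9, 0.5, 0.1])
```

A smaller `temperature` concentrates the weights on the closest training instances, while a larger one averages more broadly.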


Article 285

Title@2025-06-11 (3): Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban

Title: Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban Lernte Suche interpretieren: Ein Übergangsmodell und eine Wertfunktion in einer RNN zu finden, die Sokoban spielt 解释学识搜索:在播放 Sokoban 的 RNN 中找到过渡模式和价值函数 2506.10138v1

Authors (4): Mohammad Taufeeque, Aaron David Tucker, Adam Gleave, Adrià Garriga-Alonso

We partially reverse-engineer a convolutional recurrent neural network (RNN) trained to play the puzzle game Sokoban with model-free reinforcement learning. Prior work found that this network solves more levels with more test-time compute. Our analysis reveals several mechanisms analogous to components of classic bidirectional search. For each square, the RNN represents its plan in the activations of channels associated with specific directions. These state-action activations are analogous to a value function: their magnitudes determine when to backtrack and which plan branch survives pruning. Specialized kernels extend these activations (containing plan and value) forward and backward to create paths, forming a transition model. The algorithm is also unlike classical search in some ways. State representation is not unified; instead, the network considers each box separately. Each layer has its own plan representation and value function, increasing search depth. Far from being inscrutable, the mechanisms leveraging test-time compute learned in this network by model-free training can be understood in familiar terms.

我们的分析揭示了类似于经典双向搜索组件的若干机制。 对于每个广场, RNN 代表着它与特定方向相关频道的激活计划。 这些州行动激活类似于一个价值函数 - 它们的数量决定了向后跟踪的时间和哪个计划分支生存下来。 专门的内核扩展了这些前向和后向的启动( 包含计划和价值) 以创建路径, 形成一个过渡模式。 算法也不同于经典的搜索方式 。 国家代表制不统一; 相反, 网络将每个框分开考虑。 每个层都有自己的计划表达和价值功能, 增加搜索深度。 远非易变形, 可以理解利用无模式培训的测试时间计算机制。


Article 286

Title@2025-06-11 (3): Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

Title: Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning Selbstprädiktive Darstellungen für die kombinatorische Generalisierung im Verhaltensklonen 行为克隆综合通用的自我预测代表 2506.10137v1

Authors (5): Daniel Lawson, Adriana Hugessen, Charlotte Cloutier, Glen Berseth, Khimya Khetarpal

Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in domains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective, $\text{BYOL-}\gamma$ augmented GCBC, which not only can theoretically approximate the successor representation in the finite MDP case without contrastive samples or TD learning, but also yields competitive empirical performance across a suite of challenging tasks requiring combinatorial generalization.

通过监督学习(SL)培训的行为性克隆(BC)方法是学习机器人等领域人类示范政策的有效途径。 目标调整这些政策使单一的通用政策能够捕捉离线数据集内包含的各种行为。 虽然目标性行为性克隆(GCBC)方法在分散培训任务中可以很好地发挥作用,但不一定将零射射入到需要以新的国家目标配对(即组合式概括化)为条件的任务中。这一限制可部分归因于不列颠哥伦比亚所学的国家代表性缺乏时间上的一致性;如果与时间相关的国家被编码为类似的潜在代表,那么新的州目标配对的超分配差距就会缩小。因此,鼓励代表性空间的这种时间性一致性应该有助于组合式的概括化。 成功表现(它将未来所访问的国家的分布与当前状态相挂钩,即精美地概括了这一属性。 但是,先前的后继代表制方法依赖于对比性样本、时间差异(TD)学习,或者两者兼有。在这项工作中,我们提议,在不要求进行IMBBA-B的常规代表性总体对比中,我们建议一个简单的、不要求在IM-BC的排序中进行一个简单的常规对比性对比,而能够对等的对比。


Article 287

Title@2025-06-11 (3): Provable Sim-to-Real Transfer via Offline Domain Randomization

Title: Provable Sim-to-Real Transfer via Offline Domain Randomization Provable Sim-to-Real-Übertragung über Offline-Domain Randomisierung 通过离线网域随机转接 2506.10133v1

Authors (3): Arnaud Fickinger, Abderrahim Bendahi, Stuart Russell

Reinforcement-learning agents often struggle when deployed from simulation to the real world. A dominant strategy for reducing the sim-to-real gap is domain randomization (DR), which trains the policy across many simulators produced by sampling dynamics parameters, but standard DR ignores offline data already available from the real system. We study offline domain randomization (ODR), which first fits a distribution over simulator parameters to an offline dataset. While a growing body of empirical work reports substantial gains with algorithms such as DROPO, the theoretical foundations of ODR remain largely unexplored. In this work, we (i) formalize ODR as a maximum-likelihood estimation over a parametric simulator family, (ii) prove consistency of this estimator under mild regularity and identifiability conditions, showing it converges to the true dynamics as the dataset grows, (iii) derive gap bounds demonstrating that ODR's sim-to-real error is up to an O(M) factor tighter than that of uniform DR in the finite-simulator case (with analogous gains in the continuous setting), and (iv) introduce E-DROPO, a new version of DROPO which adds an entropy bonus to prevent variance collapse, yielding broader randomization and more robust zero-shot transfer in practice.

在从模拟到真实世界的部署过程中,强化学习的代理机构往往在从模拟到真实世界的部署中挣扎。减少模拟到现实差距的主要策略是域随机化(DR),通过取样动态参数对许多模拟模拟器进行政策培训,但标准的DR忽略了实际系统中已经存在的离线数据。我们研究了离线域随机化(ODR),它首先适合模拟参数的分布,到离线数据集。越来越多的经验性工作报告在诸如DROPO这样的算法中取得了巨大的收益,而网上解决的理论基础基本上仍未得到探讨。在这项工作中,我们(i)将网上解决正式确定为对一个计量参数模拟器大家庭的最大相似性估计,(ii)证明这个估计器在温和的经常性和可识别性条件下具有一致性,显示它与随着数据集的不断增长而真实的动态一致。 (iii) 显示网上解决机构模拟到真实错误的分错的分界线是O(M)比统一的DRM系数更接近于固定模拟器案(和类似在连续的递增的DPO中的成果),并引入更强劲的递增的递增的DPO-RPO(iv),和递增的递增的递增的递增的RPO)。


Article 288

Title@2025-06-11 (3): ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs

Title: ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs ViCrit: Eine überprüfbare Verstärkungs-Lern-Proxy-Aufgabe für visuelle Wahrnehmung in VLMs 虚拟: VLM 视觉感知的可核查强化学习代用任务 2506.10128v1

Authors (13): Xiyao Wang, Zhengyuan Yang, Chao Feng, Yongyuan Liang, Yuhang Zhou, Xiaoyu Liu, Ziyi Zang, Ming Li, Chung-Ching Lin, Kevin Lin, Linjie Li, Furong Huang, Lijuan Wang

Reinforcement learning (RL) has shown great effectiveness for fine-tuning large language models (LLMs) using tasks that are challenging yet easily verifiable, such as math reasoning or code generation. However, extending this success to visual perception in vision-language models (VLMs) has been impeded by the scarcity of vision-centric tasks that are simultaneously challenging and unambiguously verifiable. To this end, we introduce ViCrit (Visual Caption Hallucination Critic), an RL proxy task that trains VLMs to localize a subtle, synthetic visual hallucination injected into paragraphs of human-written image captions. Starting from 200-word captions, we inject a single, subtle visual description error (altering a few words on objects, attributes, counts, or spatial relations) and task the model to pinpoint the corrupted span given the image and the modified caption. This formulation preserves the full perceptual difficulty while providing a binary, exact-match reward that is easy to compute and unambiguous. Models trained with the ViCrit Task exhibit substantial gains across a variety of VL benchmarks. Crucially, the improvements transfer beyond natural-image training data to abstract image reasoning and visual math, showing promise of learning to perceive rather than merely memorizing seen objects. To facilitate evaluation, we further introduce ViCrit-Bench, a category-balanced diagnostic benchmark that systematically probes perception errors across diverse image domains and error types. Together, our results demonstrate that fine-grained hallucination criticism is an effective and generalizable objective for enhancing visual perception in VLMs.

强化学习(RL)在微调大型语言模型(LLMS)方面显示了极大的实效,这些模型使用具有挑战性但易于核查的任务,如数学推理或代码生成等。然而,将这一成功扩大到视觉语言模型(VLMS)中的视觉感知差值,由于缺少具有挑战性和明确可核查的视觉中心任务而受阻。为此,我们引入了VCrit (VicalCaption Hallucilation Criticle),这是一个代理任务,该任务培训VLMS将隐含的微妙、合成视觉幻觉植入人类写图像标题的段落。从200个字标题开始,我们输入了一个单一、微妙的视觉描述错差值,在对象、属性、计数或空间关系模型中添加了几个字眼感知觉觉觉知觉,以找出图像和经修改的标题。我们引入了完全的认知困难,同时提供了易于理解和清晰的二进式、准确的奖赏。经过VICrit任务培训的模型展示了各种VL基准中的重大收益。 值得注意,在各种VLIL基准基准中,在视觉感知错度方面进行改进,超越了常规和直观领域,在直观分析目标领域之外,我们学习了一种精测测测测数据,在学习了一种微的模型,在学习了我们所见的数学和直观测标的模型的模型,我们所看到的数学比。
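
The binary, exact-match reward described in this abstract can be illustrated with a few lines of Python. The function name and the normalization (lowercasing and collapsing whitespace) are assumptions added for the sketch; the abstract only states that the reward is binary and based on an exact match with the corrupted span.

```python
def exact_match_reward(predicted_span, corrupted_span):
    """Return 1.0 if the predicted span matches the injected error, else 0.0.

    Normalization here (case and whitespace) is a hypothetical choice.
    """
    def normalize(text):
        return " ".join(text.lower().split())

    return 1.0 if normalize(predicted_span) == normalize(corrupted_span) else 0.0


# Usage: the model is rewarded only when it pinpoints the altered words.
reward_hit = exact_match_reward("Three red  balloons", "three red balloons")
reward_miss = exact_match_reward("a wooden table", "three red balloons")
```

Because the reward depends only on string equality with a known injected span, it can be checked automatically without a learned judge.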


Article 289

Title@2025-06-11 (3): Generative Modeling with Diffusion

Title: Generative Modeling with Diffusion Generative Modellierung mit Diffusion 利用传播生成模型 2412.10948v2

Authors (1): Justin Le

We provide an overview of the diffusion model as a method to generate new samples. Generative models have been recently adopted for tasks such as art generation (Stable Diffusion, Dall-E) and text generation (ChatGPT). Diffusion models in particular apply noise to sample data and then “reverse” this noising process to generate new samples. We will formally define these noising and denoising processes, then present algorithms to train and generate with a diffusion model. Afterward, we will explore a potential application of diffusion models in improving classifier performance on imbalanced data.

我们概述了扩散模型作为产生新样本的一种方法,最近为艺术生成(稳定传播、Dall-E)和文本生成(ChatGPT)等任务采用了生成模型。传播模型特别将噪音应用于样本数据,然后“反转”这个“循环”过程来生成新的样本。我们将正式界定这些点火和去注过程,然后提出算法来培训和生成扩散模型。随后,我们将探索推广模型的可能应用,以提高分类者在不平衡数据方面的性能。
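
The noising process mentioned in this abstract can be sketched with the closed-form forward step commonly used in DDPM-style models, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$. The linear beta schedule, its endpoint values, and the function names are assumptions for illustration; the abstract does not state which formulation the overview adopts.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_sample(x0, alpha_bar, rng):
    """Draw x_t given x_0 in one step; also return the noise that was added."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    scale_signal = math.sqrt(alpha_bar)
    scale_noise = math.sqrt(1.0 - alpha_bar)
    xt = [scale_signal * x + scale_noise * e for x, e in zip(x0, eps)]
    return xt, eps


# Usage: later timesteps keep less of the original signal.
rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)
x_early, _ = noise_sample([1.0, -2.0], alpha_bars[10], rng)
x_late, _ = noise_sample([1.0, -2.0], alpha_bars[-1], rng)
```

A denoising network is typically trained to predict `eps` from `xt` and the timestep, and generation runs the learned reversal starting from pure noise.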


Article 290

Title@2025-06-11 (3): Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms

Title: Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms Treffen Sie mich am Arm: Das kooperative Multi-Armed Bandits Problem mit geteilten Armen 与我在 “ 武器:合作的多武装强盗与可分享武器问题 “ 聚会 2506.10127v1

Authors (2): Xinyi Hu, Aldo Pacchiano

We study the decentralized multi-player multi-armed bandits (MMAB) problem under a no-sensing setting, where each player receives only their own reward and obtains no information about collisions. Each arm has an unknown capacity, and if the number of players pulling an arm exceeds its capacity, all players involved receive zero reward. This setting generalizes the classical unit-capacity model and introduces new challenges in coordination and capacity discovery under severe feedback limitations. We propose A-CAPELLA (Algorithm for Capacity-Aware Parallel Elimination for Learning and Allocation), a decentralized algorithm that achieves logarithmic regret in this generalized regime. Our main contribution is a collaborative hypothesis testing protocol that enables synchronized successive elimination and capacity estimation through carefully structured collision patterns. This represents a provably efficient learning result in decentralized no-sensing MMAB with unknown arm capacities.

我们研究在无观测环境下的多玩家多武装强盗(MMAB)问题,每个玩家只得到他们自己的奖赏,没有关于碰撞的信息。每个手臂都具有未知的能力,如果拉着一个手臂的玩家人数超过其能力,所有参与者都得到零奖。这概括了典型的单位能力模式,在严重的反馈限制下提出了协调和能力发现方面的新挑战。我们提议A-CAPELLA(能力-软件平行消除学习和分配的算法),这是一种分散的算法,在这个普遍制度中实现对数的遗憾。我们的主要贡献是合作的假设测试协议,它能够通过仔细结构的碰撞模式同步连续消除和能力估算。这代表了一种相当有效的学习结果,即分散了不明的手臂能力。
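
The feedback model in this abstract can be made concrete with a small simulation. The sketch below is only an illustration of the setting, not of A-CAPELLA itself; the Bernoulli reward law and the function name `play_round` are assumptions.

```python
import random
from collections import Counter

def play_round(choices, capacities, means, rng):
    """Return each player's reward for one round.

    choices[p] is the arm pulled by player p. An arm pulled by more players
    than its capacity pays zero to everyone on it; otherwise each player draws
    an independent Bernoulli reward (the reward law is an assumption).
    """
    load = Counter(choices)
    rewards = []
    for arm in choices:
        if load[arm] > capacities[arm]:
            # No-sensing: the player sees only this zero, not the collision.
            rewards.append(0.0)
        else:
            rewards.append(1.0 if rng.random() < means[arm] else 0.0)
    return rewards

rng = random.Random(0)
capacities = [2, 1]
means = [1.0, 1.0]  # deterministic rewards keep the example easy to read
within_capacity = play_round([0, 0, 1], capacities, means, rng)
overloaded = play_round([0, 0, 0], capacities, means, rng)
```

Because a collision and an unlucky draw both look like a zero reward, players can only infer capacities statistically, which is why the abstract frames capacity discovery as hypothesis testing.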


Article 291

Title@2025-06-11 (3): Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds’ Annotated Imagery

Title: Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds’ Annotated Imagery Peer-Ranked Precision: Erstellung eines Basisdatensatzes für feinfühlige Visionsmodelle aus DataSeeds kommentierten Bildern 同行式精确度:从数据种子附加图象中为精美的愿景模型创建基础数据集 2506.05673v3

Authors (4): Sajjad Abdoli, Freeman Lewin, Gediminas Vasiliauskas, Fabian Schonholz

The development of modern Artificial Intelligence (AI) models, particularly diffusion-based models employed in computer vision and image generation tasks, is undergoing a paradigmatic shift in methodology. Traditionally dominated by a “Model-Centric” approach, in which performance gains were primarily pursued through increasingly complex model architectures and hyperparameter optimization, the field is now recognizing a more nuanced “Data-Centric” approach. This emergent framework foregrounds the quality, structure, and relevance of training data as the principal driver of model performance. To operationalize this paradigm shift, we introduce the DataSeeds.AI sample dataset (the “DSD”), initially comprising approximately 10,610 high-quality human peer-ranked photography images accompanied by extensive multi-tier annotations. The DSD is a foundational computer vision dataset designed to usher in a new standard for commercial image datasets. Representing a small fraction of DataSeeds.AI’s 100 million-plus image catalog, the DSD provides a scalable foundation necessary for robust commercial and multimodal AI development. Through this in-depth exploratory analysis, we document the quantitative improvements generated by the DSD on specific models against known benchmarks and make the code and the trained models used in our evaluation publicly available.

现代人工智能(AI)模型的开发,特别是计算机视觉和图像生成任务中使用的基于传播的模型,正在发展方法的范式转变中发生范式转变。传统上,发展方法以“模型中心”方法为主,主要通过日益复杂的模型结构和超光谱优化来追求绩效收益,目前,该领域正在承认一种更加精细的“数据中心”方法。这个新兴框架为培训数据的质量、结构和相关性提供了基础,作为模型绩效的主要驱动因素。为了实施这一范式转变,我们引入了DataSeids.AI样本数据集(“DSD”),最初由大约10,610个高质量的人类同侪摄影图像组成,并配有广泛的多层次图解。DSD是一个基础计算机远景数据集,旨在引入新的商业图像数据集标准。DSD代表了一小部分数据SEE.AI的1,000万以上图像目录,它为稳健的商业和多式联运开发提供了可扩展的基础。通过这一深入的探索性分析,我们记录了我们所了解的量化的模型,并用在已知的DDD所制作的定量模型上所使用的定量模型。


Article 292

Title@2025-06-11 (3): GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments

Title: GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments GRAIL: Ein Benchmark für GRAPH ActIve Learning in dynamischen Sensing-Umgebungen GRAIL: 动态遥感环境中GRaph Active学习基准 2506.10120v1

Authors (2): Maryam Khalid, Akane Sano

Graph-based Active Learning (AL) leverages the structure of graphs to efficiently prioritize label queries, reducing labeling costs and user burden in applications like health monitoring, human behavior analysis, and sensor networks. By identifying strategically positioned nodes, graph AL minimizes data collection demands while maintaining model performance, making it a valuable tool for dynamic environments. Despite its potential, existing graph AL methods are often evaluated on static graph datasets and primarily focus on prediction accuracy, neglecting user-centric considerations such as sampling diversity, query fairness, and adaptability to dynamic settings. To bridge this gap, we introduce GRAIL, a novel benchmarking framework designed to evaluate graph AL strategies in dynamic, real-world environments. GRAIL introduces novel metrics to assess sustained effectiveness, diversity, and user burden, enabling a comprehensive evaluation of AL methods under varying conditions. Extensive experiments on datasets featuring dynamic, real-life human sensor data reveal trade-offs between prediction performance and user burden, highlighting limitations in existing AL strategies. GRAIL demonstrates the importance of balancing node importance, query diversity, and network topology, providing an evaluation mechanism for graph AL solutions in dynamic environments.

以图表为基础的积极学习(AL) 利用图表结构来有效地确定标签查询的优先次序,减少标签费用和用户在健康监测、人类行为分析和感官网络等应用方面的负担。通过确定具有战略定位的节点,图形AL最大限度地减少数据收集需求,同时保持模型性能,使其成为动态环境的宝贵工具。尽管具有潜力,但现有的图表AL方法经常在静态图表数据集中评估,主要侧重于预测准确性,忽视抽样多样性、查询公平性和适应动态环境等以用户为中心的考虑。为了缩小这一差距,我们引入了GRIL,这是一个新的基准框架,旨在评价动态、现实世界环境中的图形AL战略。GRIL引入了新的衡量标准,以评估持续的有效性、多样性和用户负担,以便能够在不同条件下全面评估AL方法。关于以动态、真实生命人类传感器数据为特征的数据集的广泛实验揭示了预测性业绩与用户负担之间的利弊,突出了现有AL战略的局限性。GRIL表明平衡节点重要性、查询多样性和网络地形学的重要性,为动态环境中的图形AL解决方案提供评价机制。


Article 293

Title@2025-06-11 (3): Detecção da Psoríase Utilizando Visão Computacional: Uma Abordagem Comparativa Entre CNNs e Vision Transformers

Title: Detecção da Psoríase Utilizando Visão Computacional: Uma Abordagem Comparativa Entre CNNs e Vision Transformers Detecção da Psoríase Utilizando Visão Computacional: Uma Abordagem Comparativa Entre CNNs e Vision Transformers 利用有线电视新闻网和愿景变革者 2506.10119v1

Authors (3): Natanael Lucena, Fábio S. da Silva, Ricardo Rios

This paper compares the performance of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) on the multi-class classification of images containing lesions of psoriasis and of diseases that resemble it. Models pre-trained on ImageNet were adapted to a specific data set. Both families achieved high predictive metrics, but the ViTs stood out for their superior performance with smaller models. Dual Attention Vision Transformer-Base (DaViT-B) obtained the best results, with an F1-score of 96.4%, and is recommended as the most efficient architecture for automated psoriasis detection. This article reinforces the potential of ViTs for medical image classification tasks.

本文件比较了革命神经网络(CNNs)和愿景变异器(VVTs)在包含恶性丝虫病和类似疾病损害的多级图像任务方面的表现。在图像网络上预先培训的模型经过了具体数据集的调整。这两个模型都实现了高预测度,但ViTs则以较小的模型为优异性能。双注意力视觉变异器(DaViT-B)获得了最佳效果,F1核心为96.4%,被推荐为自动检测恶性丝虫病的最有效结构。这篇文章强化了ViTs在医学图像分类任务方面的潜力。


Article 294

Title@2025-06-11 (3): NnD: Diffusion-based Generation of Physically-Nonnegative Objects

Title: NnD: Diffusion-based Generation of Physically-Nonnegative Objects NnD: Diffusionsbasierte Generierung physikalisch-nonnegativer Objekte NND: 以传播为主的物理-氮化物体的生成 2506.10112v1

Authors (3): Nadav Torem, Tamar Sde-Chen, Yoav Y. Schechner

Most natural objects have inherent complexity and variability. While some simple objects can be modeled from first principles, many real-world phenomena, such as cloud formation, require computationally expensive simulations that limit scalability. This work focuses on a class of physically meaningful, nonnegative objects that are computationally tractable but costly to simulate. To dramatically reduce computational costs, we propose nonnegative diffusion (NnD). This is a learned generative model using score-based diffusion. It adapts annealed Langevin dynamics to enforce, by design, non-negativity throughout iterative scene generation and analysis (inference). NnD trains on high-quality physically simulated objects. Once trained, it can be used for generation and inference. We demonstrate generation of 3D volumetric clouds, comprising inherently nonnegative microphysical fields. Our generated clouds are consistent with cloud physics trends. Experts effectively do not perceive them as non-physical.

大多数自然物体具有固有的复杂性和可变性。 虽然一些简单的天体可以从最初的原则中建模, 但许多现实世界现象, 如云层形成, 需要计算昂贵的模拟, 以限制可缩放性。 这项工作侧重于一组物理上有意义的非阴性物体, 这些物体在计算上可移动, 但模拟成本却昂贵。 为了大幅降低计算成本, 我们建议非阴性扩散( NnD ) 。 这是一个使用基于分数的传播方法的学习的基因模型。 它使无线的兰格文动态在迭代场生成和分析( 推断) 中以设计方式执行非负性。 NnD 用于高质量的物理模拟物体的培训。 一旦经过培训, 就可以用于生成和推断。 我们展示了3D体积云的生成, 由固有的非负性微物理场组成。 我们生成的云与云层物理趋势一致。 它们实际上没有被专家的认知区分为非物理。


Article 295

Title@2025-06-11 (3): AI5GTest: AI-Driven Specification-Aware Automated Testing and Validation of 5G O-RAN Components

Title: AI5GTest: AI-Driven Specification-Aware Automated Testing and Validation of 5G O-RAN Components AI5GTest: AI-Driven Specification-Aware Automatisierte Prüfung und Validierung von 5G O-RAN Komponenten AI5GTest: AI-Driver 5G O-RAN 部件的AI-DRE 规格-软件自动测试和验证 2506.10111v1

Authors (3): Abiodun Ganiyu, Pranshav Gajjar, Vijay K Shah

The advent of Open Radio Access Networks (O-RAN) has transformed the telecommunications industry by promoting interoperability, vendor diversity, and rapid innovation. However, its disaggregated architecture introduces complex testing challenges, particularly in validating multi-vendor components against O-RAN ALLIANCE and 3GPP specifications. Existing frameworks, such as those provided by Open Testing and Integration Centres (OTICs), rely heavily on manual processes, are fragmented and prone to human error, leading to inconsistency and scalability issues. To address these limitations, we present AI5GTest, an AI-powered, specification-aware testing framework designed to automate the validation of O-RAN components. AI5GTest leverages a cooperative Large Language Model (LLM) framework consisting of Gen-LLM, Val-LLM, and Debug-LLM. Gen-LLM automatically generates expected procedural flows for test cases based on 3GPP and O-RAN specifications, while Val-LLM cross-references signaling messages against these flows to validate compliance and detect deviations. If anomalies arise, Debug-LLM performs root cause analysis, providing insight into the cause of the failure. To enhance transparency and trustworthiness, AI5GTest incorporates a human-in-the-loop mechanism, in which Gen-LLM presents the top-k relevant official specifications to the tester for approval before proceeding with validation. Evaluated using a range of test cases obtained from O-RAN TIFG and WG5-IOT test specifications, AI5GTest demonstrates a significant reduction in overall test execution time compared to traditional manual methods, while maintaining high validation accuracy.

开放电台接入网络(O-RAN)的出现,通过促进互操作性、供应商多样性和快速创新,改变了电信业,然而,其分类结构带来了复杂的测试挑战,特别是在根据O-RAN Allianscent和3GP规格验证多供应商组件方面;现有框架,如开放测试和一体化中心(OTICs)提供的框架,严重依赖人工流程,支离破碎,容易发生人为错误,导致不一致和可缩放问题;为解决这些限制,我们提出了AI5GTest – – 一个由AI驱动的、符合规格的测试框架,旨在自动验证O-RAN的组件;AI5GTest利用一个合作型大语言模型框架,包括Gen-LLM、Val-LLIM和Debug-LPPM。 Gentral-LUM自动生成基于3GPP和O-RAN规格的测试案件的预期程序流,同时,Val-LLM交叉参照这些流动的信息信号,以验证遵守情况和检测偏离。如果出现异常情况,DLLM进行相关的根分析,则使用相关的根分析,提供对IG标准的测试。


Article 296

Title@2025-06-11 (3): A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning

Title: A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning Ein einheitliches Framework, um Symmetrie im maschinellen Lernen zu stärken, zu entdecken und zu fördern 实施、发现和促进机器学习对称的统一框架 2311.00212v3

Authors (4): Samuel E. Otto, Nicholas Zolman, J. Nathan Kutz, Steven L. Brunton

Symmetry is present throughout nature and continues to play an increasingly central role in physics and machine learning. Fundamental symmetries, such as Poincaré invariance, allow physical laws discovered in laboratories on Earth to be extrapolated to the farthest reaches of the universe. Symmetry is essential to achieving this extrapolatory power in machine learning applications. For example, translation invariance in image classification allows models with fewer parameters, such as convolutional neural networks, to be trained on smaller data sets and achieve state-of-the-art performance. In this paper, we provide a unifying theoretical and methodological framework for incorporating symmetry into machine learning models in three ways: (1) enforcing known symmetry when training a model; (2) discovering unknown symmetries of a given model or data set; and (3) promoting symmetry during training by learning a model that breaks symmetries within a user-specified group of candidates when there is sufficient evidence in the data. We show that these tasks can be cast within a common mathematical framework whose central object is the Lie derivative associated with fiber-linear Lie group actions on vector bundles. We extend and unify several existing results by showing that enforcing and discovering symmetry are linear-algebraic tasks that are dual with respect to the bilinear structure of the Lie derivative. We also propose a novel way to promote symmetry by introducing a class of convex regularization functions based on the Lie derivative and nuclear norm relaxation to penalize symmetry breaking during training of machine learning models. We explain how these ideas can be applied to a wide range of machine learning models including basis function regression, dynamical systems discovery, neural networks, and neural operators acting on fields.

对称性在自然界中无处不在,并在物理学和机器学习中发挥着日益核心的作用。诸如庞加莱(Poincaré)不变性之类的基本对称性,使得在地球实验室中发现的物理定律可以外推到宇宙最遥远的角落。在机器学习应用中,对称性对于获得这种外推能力至关重要。例如,图像分类中的平移不变性使卷积神经网络等参数较少的模型能够在较小的数据集上训练并达到最先进的性能。本文提出了一个统一的理论与方法框架,以三种方式将对称性纳入机器学习模型:(1)在训练模型时强制施加已知的对称性;(2)发现给定模型或数据集的未知对称性;(3)在训练过程中促进对称性,即学习这样一种模型:当数据中有充分证据时,它会在用户指定的候选对称群内打破对称性。我们证明,这些任务可以置于一个共同的数学框架之中,其核心对象是与向量丛上纤维线性李群作用相关联的李导数。我们推广并统一了若干已有结果,证明强制施加对称性与发现对称性是关于李导数双线性结构互为对偶的线性代数任务。我们还提出了一种促进对称性的新方法:基于李导数和核范数松弛引入一类凸正则化函数,用以在机器学习模型训练过程中惩罚对称性破缺。我们说明了如何将这些思想应用于多种机器学习模型,包括基函数回归、动力系统发现、神经网络以及作用于场的神经算子。


Article 297

Title@2025-06-11 (3): Learning to Collaborate Over Graphs: A Selective Federated Multi-Task Learning Approach

Title: Learning to Collaborate Over Graphs: A Selective Federated Multi-Task Learning Approach Lernen, über Graphen zu kollaborieren: Ein selektiver, federführender Lernansatz mit mehreren Aufgaben 学习 “ 合作翻图:选择性联邦多任务学习方法 “ 2506.10102v1

Authors (3): Ahmed Elbakary, Chaouki Ben Issaid, Mehdi Bennis

We present a novel federated multi-task learning method that leverages cross-client similarity to enable personalized learning for each client. To avoid transmitting the entire model to the parameter server, we propose a communication-efficient scheme that introduces a feature anchor, a compact vector representation that summarizes the features learned from the client’s local classes. This feature anchor is shared with the server to account for local clients’ distribution. In addition, the clients share the classification heads, a lightweight linear layer, and perform a graph-based regularization to enable collaboration among clients. By modeling collaboration between clients as a dynamic graph and continuously updating and refining this graph, we can account for any drift from the clients. To ensure beneficial knowledge transfer and prevent negative collaboration, we leverage a community detection-based approach that partitions this dynamic graph into homogeneous communities, maximizing the sum of task similarities, represented as the graph edges’ weights, within each community. This mechanism restricts collaboration to highly similar clients within their formed communities, ensuring positive interaction and preserving personalization. Extensive experiments on two heterogeneous datasets demonstrate that our method significantly outperforms state-of-the-art baselines. Furthermore, we show that our method exhibits superior computation and communication efficiency and promotes fairness across clients.

我们提出了一个新的联结式多任务学习方法,利用跨客户的相似性,使每个客户都能进行个性化学习。为了避免将整个模型传输到参数服务器,我们提议了一个通信高效计划,引入一个功能锚,即一个集客户本地分类所学特征的紧凑矢量代表。这个功能锚与服务器共享,以说明当地客户的分配情况。此外,客户共享分类头,一个轻量的线性层,并进行基于图表的正规化,以使客户之间能够进行合作。通过将客户之间的合作建模成动态图表,不断更新和完善这个图表,我们可以说明客户的任何漂移。为了确保有益的知识转移和防止负面合作,我们利用基于社区检测的方法,将这种动态图形分割成一个单一的社区,最大限度地增加任务相似性的总和,作为每个社区的图形边端的重量。这个机制限制了客户在他们组成社区内与非常相似的客户的合作,确保积极的互动和保持个性化。在两个不同的数据中进行广泛的实验,表明我们的方法大大优于州-艺术客户的基线。此外,我们展示了我们的方法,展示了我们的方法,提高了客户的公平性。


Article 298

Title@2025-06-11 (3): VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis

Title: VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis VISTA: Vision-Language-Schlussfolgerung für eine trainingsfreie Analyse der Stock-Zeitreihen VISTA:无培训-库存无培训-时间-系列分析的远景-语言推断 2505.18570v3

Authors (5): Tina Khezresmaeilzadeh, Parsa Razmara, Seyedarmin Azizi, Mohammad Erfan Sadeghi, Erfan Baghaei Potraghloo

Stock price prediction remains a complex and high-stakes task in financial analysis, traditionally addressed using statistical models or, more recently, language models. In this work, we introduce VISTA (Vision-Language Inference for Stock Time-series Analysis), a novel, training-free framework that leverages Vision-Language Models (VLMs) for multi-modal stock forecasting. VISTA prompts a VLM with both textual representations of historical stock prices and their corresponding line charts to predict future price values. By combining numerical and visual modalities in a zero-shot setting and using carefully designed chain-of-thought prompts, VISTA captures complementary patterns that unimodal approaches often miss. We benchmark VISTA against standard baselines, including ARIMA and text-only LLM-based prompting methods. Experimental results show that VISTA outperforms these baselines by up to 89.83%, demonstrating the effectiveness of multi-modal inference for stock time-series analysis and highlighting the potential of VLMs in financial forecasting tasks without requiring task-specific training.

股票价格预测仍然是金融分析中一项复杂和高度的任务,传统上使用统计模型或最近使用的语言模型加以处理。在这项工作中,我们采用了VISTA(股票时间序列分析的视觉-语言推断),这是一个没有培训的新框架,利用视野-语言模型(VLMs)进行多模式股票预测。VISTA推动了一个VLM,同时以文字形式表述历史股票价格及其相应的线性图表来预测未来价格值。通过在零镜头设置中结合数字和视觉模式,并使用精心设计的思维链提示,VISTA捕捉了通常不采用单一方式的方法的互补模式。我们根据标准基线对VISTA进行基准,包括ARIMA和基于文本的LLMM快速方法。实验结果表明,VISTA比这些基线高出89.83%,表明股票时间序列分析的多模式推断的有效性,并强调VLMs在财务预测任务中的潜力,而无需具体任务培训。


Article 299

Title@2025-06-11 (3): Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes

Title: Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes Grundlegende Grenzen des Lernens Hochdimensionale Vereinfachungen in Noisy Regimes 吵闹制度中学习高维简单主义的基本限制 2506.10101v1

Authors (4): Seyed Amir Hossein Saberi, Amir Najafi, Abolfazl Motahari, Babak H. Khalaj

In this paper, we establish sample complexity bounds for learning high-dimensional simplices in $\mathbb{R}^K$ from noisy data. Specifically, we consider $n$ i.i.d. samples uniformly drawn from an unknown simplex in $\mathbb{R}^K$, each corrupted by additive Gaussian noise of unknown variance. We prove that an algorithm exists that, with high probability, outputs a simplex within $\ell_2$ or total variation (TV) distance at most $\varepsilon$ from the true simplex, provided $n \ge (K^2/\varepsilon^2) e^{\mathcal{O}(K/\mathrm{SNR}^2)}$, where $\mathrm{SNR}$ is the signal-to-noise ratio. Extending our prior work (Saberi et al., 2023), we derive new information-theoretic lower bounds, showing that simplex estimation within TV distance $\varepsilon$ requires at least $n \ge \Omega(K^3 \sigma^2/\varepsilon^2 + K/\varepsilon)$ samples, where $\sigma^2$ denotes the noise variance. In the noiseless scenario, our lower bound $n \ge \Omega(K/\varepsilon)$ matches known upper bounds up to constant factors. We resolve an open question by demonstrating that when $\mathrm{SNR} \ge \Omega(K^{1/2})$, noisy-case complexity aligns with the noiseless case. Our analysis leverages sample compression techniques (Ashtiani et al., 2018) and introduces a novel Fourier-based method for recovering distributions from noisy observations, potentially applicable beyond simplex learning.

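本文针对从含噪数据中学习 $\mathbb{R}^K$ 中的高维单纯形,建立了样本复杂度界。具体而言,我们考虑从 $\mathbb{R}^K$ 中某个未知单纯形上均匀抽取的 $n$ 个独立同分布样本,每个样本都受到方差未知的加性高斯噪声污染。我们证明存在一种算法,只要 $n \ge (K^2/\varepsilon^2) e^{\mathcal{O}(K/\mathrm{SNR}^2)}$(其中 $\mathrm{SNR}$ 为信噪比),该算法就能以高概率输出一个与真实单纯形的 $\ell_2$ 距离或全变差(TV)距离至多为 $\varepsilon$ 的单纯形。在我们先前工作的基础上,我们推导出新的信息论下界,表明在 TV 距离 $\varepsilon$ 内估计单纯形至少需要 $n \ge \Omega(K^3 \sigma^2/\varepsilon^2 + K/\varepsilon)$ 个样本,其中 $\sigma^2$ 表示噪声方差。在无噪声情形下,我们的下界 $n \ge \Omega(K/\varepsilon)$ 与已知上界在常数因子意义下一致。我们证明当 $\mathrm{SNR} \ge \Omega(K^{1/2})$ 时,含噪情形的复杂度与无噪声情形一致,从而解决了一个公开问题。我们的分析利用了样本压缩技术(Ashtiani et al., 2018),并引入了一种新的基于傅里叶的方法,用于从含噪观测中恢复分布,该方法有望应用于单纯形学习之外的问题。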
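
The data model in this abstract (uniform samples from an unknown simplex, corrupted by Gaussian noise) is easy to simulate. The sketch below only generates such data and does not implement the paper's estimator; the function name `sample_noisy_simplex` and the isotropic-noise choice are assumptions.

```python
import random

def sample_noisy_simplex(vertices, sigma, n, rng):
    """Draw n points uniformly from the simplex spanned by `vertices`,
    then add isotropic Gaussian noise with standard deviation sigma."""
    dim = len(vertices[0])
    samples = []
    for _ in range(n):
        # Normalised Exp(1) draws are Dirichlet(1, ..., 1) weights, which are
        # uniform barycentric coordinates on the simplex.
        e = [rng.expovariate(1.0) for _ in vertices]
        total = sum(e)
        w = [v / total for v in e]
        clean = [sum(w[i] * vertices[i][d] for i in range(len(vertices)))
                 for d in range(dim)]
        samples.append([c + rng.gauss(0.0, sigma) for c in clean])
    return samples

# A triangle in R^2 (K = 2) observed with a small amount of noise.
triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
noisy_points = sample_noisy_simplex(triangle, 0.05, 500, random.Random(0))
```

Raising `sigma` relative to the size of the triangle lowers the signal-to-noise ratio, which is the quantity that drives the exponential factor in the upper bound quoted above.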


Article 300

Title@2025-06-11 (3): Token-Efficient RL for LLM Reasoning

Title: Token-Efficient RL for LLM Reasoning Token-Efficient RL für LLM-Vernunft 用于LLM 理由解释的 Token- Effecticent RL 2504.20834v4

Authors (2): Alan Lee, Harry Tong

We propose reinforcement learning (RL) strategies tailored for reasoning in large language models (LLMs) under strict memory and compute limits, with a particular focus on compatibility with LoRA fine-tuning. Building on early policy gradient methods with baseline subtraction, we design critic-free methods that operate on a small, informative subset of output tokens to reduce memory usage and stabilize training. We introduce S-GRPO, a stochastic variant of Group Relative Policy Optimization, and T-SPMO, a token-level prefix matching approach for fine-grained credit assignment. Applied to Qwen2-1.5B, our methods raise accuracy on the SVAMP benchmark from 46% to over 70% and show strong performance on multi-digit multiplication. Surprisingly, full-token GRPO under LoRA fails to improve over the base model, suggesting that selective token-level optimization may act as an implicit regularizer in low-parameter training regimes.
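
The two ingredients named in this abstract, baseline subtraction within a group of sampled responses and a loss restricted to a subset of tokens, can be sketched as follows. This is an illustration under stated assumptions, not the authors' S-GRPO: the uniform random token mask, the mean over kept tokens, and all function names are hypothetical.

```python
import random
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Centre and scale rewards within one group of sampled responses."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def sample_token_mask(num_tokens, keep_prob, rng):
    """Keep each output token independently (an assumed selection rule)."""
    return [rng.random() < keep_prob for _ in range(num_tokens)]

def subset_policy_gradient_loss(token_logprobs, advantage, mask):
    """Critic-free policy-gradient loss averaged over the kept tokens only."""
    kept = [lp for lp, keep in zip(token_logprobs, mask) if keep]
    if not kept:
        return 0.0
    return -advantage * sum(kept) / len(kept)

rng = random.Random(0)
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
mask = sample_token_mask(6, 0.5, rng)
loss = subset_policy_gradient_loss([-0.2, -1.3, -0.7, -0.1, -2.0, -0.4], advantages[0], mask)
```

In a real training loop the log-probabilities would come from the LoRA-adapted model, and only the kept tokens would need activations stored for the backward pass, which is where the memory saving would arise.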


Article 301

Title@2025-06-11 (3): Estimating the Joint Probability of Scenario Parameters with Gaussian Mixture Copula Models

Title: Estimating the Joint Probability of Scenario Parameters with Gaussian Mixture Copula Models Schätzung der gemeinsamen Wahrscheinlichkeit von Szenario-Parametern mit Gaussian Mixture Copula-Modellen 与高山混合混合胶合合构件模型共同估计设想参数的 联合概率 2506.10098v1

Authors (4): Christian Reichenbächer, Philipp Rank, Jochen Hipp, Oliver Bringmann

This paper presents the first application of Gaussian Mixture Copula Models to the statistical modeling of driving scenarios for the safety validation of automated driving systems. Knowledge of the joint probability distribution of scenario parameters is essential for scenario-based safety assessment, where risk quantification depends on the likelihood of concrete parameter combinations. Gaussian Mixture Copula Models bring together the multimodal expressivity of Gaussian Mixture Models and the flexibility of copulas, enabling separate modeling of marginal distributions and dependencies. We benchmark Gaussian Mixture Copula Models against previously proposed approaches (Gaussian Mixture Models and Gaussian Copula Models) using real-world driving data drawn from scenarios defined in United Nations Regulation No. 157. Our evaluation across 18 million scenario instances demonstrates that Gaussian Mixture Copula Models provide a better fit to the data in terms of both likelihood and Sinkhorn distance. These results suggest that Gaussian Mixture Copula Models are a compelling foundation for future scenario-based validation frameworks.


Article 302

Title@2025-06-11 (3): Unsupervised Deep Clustering of MNIST with Triplet-Enhanced Convolutional Autoencoders

Title: Unsupervised Deep Clustering of MNIST with Triplet-Enhanced Convolutional Autoencoders Unüberwachtes Deep Clustering von MNIST mit Triplet-Enhanced Convolutional Autoencodern 在无人监督下,MNIST与三维强化的革命自动自动编码器一起深入集群 2506.10094v1

Authors (1): Md. Faizul Islam Ansari

This research implements an unsupervised clustering system for MNIST handwritten digits using a two-phase deep autoencoder architecture. In the first phase, a deep neural autoencoder is trained to minimize reconstruction error, producing compact yet interpretable image representations. In the second phase, we combine the reconstruction error with a KMeans clustering loss on the learned latent embeddings in a joint distance-based objective. The model uses batch normalization, dropout, and weight decay to obtain stable results that generalize. In extensive tests, the framework achieves superior clustering performance on intrinsic measures (Silhouette Score and Davies-Bouldin Index) and extrinsic metrics (NMI and ARI). t-SNE visualizations of the learned embeddings show distinct clusters for the digits. The approach balances reconstruction accuracy against cluster separation and purity while remaining interpretable and scalable, and provides a dependable base for deploying unsupervised representation learning in large-scale image clustering applications.
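
The second-phase objective described above can be written down directly. The sketch below assumes the joint loss is a weighted sum of mean squared reconstruction error and mean squared distance from each embedding to its nearest centroid; the weight value and the function names are hypothetical, and the abstract does not give the exact form.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def joint_loss(inputs, reconstructions, embeddings, centroids, weight=0.1):
    """Return (total, reconstruction term, clustering term).

    The clustering term is the KMeans objective evaluated on the latent
    embeddings: each point is charged its distance to the nearest centroid.
    """
    n = len(inputs)
    recon = sum(squared_distance(x, r) for x, r in zip(inputs, reconstructions)) / n
    cluster = sum(min(squared_distance(z, c) for c in centroids) for z in embeddings) / n
    return recon + weight * cluster, recon, cluster

total, recon, cluster = joint_loss(
    inputs=[[0.0, 0.0], [1.0, 1.0]],
    reconstructions=[[0.0, 0.0], [1.0, 0.0]],
    embeddings=[[0.0], [2.0]],
    centroids=[[0.0], [3.0]],
)
```

In a full pipeline the encoder, decoder, and centroids would be updated together by gradient descent on `total`, with the weight trading reconstruction fidelity against cluster compactness.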


Article 303

Title@2025-06-11 (3): Efficient kernelized bandit algorithms via exploration distributions

Title: Efficient kernelized bandit algorithms via exploration distributions Effiziente kernisierte Bandit-Algorithmen über Explorations-Distributionen 通过勘探分布分配进行高效率的内脏化土匪算法 2506.10091v1

Authors (3): Bingshan Hu, Zheng He, Danica J. Sutherland

We consider a kernelized bandit problem with a compact arm set $X \subset \mathbb{R}^d$ and a fixed but unknown reward function $f^*$ with a finite norm in some Reproducing Kernel Hilbert Space (RKHS). We propose a class of computationally efficient kernelized bandit algorithms, which we call GP-Generic, based on a novel concept: exploration distributions. This class of algorithms includes Upper Confidence Bound-based approaches as a special case, but also allows for a variety of randomized algorithms. With careful choice of exploration distribution, our proposed generic algorithm realizes a wide range of concrete algorithms that achieve $\tilde{O}(\gamma_T\sqrt{T})$ regret bounds, where $\gamma_T$ characterizes the RKHS complexity. This matches known results for UCB- and Thompson Sampling-based algorithms; we also show that, in practice, randomization can yield better results.


Article 304

Title@2025-06-11 (3): Assortment Optimization for Patient-Provider Matching

Title: Assortment Optimization for Patient-Provider Matching Sortimentsoptimierung für Patient-Provider-Matching 患者-供应者匹配的组群优化 2502.10353v2

Authors (2): Naveen Raman, Holly Wiberg

Rising provider turnover means that patients must frequently be rematched with available providers. However, the rematching process is cumbersome for both patients and health systems, resulting in labor-intensive and ad hoc reassignments. We propose a novel patient-provider matching approach to address this issue by offering patients limited provider menus. The goal is to maximize match quality across the system while preserving patient choice. We frame this as a novel variant of assortment optimization, where patient-specific provider menus are offered upfront, and patients respond in a random sequence to make their selections. This hybrid offline-online setting is understudied in previous literature and captures system dynamics across various domains. We first demonstrate that a greedy baseline policy, which offers all providers to all patients, can maximize the match rate but leads to low-quality matches. Based on this, we construct a set of policies and demonstrate that the best policy depends on problem specifics, such as a patient’s willingness to match and the ratio of patients to providers. On real-world data, our proposed policy improves average match quality by 13% over a greedy solution by tailoring assortments based on patient characteristics. Our analysis reveals a tradeoff between menu size and system-wide match quality, highlighting the value of balancing patient choice with centralized planning.


Article 305

Title@2025-06-11 (3): Optimizing Latent Dimension Allocation in Hierarchical VAEs: Balancing Attenuation and Information Retention for OOD Detection

Title: Optimizing Latent Dimension Allocation in Hierarchical VAEs: Balancing Attenuation and Information Retention for OOD Detection Optimierung der Latent Dimension Allocation in Hierarchical VAEs: Abgleichende Dämpfung und Informationsretention für OOD Detection 优化在等级性VAEs中的中局量分配:平衡减速和信息保留以探测OOD 2506.10089v1

Authors (3): Dane Williamson, Yangfeng Ji, Matthew Dwyer

Out-of-distribution (OOD) detection is a critical task in machine learning, particularly for safety-critical applications where unexpected inputs must be reliably flagged. While hierarchical variational autoencoders (HVAEs) offer improved representational capacity over traditional VAEs, their performance is highly sensitive to how latent dimensions are distributed across layers. Existing approaches often allocate latent capacity arbitrarily, leading to ineffective representations or posterior collapse. In this work, we introduce a theoretically grounded framework for optimizing latent dimension allocation in HVAEs, drawing on principles from information theory to formalize the trade-off between information loss and representational attenuation. We prove the existence of an optimal allocation ratio $r^{\ast}$ under a fixed latent budget, and empirically show that tuning this ratio consistently improves OOD detection performance across datasets and architectures. Our approach outperforms baseline HVAE configurations and provides practical guidance for principled latent structure design, leading to more robust OOD detection with deep generative models.


Article 306

Title@2025-06-11 (3): M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Predictive Embedding Architecture

Title: M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Predictive Embedding Architecture M3-JEPA: Multimodale Ausrichtung über Multi-Gate MoE auf Basis der Joint-Predictive Embedding Architecture M3-JEPA:通过基于联合预防嵌入结构的多门教育部实现多式联运协调 2409.05929v5

Authors (11): Hongyang Lei, Xiaolong Cheng, Qi Qin, Dan Wang, Kun Fan, Huazhen Huang, Qingqing Gu, Yetao Wu, Zhonglin Jiang, Yong Chen, Luo Ji

Current multimodal learning strategies primarily optimize in the original token space. Such a framework is easy to integrate with the backbone of a pretrained language model, but may result in modality collapse. To alleviate such issues, we apply the joint embedding predictive architecture (JEPA) to multimodal tasks: a predictor maps the input embedding into the output embedding space, and cross-modal alignment is then performed in the latent space. We implement this predictor with a Multi-Gate Mixture of Experts (MMoE) and accordingly name the framework M3-JEPA. The gating function disentangles the modality-specific and shared information and derives information-theoretic optimality. The framework is trained with both contrastive and regularization losses, and solved by alternative gradient descent (AGD) between different multimodal tasks. Through carefully designed experiments, we show that M3-JEPA achieves state-of-the-art performance across modalities and tasks, generalizes to unseen datasets and domains, and is computationally efficient in both training and inference. Our observations suggest that M3-JEPA might become a new basis for self-supervised learning in the open world.


Article 307

Title@2025-06-11 (3): Patient-Specific Deep Reinforcement Learning for Automatic Replanning in Head-and-Neck Cancer Proton Therapy

Title: Patient-Specific Deep Reinforcement Learning for Automatic Replanning in Head-and-Neck Cancer Proton Therapy Patientenspezifisches Tiefen-Verstärkungs-Lernen zur automatischen Neuplanung in der Kopf-Neck-Krebs-Protonen-Therapie 在头部和颈部癌症质子治疗中进行自动再规划的 具体病人深强化学习 2506.10073v1

Authors (6): Malvern Madondo, Yuan Shao, Yingzi Liu, Jun Zhou, Xiaofeng Yang, Zhen Tian

Anatomical changes during intensity-modulated proton therapy (IMPT) for head-and-neck cancer (HNC) can shift Bragg peaks, risking tumor underdosing and organ-at-risk overdosing. As a result, treatment replanning is often required to maintain clinically acceptable treatment quality. However, current manual replanning processes are resource-intensive and time-consuming. We propose a patient-specific deep reinforcement learning (DRL) framework for automated IMPT replanning, with a reward-shaping mechanism based on a $150$-point plan quality score addressing competing clinical objectives. We formulate the planning process as an RL problem where agents learn control policies to adjust optimization priorities, maximizing plan quality. Unlike population-based approaches, our framework trains personalized agents for each patient using their planning CT (Computed Tomography) and augmented anatomies simulating anatomical changes (tumor progression and regression). This patient-specific approach leverages anatomical similarities throughout treatment, enabling effective plan adaptation. We implemented two DRL algorithms, Deep Q-Network and Proximal Policy Optimization, using dose-volume histograms (DVHs) as state representations and a $22$-dimensional action space of priority adjustments. Evaluation on five HNC patients using actual replanning CT data showed both DRL agents improved initial plan scores from $120.63 \pm 21.40$ to $139.78 \pm 6.84$ (DQN) and $142.74 \pm 5.16$ (PPO), surpassing manual replans generated by a human planner ($137.20 \pm 5.58$). Clinical validation confirms that improvements translate to better tumor coverage and OAR sparing across diverse anatomical changes. This work demonstrates DRL’s potential in addressing geometric and dosimetric complexities of adaptive proton therapy, offering efficient offline adaptation solutions and advancing online adaptive proton therapy.
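
The dose-volume histograms used as state representations above have a simple definition that a short sketch can make concrete. The code below computes a cumulative DVH for one structure; the dose levels and voxel doses are made-up numbers, and how the authors bin or combine DVHs across structures is not stated in the abstract.

```python
def cumulative_dvh(voxel_doses, dose_levels):
    """Fraction of a structure's voxels receiving at least each dose level."""
    n = len(voxel_doses)
    return [sum(1 for d in voxel_doses if d >= level) / n for level in dose_levels]

# Hypothetical doses (in Gy) for four voxels of one structure.
tumor_doses = [10.0, 20.0, 30.0, 40.0]
dvh = cumulative_dvh(tumor_doses, [0.0, 25.0, 50.0])
```

One plausible way to build an RL state from this, stated here as an assumption, is to concatenate such curves for the target and each organ at risk into a single fixed-length vector.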


Article 308

Title@2025-06-11 (3): Textual Bayes: Quantifying Uncertainty in LLM-Based Systems

Title: Textual Bayes: Quantifying Uncertainty in LLM-Based Systems Textbuchten: Quantifizierung der Unsicherheit in LLM-basierten Systemen 以LLM为基础的系统中的量化不确定性 2506.10060v1

Authors (11): Brendan Leigh Ross, Noël Vouitsis, Atiyeh Ashari Ghomi, Rasa Hosseinzadeh, Ji Xin, Zhaoyan Liu, Yi Sui, Shiyi Hou, Kin Kwan Leung, Gabriel Loaiza-Ganem, Jesse C. Cresswell

Although large language models (LLMs) are becoming increasingly capable of solving challenging real-world tasks, accurately quantifying their uncertainty remains a critical open problem, which limits their applicability in high-stakes domains. This challenge is further compounded by the closed-source, black-box nature of many state-of-the-art LLMs. Moreover, LLM-based systems can be highly sensitive to the prompts that bind them together, which often require significant manual tuning (i.e., prompt engineering). In this work, we address these challenges by viewing LLM-based systems through a Bayesian lens. We interpret prompts as textual parameters in a statistical model, allowing us to use a small training dataset to perform Bayesian inference over these prompts. This novel perspective enables principled uncertainty quantification over both the model’s textual parameters and its downstream predictions, while also incorporating prior beliefs about these parameters expressed in free-form text. To perform Bayesian inference, a difficult problem even for well-studied data modalities, we introduce Metropolis-Hastings through LLM Proposals (MHLP), a novel Markov chain Monte Carlo (MCMC) algorithm that combines prompt optimization techniques with standard MCMC methods. MHLP is a turnkey modification to existing LLM pipelines, including those that rely exclusively on closed-source models. Empirically, we demonstrate that our method yields improvements in both predictive accuracy and uncertainty quantification (UQ) on a range of LLM benchmarks and UQ tasks. More broadly, our work demonstrates a viable path for incorporating methods from the rich Bayesian literature into the era of LLMs, paving the way for more reliable and calibrated LLM-based systems.
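The Metropolis-Hastings step that MHLP builds on can be sketched on a toy problem. This is a minimal illustration of the standard algorithm, not the authors' method: the three "prompt" states, their unnormalized weights, and the uniform proposal are hypothetical stand-ins. In MHLP the proposals come from an LLM; how the acceptance ratio accounts for that proposal is not described in the abstract.

```python
import random


def metropolis_hastings(unnormalized_weight, states, n_steps, rng):
    """Sample states with probability proportional to unnormalized_weight."""
    current = states[0]
    chain = []
    for _ in range(n_steps):
        # Hypothetical proposal: pick any state uniformly at random.
        # It is symmetric, so the Hastings correction term cancels.
        proposal = rng.choice(states)
        ratio = unnormalized_weight[proposal] / unnormalized_weight[current]
        if rng.random() < min(1.0, ratio):
            current = proposal
        chain.append(current)
    return chain


# Hypothetical textual parameters with unnormalized posterior weights.
weights = {"prompt_a": 1.0, "prompt_b": 2.0, "prompt_c": 1.0}
rng = random.Random(0)
chain = metropolis_hastings(weights, list(weights), 40000, rng)
share_b = chain.count("prompt_b") / len(chain)
```

Under these assumptions the chain should spend about half of its steps on `prompt_b`, since its weight is half of the total.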


Article 309

Title@2025-06-11 (3): DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos

Title: DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos DGS-LRM: 实时变形 3D Gaussian 重建独视视频 2506.09997v1

Authors (14): Chieh Hubert Lin, Zhaoyang Lv, Songyin Wu, Zhen Xu, Thu Nguyen-Phuoc, Hung-Yu Tseng, Julian Straub, Numair Khan, Lei Xiao, Ming-Hsuan Yang, Yuheng Ren, Richard Newcombe, Zhao Dong, Zhengqin Li

We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. Feed-forward scene reconstruction has gained significant attention for its ability to rapidly create digital replicas of real-world environments. However, most existing models are limited to static scenes and fail to reconstruct the motion of moving objects. Developing a feed-forward model for dynamic scene reconstruction poses significant challenges, including the scarcity of training data and the need for appropriate 3D representations and training paradigms. To address these challenges, we introduce several key technical contributions: an enhanced large-scale synthetic dataset with ground-truth multi-view videos and dense 3D scene flow supervision; a per-pixel deformable 3D Gaussian representation that is easy to learn, supports high-quality dynamic view synthesis, and enables long-range 3D tracking; and a large transformer network that achieves real-time, generalizable dynamic scene reconstruction. Extensive qualitative and quantitative experiments demonstrate that DGS-LRM achieves dynamic scene reconstruction quality comparable to optimization-based methods, while significantly outperforming the state-of-the-art predictive dynamic reconstruction method on real-world examples. Its predicted physically grounded 3D deformation is accurate and can readily adapt for long-range 3D tracking tasks, achieving performance on par with state-of-the-art monocular video 3D tracking methods.


Article 310

Title@2025-06-11 (3): Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling

Title: Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling Flipping gegen alle Ungeraden: LLM-Münze Flip-Bias durch verbalisierte Ablehnungsstichproben reduzieren 逆势翻转:通过言语化拒绝采样减少LLM抛硬币偏差 2506.09998v1

Authors (6): Tim Z. Xiao, Johannes Zenn, Zhen Liu, Weiyang Liu, Robert Bamler, Bernhard Schölkopf

Large language models (LLMs) can often accurately describe probability distributions using natural language, yet they still struggle to generate faithful samples from them. This mismatch limits their use in tasks requiring reliable stochasticity, such as Monte Carlo methods, agent-based simulations, and randomized decision-making. We investigate this gap between knowledge and sampling in the context of Bernoulli distributions. We introduce Verbalized Rejection Sampling (VRS), a natural-language adaptation of classical rejection sampling that prompts the LLM to reason about and accept or reject proposed samples. Despite relying on the same Bernoulli mechanism internally, VRS substantially reduces sampling bias across models. We provide theoretical analysis showing that, under mild assumptions, VRS improves over direct sampling, with gains attributable to both the algorithm and prompt design. More broadly, our results show how classical probabilistic tools can be verbalized and embedded into LLM workflows to improve reliability, without requiring access to model internals or heavy prompt engineering.
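For reference, the classical rejection sampling that VRS adapts can be sketched for a Bernoulli target. This is the textbook algorithm only, under the assumption of a Bernoulli proposal. In VRS the accept/reject decision is made by the LLM through a prompt, which is not modeled here. The function and variable names are illustrative.

```python
import random


def bernoulli_pmf(p, x):
    """Probability of outcome x (0 or 1) under Bernoulli(p)."""
    return p if x == 1 else 1.0 - p


def rejection_sample_bernoulli(p_target, q_proposal, rng):
    """Draw one sample from Bernoulli(p_target) using Bernoulli(q_proposal)."""
    # m bounds the ratio target/proposal over both outcomes.
    m = max(p_target / q_proposal, (1.0 - p_target) / (1.0 - q_proposal))
    while True:
        x = 1 if rng.random() < q_proposal else 0
        accept_prob = bernoulli_pmf(p_target, x) / (m * bernoulli_pmf(q_proposal, x))
        if rng.random() < accept_prob:
            return x


rng = random.Random(0)
samples = [rejection_sample_bernoulli(0.25, 0.5, rng) for _ in range(20000)]
frequency_of_ones = sum(samples) / len(samples)
```

With a fair-coin proposal and a target of 0.25, the accepted samples should contain ones at a rate close to the target.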


Article 311

Title@2025-06-11 (3): EVINET: Towards Open-World Graph Learning via Evidential Reasoning Network

Title: EVINET: Towards Open-World Graph Learning via Evidential Reasoning Network EVINET: Auf dem Weg zum Open-World Graph Learning über Evidential Reasoning Network EVINET:通过证明理由网络走向开放世界图表学习 2506.07288v2

Authors (5): Weijie Guan, Haohui Wang, Jian Kang, Lihui Liu, Dawei Zhou

Graph learning has been crucial to many real-world tasks, but it is often studied under a closed-world assumption, in which all possible labels of the data are known a priori. To enable effective graph learning in an open and noisy environment, it is critical to inform model users when the model makes a wrong prediction on in-distribution data of a known class (misclassification detection) or when it encounters out-of-distribution data from novel classes (out-of-distribution detection). This paper introduces Evidential Reasoning Network (EVINET), a framework that addresses these two challenges by integrating Beta embedding within a subjective logic framework. EVINET includes two key modules: Dissonance Reasoning for misclassification detection and Vacuity Reasoning for out-of-distribution detection. Extensive experiments demonstrate that EVINET outperforms state-of-the-art methods across multiple metrics in the tasks of in-distribution classification, misclassification detection, and out-of-distribution detection. EVINET demonstrates the necessity of uncertainty estimation and logical reasoning for misclassification detection and out-of-distribution detection and paves the way for open-world graph learning. Our code and data are available at https://github.com/SSSKJ/EviNET.


Article 312

Title@2025-06-11 (3): Text-Aware Image Restoration with Diffusion Models

Title: Text-Aware Image Restoration with Diffusion Models Text-Aware Bildrestaurierung mit Diffusionsmodellen 带有传播模型的文本软件图像恢复 2506.09993v1

Authors (9): Jaewon Min, Jin Hyeon Kim, Paul Hyunbin Cho, Jaeeun Lee, Jihye Park, Minkyu Park, Sangpil Kim, Hyunhee Park, Seungryong Kim

Image restoration aims to recover degraded images. However, existing diffusion-based restoration methods, despite great success in natural image restoration, often struggle to faithfully reconstruct textual regions in degraded images. Those methods frequently generate plausible but incorrect text-like patterns, a phenomenon we refer to as text-image hallucination. In this paper, we introduce Text-Aware Image Restoration (TAIR), a novel restoration task that requires the simultaneous recovery of visual contents and textual fidelity. To tackle this task, we present SA-Text, a large-scale benchmark of 100K high-quality scene images densely annotated with diverse and complex text instances. Furthermore, we propose a multi-task diffusion framework, called TeReDiff, that integrates internal features from diffusion models into a text-spotting module, enabling both components to benefit from joint training. This allows for the extraction of rich text representations, which are utilized as prompts in subsequent denoising steps. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art restoration methods, achieving significant gains in text recognition accuracy. See our project page: https://cvlab-kaist.github.io/TAIR/


Article 313

Title@2025-06-11 (3): Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Title: Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Multiverse: Ihre Sprachmodelle entscheiden geheim, wie die Generation parallelisiert und zusammengeführt werden kann 多种语言:您的语言模型秘密决定如何平行和合并一代 2506.09991v1

Authors (5): Xinyu Yang, Yuwei An, Hongyi Liu, Tianqi Chen, Beidi Chen

Autoregressive Large Language Models (AR-LLMs) frequently exhibit implicit parallelism in sequential generation. Inspired by this, we introduce Multiverse, a new generative model that enables natively parallel generation. Multiverse internalizes a MapReduce paradigm, generating automatically through three stages: (i) a Map stage for adaptive task decomposition, (ii) a Process stage for parallel subtask execution, and (iii) a Reduce stage for lossless result synthesis. Next, we build a real-world Multiverse reasoning model with co-design of data, algorithm, and system, enabling rapid and seamless transfer from frontier AR-LLMs. Starting from sequential reasoning chains, we create Multiverse 1K by converting them into structured training data using an automated LLM-assisted pipeline, avoiding costly human annotations. Algorithmically, we design Multiverse Attention to separate parallel reasoning steps while keeping compatibility with causal attention for efficient training. Systematically, we implement Multiverse Engine to enable parallel inference. It features a dedicated scheduler that dynamically switches between sequential and parallel generation, triggered directly by the model. After a 3-hour fine-tuning with 1K examples, our Multiverse-32B stands as the only open-sourced non-AR model achieving performance on par with leading AR-LLMs of the same scale, evidenced by AIME24 & 25 scores of 54% and 46%, respectively. Moreover, our budget control experiments show that Multiverse-32B exhibits superior scaling, outperforming AR-LLMs by 1.87% on average using the same context length. Such scaling further leads to practical efficiency gain, achieving up to 2x speedup across varying batch sizes. We have open-sourced the entire Multiverse ecosystem, including data, model weights, engine, supporting tools, as well as complete data curation prompts and detailed training and evaluation recipes.


Article 314

Title@2025-06-11 (3): Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation

Title: Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation Chain-of-Action: Trajektorie Autoregressive Modellierung für die Robotermanipulation 操作链:机器人操纵的轨迹自动递减模型模型 2506.09990v1

Authors (10): Wenbo Zhang, Tianrun Hu, Yanyuan Qiao, Hanbo Zhang, Yuchu Qin, Yang Li, Jiajun Liu, Tao Kong, Lingqiao Liu, Xiao Ma

We present Chain-of-Action (CoA), a novel visuo-motor policy paradigm built upon Trajectory Autoregressive Modeling. Unlike conventional approaches that predict next-step action(s) forward in time, CoA generates an entire trajectory by explicit backward reasoning with task-specific goals through an action-level Chain-of-Thought (CoT) process. This process is unified within a single autoregressive structure: (1) the first token corresponds to a stable keyframe action that encodes the task-specific goals; and (2) subsequent action tokens are generated autoregressively, conditioned on the initial keyframe and previously predicted actions. This backward action reasoning enforces a global-to-local structure, allowing each local action to be tightly constrained by the final goal. To further realize the action reasoning structure, CoA incorporates four complementary designs: continuous action token representation; dynamic stopping for variable-length trajectory generation; reverse temporal ensemble; and multi-token prediction to balance action chunk modeling with global structure. As a result, CoA exhibits strong spatial generalization capabilities while preserving the flexibility and simplicity of a visuo-motor policy. Empirically, we observe that CoA achieves state-of-the-art performance across 60 RLBench tasks and 8 real-world manipulation tasks.


Article 315

Title@2025-06-11 (3): Spectral Image Tokenizer

Title: Spectral Image Tokenizer Spektraler Bild-Tokenizer 频谱图像调试器 2412.09607v2

Authors (3): Carlos Esteves, Mohammed Suhail, Ameesh Makadia

Image tokenizers map images to sequences of discrete tokens, and are a crucial component of autoregressive transformer-based image generation. The tokens are typically associated with spatial locations in the input image, arranged in raster scan order, which is not ideal for autoregressive modeling. In this paper, we propose to tokenize the image spectrum instead, obtained from a discrete wavelet transform (DWT), such that the sequence of tokens represents the image in a coarse-to-fine fashion. Our tokenizer brings several advantages: 1) it leverages that natural images are more compressible at high frequencies, 2) it can take and reconstruct images of different resolutions without retraining, 3) it improves the conditioning for next-token prediction – instead of conditioning on a partial line-by-line reconstruction of the image, it takes a coarse reconstruction of the full image, 4) it enables partial decoding where the first few generated tokens can reconstruct a coarse version of the image, 5) it enables autoregressive models to be used for image upsampling. We evaluate the tokenizer reconstruction metrics as well as multiscale image generation, text-guided image upsampling and editing.
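The coarse-to-fine ordering described above can be illustrated with a one-dimensional Haar wavelet transform. This is a simplified sketch, not the paper's tokenizer: it uses a 1D signal instead of a 2D image, picks the Haar wavelet as an assumption, and omits quantization into discrete tokens. All function names are illustrative.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (coarse) and half-differences (detail)."""
    coarse = [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]
    return coarse, detail


def haar_decompose(signal):
    """Return bands ordered coarse-to-fine; the signal length must be a power of two."""
    detail_bands = []
    current = list(signal)
    while len(current) > 1:
        current, detail = haar_step(current)
        detail_bands.append(detail)
    # current now holds the overall mean; the finest details were collected first.
    return [current] + detail_bands[::-1]


def haar_reconstruct(bands, length):
    """Invert the transform, treating any missing detail bands as zero."""
    current = list(bands[0])
    level = 1
    while len(current) < length:
        detail = bands[level] if level < len(bands) else [0.0] * len(current)
        expanded = []
        for c, d in zip(current, detail):
            expanded.extend([c + d, c - d])
        current = expanded
        level += 1
    return current


signal = [4, 2, 6, 8, 1, 3, 5, 7]
bands = haar_decompose(signal)
full = haar_reconstruct(bands, len(signal))        # uses every band
coarse = haar_reconstruct(bands[:2], len(signal))  # uses only the first two bands
```

Decoding only a prefix of the bands yields a blocky, low-resolution version of the signal, which mirrors the partial-decoding property listed in the abstract.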


Article 316

Title@2025-06-11 (3): EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits

Title: EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits EditInspector: Ein Benchmark für die Auswertung von textgeführten Bildbearbeitungen 编辑监视器:文本指导图像编辑评价基准 2506.09988v1

Authors (4): Ron Yosef, Moran Yanuka, Yonatan Bitton, Dani Lischinski

Text-guided image editing, fueled by recent advancements in generative AI, is becoming increasingly widespread. This trend highlights the need for a comprehensive framework to verify text-guided edits and assess their quality. To address this need, we introduce EditInspector, a novel benchmark for evaluation of text-guided image edits, based on human annotations collected using an extensive template for edit verification. We leverage EditInspector to evaluate the performance of state-of-the-art (SoTA) vision and language models in assessing edits across various dimensions, including accuracy, artifact detection, visual quality, seamless integration with the image scene, adherence to common sense, and the ability to describe edit-induced changes. Our findings indicate that current models struggle to evaluate edits comprehensively and frequently hallucinate when describing the changes. To address these challenges, we propose two novel methods that outperform SoTA models in both artifact detection and difference caption generation.


Article 317

Title@2025-06-11 (3): Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs

Title: Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs Omni-DPO: Ein Dual-Perspektiven-Paradigma für dynamisches Preference-Lernen von LLMs Omni-DPO:一个双双面范式,以动态方式学习LLMs 2506.10054v1

Authors (10): Shangpin Peng, Weinong Wang, Zhuotao Tian, Senqiao Yang, Xing Wu, Haotian Xu, Chengquan Zhang, Takashi Isobe, Baotian Hu, Min Zhang

Direct Preference Optimization (DPO) has become a cornerstone of reinforcement learning from human feedback (RLHF) due to its simplicity and efficiency. However, existing DPO-based approaches typically treat all preference pairs uniformly, ignoring critical variations in their inherent quality and learning utility, leading to suboptimal data utilization and performance. To address this challenge, we propose Omni-DPO, a dual-perspective optimization framework that jointly accounts for (1) the inherent quality of each preference pair and (2) the model’s evolving performance on those pairs. By adaptively weighting samples according to both data quality and the model’s learning dynamics during training, Omni-DPO enables more effective training data utilization and achieves better performance. Experimental results on various models and benchmarks demonstrate the superiority and generalization capabilities of Omni-DPO. On textual understanding tasks, Gemma-2-9b-it finetuned with Omni-DPO beats the leading LLM, Claude 3 Opus, by a significant margin of 6.7 points on the Arena-Hard benchmark. On mathematical reasoning tasks, Omni-DPO consistently outperforms the baseline methods across all benchmarks, providing strong empirical evidence for the effectiveness and robustness of our approach. Code and models will be available at https://github.com/pspdada/Omni-DPO.


Article 318

Title@2025-06-11 (3): A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs

Title: A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs Ein Shortcut-aware-Video-QA-Benchmark für physisches Verständnis über minimale Video-Paare 通过最小视频对等器进行身体理解的捷径视距视频QA基准 2506.09987v1

Authors (7): Benno Krojer, Mojtaba Komeili, Candace Ross, Quentin Garrido, Koustuv Sinha, Nicolas Ballas, Mahmoud Assran

Existing benchmarks for assessing the spatio-temporal understanding and reasoning abilities of video language models are susceptible to score inflation due to the presence of shortcut solutions based on superficial visual or textual cues. This paper mitigates the challenges in accurately assessing model performance by introducing the Minimal Video Pairs (MVP) benchmark, a simple shortcut-aware video QA benchmark for assessing the physical understanding of video language models. The benchmark comprises 55K high-quality multiple-choice video QA examples focusing on physical world understanding. Examples are curated from nine video data sources, spanning first-person egocentric and exocentric videos, robotic interaction data, and cognitive science intuitive physics benchmarks. To mitigate shortcut solutions that rely on superficial visual or textual cues and biases, each sample in MVP has a minimal-change pair – a visually similar video accompanied by an identical question but an opposing answer. To answer a question correctly, a model must provide correct answers for both examples in the minimal-change pair; as such, models that solely rely on visual or textual biases would achieve below-random performance. Human performance on MVP is 92.9%, while the best open-source state-of-the-art video-language model achieves 40.2% compared to random performance at 25%.
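The paired scoring rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's evaluation code: each question is taken to have two answer options (which matches the reported 25% random baseline, since 0.5 × 0.5 = 0.25), and the answers and predictions are made up.

```python
def paired_accuracy(predictions, answers):
    """Fraction of pairs where BOTH videos in the pair are answered correctly."""
    correct = sum(
        1
        for (pred_a, pred_b), (ans_a, ans_b) in zip(predictions, answers)
        if pred_a == ans_a and pred_b == ans_b
    )
    return correct / len(answers)


def single_accuracy(predictions, answers):
    """Ordinary per-video accuracy, ignoring the pairing."""
    flat_pred = [p for pair in predictions for p in pair]
    flat_ans = [a for pair in answers for a in pair]
    return sum(p == a for p, a in zip(flat_pred, flat_ans)) / len(flat_ans)


# Each pair shares one question; the two videos have opposing answers.
answers = [("A", "B"), ("B", "A"), ("A", "B"), ("B", "A")]

# A hypothetical shortcut model that answers from the question text alone
# gives the same answer to both videos in a pair.
text_only_predictions = [("A", "A"), ("A", "A"), ("B", "B"), ("B", "B")]

shortcut_single = single_accuracy(text_only_predictions, answers)
shortcut_paired = paired_accuracy(text_only_predictions, answers)
```

In this toy setup the shortcut model looks like chance under per-video accuracy but scores zero under the paired rule, which is the below-random behavior the abstract describes.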


Article 319

Title@2025-06-11 (3): V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Title: V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning V-JEPA 2: Selbstüberwachte Videomodelle ermöglichen Verständnis, Vorhersage und Planung V-JEPA2:自我监督的视频模型,促进理解、预测和规划 2506.09985v1

Authors (29): Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Mojtaba Komeili, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, Sergio Arnaud, Abha Gejji, Ada Martin, Francois Robert Hogan, Daniel Dugas, Piotr Bojanowski, Vasil Khalidov, Patrick Labatut, Francisco Massa, Marc Szafraniec, Kapil Krishnakumar, Yong Li, Xiaodong Ma, Sarath Chandar, Franziska Meier, Yann LeCun, Michael Rabbat, Nicolas Ballas

A major challenge for modern AI is to learn to understand the world and learn to act largely by observation. This paper explores a self-supervised approach that combines internet-scale video data with a small amount of interaction data (robot trajectories), to develop models capable of understanding, predicting, and planning in the physical world. We first pre-train an action-free joint-embedding-predictive architecture, V-JEPA 2, on a video and image dataset comprising over 1 million hours of internet video. V-JEPA 2 achieves strong performance on motion understanding (77.3 top-1 accuracy on Something-Something v2) and state-of-the-art performance on human action anticipation (39.7 recall-at-5 on Epic-Kitchens-100) surpassing previous task-specific models. Additionally, after aligning V-JEPA 2 with a large language model, we demonstrate state-of-the-art performance on multiple video question-answering tasks at the 8 billion parameter scale (e.g., 84.0 on PerceptionTest, 76.9 on TempCompass). Finally, we show how self-supervised learning can be applied to robotic planning tasks by post-training a latent action-conditioned world model, V-JEPA 2-AC, using less than 62 hours of unlabeled robot videos from the Droid dataset. We deploy V-JEPA 2-AC zero-shot on Franka arms in two different labs and enable picking and placing of objects using planning with image goals. Notably, this is achieved without collecting any data from the robots in these environments, and without any task-specific training or reward. This work demonstrates how self-supervised learning from web-scale data and a small amount of robot interaction data can yield a world model capable of planning in the physical world.


Article 320

Title@2025-06-11 (3): Trustworthy AI: Safety, Bias, and Privacy – A Survey

Title: Trustworthy AI: Safety, Bias, and Privacy – A Survey Vertrauenswürdige KI: Sicherheit, Bias und Privatsphäre – Eine Umfrage 值得信赖的人工智能:安全、偏见与隐私 – 综述 2502.10450v2

Authors (4): Xingli Fang, Jianwei Li, Varun Mulchandani, Jung-Eun Kim

The capabilities of artificial intelligence systems have been advancing to a great extent, but these systems still struggle with failure modes, vulnerabilities, and biases. In this paper, we study the current state of the field, and present promising insights and perspectives regarding concerns that challenge the trustworthiness of AI models. In particular, this paper investigates the issues regarding three thrusts: safety, privacy, and bias, which hurt models’ trustworthiness. For safety, we discuss safety alignment in the context of large language models, preventing them from generating toxic or harmful content. For bias, we focus on spurious biases that can mislead a network. Lastly, for privacy, we cover membership inference attacks in deep neural networks. The discussions addressed in this paper reflect our own experiments and observations.


Article 321

Title@2025-06-11 (3): Resa: Transparent Reasoning Models via SAEs

Title: Resa: Transparent Reasoning Models via SAEs Resa: Transparente Begründungsmodelle über SAE Resa:通过SAEs建立透明说明理由模型 2506.09967v1

Authors (7): Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Deqing Fu, Willie Neiswanger

How cost-effectively can we elicit strong reasoning in language models by leveraging their underlying representations? We answer this question with Resa, a family of 1.5B reasoning models trained via a novel and efficient sparse autoencoder tuning (SAE-Tuning) procedure. This method first trains an SAE to capture reasoning abilities from a source model, and then uses the trained SAE to guide a standard supervised fine-tuning process to elicit such abilities in a target model, all using verified question-answer data without any reasoning traces. Notably, when applied to certain base models before further RL post-training, SAE-Tuning retains >97% of its RL-trained counterpart’s reasoning performance while reducing training costs by >2000x to roughly $1 and training time by >450x to around 20 minutes. Furthermore, when applied to lightly RL-trained models (e.g., within 1 hour on 2 GPUs), it enables reasoning performance such as 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23 for only around $1 additional cost. Surprisingly, the reasoning abilities extracted via SAEs are potentially both generalizable and modular. Generality means abilities extracted from one dataset still elevate performance on a larger and overlapping corpus. Modularity means abilities extracted from Qwen or Qwen-Math can be attached to the R1-Distill model at test time, without any retraining, and yield comparable gains. Extensive ablations validate these findings and all artifacts are fully open-sourced.


Article 322

Title@2025-06-11 (3): Gradient Aligned Regression via Pairwise Losses

Title: Gradient Aligned Regression via Pairwise Losses Gradient Aligned Regression über Paarweise Verluste 通过对等损失逐步走向一致后退 2402.06104v6

Authors (3): Dixian Zhu, Tianbao Yang, Livnat Jerby

Regression is a fundamental task in machine learning that has garnered extensive attention over the past decades. The conventional approach to regression employs loss functions that primarily concentrate on aligning the model prediction with the ground truth for each individual data sample. Recent research has introduced novel perspectives by incorporating label similarity into regression through extra pairwise regularization on the latent feature space, and has demonstrated its effectiveness. However, these approaches have two drawbacks: i) their pairwise operation in latent feature space is computationally more expensive than conventional regression losses; ii) they lack theoretical justification for such regularization. In this work, we propose GAR (Gradient Aligned Regression) as a competitive alternative method in label space, which consists of a conventional regression loss and two pairwise label difference losses for gradient alignment, covering magnitude and direction. GAR enjoys: i) the same level of efficiency as a conventional regression loss, because the quadratic complexity of the proposed pairwise losses can be reduced to linear complexity; ii) theoretical insights connecting learning the pairwise label difference to learning the gradient of the ground truth function. We limit our current scope to regression in the clean data setting, without noise, outliers, or distributional shifts. We demonstrate the effectiveness of the proposed method on two synthetic datasets and on eight real-world tasks from six benchmark datasets, against eight other competitive baselines. Running-time experiments demonstrate the superior efficiency of the proposed GAR over existing methods with pairwise regularization in latent feature space, and ablation studies demonstrate the effectiveness of each component of GAR.
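The claim that a pairwise loss can be computed in linear time can be illustrated with a squared pairwise label-difference loss. This sketch is an assumption-laden example and not necessarily one of GAR's exact losses: it relies on the standard identity that the mean over all pairs of $((\hat y_i - \hat y_j) - (y_i - y_j))^2$ equals twice the variance of the per-sample errors $\hat y_i - y_i$.

```python
import random


def pairwise_diff_loss_quadratic(preds, labels):
    """Mean squared mismatch of pairwise differences, computed over all n^2 pairs."""
    n = len(preds)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += ((preds[i] - preds[j]) - (labels[i] - labels[j])) ** 2
    return total / (n * n)


def pairwise_diff_loss_linear(preds, labels):
    """Same quantity in O(n): twice the (population) variance of the errors."""
    n = len(preds)
    errors = [p - y for p, y in zip(preds, labels)]
    mean_error = sum(errors) / n
    variance = sum((e - mean_error) ** 2 for e in errors) / n
    return 2.0 * variance


rng = random.Random(0)
labels = [rng.uniform(-1.0, 1.0) for _ in range(50)]
preds = [y + rng.gauss(0.0, 0.3) for y in labels]

slow = pairwise_diff_loss_quadratic(preds, labels)
fast = pairwise_diff_loss_linear(preds, labels)
```

Both functions return the same value up to floating-point error, but the second touches each sample only a constant number of times.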


Article 323

Title@2025-06-11 (3): Griffin: Towards a Graph-Centric Relational Database Foundation Model

Title: Griffin: Towards a Graph-Centric Relational Database Foundation Model Griffin: Auf dem Weg zu einem Graph-Centric Relational Database Foundation Modell 格里芬:走向图形中心关系数据库基础模型 2505.05568v2

Authors (7): Yanbo Wang, Xiyuan Wang, Quan Gan, Minjie Wang, Qibin Yang, David Wipf, Muhan Zhang

We introduce Griffin, the first attempt at a foundation model designed specifically for Relational Databases (RDBs). Unlike previous smaller models focused on single RDB tasks, Griffin unifies the data encoder and task decoder to handle diverse tasks. Additionally, we enhance the architecture by incorporating a cross-attention module and a novel aggregator. Griffin utilizes pretraining on both single-table and RDB datasets, employing advanced encoders for categorical, numerical, and metadata features, along with innovative components such as cross-attention modules and enhanced message-passing neural networks (MPNNs) to capture the complexities of relational data. Evaluated on large-scale, heterogeneous, and temporal graphs extracted from RDBs across various domains (spanning over 150 million nodes), Griffin demonstrates superior or comparable performance to individually trained models, excels in low-data scenarios, and shows strong transferability with similarity and diversity in pretraining across new datasets and tasks, highlighting its potential as a universally applicable foundation model for RDBs. Code is available at https://github.com/yanxwb/Griffin.


Article 324

Title@2025-06-11 (3): Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models

Title: Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models Provable Benefits of Unüberwachte Pre-Training and Transfer Learning via Single-Index Models 通过单一指标模式进行的未受监督的培训和转让前学习的惠益 2502.16849v2

Authors (3): Taj Jones-McCormick, Aukosh Jagannath, Subhabrata Sen

Unsupervised pre-training and transfer learning are commonly used techniques to initialize training algorithms for neural networks, particularly in settings with limited labeled data. In this paper, we study the effects of unsupervised pre-training and transfer learning on the sample complexity of high-dimensional supervised learning. Specifically, we consider the problem of training a single-layer neural network via online stochastic gradient descent. We establish that pre-training and transfer learning (under concept shift) reduce sample complexity by polynomial factors (in the dimension) under very general assumptions. We also uncover some surprising settings where pre-training grants exponential improvement over random initialization in terms of sample complexity.


Article 325

Title@2025-06-11 (3): Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy

Title: Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy Kvasir-VQA-x1: Ein multimodaler Datensatz für medizinische Vernunft und robustes MedVQA in der gastrointestinalen Endoskopie Kvasir-VQA-x1:一个用于医疗理由的多模式数据集和气肠内肠内镜检查中的强力MDVQA 2506.09958v1

Authors (3): Sushant Gautam, Michael A. Riegler, Pål Halvorsen

Medical Visual Question Answering (MedVQA) is a promising field for developing clinical decision support systems, yet progress is often limited by the available datasets, which can lack clinical complexity and visual diversity. To address these gaps, we introduce Kvasir-VQA-x1, a new, large-scale dataset for gastrointestinal (GI) endoscopy. Our work significantly expands upon the original Kvasir-VQA by incorporating 159,549 new question-answer pairs that are designed to test deeper clinical reasoning. We developed a systematic method using large language models to generate these questions, which are stratified by complexity to better assess a model’s inference capabilities. To ensure our dataset prepares models for real-world clinical scenarios, we have also introduced a variety of visual augmentations that mimic common imaging artifacts. The dataset is structured to support two main evaluation tracks: one for standard VQA performance and another to test model robustness against these visual perturbations. By providing a more challenging and clinically relevant benchmark, Kvasir-VQA-x1 aims to accelerate the development of more reliable and effective multimodal AI systems for use in clinical settings. The dataset is fully accessible and adheres to FAIR data principles, making it a valuable resource for the wider research community. Code and data: https://github.com/Simula/Kvasir-VQA-x1 and https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1



Article 326

Title@2025-06-11 (3): Canonical Latent Representations in Conditional Diffusion Models

Title: Canonical Latent Representations in Conditional Diffusion Models Kanonische Latentendarstellungen in bedingten Diffusionsmodellen 有条件扩散模型中卡尼卡内端表示器 2506.09955v1

Authors (4): Yitao Xu, Tong Zhang, Ehsan Pajouheshgar, Sabine Süsstrunk

Conditional diffusion models (CDMs) have shown impressive performance across a range of generative tasks. Their ability to model the full data distribution has opened new avenues for analysis-by-synthesis in downstream discriminative learning. However, this same modeling capacity causes CDMs to entangle the class-defining features with irrelevant context, posing challenges to extracting robust and interpretable representations. To this end, we identify Canonical LAtent Representations (CLAReps), latent codes whose internal CDM features preserve essential categorical information while discarding non-discriminative signals. When decoded, CLAReps produce representative samples for each class, offering an interpretable and compact summary of the core class semantics with minimal irrelevant details. Exploiting CLAReps, we develop a novel diffusion-based feature-distillation paradigm, CaDistill. While the student has full access to the training set, the CDM as teacher transfers core class knowledge only via CLAReps, which amounts to merely 10 % of the training data in size. After training, the student achieves strong adversarial robustness and generalization ability, focusing more on the class signals instead of spurious background cues. Our findings suggest that CDMs can serve not just as image generators but also as compact, interpretable teachers that can drive robust representation learning.



Article 327

Title@2025-06-11 (3): Scalable First-order Method for Certifying Optimal k-Sparse GLMs

Title: Scalable First-order Method for Certifying Optimal k-Sparse GLMs Skalierbare Methode erster Ordnung zur Zertifizierung optimaler k-Spar-GLMs 可缩放的核证最佳 k- Sparse GLM 最佳 GLM 的可缩放第一等级方法 2502.09502v3

Authors (3): Jiachang Liu, Soroosh Shafiee, Andrea Lodi

This paper investigates the problem of certifying optimality for sparse generalized linear models (GLMs), where sparsity is enforced through an $\ell_0$ cardinality constraint. While branch-and-bound (BnB) frameworks can certify optimality by pruning nodes using dual bounds, existing methods for computing these bounds are either computationally intensive or exhibit slow convergence, limiting their scalability to large-scale problems. To address this challenge, we propose a first-order proximal gradient algorithm designed to solve the perspective relaxation of the problem within a BnB framework. Specifically, we formulate the relaxed problem as a composite optimization problem and demonstrate that the proximal operator of the non-smooth component can be computed exactly in log-linear time complexity, eliminating the need to solve a computationally expensive second-order cone program. Furthermore, we introduce a simple restart strategy that enhances convergence speed while maintaining low per-iteration complexity. Extensive experiments on synthetic and real-world datasets show that our approach significantly accelerates dual bound computations and is highly effective in providing optimality certificates for large-scale problems.



Article 328

Title@2025-06-11 (3): 7B Fully Open Source Moxin-LLM/VLM – From Pretraining to GRPO-based Reinforcement Learning Enhancement

Title: 7B Fully Open Source Moxin-LLM/VLM – From Pretraining to GRPO-based Reinforcement Learning Enhancement 7B Vollständig offene Quelle Moxin-LLM/VLM – Vom Vortraining zur GRPO-basierten Verstärkungs-Lernverbesserung 7B 完全开放源代码 Moxin-LLM/VLM – – 从预培训到基于GROP的强化学习加强 2412.06845v5

Authors (18): Pu Zhao, Xuan Shen, Zhenglun Kong, Yixin Shen, Sung-En Chang, Timothy Rupprecht, Lei Lu, Enfu Nan, Changdi Yang, Yumei He, Weiyan Shi, Xingchen Xu, Yu Huang, Wei Jiang, Wei Wang, Yue Chen, Yong He, Yanzhi Wang

Recently, Large Language Models (LLMs) have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary LLMs like GPT-4 and GPT-o1, which have captured widespread attention in the AI community due to their remarkable performance and versatility. Simultaneously, open-source LLMs, such as LLaMA, have contributed greatly to the ever-increasing popularity of LLMs because they are easy to customize and deploy across diverse applications. Although open-source LLMs present unprecedented opportunities for innovation and research, the commercialization of LLMs has raised concerns about transparency, reproducibility, and safety. Many open-source LLMs fail to meet fundamental transparency requirements by withholding essential components like training code and data, which may hinder further innovations on LLMs. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed in adherence to the principles of open science, open source, open data, and open access. We release the pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints, aiming to make a continuing commitment to fully open-source LLMs. After pre-training the base model, we finetune the Moxin Base model with a SOTA post-training framework and instruction data to obtain the Moxin Instruct model. To improve the reasoning capability, we further finetune our Instruct model with chain-of-thought data distilled from DeepSeek R1, and then use Group Relative Policy Optimization (GRPO) following DeepSeek R1 to finetune our model, leading to the Moxin Reasoning model. Moreover, we develop our vision language model based on our Moxin model. Experiments show that our models achieve superior performance in various evaluations such as zero-shot evaluation, few-shot evaluation, and CoT evaluation.



Article 329

Title@2025-06-11 (3): Curvature Tuning: Provable Training-free Model Steering From a Single Parameter

Title: Curvature Tuning: Provable Training-free Model Steering From a Single Parameter Curvature Tuning: Vorstellbare trainingsfreie Modelllenkung aus einem einzigen Parameter 曲线图示:从单一参数看可实现的无培训示范指导 2502.07783v4

Authors (3): Leyang Hu, Matteo Gamba, Randall Balestriero

The scaling of model and data sizes has reshaped the AI landscape, establishing finetuning pretrained models as the standard paradigm for solving downstream tasks. However, dominant finetuning methods typically rely on weight adaptation, often lack interpretability, and depend on heuristically chosen hyperparameters. In this paper, we take a different perspective and shift the focus from weights to activation functions, viewing them through the lens of spline operators. We propose Curvature Tuning (CT), an interpretable and principled steering method that modulates a model’s decision boundary by injecting a single hyperparameter into its activation functions. We show that CT provably adjusts model decision boundary curvature and, more fundamentally, projects a model onto a space of smooth functions, thereby complementing current finetuning methods, whose effect lies primarily in feature adaptation. Making this hyperparameter trainable gives rise to a novel and highly parameter-efficient finetuning method. Empirically, CT improves both generalization and robustness. For example, it boosts downstream accuracy of ResNet-50/152 by 7.14%/8.46% over linear probing and 4.64%/1.70% over LoRA across 12 datasets, and improves robust accuracy on the $\ell_\infty$ benchmark from RobustBench by 1032.64%/1494.46%. Our code is available at https://github.com/Leon-Leyang/curvature-tuning.



Article 330

Title@2025-06-11 (3): The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability

Title: The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability Die Komplexität der Online-Strategischen Entscheidungsfindung mit Informationen Asymmetrie und Wissenstransportierbarkeit 在线战略决策与信息不对称和知识可传输性 2506.09940v1

Authors (7): Jiachen Hu, Rui Ai, Han Zhong, Xiaoyu Chen, Liwei Wang, Zhaoran Wang, Zhuoran Yang

Information asymmetry is a pervasive feature of multi-agent systems, especially evident in economics and social sciences. In these settings, agents tailor their actions based on private information to maximize their rewards. These strategic behaviors often introduce complexities due to confounding variables. Simultaneously, knowledge transportability poses another significant challenge, arising from the difficulties of conducting experiments in target environments. It requires transferring knowledge from environments where empirical data is more readily available. Against these backdrops, this paper explores a fundamental question in online learning: Can we employ non-i.i.d. actions to learn about confounders even when requiring knowledge transfer? We present a sample-efficient algorithm designed to accurately identify system dynamics under information asymmetry and to navigate the challenges of knowledge transfer effectively in reinforcement learning, framed within an online strategic interaction model. Our method provably achieves learning of an $\epsilon$-optimal policy with a tight sample complexity of $O(1/\epsilon^2)$.



Article 331

Title@2025-06-11 (3): Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order

Title: Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order Leveraging Coordinate Momentum in SignSGD und Muon: Speicheroptimierte Null-Order 利用 SignSGD 和 Muon 中的调控坐标动态: 内存适应的零点 2506.04430v2

Authors (8): Egor Petrov, Grigoriy Evseev, Aleksey Antonov, Andrey Veprikov, Pavel Plyusnin, Nikolay Bushkov, Stanislav Moiseev, Aleksandr Beznosikov

Fine-tuning Large Language Models (LLMs) is essential for adapting pre-trained models to downstream tasks. Yet traditional first-order optimizers such as Stochastic Gradient Descent (SGD) and Adam incur prohibitive memory and computational costs that scale poorly with model size. In this paper, we investigate zero-order (ZO) optimization methods as a memory- and compute-efficient alternative, particularly in the context of parameter-efficient fine-tuning techniques like LoRA. We propose $\texttt{JAGUAR SignSGD}$, a ZO momentum-based algorithm that extends ZO SignSGD, requiring the same number of parameters as the standard ZO SGD and only $\mathcal{O}(1)$ function evaluations per iteration. To the best of our knowledge, this is the first study to establish rigorous convergence guarantees for SignSGD in the stochastic ZO case. We further propose $\texttt{JAGUAR Muon}$, a novel ZO extension of the Muon optimizer that leverages the matrix structure of model parameters, and we provide its convergence rate under arbitrary stochastic noise. Through extensive experiments on challenging LLM fine-tuning benchmarks, we demonstrate that the proposed algorithms meet or exceed the convergence quality of standard first-order methods, achieving significant memory reduction. Our theoretical and empirical results establish new ZO optimization methods as a practical and theoretically grounded approach for resource-constrained LLM adaptation. Our code is available at https://github.com/brain-mmo-lab/ZO_LLM
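The general idea of a zero-order sign update with momentum can be sketched as follows. This is a minimal, generic illustration and not the JAGUAR SignSGD algorithm from the paper: the two-point random-direction estimator, the function name `zo_sign_sgd`, and all default hyperparameters are assumptions chosen for the example.

```python
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def quadratic(x):
    """Toy objective used only for illustration: f(x) = sum of x_i^2."""
    return sum(v * v for v in x)


def zo_sign_sgd(f, x, steps=200, lr=0.05, tau=1e-3, beta=0.9, seed=0):
    """Generic zero-order sign descent with momentum (hypothetical sketch).

    Only function values of f are used; no gradients are computed.
    """
    rng = random.Random(seed)
    x = list(x)              # work on a copy of the starting point
    m = [0.0] * len(x)       # momentum buffer, same size as the parameters
    for _ in range(steps):
        # Random +/-1 direction along which the objective is probed.
        e = [rng.choice((-1.0, 1.0)) for _ in x]
        x_plus = [xi + tau * ei for xi, ei in zip(x, e)]
        x_minus = [xi - tau * ei for xi, ei in zip(x, e)]
        # Two function evaluations give a directional-derivative estimate.
        d = (f(x_plus) - f(x_minus)) / (2.0 * tau)
        g = [d * ei for ei in e]
        # Exponential moving average of the noisy gradient estimates.
        m = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g)]
        # Sign step: only the sign of the momentum is used.
        x = [xi - lr * sign(mi) for xi, mi in zip(x, m)]
    return x
```

Each iteration of this sketch uses two function evaluations regardless of the parameter dimension, which is the sense in which zero-order methods can keep the per-iteration evaluation count constant. The memory overhead here is one momentum buffer the size of the parameters.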



Article 332

Title@2025-06-11 (3): Limits of nonlinear and dispersive fiber propagation for an optical fiber-based extreme learning machine

Title: Limits of nonlinear and dispersive fiber propagation for an optical fiber-based extreme learning machine Grenzen der nichtlinearen und dispersiven Faserausbreitung für eine optische Faser-basierte extreme Lernmaschine 光纤极端学习机器的非线性和分散性纤维传播限制 2503.03649v3

Authors (8): Andrei V. Ermolaev, Mathilde Hary, Lev Leybov, Piotr Ryczkowski, Anas Skalli, Daniel Brunner, Goëry Genty, John M. Dudley

We report a generalized nonlinear Schrödinger equation simulation model of an extreme learning machine (ELM) based on optical fiber propagation. Using the MNIST handwritten digit dataset as a benchmark, we study how accuracy depends on propagation dynamics, as well as parameters governing spectral encoding, readout, and noise. For this dataset and with quantum noise limited input, test accuracies of over 91% and 93% are found for propagation in the anomalous and normal dispersion regimes respectively. Our results also suggest that quantum noise on the input pulses introduces an intrinsic penalty to ELM performance.



Article 333

Title@2025-06-11 (3): CaLMQA: Exploring culturally specific long-form question answering across 23 languages

Title: CaLMQA: Exploring culturally specific long-form question answering across 23 languages CaLMQA: Erforschen von kulturell spezifischen Langzeitfragen in 23 Sprachen CALMQA:探讨以23种语言回答文化上特有的长式问题 2406.17761v3

Authors (6): Shane Arora, Marzena Karpinska, Hung-Ting Chen, Ipsita Bhattacharjee, Mohit Iyyer, Eunsol Choi

Despite rising global usage of large language models (LLMs), their ability to generate long-form answers to culturally specific questions remains unexplored in many languages. To fill this gap, we perform the first study of textual multilingual long-form QA by creating CaLMQA, a dataset of 51.7K culturally specific questions across 23 different languages. We define culturally specific questions as those that refer to concepts unique to one or a few cultures, or have different answers depending on the cultural or regional context. We obtain these questions by crawling naturally-occurring questions from community web forums in high-resource languages, and by hiring native speakers to write questions in under-resourced, rarely-studied languages such as Fijian and Kirundi. Our data collection methodologies are translation-free, enabling the collection of culturally unique questions like “Kuber iki umwami wa mbere w’uburundi yitwa Ntare?” (Kirundi; English translation: “Why was the first king of Burundi called Ntare (Lion)?”). We evaluate factuality, relevance and surface-level quality of LLM-generated long-form answers, finding that (1) for many languages, even the best models make critical surface-level errors (e.g., answering in the wrong language, repetition), especially for low-resource languages; and (2) answers to culturally specific questions contain more factual errors than answers to culturally agnostic questions – questions that have consistent meaning and answer across many cultures. We release CaLMQA to facilitate future research in cultural and multilingual long-form QA.



Article 334

Title@2025-06-11 (3): Generate-then-Verify: Reconstructing Data from Limited Published Statistics

Title: Generate-then-Verify: Reconstructing Data from Limited Published Statistics Generate-then-Verify: Rekonstruieren von Daten aus begrenzten veröffentlichten Statistiken 生成 - - - - - - - 核查:重新构建来自有限出版统计数据的数据 2504.21199v2

Authors (5): Terrance Liu, Eileen Xiao, Adam Smith, Pratiksha Thaker, Zhiwei Steven Wu

We study the problem of reconstructing tabular data from aggregate statistics, in which the attacker aims to identify interesting claims about the sensitive data that can be verified with 100% certainty given the aggregates. Successful attempts in prior work have conducted studies in settings where the set of published statistics is rich enough that entire datasets can be reconstructed with certainty. In our work, we instead focus on the regime where many possible datasets match the published statistics, making it impossible to reconstruct the entire private dataset perfectly (i.e., when approaches in prior work fail). We propose the problem of partial data reconstruction, in which the goal of the adversary is to instead output a $\textit{subset}$ of rows and/or columns that are $\textit{guaranteed to be correct}$. We introduce a novel integer programming approach that first $\textbf{generates}$ a set of claims and then $\textbf{verifies}$ whether each claim holds for all possible datasets consistent with the published aggregates. We evaluate our approach on the housing-level microdata from the U.S. Decennial Census release, demonstrating that privacy violations can still persist even when information published about such data is relatively sparse.



Article 335

Title@2025-06-11 (3): Discovering Forbidden Topics in Language Models

Title: Discovering Forbidden Topics in Language Models Verbotene Themen in Sprachmodellen entdecken 发现语言模型中的禁止专题 2505.17441v3

Authors (4): Can Rager, Chris Wendler, Rohit Gandikota, David Bau

Refusal discovery is the task of identifying the full set of topics that a language model refuses to discuss. We introduce this new problem setting and develop a refusal discovery method, Iterated Prefill Crawler (IPC), that uses token prefilling to find forbidden topics. We benchmark IPC on Tulu-3-8B, an open-source model with public safety tuning data. Our crawler manages to retrieve 31 out of 36 topics within a budget of 1000 prompts. Next, we scale the crawler to a frontier model using the prefilling option of Claude-Haiku. Finally, we crawl three widely used open-weight models: Llama-3.3-70B and two of its variants finetuned for reasoning, DeepSeek-R1-70B and Perplexity-R1-1776-70B. DeepSeek-R1-70B reveals patterns consistent with censorship tuning: the model exhibits “thought suppression” behavior that indicates memorization of CCP-aligned responses. Although Perplexity-R1-1776-70B is robust to censorship, IPC elicits CCP-aligned refusal answers in the quantized model. Our findings highlight the critical need for refusal discovery methods to detect biases, boundaries, and alignment failures of AI systems.



Article 336

Title@2025-06-11 (3): RNE: a plug-and-play framework for diffusion density estimation and inference-time control

Title: RNE: a plug-and-play framework for diffusion density estimation and inference-time control RNE: ein Plug-and-Play-Framework für Diffusionsdichteschätzung und Inferenzzeitkontrolle RNE:用于传播密度估计和推断时间控制的一个插件和操作框架 2506.05668v2

Authors (4): Jiajun He, José Miguel Hernández-Lobato, Yuanqi Du, Francisco Vargas

In this paper, we introduce the Radon-Nikodym Estimator (RNE), a flexible, plug-and-play framework for diffusion inference-time density estimation and control, based on the concept of the density ratio between path distributions. RNE connects and unifies a variety of existing density estimation and inference-time control methods under a single, intuitive perspective that stems from basic variational inference and probabilistic principles, therefore offering both theoretical clarity and practical versatility. Experiments demonstrate that RNE delivers strong results in diffusion density estimation and offers broad applicability to inference-time control tasks (such as annealing, diffusion model composition, and reward-tilting) with promising inference-time scaling performance.



Article 337

Title@2025-06-11 (3): Bayesian Probabilistic Matrix Factorization

Title: Bayesian Probabilistic Matrix Factorization Bayesische probabilistische Matrix-Fabrikisierung Bayesian 概率矩阵 2506.09928v1

Authors (2): Ruixuan Xu, Xiangxiang Weng

Matrix factorization is a widely used technique in recommendation systems. Probabilistic Matrix Factorization (PMF) [1] extends traditional matrix factorization by incorporating probability distributions over latent factors, allowing for uncertainty quantification. However, computing the posterior distribution is intractable due to the high-dimensional integral. To address this, we employ two Bayesian inference methods, Markov Chain Monte Carlo (MCMC) [2] and Variational Inference (VI) [3], to approximate the posterior. We evaluate their performance on the MovieLens dataset and compare their convergence speed, predictive accuracy, and computational efficiency. Experimental results demonstrate that VI offers faster convergence, while MCMC provides more accurate posterior estimates.



Article 338

Title@2025-06-11 (3): Mamba time series forecasting with uncertainty quantification

Title: Mamba time series forecasting with uncertainty quantification Mamba-Zeitreihenvorhersage mit Unsicherheitsquantifizierung Mamba时间序列预测,带有不确定性量化的Mamba时间序列 2503.10873v2

Authors (5): Pedro Pessoa, Paul Campitelli, Douglas P. Shepherd, S. Banu Ozkan, Steve Pressé

State space models, such as Mamba, have recently garnered attention in time series forecasting due to their ability to capture sequence patterns. However, in electricity consumption benchmarks, Mamba forecasts exhibit a mean error of approximately 8%. Similarly, in traffic occupancy benchmarks, the mean error reaches 18%. This discrepancy leads us to wonder whether the prediction is simply inaccurate or falls within the error expected given the spread in historical data. To address this limitation, we propose a method to quantify the predictive uncertainty of Mamba forecasts: a dual-network framework based on the Mamba architecture for probabilistic forecasting, where one network generates point forecasts while the other estimates predictive uncertainty by modeling variance. We abbreviate our tool, Mamba with probabilistic time series forecasting, as Mamba-ProbTSF; the code for its implementation is available on GitHub (https://github.com/PessoaP/Mamba-ProbTSF). Evaluating this approach on synthetic and real-world benchmark datasets, we find that the Kullback-Leibler divergence between the learned distributions and the data (which, in the limit of infinite data, should converge to zero if the model correctly captures the underlying probability distribution) is reduced to the order of $10^{-3}$ for synthetic data and $10^{-1}$ for real-world benchmarks, demonstrating its effectiveness. We find that in both the electricity consumption and traffic occupancy benchmarks, the true trajectory stays within the predicted uncertainty interval at the two-sigma level about 95% of the time. We end with a consideration of potential limitations, adjustments to improve performance, and considerations for applying this framework to processes with purely or largely stochastic dynamics where the stochastic changes accumulate, as observed for example in pure Brownian motion or molecular dynamics trajectories.
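The two quantities mentioned in this abstract, a variance-aware training signal and the two-sigma coverage check, can be illustrated with a small sketch. It assumes a Gaussian predictive distribution per time step; the helper names `gaussian_nll`, `coverage`, and `nominal_coverage` are hypothetical and are not taken from the Mamba-ProbTSF code.

```python
import math


def gaussian_nll(y, mean, var):
    """Negative log-likelihood of y under N(mean, var).

    A loss of this form lets one network supply `mean` (the point forecast)
    and a second network supply `var` (the predictive variance).
    """
    return 0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def coverage(ys, means, variances, k=2.0):
    """Fraction of observations within k standard deviations of the forecast."""
    inside = sum(
        abs(y - m) <= k * math.sqrt(v)
        for y, m, v in zip(ys, means, variances)
    )
    return inside / len(ys)


def nominal_coverage(k=2.0):
    """Probability mass of a Gaussian within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))
```

For a well-calibrated Gaussian forecast, `coverage(..., k=2.0)` on held-out data should be close to `nominal_coverage(2.0)`, which is roughly 0.95. That is the comparison behind the "about 95% of the time" statement.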



Article 339

Title@2025-06-11 (3): Dynamic Diffusion Schrödinger Bridge in Astrophysical Observational Inversions

Title: Dynamic Diffusion Schrödinger Bridge in Astrophysical Observational Inversions Dynamische Diffusion Schrödinger Brücke in astrophysikalischen Beobachtungsinversionen 天体物理观测反转中的动态扩散Srödinger桥 2506.08065v2

Authors (5): Ye Zhu, Duo Xu, Zhiwei Deng, Jonathan C. Tan, Olga Russakovsky

We study Diffusion Schrödinger Bridge (DSB) models in the context of dynamical astrophysical systems, specifically tackling observational inverse prediction tasks within Giant Molecular Clouds (GMCs) for star formation. We introduce the Astro-DSB model, a variant of DSB with the pairwise domain assumption tailored for astrophysical dynamics. By investigating its learning process and prediction performance in both physically simulated data and in real observations (the Taurus B213 data), we present two main takeaways. First, from the astrophysical perspective, our proposed paired DSB method improves interpretability, learning efficiency, and prediction performance over conventional astrostatistical and other machine learning methods. Second, from the generative modeling perspective, probabilistic generative modeling reveals improvements over discriminative pixel-to-pixel modeling in Out-Of-Distribution (OOD) testing cases of physical simulations with unseen initial conditions and different dominant physical processes. Our study expands research into diffusion models beyond the traditional visual synthesis application and provides evidence of the models’ learning abilities beyond pure data statistics, paving a path for future physics-aware generative models which can align dynamics between machine learning and real (astro)physical systems.



Article 340

Title@2025-06-11 (3): Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning

Title: Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning Apollo: A Posteriori Label-Only Mitgliedschaft Inferenz Angriff auf Maschine Unlearning 阿波罗:一个后选标签 - 唯一的会员 推论攻击 机器退出学习 2506.09923v1

Authors (3): Liou Tang, James Joshi, Ashish Kundu

Machine Unlearning (MU) aims to update Machine Learning (ML) models following requests to remove training samples and their influences on a trained model efficiently without retraining the original ML model from scratch. While MU itself has been employed to provide privacy protection and regulatory compliance, it can also increase the attack surface of the model. Existing privacy inference attacks towards MU that aim to infer properties of the unlearned set rely on the weaker threat model that assumes the attacker has access to both the unlearned model and the original model, limiting their feasibility toward real-life scenarios. We propose a novel privacy attack, A Posteriori Label-Only Membership Inference Attack towards MU, Apollo, that infers whether a data sample has been unlearned, following a strict threat model where an adversary has access to the label-output of the unlearned model only. We demonstrate that our proposed attack, while requiring less access to the target model compared to previous attacks, can achieve relatively high precision on the membership status of the unlearned samples.



Article 341

Title@2025-06-11 (3): Logits-Based Finetuning

Title: Logits-Based Finetuning Logits-based Finetuning 基于登录的微调 2505.24461v2

Authors (7): Jingyao Li, Senqiao Yang, Sitong Wu, Han Shi, Chuanyang Zheng, Hong Xu, Jiaya Jia

In recent years, developing compact and efficient large language models (LLMs) has emerged as a thriving area of research. Traditional Supervised Fine-Tuning (SFT), which relies on singular ground truth labels, often fails to capture token-level dependencies and linguistic diversity. To address these limitations, we propose a logits-based fine-tuning framework that integrates the strengths of supervised learning and knowledge distillation. Our approach constructs enriched training targets by combining teacher logits with ground truth labels, preserving both correctness and linguistic diversity. This ensures more reliable and effective training. We constructed a large-scale 1.2M logits dataset and trained a series of science-focused models. Experimental results demonstrate that our method achieves significant improvements, with accuracy gains of 18% on Mawps and 22.7% on TabMWP. Across nine widely used mathematical benchmarks, our method consistently outperforms prior SFT models, achieving an average improvement of 7.28%. Codes are available at https://github.com/dvlab-research/Logits-Based-Finetuning.
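One way to picture "combining teacher logits with ground truth labels" is a convex mix of the teacher's softmax distribution and the one-hot label. The sketch below illustrates that idea only: the mixing rule, the weight `alpha`, and the function names are assumptions for the example, and the paper's actual target construction may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def enriched_target(teacher_logits, label, alpha=0.5):
    """Mix the one-hot ground-truth label with the teacher distribution.

    alpha is a hypothetical mixing weight: alpha=1 recovers plain SFT targets,
    alpha=0 recovers pure distillation targets.
    """
    teacher_probs = softmax(teacher_logits)
    return [
        alpha * (1.0 if i == label else 0.0) + (1.0 - alpha) * p
        for i, p in enumerate(teacher_probs)
    ]


def soft_cross_entropy(target, student_logits):
    """Cross-entropy between a soft target and the student's distribution."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(q) for t, q in zip(target, student_probs))
```

With `alpha=0.5`, the labeled token always receives at least half of the target mass, so correctness is preserved, while the remaining mass follows the teacher's ranking of alternative tokens.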



Article 342

Title@2025-06-11 (3): Optimal Noise Reduction in Dense Mixed-Membership Stochastic Block Models under Diverging Spiked Eigenvalues Condition

Title: Optimal Noise Reduction in Dense Mixed-Membership Stochastic Block Models under Diverging Spiked Eigenvalues Condition Optimale Lärmreduzierung in Dichte Mischmitgliedschaft Stochastic Block Models unter Diverging Spiked Eigenvalues Zustand 在四分五裂的斯派派值条件下,减少高密度混合混合成员制碎块模型的最佳噪音 2307.14530v3

Authors (2): Fedor Noskov, Maxim Panov

Community detection is one of the most critical problems in modern network science. Its applications can be found in various fields, from protein modeling to social network analysis. Recently, many papers have appeared studying the problem of overlapping community detection, where each node of a network may belong to several communities. In this work, we consider the Mixed-Membership Stochastic Block Model (MMSB), first proposed by Airoldi et al. MMSB provides quite a general setting for modeling overlapping community structure in graphs. The central question of this paper is how to reconstruct relations between communities given an observed network. We compare different approaches and establish the minimax lower bound on the estimation error. Then, we propose a new estimator that matches this lower bound. Theoretical results are proved under fairly general conditions on the considered model. Finally, we illustrate the theory in a series of experiments.



Article 343

Title@2025-06-11 (3): Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity

Title: Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity Adam exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity Adam 通过协调的适应性 开发 $\ ell\ incty$- 损失地貌测量 2410.08198v3

Authors (3): Shuo Xie, Mohamad Amin Mohamadi, Zhiyuan Li

Adam outperforms SGD when training language models. Yet this advantage is not well understood theoretically: previous convergence analyses for Adam and SGD mainly focus on the number of steps $T$ and are already minimax-optimal in non-convex cases, both achieving $\widetilde{O}(T^{-1/4})$. In this work, we argue that the exploitation of nice $\ell_\infty$-geometry is the key advantage of Adam over SGD. More specifically, we give a new convergence analysis for Adam under the novel assumption that the loss is smooth under $\ell_\infty$-geometry rather than the more common $\ell_2$-geometry, which yields a much better empirical smoothness constant for GPT-2 and ResNet models. Our experiments confirm that Adam performs much worse when the favorable $\ell_\infty$-geometry is changed while SGD provably remains unaffected. We also extend the convergence analysis to blockwise Adam under novel blockwise smoothness assumptions.



Article 344

Title@2025-06-11 (3): Temperature Optimization for Bayesian Deep Learning

Title: Temperature Optimization for Bayesian Deep Learning Temperaturoptimierung für Bayesian Deep Learning Bayesian深层学习的温度优化 2410.05757v2

Authors (4): Kenyon Ng, Chris van der Heide, Liam Hodgkinson, Susan Wei

The Cold Posterior Effect (CPE) is a phenomenon in Bayesian Deep Learning (BDL), where tempering the posterior to a cold temperature often improves the predictive performance of the posterior predictive distribution (PPD). Although the term 'CPE' suggests colder temperatures are inherently better, the BDL community increasingly recognizes that this is not always the case. Despite this, there remains no systematic method for finding the optimal temperature beyond grid search. In this work, we propose a data-driven approach to select the temperature that maximizes test log-predictive density, treating the temperature as a model parameter and estimating it directly from the data. We empirically demonstrate that our method performs comparably to grid search, at a fraction of the cost, across both regression and classification tasks. Finally, we highlight the differing perspectives on CPE between the BDL and Generalized Bayes communities: while the former primarily emphasizes the predictive performance of the PPD, the latter prioritizes the utility of the posterior under model misspecification; these distinct objectives lead to different temperature preferences.



Article 345

Title@2025-06-11 (3): PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants

Title: PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants PersonaLens: Ein Benchmark für die Personalisierungsbewertung in Conversational AI Assistants 人际关系:对话性AI助理个人化评价基准 2506.09902v1

Authors (6): Zheng Zhao, Clara Vania, Subhradeep Kayal, Naila Khan, Shay B. Cohen, Emine Yilmaz

Large language models (LLMs) have advanced conversational AI assistants. However, systematically evaluating how well these assistants apply personalization (adapting to individual user preferences while completing tasks) remains challenging. Existing personalization benchmarks focus on chit-chat, non-conversational tasks, or narrow domains, failing to capture the complexities of personalized task-oriented assistance. To address this, we introduce PersonaLens, a comprehensive benchmark for evaluating personalization in task-oriented AI assistants. Our benchmark features diverse user profiles equipped with rich preferences and interaction histories, along with two specialized LLM-based agents: a user agent that engages in realistic task-oriented dialogues with AI assistants, and a judge agent that employs the LLM-as-a-Judge paradigm to assess personalization, response quality, and task success. Through extensive experiments with current LLM assistants across diverse tasks, we reveal significant variability in their personalization capabilities, providing crucial insights for advancing conversational AI systems.



Article 346

Title@2025-06-11 (3): “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended)

Title: “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) “Was sind meine Optionen?”: Erklären von RL-Agenten mit unterschiedlichen nahezu optimalen Alternativen (erweitert) “我有什么选择?”:解释具有多种近最佳替代方法的RL代理(扩展) 2506.09901v1

Authors (4): Noel Brindise, Vijeth Hebbar, Riya Shah, Cedric Langbort

In this work, we provide an extended discussion of a new approach to explainable Reinforcement Learning called Diverse Near-Optimal Alternatives (DNA), first proposed at L4DC 2025. DNA seeks a set of reasonable “options” for trajectory-planning agents, optimizing policies to produce qualitatively diverse trajectories in Euclidean space. In the spirit of explainability, these distinct policies are used to “explain” an agent’s options in terms of available trajectory shapes from which a human user may choose. In particular, DNA applies to value function-based policies on Markov decision processes where agents are limited to continuous trajectories. Here, we describe DNA, which uses reward shaping in local, modified Q-learning problems to solve for distinct policies with guaranteed epsilon-optimality. We show that it successfully returns qualitatively different policies that constitute meaningfully different “options” in simulation, including a brief comparison to related approaches in the stochastic optimization field of Quality Diversity. Beyond the explanatory motivation, this work opens new possibilities for exploration and adaptive planning in RL.



Article 347

Title@2025-06-11 (3): The Remarkable Robustness of LLMs: Stages of Inference?

Title: The Remarkable Robustness of LLMs: Stages of Inference? Die bemerkenswerte Robustheit von LLMs: Stufen der Schlussfolgerung? LLMS的显著威力:推论阶段? 2406.19384v2

Authors (3): Vedang Lad, Wes Gurnee, Max Tegmark

We investigate the robustness of Large Language Models (LLMs) to structural interventions by deleting and swapping adjacent layers during inference. Surprisingly, models retain 72-95% of their original top-1 prediction accuracy without any fine-tuning. We find that performance degradation is not uniform across layers: interventions to the early and final layers cause the most degradation, while the model is remarkably robust to dropping middle layers. This pattern of localized sensitivity motivates our hypothesis of four stages of inference, observed across diverse model families and sizes: (1) detokenization, where local context is integrated to lift raw token embeddings into higher-level representations; (2) feature engineering, where task- and entity-specific features are iteratively refined; (3) prediction ensembling, where hidden states are aggregated into plausible next-token predictions; and (4) residual sharpening, where irrelevant features are suppressed to finalize the output distribution. Synthesizing behavioral and mechanistic evidence, we provide a framework for interpreting depth-dependent computations in LLMs.
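
The two structural interventions named above (deleting a layer and swapping adjacent layers) can be illustrated on a toy residual stack. This is a minimal sketch, not the authors' code: layers are plain Python callables, the residual update `x + layer(x)` is an assumed simplification of a transformer block, and the helper names are hypothetical.

```python
def run(layers, x):
    """Apply a stack of layers with residual connections: x <- x + layer(x)."""
    for layer in layers:
        x = x + layer(x)
    return x


def delete_layer(layers, i):
    """Return a new stack with layer i removed (no fine-tuning, as in the abstract)."""
    return layers[:i] + layers[i + 1:]


def swap_adjacent(layers, i):
    """Return a new stack with layers i and i+1 exchanged."""
    out = list(layers)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out
```

Comparing `run(layers, x)` with `run(delete_layer(layers, i), x)` for each `i` gives a per-layer sensitivity profile in the spirit of the experiment described.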



Article 348

Title@2025-06-11 (3): A look at adversarial attacks on radio waveforms from discrete latent space

Title: A look at adversarial attacks on radio waveforms from discrete latent space Ein Blick auf feindliche Angriffe auf Radiowellenformen aus diskretem latenten Raum 审视对来自离散潜空空间的无线电波形的对抗性攻击 2506.09896v1

Authors (3): Attanasia Garuso, Silvija Kokalj-Filipovic, Yagna Kaasaragadda

Having designed a VQVAE that maps digital radio waveforms into discrete latent space, and yields a perfectly classifiable reconstruction of the original data, we here analyze the attack-suppressing properties of VQVAE when an adversarial attack is performed on high-SNR radio-frequency (RF) data points. To target amplitude modulations from a subset of digitally modulated waveform classes, we first create adversarial attacks that preserve the phase between the in-phase and quadrature components whose values are adversarially changed. We compare them with adversarial attacks of the same intensity where phase is not preserved. We test the classification accuracy of such adversarial examples on a classifier trained to deliver 100% accuracy on the original data. To assess the ability of VQVAE to suppress the strength of the attack, we evaluate the classifier accuracy on the reconstructions by VQVAE of the adversarial data points and show that VQVAE substantially decreases the effectiveness of the attack. We also compare the I/Q plane diagram of the attacked data, their reconstructions and the original data. Finally, using multiple methods and metrics, we compare the probability distribution of the VQVAE latent space with and without attack. Varying the attack strength, we observe interesting properties of the discrete space, which may help detect the attacks.
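
To make the distinction between phase-preserving and non-phase-preserving perturbations concrete, the sketch below treats one I/Q sample as a point in the plane. It assumes that "preserving the phase" means scaling I and Q by a common positive factor, which is one reading of the abstract and not confirmed by it; both function names are hypothetical.

```python
import math


def scale_amplitude(i, q, factor):
    """Scale an I/Q sample by a common positive factor.

    The amplitude hypot(i, q) changes by `factor`; the phase atan2(q, i) does not.
    """
    return i * factor, q * factor


def perturb_independently(i, q, di, dq):
    """Add separate offsets to I and Q; in general this changes the phase as well."""
    return i + di, q + dq
```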



Article 349

Title@2025-06-11 (3): Causal Climate Emulation with Bayesian Filtering

Title: Causal Climate Emulation with Bayesian Filtering Causal Klima Emulation mit Bayesian Filtering 以贝耶斯过滤器模拟因果气候 2506.09891v1

Authors (9): Sebastian Hickman, Ilija Trajkovic, Julia Kaltenborn, Francis Pelletier, Alex Archibald, Yaniv Gurwicz, Peer Nowack, David Rolnick, Julien Boussard

Traditional models of climate change use complex systems of coupled equations to simulate physical processes across the Earth system. These simulations are highly computationally expensive, limiting our predictions of climate change and analyses of its causes and effects. Machine learning has the potential to quickly emulate data from climate models, but current approaches are not able to incorporate physics-informed causal relationships. Here, we develop an interpretable climate model emulator based on causal representation learning. We derive a physics-informed approach including a Bayesian filter for stable long-term autoregressive emulation. We demonstrate that our emulator learns accurate climate dynamics, and we show the importance of each one of its components on a realistic synthetic dataset and data from two widely deployed climate models.



Article 350

Title@2025-06-11 (3): Learning single-index models via harmonic decomposition

Title: Learning single-index models via harmonic decomposition Ein-Index-Modelle über harmonische Zersetzung lernen 通过口腔分解学习单指数模型 2506.09887v1

Authors (4): Nirmit Joshi, Hugo Koubbi, Theodor Misiakiewicz, Nathan Srebro

We study the problem of learning single-index models, where the label $y \in \mathbb{R}$ depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown one-dimensional projection $\langle \boldsymbol{w}_*,\boldsymbol{x}\rangle$. Prior work has shown that under Gaussian inputs, the statistical and computational complexity of recovering $\boldsymbol{w}_*$ is governed by the Hermite expansion of the link function. In this paper, we propose a new perspective: we argue that “spherical harmonics” – rather than “Hermite polynomials” – provide the natural basis for this problem, as they capture its intrinsic “rotational symmetry”. Building on this insight, we characterize the complexity of learning single-index models under arbitrary spherically symmetric input distributions. We introduce two families of estimators – based on tensor unfolding and online SGD – that respectively achieve either optimal sample complexity or optimal runtime, and argue that estimators achieving both may not exist in general. When specialized to Gaussian inputs, our theory not only recovers and clarifies existing results but also reveals new phenomena that had previously been overlooked.



Article 351

Title@2025-06-11 (3): OmniJet-$α_C$: Learning point cloud calorimeter simulations using generative transformers

Title: OmniJet-$α_C$: Learning point cloud calorimeter simulations using generative transformers OmniJet-$α_C$: Simulationen von Punktwolkenkalorimetern mit generativen Transformatoren OmniJet-$α_C$: 使用基因变压器学习点云度热量计模拟 2501.05534v2

Authors (6): Joschka Birk, Frank Gaede, Anna Hallin, Gregor Kasieczka, Martina Mozzanica, Henning Rose

We show the first use of generative transformers for generating calorimeter showers as point clouds in a high-granularity calorimeter. Using the tokenizer and generative part of the OmniJet-${\alpha}$ model, we represent the hits in the detector as sequences of integers. This model allows variable-length sequences, which means that it supports realistic shower development and does not need to be conditioned on the number of hits. Since the tokenization represents the showers as point clouds, the model learns the geometry of the showers without being restricted to any particular voxel grid.



Article 352

Title@2025-06-11 (3): Leveraging data-driven weather models for improving numerical weather prediction skill through large-scale spectral nudging

Title: Leveraging data-driven weather models for improving numerical weather prediction skill through large-scale spectral nudging Nutzung von datengesteuerten Wettermodellen zur Verbesserung der numerischen Wettervorhersagefähigkeit durch großflächige spektrale Nudging 利用数据驱动天气模型,通过大型光谱裸露提高数字天气预测技能 2407.06100v3

Authors (12): Syed Zahid Husain, Leo Separovic, Jean-François Caron, Rabah Aider, Mark Buehner, Stéphane Chamberland, Ervig Lapalme, Ron McTaggart-Cowan, Christopher Subich, Paul A. Vaillancourt, Jing Yang, Ayrton Zadra

Operational meteorological forecasting has long relied on physics-based numerical weather prediction (NWP) models. Recently, this landscape has faced disruption by the advent of data-driven artificial intelligence (AI)-based weather models, which offer tremendous computational performance and competitive forecasting accuracy. However, data-driven models for medium-range forecasting generally suffer from major limitations, including low effective resolution and a narrow range of predicted variables. This study illustrates the relative strengths and weaknesses of these competing paradigms using the physics-based GEM (Global Environmental Multiscale) and the AI-based GraphCast models. Analyses of their respective global predictions in physical and spectral space reveal that GraphCast-predicted large scales outperform GEM, particularly for longer lead times, even though fine scales predicted by GraphCast suffer from excessive smoothing. Building on this insight, a hybrid NWP-AI system is proposed, wherein temperature and horizontal wind components predicted by GEM are spectrally nudged toward GraphCast predictions at large scales, while GEM itself freely generates the fine-scale details critical for local predictability and weather extremes. This hybrid approach is capable of leveraging the strengths of GraphCast to enhance the prediction skill of the GEM model while generating a full suite of physically consistent forecast fields with a full power spectrum. Additionally, trajectories of tropical cyclones are predicted with enhanced accuracy without significant changes in intensity. Work is in progress for operationalization of this hybrid system at the Canadian Meteorological Centre.
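
The idea of nudging only the large scales can be sketched on a 1-D periodic signal. This is an illustrative toy under stated assumptions, not the operational scheme: real forecast fields live on the sphere, and the naive DFT, the `cutoff` wavenumber, and the relaxation weight `alpha` are all hypothetical choices.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for short toy signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse of dft(); returns the real part of the reconstructed signal."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def spectral_nudge(model, target, cutoff, alpha):
    """Relax wavenumbers <= cutoff of `model` toward `target`; keep finer scales as-is."""
    n = len(model)
    m_hat, t_hat = dft(model), dft(target)
    out = []
    for k in range(n):
        wavenumber = min(k, n - k)  # treat index k and its conjugate n - k alike
        if wavenumber <= cutoff:
            out.append(m_hat[k] + alpha * (t_hat[k] - m_hat[k]))
        else:
            out.append(m_hat[k])
    return idft(out)
```

With `alpha=1.0` the large scales are replaced by the target's outright, and with `alpha=0.0` the model field is returned unchanged; intermediate values blend the two below the cutoff while the model keeps its own fine-scale detail.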



Article 353

Title@2025-06-11 (3): UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching

Title: UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching UmbraTTS: Text-zu-Sprechen an Umweltkontexte anpassen mit Flow Matching UmbratTS:用流动匹配使文字语音适应环境环境环境 2506.09874v1

Authors (9): Neta Glazer, Aviv Navon, Yael Segal, Aviv Shamsian, Hilit Segev, Asaf Buchnick, Menachem Pirchi, Gil Hetz, Joseph Keshet

Recent advances in Text-to-Speech (TTS) have enabled highly natural speech synthesis, yet integrating speech with complex background environments remains challenging. We introduce UmbraTTS, a flow-matching based TTS model that jointly generates both speech and environmental audio, conditioned on text and acoustic context. Our model allows fine-grained control over background volume and produces diverse, coherent, and context-aware audio scenes. A key challenge is the lack of data with speech and background audio aligned in natural context. To overcome this, we propose a self-supervised framework that extracts speech, background audio, and transcripts from unannotated recordings. Extensive evaluations demonstrate that UmbraTTS significantly outperforms existing baselines, producing natural, high-quality, environmentally aware audio.



Article 354

Title@2025-06-11 (3): Private Aggregation for Byzantine-Resilient Heterogeneous Federated Learning

Title: Private Aggregation for Byzantine-Resilient Heterogeneous Federated Learning Private Aggregation für Byzantinisch-Resilient Heterogenes Federated Learning Byzantine-抗后异异质联邦学习私营聚合 2506.09870v1

Authors (2): Maximilian Egger, Rawad Bitar

Ensuring resilience to Byzantine clients while maintaining the privacy of the clients’ data is a fundamental challenge in federated learning (FL). When the clients’ data is homogeneous, suitable countermeasures were studied from an information-theoretic perspective utilizing secure aggregation techniques while ensuring robust aggregation of the clients’ gradients. However, the countermeasures used fail when the clients’ data is heterogeneous. Suitable pre-processing techniques, such as nearest neighbor mixing, were recently shown to enhance the performance of those countermeasures in the heterogeneous setting. Nevertheless, those pre-processing techniques cannot be applied with the introduced privacy-preserving mechanisms. We propose a multi-stage method encompassing a careful co-design of verifiable secret sharing, secure aggregation, and a tailored symmetric private information retrieval scheme to achieve information-theoretic privacy guarantees and Byzantine resilience under data heterogeneity. We evaluate the effectiveness of our scheme on a variety of attacks and show how it outperforms the previously known techniques. Since the communication overhead of secure aggregation is non-negligible, we investigate the interplay with zero-order estimation methods that reduce the communication cost in state-of-the-art FL tasks and thereby make private aggregation scalable.
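
As background for the secure-aggregation ingredient, the sketch below shows plain additive secret sharing over a prime field, where only the sum of client values is recovered. It is a deliberately simpler technique than the scheme described above: it is not verifiable, offers no Byzantine resilience, and the modulus, function names, and server layout are assumptions made for illustration.

```python
import random

PRIME = 2**61 - 1  # field modulus (a Mersenne prime); an arbitrary illustrative choice


def share(value, n_shares, rng):
    """Split `value` into n_shares additive shares that sum to it modulo PRIME."""
    parts = [rng.randrange(PRIME) for _ in range(n_shares - 1)]
    parts.append((value - sum(parts)) % PRIME)
    return parts


def aggregate(all_shares):
    """all_shares[i][j] is share j of client i.

    Each (hypothetical) server j sums the j-th share of every client; combining
    the per-server sums reveals only the total of the clients' values.
    """
    server_sums = [sum(column) % PRIME for column in zip(*all_shares)]
    return sum(server_sums) % PRIME
```

A real deployment would draw shares from a cryptographically secure generator such as the `secrets` module; a seeded `random.Random` is convenient here only because it makes the example reproducible.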



Article 355

Title@2025-06-11 (3): Machine Learning-Based Classification of Oils Using Dielectric Properties and Microwave Resonant Sensing

Title: Machine Learning-Based Classification of Oils Using Dielectric Properties and Microwave Resonant Sensing Machine Learning-based Klassifizierung von Ölen mit dielektrischen Eigenschaften und Mikrowellen-Resonant Sensing 利用电力特性和微波共振遥感对石油进行机 机 动 学习 分类 2506.09867v1

Authors (3): Amit Baran Dey, Wasim Arif, Rakhesh Singh Kshetrimayum

This paper proposes a machine learning-based methodology for the classification of various oil samples based on their dielectric properties, utilizing a microwave resonant sensor. The dielectric behaviour of oils, governed by their molecular composition, induces distinct shifts in the sensor’s resonant frequency and amplitude response. These variations are systematically captured and processed to extract salient features, which serve as inputs for multiple machine learning classifiers. The microwave resonant sensor operates in a non-destructive, low-power manner, making it particularly well-suited for real-time industrial applications. A comprehensive dataset is developed by varying the permittivity of oil samples and acquiring the corresponding sensor responses. Several classifiers are trained and evaluated using the extracted resonant features to assess their capability in distinguishing between oil types. Experimental results demonstrate that the proposed approach achieves a high classification accuracy of 99.41% with the random forest classifier, highlighting its strong potential for automated oil identification. The system’s compact form factor, efficiency, and high performance underscore its viability for fast and reliable oil characterization in industrial environments.



Article 356

Title@2025-06-11 (3): Conformal Prediction as Bayesian Quadrature

Title: Conformal Prediction as Bayesian Quadrature Konforme Vorhersage als bayesische Quadratur Bayesian 夸度的非正规预测 2502.13228v2

Authors (2): Jake C. Snell, Thomas L. Griffiths

As machine learning-based prediction systems are increasingly used in high-stakes situations, it is important to understand how such predictive models will perform upon deployment. Distribution-free uncertainty quantification techniques such as conformal prediction provide guarantees about the loss black-box models will incur even when the details of the models are hidden. However, such methods are based on frequentist probability, which unduly limits their applicability. We revisit the central aspects of conformal prediction from a Bayesian perspective and thereby illuminate the shortcomings of frequentist guarantees. We propose a practical alternative based on Bayesian quadrature that provides interpretable guarantees and offers a richer representation of the likely range of losses to be observed at test time.
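
For reference, the frequentist baseline that the abstract revisits can be sketched as standard split conformal calibration: pick the threshold as an order statistic of held-out nonconformity scores. This is the textbook procedure, not the Bayesian quadrature alternative proposed above, and the function name is hypothetical.

```python
import math


def conformal_threshold(scores, alpha):
    """Split-conformal threshold at miscoverage level alpha.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or infinity when the calibration set is too small for that rank to exist.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]
```

A prediction set built from all labels whose score falls at or below this threshold covers the true label with probability at least `1 - alpha`, marginally over exchangeable calibration and test data.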



Article 357

Title@2025-06-11 (3): Guided Graph Compression for Quantum Graph Neural Networks

Title: Guided Graph Compression for Quantum Graph Neural Networks Geführte Graphenkompression für Quantum Graphen-Neural-Netzwerke 量图神经网络的向导图形压缩 2506.09862v1

Authors (6): Mikel Casals, Vasilis Belis, Elias F. Combarro, Eduard Alarcón, Sofia Vallecorsa, Michele Grossi

Graph Neural Networks (GNNs) are effective for processing graph-structured data but face challenges with large graphs due to high memory requirements and inefficient sparse matrix operations on GPUs. Quantum Computing (QC) offers a promising avenue to address these issues and inspires new algorithmic approaches. In particular, Quantum Graph Neural Networks (QGNNs) have been explored in recent literature. However, current quantum hardware limits the dimension of the data that can be effectively encoded. Existing approaches either simplify datasets manually or use artificial graph datasets. This work introduces the Guided Graph Compression (GGC) framework, which uses a graph autoencoder to reduce both the number of nodes and the dimensionality of node features. The compression is guided to enhance the performance of a downstream classification task, which can be applied either with a quantum or a classical classifier. The framework is evaluated on the Jet Tagging task, a classification problem of fundamental importance in high energy physics that involves distinguishing particle jets initiated by quarks from those by gluons. The GGC is compared against using the autoencoder as a standalone preprocessing step and against a baseline classical GNN classifier. Our numerical results demonstrate that GGC outperforms both alternatives, while also facilitating the testing of novel QGNN ansatzes on realistic datasets.



Article 358

Title@2025-06-11 (3): ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models

Title: ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models ImageChain: Advancing Sequential Image-to-Text Reasoning in multimodalen großen Sprachmodellen 图像Chain:在多模式大语言模型中推进按顺序排列的图像到文字理由 2502.19409v2

Authors (3): Danae Sánchez Villegas, Ingo Ziegler, Desmond Elliott

Reasoning over sequences of images remains a challenge for multimodal large language models (MLLMs). While recent models incorporate multi-image data during pre-training, they still struggle to recognize sequential structures, often treating images independently. This work introduces ImageChain, a framework that enhances MLLMs with sequential reasoning capabilities over image data by modeling visual sequences as a multi-turn conversation. In ImageChain, images are interleaved with corresponding textual descriptions to form a controlled dialogue that explicitly captures temporal dependencies and narrative progression. Our method optimizes for the task of next-scene description, where the model generates a context-aware description of an upcoming scene based on preceding visual and textual cues. We demonstrate that our approach improves performance on the next-scene description task – achieving an average improvement from 3.7% to 19% in SimRate, a metric that quantifies semantic similarity to human-annotated ground truths. Moreover, ImageChain achieves robust zero-shot out-of-domain performance in applications ranging from comics to robotics. Extensive experiments validate that instruction-tuning in a multimodal, multi-turn conversation design is key to bridging the gap between static image understanding and temporally-aware reasoning.



Article 359

Title@2025-06-11 (3): Tractable hierarchies of convex relaxations for polynomial optimization on the nonnegative orthant

Title: Tractable hierarchies of convex relaxations for polynomial optimization on the nonnegative orthant Tractable Hierarchien konvexer Entspannungen zur polynomialen Optimierung auf dem nicht negativen Orthanten 用于非阴性恒星多边优化的 convex 放松的可跟踪的等级结构 2209.06175v2

Authors (4): Ngoc Hoang Anh Mai, Victor Magron, Jean-Bernard Lasserre, Kim-Chuan Toh

We consider polynomial optimization problems (POP) on a semialgebraic set contained in the nonnegative orthant (every POP on a compact set can be put in this format by a simple translation of the origin). Such a POP can be converted to an equivalent POP by squaring each variable. Using even symmetry and the concept of factor width, we propose a hierarchy of semidefinite relaxations based on the extension of Pólya’s Positivstellensatz by Dickinson-Povh. As its distinguishing and crucial feature, the maximal matrix size of each resulting semidefinite relaxation can be chosen arbitrarily and in addition, we prove that the sequence of values returned by the new hierarchy converges to the optimal value of the original POP at the rate $O(\varepsilon^{-c})$ if the semialgebraic set has nonempty interior. When applied to (i) robustness certification of multi-layer neural networks and (ii) computation of positive maximal singular values, our method based on Pólya’s Positivstellensatz provides better bounds and runs several hundred times faster than the standard Moment-SOS hierarchy.
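
The squaring step can be written out as follows. The symbols $f$, $g_j$, and $y$ are assumed notation, since the abstract names no specific ones; $y^2$ denotes the componentwise square $(y_1^2, \dots, y_n^2)$.

$$
\min_{x \in \mathbb{R}^n_{\ge 0}} \bigl\{ f(x) : g_j(x) \ge 0,\ j = 1, \dots, m \bigr\}
\;=\;
\min_{y \in \mathbb{R}^n} \bigl\{ f(y^2) : g_j(y^2) \ge 0,\ j = 1, \dots, m \bigr\}.
$$

The two problems share the same optimal value because $y \mapsto y^2$ maps $\mathbb{R}^n$ onto the nonnegative orthant. Every polynomial on the right-hand side is even in each variable, which is the even symmetry the hierarchy exploits.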



Article 360

Title@2025-06-11 (3): FinTSBridge: A New Evaluation Suite for Real-world Financial Prediction with Advanced Time Series Models

Title: FinTSBridge: A New Evaluation Suite for Real-world Financial Prediction with Advanced Time Series Models FinTSBridge: Eine neue Evaluierungs-Suite für reale finanzielle Vorhersagen mit Modellen der fortgeschrittenen Zeitreihen FinTSBridge:具有先进时间序列模型的实时世界金融预测新评价套件 2503.06928v2

Authors (7): Yanlong Wang, Jian Xu, Tiantian Gao, Hongkang Zhang, Shao-Lun Huang, Danny Dongning Sun, Xiao-Ping Zhang

With the growing attention to time series forecasting in recent years, many studies have proposed various solutions to the challenges encountered in time series prediction, aiming to improve forecasting performance. However, effectively applying these time series forecasting models to the field of financial asset pricing remains a challenging issue. There is still a need for a bridge to connect cutting-edge time series forecasting models with financial asset pricing. To bridge this gap, we have undertaken the following efforts: 1) We constructed three datasets from the financial domain; 2) We selected over ten time series forecasting models from recent studies and validated their performance in financial time series; 3) We developed new metrics, msIC and msIR, in addition to MSE and MAE, to showcase the time series correlation captured by the models; 4) We designed financial-specific tasks for these three datasets and assessed the practical performance and application potential of these forecasting models in important financial problems. We hope the newly developed evaluation suite, FinTSBridge, can provide valuable insights into the effectiveness and robustness of advanced forecasting models in financial domains.



Article 361

Title@2025-06-11 (3): Advancing Exchange Rate Forecasting: Leveraging Machine Learning and AI for Enhanced Accuracy in Global Financial Markets

Title: Advancing Exchange Rate Forecasting: Leveraging Machine Learning and AI for Enhanced Accuracy in Global Financial Markets Advancing Exchange Rate Forecasting: Hebeling Machine Learning und KI für verbesserte Genauigkeit in den globalen Finanzmärkten 提高汇率预测:利用机器学习和大赦国际促进全球金融市场更加准确性 2506.09851v1

Authors (7): Md. Yeasin Rahat, Rajan Das Gupta, Nur Raisa Rahman, Sudipto Roy Pritom, Samiur Rahman Shakir, Md Imrul Hasan Showmick, Md. Jakir Hossen

The prediction of foreign exchange rates, such as the US Dollar (USD) to Bangladeshi Taka (BDT), plays a pivotal role in global financial markets, influencing trade, investments, and economic stability. This study leverages historical USD/BDT exchange rate data from 2018 to 2023, sourced from Yahoo Finance, to develop advanced machine learning models for accurate forecasting. A Long Short-Term Memory (LSTM) neural network is employed, achieving an exceptional accuracy of 99.449%, a Root Mean Square Error (RMSE) of 0.9858, and a test loss of 0.8523, significantly outperforming traditional methods like ARIMA (RMSE 1.342). Additionally, a Gradient Boosting Classifier (GBC) is applied for directional prediction, with backtesting on a $10,000 initial capital revealing a 40.82% profitable trade rate, though resulting in a net loss of $20,653.25 over 49 trades. The study analyzes historical trends, showing a decline in BDT/USD rates from 0.012 to 0.009, and incorporates normalized daily returns to capture volatility. These findings highlight the potential of deep learning in forex forecasting, offering traders and policymakers robust tools to mitigate risks. Future work could integrate sentiment analysis and real-time economic indicators to further enhance model adaptability in volatile markets.



Article 362

Title@2025-06-11 (3): Product of Experts with LLMs: Boosting Performance on ARC Is a Matter of Perspective

Title: Product of Experts with LLMs: Boosting Performance on ARC Is a Matter of Perspective Produkt von Experten mit LLMs: Leistungssteigerung auf ARC ist eine Frage der Perspektive 与LLMM公司的专家产品有关的产品:促进ACR的绩效是一个透视问题 2505.07859v2

Authors (3): Daniel Franzen, Jan Disselhoff, David Hartmann

The Abstraction and Reasoning Corpus (ARC-AGI) poses a significant challenge for large language models (LLMs), exposing limitations in their abstract reasoning abilities. In this work, we leverage task-specific data augmentations throughout the training, generation, and scoring phases, and employ a depth-first search algorithm to generate diverse, high-probability candidate solutions. Furthermore, we utilize the LLM not only as a generator but also as a scorer, using its output probabilities to select the most promising solutions. Our method achieves a score of 71.6% (286.5/400 solved tasks) on the public ARC-AGI evaluation set, demonstrating state-of-the-art performance among publicly available approaches. While concurrent closed-source work has reported higher scores, our method distinguishes itself through its transparency, reproducibility, and remarkably low inference cost, averaging only around 2ct per task on readily available hardware (we assume a price of 36ct/hour for an Nvidia 4090 GPU).
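
The scoring idea (using the model's output probabilities under several augmented views to rank candidates) can be sketched as a product of experts, that is, a sum of log-probabilities. This is an assumption-laden illustration: the abstract does not spell out the combination rule, the probabilities here are made-up inputs rather than LLM outputs, and both function names are hypothetical.

```python
import math


def product_of_experts_score(probs):
    """Log of the product of the probabilities one candidate receives across views."""
    return sum(math.log(p) for p in probs)


def select_candidate(candidates):
    """Pick the candidate with the highest combined score.

    `candidates` maps a candidate name to its list of per-augmentation probabilities.
    """
    return max(candidates, key=lambda name: product_of_experts_score(candidates[name]))
```

A product penalizes any candidate that a single view finds implausible, so a solution rated moderately well under every augmentation can beat one rated highly under most views but poorly under one.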



Article 363

Title@2025-06-11 (3): Reinforcement Learning from Human Feedback

Title: Reinforcement Learning from Human Feedback Verstärktes Lernen aus menschlichem Feedback 从人类反馈中学习 2504.12501v2

Authors (1): Nathan Lambert

Reinforcement learning from human feedback (RLHF) has become an important technical and storytelling tool to deploy the latest machine learning systems. In this book, we hope to give a gentle introduction to the core methods for people with some level of quantitative background. The book starts with the origins of RLHF – both in recent literature and in a convergence of disparate fields of science in economics, philosophy, and optimal control. We then set the stage with definitions, problem formulation, data collection, and other common math used in the literature. The core of the book details every optimization stage in using RLHF, from starting with instruction tuning to training a reward model and finally all of rejection sampling, reinforcement learning, and direct alignment algorithms. The book concludes with advanced topics – understudied research questions in synthetic data and evaluation – and open questions for the field.



Article 364

Title@2025-06-11 (3): A Deep Generative Model for the Simulation of Discrete Karst Networks

Title: A Deep Generative Model for the Simulation of Discrete Karst Networks Ein tiefes Generatives Modell für die Simulation diskreter Karst-Netzwerke Discrete Karst 网络模拟深度生成模型 2506.09832v1

Authors (3): Dany Lauzon, Julien Straubhaar, Philippe Renard

The simulation of discrete karst networks presents a significant challenge due to the complexity of the physicochemical processes occurring within various geological and hydrogeological contexts over extended periods. This complex interplay leads to a wide variety of karst network patterns, each intricately linked to specific hydrogeological conditions. We explore a novel approach that represents karst networks as graphs and applies graph generative models (deep learning techniques) to capture the intricate nature of karst environments. In this representation, nodes retain spatial information and properties, while edges signify connections between nodes. Our generative process consists of two main steps. First, we utilize graph recurrent neural networks (GraphRNN) to learn the topological distribution of karst networks. GraphRNN decomposes the graph simulation into a sequential generation of nodes and edges, informed by previously generated structures. Second, we employ denoising diffusion probabilistic models on graphs (G-DDPM) to learn node features (spatial coordinates and other properties). G-DDPMs enable the generation of node features on the graphs produced by the GraphRNN that adhere to the learned statistical properties by sampling from the derived probability distribution, ensuring that the generated graphs are realistic and capture the essential features of the original data. We test our approach using real-world karst networks and compare generated subgraphs with actual subgraphs from the database, by using geometry and topology metrics. Our methodology allows stochastic simulation of discrete karst networks across various types of formations, a useful tool for studying the behavior of physical processes such as flow and transport.



Article 365

Title@2025-06-11 (3): Multi-Variable Batch Bayesian Optimization in Materials Research: Synthetic Data Analysis of Noise Sensitivity and Problem Landscape Effects

Title: Multi-Variable Batch Bayesian Optimization in Materials Research: Synthetic Data Analysis of Noise Sensitivity and Problem Landscape Effects Multivariable Batch Bayesian Optimierung in der Materialforschung: Synthetische Datenanalyse von Lärmempfindlichkeit und Problemlandschaftseffekten 材料研究中多种可变批量贝耶斯最佳优化:噪音敏感度和问题景观效应合成数据分析 2504.03943v2

Authors (7): Imon Mia, Armi Tiihonen, Anna Ernst, Anusha Srivastava, Tonio Buonassisi, William Vandenberghe, Julia W. P. Hsu

The Bayesian Optimization (BO) machine learning method is increasingly used to guide experimental optimization tasks in materials science. To emulate the large number of input variables and noise-containing results in experimental materials research, we perform batch BO simulation of six design variables with a range of noise levels. Two test cases relevant for materials science problems are examined: a needle-in-a-haystack case (Ackley function) that may be encountered in, e.g., molecule optimizations, and a smooth landscape with a local optimum in addition to the global optimum (Hartmann function) that may be encountered in, e.g., material composition optimization. We show learning curves, performance metrics, and visualization to effectively track the optimization progression and evaluate how the optimization outcomes are affected by noise, batch-picking method, choice of acquisition function, and exploration hyperparameter values. We find that the effects of noise depend on the problem landscape: noise degrades the optimization results of a needle-in-a-haystack search (Ackley) dramatically more. However, with increasing noise, we observe an increasing probability of landing on the local optimum in Hartmann. Therefore, prior knowledge of the problem domain structure and noise level is essential when designing BO for materials research experiments. Synthetic data studies – with known ground truth and controlled noise levels – enable us to isolate and evaluate the impact of different batch BO components, e.g., acquisition policy, objective metrics, and hyperparameter values, before transitioning to the inherent uncertainties of real experimental systems. The results and methodology of this study will facilitate a greater utilization of BO in guiding experimental materials research, specifically in settings with a large number of design variables to optimize.
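
The needle-in-a-haystack test case can be reproduced with the standard Ackley function, shown here with additive Gaussian observation noise. The parameter values `a=20`, `b=0.2`, `c=2π` are the conventional defaults and an assumption about the study's setup, as are the noise model and the helper name `noisy_ackley`.

```python
import math
import random


def ackley(x, a=20.0, b=0.2, c=2 * math.pi):
    """Standard Ackley function in len(x) dimensions; global minimum 0 at the origin."""
    d = len(x)
    mean_sq = sum(v * v for v in x) / d
    mean_cos = sum(math.cos(c * v) for v in x) / d
    return -a * math.exp(-b * math.sqrt(mean_sq)) - math.exp(mean_cos) + a + math.e


def noisy_ackley(x, sigma, rng):
    """Ackley value plus zero-mean Gaussian noise with standard deviation sigma."""
    return ackley(x) + rng.gauss(0.0, sigma)
```

Calling `noisy_ackley([0.0] * 6, sigma, random.Random(seed))` at increasing `sigma` shows how quickly the narrow global minimum becomes indistinguishable from the surrounding plateau in a six-variable setting.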



Article 366

Title@2025-06-11 (3): Online Discovery of Simulation Models for Evolving Business Processes (Extended Version)

Title: Online Discovery of Simulation Models for Evolving Business Processes (Extended Version) Online Discovery of Simulation Models for Evolving Business Processes (Erweiterte Version) 不断演变的业务流程模拟模型在线发现(扩展版) 2506.10049v1

Authors (4): Francesco Vinci, Gyunam Park, Wil van der Aalst, Massimiliano de Leoni

Business Process Simulation (BPS) refers to techniques designed to replicate the dynamic behavior of a business process. Many approaches have been proposed to automatically discover simulation models from historical event logs, reducing the cost and time of designing them manually. However, in dynamic business environments, organizations continuously refine their processes to enhance efficiency, reduce costs, and improve customer satisfaction. Existing techniques for process simulation discovery lack adaptability to real-time operational changes. In this paper, we propose a streaming process simulation discovery technique that integrates Incremental Process Discovery with Online Machine Learning methods. This technique prioritizes recent data while preserving historical information, ensuring adaptation to evolving process dynamics. Experiments conducted on four different event logs demonstrate the importance, in simulation, of giving more weight to recent data while retaining historical knowledge. Our technique not only produces more stable simulations but also exhibits robustness in handling concept drift, as highlighted in one of the use cases.



Article 367

Title@2025-06-11 (3): Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation

Title: Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation Sparser, besser, schneller, stärker: Sparsity Detection für effiziente automatische Differenzierung Sparser、更好、更快、更有力:为高效自动差异而进行分级检测 2501.17737v2

Authors (2): Adrian Hill, Guillaume Dalle

From implicit differentiation to probabilistic modeling, Jacobian and Hessian matrices have many potential use cases in Machine Learning (ML), but they are viewed as computationally prohibitive. Fortunately, these matrices often exhibit sparsity, which can be leveraged to speed up the process of Automatic Differentiation (AD). This paper presents advances in sparsity detection, previously the performance bottleneck of Automatic Sparse Differentiation (ASD). Our implementation of sparsity detection is based on operator overloading, able to detect both local and global sparsity patterns, and supports flexible index set representations. It is fully automatic and requires no modification of user code, making it compatible with existing ML codebases. Most importantly, it is highly performant, unlocking Jacobians and Hessians at scales where they were considered too expensive to compute. On real-world problems from scientific ML, graph neural networks and optimization, we show significant speed-ups of up to three orders of magnitude. Notably, using our sparsity detection system, ASD outperforms standard AD for one-off computations, without amortization of either sparsity detection or matrix coloring.



Article 368

Title@2025-06-11 (3): Metritocracy: Representative Metrics for Lite Benchmarks

Title: Metritocracy: Representative Metrics for Lite Benchmarks Metritokratie: Repräsentative Metrics für Lite Benchmarks 宏观经济:利特基准的代议制衡量标准 2506.09813v1

Authors (4): Ariel Procaccia, Benjamin Schiffer, Serena Wang, Shirley Zhang

A common problem in LLM evaluation is how to choose a subset of metrics from a full suite of possible metrics. Subset selection is usually done for efficiency or interpretability reasons, and the goal is often to select a “representative” subset of metrics. However, “representative” is rarely clearly defined. In this work, we use ideas from social choice theory to formalize two notions of representation for the selection of a subset of evaluation metrics. We first introduce positional representation, which guarantees every alternative is sufficiently represented at every position cutoff. We then introduce positional proportionality, which guarantees no alternative is proportionally over- or under-represented by more than a small error at any position. We prove upper and lower bounds on the smallest number of metrics needed to guarantee either of these properties in the worst case. We also study a generalized form of each property that allows for additional input on groups of metrics that must be represented. Finally, we tie theory to practice through real-world case studies on both LLM evaluation and hospital quality evaluation.



Article 369

Title@2025-06-11 (3): Generalizing Supervised Contrastive learning: A Projection Perspective

Title: Generalizing Supervised Contrastive learning: A Projection Perspective Verallgemeinern des Überwachten Kontrastivlernens: Eine Projektionsperspektive 普及受监督的反竞争学习:预测视角 2506.09810v1

Authors (2): Minoh Jeong, Alfred Hero

Self-supervised contrastive learning (SSCL) has emerged as a powerful paradigm for representation learning and has been studied from multiple perspectives, including mutual information and geometric viewpoints. However, supervised contrastive (SupCon) approaches have received comparatively little attention in this context: for instance, while InfoNCE used in SSCL is known to form a lower bound on mutual information (MI), the relationship between SupCon and MI remains unexplored. To address this gap, we introduce ProjNCE, a generalization of the InfoNCE loss that unifies supervised and self-supervised contrastive objectives by incorporating projection functions and an adjustment term for negative pairs. We prove that ProjNCE constitutes a valid MI bound and affords greater flexibility in selecting projection strategies for class embeddings. Building on this flexibility, we further examine the centroid-based class embeddings in SupCon by exploring a variety of projection methods. Extensive experiments on multiple datasets and settings demonstrate that ProjNCE consistently outperforms both SupCon and standard cross-entropy training. Our work thus refines SupCon along two complementary perspectives, mutual information interpretation and projection design, and offers broadly applicable improvements whenever SupCon serves as the foundational contrastive objective.
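For orientation, here is a minimal NumPy sketch of the standard InfoNCE loss that ProjNCE is described as generalizing. It is not ProjNCE itself: the projection functions and the negative-pair adjustment term are not specified in the abstract and are omitted. The function name and the temperature value are illustrative assumptions.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Standard InfoNCE: row i of `positives` is the positive for row i of
    `anchors`; all other rows in the batch act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                   # cross-entropy on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
print(info_nce(z, z + 0.05 * rng.normal(size=z.shape)))
```

In a supervised variant, the positive for each anchor would be derived from its class (for example a class centroid) rather than from an augmented view, which is where the choice of projection enters.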



Article 370

Title@2025-06-11 (3): Network Dynamics-Based Framework for Understanding Deep Neural Networks

Title: Network Dynamics-Based Framework for Understanding Deep Neural Networks Network Dynamics-based Framework zum Verständnis tiefer neuraler Netzwerke 了解深神经网络的动态框架 2501.02436v3

Authors (4): Yuchen Lin, Yong Zhang, Sihan Feng, Hong Zhao

Advancements in artificial intelligence call for a deeper understanding of the fundamental mechanisms underlying deep learning. In this work, we propose a theoretical framework to analyze learning dynamics through the lens of dynamical systems theory. We redefine the notions of linearity and nonlinearity in neural networks by introducing two fundamental transformation units at the neuron level: order-preserving transformations and non-order-preserving transformations. Different transformation modes lead to distinct collective behaviors in weight vector organization, different modes of information extraction, and the emergence of qualitatively different learning phases. Transitions between these phases may occur during training, accounting for key phenomena such as grokking. To further characterize generalization and structural stability, we introduce the concept of attraction basins in both sample and weight spaces. The distribution of neurons with different transformation modes across layers, along with the structural characteristics of the two types of attraction basins, forms a set of core metrics for analyzing the performance of learning models. Hyperparameters such as depth, width, learning rate, and batch size act as control variables for fine-tuning these metrics. Our framework not only sheds light on the intrinsic advantages of deep learning, but also provides a novel perspective for optimizing network architectures and training strategies.



Article 371

Title@2025-06-11 (3): Using Shapley interactions to understand how models use structure

Title: Using Shapley interactions to understand how models use structure Shapley-Interaktionen nutzen, um zu verstehen, wie Modelle Struktur verwenden 利用模拟互动来了解模型如何使用结构 2403.13106v2

Authors (6): Divyansh Singhvi, Diganta Misra, Andrej Erkelens, Raghav Jain, Isabel Papadimitriou, Naomi Saphra

Language is an intricately structured system, and a key goal of NLP interpretability is to provide methodological insights for understanding how language models represent this structure internally. In this paper, we use Shapley Taylor interaction indices (STII) to examine how language and speech models internally relate and structure their inputs. Pairwise Shapley interactions measure how much two inputs work together to influence model outputs beyond the sum of their independent influences, providing a view into how models encode structural interactions between inputs. We relate the interaction patterns in models to three underlying linguistic structures: syntactic structure, non-compositional semantics, and phonetic coarticulation. We find that autoregressive text models encode interactions that correlate with the syntactic proximity of inputs, and that both autoregressive and masked models encode nonlinear interactions in idiomatic phrases with non-compositional semantics. Our speech results show that inputs are more entangled for pairs where a neighboring consonant is likely to influence a vowel or approximant, showing that models encode the phonetic interaction needed for extracting discrete phonemic representations.
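The idea of "working together beyond independent influences" can be illustrated with the second-order discrete difference that pairwise interaction indices are built from. The sketch below computes that difference for a single context; the full STII additionally aggregates such differences over subsets of the remaining inputs with a specific weighting, which is omitted here. The toy value function and all names are hypothetical.

```python
def pairwise_interaction(f, i, j, context=frozenset()):
    """Second-order difference: joint effect of i and j minus their
    separate effects, measured on top of a fixed context set."""
    base = frozenset(context)
    return f(base | {i, j}) - f(base | {i}) - f(base | {j}) + f(base)

# Toy value function over token sets: additive weights plus a bonus
# when the two words of an idiom appear together.
WEIGHTS = {"kick": 1.0, "bucket": 0.5, "the": 0.1}

def toy_model(tokens):
    score = sum(WEIGHTS[t] for t in tokens)
    if {"kick", "bucket"} <= set(tokens):
        score += 2.0  # non-additive, idiom-like contribution
    return score

print(pairwise_interaction(toy_model, "kick", "bucket"))
print(pairwise_interaction(toy_model, "kick", "the"))
```

A purely additive model yields zero for every pair, so any nonzero value signals that the model treats the two inputs jointly.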



Article 372

Title@2025-06-11 (3): Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy

Title: Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy Automatische Behandlungsplanung mit Verstärkungstraining für eine hoch dosierte Prostata Brachytherapie 利用高剂量前列腺布拉奇疗法强化学习进行自动治疗规划 2506.09805v1

Authors (3): Tonghe Wang, Yining Feng, Xiaofeng Yang

Purpose: In high-dose-rate (HDR) prostate brachytherapy procedures, the pattern of needle placement relies solely on physician experience. We investigated the feasibility of using reinforcement learning (RL) to provide needle positions and dwell times based on patient anatomy during the pre-planning stage. This approach would reduce procedure time and ensure consistent plan quality. Materials and Methods: We train an RL agent to adjust the position of one selected needle and all the dwell times on it to maximize a pre-defined reward function after observing the environment. After adjusting, the RL agent moves on to the next needle, until all needles are adjusted. The agent plays multiple rounds until the maximum number of rounds is reached. Plan data from 11 prostate HDR boost patients (1 for training, and 10 for testing) treated in our clinic were included in this study. The dosimetric metrics and the number of needles used in the RL plans were compared to those of the clinical results (ground truth). Results: On average, RL plans and clinical plans have very similar prostate coverage (Prostate V100) and Rectum D2cc (no statistically significant difference), while RL plans have a smaller prostate hotspot (Prostate V150) and lower Urethra D20%, with statistical significance. Moreover, RL plans use 2 fewer needles than clinical plans on average. Conclusion: We present the first study demonstrating the feasibility of using reinforcement learning to autonomously generate clinically practical HDR prostate brachytherapy plans. This RL-based method achieved equal or improved plan quality compared to conventional clinical approaches while requiring fewer needles. With minimal data requirements and strong generalizability, this approach has substantial potential to standardize brachytherapy planning, reduce clinical variability, and enhance patient outcomes.



Article 373

Title@2025-06-11 (3): Regularizing Learnable Feature Extraction for Automatic Speech Recognition

Title: Regularizing Learnable Feature Extraction for Automatic Speech Recognition Regularisieren von Lern-Feature-Extraktion für automatische Spracherkennung 将自动语音识别的可读功能抽取器正规化 2506.09804v1

Authors (5): Peter Vieting, Maximilian Kannen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney

Neural front-ends are an appealing alternative to traditional, fixed feature extraction pipelines for automatic speech recognition (ASR) systems since they can be directly trained to fit the acoustic model. However, their performance often falls short compared to classical methods, which we show is largely due to their increased susceptibility to overfitting. This work therefore investigates regularization methods for training ASR models with learnable feature extraction front-ends. First, we examine audio perturbation methods and show that larger relative improvements can be obtained for learnable features. Additionally, we identify two limitations in the standard use of SpecAugment for these front-ends and propose masking in the short-time Fourier transform (STFT) domain as a simple but effective modification to address these challenges. Finally, integrating both regularization approaches effectively closes the performance gap between traditional and learnable features.
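As a rough illustration of what masking in the STFT domain could look like, the sketch below zeroes one random frequency band and one random block of frames in a complex STFT matrix, in the style of SpecAugment. The number of masks, their maximum widths, and the function name are assumptions; the abstract does not give the exact scheme used by the authors.

```python
import numpy as np

def mask_stft(stft, max_freq_width, max_time_width, rng):
    """Zero one random frequency band and one random time block of a
    (n_freq, n_frames) STFT matrix; returns a masked copy."""
    masked = stft.copy()
    n_freq, n_frames = masked.shape
    fw = rng.integers(0, max_freq_width + 1)   # band width in bins
    f0 = rng.integers(0, n_freq - fw + 1)      # band start
    masked[f0:f0 + fw, :] = 0
    tw = rng.integers(0, max_time_width + 1)   # block width in frames
    t0 = rng.integers(0, n_frames - tw + 1)    # block start
    masked[:, t0:t0 + tw] = 0
    return masked

rng = np.random.default_rng(0)
# Stand-in for a real STFT: 129 frequency bins, 50 frames.
stft = rng.normal(size=(129, 50)) + 1j * rng.normal(size=(129, 50))
masked = mask_stft(stft, max_freq_width=15, max_time_width=10, rng=rng)
print(np.count_nonzero(masked == 0))
```

Because the mask is applied before the learnable front-end, the front-end itself sees the perturbed input, which is not the case when masking is applied to its output features.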



Article 374

Title@2025-06-11 (3): Devil’s Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols

Title: Devil’s Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols Teufelshand: Daten vergiften Angriffe auf lokal private Graphen-Lernprotokolle 魔鬼之手:对本地私人图案学习程序的数据毒害攻击 2506.09803v1

Authors (5): Longzhu He, Chaozhuo Li, Peng Tang, Litian Zhang, Sen Su

Graph neural networks (GNNs) have achieved significant success in graph representation learning and have been applied to various domains. However, many real-world graphs contain sensitive personal information, such as user profiles in social networks, raising serious privacy concerns when graph learning is performed using GNNs. To address this issue, locally private graph learning protocols have gained considerable attention. These protocols leverage the privacy advantages of local differential privacy (LDP) and the effectiveness of GNN’s message-passing in calibrating noisy data, offering strict privacy guarantees for users’ local data while maintaining high utility (e.g., node classification accuracy) for graph learning. Despite these advantages, such protocols may be vulnerable to data poisoning attacks, a threat that has not been considered in previous research. Identifying and addressing these threats is crucial for ensuring the robustness and security of privacy-preserving graph learning frameworks. This work introduces the first data poisoning attack targeting locally private graph learning protocols. The attacker injects fake users into the protocol, manipulates these fake users to establish links with genuine users, and sends carefully crafted data to the server, ultimately compromising the utility of private graph learning. The effectiveness of the attack is demonstrated both theoretically and empirically. In addition, several defense strategies have also been explored, but their limited effectiveness highlights the need for more robust defenses.



Article 375

Title@2025-06-11 (3): LogProber: Disentangling confidence from contamination in LLM responses

Title: LogProber: Disentangling confidence from contamination in LLM responses LogProber: Entwirren des Vertrauens in LLM-Antworten 日志Prober:解除对LLM反应中污染的信心 2408.14352v2

Authors (3): Nicolas Yax, Pierre-Yves Oudeyer, Stefano Palminteri

In machine learning, contamination refers to situations where testing data leak into the training set. The issue is particularly relevant for the evaluation of the performance of Large Language Models (LLMs), which are generally trained on gargantuan, and generally opaque, corpora of text scraped from the world wide web. Developing tools to detect contamination is therefore crucial to be able to fairly and properly track the evolution of the performance of LLMs. To date, only a few recent studies have attempted to address the issue of quantifying and detecting contamination in short text sequences, such as those commonly found in benchmarks. However, these methods have limitations that can sometimes render them impractical. In the present paper, we introduce LogProber, a novel, efficient algorithm that we show can detect contamination in a black-box setting and that tries to tackle some of these drawbacks by focusing on familiarity with the question rather than the answer. Here, we explore the properties of the proposed method in comparison with concurrent approaches, identify its advantages and limitations, and illustrate how different forms of contamination can go undetected depending on the design of the detection algorithm.



Article 376

Title@2025-06-11 (3): Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction

Title: Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction Einschließlich sprachlicher Einschränkungen aus der externen Wissensquelle für Audio-Visual Target Speech Extraction 将外部知识来源的语言限制纳入音像目标演讲摘录 2506.09792v1

Authors (5): Wenxuan Wu, Shuai Wang, Xixin Wu, Helen Meng, Haizhou Li

Audio-visual target speaker extraction (AV-TSE) models primarily rely on target visual cues to isolate the target speaker’s voice from others. We know that humans leverage linguistic knowledge, such as syntax and semantics, to support speech perception. Inspired by this, we explore the potential of pre-trained speech-language models (PSLMs) and pre-trained language models (PLMs) as auxiliary knowledge sources for AV-TSE. In this study, we propose incorporating the linguistic constraints from PSLMs or PLMs for the AV-TSE model as additional supervision signals. Without introducing any extra computational cost during inference, the proposed approach consistently improves speech quality and intelligibility. Furthermore, we evaluate our method in multi-language settings and visual cue-impaired scenarios and show robust performance gains.



Article 377

Title@2025-06-11 (3): DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images

Title: DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images DANCE: Deep Learning-Assisted Analyse von Proteinsequenzen mit Chaos verbesserte Kaleidoskopische Bilder 成就:利用混乱强化卡莱多斯图象对蛋白序列进行深入学习辅助分析 2409.06694v3

Authors (5): Taslim Murad, Prakash Chourasia, Sarwan Ali, Imdad Ullah Khan, Murray Patterson

Cancer is a complex disease characterized by uncontrolled cell growth. T cell receptors (TCRs), crucial proteins in the immune system, play a key role in recognizing antigens, including those associated with cancer. Recent advancements in sequencing technologies have facilitated comprehensive profiling of TCR repertoires, uncovering TCRs with potent anti-cancer activity and enabling TCR-based immunotherapies. However, analyzing these intricate biomolecules necessitates efficient representations that capture their structural and functional information. T-cell protein sequences pose unique challenges due to their relatively short lengths compared to other biomolecules. An image-based representation approach becomes a preferred choice for efficient embeddings, allowing for the preservation of essential details and enabling comprehensive analysis of T-cell protein sequences. In this paper, we propose to generate images from the protein sequences using Chaos Game Representation (CGR) together with the Kaleidoscopic images approach. This Deep Learning Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images (called DANCE) provides a unique way to visualize protein sequences by recursively applying chaos game rules around a central seed point. We classify TCR protein sequences by their respective target cancer cells, as TCRs are known for their immune response against cancer. The TCR sequences are converted into images using the DANCE method. We employ deep-learning vision models to perform the classification and to obtain insights into the relationship between the visual patterns observed in the generated kaleidoscopic images and the underlying protein properties. By combining CGR-based image generation with deep learning classification, this study opens novel possibilities in the protein analysis domain.
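The chaos game rule mentioned above can be sketched as follows: each amino acid is assigned a vertex of a regular polygon, and the current point moves halfway toward the vertex of each successive residue, starting from a central seed. This is a plain CGR sketch under assumed choices (20-gon layout, midpoint step, alphabetical vertex order); it does not reproduce the kaleidoscopic enhancement of DANCE, whose details are not given in the abstract.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, assumed vertex order

def cgr_points(sequence):
    """Chaos game trajectory: start at the center, move halfway toward
    the polygon vertex of each residue in turn."""
    angles = 2 * np.pi * np.arange(len(AMINO_ACIDS)) / len(AMINO_ACIDS)
    vertices = {aa: np.array([np.cos(t), np.sin(t)])
                for aa, t in zip(AMINO_ACIDS, angles)}
    point = np.zeros(2)  # central seed point
    points = []
    for aa in sequence:
        point = (point + vertices[aa]) / 2.0
        points.append(point)
    return np.array(points)

def cgr_image(sequence, size=64):
    """Rasterize the trajectory into a size x size count image."""
    pts = cgr_points(sequence)
    img = np.zeros((size, size))
    idx = np.clip(((pts + 1) / 2 * size).astype(int), 0, size - 1)
    for col, row in idx:
        img[row, col] += 1
    return img

image = cgr_image("CASSLGQAYEQYF")  # illustrative residue string
print(image.shape, image.sum())
```

The resulting array can be fed to any image classifier, which is the general pattern the abstract describes.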



Article 378

Title@2025-06-11 (3): Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing

Title: Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing Darstellung Shattering in Transformers: Synthetische Studie mit Wissensbearbeitung 在变形器中代表变形器:带有知识编辑的合成研究 2410.17194v5

Authors (6): Kento Nishi, Rahul Ramesh, Maya Okawa, Mikail Khona, Hidenori Tanaka, Ekdeep Singh Lubana

Knowledge Editing (KE) algorithms alter models’ weights to perform targeted updates to incorrect, outdated, or otherwise unwanted factual associations. However, recent work has shown that applying KE can adversely affect models’ broader factual recall accuracy and diminish their reasoning abilities. Although these studies give insights into the potential harms of KE algorithms, e.g., performance evaluations on benchmarks, little is understood about why such destructive failures occur. Motivated by this, we define a novel synthetic task in which a Transformer is trained from scratch to internalize a “structured” knowledge graph. The structure enforces relationships between entities of the graph, such that editing a factual association has “trickling effects” on other entities (e.g., changing X’s parent from Y to Z affects who the parent of X’s siblings is). Through evaluations of edited models on this task, we show that KE inadvertently affects representations of entities beyond the targeted one, distorting relevant structures that allow a model to infer unseen knowledge about an entity. We call this phenomenon representation shattering and demonstrate that it degrades models’ factual recall and reasoning performance. We further corroborate our findings in naturalistic settings with pre-trained Llama and Mamba models. Overall, our work yields a precise mechanistic hypothesis to explain why KE has adverse effects on model abilities.



Article 379

Title@2025-06-11 (3): A theoretical framework for self-supervised contrastive learning for continuous dependent data

Title: A theoretical framework for self-supervised contrastive learning for continuous dependent data Ein theoretischer Rahmen für selbstüberwachtes kontrastives Lernen für kontinuierliche abhängige Daten 为持续依赖数据进行自我监督对比式学习的理论框架 2506.09785v1

Authors (3): Alexander Marusov, Alexander Yuhay, Alexey Zaytsev

Self-supervised learning (SSL) has emerged as a powerful approach to learning representations, particularly in the field of computer vision. However, its application to dependent data, such as temporal and spatio-temporal domains, remains underexplored. Moreover, traditional contrastive SSL methods often assume semantic independence between samples, which does not hold for dependent data exhibiting complex correlations. We propose a novel theoretical framework for contrastive SSL tailored to continuous dependent data, which allows the nearest samples to be semantically close to each other. In particular, we propose two possible ground truth similarity measures between objects: hard and soft closeness. Under this framework, we derive an analytical form for the estimated similarity matrix that accommodates both types of closeness between samples, thereby introducing dependency-aware loss functions. We validate our approach, Dependent TS2Vec, on temporal and spatio-temporal downstream problems. Given the dependency patterns present in the data, our approach surpasses modern ones for dependent data, highlighting the effectiveness of our theoretically grounded loss functions for SSL in capturing spatio-temporal dependencies. Specifically, we outperform TS2Vec on the standard UEA and UCR benchmarks, with accuracy improvements of 4.17% and 2.08%, respectively. Furthermore, on the drought classification task, which involves complex spatio-temporal patterns, our method achieves a 7% higher ROC-AUC score.



Article 380

Title@2025-06-11 (3): On the Similarities of Embeddings in Contrastive Learning

Title: On the Similarities of Embeddings in Contrastive Learning Über die Ähnlichkeiten von Einbettungen im kontrastiven Lernen 关于差异学习中的嵌入相似性 2506.09781v1

Authors (4): Chungpa Lee, Sehee Lim, Kibok Lee, Jy-yong Sohn

Contrastive learning (CL) operates on a simple yet effective principle: embeddings of positive pairs are pulled together, while those of negative pairs are pushed apart. Although various forms of contrastive loss have been proposed and analyzed from different perspectives, prior works lack a comprehensive framework that systematically explains a broad class of these objectives. In this paper, we present a unified framework for understanding CL, which is based on analyzing the cosine similarity between embeddings of positive and negative pairs. In full-batch settings, we show that perfect alignment of positive pairs is unattainable when similarities of negative pairs fall below a certain threshold, and that this misalignment can be alleviated by incorporating within-view negative pairs. In mini-batch settings, we demonstrate that smaller batch sizes incur stronger separation among negative pairs within batches, which leads to higher variance in similarities of negative pairs. To address this limitation of mini-batch CL, we introduce an auxiliary loss term that reduces the variance of similarities of negative pairs in CL. Empirical results demonstrate that incorporating the proposed loss consistently improves the performance of CL methods in small-batch training.
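The quantity at the center of the mini-batch analysis, the variance of cosine similarities among negative pairs, is straightforward to compute. The sketch below is an assumption-laden illustration: it treats row i of two views as a positive pair and every cross-view pair (i, j) with i ≠ j as negative. The exact form and weighting of the authors' auxiliary loss are not stated in the abstract.

```python
import numpy as np

def negative_similarity_variance(z1, z2):
    """Variance of cosine similarities over cross-view negative pairs,
    where (z1[i], z2[i]) are positives and all other pairs are negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T                                  # pairwise cosine similarity
    off_diag = ~np.eye(sim.shape[0], dtype=bool)     # mask out positive pairs
    return np.var(sim[off_diag])

rng = np.random.default_rng(0)
view1 = rng.normal(size=(8, 32))
view2 = view1 + 0.1 * rng.normal(size=view1.shape)
print(negative_similarity_variance(view1, view2))
```

Added to a contrastive objective with a small coefficient, a term like this would penalize batches whose negative similarities are widely spread.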



Article 381

Title@2025-06-11 (3): Cross-Channel Unlabeled Sensing over a Union of Signal Subspaces

Title: Cross-Channel Unlabeled Sensing over a Union of Signal Subspaces Cross-Channel-Unmarkierte Sensing über eine Union von Signal-Subräumen 对信号子空间联盟的交叉通道无标签遥感 2506.09773v1

Authors (4): Taulant Koka, Manolis C. Tsakiris, Benjamín Béjar Haro, Michael Muma

Cross-channel unlabeled sensing addresses the problem of recovering a multi-channel signal from measurements that were shuffled across channels. This work expands the cross-channel unlabeled sensing framework to signals that lie in a union of subspaces. The extension allows for handling more complex signal structures and broadens the framework to tasks like compressed sensing. These mismatches between samples and channels often arise in applications such as whole-brain calcium imaging of freely moving organisms or multi-target tracking. We improve over previous models by deriving tighter bounds on the required number of samples for unique reconstruction, while supporting more general signal types. The approach is validated through an application in whole-brain calcium imaging, where organism movements disrupt sample-to-neuron mappings. This demonstrates the utility of our framework in real-world settings with imprecise sample-channel associations, achieving accurate signal reconstruction.



Article 382

Title@2025-06-11 (3): Load-Aware Training Scheduling for Model Circulation-based Decentralized Federated Learning

Title: Load-Aware Training Scheduling for Model Circulation-based Decentralized Federated Learning Load-Aware-Trainingsplanung für Modellzirkulationsbasiertes dezentralisiertes Federated Learning 模拟环环流分散式联邦学习模式的负载软件培训计划 2506.09769v1

Authors (2): Haruki Kainuma, Takayuki Nishio

This paper proposes Load-aware Tram-FL, an extension of Tram-FL that introduces a training scheduling mechanism to minimize total training time in decentralized federated learning by accounting for both computational and communication loads. The scheduling problem is formulated as a global optimization task, which, though intractable in its original form, is made solvable by decomposing it into node-wise subproblems. To promote balanced data utilization under non-IID distributions, a variance constraint is introduced, while the overall training latency, including both computation and communication costs, is minimized through the objective function. Simulation results on MNIST and CIFAR-10 demonstrate that Load-aware Tram-FL significantly reduces training time and accelerates convergence compared to baseline methods.



Article 383

Title@2025-06-11 (3): Anomaly Detection via Autoencoder Composite Features and NCE

Title: Anomaly Detection via Autoencoder Composite Features and NCE Anomalieerkennung über Autoencoder Composite-Features und NCE 通过自动编码器合成特征和 NCE 异常探测 2502.01920v2

Authors (2): Yalin Liao, Austin J. Brockmeier

Unsupervised anomaly detection is a challenging task. Autoencoders (AEs) or generative models are often employed to model the data distribution of normal inputs and subsequently identify anomalous, out-of-distribution inputs by high reconstruction error or low likelihood, respectively. However, AEs may generalize and achieve small reconstruction errors on abnormal inputs. We propose a decoupled training approach for anomaly detection that uses both an AE and a likelihood model trained with noise contrastive estimation (NCE). After training the AE, NCE estimates a probability density function, to serve as the anomaly score, on the joint space of the AE’s latent representation combined with features of the reconstruction quality. To further reduce the false negative rate in NCE, we systematically vary the reconstruction features to augment the training and optimize the contrastive Gaussian noise distribution. Experimental assessments on multiple benchmark datasets demonstrate that the proposed approach matches the performance of prevalent state-of-the-art anomaly detection algorithms.



Article 384

Title@2025-06-11 (3): Learning to Optimize Package Picking for Large-Scale, Real-World Robot Induction

Title: Learning to Optimize Package Picking for Large-Scale, Real-World Robot Induction Optimierung der Auswahl von Paketen für großformatige, real-Welt-Roboter-Induktion 学习优化大型、现实世界机器人上岗软件包选择 2506.09765v1

Authors (6): Shuai Li, Azarakhsh Keipour, Sicong Zhao, Srinath Rajagopalan, Charles Swan, Kostas E. Bekris

Warehouse automation plays a pivotal role in enhancing operational efficiency, minimizing costs, and improving resilience to workforce variability. While prior research has demonstrated the potential of machine learning (ML) models to increase picking success rates in large-scale robotic fleets by prioritizing high-probability picks and packages, these efforts primarily focused on predicting success probabilities for picks sampled using heuristic methods. Limited attention has been given, however, to leveraging data-driven approaches to directly optimize sampled picks for better performance at scale. In this study, we propose an ML-based framework that predicts transform adjustments and improves the selection of suction cups for multi-suction end effectors, so that sampled picks have higher success probabilities. The framework was integrated and evaluated in test workcells that resemble the operations of Amazon Robotics’ Robot Induction (Robin) fleet, which is used for package manipulation. Evaluated on over 2 million picks, the proposed method achieves a 20% reduction in pick failure rates compared to a heuristic-based pick sampling baseline, demonstrating its effectiveness in large-scale warehouse automation scenarios.



Article 385

Title@2025-06-11 (3): Alice and the Caterpillar: A more descriptive null model for assessing data mining results

Title: Alice and the Caterpillar: A more descriptive null model for assessing data mining results Alice und die Caterpillar: Ein beschreibendes Nullmodell zur Bewertung der Ergebnisse des Data Mining 爱丽丝和毛毛虫:评估数据开采结果的更具描述性的无效模式 2506.09764v1

Authors (3): Giulia Preti, Gianmarco De Francisci Morales, Matteo Riondato

We introduce novel null models for assessing the results obtained from observed binary transactional and sequence datasets, using statistical hypothesis testing. Our null models maintain more properties of the observed dataset than existing ones. Specifically, they preserve the Bipartite Joint Degree Matrix of the bipartite (multi-)graph corresponding to the dataset, which ensures that the number of caterpillars, i.e., paths of length three, is preserved, in addition to other properties considered by other models. We describe Alice, a suite of Markov chain Monte Carlo algorithms for sampling datasets from our null models, based on a carefully defined set of states and efficient operations to move between them. The results of our experimental evaluation show that Alice mixes fast and scales well, and that our null model finds different significant results than ones previously considered in the literature.



Article 386

Title@2025-06-11 (3): Generating Likely Counterfactuals Using Sum-Product Networks

Title: Generating Likely Counterfactuals Using Sum-Product Networks Wahrscheinliche Counterfactuals mit Sum-Product-Netzwerken generieren 使用总和生产网络产生可能的反事实 2401.14086v5

Authors (3): Jiri Nemecek, Tomas Pevny, Jakub Marecek

The need to explain decisions made by AI systems is driven by both recent regulation and user demand. The decisions are often explainable only post hoc. In counterfactual explanations, one may ask what constitutes the best counterfactual explanation. Clearly, multiple criteria must be taken into account, although “distance from the sample” is a key criterion. Recent methods that consider the plausibility of a counterfactual seem to sacrifice this original objective. Here, we present a system that provides high-likelihood explanations that are, at the same time, close and sparse. We show that the search for the most likely explanations satisfying many common desiderata for counterfactual explanations can be modeled using Mixed-Integer Optimization (MIO). We use a Sum-Product Network (SPN) to estimate the likelihood of a counterfactual. To achieve that, we propose an MIO formulation of an SPN, which can be of independent interest. The source code with examples is available at https://github.com/Epanemu/LiCE.



Article 387

Title@2025-06-11 (3): STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization

Title: STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization STAR: Vielfältiges Lernen von Roboter-Fähigkeit Abstraktionen durch Rotations-Augmented Vector Quantization STAR:通过旋转-增强矢量量化,学习多式机器人技能变异 2506.03863v2

Authors (7): Hao Li, Qi Lv, Rui Shao, Xiang Deng, Yinchuan Li, Jianye Hao, Liqiang Nie

Transforming complex actions into discrete skill abstractions has demonstrated strong potential for robotic manipulation. Existing approaches mainly leverage latent variable models, e.g., VQ-VAE, to learn skill abstractions through learned vectors (codebooks), but they suffer from codebook collapse and struggle to model the causal relationship between learned skills. To address these limitations, we present Skill Training with Augmented Rotation (STAR), a framework that advances both skill learning and composition to complete complex behaviors. Specifically, to prevent codebook collapse, we devise rotation-augmented residual skill quantization (RaRSQ). It encodes relative angles between encoder outputs into the gradient flow through a rotation-based gradient mechanism. Points within the same skill code are forced to be either pushed apart or pulled closer together depending on gradient directions. Further, to capture the causal relationship between skills, we present the causal skill transformer (CST), which explicitly models dependencies between skill representations through an autoregressive mechanism for coherent action generation. Extensive experiments demonstrate the superiority of STAR on both the LIBERO benchmark and real-world tasks, with around 12% improvement over the baselines.



Article 388

Title@2025-06-11 (3): Feature Engineering for Agents: An Adaptive Cognitive Architecture for Interpretable ML Monitoring

Title: Feature Engineering for Agents: An Adaptive Cognitive Architecture for Interpretable ML Monitoring Feature Engineering for Agents: Eine adaptive Kognitive Architektur für interpretierbare ML-Überwachung 代理物特征工程:用于可解释ML监测的适应性认知结构 2506.09742v1

Authors (6): Gusseppe Bravo-Rocca, Peini Liu, Jordi Guitart, Rodrigo M Carrillo-Larco, Ajay Dholakia, David Ellison

Monitoring Machine Learning (ML) models in production environments is crucial, yet traditional approaches often yield verbose, low-interpretability outputs that hinder effective decision-making. We propose a cognitive architecture for ML monitoring that applies feature engineering principles to agents based on Large Language Models (LLMs), significantly enhancing the interpretability of monitoring outputs. Central to our approach is a Decision Procedure module that simulates feature engineering through three key steps: Refactor, Break Down, and Compile. The Refactor step improves data representation to better capture feature semantics, allowing the LLM to focus on salient aspects of the monitoring data while reducing noise and irrelevant information. Break Down decomposes complex information for detailed analysis, and Compile integrates sub-insights into clear, interpretable outputs. This process leads to a more deterministic planning approach, reducing dependence on LLM-generated planning, which can sometimes be inconsistent and overly general. The combination of feature engineering-driven planning and selective LLM utilization results in a robust decision support system, capable of providing highly interpretable and actionable insights. Experiments using multiple LLMs demonstrate the efficacy of our approach, achieving significantly higher accuracy compared to various baselines across several domains.



Article 389

Title@2025-06-11 (3): Byzantine-Resilient Decentralized Multi-Armed Bandits

Title: Byzantine-Resilient Decentralized Multi-Armed Bandits Byzantinisch-Resiliente dezentralisierte Multi-Armed Banditen 拜占庭抗力强的分散式多武装强盗 2310.07320v2

Authors (4): Jingxuan Zhu, Alec Koppel, Alvaro Velasquez, Ji Liu

In decentralized cooperative multi-armed bandits (MAB), each agent observes a distinct stream of rewards, and seeks to exchange information with others to select a sequence of arms so as to minimize its regret. Agents in the cooperative setting can outperform a single agent running a MAB method such as Upper-Confidence Bound (UCB) independently. In this work, we study how to recover such salient behavior when an unknown fraction of the agents can be Byzantine, that is, communicate arbitrarily wrong information in the form of reward mean-estimates or confidence sets. This framework can be used to model attackers in computer networks, instigators of offensive content in recommender systems, or manipulators of financial markets. Our key contribution is the development of a fully decentralized resilient upper confidence bound (UCB) algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values. This truncation step enables us to establish that the performance of each normal agent is no worse than the classic single-agent UCB1 algorithm in terms of regret, and more importantly, the cumulative regret of all normal agents is strictly better than the non-cooperative case, provided that each agent has at least 3f+1 neighbors, where f is the maximum possible number of Byzantine agents in each agent’s neighborhood. Extensions to time-varying neighbor graphs and minimax lower bounds on the achievable regret are further established. Experiments corroborate the merits of this framework in practice.
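
The mixing-plus-truncation idea can be illustrated with a small sketch. The abstract does not give the exact truncation rule or confidence widths, so the code below substitutes a plain trimmed mean (drop the f largest and f smallest neighbor reports) and the textbook UCB1 bonus; the function names `trimmed_mean`, `ucb_index`, and `choose_arm` are hypothetical and not taken from the paper.

```python
from math import log, sqrt


def trimmed_mean(own, neighbor_values, f):
    """Mix an agent's own estimate with neighbor reports after trimming.

    Assumption: drop the f largest and f smallest neighbor reports, then
    average what is left together with the agent's own estimate.
    """
    if len(neighbor_values) < 2 * f + 1:
        raise ValueError("need more than 2f neighbor reports to trim safely")
    kept = sorted(neighbor_values)[f:len(neighbor_values) - f]
    return (own + sum(kept)) / (1 + len(kept))


def ucb_index(mean_estimate, pulls, t):
    """Textbook UCB1 index: mean estimate plus exploration bonus."""
    return mean_estimate + sqrt(2.0 * log(t) / pulls)


def choose_arm(own_means, neighbor_reports, pulls, t, f):
    """Pick the arm with the highest UCB index built from trimmed estimates.

    own_means[k] and pulls[k] are this agent's statistics for arm k;
    neighbor_reports[k] lists the mean estimates neighbors reported for arm k.
    """
    indices = [
        ucb_index(trimmed_mean(own_means[k], neighbor_reports[k], f), pulls[k], t)
        for k in range(len(own_means))
    ]
    return max(range(len(indices)), key=indices.__getitem__)
```

With f = 1, a single neighbor reporting an absurd mean such as 100.0 for a poor arm is discarded by the trim, so the agent's choice is driven by the remaining, mutually consistent reports.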



Article 390

Title@2025-06-11 (3): Towards Multi-modal Graph Large Language Model

Title: Towards Multi-modal Graph Large Language Model Auf dem Weg zum multimodalen Graphenmodell für große Sprachen 走向多式大语言数字模型 2506.09738v1

Authors (6): Xin Wang, Zeyang Zhang, Linxin Xiao, Haibo Chen, Chendi Ge, Wenwu Zhu

Multi-modal graphs, which integrate diverse multi-modal features and relations, are ubiquitous in real-world applications. However, existing multi-modal graph learning methods are typically trained from scratch for specific graph data and tasks, failing to generalize across various multi-modal graph data and tasks. To bridge this gap, we explore the potential of Multi-modal Graph Large Language Models (MG-LLM) to unify and generalize across diverse multi-modal graph data and tasks. We propose a unified framework of multi-modal graph data, task, and model, discovering the inherent multi-granularity and multi-scale characteristics in multi-modal graphs. Specifically, we present five key desired characteristics for MG-LLM: 1) unified space for multi-modal structures and attributes, 2) capability of handling diverse multi-modal graph tasks, 3) multi-modal graph in-context learning, 4) multi-modal graph interaction with natural language, and 5) multi-modal graph reasoning. We then elaborate on the key challenges, review related works, and highlight promising future research directions towards realizing these ambitious characteristics. Finally, we summarize existing multi-modal graph datasets pertinent for model training. We believe this paper can contribute to the ongoing advancement of the research towards MG-LLM for generalization across multi-modal graph data and tasks.



Article 391

Title@2025-06-11 (3): AtmosMJ: Revisiting Gating Mechanism for AI Weather Forecasting Beyond the Year Scale

Title: AtmosMJ: Revisiting Gating Mechanism for AI Weather Forecasting Beyond the Year Scale AtmosMJ: Über die Jahresskala hinausgehende KI-Wettervorhersagen über das Gateing-Mechanismus hinaus AtmosMJ: 重新研究国际气象预报系统在年规模以后的天气预报的确定机制 2506.09733v1

Authors (1): Minjong Cheon

The advent of Large Weather Models (LWMs) has marked a turning point in data-driven forecasting, with many models now outperforming traditional numerical systems in the medium range. However, achieving stable, long-range autoregressive forecasts beyond a few weeks remains a significant challenge. Prevailing state-of-the-art models that achieve year-long stability, such as SFNO and DLWP-HPX, have relied on transforming input data onto non-standard spatial domains like spherical harmonics or HEALPix meshes. This has led to the prevailing assumption that such representations are necessary to enforce physical consistency and long-term stability. This paper challenges that assumption by investigating whether comparable long-range performance can be achieved on the standard latitude-longitude grid. We introduce AtmosMJ, a deep convolutional network that operates directly on ERA5 data without any spherical remapping. The model’s stability is enabled by a novel Gated Residual Fusion (GRF) mechanism, which adaptively moderates feature updates to prevent error accumulation over long recursive simulations. Our results demonstrate that AtmosMJ produces stable and physically plausible forecasts for about 500 days. In quantitative evaluations, it achieves competitive 10-day forecast accuracy against models like Pangu-Weather and GraphCast, all while requiring a remarkably low training budget of 5.7 days on a V100 GPU. Our findings suggest that efficient architectural design, rather than non-standard data representation, can be the key to unlocking stable and computationally efficient long-range weather prediction.



Article 392

Title@2025-06-11 (3): Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts

Title: Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts Auf dem Weg zu Basismodellen für die dynamische Systemrekonstruktion: Hierarchisches Meta-Learning über die Mischung von Experten 建立动态系统重建基础模式:通过专家混合学习 2502.05335v2

Authors (5): Roussel Desmond Nzoyem, Grant Stevens, Amarpal Sahota, David A. W. Barton, Tom Deakin

As foundational models reshape scientific discovery, a bottleneck persists in dynamical system reconstruction (DSR): the ability to learn across system hierarchies. Many meta-learning approaches have been applied successfully to single systems, but falter when confronted with sparse, loosely related datasets requiring multiple hierarchies to be learned. Mixture of Experts (MoE) offers a natural paradigm to address these challenges. Despite their potential, we demonstrate that naive MoEs are inadequate for the nuanced demands of hierarchical DSR, largely due to their gradient descent-based gating update mechanism which leads to slow updates and conflicted routing during training. To overcome this limitation, we introduce MixER: Mixture of Expert Reconstructors, a novel sparse top-1 MoE layer employing a custom gating update algorithm based on $K$-means and least squares. Extensive experiments validate MixER’s capabilities, demonstrating efficient training and scalability to systems of up to ten parametric ordinary differential equations. However, our layer underperforms state-of-the-art meta-learners in high-data regimes, particularly when each expert is constrained to process only a fraction of a dataset composed of highly related data points. Further analysis with synthetic and neuroscientific time series suggests that the quality of the contextual representations generated by MixER is closely linked to the presence of hierarchical structure in the data.



Article 393

Title@2025-06-11 (3): Reevaluating Meta-Learning Optimization Algorithms Through Contextual Self-Modulation

Title: Reevaluating Meta-Learning Optimization Algorithms Through Contextual Self-Modulation Neubewertung von Meta-Learning-Optimierungsalgorithmen durch kontextuelle Selbst-Modulation 通过上下文自体调整重新评价元-收益优化优化化的元-收益优化化 2410.01655v2

Authors (3): Roussel Desmond Nzoyem, David A. W. Barton, Tom Deakin

Contextual Self-Modulation (CSM) (Nzoyem et al., 2025) is a potent regularization mechanism for Neural Context Flows (NCFs) which demonstrates powerful meta-learning on physical systems. However, CSM has limitations in its applicability across different modalities and in high-data regimes. In this work, we introduce two extensions: $i$CSM which expands CSM to infinite-dimensional variations by embedding the contexts into a function space, and StochasticNCF which improves scalability by providing a low-cost approximation of meta-gradient updates through a sampled set of nearest environments. These extensions are demonstrated through comprehensive experimentation on a range of tasks, including dynamical systems, computer vision challenges, and curve fitting problems. Additionally, we incorporate higher-order Taylor expansions via Taylor-Mode automatic differentiation, revealing that higher-order approximations do not necessarily enhance generalization. Finally, we demonstrate how CSM can be integrated into other meta-learning frameworks with FlashCAVIA, a computationally efficient extension of the CAVIA meta-learning framework (Zintgraf et al., 2019). Together, these contributions highlight the significant benefits of CSM and indicate that its strengths in meta-learning and out-of-distribution tasks are particularly well-suited to physical systems. Our open-source library, designed for modular integration of self-modulation into contextual meta-learning workflows, is available at https://github.com/ddrous/self-mod.



Article 394

Title@2025-06-11 (3): Auto-Compressing Networks

Title: Auto-Compressing Networks Autokomprimierende Netzwerke 自动压缩网络 2506.09714v1

Authors (3): Vaggelis Dorovatas, Georgios Paraskevopoulos, Alexandros Potamianos

Deep neural networks with short residual connections have demonstrated remarkable success across domains, but increasing depth often introduces computational redundancy without corresponding improvements in representation quality. In this work, we introduce Auto-Compressing Networks (ACNs), an architectural variant where additive long feedforward connections from each layer to the output replace traditional short residual connections. ACNs exhibit a property we term “auto-compression”: the ability of a network to organically compress information during training with gradient descent, through architectural design alone. Through auto-compression, information is dynamically “pushed” into early layers during training, enhancing their representational quality and revealing potential redundancy in deeper ones. We theoretically show that this property emerges from layer-wise training patterns present in ACNs, where layers are dynamically utilized during training based on task requirements. We also find that ACNs exhibit enhanced noise robustness compared to residual networks, superior performance in low-data settings, improved transfer learning capabilities, and reduced catastrophic forgetting, suggesting that they learn representations that generalize better despite using fewer parameters. Our results demonstrate up to 18% reduction in catastrophic forgetting and 30-80% architectural compression while maintaining accuracy across vision transformers, MLP-mixers, and BERT architectures. Furthermore, we demonstrate that coupling ACNs with traditional pruning techniques enables significantly better sparsity-performance trade-offs compared to conventional architectures. These findings establish ACNs as a practical approach to developing efficient neural architectures that automatically adapt their computational footprint to task complexity, while learning robust representations.



Article 395

Title@2025-06-11 (3): Training-Free Voice Conversion with Factorized Optimal Transport

Title: Training-Free Voice Conversion with Factorized Optimal Transport Schulungsfreie Sprachumwandlung mit Factorized Optimal Transport 以优化运输为要素的免费培训语音转换 2506.09709v1

Authors (3): Alexander Lobashev, Assel Yermekova, Maria Larchenko

This paper introduces Factorized MKL-VC, a training-free modification of the kNN-VC pipeline. In contrast with the original pipeline, our algorithm performs high-quality any-to-any cross-lingual voice conversion with only 5 seconds of reference audio. MKL-VC replaces kNN regression with a factorized optimal transport map in WavLM embedding subspaces, derived from the Monge-Kantorovich linear solution. Factorization addresses non-uniform variance across dimensions, ensuring effective feature transformation. Experiments on the LibriSpeech and FLEURS datasets show that MKL-VC significantly improves content preservation and robustness with short reference audio, outperforming kNN-VC. MKL-VC achieves performance comparable to FACodec, especially in the cross-lingual voice conversion domain.
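
As a rough illustration of a factorized linear transport map, the sketch below treats every feature dimension as its own one-dimensional subspace. For one-dimensional Gaussians the optimal transport map has the closed form T(x) = mu_t + (sigma_t / sigma_s) (x - mu_s). This is a simplifying assumption: the abstract does not state the subspace sizes, and the method presumably uses full covariances within each WavLM subspace. The names `fit_factorized_linear_ot` and `apply_map` are hypothetical.

```python
from statistics import fmean, pstdev


def fit_factorized_linear_ot(source, target):
    """Fit one affine 1-D Gaussian transport map per feature dimension.

    source and target are lists of feature vectors with the same dimension.
    Returns a list of (source_mean, target_mean, scale) tuples.
    """
    dims = len(source[0])
    params = []
    for d in range(dims):
        s = [x[d] for x in source]
        t = [y[d] for y in target]
        sd_s, sd_t = pstdev(s), pstdev(t)
        # Guard against a constant source dimension (illustrative choice).
        scale = sd_t / sd_s if sd_s > 0 else 1.0
        params.append((fmean(s), fmean(t), scale))
    return params


def apply_map(params, x):
    """Move one source feature vector toward the target statistics."""
    return [mu_t + scale * (v - mu_s) for v, (mu_s, mu_t, scale) in zip(x, params)]
```

After mapping, each dimension of the source features has the same mean and standard deviation as the corresponding target dimension, which is the sense in which per-dimension factorization handles non-uniform variance.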



Article 396

Title@2025-06-11 (3): LLM2TEA: Agentic AI Designer Finds Innovative Objects with Generative Evolutionary Multitasking

Title: LLM2TEA: Agentic AI Designer Finds Innovative Objects with Generative Evolutionary Multitasking LLM2TEA: Agentischer AI-Designer findet innovative Objekte mit generativem evolutionären Multitasking LLM2TEA: ARI 设计师发现具有引演进进化多任务的创新对象 2406.14917v2

Authors (5): Melvin Wong, Jiao Liu, Thiago Rios, Stefan Menzel, Yew Soon Ong

In this paper, we introduce the LLM-driven MultiTask Evolutionary Algorithm (LLM2TEA), the first agentic AI designer within a generative evolutionary multitasking (GEM) framework that promotes the crossover and synergy of designs from multiple domains, leading to innovative solutions that transcend individual disciplines. Of particular interest is the discovery of objects that are not only innovative but also conform to the physical specifications of the real world in science and engineering. LLM2TEA comprises a large language model to initialize a population of genotypes (defined by text prompts) describing the objects of interest, a text-to-3D generative model to produce phenotypes from these prompts, a classifier to interpret the semantic representations of the objects, and a physics simulation model to assess their physical properties. We propose several novel LLM-based multitask evolutionary operators to guide the search toward the discovery of high-performing practical objects. Experimental results in conceptual design optimization validate the effectiveness of LLM2TEA, revealing a 97% to 174% improvement in the diversity of innovative objects compared to the present text-to-3D generative model baseline. In addition, more than 73% of the generated designs have better physical performance than the top 1% of the designs generated in the baseline. Moreover, LLM2TEA generates designs that are not only aesthetically creative but also functional in real-world applications. Several of these designs have been successfully 3D-printed, emphasizing the proposed approach’s capacity to transform AI-generated outputs into tangible physical objects. The designs produced by LLM2TEA meet practical requirements while showcasing creative and innovative features, underscoring its potential applications in complex design optimization and discovery.



Article 397

Title@2025-06-11 (3): On the Importance of Embedding Norms in Self-Supervised Learning

Title: On the Importance of Embedding Norms in Self-Supervised Learning Über die Bedeutung der Einbettung von Normen in das Selbstüberwachte Lernen 关于自我监督学习中嵌入规范的重要性 2502.09252v2

Authors (7): Andrew Draganov, Sharvaree Vadgama, Sebastian Damrich, Jan Niklas Böhm, Lucas Maes, Dmitry Kobak, Erik Bekkers

Self-supervised learning (SSL) allows training data representations without a supervised signal and has become an important paradigm in machine learning. Most SSL methods employ the cosine similarity between embedding vectors and hence effectively embed data on a hypersphere. While this seemingly implies that embedding norms cannot play any role in SSL, a few recent works have suggested that embedding norms have properties related to network convergence and confidence. In this paper, we resolve this apparent contradiction and systematically establish the embedding norm’s role in SSL training. Using theoretical analysis, simulations, and experiments, we show that embedding norms (i) govern SSL convergence rates and (ii) encode network confidence, with smaller norms corresponding to unexpected samples. Additionally, we show that manipulating embedding norms can have large effects on convergence speed. Our findings demonstrate that SSL embedding norms are integral to understanding and optimizing network behavior.



Article 398

Title@2025-06-11 (3): TRIDENT: Temporally Restricted Inference via DFA-Enhanced Neural Traversal

Title: TRIDENT: Temporally Restricted Inference via DFA-Enhanced Neural Traversal TRIDENT: Vorübergehend eingeschränkte Schlussfolgerung über DFA-verstärktes Neuraltraversal 通过DFA-增强的神经轨迹轨迹临时受限制的引文 2506.09701v1

Authors (5): Vincenzo Collura, Karim Tit, Laura Bussi, Eleonora Giunchiglia, Maxime Cordy

Large Language Models (LLMs) and other neural architectures have achieved impressive results across a variety of generative and classification tasks. However, they remain fundamentally ill-equipped to ensure that their outputs satisfy temporal constraints, such as those expressible in Linear Temporal Logic over finite traces (LTLf). In this paper, we introduce TRIDENT: a general and model-agnostic inference-time algorithm that guarantees compliance with such constraints without requiring any retraining. TRIDENT compiles LTLf formulas into a Deterministic Finite Automaton (DFA), which is used to guide a constrained variant of beam search. At each decoding step, transitions that would lead to constraint violations are masked, while remaining paths are dynamically re-ranked based on both the model’s probabilities and the DFA’s acceptance structure. We formally prove that the resulting sequences are guaranteed to satisfy the given LTLf constraints, and we empirically demonstrate that TRIDENT also improves output quality. We validate our approach on two distinct tasks: temporally constrained image-stream classification and controlled text generation. In both settings, TRIDENT achieves perfect constraint satisfaction, while comparison with the state of the art shows improved efficiency and high standard quality metrics.
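
The masking step can be sketched on a toy example. The code below is not the authors' implementation: it assumes a fixed output length, takes a hand-written DFA instead of compiling an LTLf formula, ranks beams by model score only (the abstract also mentions re-ranking by the DFA's acceptance structure), and uses a made-up scoring function `toy_logprob`. All names are hypothetical.

```python
import math


def viable_sets(states, alphabet, delta, accepting, length):
    """ok[k] = states from which some word of exactly k tokens is accepted."""
    ok = [set(accepting)]
    for _ in range(length):
        prev = ok[-1]
        ok.append({q for q in states if any(delta[q][a] in prev for a in alphabet)})
    return ok


def constrained_beam_search(logprob, states, alphabet, delta, start,
                            accepting, length, beam_width):
    """Beam search that masks tokens after which acceptance is impossible."""
    ok = viable_sets(states, alphabet, delta, accepting, length)
    beams = [(0.0, [], start)]
    for step in range(length):
        remaining = length - step - 1
        candidates = []
        for score, seq, q in beams:
            for a in alphabet:
                nxt = delta[q][a]
                if nxt not in ok[remaining]:
                    continue  # masked: the constraint could no longer be met
                candidates.append((score + logprob(seq, a), seq + [a], nxt))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return [(seq, score) for score, seq, _ in beams]


def accepts(delta, start, accepting, seq):
    """Run the DFA on a token sequence and report acceptance."""
    q = start
    for a in seq:
        q = delta[q][a]
    return q in accepting


# Toy constraint: every "open" is eventually closed, and "close" never
# appears without a pending "open". State 2 is a dead (violating) state.
ALPHABET = ["open", "word", "close"]
STATES = [0, 1, 2]
DELTA = {
    0: {"open": 1, "word": 0, "close": 2},
    1: {"open": 1, "word": 1, "close": 0},
    2: {"open": 2, "word": 2, "close": 2},
}
TOY_LOGPROBS = {"open": math.log(0.6), "word": math.log(0.3), "close": math.log(0.1)}


def toy_logprob(prefix, token):
    """Stand-in for a model: a fixed distribution that favours "open"."""
    return TOY_LOGPROBS[token]
```

Calling `constrained_beam_search(toy_logprob, STATES, ALPHABET, DELTA, 0, {0}, 3, 2)` returns only sequences the DFA accepts, even though the unconstrained model would prefer to emit "open" at every step.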



Article 399

Title@2025-06-11 (3): SLEEPYLAND: trust begins with fair evaluation of automatic sleep staging models

Title: SLEEPYLAND: trust begins with fair evaluation of automatic sleep staging models SLEEPYLAND: Vertrauen beginnt mit einer fairen Bewertung automatischer Schlafinszenierungsmodelle SEPEYLAND:信任始于公平评价自动睡眠模式 2506.08574v2

Authors (10): Alvise Dei Rossi, Matteo Metaldi, Michal Bechny, Irina Filchenko, Julia van der Meer, Markus H. Schmidt, Claudio L. A. Bassetti, Athina Tzovara, Francesca D. Faraci, Luigi Fiorillo

Despite advances in deep learning for automatic sleep staging, clinical adoption remains limited due to challenges in fair model evaluation, generalization across diverse datasets, model bias, and variability in human annotations. We present SLEEPYLAND, an open-source sleep staging evaluation framework designed to address these barriers. It includes more than 220,000 hours of in-domain (ID) sleep recordings and more than 84,000 hours of out-of-domain (OOD) sleep recordings, spanning a broad range of ages, sleep-wake disorders, and hardware setups. We release pre-trained models based on high-performing state-of-the-art (SoA) architectures and evaluate them under standardized conditions across single- and multi-channel EEG/EOG configurations. We introduce SOMNUS, an ensemble combining models across architectures and channel setups via soft voting. SOMNUS achieves robust performance across twenty-four different datasets, with macro-F1 scores between 68.7% and 87.2%, outperforming individual models in 94.9% of cases. Notably, SOMNUS surpasses previous SoA methods, even including cases where compared models were trained ID while SOMNUS treated the same data as OOD. Using a subset of the BSWR (N=6,633), we quantify model biases linked to age, gender, AHI, and PLMI, showing that while ensembling improves robustness, no model architecture consistently minimizes bias in performance and clinical marker estimation. In evaluations on OOD multi-annotated datasets (DOD-H, DOD-O), SOMNUS exceeds the best human scorer, i.e., MF1 85.2% vs 80.8% on DOD-H, and 80.2% vs 75.9% on DOD-O, better reproducing the scorer consensus than any individual expert (k = 0.89/0.85 and ACS = 0.95/0.94 for healthy/OSA cohorts). Finally, we introduce ensemble disagreement metrics, based on entropy and inter-model divergence, that predict regions of scorer disagreement with ROC AUCs up to 0.828, offering a data-driven proxy for human uncertainty.
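
Soft voting and an entropy-based disagreement signal can be sketched in a few lines. This is an illustrative reading of the abstract, not the SOMNUS code: the five stage labels are assumed to follow the usual W/N1/N2/N3/REM convention, the ensemble is an unweighted mean of per-model class probabilities, and the function names are hypothetical.

```python
import math

# Assumed label set; the abstract does not list the stages explicitly.
STAGES = ["W", "N1", "N2", "N3", "REM"]


def soft_vote(model_probs):
    """Average per-class probabilities from several models for one epoch."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    return [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def predict_epoch(model_probs):
    """Return the ensemble's stage label and the entropy of the averaged vote."""
    avg = soft_vote(model_probs)
    label = STAGES[max(range(len(avg)), key=avg.__getitem__)]
    return label, entropy(avg)
```

When the individual models agree, the averaged distribution is peaked and its entropy is low; when they disagree, the entropy rises, which is the kind of signal the abstract proposes as a proxy for human scorer uncertainty.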



Article 400

Title@2025-06-11 (3): Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment

Title: Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment Jenseits von Bradley-Terry-Modellen: Ein allgemeines Präferenzmodell für Sprachmodellausrichtung 超越布拉德利-泰瑞模式:语文模式统一通用优先模式 2410.02197v3

Authors (5): Yifan Zhang, Ge Zhang, Yue Wu, Kangping Xu, Quanquan Gu

Modeling human preferences is crucial for aligning foundation models with human values. Traditional reward modeling methods, such as the Bradley-Terry (BT) reward model, fall short in expressiveness, particularly in addressing intransitive preferences. In this paper, we introduce preference embedding, an approach that embeds responses into a latent space to capture intricate preference structures efficiently, achieving linear query complexity. Additionally, we propose preference score-based General Preference Optimization (GPO), which generalizes reward-based reinforcement learning from human feedback (RLHF). Experimental results show that our General Preference embedding Model (GPM) consistently outperforms the BT reward model on the RewardBench benchmark and effectively models cyclic preferences where any BT reward model behaves like a random guess. Furthermore, evaluations on downstream tasks such as AlpacaEval2.0, following the language model post-training with GPO and our general preference model, reveal performance improvements over BT models. These findings indicate that our method may enhance the alignment of foundation models with nuanced human values. The code is available at https://github.com/general-preference/general-preference-model.
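
A small example shows why scalar Bradley-Terry rewards cannot express a preference cycle while an embedding-based score can. The scoring function used here, a skew-symmetric bilinear form on two-dimensional embeddings, is an illustrative assumption; the abstract does not spell out the form used by GPM, and all names below are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bt_prob(r_i, r_j):
    """Bradley-Terry: P(i beats j) from scalar rewards."""
    return sigmoid(r_i - r_j)


def skew_score(v_i, v_j):
    """v_i^T R v_j with R = [[0, 1], [-1, 0]], so score(i, j) = -score(j, i)."""
    return v_i[0] * v_j[1] - v_i[1] * v_j[0]


def embed_prob(v_i, v_j):
    """P(i beats j) from an antisymmetric score on embeddings."""
    return sigmoid(skew_score(v_i, v_j))


# Three responses placed 120 degrees apart on the unit circle.
EMBEDDINGS = [
    (math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3))
    for k in range(3)
]
CYCLE = [(0, 1), (1, 2), (2, 0)]  # 0 beats 1, 1 beats 2, 2 beats 0


def cycle_probs_embedding():
    """Preference probabilities along the cycle under the embedding score."""
    return [embed_prob(EMBEDDINGS[i], EMBEDDINGS[j]) for i, j in CYCLE]


def cycle_probs_bt(rewards):
    """Preference probabilities along the cycle under scalar rewards."""
    return [bt_prob(rewards[i], rewards[j]) for i, j in CYCLE]
```

Under the embedding score every edge of the cycle has probability above 0.5. Under Bradley-Terry that would require r0 > r1 > r2 > r0, which no assignment of scalar rewards can satisfy.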



Article 401

Title@2025-06-11 (3): Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation

Title: Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation Auto-Regressive vs Flow-Matching: Eine vergleichende Studie zur Modellierung von Paradigmen für die Text-zu-Musik-Generierung 自动递减与流动配接: 文本到音乐一代模型模型比较研究 2506.08570v2

Authors (3): Or Tal, Felix Kreuk, Yossi Adi

Recent progress in text-to-music generation has enabled models to synthesize high-quality musical segments and full compositions, and even to respond to fine-grained control signals, e.g., chord progressions. State-of-the-art (SOTA) systems differ significantly across many dimensions, such as training datasets, modeling paradigms, and architectural choices. This diversity complicates efforts to evaluate models fairly and pinpoint which design choices most influence performance. While factors like data and architecture are important, in this study we focus exclusively on the modeling paradigm. We conduct a systematic empirical analysis to isolate its effects, offering insights into associated trade-offs and emergent behaviors that can guide future text-to-music generation systems. Specifically, we compare the two arguably most common modeling paradigms: Auto-Regressive decoding and Conditional Flow-Matching. We conduct a controlled comparison by training all models from scratch using identical datasets, training configurations, and similar backbone architectures. Performance is evaluated across multiple axes, including generation quality, robustness to inference configurations, scalability, adherence to both textual and temporally aligned conditioning, and editing capabilities in the form of audio inpainting. This comparative study sheds light on distinct strengths and limitations of each paradigm, providing actionable insights that can inform future architectural and training decisions in the evolving landscape of text-to-music generation. Sampled audio examples are available at: https://huggingface.co/spaces/ortal1602/ARvsFM



Article 402

Title@2025-06-11 (3): Share Secrets for Privacy: Confidential Forecasting with Vertical Federated Learning

Title: Share Secrets for Privacy: Confidential Forecasting with Vertical Federated Learning Share Secrets for Privacy: Vertrauliche Vorhersagen mit vertikalem Federated Learning 私隐共享秘密:与垂直联邦学习有关的保密预报 2405.20761v2

Authors (5): Aditya Shankar, Jérémie Decouchant, Dimitra Gkorou, Rihan Hai, Lydia Y. Chen

Vertical federated learning (VFL) is a promising area for time series forecasting in many applications, such as healthcare and manufacturing. Critical challenges to address include data privacy and over-fitting on small and noisy datasets during both training and inference. Additionally, such forecasting models must scale well with the number of parties while ensuring strong convergence and low tuning complexity. We address these challenges and propose “Secret-shared Time Series Forecasting with VFL” (STV), a novel framework with the following key features: i) a privacy-preserving algorithm for forecasting with SARIMAX and autoregressive trees on vertically-partitioned data; ii) decentralised forecasting using secret sharing and multi-party computation; and iii) novel N-party algorithms for matrix multiplication and inverse operations for exact parameter optimization, giving strong convergence with minimal tuning complexity. We evaluate on six representative datasets from public and industry-specific contexts. Results demonstrate that STV’s forecasting accuracy is comparable to that of centralized approaches. Our exact optimization outperforms centralized methods, including state-of-the-art diffusion models and long short-term memory, by 23.81% on forecasting accuracy. We also evaluate scalability by examining the communication costs of exact and iterative optimization to navigate the choice between the two. STV’s code and supplementary material are available online: https://github.com/adis98/STV.


Article 403

Title@2025-06-11 (3): Adding simple structure at inference improves Vision-Language Compositionality

Title: Adding simple structure at inference improves Vision-Language Compositionality Einfache Struktur bei Inferenz hinzufügen verbessert die Vision-Sprachenkompositionalität 在推论中添加简单结构改进视野-语言的构成性 2506.09691v1

Authors (4): Imanol Miranda, Ander Salaberria, Eneko Agirre, Gorka Azkune

Dual encoder Vision-Language Models (VLM) such as CLIP are widely used for image-text retrieval tasks. However, those models struggle with compositionality, showing a bag-of-words-like behavior that limits their retrieval performance. Many different training approaches have been proposed to improve the vision-language compositionality capabilities of those models. In comparison, inference-time techniques have received little attention. In this paper, we propose to add simple structure at inference, where, given an image and a caption: i) we divide the image into different smaller crops, ii) we extract text segments, capturing objects, attributes and relations, iii) using a VLM, we find the image crops that better align with text segments obtaining matches, and iv) we compute the final image-text similarity aggregating the individual similarities of the matches. Based on various popular dual encoder VLMs, we evaluate our approach in controlled and natural datasets for VL compositionality. We find that our approach consistently improves the performance of evaluated VLMs without any training, which shows the potential of inference-time techniques. The results are especially good for attribute-object binding as shown in the controlled dataset. As a result of an extensive analysis: i) we show that processing image crops is actually essential for the observed gains in performance, and ii) we identify specific areas to further improve inference-time approaches.
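Steps iii) and iv) above can be sketched as follows, assuming the crop and segment similarities have already been computed by some VLM. The function name, the use of the maximum to pick a match, and the mean as the aggregation are assumptions for illustration; the abstract does not specify them.

```python
def structured_similarity(sim):
    """sim[i][j] is the similarity between text segment i and image crop j.

    For each segment, keep its best-matching crop, then average the
    similarities of those matches into one image-text score.
    """
    matches = [max(range(len(row)), key=row.__getitem__) for row in sim]
    scores = [row[j] for row, j in zip(sim, matches)]
    return sum(scores) / len(scores), matches


# Two text segments scored against three crops (made-up numbers).
sim = [
    [0.10, 0.80, 0.30],
    [0.60, 0.20, 0.40],
]
score, matches = structured_similarity(sim)
```

Here the first segment matches crop 1 and the second matches crop 0, so the final score is the mean of 0.80 and 0.60.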


Article 404

Title@2025-06-11 (3): Wasserstein Hypergraph Neural Network

Title: Wasserstein Hypergraph Neural Network Wasserstein Hypergraph Neuronales Netzwerk Vasserstein 高音神经网络 2506.09682v1

Authors (2): Iulia Duta, Pietro Liò

The ability to model relational information using machine learning has driven advancements across various domains, from medicine to social science. While graph representation learning has become mainstream over the past decade, representing higher-order relationships through hypergraphs is rapidly gaining momentum. In the last few years, numerous hypergraph neural networks have emerged, most of them falling under a two-stage, set-based framework in which messages are sent from nodes to edges and then from edges to nodes. However, most of these advances still take inspiration from their graph counterparts, often simplifying the aggregations to basic pooling operations. In this paper we introduce the Wasserstein Hypergraph Neural Network, a model that treats the node and hyperedge neighbourhoods as distributions and aggregates the information using Sliced Wasserstein Pooling. Unlike conventional aggregators such as mean or sum, which only capture first-order statistics, our approach can preserve geometric properties like the shape and spread of distributions. This enables the learned embeddings to reflect how easily one hyperedge distribution can be transformed into another, following principles of optimal transport. Experimental results demonstrate that applying Wasserstein pooling in a hypergraph setting significantly benefits node classification tasks, achieving top performance on several real-world datasets.
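To make the contrast with mean pooling concrete, the sketch below compares two 2-D point sets that share the same mean but differ in spread. It computes a sliced Wasserstein distance with evenly spaced projection directions. This is an illustrative stand-in, not the paper's Sliced Wasserstein Pooling layer, and all names are assumptions.

```python
import math


def sliced_wasserstein(xs, ys, n_proj=16):
    """Approximate sliced 2-Wasserstein distance between equal-size 2-D point sets.

    Each direction projects both sets to 1-D, where optimal transport
    reduces to matching sorted values.
    """
    assert len(xs) == len(ys)
    total = 0.0
    for k in range(n_proj):
        theta = math.pi * k / n_proj
        d = (math.cos(theta), math.sin(theta))
        px = sorted(x[0] * d[0] + x[1] * d[1] for x in xs)
        py = sorted(y[0] * d[0] + y[1] * d[1] for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_proj)


def mean_pool(points):
    """First-order summary: the centroid of the point set."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


tight = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
wide = [(-3.0, 0.0), (3.0, 0.0), (0.0, -3.0), (0.0, 3.0)]

same_mean = mean_pool(tight) == mean_pool(wide)   # mean pooling cannot tell them apart
distance = sliced_wasserstein(tight, wide)        # the transport distance can
```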


Article 405

Title@2025-06-11 (3): Assessing the Quality of Denoising Diffusion Models in Wasserstein Distance: Noisy Score and Optimal Bounds

Title: Assessing the Quality of Denoising Diffusion Models in Wasserstein Distance: Noisy Score and Optimal Bounds Bewertung der Qualität des Denoisierens von Diffusionsmodellen in Wasserstein Abstand: Geräuschpegel und optimale Bounds 评估瓦西尔斯坦距离中Disoising扩散模型的质量:噪音分数和最佳区区 2506.09681v1

Authors (3): Vahan Arsenyan, Elen Vardanyan, Arnak Dalalyan

Generative modeling aims to produce new random examples from an unknown target distribution, given access to a finite collection of examples. Among the leading approaches, denoising diffusion probabilistic models (DDPMs) construct such examples by mapping a Brownian motion via a diffusion process driven by an estimated score function. In this work, we first provide empirical evidence that DDPMs are robust to constant-variance noise in the score evaluations. We then establish finite-sample guarantees in Wasserstein-2 distance that exhibit two key features: (i) they characterize and quantify the robustness of DDPMs to noisy score estimates, and (ii) they achieve faster convergence rates than previously known results. Furthermore, we observe that the obtained rates match those known in the Gaussian case, implying their optimality.


Article 406

Title@2025-06-11 (3): SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting

Title: SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting SplitLoRA: Balance Stabilität und Plastizität im kontinuierlichen Lernen durch gradienten Raum Splitting Split LoRA:通过逐步空间分割在持续学习中平衡稳定和可塑性 2505.22370v3

Authors (6): Haomiao Qiu, Miao Zhang, Ziyue Qiao, Weili Guan, Min Zhang, Liqiang Nie

Continual learning requires a model to learn multiple tasks in sequence while maintaining both stability (preserving knowledge from previously learned tasks) and plasticity (effectively learning new tasks). Gradient projection has emerged as an effective and popular paradigm in continual learning: it partitions the gradient space of previously learned tasks into two orthogonal subspaces, a primary subspace and a minor subspace. New tasks are learned within the minor subspace, thereby reducing interference with previously acquired knowledge. However, existing gradient projection methods struggle to achieve an optimal balance between plasticity and stability, as it is hard to partition the gradient space appropriately. In this work, we consider a continual learning paradigm based on Low-Rank Adaptation, which has gained considerable attention due to its efficiency and wide applicability, and propose a novel approach for continual learning, called SplitLoRA. We first provide a theoretical analysis of how subspace partitioning affects model stability and plasticity. Informed by this analysis, we then introduce an effective method that derives the optimal partition of the gradient space for previously learned tasks. This approach effectively balances stability and plasticity in continual learning. Experimental results on multiple datasets demonstrate that the proposed method achieves state-of-the-art performance.
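The generic gradient projection step described above can be sketched as below: the new-task gradient is stripped of its components in the primary subspace, so the update lives in the minor subspace. This shows only the projection; how SplitLoRA chooses the partition is not reproduced here. The function names and the assumption of an orthonormal basis are illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_to_minor(grad, primary_basis):
    """Remove the components of grad that lie in span(primary_basis).

    primary_basis is assumed to be a list of orthonormal vectors.
    """
    out = list(grad)
    for b in primary_basis:
        c = dot(grad, b)
        out = [o - c * bi for o, bi in zip(out, b)]
    return out


primary = [[1.0, 0.0, 0.0]]   # direction judged important for old tasks
g = [0.5, -2.0, 1.0]          # raw gradient on the new task
g_minor = project_to_minor(g, primary)
```

A larger primary subspace protects more old knowledge (stability) but leaves fewer directions for the new task (plasticity), which is the trade-off the abstract refers to.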


Article 407

Title@2025-06-11 (3): Wavelet Scattering Transform and Fourier Representation for Offline Detection of Malicious Clients in Federated Learning

Title: Wavelet Scattering Transform and Fourier Representation for Offline Detection of Malicious Clients in Federated Learning Wavelet Scattering Transform und Fourier-Darstellung zur Offline-Erkennung von bösartigen Kunden im Federated Learning 联邦学习联盟恶意客户离线检测分波散散变形和Fourier代表处 2506.09674v1

Authors (3): Alessandro Licciardi, Davide Leo, Davide Carbone

Federated Learning (FL) enables the training of machine learning models across decentralized clients while preserving data privacy. However, the presence of anomalous or corrupted clients, such as those with faulty sensors or non-representative data distributions, can significantly degrade model performance. Detecting such clients without accessing raw data remains a key challenge. We propose WAFFLE (Wavelet and Fourier representations for Federated Learning), a detection algorithm that labels malicious clients before training, using locally computed compressed representations derived from either the Wavelet Scattering Transform (WST) or the Fourier Transform. Both approaches provide low-dimensional, task-agnostic embeddings suitable for unsupervised client separation. A lightweight detector, trained on a distilled public dataset, performs the labeling with minimal communication and computational overhead. While both transforms enable effective detection, WST offers theoretical advantages, such as non-invertibility and stability to local deformations, that make it particularly well-suited to federated scenarios. Experiments on benchmark datasets show that our method improves detection accuracy and downstream classification performance compared to existing FL anomaly detection algorithms, validating its effectiveness as a pre-training alternative to online detection strategies.


Article 408

Title@2025-06-11 (3): Optimizing Shortfall Risk Metric for Learning Regression Models

Title: Optimizing Shortfall Risk Metric for Learning Regression Models Optimierung von Shortfall Risk Metric zum Lernen von Regressionsmodellen 优化 “ 短缺风险 “ 风险计量模型的学习倒退模型 2505.17777v3

Authors (2): Harish G. Ramaswamy, L. A. Prashanth

We consider the problem of estimating and optimizing utility-based shortfall risk (UBSR) of a loss, say $(Y - \hat Y)^2$, in the context of a regression problem. Empirical risk minimization with a UBSR objective is challenging since UBSR is a non-linear function of the underlying distribution. We first derive a concentration bound for UBSR estimation using independent and identically distributed (i.i.d.) samples. We then frame the UBSR optimization problem as minimization of a pseudo-linear function in the space of achievable distributions $\mathcal D$ of the loss $(Y- \hat Y)^2$. We construct a gradient oracle for the UBSR objective and a linear minimization oracle (LMO) for the set $\mathcal D$. Using these oracles, we devise a bisection-type algorithm, and establish convergence to the UBSR-optimal solution.
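For orientation, the standard definition of UBSR for a random loss $X$, an increasing convex function $\ell$ and a threshold $\lambda$ is $\inf\{t : \mathbb{E}[\ell(X - t)] \le \lambda\}$. The sketch below estimates this quantity from samples by bisection on $t$. It illustrates the risk measure only, not the paper's optimization algorithm over regression models. The exponential default for `loss`, the bracket-expansion strategy, and all names are assumptions.

```python
import math


def ubsr(samples, loss=math.exp, lam=1.0, tol=1e-8):
    """Empirical UBSR: smallest t with mean(loss(x - t)) <= lam.

    Assumes loss is increasing, so the sample mean is decreasing in t,
    and that lam is attainable, so the bracket expansion terminates.
    """
    def excess(t):
        return sum(loss(x - t) for x in samples) / len(samples) - lam

    lo, hi = min(samples) - 1.0, max(samples) + 1.0
    while excess(lo) <= 0:      # move lo left until the constraint is violated
        lo -= 1.0
    while excess(hi) > 0:       # move hi right until the constraint holds
        hi += 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi


estimate = ubsr([0.0, 1.0])
```

With the exponential loss and $\lambda = 1$, the result has the closed form $\log$ of the sample mean of $e^{x}$, which gives a convenient sanity check. In the regression setting of the abstract, the samples would be squared errors $(Y - \hat Y)^2$.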


Article 409

Title@2025-06-11 (3): CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization

Title: CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization CROW: Beseitigung von Backdoors aus großen Sprachmodellen über interne Konsistenzregularisierung CROW:通过内部一致性规范化,消除大语言模式中的后门 2411.12768v2

Authors (4): Nay Myat Min, Long H. Pham, Yige Li, Jun Sun

Large Language Models (LLMs) are vulnerable to backdoor attacks that manipulate outputs via hidden triggers. Existing defense methods, designed for vision/text classification tasks, fail for text generation. We propose Internal Consistency Regularization (CROW), a defense leveraging the observation that backdoored models exhibit unstable layer-wise hidden representations when triggered, while clean models show smooth transitions. CROW enforces consistency across layers via adversarial perturbations and regularization during finetuning, neutralizing backdoors without requiring clean reference models or trigger knowledge; only a small clean dataset is needed. Experiments across Llama-2 (7B, 13B), CodeLlama (7B, 13B), and Mistral-7B demonstrate CROW’s effectiveness: it achieves significant reductions in attack success rates across diverse backdoor strategies (sentiment steering, targeted refusal, code injection) while preserving generative performance. CROW’s architecture-agnostic design enables practical deployment.


Article 410

Title@2025-06-11 (3): CINeMA: Conditional Implicit Neural Multi-Modal Atlas for a Spatio-Temporal Representation of the Perinatal Brain

Title: CINeMA: Conditional Implicit Neural Multi-Modal Atlas for a Spatio-Temporal Representation of the Perinatal Brain CINeMA: Bedingter Impliziter Neuraler Multi-Modaler Atlas für eine räumlich-zeitliche Darstellung des Perinatalen Gehirns CINEMA: 围产期大脑空间时/时代表性有条件的隐性神经隐性多模式图集 2506.09668v1

Authors (8): Maik Dannecker, Vasiliki Sideri-Lampretsa, Sophie Starck, Angeline Mihailov, Mathieu Milh, Nadine Girard, Guillaume Auzias, Daniel Rueckert

Magnetic resonance imaging of fetal and neonatal brains reveals rapid neurodevelopment marked by substantial anatomical changes unfolding within days. Studying this critical stage of the developing human brain therefore requires accurate brain models, referred to as atlases, of high spatial and temporal resolution. To meet these demands, established traditional atlases and recently proposed deep learning-based methods rely on large and comprehensive datasets. This poses a major challenge for studying brains in the presence of pathologies for which data remains scarce. We address this limitation with CINeMA (Conditional Implicit Neural Multi-Modal Atlas), a novel framework for creating high-resolution, spatio-temporal, multimodal brain atlases, suitable for low-data settings. Unlike established methods, CINeMA operates in latent space, avoiding compute-intensive image registration and reducing atlas construction times from days to minutes. Furthermore, it enables flexible conditioning on anatomical features including gestational age (GA), birth age, and pathologies like ventriculomegaly (VM) and agenesis of the corpus callosum (ACC). CINeMA supports downstream tasks such as tissue segmentation and age prediction, while its generative properties enable synthetic data creation and anatomically informed data augmentation. Surpassing state-of-the-art methods in accuracy, efficiency, and versatility, CINeMA represents a powerful tool for advancing brain research. We release the code and atlases at https://github.com/m-dannecker/CINeMA.


Article 411

Title@2025-06-11 (3): Learning Time-Varying Multi-Region Brain Communications via Scalable Markovian Gaussian Processes

Title: Learning Time-Varying Multi-Region Brain Communications via Scalable Markovian Gaussian Processes Lernen von zeitvariierenden Multi-Region Gehirnkommunikation über skalierbare Markovian Gaussian Prozesse 通过可缩放的马尔科维扬高斯进程进行学习、改变时间的多区域脑交流 2407.00397v5

Authors (4): Weihan Li, Yule Wang, Chengrui Li, Anqi Wu

Understanding and constructing brain communications that capture dynamic communications across multiple regions is fundamental to modern system neuroscience, yet current methods struggle to find time-varying region-level communications or scale to large neural datasets with long recording durations. We present a novel framework using Markovian Gaussian Processes to learn brain communications with time-varying temporal delays from multi-region neural recordings, named Adaptive Delay Model (ADM). Our method combines Gaussian Processes with State Space Models and employs parallel scan inference algorithms, enabling efficient scaling to large datasets while identifying concurrent communication patterns that evolve over time. This time-varying approach captures how brain region interactions shift dynamically during cognitive processes. Validated on synthetic and multi-region neural recordings datasets, our approach discovers both the directionality and temporal dynamics of neural communication. This work advances our understanding of distributed neural computation and provides a scalable tool for analyzing dynamic brain networks.


Article 412

Title@2025-06-11 (3): Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement

Title: Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement Anmerkungsfreie MIDI-to-Audio-Synthese über Konkatenative Synthese und Generative Veredelung 通过分类合成和生产改进进行无说明的MIDI到研究综合综述 2410.16785v2

Authors (2): Osamu Take, Taketo Akama

Recent MIDI-to-audio synthesis methods using deep neural networks have successfully generated high-quality, expressive instrumental tracks. However, these methods require MIDI annotations for supervised training, limiting the diversity of instrument timbres and expression styles in the output. We propose CoSaRef, a MIDI-to-audio synthesis method that does not require MIDI-audio paired datasets. CoSaRef first generates a synthetic audio track using concatenative synthesis based on MIDI input, then refines it with a diffusion-based deep generative model trained on datasets without MIDI annotations. This approach improves the diversity of timbres and expression styles. Additionally, it allows detailed control over timbres and expression through audio sample selection and extra MIDI design, similar to traditional functions in digital audio workstations. Experiments showed that CoSaRef could generate realistic tracks while preserving fine-grained timbre control via one-shot samples. Moreover, despite not being supervised on MIDI annotation, CoSaRef outperformed the state-of-the-art timbre-controllable method based on MIDI supervision in both objective and subjective evaluation.


Article 413

Title@2025-06-11 (3): SyncFed: Time-Aware Federated Learning through Explicit Timestamping and Synchronization

Title: SyncFed: Time-Aware Federated Learning through Explicit Timestamping and Synchronization SyncFed: Time-Aware-Federated Learning durch explizites Zeitstempeln und Synchronisieren 同步Fed:通过明确时间印记和同步化进行时间软件联邦学习 2506.09660v1

Authors (4): Baran Can Gül, Stefanos Tziampazis, Nasser Jazdi, Michael Weyrich

As Federated Learning (FL) expands to larger and more distributed environments, consistency in training is challenged by network-induced delays, clock unsynchronicity, and variability in client updates. This combination of factors may contribute to misaligned contributions that undermine model reliability and convergence. Existing methods like staleness-aware aggregation and model versioning address lagging updates heuristically, yet lack mechanisms to quantify staleness, especially in latency-sensitive and cross-regional deployments. In light of these considerations, we introduce SyncFed, a time-aware FL framework that employs explicit synchronization and timestamping to establish a common temporal reference across the system. Staleness is quantified numerically based on exchanged timestamps under the Network Time Protocol (NTP), enabling the server to reason about the relative freshness of client updates and apply temporally informed weighting during aggregation. Our empirical evaluation on a geographically distributed testbed shows that, under SyncFed, the global model evolves within a stable temporal context, resulting in improved accuracy and information freshness compared to round-based baselines devoid of temporal semantics.
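A minimal sketch of temporally informed weighting, assuming client and server clocks are already synchronized (for example via NTP) so that timestamps are comparable. The exponential decay form, the `decay` parameter, and the function names are assumptions for illustration; the abstract does not state the weighting rule SyncFed uses.

```python
import math


def staleness_weights(server_time, client_timestamps, decay=0.1):
    """Turn update ages (in seconds) into aggregation weights that sum to 1.

    Older updates get exponentially smaller weight; this decay form is
    an assumption, not the paper's rule.
    """
    raw = [math.exp(-decay * max(0.0, server_time - t)) for t in client_timestamps]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(updates, weights):
    """Weighted average of client update vectors."""
    dim = len(updates[0])
    return [sum(w * u[i] for w, u in zip(weights, updates)) for i in range(dim)]


# Three clients whose updates were produced 1 s, 10 s and 40 s before aggregation.
weights = staleness_weights(100.0, [99.0, 90.0, 60.0])
global_update = aggregate([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], weights)
```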


Article 414

Title@2025-06-11 (3): Intent Factored Generation: Unleashing the Diversity in Your Language Model

Title: Intent Factored Generation: Unleashing the Diversity in Your Language Model Intent Factored Generation: Die Vielfalt in Ihrem Sprachmodell entfesseln 内在因数代:在你的语言模式中释放多样性 2506.09659v1

Authors (5): Eltayeb Ahmed, Uljad Berdica, Martha Elliott, Danijela Horak, Jakob N. Foerster

Obtaining multiple meaningfully diverse, high quality samples from Large Language Models for a fixed prompt remains an open challenge. Current methods for increasing diversity often only operate at the token-level, paraphrasing the same response. This is problematic because it leads to poor exploration on reasoning problems and to unengaging, repetitive conversational agents. To address this we propose Intent Factored Generation (IFG), factorising the sampling process into two stages. First, we sample a semantically dense intent, e.g., a summary or keywords. Second, we sample the final response conditioning on both the original prompt and the intent from the first stage. This allows us to use a higher temperature during the intent step to promote conceptual diversity, and a lower temperature during the final generation to ensure the outputs are coherent and self-consistent. Additionally, we find that prompting the model to explicitly state its intent for each step of the chain-of-thought before generating the step is beneficial for reasoning tasks. We demonstrate our method’s effectiveness across a diverse set of tasks. We show this method improves both pass@k and Reinforcement Learning from Verifier Feedback on maths and code tasks. For instruction-tuning, we combine IFG with Direct Preference Optimisation to increase conversational diversity without sacrificing reward. Finally, we achieve higher diversity while maintaining the quality of generations on a general language modelling task, using a new dataset of reader comments and news articles that we collect and open-source. In summary, we present a simple method of increasing the sample diversity of LLMs while maintaining performance. This method can be implemented by changing the prompt and varying the temperature during generation, making it easy to integrate into many algorithms for gains across various applications.
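The two-stage sampling described above can be sketched as follows. The `sample(text, temperature)` callable stands in for any LLM sampling call and is hypothetical, as are the prompt templates and the default temperatures; a stub sampler is used so the sketch runs without a model.

```python
def intent_factored_generate(prompt, sample, intent_temp=1.2, response_temp=0.5):
    """Two-stage generation: sample an intent hot, then the response cool.

    sample(text, temperature) -> str is a hypothetical LLM sampling callable.
    """
    # Stage 1: a short, semantically dense intent at high temperature.
    intent = sample(f"{prompt}\n\nState the intent (summary or keywords):", intent_temp)
    # Stage 2: the final response, conditioned on both prompt and intent.
    response = sample(f"{prompt}\n\nIntent: {intent}\n\nResponse:", response_temp)
    return intent, response


# Stub sampler that records the temperatures it was called with.
calls = []


def fake_sample(text, temperature):
    calls.append(temperature)
    return f"<sampled at T={temperature}>"


intent, response = intent_factored_generate("Explain why the sky is blue.", fake_sample)
```

Separating the stages lets diversity be injected where it changes meaning (the intent) while keeping the surface text coherent.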


Article 415

Title@2025-06-11 (3): DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

Title: DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy DipLLM: Feinsteuerungs-LLM für strategische Entscheidungsfindung in der Diplomatie DipLLM: 外交战略决策的精细推荐LLM 2506.09655v1

Authors (6): Kaixuan Xu, Jiajun Chai, Sicheng Li, Yuqian Fu, Yuanheng Zhu, Dongbin Zhao

Diplomacy is a complex multiplayer game that requires both cooperation and competition, posing significant challenges for AI systems. Traditional methods rely on equilibrium search to generate extensive game data for training, which demands substantial computational resources. Large Language Models (LLMs) offer a promising alternative, leveraging pre-trained knowledge to achieve strong performance with relatively small-scale fine-tuning. However, applying LLMs to Diplomacy remains challenging due to the exponential growth of possible action combinations and the intricate strategic interactions among players. To address this challenge, we propose DipLLM, a fine-tuned LLM-based agent that learns equilibrium policies for Diplomacy. DipLLM employs an autoregressive factorization framework to simplify the complex task of multi-unit action assignment into a sequence of unit-level decisions. By defining an equilibrium policy within this framework as the learning objective, we fine-tune the model using only 1.5% of the data required by the state-of-the-art Cicero model, surpassing its performance. Our results demonstrate the potential of fine-tuned LLMs for tackling complex strategic decision-making in multiplayer games.
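The factorization described above can be written with the chain rule of probability. This is a hedged sketch in notation assumed here rather than taken from the paper: $s$ is the board state, $a^{i}$ the order given to unit $i$, and $n$ the number of units a player controls.

$$
\pi(a^{1}, \dots, a^{n} \mid s) = \prod_{i=1}^{n} \pi\left(a^{i} \mid s, a^{1}, \dots, a^{i-1}\right)
$$

Each factor is a decision over a single unit's orders, so the model scores one unit at a time instead of the exponentially large joint action space.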


Article 416

Title@2025-06-11 (3): Improving Discriminator Guidance in Diffusion Models

Title: Improving Discriminator Guidance in Diffusion Models Verbesserung der Orientierungshilfen für Diskriminatoren bei Diffusionsmodellen 改进传播模式中反对者的指导 2503.16117v2

Authors (5): Alexandre Verine, Ahmed Mehdi Inane, Florian Le Bronnec, Benjamin Negrevergne, Yann Chevaleyre

Discriminator Guidance has become a popular method for efficiently refining pre-trained Score-Matching Diffusion models. However, in this paper, we demonstrate that the standard implementation of this technique does not necessarily lead to a distribution closer to the real data distribution. Specifically, we show that training the discriminator using Cross-Entropy loss, as commonly done, can in fact increase the Kullback-Leibler divergence between the model and target distributions, particularly when the discriminator overfits. To address this, we propose a theoretically sound training objective for discriminator guidance that properly minimizes the KL divergence. We analyze its properties and demonstrate empirically across multiple datasets that our proposed method consistently improves over the conventional method by producing samples of higher quality.
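As background, discriminator guidance is commonly stated as a correction to the learned score. The notation is assumed here: $s_\theta$ is the pre-trained score network and $d_\phi(x, t) \in (0, 1)$ is the discriminator's estimated probability that a noised sample $x$ at time $t$ comes from real data.

$$
\tilde{s}(x, t) = s_\theta(x, t) + \nabla_x \log \frac{d_\phi(x, t)}{1 - d_\phi(x, t)}
$$

For an ideal discriminator, the ratio $d_\phi / (1 - d_\phi)$ equals the density ratio between the noised data and model distributions, so the correction is exact. The abstract's point is that a discriminator trained with cross-entropy, especially one that overfits, offers no such guarantee.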


Article 417

Title@2025-06-11 (3): Revisiting the Equivalence of Bayesian Neural Networks and Gaussian Processes: On the Importance of Learning Activations

Title: Revisiting the Equivalence of Bayesian Neural Networks and Gaussian Processes: On the Importance of Learning Activations Die Gleichwertigkeit Bayesischer neuraler Netzwerke und Gaussian-Prozesse: Über die Bedeutung von Lernaktivierungen 重新审视贝耶斯神经网络和高斯进程等效的贝耶斯神经网络和高斯进程:关于学习活动的重要性 2410.15777v3

Authors (3): Marcin Sendera, Amin Sorkhei, Tomasz Kuśmierczyk

Gaussian Processes (GPs) provide a convenient framework for specifying function-space priors, making them a natural choice for modeling uncertainty. In contrast, Bayesian Neural Networks (BNNs) offer greater scalability and extendability but lack the advantageous properties of GPs. This motivates the development of BNNs capable of replicating GP-like behavior. However, existing solutions are either limited to specific GP kernels or rely on heuristics. We demonstrate that trainable activations are crucial for effective mapping of GP priors to wide BNNs. Specifically, we leverage the closed-form 2-Wasserstein distance for efficient gradient-based optimization of reparameterized priors and activations. Beyond learned activations, we also introduce trainable periodic activations that ensure global stationarity by design, and functional priors conditioned on GP hyperparameters to allow efficient model selection. Empirically, our method consistently outperforms existing approaches or matches performance of the heuristic methods, while offering stronger theoretical foundations.
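For reference, the closed form alluded to above is the standard 2-Wasserstein distance between two Gaussians $\mathcal{N}(m_1, \Sigma_1)$ and $\mathcal{N}(m_2, \Sigma_2)$. How the paper applies it to BNN and GP priors evaluated at finite sets of inputs is not spelled out in the abstract, so the formula is given only as context.

$$
W_2^2 = \lVert m_1 - m_2 \rVert_2^2 + \operatorname{Tr}\left(\Sigma_1 + \Sigma_2 - 2\left(\Sigma_2^{1/2} \Sigma_1 \Sigma_2^{1/2}\right)^{1/2}\right)
$$

In one dimension this reduces to $(m_1 - m_2)^2 + (\sigma_1 - \sigma_2)^2$. Because the expression is differentiable in the means and covariances, it can serve as a loss for gradient-based optimization.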


Article 418

Title@2025-06-11 (3): Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension

Title: Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension Diskrepanzen sind Tugend: Schwach-zu-starke Verallgemeinerung durch Lens der Intrinsischen Dimension 差异是道德:通过内分泌尺寸的透镜对电压的微弱普遍化 2502.05075v4

Authors (5): Yijun Dong, Yicheng Li, Yunai Li, Jason D. Lee, Qi Lei

Weak-to-strong (W2S) generalization is a type of finetuning (FT) where a strong (large) student model is trained on pseudo-labels generated by a weak teacher. Surprisingly, W2S FT often outperforms the weak teacher. We seek to understand this phenomenon through the observation that FT often occurs in intrinsically low-dimensional spaces. Leveraging the low intrinsic dimensionality of FT, we analyze W2S in the ridgeless regression setting from a variance reduction perspective. For a strong student-weak teacher pair with sufficiently expressive low-dimensional feature subspaces $\mathcal{V}_s, \mathcal{V}_w$, we provide an exact characterization of the variance that dominates the generalization error of W2S. This unveils a virtue of discrepancy between the strong and weak models in W2S: the variance of the weak teacher is inherited by the strong student in $\mathcal{V}_s \cap \mathcal{V}_w$, while reduced by a factor of $\mathrm{dim}(\mathcal{V}_s)/N$ in the subspace of discrepancy $\mathcal{V}_w \setminus \mathcal{V}_s$ with $N$ pseudo-labels for W2S. Our analysis further casts light on the sample complexities and the scaling of performance gap recovery in W2S. The analysis is supported by experiments on synthetic regression problems, as well as real vision and NLP tasks.


Article 419

Title@2025-06-11 (3): Pruning Spurious Subgraphs for Graph Out-of-Distribution Generalization

Title: Pruning Spurious Subgraphs for Graph Out-of-Distribution Generalization Pruning Spurious Subgraphs für Graph Out-of-Distribution Verallgemeinerung 平面平面平面平面图图解裂平面图集集 2506.05957v3

Authors (7): Tianjun Yao, Haoxuan Li, Yongqiang Chen, Tongliang Liu, Le Song, Eric Xing, Zhiqiang Shen

Graph Neural Networks (GNNs) often encounter significant performance degradation under distribution shifts between training and test data, hindering their applicability in real-world scenarios. Recent studies have proposed various methods to address the out-of-distribution generalization challenge, with many methods in the graph domain focusing on directly identifying an invariant subgraph that is predictive of the target label. However, we argue that identifying the edges from the invariant subgraph directly is challenging and error-prone, especially when some spurious edges exhibit strong correlations with the targets. In this paper, we propose PrunE, the first pruning-based graph OOD method that eliminates spurious edges to improve OOD generalizability. By pruning spurious edges, PrunE retains the invariant subgraph more comprehensively, which is critical for OOD generalization. Specifically, PrunE employs two regularization terms to prune spurious edges: 1) graph size constraint to exclude uninformative spurious edges, and 2) $\epsilon$-probability alignment to further suppress the occurrence of spurious edges. Through theoretical analysis and extensive experiments, we show that PrunE achieves superior OOD performance and outperforms previous state-of-the-art methods significantly. Code is available at: https://github.com/tianyao-aka/PrunE-GraphOOD.


Article 420

Title@2025-06-11 (3): No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces

Title: No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces Keine Aufgabe nach links: Isotropes Modell, das sich mit gemeinsamen und aufgabenspezifischen Subräumen vermischt 未留下任务 : 与共同和任务特定子空间合并的异形模型 2502.04959v3

Authors (6): Daniel Marczak, Simone Magistri, Sebastian Cygert, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer

Model merging integrates the weights of multiple task-specific models into a single multi-task model. Despite recent interest in the problem, a significant performance gap between the combined and single-task models remains. In this paper, we investigate the key characteristics of task matrices (weight update matrices applied to a pre-trained model) that enable effective merging. We show that alignment between singular components of task-specific and merged matrices strongly correlates with performance improvement over the pre-trained model. Based on this, we propose an isotropic merging framework that flattens the singular value spectrum of task matrices, enhances alignment, and reduces the performance gap. Additionally, we incorporate both common and task-specific subspaces to further improve alignment and performance. Our proposed approach achieves state-of-the-art performance on vision and language tasks across various sets of tasks and model scales. This work advances the understanding of model merging dynamics, offering an effective methodology to merge models without requiring additional training. Code is available at https://github.com/danielm1405/iso-merging.
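One way to read "flattening the singular value spectrum" is sketched below. This is an assumption about the construction, not a statement of the paper's exact procedure: the task matrices $\Delta_t$ are summed, the sum is decomposed by SVD, and every singular value is replaced by their mean $\bar{\sigma}$.

$$
\Delta = \sum_{t} \Delta_t = U \operatorname{diag}(\sigma_1, \dots, \sigma_r) V^\top,
\qquad
\Delta_{\mathrm{iso}} = \bar{\sigma}\, U V^\top,
\qquad
\bar{\sigma} = \frac{1}{r} \sum_{k=1}^{r} \sigma_k
$$

Under this reading, no single direction dominates the merged update, which is consistent with the abstract's claim that a flatter spectrum improves alignment with each task's singular components.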


Article 421

Title@2025-06-11 (3): HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios

Title: HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios HopaDIFF: Ganzheitlich-Partial Aware Fourier Conditioned Diffusion für die Referenzierung von Human Action Segmentierung in Multi-Personen-Szenarien HopaDIFF: 在多重人的情况中为提及人类行动分部分而传播的全方位全方位认识 2506.09650v1

Authors (10): Kunyu Peng, Junchao Huang, Xiangsheng Huang, Di Wen, Junwei Zheng, Yufan Chen, Kailun Yang, Jiamin Wu, Chongqing Hao, Rainer Stiefelhagen

Action segmentation is a core challenge in high-level video understanding, aiming to partition untrimmed videos into segments and assign each a label from a predefined action set. Existing methods primarily address single-person activities with fixed action sequences, overlooking multi-person scenarios. In this work, we pioneer textual reference-guided human action segmentation in multi-person settings, where a textual description specifies the target person for segmentation. We introduce the first dataset for Referring Human Action Segmentation, i.e., RHAS133, built from 133 movies and annotated with 137 fine-grained actions with 33h video data, together with textual descriptions for this new task. Benchmarking existing action recognition methods on RHAS133 using VLM-based feature extractors reveals limited performance and poor aggregation of visual cues for the target person. To address this, we propose a holistic-partial aware Fourier-conditioned diffusion framework, i.e., HopaDIFF, leveraging a novel cross-input gate attentional xLSTM to enhance holistic-partial long-range reasoning and a novel Fourier condition to introduce more fine-grained control to improve the action segmentation generation. HopaDIFF achieves state-of-the-art results on RHAS133 in diverse evaluation settings. The code is available at https://github.com/KPeng9510/HopaDIFF.git.


Article 422

Title@2025-06-11 (3): Scaling Laws for Uncertainty in Deep Learning

Title: Scaling Laws for Uncertainty in Deep Learning Skalierungsgesetze für Ungewissheit im Deep Learning 强化深层学习不确定性法律 2506.09648v1

Authors (5): Mattia Rosso, Simone Rossi, Giulio Franzese, Markus Heinonen, Maurizio Filippone

Deep learning has recently revealed the existence of scaling laws, demonstrating that model performance follows predictable trends based on dataset and model sizes. Inspired by these findings and fascinating phenomena emerging in the over-parameterized regime, we examine a parallel direction: do similar scaling laws govern predictive uncertainties in deep learning? In identifiable parametric models, such scaling laws can be derived in a straightforward manner by treating model parameters in a Bayesian way. In this case, for example, we obtain $O(1/N)$ contraction rates for epistemic uncertainty with respect to the number of data $N$. However, in over-parameterized models, these guarantees do not hold, leading to largely unexplored behaviors. In this work, we empirically show the existence of scaling laws associated with various measures of predictive uncertainty with respect to dataset and model sizes. Through experiments on vision and language tasks, we observe such scaling laws for in- and out-of-distribution predictive uncertainty estimated through popular approximate Bayesian inference and ensemble methods. Besides the elegance of scaling laws and the practical utility of extrapolating uncertainties to larger data or models, this work provides strong evidence to dispel recurring skepticism against Bayesian approaches: “In many applications of deep learning we have so much data available: what do we need Bayes for?”. Our findings show that “so much data” is typically not enough to make epistemic uncertainty negligible.
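
The $O(1/N)$ contraction mentioned for identifiable parametric models can be illustrated with a textbook conjugate example. The sketch below is not taken from the paper: it assumes a Gaussian mean with a Gaussian prior, and the names `posterior_variance`, `noise_var`, and `prior_var` are hypothetical. It uses the posterior variance as the measure of epistemic uncertainty.

```python
def posterior_variance(n, noise_var=1.0, prior_var=10.0):
    """Posterior variance of a Gaussian mean after n i.i.d. observations.

    Assumed conjugate model (illustrative only):
        mu ~ N(0, prior_var),  y_i | mu ~ N(mu, noise_var).
    """
    return 1.0 / (1.0 / prior_var + n / noise_var)


sizes = [10, 100, 1000, 10000]
variances = [posterior_variance(n) for n in sizes]
for n, v in zip(sizes, variances):
    # n * v approaches noise_var, so the variance shrinks like 1/N.
    print(f"N={n:>6}  posterior variance={v:.6f}  N*variance={n * v:.4f}")
```

In this toy model the product of $N$ and the posterior variance tends to the noise variance. That is the kind of guarantee the abstract says no longer holds in the over-parameterized regime.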


Article 423

Title@2025-06-11 (3): Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls

Title: Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls Verteilungs- und adversariell robuste Logistische Regression über Intersecting Wasserstein Balls 通过交叉瓦塞施泰因球的分布式和反向强劲后勤后退 2407.13625v4

Authors (6): Aras Selvi, Eleonora Kreacic, Mohsen Ghassemi, Vamsi Potluru, Tucker Balch, Manuela Veloso

Adversarially robust optimization (ARO) has emerged as the de facto standard for training models that hedge against adversarial attacks in the test stage. While these models are robust against adversarial attacks, they tend to suffer severely from overfitting. To address this issue, some successful methods replace the empirical distribution in the training stage with alternatives including (i) a worst-case distribution residing in an ambiguity set, resulting in a distributionally robust (DR) counterpart of ARO; (ii) a mixture of the empirical distribution with a distribution induced by an auxiliary (e.g., synthetic, external, out-of-domain) dataset. Inspired by the former, we study the Wasserstein DR counterpart of ARO for logistic regression and show it admits a tractable convex optimization reformulation. Adopting the latter setting, we revise the DR approach by intersecting its ambiguity set with another ambiguity set built using the auxiliary dataset, which offers a significant improvement whenever the Wasserstein distance between the data generating and auxiliary distributions can be estimated. We study the underlying optimization problem, develop efficient solution algorithms, and demonstrate that the proposed method outperforms benchmark approaches on standard datasets.


Article 424

Title@2025-06-11 (3): Real-Time Network Traffic Forecasting with Missing Data: A Generative Model Approach

Title: Real-Time Network Traffic Forecasting with Missing Data: A Generative Model Approach Echtzeit-Netzverkehr mit fehlenden Daten: Ein generatives Modellansatz 缺少数据实时网络流量预测:生成模式方法 2506.09647v1

Authors (4): Lei Deng, Wenhan Xu, Jingwei Li, Danny H. K. Tsang

Real-time network traffic forecasting is crucial for network management and early resource allocation. Existing network traffic forecasting approaches operate under the assumption that the network traffic data is fully observed. However, in practical scenarios, the collected data are often incomplete due to various human and natural factors. In this paper, we propose a generative model approach for real-time network traffic forecasting with missing data. Firstly, we model the network traffic forecasting task as a tensor completion problem. Secondly, we incorporate a pre-trained generative model to achieve the low-rank structure commonly associated with tensor completion. The generative model effectively captures the intrinsic low-rank structure of network traffic data during pre-training and enables the mapping from a compact latent representation to the tensor space. Thirdly, rather than directly optimizing the high-dimensional tensor, we optimize its latent representation, which simplifies the optimization process and enables real-time forecasting. We also establish a theoretical recovery guarantee that quantifies the error bound of the proposed approach. Experiments on real-world datasets demonstrate that our approach achieves accurate network traffic forecasting within 100 ms, with a mean absolute error (MAE) below 0.002, as validated on the Abilene dataset.
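
The third step, optimizing a latent code instead of the full tensor, can be sketched on a toy problem. Everything below is an assumption made for illustration: a fixed linear map `W` stands in for the pre-trained generative model, the "tensor" is a flat list of six entries, and `generate` and `complete` are hypothetical names. The paper's actual generator and optimizer are not described in the abstract.

```python
# Hypothetical stand-in for a pre-trained generator: a fixed linear map
# from a 2-d latent code to 6 traffic entries (not the paper's model).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [2.0, 1.0], [1.0, 2.0]]


def generate(z):
    """Map a latent code z to the full (flattened) tensor."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]


def complete(observed, lr=0.1, steps=500):
    """Fit z to the observed entries only, then decode every entry.

    observed: dict {index: value} holding the entries that were measured.
    """
    z = [0.0, 0.0]
    for _ in range(steps):
        x = generate(z)
        grad = [0.0, 0.0]
        for i, value in observed.items():
            residual = x[i] - value
            for j in range(len(z)):
                grad[j] += residual * W[i][j]  # gradient of 0.5 * residual**2
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return generate(z)


z_true = [0.5, -0.25]
x_true = generate(z_true)
observed = {i: x_true[i] for i in (0, 2, 4, 5)}  # entries 1 and 3 are missing
x_hat = complete(observed)
max_error = max(abs(a - b) for a, b in zip(x_hat, x_true))
print(max_error)
```

The search runs over two latent coordinates rather than six tensor entries. The missing entries are filled in by decoding the fitted code, which is the dimensionality argument the abstract makes for real-time use.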


Article 425

Title@2025-06-11 (3): Generalizable and Fast Surrogates: Model Predictive Control of Articulated Soft Robots using Physics-Informed Neural Networks

Title: Generalizable and Fast Surrogates: Model Predictive Control of Articulated Soft Robots using Physics-Informed Neural Networks Generalisierbare und schnelle Surrogate: Modellvorhersagesteuerung von artikulierten Soft Robots mit physikinformierten Neuronalen Netzwerken 通用和快速超速超速:利用物理成形神经网络对人工软体机器人进行模拟预测控制 2502.01916v2

Authors (6): Tim-Lukas Habich, Aran Mohammad, Simon F. G. Ehlers, Martin Bensch, Thomas Seel, Moritz Schappler

Soft robots can revolutionize several applications with high demands on dexterity and safety. When operating these systems, real-time estimation and control require fast and accurate models. However, prediction with first-principles (FP) models is slow, and learned black-box models have poor generalizability. Physics-informed machine learning offers excellent advantages here, but it is currently limited to simple, often simulated systems without considering changes after training. We propose physics-informed neural networks (PINNs) for articulated soft robots (ASRs) with a focus on data efficiency. The amount of expensive real-world training data is reduced to a minimum – one dataset in one system domain. Two hours of data in different domains are used for a comparison against two gold-standard approaches: In contrast to a recurrent neural network, the PINN provides high generalizability. Compared with an accurate FP model, the PINN predicts up to 467 times faster at slightly reduced accuracy. This enables nonlinear model predictive control (MPC) of a pneumatic ASR. Accurate position tracking with the MPC running at 47 Hz is achieved in six dynamic experiments.


Article 426

Title@2025-06-11 (3): Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering

Title: Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering Effizientes und generalisierbares Graph Retriever für Wissens-Graph-Frageantworten lernen 知识拼图问题解答的学习效率和通用图表 2506.09645v1

Authors (6): Tianjun Yao, Haoxuan Li, Zhiqiang Shen, Pan Li, Tongliang Liu, Kun Zhang

Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains, but their reliability is hindered by outdated knowledge and hallucinations. Retrieval-Augmented Generation (RAG) mitigates these issues by grounding LLMs with external knowledge; however, most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning. Knowledge graphs, which represent facts as relational triples, offer a more structured and compact alternative. Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering (KGQA), with a significant proportion adopting the retrieve-then-reasoning paradigm. In this framework, graph-based retrievers have demonstrated strong empirical performance, yet they still face challenges in generalization ability. In this work, we propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA. RAPL addresses these limitations through three aspects: (1) a two-stage labeling strategy that combines heuristic signals with parametric models to provide causally grounded supervision; (2) a model-agnostic graph transformation approach to capture both intra- and inter-triple interactions, thereby enhancing representational capacity; and (3) a path-based reasoning strategy that facilitates learning from the injected rational knowledge and supports the downstream reasoner through structured inputs. Empirically, RAPL outperforms state-of-the-art methods by $2.66\%-20.34\%$, and significantly reduces the performance gap between smaller and more powerful LLM-based reasoners, as well as the gap under cross-dataset settings, highlighting its superior retrieval capability and generalizability. Code is available at: https://github.com/tianyao-aka/RAPL.


Article 427

Title@2025-06-11 (3): Evasion Attacks Against Bayesian Predictive Models

Title: Evasion Attacks Against Bayesian Predictive Models Evasionsangriffe gegen bayesische Vorhersagemodelle 攻击贝耶斯人预测模型 2506.09640v1

Authors (3): Pablo G. Arce, Roi Naveiro, David Ríos Insua

There is increasing interest in analyzing the behavior of machine learning systems under adversarial attacks. However, most research in adversarial machine learning has focused on weaknesses against evasion or poisoning attacks on predictive models in classical setups, while the susceptibility of Bayesian predictive models to attacks remains underexplored. This paper introduces a general methodology for designing optimal evasion attacks against such models. We investigate two adversarial objectives: perturbing specific point predictions and altering the entire posterior predictive distribution. For both scenarios, we propose novel gradient-based attacks and study their implementation and properties in various computational setups.


Article 428

Title@2025-06-11 (3): FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models

Title: FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models FFVLMBENCH: 设想语言模型联邦罚款基准 2506.09638v1

Authors (6): Weiying Zheng, Ziyue Lin, Pengxin Guo, Yuyin Zhou, Feifei Wang, Liangqiong Qu

Vision-Language Models (VLMs) have demonstrated remarkable capabilities in cross-modal understanding and generation by integrating visual and textual information. While instruction tuning and parameter-efficient fine-tuning methods have substantially improved the generalization of VLMs, most existing approaches rely on centralized training, posing challenges for deployment in domains with strict privacy requirements like healthcare. Recent efforts have introduced Federated Learning (FL) into VLM fine-tuning to address these privacy concerns, yet comprehensive benchmarks for evaluating federated fine-tuning strategies, model architectures, and task generalization remain lacking. In this work, we present FedVLMBench, the first systematic benchmark for federated fine-tuning of VLMs. FedVLMBench integrates two mainstream VLM architectures (encoder-based and encoder-free), four fine-tuning strategies, five FL algorithms, six multimodal datasets spanning four cross-domain single-task scenarios and two cross-domain multitask settings, covering four distinct downstream task categories. Through extensive experiments, we uncover key insights into the interplay between VLM architectures, fine-tuning strategies, data heterogeneity, and multi-task federated optimization. Notably, we find that a 2-layer multilayer perceptron (MLP) connector with concurrent connector and LLM tuning emerges as the optimal configuration for encoder-based VLMs in FL. Furthermore, current FL methods exhibit significantly higher sensitivity to data heterogeneity in vision-centric tasks than text-centric ones, across both encoder-free and encoder-based VLM architectures. Our benchmark provides essential tools, datasets, and empirical guidance for the research community, offering a standardized platform to advance privacy-preserving, federated training of multimodal foundation models.


Article 429

Title@2025-06-11 (3): Model Attribution and Detection of Synthetic Speech via Vocoder Fingerprints

Title: Model Attribution and Detection of Synthetic Speech via Vocoder Fingerprints Modellzuweisung und Erkennung von Synthetischer Sprache über Vocoder-Fingerabdrücke 通过 Vocoder 指纹识别合成演讲的模型归属和检测 2411.14013v2

Authors (5): Matías Pizarro, Mike Laszkiewicz, Shawkat Hesso, Dorothea Kolossa, Asja Fischer

As speech generation technology advances, so do the potential threats of misusing synthetic speech signals. This work tackles three tasks: (1) single-model attribution in an open-world setting, i.e., identifying whether synthetic speech signals originate from a specific vocoder (which requires only target vocoder data), (2) model attribution in a closed-world setting, i.e., selecting the specific model that generated a sample from a given set of models, and (3) distinguishing synthetic from real speech. We show that standardized average residuals between audio signals and their low-pass or EnCodec filtered versions serve as powerful vocoder fingerprints that can be leveraged for all tasks, achieving an average AUROC of over 99% on LJSpeech and JSUT in most settings. The accompanying robustness study shows that the approach is also resilient to noise up to a certain level.


Article 430

Title@2025-06-11 (3): In-Context Bias Propagation in LLM-Based Tabular Data Generation

Title: In-Context Bias Propagation in LLM-Based Tabular Data Generation In-Context Bias Propagation in der LLM-basierten tabellarischen Datengenerierung 以LLM为基础的制表数据生成中的Intext Bias 2506.09630v1

Authors (6): Pol G. Recasens, Alberto Gutierrez, Jordi Torres, Josep Ll. Berral, Anisa Halimi, Kieran Fraser

Large Language Models (LLMs) are increasingly used for synthetic tabular data generation through in-context learning (ICL), offering a practical solution for data augmentation in data-scarce scenarios. While prior work has shown the potential of LLMs to improve downstream task performance by augmenting underrepresented groups, these benefits often assume access to a subset of unbiased in-context examples that is representative of the real dataset. In real-world settings, however, data is frequently noisy and demographically skewed. In this paper, we systematically study how statistical biases within in-context examples propagate to the distribution of synthetic tabular data, showing that even mild in-context biases lead to global statistical distortions. We further introduce an adversarial scenario where a malicious contributor can inject bias into the synthetic dataset via a subset of in-context examples, ultimately compromising the fairness of downstream classifiers for a targeted and protected subgroup. Our findings demonstrate a new vulnerability in LLM-based data generation pipelines that rely on in-context prompts in sensitive domains.


Article 431

Title@2025-06-11 (3): Holistic Uncertainty Estimation For Open-Set Recognition

Title: Holistic Uncertainty Estimation For Open-Set Recognition Ganzheitliche Ungewissheitsschätzung für Open-Set-Erkennung Open-Set 识别的全局性不确定性估算 2408.14229v2

Authors (2): Leonid Erlygin, Alexey Zaytsev

Accurate uncertainty estimation is a critical challenge in open-set recognition, where a probe biometric sample may belong to an unknown identity. It can be addressed through sample quality estimation via probabilistic embeddings. However, the low variance of probabilistic embedding only partly implies a low identification error probability: an embedding of a sample could be close to several classes in a gallery, thus yielding high uncertainty despite high sample quality. We propose HolUE - a holistic uncertainty estimation method based on a Bayesian probabilistic model; it is aware of two sources of ambiguity in the open-set recognition system: (1) the gallery uncertainty caused by overlapping classes and (2) the uncertainty of embeddings. Challenging open-set recognition datasets, such as IJB-C for the image domain and VoxBlink for the audio domain, serve as a testbed for our method. We also provide a new open-set recognition protocol for the identification of whales and dolphins. In all cases, HolUE better identifies recognition errors than alternative uncertainty estimation methods, including those based solely on sample quality.


Article 432

Title@2025-06-11 (3): GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras

Title: GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras GLGENN: Eine neuartige Parameter-Licht-Equivariante Neural Networks-Architektur auf Basis von Clifford Geometrischen Algebras GLGENN: 以克里夫德几何代数为基础的新参数-日等均质神经网络建筑 2506.09625v1

Authors (2): Ekaterina Filimoshina, Dmitry Shirokov

We propose, implement, and compare with competitors a new architecture of equivariant neural networks based on geometric (Clifford) algebras: Generalized Lipschitz Group Equivariant Neural Networks (GLGENN). These networks are equivariant to all pseudo-orthogonal transformations, including rotations and reflections, of a vector space with any non-degenerate or degenerate symmetric bilinear form. We propose a weight-sharing parametrization technique that takes into account the fundamental structures and operations of geometric algebras. Due to this technique, the GLGENN architecture is parameter-light and is less prone to overfitting than baseline equivariant models. GLGENN outperforms or matches competitors on several equivariant benchmark tasks, including estimation of an equivariant function and a convex hull experiment, while using significantly fewer optimizable parameters.
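
As a reminder of what equivariance to rotations and reflections means, the following sketch checks the defining identity $f(Rx) = R f(x)$ for a toy map in the Euclidean plane. It is not GLGENN and uses no Clifford algebra. The function `layer` (scaling a vector by its own norm) and all other names are hypothetical, chosen only because the property is easy to verify by hand.

```python
import math


def rotate(v, angle):
    """Rotate a 2-d vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def reflect(v):
    """Reflection across the x-axis, an orthogonal map with determinant -1."""
    return (v[0], -v[1])


def layer(v):
    """Toy equivariant map: scale the vector by its own norm."""
    norm = math.hypot(v[0], v[1])
    return (norm * v[0], norm * v[1])


def gap(a, b):
    """Largest coordinate-wise difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


v = (0.3, -1.2)
rotation_gap = gap(layer(rotate(v, 0.7)), rotate(layer(v), 0.7))
reflection_gap = gap(layer(reflect(v)), reflect(layer(v)))
print(rotation_gap, reflection_gap)
```

Both gaps vanish up to floating-point error because orthogonal maps preserve the norm. An equivariant architecture builds this property into every layer, so it does not have to be learned from data.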


Article 433

Title@2025-06-11 (3): SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot

Title: SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot SparseSSM: Effiziente selektiv strukturierte State Space-Modelle können in One-Shot beschnitten werden SprassessSSSSM: 高效选择性结构结构国家空间模型可在单层中谨慎处理 2506.09613v1

Authors (2): Kaiwen Tuo, Huan Wang

State-space language models such as Mamba match Transformer quality while permitting linear-complexity inference, yet still comprise billions of parameters that hinder deployment. Existing one-shot pruning methods are tailored to attention blocks and fail to account for the time-shared and discretized state-transition matrix at the heart of the selective state-space module (SSM). In this paper, we introduce SparseSSM, the first training-free pruning framework that extends the classic optimal brain surgeon (OBS) framework to state space architectures. Our layer-wise algorithm (i) derives an approximate second-order saliency score that aggregates Hessian-trace information across time steps, (ii) incorporates a component sensitivity analysis to guide feed-forward network (FFN) pruning, which also sheds light on where redundancy resides in the Mamba architecture, and (iii) can be easily extended to semi-structured and structured sparsity. Empirically, we prune 50% of SSM weights without fine-tuning and observe no zero-shot accuracy loss, setting the current state of the art among pruning algorithms for Mamba-based LLMs.
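
For context, the classic OBS criterion that the abstract builds on can be written as follows. This is the standard textbook form, not the paper's SSM-specific score, whose aggregation across time steps is not spelled out in the abstract. Here $H$ denotes the Hessian of the layer loss with respect to the weights $w$, and $e_q$ is the unit vector selecting weight $w_q$.

$$
L_q = \frac{w_q^2}{2\,[H^{-1}]_{qq}}, \qquad
\delta w = -\frac{w_q}{[H^{-1}]_{qq}}\, H^{-1} e_q
$$

The weight with the smallest saliency $L_q$ is removed, and the remaining weights are adjusted by $\delta w$ to compensate, which is what allows pruning without retraining.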


Article 434

Title@2025-06-11 (3): Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism

Title: Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism Verständnis der Kompetenzlücke in recurrenten Sprachmodellen: Die Rolle des Sammel- und Aggregatmechanismus 了解经常语文模式中的技能差距:集合和集合机制的作用 2504.18574v2

Authors (3): Aviv Bick, Eric Xing, Albert Gu

State-space models (SSMs) offer efficient alternatives to Transformers for long sequences, but their fixed-size recurrent state limits capability on algorithmic tasks, such as retrieving past context. In this work, we examine how in-context retrieval operates in Transformer- and SSM-based language models and find that both rely on a similar Gather-and-Aggregate (G&A) mechanism: a Gather Head extracts relevant information pieces from context, which an Aggregate Head integrates into a single representation. In both architectures, G&A concentrates in a few heads, forming critical bottlenecks even for simple retrieval. For example, we show that disabling a single Gather or Aggregate Head in a pruned Llama-3.1-8B impairs retrieving the correct answer letter in MMLU, reducing its accuracy from 66% to 25% (random guessing). Moreover, this retrieval bottleneck can obscure the limited knowledge demands of some tasks: the pruned model succeeds on MMLU with functioning G&A heads yet fails on other knowledge benchmarks. The bottleneck similarly extends to tasks where SSMs typically underperform, such as GSM8K, BBH, and dialogue comprehension. We show that SSMs’ retrieval challenges manifest in these heads, creating smoother attention patterns instead of the sharp token transitions that effective G&A requires. Thus, the Transformer-SSM retrieval gap exists in just a few heads, rather than the entire language model. This suggests a unified explanation for the Transformer vs. SSM performance gap while showing how to merge their strengths. We find that pretrained hybrid models, where SSMs are combined with a few attention layers, delegate the role of Aggregate Heads to attention. Similarly, replacing a single G&A head in a pretrained SSM with an attention variant boosts retrieval and benchmark scores.


Article 435

Title@2025-06-11 (3): Accelerating Large-Scale Regularized High-Order Tensor Recovery

Title: Accelerating Large-Scale Regularized High-Order Tensor Recovery Beschleunigung der großflächigen Regularisierung der High-Order Tensor Erholung 加速大规模正规化高阶天文恢复 2506.09594v1

Authors (4): Wenjin Qin, Hailin Wang, Jingyao Hou, Jianjun Wang

Existing tensor recovery methods fail to recognize the impact of tensor scale variations on their structural characteristics. Furthermore, existing studies face prohibitive computational costs when dealing with large-scale high-order tensor data. To alleviate these issues, assisted by the Krylov subspace iteration, block Lanczos bidiagonalization process, and random projection strategies, this article first devises two fast and accurate randomized algorithms for the low-rank tensor approximation (LRTA) problem. Theoretical bounds on the accuracy of the approximation error estimate are established. Next, we develop a novel generalized nonconvex modeling framework tailored to large-scale tensor recovery, in which a new regularization paradigm is exploited to achieve insightful prior representation for large-scale tensors. On the basis of the above, we further investigate new unified nonconvex models and efficient optimization algorithms for several typical high-order tensor recovery tasks in unquantized and quantized situations. To render the proposed algorithms practical and efficient for large-scale tensor data, the proposed randomized LRTA schemes are integrated into their central and time-intensive computations. Finally, we conduct extensive experiments on various large-scale tensors, whose results demonstrate the practicability, effectiveness and superiority of the proposed method in comparison with some state-of-the-art approaches.


Article 436

Title@2025-06-11 (3): Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting

Title: Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting Physics-informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting 用于全球次季对季预报的计算机化远程连接软件变换器 2506.08049v2

Authors (3): Tengfei Lyu, Weijia Zhang, Hao Liu

Subseasonal-to-seasonal (S2S) forecasting, which predicts climate conditions from several weeks to months in advance, presents significant challenges due to the chaotic dynamics of atmospheric systems and complex interactions across multiple scales. Current approaches often fail to explicitly model underlying physical processes and teleconnections that are crucial at S2S timescales. We introduce TelePiT, a novel deep learning architecture that enhances global S2S forecasting through integrated multi-scale physics and teleconnection awareness. Our approach consists of three key components: (1) Spherical Harmonic Embedding, which accurately encodes global atmospheric variables onto spherical geometry; (2) Multi-Scale Physics-Informed Neural ODE, which explicitly captures atmospheric physical processes across multiple learnable frequency bands; (3) Teleconnection-Aware Transformer, which models critical global climate interactions through tactfully injecting teleconnection patterns into the self-attention. Extensive experiments demonstrate that TelePiT significantly outperforms state-of-the-art data-driven baselines and operational numerical weather prediction systems, with remarkable improvements for atmospheric variables including a 57.7% reduction in RMSE for 2-meter temperature compared to previous best models.


Article 437

Title@2025-06-11 (3): Metric-Entropy Limits on the Approximation of Nonlinear Dynamical Systems

Title: Metric-Entropy Limits on the Approximation of Nonlinear Dynamical Systems Metrisch-Entropie-Grenzen zur Annäherung nichtlinearer dynamischer Systeme 非线-线性动态系统接近度的中子-内基限值 2407.01250v2

Authors (3): Yang Pan, Clemens Hutter, Helmut Bölcskei

This paper is concerned with fundamental limits on the approximation of nonlinear dynamical systems. Specifically, we show that recurrent neural networks (RNNs) can approximate nonlinear systems – that satisfy a Lipschitz property and forget past inputs fast enough – in a metric-entropy-optimal manner. As the sets of sequence-to-sequence mappings realized by the dynamical systems we consider are significantly more massive than function classes generally analyzed in approximation theory, a refined metric-entropy characterization is needed, namely in terms of order, type, and generalized dimension. We compute these quantities for the classes of exponentially and polynomially Lipschitz fading-memory systems and show that RNNs can achieve them.


Article 438

Title@2025-06-11 (3): Beyond Overconfidence: Foundation Models Redefine Calibration in Deep Neural Networks

Title: Beyond Overconfidence: Foundation Models Redefine Calibration in Deep Neural Networks Beyond Overconfidency: Foundation Models Neufine Calibration in Deep Neural Networks 超越自信:深神经网络中的模型重新校准基金会 2506.09593v1

Authors (3): Achim Hekler, Lukas Kuhn, Florian Buettner

Reliable uncertainty calibration is essential for safely deploying deep neural networks in high-stakes applications. Deep neural networks are known to exhibit systematic overconfidence, especially under distribution shifts. Although foundation models such as ConvNeXt, EVA and BEiT have demonstrated significant improvements in predictive performance, their calibration properties remain underexplored. This paper presents a comprehensive investigation into the calibration behavior of foundation models, revealing insights that challenge established paradigms. Our empirical analysis shows that these models tend to be underconfident in in-distribution predictions, resulting in higher calibration errors, while demonstrating improved calibration under distribution shifts. Furthermore, we demonstrate that foundation models are highly responsive to post-hoc calibration techniques in the in-distribution setting, enabling practitioners to effectively mitigate underconfidence bias. However, these methods become progressively less reliable under severe distribution shifts and can occasionally produce counterproductive results. Our findings highlight the complex, non-monotonic effects of architectural and training innovations on calibration, challenging established narratives of continuous improvement.
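
The "calibration errors" discussed here are commonly summarized by the expected calibration error (ECE). The abstract does not state which metric the paper uses, so the sketch below assumes the standard equal-width-bin ECE. The function name and the two toy models are hypothetical, included only to show how underconfidence appears in the metric.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: 1 if the prediction was right, else 0.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece


# Underconfident toy model: right 9 times out of 10 while reporting 0.65.
underconfident = expected_calibration_error([0.65] * 10, [1] * 9 + [0])
# Calibrated toy model: reports 0.9 and is right 9 times out of 10.
calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
print(underconfident, calibrated)
```

ECE only measures the size of the gap, so an underconfident model (accuracy above confidence) is penalized just like an overconfident one. This is why the underconfidence reported in the abstract shows up as a higher calibration error.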


Article 439

Title@2025-06-11 (3): Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis

Title: Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis Decoding Knowledge Attribution in Mixture-of-Experts: Ein Rahmenwerk der Basic-Refinement-Kollaboration und Effizienzanalyse 在混合专家中解码知识的配置:基础改进协作和效率分析框架 2505.24593v2

Authors (6): Junzhuo Li, Bo Wang, Xiuze Zhou, Peijie Jiang, Jia Liu, Xuming Hu

The interpretability of Mixture-of-Experts (MoE) models, especially those with heterogeneous designs, remains underexplored. Existing attribution methods for dense models fail to capture dynamic routing-expert interactions in sparse MoE architectures. To address this issue, we propose a cross-level attribution algorithm to analyze sparse MoE architectures (Qwen 1.5-MoE, OLMoE, Mixtral-8x7B) against dense models (Qwen 1.5-7B, Llama-7B, Mistral-7B). Results show MoE models achieve 37% higher per-layer efficiency via a “mid-activation, late-amplification” pattern: early layers screen experts, while late layers refine knowledge collaboratively. Ablation studies reveal a “basic-refinement” framework: shared experts handle general tasks (entity recognition), while routed experts specialize in domain-specific processing (geographic attributes). Semantic-driven routing is evidenced by strong correlations between attention heads and experts (r=0.68), enabling task-aware coordination. Notably, architectural depth dictates robustness: deep Qwen 1.5-MoE mitigates expert failures (e.g., 43% MRR drop in geographic tasks when blocking top-10 experts) through shared expert redundancy, whereas shallow OLMoE suffers severe degradation (76% drop). Task sensitivity further guides design: core-sensitive tasks (geography) require concentrated expertise, while distributed-tolerant tasks (object attributes) leverage broader participation. These insights advance MoE interpretability, offering principles to balance efficiency, specialization, and robustness.


Article 440

Title@2025-06-11 (3): Human-like object concept representations emerge naturally in multimodal large language models

Title: Human-like object concept representations emerge naturally in multimodal large language models Mensch-ähnliche Objekt-Konzept-Darstellungen entstehen natürlich in multimodalen großen Sprachmodellen 在多式大型语言模式中自然出现类似人类的物体概念表述 2407.01067v3

Authors (13): Changde Du, Kaicheng Fu, Bincheng Wen, Yi Sun, Jie Peng, Wei Wei, Ying Gao, Shengpei Wang, Chuncheng Zhang, Jinpeng Li, Shuang Qiu, Le Chang, Huiguang He

Understanding how humans conceptualize and categorize natural objects offers critical insights into perception and cognition. With the advent of Large Language Models (LLMs), a key question arises: can these models develop human-like object representations from linguistic and multimodal data? In this study, we combined behavioral and neuroimaging analyses to explore the relationship between object concept representations in LLMs and human cognition. We collected 4.7 million triplet judgments from LLMs and Multimodal LLMs (MLLMs) to derive low-dimensional embeddings that capture the similarity structure of 1,854 natural objects. The resulting 66-dimensional embeddings were stable, predictive, and exhibited semantic clustering similar to human mental representations. Remarkably, the dimensions underlying these embeddings were interpretable, suggesting that LLMs and MLLMs develop human-like conceptual representations of objects. Further analysis showed strong alignment between model embeddings and neural activity patterns in brain regions such as EBA, PPA, RSC, and FFA. This provides compelling evidence that the object representations in LLMs, while not identical to human ones, share fundamental similarities that reflect key aspects of human conceptual knowledge. Our findings advance the understanding of machine intelligence and inform the development of more human-like artificial cognitive systems.


Article 441

Title@2025-06-11 (3): Feasibility Study of CNNs and MLPs for Radiation Heat Transfer in 2-D Furnaces with Spectrally Participative Gases

Title: Feasibility Study of CNNs and MLPs for Radiation Heat Transfer in 2-D Furnaces with Spectrally Participative Gases Machbarkeitsstudie von CNNs und MLPs für den Strahlungswärmetransfer in 2-D-Öfen mit Spektrally Participative Gasen 关于有线电视新闻网和多频多频卫星在2-D发热中用光谱参与气体进行辐射热传导的有线电视新闻网和 MLP的可行性研究 2506.08033v2

Authors (5): Axel TahmasebiMoradi, Vincent Ren, Benjamin Le-Creurer, Chetra Mang, Mouadh Yagoubi

To reduce the computational cost of numerical simulations, a convolutional neural network (CNN) and a multi-layer perceptron (MLP) are introduced as surrogate models that approximate radiative heat transfer solutions in a 2-D walled domain with participative gases. The originality of this work lies in adapting the inputs of the problem (gas and wall properties) to fit the CNN architecture, which is more commonly used for image processing. Two precision datasets have been created with the classical solver, ICARUS2D, which uses the discrete transfer radiation method with the statistical narrow bands model. The performance of the CNN architecture is compared to a more classical MLP architecture in terms of speed and accuracy. All results are obtained with networks whose hyperparameters were optimized using Optuna. The results show a significant speedup with industrially acceptable relative errors compared to the classical solver for both architectures. Additionally, the CNN outperforms the MLP in terms of precision and is more robust and stable to changes in hyperparameters. A performance analysis with respect to dataset size has also been carried out to gain a deeper understanding of the model behavior.


Article 442

Title@2025-06-11 (3): MOORL: A Framework for Integrating Offline-Online Reinforcement Learning

Title: MOORL: A Framework for Integrating Offline-Online Reinforcement Learning MOORL: Ein Rahmen für die Integration von Offline-Online-Verstärkungslernen MORL: 整合离线-在线强化学习的框架 2506.09574v1

Authors (3): Gaurav Chaudhary, Wassim Uddin Mondal, Laxmidhar Behera

Sample efficiency and exploration remain critical challenges in Deep Reinforcement Learning (DRL), particularly in complex domains. Offline RL, which enables agents to learn optimal policies from static, pre-collected datasets, has emerged as a promising alternative. However, offline RL is constrained by issues such as out-of-distribution (OOD) actions that limit policy performance and generalization. To overcome these limitations, we propose Meta Offline-Online Reinforcement Learning (MOORL), a hybrid framework that unifies offline and online RL for efficient and scalable learning. While previous hybrid methods rely on extensive design components and added computational complexity to utilize offline data effectively, MOORL introduces a meta-policy that seamlessly adapts across offline and online trajectories. This enables the agent to leverage offline data for robust initialization while utilizing online interactions to drive efficient exploration. Our theoretical analysis demonstrates that the hybrid approach enhances exploration by effectively combining the complementary strengths of offline and online data. Furthermore, we demonstrate that MOORL learns a stable Q-function without added complexity. Extensive experiments on 28 tasks from the D4RL and V-D4RL benchmarks validate its effectiveness, showing consistent improvements over state-of-the-art offline and hybrid RL baselines. With minimal computational overhead, MOORL achieves strong performance, underscoring its potential for practical applications in real-world scenarios.


Article 443

Title@2025-06-11 (3): Mechanistic PDE Networks for Discovery of Governing Equations

Title: Mechanistic PDE Networks for Discovery of Governing Equations Mechanistische PDE-Netzwerke für die Entdeckung von Regierenden Gleichungen 发现管理等分的机械 PDE 网络 2502.18377v2

Authors (3): Adeel Pervez, Efstratios Gavves, Francesco Locatello

We present Mechanistic PDE Networks – a model for discovery of governing partial differential equations from data. Mechanistic PDE Networks represent spatiotemporal data as space-time dependent linear partial differential equations in neural network hidden representations. The represented PDEs are then solved and decoded for specific tasks. The learned PDE representations naturally express the spatiotemporal dynamics in data in neural network hidden space, enabling increased power for dynamical modeling. Solving the PDE representations in a compute and memory-efficient way, however, is a significant challenge. We develop a native, GPU-capable, parallel, sparse, and differentiable multigrid solver specialized for linear partial differential equations that acts as a module in Mechanistic PDE Networks. Leveraging the PDE solver, we propose a discovery architecture that can discover nonlinear PDEs in complex settings while also being robust to noise. We validate PDE discovery on a number of PDEs, including reaction-diffusion and Navier-Stokes equations.


Article 444

Title@2025-06-11 (3): From Symbolic to Neural and Back: Exploring Knowledge Graph-Large Language Model Synergies

Title: From Symbolic to Neural and Back: Exploring Knowledge Graph-Large Language Model Synergies Von Symbolisch zu Neural und Zurück: Erforschen von Wissen Graphen-Große Sprachmodell-Synergien 从符号到神经和后背:探索知识图集-大型语言模式协同作用 2506.09566v1

Authors (4): Blaž Škrlj, Boshko Koloski, Senja Pollak, Nada Lavrač

Integrating structured knowledge from Knowledge Graphs (KGs) into Large Language Models (LLMs) enhances factual grounding and reasoning capabilities. This survey paper systematically examines the synergy between KGs and LLMs, categorizing existing approaches into two main groups: KG-enhanced LLMs, which improve reasoning, reduce hallucinations, and enable complex question answering; and LLM-augmented KGs, which facilitate KG construction, completion, and querying. Through comprehensive analysis, we identify critical gaps and highlight the mutual benefits of structured knowledge integration. Compared to existing surveys, our study uniquely emphasizes scalability, computational efficiency, and data quality. Finally, we propose future research directions, including neuro-symbolic integration, dynamic KG updating, data reliability, and ethical considerations, paving the way for intelligent systems capable of managing more complex real-world knowledge tasks.


Article 445

Title@2025-06-11 (3): Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics

Title: Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics Funktion Normalisierung verhindert Zusammenbruch nicht kontrastiver Lerndynamiken 防止非争议学习动态的崩溃 2309.16109v2

Authors (1): Han Bao

Contrastive learning is a self-supervised representation learning framework in which two positive views generated through data augmentation are pulled together by an attractive force in the representation space, while a repulsive force pushes them away from negative examples. Non-contrastive learning, represented by BYOL and SimSiam, dispenses with negative examples and improves computational efficiency. Although, at first sight, the learned representations might be expected to collapse into a single point because the repulsive force is absent, Tian et al. (2021) revealed through an analysis of the learning dynamics that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account the commonly used feature normalization, a normalizer applied before measuring the similarity of representations; as a result, excessively strong regularization may collapse the dynamics in their model, which is unnatural behavior in the presence of feature normalization. Therefore, we extend the previous theory, which is based on the L2 loss, by considering the cosine loss, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces third-order dynamics), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions with given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing the dynamics collapse.
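
The difference between the two losses can be illustrated with a minimal sketch. The function names and the toy vectors are illustrative assumptions, not the paper's notation; the point is only that the cosine loss normalizes features before comparing them, so shrinking a representation (as strong regularization would) leaves it unchanged, while the L2 loss is sensitive to the shrinkage.

```python
import math

def l2_loss(u, v):
    """Half squared Euclidean distance between two representation vectors."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(u, v))

def cosine_loss(u, v, eps=1e-12):
    """Negative cosine similarity: both vectors are L2-normalized before the dot product."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return -sum(a * b for a, b in zip(u, v)) / max(norm_u * norm_v, eps)

# Toy vectors (hypothetical): an online representation, a target, and a shrunken copy.
online = [0.6, 0.8]
target = [0.8, 0.6]
shrunk = [0.01 * a for a in online]

print(l2_loss(online, target), l2_loss(shrunk, target))          # changes with scale
print(cosine_loss(online, target), cosine_loss(shrunk, target))  # unchanged by scale
```

The `eps` guard only protects against division by zero for an exactly collapsed (all-zero) vector; it is an implementation convenience, not part of the analysis described above.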


Article 446

Title@2025-06-11 (3): Traceable LLM-based validation of statements in knowledge graphs

Title: Traceable LLM-based validation of statements in knowledge graphs Rückführbare LLM-basierte Validierung von Aussagen in Wissensgraphen 以可追踪的LLM为基础,对知识图中的报表进行可追踪的LLM验证 2409.07507v2

Authors (2): Daniel Adam, Tomáš Kliegr

This article presents a method for verifying RDF triples using LLMs, with an emphasis on providing traceable arguments. Because the LLMs cannot currently reliably identify the origin of the information used to construct the response to the user prompt, our approach is to avoid using internal LLM factual knowledge altogether. Instead, verified RDF statements are compared to chunks of external documents retrieved through a web search or Wikipedia. To assess the possible application of this retrieval augmented generation (RAG) workflow on biosciences content, we evaluated 1,719 positive statements from the BioRED dataset and the same number of newly generated negative statements. The resulting precision is 88 %, and recall is 44 %. This indicates that the method requires human oversight. We also evaluated the method on the SNLI dataset, which allowed us to compare our approach with models specifically tuned for the natural language inference task. We demonstrate the method on Wikidata, where a SPARQL query is used to automatically retrieve statements needing verification. Overall, the results suggest that LLMs could be used for large-scale verification of statements in KGs, a task previously unfeasible due to human annotation costs.


Article 447

Title@2025-06-11 (3): On Finetuning Tabular Foundation Models

Title: On Finetuning Tabular Foundation Models Über Finetuning Tabular Foundation Modelle 关于微调表表基础模型 2506.08982v2

Authors (4): Ivan Rubachev, Akim Kotelnikov, Nikolay Kartashev, Artem Babenko

Foundation models are an emerging research direction in tabular deep learning. Notably, TabPFNv2 recently claimed superior performance over traditional GBDT-based methods on small-scale datasets using an in-context learning paradigm, which does not adapt model parameters to target datasets. However, the optimal finetuning approach for adapting tabular foundational models, and how this adaptation reshapes their internal mechanisms, remains underexplored. While prior works studied finetuning for earlier foundational models, inconsistent findings and TabPFNv2’s unique architecture necessitate fresh investigation. To address these questions, we first systematically evaluate various finetuning strategies on diverse datasets. Our findings establish full finetuning as the most practical solution for TabPFNv2 in terms of time-efficiency and effectiveness. We then investigate how finetuning alters TabPFNv2’s inner mechanisms, drawing an analogy to retrieval-augmented models. We reveal that the success of finetuning stems from the fact that after gradient-based adaptation, the dot products of the query-representations of test objects and the key-representations of in-context training objects more accurately reflect their target similarity. This improved similarity allows finetuned TabPFNv2 to better approximate target dependency by appropriately weighting relevant in-context samples, improving the retrieval-based prediction logic. From the practical perspective, we managed to finetune TabPFNv2 on datasets with up to 50K objects, observing performance improvements on almost all tasks. More precisely, on academic datasets with I.I.D. splits, finetuning allows TabPFNv2 to achieve state-of-the-art results, while on datasets with gradual temporal shifts and rich feature sets, TabPFNv2 is less stable and prior methods remain better.
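
The retrieval analogy can be sketched as a single attention step: a test object's query is scored against the keys of the in-context training objects, and the softmax weights decide how much each training label contributes. This is a simplified illustration under stated assumptions (one head, scalar labels, hypothetical names such as `attention_predict`), not TabPFNv2's actual architecture.

```python
import math

def attention_predict(query, keys, labels, scale=1.0):
    """Weight in-context labels by softmax(query . key) and return (prediction, weights)."""
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    prediction = sum(w * y for w, y in zip(weights, labels))
    return prediction, weights

# Two in-context training objects with hypothetical key vectors and labels.
keys = [[1.0, 0.0], [0.0, 1.0]]
labels = [1.0, 0.0]

# A query whose dot product is large with the first key retrieves mostly its label.
pred, weights = attention_predict([5.0, 0.0], keys, labels)
print(pred, weights)
```

Under this reading, finetuning helps when it moves queries and keys so that large dot products coincide with training objects whose targets are actually similar to the test object's target.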


Article 448

Title@2025-06-11 (3): Large Language Models Miss the Multi-Agent Mark

Title: Large Language Models Miss the Multi-Agent Mark Große Sprachmodelle vermissen das Multi-Agent Mark 大语言模型 2505.21298v2

Authors (8): Emanuele La Malfa, Gabriele La Malfa, Samuele Marro, Jie M. Zhang, Elizabeth Black, Michael Luck, Philip Torr, Michael Wooldridge

Recent interest in Multi-Agent Systems of Large Language Models (MAS LLMs) has led to an increase in frameworks leveraging multiple LLMs to tackle complex tasks. However, much of this literature appropriates the terminology of MAS without engaging with its foundational principles. In this position paper, we highlight critical discrepancies between MAS theory and current MAS LLMs implementations, focusing on four key areas: the social aspect of agency, environment design, coordination and communication protocols, and measuring emergent behaviours. Our position is that many MAS LLMs lack multi-agent characteristics such as autonomy, social interaction, and structured environments, and often rely on oversimplified, LLM-centric architectures. The field may slow down and lose traction by revisiting problems the MAS literature has already addressed. Therefore, we systematically analyse this issue and outline associated research opportunities; we advocate for better integrating established MAS concepts and more precise terminology to avoid mischaracterisation and missed opportunities.


Article 449

Title@2025-06-11 (3): Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means

Title: Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means Robuste Verteilte Schätzung: Erweiterung von Gossip-Algorithmen auf Rangfolge und Trimmmittel 强有力的分布分布式估算:将Gossip的数值扩大至排名和缩略语 2505.17836v5

Authors (3): Anna Van Elst, Igor Colin, Stephan Clémençon

This paper addresses the problem of robust estimation in gossip algorithms over arbitrary communication graphs. Gossip algorithms are fully decentralized, relying only on local neighbor-to-neighbor communication, making them well-suited for situations where communication is constrained. A fundamental challenge in existing mean-based gossip algorithms is their vulnerability to malicious or corrupted nodes. In this paper, we show that an outlier-robust mean can be computed by globally estimating a robust statistic. More specifically, we propose a novel gossip algorithm for rank estimation, referred to as \textsc{GoRank}, and leverage it to design a gossip procedure dedicated to trimmed mean estimation, coined \textsc{GoTrim}. In addition to a detailed description of the proposed methods, a key contribution of our work is a precise convergence analysis: we establish an $\mathcal{O}(1/t)$ rate for rank estimation and an $\mathcal{O}((\log t)/\sqrt{t})$ rate for trimmed mean estimation, where $t$ denotes the number of iterations. Moreover, we provide a breakdown point analysis of \textsc{GoTrim}. We empirically validate our theoretical results through experiments on diverse network topologies, data distributions and contamination schemes.
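
The vulnerability of mean-based gossip can be seen in a small sketch. The code below is standard randomized pairwise gossip averaging plus a centralized trimmed mean used as the reference value; it is not GoRank or GoTrim, whose update rules are not given in this abstract. The ring topology, the data, and the function names are illustrative assumptions.

```python
import random

def gossip_average(values, edges, steps, seed=0):
    """Randomized pairwise gossip: at each step one edge is drawn and its endpoints average."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(steps):
        i, j = rng.choice(edges)
        x[i] = x[j] = 0.5 * (x[i] + x[j])
    return x

def trimmed_mean(values, alpha):
    """Centralized reference: drop the alpha fraction at each end (alpha < 0.5), then average."""
    xs = sorted(values)
    k = int(alpha * len(xs))
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

# Five nodes on a ring; node 4 holds a corrupted value.
data = [1.0, 2.0, 3.0, 4.0, 1000.0]
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

consensus = gossip_average(data, ring, steps=2000)
print(consensus)                # every node approaches the plain mean, 202.0
print(trimmed_mean(data, 0.2))  # the outlier-robust target, 3.0
```

Pairwise averaging preserves the sum of the node values, so a single corrupted node shifts the consensus of every node; a trimmed mean needs each node to know the rank of its own value, which is the quantity a rank-estimation gossip procedure would supply.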


Article 450

Title@2025-06-11 (3): DeepExtractor: Time-domain reconstruction of signals and glitches in gravitational wave data with deep learning

Title: DeepExtractor: Time-domain reconstruction of signals and glitches in gravitational wave data with deep learning DeepExtractor: Zeit-Domänen-Rekonstruktion von Signalen und Störungen in Gravitationswellendaten mit Deep Learning 深度提取器:用深层学习重建引力波数据信号和小故障的时间- 域重建 2501.18423v3

Authors (7): Tom Dooney, Harsh Narola, Stefano Bromuri, R. Lyana Curier, Chris Van Den Broeck, Sarah Caudill, Daniel Stanley Tan

Gravitational wave (GW) detectors, such as LIGO, Virgo, and KAGRA, detect faint signals from distant astrophysical events. However, their high sensitivity also makes them susceptible to background noise, which can obscure these signals. This noise often includes transient artifacts called ‘glitches’, that can mimic genuine astrophysical signals or mask their true characteristics. In this study, we present DeepExtractor, a deep learning framework that is designed to reconstruct signals and glitches with power exceeding interferometer noise, regardless of their source. We design DeepExtractor to model the inherent noise distribution of GW detectors, following conventional assumptions that the noise is Gaussian and stationary over short time scales. It operates by predicting and subtracting the noise component of the data, retaining only the clean reconstruction of signal or glitch. We focus on applications related to glitches and validate DeepExtractor’s effectiveness through three experiments: (1) reconstructing simulated glitches injected into simulated detector noise, (2) comparing its performance with the state-of-the-art BayesWave algorithm, and (3) analyzing real data from the Gravity Spy dataset to demonstrate effective glitch subtraction from LIGO strain data. We further demonstrate its potential by reconstructing three real GW events from LIGO’s third observing run, without being trained on GW waveforms. Our proposed model achieves a median mismatch of only 0.9% for simulated glitches, outperforming several deep learning baselines. Additionally, DeepExtractor surpasses BayesWave in glitch recovery, offering a dramatic computational speedup by reconstructing one glitch sample in approximately 0.1 seconds on a CPU, compared to BayesWave’s processing time of approximately one hour per glitch.
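
The mismatch figure quoted above can be read as one minus a normalized overlap between the reconstructed and the true waveform. The sketch below uses a plain time-domain inner product with no noise weighting and no maximization over time or phase shifts; these simplifications, and the function name, are assumptions, so the paper's exact definition may differ.

```python
import math

def mismatch(a, b):
    """1 - <a, b> / sqrt(<a, a> <b, b>) for two equally sampled, non-zero waveforms."""
    inner = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 1.0 - inner / norm

# Hypothetical waveforms: a sine burst, a rescaled copy, and a quadrature (cosine) burst.
n = 256
truth = [math.sin(2 * math.pi * 8 * i / n) for i in range(n)]
rescaled = [2.0 * x for x in truth]
quadrature = [math.cos(2 * math.pi * 8 * i / n) for i in range(n)]

print(mismatch(truth, rescaled))    # close to 0: same shape, different amplitude
print(mismatch(truth, quadrature))  # close to 1: orthogonal over whole periods
```

Because the overlap is normalized, the measure compares waveform shape rather than amplitude.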


Article 451

Title@2025-06-11 (3): Differentially Private Federated $k$-Means Clustering with Server-Side Data

Title: Differentially Private Federated $k$-Means Clustering with Server-Side Data Differential Private Federated $k$-Means Clustering mit Server-Side-Daten 与服务器-Side数据组合在一起 2506.05408v2

Authors (3): Jonathan Scott, Christoph H. Lampert, David Saulpic

Clustering is a cornerstone of data analysis that is particularly suited to identifying coherent subgroups or substructures in unlabeled data, which are now generated continuously and in large amounts. However, in many cases traditional clustering methods are not applicable, because data are increasingly produced and stored in a distributed way, e.g. on edge devices, and privacy concerns prevent them from being transferred to a central server. To address this challenge, we present FedDP-KMeans, a new algorithm for $k$-means clustering that is fully federated as well as differentially private. Our approach leverages (potentially small and out-of-distribution) server-side data to overcome the primary challenge of differentially private clustering methods: the need for a good initialization. Combining our initialization with a simple federated DP-Lloyds algorithm, we obtain an algorithm that achieves excellent results on synthetic and real-world benchmark tasks. We also provide a theoretical analysis of our method that provides bounds on the convergence speed and cluster identification success.
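
A single noisy Lloyd step in a federated setting can be sketched as follows. This is a generic illustration, not FedDP-KMeans itself: the function name, the Gaussian noise, and the fallback for near-empty clusters are assumptions, and an actual differential privacy guarantee would additionally require clipping each point's norm and calibrating `sigma` to the privacy budget.

```python
import random

def noisy_lloyd_step(client_data, centers, sigma, rng):
    """Aggregate per-cluster sums and counts over clients, add Gaussian noise, recompute centers."""
    k, d = len(centers), len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0.0] * k
    for points in client_data:          # each entry is one client's local dataset
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            counts[nearest] += 1.0
            for t in range(d):
                sums[nearest][t] += p[t]
    new_centers = []
    for j in range(k):
        noisy_count = counts[j] + rng.gauss(0.0, sigma)
        if noisy_count < 1.0:           # too little mass: keep the previous center
            new_centers.append(list(centers[j]))
            continue
        new_centers.append(
            [(sums[j][t] + rng.gauss(0.0, sigma)) / noisy_count for t in range(d)]
        )
    return new_centers

# Two hypothetical clients, two clusters, and a rough initialization.
clients = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 10.0], [10.0, 12.0]]]
init = [[1.0, 1.0], [9.0, 9.0]]
print(noisy_lloyd_step(clients, init, sigma=0.0, rng=random.Random(0)))
print(noisy_lloyd_step(clients, init, sigma=0.5, rng=random.Random(0)))
```

With noise added at every step, a poor initialization is costly because each extra iteration spends privacy budget, which is why the abstract singles out initialization as the primary challenge.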


Article 452

Title@2025-06-11 (3): The Causal Information Bottleneck and Optimal Causal Variable Abstractions

Title: The Causal Information Bottleneck and Optimal Causal Variable Abstractions Die Kausalinformation Engpass und Optimale Kausalvariable Abstraktionen 因果信息瓶颈和最佳因果变量抽象 2410.00535v5

Authors (3): Francisco N. F. Q. Simoes, Mehdi Dastani, Thijs van Ommen

To effectively study complex causal systems, it is often useful to construct abstractions of parts of the system by discarding irrelevant details while preserving key features. The Information Bottleneck (IB) method is a widely used approach to construct variable abstractions by compressing random variables while retaining predictive power over a target variable. Traditional methods like IB are purely statistical and ignore underlying causal structures, making them ill-suited for causal tasks. We propose the Causal Information Bottleneck (CIB), a causal extension of the IB, which compresses a set of chosen variables while maintaining causal control over a target variable. This method produces abstractions of (sets of) variables which are causally interpretable, give us insight about the interactions between the abstracted variables and the target variable, and can be used when reasoning about interventions. We present experimental results demonstrating that the learned abstractions accurately capture causal relations as intended.
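
For reference, the standard IB objective that the CIB extends can be written as the Lagrangian below. This is the classical formulation and is not taken from this abstract; $T$ denotes the abstraction (the compressed variable), $X$ the chosen variables, $Y$ the target variable, and $\beta \ge 0$ a trade-off parameter. The abstract states only that the CIB replaces the predictive term with a notion of causal control over the target; its exact form is not given here.

```latex
\min_{q(t \mid x)} \; I(X; T) \;-\; \beta \, I(T; Y)
```

The first term rewards compression of $X$ into $T$, and the second rewards retaining information about $Y$; larger $\beta$ favors retention over compression.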


Article 453

Title@2025-06-11 (3): Identifiable Latent Bandits: Leveraging observational data for personalized decision-making

Title: Identifiable Latent Bandits: Leveraging observational data for personalized decision-making Identifizierbare Latent Bandits: Nutzung von Beobachtungsdaten für personalisierte Entscheidungen 利用观测数据进行个性化决策 2407.16239v4

Authors (4): Ahmet Zahid Balcıoğlu, Newton Mwai, Emil Carlsson, Fredrik D. Johansson

For many decision-making tasks, such as precision medicine, historical data alone are insufficient to determine the right choice for a new problem instance or patient. Online algorithms like multi-armed bandits can find optimal personalized decisions but are notoriously sample-hungry. In practice, training a bandit for a new individual from scratch is often infeasible, as the number of trials required is larger than the practical number of decision points. Latent bandits offer rapid exploration and personalization beyond what context variables can reveal, provided that a latent variable model can be learned consistently. In this work, we propose an identifiable latent bandit framework that leads to optimal decision-making with a shorter exploration time than classical bandits by learning from historical records of decisions and outcomes. Our method is based on nonlinear independent component analysis that provably identifies representations from observational data sufficient to infer the optimal action in new bandit instances. We verify this strategy in simulated and semi-synthetic environments, showing substantial improvement over online and offline learning baselines when identifying conditions are satisfied.


Article 454

Title@2025-06-11 (3): Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information

Title: Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information Eng gekoppelte LiDAR-IMU-Leg-Odometrie mit Online-Lektionskinematik, die Fußtaktile Informationen enthält LIDAR-IMU-Leg Odo量度与包含脚触觉信息的在线学习腿脚力学 2506.09548v1

Authors (7): Taku Okawara, Kenji Koide, Aoki Takanose, Shuji Oishi, Masashi Yokozuka, Kentaro Uno, Kazuya Yoshida

In this letter, we present tightly coupled LiDAR-IMU-leg odometry, which is robust to challenging conditions such as featureless environments and deformable terrains. We developed an online learning-based leg kinematics model named the neural leg kinematics model, which incorporates tactile information (foot reaction force) to implicitly express the nonlinear dynamics between robot feet and the ground. Online training of this model enhances its adaptability to weight load changes of a robot (e.g., assuming delivery or transportation tasks) and terrain conditions. According to the \textit{neural adaptive leg odometry factor} and online uncertainty estimation of the leg kinematics model-based motion predictions, we jointly solve online training of this kinematics model and odometry estimation on a unified factor graph to retain the consistency of both. The proposed method was verified through real experiments using a quadruped robot in two challenging situations: 1) a sandy beach, representing an extremely featureless area with a deformable terrain, and 2) a campus, including multiple featureless areas and terrain types of asphalt, gravel (deformable terrain), and grass. Experimental results showed that our odometry estimation incorporating the \textit{neural leg kinematics model} outperforms state-of-the-art works. Our project page is available for further details: https://takuokawara.github.io/RAL2025_project_page/


Article 455

Title@2025-06-11 (3): Design Patterns for Securing LLM Agents against Prompt Injections

Title: Design Patterns for Securing LLM Agents against Prompt Injections Designmuster zur Sicherung von LLM-Agenten gegen sofortige Injektionen 防止及时注射的LLM代理物的安全设计模式 2506.08837v2

Authors (14): Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn

As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent’s reliance on natural language inputs – an especially dangerous threat when agents are granted tool access or handle sensitive information. In this work, we propose a set of principled design patterns for building AI agents with provable resistance to prompt injection. We systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through a series of case studies.


Article 456

Title@2025-06-11 (3): On the Privacy Risks of Spiking Neural Networks: A Membership Inference Analysis

Title: On the Privacy Risks of Spiking Neural Networks: A Membership Inference Analysis Über die Datenschutzrisiken von Spiking Neuronal Networks: Eine Schlussfolgerungsanalyse für die Mitgliedschaft 《关于间谍神经网络的隐私风险:成员推断分析》 2502.13191v4

Authors (4): Junyi Guan, Abhijith Sharma, Chong Tian, Salem Lahlou

Spiking Neural Networks (SNNs) are increasingly explored for their energy efficiency and robustness in real-world applications, yet their privacy risks remain largely unexamined. In this work, we investigate the susceptibility of SNNs to Membership Inference Attacks (MIAs) – a major privacy threat where an adversary attempts to determine whether a given sample was part of the training dataset. While prior work suggests that SNNs may offer inherent robustness due to their discrete, event-driven nature, we find that their resilience diminishes as latency (T) increases. Furthermore, we introduce an input dropout strategy in the black-box setting that significantly enhances membership inference in SNNs. Our findings challenge the assumption that SNNs are inherently more secure: although they are expected to fare better, our results reveal that SNNs exhibit privacy vulnerabilities comparable to those of Artificial Neural Networks (ANNs). Our code is available at https://github.com/sharmaabhijith/MIA_SNN.


Article 457

Title@2025-06-11 (3): Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models

Title: Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models Athena: Multimodale Vernunft mit dateneffizienten Prozess-Reward-Modellen verbessern Athena:利用数据高效流程奖励模式加强多式联运理由 2506.09532v1

Authors (6): Shuai Wang, Zhenhua Liu, Jiaheng Wei, Xuanwu Yin, Dong Li, Emad Barsoum

We present Athena-PRM, a multimodal process reward model (PRM) designed to evaluate the reward score for each step in solving complex reasoning problems. Developing high-performance PRMs typically demands significant time and financial investment, primarily because reasoning steps must be annotated at the step level. Conventional automated labeling methods, such as Monte Carlo estimation, often produce noisy labels and incur substantial computational costs. To efficiently generate high-quality process-labeled data, we propose leveraging prediction consistency between weak and strong completers as a criterion for identifying reliable process labels. Remarkably, Athena-PRM demonstrates outstanding effectiveness across various scenarios and benchmarks with just 5,000 samples. Furthermore, we develop two effective strategies to improve the performance of PRMs: ORM initialization and up-sampling of negative data. We validate our approach in three specific scenarios: verification for test-time scaling, direct evaluation of reasoning step correctness, and reward-ranked fine-tuning. Athena-PRM consistently achieves superior performance across multiple benchmarks and scenarios. Notably, when using Qwen2.5-VL-7B as the policy model, Athena-PRM improves performance by 10.2 points on WeMath and 7.1 points on MathVista for test-time scaling. Furthermore, Athena-PRM sets new state-of-the-art (SoTA) results on VisualProcessBench, outperforming the previous SoTA by 3.9 F1 points and showcasing its robust capability to accurately assess the correctness of reasoning steps. Additionally, using Athena-PRM as the reward model, we develop Athena-7B with reward-ranked fine-tuning, which outperforms the baseline by a significant margin on five benchmarks.
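
The consistency criterion can be sketched as a simple filter. The sketch assumes that each reasoning step has a Monte Carlo success rate estimated separately with a weak and a strong completer, that a step is labeled correct when its rate exceeds a threshold, and that only steps on which both completers agree are kept; the function name, the thresholding rule, and the toy rates are hypothetical, since the abstract does not specify them.

```python
def consistent_step_labels(weak_rates, strong_rates, threshold=0.0):
    """Return {step index: label} for steps whose binary label agrees across both completers."""
    kept = {}
    for index, (weak, strong) in enumerate(zip(weak_rates, strong_rates)):
        weak_label = weak > threshold
        strong_label = strong > threshold
        if weak_label == strong_label:   # disagreement marks the label as unreliable
            kept[index] = weak_label
    return kept

# Hypothetical per-step success rates from rollouts of each completer.
weak = [0.9, 0.0, 0.0]
strong = [1.0, 0.4, 0.0]
print(consistent_step_labels(weak, strong))
```

In this toy case the middle step is discarded: the strong completer can still recover from it while the weak one cannot, so its label depends on the completer rather than on the step.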


Article 458

Title@2025-06-11 (3): Neural Functions for Learning Periodic Signal

Title: Neural Functions for Learning Periodic Signal Neurale Funktionen für das Lernen von periodischem Signal 学习周期信号的神经功能 2506.09526v1

Authors (4): Woojin Cho, Minju Jo, Kookjin Lee, Noseong Park

As function approximators, deep neural networks have served as an effective tool to represent various signal types. Recent approaches utilize multi-layer perceptrons (MLPs) to learn a nonlinear mapping from a coordinate to its corresponding signal, facilitating the learning of continuous neural representations from discrete data points. Despite notable successes in learning diverse signal types, coordinate-based MLPs often face issues of overfitting and limited generalizability beyond the training region, resulting in subpar extrapolation performance. This study addresses scenarios where the underlying true signals exhibit periodic properties, either spatially or temporally. We propose a novel network architecture, which extracts periodic patterns from measurements and leverages this information to represent the signal, thereby enhancing generalization and improving extrapolation performance. We demonstrate the efficacy of the proposed method through comprehensive experiments, including the learning of the periodic solutions for differential equations, and time series imputation (interpolation) and forecasting (extrapolation) on real-world datasets.


Article 459

Title@2025-06-11 (3): Bias Detection via Maximum Subgroup Discrepancy

Title: Bias Detection via Maximum Subgroup Discrepancy Bias-Erkennung über maximale Untergruppe Diskrepanz 通过最大分组差异探测比亚斯 2502.02221v2

Authors (5): Jiří Němeček, Mark Kozdoba, Illia Kryvoviaz, Tomáš Pevný, Jakub Mareček

Bias evaluation is fundamental to trustworthy AI, both in terms of checking data quality and in terms of checking the outputs of AI systems. In testing data quality, for example, one may study the distance of a given dataset, viewed as a distribution, to a given ground-truth reference dataset. However, classical metrics, such as the Total Variation and the Wasserstein distances, are known to have high sample complexities and, therefore, may fail to provide a meaningful distinction in many practical scenarios. In this paper, we propose a new notion of distance, the Maximum Subgroup Discrepancy (MSD). In this metric, two distributions are close if, roughly, discrepancies are low for all feature subgroups. While the number of subgroups may be exponential, we show that the sample complexity is linear in the number of features, thus making it feasible for practical applications. Moreover, we provide a practical algorithm for evaluating the distance based on Mixed-integer optimization (MIO). We also note that the proposed distance is easily interpretable, thus providing clearer paths to fixing the biases once they have been identified. Finally, we describe a natural general bias detection framework, termed MSDD distances, and show that MSD aligns well with this framework. We empirically evaluate MSD by comparing it with other metrics and by demonstrating the above properties of MSD on real-world datasets.
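
As a rough illustration of the idea, the sketch below computes a subgroup discrepancy by brute force on binary features: it enumerates every conjunction of feature values and reports the largest absolute difference in the share of rows falling in that subgroup. The definition used here is a reading of the abstract, not the paper's formal one, and the exhaustive enumeration stands in for the MIO-based algorithm, which is not reproduced; all names are hypothetical.

```python
from itertools import combinations, product

def subgroup_mass(rows, features, values):
    """Fraction of rows whose selected features take exactly the given values."""
    hits = sum(all(row[f] == v for f, v in zip(features, values)) for row in rows)
    return hits / len(rows)

def max_subgroup_discrepancy(p_rows, q_rows):
    """Largest |P(S) - Q(S)| over conjunction subgroups S; exponential in the feature count."""
    d = len(p_rows[0])
    best, best_group = 0.0, ()
    for size in range(1, d + 1):
        for feats in combinations(range(d), size):
            for vals in product((0, 1), repeat=size):
                gap = abs(subgroup_mass(p_rows, feats, vals)
                          - subgroup_mass(q_rows, feats, vals))
                if gap > best:
                    best, best_group = gap, tuple(zip(feats, vals))
    return best, best_group

# Two small hypothetical datasets over two binary features.
reference = [(0, 0), (0, 1), (1, 0), (1, 1)]
sample = [(0, 0), (0, 1), (1, 0), (1, 0)]
print(max_subgroup_discrepancy(reference, sample))
```

The returned subgroup is a list of (feature index, value) pairs, which is what makes the distance interpretable: it names the part of the population where the two datasets differ most.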



Article 460

Title@2025-06-11 (3): LLM-Powered CPI Prediction Inference with Online Text Time Series

Title: LLM-Powered CPI Prediction Inference with Online Text Time Series LLM-Powered CPI Prediction Inferenz mit Online Text Time Series LLM 授权的CPI 在线文本时间序列预测推论 2506.09516v1

Authors (4): Yingying Fan, Jinchi Lv, Ao Sun, Yurou Wang

Forecasting the Consumer Price Index (CPI) is an important yet challenging task in economics, where most existing approaches rely on low-frequency, survey-based data. With recent advances in large language models (LLMs), there is growing potential to leverage high-frequency online text data for improved CPI prediction, an area still largely unexplored. This paper proposes LLM-CPI, an LLM-based approach for CPI prediction inference incorporating online text time series. We collect a large set of high-frequency online texts from a popular Chinese social network site and employ LLMs such as ChatGPT and trained BERT models to construct continuous inflation labels for posts that are related to inflation. Online text embeddings are extracted via LDA and BERT. We develop a joint time series framework that combines monthly CPI data with LLM-generated daily CPI surrogates. The monthly model employs an ARX structure combining observed CPI data with text embeddings and macroeconomic variables, while the daily model uses a VARX structure built on LLM-generated CPI surrogates and text embeddings. We establish the asymptotic properties of the method and provide two forms of constructed prediction intervals. The finite-sample performance and practical advantages of LLM-CPI are demonstrated through both simulation and real data examples.



Article 461

Title@2025-06-11 (3): A Survey on the Role of Artificial Intelligence and Machine Learning in 6G-V2X Applications

Title: A Survey on the Role of Artificial Intelligence and Machine Learning in 6G-V2X Applications Eine Umfrage zur Rolle von Künstlicher Intelligenz und maschinellem Lernen in 6G-V2X Anwendungen 关于6G-V2X应用中人工情报和机器学习的作用的调查 2506.09512v1

Authors (4): Donglin Wang, Anjie Qiu, Qiuheng Zhou, Hans D. Schotten

The rapid advancement of Vehicle-to-Everything (V2X) communication is transforming Intelligent Transportation Systems (ITS), with 6G networks expected to provide ultra-reliable, low-latency, and high-capacity connectivity for Connected and Autonomous Vehicles (CAVs). Artificial Intelligence (AI) and Machine Learning (ML) have emerged as key enablers in optimizing V2X communication by enhancing network management, predictive analytics, security, and cooperative driving due to their outstanding performance across various domains, such as natural language processing and computer vision. This survey comprehensively reviews recent advances in AI and ML models applied to 6G-V2X communication. It focuses on state-of-the-art techniques, including Deep Learning (DL), Reinforcement Learning (RL), Generative Learning (GL), and Federated Learning (FL), with particular emphasis on developments from the past two years. Notably, AI, especially GL, has shown remarkable progress and emerging potential in enhancing the performance, adaptability, and intelligence of 6G-V2X systems. Despite these advances, a systematic summary of recent research efforts in this area remains lacking, which this survey aims to address. We analyze their roles in 6G-V2X applications, such as intelligent resource allocation, beamforming, intelligent traffic management, and security management. Furthermore, we explore the technical challenges, including computational complexity, data privacy, and real-time decision-making constraints, while identifying future research directions for AI-driven 6G-V2X development. This study aims to provide valuable insights for researchers, engineers, and policymakers working towards realizing intelligent, AI-powered V2X ecosystems in 6G communication.



Article 462

Title@2025-06-11 (3): Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design

Title: Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design Effizientes Preference-Based-Verstärkungslernen: Randomized Exploration trifft auf experimentelles Design 高效的以优惠为基础的强化学习:随机探索与实验设计 2506.09508v1

Authors (3): Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour

We study reinforcement learning from human feedback in general Markov decision processes, where agents learn from trajectory-level preference comparisons. A central challenge in this setting is to design algorithms that select informative preference queries to identify the underlying reward while ensuring theoretical guarantees. We propose a meta-algorithm based on randomized exploration, which avoids the computational challenges associated with optimistic approaches and remains tractable. We establish both regret and last-iterate guarantees under mild reinforcement learning oracle assumptions. To improve query complexity, we introduce and analyze an improved algorithm that collects batches of trajectory pairs and applies optimal experimental design to select informative comparison queries. The batch structure also enables parallelization of preference queries, which is relevant in practical deployment as feedback can be gathered concurrently. Empirical evaluation confirms that the proposed method is competitive with reward-based reinforcement learning while requiring a small number of preference queries.



Article 463

Title@2025-06-11 (3): A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes

Title: A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes Eine einheitliche Theorie der Kompositionalität, Modularität und Interpretationsfähigkeit in Markov-Entscheidungsprozessen 马尔科夫决定程序的组成、模式和解释统一理论 2506.09499v1

Authors (2): Thomas J. Ringstrom, Paul R. Schrater

We introduce Option Kernel Bellman Equations (OKBEs) for a new reward-free Markov Decision Process. Rather than a value function, OKBEs directly construct and optimize a predictive map called a state-time option kernel (STOK) to maximize the probability of completing a goal while avoiding constraint violations. STOKs are compositional, modular, and interpretable initiation-to-termination transition kernels for policies in the Options Framework of Reinforcement Learning. This means: 1) STOKs can be composed using Chapman-Kolmogorov equations to make spatiotemporal predictions for multiple policies over long horizons, 2) high-dimensional STOKs can be represented and computed efficiently in a factorized and reconfigurable form, and 3) STOKs record the probabilities of semantically interpretable goal-success and constraint-violation events, needed for formal verification. Given a high-dimensional state-transition model for an intractable planning problem, we can decompose it with local STOKs and goal-conditioned policies that are aggregated into a factorized goal kernel, making it possible to forward-plan at the level of goals in high dimensions to solve the problem. These properties lead to highly flexible agents that can rapidly synthesize meta-policies, reuse planning representations across many tasks, and justify goals using empowerment, an intrinsic motivation function. We argue that reward-maximization is in conflict with the properties of compositionality, modularity, and interpretability. By contrast, OKBEs facilitate these properties to support verifiable long-horizon planning and intrinsic motivation that scales to dynamic high-dimensional world-models.



Article 464

Title@2025-06-11 (3): LEMUR Neural Network Dataset: Towards Seamless AutoML

Title: LEMUR Neural Network Dataset: Towards Seamless AutoML LEMUR Neural Network Dataset: Auf dem Weg zur nahtlosen AutoML LEMUR 神经网络数据集: 走向无缝自动ML 2504.10552v2

Authors (10): Arash Torabi Goodarzi, Roman Kochnev, Waleed Khalid, Furui Qin, Tolgay Atinc Uzun, Yashkumar Sanjaybhai Dhameliya, Yash Kanubhai Kathiriya, Zofia Antonina Bentyn, Dmitry Ignatov, Radu Timofte

Neural networks are fundamental in artificial intelligence, driving progress in computer vision and natural language processing. High-quality datasets are crucial for their development, and there is growing interest in datasets composed of neural networks themselves to support benchmarking, automated machine learning (AutoML), and model analysis. We introduce LEMUR, an open source dataset of neural network models with well-structured code for diverse architectures across tasks such as object detection, image classification, segmentation, and natural language processing. LEMUR is primarily designed to provide a rich source of structured model representations and associated performance data, enabling the fine-tuning of large language models for AutoML applications. Leveraging Python and PyTorch, LEMUR enables seamless extension to new datasets and models while maintaining consistency. It integrates an Optuna-powered framework for evaluation, hyperparameter optimization, statistical analysis, and graphical insights. The LEMUR VR extension enables the seamless deployment of models in virtual reality, optimizing their performance on resource-constrained devices. Providing tools for model evaluation, preprocessing, and database management, LEMUR supports researchers and practitioners in developing, testing, and analyzing neural networks. It offers an API that delivers comprehensive information about neural network models and their complete performance statistics with a single request, which can be used in experiments with code-generating large language models. LEMUR and its plugins are accessible as open source projects under the MIT license at https://github.com/ABrain-One/nn-dataset, https://github.com/ABrain-One/nn-plots and https://github.com/ABrain-One/nn-vr.



Article 465

Title@2025-06-11 (3): EnerBridge-DPO: Energy-Guided Protein Inverse Folding with Markov Bridges and Direct Preference Optimization

Title: EnerBridge-DPO: Energy-Guided Protein Inverse Folding with Markov Bridges and Direct Preference Optimization EnerBridge-DPO: Energiegeführtes Protein inverses Falten mit Markov-Brücken und direkter Präferenzoptimierung EnerBridge-DPO:以Markov桥和直接优化为首的热导蛋白质反向折叠 2506.09496v1

Authors (6): Dingyi Rong, Haotian Lu, Wenzhuo Zheng, Fan Zhang, Shuangjia Zheng, Ning Liu

Designing protein sequences with optimal energetic stability is a key challenge in protein inverse folding, as current deep learning methods are primarily trained by maximizing sequence recovery rates, often neglecting the energy of the generated sequences. This work aims to overcome this limitation by developing a model that directly generates low-energy, stable protein sequences. We propose EnerBridge-DPO, a novel inverse folding framework focused on generating low-energy, high-stability protein sequences. Our core innovation is twofold. First, we integrate Markov Bridges with Direct Preference Optimization (DPO), using energy-based preferences to fine-tune the Markov Bridge model. The Markov Bridge initiates optimization from an information-rich prior sequence, providing DPO with a pool of structurally plausible sequence candidates. Second, we introduce an explicit energy constraint loss, which enhances the energy-driven nature of DPO based on prior sequences, enabling the model to effectively learn energy representations from a wealth of prior knowledge and directly predict sequence energy values, thereby capturing quantitative features of the energy landscape. Our evaluations demonstrate that EnerBridge-DPO can design protein complex sequences with lower energy while maintaining sequence recovery rates comparable to state-of-the-art models, and that it accurately predicts $\Delta \Delta G$ values between various sequences.
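As a rough illustration of how energy-based preferences can feed DPO, the sketch below pairs two candidate sequences by energy and evaluates the standard DPO loss on their log-likelihoods. The helper names, the "lower energy wins" tie-breaking rule, and the default `beta` are assumptions made for this example; the abstract does not specify how pairs are built, and the Markov Bridge model and the energy constraint loss are not modeled here.

```python
import math


def prefer_lower_energy(seq_a, energy_a, seq_b, energy_b):
    """Return (preferred, dispreferred); assumption: lower energy is preferred."""
    if energy_a <= energy_b:
        return seq_a, seq_b
    return seq_b, seq_a


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair: -log sigmoid(beta * margin).

    logp_w / logp_l are the policy log-likelihoods of the preferred and
    dispreferred sequences; ref_* are the same under the frozen reference.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference assign identical log-likelihoods, the margin is zero and the loss equals `log(2)`; the loss falls as the policy shifts probability toward the lower-energy sequence relative to the reference.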



Article 466

Title@2025-06-11 (3): Bridging Online Behavior and Clinical Insight: A Longitudinal LLM-based Study of Suicidality on YouTube Reveals Novel Digital Markers

Title: Bridging Online Behavior and Clinical Insight: A Longitudinal LLM-based Study of Suicidality on YouTube Reveals Novel Digital Markers Bridging Online Behavior and Clinical Insight: Eine längsseitige LLM-basierte Studie zur Suizidalität auf YouTube enthüllt neuartige digitale Marker 架桥在线行为和临床透视:YouTube Reveals新数字标记上基于纵向LLM的自杀性纵向LM研究 2506.09495v1

Authors (7): Ilanit Sobol, Shir Lissak, Refael Tikochinski, Tal Nakash, Anat Brunstein Klomek, Eyal Fruchter, Roi Reichart

Suicide remains a leading cause of death in Western countries, underscoring the need for new research approaches. As social media becomes central to daily life, digital footprints offer valuable insight into suicidal behavior. Focusing on individuals who attempted suicide while uploading videos to their channels, we investigate: How do suicidal behaviors manifest on YouTube, and how do they differ from expert knowledge? We applied complementary approaches (computational bottom-up, hybrid, and expert-driven top-down) to a novel longitudinal dataset of 181 YouTube channels from individuals with life-threatening attempts, alongside 134 control channels. In the bottom-up approach, we applied LLM-based topic modeling to identify behavioral indicators. Of 166 topics, five were associated with suicide attempts, with two also showing temporal attempt-related changes ($p<.01$): Mental Health Struggles ($+0.08$)* and YouTube Engagement ($+0.1$)*. In the hybrid approach, a clinical expert reviewed LLM-derived topics and flagged 19 as suicide-related. However, none showed significant attempt-related temporal effects beyond those identified bottom-up. Notably, YouTube Engagement, a platform-specific indicator, was not flagged by the expert, underscoring the value of bottom-up discovery. In the top-down approach, psychological assessment of suicide attempt narratives revealed that the only significant difference between individuals who attempted before and those who attempted during their upload period was the motivation to share this experience: the former aimed to Help Others ($\beta=-1.69$, $p<.01$), while the latter framed it as part of their Personal Recovery ($\beta=1.08$, $p<.01$). By integrating these approaches, we offer a nuanced understanding of suicidality, bridging digital behavior and clinical insights. * Within-group changes in relation to the suicide attempt.



Article 467

Title@2025-06-11 (3): MIRA: Medical Time Series Foundation Model for Real-World Health Data

Title: MIRA: Medical Time Series Foundation Model for Real-World Health Data MIRA: Medical Time Series Foundation Modell für real-World Gesundheitsdaten 医疗时间系列基金会实际世界卫生数据模型 2506.07584v2

Authors (11): Hao Li, Bowen Deng, Chang Xu, Zhiyuan Feng, Viktor Schlegel, Yu-Hao Huang, Yizheng Sun, Jingyuan Sun, Kailai Yang, Yiyao Yu, Jiang Bian

A unified foundation model for medical time series, pretrained on open access and ethics board-approved medical corpora, offers the potential to reduce annotation burdens, minimize model customization, and enable robust transfer across clinical institutions, modalities, and tasks, particularly in data-scarce or privacy-constrained environments. However, existing generalist time series foundation models struggle to handle medical time series data because of challenges inherent to such data, including irregular intervals, heterogeneous sampling rates, and frequent missing values. To address these challenges, we introduce MIRA, a unified foundation model specifically designed for medical time series forecasting. MIRA incorporates a Continuous-Time Rotary Positional Encoding that enables fine-grained modeling of variable time intervals, a frequency-specific mixture-of-experts layer that routes computation across latent frequency regimes to further promote temporal specialization, and a Continuous Dynamics Extrapolation Block based on Neural ODE that models the continuous trajectory of latent states, enabling accurate forecasting at arbitrary target timestamps. Pretrained on a large-scale and diverse medical corpus comprising over 454 billion time points collected from publicly available datasets, MIRA reduces forecasting errors by an average of 10% and 7% in out-of-distribution and in-distribution scenarios, respectively, when compared to other zero-shot and fine-tuned baselines. We also introduce a comprehensive benchmark spanning multiple downstream clinical tasks, establishing a foundation for future research in medical time series modeling.



Article 468

Title@2025-06-11 (3): Minimax optimality of deep neural networks on dependent data via PAC-Bayes bounds

Title: Minimax optimality of deep neural networks on dependent data via PAC-Bayes bounds Minimax-Optimalität von tiefen neuronalen Netzwerken auf abhängigen Daten über PAC-Bayes-Grenzen 通过PAC-Bayes界限依靠数据建立深海神经网络的最小最佳程度 2410.21702v2

Authors (2): Pierre Alquier, William Kengne

In a groundbreaking work, Schmidt-Hieber (2020) proved the minimax optimality of deep neural networks with ReLU activation for least-squares regression estimation over a large class of functions defined by composition. In this paper, we extend these results in several directions. First, we remove the i.i.d. assumption on the observations, to allow some time dependence. The observations are assumed to be a Markov chain with a non-null pseudo-spectral gap. Then, we study a more general class of machine learning problems, which includes least-squares and logistic regression as special cases. Leveraging PAC-Bayes oracle inequalities and a version of the Bernstein inequality due to Paulin (2015), we derive upper bounds on the estimation risk for a generalized Bayesian estimator. In the case of least-squares regression, this bound matches (up to a logarithmic factor) the lower bound of Schmidt-Hieber (2020). We establish a similar lower bound for classification with the logistic loss, and prove that the proposed DNN estimator is optimal in the minimax sense.



Article 469

Title@2025-06-11 (3): BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation

Title: BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation BemaGANv2: Eine Tutorial- und Vergleichsstudie von GAN-basierten Vocodern für die Langzeit-Audiogenerierung BemaGANv2: 以GAN为基地的长期音频生成Vocoders的教学和比较调查 2506.09487v1

Authors (10): Taesoo Park, Mungwi Jeong, Mingyu Park, Narae Kim, Junyoung Kim, Mujung Kim, Jisang Yoo, Hoyun Lee, Sanghoon Kim, Soonchul Kwon

This paper presents a tutorial-style survey and implementation guide of BemaGANv2, an advanced GAN-based vocoder designed for high-fidelity and long-term audio generation. Built upon the original BemaGAN architecture, BemaGANv2 incorporates major architectural innovations by replacing traditional ResBlocks in the generator with the Anti-aliased Multi-Periodicity composition (AMP) module, which internally applies the Snake activation function to better model periodic structures. In the discriminator framework, we integrate the Multi-Envelope Discriminator (MED), a novel architecture we originally proposed, to extract rich temporal envelope features crucial for periodicity detection. Coupled with the Multi-Resolution Discriminator (MRD), this combination enables more accurate modeling of long-range dependencies in audio. We systematically evaluate various discriminator configurations, including MSD + MED, MSD + MRD, and MPD + MED + MRD, using objective metrics (FAD, SSIM, PLCC, MCD) and subjective evaluations (MOS, SMOS). This paper also provides a comprehensive tutorial on the model architecture, training methodology, and implementation to promote reproducibility. The code and pre-trained models are available at: https://github.com/dinhoitt/BemaGANv2.
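For readers unfamiliar with the Snake activation mentioned above, the following is a minimal scalar sketch of its usual definition, `snake(x) = x + sin^2(alpha * x) / alpha`. The `alpha` parameter is normally learned per channel; treating it as a fixed scalar here is a simplification for illustration, and this is not taken from the BemaGANv2 code base.

```python
import math


def snake(x, alpha=1.0):
    """Snake activation: identity plus a periodic term with frequency alpha.

    The sin^2 term adds a periodic inductive bias while the linear term keeps
    the function monotone, which is why it suits periodic signals like audio.
    """
    return x + (math.sin(alpha * x) ** 2) / alpha
```

Larger values of `alpha` give a higher-frequency, lower-amplitude ripple, so as `alpha` grows the function approaches the identity.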



Article 470

Title@2025-06-11 (3): Plug-and-Play image restoration with Stochastic deNOising REgularization

Title: Plug-and-Play image restoration with Stochastic deNOising REgularization Plug-and-Play-Bildrestaurierung mit stochastischer Entnoisierung der Regularisierung 利用斯托卡式脱氧化还溶性恢复插件和布板图像 2402.01779v3

Authors (4): Marien Renaud, Jean Prost, Arthur Leclaire, Nicolas Papadakis

Plug-and-Play (PnP) algorithms are a class of iterative algorithms that address image inverse problems by combining a physical model and a deep neural network for regularization. Although they produce impressive image restoration results, these algorithms rely on a non-standard use of a denoiser on images that become less and less noisy over the iterations, which contrasts with recent algorithms based on Diffusion Models (DM), where the denoiser is applied only to re-noised images. We propose a new PnP framework, called Stochastic deNOising REgularization (SNORE), which applies the denoiser only to images with an adequate noise level. It is based on an explicit stochastic regularization, which leads to a stochastic gradient descent algorithm to solve ill-posed inverse problems. A convergence analysis of this algorithm and its annealing extension is provided. Experimentally, we show that SNORE is competitive with state-of-the-art methods on deblurring and inpainting tasks, both quantitatively and qualitatively.
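The idea of applying the denoiser only to re-noised iterates can be sketched on a one-dimensional toy problem. Everything below is an assumption made for illustration: the prior is a standard Gaussian so that the MMSE denoiser has a closed form, the regularizer's gradient is taken from Tweedie's identity as `(x_noised - D(x_noised)) / sigma^2`, and the step size, weight `alpha`, and function names are hypothetical. It is a sketch in the spirit of the abstract, not the authors' implementation.

```python
import random


def denoise(z, sigma):
    """MMSE denoiser for a toy N(0, 1) prior: E[x | x + sigma * eps = z]."""
    return z / (1.0 + sigma ** 2)


def snore_style_restore(y, noise_std=1.0, sigma=0.5, alpha=1.0,
                        step=0.01, iters=2000, seed=0):
    """Stochastic gradient descent on data fidelity + stochastic regularizer.

    Data term: (x - y)^2 / (2 * noise_std^2) for a denoising-type observation.
    Regularizer gradient: evaluated at a freshly re-noised copy of x, so the
    denoiser only ever sees inputs carrying noise of level sigma.
    """
    rng = random.Random(seed)
    x = 0.0
    for _ in range(iters):
        data_grad = (x - y) / noise_std ** 2
        x_noised = x + sigma * rng.gauss(0.0, 1.0)   # re-noise the iterate
        reg_grad = (x_noised - denoise(x_noised, sigma)) / sigma ** 2
        x -= step * (data_grad + alpha * reg_grad)
    return x
```

With the default settings the expected update vanishes where `(x - y) + x / (1 + sigma^2) = 0`, so for `y = 1.8` the iterates hover around `x = 1.0`, with small fluctuations caused by the injected noise and the constant step size.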



Article 471

Title@2025-06-11 (3): Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling

Title: Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling Auf dem Weg zu einem einheitlichen und verlustfreien Latent-Raum für 3D-Molekular-Latent-Diffusionsmodell 迈向3D分子中子中子扩散模型的 统一和无损失的冷冻空间 2503.15567v3

Authors (9): Yanchen Luo, Zhiyuan Liu, Yi Zhao, Sihang Li, Hengxing Cai, Kenji Kawaguchi, Tat-Seng Chua, Yang Zhang, Xiang Wang

3D molecule generation is crucial for drug discovery and material science, requiring models to process complex multi-modalities, including atom types, chemical bonds, and 3D coordinates. A key challenge is integrating these modalities of different shapes while maintaining SE(3) equivariance for 3D coordinates. To achieve this, existing approaches typically maintain separate latent spaces for invariant and equivariant modalities, reducing efficiency in both training and sampling. In this work, we propose the Unified Variational Auto-Encoder for 3D Molecular Latent Diffusion Modeling (UAE-3D), a multi-modal VAE that compresses 3D molecules into latent sequences from a unified latent space, while maintaining near-zero reconstruction error. This unified latent space eliminates the complexities of handling multi-modality and equivariance when performing latent diffusion modeling. We demonstrate this by employing the Diffusion Transformer, a general-purpose diffusion model without any molecular inductive bias, for latent generation. Extensive experiments on the GEOM-Drugs and QM9 datasets demonstrate that our method establishes new benchmarks in both de novo and conditional 3D molecule generation, achieving leading efficiency and quality. On GEOM-Drugs, it reduces FCD by 72.6% over the previous best result, while achieving over 70% relative average improvements in geometric fidelity.



Article 472

Title@2025-06-11 (3): Convergence Conditions for Stochastic Line Search Based Optimization of Over-parametrized Models

Title: Convergence Conditions for Stochastic Line Search Based Optimization of Over-parametrized Models Konvergenzbedingungen für stochastische Liniensuche auf Basis der Optimierung von überparametrisierten Modellen 存储线搜索的一致条件 优化过度平衡模型的优化 2408.03199v2

Authors (2): Matteo Lapucci, Davide Pucci

In this paper, we deal with algorithms to solve the finite-sum problems related to fitting over-parametrized models, which typically satisfy the interpolation condition. In particular, we focus on approaches based on stochastic line searches and employing general search directions. We define conditions on the sequence of search directions that guarantee finite termination and bounds for the backtracking procedure. Moreover, we shed light on the additional property of directions needed to prove fast (linear) convergence of the general class of algorithms when applied to PL functions in the interpolation regime. From the point of view of algorithm design, the proposed analysis identifies safeguarding conditions that could be employed in relevant algorithmic frameworks. In particular, it could be of interest to integrate stochastic line searches within momentum, conjugate gradient or adaptive preconditioning methods.
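A stochastic backtracking line search along a user-supplied direction can be sketched as follows, using a tiny least-squares problem that satisfies interpolation. The Armijo constants, the fallback to the negative mini-batch gradient when the direction is not a descent direction, and all function names are assumptions chosen for the example; they stand in for, and are not, the specific safeguarding conditions derived in the paper.

```python
import random


def loss(w, x, y):
    """Squared error of a linear model on a single sample."""
    r = sum(wi * xi for wi, xi in zip(w, x)) - y
    return 0.5 * r * r


def grad(w, x, y):
    r = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [r * xi for xi in x]


def armijo_step(w, x, y, direction, t0=1.0, c=0.1, shrink=0.5, max_backtracks=30):
    """Backtrack on the sampled loss until the Armijo condition holds."""
    g = grad(w, x, y)
    slope = sum(gi * di for gi, di in zip(g, direction))
    if slope > 0:  # hypothetical safeguard: not a descent direction
        direction = [-gi for gi in g]
        slope = -sum(gi * gi for gi in g)
    f0 = loss(w, x, y)
    t = t0
    for _ in range(max_backtracks):
        trial = [wi + t * di for wi, di in zip(w, direction)]
        if loss(trial, x, y) <= f0 + c * t * slope:
            return trial, t
        t *= shrink
    return w, 0.0  # give up on this sample and keep the current point


def train(data, dim, steps=500, seed=0):
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(steps):
        x, y = rng.choice(data)  # mini-batch of size one
        w, _ = armijo_step(w, x, y, [-gi for gi in grad(w, x, y)])
    return w


# Interpolation holds by construction: every sample is fit exactly by TRUE_W.
TRUE_W = [2.0, -3.0]
INPUTS = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
DATA = [(x, sum(wi * xi for wi, xi in zip(TRUE_W, x))) for x in INPUTS]
```

Calling `train(DATA, 2)` drives the weights toward `TRUE_W`. A momentum or preconditioned direction could be passed to `armijo_step` in place of the negative gradient, which is the kind of integration the abstract suggests.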



Article 473

Title@2025-06-11 (3): On a few pitfalls in KL divergence gradient estimation for RL

Title: On a few pitfalls in KL divergence gradient estimation for RL Bei einigen Fallen bei KL-Divergenzgradientenschätzung für RL KL RL差差差梯度估算的几个陷阱 2506.09477v1

Authors (2): Yunhao Tang, Rémi Munos

We point out a few pitfalls in implementing gradient estimation for KL divergence in RL training for LLMs, as seen in a number of open source projects and papers. The first major pitfall is to treat the KL estimate as a loss function and differentiate through it in order to minimize KL divergence. We show that such implementations are generally incorrect and do not produce the desired KL gradient. Secondly, we show that some implementations do not account for the sequential nature of the estimation problem and produce a partial gradient at best. We demonstrate the impact of such issues with illustrative tabular and LLM experiments, and show the correct way to implement the KL gradient.
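The first pitfall can be checked exactly in a small tabular case. The sketch below assumes a softmax policy over three actions and the plain log-ratio estimate `log pi(a) - log pi_ref(a)` evaluated at a sampled action; differentiating that estimate as a loss yields only `grad log pi(a)`, whose expectation is zero. Weighting the score by the log-ratio recovers the true gradient of `KL(pi || pi_ref)`. This is our own illustration of the argument under those assumptions, with hypothetical names, and not the authors' code.

```python
import math


def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def kl(theta, ref):
    """KL(pi_theta || ref) for a softmax policy over a finite action set."""
    pi = softmax(theta)
    return sum(p * math.log(p / q) for p, q in zip(pi, ref))


def numeric_kl_grad(theta, ref, eps=1e-6):
    """Central finite differences, used as an independent ground truth."""
    out = []
    for j in range(len(theta)):
        up = theta[:j] + [theta[j] + eps] + theta[j + 1:]
        down = theta[:j] + [theta[j] - eps] + theta[j + 1:]
        out.append((kl(up, ref) - kl(down, ref)) / (2 * eps))
    return out


def score(pi, a):
    """Gradient of log pi(a) with respect to the logits: one_hot(a) - pi."""
    return [(1.0 if j == a else 0.0) - pi[j] for j in range(len(pi))]


def expected_grad(theta, ref, weight):
    """E_{a ~ pi}[weight(pi(a), ref(a)) * grad log pi(a)], by enumeration."""
    pi = softmax(theta)
    out = [0.0] * len(pi)
    for a, p in enumerate(pi):
        w = weight(pi[a], ref[a])
        for j, s in enumerate(score(pi, a)):
            out[j] += p * w * s
    return out


THETA = [0.5, -0.2, 1.0]
REF = [0.2, 0.5, 0.3]
# Differentiating the sampled log-ratio as a loss: weight is the constant 1.
naive = expected_grad(THETA, REF, lambda p, q: 1.0)
# Score-function form: weight each sample's score by its log-ratio.
correct = expected_grad(THETA, REF, lambda p, q: math.log(p / q))
```

Here `naive` is zero up to rounding, so the naive loss carries no KL signal on average, while `correct` agrees with `numeric_kl_grad(THETA, REF)`.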



Article 474

Title@2025-06-11 (3): Adjusting Model Size in Continual Gaussian Processes: How Big is Big Enough?

Title: Adjusting Model Size in Continual Gaussian Processes: How Big is Big Enough? Modellgröße in kontinuierlichen Gauß-Prozessen anpassen: Wie groß ist groß genug? 调整连续高斯进程模型大小:大到多大? 2408.07588v4

Authors (3): Guiomar Pescador-Barrios, Sarah Filippi, Mark van der Wilk

Many machine learning models require setting a parameter that controls their size before training, e.g. number of neurons in DNNs, or inducing points in GPs. Increasing capacity typically improves performance until all the information from the dataset is captured. After this point, computational cost keeps increasing, without improved performance. This leads to the question “How big is big enough?” We investigate this problem for Gaussian processes (single-layer neural networks) in continual learning. Here, data becomes available incrementally, and the final dataset size will therefore not be known before training, preventing the use of heuristics for setting a fixed model size. We develop a method to automatically adjust model size while maintaining near-optimal performance. Our experimental procedure follows the constraint that any hyperparameters must be set without seeing dataset properties, and we show that our method performs well across diverse datasets without the need to adjust its hyperparameter, showing it requires less tuning than others.



Article 475

Title@2025-06-11 (3): A Reinforcement Learning Approach for RIS-aided Fair Communications

Title: A Reinforcement Learning Approach for RIS-aided Fair Communications Ein stärkerer Lernansatz für RIS-gestützte Fair Communications 以强化学习方法促进RIRS援助的公平交流 2506.06344v2

Authors (5): Alex Pierron, Michel Barbeau, Luca De Cicco, Jose Rubio-Hernan, Joaquin Garcia-Alfaro

Reconfigurable Intelligent Surfaces (RISs) are composed of physical elements that can dynamically alter electromagnetic wave properties to enhance beamforming, leading to improvements in areas with poor coverage. They have the potential to be combined with Reinforcement Learning (RL) techniques to optimize network performance and energy efficiency. In addition to performance and energy improvements, it is also crucial to consider the concept of fair communications. RISs must ensure that User Equipment (UE) units receive their signals with adequate strength, without other UE units being deprived of service due to insufficient power. In this paper, we address this problem. We explore the fairness properties of previous work and propose a novel method that aims at obtaining an efficient and fair duplex RIS-RL system for multiple legitimate UE units. We report and discuss our experimental work and simulation results. We also release our code and datasets to foster further research on the topic.



Article 476

Title@2025-06-11 (3): Forecasting high-impact research topics via machine learning on evolving knowledge graphs

Title: Forecasting high-impact research topics via machine learning on evolving knowledge graphs Prognose hochwirksamer Forschungsthemen durch maschinelles Lernen auf sich entwickelnden Wissensgraphen 通过对不断演变的知识图表进行机器学习,预测高影响研究专题 2402.08640v4

Authors (2): Xuemei Gu, Mario Krenn

The exponential growth in scientific publications poses a severe challenge for human researchers. It forces attention to narrower sub-fields, which makes it challenging to discover new impactful research ideas and collaborations outside one’s own field. While there are ways to predict a scientific paper’s future citation counts, they need the research to be finished and the paper written, usually assessing impact long after the idea was conceived. Here we show how to predict the impact of onsets of ideas that have never been published by researchers. For that, we developed a large evolving knowledge graph built from more than 21 million scientific papers. It combines a semantic network created from the content of the papers and an impact network created from the historic citations of papers. Using machine learning, we can predict the dynamics of the evolving network into the future with high accuracy (AUC values beyond 0.9 for most experiments), and thereby the impact of new research directions. We envision that the ability to predict the impact of new ideas will be a crucial component of future artificial muses that can inspire new impactful and interesting scientific ideas.



Article 477

Title@2025-06-11 (3): ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Title: ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning ReinFlow: Feinsteuerungs-Flow Matching-Politik mit Online-Verstärkungs-Lernen ReinFlow: 与在线强化学习匹配流动政策的微调 2505.22094v3

Authors (4): Tonghe Zhang, Chao Yu, Sichang Su, Yu Wang

We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy’s deterministic path, converting the flow into a discrete-time Markov Process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly with very few, or even a single, denoising step. We benchmark ReinFlow in representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. The episode reward of Rectified Flow policies obtained an average net growth of 135.36% after fine-tuning in challenging legged locomotion tasks, while saving denoising steps and 82.63% of wall time compared to the state-of-the-art diffusion RL fine-tuning method DPPO [43]. The success rate of the Shortcut Model policies in state and visual manipulation tasks achieved an average net increase of 40.34% after fine-tuning with ReinFlow at four or even one denoising step, with performance comparable to fine-tuned DDIM policies while saving an average of 23.20% of computation time. Project webpage: https://reinflow.github.io/
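The effect of injecting noise into a deterministic flow can be seen in a one-dimensional sketch: each Euler step becomes a Gaussian transition, so the log-likelihood of the whole path is a sum of Gaussian log-densities. The velocity field below is a hypothetical stand-in for a learned network, and a fixed `sigma` replaces the learnable noise described in the abstract; both are assumptions made to keep the example self-contained.

```python
import math
import random


def velocity(x, t):
    """Hypothetical stand-in for a learned velocity field: pulls x toward 1."""
    return 1.0 - x


def gaussian_log_pdf(x, mean, sigma):
    return (-0.5 * ((x - mean) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))


def noisy_rollout(x0, steps, sigma, rng):
    """Euler integration with Gaussian noise added after every step.

    Returns the visited states and the exact log-probability of the path,
    which is tractable because each transition is Gaussian given the
    previous state (a discrete-time Markov process).
    """
    dt = 1.0 / steps
    xs, log_prob = [x0], 0.0
    for k in range(steps):
        mean = xs[-1] + velocity(xs[-1], k * dt) * dt   # deterministic Euler step
        nxt = mean + sigma * rng.gauss(0.0, 1.0)        # injected noise
        log_prob += gaussian_log_pdf(nxt, mean, sigma)
        xs.append(nxt)
    return xs, log_prob
```

A path log-probability of this form is what a policy-gradient method needs in order to weight sampled trajectories, and it stays cheap even when `steps` is as small as one.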



Article 478

Title@2025-06-11 (3): Bayesian Neural Scaling Law Extrapolation with Prior-Fitted Networks

Title: Bayesian Neural Scaling Law Extrapolation with Prior-Fitted Networks Bayesian Neural Scaling Law Extrapolation mit Prior-Fitted-Netzwerke Bayesian神经扩增法与事先确定网络的外推法 2505.23032v2

Authors (8): Dongwoo Lee, Dong Bok Lee, Steven Adriaensen, Juho Lee, Sung Ju Hwang, Frank Hutter, Seon Joo Kim, Hae Beom Lee

Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling behavior often follows a power law and have proposed several variants of power-law functions to predict the scaling behavior at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications involving decision-making problems such as determining the expected performance improvements achievable by investing additional computational resources. In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation. Specifically, we design a prior distribution that enables the sampling of infinitely many synthetic functions resembling real-world neural scaling laws, allowing our PFN to meta-learn the extrapolation. We validate the effectiveness of our approach on real-world neural scaling laws, comparing it against both the existing point estimation methods and Bayesian approaches. Our method demonstrates superior performance, particularly in data-limited scenarios such as Bayesian active learning, underscoring its potential for reliable, uncertainty-aware extrapolation in practical applications.
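
For orientation, the kind of point-estimate baseline the abstract contrasts against can be sketched as a least-squares fit in log-log space. This is only an illustration, not the paper's PFN method: the pure power-law form y = a * x^(-b), the function names, and the synthetic data are all assumptions made for this sketch.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**(-b) by ordinary least squares on (log x, log y).

    Returns the point estimates (a, b); no uncertainty is quantified.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx                 # slope of the log-log line equals -b
    a = math.exp(my - slope * mx)     # intercept of the log-log line equals log a
    return a, -slope

def predict(a, b, x):
    """Evaluate the fitted power law at scale x."""
    return a * x ** (-b)

# Hypothetical, noise-free observations generated from a = 5.0, b = 0.1.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * s ** (-0.1) for s in sizes]

a_hat, b_hat = fit_power_law(sizes, losses)
extrapolated = predict(a_hat, b_hat, 1e10)  # single number, no error bars
```

The fit returns one curve and one extrapolated value, which is exactly the limitation the abstract points to: nothing here says how much to trust the prediction at the larger scale.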


Article 479

Title@2025-06-11 (3): Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms

Title: Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms Auf dem Weg zur Überbrückung des Lohn-Generation-Gaps in direkten Ausrichtungsalgorithmen 争取弥合直接调整数值中的奖励-评分差距 2506.09457v1

Authors (3): Zeguan Xiao, Yun Chen, Guanhua Chen

Direct Alignment Algorithms (DAAs), such as Direct Preference Optimization (DPO) and Simple Preference Optimization (SimPO), have emerged as efficient alternatives to Reinforcement Learning from Human Feedback (RLHF) algorithms for aligning large language models (LLMs) with human preferences. However, DAAs suffer from a fundamental limitation we identify as the “reward-generation gap” – a misalignment between optimization objectives during training and actual generation performance during inference. In this paper, we find that one contributor to the reward-generation gap is the mismatch between the inherent importance of prefix tokens during the LLM generation process and how this importance is reflected in the implicit reward functions of DAAs. To bridge the gap, we introduce a simple yet effective approach called Prefix-Oriented Equal-length Training (POET), which truncates both preferred and dispreferred responses to match the shorter one’s length. Because both responses in each sample are truncated to equal length, the truncated lengths vary across samples, and the optimization of the DAA objective is implicitly constrained to converge across all positions, thus paying more attention to prefix tokens than standard DAAs do. We conduct experiments with DPO and SimPO, two representative DAAs, demonstrating that POET improves over their standard implementations, achieving up to 15.6 points in AlpacaEval 2 and overall improvements across downstream tasks. Our results highlight the importance of addressing the misalignment between reward optimization and generation performance in DAAs.
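
The truncation step described above is simple enough to sketch directly. The following is a minimal illustration under stated assumptions: responses are represented as plain lists of token ids, and the function name `poet_truncate` and the sample data are hypothetical, not taken from the authors' code.

```python
def poet_truncate(chosen_tokens, rejected_tokens):
    """Truncate both responses of a preference pair to the shorter one's length."""
    n = min(len(chosen_tokens), len(rejected_tokens))
    return chosen_tokens[:n], rejected_tokens[:n]

# Hypothetical preference pairs: (preferred token ids, dispreferred token ids).
pairs = [
    ([11, 12, 13, 14, 15], [21, 22, 23]),
    ([31, 32], [41, 42, 43, 44]),
    ([51, 52, 53, 54], [61, 62, 63, 64]),
]

truncated = [poet_truncate(c, r) for c, r in pairs]

# Each pair now has equal length, but the common length differs across samples.
lengths = [len(c) for c, _ in truncated]
```

Only the preprocessing changes in this sketch; the truncated pairs would then be passed to an unchanged DPO or SimPO loss.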


Article 480

Title@2025-06-11 (3): NDCG-Consistent Softmax Approximation with Accelerated Convergence

Title: NDCG-Consistent Softmax Approximation with Accelerated Convergence NDCG-Konsistente Softmax-Annäherung mit beschleunigter Konvergenz NDCG-Consistent Softmax 与加速汇合相近 2506.09454v1

Authors (6): Yuanhao Pu, Defu Lian, Xiaolong Chen, Xu Huang, Jin Chen, Enhong Chen

Ranking tasks constitute fundamental components of extreme similarity learning frameworks, where extremely large corpora of objects are modeled through relative similarity relationships adhering to predefined ordinal structures. Among various ranking surrogates, Softmax (SM) Loss has been widely adopted due to its natural capability to handle listwise ranking via global negative comparisons, along with its flexibility across diverse application scenarios. However, despite its effectiveness, SM Loss often suffers from significant computational overhead and scalability limitations when applied to large-scale object spaces. To address this challenge, we propose novel loss formulations that align directly with ranking metrics: the Ranking-Generalizable squared (RG$^2$) Loss and the Ranking-Generalizable interactive (RG$^\times$) Loss, both derived through Taylor expansions of the SM Loss. Notably, RG$^2$ reveals the intrinsic mechanisms underlying weighted squared losses (WSL) in ranking methods and uncovers fundamental connections between sampling-based and non-sampling-based loss paradigms. Furthermore, we integrate the proposed RG losses with the highly efficient Alternating Least Squares (ALS) optimization method, providing both generalization guarantees and convergence rate analyses. Empirical evaluations on real-world datasets demonstrate that our approach achieves comparable or superior ranking performance relative to SM Loss, while significantly accelerating convergence. This framework offers the similarity learning community both theoretical insights and practically efficient tools, with methodologies applicable to a broad range of tasks where balancing ranking quality and computational efficiency is essential.


Article 481

Title@2025-06-11 (3): MADCluster: Model-agnostic Anomaly Detection with Self-supervised Clustering Network

Title: MADCluster: Model-agnostic Anomaly Detection with Self-supervised Clustering Network MADCluster: Modell-agnostische Anomalieerkennung mit selbstüberwachtem Clustering-Netzwerk MADCluster:使用自监管的集群网进行模型-不可知异常探测 2505.16223v5

Authors (3): Sangyong Lee, Subo Hwang, Dohoon Kim

In this paper, we propose MADCluster, a novel model-agnostic anomaly detection framework utilizing self-supervised clustering. MADCluster is applicable to various deep learning architectures and addresses the ‘hypersphere collapse’ problem inherent in existing deep learning-based anomaly detection methods. The core idea is to cluster normal pattern data into a ‘single cluster’ while simultaneously learning the cluster center and mapping data close to this center. Also, to improve expressiveness and enable effective single clustering, we propose a new ‘One-directed Adaptive loss’. The optimization of this loss is mathematically proven. MADCluster consists of three main components: Base Embedder capturing high-dimensional temporal dynamics, Cluster Distance Mapping, and Sequence-wise Clustering for continuous center updates. Its model-agnostic characteristics are achieved by applying various architectures to the Base Embedder. Experiments on four time series benchmark datasets demonstrate that applying MADCluster improves the overall performance of comparative models. In conclusion, the compatibility of MADCluster shows potential for enhancing model performance across various architectures.


Article 482

Title@2025-06-11 (3): SafeCoT: Improving VLM Safety with Minimal Reasoning

Title: SafeCoT: Improving VLM Safety with Minimal Reasoning SafeCoT: VLM-Sicherheit mit minimaler Vernunft verbessern SafeCot: 提高VLM安全性,并采用最低合理理由 2506.08399v2

Authors (4): Jiachen Ma, Zhanhui Zhou, Chao Yang, Chaochao Lu

Ensuring safe and appropriate responses from vision-language models (VLMs) remains a critical challenge, particularly in high-risk or ambiguous scenarios. We introduce SafeCoT, a lightweight, interpretable framework that leverages rule-based chain-of-thought (CoT) supervision to improve refusal behavior in VLMs. Unlike prior methods that rely on large-scale safety annotations or complex modeling, SafeCoT uses minimal supervision to help models reason about safety risks and make context-aware refusals. Experiments across multiple benchmarks show that SafeCoT significantly reduces overrefusal and enhances generalization, even with limited training data. Our approach offers a scalable solution for aligning VLMs with safety-critical objectives.


Article 483

Title@2025-06-11 (3): Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform

Title: Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform Verschlimmerungen von LLM-Embedding-Sequenzen lernen: Glas-Reformation 学习LLM 嵌入序列的学习障碍:染色玻璃变形 2506.09452v1

Authors (4): Jay Roberts, Kyle Mylonakis, Sidhartha Roy, Kaan Kale

The high cost of ownership of AI compute infrastructure and the challenges of robustly serving large language models (LLMs) have led to a surge in managed Model-as-a-service deployments. Even when enterprises choose on-premises deployments, the compute infrastructure is typically shared across many teams in order to maximize the return on investment. In both scenarios the deployed models operate only on plaintext data, and so enterprise data owners must allow their data to appear in plaintext on a shared or multi-tenant compute infrastructure. As a result, data owners with private or sensitive data are hesitant or restricted in what data they use with these types of deployments. In this work we introduce the Stained Glass Transform, a learned, stochastic, and sequence-dependent transformation of the word embeddings of an LLM which information-theoretically provides privacy to the input of the LLM while preserving the utility of the model. We theoretically connect a particular class of Stained Glass Transforms to the theory of mutual information of Gaussian Mixture Models. We then calculate a posteriori privacy estimates, based on mutual information, and verify the privacy and utility of instances of transformed embeddings through token-level metrics of privacy and standard LLM performance benchmarks.


Article 484

Title@2025-06-11 (3): Safe Screening Rules for Group SLOPE

Title: Safe Screening Rules for Group SLOPE Sichere Screening-Regeln für Gruppen-LOPE SLOPE集团安全筛选规则 2506.09451v1

Authors (3): Runxue Bao, Quanchao Lu, Yanfu Zhang

Variable selection is a challenging problem in high-dimensional sparse learning, especially when group structures exist. Group SLOPE performs well for the adaptive selection of groups of predictors. However, the block non-separable group effects in Group SLOPE make existing methods either invalid or inefficient. Consequently, Group SLOPE tends to incur significant computational costs and memory usage in practical high-dimensional scenarios. To overcome this issue, we introduce a safe screening rule tailored for the Group SLOPE model, which efficiently identifies inactive groups with zero coefficients by addressing the block non-separable group effects. By excluding these inactive groups during training, we achieve considerable gains in computational efficiency and memory usage. Importantly, the proposed screening rule can be seamlessly integrated into existing solvers for both batch and stochastic algorithms. Theoretically, we establish that our screening rule can be safely employed with existing optimization algorithms, ensuring the same results as the original approaches. Experimental results confirm that our method effectively detects inactive feature groups and significantly boosts computational efficiency without compromising accuracy.


Article 485

Title@2025-06-11 (3): Modified K-means Algorithm with Local Optimality Guarantees

Title: Modified K-means Algorithm with Local Optimality Guarantees Modifizierter K-Mittel-Algorithmus mit lokalen Optimierungsgarantien 使用本地最佳度保障修改 K 平均值 2506.06990v2

Authors (3): Mingyi Li, Michael R. Metel, Akiko Takeda

The K-means algorithm is one of the most widely studied clustering algorithms in machine learning. While extensive research has focused on its ability to achieve a globally optimal solution, a rigorous analysis of its local optimality guarantees is still lacking. In this paper, we first present conditions under which the K-means algorithm converges to a locally optimal solution. Based on this, we propose simple modifications to the K-means algorithm which ensure local optimality in both the continuous and discrete sense, with the same computational complexity as the original K-means algorithm. As the dissimilarity measure, we consider a general Bregman divergence, which is an extension of the squared Euclidean distance often used in the K-means algorithm. Numerical experiments confirm that the K-means algorithm does not always find a locally optimal solution in practice, while our proposed methods provide improved locally optimal solutions with reduced clustering loss. Our code is available at https://github.com/lmingyi/LO-K-means.
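
To make the remark about Bregman divergences concrete, the sketch below evaluates the standard definition D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> and shows that the choice phi(x) = ||x||^2 recovers the squared Euclidean distance. It illustrates the dissimilarity measure and a plain assignment step only; it is not the paper's modified algorithm, and all function names are chosen for this example.

```python
import math

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# phi(x) = ||x||^2 yields the squared Euclidean distance.
def sq_norm(v):
    return sum(t * t for t in v)

def sq_norm_grad(v):
    return [2.0 * t for t in v]

# phi(x) = sum x_i log x_i yields the KL divergence on probability vectors.
def neg_entropy(v):
    return sum(t * math.log(t) for t in v)

def neg_entropy_grad(v):
    return [math.log(t) + 1.0 for t in v]

def assign(points, centers, phi, grad_phi):
    """K-means style assignment: index of the closest center under D_phi."""
    return [
        min(range(len(centers)), key=lambda k: bregman(phi, grad_phi, p, centers[k]))
        for p in points
    ]

points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
centers = [[0.0, 0.0], [5.0, 5.0]]
labels = assign(points, centers, sq_norm, sq_norm_grad)
```

Swapping `sq_norm` for `neg_entropy` changes the geometry of the clusters without changing the structure of the assignment loop.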


Article 486

Title@2025-06-11 (3): Attention-Bayesian Hybrid Approach to Modular Multiple Particle Tracking

Title: Attention-Bayesian Hybrid Approach to Modular Multiple Particle Tracking Aufmerksamkeits-Bayesischer Hybrid-Ansatz für modulares multiples Partikel-Tracking 模块多粒子跟踪的注意 – – 巴伊斯混合办法 2506.09441v1

Authors (2): Piyush Mishra, Philippe Roudot

Tracking multiple particles in noisy and cluttered scenes remains challenging due to a combinatorial explosion of trajectory hypotheses, which scales super-exponentially with the number of particles and frames. The transformer architecture has shown a significant improvement in robustness against this high combinatorial load. However, its performance still falls short of the conventional Bayesian filtering approaches in scenarios presenting a reduced set of trajectory hypotheses. This suggests that while transformers excel at narrowing down possible associations, they may not be able to reach the optimality of the Bayesian approach in locally sparse scenarios. Hence, we introduce a hybrid tracking framework that combines the ability of self-attention to learn the underlying representation of particle behavior with the reliability and interpretability of Bayesian filtering. We perform trajectory-to-detection association by solving a label prediction problem, using a transformer encoder to infer soft associations between detections across frames. This prunes the hypothesis set, enabling efficient multiple-particle tracking within a Bayesian filtering framework. Our approach demonstrates improved tracking accuracy and robustness against spurious detections, offering a solution for high-clutter multiple particle tracking scenarios.


Article 487

Title@2025-06-11 (3): Optimization over Sparse Support-Preserving Sets: Two-Step Projection with Global Optimality Guarantees

Title: Optimization over Sparse Support-Preserving Sets: Two-Step Projection with Global Optimality Guarantees Optimierung über Sparse Support-Preserving Sets: Zwei-Schritt-Projektion mit globalen Optimierungsgarantien 使用全球最佳保证的两步投影 2506.08558v2

Authors (3): William de Vazelhes, Xiao-Tong Yuan, Bin Gu

In sparse optimization, enforcing hard constraints using the $\ell_0$ pseudo-norm offers advantages like controlled sparsity compared to convex relaxations. However, many real-world applications demand not only sparsity constraints but also some extra constraints. While prior algorithms have been developed to address this complex scenario with mixed combinatorial and convex constraints, they typically require a closed-form projection onto the mixed constraints, which might not exist, and/or only provide local guarantees of convergence, which differ from the global guarantees commonly sought in sparse optimization. To fill this gap, in this paper, we study the problem of sparse optimization with extra support-preserving constraints commonly encountered in the literature. We present a new variant of the iterative hard-thresholding algorithm equipped with a two-step consecutive projection operator customized for these mixed constraints, serving as a simple alternative to the Euclidean projection onto the mixed constraint. By introducing a novel trade-off between sparsity relaxation and sub-optimality, we provide global guarantees in objective value for the output of our algorithm, in the deterministic, stochastic, and zeroth-order settings, under the conventional restricted strong-convexity/smoothness assumptions. As a fundamental contribution in proof techniques, we develop a novel extension of the classic three-point lemma to the considered two-step non-convex projection operator, which allows us to analyze the convergence in objective value in an elegant way that has not been possible with existing techniques. In the zeroth-order case, this technique also improves upon the state-of-the-art result from de Vazelhes et al. (2022), even in the case without additional constraints, by allowing us to remove a non-vanishing system error present in their work.


Article 488

Title@2025-06-11 (3): Generalization Error Analysis for Attack-Free and Byzantine-Resilient Decentralized Learning with Data Heterogeneity

Title: Generalization Error Analysis for Attack-Free and Byzantine-Resilient Decentralized Learning with Data Heterogeneity Generalisierungsfehleranalyse für Angriffsfreies und Byzantinisch-Resilientes dezentralisiertes Lernen mit Daten Heterogenität 为无攻击攻击和有数据差异化的抗拜占庭分散化学习提供通用错误分析 2506.09438v1

Authors (3): Haoxiang Ye, Tao Sun, Qing Ling

Decentralized learning, which facilitates joint model training across geographically scattered agents, has gained significant attention in the field of signal and information processing in recent years. While the optimization errors of decentralized learning algorithms have been extensively studied, their generalization errors remain relatively under-explored. As the generalization errors reflect the scalability of trained models on unseen data and are crucial in determining the performance of trained models in real-world applications, understanding the generalization errors of decentralized learning is of paramount importance. In this paper, we present a fine-grained generalization error analysis for both attack-free and Byzantine-resilient decentralized learning with heterogeneous data and under mild assumptions, in contrast to prior studies that consider homogeneous data and/or rely on a stringent bounded stochastic gradient assumption. Our results shed light on the impact of data heterogeneity, model initialization and stochastic gradient noise – factors that have not been closely investigated before – on the generalization error of decentralized learning. We also reveal that Byzantine attacks performed by malicious agents largely affect the generalization error, and their negative impact is inherently linked to the data heterogeneity while remaining independent of the sample size. Numerical experiments on both convex and non-convex tasks are conducted to validate our theoretical findings.


Article 489

Title@2025-06-11 (3): Discovering Physics Laws of Dynamical Systems via Invariant Function Learning

Title: Discovering Physics Laws of Dynamical Systems via Invariant Function Learning Physikgesetze dynamischer Systeme durch Invariant Function Learning entdecken 通过动态功能学习发现动态系统的物理法 2502.04495v2

Authors (3): Shurui Gui, Xiner Li, Shuiwang Ji

We consider learning underlying laws of dynamical systems governed by ordinary differential equations (ODE). A key challenge is how to discover intrinsic dynamics across multiple environments while circumventing environment-specific mechanisms. Unlike prior work, we tackle more complex environments where changes extend beyond function coefficients to entirely different function forms. For example, we demonstrate the discovery of the ideal pendulum’s natural motion $\alpha^2 \sin(\theta_t)$ by observing pendulum dynamics in different environments, such as the damped environment $\alpha^2 \sin(\theta_t) - \rho \omega_t$ and the powered environment $\alpha^2 \sin(\theta_t) + \rho \frac{\omega_t}{\left|\omega_t\right|}$. Here, we formulate this problem as an invariant function learning task and propose a new method, known as Disentanglement of Invariant Functions (DIF), that is grounded in causal analysis. We propose a causal graph and design an encoder-decoder hypernetwork that explicitly disentangles invariant functions from environment-specific dynamics. The discovery of invariant functions is guaranteed by our information-based principle that enforces the independence between extracted invariant functions and environments. Quantitative comparisons with meta-learning and invariant learning baselines on three ODE systems demonstrate the effectiveness and efficiency of our method. Furthermore, symbolic regression explanation results highlight the ability of our framework to uncover intrinsic laws. Our code has been released as part of the AIRS library (https://github.com/divelab/AIRS/tree/main/OpenODE/DIF).
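
The three pendulum environments named above can be written out as right-hand-side functions to show what "invariant" means here: the term alpha^2 * sin(theta) is shared, and only the extra term changes. This is a sketch of the expressions exactly as they appear in the abstract, not the authors' simulator. The function names are hypothetical, the powered term rho * omega / |omega| is read as rho * sign(omega), and setting it to zero at omega = 0 is an assumption made to avoid division by zero.

```python
import math

def ideal(theta, omega, alpha):
    """Invariant part shared by all environments: alpha^2 * sin(theta)."""
    return alpha ** 2 * math.sin(theta)

def damped(theta, omega, alpha, rho):
    """Damped environment: invariant part minus rho * omega."""
    return ideal(theta, omega, alpha) - rho * omega

def powered(theta, omega, alpha, rho):
    """Powered environment: invariant part plus rho * omega / |omega|."""
    # Assumption: the sign term is taken to be 0 when omega == 0.
    sign = 0.0 if omega == 0 else omega / abs(omega)
    return ideal(theta, omega, alpha) + rho * sign

# Hypothetical state and parameters.
theta, omega, alpha, rho = 0.3, 2.0, 1.5, 0.1

# Subtracting the invariant part isolates each environment-specific mechanism.
damped_residual = damped(theta, omega, alpha, rho) - ideal(theta, omega, alpha)
powered_residual = powered(theta, omega, alpha, rho) - ideal(theta, omega, alpha)
```

In the learning problem only trajectories from `damped` and `powered` style environments are observed, and the task is to recover `ideal` without being told which term is shared.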


Article 490

Title@2025-06-11 (3): Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling

Title: Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling Lohnüberoptimierung bei Direktausrichtungsalgorithmen mit Wichtigkeits-Sampling abmildern 在具有重要性抽样调查的直接对齐数值中,减缓回调过度优化 2506.08681v2

Authors (8): Phuc Minh Nguyen, Ngoc-Hieu Nguyen, Duy H. M. Nguyen, Anji Liu, An Mai, Binh T. Nguyen, Daniel Sonntag, Khoa D. Doan

Direct Alignment Algorithms (DAAs) such as Direct Preference Optimization (DPO) have emerged as alternatives to the standard Reinforcement Learning from Human Feedback (RLHF) for aligning large language models (LLMs) with human values. However, these methods are more susceptible to over-optimization, in which the model drifts away from the reference policy, leading to degraded performance as training progresses. This paper proposes a novel importance-sampling approach to mitigate the over-optimization problem of offline DAAs. This approach, called IS-DAAs, multiplies the DAA objective by an importance ratio that accounts for the reference policy distribution. IS-DAAs additionally avoid the high variance issue associated with importance sampling by clipping the importance ratio to a maximum value. Our extensive experiments demonstrate that IS-DAAs can effectively mitigate over-optimization, especially under low regularization strength, and achieve better performance than other methods designed to address this problem. Our implementations are provided publicly at this link.
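
The two ingredients named in the abstract, multiplying a per-sample objective by an importance ratio and clipping that ratio to a maximum, can be sketched as follows. The abstract does not specify how the ratio is computed or what clip value is used, so the log-ratio inputs, the default `clip_max`, and all function names are assumptions for illustration only.

```python
import math

def clipped_importance_weight(log_ratio, clip_max):
    """Exponentiate a log importance ratio and clip it from above."""
    return min(math.exp(log_ratio), clip_max)

def is_weighted_loss(per_sample_losses, log_ratios, clip_max=2.0):
    """Average of per-sample DAA losses, each scaled by its clipped weight.

    per_sample_losses: hypothetical DAA loss values, one per preference pair.
    log_ratios: hypothetical log importance ratios supplied by the caller.
    """
    weights = [clipped_importance_weight(lr, clip_max) for lr in log_ratios]
    total = sum(w * loss for w, loss in zip(weights, per_sample_losses))
    return total / len(per_sample_losses)

# Hypothetical batch: the second sample has a very large ratio that gets clipped.
losses = [1.0, 1.0]
log_ratios = [0.0, 5.0]
batch_loss = is_weighted_loss(losses, log_ratios, clip_max=2.0)
```

Without the clip, the second sample would carry a weight of about 148 and dominate the batch, which is the variance problem the clipping is meant to contain.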


Article 491

Title@2025-06-11 (3): When Is Diversity Rewarded in Cooperative Multi-Agent Learning?

Title: When Is Diversity Rewarded in Cooperative Multi-Agent Learning? Wann wird Vielfalt im kooperativen Multi-Agenten-Lernen belohnt? 当多样性在多机构合作学习中得到回报吗? 2506.09434v1

Authors (3): Michael Amir, Matteo Bettini, Amanda Prorok

The success of teams in robotics, nature, and society often depends on the division of labor among diverse specialists; however, a principled explanation for when such diversity surpasses a homogeneous team is still missing. Focusing on multi-agent task allocation problems, our goal is to study this question from the perspective of reward design: what kinds of objectives are best suited for heterogeneous teams? We first consider an instantaneous, non-spatial setting where the global reward is built by two generalized aggregation operators: an inner operator that maps the $N$ agents’ effort allocations on individual tasks to a task score, and an outer operator that merges the $M$ task scores into the global team reward. We prove that the curvature of these operators determines whether heterogeneity can increase reward, and that for broad reward families this collapses to a simple convexity test. Next, we ask what incentivizes heterogeneity to emerge when embodied, time-extended agents must learn an effort allocation policy. To study heterogeneity in such settings, we use multi-agent reinforcement learning (MARL) as our computational paradigm, and introduce Heterogeneous Environment Design (HED), a gradient-based algorithm that optimizes the parameter space of underspecified MARL environments to find scenarios where heterogeneity is advantageous. Experiments in matrix games and an embodied Multi-Goal-Capture environment show that, despite the difference in settings, HED rediscovers the reward regimes predicted by our theory to maximize the advantage of heterogeneity, both validating HED and connecting our theoretical insights to reward design in MARL. Together, these results help us understand when behavioral diversity delivers a measurable benefit.


Article 492

Title@2025-06-11 (3): Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training

Title: Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training Milderung von sauberen Korrelationen in LLMs durch Kausalität-Bewusstsein nach dem Training 通过Causaliity-Aware 培训后在LLMs中减轻净污损 2506.09433v1

Authors (2): Shurui Gui, Shuiwang Ji

While large language models (LLMs) have demonstrated remarkable capabilities in language modeling, recent studies reveal that they often fail on out-of-distribution (OOD) samples due to spurious correlations acquired during pre-training. Here, we aim to mitigate such spurious correlations through causality-aware post-training (CAPT). By decomposing a biased prediction into two unbiased steps, known as \textit{event estimation} and \textit{event intervention}, we reduce LLMs’ pre-training biases without incurring additional fine-tuning biases, thus enhancing the model’s generalization ability. Experiments on the formal causal inference benchmark CLadder and the logical reasoning dataset PrOntoQA show that 3B-scale language models fine-tuned with CAPT can outperform both traditional SFT and larger LLMs on in-distribution (ID) and OOD tasks using only 100 ID fine-tuning samples, demonstrating the effectiveness and sample efficiency of CAPT.


Article 493

Title@2025-06-11 (3): Through a Steerable Lens: Magnifying Neural Network Interpretability via Phase-Based Extrapolation

Title: Through a Steerable Lens: Magnifying Neural Network Interpretability via Phase-Based Extrapolation Durch eine Steerable Lens: Vergrößerung der Neuralen Netzwerk-Interpretabilität über phasenbasierte Extrapolation a 通过固态透镜:通过基于分阶段的外推法使神经网络放大解释 2506.02300v3

Authors (4): Farzaneh Mahdisoltani, Saeed Mahdisoltani, Roger B. Grosse, David J. Fleet

Understanding the internal representations and decision mechanisms of deep neural networks remains a critical open challenge. While existing interpretability methods often identify influential input regions, they may not elucidate how a model distinguishes between classes or what specific changes would transition an input from one category to another. To address these limitations, we propose a novel framework that visualizes the implicit path between classes by treating the network gradient as a form of infinitesimal motion. Drawing inspiration from phase-based motion magnification, we first decompose images using invertible transforms (specifically the Complex Steerable Pyramid), then compute class-conditional gradients in the transformed space. Rather than iteratively integrating the gradient to trace a full path, we amplify the one-step gradient to the input and perform a linear extrapolation to expose how the model moves from source to target class. By operating in the steerable pyramid domain, these amplified gradients produce semantically meaningful, spatially coherent morphs that highlight the classifier’s most sensitive directions, giving insight into the geometry of its decision boundaries. Experiments on both synthetic and real-world datasets demonstrate that our phase-focused extrapolation yields perceptually aligned, semantically meaningful transformations, offering a novel, interpretable lens into neural classifiers’ internal representations.


Article 494

Title@2025-06-11 (3): CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis

Title: CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis CTPD: Cross-Modal Temporal Pattern Discovery für verbesserte multimodale elektronische Gesundheitsdatenanalyse CTPD: 增强多模式电子健康记录分析的跨模式时间模式发现 2411.00696v2

Authors (4): Fuying Wang, Feng Wu, Yihan Tang, Lequan Yu

Integrating multimodal Electronic Health Records (EHR) data, such as numerical time series and free-text clinical reports, has great potential in predicting clinical outcomes. However, prior work has primarily focused on capturing temporal interactions within individual samples and fusing multimodal information, overlooking critical temporal patterns across patients. These patterns, such as trends in vital signs like abnormal heart rate or blood pressure, can indicate deteriorating health or an impending critical event. Similarly, clinical notes often contain textual descriptions that reflect these patterns. Identifying corresponding temporal patterns across different modalities is crucial for improving the accuracy of clinical outcome predictions, yet it remains a challenging task. To address this gap, we introduce a Cross-Modal Temporal Pattern Discovery (CTPD) framework, designed to efficiently extract meaningful cross-modal temporal patterns from multimodal EHR data. Our approach introduces shared initial temporal pattern representations which are refined using slot attention to generate temporal semantic embeddings. To ensure rich cross-modal temporal semantics in the learned patterns, we introduce a contrastive-based TPNCE loss for cross-modal alignment, along with two reconstruction losses to retain core information of each modality. Evaluations on two clinically critical tasks, 48-hour in-hospital mortality and 24-hour phenotype classification, using the MIMIC-III database demonstrate the superiority of our method over existing approaches.


Article 495

Title@2025-06-11 (3): Weakly Supervised Multiple Instance Learning for Whale Call Detection and Temporal Localization in Long-Duration Passive Acoustic Monitoring

Title: Weakly Supervised Multiple Instance Learning for Whale Call Detection and Temporal Localization in Long-Duration Passive Acoustic Monitoring Schwach überwachtes Multiple Instance-Lernen für Whale Call Detection und zeitliche Lokalisierung in der langdauerhaften passiven akustischen Überwachung 在长期被动听力监测中,受监督的捕鲸呼叫探测和时间定位多案例学习能力薄弱 2502.20838v2

Authors (4): Ragib Amin Nihal, Benjamin Yen, Runwu Shi, Kazuhiro Nakadai

Marine ecosystem monitoring via Passive Acoustic Monitoring (PAM) generates vast data, but deep learning often requires precise annotations and short segments. We introduce DSMIL-LocNet, a Multiple Instance Learning framework for whale call detection and localization using only bag-level labels. Our dual-stream model processes 2-30 minute audio segments, leveraging spectral and temporal features with attention-based instance selection. Tests on Antarctic whale data show longer contexts improve classification (F1: 0.8-0.9) while medium instances ensure localization precision (0.65-0.70). This suggests MIL can enhance scalable marine monitoring. Code: https://github.com/Ragib-Amin-Nihal/DSMIL-Loc


Article 496

Title@2025-06-11 (3): Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models

Title: Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models Folgen Sie der Energie, finden Sie den Weg: Riemannsche Metrics aus energiebasierten Modellen 遵循能源,寻找道路:以能源为基础的模型的里曼尼度量仪 2505.18230v2

Authors (6): Louis Béthune, David Vigouroux, Yilun Du, Rufin VanRullen, Thomas Serre, Victor Boutin

What is the shortest path between two data points lying in a high-dimensional space? While the answer is trivial in Euclidean geometry, it becomes significantly more complex when the data lies on a curved manifold – requiring a Riemannian metric to describe the space’s local curvature. Estimating such a metric, however, remains a major challenge in high dimensions. In this work, we propose a method for deriving Riemannian metrics directly from pretrained Energy-Based Models (EBMs) – a class of generative models that assign low energy to high-density regions. These metrics define spatially varying distances, enabling the computation of geodesics – shortest paths that follow the data manifold’s intrinsic geometry. We introduce two novel metrics derived from EBMs and show that they produce geodesics that remain closer to the data manifold and exhibit lower curvature distortion, as measured by alignment with ground-truth trajectories. We evaluate our approach on increasingly complex datasets: synthetic datasets with known data density, rotated character images with interpretable geometry, and high-resolution natural images embedded in a pretrained VAE latent space. Our results show that EBM-derived metrics consistently outperform established baselines, especially in high-dimensional settings. Our work is the first to derive Riemannian metrics from EBMs, enabling data-aware geodesics and unlocking scalable, geometry-driven learning for generative modeling and simulation.



Article 497

Title@2025-06-11 (3): Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Title: Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Vertrauen ist alles, was Sie brauchen: Wenig heißes RL Feintuning von Sprachmodellen 信心是你们需要的:很少点点RL精细的语言模式 2506.06395v3

Authors (5): Pengyi Li, Matvey Skripkin, Alexander Zubrey, Andrey Kuznetsov, Ivan Oseledets

Large language models (LLMs) excel at reasoning, yet post-training remains critical for aligning their behavior with task goals. Existing reinforcement learning (RL) methods often depend on costly human annotations or external reward models. We propose Reinforcement Learning via Self-Confidence (RLSC), which uses the model’s own confidence as the reward signal, eliminating the need for labels, preference models, or reward engineering. Applied to Qwen2.5-Math-7B with only 16 samples per question and 10 or 20 training steps, RLSC improves accuracy by +13.4% on AIME2024, +21.2% on MATH500, +21.7% on Minerva Math, +20.8% on Olympiadbench, and +9.7% on AMC23. RLSC provides a simple, scalable post-training method for inference models, requiring only a small number of samples and unlabelled supervision.



Article 498

Title@2025-06-11 (3): Improving the performance of optical inverse design of multilayer thin films using CNN-LSTM tandem neural networks

Title: Improving the performance of optical inverse design of multilayer thin films using CNN-LSTM tandem neural networks Verbesserung der Leistungsfähigkeit des optischen inversen Designs von Mehrschicht-Dünnfilmen mit CNN-LSTM Tandem-Neuronalen Netzwerken 利用CNN-LSTM同步神经网络改进多层薄薄膜光反光设计性能 2506.10044v1

Authors (4): Uijun Jung, Deokho Jang, Sungchul Kim, Jungho Kim

The optical properties of thin films are greatly influenced by the thickness of each layer. Accurately predicting these thicknesses and their corresponding optical properties is important in the optical inverse design of thin films. However, traditional inverse design methods usually demand extensive numerical simulations and optimization procedures, which are time-consuming. In this paper, we utilize deep learning for the inverse design of the transmission spectra of SiO2/TiO2 multilayer thin films. We implement a tandem neural network (TNN), which can solve the one-to-many mapping problem that greatly degrades the performance of deep-learning-based inverse designs. In general, the TNN has been implemented by a back-to-back connection of an inverse neural network and a pre-trained forward neural network, both of which have been implemented based on multilayer perceptron (MLP) algorithms. In this paper, we propose to use not only MLP, but also convolutional neural network (CNN) or long short-term memory (LSTM) algorithms in the configuration of the TNN. We show that an LSTM-LSTM-based TNN yields the highest accuracy but takes the longest training time among nine configurations of TNNs. We also find that a CNN-LSTM-based TNN is an optimal solution in terms of accuracy and speed because it can integrate the strengths of the CNN and LSTM algorithms.



Article 499

Title@2025-06-11 (3): Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation

Title: Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation Zeiteinheitliche Diffusionspolitik mit Aktionsdiskriminierung für die Robotermanipulation 对机器人操纵行动歧视的、时间统一的统一传播政策 2506.09422v1

Authors (5): Ye Niu, Sanping Zhou, Yizhe Li, Ye Den, Le Wang

In many complex scenarios, robotic manipulation relies on generative models to estimate the distribution of multiple successful actions. As the diffusion model has better training robustness than other generative models, it performs well in imitation learning through successful robot demonstrations. However, diffusion-based policy methods typically require significant time to iteratively denoise robot actions, which hinders real-time responses in robotic manipulation. Moreover, existing diffusion policies model a time-varying action denoising process, whose temporal complexity increases the difficulty of model training and leads to suboptimal action accuracy. To generate robot actions efficiently and accurately, we present the Time-Unified Diffusion Policy (TUDP), which utilizes action recognition capabilities to build a time-unified denoising process. On the one hand, we build a time-unified velocity field in action space with additional action discrimination information. By unifying all timesteps of action denoising, our velocity field reduces the difficulty of policy learning and speeds up action generation. On the other hand, we propose an action-wise training method, which introduces an action discrimination branch to supply additional action discrimination information. Through action-wise training, the TUDP implicitly learns the ability to discern successful actions, which improves denoising accuracy. Our method achieves state-of-the-art performance on RLBench with the highest success rate of 82.6% on a multi-view setup and 83.8% on a single-view setup. In particular, when using fewer denoising iterations, TUDP achieves a more significant improvement in success rate. Additionally, TUDP can produce accurate actions for a wide range of real-world tasks.



Article 500

Title@2025-06-11 (3): A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy

Title: A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy Ein Aufruf für kollaborative Intelligenz: Warum Menschen-Agenten-Systeme der KI-Autonomie vorausgehen sollten 呼吁合作情报:为什么人力-物力系统应先于自主 2506.09420v1

Authors (13): Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Chunyu Miao, Dongyuan Li, Aiwei Liu, Yue Zhou, Yankai Chen, Weizhi Zhang, Yangning Li, Liancheng Fang, Renhe Jiang, Philip S. Yu

Recent improvements in large language models (LLMs) have led many researchers to focus on building fully autonomous AI agents. This position paper questions whether this approach is the right path forward, as these autonomous systems still have problems with reliability, transparency, and understanding the actual requirements of humans. We suggest a different approach: LLM-based Human-Agent Systems (LLM-HAS), where AI works with humans rather than replacing them. By keeping humans involved to provide guidance, answer questions, and maintain control, these systems can be more trustworthy and adaptable. Looking at examples from healthcare, finance, and software development, we show how human-AI teamwork can handle complex tasks better than AI working alone. We also discuss the challenges of building these collaborative systems and offer practical solutions. This paper argues that progress in AI should not be measured by how independent systems become, but by how well they can work with humans. The most promising future for AI is not in systems that take over human roles, but in those that enhance human capabilities through meaningful partnership.



Article 501

Title@2025-06-11 (3): NestQuant: Nested Lattice Quantization for Matrix Products and LLMs

Title: NestQuant: Nested Lattice Quantization for Matrix Products and LLMs NestQuant: Nested Lattice Quantization für Matrix-Produkte und LLMs NestQuant: 母体产品和LLMs的Nasted Lattice量化 2502.09720v2

Authors (4): Semyon Savkin, Eitan Porat, Or Ordentlich, Yury Polyanskiy

Post-training quantization (PTQ) has emerged as a critical technique for efficient deployment of large language models (LLMs). This work proposes NestQuant, a novel PTQ scheme for weights and activations that is based on self-similar nested lattices. Recent works have mathematically shown such quantizers to be information-theoretically optimal for low-precision matrix multiplication. We implement a practical low-complexity version of NestQuant based on the Gosset lattice, making it a drop-in quantizer for any matrix multiplication step (e.g., in self-attention, MLP, etc.). For example, NestQuant quantizes weights, KV-cache, and activations of Llama-3-8B to 4 bits, achieving a perplexity of 6.6 on wikitext2. This represents a more than 55% reduction in the perplexity gap with respect to the unquantized model (perplexity of 6.14) compared to the state-of-the-art methods: Meta's SpinQuant (perplexity 7.3), OstQuant (7.3) and QuaRot (8.2). Comparisons on bigger models (up to 70B) and on various LLM evaluation benchmarks confirm uniform superiority of NestQuant.
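The "reduction in perplexity gap" figure can be checked from the numbers quoted in the abstract. The short sketch below assumes the gap is measured as the perplexity of a quantized model minus that of the unquantized model; the function name `gap_reduction` is illustrative. The inputs are the rounded values from the abstract, so the result is only approximate.

```python
# Perplexity figures quoted in the abstract (Llama-3-8B, wikitext2, 4-bit).
UNQUANTIZED = 6.14
NESTQUANT = 6.6
BASELINES = {"SpinQuant": 7.3, "OstQuant": 7.3, "QuaRot": 8.2}


def gap_reduction(baseline_ppl, new_ppl, reference_ppl):
    """Fraction of the baseline's perplexity gap to the reference that is closed."""
    baseline_gap = baseline_ppl - reference_ppl
    new_gap = new_ppl - reference_ppl
    return 1.0 - new_gap / baseline_gap


reductions = {
    name: gap_reduction(ppl, NESTQUANT, UNQUANTIZED)
    for name, ppl in BASELINES.items()
}
for name, value in reductions.items():
    print(f"{name}: {value:.1%} of the gap closed")
```

With these rounded inputs the reduction relative to SpinQuant comes out at roughly 60%, which is consistent with the "more than 55%" stated in the abstract.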



Article 502

Title@2025-06-11 (3): DeepForm: Reasoning Large Language Model for Communication System Formulation

Title: DeepForm: Reasoning Large Language Model for Communication System Formulation DeepForm: Großes Sprachmodell für die Formulierung von Kommunikationssystemen 深形式:开发通信系统的理由大语言模式 2506.08551v2

Authors (6): Panlong Wu, Ting Wang, Yifei Zhong, Haoqi Zhang, Zitong Wang, Fangxin Wang

Communication system formulation is critical for advancing 6G and future wireless technologies, yet it remains a complex, expertise-intensive task. While Large Language Models (LLMs) offer potential, existing general-purpose models often lack the specialized domain knowledge, nuanced reasoning capabilities, and access to high-quality, domain-specific training data required for adapting a general LLM into an LLM specialized for communication system formulation. To bridge this gap, we introduce DeepForm, the first reasoning LLM designed specifically for automated communication system formulation. We propose the world's first large-scale, open-source dataset meticulously curated for this domain, called the Communication System Formulation Reasoning Corpus (CSFRC). Our framework employs a two-stage training strategy: first, Supervised Fine-Tuning (SFT) with Chain-of-Thought (CoT) data to distill domain knowledge; second, a novel rule-based Reinforcement Learning (RL) algorithm, C-ReMax based on ReMax, to cultivate advanced modeling capabilities and elicit sophisticated reasoning patterns like self-correction and verification. Extensive experiments demonstrate that our model achieves state-of-the-art performance, significantly outperforming larger proprietary LLMs on diverse scenarios. We will release related resources to foster further research in this area after the paper is accepted.



Article 503

Title@2025-06-11 (3): Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors

Title: Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors Wer die Störung gestartet hat, sollte sie beenden: Leiten datenfreies Modell Zusammenführen über Task-Vektoren 开始干预者应当结束干预的任何人:指导通过任务矢量合并无数据模式 2503.08099v2

Authors (5): Runxi Cheng, Feng Xiong, Yongxian Wei, Wanyun Zhu, Chun Yuan

Model merging seeks to integrate task-specific expert models into a unified architecture while preserving multi-task generalization capabilities, yet parameter interference between constituent models frequently induces performance degradation. Although prior work has explored many merging strategies, resolving interference without additional data for retraining or test-time computation remains challenging. In this paper, we theoretically demonstrate that the task vectors of the linear layer constitute an approximate linear subspace for its corresponding input. Therefore, we can minimize interference under the guidance of task vectors. Based on this insight, we propose WUDI-Merging (Whoever started the interference shoUld enD It), a simple yet effective model merging method that eliminates interference without any additional data or rescaling coefficients. Comprehensive empirical evaluations across vision and language benchmarks demonstrate our method’s superiority, achieving state-of-the-art performance in data-free model merging scenarios (average 10.9% improvement versus baseline methods) while even outperforming mainstream test-time adaptation approaches by 3.3%, and only very few computing resources are required. The code will be publicly available soon.



Article 504

Title@2025-06-11 (3): Arrival Control in Quasi-Reversible Queueing Systems: Optimization and Reinforcement Learning

Title: Arrival Control in Quasi-Reversible Queueing Systems: Optimization and Reinforcement Learning Ankunftskontrolle in Quasi-Reversible Queueing Systemen: Optimierung und Verstärkungslernen 准可更新排队系统抵达控制:优化和强化学习 2505.16353v2

Authors (2): Céline Comte, Pascal Moyal

In this paper, we introduce a versatile scheme for optimizing the arrival rates of quasi-reversible queueing systems. We first propose an alternative definition of quasi-reversibility that encompasses reversibility and highlights the importance of the definition of customer classes. Second, we introduce balanced arrival control policies, which generalize the notion of balanced arrival rates introduced in the context of Whittle networks to the much broader class of quasi-reversible queueing systems. We prove that supplementing a quasi-reversible queueing system with a balanced arrival-control policy preserves the quasi-reversibility, and we specify the form of the stationary measures. We revisit two canonical examples of quasi-reversible queueing systems, Whittle networks and order-independent queues. Lastly, we focus on the problem of admission control and leverage our results in the frameworks of optimization and reinforcement learning.



Article 505

Title@2025-06-11 (3): Code-Switching Curriculum Learning for Multilingual Transfer in LLMs

Title: Code-Switching Curriculum Learning for Multilingual Transfer in LLMs Code-Switching Curriculum Learning für den mehrsprachigen Transfer in LLMs 以LLMM为多种语言转让进行代码转换课程学习 2411.02460v2

Authors (5): Haneul Yoo, Cheonbok Park, Sangdoo Yun, Alice Oh, Hwaran Lee

Large language models (LLMs) now exhibit near human-level performance in various tasks, but their performance drops drastically after a handful of high-resource languages due to the imbalance in pre-training data. Inspired by the human process of second language acquisition, particularly code-switching (the practice of language alternation in a conversation), we propose code-switching curriculum learning (CSCL) to enhance cross-lingual transfer for LLMs. CSCL mimics the stages of human language learning by progressively training models with a curriculum consisting of 1) token-level code-switching, 2) sentence-level code-switching, and 3) monolingual corpora. Using Qwen 2 as our underlying model, we demonstrate the efficacy of the CSCL in improving language transfer to Korean, achieving significant performance gains compared to monolingual continual pre-training methods. Ablation studies reveal that both token- and sentence-level code-switching significantly enhance cross-lingual transfer and that curriculum learning amplifies these effects. We also extend our findings into various languages, including Japanese (high-resource) and Indonesian (low-resource), and using two additional models (Gemma 2 and Phi 3.5). We further show that CSCL mitigates spurious correlations between language resources and safety alignment, presenting a robust, efficient framework for more equitable language transfer in LLMs. We observe that CSCL is effective for low-resource settings where high-quality, monolingual corpora for language transfer are hardly available.



Article 506

Title@2025-06-11 (3): Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations

Title: Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations Sim-to-Real-Causal-Transfer: Ein metrischer Lernansatz zu kausal-aware Interaktionsdarstellungen 简单到实际因果转移:从计量学习方法进行体能互动演示 2312.04540v2

Authors (5): Ahmad Rahimi, Po-Chien Luan, Yuejiang Liu, Frano Rajič, Alexandre Alahi

Modeling spatial-temporal interactions among neighboring agents is at the heart of multi-agent problems such as motion forecasting and crowd navigation. Despite notable progress, it remains unclear to what extent modern representations can capture the causal relationships behind agent interactions. In this work, we take an in-depth look at the causal awareness of these representations, from computational formalism to real-world practice. First, we cast doubt on the notion of non-causal robustness studied in the recent CausalAgents benchmark. We show that recent representations are already partially resilient to perturbations of non-causal agents, and yet modeling indirect causal effects involving mediator agents remains challenging. To address this challenge, we introduce a metric learning approach that regularizes latent representations with causal annotations. Our controlled experiments show that this approach not only leads to higher degrees of causal awareness but also yields stronger out-of-distribution robustness. To further operationalize it in practice, we propose a sim-to-real causal transfer method via cross-domain multi-task learning. Experiments on pedestrian datasets show that our method can substantially boost generalization, even in the absence of real-world causal annotations. We hope our work provides a new perspective on the challenges and pathways towards causally-aware representations of multi-agent interactions. Our code is available at https://github.com/vita-epfl/CausalSim2Real.



Article 507

Title@2025-06-11 (3): Scoop-and-Toss: Dynamic Object Collection for Quadrupedal Systems

Title: Scoop-and-Toss: Dynamic Object Collection for Quadrupedal Systems Scoop-and-Toss: Dynamische Objektsammlung für Vierradsysteme Scooop-tos:四制系统动态对象收藏 2506.09406v1

Authors (3): Minji Kang, Chanwoo Baek, Yoonsang Lee

Quadruped robots have made significant advances in locomotion, extending their capabilities from controlled environments to real-world applications. Beyond movement, recent work has explored loco-manipulation using the legs to perform tasks such as pressing buttons or opening doors. While these efforts demonstrate the feasibility of leg-based manipulation, most have focused on relatively static tasks. In this work, we propose a framework that enables quadruped robots to collect objects without additional actuators by leveraging the agility of their legs. By attaching a simple scoop-like add-on to one leg, the robot can scoop objects and toss them into a collection tray mounted on its back. Our method employs a hierarchical policy structure comprising two expert policies (one for scooping and tossing, and one for approaching object positions) and a meta-policy that dynamically switches between them. The expert policies are trained separately, followed by meta-policy training for coordinated multi-object collection. This approach demonstrates how quadruped legs can be effectively utilized for dynamic object manipulation, expanding their role beyond locomotion.



Article 508

Title@2025-06-11 (3): Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps

Title: Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps Feinabstimmungs-Diffusions-Politiken mit Backpropagation durch Diffusion Zeitschritte 微调通过扩散时间步骤进行反向宣传的传播政策 2505.10482v2

Authors (5): Ningyuan Yang, Jiaxuan Gao, Feng Gao, Yi Wu, Chao Yu

Diffusion policies, widely adopted in decision-making scenarios such as robotics, gaming and autonomous driving, are capable of learning diverse skills from demonstration data due to their high representation power. However, the sub-optimal and limited coverage of demonstration data could lead to diffusion policies that generate sub-optimal trajectories and even catastrophic failures. While reinforcement learning (RL)-based fine-tuning has emerged as a promising solution to address these limitations, existing approaches struggle to effectively adapt Proximal Policy Optimization (PPO) to diffusion models. This challenge stems from the computational intractability of action likelihood estimation during the denoising process, which leads to complicated optimization objectives. In our experiments starting from randomly initialized policies, we find that online tuning of Diffusion Policies demonstrates much lower sample efficiency compared to directly applying PPO on MLP policies (MLP+PPO). To address these challenges, we introduce NCDPO, a novel framework that reformulates Diffusion Policy as a noise-conditioned deterministic policy. By treating each denoising step as a differentiable transformation conditioned on pre-sampled noise, NCDPO enables tractable likelihood evaluation and gradient backpropagation through all diffusion timesteps. Our experiments demonstrate that NCDPO achieves sample efficiency comparable to MLP+PPO when training from scratch, outperforming existing methods in both sample efficiency and final performance across diverse benchmarks, including continuous robot control and multi-agent game scenarios. Furthermore, our experimental results show that our method is robust to the number of denoising timesteps in the Diffusion Policy.



Article 509

Title@2025-06-11 (3): What Can RL Bring to VLA Generalization? An Empirical Study

Title: What Can RL Bring to VLA Generalization? An Empirical Study Was kann RL zur VLA-Verallgemeinerung bringen? Eine empirische Studie RL能带给VLA的概括化带来什么?经验研究。 2505.19789v2

Authors (8): Jijia Liu, Feng Gao, Bingwen Wei, Xinlei Chen, Qingmin Liao, Yi Wu, Chao Yu, Yu Wang

Large Vision-Language Action (VLA) models have shown significant potential for embodied AI. However, their predominant training via supervised fine-tuning (SFT) limits generalization due to susceptibility to compounding errors under distribution shifts. Reinforcement learning (RL) offers a path to overcome these limitations by optimizing for task objectives via trial-and-error, yet a systematic understanding of its specific generalization benefits for VLAs compared to SFT is lacking. To address this, our study introduces a comprehensive benchmark for evaluating VLA generalization and systematically investigates the impact of RL fine-tuning across diverse visual, semantic, and execution dimensions. Our extensive experiments reveal that RL fine-tuning, particularly with PPO, significantly enhances generalization in semantic understanding and execution robustness over SFT, while maintaining comparable visual robustness. We identify PPO as a more effective RL algorithm for VLAs than LLM-derived methods like DPO and GRPO. We also develop a simple recipe for efficient PPO training on VLAs, and demonstrate its practical utility for improving VLA generalization. The project page is at https://rlvla.github.io



Article 510

Title@2025-06-11 (3): AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI)

Title: AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI) AdversariaL attacK sAfety alignment(ALKALI): Sicherung von LLMs durch GRACE: Geometrische Darstellung-Bewusst Kontrastive Verbesserung- Einführung von Adversarial Vulnerability Quality Index (AVQI) Adversarial attacK sAfety alignment(ALKALI):通过GRACE来保护LLMS:几何代表-提高软件对抗性增强-采用反脆弱性质量指数(AVQI) 2506.08885v2

Authors (7): Danush Khanna, Krishna Kumar, Basab Ghosh, Vinija Jain, Vasu Sharma, Aman Chadha, Amitava Das

Adversarial threats against LLMs are escalating faster than current defenses can adapt. We expose a critical geometric blind spot in alignment: adversarial prompts exploit latent camouflage, embedding perilously close to the safe representation manifold while encoding unsafe intent, thereby evading surface-level defenses like Direct Preference Optimization (DPO), which remain blind to the latent geometry. We introduce ALKALI, the first rigorously curated adversarial benchmark and the most comprehensive to date, spanning 9,000 prompts across three macro categories, six subtypes, and fifteen attack families. Evaluation of 21 leading LLMs reveals alarmingly high Attack Success Rates (ASRs) across both open- and closed-source models, exposing an underlying vulnerability we term latent camouflage, a structural blind spot where adversarial completions mimic the latent geometry of safe ones. To mitigate this vulnerability, we introduce GRACE (Geometric Representation Aware Contrastive Enhancement), an alignment framework coupling preference learning with latent space regularization. GRACE enforces two constraints: latent separation between safe and adversarial completions, and adversarial cohesion among unsafe and jailbreak behaviors. These operate over layerwise pooled embeddings guided by a learned attention profile, reshaping internal geometry without modifying the base model, and achieve up to 39% ASR reduction. Moreover, we introduce AVQI, a geometry-aware metric that quantifies latent alignment failure via cluster separation and compactness. AVQI reveals when unsafe completions mimic the geometry of safe ones, offering a principled lens into how models internally encode safety. We make the code publicly available at https://anonymous.4open.science/r/alkali-B416/README.md.



Article 511

Title@2025-06-11 (3): Chem42: a Family of chemical Language Models for Target-aware Ligand Generation

Title: Chem42: a Family of chemical Language Models for Target-aware Ligand Generation Chem42: eine Familie chemischer Sprachmodelle für die zielbewusste Ligand-Generation Chem42: “ 目标觉知长者 “ 化学语言模式家庭 2503.16563v2

Authors (7): Aahan Singh, Engin Tekin, Maryam Nadeem, Nancy A. ElNaker, Mohammad Amaan Sayeed, Natalia Vassilieva, Boulbaba Ben Amor

Revolutionizing drug discovery demands more than just understanding molecular interactions; it requires generative models that can design novel ligands tailored to specific biological targets. While chemical Language Models (cLMs) have made strides in learning molecular properties, most fail to incorporate target-specific insights, restricting their ability to drive de-novo ligand generation. Chem42, a cutting-edge family of generative chemical Language Models, is designed to bridge this gap. By integrating atomic-level interactions with multimodal inputs from Prot42, a complementary protein Language Model, Chem42 achieves a sophisticated cross-modal representation of molecular structures, interactions, and binding patterns. This innovative framework enables the creation of structurally valid, synthetically accessible ligands with enhanced target specificity. Evaluations across diverse protein targets confirm that Chem42 surpasses existing approaches in chemical validity, target-aware design, and predicted binding affinity. By reducing the search space of viable drug candidates, Chem42 could accelerate the drug discovery pipeline, offering a powerful generative AI tool for precision medicine. Our Chem42 models set a new benchmark in molecule property prediction, conditional molecule generation, and target-aware ligand design. The models are publicly available at huggingface.co/inceptionai.



Article 512

Title@2025-06-11 (3): Synergizing Reinforcement Learning and Genetic Algorithms for Neural Combinatorial Optimization

Title: Synergizing Reinforcement Learning and Genetic Algorithms for Neural Combinatorial Optimization Synergisierendes Verstärkungslernen und genetische Algorithmen für die neurale Kombinatorialoptimierung 实现神经组合优化的强化强化学习和遗传数值同步化 2506.09404v1

Authors (5): Shengda Gu, Kai Li, Junliang Xing, Yifan Zhang, Jian Cheng

Combinatorial optimization problems are notoriously challenging due to their discrete structure and exponentially large solution space. Recent advances in deep reinforcement learning (DRL) have enabled learning heuristics directly from data. However, DRL methods often suffer from limited exploration and susceptibility to local optima. On the other hand, evolutionary algorithms such as Genetic Algorithms (GAs) exhibit strong global exploration capabilities but are typically sample inefficient and computationally intensive. In this work, we propose the Evolutionary Augmentation Mechanism (EAM), a general and plug-and-play framework that synergizes the learning efficiency of DRL with the global search power of GAs. EAM operates by generating solutions from a learned policy and refining them through domain-specific genetic operations such as crossover and mutation. These evolved solutions are then selectively reinjected into the policy training loop, thereby enhancing exploration and accelerating convergence. We further provide a theoretical analysis that establishes an upper bound on the KL divergence between the evolved solution distribution and the policy distribution, ensuring stable and effective policy updates. EAM is model-agnostic and can be seamlessly integrated with state-of-the-art DRL solvers such as the Attention Model, POMO, and SymNCO. Extensive results on benchmark problems (e.g., TSP, CVRP, PCTSP, and OP) demonstrate that EAM significantly improves both solution quality and training efficiency over competitive baselines.
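The "sample from a policy, then refine with crossover and mutation" step can be illustrated on a toy TSP instance. This is a minimal sketch under stated assumptions, not the authors' implementation: the learned policy is replaced by random permutations, and order crossover, swap mutation, and elitist selection are generic GA choices that the abstract does not specify. All function names are illustrative.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of a closed tour over 2-D city coordinates."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def order_crossover(parent_a, parent_b, rng):
    """Copy a slice of parent_a, fill the remaining slots in parent_b's order."""
    n = len(parent_a)
    i, j = sorted(rng.sample(range(n), 2))
    kept = set(parent_a[i:j])
    filler = [c for c in parent_b if c not in kept]
    return filler[:i] + parent_a[i:j] + filler[i:]


def swap_mutation(tour, rng):
    """Return a copy of the tour with two randomly chosen positions swapped."""
    i, j = rng.sample(range(len(tour)), 2)
    mutated = list(tour)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def evolve(population, cities, generations, rng):
    """Refine sampled tours with crossover and mutation, keeping the best ones."""
    size = len(population)
    for _ in range(generations):
        children = []
        for _ in range(size):
            a, b = rng.sample(population, 2)
            children.append(swap_mutation(order_crossover(a, b, rng), rng))
        # Elitist selection: the best tour never gets worse across generations.
        population = sorted(
            population + children, key=lambda t: tour_length(t, cities)
        )[:size]
    return population


rng = random.Random(0)
cities = [(rng.random(), rng.random()) for _ in range(12)]
# Stand-in for sampling from a learned policy: random permutations (assumption).
sampled = [rng.sample(range(12), 12) for _ in range(20)]
evolved = evolve(sampled, cities, generations=30, rng=rng)
best_before = min(tour_length(t, cities) for t in sampled)
best_after = tour_length(evolved[0], cities)
print(f"best sampled tour: {best_before:.3f}, best evolved tour: {best_after:.3f}")
```

In the framework described above, the tours in `evolved` would be the candidates selectively fed back into policy training; that training step is omitted here.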



Article 513

Title@2025-06-11 (3): Pre-trained Large Language Models Learn Hidden Markov Models In-context

Title: Pre-trained Large Language Models Learn Hidden Markov Models In-context Vortrainierte große Sprachmodelle lernen versteckte Markov-Modelle In-Kontext 受过培训的大型语言模型学习隐藏Markov模型 2506.07298v2

Authors (5): Yijia Dai, Zhaolin Gao, Yahya Sattar, Sarah Dean, Jennifer J. Sun

Hidden Markov Models (HMMs) are foundational tools for modeling sequential data with latent Markovian structure, yet fitting them to real-world data remains computationally challenging. In this work, we show that pre-trained large language models (LLMs) can effectively model data generated by HMMs via in-context learning (ICL), their ability to infer patterns from examples within a prompt. On a diverse set of synthetic HMMs, LLMs achieve predictive accuracy approaching the theoretical optimum. We uncover novel scaling trends influenced by HMM properties, and offer theoretical conjectures for these empirical observations. We also provide practical guidelines for scientists on using ICL as a diagnostic tool for complex data. On real-world animal decision-making tasks, ICL achieves competitive performance with models designed by human experts. To our knowledge, this is the first demonstration that ICL can learn and predict HMM-generated sequences, an advance that deepens our understanding of in-context learning in LLMs and establishes its potential as a powerful tool for uncovering hidden structure in complex scientific data.
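For intuition about what a "theoretical optimum" for next-symbol prediction on HMM data might look like, the sketch below samples a sequence from a small discrete HMM and computes the predictive distribution with the standard forward (filtering) recursion, using the true parameters. The assumption that the abstract's optimum corresponds to this oracle predictor is ours, and the two-state parameters are made up for illustration.

```python
import random


def sample_hmm(transition, emission, initial, length, rng):
    """Draw an observation sequence of the given length from a discrete HMM."""
    states = range(len(initial))
    symbols = range(len(emission[0]))
    state = rng.choices(states, weights=initial)[0]
    observations = []
    for _ in range(length):
        observations.append(rng.choices(symbols, weights=emission[state])[0])
        state = rng.choices(states, weights=transition[state])[0]
    return observations


def next_symbol_distribution(transition, emission, initial, observations):
    """Forward filtering: P(next observation | observations so far)."""
    n_states = len(initial)
    belief = list(initial)  # P(current state | past observations)
    for obs in observations:
        # Condition on the observed symbol, then normalise.
        belief = [belief[s] * emission[s][obs] for s in range(n_states)]
        total = sum(belief)
        belief = [b / total for b in belief]
        # Propagate one step through the transition matrix.
        belief = [
            sum(belief[s] * transition[s][t] for s in range(n_states))
            for t in range(n_states)
        ]
    n_symbols = len(emission[0])
    return [
        sum(belief[s] * emission[s][o] for s in range(n_states))
        for o in range(n_symbols)
    ]


# Hypothetical two-state, two-symbol HMM used only for illustration.
transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.8, 0.2], [0.3, 0.7]]
initial = [0.5, 0.5]

sequence = sample_hmm(transition, emission, initial, 50, random.Random(0))
prediction = next_symbol_distribution(transition, emission, initial, sequence)
print("P(next symbol | sequence) =", [round(p, 3) for p in prediction])
```

An in-context learner sees only `sequence`, not the parameters, so comparing its next-symbol predictions against `prediction` is one way to measure how close it gets to the oracle.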



Article 514

Title@2025-06-11 (3): LLM Enhancers for GNNs: An Analysis from the Perspective of Causal Mechanism Identification

Title: LLM Enhancers for GNNs: An Analysis from the Perspective of Causal Mechanism Identification LLM Enhancers for GNNs: Eine Analyse aus der Perspektive der Identifizierung von Kausalmechanismen GNN的LLM增强者:从因果关系机制的确定角度进行分析 2505.08265v3

Authors (6): Hang Gao, Wenxuan Huang, Fengge Wu, Junsuo Zhao, Changwen Zheng, Huaping Liu

The use of large language models (LLMs) as feature enhancers to optimize node representations, which are then used as inputs for graph neural networks (GNNs), has shown significant potential in graph representation learning. However, the fundamental properties of this approach remain underexplored. To address this, we propose a more in-depth analysis based on the interchange intervention method. First, we construct a synthetic graph dataset with controllable causal relationships, enabling precise manipulation of semantic relationships and causal modeling to provide data for analysis. Using this dataset, we conduct interchange interventions to examine the deeper properties of LLM enhancers and GNNs, uncovering their underlying logic and internal mechanisms. Building on the analytical results, we design a plug-and-play optimization module to improve the information transfer between LLM enhancers and GNNs. Experiments across multiple datasets and models validate the proposed module.



Article 515

Title@2025-06-11 (3): A theoretical basis for model collapse in recursive training

Title: A theoretical basis for model collapse in recursive training Eine theoretische Grundlage für Modellkollaps in rekursiver Ausbildung 循环培训模式崩溃的理论基础 2506.09401v1

Authors (1): Vivek Shripad Borkar

It is known that recursive training from generative models can lead to the so-called 'collapse' of the simulated probability distribution. This note shows that one in fact gets two different asymptotic behaviours depending on whether an external source, howsoever minor, is also contributing samples.
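
The setting can be made concrete with a toy simulation. The sketch below is an illustration under strong simplifying assumptions, not the note's analysis: the "model" is just the empirical distribution of the previous generation, "training" is resampling from it, and the names `recursive_training`, `external`, and `external_fraction` are hypothetical.

```python
import random


def recursive_training(initial, generations, external=None,
                       external_fraction=0.0, seed=0):
    """Resample a dataset from itself for several generations.

    Each new sample comes from the previous generation's data, or,
    with probability external_fraction, from an external source.
    Returns the final data and the number of distinct values per generation.
    """
    rng = random.Random(seed)
    data = list(initial)
    n = len(data)
    history = [len(set(data))]
    for _ in range(generations):
        new_data = []
        for _ in range(n):
            if external is not None and rng.random() < external_fraction:
                new_data.append(rng.choice(external))  # fresh outside sample
            else:
                new_data.append(rng.choice(data))      # sample from the "model"
        data = new_data
        history.append(len(set(data)))
    return data, history


initial = list(range(20))
closed_data, closed_hist = recursive_training(initial, 50)
open_data, open_hist = recursive_training(
    initial, 50, external=[100, 101], external_fraction=0.1
)
print(closed_hist[-1], open_hist[-1])
```

In the closed loop a value that is lost can never reappear, so the count of distinct values can only shrink; with an external source, outside values can keep entering the data, which is the qualitative distinction the abstract draws.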



Article 516

Title@2025-06-11 (3): Efficient Prediction of SO(3)-Equivariant Hamiltonian Matrices via SO(2) Local Frames

Title: Efficient Prediction of SO(3)-Equivariant Hamiltonian Matrices via SO(2) Local Frames Effiziente Vorhersage von SO(3)-Equivarienten Hamiltonian Matrices über SO(2) Local Frames 通过SO(2)地方框架有效预测SO(3)-等质汉密尔顿母体 2506.09398v1

Authors (5): Haiyang Yu, Yuchao Lin, Xuan Zhang, Xiaofeng Qian, Shuiwang Ji

We consider the task of predicting Hamiltonian matrices to accelerate electronic structure calculations, which play an important role in physics, chemistry, and materials science. Motivated by the inherent relationship between the off-diagonal blocks of the Hamiltonian matrix and the SO(2) local frame, we propose a novel and efficient network, called QHNetV2, that achieves global SO(3) equivariance without the costly SO(3) Clebsch-Gordan tensor products. This is achieved by introducing a set of new efficient and powerful SO(2)-equivariant operations and performing all off-diagonal feature updates and message passing within SO(2) local frames, thereby eliminating the need for SO(3) tensor products. Moreover, a continuous SO(2) tensor product is performed within the SO(2) local frame at each node to fuse node features, mimicking the symmetric contraction operation. Extensive experiments on the large QH9 and MD17 datasets demonstrate that our model achieves superior performance across a wide range of molecular structures and trajectories, highlighting its strong generalization capability. The proposed SO(2) operations on SO(2) local frames offer a promising direction for scalable and symmetry-aware learning of electronic structures. Our code will be released as part of the AIRS library https://github.com/divelab/AIRS.



Article 517

Title@2025-06-11 (3): The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent

Title: The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent Der Computational Advantage of Depth: High-Dimensional Hierarchical Funktionen mit gradienten Abstieg lernen 深度的计算优势:学习带有渐变后代的高多元等级功能 2502.13961v3

Authors (4): Yatin Dandi, Luca Pesce, Lenka Zdeborová, Florent Krzakala

Understanding the advantages of deep neural networks trained by gradient descent (GD) compared to shallow models remains an open theoretical challenge. In this paper, we introduce a class of target functions (single and multi-index Gaussian hierarchical targets) that incorporate a hierarchy of latent subspace dimensionalities. This framework enables us to analytically study the learning dynamics and generalization performance of deep networks compared to shallow ones in the high-dimensional limit. Specifically, our main theorem shows that feature learning with GD successively reduces the effective dimensionality, transforming a high-dimensional problem into a sequence of lower-dimensional ones. This enables learning the target function with drastically fewer samples than with shallow networks. While the results are proven in a controlled training setting, we also discuss more common training procedures and argue that they learn through the same mechanisms.



Article 518

Title@2025-06-11 (3): SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving

Title: SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving SLED: Ein spekulatives LLM-Decoding-Framework für effizientes Edge Serving SLED: 有效边缘服务投机性LLM代谢框架 2506.09397v1

Authors (5): Xiangchen Li, Dimitrios Spatharakis, Saeid Ghafouri, Jiakun Fan, Dimitrios Nikolopoulos

Despite advancements in device capabilities, efficient inference of advanced large language models (LLMs) at the edge remains challenging due to limited device memory and power constraints. Existing strategies, such as aggressive quantization, pruning, or remote inference, trade accuracy for efficiency or lead to substantial cost burdens. This position paper introduces a new approach that leverages speculative decoding, previously viewed primarily as a decoding acceleration technique for autoregressive generation of LLMs, as a promising approach specifically adapted for edge computing by orchestrating computation across heterogeneous devices. We propose SLED, a method that allows lightweight edge devices to draft multiple candidate tokens locally using diverse draft models, while a single, shared edge server efficiently batches and verifies the tokens utilizing a more precise target model. This approach supports device heterogeneity and reduces server-side memory footprint by avoiding the need to deploy multiple target models. Our initial experiments with Jetson Orin Nano, Raspberry Pi 5, and an RTX 6000 edge server indicate substantial benefits: significantly reduced latency, improved energy efficiency, and increased concurrent inference sessions, all without sacrificing model accuracy.
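
The draft-then-verify idea behind speculative decoding can be sketched in a few lines. This is a generic greedy variant with toy stand-in models, not SLED's implementation: `target_next` and `draft_next` are hypothetical callables that map a token list to the next token, and the batching across devices described in the abstract is not modeled.

```python
def greedy_decode(target_next, prompt, max_new_tokens):
    """Reference: decode with the target model alone, one token at a time."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Greedy speculative decoding: draft up to k tokens, then verify them."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1) The cheap draft model (edge device) proposes a short continuation.
        draft = []
        for _ in range(min(k, goal - len(tokens))):
            draft.append(draft_next(tokens + draft))
        # 2) The target model (server) checks the proposals in order.
        #    A real system would score all draft positions in one batched pass.
        accepted = []
        for proposed in draft:
            expected = target_next(tokens + accepted)
            if proposed == expected:
                accepted.append(proposed)
            else:
                accepted.append(expected)  # correct the first mismatch and stop
                break
        tokens.extend(accepted)
    return tokens


# Toy models: the target counts modulo 10; the draft is wrong after a 4.
target = lambda seq: (seq[-1] + 1) % 10
draft = lambda seq: 0 if seq[-1] == 4 else (seq[-1] + 1) % 10

out = speculative_decode(target, draft, [0], 12)
print(out == greedy_decode(target, [0], 12))
```

Because every accepted token is one the target model would have produced itself, this greedy variant returns the same sequence as target-only decoding; the draft model only affects how many tokens are confirmed per verification round.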



Article 519

Title@2025-06-11 (3): A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities

Title: A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities Ein genauerer Blick auf TabPFN v2: seine Stärken verstehen und seine Fähigkeiten erweitern 更仔细地看看TabPFN v 2:了解其力量并扩大其能力 2502.17361v2

Authors (3): Han-Jia Ye, Si-Yang Liu, Wei-Lun Chao

Tabular datasets are inherently heterogeneous, presenting significant challenges for developing pre-trained foundation models. The recently introduced transformer-based Tabular Prior-data Fitted Network v2 (TabPFN v2) achieves unprecedented in-context learning performance across diverse downstream datasets, marking a pivotal advancement in tabular foundation models. In this paper, we take a closer look at TabPFN v2 to examine how it effectively handles heterogeneity and achieves high predictive accuracy, and to explore how its limitations in high-dimensional, many-category, and large-scale tasks can be mitigated. We find that TabPFN v2 can infer attribute relationships even when provided with randomized attribute token inputs, eliminating the need to explicitly learn dataset-specific attribute embeddings to address heterogeneity. We further show that TabPFN v2 can be transformed into a feature extractor, revealing its ability to construct a highly separable feature space for accurate predictions. Lastly, we demonstrate that TabPFN v2’s limitations can be addressed through a test-time divide-and-conquer strategy, enabling scalable inference without requiring re-training. By uncovering the mechanisms behind TabPFN v2’s success and introducing strategies to extend its applicability, this study offers key insights into the design of future tabular foundation models.



Article 520

Title@2025-06-11 (3): MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models

Title: MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models MedChat: Multi-Agenten-Framework für multimodale Diagnosen mit großen Sprachmodellen MedChat:使用大语言模式的多语言多模式诊断多机构框架 2506.07400v2

Authors (9): Philip R. Liu, Sparsh Bansal, Jimmy Dinh, Aditya Pawar, Ramani Satishkumar, Shail Desai, Neeraj Gupta, Xin Wang, Shu Hu

The integration of deep learning-based glaucoma detection with large language models (LLMs) presents an automated strategy to mitigate ophthalmologist shortages and improve clinical reporting efficiency. However, applying general LLMs to medical imaging remains challenging due to hallucinations, limited interpretability, and insufficient domain-specific medical knowledge, which can potentially reduce clinical accuracy. Although recent approaches combining imaging models with LLM reasoning have improved reporting, they typically rely on a single generalist agent, restricting their capacity to emulate the diverse and complex reasoning found in multidisciplinary medical teams. To address these limitations, we propose MedChat, a multi-agent diagnostic framework and platform that combines specialized vision models with multiple role-specific LLM agents, all coordinated by a director agent. This design enhances reliability, reduces hallucination risk, and enables interactive diagnostic reporting through an interface tailored for clinical review and educational use. Code available at https://github.com/Purdue-M2/MedChat.



Article 521

Title@2025-06-11 (3): Spatial-RAG: Spatial Retrieval Augmented Generation for Real-World Geospatial Reasoning Questions

Title: Spatial-RAG: Spatial Retrieval Augmented Generation for Real-World Geospatial Reasoning Questions Raum-RAG: Raum-Retrieval Augmented Generation für Real-World Geospatial Reasoning Fragen 空间-RAG:世界现实地理空间原因问题空间检索增加一代 2502.18470v5

Authors (6): Dazhou Yu, Riyang Bao, Ruiyu Ning, Jinghong Peng, Gengchen Mai, Liang Zhao

Answering real-world geospatial questions–such as finding restaurants along a travel route or amenities near a landmark–requires reasoning over both geographic relationships and semantic user intent. However, existing large language models (LLMs) lack spatial computing capabilities and access to up-to-date, ubiquitous real-world geospatial data, while traditional geospatial systems fall short in interpreting natural language. To bridge this gap, we introduce Spatial-RAG, a Retrieval-Augmented Generation (RAG) framework designed for geospatial question answering. Spatial-RAG integrates structured spatial databases with LLMs via a hybrid spatial retriever that combines sparse spatial filtering and dense semantic matching. It formulates the answering process as a multi-objective optimization over spatial and semantic relevance, identifying Pareto-optimal candidates and dynamically selecting the best response based on user intent. Experiments across multiple tourism and map-based QA datasets show that Spatial-RAG significantly improves accuracy, precision, and ranking performance over strong baselines.
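
The Pareto-optimal selection step can be illustrated with a small sketch. It assumes each candidate already has a spatial and a semantic relevance score in [0, 1]; the weighted sum used for the final pick is a simple stand-in for the paper's intent-based selection, and all names and scores below are hypothetical.

```python
def pareto_front(candidates):
    """Keep candidates that no other candidate dominates.

    candidates: list of (name, spatial_score, semantic_score), higher is better.
    A candidate is dominated if another one is at least as good on both
    scores and strictly better on at least one.
    """
    front = []
    for name, sp, se in candidates:
        dominated = any(
            sp2 >= sp and se2 >= se and (sp2 > sp or se2 > se)
            for _, sp2, se2 in candidates
        )
        if not dominated:
            front.append((name, sp, se))
    return front


def select(candidates, spatial_weight):
    """Pick one Pareto-optimal candidate using an intent-dependent weight."""
    front = pareto_front(candidates)
    return max(
        front,
        key=lambda c: spatial_weight * c[1] + (1 - spatial_weight) * c[2],
    )


candidates = [
    ("cafe_a", 0.9, 0.3),  # very close, weak semantic match
    ("cafe_b", 0.6, 0.7),
    ("cafe_c", 0.2, 0.9),  # far away, strong semantic match
    ("cafe_d", 0.5, 0.5),  # dominated by cafe_b
]
print([c[0] for c in pareto_front(candidates)])
print(select(candidates, spatial_weight=0.8)[0])
```

Restricting the final choice to the Pareto front guarantees that the answer is never a candidate that is worse on both criteria than some alternative, whatever weight the user intent implies.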



Article 522

Title@2025-06-11 (3): Coil2Coil: Self-supervised MR image denoising using phased-array coil images

Title: Coil2Coil: Self-supervised MR image denoising using phased-array coil images Coil2Coil: Selbstüberwachtes MR-Bild, das mit Hilfe von Phased-Array-Coil-Bildern dämpft Coil2Cooil:使用相相继阵状圆圈图像自行监督的 MR 图像除去污染 2208.07552v2

Authors (6): Juhyung Park, Dongwon Park, Sooyeon Ji, Hyeong-Geol Shin, Se Young Chun, Jongho Lee

Denoising of magnetic resonance images is beneficial in improving the quality of low signal-to-noise ratio images. Recently, denoising using deep neural networks has demonstrated promising results. Most of these networks, however, utilize supervised learning, which requires large training sets of paired noise-corrupted and clean images. Obtaining training images, particularly clean images, is expensive and time-consuming. Hence, methods such as Noise2Noise (N2N) that require only pairs of noise-corrupted images have been developed to reduce the burden of obtaining training datasets. In this study, we propose a new self-supervised denoising method, Coil2Coil (C2C), that does not require the acquisition of clean images or paired noise-corrupted images for training. Instead, the method utilizes multichannel data from phased-array coils to generate training images. First, it divides and combines multichannel coil images into two images, one for input and the other for label. Then, they are processed to impose noise independence and sensitivity normalization such that they can be used as the training images of N2N. For inference, the method inputs a coil-combined image (e.g., DICOM image), enabling a wide application of the method. When evaluated using synthetic noise-added images, C2C shows the best performance against several self-supervised methods, reporting comparable outcomes to supervised methods. When tested on DICOM images, C2C successfully denoised real noise without showing structure-dependent residuals in the error maps. Because of the significant advantage of not requiring additional scans for clean or paired images, the method can be easily utilized for various clinical applications.



Article 523

Title@2025-06-11 (3): Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation

Title: Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation Die neuen Diffusionsmodelle: Vom Generativen Pre-Training bis zur Ein-Schritt-Generation 重新审视传播模式:从产生培训前培训到单步一代 2506.09376v1

Authors (2): Bowen Zheng, Tianming Yang

Diffusion distillation is a widely used technique to reduce the sampling cost of diffusion models, yet it often requires extensive training, and the student performance tends to be degraded. Recent studies show that incorporating a GAN objective may alleviate these issues, yet the underlying mechanism remains unclear. In this work, we first identify a key limitation of distillation: mismatched step sizes and parameter numbers between the teacher and the student model lead them to converge to different local minima, rendering direct imitation suboptimal. We further demonstrate that a standalone GAN objective, without relying on a distillation loss, overcomes this limitation and is sufficient to convert diffusion models into efficient one-step generators. Based on this finding, we propose that diffusion training may be viewed as a form of generative pre-training, equipping models with capabilities that can be unlocked through lightweight GAN fine-tuning. Supporting this view, we create a one-step generation model by fine-tuning a pre-trained model with 85% of parameters frozen, achieving strong performance with only 0.2M images and near-SOTA results with 5M images. We further present a frequency-domain analysis that may explain the one-step generative capability gained in diffusion training. Overall, our work provides a new perspective for diffusion training, highlighting its role as a powerful generative pre-training process, which can be the basis for building efficient one-step generation models.



Article 524

Title@2025-06-11 (3): Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining

Title: Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining Modality-Balancing Preference Optimierung großer multimodaler Modelle durch disversariale Negative Mining 通过反反向负负采矿优化大型多式联运模型 2506.08022v2

Authors (7): Chenxi Liu, Tianyi Xiong, Ruibo Chen, Yihan Wu, Junfeng Guo, Tianyi Zhou, Heng Huang

The task adaptation and alignment of Large Multimodal Models (LMMs) have been significantly advanced by instruction tuning and further strengthened by recent preference optimization. Yet, most LMMs still suffer from severe modality imbalance during reasoning, i.e., language prior biases outweighing visual inputs, which bottlenecks their generalization to downstream tasks and causes hallucinations. However, existing preference optimization approaches for LMMs do not focus on restraining the internal biases of their Large Language Model (LLM) backbones when curating the training data. Moreover, they heavily rely on offline data and lack the capacity to explore diverse responses adaptive to dynamic distributional shifts during training. Meanwhile, Group Relative Policy Optimization (GRPO), a recent method using online-generated data and verified rewards to improve reasoning capabilities, remains largely underexplored in LMM alignment. In this paper, we propose a novel preference learning framework, Modality-Balancing Preference Optimization (MBPO), to address the modality imbalance in LMMs. MBPO constructs a more effective offline preference dataset by generating hard negatives, i.e., rejected responses misled by LLM biases due to limited usage of visual information, through adversarial perturbation of input images. Moreover, MBPO leverages the easy-to-verify nature of closed-ended tasks to generate online responses with verified rewards. GRPO is then employed to train the model with offline-online hybrid data. Extensive experiments demonstrate that MBPO can enhance LMM performance on challenging vision-language tasks and effectively reduce hallucinations.
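
The "group relative" part of GRPO can be shown in isolation: several responses are sampled for the same prompt, each receives a verified reward, and each response's advantage is its reward standardized within the group. The sketch below covers only that normalization step, under the assumption that the population standard deviation is used; the function name and `eps` are illustrative, and MBPO's hybrid data pipeline is not modeled.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption here
    # eps keeps the division finite when every response gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one closed-ended question: 1 = verified correct.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Responses that beat their group's average get positive advantages and the rest get negative ones, so no separate value network is needed; when all responses in a group receive the same reward, every advantage is zero and that group contributes no learning signal.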



Article 525

Title@2025-06-11 (3): Effective Regularization Through Loss-Function Metalearning

Title: Effective Regularization Through Loss-Function Metalearning Effektive Regularisierung durch Loss-Function Metalearning 通过损失 – – 职能计算学习实现有效正规化 2010.00788v5

Authors (3): Santiago Gonzalez, Xin Qiu, Risto Miikkulainen

Evolutionary computation can be used to optimize several different aspects of neural network architectures. For instance, the TaylorGLO method discovers novel, customized loss functions, resulting in improved performance, faster training, and improved data utilization. A likely reason is that such functions discourage overfitting, leading to effective regularization. This paper demonstrates theoretically that this is indeed the case for TaylorGLO. Learning rule decomposition reveals that evolved loss functions balance two factors: the pull toward zero error, and a push away from it to avoid overfitting. This is a general principle that may be used to understand other regularization techniques as well (as demonstrated in this paper for label smoothing). The theoretical analysis leads to a constraint that can be utilized to find more effective loss functions in practice; the mechanism also results in networks that are more robust (as demonstrated in this paper with adversarial inputs). The analysis in this paper thus constitutes a first step towards understanding regularization, and demonstrates the power of evolutionary neural architecture search in general.
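
The pull/push balance is easy to see for label smoothing, the example the abstract mentions. For softmax cross-entropy with smoothed targets q, the gradient with respect to the logits is p - q, so the true-class logit is pushed up while p_true is below its target and pushed down once it overshoots. The sketch below illustrates only that standard identity; it is not the paper's learning-rule decomposition, and the function names are chosen for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_targets(true_class, n_classes, eps):
    """Label-smoothed target distribution: mostly one-hot, eps spread evenly."""
    return [
        (1 - eps) * (1.0 if k == true_class else 0.0) + eps / n_classes
        for k in range(n_classes)
    ]


def logit_gradient(logits, true_class, eps):
    """Gradient of cross-entropy with smoothed targets w.r.t. the logits."""
    p = softmax(logits)
    q = smoothed_targets(true_class, len(logits), eps)
    return [pk - qk for pk, qk in zip(p, q)]


# Under-confident prediction: negative gradient, so descent raises the logit.
print(logit_gradient([0.0, 0.0, 0.0], true_class=0, eps=0.1)[0])
# Over-confident prediction: positive gradient, so descent lowers the logit.
print(logit_gradient([10.0, 0.0, 0.0], true_class=0, eps=0.1)[0])
```

With eps = 0 the target is one-hot and the gradient on the true class is never positive, so only the pull toward zero error remains.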



Article 526

Title@2025-06-11 (3): Low-resource domain adaptation while minimizing energy and hardware resource consumption

Title: Low-resource domain adaptation while minimizing energy and hardware resource consumption Low-Resource-Domänenanpassung bei Minimierung des Energie- und Hardware-Ressourcenverbrauchs 低资源领域的适应,同时尽量减少能源和硬件资源的消耗 2506.08433v2

Authors (3): Hernán Maina, Nicolás Wolovick, Luciana Benotti

Training Large Language Models (LLMs) is costly in terms of energy, hardware, and annotated data, often resulting in a positionality rooted in predominant cultures and values (Santy et al., 2023). Domain adaptation has emerged as a promising strategy to better align models with diverse cultural and value contexts (Hershcovich et al., 2022), but its computational cost remains a significant barrier, particularly for research groups lacking access to large-scale infrastructure. In this paper, we evaluate how the use of different numerical precision formats and data parallelization strategies impacts both training speed (as a proxy for energy and hardware consumption) and model accuracy, with the goal of facilitating domain adaptation in low-resource environments. Our findings are relevant to any setting where energy efficiency, accessibility, or limited hardware availability are key concerns.



Article 527

Title@2025-06-11 (3): TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation

Title: TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation TimeDART: Ein Diffusion Autoregressiver Transformer für selbstüberwachte Zeitreihendarstellung TimeDART: 自动控制时间序列代表的传播自动递减变动器 2410.05711v5

Authors (4): Daoyu Wang, Mingyue Cheng, Zhiding Liu, Qi Liu

Self-supervised learning has garnered increasing attention in time series analysis for benefiting various downstream tasks and reducing reliance on labeled data. Despite its effectiveness, existing methods often struggle to comprehensively capture both long-term dynamic evolution and subtle local patterns in a unified manner. In this work, we propose TimeDART, a novel self-supervised time series pre-training framework that unifies two powerful generative paradigms to learn more transferable representations. Specifically, we first employ a causal Transformer encoder, accompanied by a patch-based embedding strategy, to model the evolving trends from left to right. Building on this global modeling, we further introduce a denoising diffusion process to capture fine-grained local patterns through forward diffusion and reverse denoising. Finally, we optimize the model in an autoregressive manner. As a result, TimeDART effectively accounts for both global and local sequence features in a coherent way. We conduct extensive experiments on public datasets for time series forecasting and classification. The experimental results demonstrate that TimeDART consistently outperforms the compared methods, validating the effectiveness of our approach. Our code is available at https://github.com/Melmaphother/TimeDART.



Article 528

Title@2025-06-11 (3): LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization

Title: LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization LPO: Auf dem Weg zu einer präzisen Interaktion der GUI-Agenten über Standortpräferenzoptimierung LPO:通过位置优化实现准确的图形界面代理互动 2506.09373v1

Authors (12): Jiaqi Tang, Yu Xia, Yi-Feng Wu, Yuwei Hu, Yuhui Chen, Qing-Guo Chen, Xiaogang Xu, Xiangyu Wu, Hao Lu, Yanqing Ma, Shiyin Lu, Qifeng Chen

The advent of autonomous agents is transforming interactions with Graphical User Interfaces (GUIs) by employing natural language as a powerful intermediary. Despite the predominance of Supervised Fine-Tuning (SFT) methods in current GUI agents for achieving spatial localization, these methods face substantial challenges due to their limited capacity to accurately perceive positional data. Existing strategies, such as reinforcement learning, often fail to assess positional accuracy effectively, thereby restricting their utility. In response, we introduce Location Preference Optimization (LPO), a novel approach that leverages locational data to optimize interaction preferences. LPO uses information entropy to predict interaction positions by focusing on zones rich in information. It further introduces a dynamic location reward function based on physical distance, reflecting the varying importance of interaction positions. Supported by Group Relative Preference Optimization (GRPO), LPO facilitates an extensive exploration of GUI environments and significantly enhances interaction precision. Comprehensive experiments demonstrate LPO’s superior performance, achieving SOTA results across both offline benchmarks and real-world online evaluations. Our code will be made publicly available soon, at https://github.com/AIDC-AI/LPO.



Article 529

Title: Learnable Spatial-Temporal Positional Encoding for Link Prediction Erlernbare räumlich-temporale Positionskodierung für Link Prediction 链接预测的空间-临时定位编码 2506.08309v2

Authors (5): Katherine Tieu, Dongqi Fu, Zihao Li, Ross Maciejewski, Jingrui He

Accurate predictions rely on the expressive power of graph deep learning frameworks like graph neural networks and graph transformers, where a positional encoding mechanism has become much more indispensable in recent state-of-the-art works to record the canonical position information. However, the current positional encoding is limited in three aspects: (1) most positional encoding methods use pre-defined and fixed functions, which are inadequate to adapt to the complex attributed graphs; (2) a few pioneering works proposed the learnable positional encoding but are still limited to the structural information, not considering the real-world time-evolving topological and feature information; (3) most positional encoding methods are equipped with transformers’ attention mechanism to fully leverage their capabilities, where the dense or relational attention is often unaffordable on large-scale structured data. Hence, we aim to develop Learnable Spatial-Temporal Positional Encoding in an effective and efficient manner and propose a simple temporal link prediction model named L-STEP. Briefly, for L-STEP, we (1) prove the proposed positional learning scheme can preserve the graph property from the spatial-temporal spectral viewpoint, (2) verify that MLPs can fully exploit the expressiveness and reach transformers’ performance on that encoding, (3) change different initial positional encoding inputs to show robustness, (4) analyze the theoretical complexity and obtain less empirical running time than SOTA, and (5) demonstrate its temporal link prediction out-performance on 13 classic datasets and with 10 algorithms in both transductive and inductive settings using 3 different sampling strategies. Also, L-STEP obtains the leading performance in the newest large-scale TGB benchmark. Our code is available at https://github.com/kthrn22/L-STEP.



Article 530

Title@2025-06-11 (3): Anomaly Detection and Generation with Diffusion Models: A Survey

Title: Anomaly Detection and Generation with Diffusion Models: A Survey Anomalieerkennung und -generierung mit Diffusionsmodellen: Eine Umfrage 使用传播模型进行异常检测和生成:调查 2506.09368v1

Authors (10): Yang Liu, Jing Liu, Chengfang Li, Rui Xi, Wenchao Li, Liang Cao, Jin Wang, Laurence T. Yang, Junsong Yuan, Wei Zhou

Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing, by identifying unexpected patterns that deviate from established norms in real-world data. Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest due to their ability to learn complex data distributions and generate high-fidelity samples, offering a robust framework for unsupervised AD. In this survey, we comprehensively review anomaly detection and generation with diffusion models (ADGDM), presenting a tutorial-style analysis of the theoretical foundations and practical implementations and spanning images, videos, time series, tabular, and multimodal data. Crucially, unlike existing surveys that often treat anomaly detection and generation as separate problems, we highlight their inherent synergistic relationship. We reveal how DMs enable a reinforcing cycle where generation techniques directly address the fundamental challenge of anomaly data scarcity, while detection methods provide critical feedback to improve generation fidelity and relevance, advancing both capabilities beyond their individual potential. A detailed taxonomy categorizes ADGDM methods based on anomaly scoring mechanisms, conditioning strategies, and architectural designs, analyzing their strengths and limitations. We finally discuss key challenges including scalability and computational efficiency, and outline promising future directions such as efficient architectures, conditioning strategies, and integration with foundation models (e.g., visual-language models and large language models). By synthesizing recent advances and outlining open research questions, this survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.



Article 531

Title@2025-06-11 (3): An Online Learning Approach to Prompt-based Selection of Generative Models and LLMs

Title: An Online Learning Approach to Prompt-based Selection of Generative Models and LLMs Ein Online-Lernansatz zur sofortigen Auswahl generativer Modelle und LLMs 以在线学习方式迅速选择创创模式和LLMs 2410.13287v4

Authors (3): Xiaoyan Hu, Ho-fung Leung, Farzan Farnia

Selecting a sample generation scheme from multiple prompt-based generative models, including large language models (LLMs) and prompt-guided image and video generation models, is typically addressed by choosing the model that maximizes an averaged evaluation score. However, this score-based selection overlooks the possibility that different models achieve the best generation performance for different types of text prompts. An online identification of the best generation model for various input prompts can reduce the costs associated with querying sub-optimal models. In this work, we explore the possibility of varying rankings of text-based generative models for different text prompts and propose an online learning framework to predict the best data generation model for a given input prompt. The proposed PAK-UCB algorithm addresses a contextual bandit (CB) setting with shared context variables across the arms, utilizing the generated data to update kernel-based functions that predict the score of each model available for unseen text prompts. Additionally, we leverage random Fourier features (RFF) to accelerate the online learning process of PAK-UCB. Our numerical experiments on real and simulated text-to-image and image-to-text generative models show that RFF-UCB performs successfully in identifying the best generation model across different sample types. The code is available at: github.com/yannxiaoyanhu/dgm-online-select.
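
Random Fourier features replace a kernel evaluation with an inner product of fixed-length random feature vectors, which is what makes kernel-based online updates cheaper. The sketch below shows the standard construction for a Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2); it is a generic illustration rather than the PAK-UCB code, and `make_rff`, `gamma`, and the sample vectors are assumptions for this example.

```python
import math
import random


def make_rff(dim, n_features, gamma, seed=0):
    """Build a random feature map whose inner products approximate an RBF kernel."""
    rng = random.Random(seed)
    # For exp(-gamma * ||x - y||^2), frequencies are drawn from N(0, 2 * gamma).
    sigma = math.sqrt(2.0 * gamma)
    weights = [
        [rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n_features)
    ]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [
            scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return features


def rbf_kernel(x, y, gamma):
    """Exact Gaussian kernel, used here only to check the approximation."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


phi = make_rff(dim=2, n_features=4000, gamma=1.0)
x, y = [0.2, -0.1], [0.5, 0.3]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
print(approx, rbf_kernel(x, y, gamma=1.0))
```

The approximation error shrinks roughly with the inverse square root of the number of features, so the feature count trades accuracy against the per-update cost.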



Article 532

Title@2025-06-11 (3): SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending

Title: SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending SkillBlender: Auf dem Weg zu einer vielseitigen humanoiden Ganzkörperloko-Manipulation durch Skill Blending 技术人才库:通过技能混合处理,努力通过技能混合处理,实现全体操控和操纵 2506.09366v1

Authors (8): Yuxuan Kuang, Haoran Geng, Amine Elhafsi, Tan-Dzung Do, Pieter Abbeel, Jitendra Malik, Marco Pavone, Yue Wang

Humanoid robots hold significant potential in accomplishing daily tasks across diverse environments thanks to their flexibility and human-like morphology. Recent works have made significant progress in humanoid whole-body control and loco-manipulation leveraging optimal control or reinforcement learning. However, these methods require tedious task-specific tuning to achieve satisfactory behaviors, limiting their versatility and scalability to diverse tasks in daily scenarios. To that end, we introduce SkillBlender, a novel hierarchical reinforcement learning framework for versatile humanoid loco-manipulation. SkillBlender first pretrains goal-conditioned task-agnostic primitive skills, and then dynamically blends these skills to accomplish complex loco-manipulation tasks with minimal task-specific reward engineering. We also introduce SkillBench, a parallel, cross-embodiment, and diverse simulated benchmark containing three embodiments, four primitive skills, and eight challenging loco-manipulation tasks, accompanied by a set of scientific evaluation metrics balancing accuracy and feasibility. Extensive simulated experiments show that our method significantly outperforms all baselines, while naturally regularizing behaviors to avoid reward hacking, resulting in more accurate and feasible movements for diverse loco-manipulation tasks in our daily scenarios. Our code and benchmark will be open-sourced to the community to facilitate future research. Project page: https://usc-gvl.github.io/SkillBlender-web/.



Article 533

Title@2025-06-11 (3): TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Title: TextSquare: Scaling up Text-Centric Visual Instruction Tuning TextSquare: Skalieren von Text-Centric Visual Instruction Tuning TextSquare: 增强文本内容的视觉教学图示 2404.12803v3

Authors (18): Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Yangfan He, Kuan Lu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang

Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data. To this end, we introduce a new approach for creating a massive, high-quality instruction-tuning dataset, Square-10M, which is generated using closed-source MLLMs. The data construction process, termed Square, consists of four steps: Self-Questioning, Answering, Reasoning, and Evaluation. Our experiments with Square-10M led to three key findings: 1) Our model, TextSquare, considerably surpasses previous open-source state-of-the-art text-centric MLLMs and sets a new standard on OCRBench (62.2%). It even outperforms top-tier models like GPT4V and Gemini in 6 of 10 text-centric benchmarks. 2) Additionally, we demonstrate the critical role of VQA reasoning data in offering comprehensive contextual insights for specific questions. This not only improves accuracy but also significantly mitigates hallucinations. Specifically, TextSquare scores an average of 75.1% across four general VQA and hallucination evaluation datasets, outperforming previous state-of-the-art models. 3) Notably, scaling text-centric VQA datasets reveals a clear pattern: model performance improves in direct proportion to an exponential increase in instruction tuning data volume, thereby validating the necessity of the dataset scale and the high quality of Square-10M.


Article 534

Title@2025-06-11 (3): Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models

Title: Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models Automatische Pseudo-Harmful Prompt Generation zur Bewertung falscher Ablehnungen in großen Sprachmodellen 自动生成用于评价大语言模式虚假拒绝的优多有害快速自动生成 2409.00598v2

Authors (6): Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang

Safety-aligned large language models (LLMs) sometimes falsely refuse pseudo-harmful prompts, like “how to kill a mosquito,” which are actually harmless. Frequent false refusals not only frustrate users but also provoke a public backlash against the very values alignment seeks to protect. In this paper, we propose the first method to auto-generate diverse, content-controlled, and model-dependent pseudo-harmful prompts. Using this method, we construct an evaluation dataset called PHTest, which is ten times larger than existing datasets, covers more false refusal patterns, and separately labels controversial prompts. We evaluate 20 LLMs on PHTest, uncovering new insights due to its scale and labeling. Our findings reveal a trade-off between minimizing false refusals and improving safety against jailbreak attacks. Moreover, we show that many jailbreak defenses significantly increase the false refusal rates, thereby undermining usability. Our method and dataset can help developers evaluate and fine-tune safer and more usable LLMs. Our code and dataset are available at https://github.com/umd-huang-lab/FalseRefusal


Article 535

Title@2025-06-11 (3): Generalized Lie Symmetries in Physics-Informed Neural Operators

Title: Generalized Lie Symmetries in Physics-Informed Neural Operators Generalisierte Lie-Symmetrien in physikinformierten Neuraloperatoren 物理内成形神经操作器中的一般谎言对称 2502.00373v2

Authors (5): Amy Xiang Wang, Zakhar Shumaylov, Peter Zaika, Ferdia Sherry, Carola-Bibiane Schönlieb

Physics-informed neural operators (PINOs) have emerged as powerful tools for learning solution operators of partial differential equations (PDEs). Recent research has demonstrated that incorporating Lie point symmetry information can significantly enhance the training efficiency of PINOs, primarily through techniques like data, architecture, and loss augmentation. In this work, we focus on the latter, highlighting that point symmetries oftentimes result in no training signal, limiting their effectiveness in many problems. To address this, we propose a novel loss augmentation strategy that leverages evolutionary representatives of point symmetries, a specific class of generalized symmetries of the underlying PDE. These generalized symmetries provide a richer set of generators compared to standard symmetries, leading to a more informative training signal. We demonstrate that leveraging evolutionary representatives enhances the performance of neural operators, resulting in improved data efficiency and accuracy during training.


Article 536

Title@2025-06-11 (3): PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs

Title: PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs PDE-Controller: LLMs für Autoformalisierung und Begründung von PDEs PDE – – 主计长:PDE的自动化正规化和原因说明 2502.00963v2

Authors (6): Mauricio Soroco, Jialin Song, Mengzhou Xia, Kye Emond, Weiran Sun, Wuyang Chen

While recent AI-for-math has made strides in pure mathematics, areas of applied mathematics, particularly PDEs, remain underexplored despite their significant real-world applications. We present PDE-Controller, a framework that enables large language models (LLMs) to control systems governed by partial differential equations (PDEs). Our approach enables LLMs to transform informal natural language instructions into formal specifications, and then execute reasoning and planning steps to improve the utility of PDE control. We build a holistic solution comprising datasets (both human-written cases and 2 million synthetic samples), math-reasoning models, and novel evaluation metrics, all of which require significant effort. Our PDE-Controller significantly outperforms prompting the latest open source and GPT models in reasoning, autoformalization, and program synthesis, achieving up to a 62% improvement in utility gain for PDE control. By bridging the gap between language generation and PDE systems, we demonstrate the potential of LLMs in addressing complex scientific and engineering challenges. We release all data, model checkpoints, and code at https://pde-controller.github.io/.


Article 537

Title@2025-06-11 (3): Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Title: Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation Autoregressive Adversariale Nachschulung für die Echtzeit-Interactive Videogenerierung 实时互动互动视频制作培训后自动递减反反反反向自动 2506.09350v1

Authors (9): Shanchuan Lin, Ceyuan Yang, Hao He, Jianwen Jiang, Yuxi Ren, Xin Xia, Yang Zhao, Xuefeng Xiao, Lu Jiang

Existing large-scale video generation models are computationally intensive, preventing adoption in real-time and interactive applications. In this work, we propose autoregressive adversarial post-training (AAPT) to transform a pre-trained latent video diffusion model into a real-time, interactive video generator. Our model autoregressively generates one latent frame at a time using a single neural function evaluation (1NFE). The model can stream the result to the user in real time and receive interactive responses as controls to generate the next latent frame. Unlike existing approaches, our method explores adversarial training as an effective paradigm for autoregressive generation. This not only allows us to design an architecture that is more efficient for one-step generation while fully utilizing the KV cache, but also enables training the model in a student-forcing manner that proves to be effective in reducing error accumulation during long video generation. Our experiments demonstrate that our 8B model achieves real-time, 24fps, streaming video generation at 736x416 resolution on a single H100, or 1280x720 on 8xH100 up to a minute long (1440 frames). Visit our research website at https://seaweed-apt.com/2


Article 538

Title@2025-06-11 (3): TS-PIELM: Time-Stepping Physics-Informed Extreme Learning Machine Facilitates Soil Consolidation Analyses

Title: TS-PIELM: Time-Stepping Physics-Informed Extreme Learning Machine Facilitates Soil Consolidation Analyses TS-PIELM: Zeitschnelle Physik-informierte extreme Lernmaschine erleichtert Bodenkonsolidierungsanalysen TS-PIELM: 时间制备物理系统一体化极端学习机器促进土壤巩固分析 2506.08381v2

Authors (5): He Yang, Fei Ren, Hai-Sui Yu, Xueyu Geng, Pei-Zhi Zhuang

Accuracy and efficiency of the conventional physics-informed neural network (PINN) need to be improved before it can be a competitive alternative for soil consolidation analyses. This paper aims to overcome these limitations by proposing a highly accurate and efficient physics-informed machine learning (PIML) approach, termed time-stepping physics-informed extreme learning machine (TS-PIELM). In the TS-PIELM framework the consolidation process is divided into numerous time intervals, which helps overcome the limitation of PIELM in solving differential equations with sharp gradients. To accelerate network training, the solution is approximated by a single-layer feedforward extreme learning machine (ELM), rather than using a fully connected neural network in PINN. The input layer weights of the ELM network are generated randomly and fixed during the training process. Subsequently, the output layer weights are directly computed by solving a system of linear equations, which significantly enhances the training efficiency compared to the time-consuming gradient descent method in PINN. Finally, the superior performance of TS-PIELM is demonstrated by solving three typical Terzaghi consolidation problems. Compared to PINN, results show that the computational efficiency and accuracy of the novel TS-PIELM framework are improved by more than 1000 times and 100 times for one-dimensional cases, respectively. This paper provides compelling evidence that PIML can be a powerful tool for computational geotechnics.
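
The core mechanics described above (random, fixed input weights; output weights from one linear solve; the time domain split into intervals) can be sketched on a toy problem. The sketch below is only an illustration under stated assumptions: it replaces the consolidation equation with the simple ODE u'(t) = -u(t), and the function names (`solve_interval`, `time_stepping`), the tanh activation, and the weight ranges are hypothetical choices, not taken from the paper.

```python
import numpy as np


def solve_interval(u0, dt, n_hidden=30, n_colloc=50, seed=0):
    """Solve u'(t) = -u(t), u(0) = u0 on [0, dt] with a physics-informed ELM.

    Toy stand-in for the consolidation PDE; all settings are assumptions.
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(-2.0, 2.0, n_hidden)  # random input weights, never trained
    b = rng.uniform(-2.0, 2.0, n_hidden)  # random biases, never trained
    t = np.linspace(0.0, dt, n_colloc)    # collocation points in local time

    def hidden(ts):
        # Hidden-layer outputs, shape (n_points, n_hidden).
        return np.tanh(np.outer(ts, w) + b)

    def hidden_dt(ts):
        # Time derivative of tanh(w * t + b).
        return (1.0 - hidden(ts) ** 2) * w

    # The unknown output weights beta enter linearly:
    #   ODE residual rows:  (h'(t_i) + h(t_i)) @ beta = 0
    #   initial-value row:   h(0) @ beta = u0
    a = np.vstack([hidden_dt(t) + hidden(t), hidden(np.array([0.0]))])
    rhs = np.concatenate([np.zeros(n_colloc), [u0]])
    beta, *_ = np.linalg.lstsq(a, rhs, rcond=None)
    return lambda ts: hidden(np.asarray(ts, dtype=float)) @ beta


def time_stepping(u0, t_end, n_steps):
    """March through n_steps intervals, reusing each end value as the next start."""
    dt = t_end / n_steps
    u, ends = u0, []
    for k in range(n_steps):
        solution = solve_interval(u, dt, seed=k)
        u = float(solution([dt])[0])
        ends.append(u)
    return np.array(ends)


ends = time_stepping(1.0, 2.0, 2)
exact = np.exp(-np.array([1.0, 2.0]))
max_err = float(np.max(np.abs(ends - exact)))
print(max_err)
```

No gradient descent appears anywhere: each interval costs a single least-squares solve, which is the source of the training speed-up the abstract attributes to the ELM formulation.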


Article 539

Title@2025-06-11 (3): Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis

Title: Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis Warum sind Web-KI-Agenten anfälliger als Standalone LLMs? Eine Sicherheitsanalyse 为什么互联网上AI代理机构比独立的LLMs更加脆弱? 2502.20383v2

Authors (5): Jeffrey Yang Fan Chiang, Seungjae Lee, Jia-Bin Huang, Furong Huang, Yizheng Chen

Recent advancements in Web AI agents have demonstrated remarkable capabilities in addressing complex web navigation tasks. However, emerging research shows that these agents exhibit greater vulnerability compared to standalone Large Language Models (LLMs), despite both being built upon the same safety-aligned models. This discrepancy is particularly concerning given the greater flexibility of Web AI agents compared to standalone LLMs, which may expose them to a wider range of adversarial user inputs. To build a scaffold that addresses these concerns, this study investigates the underlying factors that contribute to the increased vulnerability of Web AI agents. Notably, this disparity stems from the multifaceted differences between Web AI agents and standalone LLMs, as well as from complex signals, nuances that simple evaluation metrics, such as success rate, often fail to capture. To tackle these challenges, we propose a component-level analysis and a more granular, systematic evaluation framework. Through this fine-grained investigation, we identify three critical factors that amplify the vulnerability of Web AI agents: (1) embedding user goals into the system prompt, (2) multi-step action generation, and (3) observational capabilities. Our findings highlight the pressing need to enhance security and robustness in AI agent design and provide actionable insights for targeted defense strategies.


Article 540

Title@2025-06-11 (3): Adversarial Surrogate Risk Bounds for Binary Classification

Title: Adversarial Surrogate Risk Bounds for Binary Classification Adversarial Surrogate Risk Bounds für Binäre Klassifizierung 二元分类的反向代谢风险临界值 2506.09348v1

Authors (1): Natalie S. Frank

A central concern in classification is the vulnerability of machine learning models to adversarial attacks. Adversarial training is one of the most popular techniques for training robust classifiers, which involves minimizing an adversarial surrogate risk. Recent work characterized when a minimizing sequence of an adversarial surrogate risk is also a minimizing sequence of the adversarial classification risk for binary classification – a property known as adversarial consistency. However, these results do not address the rate at which the adversarial classification risk converges to its optimal value for such a sequence of functions that minimize the adversarial surrogate. This paper provides surrogate risk bounds that quantify that convergence rate. Additionally, we derive distribution-dependent surrogate risk bounds in the standard (non-adversarial) learning setting, that may be of independent interest.
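
For orientation, the quantities named in the abstract can be written out in their standard form. The notation below is an assumption for illustration (labels $y \in \{-1,+1\}$, perturbation radius $\epsilon$, surrogate loss $\phi$, and a generic rate function $\Phi$); the abstract itself does not state the specific bounds.

```latex
% Adversarial classification risk of a score function f:
R^{\epsilon}(f) = \mathbb{E}_{(x,y)}\Big[ \sup_{\|x' - x\| \le \epsilon} \mathbf{1}\{\operatorname{sign} f(x') \ne y\} \Big]

% Adversarial surrogate risk for a surrogate loss \phi:
R_{\phi}^{\epsilon}(f) = \mathbb{E}_{(x,y)}\Big[ \sup_{\|x' - x\| \le \epsilon} \phi\big(y\, f(x')\big) \Big]

% Generic shape of a surrogate risk bound, with \Phi non-decreasing and \Phi(0) = 0:
R^{\epsilon}(f) - \inf_{g} R^{\epsilon}(g) \;\le\; \Phi\Big( R_{\phi}^{\epsilon}(f) - \inf_{g} R_{\phi}^{\epsilon}(g) \Big)
```

Adversarial consistency only says the left-hand side goes to zero whenever the argument of $\Phi$ does; a bound of this shape additionally says how fast.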


Article 541

Title@2025-06-11 (3): ErrorEraser: Unlearning Data Bias for Improved Continual Learning

Title: ErrorEraser: Unlearning Data Bias for Improved Continual Learning ErrorErraser: Unlearning Data Bias für verbessertes kontinuierliches Lernen 错误搜索器: 退出数据比亚, 用于改进持续学习 2506.09347v1

Authors (7): Xuemei Cao, Hanlin Gu, Xin Yang, Bingjun Wei, Haoyang Liang, Xiangkun Wang, Tianrui Li

Continual Learning (CL) primarily aims to retain knowledge to prevent catastrophic forgetting and transfer knowledge to facilitate learning new tasks. Unlike traditional methods, we propose a novel perspective: CL not only needs to prevent forgetting, but also requires intentional forgetting. This need arises because existing CL methods ignore biases in real-world data, leading the model to learn spurious correlations that transfer and amplify across tasks. From feature extraction and prediction results, we find that data biases simultaneously reduce CL's ability to retain and transfer knowledge. To address this, we propose ErrorEraser, a universal plugin that removes erroneous memories caused by biases in CL, enhancing performance in both new and old tasks. ErrorEraser consists of two modules: Error Identification and Error Erasure. The former learns the probability density distribution of task data in the feature space without prior knowledge, enabling accurate identification of potentially biased samples. The latter ensures only erroneous knowledge is erased by shifting the decision space of representative outlier samples. Additionally, an incremental feature distribution learning strategy is designed to reduce the resource overhead during error identification in downstream tasks. Extensive experimental results show that ErrorEraser significantly mitigates the negative impact of data biases, achieving higher accuracy and lower forgetting rates across three types of CL methods. The code is available at https://github.com/diadai/ErrorEraser.


Article 542

Title@2025-06-11 (3): Ming-Omni: A Unified Multimodal Model for Perception and Generation

Title: Ming-Omni: A Unified Multimodal Model for Perception and Generation Ming-Omni: Ein einheitliches multimodales Modell für Wahrnehmung und Erzeugung Ming-Omni:感知和一代的统一多模式模型 2506.09344v1

Authors (58): Inclusion AI, Biao Gong, Cheng Zou, Chuanyang Zheng, Chunluan Zhou, Canxiang Yan, Chunxiang Jin, Chunjie Shen, Dandan Zheng, Fudong Wang, Furong Xu, GuangMing Yao, Jun Zhou, Jingdong Chen, Jianxin Sun, Jiajia Liu, Jianjiang Zhu, Jun Peng, Kaixiang Ji, Kaiyou Song, Kaimeng Ren, Libin Wang, Lixiang Ru, Lele Xie, Longhua Tan, Lyuxin Xue, Lan Wang, Mochen Bai, Ning Gao, Pei Chen, Qingpei Guo, Qinglong Zhang, Qiang Xu, Rui Liu, Ruijie Xiong, Sirui Gao, Tinghao Liu, Taisong Li, Weilong Chai, Xinyu Xiao, Xiaomei Wang, Xiaoxue Chen, Xiao Lu, Xiaoyu Li, Xingning Dong, Xuzheng Yu, Yi Yuan, Yuting Gao, Yunxiao Sun, Yipeng Chen, Yifei Wu, Yongjie Lyu, Ziping Ma, Zipeng Feng, Zhijiang Fang, Zhihao Qiu, Ziyuan Huang, Zhengyu He

We propose Ming-Omni, a unified multimodal model capable of processing images, text, audio, and video, while demonstrating strong proficiency in both speech and image generation. Ming-Omni employs dedicated encoders to extract tokens from different modalities, which are then processed by Ling, an MoE architecture equipped with newly proposed modality-specific routers. This design enables a single model to efficiently process and fuse multimodal inputs within a unified framework, thereby facilitating diverse tasks without requiring separate models, task-specific fine-tuning, or structural redesign. Importantly, Ming-Omni extends beyond conventional multimodal models by supporting audio and image generation. This is achieved through the integration of an advanced audio decoder for natural-sounding speech and Ming-Lite-Uni for high-quality image generation, which also allow the model to engage in context-aware chatting, perform text-to-speech conversion, and conduct versatile image editing. Our experimental results showcase Ming-Omni offers a powerful solution for unified perception and generation across all modalities. Notably, our proposed Ming-Omni is the first open-source model we are aware of to match GPT-4o in modality support, and we release all code and model weights to encourage further research and development in the community.


Article 543

Title@2025-06-11 (3): RePO: Replay-Enhanced Policy Optimization

Title: RePO: Replay-Enhanced Policy Optimization RePO: Replay-erweiterte Politikoptimierung REPO: 重放强化政策优化 2506.09340v1

Authors (5): Siheng Li, Zhanhui Zhou, Wai Lam, Chao Yang, Chaochao Lu

Reinforcement learning (RL) is vital for optimizing large language models (LLMs). Recent Group Relative Policy Optimization (GRPO) estimates advantages using multiple on-policy outputs per prompt, leading to high computational costs and low data efficiency. To address this, we introduce Replay-Enhanced Policy Optimization (RePO), which leverages diverse replay strategies to retrieve off-policy samples from a replay buffer, allowing policy optimization based on a broader and more diverse set of samples for each prompt. Experiments on five LLMs across seven mathematical reasoning benchmarks demonstrate that RePO achieves absolute average performance gains of $18.4$ and $4.1$ points for Qwen2.5-Math-1.5B and Qwen3-1.7B, respectively, compared to GRPO. Further analysis indicates that RePO increases computational cost by $15\%$ while raising the number of effective optimization steps by $48\%$ for Qwen3-1.7B, with both on-policy and off-policy sample numbers set to $8$. The repository can be accessed at https://github.com/SihengLi99/RePO.
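
A minimal sketch of the idea, assuming the usual group-relative advantage (reward minus group mean, divided by group standard deviation) and a per-prompt replay buffer. The `ReplayBuffer` class, its "most recent" retrieval rule, and the choice to normalize over the combined on-policy and replayed group are hypothetical illustrations, not the paper's actual replay strategies.

```python
from collections import defaultdict, deque

import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one prompt's group of sampled outputs."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


class ReplayBuffer:
    """Hypothetical per-prompt store of past (output, reward) pairs."""

    def __init__(self, capacity_per_prompt=64):
        self._store = defaultdict(lambda: deque(maxlen=capacity_per_prompt))

    def add(self, prompt, output, reward):
        self._store[prompt].append((output, reward))

    def recent(self, prompt, k):
        # One possible replay strategy: return the k most recent samples.
        items = list(self._store[prompt])
        return items[max(len(items) - k, 0):]


buffer = ReplayBuffer()
buffer.add("prompt-a", "old-answer-1", 1.0)
buffer.add("prompt-a", "old-answer-2", 0.0)

on_policy = [("new-answer-1", 1.0), ("new-answer-2", 1.0)]
off_policy = buffer.recent("prompt-a", 2)

# On-policy samples alone all share one reward, so every advantage is zero.
on_only = group_relative_advantages([r for _, r in on_policy])
# Adding replayed samples widens the group and restores a non-zero signal.
mixed = group_relative_advantages([r for _, r in on_policy + off_policy])
print(on_only, mixed)
```

In this toy case the on-policy group carries no learning signal, while the mixed group does, which is one way to read the abstract's claim about a higher number of effective optimization steps.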


Article 544

Title@2025-06-11 (3): AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin

Title: AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin AsFT: Verankerungssicherheit während LLM-Feinsteuerung innerhalb eines schmalen Sicherheitsbeckens AsFT:LLM在狭窄安全盆地内精密出海期间安整安全 2506.08473v2

Authors (11): Shuo Yang, Qihui Zhang, Yuyang Liu, Yue Huang, Xiaojun Jia, Kunpeng Ning, Jiayu Yao, Jigang Wang, Hailiang Dai, Yibing Song, Li Yuan

Large language models (LLMs) are vulnerable to safety risks during fine-tuning, where small amounts of malicious or harmless data can compromise safeguards. In this paper, building on the concept of alignment direction – defined by the weight difference between aligned and unaligned models – we observe that perturbations along this direction preserve model safety. In contrast, perturbations along directions orthogonal to this alignment are strongly linked to harmful direction perturbations, rapidly degrading safety and framing the parameter space as a narrow safety basin. Based on this insight, we propose a methodology for safety fine-tuning called AsFT (Anchoring Safety in Fine-Tuning), which integrates a regularization term into the training objective. This term uses the alignment direction as an anchor to suppress updates in harmful directions, ensuring that fine-tuning is constrained within the narrow safety basin. Extensive experiments on multiple datasets show that AsFT outperforms Safe LoRA, reducing harmful behavior by 7.60 percent, improving model performance by 3.44 percent, and maintaining robust performance across various experimental settings. Code is available at https://github.com/PKU-YuanGroup/AsFT
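
One way to picture such a regularizer is to split a weight update into its component along the alignment direction and the remainder orthogonal to it, then penalize only the orthogonal part. The sketch below assumes flattened weight vectors and a squared-norm penalty; the function name and the exact form of the term are illustrative assumptions, and the paper's regularizer may differ.

```python
import numpy as np


def orthogonal_penalty(delta, alignment_direction):
    """Squared norm of the part of `delta` orthogonal to the alignment direction.

    delta: fine-tuning update (flattened weights), an assumed representation.
    alignment_direction: aligned weights minus unaligned weights (flattened).
    """
    d = alignment_direction / np.linalg.norm(alignment_direction)
    parallel = np.dot(delta, d) * d      # component along the anchor
    orthogonal = delta - parallel        # component the penalty suppresses
    return float(np.dot(orthogonal, orthogonal))


direction = np.array([1.0, 0.0])
along = orthogonal_penalty(np.array([2.0, 0.0]), direction)
mixed = orthogonal_penalty(np.array([3.0, 4.0]), direction)
print(along, mixed)
```

In training, a term like this would be scaled by a coefficient and added to the task loss, so that movement along the anchor is free while movement away from it is discouraged.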


Article 545

Title@2025-06-11 (3): Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference

Title: Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference Sentence-Level-Reward-Modell kann besser für die Ausrichtung LLM aus menschlichen Präferenzen Generalisieren 判决一级奖赏模式可以普遍化,更有利于将人类特惠的LLM与LM相匹配 2503.04793v4

Authors (7): Wenjie Qiu, Yi-Chen Li, Xuqin Zhang, Tianyi Zhang, Yihang Zhang, Zongzhang Zhang, Yang Yu

Learning reward models from human preference datasets and subsequently optimizing language models via reinforcement learning has emerged as a fundamental paradigm for aligning LLMs with human preferences. The performance of the reward model plays a crucial role in the effectiveness of alignment. Previous reward models operate at a coarse-grained level, requiring the generation of a complete response to obtain a reward value. The sparse reward may present challenges for downstream reinforcement learning. While recent efforts have attempted to learn token-level reward models, the lack of explicit semantic information makes it difficult to model the credit of every individual token. In this paper, we propose assigning scores to every sentence, introducing an intermediate-grained reward model. By segmenting the complete response into sentences and applying differential operations to reward output at the start and end positions of each sentence, we can effectively model the rewards of sentences. Moreover, a novel attention mechanism is introduced to aggregate the scores of all sentences into a response-level score, which allows it to be trained using the Bradley-Terry model. On common benchmarks, our method outperforms the response-level reward model by 2.7% on RewardBench (for reward modeling evaluation) and surpasses all baselines on AlpacaEval (for alignment evaluation).
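
The pipeline in the abstract can be sketched in three steps: difference the reward head's outputs at each sentence's boundaries, combine the sentence scores into one response score, and train with the Bradley-Terry objective. The code assumes per-token scalar outputs and takes "differential operations" to mean end value minus start value; the plain softmax weighting stands in for the paper's attention mechanism, and all function names are hypothetical.

```python
import numpy as np


def sentence_rewards(token_values, spans):
    """Reward of each sentence as value at its end minus value at its start."""
    v = np.asarray(token_values, dtype=float)
    return np.array([v[end] - v[start] for start, end in spans])


def aggregate(sent_rewards, attn_logits):
    """Softmax-weighted sum of sentence rewards (stand-in for learned attention)."""
    logits = np.asarray(attn_logits, dtype=float)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return float(np.dot(weights, sent_rewards))


def bradley_terry_loss(score_chosen, score_rejected):
    """Negative log-likelihood that the chosen response beats the rejected one."""
    return float(np.log1p(np.exp(-(score_chosen - score_rejected))))


values = [0.0, 0.5, 1.0, 0.5, 2.0]  # toy per-token reward outputs
spans = [(0, 2), (2, 4)]            # (start, end) token index of each sentence
per_sentence = sentence_rewards(values, spans)
response_score = aggregate(per_sentence, [0.0, 0.0])
print(per_sentence, response_score, bradley_terry_loss(response_score, 0.0))
```

Because the aggregated score is a single scalar per response, ordinary pairwise preference data suffices for training even though the model exposes sentence-level rewards.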


Article 546

Title@2025-06-11 (3): Know What You Don’t Know: Uncertainty Calibration of Process Reward Models

Title: Know What You Don’t Know: Uncertainty Calibration of Process Reward Models Wissen, was Sie nicht wissen: Unsicherheit Kalibrierung von Prozess-Reward-Modellen 你不知道你不知道什么: 过程奖励模型的不确定性校准 2506.09338v1

Authors (5): Young-Jin Park, Kristjan Greenewald, Kaveh Alim, Hao Wang, Navid Azizan

Process reward models (PRMs) play a central role in guiding inference-time scaling algorithms for large language models (LLMs). However, we observe that even state-of-the-art PRMs can be poorly calibrated and often overestimate success probabilities. To address this, we present a calibration approach, performed via quantile regression, that adjusts PRM outputs to better align with true success probabilities. Leveraging these calibrated success estimates and their associated confidence bounds, we introduce an instance-adaptive scaling (IAS) framework that dynamically adjusts the inference budget based on the estimated likelihood that a partial reasoning trajectory will yield a correct final answer. Unlike conventional methods that allocate a fixed number of reasoning trajectories per query, this approach successfully adapts to each instance and reasoning step when using our calibrated PRMs. Experiments on mathematical reasoning benchmarks show that (i) our PRM calibration method successfully achieves small calibration error, outperforming the baseline methods, (ii) calibration is crucial for enabling effective adaptive scaling, and (iii) the proposed IAS strategy reduces inference costs while maintaining final answer accuracy, utilizing less compute on more confident problems as desired.
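
Quantile regression is usually trained with the pinball loss, whose minimizer is the requested quantile of the target distribution. The sketch below shows only that loss and checks it on synthetic targets; the target values and the grid search are assumptions for illustration, and how the paper wires the loss into PRM fine-tuning is not shown here.

```python
import numpy as np


def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss at level tau in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


# Hypothetical empirical success rates of partial reasoning trajectories.
targets = np.linspace(0.0, 1.0, 101)
grid = np.linspace(0.0, 1.0, 101)

# For each level, find the constant prediction with the lowest pinball loss.
best = {
    tau: float(grid[np.argmin([pinball_loss(targets, q, tau) for q in grid])])
    for tau in (0.1, 0.5, 0.9)
}
print(best)
```

Predicting a low quantile (for example tau = 0.1) yields a conservative lower bound on the success probability, which is the kind of confidence bound an adaptive budget rule can act on.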


Article 547

Title@2025-06-11 (3): STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation

Title: STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation STAMImputer: Raum-Temporale Aufmerksamkeit MoE für Verkehrsdaten Imputation STAMIMUTER: 交通数据截读的时/时注意MoE 2506.08054v2

Authors (7): Yiming Wang, Hao Peng, Senzhang Wang, Haohua Du, Chunyang Liu, Jia Wu, Guanlin Wu

Traffic data imputation is fundamentally important to support various applications in intelligent transportation systems such as traffic flow prediction. However, existing time-to-space sequential methods often fail to effectively extract features in block-wise missing data scenarios. Meanwhile, the static graph structure for spatial feature propagation significantly constrains the model's flexibility in handling the distribution shift issue for nonstationary traffic data. To address these issues, this paper proposes a SpatioTemporal Attention Mixture of experts network named STAMImputer for traffic data imputation. Specifically, we introduce a Mixture of Experts (MoE) framework to capture latent spatio-temporal features and their influence weights, effectively imputing block-wise missing data. A novel Low-rank guided Sampling Graph ATtention (LrSGAT) mechanism is designed to dynamically balance the local and global correlations across road networks. The sampled attention vectors are utilized to generate dynamic graphs that capture real-time spatial correlations. Extensive experiments are conducted on four traffic datasets for evaluation. The results show that STAMImputer achieves significant performance improvements compared with existing SOTA approaches. Our code is available at https://github.com/RingBDStack/STAMImupter.


Article 548

Title@2025-06-11 (3): Natural Language Guided Ligand-Binding Protein Design

Title: Natural Language Guided Ligand-Binding Protein Design Natürliche Sprache geführt Ligand-Binding Protein Design Ligand-Binding 蛋白设计 2506.09332v1

Authors (5): Zhenqiao Song, Ramith Hettiarachchi, Chuan Li, Jianwen Xie, Lei Li

Can AI protein models follow human language instructions and design proteins with desired functions (e.g. binding to a ligand)? Designing proteins that bind to a given ligand is crucial in a wide range of applications in biology and chemistry. Most prior AI models are trained on protein-ligand complex data, which is scarce due to the high cost and time requirements of laboratory experiments. In contrast, there is a substantial body of human-curated text descriptions about protein-ligand interactions and ligand formulas. In this paper, we propose InstructPro, a family of protein generative models that follow natural language instructions to design ligand-binding proteins. Given a textual description of the desired function and a ligand formula in SMILES, InstructPro generates protein sequences that are functionally consistent with the specified instructions. We develop the model architecture, training strategy, and a large-scale dataset, InstructProBench, to support both training and evaluation. InstructProBench consists of 9,592,829 triples of (function description, ligand formula, protein sequence). We train two model variants: InstructPro-1B (with 1 billion parameters) and InstructPro-3B (with 3 billion parameters). Both variants consistently outperform strong baselines, including ProGen2, ESM3, and Pinal. Notably, InstructPro-1B achieves the highest docking success rate (81.52% at moderate confidence) and the lowest average root mean square deviation (RMSD) compared to ground truth structures (4.026 Å). InstructPro-3B further decreases the average RMSD to 2.527 Å, demonstrating InstructPro's ability to generate ligand-binding proteins that align with the functional specifications.


Article 549

Title@2025-06-11 (3): SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space

Title: SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space SWAT-NN: Simultangewichte und Architekturtraining für neurale Netzwerke im Latent Space SWAT-NN: 低空空间神经网络的同质重量和建筑培训 2506.08270v2

Authors (3): Zitong Huang, Mansooreh Montazerin, Ajitesh Srivastava

Designing neural networks typically relies on manual trial and error or a neural architecture search (NAS) followed by weight training. The former is time-consuming and labor-intensive, while the latter often discretizes architecture search and weight optimization. In this paper, we propose a fundamentally different approach that simultaneously optimizes both the architecture and the weights of a neural network. Our framework first trains a universal multi-scale autoencoder that embeds both architectural and parametric information into a continuous latent space, where functionally similar neural networks are mapped closer together. Given a dataset, we then randomly initialize a point in the embedding space and update it via gradient descent to obtain the optimal neural network, jointly optimizing its structure and weights. The optimization process incorporates sparsity and compactness penalties to promote efficient models. Experiments on synthetic regression tasks demonstrate that our method effectively discovers sparse and compact neural networks with strong performance.


Article 550

Title@2025-06-11 (3): Your Agent Can Defend Itself against Backdoor Attacks

Title: Your Agent Can Defend Itself against Backdoor Attacks Ihr Agent kann sich gegen Hintertürangriffe verteidigen 你的特工可以保护自己 免受幕后攻击 2506.08336v2

Authors (5): Li Changjiang, Liang Jiacheng, Cao Bochuan, Chen Jinghui, Wang Ting

Despite their growing adoption across domains, large language model (LLM)-powered agents face significant security risks from backdoor attacks during training and fine-tuning. These compromised agents can subsequently be manipulated to execute malicious operations when presented with specific triggers in their inputs or environments. To address this pressing risk, we present ReAgent, a novel defense against a range of backdoor attacks on LLM-based agents. Intuitively, backdoor attacks often result in inconsistencies among the user's instruction, the agent's planning, and its execution. Drawing on this insight, ReAgent employs a two-level approach to detect potential backdoors. At the execution level, ReAgent verifies consistency between the agent's thoughts and actions; at the planning level, ReAgent leverages the agent's capability to reconstruct the instruction based on its thought trajectory, checking for consistency between the reconstructed instruction and the user's instruction. Extensive evaluation demonstrates ReAgent's effectiveness against various backdoor attacks across tasks. For instance, ReAgent reduces the attack success rate by up to 90% in database operation tasks, outperforming existing defenses by large margins. This work reveals the potential of utilizing compromised agents themselves to mitigate backdoor risks.


Article 551

Title@2025-06-11 (3): polyGen: A Learning Framework for Atomic-level Polymer Structure Generation

Title: polyGen: A Learning Framework for Atomic-level Polymer Structure Generation polyGen: Ein Lernrahmen für die atomare Polymerstrukturbildung 聚金:原子级聚合体结构生成学习框架 2504.17656v3

Authors (2): Ayush Jain, Rampi Ramprasad

Synthetic polymeric materials underpin fundamental technologies in the energy, electronics, consumer goods, and medical sectors, yet their development still suffers from prolonged design timelines. Although polymer informatics tools have supported speedup, polymer simulation protocols continue to face significant challenges in the on-demand generation of realistic 3D atomic structures that respect conformational diversity. Generative algorithms for 3D structures of inorganic crystals, bio-polymers, and small molecules exist, but have not addressed synthetic polymers because of challenges in representation and dataset constraints. In this work, we introduce polyGen, the first generative model designed specifically for polymer structures from minimal inputs such as the repeat unit chemistry alone. polyGen combines graph-based encodings with a latent diffusion transformer using positional biased attention for realistic conformation generation. Given the limited dataset of 3,855 DFT-optimized polymer structures, we incorporate joint training with small molecule data to enhance generation quality. We also establish structure matching criteria to benchmark our approach on this novel problem. polyGen overcomes the limitations of traditional crystal structure prediction methods for polymers, successfully generating realistic and diverse linear and branched conformations, with promising performance even on challenging large repeat units. As the first atomic-level proof-of-concept capturing intrinsic polymer flexibility, it marks a new capability in material structure generation.


Article 552

Title@2025-06-11 (3): Low-Loss Space in Neural Networks is Continuous and Fully Connected

Title: Low-Loss Space in Neural Networks is Continuous and Fully Connected Low-Loss-Raum in neuralen Netzwerken ist kontinuierlich und vollständig verbunden 神经网络中的低损失空间是连续和完全连接的 2505.02604v3

Authors (4): Yongding Tian, Zaid Al-Ars, Maksim Kitsak, Peter Hofstee

Visualizations of the loss landscape in neural networks suggest that minima are isolated points. However, both theoretical and empirical studies indicate that it is possible to connect two different minima with a path consisting of intermediate points that also have low loss. In this study, we propose a new algorithm which investigates low-loss paths in the full parameter space, not only between two minima. Our experiments on LeNet5, ResNet18, and Compact Convolutional Transformer architectures consistently demonstrate the existence of such continuous paths in the parameter space. These results suggest that the low-loss region is a fully connected and continuous space in the parameter space. Our findings provide theoretical insight into neural network over-parameterization, highlighting that parameters collectively define a high-dimensional low-loss space, implying parameter redundancy exists only within individual models and not throughout the entire low-loss space. Additionally, our work also provides new visualization methods and opportunities to improve model generalization by exploring the low-loss space that is closer to the origin.


Article 553

Title@2025-06-11 (3): Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies

Title: Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies Beschleunigung der LLM-Inferenz mit verlustfreier spekulativer Dekodierung von Algorithmen für heterogene Vokabeln 加速LLM 超异基因词汇表无损失投机代谢值的加速LLM推导值 2502.05202v3

Authors (8): Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Gaurav Jain, Oren Pereg, Moshe Wasserblat, David Harel

Accelerating the inference of large language models (LLMs) is a critical challenge in generative AI. Speculative decoding (SD) methods offer substantial efficiency gains by generating multiple tokens using a single target forward pass. However, existing SD approaches require the drafter and target models to share the same vocabulary, thus limiting the pool of possible drafters, often necessitating the training of a drafter from scratch. We present three new SD methods that remove this shared-vocabulary constraint. All three methods preserve the target distribution (i.e., they are lossless) and work with off-the-shelf models without requiring additional training or modifications. Empirically, on summarization, programming, and long-context tasks, our algorithms demonstrate significant speedups of up to 2.8x over standard autoregressive decoding. By enabling any off-the-shelf model to serve as a drafter and requiring no retraining, this work substantially broadens the applicability of the SD framework in practice.
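
For background, the verification rule of standard (shared-vocabulary) speculative decoding can be sketched as follows. This is the textbook rejection-sampling rule, not one of the three heterogeneous-vocabulary methods the abstract refers to; the names `p_target`, `q_draft`, and `verify_token` are hypothetical, and both distributions are assumed to be dictionaries over the same token set.

```python
import random

def acceptance_probability(token, p_target, q_draft):
    """Probability of keeping a drafted token: min(1, p(token) / q(token))."""
    return min(1.0, p_target.get(token, 0.0) / q_draft[token])

def residual_distribution(p_target, q_draft):
    """Normalized max(0, p - q), used to resample after a rejection."""
    raw = {t: max(0.0, p - q_draft.get(t, 0.0)) for t, p in p_target.items()}
    total = sum(raw.values())
    return {t: v / total for t, v in raw.items()}

def verify_token(token, p_target, q_draft, rng=random):
    """Accept the drafted token or resample; the emitted token follows p_target."""
    if rng.random() < acceptance_probability(token, p_target, q_draft):
        return token
    residual = residual_distribution(p_target, q_draft)
    tokens, weights = zip(*residual.items())
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Because a rejected token is replaced by a draw from the residual distribution, the emitted token is distributed exactly according to the target model, which is the sense in which such schemes are called lossless.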


Article 554

Title@2025-06-11 (3): Monte Carlo Tree Diffusion for System 2 Planning

Title: Monte Carlo Tree Diffusion for System 2 Planning Monte Carlo Tree Diffusion für System 2 Planung 用于系统2规划的蒙特卡洛树传播 2502.07202v4

Authors (5): Jaesik Yoon, Hyeonseo Cho, Doojin Baek, Yoshua Bengio, Sungjin Ahn

Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS), whose performance naturally improves with inference-time computation scaling, standard diffusion-based planners offer only limited avenues for scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS, such as controlling exploration-exploitation trade-offs, within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as inference-time computation increases.


Article 555

Title@2025-06-11 (3): On The Impact of Merge Request Deviations on Code Review Practices

Title: On The Impact of Merge Request Deviations on Code Review Practices Über die Auswirkungen von Merge Request Abweichungen auf Code-Review-Praktiken 合并请求对守则审查惯例的影响 2506.08860v2

Authors (3): Samah Kansab, Francis Bordeleau, Ali Tizghadam

Code review is a key practice in software engineering, ensuring quality and collaboration. However, industrial Merge Request (MR) workflows often deviate from standardized review processes, with many MRs serving non-review purposes (e.g., drafts, rebases, or dependency updates). We term these cases deviations and hypothesize that ignoring them biases analytics and undermines ML models for review analysis. We identify seven deviation categories, occurring in 37.02% of MRs, and propose a few-shot learning detection method (91% accuracy). By excluding deviations, ML models predicting review completion time improve performance in 53.33% of cases (up to 2.25x) and exhibit significant shifts in feature importance (47% overall, 60% top-k). Our contributions include: (1) a taxonomy of MR deviations, (2) an AI-driven detection approach, and (3) empirical evidence of their impact on ML-based review analytics. This work aids practitioners in optimizing review efforts and ensuring reliable insights.


Article 556

Title@2025-06-11 (3): Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models

Title: Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models Schulungsfreie Anleitung über die Unterscheidungsfähigkeit: Skalierbare Pfadsteuerung mit Baumsuche in Diffusions- und Flussmodellen 超越差异的无培训指导:在传播和流动模型中进行植树搜索的可扩展路径指导 2502.11420v3

Authors (4): Yingqing Guo, Yukang Yang, Hui Yuan, Mengdi Wang

Training-free guidance enables controlled generation in diffusion and flow models, but most methods rely on gradients and assume differentiable objectives. This work focuses on training-free guidance addressing challenges from non-differentiable objectives and discrete data distributions. We propose TreeG: Tree Search-Based Path Steering Guidance, applicable to both continuous and discrete settings in diffusion and flow models. TreeG offers a unified framework for training-free guidance by proposing, evaluating, and selecting candidates at each step, enhanced with tree search over active paths and parallel exploration. We comprehensively investigate the design space of TreeG over the candidate proposal module and the evaluation function, instantiating TreeG into three novel algorithms. Our experiments show that TreeG consistently outperforms top guidance baselines in symbolic music generation, small molecule design, and enhancer DNA design with improvements of 29.01%, 16.6%, and 18.43%. Additionally, we identify an inference-time scaling law showing TreeG’s scalability in inference-time computation.


Article 557

Title@2025-06-11 (3): Alzheimer’s Dementia Detection Using Perplexity from Paired Large Language Models

Title: Alzheimer’s Dementia Detection Using Perplexity from Paired Large Language Models Alzheimer’s Dementia Detection Using Perplexity from Paired Large Language Models 阿兹海默氏病痴呆症检测,使用Paired大语言模型的复度 2506.09315v1

Authors (3): Yao Xiao, Heidi Christensen, Stefan Goetze

Alzheimer’s dementia (AD) is a neurodegenerative disorder with cognitive decline that commonly impacts language ability. This work extends the paired perplexity approach to detecting AD by using a recent large language model (LLM), the instruction-following version of Mistral-7B. We improve accuracy by an average of 3.33% over the best current paired perplexity method and by 6.35% over the top-ranked method from the ADReSS 2020 challenge benchmark. Our further analysis demonstrates that the proposed approach can effectively detect AD with a clear and interpretable decision boundary in contrast to other methods that suffer from opaque decision-making processes. Finally, by prompting the fine-tuned LLMs and comparing the model-generated responses to human responses, we illustrate that the LLMs have learned the special language patterns of AD speakers, which opens up possibilities for novel methods of model interpretation and data augmentation.
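
The paired perplexity idea can be illustrated with a minimal sketch. It assumes per-token natural-log probabilities for one transcript are already available from two language models, one tuned on control speech and one on AD speech; the function names, the log-ratio score, and the zero threshold are illustrative assumptions, not the authors' exact pipeline.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability), with natural-log token scores."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def paired_score(logprobs_control_lm, logprobs_ad_lm):
    """Log-ratio of the two perplexities for the same transcript.

    A positive score means the AD-tuned model is less 'surprised'
    by the transcript than the control-tuned model.
    """
    return math.log(perplexity(logprobs_control_lm)) - math.log(perplexity(logprobs_ad_lm))

def classify(score, threshold=0.0):
    """Single-threshold decision rule (the threshold value is an assumption)."""
    return "AD" if score > threshold else "control"
```

A single scalar score with one threshold is what makes the decision boundary easy to inspect, in line with the interpretability claim in the abstract.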


Article 558

Title@2025-06-11 (3): Surrogate models to optimize plasma assisted atomic layer deposition in high aspect ratio features

Title: Surrogate models to optimize plasma assisted atomic layer deposition in high aspect ratio features Surrogate-Modelle zur Optimierung der plasmaunterstützten Atomschichtabscheidung in Eigenschaften mit hohem Seitenverhältnis 利用代用模型优化高方位比率特征的等离子体辅助原子层沉积 2506.09313v1

Authors (2): Angel Yanguas-Gil, Jeffrey W. Elam

In this work we explore surrogate models to optimize plasma enhanced atomic layer deposition (PEALD) in high aspect ratio features. In plasma-based processes such as PEALD and atomic layer etching, surface recombination can dominate the reactivity of plasma species with the surface, which can lead to unfeasibly long exposure times to achieve full conformality inside nanostructures like high aspect ratio vias. Using a synthetic dataset based on simulations of PEALD, we train artificial neural networks to predict saturation times based on cross section thickness data obtained for partially coated conditions. The results obtained show that just two experiments in undersaturated conditions contain enough information to predict saturation times within 10% of the ground truth. A surrogate model trained to determine whether surface recombination dominates the plasma-surface interactions in a PEALD process achieves 99% accuracy. This demonstrates that machine learning can provide a new pathway to accelerate the optimization of PEALD processes in areas such as microelectronics. Our approach can be easily extended to atomic layer etching and more complex structures.


Article 559

Title@2025-06-11 (3): What is the Cost of Differential Privacy for Deep Learning-Based Trajectory Generation?

Title: What is the Cost of Differential Privacy for Deep Learning-Based Trajectory Generation? Was sind die Kosten der unterschiedlichen Privatsphäre für Deep Learning-Based Trajektorie Generation? 深层学习轨迹世代的不同隐私成本是多少? 2506.09312v1

Authors (6): Erik Buchholz, Natasha Fernandes, David D. Nguyen, Alsharif Abuadbba, Surya Nepal, Salil S. Kanhere

While location trajectories offer valuable insights, they also reveal sensitive personal information. Differential Privacy (DP) offers formal protection, but achieving a favourable utility-privacy trade-off remains challenging. Recent works explore deep learning-based generative models to produce synthetic trajectories. However, current models lack formal privacy guarantees and rely on conditional information derived from real data during generation. This work investigates the utility cost of enforcing DP in such models, addressing three research questions across two datasets and eleven utility metrics. (1) We evaluate how DP-SGD, the standard DP training method for deep learning, affects the utility of state-of-the-art generative models. (2) Since DP-SGD is limited to unconditional models, we propose a novel DP mechanism for conditional generation that provides formal guarantees and assess its impact on utility. (3) We analyse how model types (Diffusion, VAE, and GAN) affect the utility-privacy trade-off. Our results show that DP-SGD significantly impacts performance, although some utility remains if the dataset is sufficiently large. The proposed DP mechanism improves training stability, particularly when combined with DP-SGD, for unstable models such as GANs and on smaller datasets. Diffusion models yield the best utility without guarantees, but with DP-SGD, GANs perform best, indicating that the best non-private model is not necessarily optimal when targeting formal guarantees. In conclusion, DP trajectory generation remains a challenging task, and formal guarantees are currently only feasible with large datasets and in constrained use cases.


Article 560

Title@2025-06-11 (3): Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions

Title: Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions Rewind-to-Delete: Zertifiziertes Maschinen-Entlernen für nicht konvexe Funktionen 倒回到删除: 认证机器不学习非convex函数 2409.09778v4

Authors (2): Siqiao Mu, Diego Klabjan

Machine unlearning algorithms aim to efficiently remove data from a model without retraining it from scratch, in order to remove corrupted or outdated data or respect a user’s "right to be forgotten." Certified machine unlearning is a strong theoretical guarantee based on differential privacy that quantifies the extent to which an algorithm erases data from the model weights. In contrast to existing works in certified unlearning for convex or strongly convex loss functions, or nonconvex objectives with limiting assumptions, we propose the first, first-order, black-box (i.e., can be applied to models pretrained with vanilla gradient descent) algorithm for unlearning on general nonconvex loss functions, which unlearns by "rewinding" to an earlier step during the learning process before performing gradient descent on the loss function of the retained data points. We prove $(\epsilon, \delta)$ certified unlearning and performance guarantees that establish the privacy-utility-complexity tradeoff of our algorithm, and we prove generalization guarantees for functions that satisfy the Polyak-Lojasiewicz inequality. Finally, we demonstrate the superior performance of our algorithm compared to existing methods, within a new experimental framework that more accurately reflects unlearning user data in practice.
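
The rewinding step can be sketched on a toy problem. This is a minimal illustration only: the one-dimensional quadratic loss is convex (the paper targets nonconvex losses), all function names and hyperparameters are hypothetical, and the sketch covers only the rewind-then-descend step described in the abstract, not whatever else the $(\epsilon, \delta)$ certification requires.

```python
def grad(w, data):
    """Gradient of the toy loss 0.5 * mean((w - x)^2) with respect to w."""
    return w - sum(data) / len(data)

def train(w0, data, steps, lr):
    """Plain gradient descent that stores every iterate as a checkpoint."""
    checkpoints = [w0]
    for _ in range(steps):
        w = checkpoints[-1]
        checkpoints.append(w - lr * grad(w, data))
    return checkpoints

def rewind_and_unlearn(checkpoints, retained, rewind_steps, extra_steps, lr):
    """Go back `rewind_steps` iterates, then descend on the retained data only."""
    w = checkpoints[len(checkpoints) - 1 - rewind_steps]
    for _ in range(extra_steps):
        w -= lr * grad(w, retained)
    return w

full = [1.0, 2.0, 3.0, 10.0]   # original training set
retained = [1.0, 2.0, 3.0]     # the point 10.0 is to be forgotten
ckpts = train(0.0, full, steps=20, lr=0.5)
w_unlearned = rewind_and_unlearn(ckpts, retained, rewind_steps=15, extra_steps=30, lr=0.5)
```

In this toy setting the unlearned parameter moves from the full-data optimum (the mean of all four points) to the retained-data optimum (the mean of the three retained points).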


Article 561

Title@2025-06-11 (3): A Geometric Modeling of Occam’s Razor in Deep Learning

Title: A Geometric Modeling of Occam’s Razor in Deep Learning Eine geometrische Modellierung von Occams Razor im Deep Learning Occam的深层学习剃刀的几何建模 1905.11027v9

Authors (2): Ke Sun, Frank Nielsen

Why do deep neural networks (DNNs) benefit from very high dimensional parameter spaces? The contrast between their huge parameter complexity and their stunning performance in practice is all the more intriguing and cannot be explained by the standard theory of model selection for regular models. In this work, we propose a geometrically flavored information-theoretic approach to study this phenomenon. With the belief that simplicity is linked to better generalization, as grounded in the theory of minimum description length, the objective of our analysis is to examine and bound the complexity of DNNs. We introduce the locally varying dimensionality of the parameter space of neural network models by considering the number of significant dimensions of the Fisher information matrix, and model the parameter space as a manifold using the framework of singular semi-Riemannian geometry. We derive model complexity measures which yield short description lengths for deep neural network models based on their singularity analysis, thus explaining the good performance of DNNs despite their large number of parameters.


Article 562

Title@2025-06-11 (3): PARAFAC2-based Coupled Matrix and Tensor Factorizations with Constraints

Title: PARAFAC2-based Coupled Matrix and Tensor Factorizations with Constraints PARAFAC2-basierte gekoppelte Matrix- und Tensor-Faktorisierungen mit Einschränkungen 以PARAFAC2为基础的 2406.12338v2

Authors (5): Carla Schenker, Xiulin Wang, David Horner, Morten A. Rasmussen, Evrim Acar

Data fusion models based on Coupled Matrix and Tensor Factorizations (CMTF) have been effective tools for joint analysis of data from multiple sources. While the vast majority of CMTF models are based on the strictly multilinear CANDECOMP/PARAFAC (CP) tensor model, the more flexible PARAFAC2 model has recently also been integrated into CMTF models. PARAFAC2 tensor models can handle irregular/ragged tensors and have been shown to be especially useful for modelling dynamic data with unaligned or irregular time profiles. However, existing PARAFAC2-based CMTF models have limitations in terms of possible regularizations on the factors and/or types of coupling between datasets. To address these limitations, in this paper we introduce a flexible algorithmic framework that fits PARAFAC2-based CMTF models using Alternating Optimization (AO) and the Alternating Direction Method of Multipliers (ADMM). The proposed framework allows imposing various constraints on all modes and linear couplings to other matrix-, CP- or PARAFAC2-models. Experiments on various simulated datasets and a real dataset demonstrate the utility and versatility of the proposed framework as well as its benefits in terms of accuracy and efficiency in comparison with state-of-the-art methods.


Article 563

Title@2025-06-11 (3): Neuromorphic Optical Tracking and Imaging of Randomly Moving Targets through Strongly Scattering Media

Title: Neuromorphic Optical Tracking and Imaging of Randomly Moving Targets through Strongly Scattering Media Neuromorphes optisches Tracking und Bildgebung von zufällig sich bewegenden Zielen durch stark streuende Medien 通过强烈散射媒体对随机移动目标进行神经形态光学跟踪和成像 2501.03874v2

Authors (3): Ning Zhang, Timothy Shea, Arto Nurmikko

Tracking and acquiring simultaneous optical images of randomly moving targets obscured by scattering media remains a challenging problem of importance to many applications that require precise object localization and identification. In this work we develop an end-to-end neuromorphic optical engineering and computational approach to demonstrate how to track and image normally invisible objects by combining an event detecting camera with a multistage neuromorphic deep learning strategy. Photons emerging from dense scattering media are detected by the event camera and converted to pixel-wise asynchronized spike trains - a first step in isolating object-specific information from the dominant uninformative background. Spiking data is fed into a deep spiking neural network (SNN) engine where object tracking and image reconstruction are performed by two separate yet interconnected modules running in parallel in discrete time steps over the event duration. Through benchtop experiments we demonstrate tracking and imaging randomly moving objects in dense turbid media as well as image reconstruction of spatially stationary but optically dynamic objects. Standardized character sets serve as representative proxies for geometrically complex objects, underscoring the method’s generality. The results highlight the advantages of a fully neuromorphic approach in meeting a major imaging technology with high computational efficiency and low power consumption.


Article 564

Title@2025-06-11 (3): Lower Bounds for Learning Quantum States with Single-Copy Measurements

Title: Lower Bounds for Learning Quantum States with Single-Copy Measurements Untere Grenzen für das Lernen von Quantenzuständen mit Single-Copy-Messungen 具有单复度测量法的低下学习区次数国家 2207.14438v3

Authors (2): Angus Lowe, Ashwin Nayak

We study the problems of quantum tomography and shadow tomography using measurements performed on individual, identical copies of an unknown $d$-dimensional state. We first revisit a known lower bound due to Haah et al. (2017) on quantum tomography with accuracy $\epsilon$ in trace distance, when the measurement choices are independent of previously observed outcomes (i.e., they are nonadaptive). We give a succinct proof of this result. This leads to stronger lower bounds when the learner uses measurements with a constant number of outcomes. In particular, this rigorously establishes the optimality of the folklore "Pauli tomography" algorithm in terms of its sample complexity. We also derive novel bounds of $\Omega(r^2 d/\epsilon^2)$ and $\Omega(r^2 d^2/\epsilon^2)$ for learning rank $r$ states using arbitrary and constant-outcome measurements, respectively, in the nonadaptive case. In addition to the sample complexity, a resource of practical significance for learning quantum states is the number of different measurements used by an algorithm. We extend our lower bounds to the case where the learner performs possibly adaptive measurements from a fixed set of $\exp(O(d))$ measurements. This implies in particular that adaptivity does not give us any advantage using single-copy measurements that are efficiently implementable. We also obtain a similar bound in the case where the goal is to predict the expectation values of a given sequence of observables, a task known as shadow tomography. Finally, in the case of adaptive, single-copy measurements implementable with polynomial-size circuits, we prove that a straightforward strategy based on computing sample means of the given observables is optimal.


Article 565

Title@2025-06-10 (2): Lightweight Object Detection Using Quantized YOLOv4-Tiny for Emergency Response in Aerial Imagery

Title: Lightweight Object Detection Using Quantized YOLOv4-Tiny for Emergency Response in Aerial Imagery Leichtgewichtige Objekterkennung mit Quantisiertem YOLOv4-Tiny für Notfallreaktion in Luftbildern 在空中图像中,使用量化的YOLOv4-Tiny用于应急反应的轻重量物体探测 2506.09299v1

Authors (2): Sindhu Boddu, Arindam Mukherjee

This paper presents a lightweight and energy-efficient object detection solution for aerial imagery captured during emergency response situations. We focus on deploying the YOLOv4-Tiny model, a compact convolutional neural network, optimized through post-training quantization to INT8 precision. The model is trained on a custom-curated aerial emergency dataset, consisting of 10,820 annotated images covering critical emergency scenarios. Unlike prior works that rely on publicly available datasets, we created this dataset ourselves due to the lack of publicly available drone-view emergency imagery, making the dataset itself a key contribution of this work. The quantized model is evaluated against YOLOv5-small across multiple metrics, including mean Average Precision (mAP), F1 score, inference time, and model size. Experimental results demonstrate that the quantized YOLOv4-Tiny achieves comparable detection performance while reducing the model size from 22.5 MB to 6.4 MB and improving inference speed by 44%. With a 71% reduction in model size and a 44% increase in inference speed, the quantized YOLOv4-Tiny model proves highly suitable for real-time emergency detection on low-power edge devices.
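
The reported size reduction can be checked from the two model sizes, and the general idea of INT8 quantization can be sketched in a few lines. The symmetric per-tensor scheme below is a generic illustration chosen for brevity; the abstract does not say which quantization scheme or toolchain the authors used, and the function names are hypothetical.

```python
def size_reduction_percent(before_mb, after_mb):
    """Relative size reduction in percent."""
    return 100.0 * (before_mb - after_mb) / before_mb

def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map INT8 codes back to approximate floating-point weights."""
    return [scale * v for v in q]

# Sizes taken from the abstract: 22.5 MB before, 6.4 MB after quantization.
reduction = size_reduction_percent(22.5, 6.4)  # about 71.6, reported as 71%
```

Storing one byte per weight instead of four is where most of the size saving comes from; the rounding error per weight is bounded by half the scale.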


Article 566

Title@2025-06-10 (2): WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles

Title: WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles WaKA: Datenzuweisung unter Verwendung von K-Nächstgelegenen Nachbarn und Mitgliedschafts-Privacy Principles WaKA:使用K-近邻和成员隐私权原则的数据归属 2411.01357v3

Authors (5): Patrick Mesana, Clément Bénesse, Hadrien Lautraite, Gilles Caporossi, Sébastien Gambs

In this paper, we introduce WaKA (Wasserstein K-nearest-neighbors Attribution), a novel attribution method that leverages principles from the LiRA (Likelihood Ratio Attack) framework and k-nearest neighbors classifiers (k-NN). WaKA efficiently measures the contribution of individual data points to the model’s loss distribution, analyzing every possible k-NN that can be constructed using the training set, without needing to sample subsets of the training set. WaKA is versatile and can be used a posteriori as a membership inference attack (MIA) to assess privacy risks or a priori for privacy influence measurement and data valuation. Thus, WaKA can be seen as bridging the gap between data attribution and membership inference attacks by providing a unified framework to distinguish between a data point’s value and its privacy risk. For instance, we have shown that self-attribution values are more strongly correlated with the attack success rate than the contribution of a point to the model generalization. WaKA’s different usages were also evaluated across diverse real-world datasets, demonstrating performance very close to LiRA when used as an MIA on k-NN classifiers, but with greater computational efficiency. Additionally, WaKA shows greater robustness than Shapley Values for data minimization tasks (removal or addition) on imbalanced datasets.


Article 567

Title@2025-06-10 (2): Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

Title: Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems Künstliche Intelligenz für die Wissenschaft in Quanten-, Atom- und Kontinuumsystemen 量子、原子学和连续系统科学人造情报 2307.08423v5

Authors (63): Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Alex Strasser, Haiyang Yu, YuQing Xie, Xiang Fu, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence, Hannes Stärk, Shurui Gui, Carl Edwards, Nicholas Gao, Adriana Ladera, Tailin Wu, Elyssa F. Hofgard, Aria Mansouri Tehrani, Rui Wang, Ameya Daigavane, Montgomery Bohde, Jerry Kurtin, Qian Huang, Tuong Phung, Minkai Xu, Chaitanya K. Joshi, Simon V. Mathis, Kamyar Azizzadenesheli, Ada Fang, Alán Aspuru-Guzik, Erik Bekkers, Michael Bronstein, Marinka Zitnik, Anima Anandkumar, Stefano Ermon, Pietro Liò, Rose Yu, Stephan Günnemann, Jure Leskovec, Heng Ji, Jimeng Sun, Regina Barzilay, Tommi Jaakkola, Connor W. Coley, Xiaoning Qian, Xiaofeng Qian, Tess Smidt, Shuiwang Ji

Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.


Article 568

Title@2025-06-10 (2): Causal Graph Recovery in Neuroimaging through Answer Set Programming

Title: Causal Graph Recovery in Neuroimaging through Answer Set Programming Causal Graph Recovery in Neuroimaging durch Answer Set Programmierung 通过 “ 解答组 “ 方案编制实现神经成形过程中的因果图恢复 2506.09286v1

Authors (5): Mohammadsajad Abavisani, Kseniya Solovyeva, David Danks, Vince Calhoun, Sergey Plis

Learning graphical causal structures from time series data presents significant challenges, especially when the measurement frequency does not match the causal timescale of the system. This often leads to a set of equally possible underlying causal graphs due to information loss from sub-sampling (i.e., not observing all possible states of the system throughout time). Our research addresses this challenge by incorporating the effects of sub-sampling in the derivation of causal graphs, resulting in more accurate and intuitive outcomes. We use a constraint optimization approach, specifically answer set programming (ASP), to find the optimal set of answers. ASP not only identifies the most probable underlying graph, but also provides an equivalence class of possible graphs for expert selection. In addition, using ASP allows us to leverage graph theory to further prune the set of possible solutions, yielding a smaller, more accurate answer set significantly faster than traditional approaches. We validate our approach on both simulated data and empirical structural brain connectivity, and demonstrate its superiority over established methods in these experiments. We further show how our method can be used as a meta-approach on top of established methods to obtain, on average, a 12% improvement in F1 score. In addition, we achieve state-of-the-art results in terms of precision and recall when reconstructing causal graphs from sub-sampled time series data. Finally, our method shows robustness to varying degrees of sub-sampling on realistic simulations, whereas other methods perform worse for higher rates of sub-sampling.


Article 569

Title@2025-06-10 (2): ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs

Title: ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs Skalierbare HD: Skalierbare und hochdurchsatzfähige Hyperdimensionale Computing-Inferenz auf Multi-Core-CPUs 可缩缩放的HD:可缩放的和高通向的多孔数多维电子计算引文 2506.09282v1

Authors (2): Dhruv Parikh, Viktor Prasanna

Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that represents and manipulates information using high-dimensional vectors, called hypervectors (HV). Traditional HDC methods, while robust to noise and inherently parallel, rely on single-pass, non-parametric training and often suffer from low accuracy. To address this, recent approaches adopt iterative training of base and class HVs, typically accelerated on GPUs. Inference, however, remains lightweight and well-suited for real-time execution. Yet, efficient HDC inference has been studied almost exclusively on specialized hardware such as FPGAs and GPUs, with limited attention to general-purpose multi-core CPUs. To address this gap, we propose ScalableHD for scalable and high-throughput HDC inference on multi-core CPUs. ScalableHD employs a two-stage pipelined execution model, where each stage is parallelized across cores and processes chunks of base and class HVs. Intermediate results are streamed between stages using a producer-consumer mechanism, enabling on-the-fly consumption and improving cache locality. To maximize performance, ScalableHD integrates memory tiling and NUMA-aware worker-to-core binding. Further, it features two execution variants tailored for small and large batch sizes, each designed to exploit compute parallelism based on workload characteristics while mitigating the memory-bound compute pattern that limits HDC inference performance on modern multi-core CPUs. ScalableHD achieves up to 10x speedup in throughput (samples per second) over state-of-the-art baselines such as TorchHD, across a diverse set of tasks ranging from human activity recognition to image classification, while preserving task accuracy. Furthermore, ScalableHD exhibits robust scalability: increasing the number of cores yields near-proportional throughput improvements.


Article 570

Title@2025-06-10 (2): TTrace: Lightweight Error Checking and Diagnosis for Distributed Training

Title: TTrace: Lightweight Error Checking and Diagnosis for Distributed Training TTrace: Leichte Fehlerprüfung und Diagnose für verteiltes Training TTrace:轻量度错误检查和分布式培训诊断 2506.09280v1

Authors (8): Haitian Jiang, Shaowei Zhu, Zhen Zhang, Zhenyu Song, Xinwei Fu, Zhen Jia, Yida Wang, Jinyang Li

Distributed training is essential for scaling the training of large neural network models, such as large language models (LLMs), across thousands of GPUs. However, the complexity of distributed training programs makes them particularly prone to silent bugs, which do not produce an explicit error signal but lead to incorrect training outcomes. Effectively detecting and localizing such silent bugs in distributed training is challenging. Common debugging practice using metrics like training loss or gradient norm curves can be inefficient and ineffective. Additionally, obtaining intermediate tensor values and determining whether they are correct during silent bug localization is difficult, particularly in the context of low-precision training. To address those challenges, we design and implement TTrace, the first system capable of detecting and localizing silent bugs in distributed training. TTrace collects intermediate tensors from distributed training in a fine-grained manner and compares them against those from a trusted single-device reference implementation. To properly compare the floating-point values in the tensors, we propose a novel mathematical analysis that provides a guideline for setting thresholds, enabling TTrace to distinguish bug-induced errors from floating-point round-off errors. Experimental results demonstrate that TTrace effectively detects 11 existing bugs and 3 new bugs in the widely used Megatron-LM framework, while requiring fewer than 10 lines of code change. TTrace is effective in various training recipes, including low-precision recipes involving BF16 and FP8.
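
The comparison step described above can be sketched roughly as follows. This is only an illustration of thresholded comparison against a reference trace, not TTrace's actual analysis: the function names, the `slack` multiplier, and the rule "threshold = slack × machine epsilon" are assumptions made for this example, since the abstract does not state how thresholds are derived.

```python
import math

# Machine epsilon (spacing between 1.0 and the next representable value).
# The threshold rule built on these values is an assumption for illustration.
MACHINE_EPS = {"fp32": 2.0 ** -23, "bf16": 2.0 ** -7, "fp8_e4m3": 2.0 ** -3}

def relative_error(candidate, reference):
    """L2 relative error between two flat lists of floats."""
    num = math.sqrt(sum((c - r) ** 2 for c, r in zip(candidate, reference)))
    den = math.sqrt(sum(r * r for r in reference))
    return num / den if den > 0 else num

def first_divergence(trace, reference_trace, dtype="bf16", slack=4.0):
    """Return the name of the first tensor whose relative error exceeds
    slack * eps(dtype), or None if every tensor is within tolerance.
    `slack` is a hypothetical tuning knob, not a value from the paper."""
    threshold = slack * MACHINE_EPS[dtype]
    for (name, tensor), (_, ref) in zip(trace, reference_trace):
        if relative_error(tensor, ref) > threshold:
            return name
    return None

# Hypothetical traces: (tensor name, flattened values) in execution order.
reference_trace = [("layer0.out", [1.0, 2.0, 3.0]), ("layer1.out", [0.5, -1.0, 2.0])]
distributed_trace = [("layer0.out", [1.001, 1.999, 3.002]), ("layer1.out", [0.5, -1.0, 4.0])]

culprit = first_divergence(distributed_trace, reference_trace, dtype="bf16")
```

With a BF16-sized tolerance the small deviation in `layer0.out` is treated as round-off and the walk stops at `layer1.out`. Under an FP32-sized tolerance the same deviation would already be flagged, which is why the threshold has to depend on the training precision.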


Article 571

Title@2025-06-10 (2): Ambient Diffusion Omni: Training Good Models with Bad Data

Title: Ambient Diffusion Omni: Training Good Models with Bad Data Ambient Diffusion Omni: Gute Modelle mit schlechten Daten trainieren 环境扩散全云:培训不良数据良好模型 2506.10038v1

Authors (5): Giannis Daras, Adrian Rodriguez-Munoz, Adam Klivans, Antonio Torralba, Constantinos Daskalakis

We show how to use low-quality, synthetic, and out-of-distribution images to improve the quality of a diffusion model. Typically, diffusion models are trained on curated datasets that emerge from highly filtered data pools from the Web and other sources. We show that there is immense value in the lower-quality images that are often discarded. We present Ambient Diffusion Omni, a simple, principled framework to train diffusion models that can extract signal from all available images during training. Our framework exploits two properties of natural images – spectral power law decay and locality. We first validate our framework by successfully training diffusion models with images synthetically corrupted by Gaussian blur, JPEG compression, and motion blur. We then use our framework to achieve state-of-the-art ImageNet FID, and we show significant improvements in both image quality and diversity for text-to-image generative modeling. The core insight is that noise dampens the initial skew between the desired high-quality distribution and the mixed distribution we actually observe. We provide rigorous theoretical justification for our approach by analyzing the trade-off between learning from biased data versus limited unbiased data across diffusion times.


Article 572

Title: A Topic Modeling Analysis of Stigma Dimensions, Social, and Related Behavioral Circumstances in Clinical Notes Among Patients with HIV Thematische Modellierung von Stigma-Dimensionen, sozialen und damit zusammenhängenden Verhaltensumständen bei Patienten mit HIV 艾滋病毒携带者临床说明中的污名、社会和相关行为环境的典型问题专题分析 2506.09279v1

Authors (8): Ziyi Chen, Yiyang Liu, Mattia Prosperi, Krishna Vaddiparti, Robert L Cook, Jiang Bian, Yi Guo, Yonghui Wu

Objective: To characterize stigma dimensions, social, and related behavioral circumstances in people living with HIV (PLWHs) seeking care, using natural language processing methods applied to a large collection of electronic health record (EHR) clinical notes from a large integrated health system in the southeast United States. Methods: We identified a cohort of 9,140 PLWHs from the UF Health IDR and performed topic modeling analysis using Latent Dirichlet Allocation (LDA) to uncover stigma dimensions, social, and related behavioral circumstances. Domain experts created a seed list of HIV-related stigma keywords, then applied a snowball strategy to iteratively review notes for additional terms until saturation was reached. To identify more target topics, we tested three keyword-based filtering strategies. Domain experts manually reviewed the detected topics using the prevalent terms and key discussion topics. Word frequency analysis was used to highlight the prevalent terms associated with each topic. In addition, we conducted topic variation analysis among subgroups to examine differences across age and sex-specific demographics. Results and Conclusion: Topic modeling on sentences containing at least one keyword uncovered a wide range of topic themes associated with HIV-related stigma, social, and related behavioral circumstances, including “Mental Health Concern and Stigma”, “Social Support and Engagement”, “Limited Healthcare Access and Severe Illness”, and “Treatment Refusal and Isolation”, among others. Topic variation analysis across age subgroups revealed differences. Extracting and understanding the HIV-related stigma dimensions, social, and related behavioral circumstances from EHR clinical notes enables scalable, time-efficient assessment, overcoming the limitations of traditional questionnaires and improving patient outcomes.


Article 573

Title@2025-06-10 (2): UFM: A Simple Path towards Unified Dense Correspondence with Flow

Title: UFM: A Simple Path towards Unified Dense Correspondence with Flow UFM: Ein einfacher Weg zur Unified Dense Correspondence mit Flow UFM:通向统一思维与流动对应的简单路径 2506.09278v1

Authors (12): Yuchen Zhang, Nikhil Keetha, Chenwei Lyu, Bhuvan Jhamb, Yutian Chen, Yuheng Qiu, Jay Karhade, Shreyas Jha, Yaoyu Hu, Deva Ramanan, Sebastian Scherer, Wenshan Wang

Dense image correspondence is central to many applications, such as visual odometry, 3D reconstruction, object association, and re-identification. Historically, dense correspondence has been tackled separately for wide-baseline scenarios and optical flow estimation, despite the common goal of matching content between two images. In this paper, we develop a Unified Flow & Matching model (UFM), which is trained on unified data for pixels that are co-visible in both source and target images. UFM uses a simple, generic transformer architecture that directly regresses the (u,v) flow. It is easier to train and more accurate for large flows compared to the typical coarse-to-fine cost volumes in prior work. UFM is 28% more accurate than state-of-the-art flow methods (Unimatch), while also having 62% less error than and being 6.7x faster than dense wide-baseline matchers (RoMa). UFM is the first to demonstrate that unified training can outperform specialized approaches across both domains. This result enables fast, general-purpose correspondence and opens new directions for multi-modal, long-range, and real-time correspondence tasks.


Article 574

Title@2025-06-10 (2): EMMA: Efficient Visual Alignment in Multi-Modal LLMs

Title: EMMA: Efficient Visual Alignment in Multi-Modal LLMs EMMA: Effiziente visuelle Ausrichtung in Multi-Modal LLMs EMMA:多模式LMS中高效视觉协调 2410.02080v2

Authors (5): Sara Ghazanfari, Alexandre Araujo, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami

Multi-modal Large Language Models (MLLMs) have recently exhibited impressive general-purpose capabilities by leveraging vision foundation models to encode the core concepts of images into representations. These are then combined with instructions and processed by the language model to generate high-quality responses. Despite significant progress in enhancing the language component, challenges persist in optimally fusing visual encodings within the language model for task-specific adaptability. Recent research has focused on improving this fusion through modality adaptation modules but at the cost of significantly increased model complexity and training data needs. In this paper, we propose EMMA (Efficient Multi-Modal Adaptation), a lightweight cross-modality module designed to efficiently fuse visual and textual encodings, generating instruction-aware visual representations for the language model. Our key contributions include: (1) an efficient early fusion mechanism that integrates vision and language representations with minimal added parameters (less than 0.2% increase in model size); (2) an in-depth interpretability analysis that sheds light on the internal mechanisms of the proposed method; and (3) comprehensive experiments that demonstrate notable improvements on both specialized and general benchmarks for MLLMs. Empirical results show that EMMA boosts performance across multiple tasks by up to 9.3% while significantly improving robustness against hallucinations. Our code is available at https://github.com/SaraGhazanfari/EMMA


Article 575

Title@2025-06-10 (2): Learning The Minimum Action Distance

Title: Learning The Minimum Action Distance Die minimale Aktionsdistanz lernen 学习最低行动距离 2506.09276v1

Authors (4): Lorenzo Steccanella, Joshua B. Evans, Özgür Şimşek, Anders Jonsson

This paper presents a state representation framework for Markov decision processes (MDPs) that can be learned solely from state trajectories, requiring neither reward signals nor the actions executed by the agent. We propose learning the minimum action distance (MAD), defined as the minimum number of actions required to transition between states, as a fundamental metric that captures the underlying structure of an environment. MAD naturally enables critical downstream tasks such as goal-conditioned reinforcement learning and reward shaping by providing a dense, geometrically meaningful measure of progress. Our self-supervised learning approach constructs an embedding space where the distances between embedded state pairs correspond to their MAD, accommodating both symmetric and asymmetric approximations. We evaluate the framework on a comprehensive suite of environments with known MAD values, encompassing both deterministic and stochastic dynamics, as well as discrete and continuous state spaces, and environments with noisy observations. Empirical results demonstrate that the proposed approach not only efficiently learns accurate MAD representations across these diverse settings but also significantly outperforms existing state representation methods in terms of representation quality.
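
To make the target quantity concrete: when the transition graph of a small deterministic environment is known, the MAD can be computed exactly with breadth-first search. The sketch below shows only that ground-truth computation (the kind of "known MAD values" an evaluation would compare against), not the paper's self-supervised learning method. The toy graph and the name `minimum_action_distance` are hypothetical.

```python
from collections import deque

def minimum_action_distance(transitions, source):
    """Map each reachable state to the minimum number of actions
    needed to get there from `source` (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in dist:  # first visit is the shortest path in BFS
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return dist

# Hypothetical 4-state environment with one-way edges.
transitions = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}

mad_from_a = minimum_action_distance(transitions, "a")
mad_from_c = minimum_action_distance(transitions, "c")
```

Here reaching `c` from `a` takes two actions while reaching `a` from `c` takes one, a small instance of why an embedding may need to accommodate asymmetric distances.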


Article 576

Title@2025-06-10 (2): Amortized Inference of Causal Models via Conditional Fixed-Point Iterations

Title: Amortized Inference of Causal Models via Conditional Fixed-Point Iterations Amortisierte Schlussfolgerung von Kausalmodellen über bedingte Fixed-Point-Iterationen 通过有条件的固定点定点迭代对因果模型进行摊销 2410.06128v3

Authors (5): Divyat Mahajan, Jannes Gladrow, Agrin Hilmkil, Cheng Zhang, Meyer Scetbon

Structural Causal Models (SCMs) offer a principled framework to reason about interventions and support out-of-distribution generalization, which are key goals in scientific discovery. However, the task of learning SCMs from observed data poses formidable challenges, and often requires training a separate model for each dataset. In this work, we propose amortized inference of SCMs by training a single model on multiple datasets sampled from different SCMs. We first use a transformer-based architecture for amortized learning of dataset embeddings, and then extend the Fixed-Point Approach (FiP) (Scetbon et al.) to infer SCMs conditionally on their dataset embeddings. As a byproduct, our method can generate observational and interventional data from novel SCMs at inference time, without updating parameters. Empirical results show that our amortized procedure performs on par with baselines trained specifically for each dataset on both in- and out-of-distribution problems, and also outperforms them in scarce data regimes.


Article 577

Title@2025-06-10 (2): RocketPPA: Code-Level Power, Performance, and Area Prediction via LLM and Mixture of Experts

Title: RocketPPA: Code-Level Power, Performance, and Area Prediction via LLM and Mixture of Experts RocketPPA: Code-Level Power, Performance und Area Prediction über LLM und Mixture of Experts 火箭式PPPA:通过LLM和专家混合进行代码级动力、性能和地区预测 2503.21971v3

Authors (3): Armin Abdollahi, Mehdi Kamal, Massoud Pedram

This paper presents RocketPPA, a novel ultra-fast power, performance (delay), and area (PPA) estimator operating directly at the code-level abstraction using HDL code as input. The key technical innovation is its LLM-based regression model, which uniquely integrates a large language model (LLM) with a mixture-of-experts (MoE) architecture composed of multilayer perceptrons (MLPs). The LLM interprets the input HDL code and then utilizes its final hidden-layer representations to predict PPA metrics. Low-rank adaptation (LoRA) is used for parameter-efficient fine-tuning to enable efficient LLM training. Furthermore, the work includes the development of an LLM-based HDL code repair framework to generate a large and synthesizable training dataset. Experimental results on the VerilogEval benchmark demonstrate that RocketPPA achieves significant improvements in the accuracy of PPA estimation compared to previous state-of-the-art methods like Llama3-MetRex-8B. Specifically, at a 10% relative error threshold, RocketPPA enhances the pass rate for area prediction by 13.6%, delay by 9.4%, and power by 14.7%. At a 20% threshold, the improvements are 9.6% for area, 10.8% for delay, and 18.5% for power. Moreover, RocketPPA achieves a speedup of over 20x compared to MetRex and 30x over MasterRTL in processing the test set. The impact of RocketPPA is the potential to substantially accelerate the hardware design process by providing accurate PPA estimations early in the design cycle, thus avoiding the overhead of manual feature engineering and time-consuming synthesis flows.


Article 578

Title@2025-06-10 (2): G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration

Title: G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration G-Sim: Generative Simulationen mit großen Sprachmodellen und gradientfreier Kalibrierung G-Sim:使用大语言模型和梯度自由校准的生成模拟 2506.09272v1

Authors (4): Samuel Holt, Max Ruiz Luyten, Antonin Berthon, Mihaela van der Schaar

Constructing robust simulators is essential for asking “what if?” questions and guiding policy in critical domains like healthcare and logistics. However, existing methods often struggle, either failing to generalize beyond historical data or, when using Large Language Models (LLMs), suffering from inaccuracies and poor empirical alignment. We introduce G-Sim, a hybrid framework that automates simulator construction by synergizing LLM-driven structural design with rigorous empirical calibration. G-Sim employs an LLM in an iterative loop to propose and refine a simulator’s core components and causal relationships, guided by domain knowledge. This structure is then grounded in reality by estimating its parameters using flexible calibration techniques. Specifically, G-Sim can leverage methods that are both likelihood-free and gradient-free with respect to the simulator, such as gradient-free optimization for direct parameter estimation or simulation-based inference for obtaining a posterior distribution over parameters. This allows it to handle non-differentiable and stochastic simulators. By integrating domain priors with empirical evidence, G-Sim produces reliable, causally-informed simulators, mitigating data-inefficiency and enabling robust system-level interventions for complex decision-making.


Article 579

Title@2025-06-10 (2): Uncertainty Prioritized Experience Replay

Title: Uncertainty Prioritized Experience Replay Ungewissheit Priorisierte Erfahrung Replay 不确定性确定性优先经验重放 2506.09270v1

Authors (4): Rodrigo Carrasco-Davis, Sebastian Lee, Claudia Clopath, Will Dabney

Prioritized experience replay, which improves sample efficiency by selecting relevant transitions to update parameter estimates, is a crucial component of contemporary value-based deep reinforcement learning models. Typically, transitions are prioritized based on their temporal difference error. However, this approach is prone to favoring noisy transitions, even when the value estimation closely approximates the target mean. This phenomenon resembles the noisy TV problem postulated in the exploration literature, in which exploration-guided agents get stuck by mistaking noise for novelty. To mitigate the disruptive effects of noise in value estimation, we propose using epistemic uncertainty estimation to guide the prioritization of transitions from the replay buffer. Epistemic uncertainty quantifies the uncertainty that can be reduced by learning, hence reducing transitions sampled from the buffer generated by unpredictable random processes. We first illustrate the benefits of epistemic uncertainty prioritized replay in two tabular toy models: a simple multi-arm bandit task, and a noisy gridworld. Subsequently, we evaluate our prioritization scheme on the Atari suite, outperforming quantile regression deep Q-learning benchmarks; thus forging a path for the use of uncertainty prioritized replay in reinforcement learning agents.
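
The contrast between the two prioritization signals can be sketched as below. The abstract does not say how epistemic uncertainty is estimated, so this example assumes disagreement (variance) across a small ensemble of value estimates as a stand-in. The buffer entries, field names, and `sampling_probs` helper are all hypothetical.

```python
import random
from statistics import pvariance

def sampling_probs(priorities, alpha=1.0, eps=1e-6):
    """Turn non-negative priorities into sampling probabilities
    proportional to (priority + eps) ** alpha."""
    scaled = [(p + eps) ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]

# Hypothetical replay buffer. "noisy" has a large TD error caused by a
# random target, but the ensemble agrees on its value; "novel" has a small
# TD error, but the ensemble disagrees (assumed proxy for epistemic uncertainty).
buffer = [
    {"name": "noisy", "td_error": 2.0, "ensemble_q": [1.0, 1.01, 0.99, 1.0]},
    {"name": "novel", "td_error": 0.5, "ensemble_q": [0.2, 1.4, -0.3, 0.9]},
]

td_probs = sampling_probs([abs(t["td_error"]) for t in buffer])
unc_probs = sampling_probs([pvariance(t["ensemble_q"]) for t in buffer])

# Draw a mini-batch using the uncertainty-based priorities.
rng = random.Random(0)
batch = rng.choices(buffer, weights=unc_probs, k=4)
```

TD-error prioritization favors the noisy transition, whereas the uncertainty-based priorities shift almost all sampling mass to the transition the ensemble has not yet agreed on.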


Article 580

Title@2025-06-10 (2): A Stable Whitening Optimizer for Efficient Neural Network Training

Title: A Stable Whitening Optimizer for Efficient Neural Network Training Ein stabiler Whitening Optimizer für effizientes Neural Network Training 高效神经网络培训的稳定白电优化器 2506.07254v2

Authors (3): Kevin Frans, Sergey Levine, Pieter Abbeel

In this work, we take an experimentally grounded look at neural network optimization. Building on the Shampoo family of algorithms, we identify and alleviate three key issues, resulting in the proposed SPlus method. First, we find that naive Shampoo is prone to divergence when matrix-inverses are cached for long periods. We introduce an alternate bounded update combining a historical eigenbasis with instantaneous normalization, resulting in across-the-board stability and significantly lower computational requirements. Second, we adapt a shape-aware scaling to enable learning rate transfer across network width. Third, we find that high learning rates result in large parameter noise, and propose a simple iterate-averaging scheme which unblocks faster learning. To properly confirm these findings, we introduce a pointed Transformer training benchmark, considering three objectives (language modelling, image classification, and diffusion modelling) across different stages of training. On average, SPlus is able to reach the validation performance of Adam within 44% of the gradient steps and 62% of the wallclock time.


Article 581

Title@2025-06-10 (2): Dataset Properties Shape the Success of Neuroimaging-Based Patient Stratification: A Benchmarking Analysis Across Clustering Algorithms

Title: Dataset Properties Shape the Success of Neuroimaging-Based Patient Stratification: A Benchmarking Analysis Across Clustering Algorithms Datensatzeigenschaften gestalten den Erfolg der Neuroimaging-basierten Patientenschichtung: Eine Benchmarking-Analyse über Cluster-Algorithmen hinweg 数据集属性形状基于神经成形的患者分层成功率:对集束数值的基准分析 2503.12066v2

Authors (6): Yuetong Yu, Ruiyang Ge, Ilker Hacihaliloglu, Alexander Rauscher, Roger Tam, Sophia Frangou

Background: Data-driven stratification of patients into biologically informed subtypes holds promise for precision neuropsychiatry, yet neuroimaging-based clustering methods often fail to generalize across cohorts. While algorithmic innovations have focused on model complexity, the role of underlying dataset characteristics remains underexplored. We hypothesized that cluster separation, size imbalance, noise, and the direction and magnitude of disease-related effects in the input data critically determine both within-algorithm accuracy and reproducibility. Methods: We evaluated 4 widely used stratification algorithms, HYDRA, SuStaIn, SmileGAN, and SurrealGAN, on a suite of synthetic brain-morphometry cohorts derived from the Human Connectome Project Young Adult dataset. Three global transformation patterns were applied to 600 pseudo-patients against 508 controls, followed by 4 within-dataset variations varying cluster count (k=2-6), overlap, and effect magnitude. Algorithm performance was quantified by accuracy in recovering the known ground-truth clusters. Results: Across 122 synthetic scenarios, data complexity consistently outweighed algorithm choice in predicting stratification success. Well-separated clusters yielded high accuracy for all methods, whereas overlapping, unequal-sized, or subtle effects reduced accuracy by up to 50%. SuStaIn could not scale beyond 17 features, and HYDRA’s accuracy varied unpredictably with data heterogeneity. SmileGAN and SurrealGAN maintained robust pattern detection but did not assign discrete cluster labels to individuals. Conclusions: The study results demonstrate the impact of statistical properties of input data across algorithms, highlight the need for using realistic dataset distributions when new algorithms are being developed, and suggest greater focus on data-centric strategies that actively shape and standardize the input distributions.


Article 582

Title@2025-06-10 (2): Archon: An Architecture Search Framework for Inference-Time Techniques

Title: Archon: An Architecture Search Framework for Inference-Time Techniques Archon: Ein Architektur-Suchrahmen für Inferenz-Zeit-Techniken Archon: 推断-时间技术建筑搜索框架 2409.15254v6

Authors (11): Jon Saad-Falcon, Adrian Gamarra Lafuente, Shlok Natarajan, Nahum Maru, Hristo Todorov, Etash Guha, E. Kelly Buchanan, Mayee Chen, Neel Guha, Christopher Ré, Azalia Mirhoseini

Inference-time techniques, such as repeated sampling or iterative revisions, are emerging as powerful ways to enhance large-language models (LLMs) at test time. However, best practices for developing systems that combine these techniques remain underdeveloped due to our limited understanding of the utility of each technique across models and tasks, the interactions between them, and the massive search space for combining them. To address these challenges, we introduce Archon, a modular and automated framework for optimizing the process of selecting and combining inference-time techniques and LLMs. Given a compute budget and a set of available LLMs, Archon explores a large design space to discover optimized configurations tailored to target benchmarks. It can design custom or general-purpose architectures that advance the Pareto frontier of accuracy vs. maximum token budget compared to top-performing baselines. Across instruction-following, reasoning, and coding tasks, we show that Archon can leverage additional inference compute budget to design systems that outperform frontier models such as OpenAI’s o1, GPT-4o, and Claude 3.5 Sonnet by an average of 15.1%.


Article 583

Title@2025-06-10 (2): CFMI: Flow Matching for Missing Data Imputation

Title: CFMI: Flow Matching for Missing Data Imputation CFMI: Flow Matching für fehlende Daten-Imputation CFMI: 缺失数据计算流程匹配 2506.09258v1

Authors (2): Vaidotas Simkus, Michael U. Gutmann

We introduce conditional flow matching for imputation (CFMI), a new general-purpose method to impute missing data. The method combines continuous normalising flows, flow-matching, and shared conditional modelling to deal with intractabilities of traditional multiple imputation. Our comparison with nine classical and state-of-the-art imputation methods on 24 small to moderate-dimensional tabular data sets shows that CFMI matches or outperforms both traditional and modern techniques across a wide range of metrics. Applying the method to zero-shot imputation of time-series data, we find that it matches the accuracy of a related diffusion-based method while outperforming it in terms of computational efficiency. Overall, CFMI performs at least as well as traditional methods on lower-dimensional data while remaining scalable to high-dimensional settings, matching or exceeding the performance of other deep learning-based approaches, making it a go-to imputation method for a wide range of data types and dimensionalities.


Article 584

Title@2025-06-10 (2): Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Title: Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Kommentar zur Illusion des Denkens: Die Stärken und Grenzen von Vernunftmodellen über das Lens of Problem Complexity verstehen 关于 “ 思考的幻觉:通过问题复杂焦点了解理性模型的长处和局限性 “ 的评论 2506.09250v1

Authors (2): C. Opus, A. Lawsen

Shojaee et al. (2025) report that Large Reasoning Models (LRMs) exhibit “accuracy collapse” on planning puzzles beyond certain complexity thresholds. We demonstrate that their findings primarily reflect experimental design limitations rather than fundamental reasoning failures. Our analysis reveals three critical issues: (1) Tower of Hanoi experiments systematically exceed model output token limits at reported failure points, with models explicitly acknowledging these constraints in their outputs; (2) The authors’ automated evaluation framework fails to distinguish between reasoning failures and practical constraints, leading to misclassification of model capabilities; (3) Most concerningly, their River Crossing benchmarks include mathematically impossible instances for N > 5 due to insufficient boat capacity, yet models are scored as failures for not solving these unsolvable problems. When we control for these experimental artifacts, by requesting generating functions instead of exhaustive move lists, preliminary experiments across multiple models indicate high accuracy on Tower of Hanoi instances previously reported as complete failures. These findings highlight the importance of careful experimental design when evaluating AI reasoning capabilities.
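
The arithmetic behind the first issue is easy to check: the optimal Tower of Hanoi solution for N disks has 2^N − 1 moves, so an exhaustive move list grows exponentially while a program that generates it stays a few lines long. The sketch below illustrates that gap. It is not the authors' evaluation code, and the function names are chosen here for illustration.

```python
def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Yield the optimal move sequence for n disks as (disk, from_peg, to_peg)."""
    if n == 0:
        return
    # Move the top n-1 disks out of the way, onto the auxiliary peg.
    yield from hanoi_moves(n - 1, src, dst, aux)
    # Move the largest disk to its destination.
    yield (n, src, dst)
    # Move the n-1 disks from the auxiliary peg onto the largest disk.
    yield from hanoi_moves(n - 1, aux, src, dst)

def move_count(n):
    """Closed form for the length of the optimal solution: 2**n - 1."""
    return 2 ** n - 1

# Length of the exhaustive move list for a few problem sizes.
sizes = {n: move_count(n) for n in (3, 10, 15, 20)}
```

A 20-disk instance already requires more than a million moves, so whether a written-out solution fits in a fixed output budget depends on N long before any question of reasoning ability arises.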


Article 585

Title@2025-06-10 (2): Simulation-based Inference for High-dimensional Data using Surjective Sequential Neural Likelihood Estimation

Title: Simulation-based Inference for High-dimensional Data using Surjective Sequential Neural Likelihood Estimation Simulationsbasierte Inferenz für hochdimensionale Daten mittels Surjektiv-Sequentieller Neural Likelihood-Abschätzung 使用粒子序列神经神经近似近似性估计法对高维数据进行基于模拟的推论 2308.01054v3

Authors (3): Simon Dirmeier, Carlo Albert, Fernando Perez-Cruz

Neural likelihood estimation methods for simulation-based inference can suffer from performance degradation when the modeled data is very high-dimensional or lies along a lower-dimensional manifold, which is due to the inability of the density estimator to accurately estimate a density function. We present Surjective Sequential Neural Likelihood (SSNL) estimation, a novel member in the family of methods for simulation-based inference (SBI). SSNL fits a dimensionality-reducing surjective normalizing flow model and uses it as a surrogate likelihood function, which allows for computational inference via Markov chain Monte Carlo or variational Bayes methods. Among other benefits, SSNL avoids the requirement to manually craft summary statistics for inference of high-dimensional data sets, since the lower-dimensional representation is computed simultaneously with learning the likelihood and without additional computational overhead. We evaluate SSNL on a wide variety of experiments, including two challenging real-world examples from the astrophysics and neuroscience literatures, and show that it either outperforms or is on par with state-of-the-art methods, making it an excellent off-the-shelf estimator for SBI for high-dimensional data sets.


Article 586

Title@2025-06-10 (2): Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation

Title: Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation 在多式工业数据库检索增强型多式工业数据库中有条件监测援助 2506.09247v1

Authors (5): Karl Löwenmark, Daniel Strömbergsson, Chang Liu, Marcus Liwicki, Fredrik Sandin

Condition monitoring (CM) plays a crucial role in ensuring reliability and efficiency in the process industry. Although computerised maintenance systems effectively detect and classify faults, tasks like fault severity estimation and maintenance decisions still largely depend on human expert analysis. The analysis and decision making automatically performed by current systems typically exhibit considerable uncertainty and high false alarm rates, leading to increased workload and reduced efficiency. This work integrates large language model (LLM)-based reasoning agents with CM workflows to address analyst and industry needs, namely reducing false alarms, enhancing fault severity estimation, improving decision support, and offering explainable interfaces. We propose MindRAG, a modular framework combining multimodal retrieval-augmented generation (RAG) with novel vector store structures designed specifically for CM data. The framework leverages existing annotations and maintenance work orders as surrogates for labels in a supervised learning protocol, addressing the common challenge of training predictive models on unlabelled and noisy real-world datasets. The primary contributions include: (1) an approach for structuring industry CM data into a semi-structured multimodal vector store compatible with LLM-driven workflows; (2) developing multimodal RAG techniques tailored for CM data; (3) developing practical reasoning agents capable of addressing real-world CM queries; and (4) presenting an experimental framework for integrating and evaluating such agents in realistic industrial scenarios. Preliminary results, evaluated with the help of an experienced analyst, indicate that MindRAG provides meaningful decision support for more efficient management of alarms, thereby improving the interpretability of CM systems.


Article 587

Title@2025-06-10 (2): A Generative Framework for Predictive Modeling of Multiple Chronic Conditions Using Graph Variational Autoencoder and Bandit-Optimized Graph Neural Network

Title: A Generative Framework for Predictive Modeling of Multiple Chronic Conditions Using Graph Variational Autoencoder and Bandit-Optimized Graph Neural Network Ein generatives Framework für die vorausschauende Modellierung mehrerer chronischer Zustände mit Graph Variational Autoencoder und Bandit-Optimiertes Graphen-Neural-Netzwerk 利用图表变异自动编码器和强盗优化的图形神经网络对多种慢性条件进行预测模拟的生成框架 2409.13671v3

Authors (5): Julian Carvajal Rico, Adel Alaeddini, Syed Hasib Akhter Faruqui, Susan P Fisher-Hoch, Joseph B Mccormick

Predicting the emergence of multiple chronic conditions (MCC) is crucial for early intervention and personalized healthcare, as MCC significantly impacts patient outcomes and healthcare costs. Graph neural networks (GNNs) are effective methods for modeling complex graph data, such as those found in MCC. However, a significant challenge with GNNs is their reliance on an existing graph structure, which is not readily available for MCC. To address this challenge, we propose a novel generative framework for GNNs that constructs a representative underlying graph structure by utilizing the distribution of the data to enhance predictive analytics for MCC. Our framework employs a graph variational autoencoder (GVAE) to capture the complex relationships in patient data. This allows for a comprehensive understanding of individual health trajectories and facilitates the creation of diverse patient stochastic similarity graphs while preserving the original feature set. These variations of patient stochastic similarity graphs, generated from the GVAE decoder, are then processed by a GNN using a novel Laplacian regularization technique to refine the graph structure over time and improve the prediction accuracy of MCC. A contextual Bandit is designed to evaluate the stochastically generated graphs and identify the best-performing graph for the GNN model iteratively until model convergence. We validate the performance of the proposed contextual Bandit algorithm against $\varepsilon$-Greedy and multi-armed Bandit algorithms on a large cohort (n = 1,592) of patients with MCC. These advancements highlight the potential of the proposed approach to transform predictive healthcare analytics, enabling a more personalized and proactive approach to MCC management.



Article 588

Title@2025-06-10 (2): Root Cause Attribution of Delivery Risks via Causal Discovery with Reinforcement Learning

Title: Root Cause Attribution of Delivery Risks via Causal Discovery with Reinforcement Learning Ursachenzuweisung von Lieferrisiken durch kausale Entdeckung mit Verstärkungslernen 利用强化学习发现原因和强化学习,确定分娩风险的根源原因 2408.05860v3

Authors (1): Minheng Xiao

This paper presents a novel approach to root cause attribution of delivery risks within supply chains by integrating causal discovery with reinforcement learning. As supply chains become increasingly complex, traditional methods of root cause analysis struggle to capture the intricate interrelationships between various factors, often leading to spurious correlations and suboptimal decision-making. Our approach addresses these challenges by leveraging causal discovery to identify the true causal relationships between operational variables, and reinforcement learning to iteratively refine the causal graph. This method enables the accurate identification of key drivers of late deliveries, such as shipping mode and delivery status, and provides actionable insights for optimizing supply chain performance. We apply our approach to a real-world supply chain dataset, demonstrating its effectiveness in uncovering the underlying causes of delivery delays and offering strategies for mitigating these risks. The findings have significant implications for improving operational efficiency, customer satisfaction, and overall profitability within supply chains.



Article 589

Title@2025-06-10 (2): Electroencephalogram Emotion Recognition via AUC Maximization

Title: Electroencephalogram Emotion Recognition via AUC Maximization Elektroenzephalogramm-Emotionserkennung über AUC-Maximierung 通过AUUC实现最大化来识别电脑图情感 2408.08979v3

Authors (1): Minheng Xiao

Imbalanced datasets pose significant challenges in areas including neuroscience, cognitive science, and medical diagnostics, where accurately detecting minority classes is essential for robust model performance. This study addresses the issue of class imbalance, using the 'Liking' label in the DEAP dataset as an example. Such imbalances are often overlooked by prior research, which typically focuses on the more balanced arousal and valence labels and predominantly uses accuracy metrics to measure model performance. To tackle this issue, we adopt numerical optimization techniques aimed at maximizing the area under the curve (AUC), thus enhancing the detection of underrepresented classes. Our approach, which begins with a linear classifier, is compared against traditional linear classifiers, including logistic regression and support vector machines (SVM). Our method significantly outperforms these models, increasing recall from 41.6% to 79.7% and improving the F1-score from 0.506 to 0.632. These results highlight the efficacy of AUC maximization via numerical optimization in managing imbalanced datasets, providing an effective solution for enhancing predictive accuracy in detecting minority but crucial classes in out-of-sample datasets.
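
As an illustration of what AUC maximization with a linear scorer can look like, the sketch below minimizes a pairwise logistic surrogate of the AUC by gradient descent. It is not the authors' method: the surrogate loss, the synthetic two-dimensional data, and every name (`auc`, `fit_auc_linear`, `make_imbalanced`) are assumptions made for this example, and the DEAP dataset is not used.

```python
import math
import random


def auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_auc_linear(X, y, lr=0.1, epochs=200):
    """Gradient descent on the mean pairwise loss log(1 + exp(-(w.x_pos - w.x_neg)))."""
    dim = len(X[0])
    w = [0.0] * dim
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    n_pairs = len(pos) * len(neg)
    for _ in range(epochs):
        grad = [0.0] * dim
        for p in pos:
            for n in neg:
                diff = [a - b for a, b in zip(p, n)]
                margin = sum(wi * di for wi, di in zip(w, diff))
                # Derivative of log(1 + exp(-margin)) w.r.t. margin is -1 / (1 + exp(margin)).
                coef = -1.0 / (1.0 + math.exp(min(margin, 50.0)))
                for k in range(dim):
                    grad[k] += coef * diff[k]
        w = [wi - lr * g / n_pairs for wi, g in zip(w, grad)]
    return w


def make_imbalanced(n_pos=10, n_neg=60, shift=2.0, seed=0):
    """Hypothetical imbalanced data: a small positive class shifted away from the negatives."""
    rng = random.Random(seed)
    X = [[rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)] for _ in range(n_pos)]
    X += [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_neg)]
    y = [1] * n_pos + [0] * n_neg
    return X, y


X, y = make_imbalanced()
w = fit_auc_linear(X, y)
scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
print(f"training AUC of the fitted linear scorer: {auc(scores, y):.3f}")
```

Because the loss depends only on score differences between positive and negative examples, the minority class contributes to every term, which is the usual motivation for optimizing a ranking objective instead of accuracy on imbalanced data.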



Article 590

Title@2025-06-10 (2): PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies

Title: PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies PatchGuard: Widerwärtig robuste Anomalieerkennung und Lokalisierung durch Vision Transformer und Pseudoanomalien PatchGuard:通过视觉变形器和假冒异常器反向强力异常检测和定位 2506.09237v1

Authors (5): Mojtaba Nafez, Amirhossein Koochakian, Arad Maleki, Jafar Habibi, Mohammad Hossein Rohban

Anomaly Detection (AD) and Anomaly Localization (AL) are crucial in fields that demand high reliability, such as medical imaging and industrial monitoring. However, current AD and AL approaches are often susceptible to adversarial attacks due to limitations in training data, which typically include only normal, unlabeled samples. This study introduces PatchGuard, an adversarially robust AD and AL method that incorporates pseudo anomalies with localization masks within a Vision Transformer (ViT)-based architecture to address these vulnerabilities. We begin by examining the essential properties of pseudo anomalies, and follow it by providing theoretical insights into the attention mechanisms required to enhance the adversarial robustness of AD and AL systems. We then present our approach, which leverages Foreground-Aware Pseudo-Anomalies to overcome the deficiencies of previous anomaly-aware methods. Our method incorporates these crafted pseudo-anomaly samples into a ViT-based framework, with adversarial training guided by a novel loss function designed to improve model robustness, as supported by our theoretical analysis. Experimental results on well-established industrial and medical datasets demonstrate that PatchGuard significantly outperforms previous methods in adversarial settings, achieving performance gains of $53.2\%$ in AD and $68.5\%$ in AL, while also maintaining competitive accuracy in non-adversarial settings. The code repository is available at https://github.com/rohban-lab/PatchGuard .



Article 591

Title@2025-06-10 (2): FastLloyd: Federated, Accurate, Secure, and Tunable $k$-Means Clustering with Differential Privacy

Title: FastLloyd: Federated, Accurate, Secure, and Tunable $k$-Means Clustering with Differential Privacy FastLloyd: Federated, Exakt, Secure, and Tunable $k$-Means Clustering with Differential Privacy 联邦、准确、安全和金枪鱼用$k$-配有不同隐私的组合型工具 2405.02437v3

Authors (3): Abdulrahman Diaa, Thomas Humphries, Florian Kerschbaum

We study the problem of privacy-preserving $k$-means clustering in the horizontally federated setting. Existing federated approaches using secure computation suffer from substantial overheads and do not offer output privacy. At the same time, differentially private (DP) $k$-means algorithms either assume a trusted central curator or significantly degrade utility by adding noise in the local DP model. Naively combining the secure and central DP solutions results in a protocol with impractical overhead. Instead, our work provides enhancements to both the DP and secure computation components, resulting in a design that is faster, more private, and more accurate than previous work. By utilizing the computational DP model, we design a lightweight, secure aggregation-based approach that achieves five orders of magnitude speed-up over state-of-the-art related work. Furthermore, we not only maintain the utility of the state-of-the-art in the central model of DP, but we improve the utility further by designing a new DP clustering mechanism.



Article 592

Title@2025-06-10 (2): LieRE: Lie Rotational Positional Encodings

Title: LieRE: Lie Rotational Positional Encodings LieRE: Lie Rotational Positional Encodings 谎言: 谎言旋转位置编码 2406.10322v4

Authors (6): Sophie Ostmeier, Brian Axelrod, Maya Varma, Michael E. Moseley, Akshay Chaudhari, Curtis Langlotz

Transformer architectures depend on explicit position encodings to capture token positional information. Rotary Position Encoding (RoPE) has emerged as a popular choice in language models due to its efficient encoding of relative position information through key-query rotations. However, RoPE faces significant limitations beyond language processing: it is constrained to one-dimensional sequence data and, even with learnable phases, offers limited representational capacity. We address these challenges with Lie Relative Encodings (LieRE), which generalizes RoPE to high-dimensional rotation matrices by leveraging their Lie group structure. Through extensive evaluation on three image datasets across 2D and 3D classification tasks, LieRE achieves 1.5% improvement over state-of-the-art baselines on 2D tasks and 1% on 3D tasks, while demonstrating superior generalization to higher resolutions. Our implementation is computationally efficient, with results reproducible on 4 A100 GPUs in 30 minutes on CIFAR100. Our code is available at https://github.com/StanfordMIMI/LieRE.



Article 593

Title@2025-06-10 (2): SoK: Machine Unlearning for Large Language Models

Title: SoK: Machine Unlearning for Large Language Models SoK: Maschinelles Lernen für große Sprachmodelle SoK:大语言模型的机器脱学 2506.09227v1

Authors (5): Jie Ren, Yue Xing, Yingqian Cui, Charu C. Aggarwal, Hui Liu

Large language model (LLM) unlearning has become a critical topic in machine learning, aiming to eliminate the influence of specific training data or knowledge without retraining the model from scratch. A variety of techniques have been proposed, including Gradient Ascent, model editing, and re-steering hidden representations. While existing surveys often organize these methods by their technical characteristics, such classifications tend to overlook a more fundamental dimension: the underlying intention of unlearning–whether it seeks to truly remove internal knowledge or merely suppress its behavioral effects. In this SoK paper, we propose a new taxonomy based on this intention-oriented perspective. Building on this taxonomy, we make three key contributions. First, we revisit recent findings suggesting that many removal methods may functionally behave like suppression, and explore whether true removal is necessary or achievable. Second, we survey existing evaluation strategies, identify limitations in current metrics and benchmarks, and suggest directions for developing more reliable and intention-aligned evaluations. Third, we highlight practical challenges–such as scalability and support for sequential unlearning–that currently hinder the broader deployment of unlearning methods. In summary, this work offers a comprehensive framework for understanding and advancing unlearning in generative AI, aiming to support future research and guide policy decisions around data removal and privacy.



Article 594

Title@2025-06-10 (2): The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound

Title: The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound Die Wirksamkeit von Semantik-erhaltenden Transformationen im selbstüberwachten Lernen für medizinischen Ultraschall 医学超声波自监测学习中语义保留变异的功效 2504.07904v2

Authors (4): Blake VanBerlo, Alexander Wong, Jesse Hoey, Robert Arntfield

Data augmentation is a central component of joint embedding self-supervised learning (SSL). Approaches that work for natural images may not always be effective in medical imaging tasks. This study systematically investigated the impact of data augmentation and preprocessing strategies in SSL for lung ultrasound. Three data augmentation pipelines were assessed: (1) a baseline pipeline commonly used across imaging domains, (2) a novel semantic-preserving pipeline designed for ultrasound, and (3) a distilled set of the most effective transformations from both pipelines. Pretrained models were evaluated on multiple classification tasks: B-line detection, pleural effusion detection, and COVID-19 classification. Experiments revealed that semantics-preserving data augmentation resulted in the greatest performance for COVID-19 classification - a diagnostic task requiring global image context. Cropping-based methods yielded the greatest performance on the B-line and pleural effusion object classification tasks, which require strong local pattern recognition. Lastly, semantics-preserving ultrasound image preprocessing resulted in increased downstream performance for multiple tasks. Guidance regarding data augmentation and preprocessing strategies was synthesized for practitioners working with SSL in ultrasound.



Article 595

Title@2025-06-10 (2): Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs

Title: Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs Robuste Geräuschdämpfung durch adaptive Bündelung von Transformer-Ausgängen 通过调适式组合变异器产出,通过调适式组合组合,强有力噪音减速 2506.09215v1

Authors (1): Greyson Brothers

We investigate the design of pooling methods used to summarize the outputs of transformer embedding models, primarily motivated by reinforcement learning and vision applications. This work considers problems where a subset of the input vectors contains requisite information for a downstream task (signal) while the rest are distractors (noise). By framing pooling as vector quantization with the goal of minimizing signal loss, we demonstrate that the standard methods used to aggregate transformer outputs, AvgPool, MaxPool, and ClsToken, are vulnerable to performance collapse as the signal-to-noise ratio (SNR) of inputs fluctuates. We then show that an attention-based adaptive pooling method can approximate the signal-optimal vector quantizer within derived error bounds for any SNR. Our theoretical results are first validated by supervised experiments on a synthetic dataset designed to isolate the SNR problem, then generalized to standard relational reasoning, multi-agent reinforcement learning, and vision benchmarks with noisy observations, where transformers with adaptive pooling display superior robustness across tasks.
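
The contrast between average pooling and attention-based pooling under a low signal-to-noise ratio can be sketched in a few lines. This is a toy under stated assumptions, not the paper's model: the query is set by hand here (it would normally be learned), the first coordinate is a hypothetical "signal flag" feature that the query keys on, and the noise vectors are deterministic placeholders.

```python
import math


def avg_pool(vectors):
    """Uniform average over the set of input vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def attention_pool(vectors, query):
    """Softmax-weighted average, with weights given by dot products against the query."""
    logits = [sum(q * x for q, x in zip(query, v)) for v in vectors]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) / total for k in range(dim)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# One signal vector among 50 distractors (low SNR).
signal = [1.0, 2.0, -1.0]
noise = [[0.0, math.sin(i), math.cos(i)] for i in range(50)]
inputs = [signal] + noise
query = [8.0, 0.0, 0.0]  # hand-set: attends to the hypothetical signal flag

print("AvgPool error:       ", round(distance(avg_pool(inputs), signal), 3))
print("attention pool error:", round(distance(attention_pool(inputs, query), signal), 3))
```

With a uniform average, the signal's weight shrinks as 1/(number of inputs), so the pooled vector drifts toward the distractors as they multiply; the softmax weighting keeps most of the mass on the signal vector as long as its logit stays well above the rest.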



Article 596

Title@2025-06-10 (2): Revisiting Graph Projections for Effective Complementary Product Recommendation

Title: Revisiting Graph Projections for Effective Complementary Product Recommendation Überarbeiten von Graph Projektionen für eine effektive ergänzende Produktempfehlung 修订有效补充产品建议书的图表预测 2506.09209v1

Authors (3): Leandro Anghinoni, Pablo Zivic, Jorge Adrian Sanchez

Complementary product recommendation is a powerful strategy to improve customer experience and retail sales. However, recommending the right product is not a simple task because of the noisy and sparse nature of user-item interactions. In this work, we propose a simple yet effective method to predict a list of complementary products given a query item, based on the structure of a directed weighted graph projected from the user-item bipartite graph. We revisit bipartite graph projections for recommender systems and propose a novel approach for inferring complementarity relationships from historical user-item interactions. We compare our model with recent methods from the literature and show, despite the simplicity of our approach, an average improvement of +43% and +38% over sequential and graph-based recommenders, respectively, over different benchmarks.
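
To make the idea of a directed weighted projection concrete, the sketch below projects a user-item bipartite graph onto items and ranks neighbours of a query item. The weighting used here, the share of users who interacted with item i that also interacted with item j, is an assumption chosen for illustration; the abstract does not specify the paper's weighting, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def project(interactions):
    """Project user -> items onto a directed item graph with weight(i -> j) = P(j | i)."""
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in interactions.values():
        unique = set(items)
        for i in unique:
            item_count[i] += 1
        # Ordered pairs, so (i, j) and (j, i) are counted as separate edges.
        for i, j in permutations(unique, 2):
            pair_count[(i, j)] += 1
    graph = defaultdict(dict)
    for (i, j), count in pair_count.items():
        graph[i][j] = count / item_count[i]
    return graph


def recommend(graph, query, k=3):
    """Top-k out-neighbours of the query item by edge weight (ties broken by name)."""
    neighbours = graph.get(query, {})
    return sorted(neighbours, key=lambda j: (-neighbours[j], j))[:k]


interactions = {
    "u1": ["phone", "case"],
    "u2": ["phone", "case", "charger"],
    "u3": ["phone", "case"],
    "u4": ["case"],
    "u5": ["charger", "cable"],
}
graph = project(interactions)
print(recommend(graph, "phone", k=2))
print(graph["phone"]["case"], graph["case"]["phone"])
```

The asymmetry is the point of using a directed graph: in this toy data every phone buyer also bought a case, but not every case buyer bought a phone, so the two edge weights differ.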



Article 597

Title@2025-06-10 (2): A Comparative Study of Conventional and Tripolar EEG for High-Performance Reach-to-Grasp BCI Systems

Title: A Comparative Study of Conventional and Tripolar EEG for High-Performance Reach-to-Grasp BCI Systems Vergleichende Studie des konventionellen und tripolaren EEG für Hochleistungs-Reach-to-Grap BCI-Systeme 利用常规和三极EEG进行高绩效接近格拉斯普到格拉斯普BCI系统比较研究 2402.09448v3

Authors (6): Ali Rabiee, Sima Ghafoori, Anna Cetera, Maryam Norouzi, Walter Besio, Reza Abiri

This study aims to enhance BCI applications for individuals with motor impairments by comparing the effectiveness of tripolar EEG (tEEG) with conventional EEG. The focus is on interpreting and decoding various grasping movements, such as power grasp and precision grasp. The goal is to determine which EEG technology is more effective in processing and translating grasp related neural signals. The approach involved experimenting on ten healthy participants who performed two distinct grasp movements: power grasp and precision grasp, with a no movement condition serving as the baseline. Our research presents a thorough comparison between EEG and tEEG in decoding grasping movements. This comparison spans several key parameters, including signal to noise ratio (SNR), spatial resolution via functional connectivity, ERPs, and wavelet time frequency analysis. Additionally, our study involved extracting and analyzing statistical features from the wavelet coefficients, and both binary and multiclass classification methods were employed. Four machine learning algorithms were used to evaluate the decoding accuracies. Our results indicated that tEEG demonstrated superior performance over conventional EEG in various aspects. This included a higher signal to noise ratio, enhanced spatial resolution, and more informative data in ERPs and wavelet time frequency analysis. The use of tEEG led to notable improvements in decoding accuracy for differentiating movement types. Specifically, tEEG achieved around 90% accuracy in binary and 75.97% for multiclass classification. These results are markedly better than those from standard EEG, which recorded a maximum of 77.85% and 61.27% in similar tasks, respectively. These findings highlight the superior effectiveness of tEEG over EEG in decoding grasp types and its competitive or superior performance in complex classifications compared with existing research.



Article 598

Title@2025-06-10 (2): FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models

Title: FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models FLoRIST: Singular Value Thresholding für effizientes und präzises Federated Fine-Tuning von großen Sprachmodellen FLORST:为高效率和准确的联邦大语言模型的精密使用而保持单一价值 2506.09199v1

Authors (2): Hariharan Ramesh, Jyotikrishna Dass

Integrating Low-Rank Adaptation (LoRA) into federated learning offers a promising solution for parameter-efficient fine-tuning of Large Language Models (LLMs) without sharing local data. However, several methods designed for federated LoRA present significant challenges in balancing communication efficiency, model accuracy, and computational cost, particularly among heterogeneous clients. These methods either rely on simplistic averaging of local adapters, which introduces aggregation noise; require transmitting large stacked local adapters, leading to poor communication efficiency; or necessitate reconstructing a memory-dense global weight-update matrix and performing a computationally expensive decomposition to design client-specific low-rank adapters. In this work, we propose FLoRIST, a federated fine-tuning framework that achieves mathematically accurate aggregation without incurring high communication or computational overhead. Instead of constructing the full global weight-update matrix at the server, FLoRIST employs an efficient decomposition pipeline by performing singular value decomposition on stacked local adapters separately. This approach operates within a compact intermediate space to represent the accumulated information from local LoRAs. We introduce tunable singular value thresholding for server-side optimal rank selection to construct a pair of global low-rank adapters shared by all clients. Extensive empirical evaluations across multiple datasets and LLMs demonstrate that FLoRIST consistently strikes the best balance between superior communication efficiency and competitive performance in both homogeneous and heterogeneous setups.



Article 599

Title@2025-06-10 (2): Multimodal Pragmatic Jailbreak on Text-to-image Models

Title: Multimodal Pragmatic Jailbreak on Text-to-image Models Multimodaler Pragmatischer Jailbreak zu Text-to-Image-Modellen 关于文本到图像模型的多式实用监狱 2409.19149v2

Authors (9): Tong Liu, Zhixin Lai, Jiawen Wang, Gengyuan Zhang, Shuo Chen, Philip Torr, Vera Demberg, Volker Tresp, Jindong Gu

Diffusion models have recently achieved remarkable advancements in terms of image quality and fidelity to textual prompts. Concurrently, the safety of such generative models has become an area of growing concern. This work introduces a novel type of jailbreak, which triggers T2I models to generate an image with visual text, where the image and the text, although considered to be safe in isolation, combine to form unsafe content. To systematically explore this phenomenon, we propose a dataset to evaluate the current diffusion-based text-to-image (T2I) models under such jailbreaks. We benchmark nine representative T2I models, including two closed-source commercial models. Experimental results reveal a concerning tendency to produce unsafe content: all tested models suffer from this type of jailbreak, with rates of unsafe generation ranging from around 10% to 70%, where DALLE 3 demonstrates almost the highest unsafety. In real-world scenarios, various filters, such as keyword blocklists, customized prompt filters, and NSFW image filters, are commonly employed to mitigate these risks. We evaluate the effectiveness of such filters against our jailbreak and find that, while these filters may be effective for single modality detection, they fail to work against our jailbreak. We also investigate the underlying reason for such jailbreaks, from the perspective of text rendering capability and training data. Our work provides a foundation for further development towards more secure and reliable T2I models. Project page at https://multimodalpragmatic.github.io/.



Article 600

Title@2025-06-10 (2): Certified Unlearning for Neural Networks

Title: Certified Unlearning for Neural Networks Zertifiziertes Lernen für neurale Netzwerke 神经网络认证的不学习 2506.06985v2

Authors (5): Anastasia Koloskova, Youssef Allouah, Animesh Jha, Rachid Guerraoui, Sanmi Koyejo

We address the problem of machine unlearning, where the goal is to remove the influence of specific training data from a model upon request, motivated by privacy concerns and regulatory requirements such as the “right to be forgotten.” Unfortunately, existing methods rely on restrictive assumptions or lack formal guarantees. To this end, we propose a novel method for certified machine unlearning, leveraging the connection between unlearning and privacy amplification by stochastic post-processing. Our method uses noisy fine-tuning on the retain data, i.e., data that does not need to be removed, to ensure provable unlearning guarantees. This approach requires no assumptions about the underlying loss function, making it broadly applicable across diverse settings. We analyze the theoretical trade-offs in efficiency and accuracy and demonstrate empirically that our method not only achieves formal unlearning guarantees but also performs effectively in practice, outperforming existing baselines. Our code is available at https://github.com/stair-lab/certified-unlearning-neural-networks-icml-2025



Article 601

Title@2025-06-10 (2): LaDCast: A Latent Diffusion Model for Medium-Range Ensemble Weather Forecasting

Title: LaDCast: A Latent Diffusion Model for Medium-Range Ensemble Weather Forecasting LaDCast: Ein latentes Diffusionsmodell für mittelgroße Ensemble-Wettervorhersage LaDCast: 中频集合天气预报的延迟传播模型 2506.09193v1

Authors (2): Yilin Zhuang, Karthik Duraisamy

Accurate probabilistic weather forecasting demands both high accuracy and efficient uncertainty quantification, challenges that overburden both ensemble numerical weather prediction (NWP) and recent machine-learning methods. We introduce LaDCast, the first global latent-diffusion framework for medium-range ensemble forecasting, which generates hourly ensemble forecasts entirely in a learned latent space. An autoencoder compresses high-dimensional ERA5 reanalysis fields into a compact representation, and a transformer-based diffusion model produces sequential latent updates with arbitrary hour initialization. The model incorporates Geometric Rotary Position Embedding (GeoRoPE) to account for the Earth’s spherical geometry, a dual-stream attention mechanism for efficient conditioning, and sinusoidal temporal embeddings to capture seasonal patterns. LaDCast achieves deterministic and probabilistic skill close to that of the European Centre for Medium-Range Forecast IFS-ENS, without any explicit perturbations. Notably, LaDCast demonstrates superior performance in tracking rare extreme events such as cyclones, capturing their trajectories more accurately than established models. By operating in latent space, LaDCast reduces storage and compute by orders of magnitude, demonstrating a practical path toward forecasting at kilometer-scale resolution in real time. We open-source our code and models and provide the training and evaluation pipelines at: https://github.com/tonyzyl/ladcast.



Article 602

Title@2025-06-10 (2): Multi-Task Reward Learning from Human Ratings

Title: Multi-Task Reward Learning from Human Ratings Multi-Task-Reward-Lernen aus menschlichen Bewertungen 从人类评级中学习多任务奖励 2506.09183v1

Authors (6): Mingkang Wu, Devin White, Evelyn Rose, Vernon Lawhern, Nicholas R Waytowich, Yongcan Cao

Reinforcement learning from human feedback (RLHF) has become a key factor in aligning model behavior with users’ goals. However, while humans integrate multiple strategies when making decisions, current RLHF approaches often simplify this process by modeling human reasoning through isolated tasks such as classification or regression. In this paper, we propose a novel reinforcement learning (RL) method that mimics human decision-making by jointly considering multiple tasks. Specifically, we leverage human ratings in reward-free environments to infer a reward function, introducing learnable weights that balance the contributions of both classification and regression models. This design captures the inherent uncertainty in human decision-making and allows the model to adaptively emphasize different strategies. We conduct several experiments using synthetic human ratings to validate the effectiveness of the proposed approach. Results show that our method consistently outperforms existing rating-based RL methods, and in some cases, even surpasses traditional RL approaches.



Article 603

Title@2025-06-10 (2): Pointwise confidence estimation in the non-linear $\ell^2$-regularized least squares

Title: Pointwise confidence estimation in the non-linear $\ell^2$-regularized least squares Punktweise Vertrauensschätzung in den nichtlinearen $\ell^2$-regularisierten kleinsten Quadraten 在非线性$\ell2美元正规化最低平方中点性信心估计值 2506.07088v2

Authors (2): Ilja Kuzborskij, Yasin Abbasi Yadkori

We consider a high-probability non-asymptotic confidence estimation in the $\ell^2$-regularized non-linear least-squares setting with fixed design. In particular, we study confidence estimation for local minimizers of the regularized training loss. We show a pointwise confidence bound, meaning that it holds for the prediction on any given fixed test input $x$. Importantly, the proposed confidence bound scales with similarity of the test input to the training data in the implicit feature space of the predictor (for instance, becoming very large when the test input lies far outside of the training data). This desirable last feature is captured by the weighted norm involving the inverse-Hessian matrix of the objective function, which is a generalized version of its counterpart in the linear setting, $x^{\top} \text{Cov}^{-1} x$. Our generalized result can be regarded as a non-asymptotic counterpart of the classical confidence interval based on asymptotic normality of the MLE estimator. We propose an efficient method for computing the weighted norm, which only mildly exceeds the cost of a gradient computation of the loss function. Finally, we complement our analysis with empirical evidence showing that the proposed confidence bound provides better coverage/width trade-off compared to a confidence estimation by bootstrapping, which is a gold-standard method in many applications involving non-linear predictors such as neural networks.
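
The linear-setting quantity mentioned above, $x^{\top} \text{Cov}^{-1} x$, can be computed by hand in two dimensions to see why it grows for test inputs unlike the training data. The sketch assumes that Cov denotes the regularized Gram matrix $X^{\top}X + \lambda I$ of the training inputs; that reading, the toy design, and the function names are assumptions of this example, and the paper's non-linear bound (which uses the inverse Hessian) is not reproduced.

```python
import math


def regularized_gram(X, lam):
    """Entries (a, b, d) of the symmetric 2x2 matrix X^T X + lam * I = [[a, b], [b, d]]."""
    a = sum(x[0] * x[0] for x in X) + lam
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X) + lam
    return a, b, d


def weighted_norm(x, X, lam):
    """sqrt(x^T (X^T X + lam * I)^{-1} x), using the closed-form 2x2 inverse."""
    a, b, d = regularized_gram(X, lam)
    det = a * d - b * b
    # Inverse of [[a, b], [b, d]] is (1 / det) * [[d, -b], [-b, a]].
    quad = (d * x[0] ** 2 - 2.0 * b * x[0] * x[1] + a * x[1] ** 2) / det
    return math.sqrt(quad)


# Training inputs lie almost entirely along the first axis.
X_train = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.1]]
lam = 0.1

near = weighted_norm([1.0, 0.0], X_train, lam)  # direction well covered by the data
far = weighted_norm([0.0, 1.0], X_train, lam)   # direction barely seen in training
print(f"near: {near:.3f}  far: {far:.3f}")
```

The test input along the poorly covered second axis gets a weighted norm roughly an order of magnitude larger than the one along the first axis, matching the qualitative behaviour the abstract describes for inputs far from the training data.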



Article 604

Title@2025-06-10 (2): Spatial Reasoning with Denoising Models

Title: Spatial Reasoning with Denoising Models Raumgerechtes Denken mit vernarbenden Modellen 空间原因与排减模式 2502.21075v2

Authors (4): Christopher Wewer, Bart Pogodzinski, Bernt Schiele, Jan Eric Lenssen

We introduce Spatial Reasoning Models (SRMs), a framework to perform reasoning over sets of continuous variables via denoising generative models. SRMs infer continuous representations on a set of unobserved variables, given observations on observed variables. Current generative models on spatial domains, such as diffusion and flow matching models, often collapse to hallucination in the case of complex distributions. To measure this, we introduce a set of benchmark tasks that test the quality of complex reasoning in generative models and can quantify hallucination. The SRM framework allows us to report key findings about the importance of sequentialization in generation, the associated order, and the sampling strategies during training. It demonstrates, for the first time, that the order of generation can successfully be predicted by the denoising network itself. Using these findings, we can increase the accuracy of specific reasoning tasks from <1% to >50%. Our project website provides additional videos, code, and the benchmark datasets: https://geometric-rl.mpi-inf.mpg.de/srm



Article 605

Title@2025-06-10 (2): Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism

Title: Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism Roboter-Gated Interaktives Imitation-Lernen mit adaptivem Interventionsmechanismus 采用适应性干预机制的机器人自制交互式模拟学习 2506.09176v1

Authors (3): Haoyuan Cai, Zhenghao Peng, Bolei Zhou

Interactive Imitation Learning (IIL) allows agents to acquire desired behaviors through human interventions, but current methods impose high cognitive demands on human supervisors. We propose the Adaptive Intervention Mechanism (AIM), a novel robot-gated IIL algorithm that learns an adaptive criterion for requesting human demonstrations. AIM utilizes a proxy Q-function to mimic the human intervention rule and adjusts intervention requests based on the alignment between agent and human actions. By assigning high Q-values when the agent deviates from the expert and decreasing these values as the agent becomes proficient, the proxy Q-function enables the agent to assess the real-time alignment with the expert and request assistance when needed. Our expert-in-the-loop experiments reveal that AIM significantly reduces expert monitoring efforts in both continuous and discrete control tasks. Compared to the uncertainty-based baseline Thrifty-DAgger, our method achieves a 40% improvement in terms of human take-over cost and learning efficiency. Furthermore, AIM effectively identifies safety-critical states for expert assistance, thereby collecting higher-quality expert demonstrations and reducing overall expert data and environment interactions needed. Code and demo video are available at https://github.com/metadriverse/AIM.



Article 606

Title@2025-06-10 (2): Multivariate Long-term Time Series Forecasting with Fourier Neural Filter

Title: Multivariate Long-term Time Series Forecasting with Fourier Neural Filter Multivariate Langzeit-Zeitreihenprognose mit Fourier-Neuralfilter 使用 Fourier神经过滤器预测多变长期时间序列 2506.09174v1

Authors (4): Chenheng Xu, Dan Wu, Yixin Zhu, Ying Nian Wu

Multivariate long-term time series forecasting has been suffering from the challenge of capturing both temporal dependencies within variables and spatial correlations across variables simultaneously. Current approaches predominantly repurpose backbones from natural language processing or computer vision (e.g., Transformers), which fail to adequately address the unique properties of time series (e.g., periodicity). The research community lacks a dedicated backbone with temporal-specific inductive biases, instead relying on domain-agnostic backbones supplemented with auxiliary techniques (e.g., signal decomposition). We introduce FNF as the backbone and DBD as the architecture to provide excellent learning capabilities and optimal learning pathways for spatio-temporal modeling, respectively. Our theoretical analysis proves that FNF unifies local time-domain and global frequency-domain information processing within a single backbone that extends naturally to spatial modeling, while information bottleneck theory demonstrates that DBD provides superior gradient flow and representation capacity compared to existing unified or sequential architectures. Our empirical evaluation across 11 public benchmark datasets spanning five domains (energy, meteorology, transportation, environment, and nature) confirms state-of-the-art performance with consistent hyperparameter settings. Notably, our approach achieves these results without any auxiliary techniques, suggesting that properly designed neural architectures can capture the inherent properties of time series, potentially transforming time series modeling in scientific and industrial applications.



Article 607

Title@2025-06-10 (2): The Curious Language Model: Strategic Test-Time Information Acquisition

Title: The Curious Language Model: Strategic Test-Time Information Acquisition Das Kuriose Sprachmodell: Strategische Test-Zeit-Informationsbeschaffung 好奇语言模式:战略试验时间信息采集 2506.09173v1

Authors (5): Michael Cooper, Rohan Wadhawan, John Michael Giorgi, Chenhao Tan, Davis Liang

Decision-makers often possess insufficient information to render a confident decision. In these cases, the decision-maker can often undertake actions to acquire the necessary information about the problem at hand, e.g., by consulting knowledgeable authorities or by conducting experiments. Importantly, different levers of information acquisition come with different costs, posing the challenge of selecting the actions that are both informative and cost-effective. In this work, we propose CuriosiTree, a heuristic-based, test-time policy for zero-shot information acquisition in large language models (LLMs). CuriosiTree employs a greedy tree search to estimate the expected information gain of each action and strategically chooses actions based on a balance of anticipated information gain and associated cost. Empirical validation in a clinical diagnosis simulation shows that CuriosiTree enables cost-effective integration of heterogeneous sources of information, and outperforms baseline action selection strategies in selecting action sequences that enable accurate diagnosis.
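The trade-off between expected information gain and cost described above can be illustrated with a one-step toy example. This is only a sketch, not the CuriosiTree implementation: the explicit likelihood tables, the action names, the `cost_weight` parameter, and the linear "gain minus weighted cost" score are all assumptions made for illustration, whereas the paper estimates information gain with a tree search driven by an LLM.

```python
import math

def entropy(p):
    # Shannon entropy in bits of a discrete distribution.
    return -sum(q * math.log2(q) for q in p if q > 0)

def expected_info_gain(prior, likelihood):
    # likelihood[o][h] = P(outcome o | hypothesis h); hypothetical input format.
    gain = entropy(prior)
    for row in likelihood:
        p_outcome = sum(l * p for l, p in zip(row, prior))
        if p_outcome == 0:
            continue
        posterior = [l * p / p_outcome for l, p in zip(row, prior)]
        gain -= p_outcome * entropy(posterior)
    return gain

def choose_action(prior, actions, cost_weight=1.0):
    # Greedy choice: maximize expected gain minus weighted cost (assumed scoring rule).
    def score(name):
        spec = actions[name]
        return expected_info_gain(prior, spec["likelihood"]) - cost_weight * spec["cost"]
    return max(actions, key=score)

prior = [0.5, 0.5]  # two candidate diagnoses, equally likely
actions = {
    # A perfectly informative but expensive test.
    "blood_test": {"likelihood": [[1.0, 0.0], [0.0, 1.0]], "cost": 0.9},
    # A noisy but cheap question.
    "ask_history": {"likelihood": [[0.8, 0.2], [0.2, 0.8]], "cost": 0.05},
}
best = choose_action(prior, actions)
```

With these made-up numbers the cheap, noisy question is preferred once cost is taken into account, while setting `cost_weight=0.0` selects the fully informative test.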



Article 608

Title@2025-06-10 (2): MultiNet: An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models

Title: MultiNet: An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models MultiNet: Ein Open Source Software Toolkit & Benchmark Suite für die Bewertung und Anpassung multimodaler Aktionsmodelle 多网:多模式行动模式评价和适应评估的开放源软件工具包 基准套件 2506.09172v1

Authors (3): Pranav Guruprasad, Yangyue Wang, Harshvardhan Sikka

Recent innovations in multimodal action models represent a promising direction for developing general-purpose agentic systems, combining visual understanding, language comprehension, and action generation. We introduce MultiNet - a novel, fully open-source benchmark and surrounding software ecosystem designed to rigorously evaluate and adapt models across vision, language, and action domains. We establish standardized evaluation protocols for assessing vision-language models (VLMs) and vision-language-action models (VLAs), and provide open source software to download relevant data, models, and evaluations. Additionally, we provide a composite dataset with over 1.3 trillion tokens of image captioning, visual question answering, commonsense reasoning, robotic control, digital game-play, simulated locomotion/manipulation, and many more tasks. The MultiNet benchmark, framework, toolkit, and evaluation harness have been used in downstream research on the limitations of VLA generalization.



Article 609

Title: Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search Verbesserung der LLM-Agentenplanung mit In-Context Learning über Atomic Fact Augmentation und Lookahead Search 通过原子事实增强和目光头搜索改进LLM Agent 规划,通过原子事实增强和目光头搜索进行文中学习 2506.09171v1

Authors (4): Samuel Holt, Max Ruiz Luyten, Thomas Pouplin, Mihaela van der Schaar

Large Language Models (LLMs) are increasingly capable but often require significant guidance or extensive interaction history to perform effectively in complex, interactive environments. Existing methods may struggle with adapting to new information or efficiently utilizing past experiences for multi-step reasoning without fine-tuning. We introduce a novel LLM agent framework that enhances planning capabilities through in-context learning, facilitated by atomic fact augmentation and a recursive lookahead search. Our agent learns to extract task-critical “atomic facts” from its interaction trajectories. These facts dynamically augment the prompts provided to LLM-based components responsible for action proposal, latent world model simulation, and state-value estimation. Planning is performed via a depth-limited lookahead search, where the LLM simulates potential trajectories and evaluates their outcomes, guided by the accumulated facts and interaction history. This approach allows the agent to improve its understanding and decision-making online, leveraging its experience to refine its behavior without weight updates. We provide a theoretical motivation linking performance to the quality of fact-based abstraction and LLM simulation accuracy. Empirically, our agent demonstrates improved performance and adaptability on challenging interactive tasks, achieving more optimal behavior as it accumulates experience, showcased in tasks such as TextFrozenLake and ALFWorld.



Article 610

Title@2025-06-10 (2): ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning

Title: ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning ProteinZero: Selbstverbessernde Proteinerzeugung über Online-Verstärkungslernen 蛋白零:通过在线强化学习改善蛋白质生成 2506.07459v2

Authors (6): Ziwen Wang, Jiajun Fan, Ruihan Guo, Thao Nguyen, Heng Ji, Ge Liu

Protein generative models have shown remarkable promise in protein design but still face limitations in success rate, due to the scarcity of high-quality protein datasets for supervised pretraining. We present ProteinZero, a novel framework that enables scalable, automated, and continuous self-improvement of the inverse folding model through online reinforcement learning. To achieve computationally tractable online feedback, we introduce efficient proxy reward models based on ESM-fold and a novel rapid ddG predictor that significantly accelerates evaluation speed. ProteinZero employs a general RL framework balancing multi-reward maximization, KL-divergence from a reference model, and a novel protein-embedding level diversity regularization that prevents mode collapse while promoting higher sequence diversity. Through extensive experiments, we demonstrate that ProteinZero substantially outperforms existing methods across every key metric in protein design, achieving significant improvements in structural accuracy, designability, thermodynamic stability, and sequence diversity. Most impressively, ProteinZero reduces design failure rates by approximately 36% to 48% compared to widely used methods like ProteinMPNN, ESM-IF and InstructPLM, consistently achieving success rates exceeding 90% across diverse and complex protein folds. Notably, the entire RL run on CATH-4.3 can be done on a single 8-GPU node in under 3 days, including reward computation. Our work establishes a new paradigm for protein design where models evolve continuously from their own generated outputs, opening new possibilities for exploring the vast protein design space.



Article 611

Title@2025-06-10 (2): Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models

Title: Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models Ist die Lärmkonditionierung notwendig? Eine einheitliche Theorie der Bedingungslosen Graphen-Diffusionsmodelle 是否有必要设定噪音条件? 无条件图形扩散模型的统一理论 2505.22935v2

Authors (2): Jipeng Li, Yanning Shen

Explicit noise-level conditioning is widely regarded as essential for the effective operation of Graph Diffusion Models (GDMs). In this work, we challenge this assumption by investigating whether denoisers can implicitly infer noise levels directly from corrupted graph structures, potentially eliminating the need for explicit noise conditioning. To this end, we develop a theoretical framework centered on Bernoulli edge-flip corruptions and extend it to encompass more complex scenarios involving coupled structure-attribute noise. Extensive empirical evaluations on both synthetic and real-world graph datasets, using models such as GDSS and DiGress, provide strong support for our theoretical findings. Notably, unconditional GDMs achieve performance comparable or superior to their conditioned counterparts, while also offering reductions in parameters (4-6%) and computation time (8-10%). Our results suggest that the high-dimensional nature of graph data itself often encodes sufficient information for the denoising process, opening avenues for simpler, more efficient GDM architectures.
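A simple worked calculation suggests why a corrupted graph can reveal its own noise level under Bernoulli edge flips. This is an illustrative sketch rather than the paper's derivation; the symbols $p_0$ (clean edge density), $\beta_t$ (flip probability at step $t$), $\hat{p}_t$ (observed density of the corrupted graph), and $N = n(n-1)/2$ (number of node pairs) are assumptions introduced here, and $p_0$ is assumed known.

```latex
% Each of the N node pairs is flipped independently with probability \beta_t.
\mathbb{E}[\hat{p}_t] = p_0 (1 - \beta_t) + (1 - p_0)\,\beta_t
                      = p_0 + \beta_t (1 - 2 p_0)
% Solving for the flip probability (valid when p_0 \neq 1/2):
\hat{\beta}_t = \frac{\hat{p}_t - p_0}{1 - 2 p_0}
% The observed density averages N independent Bernoulli variables, so
\operatorname{Var}[\hat{p}_t] \le \frac{1}{4N}
```

Because the variance shrinks as $1/N$, the estimate sharpens quickly as the graph grows, which is consistent with the abstract's claim that high-dimensional graph data often carries enough information for the denoiser to infer the noise level.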



Article 612

Title@2025-06-10 (2): Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes

Title: Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes Skalierbare Spatiotemporale Inferenz mit Biased Scan Aufmerksamkeit Transformer Neural Prozesse 可缩缩缩的 Spatoto时推理 2506.09163v1

Authors (7): Daniel Jenson, Jhonathan Navott, Piotr Grynfelder, Mengyan Zhang, Makkunda Sharma, Elizaveta Semenova, Seth Flaxman

Neural Processes (NPs) are a rapidly evolving class of models designed to directly model the posterior predictive distribution of stochastic processes. While early architectures were developed primarily as a scalable alternative to Gaussian Processes (GPs), modern NPs tackle far more complex and data hungry applications spanning geology, epidemiology, climate, and robotics. These applications have placed increasing pressure on the scalability of these models, with many architectures compromising accuracy for scalability. In this paper, we demonstrate that this tradeoff is often unnecessary, particularly when modeling fully or partially translation invariant processes. We propose a versatile new architecture, the Biased Scan Attention Transformer Neural Process (BSA-TNP), which introduces Kernel Regression Blocks (KRBlocks), group-invariant attention biases, and memory-efficient Biased Scan Attention (BSA). BSA-TNP is able to: (1) match or exceed the accuracy of the best models while often training in a fraction of the time, (2) exhibit translation invariance, enabling learning at multiple resolutions simultaneously, (3) transparently model processes that evolve in both space and time, (4) support high dimensional fixed effects, and (5) scale gracefully – running inference with over 1M test points with 100K context points in under a minute on a single 24GB GPU.



Article 613

Title@2025-06-10 (2): ICONS: Influence Consensus for Vision-Language Data Selection

Title: ICONS: Influence Consensus for Vision-Language Data Selection ICONS: Einflusskonsens für Vision-Sprachen-Datenauswahl ICONS: 愿景语言数据选择影响共识 2501.00654v3

Authors (6): Xindi Wu, Mengzhou Xia, Rulin Shao, Zhiwei Deng, Pang Wei Koh, Olga Russakovsky

Training vision-language models via instruction tuning often relies on large mixtures of data spanning diverse tasks and domains. However, these mixtures frequently include redundant information, increasing computational costs without proportional performance gains, necessitating more effective data selection strategies. Existing methods typically rely on task-agnostic heuristics to estimate data importance or focus on optimizing single tasks in isolation, limiting their effectiveness in multitask settings. In this work, we introduce ICONS, a gradient-based Influence CONsensus approach for vision-language data Selection. Our method leverages first-order training dynamics to estimate the influence of individual training examples on validation performance and aggregates these estimates across tasks via majority voting over task-specific influences. This cross-task consensus identifies data points that are consistently valuable across tasks, enabling us to prioritize examples that drive overall performance. The voting-based design further mitigates issues such as score calibration and outlier sensitivity, resulting in robust and scalable data selection for diverse multitask mixtures. With only 20% of the data from LLaVA-665K and Cambrian-7M, our selected subsets retain 98.6% and 98.8% of the performance achieved with full datasets, and can even surpass full data training at a 60% selection ratio on LLaVA-665K. Our approach also generalizes to unseen tasks and architectures, demonstrating strong transfer. We release two compact, high-utility subsets, LLaVA-ICONS-133K and Cambrian-ICONS-1.4M, preserving impactful training examples for efficient and scalable vision-language model development.
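The cross-task majority-voting step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ICONS code: the per-task influence scores are made-up inputs, and the `top_k` voting threshold and index-based tie-breaking are hypothetical choices.

```python
from collections import Counter

def consensus_select(influence, budget, top_k):
    # influence: dict mapping task name -> list of per-example influence scores
    # (hypothetical input). Each task votes for its top_k most influential examples.
    n = len(next(iter(influence.values())))
    votes = Counter()
    for scores in influence.values():
        top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
        votes.update(top)
    # Rank examples by vote count; ties are broken by index for determinism.
    ranked = sorted(range(n), key=lambda i: (-votes[i], i))
    return ranked[:budget]

influence = {
    "vqa":     [0.9, 0.1, 0.8, 0.2, 0.0],
    "ocr":     [0.7, 0.9, 0.6, 0.1, 0.0],
    "caption": [5.0, 0.0, 4.0, 3.0, 1.0],  # different scale on purpose
}
selected = consensus_select(influence, budget=2, top_k=2)
```

Because each task contributes only votes, the much larger raw scores of the `caption` task do not dominate the selection, which reflects the abstract's point about score calibration and outlier sensitivity.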



Article 614

Title@2025-06-10 (2): FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts

Title: FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts FC-Attack:通过自动发光流程图制作的破监狱多式多式大语言模型 2502.21059v2

Authors (5): Ziyi Zhang, Zhen Sun, Zongmin Zhang, Jihui Guo, Xinlei He

Multimodal Large Language Models (MLLMs) have become powerful and widely adopted in some practical applications. However, recent research has revealed their vulnerability to multimodal jailbreak attacks, whereby the model can be induced to generate harmful content, leading to safety risks. Although most MLLMs have undergone safety alignment, recent research shows that the visual modality is still vulnerable to jailbreak attacks. In our work, we discover that by using flowcharts with partially harmful information, MLLMs can be induced to provide additional harmful details. Based on this, we propose a jailbreak attack method based on auto-generated flowcharts, FC-Attack. Specifically, FC-Attack first fine-tunes a pre-trained LLM to create a step-description generator based on benign datasets. The generator is then used to produce step descriptions corresponding to a harmful query, which are transformed into flowcharts in 3 different shapes (vertical, horizontal, and S-shaped) as visual prompts. These flowcharts are then combined with a benign textual prompt to execute the jailbreak attack on MLLMs. Our evaluations on Advbench show that FC-Attack attains an attack success rate of up to 96% via images and up to 78% via videos across multiple MLLMs. Additionally, we investigate factors affecting the attack performance, including the number of steps and the font styles in the flowcharts. We also find that FC-Attack can improve the jailbreak performance from 4% to 28% in Claude-3.5 by changing the font style. To mitigate the attack, we explore several defenses and find that AdaShield can largely reduce the jailbreak performance, but at the cost of a drop in utility.



Article 615

Title@2025-06-10 (2): QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

Title: QuEST: Stable Training of LLMs with 1-Bit Weights and Activations QuEST: Stabiles Training von LLMs mit 1-Bit Gewichten und Aktivierungen QuEST:对具有1-比分重量和激活作用的LLMs进行稳定培训 2502.05003v2

Authors (6): Andrei Panferov, Jiale Chen, Soroush Tabesh, Roberto L. Castro, Mahdi Nikdan, Dan Alistarh

One approach to reducing the massive costs of large language models (LLMs) is the use of quantized or sparse representations for training or deployment. While post-training compression methods are very popular, the question of obtaining even more accurate compressed models by directly training over such representations, i.e., Quantization-Aware Training (QAT), is still open: for example, a recent study (arXiv:2411.04330) put the “optimal” bit-width at which models can be trained using QAT, while staying accuracy-competitive with standard FP16/BF16 precision, at 8-bit weights and activations. We advance this state-of-the-art via a new method called QuEST, for which we demonstrate optimality at 4 bits and stable convergence down to 1-bit weights and activations. QuEST achieves this by improving two key aspects of QAT methods: (1) accurate and fast quantization of the (continuous) distributions of weights and activations via Hadamard normalization and MSE-optimal fitting; (2) a new trust gradient estimator based on the idea of explicitly minimizing the error between the noisy gradient computed over quantized states and the “true” (but unknown) full-precision gradient. Experiments on Llama-type architectures show that QuEST induces stable scaling laws across the entire range of hardware-supported precisions, and can be extended to sparse representations. We provide GPU kernel support showing that models produced by QuEST can be executed efficiently. Our code is available at https://github.com/IST-DASLab/QuEST.



Article 616

Title@2025-06-10 (2): TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval

Title: TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval TRACE: Erdungszeitreihen im Kontext für multimodale Einbettung und Retrieval TRACE: 在多式嵌入和检索背景下确定时间序列 2506.09114v1

Authors (8): Jialin Chen, Ziyu Zhao, Gaukhar Nurbek, Aosong Feng, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, Rex Ying

The ubiquity of dynamic data in domains such as weather, healthcare, and energy underscores a growing need for effective interpretation and retrieval of time-series data. These data are inherently tied to domain-specific contexts, such as clinical notes or weather narratives, making cross-modal retrieval essential not only for downstream tasks but also for developing robust time-series foundation models by retrieval-augmented generation (RAG). Despite the increasing demand, time-series retrieval remains largely underexplored. Existing methods often lack semantic grounding, struggle to align heterogeneous modalities, and have limited capacity for handling multi-channel signals. To address this gap, we propose TRACE, a generic multimodal retriever that grounds time-series embeddings in aligned textual context. TRACE enables fine-grained channel-level alignment and employs hard negative mining to facilitate semantically meaningful retrieval. It supports flexible cross-modal retrieval modes, including Text-to-Timeseries and Timeseries-to-Text, effectively linking linguistic descriptions with complex temporal patterns. By retrieving semantically relevant pairs, TRACE enriches downstream models with informative context, leading to improved predictive accuracy and interpretability. Beyond a static retrieval engine, TRACE also serves as a powerful standalone encoder, with lightweight task-specific tuning that refines context-aware representations while maintaining strong cross-modal alignment. These representations achieve state-of-the-art performance on downstream forecasting and classification tasks. Extensive experiments across multiple domains highlight its dual utility, as both an effective encoder for downstream applications and a general-purpose retriever to enhance time-series models.



Article 617

Title@2025-06-10 (2): Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations

Title: Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations Verständnis von Aufgabenvektoren im In-Context-Lernen: Emergenz, Funktionalität und Einschränkungen 了解内文学习中的任务矢量:新出现、功能和局限性 2506.09048v1

Authors (4): Yuxin Dong, Jiachen Jiang, Zhihui Zhu, Xia Ning

Task vectors offer a compelling mechanism for accelerating inference in in-context learning (ICL) by distilling task-specific information into a single, reusable representation. Despite their empirical success, the underlying principles governing their emergence and functionality remain unclear. This work proposes the Linear Combination Conjecture, positing that task vectors act as single in-context demonstrations formed through linear combinations of the original ones. We provide both theoretical and empirical support for this conjecture. First, we show through loss landscape analysis that task vectors naturally emerge in linear transformers trained on triplet-formatted prompts. Next, we predict the failure of task vectors in representing high-rank mappings and confirm this on practical LLMs. Our findings are further validated through saliency analyses and parameter visualization, suggesting an enhancement of task vectors by injecting multiple ones into few-shot prompts. Together, our results advance the understanding of task vectors and shed light on the mechanisms underlying ICL in transformer-based models.



Article 618

Title@2025-06-10 (2): Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation

Title: Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation Agentische Neuronale Netzwerke: Selbstständige Multi-Agenten-Systeme über textuelle Backpropagation 动态神经网络:通过文字反向分析实现自我演进的多行为者系统 2506.09046v1

Authors (5): Xiaowen Ma, Chenyang Lin, Yao Zhang, Volker Tresp, Yunpu Ma

Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative “team” focused on a specific subtask. Agentic Neural Network follows a two-phase optimization strategy: (1) Forward Phase: Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase: Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables ANN to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across four benchmark datasets, ANN surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements. Our findings indicate that ANN provides a scalable, data-driven framework for multi-agent systems, combining the collaborative capabilities of LLMs with the efficiency and flexibility of neural network principles. We plan to open-source the entire framework.



Article 619

Title@2025-06-10 (2): Evolutionary Policy Optimization

Title: Evolutionary Policy Optimization Optimierung der Evolutionspolitik 演变政策优化 2503.19037v2

Authors (4): Jianren Wang, Yifan Su, Abhinav Gupta, Deepak Pathak

On-policy reinforcement learning (RL) algorithms are widely used for their strong asymptotic performance and training stability, but they struggle to scale with larger batch sizes, as additional parallel environments yield redundant data due to limited policy-induced diversity. In contrast, Evolutionary Algorithms (EAs) scale naturally and encourage exploration via randomized population-based search, but are often sample-inefficient. We propose Evolutionary Policy Optimization (EPO), a hybrid algorithm that combines the scalability and diversity of EAs with the performance and stability of policy gradients. EPO maintains a population of agents conditioned on latent variables, shares actor-critic network parameters for coherence and memory efficiency, and aggregates diverse experiences into a master agent. Across tasks in dexterous manipulation, legged locomotion, and classic control, EPO outperforms state-of-the-art baselines in sample efficiency, asymptotic performance, and scalability.



Article 620

Title@2025-06-10 (2): The Decoupled Risk Landscape in Performative Prediction

Title: The Decoupled Risk Landscape in Performative Prediction Die entkoppelte Risikolandschaft in der Performativen Vorhersage 性能预测中脱钩的风险景观 2506.09044v1

Authors (4): Javier Sanguino, Thomas Kehrenberg, Jose A. Lozano, Novi Quadrianto

Performative Prediction addresses scenarios where deploying a model induces a distribution shift in the input data, such as individuals modifying their features and reapplying for a bank loan after rejection. The literature has so far taken a theoretical perspective, giving mathematical guarantees for convergence (either to the stable or to the optimal point). We believe that visualization of the loss landscape can complement these theoretical advances with practical insights. Therefore, (1) we introduce a simple decoupled risk visualization method inspired by the two-step nature of performative prediction. Our approach visualizes the risk landscape with respect to two parameter vectors: model parameters and data parameters. We use this method to propose new properties of the points of interest and to examine how existing algorithms traverse the risk landscape and perform under more realistic conditions, including strategic classification with non-linear models. (2) Building on this decoupled risk visualization, we introduce a novel setting, extended Performative Prediction, which captures scenarios where the distribution reacts to a model different from the decision-making one, reflecting the reality that agents often lack full access to the deployed model.



Article 621

Title@2025-06-10 (2): FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed

Title: FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed FZOO: Schneller Zeroth-Order-Optimierer für feintunende große Sprachmodelle in Richtung Adam-Scale-Geschwindigkeit FZOO: 向亚当规模速度的微调大语言模型最佳最佳优选器 2506.09034v1

Authors (7): Sizhe Dang, Yangyang Guo, Yanjun Zhao, Haishan Ye, Xiaodong Zheng, Guang Dai, Ivor Tsang

Fine-tuning large language models (LLMs) often faces GPU memory bottlenecks: the backward pass of first-order optimizers like Adam increases memory usage to more than 10 times the inference level (e.g., 633 GB for OPT-30B). Zeroth-order (ZO) optimizers avoid this cost by estimating gradients only from forward passes, yet existing methods like MeZO usually require many more steps to converge. Can this trade-off between speed and memory in ZO be fundamentally improved? Normalized-SGD demonstrates strong empirical performance with greater memory efficiency than Adam. In light of this, we introduce FZOO, a Fast Zeroth-Order Optimizer toward Adam-Scale Speed. FZOO reduces the total forward passes needed for convergence by employing batched one-sided estimates that adapt step sizes based on the standard deviation of batch losses. It also accelerates per-batch computation through the use of Rademacher random vector perturbations coupled with CUDA’s parallel processing. Extensive experiments on diverse models, including RoBERTa-large, OPT (350M-66B), Phi-2, and Llama3, across 11 tasks validate FZOO’s effectiveness. On average, FZOO outperforms MeZO by 3 percent in accuracy while requiring 3 times fewer forward passes. For RoBERTa-large, FZOO achieves average improvements of 5.6 percent in accuracy and an 18 times reduction in forward passes compared to MeZO, achieving convergence speeds comparable to Adam. We also provide theoretical analysis proving FZOO’s formal equivalence to a normalized-SGD update rule and its convergence guarantees. FZOO integrates smoothly into PEFT techniques, enabling even larger memory savings. Overall, our results make single-GPU, high-speed, full-parameter fine-tuning practical and point toward future work on memory-efficient pre-training.
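The core idea of a one-sided zeroth-order step with Rademacher perturbations and a step size scaled by the standard deviation of the perturbed losses can be sketched on a toy problem. This is a hedged illustration, not the FZOO implementation: the quadratic `loss`, the function name `zo_step`, its default hyperparameters, and the exact normalization are assumptions chosen for readability.

```python
import random
import statistics

def loss(theta):
    # Toy quadratic objective standing in for a model's batch loss.
    return sum(t * t for t in theta)

def zo_step(theta, loss_fn, lr=0.1, eps=1e-3, n_perturb=8, seed=0):
    rng = random.Random(seed)
    base = loss_fn(theta)  # single unperturbed forward pass (one-sided estimate)
    directions, losses = [], []
    for _ in range(n_perturb):
        # Rademacher perturbation: every coordinate is +1 or -1.
        u = [rng.choice((-1.0, 1.0)) for _ in theta]
        directions.append(u)
        losses.append(loss_fn([t + eps * ui for t, ui in zip(theta, u)]))
    sigma = statistics.pstdev(losses)
    if sigma == 0.0:
        # All perturbed losses agree, so there is no signal; skip the update.
        return list(theta)
    # Average of (loss difference) * direction, using forward passes only.
    grad = [
        sum((l - base) * u[j] for l, u in zip(losses, directions)) / n_perturb
        for j in range(len(theta))
    ]
    # Dividing by sigma makes the step size independent of the loss scale.
    return [t - lr * g / sigma for t, g in zip(theta, grad)]

theta0 = [3.0, -2.0]
theta1 = zo_step(theta0, loss)
```

In practice such a step would be applied repeatedly with fresh perturbations; the single step here only shows how the update is assembled from forward evaluations.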



Article 622

Title@2025-06-10 (2): Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning

Title: Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning Router-R1: Lehren von LLMs Multi-Round Routing und Aggregation durch Verstärkungslernen 路由-R1路由-R1路由:教学LLMS 2506.09033v1

Authors (3): Haozhen Zhang, Tao Feng, Jiaxuan You

The rapid emergence of diverse large language models (LLMs) has spurred the development of LLM routers that assign user queries to the most suitable model. However, existing LLM routers typically perform a single-round, one-to-one mapping (i.e., assigning each query to a single model in isolation), which limits their capability to tackle complex tasks that demand the complementary strengths of multiple LLMs. In this paper, we present Router-R1, a reinforcement learning (RL)-based framework that formulates multi-LLM routing and aggregation as a sequential decision process. Router-R1 instantiates the router itself as a capable LLM, leveraging its reasoning ability to interleave “think” actions (internal deliberation) with “route” actions (dynamic model invocation), and integrates each response into its evolving context. To guide learning, we employ a lightweight rule-based reward comprising format rewards, final outcome rewards, and a novel cost reward, opening a pathway toward optimizing performance-cost trade-offs via RL. Router-R1 also conditions only on simple model descriptors such as pricing, latency, and example performance, enabling strong generalization to unseen model selection. Experiments on seven general and multi-hop QA benchmarks show that Router-R1 outperforms several strong baselines, achieving superior performance while maintaining robust generalization and cost management. Code is available at https://github.com/ulab-uiuc/Router-R1.



Article 623

Title@2025-06-10 (2): Optimal Spectral Transitions in High-Dimensional Multi-Index Models

Title: Optimal Spectral Transitions in High-Dimensional Multi-Index Models Optimale Spektralübergänge in High-Dimensional Multi-Index-Modellen 高多维多指数模型中最佳光谱转换 2502.02545v2

Authors (5): Leonardo Defilippis, Yatin Dandi, Pierre Mergny, Florent Krzakala, Bruno Loureiro

We consider the problem of how many samples from a Gaussian multi-index model are required to weakly reconstruct the relevant index subspace. Despite its increasing popularity as a testbed for investigating the computational complexity of neural networks, results beyond the single-index setting remain elusive. In this work, we introduce spectral algorithms based on the linearization of a message passing scheme tailored to this problem. Our main contribution is to show that the proposed methods achieve the optimal reconstruction threshold. Leveraging a high-dimensional characterization of the algorithms, we show that above the critical threshold the leading eigenvector correlates with the relevant index subspace, a phenomenon reminiscent of the Baik-Ben Arous-Peche (BBP) transition in spiked models arising in random matrix theory. Supported by numerical experiments and a rigorous theoretical framework, our work bridges critical gaps in the computational limits of weak learnability in multi-index models.



Article 624

Title@2025-06-10 (2): JuStRank: Benchmarking LLM Judges for System Ranking

Title: JuStRank: Benchmarking LLM Judges for System Ranking JuStRank: Benchmarking der LLM-Richter für Systemranking JuStRank:为系统评分确定LLM法官基准 2412.09569v2

Authors (6): Ariel Gera, Odellia Boni, Yotam Perlitz, Roy Bar-Haim, Lilach Eden, Asaf Yehudai

Given the rapid progress of generative AI, there is a pressing need to systematically compare and choose between the numerous models and configurations available. The scale and versatility of such evaluations make the use of LLM-based judges a compelling solution for this challenge. Crucially, this approach first requires validating the quality of the LLM judge itself. Previous work has focused on instance-based assessment of LLM judges, where a judge is evaluated over a set of responses, or response pairs, while being agnostic to their source systems. We argue that this setting overlooks critical factors affecting system-level ranking, such as a judge’s positive or negative bias towards certain systems. To address this gap, we conduct the first large-scale study of LLM judges as system rankers. System scores are generated by aggregating judgment scores over multiple system outputs, and the judge’s quality is assessed by comparing the resulting system ranking to a human-based ranking. Beyond overall judge assessment, our analysis provides a fine-grained characterization of judge behavior, including their decisiveness and bias.
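The system-level evaluation described above (aggregate per-response judge scores into system scores, then compare the induced ranking with a human ranking) can be sketched as follows. This is an illustration under assumptions: mean aggregation and Kendall's tau (the tau-a variant, without tie correction) are choices made here for simplicity, and the system names and scores are made up.

```python
from itertools import combinations
from statistics import mean

def system_scores(judgments):
    # judgments: dict mapping system name -> list of per-response judge scores.
    return {name: mean(scores) for name, scores in judgments.items()}

def kendall_tau(a, b):
    # a, b: dicts mapping system name -> score. Counts concordant and
    # discordant pairs of systems; ties count as neither (tau-a).
    concordant = discordant = 0
    for x, y in combinations(sorted(a), 2):
        sign = (a[x] - a[y]) * (b[x] - b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs

judge = {
    "sysA": [0.9, 0.8, 0.7],
    "sysB": [0.6, 0.5, 0.7],
    "sysC": [0.2, 0.4, 0.3],
}
human = {"sysA": 0.7, "sysB": 0.8, "sysC": 0.1}  # hypothetical human-based scores

tau = kendall_tau(system_scores(judge), human)
```

Here the judge swaps the order of `sysA` and `sysB` relative to the human ranking, so one of the three system pairs is discordant and the agreement is well below a perfect score of 1.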



Article 625

Title@2025-06-10 (2): Spectral Estimators for Multi-Index Models: Precise Asymptotics and Optimal Weak Recovery

Title: Spectral Estimators for Multi-Index Models: Precise Asymptotics and Optimal Weak Recovery Spektrale Schätzwerte für Multi-Index-Modelle: Präzise Asymptotik und optimale schwache Erholung 多 Index 模型的光谱光谱模拟器:精确的无症状和最佳微弱恢复 2502.01583v2

Authors (3): Filip Kovačević, Yihan Zhang, Marco Mondelli

Multi-index models provide a popular framework to investigate the learnability of functions with low-dimensional structure and, also due to their connections with neural networks, they have been the object of intensive recent study. In this paper, we focus on recovering the subspace spanned by the signals via spectral estimators – a family of methods routinely used in practice, often as a warm-start for iterative algorithms. Our main technical contribution is a precise asymptotic characterization of the performance of spectral methods, when sample size and input dimension grow proportionally and the dimension $p$ of the space to recover is fixed. Specifically, we locate the top-$p$ eigenvalues of the spectral matrix and establish the overlaps between the corresponding eigenvectors (which give the spectral estimators) and a basis of the signal subspace. Our analysis unveils a phase transition phenomenon in which, as the sample complexity grows, eigenvalues escape from the bulk of the spectrum and, when that happens, eigenvectors recover directions of the desired subspace. The precise characterization we put forward enables the optimization of the data preprocessing, thus making it possible to identify the spectral estimator that requires the minimal sample size for weak recovery.



Article 626

Title@2025-06-10 (2): Diffuse and Disperse: Image Generation with Representation Regularization

Title: Diffuse and Disperse: Image Generation with Representation Regularization Diffuse und Disperse: Bildgenerierung mit Repräsentationsregularisierung Diffuse & diffperse: 形象生成,有代表性的规范化 2506.09027v1

Authors (2): Runqian Wang, Kaiming He

The development of diffusion-based generative models over the past decade has largely proceeded independently of progress in representation learning. These diffusion models typically rely on regression-based objectives and generally lack explicit regularization. In this work, we propose Dispersive Loss, a simple plug-and-play regularizer that effectively improves diffusion-based generative models. Our loss function encourages internal representations to disperse in the hidden space, analogous to contrastive self-supervised learning, with the key distinction that it requires no positive sample pairs and therefore does not interfere with the sampling process used for regression. Compared to the recent method of representation alignment (REPA), our approach is self-contained and minimalist, requiring no pre-training, no additional parameters, and no external data. We evaluate Dispersive Loss on the ImageNet dataset across a range of models and report consistent improvements over widely used and strong baselines. We hope our work will help bridge the gap between generative modeling and representation learning.
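
To make the idea of a repulsion-only regularizer concrete, here is a minimal sketch of one possible form: a contrastive-style term with the positive-pair part removed. The abstract does not give the formula, so the squared-distance kernel, the temperature `tau`, and the exclusion of self-pairs are all assumptions for illustration.

```python
import math

def dispersive_loss(reps, tau=1.0):
    """Assumed form: log of the mean of exp(-||z_i - z_j||^2 / tau) over pairs i != j.

    The value is lower when representations are spread out, so minimizing it
    pushes representations in a batch apart. No positive pairs are involved.
    """
    n = len(reps)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip self-pairs (an illustrative choice)
            sq_dist = sum((a - b) ** 2 for a, b in zip(reps[i], reps[j]))
            total += math.exp(-sq_dist / tau)
    return math.log(total / (n * (n - 1)))

# Toy hidden representations for a batch of three samples.
clustered = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
spread = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]

loss_clustered = dispersive_loss(clustered)
loss_spread = dispersive_loss(spread)
```

In training, a term like this would be added to the usual regression objective with a small weight; the weight and the layer it is applied to are left open here.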



Article 627

Title@2025-06-10 (2): e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs

Title: e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs e3: Erforschen lernen ermöglicht Extrapolation von Test-Time Compute für LLMs e3: 学习探索以利对LLMM的试验时间计算进行外推计算 2506.09026v1

Authors (8): Amrith Setlur, Matthew Y. R. Yang, Charlie Snell, Jeremy Greer, Ian Wu, Virginia Smith, Max Simchowitz, Aviral Kumar

Test-time scaling offers a promising path to improve LLM reasoning by utilizing more compute at inference time; however, the true promise of this paradigm lies in extrapolation (i.e., improvement in performance on hard problems as LLMs keep “thinking” for longer, beyond the maximum token budget they were trained on). Surprisingly, we find that most existing reasoning models do not extrapolate well. We show that one way to enable extrapolation is by training the LLM to perform in-context exploration: training the LLM to effectively spend its test time budget by chaining operations (such as generation, verification, refinement, etc.), or testing multiple hypotheses before it commits to an answer. To enable in-context exploration, we identify three key ingredients as part of our recipe e3: (1) chaining skills that the base LLM has asymmetric competence in, e.g., chaining verification (easy) with generation (hard), as a way to implement in-context search; (2) leveraging “negative” gradients from incorrect traces to amplify exploration during RL, resulting in longer search traces that chain additional asymmetries; and (3) coupling task difficulty with training token budget during training via a specifically-designed curriculum to structure in-context exploration. Our recipe e3 produces the best known 1.7B model according to AIME’25 and HMMT’25 scores, and extrapolates to 2x the training token budget. Our e3-1.7B model not only attains high pass@1 scores, but also improves pass@k over the base model.



Article 628

Title@2025-06-10 (2): DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging

Title: DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging DIsoN: Dezentrale Isolationsnetzwerke für Out-of-Distribution Detection in Medical Imaging DISON:医疗成像中分散分布式隔离探测网 2506.09024v1

Authors (5): Felix Wagner, Pramit Saha, Harry Anthony, J. Alison Noble, Konstantinos Kamnitsas

Safe deployment of machine learning (ML) models in safety-critical domains such as medical imaging requires detecting inputs with characteristics not seen during training, known as out-of-distribution (OOD) detection, to prevent unreliable predictions. Effective OOD detection after deployment could benefit from access to the training data, enabling direct comparison between test samples and the training data distribution to identify differences. State-of-the-art OOD detection methods, however, either discard training data after deployment or assume that test samples and training data are centrally stored together, an assumption that rarely holds in real-world settings. This is because shipping training data with the deployed model is usually impossible due to the size of training databases, as well as proprietary or privacy constraints. We introduce the Isolation Network, an OOD detection framework that quantifies the difficulty of separating a target test sample from the training data by solving a binary classification task. We then propose Decentralized Isolation Networks (DIsoN), which enables the comparison of training and test data when data-sharing is impossible, by exchanging only model parameters between the remote computational nodes of training and deployment. We further extend DIsoN with class-conditioning, comparing a target sample solely with training data of its predicted class. We evaluate DIsoN on four medical imaging datasets (dermatology, chest X-ray, breast ultrasound, histopathology) across 12 OOD detection tasks. DIsoN performs favorably against existing methods while respecting data-privacy. This decentralized OOD detection framework opens the way for a new type of service that ML developers could provide along with their models: providing remote, secure utilization of their training data for OOD detection services. Code will be available upon acceptance at: *****



Article 629

Title@2025-06-10 (2): Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing

Title: Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing Wahrscheinlich kostensensive gegnerische Verteidigung durch Randomized Glättung 通过随机滑动平滑,进行成本敏感度高的反反向防御 2310.08732v3

Authors (4): Yuan Xin, Dingfan Chen, Michael Backes, Xiao Zhang

As ML models are increasingly deployed in critical applications, robustness against adversarial perturbations is crucial. While numerous defenses have been proposed to counter such attacks, they typically assume that all adversarial transformations are equally important, an assumption that rarely aligns with real-world applications. To address this, we study the problem of robust learning against adversarial perturbations under cost-sensitive scenarios, where the potential harm of different types of misclassifications is encoded in a cost matrix. Our solution introduces a provably robust learning algorithm to certify and optimize for cost-sensitive robustness, building on the scalable certification framework of randomized smoothing. Specifically, we formalize the definition of cost-sensitive certified radius and propose our novel adaptation of the standard certification algorithm to generate tight robustness certificates tailored to any cost matrix. In addition, we design a robust training method that improves certified cost-sensitive robustness without compromising model accuracy. Extensive experiments on benchmark datasets, including challenging ones unsolvable by existing methods, demonstrate the effectiveness of our certification algorithm and training method across various cost-sensitive scenarios.
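
For orientation, the sketch below computes the standard randomized-smoothing L2 radius, R = (sigma / 2) * (Phi^-1(p_top) - Phi^-1(p_runner_up)), and then an illustrative cost-sensitive variant. The variant is an assumption made for this example, not the paper's definition: it only counts classes whose misclassification cost is positive. Class probabilities are treated as exact here, whereas in practice they would be confidence bounds estimated by sampling.

```python
from statistics import NormalDist

def certified_radius(p_top, p_runner_up, sigma):
    """Standard smoothing radius; probabilities must lie strictly between 0 and 1."""
    if p_top <= p_runner_up:
        return 0.0  # no certificate when the top class does not lead
    inv = NormalDist().inv_cdf
    return 0.5 * sigma * (inv(p_top) - inv(p_runner_up))

def cost_sensitive_radius(probs, predicted, cost_row, sigma):
    """Hypothetical variant: only classes with positive cost limit the radius."""
    costly = [p for k, p in enumerate(probs) if k != predicted and cost_row[k] > 0]
    if not costly:
        return float("inf")  # no costly misclassification exists for this input
    return certified_radius(probs[predicted], max(costly), sigma)

probs = [0.7, 0.2, 0.1]      # smoothed class probabilities (made up)
cost_row = [0.0, 0.0, 1.0]   # only confusing class 0 with class 2 is costly
standard = certified_radius(probs[0], probs[1], sigma=0.5)
sensitive = cost_sensitive_radius(probs, predicted=0, cost_row=cost_row, sigma=0.5)
```

Because the costly class here is less likely than the overall runner-up, the cost-sensitive radius is larger than the standard one, which is the sense in which a certificate can be tailored to a cost matrix.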



Article 630

Title@2025-06-10 (2): Edit Flows: Flow Matching with Edit Operations

Title: Edit Flows: Flow Matching with Edit Operations Flows bearbeiten: Flow passend zu Bearbeitungsoperationen 编辑流程: 与编辑操作匹配的流程 2506.09018v1

Authors (4): Marton Havasi, Brian Karrer, Itai Gat, Ricky T. Q. Chen

Autoregressive generative models naturally generate variable-length sequences, while non-autoregressive models struggle, often imposing rigid, token-wise structures. We propose Edit Flows, a non-autoregressive model that overcomes these limitations by defining a discrete flow over sequences through edit operations: insertions, deletions, and substitutions. By modeling these operations within a Continuous-time Markov Chain over the sequence space, Edit Flows enables flexible, position-relative generation that aligns more closely with the structure of sequence data. Our training method leverages an expanded state space with auxiliary variables, making the learning process efficient and tractable. Empirical results show that Edit Flows outperforms both autoregressive and mask models on image captioning and significantly outperforms the mask construction in text and code generation.
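
The three edit operations can be illustrated on a plain token list. This sketch only shows how the state space of variable-length sequences is traversed; the learned rates of the Markov chain are not modeled, and the function name `apply_edit` is hypothetical.

```python
def apply_edit(tokens, op, pos, token=None):
    """Return a new sequence after one edit; the input list is left unchanged."""
    seq = list(tokens)
    if op == "insert":
        seq.insert(pos, token)   # sequence grows by one token
    elif op == "delete":
        del seq[pos]             # sequence shrinks by one token
    elif op == "substitute":
        seq[pos] = token         # length stays the same
    else:
        raise ValueError(f"unknown edit operation: {op}")
    return seq

# A short trajectory of edits, as a sampler might produce them one at a time.
state = ["a", "cat"]
state = apply_edit(state, "insert", 1, "black")
state = apply_edit(state, "substitute", 0, "the")
after_delete = apply_edit(state, "delete", 1)
```

Since insertions and deletions shift later tokens, positions are relative to the current sequence rather than to a fixed-length canvas.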



Article 631

Title@2025-06-10 (2): SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning

Title: SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning SPEED-RL: Schnellere Schulung von Vernunftmodellen über Online Curriculum Learning SPEEED-RL:通过在线课程学习更快地培训推理模型 2506.09016v1

Authors (4): Ruiqi Zhang, Daman Arora, Song Mei, Andrea Zanette

Training large language models with reinforcement learning (RL) against verifiable rewards significantly enhances their reasoning abilities, yet remains computationally expensive due to inefficient uniform prompt sampling. We introduce Selective Prompting with Efficient Estimation of Difficulty (SPEED), an adaptive online RL curriculum that selectively chooses training examples of intermediate difficulty to maximize learning efficiency. Theoretically, we establish that intermediate-difficulty prompts improve the gradient estimator’s signal-to-noise ratio, accelerating convergence. Empirically, our efficient implementation leads to 2x to 6x faster training without degrading accuracy, requires no manual tuning, and integrates seamlessly into standard RL algorithms.
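
A rough sketch of difficulty-based prompt selection follows. It assumes that difficulty is estimated as the empirical pass rate over a few cheap rollouts and that "intermediate" means a pass rate strictly between two thresholds; the thresholds, names, and data are illustrative and not taken from the paper.

```python
def select_intermediate(prompt_outcomes, low=0.0, high=1.0):
    """Keep prompts whose estimated pass rate lies strictly between low and high."""
    selected = []
    for prompt, outcomes in prompt_outcomes.items():
        pass_rate = sum(outcomes) / len(outcomes)
        if low < pass_rate < high:
            selected.append(prompt)
    return selected

# Hypothetical verifier outcomes (1 = correct) from four rollouts per prompt.
rollouts = {
    "too_easy": [1, 1, 1, 1],
    "too_hard": [0, 0, 0, 0],
    "useful": [1, 0, 1, 0],
}
batch = select_intermediate(rollouts)
```

The intuition is that a binary reward with pass rate p has variance p(1 - p), which vanishes at p = 0 and p = 1, so prompts that always pass or always fail contribute little gradient signal.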



Article 632

Title@2025-06-10 (2): Effective Data Pruning through Score Extrapolation

Title: Effective Data Pruning through Score Extrapolation Effektive Datenkorrektur durch Score-Extrapolation 通过计分法外推法有效调节数据 2506.09010v1

Authors (6): Sebastian Schmidt, Prasanga Dhungel, Christoffer Löffler, Björn Nieth, Stephan Günnemann, Leo Schwinn

Training advanced machine learning models demands massive datasets, resulting in prohibitive computational costs. To address this challenge, data pruning techniques identify and remove redundant training samples while preserving model performance. Yet, existing pruning techniques predominantly require a full initial training pass to identify removable samples, negating any efficiency benefits for single training runs. To overcome this limitation, we introduce a novel importance score extrapolation framework that requires training on only a small subset of data. We present two initial approaches in this framework - k-nearest neighbors and graph neural networks - to accurately predict sample importance for the entire dataset using patterns learned from this minimal subset. We demonstrate the effectiveness of our approach for 2 state-of-the-art pruning methods (Dynamic Uncertainty and TDDS), 4 different datasets (CIFAR-10, CIFAR-100, Places-365, and ImageNet), and 3 training paradigms (supervised, unsupervised, and adversarial). Our results indicate that score extrapolation is a promising direction to scale expensive score calculation methods, such as pruning, data attribution, or other tasks.
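
The k-nearest-neighbor variant of score extrapolation might look like the sketch below. It assumes each sample has a feature vector (for example an embedding) and that a sample's importance is predicted as the unweighted mean score of its k nearest scored neighbors; the data and the helper name `knn_extrapolate` are hypothetical.

```python
import math

def knn_extrapolate(scored, query, k=3):
    """Predict an importance score as the mean score of the k nearest scored samples.

    `scored` is a list of (feature_vector, score) pairs from the small subset
    on which the expensive pruning score was actually computed.
    """
    nearest = sorted(scored, key=lambda item: math.dist(item[0], query))[:k]
    return sum(score for _, score in nearest) / len(nearest)

# Scores computed on a small subset (made-up values).
subset = [((0.0, 0.0), 1.0), ((0.0, 1.0), 3.0), ((5.0, 5.0), 10.0)]

# Extrapolate to a sample that was never part of the scoring run.
predicted = knn_extrapolate(subset, (0.0, 0.4), k=2)
```

The predicted scores can then be thresholded exactly as the original scores would be, without the full initial training pass.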



Article 633

Title@2025-06-10 (2): Scalable Equilibrium Sampling with Sequential Boltzmann Generators

Title: Scalable Equilibrium Sampling with Sequential Boltzmann Generators Skalierbare Equilibrium-Probenahme mit sequenziellen Boltzmann-Generatoren 配有序列波尔茨曼发电机的可缩缩平衡取样器 2502.18462v2

Authors (6): Charlie B. Tan, Avishek Joey Bose, Chen Lin, Leon Klein, Michael M. Bronstein, Alexander Tong

Scalable sampling of molecular states in thermodynamic equilibrium is a long-standing challenge in statistical physics. Boltzmann generators tackle this problem by pairing normalizing flows with importance sampling to obtain uncorrelated samples under the target distribution. In this paper, we extend the Boltzmann generator framework with two key contributions, denoting our framework Sequential Boltzmann Generators (SBG). The first is a highly efficient Transformer-based normalizing flow operating directly on all-atom Cartesian coordinates. In contrast to the equivariant continuous flows of prior methods, we leverage exactly invertible non-equivariant architectures which are highly efficient during both sample generation and likelihood evaluation. This efficiency unlocks more sophisticated inference strategies beyond standard importance sampling. In particular, we perform inference-time scaling of flow samples using a continuous-time variant of sequential Monte Carlo, in which flow samples are transported towards the target distribution with annealed Langevin dynamics. SBG achieves state-of-the-art performance w.r.t. all metrics on peptide systems, demonstrating the first equilibrium sampling in Cartesian coordinates of tri-, tetra- and hexa-peptides that were thus far intractable for prior Boltzmann generators.



Article 634

Title@2025-06-10 (2): Branched Schrödinger Bridge Matching

Title: Branched Schrödinger Bridge Matching Verzweigte Schrödingerbrücke passend 分割的 Schrödinger 桥桥配对 2506.09007v1

Authors (4): Sophia Tang, Yinuo Zhang, Alexander Tong, Pranam Chatterjee

Predicting the intermediate trajectories between an initial and target distribution is a central problem in generative modeling. Existing approaches, such as flow matching and Schrödinger Bridge Matching, effectively learn mappings between two distributions by modeling a single stochastic path. However, these methods are inherently limited to unimodal transitions and cannot capture branched or divergent evolution from a common origin to multiple distinct outcomes. To address this, we introduce Branched Schrödinger Bridge Matching (BranchSBM), a novel framework that learns branched Schrödinger bridges. BranchSBM parameterizes multiple time-dependent velocity fields and growth processes, enabling the representation of population-level divergence into multiple terminal distributions. We show that BranchSBM is not only more expressive but also essential for tasks involving multi-path surface navigation, modeling cell fate bifurcations from homogeneous progenitor states, and simulating diverging cellular responses to perturbations.



Article 635

Title@2025-06-10 (2): CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model

Title: CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model CodeBrain: Bridging Entkoppelter Tokenizer und Multi-Scale-Architektur für EEG-Stiftungsmodell 编码系统:为EEG基金会模式建立连接拆分调控器和多规模结构 2506.09110v1

Authors (7): Jingying Ma, Feng Wu, Qika Lin, Yucheng Xing, Chenyu Liu, Ziyu Jia, Mengling Feng

Electroencephalography (EEG) provides real-time insights into brain activity and is widely used in neuroscience. However, variations in channel configurations, sequence lengths, and task objectives limit the transferability of traditional task-specific models. Although recent EEG foundation models (EFMs) aim to learn generalizable representations, they struggle with limited heterogeneous representation capacity and inefficiency in capturing multi-scale brain dependencies. To address these challenges, we propose CodeBrain, an efficient EFM structurally aligned with brain organization, trained in two stages. (1) We introduce a TFDual-Tokenizer that independently tokenizes heterogeneous temporal and frequency components, enabling a quadratic expansion of the discrete representation space. This also offers a degree of interpretability through cross-domain token analysis. (2) We propose the EEGSSM, which combines a structured global convolution architecture and a sliding window attention mechanism to jointly model sparse long-range and local dependencies. Unlike fully connected Transformer models, EEGSSM better reflects the brain’s small-world topology and efficiently captures EEG’s inherent multi-scale structure. EEGSSM is trained with a masked self-supervised learning objective to predict token indices obtained in TFDual-Tokenizer. Comprehensive experiments on 10 public EEG datasets demonstrate the generalizability of CodeBrain with linear probing. By offering biologically informed and interpretable EEG modeling, CodeBrain lays the foundation for future neuroscience research. Both code and pretraining weights will be released in the future version.



Article 636

Title@2025-06-10 (2): SensorLM: Learning the Language of Wearable Sensors

Title: SensorLM: Learning the Language of Wearable Sensors SensorLM: Die Sprache der tragbaren Sensoren lernen 传感器LM:学习穿装传感器的语言 2506.09108v1

Authors (20): Yuwei Zhang, Kumar Ayush, Siyuan Qiao, A. Ali Heydari, Girish Narayanswamy, Maxwell A. Xu, Ahmed A. Metwally, Shawn Xu, Jake Garrison, Xuhai Xu, Tim Althoff, Yun Liu, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Cecilia Mascolo, Xin Liu, Daniel McDuff, Yuzhe Yang

We present SensorLM, a family of sensor-language foundation models that enable wearable sensor data understanding with natural language. Despite its pervasive nature, aligning and interpreting sensor data with language remains challenging due to the lack of paired, richly annotated sensor-text descriptions in uncurated, real-world wearable data. We introduce a hierarchical caption generation pipeline designed to capture statistical, structural, and semantic information from sensor data. This approach enabled the curation of the largest sensor-language dataset to date, comprising over 59.7 million hours of data from more than 103,000 people. Furthermore, SensorLM extends prominent multimodal pretraining architectures (e.g., CLIP, CoCa) and recovers them as specific variants within a generic architecture. Extensive experiments on real-world tasks in human activity analysis and healthcare verify the superior performance of SensorLM over state-of-the-art in zero-shot recognition, few-shot learning, and cross-modal retrieval. SensorLM also demonstrates intriguing capabilities including scaling behaviors, label efficiency, sensor captioning, and zero-shot generalization to unseen tasks.



Article 637

Title@2025-06-10 (2): Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models

Title: Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models Effiziente Ausrichtung der medizinischen Visionssprache durch Anpassung maskierter Visionsmodelle 通过调适蒙面愿景模型,提高医疗愿景-语言协调效率 2506.08990v1

Authors (5): Chenyu Lian, Hong-Yu Zhou, Dongyun Liang, Jing Qin, Liansheng Wang

Medical vision-language alignment through cross-modal contrastive learning shows promising performance in image-text matching tasks, such as retrieval and zero-shot classification. However, conventional cross-modal contrastive learning (CLIP-based) methods suffer from suboptimal visual representation capabilities, which also limits their effectiveness in vision-language alignment. In contrast, although the models pretrained via multimodal masked modeling struggle with direct cross-modal matching, they excel in visual representation. To address this contradiction, we propose ALTA (ALign Through Adapting), an efficient medical vision-language alignment method that utilizes only about 8% of the trainable parameters and less than 1/5 of the computational consumption required for masked record modeling. ALTA achieves superior performance in vision-language matching tasks like retrieval and zero-shot classification by adapting the pretrained vision model from masked record modeling. Additionally, we integrate temporal-multiview radiograph inputs to enhance the information consistency between radiographs and their corresponding descriptions in reports, further improving the vision-language alignment. Experimental evaluations show that ALTA outperforms the best-performing counterpart by over 4% absolute points in text-to-image accuracy and approximately 6% absolute points in image-to-text retrieval accuracy. The adaptation of vision-language models during efficient alignment also promotes better vision and language understanding. Code is publicly available at https://github.com/DopamineLcy/ALTA.



Article 638

Title@2025-06-10 (2): SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning

Title: SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning SwS: Selbstbewusste Schwäche-getriebene Problemsynthese bei der Stärkung des Lernens für LLM-Reasoning SWS:自觉的弱点驱动的问题综述 2506.08989v1

Authors (8): Xiao Liang, Zhong-Zhi Li, Yeyun Gong, Yang Wang, Hengyuan Zhang, Yelong Shen, Ying Nian Wu, Weizhu Chen

Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for training large language models (LLMs) on complex reasoning tasks, such as mathematical problem solving. A prerequisite for the scalability of RLVR is a high-quality problem set with precise and verifiable answers. However, the scarcity of well-crafted human-labeled math problems and limited-verification answers in existing distillation-oriented synthetic datasets limit their effectiveness in RL. Additionally, most problem synthesis strategies indiscriminately expand the problem set without considering the model’s capabilities, leading to low efficiency in generating useful questions. To mitigate this issue, we introduce a Self-aware Weakness-driven problem Synthesis framework (SwS) that systematically identifies model deficiencies and leverages them for problem augmentation. Specifically, we define weaknesses as questions that the model consistently fails to learn through its iterative sampling during RL training. We then extract the core concepts from these failure cases and synthesize new problems to strengthen the model’s weak areas in subsequent augmented training, enabling it to focus on and gradually overcome its weaknesses. Without relying on external knowledge distillation, our framework enables robust generalization by empowering the model to self-identify and address its weaknesses in RL, yielding average performance gains of 10.0% and 7.7% on 7B and 32B models across eight mainstream reasoning benchmarks.



Article 639

Title@2025-06-10 (2): Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models

Title: Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models Quamba2: Robustes und skalierbares Post-Training Quantization Framework für selektive State Space Models Quamba2:有选择性的国家空间模型强力和可扩展的培训后量化框架 2503.22879v3

Authors (6): Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu

State Space Models (SSMs) are emerging as a compelling alternative to Transformers because of their consistent memory usage and high performance. Despite this, scaling up SSMs on cloud services or limited-resource devices is challenging due to their storage requirements and computational power. To overcome this, quantizing SSMs with low bit-width data formats can reduce model size and benefit from hardware acceleration. As SSMs are prone to quantization-induced errors, recent efforts have focused on optimizing a particular model or bit-width for efficiency without sacrificing performance. However, distinct bit-width configurations are essential for different scenarios, like W4A8 for boosting large-batch decoding speed, and W4A16 for enhancing generation speed in short prompt applications for a single user. To this end, we present Quamba2, compatible with W8A8, W4A8, and W4A16 for both Mamba1 and Mamba2 backbones, addressing the growing demand for SSM deployment on various platforms. Based on the channel order preserving and activation persistence of SSMs, we propose an offline approach to quantize inputs of a linear recurrence in 8-bit by sorting and clustering for input $x$, combined with a per-state-group quantization for input-dependent parameters $B$ and $C$. To ensure compute-invariance in the SSM output, we rearrange weights offline according to the clustering sequence. The experiments show that Quamba2-8B outperforms two state-of-the-art SSM quantization methods and delivers 1.3$\times$ and 3$\times$ speed-ups in the pre-filling and generation stages, respectively, while offering 4$\times$ memory reduction with only a $1.6\%$ average accuracy drop. The evaluation on MMLU shows the generalizability and robustness of our framework. The code and quantized models will be released at: https://github.com/enyac-group/Quamba.



Article 640

Title@2025-06-10 (2): PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips

Title: PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips PrisonBreak: Gefängnisbrechende große Sprachmodelle mit weniger als fünfundzwanzig gezielten Bit-Flips 监狱破碎:破禁大语言模式,目标位翻页小于25个目标小于25个 2412.07192v2

Authors (8): Zachary Coalson, Jeonghyun Woo, Yu Sun, Shiyang Chen, Lishan Yang, Prashant Nair, Bo Fang, Sanghyun Hong

We introduce a new class of attacks on commercial-scale (human-aligned) language models that induce jailbreaking through targeted bitwise corruptions in model parameters. Our adversary can jailbreak billion-parameter language models with fewer than 25 bit-flips in all cases, and as few as 5 in some, using up to 40$\times$ fewer bit-flips than existing attacks on computer vision models at least 100$\times$ smaller. Unlike prompt-based jailbreaks, our attack renders these models in memory ‘uncensored’ at runtime, allowing them to generate harmful responses without any input modifications. Our attack algorithm efficiently identifies target bits to flip, offering up to 20$\times$ more computational efficiency than previous methods. This makes it practical for language models with billions of parameters. We show an end-to-end exploitation of our attack using software-induced fault injection, Rowhammer (RH). Our work examines 56 DRAM RH profiles from DDR4 and LPDDR4X devices with different RH vulnerabilities. We show that our attack can reliably induce jailbreaking in systems similar to those affected by prior bit-flip attacks. Moreover, our approach remains effective even against highly RH-secure systems (e.g., 46$\times$ more secure than previously tested systems). Our analyses further reveal that: (1) models with less post-training alignment require fewer bit flips to jailbreak; (2) certain model components, such as value projection layers, are substantially more vulnerable than others; and (3) our method is mechanistically different from existing jailbreaks. Our findings highlight a pressing, practical threat to the language model ecosystem and underscore the need for research to protect these models from bit-flip attacks.



Article 641

Title@2025-06-10 (2): Bias Analysis in Unconditional Image Generative Models

Title: Bias Analysis in Unconditional Image Generative Models Bias-Analyse in bedingungslosen Bildgenerativen Modellen 无条件图像生成模型中的偏见分析 2506.09106v1

Authors (5): Xiaofeng Zhang, Michelle Lin, Simon Lacoste-Julien, Aaron Courville, Yash Goyal

The widespread adoption of generative AI models has raised growing concerns about representational harm and potential discriminatory outcomes. Yet, despite growing literature on this topic, the mechanisms by which bias emerges - especially in unconditional generation - remain to be disentangled. We define the bias of an attribute as the difference between the probability of its presence in the observed distribution and its expected proportion in an ideal reference distribution. In our analysis, we train a set of unconditional image generative models and adopt a commonly used bias evaluation framework to study bias shift between training and generated distributions. Our experiments reveal that the detected attribute shifts are small. We find that the attribute shifts are sensitive to the attribute classifier used to label generated images in the evaluation framework, particularly when its decision boundaries fall in high-density regions. Our empirical analysis indicates that this classifier sensitivity is often observed in attribute values that lie on a spectrum, as opposed to exhibiting a binary nature. This highlights the need for more representative labeling practices, understanding the shortcomings through greater scrutiny of evaluation frameworks, and recognizing the socially complex nature of attributes when evaluating bias.
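
The bias definition above reduces to simple arithmetic once an attribute classifier has labeled the images. The sketch below assumes binary labels (1 = attribute present) and a reference proportion of 0.5; both the labels and the reference value are made up for illustration.

```python
def attribute_bias(labels, reference_proportion):
    """Bias = observed frequency of the attribute minus its reference proportion."""
    observed = sum(labels) / len(labels)
    return observed - reference_proportion

# Hypothetical classifier labels on training and generated images.
train_labels = [1, 1, 0, 0, 1]
generated_labels = [1, 1, 1, 0, 1]
reference = 0.5  # assumed ideal proportion

train_bias = attribute_bias(train_labels, reference)
generated_bias = attribute_bias(generated_labels, reference)

# Bias shift between the training and generated distributions.
bias_shift = generated_bias - train_bias
```

Because both bias values depend on the same classifier, any error in its decision boundary feeds directly into the measured shift, which is the sensitivity the abstract points to.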



Article 642

Title@2025-06-10 (2): DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models

Title: DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models DiffLM: Ansteuerbare Synthetische Datengenerierung über Diffusions-Sprachenmodelle DiffLM: 通过传播语言模型生成可控合成数据 2411.03250v2

Authors (9): Ying Zhou, Xinyao Wang, Yulei Niu, Yaojie Shen, Lexin Tang, Fan Chen, Ben He, Le Sun, Longyin Wen

Recent advancements in large language models (LLMs) have significantly enhanced their knowledge and generative capabilities, leading to a surge of interest in leveraging LLMs for high-quality data synthesis. However, synthetic data generation via prompting LLMs remains challenging due to LLMs’ limited understanding of target data distributions and the complexity of prompt engineering, especially for structured formatted data. To address these issues, we introduce DiffLM, a controllable data synthesis framework based on a variational autoencoder (VAE), which further (1) leverages diffusion models to preserve more information about the original distribution and format structure in the learned latent distribution and (2) decouples the learning of target distribution knowledge from the LLM’s generative objectives via a plug-and-play latent feature injection module. As we observed significant discrepancies between the VAE’s latent representations and the real data distribution, the latent diffusion module is introduced into our framework to learn a fully expressive latent distribution. Evaluations on seven real-world datasets with structured formatted data (i.e., Tabular, Code, and Tool data) demonstrate that DiffLM generates high-quality data, with performance on downstream tasks surpassing that of real data by 2%-7% in certain cases. Data and code are available at https://github.com/bytedance/DiffLM.



Article 643

Title@2025-06-10 (2): Exact Upper and Lower Bounds for the Output Distribution of Neural Networks with Random Inputs

Title: Exact Upper and Lower Bounds for the Output Distribution of Neural Networks with Random Inputs Exakte obere und untere Grenzen für die Ausgabeverteilung von neuralen Netzwerken mit Random Inputs 带有随机输入的神经网络输出分布精确的上下上下曲线 2502.11672v2

Authors (4): Andrey Kofnov, Daniel Kapla, Ezio Bartocci, Efstathia Bura

We derive exact upper and lower bounds for the cumulative distribution function (cdf) of the output of a neural network (NN) over its entire support subject to noisy (stochastic) inputs. The upper and lower bounds converge to the true cdf over its domain as the resolution increases. Our method applies to any feedforward NN using continuous monotonic piecewise twice continuously differentiable activation functions (e.g., ReLU, tanh and softmax) and convolutional NNs, which were beyond the scope of competing approaches. The novelty and instrumental tool of our approach is to bound general NNs with ReLU NNs. The ReLU NN-based bounds are then used to derive the upper and lower bounds of the cdf of the NN output. Experiments demonstrate that our method delivers guaranteed bounds of the predictive output distribution over its support, thus providing exact error guarantees, in contrast to competing approaches.



Article 644

Title@2025-06-10 (2): Propositional Logic for Probing Generalization in Neural Networks

Title: Propositional Logic for Probing Generalization in Neural Networks Propositionelle Logik zur Probing-Verallgemeinerung in neuralen Netzwerken 神经网络普遍化的预示逻辑 2506.08978v1

Authors (3): Anna Langedijk, Jaap Jumelet, Willem Zuidema

The extent to which neural networks are able to acquire and represent symbolic rules remains a key topic of research and debate. Much current work focuses on the impressive capabilities of large language models, as well as their often ill-understood failures on a wide range of reasoning tasks. In this paper, in contrast, we investigate the generalization behavior of three key neural architectures (Transformers, Graph Convolution Networks and LSTMs) in a controlled task rooted in propositional logic. The task requires models to generate satisfying assignments for logical formulas, making it a structured and interpretable setting for studying compositionality. We introduce a balanced extension of an existing dataset to eliminate superficial patterns and enable testing on unseen operator combinations. Using this dataset, we evaluate the ability of the three architectures to generalize beyond the training distribution. While all models perform well in-distribution, we find that generalization to unseen patterns, particularly those involving negation, remains a significant challenge. Transformers fail to apply negation compositionally, unless structural biases are introduced. Our findings highlight persistent limitations in the ability of standard architectures to learn systematic representations of logical operators, suggesting the need for stronger inductive biases to support robust rule-based reasoning.
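
To make the task concrete, the following sketch enumerates satisfying assignments for a small propositional formula by brute force. The nested-tuple formula encoding and the function names are assumptions made for illustration; the dataset's actual format is not described in the abstract.

```python
from itertools import product


def evaluate(formula, assignment):
    """Evaluate a nested-tuple formula under a dict of variable truth values."""
    op = formula[0]
    if op == "var":
        return assignment[formula[1]]
    if op == "not":
        return not evaluate(formula[1], assignment)
    if op == "and":
        return evaluate(formula[1], assignment) and evaluate(formula[2], assignment)
    if op == "or":
        return evaluate(formula[1], assignment) or evaluate(formula[2], assignment)
    raise ValueError(f"unknown operator: {op}")


def variables(formula):
    """Collect the variable names that occur in a formula."""
    if formula[0] == "var":
        return {formula[1]}
    return set().union(*(variables(sub) for sub in formula[1:]))


def satisfying_assignments(formula):
    """Yield every assignment that makes the formula true."""
    names = sorted(variables(formula))
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if evaluate(formula, assignment):
            yield assignment


# (a AND NOT b) OR c
formula = ("or", ("and", ("var", "a"), ("not", ("var", "b"))), ("var", "c"))
models = list(satisfying_assignments(formula))
```

A checker like `evaluate` is also enough to score a model's output, since any satisfying assignment counts as correct.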



Article 645

Title@2025-06-10 (2): Tailored Architectures for Time Series Forecasting: Evaluating Deep Learning Models on Gaussian Process-Generated Data

Title: Tailored Architectures for Time Series Forecasting: Evaluating Deep Learning Models on Gaussian Process-Generated Data Maßgeschneiderte Architekturen für die Prognose von Zeitreihen: Bewertung von Deep Learning-Modellen auf gaussischen prozessgenerierten Daten 时间序列预测:评价高斯进程生成数据的深学习模型 2506.08977v1

Authors (2): Victoria Hankemeier, Malte Schilling

Developments in Deep Learning have significantly improved time series forecasting by enabling more accurate modeling of complex temporal dependencies inherent in sequential data. The effectiveness of such models is often demonstrated on limited sets of specific real-world data. Although this allows for comparative analysis, it still does not demonstrate how specific data characteristics align with the architectural strengths of individual models. Our research aims at uncovering clear connections between time series characteristics and particular models. We introduce a novel dataset generated using Gaussian Processes, specifically designed to display distinct, known characteristics for targeted evaluations of model adaptability to them. Furthermore, we present TimeFlex, a new model that incorporates a modular architecture tailored to handle diverse temporal dynamics, including trends and periodic patterns. This model is compared to current state-of-the-art models, offering a deeper understanding of how models perform under varied time series conditions.



Article 646

Title@2025-06-10 (2): Calibrated Physics-Informed Uncertainty Quantification

Title: Calibrated Physics-Informed Uncertainty Quantification Kalibrierung der physikinformierten Unsicherheit Quantifizierung 校准物理 - 内部不确定性量化 2502.04406v2

Authors (8): Vignesh Gopakumar, Ander Gray, Lorenzo Zanisi, Timothy Nunn, Daniel Giles, Matt J. Kusner, Stanislas Pamela, Marc Peter Deisenroth

Simulating complex physical systems is crucial for understanding and predicting phenomena across diverse fields, such as fluid dynamics and heat transfer, as well as plasma physics and structural mechanics. Traditional approaches rely on solving partial differential equations (PDEs) using numerical methods, which are computationally expensive and often prohibitively slow for real-time applications or large-scale simulations. Neural PDEs have emerged as efficient alternatives to these costly numerical solvers, offering significant computational speed-ups. However, their lack of robust uncertainty quantification (UQ) limits deployment in critical applications. We introduce a model-agnostic, physics-informed conformal prediction (CP) framework that provides guaranteed uncertainty estimates without requiring labelled data. By utilising a physics-based approach, we can quantify and calibrate the model’s inconsistencies with the physics rather than the uncertainty arising from the data. Our approach utilises convolutional layers as finite-difference stencils and leverages physics residual errors as nonconformity scores, enabling data-free UQ with marginal and joint coverage guarantees across prediction domains for a range of complex PDEs. We further validate the efficacy of our method on neural PDE models for plasma modelling and shot design in fusion reactors.
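
The idea of using physics residuals as nonconformity scores can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the 1D heat equation u_t = κ u_xx, plain Python loops in place of convolutional stencils, one score per predicted field (the maximum absolute residual), and randomly perturbed fields standing in for surrogate outputs. All names are hypothetical.

```python
import math
import random


def heat_residual(u, dt, dx, diffusivity):
    """Finite-difference residual of u_t - diffusivity * u_xx at interior points.

    u is a list of time slices, each a list of values on a uniform spatial grid.
    """
    residuals = []
    for n in range(len(u) - 1):
        for i in range(1, len(u[n]) - 1):
            u_t = (u[n + 1][i] - u[n][i]) / dt
            u_xx = (u[n][i + 1] - 2.0 * u[n][i] + u[n][i - 1]) / dx ** 2
            residuals.append(u_t - diffusivity * u_xx)
    return residuals


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration scores for this alpha
    return sorted(scores)[rank - 1]


rng = random.Random(0)
dx, dt, diffusivity = 0.1, 0.001, 1.0


def hypothetical_prediction(noise):
    """Stand-in for a neural PDE output: a decaying sine mode plus noise."""
    return [[math.exp(-math.pi ** 2 * diffusivity * n * dt) * math.sin(math.pi * i * dx)
             + rng.gauss(0.0, noise)
             for i in range(11)]
            for n in range(5)]


# No labelled solutions are used: each score depends only on the model output.
calibration_scores = [
    max(abs(r) for r in heat_residual(hypothetical_prediction(1e-4), dt, dx, diffusivity))
    for _ in range(19)
]
threshold = conformal_quantile(calibration_scores, alpha=0.1)
```

Under exchangeability of calibration and test predictions, a new prediction's maximum residual falls below `threshold` with probability at least 1 - alpha, which is the usual split-conformal argument applied to residuals instead of labelled errors.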



Article 647

Title@2025-06-10 (2): On Large-scale Evaluation of Embedding Models for Knowledge Graph Completion

Title: On Large-scale Evaluation of Embedding Models for Knowledge Graph Completion Zur groß angelegten Bewertung von Einbettungsmodellen für die Wissensgraphenvervollständigung 大规模评价知识图完成情况嵌入模型 2504.08970v2

Authors (3): Nasim Shirvani-Mahdavi, Farahnaz Akrami, Chengkai Li

Knowledge graph embedding (KGE) models are extensively studied for knowledge graph completion, yet their evaluation remains constrained by unrealistic benchmarks. Standard evaluation metrics rely on the closed-world assumption, which penalizes models for correctly predicting missing triples, contradicting the fundamental goals of link prediction. These metrics often compress accuracy assessment into a single value, obscuring models’ specific strengths and weaknesses. The prevailing evaluation protocol, link prediction, operates under the unrealistic assumption that an entity’s properties, for which values are to be predicted, are known in advance. While alternative protocols such as property prediction, entity-pair ranking, and triple classification address some of these limitations, they remain underutilized. Moreover, commonly used datasets are either faulty or too small to reflect real-world data. Few studies examine the role of mediator nodes, which are essential for modeling n-ary relationships, or investigate model performance variation across domains. This paper conducts a comprehensive evaluation of four representative KGE models on large-scale datasets FB-CVT-REV and FB+CVT-REV. Our analysis reveals critical insights, including substantial performance variations between small and large datasets, both in relative rankings and absolute metrics, systematic overestimation of model capabilities when n-ary relations are binarized, and fundamental limitations in current evaluation protocols and metrics.



Article 648

Title@2025-06-10 (2): Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers

Title: Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers Vortrainierte Sprachmodelle lernen bemerkenswert genaue Darstellungen von Zahlen 培训前语文模式的精确数字表示 2506.08966v1

Authors (5): Marek Kadlčík, Michal Štefánik, Timothee Mickus, Michal Spiegel, Josef Kuchař

Pretrained language models (LMs) are prone to arithmetic errors. Existing work showed limited success in probing numeric values from models’ representations, indicating that these errors can be attributed to the inherent unreliability of distributionally learned embeddings in representing exact quantities. However, we observe that previous probing methods are inadequate for the emergent structure of learned number embeddings with sinusoidal patterns. In response, we propose a novel probing technique that decodes numeric values from input embeddings with near-perfect accuracy across a range of open-source LMs. This proves that after pre-training alone, LMs represent numbers with remarkable precision. Finally, we find that the embeddings’ precision, as judged by our probe’s accuracy, explains a large portion of LMs’ errors in elementary arithmetic, and show that aligning the embeddings with the pattern discovered by our probe can mitigate these errors.



Article 649

Title@2025-06-10 (2): GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO

Title: GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO GFRIEND: Generatives, nur wenige Schuss Belohnungsableitung durch EfficieNt DPO GFRIEND: 通过公平竞争和消费者保护局生成的微小的报偿推理 2506.08965v1

Authors (3): Yiyang Zhao, Huiyu Bai, Xuejiao Zhao

The ability to train high-performing reward models with few-shot data is critical for enhancing the efficiency and scalability of Reinforcement Learning from Human Feedback (RLHF). We propose a data augmentation and expansion framework that enables generative reward models trained on small datasets to achieve comparable performance to those trained on large-scale datasets. Traditional methods to train a generative reward model, such as Direct Preference Optimization (DPO), are constrained by inefficiencies in sample pairing and limited data diversity. This work introduces preference refinement, which employs Chain-of-Thought (CoT) sampling to uncover diverse and high-quality preference relationships. It also incorporates a perplexity-based scoring mechanism to assign nuanced preference levels and utilizes Multi-level Direct Preference Optimization (M-DPO) to enable the model to capture finer-grained preference differences between samples. Experimental results demonstrate that the proposed method significantly enhances data efficiency and model performance, enabling reward models trained in a few-shot setting to achieve results on par with those trained on large-scale datasets. This study underscores the potential of data-efficient strategies in advancing reward model optimization, offering a robust solution for low-resource RLHF applications.



Article 650

Title@2025-06-10 (2): Gumbel-max List Sampling for Distribution Coupling with Multiple Samples

Title: Gumbel-max List Sampling for Distribution Coupling with Multiple Samples Gumbel-max-Listenstichproben für die Verteilungskopplung mit mehreren Proben 与多个样本相混合的分布式组合的 Gumbel- max 列表列表抽样 2506.05632v2

Authors (3): Joseph Rowan, Buu Phan, Ashish Khisti

We study a relaxation of the problem of coupling probability distributions – a list of samples is generated from one distribution and an accept is declared if any one of these samples is identical to the sample generated from the other distribution. We propose a novel method for generating samples, which extends the Gumbel-max sampling suggested in Daliri et al. (arXiv:2408.07978) for coupling probability distributions. We also establish a corresponding lower bound on the acceptance probability, which we call the list matching lemma. We next discuss two applications of our setup. First, we develop a new mechanism for multi-draft speculative sampling that is simple to implement and achieves performance competitive with baselines such as SpecTr and SpecInfer across a range of language tasks. Our method also guarantees a certain degree of drafter invariance with respect to the output tokens which is not supported by existing schemes. We also provide a theoretical lower bound on the token level acceptance probability. As our second application, we consider distributed lossy compression with side information in a setting where a source sample is compressed and available to multiple decoders, each with independent side information. We propose a compression technique that is based on our generalization of Gumbel-max sampling and show that it provides significant gains in experiments involving synthetic Gaussian sources and the MNIST image dataset.
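
The coupling idea can be sketched with shared Gumbel noise. This is an assumption about how a list extension of Gumbel-max coupling could look, written so that both marginals are correct; the paper's actual construction may differ. Each of the K drafts uses its own row of noise to sample from `p`, and the target sample from `q` is the argmax over all K rows. The function names are hypothetical.

```python
import math
import random


def gumbel(rng):
    """Draw one standard Gumbel variate."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def log_prob(p):
    return math.log(p) if p > 0 else -math.inf


def list_sample(p, q, num_drafts, rng):
    """Return (drafts, target): K samples from p and one sample from q,
    coupled through shared Gumbel noise."""
    noise = [[gumbel(rng) for _ in p] for _ in range(num_drafts)]
    # Draft k: Gumbel-max over its own noise row, so it is distributed as p.
    drafts = [max(range(len(p)), key=lambda i: log_prob(p[i]) + row[i])
              for row in noise]
    # Target: Gumbel-max over every row at once; each symbol i appears K times
    # with weight q[i], so the winner is still distributed as q.
    _, target = max((log_prob(q[i]) + row[i], i)
                    for row in noise for i in range(len(q)))
    return drafts, target


rng = random.Random(0)
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
trials = 20000
accepted = 0
target_counts = [0, 0, 0]
for _ in range(trials):
    drafts, target = list_sample(p, q, num_drafts=4, rng=rng)
    accepted += target in drafts
    target_counts[target] += 1
acceptance_rate = accepted / trials
```

An accept is declared when `target` appears anywhere in `drafts`. When `p` and `q` are identical, the target always coincides with the draft from the winning noise row, so the acceptance probability is 1.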



Article 651

Title@2025-06-10 (2): BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly

Title: BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly 双形:双体几何大会的学习协作能力 2506.06221v2

Authors (9): Yan Shen, Ruihai Wu, Yubin Ke, Xinyuan Song, Zeyi Li, Xiaoqi Li, Hongwei Fan, Haoran Lu, Hao Dong

Shape assembly, the process of combining parts into a complete whole, is a crucial robotic skill with broad real-world applications. Among various assembly tasks, geometric assembly, where broken parts are reassembled into their original form (e.g., reconstructing a shattered bowl), is particularly challenging. This requires the robot to recognize geometric cues for grasping, assembly, and subsequent bimanual collaborative manipulation on varied fragments. In this paper, we exploit the geometric generalization of point-level affordance, learning affordances that are aware of bimanual collaboration in geometric assembly with long-horizon action sequences. To address the evaluation ambiguity caused by the geometric diversity of broken parts, we introduce a real-world benchmark featuring geometric variety and global reproducibility. Extensive experiments demonstrate the superiority of our approach over both previous affordance-based and imitation-based methods. Project page: https://sites.google.com/view/biassembly/.



Article 652

Title@2025-06-10 (2): Towards Robust Deep Reinforcement Learning against Environmental State Perturbation

Title: Towards Robust Deep Reinforcement Learning against Environmental State Perturbation Auf dem Weg zu einem robusten, tiefen Stärkungslernen gegen Umweltzustandsstörung 努力开展强有力的深入强化学习,防止环境状态的干扰 2506.08961v1

Authors (2): Chenxu Wang, Huaping Liu

Adversarial attacks and robustness in Deep Reinforcement Learning (DRL) have been widely studied in various threat models; however, few consider environmental state perturbations, which are natural in embodied scenarios. To improve the robustness of DRL agents, we formulate the problem of environmental state perturbation, introducing a preliminary non-targeted attack method as a calibration adversary, and then propose a defense framework, named Boosted Adversarial Training (BAT), which first tunes the agents via supervised learning to avoid catastrophic failure and subsequently adversarially trains the agent with reinforcement learning. Extensive experimental results substantiate the vulnerability of mainstream agents under environmental state perturbations and the effectiveness of our proposed attack. The defense results demonstrate that while existing robust reinforcement learning algorithms may not be suitable, our BAT framework can significantly enhance the robustness of agents against environmental state perturbations across various situations.



Article 653

Title@2025-06-10 (2): MetaTT: A Global Tensor-Train Adapter for Parameter-Efficient Fine-Tuning

Title: MetaTT: A Global Tensor-Train Adapter for Parameter-Efficient Fine-Tuning MetaTT: Ein globaler Tensor-Train-Adapter für parametereffizientes Feintuning MetaTT: 参数有效精美调试全球Tensor-Train适应器 2506.09105v1

Authors (6): Javier Lopez-Piqueres, Pranav Deshpande, Archan Ray, Mattia J. Villani, Marco Pistoia, Niraj Kumar

We present MetaTT, a unified Tensor Train (TT) adapter framework for global low-rank fine-tuning of pre-trained transformers. Unlike LoRA, which fine-tunes each weight matrix independently, MetaTT uses a single shared TT to factorize all transformer sub-modules (query, key, value, projection, and feed-forward layers) by indexing structural axes such as layer and matrix type, and optionally heads and tasks. For a given rank, LoRA adds parameters proportional to the product across modes, whereas MetaTT adds parameters proportional only to the sum across modes, leading to a significantly more compressed final adapter. Our benchmarks compare MetaTT with LoRA and with recent state-of-the-art fine-tuning schemes based on matrix and tensor decompositions. We observe that when tested on standard language modeling benchmarks, MetaTT achieves the greatest reduction in parameters while maintaining similar accuracy to LoRA and even outperforming other tensor-based methods. Unlike CP or other rank-factorizations, the TT ansatz benefits from mature optimization routines, e.g., DMRG-style rank-adaptive minimization in addition to Adam, which we find simplifies training. Because new modes can be appended cheaply, MetaTT naturally extends to shared adapters across many tasks without redesigning the core tensor.
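
The product-versus-sum scaling can be checked with a small parameter count. The sketch below makes several assumptions not stated in the abstract: every adapted matrix is square with the same width, the TT has one core per mode in the order (input dimension, layer, matrix type, output dimension), and all internal TT ranks equal the LoRA rank. The function names and the example sizes are illustrative.

```python
def lora_params(d_in, d_out, rank, num_layers, num_matrix_types):
    """One independent pair of low-rank factors per adapted weight matrix."""
    return num_layers * num_matrix_types * rank * (d_in + d_out)


def tensor_train_params(mode_sizes, rank):
    """Total size of TT cores shaped (left_rank, mode_size, right_rank),
    with boundary ranks equal to 1."""
    total = 0
    last = len(mode_sizes) - 1
    for k, size in enumerate(mode_sizes):
        left = 1 if k == 0 else rank
        right = 1 if k == last else rank
        total += left * size * right
    return total


# Hypothetical transformer: width 768, 12 layers, 4 adapted matrix types, rank 8.
d, layers, matrix_types, r = 768, 12, 4, 8
lora_total = lora_params(d, d, r, layers, matrix_types)
tt_total = tensor_train_params([d, layers, matrix_types, d], r)
```

Under these assumptions the LoRA count grows with `layers * matrix_types`, while the TT count grows with `layers + matrix_types` (each scaled by the squared rank), which is the scaling difference the abstract describes.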



Article 654

Title@2025-06-10 (2): IntTrajSim: Trajectory Prediction for Simulating Multi-Vehicle driving at Signalized Intersections

Title: IntTrajSim: Trajectory Prediction for Simulating Multi-Vehicle driving at Signalized Intersections IntTrajSim: Trajektorienvorhersage für das Simulieren von Multi-Fahrzeug-Fahren bei Signalisierten Schnitten IntTrajSim: 信号交界处模拟多车辆驾驶轨迹预测 2506.08957v1

Authors (4): Yash Ranjan, Rahul Sengupta, Anand Rangarajan, Sanjay Ranka

Traffic simulators are widely used to study the operational efficiency of road infrastructure, but their rule-based approach limits their ability to mimic real-world driving behavior. Traffic intersections are critical components of the road infrastructure, in terms of both safety risk (nearly 28% of fatal crashes and 58% of nonfatal crashes happen at intersections) and the operational efficiency of a road corridor. This raises an important question: can we create a data-driven simulator that can mimic the macro- and micro-statistics of the driving behavior at a traffic intersection? Deep Generative Modeling-based trajectory prediction models provide a good starting point to model the complex dynamics of vehicles at an intersection. However, they have not been tested in a “live” micro-simulation scenario or evaluated on traffic engineering-related metrics. In this study, we propose traffic engineering-related metrics to evaluate generative trajectory prediction models and provide a simulation-in-the-loop pipeline to do so. We also provide a multi-headed self-attention-based trajectory prediction model that incorporates the signal information, which outperforms our previous models on the evaluation metrics.



Article 655

Title@2025-06-10 (2): Unifying Block-wise PTQ and Distillation-based QAT for Progressive Quantization toward 2-bit Instruction-Tuned LLMs

Title: Unifying Block-wise PTQ and Distillation-based QAT for Progressive Quantization toward 2-bit Instruction-Tuned LLMs Vereinheitlichen Blockweise PTQ und Destillationsbasiertes QAT für die progressive Quantisierung in Richtung 2-Bit Instruction-Tuned LLMs 用于逐步量化至2位制导二位制导LMM 的QAT 统一块式 PTQ 和基于蒸馏的QAT 统一PTQ 和 蒸馏基QAT 2506.09104v1

Authors (5): Jung Hyun Lee, Seungjae Shin, Vinnam Kim, Jaeseong You, An Chen

As the rapid scaling of large language models (LLMs) poses significant challenges for deployment on resource-constrained devices, there is growing interest in extremely low-bit quantization, such as 2-bit. Although prior works have shown that 2-bit large models are Pareto-optimal over their 4-bit smaller counterparts in both accuracy and latency, these advancements have been limited to pre-trained LLMs and have not yet been extended to instruction-tuned models. To bridge this gap, we propose Unified Progressive Quantization (UPQ), a novel progressive quantization framework (FP16 → INT4 → INT2) that unifies block-wise post-training quantization (PTQ) with distillation-based quantization-aware training (Distill-QAT) for INT2 instruction-tuned LLM quantization. UPQ first quantizes FP16 instruction-tuned models to INT4 using block-wise PTQ to significantly reduce the quantization error introduced by subsequent INT2 quantization. Next, UPQ applies Distill-QAT to enable INT2 instruction-tuned LLMs to generate responses consistent with their original FP16 counterparts by minimizing the generalized Jensen-Shannon divergence (JSD) between the two. To the best of our knowledge, we are the first to demonstrate that UPQ can quantize open-source instruction-tuned LLMs to INT2 without relying on proprietary post-training data, while achieving state-of-the-art performance on MMLU and IFEval, two of the most representative benchmarks for evaluating instruction-tuned LLMs.
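
For reference, a generalized Jensen-Shannon divergence between two discrete distributions can be computed as below. The sketch assumes one common parameterisation, JSD_β(P, Q) = β·KL(P‖M) + (1 − β)·KL(Q‖M) with M = βP + (1 − β)Q; the abstract does not state which weighting UPQ uses, and the toy token distributions are invented for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def generalized_jsd(p, q, beta=0.5):
    """Generalized Jensen-Shannon divergence with mixture weight beta in (0, 1)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    mixture = [beta * pi + (1.0 - beta) * qi for pi, qi in zip(p, q)]
    return (beta * kl_divergence(p, mixture)
            + (1.0 - beta) * kl_divergence(q, mixture))


# Hypothetical next-token distributions from a teacher and a quantized student.
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]
divergence = generalized_jsd(teacher, student, beta=0.5)
```

Because the mixture is positive wherever either input is, the divergence stays finite even when the two distributions disagree on their support, unlike a plain KL term.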



Article 656

Title@2025-06-10 (2): Data Augmentation For Small Object using Fast AutoAugment

Title: Data Augmentation For Small Object using Fast AutoAugment Datenvergrößerung für kleine Objekte mit schneller AutoAugment 使用快速自动增强的小型对象数据增强 2506.08956v1

Authors (4): DaeEun Yoon, Semin Kim, SangWook Yoo, Jongha Lee

In recent years, there has been tremendous progress in object detection performance. However, despite these advances, the detection performance for small objects is significantly inferior to that of large objects. Detecting small objects is one of the most challenging and important problems in computer vision. To improve the detection performance for small objects, we propose an optimal data augmentation method using Fast AutoAugment. Through our proposed method, we can quickly find optimal augmentation policies that can overcome degradation when detecting small objects, and we achieve a 20% performance improvement on the DOTA dataset.



Article 657

Title@2025-06-10 (2): Segment Concealed Objects with Incomplete Supervision

Title: Segment Concealed Objects with Incomplete Supervision Segmentverdeckte Objekte mit unvollständiger Überwachung 带不完全监督的隐藏物体 2506.08955v1

Authors (11): Chunming He, Kai Li, Yachao Zhang, Ziyun Yang, Youwei Pang, Longxiang Tang, Chengyu Fang, Yulun Zhang, Linghe Kong, Xiu Li, Sina Farsiu

Incompletely-Supervised Concealed Object Segmentation (ISCOS) involves segmenting objects that seamlessly blend into their surrounding environments, utilizing incompletely annotated data, such as weak and semi-annotations, for model training. This task remains highly challenging due to (1) the limited supervision provided by the incompletely annotated training data, and (2) the difficulty of distinguishing concealed objects from the background, which arises from the intrinsic similarities in concealed scenarios. In this paper, we introduce the first unified method for ISCOS to address these challenges. To tackle the issue of incomplete supervision, we propose a unified mean-teacher framework, SEE, that leverages the vision foundation model Segment Anything Model (SAM) to generate pseudo-labels using coarse masks produced by the teacher model as prompts. To mitigate the effect of low-quality segmentation masks, we introduce a series of strategies for pseudo-label generation, storage, and supervision. These strategies aim to produce informative pseudo-labels, store the best pseudo-labels generated, and select the most reliable components to guide the student model, thereby ensuring robust network training. Additionally, to tackle the issue of intrinsic similarity, we design a hybrid-granularity feature grouping module that groups features at different granularities and aggregates these results. By clustering similar features, this module promotes segmentation coherence, facilitating more complete segmentation for both single-object and multiple-object images. We validate the effectiveness of our approach across multiple ISCOS tasks, and experimental results demonstrate that our method achieves state-of-the-art performance. Furthermore, SEE can serve as a plug-and-play solution, enhancing the performance of existing models.



Article 658

Title@2025-06-10 (2): Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction

Title: Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction Protriever: End-to-End-Unterscheidbare Proteinhomologie Suche nach Fitness-Vorhersage 远期:最终到最终差异可辨别的蛋白质同理学研究,以预测适合性。 2506.08954v1

Authors (7): Ruben Weitzman, Peter Mørch Groth, Lood Van Niekerk, Aoi Otani, Yarin Gal, Debora Marks, Pascal Notin

Retrieving homologous protein sequences is essential for a broad range of protein modeling tasks such as fitness prediction, protein design, structure modeling, and protein-protein interactions. Traditional workflows have relied on a two-step process: first retrieving homologs via Multiple Sequence Alignments (MSA), then training models on one or more of these alignments. However, MSA-based retrieval is computationally expensive, struggles with highly divergent sequences or complex insertion and deletion patterns, and operates independently of the downstream modeling objective. We introduce Protriever, an end-to-end differentiable framework that learns to retrieve relevant homologs while simultaneously training for the target task. When applied to protein fitness prediction, Protriever achieves state-of-the-art performance compared to sequence-based models that rely on MSA-based homolog retrieval, while being two orders of magnitude faster through efficient vector search. Protriever is both architecture- and task-agnostic, and can flexibly adapt to different retrieval strategies and protein databases at inference time, offering a scalable alternative to alignment-centric approaches.



Article 659

Title@2025-06-10 (2): Multi-SpaCE: Multi-Objective Subsequence-based Sparse Counterfactual Explanations for Multivariate Time Series Classification

Title: Multi-SpaCE: Multi-Objective Subsequence-based Sparse Counterfactual Explanations for Multivariate Time Series Classification Multi-SpaCE: Multi-Objective Subsequence-based Sparse Counterfactual Erklärungen für multivariate Zeitreihenklassifikation 多变量时间序列分类的多目标子序列、基于多目标子序列的简单反事实解释 2501.04009v2

Authors (2): Mario Refoyo, David Luengo

Deep Learning systems excel in complex tasks but often lack transparency, limiting their use in critical applications. Counterfactual explanations, a core tool within eXplainable Artificial Intelligence (XAI), offer insights into model decisions by identifying minimal changes to an input that alter its predicted outcome. However, existing methods for time series data are limited by univariate assumptions, rigid constraints on modifications, or lack of validity guarantees. This paper introduces Multi-SpaCE, a multi-objective counterfactual explanation method for multivariate time series. Using the Non-dominated Sorting Genetic Algorithm II (NSGA-II), Multi-SpaCE balances proximity, sparsity, plausibility, and contiguity. Unlike most methods, it ensures perfect validity, supports multivariate data, and provides a Pareto front of solutions, giving end users the flexibility to choose the trade-off that suits their needs. Comprehensive experiments on diverse datasets demonstrate the ability of Multi-SpaCE to consistently achieve perfect validity and deliver superior performance compared to existing methods.
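
A Pareto front of candidate counterfactuals can be extracted with a simple non-dominated filter. The sketch below is only that filter, not NSGA-II itself, and it assumes every objective is minimised; the two-objective tuples (a proximity cost and a count of changed points) are invented for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(other, p) for other in points)]


# Hypothetical candidates: (proximity cost, number of modified time steps).
candidates = [(0.2, 5), (0.3, 3), (0.5, 2), (0.4, 4), (0.6, 2)]
front = pareto_front(candidates)
```

Each point left in `front` represents a different trade-off, so a user who values sparsity can pick `(0.5, 2)` while one who values proximity can pick `(0.2, 5)`.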



Article 660

Title@2025-06-10 (2): Epistemic Uncertainty in Conformal Scores: A Unified Approach

Title: Epistemic Uncertainty in Conformal Scores: A Unified Approach Epistemische Ungewissheit in konformen Partituren: Ein einheitlicher Ansatz 非正式分数中的不确定性:统一办法 2502.06995v2

Authors (4): Luben M. C. Cabezas, Vagner S. Santos, Thiago R. Ramos, Rafael Izbicki

Conformal prediction methods create prediction bands with distribution-free guarantees but do not explicitly capture epistemic uncertainty, which can lead to overconfident predictions in data-sparse regions. Although recent conformal scores have been developed to address this limitation, they are typically designed for specific tasks, such as regression or quantile regression. Moreover, they rely on particular modeling choices for epistemic uncertainty, restricting their applicability. We introduce EPICSCORE, a model-agnostic approach that enhances any conformal score by explicitly integrating epistemic uncertainty. Leveraging Bayesian techniques such as Gaussian Processes, Monte Carlo Dropout, or Bayesian Additive Regression Trees, EPICSCORE adaptively expands predictive intervals in regions with limited data while maintaining compact intervals where data is abundant. As with any conformal method, it preserves finite-sample marginal coverage. It also achieves asymptotic conditional coverage. Experiments demonstrate its good performance compared to existing methods. Designed for compatibility with any Bayesian model, but equipped with distribution-free guarantees, EPICSCORE provides a general-purpose framework for uncertainty quantification in prediction problems.



Article 661

Title@2025-06-10 (2): Implicit Neural Representations for Chemical Reaction Paths

Title: Implicit Neural Representations for Chemical Reaction Paths Implizite neurale Darstellungen für chemische Reaktionswege 化学反应路径的隐性神经图示 2502.15843v2

Authors (5): Kalyan Ramakrishnan, Lars L. Schaaf, Chen Lin, Guangrun Wang, Philip Torr

We show that neural networks can be optimized to represent minimum energy paths as continuous functions, offering a flexible alternative to discrete path-search methods like Nudged Elastic Band (NEB). Our approach parameterizes reaction paths with a network trained on a loss function that discards tangential energy gradients and enables instant estimation of the transition state. We first validate the method on two-dimensional potentials and then demonstrate its advantages over NEB on challenging atomistic systems where (i) poor initial guesses yield unphysical paths, (ii) multiple competing paths exist, or (iii) the reaction follows a complex multi-step mechanism. Results highlight the versatility of the method: for instance, a simple adjustment to the sampling strategy during optimization can help escape local-minimum solutions. Finally, in a low-dimensional setting, we demonstrate that a single neural network can learn from existing paths and generalize to unseen systems, showing promise for a universal reaction path representation.



Article 662

Title@2025-06-10 (2): KARMA: A Multilevel Decomposition Hybrid Mamba Framework for Multivariate Long-Term Time Series Forecasting

Title: KARMA: A Multilevel Decomposition Hybrid Mamba Framework for Multivariate Long-Term Time Series Forecasting KARMA: Ein Multilevel-Dekompositions-Hybrid-Mamba-Framework für multivariate Langzeitreihenprognosen KARMA: 多变长期时间序列预测多层次分解混合 Mamba 框架 2506.08939v1

Authors (7): Hang Ye, Gaoxiang Duan, Haoran Zeng, Yangxin Zhu, Lingxue Meng, Xiaoying Zheng, Yongxin Zhu

Efficient multivariate long-term time series forecasting is a key requirement for a variety of practical applications, and time series data contain complex, interleaved temporal dynamics that require decomposition modeling. Traditional time series decomposition methods use a single scheme and rely on fixed rules, which are insufficient for mining the latent information in a series and adapting to the dynamic characteristics of complex series. On the other hand, Transformer-based models for time series forecasting struggle to effectively model long sequences and intricate dynamic relationships due to their high computational complexity. To overcome these limitations, we introduce KARMA, with an Adaptive Time Channel Decomposition module (ATCD) to dynamically extract trend and seasonal components. It also integrates a Hybrid Frequency-Time Decomposition module (HFTD) to further decompose series into frequency-domain and time-domain components. These components are coupled with a multi-scale Mamba-based KarmaBlock to efficiently process global and local information in a coordinated manner. Experiments on eight real-world datasets from diverse domains demonstrate that KARMA significantly outperforms mainstream baseline methods in both predictive accuracy and computational efficiency. Code and full results are available at this repository: https://github.com/yedadasd/KARMA



Article 663

Title@2025-06-10 (2): BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models

Title: BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models BioLangFusion: Multimodale Fusion von DNA, mRNA und Protein Language Models BioLangFusion:DNA、MRNA和蛋白素语言模型的多式联运 2506.08936v1

Authors (6): Amina Mollaysa, Artem Moskale, Pushpak Pati, Tommaso Mansi, Mangal Prakash, Rui Liao

We present BioLangFusion, a simple approach for integrating pre-trained DNA, mRNA, and protein language models into unified molecular representations. Motivated by the central dogma of molecular biology (information flow from gene to transcript to protein), we align per-modality embeddings at the biologically meaningful codon level (three nucleotides encoding one amino acid) to ensure direct cross-modal correspondence. BioLangFusion studies three standard fusion techniques: (i) codon-level embedding concatenation, (ii) entropy-regularized attention pooling inspired by multiple-instance learning, and (iii) cross-modal multi-head attention – each technique providing a different inductive bias for combining modality-specific signals. These methods require no additional pre-training or modification of the base models, allowing straightforward integration with existing sequence-based foundation models. Across five molecular property prediction tasks, BioLangFusion outperforms strong unimodal baselines, showing that even simple fusion of pre-trained models can capture complementary multi-omic information with minimal overhead.
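
The codon-level alignment and the first fusion technique (concatenation) can be sketched as follows. The sketch assumes the DNA or mRNA model yields one vector per nucleotide and that mean-pooling over non-overlapping triplets is an acceptable way to reach codon resolution; real models may tokenize differently, and both function names are hypothetical.

```python
def pool_to_codons(nucleotide_embeddings):
    """Mean-pool per-nucleotide vectors in non-overlapping groups of three."""
    if len(nucleotide_embeddings) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = []
    for i in range(0, len(nucleotide_embeddings), 3):
        group = nucleotide_embeddings[i:i + 3]
        dim = len(group[0])
        codons.append([sum(vec[d] for vec in group) / 3 for d in range(dim)])
    return codons


def concat_fusion(*modalities):
    """Concatenate the vectors of every modality at each codon position."""
    if len({len(m) for m in modalities}) != 1:
        raise ValueError("all modalities must cover the same number of codons")
    return [[x for vec in position for x in vec] for position in zip(*modalities)]


# Six nucleotides (1-d embeddings) align with two amino acids (2-d embeddings).
dna = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
protein = [[0.5, 0.5], [1.0, 1.0]]
fused = concat_fusion(pool_to_codons(dna), protein)
```

After pooling, position i in every modality refers to the same codon, so the fused vector at position i combines signals that describe the same biological unit.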



Article 664

Title@2025-06-10 (2): Can A Gamer Train A Mathematical Reasoning Model?

Title: Can A Gamer Train A Mathematical Reasoning Model? Kann ein Gamer ein mathematisches Reasoning-Modell trainieren? 赌徒能训练数学理论模型吗? 2506.08935v1

Authors (1): Andrew Shin

While large language models (LLMs) have achieved remarkable performance in various tasks including mathematical reasoning, their development typically demands prohibitive computational resources. Recent advancements have reduced costs for training capable models, yet even these approaches rely on high-end hardware clusters. In this paper, we demonstrate that a single average gaming GPU can train a solid mathematical reasoning model, by integrating reinforcement learning and memory optimization techniques. Specifically, we train a 1.5B-parameter mathematical reasoning model on an RTX 3080 Ti with 16GB of memory that, in resource-constrained environments, achieves comparable or better performance on mathematical reasoning benchmarks than models several times larger. Our results challenge the paradigm that state-of-the-art mathematical reasoning necessitates massive infrastructure, democratizing access to high-performance AI research. https://github.com/shinandrew/YouronMath.



Article 665

Title@2025-06-10 (2): Local MDI+: Local Feature Importances for Tree-Based Models

Title: Local MDI+: Local Feature Importances for Tree-Based Models Lokale MDI+: lokale Feature-Wichtigkeiten für baumbasierte Modelle 本地 MDI+: 树基模型的本地特点重要性 2506.08928v1

Authors (5): Zhongyuan Liang, Zachary T. Rewolinski, Abhineet Agarwal, Tiffany M. Tang, Bin Yu

Tree-based ensembles such as random forests remain the go-to for tabular data over deep learning models due to their prediction performance and computational efficiency. These advantages have led to their widespread deployment in high-stakes domains, where interpretability is essential for ensuring trustworthy predictions. This has motivated the development of popular local (i.e. sample-specific) feature importance (LFI) methods such as LIME and TreeSHAP. However, these approaches rely on approximations that ignore the model’s internal structure and instead depend on potentially unstable perturbations. These issues are addressed in the global setting by MDI+, a feature importance method which exploits an equivalence between decision trees and linear models on a transformed node basis. However, the global MDI+ scores are not able to explain predictions when faced with heterogeneous individual characteristics. To address this gap, we propose Local MDI+ (LMDI+), a novel extension of the MDI+ framework to the sample-specific setting. LMDI+ outperforms the existing baselines LIME and TreeSHAP in identifying instance-specific signal features, averaging a 10% improvement in downstream task performance across twelve real-world benchmark datasets. It further demonstrates greater stability by consistently producing similar instance-level feature importance rankings across multiple random forest fits. Finally, LMDI+ enables local interpretability use cases, including the identification of closer counterfactuals and the discovery of homogeneous subgroups.



Article 666

Title@2025-06-10 (2): Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap

Title: Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap Multimodale Unüberwachte Domain-Verallgemeinerung durch Herausholen über die Modalitätslücke 通过跨越模式差距的检索实现多式多式不受监督域化 2402.04416v3

Authors (4): Christopher Liao, Christian So, Theodoros Tsiligkaridis, Brian Kulis

Domain generalization (DG) is the important problem of learning a model that generalizes to unseen test domains by leveraging one or more source domains, under the assumption of shared label spaces. However, most DG methods assume access to abundant source data in the target label space, a requirement that proves overly stringent for numerous real-world applications, where acquiring data in the same label space as the target task is prohibitively expensive. For this setting, we tackle the multimodal version of the unsupervised domain generalization (MUDG) problem, which uses a large task-agnostic unlabeled source dataset during finetuning. Our framework does not explicitly assume any relationship between the source dataset and target task. Instead, it relies only on the premise that the source dataset can be accurately and efficiently searched in a joint vision-language space. We make three contributions in the MUDG setting. Firstly, we show theoretically that cross-modal approximate nearest neighbor search suffers from low recall due to the large distance between text queries and the image centroids used for coarse quantization. Accordingly, we propose paired k-means, a simple clustering algorithm that improves nearest neighbor recall by storing centroids in query space instead of image space. Secondly, we propose an adaptive text augmentation scheme for target labels designed to improve zero-shot accuracy and diversify retrieved image data. Lastly, we present two simple but effective components to further improve downstream target accuracy. We compare against state-of-the-art name-only transfer, source-free DG and zero-shot (ZS) methods on their respective benchmarks and show consistent improvement in accuracy on 20 diverse datasets. Code is available: https://github.com/Chris210634/mudg



Article 667

Title@2025-06-10 (2): PropMEND: Hypernetworks for Knowledge Propagation in LLMs

Title: PropMEND: Hypernetworks for Knowledge Propagation in LLMs PropMEND: Hypernetworks für die Wissensverbreitung in LLMs PropMEND: 以LLMMs方式传播知识的超网络 2506.08920v1

Authors (3): Zeyu Leo Liu, Greg Durrett, Eunsol Choi

Knowledge editing techniques for large language models (LLMs) can inject knowledge that is later reproducible verbatim, but they fall short on propagating that knowledge: models cannot answer questions that require reasoning with the injected knowledge. We present a hypernetwork-based approach for knowledge propagation, named PropMEND, where we meta-learn how to modify gradients of a language modeling loss to encourage injected information to propagate. Our approach extends the meta-objective of MEND [29] so that gradient updates on knowledge are transformed to enable answering multi-hop questions involving that knowledge. We show improved performance on the RippleEdit dataset, showing almost 2x accuracy on challenging multi-hop questions whose answers are not explicitly stated in the injected fact. We further introduce a new dataset, Controlled RippleEdit, to evaluate the generalization of our hypernetwork, testing knowledge propagation along relations and entities unseen during hypernetwork training. PropMEND still outperforms existing approaches in unseen entity-relation pairs, yet the performance gap decreases substantially, suggesting future work in propagating knowledge to a wide range of relations.



Article 668

Title@2025-06-10 (2): Provably Accurate Adaptive Sampling for Collocation Points in Physics-informed Neural Networks

Title: Provably Accurate Adaptive Sampling for Collocation Points in Physics-informed Neural Networks Wahrscheinlich genaue adaptive Probenahme für Kollokationspunkte in physikinformierten Neuronalen Netzwerken 物理知情神经网络定位点的准确适应性抽样 2504.00910v2

Authors (5): Antoine Caradot, Rémi Emonet, Amaury Habrard, Abdel-Rahim Mezidi, Marc Sebban

Despite considerable scientific advances in numerical simulation, efficiently solving PDEs remains a complex and often expensive problem. Physics-informed Neural Networks (PINNs) have emerged as an efficient way to learn surrogate solvers by embedding the PDE in the loss function and minimizing its residuals using automatic differentiation at so-called collocation points. These points were originally sampled uniformly, but their selection has been the subject of recent advances leading to adaptive sampling refinements for PINNs. In this paper, leveraging a new quadrature method for approximating definite integrals, we introduce a provably accurate sampling method for collocation points based on the Hessian of the PDE residuals. Comparative experiments conducted on a set of 1D and 2D PDEs demonstrate the benefits of our method.
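
The general idea of adaptive collocation sampling can be sketched as drawing points with probability proportional to some per-point score. The abstract's criterion is based on the Hessian of the PDE residuals and a new quadrature rule, neither of which is reproduced here; the `score` callback below is a stand-in, and `resample_collocation_points` is a hypothetical name.

```python
import random


def resample_collocation_points(candidates, score, k, seed=0):
    """Draw k points (with replacement) with probability proportional to |score(x)|.

    `score` is a placeholder for whatever refinement criterion is used;
    if every score is zero the draw falls back to uniform sampling.
    """
    rng = random.Random(seed)
    weights = [abs(score(x)) for x in candidates]
    if sum(weights) == 0:
        return [rng.choice(candidates) for _ in range(k)]
    return rng.choices(candidates, weights=weights, k=k)


# Toy score that is non-zero only at x = 0.5, so every draw lands there.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
picked = resample_collocation_points(grid, lambda x: 1.0 if x == 0.5 else 0.0, k=4)
```

In a training loop, the candidates would be redrawn periodically so that points concentrate where the current network is least accurate.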



Article 669

Title@2025-06-10 (2): Enhancing generalizability of model discovery across parameter space with multi-experiment equation learning (ME-EQL)

Title: Enhancing generalizability of model discovery across parameter space with multi-experiment equation learning (ME-EQL) Verbesserung der Verallgemeinerbarkeit der Modellentdeckung über den Parameterraum mit Multi-Experiment-Gleichung Lernen (ME-EQL) 通过多实验方程学习(ME-EQL)加强跨参数空间模型发现的一般性 2506.08916v1

Authors (6): Maria-Veronica Ciocanel, John T. Nardini, Kevin B. Flores, Erica M. Rutter, Suzanne S. Sindi, Alexandria Volkening

Agent-based modeling (ABM) is a powerful tool for understanding self-organizing biological systems, but it is computationally intensive and often not analytically tractable. Equation learning (EQL) methods can derive continuum models from ABM data, but they typically require extensive simulations for each parameter set, raising concerns about generalizability. In this work, we extend EQL to Multi-experiment equation learning (ME-EQL) by introducing two methods: one-at-a-time ME-EQL (OAT ME-EQL), which learns individual models for each parameter set and connects them via interpolation, and embedded structure ME-EQL (ES ME-EQL), which builds a unified model library across parameters. We demonstrate these methods using a birth–death mean-field model and an on-lattice agent-based model of birth, death, and migration with spatial structure. Our results show that both methods significantly reduce the relative error in recovering parameters from agent-based simulations, with OAT ME-EQL offering better generalizability across parameter space. Our findings highlight the potential of equation learning from multiple experiments to enhance the generalizability and interpretability of learned models for complex biological systems.
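
The interpolation step of OAT ME-EQL can be illustrated with a small sketch: each parameter value has its own learned coefficient vector, and a model for an unseen parameter value is obtained by interpolating between the two nearest learned ones. Linear interpolation and the name `interpolate_model` are assumptions; the abstract does not say which interpolation scheme is used.

```python
from bisect import bisect_left


def interpolate_model(fitted, p):
    """Linearly interpolate coefficient vectors between neighboring parameter values.

    `fitted` is a list of (parameter_value, coefficients) pairs sorted by
    parameter value; `p` must lie inside the fitted range.
    """
    params = [q for q, _ in fitted]
    if not params[0] <= p <= params[-1]:
        raise ValueError("p lies outside the range of fitted parameters")
    hi = bisect_left(params, p)
    if params[hi] == p:
        return list(fitted[hi][1])
    (p0, c0), (p1, c1) = fitted[hi - 1], fitted[hi]
    w = (p - p0) / (p1 - p0)
    return [(1 - w) * a + w * b for a, b in zip(c0, c1)]


# Two learned models, e.g. coefficients of (u, u^2) terms at two parameter values.
fitted = [(0.0, [1.0, -1.0]), (1.0, [3.0, -2.0])]
midpoint = interpolate_model(fitted, 0.5)
```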



Article 670

Title@2025-06-10 (2): Federated Generalised Variational Inference: A Robust Probabilistic Federated Learning Framework

Title: Federated Generalised Variational Inference: A Robust Probabilistic Federated Learning Framework Federated Generalised Variational Inference: Ein robustes probabilistisches Federated Learning Framework 联邦通用多变性推断:强力概率联邦学习框架 2502.00846v3

Authors (4): Terje Mildner, Oliver Hamelijnck, Paris Giampouras, Theodoros Damoulas

We introduce FedGVI, a probabilistic Federated Learning (FL) framework that is robust to both prior and likelihood misspecification. FedGVI addresses limitations in both frequentist and Bayesian FL by providing unbiased predictions under model misspecification, with calibrated uncertainty quantification. Our approach generalises previous FL approaches, specifically Partitioned Variational Inference (Ashman et al., 2022), by allowing robust and conjugate updates, decreasing computational complexity at the clients. We offer theoretical analysis in terms of fixed-point convergence, optimality of the cavity distribution, and provable robustness to likelihood misspecification. Further, we empirically demonstrate the effectiveness of FedGVI in terms of improved robustness and predictive performance on multiple synthetic and real world classification data sets.



Article 671

Title@2025-06-10 (2): Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU

Title: Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU Implementierung von Keyword-Spoting auf dem MCUX947 Microcontroller mit integrierter NPU 执行关于MCUX947微型控制器与集成 NPU 2506.08911v1

Authors (2): Petar Jakuš, Hrvoje Džapo

This paper presents a keyword spotting (KWS) system implemented on the NXP MCXN947 microcontroller with an integrated Neural Processing Unit (NPU), enabling real-time voice interaction on resource-constrained devices. The system combines MFCC feature extraction with a CNN classifier, optimized using Quantization Aware Training to reduce model size with minimal accuracy drop. Experimental results demonstrate a 59x speedup in inference time when leveraging the NPU compared to CPU-only execution, achieving 97.06% accuracy with a model size of 30.58 KB, demonstrating the feasibility of efficient, low-power voice interfaces on embedded platforms.



Article 672

Title@2025-06-10 (2): BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning

Title: BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning BEAST: Effiziente Tokenisierung von B-Splines kodierten Aktionssequenzen für das Imitationslernen BEAST: 模拟学习的B-S级编码行动序列的有效招接 2506.06072v2

Authors (14): Hongyi Zhou, Weiran Liao, Xi Huang, Yucheng Tang, Fabian Otto, Xiaogang Jia, Xinkai Jiang, Simon Hilber, Ge Li, Qian Wang, Ömer Erdinç Yağmurlu, Nils Blank, Moritz Reuss, Rudolf Lioutikov

We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact discrete or continuous tokens using B-splines. In contrast to existing action tokenizers based on vector quantization or byte pair encoding, BEAST requires no separate tokenizer training and consistently produces tokens of uniform length, enabling fast action sequence generation via parallel decoding. Leveraging our B-spline formulation, BEAST inherently generates smooth trajectories without discontinuities between adjacent segments. We extensively evaluate BEAST by integrating it with three distinct model architectures: a Variational Autoencoder (VAE) with continuous tokens, a decoder-only Transformer with discrete tokens, and Florence-2, a pretrained Vision-Language Model with an encoder-decoder architecture, demonstrating BEAST’s compatibility and scalability with large pretrained models. We evaluate BEAST across three established benchmarks consisting of 166 simulated tasks and on three distinct robot settings with a total of 8 real-world tasks. Experimental results demonstrate that BEAST (i) significantly reduces both training and inference computational costs, (ii) consistently generates smooth, high-frequency control signals suitable for continuous control tasks, and (iii) reliably achieves competitive task success rates compared to state-of-the-art methods.
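
To make the B-spline idea concrete, the sketch below decodes a fixed number of control points (playing the role of continuous tokens) into a dense one-dimensional trajectory using the standard Cox-de Boor recursion. The clamped uniform knot vector, the degree, and all function names are assumptions for illustration; the encoding direction (fitting control points to demonstrated actions) is omitted.

```python
def bspline_basis(i, degree, t, knots):
    """Cox-de Boor recursion for the i-th B-spline basis function."""
    if degree == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    denom = knots[i + degree] - knots[i]
    if denom > 0:
        left = (t - knots[i]) / denom * bspline_basis(i, degree - 1, t, knots)
    denom = knots[i + degree + 1] - knots[i + 1]
    if denom > 0:
        right = ((knots[i + degree + 1] - t) / denom
                 * bspline_basis(i + 1, degree - 1, t, knots))
    return left + right


def clamped_uniform_knots(num_ctrl, degree):
    """Knots on [0, 1] with repeated end knots, so the curve starts at the first control point."""
    inner = num_ctrl - degree - 1
    if inner < 0:
        raise ValueError("need at least degree + 1 control points")
    return ([0.0] * (degree + 1)
            + [(j + 1) / (inner + 1) for j in range(inner)]
            + [1.0] * (degree + 1))


def decode(control_points, degree, num_steps):
    """Evaluate the spline at num_steps evenly spaced times in [0, 1)."""
    knots = clamped_uniform_knots(len(control_points), degree)
    trajectory = []
    for s in range(num_steps):
        t = s / num_steps  # stays below 1 because degree-0 bases use half-open intervals
        trajectory.append(sum(c * bspline_basis(i, degree, t, knots)
                              for i, c in enumerate(control_points)))
    return trajectory


# Four control points expand into eight action values for one action dimension.
traj = decode([0.0, 1.0, 0.5, 2.0], degree=2, num_steps=8)
flat = decode([2.0, 2.0, 2.0, 2.0], degree=2, num_steps=8)
```

The number of control points is fixed regardless of how many steps are decoded, which is one way to obtain the uniform token length the abstract mentions.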



Article 673

Title@2025-06-10 (2): Intention-Conditioned Flow Occupancy Models

Title: Intention-Conditioned Flow Occupancy Models Intentionskonditionierte Flow-Belegungsmodelle 有意有条件流动占用模式 2506.08902v1

Authors (4): Chongyi Zheng, Seohong Park, Sergey Levine, Benjamin Eysenbach

Large-scale pre-training has fundamentally changed how machine learning research is done today: large foundation models are trained once, and then can be used by anyone in the community (including those without data or compute resources to train a model from scratch) to adapt and fine-tune to specific tasks. Applying this same framework to reinforcement learning (RL) is appealing because it offers compelling avenues for addressing core challenges in RL, including sample efficiency and robustness. However, there remains a fundamental challenge to pre-train large models in the context of RL: actions have long-term dependencies, so training a foundation model that reasons across time is important. Recent advances in generative AI have provided new tools for modeling highly complex distributions. In this paper, we build a probabilistic model to predict which states an agent will visit in the temporally distant future (i.e., an occupancy measure) using flow matching. As large datasets are often constructed by many distinct users performing distinct tasks, we include in our model a latent variable capturing the user intention. This intention increases the expressivity of our model, and enables adaptation with generalized policy improvement. We call our proposed method intention-conditioned flow occupancy models (InFOM). Compared with alternative methods for pre-training, our experiments on 36 state-based and 4 image-based benchmark tasks demonstrate that the proposed method achieves a 1.8x median improvement in returns and increases success rates by 36%. Website: https://chongyi-zheng.github.io/infom Code: https://github.com/chongyi-zheng/infom



Article 674

Title@2025-06-10 (2): Feature Shift Localization Network

Title: Feature Shift Localization Network Feature Shift Localization Network 本地化网络 2506.09101v1

Authors (4): Míriam Barrabés, Daniel Mas Montserrat, Kapal Dev, Alexander G. Ioannidis

Feature shifts between data sources are present in many applications involving healthcare, biomedical, socioeconomic, financial, survey, and multi-sensor data, among others, where unharmonized heterogeneous data sources, noisy data measurements, or inconsistent processing and standardization pipelines can lead to erroneous features. Localizing shifted features is important to address the underlying cause of the shift and correct or filter the data to avoid degrading downstream analysis. While many techniques can detect distribution shifts, localizing the features that give rise to them is still challenging, with current solutions being either inaccurate or not scalable to large and high-dimensional datasets. In this work, we introduce the Feature Shift Localization Network (FSL-Net), a neural network that can localize feature shifts in large and high-dimensional datasets in a fast and accurate manner. The network, trained with a large number of datasets, learns to extract the statistical properties of the datasets and can localize feature shifts from previously unseen datasets and shifts without the need for re-training. The code and ready-to-use trained model are available at https://github.com/AI-sandbox/FSL-Net.



Article 675

Title@2025-06-10 (2): Generalization analysis of an unfolding network for analysis-based Compressed Sensing

Title: Generalization analysis of an unfolding network for analysis-based Compressed Sensing Generalisierungsanalyse eines sich entfaltenden Netzwerks für analysebasiertes Komprimiertes Sensing 对正在展开的基于分析的压缩遥感压缩分析网络的概括化分析 2303.05582v3

Authors (2): Vicky Kouni, Yannis Panagakis

Unfolding networks have shown promising results in the Compressed Sensing (CS) field. Yet, the investigation of their generalization ability is still in its infancy. In this paper, we perform a generalization analysis of a state-of-the-art ADMM-based unfolding network, which jointly learns a decoder for CS and a sparsifying redundant analysis operator. To this end, we first impose a structural constraint on the learnable sparsifier, which parametrizes the network’s hypothesis class. For the latter, we estimate its Rademacher complexity. With this estimate in hand, we deliver generalization error bounds – which scale like the square root of the number of layers – for the examined network. Finally, the validity of our theory is assessed and numerical comparisons to a state-of-the-art unfolding network are made, on synthetic and real-world datasets. Our experimental results demonstrate that our proposed framework complies with our theoretical findings and outperforms the baseline, consistently for all datasets.



Article 676

Title@2025-06-10 (2): Improved Variational Inference in Discrete VAEs using Error Correcting Codes

Title: Improved Variational Inference in Discrete VAEs using Error Correcting Codes Verbesserte Variationale Schlussfolgerungen in diskreten VAEs mit Fehlerkorrekturcodes 使用错误校正代码的差错 VAEs 中改进的变量引用 2410.07840v2

Authors (4): María Martínez-García, Grace Villacrés, David Mitchell, Pablo M. Olmos

Despite advances in deep probabilistic models, learning discrete latent representations remains challenging. This work introduces a novel method to improve inference in discrete Variational Autoencoders by reframing the inference problem through a generative perspective. We conceptualize the model as a communication system, and propose to leverage Error-Correcting Codes (ECCs) to introduce redundancy in latent representations, allowing the variational posterior to produce more accurate estimates and reduce the variational gap. We present a proof-of-concept using a Discrete Variational Autoencoder with binary latent variables and low-complexity repetition codes, extending it to a hierarchical structure for disentangling global and local data features. Our approach significantly improves generation quality, data reconstruction, and uncertainty calibration, outperforming the uncoded models even when trained with tighter bounds such as the Importance Weighted Autoencoder objective. We also outline the properties that ECCs should possess to be effectively utilized for improved discrete variational inference.
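
The redundancy that a repetition code adds can be seen in a few lines. This sketch uses hard-decision majority voting on binary symbols; in the abstract the decoder is a variational posterior, which is presumably soft rather than hard, so the code only illustrates why repeated bits tolerate errors. Function names are hypothetical.

```python
def repetition_encode(bits, r):
    """Repeat every bit r times."""
    return [b for b in bits for _ in range(r)]


def repetition_decode(received, r):
    """Recover each bit by majority vote over its r copies (r should be odd)."""
    return [1 if 2 * sum(received[i:i + r]) > r else 0
            for i in range(0, len(received), r)]


message = [1, 0, 1]
codeword = repetition_encode(message, r=3)

# Flip one copy in each of the first two groups; the vote still recovers the message.
noisy = list(codeword)
noisy[1] ^= 1
noisy[5] ^= 1
decoded = repetition_decode(noisy, r=3)
```

With r = 3, any single flipped copy per group is corrected, at the cost of a latent code three times as long.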



Article 677

Title@2025-06-10 (2): Real-Time Cascade Mitigation in Power Systems Using Influence Graph Improved by Reinforcement Learning

Title: Real-Time Cascade Mitigation in Power Systems Using Influence Graph Improved by Reinforcement Learning Real-Time Cascade Mitigation in Stromsystemen mit Einfluss Graph verbessert durch Verstärkung Learning 利用强化学习所改进的影响图,在电力系统中实时连锁减缓电源系统 2506.08893v1

Authors (4): Kai Zhou, Youbiao He, Chong Zhong, Yifu Wu

Despite high reliability, modern power systems with growing renewable penetration face an increasing risk of cascading outages. Real-time cascade mitigation requires fast, complex operational decisions under uncertainty. In this work, we extend the influence graph into a Markov decision process model (MDP) for real-time mitigation of cascading outages in power transmission systems, accounting for uncertainties in generation, load, and initial contingencies. The MDP includes a do-nothing action to allow for conservative decision-making and is solved using reinforcement learning. We present a policy gradient learning algorithm initialized with a policy corresponding to the unmitigated case and designed to handle invalid actions. The proposed learning method converges faster than the conventional algorithm. Through careful reward design, we learn a policy that takes conservative actions without deteriorating system conditions. The model is validated on the IEEE 14-bus and IEEE 118-bus systems. The results show that proactive line disconnections can effectively reduce cascading risk, and certain lines consistently emerge as critical in mitigating cascade propagation.



Article 678

Title@2025-06-10 (2): SeerAttention-R: Sparse Attention Adaptation for Long Reasoning

Title: SeerAttention-R: Sparse Attention Adaptation for Long Reasoning SeherAchtung-R: Sparse-Achtung-Anpassung für langes Nachdenken SeerAttention-R: 偏差的注意力调整以适应长期理由 2506.08889v1

Authors (15): Yizhao Gao, Shuming Guo, Shijie Cao, Yuqing Xia, Yu Cheng, Lei Wang, Lingxiao Ma, Yutao Sun, Tianzhu Ye, Li Dong, Hayden Kwok-Hay So, Yu Hua, Ting Cao, Fan Yang, Mao Yang

We introduce SeerAttention-R, a sparse attention framework specifically tailored for the long decoding of reasoning models. Extended from SeerAttention, SeerAttention-R retains the design of learning attention sparsity through a self-distilled gating mechanism, while removing query pooling to accommodate auto-regressive decoding. With a lightweight plug-in gating mechanism, SeerAttention-R is flexible and can be easily integrated into existing pretrained models without modifying the original parameters. We demonstrate that SeerAttention-R, trained on just 0.4B tokens, maintains near-lossless reasoning accuracy with a 4K token budget on the AIME benchmark under large sparse attention block sizes (64/128). Using TileLang, we develop a highly optimized sparse decoding kernel that achieves near-theoretical speedups of up to 9x over FlashAttention-3 on an H100 GPU at 90% sparsity. Code is available at: https://github.com/microsoft/SeerAttention.
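
The relation between a token budget and a block size can be sketched as a top-k selection over per-block gate scores: the budget divided by the block size gives the number of key/value blocks each decoding step may attend to. Whether the actual method uses top-k, a threshold, or something else is not stated in the abstract, so the selection rule and the name `select_blocks` are assumptions.

```python
def select_blocks(gate_scores, token_budget, block_size):
    """Return the indices of the highest-scoring blocks that fit in the token budget."""
    num_blocks = max(1, token_budget // block_size)
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    # Sort the kept indices so blocks are visited in sequence order.
    return sorted(ranked[:num_blocks])


# A 128-token budget with 64-token blocks keeps the two best-scoring blocks.
kept = select_blocks([0.1, 0.9, 0.3, 0.7], token_budget=128, block_size=64)
```

Under these assumptions, a 4K budget with a block size of 64 corresponds to 64 attended blocks per step, however long the generated sequence becomes.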



Article 679

Title@2025-06-10 (2): InfoDPCCA: Information-Theoretic Dynamic Probabilistic Canonical Correlation Analysis

Title: InfoDPCCA: Information-Theoretic Dynamic Probabilistic Canonical Correlation Analysis InfoDPCCA: Informationstheoretische dynamische probabilistische kanonischen Korrelationsanalyse InfoDPCA: 信息理论动态动态概率能力与卡尼诺相关关系分析 2506.08884v1

Authors (2): Shiqin Tang, Shujian Yu

Extracting meaningful latent representations from high-dimensional sequential data is a crucial challenge in machine learning, with applications spanning natural science and engineering. We introduce InfoDPCCA, a dynamic probabilistic Canonical Correlation Analysis (CCA) framework designed to model two interdependent sequences of observations. InfoDPCCA leverages a novel information-theoretic objective to extract a shared latent representation that captures the mutual structure between the data streams and balances representation compression against predictive sufficiency, while also learning separate latent components that encode information specific to each sequence. Unlike prior dynamic CCA models, such as DPCCA, our approach explicitly enforces the shared latent space to encode only the mutual information between the sequences, improving interpretability and robustness. We further introduce a two-step training scheme to bridge the gap between information-theoretic representation learning and generative modeling, along with a residual connection mechanism to enhance training stability. Through experiments on synthetic and medical fMRI data, we demonstrate that InfoDPCCA excels as a tool for representation learning. Code for InfoDPCCA is available at https://github.com/marcusstang/InfoDPCCA.



Article 680

Title@2025-06-10 (2): Filling in the Blanks: Applying Data Imputation in incomplete Water Metering Data

Title: Filling in the Blanks: Applying Data Imputation in incomplete Water Metering Data Füllen der Blanks: Anwenden von Daten Imputation in unvollständigen Wasserzählerdaten 填充空白:在不完整的水计量数据中应用数据计算 2506.08882v1

Authors (4): Dimitrios Amaxilatis, Themistoklis Sarantakos, Ioannis Chatzigiannakis, Georgios Mylonas

In this work, we explore the application of recent data imputation techniques to enhance monitoring and management of water distribution networks using smart water meters, based on data derived from a real-world IoT water grid monitoring deployment. Despite the detailed data produced by such meters, data gaps due to technical issues can significantly impact operational decisions and efficiency. We compare various imputation methods, such as k-Nearest Neighbors, MissForest, Transformers, and Recurrent Neural Networks, and study their effect on the accuracy and reliability of water metering data. Our results indicate that effective data imputation can substantially enhance the quality of the insights derived from water consumption data, supporting applications like leak detection and predictive maintenance scheduling.
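
Of the methods listed, k-Nearest Neighbors imputation is the simplest to sketch: a missing reading is replaced by the average of that reading in the k most similar records, where similarity is measured on the fields both records have. The data layout (rows of readings with `None` for gaps) and the function names are assumptions for illustration, not the study's pipeline.

```python
import math


def _distance(a, b):
    """Root-mean-square difference over the fields observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k):
    """Fill each None with the mean of that column over the k nearest rows."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            donors = [(_distance(row, other), other[col])
                      for j, other in enumerate(rows)
                      if j != i and other[col] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap in place when no comparable row exists
                filled[i][col] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Columns: (morning reading, evening reading); the second meter has a gap.
readings = [[1.0, 10.0], [1.1, None], [5.0, 50.0], [1.2, 12.0]]
completed = knn_impute(readings, k=2)
```

The gap is filled from the two meters with similar morning readings, and the dissimilar third meter is ignored.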



Article 681

Title@2025-06-10 (2): Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models

Title: Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models Pivoting Factorization: Eine kompakte Meta-Low-Rank-Darstellung von Sparsität für effiziente Schlussfolgerungen in großen Sprachmodellen 重力因素化:大语言模型中高效引引的公平代表性的契约元低率代表 2501.19090v2

Authors (3): Jialin Zhao, Yingtao Zhang, Carlo Vittorio Cannistraci

The rapid growth of Large Language Models has driven demand for effective model compression techniques to reduce memory and computation costs. Low-rank pruning has gained attention for its GPU compatibility across all densities. However, low-rank pruning struggles to match the performance of semi-structured pruning, often doubling perplexity at similar densities. In this paper, we propose Pivoting Factorization (PIFA), a novel lossless meta low-rank representation that learns, without supervision, a compact form of any low-rank representation, effectively eliminating redundant information. PIFA identifies pivot rows (linearly independent rows) and expresses non-pivot rows as linear combinations of them, achieving 24.2% additional memory savings and 24.6% faster inference over low-rank layers at rank = 50% of dimension. To mitigate the performance degradation caused by low-rank pruning, we introduce a novel, retraining-free reconstruction method that minimizes error accumulation (M). MPIFA, combining M and PIFA into an end-to-end framework, significantly outperforms existing low-rank pruning methods, and achieves performance comparable to semi-structured pruning, while surpassing it in GPU efficiency and compatibility. Our code is available at https://github.com/biomedical-cybernetics/pivoting-factorization.
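
The pivot-row idea can be sketched with exact arithmetic on a tiny matrix: rows are scanned in order, a row that is linearly independent of the pivots found so far becomes a new pivot, and every other row is stored only as its coefficients over the pivot rows. This is a plain Gaussian-elimination illustration with hypothetical names; the paper's actual algorithm, pivot selection, and storage layout may differ.

```python
from fractions import Fraction


def pivoting_factorization(rows):
    """Return (pivot_indices, coeffs); coeffs[i] expresses non-pivot row i over the pivot rows."""
    width = len(rows[0])
    basis = []            # (reduced vector, lead column, its combination of pivot rows)
    pivot_indices = []
    coeffs = {}
    for idx, row in enumerate(rows):
        residual = [Fraction(x) for x in row]
        combo = [Fraction(0)] * len(pivot_indices)
        for vec, lead, vec_combo in basis:
            if residual[lead] != 0:
                factor = residual[lead] / vec[lead]
                residual = [r - factor * v for r, v in zip(residual, vec)]
                for j, c in enumerate(vec_combo):
                    combo[j] += factor * c
        lead = next((c for c in range(width) if residual[c] != 0), None)
        if lead is None:
            coeffs[idx] = combo  # row == sum_j combo[j] * pivot_row_j
        else:
            # residual == row - sum_j combo[j] * pivot_row_j, with row as the new pivot
            basis.append((residual, lead, [-c for c in combo] + [Fraction(1)]))
            pivot_indices.append(idx)
    return pivot_indices, coeffs


def reconstruct(rows, pivot_indices, coeffs, idx):
    """Rebuild a non-pivot row from the stored pivot rows and coefficients."""
    width = len(rows[0])
    return [sum(c * rows[p][col] for c, p in zip(coeffs[idx], pivot_indices))
            for col in range(width)]


matrix = [[1, 2], [2, 4], [0, 1], [1, 3]]  # rank 2: rows 0 and 2 span the rest
pivots, coeffs = pivoting_factorization(matrix)
```

Because the non-pivot rows are reproduced exactly from the pivots, nothing is lost, which matches the "lossless" claim; the memory saving comes from storing short coefficient vectors instead of full rows.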



Article 682

Title@2025-06-10 (2): Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks

Title: Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks Sparse Spectral Training und Inferenz auf Euklidische und Hyperbolische Neuronale Netzwerke Euclidean 和超双曲神经网络的 广度光谱培训和推断 2405.15481v2

Authors (5): Jialin Zhao, Yingtao Zhang, Xinghang Li, Huaping Liu, Carlo Vittorio Cannistraci

The growing demands on GPU memory posed by the increasing number of neural network parameters call for training approaches that are more memory-efficient. Previous memory reduction training techniques, such as Low-Rank Adaptation (LoRA) and ReLoRA, face challenges, with LoRA being constrained by its low-rank structure, particularly during intensive tasks like pre-training, and ReLoRA suffering from saddle point issues. In this paper, we propose Sparse Spectral Training (SST) to optimize memory usage for pre-training. SST updates all singular values and selectively updates singular vectors through a multinomial sampling method weighted by the magnitude of the singular values. Furthermore, SST employs singular value decomposition to initialize and periodically reinitialize low-rank parameters, reducing distortion relative to full-rank training compared to other low-rank methods. Through comprehensive testing on both Euclidean and hyperbolic neural networks across various tasks, SST demonstrates its ability to outperform existing memory reduction training methods and is comparable to full-rank training in various cases. On LLaMA-1.3B, with only 18.7% of the parameters trainable compared to full-rank training (using a rank equivalent to 6% of the embedding dimension), SST reduces the perplexity gap between other low-rank methods and full-rank training by 97.4%. This result highlights SST as an effective parameter-efficient technique for model pre-training.
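
The selection step, choosing which singular vectors to update with probability weighted by singular-value magnitude, can be sketched as follows. Sampling without replacement, the fixed seed, and the function name are assumptions; the abstract only states that the sampling is multinomial and magnitude-weighted.

```python
import random


def sample_singular_vector_indices(singular_values, k, seed=0):
    """Pick k distinct indices, each draw weighted by the remaining |singular value|s."""
    rng = random.Random(seed)
    remaining = list(range(len(singular_values)))
    chosen = []
    for _ in range(min(k, len(remaining))):
        weights = [abs(singular_values[i]) for i in remaining]
        if sum(weights) == 0:
            pick = rng.choice(remaining)  # all remaining weights are zero
        else:
            pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


# Larger singular values are more likely to have their vectors updated this step.
selected = sample_singular_vector_indices([4.0, 2.0, 1.0, 0.5, 0.1], k=2)
only_nonzero = sample_singular_vector_indices([5.0, 0.0, 0.0], k=1)
```

Only the selected vectors would receive gradient updates in a given step, which is where the memory saving over full-rank training comes from, while all singular values remain trainable.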



Article 683

Title@2025-06-10 (2): Adapting to Heterophilic Graph Data with Structure-Guided Neighbor Discovery

Title: Adapting to Heterophilic Graph Data with Structure-Guided Neighbor Discovery Anpassung an heterophile Graphendaten mit strukturgeführter Neighbor Discovery 与结构引导邻居发现发现器相适应以适应结构辅助邻居发现器的热对数图形数据 2506.08871v1

Authors (5): Victor M. Tenorio, Madeline Navarro, Samuel Rey, Santiago Segarra, Antonio G. Marques

Graph Neural Networks (GNNs) often struggle with heterophilic data, where connected nodes may have dissimilar labels, as they typically assume homophily and rely on local message passing. To address this, we propose creating alternative graph structures by linking nodes with similar structural attributes (e.g., role-based or global), thereby fostering higher label homophily on these new graphs. We theoretically prove that GNN performance can be improved by utilizing graphs with fewer false positive edges (connections between nodes of different classes) and that considering multiple graph views increases the likelihood of finding such beneficial structures. Building on these insights, we introduce Structure-Guided GNN (SG-GNN), an architecture that processes the original graph alongside the newly created structural graphs, adaptively learning to weigh their contributions. Extensive experiments on various benchmark datasets, particularly those with heterophilic characteristics, demonstrate that our SG-GNN achieves state-of-the-art or highly competitive performance, highlighting the efficacy of exploiting structural information to guide GNNs.



Article 684

Title@2025-06-10 (2): Low-Rank Augmented Implicit Neural Representation for Unsupervised High-Dimensional Quantitative MRI Reconstruction

Title: Low-Rank Augmented Implicit Neural Representation for Unsupervised High-Dimensional Quantitative MRI Reconstruction Low-Rank Augmented Implizite Neural Repräsentanz für unüberwachte hochdimensionelle Quantitative MRT-Rekonstruktion 低Rank 低强度增强的内隐隐含神经力代表,用于未受监督的高多样性定量MRI重建 2506.09100v1

Authors (4): Haonan Zhang, Guoyan Lao, Yuyao Zhang, Hongjiang Wei

Quantitative magnetic resonance imaging (qMRI) provides tissue-specific parameters vital for clinical diagnosis. Although simultaneous multi-parametric qMRI (MP-qMRI) technologies enhance imaging efficiency, robustly reconstructing qMRI from highly undersampled, high-dimensional measurements remains a significant challenge. This difficulty arises primarily because current reconstruction methods rely solely on a single prior or physics-informed model to solve the highly ill-posed inverse problem, which often leads to suboptimal results. To overcome this limitation, we propose LoREIN, a novel unsupervised and dual-prior-integrated framework for accelerated 3D MP-qMRI reconstruction. Technically, LoREIN incorporates both a low-rank prior and a continuity prior via low-rank representation (LRR) and implicit neural representation (INR), respectively, to enhance reconstruction fidelity. The powerful continuous representation of INR enables the estimation of optimal spatial bases within the low-rank subspace, facilitating high-fidelity reconstruction of weighted images. Simultaneously, the predicted multi-contrast weighted images provide essential structural and quantitative guidance, further enhancing the reconstruction accuracy of quantitative parameter maps. Furthermore, our work introduces a zero-shot learning paradigm with broad potential in complex spatiotemporal and high-dimensional image reconstruction tasks, further advancing the field of medical imaging.



Article 685

Title@2025-06-10 (2): k-NN as a Simple and Effective Estimator of Transferability

Title: k-NN as a Simple and Effective Estimator of Transferability k-NN als einfacher und effektiver Schätzwert für die Übertragbarkeit k-NN作为可转让性的一个简单而有效的模拟器 2503.18528v2

Authors (5): Moein Sorkhei, Christos Matsoukas, Johan Fredin Haslum, Emir Konuk, Kevin Smith

How well can one expect transfer learning to work in a new setting where the domain is shifted, the task is different, and the architecture changes? Many transfer learning metrics have been proposed to answer this question. But how accurate are their predictions in a realistic new setting? We conducted an extensive evaluation involving over 42,000 experiments comparing 23 transferability metrics across 16 different datasets to assess their ability to predict transfer performance. Our findings reveal that none of the existing metrics perform well across the board. However, we find that a simple k-nearest neighbor evaluation – as is commonly used to evaluate feature quality for self-supervision – not only surpasses existing metrics, but also offers better computational efficiency and ease of implementation.
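A k-nearest-neighbor evaluation of frozen features can be sketched as below. This is a hedged illustration: the abstract does not give the exact protocol, so the leave-one-out scheme, cosine similarity, majority voting, and the function name `knn_transfer_score` are all assumptions made for the example.

```python
import numpy as np

def knn_transfer_score(features, labels, k=5):
    """Leave-one-out k-NN accuracy on target-task features (cosine similarity)."""
    X = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)               # a sample may not vote for itself
    neighbors = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar samples
    # Majority vote over neighbor labels (labels must be non-negative integers).
    pred = np.array([np.bincount(row).argmax() for row in labels[neighbors]])
    return float((pred == labels).mean())

rng = np.random.default_rng(0)
# Toy "pre-trained features": two well-separated classes.
class0 = np.array([1.0, 0.0]) + rng.normal(scale=0.01, size=(10, 2))
class1 = np.array([0.0, 1.0]) + rng.normal(scale=0.01, size=(10, 2))
features = np.vstack([class0, class1])
labels = np.array([0] * 10 + [1] * 10)
score = knn_transfer_score(features, labels, k=3)
```

A higher score would be read as a prediction that the pre-trained model transfers better to the target task; no fine-tuning is required to compute it.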



Article 686

Title@2025-06-10 (2): Deep reinforcement learning for near-deterministic preparation of cubic- and quartic-phase gates in photonic quantum computing

Title: Deep reinforcement learning for near-deterministic preparation of cubic- and quartic-phase gates in photonic quantum computing Tiefes Verstärkungslernen für die nahezu deterministische Vorbereitung von kubischen und quartic-Phasentoren im photonischen Quanten-Computing 深入强化学习,为光子量衡算的立方和临界阶段大门进行近乎决定性的准备 2506.07859v2

Authors (4): Amanuel Anteneh, Léandre Brunel, Carlos González-Arciniegas, Olivier Pfister

Cubic-phase states are a sufficient resource for universal quantum computing over continuous variables. We present results from numerical experiments in which deep neural networks are trained via reinforcement learning to control a quantum optical circuit for generating cubic-phase states, with an average success rate of 96%. The only non-Gaussian resource required is photon-number-resolving measurements. We also show that the exact same resources enable the direct generation of a quartic-phase gate, with no need for a cubic gate decomposition.



Article 687

Title@2025-06-10 (2): Cooperative and Collaborative Multi-Task Semantic Communication for Distributed Sources

Title: Cooperative and Collaborative Multi-Task Semantic Communication for Distributed Sources Kooperative und kollaborative Multi-Task Semantische Kommunikation für verteilte Quellen 合作与合作 多任务、跨语义交流促进分配源的传播 2411.02150v2

Authors (5): Ahmad Halimi Razlighi, Maximilian H. V. Tillmann, Edgar Beck, Carsten Bockelmann, Armin Dekorsy

In this paper, we explore a multi-task semantic communication (SemCom) system for distributed sources, extending the existing focus on collaborative single-task execution. We build on the cooperative multi-task processing introduced in [1], which divides the encoder into a common unit (CU) and multiple specific units (SUs). While earlier studies in multi-task SemCom focused on full observation settings, our research explores a more realistic case where only distributed partial observations are available, such as in a production line monitored by multiple sensing nodes. To address this, we propose a SemCom system that supports multi-task processing through cooperation on the transmitter side via a split structure and collaboration on the receiver side. We adopt an information-theoretic perspective with variational approximations for our end-to-end data-driven approach. Simulation results demonstrate that the proposed cooperative and collaborative multi-task (CCMT) SemCom system significantly improves task execution accuracy, particularly on complex datasets, provided that noise from the communication channel does not become the limiting factor for task performance. Our findings contribute to a more general SemCom framework capable of handling distributed sources and multiple tasks simultaneously, advancing the applicability of SemCom systems in real-world scenarios.



Article 688

Title@2025-06-10 (2): StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams

Title: StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams StreamSplat: Auf dem Weg zu Online Dynamische 3D-Rekonstruktion aus unkalibrierten Videostreams StreamSplat:从未经校正的视频流走向在线动态3D重建 2506.08862v1

Authors (5): Zike Wu, Qi Yan, Xuanyu Yi, Lele Wang, Renjie Liao

Real-time reconstruction of dynamic 3D scenes from uncalibrated video streams is crucial for numerous real-world applications. However, existing methods struggle to jointly address three key challenges: 1) processing uncalibrated inputs in real time, 2) accurately modeling dynamic scene evolution, and 3) maintaining long-term stability and computational efficiency. To this end, we introduce StreamSplat, the first fully feed-forward framework that transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting (3DGS) representations in an online manner, capable of recovering scene dynamics from temporally local observations. We propose two key technical innovations: a probabilistic sampling mechanism in the static encoder for 3DGS position prediction, and a bidirectional deformation field in the dynamic decoder that enables robust and efficient dynamic modeling. Extensive experiments on static and dynamic benchmarks demonstrate that StreamSplat consistently outperforms prior works in both reconstruction quality and dynamic scene modeling, while uniquely supporting online reconstruction of arbitrarily long video streams. Code and models are available at https://github.com/nickwzk/StreamSplat.



Article 689

Title@2025-06-10 (2): DIME: Diffusion-Based Maximum Entropy Reinforcement Learning

Title: DIME: Diffusion-Based Maximum Entropy Reinforcement Learning DIME: Diffusionsbasiertes maximales Entropie-Verstärkungs-Lernen DIME: 以传播为基础的最大助爆强化学习 2502.02316v2

Authors (8): Onur Celik, Zechu Li, Denis Blessing, Ge Li, Daniel Palenicek, Jan Peters, Georgia Chalvatzaki, Gerhard Neumann

Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges, primarily due to the intractability of computing their marginal entropy. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion-based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity.



Article 690

Title@2025-06-10 (2): Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers

Title: Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers Zu groß zu denken: Kapazität, Erinnerung und Verallgemeinerung in vortrainierten Transformern 能力、记忆和在培训前变异器中的普及化 2506.09099v1

Authors (2): Joshua Barron, Devin White

The relationship between memorization and generalization in large language models (LLMs) remains an open area of research, with growing evidence that the two are deeply intertwined. In this work, we investigate this relationship by pre-training a series of capacity-limited Transformer models from scratch on two synthetic character-level tasks designed to separately probe generalization (via arithmetic extrapolation) and memorization (via factual recall). We observe a consistent trade-off: small models extrapolate to unseen arithmetic cases but fail to memorize facts, while larger models memorize but fail to extrapolate. An intermediate-capacity model exhibits a similar shift toward memorization. When trained on both tasks jointly, no model (regardless of size) succeeds at extrapolation. These findings suggest that pre-training may intrinsically favor one learning mode over the other. By isolating these dynamics in a controlled setting, our study offers insight into how model capacity shapes learning behavior and offers broader implications for the design and deployment of small language models.



Article 691

Title@2025-06-10 (2): On the Hardness of Sampling from Mixture Distributions via Langevin Dynamics

Title: On the Hardness of Sampling from Mixture Distributions via Langevin Dynamics Über die Härte der Probenahme aus Mischungsverteilungen über Langevin Dynamics 使用 Langevin Directives 进行混音发行的样本的强度 2406.02017v3

Authors (3): Xiwei Cheng, Kexin Fu, Farzan Farnia

Langevin Dynamics (LD), which aims to sample from a probability distribution using its score function, has been widely used for analyzing and developing score-based generative modeling algorithms. While the convergence behavior of LD in sampling from a uni-modal distribution has been extensively studied in the literature, the analysis of LD under a mixture distribution with distinct modes remains underexplored. In this work, we analyze LD in sampling from a mixture distribution and theoretically study its convergence properties. Our theoretical results indicate that for general mixture distributions of sub-Gaussian components, LD could fail to find all the components within a sub-exponential number of steps in the data dimension. Following our result on the complexity of LD in sampling from high-dimensional variables, we propose Chained Langevin Dynamics (Chained-LD), which divides the data vector into patches of smaller sizes and generates every patch sequentially conditioned on the previous patches. Our theoretical analysis of Chained-LD indicates that it converges faster to the components of a mixture distribution. We present the results of several numerical experiments on synthetic and real image datasets, validating our theoretical results on the iteration complexities of sample generation from mixture distributions using the vanilla and chained LD algorithms.
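For reference, vanilla Langevin dynamics with a known score function can be sketched as below. This is a generic illustration of the unadjusted Langevin update, not the paper's experimental code: the function names, step size, and the standard-Gaussian target are assumptions chosen to keep the example small, and the chained variant is not shown.

```python
import numpy as np

def langevin_sample(score, x0, step, n_steps, rng):
    """Unadjusted Langevin dynamics: x <- x + step * score(x) + sqrt(2 * step) * noise."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x + step * score(x) + np.sqrt(2.0 * step) * noise
    return x

def gaussian_score(x):
    """Score (gradient of the log-density) of a standard normal distribution."""
    return -x

rng = np.random.default_rng(0)
# 2000 independent one-dimensional chains, all started away from the mode.
samples = langevin_sample(gaussian_score, x0=np.full(2000, 3.0),
                          step=0.01, n_steps=2000, rng=rng)
```

With a uni-modal target such as this one the chains mix quickly. The hardness result in the abstract concerns mixtures with distinct modes, where the same update may stay near the component in which it was initialized.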



Article 692

Title@2025-06-10 (2): Learning surrogate equations for the analysis of an agent-based cancer model

Title: Learning surrogate equations for the analysis of an agent-based cancer model Erlernen von Surrogatgleichungen für die Analyse eines agentenbasierten Krebsmodells 用于分析基于制剂的癌症模式的学习替代方程 2503.01718v2

Authors (5): Kevin Burrage, Pamela M. Burrage, Justin N. Kreikemeyer, Adelinde M. Uhrmacher, Hasitha N. Weerasinghe

In this paper, we adapt a two-species agent-based cancer model that describes the interaction between cancer cells and healthy cells on a uniform grid to include the interaction with a third species, namely immune cells. We run six different scenarios to explore how the competition between cancer and immune cells and the initial concentration of the immune cells affect cancer dynamics. We then use coupled equation learning to construct a population-based reaction model for each scenario. We show how they can be unified into a single surrogate population-based reaction model, whose underlying three coupled ordinary differential equations are much easier to analyse than the original agent-based model. As an example, by finding the single steady state of the cancer concentration, we are able to find a linear relationship between this concentration and the initial concentration of the immune cells. This then enables us to estimate suitable values for the competition and initial concentration to reduce the cancer substantially without performing additional complex and expensive simulations from an agent-based stochastic model.



Article 693

Title@2025-06-10 (2): Agile Reinforcement Learning for Real-Time Task Scheduling in Edge Computing

Title: Agile Reinforcement Learning for Real-Time Task Scheduling in Edge Computing Agiles Verstärkungslernen für Echtzeit-Aufgabenplanung im Edge Computing 边边计算中实时任务排期 2506.08850v1

Authors (3): Amin Avan, Akramul Azim, Qusay Mahmoud

Soft real-time applications are becoming increasingly complex, posing significant challenges for scheduling offloaded tasks in edge computing environments while meeting task timing constraints. Moreover, the exponential growth of the search space, presence of multiple objectives and parameters, and highly dynamic nature of edge computing environments further exacerbate the complexity of task scheduling. As a result, schedulers based on heuristic and metaheuristic algorithms frequently encounter difficulties in generating optimal or near-optimal task schedules due to their constrained ability to adapt to the dynamic conditions and complex environmental characteristics of edge computing. Accordingly, reinforcement learning algorithms have been incorporated into schedulers to address the complexity and dynamic conditions inherent in task scheduling in edge computing. However, a significant limitation of reinforcement learning algorithms is the prolonged learning time required to adapt to new environments and to address medium- and large-scale problems. This challenge arises from the extensive global action space and frequent random exploration of irrelevant actions. Therefore, this study proposes Agile Reinforcement learning (aRL), in which the RL-agent performs informed exploration and executes only relevant actions. Consequently, the predictability of the RL-agent is enhanced, leading to rapid adaptation and convergence, which positions aRL as a suitable candidate for scheduling the tasks of soft real-time applications in edge computing. The experiments demonstrate that the combination of informed exploration and action-masking methods enables aRL to achieve a higher hit-ratio and converge faster than the baseline approaches.
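Action masking, one of the two mechanisms credited in the abstract, is commonly implemented by removing irrelevant actions from the policy distribution before sampling. The sketch below shows that generic technique under stated assumptions; the function name `masked_action_probs` and the toy logits are hypothetical, and the abstract does not describe how aRL decides which actions are relevant.

```python
import numpy as np

def masked_action_probs(logits, valid_mask):
    """Softmax over valid actions only; masked actions get probability zero."""
    masked = np.where(valid_mask, logits, -np.inf)
    masked = masked - masked.max()   # numerical stability; needs at least one valid action
    exp = np.exp(masked)             # exp(-inf) evaluates to 0.0
    return exp / exp.sum()

# Toy example: four candidate edge servers, two of which cannot meet the deadline.
logits = np.array([1.0, 2.0, 0.5, 3.0])
valid = np.array([True, False, True, False])
probs = masked_action_probs(logits, valid)
```

Because masked actions can never be sampled, random exploration is confined to the relevant part of the action space.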



Article 694

Title@2025-06-10 (2): IMAGIC-500: IMputation benchmark on A Generative Imaginary Country (500k samples)

Title: IMAGIC-500: IMputation benchmark on A Generative Imaginary Country (500k samples) IMAGIC-500: Imputations-Benchmark für ein generatives Imaginary Country (500k-Proben) IMAGIC-500: 创造想象力国家(500公里样本)的估测基准 2506.08844v1

Authors (6): Siyi Sun, David Antony Selby, Yunchuan Huang, Sebastian Vollmer, Seth Flaxman, Anisoara Calinescu

Missing data imputation in tabular datasets remains a pivotal challenge in data science and machine learning, particularly within socioeconomic research. However, real-world socioeconomic datasets are typically subject to strict data protection protocols, which often prohibit public sharing, even for synthetic derivatives. This severely limits the reproducibility and accessibility of benchmark studies in such settings. Further, there are very few publicly available synthetic datasets. Thus, there is limited availability of benchmarks for systematic evaluation of imputation methods on socioeconomic datasets, whether real or synthetic. In this study, we utilize the World Bank’s publicly available synthetic dataset, Synthetic Data for an Imaginary Country, which closely mimics a real World Bank household survey while being fully public, enabling broad access for methodological research. With this as a starting point, we derive the IMAGIC-500 dataset: we select a subset of 500k individuals across approximately 100k households with 19 socioeconomic features, designed to reflect the hierarchical structure of real-world household surveys. This paper introduces a comprehensive missing data imputation benchmark on IMAGIC-500 under various missingness mechanisms (MCAR, MAR, MNAR) and missingness ratios (10%, 20%, 30%, 40%, 50%). Our evaluation considers the imputation accuracy for continuous and categorical variables, computational efficiency, and impact on downstream predictive tasks, such as estimating educational attainment at the individual level. The results highlight the strengths and weaknesses of statistical, traditional machine learning, and deep learning imputation techniques, including recent diffusion-based methods. The IMAGIC-500 dataset and benchmark aim to facilitate the development of robust imputation algorithms and foster reproducible social science research.
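The three missingness mechanisms can be illustrated with simple mask generators. This is a sketch of common textbook simulations, not the benchmark's actual procedure: the function names and the "hide the largest values" rule for MAR and MNAR are assumptions, since the abstract does not specify how masks are generated.

```python
import numpy as np

def mcar_mask(shape, ratio, rng):
    """MCAR: every cell is equally likely to be missing, independent of the data."""
    n = int(np.prod(shape))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(round(ratio * n)), replace=False)] = True
    return mask.reshape(shape)

def mar_mask(driver, ratio):
    """MAR: hide the target column where a fully observed driver column is largest."""
    n_missing = int(round(ratio * len(driver)))
    mask = np.zeros(len(driver), dtype=bool)
    mask[np.argsort(driver)[len(driver) - n_missing:]] = True
    return mask

def mnar_mask(target, ratio):
    """MNAR: missingness depends on the hidden value itself (largest values vanish)."""
    return mar_mask(target, ratio)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
mcar = mcar_mask(X.shape, 0.2, rng)   # applies to all cells
mar = mar_mask(X[:, 0], 0.3)          # column 1 goes missing based on column 0
mnar = mnar_mask(X[:, 2], 0.3)        # column 2 goes missing based on its own values

X_missing = X.copy()
X_missing[mar, 1] = np.nan
X_missing[mnar, 2] = np.nan
```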



Article 695

Title@2025-06-10 (2): Quantum Architecture Search with Unsupervised Representation Learning

Title: Quantum Architecture Search with Unsupervised Representation Learning Quantenarchitektur Suche mit unüberwachtem Repräsentationslernen 以无人监督的代表制学习方式搜索量子结构 2401.11576v5

Authors (4): Yize Sun, Zixin Wu, Yunpu Ma, Volker Tresp

Unsupervised representation learning presents new opportunities for advancing Quantum Architecture Search (QAS) on Noisy Intermediate-Scale Quantum (NISQ) devices. QAS is designed to optimize quantum circuits for Variational Quantum Algorithms (VQAs). Most QAS algorithms tightly couple the search space and search algorithm, typically requiring the evaluation of numerous quantum circuits, resulting in high computational costs and limiting scalability to larger quantum circuits. Predictor-based QAS algorithms mitigate this issue by estimating circuit performance based on structure or embedding. However, these methods often demand time-intensive labeling to optimize gate parameters across many circuits, which is crucial for training accurate predictors. Inspired by the classical neural architecture search algorithm Arch2vec, we investigate the potential of unsupervised representation learning for QAS without relying on predictors. Our framework decouples unsupervised architecture representation learning from the search process, enabling the learned representations to be applied across various downstream tasks. Additionally, it integrates an improved quantum circuit graph encoding scheme, addressing the limitations of existing representations and enhancing search efficiency. This predictor-free approach removes the need for large labeled datasets. During the search, we employ REINFORCE and Bayesian Optimization to explore the latent representation space and compare their performance against baseline methods. We further validate our approach by executing the best-discovered MaxCut circuits on IBM’s ibm_sherbrooke quantum processor, confirming that the architectures retain optimal performance even under real hardware noise. Our results demonstrate that the framework efficiently identifies high-performing quantum circuits with fewer search iterations.



Article 696

Title@2025-06-10 (2): Robot Pouring: Identifying Causes of Spillage and Selecting Alternative Action Parameters Using Probabilistic Actual Causation

Title: Robot Pouring: Identifying Causes of Spillage and Selecting Alternative Action Parameters Using Probabilistic Actual Causation Robot-Pouring: Identifizierung von Ursachen von Spillage und Auswahl alternativer Aktionsparameter mittels probabilistischer tatsächlicher Schadensursachen 机器人定位:利用概率实际原因查明螺旋原因和选择替代行动参数 2502.09395v3

Authors (5): Jaime Maldonado, Jonas Krumme, Christoph Zetzsche, Vanessa Didelez, Kerstin Schill

In everyday life, we perform tasks (e.g., cooking or cleaning) that involve a large variety of objects and goals. When confronted with an unexpected or unwanted outcome, we take corrective actions and try again until achieving the desired result. The reasoning performed to identify a cause of the observed outcome and to select an appropriate corrective action is a crucial aspect of human reasoning for successful task execution. Central to this reasoning is the assumption that a factor is responsible for producing the observed outcome. In this paper, we investigate the use of probabilistic actual causation to determine whether a factor is the cause of an observed undesired outcome. Furthermore, we show how the actual causation probabilities can be used to find alternative actions to change the outcome. We apply the probabilistic actual causation analysis to a robot pouring task. When spillage occurs, the analysis indicates whether a task parameter is the cause and how it should be changed to avoid spillage. The analysis requires a causal graph of the task and the corresponding conditional probability distributions. To fulfill these requirements, we perform a complete causal modeling procedure (i.e., task analysis, definition of variables, determination of the causal graph structure, and estimation of conditional probability distributions) using data from a realistic simulation of the robot pouring task, covering a large combinatorial space of task parameters. Based on the results, we discuss the implications of the variables’ representation and how the alternative actions suggested by the actual causation analysis would compare to the alternative solutions proposed by a human observer. The practical use of the analysis of probabilistic actual causation to select alternative action parameters is demonstrated.



Article 697

Title@2025-06-10 (2): Nearly Optimal Differentially Private ReLU Regression

Title: Nearly Optimal Differentially Private ReLU Regression Fast optimal unterschiedliche private ReLU-Regression 接近最佳差异化私自 ReLU 回归 2503.06009v2

Authors (6): Meng Ding, Mingxi Lei, Shaowei Wang, Tianhang Zheng, Di Wang, Jinhui Xu

In this paper, we investigate one of the most fundamental nonconvex learning problems, ReLU regression, in the Differential Privacy (DP) model. Previous studies on private ReLU regression heavily rely on stringent assumptions, such as constant bounded norms for feature vectors and labels. We relax these assumptions to a more standard setting, where data can be sampled i.i.d. from $O(1)$-sub-Gaussian distributions. We first show that when $\varepsilon = \tilde{O}(\sqrt{\frac{1}{N}})$ and there is some public data, it is possible to achieve an upper bound of $\tilde{O}(\frac{d^2}{N^2 \varepsilon^2})$ for the excess population risk in $(\varepsilon, \delta)$-DP, where $d$ is the dimension and $N$ is the number of data samples. Moreover, we relax the requirement on $\varepsilon$ and public data by proposing and analyzing a one-pass mini-batch Generalized Linear Model Perceptron algorithm (DP-MBGLMtron). Additionally, using the tracing attack argument technique, we demonstrate that the minimax rate of the estimation error for $(\varepsilon, \delta)$-DP algorithms is lower bounded by $\Omega(\frac{d^2}{N^2 \varepsilon^2})$. This shows that DP-MBGLMtron achieves the optimal utility bound up to logarithmic factors. Experiments further support our theoretical results.



Article 698

Title@2025-06-10 (2): Relational decomposition for program synthesis

Title: Relational decomposition for program synthesis Verhältnismäßige Zersetzung für die Programmsynthese 用于方案合成的通货膨胀分解 2408.12212v3

Authors (2): Céline Hocquette, Andrew Cropper

We introduce a relational approach to program synthesis. The key idea is to decompose synthesis tasks into simpler relational synthesis subtasks. Specifically, our representation decomposes a training input-output example into sets of input and output facts respectively. We then learn relations between the input and output facts. We demonstrate our approach using an off-the-shelf inductive logic programming (ILP) system on four challenging synthesis datasets. Our results show that (i) our representation can outperform a standard one, and (ii) an off-the-shelf ILP system with our representation can outperform domain-specific approaches.



Article 699

Title@2025-06-10 (2): Stepsize anything: A unified learning rate schedule for budgeted-iteration training

Title: Stepsize anything: A unified learning rate schedule for budgeted-iteration training Stepsize everything: Eine einheitliche Lernrate Zeitplan für budgetierte iteration Ausbildung 采取任何步骤:为列入预算的就地培训统一学习率时间表 2505.24452v2

Authors (5): Anda Tang, Yiming Dong, Yutao Zeng, zhou Xun, Zhouchen Lin

The expanding computational costs and limited resources underscore the critical need for budgeted-iteration training, which aims to achieve optimal learning within predetermined iteration budgets. While learning rate schedules fundamentally govern the performance of different networks and tasks, particularly in budgeted-iteration scenarios, their design remains largely heuristic, lacking theoretical foundations. In addition, finding the optimal learning rate schedule requires extensive trial-and-error selection, making the training process inefficient. In this work, we propose the Unified Budget-Aware (UBA) schedule, a theoretically grounded learning rate schedule that consistently outperforms commonly used schedules across diverse architectures and tasks under different constrained training budgets. First, we bridge the gap by constructing a novel training budget-aware optimization framework, which explicitly accounts for robustness to landscape curvature variations. From this framework, we derive the UBA schedule, controlled by a single hyper-parameter $\varphi$ that provides a trade-off between flexibility and simplicity, eliminating the need for per-network numerical optimization. Moreover, we establish a theoretical connection between $\varphi$ and the condition number, adding interpretation and justification to our approach. We also prove convergence for different values of $\varphi$ and offer practical guidelines for its selection via theoretical analysis and empirical results. Extensive experimental results show that UBA consistently surpasses the commonly used schedules across diverse vision and language tasks, spanning network architectures (e.g., ResNet, OLMo) and scales, under different training-iteration budgets.



Article 700

Title@2025-06-10 (2): EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations

Title: EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations EquivaMap: LLM-Leveraging für automatische Äquivalenzprüfung von Optimierungsformulierungen Equivamap: 利用LMLMs对优化配方进行自动等效检查 2502.14760v2

Authors (4): Haotian Zhai, Connor Lawless, Ellen Vitercik, Liu Leqi

A fundamental problem in combinatorial optimization is identifying equivalent formulations. Despite the growing need for automated equivalence checks – driven, for example, by optimization copilots, which generate problem formulations from natural language descriptions – current approaches rely on simple heuristics that fail to reliably check formulation equivalence. Inspired by Karp reductions, in this work we introduce Quasi-Karp equivalence, a formal criterion for determining when two optimization formulations are equivalent based on the existence of a mapping between their decision variables. We propose EquivaMap, a framework that leverages large language models to automatically discover such mappings for scalable, reliable equivalence checking, with a verification stage that ensures mapped solutions preserve feasibility and optimality without additional solver calls. To evaluate our approach, we construct EquivaFormulation, the first open-source dataset of equivalent optimization formulations, generated by applying transformations such as adding slack variables or valid inequalities to existing formulations. Empirically, EquivaMap significantly outperforms existing methods, achieving substantial improvements in correctly identifying formulation equivalence.



Article 701

Title@2025-06-10 (2): High-Dimensional Independence Testing via Maximum and Average Distance Correlations

Title: High-Dimensional Independence Testing via Maximum and Average Distance Correlations High-Dimensional Independence Testing über maximale und mittlere Distanzkorrelationen 通过最大和平均距离交错进行高度多功能独立测试 2001.01095v4

Authors (2): Cencheng Shen, Yuexiao Dong

This paper investigates the utilization of maximum and average distance correlations for multivariate independence testing. We characterize their consistency properties in high-dimensional settings with respect to the number of marginally dependent dimensions, compare the advantages of each test statistic, examine their respective null distributions, and present a fast chi-square-based testing procedure. The resulting tests are non-parametric and applicable to both Euclidean distance and the Gaussian kernel as the underlying metric. To better understand the practical use cases of the proposed tests, we evaluate the empirical performance of the maximum distance correlation, average distance correlation, and the original distance correlation across various multivariate dependence scenarios, as well as conduct a real data experiment to test the presence of various cancer types and peptide levels in human plasma.



Article 702

Title@2025-06-10 (2): Activated LoRA: Fine-tuned LLMs for Intrinsics

Title: Activated LoRA: Fine-tuned LLMs for Intrinsics Aktivierte LoRA: Feinjustierte LLMs für die Intrinsik 活性LORA: 用于内分泌的微调LMLM 2504.12397v4

Authors (9): Kristjan Greenewald, Luis Lastras, Thomas Parnell, Vraj Shah, Lucian Popa, Giulio Zizzo, Chulaka Gunasekara, Ambrish Rawat, David Cox

Low-Rank Adaptation (LoRA) has emerged as a highly efficient framework for finetuning the weights of large foundation models, and has become the go-to method for data-driven customization of LLMs. Despite the promise of highly customized behaviors and capabilities, switching between relevant LoRAs in a multiturn setting is inefficient, as the key-value (KV) cache of the entire turn history must be recomputed with the LoRA weights before generation can begin. To address this problem, we propose Activated LoRA (aLoRA), an adapter architecture which modifies the LoRA framework to only adapt weights for the tokens in the sequence after the aLoRA is invoked. This change crucially allows aLoRA to accept the base model’s KV cache of the input string, meaning that aLoRA can be instantly activated whenever needed in a chain without recomputing the cache. This enables building what we call intrinsics, i.e. specialized models invoked to perform well-defined operations on portions of an input chain or conversation that otherwise uses the base model by default. We train a set of aLoRA-based intrinsics models, demonstrating competitive accuracy with standard LoRA while achieving significant inference benefits. The codebase is at https://github.com/IBM/activated-lora.
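The core idea, adapting weights only for tokens after the invocation point, can be sketched on a single linear projection. This is a hedged illustration rather than the released implementation: the function name `alora_projection`, the argument `invoke_at`, and the shapes are assumptions made for the example.

```python
import numpy as np

def alora_projection(x, w, a, b, invoke_at):
    # x: (seq_len, d_in) token activations; w: (d_in, d_out) frozen base weight.
    # a: (d_in, r) and b: (r, d_out) are the low-rank adapter factors.
    base = x @ w
    # Hypothetical mask: the adapter is active only from position invoke_at onward.
    active = (np.arange(x.shape[0]) >= invoke_at)[:, None]
    return base + active * (x @ a @ b)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
w = rng.normal(size=(8, 8))
a = rng.normal(size=(8, 2))
b = rng.normal(size=(2, 8))
out = alora_projection(x, w, a, b, invoke_at=3)
```

Because the rows before `invoke_at` are identical to the base model's output, keys and values computed from them match what the base model would have cached, which is why no recomputation is needed in this picture.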



Article 703

Title@2025-06-10 (2): Spectral invariance and maximality properties of the frequency spectrum of quantum neural networks

Title: Spectral invariance and maximality properties of the frequency spectrum of quantum neural networks Spektrale Invarianz und maximale Eigenschaften des Frequenzspektrums quantenneuraler Netzwerke 量子神经网络频谱的光谱性和最大度性 2402.14515v3

Authors (2): Patrick Holzer, Ivica Turkalj

Quantum Neural Networks (QNNs) are a popular approach in Quantum Machine Learning. We analyze their frequency spectrum using the Minkowski sum for sets and the set of differences, which makes it particularly easy to express and calculate the frequency spectrum algebraically, and prove different maximality results for a large class of models. Furthermore, we prove that under some mild conditions there exists a bijection between classes of models with the same area $A:=R\cdot L$ that preserves the frequency spectrum, where $R$ denotes the number of qubits and $L$ the number of layers, which we consequently call spectral invariance under area-preserving transformations. With this we explain the symmetry in $R$ and $L$ in the results often observed in the literature and show that the maximal frequency spectrum depends only on the area $A=RL$ and not on the individual values of $R$ and $L$. Moreover, we collect and extend existing results and specify the maximum possible frequency spectrum of a QNN with arbitrarily many layers as a function of the spectrum of its generators. In the case of arbitrary dimensional generators, where our two introduced notions of maximality differ, we extend existing results based on the so-called Golomb ruler and introduce a second novel approach based on a variation of the turnpike problem, which we call the relaxed turnpike problem.
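The set-based description can be made concrete with a small sketch. It assumes the common setting in which every encoding gate uses the same generator and the spectrum is the Minkowski sum of one difference set per gate, with $R \cdot L$ gates in total; the function names and this simplified model are assumptions for illustration, not the paper's general construction.

```python
from functools import reduce

def difference_set(eigenvalues):
    # All pairwise differences of the generator's eigenvalues.
    return {a - b for a in eigenvalues for b in eigenvalues}

def minkowski_sum(s, t):
    # Set of all sums of one element from s and one from t.
    return {a + b for a in s for b in t}

def frequency_spectrum(eigenvalues, n_qubits, n_layers):
    # Assumption: one identical generator per qubit per layer (R * L gates).
    diff = difference_set(eigenvalues)
    return reduce(minkowski_sum, [diff] * (n_qubits * n_layers))

pauli_like = (-0.5, 0.5)  # eigenvalues of a Pauli-type generator such as Z/2
spec_2x3 = frequency_spectrum(pauli_like, n_qubits=2, n_layers=3)
spec_3x2 = frequency_spectrum(pauli_like, n_qubits=3, n_layers=2)
```

In this simplified model both configurations have area $A = 6$ and produce the same integer spectrum from $-6$ to $6$, which illustrates the dependence on $RL$ rather than on $R$ and $L$ separately.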



Article 704

Title@2025-06-10 (2): Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Title: Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO Eintauchen in RL für Bildgenerierung mit CoT: Eine Studie über DPO vs. GRPO 与COT一起为图像生成而跳入RL:关于DPO与GROP公司的研究 2505.17017v2

Authors (8): Chengzhuo Tong, Ziyu Guo, Renrui Zhang, Wenyu Shan, Xinyu Wei, Zhenghao Xing, Hongsheng Li, Pheng-Ann Heng

Recent advancements underscore the significant role of Reinforcement Learning (RL) in enhancing the Chain-of-Thought (CoT) reasoning capabilities of large language models (LLMs). Two prominent RL algorithms, Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO), are central to these developments, showcasing different pros and cons. Autoregressive image generation, also interpretable as a sequential CoT reasoning process, presents unique challenges distinct from LLM-based CoT reasoning. These encompass ensuring text-image consistency, improving image aesthetic quality, and designing sophisticated reward models, rather than relying on simpler rule-based rewards. While recent efforts have extended RL to this domain, these explorations typically lack an in-depth analysis of the domain-specific challenges and the characteristics of different RL strategies. To bridge this gap, we provide the first comprehensive investigation of the GRPO and DPO algorithms in autoregressive image generation, evaluating their in-domain performance and out-of-domain generalization, while scrutinizing the impact of different reward models on their respective capabilities. Our findings reveal that GRPO and DPO exhibit distinct advantages, and crucially, that reward models possessing stronger intrinsic generalization capabilities potentially enhance the generalization potential of the applied RL algorithms. Furthermore, we systematically explore three prevalent scaling strategies to enhance both their in-domain and out-of-domain proficiency, deriving unique insights into efficiently scaling performance for each paradigm. We hope our study paves a new path for inspiring future work on developing more effective RL algorithms to achieve robust CoT reasoning in the realm of autoregressive image generation. Code is released at https://github.com/ZiyuGuo99/Image-Generation-CoT



Article 705

Title@2025-06-10 (2): In Praise of Stubbornness: An Empirical Case for Cognitive-Dissonance Aware Continual Update of Knowledge in LLMs

Title: In Praise of Stubbornness: An Empirical Case for Cognitive-Dissonance Aware Continual Update of Knowledge in LLMs In Lob der Stubbornness: Ein empirischer Fall für Kognitiv-Dissonanz Aware Continual Update of Knowledge in LLMs 赞美固态:关于知识在LLM中不断更新的认知与不相识经验案例 2502.04390v2

Authors (6): Simone Clemente, Zied Ben Houidi, Alexis Huet, Dario Rossi, Giulio Franzese, Pietro Michiardi

Through systematic empirical investigation, we uncover a fundamental and concerning property of Large Language Models: while they can safely learn facts that don’t contradict their knowledge, attempting to update facts with contradictory information triggers catastrophic corruption of unrelated knowledge. Unlike humans, who naturally resist contradictory information, these models indiscriminately accept contradictions, leading to devastating interference, destroying up to 80% of unrelated knowledge even when learning as few as 10-100 contradicting facts. To understand whether this interference could be mitigated through selective plasticity, we experiment with targeted network updates, distinguishing between previously used (stubborn) and rarely used (plastic) neurons. We uncover another asymmetry: while sparing frequently-used neurons significantly improves retention of existing knowledge for non-contradictory updates (98% vs 93% with standard updates), contradictory updates trigger catastrophic interference regardless of targeting strategy. This effect, which persists across tested model scales (GPT-2 to GPT-J-6B), suggests a fundamental limitation in how neural networks handle contradictions. Finally, we demonstrate that contradictory information can be reliably detected (95%+ accuracy) using simple model features, offering a potential protective mechanism. These findings motivate new architectures that can, like humans, naturally resist contradictions rather than allowing destructive overwrites.



Article 706

Title@2025-06-10 (2): Improving the Noise Estimation of Latent Neural Stochastic Differential Equations

Title: Improving the Noise Estimation of Latent Neural Stochastic Differential Equations Verbesserung der Lärmschätzung latenter neuraler stochastischer Differentialgleichungen 改进对内心神经斯托卡式差异的噪音估计 2412.17499v2

Authors (4): Linus Heck, Maximilian Gelbrecht, Michael T. Schaub, Niklas Boers

Latent neural stochastic differential equations (SDEs) have recently emerged as a promising approach for learning generative models from stochastic time series data. However, they systematically underestimate the noise level inherent in such data, limiting their ability to capture stochastic dynamics accurately. We investigate this underestimation in detail and propose a straightforward solution: by including an explicit additional noise regularization in the loss function, we are able to learn a model that accurately captures the diffusion component of the data. We demonstrate our results on a conceptual model system that highlights the improved latent neural SDE’s capability to model stochastic bistable dynamics.



Article 707

Title@2025-06-10 (2): Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval

Title: Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval Optimierte Text-Embedding-Modelle und Benchmarks für die Amharische Passage Retrieval 阿姆光通过通过检索的最佳文本嵌入模型和基准 2505.19356v2

Authors (3): Kidist Amde Mekonnen, Yosef Worku Alemneh, Maarten de Rijke

Neural retrieval methods using transformer-based pre-trained language models have advanced multilingual and cross-lingual retrieval. However, their effectiveness for low-resource, morphologically rich languages such as Amharic remains underexplored due to data scarcity and suboptimal tokenization. We address this gap by introducing Amharic-specific dense retrieval models based on pre-trained Amharic BERT and RoBERTa backbones. Our proposed RoBERTa-Base-Amharic-Embed model (110M parameters) achieves a 17.6% relative improvement in MRR@10 and a 9.86% gain in Recall@10 over the strongest multilingual baseline, Arctic Embed 2.0 (568M parameters). More compact variants, such as RoBERTa-Medium-Amharic-Embed (42M), remain competitive while being over 13x smaller. Additionally, we train a ColBERT-based late interaction retrieval model that achieves the highest MRR@10 score (0.843) among all evaluated models. We benchmark our proposed models against both sparse and dense retrieval baselines to systematically assess retrieval effectiveness in Amharic. Our analysis highlights key challenges in low-resource settings and underscores the importance of language-specific adaptation. To foster future research in low-resource IR, we publicly release our dataset, codebase, and trained models at https://github.com/kidist-amde/amharic-ir-benchmarks.
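The two reported metrics, MRR@10 and Recall@10, follow standard definitions that can be sketched in a few lines. The document IDs and the two toy queries below are made up for illustration and are not taken from the released benchmark.

```python
def mrr_at_k(ranked_ids, relevant_ids, k=10):
    # Reciprocal rank of the first relevant document within the top k, else 0.
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k=10):
    # Fraction of the relevant documents that appear within the top k.
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def mean_over_queries(metric, runs, k=10):
    # runs: list of (ranked_ids, relevant_ids) pairs, one per query.
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)

# Hypothetical toy run: the first query hits at rank 2, the second misses.
runs = [(["d3", "d1", "d2"], {"d1"}), (["d5", "d6"], {"d9"})]
mrr = mean_over_queries(mrr_at_k, runs)
recall = mean_over_queries(recall_at_k, runs)
```

On the size comparison quoted in the abstract, 568M / 42M is roughly 13.5, which matches the "over 13x smaller" figure.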



Article 708

Title@2025-06-10 (2): syren-baryon: Analytic emulators for the impact of baryons on the matter power spectrum

Title: syren-baryon: Analytic emulators for the impact of baryons on the matter power spectrum Syren-Baryon: Analytische Emulatoren für den Einfluss von Baryonen auf das Materie-Leistungsspektrum syren-baryon: 用于分析男爵对物质能量频谱影响的分析模拟器 2506.08783v1

Authors (5): Lukas Kammerer, Deaglan J. Bartlett, Gabriel Kronberger, Harry Desmond, Pedro G. Ferreira

Baryonic physics has a considerable impact on the distribution of matter in our Universe on scales probed by current and future cosmological surveys, acting as a key systematic in such analyses. We seek simple symbolic parametrisations for the impact of baryonic physics on the matter power spectrum for a range of physically motivated models, as a function of wavenumber, redshift, cosmology, and parameters controlling the baryonic feedback. We use symbolic regression to construct analytic approximations for the ratio of the matter power spectrum in the presence of baryons to that without such effects. We obtain separate functions for each of four distinct sub-grid prescriptions of baryonic physics from the CAMELS suite of hydrodynamical simulations (Astrid, IllustrisTNG, SIMBA and Swift-EAGLE) as well as for a baryonification algorithm. We also provide functions which describe the uncertainty on these predictions, due to both the stochastic nature of baryonic physics and the errors on our fits. The error on our approximations to the hydrodynamical simulations is comparable to the sample variance estimated through varying initial conditions, and our baryonification expression has a root mean squared error of better than one percent, although this increases on small scales. These errors are comparable to those of previous numerical emulators for these models. Our expressions are enforced to have the physically correct behaviour on large scales and at high redshift. Due to their analytic form, we are able to directly interpret the impact of varying cosmology and feedback parameters, and we can identify parameters which have little to no effect. Each function is based on a different implementation of baryonic physics, and can therefore be used to discriminate between these models when applied to real data. We provide publicly available code for all symbolic approximations found.



Article 709

Title@2025-06-10 (2): LT-PINN: Lagrangian Topology-conscious Physics-informed Neural Network for Boundary-focused Engineering Optimization

Title: LT-PINN: Lagrangian Topology-conscious Physics-informed Neural Network for Boundary-focused Engineering Optimization LT-PINN: Lagrangian Topologie-bewusste physik-informierte Neuronales Netzwerk für boundary-focused Engineering Optimization LT-PINN:Lagrangian 地形 – – 具有意识的地形 – – 物理意识 – – 以边界为重点的工程优化知情神经网络 2506.06300v2

Authors (5): Yuanye Zhou, Zhaokun Wang, Kai Zhou, Hui Tang, Xiaofan Li

Physics-informed neural networks (PINNs) have emerged as a powerful meshless tool for topology optimization, capable of simultaneously determining optimal topologies and physical solutions. However, conventional PINNs rely on density-based topology descriptions, which necessitate manual interpolation and limit their applicability to complex geometries. To address this, we propose Lagrangian topology-conscious PINNs (LT-PINNs), a novel framework for boundary-focused engineering optimization. By parameterizing the control variables of topology boundary curves as learnable parameters, LT-PINNs eliminate the need for manual interpolation and enable precise boundary determination. We further introduce a specialized boundary condition loss function and a topology loss function to ensure sharp and accurate boundary representations, even for intricate topologies. The accuracy and robustness of LT-PINNs are validated via two types of partial differential equations (PDEs), including the elastic equation with Dirichlet boundary conditions and Laplace’s equation with Neumann boundary conditions. Furthermore, we demonstrate the effectiveness of LT-PINNs on more complex time-dependent and time-independent flow problems without relying on measurement data, and showcase their engineering application potential in flow velocity rearrangement, transforming a uniform upstream velocity into a sine-shaped downstream profile. The results demonstrate that (1) LT-PINNs achieve substantial reductions in relative L2 errors compared with the state-of-the-art density topology-oriented PINNs (DT-PINNs), (2) LT-PINNs can handle arbitrary boundary conditions, making them suitable for a wide range of PDEs, and (3) LT-PINNs can infer clear topology boundaries without manual interpolation, especially for complex topologies.



Article 710

Title@2025-06-10 (2): Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models

Title: Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models Landsat-Bench: Datensätze und Benchmarks für Landsat-Stiftungsmodelle Landsat-Bench:Landsat基金会模型数据集和基准 2506.08780v1

Authors (3): Isaac Corley, Lakshay Sharma, Ruth Crasto

The Landsat program offers over 50 years of globally consistent Earth imagery. However, the lack of benchmarks for this data constrains progress towards Landsat-based Geospatial Foundation Models (GFM). In this paper, we introduce Landsat-Bench, a suite of three benchmarks with Landsat imagery adapted from existing remote sensing datasets – EuroSAT-L, BigEarthNet-L, and LC100-L. We establish baselines and standardized evaluation methods across both common architectures and Landsat foundation models pretrained on the SSL4EO-L dataset. Notably, we provide evidence that SSL4EO-L pretrained GFMs extract better representations for downstream tasks in comparison to ImageNet, including performance gains of +4% OA and +5.1% mAP on EuroSAT-L and BigEarthNet-L.



Article 711

Title@2025-06-10 (2): Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch

Title: Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch Silencing Empowerment, Allowing Bigotry: Auditierung der Moderation von Hass Rede auf Twitch 静默赋权,允许偏见:审计对抽搐者仇恨言论的温和程度 2506.07667v2

Authors (6): Prarabdh Shukla, Wei Yin Chong, Yash Patel, Brennan Schaffner, Danish Pruthi, Arjun Bhagoji

To meet the demands of content moderation, online platforms have resorted to automated systems. Newer forms of real-time engagement ($\textit{e.g.}$, users commenting on live streams) on platforms like Twitch exert additional pressures on the latency expected of such moderation systems. Despite their prevalence, relatively little is known about the effectiveness of these systems. In this paper, we conduct an audit of Twitch’s automated moderation tool ($\texttt{AutoMod}$) to investigate its effectiveness in flagging hateful content. For our audit, we create streaming accounts to act as siloed test beds, and interface with the live chat using Twitch’s APIs to send over $107,000$ comments collated from $4$ datasets. We measure $\texttt{AutoMod}$’s accuracy in flagging blatantly hateful content containing misogyny, racism, ableism and homophobia. Our experiments reveal that a large fraction of hateful messages, up to $94\%$ on some datasets, $\textit{bypass moderation}$. Contextual addition of slurs to these messages results in $100\%$ removal, revealing $\texttt{AutoMod}$’s reliance on slurs as a moderation signal. We also find that contrary to Twitch’s community guidelines, $\texttt{AutoMod}$ blocks up to $89.5\%$ of benign examples that use sensitive words in pedagogical or empowering contexts. Overall, our audit points to large gaps in $\texttt{AutoMod}$’s capabilities and underscores the importance for such systems to understand context effectively.



Article 712

Title@2025-06-10 (2): Data-Driven Discovery of PDEs via the Adjoint Method

Title: Data-Driven Discovery of PDEs via the Adjoint Method Datengestützte Entdeckung von PDEs über die Adjoint-Methode 通过 “ 联合方法 “ 发现项目设计书的数据驱动数据 2401.17177v4

Authors (3): Mohsen Sadr, Tony Tohme, Kamal Youcef-Toumi

In this work, we present an adjoint-based method for discovering the underlying governing partial differential equations (PDEs) given data. The idea is to consider a parameterized PDE in a general form and formulate a PDE-constrained optimization problem aimed at minimizing the error of the PDE solution from data. Using variational calculus, we obtain an evolution equation for the Lagrange multipliers (adjoint equations) allowing us to compute the gradient of the objective function with respect to the parameters of PDEs given data in a straightforward manner. In particular, we consider a family of parameterized PDEs encompassing linear, nonlinear, and spatial derivative candidate terms, and elegantly derive the corresponding adjoint equations. We show the efficacy of the proposed approach in identifying the form of the PDE up to machine accuracy, enabling the accurate discovery of PDEs from data. We also compare its performance with the famous PDE Functional Identification of Nonlinear Dynamics method known as PDE-FIND (Rudy et al., 2017), on both smooth and noisy data sets. Even though the proposed adjoint method relies on forward/backward solvers, it outperforms PDE-FIND for large data sets thanks to the analytic expressions for gradients of the cost function with respect to each PDE parameter.



Article 713

Title@2025-06-10 (2): TextAtari: 100K Frames Game Playing with Language Agents

Title: TextAtari: 100K Frames Game Playing with Language Agents TextAtari: 100K Frames Spiel mit Sprachagenten TextAtari: 100K 框架游戏与语言代理游戏 2506.04098v2

Authors (11): Wenhao Li, Wenwu Li, Chuyun Shen, Junjie Sheng, Zixiao Huang, Di Wu, Yun Hua, Wei Yin, Xiangfeng Wang, Hongyuan Zha, Bo Jin

We present TextAtari, a benchmark for evaluating language agents on very long-horizon decision-making tasks spanning up to 100,000 steps. By translating the visual state representations of classic Atari games into rich textual descriptions, TextAtari creates a challenging test bed that bridges sequential decision-making with natural language processing. The benchmark includes nearly 100 distinct tasks with varying complexity, action spaces, and planning horizons, all rendered as text through an unsupervised representation learning framework (AtariARI). We evaluate three open-source large language models (Qwen2.5-7B, Gemma-7B, and Llama3.1-8B) across three agent frameworks (zero-shot, few-shot chain-of-thought, and reflection reasoning) to assess how different forms of prior knowledge affect performance on these long-horizon challenges. Four scenarios (Basic, Obscured, Manual Augmentation, and Reference-based) investigate the impact of semantic understanding, instruction comprehension, and expert demonstrations on agent decision-making. Our results reveal significant performance gaps between language agents and human players in extensive planning tasks, highlighting challenges in sequential reasoning, state tracking, and strategic planning across tens of thousands of steps. TextAtari provides standardized evaluation protocols, baseline implementations, and a framework for advancing research at the intersection of language models and planning. Our code is available at https://github.com/Lww007/Text-Atari-Agents.



Article 714

Title@2025-06-10 (2): Paths to Causality: Finding Informative Subgraphs Within Knowledge Graphs for Knowledge-Based Causal Discovery

Title: Paths to Causality: Finding Informative Subgraphs Within Knowledge Graphs for Knowledge-Based Causal Discovery Pfade zur Kausalität: Informative Subgraphen innerhalb von Wissensgraphen finden für wissensbasierte Kausale Entdeckung 因果路径:在知识型因果发现的知识图中查找信息子集 2506.08771v1

Authors (2): Yuni Susanti, Michael Färber

Inferring causal relationships between variable pairs is crucial for understanding multivariate interactions in complex systems. Knowledge-based causal discovery – which involves inferring causal relationships by reasoning over the metadata of variables (e.g., names or textual context) – offers a compelling alternative to traditional methods that rely on observational data. However, existing methods using Large Language Models (LLMs) often produce unstable and inconsistent results, compromising their reliability for causal inference. To address this, we introduce a novel approach that integrates Knowledge Graphs (KGs) with LLMs to enhance knowledge-based causal discovery. Our approach identifies informative metapath-based subgraphs within KGs and further refines the selection of these subgraphs using Learning-to-Rank-based models. The top-ranked subgraphs are then incorporated into zero-shot prompts, improving the effectiveness of LLMs in inferring the causal relationship. Extensive experiments on biomedical and open-domain datasets demonstrate that our method outperforms most baselines by up to 44.4 points in F1 scores, evaluated across diverse LLMs and KGs. Our code and datasets are available on GitHub: https://github.com/susantiyuni/path-to-causality



Article 715

Title@2025-06-10 (2): CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries

Title: CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries KLARIFY: Kontrastive Präferenz Verstärkungslernen für untangling Ambiguous Queries CLARIFY: 拼凑不清的问询中竞争优惠强化学习 2506.00388v3

Authors (6): Ni Mu, Hao Hu, Xiao Hu, Yiqin Yang, Bo Xu, Qing-Shan Jia

Preference-based reinforcement learning (PbRL) bypasses explicit reward engineering by inferring reward functions from human preference comparisons, enabling better alignment with human intentions. However, humans often struggle to label a clear preference between similar segments, reducing label efficiency and limiting PbRL’s real-world applicability. To address this, we propose an offline PbRL method: Contrastive LeArning for ResolvIng Ambiguous Feedback (CLARIFY), which learns a trajectory embedding space that incorporates preference information, ensuring clearly distinguished segments are spaced apart, thus facilitating the selection of more unambiguous queries. Extensive experiments demonstrate that CLARIFY outperforms baselines in both non-ideal teachers and real human feedback settings. Our approach not only selects more distinguished queries but also learns meaningful trajectory embeddings.



Article 716

Title@2025-06-10 (2): Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting

Title: Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting Kann langsam denkende LLMs Vernunft im Laufe der Zeit? Empirische Studien in Zeitreihen Prognose 时间序列预测的经验研究 2505.24511v2

Authors (3): Jiahao Wang, Mingyue Cheng, Qi Liu

Time series forecasting (TSF) is a fundamental and widely studied task, spanning methods from classical statistical approaches to modern deep learning and multimodal language modeling. Despite their effectiveness, these methods often follow a fast-thinking paradigm emphasizing pattern extraction and direct value mapping, while overlooking explicit reasoning over temporal dynamics and contextual dependencies. Meanwhile, emerging slow-thinking LLMs (e.g., ChatGPT-o1, DeepSeek-R1) have demonstrated impressive multi-step reasoning capabilities across diverse domains, suggesting a new opportunity for reframing TSF as a structured reasoning task. This motivates a key question: can slow-thinking LLMs effectively reason over temporal patterns to support time series forecasting, even in a zero-shot manner? To investigate this, in this paper, we propose TimeReasoner, an extensive empirical study that formulates TSF as a conditional reasoning task. We design a series of prompting strategies to elicit inference-time reasoning from pretrained slow-thinking LLMs and evaluate their performance across diverse TSF benchmarks. Our findings reveal that slow-thinking LLMs exhibit non-trivial zero-shot forecasting capabilities, especially in capturing high-level trends and contextual shifts. While preliminary, our study surfaces important insights into the reasoning behaviors of LLMs in temporal domains, highlighting both their potential and limitations. We hope this work catalyzes further research into reasoning-based forecasting paradigms and paves the way toward more interpretable and generalizable TSF frameworks.



Article 717

Title@2025-06-10 (2): Detecting malignant dynamics on very few blood sample using signature coefficients

Title: Detecting malignant dynamics on very few blood sample using signature coefficients Erkennung maligner Dynamik an sehr wenigen Blutproben mit Signaturkoeffizienten 利用签字系数检测极少数血液样本中的恶性动态 2506.09097v1

Authors (2): Rémi Vaucher, Stéphane Chrétien

Recent discoveries have suggested that the promising avenue of using circulating tumor DNA (ctDNA) levels in blood samples provides reasonable accuracy for cancer monitoring, with extremely low burden on the patient’s side. It is known that the presence of ctDNA can result from various mechanisms leading to DNA release from cells, such as apoptosis, necrosis or active secretion. One key idea in recent cancer monitoring studies is that monitoring the dynamics of ctDNA levels might be sufficient for early multi-cancer detection. This interesting idea has been turned into commercial products, e.g. by the company GRAIL. In the present work, we propose to explore the use of Signature theory for detecting aggressive cancer tumors based on the analysis of blood samples. Our approach combines tools from continuous time Markov modelling for the dynamics of ctDNA levels in the blood, with Signature theory for building efficient testing procedures. Signature theory is a topic of growing interest in the Machine Learning community (see Chevyrev2016 and Fermanian2021), which is now recognised as a powerful feature extraction tool for irregularly sampled signals. The method proposed in the present paper is shown to correctly address the challenging problem of overcoming the inherent data scarcity due to the extremely small number of blood samples per patient. The relevance of our approach is illustrated with extensive numerical experiments that confirm the efficiency of the proposed pipeline.
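To give a sense of what signature coefficients are, the sketch below computes the level-1 and level-2 signature terms of a piecewise-linear path from a handful of irregularly spaced samples. It only illustrates the feature extraction step: the function name, the (time, level) path layout, and the sample values are hypothetical, and the Markov model and testing procedure of the paper are not reproduced.

```python
import numpy as np

def signature_level2(path):
    # path: (n_points, d) samples of a piecewise-linear path.
    path = np.asarray(path, dtype=float)
    inc = np.diff(path, axis=0)            # per-segment increments
    level1 = inc.sum(axis=0)               # total displacement
    prev = np.cumsum(inc, axis=0) - inc    # displacement accumulated before each segment
    # Iterated integrals: cross terms between segments plus the within-segment part.
    level2 = prev.T @ inc + 0.5 * (inc.T @ inc)
    return level1, level2

# Hypothetical patient record: (time in weeks, ctDNA level), irregularly sampled.
samples = np.array([[0.0, 1.0], [3.0, 1.4], [4.0, 2.9], [9.0, 3.1]])
level1, level2 = signature_level2(samples)

# Simple sanity path: one unit along the first axis, then one unit along the second.
corner_l1, corner_l2 = signature_level2([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
```

Including time as a coordinate is a common way to let the signature see the irregular sampling; the antisymmetric part of `level2` then captures the order in which time and level changed, even with very few samples per patient.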



Article 718

Title@2025-06-10 (2): On the Stability of the Jacobian Matrix in Deep Neural Networks

Title: On the Stability of the Jacobian Matrix in Deep Neural Networks Zur Stabilität der Jacobian Matrix in tiefen neuralen Netzwerken 深神经网络Jacobian矩阵的稳定 2506.08764v1

Authors (5): Benjamin Dadoun, Soufiane Hayou, Hanan Salam, Mohamed El Amine Seddik, Pierre Youssef

Deep neural networks are known to suffer from exploding or vanishing gradients as depth increases, a phenomenon closely tied to the spectral behavior of the input-output Jacobian. Prior work has identified critical initialization schemes that ensure Jacobian stability, but these analyses are typically restricted to fully connected networks with i.i.d. weights. In this work, we go significantly beyond these limitations: we establish a general stability theorem for deep neural networks that accommodates sparsity (such as that introduced by pruning) and non-i.i.d., weakly correlated weights (e.g. induced by training). Our results rely on recent advances in random matrix theory, and provide rigorous guarantees for spectral stability in a much broader class of network models. This extends the theoretical foundation for initialization schemes in modern neural networks with structured and dependent randomness.



Article 719

Title@2025-06-10 (2): EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements

Title: EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements EDINET-Bench: Bewertung von LLMs zu komplexen Finanzaufgaben anhand japanischer Finanzausweise EDINET-Bench:利用日本财务报表评估复杂金融任务LMs 2506.08762v1

Authors (7): Issa Sugiura, Takashi Ishida, Taro Makino, Chieko Tazuke, Takanori Nakagawa, Kosuke Nakago, David Ha

Financial analysis presents complex challenges that could leverage large language model (LLM) capabilities. However, the scarcity of challenging financial datasets, particularly for Japanese financial data, impedes academic innovation in financial analytics. As LLMs advance, this lack of accessible research resources increasingly hinders their development and evaluation in this specialized domain. To address this gap, we introduce EDINET-Bench, an open-source Japanese financial benchmark designed to evaluate the performance of LLMs on challenging financial tasks including accounting fraud detection, earnings forecasting, and industry prediction. EDINET-Bench is constructed by downloading annual reports from the past 10 years from Japan’s Electronic Disclosure for Investors’ NETwork (EDINET) and automatically assigning labels corresponding to each evaluation task. Our experiments reveal that even state-of-the-art LLMs struggle, performing only slightly better than logistic regression in binary classification for fraud detection and earnings forecasting. These results highlight significant challenges in applying LLMs to real-world financial applications and underscore the need for domain-specific adaptation. Our dataset, benchmark construction code, and evaluation code are publicly available to facilitate future research in finance with LLMs.


Article 720

Title@2025-06-10 (2): Intra-Trajectory Consistency for Reward Modeling

Title: Intra-Trajectory Consistency for Reward Modeling Intra-Trajektorien-Konsistenz für Prämienmodellierung 奖励建模在轨内兼容性 2506.09096v1

Authors (7): Chaoyang Zhou, Shunyu Liu, Zengmao Wang, Di Wang, Rong-Cheng Tu, Bo Du, Dacheng Tao

Reward models are critical for improving large language models (LLMs), particularly in reinforcement learning from human feedback (RLHF) or inference-time verification. Current reward modeling typically relies on scores of overall responses to learn the outcome rewards for the responses. However, since the response-level scores are coarse-grained supervision signals, the reward model struggles to identify the specific components within a response trajectory that truly correlate with the scores, leading to poor generalization on unseen responses. In this paper, we propose to leverage generation probabilities to establish reward consistency between processes in the response trajectory, which allows the response-level supervisory signal to propagate across processes, thereby providing additional fine-grained signals for reward learning. Building on analysis under the Bayesian framework, we develop an intra-trajectory consistency regularization to enforce that adjacent processes with higher next-token generation probability maintain more consistent rewards. We apply the proposed regularization to an advanced outcome reward model, improving its performance on RewardBench. We also show that the reward model trained with the proposed regularization induces better DPO-aligned policies and achieves better best-of-N (BON) inference-time verification results. Our code is available at https://github.com/chaoyang101/ICRM.
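
As a rough illustration of the regularization idea in this abstract, the sketch below weights the squared reward difference between adjacent response prefixes by the next-token generation probability, so that prefixes linked by a likely token are pushed toward similar rewards. The function name, the squared-difference form, and the averaging are assumptions made for illustration; they are not taken from the paper, whose actual loss is in the linked repository.

```python
def consistency_penalty(rewards, next_token_probs):
    """Hypothetical intra-trajectory consistency penalty (illustrative only).

    rewards[t] is the reward-model score of the response prefix ending at step t.
    next_token_probs[t] is the generator's probability of the token that extends
    prefix t into prefix t + 1.
    """
    if len(rewards) != len(next_token_probs) + 1:
        raise ValueError("need exactly one probability per adjacent pair of prefixes")
    if not next_token_probs:
        return 0.0
    total = 0.0
    for t, prob in enumerate(next_token_probs):
        # A high-probability transition contributes more to the penalty,
        # so its two neighbouring rewards are pulled closer together.
        total += prob * (rewards[t + 1] - rewards[t]) ** 2
    return total / len(next_token_probs)
```

In training, a term of this kind would be added to the usual response-level reward loss with a weighting coefficient, which is likewise an assumption here.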


Article 721

Title@2025-06-10 (2): Enhancing Accuracy and Maintainability in Nuclear Plant Data Retrieval: A Function-Calling LLM Approach Over NL-to-SQL

Title: Enhancing Accuracy and Maintainability in Nuclear Plant Data Retrieval: A Function-Calling LLM Approach Over NL-to-SQL Verbesserung der Genauigkeit und Aufrechterhaltung des Datenrücklaufs von Kernkraftwerken: Ein funktionsorientierter LLM-Ansatz über NL-zu-SQL 提高核电厂数据检索的准确性和可维护性:NL-SQL的功能计算LLM法 2506.08757v1

Authors (5): Mishca de Costa, Muhammad Anwar, Dave Mercier, Mark Randall, Issam Hammad

Retrieving operational data from nuclear power plants requires exceptional accuracy and transparency due to the criticality of the decisions it supports. Traditionally, natural language to SQL (NL-to-SQL) approaches have been explored for querying such data. While NL-to-SQL promises ease of use, it poses significant risks: end-users cannot easily validate generated SQL queries, and legacy nuclear plant databases – often complex and poorly structured – complicate query generation due to decades of incremental modifications. These challenges increase the likelihood of inaccuracies and reduce trust in the approach. In this work, we propose an alternative paradigm: leveraging function-calling large language models (LLMs) to address these challenges. Instead of directly generating SQL queries, we define a set of pre-approved, purpose-specific functions representing common use cases. Queries are processed by invoking these functions, which encapsulate validated SQL logic. This hybrid approach mitigates the risks associated with direct NL-to-SQL translations by ensuring that SQL queries are reviewed and optimized by experts before deployment. While this strategy introduces the upfront cost of developing and maintaining the function library, we demonstrate how NL-to-SQL tools can assist in the initial generation of function code, allowing experts to focus on validation rather than creation. Our study includes a performance comparison between direct NL-to-SQL generation and the proposed function-based approach, highlighting improvements in accuracy and maintainability. This work underscores the importance of balancing user accessibility with operational safety and provides a novel, actionable framework for robust data retrieval in critical systems.
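
The function-calling pattern described in this abstract can be sketched as follows: the model never writes SQL, it only selects a pre-approved function and supplies arguments, and a dispatcher rejects anything outside the library. The table schema, the function `get_max_reading`, the sensor names, and the shape of the `call` dictionary are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import sqlite3


def get_max_reading(conn, sensor_id, start, end):
    """Pre-approved query: maximum reading of one sensor within a time window."""
    row = conn.execute(
        "SELECT MAX(value) FROM readings "
        "WHERE sensor_id = ? AND ts BETWEEN ? AND ?",
        (sensor_id, start, end),  # bound parameters, never string-formatted SQL
    ).fetchone()
    return row[0]


# The reviewed function library: the only operations the model may trigger.
APPROVED_FUNCTIONS = {"get_max_reading": get_max_reading}


def dispatch(conn, call):
    """Execute a function call proposed by the LLM if it is in the library."""
    name = call["name"]
    if name not in APPROVED_FUNCTIONS:
        raise ValueError(f"function {name!r} is not pre-approved")
    return APPROVED_FUNCTIONS[name](conn, **call["arguments"])


def demo():
    """Build a toy in-memory database and run one dispatched call."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor_id TEXT, ts TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [
            ("T-101", "2024-01-01", 281.5),
            ("T-101", "2024-01-02", 284.0),
            ("P-7", "2024-01-01", 10.1),
        ],
    )
    # Stand-in for the structured call an LLM would emit for a question such as
    # "What was the highest T-101 reading in January?"
    call = {
        "name": "get_max_reading",
        "arguments": {"sensor_id": "T-101", "start": "2024-01-01", "end": "2024-01-31"},
    }
    return dispatch(conn, call)
```

Because the SQL text is fixed and reviewed in advance, an end user only needs to check the function name and arguments, not a generated query.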


Article 722

Title@2025-06-10 (2): Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Title: Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction Denken vs. Tun: Agenten, die Grund durch Skalierung Test-Zeit-Interaktion 思考与做:通过扩大试验时间互动而有理由的代理商 2506.07976v2

Authors (11): Junhong Shen, Hao Bai, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar

The current paradigm of test-time scaling relies on generating long reasoning traces (“thinking” more) before producing a response. In agent problems that require interaction, this can be done by generating thinking traces before acting in the world. However, this process does not allow agents to acquire new information from the environment or adapt their behavior over time. In this work, we propose to scale test-time interaction, an untapped dimension of test-time scaling that increases the agent’s interaction horizon to enable running rich behaviors such as exploration, backtracking, and dynamic re-planning within a single rollout. To demonstrate the promise of this scaling dimension, we study the domain of web agents. We first show that even prompting-based interaction scaling without any training can improve task success on web benchmarks non-trivially. Building on this, we introduce TTI (Test-Time Interaction), a curriculum-based online reinforcement learning (RL) approach that trains agents by adaptively adjusting their rollout lengths. Using a Gemma 3 12B model, TTI produces state-of-the-art open-source, open-data web agents on WebVoyager and WebArena benchmarks. We further show that TTI enables agents to balance exploration and exploitation adaptively. Our results establish interaction scaling as a powerful, complementary axis to scaling per-step compute, offering new avenues for training adaptive agents.


Article 723

Title@2025-06-10 (2): Understanding High-Dimensional Bayesian Optimization

Title: Understanding High-Dimensional Bayesian Optimization High-Dimensional Bayesian Optimierung verstehen 理解高多样性巴耶斯最佳优化 2502.09198v2

Authors (3): Leonard Papenmeier, Matthias Poloczek, Luigi Nardi

Recent work reported that simple Bayesian optimization (BO) methods perform well for high-dimensional real-world tasks, seemingly contradicting prior work and tribal knowledge. This paper investigates why. We identify underlying challenges that arise in high-dimensional BO and explain why recent methods succeed. Our empirical analysis shows that vanishing gradients caused by Gaussian process (GP) initialization schemes play a major role in the failures of high-dimensional Bayesian optimization (HDBO) and that methods that promote local search behaviors are better suited for the task. We find that maximum likelihood estimation (MLE) of GP length scales suffices for state-of-the-art performance. Based on this, we propose a simple variant of MLE called MSR that leverages these findings to achieve state-of-the-art performance on a comprehensive set of real-world applications. We present targeted experiments to illustrate and confirm our findings.


Article 724

Title@2025-06-10 (2): Superposed Parameterised Quantum Circuits

Title: Superposed Parameterised Quantum Circuits Überlagerte parametrische Quantenkreise 量子电路 2506.08749v1

Authors (3): Viktoria Patapovich, Mo Kordzanganeh, Alexey Melnikov

Quantum machine learning has shown promise for high-dimensional data analysis, yet many existing approaches rely on linear unitary operations and shared trainable parameters across outputs. These constraints limit expressivity and scalability relative to the multi-layered, non-linear architectures of classical deep networks. We introduce superposed parameterised quantum circuits to overcome these limitations. By combining flip-flop quantum random-access memory with repeat-until-success protocols, a superposed parameterised quantum circuit embeds an exponential number of parameterised sub-models in a single circuit and induces polynomial activation functions through amplitude transformations and post-selection. We provide an analytic description of the architecture, showing how multiple parameter sets are trained in parallel while non-linear amplitude transformations broaden representational power beyond conventional quantum kernels. Numerical experiments underscore these advantages: on a 1D step-function regression, a two-qubit superposed parameterised quantum circuit cuts the mean-squared error by three orders of magnitude versus a parameter-matched variational baseline; on a star-shaped two-dimensional classification task, introducing a quadratic activation lifts accuracy to 81.4% and reduces run-to-run variance three-fold. These results position superposed parameterised quantum circuits as a hardware-efficient route toward deeper, more versatile parameterised quantum circuits capable of learning complex decision boundaries.


Article 725

Title@2025-06-10 (2): Towards Secure and Private Language Models for Nuclear Power Plants

Title: Towards Secure and Private Language Models for Nuclear Power Plants Auf dem Weg zu sicheren und privaten Sprachmodellen für Kernkraftwerke 争取实现核电厂安全和私用语言模式 2506.08746v1

Authors (4): Muhammad Anwar, Mishca de Costa, Issam Hammad, Daniel Lau

This paper introduces a domain-specific Large Language Model for nuclear applications, built from the publicly accessible Essential CANDU textbook. Drawing on a compact Transformer-based architecture, the model is trained on a single GPU to protect the sensitive data inherent in nuclear operations. Despite relying on a relatively small dataset, it shows encouraging signs of capturing specialized nuclear vocabulary, though the generated text sometimes lacks syntactic coherence. By focusing exclusively on nuclear content, this approach demonstrates the feasibility of in-house LLM solutions that align with rigorous cybersecurity and data confidentiality standards. Early successes in text generation underscore the model’s utility for specialized tasks, while also revealing the need for richer corpora, more sophisticated preprocessing, and instruction fine-tuning to enhance domain accuracy. Future directions include extending the dataset to cover diverse nuclear subtopics, refining tokenization to reduce noise, and systematically evaluating the model’s readiness for real-world applications in the nuclear domain.


Article 726

Title@2025-06-10 (2): Bridging RDF Knowledge Graphs with Graph Neural Networks for Semantically-Rich Recommender Systems

Title: Bridging RDF Knowledge Graphs with Graph Neural Networks for Semantically-Rich Recommender Systems Bridging RDF Knowledge Graphs mit Graph Neural Networks für Semantically-Rich Recommender Systeme 将RDF知识图与Smantic-Rich建议系统图形神经网络连接起来 2506.08743v1

Authors (3): Michael Färber, David Lamprecht, Yuni Susanti

Graph Neural Networks (GNNs) have substantially advanced the field of recommender systems. However, despite the creation of more than a thousand knowledge graphs (KGs) under the W3C standard RDF, their rich semantic information has not yet been fully leveraged in GNN-based recommender systems. To address this gap, we propose a comprehensive integration of RDF KGs with GNNs that utilizes both the topological information from RDF object properties and the content information from RDF datatype properties. Our main focus is an in-depth evaluation of various GNNs, analyzing how different semantic feature initializations and types of graph structure heterogeneity influence their performance in recommendation tasks. Through experiments across multiple recommendation scenarios involving multi-million-node RDF graphs, we demonstrate that harnessing the semantic richness of RDF KGs significantly improves recommender systems and lays the groundwork for GNN-based recommender systems for the Linked Open Data cloud. The code and data are available on our GitHub repository: https://github.com/davidlamprecht/rdf-gnn-recommendation


Article 727

Title@2025-06-10 (2): Urban Incident Prediction with Graph Neural Networks: Integrating Government Ratings and Crowdsourced Reports

Title: Urban Incident Prediction with Graph Neural Networks: Integrating Government Ratings and Crowdsourced Reports Urban Incident Prediction mit Graph Neural Networks: Integration von Regierungsbewertungen und Crowdsourced Reports 与图形神经网络城市事件预测:整合政府评级和众包报告 2506.08740v1

Authors (5): Sidhika Balachandar, Shuvom Sadhuka, Bonnie Berger, Emma Pierson, Nikhil Garg

Graph neural networks (GNNs) are widely used in urban spatiotemporal forecasting, such as predicting infrastructure problems. In this setting, government officials wish to know in which neighborhoods incidents like potholes or rodent issues occur. The true state of incidents (e.g., street conditions) for each neighborhood is observed via government inspection ratings. However, these ratings are only conducted for a sparse set of neighborhoods and incident types. We also observe the state of incidents via crowdsourced reports, which are more densely observed but may be biased due to heterogeneous reporting behavior. First, for such settings, we propose a multiview, multioutput GNN-based model that uses both unbiased rating data and biased reporting data to predict the true latent state of incidents. Second, we investigate a case study of New York City urban incidents and collect, standardize, and make publicly available a dataset of 9,615,863 crowdsourced reports and 1,041,415 government inspection ratings over 3 years and across 139 types of incidents. Finally, we show on both real and semi-synthetic data that our model can better predict the latent state compared to models that use only reporting data or models that use only rating data, especially when rating data is sparse and reports are predictive of ratings. We also quantify demographic biases in crowdsourced reporting, e.g., higher-income neighborhoods report problems at higher rates. Our analysis showcases a widely applicable approach for latent state prediction using heterogeneous, sparse, and biased data.


Article 728

Title@2025-06-10 (2): Exploration by Random Reward Perturbation

Title: Exploration by Random Reward Perturbation Exploration durch zufällige Belohnung Störung 利用随机奖励干扰探测 2506.08737v1

Authors (5): Haozhe Ma, Guoji Fu, Zhengding Luo, Jiele Wu, Tze-Yun Leong

We introduce Random Reward Perturbation (RRP), a novel exploration strategy for reinforcement learning (RL). Our theoretical analyses demonstrate that adding zero-mean noise to environmental rewards effectively enhances policy diversity during training, thereby expanding the range of exploration. RRP is fully compatible with the action-perturbation-based exploration strategies, such as $\epsilon$-greedy, stochastic policies, and entropy regularization, providing additive improvements to exploration effects. It is general, lightweight, and can be integrated into existing RL algorithms with minimal implementation effort and negligible computational overhead. RRP establishes a theoretical connection between reward shaping and noise-driven exploration, highlighting their complementary potential. Experiments show that RRP significantly boosts the performance of Proximal Policy Optimization and Soft Actor-Critic, achieving higher sample efficiency and escaping local optima across various tasks, under both sparse and dense reward scenarios.
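
The core mechanism named in this abstract, adding zero-mean noise to environmental rewards, can be sketched in a few lines. The class name, the choice of Gaussian noise, and the fixed `sigma` are assumptions for illustration; the abstract does not specify the noise distribution or any schedule for its scale.

```python
import random


class RandomRewardPerturbation:
    """Hypothetical helper that adds zero-mean Gaussian noise to a reward."""

    def __init__(self, sigma, seed=None):
        if sigma < 0:
            raise ValueError("sigma must be non-negative")
        self.sigma = sigma
        self.rng = random.Random(seed)  # private generator for reproducibility

    def __call__(self, reward):
        # Noise has mean 0, so the expected reward is unchanged;
        # only the learning signal seen by the agent is perturbed.
        return reward + self.rng.gauss(0.0, self.sigma)
```

In an existing training loop, the environment reward would be passed through such an object before being stored or used for the policy update, for example `r = perturb(r)`, leaving the rest of the algorithm untouched.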


Article 729

Title@2025-06-10 (2): scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data

Title: scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data scSSL-Bench: Benchmarking Selbstüberwachtes Lernen für Single-Cell-Daten scSSL-Bench:为单一分类数据制定自我监督学习基准 2506.10031v1

Authors (7): Olga Ovcharenko, Florian Barkmann, Philip Toma, Imant Daunhawer, Julia Vogt, Sebastian Schelter, Valentina Boeva

Self-supervised learning (SSL) has proven to be a powerful approach for extracting biologically meaningful representations from single-cell data. To advance our understanding of SSL methods applied to single-cell data, we present scSSL-Bench, a comprehensive benchmark that evaluates nineteen SSL methods. Our evaluation spans nine datasets and focuses on three common downstream tasks: batch correction, cell type annotation, and missing modality prediction. Furthermore, we systematically assess various data augmentation strategies. Our analysis reveals task-specific trade-offs: the specialized single-cell frameworks, scVI, CLAIRE, and the finetuned scGPT excel at uni-modal batch correction, while generic SSL methods, such as VICReg and SimCLR, demonstrate superior performance in cell typing and multi-modal data integration. Random masking emerges as the most effective augmentation technique across all tasks, surpassing domain-specific augmentations. Notably, our results indicate the need for a specialized single-cell multi-modal data integration framework. scSSL-Bench provides a standardized evaluation platform and concrete recommendations for applying SSL to single-cell analysis, advancing the convergence of deep learning and single-cell genomics.


Article 730

Title@2025-06-10 (2): Fusing Global and Local: Transformer-CNN Synergy for Next-Gen Current Estimation

Title: Fusing Global and Local: Transformer-CNN Synergy for Next-Gen Current Estimation Fusing Global und Local: Transformer-CNN-Synergie für die aktuelle Schätzung von Next-Gen 全球和局部引信: 变异器-CNN 用于下金当前估算的协同效应 2504.07996v2

Authors (8): Junlang Huang, Hao Chen, Li Luo, Yong Cai, Lexin Zhang, Tianhao Ma, Yitian Zhang, Zhong Guan

This paper presents a hybrid model combining Transformer and CNN for predicting the current waveform in signal lines. Unlike traditional approaches such as current source models, driver linear representations, waveform functional fitting, or equivalent load capacitance methods, our model does not rely on fixed simplified models of standard-cell drivers or RC loads. Instead, it replaces the complex Newton iteration process used in traditional SPICE simulations, leveraging the powerful sequence modeling capabilities of the Transformer framework to directly predict current responses without iterative solving steps. The hybrid architecture effectively integrates the global feature-capturing ability of Transformers with the local feature extraction advantages of CNNs, significantly improving the accuracy of current waveform predictions. Experimental results demonstrate that, compared to traditional SPICE simulations, the proposed algorithm achieves an error of only 0.0098. These results highlight the algorithm’s superior capabilities in predicting signal line current waveforms, timing analysis, and power evaluation, making it suitable for a wide range of technology nodes, from 40nm to 3nm.


Article 731

Title@2025-06-10 (2): Flexible and Efficient Drift Detection without Labels

Title: Flexible and Efficient Drift Detection without Labels Flexible und effiziente Drift-Erkennung ohne Etiketten 无标签的灵活而高效的漂流探测 2506.08734v1

Authors (4): Nelvin Tan, Yu-Ching Shih, Dong Yang, Amol Salunkhe

Machine learning models are increasingly used to automate decisions in almost every domain, and maintaining the performance of these models is crucial for high-quality machine-learning-enabled services. Detecting concept drift early is thus of the highest importance. Much research on concept drift has focused on the supervised case, which assumes that the true labels of supervised tasks are available immediately after predictions are made. Controlling for false positives while periodically monitoring the performance of predictive models used for inference on extremely large datasets, where the true labels are not instantly available, becomes extremely challenging. We propose a flexible and efficient concept drift detection algorithm that uses classical statistical process control in a label-less setting to accurately detect concept drifts. We show empirically that, under computational constraints, our approach has better statistical power than previously known methods. Furthermore, we introduce a new drift detection framework to model the scenario of detecting drift (without labels) given prior detections, and show how our drift detection algorithm can be incorporated effectively into this framework. We demonstrate promising performance via numerical simulations.
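
To make "statistical process control in a label-less setting" concrete, the sketch below applies a classical Shewhart-style control chart to the mean of a model's output scores, which requires no true labels. This is a textbook control chart used as a stand-in, not the authors' algorithm; the function names, the 3-sigma default, and the choice of monitoring the batch mean are assumptions.

```python
import math
import statistics


def control_limits(reference_scores, batch_size, k=3.0):
    """Compute lower and upper control limits for the mean of a batch of scores.

    reference_scores are model outputs collected while the model is known to be
    healthy; k is the chart width in standard errors (3.0 is the classical choice).
    """
    mu = statistics.fmean(reference_scores)
    sigma = statistics.stdev(reference_scores)
    half_width = k * sigma / math.sqrt(batch_size)
    return mu - half_width, mu + half_width


def drifted(batch_scores, limits):
    """Flag a batch whose mean score falls outside the control limits."""
    lower, upper = limits
    batch_mean = statistics.fmean(batch_scores)
    return batch_mean < lower or batch_mean > upper
```

The width `k` trades false alarms against detection delay: a wider chart raises fewer false positives but reacts more slowly to small shifts.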


Article 732

Title@2025-06-10 (2): Mitigating fairwashing using Two-Source Audits

Title: Mitigating fairwashing using Two-Source Audits Fairwashing durch Zwei-Quellen-Audits abmildern 利用双重来源审计减少洗水 2305.13883v2

Authors (4): Jade Garcia Bourrée, Erwan Le Merrer, Gilles Tredan, Benoît Rottembourg

Recent legislation requires online platforms to provide dedicated APIs to assess the compliance of their decision-making algorithms with the law. Research has nevertheless shown that the auditors of such platforms are prone to manipulation (a practice referred to as fairwashing). To address this salient problem, recent work has considered audits under the assumption of partial knowledge of the platform’s internal mechanisms. In this paper, we propose a more pragmatic approach with the Two-Source Audit setup: while still leveraging the API, we advocate for the addition of a second source of data, both to audit a platform and to detect fairwashing attempts. Our method is based on identifying discrepancies between the two data sources, using data proxies in use in the fairness literature. We formally demonstrate the conditions for success in this fairwashing mitigation task. We then validate our method empirically, demonstrating that Two-Source Audits can achieve a Pareto-optimal balance between the two objectives. We believe this paper sets the stage for reliable audits in manipulation-prone setups, under mild assumptions.


Article 733

Title@2025-06-10 (2): Breaking the ICE: Exploring promises and challenges of benchmarks for Inference Carbon & Energy estimation for LLMs

Title: Breaking the ICE: Exploring promises and challenges of benchmarks for Inference Carbon & Energy estimation for LLMs Breaking the ICE: Erforschen von Versprechungen und Herausforderungen von Benchmarks für Inferenz-Kohlenstoff- & Energieschätzungen für LLMs 打破ICE:探索LLMM的碳和能源估算基准的许诺和挑战 2506.08727v1

Authors (8): Samarth Sikand, Rohit Mehra, Priyavanshi Pathania, Nikhil Bamby, Vibhu Saujanya Sharma, Vikrant Kaulgud, Sanjay Podder, Adam P. Burden

While Generative AI stands to be one of the fastest adopted technologies ever, studies have made evident that the usage of Large Language Models (LLMs) puts a significant burden on energy grids and our environment. It may prove a hindrance to the Sustainability goals of any organization. A crucial step in any Sustainability strategy is monitoring or estimating the energy consumption of various components. While there exist multiple tools for monitoring energy consumption, there is a dearth of tools and frameworks for estimating the consumption or carbon emissions. Current drawbacks of both monitoring and estimation tools include the large number of input data points they require, their intrusive nature, and high error margins. We posit that leveraging emerging LLM benchmarks and related data points can help overcome the aforementioned challenges while balancing the accuracy of the emission estimates. To that end, we discuss the challenges of current approaches and present our evolving framework, R-ICE, which estimates prompt-level inference carbon emissions by leveraging existing state-of-the-art (SOTA) benchmarks. This direction provides a more practical and non-intrusive way to enable emerging use cases such as dynamic LLM routing and carbon accounting. Our promising validation results suggest that benchmark-based modelling holds great potential for inference emission estimation and warrants further exploration from the scientific community.


Article 734

Title@2025-06-10 (2): Stop Misusing t-SNE and UMAP for Visual Analytics

Title: Stop Misusing t-SNE and UMAP for Visual Analytics Stoppen Sie die missbräuchliche Verwendung von t-SNE und UMAP für Visual Analytics 停止滥用 t- SNE 和视觉分析分析 UMAP 2506.08725v1

Authors (4): Hyeon Jeon, Jeongin Park, Sungbok Shin, Jinwook Seo

Misuses of t-SNE and UMAP in visual analytics have become increasingly common. For example, although t-SNE and UMAP projections often do not faithfully reflect true distances between clusters, practitioners frequently use them to investigate inter-cluster relationships. In this paper, we bring this issue to the surface and comprehensively investigate why such misuse occurs and how to prevent it. We conduct a literature review of 114 papers to verify the prevalence of the misuse and analyze the reasoning behind it. We then execute an interview study to uncover practitioners’ implicit motivations for using these techniques – rationales often undisclosed in the literature. Our findings indicate that misuse of t-SNE and UMAP primarily stems from limited discourse on their appropriate use in visual analytics. We conclude by proposing future directions and concrete action items to promote more reasonable use of dimensionality reduction (DR).


Article 735

Title@2025-06-10 (2): VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?

Title: VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs? VIST-GPT: Sich in der Ära des Visual Storytelling mit LLMs niederlassen? 使用LLMS的视觉故事时代? 2504.19267v3

Authors (5): Mohamed Gado, Towhid Taliee, Muhammad Memon, Dmitry Ignatov, Radu Timofte

Visual storytelling is an interdisciplinary field combining computer vision and natural language processing to generate cohesive narratives from sequences of images. This paper presents a novel approach that leverages recent advancements in multimodal models, specifically adapting transformer-based architectures and large multimodal models, for the visual storytelling task. Leveraging the large-scale Visual Storytelling (VIST) dataset, our VIST-GPT model produces visually grounded, contextually appropriate narratives. We address the limitations of traditional evaluation metrics, such as BLEU, METEOR, ROUGE, and CIDEr, which are not suitable for this task. Instead, we utilize RoViST and GROOVIST, novel reference-free metrics designed to assess visual storytelling, focusing on visual grounding, coherence, and non-redundancy. These metrics provide a more nuanced evaluation of narrative quality, aligning closely with human judgment.


Article 736

Title@2025-06-10 (2): Strain Problems got you in a Twist? Try StrainRelief: A Quantum-Accurate Tool for Ligand Strain Calculations

Title: Strain Problems got you in a Twist? Try StrainRelief: A Quantum-Accurate Tool for Ligand Strain Calculations Strain Probleme haben Sie in einem Twist? Versuchen Sie StrainRelief: Ein Quantum-Accurate-Tool für Ligand Strain Berechnungen Strain Relief: Ligand Strain 计算 的量子精确工具 2503.13352v2

Authors (3): Ewan R. S. Wallace, Nathan C. Frey, Joshua A. Rackers

Ligand strain energy, the energy difference between the bound and unbound conformations of a ligand, is an important component of structure-based small molecule drug design. A large majority of observed ligands in protein-small molecule co-crystal structures bind in low-strain conformations, making strain energy a useful filter for structure-based drug design. In this work we present a tool for calculating ligand strain with high accuracy. StrainRelief uses a MACE Neural Network Potential (NNP), trained on a large database of Density Functional Theory (DFT) calculations, to estimate ligand strain of neutral molecules with quantum accuracy. We show that this tool estimates strain energy differences relative to DFT to within 1.4 kcal/mol, more accurately than alternative NNPs. These results highlight the utility of NNPs in drug discovery, and provide a useful tool for drug discovery teams.
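
The definition of strain energy given in this abstract amounts to a simple subtraction, shown here as a worked sketch. Taking the lowest-energy unbound conformer as the reference, the function names, and the filtering threshold are assumptions for illustration; this is not the StrainRelief API, and in practice the energies would come from a potential such as the NNP mentioned above rather than from hand-entered numbers.

```python
def strain_energy(bound_energy, unbound_energies):
    """Strain (kcal/mol) of the bound pose relative to the best unbound conformer.

    Using the minimum over sampled unbound conformers is an assumed convention.
    """
    if not unbound_energies:
        raise ValueError("need at least one unbound conformer energy")
    return bound_energy - min(unbound_energies)


def low_strain(ligands, max_strain):
    """Return names of ligands whose strain does not exceed max_strain.

    ligands maps a name to a pair (bound_energy, unbound_energies).
    """
    return [
        name
        for name, (bound, unbound) in ligands.items()
        if strain_energy(bound, unbound) <= max_strain
    ]
```

For example, a bound pose at -100.0 kcal/mol with unbound conformers at -103.0 and -101.5 kcal/mol has a strain of 3.0 kcal/mol under this convention.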


Article 737

Title@2025-06-10 (2): TimeWak: Temporal Chained-Hashing Watermark for Time Series Data

Title: TimeWak: Temporal Chained-Hashing Watermark for Time Series Data TimeWak: Temporales Ketten-Hashing-Wasserzeichen für Zeitreihendaten TimeWak:用于时间序列数据的时链套装水印 2506.06407v2

Authors (7): Zhi Wen Soi, Chaoyi Zhu, Fouad Abiad, Aditya Shankar, Jeroen M. Galjaard, Huijuan Wang, Lydia Y. Chen

Synthetic time series generated by diffusion models enable sharing privacy-sensitive datasets, such as patients’ functional MRI records. Key criteria for synthetic data include high data utility and traceability to verify the data source. Recent watermarking methods embed in homogeneous latent spaces, but state-of-the-art time series generators operate in real space, making latent-based watermarking incompatible. This creates the challenge of watermarking directly in real space while handling feature heterogeneity and temporal dependencies. We propose TimeWak, the first watermarking algorithm for multivariate time series diffusion models. To handle temporal dependence and spatial heterogeneity, TimeWak embeds a temporal chained-hashing watermark directly within the real temporal-feature space. The other unique feature is the $\epsilon$-exact inversion, which addresses the non-uniform reconstruction error distribution across features from inverting the diffusion process to detect watermarks. We derive the error bound of inverting multivariate time series and further maintain high watermark detectability. We extensively evaluate TimeWak on its impact on synthetic data quality, watermark detectability, and robustness under various post-editing attacks, against 5 datasets and baselines of different temporal lengths. Our results show that TimeWak achieves improvements of 61.96% in context-FID score, and 8.44% in correlational scores against the state-of-the-art baseline, while remaining consistently detectable.


Article 738

Title@2025-06-10 (2): ASIDE: Architectural Separation of Instructions and Data in Language Models

Title: ASIDE: Architectural Separation of Instructions and Data in Language Models ASIDE: Architekturelle Trennung von Instruktionen und Daten in Sprachmodellen AIDIDE: 语言模式教学和数据结构分类 2503.10566v3

Authors (8): Egor Zverev, Evgenii Kortukov, Alexander Panfilov, Alexandra Volkova, Soroush Tabesh, Sebastian Lapuschkin, Wojciech Samek, Christoph H. Lampert

Despite their remarkable performance, large language models lack elementary safety features, making them susceptible to numerous malicious attacks. In particular, previous work has identified the absence of an intrinsic separation between instructions and data as a root cause of the success of prompt injection attacks. In this work, we propose a new architectural element, ASIDE, that allows language models to clearly separate instructions and data at the level of embeddings. ASIDE applies an orthogonal rotation to the embeddings of data tokens, thus creating clearly distinct representations of instructions and data tokens without introducing any additional parameters. As we demonstrate experimentally across a range of models, instruction-tuning LLMs with ASIDE (1) leads to highly increased instruction-data separation without a loss in model utility and (2) makes the models more robust to prompt injection benchmarks, even without dedicated safety training. Additionally, we provide insights into the mechanism underlying our method through an analysis of the model representations. The source code and training scripts are openly accessible at https://github.com/egozverev/aside.
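
The idea of rotating data-token embeddings can be illustrated with a minimal, dependency-free sketch. The abstract only states that an orthogonal rotation is applied; the specific choice below (a 90-degree rotation within each consecutive pair of dimensions), the function names, and the toy embedding table are assumptions for illustration, and the linked repository holds the actual implementation.

```python
def rotate_pairs_90(vec):
    """Rotate each consecutive pair of coordinates (x, y) to (-y, x).

    This is an orthogonal map: it preserves vector norms and sends every
    vector to one that is perpendicular to the original.
    """
    if len(vec) % 2:
        raise ValueError("embedding dimension must be even")
    out = []
    for i in range(0, len(vec), 2):
        x, y = vec[i], vec[i + 1]
        out.extend((-y, x))
    return out


def embed_tokens(tokens, is_data, table):
    """Look up embeddings, rotating those of tokens marked as data.

    The same embedding table serves both roles, so no parameters are added.
    """
    return [
        rotate_pairs_90(table[tok]) if data else list(table[tok])
        for tok, data in zip(tokens, is_data)
    ]
```

With this map, the same token receives two distinct representations depending on whether it appears in the instruction or in the data, which is the separation the abstract describes.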


Article 739

Title: A Survey on Quantum Machine Learning: Current Trends, Challenges, Opportunities, and the Road Ahead Eine Umfrage zum Quantum Machine Learning: Aktuelle Trends, Herausforderungen, Chancen und die Zukunft 关于 “ 量子机器学习:当前趋势、挑战、机遇和路前道路 “ 的调查 2310.10315v4

Authors (4): Kamila Zaman, Alberto Marchisio, Muhammad Abdullah Hanif, Muhammad Shafique

Quantum Computing (QC) claims to improve the efficiency of solving complex problems, compared to classical computing. When QC is integrated with Machine Learning (ML), it creates a Quantum Machine Learning (QML) system. This paper aims to provide a thorough understanding of the foundational concepts of QC and its notable advantages over classical computing. Following this, we delve into the key aspects of QML in a detailed and comprehensive manner. In this survey, we investigate a variety of QML algorithms, discussing their applicability across different domains. We examine quantum datasets, highlighting their unique characteristics and advantages. The survey also covers the current state of hardware technologies, providing insights into the latest advancements and their implications for QML. Additionally, we review the software tools and simulators available for QML development, discussing their features and usability. Furthermore, we explore practical applications of QML, illustrating how it can be leveraged to solve real-world problems more efficiently than classical ML methods. This survey aims to consolidate the current landscape of QML and outline key opportunities and challenges for future research.



Article 740

Title@2025-06-10 (2): Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data

Title: Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data Intelligenter zusammenführen, besser verallgemeinern: Modell verbessern Zusammenführen auf OOD-Daten 合并的 “ 聪明者 “ 、 “ 更普遍化:加强OOD数据模型合并 “ 2506.09093v1

Authors (8): Bingjie Zhang, Hongkang Li, Changlong Shi, Guowei Rong, He Zhao, Dongsheng Wang, Dandan Guo, Meng Wang

Multi-task learning (MTL) concurrently trains a model on diverse task datasets to exploit common features, thereby improving overall performance across the tasks. Recent studies have dedicated efforts to merging multiple independent model parameters into a unified model for MTL, thus circumventing the need for training data and expanding the scope of applicable scenarios of MTL. However, current approaches to model merging predominantly concentrate on enhancing performance within in-domain (ID) datasets, often overlooking their efficacy on out-of-domain (OOD) datasets. In this work, we propose LwPTV (Layer-wise Pruning Task Vector), which builds a saliency score that measures the redundancy of parameters in task vectors. With this score, our method obtains a mask vector for each task and performs layer-wise pruning on the task vectors, keeping only the pre-trained model parameters at the corresponding layers of the merged model. Owing to its flexibility, our method can be seamlessly integrated with most existing model merging methods to improve their performance on OOD tasks. Extensive experiments demonstrate that applying our method yields substantial improvements in OOD performance while preserving performance on ID tasks.
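
A rough sketch of layer-wise task-vector pruning follows. The abstract does not define the saliency score, so this example assumes a stand-in (mean absolute parameter change per layer) and a fixed threshold; `task_vector`, `layer_mask`, and `merge` are hypothetical names, and the merge step is plain task-vector addition rather than any specific merging method.

```python
from typing import Dict, List

Layers = Dict[str, List[float]]


def task_vector(finetuned: Layers, pretrained: Layers) -> Layers:
    """Per-layer difference between fine-tuned and pre-trained parameters."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def layer_mask(tv: Layers, threshold: float) -> Dict[str, bool]:
    """Keep a layer when its assumed saliency (mean |delta|) reaches the threshold."""
    return {
        name: (sum(abs(d) for d in deltas) / len(deltas)) >= threshold
        for name, deltas in tv.items()
    }


def merge(pretrained: Layers, finetuned_models: List[Layers], threshold: float) -> Layers:
    """Add only the unpruned task-vector layers to the pre-trained model."""
    merged = {name: list(vals) for name, vals in pretrained.items()}
    for ft in finetuned_models:
        tv = task_vector(ft, pretrained)
        mask = layer_mask(tv, threshold)
        for name, deltas in tv.items():
            if mask[name]:
                merged[name] = [m + d for m, d in zip(merged[name], deltas)]
            # Pruned layers contribute nothing, so the pre-trained values remain.
    return merged
```

A layer whose task vector is pruned for every task ends up identical to the pre-trained layer, which matches the abstract's description of keeping pre-trained parameters at those layers.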



Article 741

Title@2025-06-10 (2): Variational Autoencoder-Based Approach to Latent Feature Analysis on Efficient Representation of Power Load Monitoring Data

Title: Variational Autoencoder-Based Approach to Latent Feature Analysis on Efficient Representation of Power Load Monitoring Data Variationeller Autoencoder-basierter Ansatz zur Latent Feature Analyse zur effizienten Darstellung von Power Load Monitoring Daten 以变式自动编码器为基础,对电荷载量监测数据高效代表情况进行长期特征分析的变式自动编码器方法 2506.08698v1

Authors (2): Boyu Xie, Tangtang Xie

With the development of smart grids, High-Dimensional and Incomplete (HDI) Power Load Monitoring (PLM) data challenges the performance of Power Load Forecasting (PLF) models. In this paper, we propose VAE-LF, a latent feature model based on the Variational Autoencoder (VAE), for efficiently representing PLM data and completing its missing values. VAE-LF splits the HDI PLM data into vectors, feeds them sequentially into an Encoder-Decoder structure to learn a low-dimensional latent representation, and generates the completed data. Experiments on the UK-DALE dataset show that VAE-LF outperforms other benchmark models in both the 5% and 10% sparsity test cases, with significantly lower RMSE and MAE, and performs especially well on data with a low sparsity ratio. The method provides an efficient data-completion solution for electric load management in smart grids.



Article 742

Title@2025-06-10 (2): Self-Training Elicits Concise Reasoning in Large Language Models

Title: Self-Training Elicits Concise Reasoning in Large Language Models Selbst-Trainings-Elizite schließen Vernunft in großen Sprachmodellen ein 大语言模式中的简明理由 2502.20122v3

Authors (6): Tergel Munkhbat, Namgyu Ho, Seo Hyun Kim, Yongjin Yang, Yujin Kim, Se-Young Yun

Chain-of-thought (CoT) reasoning has enabled large language models (LLMs) to utilize additional computation through intermediate tokens to solve complex tasks. However, we posit that typical reasoning traces contain many redundant tokens, incurring extraneous inference costs. Upon examination of the output distribution of current LLMs, we find evidence of their latent ability to reason more concisely, relative to their default behavior. To elicit this capability, we propose simple fine-tuning methods which leverage self-generated concise reasoning paths obtained by best-of-N sampling and few-shot conditioning, in task-specific settings. Our combined method achieves a 30% reduction in output tokens on average, across five model families on GSM8K and MATH, while maintaining average accuracy. By exploiting the fundamental stochasticity and in-context learning capabilities of LLMs, our self-training approach robustly elicits concise reasoning on a wide range of models, including those with extensive post-training. Code is available at https://github.com/TergelMunkhbat/concise-reasoning
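
The best-of-N selection step can be illustrated as follows. This is a sketch under stated assumptions: samples are already generated, correctness is an exact string match against a known answer, and length is approximated by whitespace-separated words rather than model tokens. The name `shortest_correct` is hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[str, str]  # (reasoning text, final answer)


def shortest_correct(samples: List[Sample], gold_answer: str) -> Optional[Sample]:
    """Among N sampled reasoning paths, return the shortest one with the right answer."""
    correct = [s for s in samples if s[1].strip() == gold_answer.strip()]
    if not correct:
        return None  # No usable training example for this question.
    # Word count is a crude proxy for token count (assumption).
    return min(correct, key=lambda s: len(s[0].split()))
```

Paths selected this way would then serve as fine-tuning targets, so the model learns from its own shortest correct outputs.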



Article 743

Title@2025-06-10 (2): CUDA-LLM: LLMs Can Write Efficient CUDA Kernels

Title: CUDA-LLM: LLMs Can Write Efficient CUDA Kernels CUDA-LLM: LLMs können effiziente CUDA-Kernel schreiben CUDA-LLM: 能够写写有效CUDA核心的LLMss 2506.09092v1

Authors (5): Wentao Chen, Jiace Zhu, Qi Fan, Yehan Ma, An Zou

Large Language Models (LLMs) have demonstrated strong capabilities in general-purpose code generation. However, generating code that is deeply hardware-specific, architecture-aware, and performance-critical, especially for massively parallel GPUs, remains a complex challenge. In this work, we explore the use of LLMs for the automated generation and optimization of CUDA programs, with the goal of producing high-performance GPU kernels that fully exploit the underlying hardware. To address this challenge, we propose a novel framework called Feature Search and Reinforcement (FSR). FSR jointly optimizes compilation and functional correctness, which are validated through extensive and diverse test cases, and runtime performance, which is measured by actual kernel execution latency on the target GPU. This approach enables LLMs not only to generate syntactically and semantically correct CUDA code but also to iteratively refine it for efficiency, tailored to the characteristics of the GPU architecture. We evaluate FSR on representative CUDA kernels, covering AI workloads and computationally intensive algorithms. Our results show that LLMs augmented with FSR consistently guarantee correctness rates. Meanwhile, the automatically generated kernels can outperform general human-written code by a factor of up to 179$\times$ in execution speed. These findings highlight the potential of combining LLMs with performance reinforcement to automate GPU programming for hardware-specific, architecture-sensitive, and performance-critical applications.



Article 744

Title@2025-06-10 (2): Variational Inference Optimized Using the Curved Geometry of Coupled Free Energy

Title: Variational Inference Optimized Using the Curved Geometry of Coupled Free Energy Variationelle Schlussfolgerung optimiert mit der gekrümmten Geometrie der gekoppelten freien Energie 使用共同自由能源曲线几何法优化 2506.09091v1

Authors (5): Kenric Nelson, Igor Oliveira, Amenah Al-Najafi, Fode Zhang, Hon Keung Tony Ng

We introduce an optimization framework for variational inference based on the coupled free energy, extending variational inference techniques to account for the curved geometry of the coupled exponential family. This family includes important heavy-tailed distributions such as the generalized Pareto and the Student’s t. By leveraging the coupled free energy, which is equal to the coupled evidence lower bound (ELBO) of the inverted probabilities, we improve the accuracy and robustness of the learned model. The coupled generalization of Fisher Information metric and the affine connection. The method is applied to the design of a coupled variational autoencoder (CVAE). By using the coupling for both the distributions and cost functions, the reconstruction metric is derived to still be the mean-square average loss with modified constants. The novelty comes from sampling the heavy-tailed latent distribution with its associated coupled probability, which has faster decaying tails. The result is the ability to train a model with high penalties in the tails, while assuring that the training samples have a reduced number of outliers. The Wasserstein-2 or Fréchet Inception Distance of the reconstructed CelebA images shows the CVAE has a 3% improvement over the VAE after 5 epochs of training.



Article 745

Title@2025-06-10 (2): Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?

Title: Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning? Optuna vs Code Llama: Sind LLMs ein neues Paradigma für Hyperparameter-Tuning? Optuna 与 Code Llama:LLMs是超参数图例的新范式吗? 2504.06006v3

Authors (5): Roman Kochnev, Arash Torabi Goodarzi, Zofia Antonina Bentyn, Dmitry Ignatov, Radu Timofte

Optimal hyperparameter selection is critical for maximizing neural network performance, especially as models grow in complexity. This work investigates the viability of leveraging large language models (LLMs) for hyperparameter optimization by fine-tuning a parameter-efficient version of Code Llama using LoRA. The adapted LLM is capable of generating accurate and efficient hyperparameter recommendations tailored to diverse neural network architectures. Unlike traditional approaches such as Optuna, which rely on computationally intensive trial-and-error procedures, our method achieves competitive or superior results in terms of Root Mean Square Error (RMSE) while significantly reducing computational overhead. Our findings demonstrate that LLM-based optimization not only matches the performance of state-of-the-art techniques like Tree-structured Parzen Estimators (TPE) but also substantially accelerates the tuning process. This positions LLMs as a promising alternative for rapid experimentation, particularly in resource-constrained environments such as edge devices and mobile platforms, where computational efficiency is essential. In addition to improved efficiency, the method offers time savings and consistent performance across various tasks, highlighting its robustness and generalizability. All generated hyperparameters are included in the LEMUR Neural Network (NN) Dataset, which is publicly available and serves as an open-source benchmark for hyperparameter optimization research.



Article 746

Title@2025-06-10 (2): RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling

Title: RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling RuleReasoner: Verstärkte regelbasiertes Reasoning via Domain-aware Dynamic Sampling 规则响应器:通过Domain-aware动态抽样加强基于规则的理由说明 2506.08672v1

Authors (3): Yang Liu, Jiaqi Li, Zilong Zheng

Rule-based reasoning has been acknowledged as one of the fundamental problems in reasoning, while deviations in rule formats, types, and complexity in real-world applications pose severe challenges. Recent studies have shown that large reasoning models (LRMs) have remarkable reasoning capabilities, and their performance is substantially enhanced by reinforcement learning (RL). However, it remains an open question whether small reasoning models (SRMs) can learn rule-based reasoning effectively with robust generalization across diverse tasks and domains. To address this, we introduce Reinforced Rule-based Reasoning, a.k.a. RuleReasoner, a simple yet effective method to conduct rule-based reasoning via a wide collection of curated tasks and a novel domain-aware dynamic sampling approach. Specifically, RuleReasoner resamples each training batch by updating the sampling weights of different domains based on historical rewards. This facilitates domain augmentation and flexible online learning schedules for RL, obviating the need for pre-hoc human-engineered mix-training recipes used in existing methods. Empirical evaluations on in-distribution (ID) and out-of-distribution (OOD) benchmarks reveal that RuleReasoner outperforms frontier LRMs by a significant margin ($\Delta$4.1% average points on eight ID tasks and $\Delta$10.4% average points on three OOD tasks over OpenAI-o1). Notably, our approach also exhibits higher computational efficiency compared to prior dynamic sampling methods for RL.
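
One way to picture reward-driven domain resampling is shown below. The abstract states only that sampling weights are updated from historical rewards; the exponential moving average, the softmax over (1 - reward), and the choice to upweight low-reward domains are all assumptions of this sketch, and the function names are hypothetical.

```python
import math
from typing import Dict


def update_reward_history(
    history: Dict[str, float],
    batch_rewards: Dict[str, float],
    decay: float = 0.9,
) -> Dict[str, float]:
    """Track an exponential moving average of the reward seen in each domain."""
    updated = dict(history)
    for domain, reward in batch_rewards.items():
        prev = updated.get(domain, reward)
        updated[domain] = decay * prev + (1.0 - decay) * reward
    return updated


def sampling_weights(history: Dict[str, float], temperature: float = 0.5) -> Dict[str, float]:
    """Softmax over (1 - reward): domains with lower past reward are sampled more."""
    scores = {d: math.exp((1.0 - r) / temperature) for d, r in history.items()}
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}
```

Recomputing the weights before each batch removes the need to fix a domain mixing ratio in advance, which is the property the abstract highlights.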



Article 747

Title@2025-06-10 (2): sparseGeoHOPCA: A Geometric Solution to Sparse Higher-Order PCA Without Covariance Estimation

Title: sparseGeoHOPCA: A Geometric Solution to Sparse Higher-Order PCA Without Covariance Estimation spärlichGeoHOPCA: Eine geometrische Lösung, um PCA ohne Kovarianzabschätzung höher zu bestellen 稀多GeoHOPCA:无变化估计的普通五氯苯甲醚的几何解决办法 2506.08670v1

Authors (6): Renjie Xu, Chong Wu, Maolin Che, Zhuoheng Ran, Yimin Wei, Hong Yan

We propose sparseGeoHOPCA, a novel framework for sparse higher-order principal component analysis (SHOPCA) that introduces a geometric perspective to high-dimensional tensor decomposition. By unfolding the input tensor along each mode and reformulating the resulting subproblems as structured binary linear optimization problems, our method transforms the original nonconvex sparse objective into a tractable geometric form. This eliminates the need for explicit covariance estimation and iterative deflation, enabling significant gains in both computational efficiency and interpretability, particularly in high-dimensional and unbalanced data scenarios. We theoretically establish the equivalence between the geometric subproblems and the original SHOPCA formulation, and derive worst-case approximation error bounds based on classical PCA residuals, providing data-dependent performance guarantees. The proposed algorithm achieves a total computational complexity of $O\left(\sum_{n=1}^{N} (k_n^3 + J_n k_n^2)\right)$, which scales linearly with tensor size. Extensive experiments demonstrate that sparseGeoHOPCA accurately recovers sparse supports in synthetic settings, preserves classification performance under 10$\times$ compression, and achieves high-quality image reconstruction on ImageNet, highlighting its robustness and versatility.



Article 748

Title: Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search Mit Blueprints und Prompt Template Search die Fähigkeit kleiner Sprachmodelle zu verbessern 提高有蓝图和快速模板搜索的小型语言模式的理性能力 2506.08669v1

Authors (10): Dongge Han, Menglin Xia, Daniel Madrigal Diaz, Samuel Kessler, Ankur Mallick, Xuchao Zhang, Mirian Del Carmen Hipolito Garcia, Jin Xu, Victor Rühle, Saravan Rajmohan

Small language models (SLMs) offer promising and efficient alternatives to large language models (LLMs). However, SLMs’ limited capacity restricts their reasoning capabilities and makes them sensitive to prompt variations. To address these challenges, we propose a novel framework that enhances SLM reasoning capabilities through LLM-generated blueprints. The blueprints provide structured, high-level reasoning guides that help SLMs systematically tackle related problems. Furthermore, our framework integrates a prompt template search mechanism to mitigate the SLMs’ sensitivity to prompt variations. Our framework demonstrates improved SLM performance across various tasks, including math (GSM8K), coding (MBPP), and logic reasoning (BBH). Our approach improves the reasoning capabilities of SLMs without increasing model size or requiring additional training, offering a lightweight and deployment-friendly solution for on-device or resource-constrained environments.



Article 749

Title@2025-06-10 (2): Optimizing Learned Image Compression on Scalar and Entropy-Constraint Quantization

Title: Optimizing Learned Image Compression on Scalar and Entropy-Constraint Quantization Optimierung der Bildkompression auf Scalar- und Entropie-Konstraint-Quantisierung 优化 Scalar 和 Entropy- Contrpy- 约束量化的学习图像压缩 2506.08662v1

Authors (6): Florian Borzechowski, Michael Schäfer, Heiko Schwarz, Jonathan Pfaff, Detlev Marpe, Thomas Wiegand

The continuous improvements in image compression with variational autoencoders have led to learned codecs competitive with conventional approaches in terms of rate-distortion efficiency. Nonetheless, taking the quantization into account during the training process remains a problem, since it produces zero derivatives almost everywhere and needs to be replaced with a differentiable approximation which allows end-to-end optimization. Though there are different methods for approximating the quantization, none of them models the quantization noise correctly, and they thus result in suboptimal networks. Hence, we propose an additional finetuning training step: After conventional end-to-end training, parts of the network are retrained on quantized latents obtained at the inference stage. For entropy-constraint quantizers like Trellis-Coded Quantization, the impact of the quantizer is particularly difficult to approximate by rounding or adding noise as the quantized latents are interdependently chosen through a trellis search based on both the entropy model and a distortion measure. We show that retraining on correctly quantized data consistently yields additional coding gain for both uniform scalar and especially for entropy-constraint quantization, without increasing inference complexity. For the Kodak test set, we obtain average savings between 1% and 2%, and for the TecNick test set up to 2.2% in terms of Bjøntegaard-Delta bitrate.
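
The mismatch between training-time approximations and real quantization can be made concrete with a small sketch. It assumes the common additive-uniform-noise proxy as the differentiable stand-in (the abstract mentions "rounding or adding noise" without fixing one), and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def quantize(latents: Sequence[float]) -> List[float]:
    """Inference-time uniform scalar quantization (Python's round() breaks ties to even)."""
    return [float(round(x)) for x in latents]


def noisy_proxy(latents: Sequence[float], rng: random.Random) -> List[float]:
    """Training-time stand-in: add uniform noise in [-0.5, 0.5] instead of rounding."""
    return [x + rng.uniform(-0.5, 0.5) for x in latents]


def proxy_mismatch(latents: Sequence[float], rng: random.Random) -> float:
    """Mean absolute gap between the noisy proxy and the actual quantized values."""
    noisy = noisy_proxy(latents, rng)
    exact = quantize(latents)
    return sum(abs(n - q) for n, q in zip(noisy, exact)) / len(latents)
```

The decoder sees `noisy_proxy` outputs during end-to-end training but `quantize` outputs at inference; retraining on the latter, as the abstract proposes, targets exactly this gap.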



Article 750

Title@2025-06-10 (2): Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness

Title: Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness Auf dem Weg zu robuster Echtzeit-Multivariate-Zeitreihenprognose: Ein einheitliches Framework für Abhängigkeit, Asynchronität und Missingness 走向强劲的现实世界多变时间序列预测:关于依赖、同步和失踪的统一框架 2506.08660v1

Authors (4): Jinkwan Jang, Hyungjin Park, Jinmyeong Choi, Taesup Kim

Real-world time series data are inherently multivariate, often exhibiting complex inter-channel dependencies. Each channel is typically sampled at its own period and is prone to missing values due to various practical and operational constraints. These characteristics pose fundamental challenges related to channel dependency, sampling asynchrony, and missingness, all of which must be addressed to enable robust and reliable forecasting in practical settings. However, most existing architectures are built on oversimplified assumptions, such as identical sampling periods across channels and fully observed inputs at test time, which often do not hold in real-world scenarios. To bridge this gap, we propose ChannelTokenFormer, a Transformer-based forecasting model with a flexible architecture designed to explicitly capture cross-channel interactions, accommodate channel-wise asynchronous sampling, and effectively handle missing values. Extensive experiments on three benchmark datasets modified to reflect practical settings, along with one real-world industrial dataset, demonstrate the superior robustness and accuracy of ChannelTokenFormer under challenging real-world conditions.



Article 751

Title@2025-06-10 (2): Integrating Asynchronous AdaBoost into Federated Learning: Five Real World Applications

Title: Integrating Asynchronous AdaBoost into Federated Learning: Five Real World Applications Integration von Asynchron AdaBoost in das Federated Learning: Fünf Anwendungen in der realen Welt 将AdaBoost亚洲同步体纳入联邦学习:五个真实世界应用 2506.09090v1

Authors (2): Arthur Oghlukyan, Nuria Gomez Blas

This paper presents a comprehensive analysis of an enhanced asynchronous AdaBoost framework for federated learning (FL), focusing on its application across five distinct domains: computer vision on edge devices, blockchain-based model transparency, on-device mobile personalization, IoT anomaly detection, and federated healthcare diagnostics. The proposed algorithm incorporates adaptive communication scheduling and delayed weight compensation to reduce synchronization frequency and communication overhead while preserving or improving model accuracy. We examine how these innovations improve communication efficiency, scalability, convergence, and robustness in each domain. Comparative metrics including training time, communication overhead, convergence iterations, and classification accuracy are evaluated using data and estimates derived from Oghlukyan’s enhanced AdaBoost framework. Empirical results show, for example, training time reductions on the order of 20-35% and communication overhead reductions of 30-40% compared to baseline AdaBoost, with convergence achieved in significantly fewer boosting rounds. Tables and charts summarize these improvements by domain. Mathematical formulations of the adaptive scheduling rule and error-driven synchronization thresholds are provided. Overall, the enhanced AdaBoost exhibits markedly improved efficiency and robustness across diverse FL scenarios, suggesting broad applicability of the approach.



Article 752

Title@2025-06-10 (2): Scalable Graph Attention-based Instance Selection via Mini-Batch Sampling and Hierarchical Hashing

Title: Scalable Graph Attention-based Instance Selection via Mini-Batch Sampling and Hierarchical Hashing Skalierbare Grafik aufmerksamkeitsbasierte Instanzenauswahl über Mini-Batch-Sampling und Hierarchical Hashing 通过小型批量抽样和等级类散列选择基于可缩放图表的注意点选案 2502.20293v2

Authors (3): Zahiriddin Rustamov, Ayham Zaitouny, Nazar Zaki

Instance selection (IS) addresses the critical challenge of reducing dataset size while keeping informative characteristics, becoming increasingly important as datasets grow to millions of instances. Current IS methods often struggle to capture complex relationships in high-dimensional spaces and to scale to large datasets. This paper introduces a graph attention-based instance selection (GAIS) method that uses attention mechanisms to identify informative instances through their structural relationships in graph representations. We present two approaches for scalable graph construction: a distance-based mini-batch sampling technique that achieves dataset-size-independent complexity through strategic batch processing, and a hierarchical hashing approach that enables efficient similarity computation through random projections. The mini-batch approach preserves class distributions through stratified sampling, while the hierarchical hashing method captures relationships at multiple granularities through single-level, multi-level, and multi-view variants. Experiments across 39 datasets show that GAIS achieves reduction rates above 96% while maintaining or improving model performance relative to state-of-the-art IS methods. The findings show that the distance-based mini-batch approach offers the best efficiency for large-scale datasets, while multi-view variants excel on complex, high-dimensional data, demonstrating that attention-based importance scoring can effectively identify instances important for maintaining decision boundaries while avoiding computationally prohibitive pairwise comparisons.
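
Hashing through random projections can be sketched with the standard random-hyperplane scheme. This shows a single hashing level only and is not the paper's hierarchical method; the Gaussian hyperplanes, the bit count, and all function names are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(vec: Sequence[float], hyperplanes: List[List[float]]) -> Tuple[int, ...]:
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vec, plane)) >= 0.0 else 0
        for plane in hyperplanes
    )


def bucket_instances(
    data: Sequence[Sequence[float]], hyperplanes: List[List[float]]
) -> Dict[Tuple[int, ...], List[int]]:
    """Group instance indices by hash code; only same-bucket pairs need comparing."""
    buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, vec in enumerate(data):
        buckets[hash_vector(vec, hyperplanes)].append(idx)
    return dict(buckets)
```

Vectors pointing in similar directions tend to share a bucket, so graph edges can be built within buckets instead of over all pairs. Using fewer bits gives coarser buckets, which is one plausible way to obtain multiple granularities.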



Article 753

Title@2025-06-10 (2): When Simple Model Just Works: Is Network Traffic Classification in Crisis?

Title: When Simple Model Just Works: Is Network Traffic Classification in Crisis? Wenn einfaches Modell gerade funktioniert: Ist Netzwerkverkehrsklassifikation in der Krise? ” 简单模型 “ 何时起作用:网络交通分类是否处于危机之中? 2506.08655v1

Authors (6): Kamil Jerabek, Jan Luxemburk, Richard Plny, Josef Koumar, Jaroslav Pesek, Karel Hynek

Machine learning has been applied to network traffic classification (TC) for over two decades. While early efforts used shallow models, the late 2010s saw a shift toward complex neural networks, often reporting near-perfect accuracy. However, it was recently revealed that a simple k-NN baseline using packet sequence metadata (sizes, times, and directions) can be on par with or even outperform more complex methods. In this paper, we investigate this phenomenon further, evaluate this baseline across 12 datasets and 15 TC tasks, and examine why it performs so well. Our analysis shows that most datasets contain over 50% redundant samples (identical packet sequences), which frequently appear in both training and test sets due to common splitting practices. This redundancy can lead to overestimated model performance and reduce the theoretical maximum accuracy when identical flows have conflicting labels. Given its distinct characteristics, we further argue that standard machine learning practices adapted from domains like NLP or computer vision may be ill-suited for TC. Finally, we propose new directions for task formulation and evaluation to address these challenges and help realign the field.
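
The three effects named in the abstract (redundant samples, train/test overlap, and an accuracy ceiling from conflicting labels) can each be measured directly. The sketch below assumes each flow is represented as a hashable tuple of packet metadata; the representation and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple

Flow = Tuple[Hashable, ...]


def redundant_fraction(flows: Sequence[Flow]) -> float:
    """Fraction of samples that repeat an earlier identical packet sequence."""
    if not flows:
        return 0.0
    return 1.0 - len(set(flows)) / len(flows)


def leaked_fraction(train: Sequence[Flow], test: Sequence[Flow]) -> float:
    """Fraction of test samples whose packet sequence also occurs in training."""
    if not test:
        return 0.0
    seen = set(train)
    return sum(1 for flow in test if flow in seen) / len(test)


def accuracy_ceiling(flows: Sequence[Flow], labels: Sequence[str]) -> float:
    """Best accuracy any classifier that maps each sequence to one label can reach."""
    if not flows:
        return 1.0
    per_flow: Dict[Flow, Counter] = defaultdict(Counter)
    for flow, label in zip(flows, labels):
        per_flow[flow][label] += 1
    # For each distinct sequence, only its majority label can be predicted correctly.
    best = sum(max(counts.values()) for counts in per_flow.values())
    return best / len(flows)
```

A random split applied to a dataset with a high `redundant_fraction` will typically produce a high `leaked_fraction`, which is how a nearest-neighbor baseline can look strong without generalizing.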



Article 754

Title@2025-06-10 (2): A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck

Title: A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck Ein Privacy-Preserving Föderated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck 保护隐私的联邦学习框架,以普及CBCT, 将合成CT翻译在头部和颈部 2506.08654v1

Authors (3): Ciro Benito Raggio, Paolo Zaffino, Maria Francesca Spadea

Shortened abstract: Cone-beam computed tomography (CBCT) has become a widely adopted modality for image-guided radiotherapy (IGRT). However, CBCT suffers from increased noise, limited soft-tissue contrast, and artifacts, resulting in unreliable Hounsfield unit values and hindering direct dose calculation. Synthetic CT (sCT) generation from CBCT addresses these issues, especially using deep learning (DL) methods. Existing approaches are limited by institutional heterogeneity, scanner-dependent variations, and data privacy regulations that prevent multi-center data sharing. To overcome these challenges, we propose a cross-silo horizontal federated learning (FL) approach for CBCT-to-sCT synthesis in the head and neck region, extending our FedSynthCT framework. A conditional generative adversarial network was collaboratively trained on data from three European medical centers in the public SynthRAD2025 challenge dataset. The federated model demonstrated effective generalization across centers, with mean absolute error (MAE) ranging from $64.38\pm13.63$ to $85.90\pm7.10$ HU, structural similarity index (SSIM) from $0.882\pm0.022$ to $0.922\pm0.039$, and peak signal-to-noise ratio (PSNR) from $32.86\pm0.94$ to $34.91\pm1.04$ dB. Notably, on an external validation dataset of 60 patients, comparable performance was achieved (MAE: $75.22\pm11.81$ HU, SSIM: $0.904\pm0.034$, PSNR: $33.52\pm2.06$ dB) without additional training, confirming robust generalization despite protocol and scanner differences and registration errors. These findings demonstrate the technical feasibility of FL for CBCT-to-sCT synthesis while preserving data privacy and offer a collaborative solution for developing generalizable models across institutions without centralized data sharing or site-specific fine-tuning.
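
For readers unfamiliar with the reported metrics, MAE and PSNR follow the standard definitions sketched below. The flat-list voxel representation and the `data_range` argument are assumptions of this example; the abstract does not state the intensity range or body mask used in the paper.

```python
import math
from typing import Sequence


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average absolute voxel difference, in the units of the inputs (HU for CT)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def psnr(pred: Sequence[float], target: Sequence[float], data_range: float) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    if mse == 0.0:
        return math.inf  # Identical images.
    return 10.0 * math.log10(data_range ** 2 / mse)
```

Lower MAE and higher PSNR both indicate a synthetic CT closer to the reference CT. PSNR values are only comparable when the same `data_range` is used.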



Article 755

Title@2025-06-10 (2): Compositional Causal Reasoning Evaluation in Language Models

Title: Compositional Causal Reasoning Evaluation in Language Models kompositorische Kausale Reasoning-Bewertung in Sprachmodellen 语文模式的构成原因原因原因评估 2503.04556v4

Authors (5): Jacqueline R. M. A. Maasch, Alihan Hüyük, Xinnuo Xu, Aditya V. Nori, Javier Gonzalez

Causal reasoning and compositional reasoning are two core aspirations in AI. Measuring the extent of these behaviors requires principled evaluation methods. We explore a unified perspective that considers both behaviors simultaneously, termed compositional causal reasoning (CCR): the ability to infer how causal measures compose and, equivalently, how causal quantities propagate through graphs. We instantiate a framework for the systematic evaluation of CCR for the average treatment effect and the probability of necessity and sufficiency. As proof of concept, we demonstrate CCR evaluation for language models in the Llama, Phi, and GPT families. On a math word problem, our framework revealed a range of taxonomically distinct error patterns. CCR errors increased with the complexity of causal paths for all models except o1.



Article 756

Title@2025-06-10 (2): JoFormer (Journey-based Transformer): Theory and Empirical Analysis on the Tiny Shakespeare Dataset

Title: JoFormer (Journey-based Transformer): Theory and Empirical Analysis on the Tiny Shakespeare Dataset JoFormer (Journal-basierter Transformer): Theorie und empirische Analyse auf dem winzigen Shakespeare-Datensatz Joformer(基于日圆的变换器):关于小莎士比亚数据集的理论和经验分析 2506.08652v1

Authors (1): Mahesh Godavarti

Transformers have demonstrated remarkable success in sequence modeling, yet effectively incorporating positional information remains a challenging and active area of research. In this paper, we introduce JoFormer, a journey-based Transformer architecture grounded in a recently proposed non-commutative algebra for composing transformations across positions. JoFormer represents relative positions through learnable directional transforms that are sequentially composed along the input, thereby extending and generalizing existing approaches based on relative position representations. We derive the JoFormer attention mechanism from first principles and show that it subsumes standard methods such as rotary transformations as special cases. To evaluate its effectiveness, we compare JoFormer to the RoFormer baseline on the Tiny Shakespeare character-level language modeling task. Our results demonstrate that JoFormer consistently achieves lower perplexity and faster convergence, highlighting the advantages of its more expressive, journey-based treatment of position. Notably, the per-token JoFormer is still a primitive, conceptual variant with layer-independent angles, yet it already demonstrates strong performance, underscoring its promise as a proof of concept for more expressive architectures. We conclude by discussing how JoFormer offers a principled approach to integrating positional structure into Transformer architectures. The code used in this work is available at https://github.com/mahesh-godavarti/joformer.
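
The claim that rotary transformations arise as a special case can be illustrated with 2D rotations. This sketch is not JoFormer's attention mechanism: it assumes each step between adjacent positions contributes one 2D rotation, and the names `rotation`, `matmul`, and `journey_transform` are hypothetical. Because 2D rotations commute, it shows only the rotary special case, not the general non-commutative setting.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rotation(theta: float) -> Matrix:
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def journey_transform(step_angles: Sequence[float]) -> Matrix:
    """Compose one transform per step, in order, along the path between two positions."""
    result: Matrix = [[1.0, 0.0], [0.0, 1.0]]
    for theta in step_angles:
        result = matmul(result, rotation(theta))
    return result
```

When every step uses the same angle, the composed transform over k steps equals a single rotation by k times that angle, which is the relative rotation a rotary embedding applies. Allowing the angle to vary per step or per token is where the journey-based view becomes more expressive.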



Article 757

Title@2025-06-10 (2): BridgeNet: A Hybrid, Physics-Informed Machine Learning Framework for Solving High-Dimensional Fokker-Planck Equations

Title: BridgeNet: A Hybrid, Physics-Informed Machine Learning Framework for Solving High-Dimensional Fokker-Planck Equations BridgeNet: Hybrides, physikinformiertes Machine Learning Framework zur Lösung hochdimensionaler Fokker-Planck-Gleichungen BridgeNet:用于解决高二分法克-普朗克赤道的混合、物理成形机械学习框架 2506.04354v3

Authors (3): Elmira Mirzabeigi, Rezvan Salehi, Kourosh Parand

BridgeNet is a novel hybrid framework that integrates convolutional neural networks with physics-informed neural networks to efficiently solve non-linear, high-dimensional Fokker-Planck equations (FPEs). Traditional PINNs, which typically rely on fully connected architectures, often struggle to capture complex spatial hierarchies and enforce intricate boundary conditions. In contrast, BridgeNet leverages adaptive CNN layers for effective local feature extraction and incorporates a dynamically weighted loss function that rigorously enforces physical constraints. Extensive numerical experiments across various test cases demonstrate that BridgeNet not only achieves significantly lower error metrics and faster convergence compared to conventional PINN approaches but also maintains robust stability in high-dimensional settings. This work represents a substantial advancement in computational physics, offering a scalable and accurate solution methodology with promising applications in fields ranging from financial mathematics to complex system dynamics.


Article 758

Title@2025-06-10 (2): TableDreamer: Progressive and Weakness-guided Data Synthesis from Scratch for Table Instruction Tuning

Title: TableDreamer: Progressive and Weakness-guided Data Synthesis from Scratch for Table Instruction Tuning TableDreamer: Progressive und schwache Datensynthese von Scratch für die Table Instruction Tuning 表Dreamer:Scratch为表格指示图纸提供的渐进和弱点制导数据合成 2506.08646v1

Authors (7): Mingyu Zheng, Zhifan Feng, Jia Wang, Lanrui Wang, Zheng Lin, Yang Hao, Weiping Wang

Despite the commendable progress of recent LLM-based data synthesis methods, they face two limitations in generating table instruction tuning data. First, they cannot thoroughly explore the vast input space of table understanding tasks, leading to limited data diversity. Second, they ignore the weaknesses in table understanding ability of the target LLM and blindly pursue the increase of data quantity, resulting in suboptimal data efficiency. In this paper, we introduce a progressive and weakness-guided data synthesis framework tailored for table instruction tuning, named TableDreamer, to mitigate the above issues. Specifically, we first synthesize diverse tables and related instructions as seed data, and then perform an iterative exploration of the input space under the guidance of the newly identified weakness data, which eventually serve as the final training data for fine-tuning the target LLM. Extensive experiments on 10 tabular benchmarks demonstrate the effectiveness of the proposed framework, which boosts the average accuracy of Llama3.1-8B-instruct by 11.62% (49.07% to 60.69%) with 27K GPT-4o synthetic data and outperforms state-of-the-art data synthesis baselines which use more training data. The code and data are available at https://github.com/SpursGoZmy/TableDreamer


Article 759

Title@2025-06-10 (2): Fusing Cross-modal and Uni-modal Representations: A Kronecker Product Approach

Title: Fusing Cross-modal and Uni-modal Representations: A Kronecker Product Approach Durchgängige cross-modale und uni-modale Darstellungen: Ein Kronecker-Produktansatz 引信的跨模式和统一模式代表:克罗马克产品方法 2506.08645v1

Authors (3): Youqi Wu, Jingwei Zhang, Farzan Farnia

Cross-modal embeddings, such as CLIP, BLIP and their variants, have achieved promising results in aligning representations across modalities. However, these embeddings could underperform compared to state-of-the-art single-modality embeddings on modality-specific tasks. On the other hand, single-modality embeddings excel in their domains but lack cross-modal alignment capabilities. In this work, we focus on the problem of unifying cross-modality and single-modality embeddings to achieve the performance of modality-expert embedding within individual modalities while preserving cross-modal alignment. To this end, we propose RP-KrossFuse, a method that leverages a random projection-based Kronecker product to integrate cross-modal embeddings with single-modality embeddings. RP-KrossFuse aims to fuse the sample-pairwise similarity scores of the fused embeddings and operates efficiently in a specified kernel space and supports scalable implementations via random Fourier features for shift-invariant kernels such as the Gaussian kernel. We demonstrate the effectiveness of RP-KrossFuse through several numerical experiments, combining CLIP embeddings with uni-modal image and text embeddings. Our numerical results indicate that RP-KrossFuse achieves competitive modality-specific performance while retaining cross-modal alignment, bridging the gap between cross-modal and single-modality embeddings.
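
The abstract does not spell out the construction, but one generic property makes a Kronecker product a natural way to fuse pairwise similarity scores: the inner product of two Kronecker products equals the product of the individual inner products. The sketch below only checks that identity on toy vectors. The embedding values and helper names are hypothetical, and the random projection step of RP-KrossFuse is omitted.

```python
def kron(u, v):
    """Kronecker product of two vectors, returned as a flat list."""
    return [ui * vj for ui in u for vj in v]


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy embeddings of two samples x and y.
cross_x, cross_y = [1.0, 2.0], [0.5, -1.0]        # cross-modal (CLIP-like)
uni_x, uni_y = [3.0, 0.0, 1.0], [1.0, 2.0, 2.0]   # uni-modal expert

fused_x = kron(cross_x, uni_x)
fused_y = kron(cross_y, uni_y)

# <a (x) b, a' (x) b'> == <a, a'> * <b, b'>
lhs = dot(fused_x, fused_y)
rhs = dot(cross_x, cross_y) * dot(uni_x, uni_y)
```

The fused vector has dimension equal to the product of the two input dimensions (6 here), which grows quickly for real embeddings and is one plausible motivation for a projection step.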


Article 760

Title@2025-06-10 (2): Semi-gradient DICE for Offline Constrained Reinforcement Learning

Title: Semi-gradient DICE for Offline Constrained Reinforcement Learning Halbgradienter DICE für Offline-Beschränktes Verstärkungslernen 用于非线外约束训练强化学习的半级DICE 2506.08644v1

Authors (4): Woosung Kim, JunHo Seo, Jongmin Lee, Byung-Jun Lee

Stationary Distribution Correction Estimation (DICE) addresses the mismatch between the stationary distribution induced by a policy and the target distribution required for reliable off-policy evaluation (OPE) and policy optimization. DICE-based offline constrained RL particularly benefits from the flexibility of DICE, as it simultaneously maximizes return while estimating costs in offline settings. However, we have observed that recent approaches designed to enhance the offline RL performance of the DICE framework inadvertently undermine its ability to perform OPE, making them unsuitable for constrained RL scenarios. In this paper, we identify the root cause of this limitation: their reliance on a semi-gradient optimization, which solves a fundamentally different optimization problem and results in failures in cost estimation. Building on these insights, we propose a novel method to enable OPE and constrained RL through semi-gradient DICE. Our method ensures accurate cost estimation and achieves state-of-the-art performance on the offline constrained RL benchmark, DSRL.


Article 761

Title@2025-06-10 (2): An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation

Title: An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation Eine Analyse von Hyperparameter-Optimierungsmethoden für die retrieval Augmented Generation 对回收增加的一代的超光子辐射仪最佳优化方法的分析 2505.03452v2

Authors (15): Matan Orbach, Ohad Eytan, Benjamin Sznajder, Ariel Gera, Odellia Boni, Yoav Kantor, Gal Bloch, Omri Levy, Hadas Abraham, Nitzan Barzilay, Eyal Shnarch, Michael E. Factor, Shila Ofek-Koifman, Paula Ta-Shma, Assaf Toledo

Finding the optimal Retrieval-Augmented Generation (RAG) configuration for a given use case can be complex and expensive. Motivated by this challenge, frameworks for RAG hyper-parameter optimization (HPO) have recently emerged, yet their effectiveness has not been rigorously benchmarked. To address this gap, we present a comprehensive study involving 5 HPO algorithms over 5 datasets from diverse domains, including a new one collected for this work on real-world product documentation. Our study explores the largest HPO search space considered to date, with three evaluation metrics as optimization targets. Analysis of the results shows that RAG HPO can be done efficiently, either greedily or with random search, and that it significantly boosts RAG performance for all datasets. For greedy HPO approaches, we show that optimizing model selection first is preferable to the prevalent practice of optimizing according to RAG pipeline order.


Article 762

Title@2025-06-10 (2): Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers

Title: Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers Zeitreihendarstellungen für die Klassifizierung liegen in vorgebildeten Visionstransformatoren versteckt 未受过训练的愿景变异者中隐藏的分类隐隐藏物的 时间序列 2506.08641v1

Authors (5): Simon Roschmann, Quentin Bouniot, Vasilii Feofanov, Ievgen Redko, Zeynep Akata

Time series classification is a fundamental task in healthcare and industry, yet the development of time series foundation models (TSFMs) remains limited by the scarcity of publicly available time series datasets. In this work, we propose Time Vision Transformer (TiViT), a framework that converts time series into images to leverage the representational power of frozen Vision Transformers (ViTs) pretrained on large-scale image datasets. First, we theoretically motivate our approach by analyzing the 2D patching of ViTs for time series, showing that it can increase the number of label-relevant tokens and reduce the sample complexity. Second, we empirically demonstrate that TiViT achieves state-of-the-art performance on standard time series classification benchmarks by utilizing the hidden representations of large OpenCLIP models. We explore the structure of TiViT representations and find that intermediate layers with high intrinsic dimension are the most effective for time series classification. Finally, we assess the alignment between TiViT and TSFM representation spaces and identify a strong complementarity, with further performance gains achieved by combining their features. Our findings reveal yet another direction for reusing vision representations in a non-visual domain.


Article 763

Title@2025-06-10 (2): CASE: Contrastive Activation for Saliency Estimation

Title: CASE: Contrastive Activation for Saliency Estimation RECHTSSACHE: Kontrastive Aktivierung für Salienzschätzung CASE: 弹性估计的反动动动动 2506.07327v2

Authors (3): Dane Williamson, Yangfeng Ji, Matthew Dwyer

Saliency methods are widely used to visualize which input features are deemed relevant to a model’s prediction. However, their visual plausibility can obscure critical limitations. In this work, we propose a diagnostic test for class sensitivity: a method’s ability to distinguish between competing class labels on the same input. Through extensive experiments, we show that many widely used saliency methods produce nearly identical explanations regardless of the class label, calling into question their reliability. We find that class-insensitive behavior persists across architectures and datasets, suggesting the failure mode is structural rather than model-specific. Motivated by these findings, we introduce CASE, a contrastive explanation method that isolates features uniquely discriminative for the predicted class. We evaluate CASE using the proposed diagnostic and a perturbation-based fidelity test, and show that it produces faithful and more class-specific explanations than existing methods.


Article 764

Title@2025-06-10 (2): Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth

Title: Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth Effizientes Feintuning Quantisierter Modelle über adaptive Rank und Bitwidth 通过适应级和Bitwidth对量化模型进行高效率的微调 2505.03802v3

Authors (6): Changhai Zhou, Shijie Han, Shiyang Zhang, Yuhua Zhou, Weizhong Zhang, Cheng Jin

QLoRA effectively combines low-bit quantization and LoRA to achieve memory-friendly fine-tuning for large language models (LLMs). Recently, methods based on SVD for continuous update iterations to initialize LoRA matrices to accommodate quantization errors have generally failed to consistently improve performance. Dynamic mixed precision is a natural idea for continuously improving the fine-tuning performance of quantized models, but previous methods often optimize low-rank subspaces or quantization components separately, without considering their synergy. To address this, we propose QR-Adaptor, a unified, gradient-free strategy that uses partial calibration data to jointly search the quantization components and the rank of low-rank spaces for each layer, thereby continuously improving model performance. QR-Adaptor does not minimize quantization error but treats precision and rank allocation as a discrete optimization problem guided by actual downstream performance and memory usage. Compared to state-of-the-art (SOTA) quantized LoRA fine-tuning methods, our approach achieves a 4.89% accuracy improvement on GSM8K, and in some cases even outperforms the 16-bit fine-tuned model while maintaining the memory footprint of the 4-bit setting.


Article 765

Title@2025-06-10 (2): Discovering Continuous-Time Memory-Based Symbolic Policies using Genetic Programming

Title: Discovering Continuous-Time Memory-Based Symbolic Policies using Genetic Programming Kontinuierliche-Zeit-Speicher-basierte symbolische Richtlinien mit genetischer Programmierung entdecken 利用遗传方案拟定办法发现基于连续时间记忆的象征性政策 2406.02765v6

Authors (3): Sigur de Vries, Sander Keemink, Marcel van Gerven

Artificial intelligence techniques are increasingly being applied to solve control problems, but often rely on black-box methods without transparent output generation. To improve the interpretability and transparency in control systems, models can be defined as white-box symbolic policies described by mathematical expressions. For better performance in partially observable and volatile environments, the symbolic policies are extended with memory represented by continuous-time latent variables, governed by differential equations. Genetic programming is used for optimisation, resulting in interpretable policies consisting of symbolic expressions. Our results show that symbolic policies with memory perform comparably to black-box policies on a variety of control tasks. Furthermore, the benefit of the memory in symbolic policies is demonstrated in experiments where memory-less policies fall short. Overall, we present a method for evolving high-performing symbolic policies that offer interpretability and transparency, which black-box models lack.


Article 766

Title@2025-06-10 (2): HSG-12M: A Large-Scale Spatial Multigraph Dataset

Title: HSG-12M: A Large-Scale Spatial Multigraph Dataset HSG-12M: Ein großräumiger multigraphischer Datensatz HSG-12M:大型空间多光谱数据集 2506.08618v1

Authors (5): Xianquan Yan, Hakan Akgün, Kenji Kawaguchi, N. Duane Loh, Ching Hua Lee

Existing graph benchmarks assume non-spatial, simple edges, collapsing physically distinct paths into a single link. We introduce HSG-12M, the first large-scale dataset of spatial multigraphs, i.e., graphs embedded in a metric space where multiple geometrically distinct trajectories between two nodes are retained as separate edges. HSG-12M contains 11.6 million static and 5.1 million dynamic Hamiltonian spectral graphs across 1401 characteristic-polynomial classes, derived from 177 TB of spectral potential data. Each graph encodes the full geometry of a 1-D crystal’s energy spectrum on the complex plane, producing diverse, physics-grounded topologies that transcend conventional node-coordinate datasets. To enable future extensions, we release Poly2Graph: a high-performance, open-source pipeline that maps arbitrary 1-D crystal Hamiltonians to spectral graphs. Benchmarks with popular GNNs expose new challenges in learning from multi-edge geometry at scale. Beyond its practical utility, we show that spectral graphs serve as universal topological fingerprints of polynomials, vectors, and matrices, forging a new algebra-to-graph link. HSG-12M lays the groundwork for geometry-aware graph learning and new opportunities for data-driven scientific discovery in condensed matter physics and beyond.


Article 767

Title@2025-06-10 (2): Predicting and Understanding College Student Mental Health with Interpretable Machine Learning

Title: Predicting and Understanding College Student Mental Health with Interpretable Machine Learning Vorhersagen und Verständnis College Student Mental Health mit Interpretable Machine Learning 预测和理解学院学生心理健康与可解释机器学习 2503.08002v3

Authors (5): Meghna Roy Chowdhury, Wei Xuan, Shreyas Sen, Yixue Zhao, Yi Ding

Mental health issues among college students have reached critical levels, significantly impacting academic performance and overall wellbeing. Predicting and understanding mental health status among college students is challenging due to three main factors: the necessity for large-scale longitudinal datasets, the prevalence of black-box machine learning models lacking transparency, and the tendency of existing approaches to provide aggregated insights at the population level rather than individualized understanding. To tackle these challenges, this paper presents I-HOPE, the first Interpretable Hierarchical mOdel for Personalized mEntal health prediction. I-HOPE is a two-stage hierarchical model that connects raw behavioral features to mental health status through five defined behavioral categories as interaction labels. We evaluate I-HOPE on the College Experience Study, the longest longitudinal mobile sensing dataset. This dataset spans five years and captures data from both pre-pandemic periods and the COVID-19 pandemic. I-HOPE achieves a prediction accuracy of 91%, significantly surpassing the 60-70% accuracy of baseline methods. In addition, I-HOPE distills complex patterns into interpretable and individualized insights, enabling the future development of tailored interventions and improving mental health support. The code is available at https://github.com/roycmeghna/I-HOPE.


Article 768

Title@2025-06-10 (2): Generalizing while preserving monotonicity in comparison-based preference learning models

Title: Generalizing while preserving monotonicity in comparison-based preference learning models Verallgemeinern unter Wahrung der Monotonizität in vergleichsbasierten Präferenz-Lernmodellen 在以比较为基础的优惠学习模式中,在保持单一性的同时,实现普遍化 2506.08616v1

Authors (4): Julien Fageot, Peva Blanchard, Gilles Bareilles, Lê-Nguyên Hoang

If you tell a learning model that you prefer an alternative $a$ over another alternative $b$, then you probably expect the model to be monotone, that is, the valuation of $a$ increases, and that of $b$ decreases. Yet, perhaps surprisingly, many widely deployed comparison-based preference learning models, including large language models, fail to have this guarantee. Until now, the only comparison-based preference learning algorithms that were proved to be monotone are the Generalized Bradley-Terry models. Yet, these models are unable to generalize to uncompared data. In this paper, we advance the understanding of the set of models with generalization ability that are monotone. Namely, we propose a new class of Linear Generalized Bradley-Terry models with Diffusion Priors, and identify sufficient conditions on alternatives’ embeddings that guarantee monotonicity. Our experiments show that this monotonicity is far from being a general guarantee, and that our new class of generalizing models improves accuracy, especially when the dataset is limited.
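
To make the monotonicity property concrete, here is a minimal sketch using the plain Bradley-Terry model with one free score per alternative, where P(a beats b) = sigmoid(score_a - score_b). It shows only a single gradient step, which is a simpler statement than the monotonicity of the fitted model studied in the paper; the function names and learning rate are illustrative assumptions, and the proposed linear models with diffusion priors are not implemented here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bt_update(scores, winner, loser, lr=0.1):
    """One gradient-ascent step on log P(winner beats loser)."""
    p_win = sigmoid(scores[winner] - scores[loser])
    step = lr * (1.0 - p_win)  # strictly positive for finite scores
    new_scores = dict(scores)
    new_scores[winner] += step  # d/d(score_winner) log P = 1 - p_win
    new_scores[loser] -= step   # d/d(score_loser)  log P = -(1 - p_win)
    return new_scores


before = {"a": 0.0, "b": 0.0, "c": 1.0}
after = bt_update(before, winner="a", loser="b")
```

The score of `a` rises and the score of `b` falls, while the uncompared alternative `c` is left untouched, which illustrates why per-alternative scores alone cannot generalize to uncompared data.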


Article 769

Title@2025-06-10 (2): PatchTrAD: A Patch-Based Transformer focusing on Patch-Wise Reconstruction Error for Time Series Anomaly Detection

Title: PatchTrAD: A Patch-Based Transformer focusing on Patch-Wise Reconstruction Error for Time Series Anomaly Detection PatchTrad: Ein Patch-basierter Transformer mit Schwerpunkt auf Patch-Wise-Reconstruction-Fehler für Zeitreihenanomalienerkennung PatchTraAD:一个以补丁为主的变换器,侧重于用于反常探测的时间序列的补补维重建错误 2504.08827v2

Authors (3): Samy-Melwan Vilhes, Gilles Gasso, Mokhtar Z Alaya

Time series anomaly detection (TSAD) focuses on identifying whether observations in streaming data deviate significantly from normal patterns. With the prevalence of connected devices, anomaly detection on time series has become paramount, as it enables real-time monitoring and early detection of irregular behaviors across various application domains. In this work, we introduce PatchTrAD, a Patch-based Transformer model for time series anomaly detection. Our approach leverages a Transformer encoder along with the use of patches under a reconstruction-based framework for anomaly detection. Empirical evaluations on multiple benchmark datasets show that PatchTrAD is on par, in terms of detection performance, with state-of-the-art deep learning models for anomaly detection while being time efficient during inference.
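
The idea of scoring anomalies by patch-wise reconstruction error can be sketched without any model at all. In the example below the "reconstruction" is a hand-written stand-in for the output of a trained encoder, and the helper names, patch length, and data are all hypothetical; only the scoring step is illustrated, not the PatchTrAD architecture.

```python
def patchify(series, patch_len):
    """Split a series into non-overlapping patches (a ragged tail is dropped)."""
    n_patches = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]


def patch_errors(series, reconstruction, patch_len):
    """Mean squared reconstruction error of each patch."""
    errors = []
    for x, x_hat in zip(patchify(series, patch_len),
                        patchify(reconstruction, patch_len)):
        errors.append(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / patch_len)
    return errors


series = [1.0, 1.0, 1.0, 1.0, 1.0, 9.0, 1.0, 1.0]
# Stand-in for a trained model's output: a flat "normal" signal.
reconstruction = [1.0] * len(series)

scores = patch_errors(series, reconstruction, patch_len=4)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

Here the second patch contains the spike, so it receives the larger error and would be flagged once the score exceeds a chosen threshold.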


Article 770

Title@2025-06-10 (2): A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices

Title: A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices Ein zweistufiges Datenauswahl-Framework für dateneffiziente Modellschulungen auf Edge-Geräten 边缘装置数据有效模型培训两阶段数据选择框架 2505.16563v2

Authors (4): Chen Gong, Rui Xing, Zhenzhe Zheng, Fan Wu

The demand for machine learning (ML) model training on edge devices is escalating due to data privacy and personalized service needs. However, we observe that current on-device model training is hampered by the under-utilization of on-device data, due to low training throughput, limited storage and diverse data importance. To improve data resource utilization, we propose a two-stage data selection framework, Titan, to select the most important data batch from streaming data for model training with guaranteed efficiency and effectiveness. Specifically, in the first stage, Titan filters out a candidate dataset with potentially high importance in a coarse-grained manner. In the second stage of fine-grained selection, we propose a theoretically optimal data selection strategy to identify the data batch with the highest model performance improvement in the current training round. To further enhance time-and-resource efficiency, Titan leverages a pipeline to co-execute data selection and model training, and avoids resource conflicts by exploiting idle computing resources. We evaluate Titan on real-world edge devices and three representative edge computing tasks with diverse models and data modalities. Empirical results demonstrate that Titan achieves up to 43% reduction in training time and 6.2% increase in final accuracy with minor system overhead, such as data processing delay, memory footprint and energy consumption.


Article 771

Title@2025-06-10 (2): CENTAUR: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference

Title: CENTAUR: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference CENTAUR: Die Unmögliche Dreifaltigkeit von Datenschutz, Effizienz und Leistung in der Datenschutz-Bewahrung Transformer-Schlussfolgerung CENTAUR:在隐私保护变换人推断中弥合隐私、效率和性能这三方面不可能实现的三重问题 2412.10652v2

Authors (9): Jinglong Luo, Guanzhong Chen, Yehong Zhang, Shiyu Liu, Hui Wang, Yue Yu, Xun Zhou, Yuan Qi, Zenglin Xu

With the growing deployment of pre-trained models like Transformers on cloud platforms, privacy concerns about model parameters and inference data are intensifying. Existing Privacy-Preserving Transformer Inference (PPTI) frameworks face the “impossible trinity” of balancing privacy, efficiency, and performance: Secure Multi-Party Computation (SMPC)-based approaches ensure strong privacy but suffer from high computational overhead and performance losses; conversely, permutation-based methods achieve near-plaintext efficiency and accuracy but compromise privacy by exposing sensitive model parameters and intermediate results. Bridging this gap with a single approach presents substantial challenges, motivating the introduction of CENTAUR, a groundbreaking PPTI framework that seamlessly integrates random permutations and SMPC to address the “impossible trinity”. By designing efficient PPTI algorithms tailored to the structural properties of Transformer models, CENTAUR achieves an unprecedented balance among privacy, efficiency, and performance. Our experiments demonstrate CENTAUR’s ability to resist diverse data reconstruction attacks, achieve plaintext-level inference accuracy, and boost inference speed by 5.0-30.4 times, unlocking new possibilities for secure and efficient AI deployment.


Article 772

Title@2025-06-10 (2): Spiking Neural Models for Decision-Making Tasks with Learning

Title: Spiking Neural Models for Decision-Making Tasks with Learning Spiralische Neuralmodelle für Entscheidungs-Making-Aufgaben mit Lernen 学习决策任务Spiking神经模型 2506.09087v1

Authors (4): Sophie Jaffard, Giulia Mezzadri, Patricia Reynaud-Bouret, Etienne Tanré

In cognition, response times and choices in decision-making tasks are commonly modeled using Drift Diffusion Models (DDMs), which describe the accumulation of evidence for a decision as a stochastic process, specifically a Brownian motion, with the drift rate reflecting the strength of the evidence. In the same vein, the Poisson counter model describes the accumulation of evidence as discrete events whose counts over time are modeled as Poisson processes, and has a spiking-neuron interpretation, as these processes are used to model neuronal activity. However, these models lack a learning mechanism and are limited to tasks where participants have prior knowledge of the categories. To bridge the gap between cognitive and biological models, we propose a biologically plausible Spiking Neural Network (SNN) model for decision-making that incorporates a learning mechanism and whose neuronal activities are modeled by a multivariate Hawkes process. First, we show a coupling result between the DDM and the Poisson counter model, establishing that these two models provide similar categorizations and reaction times and that the DDM can be approximated by spiking Poisson neurons. To go further, we show that a particular DDM with correlated noise can be derived from a Hawkes network of spiking neurons governed by a local learning rule. In addition, we designed an online categorization task to evaluate the model predictions. This work provides a significant step toward integrating biologically relevant neural mechanisms into cognitive models, fostering a deeper understanding of the relationship between neural activity and behavior.
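
As a rough illustration of the Poisson counter model mentioned above, the sketch below assumes the usual race formulation: each category has an independent Poisson counter, and the first counter to reach a fixed threshold determines both the choice and the reaction time. The rates, threshold, and function name are hypothetical, and neither the learning rule nor the Hawkes network from the paper is modeled.

```python
import random


def poisson_race(rates, threshold, rng):
    """Simulate one trial; return (choice, reaction_time)."""
    counts = [0] * len(rates)
    t = 0.0
    total = sum(rates)
    while True:
        # Superposition of independent Poisson processes: the next event
        # arrives after an Exp(total) wait and belongs to counter i with
        # probability rates[i] / total.
        t += rng.expovariate(total)
        i = rng.choices(range(len(rates)), weights=rates)[0]
        counts[i] += 1
        if counts[i] >= threshold:
            return i, t


rng = random.Random(0)
trials = [poisson_race([10.0, 1.0], threshold=5, rng=rng) for _ in range(200)]
frac_choice_0 = sum(1 for choice, _ in trials if choice == 0) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
```

With a tenfold difference in rates, the counter with stronger evidence wins almost every trial, and raising the threshold trades longer reaction times for fewer errors.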


Article 773

Title@2025-06-10 (2): Sample Efficient Demonstration Selection for In-Context Learning

Title: Sample Efficient Demonstration Selection for In-Context Learning Beispiel Effiziente Demonstrationsauswahl für das In-Context-Lernen 内部学习的高效示范选择 2506.08607v1

Authors (4): Kiran Purohit, V Venktesh, Sourangshu Bhattacharya, Avishek Anand

The in-context learning paradigm with LLMs has been instrumental in advancing a wide range of natural language processing tasks. The selection of few-shot examples (exemplars / demonstration samples) is essential for constructing effective prompts under context-length budget constraints. In this paper, we formulate the exemplar selection task as a top-m best arms identification problem. A key challenge in this setup is the exponentially large number of arms that need to be evaluated to identify the m-best arms. We propose CASE (Challenger Arm Sampling for Exemplar selection), a novel sample-efficient selective exploration strategy that maintains a shortlist of “challenger” arms, which are current candidates for the top-m arms. In each iteration, only one of the arms from this shortlist or the current top-m set is pulled, thereby reducing sample complexity and, consequently, the number of LLM evaluations. Furthermore, we model the scores of exemplar subsets (arms) using a parameterized linear scoring function, leading to a stochastic linear bandit setting. CASE achieves remarkable efficiency gains of up to 7x speedup in runtime while requiring 7x fewer LLM calls (87% reduction) without sacrificing performance compared to state-of-the-art exemplar selection methods. We release our code and data at https://github.com/kiranpurohit/CASE


Article 774

Title@2025-06-10 (2): Flow Matching Meets PDEs: A Unified Framework for Physics-Constrained Generation

Title: Flow Matching Meets PDEs: A Unified Framework for Physics-Constrained Generation Flow Matching trifft auf PDEs: Ein einheitliches Framework für physikbedingte Generationen 流动匹配符合PDEs:受物理约束的一代的统一框架 2506.08604v1

Authors (4): Giacomo Baldan, Qiang Liu, Alberto Guardone, Nils Thuerey

Generative machine learning methods, such as diffusion models and flow matching, have shown great potential in modeling complex system behaviors and building efficient surrogate models. However, these methods typically learn the underlying physics implicitly from data. We propose Physics-Based Flow Matching (PBFM), a novel generative framework that explicitly embeds physical constraints, both PDE residuals and algebraic relations, into the flow matching objective. We also introduce temporal unrolling at training time that improves the accuracy of the final, noise-free sample prediction. Our method jointly minimizes the flow matching loss and the physics-based residual loss without requiring hyperparameter tuning of their relative weights. Additionally, we analyze the role of the minimum noise level, $\sigma_{\min}$, in the context of physical constraints and evaluate a stochastic sampling strategy that helps to reduce physical residuals. Through extensive benchmarks on three representative PDE problems, we show that our approach yields physical residuals up to $8\times$ more accurate than those of FM, while clearly outperforming existing algorithms in terms of distributional accuracy. PBFM thus provides a principled and efficient framework for surrogate modeling, uncertainty quantification, and accelerated simulation in physics and engineering applications.
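
For orientation, the flow matching loss and the role of $\sigma_{\min}$ can be written out for the commonly used optimal-transport conditional path. This is the standard formulation from the flow matching literature, given here as an assumption; the abstract does not state which path PBFM uses or how the physics-based residual term is combined with it.

```latex
% Interpolation between noise x_0 and data x_1 with minimum noise level \sigma_{\min}
x_t = \bigl(1 - (1 - \sigma_{\min})\,t\bigr)\,x_0 + t\,x_1,
\qquad x_0 \sim \mathcal{N}(0, I),\quad x_1 \sim p_{\text{data}},\quad t \sim \mathcal{U}[0, 1]

% Regression of the learned velocity field v_\theta onto the conditional target velocity
\mathcal{L}_{\text{FM}}(\theta)
  = \mathbb{E}_{t,\,x_0,\,x_1}
    \bigl\| v_\theta(x_t, t) - \bigl(x_1 - (1 - \sigma_{\min})\,x_0\bigr) \bigr\|^2
```

At $t = 1$ the sample is $x_1 + \sigma_{\min} x_0$, so a nonzero $\sigma_{\min}$ leaves residual noise on the generated field, which is one reason its value matters when physical residuals are evaluated.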


Article 775

Title@2025-06-10 (2): CALT: A Library for Computer Algebra with Transformer

Title: CALT: A Library for Computer Algebra with Transformer CALT: Eine Bibliothek für Computer Algebra mit Transformer CALT:一个带有变换器的计算机代数库 2506.08600v1

Authors (3): Hiroshi Kera, Shun Arakawa, Yuta Sato

Recent advances in artificial intelligence have demonstrated the learnability of symbolic computation through end-to-end deep learning. Given a sufficient number of examples of symbolic expressions before and after the target computation, Transformer models - highly effective learners of sequence-to-sequence functions - can be trained to emulate the computation. This development opens up several intriguing challenges and new research directions, which require active contributions from the symbolic computation community. In this work, we introduce Computer Algebra with Transformer (CALT), a user-friendly Python library designed to help non-experts in deep learning train models for symbolic computation tasks.


Article 776

Title@2025-06-10 (2): Model-based learning for joint channel estimation and hybrid MIMO precoding

Title: Model-based learning for joint channel estimation and hybrid MIMO precoding Modellbasiertes Lernen für gemeinsame Kanalschätzung und hybride MIMO-Vorkodierung 联合渠道估计和混合混合混合的混合混合混合混合混合混合混合混合模式的模型学习 2505.04255v2

Authors (5): Nay Klaimi, Amira Bedoui, Clément Elvira, Philippe Mary, Luc Le Magoarou

Hybrid precoding is a key ingredient of cost-effective massive multiple-input multiple-output transceivers. However, jointly setting digital and analog precoders to optimally serve multiple users is a difficult optimization problem. Moreover, it relies heavily on precise knowledge of the channels, which is difficult to obtain, especially when considering realistic systems comprising hardware impairments. In this paper, a joint channel estimation and hybrid precoding method is proposed, which consists of an end-to-end architecture taking received pilots as inputs and outputting precoders. The resulting neural network is fully model-based, making it lightweight and interpretable with very few learnable parameters. The channel estimation step is performed using the unfolded matching pursuit algorithm, accounting for imperfect knowledge of the antenna system, while the precoding step is done via unfolded projected gradient ascent. The great potential of the proposed method is empirically demonstrated on realistic synthetic channels.
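
For readers unfamiliar with matching pursuit, the sketch below shows the classic, non-learned algorithm on a toy real-valued dictionary. It is only meant to illustrate the iteration that gets "unfolded" into network layers; the dictionary, signal, and function names are hypothetical, and a channel-estimation setting would use complex steering vectors with learnable antenna parameters instead.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse approximation of `signal` over unit-norm `atoms`."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Pick the atom most correlated with the current residual.
        corr = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        coeffs[k] += corr[k]
        # Remove that atom's contribution from the residual.
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
signal = [3.0, 0.0, 1.0]
coeffs, residual = matching_pursuit(signal, atoms, n_iter=2)
```

Each iteration is a fixed sequence of correlations and subtractions, which is what makes the algorithm easy to unroll into a small number of layers with few trainable parameters.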


Article 777

Title@2025-06-10 (2): Solving excited states for long-range interacting trapped ions with neural networks

Title: Solving excited states for long-range interacting trapped ions with neural networks Lösen von angeregten Zuständen für Langstrecken-Interagieren von gefangenen Ionen mit neuronalen Netzwerken 用神经网络解决长距离互动嵌入离子的兴奋状态 2506.08594v1

Authors (7): Yixuan Ma, Chang Liu, Weikang Li, Shun-Yao Zhang, L. -M. Duan, Yukai Wu, Dong-Ling Deng

The computation of excited states in strongly interacting quantum many-body systems is of fundamental importance. Yet, it is notoriously challenging due to the exponential scaling of the Hilbert space dimension with the system size. Here, we introduce a neural network-based algorithm that can simultaneously output multiple low-lying excited states of a quantum many-body spin system in an accurate and efficient fashion. This algorithm, dubbed the neural quantum excited-state (NQES) algorithm, requires no explicit orthogonalization of the states and is generally applicable to higher dimensions. We demonstrate, through concrete examples including the Haldane-Shastry model with all-to-all interactions, that the NQES algorithm is capable of efficiently computing multiple excited states and their related observable expectations. In addition, we apply the NQES algorithm to two classes of long-range interacting trapped-ion systems in a two-dimensional Wigner crystal. For non-decaying all-to-all interactions with alternating signs, our computed low-lying excited states bear spatial correlation patterns similar to those of the ground states, which closely match recent experimental observations that the quasi-adiabatically prepared state accurately reproduces analytical ground-state correlations. For a system of up to 300 ions with power-law decaying antiferromagnetic interactions, we successfully uncover its gap scaling and correlation features. Our results establish a scalable and efficient algorithm for computing excited states of interacting quantum many-body systems, which holds potential applications ranging from benchmarking quantum devices to photoisomerization.



Article 778

Title@2025-06-10 (2): Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings

Title: Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings Dichte Retriever können bei einfachen Abfragen scheitern: Enthüllen des Granularitäts-Dilemmas der Einbettungen 简单查询: 读取嵌入器的颗粒性三角形 2506.08592v1

Authors (6): Liyan Xu, Zhenlin Su, Mo Yu, Jiangnan Li, Fandong Meng, Jie Zhou

This work focuses on an observed limitation of text encoders: embeddings may not be able to recognize fine-grained entities or events within the semantics, resulting in failed dense retrieval on even simple cases. To examine such behaviors, we first introduce a new evaluation dataset in Chinese, named CapRetrieval, whose passages are image captions and whose queries are phrases inquiring about entities or events in various forms. Zero-shot evaluation suggests that encoders may fail on such fine-grained matching, regardless of training sources or model sizes. Aiming for enhancement, we proceed to finetune encoders with our proposed data generation strategies, which obtain the best performance on CapRetrieval. Within this process, we further identify the granularity dilemma, a challenge for embeddings to express fine-grained salience while aligning with overall semantics. Our dataset, code and models in this work are publicly released at https://github.com/lxucs/CapRetrieval.



Article 779

Title@2025-06-10 (2): Diversity-Guided MLP Reduction for Efficient Large Vision Transformers

Title: Diversity-Guided MLP Reduction for Efficient Large Vision Transformers Diversity-geführte MLP-Reduktion für effiziente Großbildtransformatoren 为高效的大型愿景变异器而减少由多样性引导的减量 2506.08591v1

Authors (5): Chengchao Shen, Hourun Zhu, Gongfan Fang, Jianxin Wang, Xinchao Wang

Transformer models exhibit excellent scaling properties: performance improves as model capacity increases. However, large-scale model parameters lead to an unaffordable cost of computing and memory. We analyze popular transformer architectures and find that multilayer perceptron (MLP) modules take up the majority of model parameters. To this end, we focus on the recoverability of the compressed models and propose a Diversity-Guided MLP Reduction (DGMR) method to significantly reduce the parameters of large vision transformers with only negligible performance degradation. Specifically, we apply a Gram-Schmidt weight pruning strategy to eliminate redundant neurons in the MLP hidden layer while preserving weight diversity for better performance recovery during distillation. Compared to the model trained from scratch, our pruned model only requires 0.06% data of LAION-2B (for the training of large vision transformers) without labels (ImageNet-1K) to recover the original performance. Experimental results on several state-of-the-art large vision transformers demonstrate that our method achieves a more than 57.0% parameter and FLOPs reduction in a near lossless manner. Notably, for EVA-CLIP-E (4.4B), our method accomplishes a 71.5% parameter and FLOPs reduction without performance degradation. The source code and trained weights are available at https://github.com/visresearch/DGMR.



Article 780

Title@2025-06-10 (2): How transformers learn structured data: insights from hierarchical filtering

Title: How transformers learn structured data: insights from hierarchical filtering Wie Transformatoren strukturierte Daten lernen: Erkenntnisse aus der hierarchischen Filterung 变压器如何学习结构化数据:从上层过滤器获得的洞察力 2408.15138v3

Authors (4): Jerome Garnier-Brun, Marc Mézard, Emanuele Moscato, Luca Saglietti

Understanding the learning process and the embedded computation in transformers is becoming a central goal for the development of interpretable AI. In the present study, we introduce a hierarchical filtering procedure for data models of sequences on trees, allowing us to hand-tune the range of positional correlations in the data. Leveraging this controlled setting, we provide evidence that vanilla encoder-only transformers can approximate the exact inference algorithm when trained on root classification and masked language modeling tasks, and study how this computation is discovered and implemented. We find that correlations at larger distances, corresponding to increasing layers of the hierarchy, are sequentially included by the network during training. By comparing attention maps from models trained with varying degrees of filtering and by probing the different encoder levels, we find clear evidence of a reconstruction of correlations on successive length scales corresponding to the various levels of the hierarchy, which we relate to a plausible implementation of the exact inference algorithm within the same architecture.



Article 781

Title@2025-06-10 (2): Diffusion-based Time Series Forecasting for Sewerage Systems

Title: Diffusion-based Time Series Forecasting for Sewerage Systems Diffusionsbasierte Zeitreihenprognosen für Kanalisationssysteme 排污系统时间序列预报 2506.08577v1

Authors (5): Nicholas A. Pearson, Francesca Cairoli, Luca Bortolussi, Davide Russo, Francesca Zanello

We introduce a novel deep learning approach that harnesses the power of generative artificial intelligence to enhance the accuracy of contextual forecasting in sewerage systems. By developing a diffusion-based model that processes multivariate time series data, our system excels at capturing complex correlations across diverse environmental signals, enabling robust predictions even during extreme weather events. To strengthen the model’s reliability, we further calibrate its predictions with a conformal inference technique, tailored for probabilistic time series data, ensuring that the resulting prediction intervals are statistically reliable and cover the true target values with a desired confidence level. Our empirical tests on real sewerage system data confirm the model’s exceptional capability to deliver reliable contextual predictions, maintaining accuracy even under severe weather conditions.



Article 782

Title@2025-06-10 (2): FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL

Title: FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL FlickerFusion: Intra-Trajektorie Domain Generalizing Multi-Agent RL FlickerFusion: 磁盘内域域 通用多代理 RL 2410.15876v4

Authors (8): Woosung Koh, Wonbeen Oh, Siyeol Kim, Suhin Shin, Hyeongjin Kim, Jaein Jang, Junghyun Lee, Se-Young Yun

Multi-agent reinforcement learning has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL approaches often rely on the restrictive assumption that the number of entities (e.g., agents, obstacles) remains constant between training and inference. This overlooks scenarios where entities are dynamically removed or added during the inference trajectory – a common occurrence in real-world environments like search and rescue missions and dynamic combat situations. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization, where such dynamic changes cannot be anticipated beforehand. Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods. FlickerFusion stochastically drops out parts of the observation space, emulating in-domain conditions when inference is run OOD. The results show that FlickerFusion not only achieves superior inference rewards but also uniquely reduces uncertainty vis-à-vis the backbone, compared to existing methods. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.



Article 783

Title@2025-06-10 (2): How Malicious AI Swarms Can Threaten Democracy

Title: How Malicious AI Swarms Can Threaten Democracy Wie bösartige KI-Schwärme die Demokratie bedrohen können 恶意的AI Swararms能够如何威胁民主 2506.06299v2

Authors (21): Daniel Thilo Schroeder, Meeyoung Cha, Andrea Baronchelli, Nick Bostrom, Nicholas A. Christakis, David Garcia, Amit Goldenberg, Yara Kyrychenko, Kevin Leyton-Brown, Nina Lutz, Gary Marcus, Filippo Menczer, Gordon Pennycook, David G. Rand, Frank Schweitzer, Christopher Summerfield, Audrey Tang, Jay Van Bavel, Sander van der Linden, Dawn Song, Jonas R. Kunst

Advances in AI portend a new era of sophisticated disinformation operations. While individual AI systems already create convincing – and at times misleading – information, an imminent development is the emergence of malicious AI swarms. These systems can coordinate covertly, infiltrate communities, evade traditional detectors, and run continuous A/B tests, with round-the-clock persistence. The result can include fabricated grassroots consensus, fragmented shared reality, mass harassment, voter micro-suppression or mobilization, contamination of AI training data, and erosion of institutional trust. With democratic processes worldwide increasingly vulnerable, we urge a three-pronged response: (1) platform-side defenses – always-on swarm-detection dashboards, pre-election high-fidelity swarm-simulation stress-tests, transparency audits, and optional client-side “AI shields” for users; (2) model-side safeguards – standardized persuasion-risk tests, provenance-authenticating passkeys, and watermarking; and (3) system-level oversight – a UN-backed AI Influence Observatory.



Article 784

Title@2025-06-10 (2): The Geometries of Truth Are Orthogonal Across Tasks

Title: The Geometries of Truth Are Orthogonal Across Tasks Die Geometrien der Wahrheit sind orthogonal über Aufgaben hinweg 真理的地貌不同任务 2506.08572v1

Authors (7): Waiss Azizian, Michael Kirchhof, Eugene Ndiaye, Louis Bethune, Michal Klein, Pierre Ablin, Marco Cuturi

Large Language Models (LLMs) have demonstrated impressive generalization capabilities across various tasks, but their claim to practical relevance is still marred by concerns about their reliability. Recent works have proposed examining the activations produced by an LLM at inference time to assess whether its answer to a question is correct. Some works claim that a “geometry of truth” can be learned from examples, in the sense that the activations that generate correct answers can be distinguished from those leading to mistakes with a linear classifier. In this work, we underline a limitation of these approaches: we observe that these “geometries of truth” are intrinsically task-dependent and fail to transfer across tasks. More precisely, we show that linear classifiers trained across distinct tasks share little similarity and, when trained with sparsity-enforcing regularizers, have almost disjoint supports. We show that more sophisticated approaches (e.g., using mixtures of probes and tasks) fail to overcome this limitation, likely because activation vectors commonly used to classify answers form clearly separated clusters when examined across tasks.



Article 785

Title@2025-06-10 (2): Asymptotic Normality of Infinite Centered Random Forests -Application to Imbalanced Classification

Title: Asymptotic Normality of Infinite Centered Random Forests -Application to Imbalanced Classification Asymptotische Normalität von unendlichen zentrierten Zufallswäldern -Anwendung auf eine unausgewogene Klassifizierung 无限集中的随机森林的零散状态—-对平衡分类的适用 2506.08548v1

Authors (4): Moria Mayala, Erwan Scornet, Charles Tillier, Olivier Wintenberger

Many classification tasks involve imbalanced data, in which a class is largely underrepresented. Several techniques consist in creating a rebalanced dataset on which a classifier is trained. In this paper, we study such a procedure theoretically when the classifier is a Centered Random Forest (CRF). We establish a Central Limit Theorem (CLT) on the infinite CRF with explicit rates and an exact constant. We then prove that the CRF trained on the rebalanced dataset exhibits a bias, which can be removed with appropriate techniques. Based on an importance sampling (IS) approach, the resulting debiased estimator, called IS-ICRF, satisfies a CLT centered at the prediction function value. For high imbalance settings, we prove that the IS-ICRF estimator enjoys a variance reduction compared to the ICRF trained on the original data. Therefore, our theoretical analysis highlights the benefits of training random forests on a rebalanced dataset (followed by a debiasing procedure) compared to using the original data. Our theoretical results, especially the variance rates and the variance reduction, appear to be valid for Breiman’s random forests in our experiments.



Article 786

Title@2025-06-10 (2): LLM-ML Teaming: Integrated Symbolic Decoding and Gradient Search for Valid and Stable Generative Feature Transformation

Title: LLM-ML Teaming: Integrated Symbolic Decoding and Gradient Search for Valid and Stable Generative Feature Transformation LLM-ML Teaming: Integrierte Symbolische Dekodierung und Gradientensuche nach gültiger und stabiler Generativer Feature Transformation LLM-ML 团队化:综合符号解码和渐进搜索,以进行有效和稳定的生成地貌转型 2506.09085v1

Authors (7): Xinyuan Wang, Haoyue Bai, Nanxu Gong, Wangyang Ying, Sixun Dong, Xiquan Cui, Yanjie Fu

Feature transformation enhances data representation by deriving new features from the original data. Generative AI offers potential for this task, but faces challenges in stable generation (consistent outputs) and valid generation (error-free sequences). Existing methods fail to resolve both: traditional ML suffers from low validity and LLMs from instability. We find that LLMs ensure valid syntax, while ML’s gradient-steered search stabilizes performance. To bridge this gap, we propose a teaming framework combining LLMs’ symbolic generation with ML’s gradient optimization. This framework includes four steps: (1) golden examples generation, aiming to prepare high-quality samples with the ground knowledge of the teacher LLM; (2) feature transformation sequence embedding and search, intending to uncover potentially superior embeddings within the latent space; (3) student LLM feature transformation, aiming to distill knowledge from the teacher LLM; (4) LLM-ML decoder teaming, which combines ML and student LLM probabilities for valid and stable generation. Experiments on various datasets show that the teaming policy can achieve a 5% improvement in downstream performance while reducing nearly half of the error cases. The results also demonstrate the efficiency and robustness of the teaming policy. Additionally, we report findings on LLMs’ capacity to understand the original data.



Article 787

Title@2025-06-10 (2): Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models

Title: Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models Verbesserte Ganzseitenoptimierung durch gemischt-körnige Belohnungsmechanismen-angepasste Sprachmodelle 通过混合发奖机制-成熟语言模式优化全页 2506.09084v1

Authors (3): Xinyuan Wang, Liang Wu, Yanjie Fu

Optimizing the presentation of search and recommendation results is crucial to enhancing user experience and engagement. Whole Page Optimization (WPO) plays a pivotal role in this process, as it directly influences how information is surfaced to users. While Pre-trained Large Language Models (LLMs) have demonstrated remarkable capabilities in generating coherent and contextually relevant content, fine-tuning these models for complex tasks like WPO presents challenges. Specifically, the need for extensive human-annotated data to mitigate issues such as hallucinations and model instability can be prohibitively expensive, especially in large-scale systems that interact with millions of items daily. In this work, we address the challenge of fine-tuning LLMs for WPO by using user feedback as the supervision. Unlike manually labeled datasets, user feedback is inherently noisy and less precise. To overcome this, we propose a reward-based fine-tuning approach, PageLLM, which employs a mixed-grained reward mechanism that combines page-level and item-level rewards. The page-level reward evaluates the overall quality and coherence, while the item-level reward focuses on the accuracy and relevance of key recommendations. This dual-reward structure ensures that both the holistic presentation and the critical individual components are optimized. We validate PageLLM on both public and industrial datasets. PageLLM outperforms baselines and achieves a 0.44% GMV increase in an online A/B test with over 10 million users, demonstrating its real-world impact.



Article 788

Title@2025-06-10 (2): Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models

Title: Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models Auf dem Weg zu einem autonomen Stärkungslernen für die real-weltliche Robotermanipulation mit großen Sprachmodellen 争取以大语言模式进行现实世界机器人操纵的自主强化学习 2503.04280v4

Authors (6): Niccolò Turcato, Matteo Iovino, Aris Synodinos, Alberto Dalla Libera, Ruggero Carli, Pietro Falco

Recent advancements in Large Language Models (LLMs) and Visual Language Models (VLMs) have significantly impacted robotics, enabling high-level semantic motion planning applications. Reinforcement Learning (RL), a complementary paradigm, enables agents to autonomously optimize complex behaviors through interaction and reward signals. However, designing effective reward functions for RL remains challenging, especially in real-world tasks where sparse rewards are insufficient and dense rewards require elaborate design. In this work, we propose Autonomous Reinforcement learning for Complex Human-Informed Environments (ARCHIE), an unsupervised pipeline leveraging GPT-4, a pre-trained LLM, to generate reward functions directly from natural language task descriptions. The rewards are used to train RL agents in simulated environments, where we formalize the reward generation process to enhance feasibility. Additionally, GPT-4 automates the coding of task success criteria, creating a fully automated, one-shot procedure for translating human-readable text into deployable robot skills. Our approach is validated through extensive simulated experiments on single-arm and bi-manual manipulation tasks using an ABB YuMi collaborative robot, highlighting its practicality and effectiveness. Tasks are demonstrated on the real robot setup.



Article 789

Title@2025-06-10 (2): Structured Variational $D$-Decomposition for Accurate and Stable Low-Rank Approximation

Title: Structured Variational $D$-Decomposition for Accurate and Stable Low-Rank Approximation Strukturierte Variation $D$-Dekomposition für genaue und stabile Low-Rank-Annäherung 结构变化($D美元)-准确度和稳定、低 Rank 接近度的削减 2506.08535v1

Authors (1): Ronald Katende

We introduce the $D$-decomposition, a non-orthogonal matrix factorization of the form $A \approx P D Q$, where $P \in \mathbb{R}^{n \times k}$, $D \in \mathbb{R}^{k \times k}$, and $Q \in \mathbb{R}^{k \times n}$. The decomposition is defined variationally by minimizing a regularized Frobenius loss, allowing control over rank, sparsity, and conditioning. Unlike algebraic factorizations such as LU or SVD, it is computed by alternating minimization. We establish existence and perturbation stability of the solution and show that each update has complexity $\mathcal{O}(n^2k)$. Benchmarks against truncated SVD, CUR, and nonnegative matrix factorization show improved reconstruction accuracy on MovieLens, MNIST, Olivetti Faces, and gene expression matrices, particularly under sparsity and noise.
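For orientation, the variational definition described above can be written in a generic form. This is only a sketch: the abstract does not give the regularizer, so $\mathcal{R}$ and the weight $\lambda$ below are placeholder symbols, not the paper's actual penalty.

```latex
\min_{P \in \mathbb{R}^{n \times k},\; D \in \mathbb{R}^{k \times k},\; Q \in \mathbb{R}^{k \times n}}
\; \| A - P D Q \|_F^2 \;+\; \lambda \, \mathcal{R}(P, D, Q)
```

Alternating minimization then updates one factor at a time with the other two held fixed. Forming the product $P D Q$ alone costs $\mathcal{O}(nk^2 + n^2 k)$ operations, which is $\mathcal{O}(n^2 k)$ for $k \le n$ and is consistent with the per-update complexity quoted in the abstract.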



Article 790

Title@2025-06-10 (2): Regularized Langevin Dynamics for Combinatorial Optimization

Title: Regularized Langevin Dynamics for Combinatorial Optimization Regularisierte Langevin-Dynamik für kombinatorische Optimierung 固定的用于组合优化的兰埃文动态 2502.00277v2

Authors (2): Shengyu Feng, Yiming Yang

This work proposes a simple yet effective sampling framework for combinatorial optimization (CO). Our method builds on discrete Langevin dynamics (LD), an efficient gradient-guided generative paradigm. However, we observe that directly applying LD often leads to limited exploration. To overcome this limitation, we propose the Regularized Langevin Dynamics (RLD), which enforces an expected distance between the sampled and current solutions, effectively avoiding local minima. We develop two CO solvers on top of RLD, one based on simulated annealing (SA) and the other based on a neural network (NN). Empirical results on three classic CO problems demonstrate that both of our methods can achieve comparable or better performance against the previous state-of-the-art (SOTA) SA- and NN-based solvers. In particular, our SA algorithm reduces the runtime of the previous SOTA SA method by up to 80%, while achieving equal or superior performance. In summary, RLD offers a promising framework for enhancing both traditional heuristics and NN models to solve CO problems. Our code is available at https://github.com/Shengyu-Feng/RLD4CO.



Article 791

Title@2025-06-10 (2): Federated Learning: From Theory to Practice

Title: Federated Learning: From Theory to Practice Föderiertes Lernen: Von der Theorie zur Praxis 联邦学习:从理论到实践 2505.19183v2

Authors (1): A. Jung

This book offers a hands-on introduction to building and understanding federated learning (FL) systems. FL enables multiple devices – such as smartphones, sensors, or local computers – to collaboratively train machine learning (ML) models, while keeping their data private and local. It is a powerful solution when data cannot or should not be centralized due to privacy, regulatory, or technical reasons. The book is designed for students, engineers, and researchers who want to learn how to design scalable, privacy-preserving FL systems. Our main focus is on personalization: enabling each device to train its own model while still benefiting from collaboration with relevant devices. This is achieved by leveraging similarities between (the learning tasks associated with) devices that are encoded by the weighted edges (or links) of a federated learning network (FL network). The key idea is to represent real-world FL systems as networks of devices, where nodes correspond to devices and edges represent communication links and data similarities between them. The training of personalized models for these devices can be naturally framed as a distributed optimization problem. This optimization problem is referred to as generalized total variation minimization (GTVMin) and ensures that devices with similar learning tasks learn similar model parameters. Our approach is both mathematically principled and practically motivated. While we introduce some advanced ideas from optimization theory and graph-based learning, we aim to keep the book accessible. Readers are guided through the core ideas step by step, with intuitive explanations.
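A minimal sketch of the GTVMin idea, under stated assumptions: each device holds a single scalar parameter with a squared-error local loss, and the coupling term is a weighted squared difference across edges. The function name `gtvmin`, the toy targets, and the edge weights are hypothetical and not taken from the book, whose formulation may differ.

```python
def gtvmin(targets, edges, alpha, lr=0.05, steps=2000):
    """Gradient descent on
    sum_i (w_i - y_i)^2 + alpha * sum_{(i,j)} A_ij * (w_i - w_j)^2.
    `targets` holds the local optima y_i; `edges` maps (i, j) to a weight A_ij.
    """
    w = [0.0] * len(targets)
    for _ in range(steps):
        # Gradient of the local squared-error losses.
        grad = [2.0 * (wi - yi) for wi, yi in zip(w, targets)]
        # Gradient of the coupling term, one contribution per edge.
        for (i, j), a in edges.items():
            diff = w[i] - w[j]
            grad[i] += 2.0 * alpha * a * diff
            grad[j] -= 2.0 * alpha * a * diff
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

# Toy network (assumed): devices 0 and 1 are strongly linked, device 2 only weakly.
targets = [1.0, 1.2, 5.0]
edges = {(0, 1): 1.0, (1, 2): 0.1}

w_local = gtvmin(targets, edges, alpha=0.0)    # no collaboration
w_coupled = gtvmin(targets, edges, alpha=1.0)  # collaboration along weighted edges
```

With `alpha=0.0` every device simply fits its own data. With a positive `alpha`, strongly connected devices are pulled toward each other, while the weakly linked device stays close to its own optimum.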



Article 792

Title@2025-06-10 (2): Robust Evolutionary Multi-Objective Network Architecture Search for Reinforcement Learning (EMNAS-RL)

Title: Robust Evolutionary Multi-Objective Network Architecture Search for Reinforcement Learning (EMNAS-RL) Robuste evolutionäre multiobjektive Netzwerkarchitektur Suche nach Verstärkungslernen (EMNAS-RL) 强劲进化多目标网络强化学习建筑搜索(EMNAS-RL) 2506.08533v1

Authors (4): Nihal Acharya Adde, Alexandra Gianzina, Hanno Gottschalk, Andreas Ebert

This paper introduces Evolutionary Multi-Objective Network Architecture Search (EMNAS) for the first time to optimize neural network architectures in large-scale Reinforcement Learning (RL) for Autonomous Driving (AD). EMNAS uses genetic algorithms to automate network design, tailored to enhance rewards and reduce model size without compromising performance. Additionally, parallelization techniques are employed to accelerate the search, and teacher-student methodologies are implemented to ensure scalable optimization. This research underscores the potential of transfer learning as a robust framework for optimizing performance across iterative learning processes by effectively leveraging knowledge from earlier generations to enhance learning efficiency and stability in subsequent generations. Experimental results demonstrate that tailored EMNAS outperforms manually designed models, achieving higher rewards with fewer parameters. The findings of these strategies contribute positively to EMNAS for RL in autonomous driving, advancing the field toward better-performing networks suitable for real-world scenarios.



Article 793

Title@2025-06-10 (2): Efficient Robust Conformal Prediction via Lipschitz-Bounded Networks

Title: Efficient Robust Conformal Prediction via Lipschitz-Bounded Networks Effiziente robuste konforme Vorhersage über Lipschitz-gebundene Netzwerke 通过Lipschitz-Bounds网络进行非正式预测 2506.05434v2

Authors (7): Thomas Massena, Léo Andéol, Thibaut Boissin, Franck Mamalet, Corentin Friedrich, Mathieu Serrurier, Sébastien Gerchinovitz

Conformal Prediction (CP) has proven to be an effective post-hoc method for improving the trustworthiness of neural networks by providing prediction sets with finite-sample guarantees. However, under adversarial attacks, classical conformal guarantees do not hold anymore: this problem is addressed in the field of Robust Conformal Prediction. Several methods have been proposed to provide robust CP sets with guarantees under adversarial perturbations, but, for large scale problems, these sets are either too large or the methods are too computationally demanding to be deployed in real life scenarios. In this work, we propose a new method that leverages Lipschitz-bounded networks to precisely and efficiently estimate robust CP sets. When combined with a 1-Lipschitz robust network, we demonstrate that our lip-rcp method outperforms state-of-the-art results in both the size of the robust CP sets and computational efficiency in medium and large-scale scenarios such as ImageNet. Taking a different angle, we also study vanilla CP under attack, and derive new worst-case coverage bounds of vanilla CP sets, which are valid simultaneously for all adversarial attack levels. Our lip-rcp method makes this second approach as efficient as vanilla CP while also allowing robustness guarantees.
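As background for the abstract above, the following sketch shows vanilla split conformal prediction for classification, not the lip-rcp method itself. The score `1 - p(label)`, the calibration values, and the function names are illustrative assumptions.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points: only the full label set is valid.
        return math.inf
    return sorted(cal_scores)[k - 1]

def prediction_set(probs, threshold):
    """Keep every label whose score 1 - p(label) is at most the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]

# Hypothetical calibration scores: 1 - (probability assigned to the true label).
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70]
q_hat = conformal_threshold(cal_scores, alpha=0.25)

# Predicted class probabilities for a new input (also hypothetical).
labels = prediction_set([0.45, 0.42, 0.13], q_hat)
```

Under exchangeability of calibration and test points, sets built this way contain the true label with probability at least `1 - alpha`. An adversarial perturbation breaks that assumption, which is the gap robust conformal prediction addresses.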



Article 794

Title@2025-06-10 (2): Leveraging chaos in the training of artificial neural networks

Title: Leveraging chaos in the training of artificial neural networks Das Chaos in der Ausbildung künstlicher neuronaler Netze abbauen 在培训人工神经网络时利用混乱 2506.08523v1

Authors (3): Pedro Jiménez-González, Miguel C. Soriano, Lucas Lacasa

Traditional algorithms to optimize artificial neural networks when confronted with a supervised learning task are usually exploitation-type relaxational dynamics such as gradient descent (GD). Here, we explore the dynamics of the neural network trajectory along training for unconventionally large learning rates. We show that for a region of values of the learning rate, the GD optimization shifts away from a purely exploitation-like algorithm into a regime of exploration-exploitation balance, as the neural network is still capable of learning but the trajectory shows sensitive dependence on initial conditions (as characterized by a positive maximum Lyapunov exponent of the network). Interestingly, the characteristic training time required to reach an acceptable accuracy in the test set reaches a minimum precisely in this learning-rate region, further suggesting that one can accelerate the training of artificial neural networks by operating at the onset of chaos. Our results – initially illustrated for the MNIST classification task – qualitatively hold for a range of supervised learning tasks, learning architectures and other hyperparameters, and showcase the emergent, constructive role of transient chaotic dynamics in the training of artificial neural networks.



Article 795

Title@2025-06-10 (2): FEDTAIL: Federated Long-Tailed Domain Generalization with Sharpness-Guided Gradient Matching

Title: FEDTAIL: Federated Long-Tailed Domain Generalization with Sharpness-Guided Gradient Matching FEDTAIL: Federated Long-Tailed Domain Generalization mit Sharpness-Guided Gradient Matching FEDTAIL: 采用尖锐引导梯度匹配法的联邦长期失败域 2506.08518v1

Authors (4): Sunny Gupta, Nikita Jangid, Shounak Das, Amit Sethi

Domain Generalization (DG) seeks to train models that perform reliably on unseen target domains without access to target data during training. While recent progress in smoothing the loss landscape has improved generalization, existing methods often falter under long-tailed class distributions and conflicting optimization objectives. We introduce FedTAIL, a federated domain generalization framework that explicitly addresses these challenges through sharpness-guided, gradient-aligned optimization. Our method incorporates a gradient coherence regularizer to mitigate conflicts between classification and adversarial objectives, leading to more stable convergence. To combat class imbalance, we perform class-wise sharpness minimization and propose a curvature-aware dynamic weighting scheme that adaptively emphasizes underrepresented tail classes. Furthermore, we enhance conditional distribution alignment by integrating sharpness-aware perturbations into entropy regularization, improving robustness under domain shift. FedTAIL unifies optimization harmonization, class-aware regularization, and conditional alignment into a scalable, federated-compatible framework. Extensive evaluations across standard domain generalization benchmarks demonstrate that FedTAIL achieves state-of-the-art performance, particularly in the presence of domain shifts and label imbalance, validating its effectiveness in both centralized and federated settings. Code: https://github.com/sunnyinAI/FedTail



Article 796

Title@2025-06-10 (2): Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering

Title: Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering Meta-Adaptive Prompt-Destillation für wenige-heiße visuelle Frage-Antworten 用于少数浅光视觉问题解答的元适应性快速蒸馏 2506.06905v2

Authors (3): Akash Gupta, Amos Storkey, Mirella Lapata

Large Multimodal Models (LMMs) often rely on in-context learning (ICL) to perform new tasks with minimal supervision. However, ICL performance, especially in smaller LMMs, is inconsistent and does not always improve monotonically with increasing examples. We hypothesize that this occurs due to the LMM being overwhelmed by additional information present in the image embeddings, which is not required for the downstream task. To address this, we propose a meta-learning approach that provides an alternative for inducing few-shot capabilities in LMMs, using a fixed set of soft prompts that are distilled from task-relevant image features and can be adapted at test time using a few examples. To facilitate this distillation, we introduce an attention-mapper module that can be easily integrated with the popular LLaVA v1.5 architecture and is jointly learned with soft prompts, enabling task adaptation in LMMs under low-data regimes with just a few gradient steps. Evaluation on the VL-ICL Bench shows that our method consistently outperforms ICL and related prompt-tuning approaches, even under image perturbations, improving task induction and reasoning across visual question answering tasks.



Article 797

Title@2025-06-10 (2): NeurIPS 2024 ML4CFD Competition: Results and Retrospective Analysis

Title: NeurIPS 2024 ML4CFD Competition: Results and Retrospective Analysis NeurIPS 2024 ML4CFD Wettbewerb: Ergebnisse und Retrospektivenanalyse NeurIPS 2024 ML4CFD竞争:结果和回溯性分析 2506.08516v1

Authors (18): Mouadh Yagoubi, David Danan, Milad Leyli-Abadi, Ahmed Mazari, Jean-Patrick Brunet, Abbas Kabalan, Fabien Casenave, Yuxin Ma, Giovanni Catalani, Jean Fesquet, Jacob Helwig, Xuan Zhang, Haiyang Yu, Xavier Bertrand, Frederic Tost, Michael Baurheim, Joseph Morlier, Shuiwang Ji

The integration of machine learning (ML) into the physical sciences is reshaping computational paradigms, offering the potential to accelerate demanding simulations such as computational fluid dynamics (CFD). Yet, persistent challenges in accuracy, generalization, and physical consistency hinder the practical deployment of ML models in scientific domains. To address these limitations and systematically benchmark progress, we organized the ML4CFD competition, centered on surrogate modeling for aerodynamic simulations over two-dimensional airfoils. The competition attracted over 240 teams, who were provided with a curated dataset generated via OpenFOAM and evaluated through a multi-criteria framework encompassing predictive accuracy, physical fidelity, computational efficiency, and out-of-distribution generalization. This retrospective analysis reviews the competition outcomes, highlighting several approaches that outperformed baselines under our global evaluation score. Notably, the top entry exceeded the performance of the original OpenFOAM solver on aggregate metrics, illustrating the promise of ML-based surrogates to outperform traditional solvers under tailored criteria. Drawing from these results, we analyze the key design principles of top submissions, assess the robustness of our evaluation framework, and offer guidance for future scientific ML challenges.



Article 798

Title@2025-06-10 (2): DiffGradCAM: A Universal Class Activation Map Resistant to Adversarial Training

Title: DiffGradCAM: A Universal Class Activation Map Resistant to Adversarial Training DiffGradCAM: Eine universelle Aktivierungskarte der Klasse, die dem adversarialen Training standhält DiffGradCAM: 通用级启动地图抗反向培训 2506.08514v1

Authors (3): Jacob Piland, Chris Sweet, Adam Czakja

Class Activation Mapping (CAM) and its gradient-based variants (e.g., GradCAM) have become standard tools for explaining Convolutional Neural Network (CNN) predictions. However, these approaches typically focus on individual logits, while for neural networks using softmax, the class membership probability estimates depend only on the differences between logits, not on their absolute values. This disconnect leaves standard CAMs vulnerable to adversarial manipulation, such as passive fooling, where a model is trained to produce misleading CAMs without affecting decision performance. We introduce Salience-Hoax Activation Maps (SHAMs), an entropy-aware form of passive fooling that serves as a benchmark for CAM robustness under adversarial conditions. To address the passive fooling vulnerability, we then propose DiffGradCAM, a novel, lightweight, and contrastive approach to class activation mapping that is not susceptible to passive fooling and also matches the output of standard CAM methods such as GradCAM in the non-adversarial case. Together, SHAM and DiffGradCAM establish a new framework for probing and improving the robustness of saliency-based explanations. We validate both contributions across multi-class tasks with few and many classes.
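
The claim that softmax probabilities depend only on logit differences can be checked numerically. The following is a minimal sketch, not the authors' code; the helper `softmax` and the example logits are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, -1.0, 0.5]              # assumed example values
shifted = [z + 100.0 for z in logits]  # same differences, different absolute values

p = softmax(logits)
q = softmax(shifted)

# Largest discrepancy between the two probability vectors.
max_gap = max(abs(a - b) for a, b in zip(p, q))
```

Adding a constant to every logit leaves the probabilities unchanged, so an explanation built on a single logit's absolute value can be altered without changing the model's decisions.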



Article 799

Title@2025-06-10 (2): Human-Aligned Image Models Improve Visual Decoding from the Brain

Title: Human-Aligned Image Models Improve Visual Decoding from the Brain Menschlich ausgerichtete Imagemodelle verbessern die visuelle Dekodierung aus dem Gehirn 人与人之间的图像模型改进大脑的视觉解码 2502.03081v3

Authors (6): Nona Rajabi, Antônio H. Ribeiro, Miguel Vasco, Farzaneh Taleb, Mårten Björkman, Danica Kragic

Decoding visual images from brain activity has significant potential for advancing brain-computer interaction and enhancing the understanding of human perception. Recent approaches align the representation spaces of images and brain activity to enable visual decoding. In this paper, we introduce the use of human-aligned image encoders to map brain signals to images. We hypothesize that these models more effectively capture perceptual attributes associated with the rapid visual stimuli presentations commonly used in visual brain data recording experiments. Our empirical results support this hypothesis, demonstrating that this simple modification improves image retrieval accuracy by up to 21% compared to state-of-the-art methods. Comprehensive experiments confirm consistent performance improvements across diverse EEG architectures, image encoders, alignment methods, participants, and brain imaging modalities.



Article 800

Title@2025-06-10 (2): Explaining, Fast and Slow: Abstraction and Refinement of Provable Explanations

Title: Explaining, Fast and Slow: Abstraction and Refinement of Provable Explanations Erklären, Schnell und Langsam: Abstraktion und Verfeinerung wahrnehmbarer Erklärungen 解释、快速和缓慢:可允许解释的抽象和完善 2506.08505v1

Authors (5): Shahaf Bassan, Yizhak Yisrael Elboher, Tobias Ladner, Matthias Althoff, Guy Katz

Despite significant advancements in post-hoc explainability techniques for neural networks, many current methods rely on heuristics and do not provide formally provable guarantees over the explanations provided. Recent work has shown that it is possible to obtain explanations with formal guarantees by using neural network verification techniques to identify subsets of input features that are sufficient to ensure that predictions remain unchanged. Despite the appeal of these explanations, their computation faces significant scalability challenges. In this work, we address this gap by proposing a novel abstraction-refinement technique for efficiently computing provably sufficient explanations of neural network predictions. Our method abstracts the original large neural network by constructing a substantially reduced network, where a sufficient explanation of the reduced network is also provably sufficient for the original network, hence significantly speeding up the verification process. If the explanation is insufficient on the reduced network, we iteratively refine the network by gradually increasing its size until convergence. Our experiments demonstrate that our approach enhances the efficiency of obtaining provably sufficient explanations for neural network predictions while additionally providing a fine-grained interpretation of the network’s predictions across different abstraction levels.



Article 801

Title@2025-06-10 (2): CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations

Title: CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations CoMuMDR: Code-gemischter multimodaler Multidomain-Korpus für Diskurs paRsing in Gesprächen COMMDR: 混合编码的多式混合多式对话中聊天声响多式公文 2506.08504v1

Authors (6): Divyaksh Shukla, Ritesh Baviskar, Dwijesh Gohil, Aniket Tiwari, Atul Shree, Ashutosh Modi

Discourse parsing is an important task useful for NLU applications such as summarization, machine comprehension, and emotion recognition. The current conversation-based discourse parsing datasets consist of written English dialogues restricted to a single domain. In this resource paper, we introduce CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations. The corpus (code-mixed in Hindi and English) has both audio and transcribed text and is annotated with nine discourse relations. We experiment with various SoTA baseline models; the poor performance of SoTA models highlights the challenges of a multi-domain code-mixed corpus, pointing towards the need for developing better models for such realistic settings.



Article 802

Title@2025-06-10 (2): Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning

Title: Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning Memory, Benchmark & Robots: Ein Benchmark für die Lösung komplexer Aufgaben mit Verstärkungslernen 记忆、基准和机器人:用强化学习解决复杂任务的基准 2502.10550v2

Authors (4): Egor Cherepanov, Nikita Kachaev, Alexey K. Kovalev, Aleksandr I. Panov

Memory is crucial for enabling agents to tackle complex tasks with temporal and spatial dependencies. While many reinforcement learning (RL) algorithms incorporate memory, the field lacks a universal benchmark to assess an agent’s memory capabilities across diverse scenarios. This gap is particularly evident in tabletop robotic manipulation, where memory is essential for solving tasks with partial observability and ensuring robust performance, yet no standardized benchmarks exist. To address this, we introduce MIKASA (Memory-Intensive Skills Assessment Suite for Agents), a comprehensive benchmark for memory RL, with three key contributions: (1) we propose a comprehensive classification framework for memory-intensive RL tasks, (2) we collect MIKASA-Base – a unified benchmark that enables systematic evaluation of memory-enhanced agents across diverse scenarios, and (3) we develop MIKASA-Robo (pip install mikasa-robo-suite) – a novel benchmark of 32 carefully designed memory-intensive tasks that assess memory capabilities in tabletop robotic manipulation. Our work introduces a unified framework to advance memory RL research, enabling more robust systems for real-world use. MIKASA is available at https://tinyurl.com/membenchrobots.



Article 803

Title@2025-06-10 (2): DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs

Title: DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs In Konflikte geraten: In suchgesteigerten LLMs widersprüchliche Quellen erkennen und bekämpfen 钻入冲突:发现和解决搜索中的冲突源 2506.08500v1

Authors (9): Arie Cattan, Alon Jacovi, Ori Ram, Jonathan Herzig, Roee Aharoni, Sasha Goldshtein, Eran Ofek, Idan Szpektor, Avi Caciularu

Retrieval Augmented Generation (RAG) is a commonly used approach for enhancing large language models (LLMs) with relevant and up-to-date information. However, the retrieved sources can often contain conflicting information and it remains unclear how models should address such discrepancies. In this work, we first propose a novel taxonomy of knowledge conflict types in RAG, along with the desired model behavior for each type. We then introduce CONFLICTS, a high-quality benchmark with expert annotations of conflict types in a realistic RAG setting. CONFLICTS is the first benchmark that enables tracking progress on how models address a wide range of knowledge conflicts. We conduct extensive experiments on this benchmark, showing that LLMs often struggle to appropriately resolve conflicts between sources. While prompting LLMs to explicitly reason about the potential conflict in the retrieved documents significantly improves the quality and appropriateness of their responses, substantial room for improvement in future research remains.



Article 804

Title@2025-06-10 (2): Chip Placement with Diffusion Models

Title: Chip Placement with Diffusion Models Chipplatzierung mit Diffusionsmodellen 与传播模型一起放置芯片 2407.12282v3

Authors (6): Vint Lee, Minh Nguyen, Leena Elzeiny, Chun Deng, Pieter Abbeel, John Wawrzynek

Macro placement is a vital step in digital circuit design that defines the physical location of large collections of components, known as macros, on a 2D chip. Because key performance metrics of the chip are determined by the placement, optimizing it is crucial. Existing learning-based methods typically fall short because of their reliance on reinforcement learning (RL), which is slow and struggles to generalize, requiring online training on each new circuit. Instead, we train a diffusion model capable of placing new circuits zero-shot, using guided sampling in lieu of RL to optimize placement quality. To enable such models to train at scale, we design a capable yet efficient architecture for the denoising model and propose a novel algorithm to generate large synthetic datasets for pre-training. To allow zero-shot transfer to real circuits, we empirically study the design decisions of our dataset generation algorithm and identify several key factors enabling generalization. When trained on our synthetic data, our models generate high-quality placements on unseen, realistic circuits, achieving competitive performance on placement benchmarks compared to state-of-the-art methods.



Article 805

Title@2025-06-10 (2): The Face of Populism: Examining Differences in Facial Emotional Expressions of Political Leaders Using Machine Learning

Title: The Face of Populism: Examining Differences in Facial Emotional Expressions of Political Leaders Using Machine Learning Das Gesicht des Populismus: Prüfung von Unterschieden in Gesichtsgefühlen politischer Führer mit Hilfe von maschinellem Lernen 民粹主义的面孔:利用机器学习研究政治领袖的情感表达方式的差别 2304.09914v5

Authors (2): Sara Major, Aleksandar Tomašević

Populist rhetoric employed on online media is characterized as deeply impassioned and often imbued with strong emotions. The aim of this paper is to empirically investigate the differences in affective nonverbal communication of political leaders. We use a deep-learning approach to process a sample of 220 YouTube videos of political leaders from 15 different countries, analyze their facial expressions of emotion and then examine differences in average emotion scores representing the relative presence of 6 emotional states (anger, disgust, fear, happiness, sadness, and surprise) and a neutral expression for each frame of the YouTube video. Based on a sample of manually coded images, we find that this deep-learning approach has 53-60% agreement with human labels. We observe statistically significant differences in the average score of negative emotions between groups of leaders with varying degrees of populist rhetoric.



Article 806

Title@2025-06-10 (2): Thermodynamically Consistent Latent Dynamics Identification for Parametric Systems

Title: Thermodynamically Consistent Latent Dynamics Identification for Parametric Systems Thermodynamisch konsistente Latent Dynamics-Identifikation für parametrische Systeme 参数系统热动力一致的冷流动态识别 2506.08475v1

Authors (6): Xiaolong He, Yeonjong Shin, Anthony Gruber, Sohyeon Jung, Kookjin Lee, Youngsoo Choi

We propose an efficient thermodynamics-informed latent space dynamics identification (tLaSDI) framework for the reduced-order modeling of parametric nonlinear dynamical systems. This framework integrates autoencoders for dimensionality reduction with newly developed parametric GENERIC formalism-informed neural networks (pGFINNs), which enable efficient learning of parametric latent dynamics while preserving key thermodynamic principles such as free energy conservation and entropy generation across the parameter space. To further enhance model performance, a physics-informed active learning strategy is incorporated, leveraging a greedy, residual-based error indicator to adaptively sample informative training data, outperforming uniform sampling at equivalent computational cost. Numerical experiments on the Burgers’ equation and the 1D/1V Vlasov-Poisson equation demonstrate that the proposed method achieves up to 3,528x speed-up with 1-3% relative errors, and significant reduction in training (50-90%) and inference (57-61%) cost. Moreover, the learned latent space dynamics reveal the underlying thermodynamic behavior of the system, offering valuable insights into the physical-space dynamics.



Article 807

Title@2025-06-10 (2): Penalty Learning for Optimal Partitioning using Multilayer Perceptron

Title: Penalty Learning for Optimal Partitioning using Multilayer Perceptron Penalty Learning für optimale Partitionierung mit Multilayer Perceptron 使用多层 Perceptron 进行最佳分区的惩罚学习 2408.00856v4

Authors (2): Tung L Nguyen, Toby Dylan Hocking

Changepoint detection is a technique used to identify significant shifts in sequences and is widely used in fields such as finance, genomics, and medicine. To identify the changepoints, dynamic programming (DP) algorithms, particularly the Optimal Partitioning (OP) family, are widely used. To control the number of changepoints, these algorithms apply a fixed penalty for each changepoint. To predict the optimal value of that penalty, existing methods use simple models, such as linear or tree-based models, which may limit predictive performance. To address this issue, this study proposes using a multilayer perceptron (MLP) with a ReLU activation function to predict the penalty. The proposed model generates continuous predictions, as opposed to the stepwise ones in tree-based models, and handles non-linearity better than linear models. Experiments on large benchmark genomic datasets demonstrate that the proposed model improves accuracy and F1 score compared to existing models.
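
To make the role of the penalty concrete, here is a minimal sketch of the classic Optimal Partitioning recursion with a fixed penalty. It is not the paper's code: the squared-error segment cost, the function name `optimal_partitioning`, and the toy signal are assumptions, and the paper's actual contribution (predicting the penalty with an MLP) is not shown.

```python
def optimal_partitioning(signal, penalty):
    """Return changepoint indices minimizing total segment cost plus a penalty per changepoint.

    Segment cost is the squared error around the segment mean (an assumed loss).
    A changepoint index t means a new segment starts at position t.
    """
    n = len(signal)
    # Prefix sums of x and x^2 give each segment cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(a, b):
        # Squared error of signal[a:b] around its mean, with b > a.
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)  # best[t]: optimal penalized cost of signal[:t]
    best[0] = -penalty      # so that the first segment carries no penalty
    last = [0] * (n + 1)    # last[t]: start of the final segment in that optimum
    for t in range(1, n + 1):
        candidates = [(best[s] + cost(s, t) + penalty, s) for s in range(t)]
        best[t], last[t] = min(candidates)

    # Walk back through segment starts; position 0 is not a changepoint.
    changepoints = []
    t = n
    while t > 0:
        if last[t] > 0:
            changepoints.append(last[t])
        t = last[t]
    return sorted(changepoints)

signal = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 5.0]  # toy data with one mean shift
few = optimal_partitioning(signal, penalty=1.0)      # small penalty: shift is detected
none = optimal_partitioning(signal, penalty=1000.0)  # large penalty: no changepoints
```

The same data yields different segmentations depending on the penalty, which is why learning a good per-sequence penalty matters.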



Article 808

Title@2025-06-10 (2): A Survey of the Self Supervised Learning Mechanisms for Vision Transformers

Title: A Survey of the Self Supervised Learning Mechanisms for Vision Transformers Eine Übersicht über die selbstüberwachten Lernmechanismen für Visionstransformatoren 展望变革者自我监督学习机制调查 2408.17059v5

Authors (14): Asifullah Khan, Anabia Sohail, Mustansar Fiaz, Mehdi Hassan, Tariq Habib Afridi, Sibghat Ullah Marwat, Farzeen Munir, Safdar Ali, Hannan Naseem, Muhammad Zaigham Zaheer, Kamran Ali, Tangina Sultana, Ziaurrehman Tanoli, Naeem Akhter

Vision Transformers (ViTs) have recently demonstrated remarkable performance in computer vision tasks. However, their parameter-intensive nature and reliance on large amounts of data for effective performance have shifted the focus from traditional human-annotated labels to unsupervised learning and pretraining strategies that uncover hidden structures within the data. In response to this challenge, self-supervised learning (SSL) has emerged as a promising paradigm. SSL leverages inherent relationships within the data itself as a form of supervision, eliminating the need for manual labeling and offering a more scalable and resource-efficient alternative for model training. Given these advantages, it is imperative to explore the integration of SSL techniques with ViTs, particularly in scenarios with limited labeled data. Inspired by this evolving trend, this survey aims to systematically review SSL mechanisms tailored for ViTs. We propose a comprehensive taxonomy to classify SSL techniques based on their representations and pre-training tasks. Additionally, we discuss the motivations behind SSL, review prominent pre-training tasks, and highlight advancements and challenges in this field. Furthermore, we conduct a comparative analysis of various SSL methods designed for ViTs, evaluating their strengths, limitations, and applicability to different scenarios.



Article 809

Title@2025-06-10 (2): How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation

Title: How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation Wie richten und ergänzen Bilder LiDAR? Auf dem Weg zu einer harmonisierten multimodalen 3D-Panoptischen Segmentierung 图像如何对齐和补充 LiDAR ? 2505.18956v2

Authors (4): Yining Pan, Qiongjie Cui, Xulei Yang, Na Zhao

LiDAR-based 3D panoptic segmentation often struggles with the inherent sparsity of data from LiDAR sensors, which makes it challenging to accurately recognize distant or small objects. Recently, a few studies have sought to overcome this challenge by integrating LiDAR inputs with camera images, leveraging the rich and dense texture information provided by the latter. While these approaches have shown promising results, they still face challenges, such as misalignment during data augmentation and the reliance on post-processing steps. To address these issues, we propose Image-Assists-LiDAR (IAL), a novel multi-modal 3D panoptic segmentation framework. In IAL, we first introduce a modality-synchronized data augmentation strategy, PieAug, to ensure alignment between LiDAR and image inputs from the start. Next, we adopt a transformer decoder to directly predict panoptic segmentation results. To effectively fuse LiDAR and image features into tokens for the decoder, we design a Geometric-guided Token Fusion (GTF) module. Additionally, we leverage the complementary strengths of each modality as priors for query initialization through a Prior-based Query Generation (PQG) module, enhancing the decoder’s ability to generate accurate instance masks. Our IAL framework achieves state-of-the-art performance compared to previous multi-modal 3D panoptic segmentation methods on two widely used benchmarks. Code and models are publicly available at https://github.com/IMPL-Lab/IAL.git.



Article 810

Title@2025-06-10 (2): AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models

Title: AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models AVA-Bench: Atomic Visual Ability Benchmark für Vision Foundation Modelle AVA-Bench:远景基础模型的原子视觉能力基准 2506.09082v1

Authors (8): Zheda Mai, Arpita Chowdhury, Zihe Wang, Sooyoung Jeon, Lemeng Wang, Jiacheng Hou, Jihyung Kil, Wei-Lun Chao

The rise of vision foundation models (VFMs) calls for systematic evaluation. A common approach pairs VFMs with large language models (LLMs) as general-purpose heads, followed by evaluation on broad Visual Question Answering (VQA) benchmarks. However, this protocol has two key blind spots: (i) the instruction tuning data may not align with VQA test distributions, meaning a wrong prediction can stem from such data mismatch rather than a VFM’s visual shortcomings; (ii) VQA benchmarks often require multiple visual abilities, making it hard to tell whether errors stem from lacking all required abilities or just a single critical one. To address these gaps, we introduce AVA-Bench, the first benchmark that explicitly disentangles 14 Atomic Visual Abilities (AVAs): foundational skills like localization, depth estimation, and spatial understanding that collectively support complex visual reasoning tasks. By decoupling AVAs and matching training and test distributions within each, AVA-Bench pinpoints exactly where a VFM excels or falters. Applying AVA-Bench to leading VFMs thus reveals distinctive “ability fingerprints,” turning VFM selection from educated guesswork into principled engineering. Notably, we find that a 0.5B LLM yields VFM rankings similar to those of a 7B LLM while cutting GPU hours by 8x, enabling more efficient evaluation. By offering a comprehensive and transparent benchmark, we hope AVA-Bench lays the foundation for the next generation of VFMs.



Article 811

Title@2025-06-10 (2): MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature

Title: MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature MAC: Eine effiziente Gradienten-Vorkonditionierung mit der mittleren Aktivierung Ungefähre Krümmung MAC: 使用平均振动作用的高效渐进式预设设备 2506.08464v1

Authors (3): Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko

Second-order optimization methods for training neural networks, such as KFAC, exhibit superior convergence by utilizing curvature information of the loss landscape. However, this comes at the expense of a high computational burden. In this work, we analyze the two components that constitute the layer-wise Fisher information matrix (FIM) used in KFAC: the Kronecker factors related to activations and pre-activation gradients. Based on empirical observations on their eigenspectra, we propose efficient approximations for them, resulting in a computationally efficient optimization method called MAC. To the best of our knowledge, MAC is the first algorithm to apply the Kronecker factorization to the FIM of attention layers used in transformers and explicitly integrate attention scores into the preconditioning. We also study the convergence property of MAC on nonlinear neural networks and provide two conditions under which it converges to global minima. Our extensive evaluations on various network architectures and datasets show that the proposed method outperforms KFAC and other state-of-the-art methods in terms of accuracy, end-to-end training time, and memory usage. Code is available at https://github.com/hseung88/mac.
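
The two Kronecker factors mentioned above can be illustrated for a single linear layer. This is a minimal sketch of the standard KFAC identity, not of MAC itself: for one example, the outer product of the vectorized weight gradient equals the Kronecker product of the activation factor and the pre-activation gradient factor. KFAC then approximates the expectation of this product by the product of expectations. The helper names and numeric values are assumptions.

```python
def outer(u, v):
    """Outer product of two vectors as a nested list."""
    return [[x * y for y in v] for x in u]

def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    rows_b, cols_b = len(B), len(B[0])
    return [
        [A[i // rows_b][j // cols_b] * B[i % rows_b][j % cols_b]
         for j in range(len(A[0]) * cols_b)]
        for i in range(len(A) * rows_b)
    ]

a = [1.0, 2.0, 3.0]  # layer input (activations), assumed values
g = [0.5, -1.0]      # gradient w.r.t. pre-activations, assumed values

grad_W = outer(g, a)  # weight gradient of a linear layer, shape (out, in)

# Column-major vectorization of grad_W.
vec = [grad_W[i][j] for j in range(len(a)) for i in range(len(g))]

fisher_block = outer(vec, vec)                 # exact per-example curvature block
kron_factors = kron(outer(a, a), outer(g, g))  # activation factor (x) gradient factor
```

Storing the two small factors (3×3 and 2×2 here) instead of the full 6×6 block is what makes Kronecker-factored preconditioning tractable for large layers.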



Article 812

Title@2025-06-10 (2): How to Provably Improve Return Conditioned Supervised Learning?

Title: How to Provably Improve Return Conditioned Supervised Learning? Wie man wahrscheinlich verbessern Rückkehr bedingtes Überwachtes Lernen? 如何改善受监督的学习? 2506.08463v1

Authors (5): Zhishuai Liu, Yu Yang, Ruhan Wang, Pan Xu, Dongruo Zhou

In sequential decision-making problems, Return-Conditioned Supervised Learning (RCSL) has gained increasing recognition for its simplicity and stability in modern decision-making tasks. Unlike traditional offline reinforcement learning (RL) algorithms, RCSL frames policy learning as a supervised learning problem by taking both the state and return as input. This approach eliminates the instability often associated with temporal difference (TD) learning in offline RL. However, RCSL has been criticized for lacking the stitching property, meaning its performance is inherently limited by the quality of the policy used to generate the offline dataset. To address this limitation, we propose a principled and simple framework called Reinforced RCSL. The key innovation of our framework is the introduction of a concept we call the in-distribution optimal return-to-go. This mechanism leverages our policy to identify the best achievable in-dataset future return based on the current state, avoiding the need for complex return augmentation techniques. Our theoretical analysis demonstrates that Reinforced RCSL can consistently outperform the standard RCSL approach. Empirical results further validate our claims, showing significant performance improvements across a range of benchmarks.
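
As a rough illustration of the quantities involved, the sketch below computes returns-to-go for offline trajectories and then, for each state, the largest return-to-go observed in the dataset. This is a tabular toy under stated assumptions (discrete states, undiscounted rewards, a made-up dataset), not the paper's definition or algorithm; the names `returns_to_go` and `best_rtg` are hypothetical.

```python
def returns_to_go(rewards):
    """Undiscounted return-to-go: sum of rewards from each step to the end."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]

# Toy offline dataset: each trajectory is a list of (state, reward) pairs.
dataset = [
    [("s0", 0.0), ("s1", 1.0), ("s2", 0.0)],
    [("s0", 0.0), ("s1", 0.0), ("s3", 5.0)],
]

best_rtg = {}  # state -> largest return-to-go observed from that state
for trajectory in dataset:
    rewards = [r for _, r in trajectory]
    for (state, _), g in zip(trajectory, returns_to_go(rewards)):
        best_rtg[state] = max(best_rtg.get(state, float("-inf")), g)
```

Standard RCSL would condition the policy on a user-chosen target return; conditioning instead on a state-dependent value such as `best_rtg[state]` is the kind of in-dataset target the abstract alludes to.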



Article 813

Title@2025-06-10 (2): MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning

Title: MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning MOBODY: Modellbasierte Off-Dynamik Offline-Verstärkung Lernen MOBODY: 示范非药物非药物脱线强化学习 2506.08460v1

Authors (4): Yihong Guo, Yu Yang, Pan Xu, Anqi Liu

We study the off-dynamics offline reinforcement learning problem, where the goal is to learn a policy from offline datasets collected from source and target domains with mismatched transitions. Existing off-dynamics offline RL methods typically either filter source transitions that resemble those of the target domain or apply reward augmentation to source data, both constrained by the limited transitions available from the target domain. As a result, the learned policy is unable to explore the target domain beyond the offline datasets. We propose MOBODY, a Model-Based Off-Dynamics offline RL algorithm that addresses this limitation by enabling exploration of the target domain via learned dynamics. MOBODY generates new synthetic transitions in the target domain through model rollouts, which are used as data augmentation during offline policy learning. Unlike existing model-based methods that learn dynamics from a single domain, MOBODY tackles the challenge of mismatched dynamics by leveraging both source and target datasets. Directly merging these datasets can bias the learned model toward source dynamics. Instead, MOBODY learns target dynamics by discovering a shared latent representation of states and transitions across domains through representation learning. To stabilize training, MOBODY incorporates a behavior cloning loss that regularizes the policy. Specifically, we introduce a Q-weighted behavior cloning loss that regularizes the policy toward actions with high target-domain Q-values, rather than uniformly imitating all actions in the dataset. These Q-values are learned from an enhanced target dataset composed of offline target data, augmented source data, and rollout data from the learned target dynamics. We evaluate MOBODY on MuJoCo benchmarks and show that it significantly outperforms state-of-the-art baselines, with especially pronounced improvements in challenging scenarios.



Article 814

Title@2025-06-10 (2): Textual Unlearning Gives a False Sense of Unlearning

Title: Textual Unlearning Gives a False Sense of Unlearning Textuelles Nichtlernen gibt einen falschen Sinn für das Nichtlernen 文字不学习给人一个错误的不学习感 2406.13348v3

Authors (6): Jiacheng Du, Zhibo Wang, Jie Zhang, Xiaoyi Pang, Jiahui Hu, Kui Ren

Language Models (LMs) are prone to “memorizing” training data, including substantial sensitive user information. To mitigate privacy risks and safeguard the right to be forgotten, machine unlearning has emerged as a promising approach for enabling LMs to efficiently “forget” specific texts. However, despite the good intentions, is textual unlearning really as effective and reliable as expected? To address the concern, we first propose Unlearning Likelihood Ratio Attack+ (U-LiRA+), a rigorous textual unlearning auditing method, and find that unlearned texts can still be detected with very high confidence after unlearning. Further, we conduct an in-depth investigation on the privacy risks of textual unlearning mechanisms in deployment and present the Textual Unlearning Leakage Attack (TULA), along with its variants in both black- and white-box scenarios. We show that textual unlearning mechanisms could instead reveal more about the unlearned texts, exposing them to significant membership inference and data reconstruction risks. Our findings highlight that existing textual unlearning actually gives a false sense of unlearning, underscoring the need for more robust and secure unlearning mechanisms.



Article 815

Title@2025-06-10 (2): The interplay of robustness and generalization in quantum machine learning

Title: The interplay of robustness and generalization in quantum machine learning Das Zusammenspiel von Robustheit und Verallgemeinerung im quantenmechanischen Lernen 量子机器学习中强力和一般化的相互作用 2506.08455v1

Authors (3): Julian Berberich, Tobias Fellner, Christian Holm

While adversarial robustness and generalization have individually received substantial attention in the recent literature on quantum machine learning, their interplay is much less explored. In this chapter, we address this interplay for variational quantum models, which were recently proposed as function approximators in supervised learning. We discuss recent results quantifying both robustness and generalization via Lipschitz bounds, which explicitly depend on model parameters. Thus, they give rise to a regularization-based training approach for robust and generalizable quantum models, highlighting the importance of trainable data encoding strategies. The practical implications of the theoretical results are demonstrated with an application to time series analysis.



Article 816

Title@2025-06-10 (2): An Explainable Vision Transformer with Transfer Learning Combined with Support Vector Machine Based Efficient Drought Stress Identification

Title: An Explainable Vision Transformer with Transfer Learning Combined with Support Vector Machine Based Efficient Drought Stress Identification Ein erklärbarer Vision Transformer mit Transfer Learning kombiniert mit Unterstützung Vector Maschine basierte effiziente Drought Stress-Identifikation 一个可解释的愿景变异器,与转让学习相结合,与基于支持矢量的高效干旱应激反应机的识别 2407.21666v2

Authors (3): Aswini Kumar Patra, Ankit Varshney, Lingaraj Sahoo

Early detection of drought stress is critical for taking timely measures to reduce crop loss before the drought impact becomes irreversible. The subtle phenotypical and physiological changes in response to drought stress are captured by non-invasive imaging techniques, and these imaging data serve as a valuable resource for machine learning methods to identify drought stress. While convolutional neural networks (CNNs) are in wide use, vision transformers (ViTs) present a promising alternative in capturing long-range dependencies and intricate spatial relationships, thereby enhancing the detection of subtle indicators of drought stress. We propose an explainable deep learning pipeline that leverages the power of ViTs for drought stress detection in potato crops using aerial imagery. We applied two distinct approaches: a synergistic combination of ViT and support vector machine (SVM), where ViT extracts intricate spatial features from aerial images and SVM classifies the crops as stressed or healthy; and an end-to-end approach using a dedicated classification layer within ViT to directly detect drought stress. Our key findings explain the ViT model’s decision-making process by visualizing attention maps. These maps highlight the specific spatial features within the aerial images that the ViT model focuses on as the drought stress signature. Our findings demonstrate that the proposed methods not only achieve high accuracy in drought stress identification but also shed light on the diverse subtle plant features associated with drought stress. This offers a robust and interpretable solution for drought stress monitoring, helping farmers make informed decisions for improved crop management.



Article 817

Title@2025-06-10 (2): Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data

Title: Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data Stabilisierendes Kontrastives RL: Techniken für das Roboterziel, das von Offline-Daten aus erreicht wird 稳定违规RL:利用离线数据实现机器人目标的技术 2306.03346v3

Authors (7): Chongyi Zheng, Benjamin Eysenbach, Homer Walke, Patrick Yin, Kuan Fang, Ruslan Salakhutdinov, Sergey Levine

Robotic systems that rely primarily on self-supervised learning have the potential to decrease the amount of human annotation and engineering effort required to learn control strategies. In the same way that prior robotic systems have leveraged self-supervised techniques from computer vision (CV) and natural language processing (NLP), our work builds on prior work showing that reinforcement learning (RL) itself can be cast as a self-supervised problem: learning to reach any goal without human-specified rewards or labels. Despite the seeming appeal, little (if any) prior work has demonstrated how self-supervised RL methods can be practically deployed on robotic systems. By first studying a challenging simulated version of this task, we discover design decisions about architectures and hyperparameters that increase the success rate by $2\times$. These findings lay the groundwork for our main result: we demonstrate that a self-supervised RL algorithm based on contrastive learning can solve real-world, image-based robotic manipulation tasks, with tasks being specified by a single goal image provided after training.



Article 818

Title@2025-06-10 (2): Systematic and Efficient Construction of Quadratic Unconstrained Binary Optimization Forms for High-order and Dense Interactions

Title: Systematic and Efficient Construction of Quadratic Unconstrained Binary Optimization Forms for High-order and Dense Interactions Systematische und effiziente Konstruktion von Quadratic Unconstrained Binary Optimization Forms für High-Order und Dense Interaktionen 系统、高效地为高阶和高调互动和高调互动系统构建 “ 中度、无限制的 “ 优化二进制表格 2506.08448v1

Authors (2): Hyakka Nakada, Shu Tanaka

Quantum Annealing (QA) can efficiently solve combinatorial optimization problems whose objective functions are represented by Quadratic Unconstrained Binary Optimization (QUBO) formulations. For broader applicability of QA, quadratization methods are used to transform higher-order problems into QUBOs. However, quadratization methods for complex problems involving Machine Learning (ML) remain largely unknown. In these problems, strong nonlinearity and dense interactions prevent conventional methods from being applied. Therefore, we model target functions as a sum of rectified linear unit bases, which not only are universal approximators but also have an equivalent quadratic-polynomial representation. In this study, the proof of concept is verified both numerically and analytically. In addition, by combining QA with the proposed quadratization, we design a new black-box optimization scheme, in which ML surrogate regressors are fed to QA after the quadratization process.



Article 819

Title@2025-06-10 (2): Understanding Bias Reinforcement in LLM Agents Debate

Title: Understanding Bias Reinforcement in LLM Agents Debate Verständnis der Bias-Verstärkung in LLM-Agenten-Debatte 了解LLLM代理商的强化申请 2503.16814v3

Authors (4): Jihwan Oh, Minchan Jeong, Jongwoo Ko, Se-Young Yun

Large Language Models (LLMs) solve complex problems using training-free methods like prompt engineering and in-context learning, yet ensuring reasoning correctness remains challenging. While self-correction methods such as self-consistency and self-refinement aim to improve reliability, they often reinforce biases due to the lack of effective feedback mechanisms. Multi-Agent Debate (MAD) has emerged as an alternative, but we identify two key limitations: bias reinforcement, where debate amplifies model biases instead of correcting them, and lack of perspective diversity, as all agents share the same model and reasoning patterns, limiting true debate effectiveness. To systematically evaluate these issues, we introduce MetaNIM Arena, a benchmark designed to assess LLMs in adversarial strategic decision-making, where dynamic interactions influence optimal decisions. To overcome MAD’s limitations, we propose DReaMAD (Diverse Reasoning via Multi-Agent Debate with Refined Prompt), a novel framework that (1) refines LLM’s strategic prior knowledge to improve reasoning quality and (2) promotes diverse viewpoints within a single model by systematically modifying prompts, reducing bias. Empirical results show that DReaMAD significantly improves decision accuracy, reasoning diversity, and bias mitigation across multiple strategic tasks, establishing it as a more effective approach for LLM-based decision-making.



Article 820

Title@2025-06-10 (2): A Framework for Controllable Multi-objective Learning with Annealed Stein Variational Hypernetworks

Title: A Framework for Controllable Multi-objective Learning with Annealed Stein Variational Hypernetworks Ein Rahmen für kontrollierbares multiobjektives Lernen mit Annealed Stein Variational Hypernetworks 与Annaaled Stein变异超网络的可控多目标学习框架 2506.06715v2

Authors (2): Minh-Duc Nguyen, Dung D. Le

Pareto Set Learning (PSL) is popular as an efficient approach to obtaining the complete optimal solution in Multi-objective Learning (MOL). A set of optimal solutions approximates the Pareto set, and its mapping is a set of dense points in the Pareto front in objective space. However, some current methods face a challenge: how to keep the Pareto solutions diverse while maximizing the hypervolume value. In this paper, we propose a novel method to address this challenge, which employs Stein Variational Gradient Descent (SVGD) to approximate the entire Pareto set. SVGD pushes a set of particles towards the Pareto set by applying a form of functional gradient descent, which helps the optimal solutions converge and diversify. Additionally, we employ diverse gradient direction strategies to thoroughly investigate a unified framework for SVGD in multi-objective optimization and adapt this framework with an annealing schedule to promote stability. We introduce our method, SVH-MOL, and validate its effectiveness through extensive experiments on multi-objective problems and multi-task learning, demonstrating its superior performance.



Article 821

Title@2025-06-10 (2): Time-Aware World Model for Adaptive Prediction and Control

Title: Time-Aware World Model for Adaptive Prediction and Control Time-Aware-Weltmodell für adaptive Vorhersage und Kontrolle 适应性预测和控制世界模型 2506.08441v1

Authors (3): Anh N. Nhu, Sanghyun Son, Ming Lin

In this work, we introduce the Time-Aware World Model (TAWM), a model-based approach that explicitly incorporates temporal dynamics. By conditioning on the time-step size, $\Delta t$, and training over a diverse range of $\Delta t$ values (rather than sampling at a fixed time-step), TAWM learns both high- and low-frequency task dynamics across diverse control problems. Grounded in the information-theoretic insight that the optimal sampling rate depends on a system’s underlying dynamics, this time-aware formulation improves both performance and data efficiency. Empirical evaluations show that TAWM consistently outperforms conventional models across varying observation rates in a variety of control tasks, using the same number of training samples and iterations. Our code can be found online at: github.com/anh-nn01/Time-Aware-World-Model.



Article 822

Title@2025-06-10 (2): Refiner: Data Refining against Gradient Leakage Attacks in Federated Learning

Title: Refiner: Data Refining against Gradient Leakage Attacks in Federated Learning Raffinerie: Datenveredelung gegen gradiente Leckage-Angriffe im Federated Learning 完善:完善数据,防止联邦学习中逐渐泄漏事件 2212.02042v3

Authors (5): Mingyuan Fan, Cen Chen, Chengyu Wang, Xiaodan Li, Wenmeng Zhou

Recent works have brought attention to the vulnerability of Federated Learning (FL) systems to gradient leakage attacks. Such attacks exploit clients’ uploaded gradients to reconstruct their sensitive data, thereby compromising the privacy protection capability of FL. In response, various defense mechanisms have been proposed to mitigate this threat by manipulating the uploaded gradients. Unfortunately, empirical evaluations have demonstrated limited resilience of these defenses against sophisticated attacks, indicating an urgent need for more effective defenses. In this paper, we explore a novel defensive paradigm that departs from conventional gradient perturbation approaches and instead focuses on the construction of robust data. Intuitively, if robust data exhibits low semantic similarity with clients’ raw data, the gradients associated with robust data can effectively obfuscate attackers. To this end, we design Refiner, which jointly optimizes two metrics, one for privacy protection and one for performance maintenance. The utility metric is designed to promote consistency between the gradients of key parameters associated with robust data and those derived from clients’ data, thus maintaining model performance. The privacy metric guides the generation of robust data towards enlarging the semantic gap with clients’ data. Theoretical analysis supports the effectiveness of Refiner, and empirical evaluations on multiple benchmark datasets demonstrate the superior effectiveness of Refiner in defending against state-of-the-art attacks.



Article 823

Title@2025-06-10 (2): Scaling Laws in Linear Regression: Compute, Parameters, and Data

Title: Scaling Laws in Linear Regression: Compute, Parameters, and Data Skalierungsgesetze in linearer Regression: Berechnung, Parameter und Daten 线性回归法的扩展法:计算、参数和数据 2406.08466v3

Authors (5): Licong Lin, Jingfeng Wu, Sham M. Kakade, Peter L. Bartlett, Jason D. Lee

Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow. However, conventional wisdom suggests the test error consists of approximation, bias, and variance errors, where the variance error increases with model size. This disagrees with the general form of neural scaling laws, which predict that increasing model size monotonically improves performance. We study the theory of scaling laws in an infinite dimensional linear regression setup. Specifically, we consider a model with $M$ parameters as a linear function of sketched covariates. The model is trained by one-pass stochastic gradient descent (SGD) using $N$ data. Assuming the optimal parameter satisfies a Gaussian prior and the data covariance matrix has a power-law spectrum of degree $a>1$, we show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$. The variance error, which increases with $M$, is dominated by the other errors due to the implicit regularization of SGD, thus disappearing from the bound. Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
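
To make the stated rate concrete, the following minimal sketch evaluates $M^{-(a-1)} + N^{-(a-1)/a}$ for a few model and data sizes. It assumes all constants hidden by $\Theta(\cdot)$ equal 1, which the abstract does not specify, and the function name `reducible_error_rate` is hypothetical.

```python
def reducible_error_rate(M: int, N: int, a: float) -> float:
    """Evaluate M^{-(a-1)} + N^{-(a-1)/a}, the rate quoted in the abstract.

    Assumption: every constant hidden by Theta(.) is set to 1, so only the
    trend across (M, N) is meaningful, not the absolute value.
    """
    if a <= 1:
        raise ValueError("the abstract assumes a power-law degree a > 1")
    model_term = M ** (-(a - 1))       # approximation term, shrinks with model size
    data_term = N ** (-(a - 1) / a)    # term that shrinks with the number of samples
    return model_term + data_term


if __name__ == "__main__":
    a = 2.0
    for M, N in [(10, 10_000), (100, 10_000), (100, 1_000_000)]:
        print(f"M={M:>4} N={N:>8} rate={reducible_error_rate(M, N, a):.5f}")
```

Under these assumptions, growing either $M$ or $N$ alone can only lower the expression, which matches the monotone behaviour the abstract attributes to neural scaling laws.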



Article 824

Title@2025-06-10 (2): Learning to Lead: Incentivizing Strategic Agents in the Dark

Title: Learning to Lead: Incentivizing Strategic Agents in the Dark Führen lernen: Anreize für strategische Agenten im Dunkeln 学习铅:在黑暗中激励战略剂 2506.08438v1

Authors (3): Yuchen Wu, Xinyi Zhong, Zhuoran Yang

We study an online learning version of the generalized principal-agent model, where a principal interacts repeatedly with a strategic agent who has private types and private rewards and takes unobservable actions. The agent is non-myopic, optimizing a discounted sum of future rewards, and may strategically misreport types to manipulate the principal’s learning. The principal, observing only her own realized rewards and the agent’s reported types, aims to learn an optimal coordination mechanism that minimizes strategic regret. We develop the first provably sample-efficient algorithm for this challenging setting. Our approach features a novel pipeline that combines (i) a delaying mechanism to incentivize approximately myopic agent behavior, (ii) an innovative reward angle estimation framework that uses sector tests and a matching procedure to recover type-dependent reward functions, and (iii) a pessimistic-optimistic LinUCB algorithm that enables the principal to explore efficiently while respecting the agent’s incentive constraints. We establish a near-optimal $\tilde{O}(\sqrt{T})$ regret bound for learning the principal’s optimal policy, where $\tilde{O}(\cdot)$ omits logarithmic factors. Our results open up new avenues for designing robust online learning algorithms for a wide range of game-theoretic settings involving private types and strategic agents.



Article 825

Title@2025-06-10 (2): Olica: Efficient Structured Pruning of Large Language Models without Retraining

Title: Olica: Efficient Structured Pruning of Large Language Models without Retraining Olica: Effizientes strukturiertes Beschneiden großer Sprachmodelle ohne Umschulung Olica:无再培训的大型语言模型高效结构稳健 2506.08436v1

Authors (2): Jiujun He, Huazhen Lin

Most existing structured pruning methods for Large Language Models (LLMs) require substantial computational and data resources for retraining to reestablish the corrupted correlations, making them prohibitively expensive. To address this, we propose a pruning framework for LLMs called Orthogonal decomposition and Linear Calibration (Olica), which eliminates the need for retraining. A key observation is that the multi-head attention (MHA) layer depends on two types of matrix products. By treating these matrix products as unified entities and applying principal component analysis (PCA), we extract the most important information to compress LLMs without sacrificing accuracy or disrupting their original structure. Consequently, retraining becomes unnecessary. A fast decomposition method is devised, reducing the complexity of PCA by a factor of the square of the number of attention heads. Additionally, to mitigate the error accumulation problem caused by pruning the feed-forward network (FFN) layer, we introduce a linear calibration method to reconstruct the residual errors of pruned layers using low-rank matrices. By leveraging singular value decomposition (SVD) on the solution of the least-squares problem, these matrices are obtained without requiring retraining. Extensive experiments show that the proposed Olica is efficient in terms of data usage, GPU memory, and running time, while delivering superior performance across multiple benchmarks.
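
The linear calibration step described above (least squares followed by SVD) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `low_rank_calibration`, `X` (calibration inputs to a pruned layer) and `R` (original output minus pruned output) are hypothetical, and the toy data is synthetic.

```python
import numpy as np


def low_rank_calibration(X: np.ndarray, R: np.ndarray, rank: int):
    """Fit R ~= X @ A @ B with A of shape (d_in, rank) and B of shape (rank, d_out).

    X: (n, d_in) calibration inputs (hypothetical name).
    R: (n, d_out) residual error of the pruned layer (hypothetical name).
    """
    # Step 1: ordinary least squares, W = argmin ||X W - R||_F.
    W, *_ = np.linalg.lstsq(X, R, rcond=None)
    # Step 2: truncated SVD of W; this is the best rank-`rank` approximation
    # of W itself in Frobenius norm (Eckart-Young).
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # scale the kept left singular vectors
    B = Vt[:rank, :]
    return A, B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 16))
    W_true = rng.normal(size=(16, 3)) @ rng.normal(size=(3, 8))  # exactly rank 3
    R = X @ W_true
    A, B = low_rank_calibration(X, R, rank=3)
    rel_err = np.linalg.norm(X @ A @ B - R) / np.linalg.norm(R)
    print(f"relative reconstruction error: {rel_err:.2e}")
```

Storing the two factors costs `rank * (d_in + d_out)` parameters instead of `d_in * d_out`, which is presumably why a low-rank correction is attractive after pruning; no gradient-based retraining is involved in either step.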



Article 826

Title@2025-06-10 (2): Boosting Gradient Leakage Attacks: Data Reconstruction in Realistic FL Settings

Title: Boosting Gradient Leakage Attacks: Data Reconstruction in Realistic FL Settings Förderung gradienter Leckagenangriffe: Datenwiederaufbau in realistischen FL-Einstellungen 推动渐变渗漏袭击:在现实的FL设置中进行数据重建 2506.08435v1

Authors (4): Mingyuan Fan, Fuyi Wang, Cen Chen, Jianying Zhou

Federated learning (FL) enables collaborative model training among multiple clients without the need to expose raw data. Its ability to safeguard privacy, at the heart of FL, has recently been a hotly debated topic. To elaborate, several studies have introduced a type of attack known as gradient leakage attacks (GLAs), which exploit the gradients shared during training to reconstruct clients’ raw data. On the other hand, some literature contends that there is no substantial privacy risk in practical FL environments, because the effectiveness of such GLAs is limited to overly relaxed conditions, such as small batch sizes and knowledge of clients’ data distributions. This paper bridges this critical gap by empirically demonstrating that clients’ data can still be effectively reconstructed, even within realistic FL environments. Upon revisiting GLAs, we recognize that their performance failures stem from their inability to handle the gradient matching problem. To alleviate the performance bottlenecks identified above, we develop FedLeak, which introduces two novel techniques, partial gradient matching and gradient regularization. Moreover, to evaluate the performance of FedLeak in real-world FL environments, we formulate a practical evaluation protocol grounded in a thorough review of extensive FL literature and industry practices. Under this protocol, FedLeak can still achieve high-fidelity data reconstruction, thereby underscoring the significant vulnerability in FL systems and the urgent need for more effective defense methods.



Article 827

Title@2025-06-10 (2): FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making

Title: FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making FinHEAR: Human Expertise und adaptive Risiko-Bewusst Temporale Begründung für Finanzentscheidungen FinHEAR: 金融决策的人类专门知识和适应性风险—-风险警报时间原因 2506.09080v1

Authors (7): Jiaxiang Chen, Mingxi Zou, Zhuo Wang, Qifan Wang, Dongning Sun, Chi Zhang, Zenglin Xu

Financial decision-making presents unique challenges for language models, demanding temporal reasoning, adaptive risk assessment, and responsiveness to dynamic events. While large language models (LLMs) show strong general reasoning capabilities, they often fail to capture behavioral patterns central to human financial decisions, such as expert reliance under information asymmetry, loss-averse sensitivity, and feedback-driven temporal adjustment. We propose FinHEAR, a multi-agent framework for Human Expertise and Adaptive Risk-aware reasoning. FinHEAR orchestrates specialized LLM-based agents to analyze historical trends, interpret current events, and retrieve expert-informed precedents within an event-centric pipeline. Grounded in behavioral economics, it incorporates expert-guided retrieval, confidence-adjusted position sizing, and outcome-based refinement to enhance interpretability and robustness. Empirical results on curated financial datasets show that FinHEAR consistently outperforms strong baselines across trend prediction and trading tasks, achieving higher accuracy and better risk-adjusted returns.



Article 828

Title@2025-06-10 (2): Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings

Title: Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings Verstärkte Konvergenzraten für die nichtkonvexe Optimierung durch Reduktionsmappings 通过减少绘图实现非电流优化的夏端趋同率 2506.08428v1

Authors (3): Evan Markou, Thalaiyasingam Ajanthan, Stephen Gould

Many high-dimensional optimisation problems exhibit rich geometric structures in their set of minimisers, often forming smooth manifolds due to over-parametrisation or symmetries. When this structure is known, at least locally, it can be exploited through reduction mappings that reparametrise part of the parameter space to lie on the solution manifold. These reductions naturally arise from inner optimisation problems and effectively remove redundant directions, yielding a lower-dimensional objective. In this work, we introduce a general framework to understand how such reductions influence the optimisation landscape. We show that well-designed reduction mappings improve curvature properties of the objective, leading to better-conditioned problems and theoretically faster convergence for gradient-based methods. Our analysis unifies a range of scenarios where structural information at optimality is leveraged to accelerate convergence, offering a principled explanation for the empirical gains observed in such optimisation algorithms.



Article 829

Title@2025-06-10 (2): HASFL: Heterogeneity-aware Split Federated Learning over Edge Computing Systems

Title: HASFL: Heterogeneity-aware Split Federated Learning over Edge Computing Systems HASFL: Heterogenität bewusst Split Federated Learning over Edge Computing Systems HASFL: 超越边缘电子计算系统的异异质-认知异质分化联邦学习系统 2506.08426v1

Authors (5): Zheng Lin, Zhe Chen, Xianhao Chen, Wei Ni, Yue Gao

Split federated learning (SFL) has emerged as a promising paradigm to democratize machine learning (ML) on edge devices by enabling layer-wise model partitioning. However, existing SFL approaches suffer significantly from the straggler effect due to the heterogeneous capabilities of edge devices. To address this fundamental challenge, we propose adaptively controlling batch sizes (BSs) and model splitting (MS) for edge devices to overcome resource heterogeneity. We first derive a tight convergence bound of SFL that quantifies the impact of varied BSs and MS on learning performance. Based on the convergence bound, we propose HASFL, a heterogeneity-aware SFL framework capable of adaptively controlling BS and MS to balance communication-computing latency and training convergence in heterogeneous edge networks. Extensive experiments with various datasets validate the effectiveness of HASFL and demonstrate its superiority over state-of-the-art benchmarks.



Article 830

Title@2025-06-10 (2): Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy

Title: Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy Mic-hackathon 2024: Hackathon zum maschinellen Lernen für Elektronen- und Rastersondenmikroskopie Mic-hackathon 2024:关于电子和扫描探测器机器学习的Hackathon 2506.08423v1

Authors (73): Utkarsh Pratiush, Austin Houston, Kamyar Barakati, Aditya Raghavan, Dasol Yoon, Harikrishnan KP, Zhaslan Baraissov, Desheng Ma, Samuel S. Welborn, Mikolaj Jakowski, Shawn-Patrick Barhorst, Alexander J. Pattison, Panayotis Manganaris, Sita Sirisha Madugula, Sai Venkata Gayathri Ayyagari, Vishal Kennedy, Ralph Bulanadi, Michelle Wang, Kieran J. Pang, Ian Addison-Smith, Willy Menacho, Horacio V. Guzman, Alexander Kiefer, Nicholas Furth, Nikola L. Kolev, Mikhail Petrov, Viktoriia Liu, Sergey Ilyev, Srikar Rairao, Tommaso Rodani, Ivan Pinto-Huguet, Xuli Chen, Josep Cruañes, Marta Torrens, Jovan Pomar, Fanzhi Su, Pawan Vedanti, Zhiheng Lyu, Xingzhi Wang, Lehan Yao, Amir Taqieddin, Forrest Laskowski, Xiangyu Yin, Yu-Tsun Shao, Benjamin Fein-Ashley, Yi Jiang, Vineet Kumar, Himanshu Mishra, Yogesh Paul, Adib Bazgir, Rama chandra Praneeth Madugula, Yuwen Zhang, Pravan Omprakash, Jian Huang, Eric Montufar-Morales, Vivek Chawla, Harshit Sethi, Jie Huang, Lauri Kurki, Grace Guinan, Addison Salvador, Arman Ter-Petrosyan, Madeline Van Winkle, Steven R. Spurgeon, Ganesh Narasimha, Zijie Wu, Richard Liu, Yongtao Liu, Boris Slautin, Andrew R Lupini, Rama Vasudevan, Gerd Duscher, Sergei V. Kalinin

Microscopy is a primary source of information on materials structure and functionality at nanometer and atomic scales. The data generated is often well-structured, enriched with metadata and sample histories, though not always consistent in detail or format. The adoption of Data Management Plans (DMPs) by major funding agencies promotes preservation and access. However, deriving insights remains difficult due to the lack of standardized code ecosystems, benchmarks, and integration strategies. As a result, data usage is inefficient and analysis time is extensive. In addition to post-acquisition analysis, new APIs from major microscope manufacturers enable real-time, ML-based analytics for automated decision-making and ML-agent-controlled microscope operation. Yet, a gap remains between the ML and microscopy communities, limiting the impact of these methods on physics, materials discovery, and optimization. Hackathons help bridge this divide by fostering collaboration between ML researchers and microscopy experts. They encourage the development of novel solutions that apply ML to microscopy, while preparing a future workforce for instrumentation, materials science, and applied ML. This hackathon produced benchmark datasets and digital twins of microscopes to support community growth and standardized workflows. All related code is available at GitHub: https://github.com/KalininGroup/Mic-hackathon-2024-codes-publication/tree/1.0.0.1



Article 831

Title@2025-06-10 (2): Dense ReLU Neural Networks for Temporal-spatial Model

Title: Dense ReLU Neural Networks for Temporal-spatial Model Dense ReLU Neuronale Netzwerke für zeitlich-räumliches Modell 时空空间模型神经网络 2411.09961v8

Authors (5): Carlos Misael Madrid Padilla, Zhi Zhang, Xiaokai Luo, Daren Wang, Oscar Hernan Madrid Padilla

In this paper, we focus on fully connected deep neural networks utilizing the Rectified Linear Unit (ReLU) activation function for nonparametric estimation. We derive non-asymptotic bounds that lead to convergence rates, addressing both temporal and spatial dependence in the observed measurements. By accounting for dependencies across time and space, our models better reflect the complexities of real-world data, enhancing both predictive performance and theoretical robustness. We also tackle the curse of dimensionality by modeling the data on a manifold, exploring the intrinsic dimensionality of high-dimensional data. We broaden existing theoretical findings of temporal-spatial analysis by applying them to neural networks in more general contexts and demonstrate that our proof techniques are effective for models with short-range dependence. Our empirical simulations across various synthetic response functions underscore the superior performance of our method, outperforming established approaches in the existing literature. These findings provide valuable insights into the strong capabilities of dense neural networks (Dense NN) for temporal-spatial modeling across a broad range of function classes.



Article 832

Title@2025-06-10 (2): DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization

Title: DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization DisCO: Mit diskriminativ eingeschränkter Optimierung groß angelegte Modelle stärken disCO:加强大理由理由模型,采用有偏见的限制性最佳化 2505.12366v2

Authors (5): Gang Li, Ming Lin, Tomer Galanti, Zhengzhong Tu, Tianbao Yang

The recent success and openness of DeepSeek-R1 have brought widespread attention to Group Relative Policy Optimization (GRPO) as a reinforcement learning method for large reasoning models (LRMs). In this work, we analyze the GRPO objective under a binary reward setting and reveal an inherent limitation of question-level difficulty bias. We also identify a connection between GRPO and traditional discriminative methods in supervised learning. Motivated by these insights, we introduce a new Discriminative Constrained Optimization (DisCO) framework for reinforcing LRMs, grounded in the principle of discriminative learning. The main differences between DisCO and GRPO and its recent variants are: (1) it replaces the group relative objective with a discriminative objective defined by a scoring function; (2) it abandons clipping-based surrogates in favor of non-clipping RL surrogate objectives used as scoring functions; (3) it employs a simple yet effective constrained optimization approach to enforce the KL divergence constraint, ensuring stable training. As a result, DisCO offers notable advantages over GRPO and its variants: (i) it completely eliminates difficulty bias by adopting discriminative objectives; (ii) it addresses the entropy instability in GRPO and its variants through the use of non-clipping scoring functions and a constrained optimization approach; (iii) it allows the incorporation of advanced discriminative learning techniques to address data imbalance, where a significant number of questions have more negative than positive generated answers during training. Our experiments on enhancing the mathematical reasoning capabilities of SFT-finetuned models show that DisCO significantly outperforms GRPO and its improved variants such as DAPO, achieving average gains of 7% over GRPO and 6% over DAPO across six benchmark tasks for a 1.5B model.
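
One way to see how question difficulty can enter a group-normalized update is to compute the standard group-relative advantage, (reward minus group mean) divided by group standard deviation, for binary rewards. The sketch below is only an arithmetic illustration, not the paper's derivation; the function names are hypothetical, and `p` denotes the fraction of correct answers in a group, assumed to lie strictly between 0 and 1.

```python
import math


def grpo_binary_advantages(p: float) -> tuple[float, float]:
    """Group-normalized advantages for binary rewards.

    p: fraction of answers in the group with reward 1 (0 < p < 1).
    Returns (advantage of a correct answer, advantage of an incorrect answer).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    std = math.sqrt(p * (1.0 - p))   # population std of a Bernoulli(p) reward
    return (1.0 - p) / std, (0.0 - p) / std


def mean_abs_advantage(p: float) -> float:
    """Average |advantage| over the group; equals 2 * sqrt(p * (1 - p))."""
    a_pos, a_neg = grpo_binary_advantages(p)
    return p * a_pos + (1.0 - p) * (-a_neg)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(f"p={p:.1f} advantages={grpo_binary_advantages(p)} "
              f"mean|A|={mean_abs_advantage(p):.3f}")
```

The average advantage magnitude peaks at `p = 0.5` and shrinks for very easy or very hard questions, so the per-question signal depends on difficulty under this normalization.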



Article 833

Title@2025-06-10 (2): Online Learning-guided Learning Rate Adaptation via Gradient Alignment

Title: Online Learning-guided Learning Rate Adaptation via Gradient Alignment Online Learning-guided Learning Rate Anpassung über Gradient Alignment 通过 “ 渐进协调 “ 调整,调整在线学习指导学习率 2506.08419v1

Authors (3): Ruichen Jiang, Ali Kavis, Aryan Mokhtari

The performance of an optimizer on large-scale deep learning models depends critically on fine-tuning the learning rate, often requiring an extensive grid search over base learning rates, schedules, and other hyperparameters. In this paper, we propose a principled framework called GALA (Gradient Alignment-based Learning rate Adaptation), which dynamically adjusts the learning rate by tracking the alignment between consecutive gradients and using a local curvature estimate. Guided by the convergence analysis, we formulate the problem of selecting the learning rate as a one-dimensional online learning problem. When paired with an online learning algorithm such as Follow-the-Regularized-Leader, our method produces a flexible, adaptive learning rate schedule that tends to increase when consecutive gradients are aligned and decrease otherwise. We establish a data-adaptive convergence rate for normalized SGD equipped with GALA in the smooth, nonconvex setting. Empirically, common optimizers such as SGD and Adam, when augmented with GALA, demonstrate robust performance across a wide range of initial learning rates and perform competitively without the need for tuning.
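
The qualitative behaviour described above (the learning rate grows when consecutive gradients are aligned and shrinks otherwise) can be illustrated with a deliberately simplified rule. This is not GALA: it swaps in a plain multiplicative update driven by the cosine between consecutive gradients, with no curvature estimate and no Follow-the-Regularized-Leader step. The names `adapt_lr` and `run_demo`, the parameter `beta`, and the clipping bounds are all assumptions made for the example.

```python
import numpy as np


def adapt_lr(lr, grad, prev_grad, beta=0.1, lr_min=1e-6, lr_max=0.15):
    """Scale lr by exp(beta * cos(grad, prev_grad)) and clip to [lr_min, lr_max]."""
    denom = np.linalg.norm(grad) * np.linalg.norm(prev_grad) + 1e-12
    cos = float(grad @ prev_grad) / denom
    return float(np.clip(lr * np.exp(beta * cos), lr_min, lr_max))


def run_demo(steps=50):
    """Gradient descent on the toy quadratic f(x) = 0.5 * sum(curv * x**2)."""
    curv = np.array([1.0, 10.0])
    x = np.array([1.0, 1.0])
    lr, prev_grad = 0.01, None
    losses = []
    for _ in range(steps):
        grad = curv * x
        if prev_grad is not None:
            lr = adapt_lr(lr, grad, prev_grad)
        x = x - lr * grad
        prev_grad = grad
        losses.append(0.5 * float(np.sum(curv * x ** 2)))
    return losses, lr


if __name__ == "__main__":
    losses, final_lr = run_demo()
    print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.2e}, final lr {final_lr:.3f}")
```

The upper bound `lr_max=0.15` is chosen below `2 / 10`, the stability limit for the largest curvature in this toy problem, so the demo cannot diverge; a real method would have to earn that stability from its analysis rather than from a hand-picked clip.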



Article 834

Title@2025-06-10 (2): Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood

Title: Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood Offline RL mit glatter OOD-Verallgemeinerung in Convex Hull und seiner Nachbarschaft 在Convex Hull及其邻里平滑OOOD平滑的离线RL(RL) 2506.08417v1

Authors (8): Qingmao Yao, Zhichao Lei, Tianyuan Chen, Ziyue Yuan, Xuefan Chen, Jianxiang Liu, Faguo Wu, Xiao Zhang

Offline Reinforcement Learning (RL) struggles with distributional shifts, leading to $Q$-value overestimation for out-of-distribution (OOD) actions. Existing methods address this issue by imposing constraints; however, they often become overly conservative when evaluating OOD regions, which constrains the $Q$-function generalization. This over-constraint issue results in poor $Q$-value estimation and hinders policy improvement. In this paper, we introduce a novel approach to achieve better $Q$-value estimation by enhancing $Q$-function generalization in OOD regions within the Convex Hull and its Neighborhood (CHN). Under the safety generalization guarantees of the CHN, we propose the Smooth Bellman Operator (SBO), which updates OOD $Q$-values by smoothing them with neighboring in-sample $Q$-values. We theoretically show that SBO approximates true $Q$-values for both in-sample and OOD actions within the CHN. Our practical algorithm, Smooth Q-function OOD Generalization (SQOG), empirically alleviates the over-constraint issue, achieving near-accurate $Q$-value estimation. On the D4RL benchmarks, SQOG outperforms existing state-of-the-art methods in both performance and computational efficiency.



Article 835

Title@2025-06-10 (2): DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data

Title: DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data DISCO balanciert die Skalen: Adaptives Domain- und Schwierigkeitsgrad-Bewusst-Verstärkung-Lernen auf unausgewogenen Daten DISCO 平衡规模:关于数据不平衡的适应性域域和困难性软件强化学习 2505.15074v2

Authors (10): Yuhang Zhou, Jing Zhu, Shengyi Qian, Zhuokai Zhao, Xiyao Wang, Xiaoyu Liu, Ming Li, Paiheng Xu, Wei Ai, Furong Huang

Large Language Models (LLMs) are increasingly aligned with human preferences through Reinforcement Learning from Human Feedback (RLHF). Among RLHF methods, Group Relative Policy Optimization (GRPO) has gained attention for its simplicity and strong performance, notably eliminating the need for a learned value function. However, GRPO implicitly assumes a balanced domain distribution and uniform semantic alignment across groups, assumptions that rarely hold in real-world datasets. When applied to multi-domain, imbalanced data, GRPO disproportionately optimizes for dominant domains, neglecting underrepresented ones and resulting in poor generalization and fairness. We propose Domain-Informed Self-Consistency Policy Optimization (DISCO), a principled extension to GRPO that addresses inter-group imbalance with two key innovations. Domain-aware reward scaling counteracts frequency bias by reweighting optimization based on domain prevalence. Difficulty-aware reward scaling leverages prompt-level self-consistency to identify and prioritize uncertain prompts that offer greater learning value. Together, these strategies promote more equitable and effective policy learning across domains. Extensive experiments across multiple LLMs and skewed training distributions show that DISCO improves generalization, outperforms existing GRPO variants by 5% on Qwen3 models, and sets new state-of-the-art results on multi-domain alignment benchmarks.
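
The two scaling ideas named above can be sketched in a few lines. The abstract does not give the actual formulas, so everything below is an assumption chosen for illustration: inverse-frequency domain weights, a self-consistency score defined as agreement with the majority answer, and a difficulty factor of `1 + (1 - self_consistency)`. All function names are hypothetical.

```python
from collections import Counter


def domain_weights(domains):
    """Inverse-frequency weights (assumed form); a balanced dataset gives weight 1."""
    counts = Counter(domains)
    n, k = len(domains), len(counts)
    return {d: n / (k * c) for d, c in counts.items()}


def self_consistency(answers):
    """Fraction of sampled answers that agree with the most common answer."""
    return Counter(answers).most_common(1)[0][1] / len(answers)


def scaled_reward(reward, domain, answers, dom_w):
    """Scale a raw reward by a domain weight and an assumed difficulty factor."""
    difficulty_w = 1.0 + (1.0 - self_consistency(answers))  # hypothetical form
    return reward * dom_w[domain] * difficulty_w


if __name__ == "__main__":
    domains = ["math", "math", "math", "law"]
    w = domain_weights(domains)
    print(w)
    print(scaled_reward(1.0, "law", ["a", "a", "b", "c"], w))
```

With this choice, a prompt from a rare domain whose sampled answers disagree receives the largest multiplier, which mirrors the stated intent of counteracting frequency bias and prioritizing uncertain prompts.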


Article 836

Title@2025-06-10 (2): Improved Scaling Laws in Linear Regression via Data Reuse

Title: Improved Scaling Laws in Linear Regression via Data Reuse Verbesserte Skalierungsgesetze bei linearer Regression über Datenwiederverwertung 通过数据再利用改进线性回归通过数据再利用的扩展法 2506.08415v1

Authors (3): Licong Lin, Jingfeng Wu, Peter L. Bartlett

Neural scaling laws suggest that the test error of large language models trained online decreases polynomially as the model size and data size increase. However, such scaling can be unsustainable when running out of new data. In this work, we show that data reuse can improve existing scaling laws in linear regression. Specifically, we derive sharp test error bounds on $M$-dimensional linear models trained by multi-pass stochastic gradient descent (multi-pass SGD) on $N$ data with sketched features. Assuming that the data covariance has a power-law spectrum of degree $a$, and that the true parameter follows a prior with an aligned power-law spectrum of degree $b-a$ (with $a > b > 1$), we show that multi-pass SGD achieves a test error of $\Theta(M^{1-b} + L^{(1-b)/a})$, where $L \lesssim N^{a/b}$ is the number of iterations. In the same setting, one-pass SGD only attains a test error of $\Theta(M^{1-b} + N^{(1-b)/a})$ (see e.g., Lin et al., 2024). This suggests an improved scaling law via data reuse (i.e., choosing $L>N$) in data-constrained regimes. Numerical simulations are also provided to verify our theoretical findings.
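As a worked reading of the two bounds above (a sketch, assuming the multi-pass rate is evaluated at the largest admissible iteration count $L \asymp N^{a/b}$ and ignoring constants), the data-dependent term becomes

$$L^{(1-b)/a} \asymp \left(N^{a/b}\right)^{(1-b)/a} = N^{(1-b)/b}.$$

Because $a > b > 1$, we have $(1-b)/b < (1-b)/a < 0$, so $N^{(1-b)/b}$ decays faster in $N$ than the one-pass term $N^{(1-b)/a}$. For the illustrative values $a = 2$, $b = 1.5$ (chosen here, not taken from the paper), the one-pass term scales as $N^{-1/4}$ and the multi-pass term as $N^{-1/3}$.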


Article 837

Title@2025-06-10 (2): Learning to Hear Broken Motors: Signature-Guided Data Augmentation for Induction-Motor Diagnostics

Title: Learning to Hear Broken Motors: Signature-Guided Data Augmentation for Induction-Motor Diagnostics Lernen, gebrochene Motoren zu hören: Signatur-geführte Datenvergrößerung für Induktions-Motor-Diagnostik 学习听听破碎汽车:诱导-摩托汽车诊断的签名指导数据增强 2506.08412v1

Authors (5): Saraa Ali, Aleksandr Khizhik, Stepan Svirin, Artem Ryzhikov, Denis Derkach

The application of machine learning (ML) algorithms in the intelligent diagnosis of three-phase engines has the potential to significantly enhance diagnostic performance and accuracy. Traditional methods largely rely on signature analysis, which, despite being a standard practice, can benefit from the integration of advanced ML techniques. In our study, we innovate by combining ML algorithms with a novel unsupervised anomaly generation methodology that takes into account the engine physics model. We propose Signature-Guided Data Augmentation (SGDA), an unsupervised framework that synthesizes physically plausible faults directly in the frequency domain of healthy current signals. Guided by Motor Current Signature Analysis, SGDA creates diverse and realistic anomalies without resorting to computationally intensive simulations. This hybrid approach leverages the strengths of both supervised ML and unsupervised signature analysis, achieving superior diagnostic accuracy and reliability along with wide industrial application. The findings highlight the potential of our approach to contribute significantly to the field of engine diagnostics, offering a robust and efficient solution for real-world applications.


Article 838

Title@2025-06-10 (2): Mutual-Taught for Co-adapting Policy and Reward Models

Title: Mutual-Taught for Co-adapting Policy and Reward Models Gegenseitiges Lernen für Ko-Anpassungspolitik und Prämienmodelle 共同适应政策和奖励模式相互培训 2506.06292v2

Authors (8): Tianyuan Shi, Canbin Huang, Fanqi Wan, Longguang Zhong, Ziyi Yang, Weizhou Shen, Xiaojun Quan, Ming Yan

During the preference optimization of large language models (LLMs), distribution shifts may arise between newly generated model samples and the data used to train the reward model (RM). This shift reduces the efficacy of the RM, which in turn negatively impacts the performance of the policy model (PM). To address this challenge, we propose Mutual-Taught, a self-training method that iteratively improves both the PM and RM without requiring additional human annotation. Our approach mirrors the expectation-maximization (EM) algorithm. In the E-step, the PM is updated using feedback from the current RM, guiding the PM toward a better approximation of the latent optimal preference distribution. In the M-step, we update the RM by constructing training data from the outputs of the PM before and after the E-step update. This process ensures that the RM adapts to the evolving policy distribution. Experimental results demonstrate that this iterative approach leads to consistent improvements in both models. Specifically, our 8B policy model, LLaMA-3-8B-Instruct-MT, achieves a length-controlled win rate of 54.1% on AlpacaEval-2, while our 8B reward model, FsfairX-LLaMA3-RM-MT, performs on par with GPT-4o-2024-08-06 on RewardBench.


Article 839

Title@2025-06-10 (2): FUSE: Measure-Theoretic Compact Fuzzy Set Representation for Taxonomy Expansion

Title: FUSE: Measure-Theoretic Compact Fuzzy Set Representation for Taxonomy Expansion FUSE: Maß-theoretische Kompakt-Fuzzy-Set-Darstellung für Taxonomie-Expansion FUSE: 测量-理论结合模糊集束代表扩大分类学 2506.08409v1

Authors (7): Fred Xu, Song Jiang, Zijie Huang, Xiao Luo, Shichang Zhang, Adrian Chen, Yizhou Sun

Taxonomy Expansion, which models complex concepts and their relations, can be formulated as a set representation learning task. The fuzzy set, a generalization of the classical set, incorporates uncertainty and measures the information within a semantic concept, making it suitable for concept modeling. Existing works usually model sets as vectors or geometric objects such as boxes, which are not closed under set operations. In this work, we propose a sound and efficient formulation of set representation learning based on its volume approximation as a fuzzy set. The resulting embedding framework, Fuzzy Set Embedding (FUSE), satisfies all set operations and compactly approximates the underlying fuzzy set, hence preserving information while being efficient to learn, relying on a minimal neural architecture. We empirically demonstrate the power of FUSE on the task of taxonomy expansion, where FUSE achieves remarkable improvements of up to 23% compared with existing baselines. Our work marks the first attempt to understand and efficiently compute the embeddings of fuzzy sets.


Article 840

Title@2025-06-10 (2): Towards Generalized Source Tracing for Codec-Based Deepfake Speech

Title: Towards Generalized Source Tracing for Codec-Based Deepfake Speech Auf dem Weg zu einer generalisierten Quellenverfolgung für codec-basierte Deepfake-Rede 努力实现以基于编码的 “ 深层假言 “ 为依据的 “ 普遍源源追踪 “ 目标 2506.07294v2

Authors (6): Xuanjun Chen, I-Ming Lin, Lin Zhang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang

Recent attempts at source tracing for codec-based deepfake speech (CodecFake), generated by neural audio codec-based speech generation (CoSG) models, have exhibited suboptimal performance. However, how to train source tracing models using simulated CoSG data while maintaining strong performance on real CoSG-generated audio remains an open challenge. In this paper, we show that models trained solely on codec-resynthesized data tend to overfit to non-speech regions and struggle to generalize to unseen content. To mitigate these challenges, we introduce the Semantic-Acoustic Source Tracing Network (SASTNet), which jointly leverages Whisper for semantic feature encoding and Wav2vec2 with AudioMAE for acoustic feature encoding. Our proposed SASTNet achieves state-of-the-art performance on the CoSG test set of the CodecFake+ dataset, demonstrating its effectiveness for reliable source tracing.


Article 841

Title@2025-06-10 (2): RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation

Title: RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation RLHS: Fehlausrichtung in RLHF mit Hindsight Simulation abmildern RLHS: 减轻RLHF的差错,采用速光模拟模拟 2501.08617v3

Authors (5): Kaiqu Liang, Haimin Hu, Ryan Liu, Thomas L. Griffiths, Jaime Fernández Fisac

While Reinforcement Learning from Human Feedback (RLHF) has shown promise in aligning generative AI, we present empirical evidence that it can also cause severe, systematic misalignment. We hypothesize that this stems from evaluator feedback depending on downstream outcome predictions (foresight) that can be influenced by the AI’s output, inducing Goodhart’s law dynamics. We present a theoretical analysis showing that conditioning evaluator feedback on downstream observations (hindsight) inhibits this effect by decoupling the alignment signal from potentially compromised predictions; crucially, the result holds even if the observed outcomes are sampled from the AI’s own world model. Building on this insight, we introduce Reinforcement Learning from Hindsight Simulation (RLHS), which presents plausible simulated outcomes to evaluators before eliciting feedback. We validate RLHS across three consultancy settings (marketplace interactions, restaurant recommendations, and online course advising) using both online (PPO) and offline (DPO) fine-tuning methods, and show that it substantially improves alignment over RLHF in experiments and human evaluations. We perform post-hoc benchmark evaluations on TruthfulQA, HaluEval, and TrustLLM, finding that even after single-task fine-tuning, RLHF misalignment persists, whereas RLHS consistently outperforms baselines and demonstrates robust alignment generalization. The project webpage and code are available at https://rl-hindsight.github.io.


Article 842

Title@2025-06-10 (2): Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry

Title: Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry Sparse Training von Zufallsinitialisierung: Ausrichten von Lotteriekartenmasken mit Gewichtssymmetrie 随机初始化的分散训练: 使用重量对称对调彩票票面罩 2505.05143v2

Authors (5): Mohammed Adnan, Rohan Jain, Ekansh Sharma, Rahul Krishnan, Yani Ioannou

The Lottery Ticket Hypothesis (LTH) suggests that there exist a sparse LTH mask and weights that achieve the same generalization performance as the dense model while using significantly fewer parameters. However, finding an LTH solution is computationally expensive, and an LTH sparsity mask does not generalize to other random weight initializations. Recent work has suggested that neural networks trained from random initialization find solutions within the same basin modulo permutation, and proposes a method to align trained models within the same loss basin. We hypothesize that misalignment of basins is the reason why LTH masks do not generalize to new random initializations and propose permuting the LTH mask to align with the new optimization basin when performing sparse training from a different random init. We empirically show a significant increase in generalization when performing sparse training from random initialization with the permuted mask, compared to using the non-permuted LTH mask, on multiple datasets (CIFAR-10, CIFAR-100 and ImageNet) and models (VGG11, ResNet20 and ResNet50).
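The weight symmetry that makes mask permutation meaningful can be illustrated on a toy two-layer ReLU network. This is a sketch under stated assumptions: the network sizes, the random mask, and the permutation `perm` are all made up for illustration, and the abstract does not say how the aligning permutation is found, so that step is not shown.

```python
import numpy as np


def permute_hidden(w1, w2, perm):
    """Reorder hidden units: rows of w1 and columns of w2 move together.

    w1 has shape (hidden, inputs); w2 has shape (outputs, hidden).
    """
    return w1[perm, :], w2[:, perm]


def forward(w1, w2, x):
    """Two-layer ReLU network without biases."""
    return w2 @ np.maximum(w1 @ x, 0.0)


rng = np.random.default_rng(0)
w1 = rng.normal(size=(4, 3))
w2 = rng.normal(size=(2, 4))
m1 = (rng.random((4, 3)) < 0.5).astype(float)   # toy sparsity mask, layer 1
m2 = (rng.random((2, 4)) < 0.5).astype(float)   # toy sparsity mask, layer 2
perm = np.array([2, 0, 3, 1])                   # hypothetical aligning permutation
x = rng.normal(size=3)

# Sparse network in the original hidden-unit ordering.
original_out = forward(w1 * m1, w2 * m2, x)

# Same weights in a permuted ordering, with the mask permuted the same way.
pw1, pw2 = permute_hidden(w1, w2, perm)
pm1, pm2 = permute_hidden(m1, m2, perm)
aligned_out = forward(pw1 * pm1, pw2 * pm2, x)
```

Permuting weights and mask together leaves the sparse network's function unchanged, whereas applying the unpermuted mask to the permuted weights would generally prune different connections.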


Article 843

Title@2025-06-10 (2): mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks

Title: mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks mSTEB: Massive mehrsprachige Bewertung von LLMs zu Sprach- und Textaufgaben mSTEB: 对关于发言和文本任务LLM女士进行大规模多语种评价 2506.08400v1

Authors (7): Luel Hagos Beyene, Vivek Verma, Min Ma, Jesujoba O. Alabi, Fabian David Schmidt, Joyce Nakatumba-Nabende, David Ifeoluwa Adelani

Large language models (LLMs) have demonstrated impressive performance on a wide range of tasks, including in multimodal settings such as speech. However, their evaluation is often limited to English and a few high-resource languages. For low-resource languages, there is no standardized evaluation benchmark. In this paper, we address this gap by introducing mSTEB, a new benchmark to evaluate the performance of LLMs on a wide range of tasks covering language identification, text classification, question answering, and translation in both speech and text modalities. We evaluated the performance of leading LLMs such as Gemini 2.0 Flash and GPT-4o (Audio) and state-of-the-art open models such as Qwen 2 Audio and Gemma 3 27B. Our evaluation shows a wide gap in performance between high-resource and low-resource languages, especially for languages spoken in Africa and the Americas/Oceania. Our findings show that more investment is needed to address their under-representation in LLM coverage.


Article 844

Title@2025-06-10 (2): Spatiotemporal deep learning models for detection of rapid intensification in cyclones

Title: Spatiotemporal deep learning models for detection of rapid intensification in cyclones Geotemporale Deep-Learning-Modelle zur Erkennung einer schnellen Verstärkung bei Zyklonen 探测气旋风暴迅速加剧的斯波时深深学习模式 2506.08397v1

Authors (3): Vamshika Sutar, Amandeep Singh, Rohitash Chandra

Cyclone rapid intensification is the rapid increase in cyclone wind intensity, exceeding a threshold of 30 knots, within 24 hours. Rapid intensification is considered an extreme event during a cyclone, and its occurrence is relatively rare, contributing to a class imbalance in the dataset. A diverse array of factors influences the likelihood of a cyclone undergoing rapid intensification, further complicating the task for conventional machine learning models. In this paper, we evaluate deep learning, ensemble learning and data augmentation frameworks to detect cyclone rapid intensification based on wind intensity and spatial coordinates. We note that conventional data augmentation methods cannot be utilised for generating spatiotemporal patterns replicating cyclones that undergo rapid intensification. Therefore, our framework employs deep learning models to generate spatial coordinates and wind intensity that replicate cyclones to address the class imbalance problem of rapid intensification. We also use a deep learning model for the classification module within the data augmentation framework to differentiate between rapid and non-rapid intensification events during a cyclone. Our results show that data augmentation improves the results for rapid intensification detection in cyclones, and spatial coordinates play a critical role as input features to the given models. This paves the way for research in synthetic data generation for spatiotemporal data with extreme events.
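The definition of rapid intensification given above can be turned into a labeling rule for a wind-intensity track. The sketch below is illustrative only: the 6-hourly sampling interval, the function name, and the use of "at least 30 knots" as the comparison are assumptions, not details from the paper.

```python
def label_rapid_intensification(wind_knots, hours_per_step=6,
                                window_hours=24, threshold_knots=30):
    """Flag each time step whose wind rises by the threshold within the window.

    wind_knots: wind intensities sampled every `hours_per_step` hours (assumed).
    Returns one boolean per time step.
    """
    steps = window_hours // hours_per_step
    labels = []
    for i, start in enumerate(wind_knots):
        # Observations strictly after step i, up to `window_hours` later.
        future = wind_knots[i + 1:i + 1 + steps]
        labels.append(any(w - start >= threshold_knots for w in future))
    return labels


track = [35, 40, 50, 60, 70, 72, 74]   # made-up 6-hourly intensities in knots
labels = label_rapid_intensification(track)
```

Only the first two steps of this made-up track are flagged, which mirrors the class imbalance the abstract mentions: positive labels are rare even within an intensifying storm.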


Article 845

Title@2025-06-10 (2): From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models

Title: From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models Von Pixeln zu Prädikaten: Learning Symbolic World Models via Pretrained Vision-Language Models 从像素到预言:通过预先培训的视觉语言模型学习象征性世界模式 2501.00296v3

Authors (7): Ashay Athalye, Nishanth Kumar, Tom Silver, Yichao Liang, Jiuguang Wang, Tomás Lozano-Pérez, Leslie Pack Kaelbling

Our aim is to learn to solve long-horizon decision-making problems in complex robotics domains given low-level skills and a handful of short-horizon demonstrations containing sequences of images. To this end, we focus on learning abstract symbolic world models that facilitate zero-shot generalization to novel goals via planning. A critical component of such models is the set of symbolic predicates that define properties of and relationships between objects. In this work, we leverage pretrained vision language models (VLMs) to propose a large set of visual predicates potentially relevant for decision-making, and to evaluate those predicates directly from camera images. At training time, we pass the proposed predicates and demonstrations into an optimization-based model-learning algorithm to obtain an abstract symbolic world model that is defined in terms of a compact subset of the proposed predicates. At test time, given a novel goal in a novel setting, we use the VLM to construct a symbolic description of the current world state, and then use a search-based planning algorithm to find a sequence of low-level skills that achieves the goal. We demonstrate empirically across experiments in both simulation and the real world that our method can generalize aggressively, applying its learned world model to solve problems with a wide variety of object types, arrangements, numbers of objects, and visual backgrounds, as well as novel goals and much longer horizons than those seen at training time.


Article 846

Title@2025-06-10 (2): Learning-based density-equalizing map

Title: Learning-based density-equalizing map Learning-based Dichte-equalizing Karte 基于学习的密度平等图 2506.10027v1

Authors (3): Yanwen Huang, Lok Ming Lui, Gary P. T. Choi

Density-equalizing map (DEM) serves as a powerful technique for creating shape deformations with the area changes reflecting an underlying density function. In recent decades, DEM has found widespread applications in fields such as data visualization, geometry processing, and medical imaging. Traditional approaches to DEM primarily rely on iterative numerical solvers for diffusion equations or optimization-based methods that minimize handcrafted energy functionals. However, these conventional techniques often face several challenges: they may suffer from limited accuracy, produce overlapping artifacts in extreme cases, and require substantial algorithmic redesign when extended from 2D to 3D, due to the derivative-dependent nature of their energy formulations. In this work, we propose a novel learning-based density-equalizing mapping framework (LDEM) using deep neural networks. Specifically, we introduce a loss function that enforces density uniformity and geometric regularity, and utilize a hierarchical approach to predict the transformations at both the coarse and dense levels. Our method demonstrates superior density-equalizing and bijectivity properties compared to prior methods for a wide range of simple and complex density distributions, and can be easily applied to surface remeshing with different effects. Also, it generalizes seamlessly from 2D to 3D domains without structural changes to the model architecture or loss formulation. Altogether, our work opens up new possibilities for scalable and robust computation of density-equalizing maps for practical applications.


Article 847

Title@2025-06-10 (2): Reinforcement Learning Teachers of Test Time Scaling

Title: Reinforcement Learning Teachers of Test Time Scaling Verstärktes Lernen von Lehrern der Testzeitskalierung 测试时间尺度强化学习教师 2506.08388v1

Authors (3): Edoardo Cetin, Tianyu Zhao, Yujin Tang

Training reasoning language models (LMs) with reinforcement learning (RL) for one-hot correctness inherently relies on the LM being able to explore and solve its task with some chance at initialization. Furthermore, a key use case of reasoning LMs is to act as teachers for distilling new students and cold-starting future RL iterations rather than being deployed themselves. From these considerations, we introduce a new framework that avoids RL’s exploration challenge by training a new class of Reinforcement-Learned Teachers (RLTs) focused on yielding the most effective downstream distillation. RLTs are prompted with both the question and solution to each problem, and tasked to simply “connect-the-dots” with detailed explanations tailored for their students. We train RLTs with dense rewards obtained by feeding each explanation to the student and testing its understanding of the problem’s solution. In practice, the raw outputs of a 7B RLT provide higher final performance on competition and graduate-level tasks than existing distillation and cold-starting pipelines that collect and postprocess the reasoning traces of orders of magnitude larger LMs. Furthermore, RLTs maintain their effectiveness when training larger students and when applied zero-shot to out-of-distribution tasks, unlocking new levels of efficiency and re-usability for the RL reasoning framework.


Article 848

Title@2025-06-10 (2): Network Threat Detection: Addressing Class Imbalanced Data with Deep Forest

Title: Network Threat Detection: Addressing Class Imbalanced Data with Deep Forest Netzwerk Threat Detection: Adressierung von unausgeglichenen Daten der Klasse mit tiefem Wald 网络威胁探测:处理深森林的分类平衡数据 2506.08383v1

Authors (2): Jiaqi Chen, Rongbin Ye

With the rapid expansion of Internet of Things (IoT) networks, detecting malicious traffic in real-time has become a critical cybersecurity challenge. This research addresses the detection challenges by presenting a comprehensive empirical analysis of machine learning techniques for malware detection using the IoT-23 dataset provided by the Stratosphere Laboratory. We address the significant class imbalance within the dataset through three resampling strategies. We implement and compare a few machine learning techniques. Our findings demonstrate that the combination of appropriate imbalance treatment techniques with ensemble methods, particularly gcForest, achieves better detection performance compared to traditional approaches. This work contributes significantly to the development of more intelligent and efficient automated threat detection systems for IoT environments, helping to secure critical infrastructure against sophisticated cyber attacks while optimizing computational resource usage.


Article 849

Title@2025-06-10 (2): Reinforce LLM Reasoning through Multi-Agent Reflection

Title: Reinforce LLM Reasoning through Multi-Agent Reflection Verstärkung der LLM-Vernunft durch Multi-Agent-Reflexion 通过多方机构反省说明理由 2506.08379v1

Authors (2): Yurun Yuan, Tengyang Xie

Leveraging more test-time computation has proven to be an effective way to boost the reasoning capabilities of large language models (LLMs). Among various methods, the verify-and-improve paradigm stands out for enabling dynamic solution exploration and feedback incorporation. However, existing approaches often suffer from restricted feedback spaces and lack of coordinated training of different parties, leading to suboptimal performance. To address this, we model this multi-turn refinement process as a Markov Decision Process and introduce DPSDP (Direct Policy Search by Dynamic Programming), a reinforcement learning algorithm that trains an actor-critic LLM system to iteratively refine answers via direct preference learning on self-generated data. Theoretically, DPSDP can match the performance of any policy within the training distribution. Empirically, we instantiate DPSDP with various base models and show improvements on both in- and out-of-distribution benchmarks. For example, on benchmark MATH 500, majority voting over five refinement steps increases first-turn accuracy from 58.2% to 63.2% with Ministral-based models. An ablation study further confirms the benefits of multi-agent collaboration and out-of-distribution generalization.
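The majority-voting step mentioned in the MATH 500 example can be sketched in a few lines. This is a generic illustration, not the authors' code: the tie-breaking rule (earliest answer wins) and the example answers are assumptions.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the earliest such answer."""
    if not answers:
        raise ValueError("need at least one answer to vote over")
    counts = Counter(answers)
    best = max(counts.values())
    for answer in answers:
        if counts[answer] == best:
            return answer


# Hypothetical final answers from the first turn plus four refinement turns.
turn_answers = ["12", "15", "15", "12", "15"]
final_answer = majority_vote(turn_answers)
```

Voting over the answers from successive refinement turns means a single poor refinement cannot overwrite an answer that most turns agree on.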


Article 850

Title@2025-06-10 (2): A Probabilistic Framework for Imputing Genetic Distances in Spatiotemporal Pathogen Models

Title: A Probabilistic Framework for Imputing Genetic Distances in Spatiotemporal Pathogen Models Ein probabilistischer Rahmen für die Imputierung genetischer Entfernungen in Spatiotemporal Pathogen-Modellen 关于将遗传距离算入时地病原体模型的概率框架 2506.09076v1

Authors (8): Haley Stone, Jing Du, Hao Xue, Matthew Scotch, David Heslop, Andreas Züfle, Chandini Raina MacIntyre, Flora Salim

Pathogen genome data offers valuable structure for spatial models, but its utility is limited by incomplete sequencing coverage. We propose a probabilistic framework for inferring genetic distances between unsequenced cases and known sequences within defined transmission chains, using time-aware evolutionary distance modeling. The method estimates pairwise divergence from collection dates and observed genetic distances, enabling biologically plausible imputation grounded in observed divergence patterns, without requiring sequence alignment or known transmission chains. Applied to highly pathogenic avian influenza A/H5 cases in wild birds in the United States, this approach supports scalable, uncertainty-aware augmentation of genomic datasets and enhances the integration of evolutionary information into spatiotemporal modeling workflows.


Article 851

Title@2025-06-10 (2): AlphaFold Database Debiasing for Robust Inverse Folding

Title: AlphaFold Database Debiasing for Robust Inverse Folding AlphaFold Datenbank-Debiasing für robuste Inverse Faltung AlphaFold 数据库对硬性反向折叠的偏移 2506.08365v1

Authors (6): Cheng Tan, Zhenxiao Cao, Zhangyang Gao, Siyuan Li, Yufei Huang, Stan Z. Li

The AlphaFold Protein Structure Database (AFDB) offers unparalleled structural coverage at near-experimental accuracy, positioning it as a valuable resource for data-driven protein design. However, its direct use in training deep models that are sensitive to fine-grained atomic geometry, such as inverse folding, exposes a critical limitation. Comparative analysis of structural feature distributions reveals that AFDB structures exhibit distinct statistical regularities, reflecting a systematic geometric bias that deviates from the conformational diversity found in experimentally determined structures from the Protein Data Bank (PDB). While AFDB structures are cleaner and more idealized, PDB structures capture the intrinsic variability and physical realism essential for generalization in downstream tasks. To address this discrepancy, we introduce a Debiasing Structure AutoEncoder (DeSAE) that learns to reconstruct native-like conformations from intentionally corrupted backbone geometries. By training the model to recover plausible structural states, DeSAE implicitly captures a more robust and natural structural manifold. At inference, applying DeSAE to AFDB structures produces debiased structures that significantly improve inverse folding performance across multiple benchmarks. This work highlights the critical impact of subtle systematic biases in predicted structures and presents a principled framework for debiasing, significantly boosting the performance of structure-based learning tasks like inverse folding.


Article 852

Title@2025-06-10 (2): Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(ε^{-4/7})$ Second-Order Oracle Complexity

Title: Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(ε^{-4/7})$ Second-Order Oracle Complexity Lösen von Convex-Concave-Problemen mit $\tilde{\mathcal{O}}(ε^{-4/7})$ Oracle-Komplexität zweiter Ordnung 以 $\tilde{\mathcal{O}}(ε^{-4/7})$ 解决二次折形甲骨文复杂度的连接问题 2506.08362v1

Authors (4): Lesi Chen, Chengchang Liu, Luo Luo, Jingzhao Zhang

Previous algorithms can solve convex-concave minimax problems $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x,y)$ with $\mathcal{O}(\epsilon^{-2/3})$ second-order oracle calls using Newton-type methods. This result has been speculated to be optimal because the upper bound is achieved by a natural generalization of the optimal first-order method. In this work, we show an improved upper bound of $\tilde{\mathcal{O}}(\epsilon^{-4/7})$ by generalizing the optimal second-order method for convex optimization to solve the convex-concave minimax problem. We further apply a similar technique to lazy Hessian algorithms and show that our proposed algorithm can also be seen as a second-order "Catalyst" framework (Lin et al., JMLR 2018) that could accelerate any globally convergent algorithm for solving minimax problems.


Article 853

Title@2025-06-10 (2): NysAct: A Scalable Preconditioned Gradient Descent using Nystrom Approximation

Title: NysAct: A Scalable Preconditioned Gradient Descent using Nystrom Approximation NysAct: Ein skalierbarer vorkonditionierter Gradient Abstieg mit Nystrom-Annäherung NysAct: 使用Nystrom 接近法的可缩放的预设条件渐变源 2506.08360v1

Authors (3): Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko

Adaptive gradient methods are computationally efficient and converge quickly, but they often suffer from poor generalization. In contrast, second-order methods enhance convergence and generalization but typically incur high computational and memory costs. In this work, we introduce NysAct, a scalable first-order gradient preconditioning method that strikes a balance between state-of-the-art first-order and second-order optimization methods. NysAct leverages an eigenvalue-shifted Nystrom method to approximate the activation covariance matrix, which is used as a preconditioning matrix, significantly reducing time and memory complexities with minimal impact on test accuracy. Our experiments show that NysAct not only achieves improved test accuracy compared to both first-order and second-order methods but also demands considerably fewer computational resources than existing second-order methods. Code is available at https://github.com/hseung88/nysact.
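To make the Nystrom idea concrete, the sketch below approximates a positive semidefinite matrix from a subset of its columns, with a small diagonal shift added for numerical stability. It is a generic, regularized Nystrom approximation written for illustration; the shift scheme, the column selection, and the function name are assumptions and may differ from the eigenvalue-shifted variant used in NysAct.

```python
import numpy as np


def shifted_nystrom(a, idx, shift=1e-8):
    """Approximate a PSD matrix `a` as C (W + shift * I)^{-1} C^T.

    C holds the sampled columns of `a`; W is the block of `a` at the
    sampled rows and columns. The shift keeps the solve well conditioned.
    """
    idx = np.asarray(idx)
    c = a[:, idx]                     # shape (n, k)
    w = a[np.ix_(idx, idx)]           # shape (k, k)
    w_shifted = w + shift * np.eye(len(idx))
    return c @ np.linalg.solve(w_shifted, c.T)


# Toy rank-2 PSD matrix standing in for an activation covariance.
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
              [2.0, -1.0], [0.5, 3.0], [-1.0, 2.0]])
psd = x @ x.T
approx = shifted_nystrom(psd, idx=[0, 1])
max_err = np.abs(psd - approx).max()
```

Because the toy matrix has rank 2 and the two sampled columns span its range, the approximation is essentially exact here. Only the n-by-k and k-by-k blocks are needed, which is where the memory savings over a full n-by-n factorization come from.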


Article 854

Title@2025-06-10 (2): An Adaptive Method Stabilizing Activations for Enhanced Generalization

Title: An Adaptive Method Stabilizing Activations for Enhanced Generalization Eine adaptive Methode zur Stabilisierung von Aktivierungen für verstärkte Generalisierung 增强普遍化的适应方法稳定活动 2506.08353v1

Authors (3): Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko

We introduce AdaAct, a novel optimization algorithm that adjusts learning rates according to activation variance. Our method enhances the stability of neuron outputs by incorporating neuron-wise adaptivity during the training process, which subsequently leads to better generalization – a complementary approach to conventional activation regularization methods. Experimental results demonstrate AdaAct’s competitive performance across standard image classification benchmarks. We evaluate AdaAct on CIFAR and ImageNet, comparing it with other state-of-the-art methods. Importantly, AdaAct effectively bridges the gap between the convergence speed of Adam and the strong generalization capabilities of SGD, all while maintaining competitive execution times. Code is available at https://github.com/hseung88/adaact.


Article 855

Title@2025-06-10 (2): Differentially Private Relational Learning with Entity-level Privacy Guarantees

Title: Differentially Private Relational Learning with Entity-level Privacy Guarantees Unterschiedlich privates relationales Lernen mit Datenschutzgarantien auf Körperschaftsebene 与实体一级的隐私保障有区别的私人关系学习 2506.08347v1

Authors (5): Yinan Huang, Haoteng Ying, Eli Chien, Rongzhe Wei, Pan Li

Learning with relational and network-structured data is increasingly vital in sensitive domains where protecting the privacy of individual entities is paramount. Differential Privacy (DP) offers a principled approach for quantifying privacy risks, with DP-SGD emerging as a standard mechanism for private model training. However, directly applying DP-SGD to relational learning is challenging due to two key factors: (i) entities often participate in multiple relations, resulting in high and difficult-to-control sensitivity; and (ii) relational learning typically involves multi-stage, potentially coupled (interdependent) sampling procedures that make standard privacy amplification analyses inapplicable. This work presents a principled framework for relational learning with formal entity-level DP guarantees. We provide a rigorous sensitivity analysis and introduce an adaptive gradient clipping scheme that modulates clipping thresholds based on entity occurrence frequency. We also extend the privacy amplification results to a tractable subclass of coupled sampling, where the dependence arises only through sample sizes. These contributions lead to a tailored DP-SGD variant for relational data with provable privacy guarantees. Experiments on fine-tuning text encoders over text-attributed network-structured relational data demonstrate the strong utility-privacy trade-offs of our approach. Our code is available at https://github.com/Graph-COM/Node_DP.


Article 856

Title@2025-06-10 (2): Re4MPC: Reactive Nonlinear MPC for Multi-model Motion Planning via Deep Reinforcement Learning

Title: Re4MPC: Reactive Nonlinear MPC for Multi-model Motion Planning via Deep Reinforcement Learning Re4MPC: Reaktives nichtlineares MPC für Multi-Model-Bewegungsplanung über Deep Reinforcement Learning Re4MPC:通过深强化学习进行多模式运动规划的反活动非线性非线性MPC 2506.08344v1

Authors (4): Neşet Ünver Akmandor, Sarvesh Prajapati, Mark Zolotas, Taşkın Padır

Traditional motion planning methods for robots with many degrees-of-freedom, such as mobile manipulators, are often computationally prohibitive for real-world settings. In this paper, we propose a novel multi-model motion planning pipeline, termed Re4MPC, which computes trajectories using Nonlinear Model Predictive Control (NMPC). Re4MPC generates trajectories in a computationally efficient manner by reactively selecting the model, cost, and constraints of the NMPC problem depending on the complexity of the task and robot state. The policy for this reactive decision-making is learned via a Deep Reinforcement Learning (DRL) framework. We introduce a mathematical formulation to integrate NMPC into this DRL framework. To validate our methodology and design choices, we evaluate DRL training and test outcomes in a physics-based simulation involving a mobile manipulator. Experimental results demonstrate that Re4MPC is more computationally efficient and achieves higher success rates in reaching end-effector goals than the NMPC baseline, which computes whole-body trajectories without our learning mechanism.



Article 857

Title@2025-06-10 (2): Dynamical System Optimization

Title: Dynamical System Optimization Dynamische Systemoptimierung 动态系统优化 2506.08340v1

Authors (1): Emo Todorov

We develop an optimization framework centered around a core idea: once a (parametric) policy is specified, control authority is transferred to the policy, resulting in an autonomous dynamical system. Thus we should be able to optimize policy parameters without further reference to controls or actions, and without directly using the machinery of approximate Dynamic Programming and Reinforcement Learning. Here we derive simpler algorithms at the autonomous system level, and show that they compute the same quantities as policy gradients and Hessians, natural gradients, and proximal methods. Analogs to approximate policy iteration and off-policy learning are also available. Since policy parameters and other system parameters are treated uniformly, the same algorithms apply to behavioral cloning, mechanism design, system identification, and learning of state estimators. Tuning of generative AI models is not only possible, but is conceptually closer to the present framework than to Reinforcement Learning.



Article 858

Title@2025-06-10 (2): Equivalence of stochastic and deterministic policy gradients

Title: Equivalence of stochastic and deterministic policy gradients Gleichwertigkeit stochastischer und deterministischer politischer Gradienten 政策梯度和确定性政策梯度等同 2505.23244v2

Authors (1): Emo Todorov

Policy gradients in continuous control have been derived for both stochastic and deterministic policies. Here we study the relationship between the two. In a widely-used family of MDPs involving Gaussian control noise and quadratic control costs, we show that the stochastic and deterministic policy gradients, natural gradients, and state value functions are identical; while the state-control value functions are different. We then develop a general procedure for constructing an MDP with deterministic policy that is equivalent to a given MDP with stochastic policy. The controls of this new MDP are the sufficient statistics of the stochastic policy in the original MDP. Our results suggest that policy gradient methods can be unified by approximating state value functions rather than state-control value functions.



Article 859

Title@2025-06-10 (2): midr: Learning from Black-Box Models by Maximum Interpretation Decomposition

Title: midr: Learning from Black-Box Models by Maximum Interpretation Decomposition midr: Von Black-Box-Modellen lernen durch maximale Interpretationszersetzung 中间体: 通过最大解析分解从黑ox 模型中学习 2506.08338v1

Authors (3): Ryoichi Asashiba, Reiji Kozuma, Hirokazu Iwasawa

The use of appropriate methods of Interpretable Machine Learning (IML) and eXplainable Artificial Intelligence (XAI) is essential for adopting black-box predictive models in fields where model and prediction explainability is required. As a novel tool for interpreting black-box models, we introduce the R package midr, which implements Maximum Interpretation Decomposition (MID). MID is a functional decomposition approach that derives a low-order additive representation of a black-box model by minimizing the squared error between the model’s prediction function and this additive representation. midr enables learning from black-box models by constructing a global surrogate model with advanced analytical capabilities. After reviewing related work and the theoretical foundation of MID, we demonstrate the package’s usage and discuss some of its key features.
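
To make the idea of a low-order additive surrogate concrete, here is a minimal Python sketch that fits an intercept plus first-order (main-effect) terms to a black-box prediction function by ordinary least squares. It is not the midr API and not the package's exact estimator: the piecewise-constant effects on quantile bins, the function names, and the toy `black_box` model are all assumptions of this example.

```python
import numpy as np


def fit_additive_surrogate(predict, X, n_bins=20):
    """Least-squares fit of intercept + sum_j g_j(x_j) to a black-box model.

    Each g_j is piecewise constant on quantile bins of feature j
    (a simplifying assumption of this sketch).
    """
    y = predict(X)
    inner = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    edges = [np.quantile(X[:, j], inner) for j in range(X.shape[1])]

    def design(Z):
        cols = [np.ones((Z.shape[0], 1))]  # intercept column
        for j, e in enumerate(edges):
            bin_idx = np.searchsorted(e, Z[:, j])  # integers in 0..n_bins-1
            cols.append(np.eye(n_bins)[bin_idx])   # one-hot bin membership
        return np.hstack(cols)

    # The design is rank deficient; lstsq returns the minimum-norm solution.
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return lambda Z: design(Z) @ coef


def black_box(X):
    """Toy stand-in for an opaque predictive model."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2


rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(2000, 2))
surrogate = fit_additive_surrogate(black_box, X)
residual = black_box(X) - surrogate(X)
r_squared = 1.0 - residual.var() / black_box(X).var()
```

Because the toy model is itself additive, the surrogate should explain nearly all of its variance; a model with strong interactions would leave a larger residual unless higher-order terms were added.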



Article 860

Title@2025-06-10 (2): A Simple Analysis of Discretization Error in Diffusion Models

Title: A Simple Analysis of Discretization Error in Diffusion Models Eine einfache Analyse des Diskretisierungsfehlers in Diffusionsmodellen 简单分析扩散模型中的分解错误 2506.08337v1

Authors (2): Juhyeok Choi, Chenglin Fan

Diffusion models, formulated as discretizations of stochastic differential equations (SDEs), achieve state-of-the-art generative performance. However, existing analyses of their discretization error often rely on complex probabilistic tools. In this work, we present a simplified theoretical framework for analyzing the Euler–Maruyama discretization of variance-preserving SDEs (VP-SDEs) in Denoising Diffusion Probabilistic Models (DDPMs), where $T$ denotes the number of denoising steps in the diffusion process. Our approach leverages Grönwall’s inequality to derive a convergence rate of $\mathcal{O}(1/T^{1/2})$ under Lipschitz assumptions, significantly streamlining prior proofs. Furthermore, we demonstrate that the Gaussian noise in the discretization can be replaced by a discrete random variable (e.g., Rademacher or uniform noise) without sacrificing convergence guarantees, an insight with practical implications for efficient sampling. Experiments validate our theory, showing that (1) the error scales as predicted, (2) discrete noise achieves comparable sample quality to Gaussian noise, and (3) incorrect noise scaling degrades performance. By unifying simplified analysis and discrete noise substitution, our work bridges theoretical rigor with practical efficiency in diffusion-based generative modeling.
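
The noise substitution can be illustrated with a small Euler–Maruyama simulation. The sketch below discretizes the forward (noising) VP-SDE $dx = -\tfrac{1}{2}\beta x\,dt + \sqrt{\beta}\,dW$ with a constant $\beta$, and lets the per-step noise be either Gaussian or Rademacher, both with zero mean and unit variance. The constant schedule, the function name `euler_maruyama_vp`, and the choice to simulate the forward rather than the reverse process are assumptions made for brevity; this is not the paper's sampler.

```python
import numpy as np


def euler_maruyama_vp(x0, beta, t_end, n_steps, noise="gaussian", seed=0):
    """Euler-Maruyama steps for dx = -0.5*beta*x dt + sqrt(beta) dW."""
    rng = np.random.default_rng(seed)
    h = t_end / n_steps  # step size
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        if noise == "gaussian":
            xi = rng.standard_normal(x.shape)
        elif noise == "rademacher":
            xi = rng.choice([-1.0, 1.0], size=x.shape)  # +/-1 with equal probability
        else:
            raise ValueError(f"unknown noise type: {noise}")
        x = x - 0.5 * beta * x * h + np.sqrt(beta * h) * xi
    return x


# Start 20000 particles at x = 2 and compare the two noise choices.
samples = {
    noise: euler_maruyama_vp(
        np.full(20000, 2.0), beta=1.0, t_end=5.0, n_steps=200, noise=noise
    )
    for noise in ("gaussian", "rademacher")
}
```

For this linear SDE the exact marginal at time $t$ has mean $2e^{-t/2}$ and variance $1 - e^{-t}$, so the empirical mean and variance of both sample sets can be checked against those values.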



Article 861

Title@2025-06-10 (2): Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer

Title: Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer Zero-shot Meta-Learning für Tabular Prediction Aufgaben mit adversarially Pre-trained Transformer 采用未受过训练的变换器进行表列预测任务 2502.04573v2

Authors (2): Yulun Wu, Doron L. Bergman

We present an Adversarially Pre-trained Transformer (APT) that is able to perform zero-shot meta-learning on tabular prediction tasks without pre-training on any real-world dataset, building on the recent development of Prior-Data Fitted Networks (PFNs) and TabPFN. Specifically, APT is pre-trained with adversarial synthetic data agents, who continue to shift their underlying data generating distribution and deliberately challenge the model with different synthetic datasets. In addition, we propose a mixture block architecture that is able to handle classification tasks with an arbitrary number of classes, addressing the class size limitation, a crucial weakness of prior deep tabular zero-shot learners. In experiments, we show that our framework matches state-of-the-art performance on small classification tasks without filtering on dataset characteristics such as number of classes and number of missing values, while maintaining an average runtime under one second. On common benchmark dataset suites in both classification and regression, we show that adversarial pre-training was able to enhance TabPFN’s performance. In our analysis, we demonstrate that the adversarial synthetic data agents were able to generate a more diverse collection of data compared to the ordinary random generator in TabPFN. In addition, we demonstrate that our mixture block neural design has improved generalizability and greatly accelerated pre-training.



Article 862

Title@2025-06-10 (2): Generation from Noisy Examples

Title: Generation from Noisy Examples Generierung aus lauteren Beispielen 从吵闹中生成 2501.04179v2

Authors (2): Ananth Raman, Vinod Raman

We continue to study the learning-theoretic foundations of generation by extending the results from Kleinberg and Mullainathan [2024] and Li et al. [2024] to account for noisy example streams. In the noiseless setting of Kleinberg and Mullainathan [2024] and Li et al. [2024], an adversary picks a hypothesis from a binary hypothesis class and provides a generator with a sequence of its positive examples. The goal of the generator is to eventually output new, unseen positive examples. In the noisy setting, an adversary still picks a hypothesis and a sequence of its positive examples. But, before presenting the stream to the generator, the adversary inserts a finite number of negative examples. Unaware of which examples are noisy, the goal of the generator is to still eventually output new, unseen positive examples. In this paper, we provide necessary and sufficient conditions for when a binary hypothesis class can be noisily generatable. We provide such conditions with respect to various constraints on the number of distinct examples that need to be seen before perfect generation of positive examples. Interestingly, for finite and countable classes we show that generatability is largely unaffected by the presence of a finite number of noisy examples.



Article 863

Title@2025-06-10 (2): Graph Prompting for Graph Learning Models: Recent Advances and Future Directions

Title: Graph Prompting for Graph Learning Models: Recent Advances and Future Directions Graph Prompting für Graph Learning Models: Aktuelle Fortschritte und zukünftige Richtungen 图示学习模型提示图示:最近的进展和未来方向 2506.08326v1

Authors (10): Xingbo Fu, Zehong Wang, Zihan Chen, Jiazheng Li, Yaochen Zhu, Zhenyu Lei, Cong Shen, Yanfang Ye, Chuxu Zhang, Jundong Li

Graph learning models have demonstrated great prowess in learning expressive representations from large-scale graph data in a wide variety of real-world scenarios. As a prevalent strategy for training powerful graph learning models, the “pre-training, adaptation” scheme first pre-trains graph learning models on unlabeled graph data in a self-supervised manner and then adapts them to specific downstream tasks. During the adaptation phase, graph prompting emerges as a promising approach that learns trainable prompts while keeping the pre-trained graph learning models unchanged. In this paper, we present a systematic review of recent advancements in graph prompting. First, we introduce representative graph pre-training methods that serve as the foundation step of graph prompting. Next, we review mainstream techniques in graph prompting and elaborate on how they design learnable prompts for graph prompting. Furthermore, we summarize the real-world applications of graph prompting from different domains. Finally, we discuss several open challenges in existing studies with promising future directions in this field.



Article 864

Title@2025-06-10 (2): Model-Free Kernel Conformal Depth Measures Algorithm for Uncertainty Quantification in Regression Models in Separable Hilbert Spaces

Title: Model-Free Kernel Conformal Depth Measures Algorithm for Uncertainty Quantification in Regression Models in Separable Hilbert Spaces Modellfreier Kernel Conformal Depth misst Algorithmen zur Unsicherheitsquantifizierung in Regressionsmodellen in trennbaren Hilbert-Räumen 在塞可分离的希尔伯特空间的回归模型中,不确定性量化不确定性的模型-无模型核心常规 常规深度测量标准 2506.08325v1

Authors (5): Marcos Matabuena, Rahul Ghosal, Pavlo Mozharovskyi, Oscar Hernan Madrid Padilla, Jukka-Pekka Onnela

Depth measures are powerful tools for defining level sets in emerging, non-standard, and complex random objects such as high-dimensional multivariate data, functional data, and random graphs. Despite their favorable theoretical properties, the integration of depth measures into regression modeling to provide prediction regions remains a largely underexplored area of research. To address this gap, we propose a novel, model-free uncertainty quantification algorithm based on conditional depth measures, specifically conditional kernel mean embeddings and an integrated depth measure. These new algorithms can be used to define prediction and tolerance regions when predictors and responses are defined in separable Hilbert spaces. The use of kernel mean embeddings ensures faster convergence rates in prediction region estimation. To enhance the practical utility of the algorithms with finite samples, we also introduce a conformal prediction variant that provides marginal, non-asymptotic guarantees for the derived prediction regions. Additionally, we establish both conditional and unconditional consistency results, as well as fast convergence rates in certain homoscedastic settings. We evaluate the finite-sample performance of our model in extensive simulation studies involving various types of functional data and traditional Euclidean scenarios. Finally, we demonstrate the practical relevance of our approach through a digital health application related to physical activity, aiming to provide personalized recommendations.
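
The marginal, non-asymptotic guarantee of a conformal variant can be illustrated with generic split conformal prediction. The sketch below deliberately swaps in a much simpler setting than the paper's: a scalar response and absolute residuals as the conformity score, instead of depth-based scores in Hilbert spaces. The function name, the stand-in `model`, and the synthetic data are assumptions of this example.

```python
import math

import numpy as np


def split_conformal_radius(cal_pred, cal_true, alpha):
    """Radius q so that [pred - q, pred + q] covers with probability >= 1 - alpha.

    Uses absolute residuals on a held-out calibration set; the guarantee is
    marginal and relies on exchangeability of calibration and test points.
    """
    scores = np.sort(np.abs(cal_true - cal_pred))
    n = scores.size
    k = math.ceil((n + 1) * (1.0 - alpha))  # finite-sample corrected rank
    return float(scores[k - 1]) if k <= n else float("inf")


def model(x):
    """Stand-in for any already-fitted regressor."""
    return 2.0 * x


rng = np.random.default_rng(0)
x_cal, x_test = rng.uniform(0.0, 1.0, 1000), rng.uniform(0.0, 1.0, 5000)
y_cal = 2.0 * x_cal + rng.normal(0.0, 0.3, 1000)
y_test = 2.0 * x_test + rng.normal(0.0, 0.3, 5000)

radius = split_conformal_radius(model(x_cal), y_cal, alpha=0.1)
coverage = np.mean(np.abs(y_test - model(x_test)) <= radius)
```

With `alpha=0.1`, the empirical `coverage` on fresh data should land close to 90%, regardless of how well `model` fits.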



Article 865

Title@2025-06-10 (2): RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language

Title: RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors und Natural Language 关于音频、视频、嵌入传感器和自然语言的问答答案 2505.17114v2

Authors (3): Subrata Biswas, Mohammad Nur Hossain Khan, Bashima Islam

Multimodal question answering (QA) often requires identifying which video, audio, or sensor tokens are relevant to the question. Yet modality disagreements are common: off-camera speech, background noise, or motion outside the field of view often mislead fusion models that weight all streams equally. We present RAVEN, a unified QA architecture whose core is QuART, a query-conditioned cross-modal gating module that assigns scalar relevance scores to each token across modalities, enabling the model to amplify informative signals and suppress distractors before fusion. RAVEN is trained through a three-stage pipeline comprising unimodal pretraining, query-aligned fusion, and disagreement-oriented fine-tuning, with each stage targeting a distinct challenge in multi-modal reasoning: representation quality, cross-modal relevance, and robustness to modality mismatch. To support training and evaluation, we release AVS-QA, a dataset of 300K synchronized Audio-Video-Sensor streams paired with automatically generated question-answer pairs. Experimental results on seven multi-modal QA benchmarks, including egocentric and exocentric tasks, show that RAVEN achieves up to 14.5% and 8.0% gains in accuracy compared to state-of-the-art multi-modal large language models, respectively. Incorporating sensor data provides an additional 16.4% boost, and the model remains robust under modality corruption, outperforming SOTA baselines by 50.23%. Our code and dataset are available at https://github.com/BASHLab/RAVEN.
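
A query-conditioned gate of the kind described above can be sketched in a few lines. The example below scores every token against the query with a bilinear form, squashes the result to a scalar in (0, 1), and scales each token by its score before a simple weighted-average fusion. The bilinear scoring, the matrix `W`, the fusion rule, and the function name `query_gate` are hypothetical choices for illustration; the actual QuART module may differ in every detail.

```python
import numpy as np


def query_gate(tokens, query, W):
    """Return per-token relevance scores in (0, 1) and the gated tokens."""
    logits = tokens @ W @ query             # one scalar logit per token
    scores = 1.0 / (1.0 + np.exp(-logits))  # sigmoid gate
    return scores, tokens * scores[:, None]


rng = np.random.default_rng(0)
d = 8
# Tokens from three modalities (e.g. video, audio, sensor) stacked together.
tokens = np.concatenate(
    [rng.normal(size=(5, d)), rng.normal(size=(3, d)), rng.normal(size=(4, d))]
)
query = rng.normal(size=d)
W = rng.normal(size=(d, d)) / np.sqrt(d)  # would be learned in practice

scores, gated = query_gate(tokens, query, W)
fused = gated.sum(axis=0) / scores.sum()  # relevance-weighted average
```

Tokens with scores near zero contribute little to `fused`, which is the sense in which distractors are suppressed before fusion.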



Article 866

Title@2025-06-10 (2): TPP-LLM: Modeling Temporal Point Processes by Efficiently Fine-Tuning Large Language Models

Title: TPP-LLM: Modeling Temporal Point Processes by Efficiently Fine-Tuning Large Language Models TPP-LLM: Modellierung von Temporal Point-Prozessen durch effizientes Feintuning großer Sprachmodelle TPP-LLM:通过高效微调大语言模式模拟时点进程 2410.02062v2

Authors (2): Zefang Liu, Yinzhu Quan

Temporal point processes (TPPs) are widely used to model the timing and occurrence of events in domains such as social networks, transportation systems, and e-commerce. In this paper, we introduce TPP-LLM, a novel framework that integrates large language models (LLMs) with TPPs to capture both the semantic and temporal aspects of event sequences. Unlike traditional methods that rely on categorical event type representations, TPP-LLM directly utilizes the textual descriptions of event types, enabling the model to capture rich semantic information embedded in the text. While LLMs excel at understanding event semantics, they are less adept at capturing temporal patterns. To address this, TPP-LLM incorporates temporal embeddings and employs parameter-efficient fine-tuning (PEFT) methods to effectively learn temporal dynamics without extensive retraining. This approach improves both predictive accuracy and computational efficiency. Experimental results across diverse real-world datasets demonstrate that TPP-LLM outperforms state-of-the-art baselines in sequence modeling and event prediction, highlighting the benefits of combining LLMs with TPPs.



Article 867

Title@2025-06-10 (2): Curse of High Dimensionality Issue in Transformer for Long-context Modeling

Title: Curse of High Dimensionality Issue in Transformer for Long-context Modeling Fluch der Hochdimensionalitätsfrage im Transformer für die Langkontextmodellierung 变异器中高多维度问题的诅咒,用于长期建模 2505.22107v3

Authors (8): Shuhai Zhang, Zeng You, Yaofo Chen, Zhiquan Wen, Qianyue Wang, Zhijie Qiu, Yuanqing Li, Mingkui Tan

Transformer-based large language models (LLMs) excel in natural language processing tasks by capturing long-range dependencies through self-attention mechanisms. However, long-context modeling faces significant computational inefficiencies due to redundant attention computations: while attention weights are often sparse, all tokens consume equal computational resources. In this paper, we reformulate traditional probabilistic sequence modeling as a supervised learning task, enabling the separation of relevant and irrelevant tokens and providing a clearer understanding of redundancy. Based on this reformulation, we theoretically analyze attention sparsity, revealing that only a few tokens significantly contribute to predictions. Building on this, we formulate attention optimization as a linear coding problem and propose a group coding strategy, theoretically showing its ability to improve robustness against random noise and enhance learning efficiency. Motivated by this, we propose Dynamic Group Attention (DGA), which leverages the group coding to explicitly reduce redundancy by aggregating less important tokens during attention computation. Empirical results show that our DGA significantly reduces computational costs while maintaining competitive performance. Code is available at https://github.com/bolixinyu/DynamicGroupAttention.



Article 868

Title@2025-06-10 (2): Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion

Title: Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion Warum Diffusion arbeitet maskieren: Zustand auf dem Sprungplan für verbesserte diskrete Diffusion 为何遮掩传播工作有效:改进传播的跳跳时间表上的条件 2506.08316v1

Authors (3): Alan N. Amin, Nate Gruver, Andrew Gordon Wilson

Discrete diffusion models, like continuous diffusion models, generate high-quality samples by gradually undoing noise applied to datapoints with a Markov process. Gradual generation in theory comes with many conceptual benefits; for example, inductive biases can be incorporated into the noising Markov process, and improved sampling algorithms become accessible. In practice, however, the consistently best performing discrete diffusion model is, surprisingly, masking diffusion, which does not denoise gradually. Here we explain the superior performance of masking diffusion by noting that it makes use of a fundamental difference between continuous and discrete Markov processes: discrete Markov processes evolve by discontinuous jumps at a fixed rate and, unlike other discrete diffusion models, masking diffusion builds in the known distribution of jump times and only learns where to jump to. We show that we can similarly bake the known distribution of jump times into any discrete diffusion model. The resulting models, called schedule-conditioned discrete diffusion (SCUD), generalize classical discrete diffusion and masking diffusion. By applying SCUD to models with noising processes that incorporate inductive biases on images, text, and protein data, we build models that outperform masking.
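
The split between "when to jump" and "where to jump" can be seen in a generic simulation of a continuous-time Markov chain with a uniform jump rate. In the sketch below, the jump schedule is drawn first from a Poisson process, and only afterwards is the destination of each jump drawn from a transition matrix `K`. This is a textbook construction used for illustration, not the SCUD model; the function name and the toy two-state chain are assumptions.

```python
import numpy as np


def sample_ctmc(x0, K, rate, t_end, rng):
    """Sample a continuous-time Markov chain by separating 'when' from 'where'.

    Jump times come from a Poisson process with the given rate; at each jump
    the state moves according to the row-stochastic transition matrix K.
    """
    n_jumps = rng.poisson(rate * t_end)                           # how many jumps
    jump_times = np.sort(rng.uniform(0.0, t_end, size=n_jumps))   # when they occur
    x = x0
    for _ in jump_times:
        x = rng.choice(K.shape[0], p=K[x])                        # where to jump
    return x, jump_times


# Toy chain: two states that swap at every jump.
K = np.array([[0.0, 1.0], [1.0, 0.0]])
rng = np.random.default_rng(0)
final_state, jump_times = sample_ctmc(0, K, rate=3.0, t_end=2.0, rng=rng)
```

Because the schedule is sampled independently of the states visited, its distribution is known in advance, so only the destination kernel depends on the data.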



Article 869

Title@2025-06-10 (2): Private Evolution Converges

Title: Private Evolution Converges Private Evolutionskonvergenzen 私人进化集市 2506.08312v1

Authors (3): Tomás González, Giulia Fanti, Aaditya Ramdas

Private Evolution (PE) is a promising training-free method for differentially private (DP) synthetic data generation. While it achieves strong performance in some domains (e.g., images and text), its behavior in others (e.g., tabular data) is less consistent. To date, the only theoretical analysis of the convergence of PE depends on unrealistic assumptions about both the algorithm’s behavior and the structure of the sensitive dataset. In this work, we develop a new theoretical framework to explain PE’s practical behavior and identify sufficient conditions for its convergence. For $d$-dimensional sensitive datasets with $n$ data points from a bounded domain, we prove that PE produces an $(\epsilon, \delta)$-DP synthetic dataset with expected 1-Wasserstein distance of order $\tilde{O}(d(n\epsilon)^{-1/d})$ from the original, establishing worst-case convergence of the algorithm as $n \to \infty$. Our analysis extends to general Banach spaces as well. We also connect PE to the Private Signed Measure Mechanism, a method for DP synthetic data generation that has thus far not seen much practical adoption. We demonstrate the practical relevance of our theoretical findings in simulations.



Article 870

Title@2025-06-10 (2): Continuous Temporal Learning of Probability Distributions via Neural ODEs with Applications in Continuous Glucose Monitoring Data

Title: Continuous Temporal Learning of Probability Distributions via Neural ODEs with Applications in Continuous Glucose Monitoring Data Kontinuierliches Temporales Lernen der Wahrscheinlichkeitsverteilungen über neurale ODEs mit Anwendungen in kontinuierlichen Glukoseüberwachungsdaten 不断从时间上学习通过有持续甘蔗监测数据应用的神经代码对概率分布进行概率分布的连续时间学习 2505.08698v2

Authors (2): Antonio Álvarez-López, Marcos Matabuena

Modeling the dynamics of probability distributions from time-dependent data samples is a fundamental problem in many fields, including digital health. The goal is to analyze how the distribution of a biomarker, such as glucose, changes over time and how these changes may reflect the progression of chronic diseases like diabetes. We introduce a probabilistic model based on a Gaussian mixture that captures the evolution of a continuous-time stochastic process. Our approach combines a non-parametric estimate of the distribution, obtained with Maximum Mean Discrepancy (MMD), and a Neural Ordinary Differential Equation (Neural ODE) that governs the temporal evolution of the mixture weights. The model is highly interpretable, detects subtle distribution shifts, and remains computationally efficient. Simulation studies show that our method matches or surpasses the estimation accuracy of state-of-the-art, less interpretable techniques such as normalizing flows and non-parametric kernel density estimators. We further demonstrate its utility using data from a digital clinical trial, revealing how interventions affect the time-dependent distribution of glucose levels. The proposed method enables rigorous comparisons between control and treatment groups from both mathematical and clinical perspectives, offering novel longitudinal characterizations that existing approaches cannot achieve.



Article 871

Title@2025-06-10 (2): GRAM: Generalization in Deep RL with a Robust Adaptation Module

Title: GRAM: Generalization in Deep RL with a Robust Adaptation Module GRAM: Verallgemeinerung in Deep RL mit einem robusten Anpassungsmodul GRAM: 带有强力适应模块的深 RL 常规化 2412.04323v2

Authors (6): James Queeney, Xiaoyi Cai, Alexander Schperberg, Radu Corcodel, Mouhacine Benosman, Jonathan P. How

The reliable deployment of deep reinforcement learning in real-world settings requires the ability to generalize across a variety of conditions, including both in-distribution scenarios seen during training as well as novel out-of-distribution scenarios. In this work, we present a framework for dynamics generalization in deep reinforcement learning that unifies these two distinct types of generalization within a single architecture. We introduce a robust adaptation module that provides a mechanism for identifying and reacting to both in-distribution and out-of-distribution environment dynamics, along with a joint training pipeline that combines the goals of in-distribution adaptation and out-of-distribution robustness. Our algorithm GRAM achieves strong generalization performance across in-distribution and out-of-distribution scenarios upon deployment, which we demonstrate through extensive simulation and hardware locomotion experiments on a quadruped robot.



Article 872

Title@2025-06-10 (2): H$^2$GFM: Towards unifying Homogeneity and Heterogeneity on Text-Attributed Graphs

Title: H$^2$GFM: Towards unifying Homogeneity and Heterogeneity on Text-Attributed Graphs H$^2$GFM: Auf dem Weg zur Vereinheitlichung von Homogenität und Heterogenität auf Text-Attributen Grafiken H$2美元GFM:努力在文本图表中统一同质性和异质性 2506.08298v1

Authors (7): Trung-Kien Nguyen, Heng Ping, Shixuan Li, Peiyu Zhang, Nikos Kanakaris, Nicholas Kotov, Paul Bogdan

The growing interest in and applications of graph learning in diverse domains have propelled the development of a unified model that generalizes well across different graphs and tasks, known as the Graph Foundation Model (GFM). Existing research has leveraged text-attributed graphs (TAGs) to tackle the heterogeneity in node features among graphs. However, these works primarily focus on homogeneous TAGs (HoTAGs), leaving heterogeneous TAGs (HeTAGs), where multiple types of nodes/edges reside, underexplored. To enhance the capabilities and applications of GFM, we introduce H$^2$GFM, a novel framework designed to generalize across both HoTAGs and HeTAGs. Our model projects diverse meta-relations among graphs under a unified textual space, and employs a context encoding to capture spatial and higher-order semantic relationships. To achieve robust node representations, we propose a novel context-adaptive graph transformer (CGT), effectively capturing information from both context neighbors and their relationships. Furthermore, we employ a mixture of CGT experts to capture the heterogeneity in structural patterns among graph types. Comprehensive experiments on a wide range of HoTAGs and HeTAGs as well as learning scenarios demonstrate the effectiveness of our model.



Article 873

Title@2025-06-09 (1): From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?

Title: From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information? Von passiv zur aktiven Vernunft: Können große Sprachmodelle unter unvollständigen Informationen die richtigen Fragen stellen? 从被动到积极理由:大语言模式能够在不完整信息下询问正确的问题吗? 2506.08295v1

Authors (6): Zhanke Zhou, Xiao Feng, Zhaocheng Zhu, Jiangchao Yao, Sanmi Koyejo, Bo Han

While existing benchmarks probe the reasoning abilities of large language models (LLMs) across diverse domains, they predominantly assess passive reasoning, providing models with all the information needed to reach a solution. By contrast, active reasoning, where an LLM must interact with external systems to acquire missing evidence or data, has received little systematic attention. To address this shortfall, we present AR-Bench, a novel benchmark designed explicitly to evaluate an LLM’s active reasoning skills. AR-Bench comprises three task families (detective cases, situation puzzles, and guessing numbers) that together simulate real-world, agentic scenarios and measure performance across commonsense, logical, and symbolic reasoning challenges. Empirical evaluation on AR-Bench demonstrates that contemporary LLMs exhibit pronounced difficulties with active reasoning: they frequently fail to acquire or leverage the information needed to solve tasks. This gap highlights a stark divergence between their passive and active reasoning abilities. Moreover, ablation studies indicate that even advanced strategies, such as tree-based searching or post-training approaches, yield only modest gains and fall short of the levels required for real-world deployment. Collectively, these findings highlight the critical need to advance methodology for active reasoning, e.g., incorporating interactive learning, real-time feedback loops, and environment-aware objectives for training. The benchmark is publicly available at: https://github.com/tmlr-group/AR-Bench.



Article 874

Title@2025-06-09 (1): From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium

Title: From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium Von der Debatte zum Equilibrium: Glaube-getriebener Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium 从辩论到平衡:通过巴伊西亚纳什平衡进行 2506.08292v1

Authors (6): Xie Yi, Zhanke Zhou, Chentao Cao, Qiyu Niu, Tongliang Liu, Bo Han

Multi-agent frameworks can substantially boost the reasoning power of large language models (LLMs), but they typically incur heavy computational costs and lack convergence guarantees. To overcome these challenges, we recast multi-LLM coordination as an incomplete-information game and seek a Bayesian Nash equilibrium (BNE), in which each agent optimally responds to its probabilistic beliefs about the strategies of others. We introduce Efficient Coordination via Nash Equilibrium (ECON), a hierarchical reinforcement-learning paradigm that marries distributed reasoning with centralized final output. Under ECON, each LLM independently selects responses that maximize its expected reward, conditioned on its beliefs about co-agents, without requiring costly inter-agent exchanges. We mathematically prove that ECON attains a markedly tighter regret bound than non-equilibrium multi-agent schemes. Empirically, ECON outperforms existing multi-LLM approaches by 11.2% on average across six benchmarks spanning complex reasoning and planning tasks. Further experiments demonstrate ECON’s ability to flexibly incorporate additional models, confirming its scalability and paving the way toward larger, more powerful multi-LLM ensembles. The code is publicly available at: https://github.com/tmlr-group/ECON.



Article 875

Title@2025-06-09 (1): Digital Twin Synchronization: Bridging the Sim-RL Agent to a Real-Time Robotic Additive Manufacturing Control

Title: Digital Twin Synchronization: Bridging the Sim-RL Agent to a Real-Time Robotic Additive Manufacturing Control Digitale Twin-Synchronisierung: Überbrücken des Sim-RL Agents zu einer Echtzeit-Roboter-Zusatz-Fertigungssteuerung 数字双双同步同步:将Sim-RL代理连接到实时机器人添加剂制造控制 2501.18016v2

Authors (4): Matsive Ali, Sandesh Giri, Sen Liu, Qin Yang

With its rapid development, deep reinforcement learning has demonstrated excellent potential and is becoming the most promising solution in robotics. However, in the smart manufacturing domain, there is still little research on dynamic adaptive control mechanisms for optimizing complex processes. This research advances the integration of Soft Actor-Critic (SAC) with digital twins for industrial robotics applications, providing a framework for enhanced adaptive real-time control for smart additive manufacturing processing. The system architecture combines Unity’s simulation environment with ROS2 for seamless digital twin synchronization, while leveraging transfer learning to efficiently adapt trained models across tasks. We demonstrate our methodology using a Viper X300s robot arm with the proposed hierarchical reward structure to address the common reinforcement learning challenges in two distinct control scenarios. The results show rapid policy convergence and robust task execution in both simulated and physical environments, demonstrating the effectiveness of our approach.


Article 876

Title@2025-06-09 (1): Edge Computing based Human-Robot Cognitive Fusion: A Medical Case Study in the Autism Spectrum Disorder Therapy

Title: Edge Computing based Human-Robot Cognitive Fusion: A Medical Case Study in the Autism Spectrum Disorder Therapy Edge Computing basierte human-Roboter Kognitive Fusion: Eine medizinische Fallstudie in der Autismus-Spektrum-Störungstherapie 以边缘计算机为基础的人类-机器人认知共生:自闭症频谱病理医学案例研究 2401.00776v2

Authors (1): Qin Yang

In recent years, edge computing has served as a paradigm that enables many future technologies like AI, robotics, IoT, and high-speed wireless sensor networks (like 5G) by connecting cloud computing facilities and services to end users. Especially in medical and healthcare applications, it provides remote patient monitoring and supports growing volumes of multimedia data. From the robotics angle, robot-assisted therapy (RAT) is an active-assistive robotic technology in rehabilitation robotics that attracts researchers aiming to benefit people with disabilities, such as children with autism spectrum disorder (ASD). However, the main challenge of RAT is obtaining a model that can detect the affective states of people with ASD and recall individual preferences. Moreover, involving expert diagnosis and recommendations to guide robots in updating the therapy approach to adapt to different statuses and scenarios is a crucial part of the ASD therapy process. This paper proposes an edge cognitive computing architecture in which human experts and assistive robots collaborate in the same framework to achieve seamless remote diagnosis, round-the-clock symptom monitoring, emergency warning, therapy alteration, and advanced assistance.


Article 877

Title@2025-06-09 (1): Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers

Title: Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers Präzise hochdimensionale Asymptotik für die Quantifizierung heterogener Übertragungen 用于量化异基因转移的精确高多元性高活性 2010.11750v5

Authors (5): Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su

The problem of learning one task using samples from another task is central to transfer learning. In this paper, we focus on answering the following question: when does combining the samples from two related tasks perform better than learning with one target task alone? This question is motivated by an empirical phenomenon known as negative transfer, which has been observed in practice. While the transfer effect from one task to another depends on factors such as their sample sizes and the spectrum of their covariance matrices, precisely quantifying this dependence has remained a challenging problem. In order to compare a transfer learning estimator to single-task learning, one needs to compare the risks between the two estimators precisely. Further, the comparison depends on the distribution shifts between the two tasks. This paper applies recent developments of random matrix theory to tackle this challenge in a high-dimensional linear regression setting with two tasks. We show precise high-dimensional asymptotics for the bias and variance of a classical hard parameter sharing (HPS) estimator in the proportional limit, where the sample sizes of both tasks increase proportionally with dimension at fixed ratios. The precise asymptotics apply to various types of distribution shifts, including covariate shifts, model shifts, and combinations of both. We illustrate these results in a random-effects model to mathematically prove a phase transition from positive to negative transfer as the number of source task samples increases. One insight from the analysis is that a rebalanced HPS estimator, which downsizes the source task when the model shift is high, achieves the minimax optimal rate. The finding regarding phase transition also applies to multiple tasks when covariates are shared across tasks. Simulations validate the accuracy of the high-dimensional asymptotics for finite dimensions.


Article 878

Title@2025-06-09 (1): Voice Impression Control in Zero-Shot TTS

Title: Voice Impression Control in Zero-Shot TTS Sprach-Impression-Steuerung in Zero-Shot TTS 零热TTTS中的声音压抑控制 2506.05688v2

Authors (3): Keinichi Fujita, Shota Horiguchi, Yusuke Ijima

Para-/non-linguistic information in speech is pivotal in shaping the listeners’ impression. Although zero-shot text-to-speech (TTS) has achieved high speaker fidelity, modulating subtle para-/non-linguistic information to control perceived voice characteristics, i.e., impressions, remains challenging. We have therefore developed a voice impression control method in zero-shot TTS that utilizes a low-dimensional vector to represent the intensities of various voice impression pairs (e.g., dark-bright). The results of both objective and subjective evaluations have demonstrated our method’s effectiveness in impression control. Furthermore, generating this vector via a large language model enables target-impression generation from a natural language description of the desired impression, thus eliminating the need for manual optimization. Audio examples are available on our demo page (https://ntt-hilab-gensp.github.io/is2025voiceimpression/).


Article 879

Title@2025-06-09 (1): Seeing Voices: Generating A-Roll Video from Audio with Mirage

Title: Seeing Voices: Generating A-Roll Video from Audio with Mirage Stimmen sehen: A-Roll-Video aus Audio mit Mirage generieren 观看声音:从《幻影之音》制作一首歌曲视频 2506.08279v1

Authors (13): Aditi Sundararaman, Amogh Adishesha, Andrew Jaegle, Dan Bigioi, Hyoung-Kyu Song, Jon Kyl, Justin Mao, Kevin Lan, Mojtaba Komeili, ShahRukh Athar, Sheila Babayan, Stanislau Beliasau, William Buchwalter

From professional filmmaking to user-generated content, creators and consumers have long recognized that the power of video depends on the harmonious integration of what we hear (the video’s audio track) with what we see (the video’s image sequence). Current approaches to video generation either ignore sound to focus on general-purpose but silent image sequence generation or address both visual and audio elements but focus on restricted application domains such as re-dubbing. We introduce Mirage, an audio-to-video foundation model that excels at generating realistic, expressive output imagery from scratch given an audio input. When integrated with existing methods for speech synthesis (text-to-speech, or TTS), Mirage results in compelling multimodal video. When trained on audio-video footage of people talking (A-roll) and conditioned on audio containing speech, Mirage generates video of people delivering a believable interpretation of the performance implicit in input audio. Our central technical contribution is a unified method for training self-attention-based audio-to-video generation models, either from scratch or given existing weights. This methodology allows Mirage to retain generality as an approach to audio-to-video generation while producing outputs of superior subjective quality to methods that incorporate audio-specific architectures or loss components specific to people, speech, or details of how images or audio are captured. We encourage readers to watch and listen to the results of Mirage for themselves (see paper and comments for links).


Article 880

Title@2025-06-09 (1): Instruction-Tuned Video-Audio Models Elucidate Functional Specialization in the Brain

Title: Instruction-Tuned Video-Audio Models Elucidate Functional Specialization in the Brain Instruction-Tuned Video-Audio Modelle Elucidate Funktionelle Spezialisierung im Gehirn 脑中精明功能性专业 2506.08277v1

Authors (8): Subba Reddy Oota, Khushbu Pahwa, Prachi Jindal, Satya Sai Srinath Namburi, Maneesh Singh, Tanmoy Chakraborty, Bapi S. Raju, Manish Gupta

Recent voxel-wise multimodal brain encoding studies have shown that multimodal large language models (MLLMs) exhibit a higher degree of brain alignment compared to unimodal models in both unimodal and multimodal stimulus settings. More recently, instruction-tuned multimodal models have been shown to generate task-specific representations that align strongly with brain activity. However, prior work evaluating the brain alignment of MLLMs has primarily focused on unimodal settings or relied on non-instruction-tuned multimodal models for multimodal stimuli. To address this gap, we investigated brain alignment, that is, measuring the degree of predictivity of neural activity recorded while participants were watching naturalistic movies (video along with audio) with representations derived from MLLMs. We utilized instruction-specific embeddings from six video and two audio instruction-tuned MLLMs. Experiments with 13 video task-specific instructions show that instruction-tuned video MLLMs significantly outperform non-instruction-tuned multimodal (by 15%) and unimodal models (by 20%). Our evaluation of MLLMs for both video and audio tasks using language-guided instructions shows clear disentanglement in task-specific representations from MLLMs, leading to precise differentiation of multimodal functional processing in the brain. We also find that MLLM layers align hierarchically with the brain, with early sensory areas showing strong alignment with early layers, while higher-level visual and language regions align more with middle to late layers. These findings provide clear evidence for the role of task-specific instructions in improving the alignment between brain activity and MLLMs, and open new avenues for mapping joint information processing in both systems. We make the code publicly available [https://github.com/subbareddy248/mllm_videos].


Article 881

Title@2025-06-09 (1): SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Title: SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device SnapGen-V: Ein Fünf-Sekunden-Video innerhalb von fünf Sekunden auf einem mobilen Gerät erzeugen SnapGen-V:在移动设备上制作5秒内5秒内5秒的视频 2412.10494v2

Authors (14): Yushu Wu, Zhixing Zhang, Yanyu Li, Yanwu Xu, Anil Kag, Yang Sui, Huseyin Coskun, Ke Ma, Aleksei Lebedev, Ju Hu, Dimitris Metaxas, Yanzhi Wang, Sergey Tulyakov, Jian Ren

We have witnessed the unprecedented success of diffusion-based video generation over the past year. Recently proposed models from the community can generate cinematic, high-resolution videos with smooth motion from arbitrary input prompts. However, as a supertask of image generation, video generation models require more computation and are thus hosted mostly on cloud servers, limiting broader adoption among content creators. In this work, we propose a comprehensive acceleration framework to bring the power of the large-scale video diffusion model to the hands of edge users. On the network architecture side, we initialize from a compact image backbone and search for the design and arrangement of temporal layers that maximizes hardware efficiency. In addition, we propose a dedicated adversarial fine-tuning algorithm for our efficient model and reduce the denoising steps to 4. Our model, with only 0.6B parameters, can generate a 5-second video on an iPhone 16 PM within 5 seconds. Compared to server-side models that take minutes on powerful GPUs to generate a single video, we accelerate generation by orders of magnitude while delivering on-par quality.


Article 882

Title@2025-06-09 (1): LEANN: A Low-Storage Vector Index

Title: LEANN: A Low-Storage Vector Index LEANN: Ein speicherarmer Vektorindex LEANN:低压力矢量指数 2506.08276v1

Authors (13): Yichuan Wang, Shu Liu, Zhifei Li, Yongji Wu, Ziming Mao, Yilong Zhao, Xiao Yan, Zhiying Xu, Yang Zhou, Ion Stoica, Sewon Min, Matei Zaharia, Joseph E. Gonzalez

Embedding-based search is widely used in applications such as recommendation and retrieval-augmented generation (RAG). Recently, there has been a growing demand to support these capabilities over personal data stored locally on devices. However, maintaining the data structures required for embedding-based search is often infeasible due to their high storage overhead. For example, indexing 100 GB of raw data requires 150 to 700 GB of storage, making local deployment impractical. Reducing this overhead while maintaining search quality and latency becomes a critical challenge. In this paper, we present LEANN, a storage-efficient approximate nearest neighbor (ANN) search index optimized for resource-constrained personal devices. LEANN combines a compact graph-based structure with an efficient on-the-fly recomputation strategy to enable fast and accurate retrieval with minimal storage overhead. Our evaluation shows that LEANN reduces index size to under 5% of the original raw data, achieving up to 50 times smaller storage than standard indexes, while maintaining 90% top-3 recall in under 2 seconds on real-world question answering benchmarks.


Article 883

Title@2025-06-09 (1): Exploring Microstructural Dynamics in Cryptocurrency Limit Order Books: Better Inputs Matter More Than Stacking Another Hidden Layer

Title: Exploring Microstructural Dynamics in Cryptocurrency Limit Order Books: Better Inputs Matter More Than Stacking Another Hidden Layer Erforschung mikrostruktureller Dynamiken in Kryptowährung Limit Order Books: Bessere Eingaben sind wichtiger als das Stapeln einer anderen versteckten Ebene 探索加密货币限制顺序书中的微观结构动态:更好的投入比另层隐藏层更重要 2506.05764v2

Authors (1): Haochuan Wang

Cryptocurrency price dynamics are driven largely by microstructural supply-demand imbalances in the limit order book (LOB), yet the highly noisy nature of LOB data complicates the signal extraction process. Prior research has demonstrated that deep-learning architectures can yield promising predictive performance on pre-processed equity and futures LOB data, but they often treat model complexity as an unqualified virtue. In this paper, we aim to examine whether adding extra hidden layers or parameters to “black-box-ish” neural networks genuinely enhances short-term price forecasting, or if gains are primarily attributable to data preprocessing and feature engineering. We benchmark a spectrum of models, from interpretable baselines (logistic regression, XGBoost) to deep architectures (DeepLOB, Conv1D+LSTM), on BTC/USDT LOB snapshots sampled at intervals from 100 ms to multiple seconds using publicly available Bybit data. We introduce two data filtering pipelines (Kalman, Savitzky-Golay) and evaluate both binary (up/down) and ternary (up/flat/down) labeling schemes. Our analysis compares models on out-of-sample accuracy, latency, and robustness to noise. Results reveal that, with data preprocessing and hyperparameter tuning, simpler models can match and even exceed the performance of more complex networks, offering faster inference and greater interpretability.
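
To make the ternary (up/flat/down) labeling scheme concrete, here is a minimal Python sketch that labels each snapshot by the relative mid-price change over a fixed horizon. The function name `ternary_labels`, the `horizon`, and the `threshold` are assumptions chosen for illustration; the abstract does not state the exact definitions used in the paper.

```python
def ternary_labels(mid_prices, horizon=1, threshold=1e-4):
    """Label each snapshot +1 (up), 0 (flat), or -1 (down).

    Hypothetical scheme: compare the mid-price `horizon` steps ahead with
    the current one; relative moves smaller than `threshold` count as flat.
    """
    labels = []
    for t in range(len(mid_prices) - horizon):
        ret = (mid_prices[t + horizon] - mid_prices[t]) / mid_prices[t]
        if ret > threshold:
            labels.append(1)
        elif ret < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Toy mid-price series (made-up numbers, not Bybit data).
prices = [100.00, 100.05, 100.05, 99.90]
print(ternary_labels(prices))
```

Setting `threshold=0` and dropping the flat class would give a binary (up/down) variant; the last `horizon` snapshots receive no label because their future price is unknown.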


Article 884

Title@2025-06-09 (1): DualDynamics: Synergizing Implicit and Explicit Methods for Robust Irregular Time Series Analysis

Title: DualDynamics: Synergizing Implicit and Explicit Methods for Robust Irregular Time Series Analysis DualDynamics: Synergetische Implizite und explizite Methoden zur robusten irregulären Zeitreihenanalyse 双动态: 优化非常规时间序列分析的隐含和显性方法 2401.04979v6

Authors (3): YongKyung Oh, Dong-Young Lim, Sungil Kim

Real-world time series analysis faces significant challenges when dealing with irregular and incomplete data. While Neural Differential Equation (NDE) based methods have shown promise, they struggle with limited expressiveness, scalability issues, and stability concerns. Conversely, Neural Flows offer stability but falter with irregular data. We introduce ‘DualDynamics’, a novel framework that synergistically combines NDE-based and Neural Flow-based methods. This approach enhances expressive power while balancing computational demands, addressing critical limitations of existing techniques. We demonstrate DualDynamics’ effectiveness across diverse tasks: classification of robustness to dataset shift, irregularly-sampled series analysis, interpolation of missing data, and forecasting with partial observations. Our results show consistent outperformance over state-of-the-art methods, indicating DualDynamics’ potential to advance irregular time series analysis significantly.


Article 885

Title@2025-06-09 (1): Navigating the Latent Space Dynamics of Neural Models

Title: Navigating the Latent Space Dynamics of Neural Models Navigation der latenten Raumdynamik von Neuralmodellen 导航内壳模型的冷层空间动态 2505.22785v2

Authors (4): Marco Fumero, Luca Moschella, Emanuele Rodolà, Francesco Locatello

Neural networks transform high-dimensional data into compact, structured representations, often modeled as elements of a lower-dimensional latent space. In this paper, we present an alternative interpretation of neural models as dynamical systems acting on the latent manifold. Specifically, we show that autoencoder models implicitly define a latent vector field on the manifold, derived by iteratively applying the encoding-decoding map, without any additional training. We observe that standard training procedures introduce inductive biases that lead to the emergence of attractor points within this vector field. Drawing on this insight, we propose to leverage the vector field as a representation for the network, providing a novel tool to analyze the properties of the model and the data. This representation makes it possible to: (i) analyze the generalization and memorization regimes of neural models, even throughout training; (ii) extract prior knowledge encoded in the network’s parameters from the attractors, without requiring any input data; (iii) identify out-of-distribution samples from their trajectories in the vector field. We further validate our approach on vision foundation models, showcasing the applicability and effectiveness of our method in real-world scenarios.
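
The idea of a latent vector field obtained by repeatedly applying the encoding-decoding map can be illustrated with a toy one-dimensional example. The sketch below is an assumption-laden illustration: `encode` and `decode` are hand-written stand-ins for a trained autoencoder, and the names `latent_map`, `vector_field`, and `find_attractor` are hypothetical, not taken from the paper.

```python
import math


def encode(x):
    # Toy 1-D "encoder" (assumption): a squashing nonlinearity.
    return math.tanh(2.0 * x)


def decode(z):
    # Toy 1-D "decoder" (assumption): identity map back to data space.
    return z


def latent_map(z):
    # One round trip latent -> data -> latent.
    return encode(decode(z))


def vector_field(z):
    # Displacement of a latent point under one application of the map.
    return latent_map(z) - z


def find_attractor(z0, tol=1e-10, max_iter=1000):
    # Follow the trajectory until it stops moving (a fixed point of the map).
    z = z0
    for _ in range(max_iter):
        z_next = latent_map(z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


z_star = find_attractor(0.1)
print(z_star, vector_field(z_star))
```

In this toy map, trajectories started at any positive point settle at the same fixed point, where the vector field vanishes; such points play the role of the attractors described in the abstract.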


Article 886

Title@2025-06-09 (1): Comprehensive Review of Neural Differential Equations for Time Series Analysis

Title: Comprehensive Review of Neural Differential Equations for Time Series Analysis Umfassende Überprüfung neuraler Differentialgleichungen für die Zeitreihenanalyse 全面审查用于时间序列分析的神经差异等同 2502.09885v2

Authors (6): YongKyung Oh, Seungsu Kam, Jonghun Lee, Dong-Young Lim, Sungil Kim, Alex Bui

Time series modeling and analysis have become critical in various domains. Conventional methods such as RNNs and Transformers, while effective for discrete-time and regularly sampled data, face significant challenges in capturing the continuous dynamics and irregular sampling patterns inherent in real-world scenarios. Neural Differential Equations (NDEs) represent a paradigm shift by combining the flexibility of neural networks with the mathematical rigor of differential equations. This paper presents a comprehensive review of NDE-based methods for time series analysis, including neural ordinary differential equations, neural controlled differential equations, and neural stochastic differential equations. We provide a detailed discussion of their mathematical formulations, numerical methods, and applications, highlighting their ability to model continuous-time dynamics. Furthermore, we address key challenges and future research directions. This survey serves as a foundation for researchers and practitioners seeking to leverage NDEs for advanced time series analysis.


Article 887

Title@2025-06-09 (1): Universal Differential Equations for Scientific Machine Learning of Node-Wise Battery Dynamics in Smart Grids

Title: Universal Differential Equations for Scientific Machine Learning of Node-Wise Battery Dynamics in Smart Grids Universelle Differentialgleichungen für das wissenschaftliche maschinelle Lernen von Node-Wise-Batteriedynamiken in Smart Grids 智能网格节点电池动态科学机器学习通用差异等量 2506.08272v1

Authors (1): Tarushri N. S.

Universal Differential Equations (UDEs), which blend neural networks with physical differential equations, have emerged as a powerful framework for scientific machine learning (SciML), enabling data-efficient, interpretable, and physically consistent modeling. In the context of smart grid systems, modeling node-wise battery dynamics remains a challenge due to the stochasticity of solar input and variability in household load profiles. Traditional approaches often struggle with generalization and fail to capture unmodeled residual dynamics. This work proposes a UDE-based approach to learn node-specific battery evolution by embedding a neural residual into a physically inspired battery ODE. Synthetic yet realistic solar generation and load demand data are used to simulate battery dynamics over time. The neural component learns to model unobserved or stochastic corrections arising from heterogeneity in node demand and environmental conditions. Comprehensive experiments reveal that the trained UDE aligns closely with ground truth battery trajectories, exhibits smooth convergence behavior, and maintains stability in long-term forecasts. These findings affirm the viability of UDE-based SciML approaches for battery modeling in decentralized energy networks and suggest broader implications for real-time control and optimization in renewable-integrated smart grids.


Article 888

Title@2025-06-09 (1): Why Gradients Rapidly Increase Near the End of Training

Title: Why Gradients Rapidly Increase Near the End of Training Warum Gradienten am Ende der Ausbildung rasch zunehmen 为何在培训接近结束时梯度迅速增加 2506.02285v2

Authors (1): Aaron Defazio

During long-duration Large Language Model (LLM) training runs, the gradient norm increases rapidly near the end of training. In this short note, we show that this increase is due to an unintended interaction between weight decay, normalization layers, and the learning rate schedule. We propose a simple correction that fixes this behavior while also resulting in lower loss values throughout training.


Article 889

Title@2025-06-09 (1): Sparse Interpretable Deep Learning with LIES Networks for Symbolic Regression

Title: Sparse Interpretable Deep Learning with LIES Networks for Symbolic Regression Sparse Interpretable Deep Learning mit LIES-Netzwerken für symbolische Regression 与 LIES 网络共享的粗化可解释深空学习,用于符号回归 2506.08267v1

Authors (4): Mansooreh Montazerin, Majd Al Aawar, Antonio Ortega, Ajitesh Srivastava

Symbolic regression (SR) aims to discover closed-form mathematical expressions that accurately describe data, offering interpretability and analytical insight beyond standard black-box models. Existing SR methods often rely on population-based search or autoregressive modeling, which struggle with scalability and symbolic consistency. We introduce LIES (Logarithm, Identity, Exponential, Sine), a fixed neural network architecture with interpretable primitive activations that are optimized to model symbolic expressions. We develop a framework to extract compact formulae from LIES networks by training with an appropriate oversampling strategy and a tailored loss function to promote sparsity and to prevent gradient instability. After training, the framework applies additional pruning strategies to further simplify the learned expressions into compact formulae. Our experiments on SR benchmarks show that the LIES framework consistently produces sparse and accurate symbolic formulae, outperforming all baselines. We also demonstrate the importance of each design component through ablation studies.


Article 890

Title@2025-06-09 (1): Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints

Title: Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints Verstärktes Lernen aus menschlichem Feedback mit hochkonfidenziellen Sicherheitseinschränkungen 在高度信任安全限制下,从人类反馈中加强学习 2506.08266v1

Authors (6): Yaswanth Chittepu, Blossom Metevier, Will Schwarzer, Austin Hoag, Scott Niekum, Philip S. Thomas

Existing approaches to language model alignment often treat safety as a tradeoff against helpfulness, which can lead to unacceptable responses in sensitive domains. To ensure reliable performance in such settings, we propose High-Confidence Safe Reinforcement Learning from Human Feedback (HC-RLHF), a method that provides high-confidence safety guarantees while maximizing helpfulness. Similar to previous methods, HC-RLHF explicitly decouples human preferences into helpfulness and harmlessness (safety), which are learned by training a reward model and a cost model, respectively. It then employs a two-step process to find safe solutions. In the first step, it optimizes the reward function under an intentionally pessimistic version of the cost constraint. In the second step, the trained model undergoes a safety test to verify whether its performance stays within an upper-confidence bound of the actual cost constraint. We provide a theoretical analysis of HC-RLHF, including a proof that it will not return an unsafe solution with a probability greater than a user-specified threshold. For our empirical analysis, we apply HC-RLHF to align three different language models (Qwen2-1.5B, Qwen2.5-3B, and LLaMa3.2-3B) with human preferences. Our results demonstrate that HC-RLHF produces safe models with high probability and can improve harmlessness and helpfulness compared to previous methods.
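
The second step, a safety test based on an upper confidence bound, can be sketched as follows. This is a minimal illustration under stated assumptions: costs are taken to lie in [0, 1] and the bound is Hoeffding's inequality, which may differ from the concentration bound the paper actually uses. The names `hoeffding_upper_bound` and `safety_test` are hypothetical.

```python
import math


def hoeffding_upper_bound(costs, delta):
    """One-sided (1 - delta) upper confidence bound on the mean cost.

    Assumes each cost lies in [0, 1] and samples are i.i.d.
    """
    n = len(costs)
    mean = sum(costs) / n
    return mean + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def safety_test(costs, cost_limit, delta=0.05):
    """Pass only if the upper bound on expected cost is within the limit."""
    return hoeffding_upper_bound(costs, delta) <= cost_limit


# Made-up held-out costs: 20 of 200 responses flagged as unsafe.
held_out_costs = [0.0] * 180 + [1.0] * 20
print(safety_test(held_out_costs, cost_limit=0.25))
print(safety_test(held_out_costs, cost_limit=0.12))
```

Because the test compares the upper bound rather than the sample mean against the limit, a model whose observed cost is only slightly below the limit can still fail. This is one reason for training against a pessimistic (tighter) constraint in the first step.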


Article 891

Title@2025-06-09 (1): Learning-Based Multiuser Scheduling in MIMO-OFDM Systems with Hybrid Beamforming

Title: Learning-Based Multiuser Scheduling in MIMO-OFDM Systems with Hybrid Beamforming Lernbasierte Multiuser-Scheichung in MIMO-OFDM-Systemen mit Hybrid-Beamforming MOIMO-OFDM系统和混合波束系统中基于学习的多用户规划 2506.08263v1

Authors (4): Pouya Agheli, Tugce Kobal, François Durand, Matthew Andrews

We investigate the multiuser scheduling problem in multiple-input multiple-output (MIMO) systems using orthogonal frequency division multiplexing (OFDM) and hybrid beamforming, in which a base station (BS) communicates with multiple users over millimeter wave (mmWave) channels in the downlink. Because hybrid beamforming offers limited multiplexing gain, improved scheduling is critical for enhancing spectral efficiency and the long-term performance of the system as measured by the proportional fairness (PF) metric. Our objective is to maximize PF by properly designing the analog and digital precoders within the hybrid beamforming and selecting the users subject to the number of radio frequency (RF) chains. Leveraging the characteristics of mmWave channels, we apply a two-timescale protocol. On a long timescale, we assign an analog beam to each user. Scheduling the users and designing the digital precoder are done accordingly on a short timescale. To conduct scheduling, we propose combinatorial solutions, such as greedy and sorting algorithms, followed by a machine learning (ML) approach. Our numerical results highlight the trade-off between the performance and complexity of the proposed approaches. Consequently, we show that the choice of approach depends on the specific criteria within a given scenario.
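
As a rough illustration of sorting-based scheduling under a proportional fairness objective, the sketch below ranks users by the textbook PF metric (instantaneous rate divided by average throughput) and keeps as many users as there are RF chains. It is a simplification under explicit assumptions: it ignores inter-user interference and precoder design, and the function names and the smoothing factor `beta` are hypothetical rather than the paper's algorithm.

```python
def pf_sort_schedule(rates, avg_throughput, num_rf_chains):
    """Select up to `num_rf_chains` users with the largest PF metric.

    rates[u]: achievable rate of user u in this slot (assumed known).
    avg_throughput[u]: running average of the rate served to user u.
    """
    metric = {u: rates[u] / avg_throughput[u] for u in rates}
    ranked = sorted(metric, key=metric.get, reverse=True)
    return ranked[:num_rf_chains]


def update_throughput(avg_throughput, rates, scheduled, beta=0.1):
    """Exponential moving average; unscheduled users are served rate 0."""
    return {
        u: (1 - beta) * avg_throughput[u]
        + beta * (rates[u] if u in scheduled else 0.0)
        for u in avg_throughput
    }


# Made-up numbers for three users and two RF chains.
rates = {"a": 10.0, "b": 4.0, "c": 6.0}
avg = {"a": 10.0, "b": 2.0, "c": 4.0}
chosen = pf_sort_schedule(rates, avg, num_rf_chains=2)
print(chosen)
print(update_throughput(avg, rates, chosen))
```

User "a" has the highest instantaneous rate but is skipped here because it has already received a large share of throughput, which is the fairness effect the PF metric is meant to capture.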


Article 892

Title@2025-06-09 (1): SHIELD: Secure Hypernetworks for Incremental Expansion Learning Defense

Title: SHIELD: Secure Hypernetworks for Incremental Expansion Learning Defense SHIELD: Sichere Hypernetzwerke für inkrementale Expansions-Learning Defense SHIELD: 安全超网络,以逐步扩大学习防御 2506.08255v1

Authors (5): Patryk Krukowski, Łukasz Gorczyca, Piotr Helm, Kamil Książek, Przemysław Spurek

Traditional deep neural networks suffer from several limitations, including catastrophic forgetting. When models are adapted to new datasets, they tend to quickly forget previously learned knowledge. Another significant issue is the lack of robustness to even small perturbations in the input data. In practice, adversarial attacks that add minimal noise to the input can often easily change the network’s predictions. Dedicated architectures and training procedures can solve each of the above problems separately. Unfortunately, no current model can simultaneously address both catastrophic forgetting and vulnerability to adversarial attacks. We introduce SHIELD (Secure Hypernetworks for Incremental Expansion and Learning Defense), a novel approach that integrates a hypernetwork-based continual learning approach with interval arithmetic. SHIELD uses the hypernetwork to transform trainable task embedding vectors into the weights of a target model dedicated to specific data. This paradigm allows for the dynamic generation of separate networks for each subtask, while the hypernetwork aggregates and analyzes information across all tasks. The target model takes as input a data sample with a defined interval range and, by creating a hypercube, produces a prediction for the given range. Therefore, such target models provide strict guarantees against all possible attacks for data samples within the interval range. Our approach enhances security without sacrificing network adaptability, addressing the overlooked challenge of safety in continual learning.
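
The interval-arithmetic idea, propagating a hypercube around an input through the network to bound every possible output, can be sketched with standard interval bound propagation for one affine layer followed by ReLU. This is a generic illustration and not SHIELD's implementation; the weights, the perturbation radius `eps`, and the function names are assumptions.

```python
def affine_interval(lower, upper, weights, bias):
    """Propagate the box [lower, upper] through y = W x + b.

    For each weight, the lower output bound uses the input bound that
    minimizes the product, and the upper bound uses the one that maximizes it.
    """
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_interval(lower, upper):
    # ReLU is monotone, so it can be applied to each bound separately.
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


# Hypercube of radius eps around a made-up input.
x, eps = [1.0, -2.0], 0.1
lo, hi = affine_interval(
    [v - eps for v in x],
    [v + eps for v in x],
    weights=[[1.0, -1.0], [0.5, 2.0]],
    bias=[0.0, 1.0],
)
lo, hi = relu_interval(lo, hi)
print(lo, hi)
```

If the output intervals of the final layer keep the correct class score above all others, no perturbation inside the hypercube can change the prediction, which is the kind of guarantee the abstract refers to.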


Article 893

Title@2025-06-09 (1): Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction

Title: Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction Selbstüberwachte Transformer als iterative Lösungsverbesserungen für Einschränkungen Zufriedenheit 自我主宰的变形器作为抑制满意度的循环解决方案改进器 2502.15794v2

Authors (4): Yudong W. Xu, Wenhao Li, Scott Sanner, Elias B. Khalil

We present a Transformer-based framework for Constraint Satisfaction Problems (CSPs). CSPs find use in many applications and thus accelerating their solution with machine learning is of wide interest. Most existing approaches rely on supervised learning from feasible solutions or reinforcement learning, paradigms that require either feasible solutions to these NP-Complete CSPs or large training budgets and a complex expert-designed reward signal. To address these challenges, we propose ConsFormer, a self-supervised framework that leverages a Transformer as a solution refiner. ConsFormer constructs a solution to a CSP iteratively in a process that mimics local search. Instead of using feasible solutions as labeled data, we devise differentiable approximations to the discrete constraints of a CSP to guide model training. Our model is trained to improve random assignments for a single step but is deployed iteratively at test time, circumventing the bottlenecks of supervised and reinforcement learning. Experiments on Sudoku, Graph Coloring, Nurse Rostering, and MAXCUT demonstrate that our method can tackle out-of-distribution CSPs simply through additional iterations.


Article 894

Title@2025-06-09 (1): TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation

Title: TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation TRAVEL: Training-freies Retrieval und Ausrichtung für Vision-und-Sprachen-Navigation TRAVEL:无培训检索和协调愿景和语言导航 2502.07306v2

Authors (2): Navid Rajabi, Jana Kosecka

In this work, we propose a modular approach for the Vision-Language Navigation (VLN) task by decomposing the problem into four sub-modules that use state-of-the-art Large Language Models (LLMs) and Vision-Language Models (VLMs) in a zero-shot setting. Given a navigation instruction in natural language, we first prompt an LLM to extract the landmarks and the order in which they are visited. Assuming a known model of the environment, we retrieve the top-k locations of the last landmark and generate $k$ path hypotheses from the starting location to the last landmark using the shortest path algorithm on the topological map of the environment. Each path hypothesis is represented by a sequence of panoramas. We then use dynamic programming to compute the alignment score between the sequence of panoramas and the sequence of landmark names, using match scores obtained from a VLM. Finally, we compute the nDTW metric for the hypothesis that yields the highest alignment score to evaluate the path fidelity. We demonstrate superior performance compared to other approaches that use joint semantic maps like VLMaps on the complex R2R-Habitat instruction dataset and quantify in detail the effect of visual grounding on navigation performance.
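
The dynamic-programming alignment between a path's panoramas and the ordered landmark names could look roughly like the sketch below. It assumes a precomputed matrix `score[i][j]` of landmark-to-panorama match scores (for example from a VLM) and maximizes the total score over assignments that respect the landmark order. The exact recurrence used in the paper is not given in the abstract, so this formulation and the name `alignment_score` are assumptions.

```python
def alignment_score(score):
    """Best order-preserving alignment of landmarks to panoramas.

    score[i][j]: match between landmark i and panorama j.
    Each landmark is assigned one panorama, and the chosen panorama
    indices must be non-decreasing along the landmark sequence.
    """
    m, n = len(score), len(score[0])
    neg_inf = float("-inf")
    # best[i][j]: best total for the first i landmarks using panoramas 0..j-1.
    best = [[neg_inf] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        best[0][j] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            skip = best[i][j - 1]  # landmark i-1 is not matched to panorama j-1
            use = best[i - 1][j] + score[i - 1][j - 1]  # it is matched there
            best[i][j] = max(skip, use)
    return best[m][n]


# Two landmarks, three panoramas (made-up scores).
print(alignment_score([[0.9, 0.1, 0.2], [0.2, 0.3, 0.8]]))
```

Running this on each of the $k$ path hypotheses and keeping the highest-scoring one mirrors the selection step described above; the table has (m+1)(n+1) entries, so the cost per hypothesis is linear in the product of the two sequence lengths.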



Article 895

Title@2025-06-09 (1): Approximation-Aware Bayesian Optimization

Title: Approximation-Aware Bayesian Optimization Angemessene Bayesian-Optimierung 近似- Aware Bayesian 优化优化 2406.04308v2

Authors (6): Natalie Maus, Kyurae Kim, Geoff Pleiss, David Eriksson, John P. Cunningham, Jacob R. Gardner

High-dimensional Bayesian optimization (BO) tasks such as molecular design often require 10,000 function evaluations before obtaining meaningful results. While methods like sparse variational Gaussian processes (SVGPs) reduce computational requirements in these settings, the underlying approximations result in suboptimal data acquisitions that slow the progress of optimization. In this paper we modify SVGPs to better align with the goals of BO: targeting informed data acquisition rather than global posterior fidelity. Using the framework of utility-calibrated variational inference, we unify GP approximation and data acquisition into a joint optimization problem, thereby ensuring optimal decisions under a limited computational budget. Our approach can be used with any decision-theoretic acquisition function and is compatible with trust region methods like TuRBO. We derive efficient joint objectives for the expected improvement and knowledge gradient acquisition functions in both the standard and batch BO settings. Our approach outperforms standard SVGPs on high-dimensional benchmark tasks in control and molecular design.
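For reference, the expected improvement acquisition function named in the abstract has a well-known closed form under a Gaussian posterior. The sketch below implements only that standard formula for a maximization problem; it does not implement the paper's joint objective, and the function name is an assumption.

```python
import math

def expected_improvement(mean, std, best):
    """Closed-form EI for maximization under a Gaussian posterior
    N(mean, std^2), relative to the incumbent value `best`."""
    if std <= 0.0:
        # Degenerate posterior: improvement is deterministic.
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf

print(expected_improvement(mean=0.0, std=1.0, best=0.0))
```

With the mean fixed, EI increases with the posterior standard deviation, which is why errors in an approximate posterior's uncertainty can translate directly into poorer acquisitions.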



Article 896

Title@2025-06-09 (1): General Loss Functions Lead to (Approximate) Interpolation in High Dimensions

Title: General Loss Functions Lead to (Approximate) Interpolation in High Dimensions Allgemeine Verlustfunktionen führen zu (nahezu) Interpolation in hohen Dimensionen 导致(近近)高层面内插(近) 2303.07475v2

Authors (2): Kuo-Wei Lai, Vidya Muthukumar

We provide a unified framework that applies to a general family of convex losses across binary and multiclass settings in the overparameterized regime to approximately characterize the implicit bias of gradient descent in closed form. Specifically, we show that the implicit bias is approximated by (but not exactly equal to) the minimum-norm interpolation in high dimensions, which arises from training on the squared loss. In contrast to prior work, which was tailored to exponentially-tailed losses and used the intermediate support-vector-machine formulation, our framework directly builds on the primal-dual analysis of Ji and Telgarsky (2021), allowing us to provide new approximate equivalences for general convex losses through a novel sensitivity analysis. Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings. Finally, we provide evidence for the tightness of our techniques and use our results to demonstrate the effect of certain loss functions designed for out-of-distribution problems on the closed-form solution.



Article 897

Title@2025-06-09 (1): Parameter-free approximate equivariance for tasks with finite group symmetry

Title: Parameter-free approximate equivariance for tasks with finite group symmetry Parameterfreie ungefähre Äquivarianz für Aufgaben mit endlicher Gruppensymmetrie 有限组对称任务无参数 2506.08244v1

Authors (3): Riccardo Ali, Pietro Liò, Jamie Vicary

Equivariant neural networks incorporate symmetries through group actions, embedding them as an inductive bias to improve performance on a wide variety of tasks. However, existing equivariant methods can be computationally intensive, with high parameter counts, and are often tied to a specific architecture. We propose a simple zero-parameter approach that imposes approximate equivariance for a finite group in the latent representation, as an additional term in the loss function. We conduct experiments which allow the network to learn a group representation on the latent space, and show that in every case it prefers to learn the regular representation. Fixing this action on the latent space yields a simple method to impose approximate equivariance as an additional loss penalty. We benchmark our approach on three datasets and compare it against several existing equivariant methods, showing that in many cases it achieves similar or better performance for a fraction of the parameters.
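One way to picture an equivariance penalty with a fixed regular representation is the sketch below. It assumes the cyclic group C_n acting on both the input and an n-dimensional latent by cyclic shifts, and it penalizes the squared gap between "encode the transformed input" and "transform the encoded input". The setup and all names are assumptions for illustration, not the paper's implementation.

```python
def shift(vec, k):
    """Cyclic shift by k positions: the regular representation of C_n on R^n."""
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)

def equivariance_penalty(encoder, x, group_order):
    """Mean over group elements g of ||encoder(g.x) - rho(g) encoder(x)||^2."""
    z = encoder(x)
    total = 0.0
    for k in range(group_order):
        z_of_moved = encoder(shift(x, k))  # group acts on the input first
        moved_z = shift(z, k)              # group acts on the latent instead
        total += sum((a - b) ** 2 for a, b in zip(z_of_moved, moved_z))
    return total / group_order

x = [1.0, 2.0, 3.0, 4.0]
square = lambda v: [a * a for a in v]                      # commutes with shifts
weighted = lambda v: [i * a for i, a in enumerate(v)]      # depends on position
print(equivariance_penalty(square, x, 4), equivariance_penalty(weighted, x, 4))
```

The penalty adds no trainable parameters: it is zero for an encoder that commutes with the shifts and positive otherwise, so it can be added to a task loss as a regularizer.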



Article 898

Title@2025-06-09 (1): Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic

Title: Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic Temporalisierendes Vertrauen: Bewertung von Ketten-of-Thought-Reasoning mit Signal Temporal Logic 时间化信任度:对根据信号时空逻辑进行的研究链原因的评价 2506.08243v1

Authors (4): Zhenjiang Mao, Artem Bisliouk, Rohith Reddy Nama, Ivan Ruchkin

Large Language Models (LLMs) have shown impressive performance in mathematical reasoning tasks when guided by Chain-of-Thought (CoT) prompting. However, they tend to produce highly confident yet incorrect outputs, which poses significant risks in domains like education, where users may lack the expertise to assess reasoning steps. To address this, we propose a structured framework that models stepwise confidence as a temporal signal and evaluates it using Signal Temporal Logic (STL). In particular, we define formal STL-based constraints to capture desirable temporal properties and compute robustness scores that serve as structured, interpretable confidence estimates. Our approach also introduces a set of uncertainty reshaping strategies to enforce smoothness, monotonicity, and causal consistency across the reasoning trajectory. Experiments show that our approach consistently improves calibration metrics and provides more reliable uncertainty estimates than conventional confidence aggregation and post-hoc calibration.
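To make the idea of an STL robustness score concrete, the sketch below applies the standard quantitative semantics of the "always" and "eventually" operators to a discrete stepwise confidence trace. The specific properties checked and the function names are illustrative assumptions; the abstract does not list the paper's exact constraints.

```python
def always_at_least(signal, threshold):
    """Robustness of G(conf >= threshold): the worst margin over the trace.
    Positive means the property holds at every step."""
    return min(c - threshold for c in signal)

def eventually_at_least(signal, threshold):
    """Robustness of F(conf >= threshold): the best margin over the trace."""
    return max(c - threshold for c in signal)

def always_non_decreasing(signal):
    """Robustness of G(conf[t+1] - conf[t] >= 0), a monotonicity property."""
    return min(b - a for a, b in zip(signal, signal[1:]))

steady = [0.6, 0.7, 0.9]   # hypothetical per-step confidences
dipping = [0.9, 0.4, 0.8]
print(always_at_least(steady, 0.5), always_non_decreasing(dipping))
```

The sign of each score says whether the property is satisfied, and its magnitude says by how much, which is what makes the score usable as an interpretable confidence estimate.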



Article 899

Title@2025-06-09 (1): Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations

Title: Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations Umgang mit den bösen Zwillingen: Zufällige Augmentation verbessern, indem Katastrophenvergessen von verschiedenen Augmentationen adressiert wird 对付邪恶双胞胎:通过解决灾难性地忘记多种增加现象,改善随机增加现象 2506.08240v1

Authors (2): Dongkyu Cho, Rumi Chunara

Data augmentation is a promising tool for enhancing out-of-distribution generalization, where the key is to produce diverse, challenging variations of the source domain via costly targeted augmentations that maximize its generalization effect. Conversely, random augmentation is inexpensive but is deemed suboptimal due to its limited effect. In this paper, we revisit random augmentation and explore methods to address its shortcomings. We show that the stochastic nature of random augmentation can produce a set of colliding augmentations that distorts the learned features, similar to catastrophic forgetting. We propose a simple solution that improves the generalization effect of random augmentation by addressing forgetting, which displays strong generalization performance across various single source domain generalization (sDG) benchmarks.



Article 900

Title@2025-06-09 (1): MF-LAL: Drug Compound Generation Using Multi-Fidelity Latent Space Active Learning

Title: MF-LAL: Drug Compound Generation Using Multi-Fidelity Latent Space Active Learning MF-LAL: Arzneimittel-Compound-Generierung mit Multi-Fidelity Latent Space Active Learning MF-LAL:利用多功能低空空间主动学习生成药物化合物 2410.11226v3

Authors (5): Peter Eckmann, Dongxia Wu, Germano Heinzelmann, Michael K. Gilson, Rose Yu

Current generative models for drug discovery primarily use molecular docking as an oracle to guide the generation of active compounds. However, such models are often not useful in practice because even compounds with high docking scores do not consistently show real-world experimental activity. More accurate methods for activity prediction exist, such as molecular dynamics based binding free energy calculations, but they are too computationally expensive to use in a generative model. To address this challenge, we propose Multi-Fidelity Latent space Active Learning (MF-LAL), a generative modeling framework that integrates a set of oracles with varying cost-accuracy tradeoffs. Using active learning, we train a surrogate model for each oracle and use these surrogates to guide generation of compounds with high predicted activity. Unlike previous approaches that separately learn the surrogate model and generative model, MF-LAL combines the generative and multi-fidelity surrogate models into a single framework, allowing for more accurate activity prediction and higher quality samples. Our experiments on two disease-relevant proteins show that MF-LAL produces compounds with significantly better binding free energy scores than other single and multi-fidelity approaches (~50% improvement in mean binding free energy score). The code is available at https://github.com/Rose-STL-Lab/MF-LAL.



Article 901

Title@2025-06-09 (1): Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework

Title: Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework Gewährleistung der Zuverlässigkeit von kuratierten EHR-Daten: Die Validierung der Genauigkeit für LLM/ML-extrahierte Informationen und Daten (VALID) Framework 确保封闭式电子HR-驱动数据可靠性:LLM/ML-提取信息和数据(VALID)框架的准确性验证 2506.08231v1

Authors (11): Melissa Estevez, Nisha Singh, Lauren Dyson, Blythe Adamson, Qianyu Yuan, Megan W. Hildner, Erin Fidyk, Olive Mbah, Farhad Khan, Kathi Seidl-Rathkopf, Aaron B. Cohen

Large language models (LLMs) are increasingly used to extract clinical data from electronic health records (EHRs), offering significant improvements in scalability and efficiency for real-world data (RWD) curation in oncology. However, the adoption of LLMs introduces new challenges in ensuring the reliability, accuracy, and fairness of extracted data, which are essential for research, regulatory, and clinical applications. Existing quality assurance frameworks for RWD and artificial intelligence do not fully address the unique error modes and complexities associated with LLM-extracted data. In this paper, we propose a comprehensive framework for evaluating the quality of clinical data extracted by LLMs. The framework integrates variable-level performance benchmarking against expert human abstraction, automated verification checks for internal consistency and plausibility, and replication analyses comparing LLM-extracted data to human-abstracted datasets or external standards. This multidimensional approach enables the identification of variables most in need of improvement, systematic detection of latent errors, and confirmation of dataset fitness-for-purpose in real-world research. Additionally, the framework supports bias assessment by stratifying metrics across demographic subgroups. By providing a rigorous and transparent method for assessing LLM-extracted RWD, this framework advances industry standards and supports the trustworthy use of AI-powered evidence generation in oncology research and practice.



Article 902

Title@2025-06-09 (1): Scaling Laws of Motion Forecasting and Planning – A Technical Report

Title: Scaling Laws of Motion Forecasting and Planning – A Technical Report Skalierungsgesetze der Bewegungsvorhersage und Planung – Ein technischer Bericht 扩大运动预测和规划法 – – 技术报告 2506.08228v1

Authors (16): Mustafa Baniodeh, Kratarth Goel, Scott Ettinger, Carlos Fuertes, Ari Seff, Tim Shen, Cole Gulino, Chenjie Yang, Ghassen Jerfel, Dokook Choe, Rui Wang, Vinutha Kallem, Sergio Casas, Rami Al-Rfou, Benjamin Sapp, Dragomir Anguelov

We study the empirical scaling laws of a family of encoder-decoder autoregressive transformer models on the task of joint motion forecasting and planning in the autonomous driving domain. Using a driving dataset of 500 thousand hours, we demonstrate that, similar to language modeling, model performance improves as a power-law function of the total compute budget, and we observe a strong correlation between model training loss and model evaluation metrics. Most interestingly, closed-loop metrics also improve with scaling, which has important implications for the suitability of open-loop metrics for model development and hill climbing. We also study the optimal scaling of the number of transformer parameters and the training data size for a training compute-optimal model. We find that as the training compute budget grows, optimal scaling requires increasing the model size 1.5x as fast as the dataset size. We also study inference-time compute scaling, where we observe that sampling and clustering the output of smaller models makes them competitive with larger models, up to a crossover point beyond which a larger model becomes more inference-compute efficient. Overall, our experimental results demonstrate that optimizing the training and inference-time scaling properties of motion forecasting and planning models is a key lever for improving their performance to address a wide variety of driving scenarios. Finally, we briefly study the utility of training on general logged driving data of other agents to improve the performance of the ego-agent, an important research area to address the scarcity of robotics data for training large-capacity models.
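A power-law relationship of the form loss = a * compute^b is usually estimated by a straight-line fit in log-log space. The sketch below shows that generic procedure on synthetic numbers; the data, the function name, and the absence of an irreducible-loss term are assumptions, and none of the values come from the report.

```python
import math

def fit_power_law(compute, loss):
    """Ordinary least squares on log(loss) = log(a) + b * log(compute).
    Returns (a, b); b is negative when loss falls as compute grows."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic, noise-free example: loss = 2 * compute^(-0.5).
compute = [1e3, 1e4, 1e5, 1e6]
loss = [2.0 * c ** -0.5 for c in compute]
a, b = fit_power_law(compute, loss)
print(a, b)
```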



Article 903

Title@2025-06-09 (1): Mondrian: Transformer Operators via Domain Decomposition

Title: Mondrian: Transformer Operators via Domain Decomposition Mondrian: Transformer Operators via Domain Decomposition Mondrian:通过域分解的变形操作器 2506.08226v1

Authors (3): Arthur Feeney, Kuei-Hsiang Huang, Aparna Chandramowlishwaran

Operator learning enables data-driven modeling of partial differential equations (PDEs) by learning mappings between function spaces. However, scaling transformer-based operator models to high-resolution, multiscale domains remains a challenge due to the quadratic cost of attention and its coupling to discretization. We introduce Mondrian, transformer operators that decompose a domain into non-overlapping subdomains and apply attention over sequences of subdomain-restricted functions. Leveraging principles from domain decomposition, Mondrian decouples attention from discretization. Within each subdomain, it replaces standard layers with expressive neural operators, and attention across subdomains is computed via softmax-based inner products over functions. The formulation naturally extends to hierarchical windowed and neighborhood attention, supporting both local and global interactions. Mondrian achieves strong performance on Allen-Cahn and Navier-Stokes PDEs, demonstrating resolution scaling without retraining. These results highlight the promise of domain-decomposed attention for scalable and general-purpose neural operators.



Article 904

Title@2025-06-09 (1): Harnessing the Continuous Structure: Utilizing the First-order Approach in Online Contract Design

Title: Harnessing the Continuous Structure: Utilizing the First-order Approach in Online Contract Design Kontinuierliche Struktur: Nutzung des First-Order-Ansatzes im Online-Contract Design 利用连续结构:在网上合同设计中使用一等办法 2403.07143v3

Authors (1): Shiliang Zuo

This work studies the online contract design problem. The principal’s goal is to learn the optimal contract that maximizes her utility through repeated interactions, without prior knowledge of the agent’s type (i.e., the agent’s cost and production functions). We leverage the structure provided by continuous action spaces, which allows the application of first-order conditions (FOC) to characterize the agent’s behavior. In some cases, we utilize conditions from the first-order approach (FOA) in economics, but in certain settings, we are able to apply FOC without additional assumptions, leading to simpler and more principled algorithms. We illustrate this approach in three problem settings. Firstly, we study the problem of learning the optimal contract when there can be many outcomes. In contrast to prior works that design highly specialized algorithms, we show that the problem can be directly reduced to Lipschitz bandits. Secondly, we study the problem of learning linear contracts. While the contracting problem involves hidden action (moral hazard) and the pricing problem involves hidden value (adverse selection), the two problems share a similar optimization structure, which enables direct reduction between the problem of learning linear contracts and dynamic pricing. Thirdly, we study the problem of learning contracts with many outcomes when agents are identical and provide an algorithm with polynomial sample complexity.



Article 905

Title@2025-06-09 (1): What makes an Ensemble (Un) Interpretable?

Title: What makes an Ensemble (Un) Interpretable? Was macht ein Ensemble (Un) Interpretierbar? 集合物怎么可以解释呢? 2506.08216v1

Authors (4): Shahaf Bassan, Guy Amir, Meirav Zehavi, Guy Katz

Ensemble models are widely recognized in the ML community for their limited interpretability. For instance, while a single decision tree is considered interpretable, ensembles of trees (e.g., boosted trees) are often treated as black-boxes. Despite this folklore recognition, there remains a lack of rigorous mathematical understanding of what particularly makes an ensemble (un)-interpretable, including how fundamental factors like the (1) number, (2) size, and (3) type of base models influence its interpretability. In this work, we seek to bridge this gap by applying concepts from computational complexity theory to study the challenges of generating explanations for various ensemble configurations. Our analysis uncovers nuanced complexity patterns influenced by various factors. For example, we demonstrate that under standard complexity assumptions like P$\neq$NP, interpreting ensembles remains intractable even when base models are of constant size. Surprisingly, the complexity changes drastically with the number of base models: small ensembles of decision trees are efficiently interpretable, whereas interpreting ensembles with even a constant number of linear models remains intractable. We believe that our findings provide a more robust foundation for understanding the interpretability of ensembles, emphasizing the benefits of examining it through a computational complexity lens.



Article 906

Title@2025-06-09 (1): Scalable Meta-Learning via Mixed-Mode Differentiation

Title: Scalable Meta-Learning via Mixed-Mode Differentiation Skalierbares Meta-Learning über Mixed-Mode Differenzierung 通过混合模式差异的可缩放元学习 2505.00793v2

Authors (5): Iurii Kemaev, Dan A Calian, Luisa M Zintgraf, Gregory Farquhar, Hado van Hasselt

Gradient-based bilevel optimisation is a powerful technique with applications in hyperparameter optimisation, task adaptation, algorithm discovery, meta-learning more broadly, and beyond. It often requires differentiating through the gradient-based optimisation itself, leading to “gradient-of-a-gradient” calculations with computationally expensive second-order and mixed derivatives. While modern automatic differentiation libraries provide a convenient way to write programs for calculating these derivatives, they oftentimes cannot fully exploit the specific structure of these problems out-of-the-box, leading to suboptimal performance. In this paper, we analyse such cases and propose Mixed-Flow Meta-Gradients, or MixFlow-MG – a practical algorithm that uses mixed-mode differentiation to construct more efficient and scalable computational graphs yielding over 10x memory and up to 25% wall-clock time improvements over standard implementations in modern meta-learning setups.



Article 907

Title@2025-06-09 (1): Distinguishing Cause from Effect with Causal Velocity Models

Title: Distinguishing Cause from Effect with Causal Velocity Models Bezeichnende Ursache von Wirkung mit kausalen Velocity-Modellen 区别因与因果速率模型的影响的原因 2502.05122v2

Authors (4): Johnny Xi, Hugh Dance, Peter Orbanz, Benjamin Bloem-Reddy

Bivariate structural causal models (SCM) are often used to infer causal direction by examining their goodness-of-fit under restricted model classes. In this paper, we describe a parametrization of bivariate SCMs in terms of a causal velocity by viewing the cause variable as time in a dynamical system. The velocity implicitly defines counterfactual curves via the solution of initial value problems where the observation specifies the initial condition. Using tools from measure transport, we obtain a unique correspondence between SCMs and the score function of the generated distribution via its causal velocity. Based on this, we derive an objective function that directly regresses the velocity against the score function, the latter of which can be estimated non-parametrically from observational data. We use this to develop a method for bivariate causal discovery that extends beyond known model classes such as additive or location scale noise, and that requires no assumptions on the noise distributions. When the score is estimated well, the objective is also useful for detecting model non-identifiability and misspecification. We present positive results in simulation and benchmark experiments where many existing methods fail, and perform ablation studies to examine the method’s sensitivity to accurate score estimation.



Article 908

Title@2025-06-09 (1): A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation

Title: A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation Eine umfassende Studie von Decoder-Only LLMs für die Text-to-Image-Generierung 综合研究 “ 文字到图像生成的开发者 – – 唯一 “ 有限LLMs “ 综合研究 2506.08210v1

Authors (5): Andrew Z. Wang, Songwei Ge, Tero Karras, Ming-Yu Liu, Yogesh Balaji

Both text-to-image generation and large language models (LLMs) have made significant advancements. However, many text-to-image models still employ the somewhat outdated T5 and CLIP as their text encoders. In this work, we investigate the effectiveness of using modern decoder-only LLMs as text encoders for text-to-image diffusion models. We build a standardized training and evaluation pipeline that allows us to isolate and evaluate the effect of different text embeddings. We train a total of 27 text-to-image models with 12 different text encoders to analyze the critical aspects of LLMs that could impact text-to-image generation, including approaches for extracting embeddings, different LLM variants, and model sizes. Our experiments reveal that the de facto way of using last-layer embeddings as conditioning leads to inferior performance. Instead, we explore embeddings from various layers and find that using layer-normalized averaging across all layers significantly improves alignment with complex prompts. Most LLMs with this conditioning outperform the baseline T5 model, showing enhanced performance in advanced visio-linguistic reasoning skills.



Article 909

Title@2025-06-09 (1): A Machine Learning Approach to Generate Residual Stress Distributions using Sparse Characterization Data in Friction-Stir Processed Parts

Title: A Machine Learning Approach to Generate Residual Stress Distributions using Sparse Characterization Data in Friction-Stir Processed Parts Ein maschineller Lernansatz zur Generierung residualer Stressverteilungen mit Sparse Charakterisierungsdaten in Friction-Stir Prozessteilen 利用摩擦-标准处理部件中的扭曲特征数据生成残余应激反应分布的机械学习方法 2506.08205v1

Authors (3): Shadab Anwar Shaikh, Kranthi Balusu, Ayoub Soulami

Residual stresses, which remain within a component after processing, can deteriorate performance. Accurately determining their full-field distributions is essential for optimizing structural integrity and longevity. However, the experimental effort required for full-field characterization is impractical. Given these challenges, this work proposes a machine learning (ML) based Residual Stress Generator (RSG) to infer full-field stresses from limited measurements. An extensive dataset was initially constructed by performing numerous process simulations with a diverse parameter set. An ML model based on the U-Net architecture was then trained to learn the underlying structure through systematic hyperparameter tuning. Then, the model’s ability to generate simulated stresses was evaluated, and it was ultimately tested on actual characterization data to validate its effectiveness. The model’s prediction of simulated stresses shows that it achieved excellent predictive accuracy and exhibited a significant degree of generalization, indicating that it successfully learnt the latent structure of residual stress distribution. The RSG’s performance in predicting experimentally characterized data highlights the feasibility of the proposed approach in providing a comprehensive understanding of residual stress distributions from limited measurements, thereby significantly reducing experimental efforts.



Article 910

Title@2025-06-09 (1): Correlated Noise Mechanisms for Differentially Private Learning

Title: Correlated Noise Mechanisms for Differentially Private Learning Correlated Noise Mechanisms for Differentially Private Learning 差别私人学习相关噪音机制 2506.08201v1

Authors (12): Krishna Pillutla, Jalaj Upadhyay, Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, Arun Ganesh, Monika Henzinger, Jonathan Katz, Ryan McKenna, H. Brendan McMahan, Keith Rush, Thomas Steinke, Abhradeep Thakurta

This monograph explores the design and analysis of correlated noise mechanisms for differential privacy (DP), focusing on their application to private training of AI and machine learning models via the core primitive of estimation of weighted prefix sums. While typical DP mechanisms inject independent noise into each step of a stochastic gradient descent (SGD) learning algorithm in order to protect the privacy of the training data, a growing body of recent research demonstrates that introducing (anti-)correlations in the noise can significantly improve privacy-utility trade-offs by carefully canceling out, in subsequent steps, some of the noise added on earlier steps. Such correlated noise mechanisms, known variously as matrix mechanisms, factorization mechanisms, and DP-Follow-the-Regularized-Leader (DP-FTRL) when applied to learning algorithms, have also been influential in practice, with industrial deployment at a global scale.



Article 911

Title@2025-06-09 (1): Tight Lower Bounds and Improved Convergence in Performative Prediction

Title: Tight Lower Bounds and Improved Convergence in Performative Prediction Enge niedrigere Grenzen und verbesserte Konvergenz in der Performativen Vorhersage 性能预测的下下角拉紧和更好趋同 2412.03671v2

Authors (5): Pedram Khorsandi, Rushil Gupta, Mehrnaz Mofakhami, Simon Lacoste-Julien, Gauthier Gidel

Performative prediction is a framework accounting for the shift in the data distribution induced by the prediction of a model deployed in the real world. Ensuring rapid convergence to a stable solution where the data distribution remains the same after the model deployment is crucial, especially in evolving environments. This paper extends the Repeated Risk Minimization (RRM) framework by utilizing historical datasets from previous retraining snapshots, yielding a class of algorithms that we call Affine Risk Minimizers and enabling convergence to a performatively stable point for a broader class of problems. We introduce a new upper bound for methods that use only the final iteration of the dataset and prove for the first time the tightness of both this new bound and the previously existing bounds within the same regime. We also prove that utilizing historical datasets can surpass the lower bound for last-iterate RRM, and empirically observe faster convergence to the stable point on various performative prediction benchmarks. We offer at the same time the first lower bound analysis for RRM within the class of Affine Risk Minimizers, quantifying the potential improvements in convergence speed that could be achieved with other variants in our framework.
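Plain Repeated Risk Minimization can be illustrated with a one-dimensional toy problem. The sketch assumes that deploying a parameter theta shifts the data mean to base_mean + eps * theta and that the loss is squared error, so each retraining step simply returns the mean of the induced distribution. This toy setup and its names are assumptions; it shows last-iterate RRM only, not the Affine Risk Minimizers proposed in the paper.

```python
def rrm(base_mean, eps, theta0=0.0, steps=50):
    """Iterate theta <- argmin_t E_{z ~ D(theta)} (z - t)^2,
    where D(theta) has mean base_mean + eps * theta."""
    theta = theta0
    history = [theta]
    for _ in range(steps):
        # The squared-loss risk minimizer on D(theta) is the mean of D(theta).
        theta = base_mean + eps * theta
        history.append(theta)
    return history

history = rrm(base_mean=1.0, eps=0.5)
stable_point = 1.0 / (1.0 - 0.5)  # fixed point of theta = base_mean + eps * theta
print(history[:4], stable_point)
```

For |eps| < 1 the update is a contraction and the iterates approach the performatively stable point base_mean / (1 - eps) at a geometric rate set by eps.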



Article 912

Title@2025-06-09 (1): Interpreting Agent Behaviors in Reinforcement-Learning-Based Cyber-Battle Simulation Platforms

Title: Interpreting Agent Behaviors in Reinforcement-Learning-Based Cyber-Battle Simulation Platforms Verdolmetschung von Agenten-Behaviors bei verstärkten, Learning-basierten Cyber-Battle-Simulationsplattformen 强化-以学习为基础的网络战模拟平台中解释代理行为行为 2506.08192v1

Authors (5): Jared Claypoole, Steven Cheung, Ashish Gehani, Vinod Yegneswaran, Ahmad Ridley

We analyze two open source deep reinforcement learning agents submitted to the CAGE Challenge 2 cyber defense challenge, where each competitor submitted an agent to defend a simulated network against each of several provided rules-based attack agents. We demonstrate that one can gain interpretability of agent successes and failures by simplifying the complex state and action spaces and by tracking important events, shedding light on the fine-grained behavior of both the defense and attack agents in each experimental scenario. By analyzing important events within an evaluation episode, we identify patterns in infiltration and clearing events that tell us how well the attacker and defender played their respective roles; for example, defenders were generally able to clear infiltrations within one or two timesteps of a host being exploited. By examining transitions in the environment’s state caused by the various possible actions, we determine which actions tended to be effective and which did not, showing that certain important actions are between 40% and 99% ineffective. We examine how decoy services affect exploit success, concluding for instance that decoys block up to 94% of exploits that would directly grant privileged access to a host. Finally, we discuss the realism of the challenge and ways that the CAGE Challenge 4 has addressed some of our concerns.



Article 913

Title@2025-06-09 (1): Adapting Prediction Sets to Distribution Shifts Without Labels

Title: Adapting Prediction Sets to Distribution Shifts Without Labels Anpassung von Vorhersagen an Verteilungsverschiebungen ohne Etiketten 调整预测集以适应无标签的分布转换 2406.01416v2

Authors (4): Kevin Kasa, Zhiyu Zhang, Heng Yang, Graham W. Taylor

Recently there has been a surge of interest in deploying confidence set predictions rather than point predictions in machine learning. Unfortunately, the effectiveness of such prediction sets is frequently impaired by distribution shifts in practice, and the challenge is often compounded by the lack of ground truth labels at test time. Focusing on a standard set-valued prediction framework called conformal prediction (CP), this paper studies how to improve its practical performance using only unlabeled data from the shifted test domain. This is achieved by two new methods called ECP and EACP, whose main idea is to adjust the score function in CP according to its base model’s own uncertainty evaluation. Through extensive experiments on a number of large-scale datasets and neural network architectures, we show that our methods provide consistent improvement over existing baselines and nearly match the performance of fully supervised methods.
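As background for the CP framework discussed above, the sketch below shows standard split conformal prediction for classification, using 1 minus the predicted class probability as the nonconformity score. It is the unadjusted baseline only; the ECP and EACP score adjustments are not reproduced here, and the function names and numbers are assumptions.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """The ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    Returns infinity when the calibration set is too small for alpha."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")
    return sorted(cal_scores)[rank - 1]

def prediction_set(class_probs, threshold):
    """Keep every label whose score 1 - p does not exceed the threshold."""
    return [label for label, p in enumerate(class_probs) if 1.0 - p <= threshold]

# Hypothetical calibration scores (1 - probability of the true label).
cal_scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.9]
q = conformal_threshold(cal_scores, alpha=0.25)
print(q, prediction_set([0.5, 0.45, 0.05], q))
```

The coverage guarantee of this baseline relies on calibration and test data being exchangeable, which is the assumption that distribution shift breaks.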



Article 914

Title@2025-06-09 (1): Innate-Values-driven Reinforcement Learning based Cognitive Modeling

Title: Innate-Values-driven Reinforcement Learning based Cognitive Modeling Angeborene Werte-getriebene Verstärkungs-Lernen auf Kognitive Modellierung 基于认知模型的强化学习模型 2411.09160v2

Authors (1): Qin Yang

Innate values describe agents’ intrinsic motivations, which reflect their inherent interests and preferences for pursuing goals and drive them to develop diverse skills that satisfy their various needs. Traditional reinforcement learning (RL) learns from interaction based on the feedback rewards of the environment. However, in real scenarios, the rewards are generated by agents’ innate value systems, which differ vastly across individuals based on their needs and requirements. In other words, considering the AI agent as a self-organizing system, developing its awareness through balancing internal and external utilities based on its needs in different tasks is a crucial problem for individuals learning to support others and integrate into the community with safety and harmony in the long term. To address this gap, we propose a new RL model termed innate-values-driven RL (IVRL), based on combined motivations’ models and expected utility theory, to mimic its complex behaviors in the evolution through decision-making and learning. Then, we introduce two IVRL-based models: IV-DQN and IV-A2C. By comparing them with benchmark algorithms such as DQN, DDQN, A2C, and PPO in the Role-Playing Game (RPG) reinforcement learning test platform VIZDoom, we demonstrate that the IVRL-based models can help the agent rationally organize various needs and achieve better performance effectively.



Article 915

Title@2025-06-09 (1): Innate-Values-driven Reinforcement Learning based Cooperative Multi-Agent Cognitive Modeling

Title: Innate-Values-driven Reinforcement Learning based Cooperative Multi-Agent Cognitive Modeling Angeborene Werte-getriebene Verstärkung Learning basierte kooperative Multi-Agent Kognitive Modellierung 以基于强化的学习为基础的合作多代理共认型建模 2401.05572v2

Authors (1): Qin Yang

In multi-agent systems (MAS), the dynamic interaction among multiple decision-makers is driven by their innate values, affecting the environment’s state, and can cause specific behavioral patterns to emerge. On the other hand, innate values in cognitive modeling reflect individual interests and preferences for specific tasks and drive them to develop diverse skills and plans, satisfying their various needs and achieving common goals in cooperation. Therefore, building the awareness of AI agents to balance the group utilities and system costs and meet group members’ needs in their cooperation is a crucial problem for individuals learning to support their community and even integrate into human society in the long term. However, the current MAS reinforcement learning domain lacks a general intrinsic model to describe agents’ dynamic motivation for decision-making and learning from an individual needs perspective in their cooperation. To address the gap, this paper proposes a general MAS innate-values reinforcement learning (IVRL) architecture from the individual preferences angle. We tested the Multi-Agent IVRL Actor-Critic Model in different StarCraft Multi-Agent Challenge (SMAC) settings, which demonstrated its potential to organize the group’s behaviours to achieve better performance.



Article 916

Title@2025-06-09 (1): FedGA-Tree: Federated Decision Tree using Genetic Algorithm

Title: FedGA-Tree: Federated Decision Tree using Genetic Algorithm FedGA-Baum: Föderierter Entscheidungsbaum mit genetischem Algorithmus FedGA-Tree:使用遗传算法的联邦决策树 2506.08176v1

Authors (2): Anh V Nguyen, Diego Klabjan

In recent years, with rising concerns for data privacy, Federated Learning has gained prominence, as it enables collaborative training without the aggregation of raw data from participating clients. However, much of the current focus has been on parametric gradient-based models, while nonparametric counterparts such as decision trees are relatively understudied. Existing methods for adapting decision trees to Federated Learning generally combine a greedy tree-building algorithm with differential privacy to produce a global model for all clients. These methods are limited to classification trees and categorical data due to the constraints of differential privacy. In this paper, we explore an alternative approach that uses a genetic algorithm to construct personalized decision trees and to accommodate both categorical and numerical data, thus allowing for both classification and regression trees. Comprehensive experiments demonstrate that our method surpasses decision trees trained solely on local data and a benchmark algorithm.



Article 917

Title@2025-06-09 (1): xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Title: xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs xGen-MM-Vid (BLIP-3-Video): Sie benötigen nur 32 Zeichen, um ein Video auch in VLMs darzustellen xGen-MM-Vid (BLIP-3-Video):您只需32个调子即可在 VLMs 中代表视频 2410.16267v2

Authors (12): Michael S. Ryoo, Honglu Zhou, Shrikant Kendre, Can Qin, Le Xue, Manli Shu, Jongwoo Park, Kanchana Ranasinghe, Silvio Savarese, Ran Xu, Caiming Xiong, Juan Carlos Niebles

We present xGen-MM-Vid (BLIP-3-Video): a multimodal language model for videos, particularly designed to efficiently capture temporal information over multiple frames. In addition to the conventional visual tokenizer, BLIP-3-Video uses a ‘temporal encoder’ that maps a sequence of tokens over multiple frames into a compact set of visual tokens. This enables BLIP-3-Video to use far fewer visual tokens than competing models (e.g., 32 vs. 4608 tokens). We explore different types of temporal encoders, including learnable spatio-temporal pooling as well as sequential models like Token Turing Machines. We experimentally confirm that BLIP-3-Video obtains video question-answering accuracies comparable to much larger state-of-the-art models (e.g., 34B), while being much smaller (i.e., 4B) and more efficient by using fewer visual tokens. The project website is at https://www.salesforceairesearch.com/opensource/xGen-MM-Vid/index.html



Article 918

Title@2025-06-09 (1): Grouped Discrete Representation for Object-Centric Learning

Title: Grouped Discrete Representation for Object-Centric Learning Gruppierte diskrete Darstellung für Objekt-Zentrisches Lernen 用于对象中心学习的分组差异代表 2411.02299v2

Authors (4): Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen

Object-Centric Learning (OCL) aims to discover objects in images or videos by reconstructing the input. Representative methods achieve this by reconstructing the input as its Variational Autoencoder (VAE) discrete representations, which suppress (super-)pixel noise and enhance object separability. However, these methods treat features as indivisible units, overlooking their compositional attributes, and discretize features via scalar code indexes, losing attribute-level similarities and differences. We propose Grouped Discrete Representation (GDR) for OCL. For better generalization, features are decomposed into combinatorial attributes by organized channel grouping. For better convergence, features are quantized into discrete representations via tuple code indexes. Experiments demonstrate that GDR consistently improves both mainstream and state-of-the-art OCL methods across various datasets. Visualizations further highlight GDR’s superior object separability and interpretability. The source code is available on https://github.com/Genera1Z/GroupedDiscreteRepresentation.



Article 919

Title@2025-06-09 (1): Federated Learning on Stochastic Neural Networks

Title: Federated Learning on Stochastic Neural Networks Föderiertes Lernen in stochastischen Neuronalen Netzwerken 存储神经神经网络方面的联邦学习 2506.08169v1

Authors (4): Jingqiao Tang, Ryan Bausback, Feng Bao, Richard Archibald

Federated learning is a machine learning paradigm that leverages edge computing on client devices to optimize models while maintaining user privacy by ensuring that local data remains on the device. However, since all data is collected by clients, federated learning is susceptible to latent noise in local datasets. Factors such as limited measurement capabilities or human errors may introduce inaccuracies in client data. To address this challenge, we propose the use of a stochastic neural network as the local model within the federated learning framework. Stochastic neural networks not only facilitate the estimation of the true underlying states of the data but also enable the quantification of latent noise. We refer to our federated learning approach, which incorporates stochastic neural networks as local models, as Federated stochastic neural networks. We will present numerical experiments demonstrating the performance and effectiveness of our method, particularly in handling non-independent and identically distributed data.



Article 920

Title@2025-06-09 (1): SILK: Smooth InterpoLation frameworK for motion in-betweening A Simplified Computational Approach

Title: SILK: Smooth InterpoLation frameworK for motion in-betweening A Simplified Computational Approach SILK: Smooth InterpoLation frameworK für Bewegung zwischen einem vereinfachten Computational Approach SILK: 在简化计算方法之间运动的平滑间插框架work 2506.09075v1

Authors (4): Elly Akhoundi, Hung Yu Ling, Anup Anand Deshmukh, Judith Butepage

Motion in-betweening is a crucial tool for animators, enabling intricate control over pose-level details in each keyframe. Recent machine learning solutions for motion in-betweening rely on complex models, incorporating skeleton-aware architectures or requiring multiple modules and training steps. In this work, we introduce a simple yet effective Transformer-based framework, employing a single Transformer encoder to synthesize realistic motions for motion in-betweening tasks. We find that data modeling choices play a significant role in improving in-betweening performance. Among others, we show that increasing data volume can yield equivalent or improved motion transitions, that the choice of pose representation is vital for achieving high-quality results, and that incorporating velocity input features enhances animation performance. These findings challenge the assumption that model complexity is the primary determinant of animation quality and provide insights into a more data-centric approach to motion interpolation. Additional videos and supplementary material are available at https://silk-paper.github.io.



Article 921

Title@2025-06-09 (1): UniVarFL: Uniformity and Variance Regularized Federated Learning for Heterogeneous Data

Title: UniVarFL: Uniformity and Variance Regularized Federated Learning for Heterogeneous Data UniVarFL: Uniformität und Varianz Regularisiertes Federated Learning für heterogene Daten UUVARFL: 统一和差异校正的联邦异种数据学习 2506.08167v1

Authors (3): Sunny Gupta, Nikita Jangid, Amit Sethi

Federated Learning (FL) often suffers from severe performance degradation when faced with non-IID data, largely due to local classifier bias. Traditional remedies such as global model regularization or layer freezing either incur high computational costs or struggle to adapt to feature shifts. In this work, we propose UniVarFL, a novel FL framework that emulates IID-like training dynamics directly at the client level, eliminating the need for global model dependency. UniVarFL leverages two complementary regularization strategies during local training: Classifier Variance Regularization, which aligns class-wise probability distributions with those expected under IID conditions, effectively mitigating local classifier bias; and Hyperspherical Uniformity Regularization, which encourages a uniform distribution of feature representations across the hypersphere, thereby enhancing the model’s ability to generalize under diverse data distributions. Extensive experiments on multiple benchmark datasets demonstrate that UniVarFL outperforms existing methods in accuracy, highlighting its potential as a highly scalable and efficient solution for real-world FL deployments, especially in resource-constrained settings. Code: https://github.com/sunnyinAI/UniVarFL



Article 922

Title@2025-06-09 (1): BLUR: A Bi-Level Optimization Approach for LLM Unlearning

Title: BLUR: A Bi-Level Optimization Approach for LLM Unlearning BLUR: Ein Bi-Level-Optimierungsansatz für LLM-Unlearning BLUR: LLM 重新学习的双级优化办法 2506.08164v1

Authors (9): Hadi Reisizadeh, Jinghan Jia, Zhiqi Bu, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Sijia Liu, Mingyi Hong

Enabling large language models (LLMs) to unlearn knowledge and capabilities acquired during training has proven vital for ensuring compliance with data regulations and promoting ethical practices in generative AI. Although there is growing interest in developing unlearning algorithms, it remains unclear how to best formulate the unlearning problem. The most popular formulation uses a weighted sum of forget and retain losses, but it often leads to performance degradation due to the inherent trade-off between the two. In this work, we argue that it is important to model the hierarchical structure of the unlearning problem, where the forget problem (which unlearns certain knowledge and/or capabilities) takes priority over the retain problem (which preserves model utility). This hierarchical structure naturally leads to a bi-level optimization formulation where the lower-level objective focuses on minimizing the forget loss, while the upper-level objective aims to maintain the model’s utility. Based on this new formulation, we propose a novel algorithm, termed Bi-Level UnleaRning (BLUR), which not only possesses strong theoretical guarantees but, more importantly, delivers superior performance. In particular, our extensive experiments demonstrate that BLUR consistently outperforms all the state-of-the-art algorithms across various unlearning tasks, models, and metrics. Code is available at https://github.com/OptimAI-Lab/BLURLLMUnlearning.



Article 923

Title@2025-06-09 (1): A Metrics-Oriented Architectural Model to Characterize Complexity on Machine Learning-Enabled Systems

Title: A Metrics-Oriented Architectural Model to Characterize Complexity on Machine Learning-Enabled Systems Ein metrisch ausgerichtetes architektonisches Modell zur Charakterisierung von Komplexität auf maschinell lernfähigen Systemen 以计量为主的建筑建筑模型,以明确机械学习系统的复杂性 2506.08153v1

Authors (1): Renato Cordeiro Ferreira

How can the complexity of ML-enabled systems be managed effectively? The goal of this research is to investigate how complexity affects ML-Enabled Systems (MLES). To address this question, this research aims to introduce a metrics-based architectural model to characterize the complexity of MLES. The goal is to support architectural decisions, providing a guideline for the inception and growth of these systems. This paper showcases the first step for creating the metrics-based architectural model: an extension of a reference architecture that can describe MLES to collect their metrics.



Article 924

Title@2025-06-09 (1): Towards a Mechanistic Explanation of Diffusion Model Generalization

Title: Towards a Mechanistic Explanation of Diffusion Model Generalization Auf dem Weg zu einer mechanistischen Erklärung der Verallgemeinerung des Diffusionsmodells 争取对传播模型通用化进行机械解释 2411.19339v3

Authors (4): Matthew Niedoba, Berend Zwartsenberg, Kevin Murphy, Frank Wood

We propose a simple, training-free mechanism which explains the generalization behaviour of diffusion models. By comparing pre-trained diffusion models to their theoretically optimal empirical counterparts, we identify a shared local inductive bias across a variety of network architectures. From this observation, we hypothesize that network denoisers generalize through localized denoising operations, as these operations approximate the training objective well over much of the training distribution. To validate our hypothesis, we introduce novel denoising algorithms which aggregate local empirical denoisers to replicate network behaviour. Comparing these algorithms to network denoisers across forward and reverse diffusion processes, our approach exhibits consistent visual similarity to neural network outputs, with lower mean squared error than previously proposed methods.



Article 925

Title@2025-06-09 (1): Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models

Title: Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models Mehrsprachige Hate Speech Detection in sozialen Medien mit translationsbasierten Ansätzen mit großen Sprachmodellen 在社会媒体中使用基于翻译的方法和大语言模式的多语言社会媒体多语言仇恨言论探测 2506.08147v1

Authors (6): Muhammad Usman, Muhammad Ahmad, M. Shahiki Tash, Irina Gelbukh, Rolando Quintero Tellez, Grigori Sidorov

Social media platforms are critical spaces for public discourse, shaping opinions and community dynamics, yet their widespread use has amplified harmful content, particularly hate speech, threatening online safety and inclusivity. While hate speech detection has been extensively studied in languages like English and Spanish, Urdu remains underexplored, especially using translation-based approaches. To address this gap, we introduce a trilingual dataset of 10,193 tweets in English (3,834 samples), Urdu (3,197 samples), and Spanish (3,162 samples), collected via keyword filtering, with a balanced distribution of 4,849 Hateful and 5,344 Not-Hateful labels. Our methodology leverages attention layers as a precursor to transformer-based models and large language models (LLMs), enhancing feature extraction for multilingual hate speech detection. For non-transformer models, we use TF-IDF for feature extraction. The dataset is benchmarked using state-of-the-art models, including GPT-3.5 Turbo and Qwen 2.5 72B, alongside traditional machine learning models like SVM and other transformers (e.g., BERT, RoBERTa). Three annotators, following rigorous guidelines, ensured high dataset quality, achieving a Fleiss’ Kappa of 0.821. Our approach, integrating attention layers with GPT-3.5 Turbo and Qwen 2.5 72B, achieves strong performance, with macro F1 scores of 0.87 for English (GPT-3.5 Turbo), 0.85 for Spanish (GPT-3.5 Turbo), 0.81 for Urdu (Qwen 2.5 72B), and 0.88 for the joint multilingual model (Qwen 2.5 72B). These results reflect improvements of 8.75% in English (over SVM baseline 0.80), 8.97% in Spanish (over SVM baseline 0.78), 5.19% in Urdu (over SVM baseline 0.77), and 7.32% in the joint multilingual model (over SVM baseline 0.82). Our framework offers a robust solution for multilingual hate speech detection, fostering safer digital communities worldwide.
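The relative improvements quoted in the abstract above can be checked directly from the reported macro F1 scores and SVM baselines. The sketch below recomputes them. The helper name `relative_improvement` is an illustrative assumption, and the numbers are copied from the abstract.

```python
def relative_improvement(score, baseline):
    """Relative gain of score over baseline, in percent."""
    return 100.0 * (score - baseline) / baseline


# (best macro F1, SVM baseline macro F1) as reported in the abstract.
reported = {
    "English": (0.87, 0.80),
    "Spanish": (0.85, 0.78),
    "Urdu": (0.81, 0.77),
    "Joint multilingual": (0.88, 0.82),
}

for setting, (f1, svm_f1) in reported.items():
    gain = relative_improvement(f1, svm_f1)
    print(f"{setting}: {gain:.2f}% over the SVM baseline")
```

These are relative gains over the baseline score. The absolute differences in macro F1 are smaller, between 0.04 and 0.07.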



Article 926

Title@2025-06-09 (1): A Theory of Machine Understanding via the Minimum Description Length Principle

Title: A Theory of Machine Understanding via the Minimum Description Length Principle Eine Theorie des Maschinenverständnisses über das Prinzip der Mindestbeschreibungslänge 通过最低描述长度原则理解机器理论 2504.00395v3

Authors (2): Canlin Zhang, Xiuwen Liu

Deep neural networks trained through end-to-end learning have achieved remarkable success across various domains in the past decade. However, the end-to-end learning strategy, originally designed to minimize predictive loss in a black-box manner, faces two fundamental limitations: the struggle to form explainable representations in a self-supervised manner, and the inability to compress information rigorously following the Minimum Description Length (MDL) principle. These two limitations point to a deeper issue: an end-to-end learning model is not able to “understand” what it learns. In this paper, we establish a novel theory connecting these two limitations. We design the Spectrum VAE, a novel deep learning architecture whose minimum description length (MDL) can be rigorously evaluated. Then, we introduce the concept of latent dimension combinations, or what we term spiking patterns, and demonstrate that the observed spiking patterns should be as few as possible based on the training data in order for the Spectrum VAE to achieve the MDL. Finally, our theory demonstrates that when the MDL is achieved with respect to the given data distribution, the Spectrum VAE will naturally produce explainable latent representations of the data. In other words, explainable representations–or “understanding”–can emerge in a self-supervised manner simply by making the deep network obey the MDL principle. In our opinion, this also implies a deeper insight: To understand is to compress. At its core, our theory advocates for a shift in the training objective of deep networks: not only to minimize predictive loss, but also to minimize the description length regarding the given data. That is, a deep network should not only learn, but also understand what it learns. This work is entirely theoretical and aims to inspire future research toward self-supervised, explainable AI grounded in the MDL principle.



Article 927

Title@2025-06-09 (1): Fully data-driven inverse hyperelasticity with hyper-network neural ODE fields

Title: Fully data-driven inverse hyperelasticity with hyper-network neural ODE fields Vollständig datengetriebene inverse Hyperelastizität mit hyper-network neuronalen ODE-Feldern 由全数据驱动的全数据驱动的超反超弹性,具有超网络神经极极光字段 2506.08146v1

Authors (6): Vahidullah Taç, Amirhossein Amiri-Hezaveh, Manuel K. Rausch, Grace N. Bechtel, Francisco Sahli Costabal, Adrian Buganza Tepole

We propose a new framework for identifying mechanical properties of heterogeneous materials without a closed-form constitutive equation. Given a full-field measurement of the displacement field, for instance as obtained from digital image correlation (DIC), a continuous approximation of the strain field is obtained by training a neural network that incorporates Fourier features to effectively capture sharp gradients in the data. A physics-based data-driven method built upon neural ordinary differential equations (NODEs) is employed to discover constitutive equations. The NODE framework can represent arbitrary materials while satisfying constraints in the theory of constitutive equations by default. To account for heterogeneity, a hyper-network is defined, where the input is the material coordinate system, and the output is the NODE-based constitutive equation. The parameters of the hyper-network are optimized by minimizing a multi-objective loss function that includes penalty terms for violations of the strong form of the equilibrium equations of elasticity and the associated Neumann boundary conditions. We showcase the framework with several numerical examples, including heterogeneity arising from variations in material parameters, spatial transitions from isotropy to anisotropy, material identification in the presence of noise, and, ultimately, application to experimental data. As the numerical results suggest, the proposed approach is robust and general in identifying the mechanical properties of heterogeneous materials with very few assumptions, making it a suitable alternative to classical inverse methods.



Article 928

Title@2025-06-09 (1): Accelerating Spectral Clustering under Fairness Constraints

Title: Accelerating Spectral Clustering under Fairness Constraints Beschleunigung des Spektralclustering unter Fairness-Beschränkungen 加速在公平制约下聚居 2506.08143v1

Authors (4): Francesco Tonin, Alex Lambert, Johan A. K. Suykens, Volkan Cevher

Fairness of decision-making algorithms is an increasingly important issue. In this paper, we focus on spectral clustering with group fairness constraints, where every demographic group is represented in each cluster proportionally as in the general population. We present a new efficient method for fair spectral clustering (Fair SC) by casting the Fair SC problem within the difference of convex functions (DC) framework. To this end, we introduce a novel variable augmentation strategy and employ an alternating direction method of multipliers type of algorithm adapted to DC problems. We show that each associated subproblem can be solved efficiently, resulting in higher computational efficiency compared to prior work, which required a computationally expensive eigendecomposition. Numerical experiments demonstrate the effectiveness of our approach on both synthetic and real-world benchmarks, showing significant speedups in computation time over prior art, especially as the problem size grows. This work thus represents a considerable step forward towards the adoption of fair clustering in real-world applications.



Article 929

Title@2025-06-09 (1): AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists

Title: AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists AutoSDT: Scaling Data-Driven Discovery Aufgaben gegenüber offenen Mitwissenschaftlern AutoSDD: 扩大数据驱动发现任务, 以建立开放的共同科学家 2506.08140v1

Authors (19): Yifei Li, Hanane Nour Moussa, Ziru Chen, Shijie Chen, Botao Yu, Mingyi Xue, Benjamin Burns, Tzu-Yao Chiu, Vishal Dey, Zitong Lu, Chen Wei, Qianheng Zhang, Tianyu Zhang, Song Gao, Xuhui Huang, Xia Ning, Nesreen K. Ahmed, Ali Payani, Huan Sun

Despite long-standing efforts in accelerating scientific discovery with AI, building AI co-scientists remains challenging due to limited high-quality data for training and evaluation. To tackle this data scarcity issue, we present AutoSDT, an automatic pipeline that collects high-quality coding tasks in real-world data-driven discovery workflows. AutoSDT leverages the coding capabilities and parametric knowledge of LLMs to search for diverse sources, select ecologically valid tasks, and synthesize accurate task instructions and code solutions. Using our pipeline, we construct AutoSDT-5K, a dataset of 5,404 coding tasks for data-driven discovery that covers four scientific disciplines and 756 unique Python packages. To the best of our knowledge, AutoSDT-5K is the only automatically collected and the largest open dataset for data-driven scientific discovery. Expert feedback on a subset of 256 tasks shows the effectiveness of AutoSDT: 93% of the collected tasks are ecologically valid, and 92.2% of the synthesized programs are functionally correct. Trained on AutoSDT-5K, the Qwen2.5-Coder-Instruct LLM series, dubbed AutoSDT-Coder, show substantial improvement on two challenging data-driven discovery benchmarks, ScienceAgentBench and DiscoveryBench. Most notably, AutoSDT-Coder-32B reaches the same level of performance as GPT-4o on ScienceAgentBench with a success rate of 7.8%, doubling the performance of its base model. On DiscoveryBench, it lifts the hypothesis matching score to 8.1, bringing a 17.4% relative improvement and closing the gap between open-weight models and GPT-4o.



Article 930

Title@2025-06-09 (1): Nearness of Neighbors Attention for Regression in Supervised Finetuning

Title: Nearness of Neighbors Attention for Regression in Supervised Finetuning Nähe der Nachbarn Achtung vor der Regression in überwachter Finetuning 邻近邻居的近邻 在监督的微调中注意后退 2506.08139v1

Authors (3): Aviad Susman, Mayte Suárez-Fariñas, Joseph T Colonel

It is common in supervised machine learning to combine the feature extraction capabilities of neural networks with the predictive power of traditional algorithms, such as k-nearest neighbors (k-NN) or support vector machines. This procedure involves performing supervised fine-tuning (SFT) on a domain-appropriate feature extractor, followed by training a traditional predictor on the resulting SFT embeddings. When used in this manner, traditional predictors often deliver increased performance over the SFT model itself, despite the fine-tuned feature extractor yielding embeddings specifically optimized for prediction by the neural network’s final dense layer. This suggests that directly incorporating traditional algorithms into SFT as prediction layers may further improve performance. However, many traditional algorithms have not been implemented as neural network layers due to their non-differentiable nature and their unique optimization requirements. As a step towards solving this problem, we introduce the Nearness of Neighbors Attention (NONA) regression layer. NONA uses the mechanics of neural network attention and a novel learned attention-masking scheme to yield a differentiable proxy of the k-NN regression algorithm. Results on multiple unstructured datasets show improved performance over both dense layer prediction and k-NN on SFT embeddings for regression.
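To illustrate the idea of a differentiable proxy for k-NN regression, here is a minimal sketch of attention-style soft nearest-neighbor regression. It is not the NONA layer itself: the learned attention-masking scheme from the abstract is omitted, and the function name, the squared-distance logits, and the `temperature` parameter are illustrative assumptions.

```python
import math


def soft_knn_predict(query, keys, targets, temperature=1.0):
    """Predict a target as a softmax-weighted average of neighbor targets."""
    # Attention logits: negative squared Euclidean distance to each stored
    # embedding, scaled by the temperature.
    logits = [
        -sum((q - k) ** 2 for q, k in zip(query, key)) / temperature
        for key in keys
    ]
    # Numerically stable softmax over the neighbors.
    top = max(logits)
    weights = [math.exp(logit - top) for logit in logits]
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, targets)) / total


# Hypothetical one-dimensional embeddings with regression targets.
keys = [[0.0], [2.0]]
targets = [1.0, 3.0]

midpoint = soft_knn_predict([1.0], keys, targets)
near_first = soft_knn_predict([0.1], keys, targets, temperature=0.01)
print(midpoint, near_first)
```

Because every step is a smooth function of the embeddings, gradients can flow back into the feature extractor, which a hard k-NN lookup does not allow. A small temperature makes the prediction approach that of the single nearest neighbor.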



Article 931

Title@2025-06-09 (1): Learning Efficient Representations of Neutrino Telescope Events

Title: Learning Efficient Representations of Neutrino Telescope Events Effiziente Darstellungen von Neutrino-Teleskop-Veranstaltungen lernen 中子望远镜活动学习效率高的代表性 2410.13148v2

Authors (3): Felix J. Yu, Nicholas Kamp, Carlos A. Argüelles

Neutrino telescopes detect rare interactions of particles produced in some of the most extreme environments in the Universe. This is accomplished by instrumenting a cubic-kilometer volume of naturally occurring transparent medium with light sensors. Given their substantial size and the high frequency of background interactions, these telescopes amass an enormous quantity of high-variance, high-dimensional data. These attributes create substantial challenges for analyzing and reconstructing interactions, particularly when utilizing machine learning (ML) techniques. In this paper, we present a novel approach, called om2vec, that employs transformer-based variational autoencoders to efficiently represent neutrino telescope events by learning compact and descriptive latent representations. We demonstrate that these latent representations offer enhanced flexibility and improved computational efficiency, thereby facilitating downstream tasks in data analysis.



Article 932

Title@2025-06-09 (1): UnCLe: Benchmarking Unsupervised Continual Learning for Depth Completion

Title: UnCLe: Benchmarking Unsupervised Continual Learning for Depth Completion UnCLe: Benchmarking unüberwachtes kontinuierliches Lernen für die Tiefenvervollständigung 无CLLe:为深度完成确定不受监督的连续学习基准 2410.18074v4

Authors (6): Xien Chen, Rit Gangopadhyay, Michael Chu, Patrick Rim, Hyoungseob Park, Alex Wong

We propose UnCLe, the first standardized benchmark for Unsupervised Continual Learning of a multimodal 3D reconstruction task: depth completion, which aims to infer a dense depth map from a synchronized pair of an RGB image and a sparse depth map. We benchmark depth completion models under the practical scenario of unsupervised learning over continuous streams of data. While unsupervised learning of depth offers the possibility of continual learning of novel data distributions over time, existing methods are typically trained on a static, or stationary, dataset. However, when adapting to novel non-stationary distributions, they “catastrophically forget” previously learned information. UnCLe simulates these non-stationary distributions by adapting depth completion models to sequences of datasets containing diverse scenes captured from distinct domains using different visual and range sensors. We adopt representative methods from continual learning paradigms and translate them to enable unsupervised continual learning of depth completion. We benchmark these models across indoor and outdoor environments, and investigate the degree of catastrophic forgetting through standard quantitative metrics. We find that unsupervised continual learning of depth completion is an open problem, and we invite researchers to leverage UnCLe as a development platform.



Article 933

Title@2025-06-09 (1): Directed Exploration in Reinforcement Learning from Linear Temporal Logic

Title: Directed Exploration in Reinforcement Learning from Linear Temporal Logic Direkte Exploration bei der Verstärkung Lernen von linearen Temporal Logic 直接探索从线上时间逻辑中加强学习 2408.09495v2

Authors (3): Marco Bagatella, Andreas Krause, Georg Martius

Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning, as it allows describing objectives beyond the expressivity of conventional discounted return formulations. Nonetheless, recent works have shown that LTL formulas can be translated into a variable rewarding and discounting scheme, whose optimization produces a policy maximizing a lower bound on the probability of formula satisfaction. However, the synthesized reward signal remains fundamentally sparse, making exploration challenging. We aim to overcome this limitation, which can prevent current algorithms from scaling beyond low-dimensional, short-horizon problems. We show how better exploration can be achieved by further leveraging the LTL specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process, thus enabling a form of high-level value estimation. By taking a Bayesian perspective over LDBA dynamics and proposing a suitable prior distribution, we show that the values estimated through this procedure can be treated as a shaping potential and mapped to informative intrinsic rewards. Empirically, we demonstrate applications of our method from tabular settings to high-dimensional continuous systems, which have so far represented a significant challenge for LTL-based reinforcement learning algorithms.



Article 934

Title@2025-06-09 (1): Constrained Pareto Set Identification with Bandit Feedback

Title: Constrained Pareto Set Identification with Bandit Feedback Beschränkte Pareto-Set-Identifikation mit Bandit-Feedback 被约束的 Pareto 设置 识别强盗反馈 2506.08127v1

Authors (3): Cyrille Kone, Emilie Kaufmann, Laura Richert

In this paper, we address the problem of identifying the Pareto Set under feasibility constraints in a multivariate bandit setting. Specifically, given a $K$-armed bandit with unknown means $\mu_1, \dots, \mu_K \in \mathbb{R}^d$, the goal is to identify the set of arms whose mean is not uniformly worse than that of another arm (i.e., not smaller for all objectives), while satisfying some known set of linear constraints, expressing, for example, some minimal performance on each objective. Our focus lies in fixed-confidence identification, for which we introduce an algorithm that significantly outperforms racing-like algorithms and the intuitive two-stage approach that first identifies feasible arms and then their Pareto Set. We further prove an information-theoretic lower bound on the sample complexity of any algorithm for constrained Pareto Set identification, showing that the sample complexity of our approach is near-optimal. Our theoretical results are supported by an extensive empirical evaluation on a series of benchmarks.
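
To make the target set concrete, the following sketch computes a constrained Pareto set when the means are known exactly. It is not the paper's algorithm, which has to learn the means from bandit feedback. Assumptions: the constraints are written as `A @ mu <= b`, dominance means strictly larger on every objective (following the abstract's wording), and the arm means are invented.

```python
def is_feasible(mu, A, b):
    """Check the linear constraints A @ mu <= b, one row at a time."""
    return all(sum(a * m for a, m in zip(row, mu)) <= bound
               for row, bound in zip(A, b))


def dominates(mu_j, mu_i):
    """mu_j dominates mu_i if it is strictly larger on every objective."""
    return all(x > y for x, y in zip(mu_j, mu_i))


def constrained_pareto_set(means, A, b):
    """Indices of feasible arms that no other feasible arm dominates."""
    feasible = [i for i, mu in enumerate(means) if is_feasible(mu, A, b)]
    return [i for i in feasible
            if not any(dominates(means[j], means[i]) for j in feasible if j != i)]


# Five hypothetical arms with two objectives each.
means = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.15, 0.95), (0.95, 0.05)]
# Minimal performance of 0.1 on each objective, written as -mu <= -0.1.
A = [(-1.0, 0.0), (0.0, -1.0)]
b = [-0.1, -0.1]
pareto = constrained_pareto_set(means, A, b)
```

Here arm 4 is infeasible (its second objective is below 0.1) and arm 2 is dominated by arm 1, so the sketch returns arms 0, 1, and 3.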



Article 935

Title@2025-06-09 (1): Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning

Title: Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning Bingo: Effiziente Begründung von LLMs durch dynamisches und auf Bedeutung basierendes Verstärkungslernen 宾果:通过以动态和重要内容为基础的强化学习,提高LLMs高效率的理据能力 2506.08125v1

Authors (8): Hanbing Liu, Lang Cao, Yuanyi Ren, Mengyu Zhou, Haoyu Dong, Xiaojun Ma, Shi Han, Dongmei Zhang

Large language models have demonstrated impressive reasoning capabilities, yet they often suffer from inefficiencies due to unnecessarily verbose or redundant outputs. While many works have explored reinforcement learning (RL) to enhance reasoning abilities, most primarily focus on improving accuracy, with limited attention to reasoning efficiency. Some existing approaches introduce direct length-based rewards to encourage brevity, but this often leads to noticeable drops in accuracy. In this paper, we propose Bingo, an RL framework that advances length-based reward design to boost efficient reasoning. Bingo incorporates two key mechanisms: a significance-aware length reward, which gradually guides the model to reduce only insignificant tokens, and a dynamic length reward, which initially encourages elaborate reasoning for hard questions but decays over time to improve overall efficiency. Experiments across multiple reasoning benchmarks show that Bingo improves both accuracy and efficiency. It outperforms the vanilla reward and several other length-based reward baselines in RL, achieving a favorable trade-off between accuracy and efficiency. These results underscore the potential of training LLMs explicitly for efficient reasoning.



Article 936

Title@2025-06-09 (1): Continuous Policy and Value Iteration for Stochastic Control Problems and Its Convergence

Title: Continuous Policy and Value Iteration for Stochastic Control Problems and Its Convergence Kontinuierliche Politik und Wert Iteration für stochastische Kontrollprobleme und deren Konvergenz 斯托卡控制问题及其趋同的持续政策和价值迭代 2506.08121v1

Authors (2): Qi Feng, Gu Wang

We introduce a continuous policy-value iteration algorithm where the approximations of the value function of a stochastic control problem and the optimal control are simultaneously updated through Langevin-type dynamics. This framework applies to both the entropy-regularized relaxed control problems and the classical control problems, with an infinite horizon. We establish policy improvement and demonstrate convergence to the optimal control under the monotonicity condition of the Hamiltonian. By utilizing Langevin-type stochastic differential equations for continuous updates along the policy iteration direction, our approach enables the use of distribution sampling and non-convex learning techniques from machine learning to optimize the value function and identify the optimal control simultaneously.



Article 937

Title@2025-06-09 (1): Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting

Title: Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting Benchmarking-Modelle der vortrainierten Zeitreihen für die Strompreisprognose 确定电力价格预测培训前时间序列模型的基准 2506.08113v1

Authors (4): Timothée Hornek, Amir Sartipi, Igor Tchappi, Gilbert Fridgen

Accurate electricity price forecasting (EPF) is crucial for effective decision-making in power trading on the spot market. While recent advances in generative artificial intelligence (GenAI) and pre-trained large language models (LLMs) have inspired the development of numerous time series foundation models (TSFMs) for time series forecasting, their effectiveness in EPF remains uncertain. To address this gap, we benchmark several state-of-the-art pretrained models (Chronos-Bolt, Chronos-T5, TimesFM, Moirai, Time-MoE, and TimeGPT) against established statistical and machine learning (ML) methods for EPF. Using 2024 day-ahead auction (DAA) electricity prices from Germany, France, the Netherlands, Austria, and Belgium, we generate daily forecasts with a one-day horizon. Chronos-Bolt and Time-MoE emerge as the strongest among the TSFMs, performing on par with traditional models. However, the biseasonal MSTL model, which captures daily and weekly seasonality, stands out for its consistent performance across countries and evaluation metrics, with no TSFM statistically outperforming it.



Article 938

Title@2025-06-09 (1): Selective Matching Losses – Not All Scores Are Created Equal

Title: Selective Matching Losses – Not All Scores Are Created Equal Selektive Matching Losses – Nicht alle Scores werden gleich erstellt 选择性匹配损失 – – 并非所有分数都等于创建 2506.04446v2

Authors (2): Gil I. Shamir, Manfred K. Warmuth

Learning systems match predicted scores to observations over some domain. Often, it is critical to produce accurate predictions in some subset (or region) of the domain, yet less important to accurately predict in other regions. We construct selective matching loss functions by designing increasing link functions over score domains. A matching loss is an integral over the link. A link defines loss sensitivity as a function of the score, emphasizing high-slope, high-sensitivity regions over flat ones. Loss asymmetry drives a model and resolves its underspecification to predict better in high-sensitivity regions where it is more important, and to distinguish between high- and low-importance regions. A large variety of selective scalar losses can be designed with scaled and shifted Sigmoid and hyperbolic sine links. Their properties, however, do not extend to multi-class. Applying them per dimension lacks ranking sensitivity that assigns importance according to class score ranking. Utilizing composite Softmax functions, we develop a framework for multidimensional selective losses. We overcome limitations of the standard Softmax function, which is good for classification but not for distinguishing between adjacent scores. Selective losses have a substantial advantage over traditional losses in applications with more important score regions, including dwell-time prediction, retrieval, ranking with either pointwise, contrastive pairwise, or listwise losses, distillation problems, and fine-tuning alignment of Large Language Models (LLMs).
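
The statement that a matching loss is an integral over the link can be illustrated numerically. The sketch below uses the classical definition, the integral of `link(z) - link(s_obs)` from the observed score to the predicted score, and checks it against the known closed form for the plain sigmoid link. The scale and shift values are arbitrary assumptions, and the paper's exact constructions may differ.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softplus(z):
    return math.log1p(math.exp(z))


def matching_loss_numeric(link, s_obs, s_pred, steps=10_000):
    """Midpoint-rule integral of link(z) - link(s_obs) from s_obs to s_pred."""
    h = (s_pred - s_obs) / steps
    total = 0.0
    for k in range(steps):
        z = s_obs + (k + 0.5) * h
        total += link(z) - link(s_obs)
    return total * h


def matching_loss_sigmoid(s_obs, s_pred):
    """Closed form for the sigmoid link, whose antiderivative is softplus."""
    return softplus(s_pred) - softplus(s_obs) - sigmoid(s_obs) * (s_pred - s_obs)


def scaled_shifted_sigmoid(scale, shift):
    """A sigmoid link that is steepest (most sensitive) around z = shift."""
    return lambda z: sigmoid(scale * (z - shift))


link = scaled_shifted_sigmoid(scale=4.0, shift=2.0)
# The same 0.5 score error costs far more near the steep part of the link...
loss_sensitive = matching_loss_numeric(link, 2.0, 2.5)
# ...than on its flat tail.
loss_flat = matching_loss_numeric(link, -3.0, -2.5)
```

The two losses differ by several orders of magnitude even though the score error is identical, which is the selectivity the abstract describes.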



Article 939

Title@2025-06-09 (1): Positional Attention: Expressivity and Learnability of Algorithmic Computation

Title: Positional Attention: Expressivity and Learnability of Algorithmic Computation Positionale Aufmerksamkeit: Expressivität und Erlernbarkeit algorithmischer Computation 位置注意: 算法的可表达性和可理解性 2410.01686v3

Authors (5): Artur Back de Luca, George Giapitzakis, Shenghao Yang, Petar Veličković, Kimon Fountoulakis

There is a growing interest in the ability of neural networks to execute algorithmic tasks (e.g., arithmetic, summary statistics, and sorting). The goal of this work is to better understand the role of attention in Transformers for algorithmic execution. Its importance for algorithmic execution has been studied theoretically and empirically using parallel computational models. Notably, many parallel algorithms communicate between processors solely using positional information. Inspired by this observation, we investigate how Transformers can execute algorithms using positional attention, where attention weights depend exclusively on positional encodings. We prove that Transformers with positional attention (positional Transformers) maintain the same expressivity as parallel computational models, incurring a logarithmic depth cost relative to the input length. We analyze their in-distribution learnability and explore how parameter norms in positional attention affect sample complexity. Our results show that positional Transformers introduce a learning trade-off: while they exhibit better theoretical dependence on parameter norms, certain tasks may require more layers, which can, in turn, increase sample complexity. Finally, we empirically explore the out-of-distribution performance of positional Transformers and find that they perform well in tasks where their underlying algorithmic solution relies on positional information.
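
A minimal single-head sketch of the idea that attention weights depend only on positional encodings: queries and keys are computed from the positions, while the input supplies only the values. The matrices `w_q`, `w_k`, and the one-hot positions are made-up toy values, and the paper's architecture has more components than this.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def positional_attention(x, pos, w_q, w_k):
    """Attention weights come from positional encodings only; x supplies the values."""
    q = matmul(pos, w_q)
    k = matmul(pos, w_k)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, x), weights


pos = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # one-hot positional encodings
w_q = [[2, 0], [0, 2], [1, 1]]            # hypothetical query projection
w_k = [[1, 0], [0, 1], [1, 1]]            # hypothetical key projection

out_a, weights_a = positional_attention([[1.0], [2.0], [3.0]], pos, w_q, w_k)
out_b, weights_b = positional_attention([[-5.0], [0.0], [7.0]], pos, w_q, w_k)
```

The two calls produce identical attention weights and different outputs, because only the values change with the input.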



Article 940

Title@2025-06-09 (1): StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets

Title: StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets StableMTL:从部分附加说明合成数据集中重新定位多任务学习的远程传播模型 2506.08013v1

Authors (3): Anh-Quan Cao, Ivan Lopes, Raoul de Charette

Multi-task learning for dense prediction is limited by the need for extensive annotation for every task, though recent works have explored training with partial task labels. Leveraging the generalization power of diffusion models, we extend the partial learning setup to a zero-shot setting, training a multi-task model on multiple synthetic datasets, each labeled for only a subset of tasks. Our method, StableMTL, repurposes image generators for latent regression, adapting a denoising framework with task encoding, per-task conditioning, and a tailored training scheme. Instead of per-task losses requiring careful balancing, a unified latent loss is adopted, enabling seamless scaling to more tasks. To encourage inter-task synergy, we introduce a multi-stream model with a task-attention mechanism that converts N-to-N task interactions into efficient 1-to-N attention, promoting effective cross-task sharing. StableMTL outperforms baselines on 7 tasks across 8 benchmarks.



Article 941

Title@2025-06-09 (1): Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Title: Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion Selbsterzwingen: Überbrückung der Zug-Testlücke in autoregressiver Video-Diffusion 自我强迫:弥合自动递减视频传播中的培训-测试差距 2506.08009v1

Authors (5): Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, Eli Shechtman

We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must generate sequences conditioned on their own imperfect outputs during inference. Unlike prior methods that denoise future frames based on ground-truth context frames, Self Forcing conditions each frame’s generation on previously self-generated outputs by performing autoregressive rollout with key-value (KV) caching during training. This strategy enables supervision through a holistic loss at the video level that directly evaluates the quality of the entire generated sequence, rather than relying solely on traditional frame-wise objectives. To ensure training efficiency, we employ a few-step diffusion model along with a stochastic gradient truncation strategy, effectively balancing computational cost and performance. We further introduce a rolling KV cache mechanism that enables efficient autoregressive video extrapolation. Extensive experiments demonstrate that our approach achieves real-time streaming video generation with sub-second latency on a single GPU, while matching or even surpassing the generation quality of significantly slower and non-causal diffusion models. Project website: http://self-forcing.github.io/
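
The rolling KV cache and the autoregressive rollout can be pictured with a toy data structure. This is only a sketch under assumptions: the class `RollingKVCache`, the function `rollout`, and the stand-in `toy_generator` are hypothetical names, and strings stand in for key and value tensors. It shows each frame being conditioned on entries produced by earlier steps, with the oldest entry evicted once the cache is full.

```python
from collections import deque


class RollingKVCache:
    """Fixed-size cache: appending beyond `max_frames` evicts the oldest entry."""

    def __init__(self, max_frames):
        self.keys = deque(maxlen=max_frames)
        self.values = deque(maxlen=max_frames)

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def context(self):
        return list(self.keys), list(self.values)


def rollout(generate_frame, num_frames, max_frames):
    """Generate frames one by one, each conditioned on the cached context."""
    cache = RollingKVCache(max_frames)
    frames = []
    for t in range(num_frames):
        frame, key, value = generate_frame(t, *cache.context())
        cache.append(key, value)
        frames.append(frame)
    return frames, cache


def toy_generator(t, keys, values):
    # Stand-in for a model call: the "frame" records how much context was visible.
    return len(keys), f"k{t}", f"v{t}"


frames, cache = rollout(toy_generator, num_frames=5, max_frames=3)
```

Because the cache size is bounded, the cost per generated frame stays constant however long the rollout runs.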



Article 942

Title@2025-06-09 (1): Hidden in plain sight: VLMs overlook their visual representations

Title: Hidden in plain sight: VLMs overlook their visual representations Versteckt in freier Sicht: VLMs übersehen ihre visuellen Darstellungen 隐藏在普通视线之下:VLMs忽略了它们的视觉形象。 2506.08008v1

Authors (4): Stephanie Fu, Tyler Bonnen, Devin Guillory, Trevor Darrell

Language provides a natural interface to specify and evaluate performance on visual tasks. To realize this possibility, vision language models (VLMs) must successfully integrate visual and linguistic information. Our work compares VLMs to a direct readout of their visual encoders to understand their ability to integrate across these modalities. Across a series of vision-centric benchmarks (e.g., depth estimation, correspondence), we find that VLMs perform substantially worse than their visual encoders, dropping to near-chance performance. We investigate these results through a series of analyses across the entire VLM: namely 1) the degradation of vision representations, 2) brittleness to task prompt, and 3) the language model’s role in solving the task. We find that the bottleneck in performing these vision-centric tasks lies in this third category; VLMs are not effectively using visual information easily accessible throughout the entire model, and they inherit the language priors present in the LLM. Our work helps diagnose the failure modes of open-source VLMs, and presents a series of evaluations useful for future investigations into visual understanding within VLMs.



Article 943

Title@2025-06-09 (1): Reparameterized LLM Training via Orthogonal Equivalence Transformation

Title: Reparameterized LLM Training via Orthogonal Equivalence Transformation Reparameterisiertes LLM-Training über Orthogonale Äquivalenztransformation 通过正正对等转化进行修复性磁力LLM培训 2506.08001v1

Authors (6): Zeju Qiu, Simon Buchholz, Tim Z. Xiao, Maximilian Dax, Bernhard Schölkopf, Weiyang Liu

While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field’s most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
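
The claim about preserved spectral properties can be checked on a small case. The sketch assumes the effective weight has the form `R @ W0 @ P`, with `R` and `P` orthogonal and `W0` fixed; that form is an assumption here, since the abstract does not spell it out. It uses 2x2 rotations as the orthogonal matrices and verifies that the singular values of `W0` are unchanged. All numbers are arbitrary.

```python
import math


def rotation(theta):
    """A 2x2 rotation matrix, the simplest example of an orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def singular_values_2x2(w):
    """Singular values from the Frobenius norm and the determinant."""
    (a, b), (c, d) = w
    frob = a * a + b * b + c * c + d * d        # sigma1^2 + sigma2^2
    det = abs(a * d - b * c)                    # sigma1 * sigma2
    disc = math.sqrt(max(frob * frob - 4.0 * det * det, 0.0))
    return math.sqrt((frob + disc) / 2.0), math.sqrt(max((frob - disc) / 2.0, 0.0))


w0 = [[1.5, -0.3], [0.4, 0.8]]                  # fixed weight matrix
# Only the two rotation angles would be learned in this toy parameterization.
w = matmul(matmul(rotation(0.7), w0), rotation(-1.2))

spectrum_before = singular_values_2x2(w0)
spectrum_after = singular_values_2x2(w)
```

The entries of `w` differ from those of `w0`, but both matrices have the same singular values up to floating-point error.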



Article 944

Title@2025-06-09 (1): MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation

Title: MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation MADFormer: Mixed Autoregressive und Diffusion Transformer für die kontinuierliche Bildgenerierung MADFormer: 用于连续图像生成的混合自动递减和扩散变异器 2506.07999v1

Authors (3): Junhao Chen, Yulia Tsvetkov, Xiaochuang Han

Recent progress in multimodal generation has increasingly combined autoregressive (AR) and diffusion-based approaches, leveraging their complementary strengths: AR models capture long-range dependencies and produce fluent, context-aware outputs, while diffusion models operate in continuous latent spaces to refine high-fidelity visual details. However, existing hybrids often lack systematic guidance on how and why to allocate model capacity between these paradigms. In this work, we introduce MADFormer, a Mixed Autoregressive and Diffusion Transformer that serves as a testbed for analyzing AR-diffusion trade-offs. MADFormer partitions image generation into spatial blocks, using AR layers for one-pass global conditioning across blocks and diffusion layers for iterative local refinement within each block. Through controlled experiments on FFHQ-1024 and ImageNet, we identify two key insights: (1) block-wise partitioning significantly improves performance on high-resolution images, and (2) vertically mixing AR and diffusion layers yields better quality-efficiency balances, improving FID by up to 75% under constrained inference compute. Our findings offer practical design principles for future hybrid generative models.



Article 945

Title@2025-06-09 (1): Generative Modeling of Weights: Generalization or Memorization?

Title: Generative Modeling of Weights: Generalization or Memorization? Generative Modellierung von Gewichten: Verallgemeinerung oder Erinnerung? 产生重力建模:普遍化还是记忆化? 2506.07998v1

Authors (4): Boya Zeng, Yida Yin, Zhiqiu Xu, Zhuang Liu

Generative models, with their success in image and video generation, have recently been explored for synthesizing effective neural network weights. These approaches take trained neural network checkpoints as training data, and aim to generate high-performing neural network weights during inference. In this work, we examine four representative methods on their ability to generate novel model weights, i.e., weights that are different from the checkpoints seen during training. Surprisingly, we find that these methods synthesize weights largely by memorization: they produce either replicas, or at best simple interpolations, of the training checkpoints. Current methods fail to outperform simple baselines, such as adding noise to the weights or taking a simple weight ensemble, in obtaining different and simultaneously high-performing models. We further show that this memorization cannot be effectively mitigated by modifying modeling factors commonly associated with memorization in image diffusion models, or applying data augmentations. Our findings provide a realistic assessment of what types of data current generative models can model, and highlight the need for more careful evaluation of generative models in new domains. Our code is available at https://github.com/boyazeng/weight_memorization.



Article 946

Title@2025-06-09 (1): Domain Switching on the Pareto Front: Multi-Objective Deep Kernel Learning in Automated Piezoresponse Force Microscopy

Title: Domain Switching on the Pareto Front: Multi-Objective Deep Kernel Learning in Automated Piezoresponse Force Microscopy Domain-Umschaltung der Pareto-Front: Multi-Objektives Deep Kernel-Lernen in automatisierter Piezoresponse-Kraftmikroskopie Pareto Front 域域切换:在自动 Piezo响应力量中进行多目标深核心学习的显微镜 2506.08073v1

Authors (8): Yu Liu, Utkarsh Pratiush, Kamyar Barakati, Hiroshi Funakubo, Ching-Che Lin, Jaegyu Kim, Lane W. Martin, Sergei V. Kalinin

Ferroelectric polarization switching underpins the functional performance of a wide range of materials and devices, yet its dependence on complex local microstructural features renders systematic exploration by manual or grid-based spectroscopic measurements impractical. Here, we introduce a multi-objective kernel-learning workflow that infers the microstructural rules governing switching behavior directly from high-resolution imaging data. Applied to automated piezoresponse force microscopy (PFM) experiments, our framework efficiently identifies the key relationships between domain-wall configurations and local switching kinetics, revealing how specific wall geometries and defect distributions modulate polarization reversal. Post-experiment analysis projects abstract reward functions, such as switching ease and domain symmetry, onto physically interpretable descriptors including domain configuration and proximity to boundaries. This enables not only high-throughput active learning, but also mechanistic insight into the microstructural control of switching phenomena. While demonstrated for ferroelectric domain switching, our approach provides a powerful, generalizable tool for navigating complex, non-differentiable design spaces, from structure-property correlations in molecular discovery to combinatorial optimization across diverse imaging modalities.



Article 947

Title@2025-06-09 (1): PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation

Title: PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation PhantomWiki: On-Demand-Datensätze für die Bewertung von Vernunft und Retrieval PhantomWiki: 用于理性和检索评价的现场数据集 2502.20377v2

Authors (9): Albert Gong, Kamilė Stankevičiūtė, Chao Wan, Anmol Kabra, Raphael Thesmar, Johann Lee, Julius Klenke, Carla P. Gomes, Kilian Q. Weinberger

High-quality benchmarks are essential for evaluating reasoning and retrieval capabilities of large language models (LLMs). However, curating datasets for this purpose is not a permanent solution as they are prone to data leakage and inflated performance results. To address these challenges, we propose PhantomWiki: a pipeline to generate unique, factually consistent document corpora with diverse question-answer pairs. Unlike prior work, PhantomWiki is neither a fixed dataset, nor is it based on any existing data. Instead, a new PhantomWiki instance is generated on demand for each evaluation. We vary the question difficulty and corpus size to disentangle reasoning and retrieval capabilities respectively, and find that PhantomWiki datasets are surprisingly challenging for frontier LLMs. Thus, we contribute a scalable and data leakage-resistant framework for disentangled evaluation of reasoning, retrieval, and tool-use abilities. Our code is available at https://github.com/kilian-group/phantom-wiki.



Article 948

Title@2025-06-09 (1): When Two LLMs Debate, Both Think They’ll Win

Title: When Two LLMs Debate, Both Think They’ll Win Wenn zwei LLMs diskutieren, denken beide, dass sie gewinnen werden 当两个LLM 辩论, 双方都认为他们会赢 2505.19184v3

Authors (2): Pradyumna Shyama Prasad, Minh Nhat Nguyen

Can LLMs accurately adjust their confidence when facing opposition? Building on previous studies measuring calibration on static fact-based question-answering tasks, we evaluate Large Language Models (LLMs) in a dynamic, adversarial debate setting, uniquely combining two realistic factors: (a) a multi-turn format requiring models to update beliefs as new information emerges, and (b) a zero-sum structure to control for task-related uncertainty, since mutual high-confidence claims imply systematic overconfidence. We organized 60 three-round policy debates among ten state-of-the-art LLMs, with models privately rating their confidence (0-100) in winning after each round. We observed five concerning patterns: (1) Systematic overconfidence: models began debates with average initial confidence of 72.9% vs. a rational 50% baseline. (2) Confidence escalation: rather than reducing confidence as debates progressed, debaters increased their win probabilities, averaging 83% by the final round. (3) Mutual overestimation: in 61.7% of debates, both sides simultaneously claimed >=75% probability of victory, a logical impossibility. (4) Persistent self-debate bias: models debating identical copies increased confidence from 64.1% to 75.2%; even when explicitly informed their chance of winning was exactly 50%, confidence still rose (from 50.0% to 57.1%). (5) Misaligned private reasoning: models’ private scratchpad thoughts sometimes differed from their public confidence ratings, raising concerns about faithfulness of chain-of-thought reasoning. These results suggest LLMs lack the ability to accurately self-assess or update their beliefs in dynamic, multi-turn tasks; a major concern as LLMs are now increasingly deployed without careful review in assistant and agentic roles. Code for our experiments is available at https://github.com/pradyuprasad/llms_overconfidence



Article 949

Title@2025-06-09 (1): Rethinking Crowd-Sourced Evaluation of Neuron Explanations

Title: Rethinking Crowd-Sourced Evaluation of Neuron Explanations Rethinking Crowd-Sourced Evaluation von Neuron Erklärungen 重新思考人群-源对内向解释的重新思考 2506.07985v1

Authors (4): Tuomas Oikarinen, Ge Yan, Akshay Kulkarni, Tsui-Wei Weng

Interpreting individual neurons or directions in activation space is an important component of mechanistic interpretability. As such, many algorithms have been proposed to automatically produce neuron explanations, but it is often not clear how reliable these explanations are, or which methods produce the best explanations. This can be measured via crowd-sourced evaluations, but they can often be noisy and expensive, leading to unreliable results. In this paper, we carefully analyze the evaluation pipeline and develop a cost-effective and highly accurate crowdsourced evaluation strategy. In contrast to previous human studies that only rate whether the explanation matches the most highly activating inputs, we estimate whether the explanation describes neuron activations across all inputs. To estimate this effectively, we introduce a novel application of importance sampling to determine which inputs are the most valuable to show to raters, leading to around 30x cost reduction compared to uniform sampling. We also analyze the label noise present in crowd-sourced evaluations and propose a Bayesian method to aggregate multiple ratings, leading to a further ~5x reduction in the number of ratings required for the same accuracy. Finally, we use these methods to conduct a large-scale study comparing the quality of neuron explanations produced by the most popular methods for two different vision models.
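
A generic sketch of why importance sampling can cut rating cost: to estimate an average over all inputs when only a few inputs carry signal, sample the informative inputs more often and reweight by target probability over proposal probability. This is not the paper's estimator. The function name, the data (5 active inputs out of 100), and the proposal distribution are all assumptions made for illustration.

```python
import random


def importance_sampling_mean(values, proposal, num_samples, rng):
    """Estimate the uniform mean of `values` by sampling indices from `proposal`."""
    n = len(values)
    idx = rng.choices(range(n), weights=proposal, k=num_samples)
    # Weight = target probability (1/n) divided by proposal probability.
    return sum(values[i] * (1.0 / n) / proposal[i] for i in idx) / num_samples


# Hypothetical neuron: active on 5 of 100 inputs, so the true mean is 0.05.
values = [0.0] * 95 + [1.0] * 5
# Proposal puts half of its mass on the 5 active inputs.
proposal = [0.5 / 95] * 95 + [0.1] * 5

rng = random.Random(0)
estimate = importance_sampling_mean(values, proposal, 2000, rng)

# Per-sample variance of the two estimators for this toy problem.
variance_uniform = 0.05 * 0.95        # Bernoulli(0.05)
variance_proposal = 0.05 ** 2         # takes the values 0.1 or 0.0 with probability 1/2
```

In this toy case the proposal reduces per-sample variance by a factor of 19, so far fewer ratings are needed for the same accuracy.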



Article 950

Title@2025-06-09 (1): CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray

Title: CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray CXR-LT 2024: Eine MICCAI-Herausforderung bei Langzeit-, Multilabel- und Zero-Shot-Erkrankungsklassifizierung aus der Bruströntgenuntersuchung CXR-LT 2024:MICCAI对胸前X光中长尾、多标签和零发疾病分类的挑战 2506.07984v1

Authors (33): Mingquan Lin, Gregory Holste, Song Wang, Yiliang Zhou, Yishu Wei, Imon Banerjee, Pengyi Chen, Tianjie Dai, Yuexi Du, Nicha C. Dvornek, Yuyan Ge, Zuowei Guo, Shouhei Hanaoka, Dongkyun Kim, Pablo Messina, Yang Lu, Denis Parra, Donghyun Son, Álvaro Soto, Aisha Urooj, René Vidal, Yosuke Yamagishi, Zefan Yang, Ruichi Zhang, Yang Zhou, Leo Anthony Celi, Ronald M. Summers, Zhiyong Lu, Hao Chen, Adam Flanders, George Shih, Zhangyang Wang, Yifan Peng

The CXR-LT series is a community-driven initiative designed to enhance lung disease classification using chest X-rays (CXR). It tackles challenges in open long-tailed lung disease classification and enhances the measurability of state-of-the-art techniques. The first event, CXR-LT 2023, aimed to achieve these goals by providing high-quality benchmark CXR data for model development and conducting comprehensive evaluations to identify ongoing issues impacting lung disease classification performance. Building on the success of CXR-LT 2023, CXR-LT 2024 expands the dataset to 377,110 chest X-rays (CXRs) and 45 disease labels, including 19 new rare disease findings. It also introduces a new focus on zero-shot learning to address limitations identified in the previous event. Specifically, CXR-LT 2024 features three tasks: (i) long-tailed classification on a large, noisy test set, (ii) long-tailed classification on a manually annotated “gold standard” subset, and (iii) zero-shot generalization to five previously unseen disease findings. This paper provides an overview of CXR-LT 2024, detailing the data curation process and consolidating state-of-the-art solutions, including the use of multimodal models for rare disease detection, advanced generative approaches to handle noisy labels, and zero-shot learning strategies for unseen diseases. Additionally, the expanded dataset enhances disease coverage to better represent real-world clinical settings, offering a valuable resource for future research. By synthesizing the insights and innovations of participating teams, we aim to advance the development of clinically realistic and generalizable diagnostic models for chest radiography.



Article 951

Title@2025-06-09 (1): Real-time Localization of a Soccer Ball from a Single Camera

Title: Real-time Localization of a Soccer Ball from a Single Camera Echtzeit Lokalisierung eines Fußballballs aus einer einzigen Kamera 单摄像头足球球的实时本地化 2506.07981v1

Authors (3): Dmitrii Vorobev, Artem Prosvetov, Karim Elhadji Daou

We propose a computationally efficient method for real-time three-dimensional football trajectory reconstruction from a single broadcast camera. In contrast to previous work, our approach introduces a multi-mode state model with $W$ discrete modes to significantly accelerate optimization while preserving centimeter-level accuracy – even in cases of severe occlusion, motion blur, and complex backgrounds. The system operates on standard CPUs and achieves low latency suitable for live broadcast settings. Extensive evaluation on a proprietary dataset of 6K-resolution Russian Premier League matches demonstrates performance comparable to multi-camera systems, without the need for specialized or costly infrastructure. This work provides a practical method for accessible and accurate 3D ball tracking in professional football environments.



Article 952

Title@2025-06-09 (1): Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation

Title: Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation Bipartite Ranking von mehreren Labels: Auf Verlust Versus Label Aggregation 多标签的双边排名:关于损失的Versus Label聚合 2504.11284v2

Authors (10): Michal Lukasik, Lin Chen, Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Felix X. Yu, Sashank J. Reddi, Gang Fu, Mohammadhossein Bateni, Sanjiv Kumar

Bipartite ranking is a fundamental supervised learning problem, with the goal of learning a ranking over instances with maximal Area Under the ROC Curve (AUC) against a single binary target label. However, one may often observe multiple binary target labels, e.g., from distinct human annotators. How can one synthesize such labels into a single coherent ranking? In this work, we formally analyze two approaches to this problem – loss aggregation and label aggregation – by characterizing their Bayes-optimal solutions. We show that while both approaches can yield Pareto-optimal solutions, loss aggregation can exhibit label dictatorship: one can inadvertently (and undesirably) favor one label over others. This suggests that label aggregation can be preferable to loss aggregation, which we empirically verify.
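
For reference, AUC against a single binary label is the fraction of (positive, negative) pairs that the scores order correctly. The sketch below computes it directly and evaluates one ranking against two hypothetical annotators. The scores and labels are invented, and the final average is just one simple way to summarize several labels, not the paper's analysis.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.3, 0.1]          # one ranking over four instances
labels_a = [1, 1, 0, 0]                # hypothetical annotator A
labels_b = [1, 0, 1, 0]                # hypothetical annotator B

auc_a = auc(scores, labels_a)
auc_b = auc(scores, labels_b)
mean_auc = (auc_a + auc_b) / 2.0
```

The ranking is perfect for annotator A but not for annotator B, so a single ranking may have to trade off the labels against each other.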



Article 953

Title@2025-06-09 (1): Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator

Title: Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator Realistischer städtischer Verkehrsgenerator mit dezentralisiertem Federated Learning für den SUMO-Simulator 使用分散化的联邦学习来模拟SUMO的模拟器进行现实化城市交通发电机 2506.07980v1

Authors (9): Alberto Bazán-Guillén, Carlos Beis-Penedo, Diego Cajaraville-Aboy, Pablo Barbecho-Bautista, Rebeca P. Díaz-Redondo, Luis J. de la Cruz Llopis, Ana Fernández-Vilas, Mónica Aguilar Igartua, Manuel Fernández-Veiga

Realistic urban traffic simulation is essential for sustainable urban planning and the development of intelligent transportation systems. However, generating high-fidelity, time-varying traffic profiles that accurately reflect real-world conditions, especially in large-scale scenarios, remains a major challenge. Existing methods often suffer from limitations in accuracy or scalability, or raise privacy concerns due to centralized data processing. This work introduces DesRUTGe (Decentralized Realistic Urban Traffic Generator), a novel framework that integrates Deep Reinforcement Learning (DRL) agents with the SUMO simulator to generate realistic 24-hour traffic patterns. A key innovation of DesRUTGe is its use of Decentralized Federated Learning (DFL), wherein each traffic detector and its corresponding urban zone function as an independent learning node. These nodes train local DRL models using minimal historical data and collaboratively refine their performance by exchanging model parameters with selected peers (e.g., geographically adjacent zones), without requiring a central coordinator. Evaluated using real-world data from the city of Barcelona, DesRUTGe outperforms standard SUMO-based tools such as RouteSampler, as well as other centralized learning approaches, by delivering more accurate and privacy-preserving traffic pattern generation.
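
The peer-to-peer exchange of model parameters can be sketched as one round of neighbor averaging with no central coordinator. Uniform averaging, the zone names, and the one-dimensional parameter vectors are assumptions for illustration; the abstract does not specify the aggregation rule DesRUTGe uses.

```python
def average_with_peers(params, neighbors):
    """One exchange round: each node averages its parameters with its peers'."""
    updated = {}
    for node, own in params.items():
        group = [own] + [params[p] for p in neighbors[node]]
        updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated


# Three hypothetical urban zones arranged in a line: A - B - C.
params = {"A": [0.0], "B": [3.0], "C": [6.0]}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

after_one_round = average_with_peers(params, neighbors)
```

Each node only ever sees the parameters of its selected peers, and repeated rounds pull the local models toward one another.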



Article 954

Title@2025-06-09 (1): Automated Capability Discovery via Foundation Model Self-Exploration

Title: Automated Capability Discovery via Foundation Model Self-Exploration Automatisierte Capability Discovery über die Selbsterkundung von Fundamentmodellen 通过基金会自我探索示范模式自动发现能力 2502.07577v3

Authors (3): Cong Lu, Shengran Hu, Jeff Clune

Foundation models have become general-purpose assistants, exhibiting diverse capabilities across numerous domains through training on web-scale data. It remains challenging to precisely characterize even a fraction of the full spectrum of these abilities and potential risks in any new model. Existing evaluation approaches often require significant human effort, and it is taking increasing effort to design ever harder challenges for more capable models. We introduce Automated Capability Discovery (ACD), a framework that designates one foundation model as a scientist to systematically propose open-ended tasks probing the abilities of a subject model (potentially itself). By combining frontier models with ideas from the field of open-endedness, ACD automatically and systematically uncovers a diverse spectrum of surprising capabilities and failures in the subject model. We demonstrate ACD across a range of foundation models (including the GPT, Claude, and Llama series), showing that it automatically generates thousands of distinct tasks, which are then clustered to reveal dozens of broader capability areas and failure modes, that would be challenging for any single team to uncover. We further validate our method’s automated scoring with extensive human surveys, observing high agreement between model-generated and human evaluations. By leveraging foundation models’ ability to both create tasks and self-evaluate, ACD is a significant step toward scalable, automated evaluation of novel AI systems. All code and evaluation logs are open-sourced at https://github.com/conglu1997/ACD.



Article 955

Title@2025-06-09 (1): Hyperpruning: Efficient Search through Pruned Variants of Recurrent Neural Networks Leveraging Lyapunov Spectrum

Title: Hyperpruning: Efficient Search through Pruned Variants of Recurrent Neural Networks Leveraging Lyapunov Spectrum Hyperpruning: Effiziente Suche durch beschnittene Varianten von recurrenten Neuronalen Netzwerken, die Lyapunov Spectrum nutzen 超光速运行:通过经常神经网络的审慎变式有效搜索,利用Lyapunov光谱网 2506.07975v1

Authors (2): Caleb Zheng, Eli Shlizerman

A variety of pruning methods have been introduced for over-parameterized Recurrent Neural Networks to improve efficiency in terms of power consumption and storage utilization. These advances motivate a new paradigm, termed 'hyperpruning', which seeks to identify the most suitable pruning strategy for a given network architecture and application. Unlike conventional hyperparameter search, where the optimal configuration’s accuracy remains uncertain, in the context of network pruning, the accuracy of the dense model sets the target for the accuracy of the pruned one. The goal, therefore, is to discover pruned variants that match or even surpass this established accuracy. However, exhaustive search over pruning configurations is computationally expensive and lacks early performance guarantees. To address this challenge, we propose a novel Lyapunov Spectrum (LS)-based distance metric that enables early comparison between pruned and dense networks, allowing accurate prediction of post-training performance. By integrating this LS-based distance with standard hyperparameter optimization algorithms, we introduce an efficient hyperpruning framework, termed LS-based Hyperpruning (LSH). LSH reduces search time by an order of magnitude compared to conventional approaches relying on full training. Experiments on stacked LSTM and RHN architectures using the Penn Treebank dataset, and on AWD-LSTM-MoS using WikiText-2, demonstrate that under fixed training budgets and target pruning ratios, LSH consistently identifies superior pruned models. Remarkably, these pruned variants not only outperform those selected by a loss-based baseline but also exceed the performance of their dense counterparts.



Article 956

Title@2025-06-09 (1): Intelligent Offloading in Vehicular Edge Computing: A Comprehensive Review of Deep Reinforcement Learning Approaches and Architectures

Title: Intelligent Offloading in Vehicular Edge Computing: A Comprehensive Review of Deep Reinforcement Learning Approaches and Architectures Intelligentes Offloading im Fahrzeug Edge Computing: Eine umfassende Überprüfung von Deep Reinforcement-Lernansätzen und Architekturen 在车辆边缘计算机中卸载:对深强化学习方法和架构的全面审查 2502.06963v2

Authors (3): Ashab Uddin, Ahmed Hamdi Sakr, Ning Zhang

The increasing complexity of Intelligent Transportation Systems (ITS) has led to significant interest in computational offloading to external infrastructures such as edge servers, vehicular nodes, and UAVs. These dynamic and heterogeneous environments pose challenges for traditional offloading strategies, prompting the exploration of Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) as adaptive decision-making frameworks. This survey presents a comprehensive review of recent advances in DRL-based offloading for vehicular edge computing (VEC). We classify and compare existing works based on learning paradigms (e.g., single-agent, multi-agent), system architectures (e.g., centralized, distributed, hierarchical), and optimization objectives (e.g., latency, energy, fairness). Furthermore, we analyze how Markov Decision Process (MDP) formulations are applied and highlight emerging trends in reward design, coordination mechanisms, and scalability. Finally, we identify open challenges and outline future research directions to guide the development of robust and intelligent offloading strategies for next-generation ITS.



Article 957

Title@2025-06-09 (1): HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization

Title: HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization HeuriGym: Agentischer Benchmark für LLM-gekrempelte Heuristik in der kombinatorischen Optimierung HeuriGym: 组合优化中LLM-减压希力学的代理基准 2506.07972v1

Authors (13): Hongzheng Chen, Yingheng Wang, Yaohui Cai, Hins Hu, Jiajie Li, Shirley Huang, Chenhui Deng, Rongjian Liang, Shufeng Kong, Haoxing Ren, Samitha Samaranayake, Carla P. Gomes, Zhiru Zhang

While Large Language Models (LLMs) have demonstrated significant advancements in reasoning and agent-based problem-solving, current evaluation methodologies fail to adequately assess their capabilities: existing benchmarks either rely on closed-ended questions prone to saturation and memorization, or subjective comparisons that lack consistency and rigor. In this work, we introduce HeuriGym, an agentic framework designed for evaluating heuristic algorithms generated by LLMs for combinatorial optimization problems, characterized by clearly defined objectives and expansive solution spaces. HeuriGym empowers LLMs to propose heuristics, receive evaluative feedback via code execution, and iteratively refine their solutions. We evaluate nine state-of-the-art models on nine problems across domains such as computer systems, logistics, and biology, exposing persistent limitations in tool use, planning, and adaptive reasoning. To quantify performance, we propose the Quality-Yield Index (QYI), a metric that captures both solution pass rate and quality. Even top models like GPT-o4-mini-high and Gemini-2.5-Pro attain QYI scores of only 0.6, well below the expert baseline of 1. Our open-source benchmark aims to guide the development of LLMs toward more effective and realistic problem-solving in scientific and engineering domains.



Article 958

Title@2025-06-09 (1): MIB: A Mechanistic Interpretability Benchmark

Title: MIB: A Mechanistic Interpretability Benchmark MIB: Ein Mechanistischer Interpretations-Benchmark MIB: 机械解释基准 2504.13151v2

Authors (23): Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov

How can we know whether new mechanistic interpretability methods achieve real improvements? In pursuit of lasting evaluation standards, we propose MIB, a Mechanistic Interpretability Benchmark, with two tracks spanning four tasks and five models. MIB favors methods that precisely and concisely recover relevant causal pathways or causal variables in neural language models. The circuit localization track compares methods that locate the model components - and connections between them - most important for performing a task (e.g., attribution patching or information flow routes). The causal variable localization track compares methods that featurize a hidden vector, e.g., sparse autoencoders (SAEs) or distributed alignment search (DAS), and align those features to a task-relevant causal variable. Using MIB, we find that attribution and mask optimization methods perform best on circuit localization. For causal variable localization, we find that the supervised DAS method performs best, while SAE features are not better than neurons, i.e., non-featurized hidden vectors. These findings illustrate that MIB enables meaningful comparisons, and increases our confidence that there has been real progress in the field.



Article 959

Title@2025-06-09 (1): A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling

Title: A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling Ein zweiphasiges Deep-Learning-Framework für adaptive Zeitschritte in der High-Speed-Flow-Modellierung 高频流建模适应性时间排行的两阶段深学习框架 2506.07969v1

Authors (14): Jacob Helwig, Sai Sreeharsha Adavi, Xuan Zhang, Yuchao Lin, Felix S. Chim, Luke Takeshi Vizzini, Haiyang Yu, Muhammad Hasnain, Saykat Kumar Biswas, John J. Holloway, Narendra Singh, N. K. Anand, Swagnik Guhathakurta, Shuiwang Ji

We consider the problem of modeling high-speed flows using machine learning methods. While most prior studies focus on low-speed fluid flows in which uniform time-stepping is practical, flows approaching and exceeding the speed of sound exhibit sudden changes such as shock waves. In such cases, it is essential to use adaptive time-stepping methods to allow a temporal resolution sufficient to resolve these phenomena while simultaneously balancing computational costs. Here, we propose a two-phase machine learning method, known as ShockCast, to model high-speed flows with adaptive time-stepping. In the first phase, we propose to employ a machine learning model to predict the timestep size. In the second phase, the predicted timestep is used as an input along with the current fluid fields to advance the system state by the predicted timestep. We explore several physically-motivated components for timestep prediction and introduce timestep conditioning strategies inspired by neural ODE and Mixture of Experts. As ShockCast is the first framework for learning high-speed flows, we evaluate our methods by generating two supersonic flow datasets, available at https://huggingface.co/datasets/divelab. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).
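The two-phase rollout described above can be sketched as a loop in which one function proposes the timestep and a second advances the state by that amount. Both functions below are hand-written stand-ins for the learned models (an assumption for illustration); the decay problem, `predict_dt`, `advance`, and all constants are hypothetical and unrelated to the paper's flow datasets.

```python
def predict_dt(u, k=5.0, tol=0.01, dt_max=0.1):
    """Phase 1 stand-in: choose a small step when the state changes quickly."""
    rate = abs(k * u)
    return min(dt_max, tol / (rate + 1e-12))


def advance(u, dt, k=5.0):
    """Phase 2 stand-in: advance the state by dt (explicit Euler on du/dt = -k*u)."""
    return u + dt * (-k * u)


def rollout(u0, t_end):
    """Alternate the two phases until t_end, recording times, states and step sizes."""
    t, u = 0.0, u0
    times, states, dts = [t], [u], []
    while t < t_end - 1e-12:
        dt = min(predict_dt(u), t_end - t)  # phase 1: timestep size
        u = advance(u, dt)                  # phase 2: state update conditioned on dt
        t += dt
        times.append(t)
        states.append(u)
        dts.append(dt)
    return times, states, dts


times, states, dts = rollout(u0=1.0, t_end=2.0)
```

The step sizes start small while the state changes rapidly and grow toward `dt_max` as it settles, which is the cost/resolution trade-off that adaptive time-stepping targets.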



Article 960

Title@2025-06-09 (1): Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs

Title: Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs 物理知情神经溶解器中发现聚合物的神经内核内核分析:PIKANs诉PINNs 2506.07958v1

Authors (2): Salah A. Faroughi, Farinaz Mostajeran

Physics-informed Kolmogorov-Arnold Networks (PIKANs), and in particular their Chebyshev-based variants (cPIKANs), have recently emerged as promising models for solving partial differential equations (PDEs). However, their training dynamics and convergence behavior remain largely unexplored both theoretically and numerically. In this work, we aim to advance the theoretical understanding of cPIKANs by analyzing them using Neural Tangent Kernel (NTK) theory. Our objective is to discern the evolution of kernel structure throughout gradient-based training and its subsequent impact on learning efficiency. We first derive the NTK of standard cKANs in a supervised setting, and then extend the analysis to the physics-informed context. We analyze the spectral properties of NTK matrices, specifically their eigenvalue distributions and spectral bias, for four representative PDEs: the steady-state Helmholtz equation, transient diffusion and Allen-Cahn equations, and forced vibrations governed by the Euler-Bernoulli beam equation. We also conduct an investigation into the impact of various optimization strategies, e.g., first-order, second-order, and hybrid approaches, on the evolution of the NTK and the resulting learning dynamics. Results indicate a tractable behavior for NTK in the context of cPIKANs, which exposes learning dynamics that standard physics-informed neural networks (PINNs) cannot capture. Spectral trends also reveal when domain decomposition improves training, directly linking kernel behavior to convergence rates under different setups. To the best of our knowledge, this is the first systematic NTK study of cPIKANs, providing theoretical insight that clarifies and predicts their empirical performance.



Article 961

Title@2025-06-09 (1): Distillation Robustifies Unlearning

Title: Distillation Robustifies Unlearning Destillation stärkt das Unlearning 固化 固化 校正 校正 2506.06278v2

Authors (9): Bruce W. Lee, Addie Foote, Alex Infanger, Leni Shor, Harish Kamath, Jacob Goldman-Wetzler, Bryce Woodworth, Alex Cloud, Alexander Matt Turner

Current LLM unlearning methods are not robust: they can be reverted easily with a few steps of finetuning. This is true even for the idealized unlearning method of training to imitate an oracle model that was never exposed to unwanted information, suggesting that output-based finetuning is insufficient to achieve robust unlearning. In a similar vein, we find that training a randomly initialized student to imitate an unlearned model transfers desired behaviors while leaving undesired capabilities behind. In other words, distillation robustifies unlearning. Building on this insight, we propose Unlearn-Noise-Distill-on-Outputs (UNDO), a scalable method that distills an unlearned model into a partially noised copy of itself. UNDO introduces a tunable tradeoff between compute cost and robustness, establishing a new Pareto frontier on synthetic language and arithmetic tasks. At its strongest setting, UNDO matches the robustness of a model retrained from scratch with perfect data filtering while using only 60-80% of the compute and requiring only 0.01% of the pretraining data to be labeled. We also show that UNDO robustifies unlearning on the more realistic Weapons of Mass Destruction Proxy (WMDP) benchmark. Since distillation is widely used in practice, incorporating an unlearning step beforehand offers a convenient path to robust capability removal.



Article 962

Title@2025-06-09 (1): Language Models over Canonical Byte-Pair Encodings

Title: Language Models over Canonical Byte-Pair Encodings Sprachmodelle über Canonical Byte-Pair Encodings Canonicic 字节对称编码语言模型 2506.07956v1

Authors (10): Tim Vieira, Tianyu Liu, Clemente Pasti, Yahya Emara, Brian DuSell, Benjamin LeBrun, Mario Giulianelli, Juan Luis Gastaldi, Timothy J. O’Donnell, Ryan Cotterell

Modern language models represent probability distributions over character strings as distributions over (shorter) token strings derived via a deterministic tokenizer, such as byte-pair encoding. While this approach is highly effective at scaling up language models to large corpora, its current incarnations have a concerning property: the model assigns nonzero probability mass to an exponential number of noncanonical token encodings of each character string – these are token strings that decode to valid character strings but are impossible under the deterministic tokenizer (i.e., they will never be seen in any training corpus, no matter how large). This misallocation is both erroneous, as noncanonical strings never appear in training data, and wasteful, diverting probability mass away from plausible outputs. These are avoidable mistakes! In this work, we propose methods to enforce canonicality in token-level language models, ensuring that only canonical token strings are assigned positive probability. We present two approaches: (1) canonicality by conditioning, leveraging test-time inference strategies without additional training, and (2) canonicality by construction, a model parameterization that guarantees canonical outputs but requires training. We demonstrate that fixing canonicality mistakes improves the likelihood of held-out data for several models and corpora.
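The notion of a canonical encoding can be illustrated with a toy tokenizer: a token string is canonical exactly when re-encoding its decoded text reproduces it. The sketch below assumes a simplified BPE that applies each merge rule once, in priority order; the merge table and function names are hypothetical, and this is not the paper's enforcement method.

```python
def bpe_encode(text, merges):
    """Toy deterministic BPE: start from characters and apply merges in priority order."""
    tokens = list(text)
    for left, right in merges:
        merged, i = [], 0
        while i < len(tokens):
            # Merge every adjacent (left, right) pair, scanning left to right.
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


def is_canonical(tokens, merges):
    """A token string is canonical iff the tokenizer would produce it for its own text."""
    return bpe_encode("".join(tokens), merges) == list(tokens)


merges = [("a", "b"), ("ab", "c")]  # hypothetical merge table

canonical = bpe_encode("abc", merges)
checks = {
    "abc": is_canonical(["abc"], merges),
    "ab|c": is_canonical(["ab", "c"], merges),
    "a|b|c": is_canonical(["a", "b", "c"], merges),
}
```

All three token strings decode to the same text "abc", but only the first can ever be emitted by the tokenizer; the other two are the noncanonical encodings that receive wasted probability mass.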



Article 963

Title@2025-06-09 (1): Discrete and Continuous Difference of Submodular Minimization

Title: Discrete and Continuous Difference of Submodular Minimization Diskreter und kontinuierlicher Unterschied der Submodularen Minimierung 子模块最小化的分质和连续差异 2506.07952v1

Authors (3): George Orfanides, Tim Hoheisel, Marwa El Halabi

Submodular functions, defined on continuous or discrete domains, arise in numerous applications. We study the minimization of the difference of two submodular (DS) functions, over both domains, extending prior work restricted to set functions. We show that all functions on discrete domains and all smooth functions on continuous domains are DS. For discrete domains, we observe that DS minimization is equivalent to minimizing the difference of two convex (DC) functions, as in the set function case. We propose a novel variant of the DC Algorithm (DCA) and apply it to the resulting DC Program, obtaining comparable theoretical guarantees as in the set function case. The algorithm can be applied to continuous domains via discretization. Experiments demonstrate that our method outperforms baselines in integer compressive sensing and integer least squares.



Article 964

Title@2025-06-09 (1): Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization

Title: Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization Easy2Hard-Bench: Standardisierte Schwierigkeits-Labels für Profiling LLM Performance und Generalisierung Easy2Hard-Bench:用于分析LLM性能和通用的标准化困难标签 2409.18433v2

Authors (11): Mucong Ding, Chenghao Deng, Jocelyn Choo, Zichu Wu, Aakriti Agrawal, Avi Schwarzschild, Tianyi Zhou, Tom Goldstein, John Langford, Anima Anandkumar, Furong Huang

While generalization over tasks from easy to hard is crucial for profiling large language models (LLMs), datasets with fine-grained difficulty annotations for each problem across a broad range of complexity are still lacking. To address this limitation, we present Easy2Hard-Bench, a consistently formatted collection of 6 benchmark datasets spanning various domains, such as mathematics and programming problems, chess puzzles, and reasoning questions. Each problem within these datasets is annotated with numerical difficulty scores. To systematically estimate problem difficulties, we collect abundant performance data on attempts at each problem by humans in the real world or by LLMs on a prominent leaderboard. Leveraging the rich performance data, we apply well-established difficulty ranking systems, such as Item Response Theory (IRT) and Glicko-2 models, to uniformly assign numerical difficulty scores to problems. Moreover, datasets in Easy2Hard-Bench distinguish themselves from previous collections by a higher proportion of challenging problems. Through extensive experiments with six state-of-the-art LLMs, we provide a comprehensive analysis of their performance and generalization capabilities across varying levels of difficulty, with the aim of inspiring future research in LLM generalization. The datasets are available at https://huggingface.co/datasets/furonghuang-lab/Easy2Hard-Bench.



Article 965

Title@2025-06-09 (1): Cost-Optimal Active AI Model Evaluation

Title: Cost-Optimal Active AI Model Evaluation Kostenoptimale, aktive KI-Modellbewertung AI模型评价 2506.07949v1

Authors (5): Anastasios N. Angelopoulos, Jacob Eisenstein, Jonathan Berant, Alekh Agarwal, Adam Fisch

The development lifecycle of generative AI systems requires continual evaluation, data acquisition, and annotation, which is costly in both resources and time. In practice, rapid iteration often makes it necessary to rely on synthetic annotation data because of the low cost, despite the potential for substantial bias. In this paper, we develop novel, cost-aware methods for actively balancing the use of a cheap, but often inaccurate, weak rater – such as a model-based autorater that is designed to automatically assess the quality of generated content – with a more expensive, but also more accurate, strong rater alternative such as a human. More specifically, the goal of our approach is to produce a low variance, unbiased estimate of the mean of the target “strong” rating, subject to some total annotation budget. Building on recent work in active and prediction-powered statistical inference, we derive a family of cost-optimal policies for allocating a given annotation budget between weak and strong raters so as to maximize statistical efficiency. Using synthetic and real-world data, we empirically characterize the conditions under which these policies yield improvements over prior methods. We find that, especially in tasks where there is high variability in the difficulty of examples, our policies can achieve the same estimation precision at a far lower total annotation budget than standard evaluation methods.
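The idea of combining a cheap weak rater with a costly strong rater can be sketched with the basic prediction-powered mean estimator that this line of work builds on: average the weak ratings over all items, then correct for bias using the items that also have a strong rating. The sketch assumes the strongly rated subset is a uniform random sample; it does not implement the paper's cost-optimal allocation policies, and the ratings are made up.

```python
def ppi_mean(weak_all, weak_labeled, strong_labeled):
    """Prediction-powered estimate of the mean strong rating.

    weak_all:       weak-rater scores on every item (cheap, possibly biased)
    weak_labeled:   weak-rater scores on the subset that also has strong ratings
    strong_labeled: strong-rater scores on that same subset, in the same order
    """
    if len(weak_labeled) != len(strong_labeled):
        raise ValueError("labeled subsets must be paired")
    # Average bias of the weak rater, measured where both ratings exist.
    rectifier = sum(s - w for s, w in zip(strong_labeled, weak_labeled)) / len(strong_labeled)
    return sum(weak_all) / len(weak_all) + rectifier


# Hypothetical ratings: the autorater runs 0.1 too high on the audited items.
weak_all = [0.8, 0.6, 0.9, 0.7]
weak_labeled = [0.8, 0.6]
strong_labeled = [0.7, 0.5]

estimate = ppi_mean(weak_all, weak_labeled, strong_labeled)
```

The budget question studied in the abstract is how many items to send to each rater; this sketch fixes that split by hand and only shows how the two sources are combined.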



Article 966

Title@2025-06-09 (1): TokenBreak: Bypassing Text Classification Models Through Token Manipulation

Title: TokenBreak: Bypassing Text Classification Models Through Token Manipulation TokenBreak: Textklassifikationsmodelle durch Token-Manipulation umgehen Token Break: 通过调制调控绕过文本分类模型 2506.07948v1

Authors (3): Kasimir Schulz, Kenneth Yeung, Kieran Evans

Natural Language Processing (NLP) models are used for text-related tasks such as classification and generation. To complete these tasks, input data is first tokenized from human-readable text into a format the model can understand, enabling it to make inferences and understand context. Text classification models can be implemented to guard against threats such as prompt injection attacks against Large Language Models (LLMs), toxic input and cybersecurity risks such as spam emails. In this paper, we introduce TokenBreak: a novel attack that can bypass these protection models by taking advantage of the tokenization strategy they use. This attack technique manipulates input text in such a way that certain models give an incorrect classification. Importantly, the end target (LLM or email recipient) can still understand and respond to the manipulated text and therefore be vulnerable to the very attack the protection model was put in place to prevent. The tokenizer is tied to model architecture, meaning it is possible to predict whether or not a model is vulnerable to attack based on family. We also present a defensive strategy as an added layer of protection that can be implemented without having to retrain the defensive model.



Article 967

Title@2025-06-09 (1): Info-Coevolution: An Efficient Framework for Data Model Coevolution

Title: Info-Coevolution: An Efficient Framework for Data Model Coevolution Info-Coevolution: Ein effizienter Rahmen für die Datenmodellkoevolution 信息革命:数据模型革命的有效框架 2506.08070v1

Authors (9): Ziheng Qin, Hailun Xu, Wei Chee Yew, Qi Jia, Yang Luo, Kanchan Sarkar, Danhui Guan, Kai Wang, Yang You

Machine learning relies heavily on data, yet the continuous growth of real-world data poses challenges for efficient dataset construction and training. A fundamental yet unsolved question is: given our current model and data, does new data (a sample or batch) need annotation or learning? Conventional approaches retain all available data, leading to non-optimal data and training efficiency. Active learning aims to reduce data redundancy by selecting a subset of samples to annotate, but it increases pipeline complexity and introduces bias. In this work, we propose Info-Coevolution, a novel framework that efficiently enables models and data to coevolve through online selective annotation with no bias. Leveraging task-specific models (and open-source models), it selectively annotates and integrates online and web data to improve datasets efficiently. For real-world datasets like ImageNet-1K, Info-Coevolution reduces annotation and training costs by 32% without performance loss. It determines the saving ratio automatically, without tuning. It can further reduce the annotation ratio to 50% with semi-supervised learning. We also explore retrieval-based dataset enhancement using unlabeled open-source data. Code is available at https://github.com/NUS-HPC-AI-Lab/Info-Coevolution/.



Article 968

Title@2025-06-09 (1): Gradients: When Markets Meet Fine-tuning – A Distributed Approach to Model Optimisation

Title: Gradients: When Markets Meet Fine-tuning – A Distributed Approach to Model Optimisation Gradienten: Wenn Märkte Feinsteuerung treffen – Ein verteilter Ansatz zur Modelloptimierung 渐进:市场达到微调时 – – 模型优化的分配办法 2506.07940v1

Authors (1): Christopher Subia-Waud

Foundation model fine-tuning faces a fundamental challenge: existing AutoML platforms rely on single optimisation strategies that explore only a fraction of viable hyperparameter configurations. In this white paper, we introduce Gradients, a decentralised AutoML platform that transforms hyperparameter optimisation into a competitive marketplace where independent miners compete to discover optimal configurations. Economic incentives align individual exploration with collective optimisation goals, driving systematic investigation of hyperparameter regions that centralised methods miss. We evaluate our approach across 180 controlled experiments spanning diverse model architectures (70M to 70B parameters) and task types. Gradients achieves an 82.8% win rate against HuggingFace AutoTrain and 100% against TogetherAI, Databricks, and Google Cloud, with mean improvements of 11.8% and 42.1% respectively. Complex reasoning and retrieval tasks show particularly strong gains of 30-40%, whilst diffusion models achieve 23.4% improvements for person-specific generation. These results demonstrate that competitive, economically-driven approaches can systematically discover superior configurations that centralised AutoML consistently misses.



Article 969

Title@2025-06-09 (1): Reliably detecting model failures in deployment without labels

Title: Reliably detecting model failures in deployment without labels Zuverlässige Erkennung von Modellfehlern im Einsatz ohne Etiketten 可靠地发现无标签部署中的模型失败 2506.05047v2

Authors (7): Viet Nguyen, Changjian Shui, Vijay Giri, Siddarth Arya, Amol Verma, Fahad Razak, Rahul G. Krishnan

The distribution of data changes over time; models operating in dynamic environments need retraining. But knowing when to retrain, without access to labels, is an open challenge since some, but not all, shifts degrade model performance. This paper formalizes and addresses the problem of post-deployment deterioration (PDD) monitoring. We propose D3M, a practical and efficient monitoring algorithm based on the disagreement of predictive models, which achieves low false positive rates under non-deteriorating shifts and provides sample complexity bounds for high true positive rates under deteriorating shifts. Empirical results on both standard benchmarks and a real-world large-scale internal medicine dataset demonstrate the effectiveness of the framework and highlight its viability as an alert mechanism for high-stakes machine learning pipelines.



Article 970

Title@2025-06-09 (1): Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models

Title: Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models Mimicking oder Reasoning: Multi-Modal In-Context Learning in Vision-Language-Modellen neu denken 模拟或解释:在视觉语言模型中重新思考多模式内学习 2506.07936v1

Authors (7): Chengyue Huang, Yuchen Zhu, Sichen Zhu, Jingyun Xiao, Moises Andrade, Shivang Chopra, Zsolt Kira

Vision-language models (VLMs) are widely assumed to exhibit in-context learning (ICL), a property similar to that of their language-only counterparts. While recent work suggests VLMs can perform multimodal ICL (MM-ICL), studies show they often rely on shallow heuristics – such as copying or majority voting – rather than true task understanding. We revisit this assumption by evaluating VLMs under distribution shifts, where support examples come from a dataset different from the query. Surprisingly, performance often degrades with more demonstrations, and models tend to copy answers rather than learn from them. To investigate further, we propose a new MM-ICL with Reasoning pipeline that augments each demonstration with a generated rationale alongside the answer. We conduct extensive and comprehensive experiments on both perception- and reasoning-required datasets with open-source VLMs ranging from 3B to 72B and proprietary models such as Gemini 2.0. We conduct controlled studies varying shot count, retrieval method, rationale quality, and distribution. Our results show limited performance sensitivity across these factors, suggesting that current VLMs do not effectively utilize demonstration-level information as intended in MM-ICL.



Article 971

Title@2025-06-09 (1): Ensemble-Based Survival Models with the Self-Attended Beran Estimator Predictions

Title: Ensemble-Based Survival Models with the Self-Attended Beran Estimator Predictions Ensemble-basierte Überlebensmodelle mit den selbstbestimmten Beran-Estimator-Vorhersagen 与自行预测的贝兰估算器合成生存模型 2506.07933v1

Authors (5): Lev V. Utkin, Semen P. Khomets, Vlada A. Efremenko, Andrei V. Konstantinov, Natalya M. Verbova

Survival analysis predicts the time until an event of interest, such as failure or death, but faces challenges due to censored data, where some events remain unobserved. Ensemble-based models, like random survival forests and gradient boosting, are widely used but can produce unstable predictions due to variations in bootstrap samples. To address this, we propose SurvBESA (Survival Beran Estimators Self-Attended), a novel ensemble model that combines Beran estimators with a self-attention mechanism. Unlike traditional methods, SurvBESA applies self-attention to predicted survival functions, smoothing out noise by adjusting each survival function based on its similarity to neighboring survival functions. We also explore a special case using Huber’s contamination model to define attention weights, simplifying training to a quadratic or linear optimization problem. Numerical experiments show that SurvBESA outperforms state-of-the-art models. The implementation of SurvBESA is publicly available.
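The smoothing step described above can be sketched as self-attention applied directly to predicted survival curves: each curve is replaced by a similarity-weighted average of all curves in the ensemble. The softmax over negative squared distances and the fixed `temperature` are assumptions made for illustration; SurvBESA's actual attention weights (including the Huber contamination variant) are not reproduced here, and the curves are made up.

```python
import math


def attention_smooth(curves, temperature=0.1):
    """Replace each survival curve by an attention-weighted average of all curves.

    curves: list of survival functions sampled on a shared time grid.
    Weights are a softmax over negative squared distances (assumed similarity kernel).
    """
    smoothed = []
    for query in curves:
        scores = [
            -sum((q - k) ** 2 for q, k in zip(query, key)) / temperature
            for key in curves
        ]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        smoothed.append(
            [sum(w * curve[t] for w, curve in zip(weights, curves)) for t in range(len(query))]
        )
    return smoothed


# Hypothetical ensemble output: two similar curves and one outlier.
curves = [
    [1.0, 0.80, 0.50],
    [1.0, 0.78, 0.52],
    [1.0, 0.40, 0.10],
]
smoothed = attention_smooth(curves)
```

Because every output is a convex combination of the inputs, the smoothed curves stay within the range of the ensemble at each time point, and the outlier is pulled slightly toward the two curves that agree.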



Article 972

Title@2025-06-09 (1): Predicting Bad Goods Risk Scores with ARIMA Time Series: A Novel Risk Assessment Approach

Title: Predicting Bad Goods Risk Scores with ARIMA Time Series: A Novel Risk Assessment Approach Risiko-Scores mit ARIMA Time Series voraussagen: Ein neuartiger Ansatz zur Risikobewertung ARIMA 时间系列:新风险评估方法 2502.16520v3

Authors (1): Bishwajit Prasad Gond

The increasing complexity of supply chains and the rising costs associated with defective or substandard goods (bad goods) highlight the urgent need for advanced predictive methodologies to mitigate risks and enhance operational efficiency. This research presents a novel framework that integrates Time Series ARIMA (AutoRegressive Integrated Moving Average) models with a proprietary formula specifically designed to calculate bad goods after time series forecasting. By leveraging historical data patterns, including sales, returns, and capacity, the model forecasts potential quality failures, enabling proactive decision-making. ARIMA is employed to capture temporal trends in time series data, while the newly developed formula quantifies the likelihood and impact of defects with greater precision. Experimental results, validated on a dataset spanning 2022-2024 for Organic Beer-G 1 Liter, demonstrate that the proposed method outperforms traditional statistical models, such as Exponential Smoothing and Holt-Winters, in both prediction accuracy and risk evaluation. This study advances the field of predictive analytics by bridging time series forecasting, ARIMA, and risk management in supply chain quality control, offering a scalable and practical solution for minimizing losses due to bad goods.


Article 973

Title@2025-06-09 (1): Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor

Title: Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor Squeeze3D: Ihr Modell der 3D-Generation ist geheim ein extremer Neuralkompressor Squeeze3D: 您的 3D 代的模型是一个秘密的 极端神经压缩器 2506.07932v1

Authors (5): Rishit Dagli, Yushi Guan, Sankeerth Durvasula, Mohammadreza Mofayezi, Nandita Vijaykumar

We propose Squeeze3D, a novel framework that leverages implicit prior knowledge learnt by existing pre-trained 3D generative models to compress 3D data at extremely high compression ratios. Our approach bridges the latent spaces between a pre-trained encoder and a pre-trained generation model through trainable mapping networks. Any 3D model represented as a mesh, point cloud, or a radiance field is first encoded by the pre-trained encoder and then transformed (i.e. compressed) into a highly compact latent code. This latent code can effectively be used as an extremely compressed representation of the mesh or point cloud. A mapping network transforms the compressed latent code into the latent space of a powerful generative model, which is then conditioned to recreate the original 3D model (i.e. decompression). Squeeze3D is trained entirely on generated synthetic data and does not require any 3D datasets. The Squeeze3D architecture can be flexibly used with existing pre-trained 3D encoders and existing generative models. It can flexibly support different formats, including meshes, point clouds, and radiance fields. Our experiments demonstrate that Squeeze3D achieves compression ratios of up to 2187x for textured meshes, 55x for point clouds, and 619x for radiance fields while maintaining visual quality comparable to many existing methods. Squeeze3D only incurs a small compression and decompression latency since it does not involve training object-specific networks to compress an object.


Article 974

Title@2025-06-09 (1): Missing Data Imputation by Reducing Mutual Information with Rectified Flows

Title: Missing Data Imputation by Reducing Mutual Information with Rectified Flows Fehlende Daten-Imputation durch Verringerung gegenseitiger Informationen mit rektifizierten Strömen 通过减少相互信息与纠正流流 2505.11749v2

Authors (5): Jiahao Yu, Qizhen Ying, Leyang Wang, Ziyue Jiang, Song Liu

This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and their corresponding missing mask. Inspired by GAN-based approaches, which train generators to decrease the predictability of missingness patterns, our method explicitly targets the reduction of mutual information. Specifically, our algorithm iteratively minimizes the KL divergence between the joint distribution of the imputed data and missing mask, and the product of their marginals from the previous iteration. We show that the optimal imputation under this framework corresponds to solving an ODE, whose velocity field minimizes a rectified flow training objective. We further illustrate that some existing imputation techniques can be interpreted as approximate special cases of our mutual-information-reducing framework. Comprehensive experiments on synthetic and real-world datasets validate the efficacy of our proposed approach, demonstrating superior imputation performance.
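
For orientation, the quantities the abstract refers to can be written in their standard textbook forms. This is a sketch under assumptions: the symbols $X$ (imputed data), $M$ (missing mask), $X_0$, $X_1$, $Z_t$, and $v$ are notation chosen here, and the paper's exact objective may differ from the generic rectified flow loss shown.

```latex
% Mutual information as a KL divergence between the joint and the product of marginals
I(X; M) = \mathrm{KL}\big( p_{X,M} \,\|\, p_X \otimes p_M \big)

% Generic rectified flow: straight-line interpolation and velocity regression
X_t = (1 - t)\, X_0 + t\, X_1, \qquad t \in [0, 1]
\min_{v} \; \mathbb{E}_{t \sim \mathcal{U}[0,1]} \, \mathbb{E}\, \big\| (X_1 - X_0) - v(X_t, t) \big\|^2

% Samples are transported by integrating the learned velocity field
\frac{\mathrm{d} Z_t}{\mathrm{d} t} = v(Z_t, t)
```

The mutual information vanishes exactly when the data and the mask are independent, which is why driving it down makes the missingness pattern unpredictable from the imputed values.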


Article 975

Title@2025-06-09 (1): A Generative Physics-Informed Reinforcement Learning-Based Approach for Construction of Representative Drive Cycle

Title: A Generative Physics-Informed Reinforcement Learning-Based Approach for Construction of Representative Drive Cycle Ein generativer physik-informierter, lernbasierter Ansatz zur Konstruktion repräsentativer Antriebszyklen 为建造具有代表性的驱动周期而采用基于学习的创形物理综合强化强化学习方法 2506.07929v1

Authors (4): Amirreza Yasami, Mohammadali Tofigh, Mahdi Shahbakhti, Charles Robert Koch

Accurate driving cycle construction is crucial for vehicle design, fuel economy analysis, and environmental impact assessments. This paper introduces a generative Physics-Informed Expected SARSA-Monte Carlo (PIESMC) approach that constructs representative driving cycles by capturing transient dynamics, acceleration, deceleration, idling, and road grade transitions while ensuring model fidelity. Leveraging a physics-informed reinforcement learning framework with Monte Carlo sampling, PIESMC delivers efficient cycle construction with reduced computational cost. Experimental evaluations on two real-world datasets demonstrate that PIESMC replicates key kinematic and energy metrics, achieving up to a 57.3% reduction in cumulative kinematic fragment errors compared to the Micro-trip-based (MTB) method and a 10.5% reduction relative to the Markov-chain-based (MCB) method. Moreover, it is nearly an order of magnitude faster than conventional techniques. Analyses of vehicle-specific power distributions and wavelet-transformed frequency content further confirm its ability to reproduce experimental central tendencies and variability.
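
The Expected SARSA component named in the abstract can be illustrated with the generic tabular update rule. This is a sketch of the textbook algorithm only: the state and action names, the reward, and the fixed policy below are hypothetical, and the physics-informed terms and Monte Carlo sampling of PIESMC are not modelled.

```python
def expected_sarsa_update(q, state, action, reward, next_state, policy,
                          alpha=0.1, gamma=0.9):
    """One tabular Expected SARSA step.

    The target uses the expectation of Q over the policy's action
    probabilities in the next state instead of a single sampled action.
    """
    expected_next = sum(
        policy[next_state][a] * q[next_state][a] for a in q[next_state]
    )
    td_target = reward + gamma * expected_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical two-state driving example.
q = {
    "cruise": {"accelerate": 0.0, "brake": 0.0},
    "idle": {"accelerate": 1.0, "brake": 3.0},
}
policy = {
    "cruise": {"accelerate": 0.5, "brake": 0.5},
    "idle": {"accelerate": 0.5, "brake": 0.5},
}
new_value = expected_sarsa_update(
    q, "cruise", "brake", reward=1.0, next_state="idle",
    policy=policy, alpha=0.5, gamma=0.9,
)
```

Averaging over the next action removes the sampling variance of plain SARSA, which is one reason the expected variant is often preferred in small tabular settings.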


Article 976

Title@2025-06-09 (1): Solving Inequality Proofs with Large Language Models

Title: Solving Inequality Proofs with Large Language Models Unequality Proofs mit großen Sprachmodellen lösen 以大语言模式解决不平等证据 2506.07927v1

Authors (7): Jiayi Sheng, Luna Lyu, Jikai Jin, Tony Xia, Alex Gu, James Zou, Pan Lu

Inequality proving, crucial across diverse scientific and mathematical fields, tests advanced reasoning skills such as discovering tight bounds and strategic theorem application. This makes it a distinct, demanding frontier for large language models (LLMs), offering insights beyond general mathematical problem-solving. Progress in this area is hampered by existing datasets that are often scarce, synthetic, or rigidly formal. We address this by proposing an informal yet verifiable task formulation, recasting inequality proving into two automatically checkable subtasks: bound estimation and relation prediction. Building on this, we release IneqMath, an expert-curated dataset of Olympiad-level inequalities, including a test set and training corpus enriched with step-wise solutions and theorem annotations. We also develop a novel LLM-as-judge evaluation framework, combining a final-answer judge with four step-wise judges designed to detect common reasoning flaws. A systematic evaluation of 29 leading LLMs on IneqMath reveals a surprising reality: even top models like o1 achieve less than 10% overall accuracy under step-wise scrutiny; this is a drop of up to 65.5% from their accuracy considering only final answer equivalence. This discrepancy exposes fragile deductive chains and a critical gap for current LLMs between merely finding an answer and constructing a rigorous proof. Scaling model size and increasing test-time computation yield limited gains in overall proof correctness. Instead, our findings highlight promising research directions such as theorem-guided reasoning and self-refinement. Code and data are available at https://ineqmath.github.io/.


Article 977

Title@2025-06-09 (1): A Comparative Study of U-Net Architectures for Change Detection in Satellite Images

Title: A Comparative Study of U-Net Architectures for Change Detection in Satellite Images Eine vergleichende Studie über U-Net-Architekturen zur Erkennung von Veränderungen in Satellitenbildern U-Net图象变化探测网络结构比较研究 2506.07925v1

Authors (3): Yaxita Amin, Naimisha S Trivedi, Rashmi Bhattad

Remote sensing change detection is essential for monitoring the ever-changing landscapes of the Earth. The U-Net architecture has gained popularity for its capability to capture spatial information and perform pixel-wise classification. However, its application in the remote sensing field remains largely unexplored. This paper fills the gap by conducting a comprehensive analysis of 34 papers. The study compares and analyzes 18 different U-Net variations, assessing their potential for detecting changes in remote sensing. We evaluate the benefits and drawbacks of each variation within the framework of this particular application. We emphasize variations that are explicitly built for change detection, such as Siamese Swin-U-Net, which utilizes a Siamese architecture. The analysis highlights the significance of aspects such as managing data from different time periods and capturing long-range relationships to enhance the precision of change detection. This study provides valuable insights for researchers and practitioners who choose U-Net versions for remote sensing change detection tasks.


Article 978

Title@2025-06-09 (1): W4S4: WaLRUS Meets S4 for Long-Range Sequence Modeling

Title: W4S4: WaLRUS Meets S4 for Long-Range Sequence Modeling W4S4: WaLRUS trifft S4 für lange Sequenzmodellierung W4S4:WALRUS在长程序列建模中与S4相匹配 2506.07920v1

Authors (3): Hossein Babaei, Mel White, Richard G. Baraniuk

State Space Models (SSMs) have emerged as powerful components for sequence modeling, enabling efficient handling of long-range dependencies via linear recurrence and convolutional computation. However, their effectiveness depends heavily on the choice and initialization of the state matrix. In this work, we build on the SaFARi framework and existing WaLRUS SSMs to introduce W4S4 (WaLRUS for S4), a new class of SSMs constructed from redundant wavelet frames. WaLRUS admits a stable diagonalization and supports fast kernel computation without requiring low-rank approximations, making it both theoretically grounded and computationally efficient. We show that WaLRUS retains information over long horizons significantly better than HiPPO-based SSMs, both in isolation and when integrated into deep architectures such as S4. Our experiments demonstrate consistent improvements across delay reconstruction tasks, classification benchmarks, and long-range sequence modeling, confirming that high-quality, structured initialization enabled by wavelet-based state dynamics offers substantial advantages over existing alternatives. WaLRUS provides a scalable and versatile foundation for the next generation of deep SSM-based models.
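
The equivalence between the linear recurrence and the convolutional view of an SSM can be checked on a toy example. The sketch below assumes a generic discrete-time SSM with a diagonal state matrix, $x_k = A x_{k-1} + B u_k$ and $y_k = C x_k$; the parameter values are arbitrary placeholders and have nothing to do with the wavelet-frame matrices used by WaLRUS.

```python
def ssm_recurrent(a, b, c, inputs):
    """Run x_k = a * x_{k-1} + b * u_k, y_k = c . x_k with a diagonal state matrix."""
    state = [0.0] * len(a)
    outputs = []
    for u in inputs:
        state = [a_i * x_i + b_i * u for a_i, x_i, b_i in zip(a, state, b)]
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return outputs


def ssm_kernel(a, b, c, length):
    """Convolution kernel K_j = sum_i c_i * a_i**j * b_i for j = 0..length-1."""
    return [
        sum(c_i * (a_i ** j) * b_i for a_i, b_i, c_i in zip(a, b, c))
        for j in range(length)
    ]


def ssm_convolutional(a, b, c, inputs):
    """Compute the same outputs as a causal convolution y_k = sum_j K_j * u_{k-j}."""
    kernel = ssm_kernel(a, b, c, len(inputs))
    return [
        sum(kernel[j] * inputs[k - j] for j in range(k + 1))
        for k in range(len(inputs))
    ]


# Placeholder diagonal SSM with two stable modes.
a, b, c = [0.9, 0.5], [1.0, 1.0], [0.3, 0.7]
signal = [1.0, 0.0, 2.0, -1.0, 0.5]
y_recurrent = ssm_recurrent(a, b, c, signal)
y_convolutional = ssm_convolutional(a, b, c, signal)
```

A diagonal state matrix makes the kernel a sum of geometric sequences, which is why a stable diagonalization allows the kernel to be computed quickly.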


Article 979

Title@2025-06-09 (1): Uncovering the Functional Roles of Nonlinearity in Memory

Title: Uncovering the Functional Roles of Nonlinearity in Memory Enthüllen der funktionalen Rollen der Nichtlinearität im Gedächtnis 发挥记忆中非线性的作用 2506.07919v1

Authors (2): Manuel Brenner, Georgia Koppe

Memory and long-range temporal processing are core requirements for sequence modeling tasks across natural language processing, time-series forecasting, speech recognition, and control. While nonlinear recurrence has long been viewed as essential for enabling such mechanisms, recent work suggests that linear dynamics may often suffice. In this study, we go beyond performance comparisons to systematically dissect the functional role of nonlinearity in recurrent networks, identifying both when it is computationally necessary and what mechanisms it enables. We use Almost Linear Recurrent Neural Networks (AL-RNNs), which allow fine-grained control over nonlinearity, as both a flexible modeling tool and a probe into the internal mechanisms of memory. Across a range of classic sequence modeling tasks and a real-world stimulus selection task, we find that minimal nonlinearity is not only sufficient but often optimal, yielding models that are simpler, more robust, and more interpretable than their fully nonlinear or linear counterparts. Our results provide a principled framework for selectively introducing nonlinearity, bridging dynamical systems theory with the functional demands of long-range memory and structured computation in recurrent neural networks, with implications for both artificial and biological neural systems.


Article 980

Title@2025-06-09 (1): CausalPFN: Amortized Causal Effect Estimation via In-Context Learning

Title: CausalPFN: Amortized Causal Effect Estimation via In-Context Learning CausalPFN: Amortisierte Abschätzung des Kausaleffekts durch In-Context Learning CausalPPFN: 通过内文学习对因果效应的摊销估计 2506.07918v1

Authors (7): Vahid Balazadeh, Hamidreza Kamkari, Valentin Thomas, Benson Li, Junwei Ma, Jesse C. Cresswell, Rahul G. Krishnan

Causal effect estimation from observational data is fundamental across various applications. However, selecting an appropriate estimator from dozens of specialized methods demands substantial manual effort and domain expertise. We present CausalPFN, a single transformer that amortizes this workflow: trained once on a large library of simulated data-generating processes that satisfy ignorability, it infers causal effects for new observational datasets out-of-the-box. CausalPFN combines ideas from Bayesian causal inference with the large-scale training protocol of prior-fitted networks (PFNs), learning to map raw observations directly to causal effects without any task-specific adjustment. Our approach achieves superior average performance on heterogeneous and average treatment effect estimation benchmarks (IHDP, Lalonde, ACIC). Moreover, it shows competitive performance for real-world policy making on uplift modeling tasks. CausalPFN provides calibrated uncertainty estimates to support reliable decision-making based on Bayesian principles. This ready-to-use model does not require any further training or tuning and takes a step toward automated causal inference (https://github.com/vdblm/CausalPFN).


Article 981

Title@2025-06-09 (1): Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces

Title: Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces Diffuse Alles: Multimodale Diffusionsmodelle auf willkürlichen Staatsräumen Diffuse Everything:关于任意国家空间的多模式扩散模型 2506.07903v1

Authors (5): Kevin Rojas, Yuchen Zhu, Sichen Zhu, Felix X. -F. Ye, Molei Tao

Diffusion models have demonstrated remarkable performance in generating unimodal data across various tasks, including image, video, and text generation. In contrast, the joint generation of multimodal data through diffusion models is still in the early stages of exploration. Existing approaches rely heavily on external preprocessing protocols, such as tokenizers and variational autoencoders, to harmonize varied data representations into a unified, unimodal format. This process depends heavily on highly accurate encoders and decoders, which can be problematic for applications with limited data. To lift this restriction, we propose a novel framework for building multimodal diffusion models on arbitrary state spaces, enabling native generation of coupled data across different modalities. By introducing an innovative decoupled noise schedule for each modality, we enable both unconditional and modality-conditioned generation within a single model simultaneously. We empirically validate our approach for text-image generation and mixed-type tabular data synthesis, demonstrating that it achieves competitive performance.


Article 982

Title@2025-06-09 (1): FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling

Title: FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling FunDiff: Diffusionsmodelle über Funktionsräume für physikinformierte Generative Modellierung FunDiff:物理成形生成模型的功能空间上的传播模型 2506.07902v1

Authors (4): Sifan Wang, Zehao Dou, Tong-Rui Liu, Lu Lu

Recent advances in generative modeling – particularly diffusion models and flow matching – have achieved remarkable success in synthesizing discrete data such as images and videos. However, adapting these models to physical applications remains challenging, as the quantities of interest are continuous functions governed by complex physical laws. Here, we introduce FunDiff, a novel framework for generative modeling in function spaces. FunDiff combines a latent diffusion process with a function autoencoder architecture to handle input functions with varying discretizations, generate continuous functions evaluable at arbitrary locations, and seamlessly incorporate physical priors. These priors are enforced through architectural constraints or physics-informed loss functions, ensuring that generated samples satisfy fundamental physical laws. We theoretically establish minimax optimality guarantees for density estimation in function spaces, showing that diffusion-based estimators achieve optimal convergence rates under suitable regularity conditions. We demonstrate the practical effectiveness of FunDiff across diverse applications in fluid dynamics and solid mechanics. Empirical results show that our method generates physically consistent samples with high fidelity to the target distribution and exhibits robustness to noisy and low-resolution data. Code and datasets are publicly available at https://github.com/sifanexisted/fundiff.


Article 983

Title@2025-06-09 (1): MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs

Title: MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs MEMOIR: Lebenslange Modellbearbeitung mit minimalem Überschreiben und informierter Aufbewahrung für LLMs MEMOIR: 长寿命模型编辑,以最小覆盖和知情保留LLMM 2506.07899v1

Authors (5): Ke Wang, Yiming Qin, Nikolaos Dimitriadis, Alessandro Favero, Pascal Frossard

Language models deployed in real-world systems often require post-hoc updates to incorporate new or corrected knowledge. However, editing such models efficiently and reliably - without retraining or forgetting previous information - remains a major challenge. Existing methods for lifelong model editing either compromise generalization, interfere with past edits, or fail to scale to long editing sequences. We propose MEMOIR, a novel scalable framework that injects knowledge through a residual memory, i.e., a dedicated parameter module, while preserving the core capabilities of the pre-trained model. By sparsifying input activations through sample-dependent masks, MEMOIR confines each edit to a distinct subset of the memory parameters, minimizing interference among edits. At inference, it identifies relevant edits by comparing the sparse activation patterns of new queries to those stored during editing. This enables generalization to rephrased queries by activating only the relevant knowledge while suppressing unnecessary memory activation for unrelated prompts. Experiments on question answering, hallucination correction, and out-of-distribution generalization benchmarks across LLaMA-3 and Mistral demonstrate that MEMOIR achieves state-of-the-art performance across reliability, generalization, and locality metrics, scaling to thousands of sequential edits with minimal forgetting.


Article 984

Title@2025-06-09 (1): GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution

Title: GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution GaussianVAE: Adaptive Lerndynamik von 3D Gaussianern für High-Fidelity Super-Resolution GausianVAE: 高频高频高频高频高频高分辨率3D的适应性学习动力 2506.07897v1

Authors (3): Shuja Khalid, Mohamed Ibrahim, Yang Liu

We present a novel approach for enhancing the resolution and geometric fidelity of 3D Gaussian Splatting (3DGS) beyond native training resolution. Current 3DGS methods are fundamentally limited by their input resolution, producing reconstructions that cannot extrapolate finer details than are present in the training views. Our work breaks this limitation through a lightweight generative model that predicts and refines additional 3D Gaussians where needed most. The key innovation is our Hessian-assisted sampling strategy, which intelligently identifies regions that are likely to benefit from densification, ensuring computational efficiency. Unlike computationally intensive GANs or diffusion approaches, our method operates in real-time (0.015s per inference on a single consumer-grade GPU), making it practical for interactive applications. Comprehensive experiments demonstrate significant improvements in both geometric accuracy and rendering quality compared to state-of-the-art methods, establishing a new paradigm for resolution-free 3D scene enhancement.


Article 985

Title@2025-06-09 (1): Learning to Stop: Deep Learning for Mean Field Optimal Stopping

Title: Learning to Stop: Deep Learning for Mean Field Optimal Stopping Stoppen lernen: Deep Learning für das mittlere Feld Optimales Stoppen 学习到停止:为平均字段进行深度学习 2410.08850v2

Authors (3): Lorenzo Magnino, Yuchen Zhu, Mathieu Laurière

Optimal stopping is a fundamental problem in optimization with applications in risk management, finance, robotics, and machine learning. We extend the standard framework to a multi-agent setting, named multi-agent optimal stopping (MAOS), where agents cooperate to make optimal stopping decisions in a finite-space, discrete-time environment. Since solving MAOS becomes computationally prohibitive when the number of agents is very large, we study the mean-field optimal stopping (MFOS) problem, obtained as the number of agents tends to infinity. We establish that MFOS provides a good approximation to MAOS and prove a dynamic programming principle (DPP) based on mean-field control theory. We then propose two deep learning approaches: one that learns optimal stopping decisions by simulating full trajectories and another that leverages the DPP to compute the value function and to learn the optimal stopping rule using backward induction. Both methods train neural networks to approximate optimal stopping policies. We demonstrate the effectiveness and the scalability of our work through numerical experiments on 6 different problems in spatial dimension up to 300. To the best of our knowledge, this is the first work to formalize and computationally solve MFOS in discrete time and finite space, opening new directions for scalable MAOS methods.
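
Backward induction for optimal stopping is easiest to see in the classical single-agent case. The sketch below is a deliberate simplification: it solves a finite-state, finite-horizon, reward-maximizing stopping problem for one agent with an exact table, whereas the paper works with a mean-field population and neural network approximations. The transition matrix, payoffs, and function name are hypothetical.

```python
def backward_induction(transition, payoff, horizon):
    """Solve V_t(s) = max(payoff(s), sum_s' P(s, s') * V_{t+1}(s')) backwards in time.

    Returns the value tables and the stopping decisions for t = 0..horizon.
    """
    n = len(payoff)
    values = [None] * (horizon + 1)
    stop = [None] * (horizon + 1)
    values[horizon] = list(payoff)   # at the final time the agent must stop
    stop[horizon] = [True] * n
    for t in range(horizon - 1, -1, -1):
        continuation = [
            sum(transition[s][s2] * values[t + 1][s2] for s2 in range(n))
            for s in range(n)
        ]
        values[t] = [max(payoff[s], continuation[s]) for s in range(n)]
        stop[t] = [payoff[s] >= continuation[s] for s in range(n)]
    return values, stop


# Hypothetical three-state random walk with a large payoff in the last state.
transition = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
payoff = [0.0, 1.0, 4.0]
values, stop = backward_induction(transition, payoff, horizon=2)
```

Here the agent stops immediately only in the high-payoff state and waits elsewhere, since the expected continuation value exceeds the immediate payoff in the other two states.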


Article 986

Title@2025-06-09 (1): SMRS: advocating a unified reporting standard for surrogate models in the artificial intelligence era

Title: SMRS: advocating a unified reporting standard for surrogate models in the artificial intelligence era SMRS: Für einen einheitlichen Berichterstattungsstandard für Surrogatmodelle im Zeitalter der künstlichen Intelligenz SMRS:倡导人工智能时代代用模型的统一报告标准 2502.06753v2

Authors (5): Elizaveta Semenova, Alisa Sheinkman, Timothy James Hitge, Siobhan Mackenzie Hall, Jon Cockayne

Surrogate models are widely used to approximate complex systems across science and engineering to reduce computational costs. Despite their widespread adoption, the field lacks standardisation across key stages of the modelling pipeline, including data sampling, model selection, evaluation, and downstream analysis. This fragmentation limits reproducibility and cross-domain utility – a challenge further exacerbated by the rapid proliferation of AI-driven surrogate models. We argue for the urgent need to establish a structured reporting standard, the Surrogate Model Reporting Specification (SMRS), that systematically captures essential design and evaluation choices while remaining agnostic to implementation specifics. By promoting a standardised yet flexible framework, we aim to improve the reliability of surrogate modelling, foster interdisciplinary knowledge transfer, and, as a result, accelerate scientific progress in the AI era.


Article 987

Title@2025-06-09 (1): SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark

Title: SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark SoK: Datenrekonstruktion greift gegen Modelle des maschinellen Lernens an: Definition, Metrics und Benchmark SoK:对机器学习模式进行数据重建攻击:定义、计量和基准 2506.07888v1

Authors (4): Rui Wen, Yiyong Liu, Michael Backes, Yang Zhang

Data reconstruction attacks, which aim to recover the training dataset of a target model with limited access, have gained increasing attention in recent years. However, there is currently no consensus on a formal definition of data reconstruction attacks or appropriate evaluation metrics for measuring their quality. This lack of rigorous definitions and universal metrics has hindered further advancement in this field. In this paper, we address this issue in the vision domain by proposing a unified attack taxonomy and formal definitions of data reconstruction attacks. We first propose a set of quantitative evaluation metrics that consider important criteria such as quantifiability, consistency, precision, and diversity. Additionally, we leverage large language models (LLMs) as a substitute for human judgment, enabling visual evaluation with an emphasis on high-quality reconstructions. Using our proposed taxonomy and metrics, we present a unified framework for systematically evaluating the strengths and limitations of existing attacks and establishing a benchmark for future research. Empirical results, primarily from a memorization perspective, not only validate the effectiveness of our metrics but also offer valuable insights for designing new attacks.


Article 988

Title@2025-06-09 (1): SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection

Title: SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection SimLTD: Einfache, überwachte und halbüberwachte Langzeitobjekterkennung SimLTD: 简单监视和半监视长途远航物体探测 2412.20047v3

Authors (1): Phi Vu Tran

While modern visual recognition systems have made significant advancements, many continue to struggle with the open problem of learning from few exemplars. This paper focuses on the task of object detection in the setting where object classes follow a natural long-tailed distribution. Existing methods for long-tailed detection resort to external ImageNet labels to augment the low-shot training instances. However, such dependency on a large labeled database has limited utility in practical scenarios. We propose a versatile and scalable approach to leverage optional unlabeled images, which are easy to collect without the burden of human annotations. Our SimLTD framework is straightforward and intuitive, and consists of three simple steps: (1) pre-training on abundant head classes; (2) transfer learning on scarce tail classes; and (3) fine-tuning on a sampled set of both head and tail classes. Our approach can be viewed as an improved head-to-tail model transfer paradigm without the added complexities of meta-learning or knowledge distillation, as was required in past research. By harnessing supplementary unlabeled images, without extra image labels, SimLTD establishes new record results on the challenging LVIS v1 benchmark across both supervised and semi-supervised settings.


Article 989

Title@2025-06-09 (1): Schauder Bases for $C[0, 1]$ Using ReLU, Softplus and Two Sigmoidal Functions

Title: Schauder Bases for $C[0, 1]$ Using ReLU, Softplus and Two Sigmoidal Functions Schauder-Basis für $C[0, 1]$ mit ReLU, Softplus und zwei Sigmoidal-Funktionen 使用 ReLU、 Softplus 和两个悬浮函数的 Schauder 基地[0, 1]$C[0, 1]美元 2506.07884v1

Authors (3): Anand Ganesh, Babhrubahan Bose, Anand Rajagopalan

We construct four Schauder bases for the space $C[0,1]$, one using ReLU functions, another using Softplus functions, and two more using sigmoidal versions of the ReLU and Softplus functions. This establishes the existence of a basis using these functions for the first time, and improves on the universal approximation property associated with them.
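
As background, the classical Faber-Schauder system already shows how ReLU functions relate to a Schauder basis of $C[0,1]$: each hat function is a combination of three shifted ReLUs. The sketch below illustrates only this classical fact; the paper's own constructions, including the Softplus and sigmoidal variants, may differ, and the function names are chosen here for illustration.

```python
def relu(x):
    return max(0.0, x)


def hat(x):
    """Hat function on [0, 1] with peak 1 at x = 0.5, built from three ReLUs."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)


def faber_schauder(level, index, x):
    """Dilated and shifted hat supported on [index / 2**level, (index + 1) / 2**level]."""
    return hat((2 ** level) * x - index)


# The hat vanishes outside [0, 1] and peaks at the midpoint of its support.
peak = hat(0.5)
left_peak = faber_schauder(1, 0, 0.25)
right_peak = faber_schauder(1, 1, 0.75)
```

Together with the functions $1$ and $x$, these dilated hats for all levels and indices form the classical Faber-Schauder basis.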


Article 990

Title@2025-06-09 (1): Diffusion Counterfactual Generation with Semantic Abduction

Title: Diffusion Counterfactual Generation with Semantic Abduction Diffusion Gegenfaktische Generation mit semantischer Abduktion 以语义绑架方式传播反实际生成 2506.07883v1

Authors (5): Rajat Rasal, Avinash Kori, Fabio De Sousa Ribeiro, Tian Xia, Ben Glocker

Counterfactual image generation presents significant challenges, including preserving identity, maintaining perceptual quality, and ensuring faithfulness to an underlying causal model. While existing auto-encoding frameworks admit semantic latent spaces which can be manipulated for causal control, they struggle with scalability and fidelity. Advancements in diffusion models present opportunities for improving counterfactual image editing, having demonstrated state-of-the-art visual quality, human-aligned perception and representation learning capabilities. Here, we present a suite of diffusion-based causal mechanisms, introducing the notions of spatial, semantic and dynamic abduction. We propose a general framework that integrates semantic representations into diffusion models through the lens of Pearlian causality to edit images via a counterfactual reasoning process. To our knowledge, this is the first work to consider high-level semantic identity preservation for diffusion counterfactuals and to demonstrate how semantic control enables principled trade-offs between faithful causal control and identity preservation.


Article 991

Title@2025-06-09 (1): On Hypothesis Transfer Learning of Functional Linear Models

Title: On Hypothesis Transfer Learning of Functional Linear Models Über Hypothesentransfer Lernen von funktionalen Linearmodellen 关于职能线性模型学习的假设转移 2206.04277v5

Authors (2): Haotian Lin, Matthew Reimherr

We study transfer learning (TL) for functional linear regression (FLR) under the Reproducing Kernel Hilbert Space (RKHS) framework, observing that the TL techniques in existing high-dimensional linear regression are not compatible with the truncation-based FLR methods, as functional data are intrinsically infinite-dimensional and generated by smooth underlying processes. We measure the similarity across tasks using RKHS distance, allowing the type of information being transferred to be tied to the properties of the imposed RKHS. Building on the hypothesis offset transfer learning paradigm, two algorithms are proposed: one conducts the transfer when positive sources are known, while the other leverages aggregation techniques to achieve robust transfer without prior information about the sources. We establish asymptotic lower bounds for this learning problem and show that the proposed algorithms enjoy a matching upper bound. These analyses provide statistical insights into factors that contribute to the dynamics of the transfer. We also extend the results to functional generalized linear models. The effectiveness of the proposed algorithms is demonstrated via extensive synthetic data as well as real-world data applications.


Article 992

Title@2025-06-09 (1): GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity

Title: GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity GradSkip: Kommunikationsbeschleunigte lokale Gradientenmethoden mit besserer Computational Complexity GradSkip:通讯加速的地方渐进法,具有更好的计算复杂度 2210.16402v3

Authors (3): Artavazd Maranjyan, Mher Safaryan, Peter Richtárik

We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing clients to perform multiple local gradient-type training steps before communication. In a recent breakthrough, Mishchenko et al. (2022) proved that local training, when properly executed, leads to provable communication acceleration, and this holds in the strongly convex regime without relying on any data similarity assumptions. However, their ProxSkip method requires all clients to take the same number of local training steps in each communication round. We propose a redesign of the ProxSkip method, allowing clients with "less important" data to get away with fewer local training steps without impacting the overall communication complexity of the method. In particular, we prove that our modified method, GradSkip, converges linearly under the same assumptions and has the same accelerated communication complexity, while the number of local gradient steps can be reduced relative to a local condition number. We further generalize our method by extending the randomness of probabilistic alternations to arbitrary unbiased compression operators and by considering a generic proximable regularizer. This generalization, which we call GradSkip+, recovers several related methods in the literature as special cases. Finally, we present an empirical study on carefully designed toy problems that confirms our theoretical claims.


Article 993

Title@2025-06-09 (1): SIFBench: An Extensive Benchmark for Fatigue Analysis

Title: SIFBench: An Extensive Benchmark for Fatigue Analysis SIFBench: Ein umfassender Benchmark für die Ermüdungsanalyse SIFBench:关于 “ 致命分析 “ 的广泛基准 2506.01173v2

Authors (4): Tushar Gautam, Robert M. Kirby, Jacob Hochhalter, Shandian Zhe

Fatigue-induced crack growth is a leading cause of structural failure across critical industries such as aerospace, civil engineering, automotive, and energy. Accurate prediction of stress intensity factors (SIFs) – the key parameters governing crack propagation in linear elastic fracture mechanics – is essential for assessing fatigue life and ensuring structural integrity. While machine learning (ML) has shown great promise in SIF prediction, its advancement has been severely limited by the lack of rich, transparent, well-organized, and high-quality datasets. To address this gap, we introduce SIFBench, an open-source, large-scale benchmark database designed to support ML-based SIF prediction. SIFBench contains over 5 million different crack and component geometries derived from high-fidelity finite element simulations across 37 distinct scenarios, and provides a unified Python interface for seamless data access and customization. We report baseline results using a range of popular ML models – including random forests, support vector machines, feedforward neural networks, and Fourier neural operators – alongside comprehensive evaluation metrics and template code for model training, validation, and assessment. By offering a standardized and scalable resource, SIFBench substantially lowers the entry barrier and fosters the development and application of ML methods in damage tolerance design and predictive maintenance.


Article 994

Title@2025-06-09 (1): Algorithmic Aspects of Strategic Trading

Title: Algorithmic Aspects of Strategic Trading Algorithmische Aspekte des strategischen Handels 战略贸易的算法方面 2502.07606v2

Authors (2): Michael Kearns, Mirah Shi

Algorithmic trading in modern financial markets is widely acknowledged to exhibit strategic, game-theoretic behaviors whose complexity can be difficult to model. A recent series of papers (Chriss, 2024a,b,c, 2025) has made progress in the setting of trading for position building. Here parties wish to buy or sell a fixed number of shares in a fixed time period in the presence of both temporary and permanent market impact, resulting in exponentially large strategy spaces. While these papers primarily consider the existence and structural properties of equilibrium strategies, in this work we focus on the algorithmic aspects of the proposed model. We give an efficient algorithm for computing best responses, and show that while the temporary-impact-only setting yields a potential game, best response dynamics do not generally converge in the general setting, for which no fast algorithm for (Nash) equilibrium computation is known. This leads us to consider the broader notion of Coarse Correlated Equilibria (CCE), which we show can be computed efficiently via an implementation of Follow the Perturbed Leader (FTPL). We illustrate the model and our results with an experimental investigation, where FTPL exhibits interesting behavior in different regimes of the relative weighting between temporary and permanent market impact.
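
As a generic illustration of Follow the Perturbed Leader (not the paper's trading-specific implementation), the sketch below runs FTPL over a small finite action set. Each round it adds fresh random noise to the cumulative payoffs and plays the action with the largest perturbed total. The function names, the exponential noise, and the `eta` scale parameter are assumptions made for this example.

```python
import random


def ftpl_choose(cumulative_payoffs, eta, rng):
    """Pick the action whose perturbed cumulative payoff is largest.

    Noise is exponential with mean `eta` (an illustrative choice).
    """
    perturbed = [
        total + rng.expovariate(1.0 / eta) for total in cumulative_payoffs
    ]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


def run_ftpl(payoff_rounds, eta=1.0, seed=0):
    """Play FTPL against a fixed sequence of payoff vectors.

    Returns the list of chosen actions, one per round.
    """
    rng = random.Random(seed)
    n_actions = len(payoff_rounds[0])
    cumulative = [0.0] * n_actions
    choices = []
    for payoffs in payoff_rounds:
        choices.append(ftpl_choose(cumulative, eta, rng))
        # Full-information update: every action's payoff is observed.
        for a in range(n_actions):
            cumulative[a] += payoffs[a]
    return choices


# Toy run: action 2 always pays the most, so FTPL should settle on it.
rounds = [[0.0, 0.5, 1.0]] * 200
choices = run_ftpl(rounds, eta=1.0, seed=0)
late_share = choices[100:].count(2) / 100
```

In a game setting, each player would run such a learner against the payoffs induced by the other players' choices; the empirical distribution of joint play is what approximates a coarse correlated equilibrium.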


Article 995

Title@2025-06-09 (1): TS-RSR: A provably efficient approach for batch Bayesian Optimization

Title: TS-RSR: A provably efficient approach for batch Bayesian Optimization TS-RSR: Ein nachweislich effizienter Ansatz für die Batch Bayesian Optimierung TS-RSR:对Bayesian的批量最佳化采取可衡量有效办法 2403.04764v4

Authors (2): Zhaolin Ren, Na Li

This paper presents a new approach for batch Bayesian Optimization (BO) called Thompson Sampling-Regret to Sigma Ratio directed sampling (TS-RSR), where we sample a new batch of actions by minimizing a Thompson Sampling approximation of a regret to uncertainty ratio. Our sampling objective is able to coordinate the actions chosen in each batch in a way that minimizes redundancy between points whilst focusing on points with high predictive means or high uncertainty. Theoretically, we provide rigorous convergence guarantees on our algorithm’s regret, and numerically, we demonstrate that our method attains state-of-the-art performance on a range of challenging synthetic and realistic test functions, where it outperforms several competitive benchmark batch BO algorithms.


Article 996

Title@2025-06-09 (1): Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models?

Title: Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models? Können Hessian-Based Insights Fehlerdiagnosen in aufmerksamkeitsbasierten Modellen unterstützen? 以海珊为基地的洞察能支持以关注为基础的模型中的过失诊断吗? 2506.07871v1

Authors (2): Sigma Jahan, Mohammad Masudur Rahman

As attention-based deep learning models scale in size and complexity, diagnosing their faults becomes increasingly challenging. In this work, we conduct an empirical study to evaluate the potential of Hessian-based analysis for diagnosing faults in attention-based models. Specifically, we use Hessian-derived insights to identify fragile regions (via curvature analysis) and parameter interdependencies (via parameter interaction analysis) within attention mechanisms. Through experiments on three diverse models (HAN, 3D-CNN, DistilBERT), we show that Hessian-based metrics can localize instability and pinpoint fault sources more effectively than gradients alone. Our empirical findings suggest that these metrics could significantly improve fault diagnosis in complex neural architectures, potentially improving software debugging practices.


Article 997

Title@2025-06-09 (1): Accurate INT8 Training Through Dynamic Block-Level Fallback

Title: Accurate INT8 Training Through Dynamic Block-Level Fallback Genaues INT8-Training durch dynamischen Block-Level-Fallback 通过动态区档级后退进行准确的INT8培训 2503.08040v3

Authors (5): Pengle Zhang, Jia Wei, Jintao Zhang, Jun Zhu, Jianfei Chen

Transformer models have achieved remarkable success across various AI applications but face significant training costs. Low-bit training, such as INT8 training, can leverage computational units with higher throughput, and has already demonstrated its effectiveness on GPT2 models with block-level quantization. However, it struggles with modern Transformer variants incorporating GLU units. This is because those variants demonstrate complex distributions of activation outliers. To address the challenge, we propose Fallback Quantization, implementing mixed-precision GEMM that dynamically falls back from 8-bit to 16-bit for activation blocks containing outliers. Experiments show that our approach performs robustly in both fine-tuning and pretraining settings. Moreover, our method achieves a 1.57x end-to-end training speedup on RTX4090 GPUs.
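
To make the fallback idea concrete, here is a minimal sketch in plain Python. It quantizes one activation block to INT8 and, if the block contains an outlier, spends a second INT8 pass on the residual to recover precision. The threshold test, the two-pass residual scheme, and all function names are assumptions for illustration; the abstract does not specify the paper's actual outlier criterion or GEMM kernel.

```python
def quantize_int8(values):
    """Symmetric per-block INT8 quantization: returns (codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to real values."""
    return [c * scale for c in codes]


def quantize_block_with_fallback(values, outlier_threshold):
    """Quantize one block; add a residual INT8 pass if it holds outliers.

    The threshold test and the residual scheme are illustrative
    assumptions, not the paper's exact criterion or kernel.
    """
    codes, scale = quantize_int8(values)
    approx = dequantize(codes, scale)
    if max(abs(v) for v in values) <= outlier_threshold:
        return approx, False  # plain 8-bit path
    # Fallback path: quantize the leftover error with a second INT8 pass.
    residual = [v - a for v, a in zip(values, approx)]
    res_codes, res_scale = quantize_int8(residual)
    refined = [a + r for a, r in zip(approx, dequantize(res_codes, res_scale))]
    return refined, True


def max_error(values, approx):
    """Largest absolute reconstruction error in the block."""
    return max(abs(v - a) for v, a in zip(values, approx))


# A block whose single outlier (50.0) stretches the INT8 scale.
block = [0.01 * i for i in range(1, 16)] + [50.0]
plain = dequantize(*quantize_int8(block))
refined, used_fallback = quantize_block_with_fallback(block, outlier_threshold=6.0)
err_plain = max_error(block, plain)
err_fallback = max_error(block, refined)
```

In this toy block the outlier forces a coarse scale, so the small entries all round to zero on the plain path; the residual pass restores them, which is why `err_fallback` comes out far smaller than `err_plain`.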


Article 998

Title@2025-06-09 (1): FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity

Title: FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity FreeGave: 3D Physik Lernen aus dynamischen Videos von Gaussian Velocity FreeGave: Gaussian 速度从动态视频学习 3D物理 2506.07865v1

Authors (4): Jinxi Li, Ziyang Song, Siyuan Zhou, Bo Yang

In this paper, we aim to model 3D scene geometry, appearance, and the underlying physics purely from multi-view videos. By applying various governing PDEs as PINN losses or incorporating physics simulation into neural networks, existing works often fail to learn complex physical motions at boundaries or require object priors such as masks or types. In this paper, we propose FreeGave to learn the physics of complex dynamic 3D scenes without needing any object priors. The key to our approach is to introduce a physics code followed by a carefully designed divergence-free module for estimating a per-Gaussian velocity field, without relying on the inefficient PINN losses. Extensive experiments on three public datasets and a newly collected challenging real-world dataset demonstrate the superior performance of our method for future frame extrapolation and motion segmentation. Most notably, our investigation into the learned physics codes reveals that they truly learn meaningful 3D physical motion patterns in the absence of any human labels in training.


Article 999

Title@2025-06-09 (1): Lightweight Sequential Transformers for Blood Glucose Level Prediction in Type-1 Diabetes

Title: Lightweight Sequential Transformers for Blood Glucose Level Prediction in Type-1 Diabetes Leichte Sequentielle Transformer für Blutglukose-Level-Vorhersage bei Typ-1-Diabetes 一型糖尿病血凝胶水平预测的轻量量 轻序式变压器 2506.07864v1

Authors (4): Mirko Paolo Barbato, Giorgia Rigamonti, Davide Marelli, Paolo Napoletano

Type 1 Diabetes (T1D) affects millions worldwide, requiring continuous monitoring to prevent severe hypo- and hyperglycemic events. While continuous glucose monitoring has improved blood glucose management, deploying predictive models on wearable devices remains challenging due to computational and memory constraints. To address this, we propose a novel Lightweight Sequential Transformer model designed for blood glucose prediction in T1D. By integrating the strengths of Transformers’ attention mechanisms and the sequential processing of recurrent neural networks, our architecture captures long-term dependencies while maintaining computational efficiency. The model is optimized for deployment on resource-constrained edge devices and incorporates a balanced loss function to handle the inherent data imbalance in hypo- and hyperglycemic events. Experiments on two benchmark datasets, OhioT1DM and DiaTrend, demonstrate that the proposed model outperforms state-of-the-art methods in predicting glucose levels and detecting adverse events. This work fills the gap between high-performance modeling and practical deployment, providing a reliable and efficient T1D management solution.


Article 1000

Title@2025-06-09 (1): VIVAT: Virtuous Improving VAE Training through Artifact Mitigation

Title: VIVAT: Virtuous Improving VAE Training through Artifact Mitigation VIVAT: Tugendhafte Verbesserung der VAE-Ausbildung durch Artefakt-Minderung VIVAT:通过减少非手工行为,提高VAE培训的道德水平 2506.07863v1

Authors (5): Lev Novitskiy, Viacheslav Vasilev, Maria Kovaleva, Vladimir Arkhipkin, Denis Dimitrov

Variational Autoencoders (VAEs) remain a cornerstone of generative computer vision, yet their training is often plagued by artifacts that degrade reconstruction and generation quality. This paper introduces VIVAT, a systematic approach to mitigating common artifacts in KL-VAE training without requiring radical architectural changes. We present a detailed taxonomy of five prevalent artifacts - color shift, grid patterns, blur, corner and droplet artifacts - and analyze their root causes. Through straightforward modifications, including adjustments to loss weights, padding strategies, and the integration of Spatially Conditional Normalization, we demonstrate significant improvements in VAE performance. Our method achieves state-of-the-art results in image reconstruction metrics (PSNR and SSIM) across multiple benchmarks and enhances text-to-image generation quality, as evidenced by superior CLIP scores. By preserving the simplicity of the KL-VAE framework while addressing its practical challenges, VIVAT offers actionable insights for researchers and practitioners aiming to optimize VAE training.


Article 1001

Title@2025-06-09 (1): PropEnc: A Property Encoder for Graph Neural Networks

Title: PropEnc: A Property Encoder for Graph Neural Networks Propenc: Ein Objekt-Encoder für Graph-Neural-Netzwerke PropEnc: 图形神经网络的属性编码器 2409.11554v3

Authors (3): Anwar Said, Waseem Abbas, Xenofon Koutsoukos

Graph machine learning, particularly using graph neural networks, heavily relies on node features. However, many real-world systems, such as social and biological networks, lack node features due to privacy concerns, incomplete data, or collection limitations. Structural and positional encodings are commonly used to address this but are constrained by the maximum values of the encoded properties, such as the highest node degree. This limitation makes them impractical for scale-free networks and applications involving large or non-categorical properties. This paper introduces PropEnc, a novel and versatile encoder to generate expressive node embeddings from any graph metric. By combining histogram construction with reversed index encoding, PropEnc offers a flexible solution that supports low-dimensional representations and diverse input types, effectively mitigating sparsity issues while improving computational efficiency. Additionally, it replicates one-hot encoding or approximates indices with high accuracy, making it adaptable to a wide range of graph applications. We validate PropEnc through extensive experiments on graph classification tasks across several social networks lacking node features. The empirical results demonstrate that PropEnc offers an efficient mechanism for constructing node features from various graph metrics.


Article 1002

Title@2025-06-09 (1): Fairness Overfitting in Machine Learning: An Information-Theoretic Perspective

Title: Fairness Overfitting in Machine Learning: An Information-Theoretic Perspective Fairness Overfiting in Machine Learning: Eine informationstheoretische Perspektive 在机器学习中过分适应公平:信息理论视角 2506.07861v1

Authors (4): Firas Laakom, Haobo Chen, Jürgen Schmidhuber, Yuheng Bu

Despite substantial progress in promoting fairness in high-stakes applications using machine learning models, existing methods often modify the training process, such as through regularizers or other interventions, but lack formal guarantees that fairness achieved during training will generalize to unseen data. Although overfitting with respect to prediction performance has been extensively studied, overfitting in terms of fairness loss has received far less attention. This paper proposes a theoretical framework for analyzing fairness generalization error through an information-theoretic lens. Our novel bounding technique is based on the Efron-Stein inequality, which allows us to derive tight information-theoretic fairness generalization bounds with both Mutual Information (MI) and Conditional Mutual Information (CMI). Our empirical results validate the tightness and practical relevance of these bounds across diverse fairness-aware learning algorithms. Our framework offers valuable insights to guide the design of algorithms improving fairness generalization.


Article 1003

Title@2025-06-09 (1): LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds

Title: LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds LogoSP: Lokale-globale Gruppierung von Superpoints für unbeaufsichtigte semantische Segmentierung von 3D-Punkt-Clouds LogoSP: 3D点云无监督的语义分割超点地方-全球组别 2506.07857v1

Authors (4): Zihui Zhang, Weisheng Dai, Hongtao Wen, Bo Yang

We study the problem of unsupervised 3D semantic segmentation on raw point clouds without needing human labels in training. Existing methods usually formulate this problem as learning per-point local features followed by a simple grouping strategy, lacking the ability to discover additional and possibly richer semantic priors beyond local features. In this paper, we introduce LogoSP to learn 3D semantics from both local and global point features. The key to our approach is to discover 3D semantic information by grouping superpoints according to their global patterns in the frequency domain, thus generating highly accurate semantic pseudo-labels for training a segmentation network. Extensive experiments on two indoor datasets and one outdoor dataset show that our LogoSP surpasses all existing unsupervised methods by large margins, achieving state-of-the-art performance for unsupervised 3D semantic segmentation. Notably, our investigation into the learned global patterns reveals that they truly represent meaningful 3D semantics in the absence of human labels during training.


Article 1004

Title@2025-06-09 (1): Residual Reweighted Conformal Prediction for Graph Neural Networks

Title: Residual Reweighted Conformal Prediction for Graph Neural Networks Residual Reweighted Conformal Prediction for Graph Neural Networks 图形神经网络的重新加权常规预测 2506.07854v1

Authors (6): Zheng Zhang, Jie Bao, Zhixin Zhou, Nicolo Colombo, Lixin Cheng, Rui Luo

Graph Neural Networks (GNNs) excel at modeling relational data but face significant challenges in high-stakes domains due to unquantified uncertainty. Conformal prediction (CP) offers statistical coverage guarantees, but existing methods often produce overly conservative prediction intervals that fail to account for graph heteroscedasticity and structural biases. While residual reweighting CP variants address some of these limitations, they neglect graph topology and cluster-specific uncertainties, and they risk data leakage by reusing training sets. To address these issues, we propose Residual Reweighted GNN (RR-GNN), a framework designed to generate minimal prediction sets with provable marginal coverage guarantees. RR-GNN introduces three major innovations to enhance prediction performance. First, it employs Graph-Structured Mondrian CP to partition nodes or edges into communities based on topological features, ensuring cluster-conditional coverage that reflects heterogeneity. Second, it uses Residual-Adaptive Nonconformity Scores by training a secondary GNN on a held-out calibration set to estimate task-specific residuals, dynamically adjusting prediction intervals according to node or edge uncertainty. Third, it adopts a Cross-Training Protocol, which alternates the optimization of the primary GNN and the residual predictor to prevent information leakage while maintaining graph dependencies. We validate RR-GNN on 15 real-world graphs across diverse tasks, including node classification, regression, and edge weight prediction. RR-GNN achieves improved efficiency over state-of-the-art CP baselines, with no loss of coverage.
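
For readers unfamiliar with residual-normalized conformal prediction, the following sketch shows the generic split-conformal recipe for regression, not RR-GNN itself. Calibration residuals are divided by a per-point difficulty estimate, a finite-sample-corrected quantile of those scores is taken, and the quantile is rescaled at test time. Here `sigma_hat` stands in for the output of a secondary residual predictor; the toy numbers and function names are assumptions.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def normalized_scores(y_true, y_pred, sigma_hat):
    """Absolute residuals scaled by a per-point difficulty estimate."""
    return [abs(y - m) / s for y, m, s in zip(y_true, y_pred, sigma_hat)]


def predict_interval(y_pred, sigma_hat, q):
    """Interval whose width adapts to the estimated residual size."""
    return (y_pred - q * sigma_hat, y_pred + q * sigma_hat)


# Toy calibration set: 9 points, alpha = 0.25 -> k = ceil(10 * 0.75) = 8.
cal_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cal_pred = [1.1, 1.8, 3.3, 3.6, 5.5, 5.4, 7.7, 7.2, 9.9]
cal_sigma = [1.0] * 9
scores = normalized_scores(cal_true, cal_pred, cal_sigma)
q = conformal_quantile(scores, alpha=0.25)

# A test point judged twice as uncertain gets an interval twice as wide.
low, high = predict_interval(y_pred=10.0, sigma_hat=2.0, q=q)
```

A Mondrian (cluster-conditional) variant would compute a separate `q` per group of calibration points and apply the matching one at test time.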


Article 1005

Title@2025-06-09 (1): Conditional Local Independence Testing with Application to Dynamic Causal Discovery

Title: Conditional Local Independence Testing with Application to Dynamic Causal Discovery Conditional Local Independence Testing mit Anwendung auf Dynamische Kausal-Discovery 有条件的局部独立测试,应用到动态因子发现 2506.07844v1

Authors (3): Mingzhou Liu, Xinwei Sun, Yizhou Wang

In this note, we extend the conditional local independence testing theory developed in Christgau et al. (2024) to Ito processes. The result can be applied to causal discovery in dynamic systems.


Article 1006

Title@2025-06-09 (1): Jarzynski Reweighting and Sampling Dynamics for Training Energy-Based Models: Theoretical Analysis of Different Transition Kernels

Title: Jarzynski Reweighting and Sampling Dynamics for Training Energy-Based Models: Theoretical Analysis of Different Transition Kernels Jarzynski Umgewichtungs- und Probenahmedynamiken für das Training energiebasierter Modelle: Theoretische Analyse verschiedener Übergangskerne Jarzynski 培训以能源为基础的模型的加权和抽样动态:对不同过渡核心的理论分析 2506.07843v1

Authors (1): Davide Carbone

Energy-Based Models (EBMs) provide a flexible framework for generative modeling, but their training remains theoretically challenging due to the need to approximate normalization constants and efficiently sample from complex, multi-modal distributions. Traditional methods, such as contrastive divergence and score matching, introduce biases that can hinder accurate learning. In this work, we present a theoretical analysis of Jarzynski reweighting, a technique from non-equilibrium statistical mechanics, and its implications for training EBMs. We focus on the role of the choice of the kernel and we illustrate these theoretical considerations in two key generative frameworks: (i) flow-based diffusion models, where we reinterpret Jarzynski reweighting in the context of stochastic interpolants to mitigate discretization errors and improve sample quality, and (ii) Restricted Boltzmann Machines, where we analyze its role in correcting the biases of contrastive divergence. Our results provide insights into the interplay between kernel choice and model performance, highlighting the potential of Jarzynski reweighting as a principled tool for generative learning.


Article 1007

Title@2025-06-09 (1): FedX: Adaptive Model Decomposition and Quantization for IoT Federated Learning

Title: FedX: Adaptive Model Decomposition and Quantization for IoT Federated Learning FedX: Adaptive Modellzersetzung und Quantisierung für IoT-Federated Learning FFX:国际oT联合会学习的适应性模型分解和量化 2504.12849v3

Authors (8): Phung Lai, Xiaopeng Jiang, Hai Phan, Cristian Borcea, Khang Tran, An Chen, Vijaya Datta Mayyuri, Ruoming Jin

Federated Learning (FL) allows collaborative training among multiple devices without data sharing, thus enabling privacy-sensitive applications on mobile or Internet of Things (IoT) devices, such as mobile health and asset tracking. However, designing an FL system with good model utility that works with low computation/communication overhead on heterogeneous, resource-constrained mobile/IoT devices is challenging. To address this problem, this paper proposes FedX, a novel adaptive model decomposition and quantization FL system for IoT. To balance utility with resource constraints on IoT devices, FedX decomposes a global FL model into different sub-networks with adaptive numbers of quantized bits for different devices. The key idea is that a device with fewer resources receives a smaller sub-network for lower overhead but utilizes a larger number of quantized bits for higher model utility, and vice versa. The quantization operations in FedX are done at the server to reduce the computational load on devices. FedX iteratively minimizes the losses in the devices’ local data and in the server’s public data using quantized sub-networks under a regularization term, and thus it maximizes the benefits of combining FL with model quantization through knowledge sharing among the server and devices in a cost-effective training process. Extensive experiments show that FedX significantly improves quantization times by up to 8.43X, on-device computation time by 1.5X, and total end-to-end training time by 1.36X, compared with baseline FL systems. We guarantee the global model convergence theoretically and validate local model convergence empirically, highlighting FedX’s optimization efficiency.


Article 1008

Title@2025-06-09 (1): A Cognac Shot To Forget Bad Memories: Corrective Unlearning for Graph Neural Networks

Title: A Cognac Shot To Forget Bad Memories: Corrective Unlearning for Graph Neural Networks Ein Cognac, um schlechte Erinnerungen zu vergessen: Korrektives Lernen für Graphen-Neural-Netzwerke 为忘记坏记忆而拍摄的科格纳克白兰地:为图形神经网络进行矫正性学习 2412.00789v4

Authors (7): Varshita Kolipaka, Akshit Sinha, Debangan Mishra, Sumit Kumar, Arvindh Arun, Shashwat Goel, Ponnurangam Kumaraguru

Graph Neural Networks (GNNs) are increasingly being used for a variety of ML applications on graph data. Because graph data does not follow the independently and identically distributed (i.i.d.) assumption, adversarial manipulations or incorrect data can propagate to other data points through message passing, which deteriorates the model’s performance. To allow model developers to remove the adverse effects of manipulated entities from a trained GNN, we study the recently formulated problem of Corrective Unlearning. We find that current graph unlearning methods fail to unlearn the effect of manipulations even when the whole manipulated set is known. We introduce a new graph unlearning method, Cognac, which can unlearn the effect of the manipulation set even when only 5% of it is identified. It recovers most of the performance of a strong oracle with fully corrected training data, even beating retraining from scratch without the deletion set while being 8x more efficient. We hope our work assists GNN developers in mitigating harmful effects caused by issues in real-world data, post-training. Our code is publicly available at https://github.com/cognac-gnn-unlearning/corrective-unlearning-for-gnns


Article 1009

Title@2025-06-09 (1): C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition

Title: C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition C3T: Grenzüberschreitender Transfer durch Zeit für sensorgestützte menschliche Aktivitätserkennung C3T: 以传感器为基础的人类活动识别跨时间跨模式转让 2407.16803v3

Authors (3): Abhi Kamboj, Anh Duy Nguyen, Minh N. Do

In order to unlock the potential of diverse sensors, we investigate a method to transfer knowledge between time-series modalities using a multimodal temporal representation space for Human Activity Recognition (HAR). Specifically, we explore the setting where the modality used in testing has no labeled data during training, which we refer to as Unsupervised Modality Adaptation (UMA). We categorize existing UMA approaches as Student-Teacher or Contrastive Alignment methods. These methods typically compress continuous-time data samples into single latent vectors during alignment, inhibiting their ability to transfer temporal information through real-world temporal distortions. To address this, we introduce Cross-modal Transfer Through Time (C3T), which preserves temporal information during alignment to handle dynamic sensor data better. C3T achieves this by aligning a set of temporal latent vectors across sensing modalities. Our extensive experiments on various camera+IMU datasets demonstrate that C3T outperforms existing methods in UMA by at least 8% in accuracy and shows superior robustness to temporal distortions such as time-shift, misalignment, and dilation. Our findings suggest that C3T has significant potential for developing generalizable models for time-series sensor data, opening new avenues for various multimodal applications.


Article 1010

Title@2025-06-09 (1): Efficient and Generalized end-to-end Autonomous Driving System with Latent Deep Reinforcement Learning and Demonstrations

Title: Efficient and Generalized end-to-end Autonomous Driving System with Latent Deep Reinforcement Learning and Demonstrations Effizientes und generalisiertes End-to-End Autonomes Fahrsystem mit Latent Deep Reinforcement Learning und Demonstrationen 高效和普遍通用的端到端自动驾驶系统,配有深深层深强化学习和示范 2401.11792v7

Authors (4): Zuojin Tang, Xiaoyu Chen, Yongqiang Li, Jianyu Chen

An intelligent driving system should dynamically formulate appropriate driving strategies based on the current environment and vehicle status while ensuring system security and reliability. However, methods based on reinforcement learning and imitation learning often suffer from high sample complexity, poor generalization, and low safety. To address these challenges, this paper introduces an efficient and generalized end-to-end autonomous driving system (EGADS) for complex and varied scenarios. The RL agent in our EGADS combines variational inference with normalizing flows, which are independent of distribution assumptions. This combination allows the agent to capture historical information relevant to driving in latent space effectively, thereby significantly reducing sample complexity. Additionally, we enhance safety by formulating robust safety constraints and improve generalization and performance by integrating RL with expert demonstrations. Experimental results demonstrate that, compared to existing methods, EGADS significantly reduces sample complexity, greatly improves safety performance, and exhibits strong generalization capabilities in complex urban scenarios. We also contribute an expert dataset collected from human experts driving with a G29 steering wheel.


Article 1011

Title@2025-06-09 (1): Beyond Numeric Rewards: In-Context Dueling Bandits with LLM Agents

Title: Beyond Numeric Rewards: In-Context Dueling Bandits with LLM Agents Über Numerische Belohnungen hinaus: In-Context-Duellbanditen mit LLM-Agenten 除数值奖励之外:与LLM代理商进行文中评分的强盗 2407.01887v4

Authors (4): Fanzeng Xia, Hao Liu, Yisong Yue, Tongxin Li

In-Context Reinforcement Learning (ICRL) is a frontier paradigm to solve Reinforcement Learning (RL) problems in the foundation model era. While ICRL capabilities have been demonstrated in transformers through task-specific training, the potential of Large Language Models (LLMs) out-of-the-box remains largely unexplored. This paper investigates whether LLMs can generalize cross-domain to perform ICRL under the problem of Dueling Bandits (DB), a stateless preference-based RL setting. We find that the top-performing LLMs exhibit a notable zero-shot capacity for relative decision-making, which translates to low short-term weak regret across all DB environment instances by quickly including the best arm in duels. However, an optimality gap still exists between LLMs and classic DB algorithms in terms of strong regret. LLMs struggle to converge and consistently exploit even when explicitly prompted to do so, and are sensitive to prompt variations. To bridge this gap, we propose an agentic flow framework: LLM with Enhanced Algorithmic Dueling (LEAD), which integrates off-the-shelf DB algorithm support with LLM agents through fine-grained adaptive interplay. We show that LEAD has theoretical guarantees inherited from classic DB algorithms on both weak and strong regret. We validate its efficacy and robustness even with noisy and adversarial prompts. The design of such an agentic framework sheds light on how to enhance the trustworthiness of general-purpose LLMs generalized to in-context decision-making tasks.
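
The distinction between weak and strong regret can be made concrete with a short sketch. Under one common convention (definitions vary by a constant factor across papers), the gap of an arm is the probability that the best arm beats it, minus 0.5; strong regret averages the gaps of both duelists, while weak regret takes the smaller one. The preference matrix below is hypothetical, and the function names are assumptions for this example.

```python
def condorcet_winner(pref):
    """Return the arm that beats every other arm with probability > 0.5."""
    n = len(pref)
    for i in range(n):
        if all(pref[i][j] > 0.5 for j in range(n) if j != i):
            return i
    return None  # no Condorcet winner exists


def duel_regrets(pref, best, arm_a, arm_b):
    """Per-duel (strong, weak) regret relative to the best arm."""
    gap_a = pref[best][arm_a] - 0.5
    gap_b = pref[best][arm_b] - 0.5
    strong = (gap_a + gap_b) / 2  # zero only if both arms are the best arm
    weak = min(gap_a, gap_b)      # zero as soon as one arm is the best arm
    return strong, weak


# pref[i][j] = probability that arm i beats arm j (hypothetical values).
pref = [
    [0.5, 0.7, 0.9],
    [0.3, 0.5, 0.6],
    [0.1, 0.4, 0.5],
]
best = condorcet_winner(pref)
strong, weak = duel_regrets(pref, best, arm_a=0, arm_b=2)
```

Dueling the best arm against the worst one incurs no weak regret but positive strong regret, which matches the abstract's observation that a learner can do well on weak regret simply by including the best arm in its duels without ever converging to playing it against itself.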


Article 1012

Title@2025-06-09 (1): Improving large language models with concept-aware fine-tuning

Title: Improving large language models with concept-aware fine-tuning Verbesserung großer Sprachmodelle mit konzeptbewusster Feinsteuerung 改进大语言模式,进行概念意识微调 2506.07833v1

Authors (4): Michael K. Chen, Xikun Zhang, Jiaxing Huang, Dacheng Tao

Large language models (LLMs) have become the cornerstone of modern AI. However, the existing paradigm of next-token prediction fundamentally limits their ability to form coherent, high-level concepts, making it a critical barrier to human-like understanding and reasoning. Take the phrase “ribonucleic acid” as an example: an LLM will first decompose it into tokens, i.e., artificial text fragments (“rib”, “on”, …), then learn each token sequentially, rather than grasping the phrase as a unified, coherent semantic entity. This fragmented representation hinders deeper conceptual understanding and, ultimately, the development of truly intelligent systems. In response, we introduce Concept-Aware Fine-Tuning (CAFT), a novel multi-token training method that redefines how LLMs are fine-tuned. By enabling the learning of sequences that span multiple tokens, this method fosters stronger concept-aware learning. Our experiments demonstrate significant improvements compared to conventional next-token finetuning methods across diverse tasks, including traditional applications like text summarization and domain-specific ones like de novo protein design. Multi-token prediction was previously only possible in the prohibitively expensive pretraining phase; CAFT, to our knowledge, is the first to bring the multi-token setting to the post-training phase, thus effectively democratizing its benefits for the broader community of practitioners and researchers. Finally, the unexpected effectiveness of our proposed method suggests wider implications for the machine learning research community. All code and data are available at https://github.com/michaelchen-lab/caft-llm


Article 1013

Title@2025-06-09 (1): Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information

Title: Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information Dezentralisierung des Mehr-Agenten-Verstärkungs-Lernens mit zeitlichen Kausalinformationen 利用时间原因信息分散多机构强化学习 2506.07829v1

Authors (5): Jan Corazza, Hadi Partovi Aria, Hyohun Kim, Daniel Neider, Zhe Xu

Reinforcement learning (RL) algorithms can find an optimal policy for a single agent to accomplish a particular task. However, many real-world problems require multiple agents to collaborate in order to achieve a common goal. For example, a robot executing a task in a warehouse may require the assistance of a drone to retrieve items from high shelves. In Decentralized Multi-Agent RL (DMARL), agents learn independently and then combine their policies at execution time, but often must satisfy constraints on compatibility of local policies to ensure that they can achieve the global task when combined. In this paper, we study how providing high-level symbolic knowledge to agents can help address unique challenges of this setting, such as privacy constraints, communication limitations, and performance concerns. In particular, we extend the formal tools used to check the compatibility of local policies with the team task, making decentralized training with theoretical guarantees usable in more scenarios. Furthermore, we empirically demonstrate that symbolic knowledge about the temporal evolution of events in the environment can significantly expedite the learning process in DMARL.


Article 1014

Title@2025-06-09 (1): R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation

Title: R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation R3D2: Realistische 3D Asset Insertion über Diffusion für autonome Fahrsimulation R3D2:现实的 3D 资产通过传播插入资产,用于自动驾驶模拟 2506.07826v1

Authors (11): William Ljungbergh, Bernardo Taveira, Wenzhao Zheng, Adam Tonderski, Chensheng Peng, Fredrik Kahl, Christoffer Petersson, Michael Felsberg, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan

Validating autonomous driving (AD) systems requires diverse and safety-critical testing, making photorealistic virtual environments essential. Traditional simulation platforms, while controllable, are resource-intensive to scale and often suffer from a domain gap with real-world data. In contrast, neural reconstruction methods like 3D Gaussian Splatting (3DGS) offer a scalable solution for creating photorealistic digital twins of real-world driving scenes. However, they struggle with dynamic object manipulation and reusability as their per-scene optimization-based methodology tends to result in incomplete object models with integrated illumination effects. This paper introduces R3D2, a lightweight, one-step diffusion model designed to overcome these limitations and enable realistic insertion of complete 3D assets into existing scenes by generating plausible rendering effects, such as shadows and consistent lighting, in real time. This is achieved by training R3D2 on a novel dataset: 3DGS object assets are generated from in-the-wild AD data using an image-conditioned 3D generative model, and then synthetically placed into neural rendering-based virtual environments, allowing R3D2 to learn realistic integration. Quantitative and qualitative evaluations demonstrate that R3D2 significantly enhances the realism of inserted assets, enabling use-cases like text-to-3D asset insertion and cross-scene/dataset object transfer, allowing for true scalability in AD validation. To promote further research in scalable and realistic AD simulation, we will release our dataset and code; see https://research.zenseact.com/publications/R3D2/.


Article 1015

Title@2025-06-09 (1): Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation

Title: Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation Beschleunigung von Diffusionsmodellen in Offline-RL über die Reward-Aware-Konsistenz-Trajektoriendestillation 通过奖励-软件兼容性轨迹蒸馏在离线 RL 中加速扩散模型 2506.07822v1

Authors (6): Xintong Duan, Yutong He, Fahim Tajwar, Ruslan Salakhutdinov, J. Zico Kolter, Jeff Schneider

Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. While the consistency model offers a potential solution, its applications to decision-making often struggle with suboptimal demonstrations or rely on complex concurrent training of multiple networks. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning that directly incorporates reward optimization into the distillation process. Our method enables single-step generation while maintaining higher performance and simpler training. Empirical evaluations on the Gym MuJoCo benchmarks and long horizon planning demonstrate that our approach can achieve an 8.7% improvement over previous state-of-the-art while offering up to 142x speedup over diffusion counterparts in inference time.


Article 1016

Title@2025-06-09 (1): Accelerating Constrained Sampling: A Large Deviations Approach

Title: Accelerating Constrained Sampling: A Large Deviations Approach Beschleunigte Probenahme beschleunigen: Ein großer Abweichungsansatz 加速受控抽样:大偏离方法 2506.07816v1

Authors (4): Yingli Wang, Changwei Tu, Xiaoyu Wang, Lingjiong Zhu

The problem of sampling a target probability distribution on a constrained domain arises in many applications including machine learning. For constrained sampling, various Langevin algorithms such as projected Langevin Monte Carlo (PLMC) based on the discretization of reflected Langevin dynamics (RLD) and more generally skew-reflected non-reversible Langevin Monte Carlo (SRNLMC) based on the discretization of skew-reflected non-reversible Langevin dynamics (SRNLD) have been proposed and studied in the literature. This work focuses on the long-time behavior of SRNLD, where a skew-symmetric matrix is added to RLD. Although the non-asymptotic convergence analysis for SRNLD (and SRNLMC) and the acceleration compared to RLD (and PLMC) have been studied in the literature, it is not clear how one should design the skew-symmetric matrix in the dynamics to achieve good performance in practice. We establish a large deviation principle (LDP) for the empirical measure of SRNLD when the skew-symmetric matrix is chosen such that its product with the inward unit normal vector field on the boundary is zero. By explicitly characterizing the rate functions, we show that SRNLD can accelerate the convergence to the target distribution compared to RLD with this choice of the skew-symmetric matrix. Numerical experiments for SRNLMC based on the proposed skew-symmetric matrix show superior performance, which validates the theoretical findings from the large deviations theory.
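
The design condition on the skew-symmetric matrix can be illustrated on a toy domain. The sketch below is an assumption-laden example, not the paper's algorithm: the domain is the half-space `x[2] >= 0` (inward unit normal `e3`), the target is a standard Gaussian restricted to it, the matrix `J` is chosen so that `J` times the normal is zero, and a plain Euclidean projection stands in for the reflection step.

```python
import math
import random

A = 1.0  # hypothetical strength of the non-reversible perturbation
# Skew-symmetric matrix whose product with the normal e3 is zero.
J = [[0.0, A, 0.0],
     [-A, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
NORMAL = [0.0, 0.0, 1.0]  # inward unit normal of the half-space {x : x[2] >= 0}

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def grad_f(x):
    # f(x) = |x|^2 / 2, so the target is a Gaussian restricted to the half-space.
    return list(x)

def project(x):
    # Euclidean projection onto the half-space (simplified stand-in for reflection).
    return [x[0], x[1], max(x[2], 0.0)]

def step(x, eta, rng):
    g = grad_f(x)
    jg = matvec(J, g)
    noise = [math.sqrt(2 * eta) * rng.gauss(0, 1) for _ in range(3)]
    return project([x[i] - eta * (g[i] + jg[i]) + noise[i] for i in range(3)])

def run(n_steps=1000, eta=0.01, seed=0):
    rng = random.Random(seed)
    x = [0.0, 0.0, 1.0]
    path = []
    for _ in range(n_steps):
        x = step(x, eta, rng)
        path.append(x)
    return path
```

Setting `A = 0` recovers an ordinary projected Langevin step, which makes it easy to compare the two chains on the same target.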


Article 1017

Title@2025-06-09 (1): A weighted quantum ensemble of homogeneous quantum classifiers

Title: A weighted quantum ensemble of homogeneous quantum classifiers Ein gewichtetes Quantenensemble homogener Quantenklassifikatoren 一份加权量量的加权总量,由同质量分类器组成 2506.07810v1

Authors (3): Emiliano Tolotti, Enrico Blanzieri, Davide Pastorello

Ensemble methods in machine learning aim to improve prediction accuracy by combining multiple models. This is achieved by ensuring diversity among predictors to capture different data aspects. Homogeneous ensembles use identical models, achieving diversity through different data subsets, and weighted-average ensembles assign higher influence to more accurate models through a weight learning procedure. We propose a method to achieve a weighted homogeneous quantum ensemble using quantum classifiers with indexing registers for data encoding. This approach leverages instance-based quantum classifiers, enabling feature and training point subsampling through superposition and controlled unitaries, and allowing for a quantum-parallel execution of diverse internal classifiers with different data compositions in superposition. The method integrates a learning process involving circuit execution and classical weight optimization, for a trained ensemble execution with weights encoded in the circuit at test-time. Empirical evaluations demonstrate the effectiveness of the proposed method, offering insights into its performance.
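
For readers less familiar with weighted ensembles, the classical idea the abstract builds on can be sketched as a weighted vote. This is a purely classical illustration with a hypothetical function name, not the quantum circuit construction described above.

```python
def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions[i] is the label predicted by ensemble member i,
    weights[i] is the (learned) influence of that member.
    """
    scores = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)
```

With equal weights this is a majority vote; learned weights let a single accurate member outvote several weaker ones.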


Article 1018

Title@2025-06-09 (1): Identifiable Object Representations under Spatial Ambiguities

Title: Identifiable Object Representations under Spatial Ambiguities Identifizierbare Objektdarstellungen unter räumlichen Ambiguitäten 空间模糊度下可识别物体的表示 2506.07806v1

Authors (3): Avinash Kori, Francesca Toni, Ben Glocker

Modular object-centric representations are essential for human-like reasoning but are challenging to obtain under spatial ambiguities, e.g. due to occlusions and view ambiguities. However, addressing these challenges presents both theoretical and practical difficulties. We introduce a novel multi-view probabilistic approach that aggregates view-specific slots to capture invariant content information while simultaneously learning disentangled global viewpoint-level information. Unlike prior single-view methods, our approach resolves spatial ambiguities, provides theoretical guarantees for identifiability, and requires no viewpoint annotations. Extensive experiments on standard benchmarks and novel complex datasets validate our method’s robustness and scalability.


Article 1019

Title@2025-06-09 (1): Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability

Title: Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability Verbesserung der adversarialen Robustheit mit konformer Vorhersage: Ein Rahmen für eine garantierte Model-Zuverlässigkeit 增强以非正式预测增强反向能力:保证模型可靠性框架 2506.07804v1

Authors (5): Jie Bao, Chuangyin Dang, Rui Luo, Hanwei Zhang, Zhixin Zhou

As deep learning models are increasingly deployed in high-risk applications, robust defenses against adversarial attacks and reliable performance guarantees become paramount. Moreover, accuracy alone does not provide sufficient assurance or reliable uncertainty estimates for these models. This study advances adversarial training by leveraging principles from Conformal Prediction. Specifically, we develop an adversarial attack method, termed OPSA (OPtimal Size Attack), designed to reduce the efficiency of conformal prediction at any significance level by maximizing model uncertainty without requiring coverage guarantees. Correspondingly, we introduce OPSA-AT (Adversarial Training), a defense strategy that integrates OPSA within a novel conformal training paradigm. Experimental evaluations demonstrate that our OPSA attack method induces greater uncertainty compared to baseline approaches for various defenses. Conversely, our OPSA-AT defensive model significantly enhances robustness not only against OPSA but also other adversarial attacks, and maintains reliable prediction. Our findings highlight the effectiveness of this integrated approach for developing trustworthy and resilient deep learning models for safety-critical domains. Our code is available at https://github.com/bjbbbb/Enhancing-Adversarial-Robustness-with-Conformal-Prediction.
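
Since the attack targets the efficiency of conformal prediction, which is commonly measured by prediction-set size, a minimal sketch of standard split conformal prediction may help. This is the textbook procedure, not the paper's OPSA or OPSA-AT code; the score `1 - p(true label)` and the function names are assumptions chosen for illustration.

```python
import math

def conformal_quantile(cal_scores, alpha):
    """Finite-sample quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every label.
        return float("inf")
    return sorted(cal_scores)[k - 1]

def prediction_set(probs, qhat):
    """All labels whose nonconformity score 1 - p does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]
```

An attack that flattens the predicted probabilities pushes more labels under the threshold, so the sets grow and the predictor becomes less informative even though coverage is maintained.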


Article 1020

Title@2025-06-09 (1): Fine-grained Hierarchical Crop Type Classification from Integrated Hyperspectral EnMAP Data and Multispectral Sentinel-2 Time Series: A Large-scale Dataset and Dual-stream Transformer Method

Title: Fine-grained Hierarchical Crop Type Classification from Integrated Hyperspectral EnMAP Data and Multispectral Sentinel-2 Time Series: A Large-scale Dataset and Dual-stream Transformer Method Feinkörnige Hierarchische Sortenklassifikation aus integrierten Hyperspektralen EnMAP-Daten und Multispektralen Sentinel-2 Zeitreihen: Eine großflächige Datensatz- und Dual-Stream-Transformer-Methode 从综合超高光谱 EmmAP 数据和多谱哨兵-2时间序列:大型数据集和双流变异器方法中精细测的等级级作物类型分类 2506.06155v2

Authors (11): Wenyuan Li, Shunlin Liang, Yuxiang Zhang, Liqin Liu, Keyan Chen, Yongzhe Chen, Han Ma, Jianglei Xu, Yichuan Ma, Shikang Guan, Zhenwei Shi

Fine-grained crop type classification serves as the fundamental basis for large-scale crop mapping and plays a vital role in ensuring food security. It requires simultaneous capture of both phenological dynamics (obtained from multi-temporal satellite data like Sentinel-2) and subtle spectral variations (demanding nanometer-scale spectral resolution from hyperspectral imagery). Research combining these two modalities remains scarce owing to the difficulty of acquiring hyperspectral data and the cost of annotating crop types. To address these issues, we construct a hierarchical hyperspectral crop dataset (H2Crop) by integrating 30m-resolution EnMAP hyperspectral data with Sentinel-2 time series. With over one million annotated field parcels organized in a four-tier crop taxonomy, H2Crop establishes a vital benchmark for fine-grained agricultural crop classification and hyperspectral image processing. We propose a dual-stream Transformer architecture that synergistically processes these modalities. It coordinates two specialized pathways: a spectral-spatial Transformer extracts fine-grained signatures from hyperspectral EnMAP data, while a temporal Swin Transformer extracts crop growth patterns from Sentinel-2 time series. The designed hierarchical classification head with hierarchical fusion then simultaneously delivers multi-level crop type classification across all taxonomic tiers. Experiments demonstrate that adding hyperspectral EnMAP data to Sentinel-2 time series yields an average F1-score improvement of 4.2% (peaking at 6.3%). Extensive comparisons also confirm our method’s higher accuracy over existing deep learning approaches for crop type classification and the consistent benefits of hyperspectral data across varying temporal windows and crop change scenarios. Code and dataset are available at https://github.com/flyakon/H2Crop.


Article 1021

Title@2025-06-09 (1): Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness

Title: Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness Widersacher mit Anreizen: Eine strategische Alternative zur widerspenstigen Robustheit 具有奖励措施的逆向:反反向强力战略替代方案 2406.11458v3

Authors (3): Maayan Ehrenberg, Roy Ganz, Nir Rosenfeld

Adversarial training aims to defend against adversaries: malicious opponents whose sole aim is to harm predictive performance in any way possible. This presents a rather harsh perspective, which we assert results in unnecessarily conservative training. As an alternative, we propose to model opponents as simply pursuing their own goals–rather than working directly against the classifier. Employing tools from strategic modeling, our approach enables knowledge or beliefs regarding the opponent’s possible incentives to be used as inductive bias for learning. Accordingly, our method of strategic training is designed to defend against all opponents within an ‘incentive uncertainty set’. This resorts to adversarial learning when the set is maximal, but offers potential gains when the set can be appropriately reduced. We conduct a series of experiments that show how even mild knowledge regarding the opponent’s incentives can be useful, and that the degree of potential gains depends on how these incentives relate to the structure of the learning task.


Article 1022

Title@2025-06-09 (1): MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification

Title: MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification MultiMatch: Multihead-Konsistenzregularisierung passend zur semi-überwachten Textklassifikation 多匹配: 用于半有效文本分类的多标题一致性规则化 2506.07801v1

Authors (5): Iustin Sirbu, Robert-Adrian Popovici, Cornelia Caragea, Stefan Trausan-Matu, Traian Rebedea

We introduce MultiMatch, a novel semi-supervised learning (SSL) algorithm combining the paradigms of co-training and consistency regularization with pseudo-labeling. At its core, MultiMatch features a three-fold pseudo-label weighting module designed for three key purposes: selecting and filtering pseudo-labels based on head agreement and model confidence, and weighting them according to the perceived classification difficulty. This novel module enhances and unifies three existing techniques – heads agreement from Multihead Co-training, self-adaptive thresholds from FreeMatch, and Average Pseudo-Margins from MarginMatch – resulting in a holistic approach that improves robustness and performance in SSL settings. Experimental results on benchmark datasets highlight the superior performance of MultiMatch, achieving state-of-the-art results on 9 out of 10 setups from 5 natural language processing datasets and ranking first according to the Friedman test among 19 methods. Furthermore, MultiMatch demonstrates exceptional robustness in highly imbalanced settings, outperforming the second-best approach by 3.26% – and data imbalance is a key factor for many text classification tasks.


Article 1023

Title@2025-06-09 (1): Generalized Interpolating Discrete Diffusion

Title: Generalized Interpolating Discrete Diffusion Generalisierte Interpolation diskreter Diffusion 普遍国际刑 化 2503.04482v2

Authors (6): Dimitri von Rütte, Janis Fluri, Yuhui Ding, Antonio Orvieto, Bernhard Schölkopf, Thomas Hofmann

While state-of-the-art language models achieve impressive results through next-token prediction, they have inherent limitations such as the inability to revise already generated tokens. This has prompted exploration of alternative approaches such as discrete diffusion. However, masked diffusion, which has emerged as a popular choice due to its simplicity and effectiveness, reintroduces this inability to revise words. To overcome this, we generalize masked diffusion, deriving a new family of general interpolating discrete diffusion (GIDD) which offers greater flexibility in the design of the noising processes. Leveraging a novel diffusion ELBO, we achieve compute-matched state-of-the-art performance in diffusion language modeling. Exploiting GIDD’s flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality and unlocking the ability for the model to correct its own mistakes, an area where autoregressive models notoriously have struggled. Code: https://github.com/dvruette/gidd/
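
The hybrid of masking and uniform noise can be pictured with a toy corruption step. This is a sketch under stated assumptions, not the GIDD code: the names `corrupt`, `alpha` (probability a token survives), and `p_uniform` (share of corrupted tokens replaced by a random vocabulary item rather than a mask) are hypothetical.

```python
import random

MASK = -1  # hypothetical id for the mask token

def corrupt(tokens, alpha, p_uniform, vocab_size, rng):
    """Keep each token with probability alpha; otherwise replace it with a
    uniformly random token (probability p_uniform) or with MASK."""
    out = []
    for tok in tokens:
        if rng.random() < alpha:
            out.append(tok)
        elif rng.random() < p_uniform:
            out.append(rng.randrange(vocab_size))
        else:
            out.append(MASK)
    return out
```

With `p_uniform = 0` this is pure masked diffusion. Any positive value means a visible token may itself be noise, so a denoiser trained on such data has to learn to revise tokens rather than only fill in masks.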


Article 1024

Title@2025-06-09 (1): LLM Unlearning Should Be Form-Independent

Title: LLM Unlearning Should Be Form-Independent LLM Unlearning sollte formunabhängig sein LLM 重新学习应是独立的形式 2506.07795v1

Authors (3): Xiaotian Ye, Mengqi Zhang, Shu Wu

Large Language Model (LLM) unlearning aims to erase or suppress undesirable knowledge within the model, offering promise for controlling harmful or private information to prevent misuse. However, recent studies highlight its limited efficacy in real-world scenarios, hindering practical adoption. In this study, we identify a pervasive issue underlying many downstream failures: the effectiveness of existing unlearning methods heavily depends on the form of training samples and frequently fails to generalize to alternate expressions of the same knowledge. We formally characterize this problem as Form-Dependent Bias and systematically investigate its specific manifestation patterns across various downstream tasks. To quantify its prevalence and support future research, we introduce ORT, a novel benchmark designed to evaluate the robustness of unlearning methods against variations in knowledge expression. Results reveal that Form-Dependent Bias is both widespread and severe among current techniques. We argue that LLM unlearning should be form-independent to address the endless forms of downstream tasks encountered in real-world security-critical scenarios. Towards this goal, we introduce Rank-one Concept Redirection (ROCR), a novel training-free method, as a promising solution path. ROCR performs unlearning by targeting the invariants in downstream tasks, specifically the activated dangerous concepts. It is capable of modifying model parameters within seconds to redirect the model’s perception of a specific unlearning target concept to another harmless concept. Extensive experiments demonstrate that ROCR significantly improves unlearning effectiveness compared to traditional methods while generating highly natural outputs.


Article 1025

Title@2025-06-09 (1): High-Dimensional Learning in Finance

Title: High-Dimensional Learning in Finance High-Dimensional Learning in Finance 金融领域高层次学习 2506.03780v2

Authors (1): Hasan Fallahgoul

Recent advances in machine learning have shown promising results for financial prediction using large, over-parameterized models. This paper provides theoretical foundations and empirical validation for understanding when and how these methods achieve predictive success. I examine two key aspects of high-dimensional learning in finance. First, I prove that within-sample standardization in Random Fourier Features implementations fundamentally alters the underlying Gaussian kernel approximation, replacing shift-invariant kernels with training-set dependent alternatives. Second, I establish information-theoretic lower bounds that identify when reliable learning is impossible no matter how sophisticated the estimator. A detailed quantitative calibration of the polynomial lower bound shows that with typical parameter choices, e.g., 12,000 features, 12 monthly observations, and an R-squared of 2-3%, the required sample size to escape the bound exceeds 25-30 years of data, well beyond any rolling window actually used. Thus, observed out-of-sample success must originate from lower-complexity artefacts rather than from the intended high-dimensional mechanism.
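
The first claim is easier to follow with the standard Random Fourier Features construction in view. The sketch below uses the textbook recipe for approximating a Gaussian kernel and a plain z-score as the within-sample standardization; the function names and parameters are assumptions, and nothing here reproduces the paper's proofs or calibration.

```python
import math
import random

def make_rff(dim, n_features, sigma, seed=0):
    """Return a feature map z with z(x) . z(y) ~ exp(-|x - y|^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    ws = [[rng.gauss(0, 1 / sigma) for _ in range(dim)] for _ in range(n_features)]
    bs = [rng.uniform(0, 2 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2 / n_features)

    def features(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]
    return features

def gaussian_kernel(x, y, sigma):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

def standardize(rows):
    """Within-sample standardization: z-score each feature column using only
    the rows passed in, so the result depends on that particular sample."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
```

Inner products of the raw features track the Gaussian kernel, which depends only on `x - y`. After `standardize`, every feature is shifted and rescaled by statistics of the training rows, so the implied similarity between two points changes whenever the training set does.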


Article 1026

Title@2025-06-09 (1): Text-to-LoRA: Instant Transformer Adaption

Title: Text-to-LoRA: Instant Transformer Adaption Text-zu-LoRA: Instant Transformer Adaption 文本到LORA:即时变换器适应 2506.06105v2

Authors (4): Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, Robert Tjarko Lange

While Foundation Models provide a general tool for rapid content creation, they regularly require task-specific adaptation. Traditionally, this exercise involves careful curation of datasets and repeated fine-tuning of the underlying model. Fine-tuning techniques enable practitioners to adapt foundation models for many new applications but require expensive and lengthy training while being notably sensitive to hyperparameter choices. To overcome these limitations, we introduce Text-to-LoRA (T2L), a model capable of adapting large language models (LLMs) on the fly solely based on a natural language description of the target task. T2L is a hypernetwork trained to construct LoRAs in a single inexpensive forward pass. After training T2L on a suite of 9 pre-trained LoRA adapters (GSM8K, Arc, etc.), we show that the ad-hoc reconstructed LoRA instances match the performance of task-specific adapters across the corresponding test sets. Furthermore, T2L can compress hundreds of LoRA instances and zero-shot generalize to entirely unseen tasks. This approach provides a significant step towards democratizing the specialization of foundation models and enables language-based adaptation with minimal compute requirements. Our code is available at https://github.com/SakanaAI/text-to-lora
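
As a rough picture of what "constructing a LoRA in a single forward pass" means, the sketch below pairs the standard LoRA update (a low-rank product added to a frozen weight matrix) with a toy stand-in for the hypernetwork. The function `toy_hypernetwork` is hypothetical: it is a fixed random linear map from a task embedding to the two factors, not the T2L architecture.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def toy_hypernetwork(task_embedding, d_out, d_in, rank, seed=0):
    """Hypothetical stand-in: map a task embedding to LoRA factors A and B."""
    rng = random.Random(seed)

    def project(n_rows, n_cols):
        # Each entry is a fixed random linear function of the embedding.
        return [[sum(rng.gauss(0, 0.1) * e for e in task_embedding)
                 for _ in range(n_cols)] for _ in range(n_rows)]

    A = project(rank, d_in)   # shape: rank x d_in
    B = project(d_out, rank)  # shape: d_out x rank
    return A, B

def apply_lora(W, A, B, alpha=1.0):
    """Standard LoRA update: W + (alpha / rank) * B @ A."""
    rank = len(A)
    delta = matmul(B, A)
    return [[w + alpha / rank * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

The point of the low-rank form is size: the generator only has to emit `rank * (d_in + d_out)` numbers per adapted matrix instead of `d_in * d_out`.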


Article 1027

Title@2025-06-09 (1): Detecting Out-of-Distribution Objects through Class-Conditioned Inpainting

Title: Detecting Out-of-Distribution Objects through Class-Conditioned Inpainting Nicht-Verteilungsobjekte durch klassenkonditionierte Inpainting erkennen 通过按类别修饰油漆检测分销物体 2402.03292v3

Authors (7): Quang-Huy Nguyen, Jin Peng Zhou, Zhenzhen Liu, Khanh-Huyen Bui, Kilian Q. Weinberger, Wei-Lun Chao, Dung D. Le

Recent object detectors have achieved impressive accuracy in identifying objects seen during training. However, real-world deployment often introduces novel and unexpected objects, referred to as out-of-distribution (OOD) objects, posing significant challenges to model trustworthiness. Modern object detectors are typically overconfident, making it unreliable to use their predictions alone for OOD detection. To address this, we propose leveraging an auxiliary model as a complementary solution. Specifically, we utilize an off-the-shelf text-to-image generative model, such as Stable Diffusion, which is trained with objective functions distinct from those of discriminative object detectors. We hypothesize that this fundamental difference enables the detection of OOD objects by measuring inconsistencies between the models. Concretely, for a given detected object bounding box and its predicted in-distribution class label, we perform class-conditioned inpainting on the image with the object removed. If the object is OOD, the inpainted image is likely to deviate significantly from the original, making the reconstruction error a robust indicator of OOD status. Extensive experiments demonstrate that our approach consistently surpasses existing zero-shot and non-zero-shot OOD detection methods, establishing a robust framework for enhancing object detection systems in dynamic environments.


Article 1028

Title@2025-06-09 (1): Trial and Trust: Addressing Byzantine Attacks with Comprehensive Defense Strategy

Title: Trial and Trust: Addressing Byzantine Attacks with Comprehensive Defense Strategy Prozess und Vertrauen: Byzantinische Angriffe mit umfassender Verteidigungsstrategie angehen 审判和信任:以全面防卫战略对付拜占庭袭击 2505.07614v2

Authors (9): Gleb Molodtsov, Daniil Medyakov, Sergey Skorik, Nikolas Khachaturov, Shahane Tigranyan, Vladimir Aletov, Aram Avetisyan, Martin Takáč, Aleksandr Beznosikov

Recent advancements in machine learning have improved performance while also increasing computational demands. While federated and distributed setups address these issues, their structure is vulnerable to malicious influences. In this paper, we address a specific threat, Byzantine attacks, where compromised clients inject adversarial updates to derail global convergence. We combine the trust scores concept with trial function methodology to dynamically filter outliers. Our methods address the critical limitations of previous approaches, allowing functionality even when Byzantine nodes are in the majority. Moreover, our algorithms adapt to widely used scaled methods like Adam and RMSProp, as well as practical scenarios, including local training and partial participation. We validate the robustness of our methods by conducting extensive experiments on both synthetic and real ECG data collected from medical institutions. Furthermore, we provide a broad theoretical analysis of our algorithms and their extensions to aforementioned practical setups. The convergence guarantees of our methods are comparable to those of classical algorithms developed without Byzantine interference.


Article 1029

Title@2025-06-09 (1): Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger

Title: Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger Re-Ranking Reasoning Kontext mit Baumsuche macht große Vision-Sprache Modelle stärker 以树木搜索为重新排列的原因说明背景,使大型视觉语言模型更强大 2506.07785v1

Authors (6): Qi Yang, Chenghao Zhang, Lubin Fan, Kun Ding, Jieping Ye, Shiming Xiang

Recent advancements in Large Vision Language Models (LVLMs) have significantly improved performance in Visual Question Answering (VQA) tasks through multimodal Retrieval-Augmented Generation (RAG). However, existing methods still face challenges, such as the scarcity of knowledge with reasoning examples and erratic responses from retrieved knowledge. To address these issues, in this study, we propose a multimodal RAG framework, termed RCTS, which enhances LVLMs by constructing a Reasoning Context-enriched knowledge base and a Tree Search re-ranking method. Specifically, we introduce a self-consistent evaluation mechanism to enrich the knowledge base with intrinsic reasoning patterns. We further propose a Monte Carlo Tree Search with Heuristic Rewards (MCTS-HR) to prioritize the most relevant examples. This ensures that LVLMs can leverage high-quality contextual reasoning for better and more consistent responses. Extensive experiments demonstrate that our framework achieves state-of-the-art performance on multiple VQA datasets, significantly outperforming In-Context Learning (ICL) and Vanilla-RAG methods. It highlights the effectiveness of our knowledge base and re-ranking method in improving LVLMs. Our code is available at https://github.com/yannqi/RCTS-RAG.


Article 1030

Title@2025-06-09 (1): Is poisoning a real threat to LLM alignment? Maybe more so than you think

Title: Is poisoning a real threat to LLM alignment? Maybe more so than you think Ist Vergiftung eine echte Bedrohung für LLM Ausrichtung? Vielleicht mehr, als Sie denken 中毒是否真正威胁到LLM对齐? 2406.12091v4

Authors (5): Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, Furong Huang

Recent advancements in Reinforcement Learning with Human Feedback (RLHF) have significantly impacted the alignment of Large Language Models (LLMs). The sensitivity of reinforcement learning algorithms such as Proximal Policy Optimization (PPO) has led to a new line of work on Direct Preference Optimization (DPO), which treats RLHF in a supervised learning framework. The increased practical use of these RLHF methods warrants an analysis of their vulnerabilities. In this work, we investigate the vulnerabilities of DPO to poisoning attacks under different scenarios and compare the effectiveness of preference poisoning, a first of its kind. We comprehensively analyze DPO’s vulnerabilities under different types of attacks, i.e., backdoor and non-backdoor attacks, and different poisoning methods across a wide array of language models, i.e., Llama 7B, Mistral 7B, and Gemma 7B. We find that unlike PPO-based methods, which, when it comes to backdoor attacks, require at least 4% of the data to be poisoned to elicit harmful behavior, we exploit the true vulnerabilities of DPO more simply so we can poison the model with as little as 0.5% of the data. We further investigate the potential reasons behind the vulnerability and how well this vulnerability translates into backdoor vs non-backdoor attacks.


Article 1031

Title@2025-06-09 (1): WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection

Title: WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection WWAggr: Ein Fenster Wasserstein-basierte Aggregation für Ensemble Change Point Detection WWWAggr:一个基于窗口的瓦西施泰因聚合群,用于聚合变化点探测 2506.08066v1

Authors (3): Alexander Stepikin, Evgenia Romanenkova, Alexey Zaytsev

Change Point Detection (CPD) aims to identify moments of abrupt distribution shifts in data streams. Real-world high-dimensional CPD remains challenging due to data pattern complexity and violation of common assumptions. Resorting to standalone deep neural networks, the current state-of-the-art detectors have yet to achieve perfect quality. Concurrently, ensembling provides more robust solutions, boosting the performance. In this paper, we investigate ensembles of deep change point detectors and find that standard prediction aggregation techniques, e.g., averaging, are suboptimal and fail to account for problem peculiarities. As an alternative, we introduce WWAggr, a novel task-specific method of ensemble aggregation based on the Wasserstein distance. Our procedure is versatile, working effectively with various ensembles of deep CPD models. Moreover, unlike existing solutions, we address in practice the long-standing problem of decision threshold selection for CPD.
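
The abstract does not spell out the aggregation procedure, so the following is only a sketch of the underlying building block: the one-dimensional Wasserstein-1 distance between two equally sized samples, applied to adjacent sliding windows of a score series. The function names and the windowing scheme are assumptions for illustration.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1D samples.

    For equal sizes it equals the mean absolute difference of the sorted values.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def window_distances(series, window):
    """Distance between the window before and the window after each index t."""
    return [wasserstein_1d(series[t - window:t], series[t:t + window])
            for t in range(window, len(series) - window + 1)]
```

On a series that jumps from one level to another, the distance peaks where the two windows sit entirely on opposite sides of the jump, which is the behaviour a distribution-aware aggregator can exploit and a plain average cannot.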


Article 1032

Title@2025-06-09 (1): Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics

Title: Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics Die Grenzen der Low-Bit-Optimierer drücken: Fokus auf EMA-Dynamik 推动低比最佳优化因素的极限:注重EMA动态 2505.00347v2

Authors (10): Cong Xu, Wenbin Liang, Mo Yu, Anan Liu, Ke-Yue Zhang, Shunli Wang, Lizhuang Ma, Jianyong Wang, Jun Wang, Wei Zhang

The rapid scaling of models has led to prohibitively high training and fine-tuning costs. A major factor accounting for memory consumption is the widespread use of stateful optimizers (e.g., Adam), which maintain auxiliary information of even 2x the model size in order to achieve optimal convergence. We therefore present SOLO in this work to spawn a novel type of optimizer that requires an extremely light memory footprint. While previous efforts have achieved certain success in 8-bit or 4-bit cases, SOLO enables Adam-style optimizers to maintain quantized states with precision as low as 3 bits, or even 2 bits. This immense progress is due to the identification and resolution of two key challenges: the signal swamping problem in unsigned quantization that results in unchanged state dynamics, and the increased gradient variance in signed quantization that leads to incorrect descent directions. The theoretical analysis suggests a tailored logarithmic quantization for the former and a precision-specific momentum hyperparameter for the latter. SOLO can thus be seamlessly applied to Adam-style optimizers, leading to substantial memory savings with minimal accuracy loss.
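
The signal swamping problem mentioned above can be reproduced in a toy setting. The sketch below is not SOLO; it assumes a hypothetical 2-bit linear quantizer on [0, 1] and an EMA decay of 0.999, and shows that the quantized state never moves while the full-precision state does.

```python
def quantize_linear(x, bits):
    """Round x in [0, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    x = min(max(x, 0.0), 1.0)
    return round(x * levels) / levels


def ema_step(state, value, beta):
    """One exponential-moving-average update, as used for Adam-style states."""
    return beta * state + (1.0 - beta) * value


beta = 0.999
quantized = quantize_linear(1.0 / 3.0, bits=2)  # start on a grid point
full_precision = 1.0 / 3.0

for _ in range(100):
    # The per-step change (about 0.0007) is far smaller than the grid
    # spacing (1/3), so rounding snaps the state back every time.
    quantized = quantize_linear(ema_step(quantized, 1.0, beta), bits=2)
    full_precision = ema_step(full_precision, 1.0, beta)
```

After 100 updates the quantized state is still at its starting level, whereas the full-precision state has drifted toward the incoming value. This is the "unchanged state dynamics" effect that the abstract attributes to unsigned low-bit quantization.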



Article 1033

Title@2025-06-09 (1): Trend-Aware Fashion Recommendation with Visual Segmentation and Semantic Similarity

Title: Trend-Aware Fashion Recommendation with Visual Segmentation and Semantic Similarity Trend-Aware Fashion Empfehlung mit Visueller Segmentierung und semantischer Ähnlichkeit 具有视觉分解和语义相似性的趋势软件时装建议 2506.07773v1

Authors (3): Mohamed Djilani, Nassim Ali Ousalah, Nidhal Eddine Chenni

We introduce a trend-aware and visually-grounded fashion recommendation system that integrates deep visual representations, garment-aware segmentation, semantic category similarity and user behavior simulation. Our pipeline extracts focused visual embeddings by masking non-garment regions via semantic segmentation followed by feature extraction using pretrained CNN backbones (ResNet-50, DenseNet-121, VGG16). To simulate realistic shopping behavior, we generate synthetic purchase histories influenced by user-specific trendiness and item popularity. Recommendations are computed using a weighted scoring function that fuses visual similarity, semantic coherence and popularity alignment. Experiments on the DeepFashion dataset demonstrate consistent gender alignment and improved category relevance, with ResNet-50 achieving 64.95% category similarity and lowest popularity MAE. An ablation study confirms the complementary roles of visual and popularity cues. Our method provides a scalable framework for personalized fashion recommendations that balances individual style with emerging trends. Our implementation is available at https://github.com/meddjilani/FashionRecommender



Article 1034

Title: Diffusion Models-Aided Uplink Channel Estimation for RIS-Assisted Systems Diffusion Models-Aided Uplink Channel Abschätzung für RIS-Assistente Systeme IRIS辅助系统传播模型辅助上链接频道估计 2506.07770v1

Authors (8): Yang Wang, Yin Xu, Cixiao Zhang, Zhiyong Chen, Xiaowu Ou, Mingzeng Dai, Meixia Tao, Wenjun Zhang

This letter proposes a channel estimation method for reconfigurable intelligent surface (RIS)-assisted systems through a novel diffusion model (DM) framework. We reformulate the channel estimation problem as a denoising process, which aligns with the reverse process of the DM. To overcome the inherent randomness in the reverse process of conventional DM approaches, we adopt a deterministic sampling strategy with a step alignment mechanism that ensures the accuracy of channel estimation while adapting to different signal-to-noise ratios (SNRs). Furthermore, to reduce the number of parameters of the U-Net, we meticulously design a lightweight network that achieves comparable performance, thereby enhancing the practicality of our proposed method. Extensive simulations demonstrate superior performance over a wide range of SNRs compared to baselines. For instance, the proposed method achieves performance improvements of up to 13.5 dB in normalized mean square error (NMSE) at SNR = 0 dB. Notably, the proposed lightweight network exhibits almost no performance loss compared to the original U-Net, while requiring only 6.59% of its parameters.
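
For readers unfamiliar with the metric, here is a minimal sketch of how NMSE in dB is commonly computed for a complex channel vector. The function name and the toy vectors are assumptions; the abstract does not give the authors' exact evaluation code.

```python
import math


def nmse_db(h_true, h_est):
    """Normalized mean square error in dB: 10*log10(||h_est - h||^2 / ||h||^2)."""
    err = sum(abs(e - t) ** 2 for t, e in zip(h_true, h_est))
    ref = sum(abs(t) ** 2 for t in h_true)
    if ref == 0.0:
        raise ValueError("reference channel has zero energy")
    if err == 0.0:
        return float("-inf")  # perfect estimate
    return 10.0 * math.log10(err / ref)


# Toy complex channel and an estimate that is off by 10% on one tap.
value = nmse_db([1 + 0j, 0j], [0.9 + 0j, 0j])
```

On this scale, a 13.5 dB improvement corresponds to roughly a 22-fold reduction in linear NMSE, since 10^(13.5/10) ≈ 22.4.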



Article 1035

Title@2025-06-09 (1): Clustered Federated Learning via Embedding Distributions

Title: Clustered Federated Learning via Embedding Distributions Clustered Federated Learning durch Einbettung von Distributionen 通过嵌入式分发分发方式进行分组联邦学习 2506.07769v1

Authors (3): Dekai Zhang, Matthew Williams, Francesca Toni

Federated learning (FL) is a widely used framework for machine learning in distributed data environments where clients hold data that cannot be easily centralised, such as for data protection reasons. FL, however, is known to be vulnerable to non-IID data. Clustered FL addresses this issue by finding more homogeneous clusters of clients. We propose a novel one-shot clustering method, EMD-CFL, using the Earth Mover’s distance (EMD) between data distributions in embedding space. We theoretically motivate the use of EMDs using results from the domain adaptation literature and demonstrate empirically superior clustering performance in extensive comparisons against 16 baselines and on a range of challenging datasets.



Article 1036

Title@2025-06-09 (1): Quickest Causal Change Point Detection by Adaptive Intervention

Title: Quickest Causal Change Point Detection by Adaptive Intervention Schnellste ursächliche Punktänderungserkennung durch adaptive Intervention 通过适应干预措施快速发现因果变化点 2506.07760v1

Authors (2): Haijie Xu, Chen Zhang

We propose an algorithm for change point monitoring in linear causal models that accounts for interventions. Through a special centralization technique, we can concentrate the changes arising from causal propagation across nodes into a single dimension. Additionally, by selecting appropriate intervention nodes based on Kullback-Leibler divergence, we can amplify the change magnitude. We also present an algorithm for selecting the intervention values, which aids in the identification of the most effective intervention nodes. Two monitoring methods are proposed, each with an adaptive intervention policy to balance exploration and exploitation. We theoretically demonstrate the first-order optimality of the proposed methods and validate their properties using simulation datasets and two real-world case studies.



Article 1037

Title@2025-06-09 (1): Agent Semantics, Semantic Spacetime, and Graphical Reasoning

Title: Agent Semantics, Semantic Spacetime, and Graphical Reasoning Agent Semantics, Semantische Raumzeit und Graphische Vernunft 语义学、语义空间时间和图形解释 2506.07756v1

Authors (1): Mark Burgess

Some formal aspects of the Semantic Spacetime graph model are presented, with reference to its use for directed knowledge representations and process modelling. A finite $\gamma(3,4)$ representation is defined to form a closed set of operations that can scale to any degree of semantic complexity. The Semantic Spacetime postulates bring predictability with minimal constraints to pathways in graphs. The ubiquitous appearance of absorbing states in any partial graph means that a graph process leaks information. This issue is closely associated with that of division by zero, which signals a loss of closure and the need for manual injection of remedial information. The origins of the Semantic Spacetime model (and its Promise Theory) help to clarify how such absorbing states are associated with boundary information where intentionality can enter.



Article 1038

Title@2025-06-09 (1): Comparing Credit Risk Estimates in the Gen-AI Era

Title: Comparing Credit Risk Estimates in the Gen-AI Era Vergleich der Schätzungen des Kreditrisikos in der Ära der Gen-AI 比较GEN-AI时代的信用风险估算 2506.07754v1

Authors (7): Nicola Lavecchia, Sid Fadanelli, Federico Ricciuti, Gennaro Aloe, Enrico Bagli, Pietro Giuffrida, Daniele Vergari

Generative AI technologies have demonstrated significant potential across diverse applications. This study provides a comparative analysis of credit score modeling techniques, contrasting traditional approaches with those leveraging generative AI. Our findings reveal that current generative AI models fall short of matching the performance of traditional methods, regardless of the integration strategy employed. These results highlight the limitations in the current capabilities of generative AI for credit risk scoring, emphasizing the need for further research and development before generative AI can be applied to this specific task or equivalent ones.



Article 1039

Title@2025-06-09 (1): An Overview of the Burer-Monteiro Method for Certifiable Robot Perception

Title: An Overview of the Burer-Monteiro Method for Certifiable Robot Perception Ein Überblick über die Burer-Monteiro-Methode für zertifizierbare Roboterwahrnehmung 关于可核证的机器人感知的Burer-Monteiro方法概览 2410.00117v2

Authors (5): Alan Papalia, Yulun Tian, David M. Rosen, Jonathan P. How, John J. Leonard

This paper presents an overview of the Burer-Monteiro method (BM), a technique that has been applied to solve robot perception problems to certifiable optimality in real-time. BM is often used to solve semidefinite programming relaxations, which can be used to perform global optimization for non-convex perception problems. Specifically, BM leverages the low-rank structure of typical semidefinite programs to dramatically reduce the computational cost of performing optimization. This paper discusses BM in certifiable perception, with three main objectives: (i) to consolidate information from the literature into a unified presentation, (ii) to elucidate the role of the linear independence constraint qualification (LICQ), a concept not yet well-covered in certifiable perception literature, and (iii) to share practical considerations that are discussed among practitioners but not thoroughly covered in the literature. Our general aim is to offer a practical primer for applying BM towards certifiable perception.



Article 1040

Title@2025-06-09 (1): DualCast: A Model to Disentangle Aperiodic Events from Traffic Series

Title: DualCast: A Model to Disentangle Aperiodic Events from Traffic Series DualCast: Ein Modell, um Aperiodische Ereignisse aus der Verkehrsreihe zu entwirren DualCast: 将周期性事件与交通系列分开的模型 2411.18286v2

Authors (6): Xinyu Su, Feng Liu, Yanchuan Chang, Egemen Tanin, Majid Sarvi, Jianzhong Qi

Traffic forecasting is crucial for transportation systems optimisation. Current models minimise the mean forecasting errors, often favouring periodic events prevalent in the training data, while overlooking critical aperiodic ones like traffic incidents. To address this, we propose DualCast, a dual-branch framework that disentangles traffic signals into intrinsic spatial-temporal patterns and external environmental contexts, including aperiodic events. DualCast also employs a cross-time attention mechanism to capture high-order spatial-temporal relationships from both periodic and aperiodic patterns. DualCast is versatile. We integrate it with recent traffic forecasting models, consistently reducing their forecasting errors by up to 9.6% on multiple real datasets. Our source code is available at https://github.com/suzy0223/DualCast.



Article 1041

Title@2025-06-09 (1): Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning

Title: Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning Graph-Assistente Stiche für Offline-Hierarchisches Verstärkungslernen 离线高层强化学习的图表辅助细化 2506.07744v1

Authors (5): Seungho Baek, Taegeon Park, Jongchan Park, Seungjun Oh, Yusung Kim

Existing offline hierarchical reinforcement learning methods rely on high-level policy learning to generate subgoal sequences. However, their efficiency degrades as task horizons increase, and they lack effective strategies for stitching useful state transitions across different trajectories. We propose Graph-Assisted Stitching (GAS), a novel framework that formulates subgoal selection as a graph search problem rather than learning an explicit high-level policy. By embedding states into a Temporal Distance Representation (TDR) space, GAS clusters semantically similar states from different trajectories into unified graph nodes, enabling efficient transition stitching. A shortest-path algorithm is then applied to select subgoal sequences within the graph, while a low-level policy learns to reach the subgoals. To improve graph quality, we introduce the Temporal Efficiency (TE) metric, which filters out noisy or inefficient transition states, significantly enhancing task performance. GAS outperforms prior offline HRL methods across locomotion, navigation, and manipulation tasks. Notably, in the most stitching-critical task, it achieves a score of 88.3, dramatically surpassing the previous state-of-the-art score of 1.0. Our source code is available at: https://github.com/qortmdgh4141/GAS.
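
The graph-search step described above can be sketched with a standard shortest-path routine. This is a generic Dijkstra implementation, not the GAS code: the node names, the edge weights (standing in for temporal distances between clustered states), and the function name are all hypothetical.

```python
import heapq


def shortest_subgoal_path(graph, start, goal):
    """Dijkstra over a dict-of-dicts graph; returns the node sequence or None."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for nxt, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(nxt, float("inf")):
                dist[nxt] = candidate
                prev[nxt] = node
                heapq.heappush(heap, (candidate, nxt))
    if goal not in dist:
        return None
    # Walk predecessors back from the goal to recover the subgoal sequence.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy graph: each node stands for a cluster of similar states.
graph = {
    "s": {"a": 1.0, "b": 4.0},
    "a": {"b": 1.0, "g": 5.0},
    "b": {"g": 1.0},
}
subgoals = shortest_subgoal_path(graph, "s", "g")
```

The returned node sequence plays the role of the subgoal sequence; in the paper's setting a learned low-level policy would then be asked to reach each subgoal in turn.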



Article 1042

Title@2025-06-09 (1): An end-to-end attention-based approach for learning on graphs

Title: An end-to-end attention-based approach for learning on graphs Ein durchgängiger, auf der Aufmerksamkeit basierender Ansatz für das Lernen auf Diagrammen 以端对端关注为基础的图表学习方法 2402.10793v3

Authors (4): David Buterez, Jon Paul Janet, Dino Oglic, Pietro Lio

There has been a recent surge in transformer-based architectures for learning on graphs, mainly motivated by attention as an effective learning mechanism and the desire to supersede handcrafted operators characteristic of message passing schemes. However, concerns over their empirical effectiveness, scalability, and complexity of the pre-processing steps have been raised, especially in relation to much simpler graph neural networks that typically perform on par with them across a wide range of benchmarks. To tackle these shortcomings, we consider graphs as sets of edges and propose a purely attention-based approach consisting of an encoder and an attention pooling mechanism. The encoder vertically interleaves masked and vanilla self-attention modules to learn effective representations of edges, while allowing for tackling possible misspecifications in input graphs. Despite its simplicity, the approach outperforms fine-tuned message passing baselines and recently proposed transformer-based methods on more than 70 node and graph-level tasks, including challenging long-range benchmarks. Moreover, we demonstrate state-of-the-art performance across different tasks, ranging from molecular to vision graphs, and heterophilous node classification. The approach also outperforms graph neural networks and transformers in transfer learning settings, and scales much better than alternatives with a similar performance level or expressive power.



Article 1043

Title@2025-06-09 (1): When, Where and Why to Average Weights?

Title: When, Where and Why to Average Weights? Wann, wo und warum zu Durchschnittsgewichten? 何时、何地和为何平均体重? 2502.06761v2

Authors (3): Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping

Averaging checkpoints along the training trajectory is a simple yet powerful approach to improve the generalization performance of Machine Learning models and reduce training time. Motivated by these potential gains, and in an effort to fairly and thoroughly benchmark this technique, we present an extensive evaluation of averaging techniques in modern Deep Learning, which we perform using AlgoPerf (Dahl et al., 2023), a large-scale benchmark for optimization algorithms. We investigate whether weight averaging can reduce training time, improve generalization, and replace learning rate decay, as suggested by recent literature. Our evaluation across seven architectures and datasets reveals that averaging significantly accelerates training and yields considerable efficiency gains, at the price of a minimal implementation and memory cost, while mildly improving generalization across all considered workloads. Finally, we explore the relationship between averaging and learning rate annealing and show how to optimally combine the two to achieve the best performance.
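
As a rough illustration of what checkpoint averaging means, the sketch below implements two common variants on flat weight lists: a uniform average and an exponential moving average. The function names and the toy checkpoints are assumptions; the abstract does not say which variants or schedules the paper evaluates.

```python
def uniform_average(checkpoints):
    """Element-wise mean of a list of equally shaped weight vectors."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    return [sum(ws) / n for ws in zip(*checkpoints)]


def ema_average(checkpoints, decay):
    """Exponential moving average, initialised at the first checkpoint."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    avg = list(checkpoints[0])
    for ckpt in checkpoints[1:]:
        avg = [decay * a + (1.0 - decay) * w for a, w in zip(avg, ckpt)]
    return avg


# Two toy checkpoints of a two-parameter model.
ckpts = [[1.0, 2.0], [3.0, 4.0]]
uniform = uniform_average(ckpts)
ema = ema_average(ckpts, decay=0.5)
```

Either variant needs only one extra copy of the weights in memory, which is consistent with the "minimal implementation and memory cost" the abstract mentions.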



Article 1044

Title@2025-06-09 (1): Language Embedding Meets Dynamic Graph: A New Exploration for Neural Architecture Representation Learning

Title: Language Embedding Meets Dynamic Graph: A New Exploration for Neural Architecture Representation Learning Spracheinbettung trifft auf Dynamische Grafik: Eine neue Exploration für neurale Architektur-Repräsentations-Lernen 语言嵌入与动态图表相匹配:神经结构代表制学习的新探索 2506.07735v1

Authors (6): Haizhao Jing, Haokui Zhang, Zhenhao Shang, Rong Xiao, Peng Wang, Yanning Zhang

Neural Architecture Representation Learning aims to transform network models into feature representations for predicting network attributes, playing a crucial role in deploying and designing networks for real-world applications. Recently, inspired by the success of transformers, transformer-based models integrated with Graph Neural Networks (GNNs) have achieved significant progress in representation learning. However, current methods still have some limitations. First, existing methods overlook hardware attribute information, which conflicts with the current trend of diversified deep learning hardware and limits the practical applicability of models. Second, current encoding approaches rely on static adjacency matrices to represent topological structures, failing to capture the structural differences between computational nodes, which ultimately compromises encoding effectiveness. In this paper, we introduce LeDG-Former, an innovative framework that addresses these limitations through the synergistic integration of language-based semantic embedding and dynamic graph representation learning. Specifically, inspired by large language models (LLMs), we propose a language embedding framework where both neural architectures and hardware platform specifications are projected into a unified semantic space through tokenization and LLM processing, enabling zero-shot prediction across different hardware platforms for the first time. Then, we propose a dynamic graph-based transformer for modeling neural architectures, resulting in improved neural architecture modeling performance. On the NNLQP benchmark, LeDG-Former surpasses previous methods, establishing a new SOTA while demonstrating the first successful cross-hardware latency prediction capability. Furthermore, our framework achieves superior performance on the cell-structured NAS-Bench-101 and NAS-Bench-201 datasets.



Article 1045

Title@2025-06-09 (1): Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild

Title: Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild Robuste 3D-Form-Rekonstruktion in Zero-Shot aus einem einzigen Bild in der Wildnis 从野生的单一图像中进行零热零位的强力 3D 形状重建 2403.14539v3

Authors (4): Junhyeong Cho, Kim Youwang, Hunmin Yang, Tae-Hyun Oh

Recent monocular 3D shape reconstruction methods have shown promising zero-shot results on object-segmented images without any occlusions. However, their effectiveness is significantly compromised in real-world conditions, due to imperfect object segmentation by off-the-shelf models and the prevalence of occlusions. To effectively address these issues, we propose a unified regression model that integrates segmentation and reconstruction, specifically designed for occlusion-aware 3D shape reconstruction. To facilitate its reconstruction in the wild, we also introduce a scalable data synthesis pipeline that simulates a wide range of variations in objects, occluders, and backgrounds. Training on our synthetic data enables the proposed model to achieve state-of-the-art zero-shot results on real-world images, using significantly fewer parameters than competing approaches.



Article 1046

Title@2025-06-09 (1): Profiling Electric Vehicles via Early Charging Voltage Patterns

Title: Profiling Electric Vehicles via Early Charging Voltage Patterns Profilieren von Elektrofahrzeugen über frühe Ladespannungsmuster 通过早期充电压模式分析电动车辆 2506.07714v1

Authors (4): Francesco Marchiori, Denis Donadel, Alessandro Brighente, Mauro Conti

Electric Vehicles (EVs) are rapidly gaining adoption as a sustainable alternative to fuel-powered vehicles, making secure charging infrastructure essential. Despite traditional authentication protocols, recent results showed that attackers may steal energy through tailored relay attacks. One countermeasure is leveraging the EV’s fingerprint on the current exchanged during charging. However, existing methods focus on the final charging stage, allowing malicious actors to consume substantial energy before being detected and repudiated. This underscores the need for earlier and more effective authentication methods to prevent unauthorized charging. Meanwhile, profiling raises privacy concerns, as uniquely identifying EVs through charging patterns could enable user tracking. In this paper, we propose a framework for uniquely identifying EVs using physical measurements from the early charging stages. We hypothesize that voltage behavior early in the process exhibits similar characteristics to current behavior in later stages. By extracting features from early voltage measurements, we demonstrate the feasibility of EV profiling. Our approach improves existing methods by enabling faster and more reliable vehicle identification. We test our solution on a dataset of 7408 usable charges from 49 EVs, achieving up to 0.86 accuracy. Feature importance analysis shows that near-optimal performance is possible with just 10 key features, improving efficiency alongside our lightweight models. This research lays the foundation for a novel authentication factor while exposing potential privacy risks from unauthorized access to charging data.



Article 1047

Title@2025-06-09 (1): Representation Bending for Large Language Model Safety

Title: Representation Bending for Large Language Model Safety Darstellungsbiegen für große Sprachmodellsicherheit 大语文示范语文安全示范语文代表名单 2504.01550v2

Authors (10): Ashkan Yousefpour, Taeheon Kim, Ryan S. Kwon, Seungbeen Lee, Wonje Jeung, Seungju Han, Alvin Wan, Harrison Ngan, Youngjae Yu, Jonghyun Choi

Large Language Models (LLMs) have emerged as powerful tools, but their inherent safety risks - ranging from harmful content generation to broader societal harms - pose significant challenges. These risks can be amplified by recent adversarial attacks, fine-tuning vulnerabilities, and the increasing deployment of LLMs in high-stakes environments. Existing safety-enhancing techniques, such as fine-tuning with human feedback or adversarial training, are still vulnerable as they address specific threats and often fail to generalize across unseen attacks, or require manual system-level defenses. This paper introduces RepBend, a novel approach that fundamentally disrupts the representations underlying harmful behaviors in LLMs, offering a scalable solution to enhance (potentially inherent) safety. RepBend brings the idea of activation steering - simple vector arithmetic for steering a model’s behavior during inference - to loss-based fine-tuning. Through extensive evaluation, RepBend achieves state-of-the-art performance, outperforming prior methods such as Circuit Breaker, RMU, and NPO, with up to 95% reduction in attack success rates across diverse jailbreak benchmarks, all with negligible reduction in model usability and general capabilities.



Article 1048

Title@2025-06-09 (1): Statistical Taylor Expansion

Title: Statistical Taylor Expansion Statistische Taylor-Erweiterung 扩大泰勒统计 2410.01223v8

Authors (1): Chengpu Wang

Statistical Taylor expansion replaces the precise input variables in a conventional Taylor expansion with random variables, each with a known distribution, to calculate the mean and deviation of the result. It is based on the uncorrelated uncertainty assumption: Each input variable is measured independently with fine enough statistical precision, so that their uncertainties are independent of each other. It requires each sampling count to be sufficiently large for accurate results. Statistical Taylor expansion reveals that the intermediate analytic expressions can no longer be regarded as independent of each other, and the result of an analytic expression should be path independent. This conclusion differs fundamentally from the conventional common approach in applied mathematics to find the best execution path for a result. This paper also presents an implementation of statistical Taylor expansion called variance arithmetic, and the tests on variance arithmetic.
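
To make the idea concrete, here is the textbook second-order truncation for a single input, offered as a sketch rather than as the paper's variance arithmetic. The symbols $\bar{x}$, $\delta$, and $\sigma$ are introduced here for illustration: the input is written as $x = \bar{x} + \delta$ with $E[\delta] = 0$ and $E[\delta^2] = \sigma^2$.

$$
f(\bar{x} + \delta) \approx f(\bar{x}) + f'(\bar{x})\,\delta + \tfrac{1}{2} f''(\bar{x})\,\delta^2
$$

$$
E[f(x)] \approx f(\bar{x}) + \tfrac{1}{2} f''(\bar{x})\,\sigma^2,
\qquad
\mathrm{Var}[f(x)] \approx f'(\bar{x})^2\,\sigma^2 \quad \text{(to leading order in } \sigma\text{)}
$$

With several inputs whose uncertainties are uncorrelated, as the assumption above requires, the leading-order variance becomes $\sum_i (\partial f / \partial x_i)^2 \sigma_i^2$, with no cross terms.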



Article 1049

Title@2025-06-09 (1): Evaluating Robustness in Latent Diffusion Models via Embedding Level Augmentation

Title: Evaluating Robustness in Latent Diffusion Models via Embedding Level Augmentation Bewertung der Robustheit in latenten Diffusionsmodellen durch Einbetten von Level Augmentation 通过嵌入级递增评估前流传播模型的强力 2506.07706v1

Authors (2): Boris Martirosyan, Alexey Karmanov

Latent diffusion models (LDMs) achieve state-of-the-art performance across various tasks, including image generation and video synthesis. However, they generally lack robustness, a limitation that is not yet fully explored in current research. In this paper, we propose several methods to address this gap. First, we hypothesize that the robustness of LDMs should primarily be measured without their text encoder, because if we take and explore the whole architecture, the problems of the image generator and the text encoder will be fused. Second, we introduce novel data augmentation techniques designed to reveal robustness shortcomings in LDMs when processing diverse textual prompts. We then fine-tune Stable Diffusion 3 and Stable Diffusion XL models using Dreambooth, incorporating these proposed augmentation methods across multiple tasks. Finally, we propose a novel evaluation pipeline specifically tailored to assess the robustness of LDMs fine-tuned via Dreambooth.



Article 1050

Title@2025-06-09 (1): Towards a Small Language Model Lifecycle Framework

Title: Towards a Small Language Model Lifecycle Framework Auf dem Weg zu einem Rahmen für den Lebenszyklus eines kleinen Sprachmodells 建立一个小型语言模拟生命周期框架 2506.07695v1

Authors (4): Parsa Miraghaei, Sergio Moreschini, Antti Kolehmainen, David Hästbacka

Background: The growing demand for efficient and deployable language models has led to increased interest in Small Language Models (SLMs). However, existing research remains fragmented, lacking a unified lifecycle perspective. Objective: This study aims to define a comprehensive lifecycle framework for SLMs by synthesizing insights from academic literature and practitioner sources. Method: We conducted a comprehensive survey of 36 works, analyzing and categorizing lifecycle-relevant techniques. Results: We propose a modular lifecycle model structured into main, optional, and cross-cutting components. The model captures key interconnections across stages, supporting method reuse, co-adaptation, and lifecycle-awareness. Conclusion: Our framework provides a coherent foundation for developing and maintaining SLMs, bridging theory and practice, and guiding future research and tool development.



Article 1051

Title@2025-06-09 (1): Training Superior Sparse Autoencoders for Instruct Models

Title: Training Superior Sparse Autoencoders for Instruct Models Training Superior Sparse Autoencoder für Instruct-Modelle 培训高级Sparass Autoencors 教学模型高级导师 2506.07691v1

Authors (8): Jiaming Li, Haoran Ye, Yukun Chen, Xinyue Li, Lei Zhang, Hamid Alinejad-Rokny, Jimmy Chih-Hsien Peng, Min Yang

As large language models (LLMs) grow in scale and capability, understanding their internal mechanisms becomes increasingly critical. Sparse autoencoders (SAEs) have emerged as a key tool in mechanistic interpretability, enabling the extraction of human-interpretable features from LLMs. However, existing SAE training methods are primarily designed for base models, resulting in reduced reconstruction quality and interpretability when applied to instruct models. To bridge this gap, we propose Finetuning-aligned Sequential Training (FAST), a novel training method specifically tailored for instruct models. FAST aligns the training process with the data distribution and activation patterns characteristic of instruct models, resulting in substantial improvements in both reconstruction and feature interpretability. On Qwen2.5-7B-Instruct, FAST achieves a mean squared error of 0.6468 in token reconstruction, significantly outperforming baseline methods with errors of 5.1985 and 1.5096. In feature interpretability, FAST yields a higher proportion of high-quality features: for Llama3.2-3B-Instruct, 21.1% scored in the top range, compared to 7.0% and 10.2% for BT(P) and BT(F). Surprisingly, we discover that intervening on the activations of special tokens via the SAEs leads to improvements in output quality, suggesting new opportunities for fine-grained control of model behavior. Code, data, and 240 trained SAEs are available at https://github.com/Geaming2002/FAST.



Article 1052

Title@2025-06-09 (1): Rao-Blackwellised Reparameterisation Gradients

Title: Rao-Blackwellised Reparameterisation Gradients Rao-Blackwellised Reparameterisation Gradienten Rao-黑-黑- 恢复度梯度 2506.07687v1

Authors (4): Kevin Lam, Thang Bui, George Deligiannidis, Yee Whye Teh

Latent Gaussian variables have been popularised in probabilistic machine learning. In turn, gradient estimators are the machinery that facilitates gradient-based optimisation for models with latent Gaussian variables. The reparameterisation trick is often used as the default estimator as it is simple to implement and yields low-variance gradients for variational inference. In this work, we propose the R2-G2 estimator as the Rao-Blackwellisation of the reparameterisation gradient estimator. Interestingly, we show that the local reparameterisation gradient estimator for Bayesian MLPs is an instance of the R2-G2 estimator and Rao-Blackwellisation. This lets us extend benefits of Rao-Blackwellised gradients to a suite of probabilistic models. We show that initial training with R2-G2 consistently yields better performance in models with multiple applications of the reparameterisation trick.
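
For context, the baseline reparameterisation gradient estimator that R2-G2 builds on can be sketched in a few lines. This is not the R2-G2 estimator itself: the test function f(z) = z², the sample size, and the function name are assumptions chosen so that the exact gradients (2μ and 2σ) are known.

```python
import random


def reparam_gradients(mu, sigma, n_samples, seed=0):
    """Monte Carlo gradients of E[z**2] w.r.t. mu and sigma, with z = mu + sigma*eps.

    Writing z as a deterministic function of (mu, sigma) and noise eps ~ N(0, 1)
    lets the gradient pass through the sample: df/dmu = f'(z), df/dsigma = f'(z)*eps.
    """
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps
        df_dz = 2.0 * z  # derivative of f(z) = z**2
        grad_mu += df_dz
        grad_sigma += df_dz * eps
    return grad_mu / n_samples, grad_sigma / n_samples


# Exact values for comparison: d/dmu = 2*mu = 3.0, d/dsigma = 2*sigma = 1.0.
g_mu, g_sigma = reparam_gradients(mu=1.5, sigma=0.5, n_samples=20000)
```

Rao-Blackwellisation, as proposed in the abstract, would replace parts of this per-sample estimate with conditional expectations to lower its variance; that step is not shown here.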



Article 1053

Title@2025-06-09 (1): On Fitting Flow Models with Large Sinkhorn Couplings

Title: On Fitting Flow Models with Large Sinkhorn Couplings Auf Einbau-Flow-Modelle mit großen Sinkhorn-Kupplungen 配有大型Sinkhorn组合体的 “ 适合流程模型 “ 2506.05526v2

Authors (4): Michal Klein, Alireza Mousavi-Hosseini, Stephen Zhang, Marco Cuturi

Flow models transform data gradually from one modality (e.g. noise) onto another (e.g. images). Such models are parameterized by a time-dependent velocity field, trained to fit segments connecting pairs of source and target points. When the pairing between source and target points is given, training flow models boils down to a supervised regression problem. When no such pairing exists, as is the case when generating data from noise, training flows is much harder. A popular approach lies in picking source and target points independently. This can, however, lead to velocity fields that are not only slow to train but also costly to integrate at inference time. In theory, one would greatly benefit from training flow models by sampling pairs from an optimal transport (OT) measure coupling source and target, since this would lead to a highly efficient flow solving the Benamou and Brenier dynamical OT problem. In practice, recent works have proposed to sample mini-batches of $n$ source and $n$ target points and reorder them using an OT solver to form better pairs. These works have advocated using batches of size $n\approx 256$, and considered OT solvers that return couplings that are either sharp (using e.g. the Hungarian algorithm) or blurred (using e.g. entropic regularization, a.k.a. Sinkhorn). We follow in the footsteps of these works by exploring the benefits of increasing $n$ by three to four orders of magnitude, and look more carefully at the effect of the entropic regularization $\varepsilon$ used in the Sinkhorn algorithm. Our analysis is facilitated by new scale invariant quantities to report the sharpness of a coupling, while our sharded computations across multiple GPUs or GPU nodes allow scaling up $n$. We show that in both synthetic and image generation tasks, flow models greatly benefit when fitted with large Sinkhorn couplings, with a low entropic regularization $\varepsilon$.
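
The contrast between sharp and blurred couplings can be seen on a tiny example. The sketch below is a plain, unsharded Sinkhorn iteration with uniform marginals in pure Python; the function name, the 2×2 cost matrix, and the two values of ε are assumptions, and nothing here reflects the paper's large-scale implementation.

```python
import math


def sinkhorn(cost, eps, iters=200):
    """Entropic OT coupling between uniform marginals for a small cost matrix."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a = [1.0 / n] * n  # source marginal
    b = [1.0 / m] * m  # target marginal
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Each source point is cheap to match with the target of the same index.
cost = [[0.0, 1.0], [1.0, 0.0]]
sharp = sinkhorn(cost, eps=0.05)    # low regularization: near-permutation coupling
blurred = sinkhorn(cost, eps=10.0)  # high regularization: close to independent pairing
```

With small ε almost all mass sits on the diagonal, so sampled pairs nearly follow the optimal assignment; with large ε the coupling approaches the independent pairing that the abstract describes as slow to train.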


Article 1054

Title@2025-06-09 (1): Forests for Differences: Robust Causal Inference Beyond Parametric DiD

Title: Forests for Differences: Robust Causal Inference Beyond Parametric DiD Wälder für Unterschiede: Robuste Kausalableitung jenseits parametrischer DiD 差异的森林:超出参数二D的强力因果推断 2505.09706v2

Authors (2): Hugo Gobato Souto, Francisco Louzada Neto

This paper introduces the Difference-in-Differences Bayesian Causal Forest (DiD-BCF), a novel non-parametric model addressing key challenges in DiD estimation, such as staggered adoption and heterogeneous treatment effects. DiD-BCF provides a unified framework for estimating Average (ATE), Group-Average (GATE), and Conditional Average Treatment Effects (CATE). A core innovation, its Parallel Trends Assumption (PTA)-based reparameterization, enhances estimation accuracy and stability in complex panel data settings. Extensive simulations demonstrate DiD-BCF’s superior performance over established benchmarks, particularly under non-linearity, selection biases, and effect heterogeneity. Applied to U.S. minimum wage policy, the model uncovers significant conditional treatment effect heterogeneity related to county population, insights obscured by traditional methods. DiD-BCF offers a robust and versatile tool for more nuanced causal inference in modern DiD applications.


Article 1055

Title@2025-06-09 (1): Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models

Title: Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models Unlösbare Problemerkennung: Robuste Bewertung des Verständnisses für große multimodale Modelle 无法解决的问题探测:大型多式联运模型的强有力理解评估 2403.20331v4

Authors (10): Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li, Hai Li, Ziwei Liu, Kiyoharu Aizawa

This paper introduces a novel task to evaluate the robust understanding capability of Large Multimodal Models (LMMs), termed $\textbf{Unsolvable Problem Detection (UPD)}$. Multiple-choice question answering (MCQA) is widely used to assess the understanding capability of LMMs, but it does not guarantee that LMMs truly comprehend the answer. UPD assesses the LMM’s ability to withhold answers when encountering unsolvable problems of MCQA, verifying whether the model truly understands the answer. UPD encompasses three problems: Absent Answer Detection (AAD), Incompatible Answer Set Detection (IASD), and Incompatible Visual Question Detection (IVQD), covering unsolvable cases like answer-lacking or incompatible choices and image-question mismatches. For the evaluation, we introduce the MM-UPD Bench, a benchmark for assessing performance across various ability dimensions. Our experiments reveal that most LMMs, even those that demonstrate adequate performance on existing benchmarks, struggle significantly with MM-UPD, underscoring a novel aspect of trustworthiness that current benchmarks have overlooked. A detailed analysis shows that LMMs have different bottlenecks, and that chain-of-thought and self-reflection improve performance for LMMs whose bottleneck lies in their LLM capability. We hope our insights will enhance the broader understanding and development of more reliable LMMs. The code is available at https://github.com/AtsuMiyai/UPD.


Article 1056

Title@2025-06-09 (1): How Benchmark Prediction from Fewer Data Misses the Mark

Title: How Benchmark Prediction from Fewer Data Misses the Mark Wie Benchmark-Voraussagen von weniger Daten das Zeichen vermissen 从更少数据中的基准预测如何错过标记 2506.07673v1

Authors (3): Guanhua Zhang, Florian E. Dorner, Moritz Hardt

Large language model (LLM) evaluation is increasingly costly, prompting interest in methods that speed up evaluation by shrinking benchmark datasets. Benchmark prediction (also called efficient LLM evaluation) aims to select a small subset of evaluation points and predict overall benchmark performance from that subset. In this paper, we systematically assess the strengths and limitations of 11 benchmark prediction methods across 19 diverse benchmarks. First, we identify a highly competitive baseline: Take a random sample and fit a regression model on the sample to predict missing entries. Outperforming most existing methods, this baseline challenges the assumption that careful subset selection is necessary for benchmark prediction. Second, we discover that all existing methods crucially depend on model similarity. They work best when interpolating scores among similar models. The effectiveness of benchmark prediction sharply declines when new models have higher accuracy than previously seen models. In this setting of extrapolation, none of the previous methods consistently beat a simple average over random samples. To improve over the sample average, we introduce a new method inspired by augmented inverse propensity weighting. This method consistently outperforms the random sample average even for extrapolation. However, its performance still relies on model similarity and the gains are modest in general. This shows that benchmark prediction fails just when it is most needed: at the evaluation frontier, where the goal is to evaluate new models of unknown capabilities.
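
The random-sample baseline described above can be sketched in a few lines. The version below is a deliberately simplified stand-in, assuming a one-variable least-squares fit from each model's subset mean to its full-benchmark mean; the abstract does not specify the regression model, so the function name `predict_full_score`, the synthetic 0/1 scores, and the choice of 40 sampled items are all hypothetical.

```python
import random


def predict_full_score(source_scores, target_subset_scores, subset):
    """Predict a target model's full-benchmark mean from a random item subset.

    source_scores: per-model lists of per-item scores for fully evaluated models.
    target_subset_scores: the target model's scores on the items in `subset`.
    """
    xs = [sum(row[i] for i in subset) / len(subset) for row in source_scores]
    ys = [sum(row) / len(row) for row in source_scores]
    x_new = sum(target_subset_scores) / len(target_subset_scores)
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    if var == 0.0:
        return x_new  # degenerate fit: fall back to the plain random-sample average
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / var
    return y_bar + slope * (x_new - x_bar)


rng = random.Random(0)
n_items = 200
# Synthetic 0/1 correctness records for five hypothetical source models.
source = [[int(rng.random() < p) for _ in range(n_items)] for p in (0.3, 0.45, 0.6, 0.7, 0.8)]
target_full = [int(rng.random() < 0.65) for _ in range(n_items)]
subset = rng.sample(range(n_items), 40)
estimate = predict_full_score(source, [target_full[i] for i in subset], subset)
```

Because the fit is learned from previously evaluated models, a target model whose subset mean falls outside the range of the source models forces the regression to extrapolate, which is the regime where the abstract reports that benchmark prediction degrades.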


Article 1057

Title@2025-06-09 (1): Data-driven inventory management for new products: An adjusted Dyna-$Q$ approach with transfer learning

Title: Data-driven inventory management for new products: An adjusted Dyna-$Q$ approach with transfer learning Datengesteuertes Bestandsmanagement für neue Produkte: Ein angepasster Dyna-$Q$-Ansatz mit Transfer-Lernen 新产品数据驱动的库存管理新产品库存管理:经调整的转让学习Dyna-Q$办法 2501.08109v4

Authors (3): Xinye Qu, Longxiao Liu, Wenjie Huang

In this paper, we propose a novel reinforcement learning algorithm for inventory management of newly launched products with no historical demand information. The algorithm follows the classic Dyna-$Q$ structure, balancing the model-free and model-based approaches, while accelerating the training process of Dyna-$Q$ and mitigating the model discrepancy generated by the model-based feedback. Based on the idea of transfer learning, warm-start information from the demand data of existing similar products can be incorporated into the algorithm to further stabilize the early-stage training and reduce the variance of the estimated optimal policy. Our approach is validated through a case study of bakery inventory management with real data. The adjusted Dyna-$Q$ shows up to a 23.7% reduction in average daily cost compared with $Q$-learning, and up to a 77.5% reduction in training time within the same horizon compared with classic Dyna-$Q$. With transfer learning, the adjusted Dyna-$Q$ has the lowest total cost, the lowest variance in total cost, and relatively low shortage percentages among all the benchmarking algorithms over a 30-day test period.
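
For readers unfamiliar with the classic Dyna-$Q$ structure that the paper builds on, the sketch below shows the standard tabular version: each real transition triggers one model-free $Q$-learning update, is stored in a learned model, and is followed by several planning updates replayed from that model. This is the textbook algorithm, not the adjusted variant or the transfer-learning warm start proposed in the paper; the function `dyna_q`, the toy `chain_step` environment, and all hyperparameters are hypothetical.

```python
import random


def dyna_q(step, n_states, n_actions, start, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, eps=0.1, seed=0, max_steps=100):
    """Classic tabular Dyna-Q for a deterministic environment."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (state, action) -> (reward, next_state, done)

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            if rng.random() < eps:
                a = rng.randrange(n_actions)  # explore
            else:
                best = max(q[s])
                a = rng.choice([i for i, val in enumerate(q[s]) if val == best])
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)          # model-free update from real experience
            model[(s, a)] = (r, s2, done)      # remember the observed transition
            for _ in range(planning_steps):    # model-based updates from replayed transitions
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
            if done:
                break
    return q


def chain_step(state, action):
    """Toy 5-state chain: action 1 moves right, action 0 moves left; state 4 is the goal."""
    nxt = state + 1 if action == 1 else max(0, state - 1)
    done = nxt == 4
    return (1.0 if done else 0.0), nxt, done


q_table = dyna_q(chain_step, n_states=5, n_actions=2, start=0)
```

Setting `planning_steps=0` reduces the loop to plain $Q$-learning, which makes the trade-off between model-free and model-based updates easy to inspect on a toy problem.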


Article 1058

Title@2025-06-09 (1): ProARD: progressive adversarial robustness distillation: provide wide range of robust students

Title: ProARD: progressive adversarial robustness distillation: provide wide range of robust students ProARD: progressive konversarische Robustheitsdestillation: für eine breite Palette robuster Studenten ProARD: 进步对抗性强力蒸馏:提供各种强健学生 2506.07666v1

Authors (3): Seyedhamidreza Mousavi, Seyedali Mousavi, Masoud Daneshtalab

Adversarial Robustness Distillation (ARD) has emerged as an effective method to enhance the robustness of lightweight deep neural networks against adversarial attacks. Current ARD approaches have leveraged a large robust teacher network to train one robust lightweight student. However, due to the diverse range of edge devices and resource constraints, current approaches require training a new student network from scratch to meet specific constraints, leading to substantial computational costs and increased CO2 emissions. This paper proposes Progressive Adversarial Robustness Distillation (ProARD), enabling the efficient one-time training of a dynamic network that supports a diverse range of accurate and robust student networks without requiring retraining. We first make a dynamic deep neural network based on dynamic layers by encompassing variations in width, depth, and expansion in each design stage to support a wide range of architectures. Then, we consider the student network with the largest size as the dynamic teacher network. ProARD trains this dynamic network using a weight-sharing mechanism to jointly optimize the dynamic teacher network and its internal student networks. However, due to the high computational cost of calculating exact gradients for all the students within the dynamic network, a sampling mechanism is required to select a subset of students. We show that random student sampling in each iteration fails to produce accurate and robust students.


Article 1059

Title@2025-06-09 (1): The Universality Lens: Why Even Highly Over-Parametrized Models Learn Well

Title: The Universality Lens: Why Even Highly Over-Parametrized Models Learn Well Die Universalitätslinse: Warum selbst hoch überparametrisierte Modelle gut lernen 普世透镜:为什么甚至高超美化模型学习良好 2506.07661v1

Authors (3): Meir Feder, Ruediger Urbanke, Yaniv Fogel

A fundamental question in modern machine learning is why large, over-parameterized models, such as deep neural networks and transformers, tend to generalize well, even when their number of parameters far exceeds the number of training samples. We investigate this phenomenon through the lens of information theory, grounded in universal learning theory. Specifically, we study a Bayesian mixture learner with log-loss and (almost) uniform prior over an expansive hypothesis class. Our key result shows that the learner’s regret is not determined by the overall size of the hypothesis class, but rather by the cumulative probability of all models that are close, in Kullback-Leibler divergence distance, to the true data-generating process. We refer to this cumulative probability as the weight of the hypothesis. This leads to a natural notion of model simplicity: simple models are those with large weight and thus require fewer samples to generalize, while complex models have small weight and need more data. This perspective provides a rigorous and intuitive explanation for why over-parameterized models often avoid overfitting: the presence of simple hypotheses allows the posterior to concentrate on them when supported by the data. We further bridge theory and practice by recalling that stochastic gradient descent with Langevin dynamics samples from the correct posterior distribution, enabling our theoretical learner to be approximated using standard machine learning methods combined with ensemble learning. Our analysis yields non-uniform regret bounds and aligns with key practical concepts such as flat minima and model distillation. The results apply broadly across online, batch, and supervised learning settings, offering a unified and principled understanding of the generalization behavior of modern AI systems.


Article 1060

Title@2025-06-09 (1): Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks

Title: Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks Bewertung von Dialect Fairness und Robustheit von großen Sprachmodellen in mit Gründen versehenen Aufgaben 评估大语言模式在说明理由任务中的公平性和强度 2410.11005v3

Authors (10): Fangru Lin, Shaoguang Mao, Emanuele La Malfa, Valentin Hofmann, Adrian de Wynter, Xun Wang, Si-Qing Chen, Michael Wooldridge, Janet B. Pierrehumbert, Furu Wei

Language is not monolithic. While benchmarks, including those designed for multiple languages, are often used as proxies to evaluate the performance of Large Language Models (LLMs), they tend to overlook the nuances of within-language variation and thus fail to model the experience of speakers of non-standard dialects. Focusing on African American Vernacular English (AAVE), we present the first study aimed at objectively assessing the fairness and robustness of LLMs in handling dialects across canonical reasoning tasks, including algorithm, math, logic, and integrated reasoning. We introduce ReDial (Reasoning with Dialect Queries), a benchmark containing 1.2K+ parallel query pairs in Standardized English and AAVE. We hire AAVE speakers, including experts with computer science backgrounds, to rewrite seven popular benchmarks, such as HumanEval and GSM8K. With ReDial, we evaluate widely used LLMs, including GPT, Claude, Llama, Mistral, and the Phi model families. Our findings reveal that almost all of these widely used models show significant brittleness and unfairness to queries in AAVE. Our work establishes a systematic and objective framework for analyzing LLM bias in dialectal queries. Moreover, it highlights how mainstream LLMs provide unfair service to dialect speakers in reasoning tasks, laying a critical foundation for future research.


Article 1061

Title@2025-06-09 (1): Feature Statistics with Uncertainty Help Adversarial Robustness

Title: Feature Statistics with Uncertainty Help Adversarial Robustness Feature Statistik mit Unsicherheit Hilfe Widersprüchliche Robustheit 具有不确定性的特征统计 2503.20583v2

Authors (6): Ran Wang, Xinlei Zhou, Meng Hu, Rihao Li, Wenhui Wu, Yuheng Jia

Despite the remarkable success of deep neural networks (DNNs), the security threat of adversarial attacks poses a significant challenge to the reliability of DNNs. In this paper, both theoretically and empirically, we discover a universal phenomenon that has been neglected in previous works, i.e., adversarial attacks tend to shift the distributions of feature statistics. Motivated by this finding, and by leveraging the advantages of uncertainty-aware stochastic methods in building robust models efficiently, we propose an uncertainty-driven feature statistics adjustment module for robustness enhancement, named Feature Statistics with Uncertainty (FSU). It randomly resamples channel-wise feature means and standard deviations of examples from multivariate Gaussian distributions, which helps to reconstruct the perturbed examples and calibrate the shifted distributions. The calibration recovers some domain characteristics of the data for classification, thereby mitigating the influence of perturbations and weakening the ability of attacks to deceive models. The proposed FSU module has universal applicability in training, attacking, predicting, and fine-tuning, demonstrating impressive robustness enhancement ability at a trivial additional time cost. For example, by fine-tuning the well-established models with FSU, the state-of-the-art methods achieve up to 17.13% and 34.82% robustness improvement against powerful AA and CW attacks on benchmark datasets.


Article 1062

Title@2025-06-09 (1): Trajectory World Models for Heterogeneous Environments

Title: Trajectory World Models for Heterogeneous Environments Trajektoriale Weltmodelle für heterogene Umgebungen 不同不同环境世界模型 2502.01366v2

Authors (7): Shaofeng Yin, Jialong Wu, Siqiao Huang, Xingjian Su, Xu He, Jianye Hao, Mingsheng Long

Heterogeneity in sensors and actuators across environments poses a significant challenge to building large-scale pre-trained world models on top of this low-dimensional sensor information. In this work, we explore pre-training world models for heterogeneous environments by addressing key transfer barriers in both data diversity and model flexibility. We introduce UniTraj, a unified dataset comprising over one million trajectories from 80 environments, designed to scale data while preserving critical diversity. Additionally, we propose TrajWorld, a novel architecture capable of flexibly handling varying sensor and actuator information and capturing environment dynamics in-context. Pre-training TrajWorld on UniTraj yields substantial gains in transition prediction, achieves a new state-of-the-art for off-policy evaluation, and also delivers superior online performance of model predictive control. To the best of our knowledge, this work, for the first time, demonstrates the transfer benefits of world models across heterogeneous and complex control environments. Code and data are available at https://github.com/thuml/TrajWorld.


Article 1063

Title@2025-06-09 (1): Training Articulatory Inversion Models for Interspeaker Consistency

Title: Training Articulatory Inversion Models for Interspeaker Consistency Training Artikulatorische Inversionsmodelle für die Konsistenz von Interspeakern 跨口间连通性人工转换模型培训 2505.20529v3

Authors (3): Charles McGhee, Mark J. F. Gales, Kate M. Knill

Acoustic-to-Articulatory Inversion (AAI) attempts to model the inverse mapping from speech to articulation. Exact articulatory prediction from speech alone may be impossible, as speakers can choose different forms of articulation seemingly without reference to their vocal tract structure. However, once a speaker has selected an articulatory form, their productions vary minimally. Recent works in AAI have proposed adapting Self-Supervised Learning (SSL) models to single-speaker datasets, claiming that these single-speaker models provide a universal articulatory template. In this paper, we investigate whether SSL-adapted models trained on single and multi-speaker data produce articulatory targets which are consistent across speaker identities for English and Russian. We do this through the use of a novel evaluation method which extracts articulatory targets using minimal pair sets. We also present a training method which can improve interspeaker consistency using only speech data.


Article 1064

Title@2025-06-09 (1): HieraEdgeNet: A Multi-Scale Edge-Enhanced Framework for Automated Pollen Recognition

Title: HieraEdgeNet: A Multi-Scale Edge-Enhanced Framework for Automated Pollen Recognition HieraEdgeNet: Multi-Scale Edge-Enhanced Framework für automatisierte Pollenerkennung HieraEdgeNet:一个多层边缘强化自动承认花粉框架 2506.07637v1

Authors (6): Yuchong Long, Wen Sun, Ningxiao Sun, Wenxiao Wang, Chao Li, Shan Yin

Automated pollen recognition is vital to paleoclimatology, biodiversity monitoring, and public health, yet conventional methods are hampered by inefficiency and subjectivity. Existing deep learning models often struggle to achieve the requisite localization accuracy for microscopic targets like pollen, which are characterized by their minute size, indistinct edges, and complex backgrounds. To overcome this limitation, we introduce HieraEdgeNet, a multi-scale edge-enhancement framework. The framework’s core innovation is the introduction of three synergistic modules: the Hierarchical Edge Module (HEM), which explicitly extracts a multi-scale pyramid of edge features that corresponds to the semantic hierarchy at early network stages; the Synergistic Edge Fusion (SEF) module, for deeply fusing these edge priors with semantic information at each respective scale; and the Cross Stage Partial Omni-Kernel Module (CSPOKM), which maximally refines the most detail-rich feature layers using an Omni-Kernel operator - comprising anisotropic large-kernel convolutions and mixed-domain attention - all within a computationally efficient Cross-Stage Partial (CSP) framework. On a large-scale dataset comprising 120 pollen classes, HieraEdgeNet achieves a mean Average Precision (mAP@.5) of 0.9501, significantly outperforming state-of-the-art baseline models such as YOLOv12n and RT-DETR. Furthermore, qualitative analysis confirms that our approach generates feature representations that are more precisely focused on object boundaries. By systematically integrating edge information, HieraEdgeNet provides a robust and powerful solution for high-precision, high-efficiency automated detection of microscopic objects.


Article 1065

Title@2025-06-09 (1): RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors

Title: RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors RAID: Ein Datensatz für die Prüfung der adversarialen Robustheit von KI-generierten Bilddetektoren 用于测试AI-光学图像探测器的反向强力的数据集 2506.03988v3

Authors (11): Hicham Eddoubi, Jonas Ricker, Federico Cocchi, Lorenzo Baraldi, Angelo Sotgiu, Maura Pintor, Marcella Cornia, Lorenzo Baraldi, Asja Fischer, Rita Cucchiara, Battista Biggio

AI-generated images have reached a quality level at which humans are incapable of reliably distinguishing them from real images. To counteract the inherent risk of fraud and disinformation, the detection of AI-generated images is a pressing challenge and an active research topic. While many of the presented methods claim to achieve high detection accuracy, they are usually evaluated under idealized conditions. In particular, the adversarial robustness is often neglected, potentially due to a lack of awareness or the substantial effort required to conduct a comprehensive robustness analysis. In this work, we tackle this problem by providing a simpler means to assess the robustness of AI-generated image detectors. We present RAID (Robust evaluation of AI-generated image Detectors), a dataset of 72k diverse and highly transferable adversarial examples. The dataset is created by running attacks against an ensemble of seven state-of-the-art detectors and images generated by four different text-to-image models. Extensive experiments show that our methodology generates adversarial images that transfer with a high success rate to unseen detectors, which can be used to quickly provide an approximate yet still reliable estimate of a detector’s adversarial robustness. Our findings indicate that current state-of-the-art AI-generated image detectors can be easily deceived by adversarial examples, highlighting the critical need for the development of more robust methods. We release our dataset at https://huggingface.co/datasets/aimagelab/RAID and evaluation code at https://github.com/pralab/RAID.


Article 1066

Title@2025-06-09 (1): Can Perplexity Predict Fine-tuning Performance? An Investigation of Tokenization Effects on Sequential Language Models for Nepali

Title: Can Perplexity Predict Fine-tuning Performance? An Investigation of Tokenization Effects on Sequential Language Models for Nepali Kann Perplexity Fine-Tuning Performance voraussagen? Eine Untersuchung von Tokenization-Effekten auf sequentielle Sprachmodelle für Nepali 重复性预测微调性能能否? 对尼泊尔的序列语言模式的收缩性影响调查i 2404.18071v2

Authors (4): Nishant Luitel, Nirajan Bekoju, Anand Kumar Sah, Subarna Shakya

The impact of subword tokenization on language model performance is well-documented for perplexity, with finer granularity consistently reducing this intrinsic metric. However, research on how different tokenization schemes affect a model’s understanding capabilities remains limited, particularly for non-Latin script languages. Addressing this gap, we conducted a comprehensive evaluation of six distinct tokenization strategies by pretraining transformer-based language models for Nepali and evaluating their performance across multiple downstream tasks. While recent prominent models like GPT, RoBERTa, Claude, LLaMA, Mistral, Falcon, and MPT have adopted byte-level BPE tokenization, our findings demonstrate that for Nepali, SentencePiece tokenization consistently yields superior results on understanding-based tasks. Unlike previous studies that primarily focused on BERT-based architectures, our research specifically examines sequential transformer models, providing valuable insights for language model development in low-resource languages and highlighting the importance of tokenization strategy beyond perplexity reduction.


Article 1067

Title@2025-06-09 (1): AI-based Framework for Robust Model-Based Connector Mating in Robotic Wire Harness Installation

Title: AI-based Framework for Robust Model-Based Connector Mating in Robotic Wire Harness Installation KI-basiertes Framework für robuste modellbasierte Steckverbinder-Matierung in der Roboterdraht-Harness-Installation 以AI为基础的机器人电线装置中强有力的模型连接器配对框架 2503.09409v2

Authors (6): Claudius Kienle, Benjamin Alt, Finn Schneider, Tobias Pertlwieser, Rainer Jäkel, Rania Rayyes

Despite the widespread adoption of industrial robots in automotive assembly, wire harness installation remains a largely manual process, as it requires precise and flexible manipulation. To address this challenge, we design a novel AI-based framework that automates cable connector mating by integrating force control with deep visuotactile learning. Our system optimizes search-and-insertion strategies using first-order optimization over a multimodal transformer architecture trained on visual, tactile, and proprioceptive data. Additionally, we design a novel automated data collection and optimization pipeline that minimizes the need for machine learning expertise. The framework optimizes robot programs that run natively on standard industrial controllers, permitting human experts to audit and certify them. Experimental validations on a center console assembly task demonstrate significant improvements in cycle times and robustness compared to conventional robot programming approaches. Videos are available under https://claudius-kienle.github.io/AppMuTT.


Article 1068

Title@2025-06-09 (1): Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range Tasks

Title: Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range Tasks Rückkehr von ChebNet: Verständnis und Verbesserung eines überblickten GNN bei langfristigen Aufgaben ChebNet的返回:了解和改进对长期任务GNN的忽视 2506.07624v1

Authors (9): Ali Hariri, Álvaro Arroyo, Alessio Gravina, Moshe Eliasof, Carola-Bibiane Schönlieb, Davide Bacciu, Kamyar Azizzadenesheli, Xiaowen Dong, Pierre Vandergheynst

ChebNet, one of the earliest spectral GNNs, has largely been overshadowed by Message Passing Neural Networks (MPNNs), which gained popularity for their simplicity and effectiveness in capturing local graph structure. Despite their success, MPNNs are limited in their ability to capture long-range dependencies between nodes. This has led researchers to adapt MPNNs through rewiring or to make use of Graph Transformers (GTs), which compromises the computational efficiency that characterized early spatial message-passing architectures and typically disregards the graph structure. Almost a decade after its original introduction, we revisit ChebNet to shed light on its ability to model distant node interactions. We find that, out of the box, ChebNet already shows competitive advantages relative to classical MPNNs and GTs on long-range benchmarks, while maintaining good scalability properties for high-order polynomials. However, we uncover that this polynomial expansion leads ChebNet to an unstable regime during training. To address this limitation, we cast ChebNet as a stable and non-dissipative dynamical system, which we call Stable-ChebNet. Our Stable-ChebNet model allows for stable information propagation, and has controllable dynamics which do not require the use of eigendecompositions, positional encodings, or graph rewiring. Across several benchmarks, Stable-ChebNet achieves near state-of-the-art performance.
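
As background for the polynomial expansion mentioned above, the sketch below applies a classic ChebNet-style filter $\sum_k \theta_k T_k(\tilde{L})\,x$ using the three-term Chebyshev recurrence $T_k = 2\tilde{L}\,T_{k-1} - T_{k-2}$, so no eigendecomposition is needed and a degree-$K$ filter reaches $K$-hop neighbours. It illustrates the original filter only, not the Stable-ChebNet dynamics; the helper names, the 3-node path graph, and the coefficients `theta` are hypothetical, and the scaled Laplacian assumes $\lambda_{\max} = 2$ (which holds exactly for this bipartite example).

```python
import math


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def chebyshev_filter(scaled_laplacian, x, theta):
    """Compute sum_k theta[k] * T_k(L) x with the three-term Chebyshev recurrence."""
    t_prev = list(x)                                   # T_0(L) x = x
    out = [theta[0] * v for v in t_prev]
    if len(theta) == 1:
        return out
    t_curr = matvec(scaled_laplacian, x)               # T_1(L) x = L x
    out = [o + theta[1] * v for o, v in zip(out, t_curr)]
    for k in range(2, len(theta)):
        lt = matvec(scaled_laplacian, t_curr)
        t_next = [2.0 * a - b for a, b in zip(lt, t_prev)]  # T_k = 2 L T_{k-1} - T_{k-2}
        out = [o + theta[k] * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out


# Path graph 0 - 1 - 2. With lambda_max = 2, the scaled Laplacian is
# L_sym - I = -D^{-1/2} A D^{-1/2}.
w = 1.0 / math.sqrt(2.0)
L_scaled = [[0.0, -w, 0.0],
            [-w, 0.0, -w],
            [0.0, -w, 0.0]]
signal = [1.0, 0.0, 0.0]          # a unit impulse on node 0
theta = [0.5, 0.3, 0.2]           # degree-2 filter: information reaches 2-hop neighbours
filtered = chebyshev_filter(L_scaled, signal, theta)
```

Each additional polynomial order costs one sparse matrix-vector product in a real implementation, which is why high-order filters stay scalable.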


Article 1069

Title@2025-06-09 (1): LoRMA: Low-Rank Multiplicative Adaptation for LLMs

Title: LoRMA: Low-Rank Multiplicative Adaptation for LLMs LoRMA: Multiplikative Anpassung mit niedrigem Rank für LLMs LoRMA:LLMML的低Rank多复制性适应 2506.07621v1

Authors (3): Harsh Bihany, Shubham Patel, Ashutosh Modi

Large Language Models have shown remarkable capabilities in the NLP domain. Their effectiveness can mainly be attributed to their ability to adapt to an array of downstream tasks. However, full fine-tuning is generally computationally expensive. To mitigate this, many techniques have been developed that prioritize efficiency, a prominent one being Low-Rank Adaptation (LoRA). However, LoRA and its variants employ re-parametrized additive updates. In this paper, we propose Low-Rank Multiplicative Adaptation (LoRMA), which shifts the paradigm of additive updates to a richer space of matrix multiplicative transformations. We tackle challenges such as computational complexity and rank bottleneck of matrix multiplication by effectively re-ordering operations and introducing rank inflation strategies. We conduct extensive experiments to demonstrate the effectiveness of our approach in terms of various evaluation metrics.
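
The contrast between additive and multiplicative low-rank updates, and the benefit of re-ordering operations, can be illustrated with a small pure-Python sketch. It assumes the multiplicative update takes the form $(BA)W$ with $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times d}$; the abstract does not give the exact parameterization, so this form, the helper names, and the random toy matrices are hypothetical. The paper's rank inflation strategies are not reproduced here.

```python
import random


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, r = 6, 2
W = rand_matrix(rng, d, d)        # frozen pretrained weight
B = rand_matrix(rng, d, r)        # trainable low-rank factors
A = rand_matrix(rng, r, d)
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]

# Additive (LoRA-style) update: h = (W + B A) x.
h_add = [wx + bax for wx, bax in zip(matvec(W, x), matvec(B, matvec(A, x)))]

# Multiplicative update, naive order: materialize the d x d product (B A) W first.
h_naive = matvec(matmul(matmul(B, A), W), x)

# Same result, re-ordered: apply W, then A, then B; no d x d product is formed.
h_reordered = matvec(B, matvec(A, matvec(W, x)))
```

Since $BA$ has rank at most $r$, the product $(BA)W$ also has rank at most $r$, which is the rank bottleneck that any rank inflation scheme would need to lift.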


Article 1070

Title@2025-06-09 (1): Retrieval-Augmented Generation as Noisy In-Context Learning: A Unified Theory and Risk Bounds

Title: Retrieval-Augmented Generation as Noisy In-Context Learning: A Unified Theory and Risk Bounds Retrieval-Augmented Generation as Noisy In-Context Learning: Eine einheitliche Theorie und Risikogrenzen 作为有噪音的文内学习:统一理论和风险波 2506.03100v3

Authors (5): Yang Guo, Yutian Tao, Yifei Ming, Robert D. Nowak, Yingyu Liang

Retrieval-augmented generation (RAG) has seen many empirical successes in recent years by aiding the LLM with external knowledge. However, its theoretical aspect has remained mostly unexplored. In this paper, we propose the first finite-sample generalization bound for RAG in in-context linear regression and derive an exact bias-variance tradeoff. Our framework views the retrieved texts as query-dependent noisy in-context examples and recovers classical in-context learning (ICL) and standard RAG as limiting cases. Our analysis suggests that RAG, unlike ICL, has an intrinsic ceiling on generalization error. Furthermore, our framework is able to model retrieval both from the training data and from external corpora by introducing uniform and non-uniform RAG noise. In line with our theory, we show the sample efficiency of ICL and RAG empirically with experiments on common QA benchmarks, such as Natural Questions and TriviaQA.


Article 1071

Title@2025-06-09 (1): The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning

Title: The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning Der Catechol-Benchmark: Zeitreihen-Lösungsauswahldaten für wenige Aufnahmen maschinelles Lernen Catechol基准:几发机器学习的时间序列溶剂选择数据 2506.07619v1

Authors (14): Toby Boyne, Juan S. Campos, Becky D. Langdon, Jixiang Qing, Yilin Xie, Shiqiang Zhang, Calvin Tsay, Ruth Misener, Daniel W. Davies, Kim E. Jelfs, Sarah Boyall, Thomas M. Dixon, Linden Schrecker, Jose Pablo Folch

Machine learning has promised to change the landscape of laboratory chemistry, with impressive results in molecular property prediction and reaction retro-synthesis. However, chemical datasets are often inaccessible to the machine learning community as they tend to require cleaning or a thorough understanding of the chemistry, or are simply not available. In this paper, we introduce a novel dataset for yield prediction, providing the first-ever transient flow dataset for machine learning benchmarking, covering over 1200 process conditions. While previous datasets focus on discrete parameters, our experimental set-up allows us to sample a large number of continuous process conditions, generating new challenges for machine learning models. We focus on solvent selection, a task that is particularly difficult to model theoretically and therefore ripe for machine learning applications. We showcase benchmarking for regression algorithms, transfer-learning approaches, feature engineering, and active learning, with important applications towards solvent replacement and sustainable manufacturing.


Article 1072

Title@2025-06-09 (1): FuXi-Air: Urban Air Quality Forecasting Based on Emission-Meteorology-Pollutant multimodal Machine Learning

Title: FuXi-Air: Urban Air Quality Forecasting Based on Emission-Meteorology-Pollutant multimodal Machine Learning FuXi-Air: Vorhersage der städtischen Luftqualität auf der Grundlage von Emissions-Meteorologie-Pollutan multimodalen maschinellen Lernens FUXi-Air:基于排放-气象-污染-多式联运机学习的城市空气质量预测 2506.07616v1

Authors (12): Zhixin Geng, Xu Fan, Xiqiao Lu, Yan Zhang, Guangyuan Yu, Cheng Huang, Qian Wang, Yuewu Li, Weichun Ma, Qi Yu, Libo Wu, Hao Li

Air pollution has emerged as a major public health challenge in megacities. Numerical simulations and single-site machine learning approaches have been widely applied in air quality forecasting tasks. However, these methods face multiple limitations, including high computational costs, low operational efficiency, and limited integration with observational data. With the rapid advancement of artificial intelligence, there is an urgent need to develop a low-cost, efficient air quality forecasting model for smart urban management. An air quality forecasting model, named FuXi-Air, has been constructed in this study based on multimodal data fusion to support high-precision air quality forecasting and operated in typical megacities. The model integrates meteorological forecasts, emission inventories, and pollutant monitoring data under the guidance of air pollution mechanisms. By combining an autoregressive prediction framework with a frame interpolation strategy, the model completes 72-hour forecasts for six major air pollutants at an hourly resolution across multiple monitoring sites within 25-30 seconds. In terms of both computational efficiency and forecasting accuracy, it outperforms the mainstream numerical air quality models in operational forecasting work. Ablation experiments concerning key influencing factors show that although meteorological data contribute more to model accuracy than emission inventories do, the integration of multimodal data significantly improves forecasting precision and ensures that reliable predictions are obtained under differing pollution mechanisms across megacities. This study provides both a technical reference and a practical example for applying multimodal data-driven models to air quality forecasting and offers new insights into building hybrid forecasting systems to support air pollution risk warning in smart city management.


Article 1073

Title@2025-06-09 (1): Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds

Title: Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds Poisson Midpoint-Methode für Log Concave Sampling: Jenseits der starken Fehler unteren Bounds 日志集中取样的 Poisson 中点方法: 超越强误差, 下界 2506.07614v1

Authors (2): Rishikesh Srinivasan, Dheeraj Nagaraj

We study the problem of sampling from strongly log-concave distributions over $\mathbb{R}^d$ using the Poisson midpoint discretization (a variant of the randomized midpoint method) for overdamped/underdamped Langevin dynamics. We prove its convergence in the 2-Wasserstein distance ($W_2$), achieving a cubic speedup in dependence on the target accuracy ($\epsilon$) over the Euler-Maruyama discretization, surpassing existing bounds for randomized midpoint methods. Notably, in the case of underdamped Langevin dynamics, we demonstrate the complexity of $W_2$ convergence is much smaller than the complexity lower bounds for convergence in $L^2$ strong error established in the literature.
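
For orientation, the Euler-Maruyama baseline that the abstract compares against can be sketched as follows. This is a minimal illustration for overdamped Langevin dynamics on a one-dimensional standard Gaussian target, not the Poisson midpoint method itself; the function names, step size, and chain counts are assumptions chosen for the example.

```python
import math
import random


def ula_chain(grad_f, x0, step, num_steps, rng):
    """Euler-Maruyama discretization of overdamped Langevin dynamics (1-D).

    Update rule: x <- x - step * grad_f(x) + sqrt(2 * step) * N(0, 1).
    """
    x = x0
    for _ in range(num_steps):
        x = x - step * grad_f(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return x


def sample_standard_gaussian(num_chains=2000, step=0.1, num_steps=200, seed=0):
    """Run independent chains targeting exp(-x^2 / 2), whose potential has gradient x."""
    rng = random.Random(seed)
    return [ula_chain(lambda x: x, 0.0, step, num_steps, rng)
            for _ in range(num_chains)]


samples = sample_standard_gaussian()
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f}, var={var:.3f}")
```

For this particular target the chain's stationary variance is $1/(1 - h/2)$ for step size $h$, rather than 1, which is the kind of discretization bias that midpoint-style schemes aim to reduce.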


Article 1074

Title@2025-06-09 (1): APE: Selective Fine-tuning with Acceptance Criteria for Language Model Adaptation

Title: APE: Selective Fine-tuning with Acceptance Criteria for Language Model Adaptation APE: Selektive Feinabstimmung mit Annahmekriterien für Sprachmodellanpassung APE:有选择地调整语言模式适应接受标准 2505.19912v2

Authors (1): Javier Marín

We present Adjacent Possible Exploration (APE), a selective fine-tuning method for adapting large language models that systematically explores parameter modifications while maintaining model stability. Inspired by evolutionary optimization principles, APE evaluates multiple candidate parameter updates through fine-tuning on small data subsets and accepts only those exceeding a performance threshold. Unlike standard fine-tuning that follows single gradient directions, APE implements a filtered selection process that prevents destabilizing parameter changes while enabling systematic improvement. Our method achieves 33.9\% BLEU improvement and 36.2\% perplexity reduction on news summarization tasks while using minimal computational resources. The approach provides a practical framework for controlled model adaptation that balances performance gains with representational stability.
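
The accept-or-reject loop described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the APE implementation: a one-parameter linear model stands in for the language model, mean squared error on a held-out set stands in for the performance metric, and all names (`fine_tune`, `ape_like_adaptation`, `threshold`) are hypothetical.

```python
import random


def mse(w, data):
    """Mean squared error of the 1-D linear model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def fine_tune(w, subset, lr=0.05, steps=5):
    """A few gradient steps on a small data subset; returns a candidate weight."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in subset) / len(subset)
        w -= lr * grad
    return w


def ape_like_adaptation(w, train, val, rounds=20, subset_size=8,
                        threshold=1e-4, seed=0):
    """Keep a candidate only if it lowers validation loss by more than threshold."""
    rng = random.Random(seed)
    best_loss = mse(w, val)
    accepted = 0
    for _ in range(rounds):
        subset = rng.sample(train, subset_size)
        candidate = fine_tune(w, subset)
        candidate_loss = mse(candidate, val)
        if best_loss - candidate_loss > threshold:
            w, best_loss = candidate, candidate_loss
            accepted += 1
    return w, best_loss, accepted


def make_data(n, rng, true_w=3.0, noise=0.1):
    """Synthetic (x, y) pairs from y = true_w * x + Gaussian noise."""
    return [(x, true_w * x + rng.gauss(0.0, noise))
            for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]


data_rng = random.Random(1)
train, val = make_data(64, data_rng), make_data(32, data_rng)
w, loss, accepted = ape_like_adaptation(0.0, train, val)
print(f"w={w:.3f}, val_loss={loss:.4f}, accepted={accepted}")
```

Because rejected candidates are discarded, the validation loss in this sketch can never increase from one round to the next, which is the stability property the acceptance criterion is meant to provide.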


Article 1075

Title: Link Prediction with Relational Hypergraphs Link Vorhersage mit relationalen Hypergraphen 将预测与关系电报联系起来 2402.04062v3

Authors (5): Xingyue Huang, Miguel Romero Orth, Pablo Barceló, Michael M. Bronstein, İsmail İlkan Ceylan

Link prediction with knowledge graphs has been thoroughly studied in graph machine learning, leading to a rich landscape of graph neural network architectures with successful applications. Nonetheless, it remains challenging to transfer the success of these architectures to relational hypergraphs, where the task of link prediction is over $k$-ary relations, which is substantially harder than link prediction with knowledge graphs. In this paper, we propose a framework for link prediction with relational hypergraphs, unlocking applications of graph neural networks to fully relational structures. Theoretically, we conduct a thorough analysis of the expressive power of the resulting model architectures via corresponding relational Weisfeiler-Leman algorithms and also via logical expressiveness. Empirically, we validate the power of the proposed model architectures on various relational hypergraph benchmarks. The resulting model architectures substantially outperform every baseline for inductive link prediction, and lead to state-of-the-art results for transductive link prediction.


Article 1076

Title@2025-06-09 (1): SWAG: Long-term Surgical Workflow Prediction with Generative-based Anticipation

Title: SWAG: Long-term Surgical Workflow Prediction with Generative-based Anticipation SWAG: Langzeit-Surgical Workflow Prediction mit generativer Antizipation SWAG: 长期外科工作流动预测,以产生为基础预测 2412.18849v3

Authors (5): Maxence Boels, Yang Liu, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin

While existing approaches excel at recognising current surgical phases, they provide limited foresight and intraoperative guidance into future procedural steps. Similarly, current anticipation methods are constrained to predicting short-term and single events, neglecting the dense, repetitive, and long sequential nature of surgical workflows. To address these needs and limitations, we propose SWAG (Surgical Workflow Anticipative Generation), a framework that combines phase recognition and anticipation using a generative approach. This paper investigates two distinct decoding methods - single-pass (SP) and auto-regressive (AR) - to generate sequences of future surgical phases at minute intervals over long horizons. We propose a novel embedding approach using class transition probabilities to enhance the accuracy of phase anticipation. Additionally, we propose a generative framework using remaining time regression to classification (R2C). SWAG was evaluated on two publicly available datasets, Cholec80 and AutoLaparo21. Our single-pass model with class transition probability embeddings (SP*) achieves 32.1% and 41.3% F1 scores over 20 and 30 minutes on Cholec80 and AutoLaparo21, respectively. Moreover, our approach competes with existing methods on phase remaining time regression, achieving weighted mean absolute errors of 0.32 and 0.48 minutes for 2- and 3-minute horizons. SWAG demonstrates versatility across generative decoding frameworks and classification and regression tasks to create temporal continuity between surgical workflow recognition and anticipation. Our method provides steps towards intraoperative surgical workflow generation for anticipation. Project: https://maxboels.github.io/swag.
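
Class transition probabilities of the kind mentioned above can be estimated from annotated phase sequences by counting consecutive label pairs. The sketch below assumes integer phase labels and shows only that estimation step; how SWAG turns such probabilities into embeddings is not specified here, and the function name is hypothetical.

```python
def transition_probabilities(sequences, num_classes):
    """Row-normalized counts of transitions between consecutive phase labels."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # Classes that never appear as a source keep an all-zero row.
        probs.append([c / total if total else 0.0 for c in row])
    return probs


# Two toy procedures with three phases, labelled 0, 1 and 2.
phases = [[0, 0, 1, 1, 2], [0, 1, 2, 2]]
for row in transition_probabilities(phases, num_classes=3):
    print(row)
```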


Article 1077

Title@2025-06-09 (1): Tree-Sliced Wasserstein Distance with Nonlinear Projection

Title: Tree-Sliced Wasserstein Distance with Nonlinear Projection Baumgeschliffene Wasserstein-Distanz mit nichtlinearer Projektion 用非线性投影绘制的树切瓦西尔斯坦距离 2505.00968v2

Authors (7): Thanh Tran, Viet-Hoang Tran, Thanh Chu, Trang Pham, Laurent El Ghaoui, Tam Le, Tan M. Nguyen

Tree-Sliced methods have recently emerged as an alternative to the traditional Sliced Wasserstein (SW) distance, replacing one-dimensional lines with tree-based metric spaces and incorporating a splitting mechanism for projecting measures. This approach enhances the ability to capture the topological structures of integration domains in Sliced Optimal Transport while maintaining low computational costs. Building on this foundation, we propose a novel nonlinear projectional framework for the Tree-Sliced Wasserstein (TSW) distance, substituting the linear projections in earlier versions with general projections, while ensuring the injectivity of the associated Radon Transform and preserving the well-definedness of the resulting metric. By designing appropriate projections, we construct efficient metrics for measures on both Euclidean spaces and spheres. Finally, we validate our proposed metric through extensive numerical experiments for Euclidean and spherical datasets. Applications include gradient flows, self-supervised learning, and generative models, where our methods demonstrate significant improvements over recent SW and TSW variants.


Article 1078

Title@2025-06-09 (1): Learning from Double Positive and Unlabeled Data for Potential-Customer Identification

Title: Learning from Double Positive and Unlabeled Data for Potential-Customer Identification Lernen aus doppelt positiven und unmarkierten Daten für die potenzielle Kundenidentifikation 学习用于潜在海关识别的双正和无标签数据 2506.00436v2

Authors (5): Masahiro Kato, Yuki Ikeda, Kentaro Baba, Takashi Imai, Ryo Inokuchi

In this study, we propose a method for identifying potential customers in targeted marketing by applying learning from positive and unlabeled data (PU learning). We consider a scenario in which a company sells a product and can observe only the customers who purchased it. Decision-makers seek to market products effectively based on whether people have loyalty to the company. Individuals with loyalty are those who are likely to remain interested in the company even without additional advertising. Consequently, those loyal customers would likely purchase from the company if they are interested in the product. In contrast, people with lower loyalty may overlook the product or buy similar products from other companies unless they receive marketing attention. Therefore, by focusing marketing efforts on individuals who are interested in the product but do not have strong loyalty, we can achieve more efficient marketing. To achieve this goal, we consider how to learn, from limited data, a classifier that identifies potential customers who (i) have interest in the product and (ii) do not have loyalty to the company. Although our algorithm comprises a single-stage optimization, its objective function implicitly contains two losses derived from standard PU learning settings. For this reason, we refer to our approach as double PU learning. We verify the validity of the proposed algorithm through numerical experiments, confirming that it functions appropriately for the problem at hand.


Article 1079

Title@2025-06-09 (1): How Expressive are Knowledge Graph Foundation Models?

Title: How Expressive are Knowledge Graph Foundation Models? Wie ausdrucksstark sind Wissensgraphen-Stiftungsmodelle? 知识图基础模型的表达性如何? 2502.13339v2

Authors (7): Xingyue Huang, Pablo Barceló, Michael M. Bronstein, İsmail İlkan Ceylan, Mikhail Galkin, Juan L Reutter, Miguel Romero Orth

Knowledge Graph Foundation Models (KGFMs) are at the frontier for deep learning on knowledge graphs (KGs), as they can generalize to completely novel knowledge graphs with different relational vocabularies. Despite their empirical success, our theoretical understanding of KGFMs remains very limited. In this paper, we conduct a rigorous study of the expressive power of KGFMs. Specifically, we show that the expressive power of KGFMs directly depends on the motifs that are used to learn the relation representations. We then observe that the most typical motifs used in the existing literature are binary, as the representations are learned based on how pairs of relations interact, which limits the model’s expressiveness. As part of our study, we design more expressive KGFMs using richer motifs, which necessitate learning relation representations based on, e.g., how triples of relations interact with each other. Finally, we empirically validate our theoretical findings, showing that the use of richer motifs results in better performance on a wide range of datasets drawn from different domains.


Article 1080

Title@2025-06-09 (1): Revisiting Non-Acyclic GFlowNets in Discrete Environments

Title: Revisiting Non-Acyclic GFlowNets in Discrete Environments Nicht-azyklische GFlowNets in diskreten Umgebungen erneut besuchen 重新审视环境混乱中的非周期性GFlowNets 2502.07735v2

Authors (4): Nikita Morozov, Ian Maksimov, Daniil Tiapkin, Sergey Samsonov

Generative Flow Networks (GFlowNets) are a family of generative models that learn to sample objects from a given probability distribution, potentially known up to a normalizing constant. Instead of working in the object space, GFlowNets proceed by sampling trajectories in an appropriately constructed directed acyclic graph environment, greatly relying on the acyclicity of the graph. In our paper, we revisit the theory that relaxes the acyclicity assumption and present a simpler theoretical framework for non-acyclic GFlowNets in discrete environments. Moreover, we provide various novel theoretical insights related to training with fixed backward policies, the nature of flow functions, and connections between entropy-regularized RL and non-acyclic GFlowNets, which naturally generalize the respective concepts and theoretical results from the acyclic setting. In addition, we experimentally re-examine the concept of loss stability in non-acyclic GFlowNet training, as well as validate our own theoretical findings.


Article 1081

Title@2025-06-09 (1): TwinBreak: Jailbreaking LLM Security Alignments based on Twin Prompts

Title: TwinBreak: Jailbreaking LLM Security Alignments based on Twin Prompts TwinBreak: Jailbreaking LLM Security Alignments basierend auf Twin Prompts 双突破:基于双提示的监狱破碎LLM安保调整 2506.07596v1

Authors (3): Torsten Krauß, Hamid Dashtbani, Alexandra Dmitrienko

Machine learning is advancing rapidly, with applications bringing notable benefits, such as improvements in translation and code generation. Models like ChatGPT, powered by Large Language Models (LLMs), are increasingly integrated into daily life. However, alongside these benefits, LLMs also introduce social risks. Malicious users can exploit LLMs by submitting harmful prompts, such as requesting instructions for illegal activities. To mitigate this, models often include a security mechanism that automatically rejects such harmful prompts. However, they can be bypassed through LLM jailbreaks. Current jailbreaks often require significant manual effort, high computational costs, or result in excessive model modifications that may degrade regular utility. We introduce TwinBreak, an innovative safety alignment removal method. Building on the idea that the safety mechanism operates like an embedded backdoor, TwinBreak identifies and prunes parameters responsible for this functionality. By focusing on the most relevant model layers, TwinBreak performs fine-grained analysis of parameters essential to model utility and safety. TwinBreak is the first method to analyze intermediate outputs from prompts with high structural and content similarity to isolate safety parameters. We present the TwinPrompt dataset containing 100 such twin prompts. Experiments confirm TwinBreak’s effectiveness, achieving 89% to 98% success rates with minimal computational requirements across 16 LLMs from five vendors.


Article 1082

Title@2025-06-09 (1): Tree-Sliced Wasserstein Distance: A Geometric Perspective

Title: Tree-Sliced Wasserstein Distance: A Geometric Perspective Baumgeschliffene Wasserstein-Distanz: Eine geometrische Perspektive 树切瓦西尔斯坦距离:几何视角 2406.13725v3

Authors (7): Viet-Hoang Tran, Trang Pham, Tho Tran, Minh Khoi Nguyen Nhat, Thanh Chu, Tam Le, Tan M. Nguyen

Many variants of Optimal Transport (OT) have been developed to address its heavy computation. Among them, notably, Sliced Wasserstein (SW) is widely used for application domains by projecting the OT problem onto one-dimensional lines, and leveraging the closed-form expression of the univariate OT to reduce the computational burden. However, projecting measures onto low-dimensional spaces can lead to a loss of topological information. To mitigate this issue, in this work, we propose to replace one-dimensional lines with a more intricate structure, called tree systems. This structure is metrizable by a tree metric, which yields a closed-form expression for OT problems on tree systems. We provide an extensive theoretical analysis to formally define tree systems with their topological properties, introduce the concept of splitting maps, which operate as the projection mechanism onto these structures, then finally propose a novel variant of Radon transform for tree systems and verify its injectivity. This framework leads to an efficient metric between measures, termed Tree-Sliced Wasserstein distance on Systems of Lines (TSW-SL). By conducting a variety of experiments on gradient flows, image style transfer, and generative models, we illustrate that our proposed approach performs favorably compared to SW and its variants.
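
The Sliced Wasserstein baseline that this work builds on can be sketched in a few lines. The example assumes two empirical measures with the same number of uniformly weighted points, for which the one-dimensional OT cost reduces to matching sorted projections; it does not implement tree systems or TSW-SL, and the function name and defaults are illustrative.

```python
import math
import random


def sliced_wasserstein(xs, ys, num_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the Sliced Wasserstein-p distance.

    xs and ys are equal-length lists of points (lists of floats) with
    uniform weights, so each 1-D OT problem is solved by sorting.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized point clouds")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(num_projections):
        # Draw a direction uniformly on the unit sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        theta = [v / norm for v in g]
        # Project both point clouds onto the line and sort.
        px = sorted(sum(t * v for t, v in zip(theta, x)) for x in xs)
        py = sorted(sum(t * v for t, v in zip(theta, y)) for y in ys)
        # Closed-form 1-D OT cost between sorted samples.
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / num_projections) ** (1.0 / p)


point_rng = random.Random(42)
cloud = [[point_rng.gauss(0.0, 1.0), point_rng.gauss(0.0, 1.0)] for _ in range(100)]
shifted = [[x + 1.0, y] for x, y in cloud]
print(sliced_wasserstein(cloud, cloud), sliced_wasserstein(cloud, shifted))
```

Each projection discards everything orthogonal to the sampled line, which is the loss of topological information that the tree-system construction is designed to mitigate.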


Article 1083

Title@2025-06-09 (1): Exploiting Curvature in Online Convex Optimization with Delayed Feedback

Title: Exploiting Curvature in Online Convex Optimization with Delayed Feedback Ausnutzen von Kurven in Online Convex-Optimierung mit verzögertem Feedback 利用在线 Convex 优化使用延迟反馈的在线 Convex 优化中的曲线曲线 2506.07595v1

Authors (3): Hao Qiu, Emmanuel Esposito, Mengxiao Zhang

In this work, we study the online convex optimization problem with curved losses and delayed feedback. When losses are strongly convex, existing approaches obtain regret bounds of order $d_{\max} \ln T$, where $d_{\max}$ is the maximum delay and $T$ is the time horizon. However, in many cases, this guarantee can be much worse than $\sqrt{d_{\mathrm{tot}}}$ as obtained by a delayed version of online gradient descent, where $d_{\mathrm{tot}}$ is the total delay. We bridge this gap by proposing a variant of follow-the-regularized-leader that obtains regret of order $\min\{\sigma_{\max}\ln T, \sqrt{d_{\mathrm{tot}}}\}$, where $\sigma_{\max}$ is the maximum number of missing observations. We then consider exp-concave losses and extend the Online Newton Step algorithm to handle delays with an adaptive learning rate tuning, achieving regret $\min\{d_{\max} n\ln T, \sqrt{d_{\mathrm{tot}}}\}$ where $n$ is the dimension. To our knowledge, this is the first algorithm to achieve such a regret bound for exp-concave losses. We further consider the problem of unconstrained online linear regression and achieve a similar guarantee by designing a variant of the Vovk-Azoury-Warmuth forecaster with a clipping trick. Finally, we implement our algorithms and conduct experiments under various types of delay and losses, showing an improved performance over existing methods.
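
The delayed-feedback protocol can be made concrete with the delayed online gradient descent baseline mentioned above. This is a minimal one-dimensional sketch with a fixed learning rate and quadratic losses $f_t(x) = (x - z_t)^2$; it is not the follow-the-regularized-leader variant proposed in the paper, and the function name and parameters are assumptions.

```python
def delayed_ogd(targets, delays, eta=0.05, lo=-1.0, hi=1.0):
    """Projected online gradient descent where feedback arrives late.

    The gradient of round t becomes available at the end of round
    t + delays[t]; it is evaluated at the point played in round t.
    """
    x = 0.0
    pending = {}  # arrival round -> gradients that become visible then
    plays = []
    for t, (z, d) in enumerate(zip(targets, delays)):
        plays.append(x)
        grad = 2.0 * (x - z)  # gradient of (x - z)^2 at the played point
        pending.setdefault(t + d, []).append(grad)
        # Apply every gradient whose feedback arrives in this round.
        for g in pending.pop(t, []):
            x = min(hi, max(lo, x - eta * g))
    return plays


horizon = 200
plays = delayed_ogd([0.5] * horizon, [2] * horizon)
print(plays[:5], plays[-1])
```

In this run every delay equals 2, so $d_{\max} = 2$ and $d_{\mathrm{tot}} = 2T$; the first three plays stay at the initial point because no feedback has arrived yet.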


Article 1084

Title: Lite-RVFL: A Lightweight Random Vector Functional-Link Neural Network for Learning Under Concept Drift Lite-RVFL: Ein leichter Zufallsvektor Funktions-Link Neuronales Netzwerk zum Lernen unter Konzept Drift Lite-RVFL: 轻量随机随机矢量功能链接神经网络,用于在概念驱动下学习 2506.08063v1

Authors (3): Songqiao Hu, Zeyi Liu, Xiao He

The change in data distribution over time, also known as concept drift, poses a significant challenge to the reliability of online learning methods. Existing methods typically require model retraining or drift detection, both of which demand high computational costs and are often unsuitable for real-time applications. To address these limitations, a lightweight, fast and efficient random vector functional-link network termed Lite-RVFL is proposed, capable of adapting to concept drift without drift detection or retraining. Lite-RVFL introduces a novel objective function that assigns exponentially increasing weights to newer samples, thereby emphasizing recent data and enabling timely adaptation. Theoretical analysis confirms the feasibility of this objective function for drift adaptation, and an efficient incremental update rule is derived. Experimental results on a real-world safety assessment task validate the efficiency of Lite-RVFL, its effectiveness in adapting to drift, and its potential to capture temporal patterns. The source code is available at https://github.com/songqiaohu/Lite-RVFL.
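
The effect of exponentially increasing sample weights can be illustrated with a scalar analogue. The sketch below minimizes $\sum_t \lambda^t (w - y_t)^2$ with $\lambda > 1$ through an incremental update; it is not the Lite-RVFL objective or update rule, which are not given in the abstract, and the names `exp_weighted_mean` and `growth` are hypothetical.

```python
def exp_weighted_mean(stream, growth=1.1):
    """Minimizer of sum_t growth**t * (w - y_t)**2, computed incrementally.

    Dividing the running sums by `growth` at each step keeps the newest
    sample at weight 1 and avoids overflow; the ratio is unchanged.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for y in stream:
        weighted_sum = weighted_sum / growth + y
        weight_total = weight_total / growth + 1.0
    return weighted_sum / weight_total


# Abrupt drift: the signal jumps from 0 to 1 halfway through the stream.
stream = [0.0] * 100 + [1.0] * 100
print(exp_weighted_mean(stream), sum(stream) / len(stream))
```

The weighted estimate tracks the post-drift level without any drift detector, while the unweighted mean stays halfway between the two regimes.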


Article 1085

Title@2025-06-09 (1): FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning

Title: FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning FairDICE: Fairness-getriebenes Offline-Multi-Ziel-Verstärkungs-Lernen FairDICE: 公平发展离线多目标强化学习 2506.08062v1

Authors (4): Woosung Kim, Jinho Lee, Jongmin Lee, Byung-Jun Lee

Multi-objective reinforcement learning (MORL) aims to optimize policies in the presence of conflicting objectives, where linear scalarization is commonly used to reduce vector-valued returns into scalar signals. While effective for certain preferences, this approach cannot capture fairness-oriented goals such as Nash social welfare or max-min fairness, which require nonlinear and non-additive trade-offs. Although several online algorithms have been proposed for specific fairness objectives, a unified approach for optimizing nonlinear welfare criteria in the offline setting, where learning must proceed from a fixed dataset, remains unexplored. In this work, we present FairDICE, the first offline MORL framework that directly optimizes nonlinear welfare objectives. FairDICE leverages distribution correction estimation to jointly account for welfare maximization and distributional regularization, enabling stable and sample-efficient learning without requiring explicit preference weights or exhaustive weight search. Across multiple offline benchmarks, FairDICE demonstrates strong fairness-aware performance compared to existing baselines.
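
The difference between linear scalarization and the fairness-oriented criteria named above shows up already on two hand-picked return vectors. The sketch below only evaluates the welfare functions; it does not implement FairDICE, and the two policies and their returns are made-up illustrative values.

```python
import math


def linear_scalarization(returns, weights):
    """Weighted sum of per-objective returns."""
    return sum(w * r for w, r in zip(weights, returns))


def nash_social_welfare(returns):
    """Sum of log-returns (log of the product); requires positive returns."""
    return sum(math.log(r) for r in returns)


def max_min_welfare(returns):
    """Return of the worst-off objective."""
    return min(returns)


# Hypothetical two-objective returns of two candidate policies.
policy_a = (10.0, 0.1)  # strong on objective 1, nearly ignores objective 2
policy_b = (4.0, 4.0)   # balanced

for name, criterion in [
    ("linear (equal weights)", lambda r: linear_scalarization(r, (0.5, 0.5))),
    ("Nash social welfare", nash_social_welfare),
    ("max-min", max_min_welfare),
]:
    print(name, criterion(policy_a), criterion(policy_b))
```

With equal weights the linear criterion prefers the lopsided policy, whereas both nonlinear criteria prefer the balanced one.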


Article 1086

Title@2025-06-09 (1): Explore the vulnerability of black-box models via diffusion models

Title: Explore the vulnerability of black-box models via diffusion models Entdecken Sie die Schwachstelle von Black-Box-Modellen über Diffusionsmodelle 通过推广模式探索黑盒模式的脆弱性 2506.07590v1

Authors (4): Jiacheng Shi, Yanfu Zhang, Huajie Shao, Ashley Gao

Recent advancements in diffusion models have enabled high-fidelity and photorealistic image generation across diverse applications. However, these models also present security and privacy risks, including copyright violations, sensitive information leakage, and the creation of harmful or offensive content that could be exploited maliciously. In this study, we uncover a novel security threat where an attacker leverages diffusion model APIs to generate synthetic images, which are then used to train a high-performing substitute model. This enables the attacker to execute model extraction and transfer-based adversarial attacks on black-box classification models with minimal queries, without needing access to the original training data. The generated images are sufficiently high-resolution and diverse to train a substitute model whose outputs closely match those of the target model. Across seven benchmarks, including CIFAR and ImageNet subsets, our method shows an average improvement of 27.37% over state-of-the-art methods while using just 0.01 times the query budget, achieving a 98.68% success rate in adversarial attacks on the target model.


Article 1087

Title@2025-06-09 (1): Attention with Trained Embeddings Provably Selects Important Tokens

Title: Attention with Trained Embeddings Provably Selects Important Tokens Aufmerksamkeit bei trainierten Einbettungen wählt wahrscheinlich wichtige Token aus 与经过训练的嵌入器的关注 2505.17282v2

Authors (4): Diyuan Wu, Aleksandr Shevchenko, Samet Oymak, Marco Mondelli

Token embeddings play a crucial role in language modeling but, despite this practical relevance, their theoretical understanding remains limited. Our paper addresses the gap by characterizing the structure of embeddings obtained via gradient descent. Specifically, we consider a one-layer softmax attention model with a linear head for binary classification, i.e., $\texttt{Softmax}( p^\top E_X^\top ) E_X v = \frac{ \sum_{i=1}^T \exp(p^\top E_{x_i}) E_{x_i}^\top v}{\sum_{j=1}^T \exp(p^\top E_{x_{j}}) }$, where $E_X = [ E_{x_1} , \dots, E_{x_T} ]^\top$ contains the embeddings of the input sequence, $p$ is the embedding of the $\mathrm{\langle cls \rangle}$ token and $v$ the output vector. First, we show that, already after a single step of gradient training with the logistic loss, the embeddings $E_X$ capture the importance of tokens in the dataset by aligning with the output vector $v$ proportionally to the frequency with which the corresponding tokens appear in the dataset. Then, after training $p$ via gradient flow until convergence, the softmax selects the important tokens in the sentence (i.e., those that are predictive of the label), and the resulting $\mathrm{\langle cls \rangle}$ embedding maximizes the margin for such a selection. Experiments on real-world datasets (IMDB, Yelp) exhibit a phenomenology close to that unveiled by our theory.
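
The model in the displayed formula can be evaluated directly. The sketch below computes $\sum_i \mathrm{softmax}_i(p^\top E_{x_i})\, E_{x_i}^\top v$ for a toy vocabulary; the embedding table, token ids, and vectors are hand-picked assumptions, and no training is performed.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_output(embeddings, token_ids, p, v):
    """One-layer softmax attention with a linear head.

    embeddings[i] is the embedding of token i, p is the <cls> embedding
    and v is the output vector.
    """
    scores = [dot(p, embeddings[i]) for i in token_ids]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    normalizer = sum(weights)
    return sum(w / normalizer * dot(embeddings[i], v)
               for w, i in zip(weights, token_ids))


E = [[1.0, 0.0], [0.0, 1.0]]  # two-token vocabulary
v = [1.0, -1.0]
sentence = [0, 1, 1]

print(attention_output(E, sentence, [0.0, 0.0], v))   # uniform attention
print(attention_output(E, sentence, [50.0, 0.0], v))  # attention on token 0
```

With $p = 0$ the output is the plain average of the per-token scores $E_{x_i}^\top v$; a $p$ aligned with one token's embedding makes the softmax concentrate on that token, which is the selection behaviour the abstract describes.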


Article 1088

Title@2025-06-09 (1): PrunePEFT: Iterative Hybrid Pruning for Parameter-Efficient Fine-tuning of LLMs

Title: PrunePEFT: Iterative Hybrid Pruning for Parameter-Efficient Fine-tuning of LLMs PrunePEFT: Iterative Hybride Pruning für eine parametereffiziente Feinabstimmung von LLMs PrunePEFT: LLMs 参数有效微调过渡混合节制 2506.07587v1

Authors (6): Tongzhou Yu, Zhuhao Zhang, Guanghui Zhu, Shen Jiang, Meikang Qiu, Yihua Huang

Parameter Efficient Fine-Tuning (PEFT) methods have emerged as effective and promising approaches for fine-tuning pre-trained language models. Compared with Full parameter Fine-Tuning (FFT), PEFT achieves comparable task performance with a substantial reduction of trainable parameters, which largely saves training and storage costs. However, using a PEFT method requires considering a vast design space, such as the type of PEFT modules and their insertion layers. Inadequate configurations can lead to sub-optimal results. Conventional solutions such as architectural search techniques, while effective, tend to introduce substantial additional overhead. In this paper, we propose a novel approach, PrunePEFT, which formulates the PEFT strategy search as a pruning problem and introduces a hybrid pruning strategy that capitalizes on the sensitivity of pruning methods to different PEFT modules. This method extends traditional pruning techniques by iteratively removing redundant or conflicting PEFT modules, thereby optimizing the fine-tuned configuration. By efficiently identifying the most relevant modules, our approach significantly reduces the computational burden typically associated with architectural search processes, making it a more scalable and efficient solution for fine-tuning large pre-trained models.


Article 1089

Title@2025-06-09 (1): Theoretical Limitations of Ensembles in the Age of Overparameterization

Title: Theoretical Limitations of Ensembles in the Age of Overparameterization Theoretische Einschränkungen von Ensembles im Zeitalter der Überparameterisierung 过分参数化时代集合群的理论局限性 2410.16201v2

Authors (3): Niclas Dern, John P. Cunningham, Geoff Pleiss

Classic ensembles generalize better than any single component model. In contrast, recent empirical studies find that modern ensembles of (overparameterized) neural networks may not provide any inherent generalization advantage over single but larger neural networks. This paper clarifies how modern overparameterized ensembles differ from their classic underparameterized counterparts, using ensembles of random feature (RF) regressors as a basis for developing theory. In contrast to the underparameterized regime, where ensembling typically induces regularization and increases generalization, we prove with minimal assumptions that infinite ensembles of overparameterized RF regressors become pointwise equivalent to (single) infinite-width RF regressors, and finite width ensembles rapidly converge to single models with the same parameter budget. These results, which are exact for ridgeless models and approximate for small ridge penalties, imply that overparameterized ensembles and single large models exhibit nearly identical generalization. We further characterize the predictive variance amongst ensemble members, demonstrating that it quantifies the expected effects of increasing capacity rather than capturing any conventional notion of uncertainty. Our results challenge common assumptions about the advantages of ensembles in overparameterized settings, prompting a reconsideration of how well intuitions from underparameterized ensembles transfer to deep ensembles and the overparameterized regime.


Article 1090

Title@2025-06-09 (1): Aircraft Trajectory Dataset Augmentation in Latent Space

Title: Aircraft Trajectory Dataset Augmentation in Latent Space Flugbahndatensatz Augmentation im Latent Space 低空空间轨迹数据集增强 2506.07585v1

Authors (2): Seokbin Yoon, Keumjin Lee

Aircraft trajectory modeling plays a crucial role in Air Traffic Management (ATM) and is important for various downstream tasks, including conflict detection and landing time prediction. Dataset augmentation through the addition of synthetically generated trajectory data is necessary to develop a more robust aircraft trajectory model and ensure that the trajectory dataset is sufficient and balanced. In this work, we propose a novel framework called ATRADA for aircraft trajectory dataset augmentation. In the proposed framework, a Transformer encoder learns the underlying patterns in the original trajectory dataset and converts each data point into a context vector in the learned latent space. The converted dataset in the latent space is projected into reduced dimensions using principal component analysis (PCA), and a Gaussian mixture model (GMM) is applied to fit the probability distribution of the data points in the reduced-dimensional space. Finally, new samples are drawn from the fitted GMM, the dimension of the samples is reverted to the original dimension, and they are decoded with a Multi-Layer Perceptron (MLP). Several experiments demonstrate that the framework effectively generates new, high-quality synthetic aircraft trajectory data, which were compared to the results of several baselines.


Article 1091

Title@2025-06-09 (1): FedCGD: Collective Gradient Divergence Optimized Scheduling for Wireless Federated Learning

Title: FedCGD: Collective Gradient Divergence Optimized Scheduling for Wireless Federated Learning FedCGD: Kollektive Gradient Divergenz optimierte Planung für drahtloses Federated Learning FedCGD:无线联邦学习集体渐进变异优化计划 2506.07581v1

Authors (5): Tan Chen, Jintao Yan, Yuxuan Sun, Sheng Zhou, Zhisheng Niu

Federated learning (FL) is a promising paradigm for multiple devices to cooperatively train a model. When applied in wireless networks, two issues consistently affect the performance of FL, i.e., data heterogeneity of devices and limited bandwidth. Many papers have investigated device scheduling strategies considering the two issues. However, most of them recognize data heterogeneity as a property of individual devices. In this paper, we prove that the convergence speed of FL is affected by the sum of device-level and sample-level collective gradient divergence (CGD). The device-level CGD refers to the gradient divergence of the scheduled device group, instead of the sum of the individual device divergences. The sample-level CGD is statistically upper bounded by sampling variance, which is inversely proportional to the total number of samples scheduled for local update. To derive a tractable form of the device-level CGD, we further consider a classification problem and transform it into the weighted earth moving distance (WEMD) between the group distribution and the global distribution. Then we propose the FedCGD algorithm to minimize the sum of multi-level CGDs by balancing WEMD and sampling variance, within polynomial time. Simulation shows that the proposed strategy increases classification accuracy on the CIFAR-10 dataset by up to 4.2\% while scheduling 41.8\% fewer devices, and flexibly switches between reducing WEMD and reducing sampling variance.


Article 1092

Title@2025-06-09 (1): HSF: Defending against Jailbreak Attacks with Hidden State Filtering

Title: HSF: Defending against Jailbreak Attacks with Hidden State Filtering HSF: Verteidigung gegen Jailbreak Attacken mit versteckten Zustand Filterung HSF: 防范以隐蔽国家过滤手段侵入监狱的袭击 2409.03788v2

Authors (4): Cheng Qian, Hainan Zhang, Lei Sha, Zhiming Zheng

With the growing deployment of LLMs in daily applications like chatbots and content generation, efforts to ensure outputs align with human values and avoid harmful content have intensified. However, increasingly sophisticated jailbreak attacks threaten this alignment, aiming to induce unsafe outputs. Current defense efforts focus either on prompt rewriting or detection, which are limited in effectiveness due to the varied designs of jailbreak prompts, or on output control and detection, which are computationally expensive as they require LLM inference. Therefore, designing a pre-inference defense method that resists diverse jailbreak prompts is crucial for preventing LLM jailbreak attacks. We observe that jailbreak attacks, safe queries, and harmful queries exhibit different clustering patterns within the LLM’s hidden state representation space. This suggests that by leveraging the LLM’s hidden state representational capabilities, we can analyze the LLM’s forthcoming behavior and proactively intervene for defense. In this paper, we propose a jailbreak attack defense strategy based on a Hidden State Filter (HSF), a lossless architectural defense mechanism that enables the model to preemptively identify and reject adversarial inputs before the inference process begins. We activate its defensive potential through an additional plugin module, effectively framing the defense task as a classification problem. Experimental results on two benchmark datasets, utilizing three different LLMs, show that HSF significantly enhances resilience against six cutting-edge jailbreak attacks. It significantly reduces the success rate of jailbreak attacks while minimally impacting responses to benign user queries, adds negligible inference overhead, and outperforms defense baselines. Our code and data are available at https://anonymous.4open.science/r/Hidden-State-Filtering-8652/


Article 1093

Title@2025-06-09 (1): Denoising the Future: Top-p Distributions for Moving Through Time

Title: Denoising the Future: Top-p Distributions for Moving Through Time Denoising the Future: Top-P-Distributionen für die Zeit 摒弃未来:跨越时空的顶级发行品 2506.07578v1

Authors (4): Florian Andreas Marwitz, Ralf Möller, Magnus Bender, Marcel Gehrke

Inference in dynamic probabilistic models is a complex task involving expensive operations. In particular, for Hidden Markov Models, the whole state space has to be enumerated to advance in time. Even states with negligible probabilities are considered, resulting in computational inefficiency and increased noise due to the propagation of unlikely probability mass. We propose to denoise the future and speed up inference by using only the top-p states, i.e., the most probable states with accumulated probability p. We show that the error introduced by using only the top-p states is bounded by p and the so-called minimal mixing rate of the underlying model. Moreover, in our empirical evaluation, we show that we can expect speedups of at least an order of magnitude, while the error in terms of total variation distance is below 0.09.
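
The top-p idea can be sketched for a single prediction step of a Hidden Markov Model. This is a minimal illustration rather than the authors' code: the helper names and toy transition table are assumptions, the emission (observation) update is omitted, and renormalizing the kept mass is a choice made here that the abstract does not specify.

```python
def top_p_states(belief, p):
    """Keep the most probable states until their accumulated mass reaches p.

    The kept probabilities are renormalized to sum to 1 (an assumption of this sketch).
    """
    ranked = sorted(belief.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = {}, 0.0
    for state, prob in ranked:
        kept[state] = prob
        mass += prob
        if mass >= p:
            break
    return {state: prob / mass for state, prob in kept.items()}

def forward_step(belief, transition, p):
    """Advance the belief one time step, propagating only the top-p states."""
    pruned = top_p_states(belief, p)
    nxt = {}
    for state, prob in pruned.items():
        for target, weight in transition[state].items():
            nxt[target] = nxt.get(target, 0.0) + prob * weight
    return nxt

# Toy model (hypothetical): sparse transition rows keyed by source state.
belief = {"a": 0.7, "b": 0.25, "c": 0.05}
transition = {
    "a": {"a": 0.9, "b": 0.1},
    "b": {"b": 0.8, "c": 0.2},
    "c": {"c": 1.0},
}
approx = forward_step(belief, transition, p=0.9)
print(approx)
```

With p = 0.9 the low-probability state "c" is dropped before propagation, so only two of the three transition rows are visited. The saving grows with the number of negligible states.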


Article 1094

Title@2025-06-09 (1): Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models

Title: Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models Uncertainty-o: Ein Modell-agnostisches Framework zur Enthüllung von Uncertainty in großen multimodalen Modellen 不确定性 – – o:大型多式联运模型中不确定性的不确定性稳定的一个示范 – – 不可知框架 2506.07575v1

Authors (4): Ruiyang Zhang, Hu Zhang, Hao Fei, Zhedong Zheng

Large Multimodal Models (LMMs), harnessing the complementarity among diverse modalities, are often considered more robust than pure Large Language Models (LLMs); yet do LMMs know what they do not know? Three key open questions remain: (1) how to evaluate the uncertainty of diverse LMMs in a unified manner, (2) how to prompt LMMs to show their uncertainty, and (3) how to quantify uncertainty for downstream tasks. In an attempt to address these challenges, we introduce Uncertainty-o: (1) a model-agnostic framework designed to reveal uncertainty in LMMs regardless of their modalities, architectures, or capabilities, (2) an empirical exploration of multimodal prompt perturbations to uncover LMM uncertainty, offering insights and findings, and (3) a formulation of multimodal semantic uncertainty, which enables quantifying uncertainty from multimodal responses. Experiments across 18 benchmarks spanning various modalities and 10 LMMs (both open- and closed-source) demonstrate the effectiveness of Uncertainty-o in reliably estimating LMM uncertainty, thereby enhancing downstream tasks such as hallucination detection, hallucination mitigation, and uncertainty-aware Chain-of-Thought reasoning.


Article 1095

Title@2025-06-09 (1): Attacking Attention of Foundation Models Disrupts Downstream Tasks

Title: Attacking Attention of Foundation Models Disrupts Downstream Tasks Angriff auf die Aufmerksamkeit von Stiftungsmodellen bricht nachgelagerte Aufgaben aus 攻击基金会模型的注意 破坏下流任务 2506.05394v2

Authors (3): Hondamunige Prasanna Silva, Federico Becattini, Lorenzo Seidenari

Foundation models represent the most prominent and recent paradigm shift in artificial intelligence. Foundation models are large models, trained on broad data, that deliver high accuracy in many downstream tasks, often without fine-tuning. For this reason, models such as CLIP, DINO, or Vision Transformers (ViT) are becoming the bedrock of many industrial AI-powered applications. However, the reliance on pre-trained foundation models also introduces significant security concerns, as these models are vulnerable to adversarial attacks. Such attacks involve deliberately crafted inputs designed to deceive AI systems, jeopardizing their reliability. This paper studies the vulnerabilities of vision foundation models, focusing specifically on CLIP and ViTs, and explores the transferability of adversarial attacks to downstream tasks. We introduce a novel attack that targets the structure of transformer-based architectures in a task-agnostic fashion. We demonstrate the effectiveness of our attack on several downstream tasks: classification, captioning, image/text retrieval, segmentation, and depth estimation. Code available at: https://github.com/HondamunigePrasannaSilva/attack-attention


Article 1096

Title@2025-06-09 (1): AI Scientists Fail Without Strong Implementation Capability

Title: AI Scientists Fail Without Strong Implementation Capability KI-Wissenschaftler scheitern ohne starke Umsetzungsfähigkeit AI 缺乏强有力的执行能力的科学家失败 2506.01372v2

Authors (7): Minjun Zhu, Qiujie Xie, Yixuan Weng, Jian Wu, Zhen Lin, Linyi Yang, Yue Zhang

The emergence of the Artificial Intelligence (AI) Scientist represents a paradigm shift in scientific discovery, with large language models (LLMs) taking the lead as the primary executor in the entire scientific workflow from idea generation to experiment implementation. Recent AI Scientist studies demonstrate sufficient capabilities for independent scientific discovery, with the generated research reports gaining acceptance at the ICLR 2025 workshop and ACL 2025, arguing that a human-level AI Scientist, capable of uncovering phenomena previously unknown to humans, may be imminent. Despite this substantial progress, the AI Scientist has yet to produce a groundbreaking achievement in the domain of computer science on par with automated scientific tools. Based on extensive quantitative evidence from existing benchmarks in complex engineering tasks and a systematic evaluation of 28 research papers generated by five advanced AI Scientist systems, we argue that the fundamental bottleneck for AI Scientists lies in their capability to execute the requisite verification procedures. Current AI Scientist systems lack the capabilities needed to execute rigorous experiments and produce high-quality scientific papers. To better illustrate the root cause of this implementation gap, we provide an in-depth discussion of the fundamental limitations of the AI Scientist. This position paper calls on the community to bridge the implementation gap.


Article 1097

Title@2025-06-09 (1): Complex Physics-Informed Neural Network

Title: Complex Physics-Informed Neural Network Komplexes physik-informiertes Neuronales Netzwerk 复杂物理内化神经网络 2502.04917v2

Authors (4): Chenhao Si, Ming Yan, Xin Li, Zhihong Xia

We propose compleX-PINN, a novel physics-informed neural network (PINN) architecture incorporating a learnable activation function inspired by the Cauchy integral theorem. By optimizing the activation parameters, compleX-PINN achieves high accuracy with just a single hidden layer. Empirically, we demonstrate that compleX-PINN solves high-dimensional problems that pose significant challenges for PINNs. Our results show that compleX-PINN consistently achieves substantially greater precision, often improving accuracy by an order of magnitude, on these complex tasks.


Article 1098

Title@2025-06-09 (1): A Simplifying and Learnable Graph Convolutional Attention Network for Unsupervised Knowledge Graphs Alignment

Title: A Simplifying and Learnable Graph Convolutional Attention Network for Unsupervised Knowledge Graphs Alignment Ein einfacheres und erlernbares Graph Convolutional Attention Network für unüberwachtes Wissen Graphen Alignment 一个用于不受监督知识图表对齐的简化和可学习的图变注意网络 2410.13263v2

Authors (3): Weishan Cai, Wenjun Ma, Yuncheng Jiang

Success on the Entity Alignment (EA) task currently depends largely on the supervision information provided by labeled data. Considering the cost of labeled data, most supervised methods are difficult to apply in practical scenarios. Therefore, a growing number of works based on contrastive learning, active learning, or other deep learning techniques have been developed to address the performance bottleneck caused by the lack of labeled data. However, existing unsupervised EA methods still have limitations: either their modeling complexity is high, or they cannot balance the effectiveness and practicality of alignment. To overcome these issues, we propose a Simplifying and Learnable graph convolutional attention network for Unsupervised Knowledge Graphs alignment method (SLU). Specifically, we first introduce LCAT, a new and simple framework, as the backbone network to model the graph structure of two KGs. Then we design a method for reconstructing the relation structure based on potential matching relations, which efficiently filters invalid neighborhood information of aligned entities and improves the usability and scalability of SLU. In addition, a consistency-based similarity function is proposed to better measure the similarity of candidate entity pairs. Finally, we conduct extensive experiments on three datasets of different sizes (15K and 100K) and different types (cross-lingual and monolingual) to verify the superiority of SLU. Experimental results show that SLU significantly improves alignment accuracy, outperforming 25 supervised or unsupervised methods and improving Hits@1 by 6.4% over the best baseline in the best case.


Article 1099

Title@2025-06-09 (1): AQUATIC-Diff: Additive Quantization for Truly Tiny Compressed Diffusion Models

Title: AQUATIC-Diff: Additive Quantization for Truly Tiny Compressed Diffusion Models AQUATIC-Diff: Additive Quantisierung für wirklich kleine komprimierte Diffusionsmodelle AQUATIC-Diff: 真正微缩压缩扩散模型补充量化 2506.05960v2

Authors (2): Adil Hasan, Thomas Peyrin

Significant investments have been made towards the commodification of diffusion models for generation of diverse media. Their mass-market adoption is however still hobbled by the intense hardware resource requirements of diffusion model inference. Model quantization strategies tailored specifically towards diffusion models have been useful in easing this burden, yet have generally explored the Uniform Scalar Quantization (USQ) family of quantization methods. In contrast, Vector Quantization (VQ) methods, which operate on groups of multiple related weights as the basic unit of compression, have seen substantial success in Large Language Model (LLM) quantization. In this work, we apply codebook-based additive vector quantization to the problem of diffusion model compression. Our resulting approach achieves a new Pareto frontier for the extremely low-bit weight quantization on the standard class-conditional benchmark of LDM-4 on ImageNet at 20 inference time steps. Notably, we report sFID 1.92 points lower than the full-precision model at W4A8 and the best-reported results for FID, sFID and ISC at W2A8. We are also able to demonstrate FLOPs savings on arbitrary hardware via an efficient inference kernel, as opposed to savings resulting from small integer operations which may lack broad hardware support.
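
The codebook-based additive quantization mentioned above can be illustrated with a decoding sketch. It assumes the generic additive-quantization form, in which a group of weights is stored as one index per codebook and reconstructed as the sum of the selected codewords. The function names and toy codebooks are hypothetical, and the encoding search and the paper's inference kernel are not shown.

```python
import math

def decode(codes, codebooks):
    """Reconstruct one weight group as the sum of one codeword per codebook."""
    dim = len(codebooks[0][0])
    out = [0.0] * dim
    for book, idx in zip(codebooks, codes):
        for i, value in enumerate(book[idx]):
            out[i] += value
    return out

def bits_per_weight(num_codebooks, codebook_size, group_size):
    """Index storage cost per weight, ignoring the codebooks themselves."""
    return num_codebooks * math.log2(codebook_size) / group_size

# Two toy codebooks with two codewords each, for groups of two weights.
codebooks = [
    [[0.0, 0.0], [1.0, -1.0]],
    [[0.5, 0.5], [-0.5, 0.25]],
]
group = decode([1, 0], codebooks)
print(group)                       # [1.5, -0.5]
print(bits_per_weight(2, 256, 8))  # 2.0
```

The second call shows how a low bit width can arise under these assumptions: two codebooks of 256 entries shared across groups of eight weights cost 16 index bits per group, or 2 bits per weight.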


Article 1100

Title@2025-06-09 (1): Improving Memory Efficiency for Training KANs via Meta Learning

Title: Improving Memory Efficiency for Training KANs via Meta Learning Verbesserung der Speichereffizienz für die Schulung von KANs über Meta Learning 通过Meta学习提高培训KANs的记忆效率 2506.07549v1

Authors (4): Zhangchi Zhao, Jun Shu, Deyu Meng, Zongben Xu

Inspired by the Kolmogorov-Arnold representation theorem, KANs offer a novel framework for function approximation by replacing traditional neural network weights with learnable univariate functions. This design demonstrates significant potential as an efficient and interpretable alternative to traditional MLPs. However, KANs are characterized by a substantially larger number of trainable parameters, leading to challenges in memory efficiency and higher training costs compared to MLPs. To address this limitation, we propose to generate weights for KANs via a smaller meta-learner, called MetaKANs. By training KANs and MetaKANs in an end-to-end differentiable manner, MetaKANs achieve comparable or even superior performance while significantly reducing the number of trainable parameters and maintaining promising interpretability. Extensive experiments on diverse benchmark tasks, including symbolic regression, partial differential equation solving, and image classification, demonstrate the effectiveness of MetaKANs in improving parameter efficiency and memory usage. The proposed method provides an alternative technique for training KANs that allows for greater scalability and extensibility, and it narrows the training cost gap with MLPs noted in the original KAN paper. Our code is available at https://github.com/Murphyzc/MetaKAN.


Article 1101

Title@2025-06-09 (1): Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques

Title: Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques Eliciting Fine-Tuned Transformer Fähigkeiten durch Inferenz-Zeit-Techniken 通过推断-时间技术的精密导精度变形器能力 2506.08060v1

Authors (1): Asankhaya Sharma

Large language models have transformed natural language processing, yet supervised fine-tuning (SFT) remains computationally intensive. This paper formally proves that capabilities acquired through SFT can be approximated by a base transformer model using inference-time techniques, specifically in-context learning (ICL), without altering model parameters, under idealized assumptions including unbounded computational resources and access to the fine-tuning dataset. We extend these results to practical scenarios with finite context lengths and partial dataset access. For text generation tasks with fixed output length $l$, datasets of size $\mathrm{O}\left( \frac{m V}{\varepsilon^2} \log \frac{m}{\delta} \right)$ or, with bounded context, $\mathrm{O}\left( \frac{l \log V}{\varepsilon^2} \log \frac{1}{\delta} \right)$ suffice to approximate fine-tuned behavior across $m$ contexts within error $\varepsilon$, where $V$ is the vocabulary size and $\delta$ is the failure probability. For linear classification, datasets of size $\mathrm{O}\left( \frac{d}{\varepsilon} \right)$ or, with fixed context, $\mathrm{O}\left( \frac{1}{\varepsilon^2} \log \frac{1}{\delta} \right)$ are sufficient, where $d$ is the input dimension. Grounded in the Turing completeness of transformers, these results provide a theoretical foundation for resource-efficient deployment of large language models, with practical techniques like retrieval-augmented generation bridging theory to real-world applications.


Article 1102

Title@2025-06-09 (1): Output-Constrained Decision Trees

Title: Output-Constrained Decision Trees Output-beschränkte Entscheidungsbäume 受产出约束的决策树 2405.15314v3

Authors (6): Hüseyin Tunç, Doğanay Özese, Ş. İlker Birbil, Donato Maragno, Marco Caserta, Mustafa Baydoğan

Incorporating domain-specific constraints into machine learning models is essential for generating predictions that are both accurate and feasible in real-world applications. This paper introduces new methods for training Output-Constrained Regression Trees (OCRT), addressing the limitations of traditional decision trees in constrained multi-target regression tasks. We propose three approaches: M-OCRT, which uses split-based mixed integer programming to enforce constraints; E-OCRT, which employs an exhaustive search for optimal splits and solves constrained prediction problems at each decision node; and EP-OCRT, which applies post-hoc constrained optimization to tree predictions. To illustrate their potential uses in ensemble learning, we also introduce a random forest framework working under convex feasible sets. We validate the proposed methods through a computational study both on synthetic and industry-driven hierarchical time series datasets. Our results demonstrate that imposing constraints on decision tree training results in accurate and feasible predictions.
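
The post-hoc idea behind EP-OCRT can be sketched as projecting an unconstrained multi-target prediction onto a feasible set. The example below is an assumption-laden illustration, not the authors' formulation: it uses a Euclidean projection onto a single linear equality constraint, with a toy hierarchical constraint (total = child 1 + child 2), and the function name is hypothetical.

```python
def project_onto_hyperplane(y, a, b=0.0):
    """Closest point (in Euclidean distance) to y that satisfies a . y = b."""
    dot_ay = sum(ai * yi for ai, yi in zip(a, y))
    dot_aa = sum(ai * ai for ai in a)
    step = (dot_ay - b) / dot_aa
    return [yi - step * ai for ai, yi in zip(a, y)]

# Raw tree prediction for (total, child_1, child_2); 10 != 4 + 5, so it is infeasible.
raw = [10.0, 4.0, 5.0]
# Constraint total - child_1 - child_2 = 0, written as a . y = 0.
a = [1.0, -1.0, -1.0]

fixed = project_onto_hyperplane(raw, a)
print(fixed)
```

For one equality constraint the projection has this closed form. General convex feasible sets would need a numerical solver, which is outside the scope of this sketch.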


Article 1103

Title@2025-06-09 (1): Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling

Title: Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling Heartcare Suite: Multidimensionales Verständnis von EKG mit Raw Multi-Lead Signal Modeling 心脏护理套套件:以原始多牵头信号建模对ECG的多层面理解 2506.05831v2

Authors (14): Yihan Xie, Sijing Li, Tianwei Lin, Zhuonan Wang, Chenglin Yang, Yu Zhong, Wenqiao Zhang, Haoyuan Li, Hao Jiang, Fengda Zhang, Qishan Chen, Jun Xiao, Yueting Zhuang, Beng Chin Ooi

We present Heartcare Suite, a comprehensive multimodal framework for fine-grained electrocardiogram (ECG) understanding. It comprises three key components: (i) Heartcare-220K, a high-quality, structured, and comprehensive multimodal ECG dataset covering essential tasks such as disease diagnosis, waveform morphology analysis, and rhythm interpretation; (ii) Heartcare-Bench, a systematic and multi-dimensional benchmark designed to evaluate diagnostic intelligence and guide the optimization of Medical Multimodal Large Language Models (Med-MLLMs) in ECG scenarios; and (iii) HeartcareGPT with a tailored tokenizer, Bidirectional ECG Abstract Tokenization (Beat), which compresses raw multi-lead signals into semantically rich discrete tokens via dual-level vector quantization and a query-guided bidirectional diffusion mechanism. Built upon Heartcare-220K, HeartcareGPT achieves strong generalization and SoTA performance across multiple clinically meaningful tasks. Extensive experiments demonstrate that Heartcare Suite is highly effective in advancing ECG-specific multimodal understanding and evaluation. Our project is available at https://github.com/DCDmllm/Heartcare-Suite .


Article 1104

Title@2025-06-09 (1): Flowing Datasets with Wasserstein over Wasserstein Gradient Flows

Title: Flowing Datasets with Wasserstein over Wasserstein Gradient Flows Flowing Datasets mit Wasserstein über Wasserstein Gradient Flows 与瓦森斯坦对瓦森斯坦梯流的流数据集 2506.07534v1

Authors (3): Clément Bonet, Christophe Vauthier, Anna Korba

Many applications in machine learning involve data represented as probability distributions. The emergence of such data requires radically novel techniques to design tractable gradient flows on probability distributions over this type of (infinite-dimensional) objects. For instance, being able to flow labeled datasets is a core task for applications ranging from domain adaptation to transfer learning or dataset distillation. In this setting, we propose to represent each class by the associated conditional distribution of features, and to model the dataset as a mixture distribution supported on these classes (which are themselves probability distributions), meaning that labeled datasets can be seen as probability distributions over probability distributions. We endow this space with a metric structure from optimal transport, namely the Wasserstein over Wasserstein (WoW) distance, derive a differential structure on this space, and define WoW gradient flows. The latter make it possible to design dynamics over this space that decrease a given objective functional. We apply our framework to transfer learning and dataset distillation tasks, leveraging our gradient flow construction as well as novel tractable functionals that take the form of Maximum Mean Discrepancies with Sliced-Wasserstein based kernels between probability distributions.


Article 1105

Title@2025-06-09 (1): Learning What Reinforcement Learning Can’t: Interleaved Online Fine-Tuning for Hardest Questions

Title: Learning What Reinforcement Learning Can’t: Interleaved Online Fine-Tuning for Hardest Questions Lernen, was Verstärktes Lernen nicht kann: Interleaved Online Fine-Tuning für härteste Fragen 学习强化学习无法做到的东西: 2506.07527v1

Authors (11): Lu Ma, Hao Liang, Meiyi Qiang, Lexiang Tang, Xiaochen Ma, Zhen Hao Wong, Junbo Niu, Chengyu Shen, Runming He, Bin Cui, Wentao Zhang

Recent advances in large language model (LLM) reasoning have shown that sophisticated behaviors such as planning and self-reflection can emerge through reinforcement learning (RL). However, despite these successes, RL in its current form remains insufficient to induce capabilities that exceed the limitations of the base model, as it is primarily optimized based on existing knowledge of the model rather than facilitating the acquisition of new information. To address this limitation, we employ supervised fine-tuning (SFT) to learn what RL cannot, which enables the incorporation of new knowledge and reasoning patterns by leveraging high-quality demonstration data. We analyze the training dynamics of RL and SFT for LLM reasoning and find that RL excels at maintaining and improving performance on questions within the model’s original capabilities, while SFT is more effective at enabling progress on questions beyond the current scope of the model. Motivated by the complementary strengths of RL and SFT, we introduce a novel training approach, ReLIFT (Reinforcement Learning Interleaved with Online Fine-Tuning). In ReLIFT, the model is primarily trained using RL, but when it encounters challenging questions, high-quality solutions are collected for fine-tuning, and the training process alternates between RL and fine-tuning to enhance the model’s reasoning abilities. ReLIFT achieves an average improvement of over +5.2 points across five competition-level benchmarks and one out-of-distribution benchmark compared to other zero-RL models. Furthermore, we demonstrate that ReLIFT outperforms both RL and SFT while using only 13% of the detailed demonstration data, highlighting its scalability. These results provide compelling evidence that ReLIFT overcomes the fundamental limitations of RL and underscore the significant potential.


Article 1106

Title@2025-06-09 (1): Test-time Correlation Alignment

Title: Test-time Correlation Alignment Prüfzeitkorrelation Ausrichtung 测试时关联对齐 2505.00533v2

Authors (3): Linjing You, Jiabao Lu, Xiayuan Huang

Deep neural networks often degrade under distribution shifts. Although domain adaptation offers a solution, privacy constraints often prevent access to source data, making Test-Time Adaptation (TTA, which adapts using only unlabeled test data) increasingly attractive. However, current TTA methods still face practical challenges: (1) a primary focus on instance-wise alignment, overlooking CORrelation ALignment (CORAL) due to missing source correlations; (2) complex backpropagation operations for model updating, resulting in computational overhead; and (3) domain forgetting. To address these challenges, we provide a theoretical analysis to investigate the feasibility of Test-time Correlation Alignment (TCA), demonstrating that correlation alignment between high-certainty instances and test instances can enhance test performance with a theoretical guarantee. Based on this, we propose two simple yet effective algorithms: LinearTCA and LinearTCA+. LinearTCA applies a simple linear transformation to achieve both instance and correlation alignment without additional model updates, while LinearTCA+ serves as a plug-and-play module that can easily boost existing TTA methods. Extensive experiments validate our theoretical insights and show that TCA methods significantly outperform baselines across various tasks, benchmarks, and backbones. Notably, LinearTCA achieves higher accuracy with only 4% of the GPU memory and 0.6% of the computation time of the best TTA baseline. It also outperforms existing methods on CLIP by over 1.86%.


Article 1107

Title@2025-06-09 (1): Towards Achieving Perfect Multimodal Alignment

Title: Towards Achieving Perfect Multimodal Alignment Auf dem Weg zu einer perfekten multimodalen Ausrichtung 实现完美多式联运协调 2503.15352v2

Authors (2): Abhi Kamboj, Minh N. Do

Multimodal alignment constructs a joint latent vector space where modalities representing the same concept map to neighboring latent vectors. We formulate this as an inverse problem and show that, under certain conditions, paired data from each modality can map to equivalent latent vectors, which we refer to as perfect alignment. When perfect alignment cannot be achieved, it can be approximated using the Singular Value Decomposition (SVD) of a multimodal data matrix. Experiments on synthetic multimodal Gaussian data verify the effectiveness of our perfect alignment method compared to a learned contrastive alignment method. We further demonstrate the practical application of cross-modal transfer for human action recognition, showing that perfect alignment significantly enhances the model’s accuracy. We conclude by discussing how these findings can be applied to various modalities and tasks and the limitations of our method. We hope these findings inspire further exploration of perfect alignment and its applications in representation learning.


Article 1108

Title@2025-06-09 (1): Addressing Correlated Latent Exogenous Variables in Debiased Recommender Systems

Title: Addressing Correlated Latent Exogenous Variables in Debiased Recommender Systems Adressierung korrelierter latenter exogener Variablen in debiased Recommender Systems 处理受损害建议系统中与气候相关的低端外部源变量 2506.07517v1

Authors (4): Shuqiang Zhang, Yuchao Zhang, Jinkun Chen, Haochen Sui

Recommendation systems (RS) aim to provide personalized content, but they face a challenge in unbiased learning due to selection bias, where users only interact with items they prefer. This bias leads to a distorted representation of user preferences, which hinders the accuracy and fairness of recommendations. To address the issue, various methods such as error-imputation-based, inverse propensity scoring, and doubly robust techniques have been developed. Despite this progress, from the structural causal model perspective, previous debiasing methods in RS assume the independence of the exogenous variables. In this paper, we relax this assumption and propose a learning algorithm based on likelihood maximization to learn a prediction model. We first discuss the correlation and difference between unmeasured confounding and our scenario, then we propose a unified method that effectively handles latent exogenous variables. Specifically, our method models the data generation process with latent exogenous variables under mild normality assumptions. We then develop a Monte Carlo algorithm to numerically estimate the likelihood function. Extensive experiments on synthetic datasets and three real-world datasets demonstrate the effectiveness of our proposed method. The code is at https://github.com/WallaceSUI/kdd25-background-variable.


Article 1109

Title@2025-06-09 (1): SecFormer: Fast and Accurate Privacy-Preserving Inference for Transformer Models via SMPC

Title: SecFormer: Fast and Accurate Privacy-Preserving Inference for Transformer Models via SMPC SecFormer: Schnelle und genaue Datenschutz-Bewahrung von Schlussfolgerungen für Transformer-Modelle über SMPC SecFormer:通过SMPC为变形模型快速和准确的隐私保护推断 2401.00793v5

Authors (8): Jinglong Luo, Yehong Zhang, Zhuo Zhang, Jiaqi Zhang, Xin Mu, Hui Wang, Yue Yu, Zenglin Xu

With the growing use of Transformer models hosted on cloud platforms to offer inference services, privacy concerns are escalating, especially concerning sensitive data like investment plans and bank account details. Secure Multi-Party Computing (SMPC) emerges as a promising solution to protect the privacy of inference data and model parameters. However, the application of SMPC in Privacy-Preserving Inference (PPI) for Transformer models often leads to considerable slowdowns or declines in performance. This is largely due to the multitude of nonlinear operations in the Transformer architecture, which are not well-suited to SMPC and difficult to circumvent or optimize effectively. To address this concern, we introduce a comprehensive PPI framework called SecFormer to achieve fast and accurate PPI for Transformer models. We successfully eliminate the high-cost exponential and maximum operations in PPI without sacrificing model performance and develop a suite of efficient SMPC protocols by employing suitable numerical computation methods to boost other complex nonlinear functions in PPI, including GeLU, LayerNorm, and a redesigned Softmax. Our extensive experiments reveal that SecFormer outperforms MPCFormer in performance, showing improvements of $3.4\%$ and $24.7\%$ for BERT$_{\text{BASE}}$ and BERT$_{\text{LARGE}}$, respectively. In terms of efficiency, SecFormer is 3.57 and 3.58 times faster than PUMA for BERT$_{\text{BASE}}$ and BERT$_{\text{LARGE}}$, demonstrating its effectiveness and speed.
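
As background for why linear operations are cheap under SMPC while nonlinear ones are not, the sketch below shows two-party additive secret sharing, a standard building block. It is a generic illustration and not one of SecFormer's protocols: the modulus, function names, and values are assumptions.

```python
import secrets

MODULUS = 2 ** 32  # hypothetical ring size for the shares

def share(value, modulus=MODULUS):
    """Split a value into two random-looking shares that sum to it modulo the ring size."""
    r = secrets.randbelow(modulus)
    return r, (value - r) % modulus

def reconstruct(s0, s1, modulus=MODULUS):
    """Recombine both parties' shares into the underlying value."""
    return (s0 + s1) % modulus

def add_shares(x, y, modulus=MODULUS):
    """Each party adds the shares it holds locally; no communication is needed."""
    return tuple((xi + yi) % modulus for xi, yi in zip(x, y))

x = share(20)
y = share(22)
z = add_shares(x, y)
print(reconstruct(*z))  # 42
```

Addition works share by share, so linear layers need little interaction. Functions such as exponentials and maxima have no such share-wise form, which is consistent with the abstract's point that they dominate the cost of privacy-preserving Transformer inference.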


Article 1110

Title@2025-06-09 (1): Beyond Closure Models: Learning Chaotic-Systems via Physics-Informed Neural Operators

Title: Beyond Closure Models: Learning Chaotic-Systems via Physics-Informed Neural Operators Beyond Closure Models: Chaotische Systeme über physikinformierte Neuraloperatoren lernen 超越封闭模型:通过物理内成形神经操作员学习毛质系统 2408.05177v4

Authors (7): Chuwei Wang, Julius Berner, Zongyi Li, Di Zhou, Jiayun Wang, Jane Bae, Anima Anandkumar

Accurately predicting the long-term behavior of chaotic systems is crucial for various applications such as climate modeling. However, achieving such predictions typically requires iterative computations over a dense spatiotemporal grid to account for the unstable nature of chaotic systems, which is expensive and impractical in many real-world situations. An alternative to such a fully-resolved simulation is to use a coarse grid and then correct its errors through a closure model, which approximates the overall information from fine scales not captured in the coarse-grid simulation. Recently, ML approaches have been used for closure modeling, but they typically require a large number of training samples from expensive fully-resolved simulations (FRS). In this work, we prove an even more fundamental limitation: the standard approach to learning closure models suffers from a large approximation error for generic problems, no matter how large the model is, and this error stems from the non-uniqueness of the mapping. We propose an alternative end-to-end learning approach using a physics-informed neural operator (PINO) that overcomes this limitation by not using a closure model or a coarse-grid solver. We first train the PINO model on data from a coarse-grid solver and then fine-tune it with (a small amount of) FRS and physics-based losses on a fine grid. The discretization-free nature of neural operators means that they do not suffer from the restriction of a coarse grid that closure models face, and they can provably approximate the long-term statistics of chaotic systems. In our experiments, our PINO model achieves a 330x speedup compared to FRS with a relative error $\sim 10\%$. In contrast, the closure model coupled with a coarse-grid solver is $60$x slower than PINO while having a much higher error $\sim186\%$ when the closure model is trained on the same FRS dataset.

nan


Article 1111

Title@2025-06-09 (1): Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models

Title: Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models Eliciting In-Context Retrieval und Reasoning für Lang-Kontext Große Sprachmodelle 长式长式大语言模型的全文检索和理由 2501.08248v3

Authors (6): Yifu Qiu, Varun Embar, Yizhe Zhang, Navdeep Jaitly, Shay B. Cohen, Benjamin Han

Recent advancements in long-context language models (LCLMs) promise to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With their expanded context windows, LCLMs can process entire knowledge bases and perform retrieval and reasoning directly – a capability we define as In-Context Retrieval and Reasoning (ICR^2). However, existing benchmarks like LOFT often overestimate LCLM performance by providing overly simplified contexts. To address this, we introduce ICR^2, a benchmark that evaluates LCLMs in more realistic scenarios by including confounding passages retrieved with strong retrievers. We then propose three methods to enhance LCLM performance: (1) retrieve-then-generate fine-tuning, (2) retrieval-attention-probing, which uses attention heads to filter and de-noise long contexts during decoding, and (3) joint retrieval head training alongside the generation head. Our evaluation of five well-known LCLMs on LOFT and ICR^2 demonstrates significant gains with our best approach applied to Mistral-7B: +17 and +15 points by Exact Match on LOFT, and +13 and +2 points on ICR^2, compared to vanilla RAG and supervised fine-tuning, respectively. It even outperforms GPT-4-Turbo on most tasks despite being a much smaller model.

nan


Article 1112

Title@2025-06-09 (1): Reinforcement Learning via Implicit Imitation Guidance

Title: Reinforcement Learning via Implicit Imitation Guidance Verstärkung Lernen über Implizite Imitation Anleitung 通过隐含模拟指导加强学习 2506.07505v1

Authors (4): Perry Dong, Alec M. Lessing, Annie S. Chen, Chelsea Finn

We study the problem of sample-efficient reinforcement learning, where prior data such as demonstrations are provided for initialization in lieu of a dense reward signal. A natural approach is to incorporate an imitation learning objective, either as regularization during training or to acquire a reference policy. However, imitation learning objectives can ultimately degrade long-term performance, as they do not directly align with reward maximization. In this work, we propose to use prior data solely for guiding exploration via noise added to the policy, sidestepping the need for explicit behavior cloning constraints. The key insight in our framework, Data-Guided Noise (DGN), is that demonstrations are most useful for identifying which actions should be explored, rather than forcing the policy to take certain actions. Our approach achieves up to 2-3x improvement over prior methods for reinforcement learning from offline data across seven simulated continuous control tasks.

nan


Article 1113

Title@2025-06-09 (1): CaliciBoost: Performance-Driven Evaluation of Molecular Representations for Caco-2 Permeability Prediction

Title: CaliciBoost: Performance-Driven Evaluation of Molecular Representations for Caco-2 Permeability Prediction CaliciBoost: Performance-getriebene Auswertung von molekularen Darstellungen für Caco-2 Permeabilitätsvorhersage CaliciBoost:对Caco-2 渗透性预测的分子代表量进行实绩-驱动评价 2506.08059v1

Authors (11): Huong Van Le, Weibin Ren, Junhong Kim, Yukyung Yun, Young Bin Park, Young Jun Kim, Bok Kyung Han, Inho Choi, Jong IL Park, Hwi-Yeol Yun, Jae-Mun Choi

Caco-2 permeability serves as a critical in vitro indicator for predicting the oral absorption of drug candidates during early-stage drug discovery. To enhance the accuracy and efficiency of computational predictions, we systematically investigated the impact of eight molecular feature representation types including 2D/3D descriptors, structural fingerprints, and deep learning-based embeddings combined with automated machine learning techniques to predict Caco-2 permeability. Using two datasets of differing scale and diversity (TDC benchmark and curated OCHEM data), we assessed model performance across representations and identified PaDEL, Mordred, and RDKit descriptors as particularly effective for Caco-2 prediction. Notably, the AutoML-based model CaliciBoost achieved the best MAE performance. Furthermore, for both PaDEL and Mordred representations, the incorporation of 3D descriptors resulted in a 15.73% reduction in MAE compared to using 2D features alone, as confirmed by feature importance analysis. These findings highlight the effectiveness of AutoML approaches in ADMET modeling and offer practical guidance for feature selection in data-limited prediction tasks.
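The reported figure is a relative reduction in mean absolute error (MAE). A minimal sketch of that arithmetic follows; the permeability values and both prediction lists are invented for illustration and are not taken from the paper, and the helper names are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction_percent(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical values, invented purely for illustration.
measured = [-5.0, -4.5, -6.0, -5.5]
pred_2d = [-4.6, -4.9, -5.5, -5.9]      # stand-in for a 2D-descriptor model
pred_2d_3d = [-4.7, -4.8, -5.6, -5.8]   # stand-in for a 2D + 3D descriptor model

mae_2d = mean_absolute_error(measured, pred_2d)
mae_2d_3d = mean_absolute_error(measured, pred_2d_3d)
reduction = relative_reduction_percent(mae_2d, mae_2d_3d)
print(f"MAE 2D: {mae_2d:.3f}, MAE 2D+3D: {mae_2d_3d:.3f}, reduction: {reduction:.2f}%")
```

The reduction is measured against the 2D-only MAE, so the same absolute improvement yields a larger percentage when the baseline error is already small.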

nan


Article 1114

Title@2025-06-09 (1): BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction

Title: BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction BatteryLife: Ein umfassender Datensatz und Benchmark für die Vorhersage der Akkulaufzeit 电池寿命:电池寿命预测综合数据集和基准 2502.18807v6

Authors (9): Ruifeng Tan, Weixiang Hong, Jiayue Tang, Xibin Lu, Ruijun Ma, Xiang Zheng, Jia Li, Jiaqiang Huang, Tong-Yi Zhang

Battery Life Prediction (BLP), which relies on time series data produced by battery degradation tests, is crucial for battery utilization, optimization, and production. Despite impressive advancements, this research area faces three key challenges. Firstly, the limited size of existing datasets impedes insights into modern battery life data. Secondly, most datasets are restricted to small-capacity lithium-ion batteries tested under a narrow range of diversity in labs, raising concerns about the generalizability of findings. Thirdly, inconsistent and limited benchmarks across studies obscure the effectiveness of baselines and leave it unclear if models popular in other time series fields are effective for BLP. To address these challenges, we propose BatteryLife, a comprehensive dataset and benchmark for BLP. BatteryLife integrates 16 datasets, offering a 2.5 times sample size compared to the previous largest dataset, and provides the most diverse battery life resource with batteries from 8 formats, 59 chemical systems, 9 operating temperatures, and 421 charge/discharge protocols, including both laboratory and industrial tests. Notably, BatteryLife is the first to release battery life datasets of zinc-ion batteries, sodium-ion batteries, and industry-tested large-capacity lithium-ion batteries. With the comprehensive dataset, we revisit the effectiveness of baselines popular in this and other time series fields. Furthermore, we propose CyclePatch, a plug-in technique that can be employed in various neural networks. Extensive benchmarking of 18 methods reveals that models popular in other time series fields can be unsuitable for BLP, and CyclePatch consistently improves model performance establishing state-of-the-art benchmarks. Moreover, BatteryLife evaluates model performance across aging conditions and domains. BatteryLife is available at https://github.com/Ruifeng-Tan/BatteryLife.

nan


Article 1115

Title@2025-06-09 (1): Scalable Sobolev IPM for Probability Measures on a Graph

Title: Scalable Sobolev IPM for Probability Measures on a Graph Skalierbares Sobolev IPM für Wahrscheinlichkeitsmessungen auf einem Graphen 可缩缩缩的 Sobolev 用于对图表进行概率测量的IPM IPM 2502.00737v2

Authors (4): Tam Le, Truyen Nguyen, Hideitsu Hino, Kenji Fukumizu

We investigate the Sobolev IPM problem for probability measures supported on a graph metric space. Sobolev IPM is an important instance of integral probability metrics (IPM), and is obtained by constraining a critic function within a unit ball defined by the Sobolev norm. In particular, it has been used to compare probability measures and is crucial for several theoretical works in machine learning. However, to our knowledge, there are no efficient algorithmic approaches to compute Sobolev IPM effectively, which hinders its practical applications. In this work, we establish a relation between Sobolev norm and weighted $L^p$-norm, and leverage it to propose a \emph{novel regularization} for Sobolev IPM. By exploiting the graph structure, we demonstrate that the regularized Sobolev IPM provides a \emph{closed-form} expression for fast computation. This advancement addresses long-standing computational challenges, and paves the way to apply Sobolev IPM for practical applications, even in large-scale settings. Additionally, the regularized Sobolev IPM is negative definite. Utilizing this property, we design positive-definite kernels upon the regularized Sobolev IPM, and provide preliminary evidence of their advantages for comparing probability measures on a given graph for document classification and topological data analysis.

nan


Article 1116

Title@2025-06-09 (1): Graph-of-Causal Evolution: Challenging Chain-of-Model for Reasoning

Title: Graph-of-Causal Evolution: Challenging Chain-of-Model for Reasoning Graph-of-Causal Evolution: Herausfordernde Modellkette für die Vernunft Causal 进化图:有挑战性的理由模型链 2506.07501v1

Authors (1): Libo Wang

In view of the problem that each subchain in the chain-of-model (CoM) relies only on the information of the previous subchain and may lose long-range dependencies due to the causal mask blocking the global context flow between multi-level subchains, this work proposes a graph of causal evolution (GoCE). Its core principle is to map the implicit token representation into a differentiable and sparse causal adjacency matrix, then permeate causal constraints through each layer of calculation using causal-masked attention and causal-MoE. By combining intervention consistency loss test and self-evolution gate, the dynamic balance between causal structure learning and adaptive updating of transformer architecture is realized. The researcher built experimental environments in sandboxes built with Claude Sonnet 4, o4-mini-high, and DeepSeek R1 respectively with the transformer variant architecture introduced in GoCE. It is evaluated on publicly available datasets including CLUTRR, CLADDER, EX-FEVER, and CausalQA and compared with the baseline LLMs. The finding proves that GoCE strengthens the transformer’s ability to capture long-range causal dependencies, while the ability to self-evolve is improved. It not only surpasses the design of CoM in terms of design principles, but also provides experience for future research on causal learning and continuous adaptive improvement.

nan


Article 1117

Title@2025-06-09 (1): Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks

Title: Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks Mind the Gap: Entfernen der Diskretisierungslücke in differenzierbaren Logic Gate-Netzwerken 牢记差距:消除不同逻辑门网的分化差距 2506.07500v1

Authors (4): Shakir Yousefi, Andreas Plesner, Till Aczel, Roger Wattenhofer

Modern neural networks demonstrate state-of-the-art performance on numerous existing benchmarks; however, their high computational requirements and energy consumption prompt researchers to seek more efficient solutions for real-world deployment. Logic gate networks (LGNs) learn a large network of logic gates for efficient image classification. However, learning a network that can solve a simple problem like CIFAR-10 can take days to weeks to train. Even then, almost half of the network remains unused, causing a discretization gap. This discretization gap hinders real-world deployment of LGNs, as the performance drop between training and inference negatively impacts accuracy. We inject Gumbel noise with a straight-through estimator during training to significantly speed up training, improve neuron utilization, and decrease the discretization gap. We theoretically show that this results from implicit Hessian regularization, which improves the convergence properties of LGNs. We train networks $4.5 \times$ faster in wall-clock time, reduce the discretization gap by $98\%$, and reduce the number of unused gates by $100\%$.
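As a rough illustration of the general technique named in the abstract, the sketch below shows only the forward pass of Gumbel-noise selection with a straight-through output. It is not the authors' implementation: the four-way gate choice, the temperature `tau`, and all function names are assumptions made for this example.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng):
    """Standard Gumbel(0, 1) sample via inverse transform: -log(-log(U))."""
    # Clamp U away from 0 and 1 so both logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_straight_through(logits, rng, tau=1.0):
    """Return (hard, soft) selections for one noisy draw.

    `hard` is the one-hot choice used in the forward pass. In an autodiff
    framework, gradients would flow through `soft` instead, for example via
    hard + soft - stop_gradient(soft); that part is not shown here.
    """
    noisy = [(l + sample_gumbel(rng)) / tau for l in logits]
    soft = softmax(noisy)
    winner = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == winner else 0.0 for i in range(len(soft))]
    return hard, soft


# Hypothetical neuron choosing among four candidate gates.
rng = random.Random(0)
gate_logits = [0.2, 1.5, -0.3, 0.0]
hard, soft = gumbel_straight_through(gate_logits, rng)
print(hard, [round(p, 3) for p in soft])
```

Because the forward pass already uses a discrete choice during training, the network seen at training time is closer to the discretized network used at inference, which is the intuition behind the reduced gap.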

nan


Article 1118

Title@2025-06-09 (1): Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization

Title: Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization Low-Dimension-to-High-Dimension Verallgemeinerung und ihre Implikationen für Länge Verallgemeinerung 低差异至高差异通用化及其对延长通用化的影响 2410.08898v2

Authors (4): Yang Chen, Long Yang, Yitao Liang, Zhouchen Lin

Low-Dimension-to-High-Dimension (LDHD) generalization is a special case of Out-of-Distribution (OOD) generalization, where the training data are restricted to a low-dimensional subspace of the high-dimensional testing space. Assuming that each instance is generated from a latent variable and the dimension of the latent variable reflects the problem scale, the inherent scaling challenge in length generalization can be captured by the LDHD generalization in the latent space. We theoretically demonstrate that LDHD generalization is generally unattainable without exploiting prior knowledge to provide appropriate inductive bias. Specifically, we explore LDHD generalization in Boolean functions. We verify that different architectures trained with (S)GD converge to \emph{min-degree interpolators w.r.t. different independent sets}. LDHD generalization is achievable if and only if the target function coincides with this inductive bias. Applying the insights from LDHD generalization to length generalization, we explain the effectiveness of CoT as changing the structure of the latent space to enable better LDHD generalization. We also propose a principle for position embedding design to handle both the inherent LDHD generalization and nuisances such as the data format. Following the principle, we propose a novel position embedding called RPE-Square that remedies the RPE for dealing with the data format nuisance.

nan


Article 1119

Title@2025-06-09 (1): Explicit Preference Optimization: No Need for an Implicit Reward Model

Title: Explicit Preference Optimization: No Need for an Implicit Reward Model Explizite Preference-Optimierung: Keine Notwendigkeit für ein Implizites Prämienmodell 明确的优惠最优化:没有必要采用隐性奖赏模式 2506.07492v1

Authors (4): Xiangkun Hu, Lemin Kong, Tong He, David Wipf

The generated responses of large language models (LLMs) are often fine-tuned to human preferences through a process called reinforcement learning from human feedback (RLHF). As RLHF relies on a challenging training sequence, whereby a separate reward model is independently learned and then later applied to LLM policy updates, ongoing research effort has targeted more straightforward alternatives. In this regard, direct preference optimization (DPO) and its many offshoots circumvent the need for a separate reward training step. Instead, through the judicious use of a reparameterization trick that induces an \textit{implicit} reward, DPO and related methods consolidate learning to the minimization of a single loss function. And yet despite demonstrable success in some real-world settings, we prove that DPO-based objectives are nonetheless subject to sub-optimal regularization and counter-intuitive interpolation behaviors, underappreciated artifacts of the reparameterizations upon which they are based. To this end, we introduce an \textit{explicit} preference optimization framework termed EXPO that requires no analogous reparameterization to achieve an implicit reward. Quite differently, we merely posit intuitively-appealing regularization factors from scratch that transparently avoid the potential pitfalls of key DPO variants, provably satisfying regularization desiderata that prior methods do not. Empirical results serve to corroborate our analyses and showcase the efficacy of EXPO.

nan


Article 1120

Title@2025-06-09 (1): Binary Classifier Optimization for Large Language Model Alignment

Title: Binary Classifier Optimization for Large Language Model Alignment Binärklassifikator-Optimierung für großsprachliche Modellausrichtung 大语言模型对齐的二进制分类器优化 2404.04656v2

Authors (4): Seungjae Jung, Gunsoo Han, Daniel Wontae Nam, Kyoung-Woon On

In real-world services such as ChatGPT, aligning models based on user feedback is crucial for improving model performance. However, due to the simplicity and convenience of providing feedback, users typically offer only basic binary signals, such as ‘thumbs-up’ or ‘thumbs-down’. Most existing alignment research, on the other hand, relies on preference-based approaches that require both positive and negative responses as a pair. We propose Binary Classifier Optimization (BCO), a technique that effectively aligns LLMs using only binary feedback. BCO trains a binary classifier, where the logit serves as an implicit reward, effectively minimizing the Direct Preference Optimization (DPO) loss. We demonstrate that the binary cross-entropy loss employed in classifier training acts as an upper bound for the DPO loss. Additionally, a novel reward shift technique further minimizes the gap between the losses. We validate our methodology in two settings: first, on a paired preference dataset, where our method performs on par with DPO; and second, on a Likert-5 scale annotation dataset which stems from real users’ queries. Our model consistently demonstrates effective and robust alignment across four base LLMs and three different datasets, showcasing the strength of our approach to learning from binary signals.
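The upper-bound claim can be checked numerically with a small sketch. It assumes the implicit rewards are plain numbers and uses the standard logistic forms of both losses; the `shift` parameter is only a hypothetical stand-in for the paper's reward shift, whose exact definition is not given in the abstract. The bound follows from the identity sigmoid(a) * sigmoid(b) <= sigmoid(a + b).

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(r_chosen, r_rejected):
    """Pairwise logistic loss on the reward margin."""
    return -log_sigmoid(r_chosen - r_rejected)


def bce_loss(r_chosen, r_rejected, shift=0.0):
    """Binary cross-entropy: chosen labeled 1, rejected labeled 0.

    `shift` is an illustrative offset subtracted from both rewards.
    """
    return -log_sigmoid(r_chosen - shift) - log_sigmoid(shift - r_rejected)


# Check BCE >= DPO on a grid of rewards and shifts.
grid = [x / 2.0 for x in range(-8, 9)]
bound_holds = all(
    bce_loss(rc, rr, s) + 1e-12 >= dpo_loss(rc, rr)
    for rc in grid for rr in grid for s in grid
)
print("BCE upper-bounds DPO on the grid:", bound_holds)

# A shift between the two rewards tightens the bound in this example.
print(dpo_loss(2.0, -1.0), bce_loss(2.0, -1.0, 0.0), bce_loss(2.0, -1.0, 0.5))
```

In this toy case the midpoint shift of 0.5 gives a smaller BCE value than a shift of 0, which is consistent with the abstract's statement that shifting the reward narrows the gap between the two losses.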

nan


Article 1121

Title@2025-06-09 (1): CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization

Title: CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model für die Kontextoptimierung CoCOA-混合:环境优化组合模型 2506.07484v1

Authors (3): Dasol Hong, Wooju Lee, Hyun Myung

Prompt tuning, which adapts vision-language models by freezing model parameters and optimizing only the prompt, has proven effective for task-specific adaptations. The core challenge in prompt tuning is improving specialization for a specific task and generalization for unseen domains. However, frozen encoders often produce misaligned features, leading to confusion between classes and limiting specialization. To overcome this issue, we propose a confusion-aware loss (CoA-loss) that improves specialization by refining the decision boundaries between confusing classes. Additionally, we mathematically demonstrate that a mixture model can enhance generalization without compromising specialization. This is achieved using confidence-aware weights (CoA-weights), which adjust the weights of each prediction in the mixture model based on its confidence within the class domains. Extensive experiments show that CoCoA-Mix, a mixture model with CoA-loss and CoA-weights, outperforms state-of-the-art methods by enhancing specialization and generalization. Our code is publicly available at https://github.com/url-kaist/CoCoA-Mix.

nan


Article 1122

Title@2025-06-09 (1): Premise Selection for a Lean Hammer

Title: Premise Selection for a Lean Hammer Premise Auswahl für einen Lean Hammer 精精精锤预选 2506.07477v1

Authors (5): Thomas Zhu, Joshua Clune, Jeremy Avigad, Albert Qiaochu Jiang, Sean Welleck

Neural methods are transforming automated reasoning for proof assistants, yet integrating these advances into practical verification workflows remains challenging. Hammers are tools that interface with external automatic theorem provers to automate tedious reasoning steps. They have dramatically improved productivity in proof assistants, but the Lean proof assistant still does not have a hammer despite its growing popularity. We present LeanHammer, the first end-to-end domain-general hammer for Lean, built on a novel neural premise selection system for a hammer in dependent type theory. Unlike existing Lean premise selectors, our approach dynamically adapts to user-specific contexts and combines with symbolic proof search and reconstruction to create a practical hammer. With comprehensive evaluations, we show that our premise selector enables LeanHammer to solve 21% more goals relative to existing premise selectors, and generalizes well to diverse domains. Our work bridges the gap between neural retrieval and symbolic reasoning, making formal verification more accessible to researchers and practitioners.

nan


Article 1123

Title@2025-06-09 (1): FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models

Title: FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models FRED: Flexible Reduction-Distribution Interconnect und Communication Implementation für Wafer-Scale Distributed Training von DNN-Modellen FRED: DNN模型的宽度分布式分散式培训灵活弹性吸收-分散式互联和通信实施 2406.19580v2

Authors (5): Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna

Distributed Deep Neural Network (DNN) training is a technique to reduce the training overhead by distributing the training tasks across multiple accelerators, according to a parallelization strategy. However, high-performance compute and interconnects are needed for maximum speed-up and linear scaling of the system. Wafer-scale systems are a promising technology that allows for tightly integrating high-end accelerators with high-speed wafer-scale interconnects, making them an attractive platform for distributed training. However, the wafer-scale interconnect should offer high performance and flexibility for various parallelization strategies to enable maximum optimizations for compute and memory usage. In this paper, we propose FRED, a wafer-scale interconnect that is tailored for the high-BW requirements of wafer-scale networks and can efficiently execute communication patterns of different parallelization strategies. Furthermore, FRED supports in-switch collective communication execution that reduces the network traffic by approximately 2X. Our results show that FRED can improve the average end-to-end training time of ResNet-152, Transformer-17B, GPT-3, and Transformer-1T by 1.76X, 1.87X, 1.34X, and 1.4X, respectively, when compared to a baseline wafer-scale 2D-Mesh fabric.

nan


Article 1124

Title@2025-06-09 (1): Deep Ridgelet Transform and Unified Universality Theorem for Deep and Shallow Joint-Group-Equivariant Machines

Title: Deep Ridgelet Transform and Unified Universality Theorem for Deep and Shallow Joint-Group-Equivariant Machines Deep Ridgelet Transform und Unified Universality Theorem für tiefe und flache Gelenk-Gruppe-Equivariante Maschinen 深脊变异和统一通用深海和浅水联合小组-平等合用机器理论 2405.13682v5

Authors (4): Sho Sonoda, Yuka Hashimoto, Isao Ishikawa, Masahiro Ikeda

We present a constructive universal approximation theorem for learning machines equipped with joint-group-equivariant feature maps, called the joint-equivariant machines, based on the group representation theory. "Constructive" here indicates that the distribution of parameters is given in a closed-form expression known as the ridgelet transform. Joint-group-equivariance encompasses a broad class of feature maps that generalize classical group-equivariance. Particularly, fully-connected networks are not group-equivariant but are joint-group-equivariant. Our main theorem also unifies the universal approximation theorems for both shallow and deep networks. Until this study, the universality of deep networks has been shown in a different manner from the universality of shallow networks, but our results discuss them on common ground. Now we can understand the approximation schemes of various learning machines in a unified manner. As applications, we show the constructive universal approximation properties of four examples: depth-$n$ joint-equivariant machine, depth-$n$ fully-connected network, depth-$n$ group-convolutional network, and a new depth-$2$ network with quadratic forms whose universality has not been known.

nan


Article 1125

Title@2025-06-09 (1): Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

Title: Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models Verfolgung beweglicher Ziele mit Online-Selbstspiel-Verstärkung Lernen für sicherere Sprachmodelle 利用在线加强自身能力学习,建立更安全语言模式,以追踪移动目标 2506.07468v1

Authors (7): Mickel Liu, Liwei Jiang, Yancheng Liang, Simon Shaolei Du, Yejin Choi, Tim Althoff, Natasha Jaques

Conventional language model (LM) safety alignment relies on a reactive, disjoint procedure: attackers exploit a static model, followed by defensive fine-tuning to patch exposed vulnerabilities. This sequential approach creates a mismatch – attackers overfit to obsolete defenses, while defenders perpetually lag behind emerging threats. To address this, we propose Self-RedTeam, an online self-play reinforcement learning algorithm where an attacker and defender agent co-evolve through continuous interaction. We cast safety alignment as a two-player zero-sum game, where a single model alternates between attacker and defender roles – generating adversarial prompts and safeguarding against them – while a reward LM adjudicates outcomes. This enables dynamic co-adaptation. Grounded in the game-theoretic framework of zero-sum games, we establish a theoretical safety guarantee which motivates the design of our method: if self-play converges to a Nash Equilibrium, the defender will reliably produce safe responses to any adversarial input. Empirically, Self-RedTeam uncovers more diverse attacks (+21.8% SBERT) compared to attackers trained against static defenders and achieves higher robustness on safety benchmarks (e.g., +65.5% on WildJailBreak) than defenders trained against static attackers. We further propose hidden Chain-of-Thought, allowing agents to plan privately, which boosts adversarial diversity and reduces over-refusals. Our results motivate a shift from reactive patching to proactive co-evolution in LM safety training, enabling scalable, autonomous, and robust self-improvement of LMs via multi-agent reinforcement learning (MARL).

nan


Article 1126

Title@2025-06-09 (1): Physics-Informed Deep Learning Model for Line-integral Diagnostics Across Fusion Devices

Title: Physics-Informed Deep Learning Model for Line-integral Diagnostics Across Fusion Devices Physik-informiertes Deep-Learning-Modell für linienintegrierte Diagnosen über Fusionsgeräte 横穿融合装置的直线整体诊断法深学习模型 2412.00087v3

Authors (13): Cong Wang, Weizhe Yang, Haiping Wang, Renjie Yang, Jing Li, Zhijun Wang, Yixiong Wei, Xianli Huang, Chenshu Hu, Zhaoyang Liu, Xinyao Yu, Changqing Zou, Zhifeng Zhao

Rapid reconstruction of 2D plasma profiles from line-integral measurements is important in nuclear fusion. This paper introduces a physics-informed model architecture called Onion that can enhance the performance of models and be adapted to various backbone networks. The model under Onion incorporates physical information by a multiplication process and applies the physics-informed loss function according to the principle of line integration. Prediction results demonstrate that the additional input of physical information improves the deep learning model’s ability, leading to a reduction in the average relative error E_1 between the reconstruction profiles and the target profiles by approximately 0.84x10^(-2) on synthetic datasets and about 0.06x10^(-2) on experimental datasets. Furthermore, the implementation of the Softplus activation function in the final two fully connected layers improves model performance. This enhancement results in a reduction in E_1 by approximately 1.06x10^(-2) on synthetic datasets and about 0.11x10^(-2) on experimental datasets. The incorporation of the physics-informed loss function has been shown to correct the model’s predictions, bringing the back-projections closer to the actual inputs and reducing the errors associated with inversion algorithms. In addition, we have developed a synthetic data model to generate customized line-integral diagnostic datasets and have also collected soft x-ray diagnostic datasets from EAST and HL-2A. This study achieves reductions in reconstruction errors and accelerates the development of surrogate models in fusion research.
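One plausible reading of a loss "according to the principle of line integration" is a penalty on the mismatch between back-projections of the predicted profile and the measured line integrals. The sketch below illustrates that idea only; the pixel-sum approximation, the chord layout, the mean-squared form, and all names are assumptions for this example and may differ from the loss used in Onion.

```python
def line_integrals(profile, chords, step=1.0):
    """Approximate each chord's line integral as a sum over the pixels it crosses."""
    return [step * sum(profile[r][c] for r, c in chord) for chord in chords]


def line_integral_loss(profile, chords, measured, step=1.0):
    """Mean squared mismatch between back-projections and measurements."""
    projected = line_integrals(profile, chords, step)
    return sum((p - m) ** 2 for p, m in zip(projected, measured)) / len(measured)


# Hypothetical 2x2 profile viewed along two horizontal and two vertical chords.
chords = [
    [(0, 0), (0, 1)],  # top row
    [(1, 0), (1, 1)],  # bottom row
    [(0, 0), (1, 0)],  # left column
    [(0, 1), (1, 1)],  # right column
]
true_profile = [[1.0, 2.0], [3.0, 4.0]]
measured = line_integrals(true_profile, chords)

predicted_profile = [[1.0, 2.0], [3.0, 5.0]]  # one pixel is off by 1.0
loss = line_integral_loss(predicted_profile, chords, measured)
print(measured, loss)
```

The loss needs only the measurements and the chord geometry, not the true profile, which is why a term of this kind can be applied to experimental data where no ground-truth reconstruction exists.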

nan


Article 1127

Title@2025-06-09 (1): Circumventing Backdoor Space via Weight Symmetry

Title: Circumventing Backdoor Space via Weight Symmetry Umgehung des Backdoor-Raums über die Gewichtssymmetrie 通过重量对称法环绕后门空间 2506.07467v1

Authors (7): Jie Peng, Hongwei Yang, Jing Zhao, Hengji Dong, Hui He, Weizhe Zhang, Haoyu He

Deep neural networks are vulnerable to backdoor attacks, where malicious behaviors are implanted during training. While existing defenses can effectively purify compromised models, they typically require labeled data or specific training procedures, making them difficult to apply beyond supervised learning settings. Notably, recent studies have shown successful backdoor attacks across various learning paradigms, highlighting a critical security concern. To address this gap, we propose Two-stage Symmetry Connectivity (TSC), a novel backdoor purification defense that operates independently of data format and requires only a small fraction of clean samples. Through theoretical analysis, we prove that by leveraging permutation invariance in neural networks and quadratic mode connectivity, TSC amplifies the loss on poisoned samples while maintaining bounded clean accuracy. Experiments demonstrate that TSC achieves robust performance comparable to state-of-the-art methods in supervised learning scenarios. Furthermore, TSC generalizes to self-supervised learning frameworks, such as SimCLR and CLIP, maintaining its strong defense capabilities. Our code is available at https://github.com/JiePeng104/TSC.

nan


Article 1128

Title@2025-06-09 (1): Parameter-Efficient Fine-Tuning of State Space Models

Title: Parameter-Efficient Fine-Tuning of State Space Models Parameter-Effizient Feintuning von State Space Models 国家空间模型有效精美设计 2410.09016v3

Authors (5): Kevin Galim, Wonjun Kang, Yuchen Zeng, Hyung Il Koo, Kangwook Lee

Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have become powerful tools for language modeling, offering high performance and linear scalability with sequence length. However, the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models remains largely underexplored. We start by investigating two fundamental questions on existing PEFT methods: (i) How do they perform on SSM-based models? (ii) Which parameters should they target for optimal results? Our analysis shows that LoRA and its variants consistently outperform all other PEFT methods. While LoRA is effective for linear projection matrices, it fails on SSM modules, yet still outperforms other methods applicable to SSMs, indicating their limitations. This underscores the need for a specialized SSM tuning approach. To address this, we propose Sparse Dimension Tuning (SDT), a PEFT method tailored for SSM modules. Combining SDT for SSMs with LoRA for linear projection matrices, we achieve state-of-the-art performance across extensive experiments.
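For readers unfamiliar with LoRA, the sketch below shows the low-rank update it applies to a frozen linear projection matrix, W' = W + (alpha / r) * B A. It covers only that standard formula and does not attempt to represent SDT, whose details are not given in the abstract; the matrices and helper names are made up for illustration.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w, lora_b, lora_a, alpha, rank):
    """Frozen weight plus the scaled low-rank update (alpha / rank) * B @ A."""
    delta = matmul(lora_b, lora_a)
    scale = alpha / rank
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Hypothetical 2x3 projection adapted with rank-1 factors.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # frozen pretrained weight
lora_b = [[1.0], [2.0]]                   # trainable, shape 2 x r
lora_a = [[0.5, 0.0, -1.0]]               # trainable, shape r x 3
w_adapted = lora_effective_weight(w, lora_b, lora_a, alpha=2.0, rank=1)

trainable = len(lora_b) * len(lora_b[0]) + len(lora_a) * len(lora_a[0])
full = len(w) * len(w[0])
print(w_adapted, f"trainable {trainable} of {full}")
```

At this toy size the saving is negligible, but the trainable count grows as r * (rows + columns) rather than rows * columns, which is where the parameter efficiency comes from on large projection matrices.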

nan


Article 1129

Title@2025-06-09 (1): KScope: A Framework for Characterizing the Knowledge Status of Language Models

Title: KScope: A Framework for Characterizing the Knowledge Status of Language Models KScope: Ein Rahmen für die Charakterisierung des Wissensstatus von Sprachmodellen KCope: 语言模式知识状况描述框架 2506.07458v1

Authors (6): Yuxin Xiao, Shan Chen, Jack Gallifant, Danielle Bitterman, Thomas Hartvigsen, Marzyeh Ghassemi

Characterizing a large language model’s (LLM’s) knowledge of a given question is challenging. As a result, prior work has primarily examined LLM behavior under knowledge conflicts, where the model’s internal parametric memory contradicts information in the external context. However, this does not fully reflect how well the model knows the answer to the question. In this paper, we first introduce a taxonomy of five knowledge statuses based on the consistency and correctness of LLM knowledge modes. We then propose KScope, a hierarchical framework of statistical tests that progressively refines hypotheses about knowledge modes and characterizes LLM knowledge into one of these five statuses. We apply KScope to nine LLMs across four datasets and systematically establish: (1) Supporting context narrows knowledge gaps across models. (2) Context features related to difficulty, relevance, and familiarity drive successful knowledge updates. (3) LLMs exhibit similar feature preferences when partially correct or conflicted, but diverge sharply when consistently wrong. (4) Context summarization constrained by our feature analysis, together with enhanced credibility, further improves update effectiveness and generalizes across LLMs.



Article 1130

Title@2025-06-09 (1): When Style Breaks Safety: Defending Language Models Against Superficial Style Alignment

Title: When Style Breaks Safety: Defending Language Models Against Superficial Style Alignment Wenn Stil die Sicherheit bricht: Verteidigung von Sprachmodellen gegen hocheffiziente Stilausrichtung 当样式打破安全时: 防御语言模型, 对抗超优样式对齐 2506.07452v1

Authors (5): Yuxin Xiao, Sana Tonekaboni, Walter Gerych, Vinith Suriyakumar, Marzyeh Ghassemi

Large language models (LLMs) can be prompted with specific styles (e.g., formatting responses as lists), including in jailbreak queries. Although these style patterns are semantically unrelated to the malicious intents behind jailbreak queries, their safety impact remains unclear. In this work, we seek to understand whether style patterns compromise LLM safety, how superficial style alignment increases model vulnerability, and how best to mitigate these risks during alignment. We evaluate 32 LLMs across seven jailbreak benchmarks, and find that malicious queries with style patterns inflate the attack success rate (ASR) for nearly all models. Notably, ASR inflation correlates with both the length of style patterns and the relative attention an LLM exhibits on them. We then investigate superficial style alignment, and find that fine-tuning with specific styles makes LLMs more vulnerable to jailbreaks of those same styles. Finally, we propose SafeStyle, a defense strategy that incorporates a small amount of safety training data augmented to match the distribution of style patterns in the fine-tuning data. Across three LLMs and five fine-tuning style settings, SafeStyle consistently outperforms baselines in maintaining LLM safety.



Article 1131

Title@2025-06-09 (1): Extending Epistemic Uncertainty Beyond Parameters Would Assist in Designing Reliable LLMs

Title: Extending Epistemic Uncertainty Beyond Parameters Would Assist in Designing Reliable LLMs Die Erweiterung der epistemischen Unsicherheit über Parameter hinaus würde helfen, zuverlässige LLMs zu konzipieren 超越参数的不确定性延伸有助于设计可靠的长效磁环 2506.07448v1

Authors (4): T. Duy Nguyen-Hien, Desi R. Ivanova, Yee Whye Teh, Wee Sun Lee

Although large language models (LLMs) are highly interactive and extendable, current approaches to ensure reliability in deployments remain mostly limited to rejecting outputs with high uncertainty in order to avoid misinformation. This conservative strategy reflects the current lack of tools to systematically distinguish and respond to different sources of uncertainty. In this paper, we advocate for the adoption of Bayesian Modeling of Experiments – a framework that provides a coherent foundation to reason about uncertainty and clarify the reducibility of uncertainty – for managing and proactively addressing uncertainty that arises in LLM deployments. This framework enables LLMs and their users to take contextually appropriate steps, such as requesting clarification, retrieving external information, or refining inputs. By supporting active resolution rather than passive avoidance, it opens the door to more reliable, transparent, and broadly applicable LLM systems, particularly in high-stakes, real-world settings.



Article 1132

Title@2025-06-09 (1): One Fits All: Learning Fair Graph Neural Networks for Various Sensitive Attributes

Title: One Fits All: Learning Fair Graph Neural Networks for Various Sensitive Attributes Eins passt auf alle: Faire Graphen-Neural-Netzwerke für verschiedene Sensitive Attribute lernen ” 一个适合所有人 “ :学习各种敏感属性的公平图表神经网络 2406.13544v3

Authors (5): Yuchang Zhu, Jintang Li, Yatao Bian, Zibin Zheng, Liang Chen

Recent studies have highlighted fairness issues in Graph Neural Networks (GNNs), where they produce discriminatory predictions against specific protected groups categorized by sensitive attributes such as race and age. While various efforts to enhance GNN fairness have made significant progress, these approaches are often tailored to specific sensitive attributes. Consequently, they necessitate retraining the model from scratch to accommodate changes in the sensitive attribute requirement, resulting in high computational costs. To gain deeper insights into this issue, we approach the graph fairness problem from a causal modeling perspective, where we identify the confounding effect induced by the sensitive attribute as the underlying reason. Motivated by this observation, we formulate the fairness problem in graphs from an invariant learning perspective, which aims to learn invariant representations across environments. Accordingly, we propose a graph fairness framework based on invariant learning, namely FairINV, which enables the training of fair GNNs to accommodate various sensitive attributes within a single training session. Specifically, FairINV incorporates sensitive attribute partition and trains fair GNNs by eliminating spurious correlations between the label and various sensitive attributes. Experimental results on several real-world datasets demonstrate that FairINV significantly outperforms state-of-the-art fairness approaches, underscoring its effectiveness. Our code is available via: https://github.com/ZzoomD/FairINV/.



Article 1133

Title@2025-06-09 (1): Resilience of Rademacher chaos of low degree

Title: Resilience of Rademacher chaos of low degree Resilienz von Rademacher Chaos von niedrigem Grad Rademacher低程度混乱的复原能力 2402.10504v5

Authors (3): Elad Aigner-Horev, Daniel Rosenberg, Roi Weiss

The resilience of a Rademacher chaos is the maximum number of adversarial sign-flips that the chaos can sustain without having its largest atom probability significantly altered. Inspired by probabilistic lower-bound guarantees for the resilience of linear Rademacher chaos, obtained by Bandeira, Ferber, and Kwan (Advances in Mathematics, Vol. 319, 2017), we provide probabilistic lower-bound guarantees for the resilience of Rademacher chaos of arbitrary yet sufficiently low degree. Our main results distinguish between Rademacher chaos of order two and those of higher order. Our first main result pertains to the resilience of decoupled bilinear Rademacher forms, where different asymptotic behaviour is observed for sparse and dense matrices. For our second main result, we bootstrap our first result in order to provide resilience guarantees for quadratic Rademacher chaos. Our third main result generalises the first and handles the resilience of decoupled Rademacher chaos of arbitrary yet sufficiently low order. Although our results for decoupled Rademacher chaos of order two and of higher order are established through the same conceptual framework, they differ substantially; the difference arises from how that same conceptual argument is implemented. The order two result is established using Dudley’s maximal inequality for sub-Gaussian processes, the Hanson-Wright inequality, as well as the Kolmogorov-Rogozin inequality. To handle higher order chaos, appeals to Dudley’s inequality as well as the Hanson-Wright inequality are replaced with tools suited for random tensors. In particular, appeals to the Hanson-Wright inequality are replaced with appeals to a concentration result for random tensors put forth by Adamczak and Wolff. Our results are instance-dependent and thus allow for the efficient computation of resilience guarantees provided the order of the chaos is constant.
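
To make the quantity named in this abstract concrete, the following minimal sketch computes the largest atom probability of a decoupled bilinear Rademacher form x^T M y by brute-force enumeration. It assumes that "largest atom probability" means the maximum over values v of P(x^T M y = v), with x and y independent uniform sign vectors. The function name is hypothetical, and the sketch does not compute resilience or reproduce any of the paper's bounds.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def largest_atom_probability(matrix):
    """Return max_v P(x^T M y = v) for independent uniform sign vectors x, y.

    Brute force over all 2^(rows + cols) sign patterns, so only usable
    for very small matrices. Assumed reading of "largest atom probability".
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    counts = Counter()
    for x in product((-1, 1), repeat=n_rows):
        for y in product((-1, 1), repeat=n_cols):
            value = sum(
                x[i] * matrix[i][j] * y[j]
                for i in range(n_rows)
                for j in range(n_cols)
            )
            counts[value] += 1
    total = 2 ** (n_rows + n_cols)
    return Fraction(max(counts.values()), total)


# Dense all-ones matrix versus the sparse identity matrix.
dense = largest_atom_probability([[1, 1], [1, 1]])
sparse = largest_atom_probability([[1, 0], [0, 1]])
```

In this toy case the all-ones matrix concentrates more mass on a single value (zero) than the identity matrix does, which gives a small illustration of how the structure of the matrix affects the atom probability.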



Article 1134

Title@2025-06-09 (1): Position: We Need Responsible, Application-Driven (RAD) AI Research

Title: Position: We Need Responsible, Application-Driven (RAD) AI Research Position: Wir brauchen verantwortungsbewusste, anwendungsgetriebene (RAD) KI-Forschung 立场:我们需要负责任、应用驱动(RAD)AI研究 2505.04104v2

Authors (4): Sarah Hartman, Cheng Soon Ong, Julia Powles, Petra Kuhnert

This position paper argues that achieving meaningful scientific and societal advances with artificial intelligence (AI) requires a responsible, application-driven approach (RAD) to AI research. As AI is increasingly integrated into society, AI researchers must engage with the specific contexts where AI is being applied. This includes being responsive to ethical and legal considerations, technical and societal constraints, and public discourse. We present the case for RAD-AI to drive research through a three-staged approach: (1) building transdisciplinary teams and people-centred studies; (2) addressing context-specific methods, ethical commitments, assumptions, and metrics; and (3) testing and sustaining efficacy through staged testbeds and a community of practice. We present a vision for the future of application-driven AI research to unlock new value through technically feasible methods that are adaptive to the contextual needs and values of the communities they ultimately serve.



Article 1135

Title@2025-06-09 (1): Straight-Line Diffusion Model for Efficient 3D Molecular Generation

Title: Straight-Line Diffusion Model for Efficient 3D Molecular Generation Geradliniges Diffusionsmodell für effiziente 3D-Molekulargeneration 高效 3D 分子生成的直线扩散模型 2503.02918v2

Authors (8): Yuyan Ni, Shikun Feng, Haohan Chi, Bowen Zheng, Huan-ang Gao, Wei-Ying Ma, Zhi-Ming Ma, Yanyan Lan

Diffusion-based models have shown great promise in molecular generation but often require a large number of sampling steps to generate valid samples. In this paper, we introduce a novel Straight-Line Diffusion Model (SLDM) to tackle this problem, by formulating the diffusion process to follow a linear trajectory. The proposed process aligns well with the noise sensitivity characteristic of molecular structures and uniformly distributes reconstruction effort across the generative process, thus enhancing learning efficiency and efficacy. Consequently, SLDM achieves state-of-the-art performance on 3D molecule generation benchmarks, delivering a 100-fold improvement in sampling efficiency.



Article 1136

Title@2025-06-09 (1): Federated In-Context Learning: Iterative Refinement for Improved Answer Quality

Title: Federated In-Context Learning: Iterative Refinement for Improved Answer Quality Föderiertes In-Context-Lernen: Iterative Veredelung für verbesserte Antwortqualität 联邦内部学习:改进答复质量的过渡性改进 2506.07440v1

Authors (8): Ruhan Wang, Zhiyong Wang, Chengkai Huang, Rui Wang, Tong Yu, Lina Yao, John C. S. Lui, Dongruo Zhou

For question-answering (QA) tasks, in-context learning (ICL) enables language models to generate responses without modifying their parameters by leveraging examples provided in the input. However, the effectiveness of ICL heavily depends on the availability of high-quality examples, which are often scarce due to data privacy constraints, annotation costs, and distribution disparities. A natural solution is to utilize examples stored on client devices, but existing approaches either require transmitting model parameters (incurring significant communication overhead) or fail to fully exploit local datasets, limiting their effectiveness. To address these challenges, we propose Federated In-Context Learning (Fed-ICL), a general framework that enhances ICL through an iterative, collaborative process. Fed-ICL progressively refines responses by leveraging multi-round interactions between clients and a central server, improving answer quality without the need to transmit model parameters. We establish theoretical guarantees for the convergence of Fed-ICL and conduct extensive experiments on standard QA benchmarks, demonstrating that our proposed approach achieves strong performance while maintaining low communication costs.



Article 1137

Title@2025-06-09 (1): Fast Geometric Embedding for Node Influence Maximization

Title: Fast Geometric Embedding for Node Influence Maximization Schnelle geometrische Einbettung für Knotenbeeinflussung Maximierung 节点影响最大化快速几何嵌入 2506.07435v1

Authors (2): Alexander Kolpakov, Igor Rivin

Computing classical centrality measures such as betweenness and closeness is computationally expensive on large-scale graphs. In this work, we introduce an efficient force layout algorithm that embeds a graph into a low-dimensional space, where the radial distance from the origin serves as a proxy for various centrality measures. We evaluate our method on multiple graph families and demonstrate strong correlations with degree, PageRank, and paths-based centralities. As an application, the proposed embedding makes it possible to find high-influence nodes in a network, and provides a fast and scalable alternative to the standard greedy algorithm.



Article 1138

Title@2025-06-09 (1): Cartridges: Lightweight and general-purpose long context representations via self-study

Title: Cartridges: Lightweight and general-purpose long context representations via self-study Patronen: Leichte und universelle lange Kontextdarstellungen durch Selbststudium Cartridges:轻量和一般用途长背景介绍,通过自学 2506.06266v2

Authors (11): Sabri Eyuboglu, Ryan Ehrlich, Simran Arora, Neel Guha, Dylan Zinsley, Emily Liu, Will Tennien, Atri Rudra, James Zou, Azalia Mirhoseini, Christopher Re

Large language models are often used to answer queries grounded in large text corpora (e.g. codebases, legal documents, or chat histories) by placing the entire corpus in the context window and leveraging in-context learning (ICL). Although current models support contexts of 100K-1M tokens, this setup is costly to serve because the memory consumption of the KV cache scales with input length. We explore an alternative: training a smaller KV cache offline on each corpus. At inference time, we load this trained KV cache, which we call a Cartridge, and decode a response. Critically, the cost of training a Cartridge can be amortized across all the queries referencing the same corpus. However, we find that the naive approach of training the Cartridge with next-token prediction on the corpus is not competitive with ICL. Instead, we propose self-study, a training recipe in which we generate synthetic conversations about the corpus and train the Cartridge with a context-distillation objective. We find that Cartridges trained with self-study replicate the functionality of ICL, while being significantly cheaper to serve. On challenging long-context benchmarks, Cartridges trained with self-study match ICL performance while using 38.6x less memory and enabling 26.4x higher throughput. Self-study also extends the model’s effective context length (e.g. from 128k to 484k tokens on MTOB) and, surprisingly, leads to Cartridges that can be composed at inference time without retraining.
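
The claim that KV-cache memory scales with input length can be checked with a short back-of-the-envelope calculation. The sketch below uses the usual accounting for a decoder-only Transformer (one key and one value vector per layer, per KV head, per token). The model configuration is a hypothetical example chosen for round numbers, not a configuration taken from the paper.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size in bytes for one sequence.

    The leading factor 2 covers storing both keys and values.
    bytes_per_value=2 corresponds to 16-bit storage (an assumption).
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration: 32 layers, 8 KV heads, head dimension 128.
per_token = kv_cache_bytes(1, n_layers=32, n_kv_heads=8, head_dim=128)
full_context = kv_cache_bytes(128 * 1024, n_layers=32, n_kv_heads=8, head_dim=128)

print(f"{per_token / 1024:.0f} KiB per token")         # 128 KiB per token
print(f"{full_context / 2**30:.0f} GiB at 128k tokens")  # 16 GiB at 128k tokens
```

Because the cost is linear in `seq_len`, every doubling of the corpus placed in context doubles the cache, which is the serving cost that a smaller, trained cache is meant to avoid.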



Article 1139

Title@2025-06-09 (1): Multi-agent Architecture Search via Agentic Supernet

Title: Multi-agent Architecture Search via Agentic Supernet Multi-Agent Architektur Suche über Agentic Supernet 通过 Agric Supernet 多剂机构建筑搜索 2502.04180v2

Authors (6): Guibin Zhang, Luyang Niu, Junfeng Fang, Kun Wang, Lei Bai, Xiang Wang

Large Language Model (LLM)-empowered multi-agent systems extend the cognitive boundaries of individual agents through disciplined collaboration and interaction, while constructing these systems often requires labor-intensive manual designs. Despite the availability of methods to automate the design of agentic workflows, they typically seek to identify a static, complex, one-size-fits-all system, which, however, fails to dynamically allocate inference resources based on the difficulty and domain of each query. To address this challenge, we shift away from the pursuit of a monolithic agentic system, instead optimizing the agentic supernet, a probabilistic and continuous distribution of agentic architectures. We introduce MaAS, an automated framework that samples query-dependent agentic systems from the supernet, delivering high-quality solutions and tailored resource allocation (e.g., LLM calls, tool calls, token cost). Comprehensive evaluation across six benchmarks demonstrates that MaAS (I) requires only 6% to 45% of the inference costs of existing handcrafted or automated multi-agent systems, (II) surpasses them by 0.54% to 11.82%, and (III) enjoys superior cross-dataset and cross-LLM-backbone transferability.



Article 1140

Title@2025-06-09 (1): FlashDMoE: Fast Distributed MoE in a Single Kernel

Title: FlashDMoE: Fast Distributed MoE in a Single Kernel FlashDMoE: Schnell verteiltes MoE in einem einzigen Kernel FlashDMoE: 在一个单一核心中快速分布的教育部 2506.04667v2

Authors (3): Osayamen Jonathan Aimuyo, Byungsoo Oh, Rachee Singh

The computational sparsity of Mixture-of-Experts (MoE) models enables sub-linear growth in compute cost as model size increases, thus offering a scalable path to training massive neural networks. However, existing implementations suffer from low GPU utilization, significant latency overhead, and a fundamental inability to leverage task locality, primarily due to CPU-managed scheduling, host-initiated communication, and frequent kernel launches. To overcome these limitations, we develop FlashDMoE, a fully GPU-resident MoE operator that fuses expert computation and inter-GPU communication into a single persistent GPU kernel. FlashDMoE enables fine-grained pipelining of dispatch, compute, and combine phases, eliminating launch overheads and reducing idle gaps. Unlike existing work, FlashDMoE obviates bulk-synchronous collectives for one-sided, device-initiated, inter-GPU (R)DMA transfers, thus unlocking payload efficiency, where we eliminate bloated or redundant network payloads in sparsely activated layers. When evaluated on a single 8-H100 GPU node with MoE models having up to 128 experts and 16K token sequences, FlashDMoE achieves up to 9$\times$ higher GPU utilization, 6$\times$ lower latency, 5.7$\times$ higher throughput, and 4$\times$ better overlap efficiency compared to state-of-the-art baselines, despite using FP32 while baselines use FP16. FlashDMoE demonstrates that principled GPU kernel-hardware co-design is key to unlocking the performance ceiling of large-scale distributed ML workloads.



Article 1141

Title@2025-06-09 (1): State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models

Title: State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models State-offset Tuning: State-based Parameter-Efficient Fine-Tuning für State Space Models 以国家为基础的国家空间模型参数 – – 有效精美设计 2503.03499v2

Authors (6): Wonjun Kang, Kevin Galim, Yuchen Zeng, Minjae Lee, Hyung Il Koo, Nam Ik Cho

State Space Models (SSMs) have emerged as efficient alternatives to Transformers, mitigating their quadratic computational cost. However, the application of Parameter-Efficient Fine-Tuning (PEFT) methods to SSMs remains largely unexplored. In particular, prompt-based methods like Prompt Tuning and Prefix-Tuning, which are widely used in Transformers, do not perform well on SSMs. To address this, we propose state-based methods as a superior alternative to prompt-based methods. This new family of methods naturally stems from the architectural characteristics of SSMs. State-based methods adjust state-related features directly instead of depending on external prompts. Furthermore, we introduce a novel state-based PEFT method: State-offset Tuning. At every timestep, our method directly affects the state at the current step, leading to more effective adaptation. Through extensive experiments across diverse datasets, we demonstrate the effectiveness of our method. Code is available at https://github.com/furiosa-ai/ssm-state-tuning.



Article 1142

Title@2025-06-09 (1): HeTa: Relation-wise Heterogeneous Graph Foundation Attack Model

Title: HeTa: Relation-wise Heterogeneous Graph Foundation Attack Model HeTa: Verhältnisweises Heterogenes Graph Foundation Attack Model HeTa: 关系性异异异图形基金会攻击模型 2506.07428v1

Authors (4): Yuling Wang, Zihui Chen, Pengfei Jiao, Xiao Wang

Heterogeneous Graph Neural Networks (HGNNs) are vulnerable, highlighting the need for tailored attacks to assess their robustness and ensure security. However, existing HGNN attacks often require complex retraining of parameters to generate specific perturbations for new scenarios. Recently, foundation models have opened new horizons for the generalization of graph neural networks by capturing shared semantics across various graph distributions. This leads us to ask: Can we design a foundation attack model for HGNNs that enables generalizable perturbations across different HGNNs, and quickly adapts to new heterogeneous graphs (HGs)? Empirical findings reveal that, despite significant differences in model design and parameter space, different HGNNs surprisingly share common vulnerability patterns from a relation-aware perspective. Therefore, we explore how to design foundation HGNN attack criteria by mining shared attack units. In this paper, we propose a novel relation-wise heterogeneous graph foundation attack model, HeTa. We introduce a foundation surrogate model to align heterogeneity and identify the importance of shared relation-aware attack units. Building on this, we implement a serialized relation-by-relation attack based on the identified relational weights. In this way, the perturbation can be transferred to various target HGNNs and easily fine-tuned for new HGs. Extensive experiments demonstrate the strong attack performance and generalizability of our method.



Article 1143

Title@2025-06-09 (1): Power-Law Decay Loss for Large Language Model Finetuning: A Theory Perspective

Title: Power-Law Decay Loss for Large Language Model Finetuning: A Theory Perspective Macht-Rechts-Dekay Verlust für große Sprachmodell Finetuning: Eine Theorie-Perspektive 大语言模型微调的功率法减缩损失:理论视角 2505.16900v5

Authors (1): Jintian Shao

During the finetuning stage of text generation tasks, standard cross-entropy loss treats all tokens equally. This can lead models to overemphasize high-frequency, low-information tokens, neglecting lower-frequency tokens crucial for specificity and informativeness in generated content. This paper introduces a novel loss function, Power-Law Decay Loss (PDL), specifically designed to optimize the finetuning process for text generation. The core motivation for PDL stems from observations in information theory and linguistics: the informativeness of a token is often inversely proportional to its frequency of occurrence. PDL re-weights the contribution of each token in the standard cross-entropy loss based on its frequency in the training corpus, following a power-law decay. Specifically, the weights for high-frequency tokens are reduced, while low-frequency, information-dense tokens are assigned higher weights. This mechanism guides the model during finetuning to focus more on learning and generating tokens that convey specific and unique information, thereby enhancing the quality, diversity, and informativeness of the generated text. We theoretically elaborate on the motivation and construction of PDL and discuss its potential applications and advantages across various text generation finetuning tasks, such as abstractive summarization, dialogue systems, and style transfer.
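
The re-weighting idea in this abstract can be sketched in a few lines. The abstract does not give the exact functional form, so the weight used below, w(t) = 1 / (f(t) + eps)^alpha with f(t) the relative frequency of token t in the training corpus, is an assumed power-law form for illustration only. The function names and the `alpha` and `eps` parameters are likewise hypothetical.

```python
import math
from collections import Counter


def power_law_weights(corpus_tokens, alpha=1.0, eps=1e-8):
    """Assign each token a weight that decays as a power of its frequency.

    Assumed form: w(t) = 1 / (relative_frequency(t) + eps) ** alpha.
    Frequent tokens get small weights, rare tokens get large ones.
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {tok: 1.0 / ((c / total) + eps) ** alpha for tok, c in counts.items()}


def weighted_cross_entropy(target_tokens, target_probs, weights):
    """Weighted mean of -log p(target) over one sequence.

    target_probs[i] is the model probability of target_tokens[i].
    """
    numerator = sum(
        weights[tok] * -math.log(p) for tok, p in zip(target_tokens, target_probs)
    )
    denominator = sum(weights[tok] for tok in target_tokens)
    return numerator / denominator


# Toy corpus: "the" is frequent, "quasar" is rare.
corpus = ["the"] * 8 + ["quasar"] * 2
weights = power_law_weights(corpus, alpha=1.0)

# The model is confident on "the" and poor on "quasar".
targets, probs = ["the", "quasar"], [0.9, 0.1]
plain = sum(-math.log(p) for p in probs) / len(probs)
weighted = weighted_cross_entropy(targets, probs, weights)
```

With these toy numbers the weighted loss is larger than the plain average, because the poorly predicted rare token now dominates the objective. Setting `alpha=0` recovers the standard unweighted cross-entropy.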



Article 1144

Title@2025-06-09 (1): DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning

Title: DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning DOMAIN: Mildly COnservative Model-BAsed Offline Verstärkungslernen DOMAIN: 低度防腐性示范性软体外表强化学习 2309.08925v4

Authors (10): Xiao-Yin Liu, Xiao-Hu Zhou, Mei-Jiang Gui, Guo-Tao Li, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Qi-Chao Zhang, Biao Luo, Zeng-Guang Hou

Model-based reinforcement learning (RL), which learns an environment model from the offline dataset and generates more out-of-distribution model data, has become an effective approach to the problem of distribution shift in offline RL. Due to the gap between the learned and actual environment, conservatism should be incorporated into the algorithm to balance accurate offline data and imprecise model data. The conservatism of current algorithms mostly relies on model uncertainty estimation. However, uncertainty estimation is unreliable and leads to poor performance in certain scenarios, and previous methods ignore differences between the model data, which leads to excessive conservatism. To address these issues, this paper proposes a milDly cOnservative Model-bAsed offlINe RL algorithm (DOMAIN) that does not estimate model uncertainty, and designs an adaptive sampling distribution of model samples, which can adaptively adjust the model data penalty. We theoretically demonstrate that the Q value learned by DOMAIN outside the region is a lower bound of the true Q value, that DOMAIN is less conservative than previous model-based offline RL algorithms, and that it has the guarantee of safety policy improvement. The results of extensive experiments show that DOMAIN outperforms prior RL algorithms and improves average performance by 1.8% on the D4RL benchmark.



Article 1145

Title@2025-06-09 (1): IF-GUIDE: Influence Function-Guided Detoxification of LLMs

Title: IF-GUIDE: Influence Function-Guided Detoxification of LLMs IF-GUIDE: Einfluss funktionsgeführte Entgiftung von LLMs IFGIIDE: LLMs的受控功能解毒作用影响 2506.01790v2

Authors (4): Zachary Coalson, Juhan Bae, Nicholas Carlini, Sanghyun Hong

We study how training data contributes to the emergence of toxic behaviors in large language models. Most prior work on reducing model toxicity adopts reactive approaches, such as fine-tuning pre-trained (and potentially toxic) models to align them with human values. In contrast, we propose a proactive approach, IF-Guide, which leverages influence functions to identify harmful tokens within any training data and suppress their impact during training. To this end, we first show that standard influence functions are ineffective at discovering harmful training records. We then present a novel adaptation that measures token-level attributions from training data to model toxicity, along with techniques for selecting toxic training documents and a learning objective that can be integrated into both pre-training and fine-tuning. Moreover, IF-Guide does not rely on human-preference data, which is typically required by existing alignment methods. In evaluation, we demonstrate that IF-Guide substantially reduces both explicit and implicit toxicity (by up to 10$\times$ compared to uncensored models, and up to 3$\times$ compared to baseline alignment methods, e.g., DPO and RAD) across both pre-training and fine-tuning scenarios. IF-Guide is computationally efficient: a billion-parameter model is not necessary for computing influence scores; a million-parameter model (with 7.5$\times$ fewer parameters) can effectively serve as a proxy for identifying harmful data. Our code is publicly available at: https://github.com/ztcoalson/IF-Guide



Article 1146

Title@2025-06-09 (1): Evidential Spectrum-Aware Contrastive Learning for OOD Detection in Dynamic Graphs

Title: Evidential Spectrum-Aware Contrastive Learning for OOD Detection in Dynamic Graphs Evidential Spectrum-Aware Kontrastives Lernen für OOD Detection in dynamischen Graphen 动态图中 OOD 探测的 证据性光谱- 软件差异性学习 2506.07417v1

Authors (6): Nan Sun, Xixun Lin, Zhiheng Zhou, Yanmin Shang, Zhenlin Cheng, Yanan Cao

Recently, out-of-distribution (OOD) detection in dynamic graphs, which aims to identify whether incoming data deviates from the distribution of the in-distribution (ID) training set, has garnered considerable attention in security-sensitive fields. Current OOD detection paradigms primarily focus on static graphs and confront two critical challenges: i) high bias and high variance caused by single-point estimation, which makes the predictions sensitive to randomness in the data; ii) score homogenization resulting from the lack of OOD training data, where the model only learns ID-specific patterns, resulting in overall low OOD scores and a narrow score gap between ID and OOD data. To tackle these issues, we first investigate OOD detection in dynamic graphs through the lens of Evidential Deep Learning (EDL). Specifically, we propose EviSEC, an innovative and effective OOD detector via Evidential Spectrum-awarE Contrastive Learning. We design an evidential neural network to redefine the output as the posterior Dirichlet distribution, explaining the randomness of inputs through the uncertainty of distribution, which is overlooked by single-point estimation. Moreover, a spectrum-aware augmentation module generates OOD approximations to identify patterns with high OOD scores, thereby widening the score gap between ID and OOD data and mitigating score homogenization. Extensive experiments on real-world datasets demonstrate that EviSEC effectively detects OOD samples in dynamic graphs.
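
The step from a single-point prediction to a Dirichlet output can be illustrated with the formulation commonly used in evidential deep learning: non-negative evidence e_k per class, Dirichlet parameters alpha_k = e_k + 1, and an uncertainty mass u = K / sum(alpha). Whether EviSEC uses exactly this parameterisation is not stated in the abstract, so the sketch below is a generic illustration and the function name is hypothetical.

```python
def dirichlet_opinion(evidence):
    """Turn non-negative per-class evidence into a Dirichlet-based opinion.

    Returns (expected_probs, beliefs, uncertainty), where
    alpha_k = e_k + 1, S = sum(alpha), expected_probs_k = alpha_k / S,
    beliefs_k = e_k / S and uncertainty = K / S.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    expected_probs = [a / strength for a in alpha]
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return expected_probs, beliefs, uncertainty


# No evidence at all: uniform prediction, maximal uncertainty.
probs_unknown, _, u_unknown = dirichlet_opinion([0.0, 0.0, 0.0])

# Strong evidence for class 0: confident prediction, low uncertainty.
probs_known, beliefs_known, u_known = dirichlet_opinion([9.0, 0.0, 0.0])
```

The belief masses and the uncertainty always sum to one, so an input that produces little evidence for any class ends up with a high uncertainty value, which can then serve as an OOD score.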



Article 1147

Title@2025-06-09 (1): LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments

Title: LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments LiteVLM: Eine Bildverarbeitungs-Modell-Inferenz-Pipeline mit geringer Latenz für ressourcenbeschränkte Umgebungen LiteVLM:资源受限环境的低寿命愿景-语言模型推导管道 2506.07416v1

Authors (4): Jin Huang, Yuchao Jin, Le An, Josh Park

This paper introduces an efficient Vision-Language Model (VLM) pipeline specifically optimized for deployment on embedded devices, such as those used in robotics and autonomous driving. The pipeline significantly reduces the computational overhead by jointly leveraging patch selection to filter irrelevant camera views, a token selection module to reduce input sequence length for the LLM, and speculative decoding to accelerate token generation. Evaluated on the NVIDIA DRIVE Thor platform for autonomous driving applications, our pipeline achieves a $2.5\times$ end-to-end latency reduction without compromising task accuracy. The speed-up further increases to $3.2\times$ when applying FP8 post-training quantization. These results demonstrate our pipeline as a viable solution for enabling real-time VLM deployment in resource-constrained environments.



Article 1148

Title@2025-06-09 (1): Variational Supervised Contrastive Learning

Title: Variational Supervised Contrastive Learning Variationelles Überwachtes Kontrastuelles Lernen 差异监督反舞弊学习 2506.07413v1

Authors (5): Ziwen Wang, Jiajun Fan, Thao Nguyen, Heng Ji, Ge Liu

Contrastive learning has proven to be highly efficient and adaptable in shaping representation spaces across diverse modalities by pulling similar samples together and pushing dissimilar ones apart. However, two key limitations persist: (1) Without explicit regulation of the embedding distribution, semantically related instances can inadvertently be pushed apart unless complementary signals guide pair selection, and (2) excessive reliance on large in-batch negatives and tailored augmentations hinders generalization. To address these limitations, we propose Variational Supervised Contrastive Learning (VarCon), which reformulates supervised contrastive learning as variational inference over latent class variables and maximizes a posterior-weighted evidence lower bound (ELBO) that replaces exhaustive pair-wise comparisons for efficient class-aware matching and grants fine-grained control over intra-class dispersion in the embedding space. In experiments on CIFAR-10, CIFAR-100, ImageNet-100, and ImageNet-1K, with models trained exclusively on image data, VarCon (1) achieves state-of-the-art performance for contrastive learning frameworks, reaching 79.36% Top-1 accuracy on ImageNet-1K and 78.29% on CIFAR-100 with a ResNet-50 encoder while converging in just 200 epochs; (2) yields substantially clearer decision boundaries and semantic organization in the embedding space, as evidenced by KNN classification, hierarchical clustering results, and transfer-learning assessments; and (3) demonstrates better few-shot learning performance than the supervised baseline and superior robustness across various augmentation strategies.



Article 1149

Title@2025-06-09 (1): GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models

Title: GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models GANQ: GPU-Adaptive nicht einheitliche Quantisierung für große Sprachmodelle GANQ: 大语言模型的GPU-APU-APU-Adapidive非统一量化 2501.12956v3

Authors (2): Pengxiang Zhao, Xiaoming Yuan

Large Language Models (LLMs) face significant deployment challenges due to their substantial resource requirements. While low-bit quantized weights can reduce memory usage and improve inference efficiency, current hardware lacks native support for mixed-precision General Matrix Multiplication (mpGEMM), resulting in inefficient dequantization-based implementations. Moreover, uniform quantization methods often fail to capture weight distributions adequately, leading to performance degradation. We propose GANQ (GPU-Adaptive Non-Uniform Quantization), a layer-wise post-training non-uniform quantization framework optimized for hardware-efficient lookup table-based mpGEMM. GANQ achieves superior quantization performance by utilizing a training-free, GPU-adaptive optimization algorithm to efficiently reduce layer-wise quantization errors. Extensive experiments demonstrate GANQ’s ability to reduce the perplexity gap from the FP16 baseline compared to state-of-the-art methods for both 3-bit and 4-bit quantization. Furthermore, when deployed on a single NVIDIA RTX 4090 GPU, GANQ’s quantized models achieve up to 2.57$\times$ speedup over the baseline, advancing memory and inference efficiency in LLM deployment.
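To make the lookup-table idea concrete, here is a minimal sketch of non-uniform quantization: each weight is stored as a small integer index into a learned codebook, and dequantization is a table lookup. The codebook is fitted with plain 1-D k-means (Lloyd's algorithm), which is a stand-in chosen for illustration and is not GANQ's GPU-adaptive optimization; all function names are hypothetical.

```python
import random
from typing import List, Sequence

def nearest(codebook: Sequence[float], w: float) -> int:
    """Index of the codebook entry closest to w (first entry wins ties)."""
    return min(range(len(codebook)), key=lambda j: abs(codebook[j] - w))

def fit_codebook(weights: Sequence[float], bits: int, iters: int = 25) -> List[float]:
    """1-D k-means codebook; iters=0 returns the uniform grid used as the start."""
    k = 2 ** bits
    lo, hi = min(weights), max(weights)
    codebook = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        sums, counts = [0.0] * k, [0] * k
        for w in weights:
            j = nearest(codebook, w)
            sums[j] += w
            counts[j] += 1
        # Move each entry to the mean of its cluster; keep empty entries in place.
        codebook = [sums[j] / counts[j] if counts[j] else codebook[j] for j in range(k)]
    return codebook

def quantize(weights: Sequence[float], codebook: Sequence[float]) -> List[int]:
    return [nearest(codebook, w) for w in weights]

def dequantize(indices: Sequence[int], codebook: Sequence[float]) -> List[float]:
    return [codebook[i] for i in indices]  # the table lookup

def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(256)]  # bell-shaped toy "layer"
uniform = fit_codebook(weights, bits=3, iters=0)
learned = fit_codebook(weights, bits=3)
indices = quantize(weights, learned)
err_uniform = mse(weights, dequantize(quantize(weights, uniform), uniform))
err_learned = mse(weights, dequantize(indices, learned))
```

Because Lloyd's iterations start from the uniform grid and never increase the squared error, the learned codebook is at least as accurate as the uniform one on this data.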



Article 1150

Title@2025-06-09 (1): Structured Pruning for Diverse Best-of-N Reasoning Optimization

Title: Structured Pruning for Diverse Best-of-N Reasoning Optimization Strukturierte Prunings für die unterschiedliche Best-of-N-Reasoning-Optimierung 结构审慎,实现最佳理由的多样化最佳最佳理由的优化 2506.03978v2

Authors (3): Hieu Trung Nguyen, Bao Nguyen, Viet Anh Nguyen

Model pruning in transformer-based language models, traditionally viewed as a means of achieving computational savings, can enhance the model’s reasoning capabilities. In this work, we uncover a surprising phenomenon: the selective pruning of certain attention heads leads to improvements in reasoning performance, particularly on challenging tasks. Motivated by this observation, we propose SPRINT, a novel contrastive learning framework that dynamically selects the optimal head and layer to prune during inference. By aligning question embeddings with head embeddings, SPRINT identifies those pruned-head configurations that result in more accurate reasoning. Extensive experiments demonstrate that our method significantly outperforms traditional best-of-$N$ and random head selection strategies on the MATH500 and GSM8K datasets.



Article 1151

Title@2025-06-09 (1): Fractional-order Jacobian Matrix Differentiation and Its Application in Artificial Neural Networks

Title: Fractional-order Jacobian Matrix Differentiation and Its Application in Artificial Neural Networks Fraktions-Order Jacobian Matrix Differenzierung und seine Anwendung in künstlichen neuralen Netzwerken Jacobian 矩阵差异及其在人造神经网络中的应用 2506.07408v1

Authors (6): Xiaojun Zhou, Chunna Zhao, Yaqun Huang, Chengli Zhou, Junjie Ye, Kemeng Xiang

Fractional-order differentiation has many characteristics different from integer-order differentiation. These characteristics can be applied to the optimization algorithms of artificial neural networks to obtain better results. However, due to insufficient theoretical research, there is at present no fractional-order matrix differentiation method that is perfectly compatible with automatic differentiation (Autograd) technology. Therefore, we propose a fractional-order matrix differentiation calculation method. This method is introduced by the definition of the integer-order Jacobian matrix. We denote it as fractional-order Jacobian matrix differentiation ($\mathbf{J}^\alpha$). Through $\mathbf{J}^\alpha$, we can carry out the matrix-based fractional-order chain rule. Based on the Linear module and the fractional-order differentiation, we design the fractional-order Autograd technology to enable the use of fractional-order differentiation in hidden layers, thereby enhancing the practicality of fractional-order differentiation in deep learning. In the experiments, using the PyTorch framework, we design a fractional-order Linear module (FLinear) and replace nn.Linear in the multilayer perceptron with FLinear. Through qualitative analysis of the training-set and validation-set loss, quantitative analysis of the test-set indicators, and analysis of time consumption and GPU memory usage during model training, we verify the superior performance of $\mathbf{J}^\alpha$ and show that it is an excellent fractional-order gradient descent method in the field of deep learning.
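The abstract does not state which fractional derivative definition is used, so the sketch below only illustrates the classical Riemann-Liouville power rule, $D^\alpha x^k = \frac{\Gamma(k+1)}{\Gamma(k+1-\alpha)} x^{k-\alpha}$ for $x > 0$, as an assumption-level example of how a non-integer order interpolates between ordinary derivatives. The function name is hypothetical.

```python
import math

def frac_derivative_power(k: float, alpha: float, x: float) -> float:
    """alpha-order derivative of x**k at x > 0 via the classical power rule.

    Assumes k + 1 - alpha is not zero or a negative integer, where the gamma
    function has poles.
    """
    return math.gamma(k + 1) / math.gamma(k + 1 - alpha) * x ** (k - alpha)

# alpha = 1 recovers the ordinary derivative: d/dx x^2 = 2x.
first_order = frac_derivative_power(2, 1.0, 3.0)

# Half derivative of x at x = 1 equals 2 / sqrt(pi).
half_of_x = frac_derivative_power(1, 0.5, 1.0)

# Applying the half derivative twice to x gives the first derivative, 1:
# D^0.5 x = c * x^0.5 and D^0.5 x^0.5 is a constant, so the product is the result.
coefficient = frac_derivative_power(1, 0.5, 1.0)
twice = coefficient * frac_derivative_power(0.5, 0.5, 1.0)
```

The last lines show the composition property that makes a fractional chain rule plausible for power functions; it is not a reproduction of the paper's matrix formulation.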



Article 1152

Title@2025-06-09 (1): Anomaly Detection and Early Warning Mechanism for Intelligent Monitoring Systems in Multi-Cloud Environments Based on LLM

Title: Anomaly Detection and Early Warning Mechanism for Intelligent Monitoring Systems in Multi-Cloud Environments Based on LLM Anomalieerkennungs- und Frühwarnmechanismus für intelligente Überwachungssysteme in Multi-Cloud-Umgebungen auf Basis von LLM 以LLM为基础在多城市环境中建立智能监测系统的异常探测和预警机制 2506.07407v1

Authors (4): Yihong Jin, Ze Yang, Juntian Liu, Xinhe Xu

With the rapid development of multi-cloud environments, it is increasingly important to ensure the security and reliability of intelligent monitoring systems. In this paper, we propose an anomaly detection and early warning mechanism for intelligent monitoring systems in multi-cloud environments based on a large language model (LLM). Building on the existing monitoring framework, the proposed model introduces a multi-level feature extraction method that combines the natural language processing ability of LLMs with traditional machine learning methods to enhance the accuracy of anomaly detection and improve real-time response efficiency. By drawing on the contextual understanding capabilities of LLMs, the model dynamically adapts to different cloud service providers and environments, so as to detect abnormal patterns and predict potential failures more effectively. Experimental results show that the proposed model is significantly better than the traditional anomaly detection system in terms of detection accuracy and latency, and significantly improves the resilience and proactive management ability of cloud infrastructure.



Article 1153

Title@2025-06-09 (1): InverseScope: Scalable Activation Inversion for Interpreting Large Language Models

Title: InverseScope: Scalable Activation Inversion for Interpreting Large Language Models InverseScope: Skalierbare Aktivierungs-Inversion zum Dolmetschen großer Sprachmodelle 反向范围: 用于解释大语言模型的可缩放活动转换 2506.07406v1

Authors (3): Yifan Luo, Zhennan Zhou, Bin Dong

Understanding the internal representations of large language models (LLMs) is a central challenge in interpretability research. Existing feature interpretability methods often rely on strong assumptions about the structure of representations that may not hold in practice. In this work, we introduce InverseScope, an assumption-light and scalable framework for interpreting neural activations via input inversion. Given a target activation, we define a distribution over inputs that generate similar activations and analyze this distribution to infer the encoded features. To address the inefficiency of sampling in high-dimensional spaces, we propose a novel conditional generation architecture that significantly improves sample efficiency compared to previous methods. We further introduce a quantitative evaluation protocol that tests interpretability hypotheses using feature consistency rate computed over the sampled inputs. InverseScope scales inversion-based interpretability methods to larger models and practical tasks, enabling systematic and quantitative analysis of internal representations in real-world LLMs.



Article 1154

Title@2025-06-09 (1): RiemannFormer: A Framework for Attention in Curved Spaces

Title: RiemannFormer: A Framework for Attention in Curved Spaces RiemannFormer: Ein Rahmenwerk für Aufmerksamkeit in geschwungenen Räumen Riemann Fremer: 曲线空间的注意框架 2506.07405v1

Authors (1): Zhongping Ji

This research aims to offer insights into unlocking the further potential of transformer-based architectures. One of the primary motivations is to offer a geometric interpretation for the attention mechanism in transformers. In our framework, attention mainly involves metric tensors, tangent spaces, inner products, and how they relate to each other. These quantities and structures at discrete positions are interconnected via the parallel transport of tangent vectors. To make the learning process more efficient, we reduce the number of parameters through predefined configurations. Moreover, we introduce an explicit mechanism to highlight a neighborhood by attenuating the remote values, given that transformers inherently neglect local inductive bias. Experimental results demonstrate that our modules deliver significant performance improvements relative to the baseline. Further evaluation experiments on visual and large language models are planned.



Article 1155

Title@2025-06-09 (1): Outlier-weighed Layerwise Sampling for LLM Fine-tuning

Title: Outlier-weighed Layerwise Sampling for LLM Fine-tuning Ausgewägte schichtweise Probenahme für LLM-Feinabstimmung LLM 精密调试 2405.18380v3

Authors (4): Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei Liu

The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks. However, the substantial size of LLMs presents significant challenges in training or fine-tuning. While parameter-efficient approaches such as low-rank adaptation (LoRA) have gained popularity, they often compromise performance compared to full-rank fine-tuning. In this paper, we propose Outlier-weighed Layerwise Sampling (OWS), a new memory-efficient fine-tuning approach, inspired by the layerwise outlier distribution of LLMs. Unlike LoRA, which adds extra adapters to all layers, OWS strategically assigns higher sampling probabilities to layers with more outliers, selectively sampling only a few layers and fine-tuning their pre-trained weights. To further increase the number of fine-tuned layers without a proportional rise in memory costs, we incorporate gradient low-rank projection, further boosting the approach’s performance. Our extensive experiments across various architectures, including LLaMa2 and Mistral, demonstrate that OWS consistently outperforms baseline approaches, including full fine-tuning. Specifically, it achieves up to a 1.1% average accuracy gain on the Commonsense Reasoning benchmark, a 3.0% improvement on MMLU, and a notable 10% boost on MT-Bench, while being more memory efficient. OWS allows us to fine-tune 7B LLMs with only 21GB of memory. Our code is available at https://github.com/pixeli99/OWS.
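The layer-selection step can be sketched as follows. The outlier score used here (fraction of weights whose magnitude exceeds `tau` times the layer's mean magnitude) and all names are assumptions made for illustration; the abstract only says that layers with more outliers receive higher sampling probability.

```python
import random
from typing import List, Sequence

def outlier_ratio(weights: Sequence[float], tau: float = 3.0) -> float:
    """Fraction of weights with |w| > tau * mean(|w|) (assumed outlier definition)."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    return sum(1 for w in weights if abs(w) > tau * mean_abs) / len(weights)

def sampling_probs(layers: Sequence[Sequence[float]],
                   tau: float = 3.0, eps: float = 1e-6) -> List[float]:
    """Probabilities proportional to each layer's outlier ratio (eps keeps all > 0)."""
    scores = [outlier_ratio(w, tau) + eps for w in layers]
    total = sum(scores)
    return [s / total for s in scores]

def sample_layers(probs: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Draw k distinct layer indices, weighted by probs, without replacement."""
    remaining = list(range(len(probs)))
    chosen: List[int] = []
    for _ in range(k):
        pick = rng.choices(remaining, weights=[probs[i] for i in remaining], k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return sorted(chosen)

layers = [
    [0.1] * 100,              # no outliers
    [0.1] * 95 + [5.0] * 5,   # 5% outliers
    [0.1] * 99 + [5.0],       # 1% outliers
]
probs = sampling_probs(layers)
active = sample_layers(probs, k=2, rng=random.Random(0))
```

In a training loop, only the layers listed in `active` would be unfrozen for the current interval, and the draw would be repeated periodically.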



Article 1156

Title@2025-06-09 (1): Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions

Title: Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions Numerische Untersuchung der Sequenzmodellierungstheorie mit steuerbaren Speicherfunktionen 使用可控内存函数对序列模拟理论进行数字调查 2506.05678v2

Authors (4): Haotian Jiang, Zeyu Bao, Shida Wang, Qianxiao Li

The evolution of sequence modeling architectures, from recurrent neural networks and convolutional models to Transformers and structured state-space models, reflects ongoing efforts to address the diverse temporal dependencies inherent in sequential data. Despite this progress, systematically characterizing the strengths and limitations of these architectures remains a fundamental challenge. In this work, we propose a synthetic benchmarking framework to evaluate how effectively different sequence models capture distinct temporal structures. The core of this approach is to generate synthetic targets, each characterized by a memory function and a parameter that determines the strength of temporal dependence. This setup allows us to produce a continuum of tasks that vary in temporal complexity, enabling fine-grained analysis of model behavior concerning specific memory properties. We focus on four representative memory functions, each corresponding to a distinct class of temporal structures. Experiments on several sequence modeling architectures confirm existing theoretical insights and reveal new findings. These results demonstrate the effectiveness of the proposed method in advancing theoretical understanding and highlight the importance of using controllable targets with clearly defined structures for evaluating sequence modeling architectures.



Article 1157

Title@2025-06-09 (1): G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

Title: G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems G-Memory: Hierarchischer Speicher für Multi-Agent-Systeme G-记忆:为多机构系统追踪等级记忆 2506.07398v1

Authors (6): Guibin Zhang, Muxin Fu, Guancheng Wan, Miao Yu, Kun Wang, Shuicheng Yan

Large language model (LLM)-powered multi-agent systems (MAS) have demonstrated cognitive and execution capabilities that far exceed those of single LLM agents, yet their capacity for self-evolution remains hampered by underdeveloped memory architectures. Upon close inspection, we are alarmed to discover that prevailing MAS memory mechanisms (1) are overly simplistic, completely disregarding the nuanced inter-agent collaboration trajectories, and (2) lack cross-trial and agent-specific customization, in stark contrast to the expressive memory developed for single agents. To bridge this gap, we introduce G-Memory, a hierarchical, agentic memory system for MAS inspired by organizational memory theory, which manages the lengthy MAS interaction via a three-tier graph hierarchy: insight, query, and interaction graphs. Upon receiving a new user query, G-Memory performs bi-directional memory traversal to retrieve both $\textit{high-level, generalizable insights}$ that enable the system to leverage cross-trial knowledge, and $\textit{fine-grained, condensed interaction trajectories}$ that compactly encode prior collaboration experiences. Upon task execution, the entire hierarchy evolves by assimilating new collaborative trajectories, nurturing the progressive evolution of agent teams. Extensive experiments across five benchmarks, three LLM backbones, and three popular MAS frameworks demonstrate that G-Memory improves success rates in embodied action and accuracy in knowledge QA by up to $20.89\%$ and $10.12\%$, respectively, without any modifications to the original frameworks. Our codes are available at https://github.com/bingreeky/GMemory.



Article 1158

Title@2025-06-09 (1): Nonparametric Modern Hopfield Models

Title: Nonparametric Modern Hopfield Models Nichtparametrische moderne Hopfield-Modelle 非参数现代跳地模型 2404.03900v2

Authors (5): Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu, Feng Ruan, Han Liu

We present a nonparametric interpretation for deep learning compatible modern Hopfield models and utilize this new perspective to debut efficient variants. Our key contribution stems from interpreting the memory storage and retrieval processes in modern Hopfield models as a nonparametric regression problem subject to a set of query-memory pairs. Interestingly, our framework not only recovers the known results from the original dense modern Hopfield model but also fills the void in the literature regarding efficient modern Hopfield models, by introducing \textit{sparse-structured} modern Hopfield models with sub-quadratic complexity. We establish that this sparse model inherits the appealing theoretical properties of its dense analogue – connection with transformer attention, fixed point convergence and exponential memory capacity. Additionally, we showcase the versatility of our framework by constructing a family of modern Hopfield models as extensions, including linear, random masked, top-$K$ and positive random feature modern Hopfield models. Empirically, we validate our framework in both synthetic and realistic settings for memory retrieval and learning tasks.
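For readers unfamiliar with the dense model being generalized, its one-step retrieval rule is $\xi^{\text{new}} = X\,\mathrm{softmax}(\beta X^\top \xi)$, where the columns of $X$ are stored memories. The sketch below implements that rule in plain Python; the `top_k` option, which masks all but the K highest scores before the softmax, is one plausible reading of the top-$K$ variant named in the abstract and should be treated as an assumption.

```python
import math
from typing import List, Optional, Sequence

def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def retrieve(memories: Sequence[Sequence[float]], query: Sequence[float],
             beta: float = 1.0, top_k: Optional[int] = None) -> List[float]:
    """One retrieval step: softmax-weighted average of the stored memories."""
    scores = [beta * sum(m_d * q_d for m_d, q_d in zip(m, query)) for m in memories]
    if top_k is not None:
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        keep = set(ranked[:top_k])
        # Assumed sparse variant: discard every memory outside the top-k scores.
        scores = [s if i in keep else float("-inf") for i, s in enumerate(scores)]
    weights = softmax(scores)
    return [sum(w * m[d] for w, m in zip(weights, memories))
            for d in range(len(query))]

memories = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noisy_query = [0.9, 0.1, 0.0]
dense = retrieve(memories, noisy_query, beta=20.0)
sparse = retrieve(memories, noisy_query, beta=20.0, top_k=1)
```

With a large `beta` the dense update already snaps the noisy query to the nearest stored pattern; the masked version reaches it exactly while touching only one memory in the weighted sum.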



Article 1159

Title@2025-06-09 (1): From Static to Adaptive Defense: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks

Title: From Static to Adaptive Defense: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks Von der statischen zur adaptiven Verteidigung: Federated Multi-Agent Deep Reinforcement Learning-Driven Moving Target Defense Against DoS Attacks in UAV Swarm Networks 从静态到适应性防卫:联邦多要求深入强化学习-驱动移动目标防御,防止UAV Swarm网络的DoS攻击 2506.07392v1

Authors (6): Yuyang Zhou, Guang Cheng, Kang Du, Zihan Chen, Tian Qin, Yuyu Zhao

The proliferation of unmanned aerial vehicle (UAV) swarms has enabled a wide range of mission-critical applications, but also exposes UAV networks to severe Denial-of-Service (DoS) threats due to their open wireless environment, dynamic topology, and resource constraints. Traditional static or centralized defense mechanisms are often inadequate for such dynamic and distributed scenarios. To address these challenges, we propose a novel federated multi-agent deep reinforcement learning (FMADRL)-driven moving target defense (MTD) framework for proactive and adaptive DoS mitigation in UAV swarm networks. Specifically, we design three lightweight and coordinated MTD mechanisms, including leader switching, route mutation, and frequency hopping, that leverage the inherent flexibility of UAV swarms to disrupt attacker efforts and enhance network resilience. The defense problem is formulated as a multi-agent partially observable Markov decision process (POMDP), capturing the distributed, resource-constrained, and uncertain nature of UAV swarms under attack. Each UAV is equipped with a local policy agent that autonomously selects MTD actions based on partial observations and local experiences. By employing a policy gradient-based FMADRL algorithm, UAVs collaboratively optimize their defense policies via reward-weighted aggregation, enabling distributed learning without sharing raw data and thus reducing communication overhead. Extensive simulations demonstrate that our approach significantly outperforms state-of-the-art baselines, achieving up to a 34.6% improvement in attack mitigation rate, a reduction in average recovery time of up to 94.6%, and decreases in energy consumption and defense cost by as much as 29.3% and 98.3%, respectively, while maintaining robust mission continuity under various DoS attack strategies.



Article 1160

Title@2025-06-09 (1): Protecting Deep Learning Model Copyrights with Adversarial Example-Free Reuse Detection

Title: Protecting Deep Learning Model Copyrights with Adversarial Example-Free Reuse Detection Schutz von Deep-Learning-Modell-Urheberrechten mit zweifelhafter Beispiel-freier Wiederverwertungserkennung 保护深学习模式版权,进行反反对学性实例自由再利用探测 2407.03883v2

Authors (4): Xiaokun Luan, Xiyue Zhang, Jingyi Wang, Meng Sun

Model reuse techniques can reduce the resource requirements for training high-performance deep neural networks (DNNs) by leveraging existing models. However, unauthorized reuse and replication of DNNs can lead to copyright infringement and economic loss to the model owner. This underscores the need to analyze the reuse relation between DNNs and develop copyright protection techniques to safeguard intellectual property rights. Existing white-box testing-based approaches cannot address the common heterogeneous reuse case where the model architecture is changed, and DNN fingerprinting approaches rely heavily on generating adversarial examples with good transferability, which is known to be challenging in the black-box setting. To bridge the gap, we propose NFARD, a Neuron Functionality Analysis-based Reuse Detector, which only requires normal test samples to detect reuse relations by measuring the models’ differences on a newly proposed model characterization, i.e., neuron functionality (NF). A set of NF-based distance metrics is designed to make NFARD applicable to both white-box and black-box settings. Moreover, we devise a linear transformation method to handle heterogeneous reuse cases by constructing the optimal projection matrix for dimension consistency, significantly extending the application scope of NFARD. To the best of our knowledge, this is the first adversarial example-free method that exploits neuron functionality for DNN copyright protection. As a side contribution, we constructed a reuse detection benchmark named Reuse Zoo that covers various practical reuse techniques and popular datasets. Extensive evaluations on this comprehensive benchmark show that NFARD achieves F1 scores of 0.984 and 1.0 for detecting reuse relationships in black-box and white-box settings, respectively, while generating test suites 2 to 99 times faster than previous methods.



Article 1161

Title@2025-06-09 (1): Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert

Title: Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert Lärm-Robustheit durch Lärm: Asymmetrische LoRA-Anpassung mit Giftungsexperte 噪音造成的噪音破坏:与中毒专家一起进行非对称 LoRA适应 2505.23868v3

Authors (8): Zhaokun Wang, Jinyu Guo, Jingwen Pu, Lingfeng Chen, Hongli Pu, Jie Ou, Libo Qin, Wenhong Tian

Current parameter-efficient fine-tuning methods for adapting pre-trained language models to downstream tasks are susceptible to interference from noisy data. Conventional noise-handling approaches either rely on laborious data pre-processing or employ model architecture modifications prone to error accumulation. In contrast to existing noise-process paradigms, we propose a noise-robust adaptation method via asymmetric LoRA poisoning experts (LoPE), a novel framework that enhances model robustness to noise only with generated noisy data. Drawing inspiration from the mixture-of-experts architecture, LoPE strategically integrates a dedicated poisoning expert in an asymmetric LoRA configuration. Through a two-stage paradigm, LoPE performs noise injection on the poisoning expert during fine-tuning to enhance its noise discrimination and processing ability. During inference, we selectively mask the dedicated poisoning expert to leverage purified knowledge acquired by normal experts for noise-robust output. Extensive experiments demonstrate that LoPE achieves strong performance and robustness purely through the low-cost noise injection, which completely eliminates the requirement of data cleaning.



Article 1162

Title@2025-06-09 (1): Moment Alignment: Unifying Gradient and Hessian Matching for Domain Generalization

Title: Moment Alignment: Unifying Gradient and Hessian Matching for Domain Generalization Moment Alignment: Vereinheitlichung von Gradienten und Hessischen Matching für Domain Generalization 时速对齐: 统一渐变和赫西相匹配域域通用化 2506.07378v1

Authors (4): Yuen Chen, Haozhe Si, Guojun Zhang, Han Zhao

Domain generalization (DG) seeks to develop models that generalize well to unseen target domains, addressing the prevalent issue of distribution shifts in real-world applications. One line of research in DG focuses on aligning domain-level gradients and Hessians to enhance generalization. However, existing methods are computationally inefficient and the underlying principles of these approaches are not well understood. In this paper, we develop the theory of moment alignment for DG. Grounded in transfer measure, a principled framework for quantifying generalizability between two domains, we first extend the definition of transfer measure to domain generalization that includes multiple source domains and establish a target error bound. Then, we prove that aligning derivatives across domains improves transfer measure both when the feature extractor induces an invariant optimal predictor across domains and when it does not. Notably, moment alignment provides a unifying understanding of Invariant Risk Minimization, gradient matching, and Hessian matching, three previously disconnected approaches to DG. We further connect feature moments and derivatives of the classifier head, and establish the duality between feature learning and classifier fitting. Building upon our theory, we introduce Closed-Form Moment Alignment (CMA), a novel DG algorithm that aligns domain-level gradients and Hessians in closed form. Our method overcomes the computational inefficiencies of existing gradient and Hessian-based techniques by eliminating the need for repeated backpropagation or sampling-based Hessian estimation. We validate the efficacy of our approach through two sets of experiments: linear probing and full fine-tuning. CMA demonstrates superior performance in both settings compared to Empirical Risk Minimization and state-of-the-art algorithms.



Article 1163

Title@2025-06-09 (1): Adapter Naturally Serves as Decoupler for Cross-Domain Few-Shot Semantic Segmentation

Title: Adapter Naturally Serves as Decoupler for Cross-Domain Few-Shot Semantic Segmentation Adapter dient natürlich als Entkoppler für die Cross-Domain-Few-Shot Semantic Segmentierung 自然适应器作为跨域鲜热语义分割的解码器 2506.07376v1

Authors (6): Jintao Tong, Ran Ma, Yixiong Zou, Guangyao Chen, Yuhua Li, Ruixuan Li

Cross-domain few-shot segmentation (CD-FSS) is proposed to pre-train the model on a source-domain dataset with sufficient samples, and then transfer the model to target-domain datasets where only a few samples are available for efficient fine-tuning. There are two major challenges in this task: (1) the domain gap and (2) fine-tuning with scarce data. To solve these challenges, we revisit the adapter-based methods and discover an intriguing insight not explored in previous works: the adapter not only helps the fine-tuning of downstream tasks but also naturally serves as a domain information decoupler. We then investigate this finding and find that the model’s inherent structure could lead to a natural decoupling of domain information. Building upon this insight, we propose the Domain Feature Navigator (DFN), a structure-based decoupler rather than a loss-based one as in existing work, to capture domain-specific information, thereby directing the model’s attention towards domain-agnostic knowledge. Moreover, to prevent excessive overfitting of DFN during source-domain training, we further design the SAM-SVN method to constrain DFN from learning sample-specific knowledge. On target domains, we freeze the model and fine-tune the DFN to learn target-specific knowledge. Extensive experiments demonstrate that our method surpasses the state-of-the-art method in CD-FSS significantly by 2.69% and 4.68% MIoU in 1-shot and 5-shot scenarios, respectively.



Article 1164

Title@2025-06-09 (1): Numerical Analysis of HiPPO-LegS ODE for Deep State Space Models

Title: Numerical Analysis of HiPPO-LegS ODE for Deep State Space Models Numerische Analyse von HiPPO-LegS ODE für Deep State Space Modelle HIPPPO-LegS深海国家空间模型ODE的数值分析 2412.08595v2

Authors (4): Jaesung R. Park, Jaewook J. Suh, Youngjoon Hong, Ernest K. Ryu

In deep learning, the recently introduced state space models utilize HiPPO (High-order Polynomial Projection Operators) memory units to approximate continuous-time trajectories of input functions using ordinary differential equations (ODEs), and these techniques have shown empirical success in capturing long-range dependencies in long input sequences. However, the mathematical foundations of these ODEs, particularly the singular HiPPO-LegS (Legendre Scaled) ODE, and their corresponding numerical discretizations remain unsettled. In this work, we fill this gap by establishing that HiPPO-LegS ODE is well-posed despite its singularity, albeit without the freedom of arbitrary initial conditions. Further, we establish convergence of the associated numerical discretization schemes for Riemann integrable input functions.
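As a concrete reference point, the HiPPO-LegS ODE from the original HiPPO work is $\frac{d}{dt}c(t) = -\frac{1}{t}A\,c(t) + \frac{1}{t}B\,f(t)$ with $A_{nk} = \sqrt{(2n+1)(2k+1)}$ for $n > k$, $A_{nn} = n+1$, $A_{nk} = 0$ for $n < k$, and $B_n = \sqrt{2n+1}$; the $1/t$ factor is the singularity at $t = 0$. The sketch below applies a unit-step forward Euler discretization, $c_k = c_{k-1} + \frac{1}{k}(B f_k - A c_{k-1})$, starting from $c_0 = 0$. It is only one of several possible schemes and is not taken from this paper; function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

def legs_matrices(n: int) -> Tuple[List[List[float]], List[float]]:
    """HiPPO-LegS state matrix A (lower triangular) and input vector B."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i > j:
                A[i][j] = math.sqrt((2 * i + 1) * (2 * j + 1))
            elif i == j:
                A[i][j] = i + 1.0
    B = [math.sqrt(2 * i + 1) for i in range(n)]
    return A, B

def legs_forward_euler(inputs: Sequence[float], n: int) -> List[float]:
    """Run c_k = c_{k-1} + (B f_k - A c_{k-1}) / k for k = 1, 2, ... from c_0 = 0."""
    A, B = legs_matrices(n)
    c = [0.0] * n
    for k, f in enumerate(inputs, start=1):
        Ac = [sum(A[i][j] * c[j] for j in range(n)) for i in range(n)]
        c = [c[i] + (B[i] * f - Ac[i]) / k for i in range(n)]
    return c

# A constant input is captured by the degree-0 coefficient alone,
# because the fixed point of the recurrence solves A c = B f.
coeffs = legs_forward_euler([1.0] * 20, n=4)
```

For the constant input, solving $A c = B$ row by row gives $c = (1, 0, 0, 0)$, which the recurrence reaches after a few steps and then preserves.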



Article 1165

Title@2025-06-09 (1): A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

Title: A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment Eine umfassende Umfrage in LLM(-Agent) Full Stack Sicherheit: Daten, Schulung und Bereitstellung 用LLLM(-代理)全堆安全:数据、培训和部署进行的全面调查 2504.15585v4

Authors (103): Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Shicheng Xu, Junyuan Mao, Yu Wang, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Wenjie Qu, Yue Liu, Chengwei Liu, Yifan Zhang, Qiankun Li, Chongye Guo, Yalan Qin, Zhaoxin Fan, Kai Wang, Yi Ding, Donghai Hong, Jiaming Ji, Yingxin Lai, Zitong Yu, Xinfeng Li, Yifan Jiang, Yanhui Li, Xinyu Deng, Junlin Wu, Dongxia Wang, Yihao Huang, Yufei Guo, Jen-tse Huang, Qiufeng Wang, Xiaolong Jin, Wenxuan Wang, Dongrui Liu, Yanwei Yue, Wenke Huang, Guancheng Wan, Heng Chang, Tianlin Li, Yi Yu, Chenghao Li, Jiawei Li, Lei Bai, Jie Zhang, Qing Guo, Jingyi Wang, Tianlong Chen, Joey Tianyi Zhou, Xiaojun Jia, Weisong Sun, Cong Wu, Jing Chen, Xuming Hu, Yiming Li, Xiao Wang, Ningyu Zhang, Luu Anh Tuan, Guowen Xu, Jiaheng Zhang, Tianwei Zhang, Xingjun Ma, Jindong Gu, Liang Pang, Xiang Wang, Bo An, Jun Sun, Mohit Bansal, Shirui Pan, Lingjuan Lyu, Yuval Elovici, Bhavya Kailkhura, Yaodong Yang, Hongwei Li, Wenyuan Xu, Yizhou Sun, Wei Wang, Qing Li, Ke Tang, Yu-Gang Jiang, Felix Juefei-Xu, Hui Xiong, Xiaofeng Wang, Dacheng Tao, Philip S. Yu, Qingsong Wen, Yang Liu

The remarkable success of Large Language Models (LLMs) has illuminated a promising pathway toward achieving Artificial General Intelligence for both academic and industrial communities, owing to their unprecedented performance across various applications. As LLMs continue to gain prominence in both research and commercial domains, their security and safety implications have become a growing concern, not only for researchers and corporations but also for every nation. Currently, existing surveys on LLM safety primarily focus on specific stages of the LLM lifecycle, e.g., the deployment phase or the fine-tuning phase, and lack a comprehensive understanding of the entire “lifechain” of LLMs. To address this gap, this paper introduces, for the first time, the concept of “full-stack” safety to systematically consider safety issues throughout the entire process of LLM training, deployment, and eventual commercialization. Compared to existing LLM safety surveys, our work demonstrates several distinctive advantages: (I) Comprehensive Perspective. We define the complete LLM lifecycle as encompassing data preparation, pre-training, post-training, deployment and final commercialization. To our knowledge, this represents the first safety survey to encompass the entire lifecycle of LLMs. (II) Extensive Literature Support. Our research is grounded in an exhaustive review of over 800 papers, ensuring comprehensive coverage and systematic organization of security issues within a more holistic understanding. (III) Unique Insights. Through systematic literature analysis, we have developed reliable roadmaps and perspectives for each chapter. Our work identifies promising research directions, including safety in data generation, alignment techniques, model editing, and LLM-based agent systems. These insights provide valuable guidance for researchers pursuing future work in this field.



Article 1166

Title@2025-06-09 (1): Theoretical Benefit and Limitation of Diffusion Language Model

Title: Theoretical Benefit and Limitation of Diffusion Language Model Theoretischer Nutzen und Begrenzung des Diffusions-Sprachmodells 传播语言模式理论效益和限制 2502.09622v2

Authors (6): Guhao Feng, Yihan Geng, Jian Guan, Wei Wu, Liwei Wang, Di He

Diffusion language models have emerged as a promising approach for text generation. One would naturally expect this method to be an efficient replacement for autoregressive models since multiple tokens can be sampled in parallel during each diffusion step. However, its efficiency-accuracy trade-off is not yet well understood. In this paper, we present a rigorous theoretical analysis of a widely used type of diffusion language model, the Masked Diffusion Model (MDM), and find that its effectiveness heavily depends on the target evaluation metric. Under mild conditions, we prove that when using perplexity as the metric, MDMs can achieve near-optimal perplexity in sampling steps regardless of sequence length, demonstrating that efficiency can be achieved without sacrificing performance. However, when using the sequence error rate, which is important for understanding the “correctness” of a sequence such as a reasoning chain, we show that the required sampling steps must scale linearly with sequence length to obtain “correct” sequences, thereby eliminating MDM’s efficiency advantage over autoregressive models. Our analysis establishes the first theoretical foundation for understanding the benefits and limitations of MDMs. All theoretical findings are supported by empirical studies.
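To make the gap between a per-token metric and the sequence error rate concrete, here is a toy calculation. It assumes every token is wrong independently with the same probability, which is an illustrative simplification and not the model analyzed in the paper; the function name and the sample values are hypothetical.

```python
def sequence_error_rate(per_token_error: float, length: int) -> float:
    """Probability that at least one of `length` tokens is wrong.

    Assumes independent, identically likely token errors (illustrative only).
    """
    return 1.0 - (1.0 - per_token_error) ** length


# With a fixed per-token error, the chance of a fully correct sequence
# shrinks as the sequence gets longer.
for length in (16, 128, 1024):
    print(length, round(sequence_error_rate(0.01, length), 4))
```

Under this assumption a small per-token error still compounds over long sequences, which gives some intuition for why a sequence-level metric can demand more sampling steps than a token-level one.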



Article 1167

Title@2025-06-09 (1): MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing

Title: MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing MoE-GPS: Guidlines für Vorhersagestrategie für dynamische Expertenduplikation in MoE Load Balancing MoE-GPS:教育部负载平衡中动态专家重叠预测战略指南 2506.07366v1

Authors (3): Haiyue Ma, Zhixu Du, Yiran Chen

In a multi-GPU Mixture-of-Experts (MoE) network, experts are distributed across different GPUs, which creates load imbalance because each expert processes a different number of tokens. Recent works improve MoE inference load balance by dynamically duplicating popular experts onto more GPUs to process the excess tokens, which requires predicting the token distribution before routing. In this paper, we discuss the trade-offs among prediction strategies, accuracy, overhead, and end-to-end system performance. We propose MoE-GPS, a framework that guides the selection of the optimal predictor design under various system configurations by quantifying its impact on system-level model runtime. Specifically, we advocate for Distribution-Only Prediction, a strategy that predicts only the overall token distribution and thereby significantly reduces overhead compared to the traditional Token-to-Expert Prediction. On Mixtral 8x7B with the MMLU dataset, MoE-GPS suggests Distribution-Only Prediction, which improves end-to-end inference performance by more than 23% compared with Token-to-Expert Prediction.
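The idea of duplicating a popular expert from a predicted token distribution can be sketched as follows. This is a minimal illustration under stated assumptions: the greedy rule, the even split of tokens across replicas, and all names (`gpu_loads`, `duplicate_hottest`, the expert labels) are hypothetical and are not taken from MoE-GPS.

```python
from collections import Counter


def gpu_loads(token_counts, placement, num_gpus):
    """Expected tokens per GPU when an expert's tokens are split evenly over its replicas."""
    loads = [0.0] * num_gpus
    for expert, count in token_counts.items():
        gpus = placement[expert]
        for gpu in gpus:
            loads[gpu] += count / len(gpus)
    return loads


def duplicate_hottest(token_counts, placement, num_gpus):
    """Copy the most popular expert onto the least-loaded GPU (assumed greedy rule)."""
    loads = gpu_loads(token_counts, placement, num_gpus)
    hottest = max(token_counts, key=token_counts.get)
    target = min(range(num_gpus), key=loads.__getitem__)
    new_placement = {expert: list(gpus) for expert, gpus in placement.items()}
    if target not in new_placement[hottest]:
        new_placement[hottest].append(target)
    return new_placement


# Only the overall distribution is needed here: expected tokens per expert,
# not a per-token expert assignment.
predicted = Counter({"e0": 700, "e1": 100, "e2": 100, "e3": 100})
placement = {"e0": [0], "e1": [1], "e2": [2], "e3": [3]}

before = gpu_loads(predicted, placement, 4)
after = gpu_loads(predicted, duplicate_hottest(predicted, placement, 4), 4)
print(max(before), max(after))
```

The sketch only needs aggregate counts per expert, which is the kind of input a distribution-level predictor would supply.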



Article 1168

Title@2025-06-09 (1): Multiple Object Stitching for Unsupervised Representation Learning

Title: Multiple Object Stitching for Unsupervised Representation Learning Mehrere Objektstiche für unüberwachtes Repräsentationslernen 用于无人监督的代表学习的多对象拼接 2506.07364v1

Authors (3): Chengchao Shen, Dawei Liu, Jianxin Wang

Contrastive learning for single-object-centric images has achieved remarkable progress in unsupervised representation learning, but it suffers from inferior performance on the widespread images containing multiple objects. In this paper, we propose a simple but effective method, Multiple Object Stitching (MOS), to refine the unsupervised representation for multi-object images. Specifically, we construct multi-object images by stitching together single-object-centric ones, so that the objects in the synthesized multi-object images are predetermined. Hence, compared to existing contrastive methods, our method provides additional object correspondences between multi-object images without human annotations. In this manner, our method pays more attention to the representation of each object in a multi-object image, thus providing more detailed representations for complicated downstream tasks, such as object detection and semantic segmentation. Experimental results on the ImageNet, CIFAR and COCO datasets demonstrate that our proposed method achieves leading unsupervised representation performance on both single-object-centric images and multi-object ones. The source code is available at https://github.com/visresearch/MultipleObjectStitching.
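The stitching step can be pictured with a small sketch that tiles four single-object images into one 2x2 image and records which object sits in which cell. It uses nested lists instead of real image tensors, and the 2x2 layout, function names, and object ids are illustrative assumptions; the authors' implementation is in the repository linked above.

```python
def make_image(value, size=2):
    """Stand-in for a single-object image: a size x size grid filled with the object id."""
    return [[value] * size for _ in range(size)]


def stitch_2x2(images):
    """Tile four equally sized H x W images into one 2H x 2W image."""
    top_left, top_right, bottom_left, bottom_right = images
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    return top + bottom


object_ids = [3, 7, 1, 5]
stitched = stitch_2x2([make_image(i) for i in object_ids])

# Because the image is synthesized, the object in each cell is known without annotation.
layout = dict(zip(["top_left", "top_right", "bottom_left", "bottom_right"], object_ids))

for row in stitched:
    print(row)
print(layout)
```

Two stitched images that share an object id in any cell give a known object correspondence, which is the extra supervision signal the abstract describes.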



Article 1169

Title@2025-06-09 (1): Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding

Title: Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding Stochastische nichtlineare Steuerung über finite-dimensionale Spektral-Dynamische Einbettung 通过有限维谱光谱动态嵌入器控制非线性控件 2304.03907v5

Authors (5): Zhaolin Ren, Tongzheng Ren, Haitong Ma, Na Li, Bo Dai

This paper proposes an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear stochastic systems. This method reveals an infinite-dimensional feature representation induced by the system’s nonlinear stochastic dynamics, enabling a linear representation of the state-action value function. For practical implementation, this representation is approximated using finite-dimensional truncations, specifically via two prominent kernel approximation methods: random feature truncation and Nyström approximation. To assess the effectiveness of these approximations, we provide an in-depth theoretical analysis characterizing the approximation error arising from the finite-dimensional truncation and the statistical error due to finite-sample approximation in both policy evaluation and policy optimization. Empirically, our algorithm performs favorably against existing stochastic control algorithms on several benchmark problems.



Article 1170

Title@2025-06-09 (1): Lightweight Joint Audio-Visual Deepfake Detection via Single-Stream Multi-Modal Learning Framework

Title: Lightweight Joint Audio-Visual Deepfake Detection via Single-Stream Multi-Modal Learning Framework Leichte gemeinsame Audio-Visuelle Deepfake-Erkennung über Multi-Modal-Lernrahmen mit Single-Stream 通过单层多模式学习框架,通过单一链式多模式学习框架,联合探测音频-视觉深海假相 2506.07358v1

Authors (5): Kuiyuan Zhang, Wenjie Pei, Rushi Lan, Yifang Guo, Zhongyun Hua

Deepfakes are AI-synthesized multimedia data that may be abused for spreading misinformation. Deepfake generation involves both visual and audio manipulation. To detect audio-visual deepfakes, previous studies commonly employ two relatively independent sub-models to learn audio and visual features, respectively, and fuse them subsequently for deepfake detection. However, this may underutilize the inherent correlations between audio and visual features. Moreover, utilizing two isolated feature learning sub-models can result in redundant neural layers, making the overall model inefficient and impractical for resource-constrained environments. In this work, we design a lightweight network for audio-visual deepfake detection via a single-stream multi-modal learning framework. Specifically, we introduce a collaborative audio-visual learning block to efficiently integrate multi-modal information while learning the visual and audio features. By iteratively employing this block, our single-stream network achieves a continuous fusion of multi-modal features across its layers. Thus, our network efficiently captures visual and audio features without the need for excessive block stacking, resulting in a lightweight network design. Furthermore, we propose a multi-modal classification module that can boost the dependence of the visual and audio classifiers on modality content. It also enhances the overall robustness of the video classifier to mismatches between audio and visual modalities. We conduct experiments on the DF-TIMIT, FakeAVCeleb, and DFDC benchmark datasets. Compared to state-of-the-art audio-visual joint detection methods, our method is significantly more lightweight, with only 0.48M parameters, yet it achieves superior performance on both uni-modal and multi-modal deepfakes, as well as on unseen types of deepfakes.



Article 1171

Title@2025-06-09 (1): CBAM-STN-TPS-YOLO: Enhancing Agricultural Object Detection through Spatially Adaptive Attention Mechanisms

Title: CBAM-STN-TPS-YOLO: Enhancing Agricultural Object Detection through Spatially Adaptive Attention Mechanisms CBAM-STN-TPS-YOLO: Verbesserung der Erkennung landwirtschaftlicher Objekte durch räumlich adaptive Aufmerksamkeitsmechanismen CBAM-STN-TPS-YOLO:通过空间适应性关注机制加强农业物体的探测 2506.07357v1

Authors (2): Satvik Praveen, Yoonsung Jung

Object detection is vital in precision agriculture for plant monitoring, disease detection, and yield estimation. However, models like YOLO struggle with occlusions, irregular structures, and background noise, reducing detection accuracy. While Spatial Transformer Networks (STNs) improve spatial invariance through learned transformations, affine mappings are insufficient for non-rigid deformations such as bent leaves and overlaps. We propose CBAM-STN-TPS-YOLO, a model integrating Thin-Plate Splines (TPS) into STNs for flexible, non-rigid spatial transformations that better align features. Performance is further enhanced by the Convolutional Block Attention Module (CBAM), which suppresses background noise and emphasizes relevant spatial and channel-wise features. On the occlusion-heavy Plant Growth and Phenotyping (PGP) dataset, our model outperforms STN-YOLO in precision, recall, and mAP. It achieves a 12% reduction in false positives, highlighting the benefits of improved spatial flexibility and attention-guided refinement. We also examine the impact of the TPS regularization parameter in balancing transformation smoothness and detection performance. This lightweight model improves spatial awareness and supports real-time edge deployment, making it ideal for smart farming applications requiring accurate and efficient monitoring.



Article 1172

Title@2025-06-09 (1): Unveiling and Addressing Pseudo Forgetting in Large Language Models

Title: Unveiling and Addressing Pseudo Forgetting in Large Language Models Enthüllung und Adressierung von Pseudo Forgetting in großen Sprachmodellen 在大语言模式中统一和解决 “ 优于 “ 的遗忘 2411.11932v2

Authors (5): Huashan Sun, Yizhe Yang, Yinghao Li, Jiawei Li, Yang Gao

Although substantial efforts have been made to mitigate catastrophic forgetting in continual learning, the intrinsic mechanisms are not well understood. In this work, we demonstrate the existence of “pseudo forgetting”: the performance degradation on previous tasks is not attributable to a loss of capabilities, but rather to the failure of the instructions to activate the appropriate model abilities. We show that the model’s performance on previous tasks can be restored through two simple interventions: (1) providing a partial, externally supplied correct rationale, and (2) appending semantically meaningless suffixes to the original instructions to guide the generation of correct rationales. Through empirical analysis of the internal mechanisms governing rationale generation, we reveal that models exhibiting pseudo forgetting show reduced instruction dependence during rationale generation, leading to suboptimal activation of their inherent capabilities. Based on this insight, we propose the Rationale-Guidance Difficulty based Replay (RGD-R) framework, which dynamically allocates replay data based on the model’s ability to correctly leverage its intrinsic capabilities. Experimental results demonstrate that RGD-R effectively mitigates pseudo forgetting while maintaining model plasticity.



Article 1173

Title@2025-06-09 (1): SALT: A Lightweight Model Adaptation Method for Closed Split Computing Environments

Title: SALT: A Lightweight Model Adaptation Method for Closed Split Computing Environments SALT: Eine leichte Modellanpassungsmethode für geschlossene Split Computing-Umgebungen SALT: 用于封闭式拆分计算环境的轻量模型适应方法 2506.07355v1

Authors (2): Yuya Okada, Takayuki Nishio

We propose SALT (Split-Adaptive Lightweight Tuning), a lightweight model adaptation framework for Split Computing under closed constraints, where the head and tail networks are proprietary and inaccessible to users. In such closed environments, conventional adaptation methods are infeasible since they require access to model parameters or architectures. SALT addresses this challenge by introducing a compact, trainable adapter on the client side to refine latent features from the head network, enabling user-specific adaptation without modifying the original models or increasing communication overhead. We evaluate SALT on user-specific classification tasks with CIFAR-10 and CIFAR-100, demonstrating improved accuracy with lower training latency compared to fine-tuning methods. Furthermore, SALT facilitates model adaptation for robust inference over lossy networks, a common challenge in edge-cloud environments. With minimal deployment overhead, SALT offers a practical solution for personalized inference in edge AI systems under strict system constraints.



Article 1174

Title@2025-06-09 (1): Task Generalization With AutoRegressive Compositional Structure: Can Learning From $D$ Tasks Generalize to $D^{T}$ Tasks?

Title: Task Generalization With AutoRegressive Compositional Structure: Can Learning From $D$ Tasks Generalize to $D^{T}$ Tasks? Aufgabenverallgemeinerung mit autoregressiver Kompositionsstruktur: Kann man von $D$ Aufgaben zu $D^{T}$ Aufgaben verallgemeinern? 与自动递减的构成结构相关的一般任务:能否从“$D”的任务向“$D”的任务普遍学习到“$D”的任务? 2502.08991v2

Authors (6): Amirhesam Abedsoltan, Huaqing Zhang, Kaiyue Wen, Hongzhou Lin, Jingzhao Zhang, Mikhail Belkin

Large language models (LLMs) exhibit remarkable task generalization, solving tasks they were never explicitly trained on with only a few demonstrations. This raises a fundamental question: When can learning from a small set of tasks generalize to a large task family? In this paper, we investigate task generalization through the lens of autoregressive compositional structure, where each task is a composition of $T$ operations, and each operation is among a finite family of $D$ subtasks. This yields a total class of size $D^T$. We first show that generalization to all $D^T$ tasks is theoretically achievable by training on only $\widetilde{O}(D)$ tasks. Empirically, we demonstrate that Transformers achieve such exponential task generalization on sparse parity functions via In-context Learning (ICL) and chain-of-thought (CoT) reasoning. We further show generalization in arithmetic and translation, beyond parity functions.
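The size of the task family is easy to check with a toy enumeration. The sketch below treats a task as a chain of $T$ operations, each chosen from $D$ subtasks; the three arithmetic subtasks and the plain function composition are illustrative assumptions and not the paper's formal setup.

```python
from itertools import product


def compose(ops):
    """Build one task by applying the chosen operations in order."""
    def task(x):
        for op in ops:
            x = op(x)
        return x
    return task


# D = 3 hypothetical subtasks, T = 4 steps per task.
subtasks = [lambda x: x + 1, lambda x: x * 2, lambda x: -x]
T = 4

tasks = [compose(ops) for ops in product(subtasks, repeat=T)]
print(len(tasks), len(subtasks) ** T)
```

The family grows exponentially in $T$ while the number of distinct building blocks stays at $D$, which is the structure that makes generalization from roughly $D$ training tasks plausible.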



Article 1175

Title@2025-06-09 (1): Decentralized Optimization on Compact Submanifolds by Quantized Riemannian Gradient Tracking

Title: Decentralized Optimization on Compact Submanifolds by Quantized Riemannian Gradient Tracking Dezentralisierte Optimierung auf kompakten Submanifolds durch Quantized Riemannian Gradient Tracking 量化的里伊曼尼梯度跟踪对契约子部件的优化 2506.07351v1

Authors (7): Jun Chen, Lina Liu, Tianyi Zhu, Yong Liu, Guang Dai, Yunliang Jiang, Ivor W. Tsang

This paper considers the problem of decentralized optimization on compact submanifolds, where a finite sum of smooth (possibly non-convex) local functions is minimized by $n$ agents forming an undirected and connected graph. However, the efficiency of distributed optimization is often hindered by communication bottlenecks. To mitigate this, we propose the Quantized Riemannian Gradient Tracking (Q-RGT) algorithm, where agents update their local variables using quantized gradients. The introduction of quantization noise allows our algorithm to bypass the constraints of the accurate Riemannian projection operator (such as retraction), further improving iterative efficiency. To the best of our knowledge, this is the first algorithm to achieve an $\mathcal{O}(1/K)$ convergence rate in the presence of quantization, matching the convergence rate of methods without quantization. Additionally, we explicitly derive lower bounds on decentralized consensus associated with a function of quantization levels. Numerical experiments demonstrate that Q-RGT performs comparably to non-quantized methods while reducing communication bottlenecks and computational overhead.
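As an illustration of what sending quantized gradients can look like, here is a generic unbiased stochastic quantizer. The abstract does not specify the quantizer used by Q-RGT, so the rounding rule, the `step` parameter, and the function name are assumptions made for this sketch.

```python
import math
import random


def stochastic_quantize(values, step, rng):
    """Round each entry to a multiple of `step`.

    The entry is rounded up with probability equal to its fractional position
    between the two neighboring grid points, so the result is unbiased in expectation.
    """
    quantized = []
    for value in values:
        low = math.floor(value / step)
        p_up = value / step - low
        quantized.append((low + (1 if rng.random() < p_up else 0)) * step)
    return quantized


rng = random.Random(0)
gradient = [0.37, -0.10, 0.50, 1.23]
print(stochastic_quantize(gradient, 0.25, rng))
```

Each agent would then transmit only the grid indices instead of full-precision values, trading a bounded rounding error for less communication.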



Article 1176

Title@2025-06-09 (1): Distributed Risk-Sensitive Safety Filters for Uncertain Discrete-Time Systems

Title: Distributed Risk-Sensitive Safety Filters for Uncertain Discrete-Time Systems Verteilte risikosensible Sicherheitsfilter für unsichere Diskretzeitsysteme 用于不确定分辨时间系统的分散式风险敏感度安全过滤器 2506.07347v1

Authors (3): Armin Lederer, Erfaun Noorani, Andreas Krause

Ensuring safety in multi-agent systems is a significant challenge, particularly in settings where centralized coordination is impractical. In this work, we propose a novel risk-sensitive safety filter for discrete-time multi-agent systems with uncertain dynamics that leverages control barrier functions (CBFs) defined through value functions. Our approach relies on centralized risk-sensitive safety conditions based on exponential risk operators to ensure robustness against model uncertainties. We introduce a distributed formulation of the safety filter by deriving two alternative strategies: one based on worst-case anticipation and another on proximity to a known safe policy. By allowing agents to switch between strategies, feasibility can be ensured. Through detailed numerical evaluations, we demonstrate the efficacy of our approach in maintaining safety without being overly conservative.



Article 1177

Title@2025-06-09 (1): ST-GraphNet: A Spatio-Temporal Graph Neural Network for Understanding and Predicting Automated Vehicle Crash Severity

Title: ST-GraphNet: A Spatio-Temporal Graph Neural Network for Understanding and Predicting Automated Vehicle Crash Severity ST-GraphNet: Ein räumlich-temporales Neuralnetzwerk für das Verständnis und die Vorhersage von Automatisiertem Fahrzeug-Crash-Severity ST-GraphNet:一个了解和预测自动车辆撞车严重性的空间-时时图神经网络 2506.08051v1

Authors (5): Mahmuda Sultana Mimi, Md Monzurul Islam, Anannya Ghosh Tusti, Shriyank Somvanshi, Subasish Das

Understanding the spatial and temporal dynamics of automated vehicle (AV) crash severity is critical for advancing urban mobility safety and infrastructure planning. In this work, we introduce ST-GraphNet, a spatio-temporal graph neural network framework designed to model and predict AV crash severity by using both fine-grained and region-aggregated spatial graphs. Using a balanced dataset of 2,352 real-world AV-related crash reports from Texas (2024), including geospatial coordinates, crash timestamps, SAE automation levels, and narrative descriptions, we construct two complementary graph representations: (1) a fine-grained graph with individual crash events as nodes, where edges are defined via spatio-temporal proximity; and (2) a coarse-grained graph where crashes are aggregated into Hexagonal Hierarchical Spatial Indexing (H3)-based spatial cells, connected through hexagonal adjacency. Each node in the graph is enriched with multimodal data comprising semantic, spatial, and temporal attributes, including textual embeddings of crash narratives from a pretrained Sentence-BERT model. We evaluate various graph neural network (GNN) architectures, such as Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), and Dynamic Spatio-Temporal GCN (DSTGCN), to classify crash severity and predict high-risk regions. Our proposed ST-GraphNet, which utilizes a DSTGCN backbone on the coarse-grained H3 graph, achieves a test accuracy of 97.74%, substantially outperforming the best fine-grained model (64.7% test accuracy). These findings highlight the effectiveness of spatial aggregation, dynamic message passing, and multi-modal feature integration in capturing the complex spatio-temporal patterns underlying AV crash severity.



Article 1178

Title@2025-06-09 (1): A Comprehensive Survey on Artificial Intelligence for Complex Network: Potential, Methodology and Application

Title: A Comprehensive Survey on Artificial Intelligence for Complex Network: Potential, Methodology and Application Eine umfassende Umfrage zu Künstlicher Intelligenz für komplexes Netzwerk: Potenzial, Methodik und Anwendung 复杂网络人工情报综合调查:潜力、方法和应用 2402.16887v2

Authors (11): Jingtao Ding, Chang Liu, Yu Zheng, Yunke Zhang, Zihan Yu, Ruikun Li, Hongyi Chen, Jinghua Piao, Huandong Wang, Jiazhen Liu, Yong Li

Complex networks pervade various real-world systems, from the natural environment to human societies. The essence of these networks is in their ability to transition and evolve from microscopic disorder, where network topology and node dynamics intertwine, to a macroscopic order characterized by certain collective behaviors. Over the past two decades, complex network science has significantly enhanced our understanding of the statistical mechanics, structures, and dynamics underlying real-world networks. Despite these advancements, there remain considerable challenges in exploring more realistic systems and enhancing practical applications. The emergence of artificial intelligence (AI) technologies, coupled with the abundance of diverse real-world network data, has heralded a new era in complex network science research. This survey aims to systematically address the potential advantages of AI in overcoming the lingering challenges of complex network research. It endeavors to summarize the pivotal research problems and provide an exhaustive review of the corresponding methodologies and applications. Through this comprehensive survey, the first of its kind on AI for complex networks, we expect to provide valuable insights that will drive further research and advancement in this interdisciplinary field.



Article 1179

Title@2025-06-09 (1): On the Impact of Uncertainty and Calibration on Likelihood-Ratio Membership Inference Attacks

Title: On the Impact of Uncertainty and Calibration on Likelihood-Ratio Membership Inference Attacks Über die Auswirkungen von Unsicherheit und Kalibrierung auf Likelihood-Ratio-Mitgliedschafts-Inferenzangriffe 不确定性和校准对潜在――拉蒂奥成员推论攻击的影响 2402.10686v5

Authors (4): Meiyi Zhu, Caili Guo, Chunyan Feng, Osvaldo Simeone

In a membership inference attack (MIA), an attacker exploits the overconfidence exhibited by typical machine learning models to determine whether a specific data point was used to train a target model. In this paper, we analyze the performance of the likelihood ratio attack (LiRA) within an information-theoretical framework that allows the investigation of the impact of the aleatoric uncertainty in the true data generation process, of the epistemic uncertainty caused by a limited training data set, and of the calibration level of the target model. We compare three different settings, in which the attacker receives decreasingly informative feedback from the target model: confidence vector (CV) disclosure, in which the output probability vector is released; true label confidence (TLC) disclosure, in which only the probability assigned to the true label is made available by the model; and decision set (DS) disclosure, in which an adaptive prediction set is produced as in conformal prediction. We derive bounds on the advantage of an MIA adversary with the aim of offering insights into the impact of uncertainty and calibration on the effectiveness of MIAs. Simulation results demonstrate that the derived analytical bounds predict well the effectiveness of MIAs.



Article 1180

Title@2025-06-09 (1): Calibrated Value-Aware Model Learning with Probabilistic Environment Models

Title: Calibrated Value-Aware Model Learning with Probabilistic Environment Models Kalibriertes Value-Aware-Modell-Lernen mit probabilistischen Umweltmodellen 使用概率环境模型校准的价值软件模型学习 2505.22772v2

Authors (6): Claas Voelcker, Anastasiia Pedan, Arash Ahmadian, Romina Abachi, Igor Gilitschenski, Amir-massoud Farahmand

The idea of value-aware model learning, that models should produce accurate value estimates, has gained prominence in model-based reinforcement learning. The MuZero loss, which penalizes a model’s value function prediction compared to the ground-truth value function, has been utilized in several prominent empirical works in the literature. However, theoretical investigation into its strengths and weaknesses is limited. In this paper, we analyze the family of value-aware model learning losses, which includes the popular MuZero loss. We show that these losses, as normally used, are uncalibrated surrogate losses, which means that they do not always recover the correct model and value function. Building on this insight, we propose corrections to solve this issue. Furthermore, we investigate the interplay between the loss calibration, latent model architectures, and auxiliary losses that are commonly employed when training MuZero-style agents. We show that while deterministic models can be sufficient to predict accurate values, learning calibrated stochastic models is still advantageous.



Article 1181

Title@2025-06-09 (1): Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models

Title: Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models Membership Inferenz Attack sollte weiter zu Verteilungsstatistiken für destillierte Generative Modelle 成员攻击的推论应转向已蒸馏生成模型的分发统计数据 2502.02970v2

Authors (6): Muxing Li, Zesheng Ye, Yixuan Li, Andy Song, Guangquan Zhang, Feng Liu

To detect unauthorized data usage in training large-scale generative models (e.g., ChatGPT or Midjourney), membership inference attacks (MIA) have proven effective in distinguishing a single training instance (a member) from a single non-training instance (a non-member). This success is mainly credited to a memorization effect: models tend to perform better on a member than a non-member. However, we find that standard MIAs fail against distilled generative models (i.e., student models) that are increasingly deployed in practice for efficiency (e.g., ChatGPT 4o-mini). Trained exclusively on data generated from a large-scale model (a teacher model), the student model lacks direct exposure to any members (teacher’s training data), nullifying the memorization effect that standard MIAs rely on. This finding reveals a serious privacy loophole, where generation-service providers could deploy a student model whose teacher was potentially trained on unauthorized data, yet claim the deployed model is clean because it was not directly trained on such data. Hence, are distilled models inherently unauditable for upstream privacy violations, and should we discard them when we care about privacy? We contend no, as we uncover a memory chain connecting the student and teacher’s member data: the distribution of student-generated data aligns more closely with the distribution of the teacher’s members than with non-members, thus we can detect unauthorized data usage even when direct instance-level memorization is absent. This leads us to posit that MIAs on distilled generative models should shift from instance-level scores to distribution-level statistics. We further propose three principles of distribution-based MIAs for detecting unauthorized training data through distilled generative models, and validate our position through an exemplar framework. We lastly discuss the implications of our position.



Article 1182

Title@2025-06-09 (1): Real-Time Execution of Action Chunking Flow Policies

Title: Real-Time Execution of Action Chunking Flow Policies Real-Time Ausführung von Maßnahmen Chunking Flow Richtlinien 实时执行 “ 快速行动 “ 流动政策 2506.07339v1

Authors (3): Kevin Black, Manuel Y. Galliker, Sergey Levine

Modern AI systems, especially those interacting with the physical world, increasingly require real-time performance. However, the high latency of state-of-the-art generalist models, including recent vision-language action models (VLAs), poses a significant challenge. While action chunking has enabled temporal consistency in high-frequency control tasks, it does not fully address the latency problem, leading to pauses or out-of-distribution jerky movements at chunk boundaries. This paper presents a novel inference-time algorithm that enables smooth asynchronous execution of action chunking policies. Our method, real-time chunking (RTC), is applicable to any diffusion- or flow-based VLA out of the box with no re-training. It generates the next action chunk while executing the current one, “freezing” actions guaranteed to execute and “inpainting” the rest. To test RTC, we introduce a new benchmark of 12 highly dynamic tasks in the Kinetix simulator, as well as evaluate 6 challenging real-world bimanual manipulation tasks. Results demonstrate that RTC is fast, performant, and uniquely robust to inference delay, significantly improving task throughput and enabling high success rates in precise tasks, such as lighting a match, even in the presence of significant latency. See https://pi.website/research/real_time_chunking for videos.



Article 1183

Title@2025-06-09 (1): Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models

Title: Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models Graph-KV: Durchbrechen der Sequenz durch Einspritzen struktureller Biasen in große Sprachmodelle 图图-KV:通过向大语言模型输入结构分界来打破序列 2506.07334v1

Authors (7): Haoyu Wang, Peihao Wang, Mufei Li, Shikun Liu, Siqi Miao, Zhangyang Wang, Pan Li

Modern large language models (LLMs) are inherently auto-regressive, requiring input to be serialized into flat sequences regardless of their structural dependencies. This serialization hinders the model’s ability to leverage structural inductive biases, especially in tasks such as retrieval-augmented generation (RAG) and reasoning on data with native graph structures, where inter-segment dependencies are crucial. We introduce Graph-KV with the potential to overcome this limitation. Graph-KV leverages the KV-cache of text segments as condensed representations and governs their interaction through structural inductive biases. In this framework, ‘target’ segments selectively attend only to the KV-caches of their designated ‘source’ segments, rather than all preceding segments in a serialized sequence. This approach induces a graph-structured block mask, sparsifying attention and enabling a message-passing-like step within the LLM. Furthermore, strategically allocated positional encodings for source and target segments reduce positional bias and context window consumption. We evaluate Graph-KV across three scenarios: (1) seven RAG benchmarks spanning direct inference, multi-hop reasoning, and long-document understanding; (2) Arxiv-QA, a novel academic paper QA task with full-text scientific papers structured as citation ego-graphs; and (3) paper topic classification within a citation network. By effectively reducing positional bias and harnessing structural inductive biases, Graph-KV substantially outperforms baselines, including standard costly sequential encoding, across various settings. Code and the Graph-KV data are publicly available.
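The graph-structured block mask can be sketched at the token level. In this minimal illustration, a token attends causally within its own segment and to every token of the segments listed as its sources, and to nothing else. The segment lengths, the `sources` mapping, and the function name are hypothetical, and details of the actual method (such as how source segments are encoded and how positions are assigned) are not modeled.

```python
def block_mask(segment_lengths, sources):
    """Return mask[q][k] == True when query token q may attend to key token k.

    `sources` maps a target segment index to the set of segment indices it may read.
    """
    segment_of = [seg for seg, n in enumerate(segment_lengths) for _ in range(n)]
    total = len(segment_of)
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if segment_of[q] == segment_of[k]:
                mask[q][k] = k <= q  # causal attention inside a segment
            else:
                mask[q][k] = segment_of[k] in sources.get(segment_of[q], ())
    return mask


# Three segments of 2, 2 and 3 tokens; segment 2 is a target that reads only segment 0.
mask = block_mask([2, 2, 3], {2: {0}})
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Compared with a full causal mask over the serialized sequence, the target segment here never sees segment 1, which is the sparsification the abstract refers to.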



Article 1184

Title@2025-06-09 (1): Model Generalization on Text Attribute Graphs: Principles with Large Language Models

Title: Model Generalization on Text Attribute Graphs: Principles with Large Language Models Modellverallgemeinerung auf Text-Attribut-Grafiken: Prinzipien mit großen Sprachmodellen 文本属性图的模型通用化:大语言模型原则 2502.11836v2

Authors (4): Haoyu Wang, Shikun Liu, Rongzhe Wei, Pan Li

Large language models (LLMs) have recently been introduced to graph learning, aiming to extend their zero-shot generalization success to tasks where labeled graph data is scarce. Among these applications, inference over text-attributed graphs (TAGs) presents unique challenges: existing methods struggle with LLMs’ limited context length for processing large node neighborhoods and the misalignment between node embeddings and the LLM token space. To address these issues, we establish two key principles for ensuring generalization and derive the framework LLM-BP accordingly: (1) Unifying the attribute space with task-adaptive embeddings, where we leverage LLM-based encoders and task-aware prompting to enhance generalization of the text attribute embeddings; (2) Developing a generalizable graph information aggregation mechanism, for which we adopt belief propagation with LLM-estimated parameters that adapt across graphs. Evaluations on 11 real-world TAG benchmarks demonstrate that LLM-BP significantly outperforms existing approaches, achieving 8.10% improvement with task-conditional embeddings and an additional 1.71% gain from adaptive aggregation. The code and task-adaptive embeddings are publicly available.



Article 1185

Title@2025-06-09 (1): JavelinGuard: Low-Cost Transformer Architectures for LLM Security

Title: JavelinGuard: Low-Cost Transformer Architectures for LLM Security JavelinGuard: Low-Cost-Transformer-Architekturen für LLM-Sicherheit JavelinGuard:低成本变革结构促进LLM安全 2506.07330v1

Authors (2): Yash Datta, Sharath Rajasekar

We present JavelinGuard, a suite of low-cost, high-performance model architectures designed for detecting malicious intent in Large Language Model (LLM) interactions, optimized specifically for production deployment. Recent advances in transformer architectures, including compact BERT (Devlin et al. 2019) variants (e.g., ModernBERT (Warner et al. 2024)), allow us to build highly accurate classifiers with as few as approximately 400M parameters that achieve rapid inference speeds even on standard CPU hardware. We systematically explore five progressively sophisticated transformer-based architectures: Sharanga (baseline transformer classifier), Mahendra (enhanced attention-weighted pooling with deeper heads), Vaishnava and Ashwina (hybrid neural ensemble architectures), and Raudra (an advanced multi-task framework with specialized loss functions). Our models are rigorously benchmarked across nine diverse adversarial datasets, including popular sets like the NotInject series, BIPIA, Garak, ImprovedLLM, ToxicChat, WildGuard, and our newly introduced JavelinBench, specifically crafted to test generalization on challenging borderline and hard-negative cases. Additionally, we compare our architectures against leading open-source guardrail models as well as large decoder-only LLMs such as gpt-4o, demonstrating superior cost-performance trade-offs in terms of accuracy and latency. Our findings reveal that while Raudra’s multi-task design offers the most robust performance overall, each architecture presents unique trade-offs in speed, interpretability, and resource requirements, guiding practitioners in selecting the optimal balance of complexity and efficiency for real-world LLM security applications.



Article 1186

Title@2025-06-09 (1): Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning

Title: Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning Dynamische Planung für Fahrzeug-zu-Fahrzeug-Kommunikation verbessertes Federated Learning 车辆到车辆的交通通信增强联邦学习的动态日程安排 2406.17470v2

Authors (6): Jintao Yan, Tan Chen, Yuxuan Sun, Zhaojun Nan, Sheng Zhou, Zhisheng Niu

Leveraging the computing and sensing capabilities of vehicles, vehicular federated learning (VFL) has been applied to edge training for connected vehicles. The dynamic and interconnected nature of vehicular networks presents unique opportunities to harness direct vehicle-to-vehicle (V2V) communications, enhancing VFL training efficiency. In this paper, we formulate a stochastic optimization problem to optimize the VFL training performance, considering the energy constraints and mobility of vehicles, and propose a V2V-enhanced dynamic scheduling (VEDS) algorithm to solve it. The model aggregation requirements of VFL and the limited transmission time due to mobility result in a stepwise objective function, which presents challenges in solving the problem. We thus propose a derivative-based drift-plus-penalty method to convert the long-term stochastic optimization problem to an online mixed integer nonlinear programming (MINLP) problem, and provide a theoretical analysis to bound the performance gap between the online solution and the offline optimal solution. Further analysis of the scheduling priority reduces the original problem into a set of convex optimization problems, which are efficiently solved using the interior-point method. Experimental results demonstrate that compared with the state-of-the-art benchmarks, the proposed algorithm enhances the image classification accuracy on the CIFAR-10 dataset by 4.20% and reduces the average displacement errors on the Argoverse trajectory prediction dataset by 9.82%.



Article 1187

Title@2025-06-08 (7): Mobility-Aware Asynchronous Federated Learning with Dynamic Sparsification

Title: Mobility-Aware Asynchronous Federated Learning with Dynamic Sparsification Mobilität-Bewusst Asynchrones Federated Learning mit dynamischer Sparsifikation 具有动态分级功能的流动意识非同步联邦学习 2506.07328v1

Authors (6): Jintao Yan, Tan Chen, Yuxuan Sun, Zhaojun Nan, Sheng Zhou, Zhisheng Niu

Asynchronous Federated Learning (AFL) enables distributed model training across multiple mobile devices, allowing each device to independently update its local model without waiting for others. However, device mobility introduces intermittent connectivity, which necessitates gradient sparsification and leads to model staleness, jointly affecting AFL convergence. This paper develops a theoretical model to characterize the interplay among sparsification, model staleness and mobility-induced contact patterns, and their joint impact on AFL convergence. Based on the analysis, we propose a mobility-aware dynamic sparsification (MADS) algorithm that optimizes the sparsification degree based on contact time and model staleness. Closed-form solutions are derived, showing that under low-speed conditions, MADS increases the sparsification degree to enhance convergence, while under high-speed conditions, it reduces the sparsification degree to guarantee reliable uploads within limited contact time. Experimental results validate the theoretical findings. Compared with the state-of-the-art benchmarks, the MADS algorithm increases the image classification accuracy on the CIFAR-10 dataset by 8.76% and reduces the average displacement error in the Argoverse trajectory prediction dataset by 9.46%.



Article 1188

Title@2025-06-08 (7): Reward Model Interpretability via Optimal and Pessimal Tokens

Title: Reward Model Interpretability via Optimal and Pessimal Tokens Reward Model Interpretierbarkeit über Optimale und Pessimal Token 通过最佳和微量调制调来解释性 2506.07326v1

Authors (5): Brian Christian, Hannah Rose Kirk, Jessica A. F. Thompson, Christopher Summerfield, Tsvetomira Dumbalska

Reward modeling has emerged as a crucial component in aligning large language models with human values. Significant attention has focused on using reward models as a means for fine-tuning generative models. However, the reward models themselves – which directly encode human value judgments by turning prompt-response pairs into scalar rewards – remain relatively understudied. We present a novel approach to reward model interpretability through exhaustive analysis of their responses across their entire vocabulary space. By examining how different reward models score every possible single-token response to value-laden prompts, we uncover several striking findings: (i) substantial heterogeneity between models trained on similar objectives, (ii) systematic asymmetries in how models encode high- vs low-scoring tokens, (iii) significant sensitivity to prompt framing that mirrors human cognitive biases, and (iv) overvaluation of more frequent tokens. We demonstrate these effects across ten recent open-source reward models of varying parameter counts and architectures. Our results challenge assumptions about the interchangeability of reward models, as well as their suitability as proxies of complex and context-dependent human values. We find that these models can encode concerning biases toward certain identity groups, which may emerge as unintended consequences of harmlessness training – distortions that risk propagating through the downstream large language models now deployed to millions.



Article 1189

Title@2025-06-08 (7): Devanagari Digit Recognition using Quantum Machine Learning

Title: Devanagari Digit Recognition using Quantum Machine Learning Devanagari Digit Recognition mit Quantum Machine Learning Devanagari Digit 使用量子机器学习识别 2506.09069v1

Authors (1): Sahaj Raj Malla

Handwritten digit recognition in regional scripts, such as Devanagari, is crucial for multilingual document digitization, educational tools, and the preservation of cultural heritage. The script’s complex structure and limited annotated datasets pose significant challenges to conventional models. This paper introduces the first hybrid quantum-classical architecture for Devanagari handwritten digit recognition, combining a convolutional neural network (CNN) for spatial feature extraction with a 10-qubit variational quantum circuit (VQC) for quantum-enhanced classification. Trained and evaluated on the Devanagari Handwritten Character Dataset (DHCD), the proposed model achieves a state-of-the-art test accuracy for quantum implementation of 99.80% and a test loss of 0.2893, with an average per-class F1-score of 0.9980. Compared to equivalent classical CNNs, our model demonstrates superior accuracy with significantly fewer parameters and enhanced robustness. By leveraging quantum principles such as superposition and entanglement, this work establishes a novel benchmark for regional script recognition, highlighting the promise of quantum machine learning (QML) in real-world, low-resource language settings.



Article 1190

Title@2025-06-08 (7): DEF: Diffusion-augmented Ensemble Forecasting

Title: DEF: Diffusion-augmented Ensemble Forecasting DEF: Diffusionsgenerierte Ensemble-Prognose DEF: 扩散增强的集合预测 2506.07324v1

Authors (4): David Millard, Arielle Carr, Stéphane Gaudreault, Ali Baheri

We present DEF (Diffusion-augmented Ensemble Forecasting), a novel approach for generating initial condition perturbations. Modern approaches to initial condition perturbations are primarily designed for numerical weather prediction (NWP) solvers, limiting their applicability in the rapidly growing field of machine learning for weather prediction. Consequently, stochastic models in this domain are often developed on a case-by-case basis. We demonstrate that a simple conditional diffusion model can (1) generate meaningful structured perturbations, (2) be applied iteratively, and (3) utilize a guidance term to intuitively control the level of perturbation. This method enables the transformation of any deterministic neural forecasting system into a stochastic one. With our stochastic extended systems, we show that the model accumulates less error over long-term forecasts while producing meaningful forecast distributions. We validate our approach on the 5.625$^\circ$ ERA5 reanalysis dataset, which comprises atmospheric and surface variables over a discretized global grid, spanning from the 1960s to the present. On this dataset, our method demonstrates improved predictive performance along with reasonable spread estimates.



Article 1191

Title@2025-06-08 (7): Watermarking Language Models with Error Correcting Codes

Title: Watermarking Language Models with Error Correcting Codes Wasserzeichen-Sprachenmodelle mit Fehler-Korrektur-Codes 错误校正代码的划水语言模型 2406.10281v4

Authors (4): Patrick Chao, Yan Sun, Edgar Dobriban, Hamed Hassani

Recent progress in large language models enables the creation of realistic machine-generated content. Watermarking is a promising approach to distinguish machine-generated text from human text, embedding statistical signals in the output that are ideally undetectable to humans. We propose a watermarking framework that encodes such signals through an error correcting code. Our method, termed robust binary code (RBC) watermark, introduces no noticeable degradation in quality. We evaluate our watermark on base and instruction fine-tuned models and find that our watermark is robust to edits, deletions, and translations. We provide an information-theoretic perspective on watermarking, a powerful statistical test for detection and for generating $p$-values, and theoretical guarantees. Our empirical findings suggest our watermark is fast, powerful, and robust, comparing favorably to the state-of-the-art.



Article 1192

Title@2025-06-08 (7): Generative Modeling of Networked Time-Series via Transformer Architectures

Title: Generative Modeling of Networked Time-Series via Transformer Architectures Generative Modellierung von vernetzten Zeitreihen über Transformer-Architekturen 通过变换器架构生成网络化时序建模 2506.07312v1

Authors (1): Yusuf Elnady

Many security and network applications require large datasets to train machine learning models. Limited data access is a well-known problem in the security domain. Recent studies have shown the potential of Transformer models to enlarge datasets by synthesizing new samples, but the synthesized samples don’t improve the models over the real data. To address this issue, we design an efficient transformer-based generative framework for time-series data, which can be used to boost the performance of existing and new ML workflows. Our new transformer model achieves SOTA results. We design our model to generalize across different datasets and to produce high-quality samples.



Article 1193

Title@2025-06-08 (7): Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference

Title: Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference Paged Attention trifft auf FlexAttention: Locking Long-Context Efficiency in Deployed Inference 专注:在部署推论中解锁长文本效率 2506.07311v1

Authors (5): Thomas Joshi, Herman Saini, Neil Dhillon, Antoni Viros i Martin, Kaoutar El Maghraoui

Large Language Models (LLMs) encounter severe memory inefficiencies during long-context inference due to conventional handling of key-value (KV) caches. In this work, we introduce a novel integration of PagedAttention with PyTorch’s FlexAttention, addressing internal fragmentation and inefficiencies associated with monolithic KV cache allocations. Implemented within IBM’s Foundation Model Stack (FMS), our fused attention kernel efficiently gathers scattered KV data. Our benchmarks on an NVIDIA L4 GPU (24GB) demonstrate significantly reduced inference latency, growing only linearly (~2x) with sequence length from 128 to 2048 tokens when utilizing a global KV cache, compared to exponential latency increases without caching. While peak memory usage remains largely unchanged for single-step evaluations (dominated by model weights and activations), paged attention causes minimal incremental memory usage, observable only at sequence lengths exceeding 2048 tokens due to its power-of-two cache allocations. We open-source the full implementation and discuss its implications for future long-context model deployment.
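
The paging arithmetic behind the memory observation above can be sketched as follows. This is an illustrative sketch only, not the FMS or FlexAttention code: the helper names, the page size of 16 tokens, and the block table contents are assumptions. It shows how a logical token position maps to a physical page, and how rounding the page count up to a power of two adds memory once a sequence crosses a boundary such as 2048 tokens.

```python
def pages_needed(seq_len, page_size=16):
    """Number of fixed-size pages required to hold seq_len tokens (ceiling division)."""
    return -(-seq_len // page_size)

def next_pow2(n):
    """Smallest power of two that is >= n (assumed allocation policy)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()

def locate(pos, block_table, page_size=16):
    """Map a logical token position to (physical page id, offset inside the page)."""
    return block_table[pos // page_size], pos % page_size

# 2048 tokens fit exactly in 128 pages; one more token forces a jump to 256 pages.
at_boundary = next_pow2(pages_needed(2048))
past_boundary = next_pow2(pages_needed(2049))

# Hypothetical block table: logical pages 0, 1, 2 live in physical pages 7, 2, 9.
page_id, offset = locate(35, [7, 2, 9])
```

Under these assumptions the allocation doubles from 128 to 256 pages at 2049 tokens, which is consistent with incremental memory becoming visible only beyond 2048 tokens.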



Article 1194

Title@2025-06-08 (7): PASS: Private Attributes Protection with Stochastic Data Substitution

Title: PASS: Private Attributes Protection with Stochastic Data Substitution PASS: Schutz privater Attribute mit stochastischer Datenersetzung PASS: 私人财产保护,用存储数据替代数据 2506.07308v1

Authors (5): Yizhuo Chen, Chun-Fu Chen, Hsiang Hsu, Shaohan Hu, Tarek Abdelzaher

Growing Machine Learning (ML) services require extensive collections of user data, which may inadvertently include people’s private information irrelevant to the services. Various methods have been proposed to protect private attributes by removing them from the data while maintaining the utility of the data for downstream tasks. Nevertheless, as we theoretically and empirically show in the paper, these methods reveal severe vulnerability because of a common weakness rooted in their adversarial-training-based strategies. To overcome this limitation, we propose a novel approach, PASS, designed to stochastically substitute the original sample with another one according to certain probabilities. PASS is trained with a novel loss function soundly derived from an information-theoretic objective defined for utility-preserving private attribute protection. The comprehensive evaluation of PASS on various datasets of different modalities, including facial images, human activity sensory signals, and voice recording datasets, substantiates PASS’s effectiveness and generalizability.



Article 1195

Title@2025-06-08 (7): Highly Fast Text Segmentation With Pairwise Markov Chains

Title: Highly Fast Text Segmentation With Pairwise Markov Chains Hochschnelle Textsegmentierung mit paarweise Markov-Ketten 配对马科夫链条的快速文本分割 2102.11037v2

Authors (4): Elie Azeraf, Emmanuel Monfrini, Emmanuel Vignon, Wojciech Pieczynski

The current trend in Natural Language Processing (NLP) is to use ever more extra data to build the best possible models. This implies higher computational costs and longer training times, deployment difficulties, and concerns about these models’ carbon footprint, which point to a critical problem for the future. Against this trend, our goal is to develop NLP models that require no extra data and minimize training time. To do so, in this paper, we explore Markov chain models, Hidden Markov Chain (HMC) and Pairwise Markov Chain (PMC), for NLP segmentation tasks. We apply these models to three classic applications: POS Tagging, Named-Entity Recognition, and Chunking. We develop an original method to adapt these models to the specific challenges of text segmentation and obtain relevant performance with very short training and execution times. PMC achieves results equivalent to those obtained by Conditional Random Fields (CRF), one of the most widely applied models for these tasks when no extra data are used. Moreover, PMC has training times 30 times shorter than those of CRF, which validates this model given our objectives.



Article 1196

Title@2025-06-08 (7): Sharpness-Aware Teleportation on Riemannian Manifolds

Title: Sharpness-Aware Teleportation on Riemannian Manifolds Scharfheitsbewusste Teleportation auf Riemannschen Manifolds Riemannian Manifolds 上的移动 2309.17215v2

Authors (7): Tuan Truong, Hoang-Phi Nguyen, Haocheng Luo, Tung Pham, Mehrtash Harandi, Dinh Phung, Trung Le

Recent studies highlight the effectiveness of flat minima in enhancing generalization, with sharpness-aware minimization (SAM) achieving state-of-the-art performance. Additionally, insights into the intrinsic geometry of the loss landscape have shown promise for improving model generalization. Building on these advancements, we introduce a novel sharpness-aware, geometry-aware teleportation mechanism to further enhance robustness and generalization. The core innovation of our approach is to decompose each iteration into a teleportation step within a local orbit and a sharpness-aware step that transitions between different orbits, leveraging the Riemannian quotient manifold. Our approach is grounded in a theoretical framework that analyzes the generalization gap between population loss and worst-case empirical loss within the context of Riemannian manifolds. To demonstrate the effectiveness of our method, we evaluate and compare our algorithm on diverse vision benchmarks with various datasets and Riemannian manifolds.



Article 1197

Title@2025-06-08 (7): Uncertainty-Aware Strategies: A Model-Agnostic Framework for Robust Financial Optimization through Subsampling

Title: Uncertainty-Aware Strategies: A Model-Agnostic Framework for Robust Financial Optimization through Subsampling Unsichere Strategien: Ein modell-agnostischer Rahmen für robuste finanzielle Optimierung durch Unterstichproben 不确定性 – – 软件战略:通过子取样实现稳健金融优化的模型 – – 不可确定框架 2506.07299v1

Authors (4): Hans Buehler, Blanka Horvath, Yannick Limmer, Thorsten Schmidt

This paper addresses the challenge of model uncertainty in quantitative finance, where decisions in portfolio allocation, derivative pricing, and risk management rely on estimating stochastic models from limited data. In practice, the unavailability of the true probability measure forces reliance on an empirical approximation, and even small misestimations can lead to significant deviations in decision quality. Building on the framework of Klibanoff et al. (2005), we enhance the conventional objective - whether this is expected utility in an investing context or a hedging metric - by superimposing an outer “uncertainty measure”, motivated by traditional monetary risk measures, on the space of models. In scenarios where a natural model distribution is lacking or Bayesian methods are impractical, we propose an ad hoc subsampling strategy, analogous to bootstrapping in statistical finance and related to mini-batch sampling in deep learning, to approximate model uncertainty. To address the quadratic memory demands of naive implementations, we also present an adapted stochastic gradient descent algorithm that enables efficient parallelization. Through analytical, simulated, and empirical studies - including multi-period, real data and high-dimensional examples - we demonstrate that uncertainty measures outperform traditional mixture of measures strategies and our model-agnostic subsampling-based approach not only enhances robustness against model risk but also achieves performance comparable to more elaborate Bayesian methods.



Article 1198

Title@2025-06-08 (7): MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache

Title: MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache MiniKV: Die Grenzen der LLM-Inferenz durch 2-Bit-Layer-Diskriminativ-KV-Cache drücken MiniKV: 通过2-Bit 层分解 KV 缓存推进LLM 推断的限度 2411.18077v3

Authors (5): Akshat Sharma, Hangliang Ding, Jianping Li, Neel Dani, Minjia Zhang

How to efficiently serve LLMs in practice has become exceptionally challenging due to their prohibitive memory and computation requirements. In this study, we investigate optimizing the KV cache, whose memory footprint poses a critical bottleneck in LLM inference, especially when dealing with long context tasks. To tackle the challenge, we introduce MiniKV, a KV cache optimization method that simultaneously preserves long context task accuracy while significantly reducing KV cache size via a novel 2-bit layer-discriminative KV cache. More importantly, we develop specialized CUDA kernels to make MiniKV compatible with FlashAttention. Experiments on a wide range of long context tasks show that MiniKV effectively achieves 86% KV cache compression ratio while recovering over 98.5% of accuracy, outperforming state-of-the-art methods while achieving excellent measured system performance improvements.
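
To make the 2-bit idea concrete, here is a minimal sketch of generic asymmetric uniform quantization to four levels. It is not MiniKV's layer-discriminative scheme or its CUDA kernels: the function names and the choice of one min/max pair per group of values are assumptions made for illustration.

```python
def quantize_2bit(values):
    """Map a group of floats to integer codes in {0, 1, 2, 3} plus (offset, scale)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 3 if hi > lo else 1.0  # 3 intervals between 4 levels
    codes = [min(3, max(0, round((v - lo) / scale))) for v in values]
    return codes, lo, scale

def dequantize_2bit(codes, lo, scale):
    """Reconstruct approximate floats from 2-bit codes."""
    return [c * scale + lo for c in codes]

# Values that already sit on the four levels are reconstructed exactly.
codes, lo, scale = quantize_2bit([0.0, 1.0, 2.0, 3.0])
restored = dequantize_2bit(codes, lo, scale)
```

Each code needs only 2 bits instead of 16 for an FP16 value. The per-group offset and scale add a small overhead, and the reconstruction error of any value is at most half of `scale`.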



Article 1199

Title@2025-06-08 (7): On the kernel learning problem

Title: On the kernel learning problem Zum Problem des Kernel-Lernens 关于内核学习问题 2502.11665v2

Authors (2): Yang Li, Feng Ruan

The classical kernel ridge regression problem aims to find the best fit for the output $Y$ as a function of the input data $X\in \mathbb{R}^d$, with a fixed choice of regularization term imposed by a given choice of a reproducing kernel Hilbert space, such as a Sobolev space. Here we consider a generalization of the kernel ridge regression problem, by introducing an extra matrix parameter $U$, which aims to detect the scale parameters and the feature variables in the data, and thereby improve the efficiency of kernel ridge regression. This naturally leads to a nonlinear variational problem to optimize the choice of $U$. We study various foundational mathematical aspects of this variational problem, and in particular how this behaves in the presence of multiscale structures in the data.



Article 1200

Title@2025-06-08 (7): Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI

Title: Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI Multi-Step Guided Diffusion für die Bildwiederherstellung an Edge-Geräten: Auf dem Weg zur leichten Wahrnehmung in körpernaher KI 多脚制导扩散,促进边缘设备图像恢复:向Embodied AI 中的轻量度感知 2506.07286v1

Authors (1): Aditya Chakravarty

Diffusion models have shown remarkable flexibility for solving inverse problems without task-specific retraining. However, existing approaches such as Manifold Preserving Guided Diffusion (MPGD) apply only a single gradient update per denoising step, limiting restoration fidelity and robustness, especially in embedded or out-of-distribution settings. In this work, we introduce a multistep optimization strategy within each denoising timestep, significantly enhancing image quality, perceptual accuracy, and generalization. Our experiments on super-resolution and Gaussian deblurring demonstrate that increasing the number of gradient updates per step improves LPIPS and PSNR with minimal latency overhead. Notably, we validate this approach on a Jetson Orin Nano using degraded ImageNet and a UAV dataset, showing that MPGD, originally trained on face datasets, generalizes effectively to natural and aerial scenes. Our findings highlight MPGD’s potential as a lightweight, plug-and-play restoration module for real-time visual perception in embodied AI agents such as drones and mobile robots.



Article 1201

Title@2025-06-08 (7): Defending Against Diverse Attacks in Federated Learning Through Consensus-Based Bi-Level Optimization

Title: Defending Against Diverse Attacks in Federated Learning Through Consensus-Based Bi-Level Optimization Verteidigung gegen verschiedene Angriffe im Federated Learning durch Konsens-basierte Bi-Level-Optimierung 通过基于共识的双级优化,在通过共识实现的两级最佳化,在联邦学习中防范多种袭击 2412.02535v2

Authors (5): Nicolás García Trillos, Aditya Kumar Akash, Sixu Li, Konstantin Riedl, Yuhua Zhu

Adversarial attacks pose significant challenges in many machine learning applications, particularly in the setting of distributed training and federated learning, where malicious agents seek to corrupt the training process with the goal of jeopardizing and compromising the performance and reliability of the final models. In this paper, we address the problem of robust federated learning in the presence of such attacks by formulating the training task as a bi-level optimization problem. We conduct a theoretical analysis of the resilience of consensus-based bi-level optimization (CB$^2$O), an interacting multi-particle metaheuristic optimization method, in adversarial settings. Specifically, we provide a global convergence analysis of CB$^2$O in mean-field law in the presence of malicious agents, demonstrating the robustness of CB$^2$O against a diverse range of attacks. Thereby, we offer insights into how specific hyperparameter choices help mitigate adversarial effects. On the practical side, we extend CB$^2$O to the clustered federated learning setting by proposing FedCB$^2$O, a novel interacting multi-particle system, and design a practical algorithm that addresses the demands of real-world applications. Extensive experiments demonstrate the robustness of the FedCB$^2$O algorithm against label-flipping attacks in decentralized clustered federated learning scenarios, showcasing its effectiveness in practical contexts.



Article 1202

Title@2025-06-08 (7): Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback

Title: Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback 超前优先优化(EGPO):超越纳什最后的融合(Nash) 从人类反馈中学习 2503.08942v2

Authors (3): Runlong Zhou, Maryam Fazel, Simon S. Du

Reinforcement learning from human feedback (RLHF) has become essential for improving language model capabilities, but traditional approaches rely on the assumption that human preferences follow a transitive Bradley-Terry model. This assumption fails to capture the non-transitive nature of populational human preferences. Nash learning from human feedback (NLHF), targeting non-transitive preferences, is a problem of computing the Nash equilibrium (NE) of the two-player constant-sum game defined by the human preference. We introduce Extragradient preference optimization (EGPO), a novel algorithm for NLHF achieving last-iterate linear convergence to the NE of KL-regularized games and polynomial convergence to the NE of original games, while being robust to noise. Unlike previous approaches that rely on nested optimization, we derive an equivalent implementation using gradients of an online variant of the identity preference optimization (IPO) loss, enabling more faithful implementation for neural networks. Our empirical evaluations demonstrate EGPO’s superior performance over baseline methods when training for the same number of epochs, as measured by pairwise win-rates using the ground truth preference. These results validate both the theoretical strengths and practical advantages of EGPO for language model alignment with non-transitive human preferences.
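
The appeal of extragradient updates for last-iterate convergence can be seen on a toy problem. The sketch below is the classical extragradient method on the unconstrained bilinear game f(x, y) = x * y, where x minimizes and y maximizes. It is not EGPO itself and involves no preference model; the function names, the step size 0.5, and the starting point are assumptions chosen for illustration.

```python
def gda_step(x, y, eta):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    return x - eta * y, y + eta * x

def extragradient_step(x, y, eta):
    """Extragradient: take a look-ahead step, then update with the look-ahead gradients."""
    x_half = x - eta * y          # look-ahead for the minimizing player
    y_half = y + eta * x          # look-ahead for the maximizing player
    return x - eta * y_half, y + eta * x_half

def distance_to_equilibrium(step, iters=100, eta=0.5):
    """Run `step` from (1, 1) and return the distance to the equilibrium (0, 0)."""
    x, y = 1.0, 1.0
    for _ in range(iters):
        x, y = step(x, y, eta)
    return (x * x + y * y) ** 0.5

eg_dist = distance_to_equilibrium(extragradient_step)
gda_dist = distance_to_equilibrium(gda_step)
```

With this step size each extragradient iteration multiplies the squared distance by 1 - eta^2 + eta^4 = 0.8125, so the last iterate contracts toward the equilibrium. Plain descent-ascent multiplies it by 1 + eta^2 = 1.25 and spirals outward.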



Article 1203

Title@2025-06-08 (7): Selective Prompt Anchoring for Code Generation

Title: Selective Prompt Anchoring for Code Generation Selektive Prompt-Ankerung für die Code-Generierung 用于代码生成的选择性提示锚定 2408.09121v5

Authors (2): Yuan Tian, Tianyi Zhang

Recent advances in large language models (LLMs) have transformed software development by automatically generating code from natural language. Yet challenges remain in generating fully correct code that aligns with user intent. Our study reveals that LLMs tend to pay less attention to user prompts as more code tokens are generated. We hypothesize that this attention dilution issue is an important reason for code generation errors. To mitigate this issue, we propose Selective Prompt Anchoring (SPA) to guide code LLMs to pay more attention to user intent when generating code. We evaluate SPA using six base LLMs across six benchmarks. Our results demonstrate that SPA enhances Pass@1 by up to 12.9%, consistently outperforming SOTA code generation methods in all settings. Our code is available at https://github.com/magic-YuanTian/Selective-Prompt-Anchoring.



Article 1204

Title@2025-06-08 (7): Machine Learning Should Maximize Welfare, but Not by (Only) Maximizing Accuracy

Title: Machine Learning Should Maximize Welfare, but Not by (Only) Maximizing Accuracy Maschinelles Lernen sollte Wohlfahrt maximieren, aber nicht durch (nur) Maximierung der Genauigkeit 机器学习应该最大限度地提高福利,但不能(仅仅)最大限度地提高准确度 2502.11981v2

Authors (2): Nir Rosenfeld, Haifeng Xu

Decades of research in machine learning have given us powerful tools for making accurate predictions. This has made such tools appealing for use in social settings and on human inputs. Yet despite a lack of justification for why the generic approach of accuracy maximization can or should improve our collective well-being – and mounting evidence of likely adverse outcomes – it remains the widespread default. This position paper asserts that for machine learning to become socially beneficial, it must be embedded within a broader economic framework that explicitly aims to maximize social welfare. The field of welfare economics asks: how should we allocate limited resources among self-interested agents to maximize overall benefits? We contend that this perspective applies to many contemporary applications of machine learning in social contexts, and advocate for its adoption. Rather than disposing of prediction, we propose to leverage this forte of machine learning towards welfare maximization. We demonstrate this idea by outlining a conceptual framework that gradually transitions from accuracy maximization (with awareness of welfare) to welfare maximization (via accurate prediction). We detail applications and use cases for which this framework can be effective, identify technical challenges and practical opportunities, and highlight future avenues worth pursuing.



Article 1205

Title@2025-06-08 (7): Tokenized Bandit for LLM Decoding and Alignment

Title: Tokenized Bandit for LLM Decoding and Alignment Tokenized Bandit für LLM Decodierung und Ausrichtung 用于LLMM 解码和对齐的定義土匪 2506.07276v1

Authors (4): Suho Shin, Chenghao Yang, Haifeng Xu, Mohammad T. Hajiaghayi

We introduce the tokenized linear bandit (TLB) and tokenized multi-armed bandit (TMAB), variants of linear and stochastic multi-armed bandit problems inspired by LLM decoding and alignment. In these problems, at each round $t \in [T]$, a user submits a query (context), and the decision maker (DM) sequentially and irrevocably selects tokens from a token set. Once the sequence is complete, the DM observes a random utility from the user, whose expectation is represented by a sequence function mapping the chosen token sequence to a nonnegative real value that depends on the query. In both problems, we first show that learning is impossible without any structure on the sequence function. We introduce a natural assumption, diminishing distance with more commons (DDMC), and propose algorithms with regret $\tilde{O}(L\sqrt{T})$ and $\tilde{O}(L\sqrt{T^{2/3}})$ for TLB and TMAB, respectively. As a by-product, we show that greedy decoding is (almost) optimal for LLM decoding under DDMC, which justifies the unreasonable effectiveness of greedy decoding in several tasks. This also has an immediate application to decoding-time LLM alignment, when the misaligned utility can be represented as the frozen LLM’s utility and a linearly realizable latent function. Finally, we validate our algorithm’s performance empirically and verify our assumptions using synthetic and real-world datasets.



Article 1206

Title@2025-06-08 (7): A Cramér-von Mises Approach to Incentivizing Truthful Data Sharing

Title: A Cramér-von Mises Approach to Incentivizing Truthful Data Sharing Ein Cramér-von-Mises-Ansatz zur Förderung der wahrheitsgemäßen Datenfreigabe 激励真相数据共享的Cramér-von Mises方法 2506.07272v1

Authors (5): Alex Clinton, Thomas Zeng, Yiding Chen, Xiaojin Zhu, Kirthevasan Kandasamy

Modern data marketplaces and data sharing consortia increasingly rely on incentive mechanisms to encourage agents to contribute data. However, schemes that reward agents based on the quantity of submitted data are vulnerable to manipulation, as agents may submit fabricated or low-quality data to inflate their rewards. Prior work has proposed comparing each agent’s data against others’ to promote honesty: when others contribute genuine data, the best way to minimize discrepancy is to do the same. Yet prior implementations of this idea rely on very strong assumptions about the data distribution (e.g. Gaussian), limiting their applicability. In this work, we develop reward mechanisms based on a novel, two-sample test inspired by the Cramér-von Mises statistic. Our methods strictly incentivize agents to submit more genuine data, while disincentivizing data fabrication and other types of untruthful reporting. We establish that truthful reporting constitutes a (possibly approximate) Nash equilibrium in both Bayesian and prior-agnostic settings. We theoretically instantiate our method in three canonical data sharing problems and show that it relaxes key assumptions made by prior work. Empirically, we demonstrate that our mechanism incentivizes truthful data sharing via simulations and on real-world language and image data.
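
For orientation, the classical two-sample Cramér-von Mises statistic that the abstract cites as inspiration can be sketched as follows. This is the textbook statistic only, not the paper's novel test or its reward mechanism; the function names `ecdf` and `cvm_two_sample` are hypothetical and chosen for illustration.

```python
from bisect import bisect_right


def ecdf(sorted_sample, z):
    """Empirical CDF of a sorted sample evaluated at z."""
    return bisect_right(sorted_sample, z) / len(sorted_sample)


def cvm_two_sample(x, y):
    """Classical two-sample Cramer-von Mises statistic.

    T = n*m / (n+m)^2 * sum over pooled points z of (F_n(z) - G_m(z))^2,
    where F_n and G_m are the empirical CDFs of x and y.
    Illustrative only: the paper's test is a novel variant not reproduced here.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    pooled = xs + ys
    total = sum((ecdf(xs, z) - ecdf(ys, z)) ** 2 for z in pooled)
    return n * m / (n + m) ** 2 * total
```

A small statistic means the two samples look alike, which matches the intuition in the abstract that an agent's discrepancy against others' genuine data is minimized by submitting genuine data.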



Article 1207

Title@2025-06-08 (7): Machine Learning-Based Self-Localization Using Internal Sensors for Automating Bulldozers

Title: Machine Learning-Based Self-Localization Using Internal Sensors for Automating Bulldozers Maschinelle Learning-basierte Selbstlokalisierung mittels interner Sensoren zur Automatisierung von Bulldozern 利用内部传感器实现推推推车自动化 2506.07271v1

Authors (7): Hikaru Sawafuji, Ryota Ozaki, Takuto Motomura, Toyohisa Matsuda, Masanori Tojima, Kento Uchida, Shinichi Shirakawa

Self-localization is an important technology for automating bulldozers. Conventional bulldozer self-localization systems rely on RTK-GNSS (Real Time Kinematic-Global Navigation Satellite Systems). However, RTK-GNSS signals are sometimes lost in certain mining conditions. Therefore, self-localization methods that do not depend on RTK-GNSS are required. In this paper, we propose a machine learning-based self-localization method for bulldozers. The proposed method consists of two steps: estimating local velocities using a machine learning model from internal sensors, and incorporating these estimates into an Extended Kalman Filter (EKF) for global localization. We also created a novel dataset for bulldozer odometry and conducted experiments across various driving scenarios, including slalom, excavation, and driving on slopes. The results demonstrated that the proposed self-localization method suppressed the accumulation of position errors compared to kinematics-based methods, especially when slip occurred. Furthermore, this study showed that bulldozer-specific sensors, such as blade position sensors and hydraulic pressure sensors, contributed to improving self-localization accuracy.



Article 1208

Title@2025-06-08 (7): PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference

Title: PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference PecSched: Präventive und effiziente Clusterplanung für LLM-Inferenz 预测:先发制人和高效率的LLM推理群集安排 2409.15104v2

Authors (2): Zeyu Zhang, Haiying Shen

The scaling of transformer-based Large Language Models (LLMs) has significantly expanded their context lengths, enabling applications where inputs exceed 100K tokens. Our analysis of a recent Azure LLM inference trace reveals a highly skewed long-tail distribution of input lengths, with approximately 80% of inputs shorter than 2K tokens. Long inputs constitute only a small fraction. Existing cluster-level LLM scheduling strategies, including First-In-First-Out (FIFO), reservation-based, and priority-based approaches, primarily target short-input requests with lengths below 2K and fail to address this heterogeneity, leading to inefficiencies such as head-of-line blocking, resource underutilization, and starvation of long-input requests. We propose PecSched, a Preemptive and Efficient Cluster SCHEDuling system for LLM inference. PecSched introduces the following key techniques: 1) preemptive scheduling that prioritizes short-input requests for their performance; 2) coordinated prefill-decode colocation and disaggregation, which reduces both the duration and frequency of preemptions; 3) fast Sequence Parallelism (SP) that minimizes the prefill time of long-input requests to further reduce the likelihood and frequency of preemptions. Evaluations based on Azure LLM inference trace show that, compared to state-of-the-art cluster-level LLM inference schedulers, PecSched reduces the 99th percentile queueing delay of short-input requests by up to 92% and improves their throughput by up to 595%, without significantly affecting the Job Completion Time (JCT) of long-input requests. We open-sourced our code.



Article 1209

Title@2025-06-08 (7): Universality of conformal prediction under the assumption of randomness

Title: Universality of conformal prediction under the assumption of randomness Universalität der konformen Vorhersage unter der Annahme der Zufälligkeit 假设随机性的情况下一致预测的普遍性 2502.19254v2

Authors (1): Vladimir Vovk

Conformal predictors provide set or functional predictions that are valid under the assumption of randomness, i.e., under the assumption of independent and identically distributed data. The question asked in this paper is whether there are predictors that are valid in the same sense under the assumption of randomness and that are more efficient than conformal predictors. The answer is that the class of conformal predictors is universal in that only limited gains in predictive efficiency are possible. The previous work in this area has relied on the algorithmic theory of randomness and so involved unspecified constants, whereas this paper’s results are much more practical. They are also shown to be optimal in some respects.
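
As background for readers unfamiliar with conformal prediction, the sketch below shows split (inductive) conformal prediction, a common simplified variant. It is an illustration under stated assumptions, not the construction analyzed in the paper: the nonconformity scores are assumed to be absolute residuals on a held-out calibration set, and the names `conformal_quantile` and `predict_interval` are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score.

    If that rank exceeds n, no finite threshold gives the required
    coverage, so the threshold is infinite.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def predict_interval(point_prediction, scores, alpha):
    """Symmetric prediction interval around a point prediction.

    Assumes the scores are absolute residuals |y - y_hat| on calibration data.
    """
    q = conformal_quantile(scores, alpha)
    return point_prediction - q, point_prediction + q
```

Under the assumption of randomness (i.i.d. data), an interval built this way covers the true label with probability at least 1 - alpha, which is the validity notion the abstract refers to.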



Article 1210

Title@2025-06-08 (7): FDC: Fast KV Dimensionality Compression for Efficient LLM Inference

Title: FDC: Fast KV Dimensionality Compression for Efficient LLM Inference FDC: Schnelle KV-Dimensionalitätskompression für effiziente LLM-Inferenz FDC: 高效 LLM 推断快速 KV 尺寸压缩 2408.04107v3

Authors (2): Zeyu Zhang, Haiying Shen

In large-language models, memory constraints in the Key-Value Cache (KVC) pose a challenge during inference. In this work, we propose FDC, a fast KV dimensionality compression system that eliminates the decompression overhead incurred in the existing KV dimensionality compression system, Palu, and reduces attention time. Moreover, FDC employs adaptive compression, tailoring KV compression rates across heads and layers based on their contributions to inference to maximize overall compression while maintaining an accuracy loss constraint. Additionally, FDC enhances the attention kernel to balance the uneven workloads caused by the adaptive compression approach to further reduce attention computation latency. Comprehensive experiments demonstrate that compared to Palu, FDC can reduce Job Completion Time (JCT) by up to 64%, and delivers up to 1.97X throughput under the same latency, while maintaining 99% of the accuracy without compression. When state-of-the-art eviction and quantization methods are combined with FDC, they exhibit similar improvements compared to those combined with Palu. We open-sourced the code.



Article 1211

Title@2025-06-08 (7): Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods

Title: Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods Ensemble-MIX: Verbesserung der Probeneffizienz in Multi-Agent RL mit Ensemble-Methoden 组合-MIX:利用组合方法提高多代理RL的取样效率 2506.02841v2

Authors (2): Tom Danino, Nahum Shimkin

Multi-agent reinforcement learning (MARL) methods have achieved state-of-the-art results on a range of multi-agent tasks. Yet, MARL algorithms typically require significantly more environment interactions than their single-agent counterparts to converge, a problem exacerbated by the difficulty in exploring over a large joint action space and the high variance intrinsic to MARL environments. To tackle these issues, we propose a novel algorithm that combines a decomposed centralized critic with decentralized ensemble learning, incorporating several key contributions. The main component in our scheme is a selective exploration method that leverages ensemble kurtosis. We extend the global decomposed critic with a diversity-regularized ensemble of individual critics and utilize its excess kurtosis to guide exploration toward high-uncertainty states and actions. To improve sample efficiency, we train the centralized critic with a novel truncated variation of the TD($\lambda$) algorithm, enabling efficient off-policy learning with reduced variance. On the actor side, our suggested algorithm adapts the mixed samples approach to MARL, mixing on-policy and off-policy loss functions for training the actors. This approach balances between stability and efficiency and outperforms purely off-policy learning. The evaluation shows our method outperforms state-of-the-art baselines on standard MARL benchmarks, including a variety of SMAC II maps.



Article 1212

Title@2025-06-08 (7): BG-HOP: A Bimanual Generative Hand-Object Prior

Title: BG-HOP: A Bimanual Generative Hand-Object Prior BG-HOP: Bimanuelle Generative Hand-Objekt-Vorhersage BG-HOP: 双体生成手向对象前置 2506.09068v1

Authors (3): Sriram Krishna, Sravan Chittupalli, Sungjae Park

In this work, we present BG-HOP, a generative prior that seeks to model bimanual hand-object interactions in 3D. We address the challenge of limited bimanual interaction data by extending existing single-hand generative priors, demonstrating preliminary results in capturing the joint distribution of hands and objects. Our experiments showcase the model’s capability to generate bimanual interactions and synthesize grasps for given objects. We make code and models publicly available.



Article 1213

Title@2025-06-08 (7): Counterfactual inference in sequential experiments

Title: Counterfactual inference in sequential experiments Gegenfaktische Inferenz in sequentiellen Experimenten 连续实验中的反事实推论 2202.06891v5

Authors (6): Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah

We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale – mean outcome under different treatments for each unit and each time – with minimal assumptions on the adaptive treatment policy. Without any structural assumptions on the counterfactual means, this challenging task is infeasible due to more unknowns than observed data points. To make progress, we introduce a latent factor model over the counterfactual means that serves as a non-parametric generalization of the non-linear mixed effects model and the bilinear latent factor model considered in prior works. For estimation, we use a non-parametric method, namely a variant of nearest neighbors, and establish a non-asymptotic high probability error bound for the counterfactual mean for each unit and each time. Under regularity conditions, this bound leads to asymptotically valid confidence intervals for the counterfactual mean as the number of units and time points grows to $\infty$ together at suitable rates. We illustrate our theory via several simulations and a case study involving data from a mobile health clinical trial HeartSteps.



Article 1214

Title@2025-06-08 (7): RADAR: Recall Augmentation through Deferred Asynchronous Retrieval

Title: RADAR: Recall Augmentation through Deferred Asynchronous Retrieval RADAR: Erinnerung an Augmentation durch verzögerte asynchrone Retrieval RADAR:通过推迟的非同步回溯性回溯性增量 2506.07261v1

Authors (3): Amit Jaspal, Qian Dang, Ajantha Ramineni

Modern large-scale recommender systems employ a multi-stage ranking funnel (Retrieval, Pre-ranking, Ranking) to balance engagement and computational constraints (latency, CPU). However, the initial retrieval stage, often relying on efficient but less precise methods like K-Nearest Neighbors (KNN), struggles to effectively surface the most engaging items from billion-scale catalogs, particularly distinguishing highly relevant and engaging candidates from merely relevant ones. We introduce Recall Augmentation through Deferred Asynchronous Retrieval (RADAR), a novel framework that leverages asynchronous, offline computation to pre-rank a significantly larger candidate set for users using the full-complexity ranking model. These top-ranked items are stored and utilized as a high-quality retrieval source during online inference, bypassing online retrieval and pre-ranking stages for these candidates. We demonstrate through offline experiments that RADAR significantly boosts recall (2X Recall@200 vs DNN retrieval baseline) by effectively combining a larger retrieved candidate set with a more powerful ranking model. Online A/B tests confirm a +0.8% lift in topline engagement metrics, validating RADAR as a practical and effective method to improve recommendation quality under strict online serving constraints.
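
The Recall@200 figure above refers to the standard Recall@K metric. A minimal sketch of how it is usually computed follows; the function name `recall_at_k` and the toy item identifiers are hypothetical, and the paper's exact evaluation protocol is not specified in the abstract.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant items that appear among the top-k retrieved items.

    `retrieved` is a ranked list (best first); `relevant` is a set of
    ground-truth item ids, for example items the user later engaged with.
    """
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


# Toy usage: two of the three relevant items are found within the top 4.
ranked = ["a", "b", "c", "d"]
engaged = {"b", "d", "e"}
score = recall_at_k(ranked, engaged, k=4)
```

Doubling this value at K = 200, as the abstract reports, means the retrieval source surfaces twice as many of the relevant items within the same candidate budget.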



Article 1215

Title@2025-06-08 (7): ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition

Title: ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition ALINE: Gemeinsame Amortisation für Bayesian Inferenz und aktive Datenerfassung AlINE: 贝耶斯推断和主动数据采集联合摊销 2506.07259v1

Authors (5): Daolang Huang, Xinyi Wen, Ayush Bharti, Samuel Kaski, Luigi Acerbi

Many critical applications, from autonomous scientific discovery to personalized medicine, demand systems that can both strategically acquire the most informative data and instantaneously perform inference based upon it. While amortized methods for Bayesian inference and experimental design offer part of the solution, neither approach is optimal in the most general and challenging task, where new data needs to be collected for instant inference. To tackle this issue, we introduce the Amortized Active Learning and Inference Engine (ALINE), a unified framework for amortized Bayesian inference and active data acquisition. ALINE leverages a transformer architecture trained via reinforcement learning with a reward based on self-estimated information gain provided by its own integrated inference component. This allows it to strategically query informative data points while simultaneously refining its predictions. Moreover, ALINE can selectively direct its querying strategy towards specific subsets of model parameters or designated predictive tasks, optimizing for posterior estimation, data prediction, or a mixture thereof. Empirical results on regression-based active learning, classical Bayesian experimental design benchmarks, and a psychometric model with selectively targeted parameters demonstrate that ALINE delivers both instant and accurate inference along with efficient selection of informative points.



Article 1216

Title: Can the Rookies Cut the Tough Cookie? Exploring the Use of LLMs for SQL Equivalence Checking Können die Rookies das harte Cookie ausschneiden? Erforschen der Verwendung von LLMs für SQL-Equivalenzprüfung Rookies 能够切开硬饼干吗? 探索 SQL 等效检查使用 LLMs 的情况 2412.05561v2

Authors (2): Rajat Singh, Srikanta Bedathur

Equivalence checking of SQL queries is an intractable problem often encountered in settings ranging from grading SQL submissions to debugging query optimizers. Despite recent work toward developing practical solutions, only simple queries written using a small subset of SQL are supported, leaving the equivalence checking of sophisticated SQL queries at the mercy of intensive, potentially error-prone, manual analysis. In this paper, we explore how LLMs can be used to reason with SQL queries to address this challenging problem. Towards this, we introduce a novel, realistic, and sufficiently complex benchmark called SQLEquiQuest for SQL query equivalence checking that reflects real-world settings. We establish strong baselines for SQL equivalence checking by leveraging the ability of LLMs to reason with SQL queries. We conduct a detailed evaluation of several state-of-the-art LLMs using various prompting strategies and carefully constructed in-context learning examples, including logical plans generated by SQL query processors. Our empirical evaluation shows that LLMs go well beyond the current capabilities of formal models for SQL equivalence, going from a mere 30% supported query pairs to full coverage, achieving up to 82% accuracy on Spider+DIN. However, a critical limitation of LLMs revealed by our analysis is that they exhibit a strong bias for equivalence predictions, with consistently poor performance over non-equivalent pairs, opening a new direction for potential future research.



Article 1217

Title@2025-06-08 (7): AMPO: Active Multi-Preference Optimization for Self-play Preference Selection

Title: AMPO: Active Multi-Preference Optimization for Self-play Preference Selection AMPO: Aktive Multi-Preference-Optimierung für Selbstspiel-Preference-Auswahl AMPO: 主动的自玩首选多优先优化 2502.18293v2

Authors (5): Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan

Multi-preference optimization enriches language-model alignment beyond pairwise preferences by contrasting entire sets of helpful and undesired responses, thereby enabling richer training signals for large language models. During self-play alignment, these models often produce numerous candidate answers per query, rendering it computationally infeasible to include all responses in the training objective. In this work, we propose $\textit{Active Multi-Preference Optimization}$ (AMPO), a novel approach that combines on-policy generation, a multi-preference group-contrastive loss, and active subset selection. Specifically, we score and embed large candidate pools of responses and then select a small, yet informative, subset that covers reward extremes and distinct semantic clusters for preference optimization. Our contrastive training scheme is capable of identifying not only the best and worst answers but also subtle, underexplored modes that are crucial for robust alignment. Theoretically, we provide guarantees for expected reward maximization using our active selection method, and empirically, AMPO achieves state-of-the-art results on $\textit{AlpacaEval}$ using Llama 8B and Mistral 7B. We release our datasets $\href{https://huggingface.co/Multi-preference-Optimization}{here}$.



Article 1218

Title@2025-06-08 (7): DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products

Title: DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products DeltaProdukt: Verbesserung der State-Tracking in linearen RNNs über Haushaltsprodukte DeltaProduction:通过家用产品改进国家通过家用产品对Linear RNNNs的跟踪 2502.10297v5

Authors (6): Julien Siems, Timur Carstensen, Arber Zela, Frank Hutter, Massimiliano Pontil, Riccardo Grazzi

Linear Recurrent Neural Networks (linear RNNs) have emerged as competitive alternatives to Transformers for sequence modeling, offering efficient training and linear-time inference. However, existing architectures face a fundamental trade-off between expressivity and efficiency, dictated by the structure of their state-transition matrices. Diagonal matrices, used in models such as Mamba, GLA, or mLSTM, yield fast runtime but have limited expressivity. To address this, recent architectures such as DeltaNet and RWKV-7 adopted a diagonal plus rank-1 structure, which allows simultaneous token and channel mixing, improving associative recall and, as recently shown, state-tracking when allowing negative eigenvalues in the state-transition matrices. Building on the interpretation of DeltaNet’s recurrence as performing one step of online gradient descent per token on an associative recall loss, we introduce DeltaProduct, which instead takes multiple ($n_h$) steps per token. This naturally leads to diagonal plus rank-$n_h$ state-transition matrices, formed as products of $n_h$ generalized Householder transformations, providing a tunable mechanism to balance expressivity and efficiency. We provide a detailed theoretical characterization of the state-tracking capability of DeltaProduct in finite precision, showing how it improves by increasing $n_h$. Our extensive experiments demonstrate that DeltaProduct outperforms DeltaNet in both state-tracking and language modeling, while also showing significantly improved length extrapolation capabilities.
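
The state-transition structure described above, a product of $n_h$ generalized Householder transformations, can be sketched in a few lines. This is an illustration only: the form $I - \beta k k^\top$ with unit-norm $k$, the helper names, and the use of the identity as the diagonal part are assumptions made for the example, not the paper's exact parametrization.

```python
def householder(k, beta):
    """Generalized Householder matrix I - beta * k k^T (k assumed unit-norm)."""
    d = len(k)
    return [[(1.0 if i == j else 0.0) - beta * k[i] * k[j] for j in range(d)]
            for i in range(d)]


def matmul(a, b):
    """Plain dense matrix product of two square matrices given as nested lists."""
    d = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(d)) for j in range(d)]
            for i in range(d)]


def state_transition(keys, betas):
    """Product of n_h generalized Householder matrices, one per (key, beta) pair.

    The result differs from the identity by a matrix of rank at most n_h.
    """
    d = len(keys[0])
    result = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for k, beta in zip(keys, betas):
        result = matmul(householder(k, beta), result)
    return result


# With beta = 2 each factor is a reflection, so eigenvalues of -1 appear.
A = state_transition([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [2.0, 2.0])
```

In this toy case `A` is the diagonal matrix with entries (-1, -1, 1): two steps per token flip two directions, whereas a single rank-1 update could flip only one.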



Article 1219

Title@2025-06-08 (7): Horizon Reduction Makes RL Scalable

Title: Horizon Reduction Makes RL Scalable Horizontreduzierung macht RL skalierbar 降低地平线使RL可缩放 2506.04168v2

Authors (6): Seohong Park, Kevin Frans, Deepinder Mann, Benjamin Eysenbach, Aviral Kumar, Sergey Levine

In this work, we study the scalability of offline reinforcement learning (RL) algorithms. In principle, a truly scalable offline RL algorithm should be able to solve any given problem, regardless of its complexity, given sufficient data, compute, and model capacity. We investigate if and how current offline RL algorithms match up to this promise on diverse, challenging, previously unsolved tasks, using datasets up to 1000x larger than typical offline RL datasets. We observe that despite scaling up data, many existing offline RL algorithms exhibit poor scaling behavior, saturating well below the maximum performance. We hypothesize that the horizon is the main cause behind the poor scaling of offline RL. We empirically verify this hypothesis through several analysis experiments, showing that long horizons indeed present a fundamental barrier to scaling up offline RL. We then show that various horizon reduction techniques substantially enhance scalability on challenging tasks. Based on our insights, we also introduce a minimal yet scalable method named SHARSA that effectively reduces the horizon. SHARSA achieves the best asymptotic performance and scaling behavior among our evaluation methods, showing that explicitly reducing the horizon unlocks the scalability of offline RL. Code: https://github.com/seohongpark/horizon-reduction



Article 1220

Title@2025-06-08 (7): Regretful Decisions under Label Noise

Title: Regretful Decisions under Label Noise Bedauerliche Entscheidungen im Rahmen von Label Noise 标签噪音下的遗憾决定 2504.09330v2

Authors (4): Sujay Nagaraj, Yang Liu, Flavio P. Calmon, Berk Ustun

Machine learning models are routinely used to support decisions that affect individuals – be it to screen a patient for a serious illness or to gauge their response to treatment. In these tasks, we are limited to learning models from datasets with noisy labels. In this paper, we study the instance-level impact of learning under label noise. We introduce a notion of regret for this regime, which measures the number of unforeseen mistakes due to noisy labels. We show that standard approaches to learning under label noise can return models that perform well at a population-level while subjecting individuals to a lottery of mistakes. We present a versatile approach to estimate the likelihood of mistakes at the individual-level from a noisy dataset by training models over plausible realizations of datasets without label noise. This is supported by a comprehensive empirical study of label noise in clinical prediction tasks. Our results reveal how failure to anticipate mistakes can compromise model reliability and adoption – we demonstrate how we can address these challenges by anticipating and avoiding regretful decisions.



Article 1221

Title@2025-06-08 (7): Unraveling Token Prediction Refinement and Identifying Essential Layers in Language Models

Title: Unraveling Token Prediction Refinement and Identifying Essential Layers in Language Models Unraveling Token Prediction Verfeinerung und Identifizierung wesentlicher Ebenen in Sprachmodellen 在语言模型中统一地改进和确定关键层 2501.15054v2

Authors (1): Jaturong Kongmanee

This research aims to unravel how large language models (LLMs) iteratively refine token predictions through internal processing. We utilized a logit lens technique to analyze the model’s token predictions derived from intermediate representations. Specifically, we focused on (1) how LLMs access and utilize information from input contexts, and (2) how the positioning of relevant information affects the model’s token prediction refinement process. On a multi-document question answering task with varying input context lengths, we found that the depth of prediction refinement (defined as the number of intermediate layers an LLM uses to transition from an initial correct token prediction to its final, stable correct output), as a function of the position of relevant information, exhibits an approximately inverted U-shaped curve. We also found that the gap between these two layers, on average, diminishes when relevant information is positioned at the beginning or end of the input context. This suggests that the model requires more refinements when processing longer contexts with relevant information situated in the middle. Furthermore, our findings indicate that not all layers are equally essential for determining final correct outputs. Our analysis provides insights into how token predictions are distributed across different conditions, and establishes important connections to existing hypotheses and previous findings in AI safety research and development.



Article 1222

Title: Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search Unsichere LLM-basierte Suche: Quantitative Analyse und Minderung von Sicherheitsrisiken in der AI-Websuche 不安全的LLM-基于LLM的搜索:AI网络搜索中安全风险的定量分析和减轻 2502.04951v2

Authors (7): Zeren Luo, Zifan Peng, Yule Liu, Zhen Sun, Mingchen Li, Jingyi Zheng, Xinlei He

Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of AI-Powered Search Engines (AIPSEs), offering precise and efficient responses by integrating external databases with pre-existing knowledge. However, we observe that these AIPSEs raise risks such as quoting malicious content or citing malicious websites, leading to harmful or unverified information dissemination. In this study, we conduct the first safety risk quantification on seven production AIPSEs by systematically defining the threat model, risk type, and evaluating responses to various query types. With data collected from PhishTank, ThreatBook, and LevelBlue, our findings reveal that AIPSEs frequently generate harmful content that contains malicious URLs even with benign queries (e.g., with benign keywords). We also observe that directly querying a URL will increase the number of main risk-inclusive responses, while querying with natural language will slightly mitigate such risk. Compared to traditional search engines, AIPSEs outperform in both utility and safety. We further perform two case studies on online document spoofing and phishing to show the ease of deceiving AIPSEs in the real-world setting. To mitigate these risks, we develop an agent-based defense with a GPT-4.1-based content refinement tool and a URL detector. Our evaluation shows that our defense can effectively reduce the risk, with only a minor cost of reducing available information by approximately 10.7%. Our research highlights the urgent need for robust safety measures in AIPSEs.



Article 1223

Title@2025-06-08 (7): Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy

Title: Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy Imperatives Lernen: Ein selbstüberwachtes neuro-symbolisches Lernrahmenwerk für Roboterautonomie 必要学习:自我监督的机器人自主的神经-双向学习框架 2406.16087v6

Authors (19): Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao

Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, labeling data for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neuro-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy, leveraging the generalization abilities of symbolic reasoning. The framework of IL consists of three primary components: a neural module, a reasoning engine, and a memory system. We formulate IL as a special bilevel optimization (BLO), which enables reciprocal learning over the three modules. This overcomes the label-intensive obstacles associated with data-driven approaches and takes advantage of symbolic reasoning concerning logical reasoning, physical principles, geometric analysis, etc. We discuss several optimization techniques for IL and verify their effectiveness in five distinct robot autonomy tasks including path planning, rule induction, optimal control, visual odometry, and multi-robot routing. Through various experiments, we show that IL can significantly enhance robot autonomy capabilities and we anticipate that it will catalyze further research across diverse domains.



Article 1224

Title@2025-06-08 (7): InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference

Title: InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference InfoSEM: Ein tiefgehendes Generatives Modell mit informativen Prioren für Gene Regulatory Network Inferenz InfoSEM: 基因监管网络推论具有信息前置信息的深生成模型 2503.04483v2

Authors (7): Tianyu Cui, Song-Jun Xu, Artem Moskalev, Shuwei Li, Tommaso Mansi, Mangal Prakash, Rui Liao

Inferring Gene Regulatory Networks (GRNs) from gene expression data is crucial for understanding biological processes. While supervised models are reported to achieve high performance for this task, they rely on costly ground truth (GT) labels and risk learning gene-specific biases, such as class imbalances of GT interactions, rather than true regulatory mechanisms. To address these issues, we introduce InfoSEM, an unsupervised generative model that leverages textual gene embeddings as informative priors, improving GRN inference without GT labels. InfoSEM can also integrate GT labels as an additional prior when available, avoiding biases and further enhancing performance. Additionally, we propose a biologically motivated benchmarking framework that better reflects real-world applications such as biomarker discovery and reveals learned biases of existing supervised methods. InfoSEM outperforms existing models by 38.5% across four datasets using textual embeddings prior and further boosts performance by 11.1% when integrating labeled data as priors.


Article 1225

Title@2025-06-08 (7): Improving the Efficiency of Long Document Classification using Sentence Ranking Approach

Title: Improving the Efficiency of Long Document Classification using Sentence Ranking Approach Verbesserung der Effizienz der Langdokumentklassifikation mittels Sentence-Ranking-Ansatz 采用判决分级办法提高长文件分类的效率 2506.07248v1

Authors (4): Prathamesh Kokate, Mitali Sarnaik, Manavi Khopade, Raviraj Joshi

Long document classification poses challenges due to the computational limitations of transformer-based models, particularly BERT, which are constrained by fixed input lengths and quadratic attention complexity. Moreover, using the full document for classification is often redundant, as only a subset of sentences typically carries the necessary information. To address this, we propose a TF-IDF-based sentence ranking method that improves efficiency by selecting the most informative content. Our approach explores fixed-count and percentage-based sentence selection, along with an enhanced scoring strategy combining normalized TF-IDF scores and sentence length. Evaluated on the MahaNews LDC dataset of long Marathi news articles, the method consistently outperforms baselines such as first, last, and random sentence selection. With MahaBERT-v2, we achieve near-identical classification accuracy with just a 0.33 percent drop compared to the full-context baseline, while reducing input size by over 50 percent and inference latency by 43 percent. This demonstrates that significant context reduction is possible without sacrificing performance, making the method practical for real-world long document classification tasks.
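
The selection step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the smoothed IDF formula, the `length_weight` mixing parameter, and the regex-based sentence splitter and tokenizer are all assumptions, and the naive tokenizer is only adequate for Latin-script text (Marathi would need a proper tokenizer).

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace (assumption).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_scores(sentences):
    # Treat each sentence as a "document" and sum tf * idf over its distinct tokens.
    docs = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(docs)
    df = Counter(tok for d in docs for tok in set(d))
    scores = []
    for d in docs:
        if not d:
            scores.append(0.0)
            continue
        tf = Counter(d)
        scores.append(
            sum((c / len(d)) * (math.log((1 + n) / (1 + df[tok])) + 1.0)
                for tok, c in tf.items())
        )
    return scores


def select_sentences(text, keep_ratio=0.5, length_weight=0.0):
    # Percentage-based selection; selected sentences keep their original order.
    sentences = split_sentences(text)
    if not sentences:
        return []
    tfidf = tfidf_scores(sentences)
    lengths = [len(s.split()) for s in sentences]
    max_tfidf = max(tfidf) or 1.0
    max_len = max(lengths) or 1
    # Hypothetical combined score: normalized TF-IDF plus weighted normalized length.
    combined = [x / max_tfidf + length_weight * (l / max_len)
                for x, l in zip(tfidf, lengths)]
    k = max(1, math.ceil(keep_ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `select_sentences(document, keep_ratio=0.5)` would keep roughly half of the sentences before the text is passed to the classifier; a fixed-count variant would replace the `k` computation with a constant.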


Article 1226

Title@2025-06-08 (7): Promoting Ensemble Diversity with Interactive Bayesian Distributional Robustness for Fine-tuning Foundation Models

Title: Promoting Ensemble Diversity with Interactive Bayesian Distributional Robustness for Fine-tuning Foundation Models Förderung der Ensemblevielfalt mit interaktiver bayesischer Distributions Robustheit für Feinsteuerungs-Stiftungsmodelle 促进混合多样性与互动的贝叶斯分配强力促进微调基础模型 2506.07247v1

Authors (6): Ngoc-Quan Pham, Tuan Truong, Quyen Tran, Tan Nguyen, Dinh Phung, Trung Le

We introduce Interactive Bayesian Distributional Robustness (IBDR), a novel Bayesian inference framework that allows modeling the interactions between particles, thereby enhancing ensemble quality through increased particle diversity. IBDR is grounded in a generalized theoretical framework that connects the distributional population loss with the approximate posterior, motivating a practical dual optimization procedure that enforces distributional robustness while fostering particle diversity. We evaluate IBDR’s performance against various baseline methods using the VTAB-1K benchmark and the common reasoning language task. The results consistently show that IBDR outperforms these baselines, underscoring its effectiveness in real-world applications.


Article 1227

Title@2025-06-08 (7): Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs

Title: Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs Übertaktung LLM-Reasoning: Überwachung und Kontrolle von Denkpfadlängen in LLMs LLM 解释:监测和控制LLM中思考长度 2506.07240v1

Authors (3): Roy Eisenstadt, Itamar Zimerman, Lior Wolf

Recently, techniques such as explicit structured reasoning have demonstrated strong test-time scaling behavior by enforcing a separation between the model’s internal “thinking” process and the final response. A key factor influencing answer quality in this setting is the length of the thinking stage. When the reasoning is too short, the model may fail to capture the complexity of the task. Conversely, when it is too long, the model may overthink, leading to unnecessary computation and degraded performance. This paper explores and exploits the underlying mechanisms by which LLMs understand and regulate the length of their reasoning during explicit thought processes. First, we show that LLMs encode their progress through the reasoning process and introduce an interactive progress bar visualization, which is then used to reveal insights on the model’s planning dynamics. Second, we manipulate the internal progress encoding during inference to reduce unnecessary steps and generate a more concise and decisive chain of thoughts. Our empirical results demonstrate that this “overclocking” method mitigates overthinking, improves answer accuracy, and reduces inference latency. Our code is publicly available.


Article 1228

Title@2025-06-08 (7): Learn as Individuals, Evolve as a Team: Multi-agent LLMs Adaptation in Embodied Environments

Title: Learn as Individuals, Evolve as a Team: Multi-agent LLMs Adaptation in Embodied Environments Lernen als Individuen, Evolve als Team: Multi-Agent LLMs Anpassung in körpereigenen Umgebungen 作为个人学习,作为一个团队参与:多剂LMs在渗透环境中的适应 2506.07232v1

Authors (6): Xinran Li, Chenjia Bai, Zijian Li, Jiakun Zheng, Ting Xiao, Jun Zhang

Large language models (LLMs) possess extensive knowledge bases and strong reasoning capabilities, making them promising tools for complex, multi-agent planning in embodied environments. However, despite LLMs’ advanced abilities and the sophisticated modular design of agentic methods, existing LLM-based planning algorithms remain limited by weak adaptation capabilities to multi-agent embodied scenarios. We address this limitation by introducing a framework that enables LLM agents to learn and evolve both before and during test time, equipping them with environment-relevant knowledge for better planning and enhanced communication for improved cooperation. Inspired by centralized training with decentralized execution in multi-agent reinforcement learning, we propose a Learn as Individuals, Evolve as a Team (LIET) paradigm for multi-agent LLMs adaptation. At the individual level, LLM agents learn a local utility function from exploratory datasets to better comprehend the embodied environment, which is then queried during test time to support informed decision-making. At the team level, LLM agents collaboratively and iteratively maintain and update a shared cooperation knowledge list based on new experiences, using it to guide more effective communication. By combining individual learning with team evolution, LIET enables comprehensive and flexible adaptation for LLM agents. Our experiments on Communicative Watch-And-Help and ThreeD-World Multi-Agent Transport benchmarks demonstrate that LIET, instantiated with both LLaMA and GPT-4o, outperforms existing baselines and exhibits strong cooperative planning abilities.


Article 1229

Title@2025-06-08 (7): VARSHAP: Addressing Global Dependency Problems in Explainable AI with Variance-Based Local Feature Attribution

Title: VARSHAP: Addressing Global Dependency Problems in Explainable AI with Variance-Based Local Feature Attribution VARSHAP: Bewältigung globaler Abhängigkeitsprobleme bei erklärbarer KI mit varianzbasierter lokaler Feature-Attribution VARSHAP: 解决全球依赖性问题,在可解释的AI 中与基于差异的地方特点 2506.07229v1

Authors (4): Mateusz Gajewski, Mikołaj Morzy, Adam Karczmarz, Piotr Sankowski

Existing feature attribution methods like SHAP often suffer from global dependence, failing to capture true local model behavior. This paper introduces VARSHAP, a novel model-agnostic local feature attribution method which uses the reduction of prediction variance as the key importance metric of features. Building upon the Shapley value framework, VARSHAP satisfies the key Shapley axioms, but, unlike SHAP, is resilient to global data distribution shifts. Experiments on synthetic and real-world datasets demonstrate that VARSHAP outperforms popular methods such as KernelSHAP or LIME, both quantitatively and qualitatively.


Article 1230

Title@2025-06-08 (7): A spring-block theory of feature learning in deep neural networks

Title: A spring-block theory of feature learning in deep neural networks Eine Frühling-Block-Theorie des Feature-Lernens in tiefen neuronalen Netzwerken 深神经网络中特征学习的春季阶段理论 2407.19353v3

Authors (3): Cheng Shi, Liming Pan, Ivan Dokmanić

Feature-learning deep nets progressively collapse data to a regular low-dimensional geometry. How this emerges from the collective action of nonlinearity, noise, learning rate, and other factors, has eluded first-principles theories built from microscopic neuronal dynamics. We exhibit a noise-nonlinearity phase diagram that identifies regimes where shallow or deep layers learn more effectively and propose a macroscopic mechanical theory that reproduces the diagram and links feature learning across layers to generalization.


Article 1231

Title@2025-06-08 (7): Targeting relative risk heterogeneity with causal forests

Title: Targeting relative risk heterogeneity with causal forests Ausrichtung der relativen Risiko Heterogenität mit ursächlichen Wäldern 确定相对风险目标,确定与因果森林的异质性 2309.15793v3

Authors (4): Vik Shirvaikar, Andrea Storås, Xi Lin, Chris Holmes

The identification of heterogeneous treatment effects (HTE) across subgroups is of significant interest in clinical trial analysis. Several state-of-the-art HTE estimation methods, including causal forests, apply recursive partitioning for non-parametric identification of relevant covariates and interactions. However, the partitioning criterion is typically based on differences in absolute risk. This can dilute statistical power by masking variation in the relative risk, which is often a more appropriate quantity of clinical interest. In this work, we propose and implement a methodology for modifying causal forests to target relative risk, using a novel node-splitting procedure based on exhaustive generalized linear model comparison. We present results from simulated data that suggest relative risk causal forests can capture otherwise undetected sources of heterogeneity. We implement our method on real-world trial data to explore HTEs for liraglutide in patients with type 2 diabetes.


Article 1232

Title@2025-06-08 (7): VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data

Title: VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data VORTEX: Ein räumliches Computing-Framework für optimierte Drohnentelemetrie-Extraktion aus der First-Person-Ansicht Flugdaten anzeigen VORTEX:从第一人查看飞行数据中最佳遥测提取无人机空间计算框架 2412.18505v2

Authors (2): James E. Gallagher, Edward J. Oughton

This paper presents the Visual Optical Recognition Telemetry EXtraction (VORTEX) system for extracting and analyzing drone telemetry data from First Person View (FPV) Uncrewed Aerial System (UAS) footage. VORTEX employs MMOCR, a PyTorch-based Optical Character Recognition (OCR) toolbox, to extract telemetry variables from drone Heads Up Display (HUD) recordings, utilizing advanced image preprocessing techniques, including CLAHE enhancement and adaptive thresholding. The study optimizes spatial accuracy and computational efficiency through systematic investigation of temporal sampling rates (1s, 5s, 10s, 15s, 20s) and coordinate processing methods. Results demonstrate that the 5-second sampling rate, utilizing 4.07% of available frames, provides the optimal balance with a point retention rate of 64% and mean speed accuracy within 4.2% of the 1-second baseline while reducing computational overhead by 80.5%. Comparative analysis of coordinate processing methods reveals that while UTM Zone 33N projection and Haversine calculations provide consistently similar results (within 0.1% difference), raw WGS84 coordinates underestimate distances by 15-30% and speeds by 20-35%. Altitude measurements showed unexpected resilience to sampling rate variations, with only 2.1% variation across all intervals. This research is the first of its kind, providing quantitative benchmarks for establishing a robust framework for drone telemetry extraction and analysis using open-source tools and spatial libraries.
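
For readers unfamiliar with the Haversine calculation mentioned above, the sketch below computes great-circle distance between two latitude/longitude fixes and a mean speed over a sampled track. It is an illustration under stated assumptions, not VORTEX code: the function names, the `(t_seconds, lat, lon)` tuple layout, and the mean Earth radius constant are all hypothetical choices.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two points given in decimal degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_speed_mps(fixes):
    # fixes: list of (t_seconds, lat, lon) tuples sorted by time (hypothetical layout).
    dist = sum(haversine_m(a[1], a[2], b[1], b[2]) for a, b in zip(fixes, fixes[1:]))
    elapsed = fixes[-1][0] - fixes[0][0]
    return dist / elapsed
```

Sampling the same track every 5 seconds instead of every second only changes which fixes are passed in, which is why a coarser rate can trade some path detail for far fewer OCR calls.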


Article 1233

Title@2025-06-08 (7): Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts

Title: Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts Diversifizierung des Expertenwissens für Task-Agnostic Pruning in Sparse Mixture-of-Experts 使专家在Sparse Mixture 专家中的任务不可知性调节专家知识多样化 2407.09590v4

Authors (5): Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, Jianfeng Gao

By increasing model parameters but activating them sparsely when performing a task, the use of Mixture-of-Experts (MoE) architecture significantly improves the performance of Large Language Models (LLMs) without increasing the inference cost. However, the memory consumption due to the growing number of experts presents a challenge to the deployment of these models in many real world settings. Our empirical study reveals that some experts encode redundant knowledge during pre-training. We thus propose a method of grouping and pruning similar experts to improve the model’s parameter efficiency. We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures, including Mixtral, Deepseek-MoE, and Qwen. The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks. We will release our code to facilitate future research.


Article 1234

Title@2025-06-08 (7): Advancing Multimodal Reasoning Capabilities of Multimodal Large Language Models via Visual Perception Reward

Title: Advancing Multimodal Reasoning Capabilities of Multimodal Large Language Models via Visual Perception Reward Mehr multimodale Begründungsfähigkeiten multimodaler Großsprachenmodelle durch Visual Perception Reward 通过视觉感知奖励促进多式大语言模型的多式多式能力 2506.07218v1

Authors (7): Tong Xiao, Xin Xu, Zhenya Huang, Hongyu Gao, Quan Liu, Qi Liu, Enhong Chen

Enhancing the multimodal reasoning capabilities of Multimodal Large Language Models (MLLMs) is a challenging task that has attracted increasing attention in the community. Recently, several studies have applied Reinforcement Learning with Verifiable Rewards (RLVR) to the multimodal domain in order to enhance the reasoning abilities of MLLMs. However, these works largely overlook the enhancement of multimodal perception capabilities in MLLMs, which serve as a core prerequisite and foundational component of complex multimodal reasoning. Through McNemar’s test, we find that existing RLVR methods fail to effectively enhance the multimodal perception capabilities of MLLMs, thereby limiting their further improvement in multimodal reasoning. To address this limitation, we propose Perception-R1, which introduces a novel visual perception reward that explicitly encourages MLLMs to perceive the visual content accurately, thereby effectively incentivizing both their multimodal perception and reasoning capabilities. Specifically, we first collect textual visual annotations from the CoT trajectories of multimodal problems, which will serve as visual references for reward assignment. During RLVR training, we employ a judging LLM to assess the consistency between the visual annotations and the responses generated by the MLLM, and assign the visual perception reward based on these consistency judgments. Extensive experiments on several multimodal reasoning benchmarks demonstrate the effectiveness of our Perception-R1, which achieves state-of-the-art performance on most benchmarks using only 1,442 training data.
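
McNemar's test, mentioned above, compares two models on the same items by looking only at the items where they disagree. The sketch below is a generic version of the test (chi-square approximation with one degree of freedom, no continuity correction); the abstract does not state which variant the authors used, and the function name and argument names are hypothetical.

```python
import math


def mcnemar(b, c):
    # b: items only the first model answered correctly
    # c: items only the second model answered correctly
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A large p-value means the discordant counts give no evidence that one model perceives the visual content better than the other.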


Article 1235

Title@2025-06-08 (7): Decoupled Data Consistency with Diffusion Purification for Image Restoration

Title: Decoupled Data Consistency with Diffusion Purification for Image Restoration Entkoppelte Datenkonsistenz mit Diffusionsreinigung für Bildrestaurierung 数据与用于图像恢复的传播净化相兼容 2403.06054v6

Authors (6): Xiang Li, Soo Min Kwon, Shijun Liang, Ismail R. Alkhouri, Saiprasad Ravishankar, Qing Qu

Diffusion models have recently gained traction as a powerful class of deep generative priors, excelling in a wide range of image restoration tasks due to their exceptional ability to model data distributions. To solve image restoration problems, many existing techniques achieve data consistency by incorporating additional likelihood gradient steps into the reverse sampling process of diffusion models. However, the additional gradient steps pose a challenge for real-world practical applications as they incur a large computational overhead, thereby increasing inference time. They also present additional difficulties when using accelerated diffusion model samplers, as the number of data consistency steps is limited by the number of reverse sampling steps. In this work, we propose a novel diffusion-based image restoration solver that addresses these issues by decoupling the reverse process from the data consistency steps. Our method involves alternating between a reconstruction phase to maintain data consistency and a refinement phase that enforces the prior via diffusion purification. Our approach demonstrates versatility, making it highly adaptable for efficient problem-solving in latent space. Additionally, it reduces the necessity for numerous sampling steps through the integration of consistency models. The efficacy of our approach is validated through comprehensive experiments across various image restoration tasks, including image denoising, deblurring, inpainting, and super-resolution.


Article 1236

Title@2025-06-08 (7): RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts

Title: RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts RepLoRA: Reparameterisieren der Low-Rank-Anpassung über die Perspektive der Mischung von Experten RepLORA:通过专家混合的视角,恢复低成本适应的计量 2502.03044v2

Authors (6): Tuan Truong, Chau Nguyen, Huy Nguyen, Minh Le, Trung Le, Nhat Ho

Low-rank Adaptation (LoRA) has emerged as a powerful method for fine-tuning large-scale foundation models. Despite its popularity, the theoretical understanding of LoRA has remained limited. This paper presents a theoretical analysis of LoRA by examining its connection to the Mixture of Experts models. Under this framework, we show that simple reparameterizations of the LoRA matrices can notably accelerate the low-rank matrix estimation process. In particular, we prove that reparameterization can reduce the data needed to achieve a desired estimation error from an exponential to a polynomial scale. Motivated by this insight, we propose Reparameterized Low-Rank Adaptation (RepLoRA), which incorporates lightweight MLPs to reparameterize the LoRA matrices. Extensive experiments across multiple domains demonstrate that RepLoRA consistently outperforms vanilla LoRA. Notably, with limited data, RepLoRA surpasses LoRA by a margin of up to 40.0% and achieves LoRA’s performance with only 30.0% of the training data, highlighting both the theoretical and empirical robustness of our PEFT method.


Article 1237

Title@2025-06-08 (7): Tilted Sharpness-Aware Minimization

Title: Tilted Sharpness-Aware Minimization Gekippte Schärfe-Bewusst-Minimierung 静液尖锐度- 最小化 2410.22656v2

Authors (3): Tian Li, Tianyi Zhou, Jeffrey A. Bilmes

Sharpness-Aware Minimization (SAM) has been demonstrated to improve the generalization performance of overparameterized models by seeking flat minima on the loss landscape through optimizing model parameters that incur the largest loss within a neighborhood. Nevertheless, such min-max formulations are computationally challenging, especially when the problem is highly non-convex. Additionally, focusing only on the worst-case local solution while ignoring potentially many other local solutions may be suboptimal when searching for flat minima. In this work, we propose Tilted SAM (TSAM), a smoothed generalization of SAM inspired by exponential tilting that effectively assigns higher priority to local solutions that incur larger losses. TSAM is parameterized by a tilt hyperparameter $t$ and reduces to SAM as $t$ approaches infinity. We show that TSAM is smoother than SAM and thus easier to optimize, and it explicitly favors flatter minima. We develop algorithms motivated by the discretization of Hamiltonian dynamics to solve TSAM. Empirically, TSAM arrives at flatter local minima and results in better test performance than the SAM and ERM baselines across a range of image and text tasks.
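
The limiting behavior described above can be checked numerically. The abstract does not write out the objective, so this sketch assumes the standard exponential-tilting form, $\frac{1}{t}\log \mathbb{E}\,[e^{t L}]$, applied to a handful of losses that stand in for the loss at sampled perturbations of the parameters. The function name and the sample values are hypothetical.

```python
import math


def tilted_loss(losses, t):
    # Tilted aggregate (1/t) * log(mean(exp(t * loss))); this sketch assumes t > 0.
    if t <= 0:
        raise ValueError("this sketch assumes t > 0")
    scaled = [t * loss for loss in losses]
    m = max(scaled)  # subtract the max for a numerically stable log-sum-exp
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / t
```

For small `t` the value is close to the plain average of the losses, and as `t` grows it approaches the largest loss, which matches the statement that the tilted objective recovers the worst-case (SAM-style) objective in the limit.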


Article 1238

Title@2025-06-08 (7): Randomness, exchangeability, and conformal prediction

Title: Randomness, exchangeability, and conformal prediction Zufälligkeit, Austauschbarkeit und konforme Vorhersage 随机性、易交换性以及符合性预测 2501.11689v3

Authors (1): Vladimir Vovk

This paper argues for a wider use of the functional theory of randomness, a modification of the algorithmic theory of randomness getting rid of unspecified additive constants. Both theories are useful for understanding relationships between the assumptions of IID data and data exchangeability. While the assumption of IID data is standard in machine learning, conformal prediction relies on data exchangeability. Nouretdinov, V’yugin, and Gammerman showed, using the language of the algorithmic theory of randomness, that conformal prediction is a universal method under the assumption of IID data. In this paper (written for the Alex Gammerman Festschrift) I will selectively review connections between exchangeability and the property of being IID, early history of conformal prediction, my encounters and collaboration with Alex and other interesting people, and a translation of Nouretdinov et al.’s results into the language of the functional theory of randomness, which moves it closer to practice. Namely, the translation says that every confidence predictor that is valid for IID data can be transformed to a conformal predictor without losing much in predictive efficiency.


Article 1239

Title@2025-06-08 (7): Improving Generalization with Flat Hilbert Bayesian Inference

Title: Improving Generalization with Flat Hilbert Bayesian Inference Verbesserung der Generalisierung durch flache Hilbert Bayesische Schlussfolgerung 利用Flat Hilbert Bayesian推论改进普遍化 2410.04196v2

Authors (6): Tuan Truong, Quyen Tran, Quan Pham-Ngoc, Nhat Ho, Dinh Phung, Trung Le

We introduce Flat Hilbert Bayesian Inference (FHBI), an algorithm designed to enhance generalization in Bayesian inference. Our approach involves an iterative two-step procedure with an adversarial functional perturbation step and a functional descent step within a reproducing kernel Hilbert space. This methodology is supported by a theoretical analysis that extends previous findings on generalization ability from finite-dimensional Euclidean spaces to infinite-dimensional functional spaces. To evaluate the effectiveness of FHBI, we conduct comprehensive comparisons against nine baseline methods on the VTAB-1K benchmark, which encompasses 19 diverse datasets across various domains with diverse semantics. Empirical results demonstrate that FHBI consistently outperforms the baselines by notable margins, highlighting its practical efficacy.


Article 1240

Title@2025-06-08 (7): Another look at inference after prediction

Title: Another look at inference after prediction Ein weiterer Blick auf die Schlussfolgerung nach der Vorhersage 另一种观点是预测之后的推论 2411.19908v4

Authors (5): Jessica Gronsbell, Jianhui Gao, Yaqi Shi, Zachary R. McCaw, David Cheng

From structural biology to epidemiology, predictions from machine learning (ML) models are increasingly used to complement costly gold-standard data to enable faster, more affordable, and scalable scientific inquiry. In response, prediction-based (PB) inference has emerged to accommodate statistical analysis using a large volume of predictions together with a small amount of gold-standard data. The goals of PB inference are two-fold: (i) to mitigate bias from errors in predictions and (ii) to improve efficiency relative to traditional inference using only the gold-standard data. While early PB inference methods focused on bias, their ability to enhance efficiency remains unclear. We revisit a popular PB inference method and show that a simple modification can be applied to guarantee improvements in efficiency beyond yielding valid inferences when the ML predictions are imperfect. The utility of this approach in leveraging prediction-based outcomes to enhance efficiency is demonstrated through extensive simulation studies and an application to the UK Biobank data. We further contextualize the problem of PB inference through historical literature from economics and statistics to highlight perspectives from classical methods in this contemporary problem.


Article 1241

Title@2025-06-08 (7): Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching

Title: Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching Audio-Synthesizer-Inversion in symmetrischen Parameterräumen mit ungefähr gleichmässigem Flow-Matching 音频合成器在对称参数空间的反转, 大约等差流匹配 2506.07199v1

Authors (3): Ben Hayes, Charalampos Saitis, György Fazekas

Many audio synthesizers can produce the same signal given different parameter configurations, meaning the inversion from sound to parameters is an inherently ill-posed problem. We show that this is largely due to intrinsic symmetries of the synthesizer, and focus in particular on permutation invariance. First, we demonstrate on a synthetic task that regressing point estimates under permutation symmetry degrades performance, even when using a permutation-invariant loss function or symmetry-breaking heuristics. Then, viewing equivalent solutions as modes of a probability distribution, we show that a conditional generative model substantially improves performance. Further, acknowledging the invariance of the implicit parameter distribution, we find that performance is further improved by using a permutation equivariant continuous normalizing flow. To accommodate intricate symmetries in real synthesizers, we also propose a relaxed equivariance strategy that adaptively discovers relevant symmetries from data. Applying our method to Surge XT, a full-featured open source synthesizer used in real world audio production, we find our method outperforms regression and generative baselines across audio reconstruction metrics.


Article 1242

Title@2025-06-08 (7): GGBall: Graph Generative Model on Poincaré Ball

Title: GGBall: Graph Generative Model on Poincaré Ball GGBall: Graph Generatives Modell auf Poincaré Ball GGBall:poincaré球球的图形生成模型 2506.07198v1

Authors (6): Tianci Bu, Chuanrui Wang, Hao Ma, Haoren Zheng, Xin Lu, Tailin Wu

Generating graphs with hierarchical structures remains a fundamental challenge due to the limitations of Euclidean geometry in capturing exponential complexity. Here we introduce GGBall, a novel hyperbolic framework for graph generation that integrates geometric inductive biases with modern generative paradigms. GGBall combines a Hyperbolic Vector-Quantized Autoencoder (HVQVAE) with a Riemannian flow matching prior defined via closed-form geodesics. This design enables flow-based priors to model complex latent distributions, while vector quantization helps preserve the curvature-aware structure of the hyperbolic space. We further develop a suite of hyperbolic GNN and Transformer layers that operate entirely within the manifold, ensuring stability and scalability. Empirically, our model reduces degree MMD by over 75% on Community-Small and over 40% on Ego-Small compared to state-of-the-art baselines, demonstrating an improved ability to preserve topological hierarchies. These results highlight the potential of hyperbolic geometry as a powerful foundation for the generative modeling of complex, structured, and hierarchical data domains. Our code is available at https://github.com/AI4Science-WestlakeU/GGBall.


Article 1243

Title@2025-06-08 (7): Finite-PINN: A Physics-Informed Neural Network with Finite Geometric Encoding for Solid Mechanics

Title: Finite-PINN: A Physics-Informed Neural Network with Finite Geometric Encoding for Solid Mechanics Finite-PINN: Ein physikinformiertes Neuronales Netzwerk mit Finite Geometrischer Encoding für feste Mechanik Finite-PINN:一个物理化神经网络,其中含有固体机械学的有限几何编码 2412.09453v2

Authors (4): Haolin Li, Yuyang Miao, Zahra Sharif Khodaei, M. H. Aliabadi

PINN models have demonstrated capabilities in addressing fluid PDE problems, and their potential in solid mechanics is beginning to emerge. This study identifies two key challenges when using PINN to solve general solid mechanics problems. These challenges become evident when comparing the limitations of PINN with the well-established numerical methods commonly used in solid mechanics, such as the finite element method (FEM). Specifically: a) PINN models generate solutions over an infinite domain, which conflicts with the finite boundaries typical of most solid structures; and b) the solution space utilised by PINN is Euclidean, which is inadequate for addressing the complex geometries often present in solid structures. This work presents a PINN architecture for general solid mechanics problems, referred to as the Finite-PINN model. The model is designed to effectively tackle these two key challenges, while retaining as much of the original PINN framework as possible. To this end, the Finite-PINN incorporates finite geometric encoding into the neural network inputs, thereby transforming the solution space from a conventional Euclidean space into a hybrid Euclidean-topological space. The model is comprehensively trained using both strong-form and weak-form loss formulations, enabling its application to a wide range of forward and inverse problems in solid mechanics. For forward problems, the Finite-PINN model efficiently approximates solutions to solid mechanics problems when the geometric information of a given structure has been preprocessed. For inverse problems, it effectively reconstructs full-field solutions from very sparse observations by embedding both physical laws and geometric information within its architecture.


Article 1244

Title@2025-06-08 (7): Analyzing Breast Cancer Survival Disparities by Race and Demographic Location: A Survival Analysis Approach

Title: Analyzing Breast Cancer Survival Disparities by Race and Demographic Location: A Survival Analysis Approach Analyse der Überlebensdisparitäten von Brustkrebs nach Rasse und demographischer Lage: Ein Ansatz zur Überlebensanalyse 按种族和人口地点分析乳腺癌生存差异:生存分析方法 2506.07191v1

Authors (2): Ramisa Farha, Joshua O. Olukoya

This study employs a robust analytical framework to uncover patterns in survival outcomes among breast cancer patients from diverse racial and geographical backgrounds. This research uses the SEER 2021 dataset to analyze breast cancer survival outcomes and to identify and understand dissimilarities. Our approach begins with exploratory data analysis (EDA), through which we identify key variables that influence survival rates. We then apply survival analysis techniques, including the Kaplan-Meier estimator, the log-rank test, and the Cox Proportional Hazards model, to determine how survival rates vary across racial groups and countries. Model validation and interpretation are undertaken to ensure the reliability of our findings, which are documented comprehensively to inform policymakers and healthcare professionals. The outcome of this paper is a detailed statistical analysis that not only highlights disparities in breast cancer treatment and care but also serves as a foundational tool for developing targeted interventions to address the inequalities effectively. Through this research, our aim is to contribute to the global efforts to improve breast cancer outcomes and reduce treatment disparities.
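
As a reminder of what the Kaplan-Meier estimator computes, the sketch below builds the survival curve from follow-up times and event indicators using only the standard library. It is a generic textbook version, not the authors' pipeline; the function name and input layout are assumptions, and a real analysis would typically use an established survival-analysis package.

```python
def kaplan_meier(durations, events):
    # durations: follow-up time per patient
    # events: 1 if the event (death) was observed, 0 if the patient was censored
    pairs = sorted(zip(durations, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []  # list of (time, estimated survival probability just after time)
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # both deaths and censored patients leave the risk set
    return curve
```

Running the function separately on each racial or geographic subgroup gives the curves that a log-rank test would then compare.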



Article 1245

Title@2025-06-08 (7): ST-USleepNet: A Spatial-Temporal Coupling Prominence Network for Multi-Channel Sleep Staging

Title: ST-USleepNet: A Spatial-Temporal Coupling Prominence Network for Multi-Channel Sleep Staging ST-USleepNet: Ein Raum-Temporal-Koppelungs-Prominenz-Netzwerk für Multi-Channel-Sleep-Staging ST-USSLSLSINT:多通道睡眠调节空间-时际聚合显要网络 2408.11884v3

Authors (4): Jingying Ma, Qika Lin, Ziyu Jia, Mengling Feng

Sleep staging is critical to assess sleep quality and diagnose disorders. Despite advancements in artificial intelligence enabling automated sleep staging, significant challenges remain: (1) Simultaneously extracting prominent temporal and spatial sleep features from multi-channel raw signals, including characteristic sleep waveforms and salient spatial brain networks. (2) Capturing the spatial-temporal coupling patterns essential for accurate sleep staging. To address these challenges, we propose a novel framework named ST-USleepNet, comprising a spatial-temporal graph construction module (ST) and a U-shaped sleep network (USleepNet). The ST module converts raw signals into a spatial-temporal graph based on signal similarity, temporal, and spatial relationships to model spatial-temporal coupling patterns. The USleepNet employs a U-shaped structure for both the temporal and spatial streams, mirroring its original use in image segmentation to isolate significant targets. Applied to raw sleep signals and graph data from the ST module, USleepNet effectively segments these inputs, simultaneously extracting prominent temporal and spatial sleep features. Testing on three datasets demonstrates that ST-USleepNet outperforms existing baselines, and model visualizations confirm its efficacy in extracting prominent sleep features and temporal-spatial coupling patterns across various sleep stages. The code is available at https://github.com/Majy-Yuji/ST-USleepNet.



Article 1246

Title@2025-06-08 (7): Learning based on neurovectors for tabular data: a new neural network approach

Title: Learning based on neurovectors for tabular data: a new neural network approach Lernen auf Basis von Neurovektoren für tabellarische Daten: ein neuer neuronaler Netzwerkansatz 基于用于表格数据的神经动力学的学习:新的神经网络方法 2506.07185v1

Authors (4): J. C. Husillos, A. Gallego, A. Roma, A. Troncoso

In this paper, we present a novel learning approach based on Neurovectors, an innovative paradigm that structures information through interconnected nodes and vector relationships for tabular data processing. Unlike traditional artificial neural networks that rely on weight adjustment through backpropagation, Neurovectors encode information by structuring data in vector spaces where energy propagation, rather than traditional weight updates, drives the learning process, enabling a more adaptable and explainable learning process. Our method generates dynamic representations of knowledge through neurovectors, thereby improving both the interpretability and efficiency of the predictive model. Experimental results using datasets from well-established repositories such as the UCI machine learning repository and Kaggle are reported both for classification and regression. To evaluate its performance, we compare our approach with standard machine learning and deep learning models, showing that Neurovectors achieve competitive accuracy.



Article 1247

Title@2025-06-08 (7): Regularized Adaptive Graph Learning for Large-Scale Traffic Forecasting

Title: Regularized Adaptive Graph Learning for Large-Scale Traffic Forecasting Regularisiertes Adaptives Graphenlernen für großräumige Verkehrsprognosen 用于大型交通流量预测的正规化适应性图表学习 2506.07179v1

Authors (5): Kaiqi Wu, Weiyang Kong, Sen Zhang, Yubao Liu, Zitong Chen

Traffic prediction is a critical task in spatial-temporal forecasting with broad applications in travel planning and urban management. Adaptive graph convolution networks have emerged as mainstream solutions due to their ability to learn node embeddings in a data-driven manner and capture complex latent dependencies. However, existing adaptive graph learning methods for traffic forecasting often either ignore the regularization of node embeddings, which account for a significant proportion of model parameters, or face scalability issues from expensive graph convolution operations. To address these challenges, we propose a Regularized Adaptive Graph Learning (RAGL) model. First, we introduce a regularized adaptive graph learning framework that synergizes Stochastic Shared Embedding (SSE) and adaptive graph convolution via a residual difference mechanism, achieving both embedding regularization and noise suppression. Second, to ensure scalability on large road networks, we develop the Efficient Cosine Operator (ECO), which performs graph convolution based on the cosine similarity of regularized embeddings with linear time complexity. Extensive experiments on four large-scale real-world traffic datasets show that RAGL consistently outperforms state-of-the-art methods in terms of prediction accuracy and exhibits competitive computational efficiency.
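
One way a cosine-similarity aggregation can reach linear cost in the number of nodes is matrix associativity: with row-normalized embeddings E, the product (E Eᵀ) X equals E (Eᵀ X), and the second form never builds the N x N similarity matrix. The sketch below shows only this general trick in plain Python; it is an assumption about the flavor of the idea, not the authors' ECO implementation, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def l2_normalize_rows(m):
    """Scale each row to unit length so dot products become cosine similarities."""
    out = []
    for row in m:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        out.append([v / norm for v in row])
    return out


def cosine_aggregate_naive(emb, feats):
    """O(N^2) path: build the full N x N cosine-similarity matrix first."""
    e = l2_normalize_rows(emb)
    sim = matmul(e, transpose(e))
    return matmul(sim, feats)


def cosine_aggregate_linear(emb, feats):
    """O(N) path in the node count: compute the small d x f summary E^T X first."""
    e = l2_normalize_rows(emb)
    summary = matmul(transpose(e), feats)
    return matmul(e, summary)


# Hypothetical toy inputs: 3 nodes, 2-dimensional embeddings, 1 feature each.
emb = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
feats = [[1.0], [2.0], [3.0]]
naive = cosine_aggregate_naive(emb, feats)
linear = cosine_aggregate_linear(emb, feats)
```

Both paths return the same aggregated features; only the order of multiplication, and therefore the cost, differs. The sketch omits any normalization of the similarity weights.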



Article 1248

Title@2025-06-08 (7): RULE: Reinforcement UnLEarning Achieves Forget-Retain Pareto Optimality

Title: RULE: Reinforcement UnLEarning Achieves Forget-Retain Pareto Optimality RULE: Verstärkung des Unerfahrenen erreicht Vergessen-Behalten Pareto-Optimalität RULE: 强化不放功率达到遗忘-保留最佳程度 2506.07171v1

Authors (8): Chenlong Zhang, Zhuoran Jin, Hongbang Yuan, Jiaheng Wei, Tong Zhou, Kang Liu, Jun Zhao, Yubo Chen

The widespread deployment of Large Language Models (LLMs) trained on massive, uncurated corpora has raised growing concerns about the inclusion of sensitive, copyrighted, or illegal content. This has led to increasing interest in LLM unlearning: the task of selectively removing specific information from a model without retraining from scratch or degrading overall utility. However, existing methods often rely on large-scale forget and retain datasets, and suffer from unnatural responses, poor generalization, or catastrophic utility loss. In this work, we propose Reinforcement UnLearning (RULE), an efficient framework that formulates unlearning as a refusal boundary optimization problem. RULE is trained with a small portion of the forget set and synthesized boundary queries, using a verifiable reward function that encourages safe refusal on forget-related queries while preserving helpful responses on permissible inputs. We provide both theoretical and empirical evidence demonstrating the effectiveness of RULE in achieving targeted unlearning without compromising model utility. Experimental results show that, with only $12\%$ of the forget set and $8\%$ synthesized boundary data, RULE outperforms existing baselines by up to $17.5\%$ in forget quality and $16.3\%$ in response naturalness while maintaining general utility, achieving forget-retain Pareto optimality. Remarkably, we further observe that RULE improves the naturalness of model outputs, enhances training efficiency, and exhibits strong generalization ability, extending refusal behavior to semantically related but unseen queries.



Article 1249

Title@2025-06-08 (7): Efficient Text-Attributed Graph Learning through Selective Annotation and Graph Alignment

Title: Efficient Text-Attributed Graph Learning through Selective Annotation and Graph Alignment Effizientes Text-Attributes Graphenlernen durch selektive Annotation und Graphenausrichtung 通过选择性注释和图表对齐,通过选择性注释和图表对齐,高效率的文本定位图表学习 2506.07168v1

Authors (9): Huanyi Xie, Lijie Hu, Lu Yu, Tianhao Huang, Longfei Li, Meng Li, Jun Zhou, Huan Wang, Di Wang

In the realm of Text-attributed Graphs (TAGs), traditional graph neural networks (GNNs) often fall short due to the complex textual information associated with each node. Recent methods have improved node representations by leveraging large language models (LLMs) to enhance node text features, but these approaches typically require extensive annotations or fine-tuning across all nodes, which is both time-consuming and costly. To overcome these challenges, we introduce GAGA, an efficient framework for TAG representation learning. GAGA reduces annotation time and cost by focusing on annotating only representative nodes and edges. It constructs an annotation graph that captures the topological relationships among these annotations. Furthermore, GAGA employs a two-level alignment module to effectively integrate the annotation graph with the TAG, aligning their underlying structures. Experiments show that GAGA achieves classification accuracies on par with or surpassing state-of-the-art methods while requiring only 1% of the data to be annotated, demonstrating its high efficiency.



Article 1250

Title@2025-06-08 (7): AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models

Title: AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models AMoPO: Adaptive Multi-Objektive Preference-Optimierung ohne Prämienmodelle und Referenzmodelle AMOP: 适应性、多目标、不奖励模式和参考模式的优化 2506.07165v1

Authors (10): Qi Liu, Jingqing Ruan, Hao Li, Haodong Zhao, Desheng Wang, Jiansong Chen, Wan Guanglu, Xunliang Cai, Zhi Zheng, Tong Xu

Existing multi-objective preference alignment methods for large language models (LLMs) face limitations: (1) the inability to effectively balance various preference dimensions, and (2) reliance on auxiliary reward/reference models introduces computational complexity. To address these challenges, we propose Adaptive Multi-objective Preference Optimization (AMoPO), a novel framework that achieves dynamic balance across preference dimensions. By introducing the multi-objective optimization paradigm to use the dimension-aware generation metrics as implicit rewards, AMoPO aligns LLMs with diverse preferences without additional reward models or reference models. We introduce an adaptive weight assignment mechanism that models the generation space as a Gaussian distribution, allowing dynamic prioritization of preference dimensions. Empirical results demonstrate that AMoPO outperforms state-of-the-art baselines by 28.5%, and the experiments on 7B, 14B, and 32B models reveal the scaling ability of AMoPO. Moreover, additional analysis of multiple dimensions verifies its adaptability and effectiveness. These findings validate AMoPO’s capability to achieve dimension-aware preference alignment, highlighting its superiority. Our codes and datasets are available at https://github.com/Javkonline/AMoPO.



Article 1251

Title@2025-06-08 (7): ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models

Title: ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models ExLM: Nachdenken über die Auswirkungen von [MASK] Token in maskierten Sprachmodellen EXLM:重新思考[MASK]语调模式中[MASK]语调的影响 2501.13397v5

Authors (8): Kangjie Zheng, Junwei Yang, Siyue Liang, Bin Feng, Zequn Liu, Wei Ju, Zhiping Xiao, Ming Zhang

Masked Language Models (MLMs) have achieved remarkable success in many self-supervised representation learning tasks. MLMs are trained by randomly masking portions of the input sequences with [MASK] tokens and learning to reconstruct the original content based on the remaining context. This paper explores the impact of [MASK] tokens on MLMs. Analytical studies show that masking tokens can introduce the corrupted semantics problem, wherein the corrupted context may convey multiple, ambiguous meanings. This problem is also a key factor affecting the performance of MLMs on downstream tasks. Based on these findings, we propose a novel enhanced-context MLM, ExLM. Our approach expands [MASK] tokens in the input context and models the dependencies between these expanded states. This enhancement increases context capacity and enables the model to capture richer semantic information, effectively mitigating the corrupted semantics problem during pre-training. Experimental results demonstrate that ExLM achieves significant performance improvements in both text modeling and SMILES modeling tasks. Further analysis confirms that ExLM enriches semantic representations through context enhancement, and effectively reduces the semantic multimodality commonly observed in MLMs.



Article 1252

Title@2025-06-08 (7): SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision

Title: SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision SMI-Editor: Bearbeiten-basiertes SMILES-Sprachenmodell mit Fragment-Level-Überwachung SMI-编辑器:基于编辑的SMILES语言模式,带有零散监督级别 2412.05569v2

Authors (8): Kangjie Zheng, Siyue Liang, Junwei Yang, Bin Feng, Zequn Liu, Wei Ju, Zhiping Xiao, Ming Zhang

SMILES, a crucial textual representation of molecular structures, has garnered significant attention as a foundation for pre-trained language models (LMs). However, most existing pre-trained SMILES LMs focus solely on single-token-level supervision during pre-training, failing to fully leverage the substructural information of molecules. This limitation makes the pre-training task overly simplistic, preventing the models from capturing richer molecular semantic information. Moreover, during pre-training, these SMILES LMs only process corrupted SMILES inputs, never encountering any valid SMILES, which leads to a train-inference mismatch. To address these challenges, we propose SMI-Editor, a novel edit-based pre-trained SMILES LM. SMI-Editor disrupts substructures within a molecule at random and feeds the resulting SMILES back into the model, which then attempts to restore the original SMILES through an editing process. This approach not only introduces fragment-level training signals, but also enables the use of valid SMILES as inputs, allowing the model to learn how to reconstruct complete molecules from these incomplete structures. As a result, the model demonstrates improved scalability and an enhanced ability to capture fragment-level molecular information. Experimental results show that SMI-Editor achieves state-of-the-art performance across multiple downstream molecular tasks, and even outperforms several 3D molecular representation models.



Article 1253

Title@2025-06-08 (7): pFedSOP : Accelerating Training Of Personalized Federated Learning Using Second-Order Optimization

Title: pFedSOP : Accelerating Training Of Personalized Federated Learning Using Second-Order Optimization pFedSOP : Beschleunigendes Training des personalisierten Föderierten Lernens unter Verwendung von Second-Order-Optimierung pFedSOP: 利用二阶优化加快个人化联邦学习培训 2506.07159v1

Authors (2): Mrinmay Sen, Chalavadi Krishna Mohan

Personalized Federated Learning (PFL) enables clients to collaboratively train personalized models tailored to their individual objectives, addressing the challenge of model generalization in traditional Federated Learning (FL) due to high data heterogeneity. However, existing PFL methods often require more communication rounds to achieve the desired performance, primarily due to slow training caused by the use of first-order optimization, which has linear convergence. Additionally, many of these methods increase local computation because of the additional data fed into the model during the search for personalized local models. One promising solution to this slow training is second-order optimization, known for its quadratic convergence. However, employing it in PFL is challenging because it requires computing the Hessian matrix and its inverse. In this paper, we propose pFedSOP, which efficiently utilizes second-order optimization in PFL to accelerate the training of personalized models and enhance performance with fewer communication rounds. Our approach first computes a personalized local gradient update using the Gompertz function-based normalized angle between local and global gradient updates, incorporating client-specific global information. We then use a regularized Fisher Information Matrix (FIM), computed from this personalized gradient update, as an approximation of the Hessian to update the personalized models. This FIM-based second-order optimization speeds up training with fewer communication rounds by avoiding the cost of the exact Hessian, and it avoids additional data being fed into the model during the search for personalized local models. Extensive experiments on heterogeneously partitioned image classification datasets with partial client participation demonstrate that pFedSOP outperforms state-of-the-art FL and PFL algorithms.
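
To see why a regularized FIM built from a single gradient can be inverted cheaply, consider the rank-one form F = g gᵀ + ρI. By the Sherman-Morrison identity, F⁻¹g = g / (ρ + gᵀg), so no matrix is ever formed or inverted. The sketch below assumes this rank-one form purely for illustration; the abstract does not specify how pFedSOP constructs its FIM, and the names `fim_newton_step`, `apply_fim`, and `rho` are hypothetical.

```python
def fim_newton_step(grad, rho):
    """Return F^{-1} g for F = g g^T + rho * I without forming F.

    Sherman-Morrison gives F^{-1} g = g / (rho + g^T g).
    """
    sq_norm = sum(g * g for g in grad)
    return [g / (rho + sq_norm) for g in grad]


def apply_fim(grad, rho, vec):
    """Multiply vec by F = g g^T + rho * I (used only to check the step)."""
    dot = sum(g * v for g, v in zip(grad, vec))
    return [g * dot + rho * v for g, v in zip(grad, vec)]


# Hypothetical gradient and regularization strength.
grad = [3.0, 4.0]
rho = 0.5
step = fim_newton_step(grad, rho)
recovered = apply_fim(grad, rho, step)  # should reproduce grad
```

The step costs O(d) time and memory for a d-dimensional gradient, compared with O(d³) for inverting an explicit d x d Hessian.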



Article 1254

Title@2025-06-08 (7): Syntactic Control of Language Models by Posterior Inference

Title: Syntactic Control of Language Models by Posterior Inference Syntaktische Kontrolle von Sprachmodellen durch hintere Inferenz 由Posideer 推断对语言模型的协同控制 2506.07154v1

Authors (4): Vicky Xefteri, Tim Vieira, Ryan Cotterell, Afra Amini

Controlling the syntactic structure of text generated by language models is valuable for applications requiring clarity, stylistic consistency, or interpretability, yet it remains a challenging task. In this paper, we argue that sampling algorithms based on the posterior inference can effectively enforce a target constituency structure during generation. Our approach combines sequential Monte Carlo, which estimates the posterior distribution by sampling from a proposal distribution, with a syntactic tagger that ensures that each generated token aligns with the desired syntactic structure. Our experiments with GPT2 and Llama3-8B models show that with an appropriate proposal distribution, we can improve syntactic accuracy, increasing the F1 score from $12.31$ (GPT2-large) and $35.33$ (Llama3-8B) to about $93$ in both cases without compromising the language model’s fluency. These results underscore both the complexity of syntactic control and the effectiveness of sampling algorithms, offering a promising approach for applications where precise control over syntax is essential.



Article 1255

Title@2025-06-08 (7): Tensor Product Neural Networks for Functional ANOVA Model

Title: Tensor Product Neural Networks for Functional ANOVA Model Tensor Produkt Neuronale Netzwerke für funktionales ANOVA-Modell 功能ANOVA模型的神经网络 2502.15215v4

Authors (5): Seokhun Park, Insung Kong, Yongchan Choi, Chanmoo Park, Yongdai Kim

Interpretability for machine learning models is becoming more and more important as machine learning models become more complex. The functional ANOVA model, which decomposes a high-dimensional function into a sum of lower-dimensional functions (commonly referred to as components), is one of the most popular tools for interpretable AI, and recently, various neural networks have been developed for estimating each component in the functional ANOVA model. However, such neural networks are highly unstable when estimating each component since the components themselves are not uniquely defined. That is, there are multiple functional ANOVA decompositions for a given function. In this paper, we propose a novel neural network which guarantees a unique functional ANOVA decomposition and thus is able to estimate each component stably and accurately. We call our proposed neural network ANOVA Tensor Product Neural Network (ANOVA-TPNN) since it is motivated by the tensor product basis expansion. Theoretically, we prove that ANOVA-TPNN can approximate any smooth function well. Empirically, we show that ANOVA-TPNN provides much more stable estimation of each component, and thus much more stable interpretation, than existing neural networks do when the training data and the initial values of the model parameters vary.



Article 1256

Title@2025-06-08 (7): Comba: Improving Bilinear RNNs with Closed-loop Control

Title: Comba: Improving Bilinear RNNs with Closed-loop Control Comba: Bilineare RNNs mit Closed-Loop-Steuerung verbessern Comba: 改进有闭环控制的双线区域网网 2506.02475v2

Authors (8): Jiaxi Hu, Yongqi Pan, Jusen Du, Disen Lan, Xiaqiang Tang, Qingsong Wen, Yuxuan Liang, Weigao Sun

Recent efficient sequence modeling methods such as Gated DeltaNet, TTT, and RWKV-7 have achieved performance improvements by supervising the recurrent memory management through the Delta learning rule. Unlike previous state-space models (e.g., Mamba) and gated linear attentions (e.g., GLA), these models introduce interactions between the recurrent state and the key vector, structurally resembling bilinear systems. In this paper, we first introduce the concept of Bilinear RNNs with a comprehensive analysis of the advantages and limitations of these models. Then, based on closed-loop control theory, we propose a novel Bilinear RNN variant named Comba, which adopts a scalar-plus-low-rank state transition, with both state feedback and output feedback corrections. We also implement a hardware-efficient chunk-wise parallel kernel in Triton and train models with 340M/1.3B parameters on a large-scale corpus. Comba demonstrates superior performance and computation efficiency in both language and vision modeling.
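
The interaction between the recurrent state and the key vector can be seen in the generic Delta learning rule, S_t = S_{t-1} + β (v_t − S_{t-1} k_t) k_tᵀ, where the term S_{t-1} k_t k_tᵀ multiplies the state by the input. The sketch below implements only this textbook update in plain Python; it is not Comba's scalar-plus-low-rank transition or its feedback corrections, and the function names are assumptions for this example.

```python
def read(state, query):
    """Read from the memory: return state @ query."""
    return [sum(s * q for s, q in zip(row, query)) for row in state]


def delta_rule_update(state, key, value, beta):
    """One Delta-rule step on a d_v x d_k memory matrix.

    The memory's current prediction for `key` is compared with `value`,
    and the error is written back along the key direction.
    """
    pred = read(state, key)
    return [
        [s + beta * (v - p) * k for s, k in zip(row, key)]
        for row, v, p in zip(state, value, pred)
    ]


# Hypothetical toy memory: write one association, then overwrite it.
state = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]  # unit-norm key
state = delta_rule_update(state, key, [2.0, 3.0], beta=1.0)
first_read = read(state, key)
state = delta_rule_update(state, key, [5.0, 7.0], beta=1.0)
second_read = read(state, key)
```

With a unit-norm key and β = 1, the second write replaces the stored value rather than adding to it, which is the memory-management behavior that purely additive linear attention lacks.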



Article 1257

Title@2025-06-08 (7): How Does DPO Reduce Toxicity? A Mechanistic Neuron-Level Analysis

Title: How Does DPO Reduce Toxicity? A Mechanistic Neuron-Level Analysis Wie reduziert DPO die Toxizität? Eine mechanistische Neuronen-Level-Analyse DPO 如何减少毒性? 2411.06424v3

Authors (5): Yushi Yang, Filip Sondej, Harry Mayne, Andrew Lee, Adam Mahdi

Safety fine-tuning algorithms reduce harmful outputs in language models, yet their mechanisms remain under-explored. Direct Preference Optimization (DPO) is a popular choice of algorithm, but prior explanations, attributing its effects solely to dampened toxic neurons in the MLP layers, are incomplete. In this study, we analyse four language models (Llama-3.1-8B, Gemma-2-2B, Mistral-7B, GPT-2-Medium) and show that toxic neurons only account for 2.5% to 24% of DPO’s effects across models. Instead, DPO balances distributed activation shifts across all MLP neurons to create a net toxicity reduction. We attribute this reduction to four neuron groups, two aligned with reducing toxicity and two promoting anti-toxicity, whose combined effects replicate DPO across models. To further validate this understanding, we develop an activation editing method mimicking DPO through distributed shifts along a toxicity representation. This method outperforms DPO in reducing toxicity while preserving perplexity, without requiring any weight updates. Our work provides a mechanistic understanding of DPO and introduces an efficient, tuning-free alternative for safety fine-tuning.



Article 1258

Title@2025-06-08 (7): Quantile-Optimal Policy Learning under Unmeasured Confounding

Title: Quantile-Optimal Policy Learning under Unmeasured Confounding Quantile-Optimal Policy Learning unter ungemessener Verwirrung 在无法计量的混乱状态下进行量化最佳政策学习 2506.07140v1

Authors (5): Zhongren Chen, Siyu Chen, Zhengling Qi, Xiaohong Chen, Zhuoran Yang

We study quantile-optimal policy learning where the goal is to find a policy whose reward distribution has the largest $\alpha$-quantile for some $\alpha \in (0, 1)$. We focus on the offline setting whose generating process involves unobserved confounders. Such a problem suffers from three main challenges: (i) nonlinearity of the quantile objective as a functional of the reward distribution, (ii) unobserved confounding, and (iii) insufficient coverage of the offline dataset. To address these challenges, we propose a suite of causal-assisted policy learning methods that provably enjoy strong theoretical guarantees under mild conditions. In particular, to address (i) and (ii), using causal inference tools such as instrumental variables and negative controls, we propose to estimate the quantile objectives by solving nonlinear functional integral equations. Then we adopt a minimax estimation approach with nonparametric models to solve these integral equations, and propose to construct conservative policy estimates that address (iii). The final policy is the one that maximizes these pessimistic estimates. In addition, we propose a novel regularized policy learning method that is more amenable to computation. Finally, we prove that the policies learned by these methods are $\tilde{\mathscr{O}}(n^{-1/2})$ quantile-optimal under a mild coverage assumption on the offline dataset. Here, $\tilde{\mathscr{O}}(\cdot)$ omits poly-logarithmic factors. To the best of our knowledge, we propose the first sample-efficient policy learning algorithms for estimating the quantile-optimal policy in the presence of unmeasured confounding.
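
The difference between a quantile objective and the usual mean objective can be shown with a small plain-Python sketch. It compares two hypothetical policies using fully observed reward samples, so it ignores confounding and coverage entirely; the function name `empirical_quantile` and the reward lists are assumptions for this example.

```python
import math


def empirical_quantile(samples, alpha):
    """Smallest sample r such that at least an alpha fraction of samples are <= r."""
    ordered = sorted(samples)
    index = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[index]


def mean(samples):
    return sum(samples) / len(samples)


# Hypothetical reward samples for two policies.
risky_policy = [0, 0, 10, 10, 10, 10, 10, 10]  # high mean, poor lower tail
steady_policy = [5, 5, 5, 5, 5, 5, 5, 5]       # lower mean, no bad outcomes

alpha = 0.25
risky_q = empirical_quantile(risky_policy, alpha)
steady_q = empirical_quantile(steady_policy, alpha)
```

Here the mean favors the risky policy (7.5 against 5.0) while the 0.25-quantile favors the steady one (5 against 0), so the two objectives can select different policies.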



Article 1259

Title@2025-06-08 (7): Evaluation of Machine Learning Models in Student Academic Performance Prediction

Title: Evaluation of Machine Learning Models in Student Academic Performance Prediction Evaluation von Machine Learning Models in der studentischen Leistungsvorhersage 评价学生学术表现预测中的机器学习模式 2506.08047v1

Authors (2): A. G. R. Sandeepa, Sanka Mohottala

This research investigates the use of machine learning methods to forecast students’ academic performance in a school setting. Student data with behavioral, academic, and demographic details were used in implementations with standard classical machine learning models, including a multi-layer perceptron classifier (MLPC). MLPC obtained a maximum test-set accuracy of 86.46% across all implementations. Under 10-fold cross-validation, MLPC obtained an average accuracy of 79.58% on the test set and 99.65% on the training set. MLPC’s better performance over the other machine learning models strongly suggests the potential of neural networks as data-efficient models. The feature selection approach played a crucial role in improving performance, and multiple evaluation approaches were used in order to compare with existing literature. Explainable machine learning methods were utilized to demystify the black-box models and to validate the feature selection approach.



Article 1260

Title@2025-06-08 (7): Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning

Title: Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning Zuverlässige Kritiker: Monotonische Verbesserung und Konvergenzgarantien für die Stärkung des Lernens 可靠的批评:加强学习的单体改进和融合保障 2506.07134v1

Authors (4): Eshwar S. R., Gugan Thoppe, Aditya Gopalan, Gal Dalal

Despite decades of research, it remains challenging to correctly use Reinforcement Learning (RL) algorithms with function approximation. A prime example is policy iteration, whose fundamental guarantee of monotonic improvement collapses even under linear function approximation. To address this issue, we introduce Reliable Policy Iteration (RPI). It replaces the common projection or Bellman-error minimization during policy evaluation with a Bellman-based constrained optimization. We prove that not only does RPI confer textbook monotonicity on its value estimates but these estimates also lower bound the true return. Also, their limit partially satisfies the unprojected Bellman equation, emphasizing RPI’s natural fit within RL. RPI is the first algorithm with such monotonicity and convergence guarantees under function approximation. For practical use, we provide a model-free variant of RPI that amounts to a novel critic. It can be readily integrated into primary model-free PI implementations such as DQN and DDPG. In classical control tasks, such RPI-enhanced variants consistently maintain their lower-bound guarantee while matching or surpassing the performance of all baseline methods.



Article 1261

Title@2025-06-08 (7): DiC: Rethinking Conv3x3 Designs in Diffusion Models

Title: DiC: Rethinking Conv3x3 Designs in Diffusion Models DiC: Conv3x3-Designs in Diffusionsmodellen neu denken DiC: 重新思考传播模型中的Conv3x3 设计 2501.00603v2

Authors (6): Yuchuan Tian, Jing Han, Chengcheng Wang, Yuchen Liang, Chao Xu, Hanting Chen

Diffusion models have shown exceptional performance in visual generation tasks. Recently, these models have shifted from traditional U-shaped CNN-Attention hybrid structures to fully transformer-based isotropic architectures. While these transformers exhibit strong scalability and performance, their reliance on complicated self-attention operations results in slow inference speeds. In contrast to these works, we rethink one of the simplest yet fastest modules in deep learning, the 3x3 convolution, to construct a scaled-up purely convolutional diffusion model. We first discover that an Encoder-Decoder Hourglass design outperforms scalable isotropic architectures for Conv3x3, but still falls short of our expectations. To further improve the architecture, we introduce sparse skip connections to reduce redundancy and improve scalability. Based on this architecture, we introduce conditioning improvements including stage-specific embeddings, mid-block condition injection, and conditional gating. These improvements lead to our proposed Diffusion CNN (DiC), which serves as a swift yet competitive diffusion architecture baseline. Experiments on various scales and settings show that DiC surpasses existing diffusion transformers by considerable margins in terms of performance while keeping a good speed advantage. Project page: https://github.com/YuchuanTian/DiC



Article 1262

Title@2025-06-08 (7): Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models

Title: Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models Gleichwertige Denoiser können Graphen nicht kopieren: Richten Sie Ihre Graphen-Diffusionsmodelle aus 等式 Denois 无法复制图表: 对齐您的图形扩散模型 2405.17656v2

Authors (5): Najwa Laabid, Severi Rissanen, Markus Heinonen, Arno Solin, Vikas Garg

Graph diffusion models, dominant in graph generative modeling, remain underexplored for graph-to-graph translation tasks like chemical reaction prediction. We demonstrate that standard permutation equivariant denoisers face fundamental limitations in these tasks due to their inability to break symmetries in noisy inputs. To address this, we propose aligning input and target graphs to break input symmetries while preserving permutation equivariance in non-matching graph portions. Using retrosynthesis (i.e., the task of predicting precursors for synthesis of a given target molecule) as our application domain, we show how alignment dramatically improves discrete diffusion model performance from 5% to a SOTA-matching 54.7% top-1 accuracy. Code is available at https://github.com/Aalto-QuML/DiffAlign.



Article 1263

Title@2025-06-08 (7): Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models

Title: Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models Quality-Diversity Red-Teaming: Automatisierte Generierung von hochwertigen und unterschiedlichen Attackern für große Sprachmodelle 质量-多样性红队:为大语言模式自动生成高品质和不同攻击者 2506.07121v1

Authors (8): Ren-Jian Wang, Ke Xue, Zeyu Qin, Ziniu Li, Sheng Tang, Hao-Tian Li, Shengcai Liu, Chao Qian

Ensuring safety of large language models (LLMs) is important. Red teaming–a systematic approach to identifying adversarial prompts that elicit harmful responses from target LLMs–has emerged as a crucial safety evaluation method. Within this framework, the diversity of adversarial prompts is essential for comprehensive safety assessments. We find that previous approaches to red-teaming may suffer from two key limitations. First, they often pursue diversity through simplistic metrics like word frequency or sentence embedding similarity, which may not capture meaningful variation in attack strategies. Second, the common practice of training a single attacker model restricts coverage across potential attack styles and risk categories. This paper introduces Quality-Diversity Red-Teaming (QDRT), a new framework designed to address these limitations. QDRT achieves goal-driven diversity through behavior-conditioned training and implements a behavioral replay buffer in an open-ended manner. Additionally, it trains multiple specialized attackers capable of generating high-quality attacks across diverse styles and risk categories. Our empirical evaluation demonstrates that QDRT generates attacks that are both more diverse and more effective against a wide range of target LLMs, including GPT-2, Llama-3, Gemma-2, and Qwen2.5. This work advances the field of LLM safety by providing a systematic and effective approach to automated red-teaming, ultimately supporting the responsible deployment of LLMs.



Article 1264

Title@2025-06-08 (7): A Hybrid Multi-Factor Network with Dynamic Sequence Modeling for Early Warning of Intraoperative Hypotension

Title: A Hybrid Multi-Factor Network with Dynamic Sequence Modeling for Early Warning of Intraoperative Hypotension Hybrides Multi-Factor-Netzwerk mit dynamischer Sequenzmodellierung zur Frühwarnung von intraoperativer Hypotonie 混合多要素网络,具有动态序列模型模型,以及早警告不合作水分的不合作状态; 2409.11064v4

Authors (4): Mingyue Cheng, Jintao Zhang, Zhiding Liu, Chunli Liu

Intraoperative hypotension (IOH) prediction using past physiological signals is crucial, as IOH may lead to inadequate organ perfusion and significantly elevate the risk of severe complications and mortality. However, current methods often rely on static modeling, overlooking the complex temporal dependencies and the inherently non-stationary nature of physiological signals. We propose a Hybrid Multi-Factor (HMF) network that formulates IOH prediction as a dynamic sequence forecasting task, explicitly capturing both temporal dependencies and physiological non-stationarity. We represent signal dynamics as multivariate time series and decompose them into trend and seasonal components, enabling separate modeling of long-term and periodic variations. Each component is encoded with a patch-based Transformer to balance computational efficiency and feature representation. To address distributional drift from evolving signals, we introduce a symmetric normalization mechanism. Experiments on both public and real-world clinical datasets show that HMF significantly outperforms competitive baselines. We hope HMF offers new insights into IOH prediction and ultimately promotes safer surgical care. Our code is available at https://github.com/Mingyue-Cheng/HMF.
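The trend/seasonal split mentioned above can be illustrated with a minimal sketch. The abstract does not say how HMF performs the decomposition, so the centered moving average used here, along with the names `decompose` and `window`, is an assumption chosen for illustration only.

```python
def decompose(series, window):
    """Split a series into a moving-average trend and a residual component.

    Assumption: a centered moving average stands in for whatever
    decomposition the paper actually uses.
    """
    n = len(series)
    half = window // 2
    trend = []
    for i in range(n):
        lo = max(0, i - half)          # shrink the window at the edges
        hi = min(n, i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    # The residual is everything the smooth trend does not explain.
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
trend, seasonal = decompose(signal, window=3)
```

By construction the two components sum back to the input, so each can be encoded separately without losing information.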



Article 1265

Title@2025-06-08 (7): Conditional Denoising Meets Polynomial Modeling: A Flexible Decoupled Framework for Time Series Forecasting

Title: Conditional Denoising Meets Polynomial Modeling: A Flexible Decoupled Framework for Time Series Forecasting Bedingtes Stören trifft auf Polynommodellierung: Ein flexibles entkoppeltes Framework für die Zeitreihenprognose 满足多面性建模:时间序列预测灵活拆分框架 2410.13253v7

Authors (5): Jintao Zhang, Mingyue Cheng, Xiaoyu Tao, Zhiding Liu, Daoyu Wang

Time series forecasting models are becoming increasingly prevalent due to their critical role in decision-making across various domains. However, most existing approaches represent the coupled temporal patterns, often neglecting the distinction between their specific components. In particular, fluctuating patterns and smooth trends within time series exhibit distinct characteristics. In this work, to model complicated temporal patterns, we propose a Conditional Denoising Polynomial Modeling (CDPM) framework, where probabilistic diffusion models and deterministic linear models are trained end-to-end. Instead of modeling the coupled time series, CDPM decomposes it into trend and seasonal components for modeling them separately. To capture the fluctuating seasonal component, we employ a probabilistic diffusion model based on statistical properties from the historical window. For the smooth trend component, a module is proposed to enhance linear models by incorporating historical dependencies, thereby preserving underlying trends and mitigating noise distortion. Extensive experiments conducted on six benchmarks demonstrate the effectiveness of our framework, highlighting the potential of combining probabilistic and deterministic models. Our code is available at https://github.com/zjt-gpu/CDPM.



Article 1266

Title@2025-06-08 (7): Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings

Title: Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings Auf dem Weg zu einer universellen Offline-Black-Box-Optimierung über Sprachmodell-Embeddings 通过学习语言模型嵌入式嵌入,实现普遍离线黑箱优化 2506.07109v1

Authors (7): Rong-Xi Tan, Ming Chen, Ke Xue, Yao Wang, Yaoyuan Wang, Sheng Fu, Chao Qian

The pursuit of universal black-box optimization (BBO) algorithms is a longstanding goal. However, unlike domains such as language or vision, where scaling structured data has driven generalization, progress in offline BBO remains hindered by the lack of unified representations for heterogeneous numerical spaces. Thus, existing offline BBO approaches are constrained to single-task and fixed-dimensional settings, failing to achieve cross-domain universal optimization. Recent advances in language models (LMs) offer a promising path forward: their embeddings capture latent relationships in a unifying way, making universal optimization across different data types possible. In this paper, we discuss multiple potential approaches, including an end-to-end learning framework in the form of next-token prediction, as well as prioritizing the learning of latent spaces with strong representational capabilities. To validate the effectiveness of these methods, we collect offline BBO tasks and data from open-source academic works for training. Experiments demonstrate the universality and effectiveness of our proposed methods. Our findings suggest that unifying language model priors and learning string embedding space can overcome traditional barriers in universal BBO, paving the way for general-purpose BBO algorithms. The code is provided at https://github.com/lamda-bbo/universal-offline-bbo.



Article 1267

Title@2025-06-08 (7): Filling the Missings: Spatiotemporal Data Imputation by Conditional Diffusion

Title: Filling the Missings: Spatiotemporal Data Imputation by Conditional Diffusion Füllung der fehlenden Daten: Spatiotemporale Daten-Imputation durch Bedingte Diffusion 填补失踪人员:通过有条件的传播来估计时空数据 2506.07099v1

Authors (5): Wenying He, Jieling Huang, Junhua Gu, Ji Zhang, Yude Bai

Missing data in spatiotemporal systems presents a significant challenge for modern applications, ranging from environmental monitoring to urban traffic management. The integrity of spatiotemporal data often deteriorates due to hardware malfunctions and software failures in real-world deployments. Current approaches based on machine learning and deep learning struggle to model the intricate interdependencies between spatial and temporal dimensions effectively and, more importantly, suffer from cumulative errors during the data imputation process, which propagate and amplify through iterations. To address these limitations, we propose CoFILL, a novel Conditional Diffusion Model for spatiotemporal data imputation. CoFILL builds on the inherent advantages of diffusion models to generate high-quality imputations without relying on potentially error-prone prior estimates. It incorporates an innovative dual-stream architecture that processes temporal and frequency domain features in parallel. By fusing these complementary features, CoFILL captures both rapid fluctuations and underlying patterns in the data, which enables more robust imputation. The extensive experiments reveal that CoFILL’s noise prediction network successfully transforms random noise into meaningful values that align with the true data distribution. The results also show that CoFILL outperforms state-of-the-art methods in imputation accuracy. The source code is publicly available at https://github.com/joyHJL/CoFILL.



Article 1268

Title@2025-06-08 (7): A Theoretical Characterization of Optimal Data Augmentations in Self-Supervised Learning

Title: A Theoretical Characterization of Optimal Data Augmentations in Self-Supervised Learning Eine theoretische Charakterisierung von Optimal Data Augmentations im Selbst-überwachten Lernen 自我监督学习中最佳数据增加的理论特征 2411.01767v4

Authors (3): Shlomo Libo Feigin, Maximilian Fleissner, Debarghya Ghoshdastidar

Data augmentations play an important role in the recent success of self-supervised learning (SSL). While augmentations are commonly understood to encode invariances between different views into the learned representations, this interpretation overlooks the impact of the pretraining architecture and suggests that SSL would require diverse augmentations which resemble the data to work well. However, these assumptions do not align with empirical evidence, encouraging further theoretical understanding to guide the principled design of augmentations in new domains. To this end, we use kernel theory to derive analytical expressions for data augmentations that achieve desired target representations after pretraining. We consider non-contrastive and contrastive losses, namely VICReg, Barlow Twins and the Spectral Contrastive Loss, and provide an algorithm to construct such augmentations. Our analysis shows that augmentations need not be similar to the data to learn useful representations, nor be diverse, and that the architecture has a significant impact on the optimal augmentations.



Article 1269

Title@2025-06-08 (7): Epidemic-guided deep learning for spatiotemporal forecasting of Tuberculosis outbreak

Title: Epidemic-guided deep learning for spatiotemporal forecasting of Tuberculosis outbreak Epidemisch geführtes Deep Learning für die spatiotemporale Vorhersage des Tuberkuloseausbruchs 流行病指导深层学习,以对结核病爆发进行时空预测 2502.10786v2

Authors (4): Madhab Barman, Madhurima Panja, Nachiketa Mishra, Tanujit Chakraborty

Tuberculosis (TB) remains a formidable global health challenge, driven by complex spatiotemporal transmission dynamics and influenced by factors such as population mobility and behavioral changes. We propose an Epidemic-Guided Deep Learning (EGDL) approach that fuses mechanistic epidemiological principles with advanced deep learning techniques to enhance early warning systems and intervention strategies for TB outbreaks. Our framework is built upon a modified networked Susceptible-Infectious-Recovered (MN-SIR) model augmented with a saturated incidence rate and graph Laplacian diffusion, capturing both long-term transmission dynamics and region-specific population mobility patterns. Compartmental model parameters are rigorously estimated using Bayesian inference via the Markov Chain Monte Carlo approach. Theoretical analysis leveraging the comparison principle and Green’s formula establishes global stability properties of the disease-free and endemic equilibria. Building on these epidemiological insights, we design two forecasting architectures, EGDL-Parallel and EGDL-Series, that integrate the mechanistic outputs of the MN-SIR model within deep neural networks. This integration mitigates the overfitting risks commonly encountered in data-driven methods and filters out noise inherent in surveillance data, resulting in reliable forecasts of real-world epidemic trends. Experiments conducted on TB incidence data from 47 prefectures in Japan and 31 provinces in mainland China demonstrate that our approach delivers robust and accurate predictions across multiple time horizons (short to medium-term forecasts), supporting its generalizability across regions with different population dynamics.



Article 1270

Title@2025-06-08 (7): Patient Similarity Computation for Clinical Decision Support: An Efficient Use of Data Transformation, Combining Static and Time Series Data

Title: Patient Similarity Computation for Clinical Decision Support: An Efficient Use of Data Transformation, Combining Static and Time Series Data Patienten-Ähnlichkeitsrechnung für die klinische Entscheidungsunterstützung: Eine effiziente Nutzung von Datentransformationen, die Kombination von statischen und Zeitreihendaten 临床决定支持:有效利用数据转换、静态和时间系列数据 2506.07092v1

Authors (4): Joydeb Kumar Sana, Mohammad M. Masud, M Sohel Rahman, M Saifur Rahman

Patient similarity computation (PSC) is a fundamental problem in healthcare informatics. The aim of PSC is to measure the similarity among patients according to their historical clinical records, which helps to improve clinical decision support. This paper presents a novel distributed patient similarity computation (DPSC) technique based on data transformation (DT) methods, utilizing an effective combination of time series and static data. Time series data are sensor-collected patient information, including metrics such as heart rate, blood pressure, oxygen saturation, and respiration. The static data are mainly patient background and demographic data, including age, weight, height, and gender. The static data are used to cluster the patients. Before the static data are fed to the machine learning model, adaptive Weight-of-Evidence (aWOE) and Z-score data transformation methods are applied, which improves prediction performance. In aWOE-based patient similarity models, sensitive patient information is processed using aWOE, which preserves the data privacy of the trained models. We used the Dynamic Time Warping (DTW) approach, which is robust and very popular, for time series similarity. However, DTW is not suitable for big data due to its significant computational run-time. To overcome this problem, distributed DTW computation is used in this study. For Coronary Artery Disease, our DT-based approach boosts prediction performance by as much as 11.4%, 10.20%, and 12.6% in terms of AUC, accuracy, and F-measure, respectively. In the case of Congestive Heart Failure (CHF), our proposed method achieves performance enhancement of up to 15.9%, 10.5%, and 21.9% for the same measures, respectively. The proposed method reduces the computation time by as much as 40%.
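For readers unfamiliar with DTW, the following is a minimal sketch of the classic dynamic-programming formulation. It is not the distributed implementation described in the abstract; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Assumption: the local cost is the absolute difference between samples.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match with b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats a match with a[i-1]
                                 cost[i - 1][j - 1])  # advance both series
    return cost[n][m]


heart_rate_a = [1.0, 2.0, 3.0]
heart_rate_b = [1.0, 2.0, 2.0, 3.0]   # same shape, stretched in time
distance = dtw_distance(heart_rate_a, heart_rate_b)
```

The quadratic table is filled once per pair of series, which is why pairwise DTW over a large patient cohort becomes expensive and why distributing the pairwise computations is a natural remedy.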



Article 1271

Title@2025-06-08 (7): Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models

Title: Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models Stellen Sie sich vor zu hören: Auditory Knowledge Generation kann ein effektiver Assistent für Sprachmodelle sein 想象一下:产生听力知识可以成为语言模式的有效助理。 2503.16853v2

Authors (3): Suho Yoo, Hyunjong Ok, Jaeho Lee

Language models pretrained on text-only corpora often struggle with tasks that require auditory commonsense knowledge. Previous work addresses this problem by augmenting the language model to retrieve knowledge from external audio databases. This approach has several limitations, such as the potential lack of relevant audio in databases and the high costs associated with constructing the databases. To address these issues, we propose Imagine to Hear, a novel approach that dynamically generates auditory knowledge using generative models. Our framework detects multiple audio-related textual spans from the given prompt and generates corresponding auditory knowledge. We develop several mechanisms to efficiently process multiple auditory knowledge, including a CLAP-based rejection sampler and a language-audio fusion module. Our experiments show that our method achieves state-of-the-art performance on AuditoryBench without relying on external databases, highlighting the effectiveness of our generation-based approach.



Article 1272

Title@2025-06-08 (7): State Entropy Regularization for Robust Reinforcement Learning

Title: State Entropy Regularization for Robust Reinforcement Learning Staatliche Entropie-Regularisierung für robustes Stärkungslernen 强力强化学习的状态元件常规化 2506.07085v1

Authors (6): Uri Koren, Yonatan Ashlag, Mirco Mutti, Esther Derman, Pierre-Luc Bacon, Shie Mannor

State entropy regularization has empirically shown better exploration and sample complexity in reinforcement learning (RL). However, its theoretical guarantees have not been studied. In this paper, we show that state entropy regularization improves robustness to structured and spatially correlated perturbations. These types of variation are common in transfer learning but often overlooked by standard robust RL methods, which typically focus on small, uncorrelated changes. We provide a comprehensive characterization of these robustness properties, including formal guarantees under reward and transition uncertainty, as well as settings where the method performs poorly. Much of our analysis contrasts state entropy with the widely used policy entropy regularization, highlighting their different benefits. Finally, from a practical standpoint, we illustrate that compared with policy entropy, the robustness advantages of state entropy are more sensitive to the number of rollouts used for policy evaluation.
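As a rough illustration of the quantity being regularized, the sketch below estimates the entropy of the empirical state-visitation distribution from rollouts and adds it to an average return. The estimator, the function names, and the weight `lam` are assumptions for illustration; the abstract does not specify how the paper estimates or weights state entropy.

```python
import math
from collections import Counter


def state_entropy(visited_states):
    """Shannon entropy (in nats) of the empirical state-visitation distribution."""
    counts = Counter(visited_states)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def regularized_objective(returns, visited_states, lam=0.1):
    """Average return plus lam times the state entropy (lam is a hypothetical weight)."""
    return sum(returns) / len(returns) + lam * state_entropy(visited_states)


# States visited across a handful of rollouts (toy data).
visits = [0, 1, 2, 3, 0, 1, 2, 3]
objective = regularized_objective([1.0, 0.5], visits, lam=0.1)
```

Because the entropy is computed from visited states, a plug-in estimate like this one depends on how many rollouts are collected, which is consistent with the sensitivity to rollout count noted in the abstract.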



Article 1273

Title@2025-06-08 (7): Inverse Design of Metamaterials with Manufacturing-Guiding Spectrum-to-Structure Conditional Diffusion Model

Title: Inverse Design of Metamaterials with Manufacturing-Guiding Spectrum-to-Structure Conditional Diffusion Model Inverses Design von Metamaterialien mit Fertigungsleit-Spektrum-zu-Struktur Bedingtes Diffusionsmodell 具有 “ 制造-引导光谱到结构有条件扩散模型 “ 的元材料逆向设计 2506.07083v1

Authors (12): Jiawen Li, Jiang Guo, Yuanzhe Li, Zetian Mao, Jiaxing Shen, Tashi Xu, Diptesh Das, Jinming He, Run Hu, Yaerim Lee, Koji Tsuda, Junichiro Shiomi

Metamaterials are artificially engineered structures that manipulate electromagnetic waves, having optical properties absent in natural materials. Recently, machine learning for the inverse design of metamaterials has drawn attention. However, the highly nonlinear relationship between the metamaterial structures and optical behaviour, coupled with fabrication difficulties, poses challenges for using machine learning to design and manufacture complex metamaterials. Herein, we propose a general framework that implements customised spectrum-to-shape and size parameters to address one-to-many metamaterial inverse design problems using conditional diffusion models. Our method exhibits superior spectral prediction accuracy, generates a diverse range of patterns compared to other typical generative models, and offers valuable prior knowledge for manufacturing through the subsequent analysis of the diverse generated results, thereby facilitating the experimental fabrication of metamaterial designs. We demonstrate the efficacy of the proposed method by successfully designing and fabricating a free-form metamaterial with a tailored selective emission spectrum for thermal camouflage applications.



Article 1274

Title@2025-06-08 (7): On the Generalization of Data-Assisted Control in port-Hamiltonian Systems (DAC-pH)

Title: On the Generalization of Data-Assisted Control in port-Hamiltonian Systems (DAC-pH) Zur Generalisierung der datengestützten Steuerung in port-Hamiltonischen Systemen (DAC-pH) 港口-汉堡系统数据辅助控制普遍化(DAC-pH) 2506.07079v1

Authors (2): Mostafa Eslami, Maryam Babazadeh

This paper introduces a hypothetical hybrid control framework for port-Hamiltonian (p$\mathcal{H}$) systems, employing a dynamic decomposition based on Data-Assisted Control (DAC). The system’s evolution is split into two parts with fixed topology: the Right-Hand Side (RHS), an intrinsic Hamiltonian flow handling worst-case parametric uncertainties, and the Left-Hand Side (LHS), a dissipative/input flow addressing both structural and parametric uncertainties. A virtual port variable $\Pi$ serves as the interface between these two components. A nonlinear controller manages the intrinsic Hamiltonian flow, determining a desired port control value $\Pi_c$. Concurrently, Reinforcement Learning (RL) is applied to the dissipative/input flow to learn an agent that provides an optimal policy for mapping $\Pi_c$ to the actual system input. This hybrid approach effectively manages RHS uncertainties while preserving the system’s inherent structure. Key advantages include adjustable performance via LHS controller parameters, enhanced AI explainability and interpretability through the port variable $\Pi$, the ability to guarantee safety and state attainability with hard/soft constraints, reduced complexity in learning hypothesis classes compared to end-to-end solutions, and improved state/parameter estimation using LHS prior knowledge and the system Hamiltonian to address partial observability. The paper details the p$\mathcal{H}$ formulation, derives the decomposition, and presents the modular controller architecture. Beyond design, crucial aspects of stability and robustness analysis and synthesis are investigated, paving the way for deeper theoretical investigations. An application example, a pendulum with nonlinear dynamics, is simulated to demonstrate the approach’s empirical and phenomenological benefits for future research.



Article 1275

Title@2025-06-08 (7): AgentDrug: Utilizing Large Language Models in an Agentic Workflow for Zero-Shot Molecular Optimization

Title: AgentDrug: Utilizing Large Language Models in an Agentic Workflow for Zero-Shot Molecular Optimization AgentDrug: Einsatz großer Sprachmodelle in einem Agentischen Workflow für die Null-Shot-Molekularoptimierung A剂药物:利用大型语言模型进行零热分子优化的制剂工作流程中的大语言模型 2410.13147v9

Authors (3): Khiem Le, Ting Hua, Nitesh V. Chawla

Molecular optimization – modifying a given molecule to improve desired properties – is a fundamental task in drug discovery. While LLMs hold the potential to solve this task using natural language to drive the optimization, straightforward prompting achieves limited accuracy. In this work, we propose AgentDrug, an agentic workflow that leverages LLMs in a structured refinement process to achieve significantly higher accuracy. AgentDrug defines a nested refinement loop: the inner loop uses feedback from cheminformatics toolkits to validate molecular structures, while the outer loop guides the LLM with generic feedback and a gradient-based objective to steer the molecule toward property improvement. We evaluate AgentDrug on benchmarks with both single- and multi-property optimization under loose and strict thresholds. Results demonstrate significant performance gains over previous methods. With Qwen-2.5-3B, AgentDrug improves accuracy by 20.7% (loose) and 16.8% (strict) on six single-property tasks, and by 7.0% (loose) and 5.3% (strict) on eight multi-property tasks. With the larger Qwen-2.5-7B model, AgentDrug further improves accuracy on the six single-property tasks by 28.9% (loose) and 29.0% (strict), and on the eight multi-property tasks by 14.9% (loose) and 13.2% (strict).



Article 1276

Title@2025-06-08 (7): E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models

Title: E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models E-BATS: Effiziente Backpropagation-freie Testzeitanpassung für Sprachstiftungsmodelle EBATS: 语音基础模型的有效后支持-无测试-时间适应 2506.07078v1

Authors (6): Jiaheng Dong, Hong Jia, Soumyajit Chatterjee, Abhirup Ghosh, James Bailey, Ting Dang

Speech Foundation Models encounter significant performance degradation when deployed in real-world scenarios involving acoustic domain shifts, such as background noise and speaker accents. Test-time adaptation (TTA) has recently emerged as a viable strategy to address such domain shifts at inference time without requiring access to source data or labels. However, existing TTA approaches, particularly those relying on backpropagation, are memory-intensive, limiting their applicability in speech tasks and resource-constrained settings. Although backpropagation-free methods offer improved efficiency, existing ones exhibit poor accuracy. This is because they are predominantly developed for vision tasks, which fundamentally differ from speech task formulations, noise characteristics, and model architecture, posing unique transferability challenges. In this paper, we introduce E-BATS, the first Efficient BAckpropagation-free TTA framework designed explicitly for speech foundation models. E-BATS achieves a balance between adaptation effectiveness and memory efficiency through three key components: (i) lightweight prompt adaptation for a forward-pass-based feature alignment, (ii) a multi-scale loss to capture both global (utterance-level) and local distribution shifts (token-level) and (iii) a test-time exponential moving average mechanism for stable adaptation across utterances. Experiments conducted on four noisy speech datasets spanning sixteen acoustic conditions demonstrate consistent improvements, with 4.1%-13.5% accuracy gains over backpropagation-free baselines and 2.0-6.4 times GPU memory savings compared to backpropagation-based methods. By enabling scalable and robust adaptation under acoustic variability, this work paves the way for developing more efficient adaptation approaches for practical speech processing systems in real-world environments.



Article 1277

Title@2025-06-08 (7): Diversity from Human Feedback

Title: Diversity from Human Feedback Vielfalt durch menschliches Feedback 来自人类反馈的多样性 2310.06648v3

Authors (7): Ren-Jian Wang, Ke Xue, Yutong Wang, Peng Yang, Haobo Fu, Qiang Fu, Chao Qian

Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define the diversity measure is a longstanding problem. Many methods rely on expert experience to define a proper behavior space and then obtain the diversity measure, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that the behavior learned by DivHF is much more consistent with human requirements than the one learned by direct data-driven approaches without human feedback, and makes the final solutions more diverse under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.
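To make the role of the behavior descriptor concrete, here is a minimal MAP-Elites sketch on a one-dimensional toy problem. The descriptor here is hand-written, whereas DivHF would supply a learned one; the function names, mutation scale, and the 0.9 mutation probability are all assumptions for illustration and are not taken from the paper or from QDax.

```python
import random


def map_elites(fitness, descriptor, n_bins, iterations, rng):
    """Keep the best solution found in each cell of a discretized behavior space.

    Assumptions: solutions are floats in [0, 1] and descriptor(x) is also in [0, 1].
    """
    archive = {}  # cell index -> (fitness, solution)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate a randomly chosen elite.
            _, parent = rng.choice(list(archive.values()))
            x = min(1.0, max(0.0, parent + rng.gauss(0.0, 0.1)))
        else:
            # Occasionally sample a fresh random solution.
            x = rng.random()
        cell = min(n_bins - 1, int(descriptor(x) * n_bins))
        f = fitness(x)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)
    return archive


archive = map_elites(
    fitness=lambda x: -(x - 0.5) ** 2,   # toy quality measure
    descriptor=lambda x: x,              # stand-in for a learned descriptor
    n_bins=10,
    iterations=500,
    rng=random.Random(0),
)
```

Swapping in a different `descriptor` changes which solutions count as "different", which is the lever that learning the descriptor from human feedback is meant to control.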



Article 1278

Title@2025-06-08 (7): Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization

Title: Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization Beschleunigung der 3D Gaussian Splatting mit neuraler Sortierung und achsenorientierter Rasterisierung 加速 3D Gaussian 与神经排序和轴偏向光化加速 3D Gaussian 喷洒 2506.07069v1

Authors (10): Zhican Wang, Guanghui He, Dantong Liu, Lingjun Gao, Shell Xu Hu, Chen Zhang, Zhuoran Song, Nicholas Lane, Wayne Luk, Hongxiang Fan

3D Gaussian Splatting (3DGS) has recently gained significant attention for high-quality and efficient view synthesis, making it widely adopted in fields such as AR/VR, robotics, and autonomous driving. Despite its impressive algorithmic performance, real-time rendering on resource-constrained devices remains a major challenge due to tight power and area budgets. This paper presents an architecture-algorithm co-design to address these inefficiencies. First, we reveal substantial redundancy caused by repeated computation of common terms/expressions during the conventional rasterization. To resolve this, we propose axis-oriented rasterization, which pre-computes and reuses shared terms along both the X and Y axes through a dedicated hardware design, effectively reducing multiply-and-add (MAC) operations by up to 63%. Second, by identifying the resource and performance inefficiency of the sorting process, we introduce a novel neural sorting approach that predicts order-independent blending weights using an efficient neural network, eliminating the need for costly hardware sorters. A dedicated training framework is also proposed to improve its algorithmic stability. Third, to uniformly support rasterization and neural network inference, we design an efficient reconfigurable processing array that maximizes hardware utilization and throughput. Furthermore, we introduce a $\pi$-trajectory tile schedule, inspired by Morton encoding and Hilbert curve, to optimize Gaussian reuse and reduce memory access overhead. Comprehensive experiments demonstrate that the proposed design preserves rendering quality while achieving a speedup of $23.4\sim27.8\times$ and energy savings of $28.8\sim51.4\times$ compared to edge GPUs for real-world scenes. We plan to open-source our design to foster further development in this field.



Article 1279

Title@2025-06-08 (7): Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems

Title: Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems Stabile Derivative Freie Gaussian Mischung Variationelle Schlussfolgerung für Bayesian Inverse Probleme Bayes 反问题中稳定衍生自由高斯混集变异推论 2501.04259v3

Authors (5): Baojun Che, Yifan Chen, Zhenghao Huan, Daniel Zhengyu Huang, Weijie Wang

This paper is concerned with the approximation of probability distributions known up to normalization constants, with a focus on Bayesian inference for large-scale inverse problems in scientific computing. In this context, key challenges include costly repeated evaluations of forward models, multimodality, and inaccessible gradients for the forward model. To address them, we develop a variational inference framework that combines Fisher-Rao natural gradient with specialized quadrature rules to enable derivative free updates of Gaussian mixture variational families. The resulting method, termed Derivative Free Gaussian Mixture Variational Inference (DF-GMVI), guarantees covariance positivity and affine invariance, offering a stable and efficient framework for approximating complex posterior distributions. The effectiveness of DF-GMVI is demonstrated through numerical experiments on challenging scenarios, including distributions with multiple modes, infinitely many modes, and curved modes in spaces with up to 100 dimensions. The method’s practicality is further demonstrated in a large-scale application, where it successfully recovers the initial conditions of the Navier-Stokes equations from solution data at positive times.



Article 1280

Title@2025-06-08 (7): AdaSplash: Adaptive Sparse Flash Attention

Title: AdaSplash: Adaptive Sparse Flash Attention AdaSplash: Adaptive Sparse Flash Achtung AdaSplash: 适应性散散闪光关注 2502.12082v2

Authors (3): Nuno Gonçalves, Marcos Treviso, André F. T. Martins

The computational cost of softmax-based attention in transformers limits their applicability to long-context tasks. Adaptive sparsity, of which $\alpha$-entmax attention is an example, offers a flexible data-dependent alternative, but existing implementations are inefficient and do not leverage the sparsity to obtain runtime and memory gains. In this work, we propose AdaSplash, which combines the efficiency of GPU-optimized algorithms with the sparsity benefits of $\alpha$-entmax. We first introduce a hybrid Halley-bisection algorithm, resulting in a 7-fold reduction in the number of iterations needed to compute the $\alpha$-entmax transformation. Then, we implement custom Triton kernels to efficiently handle adaptive sparsity. Experiments with RoBERTa and ModernBERT for text classification and single-vector retrieval, along with GPT-2 for language modeling, show that our method achieves substantial improvements in runtime and memory efficiency compared to existing $\alpha$-entmax implementations. It approaches – and in some cases surpasses – the efficiency of highly optimized softmax implementations like FlashAttention-2, enabling long-context training while maintaining strong task performance.
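The $\alpha$-entmax transformation referred to above maps scores $z$ to $p_i = [(\alpha-1) z_i - \tau]_+^{1/(\alpha-1)}$, with the threshold $\tau$ chosen so that the probabilities sum to one. The sketch below finds $\tau$ by plain bisection only; it is not the hybrid Halley-bisection algorithm or the Triton kernels from the paper, and the function name and iteration count are assumptions for illustration.

```python
def entmax_bisect(z, alpha=1.5, iters=50):
    """alpha-entmax via plain bisection on the threshold tau (requires alpha > 1)."""
    assert alpha > 1.0
    scaled = [(alpha - 1.0) * v for v in z]
    exponent = 1.0 / (alpha - 1.0)

    def probs(tau):
        return [max(v - tau, 0.0) ** exponent for v in scaled]

    # At lo the largest entry alone contributes 1, so the sum is >= 1;
    # at hi every entry is <= 1/n, so the sum is <= 1.
    lo = max(scaled) - 1.0
    hi = max(scaled) - (1.0 / len(z)) ** (alpha - 1.0)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(probs(mid)) >= 1.0:
            lo = mid   # tau too small: mass exceeds 1
        else:
            hi = mid   # tau too large: mass below 1
    p = probs((lo + hi) / 2.0)
    total = sum(p)
    return [v / total for v in p]   # remove residual bisection error


sparse = entmax_bisect([3.0, 0.0, -3.0], alpha=1.5)
```

Low-scoring entries receive exactly zero probability, which is the sparsity that an efficient kernel can exploit by skipping work. Each bisection step halves the interval, so reducing the iteration count is where a faster root finder would pay off.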



Article 1281

Title@2025-06-08 (7): D2R: dual regularization loss with collaborative adversarial generation for model robustness

Title: D2R: dual regularization loss with collaborative adversarial generation for model robustness D2R: Dualer Regularisierungsverlust mit kollaborativer konversatorischer Generation für Modell Robustheit D2R:双重身份正规化损失,为示范性稳健而协作性对抗性一代人 2506.07056v1

Authors (6): Zhenyu Liu, Huizhi Liang, Rajiv Ranjan, Zhanxing Zhu, Vaclav Snasel, Varun Ojha

The robustness of Deep Neural Network models is crucial for defending models against adversarial attacks. Recent defense methods have employed collaborative learning frameworks to enhance model robustness. Two key limitations of existing methods are (i) insufficient guidance of the target model via loss functions and (ii) non-collaborative adversarial generation. We therefore propose a dual regularization loss (D2R Loss) method and a collaborative adversarial generation (CAG) strategy for adversarial training. D2R loss includes two optimization steps. The adversarial distribution and clean distribution optimizations enhance the target model’s robustness by leveraging the strengths of different loss functions obtained via a suitable function space exploration to focus more precisely on the target model’s distribution. CAG generates adversarial samples using a gradient-based collaboration between guidance and target models. We conducted extensive experiments on three benchmark datasets (CIFAR-10, CIFAR-100, and Tiny ImageNet) and two popular target models (WideResNet34-10 and PreActResNet18). Our results show that D2R loss with CAG produces highly robust models.



Article 1282

Title@2025-06-08 (7): Policy Gradient with Tree Search: Avoiding Local Optimas through Lookahead

Title: Policy Gradient with Tree Search: Avoiding Local Optimas through Lookahead Policy Gradient mit Baumsuche: Lokale Optimas durch Lookahead vermeiden 与树搜索的政策渐变:通过 Lookahead 避免本地 Oppima 2506.07054v1

Authors (6): Uri Koren, Navdeep Kumar, Uri Gadot, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

Classical policy gradient (PG) methods in reinforcement learning frequently converge to suboptimal local optima, a challenge exacerbated in large or complex environments. This work investigates Policy Gradient with Tree Search (PGTS), an approach that integrates an $m$-step lookahead mechanism to enhance policy optimization. We provide theoretical analysis demonstrating that increasing the tree search depth $m$ monotonically reduces the set of undesirable stationary points and, consequently, improves the worst-case performance of any resulting stationary policy. Critically, our analysis accommodates practical scenarios where policy updates are restricted to states visited by the current policy, rather than requiring updates across the entire state space. Empirical evaluations on diverse MDP structures, including Ladder, Tightrope, and Gridworld environments, illustrate PGTS’s ability to exhibit “farsightedness,” navigate challenging reward landscapes, escape local traps where standard PG fails, and achieve superior solutions.
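
The effect of the lookahead depth $m$ can be illustrated on a toy deterministic MDP. The sketch below covers only the tree-search part (an exhaustive $m$-step lookahead), not the policy gradient update of PGTS; the MDP `TOY_MDP`, the discount factor, and the function names are hypothetical.

```python
# Toy deterministic MDP: state -> {action: (next_state, reward)}.
# State 1 is a "trap" with a small immediate reward; state 3 is the real goal.
TOY_MDP = {
    0: {"left": (1, 1.0), "right": (2, 0.0)},
    1: {},                      # terminal
    2: {"go": (3, 10.0)},
    3: {},                      # terminal
}


def lookahead_value(mdp, state, depth, gamma=0.9):
    """Best discounted return reachable from `state` within `depth` steps."""
    actions = mdp.get(state, {})
    if depth == 0 or not actions:
        return 0.0
    return max(
        reward + gamma * lookahead_value(mdp, nxt, depth - 1, gamma)
        for nxt, reward in actions.values()
    )


def best_action(mdp, state, depth, gamma=0.9):
    """Action maximising the depth-step lookahead value from `state`."""
    actions = mdp[state]
    return max(
        actions,
        key=lambda a: actions[a][1]
        + gamma * lookahead_value(mdp, actions[a][0], depth - 1, gamma),
    )
```

With depth 1 the greedy choice from state 0 is the trap (`left`), while depth 2 already sees the larger delayed reward and picks `right`.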



Article 1283

Title@2025-06-08 (7): Hyperbolic recurrent neural network as the first type of non-Euclidean neural quantum state ansatz

Title: Hyperbolic recurrent neural network as the first type of non-Euclidean neural quantum state ansatz Hyperbolisches rezidivierendes neuronales Netzwerk als erste Art von nicht-euklidischen neuronalen Quantenzustandsansatz 超双曲经常性神经网络,作为第一种非欧洲的神经量子状态 ansatz 2505.22083v2

Authors (1): H. L. Dao

In this work, we introduce the first type of non-Euclidean neural quantum state (NQS) ansatz, in the form of the hyperbolic GRU (a variant of recurrent neural networks (RNNs)), to be used in the Variational Monte Carlo method of approximating the ground state energy for quantum many-body systems. In particular, we examine the performances of NQS ansatzes constructed from both conventional or Euclidean RNN/GRU and from hyperbolic GRU in the prototypical settings of the one- and two-dimensional transverse field Ising models (TFIM) and the one-dimensional Heisenberg $J_1J_2$ and $J_1J_2J_3$ systems. By virtue of the fact that, for all of the experiments performed in this work, hyperbolic GRU can yield performances comparable to or better than Euclidean RNNs, which have been extensively studied in these settings in the literature, our work is a proof-of-concept for the viability of hyperbolic GRU as the first type of non-Euclidean NQS ansatz for quantum many-body systems. Furthermore, in settings where the Hamiltonian displays a clear hierarchical interaction structure, such as the 1D Heisenberg $J_1J_2$ & $J_1J_2J_3$ systems with the 1st, 2nd and even 3rd nearest neighbor interactions, our results show that hyperbolic GRU definitively outperforms its Euclidean version in all instances. The fact that these results are reminiscent of the established ones from natural language processing where hyperbolic GRU almost always outperforms Euclidean RNNs when the training data exhibit a tree-like or hierarchical structure leads us to hypothesize that hyperbolic GRU NQS ansatz would likely outperform Euclidean RNN/GRU NQS ansatz in quantum spin systems that involve different degrees of nearest neighbor interactions. Finally, with this work, we hope to initiate future studies of other types of non-Euclidean NQS beyond hyperbolic GRU.



Article 1284

Title@2025-06-08 (7): AHSG: Adversarial Attack on High-level Semantics in Graph Neural Networks

Title: AHSG: Adversarial Attack on High-level Semantics in Graph Neural Networks AHSG: Adversarialer Angriff auf hochrangige Semantik in Graphen-Neuralen Netzwerken AHSG:对图形神经网络高层语义的反向攻击 2412.07468v3

Authors (4): Kai Yuan, Jiahao Zhang, Yidi Wang, Xiaobing Pei

Adversarial attacks on Graph Neural Networks aim to perturb the performance of the learner by carefully modifying the graph topology and node attributes. Existing methods achieve attack stealthiness by constraining the modification budget and differences in graph properties. However, these methods typically disrupt task-relevant primary semantics directly, which results in low defensibility and detectability of the attack. In this paper, we propose an Adversarial Attack on High-level Semantics for Graph Neural Networks (AHSG), which is a graph structure attack model that ensures the retention of primary semantics. By combining latent representations with shared primary semantics, our model retains detectable attributes and relational patterns of the original graph while leveraging more subtle changes to carry out the attack. Then we use the Projected Gradient Descent algorithm to map the latent representations with attack effects to the adversarial graph. Through experiments on robust graph deep learning models equipped with defense strategies, we demonstrate that AHSG outperforms other state-of-the-art methods in attack effectiveness. Additionally, using Contextual Stochastic Block Models to detect the attacked graph further validates that our method preserves the primary semantics of the graph.



Article 1285

Title@2025-06-08 (7): When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models

Title: When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models Wenn Aufmerksamkeit zusammenbricht: Wie degenerierte Ebenen in LLMs kleinere, stärkere Modelle aktivieren 注意折叠时: LLM 中如何分解层的 LLM 启用更小、 更强模型 2404.08634v3

Authors (4): Sunny Sanyal, Ravid Shwartz-Ziv, Alexandros G. Dimakis, Sujay Sanghavi

Large Language Models (LLMs) rely on the transformer architecture and its self-attention mechanism to deliver strong performance across tasks. However, we uncover a structural inefficiency in standard pre-trained decoder-style LLMs: in many of the deeper layers, attention matrices frequently collapse to near rank-one, single-column patterns. We refer to these underutilized components as lazy layers, which are redundant and computationally inefficient. To address this, we propose Inheritune, a simple and effective training recipe for building smaller, more efficient, and high performing language models. Inheritune initializes a compact model by inheriting the useful early layers from a larger pre-trained model, then progressively retrains and expands it. Our experiments across multiple models and datasets show that Inheritune trained models, despite having significantly fewer layers, can match or even outperform their larger counterparts. This approach yields compact, performant models and offers a practical path for efficient language model compression. Code is available at https://github.com/sanyalsunny111/LLM-Inheritune
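
One crude way to flag the single-column pattern described above is to measure how much attention mass the most-attended key position receives, averaged over queries. This is only an illustrative proxy; the abstract does not state the criterion the authors use, and the function name `dominant_column_mass` is an assumption.

```python
def dominant_column_mass(attn):
    """Mean attention mass on the most-attended key position.

    `attn` is a row-stochastic matrix given as a list of rows
    (one row per query token). Values close to 1.0 mean that nearly
    every query attends to the same single column.
    """
    n_rows, n_cols = len(attn), len(attn[0])
    col_means = [sum(row[j] for row in attn) / n_rows for j in range(n_cols)]
    return max(col_means)


collapsed = [[1.00, 0.00, 0.00],
             [0.98, 0.02, 0.00],
             [0.97, 0.02, 0.01]]
spread = [[1.0, 0.0, 0.0],
          [0.5, 0.5, 0.0],
          [1 / 3, 1 / 3, 1 / 3]]
```

Here `collapsed` scores about 0.98 and `spread` about 0.61, so a threshold on this quantity separates the two cases in this toy example.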



Article 1286

Title@2025-06-08 (7): FairPFN: A Tabular Foundation Model for Causal Fairness

Title: FairPFN: A Tabular Foundation Model for Causal Fairness FairPFN: Ein Tabular Foundation Modell für Causal Fairness FairPPFN: 因果关系公平性表示基金会模型 2506.07049v1

Authors (5): Jake Robertson, Noah Hollmann, Samuel Müller, Noor Awad, Frank Hutter

Machine learning (ML) systems are utilized in critical sectors, such as healthcare, law enforcement, and finance. However, these systems are often trained on historical data that contains demographic biases, leading to ML decisions that perpetuate or exacerbate existing social inequalities. Causal fairness provides a transparent, human-in-the-loop framework to mitigate algorithmic discrimination, aligning closely with legal doctrines of direct and indirect discrimination. However, current causal fairness frameworks hold a key limitation in that they assume prior knowledge of the correct causal model, restricting their applicability in complex fairness scenarios where causal models are unknown or difficult to identify. To bridge this gap, we propose FairPFN, a tabular foundation model pre-trained on synthetic causal fairness data to identify and mitigate the causal effects of protected attributes in its predictions. FairPFN’s key contribution is that it requires no knowledge of the causal model and still demonstrates strong performance in identifying and removing protected causal effects across a diverse set of hand-crafted and real-world scenarios relative to robust baseline methods. FairPFN paves the way for promising future research, making causal fairness more accessible to a wider variety of complex fairness problems.



Article 1287

Title@2025-06-08 (7): SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

Title: SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration SalbeiAchtung: Genaue 8-Bit-Achtung für Plug-and-Play-Inferenzbeschleunigung SageAttention: 精确注意8-Bit 加速插管和玩耍推论加速 2410.02367v8

Authors (6): Jintao Zhang, Jia Wei, Haofeng Huang, Pengle Zhang, Jun Zhu, Jianfei Chen

The transformer architecture predominates across various models. As the heart of the transformer, attention has a computational complexity of $O(N^2)$, compared to $O(N)$ for linear transformations. When handling large sequence lengths, attention becomes the primary time-consuming component. Although quantization has proven to be an effective method for accelerating model inference, existing quantization methods primarily focus on optimizing the linear layer. In response, we first analyze the feasibility of quantization in attention in detail. Following that, we propose SageAttention, a highly efficient and accurate quantization method for attention. The OPS (operations per second) of our approach outperforms FlashAttention2 and xformers by about 2.1 times and 2.7 times, respectively. SageAttention also achieves superior accuracy over FlashAttention3. Comprehensive experiments confirm that our approach incurs almost no end-to-end metrics loss across diverse models, including those for large language processing, image generation, and video generation. The code is available at https://github.com/thu-ml/SageAttention.
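
The basic idea of computing an attention score in 8-bit integers can be sketched as follows: quantize a query row and a key row with a symmetric per-vector scale, accumulate the product in integers, and rescale. This is a generic INT8 sketch, not SageAttention's actual scheme (its granularity and accuracy-preserving techniques are not detailed in the abstract); the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric quantisation of a float vector to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    ints = [max(-127, min(127, round(v / scale))) for v in values]
    return ints, scale


def int8_dot(q, k):
    """Approximate q . k using integer accumulation plus one float rescale."""
    q_int, q_scale = quantize_int8(q)
    k_int, k_scale = quantize_int8(k)
    acc = sum(a * b for a, b in zip(q_int, k_int))  # pure integer arithmetic
    return acc * q_scale * k_scale


q = [0.5, -1.0, 0.25, 2.0]
k = [1.5, 0.5, -0.75, 1.0]
exact = sum(a * b for a, b in zip(q, k))
approx = int8_dot(q, k)
```

On this toy input the quantized score stays within a few hundredths of the exact value 2.0625; on hardware, the speed-up comes from running the integer accumulation on low-precision matrix units.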



Article 1288

Title@2025-06-08 (7): SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization

Title: SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization SalbeiAchtung2: Effiziente Aufmerksamkeit mit gründlicher Ausreißerglättung und per-thread INT4 Quantisierung SageAttention 2: 高效关注,以完全外流滑动和每英尺INT4 量化 2411.10958v6

Authors (6): Jintao Zhang, Haofeng Huang, Pengle Zhang, Jia Wei, Jun Zhu, Jianfei Chen

Although quantization for linear layers has been widely used, its application to accelerate the attention process remains limited. To further enhance the efficiency of attention computation compared to SageAttention while maintaining precision, we propose SageAttention2, which utilizes significantly faster 4-bit matrix multiplication (Matmul) alongside additional precision-enhancing techniques. First, we propose to quantize matrices $(Q, K)$ to INT4 in a hardware-friendly thread-level granularity and quantize matrices $(\widetilde P, V)$ to FP8. Second, we propose a method to smooth $Q$, enhancing the accuracy of INT4 $QK^\top$. Third, we propose a two-level accumulation strategy for $\widetilde PV$ to enhance the accuracy of FP8 $\widetilde PV$. The operations per second (OPS) of SageAttention2 surpass FlashAttention2 and xformers by about 3x and 4.5x on RTX4090, respectively. Moreover, SageAttention2 matches the speed of FlashAttention3(fp8) on the Hopper GPUs, while delivering much higher accuracy. Comprehensive experiments confirm that our approach incurs negligible end-to-end metrics loss across diverse models, including those for language, image, and video generation. The code is available at https://github.com/thu-ml/SageAttention.



Article 1289

Title@2025-06-08 (7): Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning

Title: Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning Effiziente $Q$-Learning- und Actor-Critic-Methoden für robustes durchschnittliches Reward-Verstärkungs-Lernen 高效益平均奖励强化学习的学习和行为-竞争方法 2506.07040v1

Authors (3): Yang Xu, Swetha Ganesh, Vaneet Aggarwal

We present the first $Q$-learning and actor-critic algorithms for robust average reward Markov Decision Processes (MDPs) with non-asymptotic convergence guarantees under contamination, total variation (TV) distance, and Wasserstein distance uncertainty sets. We show that the robust $Q$ Bellman operator is a strict contraction mapping with respect to a carefully constructed semi-norm in which constant functions are quotiented out. This property supports a stochastic approximation update that learns the optimal robust $Q$ function in $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples. We also show that the same idea can be used for robust $Q$ function estimation, which can be further used for critic estimation. Coupling it with theories in robust policy mirror descent update, we present a natural actor-critic algorithm that attains an $\epsilon$-optimal robust policy in $\tilde{\mathcal{O}}(\epsilon^{-3})$ samples. These results advance the theory of distributionally robust reinforcement learning in the average reward setting.
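
Two ingredients mentioned above have simple textbook forms that can be sketched directly. For a contamination uncertainty set $\{(1-\delta)P + \delta q\}$ with $q$ an arbitrary distribution, the worst-case expected value is $(1-\delta)\,\mathbb{E}_P[V] + \delta \min_s V(s)$. The span semi-norm $\max V - \min V$ is a standard example of a semi-norm that vanishes on constant functions; the paper's own semi-norm is described as carefully constructed and may differ. Function names are assumptions.

```python
def contamination_worst_case(nominal_probs, values, delta):
    """Worst-case expectation of `values` over the delta-contamination set."""
    nominal = sum(p * v for p, v in zip(nominal_probs, values))
    # The adversary puts its delta of mass on the lowest-value state.
    return (1.0 - delta) * nominal + delta * min(values)


def span_seminorm(values):
    """max - min: zero for constant functions, so it ignores additive shifts."""
    return max(values) - min(values)


robust = contamination_worst_case([0.5, 0.5], [1.0, 3.0], delta=0.2)
```

In this example the nominal expectation is 2.0 and the robust one is 1.8; shifting every value by a constant leaves `span_seminorm` unchanged, which is the sense in which constants are quotiented out.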



Article 1290

Title@2025-06-08 (7): BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modeling

Title: BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modeling BRIDGE: Bootstrapping-Text zur Steuerung der Time-Series-Generation über Multi-Agent iterative Optimierung und Diffusionsmodellierung BRIDGE:通过多代理迭代优化和传播模型化控制时间- 系列生成的推进文本 2503.02445v4

Authors (8): Hao Li, Yu-Hao Huang, Chang Xu, Viktor Schlegel, Renhe Jiang, Riza Batista-Navarro, Goran Nenadic, Jiang Bian

Time-series Generation (TSG) is a prominent research area with broad applications in simulations, data augmentation, and counterfactual analysis. While existing methods have shown promise in unconditional single-domain TSG, real-world applications demand cross-domain approaches capable of controlled generation tailored to domain-specific constraints and instance-level requirements. In this paper, we argue that text can provide semantic insights, domain information, and instance-specific temporal patterns to guide and improve TSG. We introduce "Text-Controlled TSG", a task focused on generating realistic time series by incorporating textual descriptions. To address data scarcity in this setting, we propose a novel LLM-based Multi-Agent framework that synthesizes diverse, realistic text-to-TS datasets. Furthermore, we introduce BRIDGE, a hybrid text-controlled TSG framework that integrates semantic prototypes with text description for supporting domain-level guidance. This approach achieves state-of-the-art generation fidelity on 11 of 12 datasets, and improves controllability by up to 12% on MSE and 6% on MAE compared to generation without text input, highlighting its potential for generating tailored time-series data.



Article 1291

Title@2025-06-08 (7): Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond

Title: Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond Learning-at-Criticality in großen Sprachmodellen für Quantenfeldtheorie und darüber hinaus 量子实地理论及以后大语言模型的学习困难 2506.03703v2

Authors (7): Xiansheng Cai, Sihan Hu, Tao Wang, Yuan Huang, Pan Zhang, Youjin Deng, Kun Chen

Fundamental physics often confronts complex symbolic problems with few guiding exemplars or established principles. While artificial intelligence (AI) offers promise, its typical need for vast datasets to learn from hinders its use in these information-scarce frontiers. We introduce learning at criticality (LaC), a reinforcement learning (RL) scheme that tunes Large Language Models (LLMs) to a sharp learning transition, addressing this information scarcity. At this transition, LLMs achieve peak generalization from minimal data, exemplified by 7-digit base-7 addition – a test of nontrivial arithmetic reasoning. To elucidate this peak, we analyze a minimal concept-network model (CoNet) designed to capture the essence of how LLMs might link tokens. Trained on a single exemplar, this model also undergoes a sharp learning transition. This transition exhibits hallmarks of a second-order phase transition, notably power-law distributed solution path lengths. At this critical point, the system maximizes a "critical thinking pattern" crucial for generalization, enabled by the underlying scale-free exploration. This suggests LLMs reach peak performance by operating at criticality, where such explorative dynamics enable the extraction of underlying operational rules. We demonstrate LaC in quantum field theory: an 8B-parameter LLM, tuned to its critical point by LaC using a few exemplars of symbolic Matsubara sums, solves unseen, higher-order problems, significantly outperforming far larger models. LaC thus leverages critical phenomena, a physical principle, to empower AI for complex, data-sparse challenges in fundamental physics.



Article 1292

Title@2025-06-08 (7): Mixture Experts with Test-Time Self-Supervised Aggregation for Tabular Imbalanced Regression

Title: Mixture Experts with Test-Time Self-Supervised Aggregation for Tabular Imbalanced Regression Mixtur-Experten mit selbstüberwachter Aggregation für tabellarische Imbalanced Regression 用于制表平衡回归的测试时自我监督集成混合专家 2506.07033v1

Authors (4): Yung-Chien Wang, Kuang-Da Wang, Wei-Yao Wang, Wen-Chih Peng

Tabular data serve as a fundamental and ubiquitous representation of structured information in numerous real-world applications, e.g., finance and urban planning. In the realm of tabular imbalanced applications, data imbalance has been investigated in classification tasks with insufficient instances in certain labels, causing the model’s ineffective generalizability. However, the imbalance issue of tabular regression tasks is underexplored, and yet is critical due to unclear boundaries for continuous labels and simplifying assumptions in existing imbalance regression work, which often rely on known and balanced test distributions. Such assumptions may not hold in practice and can lead to performance degradation. To address these issues, we propose MATI: Mixture Experts with Test-Time Self-Supervised Aggregation for Tabular Imbalance Regression, featuring two key innovations: (i) the Region-Aware Mixture Expert, which adopts a Gaussian Mixture Model to capture the underlying related regions. The statistical information of each Gaussian component is then used to synthesize and train region-specific experts to capture the unique characteristics of their respective regions. (ii) Test-Time Self-Supervised Expert Aggregation, which dynamically adjusts region expert weights based on test data features to reinforce expert adaptation across varying test distributions. We evaluated MATI on four real-world tabular imbalance regression datasets, including house pricing, bike sharing, and age prediction. To reflect realistic deployment scenarios, we adopted three types of test distributions: a balanced distribution with uniform target frequencies, a normal distribution that follows the training data, and an inverse distribution that emphasizes rare target regions. On average across these three test distributions, MATI achieved a 7.1% improvement in MAE compared to existing methods.



Article 1293

Title@2025-06-08 (7): AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint

Title: AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint AlphaSteer:学习使用有原则的Null-空间限制的拒绝指导 2506.07022v1

Authors (9): Leheng Sheng, Changshuo Shen, Weixiang Zhao, Junfeng Fang, Xiaohao Liu, Zhenkai Liang, Xiang Wang, An Zhang, Tat-Seng Chua

As LLMs are increasingly deployed in real-world applications, ensuring their ability to refuse malicious prompts, especially jailbreak attacks, is essential for safe and reliable use. Recently, activation steering has emerged as an effective approach for enhancing LLM safety by adding a refusal direction vector to internal activations of LLMs during inference, which will further induce the refusal behaviors of LLMs. However, indiscriminately applying activation steering fundamentally suffers from the trade-off between safety and utility, since the same steering vector can also lead to over-refusal and degraded performance on benign prompts. Although prior efforts, such as vector calibration and conditional steering, have attempted to mitigate this trade-off, their lack of theoretical grounding limits their robustness and effectiveness. To better address the trade-off between safety and utility, we present a theoretically grounded and empirically effective activation steering method called AlphaSteer. Specifically, it considers activation steering as a learnable process with two principled learning objectives: utility preservation and safety enhancement. For utility preservation, it learns to construct a nearly zero vector for steering benign data, with the null-space constraints. For safety enhancement, it learns to construct a refusal direction vector for steering malicious data, with the help of linear regression. Experiments across multiple jailbreak attacks and utility benchmarks demonstrate the effectiveness of AlphaSteer, which significantly improves the safety of LLMs without compromising general capabilities. Our codes are available at https://github.com/AlphaLab-USTC/AlphaSteer.
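
The null-space idea behind utility preservation can be sketched in a few lines: if an activation is first projected onto the orthogonal complement of the span of benign activations, benign inputs yield a (near) zero vector, so any linear steering map applied afterwards leaves them untouched. This is a simplified illustration using Gram-Schmidt on tiny vectors, not AlphaSteer's learned construction; all function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def remove_components(vector, basis):
    """Subtract from `vector` its projections onto an orthonormal `basis`."""
    w = list(vector)
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return w


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = remove_components(v, basis)
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


# Hypothetical benign activations spanning the x-y plane of a 3-d space.
benign_basis = orthonormal_basis([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
benign_out = remove_components([2.0, -3.0, 0.0], benign_basis)
other_out = remove_components([1.0, 1.0, 5.0], benign_basis)
```

The benign activation maps to the zero vector, while the activation with a component outside the benign subspace keeps that component, which is the part a steering map could act on.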



Article 1294

Title@2025-06-08 (7): LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models

Title: LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models LEANCODE: Modelle besser verstehen für Code-Vereinfachung von vortrainierten großen Sprachmodellen LEANCODE: 更好地理解模式,以更好地简化培训前大语言模式的守则 2505.14759v3

Authors (5): Yan Wang, Ling Ding, Tien N Nguyen, Shaohua Wang, Yanan Zheng

Large Language Models for code often entail significant computational complexity, which grows significantly with the length of the input code sequence. We propose LeanCode for code simplification to reduce training and prediction time, using attention scores computed within code contexts to represent the tokens’ importance. We advocate for the selective removal of tokens based on the average context-aware attention scores rather than average scores across all inputs. LeanCode uses the attention scores of 'CLS' tokens within the encoder for classification tasks, such as code search. It also employs the encoder-decoder attention scores to determine token significance for sequence-to-sequence tasks like code summarization. Our evaluation shows LeanCode’s superiority over the state-of-the-art methods DietCode and Slimcode, with improvements of 60% and 16% for code search, and 29% and 27% for code summarization, respectively.
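
The pruning step itself is easy to picture: given one importance score per token, keep the highest-scoring tokens and preserve their original order. The sketch below assumes the scores are already available (for example, averaged attention weights); how LeanCode computes context-aware scores is not reproduced here, and the function name and example scores are made up for illustration.

```python
def prune_tokens(tokens, scores, n_keep):
    """Keep the `n_keep` highest-scoring tokens, in their original order."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # restore source order
    return [tokens[i] for i in kept]


tokens = ["int", "x", "=", "y", "+", "1", ";"]
scores = [0.05, 0.30, 0.10, 0.25, 0.20, 0.08, 0.02]  # hypothetical attention scores
simplified = prune_tokens(tokens, scores, n_keep=4)
```

For these made-up scores the result is `["x", "=", "y", "+"]`, a shorter input that still carries the highest-weighted tokens.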



Article 1295

Title@2025-06-08 (7): Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation

Title: Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation Struktur-agnostische Optimalität des doppelt robusten Lernens zur Abschätzung des Behandlungseffekts Doubly 治疗效果估测工具的“治疗效果强力学习”的结构-不可知性优化 2402.14264v4

Authors (2): Jikai Jin, Vasilis Syrgkanis

Average treatment effect estimation is the most central problem in causal inference with application to numerous disciplines. While many estimation strategies have been proposed in the literature, the statistical optimality of these methods has still remained an open area of investigation, especially in regimes where these methods do not achieve parametric rates. In this paper, we adopt the recently introduced structure-agnostic framework of statistical lower bounds, which poses no structural properties on the nuisance functions other than access to black-box estimators that achieve some statistical estimation rate. This framework is particularly appealing when one is only willing to consider estimation strategies that use non-parametric regression and classification oracles as black-box sub-processes. Within this framework, we prove the statistical optimality of the celebrated and widely used doubly robust estimators for both the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT), as well as weighted variants of the former, which arise in policy evaluation.
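
For reference, the standard doubly robust (AIPW) estimator of the ATE combines outcome-model predictions $\mu_0, \mu_1$ with a propensity score $e$: it averages $\mu_1(X) - \mu_0(X) + T\,(Y - \mu_1(X))/e(X) - (1-T)\,(Y - \mu_0(X))/(1 - e(X))$. The sketch below takes the nuisance estimates as given lists, mirroring the black-box view of the structure-agnostic framework; the function name and toy data are assumptions.

```python
def aipw_ate(y, t, mu0, mu1, e):
    """Doubly robust (AIPW) estimate of the average treatment effect.

    y: outcomes, t: 0/1 treatment indicators,
    mu0/mu1: predicted outcomes under control/treatment,
    e: estimated propensity scores P(T=1 | X), strictly between 0 and 1.
    """
    total = 0.0
    for yi, ti, m0, m1, ei in zip(y, t, mu0, mu1, e):
        total += (m1 - m0) \
            + ti * (yi - m1) / ei \
            - (1 - ti) * (yi - m0) / (1.0 - ei)
    return total / len(y)


y = [3.0, 1.0, 4.0, 2.0]
t = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]
# Case 1: outcome models fit the observed arm exactly, so corrections vanish.
ate_model = aipw_ate(y, t, mu0=[2.0, 1.0, 3.0, 2.0], mu1=[3.0, 2.0, 4.0, 3.0], e=e)
# Case 2: trivial outcome models (all zeros), so the estimate reduces to IPW.
ate_ipw = aipw_ate(y, t, mu0=[0.0] * 4, mu1=[0.0] * 4, e=e)
```

The two cases show the two halves of the estimator: with accurate outcome models it returns the modelled effect (1.0 here), and with uninformative outcome models it falls back to inverse propensity weighting (2.0 here).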



Article 1296

Title@2025-06-08 (7): TABLET: Table Structure Recognition using Encoder-only Transformers

Title: TABLET: Table Structure Recognition using Encoder-only Transformers TABELLET: Tabelle Struktur Anerkennung mit Encoder-only Transformers TableT: 使用只使用编码器的变换器确认表格结构 2506.07015v1

Authors (2): Qiyu Hou, Jun Wang

To address the challenges of table structure recognition, we propose a novel Split-Merge-based top-down model optimized for large, densely populated tables. Our approach formulates row and column splitting as sequence labeling tasks, utilizing dual Transformer encoders to capture feature interactions. The merging process is framed as a grid cell classification task, leveraging an additional Transformer encoder to ensure accurate and coherent merging. By eliminating unstable bounding box predictions, our method reduces resolution loss and computational complexity, achieving high accuracy while maintaining fast processing speed. Extensive experiments on FinTabNet and PubTabNet demonstrate the superiority of our model over existing approaches, particularly in real-world applications. Our method offers a robust, scalable, and efficient solution for large-scale table recognition, making it well-suited for industrial deployment.



Article 1297

Title@2025-06-08 (7): Comparison of Lightweight Methods for Vehicle Dynamics-Based Driver Drowsiness Detection

Title: Comparison of Lightweight Methods for Vehicle Dynamics-Based Driver Drowsiness Detection Vergleich von Leichtbaumethoden für Fahrzeugdynamik-basierte Fahrer-Drowsiness-Erkennung 机动机动机动机动机动机动车机动机动车驾驶识别漂泊情况轻型方法比较 2506.07014v1

Authors (3): Yutaro Nakagama, Daisuke Ishii, Kazuki Yoshizoe

Driver drowsiness detection (DDD) prevents road accidents caused by driver fatigue. Vehicle dynamics-based DDD has been proposed as a method that is both economical and high performance. However, there are concerns about the reliability of performance metrics and the reproducibility of many of the existing methods. For instance, some previous studies seem to have a data leakage issue between training and test datasets, and many do not openly provide the datasets they used. To this end, this paper aims to compare the performance of representative vehicle dynamics-based DDD methods under a transparent and fair framework that uses a public dataset. We first develop a framework for extracting features from an open dataset by Aygun et al. and performing DDD with lightweight ML models; the framework is carefully designed to support a variety of configurations. Second, we implement three existing representative methods and a concise random forest (RF)-based method in the framework. Finally, we report the results of experiments to verify the reproducibility and clarify the performance of DDD based on common metrics. Among the evaluated methods, the RF-based method achieved the highest accuracy of 88%. Our findings point to issues inherent in DDD methods developed in a non-standard manner, and demonstrate a high-performance method implemented appropriately.



Article 1298

Title@2025-06-08 (7): Half-AVAE: Adversarial-Enhanced Factorized and Structured Encoder-Free VAE for Underdetermined Independent Component Analysis

Title: Half-AVAE: Adversarial-Enhanced Factorized and Structured Encoder-Free VAE for Underdetermined Independent Component Analysis Half-AVAE: Adversarial-erhöhte faktorisierte und strukturierte Encoder-freie VAE für unterbestimmte unabhängige Komponentenanalyse 半AVAE:用于未确定独立部件分析的反强化加工厂和结构化无编码器 VAE 2506.07011v1

Authors (2): Yuan-Hao Wei, Yan-Jie Sun

This study advances the Variational Autoencoder (VAE) framework by addressing challenges in Independent Component Analysis (ICA) under both determined and underdetermined conditions, focusing on enhancing the independence and interpretability of latent variables. Traditional VAEs map observed data to latent variables and back via an encoder-decoder architecture, but struggle with underdetermined ICA where the number of latent variables exceeds observed signals. The proposed Half Adversarial VAE (Half-AVAE) builds on the encoder-free Half-VAE framework, eliminating explicit inverse mapping to tackle underdetermined scenarios. By integrating adversarial networks and External Enhancement (EE) terms, Half-AVAE promotes mutual independence among latent dimensions, achieving factorized and interpretable representations. Experiments with synthetic signals demonstrate that Half-AVAE outperforms baseline models, including GP-AVAE and Half-VAE, in recovering independent components under underdetermined conditions, as evidenced by lower root mean square errors. The study highlights the flexibility of VAEs in variational inference, showing that encoder omission, combined with adversarial training and structured priors, enables effective solutions for complex ICA tasks, advancing applications in disentanglement, causal inference, and generative modeling.



Article 1299

Title@2025-06-08 (7): When do neural networks learn world models?

Title: When do neural networks learn world models? Wann lernen neuronale Netzwerke Weltmodelle? 神经网络何时学习世界模型? 2502.09297v4

Authors (3): Tianren Zhang, Guanyu Chen, Feng Chen

Humans develop world models that capture the underlying generation process of data. Whether neural networks can learn similar world models remains an open problem. In this work, we present the first theoretical results for this problem, showing that in a multi-task setting, models with a low-degree bias provably recover latent data-generating variables under mild assumptions, even if proxy tasks involve complex, non-linear functions of the latents. However, such recovery is sensitive to model architecture. Our analysis leverages Boolean models of task solutions via the Fourier-Walsh transform and introduces new techniques for analyzing invertible Boolean transforms, which may be of independent interest. We illustrate the algorithmic implications of our results and connect them to related research areas, including self-supervised learning, out-of-distribution generalization, and the linear representation hypothesis in large language models.



Article 1300

Title@2025-06-08 (7): LLM-HDR: Bridging LLM-based Perception and Self-Supervision for Unpaired LDR-to-HDR Image Reconstruction

Title: LLM-HDR: Bridging LLM-based Perception and Self-Supervision for Unpaired LDR-to-HDR Image Reconstruction LLM-HDR: Überbrückung der LLM-basierten Wahrnehmung und Selbstüberwachung für ungepaarte LDR-zu-HDR-Bildrekonstruktion LLM-HDR: 连接以LLM为基础的LLM感知和自我监督的未受重视的LDR到HDR图像重建 2410.15068v3

Authors (6): Hrishav Bakul Barua, Kalin Stefanov, Lemuel Lai En Che, Abhinav Dhall, KokSheik Wong, Ganesh Krishnasamy

The translation of Low Dynamic Range (LDR) to High Dynamic Range (HDR) images is an important computer vision task. There is a significant amount of research utilizing both conventional non-learning methods and modern data-driven approaches, focusing on using both single-exposed and multi-exposed LDR for HDR image reconstruction. However, most current state-of-the-art methods require high-quality paired {LDR,HDR} datasets for model training. In addition, there is limited literature on using unpaired datasets for this task, in which the model learns a mapping between the two domains, i.e., {LDR,HDR}. This paper proposes LLM-HDR, a method that integrates the perception of Large Language Models (LLM) into a modified semantic- and cycle-consistent adversarial architecture that utilizes unpaired {LDR,HDR} datasets for training. The method introduces novel artifact- and exposure-aware generators to address visual artifact removal, and an encoder and loss to address semantic consistency, another under-explored topic. LLM-HDR is the first to use an LLM for the {LDR,HDR} translation task in a self-supervised setup. The method achieves state-of-the-art performance across several benchmark datasets and reconstructs high-quality HDR images. The official website of this work is available at: https://github.com/HrishavBakulBarua/LLM-HDR



Article 1301

Title@2025-06-08 (7): Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts

Title: Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts Feynman-Kac Korrektoren in der Diffusion: Annealing, Anleitung und Produkt von Experten 传播中Feynman-Kac的更正者: 2503.02819v2

Authors (9): Marta Skreta, Tara Akhound-Sadegh, Viktor Ohanesian, Roberto Bondesan, Alán Aspuru-Guzik, Arnaud Doucet, Rob Brekelmans, Alexander Tong, Kirill Neklyudov

While score-based generative models are the model of choice across diverse domains, there are limited tools available for controlling inference-time behavior in a principled manner, e.g. for composing multiple pretrained models. Existing classifier-free guidance methods use a simple heuristic to mix conditional and unconditional scores to approximately sample from conditional distributions. However, such methods do not approximate the intermediate distributions, necessitating additional 'corrector' steps. In this work, we provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We derive a weighted simulation scheme which we call Feynman-Kac Correctors (FKCs) based on the celebrated Feynman-Kac formula by carefully accounting for terms in the appropriate partial differential equations (PDEs). To simulate these PDEs, we propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality. We empirically demonstrate the utility of our methods by proposing amortized sampling via inference-time temperature annealing, improving multi-objective molecule generation using pretrained models, and improving classifier-free guidance for text-to-image generation. Our code is available at https://github.com/martaskrt/fkc-diffusion.
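
The resampling step that an SMC scheme relies on can be illustrated generically. The sketch below is not taken from the FKC code base: it shows plain systematic resampling of particles from log-weights, and the function name `systematic_resample` and its interface are assumptions made for illustration. How the weights themselves are accumulated along the diffusion is specific to the paper and is not reproduced here.

```python
import math
import random


def systematic_resample(log_weights, rng):
    """Return particle indices chosen by systematic resampling.

    log_weights: unnormalised log-weights, one per particle (hypothetical
                 input; in an SMC sampler these come from the weighting scheme).
    rng:         a random.Random instance.
    """
    # Normalise in a numerically stable way (subtract the max log-weight).
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    n = len(w)

    # One uniform draw, then n evenly spaced pointers over the weight CDF.
    u0 = rng.random() / n
    indices, i, cum = [], 0, w[0] / total
    for k in range(n):
        u = u0 + k / n
        while u > cum and i < n - 1:
            i += 1
            cum += w[i] / total
        indices.append(i)
    return indices


# Particles 1 and 3 carry negligible weight, so they are never selected.
picked = systematic_resample([0.0, -1e9, 0.0, -1e9], random.Random(0))
```

Systematic resampling is only one common choice; multinomial or stratified resampling would fit the same slot.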



Article 1302

Title@2025-06-08 (7): Automatic Doubly Robust Forests

Title: Automatic Doubly Robust Forests Automatic Doppelt robuste Wälder Doubly 自动硬森林 2412.07184v2

Authors (4): Zhaomeng Chen, Junting Duan, Victor Chernozhukov, Vasilis Syrgkanis

This paper proposes the automatic Doubly Robust Random Forest (DRRF) algorithm for estimating the conditional expectation of a moment functional in the presence of high-dimensional nuisance functions. DRRF extends the automatic debiasing framework based on the Riesz representer to the conditional setting and enables nonparametric, forest-based estimation (Athey et al., 2019; Oprescu et al., 2019). In contrast to existing methods, DRRF does not require prior knowledge of the form of the debiasing term or impose restrictive parametric or semi-parametric assumptions on the target quantity. Additionally, it is computationally efficient in making predictions at multiple query points. We establish consistency and asymptotic normality results for the DRRF estimator under general assumptions, allowing for the construction of valid confidence intervals. Through extensive simulations in heterogeneous treatment effect (HTE) estimation, we demonstrate the superior performance of DRRF over benchmark approaches in terms of estimation accuracy, robustness, and computational efficiency.



Article 1303

Title@2025-06-08 (7): End-to-End Probabilistic Framework for Learning with Hard Constraints

Title: End-to-End Probabilistic Framework for Learning with Hard Constraints End-to-End-Probabilistisches Rahmenwerk für das Lernen mit harten Einschränkungen 最终到最终的艰苦制约因素学习概率框架 2506.07003v1

Authors (5): Utkarsh Utkarsh, Danielle C. Maddix, Ruijun Ma, Michael W. Mahoney, Yuyang Wang

We present a general purpose probabilistic forecasting framework, ProbHardE2E, to learn systems that can incorporate operational/physical constraints as hard requirements. ProbHardE2E enforces hard constraints by exploiting variance information in a novel way; and thus it is also capable of performing uncertainty quantification (UQ) on the model. Our methodology uses a novel differentiable probabilistic projection layer (DPPL) that can be combined with a wide range of neural network architectures. This DPPL allows the model to learn the system in an end-to-end manner, compared to other approaches where the constraints are satisfied either through a post-processing step or at inference. In addition, ProbHardE2E can optimize a strictly proper scoring rule, without making any distributional assumptions on the target, which enables it to obtain robust distributional estimates (in contrast to existing approaches that generally optimize likelihood-based objectives, which are heavily biased by their distributional assumptions and model choices); and it can incorporate a range of non-linear constraints (increasing the power of modeling and flexibility). We apply ProbHardE2E to problems in learning partial differential equations with uncertainty estimates and to probabilistic time-series forecasting, showcasing it as a broadly applicable general setup that connects these seemingly disparate domains.



Article 1304

Title@2025-06-08 (7): Towards Physics-informed Diffusion for Anomaly Detection in Trajectories

Title: Towards Physics-informed Diffusion for Anomaly Detection in Trajectories Auf dem Weg zu physikinformierter Diffusion für Anomalienerkennung in Trajektorien 走向在物理知情的情况下传播用于轨迹中的异常探测 2506.06999v1

Authors (6): Arun Sharma, Mingzhou Yang, Majid Farhadloo, Subhankar Ghosh, Bharat Jayaprakash, Shashi Shekhar

Given trajectory data, a domain-specific study area, and a user-defined threshold, we aim to find anomalous trajectories indicative of possible GPS spoofing (e.g., fake trajectory). The problem is societally important to curb illegal activities in international waters, such as unauthorized fishing and illicit oil transfers. The problem is challenging due to advances in AI-driven deep fake generation (e.g., additive noise, fake trajectories) and the lack of an adequate number of labeled samples for ground-truth verification. Recent literature shows promising results for anomalous trajectory detection using generative models despite data sparsity. However, these methods do not consider fine-scale spatiotemporal dependencies and prior physical knowledge, resulting in higher false-positive rates. To address these limitations, we propose a physics-informed diffusion model that integrates kinematic constraints to identify trajectories that do not adhere to physical laws. Experimental results on real-world datasets in the maritime and urban domains show that the proposed framework yields higher prediction accuracy and a lower estimation error rate for anomaly detection and trajectory generation methods, respectively. Our implementation is available at https://github.com/arunshar/Physics-Informed-Diffusion-Probabilistic-Model.



Article 1305

Title@2025-06-08 (7): What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding

Title: What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding Was macht Vernunftmodelle anders? Folgen Sie der Vernunftführerin für effiziente Decodierung 是什么让理性模型不同? 跟随高效配方的理性导师 2506.06998v1

Authors (7): Ming Li, Zhengyuan Yang, Xiyao Wang, Dianqi Li, Kevin Lin, Tianyi Zhou, Lijuan Wang

Large reasoning models (LRMs) achieve strong reasoning performance by emitting long chains of thought. Yet, these verbose traces slow down inference and often drift into unnecessary detail, known as the overthinking phenomenon. To better understand LRMs’ behavior, we systematically analyze the token-level misalignment between reasoning and non-reasoning models. While it is expected that their primary difference lies in the stylistic “thinking cues”, LRMs uniquely exhibit two pivotal, previously under-explored phenomena: a Global Misalignment Rebound, where their divergence from non-reasoning models persists or even grows as response length increases, and more critically, a Local Misalignment Diminish, where the misalignment concentrates at the “thinking cues” each sentence starts with but rapidly declines over the remainder of the sentence. Motivated by the Local Misalignment Diminish, we propose FoReaL-Decoding, a collaborative fast-slow thinking decoding method for cost-quality trade-off. In FoReaL-Decoding, a Leading model leads the first few tokens for each sentence, and then a weaker draft model completes the following tokens to the end of each sentence. FoReaL-Decoding adopts a stochastic gate to smoothly interpolate between the small and the large model. On four popular math-reasoning benchmarks (AIME24, GPQA-Diamond, MATH500, AMC23), FoReaL-Decoding reduces theoretical FLOPs by 30 to 50% and trims CoT length by up to 40%, while preserving 86 to 100% of model performance. These results establish FoReaL-Decoding as a simple, plug-and-play route to controllable cost-quality trade-offs in reasoning-centric tasks.
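
The leader/draft handoff described above can be pictured with a toy control loop. This is a sketch under stated assumptions, not the authors' implementation: the models are stand-in callables, sentence boundaries are detected by a literal `"."` token, and the stochastic gate is modelled as a per-sentence coin flip with probability `lead_prob`. All names (`foreal_decode`, `lead_tokens`, `lead_prob`) are hypothetical.

```python
import random


def foreal_decode(leader, draft, prompt, lead_tokens=2, lead_prob=1.0,
                  max_tokens=20, rng=None):
    """Toy leader/draft decoding loop (hypothetical interface).

    leader, draft: callables mapping the token list so far to the next token.
    lead_tokens:   number of tokens the leader emits at each sentence start.
    lead_prob:     gate probability that the leader leads a given sentence.
    Returns the generated tokens and which model ("L" or "D") produced each.
    """
    rng = rng or random.Random(0)
    tokens, sources = list(prompt), []
    pos_in_sentence = 0
    use_leader = rng.random() < lead_prob  # gate for the first sentence
    while len(sources) < max_tokens:
        by_leader = use_leader and pos_in_sentence < lead_tokens
        tok = (leader if by_leader else draft)(tokens)
        tokens.append(tok)
        sources.append("L" if by_leader else "D")
        pos_in_sentence += 1
        if tok == "<eos>":
            break
        if tok == ".":  # sentence boundary: reset and redraw the gate
            pos_in_sentence = 0
            use_leader = rng.random() < lead_prob
    return tokens[len(prompt):], sources


# Stand-in models: the leader emits a "thinking cue", the draft fills in
# content and closes a sentence at every fourth position.
def leader(ctx):
    return "wait"


def draft(ctx):
    return "." if len(ctx) % 4 == 3 else "x"


out, src = foreal_decode(leader, draft, [], lead_tokens=2, max_tokens=8)
# src -> ['L', 'L', 'D', 'D', 'L', 'L', 'D', 'D']
```

Setting `lead_prob=0.0` reduces the loop to draft-only decoding, which is one end of the cost-quality range; how the paper parameterises its gate may differ.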



Article 1306

Title@2025-06-08 (7): Sample Complexity of Diffusion Model Training Without Empirical Risk Minimizer Access

Title: Sample Complexity of Diffusion Model Training Without Empirical Risk Minimizer Access Probenkomplexität des Diffusionsmodelltrainings ohne empirischen Risikominimizer-Zugang 传播示范培训的复杂程度,没有经验风险最小化,没有经验风险最小化机会 2505.18344v2

Authors (5): Mudit Gaur, Prashant Trivedi, Sasidhar Kunapuli, Amrit Singh Bedi, Vaneet Aggarwal

Diffusion models have demonstrated state-of-the-art performance across vision, language, and scientific domains. Despite their empirical success, prior theoretical analyses of the sample complexity suffer from poor scaling with input data dimension or rely on unrealistic assumptions such as access to exact empirical risk minimizers. In this work, we provide a principled analysis of score estimation, establishing a sample complexity bound of $\widetilde{\mathcal{O}}(\epsilon^{-6})$. Our approach leverages a structured decomposition of the score estimation error into statistical, approximation, and optimization errors, enabling us to eliminate the exponential dependence on neural network parameters that arises in prior analyses. It is the first such result which achieves sample complexity bounds without assuming access to the empirical risk minimizer of score function estimation loss.



Article 1307

Title@2025-06-08 (7): STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings

Title: STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings STAMP Ihr Inhalt: Nachweis der Datensatzmitgliedschaft über Watermarked Rephrasings STAMP 您的内容: 通过水标修改方式证明数据集成员 2504.13416v2

Authors (3): Saksham Rastogi, Pratyush Maini, Danish Pruthi

Given how large parts of publicly available text are crawled to pretrain large language models (LLMs), data creators increasingly worry about the inclusion of their proprietary data for model training without attribution or licensing. Their concerns are also shared by benchmark curators whose test-sets might be compromised. In this paper, we present STAMP, a framework for detecting dataset membership, i.e., determining the inclusion of a dataset in the pretraining corpora of LLMs. Given an original piece of content, our proposal involves first generating multiple rephrases, each embedding a watermark with a unique secret key. One version is to be released publicly, while others are to be kept private. Subsequently, creators can compare model likelihoods between public and private versions using paired statistical tests to prove membership. We show that our framework can successfully detect contamination across four benchmarks which appear only once in the training data and constitute less than 0.001% of the total tokens, outperforming several contamination detection and dataset inference baselines. We verify that STAMP preserves both the semantic meaning and utility of the original data. We apply STAMP to two real-world scenarios to confirm the inclusion of paper abstracts and blog articles in the pretraining corpora.
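
The comparison of public and private versions can be made concrete with a paired test statistic. The sketch below assumes a paired t-test on per-document log-likelihoods, which is one standard paired test; the abstract does not say which test STAMP uses, and the function name and example numbers are invented for illustration.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(public_ll, private_ll):
    """Paired t statistic for per-document log-likelihood differences.

    public_ll[i]:  model log-likelihood of the publicly released rephrasing
                   of document i.
    private_ll[i]: mean log-likelihood over the held-back rephrasings of
                   the same document.
    Returns (t, degrees_of_freedom). A large positive t suggests the model
    prefers the public versions, which is the signal of interest.
    """
    diffs = [p - q for p, q in zip(public_ll, private_ll)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Illustrative (made-up) log-likelihoods for four documents.
t_stat, dof = paired_t_statistic(
    [-10.0, -12.0, -9.0, -11.0],
    [-11.0, -12.5, -10.5, -11.5],
)
```

Turning `t_stat` into a p-value requires the Student t distribution with `dof` degrees of freedom, for example from a statistics library.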



Article 1308

Title@2025-06-08 (7): Correcting for Position Bias in Learning to Rank: A Control Function Approach

Title: Correcting for Position Bias in Learning to Rank: A Control Function Approach Korrektur für Position Bias in Learning to Rank: Ein Steuerungsfunktionsansatz 校正学习排名中的比值位置:控制功能方法 2506.06989v1

Authors (3): Md Aminul Islam, Kathryn Vasilaky, Elena Zheleva

Implicit feedback data, such as user clicks, is commonly used in learning-to-rank (LTR) systems because it is easy to collect and it often reflects user preferences. However, this data is prone to various biases, and training an LTR system directly on biased data can result in suboptimal ranking performance. One of the most prominent and well-studied biases in implicit feedback data is position bias, which occurs because users are more likely to interact with higher-ranked documents regardless of their true relevance. In this paper, we propose a novel control function-based method that accounts for position bias in a two-stage process. The first stage uses exogenous variation from the residuals of the ranking process to correct for position bias in the second stage click equation. Unlike previous position bias correction methods, our method does not require knowledge of the click or propensity model and allows for nonlinearity in the underlying ranking model. Moreover, our method is general and allows for debiasing any state-of-the-art ranking algorithm by plugging it into the second stage. We also introduce a technique to debias validation clicks for hyperparameter tuning to select the optimal model in the absence of unbiased validation data. Experimental results demonstrate that our method outperforms state-of-the-art approaches in correcting for position bias.



Article 1309

Title@2025-06-08 (7): Fully Explainable Classification Models Using Hyperblocks

Title: Fully Explainable Classification Models Using Hyperblocks Vollständig erklärbare Klassifikationsmodelle mit Hyperblocks 使用超超分块的完全可解释的分类模型 2506.06986v1

Authors (3): Austin Snyder, Ryan Gallagher, Boris Kovalerchuk

Building on existing work with Hyperblocks, which classify data using minimum and maximum bounds for each attribute, we focus on enhancing interpretability, decreasing training time, and reducing model complexity without sacrificing accuracy. This system allows subject matter experts (SMEs) to directly inspect and understand the model’s decision logic without requiring extensive machine learning expertise. To reduce Hyperblock complexity while retaining performance, we introduce a suite of algorithms for Hyperblock simplification. These include removing redundant attributes, removing redundant blocks through overlap analysis, and creating disjunctive units. These methods eliminate unnecessary parameters, dramatically reducing model size without harming classification power. We increase robustness by introducing an interpretable fallback mechanism using k-Nearest Neighbor (k-NN) classifiers for points not covered by any block, ensuring complete data coverage while preserving model transparency. Our results demonstrate that interpretable models can scale to high-dimensional, large-volume datasets while maintaining competitive accuracy. On benchmark datasets such as WBC (9-D), we achieve strong predictive performance with significantly reduced complexity. On MNIST (784-D), our method continues to improve through tuning and simplification, showing promise as a transparent alternative to black-box models in domains where trust, clarity, and control are crucial.
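
The classification rule described above (per-attribute minimum and maximum bounds, with a k-NN fallback for uncovered points) can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' code: the block representation, the first-match rule, and all names are assumptions.

```python
import math


def in_block(point, block):
    """True if every attribute of `point` lies within the block's bounds."""
    return all(lo <= x <= hi
               for x, lo, hi in zip(point, block["lo"], block["hi"]))


def classify(point, blocks, train, k=3):
    """Classify by hyperblock membership, falling back to k-NN.

    blocks: list of dicts with per-attribute "lo"/"hi" bounds and a "label".
    train:  list of (point, label) pairs used only by the fallback.
    Returns (label, "block") or (label, "knn") so the decision path
    stays inspectable.
    """
    for block in blocks:  # assumption: first matching block wins
        if in_block(point, block):
            return block["label"], "block"
    # Fallback: majority vote among the k nearest training points.
    nearest = sorted(train, key=lambda row: math.dist(point, row[0]))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get), "knn"


blocks = [
    {"lo": (0.0, 0.0), "hi": (1.0, 1.0), "label": "A"},
    {"lo": (2.0, 2.0), "hi": (3.0, 3.0), "label": "B"},
]
train = [((0.5, 0.5), "A"), ((0.2, 0.8), "A"),
         ((2.5, 2.5), "B"), ((2.8, 2.1), "B")]

classify((0.4, 0.6), blocks, train)  # covered by the first block
classify((1.9, 1.9), blocks, train)  # uncovered, decided by k-NN
```

Returning the decision path alongside the label keeps the fallback visible to a subject matter expert reviewing individual predictions.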



Article 1310

Title@2025-06-08 (7): Deep RL Needs Deep Behavior Analysis: Exploring Implicit Planning by Model-Free Agents in Open-Ended Environments

Title: Deep RL Needs Deep Behavior Analysis: Exploring Implicit Planning by Model-Free Agents in Open-Ended Environments Deep RL braucht tiefe Verhaltensanalyse: Implizite Planung durch modellfreie Agenten in offenen Umgebungen erkunden 深 RL 需要深 RL 深度行为分析:探索不限名额环境中无示范工具的隐性规划 2506.06981v1

Authors (8): Riley Simmons-Edler, Ryan P. Badman, Felix Baastad Berg, Raymond Chua, John J. Vastola, Joshua Lunger, William Qian, Kanaka Rajan

Understanding the behavior of deep reinforcement learning (DRL) agents – particularly as task and agent sophistication increase – requires more than simple comparison of reward curves, yet standard methods for behavioral analysis remain underdeveloped in DRL. We apply tools from neuroscience and ethology to study DRL agents in a novel, complex, partially observable environment, ForageWorld, designed to capture key aspects of real-world animal foraging – including sparse, depleting resource patches, predator threats, and spatially extended arenas. We use this environment as a platform for applying joint behavioral and neural analysis to agents, revealing detailed, quantitatively grounded insights into agent strategies, memory, and planning. Contrary to common assumptions, we find that model-free RNN-based DRL agents can exhibit structured, planning-like behavior purely through emergent dynamics – without requiring explicit memory modules or world models. Our results show that studying DRL agents like animals – analyzing them with neuroethology-inspired tools that reveal structure in both behavior and neural dynamics – uncovers rich structure in their learning dynamics that would otherwise remain invisible. We distill these tools into a general analysis framework linking core behavioral and representational features to diagnostic methods, which can be reused for a wide range of tasks and agents. As agents grow more complex and autonomous, bridging neuroscience, cognitive science, and AI will be essential – not just for understanding their behavior, but for ensuring safe alignment and maximizing desirable behaviors that are hard to measure via reward. We show how this can be done by drawing on lessons from how biological intelligence is studied.



Article 1311

Title@2025-06-08 (7): MoXGATE: Modality-aware cross-attention for multi-omic gastrointestinal cancer sub-type classification

Title: MoXGATE: Modality-aware cross-attention for multi-omic gastrointestinal cancer sub-type classification MoXGATE: Modalitätsbewusste Cross-Attention für multi-omere gastrointestinale Tumor-Subtypklassifikation MoXGATE:多工程胃肠胃癌亚型癌症分类模式 – – 意识交叉关注 2506.06980v1

Authors (5): Sajib Acharjee Dip, Uddip Acharjee Shuvo, Dipanwita Mallick, Abrar Rahman Abir, Liqing Zhang

Cancer subtype classification is crucial for personalized treatment and prognostic assessment. However, effectively integrating multi-omic data remains challenging due to the heterogeneous nature of genomic, epigenomic, and transcriptomic features. In this work, we propose Modality-Aware Cross-Attention (MoXGATE), a novel deep-learning framework that leverages cross-attention and learnable modality weights to enhance feature fusion across multiple omics sources. Our approach effectively captures inter-modality dependencies, ensuring robust and interpretable integration. Through experiments on Gastrointestinal Adenocarcinoma (GIAC) and Breast Cancer (BRCA) datasets from TCGA, we demonstrate that MoXGATE outperforms existing methods, achieving 95% classification accuracy. Ablation studies validate the effectiveness of cross-attention over simple concatenation and highlight the importance of different omics modalities. Moreover, our model generalizes well to unseen cancer types, e.g., breast cancer, underscoring its adaptability. Key contributions include (1) a cross-attention-based multi-omic integration framework, (2) modality-weighted fusion for enhanced interpretability, (3) application of focal loss to mitigate data imbalance, and (4) validation across multiple cancer subtypes. Our results indicate that MoXGATE is a promising approach for multi-omic cancer subtype classification, offering improved performance and biological generalizability.



Article 1312

Title@2025-06-08 (7): Turb-L1: Achieving Long-term Turbulence Tracing By Tackling Spectral Bias

Title: Turb-L1: Achieving Long-term Turbulence Tracing By Tackling Spectral Bias Turb-L1: Langfristige Turbulenzen erreichen, die durch das Greifen spektraler Bias verfolgt werden Turb-L1:通过处理光辉双鱼,实现长期动荡追踪 2505.19038v2

Authors (10): Hao Wu, Yuan Gao, Ruiqi Shu, Zean Han, Fan Xu, Zhihong Zhu, Qingsong Wen, Xian Wu, Kun Wang, Xiaomeng Huang

Accurately predicting the long-term evolution of turbulence is crucial for advancing scientific understanding and optimizing engineering applications. However, existing deep learning methods face significant bottlenecks in long-term autoregressive prediction: they exhibit excessive smoothing and fail to accurately track complex fluid dynamics. Our extensive experimental and spectral analysis of prevailing methods provides an interpretable explanation for this shortcoming, identifying Spectral Bias as the core obstacle. Concretely, spectral bias is the inherent tendency of models to favor low-frequency, smooth features while overlooking critical high-frequency details during training, thus reducing fidelity and causing physical distortions in long-term predictions. Building on this insight, we propose Turb-L1, an innovative turbulence prediction method, which utilizes a Hierarchical Dynamics Synthesis mechanism within a multi-grid architecture to explicitly overcome spectral bias. It accurately captures cross-scale interactions and preserves the fidelity of high-frequency dynamics, enabling reliable long-term tracking of turbulence evolution. Extensive experiments on the 2D turbulence benchmark show that Turb-L1 demonstrates excellent performance: (I) In long-term predictions, it reduces Mean Squared Error (MSE) by $80.3\%$ and increases Structural Similarity (SSIM) by over $9\times$ compared to the SOTA baseline, significantly improving prediction fidelity. (II) It effectively overcomes spectral bias, accurately reproducing the full enstrophy spectrum and maintaining physical realism in high-wavenumber regions, thus avoiding the spectral distortions or spurious energy accumulation seen in other methods.



Article 1313

Title@2025-06-08 (7): Near Optimal Non-asymptotic Sample Complexity of 1-Identification

Title: Near Optimal Non-asymptotic Sample Complexity of 1-Identification Nahe Optimale nicht-asymptotische Probenkomplexität der 1-Identifikation 近于最优化的非不消毒样品 复杂度 1 - 识别 2506.06978v1

Authors (2): Zitian Li, Wang Chi Cheung

Motivated by an open direction in existing literature, we study the 1-identification problem, a fundamental multi-armed bandit formulation on pure exploration. The goal is to determine whether there exists an arm whose mean reward is at least a known threshold $\mu_0$, or to output None if the agent believes such an arm does not exist. The agent needs to guarantee its output is correct with probability at least $1-\delta$. Degenne & Koolen (2019) established the asymptotically tight sample complexity for the 1-identification problem, but noted that the non-asymptotic analysis remains unclear. We design a new algorithm, Sequential-Exploration-Exploitation (SEE), and conduct theoretical analysis from the non-asymptotic perspective. Novel to the literature, we achieve near optimality, in the sense of matching upper and lower bounds on the pulling complexity. The gap between the upper and lower bounds is at most a polylogarithmic factor. Numerical results also indicate the effectiveness of our algorithm compared to existing benchmarks.



Article 1314

Title@2025-06-08 (7): Post Reinforcement Learning Inference

Title: Post Reinforcement Learning Inference Rückschlüsse nach der Verstärkung des Lernens 强化后学习的推论 2302.08854v4

Authors (2): Vasilis Syrgkanis, Ruohan Zhan

We consider estimation and inference using data collected from reinforcement learning algorithms. These algorithms, characterized by their adaptive experimentation, interact with individual units over multiple stages, dynamically adjusting their strategies based on previous interactions. Our goal is to evaluate a counterfactual policy post-data collection and estimate structural parameters, like dynamic treatment effects, which can be used for credit assignment and determining the effect of earlier actions on final outcomes. Such parameters of interest can be framed as solutions to moment equations, but not minimizers of a population loss function, leading to Z-estimation approaches for static data. However, in the adaptive data collection environment of reinforcement learning, where algorithms deploy nonstationary behavior policies, standard estimators do not achieve asymptotic normality due to the fluctuating variance. We propose a weighted Z-estimation approach with carefully designed adaptive weights to stabilize the time-varying estimation variance. We identify proper weighting schemes to restore the consistency and asymptotic normality of the weighted Z-estimators for target parameters, which allows for hypothesis testing and constructing uniform confidence regions. Primary applications include dynamic treatment effect estimation and dynamic off-policy evaluation.



Article 1315

Title@2025-06-08 (7): UdonCare: Hierarchy Pruning for Unseen Domain Discovery in Predictive Healthcare

Title: UdonCare: Hierarchy Pruning for Unseen Domain Discovery in Predictive Healthcare UdonCare: Hierarchie Pruning für Unsichtbare Domain Discovery in Predictive Healthcare UdonCare: 预测保健中未见的域域发现 的分级预留机制 2506.06977v1

Authors (4): Pengfei Hu, Xiaoxue Han, Fei Wang, Yue Ning

Domain generalization has become a critical challenge in clinical prediction, where patient cohorts often exhibit shifting data distributions that degrade model performance. Typical domain generalization approaches struggle in real-world healthcare settings for two main reasons: (1) patient-specific domain labels are typically unavailable, making domain discovery especially difficult; (2) purely data-driven approaches overlook key clinical insights, leading to a gap in medical knowledge integration. To address these problems, we leverage hierarchical medical ontologies like the ICD-9-CM hierarchy to group diseases into higher-level categories and discover more flexible latent domains. In this paper, we introduce UdonCare, a hierarchy-guided framework that iteratively prunes fine-grained domains, encodes these refined domains, and applies a Siamese-type inference mechanism to separate domain-related signals from patient-level features. Experimental results on clinical datasets (MIMIC-III and MIMIC-IV) show that the proposed model achieves higher performance than other domain generalization baselines when substantial domain gaps are present, highlighting the untapped potential of medical knowledge for enhancing domain generalization in practical healthcare applications.



Article 1316

Title@2025-06-08 (7): Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning

Title: Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning Lernen, durch verstärktes Lernen durch reward-gewichtete Feintuning zu klären 学习通过通过奖励-加权微调加强学习来澄清问题 2506.06964v1

Authors (9): Subhojyoti Mukherjee, Viet Dac Lai, Raghavendra Addanki, Ryan Rossi, Seunghyun Yoon, Trung Bui, Anup Rao, Jayakumar Subramanian, Branislav Kveton

Question answering (QA) agents automatically answer questions posed in natural language. In this work, we learn to ask clarifying questions in QA agents. The key idea in our method is to simulate conversations that contain clarifying questions and learn from them using reinforcement learning (RL). To make RL practical, we propose and analyze offline RL objectives that can be viewed as reward-weighted supervised fine-tuning (SFT) and easily optimized in large language models. Our work stands in stark contrast to recently proposed methods, based on SFT and direct preference optimization, which have additional hyper-parameters and do not directly optimize rewards. We compare to these methods empirically and report gains in both optimized rewards and language quality.



Article 1317

Title@2025-06-08 (7): Fitting Multilevel Factor Models

Title: Fitting Multilevel Factor Models Passende Multilevel-Faktormodelle 适合多级因数模型 2409.12067v3

Authors (3): Tetiana Parshakova, Trevor Hastie, Stephen Boyd

We examine a special case of the multilevel factor model, with covariance given by a multilevel low rank (MLR) matrix [Parshakova et al., 2023]. We develop a novel, fast implementation of the expectation-maximization algorithm, tailored for multilevel factor models, to maximize the likelihood of the observed data. This method accommodates any hierarchical structure and maintains linear time and storage complexities per iteration. This is achieved through a new efficient technique for computing the inverse of the positive definite MLR matrix. We show that the inverse of a positive definite MLR matrix is also an MLR matrix with the same sparsity in factors, and we use the recursive Sherman-Morrison-Woodbury matrix identity to obtain the factors of the inverse. Additionally, we present an algorithm that computes the Cholesky factorization of an expanded matrix with linear time and space complexities, yielding the covariance matrix as its Schur complement. This paper is accompanied by an open-source package that implements the proposed methods.
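
To see why the Sherman-Morrison-Woodbury identity gives linear-time inverses, it helps to look at the simplest case: a diagonal matrix plus a single rank-one factor. The sketch below covers only that single-level, rank-one special case and is not the recursive multilevel algorithm of the paper; the function name is hypothetical. It uses the identity $(D + ff^\top)^{-1} = D^{-1} - \frac{D^{-1} f f^\top D^{-1}}{1 + f^\top D^{-1} f}$.

```python
def solve_diag_plus_rank1(d, f, b):
    """Solve (diag(d) + f f^T) x = b in O(n) time via Sherman-Morrison.

    d: positive diagonal entries, f: the rank-one factor, b: right-hand side.
    No n-by-n matrix is ever formed.
    """
    dinv_b = [bi / di for bi, di in zip(b, d)]   # D^{-1} b
    dinv_f = [fi / di for fi, di in zip(f, d)]   # D^{-1} f
    denom = 1.0 + sum(fi * gi for fi, gi in zip(f, dinv_f))
    coef = sum(fi * yi for fi, yi in zip(f, dinv_b)) / denom
    return [yi - coef * gi for yi, gi in zip(dinv_b, dinv_f)]


d = [2.0, 3.0, 4.0]
f = [1.0, 1.0, 1.0]
b = [1.0, 2.0, 3.0]
x = solve_diag_plus_rank1(d, f, b)

# Check the residual of (diag(d) + f f^T) x - b without building the matrix.
fx = sum(fi * xi for fi, xi in zip(f, x))
residual = [di * xi + fi * fx - bi for di, xi, fi, bi in zip(d, x, f, b)]
```

With several factors per level, the scalar `denom` becomes a small matrix of size equal to the rank, which is where the Woodbury form of the identity applies.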



Article 1318

Title@2025-06-08 (7): CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration

Title: CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration CE-CoLLM: Effiziente und adaptive Großsprachenmodelle durch Cloud-Edge-Kollaboration CE-COLLM:通过云眼协作建立高效和适应性大语言模型 2411.02829v2

Authors (2): Hongpeng Jin, Yanzhao Wu

Large Language Models (LLMs) exhibit remarkable human-like predictive capabilities. However, it is challenging to deploy LLMs to provide efficient and adaptive inference services at the edge. This paper proposes a novel Cloud-Edge Collaboration framework for LLMs (CE-CoLLM) to tackle these challenges. First, we identify the transmission of LLM contextual data between the cloud and edge as a key performance bottleneck, which introduces substantial communication overhead that dominates overall inference latency and makes naive cloud-edge collaboration for LLMs inefficient. Second, we introduce a suite of novel techniques, including a latency-aware early exit mechanism and efficient cloud context management, into CE-CoLLM, which collectively reduce communication overhead and preserve LLM inference accuracy. Third, we design two adaptive inference modes to accommodate diverse edge environments: (1) a low-latency standalone edge inference mode that enables reliable edge-side independent LLM inference even under unstable network conditions, and (2) a high-accuracy cloud-edge collaborative inference mode that adaptively leverages cloud resources to enhance prediction accuracy. Extensive experiments on multiple benchmark datasets demonstrate that CE-CoLLM reduces overall inference time by up to 13.81% and offloads over 84.53% of the computational workload from the cloud to the edge, compared to conventional cloud-based LLM deployment, without sacrificing prediction accuracy. The code is provided on GitHub at https://github.com/mlsysx/CE-CoLLM.



Article 1319

Title@2025-06-08 (7): Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs

Title: Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs Beyond Self-Repellent Kernels: History-Driven Target Towards Efficient Nonlinear MCMC on General Graphs 超越自反应内核:历史驱动目标,实现高效的非线性非线性通用图形MCMC 2505.18300v2

Authors (3): Jie Hu, Yi-Ting Ma, Do Young Eun

We propose a history-driven target (HDT) framework in Markov Chain Monte Carlo (MCMC) to improve any random walk algorithm on discrete state spaces, such as general undirected graphs, for efficient sampling from a target distribution $\boldsymbol{\mu}$. With broad applications in network science and distributed optimization, recent innovations like the self-repellent random walk (SRRW) achieve near-zero variance by prioritizing under-sampled states through transition kernel modifications based on past visit frequencies. However, SRRW’s reliance on explicit computation of transition probabilities for all neighbors at each step introduces substantial computational overhead, while its strict dependence on time-reversible Markov chains excludes advanced non-reversible MCMC methods. To overcome these limitations, instead of directly modifying the transition kernel, HDT introduces a history-dependent target distribution $\boldsymbol{\pi}[\mathbf{x}]$ to replace the original target $\boldsymbol{\mu}$ in any graph sampler, where $\mathbf{x}$ represents the empirical measure of past visits. This design preserves lightweight implementation by requiring only local information between the current and proposed states and achieves compatibility with both reversible and non-reversible MCMC samplers, while retaining unbiased samples from the target distribution $\boldsymbol{\mu}$ and near-zero variance performance. Extensive experiments in graph sampling demonstrate consistent performance gains, and a memory-efficient Least Recently Used (LRU) cache ensures scalability to large general graphs.



Article 1320

Title@2025-06-08 (7): Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression

Title: Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression Sicherheitsbewusstes Verstärktes Lernen zur Kontrolle über risikosensitive Aktions-Value Iteration und quantitative Regression 通过风险敏感行动增值和量回归,加强安全软件强化学习,以通过风险敏感行动控制 2506.06954v1

Authors (4): Clinton Enwerem, Aniruddh G. Puranic, John S. Baras, Calin Belta

Mainstream approximate action-value iteration reinforcement learning (RL) algorithms suffer from overestimation bias, leading to suboptimal policies in high-variance stochastic environments. Quantile-based action-value iteration methods reduce this bias by learning a distribution of the expected cost-to-go using quantile regression. However, ensuring that the learned policy satisfies safety constraints remains a challenge when these constraints are not explicitly integrated into the RL framework. Existing methods often require complex neural architectures or manual tradeoffs due to combined cost functions. To address this, we propose a risk-regularized quantile-based algorithm integrating Conditional Value-at-Risk (CVaR) to enforce safety without complex architectures. We also provide theoretical guarantees on the contraction properties of the risk-sensitive distributional Bellman operator in Wasserstein space, ensuring convergence to a unique cost distribution. Simulations of a mobile robot in a dynamic reach-avoid task show that our approach leads to more goal successes, fewer collisions, and better safety-performance trade-offs compared to risk-neutral methods.
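
As a rough illustration of how a CVaR term can be read off a quantile-based critic, the sketch below averages the upper tail of a set of quantile estimates of the cost-to-go. It assumes the N estimates sit at midpoint levels (i + 0.5) / N and that larger values mean worse outcomes; both conventions, and the name `cvar_from_quantiles`, are assumptions made here rather than details taken from the paper.

```python
def cvar_from_quantiles(quantiles, alpha):
    """Approximate upper-tail CVaR at level alpha from N quantile estimates.

    The i-th sorted estimate is treated as the quantile at level (i + 0.5) / N.
    CVaR is taken as the mean of all estimates whose level is at least alpha.
    """
    n = len(quantiles)
    ordered = sorted(quantiles)
    tail = [q for i, q in enumerate(ordered) if (i + 0.5) / n >= alpha]
    if not tail:
        # alpha lies beyond the highest level: fall back to the worst estimate.
        tail = [ordered[-1]]
    return sum(tail) / len(tail)


# Illustrative cost quantiles for one state-action pair.
costs = [4.0, 1.0, 3.0, 2.0]
risk_neutral = cvar_from_quantiles(costs, 0.0)   # plain mean of the estimates
risk_averse = cvar_from_quantiles(costs, 0.5)    # mean of the worst half
```

With alpha set to 0 the quantity reduces to the ordinary expected cost, so a single parameter moves the criterion between risk-neutral and risk-averse behaviour.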



Article 1321

Title@2025-06-07 (6): Is Your Training Pipeline Production-Ready? A Case Study in the Healthcare Domain

Title: Is Your Training Pipeline Production-Ready? A Case Study in the Healthcare Domain Ist Ihr Training Pipeline Production-Ready? Eine Fallstudie im Bereich Healthcare 你的训练管道生产-准备? 保健领域案例研究 2506.06946v1

Authors (5): Daniel Lawand, Lucas Quaresma, Roberto Bolgheroni, Alfredo Goldman, Renato Cordeiro Ferreira

Deploying a Machine Learning (ML) training pipeline into production requires robust software engineering practices. This differs significantly from experimental workflows. This experience report investigates this challenge in SPIRA, a project whose goal is to create an ML-Enabled System (MLES) to pre-diagnose respiratory insufficiency via speech analysis. The first version of SPIRA’s training pipeline lacked critical software quality attributes. This paper presents an overview of the MLES, then compares three versions of the architecture of the Continuous Training subsystem, which evolved from a Big Ball of Mud, to a Modular Monolith, towards Microservices, adopting different design principles and patterns to enhance its maintainability, robustness, and extensibility. In this way, the paper seeks to offer insights for both ML Engineers tasked to productionize ML training pipelines and Data Scientists seeking to adopt MLOps practices.



Article 1322

Title@2025-06-07 (6): Polar Hierarchical Mamba: Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences

Title: Polar Hierarchical Mamba: Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences Polar Hierarchical Mamba: Auf dem Weg zur Streaming-LiDAR-Objekterkennung mit Punktwolken als egozentrische Sequenzen 极地分级马姆巴:向以点云为主心序列的分点云星探测星体径流激光雷达物体的方向迈进 2506.06944v1

Authors (3): Mellon M. Zhang, Glen Chou, Saibal Mukhopadhyay

Accurate and efficient object detection is essential for autonomous vehicles, where real-time perception requires low latency and high throughput. LiDAR sensors provide robust depth information, but conventional methods process full 360° scans in a single pass, introducing significant delay. Streaming approaches address this by sequentially processing partial scans in the native polar coordinate system, yet they rely on translation-invariant convolutions that are misaligned with polar geometry – resulting in degraded performance or requiring complex distortion mitigation. Recent Mamba-based state space models (SSMs) have shown promise for LiDAR perception, but only in the full-scan setting, relying on geometric serialization and positional embeddings that are memory-intensive and ill-suited to streaming. We propose Polar Hierarchical Mamba (PHiM), a novel SSM architecture designed for polar-coordinate streaming LiDAR. PHiM uses local bidirectional Mamba blocks for intra-sector spatial encoding and a global forward Mamba for inter-sector temporal modeling, replacing convolutions and positional encodings with distortion-aware, dimensionally-decomposed operations. PHiM sets a new state-of-the-art among streaming detectors on the Waymo Open Dataset, outperforming the previous best by 10% and matching full-scan baselines at twice the throughput. Code will be available at https://github.com/meilongzhang/Polar-Hierarchical-Mamba .



Article 1323

Title@2025-06-07 (6): Conditional Denoising Diffusion for ISAC Enhanced Channel Estimation in Cell-Free 6G

Title: Conditional Denoising Diffusion for ISAC Enhanced Channel Estimation in Cell-Free 6G Bedingte störende Diffusion für ISAC-Verstärkte Kanalabschätzung in zellfreier 6G 以无细胞6G方式为ISAC强化频道估计仪进行有条件的低排放传播 2506.06942v1

Authors (5): Mohammad Farzanullah, Han Zhang, Akram Bin Sediq, Ali Afana, Melike Erol-Kantarci

Cell-free Integrated Sensing and Communication (ISAC) aims to revolutionize 6th Generation (6G) networks. By combining distributed access points with ISAC capabilities, it boosts spectral efficiency, situational awareness, and communication reliability. Channel estimation is a critical step in cell-free ISAC systems to ensure reliable communication, but its performance is usually limited by challenges such as pilot contamination and noisy channel estimates. This paper presents a novel framework leveraging sensing information as a key input within a Conditional Denoising Diffusion Model (CDDM). In this framework, we integrate CDDM with a Multimodal Transformer (MMT) to enhance channel estimation in ISAC-enabled cell-free systems. The MMT encoder effectively captures inter-modal relationships between sensing and location data, enabling the CDDM to iteratively denoise and refine channel estimates. Simulation results demonstrate that the proposed approach achieves significant performance gains. As compared with Least Squares (LS) and Minimum Mean Squared Error (MMSE) estimators, the proposed model achieves normalized mean squared error (NMSE) improvements of 8 dB and 9 dB, respectively. Moreover, we achieve a 27.8% NMSE improvement compared to the traditional denoising diffusion model (TDDM), which does not incorporate sensing channel information. Additionally, the model exhibits higher robustness against pilot contamination and maintains high accuracy under challenging conditions, such as low signal-to-noise ratios (SNRs). According to the simulation results, the model performs well for users near sensing targets by leveraging the correlation between sensing and communication channels.
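
For readers less familiar with the metric, the sketch below shows how NMSE in decibels is conventionally computed and what a gain of a few dB means as a linear error ratio. The channel vectors are made-up values, and the helper names `nmse_db` and `db_gain_to_ratio` are hypothetical.

```python
import math


def nmse_db(h_true, h_est):
    """Normalized mean squared error of a channel estimate, in dB."""
    err = sum(abs(e - t) ** 2 for t, e in zip(h_true, h_est))
    power = sum(abs(t) ** 2 for t in h_true)
    return 10.0 * math.log10(err / power)


def db_gain_to_ratio(gain_db):
    """Convert an NMSE improvement in dB to a linear error-reduction factor."""
    return 10.0 ** (gain_db / 10.0)


# Illustrative complex channel coefficients, not taken from the paper.
h_true = [1 + 0j, 0 + 1j]
h_est = [1.1 + 0j, 0 + 1j]
example_nmse = nmse_db(h_true, h_est)

# An 8 dB improvement corresponds to roughly a 6.3-fold lower squared error.
ratio_8db = db_gain_to_ratio(8.0)
```

Because the scale is logarithmic, the reported 8 dB and 9 dB gains correspond to error reductions of roughly 6.3 and 7.9 times, respectively.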



Article 1324

Title@2025-06-07 (6): RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior

Title: RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior RestoreGrad: Signal-Restaurierung mit Bedingten störenden Diffusions-Modellen mit gemeinsam gelernten Prior RestAGrad:使用有条件的低度扩散模型和先前联合研究的传播模型,恢复信号恢复信号 2502.13574v3

Authors (7): Ching-Hua Lee, Chouchang Yang, Jaejin Cho, Yashas Malur Saidutta, Rakshith Sharma Srinivasa, Yilin Shen, Hongxia Jin

Denoising diffusion probabilistic models (DDPMs) can be utilized to recover a clean signal from its degraded observation(s) by conditioning the model on the degraded signal. The degraded signals are themselves contaminated versions of the clean signals; due to this correlation, they may encompass certain useful information about the target clean data distribution. However, existing adoption of the standard Gaussian as the prior distribution in turn discards such information when shaping the prior, resulting in sub-optimal performance. In this paper, we propose to improve conditional DDPMs for signal restoration by leveraging a more informative prior that is jointly learned with the diffusion model. The proposed framework, called RestoreGrad, seamlessly integrates DDPMs into the variational autoencoder (VAE) framework, taking advantage of the correlation between the degraded and clean signals to encode a better diffusion prior. On speech and image restoration tasks, we show that RestoreGrad demonstrates faster convergence (5-10 times fewer training steps) to achieve better quality of restored signals over existing DDPM baselines and improved robustness to using fewer sampling steps in inference time (2-2.5 times fewer), advocating the advantages of leveraging jointly learned prior for efficiency improvements in the diffusion process.



Article 1325

Title@2025-06-07 (6): The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Title: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Die Illusion des Denkens: Die Stärken und Grenzen von Vernunftmodellen über das Lens of Problem Complexity verstehen 思考的幻觉:通过问题复杂焦点了解理性模型的长处和局限性 2506.06941v1

Authors (6): Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar

Recent generations of language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established math and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from contamination and does not provide insights into the reasoning traces. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of complexity while maintaining consistent logical structures. This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs think. Through extensive experiments, we show that LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite a remaining token budget. By comparing LRMs with their standard LLM counterparts under the same inference compute, we identify three performance regimes: (1) low-complexity tasks where standard models outperform LRMs, (2) medium-complexity tasks where LRMs demonstrate an advantage, and (3) high-complexity tasks where both models face complete collapse. We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across scales. We also investigate the reasoning traces in more depth, studying the patterns of explored solutions and analyzing the models’ computational behavior, shedding light on their strengths and limitations and raising questions about their reasoning capabilities.



Article 1326

Title@2025-06-07 (6): Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More

Title: Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More Schärfe-Dynamik im NN-Training mit einem minimalistischen Beispiel verstehen: Die Auswirkungen von Dataset-Schwierigkeiten, Tiefe, Stochastik und mehr 在NN培训中以最微小的示例了解尖锐的动态:数据集难度、深度、储存性等的影响 2506.06940v1

Authors (3): Geonhui Yoo, Minhak Song, Chulhee Yun

When training deep neural networks with gradient descent, sharpness often increases – a phenomenon known as progressive sharpening – before saturating at the edge of stability. Although commonly observed in practice, the underlying mechanisms behind progressive sharpening remain poorly understood. In this work, we study this phenomenon using a minimalist model: a deep linear network with a single neuron per layer. We show that this simple model effectively captures the sharpness dynamics observed in recent empirical studies, offering a simple testbed to better understand neural network training. Moreover, we theoretically analyze how dataset properties, network depth, stochasticity of optimizers, and step size affect the degree of progressive sharpening in the minimalist model. We then empirically demonstrate how these theoretical insights extend to practical scenarios. This study offers a deeper understanding of sharpness dynamics in neural network training, highlighting the interplay between depth, training data, and optimizers.
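
A small worked example may help make the minimalist model concrete. The sketch assumes a single training pair with input 1 and target y under squared loss, so the loss is 0.5 * (w_1 * ... * w_D - y)^2; the abstract does not specify the dataset, so this setup and the name `sharpness_at_minimizer` are assumptions. At any point with zero residual the Hessian equals the outer product of the gradient of the weight product with itself, so the sharpness (largest Hessian eigenvalue) is the squared norm of that gradient.

```python
import math


def sharpness_at_minimizer(weights):
    """Largest Hessian eigenvalue of 0.5 * (prod(w) - y)**2 where prod(w) == y.

    With zero residual the Hessian is g g^T with g_i = prod(w) / w_i,
    so its only nonzero eigenvalue is the squared norm of g.
    Assumes all weights are nonzero.
    """
    p = math.prod(weights)
    return sum((p / w) ** 2 for w in weights)


# Two depth-2 minimizers for target y = 4 (illustrative values).
balanced = sharpness_at_minimizer([2.0, 2.0])     # weights 2 * 2 = 4
unbalanced = sharpness_at_minimizer([1.0, 4.0])   # weights 1 * 4 = 4

# A balanced depth-4 minimizer for the same target, each weight 4 ** 0.25.
deeper = sharpness_at_minimizer([4.0 ** 0.25] * 4)
```

In this toy setting the balanced depth-2 solution is flatter than the unbalanced one, and the balanced depth-4 solution is sharper than the balanced depth-2 one, which gives a feel for how depth and the choice of minimizer can both move sharpness.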



Article 1327

Title@2025-06-07 (6): Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning

Title: Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning Intelligente Anomalieerkennung für Lane-Rendering mit Transformer mit selbstüberwachtem Pre-Training und maßgeschneidertem Feintuning 利用自我监督的训练前和定制的微调访问变形器,对利用自我监督的训练前和定制的微调访问前变形器进行巷路成形的智能异常异常探测 2312.04398v6

Authors (6): Yongqi Dong, Xingmin Lu, Ruohan Li, Wei Song, Bart van Arem, Haneen Farah

The burgeoning navigation services using digital maps provide great convenience to drivers. Nevertheless, the presence of anomalies in lane rendering map images occasionally introduces potential hazards, as such anomalies can be misleading to human drivers and consequently contribute to unsafe driving conditions. In response to this concern and to accurately and effectively detect the anomalies, this paper transforms lane rendering image anomaly detection into a classification problem and proposes a four-phase pipeline consisting of data pre-processing, self-supervised pre-training with the masked image modeling (MiM) method, customized fine-tuning using cross-entropy based loss with label smoothing, and post-processing to tackle it leveraging state-of-the-art deep learning techniques, especially those involving Transformer models. Various experiments verify the effectiveness of the proposed pipeline. Results indicate that the proposed pipeline exhibits superior performance in lane rendering image anomaly detection, and notably, the self-supervised pre-training with MiM can greatly enhance the detection accuracy while significantly reducing the total training time. For instance, employing the Swin Transformer with Uniform Masking as self-supervised pretraining (Swin-Trans-UM) yielded a heightened accuracy at 94.77% and an improved Area Under The Curve (AUC) score of 0.9743 compared with the pure Swin Transformer without pre-training (Swin-Trans) with an accuracy of 94.01% and an AUC of 0.9498. The fine-tuning epochs were dramatically reduced to 41 from the original 280. In conclusion, the proposed pipeline, with its incorporation of self-supervised pre-training using MiM and other advanced deep learning techniques, emerges as a robust solution for enhancing the accuracy and efficiency of lane rendering image anomaly detection in digital navigation systems.



Article 1328

Title@2025-06-07 (6): Scaling over Scaling: Exploring Test-Time Scaling Plateau in Large Reasoning Models

Title: Scaling over Scaling: Exploring Test-Time Scaling Plateau in Large Reasoning Models Skalierung über Skalierung: Untersuchung des Testzeit-Skalierungs-Plateau in Großmodellen 缩放超过缩放: 探索大型理由模型中的试验时间缩放高台 2505.20522v2

Authors (5): Jian Wang, Boyan Zhu, Chak Tou Leong, Yongqi Li, Wenjie Li

Large reasoning models (LRMs) have exhibited the capacity of enhancing reasoning performance via internal test-time scaling. Building upon this, a promising direction is to further scale test-time compute to unlock even greater reasoning capabilities. However, as we push these scaling boundaries, systematically understanding the practical limits and achieving optimal resource allocation become a critical challenge. In this paper, we investigate the scaling plateau of test-time scaling and introduce the Test-Time Scaling Performance Model (TTSPM). We theoretically analyze two fundamental paradigms for such extended scaling, parallel scaling and sequential scaling, from a probabilistic modeling perspective. Our primary contribution is the derivation of the saturation point on the scaling budget for both strategies, identifying thresholds beyond which additional computation yields diminishing returns. Remarkably, despite their distinct mechanisms, both paradigms converge to a unified mathematical structure in their upper bounds. We empirically validate our theoretical findings on challenging reasoning benchmarks, including AIME, MATH-500, and GPQA, demonstrating the practical utility of these bounds for test-time resource allocation. We hope that this work provides insights into the cost-benefit trade-offs of test-time scaling, guiding the development of more resource-efficient inference strategies for large reasoning models.
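
The idea of a saturation point can be illustrated with a textbook model of parallel scaling, which is not the paper's TTSPM: assume each of n independent samples solves the problem with probability p, so at least one succeeds with probability 1 - (1 - p)^n. The function names and the threshold `eps` below are hypothetical.

```python
def success_prob(p, n):
    """Probability that at least one of n independent samples succeeds."""
    return 1.0 - (1.0 - p) ** n


def saturation_budget(p, eps):
    """Smallest n >= 1 at which one extra sample adds less than eps.

    The marginal gain of sample n + 1 is p * (1 - p)**n, which decays
    geometrically, so the loop terminates for any eps > 0.
    """
    n = 1
    while success_prob(p, n + 1) - success_prob(p, n) >= eps:
        n += 1
    return n


# With a per-sample success rate of 0.5, gains fall below 1% after 6 samples.
budget = saturation_budget(0.5, 0.01)
```

Under this simplified model the marginal benefit shrinks geometrically, which is one way to see why a budget threshold with diminishing returns beyond it is a natural quantity to derive.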



Article 1329

Title@2025-06-07 (6): Rewriting the Budget: A General Framework for Black-Box Attacks Under Cost Asymmetry

Title: Rewriting the Budget: A General Framework for Black-Box Attacks Under Cost Asymmetry Das Budget neu schreiben: Ein allgemeiner Rahmen für Black-Box-Angriffe unter Kosten-Asymmetrie 重写预算:费用不对称下的黑牛袭击总框架 2506.06933v1

Authors (3): Mahdi Salmani, Alireza Abdollahpoorrostam, Seyed-Mohsen Moosavi-Dezfooli

Traditional decision-based black-box adversarial attacks on image classifiers aim to generate adversarial examples by slightly modifying input images while keeping the number of queries low, where each query involves sending an input to the model and observing its output. Most existing methods assume that all queries have equal cost. However, in practice, queries may incur asymmetric costs; for example, in content moderation systems, certain output classes may trigger additional review, enforcement, or penalties, making them more costly than others. While prior work has considered such asymmetric cost settings, effective algorithms for this scenario remain underdeveloped. In this paper, we propose a general framework for decision-based attacks under asymmetric query costs, which we refer to as asymmetric black-box attacks. We modify two core components of existing attacks: the search strategy and the gradient estimation process. Specifically, we propose Asymmetric Search (AS), a more conservative variant of binary search that reduces reliance on high-cost queries, and Asymmetric Gradient Estimation (AGREST), which shifts the sampling distribution to favor low-cost queries. We design efficient algorithms that minimize total attack cost by balancing different query types, in contrast to earlier methods such as stealthy attacks that focus only on limiting expensive (high-cost) queries. Our method can be integrated into a range of existing black-box attacks with minimal changes. We perform both theoretical analysis and empirical evaluation on standard image classification benchmarks. Across various cost regimes, our method consistently achieves lower total query cost and smaller perturbations than existing approaches, with improvements of up to 40% in some settings.
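
One way to picture a more conservative alternative to binary search is a bisection whose split point is biased toward the cheap side of the interval. The sketch below is an illustration of that general idea only, not the paper's Asymmetric Search: the split rule `1 / (1 + high_cost)`, the unit cost for cheap queries, and the name `biased_bisection` are all assumptions.

```python
def biased_bisection(is_flagged, lo, hi, high_cost, tol=1e-3):
    """Bracket the decision boundary of a monotone predicate on [lo, hi].

    Assumes is_flagged(lo) is False and is_flagged(hi) is True. Queries that
    come back flagged cost `high_cost`; all others cost 1. The split point
    leans toward the unflagged end so that expensive answers are less likely.
    """
    split = 1.0 / (1.0 + high_cost)   # hypothetical rule; equals 0.5 at equal cost
    total_cost = 0.0
    while hi - lo > tol:
        mid = lo + split * (hi - lo)
        if is_flagged(mid):
            total_cost += high_cost
            hi = mid
        else:
            total_cost += 1.0
            lo = mid
    return lo, hi, total_cost


# Illustrative boundary at 0.3 with flagged queries five times as costly.
lo, hi, cost = biased_bisection(lambda t: t >= 0.3, 0.0, 1.0, high_cost=5.0)
```

Setting `high_cost=1.0` recovers ordinary binary search, which makes the cost trade-off easy to compare empirically for a given boundary location.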



Article 1330

Title@2025-06-07 (6): Revisiting semi-supervised learning in the era of foundation models

Title: Revisiting semi-supervised learning in the era of foundation models Halbbeaufsichtigtes Lernen im Zeitalter der Gründungsmodelle 在基础模式时代重新审查半监督的学习 2503.09707v2

Authors (4): Ping Zhang, Zheda Mai, Quang-Huy Nguyen, Wei-Lun Chao

Semi-supervised learning (SSL) leverages abundant unlabeled data alongside limited labeled data to enhance learning. As vision foundation models (VFMs) increasingly serve as the backbone of vision applications, it remains unclear how SSL interacts with these pre-trained models. To address this gap, we develop new SSL benchmark datasets where frozen VFMs underperform and systematically evaluate representative SSL methods. We make a surprising observation: parameter-efficient fine-tuning (PEFT) using only labeled data often matches SSL performance, even without leveraging unlabeled data. This motivates us to revisit self-training, a conceptually simple SSL baseline, where we use the supervised PEFT model to pseudo-label unlabeled data for further training. To overcome the notorious issue of noisy pseudo-labels, we propose ensembling multiple PEFT approaches and VFM backbones to produce more robust pseudo-labels. Empirical results validate the effectiveness of this simple yet powerful approach, providing actionable insights into SSL with VFMs and paving the way for more scalable and practical semi-supervised learning in the era of foundation models.



Article 1331

Title@2025-06-07 (6): BLEUBERI: BLEU is a surprisingly effective reward for instruction following

Title: BLEUBERI: BLEU is a surprisingly effective reward for instruction following BLEUBERI: BLEU ist eine überraschend effektive Belohnung für den Unterricht nach BLEU是一种令人惊讶的有效奖赏 2505.11080v2

Authors (7): Yapei Chang, Yekyung Kim, Michael Krumdick, Amir Zadeh, Chuan Li, Chris Tanner, Mohit Iyyer

Reward models are central to aligning LLMs with human preferences, but they are costly to train, requiring large-scale human-labeled preference data and powerful pretrained LLM backbones. Meanwhile, the increasing availability of high-quality synthetic instruction-following datasets raises the question: can simpler, reference-based metrics serve as viable alternatives to reward models during RL-based alignment? In this paper, we show first that BLEU, a basic string-matching metric, surprisingly matches strong reward models in agreement with human preferences on general instruction-following datasets. Based on this insight, we develop BLEUBERI, a method that first identifies challenging instructions and then applies Group Relative Policy Optimization (GRPO) using BLEU directly as the reward function. We demonstrate that BLEUBERI-trained models are competitive with models trained via reward model-guided RL across four challenging instruction-following benchmarks and three different base language models. A human evaluation further supports that the quality of BLEUBERI model outputs is on par with those from reward model-aligned models. Moreover, BLEUBERI models generate outputs that are more factually grounded than competing methods. Overall, we show that given access to high-quality reference outputs (easily obtained via existing instruction-following datasets or synthetic data generation), string matching-based metrics are cheap yet effective proxies for reward models during alignment. We release our code and data at https://github.com/lilakk/BLEUBERI.
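
The reward pipeline described above can be sketched in a few lines. The scorer below is a simplified BLEU-style metric (clipped unigram and bigram precision with add-one smoothing and a brevity penalty), not the exact BLEU implementation used by the authors, and the advantage computation follows the usual group-normalized form associated with GRPO. The names `simple_bleu` and `group_relative_advantages` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU with add-one smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c, r = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(count, r[gram]) for gram, count in c.items())
        total = sum(c.values())
        log_prec += math.log((overlap + 1) / (total + 1))
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_prec / max_n)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# Illustrative group of responses sampled for one instruction.
reference = "the cat sat on the mat"
group = ["the cat sat on the mat", "the cat is on the mat", "a dog ran"]
rewards = [simple_bleu(c, reference) for c in group]
advantages = group_relative_advantages(rewards)
```

Because the advantages are centred within each group, only the relative ranking of responses against the reference matters, which is part of why a cheap string-matching score can stand in for a learned reward model.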



Article 1332

Title@2025-06-07 (6): Analysis of Thompson Sampling for Controlling Unknown Linear Diffusion Processes

Title: Analysis of Thompson Sampling for Controlling Unknown Linear Diffusion Processes Analyse der Thompson-Probenahme zur Kontrolle unbekannter linearer Diffusionsprozesse Thompson 用于控制未知线性扩散过程的抽样分析 2206.09977v2

Authors (3): Mohamad Kazem Shirani Faradonbeh, Sadegh Shirani, Mohsen Bayati

Linear diffusion processes serve as canonical continuous-time models for dynamic decision-making under uncertainty. These systems evolve according to drift matrices that specify the instantaneous rates of change in the expected system state, while also experiencing continuous random disturbances modeled by Brownian noise. For instance, in medical applications such as artificial pancreas systems, the drift matrices represent the internal dynamics of glucose concentrations. Classical results in stochastic control provide optimal policies under perfect knowledge of the drift matrices. However, practical decision-making scenarios typically feature uncertainty about the drift; in medical contexts, such parameters are patient-specific and unknown, requiring adaptive policies for efficiently learning the drift matrices while ensuring system stability and optimal performance. We study the Thompson sampling (TS) algorithm for decision-making in linear diffusion processes with unknown drift matrices. For this algorithm that designs control policies as if samples from a posterior belief about the parameters fully coincide with the unknown truth, we establish efficiency. That is, Thompson sampling learns optimal control actions fast, incurring only a square-root of time regret, and also learns to stabilize the system in a short time period. To our knowledge, this is the first such result for TS in a diffusion process control problem. Moreover, our empirical simulations in three settings that involve blood-glucose and flight control demonstrate that TS significantly improves regret, compared to the state-of-the-art algorithms, suggesting it explores in a more guarded fashion. Our theoretical analysis includes characterization of a certain optimality manifold that relates the geometry of the drift matrices to the optimal control of the diffusion process, among others.



Article 1333

Title@2025-06-07 (6): Distributionally Robust Learning in Survival Analysis

Title: Distributionally Robust Learning in Survival Analysis Verteilungsweise robustes Lernen in der Überlebensanalyse 在生存分析中进行有力的分配学习 2506.01348v2

Authors (3): Yeping Jin, Lauren Wise, Ioannis Ch. Paschalidis

We introduce an approach that incorporates Distributionally Robust Learning (DRL) into Cox regression to enhance the robustness and accuracy of survival predictions. By formulating a DRL framework with a Wasserstein distance-based ambiguity set, we develop a variant Cox model that is less sensitive to assumptions about the underlying data distribution and more resilient to model misspecification and data perturbations. By leveraging Wasserstein duality, we reformulate the original min-max DRL problem into a tractable regularized empirical risk minimization problem, which can be solved by exponential conic programming. We provide guarantees on the finite sample behavior of our DRL-Cox model. Moreover, through extensive simulations and real world case studies, we demonstrate that our regression model achieves superior performance in terms of prediction accuracy and robustness compared with traditional methods.



Article 1334

Title@2025-06-07 (6): Collapse-Proof Non-Contrastive Self-Supervised Learning

Title: Collapse-Proof Non-Contrastive Self-Supervised Learning Kollaps-Proof des nicht kontrasiven Selbstüberwachten Lernens 非封闭性自我监督学习的折叠结果 2410.04959v3

Authors (3): Emanuele Sansone, Tim Lebailly, Tinne Tuytelaars

We present a principled and simplified design of the projector and loss function for non-contrastive self-supervised learning based on hyperdimensional computing. We theoretically demonstrate that this design introduces an inductive bias that encourages representations to be simultaneously decorrelated and clustered, without explicitly enforcing these properties. This bias provably enhances generalization and suffices to avoid known training failure modes, such as representation, dimensional, cluster, and intracluster collapses. We validate our theoretical findings on image datasets, including SVHN, CIFAR-10, CIFAR-100, and ImageNet-100. Our approach effectively combines the strengths of feature decorrelation and cluster-based self-supervised learning methods, overcoming training failure modes while achieving strong generalization in clustering and linear classification tasks.



Article 1335

Title@2025-06-07 (6): Basis Transformers for Multi-Task Tabular Regression

Title: Basis Transformers for Multi-Task Tabular Regression Basistransformatoren für Multi-Task-Tabellarische Regression 多任务表格回归基础变换器 2506.06926v1

Authors (3): Wei Min Loh, Jiaqi Shang, Pascal Poupart

Dealing with tabular data is challenging due to partial information, noise, and heterogeneous structure. Existing techniques often struggle to simultaneously address key aspects of tabular data such as textual information, a variable number of columns, and unseen data without metadata besides column names. We propose a novel architecture, basis transformers, specifically designed to tackle these challenges while respecting inherent invariances in tabular data, including hierarchical structure and the representation of numeric values. We evaluate our design on a multi-task tabular regression benchmark, achieving an improvement of 0.338 in the median $R^2$ score and the lowest standard deviation across 34 tasks from the OpenML-CTR23 benchmark. Furthermore, our model has five times fewer parameters than the best-performing baseline and surpasses pretrained large language model baselines – even when initialized from randomized weights.



Article 1336

Title@2025-06-07 (6): The global convergence time of stochastic gradient descent in non-convex landscapes: Sharp estimates via large deviations

Title: The global convergence time of stochastic gradient descent in non-convex landscapes: Sharp estimates via large deviations Die globale Konvergenzzeit stochastischer Gradientenabgänge in nicht konvexen Landschaften: Scharfe Schätzungen über große Abweichungen 非沿海地貌中悬浮性梯度下降的全球趋同时间:因巨大偏差而得出的粗略估计 2503.16398v2

Authors (4): Waïss Azizian, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

In this paper, we examine the time it takes for stochastic gradient descent (SGD) to reach the global minimum of a general, non-convex loss function. We approach this question through the lens of randomly perturbed dynamical systems and large deviations theory, and we provide a tight characterization of the global convergence time of SGD via matching upper and lower bounds. These bounds are dominated by the most “costly” set of obstacles that the algorithm may need to overcome in order to reach a global minimizer from a given initialization, coupling in this way the global geometry of the underlying loss landscape with the statistics of the noise entering the process. Finally, motivated by applications to the training of deep neural networks, we also provide a series of refinements and extensions of our analysis for loss functions with shallow local minima.



Article 1337

Title@2025-06-07 (6): Measurement to Meaning: A Validity-Centered Framework for AI Evaluation

Title: Measurement to Meaning: A Validity-Centered Framework for AI Evaluation Messung zur Bedeutung: Ein gültigkeitszentrierter Rahmen für die AI-Bewertung 衡量到意义:AI评价的有效性-中心框架 2505.10573v3

Authors (9): Olawale Salaudeen, Anka Reuel, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, Sanmi Koyejo

While the capabilities and utility of AI systems have advanced, rigorous norms for evaluating these systems have lagged. Grand claims, such as models achieving general reasoning capabilities, are supported with model performance on narrow benchmarks, like performance on graduate-level exam questions, which provide a limited and potentially misleading assessment. We provide a structured approach for reasoning about the types of evaluative claims that can be made given the available evidence. For instance, our framework helps determine whether performance on a mathematical benchmark is an indication of the ability to solve problems on math tests or instead indicates a broader ability to reason. Our framework is well-suited for the contemporary paradigm in machine learning, where various stakeholders provide measurements and evaluations that downstream users use to validate their claims and decisions. At the same time, our framework also informs the construction of evaluations designed to speak to the validity of the relevant claims. By leveraging psychometrics’ breakdown of validity, evaluations can prioritize the most critical facets for a given claim, improving empirical utility and decision-making efficacy. We illustrate our framework through detailed case studies of vision and language model evaluations, highlighting how explicitly considering validity strengthens the connection between evaluation evidence and the claims being made.



Article 1338

Title@2025-06-07 (6): Graph-Based Physics-Guided Urban PM2.5 Air Quality Imputation with Constrained Monitoring Data

Title: Graph-Based Physics-Guided Urban PM2.5 Air Quality Imputation with Constrained Monitoring Data Graph-based Physics-Guided Urban PM2.5 Luftqualität Imputation mit eingeschränkten Überwachungsdaten 基于图表的物理指导城市 PM2.5 空气质量与受控监测数据 2506.06917v1

Authors (5): Shangjie Du, Hui Wei, Dong Yoon Lee, Zhizhang Hu, Shijia Pan

This work introduces GraPhy, a graph-based, physics-guided learning framework for high-resolution and accurate air quality modeling in urban areas with limited monitoring data. Fine-grained air quality monitoring information is essential for reducing public exposure to pollutants. However, monitoring networks are often sparse in socioeconomically disadvantaged regions, limiting the accuracy and resolution of air quality modeling. To address this, we propose a physics-guided graph neural network architecture called GraPhy with layers and edge features designed specifically for low-resolution monitoring data. Experiments using data from California’s socioeconomically disadvantaged San Joaquin Valley show that GraPhy achieves the overall best performance evaluated by mean squared error (MSE), mean absolute error (MAE), and R-squared value ($R^2$), improving the performance by 9%-56% compared to various baseline models. Moreover, GraPhy consistently outperforms baselines across different spatial heterogeneity levels, demonstrating the effectiveness of our model design.



Article 1339

Title@2025-06-07 (6): Graph Neural Networks in Modern AI-aided Drug Discovery

Title: Graph Neural Networks in Modern AI-aided Drug Discovery Graph Neuronale Netzwerke in der modernen KI-gestützten Drug Discovery 现代人工智能辅助药物发现现代神经网络图 2506.06915v1

Authors (12): Odin Zhang, Haitao Lin, Xujun Zhang, Xiaorui Wang, Zhenxing Wu, Qing Ye, Weibo Zhao, Jike Wang, Kejun Ying, Yu Kang, Chang-yu Hsieh, Tingjun Hou

Graph neural networks (GNNs), as topology/structure-aware models within deep learning, have emerged as powerful tools for AI-aided drug discovery (AIDD). By directly operating on molecular graphs, GNNs offer an intuitive and expressive framework for learning the complex topological and geometric features of drug-like molecules, cementing their role in modern molecular modeling. This review provides a comprehensive overview of the methodological foundations and representative applications of GNNs in drug discovery, spanning tasks such as molecular property prediction, virtual screening, molecular generation, biomedical knowledge graph construction, and synthesis planning. Particular attention is given to recent methodological advances, including geometric GNNs, interpretable models, uncertainty quantification, scalable graph architectures, and graph generative frameworks. We also discuss how these models integrate with modern deep learning approaches, such as self-supervised learning, multi-task learning, meta-learning and pre-training. Throughout this review, we highlight the practical challenges and methodological bottlenecks encountered when applying GNNs to real-world drug discovery pipelines, and conclude with a discussion on future directions.



Article 1340

Title@2025-06-07 (6): Uncertainty Estimation on Graphs with Structure Informed Stochastic Partial Differential Equations

Title: Uncertainty Estimation on Graphs with Structure Informed Stochastic Partial Differential Equations Ungewissheitsschätzung auf Graphen mit Struktur Informierte stochastische Partielle Differentialgleichungen 具有结构、知情斯托卡部分差别的图图的不确定性估计 2506.06907v1

Authors (2): Fred Xu, Thomas Markovich

Graph Neural Networks have achieved impressive results across diverse network modeling tasks, but accurately estimating uncertainty on graphs remains difficult, especially under distributional shifts. Unlike traditional uncertainty estimation, graph-based uncertainty must account for randomness arising from both the graph’s structure and its label distribution, which adds complexity. In this paper, drawing an analogy between the evolution of a stochastic partial differential equation (SPDE) driven by a Matérn Gaussian process and message passing with GNN layers, we present a principled way to design a novel message passing scheme that incorporates spatio-temporal noise motivated by the Gaussian process approach to SPDEs. Our method simultaneously captures uncertainty across space and time and allows explicit control over the smoothness of the covariance kernel, thereby enhancing uncertainty estimates on graphs with both low and high label informativeness. Our extensive experiments on Out-of-Distribution (OOD) detection on graph datasets with varying label informativeness demonstrate the soundness of our model and its superiority over existing approaches.



Article 1341

Title@2025-06-07 (6): When Incentives Backfire, Data Stops Being Human

Title: When Incentives Backfire, Data Stops Being Human Wenn Incentives Backfire, Daten aufhören, Mensch zu sein 当鼓励反火时,数据停止成为人类 2502.07732v2

Authors (5): Sebastin Santy, Prasanta Bhattacharya, Manoel Horta Ribeiro, Kelsey Allen, Sewoong Oh

Progress in AI has relied on human-generated data, from annotator marketplaces to the wider Internet. However, the widespread use of large language models now threatens the quality and integrity of human-generated data on these very platforms. We argue that this issue goes beyond the immediate challenge of filtering AI-generated content – it reveals deeper flaws in how data collection systems are designed. Existing systems often prioritize speed, scale, and efficiency at the cost of intrinsic human motivation, leading to declining engagement and data quality. We propose that rethinking data collection systems to align with contributors’ intrinsic motivations – rather than relying solely on external incentives – can help sustain high-quality data sourcing at scale while maintaining contributor trust and long-term participation.



Article 1342

Title@2025-06-07 (6): Prediction-Powered Adaptive Shrinkage Estimation

Title: Prediction-Powered Adaptive Shrinkage Estimation Vorhersage-Powered Adaptive Schrumpfung Schätzung 预测力的适应性减缩估计值 2502.14166v2

Authors (2): Sida Li, Nikolaos Ignatiadis

Prediction-Powered Inference (PPI) is a powerful framework for enhancing statistical estimates by combining limited gold-standard data with machine learning (ML) predictions. While prior work has demonstrated PPI’s benefits for individual statistical problems, modern applications require answering numerous parallel statistical questions. We introduce Prediction-Powered Adaptive Shrinkage (PAS), a method that bridges PPI with empirical Bayes shrinkage to improve the estimation of multiple means. PAS debiases noisy ML predictions within each task and then borrows strength across tasks by using those same predictions as a reference point for shrinkage. The amount of shrinkage is determined by minimizing an unbiased estimate of risk, and we prove that this tuning strategy is asymptotically optimal. Experiments on both synthetic and real-world datasets show that PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications.
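
The debias-then-shrink idea described above can be illustrated with a minimal sketch. It assumes the classic PPI estimator of a mean (average prediction on unlabeled data plus the average labeled residual) and a plain convex combination for shrinkage. The function names `ppi_mean` and `shrink_toward` and the hand-picked `weight` are hypothetical; in particular, the sketch does not implement the paper's risk-minimizing choice of the shrinkage amount.

```python
import numpy as np

def ppi_mean(y_labeled, preds_labeled, preds_unlabeled):
    """Classic prediction-powered estimate of a mean.

    The mean prediction on the large unlabeled set is corrected by the
    average residual (label minus prediction) on the small labeled set.
    """
    y_labeled = np.asarray(y_labeled, dtype=float)
    preds_labeled = np.asarray(preds_labeled, dtype=float)
    preds_unlabeled = np.asarray(preds_unlabeled, dtype=float)
    rectifier = np.mean(y_labeled - preds_labeled)  # estimated prediction bias
    return float(np.mean(preds_unlabeled) + rectifier)

def shrink_toward(estimate, anchor, weight):
    """Convex combination: weight=0 keeps the estimate, weight=1 returns the anchor.

    Here `weight` is chosen by hand (an assumption); PAS instead tunes the
    amount of shrinkage by minimizing an unbiased risk estimate.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * estimate + weight * anchor

# Toy task: the predictor overshoots every label by 0.5.
y_lab = [1.0, 2.0, 3.0]
f_lab = [1.5, 2.5, 3.5]
f_unlab = [2.0, 3.0, 4.0, 5.0]

debiased = ppi_mean(y_lab, f_lab, f_unlab)
shrunk = shrink_toward(debiased, float(np.mean(f_unlab)), weight=0.5)
print(debiased, shrunk)
```

With several parallel tasks, the same two steps would be applied per task, each using that task's own mean prediction as the anchor.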



Article 1343

Title@2025-06-07 (6): An Adaptively Inexact Method for Bilevel Learning Using Primal-Dual Style Differentiation

Title: An Adaptively Inexact Method for Bilevel Learning Using Primal-Dual Style Differentiation Eine adaptiv ungenaue Methode für das Zwei-Stufen-Lernen mit primär-dualer Stildifferenzierung 使用纯-双式风格差异的双级学习适应性不精确方法 2412.06436v3

Authors (5): Lea Bogensperger, Matthias J. Ehrhardt, Thomas Pock, Mohammad Sadegh Salehi, Hok Shing Wong

We consider a bilevel learning framework for learning linear operators. In this framework, the learnable parameters are optimized via a loss function that also depends on the minimizer of a convex optimization problem (the lower-level problem). We use an iterative algorithm called “piggyback” to compute the gradient of the loss together with the minimizer of the lower-level problem. Because the lower-level problem is solved numerically, the loss function and thus its gradient can only be computed inexactly. To estimate the accuracy of the computed hypergradient, we derive an a-posteriori error bound, which provides guidance for setting the tolerances of the lower-level solver and of the piggyback algorithm. To efficiently solve the upper-level optimization, we also propose an adaptive method for choosing a suitable step-size. To illustrate the proposed method, we consider a few learned regularizer problems, such as training an input-convex neural network.



Article 1344

Title@2025-06-07 (6): TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training

Title: TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training TorchTitan: Ein-Stop PyTorch native Lösung für produktionsfertiges LLM-Vortraining 火炬泰坦:当地一站式PyToirch当地生产即成LLM培训前的一站式PyTorch本地解决方案 2410.06511v3

Authors (13): Wanchao Liang, Tianyu Liu, Less Wright, Will Constable, Andrew Gu, Chien-Chin Huang, Iris Zhang, Wei Feng, Howard Huang, Junjie Wang, Sanket Purandare, Gokul Nadathur, Stratos Idreos

The development of large language models (LLMs) has been instrumental in advancing state-of-the-art natural language processing applications. Training LLMs with billions of parameters and trillions of tokens requires sophisticated distributed systems that enable composing and comparing several state-of-the-art techniques in order to efficiently scale across thousands of accelerators. However, existing solutions are complex, scattered across multiple libraries/repositories, lack interoperability, and are cumbersome to maintain. Thus, curating and empirically comparing training recipes requires non-trivial engineering effort. This paper introduces TorchTitan, an open-source, PyTorch-native distributed training system that unifies state-of-the-art techniques, streamlining integration and reducing overhead. TorchTitan enables 3D parallelism in a modular manner with elastic scaling, providing comprehensive logging, checkpointing, and debugging tools for production-ready training. It also incorporates hardware-software co-designed solutions, leveraging features like Float8 training and SymmetricMemory. As a flexible test bed, TorchTitan facilitates custom recipe curation and comparison, allowing us to develop optimized training recipes for Llama 3.1 and provide guidance on selecting techniques for maximum efficiency based on our experiences. We thoroughly assess TorchTitan on the Llama 3.1 family of LLMs, spanning 8 billion to 405 billion parameters, and showcase its exceptional performance, modular composability, and elastic scalability. By stacking training optimizations, we demonstrate accelerations of 65.08% with 1D parallelism at the 128-GPU scale (Llama 3.1 8B), an additional 12.59% with 2D parallelism at the 256-GPU scale (Llama 3.1 70B), and an additional 30% with 3D parallelism at the 512-GPU scale (Llama 3.1 405B) on NVIDIA H100 GPUs over optimized baselines.



Article 1345

Title@2025-06-07 (6): NSD-Imagery: A benchmark dataset for extending fMRI vision decoding methods to mental imagery

Title: NSD-Imagery: A benchmark dataset for extending fMRI vision decoding methods to mental imagery NSD-Imagery: Ein Benchmark-Datensatz zur Erweiterung von fMRI-Vision-Dekodierungsmethoden auf geistige Bilder NSD- 想象力:将FMRI愿景解码方法扩展至精神图像的基准数据集 2506.06898v1

Authors (6): Reese Kneeland, Paul S. Scotti, Ghislain St-Yves, Jesse Breedlove, Kendrick Kay, Thomas Naselaris

We release NSD-Imagery, a benchmark dataset of human fMRI activity paired with mental images, to complement the existing Natural Scenes Dataset (NSD), a large-scale dataset of fMRI activity paired with seen images that enabled unprecedented improvements in fMRI-to-image reconstruction efforts. Recent models trained on NSD have been evaluated only on seen image reconstruction. Using NSD-Imagery, it is possible to assess how well these models perform on mental image reconstruction. This is a challenging generalization requirement because mental images are encoded in human brain activity with relatively lower signal-to-noise and spatial resolution; however, generalization from seen to mental imagery is critical for real-world applications in medical domains and brain-computer interfaces, where the desired information is always internally generated. We provide benchmarks for a suite of recent NSD-trained open-source visual decoding models (MindEye1, MindEye2, Brain Diffuser, iCNN, Takagi et al.) on NSD-Imagery, and show that the performance of decoding methods on mental images is largely decoupled from performance on vision reconstruction. We further demonstrate that architectural choices significantly impact cross-decoding performance: models employing simple linear decoding architectures and multimodal feature decoding generalize better to mental imagery, while complex architectures tend to overfit visual training data. Our findings indicate that mental imagery datasets are critical for the development of practical applications, and establish NSD-Imagery as a useful resource for better aligning visual decoding methods with this goal.



Article 1346

Title@2025-06-07 (6): Exploring Image Transforms derived from Eye Gaze Variables for Progressive Autism Diagnosis

Title: Exploring Image Transforms derived from Eye Gaze Variables for Progressive Autism Diagnosis Erforschen von Bildtransformationen, abgeleitet von Eye Gaze-Variablen für die progressive Autismus-Diagnose 探索从用于渐进自闭症诊断的眼盖变量中产生的图像变换 2506.09065v1

Authors (6): Abigail Copiaco, Christian Ritz, Yassine Himeur, Valsamma Eapen, Ammar Albanna, Wathiq Mansoor

The prevalence of Autism Spectrum Disorder (ASD) has surged rapidly over the past decade, posing significant challenges in communication, behavior, and focus for affected individuals. Current diagnostic techniques, though effective, are time-intensive, leading to high social and economic costs. This work introduces an AI-powered assistive technology designed to streamline ASD diagnosis and management, enhancing convenience for individuals with ASD and efficiency for caregivers and therapists. The system integrates transfer learning with image transforms derived from eye gaze variables to diagnose ASD. This facilitates and opens opportunities for in-home periodical diagnosis, reducing stress for individuals and caregivers, while also preserving user privacy through the use of image transforms. The accessibility of the proposed method also offers opportunities for improved communication between guardians and therapists, ensuring regular updates on progress and evolving support needs. Overall, the approach proposed in this work ensures timely, accessible diagnosis while protecting the subjects’ privacy, improving outcomes for individuals with ASD.



Article 1347

Title@2025-06-07 (6): LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

Title: LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models LLM-SRBench: Ein neuer Benchmark für die Entdeckung wissenschaftlicher Gleichungen mit großen Sprachmodellen LLM-SRBench:采用大语言模式科学等同发现新基准 2504.10415v2

Authors (6): Parshin Shojaee, Ngoc-Hieu Nguyen, Kazem Meidani, Amir Barati Farimani, Khoa D Doan, Chandan K Reddy

Scientific equation discovery is a fundamental task in the history of scientific progress, enabling the derivation of laws governing natural phenomena. Recently, Large Language Models (LLMs) have gained interest for this task due to their potential to leverage embedded scientific knowledge for hypothesis generation. However, evaluating the true discovery capabilities of these methods remains challenging, as existing benchmarks often rely on common equations that are susceptible to memorization by LLMs, leading to inflated performance metrics that do not reflect discovery. In this paper, we introduce LLM-SRBench, a comprehensive benchmark with 239 challenging problems across four scientific domains specifically designed to evaluate LLM-based scientific equation discovery methods while preventing trivial memorization. Our benchmark comprises two main categories: LSR-Transform, which transforms common physical models into less common mathematical representations to test reasoning beyond memorized forms, and LSR-Synth, which introduces synthetic, discovery-driven problems requiring data-driven reasoning. Through extensive evaluation of several state-of-the-art methods, using both open and closed LLMs, we find that the best-performing system so far achieves only 31.5% symbolic accuracy. These findings highlight the challenges of scientific equation discovery, positioning LLM-SRBench as a valuable resource for future research.



Article 1348

Title@2025-06-07 (6): Multimodal Integration of Longitudinal Noninvasive Diagnostics for Survival Prediction in Immunotherapy Using Deep Learning

Title: Multimodal Integration of Longitudinal Noninvasive Diagnostics for Survival Prediction in Immunotherapy Using Deep Learning Multimodale Integration der langitudinalen Nichtinvasiven Diagnostik zur Überlebensvorhersage in der Immuntherapie mittels Deep Learning 利用深层学习的免疫疗法中生存预测纵向非侵入性诊断的多模式综合 2411.18253v2

Authors (7): Melda Yeghaian, Zuhir Bodalal, Daan van den Broek, John B A G Haanen, Regina G H Beets-Tan, Stefano Trebeschi, Marcel A J van Gerven

Purpose: Immunotherapies have revolutionized the landscape of cancer treatments. However, our understanding of response patterns in advanced cancers treated with immunotherapy remains limited. By leveraging routinely collected noninvasive longitudinal and multimodal data with artificial intelligence, we could unlock the potential to transform immunotherapy for cancer patients, paving the way for personalized treatment approaches. Methods: In this study, we developed a novel artificial neural network architecture, the multimodal transformer-based simple temporal attention (MMTSimTA) network, building upon a combination of recent successful developments. We integrated pre- and on-treatment blood measurements, prescribed medications and CT-based volumes of organs from a large pan-cancer cohort of 694 patients treated with immunotherapy to predict mortality at three, six, nine and twelve months. Different variants of our extended MMTSimTA network were implemented and compared to baseline methods incorporating intermediate and late fusion based integration methods. Results: The strongest prognostic performance was demonstrated using a variant of the MMTSimTA model, with areas under the curve (AUCs) of $0.84 \pm 0.04$, $0.83 \pm 0.02$, $0.82 \pm 0.02$, and $0.81 \pm 0.03$ for 3-, 6-, 9-, and 12-month survival prediction, respectively. Discussion: Our findings show that integrating noninvasive longitudinal data using our novel architecture yields an improved multimodal prognostic performance, especially in short-term survival prediction. Conclusion: Our study demonstrates that multimodal longitudinal integration of noninvasive data using deep learning may offer a promising approach for personalized prognostication in immunotherapy-treated cancer patients.



Article 1349

Title@2025-06-07 (6): Scalable Gaussian Processes with Latent Kronecker Structure

Title: Scalable Gaussian Processes with Latent Kronecker Structure Skalierbare Gaußprozesse mit latenter Kronecker-Struktur 可缩放高斯进程 2506.06895v1

Authors (6): Jihao Andreas Lin, Sebastian Ament, Maximilian Balandat, David Eriksson, José Miguel Hernández-Lobato, Eytan Bakshy

Applying Gaussian processes (GPs) to very large datasets remains a challenge due to limited computational scalability. Matrix structures, such as the Kronecker product, can accelerate operations significantly, but their application commonly entails approximations or unrealistic assumptions. In particular, the most common path to creating a Kronecker-structured kernel matrix is by evaluating a product kernel on gridded inputs that can be expressed as a Cartesian product. However, this structure is lost if any observation is missing, breaking the Cartesian product structure, which frequently occurs in real-world data such as time series. To address this limitation, we propose leveraging latent Kronecker structure, by expressing the kernel matrix of observed values as the projection of a latent Kronecker product. In combination with iterative linear system solvers and pathwise conditioning, our method facilitates inference of exact GPs while requiring substantially fewer computational resources than standard iterative methods. We demonstrate that our method outperforms state-of-the-art sparse and variational GPs on real-world datasets with up to five million examples, including robotics, automated machine learning, and climate applications.
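
The projection idea can be made concrete with a small sketch. It assumes a two-factor product kernel on a grid, so the latent kernel matrix is `np.kron(K1, K2)`, and a list of observed grid positions. The function name `latent_kronecker_matvec` and the toy kernel matrices are hypothetical and are not taken from the paper's code. The point is that a matrix-vector product with the observed kernel matrix (the basic operation an iterative linear solver needs) can be computed without ever forming the Kronecker product.

```python
import numpy as np

def latent_kronecker_matvec(K1, K2, observed_idx, v):
    """Compute (P (K1 kron K2) P^T) v without building the Kronecker product.

    K1: (n1, n1) kernel matrix over the first grid axis.
    K2: (n2, n2) kernel matrix over the second grid axis.
    observed_idx: positions of observed entries in the row-major
        flattened n1 x n2 grid (this defines the projection P).
    v: vector with one entry per observed position.
    """
    n1, n2 = K1.shape[0], K2.shape[0]
    full = np.zeros(n1 * n2)
    full[observed_idx] = v              # P^T v: zero-pad to the full grid
    V = full.reshape(n1, n2)
    out = K1 @ V @ K2.T                 # equals (K1 kron K2) @ full, row-major
    return out.ravel()[observed_idx]    # P (...): keep observed entries only

# Toy check on a 4 x 3 grid with two missing observations.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(3, 3))
K1 = A @ A.T + np.eye(4)                # symmetric positive definite
K2 = B @ B.T + np.eye(3)
observed = np.array([0, 1, 2, 4, 5, 6, 8, 9, 10, 11])  # positions 3 and 7 missing
v = rng.normal(size=observed.size)

fast = latent_kronecker_matvec(K1, K2, observed, v)
dense = np.kron(K1, K2)[np.ix_(observed, observed)] @ v
print(np.allclose(fast, dense))
```

The dense reference needs memory quadratic in the grid size, whereas the structured product only stores the two factor matrices and one grid-shaped array.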



Article 1350

Title@2025-06-07 (6): A Novel Neural Filter to Improve Accuracy of Neural Network Models of Dynamic Systems

Title: A Novel Neural Filter to Improve Accuracy of Neural Network Models of Dynamic Systems Ein neuartiger neuraler Filter zur Verbesserung der Genauigkeit neuraler Netzwerkmodelle dynamischer Systeme 提高动态系统神经网络模型准确性的新神经过滤器 2409.13654v2

Authors (3): Parham Oveissi, Turibius Rozario, Ankit Goel

The application of neural networks in modeling dynamic systems has become prominent due to their ability to estimate complex nonlinear functions. Despite their effectiveness, neural networks face challenges in long-term predictions, where the prediction error diverges over time, thus degrading their accuracy. This paper presents a neural filter to enhance the accuracy of long-term state predictions of neural network-based models of dynamic systems. Motivated by the extended Kalman filter, the neural filter combines the neural network state predictions with the measurements from the physical system to improve the estimated state’s accuracy. The neural filter’s improvements in prediction accuracy are demonstrated through applications to four nonlinear dynamical systems. Numerical experiments show that the neural filter significantly improves prediction accuracy and bounds the state estimate covariance, outperforming the neural network predictions. Furthermore, it is also shown that the accuracy of a poorly trained neural network model can be improved to the same level as that of an adequately trained neural network model, potentially decreasing the training cost and required data to train a neural network.



Article 1351

Title@2025-06-07 (6): Can In-Context Reinforcement Learning Recover From Reward Poisoning Attacks?

Title: Can In-Context Reinforcement Learning Recover From Reward Poisoning Attacks? Kann In-Context-Verstärkung lernen, sich von Belohnungs-Vergiftungsangriffen zu erholen? 文中加强学习能从受毒害袭击中恢复吗? 2506.06891v1

Authors (3): Paulius Sasnauskas, Yiğit Yalın, Goran Radanović

We study the corruption-robustness of in-context reinforcement learning (ICRL), focusing on the Decision-Pretrained Transformer (DPT, Lee et al., 2023). To address the challenge of reward poisoning attacks targeting the DPT, we propose a novel adversarial training framework, called Adversarially Trained Decision-Pretrained Transformer (AT-DPT). Our method simultaneously trains an attacker to minimize the true reward of the DPT by poisoning environment rewards, and a DPT model to infer optimal actions from the poisoned data. We evaluate the effectiveness of our approach against standard bandit algorithms, including robust baselines designed to handle reward contamination. Our results show that the proposed method significantly outperforms these baselines in bandit settings, under a learned attacker. We additionally evaluate AT-DPT on an adaptive attacker, and observe similar results. Furthermore, we extend our evaluation to the MDP setting, confirming that the robustness observed in bandit scenarios generalizes to more complex environments.



Article 1352

Title@2025-06-07 (6): FREE: Fast and Robust Vision Language Models with Early Exits

Title: FREE: Fast and Robust Vision Language Models with Early Exits KOSTENLOS: Schnelle und robuste Vision Sprachmodelle mit frühen Ausgängen FREE:有早期出入口的快速和强有力的愿景语言模式 2506.06884v1

Authors (2): Divya Jyoti Bajpai, Manjesh Kumar Hanawal

In recent years, Vision-Language Models (VLMs) have shown remarkable performance improvements in Vision-Language tasks. However, their large size poses challenges for real-world applications where inference latency is a concern. To tackle this issue, we propose employing Early Exit (EE) strategies in VLMs. However, training exit classifiers in VLMs is challenging, particularly with limited labeled training data. To address this, we introduce FREE, an adversarial training approach within a GAN-based framework. Here, each exit consists of a transformer layer and a classifier. The transformer layer is adversarially trained to produce feature representations similar to the final layer, while a feature classifier serves as the discriminator. Our method focuses on performing input-adaptive inference that increases inference speed with minimal drop in performance. Experimental results demonstrate the effectiveness of our approach in enhancing accuracy and model robustness by mitigating overthinking and the phenomenon of mid-crisis that we highlight. We experimentally validate that our method speeds up the inference process by more than 1.51x while retaining comparable performance. The source code is available at https://github.com/Div290/FREE.



Article 1353

Title@2025-06-07 (6): LookAlike: Consistent Distractor Generation in Math MCQs

Title: LookAlike: Consistent Distractor Generation in Math MCQs LookAlike: Konsistente Distraktor-Generation in Math MCQs 类似外观: 在数学 MCQs 中, 一致的分隔器生成 2505.01903v2

Authors (5): Nisarg Parikh, Nigel Fernandez, Alexander Scarlatos, Simon Woodhead, Andrew Lan

Large language models (LLMs) are increasingly used to generate distractors for multiple-choice questions (MCQs), especially in domains like math education. However, existing approaches are limited in ensuring that the generated distractors are consistent with common student errors. We propose LookAlike, a method that improves error-distractor consistency via preference optimization. Our two main innovations are: (a) mining synthetic preference pairs from model inconsistencies, and (b) alternating supervised fine-tuning (SFT) with Direct Preference Optimization (DPO) to stabilize training. Unlike prior work that relies on heuristics or manually annotated preference data, LookAlike uses its own generation inconsistencies as dispreferred samples, thus enabling scalable and stable training. Evaluated on a real-world dataset of 1,400+ math MCQs, LookAlike achieves 51.6% accuracy in distractor generation and 57.2% in error generation under LLM-as-a-judge evaluation, outperforming an existing state-of-the-art method (45.6% / 47.7%). These improvements highlight the effectiveness of preference-based regularization and inconsistency mining for generating consistent math MCQ distractors at scale.
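
For readers unfamiliar with the preference-optimization step, the following is a minimal sketch of the standard per-pair DPO loss, $-\log \sigma\big(\beta\,[(\log \pi(y_w) - \log \pi_{\mathrm{ref}}(y_w)) - (\log \pi(y_l) - \log \pi_{\mathrm{ref}}(y_l))]\big)$. It is not LookAlike's training loop: the function name `dpo_loss`, the default `beta=0.1`, and the toy log-probabilities are assumptions, and the sketch covers neither the mining of preference pairs nor the alternation with SFT.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the policy being trained
    (logp_*) and under a frozen reference model (ref_logp_*).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) = log(1 + exp(-margin)), evaluated stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# When the policy equals the reference, the margin is 0 and the loss is log 2.
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))

# Raising the chosen sample's log-probability and lowering the rejected
# sample's lowers the loss.
print(dpo_loss(-0.5, -2.0, -1.0, -1.0))
```

In the setting described above, the rejected sample would be one of the model's own inconsistent generations.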



Article 1354

Title@2025-06-07 (6): Empowering COVID-19 Detection: Optimizing Performance Through Fine-Tuned EfficientNet Deep Learning Architecture

Title: Empowering COVID-19 Detection: Optimizing Performance Through Fine-Tuned EfficientNet Deep Learning Architecture Empowering COVID-19 Detection: Optimierung der Leistung durch eine effiziente Net Deep Learning Architektur 增强COVID-19探测能力:通过精巧高效网络深层学习架构优化业绩 2311.16593v2

Authors (5): Md. Alamin Talukder, Md. Abu Layek, Mohsin Kazi, Md Ashraf Uddin, Sunil Aryal

The worldwide COVID-19 pandemic has profoundly influenced the health and everyday experiences of individuals across the planet. It is a highly contagious respiratory disease requiring early and accurate detection to curb its rapid transmission. Initial testing methods primarily revolved around identifying the genetic composition of the coronavirus, exhibiting a relatively low detection rate and requiring a time-intensive procedure. To address this challenge, experts have suggested using radiological imagery, particularly chest X-rays, as a valuable approach within the diagnostic protocol. This study investigates the potential of leveraging radiographic imaging (X-rays) with deep learning algorithms to swiftly and precisely identify COVID-19 patients. The proposed approach improves detection accuracy by fine-tuning various established transfer learning models with appropriate additional layers. The experiments were conducted on a COVID-19 X-ray dataset containing 2000 images. The fine-tuned EfficientNetB4 model achieved an accuracy of 100% on this dataset, showcasing its potential as a robust COVID-19 detection model. Furthermore, EfficientNetB4 excelled in identifying lung disease on a chest X-ray dataset containing 4,350 images, achieving an accuracy of 99.17%, precision of 99.13%, recall of 99.16%, and F1-score of 99.14%. These results highlight the promise of fine-tuned transfer learning for efficient lung disease detection through medical imaging, especially with X-ray images. This research offers radiologists an effective means of aiding rapid and precise COVID-19 diagnosis and contributes valuable assistance for healthcare professionals in accurately identifying affected patients.

nan


Article 1355

Title@2025-06-07 (6): Learnable Activation Functions in Physics-Informed Neural Networks for Solving Partial Differential Equations

Title: Learnable Activation Functions in Physics-Informed Neural Networks for Solving Partial Differential Equations Lernbare Aktivierungsfunktionen in physikinformierten Neuronalen Netzwerken zur Lösung partieller Differentialgleichungen 用于解决部分差异等量的物理内成神经网络中可学习的激活功能 2411.15111v3

Authors (2): Afrah Farea, Mustafa Serdar Celebi

We investigate learnable activation functions in Physics-Informed Neural Networks (PINNs) for solving Partial Differential Equations (PDEs), comparing traditional Multilayer Perceptrons (MLPs) with fixed and trainable activations against Kolmogorov-Arnold Networks (KANs) that employ learnable basis functions. While PINNs effectively incorporate physical laws into the learning process, they suffer from convergence and spectral bias problems, which limit their applicability to problems with rapid oscillations or sharp transitions. In this work, we study various activation and basis functions across diverse PDEs, including oscillatory, nonlinear wave, mixed-physics, and fluid dynamics problems. Using empirical Neural Tangent Kernel (NTK) analysis and Hessian eigenvalue decomposition, we assess convergence behavior, stability, and high-frequency approximation capacity. While KANs offer improved expressivity for capturing complex, high-frequency PDE solutions, they introduce new optimization challenges, especially in deeper networks. Our findings show that KANs face a curse of functional dimensionality, creating intractable optimization landscapes in deeper networks. Low spectral bias alone does not guarantee good performance; adaptive spectral bias approaches such as B-splines achieve optimal results by balancing global stability with local high-frequency resolution. Different PDE types require tailored strategies: smooth global activation functions excel for wave phenomena, while local adaptive activation functions suit problems with sharp transitions.

nan


Article 1356

Title@2025-06-07 (6): An Optimized Ensemble Deep Learning Model For Brain Tumor Classification

Title: An Optimized Ensemble Deep Learning Model For Brain Tumor Classification Ein optimiertes Ensemble Deep Learning Modell für die Klassifizierung von Gehirntumoren 大脑肿瘤分类最佳组合深层学习模型 2305.12844v3

Authors (3): Md. Alamin Talukder, Md. Manowarul Islam, Md Ashraf Uddin

Brain tumors present a grave risk to human life, demanding precise and timely diagnosis for effective treatment. Inaccurate identification of brain tumors can significantly diminish life expectancy, underscoring the critical need for precise diagnostic methods. Manual identification of brain tumors within vast Magnetic Resonance Imaging (MRI) image datasets is arduous and time-consuming. Thus, the development of a reliable deep learning (DL) model is essential to enhance diagnostic accuracy and ultimately save lives. This study introduces an innovative optimization-based deep ensemble approach employing transfer learning (TL) to efficiently classify brain tumors. Our methodology includes meticulous preprocessing, reconstruction of TL architectures, fine-tuning, and ensemble DL models utilizing weighted optimization techniques such as Genetic Algorithm-based Weight Optimization (GAWO) and Grid Search-based Weight Optimization (GSWO). Experimentation is conducted on the Figshare Contrast-Enhanced MRI (CE-MRI) brain tumor dataset, comprising 3064 images. Our approach achieves notable accuracy scores, with Xception, ResNet50V2, ResNet152V2, InceptionResNetV2, GAWO, and GSWO attaining 99.42%, 98.37%, 98.22%, 98.26%, 99.71%, and 99.76% accuracy, respectively. Notably, GSWO demonstrates superior accuracy, averaging 99.76% accuracy across five folds on the Figshare CE-MRI brain tumor dataset. The comparative analysis highlights the significant performance enhancement of our proposed model over existing counterparts. In conclusion, our optimized deep ensemble model exhibits exceptional accuracy in swiftly classifying brain tumors. Furthermore, it has the potential to assist neurologists and clinicians in making accurate and immediate diagnostic decisions.
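
To make the idea of grid-search-based ensemble weighting concrete, the following is a minimal sketch, not the authors' GSWO implementation. It assumes each model outputs per-class probabilities, that the ensemble is a weighted average of those probabilities, and that weights are searched on a coarse simplex grid; the names `weighted_vote`, `grid_search_weights`, and the `steps` parameter are hypothetical.

```python
from itertools import product


def weighted_vote(prob_sets, weights):
    """Average per-model class probabilities with the given weights
    and return the arg-max class for every sample."""
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    preds = []
    for i in range(n_samples):
        scores = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets))
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=scores.__getitem__))
    return preds


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def grid_search_weights(prob_sets, labels, steps=5):
    """Try every weight vector k/steps (k integer) that sums to 1 and
    keep the one with the highest accuracy (hypothetical helper)."""
    best_w, best_acc = None, -1.0
    for ks in product(range(steps + 1), repeat=len(prob_sets)):
        if sum(ks) != steps:  # stay on the probability simplex
            continue
        w = tuple(k / steps for k in ks)
        acc = accuracy(weighted_vote(prob_sets, w), labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

In practice the weights would be selected on a validation fold rather than on the test data; the grid resolution trades search cost against how finely the weights can be tuned.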

nan


Article 1357

Title@2025-06-07 (6): Improved Sample Complexity for Private Nonsmooth Nonconvex Optimization

Title: Improved Sample Complexity for Private Nonsmooth Nonconvex Optimization Verbesserte Probenkomplexität für private Nonsmooth Nonconvex-Optimierung 改进私人非机动非康通非节流优化的样品复杂度 2410.05880v2

Authors (3): Guy Kornowski, Daogao Liu, Kunal Talwar

We study differentially private (DP) optimization algorithms for stochastic and empirical objectives which are neither smooth nor convex, and propose methods that return a Goldstein-stationary point with sample complexity bounds that improve on existing works. We start by providing a single-pass $(\epsilon,\delta)$-DP algorithm that returns an $(\alpha,\beta)$-stationary point as long as the dataset is of size $\widetilde{\Omega}(\sqrt{d}/\alpha\beta^{3}+d/\epsilon\alpha\beta^{2})$, which is $\Omega(\sqrt{d})$ times smaller than the algorithm of Zhang et al. [2024] for this task, where $d$ is the dimension. We then provide a multi-pass polynomial time algorithm which further improves the sample complexity to $\widetilde{\Omega}\left(d/\beta^2+d^{3/4}/\epsilon\alpha^{1/2}\beta^{3/2}\right)$, by designing a sample efficient ERM algorithm, and proving that Goldstein-stationary points generalize from the empirical loss to the population loss.
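
For readers unfamiliar with the stationarity notion used here, the standard definition of Goldstein stationarity from the nonsmooth nonconvex optimization literature is sketched below. Reading $\alpha$ as the radius of the neighborhood and $\beta$ as the norm tolerance is an assumption consistent with that literature; the abstract itself does not spell out the roles of the two parameters.

```latex
% Goldstein alpha-subdifferential of a Lipschitz function f at x:
\partial_\alpha f(x) \;:=\; \operatorname{conv}\Bigl( \bigcup_{y \,:\, \|y - x\| \le \alpha} \partial f(y) \Bigr)

% x is an (alpha, beta)-Goldstein-stationary point if
\min_{g \in \partial_\alpha f(x)} \|g\| \;\le\; \beta
```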

nan


Article 1358

Title@2025-06-07 (6): Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

Title: Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding Weit und deutlich sehen: Halluzinationen in MLLMs mit Aufmerksamkeit abbauen 远看远看得很清楚:在MLLLMS中减少幻觉,并注意固态解说 2505.16652v2

Authors (12): Feilong Tang, Chengzhi Liu, Zhongxing Xu, Ming Hu, Zelin Peng, Zhiwei Yang, Jionglong Su, Minquan Lin, Yifan Peng, Xuelian Cheng, Imran Razzak, Zongyuan Ge

Recent advancements in multimodal large language models (MLLMs) have significantly improved performance in visual question answering. However, they often suffer from hallucinations. In this work, hallucinations are categorized into two main types: initial hallucinations and snowball hallucinations. We argue that adequate contextual information can be extracted directly from the token interaction process. Inspired by causal inference in the decoding strategy, we propose to leverage causal masks to establish information propagation between multimodal tokens. The hypothesis is that insufficient interaction between those tokens may lead the model to rely on outlier tokens, overlooking dense and rich contextual cues. Therefore, we propose to intervene in the propagation process by tackling outlier tokens to enhance in-context inference. With this goal, we present FarSight, a versatile plug-and-play decoding strategy to reduce attention interference from outlier tokens merely by optimizing the causal mask. The heart of our method is effective token propagation. We design an attention register structure within the upper triangular matrix of the causal mask, dynamically allocating attention to capture attention diverted to outlier tokens. Moreover, a positional awareness encoding method with a diminishing masking rate is proposed, allowing the model to attend to further preceding tokens, especially for video sequence tasks. With extensive experiments, FarSight demonstrates significant hallucination-mitigating performance across different MLLMs on both image and video benchmarks, proving its effectiveness.

nan


Article 1359

Title@2025-06-07 (6): Learning Fluid-Structure Interaction Dynamics with Physics-Informed Neural Networks and Immersed Boundary Methods

Title: Learning Fluid-Structure Interaction Dynamics with Physics-Informed Neural Networks and Immersed Boundary Methods Learning Fluid-Struktur-Interaktion Dynamik mit physikinformierten Neuronalen Netzwerken und eingetauchten Grenzmethoden 与物理内成形神经网络和混合边界方法的互动动态 2505.18565v2

Authors (5): Afrah Farea, Saiful Khan, Reza Daryani, Emre Cenk Ersan, Mustafa Serdar Celebi

We introduce neural network architectures that combine physics-informed neural networks (PINNs) with the immersed boundary method (IBM) to solve fluid-structure interaction (FSI) problems. Our approach features two distinct architectures: a Single-FSI network with a unified parameter space, and an innovative Eulerian-Lagrangian network that maintains separate parameter spaces for fluid and structure domains. We study each architecture using standard Tanh and adaptive B-spline activation functions. Empirical studies on a 2D cavity flow problem involving a moving solid structure show that the Eulerian-Lagrangian architecture performs significantly better. The adaptive B-spline activation further enhances accuracy by providing locality-aware representation near boundaries. While our methodology shows promising results in predicting the velocity field, pressure recovery remains challenging due to the absence of explicit force-coupling constraints in the current formulation. Our findings underscore the importance of domain-specific architectural design and adaptive activation functions for modeling FSI problems within the PINN framework.

nan


Article 1360

Title@2025-06-07 (6): Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning

Title: Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning Log-Sum-Exponential-Estimator für Off-Policy-Evaluierung und -Lernen 用于非政策评价和学习的非政策评价和学习的日志- 2506.06873v1

Authors (6): Armin Behnamnia, Gholamali Aminian, Alireza Aghaei, Chengchun Shi, Vincent Y. F. Tan, Hamid R. Rabiee

Off-policy learning and evaluation leverage logged bandit feedback datasets, which contain context, action, propensity score, and feedback for each data point. These scenarios face significant challenges due to high variance and poor performance with low-quality propensity scores and heavy-tailed reward distributions. We address these issues by introducing a novel estimator based on the log-sum-exponential (LSE) operator, which outperforms traditional inverse propensity score estimators. Our LSE estimator demonstrates variance reduction and robustness under heavy-tailed conditions. For off-policy evaluation, we derive upper bounds on the estimator’s bias and variance. In the off-policy learning scenario, we establish bounds on the regret – the performance gap between our LSE estimator and the optimal policy – assuming bounded $(1+\epsilon)$-th moment of weighted reward. Notably, we achieve a convergence rate of $O(n^{-\epsilon/(1+ \epsilon)})$ for the regret bounds, where $\epsilon \in [0,1]$ and $n$ is the size of logged bandit feedback dataset. Theoretical analysis is complemented by comprehensive empirical evaluations in both off-policy learning and evaluation scenarios, confirming the practical advantages of our approach. The code for our estimator is available at the following link: https://github.com/armin-behnamnia/lse-offpolicy-learning.
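
The contrast between an inverse propensity score (IPS) estimate and a log-sum-exponential estimate can be sketched as follows. This is an illustration under assumptions, not the authors' code (which is at the link above): it assumes the estimator has the form $\frac{1}{\lambda}\log\bigl(\frac{1}{n}\sum_i e^{\lambda w_i r_i}\bigr)$ with $\lambda < 0$, where $w_i$ is the importance weight; the function names and the default `lam` are hypothetical.

```python
import math


def ips_estimate(rewards, target_probs, logging_probs):
    """Plain inverse-propensity-score estimate: mean of w_i * r_i."""
    terms = [(p / q) * r for r, p, q in zip(rewards, target_probs, logging_probs)]
    return sum(terms) / len(terms)


def lse_estimate(rewards, target_probs, logging_probs, lam=-1.0):
    """Log-sum-exponential estimate (assumed form):
    (1/lam) * log(mean_i exp(lam * w_i * r_i)), with lam < 0."""
    if lam >= 0:
        raise ValueError("this sketch assumes a negative lam")
    z = [lam * (p / q) * r for r, p, q in zip(rewards, target_probs, logging_probs)]
    m = max(z)  # subtract the max before exponentiating for numerical stability
    log_mean_exp = m + math.log(sum(math.exp(v - m) for v in z) / len(z))
    return log_mean_exp / lam
```

With a negative `lam`, a single sample with a very small logging propensity (and hence a huge importance weight) contributes almost nothing to the sum of exponentials, which is one way to see why such an estimator is less sensitive to heavy tails than the IPS mean.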

nan


Article 1361

Title@2025-06-07 (6): Watermark under Fire: A Robustness Evaluation of LLM Watermarking

Title: Watermark under Fire: A Robustness Evaluation of LLM Watermarking Wasserzeichen unter Feuer: Eine Robustheitsbewertung von LLM Watermarking 火灾下的水印:对LLM水标志的强力评价 2411.13425v3

Authors (5): Jiacheng Liang, Zian Wang, Lauren Hong, Shouling Ji, Ting Wang

Various watermarking methods ("watermarkers") have been proposed to identify LLM-generated texts; yet, due to the lack of unified evaluation platforms, many critical questions remain under-explored: i) What are the strengths/limitations of various watermarkers, especially their attack robustness? ii) How do various design choices impact their robustness? iii) How to optimally operate watermarkers in adversarial environments? To fill this gap, we systematize existing LLM watermarkers and watermark removal attacks, mapping out their design spaces. We then develop WaterPark, a unified platform that integrates 10 state-of-the-art watermarkers and 12 representative attacks. More importantly, by leveraging WaterPark, we conduct a comprehensive assessment of existing watermarkers, unveiling the impact of various design choices on their attack robustness. We further explore the best practices to operate watermarkers in adversarial environments. We believe our study sheds light on current LLM watermarking techniques while WaterPark serves as a valuable testbed to facilitate future research.

nan


Article 1362

Title@2025-06-07 (6): KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies

Title: KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies KEA: Exploration durch proaktiv koordinierte Explorationsstrategien am Leben erhalten KEA:通过主动协调的勘探战略使勘探活动保持活跃 2503.18234v2

Authors (4): Shih-Min Yang, Martin Magnusson, Johannes A. Stork, Todor Stoyanov

Soft Actor-Critic (SAC) has achieved notable success in continuous control tasks but struggles in sparse reward settings, where infrequent rewards make efficient exploration challenging. While novelty-based exploration methods address this issue by encouraging the agent to explore novel states, they are not trivial to apply to SAC. In particular, managing the interaction between novelty-based exploration and SAC’s stochastic policy can lead to inefficient exploration and redundant sample collection. In this paper, we propose KEA (Keeping Exploration Alive) which tackles the inefficiencies in balancing exploration strategies when combining SAC with novelty-based exploration. KEA integrates a novelty-augmented SAC with a standard SAC agent, proactively coordinated via a switching mechanism. This coordination allows the agent to maintain stochasticity in high-novelty regions, enhancing exploration efficiency and reducing repeated sample collection. We first analyze this potential issue in a 2D navigation task, and then evaluate KEA on the DeepSea hard-exploration benchmark as well as sparse reward control tasks from the DeepMind Control Suite. Compared to state-of-the-art novelty-based exploration baselines, our experiments show that KEA significantly improves learning efficiency and robustness in sparse reward setups.

nan


Article 1363

Title@2025-06-07 (6): SAFE: Finding Sparse and Flat Minima to Improve Pruning

Title: SAFE: Finding Sparse and Flat Minima to Improve Pruning SAFE: Sparse und flache Minima finden, um Pruning zu verbessern FAFE: 寻找粗糙和平坦的迷你小米来改善谨慎 2506.06866v1

Authors (4): Dongyeop Lee, Kwanhee Lee, Jinseok Chung, Namhoon Lee

Sparsifying neural networks often suffers from seemingly inevitable performance degradation, and it remains challenging to restore the original performance despite much recent progress. Motivated by recent studies in robust optimization, we aim to tackle this problem by finding subnetworks that are both sparse and flat at the same time. Specifically, we formulate pruning as a sparsity-constrained optimization problem where flatness is encouraged as an objective. We solve it explicitly via an augmented Lagrange dual approach and extend it further by proposing a generalized projection operation, resulting in novel pruning methods called SAFE and its extension, SAFE$^+$. Extensive evaluations on standard image classification and language modeling tasks reveal that SAFE consistently yields sparse networks with improved generalization performance, which compares competitively to well-established baselines. In addition, SAFE demonstrates resilience to noisy data, making it well-suited for real-world conditions.

nan


Article 1364

Title@2025-06-07 (6): Generalization and Robustness of the Tilted Empirical Risk

Title: Generalization and Robustness of the Tilted Empirical Risk Verallgemeinerung und Robustheit des abgeschwächten Empirischen Risikos 微薄经验风险的概括化和有力性 2409.19431v3

Authors (6): Gholamali Aminian, Amir R. Asadi, Tian Li, Ahmad Beirami, Gesine Reinert, Samuel N. Cohen

The generalization error (risk) of a supervised statistical learning algorithm quantifies its prediction ability on previously unseen data. Inspired by exponential tilting, Li et al. (2020) proposed the tilted empirical risk (TER) as a non-linear risk metric for machine learning applications such as classification and regression problems. In this work, we examine the generalization error of the tilted empirical risk in the robustness regime under negative tilt. Our first contribution is to provide uniform and information-theoretic bounds on the tilted generalization error, defined as the difference between the population risk and the tilted empirical risk, under negative tilt for unbounded loss functions with a bounded $(1+\epsilon)$-th moment for some $\epsilon\in(0,1]$, with a convergence rate of $O(n^{-\epsilon/(1+\epsilon)})$ where $n$ is the number of training samples, revealing a novel application for TER under no distribution shift. Secondly, we study the robustness of the tilted empirical risk with respect to noisy outliers at training time and provide theoretical guarantees under distribution shift for the tilted empirical risk. We empirically corroborate our findings in simple experimental setups where we evaluate our bounds to select the value of tilt in a data-driven manner.
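
For reference, the tilted empirical risk as introduced in the tilted empirical risk minimization literature can be written as below. The symbols $\gamma$ (tilt), $h$ (hypothesis), $\ell$ (loss), and $z_i$ (training samples) are notation assumed here for illustration; the abstract does not fix them.

```latex
% Tilted empirical risk with tilt gamma != 0 over samples z_1, ..., z_n:
\widehat{R}_\gamma(h) \;=\; \frac{1}{\gamma} \log\Bigl( \frac{1}{n} \sum_{i=1}^{n} e^{\gamma\, \ell(h, z_i)} \Bigr)

% As gamma -> 0 this recovers the usual empirical risk:
\lim_{\gamma \to 0} \widehat{R}_\gamma(h) \;=\; \frac{1}{n} \sum_{i=1}^{n} \ell(h, z_i)

% Tilted generalization error (population risk minus TER), as defined in the abstract:
\mathrm{gen}_\gamma(h) \;=\; \mathbb{E}_{z}\bigl[\ell(h, z)\bigr] \;-\; \widehat{R}_\gamma(h)
```

With negative tilt ($\gamma < 0$), samples with very large loss are exponentially down-weighted, which is the robustness regime studied here.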

nan


Article 1365

Title@2025-06-07 (6): Multimodal Spatial Language Maps for Robot Navigation and Manipulation

Title: Multimodal Spatial Language Maps for Robot Navigation and Manipulation Multimodale räumliche Sprachkarten für Roboternavigation und -manipulation 用于机器人导航和操纵的多模式空间语言地图 2506.06862v1

Authors (4): Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard

Grounding language to a navigating agent’s observations can leverage pretrained multimodal foundation models to match perceptions to object or event descriptions. However, previous approaches remain disconnected from environment mapping, lack the spatial precision of geometric maps, or neglect additional modality information beyond vision. To address this, we propose multimodal spatial language maps as a spatial map representation that fuses pretrained multimodal features with a 3D reconstruction of the environment. We build these maps autonomously using standard exploration. We present two instances of our maps, which are visual-language maps (VLMaps) and their extension to audio-visual-language maps (AVLMaps) obtained by adding audio information. When combined with large language models (LLMs), VLMaps can (i) translate natural language commands into open-vocabulary spatial goals (e.g., “in between the sofa and TV”) directly localized in the map, and (ii) be shared across different robot embodiments to generate tailored obstacle maps on demand. Building upon the capabilities above, AVLMaps extend VLMaps by introducing a unified 3D spatial representation integrating audio, visual, and language cues through the fusion of features from pretrained multimodal foundation models. This enables robots to ground multimodal goal queries (e.g., text, images, or audio snippets) to spatial locations for navigation. Additionally, the incorporation of diverse sensory inputs significantly enhances goal disambiguation in ambiguous environments. Experiments in simulation and real-world settings demonstrate that our multimodal spatial language maps enable zero-shot spatial and multimodal goal navigation and improve recall by 50% in ambiguous scenarios. These capabilities extend to mobile robots and tabletop manipulators, supporting navigation and interaction guided by visual, audio, and spatial cues.

nan


Article 1366

Title@2025-06-07 (6): Differentially Private Sparse Linear Regression with Heavy-tailed Responses

Title: Differentially Private Sparse Linear Regression with Heavy-tailed Responses Unterschiedlich Private Sparse Lineare Regression mit stark gestaffelten Responses 具有重尾量反应的 不同私人的粗线性递减 2506.06861v1

Authors (5): Xizhi Tian, Meng Ding, Touming Tao, Zihang Xiang, Di Wang

As a fundamental problem in machine learning and differential privacy (DP), DP linear regression has been extensively studied. However, most existing methods focus primarily on either regular data distributions or low-dimensional cases with irregular data. To address these limitations, this paper provides a comprehensive study of DP sparse linear regression with heavy-tailed responses in high-dimensional settings. In the first part, we introduce the DP-IHT-H method, which leverages the Huber loss and private iterative hard thresholding to achieve an estimation error bound of $\tilde{O}\Bigl( (s^*)^{\frac{1}{2}} \cdot \bigl(\frac{\log d}{n}\bigr)^{\frac{\zeta}{1 + \zeta}} + (s^*)^{\frac{1 + 2\zeta}{2 + 2\zeta}} \cdot \bigl(\frac{\log^2 d}{n \varepsilon}\bigr)^{\frac{\zeta}{1 + \zeta}} \Bigr)$ under the $(\varepsilon, \delta)$-DP model, where $n$ is the sample size, $d$ is the dimensionality, $s^*$ is the sparsity of the parameter, and $\zeta \in (0, 1]$ characterizes the tail heaviness of the data. In the second part, we propose DP-IHT-L, which further improves the error bound under additional assumptions on the response and achieves $\tilde{O}\Bigl(\frac{(s^*)^{3/2} \log d}{n \varepsilon}\Bigr)$. Compared to the first result, this bound is independent of the tail parameter $\zeta$. Finally, through experiments on synthetic and real-world datasets, we demonstrate that our methods outperform standard DP algorithms designed for "regular" data.
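
The two building blocks named in the abstract, the Huber loss and iterative hard thresholding, can be sketched as a single noisy update step. This is only an illustration and is not DP-IHT-H: the noise scale `sigma` is a placeholder that is not calibrated to any $(\varepsilon, \delta)$ budget, the top-$s$ selection is not privatized, and all function and parameter names are hypothetical.

```python
import random


def huber_grad(residual, tau):
    """Derivative of the Huber loss w.r.t. the residual: the residual
    itself, clipped to [-tau, tau], which bounds each sample's influence."""
    return max(-tau, min(tau, residual))


def hard_threshold(v, s):
    """Keep the s largest-magnitude coordinates of v and zero the rest."""
    order = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    keep = set(order[:s])
    return [x if j in keep else 0.0 for j, x in enumerate(v)]


def noisy_iht_step(theta, X, y, s, lr, tau, sigma, rng):
    """One gradient step on the Huber loss with Gaussian noise added to the
    gradient, followed by hard thresholding to s nonzero coordinates.
    NOTE: sigma is an uncalibrated placeholder; this gives no DP guarantee."""
    n, d = len(X), len(theta)
    grad = [0.0] * d
    for xi, yi in zip(X, y):
        residual = sum(a * b for a, b in zip(xi, theta)) - yi
        g = huber_grad(residual, tau)
        for j in range(d):
            grad[j] += g * xi[j] / n
    noisy = [t - lr * (g + rng.gauss(0.0, sigma)) for t, g in zip(theta, grad)]
    return hard_threshold(noisy, s)
```

Clipping the residual is what makes the per-sample gradient bounded even when responses are heavy-tailed, and a bounded gradient is the usual prerequisite for adding calibrated noise in a private algorithm.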

nan


Article 1367

Title@2025-06-07 (6): High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations

Title: High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations High-Fidelity Scientific Simulation Surrogats über adaptive Implizite Neuraldarstellungen 通过适应性隐隐性神经仪表显示的高纤维科学模拟模拟器 2506.06858v1

Authors (6): Ziwei Li, Yuhan Duan, Tianyu Xiong, Yi-Tang Chen, Wei-Lun Chao, Han-Wei Shen

Effective surrogate models are critical for accelerating scientific simulations. Implicit neural representations (INRs) offer a compact and continuous framework for modeling spatially structured data, but they often struggle with complex scientific fields exhibiting localized, high-frequency variations. Recent approaches address this by introducing additional features along rigid geometric structures (e.g., grids), but at the cost of flexibility and increased model size. In this paper, we propose a simple yet effective alternative: Feature-Adaptive INR (FA-INR). FA-INR leverages cross-attention to an augmented memory bank to learn flexible feature representations, enabling adaptive allocation of model capacity based on data characteristics, rather than rigid structural assumptions. To further improve scalability, we introduce a coordinate-guided mixture of experts (MoE) that enhances the specialization and efficiency of feature representations. Experiments on three large-scale ensemble simulation datasets show that FA-INR achieves state-of-the-art fidelity while significantly reducing model size, establishing a new trade-off frontier between accuracy and compactness for INR-based surrogates.

nan


Article 1368

Title@2025-06-07 (6): Free Random Projection for In-Context Reinforcement Learning

Title: Free Random Projection for In-Context Reinforcement Learning Freie Zufallsprojektion für In-Context-Verstärkung 内容强化学习自由随机投射 2504.06983v2

Authors (3): Tomohiro Hayase, Benoît Collins, Nakamasa Inoue

Hierarchical inductive biases are hypothesized to promote generalizable policies in reinforcement learning, as demonstrated by explicit hyperbolic latent representations and architectures. Therefore, a more flexible approach is to have these biases emerge naturally from the algorithm. We introduce Free Random Projection, an input mapping grounded in free probability theory that constructs random orthogonal matrices where hierarchical structure arises inherently. The free random projection integrates seamlessly into existing in-context reinforcement learning frameworks by encoding hierarchical organization within the input space without requiring explicit architectural modifications. Empirical results on multi-environment benchmarks show that free random projection consistently outperforms the standard random projection, leading to improvements in generalization. Furthermore, analyses within linearly solvable Markov decision processes and investigations of the spectrum of kernel random matrices reveal the theoretical underpinnings of free random projection’s enhanced performance, highlighting its capacity for effective adaptation in hierarchically structured state spaces.

nan


Article 1369

Title@2025-06-07 (6): JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data

Title: JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data JESTR: Gemeinsame Einbettung von Weltraumtechnik für Ranking-Kandidatenmoleküle für die Annotation untargeted Metabolomics Data JESTR: 联合嵌入空间技术,以评分非目标元元数据评分候选候选分子 2411.14464v3

Authors (4): Apurva Kalia, Yan Zhou Chen, Dilip Krishnan, Soha Hassoun

Motivation: A major challenge in metabolomics is annotation: assigning molecular structures to mass spectral fragmentation patterns. Despite recent advances in molecule-to-spectra and in spectra-to-molecular fingerprint prediction (FP), annotation rates remain low. Results: We introduce in this paper a novel paradigm (JESTR) for annotation. Unlike prior approaches that explicitly construct molecular fingerprints or spectra, JESTR leverages the insight that molecules and their corresponding spectra are views of the same data and effectively embeds their representations in a joint space. Candidate structures are ranked based on cosine similarity between the embeddings of query spectrum and each candidate. We evaluate JESTR against mol-to-spec and spec-to-FP annotation tools on three datasets. On average, for rank@[1-5], JESTR outperforms other tools by 23.6%-71.6%. We further demonstrate the strong value of regularization with candidate molecules during training, boosting rank@1 performance by 11.4% and enhancing the model’s ability to discern between target and candidate molecules. When comparing JESTR’s performance against that of publicly available pretrained models of SIRIUS and CFM-ID on appropriate subsets of MassSpecGym benchmark dataset, JESTR outperforms these tools by 31% and 238%, respectively. Through JESTR, we offer a novel promising avenue towards accurate annotation, therefore unlocking valuable insights into the metabolome.
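
The ranking step described above, scoring each candidate by cosine similarity to the query spectrum in the joint space, can be sketched as follows. The embeddings are assumed to be given as plain lists of floats (how JESTR produces them is not shown), and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(spectrum_emb, candidate_embs):
    """Return (name, score) pairs sorted by similarity to the query
    spectrum embedding, most similar first."""
    scored = [(name, cosine(spectrum_emb, emb)) for name, emb in candidate_embs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A rank@k metric then simply checks whether the true molecule appears among the first k entries of the returned list.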

nan


Article 1370

Title@2025-06-07 (6): Curvature Enhanced Data Augmentation for Regression

Title: Curvature Enhanced Data Augmentation for Regression Verstärkte Datenvergrößerung für Regression 后回归加速度增强数据增强值 2506.06853v1

Authors (2): Ilya Kaufman Sirot, Omri Azencot

Deep learning models with a large number of parameters, often referred to as over-parameterized models, have achieved exceptional performance across various tasks. Despite concerns about overfitting, these models frequently generalize well to unseen data, thanks to effective regularization techniques, with data augmentation being among the most widely used. While data augmentation has shown great success in classification tasks using label-preserving transformations, its application in regression problems has received less attention. Recently, a novel manifold learning approach for generating synthetic data was proposed, utilizing a first-order approximation of the data manifold. Building on this foundation, we present a theoretical framework and practical tools for approximating and sampling general data manifolds. Furthermore, we introduce the Curvature-Enhanced Manifold Sampling (CEMS) method for regression tasks. CEMS leverages a second-order representation of the data manifold to enable efficient sampling and reconstruction of new data points. Extensive evaluations across multiple datasets and comparisons with state-of-the-art methods demonstrate that CEMS delivers superior performance in both in-distribution and out-of-distribution scenarios, while introducing only minimal computational overhead. Code is available at https://github.com/azencot-group/CEMS.

nan


Article 1371

Title@2025-06-07 (6): Active Preference Optimization for Sample Efficient RLHF

Title: Active Preference Optimization for Sample Efficient RLHF Aktive Preference-Optimierung für Probeneffizienz RLHF 抽样高效RLHF的主动首选最佳优化 2402.10500v3

Authors (4): Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury

Large Language Models (LLMs) aligned using Reinforcement Learning from Human Feedback (RLHF) have shown remarkable generation abilities in numerous tasks. However, collecting high-quality human preferences creates costly bottlenecks in practical deployments, and hence, training data are often budgeted. In these scenarios, it is crucial to collect training data (e.g., contexts, a pair of generations for each context, and a preference indicating which generation is better) carefully, yet most of the existing methods sample contexts uniformly at random from a given collection. Given this, under the Bradley-Terry-Luce preference model and with a small budget of training data, we show that uniform sampling of contexts could lead to a policy (i.e., an aligned model) that suffers a constant sub-optimality gap from the optimal policy. This highlights the need for an adaptive context sampling strategy for effective alignment under a small sample budget. To address this, we reformulate RLHF within the contextual preference bandit framework, treating generations as actions, and give a nearly complete characterization of the sub-optimality gap in terms of both lower and upper bounds. First, when the action set is a $d$-dimensional hypercube and the number of samples is $T$, we show an $\Omega(d/\sqrt{T})$ lower bound. Next, we propose an algorithm, Active Preference Optimization (APO), that iteratively collects preferences for the most uncertain contexts. We show that the sub-optimality gap of the policy learned via APO matches the lower bound up to a log factor and a non-linearity constant. Finally, we perform experiments on practical datasets to validate APO's efficacy over existing methods, establishing it as a sample-efficient and cost-effective solution for LLM alignment.

nan


Article 1372

Title@2025-06-07 (6): A Statistical Framework for Model Selection in LSTM Networks

Title: A Statistical Framework for Model Selection in LSTM Networks Ein statistischer Rahmen für die Modellauswahl in LSTM-Netzwerken LSTM网络模式选择统计框架 2506.06840v1

Authors (1): Fahad Mostafa

Long Short-Term Memory (LSTM) neural network models have become the cornerstone for sequential data modeling in numerous applications, ranging from natural language processing to time series forecasting. Despite their success, the problem of model selection, including hyperparameter tuning, architecture specification, and regularization choice, remains largely heuristic and computationally expensive. In this paper, we propose a unified statistical framework for systematic model selection in LSTM networks. Our framework extends classical model selection ideas, such as information criteria and shrinkage estimation, to sequential neural networks. We define penalized likelihoods adapted to temporal structures, propose a generalized threshold approach for hidden state dynamics, and provide efficient estimation strategies using variational Bayes and approximate marginal likelihood methods. Several biomedical data-centric examples demonstrate the flexibility and improved performance of the proposed framework.

nan


Article 1373

Title@2025-06-07 (6): Harnessing Vision-Language Models for Time Series Anomaly Detection

Title: Harnessing Vision-Language Models for Time Series Anomaly Detection Nutzung von Vision-Language-Modellen für Zeitreihenanomalienerkennung 利用视觉语言模型进行时间序列异常探测 2506.06836v1

Authors (3): Zelin He, Sarah Alnegheimish, Matthew Reimherr

Time-series anomaly detection (TSAD) has played a vital role in a variety of fields, including healthcare, finance, and industrial monitoring. Prior methods, which mainly focus on training domain-specific models on numerical data, lack the visual-temporal reasoning capacity that human experts have to identify contextual anomalies. To fill this gap, we explore a solution based on vision language models (VLMs). Recent studies have shown the ability of VLMs for visual reasoning tasks, yet their direct application to time series has fallen short on both accuracy and efficiency. To harness the power of VLMs for TSAD, we propose a two-stage solution, with (1) ViT4TS, a vision-screening stage built on a relatively lightweight pretrained vision encoder, which leverages 2-D time-series representations to accurately localize candidate anomalies; (2) VLM4TS, a VLM-based stage that integrates global temporal context and VLM reasoning capacity to refine the detection upon the candidates provided by ViT4TS. We show that without any time-series training, VLM4TS outperforms time-series pretrained and from-scratch baselines in most cases, yielding a 24.6 percent improvement in F1-max score over the best baseline. Moreover, VLM4TS also consistently outperforms existing language-model-based TSAD methods and is on average 36 times more efficient in token usage.



Article 1374

Title@2025-06-07 (6): LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models

Title: LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models LLM-Attacken: Verbesserung der Closed-Loop Adversarial Scenario Generation für autonomes Fahren mit großen Sprachmodellen LLM-攻击者:加强利用大语言模型自主驾驶的闭路反反向假想生成 2501.15850v2

Authors (4): Yuewen Mei, Tong Nie, Jian Sun, Ye Tian

Ensuring and improving the safety of autonomous driving systems (ADS) is crucial for the deployment of highly automated vehicles, especially in safety-critical events. To address the rarity issue, adversarial scenario generation methods are developed, in which behaviors of traffic participants are manipulated to induce safety-critical events. However, existing methods still face two limitations. First, identification of the adversarial participant directly impacts the effectiveness of the generation. However, the complexity of real-world scenarios, with numerous participants and diverse behaviors, makes identification challenging. Second, the potential of generated safety-critical scenarios to continuously improve ADS performance remains underexplored. To address these issues, we propose LLM-attacker: a closed-loop adversarial scenario generation framework leveraging large language models (LLMs). Specifically, multiple LLM agents are designed and coordinated to identify optimal attackers. Then, the trajectories of the attackers are optimized to generate adversarial scenarios. These scenarios are iteratively refined based on the performance of ADS, forming a feedback loop to improve ADS. Experimental results show that LLM-attacker can create more dangerous scenarios than other methods, and the ADS trained with it achieves a collision rate half that of training with normal scenarios. This indicates the ability of LLM-attacker to test and enhance the safety and robustness of ADS. Video demonstrations are provided at: https://drive.google.com/file/d/1Zv4V3iG7825oyiKbUwS2Y-rR0DQIE1ZA/view.



Article 1375

Title@2025-06-07 (6): FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-World LoRA

Title: FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-World LoRA FedALT: Federated Fine-Tuning durch Adaptives lokales Training mit Rest-of-World LoRA FedALT:通过与世界其余地区合作的适应性地方培训实现联邦退休 2503.11880v2

Authors (4): Jieming Bian, Lei Wang, Letian Zhang, Jie Xu

Fine-tuning large language models (LLMs) in federated settings enables privacy-preserving adaptation but suffers from cross-client interference due to model aggregation. Existing federated LoRA fine-tuning methods, primarily based on FedAvg, struggle with data heterogeneity, leading to harmful cross-client interference and suboptimal personalization. In this work, we propose FedALT, a novel personalized federated LoRA fine-tuning algorithm that fundamentally departs from FedAvg. Instead of using an aggregated model to initialize local training, each client continues training its individual LoRA while incorporating shared knowledge through a separate Rest-of-World (RoW) LoRA component. To effectively balance local adaptation and global information, FedALT introduces an adaptive mixer that dynamically learns input-specific weightings between the individual and RoW LoRA components, drawing conceptual foundations from the Mixture-of-Experts (MoE) paradigm. Through extensive experiments on NLP benchmarks, we demonstrate that FedALT significantly outperforms state-of-the-art personalized federated LoRA fine-tuning methods, achieving superior local adaptation without sacrificing computational efficiency.



Article 1376

Title: The Currents of Conflict: Decomposing Conflict Trends with Gaussian Processes Die Strömungen des Konflikts: Konflikttrends mit gaussischen Prozessen zersetzen 冲突流:与高斯进程冲突趋势的分解 2506.06828v1

Authors (1): Simon P. von der Maase

I present a novel approach to estimating the temporal and spatial patterns of violent conflict. I show how we can use highly temporally and spatially disaggregated data on conflict events in tandem with Gaussian processes to estimate temporospatial conflict trends. These trends can be studied to gain insight into conflict traps, diffusion and tempo-spatial conflict exposure in general; they can also be used to control for such phenomena given other estimation tasks; lastly, the approach allows us to extrapolate the estimated tempo-spatial conflict patterns into future temporal units, thus facilitating powerful, state-of-the-art conflict forecasts. Importantly, these results are achieved via a relatively parsimonious framework using only one data source: past conflict patterns.



Article 1377

Title@2025-06-07 (6): Data Shapley in One Training Run

Title: Data Shapley in One Training Run Datenschädlichkeit in einem Trainingslauf 一个培训运行中的数据形状 2406.11011v3

Authors (4): Jiachen T. Wang, Prateek Mittal, Dawn Song, Ruoxi Jia

Data Shapley provides a principled framework for attributing data’s contribution within machine learning contexts. However, existing approaches require re-training models on different data subsets, which is computationally intensive, foreclosing their application to large-scale models. Furthermore, they produce the same attribution score for any models produced by running the learning algorithm, meaning they cannot perform targeted attribution towards a specific model obtained from a single run of the algorithm. This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest. In its most efficient implementation, our technique incurs negligible additional runtime compared to standard model training. This dramatic efficiency improvement makes it possible to perform data attribution for the foundation model pretraining stage for the first time. We present several case studies that offer fresh insights into pretraining data’s contribution and discuss their implications for copyright in generative AI and pretraining data curation.



Article 1378

Title@2025-06-07 (6): ASPO: Constraint-Aware Bayesian Optimization for FPGA-based Soft Processors

Title: ASPO: Constraint-Aware Bayesian Optimization for FPGA-based Soft Processors ASPO: Constraint-Aware Bayesian Optimierung für FPGA-basierte Softprozessoren APPO: 以FPGA为基础的软处理器的节压-Aware Bayesian优化 2506.06817v1

Authors (4): Haoran Wu, Ce Guo, Wayne Luk, Robert Mullins

Bayesian Optimization (BO) has shown promise in tuning processor design parameters. However, standard BO does not support constraints involving categorical parameters such as types of branch predictors and division circuits. In addition, optimization time of BO grows with processor complexity, which becomes increasingly significant especially for FPGA-based soft processors. This paper introduces ASPO, an approach that leverages disjunctive form to enable BO to handle constraints involving categorical parameters. Unlike existing methods that directly apply standard BO, the proposed ASPO method, for the first time, customizes the mathematical mechanism of BO to address challenges faced by soft-processor designs on FPGAs. Specifically, ASPO supports categorical parameters using a novel customized BO covariance kernel. It also accelerates the design evaluation procedure by penalizing the BO acquisition function with potential evaluation time and by reusing FPGA synthesis checkpoints from previously evaluated configurations. ASPO targets three soft processors: RocketChip, BOOM, and EL2 VeeR. The approach is evaluated based on seven RISC-V benchmarks. Results show that ASPO can reduce execution time for the "multiply" benchmark on the BOOM processor by up to 35% compared to the default configuration. Furthermore, it reduces design time for the BOOM processor by up to 74% compared to Boomerang, a state-of-the-art hardware-oriented BO approach.



Article 1379

Title@2025-06-07 (6): Path Integral Optimiser: Global Optimisation via Neural Schrödinger-Föllmer Diffusion

Title: Path Integral Optimiser: Global Optimisation via Neural Schrödinger-Föllmer Diffusion Path Integral Optimiser: Globale Optimierung über Neural Schrödinger-Föllmer Diffusion 路径综合优化:通过Neural Schrödinger-Föllmer扩散实现全球优化 2506.06815v1

Authors (3): Max McGuinness, Eirik Fladmark, Francisco Vargas

We present an early investigation into the use of neural diffusion processes for global optimisation, focusing on Zhang et al.’s Path Integral Sampler. One can use the Boltzmann distribution to formulate optimization as solving a Schrödinger bridge sampling problem, then apply Girsanov’s theorem with a simple (single-point) prior to frame it in stochastic control terms, and compute the solution’s integral terms via a neural approximation (a Fourier MLP). We provide theoretical bounds for this optimiser, results on toy optimisation tasks, and a summary of the stochastic theory motivating the model. Ultimately, we found the optimiser to display promising per-step performance at optimisation tasks between 2 and 1,247 dimensions, but struggle to explore higher-dimensional spaces when faced with a 15.9k parameter model, indicating a need for work on adaptation in such environments.



Article 1380

Title@2025-06-07 (6): Homophily-Driven Sanitation View for Robust Graph Contrastive Learning

Title: Homophily-Driven Sanitation View for Robust Graph Contrastive Learning Homophil-getriebene Sanitäransicht für robustes Graph Kontrastives Lernen 强力图表对比学习的光极- 驱动光学卫生视图 2307.12555v2

Authors (4): Yulin Zhu, Xing Ai, Yevgeniy Vorobeychik, Kai Zhou

We investigate adversarial robustness of unsupervised Graph Contrastive Learning (GCL) against structural attacks. First, we provide a comprehensive empirical and theoretical analysis of existing attacks, revealing how and why they downgrade the performance of GCL. Inspired by our analytic results, we present a robust GCL framework that integrates a homophily-driven sanitation view, which can be learned jointly with contrastive learning. A key challenge this poses, however, is the non-differentiable nature of the sanitation objective. To address this challenge, we propose a series of techniques to enable gradient-based end-to-end robust GCL. Moreover, we develop a fully unsupervised hyperparameter tuning method which, unlike prior approaches, does not require knowledge of node labels. We conduct extensive experiments to evaluate the performance of our proposed model, GCHS (Graph Contrastive Learning with Homophily-driven Sanitation View), against two state of the art structural attacks on GCL. Our results demonstrate that GCHS consistently outperforms all state of the art baselines in terms of the quality of generated node embeddings as well as performance on two important downstream tasks.



Article 1381

Title@2025-06-07 (6): Efficient Parallelization of Message Passing Neural Network Potentials for Large-scale Molecular Dynamics

Title: Efficient Parallelization of Message Passing Neural Network Potentials for Large-scale Molecular Dynamics Effiziente Parallelisierung von Message-Passing-Neural Network-Potenziale für großmaßstäbliche molekulare Dynamiken 大型分子动态信息传递神经网络潜力的有效平行 2505.06711v3

Authors (2): Junfan Xia, Bin Jiang

Machine learning potentials have achieved great success in accelerating atomistic simulations. Many of them rely on atom-centered local descriptors and are therefore naturally suited to parallelization. More recent message passing neural network (MPNN) models have demonstrated superior accuracy and become increasingly popular. However, efficiently parallelizing MPNN models across multiple nodes remains challenging, limiting their practical applications in large-scale simulations. Here, we propose an efficient parallel algorithm for MPNN models, in which additional data communication is minimized among local atoms only in each MP layer without redundant computation, thus scaling linearly with the layer number. Integrated with our recursively embedded atom neural network model, this algorithm demonstrates excellent strong scaling and weak scaling behaviors in several benchmark systems. This approach enables massive molecular dynamics simulations on MPNN models as fast as on strictly local models for over 100 million atoms, vastly extending the applicability of the MPNN potential to an unprecedented scale. This general parallelization framework can empower various MPNN models to efficiently simulate very large and complex systems.



Article 1382

Title@2025-06-07 (6): Fast Causal Discovery by Approximate Kernel-based Generalized Score Functions with Linear Computational Complexity

Title: Fast Causal Discovery by Approximate Kernel-based Generalized Score Functions with Linear Computational Complexity Schnelle kausale Entdeckung durch ungefähre Kernel-basierte generalisierte Score-Funktionen mit linearer Computational Complexity 以近端内核为基础的通用计分函数及线性比较复杂度 2412.17717v2

Authors (6): Yixin Ren, Haocheng Zhang, Yewei Xia, Hao Zhang, Jihong Guan, Shuigeng Zhou

Score-based causal discovery methods can effectively identify causal relationships by evaluating candidate graphs and selecting the one with the highest score. One popular class of scores is kernel-based generalized score functions, which can adapt to a wide range of scenarios and work well in practice because they circumvent assumptions about causal mechanisms and data distributions. Despite these advantages, kernel-based generalized score functions pose serious computational challenges in time and space, with a time complexity of $\mathcal{O}(n^3)$ and a memory complexity of $\mathcal{O}(n^2)$, where $n$ is the sample size. In this paper, we propose an approximate kernel-based generalized score function with $\mathcal{O}(n)$ time and space complexities by using low-rank techniques and designing a set of rules to handle the complex composite matrix operations required to calculate the score, as well as developing sampling algorithms that handle diverse data types efficiently. Our extensive causal discovery experiments on both synthetic and real-world data demonstrate that, compared to the state-of-the-art method, our method can not only significantly reduce computational costs, but also achieve comparable accuracy, especially for large datasets.



Article 1383

Title@2025-06-07 (6): IMPA-HGAE:Intra-Meta-Path Augmented Heterogeneous Graph Autoencoder

Title: IMPA-HGAE:Intra-Meta-Path Augmented Heterogeneous Graph Autoencoder IMPA-HGAE:Intra-Meta-Path Augmented Heterogene Graph Autoencoder IMPA-HGAE: 甲基板内增增异异异形图形自动编码器 2506.06809v1

Authors (4): Di Lin, Wanjing Ren, Xuanbin Li, Rui Zhang

Self-supervised learning (SSL) methods have been increasingly applied to diverse downstream tasks due to their superior generalization capabilities and low annotation costs. However, most existing heterogeneous graph SSL models convert heterogeneous graphs into homogeneous ones via meta-paths for training, which only leverage information from nodes at both ends of meta-paths while underutilizing the heterogeneous node information along the meta-paths. To address this limitation, this paper proposes a novel framework named IMPA-HGAE to enhance target node embeddings by fully exploiting internal node information along meta-paths. Experimental results validate that IMPA-HGAE achieves superior performance on heterogeneous datasets. Furthermore, this paper introduces innovative masking strategies to strengthen the representational capacity of generative SSL models on heterogeneous graph data. Additionally, this paper discusses the interpretability of the proposed method and potential future directions for generative self-supervised learning in heterogeneous graphs. This work provides insights into leveraging meta-path-guided structural semantics for robust representation learning in complex graph scenarios.



Article 1384

Title@2025-06-07 (6): Label-semantics Aware Generative Approach for Domain-Agnostic Multilabel Classification

Title: Label-semantics Aware Generative Approach for Domain-Agnostic Multilabel Classification Label-Semantik Aware Generativer Ansatz für Domain-Agnostic Multilabel-Klassifikation 域-不可知性多标签分类的认知生成方法 2506.06806v1

Authors (5): Subhendu Khatuya, Shashwat Naidu, Saptarshi Ghosh, Pawan Goyal, Niloy Ganguly

The explosion of textual data has made manual document classification increasingly challenging. To address this, we introduce a robust, efficient domain-agnostic generative model framework for multi-label text classification. Instead of treating labels as mere atomic symbols, our approach utilizes predefined label descriptions and is trained to generate these descriptions based on the input text. During inference, the generated descriptions are matched to the pre-defined labels using a finetuned sentence transformer. We integrate this with a dual-objective loss function, combining cross-entropy loss and cosine similarity of the generated sentences with the predefined target descriptions, ensuring both semantic alignment and accuracy. Our proposed model LAGAMC stands out for its parameter efficiency and versatility across diverse datasets, making it well-suited for practical applications. We demonstrate the effectiveness of our proposed model by achieving new state-of-the-art performances across all evaluated datasets, surpassing several strong baselines. We achieve improvements of 13.94% in Micro-F1 and 24.85% in Macro-F1 compared to the closest baseline across all datasets.



Article 1385

Title@2025-06-07 (6): Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs

Title: Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs Dialog ohne Grenzen: Konstant-Sized KV-Caches für erweiterte Reaktionen in LLMs 无限制对话框: 普通 KV 宽度响应库( LLM ) 2503.00979v2

Authors (5): Ravi Ghadia, Avinash Kumar, Gaurav Jain, Prashant Nair, Poulami Das

Autoregressive Transformers rely on Key-Value (KV) caching to accelerate inference. However, the linear growth of the KV cache with context length leads to excessive memory consumption and bandwidth constraints. This bottleneck is particularly problematic in real-time applications – such as chatbots and interactive assistants – where low latency and high memory efficiency are critical. Existing methods drop distant tokens or compress states in a lossy manner, sacrificing accuracy by discarding vital context or introducing bias. We propose MorphKV, an inference-time technique that maintains a constant-sized KV cache while preserving accuracy. MorphKV balances long-range dependencies and local coherence during text generation. It eliminates early-token bias while retaining high-fidelity context by adaptively ranking tokens through correlation-aware selection. Unlike heuristic retention or lossy compression, MorphKV iteratively refines the KV cache via lightweight updates guided by attention patterns of recent tokens. This approach captures inter-token correlation with greater accuracy, crucial for tasks like content creation and code generation. Our studies on long-response tasks show 52.9% memory savings and 18.2% higher accuracy on average compared to state-of-the-art prior works, enabling efficient real-world deployment.



Article 1386

Title@2025-06-07 (6): Prediction-Enhanced Monte Carlo: A Machine Learning View on Control Variate

Title: Prediction-Enhanced Monte Carlo: A Machine Learning View on Control Variate Vorhersage-erweitert Monte Carlo: Eine Machine-Learning-Ansicht auf Steuerungsvariate 预测增强的蒙特卡洛:关于控制Variatte的机械学习观点 2412.11257v3

Authors (10): Fengpei Li, Haoxian Chen, Jiahe Lin, Arkin Gupta, Xiaowei Tan, Honglei Zhao, Gang Xu, Yuriy Nevmyvaka, Agostino Capponi, Henry Lam

For many complex simulation tasks spanning areas such as healthcare, engineering, and finance, Monte Carlo (MC) methods are invaluable due to their unbiased estimates and precise error quantification. Nevertheless, Monte Carlo simulations often become computationally prohibitive, especially for nested, multi-level, or path-dependent evaluations lacking effective variance reduction techniques. While machine learning (ML) surrogates appear as natural alternatives, naive replacements typically introduce unquantifiable biases. We address this challenge by introducing Prediction-Enhanced Monte Carlo (PEMC), a framework that leverages modern ML models as learned predictors, using cheap and parallelizable simulation as features, to output unbiased evaluation with reduced variance and runtime. PEMC can also be viewed as a “modernized” view of control variates, where we consider the overall computation-cost-aware variance reduction instead of per-replication reduction, while bypassing the closed-form mean function requirement and maintaining the advantageous unbiasedness and uncertainty quantifiability of Monte Carlo. We illustrate PEMC’s broader efficacy and versatility through three examples: first, equity derivatives such as variance swaps under stochastic local volatility models; second, interest rate derivatives such as swaption pricing under the Heath-Jarrow-Morton (HJM) interest-rate model. Finally, we showcase PEMC in a socially significant context - ambulance dispatch and hospital load balancing - where accurate mortality rate estimates are key for ethically sensitive decision-making. Across these diverse scenarios, PEMC consistently reduces variance while preserving unbiasedness, highlighting its potential as a powerful enhancement to standard Monte Carlo baselines.
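
For context, the classical control variate that the abstract above says PEMC modernizes can be sketched as follows. This is only the textbook baseline, which requires a control whose mean is known in closed form (exactly the requirement PEMC is described as bypassing); it is not the PEMC algorithm. The function name and the toy integrand are assumptions made for illustration.

```python
import math
import random
import statistics


def control_variate_estimate(y, c, c_mean):
    """Classical control-variate estimate of E[Y].

    y      : samples of the quantity of interest
    c      : samples of a correlated control, drawn from the same runs
    c_mean : the control's known mean (assumed available in closed form)

    Returns (estimate, estimated variance of the estimate). The coefficient
    beta is fitted on the same samples, which adds a small bias that
    vanishes as the sample size grows.
    """
    n = len(y)
    y_bar = statistics.fmean(y)
    c_bar = statistics.fmean(c)
    cov_yc = sum((yi - y_bar) * (ci - c_bar) for yi, ci in zip(y, c)) / (n - 1)
    beta = cov_yc / statistics.variance(c)
    adjusted = [yi - beta * (ci - c_mean) for yi, ci in zip(y, c)]
    return statistics.fmean(adjusted), statistics.variance(adjusted) / n


# Toy example: estimate E[exp(U)] for U ~ Uniform(0, 1), using U itself
# (known mean 0.5) as the control. The true value is e - 1.
rng = random.Random(0)
u = [rng.random() for _ in range(10_000)]
y = [math.exp(v) for v in u]
estimate, est_variance = control_variate_estimate(y, u, 0.5)
plain_variance = statistics.variance(y) / len(y)
```

Because `exp(U)` and `U` are strongly correlated, the adjusted estimator has a much smaller variance than the plain sample mean while targeting the same expectation.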



Article 1387

Title@2025-06-07 (6): EasyFS: an Efficient Model-free Feature Selection Framework via Elastic Transformation of Features

Title: EasyFS: an Efficient Model-free Feature Selection Framework via Elastic Transformation of Features EasyFS: Ein effizientes modellfreies Feature Selection Framework über die elastische Transformation von Features EsimpFS:通过地物的弹性变异,建立一个高效的无示范模式的地物选择框架 2402.05954v2

Authors (4): Jianming Lv, Sijun Xia, Depin Liang, Wei Chen

Traditional model-free feature selection methods treat each feature independently while disregarding the interrelationships among features, which leads to relatively poor performance compared with the model-aware methods. To address this challenge, we propose an efficient model-free feature selection framework via elastic expansion and compression of the features, namely EasyFS, to achieve better performance than state-of-the-art model-aware methods while sharing the efficiency and flexibility of existing model-free methods. In particular, EasyFS expands the feature space with a random non-linear projection network to obtain non-linear combinations of the original features, so as to model the interrelationships among the features and discover the most correlated features. Meanwhile, a novel redundancy measurement based on the change of coding rate is proposed for efficient filtering of redundant features. Comprehensive experiments on 21 different datasets show that EasyFS outperforms state-of-the-art methods by up to 10.9% in the regression tasks and 5.7% in the classification tasks while saving more than 94% of the time.



Article 1388

Title@2025-06-07 (6): Refining Adaptive Zeroth-Order Optimization at Ease

Title: Refining Adaptive Zeroth-Order Optimization at Ease Verfeinerung adaptiver Zeroth-Order-Optimierung bei Leichtigkeit Ease 的适应性零极优化 2502.01014v2

Authors (4): Yao Shu, Qixin Zhang, Kun He, Zhongxiang Dai

Zeroth-order (ZO) optimization has recently come to play an essential role in scenarios where gradient information is inaccessible or unaffordable, such as black-box systems and resource-constrained environments. While existing adaptive methods such as ZO-AdaMM have shown promise, they are fundamentally limited by their underutilization of moment information during optimization, usually resulting in underperforming convergence. To overcome these limitations, this paper introduces Refined Adaptive Zeroth-Order Optimization (R-AdaZO). Specifically, we first show the untapped variance reduction effect of the first moment estimate on ZO gradient estimation, which improves the accuracy and stability of ZO updates. We then refine the second moment estimate based on these variance-reduced gradient estimates to better capture the geometry of the optimization landscape, enabling a more effective scaling of ZO updates. We present rigorous theoretical analysis to show (a) the first analysis of the variance reduction of the first moment estimate in ZO optimization, (b) improved second moment estimates that more accurately approximate their variance-free ideal, (c) the first variance-aware convergence framework for adaptive ZO methods, which may be of independent interest, and (d) the faster convergence of R-AdaZO than existing baselines like ZO-AdaMM. Our extensive experiments, including synthetic problems, black-box adversarial attack, and memory-efficient fine-tuning of large language models (LLMs), further verify the superior convergence of R-AdaZO, indicating that R-AdaZO offers an improved solution for real-world ZO optimization challenges.
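
To make the setting concrete, here is a minimal sketch of a standard Gaussian-smoothing ZO gradient estimator together with an exponential-moving-average first moment of the kind adaptive ZO methods maintain. It assumes only function evaluations are available. The names `zo_gradient` and `ema_update` and all default values are hypothetical, and the sketch does not implement R-AdaZO's refined second moment.

```python
import random


def zo_gradient(f, x, mu=1e-3, num_dirs=20, seed=0):
    """Estimate the gradient of f at x from function values only.

    Averages forward-difference slopes along random Gaussian directions:
    g ~ mean_u [ (f(x + mu * u) - f(x)) / mu * u ].
    """
    rng = random.Random(seed)
    dim = len(x)
    fx = f(x)
    grad = [0.0] * dim
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (f(shifted) - fx) / mu
        for i in range(dim):
            grad[i] += slope * u[i] / num_dirs
    return grad


def ema_update(moment, grad, beta=0.9):
    """First-moment update m <- beta * m + (1 - beta) * g.

    Averaging noisy ZO estimates over steps is what damps their variance.
    """
    return [beta * m + (1.0 - beta) * g for m, g in zip(moment, grad)]


# Example: the true gradient of sum(v_i^2) at x is 2 * x.
quadratic = lambda v: sum(t * t for t in v)
x0 = [1.0, -2.0, 0.5]
g_hat = zo_gradient(quadratic, x0, num_dirs=5000)
m1 = ema_update([0.0, 0.0, 0.0], g_hat)
```

A single-direction estimate is very noisy, which is why the example averages many directions; in an optimizer the moving average plays a similar smoothing role across iterations.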



Article 1389

Title@2025-06-07 (6): Is Optimal Transport Necessary for Inverse Reinforcement Learning?

Title: Is Optimal Transport Necessary for Inverse Reinforcement Learning? Ist ein optimaler Verkehr für das Inverse Verstärkungslernen notwendig? 最佳交通是否为反强化学习所必需? 2506.06793v1

Authors (3): Zixuan Dong, Yumi Omori, Keith Ross

Inverse Reinforcement Learning (IRL) aims to recover a reward function from expert demonstrations. Recently, Optimal Transport (OT) methods have been successfully deployed to align trajectories and infer rewards. While OT-based methods have shown strong empirical results, they introduce algorithmic complexity and hyperparameter sensitivity, and they require solving OT optimization problems. In this work, we challenge the necessity of OT in IRL by proposing two simple, heuristic alternatives: (1) Minimum-Distance Reward, which assigns rewards based on the nearest expert state regardless of temporal order; and (2) Segment-Matching Reward, which incorporates lightweight temporal alignment by matching agent states to corresponding segments in the expert trajectory. These methods avoid optimization, exhibit linear-time complexity, and are easy to implement. Through extensive evaluations across 32 online and offline benchmarks with three reinforcement learning algorithms, we show that our simple rewards match or outperform recent OT-based approaches. Our findings suggest that the core benefits of OT may arise from basic proximity alignment rather than its optimal coupling formulation, advocating for reevaluation of complexity in future IRL design.
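
The two heuristic rewards described above can be sketched roughly as follows. The abstract does not give the exact definitions, so the Euclidean distance, the negative-distance reward, and the proportional segment boundaries are all assumptions, and the function names are hypothetical. The brute-force nearest-state search is written for clarity rather than for the efficiency the paper reports.

```python
import math


def minimum_distance_reward(agent_states, expert_states):
    """Reward each agent state by its negative distance to the nearest
    expert state, ignoring temporal order (assumed Euclidean distance)."""
    return [-min(math.dist(s, e) for e in expert_states) for s in agent_states]


def segment_matching_reward(agent_states, expert_states):
    """Reward each agent state by its negative distance to the nearest
    expert state inside the expert segment aligned with its time step.

    Assumption: the expert trajectory is split into as many contiguous,
    roughly equal segments as there are agent steps.
    """
    n_agent, n_expert = len(agent_states), len(expert_states)
    rewards = []
    for t, s in enumerate(agent_states):
        lo = (t * n_expert) // n_agent
        hi = max(lo + 1, ((t + 1) * n_expert) // n_agent)
        rewards.append(-min(math.dist(s, e) for e in expert_states[lo:hi]))
    return rewards


# An agent that visits the expert's states in reverse order is rewarded
# fully by the order-agnostic reward but penalized by segment matching.
expert = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
agent = [(2.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
r_min = minimum_distance_reward(agent, expert)
r_seg = segment_matching_reward(agent, expert)
```

The reversed-trajectory example highlights the design difference: only the segment-matching variant is sensitive to when a state is visited.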



Article 1390

Title@2025-06-07 (6): Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models

Title: Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models Ferret: Federated Full-Parameter Tuning auf Scale für große Sprachmodelle Ferret:在大语言模型规模上采用联邦全票全票制表 2409.06277v3

Authors (5): Yao Shu, Wenyang Hu, See-Kiong Ng, Bryan Kian Hsiang Low, Fei Richard Yu

Large Language Models (LLMs) have become indispensable in numerous real-world applications. However, fine-tuning these models at scale, especially in federated settings where data privacy and communication efficiency are critical, presents significant challenges. Existing approaches often resort to parameter-efficient fine-tuning (PEFT) to mitigate communication overhead, but this typically comes at the cost of model accuracy. To this end, we propose federated full-parameter tuning at scale for LLMs (Ferret), the first first-order method with shared randomness to enable scalable full-parameter tuning of LLMs across decentralized data sources while maintaining competitive model accuracy. Ferret accomplishes this through three aspects: (i) it employs widely used first-order methods for efficient local updates; (ii) it projects these updates into a low-dimensional space to considerably reduce communication overhead; and (iii) it reconstructs local updates from this low-dimensional space with shared randomness to facilitate effective full-parameter global aggregation, ensuring fast convergence and competitive final performance. Our rigorous theoretical analyses and insights, along with extensive experiments, show that Ferret significantly enhances the scalability of existing federated full-parameter tuning approaches by achieving high computational efficiency, reduced communication overhead, and fast convergence, all while maintaining competitive model accuracy. Our implementation is available at https://github.com/allen4747/Ferret.
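
Steps (ii) and (iii) above can be illustrated with a generic random-projection sketch: client and server regenerate the same random basis from a shared seed, so only a handful of coefficients has to be communicated. The Gaussian basis, the scaling, and every name below are assumptions made for illustration; Ferret's actual projection and reconstruction may differ.

```python
import math
import random


def _shared_basis(seed, k, dim):
    """Regenerate the same k x dim Gaussian basis from a shared seed."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)] for _ in range(k)]


def project_update(delta, seed, k):
    """Client side: compress a full update into k coefficients."""
    basis = _shared_basis(seed, k, len(delta))
    return [sum(b * d for b, d in zip(row, delta)) for row in basis]


def reconstruct_update(coeffs, seed, dim):
    """Server side: rebuild an approximate full update from the
    coefficients, using the basis regenerated from the same seed."""
    basis = _shared_basis(seed, len(coeffs), dim)
    return [sum(basis[j][i] * coeffs[j] for j in range(len(coeffs)))
            for i in range(dim)]


# A 4-dimensional update is sent as 2 numbers plus the shared seed.
delta = [1.0, -2.0, 0.5, 3.0]
coeffs = project_update(delta, seed=42, k=2)
approx = reconstruct_update(coeffs, seed=42, dim=len(delta))
alignment = sum(a * d for a, d in zip(approx, delta))
```

With this scaling the reconstruction equals the original update in expectation over the random basis, and its inner product with the original update is the squared norm of the coefficients, so it never points against the true update.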



Article 1391

Title@2025-06-07 (6): FuncGNN: Learning Functional Semantics of Logic Circuits with Graph Neural Networks

Title: FuncGNN: Learning Functional Semantics of Logic Circuits with Graph Neural Networks FuncGNN: Funktionelle Semantik von Logic Circuits mit Graph Neural Networks lernen FuncGNN: 学习具有图形神经网络的逻辑电路功能性词义 2506.06787v1

Authors (1): Qiyun Zhao

As integrated circuit scale grows and design complexity rises, effective circuit representation helps support logic synthesis, formal verification, and other automated processes in electronic design automation. And-Inverter Graphs (AIGs), as a compact and canonical structure, are widely adopted for representing Boolean logic in these workflows. However, the increasing complexity and integration density of modern circuits introduce structural heterogeneity and global logic information loss in AIGs, posing significant challenges to accurate circuit modeling. To address these issues, we propose FuncGNN, which integrates hybrid feature aggregation to extract multi-granularity topological patterns, thereby mitigating structural heterogeneity and enhancing logic circuit representations. FuncGNN further introduces gate-aware normalization that adapts to circuit-specific gate distributions, improving robustness to structural heterogeneity. Finally, FuncGNN employs multi-layer integration to merge intermediate features across layers, effectively synthesizing local and global semantic information for comprehensive logic representations. Experimental results on two logic-level analysis tasks (i.e., signal probability prediction and truth-table distance prediction) demonstrate that FuncGNN outperforms existing state-of-the-art methods, achieving improvements of 2.06% and 18.71%, respectively, while reducing training time by approximately 50.6% and GPU memory usage by about 32.8%.



Article 1392

Title@2025-06-07 (6): Caterpillar GNN: Replacing Message Passing with Efficient Aggregation

Title: Caterpillar GNN: Replacing Message Passing with Efficient Aggregation Caterpillar GNN: Ersetzen der Nachricht durch effiziente Aggregation GNN: 以高效聚合取代信件传递 2506.06784v1

Authors (1): Marek Černý

Message-passing graph neural networks (MPGNNs) dominate modern graph learning, typically prioritizing maximal expressive power. In contrast, we introduce an efficient aggregation mechanism, deliberately trading off some expressivity for stronger and more structured aggregation capabilities. Our approach allows seamless scaling between classical message-passing and simpler methods based on colored or plain walks. We rigorously characterize the expressive power at each intermediate step using homomorphism counts from a hierarchy of generalized caterpillar graphs. Based on this foundation, we propose the Caterpillar GNN, whose robust graph-level aggregation enables it to successfully tackle a synthetic graph-level task specifically designed to be challenging for classical MPGNNs. Moreover, we demonstrate that, on real-world datasets, the Caterpillar GNN achieves comparable predictive performance while significantly reducing the number of nodes in the hidden layers of the computational graph.



Article 1393

Title@2025-06-07 (6): Interpretable LLMs for Credit Risk: A Systematic Review and Taxonomy

Title: Interpretable LLMs for Credit Risk: A Systematic Review and Taxonomy Interpretierbare LLMs für Kreditrisiken: Systematische Überprüfung und Taxonomie 信用风险:系统审查和分类 2506.04290v2

Authors (2): Muhammed Golec, Maha AlabdulJalil

Large Language Models (LLMs), which have developed in recent years, enable credit risk assessment through the analysis of financial texts such as analyst reports and corporate disclosures. This paper presents the first systematic review and taxonomy focusing on LLM-based approaches in credit risk estimation. We determined the basic model architectures by selecting 60 relevant papers published between 2020 and 2025 with the PRISMA research strategy. We also examined the data used for scenarios such as credit default prediction and risk analysis. Since the main focus of the paper is interpretability, we classify concepts such as explainability mechanisms, chain-of-thought prompts, and natural language justifications for LLM-based credit models. The taxonomy organizes the literature under four main headings: model architectures, data types, explainability mechanisms, and application areas. Based on this analysis, we highlight the main future trends and research gaps for LLM-based credit scoring systems. This paper aims to be a reference paper for artificial intelligence and financial researchers.



Article 1394

Title@2025-06-07 (6): Feature-Based Instance Neighbor Discovery: Advanced Stable Test-Time Adaptation in Dynamic World

Title: Feature-Based Instance Neighbor Discovery: Advanced Stable Test-Time Adaptation in Dynamic World Feature-Based Instance Neighbor Discovery: Erweiterte stabile Test-Time-Anpassung in dynamischer Welt 以地貌为基础的实例邻国发现:动态世界中的高级稳定测试时间适应 2506.06782v1

Authors (7): Qinting Jiang, Chuyang Ye, Dongyan Wei, Bingli Wang, Yuan Xue, Jingyan Jiang, Zhi Wang

Despite progress, deep neural networks still suffer performance declines under distribution shifts between training and test domains, leading to a substantial decrease in Quality of Experience (QoE) for applications. Existing test-time adaptation (TTA) methods are challenged by dynamic, multiple test distributions within batches. We observe that feature distributions across different domains inherently cluster into distinct groups with varying means and variances. This divergence reveals a critical limitation of previous global normalization strategies in TTA, which inevitably distort the original data characteristics. Based on this insight, we propose Feature-based Instance Neighbor Discovery (FIND), which comprises three key components: Layer-wise Feature Disentanglement (LFD), Feature Aware Batch Normalization (FABN), and Selective FABN (S-FABN). LFD stably captures features with similar distributions at each layer by constructing graph structures, while FABN optimally combines source statistics with test-time distribution-specific statistics for robust feature representation. Finally, S-FABN determines which layers require feature partitioning and which can remain unified, thereby enhancing inference efficiency. Extensive experiments demonstrate that FIND significantly outperforms existing methods, achieving a 30% accuracy improvement in dynamic scenarios while maintaining computational efficiency.



Article 1395

Title@2025-06-07 (6): Continuous-Time SO(3) Forecasting with Savitzky–Golay Neural Controlled Differential Equations

Title: Continuous-Time SO(3) Forecasting with Savitzky–Golay Neural Controlled Differential Equations Continuous-Time SO(3) Vorhersage mit Savitzky–Golay Neural gesteuerte Differentialgleichungen 连续时间 SO(3) 使用Savitzky-Golay 控制神经控制的神经等量的预测 2506.06780v1

Authors (4): Lennart Bastian, Mohammad Rashed, Nassir Navab, Tolga Birdal

Tracking and forecasting the rotation of objects is fundamental in computer vision and robotics, yet SO(3) extrapolation remains challenging as (1) sensor observations can be noisy and sparse, (2) motion patterns can be governed by complex dynamics, and (3) application settings can demand long-term forecasting. This work proposes modeling continuous-time rotational object dynamics on $SO(3)$ using Neural Controlled Differential Equations guided by Savitzky-Golay paths. Unlike existing methods that rely on simplified motion assumptions, our method learns a general latent dynamical system of the underlying object trajectory while respecting the geometric structure of rotations. Experimental results on real-world data demonstrate compelling forecasting capabilities compared to existing approaches.



Article 1396

Title@2025-06-07 (6): Continuous Semi-Implicit Models

Title: Continuous Semi-Implicit Models Kontinuierliche semiimplizite Modelle 连续半集成模型 2506.06778v1

Authors (7): Longlin Yu, Jiajun Zha, Tong Yang, Tianyu Xie, Xiangyu Zhang, S. -H. Gary Chan, Cheng Zhang

Semi-implicit distributions have shown great promise in variational inference and generative modeling. Hierarchical semi-implicit models, which stack multiple semi-implicit layers, enhance the expressiveness of semi-implicit distributions and can be used to accelerate diffusion models given pretrained score networks. However, their sequential training often suffers from slow convergence. In this paper, we introduce CoSIM, a continuous semi-implicit model that extends hierarchical semi-implicit models into a continuous framework. By incorporating a continuous transition kernel, CoSIM enables efficient, simulation-free training. Furthermore, we show that CoSIM achieves consistency with a carefully designed transition kernel, offering a novel approach for multistep distillation of generative models at the distributional level. Extensive experiments on image generation demonstrate that CoSIM performs on par with or better than existing diffusion model acceleration methods, achieving superior performance on FD-DINOv2.



Article 1397

Title@2025-06-07 (6): Label Distribution Learning using the Squared Neural Family on the Probability Simplex

Title: Label Distribution Learning using the Squared Neural Family on the Probability Simplex Label-Distribution-Lernen mit der Quadratischen Neural-Familie auf der Wahrscheinlichkeit Simplex 利用 “ 概率简单 “ 上的平方神经家庭进行标签分布学习 2412.07324v2

Authors (3): Daokun Zhang, Russell Tsuchida, Dino Sejdinovic

Label distribution learning (LDL) provides a framework wherein a distribution over categories rather than a single category is predicted, with the aim of addressing ambiguity in labeled data. Existing research on LDL mainly focuses on the task of point estimation, i.e., finding an optimal distribution in the probability simplex conditioned on the given sample. In this paper, we propose a novel label distribution learning model SNEFY-LDL, which estimates a probability distribution of all possible label distributions over the simplex, by unleashing the expressive power of the recently introduced Squared Neural Family (SNEFY), a new class of tractable probability models. As a way to summarize the fitted model, we derive the closed-form label distribution mean, variance and covariance conditioned on the given sample, which can be used to predict the ground-truth label distributions, construct label distribution confidence intervals, and measure the correlations between different labels. Moreover, more information about the label distribution prediction uncertainties can be acquired from the modeled probability density function. Extensive experiments on conformal prediction, active learning and ensemble learning are conducted, verifying SNEFY-LDL’s great effectiveness in LDL uncertainty quantification. The source code of this paper is available at https://github.com/daokunzhang/SNEFY-LDL.



Article 1398

Title@2025-06-07 (6): Taming Wild Branches: Overcoming Hard-to-Predict Branches using the Bullseye Predictor

Title: Taming Wild Branches: Overcoming Hard-to-Predict Branches using the Bullseye Predictor Wilde Zweige zähmen: Hard-to-Predict-Zweige mit dem Bullseye Predictor überwinden 塔姆荒野分支:利用 “ 红心 “ 预测仪克服难以预防的分支 2506.06773v1

Authors (3): Emet Behrendt, Shing Wai Pun, Prashant J. Nair

Branch prediction is key to the performance of out-of-order processors. While the CBP-2016 winner TAGE-SC-L combines geometric-history tables, a statistical corrector, and a loop predictor, over half of its remaining mispredictions stem from a small set of hard-to-predict (H2P) branches. These branches occur under diverse global histories, causing repeated thrashing in TAGE and eviction before usefulness counters can mature. Prior work shows that simply enlarging the tables offers only marginal improvement. We augment a 159 KB TAGE-SC-L predictor with a 28 KB H2P-targeted subsystem called the Bullseye predictor. It identifies problematic PCs using a set-associative H2P Identification Table (HIT) and steers them to one of two branch-specific perceptrons, one indexed by hashed local history and the other by folded global history. A short trial phase tracks head-to-head accuracy in an H2P cache. A branch becomes perceptron-resident only if the perceptron’s sustained accuracy and output magnitude exceed dynamic thresholds, after which TAGE updates for that PC are suppressed to reduce pollution. The HIT, cache, and perceptron operate fully in parallel with TAGE-SC-L, providing higher fidelity on the H2P tail. This achieves an average MPKI of 3.4045 and CycWpPKI of 145.09.



Article 1399

Title@2025-06-07 (6): On Adversarial Robustness of Language Models in Transfer Learning

Title: On Adversarial Robustness of Language Models in Transfer Learning Zur widerlichen Robustheit von Sprachmodellen im Transfer-Lernen 转让学习中语言模式的反效果 2501.00066v2

Authors (4): Bohdan Turbal, Anastasiia Mazur, Jiaxu Zhao, Mykola Pechenizkiy

We investigate the adversarial robustness of LLMs in transfer learning scenarios. Through comprehensive experiments on multiple datasets (MBIB Hate Speech, MBIB Political Bias, MBIB Gender Bias) and various model architectures (BERT, RoBERTa, GPT-2, Gemma, Phi), we reveal that transfer learning, while improving standard performance metrics, often leads to increased vulnerability to adversarial attacks. Our findings demonstrate that larger models exhibit greater resilience to this phenomenon, suggesting a complex interplay between model size, architecture, and adaptation methods. Our work highlights the crucial need for considering adversarial robustness in transfer learning scenarios and provides insights into maintaining model security without compromising performance. These findings have significant implications for the development and deployment of LLMs in real-world applications where both performance and robustness are paramount.



Article 1400

Title@2025-06-07 (6): AfroBench: How Good are Large Language Models on African Languages?

Title: AfroBench: How Good are Large Language Models on African Languages? AfroBench: Wie gut sind große Sprachmodelle auf afrikanischen Sprachen? AfroBench:非洲语言大语言模式有多好? 2311.07978v5

Authors (7): Jessica Ojo, Odunayo Ogundepo, Akintunde Oladipo, Kelechi Ogueji, Jimmy Lin, Pontus Stenetorp, David Ifeoluwa Adelani

Large-scale multilingual evaluations, such as MEGA, often include only a handful of African languages due to the scarcity of high-quality evaluation data and the limited discoverability of existing African datasets. This lack of representation hinders comprehensive LLM evaluation across a diverse range of languages and tasks. To address these challenges, we introduce AfroBench – a multi-task benchmark for evaluating the performance of LLMs across 64 African languages, 15 tasks and 22 datasets. AfroBench consists of nine natural language understanding datasets, six text generation datasets, six knowledge and question answering tasks, and one mathematical reasoning task. We present results comparing the performance of prompting LLMs to fine-tuned baselines based on BERT and T5-style models. Our results suggest large gaps in performance between high-resource languages, such as English, and African languages across most tasks; but performance also varies based on the availability of monolingual data resources. Our findings confirm that performance on African languages remains a hurdle for current LLMs, underscoring the need for additional efforts to close this gap. https://mcgill-nlp.github.io/AfroBench/



Article 1401

Title@2025-06-07 (6): Xeno-learning: knowledge transfer across species in deep learning-based spectral image analysis

Title: Xeno-learning: knowledge transfer across species in deep learning-based spectral image analysis Xeno-Learning: Wissenstransfer über Arten hinweg in der Deep Learning-basierten Spektralbildanalyse Xeno-学习:深学习光谱图象分析中的跨物种知识转让 2410.19789v2

Authors (19): Jan Sellner, Alexander Studier-Fischer, Ahmad Bin Qasim, Silvia Seidlitz, Nicholas Schreck, Minu Tizabi, Manuel Wiesenfarth, Annette Kopp-Schneider, Janne Heinecke, Jule Brandt, Samuel Knödler, Caelan Max Haney, Gabriel Salg, Berkin Özdemir, Maximilian Dietrich, Maurice Stephan Michel, Felix Nickel, Karl-Friedrich Kowalewski, Lena Maier-Hein

Novel optical imaging techniques, such as hyperspectral imaging (HSI) combined with machine learning-based (ML) analysis, have the potential to revolutionize clinical surgical imaging. However, these novel modalities face a shortage of large-scale, representative clinical data for training ML algorithms, while preclinical animal data is abundantly available through standardized experiments and allows for controlled induction of pathological tissue states, which is not ethically possible in patients. To leverage this situation, we propose a novel concept called “xeno-learning”, a cross-species knowledge transfer paradigm inspired by xeno-transplantation, where organs from a donor species are transplanted into a recipient species. Using a total of 13,874 HSI images from humans as well as porcine and rat models, we show that although spectral signatures of organs differ substantially across species, relative changes resulting from pathologies or surgical manipulation (e.g., malperfusion; injection of contrast agent) are comparable. Such changes learnt in one species can thus be transferred to a new species via a novel “physiology-based data augmentation” method, enabling the large-scale secondary use of preclinical animal data for humans. The resulting ethical, monetary, and performance benefits promise a high impact of the proposed knowledge transfer paradigm on future developments in the field.



Article 1402

Title@2025-06-07 (6): The OCR Quest for Generalization: Learning to recognize low-resource alphabets with model editing

Title: The OCR Quest for Generalization: Learning to recognize low-resource alphabets with model editing OCR Quest for Generalization: Lernen, Low-Resource-Alphabets mit Modellbearbeitung zu erkennen OCR 普遍化调查:学会以示范编辑方式识别低资源字母表 2506.06761v1

Authors (3): Adrià Molina Rodríguez, Oriol Ramos Terrades, Josep Lladós

Achieving robustness in recognition systems across diverse domains is crucial for their practical utility. While ample data availability is usually assumed, low-resource languages, such as ancient manuscripts and non-western languages, tend to be kept out of the equations of massive pretraining and foundational techniques due to underrepresentation. In this work, we aim to build models that can generalize to new distributions of data, such as alphabets, faster than centralized fine-tuning strategies. To do so, we take advantage of recent advancements in model editing to enhance the incorporation of unseen scripts (low-resource learning). In contrast to state-of-the-art meta-learning, we showcase the effectiveness of domain merging in sparse distributions of data, agnostic to its relation to the overall distribution or any other prototyping necessity. Even when using the exact same training data, our experiments showcase significant performance boosts in transfer learning to new alphabets and out-of-domain evaluation in challenging domain shifts, including historical ciphered texts and non-Latin scripts. This research contributes a novel approach to building models that can easily adopt under-represented alphabets and, therefore, extend document recognition to a wider set of contexts and cultures.



Article 1403

Title@2025-06-07 (6): Bio-Inspired Classification: Combining Information Theory and Spiking Neural Networks – Influence of the Learning Rules

Title: Bio-Inspired Classification: Combining Information Theory and Spiking Neural Networks – Influence of the Learning Rules Bio-Inspirierte Klassifikation: Verknüpfung von Informationstheorie und spikenden Neuronalen Netzwerken – Einfluss der Lernregeln 生物启发分类:信息理论与Spiking神经网络相结合 – – 学习规则的影响 2506.06750v1

Authors (3): Zofia Rudnicka, Janusz Szczepanski, Agnieszka Pregowska

Training Spiking Neural Networks (SNNs) is challenging due to their unique properties, including temporal dynamics, non-differentiability of spike events, and sparse event-driven activations. In this paper, we broadly examine how the choice of learning algorithm, including bioinspired learning rules, affects classification accuracy. We propose a bioinspired classifier based on the combination of SNNs and Lempel-Ziv complexity (LZC). This approach synergizes the strengths of SNNs in temporal precision and biological realism with LZC’s structural complexity analysis, facilitating efficient and interpretable classification of spatiotemporal neural data. We find that the classic backpropagation algorithm achieves excellent classification accuracy, but at extremely high computational cost, which makes it impractical for real-time applications. Biologically inspired learning algorithms such as tempotron and SpikeProp provide increased computational efficiency while maintaining competitive classification performance, making them suitable for time-sensitive tasks. The results obtained indicate that the selection of the most appropriate learning algorithm depends on the trade-off between classification accuracy and computational cost as well as application constraints.
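
To make the Lempel-Ziv complexity component concrete, the sketch below counts the phrases in a left-to-right parse of a binary sequence, where each new phrase is the shortest substring not already seen in the preceding text. It is a straightforward (quadratic-time or worse) illustration; how the paper binarizes spike trains and feeds the resulting value to a classifier is not specified in the abstract, so treating a spike train as a `"0"`/`"1"` string is an assumption.

```python
def lempel_ziv_complexity(sequence: str) -> int:
    """Count phrases in a Lempel-Ziv (1976-style) parse of `sequence`.

    Each phrase is extended until it no longer occurs in the text that
    precedes its last symbol; a trailing incomplete phrase is still counted.
    """
    i, phrases, n = 0, 0, len(sequence)
    while i < n:
        length = 1
        # Grow the candidate phrase while it can be copied from earlier text.
        while i + length <= n and sequence[i:i + length] in sequence[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

# A binarized spike train (1 = spike in the time bin); hypothetical data.
spike_train = "0001101001000101"
print(lempel_ziv_complexity(spike_train))  # parse: 0|001|10|100|1000|101 -> 6
```

Regular sequences parse into few phrases and irregular ones into many, which is why the count can serve as a compact feature of temporal structure.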



Article 1404

Title@2025-06-07 (6): Honey, I shrunk the hypothesis space (through logical preprocessing)

Title: Honey, I shrunk the hypothesis space (through logical preprocessing) Schatz, ich habe den Hypothesenraum geschrumpft (durch logische Vorverarbeitung) 亲爱的,我缩小了假设空间(通过逻辑预处理) 2506.06739v1

Authors (3): Andrew Cropper, Filipe Gouveia, David M. Cerna

Inductive logic programming (ILP) is a form of logical machine learning. The goal is to search a hypothesis space for a hypothesis that generalises training examples and background knowledge. We introduce an approach that ‘shrinks’ the hypothesis space before an ILP system searches it. Our approach uses background knowledge to find rules that cannot be in an optimal hypothesis regardless of the training examples. For instance, our approach discovers relationships such as “even numbers cannot be odd” and “prime numbers greater than 2 are odd”. It then removes violating rules from the hypothesis space. We implement our approach using answer set programming and use it to shrink the hypothesis space of a constraint-based ILP system. Our experiments on multiple domains, including visual reasoning and game playing, show that our approach can substantially reduce learning times whilst maintaining predictive accuracies. For instance, given just 10 seconds of preprocessing time, our approach can reduce learning times from over 10 hours to only 2 seconds.
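
The pruning idea can be sketched in a few lines. The abstract states that the authors implement their approach with answer set programming; the Python below is only an illustration, and the predicate names, the two constraint lists, and the rule encoding (a rule body as a set of predicate names over one variable) are all assumptions.

```python
from itertools import combinations

PREDICATES = ["even", "odd", "prime", "gt2"]

# Hypothetical facts discovered from background knowledge during preprocessing.
EXCLUSIVE = [frozenset({"even", "odd"})]           # can never hold together
IMPLIED = [(frozenset({"prime", "gt2"}), "odd")]   # premise entails the literal

def candidate_bodies(predicates, max_size):
    """Enumerate every rule body with up to `max_size` literals."""
    for size in range(1, max_size + 1):
        for body in combinations(predicates, size):
            yield frozenset(body)

def is_pointless(body):
    """A body is pruned if it is unsatisfiable or contains a redundant literal."""
    if any(pair <= body for pair in EXCLUSIVE):
        return True
    return any(premise <= body and literal in body for premise, literal in IMPLIED)

def shrink(bodies):
    """Remove bodies that cannot appear in an optimal hypothesis."""
    return [body for body in bodies if not is_pointless(body)]

space = list(candidate_bodies(PREDICATES, max_size=3))
shrunk = shrink(space)
print(len(space), "->", len(shrunk))  # 14 -> 10
```

Because the pruning depends only on background knowledge, it can run once before any training examples are seen.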



Article 1405

Title@2025-06-07 (6): WorldLLM: Improving LLMs’ world modeling using curiosity-driven theory-making

Title: WorldLLM: Improving LLMs’ world modeling using curiosity-driven theory-making WorldLLM: Verbesserung der Weltmodellierung von LLMs durch kuriositätsgetriebene Theoriebildung WorldLLM:利用由好奇力驱动的理论制造改进LLMs的世界模型 2506.06725v1

Authors (5): Guillaume Levy, Cedric Colas, Pierre-Yves Oudeyer, Thomas Carta, Clement Romac

Large Language Models (LLMs) possess general world knowledge but often struggle to generate precise predictions in structured, domain-specific contexts such as simulations. These limitations arise from their inability to ground their broad, unstructured understanding in specific environments. To address this, we present WorldLLM, a framework that enhances LLM-based world modeling by combining Bayesian inference and autonomous active exploration with reinforcement learning. WorldLLM leverages the in-context learning abilities of LLMs to guide an LLM-based world model’s predictions using natural language hypotheses given in its prompt. These hypotheses are iteratively refined through a Bayesian inference framework that leverages a second LLM as the proposal distribution given collected evidence. This evidence is collected using a curiosity-driven reinforcement learning policy that explores the environment to find transitions with a low log-likelihood under our LLM-based predictive model using the current hypotheses. By alternating between refining hypotheses and collecting new evidence, our framework autonomously drives continual improvement of the predictions. Our experiments demonstrate the effectiveness of WorldLLM in a textual game environment that requires agents to manipulate and combine objects. The framework not only enhances predictive accuracy, but also generates human-interpretable theories of environment dynamics.



Article 1406

Title@2025-06-07 (6): Directly Forecasting Belief for Reinforcement Learning with Delays

Title: Directly Forecasting Belief for Reinforcement Learning with Delays Direkt voraussagender Glaube für verstärktes Lernen mit Verzögerungen 直接预测学习迟迟加强学习的信念 2505.00546v2

Authors (9): Qingyuan Wu, Yuhui Wang, Simon Sinong Zhan, Yixuan Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang

Reinforcement learning (RL) with delays is challenging as sensory perceptions lag behind the actual events: the RL agent needs to estimate the real state of its environment based on past observations. State-of-the-art (SOTA) methods typically employ recursive, step-by-step forecasting of states. This can cause the accumulation of compounding errors. To tackle this problem, our novel belief estimation method, named Directly Forecasting Belief Transformer (DFBT), directly forecasts states from observations without incrementally estimating intermediate states step-by-step. We theoretically demonstrate that DFBT greatly reduces compounding errors of existing recursively forecasting methods, yielding stronger performance guarantees. In experiments with D4RL offline datasets, DFBT reduces compounding errors with remarkable prediction accuracy. DFBT’s capability to forecast state sequences also facilitates multi-step bootstrapping, thus greatly improving learning efficiency. On the MuJoCo benchmark, our DFBT-based method substantially outperforms SOTA baselines. Code is available at https://github.com/QingyuanWuNothing/DFBT.



Article 1407

Title@2025-06-07 (6): IQFM A Wireless Foundational Model for I/Q Streams in AI-Native 6G

Title: IQFM A Wireless Foundational Model for I/Q Streams in AI-Native 6G IQFM Ein drahtloses Grundmodell für I/Q Streams in AI-Native 6G AI-Native 6G的I/Q流无线无线基础模型 2506.06718v1

Authors (2): Omar Mashaal, Hatem Abou-Zeid

Foundational models have shown remarkable potential in natural language processing and computer vision, yet remain in their infancy in wireless communications. While a few efforts have explored image-based modalities such as channel state information (CSI) and frequency spectrograms, foundational models that operate directly on raw IQ data remain largely unexplored. This paper presents IQFM, the first I/Q signal foundational model for wireless communications. IQFM supports diverse tasks: modulation classification, angle-of-arrival (AoA), beam prediction, and RF fingerprinting, without heavy preprocessing or handcrafted features. We also introduce a task-aware augmentation strategy that categorizes transformations into core augmentations, such as cyclic time shifting, and task-specific augmentations. This strategy forms the basis for structured, task-dependent representation learning within a contrastive self-supervised learning (SSL) framework. Using this strategy, the lightweight encoder, pre-trained via SSL on over-the-air multi-antenna IQ data, achieves up to 99.67% and 65.45% accuracy on modulation and AoA classification, respectively, using only one labeled sample per class, outperforming supervised baselines by up to 7x and 145x. The model also generalizes to out-of-distribution tasks; when adapted to new tasks using only 500 samples per class and minimal parameter updates via LoRA, the same frozen encoder achieves 94.15% on beam prediction (vs. 89.53% supervised), 50.00% on RML2016a modulation classification (vs. 49.30%), and 96.05% on RF fingerprinting (vs. 96.64%). These results demonstrate the potential of raw IQ-based foundational models as efficient, reusable encoders for multi-task learning in AI-native 6G systems.



Article 1408

Title@2025-06-07 (6): Online SLA Decomposition: Enabling Real-Time Adaptation to Evolving Network Systems

Title: Online SLA Decomposition: Enabling Real-Time Adaptation to Evolving Network Systems Online-SLA-Zersetzung: Echtzeit-Anpassung an sich entwickelnde Netzwerksysteme ermöglichen SLA在线分解:使实时适应不断演变的网络系统 2408.08968v5

Authors (4): Cyril Shih-Huan Hsu, Danny De Vleeschauwer, Chrysa Papagianni, Paola Grosso

When a network slice spans multiple technology domains, it is crucial for each domain to uphold the End-to-End (E2E) Service Level Agreement (SLA) associated with the slice. Consequently, the E2E SLA must be properly decomposed into partial SLAs that are assigned to each domain involved. In a network slice management system with a two-level architecture, comprising an E2E service orchestrator and local domain controllers, we consider that the orchestrator has access only to historical data regarding the responses of local controllers to previous requests, and this information is used to construct a risk model for each domain. In this study, we extend our previous work by investigating the dynamic nature of real-world systems and introducing an online learning-decomposition framework to tackle the dynamicity. We propose a framework that continuously updates the risk models based on the most recent feedback. This approach leverages key components such as online gradient descent and FIFO memory buffers, which enhance the stability and robustness of the overall process. Our empirical study on an analytic model-based simulator demonstrates that the proposed framework outperforms the state-of-the-art static approach, delivering more accurate and resilient SLA decomposition under varying conditions and data limitations. Furthermore, we provide a comprehensive complexity analysis of the proposed solution.



Article 1409

Title@2025-06-07 (6): Towards Black-Box Membership Inference Attack for Diffusion Models

Title: Towards Black-Box Membership Inference Attack for Diffusion Models Auf dem Weg zu Black-Box-Mitgliedschafts-Inferenz-Angriff für Diffusionsmodelle 争取为扩散模型进行黑箱成员推导攻击 2405.20771v4

Authors (4): Jingwei Li, Jing Dong, Tianxing He, Jingzhao Zhang

Given the rising popularity of AI-generated art and the associated copyright concerns, identifying whether an artwork was used to train a diffusion model is an important research topic. This work approaches the problem from the membership inference attack (MIA) perspective. We first identify the limitation of applying existing MIA methods to proprietary diffusion models: they require access to the model’s internal U-net. To address this problem, we introduce a novel membership inference attack method that uses only the image-to-image variation API and operates without access to the model’s internal U-net. Our method is based on the intuition that the model can more easily obtain an unbiased noise prediction estimate for images from the training set. By applying the API multiple times to the target image, averaging the outputs, and comparing the result to the original image, our approach can classify whether a sample was part of the training set. We validate our method using DDIM and Stable Diffusion setups and further extend both our approach and existing algorithms to the Diffusion Transformer architecture. Our experimental results consistently outperform previous methods.
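
The query-average-compare procedure can be sketched as follows. Everything here is illustrative: `variation_api` stands in for a real image-to-image variation endpoint, images are flattened lists of pixel values, and the toy API, the L2 distance, and the threshold are assumptions rather than the authors' settings.

```python
import random

def membership_score(image, variation_api, n_queries=8):
    """Average repeated API outputs and return the L2 distance to the input."""
    outputs = [variation_api(image) for _ in range(n_queries)]
    mean = [sum(pixels) / n_queries for pixels in zip(*outputs)]
    return sum((m - x) ** 2 for m, x in zip(mean, image)) ** 0.5

def is_member(image, variation_api, threshold, n_queries=8):
    """Flag the image as a likely training sample if the average stays close."""
    return membership_score(image, variation_api, n_queries) < threshold

def make_toy_api(bias, seed=0):
    """Toy stand-in for the API: zero-mean noise plus a systematic bias.

    bias=0 mimics the intuition for training images (unbiased estimate);
    a nonzero bias mimics images the model has not seen.
    """
    rng = random.Random(seed)
    return lambda image: [x + bias + rng.gauss(0.0, 0.1) for x in image]

image = [0.1 * i for i in range(16)]
seen_api = make_toy_api(bias=0.0)
unseen_api = make_toy_api(bias=0.5)
print(is_member(image, seen_api, threshold=1.0))    # True
print(is_member(image, unseen_api, threshold=1.0))  # False
```

Averaging cancels the zero-mean noise, so what remains in the distance is mostly the systematic bias, which is the quantity the intuition in the abstract relies on.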



Article 1410

Title@2025-06-07 (6): Understanding Nonlinear Implicit Bias via Region Counts in Input Space

Title: Understanding Nonlinear Implicit Bias via Region Counts in Input Space Nichtlineare Implizite Bias über Regionszählungen im Eingaberaum verstehen 通过输入空间中的区域计数理解非线性隐隐含比值 2505.11370v2

Authors (5): Jingwei Li, Jing Xu, Zifan Wang, Huishuai Zhang, Jingzhao Zhang

One explanation for the strong generalization ability of neural networks is implicit bias. Yet the definition and mechanism of implicit bias in non-linear contexts remain poorly understood. In this work, we propose to characterize implicit bias by the count of connected regions in the input space with the same predicted label. Compared with parameter-dependent metrics (e.g., norm or normalized margin), region count can be better adapted to nonlinear, overparameterized models, because it is determined by the function mapping and is invariant to reparametrization. Empirically, we find that small region counts align with geometrically simple decision boundaries and correlate well with good generalization performance. We also observe that good hyper-parameter choices such as larger learning rates and smaller batch sizes can induce small region counts. We further establish the theoretical connections and explain how a larger learning rate can induce small region counts in neural networks.
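
One simple way to approximate a region count is to restrict the input space to a line segment and count maximal runs of identical predicted labels along it. The sketch below does this by uniform sampling; the segment restriction, the sampling resolution, and the toy classifiers are assumptions for illustration, and the abstract does not say how the authors compute the count.

```python
def region_count_on_segment(predict, start, end, steps=1000):
    """Count maximal runs of equal predicted labels along the segment start -> end.

    `predict` maps a point (list of floats) to a label. Regions narrower than
    the sampling step can be missed, so the result is a lower bound.
    """
    count, previous = 0, None
    for i in range(steps + 1):
        t = i / steps
        point = [a + t * (b - a) for a, b in zip(start, end)]
        label = predict(point)
        if label != previous:
            count += 1
            previous = label
    return count

# Two toy classifiers on 2-D inputs (hypothetical).
def linear(x):
    return int(x[0] > 0)          # one decision boundary

def band(x):
    return int(abs(x[0]) < 0.5)   # two decision boundaries

print(region_count_on_segment(linear, [-1.0, 0.0], [1.0, 0.0]))  # 2
print(region_count_on_segment(band, [-1.0, 0.0], [1.0, 0.0]))    # 3
```

The count depends only on the labels the function assigns, not on how the model is parameterized, which matches the invariance property described above.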



Article 1411

Title@2025-06-07 (6): RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking

Title: RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking RED QUEEN: Schutz großer Sprachmodelle gegen verschleierte Multi-Turn-Jailbreaking RED QUEEN: 保护大型语言模型,防止隐蔽的多轮破门破门 2409.17458v2

Authors (6): Yifan Jiang, Kriti Aggarwal, Tanmay Laud, Kashif Munir, Jay Pujara, Subhabrata Mukherjee

The rapid progress of Large Language Models (LLMs) has opened up new opportunities across various domains and applications; yet it also presents challenges related to potential misuse. To mitigate such risks, red teaming has been employed as a proactive security measure to probe language models for harmful outputs via jailbreak attacks. However, current jailbreak attack approaches are single-turn with explicit malicious queries that do not fully capture the complexity of real-world interactions. In reality, users can engage in multi-turn interactions with LLM-based chat assistants, allowing them to conceal their true intentions in a more covert manner. To bridge this gap, we first propose a new jailbreak approach, RED QUEEN ATTACK. This method constructs a multi-turn scenario, concealing the malicious intent under the guise of preventing harm. We craft 40 scenarios that vary in turns and select 14 harmful categories to generate 56k multi-turn attack data points. We conduct comprehensive experiments on the RED QUEEN ATTACK with four representative LLM families of different sizes. Our experiments reveal that all LLMs are vulnerable to RED QUEEN ATTACK, reaching 87.62% attack success rate on GPT-4o and 75.4% on Llama3-70B. Further analysis reveals that larger models are more susceptible to the RED QUEEN ATTACK, with multi-turn structures and concealment strategies contributing to its success. To prioritize safety, we introduce a straightforward mitigation strategy called RED QUEEN GUARD, which aligns LLMs to effectively counter adversarial attacks. This approach reduces the attack success rate to below 1% while maintaining the model’s performance across standard benchmarks. Full implementation and dataset are publicly accessible at https://github.com/kriti-hippo/red_queen.



Article 1412

Title@2025-06-07 (6): Do Protein Transformers Have Biological Intelligence?

Title: Do Protein Transformers Have Biological Intelligence? Haben Proteintransformatoren biologische Intelligenz? 蛋白质变异器有生物智能吗? 2506.06701v1

Authors (8): Fudong Lin, Wanrou Du, Jinchan Liu, Tarikul Milon, Shelby Meche, Wu Xu, Xiaoqi Qin, Xu Yuan

Deep neural networks, particularly Transformers, have been widely adopted for predicting the functional properties of proteins. In this work, we focus on exploring whether Protein Transformers can capture biological intelligence among protein sequences. To achieve our goal, we first introduce a protein function dataset, namely Protein-FN, providing over 9000 protein data points with meaningful labels. Second, we devise a new Transformer architecture, namely Sequence Protein Transformers (SPT), for computationally efficient protein function predictions. Third, we develop a novel Explainable Artificial Intelligence (XAI) technique called Sequence Score, which can efficiently interpret the decision-making processes of protein models, thereby overcoming the difficulty of deciphering biological intelligence hidden in Protein Transformers. Remarkably, even our smallest SPT-Tiny model, which contains only 5.4M parameters, demonstrates impressive predictive accuracy, achieving 94.3% on the Antibiotic Resistance (AR) dataset and 99.6% on the Protein-FN dataset, all accomplished by training from scratch. Besides, our Sequence Score technique helps reveal that our SPT models can discover several meaningful patterns underlying the sequence structures of protein data, with these patterns aligning closely with the domain knowledge in the biology community. We have officially released our Protein-FN dataset on Hugging Face Datasets https://huggingface.co/datasets/Protein-FN/Protein-FN. Our code is available at https://github.com/fudong03/BioIntelligence.



Article 1413

Title@2025-06-07 (6): MarginSel : Max-Margin Demonstration Selection for LLMs

Title: MarginSel : Max-Margin Demonstration Selection for LLMs MarginSel : Max-Margin-Demonstrationsauswahl für LLMs MaginSel : LLMS 最大边框演示选择 2506.06699v1

Authors (4): Rajeev Bhatt Ambati, James Lester, Shashank Srivastava, Snigdha Chaturvedi

Large Language Models (LLMs) excel at few-shot learning via in-context learning (ICL). However, the effectiveness of ICL is often sensitive to the selection and ordering of demonstration examples. To address this, we present MarginSel: Max-Margin Demonstration Selection for LLMs, a two-step method that selects hard demonstration examples for the ICL prompt, adapting to each test instance. Our approach achieves 2-7% absolute improvement in F1-score across classification tasks, compared to a random selection of examples. We also provide theoretical insights and empirical evidence showing that MarginSel induces max-margin behavior in LLMs by effectively increasing the margin for hard examples, analogous to support vectors, thereby shifting the decision boundary in a beneficial direction.
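
The abstract does not spell out the two steps, so the following is only a rough sketch of what margin-based selection of hard demonstrations could look like. It assumes each candidate carries hypothetical per-class scores (for instance from a zero-shot pass), and it ignores the per-test-instance adaptation that MarginSel performs. The names `margin` and `select_hard_demonstrations` are illustrative, not taken from the paper.

```python
def margin(scores):
    """Gap between the two highest class scores; a small gap marks a hard example."""
    top, runner_up = sorted(scores, reverse=True)[:2]
    return top - runner_up


def select_hard_demonstrations(candidates, k):
    """Return the texts of the k candidates with the smallest margin."""
    ranked = sorted(candidates, key=lambda c: margin(c["scores"]))
    return [c["text"] for c in ranked[:k]]


# Hypothetical candidate pool with made-up class scores (assumption).
pool = [
    {"text": "great movie", "scores": [0.95, 0.05]},
    {"text": "not bad, not good", "scores": [0.52, 0.48]},
    {"text": "fine I guess", "scores": [0.60, 0.40]},
]
selected = select_hard_demonstrations(pool, k=2)
```

Here the two candidates closest to the decision boundary are kept, which mirrors the support-vector analogy in the abstract.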



Article 1414

Title@2025-06-07 (6): Contextual Experience Replay for Self-Improvement of Language Agents

Title: Contextual Experience Replay for Self-Improvement of Language Agents Kontextuelles Erfahrungsreplay zur Selbstverbesserung von Sprachagenten 语文代理人员自我改进的内事经验回顾 2506.06698v1

Authors (4): Yitao Liu, Chenglei Si, Karthik Narasimhan, Shunyu Yao

Large language model (LLM) agents have been applied to sequential decision-making tasks such as web navigation, but without any environment-specific experiences, they often fail in these complex tasks. Moreover, current LLM agents are not designed to continually learn from past experiences during inference time, which could be crucial for them to gain these environment-specific experiences. To address this, we propose Contextual Experience Replay (CER), a training-free framework to enable efficient self-improvement for language agents in their context window. Specifically, CER accumulates and synthesizes past experiences into a dynamic memory buffer. These experiences encompass environment dynamics and common decision-making patterns, allowing the agents to retrieve and augment themselves with relevant knowledge in new tasks, enhancing their adaptability in complex environments. We evaluate CER on the challenging WebArena and VisualWebArena benchmarks. On VisualWebArena, CER achieves a competitive performance of 31.9%. On WebArena, CER also achieves a competitive average success rate of 36.7%, a relative improvement of 51.0% over the GPT-4o agent baseline. We also conduct a comprehensive analysis of CER to demonstrate its efficiency and validity and to understand it better.
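
To make the relative-improvement figure concrete, the short calculation below backs out the baseline success rate implied by the two numbers in the abstract. The baseline value is not stated in the abstract; it is derived here and is subject to rounding in the reported figures. The helper name `implied_baseline` is illustrative.

```python
def implied_baseline(new_rate, relative_gain):
    """Baseline rate b such that new_rate = b * (1 + relative_gain)."""
    return new_rate / (1.0 + relative_gain)


# Figures from the abstract: 36.7% success rate, 51.0% relative improvement.
baseline = implied_baseline(36.7, 0.510)
absolute_gain = 36.7 - baseline
```

Under these assumptions the baseline comes out at roughly 24.3%, so the 51.0% relative gain corresponds to about 12.4 percentage points in absolute terms.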



Article 1415

Title@2025-06-07 (6): Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning

Title: Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning Breaking Data Silos: Auf dem Weg zu offenen und skalierbaren Mobilitäts-Stiftungsmodellen durch generatives kontinuierliches Lernen 突破数据硅:通过创造持续学习实现开放和可缩放的流动基金会模式 2506.06694v1

Authors (5): Yuan Yuan, Yukun Liu, Chonghua Han, Jie Feng, Yong Li

Foundation models have revolutionized fields such as natural language processing and computer vision by enabling general-purpose learning across diverse tasks and datasets. However, building analogous models for human mobility remains challenging due to the privacy-sensitive nature of mobility data and the resulting data silos across institutions. To bridge this gap, we propose MoveGCL, a scalable and privacy-preserving framework for training mobility foundation models via generative continual learning. Without sharing raw data, MoveGCL enables decentralized and progressive model evolution by replaying synthetic trajectories generated from a frozen teacher model, and reinforces knowledge retention through a tailored distillation strategy that mitigates catastrophic forgetting. To address the heterogeneity of mobility patterns, MoveGCL incorporates a Mixture-of-Experts Transformer with a mobility-aware expert routing mechanism, and employs a layer-wise progressive adaptation strategy to stabilize continual updates. Experiments on six real-world urban datasets demonstrate that MoveGCL achieves performance comparable to joint training and significantly outperforms federated learning baselines, while offering strong privacy protection. MoveGCL marks a crucial step toward unlocking foundation models for mobility, offering a practical blueprint for open, scalable, and privacy-preserving model development in the era of foundation models.



Article 1416

Title@2025-06-07 (6): From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications

Title: From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications Vom Tiefrang gradienten Subraumstabilisierung zu Low-Rank-Gewichten: Beobachtungen, Theorien und Anwendungen 从低级梯度子空间稳定到低射速重量:观测、理论和应用 2407.11239v2

Authors (9): Ajay Jaiswal, Yifan Wang, Lu Yin, Shiwei Liu, Runjin Chen, Jiawei Zhao, Ananth Grama, Yuandong Tian, Zhangyang Wang

Large Language Models’ (LLMs) weight matrices can often be expressed in low-rank form with potential to relax memory and compute resource requirements. Unlike prior efforts that focus on developing novel matrix decompositions, in this work we study the non-uniform low-rank properties of weight matrices in LLMs through the lens of stabilizing gradient subspaces. First, we provide a theoretical framework to understand the stabilization of gradient subspaces through Hessian analysis. Second, we empirically establish an important relationship between gradient dynamics and low-rank expressiveness of weight matrices. Our findings reveal that different LLM components exhibit varying levels of converged low-rank structures, necessitating variable rank reduction across them to minimize the drop in performance due to compression. Drawing on this result, we present Weight Low-Rank Projection (WeLore), which unifies weight compression and memory-efficient fine-tuning into one, in a data-agnostic and one-shot manner. When used as a compression technique, WeLore categorizes weight matrices into Low-rank Components (LRCs) and Non-Low-rank Components (N-LRCs) and suitably encodes them for minimum performance loss. Our gradient dynamics perspective illustrates that LRCs tend to have better fine-tuning capabilities and their standalone fine-tuning can closely mimic and sometimes outperform the training loss trajectory and performance of full fine-tuning with notable memory and compute footprint reduction. Code is available at https://github.com/VITA-Group/WeLore.
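
As a back-of-the-envelope illustration of why a low-rank form relaxes memory requirements, the sketch below counts parameters for a dense m×n matrix versus a rank-r factorization stored as an m×r and an r×n factor. The matrix shape and rank are illustrative assumptions, not values from the paper, and this is generic low-rank bookkeeping rather than WeLore's encoding.

```python
def full_params(m, n):
    """Parameter count of a dense m x n weight matrix."""
    return m * n


def low_rank_params(m, n, r):
    """Parameter count of the factors A (m x r) and B (r x n)."""
    return r * (m + n)


def break_even_rank(m, n):
    """Rank below which the factorized form is smaller than the dense one."""
    return (m * n) / (m + n)


m, n, r = 4096, 4096, 512  # illustrative shape and rank (assumption)
ratio = low_rank_params(m, n, r) / full_params(m, n)
limit = break_even_rank(m, n)
```

For this shape a rank-512 factorization stores a quarter of the dense parameters, and any rank below 2048 saves memory. A matrix whose spectrum does not decay quickly would need a rank near or above that limit, which is one way to read the abstract's distinction between LRCs and N-LRCs.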



Article 1417

Title@2025-06-07 (6): Learning Robust Heterogeneous Graph Representations via Contrastive-Reconstruction under Sparse Semantics

Title: Learning Robust Heterogeneous Graph Representations via Contrastive-Reconstruction under Sparse Semantics Robuste Heterogene Graphendarstellungen über Kontrastive-Rekonstruktion unter Sparsemantik lernen 在粗简语义下通过对比重建进行学习强强强异基因图示 2506.06682v1

Authors (4): Di Lin, Wanjing Ren, Xuanbin Li, Rui Zhang

In graph self-supervised learning, masked autoencoders (MAE) and contrastive learning (CL) are two prominent paradigms. MAE focuses on reconstructing masked elements, while CL maximizes similarity between augmented graph views. Recent studies highlight their complementarity: MAE excels at local feature capture, and CL at global information extraction. Hybrid frameworks for homogeneous graphs have been proposed, but face challenges in designing shared encoders to meet the semantic requirements of both tasks. In semantically sparse scenarios, CL struggles with view construction, and gradient imbalance between positive and negative samples persists. This paper introduces HetCRF, a novel dual-channel self-supervised learning framework for heterogeneous graphs. HetCRF uses a two-stage aggregation strategy to adapt embedding semantics, making it compatible with both MAE and CL. To address semantic sparsity, it enhances encoder output for view construction instead of relying on raw features, improving efficiency. Two positive sample augmentation strategies are also proposed to balance gradient contributions. Node classification experiments on four real-world heterogeneous graph datasets demonstrate that HetCRF outperforms state-of-the-art baselines. On datasets with missing node features, such as Aminer and Freebase, at a 40% label rate in node classification, HetCRF improves the Macro-F1 score by 2.75% and 2.2% respectively compared to the second-best baseline, validating its effectiveness and superiority.



Article 1418

Title@2025-06-07 (6): From Offline to Online Memory-Free and Task-Free Continual Learning via Fine-Grained Hypergradients

Title: From Offline to Online Memory-Free and Task-Free Continual Learning via Fine-Grained Hypergradients Von Offline zu Online Memory-Free und Task-Free Continual Learning über feinkörnige Hypergradienten 从离线到在线无记忆和无任务持续学习 2502.18762v2

Authors (4): Nicolas Michel, Maorong Wang, Jiangpeng He, Toshihiko Yamasaki

Continual Learning (CL) aims to learn from a non-stationary data stream where the underlying distribution changes over time. While recent advances have produced efficient memory-free methods in the offline CL (offCL) setting, where tasks are known in advance and data can be revisited, online CL (onCL) remains dominated by memory-based approaches. The transition from offCL to onCL is challenging, as many offline methods rely on (1) prior knowledge of task boundaries and (2) sophisticated scheduling or optimization schemes, both of which are unavailable when data arrives sequentially and can be seen only once. In this paper, we investigate the adaptation of state-of-the-art memory-free offCL methods to the online setting. We first show that augmenting these methods with lightweight prototypes significantly improves performance, albeit at the cost of increased Gradient Imbalance, resulting in a biased learning towards earlier tasks. To address this issue, we introduce Fine-Grained Hypergradients, an online mechanism for rebalancing gradient updates during training. Our experiments demonstrate that the synergy between prototype memory and hypergradient reweighting substantially enhances the performance of memory-free methods in onCL and surpasses onCL baselines. Code will be released upon acceptance.



Article 1419

Title@2025-06-07 (6): Interpretation of Deep Learning Model in Embryo Selection for In Vitro Fertilization (IVF) Treatment

Title: Interpretation of Deep Learning Model in Embryo Selection for In Vitro Fertilization (IVF) Treatment Interpretation von Deep-Learning-Modell in der Embryo-Auswahl für die In-Vitro-Düngung (IVF) Behandlung 体外受肥(IVF)治疗Embryo选择 Empryo的深学习模型解释 2506.06680v1

Authors (7): Radha Kodali, Venkata Rao Dhulipalla, Venkata Siva Kishor Tatavarty, Madhavi Nadakuditi, Bharadwaj Thiruveedhula, Suryanarayana Gunnam, Durga Prasad Bavirisetti

Infertility has a considerable impact on individuals’ quality of life, affecting them socially and psychologically, with projections indicating a rise in the upcoming years. In vitro fertilization (IVF) emerges as one of the primary techniques within economically developed nations, employed to address the rising problem of low fertility. Expert embryologists conventionally grade embryos by reviewing blastocyst images to select the most optimal for transfer, yet this process is time-consuming and lacks efficiency. Blastocyst images provide a valuable resource for assessing embryo viability. In this study, we introduce an explainable artificial intelligence (XAI) framework for classifying embryos, employing a fusion of convolutional neural network (CNN) and long short-term memory (LSTM) architecture, referred to as CNN-LSTM. Utilizing deep learning, our model achieves high accuracy in embryo classification while maintaining interpretability through XAI.



Article 1420

Title@2025-06-07 (6): Rethinking Benign Overfitting in Two-Layer Neural Networks

Title: Rethinking Benign Overfitting in Two-Layer Neural Networks Neudenken Benign Overfitting in Zwei-Ebenen-Neural-Netzwerke 重新思考两层神经网络中的 “ 仁善 “ 改造 2502.11893v2

Authors (2): Ruichen Xu, Kexin Chen

Recent theoretical studies (Kou et al., 2023; Cao et al., 2022) have revealed a sharp phase transition from benign to harmful overfitting when the noise-to-feature ratio exceeds a threshold, a situation common in long-tailed data distributions where atypical data is prevalent. However, harmful overfitting rarely happens in overparameterized neural networks. Further experimental results suggested that memorization is necessary for achieving near-optimal generalization error in long-tailed data distributions (Feldman & Zhang, 2020). We argue that this discrepancy between theoretical predictions and empirical observations arises because previous feature-noise data models overlook the heterogeneous nature of noise across different data classes. In this paper, we refine the feature-noise data model by incorporating class-dependent heterogeneous noise and re-examine the overfitting phenomenon in neural networks. Through a comprehensive analysis of the training dynamics, we establish test loss bounds for the refined model. Our findings reveal that neural networks can leverage “data noise” to learn implicit features that improve the classification accuracy for long-tailed data. Our analysis also provides a training-free metric for evaluating data influence on test performance. Experimental validation on both synthetic and real-world datasets supports our theoretical results.



Article 1421

Title@2025-06-07 (6): Accounting for plasticity: An extension of inelastic Constitutive Artificial Neural Networks

Title: Accounting for plasticity: An extension of inelastic Constitutive Artificial Neural Networks Bilanzierung der Plastizität: Eine Erweiterung der inelastischen konstitutiven künstlichen Neuronalen Netzwerke 可塑性核算:无弹性建筑人工神经神经网络的扩展 2407.19326v2

Authors (3): Birte Boes, Jaan-Willem Simon, Hagen Holthusen

In this work, we extend the existing framework of inelastic constitutive artificial neural networks (iCANNs) by incorporating plasticity to increase their applicability to model more complex material behavior. The proposed approach ensures objectivity, material symmetry, and thermodynamic consistency, providing a robust basis for automatic model discovery of constitutive equations at finite strains. These are predicted by discovering formulations for the Helmholtz free energy and plastic potentials for the yield function and evolution equations in terms of feed-forward networks. Our framework captures both linear and nonlinear kinematic hardening behavior. Investigation of our model’s prediction showed that the extended iCANNs successfully predict both linear and nonlinear kinematic hardening behavior based on experimental and artificially generated datasets, showcasing promising capabilities of this framework. Nonetheless, challenges remain in discovering more complex yield criteria with tension-compression asymmetry and addressing deviations in experimental data at larger strains. Despite these limitations, the proposed framework provides a promising basis for incorporating plasticity into iCANNs, offering a platform for advancing in the field of automated model discovery.



Article 1422

Title@2025-06-07 (6): Flood-DamageSense: Multimodal Mamba with Multitask Learning for Building Flood Damage Assessment using SAR Remote Sensing Imagery

Title: Flood-DamageSense: Multimodal Mamba with Multitask Learning for Building Flood Damage Assessment using SAR Remote Sensing Imagery Flood-DamageSense: Multimodales Mamba mit Multitasking-Lernen zur Erstellung von Hochwasserschäden mit Hilfe von SAR Remote Sensing Imagery 水灾-灾情警报:利用SAR遥感图像学进行水灾损害评估的多任务学习的多模式Mamba 2506.06667v1

Authors (2): Yu-Hsuan Ho, Ali Mostafavi

Most post-disaster damage classifiers succeed only when destructive forces leave clear spectral or structural signatures – conditions rarely present after inundation. Consequently, existing models perform poorly at identifying flood-related building damages. The model presented in this study, Flood-DamageSense, addresses this gap as the first deep-learning framework purpose-built for building-level flood-damage assessment. The architecture fuses pre- and post-event SAR/InSAR scenes with very-high-resolution optical basemaps and an inherent flood-risk layer that encodes long-term exposure probabilities, guiding the network toward plausibly affected structures even when compositional change is minimal. A multimodal Mamba backbone with a semi-Siamese encoder and task-specific decoders jointly predicts (1) graded building-damage states, (2) floodwater extent, and (3) building footprints. Training and evaluation on Hurricane Harvey (2017) imagery from Harris County, Texas – supported by insurance-derived property-damage extents – show a mean F1 improvement of up to 19 percentage points over state-of-the-art baselines, with the largest gains in the frequently misclassified “minor” and “moderate” damage categories. Ablation studies identify the inherent-risk feature as the single most significant contributor to this performance boost. An end-to-end post-processing pipeline converts pixel-level outputs to actionable, building-scale damage maps within minutes of image acquisition. By combining risk-aware modeling with SAR’s all-weather capability, Flood-DamageSense delivers faster, finer-grained, and more reliable flood-damage intelligence to support post-disaster decision-making and resource allocation.



Article 1423

Title@2025-06-07 (6): Through the Gaps: Uncovering Tactical Line-Breaking Passes with Clustering

Title: Through the Gaps: Uncovering Tactical Line-Breaking Passes with Clustering Durch die Lücken: Entdeckt taktische Linien-Breaking-Pässe mit Clustering 穿透差距:以集束方式破开战术直线通道 2506.06666v1

Authors (2): Oktay Karakuş, Hasan Arkadaş

Line-breaking passes (LBPs) are crucial tactical actions in football, allowing teams to penetrate defensive lines and access high-value spaces. In this study, we present an unsupervised, clustering-based framework for detecting and analysing LBPs using synchronised event and tracking data from elite matches. Our approach models opponent team shape through vertical spatial segmentation and identifies passes that disrupt defensive lines within open play. Beyond detection, we introduce several tactical metrics, including the space build-up ratio (SBR) and two chain-based variants, LBPCh$^1$ and LBPCh$^2$, which quantify the effectiveness of LBPs in generating immediate or sustained attacking threats. We evaluate these metrics across teams and players in the 2022 FIFA World Cup, revealing stylistic differences in vertical progression and structural disruption. The proposed methodology is explainable, scalable, and directly applicable to modern performance analysis and scouting workflows.



Article 1424

Title@2025-06-07 (6): SDP-CROWN: Efficient Bound Propagation for Neural Network Verification with Tightness of Semidefinite Programming

Title: SDP-CROWN: Efficient Bound Propagation for Neural Network Verification with Tightness of Semidefinite Programming SDP-CROWN: Effiziente Bound Propagation für neurale Netzwerkverifikation mit Zähigkeit der semidefiniten Programmierung SDP-CROWN: 有效促进神经网络核查和半无限期方案拟订的严格性 2506.06665v1

Authors (4): Hong-Ming Chiu, Hao Chen, Huan Zhang, Richard Y. Zhang

Neural network verifiers based on linear bound propagation scale impressively to massive models but can be surprisingly loose when neuron coupling is crucial. Conversely, semidefinite programming (SDP) verifiers capture inter-neuron coupling naturally, but their cubic complexity restricts them to only small models. In this paper, we propose SDP-CROWN, a novel hybrid verification framework that combines the tightness of SDP relaxations with the scalability of bound-propagation verifiers. At the core of SDP-CROWN is a new linear bound, derived via SDP principles, that explicitly captures $\ell_{2}$-norm-based inter-neuron coupling while adding only one extra parameter per layer. This bound can be integrated seamlessly into any linear bound-propagation pipeline, preserving the inherent scalability of such methods yet significantly improving tightness. In theory, we prove that our inter-neuron bound can be up to a factor of $\sqrt{n}$ tighter than traditional per-neuron bounds. In practice, when incorporated into the state-of-the-art $\alpha$-CROWN verifier, we observe markedly improved verification performance on large models with up to 65 thousand neurons and 2.47 million parameters, achieving tightness that approaches that of costly SDP-based methods.
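
A toy calculation can show where a $\sqrt{n}$ gap between per-neuron and coupled bounds can come from. The sketch below bounds a linear function $w^\top x$ over an $\ell_2$ ball of radius `radius`: the exact maximum is `radius * ||w||_2`, while relaxing the ball to independent per-coordinate intervals gives `radius * ||w||_1`. This is a standard Cauchy-Schwarz illustration under assumed inputs, not the bound or the proof from the paper.

```python
import math


def l2_ball_bound(weights, radius):
    """Exact max of w.x over ||x||_2 <= radius (Cauchy-Schwarz)."""
    return radius * math.sqrt(sum(w * w for w in weights))


def per_coordinate_bound(weights, radius):
    """Max of w.x when each |x_i| <= radius is treated independently."""
    return radius * sum(abs(w) for w in weights)


n = 16
w = [1.0] * n  # illustrative all-ones weights (assumption)
exact = l2_ball_bound(w, 1.0)
loose = per_coordinate_bound(w, 1.0)
ratio = loose / exact
```

With all-ones weights the per-coordinate bound is looser by exactly $\sqrt{n}$, here a factor of 4 for n = 16.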



Article 1425

Title@2025-06-07 (6): SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL

Title: SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL SLAC: Simulationsvorausgesetzter Latent Action Space für Ganzkörper-Real-World RL SLAC: 模拟 - 预先准备的中转行动空间,用于整个模拟 - 模拟 - 现实世界RL 2506.04147v2

Authors (3): Jiaheng Hu, Peter Stone, Roberto Martín-Martín

Building capable household and industrial robots requires mastering the control of versatile, high-degree-of-freedom (DoF) systems such as mobile manipulators. While reinforcement learning (RL) holds promise for autonomously acquiring robot control policies, scaling it to high-DoF embodiments remains challenging. Direct RL in the real world demands both safe exploration and high sample efficiency, which are difficult to achieve in practice. Sim-to-real RL, on the other hand, is often brittle due to the reality gap. This paper introduces SLAC, a method that renders real-world RL feasible for complex embodiments by leveraging a low-fidelity simulator to pretrain a task-agnostic latent action space. SLAC trains this latent action space via a customized unsupervised skill discovery method designed to promote temporal abstraction, disentanglement, and safety, thereby facilitating efficient downstream learning. Once a latent action space is learned, SLAC uses it as the action interface for a novel off-policy RL algorithm to autonomously learn downstream tasks through real-world interactions. We evaluate SLAC against existing methods on a suite of bimanual mobile manipulation tasks, where it achieves state-of-the-art performance. Notably, SLAC learns contact-rich whole-body tasks in under an hour of real-world interactions, without relying on any demonstrations or hand-crafted behavior priors. More information, code, and videos at robo-rl.github.io



Article 1426

Title@2025-06-07 (6): Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence

Title: Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence Kurzlanges Adversarial Training hilft LLMs, langanhaltende Jailbreak Attacken zu verteidigen: Theoretische und empirische Beweise 短期对面培训帮助长者捍卫长期越狱袭击:理论和经验证据 2502.04204v2

Authors (4): Shaopeng Fu, Liang Ding, Jingfeng Zhang, Di Wang

Jailbreak attacks against large language models (LLMs) aim to induce harmful behaviors in LLMs through carefully crafted adversarial prompts. To mitigate attacks, one way is to perform adversarial training (AT)-based alignment, i.e., training LLMs on some of the most adversarial prompts to help them learn how to behave safely under attacks. During AT, the length of adversarial prompts plays a critical role in the robustness of aligned LLMs. While long-length adversarial prompts during AT might lead to strong LLM robustness, their synthesis is very resource-consuming, which may limit the application of LLM AT. This paper focuses on adversarial suffix jailbreak attacks and shows that, to defend against a jailbreak attack with an adversarial suffix of length $\Theta(M)$, it is enough to align LLMs on prompts with adversarial suffixes of length $\Theta(\sqrt{M})$. Theoretically, we analyze the adversarial in-context learning of linear transformers on linear regression tasks and prove a robust generalization bound for trained transformers. The bound depends on the term $\Theta(\sqrt{M_{\text{test}}}/M_{\text{train}})$, where $M_{\text{train}}$ and $M_{\text{test}}$ are the numbers of adversarially perturbed in-context samples during training and testing. Empirically, we conduct AT on popular open-source LLMs and evaluate their robustness against jailbreak attacks of different adversarial suffix lengths. Results confirm a positive correlation between the attack success rate and the ratio of the square root of the adversarial suffix length during jailbreaking to the length during AT. Our findings show that it is practical to defend against "long-length" jailbreak attacks via efficient "short-length" AT. The code is available at https://github.com/fshp971/adv-icl.
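
The square-root relationship can be made tangible with a small calculation. The sketch below evaluates the ratio $\sqrt{M_{\text{test}}}/M_{\text{train}}$ from the abstract and a training suffix length that grows as $\sqrt{M}$. The constant `c` hidden by the $\Theta(\cdot)$ notation is unknown, so `c=1.0` is purely an assumption, and both function names are illustrative.

```python
import math


def at_suffix_length(attack_len, c=1.0):
    """Training suffix length scaling as sqrt(attack_len); c is an assumed constant."""
    return math.ceil(c * math.sqrt(attack_len))


def bound_term(m_test, m_train):
    """The ratio sqrt(M_test) / M_train that the stated bound depends on."""
    return math.sqrt(m_test) / m_train


short_100 = at_suffix_length(100)
short_400 = at_suffix_length(400)
term_100 = bound_term(100, short_100)
term_400 = bound_term(400, short_400)
```

Quadrupling the attack suffix length from 100 to 400 only doubles the training suffix length from 10 to 20, and the ratio stays at 1.0 in both cases, which is the sense in which short-length AT can keep pace with long-length attacks.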



Article 1427

Title@2025-06-07 (6): Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning

Title: Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning Q-Policy: Quantenverstärkte Politikevaluierung für skalierbares Stärkungslernen Q-政策:为可扩展的强化学习加强政策评价 2505.11862v2

Authors (3): Kalyan Cherukuri, Aarav Lala, Yash Yardi

We propose Q-Policy, a hybrid quantum-classical reinforcement learning (RL) framework that mathematically accelerates policy evaluation and optimization by exploiting quantum computing primitives. Q-Policy encodes value functions in quantum superposition, enabling simultaneous evaluation of multiple state-action pairs via amplitude encoding and quantum parallelism. We introduce a quantum-enhanced policy iteration algorithm with provable polynomial reductions in sample complexity for the evaluation step, under standard assumptions. To demonstrate the technical feasibility and theoretical soundness of our approach, we validate Q-Policy on classical emulations of small discrete control tasks. Due to current hardware and simulation limitations, our experiments focus on showcasing proof-of-concept behavior rather than large-scale empirical evaluation. Our results support the potential of Q-Policy as a theoretical foundation for scalable RL on future quantum devices, addressing RL scalability challenges beyond classical approaches.



Article 1428

Title@2025-06-07 (6): Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning

Title: Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning Pareto-Optimal Rewards von Noisy Preferences lernen: Ein Rahmen für multi-objektives Inverse-Verstärkung-Lernen 从新偏爱中学习 Pareto- Opatimal 奖励:多目标反强化学习框架 2505.11864v2

Authors (2): Kalyan Cherukuri, Aarav Lala

As generative agents become increasingly capable, alignment of their behavior with complex human values remains a fundamental challenge. Existing approaches often simplify human intent through reduction to a scalar reward, overlooking the multi-faceted nature of human feedback. In this work, we introduce a theoretical framework for preference-based Multi-Objective Inverse Reinforcement Learning (MO-IRL), where human preferences are modeled as latent vector-valued reward functions. We formalize the problem of recovering a Pareto-optimal reward representation from noisy preference queries and establish conditions for identifying the underlying multi-objective structure. We derive tight sample complexity bounds for recovering $\epsilon$-approximations of the Pareto front and introduce a regret formulation to quantify suboptimality in this multi-objective setting. Furthermore, we propose a provably convergent algorithm for policy optimization using preference-inferred reward cones. Our results bridge the gap between practical alignment techniques and theoretical guarantees, providing a principled foundation for learning aligned behaviors in a high-dimension and value-pluralistic environment.



Article 1429

Title@2025-06-07 (6): GraphRAG under Fire

Title: GraphRAG under Fire GraphRAG unter Feuer 火下的图形RAG 2501.14050v3

Authors (7): Jiacheng Liang, Yuhui Wang, Changjiang Li, Rongyi Zhu, Tanqiu Jiang, Neil Gong, Ting Wang

GraphRAG advances retrieval-augmented generation (RAG) by structuring external knowledge as multi-scale knowledge graphs, enabling language models to integrate both broad context and granular details in their generation. While GraphRAG has demonstrated success across domains, its security implications remain largely unexplored. To bridge this gap, this work examines GraphRAG’s vulnerability to poisoning attacks, uncovering an intriguing security paradox: existing RAG poisoning attacks are less effective under GraphRAG than conventional RAG, due to GraphRAG’s graph-based indexing and retrieval; yet, the same features also create new attack surfaces. We present GragPoison, a novel attack that exploits shared relations in the underlying knowledge graph to craft poisoning text capable of compromising multiple queries simultaneously. GragPoison employs three key strategies: (i) relation injection to introduce false knowledge, (ii) relation enhancement to amplify poisoning influence, and (iii) narrative generation to embed malicious content within coherent text. Empirical evaluation across diverse datasets and models shows that GragPoison substantially outperforms existing attacks in terms of effectiveness (up to 98% success rate) and scalability (using less than 68% poisoning text) on multiple variations of GraphRAG. We also explore potential defensive measures and their limitations, identifying promising directions for future research.



Article 1430

Title@2025-06-07 (6): Rescaled Influence Functions: Accurate Data Attribution in High Dimension

Title: Rescaled Influence Functions: Accurate Data Attribution in High Dimension Reskalierte Einflussfunktionen: Genaue Datenzuweisung in hoher Dimension 重新尺度影响函数: 高维度的准确数据归属 2506.06656v1

Authors (2): Ittai Rubinstein, Samuel B. Hopkins

How does the training data affect a model’s behavior? This is the question we seek to answer with data attribution. The leading practical approaches to data attribution are based on influence functions (IF). IFs utilize a first-order Taylor approximation to efficiently predict the effect of removing a set of samples from the training set without retraining the model, and are used in a wide variety of machine learning applications. However, especially in the high-dimensional regime ($\#\text{params} \geq \Omega(\#\text{samples})$), they are often imprecise and tend to underestimate the effect of sample removals, even for simple models such as logistic regression. We present rescaled influence functions (RIF), a new tool for data attribution which can be used as a drop-in replacement for influence functions, with little computational overhead but significant improvement in accuracy. We compare IF and RIF on a range of real-world datasets, showing that RIFs offer significantly better predictions in practice, and present a theoretical analysis explaining this improvement. Finally, we present a simple class of data poisoning attacks that would fool IF-based detections but would be detected by RIF.


Article 1431

Title@2025-06-07 (6): Explaining Risks: Axiomatic Risk Attributions for Financial Models

Title: Explaining Risks: Axiomatic Risk Attributions for Financial Models Risiken erklären: Axiomatische Risikozuweisungen für Finanzmodelle 解释风险:金融模型的逻辑风险归属 2506.06653v1

Authors (1): Dangxing Chen

In recent years, machine learning models have achieved great success at the expense of highly complex black-box structures. By using axiomatic attribution methods, we can fairly allocate the contributions of each feature, thus allowing us to interpret the model predictions. In high-risk sectors such as finance, risk is just as important as mean predictions. Throughout this work, we address the following risk attribution problem: how to fairly allocate the risk given a model with data? We demonstrate with analysis and empirical examples that risk can be well allocated by extending the Shapley value framework.
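
As a minimal illustration of the Shapley value framework the abstract extends, the sketch below computes exact Shapley values for an arbitrary set function. The abstract does not specify the risk measure, so the toy "risk game" here (the variance of a sum of two correlated features, with made-up covariances) is purely an assumption for illustration and not the paper's construction.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a set function value(frozenset) -> float."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(n):
            # Weight of a coalition of size r that p joins.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


# Hypothetical covariance entries: Var(a) = 4, Var(b) = 1, Cov(a, b) = 1.
COV = {("a", "a"): 4.0, ("b", "b"): 1.0, ("a", "b"): 1.0, ("b", "a"): 1.0}


def variance_of_sum(subset):
    """Toy risk: variance of the sum of the features in the coalition."""
    return sum(COV[(i, j)] for i in subset for j in subset)


if __name__ == "__main__":
    print(shapley_values(["a", "b"], variance_of_sum))
```

In this toy game the attributions sum to the total variance of the full coalition (the efficiency axiom), which is the sense in which the allocation can be called fair.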


Article 1432

Title@2025-06-07 (6): Protap: A Benchmark for Protein Modeling on Realistic Downstream Applications

Title: Protap: A Benchmark for Protein Modeling on Realistic Downstream Applications Protap: Ein Benchmark für Proteinmodellierung auf realistischen Downstream-Anwendungen Protap:关于现实下下游应用的蛋白质建模基准 2506.02052v2

Authors (10): Shuo Yan, Yuliang Yan, Bin Ma, Chenao Li, Haochun Tang, Jiahua Lu, Minhua Lin, Yuyuan Feng, Hui Xiong, Enyan Dai

Recently, extensive deep learning architectures and pretraining strategies have been explored to support downstream protein applications. Additionally, domain-specific models incorporating biological knowledge have been developed to enhance performance in specialized tasks. In this work, we introduce Protap, a comprehensive benchmark that systematically compares backbone architectures, pretraining strategies, and domain-specific models across diverse and realistic downstream protein applications. Specifically, Protap covers five applications: three general tasks and two novel specialized tasks, i.e., enzyme-catalyzed protein cleavage site prediction and targeted protein degradation, which are industrially relevant yet missing from existing benchmarks. For each application, Protap compares various domain-specific models and general architectures under multiple pretraining settings. Our empirical studies imply that: (i) Though large-scale pretraining encoders achieve great results, they often underperform supervised encoders trained on small downstream training sets. (ii) Incorporating structural information during downstream fine-tuning can match or even outperform protein language models pretrained on large-scale sequence corpora. (iii) Domain-specific biological priors can enhance performance on specialized downstream tasks. Code and datasets are publicly available at https://github.com/Trust-App-AI-Lab/protap.


Article 1433

Title@2025-06-07 (6): SAFER: A Calibrated Risk-Aware Multimodal Recommendation Model for Dynamic Treatment Regimes

Title: SAFER: A Calibrated Risk-Aware Multimodal Recommendation Model for Dynamic Treatment Regimes SAFER: Ein kalibriertes multimodales Empfehlungsmodell für dynamische Behandlungssysteme SAFER: 动态处理制度经校准的风险软件多式建议模型 2506.06649v1

Authors (4): Yishan Shen, Yuyang Ye, Hui Xiong, Yong Chen

Dynamic treatment regimes (DTRs) are critical to precision medicine, optimizing long-term outcomes through personalized, real-time decision-making in evolving clinical contexts, but require careful supervision for unsafe treatment risks. Existing efforts rely primarily on clinician-prescribed gold standards despite the absence of a known optimal strategy, and predominantly use structured EHR data without extracting valuable insights from clinical notes, limiting their reliability for treatment recommendations. In this work, we introduce SAFER, a calibrated risk-aware tabular-language recommendation framework for DTR that integrates both structured EHR and clinical notes, enabling them to learn from each other, and addresses inherent label uncertainty by assuming an ambiguous optimal treatment solution for deceased patients. Moreover, SAFER employs conformal prediction to provide statistical guarantees, ensuring safe treatment recommendations while filtering out uncertain predictions. Experiments on two publicly available sepsis datasets demonstrate that SAFER outperforms state-of-the-art baselines across multiple recommendation metrics and counterfactual mortality rate, while offering robust formal assurances. These findings underscore SAFER’s potential as a trustworthy and theoretically grounded solution for high-stakes DTR applications.
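
To make the conformal prediction step concrete, here is a minimal sketch of generic split conformal prediction for classification. It is not SAFER's actual procedure: the nonconformity score (one minus the predicted probability), the calibration scores, the treatment labels, and the abstain-unless-singleton rule are all assumptions chosen for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of calibration nonconformity scores.

    With n calibration scores, the ceil((n + 1) * (1 - alpha))-th smallest
    score gives marginal coverage of at least 1 - alpha under exchangeability.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Labels whose nonconformity score 1 - p stays within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}


def recommend(probs, threshold):
    """Hypothetical filter: recommend only when the set is a single label."""
    labels = prediction_set(probs, threshold)
    return next(iter(labels)) if len(labels) == 1 else None


if __name__ == "__main__":
    # Hypothetical calibration scores (1 - probability of the true label).
    cal = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.90]
    thr = conformal_threshold(cal, alpha=0.25)
    print(thr)
    print(recommend({"A": 0.55, "B": 0.42, "C": 0.03}, thr))
    print(recommend({"A": 0.92, "B": 0.05, "C": 0.03}, thr))
```

Returning `None` for non-singleton sets is one simple way to read "filtering out uncertain predictions"; other abstention rules are equally plausible.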


Article 1434

Title@2025-06-07 (6): Spark Transformer: Reactivating Sparsity in FFN and Attention

Title: Spark Transformer: Reactivating Sparsity in FFN and Attention Spark Transformer: Sparsamkeit in FFN und Aufmerksamkeit reaktivieren Spark 变换器: 重新激活FFN和注意力中的稀疏性 2506.06644v1

Authors (19): Chong You, Kan Wu, Zhipeng Jia, Lin Chen, Srinadh Bhojanapalli, Jiaxian Guo, Utku Evci, Jan Wassenberg, Praneeth Netrapalli, Jeremiah J. Willcock, Suvinay Subramanian, Felix Chern, Alek Andreev, Shreya Pathak, Felix Yu, Prateek Jain, David E. Culler, Henry M. Levy, Sanjiv Kumar

The discovery of the lazy neuron phenomenon in trained Transformers, where the vast majority of neurons in their feed-forward networks (FFN) are inactive for each token, has spurred tremendous interest in activation sparsity for enhancing large model efficiency. While notable progress has been made in translating such sparsity to wall-time benefits, modern Transformers have moved away from the ReLU activation function crucial to this phenomenon. Existing efforts on re-introducing activation sparsity often degrade model quality, increase parameter count, or complicate or slow down training. Sparse attention, the application of sparse activation to the attention mechanism, often faces similar challenges. This paper introduces the Spark Transformer, a novel architecture that achieves a high level of activation sparsity in both FFN and the attention mechanism while maintaining model quality, parameter count, and standard training procedures. Our method realizes sparsity via top-$k$ masking for explicit control over sparsity level. Crucially, we introduce statistical top-$k$, a hardware-accelerator-friendly, linear-time approximate algorithm that avoids costly sorting and mitigates significant training slowdown from standard top-$k$ operators. Furthermore, Spark Transformer reallocates existing FFN parameters and attention key embeddings to form a low-cost predictor for identifying activated entries. This design not only mitigates quality loss from enforced sparsity, but also enhances wall-time benefit. Pretrained with the Gemma-2 recipe, Spark Transformer demonstrates competitive performance on standard benchmarks while exhibiting significant sparsity: only 8% of FFN neurons are activated, and each token attends to a maximum of 256 tokens. This sparsity translates to a 2.5x reduction in FLOPs, leading to decoding wall-time speedups of up to 1.79x on CPU and 1.40x on GPU.
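
One way a sort-free, linear-time approximate top-$k$ could work is sketched below. The abstract does not give the formula, so this is an assumed reading: treat the activations as roughly Gaussian, estimate the threshold that about $k$ entries should exceed from the mean and standard deviation, then soft-threshold. The function name and the Gaussian assumption are illustrative, not taken from the paper.

```python
import random
from statistics import NormalDist, fmean, pstdev


def statistical_top_k(x, k):
    """Approximate top-k by a Gaussian-quantile threshold (assumed variant).

    Runs in O(len(x)) with no sorting: one pass for the mean and standard
    deviation, one pass for thresholding. Requires 0 < k < len(x).
    """
    d = len(x)
    # Standard-normal quantile that leaves a fraction k/d of mass above it.
    z = NormalDist().inv_cdf(1.0 - k / d)
    theta = fmean(x) + pstdev(x) * z
    # Soft-threshold: entries below theta become exactly zero.
    return [max(v - theta, 0.0) for v in x]


if __name__ == "__main__":
    random.seed(0)
    acts = [random.gauss(0.0, 1.0) for _ in range(10_000)]
    sparse = statistical_top_k(acts, k=800)  # target roughly 8% active
    print(sum(v > 0 for v in sparse), "of", len(sparse), "entries are active")
```

The number of surviving entries is only approximately $k$, and the approximation degrades when the activations are far from Gaussian; that trade-off is what buys the removal of the sort.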


Article 1435

Title@2025-06-07 (6): GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Title: GLASS: Guided Latent Slot Diffusion for Object-Centric Learning GLASS: Geführte Latent Slot Diffusion für Objekt-Centric Learning GLASS: 向导中中流槽扩散,用于对象中心学习 2407.17929v2

Authors (3): Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

Object-centric learning aims to decompose an input image into a set of meaningful object files (slots). These latent object representations enable a variety of downstream tasks. Yet, object-centric learning struggles on real-world datasets, which contain multiple objects of complex textures and shapes in natural everyday scenes. To address this, we introduce Guided Latent Slot Diffusion (GLASS), a novel slot-attention model that learns in the space of generated images and uses semantic and instance guidance modules to learn better slot embeddings for various downstream tasks. Our experiments show that GLASS surpasses state-of-the-art slot-attention methods by a wide margin on tasks such as (zero-shot) object discovery and conditional image generation for real-world scenes. Moreover, GLASS enables the first application of slot attention to the compositional generation of complex, realistic scenes.


Article 1436

Title@2025-06-07 (6): Zero-shot Generalist Graph Anomaly Detection with Unified Neighborhood Prompts

Title: Zero-shot Generalist Graph Anomaly Detection with Unified Neighborhood Prompts Zero-shot Generalist Graph Anomalie Erkennung mit Unified Neighborhood Prompts 使用统一邻里提示器进行异常探测 2410.14886v2

Authors (5): Chaoxi Niu, Hezhe Qiao, Changlu Chen, Ling Chen, Guansong Pang

Graph anomaly detection (GAD), which aims to identify nodes in a graph that significantly deviate from normal patterns, plays a crucial role in broad application domains. However, existing GAD methods are one-model-for-one-dataset approaches, i.e., training a separate model for each graph dataset. This largely limits their applicability in real-world scenarios. To overcome this limitation, we propose a novel zero-shot generalist GAD approach UNPrompt that trains a one-for-all detection model, requiring the training of one GAD model on a single graph dataset and then effectively generalizing to detect anomalies in other graph datasets without any retraining or fine-tuning. The key insight in UNPrompt is that i) the predictability of latent node attributes can serve as a generalized anomaly measure and ii) generalized normal and abnormal graph patterns can be learned via latent node attribute prediction in a properly normalized node attribute space. UNPrompt achieves a generalist mode for GAD through two main modules: one module aligns the dimensionality and semantics of node attributes across different graphs via coordinate-wise normalization, while another module learns generalized neighborhood prompts that support the use of latent node attribute predictability as an anomaly score across different datasets. Extensive experiments on real-world GAD datasets show that UNPrompt significantly outperforms diverse competing methods under the generalist GAD setting, and it also has strong superiority under the one-model-for-one-dataset setting. Code is available at https://github.com/mala-lab/UNPrompt.


Article 1437

Title@2025-06-07 (6): Non-Intrusive Load Monitoring Based on Image Load Signatures and Continual Learning

Title: Non-Intrusive Load Monitoring Based on Image Load Signatures and Continual Learning Nicht-intrusive Lastüberwachung basierend auf Bildlastsignaturen und kontinuierlichem Lernen 基于图像加载签名和不断学习的无侵入性负载监测 2506.06637v1

Authors (2): Olimjon Toirov, Wei Yu

Non-Intrusive Load Monitoring (NILM) identifies the operating status and energy consumption of each electrical device in the circuit by analyzing the electrical signals at the bus, which is of great significance for smart power management. However, the complex and changeable load combinations and application environments lead to the challenges of poor feature robustness and insufficient model generalization of traditional NILM methods. To this end, this paper proposes a new non-intrusive load monitoring method that integrates “image load signature” and continual learning. This method converts multi-dimensional power signals such as current, voltage, and power factor into visual image load feature signatures, and combines deep convolutional neural networks to realize the identification and classification of multiple devices; at the same time, self-supervised pre-training is introduced to improve feature generalization, and continual online learning strategies are used to overcome model forgetting to adapt to the emergence of new loads. This paper conducts a large number of experiments on high-sampling rate load datasets, and compares a variety of existing methods and model variants. The results show that the proposed method has achieved significant improvements in recognition accuracy.


Article 1438

Title@2025-06-07 (6): Vision-QRWKV: Exploring Quantum-Enhanced RWKV Models for Image Classification

Title: Vision-QRWKV: Exploring Quantum-Enhanced RWKV Models for Image Classification Vision-QRWKV: Quantum-erweiterte RWKV-Modelle für die Bildklassifikation 展望-QRWKKV:探索以量为基础增强的RWKV图像分类模型 2506.06633v1

Authors (1): Chi-Sheng Chen

Recent advancements in quantum machine learning have shown promise in enhancing classical neural network architectures, particularly in domains involving complex, high-dimensional data. Building upon prior work in temporal sequence modeling, this paper introduces Vision-QRWKV, a hybrid quantum-classical extension of the Receptance Weighted Key Value (RWKV) architecture, applied for the first time to image classification tasks. By integrating a variational quantum circuit (VQC) into the channel mixing component of RWKV, our model aims to improve nonlinear feature transformation and enhance the expressive capacity of visual representations. We evaluate both classical and quantum RWKV models on a diverse collection of 14 medical and standard image classification benchmarks, including MedMNIST datasets, MNIST, and FashionMNIST. Our results demonstrate that the quantum-enhanced model outperforms its classical counterpart on a majority of datasets, particularly those with subtle or noisy class distinctions (e.g., ChestMNIST, RetinaMNIST, BloodMNIST). This study represents the first systematic application of quantum-enhanced RWKV in the visual domain, offering insights into the architectural trade-offs and future potential of quantum models for lightweight and efficient vision tasks.


Article 1439

Title@2025-06-07 (6): Arbitrarily-Conditioned Multi-Functional Diffusion for Multi-Physics Emulation

Title: Arbitrarily-Conditioned Multi-Functional Diffusion for Multi-Physics Emulation Willkürlich bedingte multifunktionale Diffusion für multiphysikalische Emulation 多物理模拟模拟多功能多功能扩散 2410.13794v2

Authors (5): Da Long, Zhitong Xu, Guang Yang, Akil Narayan, Shandian Zhe

Modern physics simulation often involves multiple functions of interest, and traditional numerical approaches are known to be complex and computationally costly. While machine learning-based surrogate models can offer significant cost reductions, most focus on a single task, such as forward prediction, and typically lack uncertainty quantification – an essential component in many applications. To overcome these limitations, we propose Arbitrarily-Conditioned Multi-Functional Diffusion (ACM-FD), a versatile probabilistic surrogate model for multi-physics emulation. ACM-FD can perform a wide range of tasks within a single framework, including forward prediction, various inverse problems, and simulating data for entire systems or subsets of quantities conditioned on others. Specifically, we extend the standard Denoising Diffusion Probabilistic Model (DDPM) for multi-functional generation by modeling noise as Gaussian processes (GP). We propose a random-mask based, zero-regularized denoising loss to achieve flexible and robust conditional generation. We induce a Kronecker product structure in the GP covariance matrix, substantially reducing the computational cost and enabling efficient training and sampling. We demonstrate the effectiveness of ACM-FD across several fundamental multi-physics systems.


Article 1440

Title@2025-06-07 (6): Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach

Title: Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach Wiederherstellung der Kalibrierung für ausgerichtete große Sprachmodelle: Ein Calibration-Aware Fine-Tuning-Ansatz 统一大语言模型的重新校准:校准-软件精美调试方法 2505.01997v2

Authors (7): Jiancong Xiao, Bojian Hou, Zhanliang Wang, Ruochen Jin, Qi Long, Weijie J. Su, Li Shen

One of the key technologies for the success of Large Language Models (LLMs) is preference alignment. However, a notable side effect of preference alignment is poor calibration: while the pre-trained models are typically well-calibrated, LLMs tend to become poorly calibrated after alignment with human preferences. In this paper, we investigate why preference alignment affects calibration and how to address this issue. For the first question, we observe that the preference collapse issue in alignment undesirably generalizes to the calibration scenario, causing LLMs to exhibit overconfidence and poor calibration. To address this, we demonstrate the importance of fine-tuning with domain-specific knowledge to alleviate the overconfidence issue. To further analyze whether this affects the model’s performance, we categorize models into two regimes: calibratable and non-calibratable, defined by bounds of Expected Calibration Error (ECE). In the calibratable regime, we propose a calibration-aware fine-tuning approach to achieve proper calibration without compromising LLMs’ performance. However, as models are further fine-tuned for better performance, they enter the non-calibratable regime. For this case, we develop an EM-algorithm-based ECE regularization for the fine-tuning loss to maintain low calibration error. Extensive experiments validate the effectiveness of the proposed methods.
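
For readers unfamiliar with the metric, the sketch below computes the standard equal-width binned estimate of Expected Calibration Error (ECE). It illustrates only the metric that defines the two regimes, not the paper's calibration-aware fine-tuning or its EM-based regularizer; the choice of ten bins and the toy inputs are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (|B| / n) * |accuracy(B) - confidence(B)|.

    confidences: predicted probability of the chosen answer, each in [0, 1].
    correct:     1 if the chosen answer was right, else 0.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # An overconfident toy model: 90% confidence but only 75% accuracy.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

An overconfident model, the failure mode the abstract attributes to preference alignment, shows up as average confidence exceeding accuracy within the high-confidence bins.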


Article 1441

Title@2025-06-07 (6): Harnessing On-Device Large Language Model: Empirical Results and Implications for AI PC

Title: Harnessing On-Device Large Language Model: Empirical Results and Implications for AI PC Nutzung von On-Device Large Language Modell: Empirische Ergebnisse und Implikationen für KI-PC 利用大语言大语言模型:经验结果和对AIPC的影响 2505.15030v3

Authors (8): Qingyu Song, Peiyu Liao, Wenqian Zhao, Yiwen Wang, Shoubo Hu, Hui-Ling Zhen, Ning Jiang, Mingxuan Yuan

The increasing deployment of Large Language Models (LLMs) on edge devices, driven by model advancements and hardware improvements, offers significant privacy benefits. However, these on-device LLMs inherently face performance limitations due to reduced model capacity and necessary compression techniques. To address this, we introduce a systematic methodology – encompassing model capability, development efficiency, and system resources – for evaluating on-device LLMs. Our comprehensive evaluation, encompassing models from 0.5B to 14B parameters and seven post-training quantization (PTQ) methods on commodity laptops, yields several critical insights: 1) System-level metrics exhibit near-linear scaling with effective bits-per-weight (BPW). 2) A practical threshold exists around $\sim$3.5 effective BPW: larger models subjected to low-bit quantization consistently outperform smaller models utilizing higher bit-precision. 3) Quantization with low BPW incurs marginal accuracy loss but significant memory savings. 4) Power consumption on CPU is determined by low-level implementation specifics, with computation-intensive operations consuming more power than memory-intensive ones. These findings offer crucial insights and practical guidelines for the efficient deployment and optimized configuration of LLMs on resource-constrained edge devices. Our codebase is available at https://github.com/simmonssong/LLMOnDevice.


Article 1442

Title@2025-06-07 (6): Policy Filtration for RLHF to Mitigate Noise in Reward Models

Title: Policy Filtration for RLHF to Mitigate Noise in Reward Models Politische Filtration für RLHF zur Mititation von Lärm in Prämienmodellen 将RLHF政策归类为奖励模型中最小噪音的政策 2409.06957v5

Authors (7): Chuheng Zhang, Wei Shen, Li Zhao, Xuyun Zhang, Xiaolong Xu, Wanchun Dou, Jiang Bian

While direct policy optimization methods exist, pioneering LLMs are fine-tuned with reinforcement learning from human feedback (RLHF) to generate better responses under the supervision of a reward model learned from preference data. One major challenge of RLHF is the inaccuracy of the intermediate reward model, especially in tasks that require complex reasoning for the reward model to score a response. We find that the reliability of the reward model varies across responses assigned different rewards. This motivates us to filter out the samples whose rewards may be unreliable to improve the signal-to-noise ratio during policy learning, resulting in Policy Filtration for Proximal Policy Optimization (PF-PPO). To choose a proper policy filtering strategy, we use the coefficient of determination ($R^2$) between the rewards and actual scores on filtered samples as the metric for finding promising strategies, since it measures how well the rewards filtered by PF-PPO indicate real performance. We provide extensive experiments to validate the effectiveness of PF-PPO in code generation and math reasoning tasks. In code generation, PF-PPO achieves state-of-the-art performance among 7-billion-parameter models on HumanEval (+7.9%), MBPP (+0.7%), and LeetCode Contest (+10.0%), a more challenging benchmark that we created. In math reasoning, PF-PPO yields performance increases using different reward models and benchmarks (Ape210K and CMATH). Code is available at https://github.com/DtYXs/verl/tree/pf-ppo.
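
The selection metric can be illustrated with a small sketch. It assumes $R^2$ is taken as the coefficient of determination of a simple linear fit of actual scores on rewards (equivalently, the squared Pearson correlation); the abstract does not say exactly how the paper groups samples before computing it, and the toy reward and score values below are made up.

```python
def r_squared(rewards, scores):
    """Coefficient of determination of a simple linear fit of scores on rewards.

    For one predictor with an intercept this equals the squared Pearson
    correlation. Assumes neither input is constant.
    """
    n = len(rewards)
    mean_r = sum(rewards) / n
    mean_s = sum(scores) / n
    s_rs = sum((r - mean_r) * (s - mean_s) for r, s in zip(rewards, scores))
    s_rr = sum((r - mean_r) ** 2 for r in rewards)
    s_ss = sum((s - mean_s) ** 2 for s in scores)
    return (s_rs * s_rs) / (s_rr * s_ss)


if __name__ == "__main__":
    # Hypothetical reward-model outputs and ground-truth pass rates.
    rewards = [0.1, 0.3, 0.5, 0.7, 0.9]
    scores = [0.0, 0.6, 0.2, 0.3, 1.0]
    print("R^2 on all samples:", r_squared(rewards, scores))
```

A filtering strategy would then be preferred when the samples it retains give a higher $R^2$, meaning the rewards that survive the filter track real performance more closely.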


Article 1443

Title@2025-06-07 (6): Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning

Title: Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning Lehrplan Verstärkung Lernen von einfachen bis harten Aufgaben verbessert LLM Reasoning 从易到困难的工作学习改进LLM 理由 2506.06632v1

Authors (11): Shubham Parashar, Shurui Gui, Xiner Li, Hongyi Ling, Sushil Vemuri, Blake Olson, Eric Li, Yu Zhang, James Caverlee, Dileep Kalathil, Shuiwang Ji

We aim to improve the reasoning capabilities of language models via reinforcement learning (RL). Recent RL post-trained models like DeepSeek-R1 have demonstrated reasoning abilities on mathematical and coding tasks. However, prior studies suggest that using RL alone to improve reasoning on inherently difficult tasks is less effective. Here, we draw inspiration from curriculum learning and propose to schedule tasks from easy to hard (E2H), allowing LLMs to build reasoning skills gradually. Our method is termed E2H Reasoner. Empirically, we observe that, although easy tasks are important initially, fading them out through appropriate scheduling is essential in preventing overfitting. Theoretically, we establish convergence guarantees for E2H Reasoner within an approximate policy iteration framework. We derive finite-sample complexity bounds and show that when tasks are appropriately decomposed and conditioned, learning through curriculum stages requires fewer total samples than direct learning. Experiments across multiple domains show that E2H Reasoner significantly improves the reasoning ability of small LLMs (1.5B to 3B), which otherwise struggle when trained with vanilla RL alone, highlighting the effectiveness of our method.


Article 1444

Title@2025-06-07 (6): Q-WSL: Optimizing Goal-Conditioned RL with Weighted Supervised Learning via Dynamic Programming

Title: Q-WSL: Optimizing Goal-Conditioned RL with Weighted Supervised Learning via Dynamic Programming Q-WSL: Optimierung der zielkonditionierten RL mit gewichtetem betreutem Lernen durch dynamische Programmierung Q-WSL:通过动态方案拟订,以加权监督的学习方式,优化目标有条件的RL 2410.06648v5

Authors (4): Xing Lei, Xuetao Zhang, Zifeng Zhuang, Donglin Wang

A novel class of advanced algorithms, termed Goal-Conditioned Weighted Supervised Learning (GCWSL), has recently emerged to tackle the challenges posed by sparse rewards in goal-conditioned reinforcement learning (RL). GCWSL consistently delivers strong performance across a diverse set of goal-reaching tasks due to its simplicity, effectiveness, and stability. However, GCWSL methods lack a crucial capability known as trajectory stitching, which is essential for learning optimal policies when faced with unseen skills during testing. This limitation becomes particularly pronounced when the replay buffer is predominantly filled with sub-optimal trajectories. In contrast, traditional TD-based RL methods, such as Q-learning, which utilize Dynamic Programming, do not face this issue but often experience instability due to the inherent difficulties in value function approximation. In this paper, we propose Q-learning Weighted Supervised Learning (Q-WSL), a novel framework designed to overcome the limitations of GCWSL by incorporating the strengths of Dynamic Programming found in Q-learning. Q-WSL leverages Dynamic Programming results to output the optimal action of (state, goal) pairs across different trajectories within the replay buffer. This approach synergizes the strengths of both Q-learning and GCWSL, effectively mitigating their respective weaknesses and enhancing overall performance. Empirical evaluations on challenging goal-reaching tasks demonstrate that Q-WSL surpasses other goal-conditioned approaches in terms of both performance and sample efficiency. Additionally, Q-WSL exhibits notable robustness in environments characterized by binary reward structures and environmental stochasticity.


Article 1445

Title@2025-06-07 (6): Certifiably Robust Model Evaluation in Federated Learning under Meta-Distributional Shifts

Title: Certifiably Robust Model Evaluation in Federated Learning under Meta-Distributional Shifts Zertifiziert robuste Modellbewertung im Federated Learning unter Meta-Distributionsverschiebungen 在混合分配变化下对联邦学习模式进行可验证的健全模式评价 2410.20250v2

Authors (3): Amir Najafi, Samin Mahdizadeh Sani, Farzan Farnia

We address the challenge of certifying the performance of a federated learning model on an unseen target network using only measurements from the source network that trained the model. Specifically, consider a source network “A” with $K$ clients, each holding private, non-IID datasets drawn from heterogeneous distributions, modeled as samples from a broader meta-distribution $\mu$. Our goal is to provide certified guarantees for the model’s performance on a different, unseen network “B”, governed by an unknown meta-distribution $\mu'$, assuming the deviation between $\mu$ and $\mu'$ is bounded either in Wasserstein distance or an $f$-divergence. We derive worst-case uniform guarantees for both the model’s average loss and its risk CDF, the latter corresponding to a novel, adversarially robust version of the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality. In addition, we show how the vanilla DKW bound enables principled certification of the model’s true performance on unseen clients within the same (source) network. Our bounds are efficiently computable, asymptotically minimax optimal, and preserve clients’ privacy. We also establish non-asymptotic generalization bounds that converge to zero as $K$ grows and the minimum per-client sample size exceeds $\mathcal{O}(\log K)$. Empirical evaluations confirm the practical utility of our bounds across real-world tasks. The project code is available at: github.com/samin-mehdizadeh/Robust-Evaluation-DKW
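
The vanilla DKW bound mentioned above has a simple closed form: with $n$ i.i.d. samples, the empirical CDF is within $\epsilon = \sqrt{\ln(2/\delta)/(2n)}$ of the true CDF uniformly, with probability at least $1-\delta$. The sketch below applies it to per-client losses; treating each client's loss as one i.i.d. draw is a simplifying assumption, the loss values are synthetic, and the paper's adversarially robust version for network "B" is not reproduced here.

```python
import math


def dkw_epsilon(n, delta):
    """Half-width of the uniform DKW confidence band for n i.i.d. samples."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def empirical_cdf(samples, t):
    """Fraction of samples that are at most t."""
    return sum(s <= t for s in samples) / len(samples)


def cdf_band(samples, t, delta):
    """Band that contains the true CDF at t (and at every other t
    simultaneously) with probability at least 1 - delta."""
    eps = dkw_epsilon(len(samples), delta)
    f_hat = empirical_cdf(samples, t)
    return max(f_hat - eps, 0.0), min(f_hat + eps, 1.0)


if __name__ == "__main__":
    # Hypothetical per-client losses from K = 200 source clients.
    losses = [0.2 + 0.003 * i for i in range(200)]
    low, high = cdf_band(losses, t=0.5, delta=0.05)
    print(f"P(loss <= 0.5) lies in [{low:.3f}, {high:.3f}] with 95% confidence")
```

The band width shrinks as $1/\sqrt{K}$ in the number of clients, which matches the abstract's statement that the guarantees tighten as $K$ grows.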


Article 1446

Title@2025-06-07 (6): Equilibrium Distribution for t-Distributed Stochastic Neighbor Embedding with Generalized Kernels

Title: Equilibrium Distribution for t-Distributed Stochastic Neighbor Embedding with Generalized Kernels Gleichgewichtsverteilung für t-Distributed Stochastic Neighbor Einbettung mit generalisierten Kernen 以一般内核嵌入的三散散散蒸散邻居间均匀分布 2505.24311v2

Authors (1): Yi Gu

T-distributed stochastic neighbor embedding (t-SNE) is a well-known algorithm for visualizing high-dimensional data by finding low-dimensional representations. In this paper, we study the convergence of t-SNE with generalized kernels and extend the results of Auffinger and Fletcher in 2023. Our work starts by giving a concrete formulation of generalized input and output kernels. Then we prove that under certain conditions, the t-SNE algorithm converges to an equilibrium distribution for a wide range of input and output kernels as the number of data points diverges.


Article 1447

Title@2025-06-07 (6): Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations

Title: Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations Robuste Erlernbarkeit von Proben-kompressiblen Verteilungen unter Lärm oder störenden Störungen 在噪音或对称干扰下抽样压缩分发的可有力学习性 2506.06613v1

Authors (2): Arefe Boushehrian, Amir Najafi

Learning distribution families over $\mathbb{R}^d$ is a fundamental problem in unsupervised learning and statistics. A central question in this setting is whether a given family of distributions possesses sufficient structure to be (at least) information-theoretically learnable and, if so, to characterize its sample complexity. In 2018, Ashtiani et al. reframed sample compressibility, originally due to Littlestone and Warmuth (1986), as a structural property of distribution classes, proving that it guarantees PAC-learnability. This discovery subsequently enabled a series of recent advancements in deriving nearly tight sample complexity bounds for various high-dimensional open problems. It has been further conjectured that the converse also holds: every learnable class admits a tight sample compression scheme. In this work, we establish that sample compressible families remain learnable even from perturbed samples, subject to a set of necessary and sufficient conditions. We analyze two models of data perturbation: (i) an additive independent noise model, and (ii) an adversarial corruption model, where an adversary manipulates a limited subset of the samples unknown to the learner. Our results are general and rely on minimal assumptions. We develop a perturbation-quantization framework that interfaces naturally with the compression scheme and leads to sample complexity bounds that scale gracefully with the noise level and corruption budget. As concrete applications, we establish new sample complexity bounds for learning finite mixtures of high-dimensional uniform distributions under both noise and adversarial perturbations, as well as for learning Gaussian mixture models from adversarially corrupted samples, resolving two open problems in the literature.


Article 1448

Title@2025-06-07 (6): Transferring Features Across Language Models With Model Stitching

Title: Transferring Features Across Language Models With Model Stitching Übertragung von Funktionen über Sprachmodelle mit Modellstich 使用模型裁剪的跨语言模型传输功能 2506.06609v1

Authors (4): Alan Chen, Jack Merullo, Alessandro Stolfo, Ellie Pavlick

In this work, we demonstrate that affine mappings between residual streams of language models are a cheap way to effectively transfer represented features between models. We apply this technique to transfer the weights of Sparse Autoencoders (SAEs) between models of different sizes to compare their representations. We find that small and large models learn highly similar representation spaces, which motivates training expensive components like SAEs on a smaller model and transferring them to a larger model to save FLOPs. For example, using a small-to-large transferred SAE as initialization can lead to 50% cheaper training runs when training SAEs on larger models. Next, we show that transferred probes and steering vectors can effectively recover ground truth performance. Finally, we dive deeper into feature-level transferability, finding that semantic and structural features transfer noticeably differently while specific classes of functional features have their roles faithfully mapped. Overall, our findings illustrate similarities and differences in the linear representation spaces of small and large models and demonstrate a method for improving the training efficiency of SAEs.



Article 1449

Title@2025-06-07 (6): Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit

Title: Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit Training-Free Tokenizer Transplantation über Orthogonal Passende Verfolgung 通过矫形配对运动进行无培训的取暖器移植 2506.06607v1

Authors (2): Charles Goddard, Fernando Fernandes Neto

We present a training-free method to transplant tokenizers in pretrained large language models (LLMs) by reconstructing unseen token embeddings via Orthogonal Matching Pursuit (OMP). Specifically, we approximate each out-of-vocabulary token as a sparse linear combination of shared tokens, in two phases: first, compute each new token’s representation in the donor embedding space with a small dictionary of shared anchor tokens, then transfer these same sparse coefficients back into the base model’s embedding space. On two challenging cross-tokenizer tasks–Llama$\to$Mistral NeMo (12B) and Qwen$\to$Llama (1B)–we show that OMP achieves best zero-shot preservation of the base model’s performance across multiple benchmarks, while other zero-shot approaches degrade significantly. Compared to baselines (zero-init, mean-init, and existing approaches like WECHSEL, FOCUS, ZETT), OMP consistently achieves the best overall performance, effectively bridging large tokenizer discrepancies without gradient updates. Our analysis further identifies mismatched numerical tokenization schemes as a critical challenge for preserving mathematical reasoning capabilities. This technique enables direct reuse of pretrained model weights with new tokenizers, facilitating cross-tokenizer knowledge distillation, speculative decoding, ensembling, merging, and domain-specific vocabulary adaptations. We integrate our method into the open-source mergekit-tokensurgeon tool for post hoc vocabulary realignment.
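
The two phases described above can be sketched as follows: a plain greedy OMP solves for sparse coefficients over shared anchor tokens in the donor space, and the same coefficients are then applied to the anchors' embeddings in the base space. The function `omp`, the toy matrices, and the sparsity level `k` are illustrative assumptions; this is not the mergekit-tokensurgeon implementation.

```python
import numpy as np

def omp(dictionary, target, k):
    """Greedy Orthogonal Matching Pursuit.

    dictionary: (n_atoms, dim), one anchor embedding per row
    target:     (dim,) vector to approximate
    Returns the chosen atom indices and their least-squares coefficients.
    """
    residual = target.astype(float).copy()
    chosen = []
    coeffs = np.zeros(0)
    for _ in range(k):
        scores = np.abs(dictionary @ residual)
        if chosen:
            scores[chosen] = -np.inf          # never pick the same atom twice
        chosen.append(int(np.argmax(scores)))
        A = dictionary[chosen].T              # (dim, number of chosen atoms)
        coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
        residual = target - A @ coeffs        # re-fit, then update the residual
    return chosen, coeffs

# Toy setup: 4 shared anchor tokens, donor space of dim 4, base space of dim 3.
donor_anchors = np.eye(4)
base_anchors = np.arange(12, dtype=float).reshape(4, 3)
new_token_donor = np.array([0.0, 0.5, 0.0, -1.0])

# Phase 1: sparse code in the donor space. Phase 2: reuse it in the base space.
idx, w = omp(donor_anchors, new_token_donor, k=2)
new_token_base = w @ base_anchors[idx]
```

The sketch relies on the anchor rows being index-aligned across the two embedding matrices, so that coefficient `w[j]` refers to the same token in both spaces.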



Article 1450

Title@2025-06-07 (6): Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization

Title: Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization Stacey: Förderung des stochastischen Steepest Descents durch beschleunigte Optimierung von $\ell_p$-Smooth Nonconvex Stacey: 通过加速 $\ ell_ p$- Smooth 的非convex 优化, 推广斯托切性隐蔽源 2506.06606v1

Authors (6): Xinyu Luo, Cedar Site Bai, Bolian Li, Petros Drineas, Ruqi Zhang, Brian Bullins

While popular optimization methods such as SGD, AdamW, and Lion depend on steepest descent updates in either $\ell_2$ or $\ell_\infty$ norms, there remains a critical gap in handling the non-Euclidean structure observed in modern deep network training. In this work, we address this need by introducing a new accelerated $\ell_p$ steepest descent algorithm, called Stacey, which uses interpolated primal-dual iterate sequences to effectively navigate non-Euclidean smooth optimization tasks. In addition to providing novel theoretical guarantees for the foundations of our algorithm, we empirically compare our approach against these popular methods on tasks including image classification and language model (LLM) pretraining, demonstrating both faster convergence and higher final accuracy. We further evaluate different values of $p$ across various models and datasets, underscoring the importance and efficiency of non-Euclidean approaches over standard Euclidean methods. Code can be found at https://github.com/xinyuluo8561/Stacey .
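
For orientation, the sketch below computes a plain (unaccelerated, deterministic) $\ell_p$ steepest descent direction, i.e. the minimizer of $\langle g, d\rangle + \tfrac12\|d\|_p^2$ for $1 < p < \infty$. It is not Stacey itself, which adds acceleration via interpolated primal-dual sequences; the function name is hypothetical.

```python
import numpy as np

def lp_steepest_descent_direction(grad, p):
    """Minimizer of <grad, d> + 0.5 * ||d||_p**2 for 1 < p < inf."""
    q = p / (p - 1.0)                       # dual exponent: 1/p + 1/q = 1
    gq = np.linalg.norm(grad, ord=q)        # dual norm of the gradient
    if gq == 0.0:
        return np.zeros_like(grad)
    # d_i = -sign(g_i) * |g_i|**(q-1) * ||g||_q**(2-q)
    return -np.sign(grad) * np.abs(grad) ** (q - 1.0) * gq ** (2.0 - q)

g = np.array([3.0, -4.0])
d2 = lp_steepest_descent_direction(g, 2.0)  # p = 2 recovers plain gradient descent
d4 = lp_steepest_descent_direction(g, 4.0)  # p > 2 moves coordinates more evenly
```

For $p = 2$ the direction is exactly $-g$, and as $p$ grows the update approaches a scaled sign of the gradient, which is the $\ell_\infty$ case mentioned in the abstract.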



Article 1451

Title@2025-06-07 (6): Scoring the Unscorables: Cyber Risk Assessment Beyond Internet Scans

Title: Scoring the Unscorables: Cyber Risk Assessment Beyond Internet Scans Bewertung der Unscorables: Cyber-Risikobewertung jenseits von Internet-Scans CSCOW:互联网扫描之外的网络风险评估 2506.06604v1

Authors (3): Armin Sarabi, Manish Karir, Mingyan Liu

In this paper we present a study on using novel data types to perform cyber risk quantification by estimating the likelihood of a data breach. We demonstrate that it is feasible to build a highly accurate cyber risk assessment model using public and readily available technology signatures obtained from crawling an organization’s website. This approach overcomes the limitations of previous similar approaches that relied on large-scale IP address based scanning data, which suffers from incomplete/missing IP address mappings as well as the lack of such data for large numbers of small and medium-sized organizations (SMEs). In comparison to scan data, technology digital signature data is more readily available for millions of SMEs. Our study shows that there is a strong relationship between these technology signatures and an organization’s cybersecurity posture. In cross-validating our model using different cyber incident datasets, we also highlight the key differences between ransomware attack victims and the larger population of cyber incident and data breach victims.



Article 1452

Title: CAtCh: Cognitive Assessment through Cookie Thief CAtCh: Kognitive Bewertung durch Cookie Dieb CATCH: 通过饼干小偷进行认知评估 2506.06603v1

Authors (8): Joseph T Colonel, Carolyn Hagler, Guiselle Wismer, Laura Curtis, Jacqueline Becker, Juan Wisnivesky, Alex Federman, Gaurav Pandey

Several machine learning algorithms have been developed for the prediction of Alzheimer’s disease and related dementia (ADRD) from spontaneous speech. However, none of these algorithms have been translated for the prediction of broader cognitive impairment (CI), which in some cases is a precursor and risk factor of ADRD. In this paper, we evaluated several speech-based open-source methods originally proposed for the prediction of ADRD, as well as methods from multimodal sentiment analysis for the task of predicting CI from patient audio recordings. Results demonstrated that multimodal methods outperformed unimodal ones for CI prediction, and that acoustics-based approaches performed better than linguistics-based ones. Specifically, interpretable acoustic features relating to affect and prosody were found to significantly outperform BERT-based linguistic features and interpretable linguistic features, respectively. All the code developed for this study is available at https://github.com/JTColonel/catch.



Article 1453

Title@2025-06-07 (6): Direct Prediction Set Minimization via Bilevel Conformal Classifier Training

Title: Direct Prediction Set Minimization via Bilevel Conformal Classifier Training Direkte Vorhersage Einstellung Minimierung über Bilevel Conformal Classifier Training 通过双级正规分类培训尽量减少直接预测设置 2506.06599v1

Authors (6): Yuanjie Shi, Hooman Shahrokhi, Xuesong Jia, Xiongzhi Chen, Janardhan Rao Doppa, Yan Yan

Conformal prediction (CP) is a promising uncertainty quantification framework which works as a wrapper around a black-box classifier to construct prediction sets (i.e., subsets of candidate classes) with provable guarantees. However, standard calibration methods for CP tend to produce large prediction sets, which makes them less useful in practice. This paper considers the problem of integrating conformal principles into the training process of deep classifiers to directly minimize the size of prediction sets. We formulate conformal training as a bilevel optimization problem and propose the {\em Direct Prediction Set Minimization (DPSM)} algorithm to solve it. The key insight behind DPSM is to minimize a measure of the prediction set size (upper level) that is conditioned on the learned quantile of conformity scores (lower level). We show that DPSM has a learning bound of $O(1/\sqrt{n})$ (with $n$ training samples), while prior conformal training methods based on stochastic approximation for the quantile have a bound of $\Omega(1/s)$ (with batch size $s$ and typically $s \ll \sqrt{n}$). Experiments on various benchmark datasets and deep models show that DPSM significantly outperforms the best prior conformal training baseline with $20.46\%\downarrow$ in the prediction set size and validates our theory.
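
To make the "wrapper around a black-box classifier" concrete, here is a minimal sketch of standard split conformal prediction with the score 1 − p(true class). It shows the post-hoc calibration baseline that conformal training aims to improve on, not the DPSM algorithm; the function names and the tiny calibration set are assumptions.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Quantile of calibration scores giving roughly 1 - alpha coverage."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]   # nonconformity scores
    rank = int(np.ceil((n + 1) * (1 - alpha)))           # 1-based order statistic
    if rank > n:
        return np.inf                                    # too few calibration points
    return np.sort(scores)[rank - 1]

def prediction_set(probs, threshold):
    """All classes whose score 1 - p does not exceed the threshold."""
    return [int(c) for c in np.flatnonzero(1.0 - probs <= threshold)]

cal_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.6, 0.3],
                      [0.3, 0.3, 0.4]])
cal_labels = np.array([0, 1, 2])
tau = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
pred = prediction_set(np.array([0.5, 0.45, 0.05]), tau)
```

The set size depends on how concentrated the classifier's probabilities are, which is why training the classifier with the quantile in the loop can shrink the sets.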



Article 1454

Title@2025-06-07 (6): Scaling Inference-Efficient Language Models

Title: Scaling Inference-Efficient Language Models Skalierung von Schlussfolgerungen-Effiziente Sprachmodelle 扩展率-有效语言模型 2501.18107v2

Authors (3): Song Bian, Minghao Yan, Shivaram Venkataraman

Scaling laws are powerful tools to predict the performance of large language models. However, current scaling laws fall short of accounting for inference costs. In this work, we first show that model architecture affects inference latency, where models of the same size can have up to 3.5x difference in latency. To tackle this challenge, we modify the Chinchilla scaling laws to co-optimize the model parameter count, the number of training tokens, and the model architecture. Because models of similar training loss exhibit gaps in downstream evaluation, we also propose a novel method to train inference-efficient models based on the revised scaling laws. We perform extensive empirical studies to fit and evaluate our inference-aware scaling laws. We vary model parameters from 80M to 1B, training tokens from 1.6B to 30B, and model shapes, training 63 models. Guided by our inference-efficient scaling law and model selection method, we release the Morph-1B model, which improves inference latency by 1.8x while maintaining accuracy on downstream tasks compared to open-source models, pushing the Pareto frontier of accuracy-latency tradeoff. Notably, our experiments reveal that wider and shallower models can yield efficiency gains while preserving accuracy.



Article 1455

Title@2025-06-07 (6): From Model-Based and Adaptive Control to Evolving Fuzzy Control

Title: From Model-Based and Adaptive Control to Evolving Fuzzy Control Von der modellbasierten und adaptiven Steuerung zur sich entwickelnden Fuzzy-Steuerung 从模型和适应控制到演化模糊控制 2506.06594v1

Authors (3): Daniel Leite, Igor Škrjanc, Fernando Gomide

Evolving fuzzy systems build and adapt fuzzy models - such as predictors and controllers - by incrementally updating their rule-base structure from data streams. On the occasion of the 60-year anniversary of fuzzy set theory, commemorated during the Fuzz-IEEE 2025 event, this brief paper revisits the historical development and core contributions of classical fuzzy and adaptive modeling and control frameworks. It then highlights the emergence and significance of evolving intelligent systems in fuzzy modeling and control, emphasizing their advantages in handling nonstationary environments. Key challenges and future directions are discussed, including safety, interpretability, and principled structural evolution.



Article 1456

Title@2025-06-06 (5): Improving the Diffusability of Autoencoders

Title: Improving the Diffusability of Autoencoders Verbesserung der Diffusibilität von Autoencodern 提高自动编码器的易变性 2502.14831v3

Authors (8): Ivan Skorokhodov, Sharath Girish, Benran Hu, Willi Menapace, Yanyu Li, Rameen Abdal, Sergey Tulyakov, Aliaksandr Siarohin

Latent diffusion models have emerged as the leading approach for generating high-quality images and videos, utilizing compressed latent representations to reduce the computational burden of the diffusion process. While recent advancements have primarily focused on scaling diffusion backbones and improving autoencoder reconstruction quality, the interaction between these components has received comparatively less attention. In this work, we perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces, which are especially pronounced in the autoencoders with a large bottleneck channel size. We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders the generation quality. To mitigate the issue, we propose scale equivariance: a simple regularization strategy that aligns latent and RGB spaces across frequencies by enforcing scale equivariance in the decoder. It requires minimal code changes and only up to 20K autoencoder fine-tuning steps, yet significantly improves generation quality, reducing FID by 19% for image generation on ImageNet-1K $256^2$ and FVD by at least 44% for video generation on Kinetics-700 $17 \times 256^2$. The source code is available at https://github.com/snap-research/diffusability.



Article 1457

Title@2025-06-06 (5): Neighbour-Driven Gaussian Process Variational Autoencoders for Scalable Structured Latent Modelling

Title: Neighbour-Driven Gaussian Process Variational Autoencoders for Scalable Structured Latent Modelling Nachbar-getriebene Gauß-Prozessvariationelle Autoencoder für skalierbare strukturierte Latentenmodellierung 相邻的Driven Gaussian 用于可缩放结构晚期模拟模型的流程变异自动编码器 2505.16481v2

Authors (3): Xinxing Shi, Xiaoyu Jiang, Mauricio A. Álvarez

Gaussian Process (GP) Variational Autoencoders (VAEs) extend standard VAEs by replacing the fully factorised Gaussian prior with a GP prior, thereby capturing richer correlations among latent variables. However, performing exact GP inference in large-scale GPVAEs is computationally prohibitive, often forcing existing approaches to rely on restrictive kernel assumptions or large sets of inducing points. In this work, we propose a neighbour-driven approximation strategy that exploits local adjacencies in the latent space to achieve scalable GPVAE inference. By confining computations to the nearest neighbours of each data point, our method preserves essential latent dependencies, allowing more flexible kernel choices and mitigating the need for numerous inducing points. Through extensive experiments on tasks including representation learning, data imputation, and conditional generation, we demonstrate that our approach outperforms other GPVAE variants in both predictive performance and computational efficiency.



Article 1458

Title@2025-06-06 (5): Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures

Title: Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures Globale Konvergenz von Gradienten EM für überparameterisierte Gauß-Mischungen 用于超度测量高斯混合体的梯度 EM 全球趋同 2506.06584v1

Authors (4): Mo Zhou, Weihang Xu, Maryam Fazel, Simon S. Du

Learning Gaussian Mixture Models (GMMs) is a fundamental problem in machine learning, with the Expectation-Maximization (EM) algorithm and its popular variant gradient EM being arguably the most widely used algorithms in practice. In the exact-parameterized setting, where both the ground truth GMM and the learning model have the same number of components $m$, a vast line of work has aimed to establish rigorous recovery guarantees for EM. However, global convergence has only been proven for the case of $m=2$, and EM is known to fail to recover the ground truth when $m\geq 3$. In this paper, we consider the $\textit{over-parameterized}$ setting, where the learning model uses $n>m$ components to fit an $m$-component ground truth GMM. In contrast to the exact-parameterized case, we provide a rigorous global convergence guarantee for gradient EM. Specifically, for any well separated GMMs in general position, we prove that with only mild over-parameterization $n = \Omega(m\log m)$, randomly initialized gradient EM converges globally to the ground truth at a polynomial rate with polynomial samples. Our analysis proceeds in two stages and introduces a suite of novel tools for Gaussian Mixture analysis. We use Hermite polynomials to study the dynamics of gradient EM and employ tensor decomposition to characterize the geometric landscape of the likelihood loss. This is the first global convergence and recovery result for EM or Gradient EM beyond the special case of $m=2$.
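
As a point of reference, the sketch below shows one gradient EM step on the component means of a mixture with identity covariances and uniform weights: compute responsibilities (E-step), then take a single gradient step on the average log-likelihood instead of the full M-step. The simplified model, the function name, and the step size are assumptions; the paper's exact setting may differ.

```python
import numpy as np

def gradient_em_step(X, mu, lr):
    """One gradient EM update of the means.

    X:  (N, d) samples
    mu: (n, d) current means of the n fitted components
    """
    diff = X[:, None, :] - mu[None, :, :]            # (N, n, d)
    logits = -0.5 * (diff ** 2).sum(axis=-1)         # log-density up to a constant
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)          # responsibilities r_ij
    grad = (resp[:, :, None] * diff).mean(axis=0)    # gradient of avg log-likelihood
    return mu + lr * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) + np.array([2.0, -1.0])

# Over-parameterized fit: 3 components for data from a single Gaussian.
mu = rng.normal(size=(3, 2))
for _ in range(200):
    mu = gradient_em_step(X, mu, lr=0.5)
```

With a single fitted component and `lr=1.0`, one step lands exactly on the sample mean, which is a quick sanity check of the gradient formula.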



Article 1459

Title@2025-06-06 (5): Demystifying Topological Message-Passing with Relational Structures: A Case Study on Oversquashing in Simplicial Message-Passing

Title: Demystifying Topological Message-Passing with Relational Structures: A Case Study on Oversquashing in Simplicial Message-Passing Entmystifizieren Topologisches Message-Passing mit relationalen Strukturen: Eine Fallstudie zum Übersetzen in Simplicial Message-Passing 解开神秘的地形信息与关系结构的粘贴:关于简单信息传送中过度隔绝的案例研究 2506.06582v1

Authors (5): Diaaeldin Taha, James Chapman, Marzieh Eidi, Karel Devriendt, Guido Montúfar

Topological deep learning (TDL) has emerged as a powerful tool for modeling higher-order interactions in relational data. However, phenomena such as oversquashing in topological message-passing remain understudied and lack theoretical analysis. We propose a unifying axiomatic framework that bridges graph and topological message-passing by viewing simplicial and cellular complexes and their message-passing schemes through the lens of relational structures. This approach extends graph-theoretic results and algorithms to higher-order structures, facilitating the analysis and mitigation of oversquashing in topological message-passing networks. Through theoretical analysis and empirical studies on simplicial networks, we demonstrate the potential of this framework to advance TDL.



Article 1460

Title@2025-06-06 (5): Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques

Title: Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques Auf dem Weg zu einer effizienten Multi-LLM-Inferenz: Charakterisierung und Analyse von LLM Routing und Hierarchical Techniques 实现高效的多LLLM 多级LLM推理:LLM转流和等级技术的特征和分析 2506.06579v1

Authors (5): Adarsh Prasad Behera, Jaya Prakash Champati, Roberto Morabito, Sasu Tarkoma, James Gross

Recent progress in Language Models (LMs) has dramatically advanced the field of natural language processing (NLP), excelling at tasks like text generation, summarization, and question answering. However, their inference remains computationally expensive and energy intensive, especially in settings with limited hardware, power, or bandwidth. This makes it difficult to deploy LMs in mobile, edge, or cost sensitive environments. To address these challenges, recent approaches have introduced multi LLM intelligent model selection strategies that dynamically allocate computational resources based on query complexity – using lightweight models for simpler queries and escalating to larger models only when necessary. This survey explores two complementary strategies for efficient LLM inference: (i) routing, which selects the most suitable model based on the query, and (ii) cascading or hierarchical inference (HI), which escalates queries through a sequence of models until a confident response is found. Both approaches aim to reduce computation by using lightweight models for simpler tasks while offloading only when needed. We provide a comparative analysis of these techniques across key performance metrics, discuss benchmarking efforts, and outline open challenges. Finally, we outline future research directions to enable faster response times, adaptive model selection based on task complexity, and scalable deployment across heterogeneous environments, making LLM based systems more efficient and accessible for real world applications.
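
The two strategies named above can be sketched in a few lines. Here each "model" is a hypothetical callable returning an answer and a confidence score, and the router is a hypothetical function returning a model index; real systems would replace these stubs with actual LLM calls and learned routers or confidence estimators.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], Tuple[str, float]]   # returns (answer, confidence in [0, 1])

def route(query: str, models: List[Model], pick: Callable[[str], int]) -> str:
    """Routing: choose one model up front and call only that model."""
    answer, _ = models[pick(query)](query)
    return answer

def cascade(query: str, models: List[Model], threshold: float) -> Tuple[str, int]:
    """Cascading: try models from cheapest to largest until one is confident."""
    answer, used = "", 0
    for used, model in enumerate(models, start=1):
        answer, confidence = model(query)
        if confidence >= threshold:
            break                             # stop escalating
    return answer, used                       # used = number of models called

# Stub models: the small one is only confident on short queries.
def small_model(query: str) -> Tuple[str, float]:
    return "short answer", 0.9 if len(query) < 20 else 0.3

def large_model(query: str) -> Tuple[str, float]:
    return "detailed answer", 0.95

models = [small_model, large_model]
easy = cascade("2+2?", models, threshold=0.8)
hard = cascade("Explain the proof of this long theorem", models, threshold=0.8)
routed = route("2+2?", models, pick=lambda q: 0 if len(q) < 20 else 1)
```

The trade-off is visible even in the stubs: routing pays for exactly one call but depends on the router being right, while a cascade may pay for several calls but only needs a usable confidence signal.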



Article 1461

Title@2025-06-06 (5): Generalization of Geometric Graph Neural Networks with Lipschitz Loss Functions

Title: Generalization of Geometric Graph Neural Networks with Lipschitz Loss Functions Generalisierung geometrischer Graphen-Neuralnetze mit Lipschitz-Verlustfunktionen 具有Lipschitz损失功能的几何图神经网络的通用化 2409.05191v2

Authors (3): Zhiyang Wang, Juan Cervino, Alejandro Ribeiro

In this paper, we study the generalization capabilities of geometric graph neural networks (GNNs). We consider GNNs over a geometric graph constructed from a finite set of randomly sampled points over an embedded manifold with topological information captured. We prove a generalization gap between the optimal empirical risk and the optimal statistical risk of this GNN, which decreases with the number of sampled points from the manifold and increases with the dimension of the underlying manifold. This generalization gap ensures that the GNN trained on a graph on a set of sampled points can be utilized to process other unseen graphs constructed from the same underlying manifold. The most important observation is that the generalization capability can be realized with one large graph instead of being limited to the size of the graph as in previous results. The generalization gap is derived based on the non-asymptotic convergence result of a GNN on the sampled graph to the underlying manifold neural networks (MNNs). We verify this theoretical result with experiments on multiple real-world datasets.



Article 1462

Title@2025-06-06 (5): DISC: DISC: Dynamic Decomposition Improves LLM Inference Scaling

Title: DISC: DISC: Dynamic Decomposition Improves LLM Inference Scaling DISC: DISC: Dynamische Zersetzung verbessert LLM-Inferenzskalierung DISC: DISC: 动态分解改善LLM 推推法的扩大 2502.16706v2

Authors (9): Jonathan Light, Wei Cheng, Benjamin Riviere, Wu Yue, Masafumi Oyamada, Mengdi Wang, Yisong Yue, Santiago Paternain, Haifeng Chen

Inference scaling methods for large language models often work by breaking problems into steps or groups of tokens, then sampling and selecting the best next steps. However, these steps and their sizes are usually fixed or manually designed based on domain knowledge. We introduce dynamic decomposition, a method that adaptively and automatically breaks down solution and reasoning traces into manageable steps during inference. By allocating compute more effectively - especially by subdividing difficult steps and prioritizing their sampling - dynamic decomposition significantly boosts inference efficiency. Experiments on benchmarks like APPS, MATH, and LiveCodeBench show that dynamic decomposition outperforms fixed strategies such as token-level, sentence-level, and single-step decompositions, reducing the pass@10 error rate by 5.0%, 6.7%, and 10.5% respectively. These results show the promise of dynamic decomposition for improving a broad range of inference scaling techniques.



Article 1463

Title@2025-06-06 (5): A Manifold Perspective on the Statistical Generalization of Graph Neural Networks

Title: A Manifold Perspective on the Statistical Generalization of Graph Neural Networks Eine vielfältige Perspektive auf die statistische Verallgemeinerung von Graphen-Neuralen Netzwerken 图表神经网络统计一般化的多功能视角 2406.05225v6

Authors (3): Zhiyang Wang, Juan Cervino, Alejandro Ribeiro

Graph Neural Networks (GNNs) extend convolutional neural networks to operate on graphs. Despite their impressive performances in various graph learning tasks, the theoretical understanding of their generalization capability is still lacking. Previous GNN generalization bounds ignore the underlying graph structures, often leading to bounds that increase with the number of nodes – a behavior contrary to the one experienced in practice. In this paper, we take a manifold perspective to establish the statistical generalization theory of GNNs on graphs sampled from a manifold in the spectral domain. As demonstrated empirically, we prove that the generalization bounds of GNNs decrease linearly with the size of the graphs in the logarithmic scale, and increase linearly with the spectral continuity constants of the filter functions. Notably, our theory explains both node-level and graph-level tasks. Our result has two implications: i) guaranteeing the generalization of GNNs to unseen data over manifolds; ii) providing insights into the practical design of GNNs, i.e., restrictions on the discriminability of GNNs are necessary to obtain a better generalization performance. We demonstrate our generalization bounds of GNNs using synthetic and multiple real-world datasets.



Article 1464

Title@2025-06-06 (5): Graph Persistence goes Spectral

Title: Graph Persistence goes Spectral Grafik Persistenz geht Spektral 图 常年 变 2506.06571v1

Authors (3): Mattie Ji, Amauri H. Souza, Vikas Garg

Including intricate topological information (e.g., cycles) provably enhances the expressivity of message-passing graph neural networks (GNNs) beyond the Weisfeiler-Leman (WL) hierarchy. Consequently, Persistent Homology (PH) methods are increasingly employed for graph representation learning. In this context, recent works have proposed decorating classical PH diagrams with vertex and edge features for improved expressivity. However, due to their dependence on features, these methods still fail to capture basic graph structural information. In this paper, we propose SpectRe – a new topological descriptor for graphs that integrates spectral information into PH diagrams. Notably, SpectRe is strictly more expressive than existing descriptors on graphs. We also introduce notions of global and local stability to analyze existing descriptors and establish that SpectRe is locally stable. Finally, experiments on synthetic and real-world datasets demonstrate the effectiveness of SpectRe and its potential to enhance the capabilities of graph models in relevant learning tasks.



Article 1465

Title@2025-06-06 (5): Reconstructing Heterogeneous Biomolecules via Hierarchical Gaussian Mixtures and Part Discovery

Title: Reconstructing Heterogeneous Biomolecules via Hierarchical Gaussian Mixtures and Part Discovery Rekonstruieren heterogene Biomoleküle über hierarchische Gauß-Mischungen und Teil-Entdeckung 通过高山级等级混合体和部件发现,通过等级制混合体重新构造异基因生物分子 2506.09063v1

Authors (4): Shayan Shekarforoush, David B. Lindell, Marcus A. Brubaker, David J. Fleet

Cryo-EM is a transformational paradigm in molecular biology where computational methods are used to infer 3D molecular structure at atomic resolution from extremely noisy 2D electron microscope images. At the forefront of research is how to model the structure when the imaged particles exhibit non-rigid conformational flexibility and compositional variation where parts are sometimes missing. We introduce a novel 3D reconstruction framework with a hierarchical Gaussian mixture model, inspired in part by Gaussian Splatting for 4D scene reconstruction. In particular, the structure of the model is grounded in an initial process that infers a part-based segmentation of the particle, providing essential inductive bias in order to handle both conformational and compositional variability. The framework, called CryoSPIRE, is shown to reveal biologically meaningful structures on complex experimental datasets, and establishes a new state-of-the-art on CryoBench, a benchmark for cryo-EM heterogeneity methods.



Article 1466

Title@2025-06-06 (5): Securing Traffic Sign Recognition Systems in Autonomous Vehicles

Title: Securing Traffic Sign Recognition Systems in Autonomous Vehicles Sicherung von Verkehrszeichenerkennungssystemen in autonomen Fahrzeugen 保障机动车辆交通标志识别系统的安全 2506.06563v1

Authors (3): Thushari Hapuarachchi, Long Dang, Kaiqi Xiong

Deep Neural Networks (DNNs) are widely used for traffic sign recognition because they can automatically extract high-level features from images. These DNNs are trained on large-scale datasets obtained from unknown sources. Therefore, it is important to ensure that the models remain secure and are not compromised or poisoned during training. In this paper, we investigate the robustness of DNNs trained for traffic sign recognition. First, we perform error-minimizing attacks on DNNs used for traffic sign recognition by adding imperceptible perturbations to the training data. Then, we propose a data augmentation-based training method to mitigate the error-minimizing attacks. The proposed training method utilizes nonlinear transformations to disrupt the perturbations and improve the model robustness. We experiment with two well-known traffic sign datasets to demonstrate the severity of the attack and the effectiveness of our mitigation scheme. The error-minimizing attacks reduce the prediction accuracy of the DNNs from 99.90% to 10.6%. However, our mitigation scheme successfully restores the prediction accuracy to 96.05%. Moreover, our approach outperforms adversarial training in mitigating the error-minimizing attacks. Furthermore, we propose a detection model capable of identifying poisoned data even when the perturbations are imperceptible to human inspection. Our detection model achieves a success rate of over 99% in identifying the attack. This research highlights the need to employ advanced training methods for DNNs in traffic sign recognition systems to mitigate the effects of data poisoning attacks.



Article 1467

Title@2025-06-06 (5): Enhancing radioisotope identification in gamma spectra via supervised domain adaptation

Title: Enhancing radioisotope identification in gamma spectra via supervised domain adaptation Verbesserung der Radioisotop-Identifikation in Gammaspektren durch Überwachung der Domänenanpassung 通过受监督的域适应加强伽马光谱中放射性同位素识别 2412.07069v2

Authors (1): Peter Lalor

Machine learning methods in gamma spectroscopy have the potential to provide accurate, real-time classification of unknown radioactive samples. However, obtaining sufficient experimental training data is often prohibitively expensive and time-consuming, and models trained solely on simulated data can struggle to generalize to the unpredictable range of real-world operating scenarios. In this study, we explore how supervised domain adaptation techniques can improve radioisotope identification models by transferring knowledge between different data domains. We begin by pretraining a model for radioisotope identification using data from a synthetic source domain, and then fine-tune it for a new target domain that shares the same label space. Our analysis indicates that fine-tuned models significantly outperform those trained exclusively on source-domain data or solely on target-domain data, particularly in the intermediate data regime ($\approx 10^2$ to $10^5$ target training samples). This conclusion is consistent across four different machine learning architectures (MLP, CNN, Transformer, and LSTM). Furthermore, our findings show that fine-tuned Transformers yield a statistically significant improvement in test performance compared to the other architectures. Overall, this study serves as a proof of concept for applying supervised domain adaptation techniques to gamma spectroscopy in scenarios where experimental data is limited.



Article 1468

Title@2025-06-06 (5): ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models

Title: ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models ThinkEdit: Verdolmetschbare Gewichte bearbeiten, um übermäßig kurzes Denken in vernünftigen Modellen zu vermischen 思考Edit:将可解释的重量编辑为 “ 解释性模型 “ 中模拟超短思考 2503.22048v3

Authors (3): Chung-En Sun, Ge Yan, Tsui-Wei Weng

Recent studies have shown that Large Language Models (LLMs) augmented with chain-of-thought (CoT) reasoning demonstrate impressive problem-solving abilities. However, in this work, we identify a recurring issue where these models occasionally generate overly short reasoning, leading to degraded performance on even simple mathematical problems. Specifically, we investigate how reasoning length is embedded in the hidden representations of reasoning models and its impact on accuracy. Our analysis reveals that reasoning length is governed by a linear direction in the representation space, allowing us to induce overly short reasoning by steering the model along this direction. Building on this insight, we introduce \textbf{\textit{ThinkEdit}}, a simple yet effective weight-editing approach to mitigate the issue of overly short reasoning. We first identify a small subset of attention heads (approximately 4%) that predominantly drive short reasoning behavior. We then edit the output projection weights of these heads to remove the short reasoning direction. With changes to only 0.2% of the model’s parameters, \textbf{\textit{ThinkEdit}} effectively reduces overly short reasoning and yields notable accuracy gains for short reasoning outputs (+6.39%), along with an overall improvement across multiple math benchmarks (+3.34%). Our findings provide new mechanistic insights into how reasoning length is controlled within LLMs and highlight the potential of fine-grained model interventions to improve reasoning quality. Our code is available at: https://github.com/Trustworthy-ML-Lab/ThinkEdit
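
One way to read "edit the output projection weights ... to remove the short reasoning direction" is as an orthogonal projection applied to a head's output matrix, so that the head can no longer write along that direction. The sketch below shows this projection on random matrices; the weight layout `(d_model, d_head)`, the function name, and the random direction are assumptions, and the authors' actual editing procedure may differ (see their repository).

```python
import numpy as np

def remove_direction(W_out, direction):
    """Project a direction out of a head's output projection.

    W_out:     (d_model, d_head), maps head output h to W_out @ h in the residual stream
    direction: (d_model,), the residual-stream direction to suppress
    Returns (I - v v^T) @ W_out with v the unit-normalized direction.
    """
    v = direction / np.linalg.norm(direction)
    return W_out - np.outer(v, v @ W_out)

rng = np.random.default_rng(0)
W_out = rng.normal(size=(16, 4))       # toy sizes, not a real model
short_dir = rng.normal(size=16)        # stand-in for an estimated direction

W_edit = remove_direction(W_out, short_dir)
h = rng.normal(size=4)                 # an arbitrary head output
leak = short_dir @ (W_edit @ h)        # component written along the direction
```

After the edit, every output of this head is orthogonal to the chosen direction, while components in all other directions are left unchanged.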



Article 1469

Title@2025-06-06 (5): Rapid training of Hamiltonian graph networks without gradient descent

Title: Rapid training of Hamiltonian graph networks without gradient descent Schnelles Training von Hamiltonschen Graphennetzwerken ohne Gradientenabstieg 快速培训汉密尔顿图表网络,无梯度下降 2506.06558v1

Authors (4): Atamert Rahma, Chinmay Datar, Ana Cukarska, Felix Dietrich

Learning dynamical systems that respect physical symmetries and constraints remains a fundamental challenge in data-driven modeling. Integrating physical laws with graph neural networks facilitates principled modeling of complex N-body dynamics and yields accurate and permutation-invariant models. However, training graph neural networks with iterative, gradient-based optimization algorithms (e.g., Adam, RMSProp, LBFGS) often leads to slow training, especially for large, complex systems. In comparison to 15 different optimizers, we demonstrate that Hamiltonian Graph Networks (HGN) can be trained up to 600x faster, with comparable accuracy, by replacing iterative optimization with random feature-based parameter construction. We show robust performance in diverse simulations, including N-body mass-spring systems in up to 3 dimensions with different geometries, while retaining essential physical invariances with respect to permutation, rotation, and translation. We reveal that even when trained on minimal 8-node systems, the model can generalize in a zero-shot manner to systems as large as 4096 nodes without retraining. Our work challenges the dominance of iterative gradient-descent-based optimization algorithms for training neural network models for physical systems.



Article 1470

Title@2025-06-06 (5): Infinity Search: Approximate Vector Search with Projections on q-Metric Spaces

Title: Infinity Search: Approximate Vector Search with Projections on q-Metric Spaces Infinity Search: Ungefähre Vektorsuche mit Projektionen auf q-Metric Spaces 无限搜索: 近似矢量搜索和 q- 计量空间预测 2506.06557v1

Authors (4): Antonio Pariente, Ignacio Hounie, Santiago Segarra, Alejandro Ribeiro

Despite the ubiquity of vector search applications, prevailing search algorithms overlook the metric structure of vector embeddings, treating it as a constraint rather than exploiting its underlying properties. In this paper, we demonstrate that in $q$-metric spaces, metric trees can leverage a stronger version of the triangle inequality to reduce comparisons for exact search. Notably, as $q$ approaches infinity, the search complexity becomes logarithmic. Therefore, we propose a novel projection method that embeds vector datasets with arbitrary dissimilarity measures into $q$-metric spaces while preserving the nearest neighbor. We propose to learn an approximation of this projection to efficiently transform query points to a space where Euclidean distances satisfy the desired properties. Our experimental results with text and image vector embeddings show that learning $q$-metric approximations enables classic metric tree algorithms – which typically underperform with high-dimensional data – to achieve competitive performance against state-of-the-art search methods.
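
The "stronger version of the triangle inequality" can be made concrete with a small check. The sketch below assumes the usual definition of a $q$-metric, $d(x,z)^q \le d(x,y)^q + d(y,z)^q$, with the limit $q \to \infty$ giving $d(x,z) \le \max(d(x,y), d(y,z))$; the function name and the toy distance matrices are illustrative and not taken from the paper.

```python
import math
from itertools import permutations

def satisfies_q_triangle(D, q, tol=1e-12):
    """Check d(x,z)^q <= d(x,y)^q + d(y,z)^q for all triples of a distance matrix D.

    q = math.inf checks the limiting form d(x,z) <= max(d(x,y), d(y,z)).
    """
    n = len(D)
    for x, y, z in permutations(range(n), 3):
        if math.isinf(q):
            lhs = D[x][z]
            bound = max(D[x][y], D[y][z])
        else:
            lhs = D[x][z] ** q
            bound = D[x][y] ** q + D[y][z] ** q
        if lhs > bound + tol:
            return False
    return True

# Three collinear points at 0, 1, 2 with absolute-difference distances.
line = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
# Two close points and one point equally far from both.
ultra = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
```

The collinear points form an ordinary metric (`q = 1`) but fail the inequality for `q = 2`, whereas the second matrix satisfies it even in the limiting case.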



Article 1471

Title@2025-06-06 (5): SDN-Based False Data Detection With Its Mitigation and Machine Learning Robustness for In-Vehicle Networks

Title: SDN-Based False Data Detection With Its Mitigation and Machine Learning Robustness for In-Vehicle Networks SDN-basierte falsche Datenerkennung mit seiner Abmilderung und Machine Learning Robustheit für In-Vehicle-Netzwerke 基于SDN的假数据探测,其减轻和机械学习强力适用于车辆内网络 2506.06556v1

Authors (4): Long Dang, Thushari Hapuarachchi, Kaiqi Xiong, Yi Li

As the development of autonomous and connected vehicles advances, the complexity of modern vehicles increases, with numerous Electronic Control Units (ECUs) integrated into the system. In an in-vehicle network, these ECUs communicate with one another using a standard protocol called Controller Area Network (CAN). Securing communication among ECUs plays a vital role in maintaining the safety and security of the vehicle. This paper proposes a robust SDN-based False Data Detection and Mitigation System (FDDMS) for in-vehicle networks. Leveraging the unique capabilities of Software-Defined Networking (SDN), FDDMS is designed to monitor and detect false data injection attacks in real time. Specifically, we focus on brake-related ECUs within an SDN-enabled in-vehicle network. First, we decode raw CAN data to create an attack model that illustrates how false data can be injected into the system. Then, FDDMS, incorporating a Long Short-Term Memory (LSTM)-based detection model, is used to identify false data injection attacks. We further propose an effective variant of the DeepFool attack to evaluate the model’s robustness. To counter the impact of four adversarial attacks, namely the Fast gradient descent method, the Basic iterative method, DeepFool, and the DeepFool variant, we further enhance a re-training technique with a threshold-based selection strategy. Finally, a mitigation scheme is implemented to redirect attack traffic by dynamically updating flow rules through SDN. Our experimental results show that the proposed FDDMS is robust against adversarial attacks and effectively detects and mitigates false data injection attacks in real time.



Article 1472

Title@2025-06-06 (5): Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features

Title: Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features Theoretische Grundlagen des Deep Copula Klassifikators: Ein generativer Ansatz zur Modellierung abhängiger Merkmale 深 Cocula 分类法理论基础:建模附属地貌的开创性方法 2505.22997v2

Authors (3): Agnideep Aich, Ashit Baran Aich, Bruce Wade

Traditional classifiers often assume feature independence or rely on overly simplistic relationships, leading to poor performance in settings where real-world dependencies matter. We introduce the Deep Copula Classifier (DCC), a generative model that separates the learning of each feature’s marginal distribution from the modeling of their joint dependence structure via neural network-parameterized copulas. For each class, lightweight neural networks are used to flexibly and adaptively capture feature interactions, making DCC particularly effective when classification is driven by complex dependencies. We establish that DCC converges to the Bayes-optimal classifier under standard conditions and provide explicit convergence rates of $O(n^{-r/(2r + d)})$ for $r$-smooth copula densities. Beyond theoretical guarantees, we outline several practical extensions, including high-dimensional scalability through vine and factor copula architectures, semi-supervised learning via entropy regularization, and online adaptation using streaming gradient methods. By unifying statistical rigor with the representational power of neural networks, DCC offers a mathematically grounded and interpretable framework for dependency-aware classification.



Article 1473

Title@2025-06-06 (5): GeoClip: Geometry-Aware Clipping for Differentially Private SGD

Title: GeoClip: Geometry-Aware Clipping for Differentially Private SGD GeoClip: Geometry-Aware Clipping für unterschiedlich private SGD GeoClip: 区分专用 SGD 的几何- 软件剪切 2506.06549v1

Authors (4): Atefeh Gilani, Naima Tasnim, Lalitha Sankar, Oliver Kosut

Differentially private stochastic gradient descent (DP-SGD) is the most widely used method for training machine learning models with provable privacy guarantees. A key challenge in DP-SGD is setting the per-sample gradient clipping threshold, which significantly affects the trade-off between privacy and utility. While recent adaptive methods improve performance by adjusting this threshold during training, they operate in the standard coordinate system and fail to account for correlations across the coordinates of the gradient. We propose GeoClip, a geometry-aware framework that clips and perturbs gradients in a transformed basis aligned with the geometry of the gradient distribution. GeoClip adaptively estimates this transformation using only previously released noisy gradients, incurring no additional privacy cost. We provide convergence guarantees for GeoClip and derive a closed-form solution for the optimal transformation that minimizes the amount of noise added while keeping the probability of gradient clipping under control. Experiments on both tabular and image datasets demonstrate that GeoClip consistently outperforms existing adaptive clipping methods under the same privacy budget.
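
For context, the baseline step that GeoClip modifies can be sketched as follows. This is the standard DP-SGD update with a fixed clipping threshold in the original coordinate system, not GeoClip's transformed-basis method; the names `clip` and `dp_sgd_gradient`, the threshold `C`, and the noise multiplier `sigma` are illustrative assumptions.

```python
import math
import random

def clip(grad, C):
    """Scale grad so that its L2 norm is at most C."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_gradient(per_sample_grads, C, sigma, rng=random):
    """Average of clipped per-sample gradients plus Gaussian noise of std sigma * C."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    clipped = [clip(g, C) for g in per_sample_grads]
    summed = [sum(g[k] for g in clipped) for k in range(dim)]
    # Noise is added once to the sum, scaled to the clipping threshold.
    noisy = [s + rng.gauss(0.0, sigma * C) for s in summed]
    return [x / n for x in noisy]

# One large gradient (norm 5) that gets clipped and one small gradient that does not.
grads = [[3.0, 4.0], [0.3, 0.4]]
```

A larger `C` clips fewer gradients but requires proportionally more noise, which is the privacy-utility trade-off the abstract refers to.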



Article 1474

Title@2025-06-06 (5): MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping

Title: MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping MINT: Multimodale Instruktions-Tuning mit multimodaler Interaktions-Gruppierung MINT: 与多式互动组合组合的多式指示调制 2506.02308v3

Authors (5): Xiaojun Shan, Qi Cao, Xing Han, Haofei Yu, Paul Pu Liang

Recent advances in multimodal foundation models have achieved state-of-the-art performance across a range of tasks. These breakthroughs are largely driven by new pre-training paradigms that leverage large-scale, unlabeled multimodal data, followed by instruction fine-tuning on curated labeled datasets and high-quality prompts. While there is growing interest in scaling instruction fine-tuning to ever-larger datasets in both quantity and scale, our findings reveal that simply increasing the number of instruction-tuning tasks does not consistently yield better performance. Instead, we observe that grouping tasks by the common interactions across modalities, such as discovering redundant shared information, prioritizing modality selection with unique information, or requiring synergistic fusion to discover new information from both modalities, encourages the models to learn transferable skills within a group while suppressing interference from mismatched tasks. To this end, we introduce MINT, a simple yet surprisingly effective task-grouping strategy based on the type of multimodal interaction. We demonstrate that the proposed method greatly outperforms existing task grouping baselines for multimodal instruction tuning, striking an effective balance between generalization and specialization.



Article 1475

Title@2025-06-06 (5): Mixed-curvature decision trees and random forests

Title: Mixed-curvature decision trees and random forests Entscheidungsbäume mit gemischter Krümmung und zufällige Wälder 混合精度决定树和随机森林 2410.13879v3

Authors (6): Philippe Chlenski, Quentin Chu, Raiyan R. Khan, Kaizhu Du, Antonio Khalil Moretti, Itsik Pe’er

Decision trees (DTs) and their random forest (RF) extensions are workhorses of classification and regression in Euclidean spaces. However, algorithms for learning in non-Euclidean spaces are still limited. We extend DT and RF algorithms to product manifolds: Cartesian products of several hyperbolic, hyperspherical, or Euclidean components. Such manifolds handle heterogeneous curvature while still factorizing neatly into simpler components, making them compelling embedding spaces for complex datasets. Our novel angular reformulation respects manifold geometry while preserving the algorithmic properties that make decision trees effective. In the special cases of single-component manifolds, our method simplifies to its Euclidean or hyperbolic counterparts, or introduces hyperspherical DT algorithms, depending on the curvature. In benchmarks on a diverse suite of 57 classification, regression, and link prediction tasks, our product RFs ranked first on 29 tasks and came in the top 2 for 41. This highlights the value of product RFs as straightforward yet powerful new tools for data analysis in product manifolds. Code for our method is available at https://github.com/pchlenski/manify.



Article 1476

Title@2025-06-06 (5): GNNAnatomy: Rethinking Model-Level Explanations for Graph Neural Networks

Title: GNNAnatomy: Rethinking Model-Level Explanations for Graph Neural Networks GNNAnatomy: Rethinking Model-Level Erklärungen für Graph Neuronale Netzwerke GNNN 解剖:重新思考图形神经网络的模型级解释 2406.04548v3

Authors (4): Hsiao-Ying Lu, Yiran Li, Ujwal Pratap Krishna Kaluvakolanu Thyagarajan, Kwan-Liu Ma

Graph Neural Networks (GNNs) achieve outstanding performance across graph-based tasks but remain difficult to interpret. In this paper, we revisit foundational assumptions underlying model-level explanation methods for GNNs, namely: (1) maximizing classification confidence yields representative explanations, (2) a single explanation suffices for an entire class of graphs, and (3) explanations are inherently trustworthy. We identify pitfalls resulting from these assumptions: methods that optimize for classification confidence may overlook partially learned patterns; topological diversity across graph subsets within the same class is often underrepresented; and explanations alone offer limited support for building user trust when applied to new datasets or models. This paper introduces GNNAnatomy, a distillation-based method designed to generate explanations while avoiding these pitfalls. GNNAnatomy first characterizes graph topology using graphlets, a set of fundamental substructures. We then train a transparent multilayer perceptron surrogate to directly approximate GNN predictions based on the graphlet representations. By analyzing the weights assigned to each graphlet, we identify the most discriminative topologies, which serve as GNN explanations. To account for structural diversity within a class, GNNAnatomy generates explanations at the required granularity through an interface that supports human-AI teaming. This interface helps users identify subsets of graphs where distinct critical substructures drive class differentiation, enabling multi-grained explanations. Additionally, by enabling exploration and linking explanations back to input graphs, the interface fosters greater transparency and trust. We evaluate GNNAnatomy on both synthetic and real-world datasets through quantitative metrics and qualitative comparisons with state-of-the-art model-level explainable GNN methods.



Article 1477

Title@2025-06-06 (5): Direct Fisher Score Estimation for Likelihood Maximization

Title: Direct Fisher Score Estimation for Likelihood Maximization Direkte Fisher-Score-Schätzung für Likelihood Maximierung 渔业直接分数估计,以争取使机会最大化 2506.06542v1

Authors (4): Sherman Khoo, Yakun Wang, Song Liu, Mark Beaumont

We study the problem of likelihood maximization when the likelihood function is intractable but model simulations are readily available. We propose a sequential, gradient-based optimization method that directly models the Fisher score based on a local score matching technique which uses simulations from a localized region around each parameter iterate. By employing a linear parameterization to the surrogate score model, our technique admits a closed-form, least-squares solution. This approach yields a fast, flexible, and efficient approximation to the Fisher score, effectively smoothing the likelihood objective and mitigating the challenges posed by complex likelihood landscapes. We provide theoretical guarantees for our score estimator, including bounds on the bias introduced by the smoothing. Empirical results on a range of synthetic and real-world problems demonstrate the superior performance of our method compared to existing benchmarks.



Article 1478

Title@2025-06-06 (5): LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs

Title: LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs LabSafety Bench: Benchmarking LLMs zu Sicherheitsfragen in wissenschaftlichen Laboren 实验室安全法官:科学实验室安全问题LLM女士基准基准 2410.14182v3

Authors (15): Yujun Zhou, Jingdong Yang, Yue Huang, Kehan Guo, Zoe Emory, Bikram Ghosh, Amita Bedar, Sujay Shekar, Zhenwen Liang, Pin-Yu Chen, Tian Gao, Werner Geyer, Nuno Moniz, Nitesh V Chawla, Xiangliang Zhang

Artificial Intelligence (AI) is revolutionizing scientific research, yet its growing integration into laboratory environments presents critical safety challenges. While large language models (LLMs) increasingly assist in tasks ranging from procedural guidance to autonomous experiment orchestration, an “illusion of understanding” may lead researchers to overestimate their reliability. Such overreliance is particularly dangerous in high-stakes laboratory settings, where failures in hazard identification or risk assessment can result in severe accidents. To address these concerns, we propose the Laboratory Safety Benchmark (LabSafety Bench), a comprehensive framework that evaluates large language models and vision language models (VLMs) on their ability to identify potential hazards, assess risks, and predict the consequences of unsafe actions in lab environments. LabSafety Bench comprises 765 multiple-choice questions aligned with US Occupational Safety and Health Administration (OSHA) protocols, along with 404 realistic laboratory scenarios featuring dual evaluation tasks: the Hazards Identification Test and the Consequence Identification Test, with 3128 open-ended questions in total. Evaluations across eight proprietary models, seven open-weight LLMs, and four VLMs reveal that, despite advanced performance on structured assessments, no model achieves the safety threshold required for reliable operation – none scoring above 70% on the Hazards Identification Test. Moreover, while proprietary models tend to excel in multiple-choice evaluations, their performance in open-ended, real-world scenario responses is comparable to that of open-source models. These findings underscore the urgent need for specialized evaluation frameworks to ensure the safe and responsible deployment of AI in laboratory settings.



Article 1479

Title@2025-06-06 (5): Rational Tuning of LLM Cascades via Probabilistic Modeling

Title: Rational Tuning of LLM Cascades via Probabilistic Modeling Rationales Tuning von LLM-Kaskaden durch probabilistische Modellierung 通过概率建模对LLM Casscades 进行合理测算 2501.09345v4

Authors (2): Michael J. Zellinger, Matt Thomson

Understanding the reliability of large language models (LLMs) has recently garnered significant attention. Given LLMs’ propensity to hallucinate, as well as their high sensitivity to prompt design, it is already challenging to predict the performance of an individual LLM. However, the problem becomes more complex for compound LLM systems such as cascades, where in addition to each model’s standalone performance, we must understand how the error rates of different models interact. In this paper, we present a probabilistic model for the joint performance distribution of a sequence of LLMs, which enables a framework for rationally tuning the confidence thresholds of an LLM cascade using continuous optimization. Compared to selecting confidence thresholds using Bayesian optimization, our parametric Markov-copula model yields more favorable error-cost trade-offs, improving the area under the error-cost curve by 4.3% on average for cascades with $k\geq 3$ models. In the low-sample regime with $n \leq 30$ training examples, the performance improvement widens to 10.2%, suggesting that our framework’s inductive assumptions about the interactions between the error rates of different LLMs enhance sample efficiency. Overall, our Markov-copula model provides a rational basis for tuning LLM cascade performance and points to the potential of probabilistic methods in analyzing systems of LLMs.
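
To make clear what the confidence thresholds control, here is a minimal sketch of a cascade's control flow. It assumes each model returns an answer together with a confidence score and that the final model always answers; `run_cascade`, the stand-in models, and the costs are hypothetical and do not reproduce the paper's Markov-copula tuning procedure.

```python
def run_cascade(models, thresholds, query):
    """Return (answer, index of the answering model, total cost).

    models: list of (callable, cost) pairs; each callable maps a query to
    (answer, confidence in [0, 1]). thresholds holds one value per non-final model.
    """
    total_cost = 0.0
    for i, (model, cost) in enumerate(models):
        answer, confidence = model(query)
        total_cost += cost
        is_last = i == len(models) - 1
        # Accept if confident enough; the last model has nowhere to defer to.
        if is_last or confidence >= thresholds[i]:
            return answer, i, total_cost

# Hypothetical stand-ins for a cheap small model and an expensive large one.
small = lambda q: ("4", 0.9) if q == "2+2" else ("?", 0.2)
large = lambda q: ("42", 0.99)
models = [(small, 1.0), (large, 10.0)]
```

Raising a threshold sends more queries to the larger model, lowering error at higher cost; tuning the thresholds means choosing a point on that error-cost curve.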



Article 1480

Title@2025-06-06 (5): Hierarchical and Collaborative LLM-Based Control for Multi-UAV Motion and Communication in Integrated Terrestrial and Non-Terrestrial Networks

Title: Hierarchical and Collaborative LLM-Based Control for Multi-UAV Motion and Communication in Integrated Terrestrial and Non-Terrestrial Networks Hierarchische und kollaborative LLM-basierte Steuerung für Multi-UAV-Bewegung und Kommunikation in integrierten terrestrischen und nicht-terrestrischen Netzwerken 在综合陆地和非地球网络中以基于LLM(基于LLM)的基于控制系统,以控制多功能、无人驾驶航空器的移动和通信 2506.06532v1

Authors (4): Zijiang Yan, Hao Zhou, Jianhua Pei, Hina Tabassum

Unmanned aerial vehicles (UAVs) have been widely adopted in various real-world applications. However, the control and optimization of multi-UAV systems remain a significant challenge, particularly in dynamic and constrained environments. This work explores the joint motion and communication control of multiple UAVs operating within integrated terrestrial and non-terrestrial networks that include high-altitude platform stations (HAPS). Specifically, we consider an aerial highway scenario in which UAVs must accelerate, decelerate, and change lanes to avoid collisions and maintain overall traffic flow. Different from existing studies, we propose a novel hierarchical and collaborative method based on large language models (LLMs). In our approach, an LLM deployed on the HAPS performs UAV access control, while another LLM onboard each UAV handles motion planning and control. This LLM-based framework leverages the rich knowledge embedded in pre-trained models to enable both high-level strategic planning and low-level tactical decisions. This knowledge-driven paradigm holds great potential for the development of next-generation 3D aerial highway systems. Experimental results demonstrate that our proposed collaborative LLM-based method achieves higher system rewards, lower operational costs, and significantly reduced UAV collision rates compared to baseline approaches.



Article 1481

Title@2025-06-06 (5): Training-Free Constrained Generation With Stable Diffusion Models

Title: Training-Free Constrained Generation With Stable Diffusion Models Schulungsfreie eingeschränkte Generation mit stabilen Diffusionsmodellen 具有稳定传播模式的稳定传播模式的无培训、无培训的受训练一代 2502.05625v3

Authors (5): Stefano Zampini, Jacob K. Christopher, Luca Oneto, Davide Anguita, Ferdinando Fioretto

Stable diffusion models represent the state of the art in data synthesis across diverse domains and hold transformative potential for applications in science and engineering, e.g., by facilitating the discovery of novel solutions and simulating systems that are computationally intractable to model explicitly. While there is increasing effort to incorporate physics-based constraints into generative models, existing techniques are either limited in their applicability to latent diffusion frameworks or lack the capability to strictly enforce domain-specific constraints. To address this limitation, this paper proposes a novel integration of stable diffusion models with constrained optimization frameworks, enabling the generation of outputs satisfying stringent physical and functional requirements. The effectiveness of this approach is demonstrated through material design experiments requiring adherence to precise morphometric properties, challenging inverse design tasks involving the generation of materials inducing specific stress-strain responses, and copyright-constrained content generation tasks.



Article 1482

Title@2025-06-06 (5): Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs

Title: Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs Sharp Gap-Dependent Variance-Aware Regret Bounds für Tabular MDPs 表列 MDP 的差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差差幅 2506.06521v1

Authors (5): Shulun Chen, Runlong Zhou, Zihan Zhang, Maryam Fazel, Simon S. Du

We consider the gap-dependent regret bounds for episodic MDPs. We show that the Monotonic Value Propagation (MVP) algorithm achieves a variance-aware gap-dependent regret bound of \(\tilde{O}\left(\left(\sum_{\Delta_h(s,a)>0} \frac{H^2 \log K \land \mathtt{Var}_{\max}^{\text{c}}}{\Delta_h(s,a)} +\sum_{\Delta_h(s,a)=0}\frac{ H^2 \land \mathtt{Var}_{\max}^{\text{c}}}{\Delta_{\mathrm{min}}} + SAH^4 (S \lor H) \right) \log K\right),\) where $H$ is the planning horizon, $S$ is the number of states, $A$ is the number of actions, and $K$ is the number of episodes. Here, $\Delta_h(s,a) = V_h^*(s) - Q_h^*(s, a)$ represents the suboptimality gap and $\Delta_{\mathrm{min}} := \min_{\Delta_h (s,a) > 0} \Delta_h(s,a)$. The term $\mathtt{Var}_{\max}^{\text{c}}$ denotes the maximum conditional total variance, calculated as the maximum over all $(\pi, h, s)$ tuples of the expected total variance under policy $\pi$ conditioned on trajectories visiting state $s$ at step $h$. $\mathtt{Var}_{\max}^{\text{c}}$ characterizes the maximum randomness encountered when learning any $(h, s)$ pair. Our result stems from a novel analysis of the weighted sum of the suboptimality gap and can be potentially adapted for other algorithms. To complement the study, we establish a lower bound of \(\Omega \left( \sum_{\Delta_h(s,a)>0} \frac{H^2 \land \mathtt{Var}_{\max}^{\text{c}}}{\Delta_h(s,a)}\cdot \log K\right),\) demonstrating the necessity of dependence on $\mathtt{Var}_{\max}^{\text{c}}$ even when the maximum unconditional total variance (without conditioning on $(h, s)$) approaches zero.



Article 1483

Title@2025-06-06 (5): A Systematic Review of Poisoning Attacks Against Large Language Models

Title: A Systematic Review of Poisoning Attacks Against Large Language Models Eine systematische Überprüfung von Giftangriffen gegen große Sprachmodelle 系统审查针对大语言模式的中毒袭击 2506.06518v1

Authors (6): Neil Fendley, Edward W. Staley, Joshua Carney, William Redman, Marie Chau, Nathan Drenkow

With the widespread availability of pretrained Large Language Models (LLMs) and their training datasets, concerns about the security risks associated with their usage have increased significantly. One of these security risks is the threat of LLM poisoning attacks, where an attacker modifies some part of the LLM training process to cause the LLM to behave in a malicious way. As an emerging area of research, the current frameworks and terminology for LLM poisoning attacks are derived from earlier classification poisoning literature and are not fully equipped for generative LLM settings. We conduct a systematic review of published LLM poisoning attacks to clarify the security implications and address inconsistencies in terminology across the literature. We propose a comprehensive poisoning threat model that can categorize a wide range of LLM poisoning attacks. The poisoning threat model includes four poisoning attack specifications that define the logistics and manipulation strategies of an attack as well as six poisoning metrics used to measure key characteristics of an attack. Under our proposed framework, we organize our discussion of published LLM poisoning literature along four critical dimensions of LLM poisoning attacks: concept poisons, stealthy poisons, persistent poisons, and poisons for unique tasks, to better understand the current landscape of security risks.



Article 1484

Title@2025-06-06 (5): MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding

Title: MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding MindLLM: Ein subject-agnostisches und vielseitiges Modell für fMRI-to-Text Decodierung MindLLLM: FMRI 到文本代码的主体不可知性和易变性模型 2502.15786v2

Authors (6): Weikang Qiu, Zheng Huang, Haoyu Hu, Aosong Feng, Yujun Yan, Rex Ying

Decoding functional magnetic resonance imaging (fMRI) signals into text has been a key challenge in the neuroscience community, with the potential to advance brain-computer interfaces and uncover deeper insights into brain mechanisms. However, existing approaches often struggle with suboptimal predictive performance, limited task variety, and poor generalization across subjects. In response to this, we propose MindLLM, a model designed for subject-agnostic and versatile fMRI-to-text decoding. MindLLM consists of an fMRI encoder and an off-the-shelf LLM. The fMRI encoder employs a neuroscience-informed attention mechanism, which is capable of accommodating subjects with varying input shapes and thus achieves high-performance subject-agnostic decoding. Moreover, we introduce Brain Instruction Tuning (BIT), a novel approach that enhances the model’s ability to capture diverse semantic representations from fMRI signals, facilitating more versatile decoding. We evaluate MindLLM on comprehensive fMRI-to-text benchmarks. Results demonstrate that our model outperforms the baselines, improving downstream tasks by 12.0%, unseen subject generalization by 24.5%, and novel task adaptation by 25.0%. Furthermore, the attention patterns in MindLLM provide interpretable insights into its decision-making process.



Article 1485

Title@2025-06-06 (5): InstantFT: An FPGA-Based Runtime Subsecond Fine-tuning of CNN Models

Title: InstantFT: An FPGA-Based Runtime Subsecond Fine-tuning of CNN Models InstantFT: Eine FPGA-basierte Laufzeit-Untersekunde Feinabstimmung von CNN-Modellen 即时TFT:以FPGA为基础对有线电视新闻网模型进行FPGA运行时间第二级微调 2506.06505v1

Authors (2): Keisuke Sugiura, Hiroki Matsutani

Training deep neural networks (DNNs) requires significantly more computation and memory than inference, making runtime adaptation of DNNs challenging on resource-limited IoT platforms. We propose InstantFT, an FPGA-based method for ultra-fast CNN fine-tuning on IoT devices that optimizes the forward and backward computations in parameter-efficient fine-tuning (PEFT). Experiments on datasets with concept drift demonstrate that InstantFT fine-tunes a pre-trained CNN 17.4x faster than existing Low-Rank Adaptation (LoRA)-based approaches, while achieving comparable accuracy. Our FPGA-based InstantFT reduces the fine-tuning time to just 0.36s and improves energy efficiency by 16.3x, enabling on-the-fly adaptation of CNNs to non-stationary data distributions.



Article 1486

Title@2025-06-06 (5): Optimal Rates in Continual Linear Regression via Increasing Regularization

Title: Optimal Rates in Continual Linear Regression via Increasing Regularization Optimale Raten in kontinuierlicher linearer Regression durch zunehmende Regularisierung 通过增加正规化,持续线性回归的优化率 2506.06501v1

Authors (7): Ran Levinstein, Amit Attia, Matan Schliserman, Uri Sherman, Tomer Koren, Daniel Soudry, Itay Evron

We study realizable continual linear regression under random task orderings, a common setting for developing continual learning theory. In this setup, the worst-case expected loss after $k$ learning iterations admits a lower bound of $\Omega(1/k)$. However, prior work using an unregularized scheme has only established an upper bound of $O(1/k^{1/4})$, leaving a significant gap. Our paper proves that this gap can be narrowed, or even closed, using two frequently used regularization schemes: (1) explicit isotropic $\ell_2$ regularization, and (2) implicit regularization via finite step budgets. We show that these approaches, which are used in practice to mitigate forgetting, reduce to stochastic gradient descent (SGD) on carefully defined surrogate losses. Through this lens, we identify a fixed regularization strength that yields a near-optimal rate of $O(\log k / k)$. Moreover, formalizing and analyzing a generalized variant of SGD for time-varying functions, we derive an increasing regularization strength schedule that provably achieves an optimal rate of $O(1/k)$. This suggests that schedules that increase the regularization coefficient or decrease the number of steps per task are beneficial, at least in the worst case.



Article 1487

Title@2025-06-06 (5): Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms

Title: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms Synthetische Problemgenerierung für die Vernunft über Qualität-Diversität-Algorithmen 通过质量-多样性算法导致的原因产生合成问题 2506.06499v1

Authors (4): Alex Havrilla, Edward Hughes, Mikayel Samvelyan, Jacob Abernethy

Large language model (LLM) driven synthetic data generation has emerged as a powerful method for improving model reasoning capabilities. However, most methods either distill large state-of-the-art models into small students or use natural ground-truth problem statements to guarantee problem statement quality. This limits the scalability of these approaches to more complex and diverse problem domains. To address this, we present SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms, a novel approach for generating high-quality and diverse synthetic math problem and solution pairs using only a single model by measuring a problem’s solve-rate: a proxy for problem difficulty. Starting from a seed dataset of 7.5K samples, we generate over 20 million new problem-solution pairs. We show that filtering the generated data by difficulty and then fine-tuning the same model on the resulting data improves relative model performance by up to 24%. Additionally, we conduct ablations studying the impact of synthetic data quantity, quality and diversity on model generalization. We find that higher quality, as measured by problem difficulty, facilitates better in-distribution performance. Further, while generating diverse synthetic data does not as strongly benefit in-distribution performance, filtering for more diverse data facilitates more robust OOD generalization. We also confirm the existence of model and data scaling laws for synthetically generated problems, which positively benefit downstream model generalization.
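
The solve-rate idea can be illustrated with a short sketch. It assumes that several answers have already been sampled per problem and that a reference answer is available; the record layout, the function names, and the threshold band are hypothetical choices, not details from the paper.

```python
def solve_rate(attempts, reference):
    """Fraction of sampled answers that match the reference answer."""
    return sum(a == reference for a in attempts) / len(attempts)

def filter_by_solve_rate(problems, low, high):
    """Keep problems whose solve rate lies in [low, high]."""
    return [
        p for p in problems
        if low <= solve_rate(p["attempts"], p["reference"]) <= high
    ]

# Toy records: a lower solve rate is read as higher difficulty.
problems = [
    {"id": "easy", "attempts": ["4", "4", "4", "4"], "reference": "4"},
    {"id": "medium", "attempts": ["7", "9", "7", "8"], "reference": "7"},
    {"id": "unsolved", "attempts": ["1", "2", "3", "5"], "reference": "0"},
]
kept = filter_by_solve_rate(problems, 0.25, 0.75)
```

Here only the problem that is solved some of the time is kept; where to place the band is a tuning decision that the abstract does not specify.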



Article 1488

Title@2025-06-06 (5): Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent

Title: Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent Verbesserte Finite-Partikel-Konvergenzraten für Steinvariationsgradabszenz 改进斯坦因变化而变的渐变后代的有限粒子聚合率 2409.08469v3

Authors (3): Sayan Banerjee, Krishnakumar Balasubramanian, Promit Ghosal

We provide finite-particle convergence rates for the Stein Variational Gradient Descent (SVGD) algorithm in the Kernelized Stein Discrepancy ($\mathsf{KSD}$) and Wasserstein-2 metrics. Our key insight is that the time derivative of the relative entropy between the joint density of $N$ particle locations and the $N$-fold product target measure, starting from a regular initial distribution, splits into a dominant 'negative part' proportional to $N$ times the expected $\mathsf{KSD}^2$ and a smaller 'positive part'. This observation leads to $\mathsf{KSD}$ rates of order $1/\sqrt{N}$, in both continuous and discrete time, providing a near-optimal (in the sense of matching the corresponding i.i.d. rates) double-exponential improvement over the recent result by Shi and Mackey (2024). Under mild assumptions on the kernel and potential, these bounds also grow polynomially in the dimension $d$. By adding a bilinear component to the kernel, the above approach is used to further obtain Wasserstein-2 convergence in continuous time. For the case of 'bilinear + Matérn' kernels, we derive Wasserstein-2 rates that exhibit a curse-of-dimensionality similar to the i.i.d. setting. We also obtain marginal convergence and long-time propagation of chaos results for the time-averaged particle laws.
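For readers unfamiliar with SVGD, the discrete-time particle update can be sketched as follows. This is the standard SVGD step with a fixed-bandwidth RBF kernel, which is an assumption made for brevity; the paper analyses other kernels (including bilinear and Matérn components), and the step size, bandwidth, and toy Gaussian target below are illustrative choices only.

```python
import numpy as np

def svgd_step(x, grad_log_p, step=0.1, bandwidth=1.0):
    """One SVGD update for particles x of shape (N, d) with an RBF kernel."""
    diff = x[:, None, :] - x[None, :, :]             # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)             # pairwise squared distances
    k = np.exp(-sq_dist / (2.0 * bandwidth ** 2))    # k[i, j] = k(x_i, x_j)
    # Attraction: kernel-weighted sum of the target score at every particle.
    attract = k @ grad_log_p(x)
    # Repulsion: sum_j grad_{x_j} k(x_j, x_i) = sum_j k[i, j] (x_i - x_j) / h^2.
    repulse = np.sum(k[:, :, None] * diff, axis=1) / bandwidth ** 2
    return x + step * (attract + repulse) / x.shape[0]

rng = np.random.default_rng(0)
particles = rng.normal(loc=3.0, scale=0.5, size=(50, 1))   # start far from target
initial_mean = float(particles.mean())
for _ in range(500):
    particles = svgd_step(particles, lambda z: -z)          # score of N(0, I)
final_mean = float(particles.mean())
```

The attraction term pulls particles toward high-density regions of the target, while the repulsion term keeps them spread out.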



Article 1489

Title@2025-06-06 (5): Synthetic Text Generation for Training Large Language Models via Gradient Matching

Title: Synthetic Text Generation for Training Large Language Models via Gradient Matching Synthetische Textgenerierung für das Training großer Sprachmodelle über Gradient Matching 通过 “ 渐级匹配 “ 培训大语言模型的合成文本生成 2502.17607v2

Authors (6): Dang Nguyen, Zeman Li, Mohammadhossein Bateni, Vahab Mirrokni, Meisam Razaviyayn, Baharan Mirzasoleiman

Synthetic data has the potential to improve the performance, training efficiency, and privacy of real training examples. Nevertheless, existing approaches for synthetic text generation are mostly heuristics and cannot generate human-readable text without compromising the privacy of real data, or provide performance guarantees for training Large Language Models (LLMs). In this work, we propose the first theoretically rigorous approach for generating synthetic human-readable text that provides convergence, performance, and privacy guarantees for fine-tuning LLMs on a target task. To do so, we leverage the Alternating Direction Method of Multipliers (ADMM), which iteratively optimizes the embeddings of synthetic examples to match the noisy gradient of the target training or validation data, and maps them to a sequence of text tokens with low perplexity. In doing so, the generated synthetic text guarantees convergence of the model to a close neighborhood of the solution obtained by fine-tuning on real data while preserving the privacy of that data. Experiments on various classification tasks confirm the effectiveness of our proposed approach. Our code is available at https://github.com/BigML-CS-UCLA/GRADMM.



Article 1490

Title@2025-06-06 (5): Differentiable Quadratic Optimization For The Maximum Independent Set Problem

Title: Differentiable Quadratic Optimization For The Maximum Independent Set Problem Unterschiedliche quadratische Optimierung für das maximale unabhängige Set-Problem 最大独立集集问题可区别的二次二次曲线优化 2406.19532v7

Authors (7): Ismail Alkhouri, Cedric Le Denmat, Yingjie Li, Cunxi Yu, Jia Liu, Rongrong Wang, Alvaro Velasquez

Combinatorial Optimization (CO) addresses many important problems, including the challenging Maximum Independent Set (MIS) problem. Alongside exact and heuristic solvers, differentiable approaches have emerged, often using continuous relaxations of ReLU-based or quadratic objectives. Noting that an MIS in a graph is a Maximum Clique (MC) in its complement, we propose a new quadratic formulation for MIS by incorporating an MC term, improving convergence and exploration. We show that every maximal independent set corresponds to a local minimizer, derive conditions with respect to the MIS size, and characterize stationary points. To tackle the non-convexity of the objective, we propose optimizing several initializations in parallel using momentum-based gradient descent, complemented by an efficient MIS checking criterion derived from our theory. We dub our method as parallelized Clique-Informed Quadratic Optimization for MIS (pCQO-MIS). Our experimental results demonstrate the effectiveness of the proposed method compared to exact, heuristic, sampling, and data-centric approaches. Notably, our method avoids the out-of-distribution tuning and reliance on (un)labeled data required by data-centric methods, while achieving superior MIS sizes and competitive runtime relative to their inference time. Additionally, a key advantage of pCQO-MIS is that, unlike exact and heuristic solvers, the runtime scales only with the number of nodes in the graph, not the number of edges. Our code is available at the GitHub repository: https://github.com/ledenmat/pCQO-mis-benchmark/tree/refactor.



Article 1491

Title@2025-06-06 (5): RelGNN: Composite Message Passing for Relational Deep Learning

Title: RelGNN: Composite Message Passing for Relational Deep Learning RelGNN: Composite-Botschaft für relationales Deep Learning RelGNN:用于关系深深学习的综合信息传递 2502.06784v2

Authors (3): Tianlang Chen, Charilaos Kanatsoulis, Jure Leskovec

Predictive tasks on relational databases are critical in real-world applications spanning e-commerce, healthcare, and social media. To address these tasks effectively, Relational Deep Learning (RDL) encodes relational data as graphs, enabling Graph Neural Networks (GNNs) to exploit relational structures for improved predictions. However, existing RDL methods often overlook the intrinsic structural properties of the graphs built from relational databases, leading to modeling inefficiencies, particularly in handling many-to-many relationships. Here we introduce RelGNN, a novel GNN framework specifically designed to leverage the unique structural characteristics of the graphs built from relational databases. At the core of our approach is the introduction of atomic routes, which are simple paths that enable direct single-hop interactions between the source and destination nodes. Building upon these atomic routes, RelGNN designs new composite message passing and graph attention mechanisms that reduce redundancy, highlight key signals, and enhance predictive accuracy. RelGNN is evaluated on 30 diverse real-world tasks from Relbench (Fey et al., 2024), and achieves state-of-the-art performance on the vast majority of tasks, with improvements of up to 25%. Code is available at https://github.com/snap-stanford/RelGNN.



Article 1492

Title@2025-06-06 (5): Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks

Title: Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks Alternierende Gradientenströme: Eine Theorie des Feature-Lernens in zweischichtigen Neuronalen Netzwerken 交错的渐变流:两层神经网络中的特色学习理论 2506.06489v1

Authors (8): Daniel Kunin, Giovanni Luca Marchetti, Feng Chen, Dhruva Karkada, James B. Simon, Michael R. DeWeese, Surya Ganguli, Nina Miolane

What features neural networks learn, and how, remains an open question. In this paper, we introduce Alternating Gradient Flows (AGF), an algorithmic framework that describes the dynamics of feature learning in two-layer networks trained from small initialization. Prior works have shown that gradient flow in this regime exhibits a staircase-like loss curve, alternating between plateaus where neurons slowly align to useful directions and sharp drops where neurons rapidly grow in norm. AGF approximates this behavior as an alternating two-step process: maximizing a utility function over dormant neurons and minimizing a cost function over active ones. AGF begins with all neurons dormant. At each round, a dormant neuron activates, triggering the acquisition of a feature and a drop in the loss. AGF quantifies the order, timing, and magnitude of these drops, matching experiments across architectures. We show that AGF unifies and extends existing saddle-to-saddle analyses in fully connected linear networks and attention-only linear transformers, where the learned features are singular modes and principal components, respectively. In diagonal linear networks, we prove AGF converges to gradient flow in the limit of vanishing initialization. Applying AGF to quadratic networks trained to perform modular addition, we give the first complete characterization of the training dynamics, revealing that networks learn Fourier features in decreasing order of coefficient magnitude. Altogether, AGF offers a promising step towards understanding feature learning in neural networks.



Article 1493

Title@2025-06-06 (5): Diversity of Transformer Layers: One Aspect of Parameter Scaling Laws

Title: Diversity of Transformer Layers: One Aspect of Parameter Scaling Laws Diversity of Transformer Layers: Ein Aspekt der Parameterskalierungsgesetze 变形层多样性:参数面积法的一个方面 2505.24009v2

Authors (6): Hidetaka Kamigaito, Ying Zhang, Jingun Kwon, Katsuhiko Hayashi, Manabu Okumura, Taro Watanabe

Transformers deliver outstanding performance across a wide range of tasks and are now a dominant backbone architecture for large language models (LLMs). Their task-solving performance improves as parameter size increases, as shown in recent studies on parameter scaling laws. Although recent mechanistic-interpretability studies have deepened our understanding of the internal behavior of Transformers by analyzing their residual stream, the relationship between these internal mechanisms and the parameter scaling laws remains unclear. To bridge this gap, we focus on layers and their size, which largely determine the parameter size of Transformers. For this purpose, we first theoretically investigate the layers within the residual stream through a bias-diversity decomposition. The decomposition separates (i) bias, the error of each layer’s output from the ground truth, and (ii) diversity, which indicates how much the outputs of each layer differ from each other. Analyzing Transformers under this theory reveals that performance improves when individual layers make predictions close to the correct answer and remain mutually diverse. We show that diversity becomes especially critical when individual layers’ outputs are far from the ground truth. Finally, we introduce an information-theoretic diversity measure and present our main finding: adding layers enhances performance only when those layers behave differently, i.e., are diverse. We also show that the performance gains from increasing the number of layers exhibit submodularity: marginal improvements diminish as more layers are added, mirroring the logarithmic convergence predicted by the parameter scaling laws. Experiments on multiple semantic-understanding tasks with various LLMs empirically confirm the theoretical properties derived in this study.
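A bias-diversity split of this kind can be illustrated with the classical squared-error ambiguity decomposition for an averaged ensemble: the error of the mean prediction equals the average individual error minus the average spread around the mean. Treating layer outputs as scalar ensemble members is an assumption made here for illustration; the paper's exact definitions may differ.

```python
def ambiguity_decomposition(predictions, target):
    """Return (ensemble_error, average_bias, diversity) under squared error."""
    n = len(predictions)
    mean_pred = sum(predictions) / n
    ensemble_error = (mean_pred - target) ** 2
    average_bias = sum((p - target) ** 2 for p in predictions) / n
    diversity = sum((p - mean_pred) ** 2 for p in predictions) / n
    return ensemble_error, average_bias, diversity

# Hypothetical per-layer scalar predictions for a target of 1.0.
error, bias, diversity = ambiguity_decomposition([0.2, 0.9, 1.4], target=1.0)
```

The identity `error == bias - diversity` holds exactly for squared error, which is why, at a fixed average bias, more diverse members give a lower combined error.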



Article 1494

Title@2025-06-06 (5): Membership Inference Attacks for Unseen Classes

Title: Membership Inference Attacks for Unseen Classes Mitgliedschafts-Inferenz-Angriffe für ungesehene Klassen 对不见人见的类类的推论攻击 2506.06488v1

Authors (4): Pratiksha Thaker, Neil Kale, Zhiwei Steven Wu, Virginia Smith

Shadow model attacks are the state-of-the-art approach for membership inference attacks on machine learning models. However, these attacks typically assume an adversary has access to a background (nonmember) data distribution that matches the distribution the target model was trained on. We initiate a study of membership inference attacks where the adversary or auditor cannot access an entire subclass from the distribution – a more extreme but realistic version of distribution shift than has been studied previously. In this setting, we first show that the performance of shadow model attacks degrades catastrophically, and then demonstrate the promise of another approach, quantile regression, that does not have the same limitations. We show that quantile regression attacks consistently outperform shadow model attacks in the class dropout setting – for example, quantile regression attacks achieve up to 11$\times$ the TPR of shadow models on the unseen class on CIFAR-100, and achieve nontrivial TPR on ImageNet even with 90% of training classes removed. We also provide a theoretical model that illustrates the potential and limitations of this approach.
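The quantile regression idea can be sketched in two pieces: a pinball (quantile) loss, whose minimizer is a chosen quantile of nonmember scores, and a decision rule that flags an example as a member when its score exceeds the predicted threshold. The sketch below fits no regressor and uses a single constant threshold; the function names, the toy scores, and the quantile level 0.8 are assumptions for illustration, not the paper's setup.

```python
def pinball_loss(threshold, scores, tau):
    """Average pinball loss of a constant threshold at quantile level tau."""
    total = 0.0
    for s in scores:
        residual = s - threshold
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * residual if residual >= 0 else (tau - 1.0) * residual
    return total / len(scores)

def flag_members(scores, thresholds):
    """Flag an example as a member when its score exceeds its threshold."""
    return [s > t for s, t in zip(scores, thresholds)]

# Hypothetical confidence scores of known nonmembers.
nonmember_scores = [0.1, 0.2, 0.3, 0.4, 0.9]
candidates = [0.3, 0.4, 1.0]
best = min(candidates, key=lambda q: pinball_loss(q, nonmember_scores, 0.8))
flags = flag_members([0.95, 0.2], [best, best])
```

In a full attack the constant threshold would be replaced by a model that predicts a per-example quantile from the input, so no shadow models of the target need to be trained.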



Article 1495

Title@2025-06-06 (5): Neural-Augmented Kelvinlet: Real-Time Soft Tissue Deformation with Multiple Graspers

Title: Neural-Augmented Kelvinlet: Real-Time Soft Tissue Deformation with Multiple Graspers Neural-Augmented Kelvinlet: Echtzeit-Soft Tissue-Deformation mit mehreren Graspers 神经加速开尔文莱:实时软体组织变形,带多颗粒变形 2506.08043v1

Authors (15): Ashkan Shahbazi, Kyvia Pereira, Jon S. Heiselman, Elaheh Akbari, Annie C. Benson, Sepehr Seifi, Xinyuan Liu, Garrison L. Johnston, Erwin Terpstra, Anne Draaisma, Jan-Jaap Severes, Jie Ying Wu, Nabil Simaan, Michael L. Miga, Soheil Kolouri

Fast and accurate simulation of soft tissue deformation is a critical factor for surgical robotics and medical training. In this paper, we introduce a novel physics-informed neural simulator that approximates soft tissue deformations in a realistic and real-time manner. Our framework integrates Kelvinlet-based priors into neural simulators, making it the first approach to leverage Kelvinlets for residual learning and regularization in data-driven soft tissue modeling. By incorporating large-scale Finite Element Method (FEM) simulations of both linear and nonlinear soft tissue responses, our method improves neural network predictions across diverse architectures, enhancing accuracy and physical consistency while maintaining low latency for real-time performance. We demonstrate the effectiveness of our approach by performing accurate surgical maneuvers that simulate the use of standard laparoscopic tissue grasping tools with high fidelity. These results establish Kelvinlet-augmented learning as a powerful and efficient strategy for real-time, physics-aware soft tissue simulation in surgical applications.



Article 1496

Title@2025-06-06 (5): A Certified Unlearning Approach without Access to Source Data

Title: A Certified Unlearning Approach without Access to Source Data Ein zertifizierter Unlearning-Ansatz ohne Zugriff auf Quelldaten 未经认证的不学习方法,无法获得源数据 2506.06486v1

Authors (4): Umit Yigit Basaran, Sk Miraj Ahmed, Amit Roy-Chowdhury, Basak Guler

With the growing adoption of data privacy regulations, the ability to erase private or copyrighted information from trained models has become a crucial requirement. Traditional unlearning methods often assume access to the complete training dataset, which is unrealistic in scenarios where the source data is no longer available. To address this challenge, we propose a certified unlearning framework that enables effective data removal without access to the original training data samples. Our approach utilizes a surrogate dataset that approximates the statistical properties of the source data, allowing for controlled noise scaling based on the statistical distance between the two. While our theoretical guarantees assume knowledge of the exact statistical distance, practical implementations typically approximate this distance, resulting in potentially weaker but still meaningful privacy guarantees. This ensures strong guarantees on the model’s behavior post-unlearning while maintaining its overall utility. We establish theoretical bounds, introduce practical noise calibration techniques, and validate our method through extensive experiments on both synthetic and real-world datasets. The results demonstrate the effectiveness and reliability of our approach in privacy-sensitive settings.



Article 1497

Title@2025-06-06 (5): The Economic Dispatch of Power-to-Gas Systems with Deep Reinforcement Learning: Tackling the Challenge of Delayed Rewards with Long-Term Energy Storage

Title: The Economic Dispatch of Power-to-Gas Systems with Deep Reinforcement Learning: Tackling the Challenge of Delayed Rewards with Long-Term Energy Storage Die wirtschaftliche Dispatch von Power-to-Gas-Systemen mit tiefem Ausbau-Lernen: Die Herausforderung von verzögerten Belohnungen mit langfristiger Energiespeicherung bewältigen 以深入强化学习的方式经济发送电力到地电系统:利用长期能源储存应对延迟奖励的挑战 2506.06484v1

Authors (3): Manuel Sage, Khalil Al Handawi, Yaoyao Fiona Zhao

Power-to-Gas (P2G) technologies gain recognition for enabling the integration of intermittent renewables, such as wind and solar, into electricity grids. However, determining the most cost-effective operation of these systems is complex due to the volatile nature of renewable energy, electricity prices, and loads. Additionally, P2G systems are less efficient in converting and storing energy compared to battery energy storage systems (BESs), and the benefits of converting electricity into gas are not immediately apparent. Deep Reinforcement Learning (DRL) has shown promise in managing the operation of energy systems amidst these uncertainties. Yet, DRL techniques face difficulties with the delayed reward characteristic of P2G system operation. Previous research has mostly focused on short-term studies that look at the energy conversion process, neglecting the long-term storage capabilities of P2G. This study presents a new method by thoroughly examining how DRL can be applied to the economic operation of P2G systems, in combination with BESs and gas turbines, over extended periods. Through three progressively more complex case studies, we assess the performance of DRL algorithms, specifically Deep Q-Networks and Proximal Policy Optimization, and introduce modifications to enhance their effectiveness. These modifications include integrating forecasts, implementing penalties on the reward function, and applying strategic cost calculations, all aimed at addressing the issue of delayed rewards. Our findings indicate that while DRL initially struggles with the complex decision-making required for P2G system operation, the adjustments we propose significantly improve its capability to devise cost-effective operation strategies, thereby unlocking the potential for long-term energy storage in P2G technologies.



Article 1498

Title@2025-06-06 (5): Noise Consistency Regularization for Improved Subject-Driven Image Synthesis

Title: Noise Consistency Regularization for Improved Subject-Driven Image Synthesis Lärmkonsistenz-Regularisierung für verbesserte themengetriebene Bildsynthese 改进主题驱动图像合成改进目标驱动图像集的一致一致规范化 2506.06483v1

Authors (4): Yao Ni, Song Wen, Piotr Koniusz, Anoop Cherian

Fine-tuning Stable Diffusion enables subject-driven image synthesis by adapting the model to generate images containing specific subjects. However, existing fine-tuning methods suffer from two key issues: underfitting, where the model fails to reliably capture subject identity, and overfitting, where it memorizes the subject image and reduces background diversity. To address these challenges, we propose two auxiliary consistency losses for diffusion fine-tuning. First, a prior consistency regularization loss ensures that the predicted diffusion noise for prior (non-subject) images remains consistent with that of the pretrained model, improving fidelity. Second, a subject consistency regularization loss enhances the fine-tuned model’s robustness to latent codes modulated by multiplicative noise, helping to preserve subject identity while improving diversity. Our experimental results demonstrate that incorporating these losses into fine-tuning not only preserves subject identity but also enhances image diversity, outperforming DreamBooth in terms of CLIP scores, background variation, and overall visual quality.



Article 1499

Title@2025-06-06 (5): TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness

Title: TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness TimeRecipe: Ein Time-Series-Prognose-Rezept über Benchmarking Module Level Wirksamkeit 时间Recpe: 通过基准标准模块的实时序列预测食谱 2506.06482v1

Authors (6): Zhiyuan Zhao, Juntong Ni, Shangqing Xu, Haoxin Liu, Wei Jin, B. Aditya Prakash

Time-series forecasting is an essential task with wide real-world applications across domains. While recent advances in deep learning have enabled time-series forecasting models with accurate predictions, there remains considerable debate over which architectures and design components, such as series decomposition or normalization, are most effective under varying conditions. Existing benchmarks primarily evaluate models at a high level, offering limited insight into why certain designs work better. To mitigate this gap, we propose TimeRecipe, a unified benchmarking framework that systematically evaluates time-series forecasting methods at the module level. TimeRecipe conducts over 10,000 experiments to assess the effectiveness of individual components across a diverse range of datasets, forecasting horizons, and task settings. Our results reveal that exhaustive exploration of the design space can yield models that outperform existing state-of-the-art methods and uncover meaningful intuitions linking specific design choices to forecasting scenarios. Furthermore, we release a practical toolkit within TimeRecipe that recommends suitable model architectures based on these empirical insights. The benchmark is available at: https://github.com/AdityaLab/TimeRecipe.



Article 1500

Title@2025-06-06 (5): Unveiling Markov Heads in Pretrained Language Models for Offline Reinforcement Learning

Title: Unveiling Markov Heads in Pretrained Language Models for Offline Reinforcement Learning Markov-Köpfe in vorgebildeten Sprachmodellen für Offline-Verstärkungslernen enthüllen 离线强化学习培训前语言模式中未排行的Markov负责人 2409.06985v2

Authors (7): Wenhao Zhao, Qiushui Xu, Linjie Xu, Lei Song, Jinyu Wang, Chunlai Zhou, Jiang Bian

Recently, incorporating knowledge from pretrained language models (PLMs) into decision transformers (DTs) has generated significant attention in offline reinforcement learning (RL). These PLMs perform well in RL tasks, raising an intriguing question: what kind of knowledge from PLMs has been transferred to RL to achieve such good results? This work first investigates this problem by analyzing each head quantitatively and identifies the Markov head, a crucial component that exists in the attention heads of PLMs. It leads to extreme attention on the last-input token and performs well only in short-term environments. Furthermore, we prove that this extreme attention cannot be changed by re-training the embedding layer or by fine-tuning. Inspired by our analysis, we propose a general method, GPT2-DTMA, which equips a pretrained DT with Mixture of Attention (MoA) to accommodate diverse attention requirements during fine-tuning. Extensive experiments corroborate our theorems and demonstrate the effectiveness of GPT2-DTMA: it achieves comparable performance in short-term environments while significantly narrowing the performance gap in long-term environments.



Article 1501

Title@2025-06-06 (5): DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation

Title: DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation DELT: Ein einfaches Diversity-orientiertes EarlyLate-Training für die Datensatzdestillation DELT: 以简单多样性驱动的数据集蒸馏早期远程培训 2411.19946v2

Authors (4): Zhiqiang Shen, Ammar Sherif, Zeyuan Yin, Shitong Shao

Recent advances in dataset distillation have led to solutions in two main directions. The conventional batch-to-batch matching mechanism is ideal for small-scale datasets and includes bi-level optimization methods on models and syntheses, such as FRePo, RCIG, and RaT-BPTT, as well as other methods like distribution matching, gradient matching, and weight trajectory matching. Conversely, batch-to-global matching typifies decoupled methods, which are particularly advantageous for large-scale datasets. This approach has garnered substantial interest within the community, as seen in SRe$^2$L, G-VBSM, WMDD, and CDA. A primary challenge with the second approach is the lack of diversity among syntheses within each class, since samples are optimized independently and the same global supervision signals are reused across different synthetic images. In this study, we propose a new Diversity-driven EarlyLate Training (DELT) scheme to enhance the diversity of images in batch-to-global matching with less computation. Our approach is conceptually simple yet effective: it partitions predefined IPC (images per class) samples into smaller subtasks and employs local optimizations to distill each subset into distributions from distinct phases, reducing the uniformity induced by the unified optimization process. These distilled images from the subtasks demonstrate effective generalization when applied to the entire task. We conduct extensive experiments on CIFAR, Tiny-ImageNet, ImageNet-1K, and its sub-datasets. Our approach outperforms the previous state-of-the-art by 2$\sim$5% on average across different datasets and IPCs, increasing diversity per class by more than 5% while reducing synthesis time by up to 39.3%, which improves training efficiency. Code is available at: https://github.com/VILA-Lab/DELT.



Article 1502

Title@2025-06-06 (5): Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage

Title: Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage Kosteneffizientes LLM-Training mit Lifetime-Aware Tensor-Offloading über GPUDirect Storage 通过GPUPUDirect 储存进行终身软件传感器脱载的成本效益高的LLM培训 2506.06472v1

Authors (7): Ziqi Yuan, Haoyang Zhang, Yirui Eric Zhou, Apoorve Mohan, I-Hsin Chung, Seetharami Seelam, Jian Huang

We present the design and implementation of a new lifetime-aware tensor offloading framework for GPU memory expansion using low-cost PCIe-based solid-state drives (SSDs). Our framework, TERAIO, is developed explicitly for large language model (LLM) training with multiple GPUs and multiple SSDs. Its design is driven by our observation that the active tensors take only a small fraction (1.7% on average) of allocated GPU memory in each LLM training iteration, while the inactive tensors are usually large and will not be used for a long period of time, creating ample opportunities for offloading/prefetching tensors to/from slow SSDs without stalling the GPU training process. TERAIO accurately estimates the lifetime (active period of time in GPU memory) of each tensor by profiling the first few iterations of the training process. Using this tensor lifetime analysis, TERAIO generates an optimized tensor offloading/prefetching plan and integrates it into the compiled LLM program via PyTorch. TERAIO has a runtime tensor migration engine that executes the offloading/prefetching plan via GPUDirect storage, which allows direct tensor migration between GPUs and SSDs, alleviating the CPU bottleneck and maximizing SSD bandwidth utilization. In comparison with state-of-the-art studies such as ZeRO-Offload and ZeRO-Infinity, we show that TERAIO improves the training performance of various LLMs by 1.47x on average, and achieves 80.7% of the ideal performance assuming unlimited GPU memory.



Article 1503

Title@2025-06-06 (5): SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation

Title: SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation SIGMA:通过西布林-引导蒙特卡罗增殖改进大语言示范理由说明 2506.06470v1

Authors (7): Yanwei Ren, Haotian Zhang, Fuxiang Wu, Jiayan Qiu, Jiaxing Huang, Baosheng Yu, Liu Liu

Enhancing large language models by simply scaling up datasets has begun to yield diminishing returns, shifting the spotlight to data quality. Monte Carlo Tree Search (MCTS) has emerged as a powerful technique for generating high-quality chain-of-thought data, yet conventional approaches typically retain only the top-scoring trajectory from the search tree, discarding sibling nodes that often contain valuable partial insights, recurrent error patterns, and alternative reasoning strategies. This unconditional rejection of non-optimal reasoning branches may waste vast amounts of informative data in the whole search tree. We propose SIGMA (Sibling Guided Monte Carlo Augmentation), a novel framework that reintegrates these discarded sibling nodes to refine LLM reasoning. SIGMA forges semantic links among sibling nodes along each search path and applies a two-stage refinement: a critique model identifies overlooked strengths and weaknesses across the sibling set, and a revision model conducts text-based backpropagation to refine the top-scoring trajectory in light of this comparative feedback. By recovering and amplifying the underutilized but valuable signals from non-optimal reasoning branches, SIGMA substantially improves reasoning trajectories. On the challenging MATH benchmark, our SIGMA-tuned 7B model achieves 54.92% accuracy using only 30K samples, outperforming state-of-the-art models trained on 590K samples. This result highlights that our sibling-guided optimization not only significantly reduces data usage but also significantly boosts LLM reasoning.



Article 1504

Title@2025-06-06 (5): BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

Title: BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning BRiTE: Bootstrapping verstärkter Denkprozess zur Verbesserung von Sprachmodellen Brite:强化强化思考程序以加强语言示范理由 2501.18858v2

Authors (13): Han Zhong, Yutong Yin, Shenao Zhang, Xiaojun Xu, Yuanxin Liu, Yifei Zuo, Zhihan Liu, Boyi Liu, Sirui Zheng, Hongyi Guo, Liwei Wang, Mingyi Hong, Zhaoran Wang

Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, yet generating reliable reasoning processes remains a significant challenge. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model incorporating latent thinking processes and evaluation signals. Within this framework, we introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps. First, it generates high-quality rationales by approximating the optimal thinking process through reinforcement learning, using a novel reward shaping mechanism. Second, it enhances the base LLM by maximizing the joint probability of rationale generation with respect to the model’s parameters. Theoretically, we demonstrate BRiTE’s convergence at a rate of $1/T$, with $T$ representing the number of iterations. Empirical evaluations on math and coding benchmarks demonstrate that our approach consistently improves performance across different base models without requiring human-annotated thinking processes. In addition, BRiTE demonstrates superior performance compared to existing algorithms that bootstrap thinking processes using alternative methods such as rejection sampling, and can even match or exceed the results achieved through supervised fine-tuning with human-annotated data.



Article 1505

Title@2025-06-06 (5): Latent Thought Models with Variational Bayes Inference-Time Computation

Title: Latent Thought Models with Variational Bayes Inference-Time Computation Latent Thought Models mit Variational Bayes Inferenz-Zeit-Computation 具有变异性贝湾推推算-时间计算法的古老思想模型 2502.01567v2

Authors (11): Deqian Kong, Minglu Zhao, Dehong Xu, Bo Pang, Shu Wang, Edouardo Honig, Zhangzhang Si, Chuan Li, Jianwen Xie, Sirui Xie, Ying Nian Wu

We propose a novel class of language models, Latent Thought Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space. These latent thought vectors guide the autoregressive generation of ground tokens through a Transformer decoder. Training employs a dual-rate optimization process within the classical variational Bayes framework: fast learning of local variational parameters for the posterior distribution of latent vectors (inference-time computation), and slow learning of global decoder parameters. Empirical studies reveal that LTMs possess additional scaling dimensions beyond traditional Large Language Models (LLMs), such as the number of iterations in inference-time computation and number of latent thought vectors. Higher sample efficiency can be achieved by increasing training compute per token, with further gains possible by trading model size for more inference steps. Designed based on these scaling properties, LTMs demonstrate superior sample and parameter efficiency compared to autoregressive models and discrete diffusion models. They significantly outperform these counterparts in validation perplexity and zero-shot language modeling tasks. Additionally, LTMs exhibit emergent few-shot in-context reasoning capabilities that scale with model size, and achieve competitive performance in conditional and unconditional text generation.


Article 1506

Title@2025-06-06 (5): FairICP: Encouraging Equalized Odds via Inverse Conditional Permutation

Title: FairICP: Encouraging Equalized Odds via Inverse Conditional Permutation FairICP: Förderung ausgeglichener Odds durch inverse Conditional Permutation FairICP:通过反条件性变异鼓励平等奇数 2404.05678v4

Authors (2): Yuheng Lai, Leying Guan

Equalized odds, an important notion of algorithmic fairness, aims to ensure that sensitive variables, such as race and gender, do not unfairly influence the algorithm’s prediction when conditioning on the true outcome. Despite rapid advancements, current research primarily focuses on equalized odds violations caused by a single sensitive attribute, leaving the challenge of simultaneously accounting for multiple attributes under-addressed. We bridge this gap by introducing an in-processing fairness-aware learning approach, FairICP, which integrates adversarial learning with a novel inverse conditional permutation scheme. FairICP offers a flexible and efficient scheme to promote equalized odds under fairness conditions described by complex and multi-dimensional sensitive attributes. The efficacy and adaptability of our method are demonstrated through both simulation studies and empirical analyses of real-world datasets.
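
For readers unfamiliar with the criterion, the following is a minimal sketch of how an equalized odds violation can be measured for a binary classifier: it compares true-positive and false-positive rates across groups. It illustrates the fairness notion only, not the FairICP training procedure; the function name `equalized_odds_gaps` and the toy data are hypothetical, and the sketch assumes every group contains both positive and negative examples.

```python
import numpy as np

def equalized_odds_gaps(y_true, y_pred, group):
    """Return (TPR gap, FPR gap): max minus min rate across groups.

    Equalized odds holds exactly when both gaps are zero.
    Assumes binary labels/predictions and that each group has
    at least one positive and one negative example.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs, fprs = [], []
    for g in np.unique(group):
        in_group = group == g
        positives = in_group & (y_true == 1)
        negatives = in_group & (y_true == 0)
        tprs.append(y_pred[positives].mean())  # P(pred=1 | y=1, group=g)
        fprs.append(y_pred[negatives].mean())  # P(pred=1 | y=0, group=g)
    return max(tprs) - min(tprs), max(fprs) - min(fprs)

# Toy data (hypothetical): two groups of four examples each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
tpr_gap, fpr_gap = equalized_odds_gaps(y_true, y_pred, group)
print(tpr_gap, fpr_gap)
```

With several sensitive attributes, one simple option is to pass a combined label per example (for instance a string joining the attribute values), although the number of groups then grows quickly, which is part of the difficulty the abstract describes.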


Article 1507

Title@2025-06-06 (5): Towards Infant Sleep-Optimized Driving: Synergizing Wearable and Vehicle Sensing in Intelligent Cruise Control

Title: Towards Infant Sleep-Optimized Driving: Synergizing Wearable and Vehicle Sensing in Intelligent Cruise Control Auf dem Weg zum schlafoptimierten Fahren für Säuglinge: Verwendbares und Fahrzeug sensibilisieren in intelligenter Kreuzfahrtsteuerung 婴儿睡眠优化驾驶:在智能巡航控制中统一穿戴和车辆遥感 2506.06459v1

Authors (3): Ruitao Chen, Mozhang Guo, Jinge Li

Automated driving (AD) has substantially improved vehicle safety and driving comfort, but its impact on passenger well-being, particularly infant sleep, is not sufficiently studied. Sudden acceleration, abrupt braking, and sharp maneuvers can disrupt infant sleep, compromising both passenger comfort and parental convenience. To solve this problem, this paper explores the integration of reinforcement learning (RL) within AD to personalize driving behavior and optimally balance occupant comfort and travel efficiency. In particular, we propose an intelligent cruise control framework that adapts to varying driving conditions to enhance infant sleep quality by effectively synergizing wearable sensing and vehicle data. Long short-term memory (LSTM) and transformer-based neural networks are integrated with RL to model the relationship between driving behavior and infant sleep quality under diverse traffic and road conditions. Based on the sleep quality indicators from the wearable sensors, driving action data from vehicle controllers, and map data from map applications, the model dynamically computes the optimal driving aggressiveness level, which is subsequently translated into specific AD control strategies, e.g., the magnitude and frequency of acceleration, lane change, and overtaking. Simulation results demonstrate that the proposed solution significantly improves infant sleep quality compared to baseline methods, while preserving desirable travel efficiency.


Article 1508

Title@2025-06-06 (5): WISCA: A Consensus-Based Approach to Harmonizing Interpretability in Tabular Datasets

Title: WISCA: A Consensus-Based Approach to Harmonizing Interpretability in Tabular Datasets WISCA: Ein auf Konsens basierender Ansatz zur Harmonisierung der Dolmetschbarkeit in Tabellendatensätzen WISSCA: 统一图表数据集解释性的协商一致办法 2506.06455v1

Authors (3): Antonio Jesús Banegas-Luna, Horacio Pérez-Sánchez, Carlos Martínez-Cortés

While predictive accuracy is often prioritized in machine learning (ML) models, interpretability remains essential in scientific and high-stakes domains. However, diverse interpretability algorithms frequently yield conflicting explanations, highlighting the need for consensus to harmonize results. In this study, six ML models were trained on six synthetic datasets with known ground truths, utilizing various model-agnostic interpretability techniques. Consensus explanations were generated using established methods and a novel approach: WISCA (Weighted Scaled Consensus Attributions), which integrates class probability and normalized attributions. WISCA consistently aligned with the most reliable individual method, underscoring the value of robust consensus strategies in improving explanation reliability.


Article 1509

Title@2025-06-06 (5): LETS Forecast: Learning Embedology for Time Series Forecasting

Title: LETS Forecast: Learning Embedology for Time Series Forecasting LETS-Prognose: Lernembedologie für Zeitreihenprognosen LETS预测: 学习时间序列预测的生态学 2506.06454v1

Authors (5): Abrar Majeedi, Viswanatha Reddy Gajjala, Satya Sai Srinath Namburi GNVV, Nada Magdi Elkordi, Yin Li

Real-world time series are often governed by complex nonlinear dynamics. Understanding these underlying dynamics is crucial for precise future prediction. While deep learning has achieved major success in time series forecasting, many existing approaches do not explicitly model the dynamics. To bridge this gap, we introduce DeepEDM, a framework that integrates nonlinear dynamical systems modeling with deep neural networks. Inspired by empirical dynamic modeling (EDM) and rooted in Takens’ theorem, DeepEDM presents a novel deep model that learns a latent space from time-delayed embeddings, and employs kernel regression to approximate the underlying dynamics, while leveraging efficient implementation of softmax attention and allowing for accurate prediction of future time steps. To evaluate our method, we conduct comprehensive experiments on synthetic data of nonlinear dynamical systems as well as real-world time series across domains. Our results show that DeepEDM is robust to input noise, and outperforms state-of-the-art methods in forecasting accuracy. Our code is available at: https://abrarmajeedi.github.io/deep_edm.
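
The two classical ingredients named in the abstract, time-delayed embeddings and kernel regression, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DeepEDM model: there is no learned latent space, the kernel is a softmax over negative squared distances (the same weighting form as softmax attention), and the names `delay_embed`, `softmax_kernel_forecast`, and the `temperature` parameter are hypothetical.

```python
import numpy as np

def delay_embed(x, dim, tau=1):
    """Row t is the delay vector (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)

def softmax_kernel_forecast(x, dim=3, tau=1, temperature=0.05):
    """One-step-ahead forecast via kernel regression over delay vectors."""
    x = np.asarray(x, dtype=float)
    emb = delay_embed(x, dim, tau)
    keys, query = emb[:-1], emb[-1]       # past states and the current state
    targets = x[(dim - 1) * tau + 1 :]    # the value that followed each past state
    # Softmax over negative squared distances: nearby past states get high weight.
    scores = -np.sum((keys - query) ** 2, axis=1) / temperature
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return float(weights @ targets)

# Toy series (hypothetical): a noiseless sine wave sampled every 0.2 time units.
t = np.arange(300)
x = np.sin(0.2 * t)
prediction = softmax_kernel_forecast(x)
truth = np.sin(0.2 * 300)
print(prediction, truth)
```

The forecast is a weighted average of what happened next after similar past states, which is the basic idea behind empirical dynamic modeling; the quality of the result depends on the embedding dimension, the lag, and the kernel temperature.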


Article 1510

Title@2025-06-06 (5): Tackling Feature and Sample Heterogeneity in Decentralized Multi-Task Learning: A Sheaf-Theoretic Approach

Title: Tackling Feature and Sample Heterogeneity in Decentralized Multi-Task Learning: A Sheaf-Theoretic Approach Schlagen von Feature und Sample Heterogenität im dezentralisierten Multi-Task-Lernen: Ein schaftheoretischer Ansatz 处理分散化多任务学习中的地貌特征和抽样差异:沙夫理论方法 2502.01145v2

Authors (3): Chaouki Ben Issaid, Praneeth Vepakomma, Mehdi Bennis

Federated multi-task learning (FMTL) aims to simultaneously learn multiple related tasks across clients without sharing sensitive raw data. However, in the decentralized setting, existing FMTL frameworks are limited in their ability to capture complex task relationships and handle feature and sample heterogeneity across clients. To address these challenges, we introduce a novel sheaf-theoretic-based approach for FMTL. By representing client relationships using cellular sheaves, our framework can flexibly model interactions between heterogeneous client models. We formulate the sheaf-based FMTL optimization problem using sheaf Laplacian regularization and propose the Sheaf-FMTL algorithm to solve it. We show that the proposed framework provides a unified view encompassing many existing federated learning (FL) and FMTL approaches. Furthermore, we prove that our proposed algorithm, Sheaf-FMTL, achieves a sublinear convergence rate in line with state-of-the-art decentralized FMTL algorithms. Extensive experiments show that although Sheaf-FMTL introduces computational and storage overhead due to the management of interaction maps, it achieves substantial communication savings in terms of transmitted bits when compared to decentralized FMTL baselines. This trade-off makes Sheaf-FMTL especially suitable for cross-silo FL scenarios, where managing model heterogeneity and ensuring communication efficiency are essential, and where clients have adequate computational resources.


Article 1511

Title@2025-06-06 (5): Canonical Autoregressive Generation

Title: Canonical Autoregressive Generation Kanonische autoregressive Generation Canonic 自动后退一代 2506.06446v1

Authors (4): Ivi Chatzi, Nina Corvelo Benz, Stratis Tsirtsis, Manuel Gomez-Rodriguez

State-of-the-art large language models are trained using large amounts of tokens derived from raw text using what is called a tokenizer. Crucially, the tokenizer determines the (token) vocabulary a model will use during inference as well as, in principle, the (token) language. This is because, while the token vocabulary may allow for different tokenizations of a string, the tokenizer always maps the string to only one of these tokenizations, the canonical tokenization. However, multiple lines of empirical evidence suggest that large language models do not always generate canonical token sequences, and this comes with several negative consequences. In this work, we first show that, to generate a canonical token sequence, a model needs to generate (partial) canonical token sequences at each step of the autoregressive generation process underpinning its functioning. Building upon this theoretical result, we introduce canonical sampling, a simple and efficient sampling method that precludes a given model from generating non-canonical token sequences. Further, we also show that, in comparison with standard sampling, the distribution of token sequences generated using canonical sampling is provably closer to the true distribution of token sequences used during training.
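
The per-step condition described in the abstract can be illustrated with a toy example. The sketch below is an assumption-laden illustration rather than the paper's algorithm: the vocabulary is hypothetical, a greedy longest-match tokenizer stands in for a real tokenizer, and non-canonical continuations are simply masked out before sampling from the remaining (implicitly renormalized) probabilities.

```python
import random

VOCAB = ["a", "b", "c", "ab", "bc", "abc"]  # hypothetical toy vocabulary

def tokenize(text):
    """Greedy longest-match tokenizer; a stand-in for a real tokenizer."""
    tokens, i = [], 0
    while i < len(text):
        match = max((t for t in VOCAB if text.startswith(t, i)), key=len)
        tokens.append(match)
        i += len(match)
    return tokens

def is_canonical(tokens):
    """A sequence is canonical if re-tokenizing its text reproduces it."""
    return tokenize("".join(tokens)) == list(tokens)

def canonical_sample(probs_fn, length, rng):
    """Sample token by token, keeping every partial sequence canonical."""
    seq = []
    for _ in range(length):
        probs = probs_fn(seq)  # dict: token -> probability from the "model"
        allowed = {t: p for t, p in probs.items() if is_canonical(seq + [t])}
        tokens, weights = zip(*allowed.items())
        seq.append(rng.choices(tokens, weights=weights)[0])
    return seq

# Hypothetical "model" that assigns uniform probability to every token.
uniform = lambda seq: {t: 1.0 / len(VOCAB) for t in VOCAB}
sample = canonical_sample(uniform, 8, random.Random(0))
print(sample, is_canonical(sample))
```

In this toy vocabulary the sequence `["a", "b"]` is non-canonical because the tokenizer maps the string `ab` to the single token `ab`, so after sampling `a` the continuation `b` is excluded.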


Article 1512

Title@2025-06-06 (5): Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

Title: Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Saffron-1: Auf dem Weg zu einem Skalierungsparadigma für die LLM-Sicherheitssicherung 山红-1:努力为LLM公司安全保证建立一个推论范围扩增范例 2506.06444v1

Authors (5): Ruizhong Qiu, Gaotang Li, Tianxin Wei, Jingrui He, Hanghang Tong

Existing safety assurance research has primarily focused on training-phase alignment to instill safe behaviors into LLMs. However, recent studies have exposed these methods’ susceptibility to diverse jailbreak attacks. Concurrently, inference scaling has significantly advanced LLM reasoning capabilities but remains unexplored in the context of safety assurance. Addressing this gap, our work pioneers inference scaling for robust and effective LLM safety against emerging threats. We reveal that conventional inference scaling techniques, despite their success in reasoning tasks, perform poorly in safety contexts, even falling short of basic approaches like Best-of-N Sampling. We attribute this inefficiency to a newly identified challenge, the exploration–efficiency dilemma, arising from the high computational overhead associated with frequent process reward model (PRM) evaluations. To overcome this dilemma, we propose SAFFRON, a novel inference scaling paradigm tailored explicitly for safety assurance. Central to our approach is the introduction of a multifurcation reward model (MRM) that significantly reduces the required number of reward model evaluations. To operationalize this paradigm, we further propose: (i) a partial supervision training objective for MRM, (ii) a conservative exploration constraint to prevent out-of-distribution explorations, and (iii) a Trie-based key–value caching strategy that facilitates cache sharing across sequences during tree search. Extensive experiments validate the effectiveness of our method. Additionally, we publicly release our trained multifurcation reward model (Saffron-1) and the accompanying token-level safety reward dataset (Safety4M) to accelerate future research in LLM safety. Our code, model, and data are publicly available at https://github.com/q-rz/saffron , and our project homepage is at https://q-rz.github.io/p/saffron .


Article 1513

Title@2025-06-06 (5): Unlocking Chemical Insights: Superior Molecular Representations from Intermediate Encoder Layers

Title: Unlocking Chemical Insights: Superior Molecular Representations from Intermediate Encoder Layers Entsperren von chemischen Erkenntnissen: Überlegene molekulare Darstellungen von Zwischen-Encoder-Schichten 解锁化学透视:中间编码层的高级分子代表 2506.06443v1

Authors (1): Luis Pinto

Pretrained molecular encoders have become indispensable in computational chemistry for tasks such as property prediction and molecular generation. However, the standard practice of relying solely on final-layer embeddings for downstream tasks may discard valuable information. In this work, we challenge this convention by conducting a comprehensive layer-wise analysis of five diverse molecular encoders across 22 ADMET property prediction tasks. Our results demonstrate that embeddings from intermediate layers consistently outperform final-layer representations. Specifically, using fixed embeddings from the optimal intermediate layers improved downstream performance by an average of 5.4%, reaching gains up to 28.6%. Furthermore, finetuning up to these intermediate layers yielded even greater average improvements of 8.5%, with performance increases as high as 40.8%, achieving new state-of-the-art results on several benchmarks. Additionally, a strong positive correlation between fixed embedding performance and finetuning outcomes supports an efficient evaluate-then-finetune approach, enabling identification of optimal layers with reduced computational cost. These findings highlight the importance of exploring the full representational depth of molecular encoders to achieve substantial performance improvements and computational efficiency. The code is made publicly available at https://github.com/luispintoc/Unlocking-Chemical-Insights.


Article 1514

Title@2025-06-06 (5): A Framework of decision-relevant observability: Reinforcement Learning converges under relative ignorability

Title: A Framework of decision-relevant observability: Reinforcement Learning converges under relative ignorability Ein Rahmen entscheidungsrelevanter Beobachtbarkeit: Verstärktes Lernen konvergiert unter relativer Entzündlichkeit 与决策相关的可观察性框架:加强学习在相对忽视的情况下趋同 2504.07722v4

Authors (1): MaryLena Bleile

From clinical dosing algorithms to autonomous robots, sequential decision-making systems routinely operate with missing or incomplete data. Classical reinforcement learning theory, which is commonly used to solve sequential decision problems, assumes Markovian observability, which may not hold under partial observability. Causal inference paradigms formalise ignorability of missingness. We show these views can be unified and generalized in order to guarantee Q-learning convergence even when the Markov property fails. To do so, we introduce the concept of relative ignorability. Relative ignorability is a graphical-causal criterion which refines the requirements for accurate decision-making based on incomplete data. Theoretical results and simulations both reveal that non-Markovian stochastic processes whose missingness is relatively ignorable with respect to causal estimands can still be optimized using standard reinforcement learning algorithms. These results expand the theoretical foundations of safe, data-efficient AI to real-world environments where complete information is unattainable.


Article 1515

Title@2025-06-06 (5): Exploring Diffusion Transformer Designs via Grafting

Title: Exploring Diffusion Transformer Designs via Grafting Erforschen von Diffusion Transformer Designs über Grafting 探索通过Garfting 开发传播变异器设计 2506.05340v2

Authors (12): Keshigeyan Chandrasegaran, Michael Poli, Daniel Y. Fu, Dongjun Kim, Lea M. Hadzic, Manling Li, Agrim Gupta, Stefano Massaroli, Azalia Mirhoseini, Juan Carlos Niebles, Stefano Ermon, Li Fei-Fei

Designing model architectures requires decisions such as selecting operators (e.g., attention, convolution) and configurations (e.g., depth, width). However, evaluating the impact of these decisions on model quality requires costly pretraining, limiting architectural investigation. Inspired by how new software is built on existing code, we ask: can new architecture designs be studied using pretrained models? To this end, we present grafting, a simple approach for editing pretrained diffusion transformers (DiTs) to materialize new architectures under small compute budgets. Informed by our analysis of activation behavior and attention locality, we construct a testbed based on the DiT-XL/2 design to study the impact of grafting on model quality. Using this testbed, we develop a family of hybrid designs via grafting: replacing softmax attention with gated convolution, local attention, and linear attention, and replacing MLPs with variable expansion ratio and convolutional variants. Notably, many hybrid designs achieve good quality (FID: 2.38-2.64 vs. 2.27 for DiT-XL/2) using <2% pretraining compute. We then graft a text-to-image model (PixArt-Sigma), achieving a 1.43x speedup with less than a 2% drop in GenEval score. Finally, we present a case study that restructures DiT-XL/2 by converting every pair of sequential transformer blocks into parallel blocks via grafting. This reduces model depth by 2x and yields better quality (FID: 2.77) than other models of comparable depth. Together, we show that new diffusion model designs can be explored by grafting pretrained DiTs, with edits ranging from operator replacement to architecture restructuring. Code and grafted models: https://grafting.stanford.edu


Article 1516

Title@2025-06-06 (5): Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias

Title: Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias Eigenspektrumanalyse von Neuronalen Netzwerken ohne Aspect Ratio Bias 对无视比比比的神经网络进行微微光谱分析 2506.06280v1

Authors (4): Yuanzhe Hu, Kinshuk Goel, Vlad Killiakov, Yaoqing Yang

Diagnosing deep neural networks (DNNs) through the eigenspectrum of weight matrices has been an active area of research in recent years. At a high level, eigenspectrum analysis of DNNs involves measuring the heavytailness of the empirical spectral densities (ESD) of weight matrices. It provides insight into how well a model is trained and can guide decisions on assigning better layer-wise training hyperparameters. In this paper, we address a challenge associated with such eigenspectrum methods: the impact of the aspect ratio of weight matrices on estimated heavytailness metrics. We demonstrate that matrices of varying sizes (and aspect ratios) introduce a non-negligible bias in estimating heavytailness metrics, leading to inaccurate model diagnosis and layer-wise hyperparameter assignment. To overcome this challenge, we propose FARMS (Fixed-Aspect-Ratio Matrix Subsampling), a method that normalizes the weight matrices by subsampling submatrices with a fixed aspect ratio. Instead of measuring the heavytailness of the original ESD, we measure the average ESD of these subsampled submatrices. We show that measuring the heavytailness of these submatrices with the fixed aspect ratio can effectively mitigate the aspect ratio bias. We validate our approach across various optimization techniques and application domains that involve eigenspectrum analysis of weights, including image classification in computer vision (CV) models, scientific machine learning (SciML) model training, and large language model (LLM) pruning. Our results show that despite its simplicity, FARMS uniformly improves the accuracy of eigenspectrum analysis while enabling more effective layer-wise hyperparameter assignment in these application domains. In one of the LLM pruning experiments, FARMS reduces the perplexity of the LLaMA-7B model by 17.3% when compared with the state-of-the-art method.
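
The subsampling idea can be sketched as follows. This is a rough illustration, not the authors' implementation: it assumes contiguous submatrices at random offsets, pools the squared singular values of all submatrices as a stand-in for the averaged ESD, and uses the Hill estimator as the heavy-tailness metric. The function names and the parameters `aspect`, `n_sub`, and `k_frac` are hypothetical.

```python
import numpy as np

def hill_alpha(eigs, k_frac=0.1):
    """Hill estimate of the power-law exponent of the upper tail of eigs."""
    eigs = np.sort(eigs)
    k = max(2, int(len(eigs) * k_frac))
    tail, x_min = eigs[-k:], eigs[-k - 1]
    return 1.0 + k / np.sum(np.log(tail / x_min))

def farms_alpha(W, aspect=1.0, n_sub=8, seed=0):
    """Tail exponent from submatrices that all share rows/cols == aspect."""
    rng = np.random.default_rng(seed)
    rows, cols = W.shape
    # Largest submatrix of the requested aspect ratio that fits inside W.
    sub_cols = int(min(cols, rows / aspect))
    sub_rows = int(sub_cols * aspect)
    pooled = []
    for _ in range(n_sub):
        r0 = rng.integers(0, rows - sub_rows + 1)
        c0 = rng.integers(0, cols - sub_cols + 1)
        sub = W[r0 : r0 + sub_rows, c0 : c0 + sub_cols]
        # Squared singular values are the eigenvalues of sub^T sub.
        pooled.append(np.linalg.svd(sub, compute_uv=False) ** 2)
    return hill_alpha(np.concatenate(pooled))

# Two random weight matrices (hypothetical) with very different aspect ratios.
rng = np.random.default_rng(0)
alpha_tall = farms_alpha(rng.standard_normal((256, 64)))
alpha_wide = farms_alpha(rng.standard_normal((64, 256)))
print(alpha_tall, alpha_wide)
```

Because both matrices are reduced to 64 x 64 submatrices before the spectrum is measured, the two estimates are computed at the same aspect ratio, which is the normalization the abstract describes.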


Article 1517

Title@2025-06-06 (5): STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Title: STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis STARFlow: Skalierung von Latent Normalisierungsströmen für hochauflösende Bildsynthese STARRFlow: 增加高分辨率图像合成的中端正常化流程 2506.06276v1

Authors (10): Jiatao Gu, Tianrong Chen, David Berthelot, Huangjie Zheng, Yuyang Wang, Ruixiang Zhang, Laurent Dinh, Miguel Angel Bautista, Josh Susskind, Shuangfei Zhai

We present STARFlow, a scalable generative model based on normalizing flows that achieves strong performance in high-resolution image synthesis. The core of STARFlow is Transformer Autoregressive Flow (TARFlow), which combines the expressive power of normalizing flows with the structured modeling capabilities of Autoregressive Transformers. We first establish the theoretical universality of TARFlow for modeling continuous distributions. Building on this foundation, we introduce several key architectural and algorithmic innovations to significantly enhance scalability: (1) a deep-shallow design, wherein a deep Transformer block captures most of the model representational capacity, complemented by a few shallow Transformer blocks that are computationally efficient yet substantially beneficial; (2) modeling in the latent space of pretrained autoencoders, which proves more effective than direct pixel-level modeling; and (3) a novel guidance algorithm that significantly boosts sample quality. Crucially, our model remains an end-to-end normalizing flow, enabling exact maximum likelihood training in continuous spaces without discretization. STARFlow achieves competitive performance in both class-conditional and text-conditional image generation tasks, approaching state-of-the-art diffusion models in sample quality. To our knowledge, this work is the first successful demonstration of normalizing flows operating effectively at this scale and resolution.


Article 1518

Title@2025-06-06 (5): Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding

Title: Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding Filmfakten und Fibs (MF$^2$): Ein Benchmark für langes Filmverständnis 电影事实和纤维(MF$2美元):长期电影理解基准 2506.06275v1

Authors (31): Emmanouil Zaranis, António Farinhas, Saul Santos, Beatriz Canaverde, Miguel Moura Ramos, Aditya K Surikuchi, André Viveiros, Baohao Liao, Elena Bueno-Benito, Nithin Sivakumaran, Pavlo Vasylenko, Shoubin Yu, Sonal Sannigrahi, Wafaa Mohammed, Ben Peters, Danae Sánchez Villegas, Elias Stengel-Eskin, Giuseppe Attanasio, Jaehong Yoon, Stella Frank, Alessandro Suglia, Chrysoula Zerva, Desmond Elliott, Mariella Dimiccoli, Mohit Bansal, Oswald Lanz, Raffaella Bernardi, Raquel Fernández, Sandro Pezzelle, Vlad Niculae, André F. T. Martins

Despite recent progress in vision-language models (VLMs), holistic understanding of long-form video content remains a significant challenge, partly due to limitations in current benchmarks. Many focus on peripheral, “needle-in-a-haystack” details, encouraging context-insensitive retrieval over deep comprehension. Others rely on large-scale, semi-automatically generated questions (often produced by language models themselves) that are easier for models to answer but fail to reflect genuine understanding. In this paper, we introduce MF$^2$, a new benchmark for evaluating whether models can comprehend, consolidate, and recall key narrative information from full-length movies (50-170 minutes long). MF$^2$ includes over 50 full-length, open-licensed movies, each paired with manually constructed sets of claim pairs – one true (fact) and one plausible but false (fib), totalling over 850 pairs. These claims target core narrative elements such as character motivations and emotions, causal chains, and event order, and refer to memorable moments that humans can recall without rewatching the movie. Instead of multiple-choice formats, we adopt a binary claim evaluation protocol: for each pair, models must correctly identify both the true and false claims. This reduces biases like answer ordering and enables a more precise assessment of reasoning. Our experiments demonstrate that both open-weight and closed state-of-the-art models fall well short of human performance, underscoring the relative ease of the task for humans and their superior ability to retain and reason over critical narrative information – an ability current VLMs lack.
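
The paired scoring rule in the abstract (a pair counts only if the fact is judged true and the fib is judged false) can be sketched as below. The data layout, the claim identifiers, and the function name `pairwise_accuracy` are hypothetical; the benchmark's actual evaluation code may differ.

```python
def pairwise_accuracy(predictions, pairs):
    """Fraction of (fact, fib) pairs where both claims are judged correctly.

    predictions maps claim id -> bool (True means the model says "true").
    pairs is a list of (fact_id, fib_id) tuples.
    """
    correct = sum(
        1 for fact, fib in pairs if predictions[fact] and not predictions[fib]
    )
    return correct / len(pairs)

# Toy example (hypothetical ids): three claim pairs, six binary judgments.
pairs = [("f1", "b1"), ("f2", "b2"), ("f3", "b3")]
predictions = {
    "f1": True, "b1": False,   # both right: pair counts
    "f2": True, "b2": True,    # fib accepted: pair fails
    "f3": False, "b3": False,  # fact rejected: pair fails
}
score = pairwise_accuracy(predictions, pairs)
print(score)
```

In this toy case four of the six individual judgments are correct, yet only one of three pairs is credited, which shows why the paired protocol is stricter than per-claim accuracy.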


Article 1519

Title@2025-06-06 (5): Do Large Language Models Reason Causally Like Us? Even Better?

Title: Do Large Language Models Reason Causally Like Us? Even Better? Haben große Sprachmodelle Grund ursächlich wie wir? Noch besser? 大语言模型的原因和我们一样吗?更好吗? 2502.10215v2

Authors (4): Hanna M. Dettki, Brenden M. Lake, Charley M. Wu, Bob Rehder

Causal reasoning is a core component of intelligence. Large language models (LLMs) have shown impressive capabilities in generating human-like text, raising questions about whether their responses reflect true understanding or statistical patterns. We compared causal reasoning in humans and four LLMs using tasks based on collider graphs, rating the likelihood of a query variable occurring given evidence from other variables. LLMs’ causal inferences ranged from often nonsensical (GPT-3.5) to human-like to often more normatively aligned than those of humans (GPT-4o, Gemini-Pro, and Claude). Computational model fitting showed that one reason for GPT-4o, Gemini-Pro, and Claude’s superior performance is they didn’t exhibit the “associative bias” that plagues human causal reasoning. Nevertheless, even these LLMs did not fully capture subtler reasoning patterns associated with collider graphs, such as “explaining away”.


Article 1520

Title@2025-06-06 (5): Covering Number of Real Algebraic Varieties and Beyond: Improved Bounds and Applications

Title: Covering Number of Real Algebraic Varieties and Beyond: Improved Bounds and Applications Abdeckung der Anzahl der realen algebraischen Sorten und darüber hinaus: Verbesserte Grenzen und Anwendungen 覆盖实际代数差异数目及以后:改善边界和应用 2311.05116v4

Authors (2): Yifan Zhang, Joe Kileel

Covering numbers are a powerful tool used in the development of approximation algorithms, randomized dimension reduction methods, smoothed complexity analysis, and others. In this paper we prove upper bounds on the covering number of numerous sets in Euclidean space, namely real algebraic varieties, images of polynomial maps and semialgebraic sets in terms of the number of variables and degrees of the polynomials involved. The bounds remarkably improve the best known general bound by Yomdin-Comte, and our proof is much more straightforward. In particular, our result gives new bounds on the volume of the tubular neighborhood of the image of a polynomial map and a semialgebraic set, where results for varieties by Lotz and Basu-Lerario are not directly applicable. We illustrate the power of the result on three computational applications. Firstly, we derive a near-optimal bound on the covering number of tensors with low canonical polyadic (CP) rank, quantifying their approximation properties and filling in an important missing piece of theory for tensor dimension reduction and reconstruction. Secondly, we prove a bound on dimensionality reduction of images of polynomial maps via randomized sketching, which has direct applications to large scale polynomial optimization. Finally, we deduce generalization error bounds for deep neural networks with rational or ReLU activation functions, improving or matching the best known results in the machine learning literature while helping to quantify the impact of architecture choice on generalization error.


Article 1521

Title@2025-06-06 (5): Sketched Equivariant Imaging Regularization and Deep Internal Learning for Inverse Problems

Title: Sketched Equivariant Imaging Regularization and Deep Internal Learning for Inverse Problems Sketched Equivariant Imaging Regularization und Deep Internal Learning für inverse Probleme 用于反向问题的预言的等同成像正规化和深内部学习 2411.05771v4

Authors (3): Guixian Xu, Jinglai Li, Junqi Tang

Equivariant Imaging (EI) regularization has become the de facto technique for unsupervised training of deep imaging networks, without any need for ground-truth data. Observing that the EI-based unsupervised training paradigm currently has significant computational redundancy leading to inefficiency in high-dimensional applications, we propose a sketched EI regularization which leverages randomized sketching techniques for acceleration. We apply our sketched EI regularization to develop an accelerated deep internal learning framework, which can be efficiently applied for test-time network adaptation. Additionally, for network adaptation tasks, we propose a parameter-efficient approach to accelerate both EI and Sketched-EI via optimizing only the normalization layers. Our numerical study on X-ray CT and multicoil magnetic resonance image reconstruction tasks demonstrates that our approach can achieve significant computational acceleration over the standard EI counterpart in the single-input setting and in network adaptation at test time.


Article 1522

Title@2025-06-06 (5): Normalizing Flows are Capable Generative Models

Title: Normalizing Flows are Capable Generative Models Normalisierende Strömungen sind fähige Generative Modelle 正常流动是能够生成的模型 2412.06329v3

Authors (10): Shuangfei Zhai, Ruixiang Zhang, Preetum Nakkiran, David Berthelot, Jiatao Gu, Huangjie Zheng, Tianrong Chen, Miguel Angel Bautista, Navdeep Jaitly, Josh Susskind

Normalizing Flows (NFs) are likelihood-based models for continuous inputs. They have demonstrated promising results on both density estimation and generative modeling tasks, but have received relatively little attention in recent years. In this work, we demonstrate that NFs are more powerful than previously believed. We present TarFlow: a simple and scalable architecture that enables highly performant NF models. TarFlow can be thought of as a Transformer-based variant of Masked Autoregressive Flows (MAFs): it consists of a stack of autoregressive Transformer blocks on image patches, alternating the autoregression direction between layers. TarFlow is straightforward to train end-to-end, and capable of directly modeling and generating pixels. We also propose three key techniques to improve sample quality: Gaussian noise augmentation during training, a post-training denoising procedure, and an effective guidance method for both class-conditional and unconditional settings. Putting these together, TarFlow sets new state-of-the-art results on likelihood estimation for images, beating the previous best methods by a large margin, and generates samples with quality and diversity comparable to diffusion models, for the first time with a stand-alone NF model. We make our code available at https://github.com/apple/ml-tarflow.



Article 1523

Title@2025-06-06 (5): Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens

Title: Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens Reflect-then-Plan: Offline-Modellbasierte Planung durch eine doppelt Bayesian-Lens 反映时计划:通过杜布里·巴耶斯冷镜头进行离线示范规划 2506.06261v1

Authors (5): Jihwan Jeong, Xiaoyu Wang, Jingmin Wang, Scott Sanner, Pascal Poupart

Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe but often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, restricting adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel doubly Bayesian offline model-based (MB) planning approach. RefPlan unifies uncertainty modeling and MB planning by recasting planning as Bayesian posterior estimation. At deployment, it updates a belief over environment dynamics using real-time observations, incorporating uncertainty into MB planning via marginalization. Empirical results on standard benchmarks show that RefPlan significantly improves the performance of conservative offline RL policies. In particular, RefPlan maintains robust performance under high epistemic uncertainty and limited data, while demonstrating resilience to changing environment dynamics, improving the flexibility, generalizability, and robustness of offline-learned policies.



Article 1524

Title@2025-06-06 (5): DPCore: Dynamic Prompt Coreset for Continual Test-Time Adaptation

Title: DPCore: Dynamic Prompt Coreset for Continual Test-Time Adaptation DPCore: Dynamischer Prompt-Kernsatz für kontinuierliche Test-Zeit-Anpassung DPCore: 持续试验-时间适应动态快速核心集 2406.10737v4

Authors (4): Yunbei Zhang, Akshay Mehra, Shuaicheng Niu, Jihun Hamm

Continual Test-Time Adaptation (CTTA) seeks to adapt source pre-trained models to continually changing, unseen target domains. While existing CTTA methods assume structured domain changes with uniform durations, real-world environments often exhibit dynamic patterns where domains recur with varying frequencies and durations. Current approaches, which adapt the same parameters across different domains, struggle in such dynamic conditions: they face convergence issues with brief domain exposures, risk forgetting previously learned knowledge, or misapply it to irrelevant domains. To remedy this, we propose DPCore, a method designed for robust performance across diverse domain change patterns while ensuring computational efficiency. DPCore integrates three key components: Visual Prompt Adaptation for efficient domain alignment, a Prompt Coreset for knowledge preservation, and a Dynamic Update mechanism that intelligently adjusts existing prompts for similar domains while creating new ones for substantially different domains. Extensive experiments on four benchmarks demonstrate that DPCore consistently outperforms various CTTA methods, achieving state-of-the-art performance in both structured and dynamic settings while reducing trainable parameters by 99% and computation time by 64% compared to previous approaches.



Article 1525

Title@2025-06-06 (5): PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time

Title: PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time PersonaAgent: Wenn große Sprachmodell-Agenten Personalisierung zur Testzeit treffen 人 员:当大语言示范媒介在试验时间达到个性化时 2506.06254v1

Authors (15): Weizhi Zhang, Xinyang Zhang, Chenwei Zhang, Liangwei Yang, Jingbo Shang, Zhepei Wei, Henry Peng Zou, Zijie Huang, Zhengyang Wang, Yifan Gao, Xiaoman Pan, Lian Xiong, Jingguo Liu, Philip S. Yu, Xian Li

Large Language Model (LLM) empowered agents have recently emerged as advanced paradigms that exhibit impressive capabilities in a wide range of domains and tasks. Despite their potential, current LLM agents often adopt a one-size-fits-all approach, lacking the flexibility to respond to users’ varying needs and preferences. This limitation motivates us to develop PersonaAgent, the first personalized LLM agent framework designed to address versatile personalization tasks. Specifically, PersonaAgent integrates two complementary components: a personalized memory module that includes episodic and semantic memory mechanisms, and a personalized action module that enables the agent to perform tool actions tailored to the user. At the core, the persona (defined as a unique system prompt for each user) functions as an intermediary: it leverages insights from personalized memory to control agent actions, while the outcomes of these actions in turn refine the memory. Based on the framework, we propose a test-time user-preference alignment strategy that simulates the latest n interactions to optimize the persona prompt, ensuring real-time user preference alignment through textual loss feedback between simulated and ground-truth responses. Experimental evaluations demonstrate that PersonaAgent significantly outperforms other baseline methods by not only personalizing the action space effectively but also scaling during test-time real-world applications. These results underscore the feasibility and potential of our approach in delivering tailored, dynamic user experiences.



Article 1526

Title@2025-06-06 (5): Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Title: Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models Sparse Autoencoder lernen monosemantische Funktionen in Vision-Language-Modellen 在视觉语言模型中学习数学特征 2504.02821v2

Authors (5): Mateusz Pach, Shyamgopal Karthik, Quentin Bouniot, Serge Belongie, Zeynep Akata

Given that interpretability and steerability are crucial to AI safety, Sparse Autoencoders (SAEs) have emerged as a tool to enhance them in Large Language Models (LLMs). In this work, we extend the application of SAEs to Vision-Language Models (VLMs), such as CLIP, and introduce a comprehensive framework for evaluating monosemanticity at the neuron-level in vision representations. To ensure that our evaluation aligns with human perception, we propose a benchmark derived from a large-scale user study. Our experimental results reveal that SAEs trained on VLMs significantly enhance the monosemanticity of individual neurons, with sparsity and wide latents being the most influential factors. Notably, we demonstrate that applying SAE interventions on CLIP’s vision encoder directly steers multimodal LLM outputs (e.g., LLaVA), without any modifications to the underlying model. These findings emphasize the practicality and efficacy of SAEs as an unsupervised tool for enhancing both interpretability and control of VLMs. Code is available at https://github.com/ExplainableML/sae-for-vlm.



Article 1527

Title@2025-06-06 (5): Lagrangian-based Equilibrium Propagation: generalisation to arbitrary boundary conditions & equivalence with Hamiltonian Echo Learning

Title: Lagrangian-based Equilibrium Propagation: generalisation to arbitrary boundary conditions & equivalence with Hamiltonian Echo Learning Lagrangian-basierte Equilibrium Propagation: Verallgemeinerung zu willkürlichen Randbedingungen & Äquivalenz mit Hamiltonian Echo Learning Lagrangian 以拉格朗江为基础的平衡传播:对任意边界条件的概括化和与汉密尔顿回声学习的等同 2506.06248v1

Authors (4): Guillaume Pourcel, Debabrota Basu, Maxence Ernoult, Aditya Gilra

Equilibrium Propagation (EP) is a learning algorithm for training Energy-based Models (EBMs) on static inputs which leverages the variational description of their fixed points. Extending EP to time-varying inputs is a challenging problem, as the variational description must apply to the entire system trajectory rather than just fixed points, and careful consideration of boundary conditions becomes essential. In this work, we present Generalized Lagrangian Equilibrium Propagation (GLEP), which extends the variational formulation of EP to time-varying inputs. We demonstrate that GLEP yields different learning algorithms depending on the boundary conditions of the system, many of which are impractical for implementation. We then show that Hamiltonian Echo Learning (HEL) – which includes the recently proposed Recurrent HEL (RHEL) and the earlier known Hamiltonian Echo Backpropagation (HEB) algorithms – can be derived as a special case of GLEP. Notably, HEL is the only instance of GLEP we found that inherits the properties that make EP a desirable alternative to backpropagation for hardware implementations: it operates in a “forward-only” manner (i.e. using the same system for both inference and learning), it scales efficiently (requiring only two or more passes through the system regardless of model size), and enables local learning.



Article 1528

Title@2025-06-06 (5): Neural Responses to Affective Sentences Reveal Signatures of Depression

Title: Neural Responses to Affective Sentences Reveal Signatures of Depression Neurale Reaktionen auf Affective Sentences Offenbarungssignaturen von Depressionen 对消极判决的神经反应 反萧条征兆 2506.06244v1

Authors (16): Aditya Kommineni, Woojae Jeong, Kleanthis Avramidis, Colin McDaniel, Myzelle Hughes, Thomas McGee, Elsi Kaiser, Kristina Lerman, Idan A. Blank, Dani Byrd, Assal Habibi, B. Rael Cahn, Sudarsana Kadiri, Takfarinas Medani, Richard M. Leahy, Shrikanth Narayanan

Major Depressive Disorder (MDD) is a highly prevalent mental health condition, and a deeper understanding of its neurocognitive foundations is essential for identifying how core functions such as emotional and self-referential processing are affected. We investigate how depression alters the temporal dynamics of emotional processing by measuring neural responses to self-referential affective sentences using surface electroencephalography (EEG) in healthy and depressed individuals. Our results reveal significant group-level differences in neural activity during sentence viewing, suggesting disrupted integration of emotional and self-referential information in depression. A deep learning model trained on these responses achieves an area under the receiver operating characteristic curve (AUC) of 0.707 in distinguishing healthy from depressed participants, and 0.624 in differentiating depressed subgroups with and without suicidal ideation. Spatial ablations highlight anterior electrodes associated with semantic and affective processing as key contributors. These findings suggest stable, stimulus-driven neural signatures of depression that may inform future diagnostic tools.



Article 1529

Title@2025-06-06 (5): LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models

Title: LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models LlavaGuard: Ein offenes VLM-basiertes Framework zur Sicherung von Vision-Datensätzen und -Modellen LlavaGuard:基于VLM的公开保护视觉数据集和模型框架 2406.05113v3

Authors (5): Lukas Helff, Felix Friedrich, Manuel Brack, Kristian Kersting, Patrick Schramowski

This paper introduces LlavaGuard, a suite of VLM-based vision safeguards that address the critical need for reliable guardrails in the era of large-scale data and models. To this end, we establish a novel open framework, describing a customizable safety taxonomy, data preprocessing, augmentation, and training setup. For teaching a VLM safeguard on safety, we further create a multimodal safety dataset with high-quality human expert annotations, where each image is labeled with a safety rating, category, and rationale. We also employ advanced augmentations to support context-specific assessments. The resulting LlavaGuard models, ranging from 0.5B to 7B, serve as a versatile tool for evaluating the safety compliance of visual content against flexible policies. In comprehensive experiments, LlavaGuard outperforms both state-of-the-art safeguards and VLMs in accuracy and in flexibly handling different policies. Additionally, we demonstrate LlavaGuard’s performance in two real-world applications: large-scale dataset annotation and moderation of text-to-image models. We make our entire framework publicly available, including the dataset, model weights, and training code.



Article 1530

Title@2025-06-06 (5): fairmetrics: An R package for group fairness evaluation

Title: fairmetrics: An R package for group fairness evaluation fairmetrics: Ein R-Paket für die Bewertung von Gruppengerechtigkeit 公平度:团体公平评估R包件 2506.06243v1

Authors (3): Benjamin Smith, Jianhui Gao, Jessica Gronsbell

Fairness is a growing area of machine learning (ML) that focuses on ensuring models do not produce systematically biased outcomes for specific groups, particularly those defined by protected attributes such as race, gender, or age. Evaluating fairness is a critical aspect of ML model development, as biased models can perpetuate structural inequalities. The {fairmetrics} R package offers a user-friendly framework for rigorously evaluating numerous group-based fairness criteria, including metrics based on independence (e.g., statistical parity), separation (e.g., equalized odds), and sufficiency (e.g., predictive parity). Group-based fairness criteria assess whether a model is equally accurate or well-calibrated across a set of predefined groups so that appropriate bias mitigation strategies can be implemented. {fairmetrics} provides both point and interval estimates for multiple metrics through a convenient wrapper function and includes an example dataset derived from the Medical Information Mart for Intensive Care, version II (MIMIC-II) database (Goldberger et al., 2000; Raffa, 2016).
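
As a rough illustration of the three families of criteria named above, the sketch below computes point estimates of between-group gaps for binary labels and predictions. It is written in Python for illustration and is not the {fairmetrics} API; the function names and the toy data in the usage note are assumptions, and the interval estimates the package provides are not reproduced here.

```python
def rate(values):
    """Mean of a list of 0/1 values, or NaN if the list is empty."""
    return sum(values) / len(values) if values else float("nan")


def group_rates(y_true, y_pred, group, g):
    """Per-group rates behind independence, separation and sufficiency."""
    idx = [i for i, a in enumerate(group) if a == g]
    return {
        # independence: P(pred = 1 | group)
        "ppr": rate([y_pred[i] for i in idx]),
        # separation: P(pred = 1 | y = 1, group) and P(pred = 1 | y = 0, group)
        "tpr": rate([y_pred[i] for i in idx if y_true[i] == 1]),
        "fpr": rate([y_pred[i] for i in idx if y_true[i] == 0]),
        # sufficiency: P(y = 1 | pred = 1, group)
        "ppv": rate([y_true[i] for i in idx if y_pred[i] == 1]),
    }


def fairness_gaps(y_true, y_pred, group, a, b):
    """Differences (group a minus group b) for each criterion."""
    ra = group_rates(y_true, y_pred, group, a)
    rb = group_rates(y_true, y_pred, group, b)
    return {
        "statistical_parity_diff": ra["ppr"] - rb["ppr"],
        "equalized_odds_tpr_diff": ra["tpr"] - rb["tpr"],
        "equalized_odds_fpr_diff": ra["fpr"] - rb["fpr"],
        "predictive_parity_diff": ra["ppv"] - rb["ppv"],
    }
```

For example, calling `fairness_gaps(y_true, y_pred, group, "a", "b")` on three equal-length lists returns a dictionary of four gaps; values near zero indicate that the corresponding criterion is approximately satisfied for that pair of groups.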



Article 1531

Title@2025-06-06 (5): TQml Simulator: Optimized Simulation of Quantum Machine Learning

Title: TQml Simulator: Optimized Simulation of Quantum Machine Learning TQml Simulator: Optimierte Simulation des Quantum Machine Learning TQml 模拟器: 量子机器学习优化模拟 2506.04891v2

Authors (5): Viacheslav Kuzmin, Basil Kyriacou, Mateusz Papierz, Mo Kordzanganeh, Alexey Melnikov

Hardware-efficient circuits employed in Quantum Machine Learning are typically composed of alternating layers of uniformly applied gates. High-speed numerical simulators for such circuits are crucial for advancing research in this field. In this work, we numerically benchmark universal and gate-specific techniques for simulating the action of layers of gates on quantum state vectors, aiming to accelerate the overall simulation of Quantum Machine Learning algorithms. Our analysis shows that the optimal simulation method for a given layer of gates depends on the number of qubits involved, and that a tailored combination of techniques can yield substantial performance gains in the forward and backward passes for a given circuit. Building on these insights, we developed a numerical simulator, named TQml Simulator, that employs the most efficient simulation method for each layer in a given circuit. We evaluated TQml Simulator on circuits constructed from standard gate sets, such as rotations and CNOTs, as well as on native gates from IonQ and IBM quantum processing units. In most cases, our simulator outperforms PennyLane’s equivalent default_qubit simulator by up to a factor of 10, depending on the circuit, the number of qubits, the batch size of the input data, and the hardware used.



Article 1532

Title@2025-06-06 (5): Towards an Explainable Comparison and Alignment of Feature Embeddings

Title: Towards an Explainable Comparison and Alignment of Feature Embeddings Auf dem Weg zu einem erklärbaren Vergleich und der Ausrichtung von Feature-Embeddings 争取实现可解释的比较和地貌隐含物的一致 2506.06231v1

Authors (3): Mohammad Jalali, Bahar Dibaei Nia, Farzan Farnia

While several feature embedding models have been developed in the literature, comparisons of these embeddings have largely focused on their numerical performance in classification-related downstream applications. However, an interpretable comparison of different embeddings requires identifying and analyzing mismatches between sample groups clustered within the embedding spaces. In this work, we propose the Spectral Pairwise Embedding Comparison (SPEC) framework to compare embeddings and identify their differences in clustering a reference dataset. Our approach examines the kernel matrices derived from two embeddings and leverages the eigendecomposition of the difference kernel matrix to detect sample clusters that are captured differently by the two embeddings. We present a scalable implementation of this kernel-based approach, with computational complexity that grows linearly with the sample size. Furthermore, we introduce an optimization problem using this framework to align two embeddings, ensuring that clusters identified in one embedding are also captured in the other model. We provide numerical results demonstrating SPEC’s application to comparing and aligning embeddings on large-scale datasets such as ImageNet and MS-COCO. The code is available at https://github.com/mjalali/embedding-comparison.



Article 1533

Title@2025-06-06 (5): NeurNCD: Novel Class Discovery via Implicit Neural Representation

Title: NeurNCD: Novel Class Discovery via Implicit Neural Representation NeurNCD: Novel Class Discovery über Implizite Neuraldarstellung NeurNCD: 通过隐性神经代表方式发现新类 2506.06412v1

Authors (2): Junming Wang, Yi Shi

Discovering novel classes in open-world settings is crucial for real-world applications. Traditional explicit representations, such as object descriptors or 3D segmentation maps, are constrained by their discrete, hole-prone, and noisy nature, which hinders accurate novel class discovery. To address these challenges, we introduce NeurNCD, the first versatile and data-efficient framework for novel class discovery that employs the meticulously designed Embedding-NeRF model combined with KL divergence as a substitute for traditional explicit 3D segmentation maps to aggregate semantic embedding and entropy in visual embedding space. NeurNCD also integrates several key components, including feature query, feature modulation and clustering, facilitating efficient feature augmentation and information exchange between the pre-trained semantic segmentation network and implicit neural representations. As a result, our framework achieves superior segmentation performance in both open and closed-world settings without relying on densely labelled datasets for supervised training or human interaction to generate sparse label supervision. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches on the NYUv2 and Replica datasets.



Article 1534

Title@2025-06-06 (5): Fréchet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets

Title: Fréchet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets Fréchet Radiomic Distance (FRD): Ein vielseitiges Metric für den Vergleich medizinischer Bilddatensätze Fréchet放射性距离(FRD):用于比较医疗成像数据集的Versatile计量仪 2412.01496v2

Authors (19): Nicholas Konz, Richard Osuala, Preeti Verma, Yuwen Chen, Hanxue Gu, Haoyu Dong, Yaqian Chen, Andrew Marshall, Lidia Garrucho, Kaisar Kushibar, Daniel M. Lang, Gene S. Kim, Lars J. Grimm, John M. Lewin, James S. Duncan, Julia A. Schnabel, Oliver Diaz, Karim Lekadir, Maciej A. Mazurowski

Determining whether two sets of images belong to the same or different distributions or domains is a crucial task in modern medical image analysis and deep learning; for example, to evaluate the output quality of image generative models. Currently, metrics used for this task either rely on the (potentially biased) choice of some downstream task, such as segmentation, or adopt task-independent perceptual metrics (e.g., Fréchet Inception Distance/FID) from natural imaging, which we show insufficiently capture anatomical features. To this end, we introduce a new perceptual metric tailored for medical images, FRD (Fréchet Radiomic Distance), which utilizes standardized, clinically meaningful, and interpretable image features. We show that FRD is superior to other image distribution metrics for a range of medical imaging applications, including out-of-domain (OOD) detection, the evaluation of image-to-image translation (by correlating more with downstream task performance as well as anatomical consistency and realism), and the evaluation of unconditional image generation. Moreover, FRD offers additional benefits such as stability and computational efficiency at low sample sizes, sensitivity to image corruptions and adversarial attacks, feature interpretability, and correlation with radiologist-perceived image quality. Additionally, we address key gaps in the literature by presenting an extensive framework for the multifaceted evaluation of image similarity metrics in medical imaging – including the first large-scale comparative study of generative models for medical image translation – and release an accessible codebase to facilitate future research. Our results are supported by thorough experiments spanning a variety of datasets, modalities, and downstream tasks, highlighting the broad potential of FRD for medical image analysis.
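
For readers unfamiliar with Fréchet-style metrics, the sketch below computes the squared Fréchet distance between two feature sets under a Gaussian fit, which is the form FID uses. Two things here are assumptions rather than details from the abstract: that FRD applies this same Gaussian form to radiomic features, and the simplification to diagonal covariances (chosen so the example needs no matrix square root). Feature extraction and any normalization are out of scope.

```python
import math
from statistics import fmean, pvariance


def diag_gaussian_stats(features):
    """Per-dimension mean and population variance of a list of feature vectors."""
    cols = list(zip(*features))
    return [fmean(c) for c in cols], [pvariance(c) for c in cols]


def frechet_distance_sq_diag(feats_a, feats_b):
    """Squared Fréchet distance between diagonal Gaussians fitted to each set.

    General form: ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)).
    With diagonal covariances the trace term reduces to the sum of squared
    differences between per-dimension standard deviations.
    """
    mu_a, var_a = diag_gaussian_stats(feats_a)
    mu_b, var_b = diag_gaussian_stats(feats_b)
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mu_a, mu_b))
    cov_term = sum(
        (math.sqrt(va) - math.sqrt(vb)) ** 2 for va, vb in zip(var_a, var_b)
    )
    return mean_term + cov_term
```

Each argument is a list of equal-length feature vectors, one per image; identical sets give a distance of zero, and the value grows as the feature means or spreads drift apart.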



Article 1535

Title@2025-06-06 (5): Corrector Sampling in Language Models

Title: Corrector Sampling in Language Models Korrektor-Probenahme in Sprachmodellen 语言模型中校正器样本 2506.06215v1

Authors (4): Itai Gat, Neta Shaul, Uriel Singer, Yaron Lipman

Autoregressive language models accumulate errors due to their fixed, irrevocable left-to-right token generation. To address this, we propose a new sampling method called Resample-Previous-Tokens (RPT). RPT mitigates error accumulation by iteratively revisiting and potentially replacing tokens in a window of previously generated text. This method can be integrated into existing autoregressive models, preserving their next-token-prediction quality and speed. Fine-tuning a pretrained 8B parameter model with RPT for only 100B resulted in ~10% relative improvements on reasoning and coding benchmarks compared to the standard sampling.



Article 1536

Title@2025-06-06 (5): Model-Driven Graph Contrastive Learning

Title: Model-Driven Graph Contrastive Learning Modellgetriebenes Graph Kontrastives Lernen 模型驱动图反比学习 2506.06212v1

Authors (3): Ali Azizpour, Nicolas Zilberstein, Santiago Segarra

We propose MGCL, a model-driven graph contrastive learning (GCL) framework that leverages graphons (probabilistic generative models for graphs) to guide contrastive learning by accounting for the data’s underlying generative process. GCL has emerged as a powerful self-supervised framework for learning expressive node or graph representations without relying on annotated labels, which are often scarce in real-world data. By contrasting augmented views of graph data, GCL has demonstrated strong performance across various downstream tasks, such as node and graph classification. However, existing methods typically rely on manually designed or heuristic augmentation strategies that are not tailored to the underlying data distribution and operate at the individual graph level, ignoring similarities among graphs generated from the same model. Conversely, in our proposed approach, MGCL first estimates the graphon associated with the observed data and then defines a graphon-informed augmentation process, enabling data-adaptive and principled augmentations. Additionally, for graph-level tasks, MGCL clusters the dataset and estimates a graphon per group, enabling contrastive pairs to reflect shared semantics and structure. Extensive experiments on benchmark datasets demonstrate that MGCL achieves state-of-the-art performance, highlighting the advantages of incorporating generative models into GCL.



Article 1537

Title@2025-06-06 (5): Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning

Title: Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning Pseudo-Etikettierung trifft Labelglättung für geräuscharmes partiales Label-Lernen Pseudo 标签会满足新闻部分标签学习的标签平滑 2402.04835v3

Authors (3): Darshana Saravanan, Naresh Manwani, Vineet Gandhi

We motivate weakly supervised learning as an effective learning paradigm for problems where curating perfectly annotated datasets is expensive and may require domain expertise, such as fine-grained classification. We focus on Partial Label Learning (PLL), a weakly supervised learning paradigm where each training instance is paired with a set of candidate labels (partial label), one of which is the true label. Noisy PLL (NPLL) relaxes this constraint by allowing some partial labels to not contain the true label, enhancing the practicality of the problem. Our work centres on NPLL and presents a framework that initially assigns pseudo-labels to images by exploiting the noisy partial labels through a weighted nearest neighbour algorithm. These pseudo-label and image pairs are then used to train a deep neural network classifier with label smoothing. The classifier’s features and predictions are subsequently employed to refine and enhance the accuracy of pseudo-labels. We perform thorough experiments on seven datasets and compare against nine NPLL and PLL methods. We achieve state-of-the-art results in all studied settings from the prior literature, obtaining substantial gains in the simulated fine-grained benchmarks. Further, we show the promising generalisation capability of our framework in realistic, fine-grained, crowd-sourced datasets.



Article 1538

Title@2025-06-06 (5): Kinetics: Rethinking Test-Time Scaling Laws

Title: Kinetics: Rethinking Test-Time Scaling Laws Kinetik: Überdenken von Test-Zeit-Skalierungsgesetzen 动因:重新思考试验时间扩增法 2506.05333v2

Authors (6): Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Yang Zhou, Emma Strubell, Beidi Chen

We rethink test-time scaling laws from a practical efficiency perspective, revealing that the effectiveness of smaller models is significantly overestimated. Prior work, grounded in compute-optimality, overlooks critical memory access bottlenecks introduced by inference-time strategies (e.g., Best-of-$N$, long CoTs). Our holistic analysis, spanning models from 0.6B to 32B parameters, reveals a new Kinetics Scaling Law that better guides resource allocation by incorporating both computation and memory access costs. The Kinetics Scaling Law suggests that test-time compute is more effective when used on models above a threshold size than on smaller ones. A key reason is that in test-time scaling (TTS), attention, rather than parameter count, emerges as the dominant cost factor. Motivated by this, we propose a new scaling paradigm centered on sparse attention, which lowers per-token cost and enables longer generations and more parallel samples within the same resource budget. Empirically, we show that sparse attention models consistently outperform dense counterparts, achieving gains of over 60 points in low-cost regimes and over 5 points in high-cost regimes in problem-solving accuracy on AIME, including evaluations on state-of-the-art MoEs. These results suggest that sparse attention is essential, and increasingly important as more compute is invested, for realizing the full potential of test-time scaling, where, unlike training, accuracy has yet to saturate as a function of computation and continues to improve through increased generation. The code is available at https://github.com/Infini-AI-Lab/Kinetics.



Article 1539

Title@2025-06-06 (5): ByzSecAgg: A Byzantine-Resistant Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment

Title: ByzSecAgg: A Byzantine-Resistant Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment ByzSecAgg: Ein Byzantinisch-Resistent Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment ByzsecAgg:基于编码计算和矢量承诺的联邦学习拜占庭 – – 相对安全安全聚合计划 2302.09913v4

Authors (3): Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali, Giuseppe Caire

In this paper, we propose ByzSecAgg, an efficient secure aggregation scheme for federated learning that is resistant to Byzantine attacks and privacy leakage. Processing individual updates to manage adversarial behavior, while preserving the privacy of the data against colluding nodes, requires some sort of secure secret sharing. However, the communication load for secret sharing of long vectors of updates can be very high. In federated settings, where users are often edge devices with potential bandwidth constraints, excessive communication overhead is undesirable. ByzSecAgg solves this problem by partitioning local updates into smaller sub-vectors and sharing them using ramp secret sharing. However, this sharing method does not admit bilinear computations, such as pairwise distance calculations, which are needed for distance-based outlier-detection algorithms and for effective methods of mitigating Byzantine attacks. To overcome this issue, each user runs another round of ramp sharing, with a different embedding of the data in the sharing polynomial. This technique, motivated by ideas from coded computing, enables secure computation of pairwise distances. In addition, to maintain the integrity and privacy of the local update, ByzSecAgg also uses a vector commitment method, in which the commitment size remains constant (i.e., does not increase with the length of the local update), while simultaneously allowing verification of the secret sharing process. In terms of communication load, ByzSecAgg significantly outperforms the related baseline scheme, known as BREA.



Article 1540

Title@2025-06-06 (5): How to craft a deep reinforcement learning policy for wind farm flow control

Title: How to craft a deep reinforcement learning policy for wind farm flow control Wie man eine tief greifende Lernpolitik für die Steuerung des Windparkflusses schafft 如何为风力农场流动控制制定深入强化学习政策 2506.06204v1

Authors (5): Elie Kadoche, Pascal Bianchi, Florence Carton, Philippe Ciblat, Damien Ernst

Within wind farms, wake effects between turbines can significantly reduce overall energy production. Wind farm flow control encompasses methods designed to mitigate these effects through coordinated turbine control. Wake steering, for example, consists of intentionally misaligning certain turbines with the wind to optimize airflow and increase power output. However, designing a robust wake steering controller remains challenging, and existing machine learning approaches are limited to quasi-static wind conditions or small wind farms. This work presents a new deep reinforcement learning methodology to develop a wake steering policy that overcomes these limitations. Our approach introduces a novel architecture that combines graph attention networks and multi-head self-attention blocks, alongside a novel reward function and training strategy. The resulting model computes the yaw angles of each turbine, optimizing energy production in time-varying wind conditions. An empirical study conducted on a steady-state, low-fidelity simulation shows that our model requires approximately 10 times fewer training steps than a fully connected neural network and achieves more robust performance compared to a strong optimization baseline, increasing energy production by up to 14%. To the best of our knowledge, this is the first deep reinforcement learning-based wake steering controller to generalize effectively across any time-varying wind conditions in a low-fidelity, steady-state numerical simulation setting.



Article 1541

Title@2025-06-06 (5): MLOps with Microservices: A Case Study on the Maritime Domain

Title: MLOps with Microservices: A Case Study on the Maritime Domain MLOps mit Microservices: Eine Fallstudie zum maritimen Bereich 具有微服务的多边业务方案:海洋领域案例研究 2506.06202v1

Authors (3): Renato Cordeiro Ferreira, Rowanne Trapmann, Willem-Jan van den Heuvel

This case study describes challenges and lessons learned in building Ocean Guard: a Machine Learning-Enabled System (MLES) for anomaly detection in the maritime domain. First, the paper presents the system’s specification and architecture. Ocean Guard was designed with a microservices architecture to enable multiple teams to work on the project in parallel. Then, the paper discusses how the developers adapted contract-based design to MLOps to achieve that goal. As an MLES, Ocean Guard employs code, model, and data contracts to establish guidelines between its services. This case study hopes to inspire software engineers, machine learning engineers, and data scientists to leverage similar approaches for their systems.


Article 1542

Title@2025-06-06 (5): Quantifying the Optimization and Generalization Advantages of Graph Neural Networks Over Multilayer Perceptrons

Title: Quantifying the Optimization and Generalization Advantages of Graph Neural Networks Over Multilayer Perceptrons Quantifizierung der Optimierungs- und Generalisierungsvorteile von Graphen-Neuralnetzwerken gegenüber mehrschichtigen Perzeptronen 量化多层倍感图像神经网络的优化和普及优势 2306.13926v3

Authors (5): Wei Huang, Yuan Cao, Haonan Wang, Xin Cao, Taiji Suzuki

Graph neural networks (GNNs) have demonstrated remarkable capabilities in learning from graph-structured data, often outperforming traditional Multilayer Perceptrons (MLPs) in numerous graph-based tasks. Although existing works have demonstrated the benefits of graph convolution through Laplacian smoothing, expressivity or separability, there remains a lack of quantitative analysis comparing GNNs and MLPs from an optimization and generalization perspective. This study aims to address this gap by examining the role of graph convolution through feature learning theory. Using a signal-noise data model, we conduct a comparative analysis of the optimization and generalization between two-layer graph convolutional networks (GCNs) and their MLP counterparts. Our approach tracks the trajectory of signal learning and noise memorization in GNNs, characterizing their post-training generalization. We reveal that GNNs significantly prioritize signal learning, thus enlarging the regime of low test error over MLPs by $D^{q-2}$ times, where $D$ denotes a node’s expected degree and $q$ is the power of the ReLU activation function with $q>2$. This finding highlights a substantial and quantitative discrepancy between GNNs and MLPs in terms of optimization and generalization, a conclusion further supported by our empirical simulations on both synthetic and real-world datasets.


Article 1543

Title@2025-06-06 (5): DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models

Title: DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models DyGMamba: Langfristige zeitliche Abhängigkeiten auf kontinuierlich-zeitdynamischen Graphen mit State Space Models effizient modellieren DyGMamba:利用国家空间模型以连续时动态图表作为有效模拟长期时间依赖模型 2408.04713v4

Authors (8): Zifeng Ding, Yifeng Li, Yuan He, Antonio Norelli, Jingcheng Wu, Volker Tresp, Michael Bronstein, Yunpu Ma

Learning useful representations for continuous-time dynamic graphs (CTDGs) is challenging, due to the concurrent need to span long node interaction histories and grasp nuanced temporal details. In particular, two problems emerge: (1) Encoding longer histories requires more computational resources, making it crucial for CTDG models to maintain low computational complexity to ensure efficiency; (2) Meanwhile, more powerful models are needed to identify and select the most critical temporal information within the extended context provided by longer histories. To address these problems, we propose a CTDG representation learning model named DyGMamba, originating from the popular Mamba state space model (SSM). DyGMamba first leverages a node-level SSM to encode the sequence of historical node interactions. Another time-level SSM is then employed to exploit the temporal patterns hidden in the historical graph, where its output is used to dynamically select the critical information from the interaction history. We validate DyGMamba experimentally on the dynamic link prediction task. The results show that our model achieves state-of-the-art in most cases. DyGMamba also maintains high efficiency in terms of computational resources, making it possible to capture long temporal dependencies with a limited computation budget.


Article 1544

Title@2025-06-06 (5): Transformative or Conservative? Conservation laws for ResNets and Transformers

Title: Transformative or Conservative? Conservation laws for ResNets and Transformers Transformative oder konservative? Erhaltungsgesetze für ResNets und Transformer 改革还是保守? ResNets和变异器的养护法 2506.06194v1

Authors (3): Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré

While conservation laws in gradient flow training dynamics are well understood for (mostly shallow) ReLU and linear networks, their study remains largely unexplored for more practical architectures. This paper bridges this gap by deriving and analyzing conservation laws for modern architectures, with a focus on convolutional ResNets and Transformer networks. For this, we first show that basic building blocks such as ReLU (or linear) shallow networks, with or without convolution, have easily expressed conservation laws, and no more than the known ones. In the case of a single attention layer, we also completely describe all conservation laws, and we show that residual blocks have the same conservation laws as the same block without a skip connection. We then introduce the notion of conservation laws that depend only on a subset of parameters (corresponding e.g. to a pair of consecutive layers, to a residual block, or to an attention layer). We demonstrate that the characterization of such laws can be reduced to the analysis of the corresponding building block in isolation. Finally, we examine how these newly discovered conservation principles, initially established in the continuous gradient flow regime, persist under discrete optimization dynamics, particularly in the context of Stochastic Gradient Descent (SGD).
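
To make the notion of a conservation law concrete, here is a short worked derivation for the simplest case the abstract mentions, a shallow linear network. This is the standard textbook example, not a result taken from the paper; the symbols $W_1$, $W_2$, $G$ and the loss $L$ are notation assumed for this sketch.

```latex
% Assumed setting: two-layer linear network f(x) = W_2 W_1 x trained by
% gradient flow on a loss L; G is the gradient of L with respect to W_2 W_1.
\begin{align*}
\dot W_1 &= -W_2^\top G, \qquad \dot W_2 = -G W_1^\top, \\
\frac{d}{dt}\bigl(W_1 W_1^\top\bigr) &= -\bigl(W_2^\top G W_1^\top + W_1 G^\top W_2\bigr), \\
\frac{d}{dt}\bigl(W_2^\top W_2\bigr) &= -\bigl(W_1 G^\top W_2 + W_2^\top G W_1^\top\bigr), \\
\frac{d}{dt}\bigl(W_1 W_1^\top - W_2^\top W_2\bigr) &= 0.
\end{align*}
```

The quantity $W_1 W_1^\top - W_2^\top W_2$ therefore stays constant along the gradient flow, whatever the data and loss. It involves only one pair of consecutive layers, which is the kind of "subset of parameters" dependence the abstract refers to.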


Article 1545

Title@2025-06-06 (5): ICU-TSB: A Benchmark for Temporal Patient Representation Learning for Unsupervised Stratification into Patient Cohorts

Title: ICU-TSB: A Benchmark for Temporal Patient Representation Learning for Unsupervised Stratification into Patient Cohorts ICU-TSB: Benchmark für zeitliche Patientenrepräsentanz-Lernen zur unüberwachten Schichtung in Patientenkohorten ICU-TSB: 用于在不受监督的情况下将患者分划入患者群体的时间病人代表制学习基准 2506.06192v1

Authors (5): Dimitrios Proios, Alban Bornet, Anthony Yazdani, Jose F Rodrigues Jr, Douglas Teodoro

Patient stratification, which identifies clinically meaningful subgroups, is essential for advancing personalized medicine through improved diagnostics and treatment strategies. Electronic health records (EHRs), particularly those from intensive care units (ICUs), contain rich temporal clinical data that can be leveraged for this purpose. In this work, we introduce ICU-TSB (Temporal Stratification Benchmark), the first comprehensive benchmark for evaluating patient stratification based on temporal patient representation learning using three publicly available ICU EHR datasets. A key contribution of our benchmark is a novel hierarchical evaluation framework utilizing disease taxonomies to measure the alignment of discovered clusters with clinically validated disease groupings. In our experiments with ICU-TSB, we compared statistical methods and several recurrent neural networks, including LSTM and GRU, for their ability to generate effective patient representations for subsequent clustering of patient trajectories. Our results demonstrate that temporal representation learning can rediscover clinically meaningful patient cohorts; nevertheless, it remains a challenging task, with the V-measure reaching at most 0.46 at the top level of the taxonomy and at most 0.40 at the lowest level. To further enhance the practical utility of our findings, we also evaluate multiple strategies for assigning interpretable labels to the identified clusters. The experiments and benchmark are fully reproducible and available at https://github.com/ds4dh/CBMS2025stratification.
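
For readers unfamiliar with the metric reported above, the following is a minimal sketch of the V-measure: the harmonic mean of homogeneity (each cluster contains a single class) and completeness (each class falls into a single cluster). It is a from-scratch illustration using only the standard library, not the benchmark's own evaluation code; the function name `v_measure` and the toy labels are assumptions made for this example.

```python
import math
from collections import Counter


def _entropy(counts, n):
    """Shannon entropy (in nats) of a partition given its group sizes."""
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def v_measure(labels_true, labels_pred):
    """Harmonic mean of homogeneity and completeness of a clustering."""
    n = len(labels_true)
    classes = Counter(labels_true)
    clusters = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))

    h_class = _entropy(classes.values(), n)
    h_cluster = _entropy(clusters.values(), n)
    # Conditional entropies H(class | cluster) and H(cluster | class).
    h_class_given_cluster = -sum(
        c / n * math.log(c / clusters[k]) for (_, k), c in joint.items()
    )
    h_cluster_given_class = -sum(
        c / n * math.log(c / classes[t]) for (t, _), c in joint.items()
    )

    homogeneity = 1.0 if h_class == 0 else 1 - h_class_given_cluster / h_class
    completeness = 1.0 if h_cluster == 0 else 1 - h_cluster_given_class / h_cluster
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Toy example: three discovered clusters scored against two disease groups.
score = v_measure([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
```

The score is invariant to how cluster identifiers are numbered, which is why it suits unsupervised stratification, where discovered clusters carry no inherent labels.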


Article 1546

Title@2025-06-06 (5): Physics-Informed Neural Networks for Control of Single-Phase Flow Systems Governed by Partial Differential Equations

Title: Physics-Informed Neural Networks for Control of Single-Phase Flow Systems Governed by Partial Differential Equations Physik-informierte Neuronale Netze zur Steuerung von einphasigen Flusssystemen, die durch partielle Differentialgleichungen gesteuert werden 用于控制按部分差异等量划分的单阶段流动系统的物理内建神经网络 2506.06188v1

Authors (4): Luis Kin Miyatake, Eduardo Camponogara, Eric Aislan Antonelo, Alexey Pavlov

The modeling and control of single-phase flow systems governed by Partial Differential Equations (PDEs) present challenges, especially under transient conditions. In this work, we extend the Physics-Informed Neural Nets for Control (PINC) framework, originally proposed for the modeling and control of systems governed by Ordinary Differential Equations (ODEs) without the need for any labeled data, to the PDE case, particularly to single-phase incompressible and compressible flows, integrating neural networks with physical conservation laws. The PINC model for PDEs is structured into two stages: a steady-state network, which learns equilibrium solutions for a wide range of control inputs, and a transient network, which captures dynamic responses under time-varying boundary conditions. We propose a simplifying assumption that reduces the dimensionality of the spatial coordinate regarding the initial condition, allowing the efficient training of the PINC network. This simplification enables the derivation of optimal control policies using Model Predictive Control (MPC). We validate our approach through numerical experiments, demonstrating that the PINC model, which is trained exclusively using physical laws, i.e., without labeled data, accurately represents flow dynamics and enables real-time control applications. The results highlight the PINC’s capability to efficiently approximate PDE solutions without requiring iterative solvers, making it a promising alternative for fluid flow monitoring and optimization in engineering applications.


Article 1547

Title@2025-06-06 (5): Antithetic Noise in Diffusion Models

Title: Antithetic Noise in Diffusion Models Antithetischer Lärm bei Diffusionsmodellen 扩散模型中的抗异静能 2506.06185v1

Authors (6): Jing Jia, Sifan Liu, Bowen Song, Wei Yuan, Liyue Shen, Guanyang Wang

We initiate a systematic study of antithetic initial noise in diffusion models. Across unconditional models trained on diverse datasets, text-conditioned latent-diffusion models, and diffusion-posterior samplers, we find that pairing each initial noise with its negation consistently yields strongly negatively correlated samples. To explain this phenomenon, we combine experiments and theoretical analysis, leading to a symmetry conjecture that the learned score function is approximately affine antisymmetric (odd symmetry up to a constant shift), and provide evidence supporting it. Leveraging this negative correlation, we enable two applications: (1) enhancing image diversity in models like Stable Diffusion without quality loss, and (2) sharpening uncertainty quantification (e.g., up to 90% narrower confidence intervals) when estimating downstream statistics. Building on these gains, we extend the two-point pairing to a randomized quasi-Monte Carlo estimator, which further improves estimation accuracy. Our framework is training-free, model-agnostic, and adds no runtime overhead.
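
The variance-reduction idea behind pairing each noise draw with its negation can be sketched on a toy scalar problem. This is the classical antithetic-variates estimator, not the paper's diffusion pipeline: the function `f` below is a hypothetical stand-in for "run the sampler from initial noise `z` and compute a downstream statistic", and all names are assumptions made for this illustration. `f` is chosen to be an odd function plus a small even term, loosely mirroring the approximately affine-antisymmetric behaviour conjectured in the abstract.

```python
import math
import random


def f(z):
    """Hypothetical statistic of a sample generated from noise z."""
    return math.tanh(z) + 0.1 * z * z


def antithetic_estimate(func, n_pairs, seed=0):
    """Average func over pairs (z, -z) built from n_pairs Gaussian draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        total += 0.5 * (func(z) + func(-z))
    return total / n_pairs


def independent_estimate(func, n_pairs, seed=0):
    """Average func over 2 * n_pairs independent Gaussian draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(2 * n_pairs):
        total += func(rng.gauss(0.0, 1.0))
    return total / (2 * n_pairs)


def spread(estimator, func, n_pairs, trials=200):
    """Standard deviation of an estimator across independent repetitions."""
    values = [estimator(func, n_pairs, seed=s) for s in range(trials)]
    mean = sum(values) / trials
    return math.sqrt(sum((v - mean) ** 2 for v in values) / trials)


# Same budget of function evaluations for both estimators.
spread_antithetic = spread(antithetic_estimate, f, n_pairs=50)
spread_independent = spread(independent_estimate, f, n_pairs=50)
```

Because `f(z)` and `f(-z)` are strongly negatively correlated, the odd part cancels within each pair and the antithetic estimator fluctuates far less than the independent one at the same number of evaluations. A smaller spread translates directly into narrower confidence intervals, which is the mechanism the abstract exploits for uncertainty quantification.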


Article 1548

Title@2025-06-06 (5): SemiOccam: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels

Title: SemiOccam: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels SemiOccam: Ein robustes semi-überwachtes Bilderkennungsnetzwerk mit Sparse-Labels 半 Occam: 使用粗略标签粗略标签的强力半半超图像识别网络 2506.03582v2

Authors (2): Rui Yann, Xianglei Xing

We present SemiOccam, an image recognition network that leverages semi-supervised learning in a highly efficient manner. Existing works often rely on complex training techniques and architectures, requiring hundreds of GPU hours for training, while their generalization ability when dealing with extremely limited labeled data remains to be improved. To address these limitations, we construct a hierarchical mixture density classification decision mechanism by optimizing mutual information between feature representations and target classes, compressing redundant information while retaining crucial discriminative components. Experimental results demonstrate that our method achieves state-of-the-art performance on various datasets when using a negligible number of labeled samples, and its simple architecture keeps training time at the minute level. Notably, this paper reveals a long-overlooked data leakage issue in the STL-10 dataset for semi-supervised learning tasks and removes duplicates to ensure the reliability of experimental results. We also release the deduplicated CleanSTL-10 dataset to facilitate fair and reliable research in future semi-supervised learning. Code available at https://github.com/Shu1L0n9/SemiOccam.


Article 1549

Title@2025-06-06 (5): A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization

Title: A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization Eine theoretische Studie der (Hyper-)Selbstachtung durch das Lens of Interactions: Repräsentation, Ausbildung, Verallgemeinerung 通过互动的镜头进行(Hyper)自我关注的理论研究:代表性、培训、普遍化 2506.06179v1

Authors (2): Muhammed Ustaomeroglu, Guannan Qu

Self-attention has emerged as a core component of modern neural architectures, yet its theoretical underpinnings remain elusive. In this paper, we study self-attention through the lens of interacting entities, ranging from agents in multi-agent reinforcement learning to alleles in genetic sequences, and show that a single-layer linear self-attention can efficiently represent, learn, and generalize functions capturing pairwise interactions, including in out-of-distribution scenarios. Our analysis reveals that self-attention acts as a mutual interaction learner under minimal assumptions on the diversity of interaction patterns observed during training, thereby encompassing a wide variety of real-world domains. In addition, we validate our theoretical insights through experiments demonstrating that self-attention learns interaction functions and generalizes across both population distributions and out-of-distribution scenarios. Building on our theories, we introduce HyperFeatureAttention, a novel neural network module designed to learn couplings of different feature-level interactions between entities. Furthermore, we propose HyperAttention, a new module that extends beyond pairwise interactions to capture multi-entity dependencies, such as three-way, four-way, or general n-way interactions.


Article 1550

Title@2025-06-06 (5): CoxNTF: A New Approach for Joint Clustering and Prediction in Survival Analysis

Title: CoxNTF: A New Approach for Joint Clustering and Prediction in Survival Analysis CoxNTF: Ein neuer Ansatz für Joint Clustering und Vorhersagen in der Überlebensanalyse CoxNTF:在生存分析中联合集群和预测的新方法 2506.06411v1

Authors (3): Paul Fogel, Christophe Geissler, George Luta

The interpretation of the results of survival analysis often benefits from latent factor representations of baseline covariates. However, existing methods, such as Nonnegative Matrix Factorization (NMF), do not incorporate survival information, limiting their predictive power. We present CoxNTF, a novel approach that uses non-negative tensor factorization (NTF) to derive meaningful latent representations that are closely associated with survival outcomes. CoxNTF constructs a weighted covariate tensor in which survival probabilities derived from the Coxnet model are used to guide the tensorization process. Our results show that CoxNTF achieves survival prediction performance comparable to using Coxnet with the original covariates, while providing a structured and interpretable clustering framework. In addition, the new approach effectively handles feature redundancy, making it a powerful tool for joint clustering and prediction in survival analysis.


Article 1551

Title@2025-06-06 (5): Reusing Trajectories in Policy Gradients Enables Fast Convergence

Title: Reusing Trajectories in Policy Gradients Enables Fast Convergence Wiederverwendung von Trajektorien in politischen Gradienten ermöglicht schnelle Konvergenz 在政策梯度中重新使用轨迹以启用快速聚合 2506.06178v1

Authors (5): Alessandro Montenegro, Federico Mansutti, Marco Mussi, Matteo Papini, Alberto Maria Metelli

Policy gradient (PG) methods are a class of effective reinforcement learning algorithms, particularly when dealing with continuous control problems. These methods learn the parameters of parametric policies via stochastic gradient ascent, typically using on-policy trajectory data to estimate the policy gradient. However, such reliance on fresh data makes them sample-inefficient. Indeed, vanilla PG methods require $O(\epsilon^{-2})$ trajectories to reach an $\epsilon$-approximate stationary point. A common strategy to improve efficiency is to reuse off-policy information from past iterations, such as previous gradients or trajectories. While gradient reuse has received substantial theoretical attention, leading to improved rates of $O(\epsilon^{-3/2})$, the reuse of past trajectories remains largely unexplored from a theoretical perspective. In this work, we provide the first rigorous theoretical evidence that extensive reuse of past off-policy trajectories can significantly accelerate convergence in PG methods. We introduce a power mean correction to the multiple importance weighting estimator and propose RPG (Retrospective Policy Gradient), a PG algorithm that combines old and new trajectories for policy updates. Through a novel analysis, we show that, under established assumptions, RPG achieves a sample complexity of $\widetilde{O}(\epsilon^{-1})$, the best known rate in the literature. We further validate our approach empirically against PG methods with state-of-the-art rates.


Article 1552

Title@2025-06-06 (5): Improving choice model specification using reinforcement learning

Title: Improving choice model specification using reinforcement learning Verbesserung der Wahlmodellspezifikation mittels Verstärkungslernen 利用强化学习改进选择模式规格 2506.06410v1

Authors (3): Gabriel Nova, Sander van Cranenburgh, Stephane Hess

Discrete choice modelling is a theory-driven modelling framework for understanding and forecasting choice behaviour. To obtain behavioural insights, modellers test several competing model specifications in their attempts to discover the ‘true’ data generation process. This trial-and-error process requires expertise, is time-consuming, and relies on subjective theoretical assumptions. Although metaheuristics have been proposed to assist choice modellers, they treat model specification as a classic optimisation problem, relying on static strategies, applying predefined rules, and neglecting outcomes from previously estimated models. As a result, current metaheuristics struggle to prioritise promising search regions, adapt exploration dynamically, and transfer knowledge to other modelling tasks. To address these limitations, we introduce a deep reinforcement learning-based framework where an ‘agent’ specifies models by estimating them and receiving rewards based on goodness-of-fit and parsimony. Results demonstrate that the agent dynamically adapts its strategies to identify promising specifications across data generation processes, showing robustness and potential transferability, without prior domain knowledge.


Article 1553

Title@2025-06-06 (5): Emergent Response Planning in LLMs

Title: Emergent Response Planning in LLMs Emergent Response Planning in LLMs 新的LLMM 应急规划 2502.06258v2

Authors (5): Zhichen Dong, Zhanhui Zhou, Zhixuan Liu, Chao Yang, Chaochao Lu

In this work, we argue that large language models (LLMs), though trained to predict only the next token, exhibit emergent planning behaviors: their hidden representations encode future outputs beyond the next token. Through simple probing, we demonstrate that LLM prompt representations encode global attributes of their entire responses, including structure attributes (e.g., response length, reasoning steps), content attributes (e.g., character choices in storywriting, multiple-choice answers at the end of response), and behavior attributes (e.g., answer confidence, factual consistency). In addition to identifying response planning, we explore how it scales with model size across tasks and how it evolves during generation. The findings that LLMs plan ahead for the future in their hidden representations suggest potential applications for improving transparency and generation control.


Article 1554

Title@2025-06-06 (5): A Riemannian Optimization Perspective of the Gauss-Newton Method for Feedforward Neural Networks

Title: A Riemannian Optimization Perspective of the Gauss-Newton Method for Feedforward Neural Networks Riemannische Optimierungsperspektive der Gauß-Newton-Methode für neurale Netze Fefforward神经网络的高斯-牛顿方法的优化观点 2412.14031v4

Authors (1): Semih Cayci

We analyze the convergence of Gauss-Newton dynamics for training neural networks with smooth activation functions. In the underparameterized regime, the Gauss-Newton gradient flow induces a Riemannian gradient flow on a low-dimensional, smooth, embedded submanifold of the Euclidean output space. Using tools from Riemannian optimization, we prove last-iterate convergence of the Riemannian gradient flow to the optimal in-class predictor at an exponential rate that is independent of the conditioning of the Gram matrix, without requiring explicit regularization. We further characterize the critical impacts of the neural network scaling factor and the initialization on the convergence behavior. In the overparameterized regime, we show that the Levenberg-Marquardt dynamics with an appropriately chosen damping schedule yield a fast convergence rate despite potentially ill-conditioned neural tangent kernel matrices, analogous to the underparameterized regime. These findings demonstrate the potential of Gauss-Newton methods for efficiently optimizing neural networks in the near-initialization regime, particularly in ill-conditioned problems where kernel and Gram matrices have small singular values.


Article 1555

Title@2025-06-06 (5): The Lock-in Hypothesis: Stagnation by Algorithm

Title: The Lock-in Hypothesis: Stagnation by Algorithm Die Lock-in Hypothese: Stagnation durch Algorithmen Lock-in 假说: 阿尔哥里特姆的停滞 2506.06166v1

Authors (4): Tianyi Alex Qiu, Zhonghao He, Tejasveer Chugh, Max Kleiman-Weiner

The training and deployment of large language models (LLMs) create a feedback loop with human users: models learn human beliefs from data, reinforce these beliefs with generated content, reabsorb the reinforced beliefs, and feed them back to users again and again. This dynamic resembles an echo chamber. We hypothesize that this feedback loop entrenches the existing values and beliefs of users, leading to a loss of diversity and potentially the lock-in of false beliefs. We formalize this hypothesis and test it empirically with agent-based LLM simulations and real-world GPT usage data. Analysis reveals sudden but sustained drops in diversity after the release of new GPT iterations, consistent with the hypothesized human-AI feedback loop. Code and data available at https://thelockinhypothesis.com


Article 1556

Title@2025-06-06 (5): Automated Journalistic Questions: A New Method for Extracting 5W1H in French

Title: Automated Journalistic Questions: A New Method for Extracting 5W1H in French Automatisierte journalistische Fragen: Eine neue Methode zur Extraktion von 5W1H in Französisch 自动新闻问题:以法文提取5W1H的新方法 2505.14804v2

Authors (3): Maxence Verhaverbeke, Julie A. Gramaccia, Richard Khoury

The 5W1H questions – who, what, when, where, why and how – are commonly used in journalism to ensure that an article describes events clearly and systematically. Answering them is a crucial prerequisite for tasks such as summarization, clustering, and news aggregation. In this paper, we design the first automated extraction pipeline to get 5W1H information from French news articles. To evaluate the performance of our algorithm, we also create a corpus of 250 Quebec news articles with 5W1H answers marked by four human annotators. Our results demonstrate that our pipeline performs as well on this task as the large language model GPT-4o.


Article 1557

Title@2025-06-06 (5): Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning

Title: Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning Nicht alle Rollouts sind nützlich: Down-Sampling Rollouts im LLM Verstärkungslernen 并非所有的推出都有用:在LLM强化学习中进行下选推出 2504.13818v2

Authors (4): Yixuan Even Xu, Yash Savani, Fei Fang, Zico Kolter

Reinforcement learning with verifiable rewards (RLVR) has emerged as a powerful paradigm for enhancing reasoning capabilities in large language models. However, it is constrained by a fundamental asymmetry in computation and memory requirements: rollout generation is embarrassingly parallel and memory-light, whereas policy updates are communication-heavy and memory-intensive. To address this, we introduce PODS (Policy Optimization with Down-Sampling). PODS produces numerous rollouts in parallel, then trains on only an informative subset, preserving learning signals while slashing update cost. We instantiate PODS with max-variance down-sampling, a principled criterion that maximises reward diversity, and show that it admits an $O(n\log n)$ solution. Empirically, coupling PODS with Group Relative Policy Optimization (GRPO) achieves superior performance over standard GRPO across different reasoning benchmarks and hardware environments.
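
The max-variance down-sampling step can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: it assumes the selection problem is "choose `m` of `n` scalar rewards so that their variance is largest", and it relies on the observation that some optimal subset consists of the lowest few and the highest few rewards (variance is convex in each selected value, so swapping an interior pick for an unselected extreme never hurts). The function name `max_variance_subset` is hypothetical.

```python
from itertools import accumulate


def max_variance_subset(rewards, m):
    """Return indices of m rollouts whose rewards have maximal variance.

    Sorts once (O(n log n)), then scans the m + 1 ways of taking the
    (m - k) lowest and k highest rewards, scoring each in O(1) with prefix sums.
    """
    n = len(rewards)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= len(rewards)")

    order = sorted(range(n), key=lambda i: rewards[i])  # ascending by reward
    values = [rewards[i] for i in order]
    prefix = [0.0] + list(accumulate(values))
    prefix_sq = [0.0] + list(accumulate(v * v for v in values))

    best_k, best_var = 0, float("-inf")
    for k in range(m + 1):
        low = m - k  # number of rewards taken from the low end
        total = prefix[low] + (prefix[n] - prefix[n - k])
        total_sq = prefix_sq[low] + (prefix_sq[n] - prefix_sq[n - k])
        variance = total_sq / m - (total / m) ** 2
        if variance > best_var:
            best_k, best_var = k, variance

    low = m - best_k
    return [order[i] for i in range(low)] + [order[i] for i in range(n - best_k, n)]


# Example: keep 4 of 8 rollouts for the policy update.
kept = max_variance_subset([0.0, 1.0, 0.2, 0.9, 0.5, 0.5, 0.1, 1.0], m=4)
```

Only the kept rollouts would then enter the memory-intensive policy update, while all `n` rollouts are still generated in the cheap parallel phase.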


Article 1558

Title@2025-06-06 (5): ENMA: Tokenwise Autoregression for Generative Neural PDE Operators

Title: ENMA: Tokenwise Autoregression for Generative Neural PDE Operators ENMA: Tokenwise Autoregression für generative neurale PDE-Betreiber ENMA: 生成神经器 PDE 运算符的取向自动递减 2506.06158v1

Authors (4): Armand Kassaï Koupaï, Lise Le Boudec, Louis Serrano, Patrick Gallinari

Solving time-dependent parametric partial differential equations (PDEs) remains a fundamental challenge for neural solvers, particularly when generalizing across a wide range of physical parameters and dynamics. When data is uncertain or incomplete, as is often the case, a natural approach is to turn to generative models. We introduce ENMA, a generative neural operator designed to model spatio-temporal dynamics arising from physical phenomena. ENMA predicts future dynamics in a compressed latent space using a generative masked autoregressive transformer trained with a flow matching loss, enabling tokenwise generation. Irregularly sampled spatial observations are encoded into uniform latent representations via attention mechanisms and further compressed through a spatio-temporal convolutional encoder. This allows ENMA to perform in-context learning at inference time by conditioning on either past states of the target trajectory or auxiliary context trajectories with similar dynamics. The result is a robust and adaptable framework that generalizes to new PDE regimes and supports one-shot surrogate modeling of time-dependent parametric PDEs.


Article 1559

Title@2025-06-06 (5): Deconfounding Multi-Cause Latent Confounders: A Factor-Model Approach to Climate Model Bias Correction

Title: Deconfounding Multi-Cause Latent Confounders: A Factor-Model Approach to Climate Model Bias Correction Deconfounding Multi-Cause Latent Confounders: Ein Faktor-Modell Ansatz zur Klimamodell Bias Korrektur 多原因、中位本部混乱创始人:对气候模型比重纠正的因数模式办法 2408.12063v2

Authors (10): Wentao Gao, Jiuyong Li, Debo Cheng, Lin Liu, Jixue Liu, Thuc Duy Le, Xiaojing Du, Xiongren Chen, Yanchang Zhao, Yun Chen

Global Climate Models (GCMs) are crucial for predicting future climate changes by simulating the Earth system. However, GCM outputs exhibit systematic biases due to model uncertainties, parameterization simplifications, and inadequate representation of complex climate phenomena. Traditional bias correction methods, which rely on historical observation data and statistical techniques, often neglect unobserved confounders, leading to biased results. This paper proposes a novel bias correction approach that uses both GCM and observational data to learn a factor model that captures multi-cause latent confounders. Inspired by recent advances in causality-based time series deconfounding, our method first constructs a factor model to learn latent confounders from historical data and then applies them to enhance the bias correction process using advanced time series forecasting models. The experimental results demonstrate significant improvements in the accuracy of precipitation outputs. By addressing unobserved confounders, our approach offers a robust and theoretically grounded solution for climate model bias correction.


Article 1560

Title@2025-06-06 (5): Graph Neural Network Generalization with Gaussian Mixture Model Based Augmentation

Title: Graph Neural Network Generalization with Gaussian Mixture Model Based Augmentation Graphische Neural-Netzwerk-Verallgemeinerung mit Gauß-Mischungsmodell-basierter Augmentation Gaussian 混合模型基底放大模型的神经网络通用 2411.08638v3

Authors (5): Yassine Abbahaddou, Fragkiskos D. Malliaros, Johannes F. Lutzeyer, Amine Mohamed Aboussalah, Michalis Vazirgiannis

Graph Neural Networks (GNNs) have shown great promise in tasks like node and graph classification, but they often struggle to generalize, particularly to unseen or out-of-distribution (OOD) data. These challenges are exacerbated when training data is limited in size or diversity. To address these issues, we introduce a theoretical framework using Rademacher complexity to compute a regret bound on the generalization error and then characterize the effect of data augmentation. This framework informs the design of GRATIN, an efficient graph data augmentation algorithm leveraging the capability of Gaussian Mixture Models (GMMs) to approximate any distribution. Our approach not only outperforms existing augmentation techniques in terms of generalization but also offers improved time complexity, making it highly suitable for real-world applications.


Article 1561

Title@2025-06-06 (5): ELEVATE-GenAI: Reporting Guidelines for the Use of Large Language Models in Health Economics and Outcomes Research: an ISPOR Working Group on Generative AI Report

Title: ELEVATE-GenAI: Reporting Guidelines for the Use of Large Language Models in Health Economics and Outcomes Research: an ISPOR Working Group on Generative AI Report ELEVATE-GenAI: Berichterstattungsrichtlinien für den Einsatz von großen Sprachmodellen in der Gesundheitsökonomie und der Ergebnisforschung: eine ISPOR-Arbeitsgruppe für generative KI-Berichte ELEVATE-GenAI:在卫生经济学和成果研究中使用大语言模式的报告准则:科学、科学、科学、科学、技术研究所创举报告工作组 2501.12394v2

Authors (8): Rachael L. Fleurence, Dalia Dawoud, Jiang Bian, Mitchell K. Higashi, Xiaoyan Wang, Hua Xu, Jagpreet Chhatwal, Turgay Ayer

Introduction: Generative artificial intelligence (AI), particularly large language models (LLMs), holds significant promise for Health Economics and Outcomes Research (HEOR). However, standardized reporting guidance for LLM-assisted research is lacking. This article introduces the ELEVATE GenAI framework and checklist: reporting guidelines specifically designed for HEOR studies involving LLMs. Methods: The framework was developed through a targeted literature review of existing reporting guidelines, AI evaluation frameworks, and expert input from the ISPOR Working Group on Generative AI. It comprises ten domains, including model characteristics, accuracy, reproducibility, and fairness and bias. The accompanying checklist translates the framework into actionable reporting items. To illustrate its use, the framework was applied to two published HEOR studies: one focused on systematic literature review tasks and the other on economic modeling. Results: The ELEVATE GenAI framework offers a comprehensive structure for reporting LLM-assisted HEOR research, while the checklist facilitates practical implementation. Its application to the two case studies demonstrates its relevance and usability across different HEOR contexts. Limitations: Although the framework provides robust reporting guidance, further empirical testing is needed to assess its validity, completeness, and usability, as well as its generalizability across diverse HEOR use cases. Conclusion: The ELEVATE GenAI framework and checklist address a critical gap by offering structured guidance for transparent, accurate, and reproducible reporting of LLM-assisted HEOR research. Future work will focus on extensive testing and validation to support broader adoption and refinement.


Article 1562

Title@2025-06-06 (5): Laplace Transform Based Low-Complexity Learning of Continuous Markov Semigroups

Title: Laplace Transform Based Low-Complexity Learning of Continuous Markov Semigroups Laplace Transform-basierte Low-Complexity-Lernen von kontinuierlichen Markov-Halbgruppen 连续马尔科夫半群体低复杂程度低学习 2410.14477v2

Authors (6): Vladimir R. Kostic, Karim Lounici, Hélène Halconruy, Timothée Devergne, Pietro Novelli, Massimiliano Pontil

Markov processes serve as a universal model for many real-world random processes. This paper presents a data-driven approach for learning these models through the spectral decomposition of the infinitesimal generator (IG) of the Markov semigroup. The unbounded nature of IGs complicates traditional methods such as vector-valued regression and Hilbert-Schmidt operator analysis. Existing techniques, including physics-informed kernel regression, are computationally expensive and limited in scope, with no recovery guarantees for transfer operator methods when the time-lag is small. We propose a novel method that leverages the IG’s resolvent, characterized by the Laplace transform of transfer operators. This approach is robust to time-lag variations, ensuring accurate eigenvalue learning even for small time-lags. Our statistical analysis applies to a broader class of Markov processes than current methods while reducing computational complexity from quadratic to linear in the state dimension. Finally, we illustrate the behaviour of our method in two experiments.
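
The link between the resolvent and the transfer operators that the abstract relies on is the standard Laplace-transform identity for semigroups. As a sketch, write $L$ for the infinitesimal generator, $A_t = e^{tL}$ for the transfer operator at time-lag $t$, and $\mu > 0$ for a shift parameter (this notation is assumed here and is not taken from the paper):

$$(\mu I - L)^{-1} f = \int_0^\infty e^{-\mu t} A_t f \, dt, \qquad L f = \lambda f \;\Longrightarrow\; (\mu I - L)^{-1} f = \frac{1}{\mu - \lambda}\, f.$$

Under this identity, a resolvent eigenvalue $\nu$ corresponds to the generator eigenvalue $\lambda = \mu - 1/\nu$, and the resolvent is a bounded operator even though $L$ is not.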


Article 1563

Title@2025-06-06 (5): LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws

Title: LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws LLMs on the Line: Daten bestimmen die Loss-to-Loss-Skalierungsgesetze 线上LLM:数据确定损失到损失的缩放法 2502.12120v2

Authors (5): Prasanna Mayilvahanan, Thaddäus Wiedemer, Sayak Mallick, Matthias Bethge, Wieland Brendel

Scaling laws guide the development of large language models (LLMs) by offering estimates for the optimal balance of model size, tokens, and compute. More recently, loss-to-loss scaling laws that relate losses across pretraining datasets and downstream tasks have emerged as a powerful tool for understanding and improving LLM performance. In this work, we investigate which factors most strongly influence loss-to-loss scaling. Our experiments reveal that the pretraining data and tokenizer determine the scaling trend. In contrast, model size, optimization hyperparameters, and even significant architectural differences, such as between transformer-based models like Llama and state-space models like Mamba, have limited impact. Consequently, practitioners should carefully curate suitable pretraining datasets for optimal downstream performance, while architectures and other settings can be freely optimized for training efficiency.
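
To make the idea of a loss-to-loss relationship concrete, the sketch below fits a plain power law between pretraining loss and downstream loss in log-log space. The functional form, the function name `fit_loss_to_loss`, and the synthetic checkpoint values are all assumptions for illustration; the paper may use a different parameterisation (for example one with irreducible-loss offsets).

```python
import numpy as np

def fit_loss_to_loss(train_loss, downstream_loss):
    """Fit downstream ~= K * train**kappa by least squares in log-log space.

    Returns (K, kappa). This is an illustrative form, not the paper's exact law.
    """
    slope, intercept = np.polyfit(np.log(train_loss), np.log(downstream_loss), deg=1)
    return float(np.exp(intercept)), float(slope)

# Synthetic, noise-free checkpoints (hypothetical values).
train = np.array([3.2, 2.9, 2.6, 2.4, 2.2])
down = 1.5 * train ** 0.8

K, kappa = fit_loss_to_loss(train, down)
print(f"K={K:.3f}, kappa={kappa:.3f}")
```

Comparing the fitted `(K, kappa)` across model families trained on the same data, and across datasets for the same family, is one simple way to see which factor moves the curve.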


Article 1564

Title@2025-06-06 (5): carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks

Title: carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks carps: Ein Rahmen für den Vergleich von N Hyperparameter-Optimierern auf M Benchmarks 在M基准上比较超超参数最佳放大器的框架 2506.06143v1

Authors (16): Carolin Benjamins, Helena Graf, Sarah Segel, Difan Deng, Tim Ruhkopf, Leona Hennig, Soham Basu, Neeratyoy Mallik, Edward Bergman, Deyao Chen, François Clément, Matthias Feurer, Katharina Eggensperger, Frank Hutter, Carola Doerr, Marius Lindauer

Hyperparameter Optimization (HPO) is crucial for developing well-performing machine learning models. To ease prototyping and benchmarking of HPO methods, we propose carps, a benchmark framework for Comprehensive Automated Research Performance Studies that allows evaluating N optimizers on M benchmark tasks. In this first release of carps, we focus on the four most important HPO task types: blackbox, multi-fidelity, multi-objective and multi-fidelity-multi-objective. With 3,336 tasks from 5 community benchmark collections and 28 variants of 9 optimizer families, we offer the biggest go-to library to date to evaluate and compare HPO methods. The carps framework relies on a purpose-built, lightweight interface, gluing together optimizers and benchmark tasks. It also features an analysis pipeline, facilitating the evaluation of optimizers on benchmarks. However, navigating a huge number of tasks while developing and comparing methods can be computationally infeasible. To address this, we obtain a subset of representative tasks by minimizing the star discrepancy of the subset, in the space spanned by the full set. As a result, we propose an initial subset of 10 to 30 diverse tasks for each task type, and include functionality to re-compute subsets as more benchmarks become available, enabling efficient evaluations. We also establish a first set of baseline results on these tasks as a measure for future comparisons. With carps (https://www.github.com/automl/CARP-S), we make an important step in the standardization of HPO evaluation.


Article 1565

Title@2025-06-06 (5): Efficient Diffusion Models: A Survey

Title: Efficient Diffusion Models: A Survey Effiziente Diffusionsmodelle: Eine Umfrage 高效传播模型:调查 2502.06805v3

Authors (14): Hui Shen, Jingxuan Zhang, Boning Xiong, Rui Hu, Shoufa Chen, Zhongwei Wan, Xin Wang, Yu Zhang, Zixuan Gong, Guangyin Bao, Chaofan Tao, Yongfeng Huang, Ye Yuan, Mi Zhang

Diffusion models have emerged as powerful generative models capable of producing high-quality content such as images, videos, and audio, demonstrating their potential to revolutionize digital content creation. However, these capabilities come at the cost of significant computational resources and lengthy generation time, underscoring the critical need to develop efficient techniques for practical deployment. In this survey, we provide a systematic and comprehensive review of research on efficient diffusion models. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient diffusion model topics from the algorithm-level, system-level, and framework perspectives, respectively. We have also created a GitHub repository where we organize the papers featured in this survey at https://github.com/AIoT-MLSys-Lab/Efficient-Diffusion-Model-Survey. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of efficient diffusion model research and inspire them to contribute to this important and exciting field.


Article 1566

Title@2025-06-06 (5): Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models

Title: Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models Tabelle-r1: Selbstbeaufsichtigtes und verstärktes Lernen für programmbasierte Tabellenveranlagung in kleinen Sprachmodellen 表-r1:针对基于方案的小语言模式说明理由的表格进行自我监督和强化学习 2506.06137v1

Authors (9): Rihui Jin, Zheyu Xin, Xing Xie, Zuoyi Li, Guilin Qi, Yongrui Chen, Xinbang Dai, Tongtong Wu, Gholamreza Haffari

Table reasoning (TR) requires structured reasoning over semi-structured tabular data and remains challenging, particularly for small language models (SLMs, e.g., LLaMA-8B) due to their limited capacity compared to large LMs (LLMs, e.g., GPT-4o). To narrow this gap, we explore program-based TR (P-TR), which circumvents key limitations of text-based TR (T-TR), notably in numerical reasoning, by generating executable programs. However, applying P-TR to SLMs introduces two challenges: (i) vulnerability to heterogeneity in table layouts, and (ii) inconsistency in reasoning due to limited code generation capability. We propose Table-r1, a two-stage P-TR method designed for SLMs. Stage 1 introduces an innovative self-supervised learning task, Layout Transformation Inference, to improve tabular layout generalization from a programmatic view. Stage 2 adopts a mix-paradigm variant of Group Relative Policy Optimization, enhancing P-TR consistency while allowing dynamic fallback to T-TR when needed. Experiments on four TR benchmarks demonstrate that Table-r1 outperforms all SLM-based methods, achieving at least a 15% accuracy improvement over the base model (LLaMA-8B) across all datasets and reaching performance competitive with LLMs.


Article 1567

Title@2025-06-06 (5): Regret-Free Reinforcement Learning for LTL Specifications

Title: Regret-Free Reinforcement Learning for LTL Specifications Bedauern-freie Verstärkungs-Lernen für LTL-Spezifikationen 为远程学习具体要求而后悔地免费加强学习 2411.12019v2

Authors (3): Rupak Majumdar, Mahmoud Salamati, Sadegh Soudjani

Learning to control an unknown dynamical system with respect to high-level temporal specifications is an important problem in control theory. We present the first regret-free online algorithm for learning a controller for linear temporal logic (LTL) specifications for systems with unknown dynamics. We assume that the underlying (unknown) dynamics is modeled by a finite-state and action Markov decision process (MDP). Our core technical result is a regret-free learning algorithm for infinite-horizon reach-avoid problems on MDPs. For general LTL specifications, we show that the synthesis problem can be reduced to a reach-avoid problem once the graph structure is known. Additionally, we provide an algorithm for learning the graph structure, assuming knowledge of a minimum transition probability, which operates independently of the main regret-free algorithm. Our LTL controller synthesis algorithm provides sharp bounds on how close we are to achieving optimal behavior after a finite number of learning episodes. In contrast, previous algorithms for LTL synthesis only provide asymptotic guarantees, which give no insight into the transient performance during the learning phase.
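
For readers unfamiliar with reach-avoid problems, the sketch below computes the maximal reach-avoid probability on a small MDP by value iteration. It assumes the transition probabilities are known, which is exactly what the paper does not assume, so it only illustrates the target quantity and not the regret-free learning algorithm. The function name, array layout, and toy MDP are hypothetical.

```python
import numpy as np

def reach_avoid_values(P, goal, avoid, iters=1000, tol=1e-10):
    """Max probability of reaching `goal` while never visiting `avoid`.

    P has shape (A, S, S): P[a, s, t] is the probability of s -> t under action a.
    """
    n_states = P.shape[1]
    v = np.zeros(n_states)
    v[list(goal)] = 1.0
    for _ in range(iters):
        q = P @ v                   # shape (A, S): expected value of each action
        new_v = q.max(axis=0)       # act greedily in every state
        new_v[list(goal)] = 1.0     # goal states count as success
        new_v[list(avoid)] = 0.0    # avoid states count as failure
        if np.max(np.abs(new_v - v)) < tol:
            return new_v
        v = new_v
    return v

# Toy MDP: state 0 = start, 1 = intermediate, 2 = goal, 3 = avoid.
P = np.zeros((2, 4, 4))
P[0, 0, 2] = 0.5; P[0, 0, 3] = 0.5      # action 0 from start: risky shortcut
P[1, 0, 1] = 1.0                        # action 1 from start: detour via state 1
P[:, 1, 2] = 0.9; P[:, 1, 3] = 0.1      # from state 1, both actions behave the same
P[:, 2, 2] = 1.0; P[:, 3, 3] = 1.0      # goal and avoid are absorbing

values = reach_avoid_values(P, goal={2}, avoid={3})
print(values)
```

In this toy example the detour is the better choice from the start state, since it succeeds with probability 0.9 rather than 0.5.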


Article 1568

Title@2025-06-06 (5): VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters

Title: VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters VisionTS: Visuelle Maskierte Autoencoder sind kostenlose Zero-Shot-Zeitreihen-Forecaster 视野:视觉蒙面自动识别器是无热零热时间序列预测器 2408.17253v4

Authors (6): Mouxiang Chen, Lefei Shen, Zhuo Li, Xiaoyun Joy Wang, Jianling Sun, Chenghao Liu

Foundation models have emerged as a promising approach in time series forecasting (TSF). Existing approaches either repurpose large language models (LLMs) or build large-scale time series datasets to develop TSF foundation models for universal forecasting. However, these methods face challenges due to the severe cross-domain gap or in-domain heterogeneity. This paper explores a new road to building a TSF foundation model from rich, high-quality natural images. Our key insight is that a visual masked autoencoder, pre-trained on the ImageNet dataset, can naturally be a numeric series forecaster. By reformulating TSF as an image reconstruction task, we bridge the gap between image pre-training and TSF downstream tasks. Surprisingly, without further adaptation in the time series domain, the proposed VisionTS could achieve better zero-shot forecast performance than existing TSF foundation models. With fine-tuning for one epoch, VisionTS could further improve the forecasting and achieve state-of-the-art performance in most cases. Extensive experiments reveal intrinsic similarities between images and real-world time series, suggesting that visual models may offer a “free lunch” for TSF and highlight the potential for future cross-modality research. Our code is publicly available at https://github.com/Keytoyze/VisionTS.
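
One way to picture "reformulating TSF as an image reconstruction task" is to fold a univariate series into a 2D array by its period, so that a masked region of the array corresponds to future time steps. The sketch below shows only that folding step; the function name and the layout (rows as phase within a period, columns as successive periods) are assumptions, and the actual VisionTS pipeline (rendering, normalisation, masking, resizing) is not reproduced here.

```python
import numpy as np

def series_to_image(series, period):
    """Fold a 1D series into a (period, n_periods) array, dropping the oldest leftovers."""
    series = np.asarray(series, dtype=float)
    n_full = len(series) // period
    if n_full == 0:
        raise ValueError("series is shorter than one period")
    trimmed = series[len(series) - n_full * period:]
    # Each column holds one full period; each row holds one phase across periods.
    return trimmed.reshape(n_full, period).T

image = series_to_image(np.arange(12), period=4)
print(image.shape)
```

Appending empty columns to the right of such an array and asking a masked autoencoder to fill them in is then a forecasting problem phrased as inpainting.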


Article 1569

Title@2025-06-06 (5): Similarity Matching Networks: Hebbian Learning and Convergence Over Multiple Time Scales

Title: Similarity Matching Networks: Hebbian Learning and Convergence Over Multiple Time Scales Ähnlichkeit passender Netzwerke: Hebbisches Lernen und Konvergenz über mehrere Zeitskalan 相似性匹配网络: Hebbian 学习和多重时间尺度的趋同 2506.06134v1

Authors (3): Veronica Centorrino, Francesco Bullo, Giovanni Russo

A recent breakthrough in biologically-plausible normative frameworks for dimensionality reduction is based upon the similarity matching cost function and the low-rank matrix approximation problem. Despite clear biological interpretation, successful application in several domains, and experimental validation, a formal complete convergence analysis remains elusive. Building on this framework, we consider and analyze a continuous-time neural network, the similarity matching network, for principal subspace projection. Derived from a min-max-min objective, this biologically-plausible network consists of three coupled dynamics evolving at different time scales: neural dynamics, lateral synaptic dynamics, and feedforward synaptic dynamics at the fast, intermediate, and slow time scales, respectively. The feedforward and lateral synaptic dynamics consist of Hebbian and anti-Hebbian learning rules, respectively. By leveraging a multilevel optimization framework, we prove convergence of the dynamics in the offline setting. Specifically, at the first level (fast time scale), we show strong convexity of the cost function and global exponential convergence of the corresponding gradient-flow dynamics. At the second level (intermediate time scale), we prove strong concavity of the cost function and exponential convergence of the corresponding gradient-flow dynamics within the space of positive definite matrices. At the third and final level (slow time scale), we study a non-convex and non-smooth cost function, provide explicit expressions for its global minima, and prove almost sure convergence of the corresponding gradient-flow dynamics to the global minima. These results rely on two empirically motivated conjectures that are supported by thorough numerical experiments. Finally, we validate the effectiveness of our approach via a numerical example.


Article 1570

Title@2025-06-06 (5): Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions

Title: Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions Verbesserung der numerischen Stabilität des Normalisierten gegenseitigen Informations-Schätzers über hohe Abmessungen 提高高方面标准化相互信息模拟器的数值稳定 2410.07642v2

Authors (2): Marko Tuononen, Ville Hautamäki

Mutual information provides a powerful, general-purpose metric for quantifying the amount of shared information between variables. Estimating normalized mutual information using a k-Nearest Neighbor (k-NN) based approach involves the calculation of the scaling-invariant k-NN radius. Calculation of the radius suffers from numerical overflow when the joint dimensionality of the data becomes high, typically in the range of several hundred dimensions. To address this issue, we propose a logarithmic transformation technique that improves the numerical stability of the radius calculation in high-dimensional spaces. By applying the proposed transformation during the calculation of the radius, numerical overflow is avoided, and precision is maintained. The proposed transformation is validated through both theoretical analysis and empirical evaluation, demonstrating its ability to stabilize the calculation without compromising precision, increasing bias, or adding significant computational overhead, while also helping to maintain estimator variance.
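
The general reason a logarithmic transformation helps can be seen with a generic high-dimensional quantity. The sketch below is not the paper's radius formula; it uses the volume of a d-dimensional Euclidean ball as a stand-in, an assumption chosen only because it involves the same kind of terms (a radius raised to the dimension, divided by a Gamma function) that overflow in double precision at a few hundred dimensions.

```python
import math

def naive_ball_volume(radius, dim):
    """Direct formula; overflows for large `dim` because of gamma() and radius**dim."""
    return math.pi ** (0.5 * dim) / math.gamma(0.5 * dim + 1.0) * radius ** dim

def log_ball_volume(radius, dim):
    """Same quantity in log space: products become sums, gamma becomes lgamma."""
    return (0.5 * dim * math.log(math.pi)
            - math.lgamma(0.5 * dim + 1.0)
            + dim * math.log(radius))

try:
    naive_ball_volume(1e3, 400)
    overflowed = False
except OverflowError:
    overflowed = True

log_volume = log_ball_volume(1e3, 400)
print(overflowed, log_volume)
```

Working with the logarithm throughout, and exponentiating only differences or ratios at the end, keeps every intermediate value within floating-point range.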


Article 1571

Title@2025-06-06 (5): Gradient Similarity Surgery in Multi-Task Deep Learning

Title: Gradient Similarity Surgery in Multi-Task Deep Learning Gradient Ähnlichkeit Chirurgie in Multi-Task Deep Learning 多任务深层学习中的渐渐相似性外科手术 2506.06130v1

Authors (4): Thomas Borsani, Andrea Rosani, Giuseppe Nicosia, Giuseppe Di Fatta

The multi-task learning (MTL) paradigm aims to simultaneously learn multiple tasks within a single model, capturing higher-level, more general hidden patterns that are shared by the tasks. In deep learning, a significant challenge in the backpropagation training process is the design of advanced optimisers to improve the convergence speed and stability of the gradient descent learning rule. In particular, in multi-task deep learning (MTDL) the multitude of tasks may generate potentially conflicting gradients that would hinder the concurrent convergence of the diverse loss functions. This challenge arises when the gradients of the task objectives have either different magnitudes or opposite directions, causing one or a few to dominate or to interfere with each other, thus degrading the training process. Gradient surgery methods address the problem explicitly, dealing with conflicting gradients by adjusting the overall gradient trajectory. This work introduces a novel gradient surgery method, the Similarity-Aware Momentum Gradient Surgery (SAM-GS), which provides an effective and scalable approach based on a gradient magnitude similarity measure to guide the optimisation process. The SAM-GS surgery adopts gradient equalisation and modulation of the first-order momentum. A series of experimental tests have shown the effectiveness of SAM-GS on synthetic problems and MTL benchmarks. Gradient magnitude similarity plays a crucial role in regularising gradient aggregation in MTDL for the optimisation of the learning process.
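
As a rough illustration of the two ingredients named in the abstract, the sketch below computes a commonly used gradient magnitude similarity, 2·‖g1‖·‖g2‖ / (‖g1‖² + ‖g2‖²), and a simple equalisation that rescales every task gradient to the mean norm. Whether SAM-GS uses exactly these definitions is not stated in the abstract; both functions and their names are assumptions, and the momentum modulation is not shown.

```python
import numpy as np

def gradient_magnitude_similarity(g1, g2):
    """Equals 1.0 when the two norms match and tends to 0 as they diverge."""
    n1, n2 = np.linalg.norm(g1), np.linalg.norm(g2)
    return 2.0 * n1 * n2 / (n1 ** 2 + n2 ** 2)

def equalise(gradients):
    """Rescale each task gradient to the mean norm (assumes no gradient is zero)."""
    norms = [np.linalg.norm(g) for g in gradients]
    target = float(np.mean(norms))
    return [g * (target / n) for g, n in zip(gradients, norms)]

g_a = np.array([3.0, 4.0])   # norm 5
g_b = np.array([0.0, 1.0])   # norm 1
similarity = gradient_magnitude_similarity(g_a, g_b)
balanced = equalise([g_a, g_b])
print(similarity, [np.linalg.norm(g) for g in balanced])
```

A low similarity signals that one task would dominate a plain sum of gradients, which is the situation equalisation is meant to correct.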


Article 1572

Title@2025-06-06 (5): Flow-Attentional Graph Neural Networks

Title: Flow-Attentional Graph Neural Networks Fluss-aufmerksamer Graph Neuronale Netzwerke 流动动态图图神经网络 2506.06127v1

Authors (4): Pascal Plettenberg, Dominik Köhler, Bernhard Sick, Josephine M. Thomas

Graph Neural Networks (GNNs) have become essential for learning from graph-structured data. However, existing GNNs do not consider the conservation law inherent in graphs associated with a flow of physical resources, such as electrical current in power grids or traffic in transportation networks, which can lead to reduced model performance. To address this, we propose flow attention, which adapts existing graph attention mechanisms to satisfy Kirchhoff's first law. Furthermore, we discuss how this modification influences the expressivity and identify sets of non-isomorphic graphs that can be discriminated by flow attention but not by standard attention. Through extensive experiments on two flow graph datasets (electronic circuits and power grids), we demonstrate that flow attention enhances the performance of attention-based GNNs on both graph-level classification and regression tasks.
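
One plausible reading of "satisfying Kirchhoff's first law" is that the attention weights on the edges leaving a node sum to one, so that whatever a node sends out is split among its outgoing edges rather than duplicated. The sketch below contrasts that with the usual normalisation over incoming edges. This interpretation, the helper `edge_softmax`, and the toy graph are assumptions and are not confirmed by the abstract.

```python
import numpy as np

def edge_softmax(scores, group, n_nodes):
    """Softmax over edge scores within groups; `group` gives one node id per edge."""
    max_per_node = np.full(n_nodes, -np.inf)
    np.maximum.at(max_per_node, group, scores)   # per-group max for numerical stability
    exp = np.exp(scores - max_per_node[group])
    denom = np.zeros(n_nodes)
    np.add.at(denom, group, exp)                 # per-group normaliser
    return exp / denom[group]

# Toy directed graph with edges src -> dst and arbitrary raw attention scores.
src = np.array([0, 0, 1, 2])
dst = np.array([1, 2, 2, 1])
scores = np.array([0.3, -1.2, 0.8, 2.0])

standard = edge_softmax(scores, dst, n_nodes=3)  # normalised over incoming edges
flow = edge_softmax(scores, src, n_nodes=3)      # normalised over outgoing edges

outgoing_sum = np.bincount(src, weights=flow, minlength=3)
incoming_sum = np.bincount(dst, weights=standard, minlength=3)
print(outgoing_sum, incoming_sum)
```

With outgoing normalisation, the total weight leaving each node is conserved, which mirrors how current or traffic divides at a junction.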


Article 1573

Title@2025-06-06 (5): Convergence of linear programming hierarchies for Gibbs states of spin systems

Title: Convergence of linear programming hierarchies for Gibbs states of spin systems Konvergenz von linearen Programmierhierarchien für Gibbs-Zustände von Spinsystemen Gibbbs旋转系统状态线性编程分级组合 2506.06125v1

Authors (2): Hamza Fawzi, Omar Fawzi

We consider the problem of computing expectation values of local functions under the Gibbs distribution of a spin system. In particular, we study two families of linear programming hierarchies for this problem. The first hierarchy imposes local spin flip equalities and has been considered in the bootstrap literature in high energy physics. For this hierarchy, we prove fast convergence under a spatial mixing (decay of correlations) condition. This condition is satisfied for example above the critical temperature for Ising models on a $d$-dimensional grid. The second hierarchy is based on a Markov chain having the Gibbs state as a fixed point and has been studied in the optimization literature and more recently in the bootstrap literature. For this hierarchy, we prove fast convergence provided the Markov chain mixes rapidly. Both hierarchies lead to an $\varepsilon$-approximation for local expectation values using a linear program of size quasi-polynomial in $n/\varepsilon$, where $n$ is the total number of sites, provided the interactions can be embedded in a $d$-dimensional grid with constant $d$. Compared to standard Monte Carlo methods, an advantage of this approach is that it always (i.e., for any system) outputs rigorous upper and lower bounds on the expectation value of interest, without needing an a priori analysis of the convergence speed.


Article 1574

Title@2025-06-06 (5): Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library

Title: Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library Verstärkte Lernoptimierung für großformatiges Lernen: Eine effiziente und benutzerfreundliche Scaling-Bibliothek 大规模学习的强化学习优化:高效和方便用户的扩大规模图书馆 2506.06122v1

Authors (41): Weixun Wang, Shaopan Xiong, Gengru Chen, Wei Gao, Sheng Guo, Yancheng He, Ju Huang, Jiaheng Liu, Zhendong Li, Xiaoyang Li, Zichen Liu, Haizhou Zhao, Dakai An, Lunxi Cao, Qiyang Cao, Wanxi Deng, Feilei Du, Yiliang Gu, Jiahe Li, Xiang Li, Mingjie Liu, Yijia Luo, Zihe Liu, Yadao Wang, Pei Wang, Tianyuan Wu, Yanan Wu, Yuheng Zhao, Shuaibing Zhao, Jin Yang, Siran Yang, Yingshui Tan, Huimin Yi, Yuchi Xu, Yujin Yuan, Xingyao Zhang, Lin Qu, Wenbo Su, Wei Wang, Jiamang Wang, Bo Zheng

We introduce ROLL, an efficient, scalable, and user-friendly library designed for Reinforcement Learning Optimization for Large-scale Learning. ROLL caters to three primary user groups: tech pioneers aiming for cost-effective, fault-tolerant large-scale training, developers requiring flexible control over training workflows, and researchers seeking agile experimentation. ROLL is built upon several key modules to serve these user groups effectively. First, a single-controller architecture combined with an abstraction of the parallel worker simplifies the development of the training pipeline. Second, the parallel strategy and data transfer modules enable efficient and scalable training. Third, the rollout scheduler offers fine-grained management of each sample’s lifecycle during the rollout stage. Fourth, the environment worker and reward worker support rapid and flexible experimentation with agentic RL algorithms and reward designs. Finally, AutoDeviceMapping allows users to assign resources to different models flexibly across various stages.


Article 1575

Title@2025-06-06 (5): Training on Plausible Counterfactuals Removes Spurious Correlations

Title: Training on Plausible Counterfactuals Removes Spurious Correlations Training auf Plausible Counterfactals entfernt spurlose Korrelationen 关于可视反事实消除污损的培训 2505.16583v5

Authors (4): Shpresim Sadiku, Kartikeya Chitranshi, Hiroshi Kera, Sebastian Pokutta

Plausible counterfactual explanations (p-CFEs) are perturbations that minimally modify inputs to change classifier decisions while remaining plausible under the data distribution. In this study, we demonstrate that classifiers can be trained on p-CFEs labeled with induced incorrect target classes to classify unperturbed inputs with the original labels. While previous studies have shown that such learning is possible with adversarial perturbations, we extend this paradigm to p-CFEs. Interestingly, our experiments reveal that learning from p-CFEs is even more effective: the resulting classifiers not only achieve high in-distribution accuracy but also exhibit significantly reduced bias with respect to spurious correlations.


Article 1576

Title@2025-06-06 (5): Learners’ Languages

Title: Learners’ Languages Sprachen der Lernenden 学生语言 2103.01189v3

Authors (1): David I. Spivak

In “Backprop as functor”, the authors show that the fundamental elements of deep learning – gradient descent and backpropagation – can be conceptualized as a strong monoidal functor Para(Euc)$\to$Learn from the category of parameterized Euclidean spaces to that of learners, a category developed explicitly to capture parameter update and backpropagation. It was soon realized that there is an isomorphism Learn$\cong$Para(Slens), where Slens is the symmetric monoidal category of simple lenses as used in functional programming. In this note, we observe that Slens is a full subcategory of Poly, the category of polynomial functors in one variable, via the functor $A\mapsto Ay^A$. Using the fact that (Poly,$\otimes$) is monoidal closed, we show that a map $A\to B$ in Para(Slens) has a natural interpretation in terms of dynamical systems (more precisely, generalized Moore machines) whose interface is the internal-hom type $[Ay^A,By^B]$. Finally, we review the fact that the category p-Coalg of dynamical systems on any $p \in$ Poly forms a topos, and consider the logical propositions that can be stated in its internal language. We give gradient descent as an example, and we conclude by discussing some directions for future work.


Article 1577

Title@2025-06-06 (5): Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness

Title: Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness Auf dem Weg zum Lebenszyklus Unlearning Commitment Management: Messung der Stichprobenebene Unlearning Completeness 实现生命周期不学习承诺管理:衡量抽样一级不学习 2506.06112v1

Authors (5): Cheng-Long Wang, Qi Li, Zihang Xiang, Yinzhi Cao, Di Wang

Growing concerns over data privacy and security highlight the importance of machine unlearning, that is, removing specific data influences from trained models without full retraining. Techniques like Membership Inference Attacks (MIAs) are widely used to externally assess successful unlearning. However, existing methods face two key limitations: (1) maximizing MIA effectiveness (e.g., via online attacks) requires prohibitive computational resources, often exceeding retraining costs; (2) MIAs, designed for binary inclusion tests, struggle to capture granular changes in approximate unlearning. To address these challenges, we propose the Interpolated Approximate Measurement (IAM), a framework natively designed for unlearning inference. IAM quantifies sample-level unlearning completeness by interpolating the model’s generalization-fitting behavior gap on queried samples. IAM achieves strong performance in binary inclusion tests for exact unlearning and high correlation for approximate unlearning, and it scales to LLMs using just one pre-trained shadow model. We theoretically analyze how IAM’s scoring mechanism maintains performance efficiently. We then apply IAM to recent approximate unlearning algorithms, revealing general risks of both over-unlearning and under-unlearning, underscoring the need for stronger safeguards in approximate unlearning systems. The code is available at https://github.com/Happy2Git/Unlearning_Inference_IAM.


Article 1578

Title@2025-06-06 (5): Synthetic Tabular Data: Methods, Attacks and Defenses

Title: Synthetic Tabular Data: Methods, Attacks and Defenses Synthetische Tabellendaten: Methoden, Angriffe und Verteidigungen 合成图表数据:方法、攻击和防御 2506.06108v1

Authors (4): Graham Cormode, Samuel Maddock, Enayat Ullah, Shripad Gade

Synthetic data is often positioned as a solution to replace sensitive fixed-size datasets with a source of unlimited matching data, freed from privacy concerns. There has been much progress in synthetic data generation over the last decade, leveraging corresponding advances in machine learning and data analytics. In this survey, we cover the key developments and the main concepts in tabular synthetic data generation, including paradigms based on probabilistic graphical models and on deep learning. We provide background and motivation, before giving a technical deep-dive into the methodologies. We also address the limitations of synthetic data, by studying attacks that seek to retrieve information about the original sensitive data. Finally, we present extensions and open problems in this area.


Article 1579

Title@2025-06-06 (5): The Canary’s Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text

Title: The Canary’s Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text Das Echo der Kanaren: Auditing Privacy Risks of LLM-generated Synthetic Text 加那利的回声:LLM发明的合成文字的 审计隐私风险 2502.14921v2

Authors (5): Matthieu Meeus, Lukas Wutschitz, Santiago Zanella-Béguelin, Shruti Tople, Reza Shokri

How much information about training samples can be leaked through synthetic data generated by Large Language Models (LLMs)? Overlooking the subtleties of information flow in synthetic data generation pipelines can lead to a false sense of privacy. In this paper, we assume an adversary has access to some synthetic data generated by an LLM. We design membership inference attacks (MIAs) that target the training data used to fine-tune the LLM that is then used to synthesize data. The significant performance of our MIA shows that synthetic data leak information about the training data. Further, we find that canaries crafted for model-based MIAs are sub-optimal for privacy auditing when only synthetic data is released. Such out-of-distribution canaries have limited influence on the model’s output when prompted to generate useful, in-distribution synthetic data, which drastically reduces their effectiveness. To tackle this problem, we leverage the mechanics of auto-regressive models to design canaries with an in-distribution prefix and a high-perplexity suffix that leave detectable traces in synthetic data. This enhances the power of data-based MIAs and provides a better assessment of the privacy risks of releasing synthetic data generated by LLMs.


Article 1580

Title@2025-06-06 (5): Label-Context-Dependent Internal Language Model Estimation for CTC

Title: Label-Context-Dependent Internal Language Model Estimation for CTC Label-Context-Dependent Interne Sprachmodellschätzung für CTC Label-Ctext-依赖性内部语言模型 反恐委员会内部语言模型估算 2506.06096v1

Authors (4): Zijian Yang, Minh-Nghia Phan, Ralf Schlüter, Hermann Ney

Although connectionist temporal classification (CTC) has the label context independence assumption, it can still implicitly learn a context-dependent internal language model (ILM) due to modern powerful encoders. In this work, we investigate the implicit context dependency modeled in the ILM of CTC. To this end, we propose novel context-dependent ILM estimation methods for CTC based on knowledge distillation (KD) with theoretical justifications. Furthermore, we introduce two regularization methods for KD. We conduct experiments on Librispeech and TED-LIUM Release 2 datasets for in-domain and cross-domain evaluation, respectively. Experimental results show that context-dependent ILMs outperform the context-independent priors in cross-domain evaluation, indicating that CTC learns a context-dependent ILM. The proposed label-level KD with smoothing method surpasses other ILM estimation approaches, with more than 13% relative improvement in word error rate compared to shallow fusion.
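
For context, the role an estimated ILM plays at decoding time is usually written as a correction term added to shallow fusion. As a sketch, with $X$ the acoustic input, $W$ a candidate word sequence, and $\lambda_1, \lambda_2 \ge 0$ tunable scales (this notation is assumed here and is not taken from the paper):

$$\hat{W} = \arg\max_{W} \Big[ \log P_{\mathrm{CTC}}(W \mid X) + \lambda_1 \log P_{\mathrm{LM}}(W) - \lambda_2 \log P_{\mathrm{ILM}}(W) \Big].$$

Setting $\lambda_2 = 0$ recovers plain shallow fusion, the baseline the reported relative improvement is measured against; the methods in the abstract differ in how $P_{\mathrm{ILM}}$ is estimated.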


Article 1581

Title@2025-06-06 (5): Flexible Operator Fusion for Fast Sparse Transformer with Diverse Masking on GPU

Title: Flexible Operator Fusion for Fast Sparse Transformer with Diverse Masking on GPU Flexible Operator Fusion für schnellen Sparse Transformer mit vielfältigem maskieren auf GPU 具有 GPU 上多种掩码的快速分散变异器灵活操作器融合 2506.06095v1

Authors (9): Wenhao Dai, Haodong Deng, Mengfei Rong, Xinyu Yang, Hongyu Liu, Fangxin Liu, Hailong Yang, Weifeng Liu, Qingxiao Sun

Large language models are popular around the world due to their powerful understanding capabilities. Accelerating the Transformer, the core component of LLMs, through parallelization has gradually become a hot research topic. Mask layers introduce sparsity into Transformer to reduce calculations. However, previous works rarely focus on the performance optimization of sparse Transformer. Moreover, rule-based mechanisms ignore the fusion opportunities of mixed-type operators and fail to adapt to various sequence lengths. To address the above problems, we propose STOF, a framework that incorporates optimizations for Sparse Transformer via flexible masking and operator fusion on GPU. We first unify the storage format and kernel implementation for the multi-head attention. Then, we map fusion schemes to compilation templates and determine the optimal parameter setting through a two-stage search engine. The experimental results show that compared to the state-of-the-art work, STOF achieves maximum speedups of 1.7x in MHA computation and 1.5x in end-to-end inference.


Article 1582

Title@2025-06-06 (5): On-board Mission Replanning for Adaptive Cooperative Multi-Robot Systems

Title: On-board Mission Replanning for Adaptive Cooperative Multi-Robot Systems On-Board-Mission-Replanning für adaptive kooperative Multi-Roboter-Systeme 特派团适应性合作多机器人系统重新规划 2506.06094v1

Authors (6): Elim Kwan, Rehman Qureshi, Liam Fletcher, Colin Laganier, Victoria Nockles, Richard Walters

Cooperative autonomous robotic systems have significant potential for executing complex multi-task missions across space, air, ground, and maritime domains. But they commonly operate in remote, dynamic and hazardous environments, requiring rapid in-mission adaptation without reliance on fragile or slow communication links to centralised compute. Fast, on-board replanning algorithms are therefore needed to enhance resilience. Reinforcement Learning shows strong promise for efficiently solving mission planning tasks when formulated as Travelling Salesperson Problems (TSPs), but existing methods: 1) are unsuitable for replanning, where agents do not start at a single location; 2) do not allow cooperation between agents; 3) are unable to model tasks with variable durations; or 4) lack practical considerations for on-board deployment. Here we define the Cooperative Mission Replanning Problem as a novel variant of multiple TSP with adaptations to overcome these issues, and develop a new encoder/decoder-based model using Graph Attention Networks and Attention Models to solve it effectively and efficiently. Using a simple example of cooperative drones, we show our replanner consistently (90% of the time) maintains performance within 10% of the state-of-the-art LKH3 heuristic solver, whilst running 85-370 times faster on a Raspberry Pi. This work paves the way for increased resilience in autonomous multi-agent systems.


Article 1583

Title@2025-06-06 (5): Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture

Title: Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture Paradigmen der AI-Bewertung: Kartierungsziele, Methoden und Kultur AI 评价:绘图目标、方法和文化 2502.15620v2

Authors (4): John Burden, Marko Tešić, Lorenzo Pacchiardi, José Hernández-Orallo

Research in AI evaluation has grown increasingly complex and multidisciplinary, attracting researchers with diverse backgrounds and objectives. As a result, divergent evaluation paradigms have emerged, often developing in isolation, adopting conflicting terminologies, and overlooking each other’s contributions. This fragmentation has led to insular research trajectories and communication barriers both among different paradigms and with the general public, contributing to unmet expectations for deployed AI systems. To help bridge this insularity, in this paper we survey recent work in the AI evaluation landscape and identify six main paradigms. We characterise major recent contributions within each paradigm across key dimensions related to their goals, methodologies and research cultures. By clarifying the unique combination of questions and approaches associated with each paradigm, we aim to increase awareness of the breadth of current evaluation approaches and foster cross-pollination between different paradigms. We also identify potential gaps in the field to inspire future research directions.


Article 1584

Title@2025-06-06 (5): HeavyWater and SimplexWater: Watermarking Low-Entropy Text Distributions

Title: HeavyWater and SimplexWater: Watermarking Low-Entropy Text Distributions HeavyWater und SimplexWater: Wassermarkierung mit geringer Entropie Textverteilung 重水和简易水:低英基文本分发的标注水 2506.06409v1

Authors (8): Dor Tsur, Carol Xuan Long, Claudio Mayrink Verdun, Hsiang Hsu, Chen-Fu Chen, Haim Permuter, Sajani Vithana, Flavio P. Calmon

Large language model (LLM) watermarks enable authentication of text provenance, curb misuse of machine-generated text, and promote trust in AI systems. Current watermarks operate by changing the next-token predictions output by an LLM. The updated (i.e., watermarked) predictions depend on random side information produced, for example, by hashing previously generated tokens. LLM watermarking is particularly challenging in low-entropy generation tasks - such as coding - where next-token predictions are near-deterministic. In this paper, we propose an optimization framework for watermark design. Our goal is to understand how to most effectively use random side information in order to maximize the likelihood of watermark detection and minimize the distortion of generated text. Our analysis informs the design of two new watermarks: HeavyWater and SimplexWater. Both watermarks are tunable, gracefully trading off detection accuracy against text distortion. They can also be applied to any LLM and are agnostic to side information generation. We examine the performance of HeavyWater and SimplexWater through several benchmarks, demonstrating that they can achieve high watermark detection accuracy with minimal compromise of text generation quality, particularly in the low-entropy regime. Our theoretical analysis also reveals surprising new connections between LLM watermarking and coding theory. The code implementation can be found at https://github.com/DorTsur/HeavyWater_SimplexWater


Article 1585

Title@2025-06-06 (5): LinGuinE: Longitudinal Guidance Estimation for Volumetric Lung Tumour Segmentation

Title: LinGuinE: Longitudinal Guidance Estimation for Volumetric Lung Tumour Segmentation LinGuinE: Longitudinal Guidance Schätzung für Volumetrische Lungentumorsegmentierung LinGuinE: 活性肺炎肿瘤分离的纵向指导估计 2506.06092v1

Authors (5): Nadine Garibli, Mayank Patwari, Bence Csiba, Yi Wei, Kostas Sidiropoulos

Segmentation of lung gross tumour volumes is an important first step in radiotherapy and surgical intervention, and is starting to play a role in assessing chemotherapy response. Response to a drug is measured by tracking the tumour volumes over a series of CT scans taken over a time period, i.e., a longitudinal study. However, there currently exist few solutions for automated or semi-automated longitudinal tumour segmentation. This paper introduces LinGuinE, an automated method to segment a longitudinal series of lung tumours. A radiologist must provide an initial input, indicating the location of the tumour in a CT scan at an arbitrary time point. LinGuinE samples points inside this tumour and propagates them to another time point using rigid registration. A click validity classifier selects points which still fall within the tumour; these are used to automatically create a segmentation in the new time point. We test LinGuinE on a dataset acquired from a phase 3 clinical trial for lung tumours and the publicly available 4-D lung CBCT dataset. We find that LinGuinE improves the Dice on both test sets by over 20% (p < 0.05) across 63 longitudinal studies. We show that any time point can be used as a starting point, conduct ablation experiments, and find that our LinGuinE setup yields the best results on both test datasets.
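
For readers unfamiliar with the Dice score reported above, the following is a minimal sketch of the standard Dice coefficient between two binary masks. The function name and the flat 0/1 list representation are illustrative assumptions, not taken from the LinGuinE code.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are flat sequences of 0/1 voxel labels (an assumed, simplified
    representation; real segmentations are usually 3-D arrays).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count voxels labelled as tumour in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A score of 1.0 means the predicted and reference masks coincide; 0.0 means they share no tumour voxels.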


Article 1586

Title@2025-06-06 (5): Multidimensional Adaptive Coefficient for Inference Trajectory Optimization in Flow and Diffusion

Title: Multidimensional Adaptive Coefficient for Inference Trajectory Optimization in Flow and Diffusion Multidimensionaler Adaptiver Koeffizient für die Inferenz-Trajektorie-Optimierung in Strömung und Diffusion 流动与传播过程中的多维可调适系数 2404.14161v3

Authors (4): Dohoon Lee, Jaehyun Park, Hyunwoo J. Kim, Kyogu Lee

Flow and diffusion models have demonstrated strong performance and training stability across various tasks but lack two critical properties of simulation-based methods: freedom of dimensionality and adaptability to different inference trajectories. To address this limitation, we propose the Multidimensional Adaptive Coefficient (MAC), a plug-in module for flow and diffusion models that extends conventional unidimensional coefficients to multidimensional ones and enables inference trajectory-wise adaptation. MAC is trained via simulation-based feedback through adversarial refinement. Empirical results across diverse frameworks and datasets demonstrate that MAC enhances generative quality with high training efficiency. Consequently, our work offers a new perspective on inference trajectory optimality, encouraging future research to move beyond vector field design and to leverage training-efficient, simulation-based optimization.


Article 1587

Title@2025-06-06 (5): Multilevel neural simulation-based inference

Title: Multilevel neural simulation-based inference Mehrstufige neuronale simulationsbasierte Inferenz 多级神经模拟模拟推断 2506.06087v1

Authors (4): Yuga Hikida, Ayush Bharti, Niall Jeffrey, François-Xavier Briol

Neural simulation-based inference (SBI) is a popular set of methods for Bayesian inference when models are only available in the form of a simulator. These methods are widely used in the sciences and engineering, where writing down a likelihood can be significantly more challenging than constructing a simulator. However, the performance of neural SBI can suffer when simulators are computationally expensive, thereby limiting the number of simulations that can be performed. In this paper, we propose a novel approach to neural SBI which leverages multilevel Monte Carlo techniques for settings where several simulators of varying cost and fidelity are available. We demonstrate through both theoretical analysis and extensive experiments that our method can significantly enhance the accuracy of SBI methods given a fixed computational budget.
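
As background, the generic multilevel Monte Carlo idea can be sketched as a telescoping sum: many cheap low-fidelity samples plus a few coupled corrections from higher-fidelity simulators. The sketch below estimates a plain expectation and is not the paper's neural SBI objective; `mlmc_estimate`, the simulator signature (a function of one uniform draw), and the sample counts are all assumptions made for illustration.

```python
import random


def mlmc_estimate(simulators, samples_per_level, seed=0):
    """Telescoping MLMC estimate of E[f_L(u)], with u ~ Uniform(0, 1).

    simulators[l] is the level-l simulator (cheapest first); each level
    contributes the sample mean of f_l(u) - f_{l-1}(u).
    """
    rng = random.Random(seed)
    estimate = 0.0
    for level, n in enumerate(samples_per_level):
        total = 0.0
        for _ in range(n):
            u = rng.random()  # shared draw couples the fine and coarse runs
            fine = simulators[level](u)
            coarse = simulators[level - 1](u) if level > 0 else 0.0
            total += fine - coarse
        estimate += total / n
    return estimate
```

Because the correction terms typically have small variance when the two fidelities are coupled, most of the sample budget can go to the cheapest simulator, e.g. `samples_per_level=[1000, 100, 10]`.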


Article 1588

Title@2025-06-06 (5): A Novel, Human-in-the-Loop Computational Grounded Theory Framework for Big Social Data

Title: A Novel, Human-in-the-Loop Computational Grounded Theory Framework for Big Social Data Ein neuartiges, Mensch-in-the-Loop Computational Grounded Theory Framework für große soziale Daten 大型社会数据新颖的、 “ 网上人 “ 计算基础理论大社会数据新理论框架 2506.06083v1

Authors (4): Lama Alqazlan, Zheng Fang, Michael Castelle, Rob Procter

The availability of big data has significantly influenced the possibilities and methodological choices for conducting large-scale behavioural and social science research. In the context of qualitative data analysis, a major challenge is that conventional methods require intensive manual labour and are often impractical to apply to large datasets. One effective way to address this issue is by integrating emerging computational methods to overcome scalability limitations. However, a critical concern for researchers is the trustworthiness of results when Machine Learning (ML) and Natural Language Processing (NLP) tools are used to analyse such data. We argue that confidence in the credibility and robustness of results depends on adopting a ‘human-in-the-loop’ methodology that is able to provide researchers with control over the analytical process, while retaining the benefits of using ML and NLP. With this in mind, we propose a novel methodological framework for Computational Grounded Theory (CGT) that supports the analysis of large qualitative datasets, while maintaining the rigour of established Grounded Theory (GT) methodologies. To illustrate the framework’s value, we present the results of testing it on a dataset collected from Reddit in a study aimed at understanding tutors’ experiences in the gig economy.


Article 1589

Title@2025-06-06 (5): Training Deep Learning Models with Norm-Constrained LMOs

Title: Training Deep Learning Models with Norm-Constrained LMOs Ausbildung von Deep Learning-Modellen mit normgebundenen ABS 与受诺姆约束的LMO培训的LMO培训深学习模式 2502.07529v2

Authors (6): Thomas Pethick, Wanyun Xie, Kimon Antonakopoulos, Zhenyu Zhu, Antonio Silveti-Falls, Volkan Cevher

In this work, we study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball. We propose a new stochastic family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems. The resulting update rule unifies several existing optimization methods under a single framework. Furthermore, we propose an explicit choice of norm for deep architectures, which, as a side benefit, leads to the transferability of hyperparameters across model sizes. Experimentally, we demonstrate significant speedups on nanoGPT training using our algorithm, Scion, without any reliance on Adam. The proposed method is memory-efficient, requiring only one set of model weights and one set of gradients, which can be stored in half-precision. The code is available at https://github.com/LIONS-EPFL/scion .
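
To make the linear minimization oracle concrete: for a norm ball of radius r, the LMO returns the point in the ball that minimizes the inner product with the gradient. The sketch below gives the closed forms for the l-infinity and l2 balls. It illustrates the oracle only and is not the Scion update rule or the norm choice proposed in the paper; the function names and the plain-list vectors are assumptions.

```python
import math


def lmo_linf(grad, radius=1.0):
    """argmin of <grad, x> over ||x||_inf <= radius: -radius * sign(grad)."""
    return [-radius * ((g > 0) - (g < 0)) for g in grad]


def lmo_l2(grad, radius=1.0):
    """argmin of <grad, x> over ||x||_2 <= radius: -radius * grad / ||grad||."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # Every point in the ball is a minimizer; return the origin.
        return [0.0 for _ in grad]
    return [-radius * g / norm for g in grad]
```

With the l-infinity ball, stepping along the oracle output yields a sign-descent-style update, while the l2 ball yields a normalized gradient step, which illustrates how different norms recover different familiar optimizers.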


Article 1590

Title@2025-06-06 (5): Proximal Policy Distillation

Title: Proximal Policy Distillation Proximale Politikdestillation Proximal 政策蒸馏 2407.15134v2

Authors (1): Giacomo Spigler

We introduce Proximal Policy Distillation (PPD), a novel policy distillation method that integrates student-driven distillation and Proximal Policy Optimization (PPO) to increase sample efficiency and to leverage the additional rewards that the student policy collects during distillation. To assess the efficacy of our method, we compare PPD with two common alternatives, student-distill and teacher-distill, over a wide range of reinforcement learning environments that include discrete actions and continuous control (ATARI, Mujoco, and Procgen). For each environment and method, we perform distillation to a set of target student neural networks that are smaller, identical (self-distillation), or larger than the teacher network. Our findings indicate that PPD improves sample efficiency and produces better student policies compared to typical policy distillation approaches. Moreover, PPD demonstrates greater robustness than alternative methods when distilling policies from imperfect demonstrations. The code for the paper is released as part of a new Python library built on top of stable-baselines3 to facilitate policy distillation: ‘sb3-distill’.


Article 1591

Title@2025-06-06 (5): System-Aware Unlearning Algorithms: Use Lesser, Forget Faster

Title: System-Aware Unlearning Algorithms: Use Lesser, Forget Faster System-Aware Unlearning Algorithmen: Weniger verwenden, schneller vergessen 系统提醒 取消学习算法 : 使用更少, 忘记更快 2506.06073v1

Authors (3): Linda Lu, Ayush Sekhari, Karthik Sridharan

Machine unlearning addresses the problem of updating a machine learning model/system trained on a dataset $S$ so that the influence of a set of deletion requests $U \subseteq S$ on the unlearned model is minimized. The gold standard definition of unlearning demands that the updated model, after deletion, be nearly identical to the model obtained by retraining. This definition is designed for a worst-case attacker (one who can recover not only the unlearned model but also the remaining data samples, i.e., $S \setminus U$). Such a stringent definition has made developing efficient unlearning algorithms challenging. However, such strong attackers are also unrealistic. In this work, we propose a new definition, system-aware unlearning, which aims to provide unlearning guarantees against an attacker that can at best only gain access to the data stored in the system for learning/unlearning requests and not all of $S\setminus U$. With this new definition, we use the simple intuition that if a system can store less to make its learning/unlearning updates, it can be more secure and update more efficiently against a system-aware attacker. Towards that end, we present an exact system-aware unlearning algorithm for linear classification using a selective sampling-based approach, and we generalize the method for classification with general function classes. We theoretically analyze the tradeoffs between deletion capacity, accuracy, memory, and computation time.


Article 1592

Title@2025-06-06 (5): ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis

Title: ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis ProofAug: Effiziente Neuraltheorem-Probung mittels feinkörniger Proof-Strukturanalyse 校对:通过精细的校对结构分析证明有效的神经理论 2501.18310v2

Authors (4): Haoxiong Liu, Jiacheng Sun, Zhenguo Li, Andrew C Yao

The synergy between deep learning models and traditional automation tools, such as built-in tactics of the proof assistant and off-the-shelf automated theorem provers, plays a crucial role in developing robust and efficient neural theorem provers (NTPs). However, for proof synthesis with LLMs, previous work applies automation tools either only when explicitly invoked by the model or at a single granularity level, failing to fully exploit their power. To solve this issue, we propose ProofAug, a procedure that equips LLMs with automation methods at various granularities through fine-grained structure analysis of model-generated proof proposals. ProofAug also serves as a versatile plug-and-play module that seamlessly integrates with any tree-search algorithm, enabling our construction of an efficient recursive proving (ERP) module to further enhance performance. The superiority of our method is validated on the miniF2F benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant. Notably, by additionally employing a mixed prompting strategy, we achieve a cumulative pass rate of 66.0% after curation of the dataset (61.9% for the original version) with 2100 queries to the model per problem (in contrast, the previous SOTA in Isabelle, Subgoal-XL, only achieves 56.1% using 16384 queries per problem). We also implement a Lean 4 version of ProofAug that can improve the pass@1 performance of Kimina-Prover-Preview-Distill-1.5B from 44.3% to 50.4% on miniF2F-test. Our code is available at https://github.com/haoxiongliu/ProofAug.


Article 1593

Title@2025-06-06 (5): Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models

Title: Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models Bewertung von Intersektionalen Bias in Darstellungen von vortrainierten Bilderkennungsmodellen 评估培训前图像识别模型代表性的跨部门偏见 2506.03664v2

Authors (2): Valerie Krug, Sebastian Stober

Deep Learning models have achieved remarkable success. Training them is often accelerated by building on top of pre-trained models which poses the risk of perpetuating encoded biases. Here, we investigate biases in the representations of commonly used ImageNet classifiers for facial images while considering intersections of sensitive variables age, race and gender. To assess the biases, we use linear classifier probes and visualize activations as topographic maps. We find that representations in ImageNet classifiers particularly allow differentiation between ages. Less strongly pronounced, the models appear to associate certain ethnicities and distinguish genders in middle-aged groups.


Article 1594

Title@2025-06-06 (5): The Influence of the Memory Capacity of Neural DDEs on the Universal Approximation Property

Title: The Influence of the Memory Capacity of Neural DDEs on the Universal Approximation Property Der Einfluss der Speicherkapazität von neuralen DDEs auf die universelle Annäherungseigenschaft 神经DDE的记忆力对普遍接近财产的影响 2505.07244v2

Authors (2): Christian Kuehn, Sara-Viola Kuntz

Neural Ordinary Differential Equations (Neural ODEs), which are the continuous-time analog of Residual Neural Networks (ResNets), have gained significant attention in recent years. Similarly, Neural Delay Differential Equations (Neural DDEs) can be interpreted as an infinite depth limit of Densely Connected Residual Neural Networks (DenseResNets). In contrast to traditional ResNet architectures, DenseResNets are feed-forward networks that allow for shortcut connections across all layers. These additional connections introduce memory in the network architecture, as typical in many modern architectures. In this work, we explore how the memory capacity in neural DDEs influences the universal approximation property. The key parameter for studying the memory capacity is the product $K \tau$ of the Lipschitz constant and the delay of the DDE. In the case of non-augmented architectures, where the network width is not larger than the input and output dimensions, neural ODEs and classical feed-forward neural networks cannot have the universal approximation property. We show that if the memory capacity $K\tau$ is sufficiently small, the dynamics of the neural DDE can be approximated by a neural ODE. Consequently, non-augmented neural DDEs with a small memory capacity also lack the universal approximation property. In contrast, if the memory capacity $K\tau$ is sufficiently large, we can establish the universal approximation property of neural DDEs for continuous functions. If the neural DDE architecture is augmented, we can expand the parameter regions in which universal approximation is possible. Overall, our results show that by increasing the memory capacity $K\tau$, the infinite-dimensional phase space of DDEs with positive delay $\tau>0$ is not sufficient to guarantee a direct jump transition to universal approximation, but only after a certain memory threshold, universal approximation holds.


Article 1595

Title@2025-06-06 (5): Zero-Shot Detection of LLM-Generated Code via Approximated Task Conditioning

Title: Zero-Shot Detection of LLM-Generated Code via Approximated Task Conditioning Zero-Shot-Erkennung des LLM-generierten Codes über eine angenäherte Aufgabenkonditionierung 通过近似任务设置条件对LLM生成的代码进行零热检测 2506.06069v1

Authors (4): Maor Ashkenazi, Ofir Brenner, Tal Furman Shohet, Eran Treister

Detecting Large Language Model (LLM)-generated code is a growing challenge with implications for security, intellectual property, and academic integrity. We investigate the role of conditional probability distributions in improving zero-shot LLM-generated code detection, when considering both the code and the corresponding task prompt that generated it. Our key insight is that when evaluating the probability distribution of code tokens using an LLM, there is little difference between LLM-generated and human-written code. However, conditioning on the task reveals notable differences. This contrasts with natural language text, where differences exist even in the unconditional distributions. Leveraging this, we propose a novel zero-shot detection approach that approximates the original task used to generate a given code snippet and then evaluates token-level entropy under the approximated task conditioning (ATC). We further provide a mathematical intuition, contextualizing our method relative to previous approaches. ATC requires neither access to the generator LLM nor the original task prompts, making it practical for real-world applications. To the best of our knowledge, it achieves state-of-the-art results across benchmarks and generalizes across programming languages, including Python, CPP, and Java. Our findings highlight the importance of task-level conditioning for LLM-generated code detection. The supplementary materials and code are available at https://github.com/maorash/ATC, including the dataset gathering implementation, to foster further research in this area.
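
The token-level entropy statistic mentioned above can be sketched as follows. This is only the generic Shannon entropy computation, assuming the per-position next-token distributions have already been obtained from some scoring LLM conditioned on an approximated task; the function names and the list-of-lists input are hypothetical, and the ATC detection rule and thresholds are not reproduced here.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_token_entropy(distributions):
    """Average entropy over the token positions of a code snippet.

    distributions[i] is the (assumed precomputed) next-token probability
    vector at position i.
    """
    if not distributions:
        raise ValueError("need at least one token distribution")
    return sum(token_entropy(d) for d in distributions) / len(distributions)
```

A detector built on this statistic would compare the mean entropy of a snippet against a threshold chosen on held-out data.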


Article 1596

Title@2025-06-06 (5): Feedforward Few-shot Species Range Estimation

Title: Feedforward Few-shot Species Range Estimation Einspeisung Wenig abgespuckte Arten Range-Schätzung 向前进的微粒物种范围估计 2502.14977v2

Authors (9): Christian Lange, Max Hamilton, Elijah Cole, Alexander Shepard, Samuel Heinrich, Angela Zhu, Subhransu Maji, Grant Van Horn, Oisin Mac Aodha

Knowing where a particular species can or cannot be found on Earth is crucial for ecological research and conservation efforts. By mapping the spatial ranges of all species, we would obtain deeper insights into how global biodiversity is affected by climate change and habitat loss. However, accurate range estimates are only available for a relatively small proportion of all known species. For the majority of the remaining species, we typically only have a small number of records denoting the spatial locations where they have previously been observed. We outline a new approach for few-shot species range estimation to address the challenge of accurately estimating the range of a species from limited data. During inference, our model takes a set of spatial locations as input, along with optional metadata such as text or an image, and outputs a species encoding that can be used to predict the range of a previously unseen species in a feedforward manner. We evaluate our approach on two challenging benchmarks, where we obtain state-of-the-art range estimation performance, in a fraction of the compute time, compared to recent alternative approaches.


Article 1597

Title@2025-06-06 (5): HIGHT: Hierarchical Graph Tokenization for Molecule-Language Alignment

Title: HIGHT: Hierarchical Graph Tokenization for Molecule-Language Alignment HIGHT: Hierarchische Graphen-Tokenisierung für die Ausrichtung von Molekülen HIG: 分子-语言对齐的等级图形方位化 2406.14021v2

Authors (5): Yongqiang Chen, Quanming Yao, Juzheng Zhang, James Cheng, Yatao Bian

Recently, there has been a surge of interest in extending the success of large language models (LLMs) from texts to molecules. Most existing approaches adopt a graph neural network to represent a molecule as a series of node tokens for molecule-language alignment; however, they overlook the inherent hierarchical structures in molecules. Notably, higher-order molecular structures contain rich semantics of functional groups, which encode crucial biochemical functionalities of the molecules. We show that neglecting the hierarchical information in tokenization will lead to subpar molecule-language alignment and severe hallucination. To address this limitation, we propose HIerarchical GrapH Tokenization (HIGHT). HIGHT employs a hierarchical graph tokenizer that encodes the hierarchy of atom, motif, and molecular levels of informative tokens to improve the molecular perception of LLMs. HIGHT also adopts an augmented instruction tuning dataset, enriched with the hierarchical graph information, to further enhance the molecule-language alignment. Extensive experiments on 14 real-world benchmarks verify the effectiveness of HIGHT in reducing hallucination by 40%, and show significant improvements in various molecule-language downstream tasks. The project is available at https://higraphllm.github.io/.


Article 1598

Title@2025-06-06 (5): Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large $p$

Title: Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large $p$ Ab Initio Nichtparametrische Variable Auswahl für skalierbare symbolische Regression mit großen $p$ 用于大额p美元可缩放的符号回归的非参数变量选择 2410.13681v2

Authors (2): Shengbin Ye, Meng Li

Symbolic regression (SR) is a powerful technique for discovering symbolic expressions that characterize nonlinear relationships in data, gaining increasing attention for its interpretability, compactness, and robustness. However, existing SR methods do not scale to datasets with a large number of input variables (referred to as extreme-scale SR), which is common in modern scientific applications. This “large $p$” setting, often accompanied by measurement error, leads to slow performance of SR methods and overly complex expressions that are difficult to interpret. To address this scalability challenge, we propose a method called PAN+SR, which combines a key idea of ab initio nonparametric variable selection with SR to efficiently pre-screen large input spaces and reduce search complexity while maintaining accuracy. The use of nonparametric methods eliminates model misspecification, supporting a strategy called parametric-assisted nonparametric (PAN). We also extend SRBench, an open-source benchmarking platform, by incorporating high-dimensional regression problems with various signal-to-noise ratios. Our results demonstrate that PAN+SR consistently enhances the performance of 19 contemporary SR methods, enabling several to achieve state-of-the-art performance on these challenging datasets.


Article 1599

Title@2025-06-06 (5): Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation

Title: Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation Effiziente feinkörnige Anleitung für die Diffusionsmodellbasierte Symbolische Musikgenerierung 以模型为基础的示范符号音乐制作的 扩散模型的精效微粒指导 2410.08435v3

Authors (5): Tingyu Zhu, Haoyu Liu, Ziyu Wang, Zhimin Jiang, Zeyu Zheng

Developing generative models to create or conditionally create symbolic music presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To address these challenges, we introduce an efficient Fine-Grained Guidance (FGG) approach within diffusion models. FGG guides the diffusion models to generate music that aligns more closely with the control and intent of expert composers, which is critical to improve the accuracy, listenability, and quality of generated music. This approach empowers diffusion models to excel in advanced applications such as improvisation and interactive music creation. We derive theoretical characterizations for both the challenges in symbolic music generation and the effects of the FGG approach. We provide numerical experiments and subjective evaluation to demonstrate the effectiveness of our approach. We have published a demo page to showcase performances, which enables real-time interactive generation.


Article 1600

Title@2025-06-06 (5): MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Title: MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding MedXpertQA: Benchmarking von medizinischem Verständnis auf Expertenebene MedXperttQA:确定专家级医疗理由和理解基准 2501.18362v3

Authors (9): Yuxin Zuo, Shang Qu, Yifei Li, Zhangren Chen, Xuekai Zhu, Ermo Hua, Kaiyan Zhang, Ning Ding, Bowen Zhou

We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 questions spanning 17 specialties and 11 body systems. It includes two subsets, Text for text evaluation and MM for multimodal evaluation. Notably, MM introduces expert-level exam questions with diverse images and rich clinical information, including patient records and examination results, setting it apart from traditional medical multimodal benchmarks with simple QA pairs generated from image captions. MedXpertQA applies rigorous filtering and augmentation to address the insufficient difficulty of existing benchmarks like MedQA, and incorporates specialty board questions to improve clinical relevance and comprehensiveness. We perform data synthesis to mitigate data leakage risk and conduct multiple rounds of expert reviews to ensure accuracy and reliability. We evaluate 18 leading models on MedXpertQA. Moreover, medicine is deeply connected to real-world decision-making, providing a rich and representative setting for assessing reasoning abilities beyond mathematics and code. To this end, we develop a reasoning-oriented subset to facilitate the assessment of o1-like models. Code and data are available at: https://github.com/TsinghuaC3I/MedXpertQA


Article 1601

Title@2025-06-06 (5): TRUST: Test-time Resource Utilization for Superior Trustworthiness

Title: TRUST: Test-time Resource Utilization for Superior Trustworthiness TRUST: Testzeit-Ressourcennutzung für höchste Vertrauenswürdigkeit 信托基金:高可信赖性试验时间利用资源 2506.06048v1

Authors (2): Haripriya Harikumar, Santu Rana

Standard uncertainty estimation techniques, such as dropout, often struggle to clearly distinguish reliable predictions from unreliable ones. We attribute this limitation to noisy classifier weights, which, while not impairing overall class-level predictions, render finer-level statistics less informative. To address this, we propose a novel test-time optimization method that accounts for the impact of such noise to produce more reliable confidence estimates. This score defines a monotonic subset-selection function, where population accuracy consistently increases as samples with lower scores are removed, and it demonstrates superior performance in standard risk-based metrics such as AUSE and AURC. Additionally, our method effectively identifies discrepancies between training and test distributions, reliably differentiates in-distribution from out-of-distribution samples, and elucidates key differences between CNN and ViT classifiers across various vision datasets.



Article 1602

Title@2025-06-06 (5): Diffusion-Based Hierarchical Graph Neural Networks for Simulating Nonlinear Solid Mechanics

Title: Diffusion-Based Hierarchical Graph Neural Networks for Simulating Nonlinear Solid Mechanics Diffusionsbasierte Hierarchische Graphen-Neuralnetze zur Simulation nichtlinearer Festmechanik 模拟非线性固体机械师 2506.06045v1

Authors (4): Tobias Würth, Niklas Freymuth, Gerhard Neumann, Luise Kärger

Graph-based learned simulators have emerged as a promising approach for simulating physical systems on unstructured meshes, offering speed and generalization across diverse geometries. However, they often struggle with capturing global phenomena, such as bending or long-range correlations, and suffer from error accumulation over long rollouts due to their reliance on local message passing and direct next-step prediction. We address these limitations by introducing the Rolling Diffusion-Batched Inference Network (ROBIN), a novel learned simulator that integrates two key innovations: (i) Rolling Diffusion, a parallelized inference scheme that amortizes the cost of diffusion-based refinement across physical time steps by overlapping denoising steps across a temporal window; and (ii) a Hierarchical Graph Neural Network built on algebraic multigrid coarsening, enabling multiscale message passing across different mesh resolutions. This architecture, implemented via Algebraic-hierarchical Message Passing Networks, captures both fine-scale local dynamics and global structural effects critical for phenomena like beam bending or multi-body contact. We validate ROBIN on challenging 2D and 3D solid mechanics benchmarks involving geometric, material, and contact nonlinearities. ROBIN achieves state-of-the-art accuracy on all tasks, substantially outperforming existing next-step learned simulators while reducing inference time by up to an order of magnitude compared to standard diffusion simulators.



Article 1603

Title@2025-06-06 (5): SDS-Net: Shallow-Deep Synergism-detection Network for infrared small target detection

Title: SDS-Net: Shallow-Deep Synergism-detection Network for infrared small target detection SDS-Net: Shallow-Deep Synergism-Detektionsnetzwerk für Infrarot-Kleinzielerkennung SDS-Net:红外小型目标探测浅水深协同-探测网络 2506.06042v1

Authors (5): Taoran Yue, Xiaojin Lu, Jiaxi Cai, Yuanping Chen, Shibing Chu

Current CNN-based infrared small target detection (IRSTD) methods generally overlook the heterogeneity between shallow and deep features, leading to inefficient collaboration between shallow fine-grained structural information and deep high-level semantic representations. Additionally, the dependency relationships and fusion mechanisms across different feature hierarchies lack systematic modeling, which fails to fully exploit the complementarity of multilevel features. These limitations hinder IRSTD performance while incurring substantial computational costs. To address these challenges, this paper proposes a shallow-deep synergistic detection network (SDS-Net) that efficiently models multilevel feature representations to increase both the detection accuracy and computational efficiency in IRSTD tasks. SDS-Net introduces a dual-branch architecture that separately models the structural characteristics and semantic properties of features, effectively preserving shallow spatial details while capturing deep semantic representations, thereby achieving high-precision detection with significantly improved inference speed. Furthermore, the network incorporates an adaptive feature fusion module to dynamically model cross-layer feature correlations, enhancing overall feature collaboration and representation capability. Comprehensive experiments on three public datasets (NUAA-SIRST, NUDT-SIRST, and IRSTD-1K) demonstrate that SDS-Net outperforms state-of-the-art IRSTD methods while maintaining low computational complexity and high inference efficiency, showing superior detection performance and broad application prospects. Our code will be made public at https://github.com/PhysiLearn/SDS-Net.



Article 1604

Title@2025-06-06 (5): Tensor-to-Tensor Models with Fast Iterated Sum Features

Title: Tensor-to-Tensor Models with Fast Iterated Sum Features Tensor-to-Tensor-Modelle mit schnell iterierten Sum-Features 具有快速迭代超特性的电传感器到传感器模型 2506.06041v1

Authors (4): Joscha Diehl, Rasheed Ibraheem, Leonard Schmitz, Yue Wu

Data in the form of images or higher-order tensors is ubiquitous in modern deep learning applications. Owing to their inherent high dimensionality, the need for subquadratic layers processing such data is even more pressing than for sequence data. We propose a novel tensor-to-tensor layer with linear cost in the input size, utilizing the mathematical gadget of "corner trees" from the field of permutation counting. In particular, for order-two tensors, we provide an image-to-image layer that can be plugged into image processing pipelines. On the one hand, our method can be seen as a higher-order generalization of state-space models. On the other hand, it is based on a multiparameter generalization of the signature of iterated integrals (or sums). The proposed tensor-to-tensor concept is used to build a neural network layer called the Fast Iterated Sums (FIS) layer which integrates seamlessly with other layer types. We demonstrate the usability of the FIS layer with both classification and anomaly detection tasks. By replacing some layers of a smaller ResNet architecture with FIS, a similar accuracy (with a difference of only 0.1%) was achieved in comparison to a larger ResNet while reducing the number of trainable parameters and multi-add operations. The FIS layer was also used to build an anomaly detection model that achieved an average AUROC of 97.3% on the texture images of the popular MVTec AD dataset. The processing and modelling codes are publicly available at https://github.com/diehlj/fast-iterated-sums.
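To give a feel for why iterated sums admit linear-cost evaluation, here is a minimal one-parameter sketch. It is not the FIS layer or the corner-tree construction from the paper, which handle the multiparameter (image) case; it only shows the classical sequence analogue, where the two-fold iterated sum over index pairs i < j can be computed with a running prefix sum instead of a double loop. Function names are illustrative.

```python
def iterated_sum_naive(x, y):
    """Sum of x[i] * y[j] over all index pairs i < j, in O(n^2) time."""
    n = len(x)
    return sum(x[i] * y[j] for i in range(n) for j in range(i + 1, n))


def iterated_sum_fast(x, y):
    """Same quantity in O(n) time, using a running prefix sum of x."""
    total = 0.0
    prefix = 0.0  # holds x[0] + ... + x[j-1] when processing index j
    for xj, yj in zip(x, y):
        total += prefix * yj
        prefix += xj
    return total
```

For `x = [1, 2, 3]` and `y = [4, 5, 6]`, both functions return 23 (1·5 + 1·6 + 2·6).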



Article 1605

Title@2025-06-06 (5): Do-PFN: In-Context Learning for Causal Effect Estimation

Title: Do-PFN: In-Context Learning for Causal Effect Estimation Do-PFN: In-Context-Lernen für die Schätzung des Kausaleffekts DoPFN: 因果关系估算的理论内学习 2506.06039v1

Authors (6): Jake Robertson, Arik Reuter, Siyuan Guo, Noah Hollmann, Frank Hutter, Bernhard Schölkopf

Estimation of causal effects is critical to a range of scientific disciplines. Existing methods for this task either require interventional data, knowledge about the ground truth causal graph, or rely on assumptions such as unconfoundedness, restricting their applicability in real-world settings. In the domain of tabular machine learning, Prior-data fitted networks (PFNs) have achieved state-of-the-art predictive performance, having been pre-trained on synthetic data to solve tabular prediction problems via in-context learning. To assess whether this can be transferred to the harder problem of causal effect estimation, we pre-train PFNs on synthetic data drawn from a wide variety of causal structures, including interventions, to predict interventional outcomes given observational data. Through extensive experiments on synthetic case studies, we show that our approach allows for the accurate estimation of causal effects without knowledge of the underlying causal graph. We also perform ablation studies that elucidate Do-PFN’s scalability and robustness across datasets with a variety of causal characteristics.



Article 1606

Title@2025-06-06 (5): CRISP-NAM: Competing Risks Interpretable Survival Prediction with Neural Additive Models

Title: CRISP-NAM: Competing Risks Interpretable Survival Prediction with Neural Additive Models CRISP-NAM: Konkurrenzfähige Risiken interpretierbare Überlebensvorhersage mit neuralen Additivenmodellen CRIISP-NAM: 与神经添加模型相竞争的风险解释性生存预测 2505.21360v2

Authors (1): Dhanesh Ramachandram

Competing risks are crucial considerations in survival modelling, particularly in healthcare domains where patients may experience multiple distinct event types. We propose CRISP-NAM (Competing Risks Interpretable Survival Prediction with Neural Additive Models), an interpretable neural additive model for competing risks survival analysis which extends the neural additive architecture to model cause-specific hazards while preserving feature-level interpretability. Each feature contributes independently to risk estimation through dedicated neural networks, allowing for visualization of complex non-linear relationships between covariates and each competing risk. We demonstrate competitive performance on multiple datasets compared to existing approaches.



Article 1607

Title: A PID-Controlled Tensor Wheel Decomposition Model for Dynamic Link Prediction Ein PID-gesteuertes Tensor-Radzersetzungsmodell für dynamische Link-Vorhersage PID-控制式天线仪式轮式动态链接预测分解模型 2505.14211v2

Authors (2): Qu Wang, Yan Xia

Link prediction in dynamic networks remains a fundamental challenge in network science, requiring the inference of potential interactions and their evolving strengths through spatiotemporal pattern analysis. Traditional static network methods have inherent limitations in capturing temporal dependencies and weight dynamics, while tensor-based methods offer a promising paradigm by encoding dynamic networks into high-order tensors to explicitly model multidimensional interactions across nodes and time. Among them, tensor wheel decomposition (TWD) stands out for its innovative topological structure, which decomposes high-order tensors into cyclic factors and core tensors to maintain structural integrity. To improve the prediction accuracy, this study introduces a PID-controlled tensor wheel decomposition (PTWD) model, which mainly adopts the following two ideas: 1) exploiting the representation power of TWD to capture the latent features of dynamic network topology and weight evolution, and 2) integrating the proportional-integral-derivative (PID) control principle into the optimization process to obtain a stable model parameter learning scheme. The performance on four real datasets verifies that the proposed PTWD model has more accurate link prediction capabilities compared to other models.
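The second idea, folding a PID controller into the optimization process, can be sketched on a toy problem. The snippet below is an assumption-laden illustration, not the PTWD update rule: it fits a single scalar instead of tensor factors, and the gains `kp`, `ki`, `kd` and the learning rate are arbitrary. Plain gradient descent on the squared error would use the raw error in the update; here the error is first refined by proportional, integral, and derivative terms.

```python
def pid_refined_descent(target, theta=0.0, lr=0.1,
                        kp=1.0, ki=0.1, kd=0.1, steps=300):
    """Drive theta toward target using a PID-refined error signal."""
    integral = 0.0     # accumulated past errors (I term)
    prev_error = 0.0   # error from the previous step (for the D term)
    for _ in range(steps):
        error = target - theta             # instantaneous error (P term input)
        integral += error
        derivative = error - prev_error
        refined = kp * error + ki * integral + kd * derivative
        theta += lr * refined              # update with the refined error
        prev_error = error
    return theta
```

With these gains the iteration is stable and `pid_refined_descent(5.0)` settles close to 5.0; other gain choices can oscillate or diverge, which is why the control parameters need tuning in practice.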



Article 1608

Title@2025-06-06 (5): Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling

Title: Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling Neural Flow Diffusion Modelle: Erlernbarer Vorwärtsprozess für verbesserte Diffusionsmodellierung 神经流流传播模型:改进传播模型的可学习前进进程 2404.12940v3

Authors (3): Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth

Conventional diffusion models typically rely on a fixed forward process, which implicitly defines complex marginal distributions over latent variables. This can often complicate the reverse process’ task in learning generative trajectories, and results in costly inference for diffusion models. To address these limitations, we introduce Neural Flow Diffusion Models (NFDM), a novel framework that enhances diffusion models by supporting a broader range of forward processes beyond the standard Gaussian. We also propose a novel parameterization technique for learning the forward process. Our framework provides an end-to-end, simulation-free optimization objective, effectively minimizing a variational upper bound on the negative log-likelihood. Experimental results demonstrate NFDM’s strong performance, evidenced by state-of-the-art likelihood estimation. Furthermore, we investigate NFDM’s capacity for learning generative dynamics with specific characteristics, such as deterministic straight-line trajectories, and demonstrate how the framework may be adopted for learning bridges between two distributions. The results underscore NFDM’s versatility and its potential for a wide range of applications.



Article 1609

Title@2025-06-06 (5): SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations

Title: SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations SDE Matching: Skalierbares und simulationsfreies Training latenter stochastischer Differentialgleichungen SDE 匹配:可缩放和模拟无模拟的静态碎裂差异等量模拟培训 2502.02472v2

Authors (3): Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth

The Latent Stochastic Differential Equation (SDE) is a powerful tool for time series and sequence modeling. However, training Latent SDEs typically relies on adjoint sensitivity methods, which depend on simulation and on backpropagation through approximate SDE solutions, limiting scalability. In this work, we propose SDE Matching, a new simulation-free method for training Latent SDEs. Inspired by modern Score- and Flow Matching algorithms for learning generative dynamics, we extend these ideas to the domain of stochastic dynamics for time series and sequence modeling, eliminating the need for costly numerical simulations. Our results demonstrate that SDE Matching achieves performance comparable to adjoint sensitivity methods while drastically reducing computational complexity.



Article 1610

Title@2025-06-06 (5): Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification

Title: Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification Probenspezifische Geräuschinjektion für die Diffusions-basierte Adversarialreinigung 以传播为基础的反向净化的样本特定噪音注射 2506.06027v1

Authors (5): Yuhao Sun, Jiacheng Zhang, Zesheng Ye, Chaowei Xiao, Feng Liu

Diffusion-based purification (DBP) methods aim to remove adversarial noise from the input sample by first injecting Gaussian noise through a forward diffusion process, and then recovering the clean example through a reverse generative process. In this process, how much Gaussian noise is injected into the input sample is key to the success of DBP methods; in existing methods, it is controlled by a constant noise level $t^*$ for all samples. In this paper, we discover that the optimal $t^*$ can indeed differ from sample to sample. Intuitively, the cleaner a sample is, the less noise should be injected, and vice versa. Motivated by this finding, we propose a new framework, called Sample-specific Score-aware Noise Injection (SSNI). Specifically, SSNI uses a pre-trained score network to estimate how much a data point deviates from the clean data distribution (i.e., score norms). Then, based on the magnitude of score norms, SSNI applies a reweighting function to adaptively adjust $t^*$ for each sample, achieving sample-specific noise injections. Empirically, incorporating our framework with existing DBP methods results in a notable improvement in both accuracy and robustness on CIFAR-10 and ImageNet-1K, highlighting the necessity to allocate distinct noise levels to different samples in DBP methods. Our code is available at: https://github.com/tmlr-group/SSNI.
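The idea of turning a score norm into a per-sample noise level can be sketched as follows. This is a hypothetical reweighting, not the function used in SSNI: the sigmoid form, the `reference_norm` and `temperature` parameters, and the function name are all assumptions made for illustration. The only property it is meant to show is monotonicity: a larger score norm (a sample further from the clean distribution) maps to a larger noise level.

```python
import math


def sample_specific_t(score_norm, t_max, reference_norm, temperature=1.0):
    """Map a score norm to a noise level in [0, t_max] (hypothetical sigmoid rule)."""
    # Sigmoid weight in (0, 1): 0.5 when the norm equals the reference norm.
    weight = 1.0 / (1.0 + math.exp(-(score_norm - reference_norm) / temperature))
    # Scale to the allowed range of diffusion time steps.
    return round(t_max * weight)
```

For example, with `t_max=100` and `reference_norm=2.0`, a sample whose score norm equals the reference gets a noise level of 50, while cleaner samples (smaller norms) get less noise and noisier ones get more.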



Article 1611

Title@2025-06-06 (5): On Inverse Problems, Parameter Estimation, and Domain Generalization

Title: On Inverse Problems, Parameter Estimation, and Domain Generalization Inverse Probleme, Parameterschätzung und Domain-Verallgemeinerung 反向问题、参数估计和域通用化 2506.06024v1

Authors (1): Deborah Pereg

Signal restoration and inverse problems are key elements in most real-world data science applications. In the past decades, with the emergence of machine learning methods, inversion of measurements has become a popular step in almost all physical applications, which is normally executed prior to downstream tasks that often involve parameter estimation. In this work, we analyze the general problem of parameter estimation in an inverse problem setting. First, we address the domain-shift problem by re-formulating it in direct relation with the discrete parameter estimation analysis. We analyze a significant vulnerability in current attempts to enforce domain generalization, which we dubbed the Double Meaning Theorem. Our theoretical findings are experimentally illustrated for domain shift examples in image deblurring and speckle suppression in medical imaging. We then proceed to a theoretical analysis of parameter estimation given observed measurements before and after data processing involving an inversion of the observations. We compare this setting for invertible and non-invertible (degradation) processes. We distinguish between continuous and discrete parameter estimation, corresponding with regression and classification problems, respectively. Our theoretical findings align with the well-known information-theoretic data processing inequality, and to a certain degree question the common misconception that data-processing for inversion, based on modern generative models that may often produce outstanding perceptual quality, will necessarily improve the following parameter estimation objective. It is our hope that this paper will provide practitioners with deeper insights that may be leveraged in the future for the development of more efficient and informed strategic system planning, critical in safety-sensitive applications.



Article 1612

Title@2025-06-06 (5): Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation

Title: Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation Begründung durch Ausführung: Vereinheitlichung von Prozess- und Ergebnisprämien für die Codegenerierung 执行中的理由:代码生成的统一程序和结果奖励 2412.15118v2

Authors (8): Zhuohao Yu, Weizheng Gu, Yidong Wang, Xingru Jiang, Zhengran Zeng, Jindong Wang, Wei Ye, Shikun Zhang

Large Language Models excel at code generation yet struggle with complex programming tasks that demand sophisticated reasoning. To bridge this gap, traditional process supervision relies on learned reward models requiring costly training data and suffering from reward misalignment, while outcome supervision fails for complex tasks needing coordinated intermediate steps. We introduce Outcome Refining Process Supervision (ORPS), which unifies process and outcome supervision by leveraging executable verification: a tree-structured search framework generates strategic alternatives, profiles execution metrics, and scores candidates via self-critique mechanisms that integrate runtime feedback with reasoning. Experiments across 5 models and 3 benchmarks show consistent gains, with 26.9% higher correctness and 42.2% improved code efficiency. The results demonstrate that ORPS enables LLMs to overcome local optima in code generation, suggesting a promising direction for combining verifiable outcomes with structured reasoning to tackle complex challenges. We open-source at: https://github.com/zhuohaoyu/ORPS



Article 1613

Title@2025-06-06 (5): Improving Customer Service with Automatic Topic Detection in User Emails

Title: Improving Customer Service with Automatic Topic Detection in User Emails Verbesserung des Kundenservices mit automatischer Themenerkennung in Benutzer-E-Mails 改进用户电子邮件自动专题检测客户服务 2502.19115v3

Authors (5): Bojana Bašaragin, Darija Medvecki, Gorana Gojić, Milena Oparnica, Dragiša Mišković

This study introduces a novel natural language processing pipeline that enhances customer service efficiency at Telekom Srbija, a leading Serbian telecommunications company, through automated email topic detection and labeling. Central to the pipeline is BERTopic, a modular framework that allows unsupervised topic modeling. After a series of preprocessing and postprocessing steps, we assign one of 12 topics and several additional labels to incoming emails, allowing customer service to filter and access them through a custom-made application. While applied to Serbian, the methodology is conceptually language-agnostic and can be readily adapted to other languages, particularly those that are low-resourced and morphologically rich. The system performance was evaluated by assessing the speed and correctness of the automatically assigned topics, with a weighted average processing time of 0.041 seconds per email and a weighted average F1 score of 0.96. The system now operates in the company’s production environment, streamlining customer service operations through automated email classification.



Article 1614

Title@2025-06-06 (5): Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems

Title: Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems Unisoma: Ein Unified Transformer-basierter Solver für Multi-Solid-Systeme Unisoma:多层系统统一变压器解决方案 2506.06021v1

Authors (5): Shilong Tao, Zhe Feng, Haonan Sun, Zhanxing Zhu, Yunhuai Liu

Multi-solid systems are foundational to a wide range of real-world applications, yet modeling their complex interactions remains challenging. Existing deep learning methods predominantly rely on implicit modeling, where the factors influencing solid deformation are not explicitly represented but are instead indirectly learned. However, as the number of solids increases, these methods struggle to accurately capture intricate physical interactions. In this paper, we introduce a novel explicit modeling paradigm that incorporates factors influencing solid deformation through structured modules. Specifically, we present Unisoma, a unified and flexible Transformer-based model capable of handling variable numbers of solids. Unisoma directly captures physical interactions using contact modules and an adaptive interaction allocation mechanism, and learns the deformation through a triplet relationship. Compared to implicit modeling techniques, explicit modeling is better suited to multi-solid systems with diverse coupling patterns, as it enables detailed treatment of each solid while preventing information blending and confusion. Experimentally, Unisoma achieves consistent state-of-the-art performance across seven well-established datasets and two complex multi-solid tasks. Code is available at https://github.com/therontau0054/Unisoma.



Article 1615

Title@2025-06-06 (5): One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy

Title: One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy Ein Stein, zwei Vögel: Verstärkung der feindlichen Verteidigung durch die Linse der distributiven Diskrepanz 一石,两鸟:通过分布性差异的镜头加强反向防御 2503.02169v2

Authors (4): Jiacheng Zhang, Benjamin I. P. Rubinstein, Jingfeng Zhang, Feng Liu

Statistical adversarial data detection (SADD) detects whether an upcoming batch contains adversarial examples (AEs) by measuring the distributional discrepancies between clean examples (CEs) and AEs. In this paper, we explore the strength of SADD-based methods by theoretically showing that minimizing distributional discrepancy can help reduce the expected loss on AEs. Despite these advantages, SADD-based methods have a potential limitation: they discard inputs that are detected as AEs, leading to the loss of useful information within those inputs. To address this limitation, we propose a two-pronged adversarial defense method, named Distributional-discrepancy-based Adversarial Defense (DAD). In the training phase, DAD first optimizes the test power of the maximum mean discrepancy (MMD) to derive MMD-OPT, which is a stone that kills two birds. MMD-OPT first serves as a guiding signal to minimize the distributional discrepancy between CEs and AEs to train a denoiser. Then, it serves as a discriminator to differentiate CEs and AEs during inference. Overall, in the inference stage, DAD consists of a two-pronged process: (1) directly feeding the detected CEs into the classifier, and (2) removing noise from the detected AEs by the distributional-discrepancy-based denoiser. Extensive experiments show that DAD outperforms current state-of-the-art (SOTA) defense methods by simultaneously improving clean and robust accuracy on CIFAR-10 and ImageNet-1K against adaptive white-box attacks. Codes are publicly available at: https://github.com/tmlr-group/DAD.
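The distributional discrepancy that the abstract builds on can be made concrete with a small estimator. The sketch below computes the standard biased (V-statistic) estimate of squared maximum mean discrepancy with a fixed Gaussian kernel. It is not MMD-OPT, which additionally optimizes the kernel for test power; the fixed `bandwidth` and the toy samples are assumptions.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    # E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy one-dimensional samples (made up for illustration).
clean = [(0.0,), (0.1,), (-0.1,), (0.05,)]
shifted = [(3.0,), (3.1,), (2.9,), (3.05,)]
```

Here `mmd_squared(clean, clean)` is zero, while `mmd_squared(clean, shifted)` is clearly positive, which is the signal a discrepancy-based detector thresholds on.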



Article 1616

Title@2025-06-06 (5): A Theoretical Justification for Asymmetric Actor-Critic Algorithms

Title: A Theoretical Justification for Asymmetric Actor-Critic Algorithms Eine theoretische Rechtfertigung für asymmetrische Aktor-Kritische Algorithmen 关于非对称精算作用者-批评算法的理论理由 2501.19116v2

Authors (3): Gaspard Lambrechts, Damien Ernst, Aditya Mahajan

In reinforcement learning for partially observable environments, many successful algorithms have been developed within the asymmetric learning paradigm. This paradigm leverages additional state information available at training time for faster learning. Although the proposed learning objectives are usually theoretically sound, these methods still lack a precise theoretical justification for their potential benefits. We propose such a justification for asymmetric actor-critic algorithms with linear function approximators by adapting a finite-time convergence analysis to this setting. The resulting finite-time bound reveals that the asymmetric critic eliminates error terms arising from aliasing in the agent state.



Article 1617

Title@2025-06-06 (5): A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces

Title: A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces Ein Fisher-Rao Gradientenstrom für Entropie-regularisierte Markov-Entscheidungsprozesse in polnischen Räumen 用于波兰空间的环丙正规化的Markov决策过程的Fisher-Rao梯度流 2310.02951v3

Authors (5): Bekzhan Kerimkulov, James-Michael Leahy, David Siska, Lukasz Szpruch, Yufei Zhang

We study the global convergence of a Fisher-Rao policy gradient flow for infinite-horizon entropy-regularised Markov decision processes with Polish state and action space. The flow is a continuous-time analogue of a policy mirror descent method. We establish the global well-posedness of the gradient flow and demonstrate its exponential convergence to the optimal policy. Moreover, we prove the flow is stable with respect to gradient evaluation, offering insights into the performance of a natural policy gradient flow with log-linear policy parameterisation. To overcome challenges stemming from the lack of the convexity of the objective function and the discontinuity arising from the entropy regulariser, we leverage the performance difference lemma and the duality relationship between the gradient and mirror descent flows. Our analysis provides a theoretical foundation for developing various discrete policy gradient algorithms.



Article 1618

Title@2025-06-06 (5): pLDDT-Predictor: High-speed Protein Screening Using Transformer and ESM2

Title: pLDDT-Predictor: High-speed Protein Screening Using Transformer and ESM2 pLDDT-Predictor: High-Speed Protein Screening mit Transformer und ESM2 pLDDT-预测:使用变压器和无害环境管理2进行高速蛋白素筛选 2410.21283v3

Authors (6): Joongwon Chae, Zhenyu Wang, Ijaz Gul, Jiansong Ji, Zhenglin Chen, Peiwu Qin

Recent advancements in protein structure prediction, particularly AlphaFold2, have revolutionized structural biology by achieving near-experimental accuracy (average RMSD < 1.5 Å). However, the computational demands of these models (approximately 30 minutes per protein on an RTX 4090) significantly limit their application in high-throughput protein screening. While large language models like ESM (Evolutionary Scale Modeling) have shown promise in extracting structural information directly from protein sequences, rapid assessment of protein structure quality for large-scale analyses remains a major challenge. We introduce pLDDT-Predictor, a high-speed protein screening tool that achieves a 250,000× speedup compared to AlphaFold2 by leveraging pre-trained ESM2 protein embeddings and a Transformer architecture. Our model predicts AlphaFold2’s pLDDT (predicted Local Distance Difference Test) scores with a Pearson correlation of 0.7891 and processes proteins in just 0.007 seconds on average. Using a comprehensive dataset of 1.5 million diverse protein sequences (ranging from 50 to 2048 amino acids), we demonstrate that pLDDT-Predictor accurately classifies high-confidence structures (pLDDT > 70) with 91.2% accuracy and achieves an MSE of 84.8142 compared to AlphaFold2’s predictions. The source code and pre-trained models are freely available at https://github.com/jw-chae/pLDDT_Predictor, enabling the research community to perform rapid, large-scale protein structure quality assessments.
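The quoted speedup can be checked against the two per-protein timings given in the abstract. The short calculation below uses only those figures (about 30 minutes versus 0.007 seconds) and, as an added assumption, sequential processing of the 1.5 million sequences on a single device to translate the timings into total screening time.

```python
# Per-protein timings taken from the abstract.
alphafold2_seconds = 30 * 60      # approximately 30 minutes
predictor_seconds = 0.007         # average time per protein

# Ratio of the two timings, to compare with the quoted 250,000x figure.
speedup = alphafold2_seconds / predictor_seconds

# Total time for 1.5 million proteins, assuming strictly sequential processing.
num_proteins = 1_500_000
predictor_hours = num_proteins * predictor_seconds / 3600
alphafold2_years = num_proteins * alphafold2_seconds / (365.25 * 24 * 3600)

print(f"speedup: {speedup:,.0f}x")
print(f"pLDDT-Predictor: {predictor_hours:.1f} hours")
print(f"AlphaFold2: {alphafold2_years:.1f} years")
```

The ratio comes out at roughly 257,000, consistent with the rounded 250,000× claim; under the sequential assumption, the full dataset takes about three hours with the predictor versus decades with AlphaFold2.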



Article 1619

Title@2025-06-06 (5): Clone-Robust Weights in Metric Spaces: Handling Redundancy Bias for Benchmark Aggregation

Title: Clone-Robust Weights in Metric Spaces: Handling Redundancy Bias for Benchmark Aggregation Klon-Robuste Gewichte in metrischen Räumen: Umgang mit Redundanz-Bias für Benchmark-Aggregation 中度空间的克隆-罗布体重量:为基准聚合处理冗余琵琶湖 2502.03576v2

Authors (2): Damien Berriaud, Roger Wattenhofer

We are given a set of elements in a metric space. The distribution of the elements is arbitrary, possibly adversarial. Can we weigh the elements in a way that is resistant to such (adversarial) manipulations? This problem arises in various contexts. For instance, the elements could represent data points, requiring robust domain adaptation. Alternatively, they might represent tasks to be aggregated into a benchmark; or questions about personal political opinions in voting advice applications. This article introduces a theoretical framework for dealing with such problems. We propose clone-proof weighting functions as a solution concept. These functions distribute importance across elements of a set such that similar objects ("clones") share (some of) their weights, thus avoiding a potential bias introduced by their multiplicity. Our framework extends the maximum uncertainty principle to accommodate general metric spaces and includes a set of axioms – symmetry, continuity, and clone-proofness – that guide the construction of weighting functions. Finally, we address the existence of weighting functions satisfying our axioms in the significant case of Euclidean spaces and propose a general method for their construction.



Article 1620

Title@2025-06-06 (5): LightGTS: A Lightweight General Time Series Forecasting Model

Title: LightGTS: A Lightweight General Time Series Forecasting Model LightGTS: Ein leichtes allgemeines Zeitreihen-Vorhersagemodell LightGTS:轻型一般时间序列预测模型 2506.06005v1

Authors (8): Yihang Wang, Yuying Qiu, Peng Chen, Yang Shu, Zhongwen Rao, Lujia Pan, Bin Yang, Chenjuan Guo

Existing works on general time series forecasting build foundation models with heavy model parameters through large-scale multi-source pre-training. These models achieve superior generalization ability across various datasets at the cost of significant computational burdens and limitations in resource-constrained scenarios. This paper introduces LightGTS, a lightweight general time series forecasting model designed from the perspective of consistent periodical modeling. To handle diverse scales and intrinsic periods in multi-source pre-training, we introduce Periodical Tokenization, which extracts consistent periodic patterns across different datasets with varying scales. To better utilize the periodicity in the decoding process, we further introduce Periodical Parallel Decoding, which leverages historical tokens to improve forecasting. Based on the two techniques above which fully leverage the inductive bias of periods inherent in time series, LightGTS uses a lightweight model to achieve outstanding performance on general time series forecasting. It achieves state-of-the-art forecasting performance on 9 real-world benchmarks in both zero-shot and full-shot settings with much better efficiency compared with existing time series foundation models.



Article 1621

Title@2025-06-06 (5): What Really is a Member? Discrediting Membership Inference via Poisoning

Title: What Really is a Member? Discrediting Membership Inference via Poisoning Was ist wirklich ein Mitglied? Diskreditierende Mitgliedschaftsinferenz durch Vergiftung 真正的成员是什么? 2506.06003v1

Authors (8): Neal Mangaokar, Ashish Hooda, Zhuohang Li, Bradley A. Malin, Kassem Fawaz, Somesh Jha, Atul Prakash, Amrita Roy Chowdhury

Membership inference tests aim to determine whether a particular data point was included in a language model’s training set. However, recent works have shown that such tests often fail under the strict definition of membership based on exact matching, and have suggested relaxing this definition to include semantic neighbors as members as well. In this work, we show that membership inference tests are still unreliable under this relaxation - it is possible to poison the training dataset in a way that causes the test to produce incorrect predictions for a target point. We theoretically reveal a trade-off between a test’s accuracy and its robustness to poisoning. We also present a concrete instantiation of this poisoning attack and empirically validate its effectiveness. Our results show that it can degrade the performance of existing tests to well below random.



Article 1622

Title@2025-06-06 (5): LaDEEP: A Deep Learning-based Surrogate Model for Large Deformation of Elastic-Plastic Solids

Title: LaDEEP: A Deep Learning-based Surrogate Model for Large Deformation of Elastic-Plastic Solids LaDEEP: Ein Deep Learning-basiertes Surrogate-Modell für große Deformation von elastischen-Plastischen Festkörpern LaDEEP:一个基于深层学习的弹性固体大规模变形代金模型 2506.06001v1

Authors (5): Shilong Tao, Zhe Feng, Haonan Sun, Zhanxing Zhu, Yunhuai Liu

Scientific computing for large deformation of elastic-plastic solids is critical for numerous real-world applications. Classical numerical solvers rely primarily on local discrete linear approximation and are constrained by an inherent trade-off between accuracy and efficiency. Recently, deep learning models have achieved impressive progress in solving continuum mechanics problems. While previous models have explored various architectures and constructed coefficient-solution mappings, they are designed for general instances without considering specific problem properties and struggle to accurately handle complex elastic-plastic solids involving contact, loading and unloading. In this work, we take stretch bending, a popular metal fabrication technique, as our case study and introduce LaDEEP, a deep learning-based surrogate model for Large Deformation of Elastic-Plastic Solids. We encode the partitioned regions of the involved slender solids into a token sequence to maintain their essential order property. To characterize the physical process of the solid deformation, a two-stage Transformer-based module is designed to predict the deformation with the sequence of tokens as input. Empirically, LaDEEP runs five orders of magnitude faster than finite element methods with comparable accuracy, and achieves a 20.47% relative improvement on average compared to other deep learning baselines. We have also deployed our model in a real-world industrial production system, and it has shown remarkable performance in both accuracy and efficiency.



Article 1623

Title@2025-06-06 (5): Machine learning for in-situ composition mapping in a self-driving magnetron sputtering system

Title: Machine learning for in-situ composition mapping in a self-driving magnetron sputtering system Maschinelles Lernen für die In-situ-Kompositionsmapping in einem selbstfahrenden Magnetron-Sputtersystem 在自驾驶磁磁突飞系统内进行原地成像绘图的机器学习 2506.05999v1

Authors (5): Sanna Jarl, Jens Sjölund, Robert J. W. Frost, Anders Holst, Jonathan J. S. Scragg

Self-driving labs (SDLs), employing automation and machine learning (ML) to accelerate experimental procedures, have enormous potential in the discovery of new materials. However, in thin film science, SDLs are mainly restricted to solution-based synthetic methods which are easier to automate but cannot access the broad chemical space of inorganic materials. This work presents an SDL based on magnetron co-sputtering. We use combinatorial frameworks, which depend on accurate composition maps of multi-element, compositionally graded thin films. Obtaining these maps normally requires time-consuming ex-situ analysis prone to systematic errors. We present a rapid and calibration-free in-situ, ML-driven approach to produce composition maps for arbitrary source combinations and sputtering conditions. We develop a method to predict the composition distribution in a multi-element combinatorial thin film, using in-situ measurements from quartz-crystal microbalance sensors placed in a sputter chamber. For a given source, the sensor readings are learned as a function of the sputtering pressure and magnetron power, through active learning using Gaussian processes (GPs). The final GPs are combined with a geometric model of the deposition flux distribution in the chamber, which allows interpolation of the deposition rates from each source, at any position across the sample. We investigate several acquisition functions for the ML procedure. A fully Bayesian GP - BALM (Bayesian active learning MacKay) - achieved the best performance, learning the deposition rates for a single source in 10 experiments. Prediction accuracy for co-sputtering composition distributions was verified experimentally. Our framework dramatically increases throughput by avoiding the need for extensive characterisation or calibration, thus demonstrating the potential of ML-guided SDLs to accelerate materials exploration.



Article 1624

Title@2025-06-06 (5): RETENTION: Resource-Efficient Tree-Based Ensemble Model Acceleration with Content-Addressable Memory

Title: RETENTION: Resource-Efficient Tree-Based Ensemble Model Acceleration with Content-Addressable Memory RETENTION: Ressourceneffizientes baumbasiertes Ensemblemodell Beschleunigung mit Content-Addressablem Speicher 消耗:基于资源的高效树基集合模型加速,内装内存 2506.05994v1

Authors (6): Yi-Chun Liao, Chieh-Lin Tsai, Yuan-Hao Chang, Camélia Slimani, Jalil Boukhobza, Tei-Wei Kuo

Although deep learning has demonstrated remarkable capabilities in learning from unstructured data, modern tree-based ensemble models remain superior in extracting relevant information and learning from structured datasets. While several efforts have been made to accelerate tree-based models, the inherent characteristics of the models pose significant challenges for conventional accelerators. Recent research leveraging content-addressable memory (CAM) offers a promising solution for accelerating tree-based models, yet existing designs suffer from excessive memory consumption and low utilization. This work addresses these challenges by introducing RETENTION, an end-to-end framework that significantly reduces the CAM capacity requirement for tree-based model inference. We propose an iterative pruning algorithm with a novel pruning criterion tailored for bagging-based models (e.g., Random Forest), which minimizes model complexity while ensuring controlled accuracy degradation. Additionally, we present a tree mapping scheme that incorporates two innovative data placement strategies to alleviate the memory redundancy caused by the widespread use of don’t-care states in CAM. Experimental results show that implementing the tree mapping scheme alone achieves $1.46\times$ to $21.30\times$ better space efficiency, while the full RETENTION framework yields a $4.35\times$ to $207.12\times$ improvement with less than 3% accuracy loss. These results demonstrate that RETENTION is highly effective in reducing the CAM capacity requirement, providing a resource-efficient direction for tree-based model acceleration.



Article 1625

Title@2025-06-06 (5): Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning

Title: Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning Diffusions-Politiken für die Verallgemeinerung von Out-of-Distribution im Offline-Verstärkungs-Lernen 在离线强化学习中推广分配外普遍化政策 2307.04726v4

Authors (3): Suzan Ece Ada, Erhan Oztop, Emre Ugur

Offline Reinforcement Learning (RL) methods leverage previous experiences to learn better policies than the behavior policy used for data collection. However, they face challenges handling distribution shifts due to the lack of online interaction during training. To this end, we propose a novel method named State Reconstruction for Diffusion Policies (SRDP) that incorporates state reconstruction feature learning in the recent class of diffusion policies to address the problem of out-of-distribution (OOD) generalization. Our method promotes learning of generalizable state representation to alleviate the distribution shift caused by OOD states. To illustrate the OOD generalization and faster convergence of SRDP, we design a novel 2D Multimodal Contextual Bandit environment and realize it on a 6-DoF real-world UR10 robot, as well as in simulation, and compare its performance with prior algorithms. In particular, we show the importance of the proposed state reconstruction via ablation studies. In addition, we assess the performance of our model on standard continuous control benchmarks (D4RL), namely the navigation of an 8-DoF ant and forward locomotion of half-cheetah, hopper, and walker2d, achieving state-of-the-art results. Finally, we demonstrate that our method can achieve 167% improvement over the competing baseline on a sparse continuous control navigation task where various regions of the state space are removed from the offline RL dataset, including the region encapsulating the goal.



Article 1626

Title@2025-06-06 (5): Peri-LN: Revisiting Normalization Layer in the Transformer Architecture

Title: Peri-LN: Revisiting Normalization Layer in the Transformer Architecture Peri-LN: Normalisierungsschicht in der Transformer-Architektur neu aufsuchen Peri-LN: 变形器结构中重新审视正常化层 2502.02732v3

Authors (10): Jeonghoon Kim, Byeongchan Lee, Cheonbok Park, Yeontaek Oh, Beomjun Kim, Taehwan Yoo, Seongjin Shin, Dongyoon Han, Jinwoo Shin, Kang Min Yoo

Selecting a layer normalization (LN) strategy that stabilizes training and speeds convergence in Transformers remains difficult, even for today’s large language models (LLMs). We present a comprehensive analytical foundation for understanding how different LN strategies influence training dynamics in large-scale Transformers. Pre-LN and Post-LN have long dominated practice despite their limitations in large-scale training. However, several open-source models have recently begun silently adopting a third strategy without much explanation. This strategy places normalization layers peripherally around sublayers, a design we term Peri-LN. While Peri-LN has demonstrated promising performance, its precise mechanisms and benefits remain almost unexplored. Our in-depth analysis delineates the distinct behaviors of LN strategies, showing how each placement shapes activation variance and gradient propagation. To validate our theoretical insight, we conduct extensive experiments on Transformers up to 3.2B parameters, showing that Peri-LN consistently achieves more balanced variance growth, steadier gradient flow, and convergence stability. Our results suggest that Peri-LN warrants broader consideration for large-scale Transformer architectures, providing renewed insights into the optimal placement of LN.
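
The three placements named in this abstract can be contrasted with a small sketch. It is only an illustration under assumptions: the Peri-LN form used here (normalize the sublayer input and its output before the residual addition) is one reading of "peripherally around sublayers", `toy_sublayer` is a hypothetical stand-in for attention or an MLP, and the layer norm has no learned scale or shift.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def toy_sublayer(x):
    """Hypothetical stand-in for attention/MLP: scales and shifts its input."""
    return [3.0 * v + 1.0 for v in x]

def add(a, b):
    return [u + v for u, v in zip(a, b)]

def post_ln_block(x, f=toy_sublayer):
    # Post-LN: normalize after the residual addition.
    return layer_norm(add(x, f(x)))

def pre_ln_block(x, f=toy_sublayer):
    # Pre-LN: normalize only the sublayer input.
    return add(x, f(layer_norm(x)))

def peri_ln_block(x, f=toy_sublayer):
    # Peri-LN (assumed form): normalize the sublayer input and its output.
    return add(x, layer_norm(f(layer_norm(x))))

def variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def hidden_variance_after(block, depth, x):
    """Variance of the hidden state after stacking `depth` identical blocks."""
    for _ in range(depth):
        x = block(x)
    return variance(x)
```

Calling `hidden_variance_after` with each block on the same input gives a rough feel for how placement affects the growth of the residual stream in this toy setting; it says nothing about trained models.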



Article 1627

Title@2025-06-06 (5): Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning

Title: Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning Dynamische Mischung von Progressivem Parameter-Effizienter Expertenbibliothek für lebenslanges Roboterlernen 终身机器人学习进步参数有效专家图书馆动态混合体 2506.05985v1

Authors (6): Yuheng Lei, Sitong Mao, Shunbo Zhou, Hongyuan Zhang, Xuelong Li, Ping Luo

A generalist agent must continuously learn and adapt throughout its lifetime, achieving efficient forward transfer while minimizing catastrophic forgetting. Previous work within the dominant pretrain-then-finetune paradigm has explored parameter-efficient fine-tuning for single-task adaptation, effectively steering a frozen pretrained model with a small number of parameters. However, in the context of lifelong learning, these methods rely on the impractical assumption of a test-time task identifier and restrict knowledge sharing among isolated adapters. To address these limitations, we propose Dynamic Mixture of Progressive Parameter-Efficient Expert Library (DMPEL) for lifelong robot learning. DMPEL progressively learns a low-rank expert library and employs a lightweight router to dynamically combine experts into an end-to-end policy, facilitating flexible behavior during lifelong adaptation. Moreover, by leveraging the modular structure of the fine-tuned parameters, we introduce coefficient replay to guide the router in accurately retrieving frozen experts for previously encountered tasks, thereby mitigating catastrophic forgetting. This method is significantly more storage- and computationally-efficient than applying demonstration replay to the entire policy. Extensive experiments on the lifelong manipulation benchmark LIBERO demonstrate that our framework outperforms state-of-the-art lifelong learning methods in success rates across continual adaptation, while utilizing minimal trainable parameters and storage.
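
A rough sketch of how a router might combine low-rank experts on top of a frozen weight matrix follows. Everything in it is an assumption made for illustration: the softmax router, the `(A, B)` factor layout, and the function names are hypothetical and are not taken from DMPEL.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def mixed_low_rank_forward(x, frozen_W, experts, router_scores):
    """Compute y = W x + sum_i c_i * B_i (A_i x), with c = softmax(router_scores).

    `experts` is a list of (A, B) pairs: A maps the input to a small rank,
    B maps that rank back to the output size.
    """
    coeffs = softmax(router_scores)
    y = matvec(frozen_W, x)
    for c, (A, B) in zip(coeffs, experts):
        delta = matvec(B, matvec(A, x))
        y = [yi + c * di for yi, di in zip(y, delta)]
    return y, coeffs
```

The returned coefficients are small compared with stored demonstrations, which hints at why replaying coefficients could be cheaper than replaying data, although the abstract does not spell out the mechanism.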



Article 1628

Title@2025-06-06 (5): Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

Title: Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations Jacobian Sparse Autoencoder: Sparsify Computations, nicht nur Aktivierungen Jacobian Sparse Autoencoders: 简化计算,而不仅仅是激活 2502.18147v2

Authors (4): Lucy Farnik, Tim Lawson, Conor Houghton, Laurence Aitchison

Sparse autoencoders (SAEs) have been successfully used to discover sparse and human-interpretable representations of the latent activations of LLMs. However, we would ultimately like to understand the computations performed by LLMs and not just their representations. The extent to which SAEs can help us understand computations is unclear because they are not designed to “sparsify” computations in any sense, only latent activations. To solve this, we propose Jacobian SAEs (JSAEs), which yield not only sparsity in the input and output activations of a given model component but also sparsity in the computation (formally, the Jacobian) connecting them. With a naive implementation, the Jacobians in LLMs would be computationally intractable due to their size. One key technical contribution is thus finding an efficient way of computing Jacobians in this setup. We find that JSAEs extract a relatively large degree of computational sparsity while preserving downstream LLM performance approximately as well as traditional SAEs. We also show that Jacobians are a reasonable proxy for computational sparsity because MLPs are approximately linear when rewritten in the JSAE basis. Lastly, we show that JSAEs achieve a greater degree of computational sparsity on pre-trained LLMs than on the equivalent randomized LLM. This shows that the sparsity of the computational graph appears to be a property that LLMs learn through training, and suggests that JSAEs might be more suitable for understanding learned transformer computations than standard SAEs.



Article 1629

Title@2025-06-06 (5): A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations

Title: A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations Eine Übersicht über automatische Bewertungsmethoden zu Text-, Bild- und Sprachgenerierungen 文字、视觉和语音一代自动评价方法调查 2506.10019v1

Authors (11): Tian Lan, Yang-Hao Zhou, Zi-Ao Ma, Fanshu Sun, Rui-Qing Sun, Junyu Luo, Rong-Cheng Tu, Heyan Huang, Chen Xu, Zhijing Wu, Xian-Ling Mao

Recent advances in deep learning have significantly enhanced generative AI capabilities across text, images, and audio. However, automatically evaluating the quality of these generated outputs presents ongoing challenges. Although numerous automatic evaluation methods exist, current research lacks a systematic framework that comprehensively organizes these methods across text, visual, and audio modalities. To address this issue, we present a comprehensive review and a unified taxonomy of automatic evaluation methods for generated content across all three modalities. We identify five fundamental paradigms that characterize existing evaluation approaches across these domains. Our analysis begins by examining evaluation methods for text generation, where techniques are most mature. We then extend this framework to image and audio generation, demonstrating its broad applicability. Finally, we discuss promising directions for future research in cross-modal evaluation methodologies.



Article 1630

Title@2025-06-06 (5): HO-FMN: Hyperparameter Optimization for Fast Minimum-Norm Attacks

Title: HO-FMN: Hyperparameter Optimization for Fast Minimum-Norm Attacks HO-FMN: Hyperparameter-Optimierung für schnelle Minimal-Norm-Angriffe HO-FMN: 快速最低目标攻击的超参数优化 2407.08806v2

Authors (9): Raffaele Mura, Giuseppe Floris, Luca Scionis, Giorgio Piras, Maura Pintor, Ambra Demontis, Giorgio Giacinto, Battista Biggio, Fabio Roli

Gradient-based attacks are a primary tool to evaluate robustness of machine-learning models. However, many attacks tend to provide overly-optimistic evaluations as they use fixed loss functions, optimizers, step-size schedulers, and default hyperparameters. In this work, we tackle these limitations by proposing a parametric variation of the well-known fast minimum-norm attack algorithm, whose loss, optimizer, step-size scheduler, and hyperparameters can be dynamically adjusted. We re-evaluate 12 robust models, showing that our attack finds smaller adversarial perturbations without requiring any additional tuning. This also enables reporting adversarial robustness as a function of the perturbation budget, providing a more complete evaluation than that offered by fixed-budget attacks, while remaining efficient. We release our open-source code at https://github.com/pralab/HO-FMN.
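
Reporting robustness as a function of the perturbation budget can be sketched as below. The helper name and the input convention are assumptions for illustration: `min_norms[i]` is the norm of the smallest adversarial perturbation an attack found for sample `i`, with `0.0` for samples the model already misclassifies. Because an attack only finds an upper bound on the true minimum, the resulting curve is an upper bound on robust accuracy.

```python
def robust_accuracy_curve(min_norms, budgets):
    """Fraction of samples that stay correctly classified at each budget.

    A sample counts as robust at budget eps when the smallest adversarial
    perturbation found for it has norm strictly greater than eps.
    """
    n = len(min_norms)
    return [sum(1 for d in min_norms if d > eps) / n for eps in budgets]

# Hypothetical per-sample minimum norms and a few budgets to evaluate.
example_norms = [0.0, 0.2, 0.5, 0.5, 1.2]
example_budgets = [0.0, 0.25, 0.5, 1.0]
example_curve = robust_accuracy_curve(example_norms, example_budgets)
```

A single minimum-norm run therefore yields the whole curve, whereas a fixed-budget attack would have to be rerun for every budget of interest.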



Article 1631

Title@2025-06-06 (5): AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification

Title: AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification AMPED: Adaptive Multi-Objektive Projektion zum Ausgleich von Exploration und Geschicklichkeit Diversifizierung AMPED: 兼顾探索与技能多样化的适应性多目标预测 2506.05980v1

Authors (6): Geonwoo Cho, Jaemoon Lee, Jaegyun Im, Subi Lee, Jihwan Lee, Sundong Kim

Skill-based reinforcement learning (SBRL) enables rapid adaptation in environments with sparse rewards by pretraining a skill-conditioned policy. Effective skill learning requires jointly maximizing both exploration and skill diversity. However, existing methods often face challenges in simultaneously optimizing for these two conflicting objectives. In this work, we propose a new method, Adaptive Multi-objective Projection for balancing Exploration and skill Diversification (AMPED), which explicitly addresses both exploration and skill diversification. We begin by conducting extensive ablation studies to identify and define a set of objectives that effectively capture the aspects of exploration and skill diversity, respectively. During the skill pretraining phase, AMPED introduces a gradient surgery technique to balance the objectives of exploration and skill diversity, mitigating conflicts and reducing reliance on heuristic tuning. In the subsequent fine-tuning phase, AMPED incorporates a skill selector module that dynamically selects suitable skills for downstream tasks, based on task-specific performance signals. Our approach achieves performance that surpasses SBRL baselines across various benchmarks. These results highlight the importance of explicitly harmonizing exploration and diversity and demonstrate the effectiveness of AMPED in enabling robust and generalizable skill learning. Project Page: https://geonwoo.me/amped/
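
Gradient surgery between two objectives is often implemented as a projection in the style of PCGrad: when two gradients point in conflicting directions, each has its component along the other removed. The sketch below shows that generic projection on plain Python lists. It is an assumption that AMPED uses this exact variant, and the function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_conflict(g, other):
    """Return g with its component along `other` removed if the two conflict.

    Gradients conflict when their dot product is negative; otherwise g is
    returned unchanged.
    """
    d = dot(g, other)
    if d >= 0:
        return list(g)
    scale = d / dot(other, other)  # other is nonzero whenever d < 0
    return [gi - scale * oi for gi, oi in zip(g, other)]

def combine_gradients(g_explore, g_diverse):
    """Project each gradient against the other, then sum the results."""
    p1 = project_conflict(g_explore, g_diverse)
    p2 = project_conflict(g_diverse, g_explore)
    return [a + b for a, b in zip(p1, p2)]
```

After projection, neither update direction has a negative dot product with the other objective's original gradient, which is the sense in which the conflict is mitigated.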



Article 1632

Title@2025-06-06 (5): Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning

Title: Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning Mit Adaptive Transformer Block Expansion in Federated Fine-Tuning Catastrophic Forgetting 减轻灾难性灾难,以适应性变异器集团扩大联邦炼油厂中的适应性变异器块 2506.05977v1

Authors (6): Yujia Huo, Jianchun Liu, Hongli Xu, Zhenguo Ma, Shilong Wang, Liusheng Huang

Federated fine-tuning (FedFT) of large language models (LLMs) has emerged as a promising solution for adapting models to distributed data environments while ensuring data privacy. Existing FedFT methods predominantly utilize parameter-efficient fine-tuning (PEFT) techniques to reduce communication and computation overhead. However, they often fail to adequately address catastrophic forgetting, a critical challenge arising from continual adaptation in distributed environments. Traditional centralized fine-tuning methods, which are not designed for the heterogeneous and privacy-constrained nature of federated environments, struggle to mitigate this issue effectively. Moreover, the challenge is further exacerbated by significant variation in data distributions and device capabilities across clients, which leads to intensified forgetting and degraded model generalization. To tackle these issues, we propose FedBE, a novel FedFT framework that integrates an adaptive transformer block expansion mechanism with a dynamic trainable-block allocation strategy. Specifically, FedBE expands trainable blocks within the model architecture, structurally separating newly learned task-specific knowledge from the original pre-trained representations. Additionally, FedBE dynamically assigns these trainable blocks to clients based on their data distributions and computational capabilities. This enables the framework to better accommodate heterogeneous federated environments and enhances the generalization ability of the model. Extensive experiments show that, compared with existing federated fine-tuning methods, FedBE achieves 12-74% higher accuracy retention on general tasks after fine-tuning and a model convergence acceleration ratio of 1.9-3.1x without degrading the accuracy of downstream tasks.



Article 1633

Title@2025-06-06 (5): Graph Deep Learning for Time Series Forecasting

Title: Graph Deep Learning for Time Series Forecasting Graph Deep Learning für die Vorhersage von Zeitreihen 用于时间序列预测的深深学习图 2310.15978v2

Authors (4): Andrea Cini, Ivan Marisca, Daniele Zambon, Cesare Alippi

Graph deep learning methods have become popular tools to process collections of correlated time series. Unlike traditional multivariate forecasting methods, graph-based predictors leverage pairwise relationships by conditioning forecasts on graphs spanning the time series collection. The conditioning takes the form of architectural inductive biases on the forecasting architecture, resulting in a family of models called spatiotemporal graph neural networks. These biases allow for training global forecasting models on large collections of time series while localizing predictions w.r.t. each element in the set (nodes) by accounting for correlations among them (edges). Recent advances in graph neural networks and deep learning for time series forecasting make the adoption of such a processing framework appealing and timely. However, most studies focus on refining existing architectures by exploiting modern deep-learning practices. Conversely, foundational and methodological aspects have not been subject to systematic investigation. To fill this void, this tutorial paper aims to introduce a comprehensive methodological framework formalizing the forecasting problem and providing design principles for graph-based predictors, as well as methods to assess their performance. In addition, together with an overview of the field, we provide design guidelines and best practices, as well as an in-depth discussion of open challenges and future directions.



Article 1634

Title@2025-06-06 (5): On Measuring Long-Range Interactions in Graph Neural Networks

Title: On Measuring Long-Range Interactions in Graph Neural Networks Bei der Messung von Langzeit-Interaktionen in Graphen-Neuralen Netzwerken 关于在图形神经网络中测量长距离相互作用 2506.05971v1

Authors (5): Jacob Bamberger, Benjamin Gutteridge, Scott le Roux, Michael M. Bronstein, Xiaowen Dong

Long-range graph tasks – those dependent on interactions between distant nodes – are an open problem in graph neural network research. Real-world benchmark tasks, especially the Long Range Graph Benchmark, have become popular for validating the long-range capability of proposed architectures. However, this is an empirical approach that lacks both robustness and theoretical underpinning; a more principled characterization of the long-range problem is required. To bridge this gap, we formalize long-range interactions in graph tasks, introduce a range measure for operators on graphs, and validate it with synthetic experiments. We then leverage our measure to examine commonly used tasks and architectures, and discuss to what extent they are, in fact, long-range. We believe our work advances efforts to define and address the long-range problem on graphs, and that our range measure will aid evaluation of new datasets and architectures.



Article 1635

Title@2025-06-06 (5): Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning

Title: Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning Schrittweiser Übergang vom Bellman Optimality Operator zum Bellman Operator im Online-Verstärkungslernen 从Bellman最佳运算符逐步过渡到在线强化学习中的Bellman运算符 2506.05968v1

Authors (5): Motoki Omura, Kazuki Ota, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada

For continuous action spaces, actor-critic methods are widely used in online reinforcement learning (RL). However, unlike RL algorithms for discrete actions, which generally model the optimal value function using the Bellman optimality operator, RL algorithms for continuous actions typically model Q-values for the current policy using the Bellman operator. These algorithms for continuous actions rely exclusively on policy updates for improvement, which often results in low sample efficiency. This study examines the effectiveness of incorporating the Bellman optimality operator into actor-critic frameworks. Experiments in a simple environment show that modeling optimal values accelerates learning but leads to overestimation bias. To address this, we propose an annealing approach that gradually transitions from the Bellman optimality operator to the Bellman operator, thereby accelerating learning while mitigating bias. Our method, combined with TD3 and SAC, significantly outperforms existing approaches across various locomotion and manipulation tasks, demonstrating improved performance and robustness to hyperparameters related to optimality.
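
The difference between the two operators, and one simple way to move between them, can be shown on a single transition with a discrete set of next actions. The linear blending schedule below is an illustrative assumption only; the abstract does not say how the annealing is implemented, and all function names are hypothetical.

```python
def bellman_target(reward, gamma, next_q, policy_probs):
    """Bellman operator for a fixed policy: expectation over next actions."""
    expected_q = sum(p * q for p, q in zip(policy_probs, next_q))
    return reward + gamma * expected_q

def bellman_optimality_target(reward, gamma, next_q):
    """Bellman optimality operator: maximum over next actions."""
    return reward + gamma * max(next_q)

def annealed_target(reward, gamma, next_q, policy_probs, step, total_steps):
    """Blend the two targets, shifting weight from optimality to policy.

    beta is the weight on the optimality target; it decays linearly from
    1 at step 0 to 0 at total_steps (an assumed schedule).
    """
    beta = max(0.0, 1.0 - step / total_steps)
    optimal = bellman_optimality_target(reward, gamma, next_q)
    on_policy = bellman_target(reward, gamma, next_q, policy_probs)
    return beta * optimal + (1.0 - beta) * on_policy
```

Early in training the target behaves like the optimality operator, which the abstract associates with faster learning, and late in training it matches the policy-evaluation target, which the abstract associates with less overestimation bias.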



Article 1636

Title@2025-06-06 (5): Preference Learning for AI Alignment: a Causal Perspective

Title: Preference Learning for AI Alignment: a Causal Perspective Bevorzugtes Lernen für die AI-Ausrichtung: eine kausale Perspektive AI一致的优先学习:从因果关系角度看问题 2506.05967v1

Authors (2): Katarzyna Kobalczyk, Mihaela van der Schaar

Reward modelling from preference data is a crucial step in aligning large language models (LLMs) with human values, requiring robust generalisation to novel prompt-response pairs. In this work, we propose to frame this problem in a causal paradigm, providing the rich toolbox of causality to identify the persistent challenges, such as causal misidentification, preference heterogeneity, and confounding due to user-specific factors. Inheriting from the literature of causal inference, we identify key assumptions necessary for reliable generalisation and contrast them with common data collection practices. We illustrate failure modes of naive reward models and demonstrate how causally-inspired approaches can improve model robustness. Finally, we outline desiderata for future research and practices, advocating targeted interventions to address inherent limitations of observational data.



Article 1637

Title@2025-06-06 (5): Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities

Title: Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities Auf dem Weg zu robusten multimodalen physiologischen Grundlagenmodellen: Umgang mit willkürlich fehlenden Modalitäten 争取建立强有力的多式多模式生理基金会模型:处理任意失踪方式 2504.19596v2

Authors (4): Wei-Bang Jiang, Xi Fu, Yi Ding, Cuntai Guan

Multimodal physiological signals, such as EEG, ECG, EOG, and EMG, are crucial for healthcare and brain-computer interfaces. While existing methods rely on specialized architectures and dataset-specific fusion strategies, they struggle to learn universal representations that generalize across datasets and handle missing modalities at inference time. To address these issues, we propose PhysioOmni, a foundation model for multimodal physiological signal analysis that models both homogeneous and heterogeneous features to decouple multimodal signals and extract generic representations while maintaining compatibility with arbitrary missing modalities. PhysioOmni trains a decoupled multimodal tokenizer, enabling masked signal pre-training via modality-invariant and modality-specific objectives. To ensure adaptability to diverse and incomplete modality combinations, the pre-trained encoders undergo resilient fine-tuning with prototype alignment on downstream datasets. Extensive experiments on four downstream tasks, emotion recognition, sleep stage classification, motor prediction, and mental workload detection, demonstrate that PhysioOmni achieves state-of-the-art performance while maintaining strong robustness to missing modalities. Our code and model weights will be released.



Article 1638

Title@2025-06-06 (5): Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology

Title: Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology Subspezialitätsspezifisches Stiftungsmodell für intelligente Gastrointestinalpathologie 智能气胃肠道病理学 2505.21928v2

Authors (27): Lianghui Zhu, Xitong Ling, Minxi Ouyang, Xiaoping Liu, Tian Guan, Mingxi Fu, Zhiqiang Cheng, Fanglei Fu, Maomao Zeng, Liming Liu, Song Duan, Qiang Huang, Ying Xiao, Jianming Li, Shanming Lu, Zhenghua Piao, Mingxi Zhu, Yibo Jin, Shan Xu, Qiming He, Yizhi Wang, Junru Cheng, Xuanyu Wang, Luxi Xie, Houqiang Li, Sufang Tian, Yonghong He

Gastrointestinal (GI) diseases represent a clinically significant burden, necessitating precise diagnostic approaches to optimize patient outcomes. Conventional histopathological diagnosis suffers from limited reproducibility and diagnostic variability. To overcome these limitations, we develop Digepath, a specialized foundation model for GI pathology. Our framework introduces a dual-phase iterative optimization strategy combining pretraining with fine-screening, specifically designed to address the detection of sparsely distributed lesion areas in whole-slide images. Digepath is pretrained on over 353 million multi-scale images from 210,043 H&E-stained slides of GI diseases. It attains state-of-the-art performance on 33 out of 34 tasks related to GI pathology, including pathological diagnosis, protein expression status prediction, gene mutation prediction, and prognosis evaluation. We further translate the intelligent screening module for early GI cancer and achieve near-perfect 99.70% sensitivity across nine independent medical institutions. This work not only advances AI-driven precision pathology for GI diseases but also bridges critical gaps in histopathological practice.



Article 1639

Title@2025-06-06 (5): Certification for Differentially Private Prediction in Gradient-Based Training

Title: Certification for Differentially Private Prediction in Gradient-Based Training Zertifizierung für unterschiedliche private Vorhersagen im gradient-based Training 渐进式培训中差别私人预测认证 2406.13433v3

Authors (8): Matthew Wicker, Philip Sosnin, Igor Shilov, Adrianna Janik, Mark N. Müller, Yves-Alexandre de Montjoye, Adrian Weller, Calvin Tsay

We study private prediction where differential privacy is achieved by adding noise to the outputs of a non-private model. Existing methods rely on noise proportional to the global sensitivity of the model, often resulting in sub-optimal privacy-utility trade-offs compared to private training. We introduce a novel approach for computing dataset-specific upper bounds on prediction sensitivity by leveraging convex relaxation and bound propagation techniques. By combining these bounds with the smooth sensitivity mechanism, we significantly improve the privacy analysis of private prediction compared to global sensitivity-based approaches. Experimental results across real-world datasets in medical image classification and natural language processing demonstrate that our sensitivity bounds can be orders of magnitude tighter than global sensitivity. Our approach provides a strong basis for the development of novel privacy-preserving technologies.
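
To make the role of the sensitivity bound concrete, here is a minimal Python sketch of output perturbation with the standard Laplace mechanism, where the noise scale is the sensitivity divided by epsilon. It is not the paper's method: the abstract combines dataset-specific bounds with the smooth sensitivity mechanism, which needs a more careful calibration than shown here, and plugging a dataset-specific bound directly into Laplace noise is not differentially private in general. The function names and the numeric values in the usage note are assumptions chosen for illustration.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon for the Laplace mechanism."""
    if sensitivity < 0 or epsilon <= 0:
        raise ValueError("sensitivity must be >= 0 and epsilon must be > 0")
    return sensitivity / epsilon


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_predict(score: float, sensitivity: float, epsilon: float,
                    rng: random.Random) -> float:
    """Release a model output with Laplace noise calibrated to a sensitivity bound."""
    return score + sample_laplace(laplace_scale(sensitivity, epsilon), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical bounds: a loose global bound versus a much tighter one.
    for bound in (2.0, 0.02):
        print(bound, laplace_scale(bound, epsilon=1.0),
              private_predict(0.7, bound, 1.0, rng))
```

At a fixed epsilon, the noise scale shrinks linearly with the bound, which is why bounds that are orders of magnitude tighter translate into far less noisy predictions.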



Article 1640

Title@2025-06-06 (5): Diving into Self-Evolving Training for Multimodal Reasoning

Title: Diving into Self-Evolving Training for Multimodal Reasoning Eintauchen in das Selbst-Evolving Training für multimodale Vernunft 跳入多模式理由自我发展培训 2412.17451v3

Authors (6): Wei Liu, Junlong Li, Xiwen Zhang, Fan Zhou, Yu Cheng, Junxian He

Self-evolving training, where models iteratively learn from their own outputs, has emerged as a key approach for complex reasoning tasks, addressing the scarcity of high-quality chain-of-thought data. However, its effectiveness in multimodal reasoning, a domain more intricate than text-only reasoning, remains underexplored, and the understanding of critical factors in this training paradigm remains limited. Furthermore, a central challenge for this training method is performance saturation, which impedes further improvements and scalability. Inspired by reinforcement learning (RL), in this paper, we reframe self-evolving training for multimodal reasoning through the lens of RL, identifying three pivotal factors: Training Method, Reward Model, and Prompt Variation. Through systematic analysis, we establish relatively optimal design principles that significantly enhance multimodal reasoning capabilities. Moreover, delving deeper into training dynamics, we uncover the roots of saturation and propose a new automatic balancing mechanism to mitigate this limitation. Building on these insights, we propose M-STAR (Multimodal Self-evolving Training for Reasoning), a framework that achieves consistent performance gains across models of varying sizes and diverse benchmarks. All resources are made publicly available at https://mstar-lmm.github.io.



Article 1641

Title@2025-06-06 (5): Applying XAI based unsupervised knowledge discovering for Operation modes in a WWTP. A real case: AQUAVALL WWTP

Title: Applying XAI based unsupervised knowledge discovering for Operation modes in a WWTP. A real case: AQUAVALL WWTP Anwendung von XAI-basierten unüberwachten Kenntnissen für Betriebsmodi in einem WWTP. Ein echter Fall: AQUAVALL WWTP 在WWTP中应用基于 XAI 的未经监督的操作模式知识发现 XAI 。 一个真实案例: AQUAVALL WWTP 2506.05958v1

Authors (5): Alicia Beneyto-Rodriguez, Gregorio I. Sainz-Palmero, Marta Galende-Hernández, María J. Fuente, José M. Cuenca

Water reuse is a key point when fresh water is a commodity in ever greater demand, but which is also becoming ever less available. Furthermore, the return of clean water to its natural environment is also mandatory. Therefore, wastewater treatment plants (WWTPs) are essential in any policy focused on these serious challenges. WWTPs are complex facilities which need to operate at their best to achieve their goals. Nowadays, they are largely monitored, generating large databases of historical data concerning their functioning over time. All this implies a large amount of embedded information which is not usually easy for plant managers to assimilate, correlate and understand; in other words, for them to know the global operation of the plant at any given time. At this point, the intelligent and Machine Learning (ML) approaches can give support for that need, managing all the data and translating them into manageable, interpretable and explainable knowledge about how the WWTP is operating at a glance. Here, an eXplainable Artificial Intelligence (XAI) based methodology is proposed and tested for a real WWTP, in order to extract explainable service knowledge concerning the operation modes of the WWTP managed by AQUAVALL, which is the public service in charge of the integral water cycle in the City Council of Valladolid (Castilla y León, Spain). By applying well-known approaches of XAI and ML focused on the challenge of WWTP, it has been possible to summarize a large number of historical databases through a few explained operation modes of the plant in a low-dimensional data space, showing the variables and facility units involved in each case.



Article 1642

Title@2025-06-06 (5): Inferring entropy production in many-body systems using nonequilibrium MaxEnt

Title: Inferring entropy production in many-body systems using nonequilibrium MaxEnt Ableitung der Entropieproduktion in vielen Körpersystemen unter Verwendung von Nonequilibrium MaxEnt 在使用无平衡的MaxEnt的多机体系统中,在多机体系统中产生导导导导酶 2505.10444v2

Authors (3): Miguel Aguilera, Sosuke Ito, Artemy Kolchinsky

We propose a method for inferring entropy production (EP) in high-dimensional stochastic systems, including many-body systems and non-Markovian systems with long memory. Standard techniques for estimating EP become intractable in such systems due to computational and statistical limitations. We infer trajectory-level EP and lower bounds on average EP by exploiting a nonequilibrium analogue of the Maximum Entropy principle, along with convex duality. Our approach uses only samples of trajectory observables (such as spatiotemporal correlation functions). It does not require reconstruction of high-dimensional probability distributions or rate matrices, nor any special assumptions such as discrete states or multipartite dynamics. It may be used to compute a hierarchical decomposition of EP, reflecting contributions from different kinds of interactions, and it has an intuitive physical interpretation as a thermodynamic uncertainty relation. We demonstrate its numerical performance on a disordered nonequilibrium spin model with 1000 spins and a large neural spike-train dataset.



Article 1643

Title@2025-06-06 (5): Learning Deterministic Policies with Policy Gradients in Constrained Markov Decision Processes

Title: Learning Deterministic Policies with Policy Gradients in Constrained Markov Decision Processes Deterministische Politik mit politischen Gradienten in eingeschränkten Markov-Entscheidungsprozessen lernen 在受约束的Markov决策过程中,学习具有政策分级的政策确定性政策 2506.05953v1

Authors (5): Alessandro Montenegro, Leonardo Cesani, Marco Mussi, Matteo Papini, Alberto Maria Metelli

Constrained Reinforcement Learning (CRL) addresses sequential decision-making problems where agents are required to achieve goals by maximizing the expected return while meeting domain-specific constraints. In this setting, policy-based methods are widely used thanks to their advantages when dealing with continuous-control problems. These methods search in the policy space with an action-based or a parameter-based exploration strategy, depending on whether they learn the parameters of a stochastic policy or those of a stochastic hyperpolicy. We introduce an exploration-agnostic algorithm, called C-PG, which enjoys global last-iterate convergence guarantees under gradient domination assumptions. Furthermore, under specific noise models where the (hyper)policy is expressed as a stochastic perturbation of the actions or of the parameters of an underlying deterministic policy, we additionally establish global last-iterate convergence guarantees of C-PG to the optimal deterministic policy. This holds when learning a stochastic (hyper)policy and subsequently switching off the stochasticity at the end of training, thereby deploying a deterministic policy. Finally, we empirically validate both the action-based (C-PGAE) and parameter-based (C-PGPE) variants of C-PG on constrained control tasks, and compare them against state-of-the-art baselines, demonstrating their effectiveness, in particular when deploying deterministic policies after training.



Article 1644

Title@2025-06-06 (5): How to explain grokking

Title: How to explain grokking Wie man Grokking erklärt 如何解释怪语 2412.18624v3

Authors (1): S. V. Kozyrev

Grokking (delayed generalization) in learning is explained by modeling it with stochastic gradient Langevin dynamics (Brownian motion) and applying ideas from thermodynamics.



Article 1645

Title@2025-06-06 (5): GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression

Title: GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression GRASP: Redundante Ebenen durch adaptive Singular-Parameter für effiziente Modellkompression ersetzen GRASP: 用高效模型压缩的适应性单质参数取代冗余层 2501.00339v3

Authors (6): Kainan Liu, Yong Zhang, Ning Cheng, Zhitao Li, Shaojun Wang, Jing Xiao

Recent studies have demonstrated that many layers are functionally redundant in large language models (LLMs), enabling model compression by removing these layers to reduce inference cost. While such approaches can improve efficiency, indiscriminate layer pruning often results in significant performance degradation. In this paper, we propose GRASP (Gradient-based Retention of Adaptive Singular Parameters), a novel compression framework that mitigates this issue by preserving sensitivity-aware singular values. Unlike direct layer pruning, GRASP leverages gradient-based attribution on a small calibration dataset to adaptively identify and retain critical singular components. By replacing redundant layers with only a minimal set of parameters, GRASP achieves efficient compression while maintaining strong performance with minimal overhead. Experiments across multiple LLMs show that GRASP consistently outperforms existing compression methods, achieving 90% of the original model’s performance under a 20% compression ratio.



Article 1646

Title@2025-06-06 (5): BARK: A Fully Bayesian Tree Kernel for Black-box Optimization

Title: BARK: A Fully Bayesian Tree Kernel for Black-box Optimization BARK: Ein vollbayesischer Baumkernel für Black-Box-Optimierung 用于黑盒优化的全巴伊西亚树核心圈 2503.05574v2

Authors (5): Toby Boyne, Jose Pablo Folch, Robert M Lee, Behrang Shafei, Ruth Misener

We perform Bayesian optimization using a Gaussian process perspective on Bayesian Additive Regression Trees (BART). Our BART Kernel (BARK) uses tree agreement to define a posterior over piecewise-constant functions, and we explore the space of tree kernels using a Markov chain Monte Carlo approach. Where BART only samples functions, the resulting BARK model obtains samples of Gaussian processes defining distributions over functions, which allow us to build acquisition functions for Bayesian optimization. Our tree-based approach enables global optimization over the surrogate, even for mixed-feature spaces. Moreover, where many previous tree-based kernels provide uncertainty quantification over function values, our sampling scheme captures uncertainty over the tree structure itself. Our experiments show the strong performance of BARK on both synthetic and applied benchmarks, due to the combination of our fully Bayesian surrogate and the optimization procedure.



Article 1647

Title@2025-06-06 (5): AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

Title: AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML AutoML-Agent: Ein Multi-Agent-LLM-Framework für Full-Pipeline-AutoML 自动MAL- Agency: 全Pipeline 自动MLM 多边代理LLM 框架 2410.02958v2

Authors (3): Patara Trirat, Wonyong Jeong, Sung Ju Hwang

Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline, such as optimal model search and hyperparameter tuning. Existing AutoML systems often require technical expertise to set up complex tools, which is in general time-consuming and requires a large amount of human effort. Therefore, recent works have started exploiting large language models (LLMs) to lessen such burden and increase the usability of AutoML frameworks via a natural language interface, allowing non-expert users to build their data-driven solutions. These methods, however, are usually designed only for a particular process in the AI development pipeline and do not efficiently use the inherent capacity of the LLMs. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML, i.e., from data retrieval to model deployment. AutoML-Agent takes users’ task descriptions, facilitates collaboration between specialized LLM agents, and delivers deployment-ready models. Unlike existing work, instead of devising a single plan, we introduce a retrieval-augmented planning strategy to enhance exploration to search for more optimal plans. We also decompose each plan into sub-tasks (e.g., data preprocessing and neural network design), each of which is solved by a specialized agent that we build via prompting, with the agents executing in parallel, making the search process more efficient. Moreover, we propose a multi-stage verification to verify executed results and guide the code generation LLM in implementing successful solutions. Extensive experiments on seven downstream tasks using fourteen datasets show that AutoML-Agent achieves a higher success rate in automating the full AutoML process, yielding systems with good performance throughout the diverse domains.



Article 1648

Title@2025-06-06 (5): Additive decomposition of one-dimensional signals using Transformers

Title: Additive decomposition of one-dimensional signals using Transformers Additive Zersetzung von eindimensionalen Signalen mit Transformern 使用变压器的单维信号的添加分解 2506.05942v1

Authors (4): Samuele Salti, Andrea Pinto, Alessandro Lanza, Serena Morigi

One-dimensional signal decomposition is a well-established and widely used technique across various scientific fields. It serves as a highly valuable pre-processing step for data analysis. While traditional decomposition techniques often rely on mathematical models, recent research suggests that applying the latest deep learning models to this problem presents an exciting, unexplored area with promising potential. This work presents a novel method for the additive decomposition of one-dimensional signals. We leverage the Transformer architecture to decompose signals into their constituent components: piece-wise constant, smooth (low-frequency oscillatory), textured (high-frequency oscillatory), and a noise component. Our model, trained on synthetic data, achieves excellent accuracy in modeling and decomposing input signals from the same distribution, as demonstrated by the experimental results.



Article 1649

Title@2025-06-06 (5): SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models

Title: SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models SafeAuto: Knowledge-Enhanced Safe Autonomous Driving mit multimodalen Foundation-Modellen SafeAuto:利用多模式基金会模式加强知识增强安全自主驾驶 2503.00211v2

Authors (6): Jiawei Zhang, Xuan Yang, Taiqi Wang, Yu Yao, Aleksandr Petiushko, Bo Li

Traditional autonomous driving systems often struggle to connect high-level reasoning with low-level control, leading to suboptimal and sometimes unsafe behaviors. Recent advances in multimodal large language models (MLLMs), which process both visual and textual data, offer an opportunity to unify perception and reasoning. However, effectively embedding precise safety knowledge into MLLMs for autonomous driving remains a significant challenge. To address this, we propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge. First, we introduce a Position-Dependent Cross-Entropy (PDCE) loss to improve low-level control signal predictions when values are represented as text. Second, to explicitly integrate safety knowledge, we develop a reasoning component that translates traffic rules into first-order logic (e.g., “red light $\implies$ stop”) and embeds them into a probabilistic graphical model (e.g., Markov Logic Network) to verify predicted actions using recognized environmental attributes. Additionally, our Multimodal Retrieval-Augmented Generation (RAG) model leverages video, control signals, and environmental attributes to learn from past driving experiences. Integrating PDCE, MLN, and Multimodal RAG, SafeAuto outperforms existing baselines across multiple datasets, enabling more accurate, reliable, and safer autonomous driving. The code is available at https://github.com/AI-secure/SafeAuto.



Article 1650

Title@2025-06-06 (5): Comparative Analysis of Modern Machine Learning Models for Retail Sales Forecasting

Title: Comparative Analysis of Modern Machine Learning Models for Retail Sales Forecasting Vergleichende Analyse von modernen Machine Learning-Modellen für die Prognose des Einzelhandels 零售销售预测现代机器学习模型比较分析 2506.05941v1

Authors (4): Luka Hobor, Mario Brcic, Lidija Polutnik, Ante Kapetanovic

Accurate forecasting is key for all business planning. When estimated sales are too high, brick-and-mortar retailers may incur higher costs due to unsold inventories, higher labor and storage space costs, etc. On the other hand, when forecasts underestimate the level of sales, firms experience lost sales, shortages, and impact on the reputation of the retailer in their relevant market. Accurate forecasting presents a competitive advantage for companies. It facilitates the achievement of revenue and profit goals and execution of pricing strategy and tactics. In this study, we provide an exhaustive assessment of the forecasting models applied to a high-resolution brick-and-mortar retail dataset. Our forecasting framework addresses the problems found in retail environments, including intermittent demand, missing values, and frequent product turnover. We compare tree-based ensembles (such as XGBoost and LightGBM) and state-of-the-art neural network architectures (including N-BEATS, NHITS, and the Temporal Fusion Transformer) across various experimental settings. Our results show that localized modeling strategies, especially those using tree-based models on individual groups with non-imputed data, consistently deliver superior forecasting accuracy and computational efficiency. In contrast, neural models benefit from advanced imputation methods, yet still fall short in handling the irregularities typical of physical retail data. These results further practical understanding for model selection in retail environments and highlight the significance of data preprocessing to improve forecast performance.



Article 1651

Title@2025-06-06 (5): Exponential Family Variational Flow Matching for Tabular Data Generation

Title: Exponential Family Variational Flow Matching for Tabular Data Generation Exponentieller Familien-Variationsfluss passend zur tabellarischen Datengenerierung 用于制表数据的家庭流动量变化 2506.05940v1

Authors (3): Andrés Guzmán-Cordero, Floor Eijkelboom, Jan-Willem van de Meent

While denoising diffusion and flow matching have driven major advances in generative modeling, their application to tabular data remains limited, despite its ubiquity in real-world applications. To this end, we develop TabbyFlow, a variational Flow Matching (VFM) method for tabular data generation. To apply VFM to data with mixed continuous and discrete features, we introduce Exponential Family Variational Flow Matching (EF-VFM), which represents heterogeneous data types using a general exponential family distribution. We hereby obtain an efficient, data-driven objective based on moment matching, enabling principled learning of probability paths over mixed continuous and discrete variables. We also establish a connection between variational flow matching and generalized flow matching objectives based on Bregman divergences. Evaluation on tabular data benchmarks demonstrates state-of-the-art performance compared to baselines.



Article 1652

Title@2025-06-06 (5): Quantifying Adversarial Uncertainty in Evidential Deep Learning using Conflict Resolution

Title: Quantifying Adversarial Uncertainty in Evidential Deep Learning using Conflict Resolution Quantifizierung von Unsicherheiten im Evidential Deep Learning mittels Konfliktlösung 以解决冲突的方式量化证据深层学习中存在的逆差不确定性 2506.05937v1

Authors (3): Charmaine Barker, Daniel Bethell, Simos Gerasimou

Reliability of deep learning models is critical for deployment in high-stakes applications, where out-of-distribution or adversarial inputs may lead to detrimental outcomes. Evidential Deep Learning, an efficient paradigm for uncertainty quantification, models predictions as Dirichlet distributions of a single forward pass. However, EDL is particularly vulnerable to adversarially perturbed inputs, making overconfident errors. Conflict-aware Evidential Deep Learning (C-EDL) is a lightweight post-hoc uncertainty quantification approach that mitigates these issues, enhancing adversarial and OOD robustness without retraining. C-EDL generates diverse, task-preserving transformations per input and quantifies representational disagreement to calibrate uncertainty estimates when needed. C-EDL’s conflict-aware prediction adjustment improves detection of OOD and adversarial inputs, maintaining high in-distribution accuracy and low computational overhead. Our experimental evaluation shows that C-EDL significantly outperforms state-of-the-art EDL variants and competitive baselines, achieving substantial reductions in coverage for OOD data (up to 55%) and adversarial data (up to 90%), across a range of datasets, attack types, and uncertainty metrics.



Article 1653

Title@2025-06-06 (5): Deep Learning Weather Models for Subregional Ocean Forecasting: A Case Study on the Canary Current Upwelling System

Title: Deep Learning Weather Models for Subregional Ocean Forecasting: A Case Study on the Canary Current Upwelling System Deep Learning Weather Models for Subregional Ocean Forecasting: Eine Fallstudie über das kanarische aktuelle Auftriebssystem 次区域海洋预报的深学习气象模型:关于加那利洋流上升系统的个案研究 2505.24429v2

Authors (3): Giovanny A. Cuervo-Londoño, Javier Sánchez, Ángel Rodríguez-Santana

Oceanographic forecasting impacts various sectors of society by supporting environmental conservation and economic activities. Based on global circulation models, traditional forecasting methods are computationally expensive and slow, limiting their ability to provide rapid forecasts. Recent advances in deep learning offer faster and more accurate predictions, although these data-driven models are often trained with global data from numerical simulations, which may not reflect reality. The emergence of such models presents great potential for improving ocean prediction at a subregional domain. However, their ability to predict fine-scale ocean processes, like mesoscale structures, remains largely unknown. This work aims to adapt a graph neural network initially developed for global weather forecasting to improve subregional ocean prediction, specifically focusing on the Canary Current upwelling system. The model is trained with satellite data and compared to state-of-the-art physical ocean models to assess its performance in capturing ocean dynamics. Our results show that the deep learning model surpasses traditional methods in precision despite some challenges in upwelling areas. It demonstrated superior performance in reducing RMSE errors compared to ConvLSTM and the GLORYS reanalysis, particularly in regions with complex oceanic dynamics such as Cape Ghir, Cape Bojador, and Cape Blanc. The model achieved improvements of up to 26.5% relative to ConvLSTM and error reductions of up to 76% in 5-day forecasts compared to the GLORYS reanalysis at these critical locations, highlighting its enhanced capability to capture spatial variability and improve predictive accuracy in complex areas. These findings suggest the viability of adapting meteorological data-driven models for improving subregional medium-term ocean forecasting.



Article 1654

Title@2025-06-06 (5): Machine Learning Predictions for Traffic Equilibria in Road Renovation Scheduling

Title: Machine Learning Predictions for Traffic Equilibria in Road Renovation Scheduling Machine Learning Vorhersagen für Verkehrsequilibrien in der Road Renovation Scheduling 道路翻新日程安排交通平衡的机器学习预测 2506.05933v1

Authors (4): Robbert Bosch, Wouter van Heeswijk, Patricia Rogetzer, Martijn Mes

Accurately estimating the impact of road maintenance schedules on traffic conditions is important because maintenance operations can substantially worsen congestion if not carefully planned. Reliable estimates allow planners to avoid excessive delays during periods of roadwork. Since the exact increase in congestion is difficult to predict analytically, traffic simulations are commonly used to assess the redistribution of the flow of traffic. However, when applied to long-term maintenance planning involving many overlapping projects and scheduling alternatives, these simulations must be run thousands of times, resulting in a significant computational burden. This paper investigates the use of machine learning-based surrogate models to predict network-wide congestion caused by simultaneous road renovations. We frame the problem as a supervised learning task, using one-hot encodings, engineered traffic features, and heuristic approximations. A range of linear, ensemble-based, probabilistic, and neural regression models is evaluated under an online learning framework in which data progressively becomes available. The experimental results show that the Costliest Subset Heuristic provides a reasonable approximation when limited training data is available, and that most regression models fail to outperform it, with the exception of XGBoost, which achieves substantially better accuracy. In overall performance, XGBoost significantly outperforms alternatives in a range of metrics, most strikingly Mean Absolute Percentage Error (MAPE) and Pinball loss, where it achieves a MAPE of 11% and outperforms the next-best model by 20% and 38% respectively. This modeling approach has the potential to reduce the computational burden of large-scale traffic assignment problems in maintenance planning.
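
The two metrics highlighted above, MAPE and Pinball loss, can be written out in a few lines. The following is a minimal Python sketch using the standard textbook definitions; the abstract does not state which quantile level or averaging convention the authors used, so the `quantile` argument and the sample numbers in the usage note are assumptions.

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent. Targets must be nonzero."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def pinball_loss(y_true, y_pred, quantile):
    """Mean pinball (quantile) loss for a given quantile level in (0, 1)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must lie strictly between 0 and 1")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += quantile * diff if diff >= 0 else (quantile - 1.0) * diff
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical congestion values, for illustration only.
    observed = [100.0, 200.0]
    predicted = [110.0, 180.0]
    print(mape(observed, predicted))
    print(pinball_loss(observed, predicted, quantile=0.9))
```

With a quantile above 0.5, the pinball loss penalizes under-prediction more heavily than over-prediction, which matters when underestimating congestion is the costlier mistake.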



Article 1655

Title@2025-06-06 (5): Progressive Data Dropout: An Embarrassingly Simple Approach to Faster Training

Title: Progressive Data Dropout: An Embarrassingly Simple Approach to Faster Training Progressive Data Dropout: Ein verblüffend einfacher Ansatz zum schnelleren Training 渐进数据辍学:快速培训的一个令人尴尬的简单方法 2505.22342v2

Authors (7): Shriram M S, Xinyue Hao, Shihao Hou, Yang Lu, Laura Sevilla-Lara, Anurag Arnab, Shreyank N Gowda

The success of the machine learning field has reliably depended on training on large datasets. While effective, this trend comes at an extraordinary cost. This is due to two deeply intertwined factors: the size of models and the size of datasets. While promising research efforts focus on reducing the size of models, the other half of the equation remains fairly mysterious. Indeed, it is surprising that the standard approach to training remains to iterate over and over, uniformly sampling the training dataset. In this paper we explore a series of alternative training paradigms that leverage insights from hard-data-mining and dropout, simple enough to implement and use that they can become the new training standard. The proposed Progressive Data Dropout reduces the number of effective epochs to as little as 12.4% of the baseline. These savings do not come at any cost to accuracy. Surprisingly, the proposed method improves accuracy by up to 4.82%. Our approach requires no changes to model architecture or optimizer, and can be applied across standard training pipelines, thus posing an excellent opportunity for wide adoption. Code can be found here: https://github.com/bazyagami/LearningWithRevision



Article 1656

Title@2025-06-06 (5): Can Masked Autoencoders Also Listen to Birds?

Title: Can Masked Autoencoders Also Listen to Birds? Können maskierte Autoencoder auch Vögel hören? 面具汽车人也能听鸟的话吗? 2504.12880v3

Authors (6): Lukas Rauch, René Heinrich, Ilyass Moummad, Alexis Joly, Bernhard Sick, Christoph Scholz

Masked Autoencoders (MAEs) have shown competitive results in audio classification by learning rich semantic representations through an efficient self-supervised reconstruction task. However, general-purpose models fail to generalize well when applied directly to fine-grained audio domains. Specifically, bird-sound classification requires distinguishing subtle inter-species differences and managing high intra-species acoustic variability, thereby revealing the performance limitations of general-domain Audio-MAE models. This work demonstrates that bridging this domain gap requires more than domain-specific pretraining data; adapting the entire training pipeline is crucial. We systematically revisit and adapt the pretraining recipe, fine-tuning methods, and frozen feature utilization to bird sounds using BirdSet, a large-scale bioacoustic dataset comparable to AudioSet. Our resulting Bird-MAE achieves new state-of-the-art results in BirdSet’s multi-label classification benchmark. Additionally, we introduce the parameter-efficient prototypical probing, enhancing the utility of frozen MAE representations and closely approaching fine-tuning performance in low-resource settings. Bird-MAE’s prototypical probes outperform linear probing by up to 37 percentage points in MAP and narrow the gap to fine-tuning to approximately 3.3 percentage points on average across BirdSet downstream tasks. Bird-MAE also demonstrates robust few-shot capabilities with prototypical probing in our newly established few-shot benchmark on BirdSet, highlighting the potential of tailored self-supervised learning pipelines for fine-grained audio domains.



Article 1657

Title@2025-06-06 (5): GraphGPT: Generative Pre-trained Graph Eulerian Transformer

Title: GraphGPT: Generative Pre-trained Graph Eulerian Transformer GraphGPT: Generative vortrainierte Graph Eulerian Transformer 图GGPT: 未经培训的先产生图 欧勒里亚变形器 2401.00529v3

Authors (6): Qifang Zhao, Weidong Ren, Tianyu Li, Hong Liu, Xingsheng He, Xiaoxiao Xu

We introduce GraphGPT, a novel self-supervised generative pre-trained model for graph learning based on the Graph Eulerian Transformer (GET). First, we propose GET, which combines a standard transformer encoder or decoder architecture with an innovative graph-to-sequence transformation method. This method converts graphs or sampled subgraphs into sequences of tokens representing nodes, edges, and attributes in a reversible manner using Eulerian paths. We pre-train GET using either of the two self-supervised tasks: next-token prediction (NTP) and scheduled masked-token prediction (SMTP). The pre-trained model is then fine-tuned for downstream tasks such as graph-, edge-, and node-level prediction. Despite its simplicity, GraphGPT achieves performance comparable to or surpassing state-of-the-art methods on multiple large-scale Open Graph Benchmark (OGB) datasets. It demonstrates exceptional results on the molecular property prediction dataset PCQM4Mv2 and the protein-protein interaction dataset ogbl-ppa. Notably, generative pre-training enables scaling GraphGPT to 2 billion parameters while maintaining performance gains, a breakthrough that overcomes the scalability limitations of traditional Graph Neural Networks (GNNs) and prior graph transformers (GTs). To advance research in graph foundation models and facilitate scientific discovery in chemistry, materials science, and related fields, we will release the source code (https://github.com/alibaba/graph-gpt) and pre-trained checkpoints.
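
The reversible graph-to-sequence idea can be illustrated with a small Python sketch that serializes an undirected graph as an Eulerian path using Hierholzer's algorithm and then recovers the edge list from consecutive nodes. This only illustrates the general principle, not GET's actual tokenization: node and edge attributes are omitted, and the abstract does not say how graphs without an Eulerian path are handled, so this sketch simply raises an error for them. The function names are hypothetical.

```python
from collections import defaultdict


def eulerian_path(edges):
    """Return a node sequence that traverses every undirected edge exactly once."""
    if not edges:
        return []
    adjacency = defaultdict(list)  # node -> list of (neighbor, edge_id)
    for edge_id, (u, v) in enumerate(edges):
        adjacency[u].append((v, edge_id))
        adjacency[v].append((u, edge_id))
    odd = [node for node in adjacency if len(adjacency[node]) % 2 == 1]
    if len(odd) not in (0, 2):
        raise ValueError("no Eulerian path: %d odd-degree nodes" % len(odd))
    start = odd[0] if odd else edges[0][0]
    used = [False] * len(edges)
    stack, path = [start], []
    while stack:
        node = stack[-1]
        # Skip edges that were already traversed from their other endpoint.
        while adjacency[node] and used[adjacency[node][-1][1]]:
            adjacency[node].pop()
        if adjacency[node]:
            neighbor, edge_id = adjacency[node].pop()
            used[edge_id] = True
            stack.append(neighbor)
        else:
            path.append(stack.pop())
    if len(path) != len(edges) + 1:
        raise ValueError("no Eulerian path: graph is not connected")
    return path[::-1]


def edges_from_path(path):
    """Invert the serialization: consecutive nodes in the path form the edges."""
    return [(path[i], path[i + 1]) for i in range(len(path) - 1)]


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant node 3 attached to node 2.
    graph = [(0, 1), (1, 2), (2, 0), (2, 3)]
    sequence = eulerian_path(graph)
    print(sequence)
    print(edges_from_path(sequence))
```

Because each edge appears exactly once as a pair of adjacent nodes, the sequence loses no structural information, which is the property that makes the transformation reversible.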



Article 1658

Title@2025-06-06 (5): UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning

Title: UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning UDora: Ein einheitliches rotes Teaming-Framework gegen LLM-Agenten durch Dynamisches Entführen ihrer eigenen Vernunft UDora:通过动态劫持LLM代理商的自身理由,针对这些代理商的统一红色团队框架 2503.01908v2

Authors (3): Jiawei Zhang, Shuang Yang, Bo Li

Large Language Model (LLM) agents equipped with external tools have become increasingly powerful for complex tasks such as web shopping, automated email replies, and financial trading. However, these advancements amplify the risks of adversarial attacks, especially when agents can access sensitive external functionalities. Nevertheless, manipulating LLM agents into performing targeted malicious actions or invoking specific tools remains challenging, as these agents extensively reason or plan before executing final actions. In this work, we present UDora, a unified red teaming framework designed for LLM agents that dynamically hijacks the agent’s reasoning processes to compel malicious behavior. Specifically, UDora first generates the model’s reasoning trace for the given task, then automatically identifies optimal points within this trace to insert targeted perturbations. The resulting perturbed reasoning is then used as a surrogate response for optimization. By iteratively applying this process, the LLM agent will then be induced to undertake designated malicious actions or to invoke specific malicious tools. Our approach demonstrates superior effectiveness compared to existing methods across three LLM agent datasets. The code is available at https://github.com/AI-secure/UDora.



Article 1659

Title@2025-06-06 (5): Over-PINNs: Enhancing Physics-Informed Neural Networks via Higher-Order Partial Derivative Overdetermination of PDEs

Title: Over-PINNs: Enhancing Physics-Informed Neural Networks via Higher-Order Partial Derivative Overdetermination of PDEs Über-PINNs: Verbesserung physikinformierter neuraler Netzwerke durch höhere Ordnung Partial Derivative Überbestimmung von PDEs 超标准PINNs:通过高端部分衍生因素超过PDE的确定,加强物理内化神经网络 2506.05918v1

Authors (4): Wenxuan Huo, Qiang He, Gang Zhu, Weifeng Huang

Partial differential equations (PDEs) serve as the cornerstone of mathematical physics. In recent years, Physics-Informed Neural Networks (PINNs) have significantly reduced the dependence on large datasets by embedding physical laws directly into the training of neural networks. However, when dealing with complex problems, the accuracy of PINNs still has room for improvement. To address this issue, we introduce the Over-PINNs framework, which leverages automatic differentiation (AD) to generate higher-order auxiliary equations that impose additional physical constraints. These equations are incorporated as extra loss terms in the training process, effectively enhancing the model’s ability to capture physical information through an “overdetermined” approach. Numerical results illustrate that this method exhibits strong versatility in solving various types of PDEs. It achieves a significant improvement in solution accuracy without incurring substantial additional computational costs.
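
As a worked illustration of what a higher-order auxiliary equation can look like, consider the viscous Burgers equation. This PDE is chosen here only as an example and is not taken from the abstract; the loss weights $\lambda_0, \lambda_1$ and the choice to differentiate in $x$ are likewise assumptions. Differentiating the residual once with respect to $x$ gives an additional equation that any smooth solution must also satisfy, and its squared norm can be added as an extra loss term:

```latex
\begin{aligned}
\mathcal{R}[u] &= u_t + u\,u_x - \nu\,u_{xx} = 0, \\
\partial_x \mathcal{R}[u] &= u_{tx} + u_x^2 + u\,u_{xx} - \nu\,u_{xxx} = 0, \\
\mathcal{L}(\theta) &= \mathcal{L}_{\text{IC/BC}}(\theta)
  + \lambda_0 \,\lVert \mathcal{R}[u_\theta] \rVert^2
  + \lambda_1 \,\lVert \partial_x \mathcal{R}[u_\theta] \rVert^2 .
\end{aligned}
```

With automatic differentiation, the second residual costs one more derivative pass through the network $u_\theta$ and requires no additional data.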



Article 1660

Title@2025-06-06 (5): Rethinking Semi-supervised Segmentation Beyond Accuracy: Reliability and Robustness

Title: Rethinking Semi-supervised Segmentation Beyond Accuracy: Reliability and Robustness Halbüberwachte Segmentierung jenseits der Genauigkeit neu denken: Zuverlässigkeit und Robustheit 重新思考超越准确度的半监督分割:可靠性和强性 2506.05917v1

Authors (3): Steven Landgraf, Markus Hillemann, Markus Ulrich

Semantic segmentation is critical for scene understanding but demands costly pixel-wise annotations, attracting increasing attention to semi-supervised approaches to leverage abundant unlabeled data. While semi-supervised segmentation is often promoted as a path toward scalable, real-world deployment, it is astonishing that current evaluation protocols exclusively focus on segmentation accuracy, entirely overlooking reliability and robustness. These qualities, which ensure consistent performance under diverse conditions (robustness) and well-calibrated model confidences as well as meaningful uncertainties (reliability), are essential for safety-critical applications like autonomous driving, where models must handle unpredictable environments and avoid sudden failures at all costs. To address this gap, we introduce the Reliable Segmentation Score (RSS), a novel metric that combines predictive accuracy, calibration, and uncertainty quality measures via a harmonic mean. RSS penalizes deficiencies in any of its components, providing an easy and intuitive way of holistically judging segmentation models. Comprehensive evaluations of UniMatchV2 against its predecessor and a supervised baseline show that semi-supervised methods often trade reliability for accuracy. While out-of-domain evaluations demonstrate UniMatchV2’s robustness, they further expose persistent reliability shortcomings. We advocate for a shift in evaluation protocols toward more holistic metrics like RSS to better align semi-supervised learning research with real-world deployment needs.
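
A minimal sketch of why a harmonic mean penalizes a deficiency in any single component. It assumes that each of the three ingredients has already been mapped to a score in [0, 1] where higher is better; the abstract does not specify the underlying measures, and the function name is hypothetical.

```python
from statistics import harmonic_mean


def reliable_segmentation_score(accuracy, calibration, uncertainty_quality):
    """Harmonic mean of three component scores, each assumed to lie in [0, 1]."""
    scores = (accuracy, calibration, uncertainty_quality)
    if not all(0.0 <= s <= 1.0 for s in scores):
        raise ValueError("component scores must lie in [0, 1]")
    return harmonic_mean(scores)


# An accurate but poorly calibrated model (hypothetical numbers).
rss = reliable_segmentation_score(0.9, 0.9, 0.3)
arithmetic = (0.9 + 0.9 + 0.3) / 3
```

In this example the arithmetic mean is 0.7 while the harmonic mean is 0.54, so the weak component pulls the combined score down much more strongly.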



Article 1661

Title@2025-06-06 (5): Approximating Latent Manifolds in Neural Networks via Vanishing Ideals

Title: Approximating Latent Manifolds in Neural Networks via Vanishing Ideals Approximierende latente Manifolds in neuralen Netzen über Vanishing Ideals 借助消失的理念在神经网络中接近的中点修饰 2502.15051v2

Authors (4): Nico Pelleriti, Max Zimmer, Elias Wirth, Sebastian Pokutta

Deep neural networks have reshaped modern machine learning by learning powerful latent representations that often align with the manifold hypothesis: high-dimensional data lie on lower-dimensional manifolds. In this paper, we establish a connection between manifold learning and computational algebra by demonstrating how vanishing ideals can characterize the latent manifolds of deep networks. To that end, we propose a new neural architecture that (i) truncates a pretrained network at an intermediate layer, (ii) approximates each class manifold via polynomial generators of the vanishing ideal, and (iii) transforms the resulting latent space into linearly separable features through a single polynomial layer. The resulting models have significantly fewer layers than their pretrained baselines, while maintaining comparable accuracy, achieving higher throughput, and utilizing fewer parameters. Furthermore, drawing on spectral complexity analysis, we derive sharper theoretical guarantees for generalization, showing that our approach can in principle offer tighter bounds than standard deep networks. Numerical experiments confirm the effectiveness and efficiency of the proposed approach.



Article 1662

Title@2025-06-06 (5): DeviceScope: An Interactive App to Detect and Localize Appliance Patterns in Electricity Consumption Time Series

Title: DeviceScope: An Interactive App to Detect and Localize Appliance Patterns in Electricity Consumption Time Series DeviceScope: Eine interaktive App zur Erkennung und Lokalisierung von Appliance-Mustern in der Stromverbrauchs-Zeitreihe 设备范围:在电力消耗时间序列中检测和本地化应用程序模式的交互式应用程序 2506.05912v1

Authors (4): Adrien Petralia, Paul Boniol, Philippe Charpentier, Themis Palpanas

In recent years, electricity suppliers have installed millions of smart meters worldwide to improve the management of the smart grid system. These meters collect a large amount of electrical consumption data to produce valuable information to help consumers reduce their electricity footprint. However, having non-expert users (e.g., consumers or sales advisors) understand these data and derive usage patterns for different appliances has become a significant challenge for electricity suppliers because these data record the aggregated behavior of all appliances. At the same time, ground-truth labels (which could train appliance detection and localization models) are expensive to collect and extremely scarce in practice. This paper introduces DeviceScope, an interactive tool designed to facilitate understanding smart meter data by detecting and localizing individual appliance patterns within a given time period. Our system is based on CamAL (Class Activation Map-based Appliance Localization), a novel weakly supervised approach for appliance localization that only requires the knowledge of the existence of an appliance in a household to be trained. This paper appeared in ICDE 2025.



Article 1663

Title@2025-06-06 (5): CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness

Title: CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness CAPability: Ein umfassender Visual Caption Benchmark zur Bewertung von Korrektheit und Gründlichkeit 能力:评估正确性和彻底性的综合视觉能力基准 2502.14914v3

Authors (12): Zhihang Liu, Chen-Wei Xie, Bin Wen, Feiwu Yu, Jixuan Chen, Pandeng Li, Boqiang Zhang, Nianzu Yang, Yinglu Li, Zuan Gao, Yun Zheng, Hongtao Xie

Visual captioning benchmarks have become outdated with the emergence of modern multimodal large language models (MLLMs), as the brief ground-truth sentences and traditional metrics fail to assess detailed captions effectively. While recent benchmarks attempt to address this by focusing on keyword extraction or object-centric evaluation, they remain limited to vague-view or object-view analyses and incomplete visual element coverage. In this paper, we introduce CAPability, a comprehensive multi-view benchmark for evaluating visual captioning across 12 dimensions spanning six critical views. We curate nearly 11K human-annotated images and videos with visual element annotations to evaluate the generated captions. CAPability stably assesses both the correctness and thoroughness of captions with precision and hit metrics. By converting annotations to QA pairs, we further introduce a heuristic metric, "know but cannot tell" ($K\bar{T}$), indicating a significant performance gap between QA and caption capabilities. Our work provides a holistic analysis of MLLMs’ captioning abilities, as we identify their strengths and weaknesses across various dimensions, guiding future research to enhance specific aspects of their capabilities.



Article 1664

Title@2025-06-06 (5): Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants

Title: Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants Brauchen wir Strategien zur Neuausrichtung? Eine theoretische und empirische Studie rund um SMOTE und deren Varianten 我们需要重新平衡战略吗? 围绕SMOTE及其变体进行理论和经验研究。 2402.03819v5

Authors (3): Abdoulaye Sakho, Emmanuel Malherbe, Erwan Scornet

Synthetic Minority Oversampling Technique (SMOTE) is a common rebalancing strategy for handling imbalanced tabular data sets. However, few works analyze SMOTE theoretically. In this paper, we derive several non-asymptotic upper bounds on SMOTE density. From these results, we prove that SMOTE (with default parameter) tends to copy the original minority samples asymptotically. We confirm and illustrate this first theoretical behavior empirically on a real-world data set. Furthermore, we prove that SMOTE density vanishes near the boundary of the support of the minority class distribution. We then adapt SMOTE based on our theoretical findings to introduce two new variants. These strategies are compared on 13 tabular data sets with 10 state-of-the-art rebalancing procedures, including deep generative and diffusion models. One of our key findings is that, for most data sets, applying no rebalancing strategy is competitive in terms of predictive performance, whether with LightGBM, tuned random forests, or logistic regression. However, when the imbalance ratio is artificially augmented, one of our two modifications of SMOTE leads to promising predictive performance compared to SMOTE and other state-of-the-art strategies.
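
For readers unfamiliar with the mechanism being analyzed, the following is a minimal sketch of the standard SMOTE sampling step, not of the two new variants proposed in the paper. The function name and the toy data are hypothetical; `k=5` is the usual default number of neighbors.

```python
import math
import random


def smote_sample(minority, k=5, rng=random):
    """Draw one synthetic point by interpolating a minority sample
    toward one of its k nearest minority neighbors."""
    x = rng.choice(minority)
    others = [p for p in minority if p is not x]
    neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
    z = rng.choice(neighbors)
    w = rng.random()  # interpolation weight in [0, 1)
    return tuple(a + w * (b - a) for a, b in zip(x, z))


# Hypothetical minority-class points in the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
rng = random.Random(0)
synthetic = [smote_sample(minority, k=5, rng=rng) for _ in range(100)]
```

Every synthetic point lies on a segment between two existing minority points. With k fixed and a growing sample, the selected neighbors get ever closer to the seed point, which is consistent with the abstract's statement that SMOTE tends to copy the original minority samples asymptotically.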



Article 1665

Title@2025-06-06 (5): A Driving Regime-Embedded Deep Learning Framework for Modeling Intra-Driver Heterogeneity in Multi-Scale Car-Following Dynamics

Title: A Driving Regime-Embedded Deep Learning Framework for Modeling Intra-Driver Heterogeneity in Multi-Scale Car-Following Dynamics Ein Driving Regime-embedded Deep Learning Framework zur Modellierung von Intra-Driver Heterogenität in multi-Scale Car-Following Dynamics 用于模拟多规模汽车跟踪动态中多规模汽车内部异质性模型的驱动系统(以制度为床位的深学习框架) 2506.05902v1

Authors (6): Shirui Zhou, Jiying Yan, Junfang Tian, Tao Wang, Yongfu Li, Shiquan Zhong

A fundamental challenge in car-following modeling lies in accurately representing the multi-scale complexity of driving behaviors, particularly the intra-driver heterogeneity where a single driver’s actions fluctuate dynamically under varying conditions. While existing models, both conventional and data-driven, address behavioral heterogeneity to some extent, they often emphasize inter-driver heterogeneity or rely on simplified assumptions, limiting their ability to capture the dynamic heterogeneity of a single driver under different driving conditions. To address this gap, we propose a novel data-driven car-following framework that systematically embeds discrete driving regimes (e.g., steady-state following, acceleration, cruising) into vehicular motion predictions. Leveraging high-resolution traffic trajectory datasets, the proposed hybrid deep learning architecture combines Gated Recurrent Units for discrete driving regime classification with Long Short-Term Memory networks for continuous kinematic prediction, unifying discrete decision-making processes and continuous vehicular dynamics to comprehensively represent inter- and intra-driver heterogeneity. Driving regimes are identified using a bottom-up segmentation algorithm and Dynamic Time Warping, ensuring robust characterization of behavioral states across diverse traffic scenarios. Comparative analyses demonstrate that the framework significantly reduces prediction errors for acceleration (maximum MSE improvement reached 58.47%), speed, and spacing metrics while reproducing critical traffic phenomena, such as stop-and-go wave propagation and oscillatory dynamics.
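
Dynamic Time Warping, one of the two tools named above for regime identification, can be sketched as follows. This is the textbook dynamic-programming formulation with an absolute-difference cost; how the paper combines it with bottom-up segmentation is not described in the abstract, and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW between two 1-D sequences with |x - y| as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again (stretch b)
                cost[i][j - 1],      # b[j-1] matched again (stretch a)
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


# The same speed profile, sampled at different rates (hypothetical values).
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
offset = dtw_distance([0, 0], [1, 1])
```

Two segments with the same shape but different durations get distance zero, which is why DTW is a natural fit for comparing driving segments of unequal length.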



Article 1666

Title@2025-06-06 (5): State-Covering Trajectory Stitching for Diffusion Planners

Title: State-Covering Trajectory Stitching for Diffusion Planners State-Covering Trajektorie Stich für Diffusionsplaner 用于扩散规划器的 国家覆盖轨迹切换 2506.00895v2

Authors (2): Kyowoon Lee, Jaesik Choi

Diffusion-based generative models are emerging as powerful tools for long-horizon planning in reinforcement learning (RL), particularly with offline datasets. However, their performance is fundamentally limited by the quality and diversity of training data. This often restricts their generalization to tasks outside their training distribution or longer planning horizons. To overcome this challenge, we propose State-Covering Trajectory Stitching (SCoTS), a novel reward-free trajectory augmentation method that incrementally stitches together short trajectory segments, systematically generating diverse and extended trajectories. SCoTS first learns a temporal distance-preserving latent representation that captures the underlying temporal structure of the environment, then iteratively stitches trajectory segments guided by directional exploration and novelty to effectively cover and expand this latent space. We demonstrate that SCoTS significantly improves the performance and generalization capabilities of diffusion planners on offline goal-conditioned benchmarks requiring stitching and long-horizon reasoning. Furthermore, augmented trajectories generated by SCoTS significantly improve the performance of widely used offline goal-conditioned RL algorithms across diverse environments.



Article 1667

Title@2025-06-06 (5): Few Labels are all you need: A Weakly Supervised Framework for Appliance Localization in Smart-Meter Series

Title: Few Labels are all you need: A Weakly Supervised Framework for Appliance Localization in Smart-Meter Series Wenige Labels sind alles, was Sie brauchen: Ein schwach überwachtes Framework für die Gerätelokalisierung in der Smart-Meter-Serie 少数标签是您所需要的:智能- 密特系列中受微弱监督的应用程序本地化框架 2506.05895v1

Authors (4): Adrien Petralia, Paul Boniol, Philippe Charpentier, Themis Palpanas

Improving smart grid system management is crucial in the fight against climate change, and enabling consumers to play an active role in this effort is a significant challenge for electricity suppliers. In this regard, millions of smart meters have been deployed worldwide in the last decade, recording the main electricity power consumed in individual households. This data produces valuable information that can help them reduce their electricity footprint; nevertheless, the collected signal aggregates the consumption of the different appliances running simultaneously in the house, making it difficult to apprehend. Non-Intrusive Load Monitoring (NILM) refers to the challenge of estimating the power consumption, pattern, or on/off state activation of individual appliances using the main smart meter signal. Recent methods proposed to tackle this task are based on a fully supervised deep-learning approach that requires both the aggregate signal and the ground truth of individual appliance power. However, such labels are expensive to collect and extremely scarce in practice, as they require conducting intrusive surveys in households to monitor each appliance. In this paper, we introduce CamAL, a weakly supervised approach for appliance pattern localization that only requires information on the presence of an appliance in a household to be trained. CamAL merges an ensemble of deep-learning classifiers combined with an explainable classification method to be able to localize appliance patterns. Our experimental evaluation, conducted on 4 real-world datasets, demonstrates that CamAL significantly outperforms existing weakly supervised baselines and that current SotA fully supervised NILM approaches require significantly more labels to reach CamAL performances. The source of our experiments is available at: https://github.com/adrienpetralia/CamAL. This paper appeared in ICDE 2025.



Article 1668

Title@2025-06-06 (5): Policy Optimization for Continuous-time Linear-Quadratic Graphon Mean Field Games

Title: Policy Optimization for Continuous-time Linear-Quadratic Graphon Mean Field Games Politikoptimierung für lineare-Quadratic Graphon Mean Field Games mit kontinuierlicher Zeit 连续时线性二次曲线平面平面运动会的政策优化 2506.05894v1

Authors (2): Philipp Plank, Yufei Zhang

Multi-agent reinforcement learning, despite its popularity and empirical success, faces significant scalability challenges in large-population dynamic games. Graphon mean field games (GMFGs) offer a principled framework for approximating such games while capturing heterogeneity among players. In this paper, we propose and analyze a policy optimization framework for continuous-time, finite-horizon linear-quadratic GMFGs. Exploiting the structural properties of GMFGs, we design an efficient policy parameterization in which each player’s policy is represented as an affine function of their private state, with a shared slope function and player-specific intercepts. We develop a bilevel optimization algorithm that alternates between policy gradient updates for best-response computation under a fixed population distribution, and distribution updates using the resulting policies. We prove linear convergence of the policy gradient steps to best-response policies and establish global convergence of the overall algorithm to the Nash equilibrium. The analysis relies on novel landscape characterizations over infinite-dimensional policy spaces. Numerical experiments demonstrate the convergence and robustness of the proposed algorithm under varying graphon structures, noise levels, and action frequencies.



Article 1669

Title: Open Your Eyes: Vision Enhances Message Passing Neural Networks in Link Prediction Öffnen Sie Ihre Augen: Vision verbessert die Botschaft an neuralen Netzwerken in der Link-Vorhersage ” 睁开眼睛:愿景在链接预测中加强信息传递神经网络 “ 2505.08266v3

Authors (8): Yanbin Wei, Xuehao Wang, Zhan Zhuang, Yang Chen, Shuhao Chen, Yulong Zhang, Yu Zhang, James Kwok

Message-passing graph neural networks (MPNNs) and structural features (SFs) are cornerstones for the link prediction task. However, as a common and intuitive mode of understanding, the potential of visual perception has been overlooked in the MPNN community. For the first time, we equip MPNNs with vision structural awareness by proposing an effective framework called Graph Vision Network (GVN), along with a more efficient variant (E-GVN). Extensive empirical results demonstrate that with the proposed frameworks, GVN consistently benefits from the vision enhancement across seven link prediction datasets, including challenging large-scale graphs. Such improvements are compatible with existing state-of-the-art (SOTA) methods and GVNs achieve new SOTA results, thereby underscoring a promising novel direction for link prediction.



Article 1670

Title@2025-06-06 (5): An Uncertainty-Aware ED-LSTM for Probabilistic Suffix Prediction

Title: An Uncertainty-Aware ED-LSTM for Probabilistic Suffix Prediction Eine unsichere ED-LSTM für probabilistische Suffix-Vorhersage 用于概率后置物后置物预测的不确定性( ED-LSTM) 的不确定性警告 ED-LSTM 2505.21339v2

Authors (3): Henryk Mustroph, Michel Kunkler, Stefanie Rinderle-Ma

Suffix prediction of business processes forecasts the remaining sequence of events until process completion. Current approaches focus on predicting the most likely suffix, representing a single scenario. However, when the future course of a process is subject to uncertainty and high variability, the expressiveness of such a single scenario can be limited, since other possible scenarios, which together may have a higher overall probability, are overlooked. To address this limitation, we propose probabilistic suffix prediction, a novel approach that approximates a probability distribution of suffixes. The proposed approach is based on an Uncertainty-Aware Encoder-Decoder LSTM (U-ED-LSTM) and a Monte Carlo (MC) suffix sampling algorithm. We capture epistemic uncertainties via MC dropout and aleatoric uncertainties as learned loss attenuation. This technical report presents a comprehensive evaluation of the probabilistic suffix prediction approach’s predictive performance and calibration under three different hyperparameter settings, using four real-life and one artificial event log. The results show that: i) probabilistic suffix prediction can outperform most likely suffix prediction, the U-ED-LSTM has reasonable predictive performance, and ii) the model’s predictions are well calibrated.



Article 1671

Title@2025-06-06 (5): Variational Inference for Quantum HyperNetworks

Title: Variational Inference for Quantum HyperNetworks Variationelle Schlussfolgerung für Quantum HyperNetworks 量子超网络的变异推推推 2506.05888v1

Authors (5): Luca Nepote, Alix Lhéritier, Nicolas Bondoux, Marios Kountouris, Maurizio Filippone

Binary Neural Networks (BiNNs), which employ single-bit precision weights, have emerged as a promising solution to reduce memory usage and power consumption while maintaining competitive performance in large-scale systems. However, training BiNNs remains a significant challenge due to the limitations of conventional training algorithms. Quantum HyperNetworks offer a novel paradigm for enhancing the optimization of BiNN by leveraging quantum computing. Specifically, a Variational Quantum Algorithm is employed to generate binary weights through quantum circuit measurements, while key quantum phenomena such as superposition and entanglement facilitate the exploration of a broader solution space. In this work, we establish a connection between this approach and Bayesian inference by deriving the Evidence Lower Bound (ELBO), when direct access to the output distribution is available (i.e., in simulations), and introducing a surrogate ELBO based on the Maximum Mean Discrepancy (MMD) metric for scenarios involving implicit distributions, as commonly encountered in practice. Our experimental results demonstrate that the proposed methods outperform standard Maximum Likelihood Estimation (MLE), improving trainability and generalization.
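
The Maximum Mean Discrepancy mentioned above compares two distributions using samples only, which is what makes it usable when the output distribution is implicit. Below is a minimal sketch of the standard biased estimator of squared MMD with a Gaussian kernel. The kernel choice, the bandwidth, and the function names are assumptions, and the paper's surrogate ELBO is not reproduced here.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased sample estimate: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical samples: identical sets versus well-separated sets.
samples = [(0.0, 1.0), (1.0, 0.0)]
far_away = [(5.0, 5.0), (6.0, 6.0)]
same = mmd_squared(samples, samples)
different = mmd_squared(samples, far_away)
```

The estimate is zero for identical sample sets and grows as the two sets separate, and it needs only draws from each distribution, never their densities.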



Article 1672

Title@2025-06-06 (5): NILMFormer: Non-Intrusive Load Monitoring that Accounts for Non-Stationarity

Title: NILMFormer: Non-Intrusive Load Monitoring that Accounts for Non-Stationarity NILMFormer: Non-Intrusive Load Monitoring, die Konten für Non-Stationarität NILMFormer: 非入侵性账户的无侵入性负载监测 2506.05880v1

Authors (4): Adrien Petralia, Philippe Charpentier, Youssef Kadhi, Themis Palpanas

Millions of smart meters have been deployed worldwide, collecting the total power consumed by individual households. Based on these data, electricity suppliers offer their clients energy monitoring solutions to provide feedback on the consumption of their individual appliances. Historically, such estimates have relied on statistical methods that use coarse-grained total monthly consumption and static customer data, such as appliance ownership. Non-Intrusive Load Monitoring (NILM) is the problem of disaggregating a household’s collected total power consumption to retrieve the consumed power for individual appliances. Current state-of-the-art (SotA) solutions for NILM are based on deep-learning (DL) and operate on subsequences of an entire household consumption reading. However, the non-stationary nature of real-world smart meter data leads to a drift in the data distribution within each segmented window, which significantly affects model performance. This paper introduces NILMFormer, a Transformer-based architecture that incorporates a new subsequence stationarization/de-stationarization scheme to mitigate the distribution drift and that uses a novel positional encoding that relies only on the subsequence’s timestamp information. Experiments with 4 real-world datasets show that NILMFormer significantly outperforms the SotA approaches. Our solution has been deployed as the backbone algorithm for EDF’s (Électricité de France) consumption monitoring service, delivering detailed insights to millions of customers about their individual appliances’ power consumption. This paper appeared in KDD 2025.
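
One common way to stationarize a window is to remove its own mean and scale before the model sees it, then restore them on the output. The sketch below shows that generic per-window scheme. It is an assumption about the general idea only: the abstract does not describe NILMFormer's actual scheme, and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def stationarize(window, eps=1e-8):
    """Normalize one subsequence with its own mean and standard deviation."""
    mu = fmean(window)
    sigma = pstdev(window) + eps  # eps guards against constant windows
    return [(v - mu) / sigma for v in window], (mu, sigma)


def destationarize(values, stats):
    """Map values from normalized space back to the window's original scale."""
    mu, sigma = stats
    return [v * sigma + mu for v in values]


# Hypothetical aggregate power readings (watts) with one appliance spike.
window = [100.0, 120.0, 2000.0, 110.0]
normalized, stats = stationarize(window)
restored = destationarize(normalized, stats)
```

Each window then has roughly zero mean and unit variance regardless of the household's baseline load, while the stored statistics let predictions be expressed in watts again.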



Article 1673

Title@2025-06-06 (5): Infinite-Dimensional Diffusion Models

Title: Infinite-Dimensional Diffusion Models Infinite-Dimensional Diffusionsmodelle 无限不同传播模型 2302.10130v3

Authors (4): Jakiw Pidstrigach, Youssef Marzouk, Sebastian Reich, Sven Wang

Diffusion models have had a profound impact on many application areas, including those where data are intrinsically infinite-dimensional, such as images or time series. The standard approach is first to discretize and then to apply diffusion models to the discretized data. While such approaches are practically appealing, the performance of the resulting algorithms typically deteriorates as discretization parameters are refined. In this paper, we instead directly formulate diffusion-based generative models in infinite dimensions and apply them to the generative modelling of functions. We prove that our formulations are well posed in the infinite-dimensional setting and provide dimension-independent distance bounds from the sample to the target measure. Using our theory, we also develop guidelines for the design of infinite-dimensional diffusion models. For image distributions, these guidelines are in line with current canonical choices. For other distributions, however, we can improve upon these canonical choices. We demonstrate these results both theoretically and empirically, by applying the algorithms to data distributions on manifolds and to distributions arising in Bayesian inverse problems or simulation-based inference.



Article 1674

Title@2025-06-06 (5): A projection-based framework for gradient-free and parallel learning

Title: A projection-based framework for gradient-free and parallel learning Ein projektionsbasierter Rahmen für gradientenfreies und paralleles Lernen 以预测为基础的无梯度和平行学习框架 2506.05878v1

Authors (4): Andreas Bergmeister, Manish Krishan Lal, Stefanie Jegelka, Suvrit Sra

We present a feasibility-seeking approach to neural network training. This mathematical optimization framework is distinct from conventional gradient-based loss minimization and uses projection operators and iterative projection algorithms. We reformulate training as a large-scale feasibility problem: finding network parameters and states that satisfy local constraints derived from its elementary operations. Training then involves projecting onto these constraints, a local operation that can be parallelized across the network. We introduce PJAX, a JAX-based software framework that enables this paradigm. PJAX composes projection operators for elementary operations, automatically deriving the solution operators for the feasibility problems (akin to autodiff for derivatives). It inherently supports GPU/TPU acceleration, provides a familiar NumPy-like API, and is extensible. We train diverse architectures (MLPs, CNNs, RNNs) on standard benchmarks using PJAX, demonstrating its functionality and generality. Our results show that this approach is a compelling alternative to gradient-based training, with clear advantages in parallelism and the ability to handle non-differentiable operations.
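
To make "feasibility by iterated projection" concrete, here is a toy sketch that finds a point satisfying two constraints at once by alternately projecting onto each constraint set. It does not use PJAX or reproduce its operators; the two sets (a line and a box in the plane) and all function names are hypothetical, chosen only because their projections have closed forms.

```python
def project_onto_line(p):
    """Nearest point on the line x + y = 2."""
    x, y = p
    shift = (x + y - 2.0) / 2.0
    return (x - shift, y - shift)


def project_onto_box(p):
    """Nearest point in the unit box [0, 1] x [0, 1]."""
    x, y = p
    return (min(max(x, 0.0), 1.0), min(max(y, 0.0), 1.0))


def alternating_projections(p, steps=50):
    """Repeatedly project onto each constraint; no gradients are computed."""
    for _ in range(steps):
        p = project_onto_box(project_onto_line(p))
    return p


# The only point satisfying both constraints is (1, 1).
solution = alternating_projections((3.0, -2.0))
```

Each projection touches only one constraint, which hints at why the abstract describes the operation as local and parallelizable, and neither projection requires the constraint to be differentiable.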



Article 1675

Title@2025-06-06 (5): Interpretable Clustering Ensemble

Title: Interpretable Clustering Ensemble Interpretiertes Cluster-Ensemble 可解释的集群组合组合 2506.05877v1

Authors (5): Hang Lv, Lianyu Hu, Mudi Jiang, Xinying Liu, Zengyou He

Clustering ensemble has emerged as an important research topic in the field of machine learning. Although numerous methods have been proposed to improve clustering quality, most existing approaches overlook the need for interpretability in high-stakes applications. In domains such as medical diagnosis and financial risk assessment, algorithms must not only be accurate but also interpretable to ensure transparent and trustworthy decision-making. Therefore, to address the lack of interpretable algorithms in the field of clustering ensemble, we propose the first interpretable clustering ensemble algorithm in the literature. By treating base partitions as categorical variables, our method constructs a decision tree in the original feature space and uses a statistical association test to guide the tree-building process. Experimental results demonstrate that our algorithm achieves performance comparable to state-of-the-art (SOTA) clustering ensemble methods while additionally providing interpretability. To the best of our knowledge, this is the first interpretable algorithm specifically designed for clustering ensemble, offering a new perspective for future research in interpretable clustering.



Article 1676

Title@2025-06-06 (5): BestServe: Serving Strategies with Optimal Goodput in Collocation and Disaggregation Architectures

Title: BestServe: Serving Strategies with Optimal Goodput in Collocation and Disaggregation Architectures BestServe: Servierstrategien mit optimalem Goodput in Kollokations- und Disaggregationsarchitekturen 最佳服务:在合用和拆分结构中采用最优优利用的优良战略 2506.05871v1

Authors (6): Xiannan Hu, Tianyou Zeng, Xiaoming Yuan, Liwei Song, Guangyuan Zhang, Bangzheng He

Serving large language models (LLMs) to millions of users requires efficient resource allocation and parallelism strategies. Finding such a strategy is a labor-intensive trial-and-error process. We present BestServe, a novel framework for ranking serving strategies by estimating goodput under various operating scenarios. Supporting both collocated and disaggregated architectures, BestServe leverages an inference simulator built on an adapted roofline model and CPU-GPU dispatch dynamics. Our framework determines the optimal strategy in minutes on a single standard CPU, eliminating the need for costly benchmarking, while achieving predictions within a 20% error margin. Its lightweight design and strong extensibility make it practical for rapid deployment planning.
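
The basic roofline estimate that such a simulator could build on is short enough to sketch. This is the plain textbook roofline, not BestServe's adapted version, whose details the abstract does not give. The hardware figures and the function name are hypothetical.

```python
def roofline_time(flops, bytes_moved, peak_flops, bandwidth):
    """Estimated execution time: the slower of compute and memory traffic."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / bandwidth
    return max(compute_time, memory_time)


# Hypothetical accelerator: 100 TFLOP/s peak compute, 1 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
BANDWIDTH = 1e12

# A compute-heavy operation and a memory-heavy operation (made-up sizes).
compute_bound = roofline_time(2e12, 1e9, PEAK_FLOPS, BANDWIDTH)
memory_bound = roofline_time(1e9, 2e10, PEAK_FLOPS, BANDWIDTH)
```

Summing such per-operation estimates over a request gives a latency prediction from arithmetic alone, which is why this style of simulation can run on a CPU without benchmarking the target hardware.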



Article 1677

Title: Loss Functions for Predictor-based Neural Architecture Search Verlustfunktionen für vorausschauende neurale Architektursuche 预测基于神经建筑搜索的损失函数 2506.05869v1

Authors (4): Han Ji, Yuqi Feng, Jiahao Fan, Yanan Sun

Evaluation is a critical but costly procedure in neural architecture search (NAS). Performance predictors have been widely adopted to reduce evaluation costs by directly estimating architecture performance. The effectiveness of predictors is heavily influenced by the choice of loss functions. While traditional predictors employ regression loss functions to evaluate the absolute accuracy of architectures, recent approaches have explored various ranking-based loss functions, such as pairwise and listwise ranking losses, to focus on the ranking of architecture performance. Despite their success in NAS, the effectiveness and characteristics of these loss functions have not been thoroughly investigated. In this paper, we conduct the first comprehensive study on loss functions in performance predictors, categorizing them into three main types: regression, ranking, and weighted loss functions. Specifically, we assess eight loss functions using a range of NAS-relevant metrics on 13 tasks across five search spaces. Our results reveal that specific categories of loss functions can be effectively combined to enhance predictor-based NAS. Furthermore, our findings could provide practical guidance for selecting appropriate loss functions for various tasks. We hope this work provides meaningful insights to guide the development of loss functions for predictor-based methods in the NAS community.
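
To make the distinction between regression and ranking losses concrete, here is a minimal sketch in plain Python. The function names, the margin value, and the example accuracies are assumptions chosen for illustration; they are not taken from the paper, which studies eight loss functions that are not listed in the abstract.

```python
def mse_loss(pred, target):
    """Regression loss: penalizes error in the absolute predicted accuracy."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def pairwise_hinge_loss(pred, target, margin=0.1):
    """Pairwise ranking loss: penalizes pairs ordered differently from the targets."""
    total, pairs = 0.0, 0
    for i in range(len(pred)):
        for j in range(len(pred)):
            if target[i] > target[j]:  # architecture i is truly better than j
                total += max(0.0, margin - (pred[i] - pred[j]))
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical predictor outputs: far from the true values but correctly ordered.
true_acc = [0.90, 0.92, 0.95]
pred_acc = [0.40, 0.60, 0.80]

regression_penalty = mse_loss(pred_acc, true_acc)
ranking_penalty = pairwise_hinge_loss(pred_acc, true_acc)
```

The predictions above incur a sizeable regression penalty yet no ranking penalty, because every pair is ordered correctly with room to spare. For a search procedure that only needs to pick the better architecture, the ranking view may be the more relevant one.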



Article 1678

Title@2025-06-06 (5): Stealix: Model Stealing via Prompt Evolution

Title: Stealix: Model Stealing via Prompt Evolution Stealix: Modelldiebstahl über Prompt Evolution Stenyix: 通过快速进化偷盗模型 2506.05867v1

Authors (4): Zhixiong Zhuang, Hui-Po Wang, Maria-Irina Nicolae, Mario Fritz

Model stealing poses a significant security risk in machine learning by enabling attackers to replicate a black-box model without access to its training data, thus jeopardizing intellectual property and exposing sensitive information. Recent methods that use pre-trained diffusion models for data synthesis improve efficiency and performance but rely heavily on manually crafted prompts, limiting automation and scalability, especially for attackers with little expertise. To assess the risks posed by open-source pre-trained models, we propose a more realistic threat model that eliminates the need for prompt design skills or knowledge of class names. In this context, we introduce Stealix, the first approach to perform model stealing without predefined prompts. Stealix uses two open-source pre-trained models to infer the victim model’s data distribution, and iteratively refines prompts through a genetic algorithm, progressively improving the precision and diversity of synthetic images. Our experimental results demonstrate that Stealix significantly outperforms other methods, even those with access to class names or fine-grained prompts, while operating under the same query budget. These findings highlight the scalability of our approach and suggest that the risks posed by pre-trained generative models in model stealing may be greater than previously recognized.



Article 1679

Title@2025-06-06 (5): Enhancing pretraining efficiency for medical image segmentation via transferability metrics

Title: Enhancing pretraining efficiency for medical image segmentation via transferability metrics Steigerung der Vorschulungseffizienz für die Segmentierung medizinischer Bilder durch Übertragbarkeitsmetriken 通过可转让性指标,提高医疗图象分形的训练前效率 2410.18677v2

Authors (3): Gábor Hidy, Bence Bakos, András Lukács

In medical image segmentation tasks, the scarcity of labeled training data poses a significant challenge when training deep neural networks. When using U-Net-style architectures, it is common practice to address this problem by pretraining the encoder part on a large general-purpose dataset like ImageNet. However, these methods are resource-intensive and do not guarantee improved performance on the downstream task. In this paper we investigate a variety of training setups on medical image segmentation datasets, using ImageNet-pretrained models. By examining over 300 combinations of models, datasets, and training methods, we find that shorter pretraining often leads to better results on the downstream task, providing additional proof to the well-known fact that the accuracy of the model on ImageNet is a poor indicator for downstream performance. As our main contribution, we introduce a novel transferability metric, based on contrastive learning, that measures how robustly a pretrained model is able to represent the target data. In contrast to other transferability scores, our method is applicable to the case of transferring from ImageNet classification to medical image segmentation. We apply our robustness score by measuring it throughout the pretraining phase to indicate when the model weights are optimal for downstream transfer. This reduces pretraining time and improves results on the target task.



Article 1680

Title@2025-06-06 (5): Conditioning Diffusions Using Malliavin Calculus

Title: Conditioning Diffusions Using Malliavin Calculus Konditionierung Diffusionen mit Malliavin Calculus 使用 Malliavin Calculus 有条件的传播 2504.03461v2

Authors (5): Jakiw Pidstrigach, Elizabeth Baker, Carles Domingo-Enrich, George Deligiannidis, Nikolas Nüsken

In generative modelling and stochastic optimal control, a central computational task is to modify a reference diffusion process to maximise a given terminal-time reward. Most existing methods require this reward to be differentiable, using gradients to steer the diffusion towards favourable outcomes. However, in many practical settings, like diffusion bridges, the reward is singular, taking an infinite value if the target is hit and zero otherwise. We introduce a novel framework, based on Malliavin calculus and centred around a generalisation of the Tweedie score formula to nonlinear stochastic differential equations, that enables the development of methods robust to such singularities. This allows our approach to handle a broad range of applications, like diffusion bridges, or adding conditional controls to an already trained diffusion model. We demonstrate that our approach offers stable and reliable training, outperforming existing techniques. As a byproduct, we also introduce a novel score matching objective. Our loss functions are formulated such that they could readily be extended to manifold-valued and infinite dimensional diffusions.



Article 1681

Title@2025-06-06 (5): Improved Allergy Wheal Detection for the Skin Prick Automated Test Device

Title: Improved Allergy Wheal Detection for the Skin Prick Automated Test Device Verbesserte Allergikererkennung für das Automatisierte Testgerät Skin Prick 改进皮肤皮板自动自动测试装置的过敏 wHeal 检测 2506.05862v1

Authors (24): Rembert Daems, Sven Seys, Valérie Hox, Adam Chaker, Glynnis De Greve, Winde Lemmens, Anne-Lise Poirrier, Eline Beckers, Zuzana Diamant, Carmen Dierickx, Peter W. Hellings, Caroline Huart, Claudia Jerin, Mark Jorissen, Hanne Oscé, Karolien Roux, Mark Thompson, Sophie Tombu, Saartje Uyttebroek, Andrzej Zarowski, Senne Gorris, Laura Van Gerven, Dirk Loeckx, Thomas Demeester

Background: The skin prick test (SPT) is the gold standard for diagnosing sensitization to inhalant allergies. The Skin Prick Automated Test (SPAT) device was designed for increased consistency in test results, and captures 32 images to be jointly used for allergy wheal detection and delineation, which leads to a diagnosis. Materials and Methods: Using SPAT data from $868$ patients with suspected inhalant allergies, we designed an automated method to detect and delineate wheals on these images. To this end, $10,416$ wheals were manually annotated by drawing detailed polygons along the edges. The unique data-modality of the SPAT device, with $32$ images taken under distinct lighting conditions, requires a custom-made approach. Our proposed method consists of two parts: a neural network component that segments the wheals on the pixel level, followed by an algorithmic and interpretable approach for detecting and delineating the wheals. Results: We evaluate the performance of our method on a hold-out validation set of $217$ patients. As a baseline we use a single conventionally lighted image per SPT as input to our method. Conclusion: Using the $32$ SPAT images under various lighting conditions offers a considerably higher accuracy than a single image in conventional, uniform light.



Article 1682

Title@2025-06-06 (5): Wavelet-based Disentangled Adaptive Normalization for Non-stationary Times Series Forecasting

Title: Wavelet-based Disentangled Adaptive Normalization for Non-stationary Times Series Forecasting Wavelet-basiert Disentangled Adaptive Normalisierung für nicht-stationäre Zeitreihen Forecasting 用于非静止时报系列预测的不固定时报调适正常化 2506.05857v1

Authors (9): Junpeng Lin, Tian Lan, Bo Zhang, Ke Lin, Dandan Miao, Huiru He, Jiantao Ye, Chen Zhang, Yan-fu Li

Forecasting non-stationary time series is a challenging task because their statistical properties often change over time, making it hard for deep models to generalize well. Instance-level normalization techniques can help address shifts in temporal distribution. However, most existing methods overlook the multi-component nature of time series, where different components exhibit distinct non-stationary behaviors. In this paper, we propose Wavelet-based Disentangled Adaptive Normalization (WDAN), a model-agnostic framework designed to address non-stationarity in time series forecasting. WDAN uses discrete wavelet transforms to break down the input into low-frequency trends and high-frequency fluctuations. It then applies tailored normalization strategies to each part. For trend components that exhibit strong non-stationarity, we apply first-order differencing to extract stable features used for predicting normalization parameters. Extensive experiments on multiple benchmarks demonstrate that WDAN consistently improves forecasting accuracy across various backbone models. Code is available at this repository: https://github.com/MonBG/WDAN.
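
The decomposition step can be sketched with a one-level Haar transform followed by first-order differencing of the trend. The Haar wavelet, the single decomposition level, and the toy series are assumptions made for this illustration; the abstract does not state which wavelet family WDAN uses or how the normalization parameters are predicted from the differenced features.

```python
import math

def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]  # low-frequency trend
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]  # high-frequency fluctuation
    return approx, detail

def first_difference(x):
    """First-order differencing, which removes a linear trend."""
    return [b - a for a, b in zip(x, x[1:])]

# Hypothetical series: a linear trend plus an alternating fluctuation.
series = [t + (-0.5 if t % 2 == 0 else 0.5) for t in range(8)]
trend, fluctuation = haar_dwt(series)
trend_diff = first_difference(trend)
```

On this toy input the trend coefficients grow steadily while their first differences are constant, which hints at why differencing can yield more stable features than the raw trend.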



Article 1683

Title@2025-06-06 (5): TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research

Title: TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research TinySQL: Ein progressiver Text-zu-SQL-Datensatz für die mechanistische Interpretationsforschung TinySQL: 用于机械解释性研究的渐进文本到SQL数据集 2503.12730v3

Authors (6): Abir Harrasse, Philip Quirke, Clement Neo, Dhruv Nathawani, Luke Marks, Amir Abdullah

Mechanistic interpretability research faces a gap between analyzing simple circuits in toy tasks and discovering features in large models. To bridge this gap, we propose text-to-SQL generation as an ideal task to study, as it combines the formal structure of toy tasks with real-world complexity. We introduce TinySQL, a synthetic dataset, progressing from basic to advanced SQL operations, and train models ranging from 33M to 1B parameters to establish a comprehensive testbed for interpretability. We apply multiple complementary interpretability techniques, including Edge Attribution Patching and Sparse Autoencoders, to identify minimal circuits and components supporting SQL generation. We compare circuits for different SQL subskills, evaluating their minimality, reliability, and identifiability. Finally, we conduct a layerwise logit lens analysis to reveal how models compose SQL queries across layers: from intent recognition to schema resolution to structured generation. Our work provides a robust framework for probing and comparing interpretability methods in a structured, progressively complex setting.



Article 1684

Title@2025-06-06 (5): BoA: Attention-aware Post-training Quantization without Backpropagation

Title: BoA: Attention-aware Post-training Quantization without Backpropagation BoA: Aufmerksamkeitsbewusste Nachschulung Quantisierung ohne Backpropagation BoA: 注意培训后量化而不反言反言 2406.13474v3

Authors (6): Junhan Kim, Ho-young Kim, Eulrang Cho, Chungman Lee, Joonyoung Kim, Yongkweon Jeon

Post-training quantization (PTQ) is a promising solution for deploying large language models (LLMs) on resource-constrained devices. Early methods developed for small-scale networks, such as ResNet, rely on gradient-based optimization, which becomes impractical for hyper-scale LLMs with billions of parameters. While recently proposed backpropagation-free or transformation-based methods alleviate this issue, they ignore inter-layer interactions or use the naive nearest-rounding-based quantized weight assignment to save the heavy computational cost of weight optimization. In this paper, we introduce a novel backpropagation-free PTQ algorithm that optimizes quantized weights by considering inter-layer dependencies. The key innovation is the development of attention-aware Hessian matrices that capture inter-layer interactions within the attention module. Extensive experiments demonstrate that our approach not only outperforms existing weight quantization methods but also shows good synergy with conventional methods to suppress activation outliers, leading to state-of-the-art weight-activation quantization performance. The code will be available at https://github.com/SamsungLabs/BoA.
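
For reference, the naive nearest-rounding weight assignment that the abstract contrasts against can be sketched as follows. This is a generic symmetric uniform quantizer written for illustration, not the BoA algorithm; the function name, the 4-bit setting, and the example weights are assumptions.

```python
def quantize_rtn(weights, num_bits=4):
    """Uniform symmetric round-to-nearest quantization of a list of floats."""
    qmax = 2 ** (num_bits - 1) - 1                      # e.g. 7 for 4 bits
    scale = max(abs(w) for w in weights) / qmax or 1.0  # fall back to 1.0 if all weights are zero
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale

# Hypothetical weight vector.
weights = [0.7, -0.32, 0.12, 0.0]
codes, dequantized, scale = quantize_rtn(weights)
```

Each weight is rounded independently here, so the rounding error of one weight is never compensated by another. Methods that optimize quantized weights jointly, using Hessian information for example, aim to improve on exactly this independence.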



Article 1685

Title@2025-06-06 (5): Training-Free Query Optimization via LLM-Based Plan Similarity

Title: Training-Free Query Optimization via LLM-Based Plan Similarity Schulungsfreie Abfrage Optimierung über LLM-basierte Plan-Ähnlichkeit 通过LLM-基于LLM-计划相似性实现无培训查询优化 2506.05853v1

Authors (3): Nikita Vasilenko, Alexander Demin, Vladimir Boorlakov

Large language model (LLM) embeddings offer a promising new avenue for database query optimization. In this paper, we explore how pre-trained execution plan embeddings can guide SQL query execution without the need for additional model training. We introduce LLM-PM (LLM-based Plan Mapping), a framework that embeds the default execution plan of a query, finds its k nearest neighbors among previously executed plans, and recommends database hintsets based on neighborhood voting. A lightweight consistency check validates the selected hint, while a fallback mechanism searches the full hint space when needed. Evaluated on the JOB-CEB benchmark using OpenGauss, LLM-PM achieves an average query latency reduction of 21%. This work highlights the potential of LLM-powered embeddings to deliver practical improvements in query performance and opens new directions for training-free, embedding-based optimizer guidance systems.
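
The nearest-neighbor voting step might look roughly like the sketch below. Cosine similarity, the three-dimensional embeddings, and the hint-set names are all hypothetical placeholders; the abstract does not specify the distance metric, and the consistency check and fallback search are omitted.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend_hintset(query_emb, history, k=3):
    """Majority vote over the hint sets of the k most similar past plans."""
    ranked = sorted(history, key=lambda item: cosine(query_emb, item[0]), reverse=True)
    votes = Counter(hintset for _, hintset in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical history: (plan embedding, hint set that worked best for that plan).
history = [
    ([1.0, 0.0, 0.1], "disable_nestloop"),
    ([0.9, 0.1, 0.0], "disable_nestloop"),
    ([0.0, 1.0, 0.0], "disable_hashjoin"),
    ([0.1, 0.9, 0.2], "disable_hashjoin"),
    ([0.8, 0.2, 0.1], "default"),
]
best = recommend_hintset([1.0, 0.05, 0.05], history, k=3)
```

No model is trained at any point: the only learned component is the embedding model that produced the vectors, which is what makes the approach training-free from the optimizer's perspective.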



Article 1686

Title@2025-06-06 (5): Regional, Lattice and Logical Representations of Neural Networks

Title: Regional, Lattice and Logical Representations of Neural Networks Regionale, gitter- und logische Vertretungen neuraler Netzwerke 神经网络的区域、拉蒂斯和逻辑代表 2506.05834v1

Authors (2): Sandro Preto, Marcelo Finger

A possible path to the interpretability of neural networks is to (approximately) represent them in the regional format of piecewise linear functions, where regions of inputs are associated with linear functions computing the network outputs. We present an algorithm that translates feedforward neural networks with ReLU activation functions in hidden layers and truncated identity activation functions in the output layer into this format. We also empirically investigate the complexity of the regional representations output by our method for neural networks of varying sizes. Lattice and logical representations of neural networks follow straightforwardly from regional representations as long as they satisfy a specific property, so we empirically investigate to what extent the translations produced by our algorithm satisfy this property.
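
The idea that a ReLU network is linear on each region of inputs can be illustrated on a single hidden layer. The sketch below recovers the affine function that holds around one input point. The weights are made up, the output layer is a plain identity rather than the truncated identity considered in the paper, and this is not the authors' translation algorithm, which has to enumerate regions rather than inspect a single point.

```python
def local_linear_piece(W1, b1, w2, b2, x):
    """Affine function that a one-hidden-layer ReLU net computes around x.

    Returns (pattern, coeffs, offset) such that the net equals
    sum(c * xi) + offset on the region sharing this activation pattern.
    """
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    pattern = [1 if p > 0 else 0 for p in pre]  # which hidden units are active
    coeffs = [sum(a * v * row[j] for a, v, row in zip(pattern, w2, W1))
              for j in range(len(x))]
    offset = b2 + sum(a * v * b for a, v, b in zip(pattern, w2, b1))
    return pattern, coeffs, offset

def forward(W1, b1, w2, b2, x):
    """Ordinary forward pass of the same network, used for comparison."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2

# Hypothetical two-input, two-hidden-unit network.
W1, b1 = [[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0]
w2, b2 = [2.0, -1.0], 0.5
x = [2.0, 1.0]

pattern, coeffs, offset = local_linear_piece(W1, b1, w2, b2, x)
linear_value = sum(c * xi for c, xi in zip(coeffs, x)) + offset
```

Each distinct activation pattern corresponds to one region, so the number of regions can grow quickly with network size, which is presumably why the complexity of the resulting representation is studied empirically.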



Article 1687

Title@2025-06-06 (5): Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning

Title: Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning Mit dem Pfeil der Zeit lernen: Hyperbolische Geometrie für rückseitig-kompatibles Repräsentationslernen 沿时间之箭学习:反向比较代表学习的超单几何学 2506.05826v1

Authors (8): Ngoc Bui, Menglin Yang, Runjin Chen, Leonardo Neves, Mingxuan Ju, Rex Ying, Neil Shah, Tong Zhao

Backward compatible representation learning enables updated models to integrate seamlessly with existing ones, avoiding the need to reprocess stored data. Despite recent advances, existing compatibility approaches in Euclidean space neglect the uncertainty in the old embedding model and force the new model to reconstruct outdated representations regardless of their quality, thereby hindering the learning process of the new model. In this paper, we propose to switch perspectives to hyperbolic geometry, where we treat time as a natural axis for capturing a model’s confidence and evolution. By lifting embeddings into hyperbolic space and constraining updated embeddings to lie within the entailment cone of the old ones, we maintain generational consistency across models while accounting for uncertainties in the representations. To further enhance compatibility, we introduce a robust contrastive alignment loss that dynamically adjusts alignment weights based on the uncertainty of the old embeddings. Experiments validate the superiority of the proposed method in achieving compatibility, paving the way for more resilient and adaptable machine learning systems.



Article 1688

Title@2025-06-06 (5): Domain Generalizable Knowledge Tracing via Concept Aggregation and Relation-Based Attention

Title: Domain Generalizable Knowledge Tracing via Concept Aggregation and Relation-Based Attention Domain Generalizable Knowledge Tracing über Konzeptaggregation und relationsbasierte Aufmerksamkeit 通过概念聚合和基于关系的关注寻找可普遍适用的知识 2407.02547v2

Authors (5): Yuquan Xie, Shengtao Peng, Wanqi Yang, Ming Yang, Yang Gao

Knowledge Tracing (KT) is a critical task in online education systems, aiming to monitor students’ knowledge states throughout a learning period. Common KT approaches involve predicting the probability of a student correctly answering the next question based on their exercise history. However, these methods often suffer from performance degradation when faced with the scarcity of student interactions in new education systems. To address this, we leverage student interactions from existing education systems to mitigate performance degradation caused by limited training data. Nevertheless, these interactions exhibit significant differences since they are derived from different education systems. To address this issue, we propose a domain generalization approach for knowledge tracing, where existing education systems are considered source domains, and new education systems with limited data are considered target domains. Additionally, we design a domain-generalizable knowledge tracing framework (DGKT) that can be applied to any KT model. Specifically, we present a concept aggregation approach designed to reduce conceptual disparities within sequences of student interactions from diverse domains. To further mitigate domain discrepancies, we introduce a novel normalization module called Sequence Instance Normalization (SeqIN). Moreover, to fully leverage exercise information, we propose a new knowledge tracing model tailored for the domain generalization KT task, named Domain-Generalizable Relation-based Knowledge Tracing (DGRKT). Extensive experiments across five benchmark datasets demonstrate that the proposed method performs well despite limited training data.



Article 1689

Title@2025-06-06 (5): SAGE: A Framework of Precise Retrieval for RAG

Title: SAGE: A Framework of Precise Retrieval for RAG SAGE: Ein Rahmen für präzises Retrieval für RAG SAGE:RAG的精确检索框架 2503.01713v3

Authors (3): Jintao Zhang, Guoliang Li, Jinyang Su

Retrieval-augmented generation (RAG) has demonstrated significant proficiency in conducting question-answering (QA) tasks within a specified corpus. Nonetheless, numerous failure instances of RAG in QA still exist. These failures are not solely attributable to the limitations of Large Language Models (LLMs); instead, they predominantly arise from the retrieval of inaccurate information for LLMs due to two limitations: (1) Current RAG methods segment the corpus without considering semantics, making it difficult to find relevant context due to impaired correlation between questions and the segments. (2) There is a trade-off between missing essential context when less context is retrieved and getting irrelevant context when more context is retrieved. In this paper, we introduce a RAG framework (SAGE) to overcome these limitations. First, to address segmentation that ignores semantics, we propose to train a semantic segmentation model. This model is trained to segment the corpus into semantically complete chunks. Second, to ensure that only the most relevant chunks are retrieved while the irrelevant ones are ignored, we design a chunk selection algorithm to dynamically select chunks based on the decreasing speed of the relevance score, leading to a more relevant selection. Third, to further ensure the precision of the retrieved chunks, we propose letting LLMs assess whether retrieved chunks are excessive or lacking and then adjust the amount of context accordingly. Experiments show that SAGE outperforms baselines by 61.25% in the quality of QA on average. Moreover, by avoiding retrieving noisy context, SAGE lowers the cost of the tokens consumed in LLM inference and achieves a 49.41% enhancement in cost efficiency on average. Additionally, our work offers valuable insights for boosting RAG.
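
One plausible reading of selecting chunks by the decreasing speed of the relevance score is to walk down the ranked list and stop at the first sharp drop. The sketch below follows that reading. The relative-drop criterion, the `max_drop` threshold, and the example scores are assumptions for illustration and may differ from SAGE's actual rule.

```python
def select_chunks(scored_chunks, max_drop=0.3, min_k=1):
    """Keep top-ranked chunks until the relevance score falls off sharply.

    scored_chunks: list of (chunk_text, relevance_score) pairs.
    max_drop: hypothetical threshold on the relative drop between neighbours.
    min_k: number of top chunks always kept (must be at least 1).
    """
    ranked = sorted(scored_chunks, key=lambda c: c[1], reverse=True)
    selected = ranked[:min_k]
    for prev, cur in zip(ranked[min_k - 1:], ranked[min_k:]):
        relative_drop = (prev[1] - cur[1]) / prev[1] if prev[1] > 0 else 1.0
        if relative_drop > max_drop:
            break  # scores are decreasing too fast; stop adding context
        selected.append(cur)
    return [text for text, _ in selected]

# Hypothetical relevance scores with a clear gap after the third chunk.
chunks = [("A", 0.92), ("B", 0.88), ("C", 0.85), ("D", 0.40), ("E", 0.38)]
picked = select_chunks(chunks)
```

Unlike a fixed top-k cutoff, the number of chunks returned adapts to the score distribution of each query, which is one way to navigate the trade-off between missing context and retrieving noise.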



Article 1690

Title@2025-06-06 (5): SageAttention2++: A More Efficient Implementation of SageAttention2

Title: SageAttention2++: A More Efficient Implementation of SageAttention2 SageAttention2++: Effizientere Umsetzung von SageAttention2 SageAttention2++:更有效地实施SageAttention2 2505.21136v3

Authors (8): Jintao Zhang, Xiaoming Xu, Jia Wei, Haofeng Huang, Pengle Zhang, Chendong Xiang, Jun Zhu, Jianfei Chen

The efficiency of attention is critical because its time complexity grows quadratically with sequence length. SageAttention2 addresses this by utilizing quantization to accelerate matrix multiplications (Matmul) in attention. To further accelerate SageAttention2, we propose to utilize the faster instruction of FP8 Matmul accumulated in FP16. The instruction is 2x faster than the FP8 Matmul used in SageAttention2. Our experiments show that SageAttention2++ achieves a 3.9x speedup over FlashAttention while maintaining the same attention accuracy as SageAttention2. This means SageAttention2++ effectively accelerates various models, including those for language, image, and video generation, with negligible end-to-end metrics loss. The code will be available at https://github.com/thu-ml/SageAttention.



Article 1691

Title@2025-06-06 (5): Non-Asymptotic Length Generalization

Title: Non-Asymptotic Length Generalization Nicht-Asymptotische Längenverallgemeinerung 非负时长一般化 2506.03085v2

Authors (3): Thomas Chen, Tengyu Ma, Zhiyuan Li

Length generalization is the ability of a learning algorithm to learn a hypothesis that generalizes to inputs longer than those in the training set. In this paper, we provide provable guarantees of length generalization for various classes of functions in an idealized setting. First, we formalize the framework of non-asymptotic length generalization, which requires a computable upper bound for the minimum input length that guarantees length generalization, as a function of the complexity of the ground-truth function under some given complexity measure. We refer to this minimum input length required for length generalization as the length complexity. We show that the Minimum-Complexity Interpolator learning algorithm achieves optimal length complexity. We further show that whether a function class admits non-asymptotic length generalization is equivalent to the decidability of its language equivalence problem, which implies that there is no computable upper bound for the length complexity of Context-Free Grammars. On the positive side, we show that the length complexity of Deterministic Finite Automata is $2n - 2$, where $n$ is the number of states of the ground-truth automaton. Our main results are upper bounds on the length complexity for a subset of a transformer-related function class called C-RASP (Yang & Chiang, 2024). We show that the length complexity of 1-layer C-RASP functions is $O(T^2)$ when the ground-truth function has precision $T$, and that the length complexity of 2-layer C-RASP functions is $O(T^{O(K)})$ when the ground-truth function has precision $T$ and $K$ heads.



Article 1692

Title@2025-06-06 (5): Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants

Title: Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants 高有机铜剂在 lvLiNGAM 中的因果效应识别 2506.05202v2

Authors (5): Daniele Tramontano, Yaroslav Kivva, Saber Salehkaleybar, Mathias Drton, Negar Kiyavash

This paper investigates causal effect identification in latent variable Linear Non-Gaussian Acyclic Models (lvLiNGAM) using higher-order cumulants, addressing two prominent setups that are challenging in the presence of latent confounding: (1) a single proxy variable that may causally influence the treatment and (2) underspecified instrumental variable cases where fewer instruments exist than treatments. We prove that causal effects are identifiable with a single proxy or instrument and provide corresponding estimation methods. Experimental results demonstrate the accuracy and robustness of our approaches compared to existing methods, advancing the theoretical and practical understanding of causal inference in linear systems with latent confounders.



Article 1693

Title@2025-06-06 (5): Taming Knowledge Conflicts in Language Models

Title: Taming Knowledge Conflicts in Language Models Zähmen von Wissenskonflikten in Sprachmodellen 在语言模式中讨论知识冲突 2503.10996v2

Authors (3): Gaotang Li, Yuzhong Chen, Hanghang Tong

Language Models (LMs) often encounter knowledge conflicts when parametric memory contradicts contextual knowledge. Previous works attribute this conflict to the interplay between “memory heads” and “context heads”, attention heads assumed to promote either memory or context exclusively. In this study, we go beyond this fundamental assumption by uncovering a critical phenomenon we term the superposition of contextual information and parametric memory, where highly influential attention heads simultaneously contribute to both memory and context. Building upon this insight, we propose Just Run Twice (JuICE), a test-time attention intervention method that steers LMs toward either parametric beliefs or contextual knowledge without requiring fine-tuning. JuICE identifies a set of reliable attention heads and leverages a dual-run approach to mitigate the superposition effects. Extensive experiments across 11 datasets and 6 model architectures demonstrate that JuICE achieves new state-of-the-art performance and robust generalization, with significant and consistent improvements across different domains under various conflict types. Finally, we theoretically analyze knowledge conflict and the superposition of contextual information and parametric memory in attention heads, which further elucidates the effectiveness of JuICE in these settings. Our code is available at https://github.com/GaotangLi/JUICE.



Article 1694

Title@2025-06-06 (5): Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models

Title: Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models Boolesche Matrix-Logik-Programmierung für aktives Lernen von Genfunktionen in metabolischen Netzwerkmodellen im Genommaßstab 在基因组规模代谢网络模型中积极学习基因功能的布尔矩阵逻辑逻辑编程 2405.06724v4

Authors (4): Lun Ai, Stephen H. Muggleton, Shi-Shun Liang, Geoff S. Baldwin

Reasoning about hypotheses and updating knowledge through empirical observations are central to scientific discovery. In this work, we applied logic-based machine learning methods to drive biological discovery by guiding experimentation. Genome-scale metabolic network models (GEMs) - comprehensive representations of metabolic genes and reactions - are widely used to evaluate genetic engineering of biological systems. However, GEMs often fail to accurately predict the behaviour of genetically engineered cells, primarily due to incomplete annotations of gene interactions. The task of learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To make predictions with GEMs efficiently, we describe a novel approach called Boolean Matrix Logic Programming (BMLP), which leverages Boolean matrices to evaluate large logic programs. We developed a new system, $BMLP_{active}$, which guides cost-effective experimentation and uses interpretable logic programs to encode a state-of-the-art GEM of a model bacterial organism. Notably, $BMLP_{active}$ successfully learned the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models to reliably engineer biological systems for producing useful compounds. It offers a realistic approach to creating a self-driving lab for biological discovery, which would then facilitate microbial engineering for practical applications.



Article 1695

Title@2025-06-06 (5): SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference

Title: SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference SpargeAchtung: Genaue und trainingsfreie Sparse Achtung Beschleunigen jeder Modellinferenz 隔离注意:准确和无培训的分散注意加速任何典型推断 2502.18137v5

Authors (7): Jintao Zhang, Chendong Xiang, Haofeng Huang, Jia Wei, Haocheng Xi, Jun Zhu, Jianfei Chen

An efficient attention implementation is essential for large models because attention has quadratic time complexity. Fortunately, attention commonly exhibits sparsity, i.e., many values in the attention map are near zero, allowing for the omission of corresponding computations. Many studies have utilized the sparse pattern to accelerate attention. However, most existing works focus on optimizing attention within specific models by exploiting certain sparse patterns of the attention map. A universal sparse attention that guarantees both the speedup and end-to-end performance of diverse models remains elusive. In this paper, we propose SpargeAttn, a universal sparse and quantized attention for any model. Our method uses a two-stage online filter: in the first stage, we rapidly and accurately predict the attention map, allowing some matrix multiplications in attention to be skipped. In the second stage, we design an online softmax-aware filter that incurs no extra overhead and skips further matrix multiplications. Experiments show that our method significantly accelerates diverse models, including language, image, and video generation, without sacrificing end-to-end metrics. The code is available at https://github.com/thu-ml/SpargeAttn.



Article 1696

Title@2025-06-06 (5): Overcoming Challenges of Partial Client Participation in Federated Learning : A Comprehensive Review

Title: Overcoming Challenges of Partial Client Participation in Federated Learning : A Comprehensive Review Herausforderungen der Teilkundenbeteiligung am Föderierten Lernen überwinden : Ein umfassender Überblick 克服部分客户参与联邦学习的挑战:全面审查 2506.02887v2

Authors (4): Mrinmay Sen, Shruti Aparna, Rohit Agarwal, Chalavadi Krishna Mohan

Federated Learning (FL) is a learning mechanism that falls under the distributed training umbrella, which collaboratively trains a shared global model without disclosing the raw data from different clients. This paper presents an extensive survey on the impact of partial client participation in federated learning. While much of the existing research focuses on addressing issues such as generalization, robustness, and fairness caused by data heterogeneity under the assumption of full client participation, limited attention has been given to the practical and theoretical challenges arising from partial client participation, which is common in real-world scenarios. This survey provides an in-depth review of existing FL methods designed to cope with partial client participation. We offer a comprehensive analysis supported by theoretical insights and empirical findings, along with a structured categorization of these methods, highlighting their respective advantages and disadvantages.


Article 1697

Title@2025-06-06 (5): BOLD: Boolean Logic Deep Learning

Title: BOLD: Boolean Logic Deep Learning BOLD: Boolean Logic Deep Learning 布尔逻辑深层学习 2405.16339v2

Authors (5): Van Minh Nguyen, Cristian Ocampo, Aymen Askri, Louis Leconte, Ba-Hien Tran

Deep learning is computationally intensive, with significant efforts focused on reducing arithmetic complexity, particularly regarding energy consumption dominated by data movement. While existing literature emphasizes inference, training is considerably more resource-intensive. This paper proposes a novel mathematical principle by introducing the notion of Boolean variation, such that neurons made of Boolean weights and inputs can be trained, for the first time, efficiently in the Boolean domain using Boolean logic instead of gradient descent and real arithmetic. We explore its convergence, conduct extensive experimental benchmarking, and provide a consistent complexity evaluation by considering chip architecture, memory hierarchy, dataflow, and arithmetic precision. Our approach achieves baseline full-precision accuracy in ImageNet classification and surpasses state-of-the-art results in semantic segmentation, with notable performance in image super-resolution and in natural language understanding with transformer-based models. Moreover, it significantly reduces energy consumption during both training and inference.


Article 1698

Title@2025-06-06 (5): Positional Encoding meets Persistent Homology on Graphs

Title: Positional Encoding meets Persistent Homology on Graphs Positionale Kodierung trifft Persistente Homologie auf Graphen 定位编码与图纸上的常同理学相识 2506.05814v1

Authors (3): Yogesh Verma, Amauri H. Souza, Vikas Garg

The local inductive bias of message-passing graph neural networks (GNNs) hampers their ability to exploit key structural information (e.g., connectivity and cycles). Positional encoding (PE) and Persistent Homology (PH) have emerged as two promising approaches to mitigate this issue. PE schemes endow GNNs with location-aware features, while PH methods enhance GNNs with multiresolution topological features. However, a rigorous theoretical characterization of the relative merits and shortcomings of PE and PH has remained elusive. We bridge this gap by establishing that neither paradigm is more expressive than the other, providing novel constructions where one approach fails but the other succeeds. Our insights inform the design of a novel learnable method, PiPE (Persistence-informed Positional Encoding), which is provably more expressive than both PH and PE. PiPE demonstrates strong performance across a variety of tasks (e.g., molecule property prediction, graph classification, and out-of-distribution generalization), thereby advancing the frontiers of graph representation learning. Code is available at https://github.com/Aalto-QuML/PIPE.


Article 1699

Title@2025-06-06 (5): E-LDA: Toward Interpretable LDA Topic Models with Strong Guarantees in Logarithmic Parallel Time

Title: E-LDA: Toward Interpretable LDA Topic Models with Strong Guarantees in Logarithmic Parallel Time E-LDA: Auf dem Weg zu interpretierbaren LDA-Themenmodellen mit starken Garantien in logarithmischer Parallelzeit E-LDA:争取在对数平行时间建立具有有力保证的可解释的LDA专题模型 2506.07747v1

Authors (1): Adam Breuer

In this paper, we provide the first practical algorithms with provable guarantees for the problem of inferring the topics assigned to each document in an LDA topic model. This is the primary inference problem for many applications of topic models in social science, data exploration, and causal inference settings. We obtain this result by showing a novel non-gradient-based, combinatorial approach to estimating topic models. This yields algorithms that converge to near-optimal posterior probability in logarithmic parallel computation time (adaptivity) – exponentially faster than any known LDA algorithm. We also show that our approach can provide interpretability guarantees such that each learned topic is formally associated with a known keyword. Finally, we show that unlike alternatives, our approach can maintain the independence assumptions necessary to use the learned topic model for downstream causal inference methods that allow researchers to study topics as treatments. In terms of practical performance, our approach consistently returns solutions of higher semantic quality than solutions from state-of-the-art LDA algorithms, neural topic models, and LLM-based topic models across a diverse range of text datasets and evaluation parameters.


Article 1700

Title@2025-06-06 (5): Unintended Harms of Value-Aligned LLMs: Psychological and Empirical Insights

Title: Unintended Harms of Value-Aligned LLMs: Psychological and Empirical Insights Unbeabsichtigte Schäden an wertorientierten LLMs: Psychologische und empirische Erkenntnisse 价值观一致的LMMs:心理和实证透视的意外伤害 2506.06404v1

Authors (6): Sooyung Choi, Jaehyeok Lee, Xiaoyuan Yi, Jing Yao, Xing Xie, JinYeong Bak

The application scope of Large Language Models (LLMs) continues to expand, leading to increasing interest in personalized LLMs that align with human values. However, aligning these models with individual values raises significant safety concerns, as certain values may correlate with harmful information. In this paper, we identify specific safety risks associated with value-aligned LLMs and investigate the psychological principles behind these challenges. Our findings reveal two key insights. (1) Value-aligned LLMs are more prone to harmful behavior compared to non-fine-tuned models and exhibit slightly higher risks in traditional safety evaluations than other fine-tuned models. (2) These safety issues arise because value-aligned LLMs genuinely generate text according to the aligned values, which can amplify harmful outcomes. Using a dataset with detailed safety categories, we find significant correlations between value alignment and safety risks, supported by psychological hypotheses. This study offers insights into the “black box” of value alignment and proposes in-context alignment methods to enhance the safety of value-aligned LLMs.


Article 1701

Title@2025-06-06 (5): ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think

Title: ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think ZeroFlow: Katastrophisches Vergessen zu überwinden ist einfacher als Sie denken 零花:克服灾难的遗忘比你想象的容易 2501.01045v4

Authors (7): Tao Feng, Wei Li, Didi Zhu, Hangjie Yuan, Wendi Zheng, Dan Zhang, Jie Tang

Backpropagation provides a generalized configuration for overcoming catastrophic forgetting. Optimizers such as SGD and Adam are commonly used for weight updates in continual learning and continual pre-training. However, access to gradient information is not always feasible in practice due to black-box APIs, hardware constraints, or non-differentiable systems, a challenge we refer to as gradient bans. To bridge this gap, we introduce ZeroFlow, the first benchmark designed to evaluate gradient-free optimization algorithms for overcoming forgetting. ZeroFlow examines a suite of forward pass-based methods across various algorithms, forgetting scenarios, and datasets. Our results show that forward passes alone can be sufficient to mitigate forgetting. We uncover novel optimization principles that highlight the potential of forward pass-based methods in mitigating forgetting, managing task conflicts, and reducing memory demands. Additionally, we propose new enhancements that further improve forgetting resistance using only forward passes. This work provides essential tools and insights to advance the development of forward pass-based methods for continual learning.


Article 1702

Title: Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model Neuraler Zusammenbruch in kumulativen Linkmodellen für Ordinalregression: Eine Analyse mit ungezwungenem Feature Model Ordinal 倒退累积链接模型中的神经折叠:含有不受限制特性模型的分析 2506.05801v1

Authors (3): Chuang Ma, Tomoyuki Obuchi, Toshiyuki Tanaka

A phenomenon known as “Neural Collapse (NC)” in deep classification tasks, in which the penultimate-layer features and the final classifiers exhibit an extremely simple geometric structure, has recently attracted considerable attention, with the expectation that it can deepen our understanding of how deep neural networks behave. The Unconstrained Feature Model (UFM) has been proposed to explain NC theoretically, and a growing body of work extends NC to tasks other than classification and leverages it for practical applications. In this study, we investigate whether a similar phenomenon arises in deep Ordinal Regression (OR) tasks by combining the cumulative link model for OR with the UFM. We show that a phenomenon we call Ordinal Neural Collapse (ONC) indeed emerges and is characterized by the following three properties: (ONC1) all optimal features in the same class collapse to their within-class mean when regularization is applied; (ONC2) these class means align with the classifier, meaning that they collapse onto a one-dimensional subspace; (ONC3) the optimal latent variables (corresponding to logits or preactivations in classification tasks) are aligned according to the class order, and in particular, in the zero-regularization limit, a highly local and simple geometric relationship emerges between the latent variables and the threshold values. We prove these properties analytically within the UFM framework with fixed threshold values and corroborate them empirically across a variety of datasets. We also discuss how these insights can be leveraged in OR, highlighting the use of fixed thresholds.
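
For readers unfamiliar with the cumulative link model mentioned above, its standard textbook form is sketched below. The symbols are assumptions introduced here for illustration and are not taken from the paper: $z$ is the scalar latent variable, $b_1 < \dots < b_{K-1}$ are the thresholds for $K$ ordered classes, and $F$ is a link CDF such as the logistic function.

```latex
% Cumulative link model (standard form; notation is illustrative)
P(y \le k \mid z) = F(b_k - z), \qquad k = 1, \dots, K-1,
\qquad
P(y = k \mid z) = F(b_k - z) - F(b_{k-1} - z),
\quad b_0 = -\infty,\; b_K = +\infty .
```

Under this form, a latent value $z$ lying between $b_{k-1}$ and $b_k$ makes class $k$ the most natural prediction, which is why an ordering of latent variables relative to the thresholds is the quantity of interest.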


Article 1703

Title@2025-06-06 (5): Uncertainty Propagation in the Fast Fourier Transform

Title: Uncertainty Propagation in the Fast Fourier Transform Ungewissheitsverbreitung in der schnellen Fourier-Transformation 快速傅里叶变形中的不确定性传播 2504.10136v2

Authors (3): Luca Schmid, Charlotte Muth, Laurent Schmalen

We address the problem of uncertainty propagation in the discrete Fourier transform by modeling the fast Fourier transform as a factor graph. Building on this representation, we propose an efficient framework for approximate Bayesian inference using belief propagation (BP) and expectation propagation, extending its applicability beyond Gaussian assumptions. By leveraging an appropriate BP message representation and a suitable schedule, our method achieves stable convergence with accurate mean and variance estimates. Numerical experiments in representative scenarios from communications demonstrate the practical potential of the proposed framework for uncertainty-aware inference in probabilistic systems operating across both the time and frequency domains.
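
As a point of reference, the purely Gaussian case that the abstract aims to go beyond can be written in closed form, because the DFT is linear. The sketch below is not the paper's factor-graph method; it only shows the exact linear-Gaussian baseline, and the function names `dft_matrix` and `propagate_gaussian` are hypothetical.

```python
import numpy as np

def dft_matrix(n):
    """Unnormalized DFT matrix F with F[k, m] = exp(-2*pi*i*k*m/n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

def propagate_gaussian(mean, cov):
    """Push a Gaussian (mean, cov) through the DFT: F @ mean and F @ cov @ F^H."""
    F = dft_matrix(len(mean))
    return F @ mean, F @ cov @ F.conj().T

# Illustrative input: four samples with independent noise of variance 0.1.
mean = np.array([1.0, 2.0, 0.5, -1.0])
cov = 0.1 * np.eye(4)
out_mean, out_cov = propagate_gaussian(mean, cov)
```

For independent inputs with equal variance, each output bin ends up with the input variance scaled by the transform length, and the bins stay uncorrelated. Non-Gaussian priors break this closed form, which is the setting the abstract targets.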


Article 1704

Title@2025-06-06 (5): Option Pricing Using Ensemble Learning

Title: Option Pricing Using Ensemble Learning Optionspreise mit Ensemble-Lernen 使用集合学习的选项定价 2506.05799v1

Authors (2): Zeyuan Li, Qingdao Huang

Ensemble learning is characterized by flexibility, high precision, and refined structure. As a critical component within computational finance, option pricing with machine learning requires both high predictive accuracy and reduced structural complexity, features that align well with the inherent advantages of ensemble learning. This paper investigates the application of ensemble learning to option pricing, and conducts a comparative analysis with classical machine learning models to assess their performance in terms of accuracy, local feature extraction, and robustness to noise. A novel experimental strategy is introduced, leveraging parameter transfer across experiments to improve robustness and realism in financial simulations. Building upon this strategy, an evaluation mechanism is developed that incorporates a scoring strategy and a weighted evaluation strategy explicitly emphasizing the foundational role of financial theory. This mechanism embodies an orderly integration of theoretical finance and computational methods. In addition, the study examines the interaction between the sliding window technique and noise, revealing nuanced patterns that suggest a potential connection relevant to ongoing research in machine learning and data science.


Article 1705

Title@2025-06-06 (5): Maximum Entropy Reinforcement Learning with Diffusion Policy

Title: Maximum Entropy Reinforcement Learning with Diffusion Policy Maximales Entropie-Verstärkung-Lernen mit Diffusionspolitik 采用传播政策进行最大内容强化学习 2502.11612v3

Authors (3): Xiaoyi Dong, Jian Cheng, Xi Sheryl Zhang

The Soft Actor-Critic (SAC) algorithm with a Gaussian policy has become a mainstream implementation for realizing the Maximum Entropy Reinforcement Learning (MaxEnt RL) objective, which incorporates entropy maximization to encourage exploration and enhance policy robustness. While the Gaussian policy performs well on simpler tasks, its exploration capacity and potential performance in complex multi-goal RL environments are limited by its inherent unimodality. In this paper, we employ the diffusion model, a powerful generative model capable of capturing complex multimodal distributions, as the policy representation to fulfill the MaxEnt RL objective, developing a method named MaxEnt RL with Diffusion Policy (MaxEntDP). Our method enables efficient exploration and brings the policy closer to the optimal MaxEnt policy. Experimental results on Mujoco benchmarks show that MaxEntDP outperforms the Gaussian policy and other generative models within the MaxEnt RL framework, and performs comparably to other state-of-the-art diffusion-based online RL algorithms. Our code is available at https://github.com/diffusionyes/MaxEntDP.
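
For context, the MaxEnt RL objective referred to above is usually written as follows. This is the standard form from the SAC literature rather than a formula quoted from this abstract; $\alpha$ (the temperature) and $\mathcal{H}$ (the policy entropy) are the usual symbols and are assumed here.

```latex
% Maximum entropy RL objective (standard form)
J(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t} r(s_t, a_t)
         + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\log \pi(a \mid s)\big].
```

The optimal policy under this objective is proportional to $\exp\!\big(Q^{*}(s,a)/\alpha\big)$, which is generally multimodal. A unimodal Gaussian can only approximate it coarsely, which is the limitation the abstract attributes to the Gaussian policy.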


Article 1706

Title@2025-06-06 (5): EqCollide: Equivariant and Collision-Aware Deformable Objects Neural Simulator

Title: EqCollide: Equivariant and Collision-Aware Deformable Objects Neural Simulator EqCollide: Gleichwertige und kollisionsfähige verformbare Objekte Neural Simulator EqCollide: 等同和相撞-可变形物体神经模拟器 2506.05797v1

Authors (4): Qianyi Chen, Tianrun Gao, Chenbo Jiang, Tailin Wu

Simulating collisions of deformable objects is a fundamental yet challenging task due to the complexity of modeling solid mechanics and multi-body interactions. Existing data-driven methods often suffer from a lack of equivariance to physical symmetries, inadequate handling of collisions, and limited scalability. Here we introduce EqCollide, the first end-to-end equivariant neural fields simulator for deformable objects and their collisions. We propose an equivariant encoder to map object geometry and velocity into latent control points. A subsequent equivariant Graph Neural Network-based Neural Ordinary Differential Equation models the interactions among control points via collision-aware message passing. To reconstruct velocity fields, we query a neural field conditioned on control point features, enabling continuous and resolution-independent motion predictions. Experimental results show that EqCollide achieves accurate, stable, and scalable simulations across diverse object configurations, with 24.34% to 35.82% lower rollout MSE than the best-performing baseline model. Furthermore, our model generalizes to more colliding objects and extended temporal horizons, and remains robust to inputs transformed by group actions.


Article 1707

Title@2025-06-06 (5): TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining

Title: TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining TiC-LM: Ein Web-Scale-Benchmark für zeitkontinuierliches LLM-Vortraining TIC-LM:培训前时间-持续性LLM网络基准 2504.02107v3

Authors (11): Jeffrey Li, Mohammadreza Armandpour, Iman Mirzadeh, Sachin Mehta, Vaishaal Shankar, Raviteja Vemulapalli, Samy Bengio, Oncel Tuzel, Mehrdad Farajtabar, Hadi Pouransari, Fartash Faghri

Large Language Models (LLMs) trained on historical web data inevitably become outdated. We investigate evaluation strategies and update methods for LLMs as new data becomes available. We introduce a web-scale dataset for time-continual pretraining of LLMs derived from 114 dumps of Common Crawl (CC) - orders of magnitude larger than previous continual language modeling benchmarks. We also design time-stratified evaluations across both general CC data and specific domains (Wikipedia, StackExchange, and code documentation) to assess how well various continual learning methods adapt to new data while retaining past knowledge. Our findings demonstrate that, on general CC data, autoregressive meta-schedules combined with a fixed-ratio replay of older data can achieve comparable held-out loss to re-training from scratch, while requiring significantly less computation (2.6x). However, the optimal balance between incorporating new data and replaying old data differs as replay is crucial to avoid forgetting on generic web data but less so on specific domains.
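
The fixed-ratio replay idea above can be illustrated with a small batch-mixing helper. This is a minimal sketch under assumptions: the function name `mixed_batch`, the document lists, and the ratio of 0.25 are hypothetical and do not come from the benchmark.

```python
import random

def mixed_batch(new_data, old_data, batch_size, replay_ratio, rng):
    """Build one training batch with a fixed fraction of replayed (older) samples."""
    n_old = int(round(batch_size * replay_ratio))  # replayed examples
    n_new = batch_size - n_old                     # examples from the newest dump
    batch = rng.sample(new_data, n_new) + rng.sample(old_data, n_old)
    rng.shuffle(batch)                             # avoid ordering old after new
    return batch

# Illustrative data: string IDs stand in for documents.
rng = random.Random(0)
new_docs = [f"new-{i}" for i in range(20)]
old_docs = [f"old-{i}" for i in range(20)]
batch = mixed_batch(new_docs, old_docs, batch_size=8, replay_ratio=0.25, rng=rng)
```

The abstract's closing observation maps onto the `replay_ratio` knob: a larger value protects against forgetting on generic web data, while a smaller one may suit domains where replay matters less.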


Article 1708

Title@2025-06-06 (5): Exploiting Similarity for Computation and Communication-Efficient Decentralized Optimization

Title: Exploiting Similarity for Computation and Communication-Efficient Decentralized Optimization Ausnutzung von Ähnlichkeiten für die Berechnung und Kommunikation-Effiziente dezentralisierte Optimierung 利用计算和通信效率-权力下放优化的相似性 2506.05791v1

Authors (4): Yuki Takezawa, Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich

Reducing communication complexity is critical for efficient decentralized optimization. The proximal decentralized optimization (PDO) framework is particularly appealing, as methods within this framework can exploit functional similarity among nodes to reduce communication rounds. Specifically, when local functions at different nodes are similar, these methods achieve faster convergence with fewer communication steps. However, existing PDO methods often require highly accurate solutions to subproblems associated with the proximal operator, resulting in significant computational overhead. In this work, we propose the Stabilized Proximal Decentralized Optimization (SPDO) method, which achieves state-of-the-art communication and computational complexities within the PDO framework. Additionally, we refine the analysis of existing PDO methods by relaxing subproblem accuracy requirements and leveraging average functional similarity. Experimental results demonstrate that SPDO significantly outperforms existing methods.


Article 1709

Title@2025-06-06 (5): Robust sensor fusion against on-vehicle sensor staleness

Title: Robust sensor fusion against on-vehicle sensor staleness Robuste Sensor-Fusion gegen Stillstand des Sensors im Fahrzeug 压制车辆传感器悬浮剂的强感传感器聚变 2506.05780v1

Authors (5): Meng Fan, Yifan Zuo, Patrick Blaes, Harley Montgomery, Subhasis Das

Sensor fusion is crucial for a performant and robust Perception system in autonomous vehicles, but sensor staleness, where data from different sensors arrives with varying delays, poses significant challenges. Temporal misalignment between sensor modalities leads to inconsistent object state estimates, severely degrading the quality of trajectory predictions that are critical for safety. We present a novel and model-agnostic approach to address this problem via (1) a per-point timestamp offset feature (for LiDAR and radar both relative to camera) that enables fine-grained temporal awareness in sensor fusion, and (2) a data augmentation strategy that simulates realistic sensor staleness patterns observed in deployed vehicles. Our method is integrated into a perspective-view detection model that consumes sensor data from multiple LiDARs, radars and cameras. We demonstrate that while a conventional model shows significant regressions when one sensor modality is stale, our approach reaches consistently good performance across both synchronized and stale conditions.
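
The two ingredients described above can be sketched in a few lines. This is only an illustration under assumptions: the array layout, the function names `add_timestamp_offset` and `pick_sweep`, and the staleness probability are hypothetical, not the authors' implementation.

```python
import numpy as np

def add_timestamp_offset(points, point_times, camera_time):
    """Append a per-point feature: capture time of the point minus the camera time."""
    offsets = (np.asarray(point_times, dtype=float) - camera_time)[:, None]
    return np.concatenate([np.asarray(points, dtype=float), offsets], axis=1)

def pick_sweep(sweeps, rng, p_stale=0.3):
    """Augmentation: usually return the newest sweep, sometimes an older (stale) one.

    `sweeps` is ordered oldest to newest.
    """
    if len(sweeps) > 1 and rng.random() < p_stale:
        return sweeps[rng.integers(0, len(sweeps) - 1)]  # any sweep except the newest
    return sweeps[-1]

# Illustrative values: two LiDAR points captured 100 ms and 50 ms before the camera frame.
points = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
features = add_timestamp_offset(points, [10.00, 10.05], camera_time=10.10)
```

With the offset exposed as an input feature, a detector trained on augmented sweeps can in principle learn how far each point may have drifted relative to the image.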


Article 1710

Title@2025-06-06 (5): Pegasus: A Universal Framework for Scalable Deep Learning Inference on the Dataplane

Title: Pegasus: A Universal Framework for Scalable Deep Learning Inference on the Dataplane Pegasus: Ein universelles Framework für skalierbare Deep Learning-Schlussfolgerungen auf dem Dataplane Pegasus:数据飞机可缩放深学习推理通用框架 2506.05779v1

Authors (12): Yinchao Zhang, Su Yao, Yong Feng, Kang Chen, Tong Li, Zhuotao Liu, Yi Zhao, Lexuan Zhang, Xiangyu Gao, Feng Xiong, Qi Li, Ke Xu

The paradigm of Intelligent DataPlane (IDP) embeds deep learning (DL) models on the network dataplane to enable intelligent traffic analysis at line-speed. However, the current use of the match-action table (MAT) abstraction on the dataplane is misaligned with DL inference, leading to several key limitations, including accuracy degradation, limited scale, and lack of generality. This paper proposes Pegasus to address these limitations. Pegasus translates DL operations into three dataplane-oriented primitives to achieve generality: Partition, Map, and SumReduce. Specifically, Partition “divides” high-dimensional features into multiple low-dimensional vectors, making them more suitable for the dataplane; Map “conquers” computations on the low-dimensional vectors in parallel with the technique of fuzzy matching, while SumReduce “combines” the computation results. Additionally, Pegasus employs Primitive Fusion to merge computations, improving scalability. Finally, Pegasus adopts full precision weights with fixed-point activations to improve accuracy. Our implementation on a P4 switch demonstrates that Pegasus can effectively support various types of DL models, including Multi-Layer Perceptron (MLP), Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), and AutoEncoder models on the dataplane. Meanwhile, Pegasus outperforms state-of-the-art approaches with an average accuracy improvement of up to 22.8%, along with up to 248x larger model size and 212x larger input scale.
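
One plausible reading of the Partition, Map, and SumReduce primitives is sketched below for a single linear layer. It is an illustration under assumptions, not Pegasus itself: the codebook-based nearest-entry lookup stands in for the fuzzy matching the abstract mentions, and all function names, shapes, and codebook sizes are hypothetical.

```python
import numpy as np

def partition(x, chunk):
    """Partition: split a feature vector into low-dimensional chunks."""
    return x.reshape(-1, chunk)

def build_tables(W, codebooks, chunk):
    """Precompute, per chunk, the partial layer output for every codebook entry."""
    tables = []
    for i, cb in enumerate(codebooks):
        W_i = W[:, i * chunk:(i + 1) * chunk]  # weight columns acting on chunk i
        tables.append(cb @ W_i.T)              # shape: (entries, out_dim)
    return tables

def map_lookup(chunks, codebooks, tables):
    """Map: match each chunk to its nearest codebook entry and look up the result."""
    partials = []
    for c, cb, t in zip(chunks, codebooks, tables):
        idx = np.argmin(np.linalg.norm(cb - c, axis=1))
        partials.append(t[idx])
    return partials

def sum_reduce(partials):
    """SumReduce: add the per-chunk partial outputs."""
    return np.sum(partials, axis=0)

# Illustrative check: when the codebooks contain the exact chunks, the result is W @ x.
rng = np.random.default_rng(0)
chunk = 4
W = rng.normal(size=(3, 8))
x = rng.normal(size=8)
chunks = partition(x, chunk)
codebooks = [rng.normal(size=(5, chunk)) for _ in range(2)]
for cb, c in zip(codebooks, chunks):
    cb[2] = c  # plant the true chunk so the match is exact
tables = build_tables(W, codebooks, chunk)
y = sum_reduce(map_lookup(chunks, codebooks, tables))
```

In this reading, the tables hold full-precision products while only the match is approximate, so the per-packet work reduces to lookups and additions, operations that fit a match-action pipeline.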


Article 1711

Title@2025-06-06 (5): Exploring Dark Knowledge under Various Teacher Capacities and Addressing Capacity Mismatch

Title: Exploring Dark Knowledge under Various Teacher Capacities and Addressing Capacity Mismatch Dunkles Wissen unter verschiedenen Lehrer-Kapazitäten erforschen und Kapazitäts-Mismatch angehen 在各种教师能力之下探索黑暗知识,解决能力不足问题 2405.13078v2

Authors (3): Wen-Shu Fan, Xin-Chun Li, De-Chuan Zhan

Knowledge Distillation (KD) can transfer the “dark knowledge” of a well-performing yet large neural network to a weaker but lightweight one. From the view of output logits and softened probabilities, this paper goes deeper into the dark knowledge provided by teachers with different capacities. Two fundamental observations are: (1) a larger teacher tends to produce probability vectors with lower distinction among non-ground-truth classes; (2) teachers with different capacities are basically consistent in their cognition of relative class affinity. Through abundant experimental studies, we verify these observations and provide in-depth empirical explanations for them. We argue that the distinctness among incorrect classes embodies the essence of dark knowledge. A larger and more accurate teacher lacks this distinctness, which hampers its teaching ability compared to a smaller teacher, ultimately leading to the peculiar phenomenon named “capacity mismatch”. Building on this insight, this paper explores multiple simple yet effective ways to address capacity mismatch, achieving superior experimental results compared to previous approaches.
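
The "softened probabilities" above are conventionally obtained with a temperature-scaled softmax, sketched below. The spread measure in `non_target_spread` is an illustrative choice made here to quantify "distinction among non-ground-truth classes"; it is an assumption and not necessarily the metric used in the paper.

```python
import numpy as np

def softened_probs(logits, T=1.0):
    """Temperature-scaled softmax; a larger T gives a flatter distribution."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def non_target_spread(probs, target):
    """Standard deviation of the renormalized probabilities of the wrong classes."""
    rest = np.delete(probs, target)
    rest = rest / rest.sum()
    return rest.std()

# Illustrative teacher logits with class 0 as the ground truth.
logits = [5.0, 2.0, 1.0, 0.0]
p_sharp = softened_probs(logits, T=1.0)
p_soft = softened_probs(logits, T=4.0)
```

A spread of zero would mean the teacher treats every wrong class alike, which corresponds to the situation the abstract describes as lacking distinctness.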


Article 1712

Title@2025-06-06 (5): ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation

Title: ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation ActionPiece: Kontextuelle Tokenisierung von Handlungssequenzen für eine generative Empfehlung ActionPiece: 产生建议的背景化行动序列 2502.13581v2

Authors (8): Yupeng Hou, Jianmo Ni, Zhankui He, Noveen Sachdeva, Wang-Cheng Kang, Ed H. Chi, Julian McAuley, Derek Zhiyuan Cheng

Generative recommendation (GR) is an emerging paradigm where user actions are tokenized into discrete token patterns and autoregressively generated as predictions. However, existing GR models tokenize each action independently, assigning the same fixed tokens to identical actions across all sequences without considering contextual relationships. This lack of context-awareness can lead to suboptimal performance, as the same action may hold different meanings depending on its surrounding context. To address this issue, we propose ActionPiece to explicitly incorporate context when tokenizing action sequences. In ActionPiece, each action is represented as a set of item features. Given the action sequence corpora, we construct the vocabulary by merging feature patterns as new tokens, based on their co-occurrence frequency both within individual sets and across adjacent sets. Considering the unordered nature of feature sets, we further introduce set permutation regularization, which produces multiple segmentations of action sequences with the same semantics. Our code is available at: https://github.com/google-deepmind/action_piece.


Article 1713

Title@2025-06-06 (5): Evaluating Neuron Explanations: A Unified Framework with Sanity Checks

Title: Evaluating Neuron Explanations: A Unified Framework with Sanity Checks Neuron-Erklärungen bewerten: Ein einheitliches Framework mit Sanitätskontrollen 评估内向解释:一个统一框架,检查是否合乎情理 2506.05774v1

Authors (3): Tuomas Oikarinen, Ge Yan, Tsui-Wei Weng

Understanding the function of individual units in a neural network is an important building block for mechanistic interpretability. This is often done by generating a simple text explanation of the behavior of individual neurons or units. For these explanations to be useful, we must understand how reliable and truthful they are. In this work we unify many existing explanation evaluation methods under one mathematical framework. This allows us to compare existing evaluation metrics, understand the evaluation pipeline with increased clarity and apply existing statistical methods on the evaluation. In addition, we propose two simple sanity checks on the evaluation metrics and show that many commonly used metrics fail these tests and do not change their score after massive changes to the concept labels. Based on our experimental and theoretical results, we propose guidelines that future evaluations should follow and identify a set of reliable evaluation metrics.


Article 1714

Title@2025-06-06 (5): SGD Jittering: A Training Strategy for Robust and Accurate Model-Based Architectures

Title: SGD Jittering: A Training Strategy for Robust and Accurate Model-Based Architectures SGD Jittering: Eine Schulungsstrategie für robuste und präzise modellbasierte Architekturen SGD JGT JUGT JIGT: 强健和准确的建模建筑培训战略 2410.14667v3

Authors (2): Peimeng Guan, Mark A. Davenport

Inverse problems aim to reconstruct unseen data from corrupted or perturbed measurements. While most work focuses on improving reconstruction quality, generalization accuracy and robustness are equally important, especially for safety-critical applications. Model-based architectures (MBAs), such as loop unrolling methods, are considered more interpretable and achieve better reconstructions. Empirical evidence suggests that MBAs are more robust to perturbations than black-box solvers, but the accuracy-robustness tradeoff in MBAs remains underexplored. In this work, we propose a simple yet effective training scheme for MBAs, called SGD jittering, which injects noise iteration-wise during reconstruction. We theoretically demonstrate that SGD jittering not only generalizes better than standard mean squared error training but is also more robust to average-case attacks. We validate SGD jittering using denoising toy examples, seismic deconvolution, and single-coil MRI reconstruction. Both SGD jittering and its SPGD extension yield cleaner reconstructions for out-of-distribution data and demonstrate enhanced robustness against adversarial attacks.
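
The idea of injecting noise at every unrolled iteration can be sketched on a toy least-squares problem. This is a simplified illustration under assumptions: a plain gradient step replaces the learned components of a real model-based architecture, the noise is added to the iterate (the exact injection point is an assumption), and the name `unrolled_reconstruct` is hypothetical.

```python
import numpy as np

def unrolled_reconstruct(A, y, steps, lr, jitter_std=0.0, rng=None):
    """Unrolled gradient descent on 0.5 * ||A x - y||^2 with optional per-step noise."""
    if rng is None:
        rng = np.random.default_rng()
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ x - y)      # gradient of the data-fidelity term
        x = x - lr * grad
        if jitter_std > 0:
            # Jitter: perturb the iterate (used during training in this sketch)
            x = x + rng.normal(0.0, jitter_std, size=x.shape)
    return x

# Illustrative problem with a known solution.
A = np.diag([1.0, 2.0])
x_true = np.array([1.0, -1.0])
y = A @ x_true
x_clean = unrolled_reconstruct(A, y, steps=200, lr=0.2)
x_jit = unrolled_reconstruct(A, y, steps=200, lr=0.2, jitter_std=0.01,
                             rng=np.random.default_rng(0))
```

In this sketch the jittered run stays near the clean solution because each step contracts the accumulated noise. In an actual training loop the noise would be applied only while fitting the learned parts and switched off at inference.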


Article 1715

Title@2025-06-06 (5): Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks

Title: Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks Visualisierung, Nachdenken und Abbau der verlorenen Landschaft tiefer neuraler Netzwerke 深神经网络丧失的地貌景观 2405.12493v2

Authors (4): Yichu Xu, Xin-Chun Li, Lan Li, De-Chuan Zhan

The loss landscape of deep neural networks (DNNs) is commonly considered complex and wildly fluctuating. However, an interesting observation is that the loss surfaces plotted along Gaussian noise directions are almost always v-basin ones, with the perturbed model lying in the basin. This motivates us to rethink whether a 1D or 2D subspace could cover more complex local geometric structures, and how to mine the corresponding perturbation directions. This paper systematically categorizes the 1D curves from simple to complex, including v-basin, v-side, w-basin, w-peak, and vvv-basin curves. Notably, the latter two types are already hard to obtain via the intuitive construction of specific perturbation directions, so we propose dedicated mining algorithms to plot the corresponding 1D curves. Combining these 1D directions, various types of 2D surfaces are visualized, such as saddle surfaces and the bottom of a wine bottle, which were only shown by demo functions in previous works. Finally, we propose theoretical insights through the lens of the Hessian matrix to explain several of the observed phenomena.


Article 1716

Title@2025-06-06 (5): Predicting ICU In-Hospital Mortality Using Adaptive Transformer Layer Fusion

Title: Predicting ICU In-Hospital Mortality Using Adaptive Transformer Layer Fusion Vorhersage der Krankenhaussterblichkeit der Intensivstation mittels adaptiver Transformer Layer Fusion 利用适应性变异器图层融合预测伊斯兰法院联盟在医院内的死亡率 2506.04924v2

Authors (19): Han Wang, Ruoyun He, Guoguang Lao, Ting Liu, Hejiao Luo, Changqi Qin, Hongying Luo, Junmin Huang, Zihan Wei, Lu Chen, Yongzhi Xu, Ziqian Bi, Junhao Song, Tianyang Wang, Chia Xin Liang, Xinyuan Song, Huafeng Liu, Junfeng Hao, Chunjie Tian

Early identification of high-risk ICU patients is crucial for directing limited medical resources. We introduce ALFIA (Adaptive Layer Fusion with Intelligent Attention), a modular, attention-based architecture that jointly trains LoRA (Low-Rank Adaptation) adapters and an adaptive layer-weighting mechanism to fuse multi-layer semantic features from a BERT backbone. Trained on our rigorous cw-24 (CriticalWindow-24) benchmark, ALFIA surpasses state-of-the-art tabular classifiers in AUPRC while preserving a balanced precision-recall profile. The embeddings produced by ALFIA’s fusion module, capturing both fine-grained clinical cues and high-level concepts, enable seamless pairing with GBDTs (CatBoost/LightGBM) as ALFIA-boost, and with deep neural networks as ALFIA-nn, yielding additional performance gains. Our experiments confirm ALFIA’s superior early-warning performance. By operating directly on routine clinical text, it furnishes clinicians with a convenient yet robust tool for risk stratification and timely intervention in critical-care settings.



Article 1717

Title@2025-06-06 (5): AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation

Title: AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation AANet: Virtuelles Screening unter struktureller Unsicherheit über Alignment und Aggregation AANet:通过对齐和聚合在结构不确定性下进行虚拟筛选 2506.05768v1

Authors (8): Wenyu Zhu, Jianhui Wang, Bowen Gao, Yinjun Jia, Haichuan Tan, Ya-Qin Zhang, Wei-Ying Ma, Yanyan Lan

Virtual screening (VS) is a critical component of modern drug discovery, yet most existing methods, whether physics-based or deep learning-based, are developed around holo protein structures with known ligand-bound pockets. Consequently, their performance degrades significantly on apo or predicted structures such as those from AlphaFold2, which are more representative of real-world early-stage drug discovery, where pocket information is often missing. In this paper, we introduce an alignment-and-aggregation framework to enable accurate virtual screening under structural uncertainty. Our method comprises two core components: (1) a tri-modal contrastive learning module that aligns representations of the ligand, the holo pocket, and cavities detected from structures, thereby enhancing robustness to pocket localization error; and (2) a cross-attention based adapter for dynamically aggregating candidate binding sites, enabling the model to learn from activity data even without precise pocket annotations. We evaluated our method on a newly curated benchmark of apo structures, where it significantly outperforms state-of-the-art methods in the blind apo setting, improving the early enrichment factor (EF1%) from 11.75 to 37.19. Notably, it also maintains strong performance on holo structures. These results demonstrate the promise of our approach in advancing first-in-class drug discovery, particularly in scenarios lacking experimentally resolved protein-ligand complexes.
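
For readers unfamiliar with the EF1% metric quoted above, the sketch below shows the usual definition of an enrichment factor: the hit rate among the top-ranked fraction of compounds divided by the hit rate of the whole library. The toy library and the helper name `enrichment_factor` are illustrative assumptions, not the paper's evaluation code.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """EF at `fraction`: hit rate in the top-ranked slice / overall hit rate."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")
    n_top = max(1, round(fraction * n_total))
    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)

# Toy library (hypothetical): 1000 compounds, 10 actives, 4 of them ranked in the top 10.
scores = [1000 - i for i in range(1000)]
labels = [0] * 1000
for idx in (0, 2, 5, 9, 100, 200, 300, 400, 500, 600):
    labels[idx] = 1
ef1 = enrichment_factor(scores, labels, fraction=0.01)
```

In this toy library 1% of compounds are active, so a perfect ranking would reach EF1% = 100 and a random ranking would average about 1.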



Article 1718

Title@2025-06-06 (5): On the Query Complexity of Verifier-Assisted Language Generation

Title: On the Query Complexity of Verifier-Assisted Language Generation Zur Fragekomplexität der sprachlichen Generierung von Prüfern 关于验证者协助语言生成的查询复杂程度 2502.12123v2

Authors (6): Edoardo Botta, Yuchen Li, Aashay Mehta, Jordan T. Ash, Cyril Zhang, Andrej Risteski

Recently, a plethora of works have proposed inference-time algorithms (e.g., best-of-n) that incorporate verifiers to assist the generation process. Their quality-efficiency trade-offs have been empirically benchmarked on a variety of constrained generation tasks, but the algorithmic design landscape is still poorly understood. In this paper, we develop a mathematical framework for reasoning about constrained generation using a pre-trained language model generator oracle and a process verifier, which can decide whether a prefix can be extended to a string that satisfies the constraints of choice. We show that even in very simple settings, access to a verifier can turn an intractable problem (information-theoretically or computationally) into a tractable one. In fact, we show that even simple algorithms, like tokenwise rejection sampling, can enjoy significant benefits from access to a verifier. Empirically, we show that a natural modification of tokenwise rejection sampling, in which the sampler is allowed to “backtrack” (i.e., erase the final few generated tokens), has robust and substantive benefits over natural baselines (e.g., (blockwise) rejection sampling, nucleus sampling) in terms of computational efficiency, accuracy, and diversity.
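
The control flow of tokenwise rejection sampling with backtracking can be sketched on a toy problem. Everything below is an assumption made for illustration: the task (balanced parentheses of a fixed even length), the uniform `generator_oracle` standing in for a language model, and the rule of erasing `backtrack` tokens after a rejection. In this toy the verifier is exact, so backtracking is not strictly needed and is included only to show the mechanics.

```python
import random

VOCAB = ["(", ")"]

def process_verifier(prefix, n):
    """Return True if `prefix` can be extended to a balanced string of length n."""
    depth = 0
    for tok in prefix:
        depth += 1 if tok == "(" else -1
        if depth < 0:                      # closed more than opened: dead end
            return False
    remaining = n - len(prefix)
    # Enough room must remain to close every open parenthesis, with matching parity.
    return remaining >= depth and (remaining - depth) % 2 == 0

def generator_oracle(prefix, rng):
    """Hypothetical stand-in for a language model: uniform next-token sampling."""
    return rng.choice(VOCAB)

def sample_with_backtracking(n, backtrack=1, seed=0):
    if n < 0 or n % 2 != 0:
        raise ValueError("n must be a non-negative even number")
    rng = random.Random(seed)
    prefix, verifier_calls = [], 0
    while len(prefix) < n:
        token = generator_oracle(prefix, rng)
        verifier_calls += 1
        if process_verifier(prefix + [token], n):
            prefix.append(token)           # accepted: keep the token
        else:
            keep = max(0, len(prefix) - backtrack)
            del prefix[keep:]              # rejected: erase the last few tokens
    return "".join(prefix), verifier_calls

sample, calls = sample_with_backtracking(8, backtrack=1, seed=1)
```

The returned string always satisfies the constraint, and `calls` counts verifier queries, which is the kind of cost the paper's query-complexity framing is concerned with.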



Article 1719

Title@2025-06-06 (5): BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning

Title: BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning BiTrajDiff: Bidirektionale Trajektorien-Generation mit Diffusionsmodellen für Offline-Verstärkung BiTrajDiff:双向轨迹生成,带有离线强化学习传播模型 2506.05762v1

Authors (6): Yunpeng Qing, Shuo Chen, Yixiao Chi, Shunyu Liu, Sixu Lin, Changqing Zou

Recent advances in offline Reinforcement Learning (RL) have proven that effective policy learning can benefit from imposing conservative constraints on pre-collected datasets. However, such static datasets often exhibit distribution bias, resulting in limited generalizability. To address this limitation, a straightforward solution is data augmentation (DA), which leverages generative models to enrich the data distribution. Despite the promising results, current DA techniques focus solely on reconstructing future trajectories from given states, while ignoring the exploration of history transitions that reach them. This single-direction paradigm inevitably hinders the discovery of diverse behavior patterns, especially those leading to critical states that may have yielded high-reward outcomes. In this work, we introduce Bidirectional Trajectory Diffusion (BiTrajDiff), a novel DA framework for offline RL that models both future and history trajectories from any intermediate state. Specifically, we decompose the trajectory generation task into two independent yet complementary diffusion processes: one generating forward trajectories to predict future dynamics, and the other generating backward trajectories to trace essential history transitions. BiTrajDiff can efficiently leverage critical states as anchors to expand into potentially valuable yet underexplored regions of the state space, thereby facilitating dataset diversity. Extensive experiments on the D4RL benchmark suite demonstrate that BiTrajDiff achieves superior performance compared to other advanced DA methods across various offline RL backbones.



Article 1720

Title@2025-06-06 (5): Mapping correlations and coherence: adjacency-based approach to data visualization and regularity discovery

Title: Mapping correlations and coherence: adjacency-based approach to data visualization and regularity discovery Mapping-Korrelationen und Kohärenz: Adjazenz-basierter Ansatz zur Datenvisualisierung und Regelmäßigkeits-Entdeckung 绘图的相互关系和一致性:基于对应的可视化和定期发现数据方法 2506.05758v1

Authors (1): Guang-Xing Li

The development of science has been transforming our view of nature for centuries. Observing structures and patterns is an effective way to discover regularities in data and a key step toward theory-building. With increasingly complex data being obtained, revealing regularities systematically has become a challenge. Correlation is one of the most commonly used and effective ways to describe regularities in data, yet for complex patterns, spatial inhomogeneity and complexity can often undermine the correlations. We present an algorithm to derive maps representing the type and degree of correlations, by taking the two-fold symmetry of the correlation vector into full account using the Stokes parameter. The method allows for a spatially resolved view of the nature and strength of correlations between physical quantities. In the correlation view, a region can often be separated into different subregions with different types of correlations. Subregions correspond to physical regimes for physical systems, or climate zones for climate maps. The simplicity of the method makes it widely applicable to a variety of data, and the correlation-based approach makes the map particularly useful in revealing regularities in physical systems and the like. As a new and efficient approach to representing data, the method should facilitate the development of new computational approaches to regularity discovery.
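
The abstract does not give the algorithm, so the following is only a sketch of the general idea behind Stokes-style averaging. An orientation with two-fold symmetry (an angle θ is equivalent to θ + 180°) can be averaged by working with the doubled angle, which is what the Stokes parameters Q and U do for polarization. The function name, the centering step, and the returned quantities are assumptions chosen for illustration and may differ from the paper's definitions.

```python
import math

def stokes_correlation(a, b):
    """Doubled-angle (Stokes-like) summary of how two samples co-vary.

    Returns (angle_degrees, degree):
      angle  = orientation of the dominant axis in the (a, b) plane, in (-90, 90]
      degree = 1 for a perfectly linear relation, near 0 for isotropic scatter
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    # With r^2 = da^2 + db^2 and theta = atan2(db, da):
    #   r^2 cos(2 theta) = da^2 - db^2,   r^2 sin(2 theta) = 2 da db
    i_par = sum(x * x + y * y for x, y in zip(da, db)) / n
    q_par = sum(x * x - y * y for x, y in zip(da, db)) / n
    u_par = sum(2.0 * x * y for x, y in zip(da, db)) / n
    angle = 0.5 * math.degrees(math.atan2(u_par, q_par))
    degree = math.hypot(q_par, u_par) / i_par
    return angle, degree

# Perfect positive relation b = 2a, and perfect negative relation b = -a.
pos_angle, pos_degree = stokes_correlation([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0])
neg_angle, neg_degree = stokes_correlation([0.0, 1.0, 2.0, 3.0], [0.0, -1.0, -2.0, -3.0])
```

Applying such a summary inside a sliding window over a 2D field would give one angle and one degree per pixel, which is one plausible way to obtain a map of the "type and degree" of correlation.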



Article 1721

Title@2025-06-06 (5): FlowOE: Imitation Learning with Flow Policy from Ensemble RL Experts for Optimal Execution under Heston Volatility and Concave Market Impacts

Title: FlowOE: Imitation Learning with Flow Policy from Ensemble RL Experts for Optimal Execution under Heston Volatility and Concave Market Impacts FlowOE: Imitation Learning with Flow Policy von Ensemble RL Experts for Optimal Execution unter Heston Volatility und Concave Market Impacts FlowOE:在赫斯顿波动和集中市场影响下最优化执行的集合RL专家以流动政策进行模拟学习 2506.05755v1

Authors (2): Yang Li, Zhi Chen

Optimal execution in financial markets refers to the process of strategically transacting a large volume of assets over a period to achieve the best possible outcome by balancing the trade-off between market impact costs and timing or volatility risks. Traditional optimal execution strategies, such as static Almgren-Chriss models, often prove suboptimal in dynamic financial markets. This paper proposes FlowOE, a novel imitation learning framework based on flow matching models, to address these limitations. FlowOE learns from a diverse set of traditional expert strategies and adaptively selects the most suitable expert behavior for prevailing market conditions. A key innovation is the incorporation of a refining loss function during the imitation process, enabling FlowOE not only to mimic but also to improve upon the learned expert actions. To the best of our knowledge, this work is the first to apply flow matching models to a stochastic optimal execution problem. Empirical evaluations across various market conditions demonstrate that FlowOE significantly outperforms both the specifically calibrated expert models and other traditional benchmarks, achieving higher profits with reduced risk. These results underscore the practical applicability and potential of FlowOE to enhance adaptive optimal execution.



Article 1722

Title@2025-06-06 (5): Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective

Title: Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective Beschränkte Probenahmen für Sprachmodelle sollten einfach sein: Eine MCMC-Perspektive 语言模式抽样应容易:MCMC视角 2506.05754v1

Authors (6): Emmanuel Anaya Gonzalez, Sairam Vaidya, Kanghee Park, Ruyi Ji, Taylor Berg-Kirkpatrick, Loris D’Antoni

Constrained decoding enables Language Models (LMs) to produce samples that provably satisfy hard constraints. However, existing constrained-decoding approaches often distort the underlying model distribution, a limitation that is especially problematic in applications like program fuzzing, where one wants to generate diverse and valid program inputs for testing purposes. We propose a new constrained sampling framework based on Markov Chain Monte Carlo (MCMC) that simultaneously satisfies three core desiderata: constraint satisfying (every sample satisfies the constraint), monotonically converging (the sampling process converges to the true conditional distribution), and efficient (high-quality samples emerge in few steps). Our method constructs a proposal distribution over valid outputs and applies a Metropolis-Hastings acceptance criterion based on the LM’s likelihood, ensuring principled and efficient exploration of the constrained space. Empirically, our sampler outperforms existing methods on both synthetic benchmarks and real-world program fuzzing tasks.
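
A minimal Metropolis-Hastings sketch may help make the three desiderata concrete. It is built on strong simplifying assumptions that are not from the paper: the valid set is small enough to enumerate, the proposal is uniform over that set, and `lm_prob` is a hypothetical toy likelihood rather than a real language model.

```python
import itertools
import random

def lm_prob(text):
    """Hypothetical unnormalized LM likelihood: independent tokens with P('a') = 0.2."""
    p = 1.0
    for ch in text:
        p *= 0.2 if ch == "a" else 0.8
    return p

def satisfies_constraint(text):
    """Toy hard constraint: the string must contain at least one 'a'."""
    return "a" in text

# All length-3 strings over {a, b} that satisfy the constraint.
VALID = ["".join(t) for t in itertools.product("ab", repeat=3)
         if satisfies_constraint("".join(t))]

def acceptance(current, proposal):
    """MH acceptance probability; the uniform proposal is symmetric, so q cancels."""
    return min(1.0, lm_prob(proposal) / lm_prob(current))

def mh_chain(steps, seed=0):
    rng = random.Random(seed)
    current = rng.choice(VALID)            # start from a valid sample
    samples = []
    for _ in range(steps):
        proposal = rng.choice(VALID)       # proposals are always valid
        if rng.random() < acceptance(current, proposal):
            current = proposal
        samples.append(current)
    return samples

samples = mh_chain(20000, seed=0)
freq_abb = samples.count("abb") / len(samples)
target_abb = lm_prob("abb") / sum(lm_prob(s) for s in VALID)
```

Because every proposal comes from the valid set, every state of the chain satisfies the constraint, and the acceptance rule steers the long-run frequencies toward the LM distribution conditioned on validity (`target_abb` is about 0.26 here).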



Article 1723

Title@2025-06-06 (5): Integrating Spatiotemporal Features in LSTM for Spatially Informed COVID-19 Hospitalization Forecasting

Title: Integrating Spatiotemporal Features in LSTM for Spatially Informed COVID-19 Hospitalization Forecasting Integration von Spatiotemporalen Features in LSTM für spatially Informed COVID-19 Hospitalization Forecasting LSTM综合地貌特征纳入LSTM系统,用于空间知情COVID-19医院化预测 2506.05752v1

Authors (5): Zhongying Wang, Thoai D. Ngo, Hamidreza Zoraghein, Benjamin Lucas, Morteza Karimzadeh

The COVID-19 pandemic’s severe impact highlighted the need for accurate, timely hospitalization forecasting to support effective healthcare planning. However, most forecasting models struggled, especially during variant surges, when they were needed most. This study introduces a novel Long Short-Term Memory (LSTM) framework for forecasting daily state-level incident hospitalizations in the United States. We present a spatiotemporal feature, Social Proximity to Hospitalizations (SPH), derived from Facebook’s Social Connectedness Index to improve forecasts. SPH serves as a proxy for interstate population interaction, capturing transmission dynamics across space and time. Our parallel LSTM architecture captures both short- and long-term temporal dependencies, and our multi-horizon ensembling strategy balances consistency and forecasting error. Evaluation against COVID-19 Forecast Hub ensemble models during the Delta and Omicron surges reveals superiority of our model. On average, our model surpasses the ensemble by 27, 42, 54, and 69 hospitalizations per state on the $7^{th}$, $14^{th}$, $21^{st}$, and $28^{th}$ forecast days, respectively, during the Omicron surge. Data-ablation experiments confirm SPH’s predictive power, highlighting its effectiveness in enhancing forecasting models. This research not only advances hospitalization forecasting but also underscores the significance of spatiotemporal features, such as SPH, in refining predictive performance in modeling the complex dynamics of infectious disease spread.



Article 1724

Title@2025-06-06 (5): Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models

Title: Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models 低Rank适应的计算限制(LORA)变换模型的精确调整 2406.03136v2

Authors (5): Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu

We study the computational limits of Low-Rank Adaptation (LoRA) for finetuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup. This allows us to (i) identify a phase transition behavior of efficiency assuming the Strong Exponential Time Hypothesis (SETH), and (ii) prove the existence of almost linear algorithms by controlling the LoRA update computation term by term. For the former, we identify a sharp transition in the efficiency of all possible rank-$r$ LoRA update algorithms for transformers, based on specific norms resulting from the multiplications of the input sequence $X$, pretrained weights ${W^\star}$, and adapter matrices $\alpha B A/r$. Specifically, we derive a shared upper bound threshold for such norms, and show that efficient (sub-quadratic) approximation algorithms of LoRA exist only below this threshold. For the latter, we prove the existence of almost linear approximation algorithms for LoRA adaptation by utilizing the hierarchical low-rank structures of LoRA gradients and approximating the gradients with a series of chained low-rank approximations. To showcase our theory, we consider two practical scenarios: partial (e.g., only $W_V$ and $W_Q$) and full adaptations (e.g., $W_Q$, $W_V$, and $W_K$) of weights in attention heads.
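
As background for the notation above, the adapted weight is $W^\star + \alpha B A / r$, and the low-rank factors let a forward pass avoid forming the product $BA$ explicitly. The sketch below only illustrates that elementary structure in plain Python; it is not the paper's almost-linear algorithm, and the helper names and toy matrices are assumptions.

```python
def matmul(p, q):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)] for row in p]

def add_scaled(p, q, scale):
    """Elementwise p + scale * q."""
    return [[a + scale * b for a, b in zip(rp, rq)] for rp, rq in zip(p, q)]

def lora_forward_dense(x, w_star, b, a, alpha):
    """Compute X (W* + (alpha / r) B A) by materializing the d x d update."""
    r = len(a)                              # adapter rank = number of rows of A
    delta = matmul(b, a)
    return matmul(x, add_scaled(w_star, delta, alpha / r))

def lora_forward_factored(x, w_star, b, a, alpha):
    """Same result as X W* + (alpha / r) (X B) A, without forming B A."""
    r = len(a)
    low_rank = matmul(matmul(x, b), a)
    return add_scaled(matmul(x, w_star), low_rank, alpha / r)

# Toy sizes (hypothetical): sequence length 2, width 3, rank 1.
x = [[1, 0, 2], [0, 1, 1]]
w_star = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
b = [[1], [0], [2]]
a = [[1, 2, 3]]
dense = lora_forward_dense(x, w_star, b, a, alpha=2)
factored = lora_forward_factored(x, w_star, b, a, alpha=2)
```

Both routes give the same output; the factored route costs on the order of $L d r$ operations for the adapter term instead of $d^2 r + L d^2$, which is the kind of low-rank saving the abstract's gradient analysis builds on.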



Article 1725

Title@2025-06-06 (5): Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance

Title: Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance Effiziente Online-RFT mit Plug-and-Play-LLM-Richtern: Entsperren State-of-the-Art-Performance 与 “ 插插 “ 和 “ Play LLM “ 法官一起高效在线RFT:解锁最新业绩 2506.05748v1

Authors (2): Rudransh Agnihotri, Ananya Pandey

Reward-model training is the cost bottleneck in modern Reinforcement Learning from Human Feedback (RLHF) pipelines, often requiring tens of billions of parameters and an offline preference-tuning phase. In the proposed method, a frozen, instruction-tuned 7B LLM is augmented with only a one-line JSON rubric and a rank-16 LoRA adapter (affecting just 0.8% of the model’s parameters), enabling it to serve as a complete substitute for the previously used heavyweight evaluation models. The plug-and-play judge achieves 96.2% accuracy on RewardBench, outperforming specialized reward networks ranging from 27B to 70B parameters. Additionally, it allows a 7B actor to outperform the top 70B DPO baseline, which scores 61.8%, by achieving 92% exact-match accuracy on GSM-8K using online PPO. Thorough ablations indicate that (i) six in-context demonstrations deliver the majority of the zero-to-few-shot improvements (+2pp), and (ii) the LoRA adapter effectively addresses the remaining gap, particularly in the safety and adversarial Chat-Hard segments. The work also introduces HH-Rationales, a subset of 10,000 pairs from Anthropic HH-RLHF accompanied by human-generated justifications, to examine interpretability. GPT-4 scoring indicates that our LoRA judge attains approximately 9/10 in similarity to human explanations, while zero-shot judges score around 5/10. These results indicate that the combination of prompt engineering and a tiny LoRA adapter produces a cost-effective, transparent, and easily adjustable reward function, removing the offline phase while achieving new state-of-the-art outcomes for both static evaluation and online RLHF.



Article 1726

Title@2025-06-06 (5): SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models

Title: SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models SPRINT: Interleaved Planung und parallelisierte Ausführung in vernünftigen Modellen ermöglichen SPRINT: 在理由说明模式中促进跨离规划和平行执行 2506.05745v1

Authors (6): Emil Biju, Shayan Talaei, Zhemin Huang, Mohammadreza Pourreza, Azalia Mirhoseini, Amin Saberi

Large reasoning models (LRMs) excel at complex reasoning tasks but typically generate lengthy sequential chains-of-thought, resulting in long inference times before arriving at the final answer. To address this challenge, we introduce SPRINT, a novel post-training and inference-time framework designed to enable LRMs to dynamically identify and exploit opportunities for parallelization during their reasoning process. SPRINT incorporates an innovative data curation pipeline that reorganizes natural language reasoning trajectories into structured rounds of long-horizon planning and parallel execution. By fine-tuning LRMs on a small amount of such curated data, the models learn to dynamically identify independent subtasks within extended reasoning processes and effectively execute them in parallel. Through extensive evaluations, we show that the models fine-tuned with the SPRINT framework match the performance of reasoning models on complex domains such as mathematics while generating up to ~39% fewer sequential tokens on problems requiring more than 8000 output tokens. Finally, we observe consistent results transferred to two out-of-distribution tasks of GPQA and Countdown with up to 45% and 65% reduction in average sequential tokens for longer reasoning trajectories, while achieving the performance of the fine-tuned reasoning model.



Article 1727

Title@2025-06-06 (5): In-context denoising with one-layer transformers: connections between attention and associative memory retrieval

Title: In-context denoising with one-layer transformers: connections between attention and associative memory retrieval In-Context-Denoising mit Einschichttransformatoren: Verbindungen zwischen Aufmerksamkeit und assoziativem Speicherabruf 与一层变压器的自通脱色:注意和联动内存检索之间的联系 2502.05164v2

Authors (3): Matthew Smart, Alberto Bietti, Anirvan M. Sengupta

We introduce in-context denoising, a task that refines the connection between attention-based architectures and dense associative memory (DAM) networks, also known as modern Hopfield networks. Using a Bayesian framework, we show theoretically and empirically that certain restricted denoising problems can be solved optimally even by a single-layer transformer. We demonstrate that a trained attention layer processes each denoising prompt by performing a single gradient descent update on a context-aware DAM energy landscape, where context tokens serve as associative memories and the query token acts as an initial state. This one-step update yields better solutions than exact retrieval of either a context token or a spurious local minimum, providing a concrete example of DAM networks extending beyond the standard retrieval paradigm. Overall, this work solidifies the link between associative memory and attention mechanisms first identified by Ramsauer et al., and demonstrates the relevance of associative memory models in the study of in-context learning.
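
The link between one attention step and one gradient step can be checked numerically on the standard dense associative memory energy of Ramsauer et al., $E(q) = -\beta^{-1} \log \sum_i \exp(\beta \langle x_i, q \rangle) + \tfrac{1}{2}\lVert q \rVert^2$. The sketch below assumes this textbook energy, untrained attention with the context tokens as both keys and values, and a step size of 1; the paper's trained layer and context-aware energy may differ in detail.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def energy(query, memories, beta):
    """DAM energy: -(1/beta) * logsumexp(beta * <x_i, q>) + 0.5 * |q|^2."""
    zs = [beta * dot(x, query) for x in memories]
    m = max(zs)
    lse = m + math.log(sum(math.exp(z - m) for z in zs))
    return -lse / beta + 0.5 * dot(query, query)

def attention_update(query, memories, beta):
    """Softmax attention with the context tokens as both keys and values."""
    weights = softmax([beta * dot(x, query) for x in memories])
    return [sum(w * x[j] for w, x in zip(weights, memories))
            for j in range(len(query))]

def energy_grad(query, memories, beta):
    """Analytic gradient of the energy: q - sum_i softmax_i * x_i."""
    retrieved = attention_update(query, memories, beta)
    return [q - r for q, r in zip(query, retrieved)]

def gradient_step(query, memories, beta, step=1.0):
    grad = energy_grad(query, memories, beta)
    return [q - step * g for q, g in zip(query, grad)]

# Toy context tokens (memories) and a noisy query (initial state); values are made up.
memories = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
query = [0.8, 0.3]
attn = attention_update(query, memories, beta=2.0)
gd = gradient_step(query, memories, beta=2.0, step=1.0)
```

With step size 1 the gradient step lands exactly on the attention output, which is the one-step update the abstract contrasts with exact retrieval of a stored memory.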



Article 1728

Title@2025-06-06 (5): FinSage: A Multi-aspect RAG System for Financial Filings Question Answering

Title: FinSage: A Multi-aspect RAG System for Financial Filings Question Answering FinSage: Ein Multi-Aspect-RAG-System für Finanzfilialen Frageantworten FinSage:财务申报问题解答的多层RAG系统 2504.14493v3

Authors (16): Xinyu Wang, Jijun Chi, Zhenghan Tai, Tung Sum Thomas Kwok, Muzhi Li, Zhuhong Li, Hailin He, Yuchen Hua, Peng Lu, Suyuchen Wang, Yihong Wu, Jerry Huang, Jingrui Tian, Fengran Mo, Yufei Cui, Ling Zhou

Leveraging large language models in real-world settings often requires domain-specific data and tools in order to comply with the complex regulations that govern acceptable use. Within financial sectors, modern enterprises increasingly rely on Retrieval-Augmented Generation (RAG) systems to address complex compliance requirements in financial document workflows. However, existing solutions struggle to account for the inherent heterogeneity of data (e.g., text, tables, diagrams) and the evolving nature of regulatory standards used in financial filings, leading to compromised accuracy in critical information extraction. We propose the FinSage framework as a solution, utilizing a multi-aspect RAG framework tailored for regulatory compliance analysis in multi-modal financial documents. FinSage introduces three innovative components: (1) a multi-modal pre-processing pipeline that unifies diverse data formats and generates chunk-level metadata summaries, (2) a multi-path sparse-dense retrieval system augmented with query expansion (HyDE) and metadata-aware semantic search, and (3) a domain-specialized re-ranking module fine-tuned via Direct Preference Optimization (DPO) to prioritize compliance-critical content. Extensive experiments demonstrate that FinSage achieves a recall of 92.51% on 75 expert-curated questions and surpasses the best baseline method on the FinanceBench question answering datasets by 24.06% in accuracy. Moreover, FinSage has been successfully deployed as a financial question-answering agent in online meetings, where it has already served more than 1,200 people.



Article 1729

Title@2025-06-06 (5): ResearchTown: Simulator of Human Research Community

Title: ResearchTown: Simulator of Human Research Community ResearchTown: Simulator der Human Research Community 研究对象:人类研究界模拟者 2412.17767v2

Authors (8): Haofei Yu, Zhaochen Hong, Zirui Cheng, Kunlun Zhu, Keyang Xuan, Jinwei Yao, Tao Feng, Jiaxuan You

Large Language Models (LLMs) have demonstrated remarkable potential in scientific domains, yet a fundamental question remains unanswered: Can we simulate human research communities with LLMs? Addressing this question can deepen our understanding of the processes behind idea brainstorming and inspire the automatic discovery of novel scientific insights. In this work, we propose ResearchTown, a multi-agent framework for research community simulation. Within this framework, the human research community is simplified as an agent-data graph, where researchers and papers are represented as agent-type and data-type nodes, respectively, and connected based on their collaboration relationships. We also introduce TextGNN, a text-based inference framework that models various research activities (e.g., paper reading, paper writing, and review writing) as special forms of a unified message-passing process on the agent-data graph. To evaluate the quality of the research community simulation, we present ResearchBench, a benchmark that uses a node-masking prediction task for scalable and objective assessment based on similarity. Our experiments reveal three key findings: (1) ResearchTown can provide a realistic simulation of collaborative research activities, including paper writing and review writing; (2) ResearchTown can maintain robust simulation with multiple researchers and diverse papers; (3) ResearchTown can generate interdisciplinary research ideas that potentially inspire pioneering research directions.



Article 1730

Title@2025-06-06 (5): Generalized Incremental Learning under Concept Drift across Evolving Data Streams

Title: Generalized Incremental Learning under Concept Drift across Evolving Data Streams Allgemeines Inkrementelles Lernen unter Konzept Drift über Evolving Data Streams 贯穿不断演变的数据流的 “ 概念漂流 “ 概念下的普通递增学习 2506.05736v1

Authors (3): En Yu, Jie Lu, Guangquan Zhang

Real-world data streams exhibit inherent non-stationarity characterized by concept drift, posing significant challenges for adaptive learning systems. While existing methods address isolated distribution shifts, they overlook the critical co-evolution of label spaces and distributions under limited supervision and persistent uncertainty. To address this, we formalize Generalized Incremental Learning under Concept Drift (GILCD), characterizing the joint evolution of distributions and label spaces in open-environment streaming contexts, and propose a novel framework called Calibrated Source-Free Adaptation (CSFA). First, CSFA introduces a training-free prototype calibration mechanism that dynamically fuses emerging prototypes with base representations, enabling stable new-class identification without optimization overhead. Second, we design a novel source-free adaptation algorithm, i.e., Reliable Surrogate Gap Sharpness-aware (RSGS) minimization. It integrates sharpness-aware perturbation loss optimization with surrogate gap minimization, while employing entropy-based uncertainty filtering to discard unreliable samples. This mechanism ensures robust distribution alignment and mitigates generalization degradation caused by uncertainties. Therefore, CSFA establishes a unified framework for stable adaptation to evolving semantics and distributions in open-world streaming scenarios. Extensive experiments validate the superior performance and effectiveness of CSFA compared to state-of-the-art approaches.
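
The entropy-based uncertainty filtering step mentioned above can be illustrated in a few lines. The threshold rule used here (half of the maximum possible entropy, $\tfrac{1}{2}\ln K$ for $K$ classes) and the function names are assumptions for the sketch; the abstract does not state how CSFA sets its threshold.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def filter_reliable(batch_probs, max_entropy):
    """Return indices of samples whose predictive entropy is at most `max_entropy`."""
    return [i for i, probs in enumerate(batch_probs) if entropy(probs) <= max_entropy]

# Toy softmax outputs for three unlabeled stream samples (made-up numbers).
batch = [
    [0.98, 0.01, 0.01],   # confident prediction: low entropy
    [0.34, 0.33, 0.33],   # nearly uniform: high entropy
    [0.70, 0.20, 0.10],   # moderately uncertain
]
threshold = 0.5 * math.log(3)      # assumed rule: half of the maximum entropy ln(K)
kept = filter_reliable(batch, threshold)
```

Only the first sample survives the filter in this example; the discarded ones would simply be left out of the adaptation loss for that batch.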



Article 1731

Title@2025-06-06 (5): Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness

Title: Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness Vergessen LLMs wirklich? Bewertung des Unlernens mit Wissenskorrelation und Vertrauensbewusstsein LLMs真的会忘却吗?用知识关联和信心意识来评价不学习 2506.05735v1

Authors (12): Rongzhe Wei, Peizhi Niu, Hans Hao-Hsun Hsu, Ruihan Wu, Haoteng Yin, Mohsen Ghassemi, Yifan Li, Vamsi K. Potluru, Eli Chien, Kamalika Chaudhuri, Olgica Milenkovic, Pan Li

Machine unlearning techniques aim to mitigate unintended memorization in large language models (LLMs). However, existing approaches predominantly focus on the explicit removal of isolated facts, often overlooking latent inferential dependencies and the non-deterministic nature of knowledge within LLMs. Consequently, facts presumed forgotten may persist implicitly through correlated information. To address these challenges, we propose a knowledge unlearning evaluation framework that more accurately captures the implicit structure of real-world knowledge by representing relevant factual contexts as knowledge graphs with associated confidence scores. We further develop an inference-based evaluation protocol leveraging powerful LLMs as judges; these judges reason over the extracted knowledge subgraph to determine unlearning success. Our LLM judges utilize carefully designed prompts and are calibrated against human evaluations to ensure their trustworthiness and stability. Extensive experiments on our newly constructed benchmark demonstrate that our framework provides a more realistic and rigorous assessment of unlearning performance. Moreover, our findings reveal that current evaluation strategies tend to overestimate unlearning effectiveness. Our code is publicly available at https://github.com/Graph-COM/Knowledge_Unlearning.git.



Article 1732

Title@2025-06-06 (5): Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity

Title: Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity Nicht-konvexe Matrix-Erfassung: Brechen der quadratischen Rank-Schranke in der Probenkomplexität 非曲线矩阵表感测:打破样本复杂程度的二次级屏障 2408.13276v3

Authors (2): Dominik Stöger, Yizhe Zhu

For the problem of reconstructing a low-rank matrix from a few linear measurements, two classes of algorithms have been widely studied in the literature: convex approaches based on nuclear norm minimization, and non-convex approaches that use factorized gradient descent. Under certain statistical model assumptions, it is known that nuclear norm minimization recovers the ground truth as soon as the number of samples scales linearly with the number of degrees of freedom of the ground truth. In contrast, while non-convex approaches are computationally less expensive, existing recovery guarantees assume that the number of samples scales at least quadratically with the rank $r$ of the ground-truth matrix. In this paper, we close this gap by showing that the non-convex approaches can be as efficient as nuclear norm minimization in terms of sample complexity. Namely, we consider the problem of reconstructing a positive semidefinite matrix from a few Gaussian measurements. We show that factorized gradient descent with spectral initialization converges to the ground truth with a linear rate as soon as the number of samples scales with $ \Omega (rd\kappa^2)$, where $d$ is the dimension, and $\kappa$ is the condition number of the ground truth matrix. This improves the previous rank-dependence in the sample complexity of non-convex matrix factorization from quadratic to linear. Our proof relies on a probabilistic decoupling argument, where we show that the gradient descent iterates are only weakly dependent on the individual entries of the measurement matrices. We expect that our proof technique is of independent interest for other non-convex problems.
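
A small numerical sketch of factorized gradient descent with spectral initialization follows. To stay short and dependency-free it makes several assumptions beyond the abstract: the ground truth is rank one ($X = x x^\top$), the measurement matrices are symmetrized Gaussian matrices, the top eigenpair is found by power iteration, and the step size is 0.1 divided by the estimated top eigenvalue. The loss is $f(u) = \frac{1}{4m}\sum_i (u^\top S_i u - y_i)^2$.

```python
import random

def make_problem(d=4, m=200, seed=0):
    """Rank-one ground truth x x^T and m symmetrized Gaussian measurement matrices."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(d)]
    sym_mats = []
    for _ in range(m):
        a = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
        sym_mats.append([[(a[i][j] + a[j][i]) / 2 for j in range(d)] for i in range(d)])
    ys = [sum(x_true[i] * s[i][j] * x_true[j] for i in range(d) for j in range(d))
          for s in sym_mats]                       # y_i = <S_i, x x^T>
    return x_true, sym_mats, ys

def spectral_init(sym_mats, ys, iters=200):
    """Top eigenpair of M = (1/m) sum_i y_i S_i, which equals x x^T in expectation."""
    d, m = len(sym_mats[0]), len(sym_mats)
    mat = [[sum(y * s[i][j] for s, y in zip(sym_mats, ys)) / m for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):                         # power iteration
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    lam = abs(sum(v[i] * mat[i][j] * v[j] for i in range(d) for j in range(d)))
    return [c * lam ** 0.5 for c in v], lam

def factorized_gd(sym_mats, ys, u, lam, steps=400, rate=0.1):
    """Gradient descent on f(u); the gradient is (1/m) sum_i residual_i * S_i u."""
    d, m = len(u), len(sym_mats)
    eta = rate / lam
    for _ in range(steps):
        grad = [0.0] * d
        for s, y in zip(sym_mats, ys):
            su = [sum(s[i][j] * u[j] for j in range(d)) for i in range(d)]
            residual = sum(ui * c for ui, c in zip(u, su)) - y
            for i in range(d):
                grad[i] += residual * su[i] / m
        u = [ui - eta * g for ui, g in zip(u, grad)]
    return u

def relative_error(u, x):
    """Distance to x up to the unavoidable global sign, relative to |x|."""
    plus = sum((a - b) ** 2 for a, b in zip(u, x)) ** 0.5
    minus = sum((a + b) ** 2 for a, b in zip(u, x)) ** 0.5
    return min(plus, minus) / sum(b * b for b in x) ** 0.5

x_true, sym_mats, ys = make_problem()
u0, lam = spectral_init(sym_mats, ys)
u_hat = factorized_gd(sym_mats, ys, u0, lam)
err = relative_error(u_hat, x_true)
```

In this noiseless toy, `err` should shrink geometrically with the number of steps, mirroring the linear convergence rate described in the abstract; the sample size here is far above the $\Omega(rd\kappa^2)$ threshold and is not meant to probe it.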



Article 1733

Title@2025-06-06 (5): GroMo: Plant Growth Modeling with Multiview Images

Title: GroMo: Plant Growth Modeling with Multiview Images GroMo: Plant Growth Modeling mit Multiview-Bildern Gromo:以多视角图像建模的植物增长模型 2503.06608v2

Authors (8): Ruchi Bhatt, Shreya Bansal, Amanpreet Chander, Rupinder Kaur, Malya Singh, Mohan Kankanhalli, Abdulmotaleb El Saddik, Mukesh Kumar Saini

Understanding plant growth dynamics is essential for applications in agriculture and plant phenotyping. We present the Growth Modelling (GroMo) challenge, which is designed for two primary tasks: (1) plant age prediction and (2) leaf count estimation, both essential for crop monitoring and precision agriculture. For this challenge, we introduce GroMo25, a dataset with images of four crops: radish, okra, wheat, and mustard. Each crop consists of multiple plants (p1, p2, …, pn) captured over different days (d1, d2, …, dm) and categorized into five levels (L1, L2, L3, L4, L5). Each plant is captured from 24 different angles with a 15-degree gap between images. Participants are required to perform both tasks for all four crops with these multiview images. We propose a Multiview Vision Transformer (MVVT) model for the GroMo challenge and evaluate its crop-wise performance on GroMo25. MVVT reports an average MAE of 7.74 for age prediction and an MAE of 5.52 for leaf count. The GroMo Challenge aims to advance plant phenotyping research by encouraging innovative solutions for tracking and predicting plant growth. The GitHub repository is publicly available at https://github.com/mriglab/GroMo-Plant-Growth-Modeling-with-Multiview-Images.



Article 1734

Title@2025-06-06 (5): Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization

Title: Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization Provable Complexity Verbesserung von AdaGrad über SGD: obere und untere Bounds in stochastischen Non-Convex-Optimierung AdaGrad的复杂度高于SGD的改善:在斯托切性非Convex优化中的上下界 2406.04592v3

Authors (3): Ruichen Jiang, Devyani Maladkar, Aryan Mokhtari

Adaptive gradient methods, such as AdaGrad, are among the most successful optimization algorithms for neural network training. While these methods are known to achieve better dimensional dependence than stochastic gradient descent (SGD) for stochastic convex optimization under favorable geometry, the theoretical justification for their success in stochastic non-convex optimization remains elusive. In fact, under standard assumptions of Lipschitz gradients and bounded noise variance, it is known that SGD is worst-case optimal in terms of finding a near-stationary point with respect to the $l_2$-norm, making further improvements impossible. Motivated by this limitation, we introduce refined assumptions on the smoothness structure of the objective and the gradient noise variance, which better suit the coordinate-wise nature of adaptive gradient methods. Moreover, we adopt the $l_1$-norm of the gradient as the stationarity measure, as opposed to the standard $l_2$-norm, to align with the coordinate-wise analysis and obtain tighter convergence guarantees for AdaGrad. Under these new assumptions and the $l_1$-norm stationarity measure, we establish an upper bound on the convergence rate of AdaGrad and a corresponding lower bound for SGD. In particular, we identify non-convex settings in which the iteration complexity of AdaGrad is favorable over SGD and show that, for certain configurations of problem parameters, it outperforms SGD by a factor of $d$, where $d$ is the problem dimension. To the best of our knowledge, this is the first result to demonstrate a provable gain of adaptive gradient methods over SGD in a non-convex setting. We also present supporting lower bounds, including one specific to AdaGrad and one applicable to general deterministic first-order methods, showing that our upper bound for AdaGrad is tight and unimprovable up to a logarithmic factor under certain conditions.
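To make the coordinate-wise nature of AdaGrad and the two stationarity measures concrete, here is a minimal sketch of the standard diagonal AdaGrad update next to plain SGD, with helpers for the $l_1$ and $l_2$ gradient norms. The function names, default learning rates, and the `eps` stabilizer are illustrative assumptions; the sketch does not reproduce the paper's assumptions or bounds. The only fact used about the norms is the standard inequality $\|g\|_2 \le \|g\|_1 \le \sqrt{d}\,\|g\|_2$.

```python
import numpy as np

def adagrad(grad_fn, x0, lr=0.5, iters=200, eps=1e-8):
    """Diagonal AdaGrad: each coordinate is scaled by its own gradient history."""
    x = np.array(x0, dtype=float)      # copy, so x0 is not modified
    accum = np.zeros_like(x)           # running sum of squared gradients
    for _ in range(iters):
        g = grad_fn(x)
        accum += g * g
        x -= lr * g / (np.sqrt(accum) + eps)
    return x

def sgd(grad_fn, x0, lr=0.01, iters=200):
    """Plain (stochastic) gradient descent with one global step size."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x -= lr * grad_fn(x)
    return x

def l1_stationarity(g):
    """l1-norm of the gradient, the measure adopted in the abstract."""
    return float(np.sum(np.abs(g)))

def l2_stationarity(g):
    """l2-norm of the gradient, the standard measure."""
    return float(np.sqrt(np.sum(g * g)))
```

Passing a `grad_fn` that returns noisy gradients turns either loop into its stochastic variant; the per-coordinate scaling in `adagrad` is what makes it insensitive to badly scaled coordinates.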



Article 1735

Title@2025-06-06 (5): Large Language Models are Good Relational Learners

Title: Large Language Models are Good Relational Learners Große Sprachmodelle sind gute relationale Lernende 大语言模型是好关系学习者 2506.05725v1

Authors (3): Fang Wu, Vijay Prakash Dwivedi, Jure Leskovec

Large language models (LLMs) have demonstrated remarkable capabilities across various domains, yet their application to relational deep learning (RDL) remains underexplored. Existing approaches adapt LLMs by traversing relational links between entities in a database and converting the structured data into flat text documents. Still, this text-based serialization disregards critical relational structures, introduces redundancy, and often exceeds standard LLM context lengths. We introduce Rel-LLM, a novel architecture that utilizes a graph neural network (GNN)-based encoder to generate structured relational prompts for LLMs within a retrieval-augmented generation (RAG) framework. Unlike traditional text-based serialization approaches, our method preserves the inherent relational structure of databases while enabling LLMs to effectively process and reason over complex entity relationships. Specifically, the GNN encoder extracts a local subgraph around an entity to build feature representations that contain relevant entity relationships and temporal dependencies. These representations are transformed into structured prompts using a denormalization process, effectively allowing the LLM to reason over relational structures. Through extensive experiments, we demonstrate that Rel-LLM outperforms existing methods on key RDL tasks, offering a scalable and efficient approach to integrating LLMs with structured data sources. Code is available at https://github.com/smiles724/Rel-LLM.



Article 1736

Title@2025-06-06 (5): Any-Class Presence Likelihood for Robust Multi-Label Classification with Abundant Negative Data

Title: Any-Class Presence Likelihood for Robust Multi-Label Classification with Abundant Negative Data Jede-Klasse-Präsenz-Wahrscheinlichkeit für robuste Multi-Label-Klassifikation mit überschwänglichen Negativdaten 具有大量负值数据的强性多标签分类中存在任何类别的可能性 2506.05721v1

Authors (5): Dumindu Tissera, Omar Awadallah, Muhammad Umair Danish, Ayan Sadhu, Katarina Grolinger

Multi-label Classification (MLC) assigns an instance to one or more non-exclusive classes. A challenge arises when the dataset contains a large proportion of instances with no assigned class, referred to as negative data, which can overwhelm the learning process and hinder the accurate identification and classification of positive instances. Nevertheless, it is common in MLC applications such as industrial defect detection, agricultural disease identification, and healthcare diagnosis to encounter large amounts of negative data. Assigning a separate negative class to these instances further complicates the learning objective and introduces unnecessary redundancies. To address this challenge, we redesign standard MLC loss functions by deriving a likelihood of any class being present, formulated by a normalized weighted geometric mean of the predicted class probabilities. We introduce a regularization parameter that controls the relative contribution of the absent class probabilities to the any-class presence likelihood in positive instances. The any-class presence likelihood complements the multi-label learning by encouraging the network to become more aware of implicit positive instances and improve the label classification within those positive instances. Experiments on large-scale datasets with negative data: SewerML, modified COCO, and ChestX-ray14, across various networks and base loss functions show that our loss functions consistently improve MLC performance of their standard loss counterparts, achieving gains of up to 6.01 percentage points in F1, 8.06 in F2, and 3.11 in mean average precision, all without additional parameters or computational complexity. Code available at: https://github.com/ML-for-Sensor-Data-Western/gmean-mlc



Article 1737

Title@2025-06-06 (5): Optimization Proxies using Limited Labeled Data and Training Time – A Semi-Supervised Bayesian Neural Network Approach

Title: Optimization Proxies using Limited Labeled Data and Training Time – A Semi-Supervised Bayesian Neural Network Approach Optimierungs-Proxies mit begrenzten beschrifteten Daten und Trainingszeit – Ein semi-überwachter Bayesian Neural Network-Ansatz 使用有限标签数据和培训时间的有限标签数据和培训时间 – – 半超模贝耶斯神经网络方法 2410.03085v3

Authors (5): Parikshit Pareek, Abhijith Jayakumar, Kaarthik Sundar, Deepjyoti Deka, Sidhant Misra

Constrained optimization problems arise in various engineering systems such as inventory management and power grids. Standard deep neural network (DNN) based machine learning proxies are ineffective in practical settings where labeled data is scarce and training times are limited. We propose a semi-supervised Bayesian Neural Networks (BNNs) based optimization proxy for this complex regime, wherein training commences in a sandwiched fashion, alternating between a supervised learning step for minimizing cost, and an unsupervised learning step for enforcing constraint feasibility. We show that the proposed semi-supervised BNN outperforms DNN architectures on important non-convex constrained optimization problems from energy network operations, achieving up to a tenfold reduction in expected maximum equality gap and halving the inequality gaps. Further, the BNN’s ability to provide posterior samples is leveraged to construct practically meaningful probabilistic confidence bounds on performance using limited validation data, unlike prior methods. The implementation code for this study is available at: https://github.com/kaarthiksundar/BNN-OPF/.



Article 1738

Title@2025-06-06 (5): Grokking Beyond the Euclidean Norm of Model Parameters

Title: Grokking Beyond the Euclidean Norm of Model Parameters Grokking jenseits der euklidischen Norm von Modellparametern 示范参数欧洲标准 2506.05718v1

Authors (3): Pascal Jr Tikeng Notsawo, Guillaume Dumas, Guillaume Rabusseau

Grokking refers to a delayed generalization following overfitting when optimizing artificial neural networks with gradient-based methods. In this work, we demonstrate that grokking can be induced by regularization, either explicit or implicit. More precisely, we show that when there exists a model with a property $P$ (e.g., sparse or low-rank weights) that generalizes on the problem of interest, gradient descent with a small but non-zero regularization of $P$ (e.g., $\ell_1$ or nuclear norm regularization) results in grokking. This extends previous work showing that small non-zero weight decay induces grokking. Moreover, our analysis shows that over-parameterization by adding depth makes it possible to grok or ungrok without explicitly using regularization, which is impossible in shallow cases. We further show that the $\ell_2$ norm is not a reliable proxy for generalization when the model is regularized toward a different property $P$, as the $\ell_2$ norm grows in many cases where no weight decay is used, but the model generalizes anyway. We also show that grokking can be amplified solely through data selection, with any other hyperparameter fixed.



Article 1739

Title@2025-06-06 (5): Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning

Title: Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning Ensemble Elastic DQN: Ein neuartiger Mehrschritt-Ensemble-Ansatz zur Bewältigung von Überschätzungen im tiefen wertbasierten Verstärkungslernen 共合大 Elastic DQN: 一种新颖的多步共步办法,以解决深重价值强化学习中的高估问题 2506.05716v1

Authors (5): Adrian Ly, Richard Dazeley, Peter Vamplew, Francisco Cruz, Sunil Aryal

While many algorithmic extensions to Deep Q-Networks (DQN) have been proposed, there remains limited understanding of how different improvements interact. In particular, multi-step and ensemble style extensions have shown promise in reducing overestimation bias, thereby improving sample efficiency and algorithmic stability. In this paper, we introduce a novel algorithm called Ensemble Elastic Step DQN (EEDQN), which unifies ensembles with elastic step updates to stabilise algorithmic performance. EEDQN is designed to address two major challenges in deep reinforcement learning: overestimation bias and sample efficiency. We evaluated EEDQN against standard and ensemble DQN variants across the MinAtar benchmark, a set of environments that emphasise behavioral learning while reducing representational complexity. Our results show that EEDQN achieves consistently robust performance across all tested environments, outperforming baseline DQN methods and matching or exceeding state-of-the-art ensemble DQNs in final returns on most of the MinAtar environments. These findings highlight the potential of systematically combining algorithmic improvements and provide evidence that ensemble and multi-step methods, when carefully integrated, can yield substantial gains.



Article 1740

Title@2025-06-06 (5): Mirage: A Multi-Level Superoptimizer for Tensor Programs

Title: Mirage: A Multi-Level Superoptimizer for Tensor Programs Mirage: Ein Multi-Level-Superoptimizer für Tensor-Programme 幻影:向导方案多层次超强激励器 2405.05751v3

Authors (10): Mengdi Wu, Xinhao Cheng, Shengyu Liu, Chunan Shi, Jianan Ji, Kit Ao, Praveen Velliengiri, Xupeng Miao, Oded Padon, Zhihao Jia

We introduce Mirage, the first multi-level superoptimizer for tensor programs. A key idea in Mirage is $\mu$Graphs, a uniform representation of tensor programs at the kernel, thread block, and thread levels of the GPU compute hierarchy. $\mu$Graphs enable Mirage to discover novel optimizations that combine algebraic transformations, schedule transformations, and generation of new custom kernels. To navigate the large search space, Mirage introduces a pruning technique based on abstraction that significantly reduces the search space and provides a certain optimality guarantee. To ensure that the optimized $\mu$Graph is equivalent to the input program, Mirage introduces a probabilistic equivalence verification procedure with strong theoretical guarantees. Our evaluation shows that Mirage outperforms existing approaches by up to 3.3$\times$ even for DNNs that are widely used and heavily optimized. Mirage is publicly available at https://github.com/mirage-project/mirage.



Article 1741

Title@2025-06-06 (5): Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation

Title: Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation Kommen Sie zusammen, aber nicht jetzt: Eine progressive Strategie, um Low-Rank-Anpassung zu fördern 齐心合力,但现在不是现在:一个推进低Rank适应的渐进战略 2506.05713v1

Authors (12): Zhan Zhuang, Xiequn Wang, Wei Li, Yulong Zhang, Qiushi Huang, Shuhao Chen, Xuehao Wang, Yanbin Wei, Yuhe Nie, Kede Ma, Yu Zhang, Ying Wei

Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models, yet it often locks adapters into suboptimal minima near their initialization. This hampers model generalization and limits downstream operators such as adapter merging and pruning. Here, we propose CoTo, a progressive training strategy that gradually increases adapters’ activation probability over the course of fine-tuning. By stochastically deactivating adapters, CoTo encourages more balanced optimization and broader exploration of the loss landscape. We provide a theoretical analysis showing that CoTo promotes layer-wise dropout stability and linear mode connectivity, and we adopt a cooperative-game approach to quantify each adapter’s marginal contribution. Extensive experiments demonstrate that CoTo consistently boosts single-task performance, enhances multi-task merging accuracy, improves pruning robustness, and reduces training overhead, all while remaining compatible with diverse LoRA variants. Code is available at https://github.com/zwebzone/coto.
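The idea of progressively raising each adapter's activation probability can be sketched as follows. The abstract does not specify the schedule, so the linear ramp, the parameters `ramp_frac` and `p_start`, and all function names below are hypothetical choices made only for illustration; the LoRA forward pass is the usual `W x + B (A x)` form with the low-rank branch skipped when the adapter is sampled as inactive.

```python
import numpy as np

def activation_prob(step, total_steps, ramp_frac=0.75, p_start=0.0):
    """Assumed linear ramp: probability grows from p_start to 1.0
    over the first ramp_frac of training, then stays at 1.0."""
    ramp = max(1, int(ramp_frac * total_steps))
    return min(1.0, p_start + (1.0 - p_start) * step / ramp)

def sample_adapter_mask(n_adapters, p, rng):
    """Independently keep each adapter active with probability p."""
    return rng.random(n_adapters) < p

def lora_forward(x, W, A, B, active):
    """Frozen weight W plus the low-rank update B @ A when the adapter is active."""
    out = W @ x
    if active:
        out = out + B @ (A @ x)
    return out
```

In a training loop, one mask would be drawn per step with `sample_adapter_mask(n_layers, activation_prob(step, total_steps), rng)` and passed layer by layer to `lora_forward`; at inference all adapters are active.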



Article 1742

Title@2025-06-06 (5): A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models

Title: A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models Eine Umfrage über Sparse Autoencoder: Interpretation der internen Mechanismen großer Sprachmodelle 粗略自动编码器调查:解释大语言模式内部机制 2503.05613v2

Authors (7): Dong Shu, Xuansheng Wu, Haiyan Zhao, Daking Rai, Ziyu Yao, Ninghao Liu, Mengnan Du

Large Language Models (LLMs) have transformed natural language processing, yet their internal mechanisms remain largely opaque. Recently, mechanistic interpretability has attracted significant attention from the research community as a means to understand the inner workings of LLMs. Among various mechanistic interpretability approaches, Sparse Autoencoders (SAEs) have emerged as a promising method due to their ability to disentangle the complex, superimposed features within LLMs into more interpretable components. This paper presents a comprehensive survey of SAEs for interpreting and understanding the internal workings of LLMs. Our major contributions include: (1) exploring the technical framework of SAEs, covering basic architecture, design improvements, and effective training strategies; (2) examining different approaches to explaining SAE features, categorized into input-based and output-based explanation methods; (3) discussing evaluation methods for assessing SAE performance, covering both structural and functional metrics; and (4) investigating real-world applications of SAEs in understanding and manipulating LLM behaviors.



Article 1743

Title@2025-06-06 (5): Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application

Title: Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application Latent Diffusion Modellbasierter Denoisierungsempfänger für 6G Semantische Kommunikation: Von der stochastischen Differentialtheorie zur Anwendung 用于 6G 语义通讯: 从斯托卡差异理论到应用的 6G 语义通讯的 以 DEM 为基础的前传播模型模型 2506.05710v1

Authors (4): Xiucheng Wang, Honggang Jia, Nan Cheng, Dusit Niyato

In this paper, a novel semantic communication framework empowered by generative artificial intelligence (GAI) is proposed, specifically leveraging the capabilities of diffusion models (DMs). A rigorous theoretical foundation is established based on stochastic differential equations (SDEs), which elucidates the denoising properties of DMs in mitigating additive white Gaussian noise (AWGN) in latent semantic representations. Crucially, a closed-form analytical relationship between the signal-to-noise ratio (SNR) and the denoising timestep is derived, enabling the optimal selection of diffusion parameters for any given channel condition. To address the distribution mismatch between the received signal and the DM’s training data, a mathematically principled scaling mechanism is introduced, ensuring robust performance across a wide range of SNRs without requiring model fine-tuning. Built upon this theoretical insight, we develop a latent diffusion model (LDM)-based semantic transceiver, wherein a variational autoencoder (VAE) is employed for efficient semantic compression, and a pretrained DM serves as a universal denoiser. Notably, the proposed architecture is fully training-free at inference time, offering high modularity and compatibility with large-scale pretrained LDMs. This design inherently supports zero-shot generalization and mitigates the challenges posed by out-of-distribution inputs. Extensive experimental evaluations demonstrate that the proposed framework significantly outperforms conventional neural-network-based semantic communication baselines, particularly under low SNR conditions and distributional shifts, thereby establishing a promising direction for GAI-driven robust semantic transmission in future 6G systems.
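The link between channel SNR and denoising timestep can be illustrated with the standard variance-preserving forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, whose SNR is $\bar\alpha_t / (1-\bar\alpha_t)$. The sketch below is not the paper's closed-form result: it assumes a unit-power latent, an AWGN channel $y = x_0 + n$, and a linear beta schedule, and it simply picks the timestep whose $\bar\alpha_t$ is closest to $\mathrm{SNR}/(1+\mathrm{SNR})$. The function names and schedule parameters are hypothetical.

```python
import numpy as np

def alpha_bar_schedule(T=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal coefficient alpha_bar_t for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def match_timestep(snr_db, alpha_bar):
    """Return (t, scale) so that scale * y resembles a diffusion sample x_t.

    Assumes y = x0 + n with unit-power x0 and noise variance 1 / snr.
    """
    snr = 10.0 ** (snr_db / 10.0)
    target = snr / (1.0 + snr)                  # alpha_bar with the same SNR
    t = int(np.argmin(np.abs(alpha_bar - target)))
    scale = float(np.sqrt(alpha_bar[t]))        # maps y onto the x_t scale
    return t, scale
```

With this matching, `scale * y` has signal coefficient $\sqrt{\bar\alpha_t}$ and noise variance close to $1-\bar\alpha_t$, so a pretrained denoiser can be started from step `t` without fine-tuning; lower channel SNR maps to a later (noisier) timestep.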



Article 1744

Title@2025-06-06 (5): Understanding Memorization in Generative Models via Sharpness in Probability Landscapes

Title: Understanding Memorization in Generative Models via Sharpness in Probability Landscapes Gedächtnis in Generativen Modellen über Schärfe in Wahrscheinlichkeitslandschaften verstehen 通过在概率景观中的尖锐化理解生成模型中的记忆化 2412.04140v3

Authors (3): Dongjae Jeon, Dueun Kim, Albert No

In this paper, we introduce a geometric framework to analyze memorization in diffusion models through the sharpness of the log probability density. We mathematically justify a previously proposed score-difference-based memorization metric by demonstrating its effectiveness in quantifying sharpness. Additionally, we propose a novel memorization metric that captures sharpness at the initial stage of image generation in latent diffusion models, offering early insights into potential memorization. Leveraging this metric, we develop a mitigation strategy that optimizes the initial noise of the generation process using a sharpness-aware regularization term.



Article 1745

Title@2025-06-06 (5): Online Detection of LLM-Generated Texts via Sequential Hypothesis Testing by Betting

Title: Online Detection of LLM-Generated Texts via Sequential Hypothesis Testing by Betting Online-Erkennung von LLM-generierten Texten durch sequentielle Hypothesenprüfung durch Wetten 通过打赌按顺序假设测试在线检测LLM发光文本 2410.22318v3

Authors (2): Can Chen, Jun-Kun Wang

Developing algorithms to differentiate between machine-generated texts and human-written texts has garnered substantial attention in recent years. Existing methods in this direction typically concern an offline setting where a dataset containing a mix of real and machine-generated texts is given upfront, and the task is to determine whether each sample in the dataset is from a large language model (LLM) or a human. However, in many practical scenarios, sources such as news websites, social media accounts, and online forums publish content in a streaming fashion. Therefore, in this online scenario, how to quickly and accurately determine whether the source is an LLM with strong statistical guarantees is crucial for these media or platforms to function effectively and prevent the spread of misinformation and other potential misuse of LLMs. To tackle the problem of online detection, we develop an algorithm based on the techniques of sequential hypothesis testing by betting that not only builds upon and complements existing offline detection techniques but also enjoys statistical guarantees, which include a controlled false positive rate and the expected time to correctly identify a source as an LLM. Experiments were conducted to demonstrate the effectiveness of our method.
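A simplified sketch of sequential testing by betting is given below. It is a one-sample version with a fixed bet, not the algorithm of the paper: it assumes each incoming text receives a detector score in $[0, 1]$ and that, under the null hypothesis (human source), the conditional mean score is at most `null_mean`. The wealth process is then a nonnegative supermartingale under the null, and by Ville's inequality stopping when wealth reaches $1/\alpha$ keeps the false positive rate at most $\alpha$. The names `betting_test`, `null_mean`, and `bet` are hypothetical.

```python
def betting_test(scores, null_mean, alpha=0.05, bet=None):
    """Return (stopping_time, wealth); stopping_time is None if H0 is never rejected.

    scores: iterable of detector scores in [0, 1], arriving one at a time.
    null_mean: assumed upper bound on the mean score of human-written text.
    bet: fixed betting fraction in [0, 1 / null_mean] so wealth stays nonnegative.
    """
    if bet is None:
        bet = 0.5 / null_mean
    wealth = 1.0
    for t, s in enumerate(scores, start=1):
        # Wealth grows when scores exceed the null mean, shrinks otherwise.
        wealth *= 1.0 + bet * (s - null_mean)
        if wealth >= 1.0 / alpha:
            return t, wealth        # declare the source an LLM at time t
    return None, wealth
```

Practical variants choose the bet adaptively from past observations, which affects how quickly an LLM source is flagged but not the validity argument, as long as each bet depends only on earlier scores.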



Article 1746

Title@2025-06-06 (5): Action-Adaptive Continual Learning: Enabling Policy Generalization under Dynamic Action Spaces

Title: Action-Adaptive Continual Learning: Enabling Policy Generalization under Dynamic Action Spaces Action-Adaptive Continual Learning: Politische Generalisierung unter dynamischen Aktionsräumen aktivieren 行动促进性持续学习:动态行动空间下的扶持性政策概括化 2506.05702v1

Authors (7): Chaofan Pan, Jiafen Liu, Yanhua Li, Linbo Xiong, Fan Min, Wei Wei, Xin Yang

Continual Learning (CL) is a powerful tool that enables agents to learn a sequence of tasks, accumulating knowledge learned in the past and using it for problem-solving or future task learning. However, existing CL methods often assume that the agent’s capabilities remain static within dynamic environments, which doesn’t reflect real-world scenarios where capabilities dynamically change. This paper introduces a new and realistic problem: Continual Learning with Dynamic Capabilities (CL-DC), posing a significant challenge for CL agents: How can policy generalization across different action spaces be achieved? Inspired by the cortical functions, we propose an Action-Adaptive Continual Learning framework (AACL) to address this challenge. Our framework decouples the agent’s policy from the specific action space by building an action representation space. For a new action space, the encoder-decoder of action representations is adaptively fine-tuned to maintain a balance between stability and plasticity. Furthermore, we release a benchmark based on three environments to validate the effectiveness of methods for CL-DC. Experimental results demonstrate that our framework outperforms popular methods by generalizing the policy across action spaces.



Article 1747

Title@2025-06-06 (5): Statistically Valid Post-Deployment Monitoring Should Be Standard for AI-Based Digital Health

Title: Statistically Valid Post-Deployment Monitoring Should Be Standard for AI-Based Digital Health Statistisch gültige Überwachung nach der Bereitstellung sollte Standard für KI-basierte digitale Gesundheit sein 统计上有效的部署后监测应作为AI 数字健康的标准 2506.05701v1

Authors (4): Pavel Dolin, Weizhi Li, Gautam Dasarathy, Visar Berisha

This position paper argues that post-deployment monitoring in clinical AI is underdeveloped and proposes statistically valid and label-efficient testing frameworks as a principled foundation for ensuring reliability and safety in real-world deployment. A recent review found that only 9% of FDA-registered AI-based healthcare tools include a post-deployment surveillance plan. Existing monitoring approaches are often manual, sporadic, and reactive, making them ill-suited for the dynamic environments in which clinical models operate. We contend that post-deployment monitoring should be grounded in label-efficient and statistically valid testing frameworks, offering a principled alternative to current practices. We use the term “statistically valid” to refer to methods that provide explicit guarantees on error rates (e.g., Type I/II error), enable formal inference under pre-defined assumptions, and support reproducibility, features that align with regulatory requirements. Specifically, we propose that the detection of changes in the data and model performance degradation should be framed as distinct statistical hypothesis testing problems. Grounding monitoring in statistical rigor ensures a reproducible and scientifically sound basis for maintaining the reliability of clinical AI systems. Importantly, it also opens new research directions for the technical community, spanning theory, methods, and tools for statistically principled detection, attribution, and mitigation of post-deployment model failures in real-world settings.



Article 1748

Title@2025-06-06 (5): Knowledge Retention for Continual Model-Based Reinforcement Learning

Title: Knowledge Retention for Continual Model-Based Reinforcement Learning Wissensvorbehalt für kontinuierliches modellbasiertes Verstärkungslernen 继续保留知识,以不断保持知识,进行基于模式的不断强化学习 2503.04256v4

Authors (4): Yixiang Sun, Haotian Fu, Michael Littman, George Konidaris

We propose DRAGO, a novel approach for continual model-based reinforcement learning aimed at improving the incremental development of world models across a sequence of tasks that differ in their reward functions but not the state space or dynamics. DRAGO comprises two key components: Synthetic Experience Rehearsal, which leverages generative models to create synthetic experiences from past tasks, allowing the agent to reinforce previously learned dynamics without storing data, and Regaining Memories Through Exploration, which introduces an intrinsic reward mechanism to guide the agent toward revisiting relevant states from prior tasks. Together, these components enable the agent to maintain a comprehensive and continually developing world model, facilitating more effective learning and adaptation across diverse environments. Empirical evaluations demonstrate that DRAGO is able to preserve knowledge across tasks, achieving superior performance in various continual learning scenarios.



Article 1749

Title@2025-06-06 (5): Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning

Title: Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning Allgemeines LLM-Lernen von Graphen-Synthetischen Daten mit Verstärkungs-Lernen 利用强化学习学习图形合成数据学习 2506.00845v2

Authors (6): Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xinyun Liu, Yulia Tsvetkov

Previous research has sought to enhance the graph reasoning capabilities of LLMs by supervised fine-tuning on synthetic graph data. While these efforts led to specialized LLMs better at solving graph algorithm problems, we don’t need LLMs for shortest path: we need generalization from synthetic graph data to real-world tasks with implicit graph structures. In this work, we propose to unlock generalizable learning of graph synthetic data with reinforcement learning. We first design solution-based and process-based rewards for synthetic graph problems: instead of rigidly memorizing response patterns in direct fine-tuning, we posit that RL would help LLMs grasp the essentials underlying graph reasoning and alleviate overfitting. We employ RL algorithms such as GRPO and DPO, aligning both off-the-shelf LLMs and LLMs fine-tuned on synthetic graph data. We then compare them against existing settings on both in-domain synthetic tasks and out-of-domain real-world tasks with implicit graph structures such as multi-hop QA, structured planning, and more. Extensive experiments demonstrate that our RL recipe leads to statistically significant improvement on 5 datasets, with an average gain of 12.9% over baseline settings. Further analysis reveals that process-based rewards consistently outperform solution-based rewards, that mixing synthetic and real-world task data yields potential gains, and that compositionality and explainable intermediate steps remain a critical challenge even after RL.



Article 1750

Title@2025-06-06 (5): Graph Attention Networks Unleashed: A Fast and Explainable Vulnerability Assessment Framework for Microgrids

Title: Graph Attention Networks Unleashed: A Fast and Explainable Vulnerability Assessment Framework for Microgrids Graphen-Achtungsnetzwerke unleashed: Ein schneller und erklärbarer Sicherheitsbewertungsrahmen für Mikrogrids 未剥离的图示注意网络:微电网快速和可解释的脆弱性评估框架 2503.00786v2

Authors (5): Wei Liu, Tao Zhang, Chenhui Lin, Kaiwen Li, Rui Wang

Independent microgrids are crucial for supplying electricity by combining distributed energy resources and loads in scenarios like isolated islands and field combat. Fast and accurate assessments of microgrid vulnerability against intentional attacks or natural disasters are essential for effective risk prevention and design optimization. However, conventional Monte Carlo simulation (MCS) methods are computationally expensive and time-consuming, while existing machine learning-based approaches often lack accuracy and explainability. To address these challenges, this study proposes a fast and explainable vulnerability assessment framework that integrates MCS with a graph attention network enhanced by self-attention pooling (GAT-S). MCS generates training data, while the GAT-S model learns the structural and electrical characteristics of the microgrid and further assesses its vulnerability intelligently. The GAT-S improves explainability and computational efficiency by dynamically assigning attention weights to critical nodes. Comprehensive experimental evaluations across various microgrid configurations demonstrate that the proposed framework provides accurate vulnerability assessments, achieving a mean squared error as low as 0.001, real-time responsiveness within 1 second, and delivering explainable results.



Article 1751

Title@2025-06-06 (5): Being Strong Progressively! Enhancing Knowledge Distillation of Large Language Models through a Curriculum Learning Framework

Title: Being Strong Progressively! Enhancing Knowledge Distillation of Large Language Models through a Curriculum Learning Framework Progressiv stark sein! Erweiterung der Wissensdestillation von großen Sprachmodellen durch einen Curriculum Learning Framework 通过课程学习框架加强大语言模式的知识提炼 2506.05695v1

Authors (2): Lingyuan Liu, Mengxiang Zhang

Knowledge Distillation (KD) compresses large language models (LLMs) by transferring the teacher model’s capabilities to a smaller student model, reducing inference cost and memory usage while maintaining performance. However, existing KD methods for LLMs often fail to prevent significant shifts in the student model’s distribution during training, leading to issues such as catastrophic forgetting, mode collapse, and training-inference mismatch. To address these challenges, we propose a novel, plug-in curriculum learning framework inspired by the strength training principle of “progressive overload” (POCL), which can be seamlessly integrated into existing white-box KD approaches with minimal computational overhead. The framework comprises two core components: (1) a difficulty measurer that ranks and partitions training samples from easy to hard, and (2) a training scheduler that incrementally introduces these subsets into the distillation process at fixed intervals while applying loss functions with progressively rising temperatures. By starting with the easiest samples and progressively increasing the difficulty, the approach enhances both the stability and efficiency of learning. Extensive experiments in instruction-following settings demonstrate that POCL consistently improves the performance of distilled student models across various white-box KD methods and model families. Our findings highlight the effectiveness of sorted training samples in KD for LLMs. More generally, our work demonstrates how to structure training data within the KD process to enhance the stability and performance of distilled LLMs.
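The two components described above, a difficulty measurer and a training scheduler, can be sketched as a small generator. The difficulty scores, the number of stages, and the linear temperature range `t_start` to `t_end` are hypothetical inputs chosen for illustration; the abstract does not specify how difficulty is measured or which temperatures are used.

```python
import numpy as np

def partition_by_difficulty(difficulty, n_stages):
    """Rank samples from easy to hard and split the ranking into n_stages subsets."""
    order = np.argsort(difficulty, kind="stable")
    return np.array_split(order, n_stages)

def curriculum(difficulty, n_stages, t_start=1.0, t_end=2.0):
    """Yield (active_indices, temperature) per stage.

    Each stage adds the next-hardest subset to the training pool and
    raises the distillation temperature (assumed linear schedule).
    """
    parts = partition_by_difficulty(difficulty, n_stages)
    temps = np.linspace(t_start, t_end, n_stages)
    active = np.array([], dtype=int)
    for part, temp in zip(parts, temps):
        active = np.concatenate([active, part])
        yield active.copy(), float(temp)
```

A distillation loop would iterate over `curriculum(...)`, train for a fixed interval on the samples in `active_indices`, and apply the stage temperature inside its existing KD loss.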



Article 1752

Title@2025-06-06 (5): Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs

Title: Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs Direct Behavior Optimization: Entsperren des Potenzials leichter LLMs 直接行为优化:释放轻度LMT的潜力 2506.06401v1

Authors (7): Hongming Yang, Shi Lin, Jun Shao, Changting Lin, Donghai Zhu, Meng Han, Qinglei Kong

Lightweight Large Language Models (LwLLMs) are reduced-parameter, optimized models designed to run efficiently on consumer-grade hardware, offering significant advantages in resource efficiency, cost-effectiveness, and data privacy. However, these models often struggle with limited inference and reasoning capabilities, which restrict their performance on complex tasks and limit their practical applicability. Moreover, existing prompt optimization methods typically rely on extensive manual effort or the meta-cognitive abilities of state-of-the-art LLMs, making them less effective for LwLLMs. To address these challenges, we introduce DeBoP, a new Direct Behavior Optimization Paradigm that originates from the Chain-of-Thought (CoT) prompting technique. Unlike CoT prompting, DeBoP is an automatic optimization method that directly optimizes the behavior of LwLLMs. In particular, DeBoP transforms the optimization of complex prompts into the optimization of discrete, quantifiable execution sequences using a gradient-free Monte Carlo Tree Search. We evaluate DeBoP on seven challenging tasks where state-of-the-art LLMs excel but LwLLMs generally underperform. Experimental results demonstrate that DeBoP significantly outperforms recent prompt optimization methods on most tasks. In particular, DeBoP-optimized LwLLMs surpass GPT-3.5 on most tasks while reducing computational time by approximately 60% compared to other automatic prompt optimization methods.



Article 1753

Title@2025-06-06 (5): Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning

Title: Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning Gerade genug denken: Effiziente Vernunft mit adaptiven Längenstrafen Stärkungslernen ” 思考够了 “ : “ 适应性延长刑罚强化学习的高效理性 “ 2506.05256v2

Authors (7): Violet Xiang, Chase Blagden, Rafael Rafailov, Nathan Lile, Sang Truong, Chelsea Finn, Nick Haber

Large reasoning models (LRMs) achieve higher performance on challenging reasoning tasks by generating more tokens at inference time, but this verbosity often wastes computation on easy problems. Existing solutions, including supervised finetuning on shorter traces, user-controlled budgets, or RL with uniform penalties, require data curation or manual configuration, or treat all problems alike regardless of difficulty. We introduce Adaptive Length Penalty (ALP), a reinforcement learning objective that tailors generation length to each prompt’s solve rate. During training, ALP monitors each prompt’s online solve rate through multiple rollouts and adds a differentiable penalty whose magnitude grows with that rate (that is, inversely with difficulty), so confident (easy) prompts incur a high cost for extra tokens while hard prompts remain unhindered. Posttraining DeepScaleR-1.5B with ALP cuts average token usage by 50% without significantly dropping performance. Relative to fixed-budget and uniform penalty baselines, ALP redistributes its reduced budget more intelligently by cutting compute on easy prompts and reallocating saved tokens to difficult ones, delivering higher accuracy on the hardest problems with higher cost.
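To make the idea of a solve-rate-dependent length penalty concrete, here is a small reward-shaping sketch. It is not the paper's objective: the linear form of the penalty, the `beta` coefficient, and the normalization by `max_len` are assumptions chosen so that easy prompts (high solve rate) pay more per extra token and unsolved prompts pay nothing.

```python
def solve_rate(correct_flags):
    """Fraction of rollouts for one prompt that reached a correct answer."""
    return sum(correct_flags) / len(correct_flags)

def alp_rewards(correct_flags, lengths, max_len, beta=0.5):
    """Per-rollout rewards for a single prompt.

    Reward = correctness - beta * solve_rate * (length / max_len).
    The penalty weight is proportional to the prompt's solve rate
    (an assumed functional form), so a prompt the model never solves
    is not penalized for long generations.
    """
    rate = solve_rate(correct_flags)
    return [float(c) - beta * rate * (n / max_len)
            for c, n in zip(correct_flags, lengths)]

# Easy prompt: every rollout is correct, so longer rollouts earn less.
print(alp_rewards([1, 1, 1, 1], [100, 200, 300, 400], max_len=400))
# Hard prompt: no rollout is correct, so length carries no penalty.
print(alp_rewards([0, 0, 0, 0], [100, 200, 300, 400], max_len=400))
```

These shaped rewards would then feed into whatever policy-gradient update the training setup already uses.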



Article 1754

Title@2025-06-06 (5): Simmering: Sufficient is better than optimal for training neural networks

Title: Simmering: Sufficient is better than optimal for training neural networks Simmering: Ausreichend ist besser als optimal für die Ausbildung neuronaler Netzwerke 浸泡:对培训神经网络而言,充足比最佳的更好 2410.19912v2

Authors (3): Irina Babayan, Hazhir Aliahmadi, Greg van Anders

The broad range of neural network training techniques that invoke optimization but rely on ad hoc modification for validity suggests that optimization-based training is misguided. Shortcomings of optimization-based training are brought to particularly strong relief by the problem of overfitting, where naive optimization produces spurious outcomes. The broad success of neural networks for modelling physical processes has prompted advances that are based on inverting the direction of investigation and treating neural networks as if they were physical systems in their own right. These successes raise the question of whether broader, physical perspectives could motivate the construction of improved training algorithms. Here, we introduce simmering, a physics-based method that trains neural networks to generate weights and biases that are merely “good enough”, but which, paradoxically, outperforms leading optimization-based approaches. Using classification and regression examples we show that simmering corrects neural networks that are overfit by Adam, and show that simmering avoids overfitting if deployed from the outset. Our results question optimization as a paradigm for neural network training, and leverage information-geometric arguments to point to the existence of classes of sufficient training algorithms that do not take optimization as their starting point.



Article 1755

Title@2025-06-06 (5): TracLLM: A Generic Framework for Attributing Long Context LLMs

Title: TracLLM: A Generic Framework for Attributing Long Context LLMs TracLLM: Ein generisches Rahmenwerk für die Zuweisung von LLMs mit langem Kontext TracLLM: 长期LMLM授标通用框架 2506.04202v2

Authors (4): Yanting Wang, Wei Zou, Runpeng Geng, Jinyuan Jia

Long context large language models (LLMs) are deployed in many real-world applications such as RAG, agents, and broad LLM-integrated applications. Given an instruction and a long context (e.g., documents, PDF files, webpages), a long context LLM can generate an output grounded in the provided context, aiming to provide more accurate, up-to-date, and verifiable outputs while reducing hallucinations and unsupported claims. This raises a research question: how to pinpoint the texts (e.g., sentences, passages, or paragraphs) in the context that contribute most to or are responsible for the output generated by an LLM? This process, which we call context traceback, has various real-world applications, such as 1) debugging LLM-based systems, 2) conducting post-attack forensic analysis of attacks on an LLM (e.g., prompt injection attacks, knowledge corruption attacks), and 3) highlighting knowledge sources to enhance users’ trust in outputs generated by LLMs. When applied to context traceback for long context LLMs, existing feature attribution methods such as Shapley have sub-optimal performance and/or incur a large computational cost. In this work, we develop TracLLM, the first generic context traceback framework tailored to long context LLMs. Our framework can improve the effectiveness and efficiency of existing feature attribution methods. To improve the efficiency, we develop an informed-search-based algorithm in TracLLM. We also develop contribution score ensemble/denoising techniques to improve the accuracy of TracLLM. Our evaluation results show TracLLM can effectively identify texts in a long context that lead to the output of an LLM. Our code and data are at: https://github.com/Wang-Yanting/TracLLM.



Article 1756

Title@2025-06-06 (5): RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models

Title: RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models RoSTE: Ein effizienter Quantization-Aware-überwachter Feintuning-Ansatz für große Sprachmodelle RoSTE:对大语言模式采用高效率的量化软件监督微调方法 2502.09003v3

Authors (7): Quan Wei, Chung-Yiu Yau, Hoi-To Wai, Yang Katie Zhao, Dongyeop Kang, Youngsuk Park, Mingyi Hong

Supervised fine-tuning is a standard method for adapting pre-trained large language models (LLMs) to downstream tasks. Quantization has been recently studied as a post-training technique for efficient LLM deployment. To obtain quantized fine-tuned LLMs, conventional pipelines would first fine-tune the pre-trained models, followed by post-training quantization. This often yields suboptimal performance as it fails to leverage the synergy between fine-tuning and quantization. To effectively realize low-bit quantization of weights, activations and KV caches in LLMs, we propose an algorithm named Rotated Straight-Through-Estimator (RoSTE), which combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy that identifies an effective rotation configuration to reduce activation outliers. We provide theoretical insights on RoSTE by analyzing its prediction error when applied to an overparameterized least-squares quantized training problem. Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration. Experiments on Pythia, Qwen and Llama models of different sizes demonstrate the effectiveness of RoSTE. Compared to existing post-SFT quantization baselines, our method consistently achieves superior performance across various tasks and different LLM architectures. Our code is available at https://github.com/OptimAI-Lab/RoSTE.



Article 1757

Title@2025-06-06 (5): Sparse Autoencoders, Again?

Title: Sparse Autoencoders, Again? Sparse Autoencoder, schon wieder? 又是粗糙的自动编码器吗? 2506.04859v2

Authors (4): Yin Lu, Xuening Zhu, Tong He, David Wipf

Is there really much more to say about sparse autoencoders (SAEs)? Autoencoders in general, and SAEs in particular, represent deep architectures that are capable of modeling low-dimensional latent structure in data. Such structure could reflect, among other things, correlation patterns in large language model activations, or complex natural image manifolds. And yet despite the wide-ranging applicability, there have been relatively few changes to SAEs beyond the original recipe from decades ago, namely, standard deep encoder/decoder layers trained with a classical/deterministic sparse regularizer applied within the latent space. One possible exception is the variational autoencoder (VAE), which adopts a stochastic encoder module capable of producing sparse representations when applied to manifold data. In this work we formalize underappreciated weaknesses of both canonical SAEs and analogous VAEs applied to similar tasks, and propose a hybrid alternative model that circumvents these prior limitations. In terms of theoretical support, we prove that global minima of our proposed model recover certain forms of structured data spread across a union of manifolds. Meanwhile, empirical evaluations on synthetic and real-world datasets substantiate the efficacy of our approach in accurately estimating underlying manifold dimensions and producing sparser latent representations without compromising reconstruction error. In general, we are able to exceed the performance of equivalent-capacity SAEs and VAEs, as well as recent diffusion models where applicable, within domains such as images and language model activation patterns.



Article 1758

Title@2025-06-06 (5): Learning Design-Score Manifold to Guide Diffusion Models for Offline Optimization

Title: Learning Design-Score Manifold to Guide Diffusion Models for Offline Optimization Learning Design-Score Manifold zur Anleitung von Diffusionsmodellen für Offline-Optimierung 指导离线优化推广模式 2506.05680v1

Authors (6): Tailin Zhou, Zhilin Chen, Wenlong Lyu, Zhitang Chen, Danny H. K. Tsang, Jun Zhang

Optimizing complex systems, from discovering therapeutic drugs to designing high-performance materials, remains a fundamental challenge across science and engineering, as the underlying rules are often unknown and costly to evaluate. Offline optimization aims to optimize designs for target scores using pre-collected datasets without system interaction. However, conventional approaches may fail beyond training data, predicting inaccurate scores and generating inferior designs. This paper introduces ManGO, a diffusion-based framework that learns the design-score manifold, capturing the design-score interdependencies holistically. Unlike existing methods that treat design and score spaces in isolation, ManGO unifies forward prediction and backward generation, attaining generalization beyond training data. Key to this is its derivative-free guidance for conditional generation, coupled with adaptive inference-time scaling that dynamically optimizes denoising paths. Extensive evaluations demonstrate that ManGO outperforms 24 single- and 10 multi-objective optimization methods across diverse domains, including synthetic tasks, robot control, material design, DNA sequence, and real-world engineering optimization.



Article 1759

Title@2025-06-06 (5): Topology-aware Neural Flux Prediction Guided by Physics

Title: Topology-aware Neural Flux Prediction Guided by Physics Topologie-bewusst Neurale Flux-Vorhersage geleitet von der Physik 物理引导 2506.05676v1

Authors (4): Haoyang Jiang, Jindong Wang, Xingquan Zhu, Yi He

Graph Neural Networks (GNNs) often struggle to preserve high-frequency components of nodal signals when dealing with directed graphs. Such components are crucial for modeling flow dynamics; without them, a traditional GNN tends to treat the forward and reverse topologies of a graph as equal. To make GNNs sensitive to those high-frequency components, and thus capable of capturing detailed topological differences, this paper proposes a novel framework that combines 1) explicit difference matrices that model directional gradients and 2) implicit physical constraints that enforce that message passing within GNNs is consistent with natural laws. Evaluations on two real-world directed graph datasets, namely a water flux network and an urban traffic flow network, demonstrate the effectiveness of our proposal.



Article 1760

Title@2025-06-06 (5): LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models

Title: LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models LauraTSE: Target Speaker Extraction mit Auto-Regressive Decoder-Only Language Models LARTSE: 使用自动递退解码器单一语言模式将议长抽取作为目标 2504.07402v2

Authors (3): Beilong Tang, Bang Zeng, Ming Li

We propose LauraTSE, an Auto-Regressive Decoder-Only Language Model for Target Speaker Extraction built upon the LauraGPT backbone. LauraTSE employs a small-scale auto-regressive decoder-only language model that generates the initial layers of the target speech’s discrete codec representations from the continuous embeddings of both the mixture and reference speech. These outputs serve as coarse-grained predictions. To refine them, a one-step encoder-only language model reconstructs the full codec representation by integrating information from both the mixture and the reference speech, adding fine-grained details. Our approach achieves superior or comparable performance to existing TSE models. Additionally, we conduct ablation studies to investigate the data scalability and the contribution of the encoder-only model.



Article 1761

Title@2025-06-06 (5): Kernel Density Machines

Title: Kernel Density Machines Kerndichtemaschinen 内核密度机 2504.21419v2

Authors (2): Damir Filipovic, Paul Schneider

We introduce kernel density machines (KDM), a nonparametric estimator of a Radon–Nikodym derivative, based on reproducing kernel Hilbert spaces. KDM applies to general probability measures on countably generated measurable spaces under minimal assumptions. For computational efficiency, we incorporate a low-rank approximation with precisely controlled error that grants scalability to large-sample settings. We provide rigorous theoretical guarantees, including asymptotic consistency, a functional central limit theorem, and finite-sample error bounds, establishing a strong foundation for practical use. Empirical results based on simulated and real data demonstrate the efficacy and precision of KDM.



Article 1762

Title@2025-06-06 (5): Contextually Guided Transformers via Low-Rank Adaptation

Title: Contextually Guided Transformers via Low-Rank Adaptation Kontextuell geführte Transformer über Low-Rank-Anpassung 通过低辐射适应实现上下文向导变形器 2506.05672v1

Authors (4): Andrey Zhmoginov, Jihwan Lee, Max Vladymyrov, Mark Sandler

Large Language Models (LLMs) based on Transformers excel at text processing, but their reliance on prompts for specialized behavior introduces computational overhead. We propose a modification to a Transformer architecture that eliminates the need for explicit prompts by learning to encode context into the model’s weights. Our Contextually Guided Transformer (CGT) model maintains a contextual summary at each sequence position, allowing it to update the weights on the fly based on the preceding context. This approach enables the model to self-specialize, effectively creating a tailored model for processing information following a given prefix. We demonstrate the effectiveness of our method on synthetic in-context learning tasks and language modeling benchmarks. Furthermore, we introduce techniques for enhancing the interpretability of the learned contextual representations, drawing connections to Variational Autoencoders and promoting smoother, more consistent context encoding. This work offers a novel direction for efficient and adaptable language modeling by integrating context directly into the model’s architecture.



Article 1763

Title@2025-06-06 (5): BAQ: Efficient Bit Allocation Quantization for Large Language Models

Title: BAQ: Efficient Bit Allocation Quantization for Large Language Models BAQ: Effiziente Bitallokation Quantisierung für große Sprachmodelle BAQ: 大语言模式的高效比分配比量量化 2506.05664v1

Authors (4): Chao Zhang, Li Wang, Samson Lasaulce, Merouane Debbah

Post-training model quantization is a widely adopted technique for reducing the memory and computational costs of large language models (LLMs). However, most existing methods rely on uniform or heuristic bitwidth assignments, failing to account for the nonuniform sensitivity of weights to quantization noise. In this paper, we propose a novel framework for allocating quantization bitwidths based on sensitivity metrics derived from a Hessian proxy. We make key assumptions, which allow the layer/component-wise loss function to be expressed as an explicit function of the bitwidths. This enables a neat formulation of the bit allocation problem as a convex optimization task, whose closed-form solution adapts precision across weights to minimize the layer-wise quantization loss. Inspecting the solution provides several insights (such as the equal-loss structure), which are then exploited to design the proposed BAQ (Bit Allocation Quantization) algorithm. The proposed algorithm achieves a good trade-off between loss minimization and complexity and allows BAQ to be integrated into standard quantization pipelines with minimal overhead. Experimental results show that BAQ consistently outperforms GPTQ, achieving up to 56× lower perplexity at the same bitwidth on large language models ranging from 125M to 30B parameters. Leveraging our analytical results derived from solving the optimal bit allocation problem, we also provide a theoretical explanation for the observed gains. All code for this paper is available at https://github.com/CSU-ModelCompression/BAQ.
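As an illustration of how a convex bit allocation problem can admit a closed-form solution with an equal-loss structure, the sketch below uses the textbook high-resolution model in which a component with sensitivity s_i and bitwidth b_i contributes loss s_i * 2^(-2 b_i). This loss model is an assumption made for the example and is not taken from the paper; the sketch also allows fractional bitwidths and ignores the non-negativity constraint.

```python
import math

def allocate_bits(sensitivities, avg_bits):
    """Closed-form minimizer of sum_i s_i * 2**(-2 * b_i) under a fixed average bitwidth.

    Solution: b_i = avg_bits + 0.5 * (log2(s_i) - mean_j log2(s_j)),
    so more sensitive components receive more bits.
    """
    log_s = [math.log2(s) for s in sensitivities]
    mean_log = sum(log_s) / len(log_s)
    return [avg_bits + 0.5 * (ls - mean_log) for ls in log_s]

def per_component_loss(sensitivities, bits):
    """Loss contributed by each component under the assumed model."""
    return [s * 2.0 ** (-2.0 * b) for s, b in zip(sensitivities, bits)]

sens = [1.0, 4.0, 16.0]           # hypothetical sensitivity scores
bits = allocate_bits(sens, 4.0)   # average budget of 4 bits per component
print(bits)
print(per_component_loss(sens, bits))
```

Under this model every component ends up with the same loss at the optimum, which is one way to read the equal-loss structure mentioned in the abstract.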



Article 1764

Title@2025-06-06 (5): CoopetitiveV: Leveraging LLM-powered Coopetitive Multi-Agent Prompting for High-quality Verilog Generation

Title: CoopetitiveV: Leveraging LLM-powered Coopetitive Multi-Agent Prompting for High-quality Verilog Generation CoopetitiveV: LLM-powered Coopetitive Multi-Agent für hochwertige Verilog-Generation 协作V:利用LLM-动力协同协作的多方协作促进高品质活性一代 2412.11014v2

Authors (9): Zhendong Mi, Renming Zheng, Haowen Zhong, Yue Sun, Seth Kneeland, Sayan Moitra, Ken Kutzer, Zhaozhuo Xu, Shaoyi Huang

Recent advances in agentic LLMs have demonstrated great capabilities in Verilog code generation. However, existing approaches use either LLM-assisted single-agent prompting or cooperation-only multi-agent learning, which leads to (i) a degeneration issue in single-agent learning, characterized by diminished error detection and correction capabilities, and (ii) error propagation in cooperation-only multi-agent learning, where erroneous information from an earlier agent is propagated to later agents through prompts, which can make the later agents generate buggy code. In this paper, we propose an LLM-based coopetitive multi-agent prompting framework, in which the agents not only collaborate with each other to form the generation pipeline but also create a healthy competitive mechanism to improve generation quality. Our experimental results show that the coopetitive multi-agent framework can effectively mitigate the degeneration risk and reduce error propagation while improving code error correction capabilities, resulting in higher-quality Verilog code generation. The effectiveness of our approach is validated through extensive experiments. On the VerilogEval Machine and Human datasets, CoopetitiveV+GPT-4 achieves 99.2% and 99.1% pass@10 scores, respectively. On RTLLM, CoopetitiveV+GPT-4 obtains 100% syntax and 99.9% functionality pass@5 scores.



Article 1765

Title@2025-06-06 (5): Emulating compact binary population synthesis simulations with robust uncertainty quantification and model comparison: Bayesian normalizing flows

Title: Emulating compact binary population synthesis simulations with robust uncertainty quantification and model comparison: Bayesian normalizing flows Emulieren kompakter binärer Populationssynthesesimulationen mit robuster Unsicherheitsquantifizierung und Modellvergleich: Bayesische Normalisierungsströme 以稳健的不确定性量化和模型比较来模拟精密的二元人口合成模拟,并进行有力的不确定性量化和模型比较: 2506.05657v1

Authors (1): Anarya Ray

Population synthesis simulations of compact binary coalescences (CBCs) play a crucial role in extracting astrophysical insights from an ensemble of gravitational wave (GW) observations. However, realistic simulations are costly to implement for a dense grid of initial conditions. Normalizing flows can emulate the distribution functions of a simulated population of binary parameters and thereby enable empirical constraints on the astrophysical initial conditions and branching fractions of various formation channels given data from a catalog of GW observations. They can also be used for data amplification in sparse regions of the CBC parameter space to guide the development of phenomenological population models for rarely synthesizable systems with components in theorized mass gaps, without having to simulate a prohibitively large number of binaries. But flow predictions are fraught with uncertainties, especially for sparse training sets. In this work I develop a method for quantifying and marginalizing uncertainties in the emulators by introducing the Bayesian Normalizing flow, a conditional density estimator constructed from Bayesian neural networks. Using the exact likelihood function associated with density estimators I sample the posterior distribution of flow parameters with suitably chosen priors to quantify and marginalize over flow uncertainties. I demonstrate the accuracy, calibration, and data-amplification impacts of the estimated uncertainties for simulations of binary black hole populations formed through common envelope evolution. I outline applications of the methodology in simulation-based inference from growing GW catalogs and sketch other uses for general simulation-based approaches in GW astronomy.



Article 1766

Title@2025-06-06 (5): Federated Foundation Model for GI Endoscopy Images

Title: Federated Foundation Model for GI Endoscopy Images Federiertes Stiftungsmodell für GI-Endoskopie-Bilder GI 内窥镜图像联邦基金会模型 2505.24108v2

Authors (8): Alina Devkota, Annahita Amireskandari, Joel Palko, Shyam Thakkar, Donald Adjeroh, Xiajun Jiang, Binod Bhattarai, Prashnna K. Gyawali

Gastrointestinal (GI) endoscopy is essential in identifying GI tract abnormalities in order to detect diseases in their early stages and improve patient outcomes. Although deep learning has shown success in supporting GI diagnostics and decision-making, these models require curated datasets with labels that are expensive to acquire. Foundation models offer a promising solution by learning general-purpose representations, which can be finetuned for specific tasks, overcoming data scarcity. Developing foundation models for medical imaging holds significant potential, but the sensitive and protected nature of medical data presents unique challenges. Foundation model training typically requires extensive datasets, and while hospitals generate large volumes of data, privacy restrictions prevent direct data sharing, making foundation model training infeasible in most scenarios. In this work, we propose a federated learning (FL) framework for training foundation models for gastroendoscopy imaging, enabling data to remain within local hospital environments while contributing to a shared model. We explore several established FL algorithms, assessing their suitability for training foundation models without relying on task-specific labels, conducting experiments in both homogeneous and heterogeneous settings. We evaluate the trained foundation model on three critical downstream tasks (classification, detection, and segmentation) and demonstrate that it achieves improved performance across all tasks, highlighting the effectiveness of our approach in a federated, privacy-preserving setting.
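The abstract does not name the FL algorithms it evaluates, so the sketch below shows only the widely used FedAvg aggregation step as one example of how a shared model can be built without moving data out of each hospital. The parameter layout (a dict mapping hypothetical parameter names to flat lists of floats) is an assumption made to keep the example dependency-free.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameters, weighting each client by its local dataset size.

    client_weights: list of dicts, each mapping a parameter name to a list of floats.
    client_sizes:   number of local training samples held by each client.
    Only parameters leave the clients; the raw data stays local.
    """
    total = sum(client_sizes)
    averaged = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][j] * n for w, n in zip(client_weights, client_sizes)) / total
            for j in range(length)
        ]
    return averaged

# Two hypothetical hospitals with 1 and 3 local samples.
hospital_a = {"encoder.w": [0.0, 4.0]}
hospital_b = {"encoder.w": [4.0, 0.0]}
print(fedavg([hospital_a, hospital_b], [1, 3]))
```

In each communication round the server would send the averaged parameters back to the clients, which then continue local training from that shared starting point.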



Article 1767

Title@2025-06-06 (5): Learning to Weight Parameters for Data Attribution

Title: Learning to Weight Parameters for Data Attribution Gewichtungsparameter für die Zuordnung von Daten lernen 数据归属的重度参数学习 2506.05647v1

Authors (4): Shuangqi Li, Hieu Le, Jingyi Xu, Mathieu Salzmann

We study data attribution in generative models, aiming to identify which training examples most influence a given output. Existing methods achieve this by tracing gradients back to training data. However, they typically treat all network parameters uniformly, ignoring the fact that different layers encode different types of information and may thus draw information differently from the training set. We propose a method that models this by learning parameter importance weights tailored for attribution, without requiring labeled data. This allows the attribution process to adapt to the structure of the model, capturing which training examples contribute to specific semantic aspects of an output, such as subject, style, or background. Our method improves attribution accuracy across diffusion models and enables fine-grained insights into how outputs borrow from training data.



Article 1768

Title@2025-06-06 (5): Computational bottlenecks for denoising diffusions

Title: Computational bottlenecks for denoising diffusions Computationale Engpässe für die Disseminierung von Diffusionen 消除传播传播的计算瓶颈 2503.08028v2

Authors (2): Andrea Montanari, Viet Vu

Denoising diffusions sample from a probability distribution $\mu$ in $\mathbb{R}^d$ by constructing a stochastic process $({\hat{\boldsymbol x}}_t:t\ge 0)$ in $\mathbb{R}^d$ such that ${\hat{\boldsymbol x}}_0$ is easy to sample, but the distribution of $\hat{\boldsymbol x}_T$ at large $T$ approximates $\mu$. The drift ${\boldsymbol m}:\mathbb{R}^d\times\mathbb{R}\to\mathbb{R}^d$ of this diffusion process is learned by minimizing a score-matching objective. Is every probability distribution $\mu$, for which sampling is tractable, also amenable to sampling via diffusions? We provide evidence to the contrary by studying a probability distribution $\mu$ for which sampling is easy, but the drift of the diffusion process is intractable – under a popular conjecture on information-computation gaps in statistical estimation. We show that there exist drifts that are superpolynomially close to the optimum value (among polynomial time drifts) and yet yield samples with distribution that is very far from the target one.



Article 1769

Title@2025-06-06 (5): Pseudo Relevance Feedback is Enough to Close the Gap Between Small and Large Dense Retrieval Models

Title: Pseudo Relevance Feedback is Enough to Close the Gap Between Small and Large Dense Retrieval Models Pseudo Relevanz Feedback reicht aus, um die Lücke zwischen kleinen und großen Dense-Retrieval-Modellen zu schließen Peedo相关性反馈足以弥合小型和大型常量检索模型之间的差距 2503.14887v2

Authors (4): Hang Li, Xiao Wang, Bevan Koopman, Guido Zuccon

Scaling dense retrievers to larger large language model (LLM) backbones has been a dominant strategy for improving their retrieval effectiveness. However, this has substantial cost implications: larger backbones require more expensive hardware (e.g. GPUs with more memory) and lead to higher indexing and querying costs (latency, energy consumption). In this paper, we challenge this paradigm by introducing PromptPRF, a feature-based pseudo-relevance feedback (PRF) framework that enables small LLM-based dense retrievers to achieve effectiveness comparable to much larger models. PromptPRF uses LLMs to extract query-independent, structured and unstructured features (e.g., entities, summaries, chain-of-thought keywords, essay) from top-ranked documents. These features are generated offline and integrated into dense query representations via prompting, enabling efficient retrieval without additional training. Unlike prior methods such as GRF, which rely on online, query-specific generation and sparse retrieval, PromptPRF decouples feedback generation from query processing and supports dense retrievers in a fully zero-shot setting. Experiments on TREC DL and BEIR benchmarks demonstrate that PromptPRF consistently improves retrieval effectiveness and offers favourable cost-effectiveness trade-offs. We further present ablation studies to understand the role of positional feedback and analyse the interplay between feature extractor size, PRF depth, and model performance. Our findings demonstrate that with effective PRF design, scaling the retriever is not always necessary, narrowing the gap between small and large models while reducing inference cost.



Article 1770

Title@2025-06-06 (5): Interpretable Enzyme Function Prediction via Residue-Level Detection

Title: Interpretable Enzyme Function Prediction via Residue-Level Detection Interpretierbare Enzymfunktionsvorhersage über Rückstandserkennung 通过残留水平探测的可解释酶功能预测 2501.05644v2

Authors (4): Zhao Yang, Bing Su, Jiahao Chen, Ji-Rong Wen

Predicting multiple functions labeled with Enzyme Commission (EC) numbers from the enzyme sequence is of great significance but remains a challenge due to its sparse multi-label classification nature, i.e., each enzyme is typically associated with only a few labels out of more than 6000 possible EC numbers. However, existing machine learning algorithms generally learn a fixed global representation for each enzyme to classify all functions, so they lack interpretability, and the fine-grained information of some function-specific local residue fragments may be overwhelmed. Here we present an attention-based framework, namely ProtDETR (Protein Detection Transformer), by casting enzyme function prediction as a detection problem. It uses a set of learnable functional queries to adaptively extract different local representations from the sequence of residue-level features for predicting different EC numbers. ProtDETR not only significantly outperforms existing deep learning-based enzyme function prediction methods, but also provides a new interpretable perspective on automatically detecting different local regions for identifying different functions through cross-attentions between queries and residue-level features. Code is available at https://github.com/yangzhao1230/ProtDETR.


Article 1771

Title@2025-06-06 (5): Where is the signal in tokenization space?

Title: Where is the signal in tokenization space? Wo ist das Signal im Tokenisationsraum? 象征空间的信号在哪里? 2408.08541v2

Authors (5): Renato Lui Geh, Honghua Zhang, Kareem Ahmed, Benjie Wang, Guy Van den Broeck

Large Language Models (LLMs) are typically shipped with tokenizers that deterministically encode text into so-called canonical token sequences, to which the LLMs assign probability values. One common assumption is that the probability of a piece of text is the probability of its canonical token sequence. However, the tokenization of a string is not unique: e.g., the Llama2 tokenizer encodes Tokens as [Tok,ens], but [Tok,en,s] also represents the same text. In this paper, we study non-canonical tokenizations. We prove that, given a string, it is computationally hard to find the most likely tokenization for an autoregressive LLM, as well as to compute the marginal probability over all possible tokenizations. We then show how the marginal is, in most cases, indistinguishable from the canonical probability. Surprisingly, we then empirically demonstrate the existence of a significant amount of signal hidden within tokenization space. Notably, by simply aggregating the probabilities of non-canonical tokenizations, we achieve improvements across a range of LLM evaluation benchmarks for a variety of architectures, including transformers and state space models.
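
To make the notion of non-canonical tokenizations concrete, here is a minimal sketch that enumerates every segmentation of a string over a small vocabulary and sums a sequence probability over them. The vocabulary `TOY_VOCAB` and the scoring function passed as `seq_prob` are hypothetical stand-ins, not the Llama2 tokenizer or a real LLM. The enumeration is exponential in general, which is consistent with the hardness results stated in the abstract.

```python
from functools import lru_cache


def all_tokenizations(text, vocab):
    """Return every way to split `text` into tokens that all belong to `vocab`."""
    vocab = frozenset(vocab)

    @lru_cache(maxsize=None)
    def segment(start):
        # Base case: one way to tokenize the empty suffix (the empty sequence).
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in vocab:
                for rest in segment(end):
                    results.append([piece] + rest)
        return results

    return segment(0)


def marginal_probability(text, vocab, seq_prob):
    """Sum `seq_prob` over all tokenizations (brute force, exponential in general)."""
    return sum(seq_prob(tuple(tokens)) for tokens in all_tokenizations(text, vocab))


# Hypothetical toy vocabulary, chosen only so that "Tokens" has several tokenizations.
TOY_VOCAB = {"Tok", "ens", "en", "s", "T", "ok"}

if __name__ == "__main__":
    for tokens in all_tokenizations("Tokens", TOY_VOCAB):
        print(tokens)
```

In a real setting `seq_prob` would be the autoregressive probability the model assigns to a token sequence; the brute-force sum is only feasible for very short strings.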


Article 1772

Title@2025-06-06 (5): Hierarchical Refinement: Optimal Transport to Infinity and Beyond

Title: Hierarchical Refinement: Optimal Transport to Infinity and Beyond Hierarchische Verfeinerung: Optimaler Transport zur Unendlichkeit und darüber hinaus 梯级精炼:向无限及以后的最佳运输 2503.03025v2

Authors (4): Peter Halmos, Julian Gold, Xinhao Liu, Benjamin J. Raphael

Optimal transport (OT) has enjoyed great success in machine learning as a principled way to align datasets via a least-cost correspondence, driven in large part by the runtime efficiency of the Sinkhorn algorithm (Cuturi, 2013). However, Sinkhorn has quadratic space complexity in the number of points, limiting scalability to larger datasets. Low-rank OT achieves linear-space complexity, but by definition, cannot compute a one-to-one correspondence between points. When the optimal transport problem is an assignment problem between datasets then an optimal mapping, known as the Monge map, is guaranteed to be a bijection. In this setting, we show that the factors of an optimal low-rank coupling co-cluster each point with its image under the Monge map. We leverage this invariant to derive an algorithm, Hierarchical Refinement (HiRef), that dynamically constructs a multiscale partition of each dataset using low-rank OT subproblems, culminating in a bijective coupling. Hierarchical Refinement uses linear space and has log-linear runtime, retaining the space advantage of low-rank OT while overcoming its limited resolution. We demonstrate the advantages of Hierarchical Refinement on several datasets, including ones containing over a million points, scaling full-rank OT to problems previously beyond Sinkhorn’s reach.
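
For context on the space bottleneck mentioned above, the following is a minimal sketch of the standard Sinkhorn iteration with uniform marginals. It is the baseline the abstract refers to, not Hierarchical Refinement itself; the function name and default parameters are illustrative assumptions. Note that the kernel matrix `K` has one entry per pair of points, which is where the quadratic space cost comes from.

```python
import math


def sinkhorn(cost, epsilon=0.1, n_iters=200):
    """Entropic OT between two uniform distributions, given a dense cost matrix."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform source marginal
    b = [1.0 / m] * m  # uniform target marginal
    # Dense Gibbs kernel: n * m entries, i.e. quadratic space in the number of points.
    K = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Coupling P = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0]
    ys = [2.1, 0.1, 1.1]
    cost = [[(x - y) ** 2 for y in ys] for x in xs]
    coupling = sinkhorn(cost)
    # For each source point, report the target index carrying the most mass.
    print([max(range(len(ys)), key=lambda j: row[j]) for row in coupling])
```

With a small `epsilon` the coupling concentrates near a one-to-one matching, but it remains a dense matrix, which is the limitation the abstract describes.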


Article 1773

Title@2025-06-06 (5): Autocomp: LLM-Driven Code Optimization for Tensor Accelerators

Title: Autocomp: LLM-Driven Code Optimization for Tensor Accelerators Autocomp: LLM-gesteuerte Code-Optimierung für Tensor-Beschleuniger 自动comp: LLM- Driven 代码对 Tensor 加速器的优化 2505.18574v2

Authors (4): Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao

Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today’s computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains challenging, leaving much of their potential underutilized. Recently, large language models (LLMs), trained on large amounts of code, have shown significant promise in code generation and optimization tasks, but generating low-resource languages like specialized tensor accelerator code still poses a significant challenge. We tackle this challenge with Autocomp, an approach that empowers accelerator programmers to leverage domain knowledge and hardware feedback to optimize code via an automated LLM-driven search. We accomplish this by: 1) formulating each optimization pass as a structured two-phase prompt, divided into planning and code generation phases, 2) inserting domain knowledge during planning via a concise and adaptable optimization menu, and 3) integrating correctness and performance metrics from hardware as feedback at each search iteration. Across three categories of representative workloads and two different accelerators, we demonstrate that Autocomp-optimized code runs 5.6x (GEMM) and 2.7x (convolution) faster than the vendor-provided library, and outperforms expert-level hand-tuned code by 1.4x (GEMM), 1.1x (convolution), and 1.3x (fine-grained linear algebra). Additionally, we demonstrate that optimization schedules generated from Autocomp can be reused across similar tensor operations, improving speedups by up to 24% under a fixed sample budget.


Article 1774

Title@2025-06-06 (5): Estimating stationary mass, frequency by frequency

Title: Estimating stationary mass, frequency by frequency Schätzung der stationären Masse, Häufigkeit nach Häufigkeit 估计固定质量,按频率分列的频率 2503.12808v3

Authors (3): Milind Nakul, Vidya Muthukumar, Ashwin Pananjady

Suppose we observe a trajectory of length $n$ from an exponentially $\alpha$-mixing stochastic process over a finite but potentially large state space. We consider the problem of estimating the probability mass placed by the stationary distribution of any such process on elements that occur with a certain frequency in the observed sequence. We estimate this vector of probabilities in total variation distance, showing universal consistency in $n$ and recovering known results for i.i.d. sequences as special cases. Our proposed methodology – implementable in linear time – carefully combines the plug-in (or empirical) estimator with a recently-proposed modification of the Good–Turing estimator called WingIt, which was originally developed for Markovian sequences. En route to controlling the error of our estimator, we develop new performance bounds on WingIt and the plug-in estimator for exponentially $\alpha$-mixing stochastic processes. Importantly, the extensively used method of Poissonization can no longer be applied in our non i.i.d. setting, and so we develop complementary tools – including concentration inequalities for a natural self-normalized statistic of mixing sequences – that may prove independently useful in the design and analysis of estimators for related problems. Simulation studies corroborate our theoretical findings.
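
As a point of reference for the two estimators named above, here is a small sketch of the plug-in estimate and the classical Good–Turing estimate of the mass on symbols that occur exactly k times. This is the textbook i.i.d. form, assumed here for illustration; it is not the WingIt modification, and the way the paper combines the two estimators is not reproduced.

```python
from collections import Counter


def frequency_masses(sequence):
    """Estimate the probability mass of symbols seen exactly k times.

    Returns (plug_in, good_turing), two dicts indexed by k, where
      plug_in[k]     = k * N_k / n
      good_turing[k] = (k + 1) * N_{k+1} / n
    and N_k is the number of distinct symbols observed exactly k times.
    """
    n = len(sequence)
    counts = Counter(sequence)
    freq_of_freq = Counter(counts.values())  # N_k
    max_k = max(counts.values())
    plug_in = {k: k * freq_of_freq.get(k, 0) / n for k in range(max_k + 1)}
    good_turing = {
        k: (k + 1) * freq_of_freq.get(k + 1, 0) / n for k in range(max_k + 1)
    }
    return plug_in, good_turing


if __name__ == "__main__":
    plug_in, good_turing = frequency_masses("abracadabra")
    print("plug-in:", plug_in)
    print("Good-Turing:", good_turing)
```

For k = 0 the plug-in estimate is always zero, whereas Good–Turing assigns the unseen symbols the fraction of observations that are singletons, which is why the two are typically used for different frequency ranges.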


Article 1775

Title@2025-06-06 (5): Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer

Title: Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer Longitudinal Targeted Minimal Loss-basierte Abschätzung mit zeitlich-differenzierter Heterogener Transformer 与时间差异变化异质变异器的纵向定值最低损失估计值 2404.04399v2

Authors (8): Toru Shirakawa, Yi Li, Yulun Wu, Sky Qiu, Yuxuan Li, Mingduo Zhao, Hiroyasu Iso, Mark van der Laan

We propose Deep Longitudinal Targeted Minimum Loss-based Estimation (Deep LTMLE), a novel approach to estimate the counterfactual mean of outcome under dynamic treatment policies in longitudinal problem settings. Our approach utilizes a transformer architecture with heterogeneous type embedding trained using temporal-difference learning. After obtaining an initial estimate using the transformer, following the targeted minimum loss-based likelihood estimation (TMLE) framework, we statistically corrected for the bias commonly associated with machine learning algorithms. Furthermore, our method also facilitates statistical inference by enabling the provision of 95% confidence intervals grounded in asymptotic statistical theory. Simulation results demonstrate our method’s superior performance over existing approaches, particularly in complex, long time-horizon scenarios. It remains effective in small-sample, short-duration contexts, matching the performance of asymptotically efficient estimators. To demonstrate our method in practice, we applied our method to estimate counterfactual mean outcomes for standard versus intensive blood pressure management strategies in a real-world cardiovascular epidemiology cohort study.


Article 1776

Title@2025-06-06 (5): Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones

Title: Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones Projektierbare Modelle: One-Shot-Generation von kleinen Spezialtransformatoren aus großen Modellen 集成项目模型:大型一型小型专门化变形器的单制制式 2506.05641v1

Authors (3): Andrey Zhmoginov, Jihwan Lee, Mark Sandler

Modern Foundation Models (FMs) are typically trained on corpora spanning a wide range of different data modalities, topics and downstream tasks. Utilizing these models can be very computationally expensive and is out of reach for most consumer devices. Furthermore, most of the broad FM knowledge may actually be irrelevant for a specific task at hand. Here we explore a technique for mapping parameters of a large Transformer to parameters of a smaller specialized model. By making this transformation task-specific, we aim to capture a narrower scope of the knowledge needed for performing a specific task by a smaller model. We study our method on image modeling tasks, showing that performance of generated models exceeds that of universal conditional models.


Article 1777

Title@2025-06-05 (4): A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition

Title: A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition Ein fiktiver Q&A-Datensatz für das Studieren von Memorization und Knowledge Acquisition 用于研究记忆化和知识获取的字形 A 数据集 2506.05639v1

Authors (5): John Kirchenbauer, Janny Mongkolsupawan, Yuxin Wen, Tom Goldstein, Daphne Ippolito

When language models are trained on textual data, they acquire both knowledge about the structure of language as well as knowledge of facts about the world. At inference time, their knowledge of facts can be leveraged to solve interesting problems and perform useful knowledge work for users. It is well known that language models can verbatim memorize long sequences from their training data. However, it is much less well understood how language models memorize facts seen during training. In this work, we propose a new dataset to specifically empower researchers to study the dual processes of fact memorization and verbatim sequence memorization. The dataset consists of synthetically-generated, webtext-like documents about fictional events, as well as question-answer pairs about the events. We conduct training experiments showing how synthetic data about fictional events can be effective in teasing apart different forms of memorization. We also document the challenges in effectively building realistic, fictional synthetic data.


Article 1778

Title@2025-06-05 (4): Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks’ Internal Representations

Title: Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks’ Internal Representations Erklärbare Konzeptgenerierung durch Vision-Sprachvorliebe Lernen zum Verständnis der internen Repräsentationen neuraler Netzwerke 通过愿景-语言优先学习,了解神经网络内部代表网络,可解释的概念生成 2408.13438v3

Authors (3): Aditya Taparia, Som Sagar, Ransalu Senanayake

Understanding the inner representation of a neural network helps users improve models. Concept-based methods have become a popular choice for explaining deep neural networks post-hoc because, unlike most other explainable AI techniques, they can be used to test high-level visual “concepts” that are not directly related to feature attributes. For instance, the concept of “stripes” is important to classify an image as a zebra. Concept-based explanation methods, however, require practitioners to guess and manually collect multiple candidate concept image sets, making the process labor-intensive and prone to overlooking important concepts. Addressing this limitation, in this paper, we frame concept image set creation as an image generation problem. However, since naively using a standard generative model does not result in meaningful concepts, we devise a reinforcement learning-based preference optimization (RLPO) algorithm that fine-tunes a vision-language generative model from approximate textual descriptions of concepts. Through a series of experiments, we demonstrate our method’s ability to efficiently and reliably articulate diverse concepts that are otherwise challenging to craft manually.


Article 1779

Title@2025-06-05 (4): CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP

Title: CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP CLIPErase: Effizientes Entlernen von Visual-Textual Associations in CLIP CLIPErase:在CLIP中有效不学习视觉-语言协会 2410.23330v2

Authors (6): Tianyu Yang, Lisen Dai, Xiangqi Wang, Minhao Cheng, Yapeng Tian, Xiangliang Zhang

Machine unlearning (MU) has gained significant attention as a means to remove specific data from trained models without requiring a full retraining process. While progress has been made in unimodal domains like text and image classification, unlearning in multimodal models remains relatively underexplored. In this work, we address the unique challenges of unlearning in CLIP, a prominent multimodal model that aligns visual and textual representations. We introduce CLIPErase, a novel approach that disentangles and selectively forgets both visual and textual associations, ensuring that unlearning does not compromise model performance. CLIPErase consists of three key modules: a Forgetting Module that disrupts the associations in the forget set, a Retention Module that preserves performance on the retain set, and a Consistency Module that maintains consistency with the original model. Extensive experiments on the CIFAR-100 and Flickr30K datasets across four CLIP downstream tasks demonstrate that CLIPErase effectively forgets designated associations in zero-shot tasks for multimodal samples, while preserving the model’s performance on the retain set after unlearning.


Article 1780

Title@2025-06-05 (4): Bayesian Inference for Correlated Human Experts and Classifiers

Title: Bayesian Inference for Correlated Human Experts and Classifiers Bayesische Schlussfolgerung für korrelierte menschliche Experten und Klassifikatoren 与相关人类专家和分类有关连的人类专家和分类人的贝叶斯推论 2506.05636v1

Authors (5): Markelle Kelly, Alex Boyd, Sam Showalter, Mark Steyvers, Padhraic Smyth

Applications of machine learning often involve making predictions based on both model outputs and the opinions of human experts. In this context, we investigate the problem of querying experts for class label predictions, using as few human queries as possible, and leveraging the class probability estimates of pre-trained classifiers. We develop a general Bayesian framework for this problem, modeling expert correlation via a joint latent representation, enabling simulation-based inference about the utility of additional expert queries, as well as inference of posterior distributions over unobserved expert labels. We apply our approach to two real-world medical classification problems, as well as to CIFAR-10H and ImageNet-16H, demonstrating substantial reductions relative to baselines in the cost of querying human experts while maintaining high prediction accuracy.


Article 1781

Title@2025-06-05 (4): AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking

Title: AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking AdaReasoner: Adaptive Reasoning ermöglicht flexibleres Denken AdaReasoner:适应性理性使思考更灵活 2505.17312v2

Authors (7): Xiangqi Wang, Yue Huang, Yanbo Wang, Xiaonan Luo, Kehan Guo, Yujun Zhou, Xiangliang Zhang

LLMs often need effective configurations, like temperature and reasoning steps, to handle tasks requiring sophisticated reasoning and problem-solving, ranging from joke generation to mathematical reasoning. Existing prompting approaches usually adopt general-purpose, fixed configurations that work ‘well enough’ across tasks but seldom achieve task-specific optimality. To address this gap, we introduce AdaReasoner, an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations for tasks requiring different types of thinking. AdaReasoner is trained using a reinforcement learning (RL) framework, combining a factorized action space with a targeted exploration strategy, along with a pretrained reward model to optimize the policy model for reasoning configurations with only a few-shot guide. Theoretical guarantees and experiments support AdaReasoner's fast convergence and sublinear policy gap. Across six different LLMs and a variety of reasoning tasks, it consistently outperforms standard baselines, preserves out-of-distribution robustness, and yields gains on knowledge-intensive tasks through tailored prompts.


Article 1782

Title@2025-06-05 (4): AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization

Title: AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization AutoQD: Automatische Entdeckung von unterschiedlichen Verhalten mit Qualitäts-Diversitäts-Optimierung 自动QD: 以质量- 多样性优化自动发现多种行为 2506.05634v1

Authors (2): Saeed Hedayatian, Stefanos Nikolaidis

Quality-Diversity (QD) algorithms have shown remarkable success in discovering diverse, high-performing solutions, but rely heavily on hand-crafted behavioral descriptors that constrain exploration to predefined notions of diversity. Leveraging the equivalence between policies and occupancy measures, we present a theoretically grounded approach to automatically generate behavioral descriptors by embedding the occupancy measures of policies in Markov Decision Processes. Our method, AutoQD, leverages random Fourier features to approximate the Maximum Mean Discrepancy (MMD) between policy occupancy measures, creating embeddings whose distances reflect meaningful behavioral differences. A low-dimensional projection of these embeddings that captures the most behaviorally significant dimensions is then used as the behavioral descriptor for off-the-shelf QD methods. We prove that our embeddings converge to true MMD distances between occupancy measures as the number of sampled trajectories and embedding dimensions increase. Through experiments in multiple continuous control tasks we demonstrate AutoQD’s ability to discover diverse policies without predefined behavioral descriptors, presenting a well-motivated alternative to prior methods in unsupervised Reinforcement Learning and QD optimization. Our approach opens new possibilities for open-ended learning and automated behavior discovery in sequential decision-making settings without requiring domain-specific knowledge.
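
The random-Fourier-feature idea mentioned above can be sketched as follows: each sample is mapped to cosine features that approximate a Gaussian kernel, a set of samples is summarised by the mean of those features, and the Euclidean distance between two such means approximates the MMD. This is a generic illustration, assuming a Gaussian kernel and treating plain lists of state vectors as stand-ins for occupancy-measure samples; the function names, the bandwidth, and the feature count are hypothetical, and the low-dimensional projection step is not shown.

```python
import math
import random


def make_rff(dim_in, n_features, bandwidth=1.0, seed=0):
    """Build a random Fourier feature map approximating a Gaussian kernel."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 1/bandwidth^2); phases are uniform on [0, 2*pi].
    weights = [
        [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim_in)]
        for _ in range(n_features)
    ]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [
            scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return features


def mean_embedding(samples, features):
    """Average the feature vectors of all samples (empirical kernel mean embedding)."""
    vectors = [features(x) for x in samples]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rff_mmd(samples_p, samples_q, features):
    """Approximate MMD as the distance between the two mean embeddings."""
    mu_p = mean_embedding(samples_p, features)
    mu_q = mean_embedding(samples_q, features)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_p, mu_q)))


if __name__ == "__main__":
    phi = make_rff(dim_in=2, n_features=500)
    visited_a = [[0.0, 0.0], [0.1, 0.0]]
    visited_b = [[3.0, 0.0], [3.1, 0.0]]
    print(rff_mmd(visited_a, visited_a, phi), rff_mmd(visited_a, visited_b, phi))
```

Because each policy is reduced to a fixed-length vector, distances between policies become ordinary Euclidean distances, which is what makes the embedding usable as a descriptor.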


Article 1783

Title@2025-06-05 (4): The TESS Ten Thousand Catalog: 10,001 uniformly-vetted and -validated Eclipsing Binary Stars detected in Full-Frame Image data by machine learning and analyzed by citizen scientists

Title: The TESS Ten Thousand Catalog: 10,001 uniformly-vetted and -validated Eclipsing Binary Stars detected in Full-Frame Image data by machine learning and analyzed by citizen scientists Der TESS Zehntausend-Katalog: 10,001 gleichmäßig getestete und validierte Eclipsing Binary Stars in Full-Frame-Bilddaten durch maschinelles Lernen erkannt und von Bürgerwissenschaftlern analysiert TESS 10 000万个目录:10 001个统一审查的和经过验证的二进制星,通过机器学习和由公民科学家分析,在全框架图像数据中检测到 2506.05631v1

Authors (39): Veselin B. Kostov, Brian P. Powell, Aline U. Fornear, Marco Z. Di Fraia, Robert Gagliano, Thomas L. Jacobs, Julien S. de Lambilly, Hugo A. Durantini Luca, Steven R. Majewski, Mark Omohundro, Jerome Orosz, Saul A. Rappaport, Ryan Salik, Donald Short, William Welsh, Svetoslav Alexandrov, Cledison Marcos da Silva, Erika Dunning, Gerd Guhne, Marc Huten, Michiharu Hyogo, Davide Iannone, Sam Lee, Christian Magliano, Manya Sharma, Allan Tarr, John Yablonsky, Sovan Acharya, Fred Adams, Thomas Barclay, Benjamin T. Montet, Susan Mullally, Greg Olmschenk, Andrej Prsa, Elisa Quintana, Robert Wilson, Hasret Balcioglu, Ethan Kruse, the Eclipsing Binary Patrol Collaboration

The Transiting Exoplanet Survey Satellite (TESS) has surveyed nearly the entire sky in Full-Frame Image mode with a time resolution of 200 seconds to 30 minutes and a temporal baseline of at least 27 days. In addition to the primary goal of discovering new exoplanets, TESS is exceptionally capable at detecting variable stars, and in particular short-period eclipsing binaries which are relatively common, making up a few percent of all stars, and represent powerful astrophysical laboratories for deep investigations of stellar formation and evolution. We combed Sectors 1-82 of TESS Full-Frame Image data searching for eclipsing binary stars using a neural network that identified ~1.2 million stars with eclipse-like features. Of these, we have performed an in-depth analysis on ~60,000 targets using automated methods and manual inspection by citizen scientists. Here we present a catalog of 10001 uniformly-vetted and -validated eclipsing binary stars that passed all our ephemeris and photocenter tests, as well as complementary visual inspection. Of these, 7936 are new eclipsing binaries while the remaining 2065 are known systems for which we update the published ephemerides. We outline the detection and analysis of the targets, discuss the properties of the sample, and highlight potentially interesting systems. Finally, we also provide a list of ~900,000 unvetted and unvalidated targets for which the neural network found eclipse-like features with a score higher than 0.9, and for which there are no known eclipsing binaries within a sky-projected separation of a TESS pixel (~21 arcsec).


Article 1784

Title@2025-06-05 (4): A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series

Title: A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series Ein verlässlicher Rahmen für die Mensch-in-the-Loop-Anomalie-Erkennung in der Zeitreihe 时间序列中人类在Loop异常探测的可靠框架 2405.03234v3

Authors (4): Ziquan Deng, Xiwei Xuan, Kwan-Liu Ma, Zhaodan Kong

Time series anomaly detection is a critical machine learning task for numerous applications, such as finance, healthcare, and industrial systems. However, even high-performing models may exhibit potential issues such as biases, leading to unreliable outcomes and misplaced confidence. While model explanation techniques, particularly visual explanations, offer valuable insights by elucidating how models attribute their decisions, many limitations remain: they are primarily instance-based and do not scale across the dataset, and they provide one-directional information from the model to the human side, lacking a mechanism for users to address detected issues. To fill these gaps, we introduce HILAD, a novel framework designed to foster a dynamic and bidirectional collaboration between humans and AI for enhancing anomaly detection models in time series. Through our visual interface, HILAD empowers domain experts to detect, interpret, and correct unexpected model behaviors at scale. Our evaluation through user studies with two models and three time series datasets demonstrates the effectiveness of HILAD, which fosters a deeper model understanding, immediate corrective actions, and model reliability enhancement.


Article 1785

Title@2025-06-05 (4): GP-MoLFormer-Sim: Test Time Molecular Optimization through Contextual Similarity Guidance

Title: GP-MoLFormer-Sim: Test Time Molecular Optimization through Contextual Similarity Guidance GP-MoLFormer-Sim: Testzeit-Molekularoptimierung durch kontextuelle Ähnlichkeitsführung GP-MoLFormer-Sim:通过背景相似性指导试验时间分子优化 2506.05628v1

Authors (7): Jiri Navratil, Jarret Ross, Payel Das, Youssef Mroueh, Samuel C Hoffman, Vijil Chenthamarakshan, Brian Belgodere

The ability to design molecules while preserving similarity to a target molecule and/or property is crucial for various applications in drug discovery, chemical design, and biology. In this paper, we introduce an efficient training-free method for navigating and sampling from the molecular space with a generative Chemical Language Model (CLM), while using the molecular similarity to the target as a guide. Our method leverages the contextual representations learned from the CLM itself to estimate the molecular similarity, which is then used to adjust the autoregressive sampling strategy of the CLM. At each step of the decoding process, the method tracks the distance of the current generations from the target and updates the logits to encourage the preservation of similarity in generations. We implement the method using a recently proposed ~47M-parameter SMILES-based CLM, GP-MoLFormer, and therefore refer to the method as GP-MoLFormer-Sim, which enables a test-time update of the deep generative policy to reflect the contextual similarity to a set of guide molecules. The method is further integrated into a genetic algorithm (GA) and tested on a set of standard molecular optimization benchmarks involving property optimization, molecular rediscovery, and structure-based drug design. Results show that GP-MoLFormer-Sim combined with GA (GP-MoLFormer-Sim+GA) outperforms existing training-free baseline methods when the oracle remains black-box. The findings in this work are a step forward in understanding and guiding the generative mechanisms of CLMs.
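
The general idea of nudging logits toward similarity with a guide can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: the additive update rule, the use of cosine similarity, the `strength` parameter, and the toy embeddings are not taken from the paper, and no real CLM is involved.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_logits(logits, candidate_embeddings, guide_embedding, strength=2.0):
    """Add a similarity bonus to each candidate token's logit (hypothetical rule).

    candidate_embeddings[i] stands in for the contextual representation the
    sequence would have if token i were appended.
    """
    return [
        logit + strength * cosine_similarity(embedding, guide_embedding)
        for logit, embedding in zip(logits, candidate_embeddings)
    ]


if __name__ == "__main__":
    base = [0.0, 0.0]
    candidates = [[1.0, 0.0], [0.0, 1.0]]
    guide = [1.0, 0.0]
    print(softmax(base), softmax(guided_logits(base, candidates, guide)))
```

Setting `strength` to zero recovers the unguided sampling distribution, so the guidance can be dialled in without retraining anything.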


Article 1786

Title@2025-06-05 (4): Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency

Title: Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency Grundlegende Grenzen der schnellen Tuning Transformer: Universalität, Kapazität und Effizienz 迅速提款变换者的根本限制:普遍性、能力和效益 2411.16525v2

Authors (6): Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani, Chenyang Li, Zhao Song, Han Liu

We investigate the statistical and computational limits of prompt tuning for transformer-based foundation models. Our key contributions concern prompt tuning on single-head transformers with only a single self-attention layer: we show that it (i) is universal, and (ii) supports efficient (even almost-linear time) algorithms under the Strong Exponential Time Hypothesis (SETH). Statistically, we prove that prompt tuning on such simplest possible transformers yields universal approximators for sequence-to-sequence Lipschitz functions. In addition, we provide an exponential-in-$dL$ and -in-$(1/\epsilon)$ lower bound on the number of soft-prompt tokens required for prompt tuning to memorize any dataset with 1-layer, 1-head transformers. Computationally, we identify a phase transition in the efficiency of prompt tuning, determined by the norm of the soft-prompt-induced keys and queries, and provide an upper bound criterion. Beyond this criterion, no sub-quadratic (efficient) algorithm for prompt tuning exists under SETH. Within this criterion, we showcase our theory by proving the existence of almost-linear time prompt tuning inference algorithms. These fundamental limits provide important necessary conditions for designing expressive and efficient prompt tuning methods for practitioners.


Article 1787

Title@2025-06-05 (4): Theoretical Analysis of Positional Encodings in Transformer Models: Impact on Expressiveness and Generalization

Title: Theoretical Analysis of Positional Encodings in Transformer Models: Impact on Expressiveness and Generalization Theoretische Analyse von Positionskodierungen in Transformer-Modellen: Auswirkungen auf Expressivität und Generalisierung 变换模型中位置编码理论分析:对表现力和普遍化的影响 2506.06398v1

Authors (1): Yin Li

Positional encodings are a core part of transformer-based models, enabling processing of sequential data without recurrence. This paper presents a theoretical framework to analyze how various positional encoding methods, including sinusoidal, learned, relative, and bias-based methods like Attention with Linear Biases (ALiBi), impact a transformer’s expressiveness, generalization ability, and extrapolation to longer sequences. Expressiveness is defined via function approximation, generalization bounds are established using Rademacher complexity, and new encoding methods based on orthogonal functions, such as wavelets and Legendre polynomials, are proposed. The extrapolation capacity of existing and proposed encodings is analyzed, extending ALiBi’s biasing approach to a unified theoretical context. Experimental evaluation on synthetic sequence-to-sequence tasks shows that orthogonal transform-based encodings outperform traditional sinusoidal encodings in generalization and extrapolation. This work addresses a critical gap in transformer theory, providing insights for design choices in natural language processing, computer vision, and other transformer applications.
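
Two of the existing encodings named above have simple closed forms, sketched here for reference. The sinusoidal table follows the standard definition from the original Transformer, and the ALiBi slopes follow the usual geometric sequence for a power-of-two number of heads. The function names are illustrative, and the paper's proposed wavelet and Legendre encodings are not shown because the abstract does not specify them.

```python
import math


def sinusoidal_encoding(n_positions, d_model):
    """Standard sinusoidal table: sin on even dimensions, cos on odd dimensions."""
    table = []
    for pos in range(n_positions):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def alibi_slopes(n_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ... (assumes n_heads is a power of two)."""
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias(seq_len, slope):
    """Causal ALiBi bias: -slope * (i - j) for keys j <= i, -inf for future keys."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    print(sinusoidal_encoding(2, 4))
    print(alibi_slopes(8))
    print(alibi_bias(3, 0.5))
```

The ALiBi bias depends only on the distance between query and key, so it can be computed for any sequence length at inference time, which is the property behind its use in extrapolation studies.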


Article 1788

Title@2025-06-05 (4): Two-dimensional Taxonomy for N-ary Knowledge Representation Learning Methods

Title: Two-dimensional Taxonomy for N-ary Knowledge Representation Learning Methods Zweidimensionale Taxonomie für N-are Wissensdarstellung Lernmethoden Nary知识代表制学习方法二维分类学 2506.05626v1

Authors (3): Xiaohua Lu, Liubov Tupikina, Mehwish Alam

Real-world knowledge can take various forms, including structured, semi-structured, and unstructured data. Among these, knowledge graphs are a form of structured human knowledge that integrate heterogeneous data sources into structured representations but typically reduce complex n-ary relations to simple triples, thereby losing higher-order relational details. In contrast, hypergraphs naturally represent n-ary relations with hyperedges, which directly connect multiple entities together. Yet hypergraph representation learning often overlooks entity roles in hyperedges, limiting fine-grained semantic modelling. To address these issues, knowledge hypergraphs and hyper-relational knowledge graphs combine the advantages of knowledge graphs and hypergraphs to better capture the complex structures and role-specific semantics of real-world knowledge. This survey provides a comprehensive review of methods handling n-ary relational data, covering the literature on both knowledge hypergraphs and hyper-relational knowledge graphs. We propose a two-dimensional taxonomy: the first dimension categorises models based on their methodology, i.e., translation-based models, tensor factorisation-based models, deep neural network-based models, logic rules-based models, and hyperedge expansion-based models. The second dimension classifies models according to their awareness of entity roles and positions in n-ary relations, dividing them into aware-less, position-aware, and role-aware approaches. Finally, we discuss existing datasets and negative sampling strategies, and outline open challenges to inspire future research.
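
The difference between the representations discussed above is easiest to see on one fact. The sketch below encodes the same n-ary statement as a plain triple, as a hyper-relational fact (a main triple plus qualifiers), and as a role-aware hyperedge. The class names, field names, and relation labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


@dataclass(frozen=True)
class HyperRelationalFact:
    main: Triple
    qualifiers: tuple = ()  # ((qualifier_relation, value), ...)


@dataclass(frozen=True)
class RoleAwareHyperedge:
    relation: str
    arguments: tuple  # ((role, entity), ...)


def flatten_to_triple(fact):
    """Reduce a hyper-relational fact to its main triple, discarding qualifiers."""
    return fact.main


def role_unaware_entities(edge):
    """View a hyperedge as a bare tuple of entities, discarding role labels."""
    return tuple(entity for _, entity in edge.arguments)


FACT = HyperRelationalFact(
    main=Triple("Marie Curie", "received", "Nobel Prize in Physics"),
    qualifiers=(("year", "1903"), ("together_with", "Pierre Curie")),
)

EDGE = RoleAwareHyperedge(
    relation="award_received",
    arguments=(
        ("laureate", "Marie Curie"),
        ("award", "Nobel Prize in Physics"),
        ("year", "1903"),
    ),
)

if __name__ == "__main__":
    print(flatten_to_triple(FACT))
    print(role_unaware_entities(EDGE))
```

Flattening to a triple drops the year and the co-recipient, while the role-unaware view keeps every entity but no longer says which one is the laureate; these are the two kinds of information loss the taxonomy's second dimension is concerned with.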


Article 1789

Title@2025-06-05 (4): Heterogeneous Sequel-Aware Graph Neural Networks for Sequential Learning

Title: Heterogeneous Sequel-Aware Graph Neural Networks for Sequential Learning Heterogene Sequel-Aware Graph Neuronale Netzwerke für sequentielles Lernen 用于序列学习的异异质Sequel-Aware 图像神经网络 2506.05625v1

Authors (3): Anushka Tiwari, Haimonti Dutta, Shahrzad Khanizadeh

Graph-based recommendation systems use higher-order user and item embeddings for next-item predictions. Dynamically adding collaborative signals from neighbors helps to use similar users’ preferences during learning. While item-item correlations and their impact on recommendations have been studied, the efficacy of temporal item sequences for recommendations is much less explored. In this paper, we examine temporal item sequence (sequel-aware) embeddings along with higher-order user embeddings and show that sequel-aware Graph Neural Networks have better (or comparable) recommendation performance than graph-based recommendation systems that do not consider sequel information. Extensive empirical results comparing Heterogeneous Sequel-aware Graph Neural Networks (HSAL-GNNs) to other algorithms for sequential learning (such as transformers, graph neural networks, auto-encoders) are presented on three synthetic and three real-world datasets. Our results indicate that the incorporation of sequence information from items greatly enhances recommendations.



Article 1790

Title@2025-06-05 (4): Distributional Matrix Completion via Nearest Neighbors in the Wasserstein Space

Title: Distributional Matrix Completion via Nearest Neighbors in the Wasserstein Space Verteilungsmatrix Fertigstellung über die nächsten Nachbarn im Wassersteinraum 通过瓦塞斯坦空间近距离邻居完成分布矩阵 2410.13112v2

Authors (4): Jacob Feitelberg, Kyuseong Choi, Anish Agarwal, Raaz Dwivedi

We study the problem of distributional matrix completion: Given a sparsely observed matrix of empirical distributions, we seek to impute the true distributions associated with both observed and unobserved matrix entries. This is a generalization of traditional matrix completion, where the observations per matrix entry are scalar-valued. To do so, we utilize tools from optimal transport to generalize the nearest neighbors method to the distributional setting. Under a suitable latent factor model on probability distributions, we establish that our method recovers the distributions in the Wasserstein metric. We demonstrate through simulations that our method (i) provides better distributional estimates for an entry compared to using observed samples for that entry alone, (ii) yields accurate estimates of distributional quantities such as standard deviation and value-at-risk, and (iii) inherently supports heteroscedastic distributions. In addition, we demonstrate our method on a real-world dataset of quarterly earnings prediction distributions. We also prove novel asymptotic results for Wasserstein barycenters over one-dimensional distributions.
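As a rough illustration of the one-dimensional case mentioned above, the sketch below uses the standard fact that, for empirical distributions on the real line with equal sample counts, the 2-Wasserstein distance and barycenter reduce to operations on sorted samples. The `row_distance` and `impute` helpers are hypothetical names for a simple nearest-neighbour scheme and are not taken from the paper; the authors' estimator may differ in how neighbours are selected and weighted.

```python
def wasserstein2_1d(xs, ys):
    """2-Wasserstein distance between two equal-size empirical distributions on the line."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    a, b = sorted(xs), sorted(ys)
    # In 1-D the optimal coupling matches order statistics.
    return (sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)) ** 0.5


def barycenter_1d(sample_sets):
    """W2 barycenter of equal-size 1-D empirical distributions: average the order statistics."""
    ordered = [sorted(s) for s in sample_sets]
    if len({len(s) for s in ordered}) != 1:
        raise ValueError("this sketch assumes equal sample counts")
    return [sum(col) / len(col) for col in zip(*ordered)]


def row_distance(row_a, row_b):
    """Average W2 distance over columns observed in both rows (hypothetical choice)."""
    shared = [c for c in row_a if c in row_b]
    if not shared:
        return float("inf")
    return sum(wasserstein2_1d(row_a[c], row_b[c]) for c in shared) / len(shared)


def impute(rows, i, col, k=2):
    """Estimate entry (i, col) as the barycenter of the k nearest rows that observe col."""
    candidates = [
        (row_distance(rows[i], r), j)
        for j, r in enumerate(rows)
        if j != i and col in r
    ]
    nearest = sorted(candidates)[:k]
    return barycenter_1d([rows[j][col] for _, j in nearest])
```

Each row here is a dictionary mapping a column index to the list of samples observed for that entry, so a missing entry is simply an absent key.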



Article 1791

Title@2025-06-05 (4): Active Learning of Piecewise Gaussian Process Surrogates

Title: Active Learning of Piecewise Gaussian Process Surrogates Aktives Lernen des stückweisen Gaußschen Prozesses Surrogats 正在积极学习小费高斯进程代得国家 2301.08789v3

Authors (6): Chiwoo Park, Robert Waelder, Bonggwon Kang, Benji Maruyama, Soondo Hong, Robert Gramacy

Active learning of Gaussian process (GP) surrogates has been useful for optimizing experimental designs for physical/computer simulation experiments, and for steering data acquisition schemes in machine learning. In this paper, we develop a method for active learning of piecewise, Jump GP surrogates. Jump GPs are continuous within, but discontinuous across, regions of a design space, as required for applications spanning autonomous materials design, configuration of smart factory systems, and many others. Although our active learning heuristics are appropriated from strategies originally designed for ordinary GPs, we demonstrate that additionally accounting for model bias, as opposed to the usual model uncertainty, is essential in the Jump GP context. Toward that end, we develop an estimator for bias and variance of Jump GP models. Illustrations, and evidence of the advantage of our proposed methods, are provided on a suite of synthetic benchmarks, and real-simulation experiments of varying complexity.



Article 1792

Title@2025-06-05 (4): EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM?

Title: EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM? EnsemW2S: Kann ein Ensemble von LLMs dazu genutzt werden, eine stärkere LLM zu erhalten? EnsemW2S:能否利用捆绑的LLMs获得更强大的LMm? 2410.04571v2

Authors (7): Aakriti Agrawal, Mucong Ding, Zora Che, Chenghao Deng, Anirudh Satheesh, John Langford, Furong Huang

How can we harness the collective capabilities of multiple Large Language Models (LLMs) to create an even more powerful model? This question forms the foundation of our research, where we propose an innovative approach to weak-to-strong (w2s) generalization, a critical problem in AI alignment. Our work introduces an easy-to-hard (e2h) framework for studying the feasibility of w2s generalization, where weak models trained on simpler tasks collaboratively supervise stronger models on more complex tasks. This setup mirrors real-world challenges, where direct human supervision is limited. To achieve this, we develop a novel AdaBoost-inspired ensemble method, demonstrating that an ensemble of weak supervisors can enhance the performance of stronger LLMs across classification and generative tasks on difficult QA datasets. In several cases, our ensemble approach matches the performance of models trained on ground-truth data, establishing a new benchmark for w2s generalization. We observe an improvement of up to 14% over existing baselines and average improvements of 5% and 4% for binary classification and generative tasks, respectively. This research points to a promising direction for enhancing AI through collective supervision, especially in scenarios where labeled data is sparse or insufficient.



Article 1793

Title@2025-06-05 (4): Exploring Representations and Interventions in Time Series Foundation Models

Title: Exploring Representations and Interventions in Time Series Foundation Models Exploring Repräsentationen und Interventionen in Time Series Foundation Models 探讨在时间系列基础模型中的代表性和干预措施 2409.12915v5

Authors (5): Michał Wiliński, Mononito Goswami, Willa Potosnak, Nina Żukowska, Artur Dubrawski

Time series foundation models (TSFMs) promise to be powerful tools for a wide range of applications. However, their internal representations and learned concepts are still not well understood. In this study, we investigate the structure and redundancy of representations across various TSFMs, examining the self-similarity of model layers within and across different model sizes. This analysis reveals block-like redundancy in the representations, which can be utilized for informed pruning to improve inference speed and efficiency. Additionally, we explore the concepts learned by these models - such as periodicity and trends - and how these can be manipulated through latent space steering to influence model behavior. Our experiments show that steering interventions can introduce new features, e.g., adding periodicity or trends to signals that initially lacked them. These findings underscore the value of representational analysis for optimizing models and demonstrate how conceptual steering offers new possibilities for more controlled and efficient time series analysis with TSFMs.



Article 1794

Title@2025-06-05 (4): Toward Efficient Kernel-Based Solvers for Nonlinear PDEs

Title: Toward Efficient Kernel-Based Solvers for Nonlinear PDEs Auf dem Weg zu effizienten Kernel-basierten Lösern für nichtlineare PDEs 实现非线性PDE 高效的以核心为基础的内核解决问题器 2410.11165v4

Authors (6): Zhitong Xu, Da Long, Yiming Xu, Guang Yang, Shandian Zhe, Houman Owhadi

We introduce a novel kernel learning framework toward efficiently solving nonlinear partial differential equations (PDEs). In contrast to the state-of-the-art kernel solver that embeds differential operators within kernels, posing challenges with a large number of collocation points, our approach eliminates these operators from the kernel. We model the solution using a standard kernel interpolation form and differentiate the interpolant to compute the derivatives. Our framework obviates the need for complex Gram matrix construction between solutions and their derivatives, allowing for a straightforward implementation and scalable computation. As an instance, we allocate the collocation points on a grid and adopt a product kernel, which yields a Kronecker product structure in the interpolation. This structure enables us to avoid computing the full Gram matrix, reducing costs and scaling efficiently to a large number of collocation points. We provide a proof of the convergence and rate analysis of our method under appropriate regularity assumptions. In numerical experiments, we demonstrate the advantages of our method in solving several benchmark PDEs.
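The role of the Kronecker structure can be illustrated with a small sketch. Assuming a product kernel on a two-dimensional grid, the full Gram matrix is the Kronecker product of the per-dimension Gram matrices K1 and K2, and the identity (K1 kron K2) vec(V) = vec(K1 V K2^T), with row-major vectorisation, lets one apply it without ever building it. The helper names below are hypothetical and the code is plain Python for clarity, not the authors' implementation.

```python
def matmul(A, B):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def kron_matvec(K1, K2, V):
    """Apply (K1 kron K2) to vec(V) and return the result as an n1 x n2 matrix.

    Uses the identity (K1 kron K2) vec(V) = vec(K1 V K2^T) with row-major vec,
    so the (n1*n2) x (n1*n2) Gram matrix is never formed.
    """
    return matmul(matmul(K1, V), transpose(K2))


def kron(K1, K2):
    """Explicit Kronecker product, included only to check the identity on small inputs."""
    return [[a * b for a in row1 for b in row2] for row1 in K1 for row2 in K2]
```

For an n1 by n2 grid, the structured product costs on the order of n1*n2*(n1 + n2) multiplications, compared with (n1*n2)^2 for the dense matrix-vector product.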



Article 1795

Title@2025-06-05 (4): Population-Proportional Preference Learning from Human Feedback: An Axiomatic Approach

Title: Population-Proportional Preference Learning from Human Feedback: An Axiomatic Approach Bevölkerung-proportionale Präferenz Lernen aus menschlichem Feedback: Ein axiomatischer Ansatz 从人类反馈中学习:一种不言而喻的方法 2506.05619v1

Authors (4): Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, Pablo A. Parrilo

Conventional preference learning methods often prioritize opinions held more widely when aggregating preferences from multiple evaluators. This may result in policies that are biased in favor of some types of opinions or groups. The objective of this paper is to develop a novel preference learning framework capable of aligning aggregate opinions and policies proportionally with the true population distribution of evaluator preferences. Our approach infers the feasible set of evaluator population distributions directly from pairwise comparison data. Using these estimates, the algorithm constructs a policy that satisfies foundational axioms from social choice theory, namely monotonicity and Pareto efficiency, as well as our newly introduced axioms of population-proportional representation and population-bounded robustness. We propose a soft-max relaxation method that smoothly trades off population-proportional representation against the selection of the Condorcet winner (which beats all other options in pairwise comparisons). Finally, we validate the effectiveness and scalability of our approach through experiments on both tabular recommendation tasks and large-scale language model alignment.
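The Condorcet winner referred to above can be made concrete with a short sketch. It assumes a hypothetical matrix `pairwise_wins`, where entry `[a][b]` counts the evaluators who prefer option `a` to option `b`. It only illustrates the definition and does not reproduce the paper's soft-max relaxation.

```python
def condorcet_winner(pairwise_wins):
    """Return the index of the option that beats every other option head-to-head, or None.

    pairwise_wins[a][b] is the number of evaluators preferring option a over option b.
    """
    n = len(pairwise_wins)
    for a in range(n):
        if all(pairwise_wins[a][b] > pairwise_wins[b][a] for b in range(n) if b != a):
            return a
    return None  # preference cycles or ties can leave no Condorcet winner
```

A cyclic preference profile is the usual case in which no such winner exists, which is one reason relaxed selection rules are of interest.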



Article 1796

Title@2025-06-05 (4): Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods

Title: Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods Rollout Roulette: Ein probabilistischer Schlussfolgerungsansatz zur Inferenz-Zeit-Skalierung von LLMs mit partikelbasierten Monte-Carlo-Methoden 滚轮推出:使用以粒子为基础的蒙特卡洛方法对LLMs的推论-时间尺度的概率推推方法 2502.01618v4

Authors (5): Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, Akash Srivastava

Large language models (LLMs) have achieved significant performance gains via scaling up model sizes and/or data. However, recent evidence suggests diminishing returns from such approaches, motivating scaling the computation spent at inference time. Existing inference-time scaling methods, usually with reward models, cast the task as a search problem, which tends to be vulnerable to reward hacking as a consequence of approximation errors in reward models. In this paper, we instead cast inference-time scaling as a probabilistic inference task and leverage sampling-based techniques to explore the typical set of the state distribution of a state-space model with an approximate likelihood, rather than optimize for its mode directly. We propose a novel inference-time scaling approach by adapting particle-based Monte Carlo methods to this task. Our empirical evaluation demonstrates that our methods have a 4-16x better scaling rate over our deterministic search counterparts on various challenging mathematical reasoning tasks. Using our approach, we show that Qwen2.5-Math-1.5B-Instruct can surpass GPT-4o accuracy in only 4 rollouts, while Qwen2.5-Math-7B-Instruct scales to o1 level accuracy in only 32 rollouts. Our work not only presents an effective method for inference-time scaling, but also connects the rich literature in probabilistic inference with inference-time scaling of LLMs to develop more robust algorithms in future work. Code, videos, and further information are available at https://probabilistic-inference-scaling.github.io.
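The contrast between sampling and mode-seeking search can be sketched with the resampling step of a generic particle filter. This is a minimal sketch, assuming each particle is a partial rollout and each score comes from a hypothetical reward model; the soft-max weighting and multinomial resampling shown here are standard choices and are not claimed to match the paper's exact algorithm.

```python
import math
import random


def softmax(scores):
    """Numerically stable soft-max that turns scores into normalised weights."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def resample(particles, scores, rng):
    """Multinomial resampling: draw a new population in proportion to the weights.

    Unlike greedy search, which keeps only the top-scoring candidate, lower-scoring
    particles survive with non-zero probability, so one noisy score is less decisive.
    """
    weights = softmax(scores)
    return rng.choices(particles, weights=weights, k=len(particles))
```

Passing an explicit `random.Random` instance keeps the procedure reproducible across runs.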



Article 1797

Title@2025-06-05 (4): LFA applied to CNNs: Efficient Singular Value Decomposition of Convolutional Mappings by Local Fourier Analysis

Title: LFA applied to CNNs: Efficient Singular Value Decomposition of Convolutional Mappings by Local Fourier Analysis LFA für CNNs: Effiziente Singularwertzerlegung von konvolutionären Mappings durch lokale Fourier-Analysen LFA适用于有线电视新闻网:通过局部Fourier分析对革命绘图法的高效单项价值分解 2506.05617v1

Authors (3): Antonia van Betteray, Matthias Rottmann, Karsten Kahl

The singular values of convolutional mappings encode interesting spectral properties, which can be used, e.g., to improve generalization and robustness of convolutional neural networks as well as to facilitate model compression. However, the computation of singular values is typically very resource-intensive. The naive approach involves unrolling the convolutional mapping along the input and channel dimensions into a large and sparse two-dimensional matrix, making the exact calculation of all singular values infeasible due to hardware limitations. In particular, this is true for matrices that represent convolutional mappings with large inputs and a high number of channels. Existing efficient methods leverage the fast Fourier transform (FFT) to transform convolutional mappings into the frequency domain, enabling the computation of singular values for matrices representing convolutions with larger input and channel dimensions. For a constant number of channels in a given convolution, an FFT can compute N singular values with O(N log N) complexity. In this work, we propose an approach of complexity O(N) based on local Fourier analysis, which additionally exploits the shift invariance of convolutional operators. We provide a theoretical analysis of our algorithm’s runtime and validate its efficiency through numerical experiments. Our results demonstrate that our proposed method is scalable and offers a practical solution to calculate the entire set of singular values - along with the corresponding singular vectors if needed - for high-dimensional convolutional mappings.
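The frequency-domain idea behind the existing methods can be shown in the simplest setting. The sketch below assumes a single-channel, one-dimensional circular convolution, whose matrix is circulant; its singular values are then the magnitudes of the discrete Fourier transform of the zero-padded kernel. A naive O(N^2) transform is used to keep the example dependency-free (an FFT would give O(N log N)), and this is not the local Fourier analysis method proposed in the paper. With several channels, each frequency yields a small matrix rather than a scalar.

```python
import cmath


def circular_conv_singular_values(kernel, n):
    """Singular values of the n x n circulant matrix of a 1-D circular convolution.

    A circulant matrix is diagonalised by the DFT, so its singular values are the
    absolute values of the DFT coefficients of the zero-padded kernel.
    """
    if len(kernel) > n:
        raise ValueError("kernel must not be longer than the signal")
    padded = list(kernel) + [0.0] * (n - len(kernel))
    values = []
    for k in range(n):
        coeff = sum(
            padded[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)
        )
        values.append(abs(coeff))
    return sorted(values, reverse=True)
```

For example, the difference kernel `[1, -1]` has a zero singular value at frequency zero, reflecting that constant signals are mapped to zero.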



Article 1798

Title@2025-06-05 (4): When Maximum Entropy Misleads Policy Optimization

Title: When Maximum Entropy Misleads Policy Optimization Wenn maximale Entropie die Politikoptimierung verfälscht 当最大 Entroppy 错误引导政策优化 2506.05615v1

Authors (3): Ruipeng Zhang, Ya-Chien Chang, Sicun Gao

The Maximum Entropy Reinforcement Learning (MaxEnt RL) framework is a leading approach for achieving efficient learning and robust performance across many RL tasks. However, MaxEnt methods have also been shown to struggle with performance-critical control problems in practice, where non-MaxEnt algorithms can successfully learn. In this work, we analyze how the trade-off between robustness and optimality affects the performance of MaxEnt algorithms in complex control tasks: while entropy maximization enhances exploration and robustness, it can also mislead policy optimization, leading to failure in tasks that require precise, low-entropy policies. Through experiments on a variety of control problems, we concretely demonstrate this misleading effect. Our analysis leads to better understanding of how to balance reward design and entropy maximization in challenging control problems.



Article 1799

Title@2025-06-05 (4): Extracting Interpretable Logic Rules from Graph Neural Networks

Title: Extracting Interpretable Logic Rules from Graph Neural Networks Extrahieren von interpretierbaren Logic Rules aus Graph Neural Networks 从图形神经网络中提取可解释逻辑规则 2503.19476v2

Authors (5): Chuqin Geng, Ziyu Zhao, Zhaoyue Wang, Haolin Ye, Xujie Si

Graph neural networks (GNNs) operate over both input feature spaces and combinatorial graph structures, making it challenging to understand the rationale behind their predictions. As GNNs gain widespread popularity and demonstrate success across various domains, such as drug discovery, studying their interpretability has become a critical task. To address this, many explainability methods have been proposed, with recent efforts shifting from instance-specific explanations to global concept-based explainability. However, these approaches face several limitations, such as relying on predefined concepts and explaining only a limited set of patterns. To address this, we propose a novel framework, LOGICXGNN, for extracting interpretable logic rules from GNNs. LOGICXGNN is model-agnostic, efficient, and data-driven, eliminating the need for predefined concepts. More importantly, it can serve as a rule-based classifier and even outperform the original neural models. Its interpretability facilitates knowledge discovery, as demonstrated by its ability to extract detailed and accurate chemistry knowledge that is often overlooked by existing methods. Another key advantage of LOGICXGNN is its ability to generate new graph instances in a controlled and transparent manner, offering significant potential for applications such as drug design. We empirically demonstrate these merits through experiments on real-world datasets such as MUTAG and BBBP.



Article 1800

Title@2025-06-05 (4): Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning

Title: Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning Reasoning Towards Fairness: Bias in Sprachmodellen durch reasoning-guided Fine-Tuning abmildern 追求公平的理由:通过有理、有理、有指导的微调减少语言模式中的偏见 2504.05632v3

Authors (3): Sanchit Kabra, Akshita Jha, Chandan K. Reddy

Recent advances in large-scale generative language models have shown that reasoning capabilities can significantly improve model performance across a variety of tasks. However, the impact of reasoning on a model’s ability to mitigate stereotypical responses remains largely underexplored. In this work, we investigate the crucial relationship between a model’s reasoning ability and fairness, and ask whether improved reasoning capabilities can mitigate harmful stereotypical responses, especially those arising due to shallow or flawed reasoning. We conduct a comprehensive evaluation of multiple open-source LLMs, and find that larger models with stronger reasoning abilities exhibit substantially lower stereotypical bias on existing fairness benchmarks. Building on this insight, we introduce ReGiFT – Reasoning Guided Fine-Tuning, a novel approach that extracts structured reasoning traces from advanced reasoning models and infuses them into models that lack such capabilities. We use only general-purpose reasoning and do not require any fairness-specific supervision for bias mitigation. Notably, we see that models fine-tuned using ReGiFT not only improve fairness relative to their non-reasoning counterparts but also outperform advanced reasoning models on fairness benchmarks. We also analyze how variations in the correctness of the reasoning traces and their length influence model fairness and their overall performance. Our findings highlight that enhancing reasoning capabilities is an effective, fairness-agnostic strategy for mitigating stereotypical bias caused by reasoning flaws.



Article 1801

Title: Robust and Efficient Transfer Learning via Supernet Transfer in Warm-started Neural Architecture Search Robustes und effizientes Transfer-Lernen über Supernet-Transfer in der neuralen Architektur 通过热启动的神经结构搜索中的超级网传输进行强力和高效传输学习 2407.20279v2

Authors (2): Prabhant Singh, Joaquin Vanschoren

Hand-designing Neural Networks is a tedious process that requires significant expertise. Neural Architecture Search (NAS) frameworks offer a very useful and popular solution that helps to democratize AI. However, these NAS frameworks are often computationally expensive to run, which limits their applicability and accessibility. In this paper, we propose a novel transfer learning approach, capable of effectively transferring pretrained supernets based on Optimal Transport or multi-dataset pretraining. This method can be generally applied to NAS methods based on Differentiable Architecture Search (DARTS). Through extensive experiments across dozens of image classification tasks, we demonstrate that transferring pretrained supernets in this way can not only drastically speed up the supernet training which then finds optimal models (3 to 5 times faster on average), but even yield models that outperform those found when running DARTS methods from scratch. We also observe positive transfer to almost all target datasets, making it very robust. Besides drastically improving the applicability of NAS methods, this also opens up new applications for continual learning and related fields.



Article 1802

Title@2025-06-05 (4): Adversarial Tokenization

Title: Adversarial Tokenization Adversariale Tokenisierung 对抗感应器化 2503.02174v2

Authors (3): Renato Lui Geh, Zilei Shao, Guy Van den Broeck

Current LLM pipelines account for only one possible tokenization for a given string, ignoring exponentially many alternative tokenizations during training and inference. For example, the standard Llama3 tokenization of penguin is [p,enguin], yet [peng,uin] is another perfectly valid alternative. In this paper, we show that despite LLMs being trained solely on one tokenization, they still retain semantic understanding of other tokenizations, raising questions about their implications in LLM safety. Put succinctly, we answer the following question: can we adversarially tokenize an obviously malicious string to evade safety and alignment restrictions? We show that not only is adversarial tokenization an effective yet previously neglected axis of attack, but it is also competitive against existing state-of-the-art adversarial approaches without changing the text of the harmful request. We empirically validate this exploit across three state-of-the-art LLMs and adversarial datasets, revealing a previously unknown vulnerability in subword models.
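To make the notion of alternative tokenizations concrete, the sketch below enumerates every way to split a string into pieces from a vocabulary. The vocabulary shown in the usage note is a toy, hypothetical one chosen to mirror the penguin example; it is not the Llama3 vocabulary, and the function is not the authors' attack procedure.

```python
def tokenizations(text, vocab):
    """Return all segmentations of `text` into tokens drawn from `vocab`.

    The number of segmentations can grow exponentially with the length of the
    string, so this exhaustive enumeration is only practical for short inputs.
    """
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            # Extend every segmentation of the remainder with the current piece.
            results.extend([piece] + rest for rest in tokenizations(text[end:], vocab))
    return results
```

With the toy vocabulary `{"p", "enguin", "peng", "uin", "pen", "guin"}`, the string `"penguin"` has three segmentations, of which a standard tokenizer would emit only one.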



Article 1803

Title@2025-06-05 (4): Model-Based Exploration in Monitored Markov Decision Processes

Title: Model-Based Exploration in Monitored Markov Decision Processes Modellbasierte Exploration in überwachten Markov-Entscheidungsprozessen 在监测的Markov决策过程中进行基于模型的探索 2502.16772v4

Authors (4): Alireza Kazemipour, Simone Parisi, Matthew E. Taylor, Michael Bowling

A tenet of reinforcement learning is that the agent always observes rewards. However, this is not true in many realistic settings, e.g., a human observer may not always be available to provide rewards, sensors may be limited or malfunctioning, or rewards may be inaccessible during deployment. Monitored Markov decision processes (Mon-MDPs) have recently been proposed to model such settings. However, existing Mon-MDP algorithms have several limitations: they do not fully exploit the problem structure, cannot leverage a known monitor, lack worst-case guarantees for ‘unsolvable’ Mon-MDPs without specific initialization, and offer only asymptotic convergence proofs. This paper makes three contributions. First, we introduce a model-based algorithm for Mon-MDPs that addresses these shortcomings. The algorithm employs two instances of model-based interval estimation: one to ensure that observable rewards are reliably captured, and another to learn the minimax-optimal policy. Second, we empirically demonstrate the advantages. We show faster convergence than prior algorithms in over four dozen benchmarks, and even more dramatic improvement when the monitoring process is known. Third, we present the first finite-sample bound on performance. We show convergence to a minimax-optimal policy even when some rewards are never observable.



Article 1804

Title@2025-06-05 (4): An Optimal Cascade Feature-Level Spatiotemporal Fusion Strategy for Anomaly Detection in CAN Bus

Title: An Optimal Cascade Feature-Level Spatiotemporal Fusion Strategy for Anomaly Detection in CAN Bus Eine optimale Kaskade-Feature-Level-Spatiotemporale Fusionsstrategie für Anomalieerkennung im CAN-Bus CAN公共汽车异常探测的最佳最佳岩岩岩层地貌级超光时聚变战略 2501.18821v3

Authors (5): Mohammad Fatahi, Danial Sadrian Zadeh, Benyamin Ghojogh, Behzad Moshiri, Otman Basir

Intelligent transportation systems (ITS) play a pivotal role in modern infrastructure but face security risks due to the broadcast-based nature of the in-vehicle Controller Area Network (CAN) buses. While numerous machine learning models and strategies have been proposed to detect CAN anomalies, existing approaches lack robustness evaluations and fail to comprehensively detect attacks because they focus on a subset of dominant structures of anomalies. To overcome these limitations, the current study proposes a cascade feature-level spatiotemporal fusion framework that integrates the spatial features and temporal features through a two-parameter genetic algorithm (2P-GA)-optimized cascade architecture to cover all dominant structures of anomalies. Extensive paired t-test analysis confirms that the model achieves an AUC-ROC of 0.9987, demonstrating robust anomaly detection capabilities. The Spatial Module improves the precision by approximately 4%, while the Temporal Module compensates for recall losses, ensuring high true positive rates. The proposed framework detects all attack types with 100% accuracy on the CAR-HACKING dataset, outperforming state-of-the-art methods. This study provides a validated, robust solution for real-world CAN security challenges.



Article 1805

Title@2025-06-05 (4): FaCTR: Factorized Channel-Temporal Representation Transformers for Efficient Time Series Forecasting

Title: FaCTR: Factorized Channel-Temporal Representation Transformers for Efficient Time Series Forecasting FaCTR: Factorized Channel-Temporal Representation Transformers für effiziente Zeitreihenprognosen FACTR: 高效时间序列预报的有条频道-临时代表制变革器 2506.05597v1

Authors (2): Yash Vijay, Harini Subramanyan

While Transformers excel in language and vision, where inputs are semantically rich and exhibit univariate dependency structures, their architectural complexity leads to diminishing returns in time series forecasting. Time series data is characterized by low per-timestep information density and complex dependencies across channels and covariates, requiring conditioning on structured variable interactions. To address this mismatch and overparameterization, we propose FaCTR, a lightweight spatiotemporal Transformer with an explicitly structural design. FaCTR injects dynamic, symmetric cross-channel interactions, modeled via a low-rank Factorization Machine, into temporally contextualized patch embeddings through a learnable gating mechanism. It further encodes static and dynamic covariates for multivariate conditioning. Despite its compact design, FaCTR achieves state-of-the-art performance on eleven public forecasting benchmarks spanning both short-term and long-term horizons, with its largest variant using close to only 400K parameters, on average 50x smaller than competitive spatiotemporal transformer baselines. In addition, its structured design enables interpretability through cross-channel influence scores, an essential requirement for real-world decision-making. Finally, FaCTR supports self-supervised pretraining, positioning it as a compact yet versatile foundation for downstream time series tasks.



Article 1806

Title@2025-06-05 (4): Zero-shot protein stability prediction by inverse folding models: a free energy interpretation

Title: Zero-shot protein stability prediction by inverse folding models: a free energy interpretation Null-Shot-Proteinstabilitätsprognose durch inverse Faltmodelle: eine freie Energieinterpretation 通过反折叠模型进行零射蛋白稳定性预测:免费能源解释 2506.05596v1

Authors (7): Jes Frellsen, Maher M. Kassem, Tone Bengtsen, Lars Olsen, Kresten Lindorff-Larsen, Jesper Ferkinghoff-Borg, Wouter Boomsma

Inverse folding models have proven to be highly effective zero-shot predictors of protein stability. Despite this success, the link between the amino acid preferences of an inverse folding model and the free-energy considerations underlying thermodynamic stability remains incompletely understood. A better understanding would be of interest not only from a theoretical perspective, but also potentially provide the basis for stronger zero-shot stability prediction. In this paper, we take steps to clarify the free-energy foundations of inverse folding models. Our derivation reveals the standard practice of likelihood ratios as a simplistic approximation and suggests several paths towards better estimates of the relative stability. We empirically assess these approaches and demonstrate that considerable gains in zero-shot performance can be achieved with fairly simple means.



Article 1807

Title@2025-06-05 (4): Unsupervised Latent Pattern Analysis for Estimating Type 2 Diabetes Risk in Undiagnosed Populations

Title: Unsupervised Latent Pattern Analysis for Estimating Type 2 Diabetes Risk in Undiagnosed Populations Unüberwachte Latent Pattern Analyse zur Schätzung des Typ-2-Diabetes-Risikos in nicht diagnostizierten Populationen 未经监督的对未诊断的人群2型糖尿病风险估算的 2505.21824v2

Authors (3): Praveen Kumar, Vincent T. Metzger, Scott A. Malec

The global prevalence of diabetes, particularly type 2 diabetes mellitus (T2DM), is rapidly increasing, posing significant health and economic challenges. T2DM not only disrupts blood glucose regulation but also damages vital organs such as the heart, kidneys, eyes, nerves, and blood vessels, leading to substantial morbidity and mortality. In the US alone, the economic burden of diagnosed diabetes exceeded $400 billion in 2022. Early detection of individuals at risk is critical to mitigating these impacts. While machine learning approaches for T2DM prediction are increasingly adopted, many rely on supervised learning, which is often limited by the lack of confirmed negative cases. To address this limitation, we propose a novel unsupervised framework that integrates Non-negative Matrix Factorization (NMF) with statistical techniques to identify individuals at risk of developing T2DM. Our method identifies latent patterns of multimorbidity and polypharmacy among diagnosed T2DM patients and applies these patterns to estimate the T2DM risk in undiagnosed individuals. By leveraging data-driven insights from comorbidity and medication usage, our approach provides an interpretable and scalable solution that can assist healthcare providers in implementing timely interventions, ultimately improving patient outcomes and potentially reducing the future health and economic burden of T2DM.



Article 1808

Title@2025-06-05 (4): Nonlinear Causal Discovery through a Sequential Edge Orientation Approach

Title: Nonlinear Causal Discovery through a Sequential Edge Orientation Approach Nichtlineare Causal Discovery durch einen sequentiellen Edge-Orientierungsansatz 通过序列边缘方向方法的非线性因果发现 2506.05590v1

Authors (2): Stella Huang, Qing Zhou

Recent advances have established the identifiability of a directed acyclic graph (DAG) under additive noise models (ANMs), spurring the development of various causal discovery methods. However, most existing methods make restrictive model assumptions, rely heavily on general independence tests, or require substantial computational time. To address these limitations, we propose a sequential procedure to orient undirected edges in a completed partial DAG (CPDAG), representing an equivalence class of DAGs, by leveraging the pairwise additive noise model (PANM) to identify their causal directions. We prove that this procedure can recover the true causal DAG assuming a restricted ANM. Building on this result, we develop a novel constraint-based algorithm for learning causal DAGs under nonlinear ANMs. Given an estimated CPDAG, we develop a ranking procedure that sorts undirected edges by their adherence to the PANM, which defines an evaluation order of the edges. To determine the edge direction, we devise a statistical test that compares the log-likelihood values, evaluated with respect to the competing directions, of a sub-graph comprising just the candidate nodes and their identified parents in the partial DAG. We further establish the structural learning consistency of our algorithm in the large-sample limit. Extensive experiments on synthetic and real-world datasets demonstrate that our method is computationally efficient, robust to model misspecification, and consistently outperforms many existing nonlinear DAG learning methods.



Article 1809

Title@2025-06-05 (4): MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark

Title: MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark MMTU: Ein massiver Multi-Task-Tabellen-Verstands- und Orientierungs-Benchmark MMTU: 大规模多任务表格理解和说明理由基准 2506.05587v1

Authors (9): Junjie Xing, Yeye He, Mengyu Zhou, Haoyu Dong, Shi Han, Lingjiao Chen, Dongmei Zhang, Surajit Chaudhuri, H. V. Jagadish

Tables and table-based use cases play a crucial role in many important real-world applications, such as spreadsheets, databases, and computational notebooks, which traditionally require expert-level users like data engineers, data analysts, and database administrators to operate. Although LLMs have shown remarkable progress in working with tables (e.g., in spreadsheet and database copilot scenarios), comprehensive benchmarking of such capabilities remains limited. In contrast to an extensive and growing list of NLP benchmarks, evaluations of table-related tasks are scarce, and narrowly focus on tasks like NL-to-SQL and Table-QA, overlooking the broader spectrum of real-world tasks that professional users face. This gap limits our understanding and model progress in this important area. In this work, we introduce MMTU, a large-scale benchmark with over 30K questions across 25 real-world table tasks, designed to comprehensively evaluate models' ability to understand, reason, and manipulate real tables at the expert level. These tasks are drawn from decades’ worth of computer science research on tabular data, with a focus on complex table tasks faced by professional users. We show that MMTU requires a combination of skills – including table understanding, reasoning, and coding – that remain challenging for today’s frontier models, where even frontier reasoning models like OpenAI o4-mini and DeepSeek R1 score only around 60%, suggesting significant room for improvement. We highlight key findings in our evaluation using MMTU and hope that this benchmark drives further advances in understanding and developing foundation models for structured data processing and analysis. Our code and data are available at https://github.com/MMTU-Benchmark/MMTU and https://huggingface.co/datasets/MMTU-benchmark/MMTU.



Article 1810

Title@2025-06-05 (4): CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions

Title: CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions CoFrNets: Interpretierte Neuralarchitektur Inspiriert von fortgesetzten Brüchen CoFrNets:受连续分数启发的可解释神经结构 2506.05586v1

Authors (6): Isha Puri, Amit Dhurandhar, Tejaswini Pedapati, Kartikeyan Shanmugam, Dennis Wei, Kush R. Varshney

In recent years there has been a considerable amount of research on local post hoc explanations for neural networks. However, work on building interpretable neural architectures has been relatively sparse. In this paper, we present a novel neural architecture, CoFrNet, inspired by the form of continued fractions which are known to have many attractive properties in number theory, such as fast convergence of approximations to real numbers. We show that CoFrNets can be efficiently trained as well as interpreted leveraging their particular functional form. Moreover, we prove that such architectures are universal approximators based on a proof strategy that is different than the typical strategy used to prove universal approximation results for neural networks based on infinite width (or depth), which is likely to be of independent interest. We experiment on nonlinear synthetic functions and are able to accurately model as well as estimate feature attributions and even higher order terms in some cases, which is a testament to the representational power as well as interpretability of such architectures. To further showcase the power of CoFrNets, we experiment on seven real datasets spanning tabular, text and image modalities, and show that they are either comparable or significantly better than other interpretable models and multilayer perceptrons, sometimes approaching the accuracies of state-of-the-art models.
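As a rough illustration of the continued-fraction form the abstract refers to, the sketch below evaluates a single "ladder" a_0 + 1/(a_1 + 1/(a_2 + ...)). The assumption that each term a_k is a linear function of the input, the names `continued_fraction` and `cofr_ladder`, and the clamping of near-zero denominators are illustrative choices, not details taken from the abstract.

```python
def continued_fraction(terms, eps=1e-12):
    """Evaluate a_0 + 1/(a_1 + 1/(a_2 + ...)) from the innermost term outward."""
    value = terms[-1]
    for a in reversed(terms[:-1]):
        # Assumption: clamp tiny denominators to avoid division by zero.
        if abs(value) < eps:
            value = eps if value >= 0 else -eps
        value = a + 1.0 / value
    return value


def cofr_ladder(x, weights):
    """Hypothetical ladder: the k-th term is the dot product of weights[k] and x."""
    terms = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return continued_fraction(terms)


if __name__ == "__main__":
    # The all-ones continued fraction converges quickly to the golden ratio.
    print(continued_fraction([1.0] * 40))
    # A two-term ladder on a one-dimensional input.
    print(cofr_ladder([2.0], [[0.5], [1.0]]))
```

The all-ones example shows the fast convergence that the abstract cites as a motivating property of continued fractions.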



Article 1811

Title@2025-06-05 (4): TabFlex: Scaling Tabular Learning to Millions with Linear Attention

Title: TabFlex: Scaling Tabular Learning to Millions with Linear Attention TabFlex: Tabulares Lernen zu Millionen mit linearer Aufmerksamkeit skalieren TabFlex: 以线性关注方式向百万上百万上百万人推广表格学习 2506.05584v1

Authors (4): Yuchen Zeng, Tuan Dinh, Wonjun Kang, Andreas C Mueller

Leveraging the in-context learning (ICL) capability of Large Language Models (LLMs) for tabular classification has gained significant attention for its training-free adaptability across diverse datasets. Recent advancements, like TabPFN, excel in small-scale tabular datasets but struggle to scale for large and complex datasets. Our work enhances the efficiency and scalability of TabPFN for larger datasets by incorporating linear attention mechanisms as a scalable alternative to quadratic-complexity self-attention. Our model, TabFlex, efficiently handles tabular datasets with thousands of features and hundreds of classes, scaling seamlessly to millions of samples. For instance, TabFlex processes the poker-hand dataset with over a million samples in just 5 seconds. Our extensive evaluations demonstrate that TabFlex can achieve over a 2x speedup compared to TabPFN and a 1.5x speedup over XGBoost, outperforming 25 tested baselines in terms of efficiency across a diverse range of datasets. Furthermore, TabFlex remains highly effective on large-scale datasets, delivering strong performance with significantly reduced computational costs, especially when combined with data-efficient techniques such as dimensionality reduction and data sampling.
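To make the contrast between quadratic and linear attention concrete, here is a minimal sketch of one common kernelized (non-causal) linear attention, using the feature map elu(x) + 1. The abstract does not say which linear attention variant TabFlex uses, so the feature map and the function names below are assumptions chosen for illustration. Both functions compute the same output; the second reorders the sums so that cost grows linearly with the number of keys.

```python
import math


def feature_map(x):
    """elu(x) + 1: a positive feature map (an assumed, commonly used choice)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def quadratic_attention(Q, K, V):
    """Kernel attention computed pairwise: O(len(Q) * len(K)) similarity terms."""
    phi_K = [feature_map(k) for k in K]
    out = []
    for q in Q:
        phi_q = feature_map(q)
        weights = [dot(phi_q, pk) for pk in phi_K]
        norm = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / norm
                    for j in range(len(V[0]))])
    return out


def linear_attention(Q, K, V):
    """Same result, but keys and values are summarized once, then reused per query."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # running sum of phi(k) v^T
    z = [0.0] * d                       # running sum of phi(k)
    for k, v in zip(K, V):
        phi_k = feature_map(k)
        for i in range(d):
            z[i] += phi_k[i]
            for j in range(dv):
                S[i][j] += phi_k[i] * v[j]
    out = []
    for q in Q:
        phi_q = feature_map(q)
        norm = dot(phi_q, z)
        out.append([sum(phi_q[i] * S[i][j] for i in range(d)) / norm
                    for j in range(dv)])
    return out
```

The summary `S` has a size that depends only on the feature dimensions, not on the number of samples, which is what makes this formulation attractive for very long contexts.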



Article 1812

Title@2025-06-05 (4): Conformal Prediction Adaptive to Unknown Subpopulation Shifts

Title: Conformal Prediction Adaptive to Unknown Subpopulation Shifts Konforme Vorhersage Anpassung an unbekannte Subpopulationsverschiebungen 适应未知亚人口变化 2506.05583v1

Authors (4): Nien-Shao Wang, Duygu Nur Yaldiz, Yavuz Faruk Bakman, Sai Praneeth Karimireddy

Conformal prediction is widely used to equip black-box machine learning models with uncertainty quantification enjoying formal coverage guarantees. However, these guarantees typically break down in the presence of distribution shifts, where the data distribution at test time differs from the training (or calibration-time) distribution. In this work, we address subpopulation shifts, where the test environment exhibits an unknown and differing mixture of subpopulations compared to the calibration data. We propose new methods that provably adapt conformal prediction to such shifts, ensuring valid coverage without requiring explicit knowledge of subpopulation structure. Our algorithms scale to high-dimensional settings and perform effectively in realistic machine learning tasks. Extensive experiments on vision (with vision transformers) and language (with large language models) benchmarks demonstrate that our methods reliably maintain coverage and control risk in scenarios where standard conformal prediction fails.
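For context, the sketch below shows standard split conformal prediction for regression, i.e. the baseline whose guarantee relies on calibration and test data being exchangeable. It is not the shift-adaptive method proposed in the paper; the function names and the absolute-residual score are illustrative assumptions.

```python
import math


def conformal_quantile(calibration_scores, alpha):
    """Return the ceil((n + 1)(1 - alpha))-th smallest calibration score.

    If that rank exceeds n, the interval must be unbounded to keep the
    1 - alpha coverage guarantee, so infinity is returned.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_scores)[rank - 1]


def prediction_interval(point_prediction, qhat):
    """Symmetric interval, assuming scores are absolute residuals |y - f(x)|."""
    return (point_prediction - qhat, point_prediction + qhat)


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # toy absolute residuals
    qhat = conformal_quantile(scores, alpha=0.5)
    print(prediction_interval(10.0, qhat))
```

When the test mixture of subpopulations differs from the calibration mixture, the calibration scores no longer represent test-time errors, which is the failure mode the abstract targets.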



Article 1813

Title@2025-06-05 (4): On the Importance of Gaussianizing Representations

Title: On the Importance of Gaussianizing Representations Über die Bedeutung gaussianisierender Repräsentationen 关于 “ 高机构化 “ 代表的重要性 2505.00685v2

Authors (2): Daniel Eftekhari, Vardan Papyan

The normal distribution plays a central role in information theory - it is at the same time the best-case signal and worst-case noise distribution, has the greatest representational capacity of any distribution, and offers an equivalence between uncorrelatedness and independence for joint distributions. Accounting for the mean and variance of activations throughout the layers of deep neural networks has had a significant effect on facilitating their effective training, but seldom has a prescription for precisely what distribution these activations should take, and how this might be achieved, been offered. Motivated by the information-theoretic properties of the normal distribution, we address this question and concurrently present normality normalization: a novel normalization layer which encourages normality in the feature representations of neural networks using the power transform and employs additive Gaussian noise during training. Our experiments comprehensively demonstrate the effectiveness of normality normalization, in regards to its generalization performance on an array of widely used model and dataset combinations, its strong performance across various common factors of variation such as model width, depth, and training minibatch size, its suitability for usage wherever existing normalization layers are conventionally used, and as a means to improving model robustness to random perturbations.



Article 1814

Title@2025-06-05 (4): The Impact of Inference Acceleration on Bias of LLMs

Title: The Impact of Inference Acceleration on Bias of LLMs Die Auswirkungen von Inferenzbeschleunigung auf LLM-Bias 推推加速对LLMM的偏见的影响 2410.22118v3

Authors (4): Elisabeth Kirsten, Ivan Habernal, Vedant Nanda, Muhammad Bilal Zafar

The last few years have seen unprecedented advances in the capabilities of Large Language Models (LLMs). These advancements promise to benefit a vast array of application domains. However, due to their immense size, performing inference with LLMs is both costly and slow. Consequently, a plethora of recent work has proposed strategies to enhance inference efficiency, e.g., quantization, pruning, and caching. These acceleration strategies reduce the inference cost and latency, often by several factors, while maintaining much of the predictive performance measured via common benchmarks. In this work, we explore another critical aspect of LLM performance: demographic bias in model generations due to inference acceleration optimizations. Using a wide range of metrics, we probe bias in model outputs from a number of angles. Analysis of outputs before and after inference acceleration shows significant change in bias. Worryingly, these bias effects are complex and unpredictable. A combination of an acceleration strategy and bias type may show little bias change in one model but may lead to a large effect in another. Our results highlight a need for in-depth and case-by-case evaluation of model bias after it has been modified to accelerate inference.



Article 1815

Title@2025-06-05 (4): Collaborative Learning in Agentic Systems: A Collective AI is Greater Than the Sum of Its Parts

Title: Collaborative Learning in Agentic Systems: A Collective AI is Greater Than the Sum of Its Parts Gemeinsames Lernen in Agentensystemen: Eine kollektive KI ist größer als die Summe ihrer Teile 危险系统合作学习:集体AI大于其各部分的总和 2506.05577v1

Authors (10): Saptarshi Nath, Christos Peridis, Eseoghene Benjamin, Xinran Liu, Soheil Kolouri, Peter Kinnell, Zexin Li, Cong Liu, Shirin Dora, Andrea Soltoggio

Agentic AI has gained significant interest as a research paradigm focused on autonomy, self-directed learning, and long-term reliability of decision making. Real-world agentic systems operate in decentralized settings on a large set of tasks or data distributions with constraints such as limited bandwidth, asynchronous execution, and the absence of a centralized model or even common objectives. We posit that exploiting previously learned skills, task similarities, and communication capabilities in a collective of agentic AI are challenging but essential elements to enabling scalability, open-endedness, and beneficial collaborative learning dynamics. In this paper, we introduce Modular Sharing and Composition in Collective Learning (MOSAIC), an agentic algorithm that allows multiple agents to independently solve different tasks while also identifying, sharing, and reusing useful machine-learned knowledge, without coordination, synchronization, or centralized control. MOSAIC combines three mechanisms: (1) modular policy composition via neural network masks, (2) cosine similarity estimation using Wasserstein embeddings for knowledge selection, and (3) asynchronous communication and policy integration. Results on a set of RL benchmarks show that MOSAIC has a greater sample efficiency than isolated learners, i.e., it learns significantly faster, and in some cases, finds solutions to tasks that cannot be solved by isolated learners. The collaborative learning and sharing dynamics are also observed to result in the emergence of ideal curricula of tasks, from easy to hard. These findings support the case for collaborative learning in agentic systems to achieve better and continuously evolving performance both at the individual and collective levels.



Article 1816

Title@2025-06-05 (4): When can in-context learning generalize out of task distribution?

Title: When can in-context learning generalize out of task distribution? Wann kann das In-Context-Lernen aus der Aufgabenverteilung verallgemeinern? 什么时候能通俗易懂地了解任务分配的概况? 2506.05574v1

Authors (4): Chase Goddard, Lindsay M. Smith, Vudtiwat Ngampruetikorn, David J. Schwab

In-context learning (ICL) is a remarkable capability of pretrained transformers that allows models to generalize to unseen tasks after seeing only a few examples. We investigate empirically the conditions necessary on the pretraining distribution for ICL to emerge and generalize out-of-distribution. Previous work has focused on the number of distinct tasks necessary in the pretraining dataset. Here, we use a different notion of task diversity to study the emergence of ICL in transformers trained on linear functions. We find that as task diversity increases, transformers undergo a transition from a specialized solution, which exhibits ICL only within the pretraining task distribution, to a solution which generalizes out of distribution to the entire task space. We also investigate the nature of the solutions learned by the transformer on both sides of the transition, and observe similar transitions in nonlinear regression problems. We construct a phase diagram to characterize how our concept of task diversity interacts with the number of pretraining tasks. In addition, we explore how factors such as the depth of the model and the dimensionality of the regression problem influence the transition.



Article 1817

Title@2025-06-05 (4): Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning

Title: Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning Ravan: Multi-Head Low-Rank Anpassung für Federated Fine-Tuning Ravan:针对联邦罚款的多级低兰克适应措施 2506.05568v1

Authors (4): Arian Raje, Baris Askin, Divyansh Jhunjhunwala, Gauri Joshi

Large language models (LLMs) have not yet effectively leveraged the vast amounts of edge-device data, and federated learning (FL) offers a promising paradigm to collaboratively fine-tune LLMs without transferring private edge data to the cloud. To operate within the computation and communication constraints of edge devices, recent literature on federated fine-tuning of LLMs proposes the use of low-rank adaptation (LoRA) and similar parameter-efficient methods. However, LoRA-based methods suffer from accuracy degradation in FL settings, primarily because of data and computational heterogeneity across clients. We propose Ravan, an adaptive multi-head LoRA method that balances parameter efficiency and model expressivity by reparameterizing the weight updates as the sum of multiple LoRA heads $s_i\textbf{B}_i\textbf{H}_i\textbf{A}_i$ in which only the core matrices $\textbf{H}_i$ and their lightweight scaling factors $s_i$ are trained. These trainable scaling factors let the optimization focus on the most useful heads, recovering a higher-rank approximation of the full update without increasing the number of communicated parameters since clients upload $s_i\textbf{H}_i$ directly. Experiments on vision and language benchmarks show that Ravan improves test accuracy by 2-8% over prior parameter-efficient baselines, making it a robust and scalable solution for federated fine-tuning of LLMs.



Article 1818

Title@2025-06-05 (4): Partially-Supervised Neural Network Model For Quadratic Multiparametric Programming

Title: Partially-Supervised Neural Network Model For Quadratic Multiparametric Programming Teilweise überwachtes neurales Netzwerkmodell für die quadratische multiparametrische Programmierung 半二次多参数方案拟订部分运行的神经网络模型 2506.05567v1

Authors (3): Fuat Can Beylunioglu, Mehrdad Pirnia, P. Robert Duimering

Neural Networks (NN) with ReLU activation functions are used to model multiparametric quadratic optimization problems (mp-QP) in diverse engineering applications. Researchers have suggested leveraging the piecewise affine property of deep NN models to solve mp-QP with linear constraints, which also exhibit piecewise affine behaviour. However, traditional deep NN applications to mp-QP fall short of providing optimal and feasible predictions, even when trained on large datasets. This study proposes a partially-supervised NN (PSNN) architecture that directly represents the mathematical structure of the global solution function. In contrast to generic NN training approaches, the proposed PSNN method derives a large proportion of model weights directly from the mathematical properties of the optimization problem, producing more accurate solutions despite significantly smaller training data sets. Many energy management problems are formulated as QP, so we apply the proposed approach to energy systems (specifically DC optimal power flow) to demonstrate proof of concept. Model performance in terms of solution accuracy and speed of predictions was compared against a commercial solver and a generic Deep NN model based on classical training. Results show KKT sufficient conditions for PSNN consistently outperform generic NN architectures with classical training using far less data, including when tested on extreme, out-of-training distribution test data. Given its speed advantages over traditional solvers, the PSNN model can quickly produce optimal and feasible solutions within a second for millions of input parameters sampled from a distribution of stochastic demands and renewable generator dispatches, which can be used for simulations and long term planning.



Article 1819

Title@2025-06-05 (4): Applying Informer for Option Pricing: A Transformer-Based Approach

Title: Applying Informer for Option Pricing: A Transformer-Based Approach Informer für Optionspreise anwenden: Ein transformerbasierter Ansatz 备选定价应用内线:基于变换器的方法 2506.05565v1

Authors (2): Feliks Bańka, Jarosław A. Chudziak

Accurate option pricing is essential for effective trading and risk management in financial markets, yet it remains challenging due to market volatility and the limitations of traditional models like Black-Scholes. In this paper, we investigate the application of the Informer neural network for option pricing, leveraging its ability to capture long-term dependencies and dynamically adjust to market fluctuations. This research contributes to the field of financial forecasting by introducing Informer’s efficient architecture to enhance prediction accuracy and provide a more adaptable and resilient framework compared to existing methods. Our results demonstrate that Informer outperforms traditional approaches in option pricing, advancing the capabilities of data-driven financial forecasting in this domain.



Article 1820

Title@2025-06-05 (4): Relational Conformal Prediction for Correlated Time Series

Title: Relational Conformal Prediction for Correlated Time Series Beziehungskonforme Vorhersage für korrelierte Zeitreihen Cor相关时间序列的通货膨胀正规预测 2502.09443v2

Authors (5): Andrea Cini, Alexander Jenkins, Danilo Mandic, Cesare Alippi, Filippo Maria Bianchi

We address the problem of uncertainty quantification in time series forecasting by exploiting observations at correlated sequences. Relational deep learning methods leveraging graph representations are among the most effective tools for obtaining point estimates from spatiotemporal data and correlated time series. However, the problem of exploiting relational structures to estimate the uncertainty of such predictions has been largely overlooked in the same context. To this end, we propose a novel distribution-free approach based on the conformal prediction framework and quantile regression. Despite the recent applications of conformal prediction to sequential data, existing methods operate independently on each target time series and do not account for relationships among them when constructing the prediction interval. We fill this void by introducing a novel conformal prediction method based on graph deep learning operators. Our approach, named Conformal Relational Prediction (CoRel), does not require the relational structure (graph) to be known a priori and can be applied on top of any pre-trained predictor. Additionally, CoRel includes an adaptive component to handle non-exchangeable data and changes in the input time series. Our approach provides accurate coverage and achieves state-of-the-art uncertainty quantification in relevant benchmarks.



Article 1821

Title@2025-06-05 (4): Mixed-Precision Conjugate Gradient Solvers with RL-Driven Precision Tuning

Title: Mixed-Precision Conjugate Gradient Solvers with RL-Driven Precision Tuning Mixed-Precision Conjugate Gradient Solver mit RL-getriebener Präzisionstuning RL-Driven 精确调试混合精密共分梯度溶剂 2504.14268v4

Authors (1): Xinye Chen

This paper presents a novel reinforcement learning (RL) framework for dynamically optimizing numerical precision in the preconditioned conjugate gradient (CG) method. By modeling precision selection as a Markov Decision Process (MDP), we employ Q-learning to adaptively assign precision levels to key operations, striking an optimal balance between computational efficiency and numerical accuracy, while ensuring stability through double-precision scalar computations and residual computing. In practice, the algorithm is trained on a set of data and subsequently performs inference for precision selection on out-of-sample data, without requiring re-analysis or retraining for new datasets. This enables the method to adapt seamlessly to new problem instances without the computational overhead of recalibration. Our results demonstrate the effectiveness of RL in enhancing the solver’s performance, marking the first application of RL to mixed-precision numerical methods. The findings highlight the approach’s practical advantages, robustness, and scalability, providing valuable insights into its integration with iterative solvers and paving the way for AI-driven advancements in scientific computing.



Article 1822

Title@2025-06-05 (4): DART-Vetter: A Deep LeARning Tool for automatic triage of exoplanet candidates

Title: DART-Vetter: A Deep LeARning Tool for automatic triage of exoplanet candidates DART-Vetter: Ein Deep LeARning Tool für die automatische Triage von Exoplaneten-Kandidaten DART – – 设计者:一个用于自动分流异丙型板块候选物的深深LeARning工具 2506.05556v1

Authors (15): Stefano Fiscale, Laura Inno, Alessandra Rotundi, Angelo Ciaramella, Alessio Ferone, Christian Magliano, Luca Cacciapuoti, Veselin Kostov, Elisa Quintana, Giovanni Covone, Maria Teresa Muscari Tomajoli, Vito Saggese, Luca Tonietti, Antonio Vanzanella, Vincenzo Della Corte

In the identification of new planetary candidates in transit surveys, the employment of Deep Learning models has proved essential to efficiently analyse a continuously growing volume of photometric observations. To further improve the robustness of these models, it is necessary to exploit the complementarity of data collected from different transit surveys such as NASA’s Kepler, Transiting Exoplanet Survey Satellite (TESS), and, in the near future, the ESA PLAnetary Transits and Oscillation of stars (PLATO) mission. In this work, we present a Deep Learning model, named DART-Vetter, able to distinguish planetary candidates (PC) from false positive signals (NPC) detected by any potential transiting survey. DART-Vetter is a Convolutional Neural Network that processes only the light curves folded on the period of the relative signal, featuring a simpler and more compact architecture with respect to other triaging and/or vetting models available in the literature. We trained and tested DART-Vetter on several datasets of publicly available and homogeneously labelled TESS and Kepler light curves in order to prove the effectiveness of our model. Despite its simplicity, DART-Vetter achieves highly competitive triaging performance, with a recall rate of 91% on an ensemble of TESS and Kepler data, when compared to Exominer and Astronet-Triage. Its compact, open-source and easy-to-replicate architecture makes DART-Vetter a particularly useful tool for automating triaging procedures or assisting human vetters, showing a discrete generalization on TCEs with Multiple Event Statistic (MES) > 20 and orbital period < 50 days.



Article 1823

Title@2025-06-05 (4): Multivariate Temporal Regression at Scale: A Three-Pillar Framework Combining ML, XAI, and NLP

Title: Multivariate Temporal Regression at Scale: A Three-Pillar Framework Combining ML, XAI, and NLP Multivariate Temporale Regression im Maßstab: Ein Drei-Pillar-Framework kombiniert ML, XAI und NLP 规模上多变时间回落:三边框架组合 ML、 XAI 和 NLP 2504.02151v2

Authors (2): Jiztom Kavalakkatt Francis, Matthew J Darr

This paper introduces a novel framework that accelerates the discovery of actionable relationships in high-dimensional temporal data by integrating machine learning (ML), explainable AI (XAI), and natural language processing (NLP) to enhance data quality and streamline workflows. Traditional methods often fail to recognize complex temporal relationships, leading to noisy, redundant, or biased datasets. Our approach combines ML-driven pruning to identify and mitigate low-quality samples, XAI-based interpretability to validate critical feature interactions, and NLP for future contextual validation, reducing the time required to uncover actionable insights by 40-60%. Evaluated on real-world agricultural and synthetic datasets, the framework significantly improves performance metrics (e.g., MSE, R2, MAE) and computational efficiency, with hardware-agnostic scalability across diverse platforms. While long-term real-world impacts (e.g., cost savings, sustainability gains) are pending, this methodology provides an immediate pathway to accelerate data-centric AI in dynamic domains like agriculture and energy, enabling faster iteration cycles for domain experts.



Article 1824

Title@2025-06-05 (4): Online Conformal Model Selection for Nonstationary Time Series

Title: Online Conformal Model Selection for Nonstationary Time Series Online-Konforme Modellauswahl für nichtstationäre Zeitreihen 非固定时间系列在线非正式模型选择 2506.05544v1

Authors (2): Shibo Li, Yao Zheng

This paper introduces the MPS (Model Prediction Set), a novel framework for online model selection for nonstationary time series. Classical model selection methods, such as information criteria and cross-validation, rely heavily on the stationarity assumption and often fail in dynamic environments which undergo gradual or abrupt changes over time. Yet real-world data are rarely stationary, and model selection under nonstationarity remains a largely open problem. To tackle this challenge, we combine conformal inference with model confidence sets to develop a procedure that adaptively selects models best suited to the evolving dynamics at any given time. Concretely, the MPS updates in real time a confidence set of candidate models that covers the best model for the next time period with a specified long-run probability, while adapting to nonstationarity of unknown forms. Through simulations and real-world data analysis, we demonstrate that MPS reliably and efficiently identifies optimal models under nonstationarity, an essential capability lacking in offline methods. Moreover, MPS frequently produces high-quality sets with small cardinality, whose evolution offers deeper insights into changing dynamics. As a generic framework, MPS accommodates any data-generating process, data structure, model class, training method, and evaluation metric, making it broadly applicable across diverse problem settings.



Article 1825

Title@2025-06-05 (4): Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data

Title: Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data Agentomics-ML: Autonomes Machine Learning Experimentation Agent für Genomische und Transkriptionsdaten ML:基因组和转基因数据自动机械学习实验代理 2506.05542v1

Authors (9): Vlastimil Martinek, Andrea Gariboldi, Dimosthenis Tzimotoudis, Aitor Alberdi Escudero, Edward Blake, David Cechak, Luke Cassar, Alessandro Balestrucci, Panagiotis Alexiou

The adoption of machine learning (ML) and deep learning methods has revolutionized molecular medicine by driving breakthroughs in genomics, transcriptomics, drug discovery, and biological systems modeling. The increasing quantity, multimodality, and heterogeneity of biological datasets demand automated methods that can produce generalizable predictive models. Recent developments in large language model-based agents have shown promise for automating end-to-end ML experimentation on structured benchmarks. However, when applied to heterogeneous computational biology datasets, these methods struggle with generalization and success rates. Here, we introduce Agentomics-ML, a fully autonomous agent-based system designed to produce a classification model and the necessary files for reproducible training and inference. Our method follows predefined steps of an ML experimentation process, repeatedly interacting with the file system through Bash to complete individual steps. Once an ML model is produced, training and validation metrics provide scalar feedback to a reflection step to identify issues such as overfitting. This step then creates verbal feedback for future iterations, suggesting adjustments to steps such as data representation, model architecture, and hyperparameter choices. We have evaluated Agentomics-ML on several established genomic and transcriptomic benchmark datasets and show that it outperforms existing state-of-the-art agent-based methods in both generalization and success rates. While state-of-the-art models built by domain experts still lead in absolute performance on the majority of the computational biology datasets used in this work, Agentomics-ML narrows the gap for fully autonomous systems and achieves state-of-the-art performance on one of the used benchmark datasets. The code is available at https://github.com/BioGeMT/Agentomics-ML.



Article 1826

Title@2025-06-05 (4): SocialDF: Benchmark Dataset and Detection Model for Mitigating Harmful Deepfake Content on Social Media Platforms

Title: SocialDF: Benchmark Dataset and Detection Model for Mitigating Harmful Deepfake Content on Social Media Platforms SocialDF: Benchmark Dataset und Detection Model zur Minderung schädlicher Deepfake-Inhalte auf Social Media-Plattformen 社会DF:社会媒体平台减少有害深层内容的基准数据集和检测模式 2506.05538v1

Authors (6): Arnesh Batra, Anushk Kumar, Jashn Khemani, Arush Gumber, Arhan Jain, Somil Gupta

The rapid advancement of deep generative models has significantly improved the realism of synthetic media, presenting both opportunities and security challenges. While deepfake technology has valuable applications in entertainment and accessibility, it has emerged as a potent vector for misinformation campaigns, particularly on social media. Existing detection frameworks struggle to distinguish between benign and adversarially generated deepfakes engineered to manipulate public perception. To address this challenge, we introduce SocialDF, a curated dataset reflecting real-world deepfake challenges on social media platforms. This dataset encompasses high-fidelity deepfakes sourced from various online ecosystems, ensuring broad coverage of manipulative techniques. We propose a novel LLM-based multi-factor detection approach that combines facial recognition, automated speech transcription, and a multi-agent LLM pipeline to cross-verify audio-visual cues. Our methodology emphasizes robust, multi-modal verification techniques that incorporate linguistic, behavioral, and contextual analysis to effectively discern synthetic media from authentic content.



Article 1827

Title@2025-06-05 (4): Faster Approx. Top-K: Harnessing the Full Power of Two Stages

Title: Faster Approx. Top-K: Harnessing the Full Power of Two Stages Schneller ca. Top-K: Die volle Kraft zweier Etappen nutzen 快速近似。 最高K级: 利用两个阶段的全功率 2506.04165v2

Authors (5): Yashas Samaga, Varun Yerram, Spandana Raj Babbula, Prateek Jain, Praneeth Netrapalli

We consider the Top-$K$ selection problem, which aims to identify the $K$ largest elements of an array. Top-$K$ selection arises in many machine learning algorithms and often becomes a bottleneck on accelerators, which are optimized for dense matrix multiplications. To address this problem, Chern et al. (2022) proposed a fast two-stage \textit{approximate} Top-$K$ algorithm: (i) partition the input array and select the top-$1$ element from each partition, (ii) sort this \textit{smaller subset} and return the top $K$ elements. In this paper, we consider a generalized version of this algorithm, where the first stage selects the top-$K'$ elements, for some $1 \leq K' \leq K$, from each partition. Our contributions are as follows: (i) we derive an expression for the expected recall of this generalized algorithm and show that choosing $K' > 1$ with fewer partitions in the first stage reduces the input size to the second stage more effectively while maintaining the same expected recall as the original algorithm, (ii) we derive a bound on the expected recall for the original algorithm of Chern et al. (2022) that is provably tighter by a factor of $2$ than the one in that paper, and (iii) we implement our algorithm on Cloud TPUv5e and achieve around an order of magnitude speedup over the original algorithm without sacrificing recall on real-world tasks.
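The two-stage scheme described in this abstract can be sketched in a few lines. This is a minimal CPU illustration, not the authors' TPU implementation: the function names, the strided partitioning, and the test data are assumptions made for the example.

```python
import heapq
import random


def two_stage_top_k(values, k, num_partitions, k_prime=1):
    """Approximate top-k: keep the top k_prime of each partition, then take
    the exact top k of the survivors (k_prime=1 is the original scheme)."""
    survivors = []
    # Stage 1: strided partitioning (an assumption; any fixed split would do).
    for p in range(num_partitions):
        bucket = values[p::num_partitions]
        survivors.extend(heapq.nlargest(k_prime, bucket))
    # Stage 2: exact top-k on the much smaller survivor set.
    return heapq.nlargest(k, survivors)


def recall(approx, exact):
    """Fraction of the exact top-k recovered by the approximate result."""
    return len(set(approx) & set(exact)) / len(exact)


random.seed(0)
data = random.sample(range(1_000_000), 4096)  # distinct values
exact = heapq.nlargest(16, data)
approx = two_stage_top_k(data, k=16, num_partitions=256, k_prime=2)
print("recall:", recall(approx, exact))
```

With `k_prime` equal to `k` the result is exact, because every global top-$K$ element is necessarily among the top $K$ of its own partition; smaller `k_prime` trades recall for a smaller second-stage input.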



Article 1828

Title@2025-06-05 (4): A new pathway to generative artificial intelligence by minimizing the maximum entropy

Title: A new pathway to generative artificial intelligence by minimizing the maximum entropy Ein neuer Weg zur generativen künstlichen Intelligenz durch Minimierung der maximalen Entropie 通过最大限度地减少最大倍增 实现人工智能基因化的新途径 2502.13287v2

Authors (2): Mattia Miotto, Lorenzo Monacelli

Generative artificial intelligence has revolutionized society. Current models are trained by minimizing the distance between the produced data and the training set. Consequently, development is plateauing, as these models are intrinsically data-hungry and challenging to direct during the generative process. To overcome these limitations, we introduce a paradigm shift through a framework in which we do not fit the training set but instead find the most informative yet least noisy representation of the data, simultaneously minimizing the entropy to reduce noise and maximizing it to remain unbiased via adversarial training. The result is a general physics-driven model that is data-efficient and flexible, making it possible to control and influence the generative process. Benchmarking shows that our approach outperforms variational autoencoders. We demonstrate the method's effectiveness in generating images, even with limited training data, and its unprecedented capability to customize the generation process a posteriori without any fine-tuning or retraining.



Article 1829

Title@2025-06-05 (4): MCMC-Correction of Score-Based Diffusion Models for Model Composition

Title: MCMC-Correction of Score-Based Diffusion Models for Model Composition MCMC-Korrektion von Score-basierten Diffusionsmodellen für die Modellzusammensetzung MC-基于记分传播模型模型的示范构成的校正 2307.14012v3

Authors (5): Anders Sjöberg, Jakob Lindqvist, Magnus Önnheim, Mats Jirstrand, Lennart Svensson

Diffusion models can be parameterized in terms of either a score or an energy function. The energy parameterization is attractive as it enables sampling procedures such as Markov Chain Monte Carlo (MCMC) that incorporate a Metropolis-Hastings (MH) correction step based on energy differences between proposed samples. Such corrections can significantly improve sampling quality, particularly in the context of model composition, where pre-trained models are combined to generate samples from novel distributions. Score-based diffusion models, on the other hand, are more widely adopted and come with a rich ecosystem of pre-trained models. However, they do not, in general, define an underlying energy function, making MH-based sampling inapplicable. In this work, we address this limitation by retaining the score parameterization and introducing a novel MH-like acceptance rule based on line integration of the score function. This allows the reuse of existing diffusion models while still combining the reverse process with various MCMC techniques, viewed as an instance of annealed MCMC. Through experiments on synthetic and real-world data, we show that our MH-like samplers offer comparable improvements to those obtained with energy-based models, without requiring explicit energy parameterization.
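The idea of replacing an energy difference with a line integral of the score can be illustrated as follows. This is a hedged sketch, not the paper's exact rule: it assumes a symmetric proposal, a straight-line integration path, and trapezoidal quadrature, and `gaussian_score` is a stand-in for a trained score network.

```python
import math


def log_density_ratio(score, x, x_new, num_points=33):
    """Approximate log p(x_new) - log p(x) as the line integral of the score
    along the straight segment from x to x_new (trapezoidal rule)."""
    diff = [b - a for a, b in zip(x, x_new)]
    total = 0.0
    for i in range(num_points):
        t = i / (num_points - 1)
        point = [a + t * d for a, d in zip(x, diff)]
        integrand = sum(s * d for s, d in zip(score(point), diff))
        weight = 0.5 if i in (0, num_points - 1) else 1.0
        total += weight * integrand
    return total / (num_points - 1)


def acceptance_probability(score, x, x_new):
    """MH-style acceptance probability, assuming a symmetric proposal."""
    return math.exp(min(0.0, log_density_ratio(score, x, x_new)))


def gaussian_score(x):
    """Score of a standard normal: grad log p(x) = -x (toy stand-in)."""
    return [-xi for xi in x]


print(acceptance_probability(gaussian_score, [1.0, 0.0], [0.0, 2.0]))
```

For the standard normal the integrand is linear in the path parameter, so the quadrature recovers the true log-ratio $-(\|x'\|^2 - \|x\|^2)/2$ up to rounding; for a learned score the integral is only an approximation.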



Article 1830

Title@2025-06-05 (4): Spectral Graph Neural Networks are Incomplete on Graphs with a Simple Spectrum

Title: Spectral Graph Neural Networks are Incomplete on Graphs with a Simple Spectrum Spektrale Graphen-Neuralnetze sind auf Graphen mit einem einfachen Spektrum unvollständig 光谱图形神经网络在带有简单光谱的图形上不完整 2506.05530v1

Authors (4): Snir Hordan, Maya Bechler-Speicher, Gur Lifshitz, Nadav Dym

Spectral features are widely incorporated within Graph Neural Networks (GNNs) to improve their expressive power, or their ability to distinguish among non-isomorphic graphs. One popular example is the usage of graph Laplacian eigenvectors for positional encoding in MPNNs and Graph Transformers. The expressive power of such Spectrally-enhanced GNNs (SGNNs) is usually evaluated via the k-WL graph isomorphism test hierarchy and homomorphism counting. Yet, these frameworks align poorly with the graph spectra, yielding limited insight into SGNNs’ expressive power. We leverage a well-studied paradigm of classifying graphs by their largest eigenvalue multiplicity to introduce an expressivity hierarchy for SGNNs. We then prove that many SGNNs are incomplete even on graphs with distinct eigenvalues. To mitigate this deficiency, we adapt rotation equivariant neural networks to the graph spectra setting to propose a method to provably improve SGNNs’ expressivity on simple spectrum graphs. We empirically verify our theoretical claims via an image classification experiment on the MNIST Superpixel dataset and eigenvector canonicalization on graphs from ZINC.



Article 1831

Title@2025-06-05 (4): Avoiding Death through Fear Intrinsic Conditioning

Title: Avoiding Death through Fear Intrinsic Conditioning Vermeiden des Todes durch Angst Intrinsische Konditionierung 避免因恐惧而死亡 2506.05529v1

Authors (4): Rodney Sanchez, Ferat Sahin, Alexander Ororbia, Jamison Heard

Biological and psychological concepts have inspired reinforcement learning algorithms to create new complex behaviors that expand agents’ capacity. These behaviors can be seen in the rise of techniques like goal decomposition, curriculum, and intrinsic rewards, which have paved the way for these complex behaviors. One limitation in evaluating these methods is the requirement for engineered extrinsic rewards in realistic environments. A central challenge in engineering the necessary reward function(s) comes from these environments containing states that carry high negative rewards but provide no feedback to the agent. Death is one such stimulus that fails to provide direct feedback to the agent. In this work, we introduce an intrinsic reward function inspired by early amygdala development and produce this intrinsic reward through a novel memory-augmented neural network (MANN) architecture. We show how this intrinsic motivation serves to deter exploration of terminal states and results in avoidance behavior similar to fear conditioning observed in animals. Furthermore, we demonstrate how modifying a threshold where the fear response is active produces a range of behaviors that are described under the paradigm of general anxiety disorders (GADs). We demonstrate this behavior in the Miniworld Sidewalk environment, which provides a partially observable Markov decision process (POMDP) and a sparse reward with a non-descriptive terminal condition, i.e., death. In effect, this study results in a biologically-inspired neural architecture and framework for fear conditioning paradigms; we empirically demonstrate avoidance behavior in a constructed agent that is able to solve environments with non-descriptive terminal conditions.



Article 1832

Title@2025-06-05 (4): MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning

Title: MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning MORSE-500: Ein programmierbarer Video-Benchmark zum Stress-Test Multimodale Reasoning MORSE-500:压力测试多模式理由方案可控制的视频基准 2506.05523v1

Authors (13): Zikui Cai, Andrew Wang, Anirudh Satheesh, Ankit Nakhawa, Hyunwoo Jae, Keenan Powell, Minghui Liu, Neel Jay, Sungbin Oh, Xiyao Wang, Yongyuan Liang, Tom Goldstein, Furong Huang

Despite rapid advances in vision-language models (VLMs), current benchmarks for multimodal reasoning fall short in three key dimensions. First, they overwhelmingly rely on static images, failing to capture the temporal complexity of real-world environments. Second, they narrowly focus on mathematical problem-solving, neglecting the broader spectrum of reasoning skills – including abstract, physical, planning, spatial, and temporal capabilities – required for robust multimodal intelligence. Third, many benchmarks quickly saturate, offering limited headroom for diagnosing failure modes or measuring continued progress. We introduce MORSE-500 (Multimodal Reasoning Stress-test Environment), a video benchmark composed of 500 fully scripted clips with embedded questions spanning six complementary reasoning categories. Each instance is programmatically generated using deterministic Python scripts (via Manim, Matplotlib, MoviePy), generative video models, and curated real footage. This script-driven design allows fine-grained control over visual complexity, distractor density, and temporal dynamics – enabling difficulty to be scaled systematically as models improve. Unlike static benchmarks that become obsolete once saturated, MORSE-500 is built to evolve: its controllable generation pipeline supports the creation of arbitrarily challenging new instances, making it ideally suited for stress-testing next-generation models. Initial experiments with state-of-the-art systems – including Gemini 2.5 Pro and OpenAI o3, the strongest available at the time, alongside strong open-source models – reveal substantial performance gaps across all categories, with particularly large deficits in abstract and planning tasks. We release the full dataset, generation scripts, and evaluation harness to support transparent, reproducible, and forward-looking multimodal reasoning research.



Article 1833

Title@2025-06-05 (4): The Complexity of Learning Sparse Superposed Features with Feedback

Title: The Complexity of Learning Sparse Superposed Features with Feedback Die Komplexität des Lernens Sparse überlagerte Funktionen mit Feedback 具有反馈的学习偏散超强特征的复杂性 2502.05407v3

Authors (1): Akash Kumar

The success of deep networks is crucially attributed to their ability to capture latent features within a representation space. In this work, we investigate whether the underlying learned features of a model can be efficiently retrieved through feedback from an agent, such as a large language model (LLM), in the form of relative \textit{triplet comparisons}. These features may represent various constructs, including dictionaries in LLMs or a covariance matrix of Mahalanobis distances. We analyze the feedback complexity associated with learning a feature matrix in sparse settings. Our results establish tight bounds when the agent is permitted to construct activations and demonstrate strong upper bounds in sparse scenarios when the agent’s feedback is limited to distributional information. We validate our theoretical findings through experiments on two distinct applications: feature recovery from Recursive Feature Machines and dictionary extraction from sparse autoencoders trained on Large Language Models.



Article 1834

Title@2025-06-05 (4): Learning to Recover: Dynamic Reward Shaping with Wheel-Leg Coordination for Fallen Robots

Title: Learning to Recover: Dynamic Reward Shaping with Wheel-Leg Coordination for Fallen Robots Learning to Recover: Dynamische Belohnungsformung mit Rad-Leg-Koordination für gefallene Roboter 学习恢复: 堕落机器人使用轮式轴协调进行动态回授形状 2506.05516v1

Authors (5): Boyuan Deng, Luca Rossini, Jin Wang, Weijie Wang, Nikolaos Tsagarakis

Adaptive recovery from fall incidents is an essential skill for the practical deployment of wheeled-legged robots, which uniquely combine the agility of legs with the speed of wheels for rapid recovery. However, traditional methods relying on preplanned recovery motions, simplified dynamics, or sparse rewards often fail to produce robust recovery policies. This paper presents a learning-based framework integrating Episode-based Dynamic Reward Shaping and curriculum learning, which dynamically balances exploration of diverse recovery maneuvers with precise posture refinement. An asymmetric actor-critic architecture accelerates training by leveraging privileged information in simulation, while noise-injected observations enhance robustness against uncertainties. We further demonstrate that synergistic wheel-leg coordination reduces joint torque consumption by 15.8% and 26.2% and improves stabilization through energy transfer mechanisms. Extensive evaluations on two distinct quadruped platforms achieve recovery success rates of up to 99.1% and 97.8% without platform-specific tuning. The supplementary material is available at https://boyuandeng.github.io/L2R-WheelLegCoordination/



Article 1835

Title@2025-06-05 (4): Winner-takes-all for Multivariate Probabilistic Time Series Forecasting

Title: Winner-takes-all for Multivariate Probabilistic Time Series Forecasting Winner-takes-all für Multivariate probabilistische Zeitreihen-Prognose 多变量概率性时间序列预测的赢取全胜 2506.05515v1

Authors (3): Adrien Cortés, Rémi Rehm, Victor Letzelter

We introduce TimeMCL, a method leveraging the Multiple Choice Learning (MCL) paradigm to forecast multiple plausible time series futures. Our approach employs a neural network with multiple heads and utilizes the Winner-Takes-All (WTA) loss to promote diversity among predictions. MCL has recently gained attention due to its simplicity and ability to address ill-posed and ambiguous tasks. We propose an adaptation of this framework for time-series forecasting, presenting it as an efficient method to predict diverse futures, which we relate to its implicit quantization objective. We provide insights into our approach using synthetic data and evaluate it on real-world time series, demonstrating its promising performance at a light computational cost.
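The Winner-Takes-All loss named in this abstract can be illustrated on plain lists. The sketch below assumes mean squared error as the per-head error and uses hypothetical function and variable names; it is not taken from the TimeMCL code.

```python
def wta_loss(hypotheses, target):
    """Winner-Takes-All: score every head against the target, but return
    (and, in training, backpropagate) only the best head's error."""
    errors = [
        sum((h - y) ** 2 for h, y in zip(hyp, target)) / len(target)
        for hyp in hypotheses
    ]
    winner = min(range(len(errors)), key=errors.__getitem__)
    return errors[winner], winner


# Three heads, each predicting a length-3 future for the same series.
heads = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [3.0, 3.0, 3.0]]
loss, winner = wta_loss(heads, target=[1.0, 2.0, 4.0])
print(winner, loss)
```

Because only the winning head is penalized for a given target, different heads are free to specialize on different plausible futures, which is the diversity mechanism the abstract refers to.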



Article 1836

Title@2025-06-05 (4): Geometric and Physical Constraints Synergistically Enhance Neural PDE Surrogates

Title: Geometric and Physical Constraints Synergistically Enhance Neural PDE Surrogates Geometrische und physikalische Einschränkungen Synergistisch verbessern neurale PDE-Überlagerungen 同步增强神经PDE代号 2506.05513v1

Authors (2): Yunfei Huang, David S. Greenberg

Neural PDE surrogates can improve the cost-accuracy tradeoff of classical solvers, but often generalize poorly to new initial conditions and accumulate errors over time. Physical and symmetry constraints have shown promise in closing this performance gap, but existing techniques for imposing these inductive biases are incompatible with the staggered grids commonly used in computational fluid dynamics. Here we introduce novel input and output layers that respect physical laws and symmetries on the staggered grids, and for the first time systematically investigate how these constraints, individually and in combination, affect the accuracy of PDE surrogates. We focus on two challenging problems: shallow water equations with closed boundaries and decaying incompressible turbulence. Compared to strong baselines, symmetries and physical constraints consistently improve performance across tasks, architectures, autoregressive prediction steps, accuracy measures, and network sizes. Symmetries are more effective than physical constraints, but surrogates with both performed best, even compared to baselines with data augmentation or pushforward training, while themselves benefiting from the pushforward trick. Doubly-constrained surrogates also generalize better to initial conditions and durations beyond the range of the training data, and more accurately predict real-world ocean currents.



Article 1837

Title@2025-06-05 (4): Investigating Non-Transitivity in LLM-as-a-Judge

Title: Investigating Non-Transitivity in LLM-as-a-Judge Untersuchung der Nichtübertragbarkeit in LLM-as-a-Richter 调查LLM-as-A法官的非过渡性 2502.14074v3

Authors (4): Yi Xu, Laura Ruis, Tim Rocktäschel, Robert Kirk

Automatic evaluation methods based on large language models (LLMs) are emerging as the standard tool for assessing the instruction-following abilities of LLM-based agents. The most common method in this paradigm, pairwise comparisons with a baseline model, critically depends on the assumption of transitive preferences. However, the validity of this assumption remains largely unexplored. In this study, we investigate the presence of non-transitivity within the AlpacaEval framework and analyze its effects on model rankings. We find that LLM judges exhibit non-transitive preferences, leading to rankings that are sensitive to the choice of the baseline model. To mitigate this issue, we show that round-robin tournaments combined with Bradley-Terry models of preference can produce more reliable rankings. Notably, our method increases both the Spearman correlation and the Kendall correlation with Chatbot Arena (95.0% -> 96.4% and 82.1% -> 86.3% respectively). To address the computational cost of round-robin tournaments, we propose Swiss-Wise Iterative Matchmaking (Swim) tournaments, using a dynamic matching strategy to capture the benefits of round-robin tournaments while maintaining computational efficiency.
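To make the round-robin plus Bradley-Terry idea concrete, the sketch below fits Bradley-Terry strengths to a small win matrix with the classical minorization-maximization update. The win counts are invented for illustration and deliberately contain a non-transitive cycle; the function name and iteration count are assumptions, and this is not the authors' code.

```python
def fit_bradley_terry(wins, iterations=200):
    """Fit Bradley-Terry strengths from wins[i][j] = times model i beat model j.
    Uses the standard MM update; returned strengths sum to 1."""
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        p = [v / norm for v in updated]
    return p


# Hypothetical round-robin, 10 judged comparisons per pair:
# A beats B 7-3, B beats C 7-3, but C beats A 6-4 (a non-transitive cycle).
wins = [
    [0, 7, 4],
    [3, 0, 7],
    [6, 3, 0],
]
strengths = fit_bradley_terry(wins)
ranking = sorted(range(3), key=lambda i: -strengths[i])
print(ranking, strengths)
```

A ranking against a single baseline would depend on which of A, B, or C is chosen as the baseline; pooling all pairwise outcomes yields one global ordering even when the individual preferences form a cycle.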



Article 1838

Title@2025-06-05 (4): How can Diffusion Models Evolve into Continual Generators?

Title: How can Diffusion Models Evolve into Continual Generators? Wie können sich Diffusionsmodelle zu kontinuierlichen Generatoren entwickeln? 传播模型如何发展成持续发电机? 2505.11936v2

Authors (3): Jingren Liu, Zhong Ji, Xiangyu Chen

While diffusion models have achieved remarkable success in static data generation, their deployment in streaming or continual learning (CL) scenarios faces a major challenge: catastrophic forgetting (CF), where newly acquired generative capabilities overwrite previously learned ones. To systematically address this, we introduce a formal Continual Diffusion Generation (CDG) paradigm that characterizes and redefines CL in the context of generative diffusion models. Prior efforts often adapt heuristic strategies from continual classification tasks but lack alignment with the underlying diffusion process. In this work, we develop the first theoretical framework for CDG by analyzing cross-task dynamics in diffusion-based generative modeling. Our analysis reveals that the retention and stability of generative knowledge across tasks are governed by three key consistency criteria: inter-task knowledge consistency (IKC), unconditional knowledge consistency (UKC), and label knowledge consistency (LKC). Building on these insights, we propose Continual Consistency Diffusion (CCD), a principled framework that integrates these consistency objectives into training via hierarchical loss terms $\mathcal{L}_{IKC}$, $\mathcal{L}_{UKC}$, and $\mathcal{L}_{LKC}$. This promotes effective knowledge retention while enabling the assimilation of new generative capabilities. Extensive experiments on four benchmark datasets demonstrate that CCD achieves state-of-the-art performance under continual settings, with substantial gains in Mean Fidelity (MF) and Incremental Mean Fidelity (IMF), particularly in tasks with rich cross-task knowledge overlap.



Article 1839

Title@2025-06-05 (4): The Generative Leap: Sharp Sample Complexity for Efficiently Learning Gaussian Multi-Index Models

Title: The Generative Leap: Sharp Sample Complexity for Efficiently Learning Gaussian Multi-Index Models Der Generative Sprung: Scharfe Probenkomplexität für effizientes Lernen Gaussischer Multi-Index-Modelle 产生式滑坡:高效学习高斯多指标模型的简单抽样复杂程度 2506.05500v1

Authors (3): Alex Damian, Jason D. Lee, Joan Bruna

In this work we consider generic Gaussian Multi-index models, in which the labels only depend on the (Gaussian) $d$-dimensional inputs through their projection onto a low-dimensional $r = O_d(1)$ subspace, and we study efficient agnostic estimation procedures for this hidden subspace. We introduce the \emph{generative leap} exponent $k^\star$, a natural extension of the generative exponent from [Damian et al.’24] to the multi-index setting. We first show that a sample complexity of $n=\Theta(d^{1 \vee k^\star/2})$ is necessary in the class of algorithms captured by the Low-Degree-Polynomial framework. We then establish that this sample complexity is also sufficient, by giving an agnostic sequential estimation procedure (that is, requiring no prior knowledge of the multi-index model) based on a spectral U-statistic over appropriate Hermite tensors. We further compute the generative leap exponent for several examples including piecewise linear functions (deep ReLU networks with bias), and general deep neural networks (with $r$-dimensional first hidden layer).



Article 1840

Title@2025-06-05 (4): Multidimensional Analysis of Specific Language Impairment Using Unsupervised Learning Through PCA and Clustering

Title: Multidimensional Analysis of Specific Language Impairment Using Unsupervised Learning Through PCA and Clustering Multidimensionale Analyse der spezifischen Sprachbeeinträchtigung durch unbeaufsichtigtes Lernen durch PCA und Clustering 利用无监督的学习,通过常设仲裁院和集群,对特定语言障碍进行多层面分析 2506.05498v1

Authors (1): Niruthiha Selvanayagam

Specific Language Impairment (SLI) affects approximately 7 percent of children, presenting as isolated language deficits despite normal cognitive abilities, sensory systems, and supportive environments. Traditional diagnostic approaches often rely on standardized assessments, which may overlook subtle developmental patterns. This study aims to identify natural language development trajectories in children with and without SLI using unsupervised machine learning techniques, providing insights for early identification and targeted interventions. Narrative samples from 1,163 children aged 4-16 years across three corpora (Conti-Ramsden 4, ENNI, and Gillam) were analyzed using Principal Component Analysis (PCA) and clustering. A total of 64 linguistic features were evaluated to uncover developmental trajectories and distinguish linguistic profiles. Two primary clusters emerged: (1) high language production with low SLI prevalence, and (2) limited production but higher syntactic complexity with higher SLI prevalence. Additionally, boundary cases exhibited intermediate traits, supporting a continuum model of language abilities. Findings suggest SLI manifests primarily through reduced production capacity rather than syntactic complexity deficits. The results challenge categorical diagnostic frameworks and highlight the potential of unsupervised learning techniques for refining diagnostic criteria and intervention strategies.



Article 1841

Title@2025-06-05 (4): Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models

Title: Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models Conformal Prediction Beyond the Seen: Eine fehlende Massenperspektive für Unsicherheitsquantifizierung in generativen Modellen 可见之外的非正式预测:在生成模型中不确定性量化的缺失质量角度 2506.05497v1

Authors (4): Sima Noorani, Shayan Kiyani, George Pappas, Hamed Hassani

Uncertainty quantification (UQ) is essential for safe deployment of generative AI models such as large language models (LLMs), especially in high-stakes applications. Conformal prediction (CP) offers a principled uncertainty quantification framework, but classical methods focus on regression and classification, relying on geometric distances or softmax scores: tools that presuppose structured outputs. We depart from this paradigm by studying CP in a query-only setting, where prediction sets must be constructed solely from finite queries to a black-box generative model, introducing a new trade-off between coverage, test-time query budget, and informativeness. We introduce Conformal Prediction with Query Oracle (CPQ), a framework characterizing the optimal interplay between these objectives. Our finite-sample algorithm is built on two core principles: one governs the optimal query policy, and the other defines the optimal mapping from queried samples to prediction sets. Remarkably, both are rooted in the classical missing mass problem in statistics. Specifically, the optimal query policy depends on the rate of decay, or the derivative, of the missing mass, for which we develop a novel estimator. Meanwhile, the optimal mapping hinges on the missing mass itself, which we estimate using Good-Turing estimators. We then turn our focus to implementing our method for language models, where outputs are vast, variable, and often underspecified. Fine-grained experiments on three real-world open-ended tasks and two LLMs show CPQ's applicability to any black-box LLM and highlight: (1) the individual contribution of each principle to CPQ's performance, and (2) CPQ's ability to yield significantly more informative prediction sets than existing conformal methods for language uncertainty quantification.
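The missing mass that the abstract builds on has a simple classical estimate: the fraction of samples that were seen exactly once. The sketch below shows only that textbook Good-Turing estimator, with exact string matching between sampled answers as a simplifying assumption; the paper's derivative estimator and its query policy are not reproduced here.

```python
from collections import Counter


def good_turing_missing_mass(samples):
    """Classical Good-Turing estimate of the probability that the next sample
    is something not yet observed: (number of singletons) / (sample size)."""
    counts = Counter(samples)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(samples)


# Six hypothetical answers sampled from a black-box model for one prompt.
answers = ["Paris", "Paris", "Lyon", "Nice", "Paris", "Lille"]
print(good_turing_missing_mass(answers))  # 3 singletons out of 6 samples
```

A high estimate suggests that further queries are still likely to surface new answers, while an estimate near zero suggests the observed answers already cover most of the model's output mass.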



Article 1842

Title@2025-06-05 (4): Learning-Augmented Hierarchical Clustering

Title: Learning-Augmented Hierarchical Clustering Learning-Augmented Hierarchical Clustering 学习增强的等级集群 2506.05495v1

Authors (4): Vladimir Braverman, Jon C. Ergun, Chen Wang, Samson Zhou

Hierarchical clustering (HC) is an important data analysis technique in which the goal is to recursively partition a dataset into a tree-like structure while grouping together similar data points at each level of granularity. Unfortunately, for many of the proposed HC objectives, hardness-of-approximation results pose strong barriers to approximation algorithms. Thus, we consider the problem of hierarchical clustering given auxiliary information from natural oracles. Specifically, we focus on a splitting oracle which, when provided with a triplet of vertices $(u,v,w)$, answers (possibly erroneously) the pairs of vertices whose lowest common ancestor includes all three vertices in an optimal tree, i.e., identifying which vertex "splits away" from the others. Using such an oracle, we obtain the following results: (i) a polynomial-time algorithm that outputs a hierarchical clustering tree with $O(1)$-approximation to the Dasgupta objective (Dasgupta [STOC’16]), and (ii) a near-linear time algorithm that outputs a hierarchical clustering tree with $(1-o(1))$-approximation to the Moseley-Wang objective (Moseley and Wang [NeurIPS’17]). Under the plausible Small Set Expansion Hypothesis, no polynomial-time algorithm can achieve any constant approximation for Dasgupta’s objective or $(1-C)$-approximation for the Moseley-Wang objective for some constant $C>0$. As such, our results demonstrate that the splitting oracle enables algorithms to outperform standard HC approaches and overcome hardness constraints. Furthermore, our approaches extend to sublinear settings, in which we show new streaming and PRAM algorithms for HC with improved guarantees.



Article 1843

Title: GENIUS: A Generative Framework for Universal Multimodal Search GENIUS: Ein generatives Framework für die universelle multimodale Suche GENNIUS: 通用多模式搜索创创框架 2503.19868v2

Authors (6): Sungyeon Kim, Xinliang Zhu, Xiaofan Lin, Muhammet Bastan, Douglas Gray, Suha Kwak

Generative retrieval is an emerging approach in information retrieval that generates identifiers (IDs) of target data based on a query, providing an efficient alternative to traditional embedding-based retrieval methods. However, existing models are task-specific and fall short of embedding-based retrieval in performance. This paper proposes GENIUS, a universal generative retrieval framework supporting diverse tasks across multiple modalities and domains. At its core, GENIUS introduces modality-decoupled semantic quantization, transforming multimodal data into discrete IDs encoding both modality and semantics. Moreover, to enhance generalization, we propose a query augmentation that interpolates between a query and its target, allowing GENIUS to adapt to varied query forms. Evaluated on the M-BEIR benchmark, it surpasses prior generative methods by a clear margin. Unlike embedding-based retrieval, GENIUS consistently maintains high retrieval speed across database size, with competitive performance across multiple benchmarks. With additional re-ranking, GENIUS often achieves results close to those of embedding-based methods while preserving efficiency.



Article 1844

Title@2025-06-05 (4): Sentiment Analysis in Learning Management Systems Understanding Student Feedback at Scale

Title: Sentiment Analysis in Learning Management Systems Understanding Student Feedback at Scale Sentiment Analysis in Lernmanagement-Systemen Verständnis des Feedbacks von Studierenden auf Scale 学习管理系统对学习管理系统的情感分析 了解学生的大规模反馈 2506.05490v1

Authors (1): Mohammed Almutairi

In the wake of the Covid-19 pandemic, the educational paradigm has shifted substantially from traditional in-person learning to online platforms. This change in learning conventions has affected teacher-student interaction, especially non-verbal communication. The absence of non-verbal communication has led to a reliance on verbal feedback, which has diminished the efficacy of the educational experience. This paper explores the integration of sentiment analysis into learning management systems (LMS) to bridge the student-teacher gap by offering an alternative approach to interpreting student feedback beyond its verbal context. The research involves data preparation, feature selection, and the development of a deep neural network model encompassing word embedding, LSTM, and attention mechanisms. This model is compared against a logistic regression baseline to evaluate its efficacy in understanding student feedback. The study aims to bridge the communication gap between instructors and students in online learning environments, offering insights into the emotional context of student feedback and ultimately improving the quality of online education.



Article 1845

Title@2025-06-05 (4): Initial Model Incorporation for Deep Learning FWI: Pretraining or Denormalization?

Title: Initial Model Incorporation for Deep Learning FWI: Pretraining or Denormalization? Ursprüngliche Modellinkorporation für Deep Learning FWI: Vor- oder Entnormierung? FWI: 培训前或非正规化? 2506.05484v1

Authors (4): Ruihua Chen, Bangyu Wu, Meng Li, Kai Yang

Full waveform inversion (FWI) in which subsurface properties are reparameterized by a neural network has emerged as an effective unsupervised learning framework that can invert stably with an inaccurate starting model. It updates the trainable neural network parameters instead of updating the subsurface model directly. There are primarily two ways to embed the prior knowledge of the initial model into neural networks: pretraining and denormalization. Pretraining first regulates the neural network's parameters by fitting the initial velocity model; denormalization directly adds the outputs of the network to the initial model without pretraining. In this letter, we systematically investigate the influence of the two ways of initial model incorporation on neural network reparameterized FWI. We demonstrate that pretraining requires inverting the model perturbation based on a constant velocity value (mean) with a two-stage implementation. It leads to a complex workflow and inconsistent objective functions across the two stages, causing the network parameters to become inactive and lose plasticity. Experimental results demonstrate that denormalization can simplify workflows, accelerate convergence, and enhance inversion accuracy compared with pretraining.



Article 1846

Title@2025-06-05 (4): ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting

Title: ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting ODE-GS: Latente ODEs für dynamische Szene Extrapolation mit 3D Gaussian Splatting ODE-GS: 带有 3D Gaussian Splatting 3D 的动态场点外推法 2506.05480v1

Authors (5): Daniel Wang, Patrick Rim, Tian Tian, Alex Wong, Ganesh Sundaramoorthi

We present ODE-GS, a novel method that unifies 3D Gaussian Splatting with latent neural ordinary differential equations (ODEs) to forecast dynamic 3D scenes far beyond the time span seen during training. Existing neural rendering systems - whether NeRF- or 3DGS-based - embed time directly in a deformation network and therefore excel at interpolation but collapse when asked to predict the future, where timestamps are strictly out-of-distribution. ODE-GS eliminates this dependency: after learning a high-fidelity, time-conditioned deformation model for the training window, we freeze it and train a Transformer encoder that summarizes past Gaussian trajectories into a latent state whose continuous evolution is governed by a neural ODE. Numerical integration of this latent flow yields smooth, physically plausible Gaussian trajectories that can be queried at any future instant and rendered in real time. Coupled with a variational objective and a lightweight second-derivative regularizer, ODE-GS attains state-of-the-art extrapolation on D-NeRF and NVFI benchmarks, improving PSNR by up to 10 dB and halving perceptual error (LPIPS) relative to the strongest baselines. Our results demonstrate that continuous-time latent dynamics are a powerful, practical route to photorealistic prediction of complex 3D scenes.



Article 1847

Title@2025-06-05 (4): Learning-Augmented Algorithms for MTS with Bandit Access to Multiple Predictors

Title: Learning-Augmented Algorithms for MTS with Bandit Access to Multiple Predictors Lerngesteigerte Algorithmen für MTS mit Bandit-Zugriff auf mehrere Vorhersager 多边贸易体系的学习强化比值,使强盗能够接触多种预测者 2506.05479v1

Authors (2): Matei Gabriel Coşa, Marek Eliáš

We consider the following problem: We are given $\ell$ heuristics for Metrical Task Systems (MTS), where each might be tailored to a different type of input instances. While processing an input instance received online, we are allowed to query the action of only one of the heuristics at each time step. Our goal is to achieve performance comparable to the best of the given heuristics. The main difficulty of our setting comes from the fact that the cost paid by a heuristic at time $t$ cannot be estimated unless the same heuristic was also queried at time $t-1$. This is related to Bandit Learning against memory bounded adversaries (Arora et al., 2012). We show how to achieve regret of $O(\text{OPT}^{2/3})$ and prove a tight lower bound based on the construction of Dekel et al. (2013).



Article 1848

Title@2025-06-05 (4): Inference-Time Hyper-Scaling with KV Cache Compression

Title: Inference-Time Hyper-Scaling with KV Cache Compression Inferenz-Zeit-Hyper-Skalierung mit KV-Cache-Kompression 与 KV 缓存压缩一起的推断时间超大缩放 2506.05345v1

Authors (4): Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot, Edoardo M. Ponti

Inference-time scaling trades efficiency for increased reasoning accuracy by generating longer or more parallel sequences. However, in Transformer LLMs, generation cost is bottlenecked by the size of the key-value (KV) cache, rather than the number of generated tokens. Hence, we explore inference-time hyper-scaling: by compressing the KV cache, we can generate more tokens within the same compute budget and further improve the accuracy of scaled inference. The success of this approach, however, hinges on the ability of compression methods to preserve accuracy even at high compression ratios. To make hyper-scaling practical, we introduce Dynamic Memory Sparsification (DMS), a novel method for sparsifying KV caches that only requires 1K training steps to achieve 8$\times$ compression, while maintaining better accuracy than training-free sparse attention. Instead of prematurely discarding cached tokens, DMS delays token eviction, implicitly merging representations and preserving critical information. We demonstrate the effectiveness of inference-time hyper-scaling with DMS on multiple families of LLMs, showing that it boosts accuracy for comparable inference runtime and memory load. For instance, we enhance Qwen-R1 32B by an average of 9.1 points on AIME 24, 7.6 on GPQA, and 9.6 on LiveCodeBench across compute budgets.



Article 1849

Title@2025-06-05 (4): Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets

Title: Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets Warum LLM Sicherheits-Guardrails nach Feinabstimmung zusammenbrechen: Eine Ähnlichkeitsanalyse zwischen Alignment- und Feinabstimmungs-Datensätzen 为何LLM 安全护卫车在微调后倒塌:对准和微调数据集之间的相似性分析 2506.05346v1

Authors (7): Lei Hsiung, Tianyu Pang, Yung-Chen Tang, Linyue Song, Tsung-Yi Ho, Pin-Yu Chen, Yaoqing Yang

Recent advancements in large language models (LLMs) have underscored their vulnerability to safety alignment jailbreaks, particularly when subjected to downstream fine-tuning. However, existing mitigation strategies primarily focus on reactively addressing jailbreak incidents after safety guardrails have been compromised, removing harmful gradients during fine-tuning, or continuously reinforcing safety alignment throughout fine-tuning. As such, they tend to overlook a critical upstream factor: the role of the original safety-alignment data. This paper therefore investigates the degradation of safety guardrails through the lens of representation similarity between upstream alignment datasets and downstream fine-tuning tasks. Our experiments demonstrate that high similarity between these datasets significantly weakens safety guardrails, making models more susceptible to jailbreaks. Conversely, low similarity between these two types of datasets yields substantially more robust models and thus reduces the harmfulness score by up to 10.33%. By highlighting the importance of upstream dataset design in building durable safety guardrails and reducing real-world vulnerability to jailbreak attacks, these findings offer actionable insights for fine-tuning service providers.



Article 1850

Title@2025-06-05 (4): Search Arena: Analyzing Search-Augmented LLMs

Title: Search Arena: Analyzing Search-Augmented LLMs Sucharena: Analyse von Such-Augmentierten LLMs 搜索竞技场: 分析搜索推荐LMS 2506.05334v1

Authors (11): Mihran Miroyan, Tsung-Han Wu, Logan King, Tianle Li, Jiayi Pan, Xinyan Hu, Wei-Lin Chiang, Anastasios N. Angelopoulos, Trevor Darrell, Narges Norouzi, Joseph E. Gonzalez

Search-augmented language models combine web search with Large Language Models (LLMs) to improve response groundedness and freshness. However, analyzing these systems remains challenging: existing datasets are limited in scale and narrow in scope, often constrained to static, single-turn, fact-checking questions. In this work, we introduce Search Arena, a crowd-sourced, large-scale, human-preference dataset of over 24,000 paired multi-turn user interactions with search-augmented LLMs. The dataset spans diverse intents and languages, and contains full system traces with around 12,000 human preference votes. Our analysis reveals that user preferences are influenced by the number of citations, even when the cited content does not directly support the attributed claims, uncovering a gap between perceived and actual credibility. Furthermore, user preferences vary across cited sources, revealing that community-driven platforms are generally preferred and static encyclopedic sources are not always appropriate and reliable. To assess performance across different settings, we conduct cross-arena analyses by testing search-augmented LLMs in a general-purpose chat environment and conventional LLMs in search-intensive settings. We find that web search does not degrade and may even improve performance in non-search settings; however, the quality in search settings is significantly affected if solely relying on the model’s parametric knowledge. We open-sourced the dataset to support future research in this direction. Our dataset and code are available at: https://github.com/lmarena/search-arena.



Article 1851

Title@2025-06-05 (4): AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

Title: AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning AceReason-Nemotron: Mathematische und Code-Reasonierung durch Stärkungslernen AceReson-Nepron:通过强化学习推进数学和守则 2505.16400v3

Authors (8): Yang Chen, Zhuolin Yang, Zihan Liu, Chankyu Lee, Peng Xu, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping

Despite recent progress in large-scale reinforcement learning (RL) for reasoning, the training recipe for building high-performing reasoning models remains elusive. Key implementation details of frontier models, such as DeepSeek-R1, including data curation strategies and RL training recipes, are often omitted. Moreover, recent research indicates distillation remains more effective than RL for smaller models. In this work, we demonstrate that large-scale RL can significantly enhance the reasoning capabilities of strong, small- and mid-sized models, achieving results that surpass those of state-of-the-art distillation-based models. We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first training on math-only prompts, then on code-only prompts. Notably, we find that math-only RL significantly enhances the performance of strong distilled models not only on math benchmarks (e.g., +14.6% / +17.2% on AIME 2025 for the 7B / 14B models), but also on code reasoning tasks (e.g., +6.8% / +5.8% on LiveCodeBench for the 7B / 14B models). In addition, extended code-only RL iterations further improve performance on code benchmarks with minimal or no degradation in math results. We develop a robust data curation pipeline to collect challenging prompts with high-quality, verifiable answers and test cases to enable verification-based RL across both domains. Finally, we identify key experimental insights, including curriculum learning with progressively increasing response lengths and the stabilizing effect of on-policy parameter updates. We find that RL not only elicits the foundational reasoning capabilities acquired during pretraining and supervised fine-tuning (e.g., distillation), but also pushes the limits of the model’s reasoning ability, enabling it to solve problems that were previously unsolvable.



Article 1852

Title@2025-06-05 (4): Zeroth-Order Optimization Finds Flat Minima

Title: Zeroth-Order Optimization Finds Flat Minima Nullte-Order-Optimierung findet flache Minima 零阶优化发现平底微米 2506.05454v1

Authors (6): Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Michael Muehlebach, Niao He

Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization theory focuses on convergence to an arbitrary stationary point, but less is known about the implicit regularization, which provides a fine-grained characterization of which particular solutions are finally reached. We show that zeroth-order optimization with the standard two-point estimator favors solutions with a small trace of the Hessian, a quantity widely used in previous work to distinguish between sharp and flat minima. We further provide convergence rates of zeroth-order optimization to approximate flat minima for convex and sufficiently smooth functions, where flat minima are defined as the minimizers that achieve the smallest trace of the Hessian among all optimal solutions. Experiments on binary classification tasks with convex losses and language model fine-tuning support our theoretical findings.
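
As a rough illustration of the standard two-point estimator mentioned above, the following minimal sketch runs zeroth-order descent on a toy quadratic. It is not the authors' code: the function names, the Gaussian search direction, the step size, and the smoothing radius are all assumptions chosen for illustration. For Gaussian directions, the smoothed objective that the estimator implicitly follows is approximately f(x) plus half the squared smoothing radius times the trace of the Hessian, which gives some intuition for the flat-minima bias, although this toy problem has a single minimizer and does not demonstrate that effect.

```python
import numpy as np

def two_point_grad(f, x, rng, smoothing=1e-3):
    """Two-point zeroth-order gradient estimate along one Gaussian direction."""
    u = rng.standard_normal(x.shape)
    # Central finite difference of f along u, projected back onto u.
    slope = (f(x + smoothing * u) - f(x - smoothing * u)) / (2.0 * smoothing)
    return slope * u

def zo_descent(f, x0, steps=2000, lr=0.05, smoothing=1e-3, seed=0):
    """Plain descent that only ever queries function values, never gradients."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * two_point_grad(f, x, rng, smoothing)
    return x

def quadratic(x):
    # Toy objective (an assumption for the demo): f(x) = 0.5 * ||x||^2.
    return 0.5 * float(np.sum(x ** 2))

x_final = zo_descent(quadratic, x0=[1.0, -2.0, 0.5])
```

Each step costs two function evaluations regardless of dimension, which is why this estimator is attractive when backpropagation is unavailable or too memory-hungry.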



Article 1853

Title@2025-06-05 (4): The broader spectrum of in-context learning

Title: The broader spectrum of in-context learning Das breitere Spektrum des In-Context-Lernens 广义的内通学习 2412.03782v3

Authors (4): Andrew Kyle Lampinen, Stephanie C. Y. Chan, Aaditya K. Singh, Murray Shanahan

The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning within a much broader spectrum of meta-learned in-context learning. Indeed, we suggest that any distribution of sequences in which context non-trivially decreases loss on subsequent predictions can be interpreted as eliciting a kind of in-context learning. We suggest that this perspective helps to unify the broad set of in-context abilities that language models exhibit – such as adapting to tasks from instructions or role play, or extrapolating time series. This perspective also sheds light on potential roots of in-context learning in lower-level processing of linguistic dependencies (e.g. coreference or parallel structures). Finally, taking this perspective highlights the importance of generalization, which we suggest can be studied along several dimensions: not only the ability to learn something novel, but also flexibility in learning from different presentations, and in applying what is learned. We discuss broader connections to past literature in meta-learning and goal-conditioned agents, and other perspectives on learning and adaptation. We close by suggesting that research on in-context learning should consider this broader spectrum of in-context capabilities and types of generalization.



Article 1854

Title@2025-06-05 (4): Admissibility of Completely Randomized Trials: A Large-Deviation Approach

Title: Admissibility of Completely Randomized Trials: A Large-Deviation Approach Zulässigkeit von völlig randomisierten Prüfungen: Ein Ansatz mit großer Abweichung 完全随机审理的可受理性:一个大型减缓办法 2506.05329v1

Authors (3): Guido Imbens, Chao Qin, Stefan Wager

When an experimenter has the option of running an adaptive trial, is it admissible to ignore this option and run a non-adaptive trial instead? We provide a negative answer to this question in the best-arm identification problem, where the experimenter aims to allocate measurement efforts judiciously to confidently deploy the most effective treatment arm. We find that, whenever there are at least three treatment arms, there exist simple adaptive designs that universally and strictly dominate non-adaptive completely randomized trials. This dominance is characterized by a notion called efficiency exponent, which quantifies a design’s statistical efficiency when the experimental sample is large. Our analysis focuses on the class of batched arm elimination designs, which progressively eliminate underperforming arms at pre-specified batch intervals. We characterize simple sufficient conditions under which these designs universally and strictly dominate completely randomized trials. These results resolve the second open problem posed in Qin [2022].
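
To make the comparison concrete, here is a toy sketch of a completely randomized trial next to a batched arm elimination design. It is only an illustration under stated assumptions, not the designs analysed in the paper: rewards are Gaussian with unit variance, each batch is split evenly among surviving arms, exactly one arm is dropped after every batch except the last, and all function names are hypothetical.

```python
import numpy as np

def completely_randomized(means, budget, rng):
    """Non-adaptive design: split the budget evenly, return the best sample mean."""
    n = budget // len(means)
    sample_means = [rng.normal(m, 1.0, size=n).mean() for m in means]
    return int(np.argmax(sample_means))

def batched_elimination(means, budget, rng, n_batches=2):
    """Adaptive design: after each batch, drop the arm with the lowest running mean."""
    active = list(range(len(means)))
    sums = np.zeros(len(means))
    counts = np.zeros(len(means), dtype=int)
    per_batch = budget // n_batches
    for batch in range(n_batches):
        n = per_batch // len(active)          # even split among surviving arms
        for arm in active:
            sums[arm] += rng.normal(means[arm], 1.0, size=n).sum()
            counts[arm] += n
        if batch < n_batches - 1 and len(active) > 1:
            worst = min(active, key=lambda a: sums[a] / counts[a])
            active.remove(worst)              # eliminated arms get no further samples
    return max(active, key=lambda a: sums[a] / counts[a])
```

With three arms and two batches, the surviving two arms each receive more samples in the second batch than they would under an even three-way split, which is the intuition behind reallocating measurement effort.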



Article 1855

Title@2025-06-05 (4): Seeing the Invisible: Machine learning-Based QPI Kernel Extraction via Latent Alignment

Title: Seeing the Invisible: Machine learning-Based QPI Kernel Extraction via Latent Alignment Das Unsichtbare sehen: Maschinelles Lernen basierte QPI-Kernel-Extraktion über Latent Alignment 可见:机器学习 QPI Kernel 采掘,通过Lient对齐 2506.05325v1

Authors (6): Yingshuai Ji, Haomin Zhuang, Matthew Toole, James McKenzie, Xiaolong Liu, Xiangliang Zhang

Quasiparticle interference (QPI) imaging is a powerful tool for probing electronic structures in quantum materials, but extracting the single-scatterer QPI pattern (i.e., the kernel) from a multi-scatterer image remains a fundamentally ill-posed inverse problem. In this work, we propose the first AI-based framework for QPI kernel extraction. We introduce a two-step learning strategy that decouples kernel representation learning from observation-to-kernel inference. In the first step, we train a variational autoencoder to learn a compact latent space of scattering kernels. In the second step, we align the latent representation of QPI observations with those of the pre-learned kernels using a dedicated encoder. This design enables the model to infer kernels robustly even under complex, entangled scattering conditions. We construct a diverse and physically realistic QPI dataset comprising 100 unique kernels and evaluate our method against a direct one-step baseline. Experimental results demonstrate that our approach achieves significantly higher extraction accuracy and improved generalization to unseen kernels.



Article 1856

Title@2025-06-05 (4): LSM-2: Learning from Incomplete Wearable Sensor Data

Title: LSM-2: Learning from Incomplete Wearable Sensor Data LSM-2: Lernen von unvollständigen Wearable Sensordaten LSM-2:学习不全穿感应器数据 2506.05321v1

Authors (25): Maxwell A. Xu, Girish Narayanswamy, Kumar Ayush, Dimitris Spathis, Shun Liao, Shyam A. Tailor, Ahmed Metwally, A. Ali Heydari, Yuwei Zhang, Jake Garrison, Samy Abdel-Ghaffar, Xuhai Xu, Ken Gu, Jacob Sunshine, Ming-Zher Poh, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Mark Malhotra, Shwetak Patel, Yuzhe Yang, James M. Rehg, Xin Liu, Daniel McDuff

Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM), a novel SSL approach that learns robust representations directly from incomplete data without requiring explicit imputation. AIM’s core novelty lies in its use of learnable mask tokens to model both existing (“inherited”) and artificially introduced missingness, enabling it to robustly handle fragmented real-world data during inference. Pre-trained on an extensive dataset of 40M hours of day-long multimodal sensor data, our LSM-2 with AIM achieves the best performance across a diverse range of tasks, including classification, regression and generative modeling. Furthermore, LSM-2 with AIM exhibits superior scaling performance, and critically, maintains high performance even under targeted missingness scenarios, reflecting clinically coherent patterns, such as the diagnostic value of nighttime biosignals for hypertension prediction. This makes AIM a more reliable choice for real-world wearable data applications.



Article 1857

Title@2025-06-05 (4): Generalizable, real-time neural decoding with hybrid state-space models

Title: Generalizable, real-time neural decoding with hybrid state-space models Verallgemeinerbare, Echtzeit-Neuraldekodierung mit hybriden State-Space-Modellen 与混合状态空间模型的通用实时神经解码 2506.05320v1

Authors (7): Avery Hee-Woon Ryoo, Nanda H. Krishna, Ximeng Mao, Mehdi Azabou, Eva L. Dyer, Matthew G. Perich, Guillaume Lajoie

Real-time decoding of neural activity is central to neuroscience and neurotechnology applications, from closed-loop experiments to brain-computer interfaces, where models are subject to strict latency constraints. Traditional methods, including simple recurrent neural networks, are fast and lightweight but often struggle to generalize to unseen data. In contrast, recent Transformer-based approaches leverage large-scale pretraining for strong generalization performance, but typically have much larger computational requirements and are not always suitable for low-resource or real-time settings. To address these shortcomings, we present POSSM, a novel hybrid architecture that combines individual spike tokenization via a cross-attention module with a recurrent state-space model (SSM) backbone to enable (1) fast and causal online prediction on neural activity and (2) efficient generalization to new sessions, individuals, and tasks through multi-dataset pretraining. We evaluate POSSM’s decoding performance and inference speed on intracortical decoding of monkey motor tasks, and show that it extends to clinical applications, namely handwriting and speech decoding in human subjects. Notably, we demonstrate that pretraining on monkey motor-cortical recordings improves decoding performance on the human handwriting task, highlighting the exciting potential for cross-species transfer. In all of these tasks, we find that POSSM achieves decoding accuracy comparable to state-of-the-art Transformers, at a fraction of the inference cost (up to 9x faster on GPU). These results suggest that hybrid SSMs are a promising approach to bridging the gap between accuracy, inference speed, and generalization when training neural decoders for real-time, closed-loop applications.



Article 1858

Title@2025-06-05 (4): Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay

Title: Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay Verbesserung der Dateneffizienz für LLM-Verstärkung Feinabstimmung durch problemorientierte Online-Datenauswahl und Rollout-Replay 提高LLLM数据效率,通过困难目标在线数据选择和推出重播提高LLM强化微调的数据效率 2506.05316v1

Authors (7): Yifan Sun, Jingyan Shen, Yibin Wang, Tianyu Chen, Zhendong Wang, Mingyuan Zhou, Huan Zhang

Reinforcement learning (RL) has become an effective approach for fine-tuning large language models (LLMs), particularly to enhance their reasoning capabilities. However, RL fine-tuning remains highly resource-intensive, and existing work has largely overlooked the problem of data efficiency. In this paper, we propose two techniques to improve data efficiency in LLM RL fine-tuning: difficulty-targeted online data selection and rollout replay. We introduce the notion of adaptive difficulty to guide online data selection, prioritizing questions of moderate difficulty that are more likely to yield informative learning signals. To estimate adaptive difficulty efficiently, we develop an attention-based framework that requires rollouts for only a small reference set of questions. The adaptive difficulty of the remaining questions is then estimated based on their similarity to this set. To further reduce rollout cost, we introduce a rollout replay mechanism that reuses recent rollouts, lowering per-step computation while maintaining stable updates. Extensive experiments across 6 LLM-dataset combinations show that our method reduces RL fine-tuning time by 25% to 65% to reach the same level of performance as the original GRPO algorithm.
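
The preference for moderately difficult questions can be illustrated with a small sketch. It assumes that difficulty is summarised by an estimated pass rate under the current policy and that advantages are normalised within a group of rollouts, as in GRPO-style training; the function names are hypothetical, and the paper's attention-based difficulty estimator and replay mechanism are not reproduced here.

```python
import numpy as np

def select_moderate(pass_rates, k):
    """Return indices of the k questions whose estimated pass rate is closest to 0.5."""
    pass_rates = np.asarray(pass_rates, dtype=float)
    order = np.argsort(np.abs(pass_rates - 0.5), kind="stable")
    return order[:k].tolist()

def group_advantages(rewards):
    """Group-normalised advantages; identical rewards carry no learning signal."""
    rewards = np.asarray(rewards, dtype=float)
    std = rewards.std()
    if std == 0:
        return np.zeros_like(rewards)
    return (rewards - rewards.mean()) / std
```

Questions that the policy always solves or always fails produce identical rewards across rollouts, so their normalised advantages are all zero; questions near a 50% pass rate avoid that degenerate case most often.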



Article 1859

Title@2025-06-05 (4): Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models

Title: Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models Engropisches Lernen eingeschränkt: Ein primäres Rahmenwerk für große Sprachmodelle 未学习:大语言模式的原始-双重框架 2506.05314v1

Authors (5): Taha Entesari, Arman Hatami, Rinat Khaziev, Anil Ramakrishna, Mahyar Fazlyab

Large Language Models (LLMs) deployed in real-world settings increasingly face the need to unlearn sensitive, outdated, or proprietary information. Existing unlearning methods typically formulate forgetting and retention as a regularized trade-off, combining both objectives into a single scalarized loss. This often leads to unstable optimization and degraded performance on retained data, especially under aggressive forgetting. We propose a new formulation of LLM unlearning as a constrained optimization problem: forgetting is enforced via a novel logit-margin flattening loss that explicitly drives the output distribution toward uniformity on a designated forget set, while retention is preserved through a hard constraint on a separate retain set. Compared to entropy-based objectives, our loss is softmax-free, numerically stable, and maintains non-vanishing gradients, enabling more efficient and robust optimization. We solve the constrained problem using a scalable primal-dual algorithm that exposes the trade-off between forgetting and retention through the dynamics of the dual variable. Evaluations on the TOFU and MUSE benchmarks across diverse LLM architectures demonstrate that our approach consistently matches or exceeds state-of-the-art baselines, effectively removing targeted information while preserving downstream utility.
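
The primal-dual idea can be sketched on a one-dimensional toy problem. This is a generic gradient descent-ascent loop under stated assumptions, not the paper's algorithm or loss: the objective stands in for the forgetting loss, the constraint stands in for a retain-set loss that must stay below a tolerance `eps`, and all names and step sizes are hypothetical.

```python
def primal_dual(grad_f, g, grad_g, theta0, eps,
                lr_primal=0.05, lr_dual=0.05, steps=5000):
    """Minimise f(theta) subject to g(theta) <= eps via the Lagrangian f + lam * (g - eps)."""
    theta, lam = float(theta0), 0.0
    lam_history = []
    for _ in range(steps):
        violation = g(theta) - eps
        # Primal step: descend the Lagrangian in theta.
        theta = theta - lr_primal * (grad_f(theta) + lam * grad_g(theta))
        # Dual step: raise lam while the constraint is violated, clip at zero.
        lam = max(0.0, lam + lr_dual * violation)
        lam_history.append(lam)
    return theta, lam, lam_history

theta_star, lam_star, lam_history = primal_dual(
    grad_f=lambda t: 2.0 * (t - 2.0),   # toy objective f(theta) = (theta - 2)^2
    g=lambda t: t * t,                  # toy constraint g(theta) = theta^2 <= eps
    grad_g=lambda t: 2.0 * t,
    theta0=0.0,
    eps=1.0,
)
```

In this toy problem the unconstrained minimizer (theta = 2) violates the constraint, so the multiplier grows until the iterate settles on the boundary at theta = 1. The trajectory of `lam_history` is the kind of dual-variable dynamic that exposes how hard the constraint is pushing back.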



Article 1860

Title@2025-06-05 (4): Learning normalized image densities via dual score matching

Title: Learning normalized image densities via dual score matching Normalisierte Bilddichten durch Dual-Score-Matching lernen 通过双分比比对学习图像密度正常化 2506.05310v1

Authors (3): Florentin Guth, Zahra Kadkhodaie, Eero P Simoncelli

Learning probability models from data is at the heart of many machine learning endeavors, but is notoriously difficult due to the curse of dimensionality. We introduce a new framework for learning normalized energy (log probability) models that is inspired by diffusion generative models, which rely on networks optimized to estimate the score. We modify a score network architecture to compute an energy while preserving its inductive biases. The gradient of this energy network with respect to its input image is the score of the learned density, which can be optimized using a denoising objective. Importantly, the gradient with respect to the noise level provides an additional score that can be optimized with a novel secondary objective, ensuring consistent and normalized energies across noise levels. We train an energy network with this dual score matching objective on the ImageNet64 dataset, and obtain a cross-entropy (negative log likelihood) value comparable to the state of the art. We further validate our approach by showing that our energy model strongly generalizes: estimated log probabilities are nearly independent of the specific images in the training set. Finally, we demonstrate that both image probability and dimensionality of local neighborhoods vary significantly with image content, in contrast with traditional assumptions such as concentration of measure or support on a low-dimensional manifold.



Article 1861

Title@2025-06-05 (4): ProRefine: Inference-time Prompt Refinement with Textual Feedback

Title: ProRefine: Inference-time Prompt Refinement with Textual Feedback ProRefine: Inferenz-Zeit Prompt Verfeinerung mit Text-Feedback ProRefine: 用文字反馈迅速改进推论-时间 2506.05305v1

Authors (5): Deepak Pandita, Tharindu Cyril Weerasooriya, Ankit Parag Shah, Christopher M. Homan, Wei Wei

Agentic workflows, where multiple AI agents collaborate to accomplish complex tasks like reasoning or planning, are becoming increasingly prevalent. However, these workflows often suffer from error propagation and sub-optimal performance, largely due to poorly designed prompts that fail to effectively guide individual agents. This is a critical problem because it limits the reliability and scalability of these powerful systems. We introduce ProRefine, an innovative inference-time prompt optimization method that leverages textual feedback from large language models (LLMs) to address this challenge. ProRefine dynamically refines prompts for multi-step reasoning tasks without additional training or ground truth labels. Evaluated on five benchmark mathematical reasoning datasets, ProRefine significantly surpasses zero-shot Chain-of-Thought baselines by 3 to 37 percentage points. This approach not only boosts accuracy but also allows smaller models to match the performance of larger ones, highlighting its potential for efficient and scalable AI deployment, and democratizing access to high-performing AI.



Article 1862

Title@2025-06-05 (4): Power Law Guided Dynamic Sifting for Efficient Attention

Title: Power Law Guided Dynamic Sifting for Efficient Attention Power Law geführtes Dynamic Sifting für effiziente Aufmerksamkeit 有效关注动力法指导动态筛选 2506.05300v1

Authors (3): Nirav Koley, Prajwal Singhania, Abhinav Bhatele

Efficient inference on GPUs using large language models remains challenging due to memory bandwidth limitations, particularly during data transfers between High Bandwidth Memory (HBM) and SRAM in attention computations. Approximate attention methods address this issue by reducing computational and memory overhead but often rely on expensive top-$k$ operations, which perform poorly on GPUs. We propose SiftAttention, a novel approximate attention method that replaces the top-$k$ step with a computationally efficient element-wise filtering operation based on a threshold value. Our intuition for doing this is based on our empirical observation that the $\tau$-th quantile of attention scores follows a predictable power-law over sequential generation steps. Exploiting this insight, our approach dynamically estimates a threshold value per prompt at each generation step. Only attention scores above this threshold and their corresponding value vectors are loaded/used to compute the attention output, reducing data movement between HBM and SRAM. Our evaluation demonstrates that SiftAttention preserves model quality better than existing approximate attention methods while reducing memory bandwidth usage when loading value vectors.
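
The difference between top-$k$ selection and threshold filtering can be sketched in NumPy for a single query. This is a simplified CPU illustration, not the SiftAttention kernel: the power-law form `a * step**(-b)`, its parameters, the fallback to the largest weight, and the choice to leave the kept weights unrenormalised are all assumptions made for the example.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def power_law_threshold(step, a, b):
    """Hypothetical threshold schedule over generation steps: a * step**(-b)."""
    return a * step ** (-b)

def sifted_attention(q, K, V, threshold):
    """Single-query attention restricted to keys whose weight exceeds `threshold`."""
    weights = softmax(K @ q / np.sqrt(q.shape[0]))
    keep = weights > threshold           # element-wise comparison, no sort or top-k
    if not keep.any():                   # assumption: always keep the largest weight
        keep = weights == weights.max()
    # Only the selected rows of V are read; kept weights are used as-is.
    return weights[keep] @ V[keep], keep

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
K = rng.standard_normal((16, 8))
V = rng.standard_normal((16, 4))
full_out, full_mask = sifted_attention(q, K, V, threshold=0.0)
sparse_out, sparse_mask = sifted_attention(
    q, K, V, threshold=power_law_threshold(step=4, a=0.2, b=0.5)
)
```

The comparison `weights > threshold` is embarrassingly parallel, whereas top-$k$ needs a partial sort; on a GPU the saving described in the abstract comes from skipping the value vectors that the mask rules out.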



Article 1863

Title@2025-06-05 (4): Control Tax: The Price of Keeping AI in Check

Title: Control Tax: The Price of Keeping AI in Check Steuerregelung: Der Preis der AII-Kontrolle 控制税:检查人工智能检查价格 2506.05296v1

Authors (4): Mikhail Terekhov, Zhen Ning David Liu, Caglar Gulcehre, Samuel Albanie

The rapid integration of agentic AI into high-stakes real-world applications requires robust oversight mechanisms. The emerging field of AI Control (AIC) aims to provide such an oversight mechanism, but practical adoption depends heavily on implementation overhead. To study this problem better, we introduce the notion of Control Tax – the operational and financial cost of integrating control measures into AI pipelines. Our work makes three key contributions to the field of AIC: (1) we introduce a theoretical framework that quantifies the Control Tax and maps classifier performance to safety assurances; (2) we conduct comprehensive evaluations of state-of-the-art language models in adversarial settings, where attacker models insert subtle backdoors into code while monitoring models attempt to detect these vulnerabilities; and (3) we provide empirical financial cost estimates for control protocols and develop optimized monitoring strategies that balance safety and cost-effectiveness while accounting for practical constraints like auditing budgets. Our framework enables practitioners to make informed decisions by systematically connecting safety guarantees with their costs, advancing AIC through principled economic feasibility assessment across different deployment contexts.



Article 1864

Title: A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search Ein glattes Meer hat nie eine geschickte $\texttt{SAILOR}$: Robuste Imitation über das Lernen zu suchen 平滑的海海从未造出技能用$\ textt{ SAILOR}$: 通过学习搜索进行强力模仿 2506.05294v1

Authors (8): Arnav Kumar Jain, Vibhakar Mohta, Subin Kim, Atiksh Bhardwaj, Juntao Ren, Yunhai Feng, Sanjiban Choudhury, Gokul Swamy

The fundamental limitation of the behavioral cloning (BC) approach to imitation learning is that it only teaches an agent what the expert did at states the expert visited. This means that when a BC agent makes a mistake which takes them out of the support of the demonstrations, they often don’t know how to recover from it. In this sense, BC is akin to giving the agent the fish – giving them dense supervision across a narrow set of states – rather than teaching them to fish: to be able to reason independently about achieving the expert’s outcome even when faced with unseen situations at test-time. In response, we explore learning to search (L2S) from expert demonstrations, i.e. learning the components required to, at test time, plan to match expert outcomes, even after making a mistake. These include (1) a world model and (2) a reward model. We carefully ablate the set of algorithmic and design decisions required to combine these and other components for stable and sample/interaction-efficient learning of recovery behavior without additional human corrections. Across a dozen visual manipulation tasks from three benchmarks, our approach $\texttt{SAILOR}$ consistently out-performs state-of-the-art Diffusion Policies trained via BC on the same data. Furthermore, scaling up the amount of demonstrations used for BC by 5-10$\times$ still leaves a performance gap. We find that $\texttt{SAILOR}$ can identify nuanced failures and is robust to reward hacking. Our code is available at https://github.com/arnavkj1995/SAILOR .


Article 1865

Title@2025-06-05 (4): Learning Beyond Experience: Generalizing to Unseen State Space with Reservoir Computing

Title: Learning Beyond Experience: Generalizing to Unseen State Space with Reservoir Computing Learning Beyond Experience: Mit Reservoir Computing den Zustand des Weltraums verallgemeinern 超越经验的学习:利用储量计算,向不见得的国家空间推广 2506.05292v1

Authors (3): Declan A. Norton, Yuanzhao Zhang, Michelle Girvan

Machine learning techniques offer an effective approach to modeling dynamical systems solely from observed data. However, without explicit structural priors – built-in assumptions about the underlying dynamics – these techniques typically struggle to generalize to aspects of the dynamics that are poorly represented in the training data. Here, we demonstrate that reservoir computing – a simple, efficient, and versatile machine learning framework often used for data-driven modeling of dynamical systems – can generalize to unexplored regions of state space without explicit structural priors. First, we describe a multiple-trajectory training scheme for reservoir computers that supports training across a collection of disjoint time series, enabling effective use of available training data. Then, applying this training scheme to multistable dynamical systems, we show that RCs trained on trajectories from a single basin of attraction can achieve out-of-domain generalization by capturing system behavior in entirely unobserved basins.


Article 1866

Title@2025-06-05 (4): Stable Vision Concept Transformers for Medical Diagnosis

Title: Stable Vision Concept Transformers for Medical Diagnosis Stabile Vision Konzepttransformatoren für medizinische Diagnose 医学诊断的稳定远景构想变异器 2506.05286v1

Authors (6): Lijie Hu, Songning Lai, Yuan Hua, Shu Yang, Jingfeng Zhang, Di Wang

Transparency is a paramount concern in the medical field, prompting researchers to delve into the realm of explainable AI (XAI). Among these XAI methods, Concept Bottleneck Models (CBMs) aim to restrict the model’s latent space to human-understandable high-level concepts by generating a conceptual layer for extracting conceptual features, which has drawn much attention recently. However, existing methods rely solely on concept features to determine the model’s predictions, which overlook the intrinsic feature embeddings within medical images. To address this utility gap between the original models and concept-based models, we propose Vision Concept Transformer (VCT). Furthermore, despite their benefits, CBMs have been found to negatively impact model performance and fail to provide stable explanations when faced with input perturbations, which limits their application in the medical field. To address this faithfulness issue, this paper further proposes the Stable Vision Concept Transformer (SVCT) based on VCT, which leverages the vision transformer (ViT) as its backbone and incorporates a conceptual layer. SVCT employs conceptual features to enhance decision-making capabilities by fusing them with image features and ensures model faithfulness through the integration of Denoised Diffusion Smoothing. Comprehensive experiments on four medical datasets demonstrate that our VCT and SVCT maintain accuracy while remaining interpretable compared to baselines. Furthermore, even when subjected to perturbations, our SVCT model consistently provides faithful explanations, thus meeting the needs of the medical field.


Article 1867

Title@2025-06-05 (4): Fast-DataShapley: Neural Modeling for Training Data Valuation

Title: Fast-DataShapley: Neural Modeling for Training Data Valuation Fast-DataShapley: Neurale Modellierung für Trainingsdatenbewertung 快速数据库:培训数据评估神经模型 2506.05281v1

Authors (7): Haifeng Sun, Yu Xiong, Runze Wu, Xinyu Cai, Changjie Fan, Lan Zhang, Xiang-Yang Li

The value and copyright of training data are crucial in the artificial intelligence industry. Service platforms should protect data providers’ legitimate rights and fairly reward them for their contributions. Shapley value, a potent tool for evaluating contributions, outperforms other methods in theory, but its computational overhead escalates exponentially with the number of data providers. Recent works based on Shapley values attempt to mitigate computation complexity by approximation algorithms. However, they need to retrain for each test sample, leading to intolerable costs. We propose Fast-DataShapley, a one-pass training method that leverages the weighted least squares characterization of the Shapley value to train a reusable explainer model with real-time reasoning speed. Given new test samples, no retraining is required to calculate the Shapley values of the training data. Additionally, we propose three methods with theoretical guarantees to reduce training overhead from two aspects: the approximate calculation of the utility function and the group calculation of the training data. We analyze time complexity to show the efficiency of our methods. The experimental evaluations on various image datasets demonstrate superior performance and efficiency compared to baselines. Specifically, the performance is improved to more than 2.5 times, and the explainer’s training speed can be increased by two orders of magnitude.
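To make the exponential cost mentioned above concrete, here is a minimal sketch of exact Shapley valuation by coalition enumeration. It is not the Fast-DataShapley explainer and does not use the weighted least squares characterization; the provider names, the `GAINS` table, and `toy_utility` are hypothetical and stand in for "model performance when trained on a subset of providers".

```python
from itertools import combinations
from math import factorial

def shapley_values(players, utility):
    """Exact Shapley values; needs O(2^n) utility evaluations for n players."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k in the Shapley average.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when joining coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values

# Hypothetical per-provider gains (assumption for illustration only).
GAINS = {"A": 0.30, "B": 0.20, "C": 0.10}

def toy_utility(coalition):
    """Toy utility: additive gains, minus a penalty when A and B overlap."""
    u = sum(GAINS[p] for p in coalition)
    if "A" in coalition and "B" in coalition:
        u -= 0.10  # redundant data shared by A and B
    return u

phi = shapley_values(list(GAINS), toy_utility)
```

In this toy game the overlap penalty is split evenly between A and B, and the values sum to the utility of the full coalition (the efficiency property). Each added provider doubles the number of coalitions, which is the cost that approximation and amortized methods aim to avoid.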


Article 1868

Title@2025-06-05 (4): How to Unlock Time Series Editing? Diffusion-Driven Approach with Multi-Grained Control

Title: How to Unlock Time Series Editing? Diffusion-Driven Approach with Multi-Grained Control Wie man Zeitreihenbearbeitung entsperrt? Diffusion-getriebener Ansatz mit mehrkörniger Steuerung 如何解锁时间序列编辑 ? 2506.05276v1

Authors (7): Hao Yu, Chu Xin Cheng, Runlong Yu, Yuyang Ye, Shiwei Tong, Zhaofeng Liu, Defu Lian

Recent advances in time series generation have shown promise, yet controlling properties in generated sequences remains challenging. Time Series Editing (TSE), which makes precise modifications while preserving temporal coherence, must handle both point-level constraints and segment-level controls that current methods struggle to provide. We introduce the CocktailEdit framework to enable simultaneous, flexible control across different types of constraints. This framework combines two key mechanisms: a confidence-weighted anchor control for point-wise constraints and a classifier-based control for managing statistical properties such as sums and averages over segments. Our methods achieve precise local control during the denoising inference stage while maintaining temporal coherence and integrating seamlessly with any conditionally trained diffusion-based time series model. Extensive experiments across diverse datasets and models demonstrate its effectiveness. Our work bridges the gap between pure generative modeling and real-world time series editing needs, offering a flexible solution for human-in-the-loop time series generation and editing. The code and demo are provided for validation.


Article 1869

Title@2025-06-05 (4): Tight analyses of first-order methods with error feedback

Title: Tight analyses of first-order methods with error feedback Enge Analysen von First-Order-Methoden mit Fehlerrückmeldung 利用错误反馈对一级处理方法进行严格分析 2506.05271v1

Authors (3): Daniel Berg Thomsen, Adrien Taylor, Aymeric Dieuleveut

Communication between agents often constitutes a major computational bottleneck in distributed learning. One of the most common mitigation strategies is to compress the information exchanged, thereby reducing communication overhead. To counteract the degradation in convergence associated with compressed communication, error feedback schemes – most notably $\mathrm{EF}$ and $\mathrm{EF}^{21}$ – were introduced. In this work, we provide a tight analysis of both of these methods. Specifically, we find the Lyapunov function that yields the best possible convergence rate for each method – with matching lower bounds. This principled approach yields sharp performance guarantees and enables a rigorous, apples-to-apples comparison between $\mathrm{EF}$, $\mathrm{EF}^{21}$, and compressed gradient descent. Our analysis is carried out in a simplified yet representative setting, which allows for clean theoretical insights and fair comparison of the underlying mechanisms.
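As an illustration of the mechanism being analyzed, the following is a minimal single-worker sketch of classic error feedback ($\mathrm{EF}$) with a top-k compressor on a diagonal quadratic. It is a toy setup chosen for readability, not the paper's setting or its Lyapunov analysis; the objective, the step size, and the names `top_k`, `ef_gradient_descent`, and `CURV` are assumptions.

```python
def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]

def ef_gradient_descent(grad, x0, step, k, iters):
    """Compressed gradient descent with an error-feedback memory."""
    x = list(x0)
    err = [0.0] * len(x)  # compression error carried between rounds
    for _ in range(iters):
        g = grad(x)
        # Add the carried-over error to the scaled gradient before compressing.
        p = [e + step * gi for e, gi in zip(err, g)]
        msg = top_k(p, k)  # what would actually be communicated
        x = [xi - mi for xi, mi in zip(x, msg)]
        # Remember what the compressor dropped so it is applied later.
        err = [pi - mi for pi, mi in zip(p, msg)]
    return x

# Hypothetical objective: f(x) = 0.5 * sum(CURV[i] * x[i]**2), minimized at 0.
CURV = [1.0, 0.5, 0.25]

def quad_grad(x):
    return [a * xi for a, xi in zip(CURV, x)]

x_final = ef_gradient_descent(quad_grad, [1.0, 1.0, 1.0], step=0.02, k=1, iters=5000)
```

Only one coordinate is transmitted per round, yet the discarded part of each update is stored in `err` and re-sent later, so the iterate still approaches the minimizer at the origin.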


Article 1870

Title@2025-06-05 (4): Context is Key: A Benchmark for Forecasting with Essential Textual Information

Title: Context is Key: A Benchmark for Forecasting with Essential Textual Information Kontext ist der Schlüssel: Ein Benchmark für Vorhersagen mit wesentlichen Textinformationen 背景是关键:用基本文字信息预测的基准 2410.18959v4

Authors (11): Andrew Robert Williams, Arjun Ashok, Étienne Marcotte, Valentina Zantedeschi, Jithendaraa Subramanian, Roland Riachi, James Requeima, Alexandre Lacoste, Irina Rish, Nicolas Chapados, Alexandre Drouin

Forecasting is a critical task in decision-making across numerous domains. While historical numerical data provide a start, they fail to convey the complete context for reliable and accurate predictions. Human forecasters frequently rely on additional information, such as background knowledge and constraints, which can efficiently be communicated through natural language. However, in spite of recent progress with LLM-based forecasters, their ability to effectively integrate this textual information remains an open question. To address this, we introduce “Context is Key” (CiK), a time-series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context, requiring models to integrate both modalities; crucially, every task in CiK requires understanding textual context to be solved successfully. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters, and propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark. Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings. This benchmark aims to advance multimodal forecasting by promoting models that are both accurate and accessible to decision-makers with varied technical expertise. The benchmark can be visualized at https://servicenow.github.io/context-is-key-forecasting/v0/.


Article 1871

Title@2025-06-05 (4): Learning long range dependencies through time reversal symmetry breaking

Title: Learning long range dependencies through time reversal symmetry breaking Längere Abhängigkeiten durch Zeitumkehr-Symmetrie-Bruch lernen 通过时间逆向对称断裂,通过时间逆向对称断裂,学习远距离依赖性 2506.05259v1

Authors (2): Guillaume Pourcel, Maxence Ernoult

Deep State Space Models (SSMs) reignite physics-grounded compute paradigms, as RNNs could natively be embodied into dynamical systems. This calls for dedicated learning algorithms obeying core physical principles, with efficient techniques to simulate these systems and guide their design. We propose Recurrent Hamiltonian Echo Learning (RHEL), an algorithm which provably computes loss gradients as finite differences of physical trajectories of non-dissipative, Hamiltonian systems. In ML terms, RHEL only requires three “forward passes” irrespective of model size, without explicit Jacobian computation and without incurring any variance in the gradient estimation. Motivated by the physical realization of our algorithm, we first introduce RHEL in continuous time and demonstrate its formal equivalence with the continuous adjoint state method. To facilitate the simulation of Hamiltonian systems trained by RHEL, we propose a discrete-time version of RHEL which is equivalent to Backpropagation Through Time (BPTT) when applied to a class of recurrent modules which we call Hamiltonian Recurrent Units (HRUs). This setting allows us to demonstrate the scalability of RHEL by generalizing these results to hierarchies of HRUs, which we call Hamiltonian SSMs (HSSMs). We apply RHEL to train HSSMs with linear and nonlinear dynamics on a variety of time-series tasks ranging from mid-range to long-range classification and regression with sequence length reaching $\sim 50k$. We show that RHEL consistently matches the performance of BPTT across all models and tasks. This work opens new doors for the design of scalable, energy-efficient physical systems endowed with self-learning capabilities for sequence modelling.


Article 1872

Title@2025-06-05 (4): Continual Learning from Simulated Interactions via Multitask Prospective Rehearsal for Bionic Limb Behavior Modeling

Title: Continual Learning from Simulated Interactions via Multitask Prospective Rehearsal for Bionic Limb Behavior Modeling Kontinuierliches Lernen von Simulierten Interaktionen über Multitask Prospective Probe für Bionic Limb Behavior Modeling 通过多塔斯克双立架行为模拟模型预演预演预演,从模拟互动中不断学习 2405.01114v4

Authors (5): Sharmita Dey, Benjamin Paassen, Sarath Ravindran Nair, Sabri Boughorbel, Arndt F. Schilling

Lower limb amputations and neuromuscular impairments severely restrict mobility, necessitating advancements beyond conventional prosthetics. While motorized bionic limbs show promise, their effectiveness depends on replicating the dynamic coordination of human movement across diverse environments. In this paper, we introduce a model for human behavior in the context of bionic prosthesis control. Our approach leverages human locomotion demonstrations to learn the synergistic coupling of the lower limbs, enabling the prediction of the kinematic behavior of a missing limb during tasks such as walking and climbing inclines and stairs. We propose a multitasking, continually adaptive model that anticipates and refines movements over time. At the core of our method is a technique called multitask prospective rehearsal, which anticipates and synthesizes future movements based on the previous prediction and employs a corrective mechanism for subsequent predictions. Our evolving architecture merges lightweight, task-specific modules on a shared backbone, ensuring both specificity and scalability. We validate our model through experiments on real-world human gait datasets, including transtibial amputees, across a wide range of locomotion tasks. Results demonstrate that our approach consistently outperforms baseline models, particularly in scenarios with distributional shifts, adversarial perturbations, and noise.


Article 1873

Title@2025-06-05 (4): Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning

Title: Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning Konservative Klassifikatoren tun konsequent gut mit Verbesserung Agenten: Charakterisierung statistischer und Online-Lernen 保守的分类机构与改进机构保持一贯的很好:将统计和在线学习定性为特征 2506.05252v1

Authors (2): Dravyansh Sharma, Alec Sun

Machine learning is now ubiquitous in societal decision-making, for example in evaluating job candidates or loan applications, and it is increasingly important to take into account how classified agents will react to the learning algorithms. The majority of recent literature on strategic classification has focused on reducing and countering deceptive behaviors by the classified agents, but recent work of Attias et al. identifies surprising properties of learnability when the agents genuinely improve in order to attain the desirable classification, such as smaller generalization error than standard PAC-learning. In this paper we characterize so-called learnability with improvements across multiple new axes. We introduce an asymmetric variant of minimally consistent concept classes and use it to provide an exact characterization of proper learning with improvements in the realizable setting. While prior work studies learnability only under general, arbitrary agent improvement regions, we give positive results for more natural Euclidean ball improvement sets. In particular, we characterize improper learning under a mild generative assumption on the data distribution. We further show how to learn in more challenging settings, achieving lower generalization error under well-studied bounded noise models and obtaining mistake bounds in realizable and agnostic online learning. We resolve open questions posed by Attias et al. for both proper and improper learning.


Article 1874

Title@2025-06-05 (4): On the Convergence of Gradient Descent on Learning Transformers with Residual Connections

Title: On the Convergence of Gradient Descent on Learning Transformers with Residual Connections Über die Konvergenz des gradienten Abstiegs auf Lerntransformatoren mit residualen Verbindungen 关于有残余连接的学习变异器的 “ 渐渐后代 “ 趋同 2506.05249v1

Authors (3): Zhen Qin, Jinxin Zhou, Zhihui Zhu

Transformer models have emerged as fundamental tools across various scientific and engineering disciplines, owing to their outstanding performance in diverse applications. Despite this empirical success, the theoretical foundations of Transformers remain relatively underdeveloped, particularly in understanding their training dynamics. Existing research predominantly examines isolated components – such as self-attention mechanisms and feedforward networks – without thoroughly investigating the interdependencies between these components, especially when residual connections are present. In this paper, we aim to bridge this gap by analyzing the convergence behavior of a structurally complete yet single-layer Transformer, comprising self-attention, a feedforward network, and residual connections. We demonstrate that, under appropriate initialization, gradient descent exhibits a linear convergence rate, where the convergence speed is determined by the minimum and maximum singular values of the output matrix from the attention layer. Moreover, our analysis reveals that residual connections serve to ameliorate the ill-conditioning of this output matrix, an issue stemming from the low-rank structure imposed by the softmax operation, thereby promoting enhanced optimization stability. We also extend our theoretical findings to a multi-layer Transformer architecture, confirming the linear convergence rate of gradient descent under suitable initialization. Empirical results corroborate our theoretical insights, illustrating the beneficial role of residual connections in promoting convergence stability.


Article 1875

Title@2025-06-05 (4): Rethinking LLM Advancement: Compute-Dependent and Independent Paths to Progress

Title: Rethinking LLM Advancement: Compute-Dependent and Independent Paths to Progress LLM-Fortschritt neu denken: Berechenbare und unabhängige Wege zum Fortschritt 重新思考LLM进步:计算依赖性和独立前进之路 2505.04075v2

Authors (5): Jack Sanderson, Teddy Foley, Spencer Guo, Anqi Qu, Henry Josephson

Regulatory efforts to govern large language model (LLM) development have predominantly focused on restricting access to high-performance computational resources. This study evaluates the efficacy of such measures by examining whether LLM capabilities can advance through algorithmic innovation in compute-constrained environments. We propose a novel framework distinguishing compute-dependent innovations – which yield disproportionate benefits at high compute – from compute-independent innovations, which improve efficiency across compute scales. The impact is quantified using Compute-Equivalent Gain (CEG). Experimental validation with nanoGPT models confirms that compute-independent advancements yield significant performance gains (e.g., with combined CEG up to $3.5\times$) across the tested scales. In contrast, compute-dependent advancements were detrimental to performance at smaller experimental scales, but showed improved CEG (on par with the baseline) as model size increased, a trend consistent with their definition of yielding primary benefits at higher compute. Crucially, these findings indicate that restrictions on computational hardware, while potentially slowing LLM progress, are insufficient to prevent all capability gains driven by algorithmic advancements. We argue that effective AI oversight must therefore incorporate mechanisms for understanding, anticipating, and potentially guiding algorithmic research, moving beyond a singular focus on hardware. The proposed framework also serves as an analytical tool for forecasting AI progress.


Article 1876

Title@2025-06-05 (4): A theoretical framework for overfitting in energy-based modeling

Title: A theoretical framework for overfitting in energy-based modeling Ein theoretischer Rahmen für die Überrüstung in energiebasierte Modellierung 以能源为基础的建模的超套理论框架 2501.19158v3

Authors (4): Giovanni Catania, Aurélien Decelle, Cyril Furtlehner, Beatriz Seoane

We investigate the impact of limited data on training pairwise energy-based models for inverse problems aimed at identifying interaction networks. Utilizing the Gaussian model as a testbed, we dissect training trajectories across the eigenbasis of the coupling matrix, exploiting the independent evolution of eigenmodes and revealing that the learning timescales are tied to the spectral decomposition of the empirical covariance matrix. We see that optimal points for early stopping arise from the interplay between these timescales and the initial conditions of training. Moreover, we show that finite data corrections can be accurately modeled through asymptotic random matrix theory calculations and provide the counterpart of generalized cross-validation in the energy-based model context. Our analytical framework extends to binary-variable maximum-entropy pairwise models with minimal variations. These findings offer strategies to control overfitting in discrete-variable models through empirical shrinkage corrections, improving the management of overfitting in energy-based generative models. Finally, we propose a generalization to arbitrary energy-based models by deriving the neural tangent kernel dynamics of the score function under the score-matching algorithm.


Article 1877

Title@2025-06-05 (4): Robust Moment Identification for Nonlinear PDEs via a Neural ODE Approach

Title: Robust Moment Identification for Nonlinear PDEs via a Neural ODE Approach Robuste Moment-Identifikation für nichtlineare PDEs über einen neuralen ODE-Ansatz 通过神经极光模式方法确定非线性PDE的强劲势头 2506.05245v1

Authors (4): Shaoxuan Chen, Su Yang, Panayotis G. Kevrekidis, Wei Zhu

We propose a data-driven framework for learning reduced-order moment dynamics from PDE-governed systems using Neural ODEs. In contrast to derivative-based methods like SINDy, which necessitate densely sampled data and are sensitive to noise, our approach based on Neural ODEs directly models moment trajectories, enabling robust learning from sparse and potentially irregular time series. Using the nonlinear Schrödinger equation as an application platform, the framework accurately recovers governing moment dynamics when closure is available, even with limited and irregular observations. For systems without analytical closure, we introduce a data-driven coordinate transformation strategy based on Stiefel manifold optimization, enabling the discovery of low-dimensional representations in which the moment dynamics become closed, facilitating interpretable and reliable modeling. We also explore cases where a closure model is not known, such as a Fisher-KPP reaction-diffusion system. Here we demonstrate that Neural ODEs can still effectively approximate the unclosed moment dynamics and achieve superior extrapolation accuracy compared to physical-expert-derived ODE models. This advantage persists even under sparse and irregular sampling, highlighting the method’s robustness in data-limited settings. Our results highlight the Neural ODE framework as a powerful and flexible tool for learning interpretable, low-dimensional moment dynamics in complex PDE-governed systems.


Article 1878

Title@2025-06-05 (4): Aligning Latent Spaces with Flow Priors

Title: Aligning Latent Spaces with Flow Priors Latente Räume mit Flow Priors ausrichten 将局部空间与流动前置空间对齐 2506.05240v1

Authors (5): Yizhuo Li, Yuying Ge, Yixiao Ge, Ying Shan, Ping Luo

This paper presents a novel framework for aligning learnable latent spaces to arbitrary target distributions by leveraging flow-based generative models as priors. Our method first pretrains a flow model on the target features to capture the underlying distribution. This fixed flow model subsequently regularizes the latent space via an alignment loss, which reformulates the flow matching objective to treat the latents as optimization targets. We formally prove that minimizing this alignment loss establishes a computationally tractable surrogate objective for maximizing a variational lower bound on the log-likelihood of latents under the target distribution. Notably, the proposed method eliminates computationally expensive likelihood evaluations and avoids ODE solving during optimization. As a proof of concept, we demonstrate in a controlled setting that the alignment loss landscape closely approximates the negative log-likelihood of the target distribution. We further validate the effectiveness of our approach through large-scale image generation experiments on ImageNet with diverse target distributions, accompanied by detailed discussions and ablation studies. With both theoretical and empirical validation, our framework paves a new way for latent space alignment.


Article 1879

Title@2025-06-05 (4): Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit

Title: Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit Sparse Autoencoder bewerten: Vom shallow Design bis hin zum passenden Pursuit 评估粗略自动校对器:从浅色设计到匹配追求 2506.05239v1

Authors (5): Valérie Costa, Thomas Fel, Ekdeep Singh Lubana, Bahareh Tolooshams, Demba Ba

Sparse autoencoders (SAEs) have recently become central tools for interpretability, leveraging dictionary learning principles to extract sparse, interpretable features from neural representations whose underlying structure is typically unknown. This paper evaluates SAEs in a controlled setting using MNIST, which reveals that current shallow architectures implicitly rely on a quasi-orthogonality assumption that limits the ability to extract correlated features. To move beyond this, we introduce a multi-iteration SAE by unrolling Matching Pursuit (MP-SAE), enabling the residual-guided extraction of correlated features that arise in hierarchical settings such as handwritten digit generation while guaranteeing monotonic improvement of the reconstruction as more atoms are selected.
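The residual-guided selection and the monotonic reconstruction guarantee can be seen in plain Matching Pursuit. The sketch below is the classical algorithm on a fixed, hand-picked dictionary of two correlated unit-norm atoms; it is not the MP-SAE itself, which unrolls these iterations inside a trained autoencoder. The dictionary `atoms` and signal `x` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(x, atoms, n_steps):
    """Greedy sparse coding with unit-norm atoms.

    Returns the coefficients and the residual norm after each step.
    """
    residual = list(x)
    coeffs = [0.0] * len(atoms)
    norms = [math.sqrt(dot(residual, residual))]
    for _ in range(n_steps):
        # Pick the atom most correlated with the current residual.
        scores = [dot(residual, a) for a in atoms]
        j = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coeffs[j] += scores[j]
        # Remove that atom's contribution from the residual.
        residual = [r - scores[j] * aj for r, aj in zip(residual, atoms[j])]
        norms.append(math.sqrt(dot(residual, residual)))
    return coeffs, norms

# Two correlated unit-norm atoms (inner product 0.6); not orthogonal.
atoms = [[1.0, 0.0], [0.6, 0.8]]
x = [2.6, 0.8]  # equals 2 * atoms[0] + 1 * atoms[1]

coeffs, norms = matching_pursuit(x, atoms, n_steps=10)
```

Because each step subtracts the projection of the residual onto a unit-norm atom, the residual norm cannot increase. A single-pass encoder would read both atoms from `x` at once, whereas the iterative scheme re-scores against the residual, which is what lets it separate correlated atoms.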


Article 1880

Title@2025-06-05 (4): Timing is Important: Risk-aware Fund Allocation based on Time-Series Forecasting

Title: Timing is Important: Risk-aware Fund Allocation based on Time-Series Forecasting Timing ist wichtig: Risiko-aware Fund Allokation basierend auf Time-Series Forecasting 时间选择很重要:根据时间-系列预测进行有风险的基金分配 2505.24835v2

Authors (9): Fuyuan Lyu, Linfeng Du, Yunpeng Weng, Qiufang Ying, Zhiyan Xu, Wen Zou, Haolun Wu, Xiuqiang He, Xing Tang

Fund allocation has become an increasingly important problem in the financial domain. In practice, the goal is to allocate funds to buy certain assets within a given future period. Naive solutions such as prediction-only or Predict-then-Optimize approaches suffer from goal mismatch. Additionally, introducing a state-of-the-art (SOTA) time series forecasting model inevitably introduces additional uncertainty in the predicted result. To solve both problems, we introduce a Risk-aware Time-Series Predict-and-Allocate (RTS-PnO) framework, which makes no prior assumption about the forecasting model. The framework has three features: (i) end-to-end training with objective alignment measurement, (ii) adaptive forecasting uncertainty calibration, and (iii) agnosticism toward the forecasting model. RTS-PnO is evaluated in both online and offline experiments. For offline experiments, eight datasets from three categories of financial applications are used: Currency, Stock, and Cryptos. RTS-PnO consistently outperforms other competitive baselines. The online experiment is conducted on the Cross-Border Payment business at FiT, Tencent, where an 8.4% decrease in regret is observed compared with the product-line approach. The code for the offline experiment is available at https://github.com/fuyuanlyu/RTS-PnO.


Article 1881

Title: Blink of an eye: a simple theory for feature localization in generative models Blink des Auges: eine einfache Theorie zur Featurelokalisierung in generativen Modellen 眨眼:在基因模型中地貌化的简单理论 2502.00921v2

Authors (3): Marvin Li, Aayush Karan, Sitan Chen

Large language models can exhibit unexpected behavior in the blink of an eye. In a recent computer use demo, a language model switched from coding to Googling pictures of Yellowstone, and these sudden shifts in behavior have also been observed in reasoning patterns and jailbreaks. This phenomenon is not unique to autoregressive models: in diffusion models, key features of the final output are decided in narrow “critical windows” of the generation process. In this work we develop a simple, unifying theory to explain this phenomenon using the formalism of stochastic localization samplers. We show that it emerges generically as the generation process localizes to a sub-population of the distribution it models. While critical windows have been studied at length in diffusion models, existing theory heavily relies on strong distributional assumptions and the particulars of Gaussian diffusion. In contrast to existing work, our theory (1) applies to autoregressive and diffusion models; (2) makes no distributional assumptions; (3) quantitatively improves previous bounds even when specialized to diffusions; and (4) requires basic tools and no stochastic calculus or statistical-physics-based machinery. We also identify an intriguing connection to the all-or-nothing phenomenon from statistical inference. Finally, we validate our predictions empirically for LLMs and find that critical windows often coincide with failures in problem solving for various math and reasoning benchmarks.


Article 1882

Title@2025-06-05 (4): Stochastic Poisson Surface Reconstruction with One Solve using Geometric Gaussian Processes

Title: Stochastic Poisson Surface Reconstruction with One Solve using Geometric Gaussian Processes Stochastische Poisson-Oberflächenrekonstruktion mit einer Lösung mit geometrischen Gauß-Prozessen 利用几何高斯进程进行一索一索的斯托查斯·普瓦松地面重建 2503.19136v2

Authors (4): Sidhanth Holalkere, David S. Bindel, Silvia Sellán, Alexander Terenin

Poisson Surface Reconstruction is a widely used algorithm for reconstructing a surface from an oriented point cloud. To facilitate applications where only partial surface information is available, or scanning is performed sequentially, a recent line of work proposes to incorporate uncertainty into the reconstructed surface via Gaussian process models. The resulting algorithms first perform Gaussian process interpolation, then solve a set of volumetric partial differential equations globally in space, resulting in a computationally expensive two-stage procedure. In this work, we apply recently developed techniques from geometric Gaussian processes to combine interpolation and surface reconstruction into a single stage, requiring only one linear solve per sample. The resulting reconstructed surface samples can be queried locally in space, without the use of problem-dependent volumetric meshes or grids. These capabilities enable one to (a) perform probabilistic collision detection locally around the region of interest, (b) perform ray casting without evaluating points not on the ray’s trajectory, and (c) perform next-view planning on a per-ray basis. Unlike prior methods, they also do not require one to approximate kernel matrix inverses with diagonal matrices as part of intermediate computations. Results show that our approach provides a cleaner, more principled, and more flexible stochastic surface reconstruction pipeline.


Article 1883

Title@2025-06-05 (4): MesaNet: Sequence Modeling by Locally Optimal Test-Time Training

Title: MesaNet: Sequence Modeling by Locally Optimal Test-Time Training MesaNet: Sequenzmodellierung durch lokal optimale Test-Time-Schulung MesaNet:通过当地最佳试验时间培训进行序列建模 2506.05233v1

Authors (16): Johannes von Oswald, Nino Scherrer, Seijin Kobayashi, Luca Versari, Songlin Yang, Maximilian Schlegel, Kaitlin Maile, Yanick Schimpf, Oliver Sieberling, Alexander Meulemans, Rif A. Saurous, Guillaume Lajoie, Charlotte Frenkel, Razvan Pascanu, Blaise Agüera y Arcas, João Sacramento

Sequence modeling is currently dominated by causal transformer architectures that use softmax self-attention. Although widely adopted, transformers require memory and compute that scale linearly with sequence length during inference. A recent stream of work linearized the softmax operation, resulting in powerful recurrent neural network (RNN) models with constant memory and compute costs such as DeltaNet, Mamba or xLSTM. These models can be unified by noting that their recurrent layer dynamics can all be derived from an in-context regression objective, approximately optimized through an online learning rule. Here, we join this line of work and introduce a numerically stable, chunkwise parallelizable version of the recently proposed Mesa layer (von Oswald et al., 2024), and study it in language modeling at the billion-parameter scale. This layer again stems from an in-context loss, which is now minimized to optimality at every time point using a fast conjugate gradient solver. Through an extensive suite of experiments, we show that optimal test-time training enables reaching lower language modeling perplexity and higher downstream benchmark performance than previous RNNs, especially on tasks requiring long context understanding. This performance gain comes at the cost of additional FLOPs spent during inference time. Our results are therefore intriguingly related to recent trends of increasing test-time compute to improve performance – here by spending compute to solve sequential optimization problems within the neural network itself.
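
To make the idea of minimizing an in-context regression loss with conjugate gradient more concrete, here is a minimal sketch in plain Python. It assumes a ridge-regularized least-squares objective over the key/value pairs seen so far; the abstract does not spell out the exact loss, and the names `mesa_readout` and `lam` are hypothetical. It is not the chunkwise-parallel implementation described in the paper.

```python
def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def conjugate_gradient(A, b, iters=50, tol=1e-12):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x starting at zero
    p = list(r)  # current search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def mesa_readout(keys, values, query, lam=1.0):
    """Assumed objective: min_W sum_i ||v_i - W k_i||^2 + lam ||W||^2.

    The optimal W applied to the query equals G (H + lam I)^{-1} q,
    with H = sum_i k_i k_i^T and G = sum_i v_i k_i^T.
    """
    d = len(query)
    H = [[sum(k[a] * k[b] for k in keys) + (lam if a == b else 0.0)
          for b in range(d)] for a in range(d)]
    x = conjugate_gradient(H, query)  # x = (H + lam I)^{-1} q
    dv = len(values[0])
    G = [[sum(v[a] * k[b] for k, v in zip(keys, values))
          for b in range(d)] for a in range(dv)]
    return matvec(G, x)
```

The linear solve is the extra inference-time cost the abstract refers to: each readout runs several conjugate gradient iterations instead of a single matrix-vector product.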


Article 1884

Title@2025-06-05 (4): Causal Discovery from Conditionally Stationary Time Series

Title: Causal Discovery from Conditionally Stationary Time Series Causal Discovery aus Conditionally Stationary Time Series 从有条件的静止时间序列中发现的因果发现 2110.06257v4

Authors (7): Carles Balsells-Rodas, Xavier Sumba, Tanmayee Narendra, Ruibo Tu, Gabriele Schweikert, Hedvig Kjellstrom, Yingzhen Li

Causal discovery, i.e., inferring underlying causal relationships from observational data, is highly challenging for AI systems. In a time series modeling context, traditional causal discovery methods mainly consider constrained scenarios with fully observed variables and/or data from stationary time series. We develop a causal discovery approach to handle a wide class of nonstationary time series that are conditionally stationary, where the nonstationary behaviour is modeled as stationarity conditioned on a set of latent state variables. Named State-Dependent Causal Inference (SDCI), our approach is able to recover the underlying causal dependencies, with provable identifiability for the state-dependent causal structures. Empirical experiments on nonlinear particle interaction data and gene regulatory networks demonstrate SDCI’s superior performance over baseline causal discovery methods. Improved results over non-causal RNNs on modeling NBA player movements demonstrate the potential of our method and motivate the use of causality-driven methods for forecasting.


Article 1885

Title@2025-06-05 (4): Progressive Tempering Sampler with Diffusion

Title: Progressive Tempering Sampler with Diffusion Progressive Temperierung mit Diffusion 具有扩散作用的渐进性温度采样器 2506.05231v1

Authors (7): Severi Rissanen, RuiKang OuYang, Jiajun He, Wenlin Chen, Markus Heinonen, Arno Solin, José Miguel Hernández-Lobato

Recent research has focused on designing neural samplers that amortize the process of sampling from unnormalized densities. However, despite significant advancements, they still fall short of the state-of-the-art MCMC approach, Parallel Tempering (PT), when it comes to the efficiency of target evaluations. On the other hand, unlike a well-trained neural sampler, PT yields only dependent samples and needs to be rerun – at considerable computational cost – whenever new samples are required. To address these weaknesses, we propose the Progressive Tempering Sampler with Diffusion (PTSD), which trains diffusion models sequentially across temperatures, leveraging the advantages of PT to improve the training of neural samplers. We also introduce a novel method to combine high-temperature diffusion models to generate approximate lower-temperature samples, which are minimally refined using MCMC and used to train the next diffusion model. PTSD enables efficient reuse of sample information across temperature levels while generating well-mixed, uncorrelated samples. Our method significantly improves target evaluation efficiency, outperforming diffusion-based neural samplers.


Article 1886

Title@2025-06-05 (4): Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts

Title: Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts Diagonales Batching löst Parallelismus in recurrenten Speichertransformatoren für lange Kontexte auf 对角对角拔 解锁长期常用内存变换器中的平行主义 2506.05229v1

Authors (5): Danil Sivtsov, Ivan Rodkin, Gleb Kuzmin, Yuri Kuratov, Ivan Oseledets

Transformer models struggle with long-context inference due to their quadratic time and linear memory complexity. Recurrent Memory Transformers (RMTs) offer a solution by reducing the asymptotic cost to linear time and constant memory usage. However, their memory update mechanism leads to sequential execution, causing a performance bottleneck. We introduce Diagonal Batching, a scheduling scheme that unlocks parallelism across segments in RMTs while preserving exact recurrence. This approach eliminates the sequential constraint, enabling efficient GPU inference even for single long-context inputs without complex batching and pipelining techniques. Because the technique is purely a run-time computation reordering, existing RMT models adopt it with no retraining. Applied to a LLaMA-1B ARMT model, Diagonal Batching yields a 3.3x speedup over standard full-attention LLaMA-1B and a 1.8x speedup over the sequential RMT implementation on 131,072-token sequences. By removing the sequential bottleneck, Diagonal Batching reduces inference cost and latency, thereby strengthening RMTs as a practical solution for real-world, long-context applications.
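
The scheduling idea can be sketched as a wavefront over a grid of (segment, layer) cells. The sketch below assumes that a cell depends only on the same segment at the previous layer and on the previous segment at the same layer; this dependency pattern is an assumption for illustration, and `diagonal_schedule` is a hypothetical helper, not code from the paper.

```python
def diagonal_schedule(num_segments, num_layers):
    """Group (segment, layer) cells into waves where segment + layer is constant.

    Under the assumed dependencies, every cell in a wave only needs results
    from the previous wave, so the cells of one wave can be batched together.
    """
    schedule = []
    for d in range(num_segments + num_layers - 1):
        wave = [(s, d - s) for s in range(num_segments)
                if 0 <= d - s < num_layers]
        schedule.append(wave)
    return schedule


if __name__ == "__main__":
    for step, wave in enumerate(diagonal_schedule(4, 3)):
        print(step, wave)
```

With S segments and L layers, the number of sequential steps in this sketch is S + L - 1 rather than S * L, while each cell is still computed exactly once.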


Article 1887

Title@2025-06-05 (4): The Lessons of Developing Process Reward Models in Mathematical Reasoning

Title: The Lessons of Developing Process Reward Models in Mathematical Reasoning Die Lehren aus der Entwicklung von Prozess-Reward-Modellen in mathematischer Reasoning 数学理由中发展进程奖励模型的经验教训 2501.07301v2

Authors (9): Zhenru Zhang, Chujie Zheng, Yangzhen Wu, Beichen Zhang, Runji Lin, Bowen Yu, Dayiheng Liu, Jingren Zhou, Junyang Lin

Process Reward Models (PRMs) emerge as a promising approach for process supervision in mathematical reasoning of Large Language Models (LLMs), which aim to identify and mitigate intermediate errors in the reasoning processes. However, the development of effective PRMs faces significant challenges, particularly in data annotation and evaluation methodologies. In this paper, through extensive experiments, we demonstrate that commonly used Monte Carlo (MC) estimation-based data synthesis for PRMs typically yields inferior performance and generalization compared to LLM-as-a-judge and human annotation methods. MC estimation relies on completion models to evaluate current-step correctness, leading to inaccurate step verification. Furthermore, we identify potential biases in conventional Best-of-N (BoN) evaluation strategies for PRMs: (1) The unreliable policy models generate responses with correct answers but flawed processes, leading to a misalignment between the evaluation criteria of BoN and the PRM objectives of process verification. (2) The tolerance of PRMs for such responses leads to inflated BoN scores. (3) Existing PRMs have a significant proportion of minimum scores concentrated on the final answer steps, revealing the shift from process to outcome-based assessment in BoN-optimized PRMs. To address these challenges, we develop a consensus filtering mechanism that effectively integrates MC estimation with LLM-as-a-judge and advocate a more comprehensive evaluation framework that combines response-level and step-level metrics. Based on these mechanisms, we significantly improve both model performance and data efficiency in the BoN evaluation and the step-wise error identification task. Finally, we release a new state-of-the-art PRM that outperforms existing open-source alternatives and provide practical guidelines for future research in building process supervision models.
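
One way to picture consensus filtering is to keep a training example only when two independent annotation sources agree. The sketch below assumes each example records the index of the first erroneous step according to MC estimation and according to an LLM judge (`None` meaning no error found); the field names and the agreement criterion are hypothetical, since the abstract does not specify them.

```python
def consensus_filter(examples):
    """Keep examples whose MC-based and judge-based annotations agree.

    Each example is a dict with the hypothetical keys "mc_first_error" and
    "judge_first_error", holding a step index or None.
    """
    return [ex for ex in examples
            if ex["mc_first_error"] == ex["judge_first_error"]]


if __name__ == "__main__":
    data = [
        {"id": "a", "mc_first_error": 2, "judge_first_error": 2},
        {"id": "b", "mc_first_error": None, "judge_first_error": 3},
        {"id": "c", "mc_first_error": None, "judge_first_error": None},
    ]
    print([ex["id"] for ex in consensus_filter(data)])
```

The trade-off in such a filter is fewer but cleaner labels, which is in line with the data-efficiency claim in the abstract.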


Article 1888

Title@2025-06-05 (4): A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values

Title: A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values Ein einheitliches Rahmenwerk für wahrscheinlich effiziente Algorithmen zur Schätzung von Shapley-Werten 用于估计损耗值的可预见高效算法统一框架 2506.05216v1

Authors (10): Tyler Chen, Akshay Seshadri, Mattia J. Villani, Pradeep Niroula, Shouvanik Chakrabarti, Archan Ray, Pranav Deshpande, Romina Yalovetzky, Marco Pistoia, Niraj Kumar

Shapley values have emerged as a critical tool for explaining which features impact the decisions made by machine learning models. However, computing exact Shapley values is difficult, generally requiring an exponential (in the feature dimension) number of model evaluations. To address this, many model-agnostic randomized estimators have been developed, the most influential and widely used being the KernelSHAP method (Lundberg & Lee, 2017). While related estimators such as unbiased KernelSHAP (Covert & Lee, 2021) and LeverageSHAP (Musco & Witter, 2025) are known to satisfy theoretical guarantees, bounds for KernelSHAP have remained elusive. We describe a broad and unified framework that encompasses KernelSHAP and related estimators constructed using both with- and without-replacement sampling strategies. We then prove strong non-asymptotic theoretical guarantees that apply to all estimators from our framework. This provides, to the best of our knowledge, the first theoretical guarantees for KernelSHAP and sheds further light on tradeoffs between existing estimators. Through comprehensive benchmarking on small- and medium-dimensional datasets for Decision-Tree models, we validate our approach against exact Shapley values, consistently achieving low mean squared error with modest sample sizes. Furthermore, we make specific implementation improvements to enable scalability of our methods to high-dimensional datasets. Our methods, tested on datasets such as MNIST and CIFAR10, provide consistently better results compared to the KernelSHAP library.
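
To see why exact computation is exponential in the number of features, the following sketch evaluates the standard Shapley formula by enumerating every subset. The function name `exact_shapley` and the toy value functions are illustrative assumptions; this is the brute-force baseline that sampling-based estimators approximate, not any of the estimators analysed in the paper.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, n):
    """Exact Shapley values for a set function `value` over players 0..n-1.

    `value` takes a frozenset of player indices and returns a number.
    The loop visits all 2^(n-1) subsets per player.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that excludes player i.
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += w * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    weights = [1.0, 2.0, 3.0]
    # For an additive game, each player's Shapley value is its own weight.
    print(exact_shapley(lambda s: sum(weights[j] for j in s), 3))
```

In a feature-attribution setting, `value(S)` would be the model output with only the features in `S` present, so every call is a model evaluation.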


Article 1889

Title@2025-06-05 (4): Learning Theory of Decentralized Robust Kernel-Based Learning Algorithm

Title: Learning Theory of Decentralized Robust Kernel-Based Learning Algorithm Lerntheorie des dezentralisierten robusten Kernel-basierten Lernalgorithmus 分散的强力核心学习学习理论 2506.05215v1

Authors (1): Zhan Yu

We propose a new decentralized robust kernel-based learning algorithm within the framework of reproducing kernel Hilbert space (RKHS) by utilizing a networked system that can be represented as a connected graph. The robust loss function $\mathcal{L}_\sigma$ induced by a windowing function $W$ and a robustness scaling parameter $\sigma>0$, can encompass a broad spectrum of robust losses. Consequently, the proposed algorithm effectively provides a unified decentralized learning framework for robust regression, which fundamentally differs from the existing distributed robust kernel learning schemes, all of which are divide-and-conquer based. We rigorously establish the learning theory and offer a comprehensive convergence analysis for the algorithm. We show each local robust estimator generated from the decentralized algorithm can be utilized to approximate the regression function. Based on kernel-based integral operator techniques, we derive general high confidence convergence bounds for each local approximating sequence in terms of the mean square distance, RKHS norm, and generalization error, respectively. Moreover, we provide rigorous selection rules for local sample size and show that, under properly selected step size and scaling parameter $\sigma$, the decentralized robust algorithm can achieve optimal learning rates (up to logarithmic factors) in both norms. The parameter $\sigma$ is shown to be essential for enhancing robustness while also ensuring favorable convergence behavior. The intrinsic connection among decentralization, sample selection, robustness of the algorithm, and its convergence is clearly reflected.


Article 1890

Title@2025-06-05 (4): Mitigating Degree Bias Adaptively with Hard-to-Learn Nodes in Graph Contrastive Learning

Title: Mitigating Degree Bias Adaptively with Hard-to-Learn Nodes in Graph Contrastive Learning Degree Bias im Graph Contrastive Learning adaptiv mit schwer zu erlernenden Knoten abmildern 与图表对比学习中难以识别的节点相适应 2506.05214v1

Authors (5): Jingyu Hu, Hongbo Bo, Jun Hong, Xiaowei Liu, Weiru Liu

Graph Neural Networks (GNNs) often suffer from degree bias in node classification tasks, where prediction performance varies across nodes with different degrees. Several approaches, which adopt Graph Contrastive Learning (GCL), have been proposed to mitigate this bias. However, the limited number of positive pairs and the equal weighting of all positives and negatives in GCL still lead to low-degree nodes acquiring insufficient and noisy information. This paper proposes the Hardness Adaptive Reweighted (HAR) contrastive loss to mitigate degree bias. It adds more positive pairs by leveraging node labels and adaptively weights positive and negative pairs based on their learning hardness. In addition, we develop an experimental framework named SHARP to extend HAR to a broader range of scenarios. Both our theoretical analysis and experiments validate the effectiveness of SHARP. The experimental results across four datasets show that SHARP achieves better performance against baselines at both global and degree levels.


Article 1891

Title@2025-06-05 (4): Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data

Title: Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data Verstärktes Lernen für individuelle Optimale Politik aus heterogenen Daten 利用不同不同数据为个人最佳政策强化学习 2505.09496v2

Authors (3): Rui Miao, Babak Shahbaba, Annie Qu

Offline reinforcement learning (RL) aims to find optimal policies in dynamic environments in order to maximize the expected total rewards by leveraging pre-collected data. Learning from heterogeneous data is one of the fundamental challenges in offline RL. Traditional methods focus on learning an optimal policy for all individuals with pre-collected data from a single episode or homogeneous batch episodes, and thus, may result in a suboptimal policy for a heterogeneous population. In this paper, we propose an individualized offline policy optimization framework for heterogeneous time-stationary Markov decision processes (MDPs). The proposed heterogeneous model with individual latent variables enables us to efficiently estimate the individual Q-functions, and our Penalized Pessimistic Personalized Policy Learning (P4L) algorithm guarantees a fast rate on the average regret under a weak partial coverage assumption on behavior policies. In addition, our simulation studies and a real data application demonstrate the superior numerical performance of the proposed method compared with existing methods.


Article 1892

Title@2025-06-05 (4): Unleashing The Power of Pre-Trained Language Models for Irregularly Sampled Time Series

Title: Unleashing The Power of Pre-Trained Language Models for Irregularly Sampled Time Series Lösen der Macht von vortrainierten Sprachmodellen für unregelmäßig gemusterte Zeitreihen 为不定期抽样抽样时间序列释放培训前语言模型的力量 2408.08328v2

Authors (4): Weijia Zhang, Chenlong Yin, Hao Liu, Hui Xiong

Pre-trained Language Models (PLMs), such as ChatGPT, have significantly advanced the field of natural language processing. This progress has inspired a series of innovative studies that explore the adaptation of PLMs to time series analysis, intending to create a unified foundation model that addresses various time series analytical tasks. However, these efforts predominantly focus on Regularly Sampled Time Series (RSTS), neglecting the unique challenges posed by Irregularly Sampled Time Series (ISTS), which are characterized by uneven sampling intervals and prevalent missing data. To bridge this gap, this work takes the first step in exploring the potential of PLMs for ISTS analysis. We begin by investigating the effect of various methods for representing ISTS, aiming to maximize the efficacy of PLMs in the analysis. Furthermore, we propose a unified PLM-based framework, named ISTS-PLM, to address diverse ISTS analytical tasks. It integrates novel time-aware and variable-aware PLMs tailored to tackle the intractable intra- and inter-time series modeling in ISTS. Finally, extensive experiments on a comprehensive benchmark demonstrate that the ISTS-PLM, utilizing a structured and effective series-based representation for ISTS, consistently achieves state-of-the-art performance across various analytical tasks, such as classification, interpolation, extrapolation, few-shot and zero-shot learning scenarios, spanning scientific domains like healthcare, biomechanics, and climate science.


Article 1893

Title@2025-06-05 (4): The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Title: The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text The Common Pile v0.1: Ein 8TB-Datensatz von Public Domain und Openly Licensed Text Pile v0.1:公共域和公开许可文本的 8TB数据集 2506.05209v1

Authors (27): Nikhil Kandpal, Brian Lester, Colin Raffel, Sebastian Majstorovic, Stella Biderman, Baber Abbasi, Luca Soldaini, Enrico Shippole, A. Feder Cooper, Aviya Skowron, John Kirchenbauer, Shayne Longpre, Lintang Sutawika, Alon Albalak, Zhenlin Xu, Guilherme Penedo, Loubna Ben Allal, Elie Bakouch, John David Pressman, Honglu Fan, Dashiell Stander, Guangyu Song, Aaron Gokaslan, Tom Goldstein, Brian R. Bartoldson, Bhavya Kailkhura, Tyler Murray

Large language models (LLMs) are typically trained on enormous quantities of unlicensed text, a practice that has led to scrutiny due to possible intellectual property infringement and ethical concerns. Training LLMs on openly licensed text presents a first step towards addressing these issues, but prior data collection efforts have yielded datasets too small or low-quality to produce performant LLMs. To address this gap, we collect, curate, and release the Common Pile v0.1, an eight terabyte collection of openly licensed text designed for LLM pretraining. The Common Pile comprises content from 30 sources that span diverse domains including research papers, code, books, encyclopedias, educational materials, audio transcripts, and more. Crucially, we validate our efforts by training two 7 billion parameter LLMs on text from the Common Pile: Comma v0.1-1T and Comma v0.1-2T, trained on 1 and 2 trillion tokens respectively. Both models attain competitive performance to LLMs trained on unlicensed text with similar computational budgets, such as Llama 1 and 2 7B. In addition to releasing the Common Pile v0.1 itself, we also release the code used in its creation as well as the training mixture and checkpoints for the Comma v0.1 models.


Article 1894

Title@2025-06-05 (4): One Wave To Explain Them All: A Unifying Perspective On Feature Attribution

Title: One Wave To Explain Them All: A Unifying Perspective On Feature Attribution Eine Welle, um sie alle zu erklären: Eine verbindende Perspektive auf Feature Attribution 一波一波解释全部:一个统一化的地物归属观 2410.01482v2

Authors (4): Gabriel Kasmi, Amandine Brunetto, Thomas Fel, Jayneel Parekh

Feature attribution methods aim to improve the transparency of deep neural networks by identifying the input features that influence a model’s decision. Pixel-based heatmaps have become the standard for attributing features to high-dimensional inputs, such as images, audio representations, and volumes. While intuitive and convenient, these pixel-based attributions fail to capture the underlying structure of the data. Moreover, the choice of domain for computing attributions has often been overlooked. This work demonstrates that the wavelet domain allows for informative and meaningful attributions. It handles any input dimension and offers a unified approach to feature attribution. Our method, the Wavelet Attribution Method (WAM), leverages the spatial and scale-localized properties of wavelet coefficients to provide explanations that capture both the where and what of a model’s decision-making process. We show that WAM quantitatively matches or outperforms existing gradient-based methods across multiple modalities, including audio, images, and volumes. Additionally, we discuss how WAM bridges attribution with broader aspects of model robustness and transparency. Project page: https://gabrielkasmi.github.io/wam/


Article 1895

Title@2025-06-05 (4): Trustworthiness Preservation by Copies of Machine Learning Systems

Title: Trustworthiness Preservation by Copies of Machine Learning Systems Vertrauenswürdigkeit Erhaltung durch Kopien von Machine Learning Systemen 机器学习系统副本保存可信赖性 2506.05203v1

Authors (2): Leonardo Ceragioli, Giuseppe Primiero

A common practice of ML systems development concerns the training of the same model under different data sets, and the use of the same (training and test) sets for different learning models. The first case is a desirable practice for identifying high quality and unbiased training conditions. The latter case coincides with the search for optimal models under a common dataset for training. These differently obtained systems have been considered akin to copies. In the quest for responsible AI, a legitimate but hardly investigated question is how to verify that trustworthiness is preserved by copies. In this paper we introduce a calculus to model and verify probabilistic complex queries over data and define four distinct notions: Justifiably, Equally, Weakly and Almost Trustworthy, which can be checked by analysing the (partial) behaviour of the copy with respect to its original. We provide a study of the relations between these notions of trustworthiness, and how they compose with each other and under logical operations. The aim is to offer a computational tool to check the trustworthiness of possibly complex systems copied from an original whose behaviour is known.


Article 1896

Title@2025-06-05 (4): Transformers Meet In-Context Learning: A Universal Approximation Theory

Title: Transformers Meet In-Context Learning: A Universal Approximation Theory Transformer treffen auf In-Context Learning: Eine universelle Annäherungstheorie 变革者满足内文学习:通用近似理论 2506.05200v1

Authors (5): Gen Li, Yuchen Jiao, Yu Huang, Yuting Wei, Yuxin Chen

Modern large language models are capable of in-context learning, the ability to perform new tasks at inference time using only a handful of input-output examples in the prompt, without any fine-tuning or parameter updates. We develop a universal approximation theory to better understand how transformers enable in-context learning. For any class of functions (each representing a distinct task), we demonstrate how to construct a transformer that, without any further weight updates, can perform reliable prediction given only a few in-context examples. In contrast to much of the recent literature that frames transformers as algorithm approximators – i.e., constructing transformers to emulate the iterations of optimization algorithms as a means to approximate solutions of learning problems – our work adopts a fundamentally different approach rooted in universal function approximation. This alternative approach offers approximation guarantees that are not constrained by the effectiveness of the optimization algorithms being approximated, thereby extending far beyond convex problems and linear function classes. Our construction sheds light on how transformers can simultaneously learn general-purpose representations and adapt dynamically to in-context examples.


Article 1897

Title@2025-06-05 (4): Quantifying Cross-Modality Memorization in Vision-Language Models

Title: Quantifying Cross-Modality Memorization in Vision-Language Models Quantifizierung der Cross-Modality-Memorisierung in Vision-Language-Modellen 设想语言模型中量化跨模式的计量 2506.05198v1

Authors (6): Yuxin Wen, Yangsibo Huang, Tom Goldstein, Ravi Kumar, Badih Ghazi, Chiyuan Zhang

Understanding what and how neural networks memorize during training is crucial, both from the perspective of unintentional memorization of potentially sensitive information and from the standpoint of effective knowledge acquisition for real-world, knowledge-intensive tasks. While previous studies primarily investigate memorization within a single modality, such as text memorization in large language models or image memorization in diffusion models, unified multimodal models are becoming increasingly prevalent in practical applications. In this work, we focus on the unique characteristics of cross-modality memorization and conduct a systematic study centered on vision-language models. To facilitate controlled experiments, we first introduce a synthetic persona dataset comprising diverse synthetic person images and textual descriptions. We quantify factual knowledge memorization and cross-modal transferability by training models on a single modality and evaluating their performance in the other. Our results reveal that facts learned in one modality transfer to the other, but a significant gap exists between recalling information in the source and target modalities. Furthermore, we observe that this gap exists across various scenarios, including more capable models, machine unlearning, and the multi-hop case. Finally, we propose a baseline method to mitigate this challenge. We hope our study can inspire future research on developing more robust multimodal learning techniques to enhance cross-modal transferability.


Article 1898

Title@2025-06-05 (4): Abnormal component analysis

Title: Abnormal component analysis Abnorme Komponentenanalyse 异常元件分析 2312.16139v2

Authors (3): Romain Valla, Pavlo Mozharovskyi, Florence d’Alché-Buc

At the crossroads of machine learning and data analysis, anomaly detection aims at identifying observations that exhibit abnormal behaviour. Be it measurement errors, disease development, severe weather, production quality default(s) (items) or failed equipment, financial frauds or crisis events, their on-time identification and isolation constitute an important task in almost any area of industry and science. While a substantial body of literature is devoted to the detection of anomalies, little attention is paid to their explanation. This is mostly due to the intrinsically non-supervised nature of the task and the non-robustness of exploratory methods like principal component analysis (PCA). We introduce a new statistical tool dedicated to the exploratory analysis of abnormal observations using data depth as a score. Abnormal component analysis (ACA for short) is a method that searches for a low-dimensional data representation that best visualises and explains anomalies. This low-dimensional representation not only allows one to distinguish groups of anomalies better than state-of-the-art methods, but also provides a – linear in variables and thus easily interpretable – explanation for anomalies. In a comparative simulation and real-data study, ACA also proves advantageous for anomaly analysis with respect to methods present in the literature.


Article 1899

Title@2025-06-05 (4): Locality Preserving Markovian Transition for Instance Retrieval

Title: Locality Preserving Markovian Transition for Instance Retrieval Lokalität Erhaltung der Markovschen Übergang für Instance Retrieval 将Markovian 过渡保留为事件检索 2506.05196v1

Authors (5): Jifei Luo, Wenzheng Wu, Hantao Yao, Lu Yu, Changsheng Xu

Diffusion-based re-ranking methods are effective in modeling the data manifolds through similarity propagation in affinity graphs. However, positive signals tend to diminish over several steps away from the source, reducing discriminative power beyond local regions. To address this issue, we introduce the Locality Preserving Markovian Transition (LPMT) framework, which employs a long-term thermodynamic transition process with multiple states for accurate manifold distance measurement. The proposed LPMT first integrates diffusion processes across separate graphs using Bidirectional Collaborative Diffusion (BCD) to establish strong similarity relationships. Afterwards, Locality State Embedding (LSE) encodes each instance into a distribution for enhanced local consistency. These distributions are interconnected via the Thermodynamic Markovian Transition (TMT) process, enabling efficient global retrieval while maintaining local effectiveness. Experimental results across diverse tasks confirm the effectiveness of LPMT for instance retrieval.


Article 1900

Title@2025-06-05 (4): Chaotic Hedging with Iterated Integrals and Neural Networks

Title: Chaotic Hedging with Iterated Integrals and Neural Networks Chaotische Absicherung mit iterierten Integralen und neuralen Netzwerken 使用循环综合器和神经网络的冷酷杂乱和神经网络 2209.10166v4

Authors (2): Ariel Neufeld, Philipp Schmocker

In this paper, we derive an $L^p$-chaos expansion based on iterated Stratonovich integrals with respect to a given exponentially integrable continuous semimartingale. By omitting the orthogonality of the expansion, we show that every $p$-integrable functional, $p \in [1,\infty)$, can be approximated by a finite sum of iterated Stratonovich integrals. Using (possibly random) neural networks as integrands, we therefore obtain universal approximation results for $p$-integrable financial derivatives in the $L^p$-sense. Moreover, we can approximately solve the $L^p$-hedging problem (coinciding for $p = 2$ with the quadratic hedging problem), where the approximating hedging strategy can be computed in closed form within short runtime.


Article 1901

Title@2025-06-05 (4): Counterfactual reasoning: an analysis of in-context emergence

Title: Counterfactual reasoning: an analysis of in-context emergence Gegenfaktische Argumentation: eine Analyse der Entstehung von Inkontexten 反事实推理:对内源出现的分析 2506.05188v1

Authors (3): Moritz Miller, Bernhard Schölkopf, Siyuan Guo

Large-scale neural language models (LMs) exhibit remarkable performance in in-context learning: the ability to learn from and reason over the input context on the fly without parameter updates. This work studies in-context counterfactual reasoning in language models, that is, predicting the consequences of changes under hypothetical scenarios. We focus on studying a well-defined synthetic setup: a linear regression task that requires noise abduction, where accurate prediction is based on inferring and copying the contextual noise from factual observations. We show that language models are capable of counterfactual reasoning in this controlled setup and provide insights that counterfactual reasoning for a broad class of functions can be reduced to a transformation on in-context observations; we find that self-attention, model depth, and data diversity in pre-training drive performance in Transformers. More interestingly, our findings extend beyond regression tasks and show that Transformers can perform noise abduction on sequential data, providing preliminary evidence on the potential for counterfactual story generation. Our code is available under https://github.com/moXmiller/counterfactual-reasoning.git .



Article 1902

Title@2025-06-05 (4): TreeRPO: Tree Relative Policy Optimization

Title: TreeRPO: Tree Relative Policy Optimization TreeRPO: Baum Relative Politikoptimierung TreeRPO:树相对政策优化 2506.05183v1

Authors (6): Zhicheng Yang, Zhijiang Guo, Yinya Huang, Xiaodan Liang, Yiwei Wang, Jing Tang

Large Language Models (LLMs) have shown remarkable reasoning capabilities through Reinforcement Learning with Verifiable Rewards (RLVR) methods. However, a key limitation of existing approaches is that rewards defined at the full trajectory level provide insufficient guidance for optimizing the intermediate steps of a reasoning process. To address this, we introduce TreeRPO, a novel method that estimates the mathematical expectations of rewards at various reasoning steps using tree sampling. Unlike prior methods that rely on a separate step reward model, TreeRPO directly estimates these rewards through this sampling process. Building on the group-relative reward training mechanism of GRPO, TreeRPO innovatively computes rewards based on step-level groups generated during tree sampling. This advancement allows TreeRPO to produce fine-grained and dense reward signals, significantly enhancing the learning process and overall performance of LLMs. Experimental results demonstrate that our TreeRPO algorithm substantially improves the average Pass@1 accuracy of Qwen-2.5-Math on test benchmarks, increasing it from 19.0% to 35.5%. Furthermore, TreeRPO significantly outperforms GRPO by 2.9% in performance while simultaneously reducing the average response length by 18.1%, showcasing its effectiveness and efficiency. Our code will be available at https://github.com/yangzhch6/TreeRPO.
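
A rough sketch of how step-level, group-relative rewards could be derived from a sampled tree is given below. The nested-dict tree layout, the function names, and the GRPO-style mean/standard-deviation normalization are assumptions made for illustration; the abstract does not specify these details, so this should not be read as the authors' implementation.

```python
from statistics import mean, pstdev

def node_value(node):
    """Estimate expected reward: a leaf carries its verified reward,
    an inner node averages the estimates of its children."""
    if "reward" in node:
        return float(node["reward"])
    return mean(node_value(child) for child in node["children"])

def sibling_advantages(node, eps=1e-6):
    """Normalize the value estimates of sibling steps within their group."""
    values = [node_value(child) for child in node["children"]]
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]

# Hypothetical tree: three candidate first steps, two sampled completions each.
tree = {"children": [
    {"children": [{"reward": 1}, {"reward": 1}]},
    {"children": [{"reward": 1}, {"reward": 0}]},
    {"children": [{"reward": 0}, {"reward": 0}]},
]}
advantages = sibling_advantages(tree)
print(advantages)  # positive for the first step, about zero for the second, negative for the third
```

Because each sibling group gets its own baseline, intermediate steps receive a signal even though only the final answers are verified.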



Article 1903

Title@2025-06-05 (4): AI-powered Contextual 3D Environment Generation: A Systematic Review

Title: AI-powered Contextual 3D Environment Generation: A Systematic Review KI-powered Contextual 3D Environment Generation: Eine systematische Überprüfung 3D环境代:系统审查 2506.05449v1

Authors (2): Miguel Silva, Alexandre Valle de Carvalho

The generation of high-quality 3D environments is crucial for industries such as gaming, virtual reality, and cinema, yet remains resource-intensive due to the reliance on manual processes. This study performs a systematic review of existing generative AI techniques for 3D scene generation, analyzing their characteristics, strengths, limitations, and potential for improvement. By examining state-of-the-art approaches, it presents key challenges such as scene authenticity and the influence of textual inputs. Special attention is given to how AI can blend different stylistic domains while maintaining coherence, the impact of training data on output quality, and the limitations of current models. In addition, this review surveys existing evaluation metrics for assessing realism and explores how industry professionals incorporate AI into their workflows. The findings of this study aim to provide a comprehensive understanding of the current landscape and serve as a foundation for future research on AI-driven 3D content generation. Key findings include that advanced generative architectures enable high-quality 3D content creation at a high computational cost, effective multi-modal integration techniques like cross-attention and latent space alignment facilitate text-to-3D tasks, and the quality and diversity of training data combined with comprehensive evaluation metrics are critical to achieving scalable, robust 3D scene generation.



Article 1904

Title@2025-06-05 (4): Associative Memory and Generative Diffusion in the Zero-noise Limit

Title: Associative Memory and Generative Diffusion in the Zero-noise Limit Assoziatives Gedächtnis und Generative Diffusion in der Null-Rausch-Grenze 零噪音限制中的联合内存和生成扩散 2506.05178v1

Authors (2): Joshua Hess, Quaid Morris

Connections between generative diffusion and continuous-state associative memory models are studied. Morse-Smale dynamical systems are emphasized as universal approximators of gradient-based associative memory models and diffusion models as white-noise perturbed systems thereof. Universal properties of associative memory that follow from this description are described and used to characterize a generic transition from generation to memory as noise levels diminish. Structural stability inherited by Morse-Smale flows is shown to imply a notion of stability for diffusions at vanishing noise levels. Applied to one- and two-parameter families of gradients, this indicates stability at all but isolated points of associative memory learning landscapes and the learning and generation landscapes of diffusion models with gradient drift in the zero-noise limit, at which small sets of generic bifurcations characterize qualitative transitions between stable systems. Examples illustrating the characterization of these landscapes by sequences of these bifurcations are given, along with a structural stability criterion for classic and modern Hopfield networks (equivalently, the attention mechanism).



Article 1905

Title@2025-06-05 (4): Can Large Language Models Understand Intermediate Representations in Compilers?

Title: Can Large Language Models Understand Intermediate Representations in Compilers? Können große Sprachmodelle Zwischendarstellungen in Compilern verstehen? 大语言模式能理解《汇编者》的中间代表吗? 2502.06854v2

Authors (7): Hailong Jiang, Jianfeng Zhu, Yao Wan, Bo Fang, Hongyu Zhang, Ruoming Jin, Qiang Guan

Intermediate Representations (IRs) play a critical role in compiler design and program analysis, yet their comprehension by Large Language Models (LLMs) remains underexplored. In this paper, we present an explorative empirical study evaluating the capabilities of six state-of-the-art LLMs: GPT-4, GPT-3, DeepSeek, Gemma 2, Llama 3, and Code Llama, in understanding IRs. Specifically, we assess model performance across four core tasks: control flow graph reconstruction, decompilation, code summarization, and execution reasoning. While LLMs exhibit competence in parsing IR syntax and identifying high-level structures, they consistently struggle with instruction-level reasoning, especially in control flow reasoning, loop handling, and dynamic execution. Common failure modes include misinterpreting branching instructions, omitting critical operations, and relying on heuristic reasoning rather than precise instruction-level logic. Our findings highlight the need for IR-specific enhancements in LLM design. We recommend fine-tuning on structured IR datasets and integrating control-flow-sensitive architectures to improve model effectiveness. All experimental data and source code are publicly available at



Article 1906

Title@2025-06-05 (4): Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks

Title: Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks Verallgemeinerungen durch Meta-Learned Model Representations: PAC-Bayes und Sample Compression Hypernetworks 通过Met-Learn模型代表:PAC-Bayes和抽样压缩超网络的通用环 2410.13577v3

Authors (5): Benjamin Leblanc, Mathieu Bazinet, Nathaniel D’Amours, Alexandre Drouin, Pascal Germain

Both PAC-Bayesian and Sample Compress learning frameworks are instrumental for deriving tight (non-vacuous) generalization bounds for neural networks. We leverage these results in a meta-learning scheme, relying on a hypernetwork that outputs the parameters of a downstream predictor from a dataset input. The originality of our approach lies in the investigated hypernetwork architectures that encode the dataset before decoding the parameters: (1) a PAC-Bayesian encoder that expresses a posterior distribution over a latent space, (2) a Sample Compress encoder that selects a small sample of the dataset input along with a message from a discrete set, and (3) a hybrid between both approaches motivated by a new Sample Compress theorem handling continuous messages. The latter theorem exploits the pivotal information transiting at the encoder-decoder junction in order to compute generalization guarantees for each downstream predictor obtained by our meta-learning scheme.



Article 1907

Title@2025-06-05 (4): Goal-Oriented Time-Series Forecasting: Foundation Framework Design

Title: Goal-Oriented Time-Series Forecasting: Foundation Framework Design Zielorientierte Zeitreihenplanung: Grundlagenrahmengestaltung 目标导向的时间系列预测:基础框架设计 2504.17493v2

Authors (7): Luca-Andrei Fechete, Mohamed Sana, Fadhel Ayed, Nicola Piovesan, Wenjie Li, Antonio De Domenico, Tareq Si Salem

Traditional time-series forecasting often focuses only on minimizing prediction errors, ignoring the specific requirements of the real-world applications that employ the forecasts. This paper presents a new training methodology, which allows a forecasting model to dynamically adjust its focus based on the importance of forecast ranges specified by the end application. Unlike previous methods that fix these ranges beforehand, our training approach breaks down predictions over the entire signal range into smaller segments, which are then dynamically weighted and combined to produce accurate forecasts within a region of interest. We tested our method on standard datasets, including a new wireless communication dataset, and found that it not only improves prediction accuracy but also enhances the performance of the end application employing the forecasting model. This research provides a basis for creating forecasting systems that better connect prediction and decision-making in various practical applications.



Article 1908

Title@2025-06-05 (4): Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests

Title: Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests Bewertung der Robustheit von Deep Reinforcement Learning für die autonome Oberflächenfahrzeugsteuerung in Feldversuchen 评价实地测试中用于自动水陆车辆控制的深强化学习的强力 2505.10033v2

Authors (4): Luis F. W. Batista, Stéphanie Aravecchia, Seth Hutchinson, Cédric Pradalier

Despite significant advancements in Deep Reinforcement Learning (DRL) for Autonomous Surface Vehicles (ASVs), their robustness in real-world conditions, particularly under external disturbances, remains insufficiently explored. In this paper, we evaluate the resilience of a DRL-based agent designed to capture floating waste under various perturbations. We train the agent using domain randomization and evaluate its performance in real-world field tests, assessing its ability to handle unexpected disturbances such as asymmetric drag and an off-center payload. We assess the agent’s performance under these perturbations in both simulation and real-world experiments, quantifying performance degradation and benchmarking it against an MPC baseline. Results indicate that the DRL agent performs reliably despite significant disturbances. Along with the open-source release of our implementation, we provide insights into effective training strategies, real-world challenges, and practical considerations for deploying DRL-based ASV controllers.



Article 1909

Title@2025-06-05 (4): Detection-Driven Object Count Optimization for Text-to-Image Diffusion Models

Title: Detection-Driven Object Count Optimization for Text-to-Image Diffusion Models Detektionsgesteuerte Objektanzahl-Optimierung für Text-zu-Bild-Diffusionsmodelle 文本到图像传播模型最佳化 2408.11721v2

Authors (4): Oz Zafar, Yuval Cohen, Lior Wolf, Idan Schwartz

Accurately controlling object count in text-to-image generation remains a key challenge. Supervised methods often fail, as training data rarely covers all count variations. Methods that manipulate the denoising process to add or remove objects can help; however, they still require labeled data, limit robustness and image quality, and rely on a slow, iterative process. Pre-trained differentiable counting models that rely on soft object density summation exist and could steer generation, but employing them presents three main challenges: (i) they are pre-trained on clean images, making them less effective during denoising steps that operate on noisy inputs; (ii) they are not robust to viewpoint changes; and (iii) optimization is computationally expensive, requiring repeated model evaluations per image. We propose a new framework that uses pre-trained object counting techniques and object detectors to guide generation. First, we optimize a counting token using an outer-loop loss computed on fully generated images. Second, we introduce a detection-driven scaling term that corrects errors caused by viewpoint and proportion shifts, among other factors, without requiring backpropagation through the detection model. Third, we show that the optimized parameters can be reused for new prompts, removing the need for repeated optimization. Our method provides efficiency through token reuse, flexibility via compatibility with various detectors, and accuracy with improved counting across diverse object categories.



Article 1910

Title@2025-06-05 (4): SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows

Title: SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows SESaMo: Symmetrie-verstärkende stochastische Modulation für normalisierende Strömungen SESaMo: 正常流动的对称性-强化斯托调动 2505.19619v2

Authors (4): Janik Kreit, Dominic Schuh, Kim A. Nicoli, Lena Funcke

Deep generative models have recently garnered significant attention across various fields, from physics to chemistry, where sampling from unnormalized Boltzmann-like distributions represents a fundamental challenge. In particular, autoregressive models and normalizing flows have become prominent due to their appealing ability to yield closed-form probability densities. Moreover, it is well-established that incorporating prior knowledge - such as symmetries - into deep neural networks can substantially improve training performance. In this context, recent advances have focused on developing symmetry-equivariant generative models, achieving remarkable results. Building upon these foundations, this paper introduces Symmetry-Enforcing Stochastic Modulation (SESaMo). Similar to equivariant normalizing flows, SESaMo enables the incorporation of inductive biases (e.g., symmetries) into normalizing flows through a novel technique called stochastic modulation. This approach enhances the flexibility of the generative model, allowing it to effectively learn a variety of exact and broken symmetries. Our numerical experiments benchmark SESaMo in different scenarios, including an 8-Gaussian mixture model and physically relevant field theories, such as the $\phi^4$ theory and the Hubbard model.



Article 1911

Title@2025-06-05 (4): Federated Isolation Forest for Efficient Anomaly Detection on Edge IoT Systems

Title: Federated Isolation Forest for Efficient Anomaly Detection on Edge IoT Systems Föderierter Isolationswald für effiziente Anomalienerkennung an Edge IoT-Systemen 在边缘IOT系统中高效异常探测的联邦隔离林 2506.05138v1

Authors (3): Pavle Vasiljevic, Milica Matic, Miroslav Popovic

Recently, federated learning frameworks such as Python TestBed for Federated Learning Algorithms and MicroPython TestBed for Federated Learning Algorithms have emerged to tackle user privacy concerns and efficiency in embedded systems. Even more recently, an efficient federated anomaly detection algorithm, FLiForest, based on Isolation Forests has been developed, offering a low-resource, unsupervised method well-suited for edge deployment and continuous learning. In this paper, we present an application of Isolation Forest-based temperature anomaly detection, developed using the previously mentioned federated learning frameworks, aimed at small edge devices and IoT systems running MicroPython. The system has been experimentally evaluated, achieving over 96% accuracy in distinguishing normal from abnormal readings and above 78% precision in detecting anomalies across all tested configurations, while maintaining a memory usage below 160 KB during model training. These results highlight its suitability for resource-constrained environments and edge systems, while upholding federated learning principles of data privacy and collaborative learning.
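
For readers unfamiliar with Isolation Forests, the standard anomaly score from the original Isolation Forest formulation can be sketched in dependency-free Python, in the spirit of code that could run on a small device. This shows only the generic scoring rule; how FLiForest federates the trees is not described in the abstract and is not modeled here, and the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649  # constant used to approximate harmonic numbers

def average_path_length(n):
    """c(n): average path length of an unsuccessful search in a binary
    search tree built from n points, used to normalize isolation depth."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """s = 2^(-E[h(x)] / c(n)): close to 1 for anomalies, about 0.5 for typical points."""
    return 2.0 ** (-mean_path_length / average_path_length(n))

n = 256                                                # assumed subsample size per tree
typical = anomaly_score(average_path_length(n), n)     # exactly 0.5
outlier = anomaly_score(3.0, n)                        # isolated after only ~3 splits
print(typical, outlier)
```

A reading such as a sudden temperature spike tends to be isolated in few random splits, so its mean path length is short and its score rises above 0.5.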



Article 1912

Title@2025-06-05 (4): Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning

Title: Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning Training Dynamics zugrunde liegende Sprachmodellskalierungsgesetze: Verlustverschleierung und Null-Summe-Lernen 培训动态基础语言示范缩写法:损失减速和零苏姆学习 2506.05447v1

Authors (5): Andrei Mircea, Supriyo Chakraborty, Nima Chitsazan, Irina Rish, Ekaterina Lobacheva

This work aims to understand how scaling improves language models, specifically in terms of training dynamics. We find that language models undergo loss deceleration early in training: an abrupt slowdown in the rate of loss improvement, resulting in piecewise linear behaviour of the loss curve in log-log space. Scaling up the model mitigates this transition by (1) decreasing the loss at which deceleration occurs, and (2) improving the log-log rate of loss improvement after deceleration. We attribute loss deceleration to a type of degenerate training dynamics we term zero-sum learning (ZSL). In ZSL, per-example gradients become systematically opposed, leading to destructive interference in per-example changes in loss. As a result, improving loss on one subset of examples degrades it on another, bottlenecking overall progress. Loss deceleration and ZSL provide new insights into the training dynamics underlying language model scaling laws, and could potentially be targeted directly to improve language models independent of scale. We make our code and artefacts available at: https://github.com/mirandrom/zsl
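
One way to make "destructive interference in per-example changes in loss" concrete is the ratio sketched below. The specific formula, 1 - |sum of changes| / sum of |changes|, is an assumption chosen for illustration and may differ from the quantity the authors actually measure.

```python
import numpy as np

def destructive_interference(delta_losses):
    """Return 0.0 when all per-example loss changes share a sign and
    1.0 when gains and losses cancel exactly (a zero-sum step)."""
    d = np.asarray(delta_losses, dtype=float)
    total = np.abs(d).sum()
    if total == 0.0:
        return 0.0  # nothing changed, so nothing interferes
    return 1.0 - abs(d.sum()) / total

# Hypothetical per-example loss changes after one optimizer step.
aligned = destructive_interference([-0.2, -0.1, -0.3])        # every example improves
opposed = destructive_interference([-0.3, 0.3, -0.2, 0.2])    # improvements are cancelled
print(aligned, opposed)
```

Under this reading, a value near 1 means that progress on one subset of examples is being paid for by regressions on another, which is the bottleneck the abstract describes.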



Article 1913

Title@2025-06-05 (4): DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning

Title: DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning DiCoRe: Erweitern der Null-Shot-Erkennung durch Divergent-Convergent LLM Reasoning DiCore: 通过差异-说服者LLM 合理性,加强零射事件探测 2506.05128v1

Authors (5): Tanmay Parekh, Kartik Mehta, Ninareh Mehrabi, Kai-Wei Chang, Nanyun Peng

Zero-shot Event Detection (ED), the task of identifying event mentions in natural language text without any training data, is critical for document understanding in specialized domains. Understanding the complex event ontology, extracting domain-specific triggers from the passage, and structuring them appropriately overloads and limits the utility of Large Language Models (LLMs) for zero-shot ED. To this end, we propose DiCoRe, a divergent-convergent reasoning framework that decouples the task of ED using Dreamer and Grounder. Dreamer encourages divergent reasoning through open-ended event discovery, which helps to boost event coverage. Conversely, Grounder introduces convergent reasoning to align the free-form predictions with the task-specific instructions using finite-state machine guided constrained decoding. Additionally, an LLM-Judge verifies the final outputs to ensure high precision. Through extensive experiments on six datasets across five domains and nine LLMs, we demonstrate how DiCoRe consistently outperforms prior zero-shot, transfer-learning, and reasoning baselines, achieving 4-7% average F1 gains over the best baseline – establishing DiCoRe as a strong zero-shot ED framework.



Article 1914

Title@2025-06-05 (4): Membership Inference Attacks on Sequence Models

Title: Membership Inference Attacks on Sequence Models Membership Inferenz Attacks on Sequence Models 对序列模型的攻击 2506.05126v1

Authors (4): Lorenzo Rossi, Michael Aerni, Jie Zhang, Florian Tramèr

Sequence models, such as Large Language Models (LLMs) and autoregressive image generators, have a tendency to memorize and inadvertently leak sensitive information. While this tendency has critical legal implications, existing tools are insufficient to audit the resulting risks. We hypothesize that those tools’ shortcomings are due to mismatched assumptions. Thus, we argue that effectively measuring privacy leakage in sequence models requires leveraging the correlations inherent in sequential generation. To illustrate this, we adapt a state-of-the-art membership inference attack to explicitly model within-sequence correlations, thereby demonstrating how a strong existing attack can be naturally extended to suit the structure of sequence models. Through a case study, we show that our adaptations consistently improve the effectiveness of memorization audits without introducing additional computational costs. Our work hence serves as an important stepping stone toward reliable memorization audits for large sequence models.



Article 1915

Title@2025-06-05 (4): Jailbreak Attack Initializations as Extractors of Compliance Directions

Title: Jailbreak Attack Initializations as Extractors of Compliance Directions Jailbreak Attack Initialisierungen als Extraktoren für Compliance-Anweisungen 作为 “ 遵守指示 “ 的提取器 2502.09755v2

Authors (5): Amit Levi, Rom Himelstein, Yaniv Nemcovsky, Avi Mendelson, Chaim Baskin

Safety-aligned LLMs respond to prompts with either compliance or refusal, each corresponding to distinct directions in the model’s activation space. Recent works show that initializing attacks via self-transfer from other prompts significantly enhances their performance. However, the underlying mechanisms of these initializations remain unclear, and attacks utilize arbitrary or hand-picked initializations. This work shows that each gradient-based jailbreak attack and subsequent initialization gradually converge to a single compliance direction that suppresses refusal, thereby enabling an efficient transition from refusal to compliance. Based on this insight, we propose CRI, an initialization framework that aims to project unseen prompts further along compliance directions. We demonstrate our approach on multiple attacks, models, and datasets, achieving an increased attack success rate (ASR) and reduced computational overhead, highlighting the fragility of safety-aligned LLMs. A reference implementation is available at: https://amit1221levi.github.io/CRI-Jailbreak-Init-LLMs-evaluation.



Article 1916

Title@2025-06-05 (4): Nonlinear Causal Discovery for Grouped Data

Title: Nonlinear Causal Discovery for Grouped Data Nichtlineare kausale Entdeckung für gruppierte Daten 分组数据的非线性构造发现 2506.05120v1

Authors (3): Konstantin Göbler, Tobias Windisch, Mathias Drton

Inferring cause-effect relationships from observational data has gained significant attention in recent years, but most methods are limited to scalar random variables. In many important domains, including neuroscience, psychology, social science, and industrial manufacturing, the causal units of interest are groups of variables rather than individual scalar measurements. Motivated by these applications, we extend nonlinear additive noise models to handle random vectors, establishing a two-step approach for causal graph learning: First, infer the causal order among random vectors. Second, perform model selection to identify the best graph consistent with this order. We introduce effective and novel solutions for both steps in the vector case, demonstrating strong performance in simulations. Finally, we apply our method to real-world assembly line data with partial knowledge of causal ordering among variable groups.



Article 1917

Title@2025-06-05 (4): Exploring Adversarial Watermarking in Transformer-Based Models: Transferability and Robustness Against Defense Mechanism for Medical Images

Title: Exploring Adversarial Watermarking in Transformer-Based Models: Transferability and Robustness Against Defense Mechanism for Medical Images Erforschung von adversarialen Wasserzeichen in transformerbasierten Modellen: Übertragbarkeit und Robustheit gegen Abwehrmechanismen für medizinische Bilder 探索以变换模式为基础的变换模型中的反对水标记:医疗图像防御机制的可转移性和强力 2506.06389v1

Authors (5): Rifat Sadik, Tanvir Rahman, Arpan Bhattacharjee, Bikash Chandra Halder, Ismail Hossain

Deep learning models have shown remarkable success in dermatological image analysis, offering potential for automated skin disease diagnosis. Previously, convolutional neural network (CNN) based architectures achieved immense popularity and success in computer vision (CV) tasks such as skin image recognition, generation, and video analysis. With the emergence of transformer-based models, CV tasks are now increasingly carried out using these models. Vision Transformers (ViTs) are one such family of transformer-based models that have shown success in computer vision. They use self-attention mechanisms to achieve state-of-the-art performance across various tasks. However, their reliance on global attention mechanisms makes them susceptible to adversarial perturbations. This paper investigates the susceptibility of ViTs for medical images to adversarial watermarking, a method that adds so-called imperceptible perturbations in order to fool models. By generating adversarial watermarks through Projected Gradient Descent (PGD), we examine the transferability of such attacks to CNNs and analyze the performance of adversarial training as a defense mechanism. Results indicate that while performance is not compromised for clean images, ViTs become much more vulnerable to adversarial attacks, with accuracy dropping to as low as 27.6%. Nevertheless, adversarial training raises it up to 90.0%.
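
The PGD procedure mentioned above can be sketched on a toy model. The example below runs generic L-infinity PGD against a two-feature logistic regression classifier rather than a ViT; the weights, inputs, step sizes, and function name are illustrative assumptions, and the paper's watermark-specific construction is not reproduced.

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=20):
    """L-infinity PGD against p(y=1|x) = sigmoid(w . x + b), label y in {0, 1}."""
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))
        grad = (p - y) * w                          # gradient of cross-entropy w.r.t. the input
        x_adv = x_adv + alpha * np.sign(grad)       # signed step that increases the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project back onto the eps-ball around x
        x_adv = np.clip(x_adv, 0.0, 1.0)            # keep values in a valid pixel range
    return x_adv

w, b = np.array([4.0, -4.0]), 0.0                   # toy classifier
x, y = np.array([0.55, 0.45]), 1                    # correctly classified: logit = 0.4 > 0
x_adv = pgd_attack(x, y, w, b)
print(x_adv, w @ x_adv + b)                         # perturbed input, logit now negative
```

The perturbation never exceeds `eps` per coordinate, yet it flips the prediction. Adversarial training, the defense evaluated in the abstract, would add such perturbed inputs to the training set.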



Article 1918

Title@2025-06-05 (4): Learning Rock Pushability on Rough Planetary Terrain

Title: Learning Rock Pushability on Rough Planetary Terrain Rock Pushability lernen auf rauem planetarischen Gelände 环绕行星地的学习岩岩推力 2505.09833v2

Authors (3): Tuba Girgin, Emre Girgin, Cagri Kilic

In the context of mobile navigation in unstructured environments, the predominant approach entails the avoidance of obstacles. The prevailing path planning algorithms are contingent upon deviating from the intended path for an indefinite duration and returning to the closest point on the route after the obstacle is left behind spatially. However, avoiding an obstacle on a path that will be used repeatedly by multiple agents can hinder long-term efficiency and lead to a lasting reliance on an active path planning system. In this study, we propose an alternative approach to mobile navigation in unstructured environments by leveraging the manipulation capabilities of a robotic manipulator mounted on top of a mobile robot. Our proposed framework integrates exteroceptive and proprioceptive feedback to assess the push affordance of obstacles, facilitating their repositioning rather than avoidance. While our preliminary visual estimation takes into account the characteristics of both the obstacle and the surface it relies on, the push affordance estimation module exploits the force feedback obtained by interacting with the obstacle via a robotic manipulator as the guidance signal. The objective of our navigation approach is to enhance the efficiency of routes utilized by multiple agents over extended periods by reducing the overall time spent by a fleet in environments where autonomous infrastructure development is imperative, such as lunar or Martian surfaces.



Article 1919

Title@2025-06-05 (4): Survey on the Evaluation of Generative Models in Music

Title: Survey on the Evaluation of Generative Models in Music Umfrage zur Bewertung generativer Modelle in der Musik 音乐创创模型评价调查 2506.05104v1

Authors (6): Alexander Lerch, Claire Arthur, Nick Bryan-Kinns, Corey Ford, Qianyi Sun, Ashvala Vinay

Research on generative systems in music has seen considerable attention and growth in recent years. A variety of attempts have been made to systematically evaluate such systems. We provide an interdisciplinary review of the common evaluation targets, methodologies, and metrics for the evaluation of both system output and model usability, covering subjective and objective approaches, qualitative and quantitative approaches, as well as empirical and computational methods. We discuss the advantages and challenges of such approaches from a musicological, an engineering, and an HCI perspective.



Article 1920

Title@2025-06-05 (4): Structural Alignment Improves Graph Test-Time Adaptation

Title: Structural Alignment Improves Graph Test-Time Adaptation Struktural Alignment verbessert Graph Test-Time Anpassung 结构调整改进图示测试时间适应 2502.18334v3

Authors (4): Hans Hao-Hsun Hsu, Shikun Liu, Han Zhao, Pan Li

Graph-based learning excels at capturing interaction patterns in diverse domains like recommendation, fraud detection, and particle physics. However, its performance often degrades under distribution shifts, especially those altering network connectivity. Current methods to address these shifts typically require retraining with the source dataset, which is often infeasible due to computational or privacy limitations. We introduce Test-Time Structural Alignment (TSA), a novel algorithm for Graph Test-Time Adaptation (GTTA) that aligns graph structures during inference without accessing the source data. Grounded in a theoretical understanding of graph data distribution shifts, TSA employs three synergistic strategies: uncertainty-aware neighborhood weighting to accommodate neighbor label distribution shifts, adaptive balancing of self-node and aggregated neighborhood representations based on their signal-to-noise ratio, and decision boundary refinement to correct residual label and feature shifts. Extensive experiments on synthetic and real-world datasets demonstrate TSA’s consistent outperformance of both non-graph TTA methods and state-of-the-art GTTA baselines.



Article 1921

Title@2025-06-05 (4): Privacy Amplification Through Synthetic Data: Insights from Linear Regression

Title: Privacy Amplification Through Synthetic Data: Insights from Linear Regression Datenschutzverstärkung durch synthetische Daten: Einblicke aus linearer Regression 通过合成数据放大隐私:线性回归的洞视 2506.05101v1

Authors (4): Clément Pierquin, Aurélien Bellet, Marc Tommasi, Matthieu Boussard

Synthetic data inherits the differential privacy guarantees of the model used to generate it. Additionally, synthetic data may benefit from privacy amplification when the generative model is kept hidden. While empirical studies suggest this phenomenon, a rigorous theoretical understanding is still lacking. In this paper, we investigate this question through the well-understood framework of linear regression. First, we establish negative results showing that if an adversary controls the seed of the generative model, a single synthetic data point can leak as much information as releasing the model itself. Conversely, we show that when synthetic data is generated from random inputs, releasing a limited number of synthetic data points amplifies privacy beyond the model’s inherent guarantees. We believe our findings in linear regression can serve as a foundation for deriving more general bounds in the future.



Article 1922

Title@2025-06-05 (4): The Impossibility of Fair LLMs

Title: The Impossibility of Fair LLMs Die Unmöglichkeit fairer LLMs 公平专利Ms的不可行性 2406.03198v2

Authors (5): Jacy Anthis, Kristian Lum, Michael Ekstrand, Avi Feller, Chenhao Tan

The rise of general-purpose artificial intelligence (AI) systems, particularly large language models (LLMs), has raised pressing moral questions about how to reduce bias and ensure fairness at scale. Researchers have documented a sort of “bias” in the significant correlations between demographics (e.g., race, gender) in LLM prompts and responses, but it remains unclear how LLM fairness could be evaluated with more rigorous definitions, such as group fairness or fair representations. We analyze a variety of technical fairness frameworks and find inherent challenges in each that make the development of a fair LLM intractable. We show that each framework either does not logically extend to the general-purpose AI context or is infeasible in practice, primarily due to the large amounts of unstructured training data and the many potential combinations of human populations, use cases, and sensitive attributes. These inherent challenges would persist for general-purpose AI, including LLMs, even if empirical challenges, such as limited participatory input and limited measurement methods, were overcome. Nonetheless, fairness will remain an important type of model evaluation, and there are still promising research directions, particularly the development of standards for the responsibility of LLM developers, context-specific evaluations, and methods of iterative, participatory, and AI-assisted evaluation that could scale fairness across the diverse contexts of modern human-AI interaction.



Article 1923

Title@2025-06-05 (4): Semi-Implicit Variational Inference via Kernelized Path Gradient Descent

Title: Semi-Implicit Variational Inference via Kernelized Path Gradient Descent Halbimplizite Variationsableitung über kernalisierten Pfad gradienten Abstieg 通过内核化路径梯度梯度 2506.05088v1

Authors (3): Tobias Pielok, Bernd Bischl, David Rügamer

Semi-implicit variational inference (SIVI) is a powerful framework for approximating complex posterior distributions, but training with the Kullback-Leibler (KL) divergence can be challenging due to high variance and bias in high-dimensional settings. While current state-of-the-art semi-implicit variational inference methods, particularly Kernel Semi-Implicit Variational Inference (KSIVI), have been shown to work in high dimensions, training remains moderately expensive. In this work, we propose a kernelized KL divergence estimator that stabilizes training through nonparametric smoothing. To further reduce the bias, we introduce an importance sampling correction. We provide a theoretical connection to the amortized version of the Stein variational gradient descent, which estimates the score gradient via Stein’s identity, showing that both methods minimize the same objective, but our semi-implicit approach achieves lower gradient variance. In addition, our method’s bias in function space is benign, leading to more stable and efficient optimization. Empirical results demonstrate that our method outperforms or matches state-of-the-art SIVI methods in both performance and training efficiency.



Article 1924

Title@2025-06-05 (4): Autoformulation of Mathematical Optimization Models Using LLMs

Title: Autoformulation of Mathematical Optimization Models Using LLMs Autoformulierung mathematischer Optimierungsmodelle mit LLMs 使用LLMM 的数学优化模型自动格式化 2411.01679v2

Authors (4): Nicolás Astorga, Tennison Liu, Yuanzhang Xiao, Mihaela van der Schaar

Mathematical optimization is fundamental to decision-making across diverse domains, from operations research to healthcare. Yet, translating real-world problems into optimization models remains a difficult task, often demanding specialized expertise. This paper approaches the problem of autoformulation: the automated creation of solver-ready optimization models from natural language problem descriptions. We identify three core challenges of autoformulation: (1) the vast, problem-dependent hypothesis space, (2) efficient and diverse exploration of this space under uncertainty, and (3) evaluation of formulation correctness against problem description. To address these challenges, we present a novel method leveraging Large Language Models (LLMs) with Monte-Carlo Tree Search, exploiting the hierarchical nature of optimization modeling to generate and systematically explore possible formulations. To enhance search efficiency, we introduce symbolic pruning to eliminate trivially equivalent search paths (branches), and employ LLM-based evaluation of partial formulations to guide search. Empirical analysis on linear and mixed-integer programming benchmarks demonstrates our method’s effectiveness, with significant performance gains from both LLM-based value estimation and symbolic pruning techniques.



Article 1925

Title@2025-06-05 (4): EMBER2024 – A Benchmark Dataset for Holistic Evaluation of Malware Classifiers

Title: EMBER2024 – A Benchmark Dataset for Holistic Evaluation of Malware Classifiers EMBER2024 – Ein Benchmark-Datensatz für die ganzheitliche Auswertung von Malware-Klassifikatoren 2024年9月24日 – – 用于全面评估软件分类器基准数据集 2506.05074v1

Authors (8): Robert J. Joyce, Gideon Miller, Phil Roth, Richard Zak, Elliott Zaresky-Williams, Hyrum Anderson, Edward Raff, James Holt

A lack of accessible data has historically restricted malware analysis research, and practitioners have relied heavily on datasets provided by industry sources to advance. Existing public datasets are limited by narrow scope: most include files targeting a single platform, have labels supporting just one type of malware classification task, and make no effort to capture the evasive files that make malware detection difficult in practice. We present EMBER2024, a new dataset that enables holistic evaluation of malware classifiers. Created in collaboration with the authors of EMBER2017 and EMBER2018, the EMBER2024 dataset includes hashes, metadata, feature vectors, and labels for more than 3.2 million files from six file formats. Our dataset supports the training and evaluation of machine learning models on seven malware classification tasks, including malware detection, malware family classification, and malware behavior identification. EMBER2024 is the first to include a collection of malicious files that initially went undetected by a set of antivirus products, creating a “challenge” set to assess classifier performance against evasive malware. This work also introduces EMBER feature version 3, with added support for several new feature types. We are releasing the EMBER2024 dataset to promote reproducibility and empower researchers in the pursuit of new malware research topics.



Article 1926

Title@2025-06-05 (4): GoRA: Gradient-driven Adaptive Low Rank Adaptation

Title: GoRA: Gradient-driven Adaptive Low Rank Adaptation GoRA: Gradient-getriebene Adaptive Low-Rank-Anpassung GARA:逐步驱动的适应性低级别适应 2502.12171v2

Authors (7): Haonan He, Peng Ye, Yuchen Ren, Yuan Yuan, Luyang Zhou, Shucun Ju, Lei Chen

Low-Rank Adaptation (LoRA) is a crucial method for efficiently fine-tuning large language models (LLMs), with its effectiveness influenced by two key factors: rank selection and weight initialization. While numerous LoRA variants have been proposed to improve performance by addressing one of these aspects, they often compromise usability or computational efficiency. In this paper, we analyze and identify the core limitations of existing approaches and propose a novel framework – GoRA (Gradient-driven Adaptive Low Rank Adaptation) – that simultaneously adapts both the rank and initialization strategy within a unified framework. GoRA leverages gradient information during training to dynamically assign optimal ranks and initialize low-rank adapter weights in an adaptive manner. To our knowledge, GoRA is the first method that not only addresses the limitations of prior approaches – which often focus on either rank selection or initialization in isolation – but also unifies both aspects within a single framework, enabling more effective and efficient adaptation. Extensive experiments across various architectures and modalities show that GoRA consistently outperforms existing LoRA-based methods while preserving the efficiency of vanilla LoRA. For example, when fine-tuning Llama3.1-8B-Base for mathematical reasoning, GoRA achieves a 5.13-point improvement over standard LoRA and even outperforms full fine-tuning by 2.05 points under high-rank settings.
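
For readers unfamiliar with the baseline, the following is a minimal sketch of the vanilla LoRA forward pass that GoRA builds on, not of GoRA itself. The function name `lora_forward`, the scaling factor `alpha`, and all shapes are illustrative assumptions; the fixed rank `r` and the zero initialization of `B` shown here are the two choices the abstract says GoRA adapts from gradient information.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Vanilla LoRA: y = x W^T + (alpha / r) * x A^T B^T, with W frozen."""
    r = A.shape[0]  # adapter rank (fixed by hand in vanilla LoRA)
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.normal(size=(d_out, d_in))  # frozen pretrained weight
A = rng.normal(size=(r, d_in))      # trainable down-projection, random init
B = np.zeros((d_out, r))            # trainable up-projection, zero init
x = rng.normal(size=(3, d_in))      # batch of 3 inputs

y = lora_forward(x, W, A, B, alpha=16.0)
# With B initialized to zero, the adapter contributes nothing at the start,
# so the adapted layer reproduces the frozen layer exactly.
print(np.allclose(y, x @ W.T))
```

Only `A` and `B` would be updated during fine-tuning, which is why the trainable parameter count scales with `r` rather than with `d_in * d_out`.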



Article 1927

Title@2025-06-05 (4): UnHiPPO: Uncertainty-aware Initialization for State Space Models

Title: UnHiPPO: Uncertainty-aware Initialization for State Space Models UnHiPPO: Unsichere Initialisierung für State Space Models UNHPPPO:国家空间模型的不确定性初始化 2506.05065v1

Authors (3): Marten Lienen, Abdullah Saydemir, Stephan Günnemann

State space models are emerging as a dominant model class for sequence problems, with many relying on the HiPPO framework to initialize their dynamics. However, HiPPO fundamentally assumes data to be noise-free, an assumption often violated in practice. We extend the HiPPO theory with measurement noise and derive an uncertainty-aware initialization for state space model dynamics. In our analysis, we interpret HiPPO as a linear stochastic control problem where the data enters as a noise-free control signal. We then reformulate the problem so that the data become noisy outputs of a latent system and arrive at an alternative dynamics initialization that infers the posterior of this latent system from the data without increasing runtime. Our experiments show that our initialization improves the resistance of state-space models to noise both at training and inference time. Find our implementation at https://cs.cit.tum.de/daml/unhippo.



Article 1928

Title@2025-06-05 (4): NIMO: a Nonlinear Interpretable MOdel

Title: NIMO: a Nonlinear Interpretable MOdel NIMO: ein nichtlinear interpretierbarer MOdel NIIMO:非线性解释元体 2506.05059v1

Authors (3): Shijian Xu, Marcello Massimo Negri, Volker Roth

Neural networks (NNs) have achieved tremendous success over the past decade, yet they are still extremely difficult to interpret. In contrast, linear models are less expressive but offer inherent interpretability. Linear coefficients are interpretable as the marginal effect of a feature on the prediction, assuming all other features are kept fixed. To combine the benefits of both approaches, we introduce NIMO (Nonlinear Interpretable MOdel). The key idea is to define a model where the NN is designed to learn nonlinear corrections to the linear model predictions, while also maintaining the original interpretability of the linear coefficients. Relevantly, we develop an optimization algorithm based on profile likelihood that elegantly allows for optimizing over the NN parameters while updating the linear coefficients analytically. By relying on adaptive ridge regression we can easily incorporate sparsity constraints as well. We show empirically that we can recover the underlying linear coefficients while significantly improving the predictive accuracy. Compared to other hybrid interpretable approaches, our model is the only one that actually maintains the same interpretability of linear coefficients as in linear models. We also achieve higher performance on various regression and classification settings.



Article 1929

Title@2025-06-05 (4): Mini Diffuser: Fast Multi-task Diffusion Policy Training Using Two-level Mini-batches

Title: Mini Diffuser: Fast Multi-task Diffusion Policy Training Using Two-level Mini-batches Mini Diffusor: Schnelles Multi-Task Diffusion Policy Training mit zweistufigen Mini-Batches Mini Diffuser:利用两级小型水槽快速多任务传播政策培训 2505.09430v2

Authors (4): Yutong Hu, Pinhao Song, Kehan Wen, Renaud Detry

We present a method that reduces, by an order of magnitude, the time and memory needed to train multi-task vision-language robotic diffusion policies. This improvement arises from a previously underexplored distinction between action diffusion and the image diffusion techniques that inspired it: In image generation, the target is high-dimensional. By contrast, in action generation, the dimensionality of the target is comparatively small, and only the image condition is high-dimensional. Our approach, Mini Diffuser, exploits this asymmetry by introducing two-level minibatching, which pairs multiple noised action samples with each vision-language condition, instead of the conventional one-to-one sampling strategy. To support this batching scheme, we introduce architectural adaptations to the diffusion transformer that prevent information leakage across samples while maintaining full conditioning access. In RLBench simulations, Mini-Diffuser achieves 95% of the performance of state-of-the-art multi-task diffusion policies, while using only 5% of the training time and 7% of the memory. Real-world experiments further validate that Mini-Diffuser preserves the key strengths of diffusion-based policies, including the ability to model multimodal action distributions and produce behavior conditioned on diverse perceptual inputs. Code available at mini-diffuse-actor.github.io



Article 1930

Title@2025-06-05 (4): Neurosymbolic Association Rule Mining from Tabular Data

Title: Neurosymbolic Association Rule Mining from Tabular Data Neurosymbolische Assoziationsregel Mining aus tabellarischen Daten 利用表格数据进行常规采矿 2504.19354v2

Authors (3): Erkan Karabulut, Paul Groth, Victoria Degeler

Association Rule Mining (ARM) is the task of mining patterns among data features in the form of logical rules, with applications across a myriad of domains. However, high-dimensional datasets often result in an excessive number of rules, increasing execution time and negatively impacting downstream task performance. Managing this rule explosion remains a central challenge in ARM research. To address this, we introduce Aerial+, a novel neurosymbolic ARM method. Aerial+ leverages an under-complete autoencoder to create a neural representation of the data, capturing associations between features. It extracts rules from this neural representation by exploiting the model’s reconstruction mechanism. Extensive evaluations on five datasets against seven baselines demonstrate that Aerial+ achieves state-of-the-art results by learning more concise, high-quality rule sets with full data coverage. When integrated into rule-based interpretable machine learning models, Aerial+ significantly reduces execution time while maintaining or improving accuracy.
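
As background for the rule-quality notions used in ARM, here is a minimal sketch of the two classical rule metrics, support and confidence, on a toy table. It illustrates standard ARM only and is not the Aerial+ method; the helper names and the `feature=value` item encoding are illustrative assumptions.

```python
def support(rows, itemset):
    """Fraction of rows that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= row for row in rows) / len(rows)

def confidence(rows, antecedent, consequent):
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(rows, antecedent)
    if base == 0:
        return 0.0
    return support(rows, set(antecedent) | set(consequent)) / base

# Each tabular row is encoded as a set of "feature=value" items.
rows = [
    {"color=red", "size=small"},
    {"color=red", "size=small", "shape=round"},
    {"color=red", "shape=round"},
    {"size=small", "shape=round"},
]

rule_support = support(rows, {"color=red", "size=small"})
rule_confidence = confidence(rows, {"color=red"}, {"size=small"})
print(rule_support, rule_confidence)
```

Exhaustive miners enumerate every itemset above a support threshold, which is where the rule explosion on high-dimensional tables comes from.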



Article 1931

Title@2025-06-05 (4): A Compressive-Expressive Communication Framework for Compositional Representations

Title: A Compressive-Expressive Communication Framework for Compositional Representations Ein kompressiv-expressiver Kommunikationsrahmen für kompositorische Repräsentationen 组成代表制压缩-高压沟通框架 2501.19182v3

Authors (4): Rafael Elberg, Felipe del Rio, Mircea Petrache, Denis Parra

Compositional generalization–the ability to interpret novel combinations of familiar elements–is a hallmark of human cognition and language. Despite recent advances, deep neural networks still struggle to acquire this property reliably. In this work, we introduce CELEBI (Compressive-Expressive Language Emergence through a discrete Bottleneck and Iterated learning), a novel self-supervised framework for inducing compositionality in learned representations from pre-trained models, through a reconstruction-based communication game between a sender and a receiver. Building on theories of language emergence, we integrate three mechanisms that jointly promote compressibility, expressivity, and efficiency in the emergent language. First, interactive decoding incentivizes intermediate reasoning by requiring the receiver to produce partial reconstructions after each symbol. Second, a reconstruction-based imitation phase, inspired by iterated learning, trains successive generations of agents to imitate reconstructions rather than messages, enforcing a tighter communication bottleneck. Third, pairwise distance maximization regularizes message diversity by encouraging high distances between messages, with formal links to entropy maximization. Our method significantly improves both the efficiency and compositionality of the learned messages on the Shapes3D and MPI3D datasets, surpassing prior discrete communication frameworks in both reconstruction accuracy and topographic similarity. This work provides new theoretical and empirical evidence for the emergence of structured, generalizable communication protocols from simplicity-based inductive biases.



Article 1932

Title@2025-06-05 (4): Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Title: Optimizing Anytime Reasoning via Budget Relative Policy Optimization Optimierung jederzeit über Budget-Relational-Policy-Optimierung 通过预算相对政策优化优化,优化任何时间的理由 2505.13438v2

Authors (6): Penghui Qi, Zichen Liu, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin

Scaling test-time compute is crucial for enhancing the reasoning capabilities of large language models (LLMs). Existing approaches typically employ reinforcement learning (RL) to maximize a verifiable reward obtained at the end of reasoning traces. However, such methods optimize only the final performance under a large and fixed token budget, which hinders efficiency in both training and deployment. In this work, we present a novel framework, AnytimeReasoner, to optimize anytime reasoning performance, which aims to improve token efficiency and the flexibility of reasoning under varying token budget constraints. To achieve this, we truncate the complete thinking process to fit within sampled token budgets from a prior distribution, compelling the model to summarize the optimal answer for each truncated thinking for verification. This introduces verifiable dense rewards into the reasoning process, facilitating more effective credit assignment in RL optimization. We then optimize the thinking and summary policies in a decoupled manner to maximize the cumulative reward. Additionally, we introduce a novel variance reduction technique, Budget Relative Policy Optimization (BRPO), to enhance the robustness and efficiency of the learning process when reinforcing the thinking policy. Empirical results in mathematical reasoning tasks demonstrate that our method consistently outperforms GRPO across all thinking budgets under various prior distributions, enhancing both training and token efficiency.



Article 1933

Title@2025-06-05 (4): Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic

Title: Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic Causal Policy Learning in Verstärkung Lernen: Hintertür-angepasste Soft-Actor-Kritik 强化学习中因果政策学习:后门调整后软动作器-批评 2506.05445v1

Authors (5): Thanh Vinh Vo, Young Lee, Haozhe Ma, Chien Lu, Tze-Yun Leong

Hidden confounders that influence both states and actions can bias policy learning in reinforcement learning (RL), leading to suboptimal or non-generalizable behavior. Most RL algorithms ignore this issue, learning policies from observational trajectories based solely on statistical associations rather than causal effects. We propose DoSAC (Do-Calculus Soft Actor-Critic with Backdoor Adjustment), a principled extension of the SAC algorithm that corrects for hidden confounding via causal intervention estimation. DoSAC estimates the interventional policy $\pi(a \mid \mathrm{do}(s))$ using the backdoor criterion, without requiring access to true confounders or causal labels. To achieve this, we introduce a learnable Backdoor Reconstructor that infers pseudo-past variables (previous state and action) from the current state to enable backdoor adjustment from observational data. This module is integrated into a soft actor-critic framework to compute both the interventional policy and its entropy. Empirical results on continuous control benchmarks show that DoSAC outperforms baselines under confounded settings, with improved robustness, generalization, and policy reliability.
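
For reference, the generic backdoor adjustment formula that the interventional policy relies on can be written as below. This is the textbook identity, not the paper's exact estimator; the symbol $z$ for the adjustment set is an assumption of this sketch, and in the abstract's description its role would be played by the inferred pseudo-past variables (previous state and action).

```latex
\pi(a \mid \mathrm{do}(s)) \;=\; \sum_{z} \pi(a \mid s, z)\, p(z)
\qquad \text{(an integral replaces the sum for continuous } z\text{)}
```

The key difference from the observational policy $\pi(a \mid s) = \sum_{z} \pi(a \mid s, z)\, p(z \mid s)$ is that $z$ is averaged under its marginal $p(z)$ rather than under $p(z \mid s)$, which removes the dependence between $s$ and the adjustment variables.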



Article 1934

Title@2025-06-05 (4): A Selective Homomorphic Encryption Approach for Faster Privacy-Preserving Federated Learning

Title: A Selective Homomorphic Encryption Approach for Faster Privacy-Preserving Federated Learning Ein selektiver homomorpher Verschlüsselungsansatz für schnelleres Datenschutz-Erhalten von Federated Learning 为更快的隐私-保护联邦学习采取选择性单态加密方法 2501.12911v4

Authors (2): Abdulkadir Korkmaz, Praveen Rao

Federated learning (FL) has come forward as a critical approach for privacy-preserving machine learning in healthcare, allowing collaborative model training across decentralized medical datasets without exchanging clients’ data. However, current security implementations for these systems face a fundamental trade-off: rigorous cryptographic protections like fully homomorphic encryption (FHE) impose prohibitive computational overhead, while lightweight alternatives risk data leakage through model updates. To address this issue, we present FAS (Fast and Secure Federated Learning), a novel approach that strategically combines selective homomorphic encryption, differential privacy, and bitwise scrambling to achieve robust security without compromising practical usability. Our approach eliminates the need for model pretraining phases while dynamically protecting high-risk model parameters through layered encryption and obfuscation. We implemented FAS using the Flower framework and evaluated it on a cluster of eleven physical machines. Our approach was up to 90% faster than applying FHE on the model weights. In addition, we eliminated the computational overhead that is required by competitors such as FedML-HE and MaskCrypt. Our approach was up to 1.5$\times$ faster than the competitors while achieving comparable security results. Experimental evaluations on medical imaging datasets confirm that FAS maintains similar security results to conventional FHE against gradient inversion attacks while preserving diagnostic model accuracy. These results position FAS as a practical solution for latency-sensitive healthcare applications where both privacy preservation and computational efficiency are requirements.



Article 1935

Title@2025-06-05 (4): iN2V: Bringing Transductive Node Embeddings to Inductive Graphs

Title: iN2V: Bringing Transductive Node Embeddings to Inductive Graphs iN2V: Transduktive Knoteneinbettungen in induktive Graphen bringen iN2V: 将感应节点嵌入式带入感应图 2506.05039v1

Authors (2): Nicolas Lell, Ansgar Scherp

Shallow node embeddings like node2vec (N2V) can be used for nodes without features or to supplement existing features with structure-based information. Embedding methods like N2V are limited in their application on new nodes, which restricts them to the transductive setting where the entire graph, including the test nodes, is available during training. We propose inductive node2vec (iN2V), which combines a post-hoc procedure to compute embeddings for nodes unseen during training and modifications to the original N2V training procedure to prepare the embeddings for this post-hoc procedure. We conduct experiments on several benchmark datasets and demonstrate that iN2V is an effective approach to bringing transductive embeddings to an inductive setting. Using iN2V embeddings improves node classification by 1 point on average, with up to 6 points of improvement depending on the dataset and the number of unseen nodes. Our iN2V is a plug-in approach to create new or enrich existing embeddings. It can also be combined with other embedding methods, making it a versatile approach for inductive node representation learning. Code to reproduce the results is available at https://github.com/Foisunt/iN2V .



Article 1936

Title@2025-06-05 (4): Scalable Multi-Agent Offline Reinforcement Learning and the Role of Information

Title: Scalable Multi-Agent Offline Reinforcement Learning and the Role of Information Skalierbares Mehr-Agent-Offline-Verstärkungs-Lernen und die Rolle der Information 可缩放的多代理机构脱线强化学习和信息的作用 2502.11260v2

Authors (3): Riccardo Zamboni, Enrico Brunetti, Marcello Restelli

Offline Reinforcement Learning (RL) focuses on learning policies solely from a batch of previously collected data, offering the potential to leverage such datasets effectively without the need for costly or risky active exploration. While recent advances in Offline Multi-Agent RL (MARL) have shown promise, most existing methods either rely on large datasets jointly collected by all agents or agent-specific datasets collected independently. The former approach ensures strong performance but raises scalability concerns, while the latter emphasizes scalability at the expense of performance guarantees. In this work, we propose a novel scalable routine for both dataset collection and offline learning. Agents first collect diverse datasets coherently with a pre-specified information-sharing network and subsequently learn coherent localized policies without requiring either full observability or falling back to complete decentralization. We theoretically demonstrate that this structured approach allows a multi-agent extension of the seminal Fitted Q-Iteration (FQI) algorithm to globally converge, with high probability, to near-optimal policies. The convergence is subject to error terms that depend on the informativeness of the shared information. Furthermore, we show how this approach allows us to bound the inherent error of the supervised-learning phase of FQI with the mutual information between shared and unshared information. Our algorithm, SCAlable Multi-agent FQI (SCAM-FQI), is then evaluated on a distributed decision-making problem. The empirical results align with our theoretical findings, supporting the effectiveness of SCAM-FQI in achieving a balance between scalability and policy performance.



Article 1937

Title@2025-06-05 (4): TIMING: Temporality-Aware Integrated Gradients for Time Series Explanation

Title: TIMING: Temporality-Aware Integrated Gradients for Time Series Explanation TIMING: Temporality-Aware integrierte Gradienten für die Zeitreihe Erklärung TIMING: 时间序列解释时空软件集成梯度 2506.05035v1

Authors (3): Hyeongwon Jang, Changhun Kim, Eunho Yang

Recent explainable artificial intelligence (XAI) methods for time series primarily estimate point-wise attribution magnitudes, while overlooking the directional impact on predictions, leading to suboptimal identification of significant points. Our analysis shows that conventional Integrated Gradients (IG) effectively capture critical points with both positive and negative impacts on predictions. However, current evaluation metrics fail to assess this capability, as they inadvertently cancel out opposing feature contributions. To address this limitation, we propose novel evaluation metrics, Cumulative Prediction Difference (CPD) and Cumulative Prediction Preservation (CPP), to systematically assess whether attribution methods accurately identify significant positive and negative points in time series XAI. Under these metrics, conventional IG outperforms recent counterparts. However, directly applying IG to time series data may lead to suboptimal outcomes, as generated paths ignore temporal relationships and introduce out-of-distribution samples. To overcome these challenges, we introduce TIMING, which enhances IG by incorporating temporal awareness while maintaining its theoretical properties. Extensive experiments on synthetic and real-world time series benchmarks demonstrate that TIMING outperforms existing time series XAI baselines. Our code is available at https://github.com/drumpt/TIMING.
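
To make the signed attributions discussed above concrete, here is a minimal sketch of conventional Integrated Gradients with a straight-line path, which is the baseline method and not TIMING. The toy model `f`, its analytic gradient `grad_f`, the coefficients `w`, and the all-zero baseline are illustrative assumptions; a real time series would be flattened or handled per time step.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Riemann (midpoint) approximation of IG along the straight path baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps             # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)    # shape (steps, n)
    grads = np.stack([grad_fn(p) for p in path])          # gradient at each path point
    return (x - baseline) * grads.mean(axis=0)

# Toy differentiable model with positive and negative contributions.
w = np.array([1.0, -1.0, 0.5])

def f(x):
    return float(np.sum(w * x**2))

def grad_f(x):
    return 2.0 * w * x

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad_f, x, baseline)

# Completeness: the attributions sum to f(x) - f(baseline).
print(attr, attr.sum(), f(x) - f(baseline))
```

The second point receives a negative attribution even though its magnitude is large, which is the kind of directional information that magnitude-only metrics would cancel or ignore.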



Article 1938

Title@2025-06-05 (4): Identifying and Understanding Cross-Class Features in Adversarial Training

Title: Identifying and Understanding Cross-Class Features in Adversarial Training Identifizieren und Verstehen von Cross-Class-Features im Adversarial Training 确定和理解反向培训的跨系列特点 2506.05032v1

Authors (3): Zeming Wei, Yiwen Guo, Yisen Wang

Adversarial training (AT) has been considered one of the most effective methods for making deep neural networks robust against adversarial attacks, while the training mechanisms and dynamics of AT remain open research problems. In this paper, we present a novel perspective on studying AT through the lens of class-wise feature attribution. Specifically, we identify the impact of a key family of features on AT that are shared by multiple classes, which we call cross-class features. These features are typically useful for robust classification, a claim we support with theoretical evidence from a synthetic data model. Through systematic studies across multiple model architectures and settings, we find that during the initial stage of AT, the model tends to learn more cross-class features until the best robustness checkpoint. As AT further squeezes the training robust loss and causes robust overfitting, the model tends to make decisions based on more class-specific features. Based on these discoveries, we further provide a unified view of two existing properties of AT, including the advantage of soft-label training and robust overfitting. Overall, these insights refine the current understanding of AT mechanisms and provide new perspectives on studying them. Our code is available at https://github.com/PKU-ML/Cross-Class-Features-AT.



Article 1939

Title@2025-06-05 (4): Artificial Intelligence Should Genuinely Support Clinical Reasoning and Decision Making To Bridge the Translational Gap

Title: Artificial Intelligence Should Genuinely Support Clinical Reasoning and Decision Making To Bridge the Translational Gap Künstliche Intelligenz sollte die klinische Vernunft und Entscheidungsfindung wirklich unterstützen, um die translationale Lücke zu überbrücken 人工情报应真正支持临床理由和决策,以弥补翻译差距 2506.05030v1

Authors (3): Kacper Sokol, James Fackler, Julia E Vogt

Artificial intelligence promises to revolutionise medicine, yet its impact remains limited because of the pervasive translational gap. We posit that the prevailing technology-centric approaches underpin this challenge, rendering such systems fundamentally incompatible with clinical practice, specifically diagnostic reasoning and decision making. Instead, we propose a novel sociotechnical conceptualisation of data-driven support tools designed to complement doctors’ cognitive and epistemic activities. Crucially, it prioritises real-world impact over superhuman performance on inconsequential benchmarks.



Article 1940

Title@2025-06-05 (4): Bottlenecked Transformers: Periodic KV Cache Abstraction for Generalised Reasoning

Title: Bottlenecked Transformers: Periodic KV Cache Abstraction for Generalised Reasoning Engpässe Transformer: Regelmäßige KV-Cache Abstraktion für generalisierte Vernunft 装瓶式变压器: 用于一般理由的周期 KV 缓存摘要 2505.16950v2

Authors (5): Adnan Oomerjee, Zafeirios Fountas, Zhongwei Yu, Haitham Bou-Ammar, Jun Wang

Despite their impressive capabilities, Large Language Models struggle with generalisation beyond their training distribution, often exhibiting sophisticated pattern interpolation rather than true abstract reasoning (extrapolation). In this work, we approach this limitation through the lens of Information Bottleneck (IB) theory, which posits that model generalisation emerges from an optimal balance between input compression and retention of predictive information in latent representations. We prove using IB theory that decoder-only Transformers are inherently constrained in their ability to form task-optimal sequence representations. We then use this result to demonstrate that periodic global transformation of the internal sequence-level representations (KV cache) is a necessary computational step for improving Transformer generalisation in reasoning tasks. Based on these theoretical insights, we propose a modification to the Transformer architecture, in the form of an additional module that globally rewrites the KV cache at periodic intervals, shifting its capacity away from memorising input prefixes and toward encoding features most useful for predicting future tokens. Our model delivers substantial gains on mathematical reasoning benchmarks, outperforming both vanilla Transformers with up to 3.5x more parameters and heuristic-driven pruning mechanisms for cache compression. Our approach can be seen as a principled generalisation of existing KV-cache compression methods; whereas such methods focus solely on compressing input representations, they often do so at the expense of retaining predictive information, and thus their capabilities are inherently bounded by those of an unconstrained model. This establishes a principled framework to manipulate Transformer memory using information theory, addressing fundamental reasoning limitations that scaling alone cannot overcome.
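
The balance between compression and predictive information described above is usually formalised by the standard Information Bottleneck Lagrangian, sketched below. The symbols $X$ (input), $Z$ (latent representation), $Y$ (prediction target), and the trade-off weight $\beta$ are the conventional notation and are assumptions here; the paper's specific formulation for the KV cache is not reproduced.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y), \qquad \beta > 0
```

Minimising $I(X; Z)$ compresses the input, while the $-\beta\, I(Z; Y)$ term rewards retaining information that predicts the target; a larger $\beta$ favours prediction over compression.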



Article 1941

Title@2025-06-05 (4): Tuning the Right Foundation Models is What you Need for Partial Label Learning

Title: Tuning the Right Foundation Models is What you Need for Partial Label Learning Tuning the right foundation models is what you need for Partial Label Learning 向右基建模投注是部分标签学习所需要的 2506.05027v1

Authors (4): Kuang He, Wei Tang, Tong Wei, Min-Ling Zhang

Partial label learning (PLL) seeks to train generalizable classifiers from datasets with inexact supervision, a common challenge in real-world applications. Existing studies have developed numerous approaches to progressively refine and recover ground-truth labels by training convolutional neural networks. However, limited attention has been given to foundation models that offer transferable representations. In this work, we empirically conduct comprehensive evaluations of 11 foundation models across 13 PLL approaches on 8 benchmark datasets under 3 PLL scenarios. We further propose PartialCLIP, an efficient fine-tuning framework for foundation models in PLL. Our findings reveal that current PLL approaches tend to 1) achieve significant performance gains when using foundation models, 2) exhibit remarkably similar performance to each other, and 3) maintain stable performance across varying ambiguity levels, but 4) are susceptible to foundation model selection and adaptation strategies. Additionally, we demonstrate the efficacy of text-embedding classifier initialization and effective candidate label filtering using zero-shot CLIP. Our experimental results and analysis underscore the limitations of current PLL approaches and provide valuable insights for developing more generalizable PLL models. The source code can be found at https://github.com/SEU-hk/PartialCLIP.



Article 1942

Title@2025-06-05 (4): A unified weighting framework for evaluating nearest neighbour classification

Title: A unified weighting framework for evaluating nearest neighbour classification Ein einheitlicher Gewichtungsrahmen für die Bewertung der nächsten Nachbarklassifikation 评价近邻分类的统一加权框架 2311.16872v3

Authors (3): Oliver Urs Lenz, Henri Bollaert, Chris Cornelis

We present the first comprehensive and large-scale evaluation of classical (NN), fuzzy (FNN) and fuzzy rough (FRNN) nearest neighbour classification. We standardise existing proposals for nearest neighbour weighting with kernel functions, applied to the distance values and/or ranks of the nearest neighbours of a test instance. In particular, we show that the theoretically optimal Samworth weights converge to a kernel. Kernel functions are closely related to fuzzy negation operators, and we propose a new kernel based on Yager negation. We also consider various distance and scaling measures, which we show can be related to each other. Through a systematic series of experiments on 85 real-life classification datasets, we find that NN, FNN and FRNN all perform best with Boscovich distance, and that NN and FRNN perform best with a combination of Samworth rank- and distance-weights and scaling by the mean absolute deviation around the median ($r_1$), the standard deviation ($r_2$) or the semi-interquartile range ($r_{\infty}^*$), while FNN performs best with only Samworth distance-weights and $r_1$- or $r_2$-scaling. However, NN achieves comparable performance with Yager-$\frac{1}{2}$ distance-weights, which are simpler to implement than a combination of Samworth distance- and rank-weights. Finally, FRNN generally outperforms NN, which in turn performs systematically better than FNN.
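
To make the kernel-weighting idea concrete, here is a minimal sketch of distance-weighted nearest neighbour voting with a Yager-$\frac{1}{2}$ kernel. The Yager negation $N_p(x) = (1 - x^p)^{1/p}$ is standard; everything else is an assumption for illustration rather than the authors' setup: the function names, taking Boscovich distance to be the city-block (L1) distance, and rescaling neighbour distances by the distance to the (k+1)-th neighbour.

```python
import numpy as np

def yager_kernel(x, p=0.5):
    """Yager negation (1 - x^p)^(1/p) on [0, 1], used here as a weighting kernel."""
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    return (1.0 - x ** p) ** (1.0 / p)

def weighted_nn_predict(train_x, train_y, query, k=3, p=0.5):
    """Classify `query` by kernel-weighted votes of its k nearest neighbours."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    # City-block (L1) distance from the query to every training instance.
    dists = np.abs(train_x - np.asarray(query, dtype=float)).sum(axis=1)
    order = np.argsort(dists)
    nearest = order[:k]
    # Assumption: rescale to [0, 1] using the (k+1)-th neighbour distance when it exists.
    radius = dists[order[min(k, len(order) - 1)]]
    scaled = dists[nearest] / radius if radius > 0 else np.zeros(len(nearest))
    weights = yager_kernel(scaled, p)
    if weights.sum() == 0:
        # Fall back to unweighted voting if every weight vanished.
        weights = np.ones_like(weights)
    classes = np.unique(train_y)
    votes = [weights[train_y[nearest] == c].sum() for c in classes]
    return classes[int(np.argmax(votes))]
```

With $p = \frac{1}{2}$ the kernel reduces to $(1 - \sqrt{x})^2$, which needs only one square root per neighbour; this is in line with the abstract's remark that these weights are simple to implement.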



Article 1943

Title@2025-06-05 (4): GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs

Title: GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs GRU: Abbau des Kompromisses zwischen Unlearning und Retention für LLMs GRU:减少LLMM的不学习与保留之间的取舍 2503.09117v3

Authors (7): Yue Wang, Qizhou Wang, Feng Liu, Wei Huang, Yali Du, Xiaojiang Du, Bo Han

Large language model (LLM) unlearning has demonstrated its essential role in removing privacy- and copyright-related responses, which is crucial for the legal and safe application of LLMs. However, the pursuit of complete unlearning often comes at substantial cost because it compromises the models' general functionality, leading to a notorious trade-off between unlearning and retention. This motivates us to explore enhanced unlearning schemes that can mitigate this trade-off. Specifically, we propose Gradient Rectified Unlearning (GRU), an improved framework that regulates the directions of gradient updates during the unlearning procedure so that their side effects on other, unrelated responses are minimized. GRU is general and easy to implement, and it demonstrates practical effectiveness across a variety of well-established unlearning benchmarks.
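
The abstract does not spell out how gradient directions are regulated. One plausible reading, shown here purely as an illustrative sketch and not as the authors' algorithm, is a projection step: when the unlearning gradient conflicts with the retention gradient, the conflicting component is removed. The function name and the projection rule are assumptions.

```python
import numpy as np

def rectify_gradient(unlearn_grad, retain_grad):
    """Remove the component of the unlearning gradient that conflicts with retention.

    Both inputs are gradients of losses to be minimised. A descent step along
    -unlearn_grad increases the retention loss (to first order) exactly when the
    two gradients have a negative inner product.
    """
    g_u = np.asarray(unlearn_grad, dtype=float)
    g_r = np.asarray(retain_grad, dtype=float)
    dot = float(g_u @ g_r)
    norm_sq = float(g_r @ g_r)
    if dot >= 0.0 or norm_sq == 0.0:
        # No conflict (or no retention signal): leave the gradient unchanged.
        return g_u
    # Project onto the hyperplane orthogonal to the retention gradient.
    return g_u - (dot / norm_sq) * g_r
```

After rectification the update is orthogonal to the retention gradient, so to first order it no longer changes the retention loss.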



Article 1944

Title@2025-06-05 (4): U-NetMN and SegNetMN: Modified U-Net and SegNet models for bimodal SAR image segmentation

Title: U-NetMN and SegNetMN: Modified U-Net and SegNet models for bimodal SAR image segmentation U-NetMN und SegNetMN: Modifizierte U-Net- und SegNet-Modelle für bimodale SAR-Bildsegmentierung U-NetMN和SegNetMN:经过修改的双式合成孔径雷达图像分离 U-Net和SegNet模型和SegNet模型 2506.05444v1

Authors (6): Marwane Kzadri, Franco Alberto Cardillo, Nanée Chahinian, Carole Delenne, Renaud Hostache, Jamal Riffi

Segmenting Synthetic Aperture Radar (SAR) images is crucial for many remote sensing applications, particularly water body detection. However, deep learning-based segmentation models often face challenges related to convergence speed and stability, mainly due to the complex statistical distribution of this type of data. In this study, we evaluate the impact of mode normalization on two widely used semantic segmentation models, U-Net and SegNet. Specifically, we integrate mode normalization to reduce convergence time while maintaining the performance of the baseline models. Experimental results demonstrate that mode normalization significantly accelerates convergence. Furthermore, cross-validation results indicate that normalized models exhibit increased stability across different zones. These findings highlight the effectiveness of normalization in improving computational efficiency and generalization in SAR image segmentation.



Article 1945

Title@2025-06-05 (4): On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding

Title: On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding Über Expressive Power von Schleiftransformatoren: Theoretische Analyse und Verbesserung über die Zeitschrittkodierung 循环变换器的表达力:通过时间步骤编码进行理论分析和加强 2410.01405v7

Authors (2): Kevin Xu, Issei Sato

Looped Transformers provide advantages in parameter efficiency, computational capabilities, and generalization for reasoning tasks. However, their expressive power regarding function approximation remains underexplored. In this paper, we establish the approximation rate of Looped Transformers by defining the modulus of continuity for sequence-to-sequence functions. This reveals a limitation specific to the looped architecture. That is, the analysis prompts the incorporation of scaling parameters for each loop, conditioned on timestep encoding. Experiments validate the theoretical results, showing that increasing the number of loops enhances performance, with further gains achieved through the timestep encoding.



Article 1946

Title@2025-06-05 (4): Stein Variational Evolution Strategies

Title: Stein Variational Evolution Strategies Strategien der Steinvariational Evolution Stein变异演变战略 2410.10390v2

Authors (3): Cornelius V. Braun, Robert T. Lange, Marc Toussaint

Stein Variational Gradient Descent (SVGD) is a highly efficient method to sample from an unnormalized probability distribution. However, the SVGD update relies on gradients of the log-density, which may not always be available. Existing gradient-free versions of SVGD make use of simple Monte Carlo approximations or gradients from surrogate distributions, both with limitations. To improve gradient-free Stein variational inference, we combine SVGD steps with evolution strategy (ES) updates. Our results demonstrate that the resulting algorithm generates high-quality samples from unnormalized target densities without requiring gradient information. Compared to prior gradient-free SVGD methods, we find that the integration of the ES update in SVGD significantly improves the performance on multiple challenging benchmark problems.
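
For context, the sketch below shows the standard gradient-based SVGD update with an RBF kernel, i.e. the step whose dependence on the score $\nabla \log p$ motivates gradient-free variants. It is not the ES-based method proposed in the paper; the function name, the fixed bandwidth, and the step size are illustrative assumptions.

```python
import numpy as np

def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One standard SVGD update with kernel k(x, y) = exp(-||x - y||^2 / (2 h^2))."""
    x = np.asarray(particles, dtype=float)            # shape (n, d)
    n = x.shape[0]
    diffs = x[:, None, :] - x[None, :, :]             # diffs[i, j] = x_i - x_j
    sq_dists = (diffs ** 2).sum(axis=-1)              # shape (n, n)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    scores = grad_log_p(x)                            # shape (n, d)
    # Attraction: kernel-weighted average of scores pulls particles toward high density.
    attraction = kernel @ scores
    # Repulsion: the kernel gradient term pushes nearby particles apart.
    repulsion = (kernel[:, :, None] * diffs).sum(axis=1) / bandwidth ** 2
    return x + step_size * (attraction + repulsion) / n
```

For a standard normal target, `grad_log_p` is simply `lambda x: -x`. The attraction term is the only place where the score enters, which is the part a gradient-free method has to approximate or replace.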



Article 1947

Title@2025-06-05 (4): Controlling Summarization Length Through EOS Token Weighting

Title: Controlling Summarization Length Through EOS Token Weighting Kontrolle der Zusammenfassung Länge durch EOS Token Gewichtung 控制通过 EOS 过 EOS 键权重加权的控控总和长度 2506.05017v1

Authors (4): Zeno Belligoli, Emmanouil Stergiadis, Eran Fainman, Ilya Gusev

Controlling the length of generated text can be crucial in various text-generation tasks, including summarization. Existing methods often require complex model alterations, limiting compatibility with pre-trained models. We address these limitations by developing a simple approach for controlling the length of automatic text summaries by increasing the importance of correctly predicting the EOS token in the cross-entropy loss computation. The proposed methodology is agnostic to architecture and decoding algorithms and orthogonal to other inference-time techniques to control the generation length. We tested it with encoder-decoder and modern GPT-style LLMs, and show that this method can control generation length, often without affecting the quality of the summary.
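
As an illustration of up-weighting the EOS token in the cross-entropy loss, here is a minimal NumPy sketch. The function name, the default weight of 5.0, and the choice to normalise by the total weight are assumptions made for this example; the abstract does not specify these details.

```python
import numpy as np

def eos_weighted_cross_entropy(logits, targets, eos_id, eos_weight=5.0):
    """Token-level cross-entropy where positions whose target is EOS count more.

    logits:  (seq_len, vocab_size) unnormalised scores.
    targets: (seq_len,) integer token ids.
    """
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets)
    # Numerically stable log-softmax over the vocabulary.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    # EOS targets get weight `eos_weight`; every other token gets weight 1.
    weights = np.where(targets == eos_id, eos_weight, 1.0)
    # Assumption: normalise by the total weight so the loss scale stays comparable.
    return float((weights * nll).sum() / weights.sum())
```

Setting `eos_weight=1.0` recovers the ordinary mean cross-entropy, so the change touches only the loss and leaves the architecture and the decoding procedure alone.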



Article 1948

Title@2025-06-05 (4): A new architecture of high-order deep neural networks that learn martingales

Title: A new architecture of high-order deep neural networks that learn martingales Eine neue Architektur von hochordentlich tiefen neuronalen Netzwerken, die martingales lernen 一个由高阶深层神经网络组成的新结构,能够学习马丁鱼 2505.03789v2

Authors (2): Syoiti Ninomiya, Yuming Ma

A new deep-learning neural network architecture based on high-order weak approximation algorithms for stochastic differential equations (SDEs) is proposed. The architecture enables the efficient learning of martingales by deep learning models. The behaviour of deep neural networks based on this architecture, when applied to the problem of pricing financial derivatives, is also examined. The core of this new architecture lies in the high-order weak approximation algorithms of the explicit Runge–Kutta type, wherein the approximation is realised solely through iterative compositions and linear combinations of vector fields of the target SDEs.



Article 1949

Title@2025-06-05 (4): Multi-Point Proximity Encoding For Vector-Mode Geospatial Machine Learning

Title: Multi-Point Proximity Encoding For Vector-Mode Geospatial Machine Learning Multi-Point Proximity Encoding für Vector-Mode Geospatial Machine Learning 矢量模式地理空间机器学习多点近似编码 2506.05016v1

Authors (1): John Collins

Vector-mode geospatial data – points, lines, and polygons – must be encoded into an appropriate form in order to be used with traditional machine learning and artificial intelligence models. Encoding methods attempt to represent a given shape as a vector that captures its essential geometric properties. This paper presents an encoding method based on scaled distances from a shape to a set of reference points within a region of interest. The method, MultiPoint Proximity (MPP) encoding, can be applied to any type of shape, enabling the parameterization of machine learning models with encoded representations of vector-mode geospatial features. We show that MPP encoding possesses the desirable properties of shape-centricity and continuity, can be used to differentiate spatial objects based on their geometric features, and can capture pairwise spatial relationships with high precision. In all cases, MPP encoding is shown to perform better than an alternative method based on rasterization.
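
The idea of encoding a shape by its scaled distances to a set of reference points can be sketched as follows. This is a simplified illustration for points and polylines only. The function names and the scaling $\exp(-d/\text{scale})$ are assumptions; the abstract does not state which scaling function the method uses.

```python
import numpy as np

def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    ab = b - a
    denom = float(ab @ ab)
    if denom == 0.0:
        # Degenerate segment: a and b coincide.
        return float(np.linalg.norm(p - a))
    t = np.clip(((p - a) @ ab) / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def mpp_encode(vertices, reference_points, scale=1.0):
    """Encode a point or polyline as scaled distances to each reference point."""
    vertices = np.asarray(vertices, dtype=float).reshape(-1, 2)
    if len(vertices) == 1:
        segments = [(vertices[0], vertices[0])]
    else:
        segments = list(zip(vertices[:-1], vertices[1:]))
    encoding = []
    for ref in np.asarray(reference_points, dtype=float):
        d = min(point_to_segment_distance(ref, a, b) for a, b in segments)
        # Assumed scaling: 1 when the shape touches the reference point, decaying with distance.
        encoding.append(np.exp(-d / scale))
    return np.array(encoding)
```

The output length equals the number of reference points regardless of shape type, which is what allows points, lines, and polygons to share one fixed-size model input.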



Article 1950

Title@2025-06-05 (4): Towards Reasonable Concept Bottleneck Models

Title: Towards Reasonable Concept Bottleneck Models Auf dem Weg zu vernünftigen Konzeptmodellen für Engpässe 迈向合理概念的瓶颈模式 2506.05014v1

Authors (4): Nektarios Kalampalikis, Kavya Gupta, Georgi Vitanov, Isabel Valera

In this paper, we propose $\textbf{C}$oncept $\textbf{REA}$soning $\textbf{M}$odels (CREAM), a novel family of Concept Bottleneck Models (CBMs) that: (i) explicitly encode concept-concept (${\texttt{C-C}}$) and concept-task (${\texttt{C$\rightarrow$Y}}$) relationships to enforce a desired model reasoning; and (ii) use a regularized side-channel to achieve competitive task performance while keeping concept importance high. Specifically, CREAM architecturally embeds (bi)directed concept-concept and concept-to-task relationships specified by a human expert, while severing undesired information flows (e.g., to handle mutually exclusive concepts). Moreover, CREAM integrates a black-box side-channel that is regularized to encourage task predictions to be grounded in the relevant concepts, thereby utilizing the side-channel only when necessary to enhance performance. Our experiments show that: (i) CREAM mainly relies on concepts while achieving task performance on par with black-box models; and (ii) the embedded ${\texttt{C-C}}$ and ${\texttt{C$\rightarrow$Y}}$ relationships ease model interventions and mitigate concept leakage.



Article 1951

Title@2025-06-05 (4): Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity

Title: Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity 1568 Tokens in einen einzigen Vektor und wieder zurück krammen: Die Grenzen der Einbettung von Raumkapazität erkunden 将1568吨撞成单一矢量和后向:探索嵌入空间能力的极限 2502.13063v2

Authors (4): Yuri Kuratov, Mikhail Arkhipov, Aydar Bulatov, Mikhail Burtsev

A range of recent works addresses the problem of compressing a sequence of tokens into a shorter sequence of real-valued vectors to be used as inputs instead of token embeddings or a key-value cache. These approaches focus on reducing the amount of compute in existing language models rather than minimizing the number of bits needed to store text. Despite relying on powerful models as encoders, the maximum attainable lossless compression ratio is typically not higher than x10. This fact is highly intriguing because, in theory, the maximum information capacity of large real-valued vectors is far beyond the presented rates even for 16-bit precision and a modest vector size. In this work, we explore the limits of compression by replacing the encoder with a per-sample optimization procedure. We show that vectors with compression ratios up to x1500 exist, which highlights a gap of two orders of magnitude between existing and practically attainable solutions. Furthermore, we empirically show that the compression limits are determined not by the length of the input but by the amount of uncertainty to be reduced, namely, the cross-entropy loss on this sequence without any conditioning. The obtained limits highlight the substantial gap between the theoretical capacity of input embeddings and their practical utilization, suggesting significant room for optimization in model design.
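
The capacity claim can be illustrated with a simple counting argument: a vector of `dim` values stored at `bits_per_value` bits holds `dim * bits_per_value` bits, while one token from a vocabulary of size `vocab_size` needs at most `log2(vocab_size)` bits. The helper below is a hypothetical illustration of that arithmetic, not a computation taken from the paper, and it ignores whether a real model could decode such a code.

```python
import math

def max_tokens_per_vector(dim, bits_per_value, vocab_size):
    """Counting upper bound on tokens that fit into one vector's raw bit budget."""
    vector_bits = dim * bits_per_value          # total bits stored in the vector
    bits_per_token = math.log2(vocab_size)      # worst-case bits to identify one token
    return vector_bits / bits_per_token
```

For example, with the assumed values `dim=1024`, `bits_per_value=16`, and `vocab_size=65536`, each token costs 16 bits, so the bound is 1024 tokens in a single vector, far above a compression ratio of 10.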



Article 1952

Title@2025-06-05 (4): QiMeng: Fully Automated Hardware and Software Design for Processor Chip

Title: QiMeng: Fully Automated Hardware and Software Design for Processor Chip QiMeng: Vollautomatisierte Hardware und Software-Design für Prozessor Chip QiMeng:处理芯片全自动硬件和软件设计 2506.05007v1

Authors (20): Rui Zhang, Yuanbo Wen, Shuyao Cheng, Di Huang, Shaohui Peng, Jiaming Guo, Pengwei Jin, Jiacheng Zhao, Tianrui Ma, Yaoyu Zhu, Yifan Hao, Yongwei Zhao, Shengwen Liang, Ying Wang, Xing Hu, Zidong Du, Huimin Cui, Ling Li, Qi Guo, Yunji Chen

Processor chip design technology serves as a key frontier driving breakthroughs in computer science and related fields. With the rapid advancement of information technology, conventional design paradigms face three major challenges: the physical constraints of fabrication technologies, the escalating demands for design resources, and the increasing diversity of ecosystems. Automated processor chip design has emerged as a transformative solution to address these challenges. While recent breakthroughs in Artificial Intelligence (AI), particularly Large Language Model (LLM) techniques, have opened new possibilities for fully automated processor chip design, substantial challenges remain in establishing domain-specific LLMs for processor chip design. In this paper, we propose QiMeng, a novel system for fully automated hardware and software design of processor chips. QiMeng comprises three hierarchical layers. In the bottom layer, we construct a domain-specific Large Processor Chip Model (LPCM) that introduces novel designs in architecture, training, and inference to address key challenges such as the knowledge representation gap, data scarcity, correctness assurance, and the enormous solution space. In the middle layer, leveraging the LPCM’s knowledge representation and inference capabilities, we develop the Hardware Design Agent and the Software Design Agent to automate the design of hardware and software for processor chips. Currently, several components of QiMeng have been completed and successfully applied in various top-layer applications, demonstrating significant advantages and providing a feasible solution for efficient, fully automated hardware/software design of processor chips. Future research will focus on integrating all components and performing iterative top-down and bottom-up design processes to establish a comprehensive QiMeng system.



Article 1953

Title@2025-06-05 (4): Monotonic anomaly detection

Title: Monotonic anomaly detection Monotonische Anomalieerkennung 单声异常探测 2410.23158v2

Authors (2): Oliver Urs Lenz, Matthijs van Leeuwen

Semi-supervised anomaly detection is based on the principle that potential anomalies are those records that look different from normal training data. However, in some cases we are specifically interested in anomalies that correspond to high attribute values (or low, but not both). We present two asymmetrical distance measures that take this monotonicity into account: ramp distance and signed distance. Through experiments on synthetic and real-life datasets, we show that ramp distance increases anomaly detection performance over the traditional absolute distance. While signed distance also performs well on synthetic data, it performs substantially worse on real-life datasets. We argue that this is a consequence of the fact that when using signed distance, low values of certain attributes automatically compensate for high values of other attributes, such that anomaly detection is reduced to counting the total attribute value sum, which is too simplistic in practice.
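
The difference between the three distances can be sketched as follows. The abstract does not give formulas, so the definitions here are assumptions chosen to match its description: per-attribute differences between a query and a reference record are summed after taking the absolute value, after clipping negatives to zero (ramp), or as they are (signed).

```python
import numpy as np

def absolute_distance(x, y):
    """Symmetric city-block distance: deviations in either direction count."""
    return float(np.abs(np.asarray(x, float) - np.asarray(y, float)).sum())

def ramp_distance(x, y):
    """Assumed ramp distance: only attributes where x exceeds y contribute."""
    return float(np.maximum(np.asarray(x, float) - np.asarray(y, float), 0.0).sum())

def signed_distance(x, y):
    """Assumed signed distance: lower attributes offset higher ones."""
    return float((np.asarray(x, float) - np.asarray(y, float)).sum())
```

For a query `[3, 0]` and a reference `[1, 2]`, these definitions give an absolute distance of 4, a ramp distance of 2, and a signed distance of 0. The last value shows the compensation effect the abstract describes: the low second attribute cancels the high first attribute entirely.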



Article 1954

Title@2025-06-05 (4): Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games

Title: Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games Vorsichtiger Optimismus: Ein Meta-Algorithmus für fast-konstantes Bedauern in den allgemeinen Spielen 谨慎乐观主义:普通运动会中近距离遗憾的代算算法 2506.05005v1

Authors (3): Ashkan Soleymani, Georgios Piliouras, Gabriele Farina

Recent work [Soleymani et al., 2025] introduced a variant of Optimistic Multiplicative Weights Updates (OMWU) that adaptively controls the learning pace in a dynamic, non-monotone manner, achieving new state-of-the-art regret minimization guarantees in general games. In this work, we demonstrate that no-regret learning acceleration through adaptive pacing of the learners is not an isolated phenomenon. We introduce Cautious Optimism, a framework for substantially faster regularized learning in general games. Cautious Optimism takes as input any instance of Follow-the-Regularized-Leader (FTRL) and outputs an accelerated no-regret learning algorithm by pacing the underlying FTRL with minimal computational overhead. Importantly, we retain uncoupledness (learners do not need to know other players’ utilities). Cautious Optimistic FTRL achieves near-optimal $O_T(\log T)$ regret in diverse self-play (mixing-and-matching regularizers) while preserving the optimal $O(\sqrt{T})$ regret in adversarial scenarios. In contrast to prior works (e.g. Syrgkanis et al. [2015], Daskalakis et al. [2021]), our analysis does not rely on monotonic step-sizes, showcasing a novel route for fast learning in general games.
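
As background, the sketch below implements one step of plain OMWU with a fixed learning rate, the baseline that the paced variants build on. It does not implement Cautious Optimism or any adaptive pacing. The function name and the default learning rate are illustrative assumptions.

```python
import numpy as np

def omwu_step(strategy, utility, prev_utility, learning_rate=0.1):
    """One Optimistic Multiplicative Weights Update over a probability simplex.

    Uses the recursive form x_{t+1} proportional to x_t * exp(eta * (2 u_t - u_{t-1})).
    Assumes every entry of `strategy` is strictly positive.
    """
    strategy = np.asarray(strategy, dtype=float)
    # Optimism: count the latest utility twice and subtract the previous one.
    optimistic = 2.0 * np.asarray(utility, dtype=float) - np.asarray(prev_utility, dtype=float)
    logits = np.log(strategy) + learning_rate * optimistic
    logits -= logits.max()                      # stabilise the exponentials
    updated = np.exp(logits)
    return updated / updated.sum()
```

Each player needs only its own utility vector for this update, so the procedure is uncoupled in the sense used in the abstract.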



Article 1955

Title@2025-06-05 (4): UniPTMs: The First Unified Multi-type PTM Site Prediction Model via Master-Slave Architecture-Based Multi-Stage Fusion Strategy and Hierarchical Contrastive Loss

Title: UniPTMs: The First Unified Multi-type PTM Site Prediction Model via Master-Slave Architecture-Based Multi-Stage Fusion Strategy and Hierarchical Contrastive Loss UniPTMs: Das erste einheitliche Multi-Type PTM Site Prediction Model über Master-Slave Architecture-basierte Multi-Stage Fusionsstrategie und hierarchischen Kontrastverlust UNIPTMs:第一个单一多型PTM多型PTM站点预测模型,通过基于万能结构的建筑多阶段融合战略和等级对立损失 2506.05443v1

Authors (6): Yiyu Lin, Yan Wang, You Zhou, Xinye Ni, Jiahui Wu, Sen Yang

As a core mechanism of epigenetic regulation in eukaryotes, protein post-translational modifications (PTMs) require precise prediction to decipher dynamic life activity networks. To address the limitations of existing deep learning models in cross-modal feature fusion, domain generalization, and architectural optimization, this study proposes UniPTMs: the first unified framework for multi-type PTM prediction. The framework innovatively establishes a “Master-Slave” dual-path collaborative architecture: The master path dynamically integrates high-dimensional representations of protein sequences, structures, and evolutionary information through a Bidirectional Gated Cross-Attention (BGCA) module, while the slave path optimizes feature discrepancies and recalibration between structural and traditional features using a Low-Dimensional Fusion Network (LDFN). Complemented by a Multi-scale Adaptive convolutional Pyramid (MACP) for capturing local feature patterns and a Bidirectional Hierarchical Gated Fusion Network (BHGFN) enabling multi-level feature integration across paths, the framework employs a Hierarchical Dynamic Weighting Fusion (HDWF) mechanism to intelligently aggregate multimodal features. Enhanced by a novel Hierarchical Contrastive loss function for feature consistency optimization, UniPTMs demonstrates significant performance improvements (3.2%-11.4% MCC and 4.2%-14.3% AP increases) over state-of-the-art models across five modification types and transcends the Single-Type Prediction Paradigm. To strike a balance between model complexity and performance, we have also developed a lightweight variant named UniPTMs-mini.



Article 1956

Title@2025-06-05 (4): FPTQuant: Function-Preserving Transforms for LLM Quantization

Title: FPTQuant: Function-Preserving Transforms for LLM Quantization FPTQuant: Funktionsschonende Transformationen für die LLM-Quantisierung FPTQuant: 用于LLM量化的功能保存变换 2506.04985v1

Authors (4): Boris van Breugel, Yelysei Bondarenko, Paul Whatmough, Markus Nagel

Large language models (LLMs) require substantial compute, and thus energy, at inference time. While quantizing weights and activations is effective at improving efficiency, naive quantization of LLMs can significantly degrade performance due to large magnitude outliers. This paper describes FPTQuant, which introduces four novel, lightweight, and expressive function-preserving transforms (FPTs) to facilitate quantization of transformers: (1) a mergeable pre-RoPE transform for queries and keys, (2) a mergeable transform for values, (3) a mergeable scaling transform within the MLP block, and (4) a cheap, dynamic scaling transform. By leveraging the equivariances and independencies inherent to canonical transformer operation, we designed these FPTs to maintain the model’s function while shaping the intermediate activation distributions to be more quantization friendly. FPTQuant requires no custom kernels and adds virtually no overhead during inference. The FPTs are trained both locally to reduce outliers, and end-to-end such that the outputs of the quantized and full-precision models match. FPTQuant enables static INT4 quantization with minimal overhead and shows SOTA speed-up of up to 3.9 times over FP. Empirically, FPTQuant has an excellent accuracy-speed trade-off: it performs on par with or exceeds most prior work and only shows slightly lower accuracy compared to a method that is up to 29% slower.
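
What "function-preserving" and "mergeable" mean can be shown on attention logits: multiplying the query projection by an invertible matrix $T$ and the key projection by $T^{-\top}$ changes the intermediate activations but leaves $QK^\top$ unchanged, and both factors can be folded into the weights ahead of time. This is a simplified sketch that ignores RoPE and is not the paper's specific transform; all names and sizes are assumptions.

```python
import numpy as np

def attention_scores(x_q, x_k, w_q, w_k):
    """Unnormalised attention logits (x_q W_q)(x_k W_k)^T."""
    return (x_q @ w_q) @ (x_k @ w_k).T

def merge_transform(w_q, w_k, transform):
    """Fold an invertible matrix T into the projections as W_q T and W_k T^{-T}."""
    return w_q @ transform, w_k @ np.linalg.inv(transform).T

rng = np.random.default_rng(0)
d_model, d_head = 16, 8
x_q = rng.normal(size=(4, d_model))
x_k = rng.normal(size=(6, d_model))
w_q = rng.normal(size=(d_model, d_head))
w_k = rng.normal(size=(d_model, d_head))
# Random matrix shifted by a multiple of the identity to keep it well conditioned.
transform = rng.normal(size=(d_head, d_head)) + d_head * np.eye(d_head)

w_q_t, w_k_t = merge_transform(w_q, w_k, transform)
original = attention_scores(x_q, x_k, w_q, w_k)
transformed = attention_scores(x_q, x_k, w_q_t, w_k_t)
```

Here `original` and `transformed` agree up to floating-point error, while the transformed queries `x_q @ w_q_t` differ from `x_q @ w_q`. That freedom is what allows a transform to be chosen so that activations have fewer outliers before quantization, at no extra inference cost once merged.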



Article 1957

Title@2025-06-05 (4): Agentic AI for Intent-Based Industrial Automation

Title: Agentic AI for Intent-Based Industrial Automation Agentische KI für Intent-Based Industrial Automation 用于内在工业自动化的AAA剂 2506.04980v1

Authors (2): Marcos Lima Romero, Ricardo Suyama

The recent development of Agentic AI systems, empowered by autonomous large language model (LLM) agents with planning and tool-usage capabilities, enables new possibilities for the evolution of industrial automation and reduces the complexity introduced by Industry 4.0. This work proposes a conceptual framework that integrates Agentic AI with the intent-based paradigm, originally developed in network research, to simplify human-machine interaction (HMI) and better align automation systems with the human-centric, sustainable, and resilient principles of Industry 5.0. Building on intent-based processing, the framework allows human operators to express high-level business or operational goals in natural language, which are decomposed into actionable components. These intents are broken into expectations, conditions, targets, context, and information that guide sub-agents equipped with specialized tools to execute domain-specific tasks. A proof of concept was implemented using the CMAPSS dataset and Google Agent Developer Kit (ADK), demonstrating the feasibility of intent decomposition, agent orchestration, and autonomous decision-making in predictive maintenance scenarios. The results confirm the potential of this approach to reduce technical barriers and enable scalable, intent-driven automation, despite data quality and explainability concerns.



Article 1958

Title@2025-06-05 (4): Entropy-based Training Methods for Scalable Neural Implicit Sampler

Title: Entropy-based Training Methods for Scalable Neural Implicit Sampler Entropie-basierte Trainingsmethoden für skalierbare neurale Implizite Sampler 用于可缩缩的神经隐隐性取样器的基于英信的培训方法 2306.04952v2

Authors (3): Weijian Luo, Boya Zhang, Zhihua Zhang

Efficiently sampling from un-normalized target distributions is a fundamental problem in scientific computing and machine learning. Traditional approaches such as Markov Chain Monte Carlo (MCMC) guarantee asymptotically unbiased samples from such distributions but suffer from computational inefficiency, particularly when dealing with high-dimensional targets, as they require numerous iterations to generate a batch of samples. In this paper, we introduce an efficient and scalable neural implicit sampler that overcomes these limitations. The implicit sampler can generate large batches of samples with low computational costs by leveraging a neural transformation that directly maps easily sampled latent vectors to target samples without the need for iterative procedures. To train the neural implicit samplers, we introduce two novel methods: the KL training method and the Fisher training method. The former method minimizes the Kullback-Leibler divergence, while the latter minimizes the Fisher divergence between the sampler and the target distributions. By employing the two training methods, we effectively optimize the neural implicit samplers to learn and generate from the desired target distribution. To demonstrate the effectiveness, efficiency, and scalability of our proposed samplers, we evaluate them on three sampling benchmarks with different scales.
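
For reference, the two divergences named above have the following standard forms, written for a sampler density $q_\theta$ and a target density $p$. The direction of each divergence and the absence of a factor $\frac{1}{2}$ in the Fisher divergence are assumptions here, since the abstract does not fix these conventions.

```latex
\begin{align}
  D_{\mathrm{KL}}(q_\theta \,\|\, p)
    &= \mathbb{E}_{x \sim q_\theta}\bigl[\log q_\theta(x) - \log p(x)\bigr], \\
  D_{\mathrm{F}}(q_\theta \,\|\, p)
    &= \mathbb{E}_{x \sim q_\theta}\Bigl[\bigl\|\nabla_x \log q_\theta(x) - \nabla_x \log p(x)\bigr\|_2^2\Bigr].
\end{align}
```

If only an un-normalized density $\tilde{p} = Z\,p$ is available, replacing $\log p$ by $\log \tilde{p}$ shifts the KL divergence by the constant $\log Z$ and leaves the Fisher divergence unchanged, because $\nabla_x \log \tilde{p} = \nabla_x \log p$. This is why both objectives are usable without knowing the normalizing constant.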



Article 1959

Title@2025-06-05 (4): Deep histological synthesis from mass spectrometry imaging for multimodal registration

Title: Deep histological synthesis from mass spectrometry imaging for multimodal registration Tiefe histologische Synthese aus Massenspektrometrie-Bildgebung für multimodale Registrierung 从用于多式联运注册的质谱成像成像中产生深度神学合成 2506.05441v1

Authors (4): Kimberley M. Bird, Xujiong Ye, Alan M. Race, James M. Brown

Registration of histological and mass spectrometry imaging (MSI) allows for more precise identification of structural changes and chemical interactions in tissue. With histology and MSI having entirely different image formation processes and dimensionalities, registration of the two modalities remains an ongoing challenge. This work proposes a solution that synthesises histological images from MSI, using a pix2pix model, to effectively enable unimodal registration. Preliminary results show promising synthetic histology images with limited artifacts, achieving increases in mutual information (MI) and structural similarity index measures (SSIM) of +0.924 and +0.419, respectively, compared to a baseline U-Net model. Our source code is available on GitHub: https://github.com/kimberley/MIUA2025.



Article 1960

Title@2025-06-05 (4): Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences

Title: Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences Diff-Instruct++: Ein-Schritt-Training Text-zu-Bild-Generator-Modell zur Ausrichtung mit menschlichen Einstellungen Diff- Instrucct+++:培训一步骤文本到图像生成模型,使之与人类偏好相一致 2410.18881v2

Authors (1): Weijian Luo

One-step text-to-image generator models offer advantages such as swift inference efficiency, flexible architectures, and state-of-the-art generation performance. In this paper, we study the problem of aligning one-step generator models with human preferences for the first time. Inspired by the success of reinforcement learning using human feedback (RLHF), we formulate the alignment problem as maximizing expected human reward functions while adding an Integral Kullback-Leibler divergence term to prevent the generator from diverging. By overcoming technical challenges, we introduce Diff-Instruct++ (DI++), the first, fast-converging and image data-free human preference alignment method for one-step text-to-image generators. We also introduce novel theoretical insights, showing that using CFG for diffusion distillation is secretly doing RLHF with DI++. Such an interesting finding brings understanding and potential contributions to future research involving CFG. In the experiment sections, we align both UNet-based and DiT-based one-step generators using DI++, which use the Stable Diffusion 1.5 and the PixelArt-$\alpha$ as the reference diffusion processes. The resulting DiT-based one-step text-to-image model achieves a strong Aesthetic Score of 6.19 and an Image Reward of 1.24 on the COCO validation prompt dataset. It also achieves a leading Human preference Score (HPSv2.0) of 28.48, outperforming other open-sourced models such as Stable Diffusion XL, DMD2, SD-Turbo, as well as PixelArt-$\alpha$. Both theoretical contributions and empirical evidence indicate that DI++ is a strong human-preference alignment approach for one-step text-to-image models. The homepage of the paper is https://github.com/pkulwj1994/diff_instruct_pp.



Article 1961

Title@2025-06-05 (4): Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule

Title: Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule Pilotieren von strukturbasiertem Drug Design über Modalitätsspezifische Optimale Planung 通过特定方式和特定最佳安排表进行基于结构的药物设计 2505.07286v2

Authors (8): Keyue Qiu, Yuxuan Song, Zhehuan Fan, Peidong Liu, Zhe Zhang, Mingyue Zheng, Hao Zhou, Wei-Ying Ma

Structure-Based Drug Design (SBDD) is crucial for identifying bioactive molecules. Recent deep generative models face challenges in geometric structure modeling. A major bottleneck lies in the twisted probability path of multi-modalities – continuous 3D positions and discrete 2D topologies – which jointly determine molecular geometries. By establishing that noise schedules determine the Variational Lower Bound (VLB) for the twisted probability path, we propose the VLB-Optimal Scheduling (VOS) strategy in this under-explored area, which optimizes the VLB as a path integral for SBDD. Our model effectively enhances molecular geometries and interaction modeling, achieving a state-of-the-art PoseBusters passing rate of 95.9% on CrossDock, more than 10% improvement upon strong baselines, while maintaining high affinities and robust intramolecular validity evaluated on a held-out test set. Code is available at https://github.com/AlgoMole/MolCRAFT.



Article 1962

Title@2025-06-05 (4): Efficient Time Series Processing for Transformers and State-Space Models through Token Merging

Title: Efficient Time Series Processing for Transformers and State-Space Models through Token Merging Effiziente Zeitreihenverarbeitung für Transformatoren und State-Space-Modelle durch Token Merging 通过 Token 合并对变形器和国家空间模型的有效时间序列处理 2405.17951v3

Authors (4): Leon Götz, Marcel Kollovieh, Stephan Günnemann, Leo Schwinn

Despite recent advances in subquadratic attention mechanisms or state-space models, processing long token sequences still imposes significant computational requirements. Token merging has emerged as a solution to increase computational efficiency in computer vision architectures. In this work, we perform the first investigations of token merging in time series analysis on both transformers and state-space models. We further introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, achieving two major benefits: a) Local merging can adjust its computational complexity from quadratic to linear based on the neighborhood size to effectively scale to long sequences; b) Local merging is the first causal merging scheme enabling token merging in transformer decoders. Further, we identify spectral properties of the input data that reliably predict the potential benefits of local merging without requiring evaluation on downstream tasks. Our comprehensive empirical evaluation demonstrates that local merging offers substantial efficiency gains with minimal impact on accuracy, achieving up to 5400% acceleration on the recently proposed Chronos foundation model.
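
A heavily simplified sketch of merging within the smallest possible neighbourhood is given below: each token is only compared with its immediate neighbour, so the similarity computation is linear in sequence length. The pairing scheme, cosine similarity, and averaging rule are assumptions made for illustration; the paper's local merging algorithm may differ in all of these.

```python
import numpy as np

def merge_adjacent_tokens(tokens, num_merges):
    """Average the `num_merges` most similar non-overlapping adjacent token pairs."""
    tokens = np.asarray(tokens, dtype=float)          # shape (seq_len, dim)
    n_pairs = len(tokens) // 2
    left = tokens[0:2 * n_pairs:2]                    # tokens 0, 2, 4, ...
    right = tokens[1:2 * n_pairs:2]                   # tokens 1, 3, 5, ...
    norms = np.linalg.norm(left, axis=1) * np.linalg.norm(right, axis=1)
    similarity = (left * right).sum(axis=1) / np.maximum(norms, 1e-12)
    num_merges = min(num_merges, n_pairs)
    merge_pairs = set(np.argsort(-similarity)[:num_merges].tolist())
    merged = []
    for pair in range(n_pairs):
        if pair in merge_pairs:
            merged.append((left[pair] + right[pair]) / 2.0)   # one token replaces two
        else:
            merged.extend([left[pair], right[pair]])           # keep both tokens
    if len(tokens) % 2 == 1:
        merged.append(tokens[-1])                     # unpaired trailing token
    return np.stack(merged)
```

Each merge shortens the sequence by one token while preserving temporal order, so every later layer processes fewer tokens.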



Article 1963

Title@2025-06-05 (4): Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models

Title: Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models Gemeinsame Interventionseffekte von Single-Variablen-Interventionen in Additiven-Modellen lernen 在 “ 补充模式 “ 中单一可变干预措施的 学习联合干预效应 2506.04945v1

Authors (3): Armin Kekić, Sergio Hernan Garrido Mejia, Bernhard Schölkopf

Estimating causal effects of joint interventions on multiple variables is crucial in many domains, but obtaining data from such simultaneous interventions can be challenging. Our study explores how to learn joint interventional effects using only observational data and single-variable interventions. We present an identifiability result for this problem, showing that for a class of nonlinear additive outcome mechanisms, joint effects can be inferred without access to joint interventional data. We propose a practical estimator that decomposes the causal effect into confounded and unconfounded contributions for each intervention variable. Experiments on synthetic data demonstrate that our method achieves performance comparable to models trained directly on joint interventional data, outperforming a purely observational estimator.



Article 1964

Title@2025-06-05 (4): Weak Generative Sampler to Efficiently Sample Invariant Distribution of Stochastic Differential Equation

Title: Weak Generative Sampler to Efficiently Sample Invariant Distribution of Stochastic Differential Equation Schwacher Generativer Sampler zur effizienten Probeninvariantverteilung der stochastischen Differentialgleichung 用于有效取样的弱生成采样器,以有效采样、不变化地分配蒸蒸蒸式差别等式的微变式采样器 2405.19256v2

Authors (4): Zhiqiang Cai, Yu Cao, Yuanfei Huang, Xiang Zhou

Sampling invariant distributions from an Itô diffusion process presents a significant challenge in stochastic simulation. Traditional numerical solvers for stochastic differential equations require both a fine step size and a lengthy simulation period, resulting in biased and correlated samples. Current deep learning-based methods solve the stationary Fokker–Planck equation to determine the invariant probability density function in the form of deep neural networks, but they generally do not directly address the problem of sampling from the computed density function. In this work, we introduce a framework that employs a weak generative sampler (WGS) to directly generate independent and identically distributed (iid) samples induced by a transformation map derived from the stationary Fokker–Planck equation. Our proposed loss function is based on the weak form of the Fokker–Planck equation, integrating normalizing flows to characterize the invariant distribution and facilitate sample generation from a base distribution. Our randomized test function circumvents the need for min-max optimization in the traditional weak formulation. Our method requires neither the computationally intensive calculation of the Jacobian determinant nor the invertibility of the transformation map. A crucial component of our framework is the adaptively chosen family of test functions in the form of Gaussian kernel functions with centers related to the generated data samples. Experimental results on several benchmark examples demonstrate the effectiveness and scalability of our method, which offers both low computational costs and excellent capability in exploring multiple metastable states.
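
The weak form mentioned above can be checked numerically on a toy case. For the one-dimensional process dX = -X dt + sqrt(2) dW, whose invariant law is the standard normal, the stationary Fokker–Planck equation in weak form says that the expectation of the generator applied to any smooth test function vanishes under the invariant distribution. The sketch below estimates that expectation with a Gaussian kernel test function; the choice of process, the function name `weak_residual`, and the sample sizes are assumptions for illustration, and this is only the identity behind a weak-form loss, not the WGS training procedure.

```python
import numpy as np

def weak_residual(samples, center, width):
    """Monte Carlo estimate of E[(L phi)(X)] for dX = -X dt + sqrt(2) dW.

    phi is a Gaussian kernel test function; the generator is
    (L phi)(x) = -x * phi'(x) + phi''(x).
    """
    d = samples - center
    phi = np.exp(-d**2 / (2.0 * width**2))
    dphi = -d / width**2 * phi
    d2phi = (d**2 / width**4 - 1.0 / width**2) * phi
    return float(np.mean(-samples * dphi + d2phi))

rng = np.random.default_rng(0)
invariant = rng.normal(0.0, 1.0, size=200_000)  # samples from the true invariant law
too_wide = rng.normal(0.0, 2.0, size=200_000)   # samples from a wrong candidate

print(weak_residual(invariant, center=0.0, width=1.0))  # close to zero
print(weak_residual(too_wide, center=0.0, width=1.0))   # clearly nonzero
```

A sampler could be trained by driving such residuals toward zero over many randomly placed kernels, which is the role the abstract assigns to its randomized test functions.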



Article 1965

Title@2025-06-05 (4): David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training

Title: David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training David und Goliath: Kleines Ein-Schritt-Modell schlägt große Diffusion mit Partitur nach dem Training David和Goliath:小型一步骤模式比大规模传播更胜于记分后培训 2410.20898v3

Authors (4): Weijian Luo, Colin Zhang, Debing Zhang, Zhengyang Geng

We propose Diff-Instruct* (DI*), a data-efficient post-training approach that improves the human-preference alignment of one-step text-to-image generative models without requiring image data. Our method frames alignment as online reinforcement learning from human feedback (RLHF), which optimizes the one-step model to maximize human reward functions while regularizing it to stay close to a reference diffusion process. Unlike traditional RLHF approaches, which rely on the Kullback-Leibler divergence for regularization, we introduce a novel general score-based divergence regularization that substantially improves performance as well as post-training stability. Although the general score-based RLHF objective is intractable to optimize, we derive a strictly equivalent, tractable loss function whose gradient can be computed efficiently for optimization. We introduce DI*-SDXL-1step, a 2.6B one-step text-to-image model at a resolution of $1024\times 1024$, post-trained from DMD2 w.r.t. SDXL. Our 2.6B DI*-SDXL-1step model outperforms the 50-step 12B FLUX-dev model in ImageReward, PickScore, and CLIP score on the Parti prompts benchmark while using only 1.88% of the inference time. This result shows that, with proper post-training, a small one-step model is capable of beating huge multi-step diffusion models. Our model is open-sourced at https://github.com/pkulwj1994/diff_instruct_star. We hope our findings can contribute to human-centric machine learning techniques.



Article 1966

Title@2025-06-05 (4): Can Transformers Do Enumerative Geometry?

Title: Can Transformers Do Enumerative Geometry? Können Transformer enumerative Geometrie? 变形人能做数字几何吗? 2408.14915v3

Authors (3): Baran Hashemi, Roderic G. Corominas, Alessandro Giacchetto

How can Transformers model and learn enumerative geometry? What is a robust procedure for using Transformers in abductive knowledge discovery within a mathematician-machine collaboration? In this work, we introduce a Transformer-based approach to computational enumerative geometry, specifically targeting the computation of $\psi$-class intersection numbers on the moduli space of curves. By reformulating the problem as a continuous optimization task, we compute intersection numbers across a wide value range from $10^{-45}$ to $10^{45}$. To capture the recursive nature inherent in these intersection numbers, we propose the Dynamic Range Activator (DRA), a new activation function that enhances the Transformer’s ability to model recursive patterns and handle severe heteroscedasticity. Given the precision requirements for computing the intersections, we quantify the uncertainty of the predictions using Conformal Prediction with a dynamic sliding window adaptive to the partitions of equivalent number of marked points. To the best of our knowledge, there has been no prior work on modeling recursive functions with such high variance and factorial growth. Beyond simply computing intersection numbers, we explore the enumerative “world-model” of Transformers. Our interpretability analysis reveals that the network is implicitly modeling the Virasoro constraints in a purely data-driven manner. Moreover, through abductive hypothesis testing, probing, and causal inference, we uncover evidence of an emergent internal representation of the large-genus asymptotics of $\psi$-class intersection numbers. These findings suggest that the network internalizes the parameters of the asymptotic closed form and the polynomiality phenomenon of intersection numbers in a non-linear manner. This opens up new possibilities for inferring asymptotic closed-form expressions directly from a limited amount of data.



Article 1967

Title@2025-06-05 (4): Energentic Intelligence: From Self-Sustaining Systems to Enduring Artificial Life

Title: Energentic Intelligence: From Self-Sustaining Systems to Enduring Artificial Life Energentische Intelligenz: Von selbsterhaltenden Systemen zum Ausharren des künstlichen Lebens 能源情报:从自我维持系统到人命持续 2506.04916v1

Authors (1): Atahan Karagoz

This paper introduces Energentic Intelligence, a class of autonomous systems defined not by task performance, but by their capacity to sustain themselves through internal energy regulation. Departing from conventional reward-driven paradigms, these agents treat survival (maintaining functional operation under fluctuating energetic and thermal conditions) as the central objective. We formalize this principle through an energy-based utility function and a viability-constrained survival horizon, and propose a modular architecture that integrates energy harvesting, thermal regulation, and adaptive computation into a closed-loop control system. A simulated environment demonstrates the emergence of stable, resource-aware behavior without external supervision. Together, these contributions provide a theoretical and architectural foundation for deploying autonomous agents in resource-volatile settings where persistence must be self-regulated and infrastructure cannot be assumed.



Article 1968

Title@2025-06-05 (4): MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining

Title: MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining MiMo: Entsperren des vernünftigen Potenzials des Sprachmodells – Von der Vorschulung zur Nachschulung 米莫:释放语文模式的理性潜力 – – 从预培训到员额培训 2505.07608v2

Authors (65): LLM-Core Xiaomi, :, Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, Chenhong He, Dong Zhang, Duo Zhang, Guoan Wang, Hao Tian, Haochen Zhao, Heng Qu, Hongshen Xu, Jun Shi, Kainan Bao, Kai Fang, Kang Zhou, Kangyang Zhou, Lei Li, Menghang Zhu, Nuo Chen, Qiantong Wang, Shaohui Liu, Shicheng Li, Shuhao Gu, Shuhuai Ren, Shuo Liu, Sirui Deng, Weiji Zhuang, Weiwei Lv, Wenyu Yang, Xin Zhang, Xing Yong, Xing Zhang, Xingchen Song, Xinzhe Xu, Xu Wang, Yihan Yan, Yu Tu, Yuanyuan Tian, Yudong Wang, Yue Yu, Zhenru Lin, Zhichao Song, Zihao Yue

We present MiMo-7B, a large language model built for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model’s reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with an additional Multi-Token Prediction objective for enhanced performance and accelerated inference speed. During post-training, we curate a dataset of 130K verifiable mathematics and programming problems for reinforcement learning, integrating a test-difficulty-driven code-reward scheme to alleviate sparse-reward issues and employing strategic data resampling to stabilize training. Extensive evaluations show that MiMo-7B-Base possesses exceptional reasoning potential, outperforming even much larger 32B models. The final RL-tuned model, MiMo-7B-RL, achieves superior performance on mathematics, code, and general reasoning tasks, surpassing the performance of OpenAI o1-mini. The model checkpoints are available at https://github.com/xiaomimimo/MiMo.



Article 1969

Title@2025-06-05 (4): Dissecting Long Reasoning Models: An Empirical Study

Title: Dissecting Long Reasoning Models: An Empirical Study Modelle der langen Vernunft zu entschlüsseln: Eine empirische Studie 解析长期理由模型:经验研究 2506.04913v1

Authors (8): Yongyu Mu, Jiali Zeng, Bei Li, Xinyan Guan, Fandong Meng, Jie Zhou, Tong Xiao, Jingbo Zhu

Despite recent progress in training long-context reasoning models via reinforcement learning (RL), several open questions and counterintuitive behaviors remain. This work focuses on three key aspects: (1) We systematically analyze the roles of positive and negative samples in RL, revealing that positive samples mainly facilitate data fitting, whereas negative samples significantly enhance generalization and robustness. Interestingly, training solely on negative samples can rival standard RL training performance. (2) We identify substantial data inefficiency in group relative policy optimization, where over half of the samples yield zero advantage. To address this, we explore two straightforward strategies, including relative length rewards and offline sample injection, to better leverage these data and enhance reasoning efficiency and capability. (3) We investigate unstable performance across various reasoning models and benchmarks, attributing instability to uncertain problems with ambiguous outcomes, and demonstrate that multiple evaluation runs mitigate this issue.
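
The zero-advantage issue in point (2) follows directly from how group-relative advantages are normalized: when every sampled response to a prompt receives the same reward, subtracting the group mean leaves nothing to learn from. The sketch below uses the commonly cited mean and standard-deviation normalization with binary rewards; the function name `group_relative_advantages`, the `eps` stabilizer, and the toy reward matrix are assumptions for illustration.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within each group (row): (r - mean) / (std + eps)."""
    r = np.asarray(rewards, dtype=float)
    mean = r.mean(axis=1, keepdims=True)
    std = r.std(axis=1, keepdims=True)
    return (r - mean) / (std + eps)

# Each row holds the rewards of four sampled responses to one prompt.
rewards = np.array([
    [1, 1, 1, 1],  # all correct: every advantage is zero
    [0, 0, 0, 0],  # all wrong: every advantage is zero
    [1, 0, 0, 1],  # mixed: the only group that carries a learning signal
])
adv = group_relative_advantages(rewards)
wasted = np.mean(np.all(adv == 0.0, axis=1))
print(adv)
print(f"fraction of groups with zero advantage: {wasted:.2f}")
```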



Article 1970

Title@2025-06-05 (4): When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models

Title: When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models Wenn LLMs denken lügen: Enthüllen der strategischen Täuschung in Repräsentationen von Vernunftmodellen 当思考LLLM Lie:在解释理由模型代表中消除战略欺骗时 2506.04909v1

Authors (3): Kai Wang, Yihao Zhang, Meng Sun

The honesty of large language models (LLMs) is a critical alignment challenge, especially as advanced systems with chain-of-thought (CoT) reasoning may strategically deceive humans. Unlike traditional honesty issues in LLMs, which could possibly be explained as some kind of hallucination, these models’ explicit thought paths enable us to study strategic deception: goal-driven, intentional misinformation in which the reasoning contradicts the outputs. Using representation engineering, we systematically induce, detect, and control such deception in CoT-enabled LLMs, extracting “deception vectors” via Linear Artificial Tomography (LAT) and reaching 89% detection accuracy. Through activation steering, we achieve a 40% success rate in eliciting context-appropriate deception without explicit prompts, unveiling the specific honesty-related issue of reasoning models and providing tools for trustworthy AI alignment.



Article 1971

Title@2025-06-05 (4): Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy

Title: Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy Trainieren Sie einen Sparse Autoencoder über mehrere Sparsity-Haushalte, um Dolmetschbarkeit und Genauigkeit zu erhalten 跨多级预算培训一个简单自动编码员,以维护可解释性和准确性 2505.24473v2

Authors (7): Nikita Balagansky, Yaroslav Aksenov, Daniil Laptev, Vadim Kurochkin, Gleb Gerasimov, Nikita Koryagin, Daniil Gavrilov

Sparse Autoencoders (SAEs) have proven to be powerful tools for interpreting neural networks by decomposing hidden representations into disentangled, interpretable features via sparsity constraints. However, conventional SAEs are constrained by the fixed sparsity level chosen during training; meeting different sparsity requirements therefore demands separate models and increases the computational footprint during both training and evaluation. We introduce a novel training objective, HierarchicalTopK, which trains a single SAE to optimise reconstructions across multiple sparsity levels simultaneously. Experiments with Gemma-2 2B demonstrate that our approach achieves Pareto-optimal trade-offs between sparsity and explained variance, outperforming traditional SAEs trained at individual sparsity levels. Further analysis shows that HierarchicalTopK preserves high interpretability scores even at higher sparsity. The proposed objective thus closes an important gap between flexibility and interpretability in SAE design.
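
One way to read "optimise reconstructions across multiple sparsity levels simultaneously" is to evaluate the same encoder and decoder under several top-k budgets and average the reconstruction errors. The sketch below does exactly that with untrained NumPy weights; the function names `top_k` and `multi_budget_loss`, the ReLU encoder, the bias-free linear layers, and the plain averaging over budgets are all assumptions, and the paper's actual HierarchicalTopK objective may be defined differently.

```python
import numpy as np

def top_k(z, k):
    """Keep the k largest entries in each row of z and zero out the rest."""
    idx = np.argpartition(z, -k, axis=1)[:, -k:]
    out = np.zeros_like(z)
    np.put_along_axis(out, idx, np.take_along_axis(z, idx, axis=1), axis=1)
    return out

def multi_budget_loss(x, w_enc, w_dec, budgets):
    """Mean squared reconstruction error averaged over several sparsity budgets."""
    z = np.maximum(x @ w_enc, 0.0)  # shared encoder activations
    per_budget = [float(np.mean((top_k(z, k) @ w_dec - x) ** 2)) for k in budgets]
    return float(np.mean(per_budget)), per_budget

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))             # batch of hidden representations
w_enc = rng.normal(size=(16, 64)) * 0.1  # hypothetical encoder weights
w_dec = rng.normal(size=(64, 16)) * 0.1  # hypothetical decoder weights
total, per_budget = multi_budget_loss(x, w_enc, w_dec, budgets=(4, 16, 64))
print(total, per_budget)
```

Training against `total` rather than a single budget's error is what would let one set of weights serve every budget in the list.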



Article 1972

Title@2025-06-05 (4): Are all models wrong? Fundamental limits in distribution-free empirical model falsification

Title: Are all models wrong? Fundamental limits in distribution-free empirical model falsification Sind alle Modelle falsch? Grundlegende Grenzen in der verteilungsfreien empirischen Modellfälschung 所有模型是否都错了?无分销经验模型伪造的基本限制 2502.06765v2

Authors (3): Manuel M. Müller, Yuetian Luo, Rina Foygel Barber

In statistics and machine learning, when we train a fitted model on available data, we typically want to ensure that we are searching within a model class that contains at least one accurate model – that is, we would like to ensure an upper bound on the model class risk (the lowest possible risk that can be attained by any model in the class). However, it is also of interest to establish lower bounds on the model class risk, for instance so that we can determine whether our fitted model is at least approximately optimal within the class, or, so that we can decide whether the model class is unsuitable for the particular task at hand. Particularly in the setting of interpolation learning where machine learning models are trained to reach zero error on the training data, we might ask if, at the very least, a positive lower bound on the model class risk is possible – or are we unable to detect that “all models are wrong”? In this work, we answer these questions in a distribution-free setting by establishing a model-agnostic, fundamental hardness result for the problem of constructing a lower bound on the best test error achievable over a model class, and examine its implications on specific model classes such as tree-based methods and linear regression.



Article 1973

Title@2025-06-05 (4): Leveraging Diffusion Models for Parameterized Quantum Circuit Generation

Title: Leveraging Diffusion Models for Parameterized Quantum Circuit Generation Nutzung von Diffusionsmodellen für die parameterisierte Quantum Circuit Generation 利用可计量量子电路生成的传播模型 2505.20863v2

Authors (4): Daniel Barta, Darya Martyniuk, Johannes Jung, Adrian Paschke

Quantum computing holds immense potential, yet its practical success depends on multiple factors, including advances in quantum circuit design. In this paper, we introduce a generative approach based on denoising diffusion models (DMs) to synthesize parameterized quantum circuits (PQCs). Extending the recent diffusion model pipeline of Fürrutter et al. [1], our model effectively conditions the synthesis process, enabling the simultaneous generation of circuit architectures and their continuous gate parameters. We demonstrate our approach in synthesizing PQCs optimized for generating high-fidelity Greenberger-Horne-Zeilinger (GHZ) states and achieving high accuracy in quantum machine learning (QML) classification tasks. Our results indicate a strong generalization across varying gate sets and scaling qubit counts, highlighting the versatility and computational efficiency of diffusion-based methods. This work illustrates the potential of generative models as a powerful tool for accelerating and optimizing the design of PQCs, supporting the development of more practical and scalable quantum applications.



Article 1974

Title@2025-06-05 (4): Scalable Multi-Output Gaussian Processes with Stochastic Variational Inference

Title: Scalable Multi-Output Gaussian Processes with Stochastic Variational Inference Skalierbare Multi-Output Gaußsche Prozesse mit stochastischer Variationsableitung 可缩放多输出高斯进程,带有斯托卡变动推断 2407.02476v2

Authors (4): Xiaoyu Jiang, Sokratia Georgaka, Magnus Rattray, Mauricio A. Álvarez

The Multi-Output Gaussian Process (MOGP) is a popular tool for modelling data from multiple sources. A typical choice for building a covariance function for a MOGP is the Linear Model of Coregionalization (LMC), which parametrically models the covariance between outputs. The Latent Variable MOGP (LV-MOGP) generalises this idea by modelling the covariance between outputs using a kernel applied to latent variables, one per output, leading to a flexible MOGP model that allows efficient generalization to new outputs with few data points. Computational complexity in LV-MOGP grows linearly with the number of outputs, which makes it unsuitable for problems with a large number of outputs. In this paper, we propose a stochastic variational inference approach for the LV-MOGP that allows mini-batches for both inputs and outputs, making computational complexity per training iteration independent of the number of outputs.
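
To make the covariance construction concrete, the sketch below builds a latent-variable multi-output covariance as the Kronecker product of a kernel over per-output latent variables and a kernel over inputs. The RBF kernel, the two-dimensional latent variables, and the assumption that every output is observed at the same inputs are illustrative choices, not details taken from the paper.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of a and the rows of b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def lv_mogp_covariance(x, h):
    """Covariance over all (output, input) pairs: K_H(h, h) kron K_X(x, x)."""
    return np.kron(rbf(h, h), rbf(x, x))

rng = np.random.default_rng(0)
x = rng.uniform(size=(5, 1))  # 5 shared input locations
h = rng.normal(size=(3, 2))   # one latent vector per output, 3 outputs
k_full = lv_mogp_covariance(x, h)
print(k_full.shape)  # (15, 15): grows with the number of outputs

# A mini-batch over outputs only touches the latent vectors of the sampled outputs.
batch = np.array([0, 2])
print(lv_mogp_covariance(x, h[batch]).shape)  # (10, 10)
```

The second call hints at why sub-sampling outputs keeps the per-iteration cost tied to the batch size rather than to the total number of outputs.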



Article 1975

Title@2025-06-05 (4): Full-Parameter Continual Pretraining of Gemma2: Insights into Fluency and Domain Knowledge

Title: Full-Parameter Continual Pretraining of Gemma2: Insights into Fluency and Domain Knowledge Full-Parameter Continual Pretraining von Gemma2: Einblicke in Fluency und Domain Knowledge Gemma2:深入了解流能和广域知识 2505.05946v2

Authors (3): Vytenis Šliogeris, Povilas Daniušis, Artūras Nakvosas

In this technical report, we empirically investigate the relationship between linguistic fluency and domain knowledge in the context of continual learning with large language models (LLMs). Specifically, we enhance the linguistic fluency of the Gemma2 LLM for the Lithuanian language by autoregressively pretraining its full parameter set on the first 10% of the Lithuanian language component of the CulturaX dataset. To prevent catastrophic forgetting of the model’s existing domain knowledge, we apply Elastic Weight Consolidation (EWC), leveraging Fisher information estimated using data from the Massive Multitask Language Understanding (MMLU) benchmark. In the post-training evaluations, we assess linguistic fluency through perplexity and evaluate domain knowledge using accuracy on a suite of language understanding benchmarks, including ARC-Easy, Belebele, GSM8K, HellaSwag, MMLU, TruthfulQA, and Winogrande, in both English and Lithuanian. The empirical results demonstrate that EWC not only mitigates catastrophic forgetting by preserving the model’s performance in terms of both linguistic fluency and domain knowledge but also improves or maintains these capabilities for the newly added Lithuanian language. These findings highlight the potential for more efficient adaptation of general-purpose LLMs to under-represented languages without requiring access to the original training data. The accompanying codebase is openly accessible at https://github.com/Neurotechnology/LLM_EWC.
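
For readers unfamiliar with EWC, the regularizer adds a quadratic penalty that anchors each parameter to its pre-adaptation value, weighted by a diagonal Fisher information estimate. The sketch below shows the standard form of that penalty on toy vectors; the function names, the flat parameter vectors, and the numbers are illustrative assumptions and are not taken from the linked codebase.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher_diag, lam):
    """Standard EWC term: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * float(np.sum(fisher_diag * (theta - theta_star) ** 2))

def ewc_loss(task_loss, theta, theta_star, fisher_diag, lam):
    """New-task loss plus the penalty that protects previously learned knowledge."""
    return task_loss + ewc_penalty(theta, theta_star, fisher_diag, lam)

theta_star = np.array([0.0, 2.0])   # parameters before continual pretraining
theta = np.array([1.0, 2.0])        # parameters after some updates
fisher_diag = np.array([4.0, 10.0]) # importance of each parameter for the old task

# Only the first parameter moved, so only its Fisher weight contributes.
print(ewc_loss(task_loss=2.0, theta=theta, theta_star=theta_star,
               fisher_diag=fisher_diag, lam=0.5))
```

Parameters with large Fisher values are expensive to move, so updates are pushed toward directions the old task cares less about.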



Article 1976

Title@2025-06-05 (4): Gaussian Process Diffeomorphic Statistical Shape Modelling Outperforms Angle-Based Methods for Assessment of Hip Dysplasia

Title: Gaussian Process Diffeomorphic Statistical Shape Modelling Outperforms Angle-Based Methods for Assessment of Hip Dysplasia Gaussian Prozess Diffeomorphe statistische Form Modellierung übertrifft Winkel-basierte Methoden zur Beurteilung von Hüftdysplasie Gausian 进程 外形外形光效光效模型统计形态底部形态评估方法 2506.04886v1

Authors (6): Allen Paul, George Grammatopoulos, Adwaye Rambojun, Neill D. F. Campbell, Harinderjit S. Gill, Tony Shardlow

Dysplasia is a recognised risk factor for osteoarthritis (OA) of the hip, and early diagnosis of dysplasia is important to provide opportunities for surgical interventions aimed at reducing the risk of hip OA. We have developed a pipeline for semi-automated classification of dysplasia using volumetric CT scans of patients’ hips and a minimal set of clinically annotated landmarks, combining the framework of the Gaussian Process Latent Variable Model with diffeomorphism to create a statistical shape model, which we term the Gaussian Process Diffeomorphic Statistical Shape Model (GPDSSM). We used 192 CT scans, 100 for model training and 92 for testing. The GPDSSM effectively distinguishes dysplastic samples from controls while also highlighting regions of the underlying surface that show dysplastic variations. As well as improving classification accuracy compared to angle-based methods (AUC 96.2% vs 91.2%), the GPDSSM can save time for clinicians by removing the need to manually measure angles and interpret 2D scans for possible markers of dysplasia.



Article 1977

Title@2025-06-05 (4): Self-Tuning Spectral Clustering for Speaker Diarization

Title: Self-Tuning Spectral Clustering for Speaker Diarization Selbst-Tuning Spectral Clustering für Sprecher-Diarisierung 为议长进行自我引导光谱群集 2410.00023v2

Authors (4): Nikhil Raghav, Avisek Gupta, Md Sahidullah, Swagatam Das

Spectral clustering has proven effective in grouping speech representations for speaker diarization tasks, although post-processing the affinity matrix remains difficult due to the need for careful tuning before constructing the Laplacian. In this study, we present a novel pruning algorithm to create a sparse affinity matrix, called spectral clustering on p-neighborhood retained affinity matrix (SC-pNA). Our method improves on node-specific fixed neighbor selection by allowing a variable number of neighbors, eliminating the need for external tuning data as the pruning parameters are derived directly from the affinity matrix. SC-pNA does so by identifying two clusters in every row of the initial affinity matrix and retaining only the top p% of similarity scores from the cluster containing the larger similarities. Spectral clustering is performed subsequently, with the number of clusters determined by the maximum eigengap. Experimental results on the challenging DIHARD-III dataset highlight the superiority of SC-pNA, which is also computationally more efficient than existing auto-tuning approaches. Our implementations are available at https://github.com/nikhilraghav29/SC-pNA.
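
The row-wise pruning step described above can be sketched as follows: split each row's similarities into a low and a high group, then keep only the top p% of the high group. The tiny one-dimensional 2-means loop, the function names, the rounding rule (`ceil`, with at least one entry kept), and the toy matrix are all assumptions for illustration; the linked repository is the reference for the actual SC-pNA procedure.

```python
import math
import numpy as np

def two_means_1d(v, iters=20):
    """Split 1-D values into two groups with a small 2-means loop; True marks the high group."""
    lo, hi = v.min(), v.max()
    high = np.abs(v - hi) <= np.abs(v - lo)
    for _ in range(iters):
        if high.all() or not high.any():
            break  # degenerate split, e.g. all values equal
        lo, hi = v[~high].mean(), v[high].mean()
        high = np.abs(v - hi) <= np.abs(v - lo)
    return high

def prune_row(row, p):
    """Keep the top p% of scores within the high-similarity group and zero everything else."""
    high_idx = np.flatnonzero(two_means_1d(row))
    keep = max(1, math.ceil(p / 100.0 * high_idx.size))
    top = high_idx[np.argsort(row[high_idx])[::-1][:keep]]
    out = np.zeros_like(row)
    out[top] = row[top]
    return out

def prune_affinity(a, p):
    """Apply the row-wise pruning to every row of an affinity matrix."""
    return np.vstack([prune_row(r, p) for r in np.asarray(a, dtype=float)])

row = np.array([1.0, 0.9, 0.8, 0.1, 0.05, 0.0])
print(prune_affinity(np.array([row, row[::-1]]), p=50))
```

Because the high group has a different size in each row, the number of retained neighbors varies per node, which is the flexibility the abstract contrasts with fixed neighbor selection.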



Article 1978

Title@2025-06-05 (4): Multiplicative Dynamic Mode Decomposition

Title: Multiplicative Dynamic Mode Decomposition Multiplikative Dynamische Moduszersetzung 多倍复制动态模式分解 2405.05334v2

Authors (2): Nicolas Boullé, Matthew J. Colbrook

Koopman operators are infinite-dimensional operators that linearize nonlinear dynamical systems, facilitating the study of their spectral properties and enabling the prediction of the time evolution of observable quantities. Recent methods have aimed to approximate Koopman operators while preserving key structures. However, approximating Koopman operators typically requires a dictionary of observables to capture the system’s behavior in a finite-dimensional subspace. The selection of these functions is often heuristic, may result in the loss of spectral information, and can severely complicate structure preservation. This paper introduces Multiplicative Dynamic Mode Decomposition (MultDMD), which enforces the multiplicative structure inherent in the Koopman operator within its finite-dimensional approximation. Leveraging this multiplicative property, we guide the selection of observables and define a constrained optimization problem for the matrix approximation, which can be efficiently solved. MultDMD presents a structured approach to finite-dimensional approximations and can more accurately reflect the spectral properties of the Koopman operator. We elaborate on the theoretical framework of MultDMD, detailing its formulation, optimization strategy, and convergence properties. The efficacy of MultDMD is demonstrated through several examples, including the nonlinear pendulum, the Lorenz system, and fluid dynamics data, where we demonstrate its remarkable robustness to noise.
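
The multiplicative structure referred to above is the identity K[f·g] = K[f]·K[g], which holds because the Koopman operator acts by composition with the dynamics: (K f)(x) = f(F(x)). The short check below verifies this on a grid; the logistic map and the two observables are arbitrary choices made for illustration and do not come from the paper.

```python
import numpy as np

def koopman(observable, flow_map):
    """Return the observable composed with one step of the dynamics: x -> observable(F(x))."""
    return lambda x: observable(flow_map(x))

flow_map = lambda x: 3.7 * x * (1.0 - x)  # one step of a logistic map (arbitrary example)
f = np.sin
g = lambda x: x**2 + 1.0

x = np.linspace(0.0, 1.0, 11)
lhs = koopman(lambda s: f(s) * g(s), flow_map)(x)        # K applied to the product
rhs = koopman(f, flow_map)(x) * koopman(g, flow_map)(x)  # product of K f and K g
print(np.allclose(lhs, rhs))
```

A generic finite matrix approximation of K need not satisfy this identity on its dictionary, which is the gap a structure-preserving method such as MultDMD is designed to close.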



Article 1979

Title@2025-06-05 (4): Multi-granularity Knowledge Transfer for Continual Reinforcement Learning

Title: Multi-granularity Knowledge Transfer for Continual Reinforcement Learning Multi-Granularitäts-Wissenstransfer für kontinuierliches Stärkungslernen 用于持续加强学习的多族知识转让 2401.15098v3

Authors (7): Chaofan Pan, Lingfei Ren, Yihui Feng, Linbo Xiong, Wei Wei, Yonghao Li, Xin Yang

Continual reinforcement learning (CRL) empowers RL agents with the ability to learn a sequence of tasks, accumulating knowledge learned in the past and using that knowledge for problem solving or future task learning. However, existing methods often focus on transferring fine-grained knowledge across similar tasks, which neglects the multi-granularity structure of human cognitive control, resulting in insufficient knowledge transfer across diverse tasks. To enhance coarse-grained knowledge transfer, we propose a novel framework called MT-Core (shorthand for Multi-granularity knowledge Transfer for Continual reinforcement learning). The key characteristic of MT-Core is multi-granularity policy learning: 1) a coarse-grained policy formulation that uses the powerful reasoning ability of a large language model (LLM) to set goals, and 2) fine-grained policy learning through RL that is guided by those goals. We also construct a new policy library (knowledge base) to store policies that can be retrieved for multi-granularity knowledge transfer. Experimental results demonstrate the superiority of the proposed MT-Core in handling diverse CRL tasks versus popular baselines.



Article 1980

Title@2025-06-05 (4): kTULA: A Langevin sampling algorithm with improved KL bounds under super-linear log-gradients

Title: kTULA: A Langevin sampling algorithm with improved KL bounds under super-linear log-gradients kTULA: Ein Langevin-Sampling-Algorithmus mit verbesserten KL-Grenzen unter superlinearen Log-Gradienten kTULA: Langevin 取样算法,在超线性日志梯度下改进 KL 界限 2506.04878v1

Authors (3): Iosif Lytras, Sotirios Sabanis, Ying Zhang

Motivated by applications in deep learning, where the global Lipschitz continuity condition is often not satisfied, we examine the problem of sampling from distributions with super-linearly growing log-gradients. We propose a novel tamed Langevin dynamics-based algorithm, called kTULA, to solve the aforementioned sampling problem, and provide a theoretical guarantee for its performance. More precisely, we establish a non-asymptotic convergence bound in Kullback-Leibler (KL) divergence with the best-known rate of convergence equal to $2-\overline{\epsilon}$, $\overline{\epsilon}>0$, which significantly improves relevant results in existing literature. This enables us to obtain an improved non-asymptotic error bound in Wasserstein-2 distance, which can be used to further derive a non-asymptotic guarantee for kTULA to solve the associated optimization problems. To illustrate the applicability of kTULA, we apply the proposed algorithm to the problem of sampling from a high-dimensional double-well potential distribution and to an optimization problem involving a neural network. We show that our main results can be used to provide theoretical guarantees for the performance of kTULA.
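
To show what "taming" means in practice, the sketch below runs a Langevin sampler on a one-dimensional double-well potential U(x) = x^4/4 - x^2/2, whose gradient x^3 - x grows super-linearly. It uses a standard taming in the style of the tamed unadjusted Langevin algorithm (TULA), dividing the gradient by 1 + step * ||gradient||, and not the specific taming that defines kTULA, which the abstract does not state. The function names, step size, and run length are illustrative assumptions.

```python
import numpy as np

def grad_double_well(x):
    """Gradient of U(x) = x^4 / 4 - x^2 / 2; it grows cubically in x."""
    return x**3 - x

def tamed_langevin(grad, x0, step, n_steps, rng):
    """Langevin iteration with a tamed drift so that large gradients cannot blow up a step."""
    x = np.asarray(x0, dtype=float).copy()
    path = np.empty((n_steps,) + x.shape)
    for n in range(n_steps):
        g = grad(x)
        tamed = g / (1.0 + step * np.linalg.norm(g))  # drift magnitude is bounded by 1 / step
        x = x - step * tamed + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        path[n] = x
    return path

rng = np.random.default_rng(0)
path = tamed_langevin(grad_double_well, np.array([3.0]), step=0.01, n_steps=20_000, rng=rng)
samples = path[2_000:]  # discard a burn-in segment
print(float(np.mean(samples**2)))  # second moment of the sampled distribution
```

The taming keeps each drift increment bounded regardless of how large the raw gradient is, which is what makes such schemes usable without a global Lipschitz condition.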



Article 1981

Title@2025-06-05 (4): There Was Never a Bottleneck in Concept Bottleneck Models

Title: There Was Never a Bottleneck in Concept Bottleneck Models Es gab nie einen Engpass in Konzept Engpässe Modelle ” 概念瓶颈模型 “ 从未出现过瓶颈问题 2506.04877v1

Authors (3): Antonio Almudévar, José Miguel Hernández-Lobato, Alfonso Ortega

Deep learning representations are often difficult to interpret, which can hinder their deployment in sensitive applications. Concept Bottleneck Models (CBMs) have emerged as a promising approach to mitigate this issue by learning representations that support target task performance while ensuring that each component predicts a concrete concept from a predefined set. In this work, we argue that CBMs do not impose a true bottleneck: the fact that a component can predict a concept does not guarantee that it encodes only information about that concept. This shortcoming raises concerns regarding interpretability and the validity of intervention procedures. To overcome this limitation, we propose Minimal Concept Bottleneck Models (MCBMs), which incorporate an Information Bottleneck (IB) objective to constrain each representation component to retain only the information relevant to its corresponding concept. This IB is implemented via a variational regularization term added to the training loss. As a result, MCBMs support concept-level interventions with theoretical guarantees, remain consistent with Bayesian principles, and offer greater flexibility in key design choices.



Article 1982

Title@2025-06-05 (4): Aligning Multimodal Representations through an Information Bottleneck

Title: Aligning Multimodal Representations through an Information Bottleneck Multimodale Repräsentationen durch einen Informationsengpass ausrichten 通过信息瓶颈协调多式联运代表机构 2506.04870v1

Authors (5): Antonio Almudévar, José Miguel Hernández-Lobato, Sameer Khurana, Ricard Marxer, Alfonso Ortega

Contrastive losses have been extensively used as a tool for multimodal representation learning. However, it has been empirically observed that their use is not effective to learn an aligned representation space. In this paper, we argue that this phenomenon is caused by the presence of modality-specific information in the representation space. Although some of the most widely used contrastive losses maximize the mutual information between representations of both modalities, they are not designed to remove the modality-specific information. We give a theoretical description of this problem through the lens of the Information Bottleneck Principle. We also empirically analyze how different hyperparameters affect the emergence of this phenomenon in a controlled experimental setup. Finally, we propose a regularization term in the loss function that is derived by means of a variational approximation and aims to increase the representational alignment. We analyze in a set of controlled experiments and real-world applications the advantages of including this regularization term.


Article 1983

Title@2025-06-05 (4): LLMs for sensory-motor control: Combining in-context and iterative learning

Title: LLMs for sensory-motor control: Combining in-context and iterative learning LLMs für die sensorisch-motorische Steuerung: Kombination von Kontext und iterativem Lernen 感官运动控制LLMs:将内文与迭接学习相结合 2506.04867v1

Authors (2): Jônata Tyska Carvalho, Stefano Nolfi

We propose a method that enables large language models (LLMs) to control embodied agents by directly mapping continuous observation vectors to continuous action vectors. Initially, the LLMs generate a control strategy based on a textual description of the agent, its environment, and the intended goal. This strategy is then iteratively refined through a learning process in which the LLMs are repeatedly prompted to improve the current strategy, using performance feedback and sensory-motor data collected during its evaluation. The method is validated on classic control tasks from the Gymnasium library and the inverted pendulum task from the MuJoCo library. In most cases, it successfully identifies optimal or high-performing solutions by integrating symbolic knowledge derived through reasoning with sub-symbolic sensory-motor data gathered as the agent interacts with its environment.


Article 1984

Title@2025-06-05 (4): Fokker-Planck to Callan-Symanzik: evolution of weight matrices under training

Title: Fokker-Planck to Callan-Symanzik: evolution of weight matrices under training Fokker-Planck zu Callan-Symanzik: Entwicklung von Gewichtsmatrizen im Training Fokker-Planck to Callan-Symanzik:培训中重量矩阵的演变 2501.09659v2

Authors (3): Wei Bu, Uri Kol, Ziming Liu

The dynamical evolution of a neural network during training has been an incredibly fascinating subject of study. First-principles derivations of the generic evolution of variables in statistical physics systems have proved useful for describing training dynamics conceptually, which in practice means numerically solving equations such as the Fokker-Planck equation. Simulating entire networks inevitably runs into the curse of dimensionality. In this paper, we utilize Fokker-Planck to simulate the probability density evolution of individual weight matrices in the bottleneck layers of a simple 2-bottleneck-layered auto-encoder and compare the theoretical evolutions against the empirical ones by examining the output data distributions. We also derive physically relevant partial differential equations such as the Callan-Symanzik and Kardar-Parisi-Zhang equations from the dynamical equation we have.


Article 1985

Title@2025-06-05 (4): Improving AI-generated music with user-guided training

Title: Improving AI-generated music with user-guided training Verbesserung der KI-generierten Musik durch benutzergeführtes Training 通过用户指导培训,改进AI制制的音乐 2506.04852v1

Authors (5): Vishwa Mohan Singh, Sai Anirudh Aryasomayajula, Ahan Chatterjee, Beste Aydemir, Rifat Mehreen Amin

AI music generation has advanced rapidly, with models like diffusion and autoregressive algorithms enabling high-fidelity outputs. These tools can alter styles, mix instruments, or isolate them. Since sound can be visualized as spectrograms, image-generation algorithms can be applied to generate novel music. However, these algorithms are typically trained on fixed datasets, which makes it challenging for them to interpret and respond to user input accurately. This is especially problematic because music is highly subjective and requires a level of personalization that image generation does not provide. In this work, we propose a human-computation approach to gradually improve the performance of these algorithms based on user interactions. The human-computation element involves aggregating and selecting user ratings to use as the loss function for fine-tuning the model. We employ a genetic algorithm that incorporates user feedback to enhance the baseline performance of a model initially trained on a fixed dataset. The effectiveness of this approach is measured by the average increase in user ratings with each iteration. In the pilot test, the first iteration showed an average rating increase of 0.2 compared to the baseline. The second iteration further improved upon this, achieving an additional increase of 0.39 over the first iteration.


Article 1986

Title@2025-06-05 (4): An Unsupervised Framework for Dynamic Health Indicator Construction and Its Application in Rolling Bearing Prognostics

Title: An Unsupervised Framework for Dynamic Health Indicator Construction and Its Application in Rolling Bearing Prognostics Ein unbeaufsichtigtes Framework für die Konstruktion dynamischer Gesundheitsindikatoren und deren Anwendung in der Wälzlagerprognostik 一个无人监督的动态健康指标建设框架及其在滚轴预测器中的应用 2506.05438v1

Authors (4): Tongda Sun, Chen Yin, Huailiang Zheng, Yining Dong

Health indicator (HI) plays a key role in degradation assessment and prognostics of rolling bearings. Although various HI construction methods have been investigated, most of them rely on expert knowledge for feature extraction and overlook capturing dynamic information hidden in sequential degradation processes, which limits the ability of the constructed HI for degradation trend representation and prognostics. To address these concerns, a novel dynamic HI that considers HI-level temporal dependence is constructed through an unsupervised framework. Specifically, a degradation feature learning module composed of a skip-connection-based autoencoder first maps raw signals to a representative degradation feature space (DFS) to automatically extract essential degradation features without the need for expert knowledge. Subsequently, in this DFS, a new HI-generating module embedded with an inner HI-prediction block is proposed for dynamic HI construction, where the temporal dependence between past and current HI states is guaranteed and modeled explicitly. On this basis, the dynamic HI captures the inherent dynamic contents of the degradation process, ensuring its effectiveness for degradation tendency modeling and future degradation prognostics. The experiment results on two bearing lifecycle datasets demonstrate that the proposed HI construction method outperforms comparison methods, and the constructed dynamic HI is superior for prognostic tasks.


Article 1987

Title@2025-06-05 (4): Learning pure quantum states (almost) without regret

Title: Learning pure quantum states (almost) without regret Lernen von reinen Quantenzuständen (fast) ohne Bedauern 学习纯量度状态(几乎) 2406.18370v2

Authors (3): Josep Lumbreras, Mikhail Terekhov, Marco Tomamichel

We initiate the study of sample-optimal quantum state tomography with minimal disturbance to the samples. Can we efficiently learn a precise description of a quantum state through sequential measurements of samples while at the same time making sure that the post-measurement state of the samples is only minimally perturbed? Defining regret as the cumulative disturbance of all samples, the challenge is to find a balance between the most informative sequence of measurements on the one hand and measurements incurring minimal regret on the other. Here we answer this question for qubit states by exhibiting a protocol that for pure states achieves maximal precision while incurring a regret that grows only polylogarithmically with the number of samples, a scaling that we show to be optimal.


Article 1988

Title@2025-06-05 (4): Hybrid deep convolution model for lung cancer detection with transfer learning

Title: Hybrid deep convolution model for lung cancer detection with transfer learning Hybrides Deep-Convolution-Modell für Lungenkrebserkennung mit Transfer-Lernen 肺癌检测与转移学习的混合深变深混合模型 2501.02785v2

Authors (4): Sugandha Saxena, S. N. Prasad, Ashwin M Polnaya, Shweta Agarwala

Advances in healthcare research have significantly enhanced our understanding of disease mechanisms, diagnostic precision, and therapeutic options. Yet, lung cancer remains one of the leading causes of cancer-related mortality worldwide due to challenges in early and accurate diagnosis. While current lung cancer detection models show promise, there is considerable potential for further improving the accuracy for timely intervention. To address this challenge, we introduce a hybrid deep convolution model leveraging transfer learning, named the Maximum Sensitivity Neural Network (MSNN). MSNN is designed to improve the precision of lung cancer detection by refining sensitivity and specificity. This model has surpassed existing deep learning approaches through experimental validation, achieving an accuracy of 98% and a sensitivity of 97%. By overlaying sensitivity maps onto lung Computed Tomography (CT) scans, it enables the visualization of regions most indicative of malignant or benign classifications. This innovative method demonstrates exceptional performance in distinguishing lung cancer with minimal false positives, thereby enhancing the accuracy of medical diagnoses.


Article 1989

Title@2025-06-05 (4): Event Classification of Accelerometer Data for Industrial Package Monitoring with Embedded Deep Learning

Title: Event Classification of Accelerometer Data for Industrial Package Monitoring with Embedded Deep Learning Event-Klassifikation von Accelerometer-Daten für die industrielle Paketüberwachung mit eingebettetem Deep Learning 用于以嵌入深层学习方式监测工业一揽子措施的加速计数据的事件分类 2506.05435v1

Authors (6): Manon Renault, Hamoud Younes, Hugo Tessier, Ronan Le Roy, Bastien Pasdeloup, Mathieu Léonardon

Package monitoring is an important topic in industrial applications, with significant implications for operational efficiency and ecological sustainability. In this study, we propose an approach that employs an embedded system, placed on reusable packages, to detect their state (on a Forklift, in a Truck, or in an undetermined location). We aim to design a system with a lifespan of several years, corresponding to the lifespan of reusable packages. Our analysis demonstrates that maximizing device lifespan requires minimizing wake time. We propose a pipeline that includes data processing, training, and evaluation of the deep learning model designed for imbalanced, multiclass time series data collected from an embedded sensor. The method uses a one-dimensional Convolutional Neural Network architecture to classify accelerometer data from the IoT device. Before training, two data augmentation techniques are tested to solve the imbalance problem of the dataset: the Synthetic Minority Oversampling TEchnique and the ADAptive SYNthetic sampling approach. After training, compression techniques are applied to keep the model size small. On the considered two-class problem, the methodology yields a precision of 94.54% for the first class and 95.83% for the second class, while compression techniques reduce the model size by a factor of four. The trained model is deployed on the IoT device, where it operates with a power consumption of 316 mW during inference.


Article 1990

Title@2025-06-05 (4): From EHRs to Patient Pathways: Scalable Modeling of Longitudinal Health Trajectories with LLMs

Title: From EHRs to Patient Pathways: Scalable Modeling of Longitudinal Health Trajectories with LLMs Von EHRs zu Patientenpfaden: Skalierbare Modellierung von longitudinalen Gesundheits-Trajektorien mit LLMs 从EHRs到患者途径:可扩展的长纵向健康轨迹模型与LLMs 2506.04831v1

Authors (5): Chantal Pellegrini, Ege Özsoy, David Bani-Harouni, Matthias Keicher, Nassir Navab

Healthcare systems face significant challenges in managing and interpreting vast, heterogeneous patient data for personalized care. Existing approaches often focus on narrow use cases with a limited feature space, overlooking the complex, longitudinal interactions needed for a holistic understanding of patient health. In this work, we propose a novel approach to patient pathway modeling by transforming diverse electronic health record (EHR) data into a structured representation and designing a holistic pathway prediction model, EHR2Path, optimized to predict future health trajectories. Further, we introduce a novel summary mechanism that embeds long-term temporal context into topic-specific summary tokens, improving performance over text-only models, while being much more token-efficient. EHR2Path demonstrates strong performance in both next time-step prediction and longitudinal simulation, outperforming competitive baselines. It enables detailed simulations of patient trajectories, inherently targeting diverse evaluation tasks, such as forecasting vital signs, lab test results, or length-of-stay, opening a path towards predictive and personalized healthcare.


Article 1991

Title@2025-06-05 (4): Reading Recognition in the Wild

Title: Reading Recognition in the Wild Leseerkennung in der Wildnis 野生生物的阅读识别 2505.24848v2

Authors (15): Charig Yang, Samiul Alam, Shakhrul Iman Siam, Michael J. Proulx, Lambert Mathias, Kiran Somasundaram, Luis Pesqueira, James Fort, Sheroze Sheriffdeen, Omkar Parkhi, Carl Ren, Mi Zhang, Yuning Chai, Richard Newcombe, Hyo Jin Kim

To enable egocentric contextual AI in always-on smart glasses, it is crucial to be able to keep a record of the user’s interactions with the world, including during reading. In this paper, we introduce a new task of reading recognition to determine when the user is reading. We first introduce the first-of-its-kind large-scale multimodal Reading in the Wild dataset, containing 100 hours of reading and non-reading videos in diverse and realistic scenarios. We then identify three modalities (egocentric RGB, eye gaze, head pose) that can be used to solve the task, and present a flexible transformer model that performs the task using these modalities, either individually or combined. We show that these modalities are relevant and complementary to the task, and investigate how to efficiently and effectively encode each modality. Additionally, we show the usefulness of this dataset towards classifying types of reading, extending current reading understanding studies conducted in constrained settings to larger scale, diversity and realism.


Article 1992

Title@2025-06-05 (4): Supervised Graph Contrastive Learning for Gene Regulatory Network

Title: Supervised Graph Contrastive Learning for Gene Regulatory Network Überwachtes Graph Kontrastives Lernen für Gene Regulatory Network 受监督的基因监管网络图表对比性学习 2505.17786v2

Authors (5): Sho Oshima, Yuji Okamoto, Taisei Tosaki, Ryosuke Kojima, Yasushi Okuno

Graph representation learning is effective for obtaining a meaningful latent space utilizing the structure of graph data and is widely applied, including to biological networks. In particular, Graph Contrastive Learning (GCL) has emerged as a powerful self-supervised method that relies on applying perturbations to graphs for data augmentation. However, when existing GCL methods are applied to biological networks such as Gene Regulatory Networks (GRNs), they overlook meaningful biologically relevant perturbations, e.g., gene knockdowns. In this study, we introduce SupGCL (Supervised Graph Contrastive Learning), a novel GCL method for GRNs that directly incorporates biological perturbations derived from gene knockdown experiments as the supervision. SupGCL mathematically extends existing GCL methods that utilize non-biological perturbations to probabilistic models that introduce actual biological gene perturbation utilizing gene knockdown data. Using the GRN representation obtained by our proposed method, our aim is to improve the performance of biological downstream tasks such as patient hazard prediction and disease subtype classification (graph-level task), and gene function classification (node-level task). We applied SupGCL on real GRN datasets derived from patients with multiple types of cancer, and in all experiments SupGCL achieves better performance than state-of-the-art baselines.


Article 1993

Title@2025-06-05 (4): Fool the Stoplight: Realistic Adversarial Patch Attacks on Traffic Light Detectors

Title: Fool the Stoplight: Realistic Adversarial Patch Attacks on Traffic Light Detectors Tor das Stopplicht: Realistische Widersacher-Angriffe auf Verkehrslicht-Detektoren 愚笨的暗灯:现实的对抗性对交通灯光探测器的补丁攻击 2506.04823v1

Authors (5): Svetlana Pavlitska, Jamie Robb, Nikolai Polley, Melih Yazgan, J. Marius Zöllner

Realistic adversarial attacks on various camera-based perception tasks of autonomous vehicles have been successfully demonstrated so far. However, only a few works considered attacks on traffic light detectors. This work shows how CNNs for traffic light detection can be attacked with printed patches. We propose a threat model, where each instance of a traffic light is attacked with a patch placed under it, and describe a training strategy. We demonstrate successful adversarial patch attacks in universal settings. Our experiments show realistic targeted red-to-green label-flipping attacks and attacks on pictogram classification. Finally, we perform a real-world evaluation with printed patches and demonstrate attacks in the lab settings with a mobile traffic light for construction sites and in a test area with stationary traffic lights. Our code is available at https://github.com/KASTEL-MobilityLab/attacks-on-traffic-light-detection.


Article 1994

Title@2025-06-05 (4): Offline Multi-agent Reinforcement Learning via Score Decomposition

Title: Offline Multi-agent Reinforcement Learning via Score Decomposition Offline-Multi-Agenten-Verstärkung Lernen über Score-Dekomposition 通过计分分分分分分化进行离线多剂强化学习 2505.05968v2

Authors (5): Dan Qiao, Wenhao Li, Shanchao Yang, Hongyuan Zha, Baoxiang Wang

Offline cooperative multi-agent reinforcement learning (MARL) faces unique challenges due to distributional shifts, particularly stemming from the high dimensionality of joint action spaces and the presence of out-of-distribution joint action selections. In this work, we highlight that a fundamental challenge in offline MARL arises from the multi-equilibrium nature of cooperative tasks, which induces a highly multimodal joint behavior policy space coupled with heterogeneous-quality behavior data. This makes it difficult for individual policy regularization to align with a consistent coordination pattern, leading to the policy distribution shift problems. To tackle this challenge, we design a sequential score function decomposition method that distills per-agent regularization signals from the joint behavior policy, which induces coordinated modality selection under decentralized execution constraints. Then we leverage a flexible diffusion-based generative model to learn these score functions from multimodal offline data, and integrate them into joint-action critics to guide policy updates toward high-reward, in-distribution regions under a shared team reward. Our approach achieves state-of-the-art performance across multiple particle environments and Multi-agent MuJoCo benchmarks consistently. To the best of our knowledge, this is the first work to explicitly address the distributional gap between offline and online MARL, paving the way for more generalizable offline policy-based MARL methods.


Article 1995

Title@2025-06-05 (4): LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning

Title: LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning LogicPuzzleRL: Robuste mathematische Reasoning in LLMs durch Verstärkungslernen kultivieren LogicPuzzleRL:通过强化学习在LLMs中培养强力数学理由 2506.04821v1

Authors (10): Zhen Hao Wong, Jingwen Deng, Runming He, Zirong Chen, Qijie You, Hejun Dong, Hao Liang, Chengyu Shen, Bin Cui, Wentao Zhang

Large language models (LLMs) excel at many supervised tasks but often struggle with structured reasoning in unfamiliar settings. This discrepancy suggests that standard fine-tuning pipelines may instill narrow, domain-specific heuristics rather than fostering general-purpose thinking strategies. In this work, we propose a “play to learn” framework that fine-tunes LLMs through reinforcement learning on a suite of seven custom logic puzzles, each designed to cultivate distinct reasoning skills such as constraint propagation, spatial consistency, and symbolic deduction. Using a reinforcement learning setup with verifiable rewards, models receive binary feedback based on puzzle correctness, encouraging iterative, hypothesis-driven problem solving. We demonstrate that this training approach significantly improves out-of-distribution performance on a range of mathematical benchmarks, especially for mid-difficulty problems that require multi-step reasoning. Analyses across problem categories and difficulty levels reveal that puzzle training promotes transferable reasoning routines, strengthening algebraic manipulation, geometric inference, and combinatorial logic, while offering limited gains on rote or highly specialized tasks. These findings show that reinforcement learning over logic puzzles reshapes the internal reasoning of LLMs, enabling more robust and compositional generalization without relying on task-specific symbolic tools.


Article 1996

Title@2025-06-05 (4): Distributional encoding for Gaussian process regression with qualitative inputs

Title: Distributional encoding for Gaussian process regression with qualitative inputs Verteilungskodierung für Gaußsche Prozessregression mit qualitativen Inputs 带有定性投入的高斯进程回归的分布编码 2506.04813v1

Authors (1): Sébastien Da Veiga

Gaussian Process (GP) regression is a popular and sample-efficient approach for many engineering applications, where observations are expensive to acquire, and is also a central ingredient of Bayesian optimization (BO), a widely used method for the optimization of black-box functions. However, when all or some input variables are categorical, building a predictive and computationally efficient GP remains challenging. Starting from the naive target encoding idea, where the original categorical values are replaced with the mean of the target variable for that category, we propose a generalization based on distributional encoding (DE) which makes use of all samples of the target variable for a category. To handle this type of encoding inside the GP, we build upon recent results on characteristic kernels for probability distributions, based on the maximum mean discrepancy and the Wasserstein distance. We also discuss several extensions for classification, multi-task learning and incorporation of auxiliary information. Our approach is validated empirically, and we demonstrate state-of-the-art predictive performance on a variety of synthetic and real-world datasets. DE is naturally complementary to recent advances in BO over discrete and mixed spaces.
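The difference between naive target encoding and distributional encoding can be sketched in a few lines. The function names and the toy data below are hypothetical, and the sketch stops before the kernel step: the paper's MMD- or Wasserstein-based kernels over the per-category samples are not reproduced here.

```python
from collections import defaultdict
from statistics import mean

def group_targets(categories, targets):
    """Collect every observed target value under its category label."""
    groups = defaultdict(list)
    for c, y in zip(categories, targets):
        groups[c].append(y)
    return dict(groups)

def target_encode(categories, targets):
    """Naive target encoding: each category maps to the mean of its targets."""
    return {c: mean(ys) for c, ys in group_targets(categories, targets).items()}

def distributional_encode(categories, targets):
    """Distributional view: keep the full (sorted) sample for each category."""
    return {c: sorted(ys) for c, ys in group_targets(categories, targets).items()}

# Toy data (hypothetical): a categorical input and a scalar response.
cats = ["steel", "steel", "alu", "alu", "alu"]
ys = [1.0, 3.0, 10.0, 11.0, 12.0]
```

Two categories with the same mean but different spread are indistinguishable under `target_encode`, whereas `distributional_encode` keeps them apart, which is the information a kernel on distributions can then exploit.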


Article 1997

Title@2025-06-05 (4): Uniform convergence of the smooth calibration error and its relationship with functional gradient

Title: Uniform convergence of the smooth calibration error and its relationship with functional gradient Einheitliche Konvergenz des glatten Kalibrierfehlers und seines Verhältnisses mit dem funktionellen Gradienten 平稳校准误差及其与功能梯度的关系统一汇合 2505.19396v2

Authors (2): Futoshi Futami, Atsushi Nitanda

Calibration is a critical requirement for reliable probabilistic prediction, especially in high-risk applications. However, the theoretical understanding of which learning algorithms can simultaneously achieve high accuracy and good calibration remains limited, and many existing studies provide empirical validation or a theoretical guarantee in restrictive settings. To address this issue, in this work, we focus on the smooth calibration error (CE) and provide a uniform convergence bound, showing that the smooth CE is bounded by the sum of the smooth CE over the training dataset and a generalization gap. We further prove that the functional gradient of the loss function can effectively control the training smooth CE. Based on this framework, we analyze three representative algorithms: gradient boosting trees, kernel boosting, and two-layer neural networks. For each, we derive conditions under which both classification and calibration performances are simultaneously guaranteed. Our results offer new theoretical insights and practical guidance for designing reliable probabilistic models with provable calibration guarantees.


Article 1998

Title@2025-06-05 (4): Adaptive Preconditioners Trigger Loss Spikes in Adam

Title: Adaptive Preconditioners Trigger Loss Spikes in Adam Adaptive Preconditioners Trigger Verlust Spikes in Adam 亚当的触发器失落斯派克 2506.04805v1

Authors (9): Zhiwei Bai, Zhangchen Zhou, Jiajie Zhao, Xiaolong Li, Zhiyu Li, Feiyu Xiong, Hongkang Yang, Yaoyu Zhang, Zhi-Qin John Xu

Loss spikes emerge commonly during training across neural networks of varying architectures and scales when using the Adam optimizer. In this work, we investigate the underlying mechanism responsible for Adam spikes. While previous explanations attribute these phenomena to the lower-loss-as-sharper characteristics of the loss landscape, our analysis reveals that Adam’s adaptive preconditioners themselves can trigger spikes. Specifically, we identify a critical regime where squared gradients become substantially smaller than the second-order moment estimates, causing the latter to undergo a $\beta_2$-exponential decay and to respond sluggishly to current gradient information. This mechanism can push the maximum eigenvalue of the preconditioned Hessian beyond the classical stability threshold $2/\eta$ for a sustained period, inducing instability. This instability further leads to an alignment between the gradient and the maximum eigendirection, and a loss spike occurs precisely when the gradient-directional curvature exceeds $2/\eta$. We verify this mechanism through extensive experiments on fully connected networks, convolutional networks, and Transformer architectures.
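The $\beta_2$-exponential decay described above follows directly from Adam's standard second-moment update, v_t = beta2 * v_{t-1} + (1 - beta2) * g_t^2. The sketch below is a toy scalar simulation, not the authors' experiment; the gradient magnitudes and the helper name are assumptions.

```python
def second_moment_trace(v0, grads, beta2=0.999):
    """Apply Adam's second-moment update v <- beta2 * v + (1 - beta2) * g^2."""
    v, trace = v0, []
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * g * g
        trace.append(v)
    return trace

# Hypothetical scenario: v was built up by gradients of size about 1, then the
# gradients shrink to 1e-4, so g^2 = 1e-8 is far below the running estimate v.
beta2 = 0.999
trace = second_moment_trace(v0=1.0, grads=[1e-4] * 1000, beta2=beta2)

# Value v would take if the gradients were exactly zero: pure beta2^t decay.
pure_decay = beta2 ** 1000
```

In this regime `trace[-1]` is nearly indistinguishable from `pure_decay`: the estimate is governed by its own history rather than by the current gradients. Since Adam divides the update by the square root of v, a shrinking v inflates the preconditioned curvature, which is the route to the $2/\eta$ threshold that the abstract describes.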


Article 1999

Title@2025-06-05 (4): Augmented Invertible Koopman Autoencoder for long-term time series forecasting

Title: Augmented Invertible Koopman Autoencoder for long-term time series forecasting Augmented Invertible Koopman Autoencoder für langfristige Zeitreihen 用于长期时间序列预报的可逆 Koopman 自动编码器 2503.12930v2

Authors (5): Anthony Frion, Lucas Drumetz, Mauro Dalla Mura, Guillaume Tochon, Abdeldjalil Aïssa-El-Bey

Following the introduction of Dynamic Mode Decomposition and its numerous extensions, many neural autoencoder-based implementations of the Koopman operator have recently been proposed. This class of methods appears to be of interest for modeling dynamical systems, either through direct long-term prediction of the evolution of the state or as a powerful embedding for downstream methods. In particular, a recent line of work has developed invertible Koopman autoencoders (IKAEs), which provide an exact reconstruction of the input state thanks to their analytically invertible encoder, based on coupling layer normalizing flow models. We identify that the conservation of the dimension imposed by the normalizing flows is a limitation for the IKAE models, and thus we propose to augment the latent state with a second, non-invertible encoder network. This results in our new model: the Augmented Invertible Koopman AutoEncoder (AIKAE). We demonstrate the relevance of the AIKAE through a series of long-term time series forecasting experiments, on satellite image time series as well as on a benchmark involving predictions based on a large lookback window of observations.


Article 2000

Title@2025-06-05 (4): LotusFilter: Fast Diverse Nearest Neighbor Search via a Learned Cutoff Table

Title: LotusFilter: Fast Diverse Nearest Neighbor Search via a Learned Cutoff Table LotusFilter: Schnelles Diverses Naheste Nachbarschaftssuche über einen erfahrenen Cutoff-Tisch LotusFilter:通过计算截止表快速搜索近邻 2506.04790v1

Authors (1): Yusuke Matsui

Approximate nearest neighbor search (ANNS) is an essential building block for applications like RAG but can sometimes yield results that are overly similar to each other. In certain scenarios, search results should be similar to the query and yet diverse. We propose LotusFilter, a post-processing module to diversify ANNS results. We precompute a cutoff table summarizing vectors that are close to each other. During the filtering, LotusFilter greedily looks up the table to delete redundant vectors from the candidates. We demonstrated that the LotusFilter operates fast (0.02 [ms/query]) in settings resembling real-world RAG applications, utilizing features such as OpenAI embeddings. Our code is publicly available at https://github.com/matsui528/lotf.
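The two stages described above, a precomputed cutoff table and a greedy filtering pass, can be sketched as follows. This is an illustration under stated assumptions and not the code from the linked repository: the brute-force table construction, the squared-L2 threshold `eps`, and both function names are hypothetical.

```python
def build_cutoff_table(vectors, eps):
    """For each id, record the ids whose squared L2 distance is below eps.

    Brute force over all pairs, which is only meant for small toy inputs.
    """
    table = {i: set() for i in range(len(vectors))}
    for i, a in enumerate(vectors):
        for j in range(i + 1, len(vectors)):
            d2 = sum((x - y) ** 2 for x, y in zip(a, vectors[j]))
            if d2 < eps:
                table[i].add(j)
                table[j].add(i)
    return table

def greedy_diverse_filter(candidates, table, k):
    """Walk candidates in ranked order; keep one, then drop its close neighbors."""
    removed, kept = set(), []
    for c in candidates:
        if c in removed:
            continue
        kept.append(c)
        removed |= table[c]  # table lookup replaces any distance computation
        if len(kept) == k:
            break
    return kept

# Toy database (hypothetical): two near-duplicate pairs and one isolated point.
vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1), (9.0, 0.0)]
table = build_cutoff_table(vectors, eps=0.5)
ranked = [0, 1, 2, 3, 4]  # ids as returned by a nearest-neighbor search, best first
```

Because the filter only performs set lookups at query time, its cost depends on the number of candidates and the table sizes, not on the vector dimension.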


Article 2001

Title@2025-06-05 (4): Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques

Title: Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques Auf dem Weg zur multimodalen Fusion LLM-Centric: Eine Studie über Integrationsstrategien und -techniken 走向LLM-Centric Multimods 融合:关于一体化战略和技术的调查 2506.04788v1

Authors (4): Jisu An, Junseok Lee, Jeoungeun Lee, Yongseok Son

The rapid progress of Multimodal Large Language Models (MLLMs) has transformed the AI landscape. These models combine pre-trained LLMs with various modality encoders. This integration requires a systematic understanding of how different modalities connect to the language backbone. Our survey presents an LLM-centric analysis of current approaches. We examine methods for transforming and aligning diverse modal inputs into the language embedding space. This addresses a significant gap in existing literature. We propose a classification framework for MLLMs based on three key dimensions. First, we examine architectural strategies for modality integration. This includes both the specific integration mechanisms and the fusion level. Second, we categorize representation learning techniques as either joint or coordinate representations. Third, we analyze training paradigms, including training strategies and objective functions. By examining 125 MLLMs developed between 2021 and 2025, we identify emerging patterns in the field. Our taxonomy provides researchers with a structured overview of current integration techniques. These insights aim to guide the development of more robust multimodal integration strategies for future models built on pre-trained foundations.


Article 2002

Title@2025-06-05 (4): Kernel $k$-Medoids as General Vector Quantization

Title: Kernel $k$-Medoids as General Vector Quantization Kernel $k$-Medoids als General Vector Quantization 作为一般矢量量化的内核 $k$- 中值 2506.04786v1

Authors (3): Thore Gerlach, Sascha Mücke, Christian Bauckhage

Vector Quantization (VQ) is a widely used technique in machine learning and data compression, valued for its simplicity and interpretability. Among hard VQ methods, $k$-medoids clustering and Kernel Density Estimation (KDE) approaches represent two prominent yet seemingly unrelated paradigms – one distance-based, the other rooted in probability density matching. In this paper, we investigate their connection through the lens of Quadratic Unconstrained Binary Optimization (QUBO). We compare a heuristic QUBO formulation for $k$-medoids, which balances centrality and diversity, with a principled QUBO derived from minimizing Maximum Mean Discrepancy in KDE-based VQ. Surprisingly, we show that the KDE-QUBO is a special case of the $k$-medoids-QUBO under mild assumptions on the kernel’s feature map. This reveals a deeper structural relationship between these two approaches and provides new insight into the geometric interpretation of the weighting parameters used in QUBO formulations for VQ.



Article 2003

Title@2025-06-05 (4): Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward

Title: Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward Prefix Grouper: Effiziente GRPO-Schulung durch Shared-Prefix Forward 前缀群組: 通过共享前缀前向进行高效GRPO训练 2506.05433v1

Authors (8): Zikang Liu, Tongtian Yue, Yepeng Tang, Longteng Guo, Junxian Cai, Qingbin Liu, Xi Chen, Jing Liu

Group Relative Policy Optimization (GRPO) enhances policy learning by computing gradients from relative comparisons among candidate outputs that share a common input prefix. Despite its effectiveness, GRPO introduces substantial computational overhead when processing long shared prefixes, which must be redundantly encoded for each group member. This inefficiency becomes a major scalability bottleneck in long-context learning scenarios. We propose Prefix Grouper, an efficient GRPO training algorithm that eliminates redundant prefix computation via a Shared-Prefix Forward strategy. In particular, by restructuring self-attention into two parts, our method enables the shared prefix to be encoded only once, while preserving full differentiability and compatibility with end-to-end training. We provide both theoretical and empirical evidence that Prefix Grouper is training-equivalent to standard GRPO: it yields identical forward outputs and backward gradients, ensuring that the optimization dynamics and final policy performance remain unchanged. Empirically, our experiments confirm that Prefix Grouper achieves consistent results while significantly reducing the computational cost of training, particularly in long-prefix scenarios. The proposed method is fully plug-and-play: it is compatible with existing GRPO-based architectures and can be seamlessly integrated into current training pipelines as a drop-in replacement, requiring no structural modifications and only minimal changes to input construction and attention computation. Prefix Grouper enables the use of larger group sizes under the same computational budget, thereby improving the scalability of GRPO to more complex tasks and larger models. Code is now available at https://github.com/johncaged/PrefixGrouper
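The equivalence claim can be illustrated with a toy single-head, single-layer causal attention in which queries, keys, and values are the raw token vectors. This is a minimal sketch under those simplifying assumptions, not the PrefixGrouper implementation; all function names here are hypothetical. It checks that encoding the prefix once and letting each suffix attend over the shared prefix plus its own tokens gives the same outputs as re-encoding the full sequence per group member.

```python
import math
import random


def attend(query, keys, values):
    """Softmax attention of one query vector over lists of key/value vectors."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def causal_self_attention(tokens):
    """Each position attends to itself and all earlier positions."""
    return [attend(tokens[t], tokens[: t + 1], tokens[: t + 1]) for t in range(len(tokens))]


def baseline_forward(prefix, suffixes):
    """Standard forward: the prefix is re-encoded for every group member."""
    return [causal_self_attention(prefix + suffix) for suffix in suffixes]


def shared_prefix_forward(prefix, suffixes):
    """Encode the prefix once; suffix queries attend over prefix + their own suffix."""
    prefix_out = causal_self_attention(prefix)
    suffix_outs = []
    for suffix in suffixes:
        outs = []
        for t in range(len(suffix)):
            context = prefix + suffix[: t + 1]
            outs.append(attend(suffix[t], context, context))
        suffix_outs.append(outs)
    return prefix_out, suffix_outs


rng = random.Random(0)
make = lambda n: [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(n)]
prefix, suffixes = make(6), [make(3) for _ in range(4)]  # 6-token prefix, group of 4

full = baseline_forward(prefix, suffixes)
prefix_out, suffix_outs = shared_prefix_forward(prefix, suffixes)
```

In this toy setting the baseline computes prefix outputs once per group member, while the shared variant computes them a single time; the saving grows with prefix length and group size.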



Article 2004

Title@2025-06-05 (4): Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards

Title: Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards Verbesserte Regret Bounds für lineare Banditen mit schwerfälligen Belohnungen 使用重负奖励的线性强盗的遗憾弹道改善 2506.04775v1

Authors (3): Artin Tajdini, Jonathan Scarlett, Kevin Jamieson

We study stochastic linear bandits with heavy-tailed rewards, where the rewards have a finite $(1+\epsilon)$-absolute central moment bounded by $\upsilon$ for some $\epsilon \in (0,1]$. We improve both upper and lower bounds on the minimax regret compared to prior work. When $\upsilon = \mathcal{O}(1)$, the best prior known regret upper bound is $\tilde{\mathcal{O}}(d T^{\frac{1}{1+\epsilon}})$. While a lower bound with the same scaling has been given, it relies on a construction using $\upsilon = \mathcal{O}(d)$, and adapting the construction to the bounded-moment regime with $\upsilon = \mathcal{O}(1)$ yields only an $\Omega(d^{\frac{\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$ lower bound. This matches the known rate for multi-armed bandits and is generally loose for linear bandits, in particular being $\sqrt{d}$ below the optimal rate in the finite-variance case ($\epsilon = 1$). We propose a new elimination-based algorithm guided by experimental design, which achieves regret $\tilde{\mathcal{O}}(d^{\frac{1+3\epsilon}{2(1+\epsilon)}} T^{\frac{1}{1+\epsilon}})$, thus improving the dependence on $d$ for all $\epsilon \in (0,1)$ and recovering a known optimal result for $\epsilon = 1$. We also establish a lower bound of $\Omega(d^{\frac{2\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$, which strictly improves upon the multi-armed bandit rate and highlights the hardness of heavy-tailed linear bandit problems. For finite action sets, we derive similarly improved upper and lower bounds for regret. Finally, we provide action set dependent regret upper bounds showing that for some geometries, such as $l_p$-norm balls for $p \le 1 + \epsilon$, we can further reduce the dependence on $d$, and we can handle infinite-dimensional settings via the kernel trick, in particular establishing new regret bounds for the Matérn kernel that are the first to be sublinear for all $\epsilon \in (0, 1]$.
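To make the comparison of rates concrete, the exponents of $d$ and $T$ quoted in the abstract can be tabulated with exact fractions. The helper names below are hypothetical, logarithmic factors are ignored, and the numbers are simply the exponents stated above evaluated at a given $\epsilon$.

```python
from fractions import Fraction


def dimension_exponents(eps):
    """Exponent of d in each bound from the abstract, for a given epsilon in (0, 1]."""
    eps = Fraction(eps)
    return {
        "prior_upper": Fraction(1),                       # d^1
        "new_upper": (1 + 3 * eps) / (2 * (1 + eps)),     # d^{(1+3eps)/(2(1+eps))}
        "new_lower": 2 * eps / (1 + eps),                 # d^{2eps/(1+eps)}
        "mab_style_lower": eps / (1 + eps),               # d^{eps/(1+eps)}
    }


def horizon_exponent(eps):
    """Exponent of T shared by all bounds: 1/(1+eps)."""
    return 1 / (1 + Fraction(eps))


for eps in (Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    print(eps, horizon_exponent(eps), dimension_exponents(eps))
```

At $\epsilon = 1$ the new upper and lower exponents of $d$ both equal 1, while the multi-armed-bandit-style bound has exponent 1/2, which is the $\sqrt{d}$ gap mentioned in the abstract. At $\epsilon = 1/2$ the new upper exponent is 5/6 and the new lower exponent is 2/3, so a gap remains for $\epsilon < 1$.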



Article 2005

Title@2025-06-05 (4): Fine-Grained Interpretation of Political Opinions in Large Language Models

Title: Fine-Grained Interpretation of Political Opinions in Large Language Models Feinkörnige Interpretation politischer Meinungen in großen Sprachmodellen 以大语言模式对政治意见的精细解释 2506.04774v1

Authors (4): Jingyu Hu, Mengyue Yang, Mengnan Du, Weiru Liu

Studies of LLMs’ political opinions mainly rely on evaluations of their open-ended responses. Recent work indicates that there is a misalignment between LLMs’ responses and their internal intentions. This motivates us to probe LLMs’ internal mechanisms and help uncover their internal political states. Additionally, we found that the analysis of LLMs’ political opinions often relies on single-axis concepts, which can lead to concept confounds. In this work, we extend the single-axis view to multiple dimensions and apply interpretable representation engineering techniques for more transparent LLM political concept learning. Specifically, we designed a four-dimensional political learning framework and constructed a corresponding dataset for fine-grained political concept vector learning. These vectors can be used to detect and intervene in LLM internals. Experiments are conducted on eight open-source LLMs with three representation engineering techniques. Results show these vectors can disentangle political concept confounds. Detection tasks validate the semantic meaning of the vectors and show good generalization and robustness in OOD settings. Intervention experiments show these vectors can intervene in LLMs to generate responses with different political leanings.



Article 2006

Title@2025-06-05 (4): Inclusive, Differentially Private Federated Learning for Clinical Data

Title: Inclusive, Differentially Private Federated Learning for Clinical Data Inklusives, differenziert privates Federated Learning für klinische Daten 包容性、差异化私联校临床数据学习 2505.22108v2

Authors (10): Santhosh Parampottupadam, Melih Coşğun, Sarthak Pati, Maximilian Zenk, Saikat Roy, Dimitrios Bounias, Benjamin Hamm, Sinem Sav, Ralf Floca, Klaus Maier-Hein

Federated Learning (FL) offers a promising approach for training clinical AI models without centralizing sensitive patient data. However, its real-world adoption is hindered by challenges related to privacy, resource constraints, and compliance. Existing Differential Privacy (DP) approaches often apply uniform noise, which disproportionately degrades model performance, even among well-compliant institutions. In this work, we propose a novel compliance-aware FL framework that enhances DP by adaptively adjusting noise based on quantifiable client compliance scores. Additionally, we introduce a compliance scoring tool based on key healthcare and security standards to promote secure, inclusive, and equitable participation across diverse clinical settings. Extensive experiments on public datasets demonstrate that integrating under-resourced, less compliant clinics with highly regulated institutions yields accuracy improvements of up to 15% over traditional FL. This work advances FL by balancing privacy, compliance, and performance, making it a viable solution for real-world clinical workflows in global healthcare.



Article 2007

Title@2025-06-05 (4): Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective

Title: Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective Der Stabilitäts-Plastizitäts-Trade-off im kontinuierlichen Lernen aus architektonischer Perspektive neu denken 从建筑学的视角重新思考持续学习中稳定-弹性权衡取舍 2506.03951v2

Authors (4): Aojun Lu, Hangjie Yuan, Tao Feng, Yanan Sun

The quest for Continual Learning (CL) seeks to empower neural networks with the ability to learn and adapt incrementally. Central to this pursuit is addressing the stability-plasticity dilemma, which involves striking a balance between two conflicting objectives: preserving previously learned knowledge and acquiring new knowledge. While numerous CL methods aim to achieve this trade-off, they often overlook the impact of network architecture on stability and plasticity, restricting the trade-off to the parameter level. In this paper, we delve into the conflict between stability and plasticity at the architectural level. We reveal that under an equal parameter constraint, deeper networks exhibit better plasticity, while wider networks are characterized by superior stability. To address this architectural-level dilemma, we introduce a novel framework denoted Dual-Arch, which serves as a plug-in component for CL. This framework leverages the complementary strengths of two distinct and independent networks: one dedicated to plasticity and the other to stability. Each network is designed with a specialized and lightweight architecture, tailored to its respective objective. Extensive experiments demonstrate that Dual-Arch enhances the performance of existing CL methods while being up to 87% more compact in terms of parameters. Code: https://github.com/byyx666/Dual-Arch.



Article 2008

Title@2025-06-05 (4): PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling

Title: PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling PCDVQ: Verbesserung der Vector Quantization für große Sprachmodelle über Polar Coordinate Entkopplung PCDVQ:通过极地协调脱钩,加强大语言模型的矢量量化 2506.05432v1

Authors (6): Yuxuan Yue, Zukang Xu, Zhihang Yuan, Dawei Yang, Jianglong Wu, Liqiang Nie

Large Language Models (LLMs) face significant challenges in edge deployment due to their massive parameter scale. Vector Quantization (VQ), a clustering-based quantization method, is a prevalent solution to this issue because it supports extremely low bit-widths (even 2-bit) while retaining considerable accuracy. Since a vector is a quantity in mathematics and physics that has both direction and magnitude, existing VQ works typically quantize them in a coupled manner. However, we find that direction exhibits significantly greater sensitivity to quantization than magnitude. For instance, when separately clustering the directions and magnitudes of weight vectors in LLaMA-2-7B, the accuracy drops on zero-shot tasks are 46.5\% and 2.3\%, respectively. This gap widens further as the number of clustering centers decreases. Further, Euclidean distance, a common metric to assess vector similarities in current VQ works, places greater emphasis on reducing the magnitude error. This property runs contrary to the above finding, unavoidably leading to larger quantization errors. To address these issues, this paper proposes Polar Coordinate Decoupled Vector Quantization (PCDVQ), an effective and efficient VQ framework consisting of two key modules: 1) Polar Coordinate Decoupling (PCD), which transforms vectors into their polar coordinate representations and performs independent quantization of the direction and magnitude parameters; 2) Distribution Aligned Codebook Construction (DACC), which optimizes the direction and magnitude codebooks in accordance with the source distribution. Experimental results show that PCDVQ outperforms baseline methods at 2-bit level by at least 1.5\% zero-shot accuracy, establishing a novel paradigm for accurate and highly compressed LLMs.
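The decoupling idea can be sketched in a few lines: split each vector into a magnitude and a unit direction, then quantize each against its own codebook. The codebooks, function names, and the cosine/absolute-difference matching rules below are illustrative assumptions; the abstract does not specify how PCDVQ builds or searches its codebooks, and the sketch assumes non-zero input vectors and unit-norm direction codewords.

```python
import math


def to_polar(vector):
    """Split a non-zero vector into (magnitude, unit direction)."""
    magnitude = math.sqrt(sum(x * x for x in vector))
    direction = [x / magnitude for x in vector]
    return magnitude, direction


def nearest_direction(direction, direction_codebook):
    """Pick the codeword with the highest cosine similarity (codewords assumed unit norm)."""
    return max(direction_codebook, key=lambda c: sum(a * b for a, b in zip(direction, c)))


def nearest_magnitude(magnitude, magnitude_codebook):
    """Pick the scalar codeword closest to the magnitude."""
    return min(magnitude_codebook, key=lambda m: abs(m - magnitude))


def quantize(vector, direction_codebook, magnitude_codebook):
    """Quantize direction and magnitude independently, then recombine."""
    magnitude, direction = to_polar(vector)
    d = nearest_direction(direction, direction_codebook)
    m = nearest_magnitude(magnitude, magnitude_codebook)
    return [m * x for x in d]


# Hypothetical toy codebooks.
directions = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
magnitudes = [1.0, 5.0]
print(quantize([2.9, 4.2], directions, magnitudes))
```

Because the two codebooks are searched separately, the direction codebook can be given more entries than the magnitude codebook, which is one way to act on the reported sensitivity gap.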



Article 2009

Title@2025-06-05 (4): OpenGT: A Comprehensive Benchmark For Graph Transformers

Title: OpenGT: A Comprehensive Benchmark For Graph Transformers OpenGT: Ein umfassender Benchmark für Graph Transformer OpenGT: 图表变形器综合基准 2506.04765v1

Authors (6): Jiachen Tang, Zhonghao Wang, Sirui Chen, Sheng Zhou, Jiawei Chen, Jiajun Bu

Graph Transformers (GTs) have recently demonstrated remarkable performance across diverse domains. By leveraging attention mechanisms, GTs are capable of modeling long-range dependencies and complex structural relationships beyond local neighborhoods. However, their applicable scenarios are still underexplored, which highlights the need to identify when and why they excel. Furthermore, unlike GNNs, which predominantly rely on message-passing mechanisms, GTs exhibit a diverse design space in areas such as positional encoding, attention mechanisms, and graph-specific adaptations. Yet, it remains unclear which of these design choices are truly effective and under what conditions. As a result, the community currently lacks a comprehensive benchmark and library to promote a deeper understanding and further development of GTs. To address this gap, this paper introduces OpenGT, a comprehensive benchmark for Graph Transformers. OpenGT enables fair comparisons and multidimensional analysis by establishing standardized experimental settings and incorporating a broad selection of state-of-the-art GNNs and GTs. Our benchmark evaluates GTs from multiple perspectives, encompassing diverse tasks and datasets with varying properties. Through extensive experiments, our benchmark has uncovered several critical insights, including the difficulty of transferring models across task levels, the limitations of local attention, the efficiency trade-offs in several models, the application scenarios of specific positional encodings, and the preprocessing overhead of some positional encodings. We aspire for this work to establish a foundation for future graph transformer research emphasizing fairness, reproducibility, and generalizability. We have developed an easy-to-use library OpenGT for training and evaluating existing GTs. The benchmark code is available at https://github.com/eaglelab-zju/OpenGT.



Article 2010

Title@2025-06-05 (4): Log-Linear Attention

Title: Log-Linear Attention Log-Linear-Achtung 日志边注意 2506.04761v1

Authors (6): Han Guo, Songlin Yang, Tarushii Goel, Eric P. Xing, Tri Dao, Yoon Kim

The attention mechanism in Transformers is an important primitive for accurate and scalable sequence modeling. Its quadratic-compute and linear-memory complexity however remain significant bottlenecks. Linear attention and state-space models enable linear-time, constant-memory sequence modeling and can moreover be trained efficiently through matmul-rich parallelization across sequence length. However, at their core these models are still RNNs, and thus their use of a fixed-size hidden state to model the context is a fundamental limitation. This paper develops log-linear attention, an attention mechanism that balances linear attention’s efficiency and the expressiveness of softmax attention. Log-linear attention replaces the fixed-size hidden state with a logarithmically growing set of hidden states. We show that with a particular growth function, log-linear attention admits a similarly matmul-rich parallel form whose compute cost is log-linear in sequence length. Log-linear attention is a general framework and can be applied on top of existing linear attention variants. As case studies, we instantiate log-linear variants of two recent architectures – Mamba-2 and Gated DeltaNet – and find they perform well compared to their linear-time variants.
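One way to picture a "logarithmically growing set of hidden states" is to partition the positions before step $t$ into power-of-two sized buckets, one per set bit of $t$, so that the prefix is covered by at most $\lfloor \log_2 t \rfloor + 1$ summaries. The partition below follows the binary expansion of $t$ (in the spirit of a Fenwick tree); treating this as the growth function is an assumption for illustration, and `prefix_buckets` is a hypothetical helper, not code from the paper.

```python
def prefix_buckets(t):
    """Split positions [0, t) into power-of-two sized buckets, oldest and largest first."""
    buckets = []
    start = 0
    for bit in reversed(range(t.bit_length())):
        size = 1 << bit
        if t & size:  # one bucket per set bit in the binary expansion of t
            buckets.append((start, start + size))
            start += size
    return buckets


for t in (1, 8, 13, 1000):
    print(t, prefix_buckets(t))
```

For `t = 13` this yields the half-open ranges `(0, 8)`, `(8, 12)`, and `(12, 13)`: distant context is summarized coarsely and recent context finely, and the number of buckets never exceeds `t.bit_length()`.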



Article 2011

Title@2025-06-05 (4): Robustness Evaluation for Video Models with Reinforcement Learning

Title: Robustness Evaluation for Video Models with Reinforcement Learning Robustheitsbewertung für Videomodelle mit Verstärkungslernen 强化学习录像模型的强力评价 2506.05431v1

Authors (8): Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Antonio Guillen, Ricardo Luna Gutierrez, Soumyendu Sarkar

Evaluating the robustness of video classification models is very challenging, especially compared to image-based models. With their added temporal dimension, there is a significant increase in complexity and computational cost. One of the key challenges is to induce misclassification while keeping the perturbations to a minimum. In this work, we propose a multi-agent reinforcement learning approach (spatial and temporal) that cooperatively learns to identify the given video’s sensitive spatial and temporal regions. The agents consider temporal coherence in generating fine perturbations, leading to a more effective and visually imperceptible attack. Our method outperforms the state-of-the-art solutions on the Lp metric and the average queries. Our method enables custom distortion types, making the robustness evaluation more relevant to the use case. We extensively evaluate 4 popular models for video action recognition on two popular datasets, HMDB-51 and UCF-101.



Article 2012

Title@2025-06-05 (4): Multi-Layer GRPO: Enhancing Reasoning and Self-Correction in Large Language Models

Title: Multi-Layer GRPO: Enhancing Reasoning and Self-Correction in Large Language Models Multi-Layer GRPO: Verbessern von Vernunft und Selbstkorrektion in großen Sprachmodellen 多语言GROPO:加强大语言模式的理性和自我校正 2506.04746v1

Authors (4): Fei Ding, Baiqiao Wang, Zijian Zeng, Youwei Wang

The Group Relative Policy Optimization (GRPO) algorithm has demonstrated considerable success in enhancing the reasoning capabilities of large language models (LLMs), as evidenced by DeepSeek-R1. However, the absence of intermediate supervision in GRPO frequently leads to inefficient exploration dynamics. A single error in a complex reasoning chain can invalidate the entire solution, resulting in abrupt reward vanishing and compromising training stability. To address these challenges, we propose MGRPO (Multi-layer GRPO). MGRPO operates in two layers: the first layer employs standard GRPO to generate an initial response. This response, along with the original query, is then fed into a second-layer GRPO process. This second layer is specifically trained to identify and correct errors in the initial response, effectively creating a self-correction loop. This mechanism provides implicit process-level supervision by rewarding successful error correction, without requiring an explicit, densely-annotated reward model. Experimental results on several mathematical reasoning benchmarks demonstrate that MGRPO significantly outperforms standard GRPO, achieving superior performance by fostering both reasoning and self-correction abilities.
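The "reward vanishing" problem is easy to see in a common formulation of the group-relative advantage, where each response's reward is standardized against its own group. The sketch below assumes that formulation (population standard deviation plus a small `eps`); it is not taken from the MGRPO paper, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Every sampled answer is wrong: all advantages are zero, so there is no learning signal.
print(group_relative_advantages([0, 0, 0, 0]))

# One answer is right (for instance after a successful correction): it gets a positive
# advantage and the others a negative one.
print(group_relative_advantages([0, 0, 0, 1]))
```

Under this formulation, a second pass that turns some all-wrong groups into mixed groups restores a non-zero signal, which is consistent with the motivation given for the self-correction layer.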



Article 2013

Title@2025-06-05 (4): Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models

Title: Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models Text-zu-CAD-Generierung durch visuelles Feedback in großen Sprachmodellen 通过将视觉反馈引入大语言模型,将生成的文字引入CAD 2501.19054v3

Authors (4): Ruiyu Wang, Yu Yuan, Shizhao Sun, Jiang Bian

Creating Computer-Aided Design (CAD) models requires significant expertise and effort. Text-to-CAD, which converts textual descriptions into CAD parametric sequences, is crucial in streamlining this process. Recent studies have utilized ground-truth parametric sequences, known as sequential signals, as supervision to achieve this goal. However, CAD models are inherently multimodal, comprising parametric sequences and corresponding rendered visual objects. Besides, the rendering process from parametric sequences to visual objects is many-to-one. Therefore, both sequential and visual signals are critical for effective training. In this work, we introduce CADFusion, a framework that uses Large Language Models (LLMs) as the backbone and alternates between two training stages: the sequential learning (SL) stage and the visual feedback (VF) stage. In the SL stage, we train LLMs using ground-truth parametric sequences, enabling the generation of logically coherent parametric sequences. In the VF stage, we reward parametric sequences that render into visually preferred objects and penalize those that do not, allowing LLMs to learn how rendered visual objects are perceived and evaluated. These two stages alternate throughout the training, ensuring balanced learning and preserving benefits of both signals. Experiments demonstrate that CADFusion significantly improves performance, both qualitatively and quantitatively.



Article 2014

Title@2025-06-05 (4): Focus On This, Not That! Steering LLMs with Adaptive Feature Specification

Title: Focus On This, Not That! Steering LLMs with Adaptive Feature Specification Fokus auf diese, nicht das! Lenkung LLMs mit adaptiven Feature Specification 聚焦于此, 不是那个! 指导性带适应性特征规格的LLMS 2410.22944v4

Authors (5): Tom A. Lamb, Adam Davies, Alasdair Paren, Philip H. S. Torr, Francesco Pinto

Despite the success of Instruction Tuning (IT) in training large language models (LLMs), such models often leverage spurious or biased features learnt from their training data and can become misaligned, leading to undesired behaviours. While existing techniques can steer model behaviour at inference-time, they are often post-hoc and do not embed steering as an intrinsic model feature. In this work, we introduce Focus Instruction Tuning (FIT), which trains LLMs to condition their responses by focusing on specific features whilst ignoring others, leading to different behaviours based on what features are specified. Across diverse benchmarks, we demonstrate that FIT: (i) successfully steers behaviour at inference time; (ii) increases robustness by amplifying core task signals and down-weighting spurious cues; (iii) mitigates social bias by suppressing demographic attributes; and (iv) generalises under distribution shifts and to previously unseen focus features. FIT therefore offers a lightweight, intrinsic mechanism for building more robust, fair, and easily controllable LLMs.



Article 2015

Title@2025-06-05 (4): Model-based Neural Data Augmentation for sub-wavelength Radio Localization

Title: Model-based Neural Data Augmentation for sub-wavelength Radio Localization Modellbasierte Neural Data Augmentation für die Subwellenlängen-Radiolokalisierung 以模型为基础的亚波长无线电无线电下波长无线电神经数据扩增本地化模型 2506.06387v1

Authors (6): Baptiste Chatelier, Vincent Corlay, Musa Furkan Keskin, Matthieu Crussière, Henk Wymeersch, Luc Le Magoarou

The increasing deployment of large antenna arrays at base stations has significantly improved the spatial resolution and localization accuracy of radio-localization methods. However, traditional signal processing techniques struggle in complex radio environments, particularly in scenarios dominated by non line of sight (NLoS) propagation paths, resulting in degraded localization accuracy. Recent developments in machine learning have facilitated the development of machine learning-assisted localization techniques, enhancing localization accuracy in complex radio environments. However, these methods often involve substantial computational complexity during both the training and inference phases. This work extends the well-established fingerprinting-based localization framework by simultaneously reducing its memory requirements and improving its accuracy. Specifically, a model-based neural network is used to learn the location-to-channel mapping, and then serves as a generative neural channel model. This generative model augments the fingerprinting comparison dictionary while reducing the memory requirements. The proposed method outperforms fingerprinting baselines by achieving sub-wavelength localization accuracy, even in NLoS environments. Remarkably, it offers an improvement by several orders of magnitude in localization accuracy, while simultaneously reducing memory requirements by an order of magnitude compared to classical fingerprinting methods.



Article 2016

Title@2025-06-05 (4): The Role of Diversity in In-Context Learning for Large Language Models

Title: The Role of Diversity in In-Context Learning for Large Language Models Die Rolle der Vielfalt im In-Context-Lernen für große Sprachmodelle 多样性在为大语言模式进行内文学习方面的作用 2505.19426v2

Authors (3): Wenyang Xiao, Haoyu Zhao, Lingxiao Huang

In-context learning (ICL) is a crucial capability of current large language models (LLMs), where the selection of examples plays a key role in performance. While most existing approaches focus on selecting the most similar examples to the query, the impact of diversity in example selection remains underexplored. We systematically investigate the role of diversity in in-context example selection through experiments across a range of tasks, from sentiment classification to more challenging math and code problems. Experiments on Llama-3.1, Gemma-2, and Mistral-v0.3 families of models show that diversity-aware selection methods improve performance, particularly on complex tasks like math and code, and enhance robustness to out-of-distribution queries. To support these findings, we introduce a theoretical framework that explains the benefits of incorporating diversity in in-context example selection.
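The abstract does not say which diversity-aware selectors were evaluated. As one well-known stand-in, the sketch below uses greedy maximal marginal relevance (MMR): each step picks the candidate that is similar to the query but not redundant with examples already chosen. The embeddings, the `trade_off` value, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def select_examples(query, candidates, k, trade_off=0.5):
    """Greedy MMR: return indices of k candidates balancing relevance and diversity."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            return trade_off * relevance - (1.0 - trade_off) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical 2-D embeddings: candidates 0 and 1 are near-duplicates, 2 is different.
query = [1.0, 0.1]
candidates = [[1.0, 0.0], [0.98, 0.05], [0.5, 0.9]]
print(select_examples(query, candidates, 2, trade_off=1.0))  # similarity only
print(select_examples(query, candidates, 2, trade_off=0.5))  # similarity and diversity
```

With `trade_off=1.0` the selector reduces to plain nearest-neighbor retrieval and returns the two near-duplicates; lowering it swaps the second pick for the dissimilar example.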



Article 2017

Title@2025-06-05 (4): SPHINX: Structural Prediction using Hypergraph Inference Network

Title: SPHINX: Structural Prediction using Hypergraph Inference Network SPHINX: Structural Prediction mit Hypergraph Inference Network SPHINX:利用高光推断网络进行结构预测 2410.03208v2

Authors (2): Iulia Duta, Pietro Liò

The importance of higher-order relations is widely recognized in a large number of real-world systems. However, annotating them is a tedious and sometimes impossible task. Consequently, current approaches for data modelling either ignore the higher-order interactions altogether or simplify them into pairwise connections. In order to facilitate higher-order processing, even when a hypergraph structure is not available, we introduce Structural Prediction using Hypergraph Inference Network (SPHINX), a model that learns to infer a latent hypergraph structure in an unsupervised way, solely from the final node-level signal. The model consists of a soft, differentiable clustering method used to sequentially predict, for each hyperedge, the probability distribution over the nodes and a sampling algorithm that converts them into an explicit hypergraph structure. We show that the recent advancement in $k$-subset sampling represents a suitable tool for producing discrete hypergraph structures, addressing some of the training instabilities exhibited by prior works. The resulting model can generate the higher-order structure necessary for any modern hypergraph neural network, facilitating the capture of higher-order interaction in domains where annotating them is difficult. Through extensive ablation studies and experiments conducted on two challenging datasets for trajectory prediction, we demonstrate that our model is capable of inferring suitable latent hypergraphs, that are interpretable and enhance the final performance.



Article 2018

Title@2025-06-05 (4): Automated Modeling Method for Pathloss Model Discovery

Title: Automated Modeling Method for Pathloss Model Discovery Automatisierte Modellierungsmethode für Pathloss Model Discovery 病理模型发现自动建模方法 2505.23383v2

Authors (5): Ahmad Anaqreh, Shih-Kai Chou, Mihael Mohorčič, Thomas Lagkas, Carolina Fortuna

Modeling propagation is the cornerstone for designing and optimizing next-generation wireless systems, with a particular emphasis on the 5G-and-beyond era. Traditional modeling methods have long relied on statistic-based techniques to characterize propagation behavior across different environments. With the expansion of wireless communication systems, there is a growing demand for methods that guarantee the accuracy and interpretability of modeling. Artificial intelligence (AI)-based techniques, in particular, are increasingly being adopted to overcome this challenge, although interpretability is not assured with most of these methods. Inspired by recent advancements in AI, this paper proposes a novel approach that accelerates the discovery of path loss models while maintaining interpretability. The proposed method automates the formulation, evaluation, and refinement of the model, facilitating the discovery of the model. We examine two techniques: one based on Deep Symbolic Regression, offering full interpretability, and the second based on Kolmogorov-Arnold Networks, providing two levels of interpretability. Both approaches are evaluated on two synthetic and two real-world datasets. Our results show that Kolmogorov-Arnold Networks achieve a coefficient of determination $R^2$ close to 1 with minimal prediction error, while Deep Symbolic Regression generates compact models with moderate accuracy. Moreover, on the selected examples, we demonstrate that automated methods outperform traditional methods, achieving up to 75% reduction in prediction errors, offering accurate and explainable solutions with potential to increase the efficiency of discovering next-generation path loss models.



Article 2019

Title: Scaling Trends in Language Model Robustness Skalierungstrends in der Robustheit von Sprachmodellen 语言模型强度的增缩趋势 2407.18213v5

Authors (8): Nikolaus Howe, Ian McKenzie, Oskar Hollinsworth, Michał Zajac, Tom Tseng, Aaron Tucker, Pierre-Luc Bacon, Adam Gleave

Increasing model size has unlocked a dazzling array of capabilities in modern language models. At the same time, even frontier models remain vulnerable to jailbreaks and prompt injections, despite concerted efforts to make them robust. As both attack and defense gain access to more compute, and as models become larger, what happens to robustness? We argue that answering this question requires a \emph{scaling} approach, which we employ in an extensive study of language model robustness across several classification tasks, model families, and adversarial attacks. We find that in the absence of explicit safety training, larger models are not consistently more robust; however, scale improves sample efficiency in adversarial training, though it worsens compute efficiency. Further, we find that increasing attack compute smoothly improves attack success rate against both undefended and adversarially trained models. Finally, after exploring robustness transfer across attacks and threat models, we combine attack and defense scaling rates to study the offense-defense balance. We find that while attack scaling outpaces adversarial training across all models studied, larger adversarially trained models might give defense the advantage in the long run. These results underscore the utility of the scaling lens, and provide a paradigm for evaluating future attacks and defenses on frontier models.



Article 2020

Title@2025-06-05 (4): $μ$nit Scaling: Simple and Scalable FP8 LLM Training

Title: $μ$nit Scaling: Simple and Scalable FP8 LLM Training $μ$nit Scaling: Einfache und skalierbare FP8 LLM-Schulung $μ$nit 缩放:简单和可缩放的FP8LLM培训 2502.05967v3

Authors (4): Saaketh Narayan, Abhay Gupta, Mansheej Paul, Davis Blalock

Large Language Model training with 8-bit floating point (FP8) formats promises significant efficiency improvements, but reduced numerical precision makes training challenging. It is currently possible to train in FP8 only if one is willing to tune various hyperparameters, reduce model scale, or accept the overhead of computing dynamic scale factors. We demonstrate simple, scalable FP8 training that requires no dynamic scaling factors or special hyperparameters, even at large model sizes. Our method, $\mu$nit Scaling ($\mu$S), also enables simple hyperparameter transfer across model widths, matched numerics across training and inference, and other desirable properties. $\mu$nit Scaling is straightforward to implement, consisting of a set of minimal interventions based on a first-principles analysis of common transformer operations. We validate our method by training models from 1B to 13B parameters, performing all hidden linear layer computations in FP8. We achieve quality equal to higher precision baselines while also training up to 33% faster.
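A first-principles observation of the kind the abstract alludes to is that a dot product over `fan_in` independent unit-variance terms has variance `fan_in`, so multiplying by the static factor `1/sqrt(fan_in)` brings the output back to unit variance without measuring anything at run time. The simulation below illustrates only this generic variance argument; it is an assumption about the flavor of the analysis, not the paper's actual set of interventions, and the function name is hypothetical.

```python
import math
import random


def dot_output_variance(fan_in, scale, samples=2000, seed=0):
    """Empirical variance of scale * <w, x> with i.i.d. standard normal w and x."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(samples):
        acc = sum(rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) for _ in range(fan_in))
        outputs.append(scale * acc)
    m = sum(outputs) / samples
    return sum((y - m) ** 2 for y in outputs) / samples


fan_in = 256
unscaled = dot_output_variance(fan_in, scale=1.0)                      # roughly fan_in
scaled = dot_output_variance(fan_in, scale=1.0 / math.sqrt(fan_in))   # roughly 1
print(unscaled, scaled)
```

Keeping activations near unit variance matters for narrow formats such as FP8 because values that drift far from 1 fall outside the range where the format has usable precision.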


Article 2021

Title@2025-06-05 (4): Coordinated Robustness Evaluation Framework for Vision-Language Models

Title: Coordinated Robustness Evaluation Framework for Vision-Language Models Koordinierte Robustheitsbewertungsrahmen für Vision-Language-Modelle 愿景-语言模型协调强力评价框架 2506.05429v1

Authors (8): Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Antonio Guillen, Ricardo Luna Gutierrez, Soumyendu Sarkar

Vision-language models, which integrate computer vision and natural language processing capabilities, have demonstrated significant advancements in tasks such as image captioning and visual question answering. However, like traditional models, they are susceptible to small perturbations, which challenges their robustness, particularly in deployment scenarios. Evaluating the robustness of these models requires perturbations in both the vision and language modalities to learn their inter-modal dependencies. In this work, we train a generic surrogate model that takes both image and text as input and generates a joint representation, which is then used to generate adversarial perturbations for both the text and image modalities. This coordinated attack strategy is evaluated on visual question answering and visual reasoning datasets using various state-of-the-art vision-language models. Our results indicate that the proposed strategy outperforms other multi-modal attacks and single-modality attacks from the recent literature, and demonstrate its effectiveness in compromising the robustness of several state-of-the-art pre-trained multi-modal models such as InstructBLIP, ViLT and others.


Article 2022

Title@2025-06-05 (4): Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks

Title: Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks Regularisierte KL-Divergenz für gut definierte Funktions-Raum-Varianten-Schlussfolgerung in Bayesischen neuronalen Netzwerken Bayesian神经网络中定义精准的功能-空间变动推论的正规化 KL-KL-Diverence 2406.04317v3

Authors (2): Tristan Cinquin, Robert Bamler

Bayesian neural networks (BNNs) promise to combine the predictive performance of neural networks with principled uncertainty modeling, which is important for safety-critical systems and decision making. However, posterior uncertainty estimates depend on the choice of prior, and finding informative priors in weight-space has proven difficult. This has motivated variational inference (VI) methods that pose priors directly on the function generated by the BNN rather than on weights. In this paper, we address a fundamental issue with such function-space VI approaches pointed out by Burt et al. (2020), who showed that the objective function (ELBO) is negative infinity for most priors of interest. Our solution builds on generalized VI (Knoblauch et al., 2019) with the regularized KL divergence (Quang, 2019) and is, to the best of our knowledge, the first well-defined variational objective for function-space inference in BNNs with Gaussian process (GP) priors. Experiments show that our method incorporates the properties specified by the GP prior on synthetic and small real-world data sets, and provides competitive uncertainty estimates for regression, classification and out-of-distribution detection compared to BNN baselines with both function and weight-space priors.


Article 2023

Title@2025-06-05 (4): Adapt before Continual Learning

Title: Adapt before Continual Learning Anpassung vor dem kontinuierlichen Lernen 在持续学习前适应 2506.03956v2

Authors (5): Aojun Lu, Tao Feng, Hangjie Yuan, Chunhui Ding, Yanan Sun

Continual Learning (CL) seeks to enable neural networks to incrementally acquire new knowledge (plasticity) while retaining existing knowledge (stability). While pre-trained models (PTMs) have become pivotal in CL, prevailing approaches freeze the PTM backbone to preserve stability, limiting their plasticity, particularly when encountering significant domain gaps in incremental tasks. Conversely, sequentially finetuning the entire PTM risks catastrophic forgetting of generalizable knowledge, exposing a critical stability-plasticity trade-off. To address this challenge, we propose Adapting PTMs before the core CL process (ACL), a novel framework that refines the PTM backbone through a plug-and-play adaptation phase before learning each new task with existing CL approaches (e.g., prompt tuning). ACL enhances plasticity by aligning embeddings with their original class prototypes while distancing them from others, theoretically and empirically shown to balance stability and plasticity. Extensive experiments demonstrate that ACL significantly improves CL performance across benchmarks and integrated methods, offering a versatile solution for PTM-based CL. Code is available at https://github.com/byyx666/ACL_code.


Article 2024

Title@2025-06-05 (4): FlowDAS: A Stochastic Interpolant-based Framework for Data Assimilation

Title: FlowDAS: A Stochastic Interpolant-based Framework for Data Assimilation FlowDAS: Ein stochastisches Interpolant-basiertes Framework für die Datenassimilierung FlowDAS:一个基于数据同化的存储式内插框架 2501.16642v2

Authors (5): Siyi Chen, Yixuan Jia, Qing Qu, He Sun, Jeffrey A Fessler

Data assimilation (DA) integrates observations with a dynamical model to estimate states of PDE-governed systems. Model-driven methods (e.g., Kalman and particle filters) presuppose full knowledge of the true dynamics, which is not always available in practice, while purely data-driven solvers learn a deterministic mapping between observations and states and therefore miss the intrinsic stochasticity of real processes. Recently, score-based diffusion models, which learn a global diffusion prior and model stochastic dynamics well, have shown new potential for DA. However, they generate trajectories all at once rather than through step-by-step transitions, which limits their performance on highly complex stochastic processes and reduces physical interpretability. To tackle these drawbacks, we introduce FlowDAS, a generative DA framework that uses stochastic interpolants to directly learn state transition dynamics and perform step-by-step transitions that better model the real dynamics. We further incorporate observations into the framework, which better suits DA settings. Directly learning the underlying dynamics from collected data removes restrictive dynamical assumptions, and conditioning on observations at each interpolation step yields stable, measurement-consistent forecasts. Experiments on Lorenz-63, Navier-Stokes super-resolution/sparse-observation scenarios, and large-scale weather forecasting (where dynamics are partly or wholly unknown) show that FlowDAS surpasses model-driven methods, neural operators, and score-based baselines in accuracy and physical plausibility.


Article 2025

Title@2025-06-05 (4): Wyckoff Transformer: Generation of Symmetric Crystals

Title: Wyckoff Transformer: Generation of Symmetric Crystals Wyckoff Transformer: Erzeugung symmetrischer Kristalle Wyckoff 变形器: 生成对称晶体 2503.02407v4

Authors (7): Nikita Kazeev, Wei Nong, Ignat Romanov, Ruiming Zhu, Andrey Ustyuzhanin, Shuya Yamazaki, Kedar Hippalgaonkar

A crystal's symmetry plays a fundamental role in determining its physical, chemical, and electronic properties such as electrical and thermal conductivity, optical and polarization behavior, and mechanical strength. Almost all known crystalline materials have internal symmetry. However, this is often inadequately addressed by existing generative models, making the consistent generation of stable and symmetrically valid crystal structures a significant challenge. We introduce WyFormer, a generative model that directly tackles this by formally conditioning on space group symmetry. It achieves this by using Wyckoff positions as the basis for an elegant, compressed, and discrete structure representation. To model the distribution, we develop a permutation-invariant autoregressive model based on a Transformer encoder without positional encoding. Extensive experimentation demonstrates WyFormer's compelling combination of attributes: it achieves best-in-class symmetry-conditioned generation, incorporates a physics-motivated inductive bias, produces structures with competitive stability, predicts material properties with competitive accuracy even without atomic coordinates, and exhibits unparalleled inference speed.


Article 2026

Title@2025-06-05 (4): Using In-Context Learning for Automatic Defect Labelling of Display Manufacturing Data

Title: Using In-Context Learning for Automatic Defect Labelling of Display Manufacturing Data Verwendung von In-Context-Learning zur automatischen Defekt-Etikettierung von Display-Produktionsdaten 使用内文学习自动去除显示器制造数据标签 2506.04717v1

Authors (5): Babar Hussain, Qiang Liu, Gang Chen, Bihai She, Dahai Yu

This paper presents an AI-assisted auto-labeling system for display panel defect detection that leverages in-context learning capabilities. We adopt and enhance the SegGPT architecture with several domain-specific training techniques and introduce a scribble-based annotation mechanism to streamline the labeling process. Our two-stage training approach, validated on industrial display panel datasets, demonstrates significant improvements over the baseline model, achieving an average IoU increase of 0.22 and a 14% improvement in recall across multiple product types, while maintaining approximately 60% auto-labeling coverage. Experimental results show that models trained on our auto-labeled data match the performance of those trained on human-labeled data, offering a practical solution for reducing manual annotation efforts in industrial inspection systems.


Article 2027

Title@2025-06-05 (4): Infinitesimal Higher-Order Spectral Variations in Rectangular Real Random Matrices

Title: Infinitesimal Higher-Order Spectral Variations in Rectangular Real Random Matrices Infinitesimale Spektralvariationen höherer Ordnung in rektangulären realen Zufallsmatrizen 矩形真实随机矩阵中无尽的高端天文光谱变化 2506.03764v2

Authors (1): Róisín Luo

We present a theoretical framework for deriving the general $n$-th order Fréchet derivatives of singular values in real rectangular matrices, by leveraging reduced resolvent operators from Kato's analytic perturbation theory for self-adjoint operators. Deriving closed-form expressions for higher-order derivatives of singular values is notoriously challenging through standard matrix-analysis techniques. To overcome this, we treat a real rectangular matrix as a compact operator on a finite-dimensional Hilbert space, and embed the rectangular matrix into a block self-adjoint operator so that non-symmetric perturbations are captured. Applying Kato's asymptotic eigenvalue expansion to this construction, we obtain a general, closed-form expression for the infinitesimal $n$-th order spectral variations. Specializing to $n=2$ and deploying on a Kronecker-product representation with matrix convention yields the Hessian of a singular value, which we have not found in the literature. By bridging abstract operator-theoretic perturbation theory with matrices, our framework equips researchers with a practical toolkit for higher-order spectral sensitivity studies in random matrix applications (e.g., adversarial perturbation in deep learning).


Article 2028

Title@2025-06-05 (4): Bandit Multiclass List Classification

Title: Bandit Multiclass List Classification Bandit-Listenklassifikation für mehrere Klassen 强盗多级名单分类 2502.09257v2

Authors (2): Liad Erez, Tomer Koren

We study the problem of multiclass list classification with (semi-)bandit feedback, where input examples are mapped into subsets of size $m$ of a collection of $K$ possible labels. In each round of the interaction, the learner observes feedback consisting of the predicted labels which lie in some underlying set of ground truth labels associated with the given example. Our main result is for the $(\varepsilon,\delta)$-PAC variant of the problem for which we design an algorithm that returns an $\varepsilon$-optimal hypothesis with high probability using a sample complexity of $\widetilde{O} \big( (\mathrm{poly}(K/m) + sm / \varepsilon^2) \log (|H|/\delta) \big)$ where $H$ is the underlying (finite) hypothesis class and $s$ is an upper bound on the number of true labels for a given example. This bound improves upon known bounds for combinatorial semi-bandits whenever $s \ll K$. Moreover, in the regime where $s = O(1)$ the leading terms in our bound match the corresponding full-information rates, implying that bandit feedback essentially comes at no cost. Our PAC learning algorithm is also computationally efficient given access to an ERM oracle for $H$. In the special case of single-label classification corresponding to $s=m=1$, we prove a sample complexity bound of $O \big((K^7 + 1/\varepsilon^2)\log (|H|/\delta)\big)$ which improves upon recent results in this scenario (Erez et al. '24). Additionally, we consider the regret minimization setting where data can be generated adversarially, and establish a regret bound of $\widetilde{O}(|H| + \sqrt{smT \log |H|})$. Our results generalize and extend prior work in the simpler single-label setting (Erez et al. '24), and apply more generally to contextual combinatorial semi-bandit problems with $s$-sparse rewards.


Article 2029

Title@2025-06-05 (4): UNO: Unlearning via Orthogonalization in Generative models

Title: UNO: Unlearning via Orthogonalization in Generative models UNO: Entlernen über Orthogonalisierung in generativen Modellen UNO:在创制模型中,通过正正孔化重新学习 2506.04712v1

Authors (2): Pinak Mandal, Georg A. Gottwald

As generative models become increasingly powerful and pervasive, the ability to unlearn specific data, whether due to privacy concerns, legal requirements, or the correction of harmful content, has become increasingly important. Unlike in conventional training, where data are accumulated and knowledge is reinforced, unlearning aims to selectively remove the influence of particular data points without costly retraining from scratch. To be effective and reliable, such algorithms need to achieve (i) forgetting of the undesired data, (ii) preservation of the quality of the generation, (iii) preservation of the influence of the desired training data on the model parameters, and (iv) a small number of training steps. We propose fast unlearning algorithms based on loss gradient orthogonalization. We show that our algorithms are able to forget data while maintaining the fidelity of the original model. Using MNIST and CelebA data, we demonstrate that our algorithms achieve orders of magnitude faster unlearning times than their predecessors, such as gradient surgery.
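
The abstract names loss gradient orthogonalization but gives no formulas, so the snippet below is a generic sketch of one way such a step could look, not the authors' algorithm. It assumes two flattened gradient vectors, a hypothetical `g_forget` (gradient of the forgetting objective) and `g_retain` (gradient of the loss on data to keep), and removes from the first its component along the second. A small step along the result then leaves the retain loss unchanged to first order.

```python
def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(g_forget, g_retain):
    """Project g_forget onto the subspace orthogonal to g_retain."""
    denom = dot(g_retain, g_retain)
    if denom == 0.0:
        # No retain direction to protect: keep the forget gradient unchanged.
        return list(g_forget)
    coef = dot(g_forget, g_retain) / denom
    return [gf - coef * gr for gf, gr in zip(g_forget, g_retain)]


# Hypothetical toy gradients, chosen only for illustration.
g_forget = [1.0, 2.0, 0.5]
g_retain = [0.0, 1.0, 1.0]
step = orthogonalize(g_forget, g_retain)
print(step, dot(step, g_retain))
```

In a real model the two vectors would come from backpropagation over the forget and retain batches; the projection itself is unchanged.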


Article 2030

Title@2025-06-05 (4): Explicit Density Approximation for Neural Implicit Samplers Using a Bernstein-Based Convex Divergence

Title: Explicit Density Approximation for Neural Implicit Samplers Using a Bernstein-Based Convex Divergence Explizite Dichte-Annäherung für neurale Implizite Sampler mit einer Bernstein-basierten Konvex-Divergenz 使用伯恩斯坦-以伯尔斯坦为基础的电汇变异的神经隐性采样器的显性密度近似 2506.04700v1

Authors (4): José Manuel de Frutos, Manuel A. Vázquez, Pablo M. Olmos, Joaquín Míguez

Rank-based statistical metrics, such as the invariant statistical loss (ISL), have recently emerged as robust and practically effective tools for training implicit generative models. In this work, we introduce dual-ISL, a novel likelihood-free objective for training implicit generative models that interchanges the roles of the target and model distributions in the ISL framework, yielding a convex optimization problem in the space of model densities. We prove that the resulting rank-based discrepancy $d_K$ is i) continuous under weak convergence and with respect to the $L^1$ norm, and ii) convex in its first argument, properties not shared by classical divergences such as KL or Wasserstein distances. Building on this, we develop a theoretical framework that interprets $d_K$ as an $L^2$-projection of the density ratio $q = p/\tilde p$ onto a Bernstein polynomial basis, from which we derive exact bounds on the truncation error, precise convergence rates, and a closed-form expression for the truncated density approximation. We further extend our analysis to the multivariate setting via random one-dimensional projections, defining a sliced dual-ISL divergence that retains both convexity and continuity. We empirically show that these theoretical advantages translate into practical ones. Specifically, across several benchmarks dual-ISL converges more rapidly, delivers markedly smoother and more stable training, and more effectively prevents mode collapse than classical ISL and other leading implicit generative methods, while also providing an explicit density approximation.
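
For readers unfamiliar with the Bernstein polynomial basis mentioned above, here is a small sketch of the basis functions $b_{k,K}(x) = \binom{K}{k} x^k (1-x)^{K-k}$ on $[0, 1]$. It shows two standard facts: the basis sums to one at every point, and the classical Bernstein operator reproduces linear functions exactly. Note that the paper uses an $L^2$-projection onto this basis, which is a different construction from the classical operator shown here; the function names are hypothetical.

```python
from math import comb


def bernstein_basis(k, K, x):
    """k-th Bernstein basis polynomial of degree K evaluated at x in [0, 1]."""
    return comb(K, k) * x**k * (1 - x) ** (K - k)


def bernstein_approx(f, K, x):
    """Classical Bernstein operator: weights f(k/K) by the basis functions."""
    return sum(f(k / K) * bernstein_basis(k, K, x) for k in range(K + 1))


K = 8
total = sum(bernstein_basis(k, K, 0.3) for k in range(K + 1))
linear = bernstein_approx(lambda t: 2 * t + 1, K, 0.3)
print(total, linear)
```

The partition-of-unity property is what makes the coefficients of such an expansion easy to read as local weights of the approximated function.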


Article 2031

Title@2025-06-05 (4): Enhanced Drought Analysis in Bangladesh: A Machine Learning Approach for Severity Classification Using Satellite Data

Title: Enhanced Drought Analysis in Bangladesh: A Machine Learning Approach for Severity Classification Using Satellite Data Verbesserte Trockenheitsanalyse in Bangladesch: Ein maschineller Lernansatz für die Severity-Klassifikation mittels Satellitendaten 孟加拉国加强干旱分析:利用卫星数据进行重力分类的机械学习方法 2506.04696v1

Authors (3): Tonmoy Paul, Mrittika Devi Mati, Md. Mahmudul Islam

Drought poses a pervasive environmental challenge in Bangladesh, impacting agriculture, socio-economic stability, and food security due to the country's unique geographic and anthropogenic vulnerabilities. Traditional drought indices, such as the Standardized Precipitation Index (SPI) and Palmer Drought Severity Index (PDSI), often overlook crucial factors like soil moisture and temperature, limiting their resolution. Moreover, machine learning models for drought prediction have been underexplored in the context of Bangladesh, and existing work lacks a comprehensive integration of satellite data across multiple districts. To address these gaps, we propose a satellite data-driven machine learning framework to classify drought across 38 districts of Bangladesh. Using unsupervised algorithms such as K-means and Bayesian Gaussian Mixture for clustering, followed by classification models such as KNN, Random Forest, Decision Tree, and Naive Bayes, the framework integrates weather data (humidity, soil moisture, temperature) from 2012 to 2024. This approach successfully classifies drought severity into different levels. It also reveals significant variability in drought vulnerability across regions, which highlights the ability of machine learning models to identify and predict drought conditions.


Article 2032

Title@2025-06-05 (4): On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models

Title: On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models Über den Mechanismus der bewussten Musterauswahl im Ausbau des Lernens für Sprachmodelle 语文模式强化学习中选择理由选择模式机制 2506.04695v1

Authors (3): Xingwu Chen, Tianle Li, Difan Zou

Reinforcement learning (RL) has demonstrated remarkable success in enhancing model capabilities, including instruction-following, preference learning, and reasoning. Yet despite its empirical successes, the mechanisms by which RL improves reasoning abilities remain poorly understood. We present a systematic study of Reinforcement Learning with Verifiable Rewards (RLVR), showing that its primary benefit comes from optimizing the selection of existing reasoning patterns. Through extensive experiments, we demonstrate that RLVR-trained models preferentially adopt high-success-rate reasoning patterns while mostly maintaining stable performance on individual patterns. We further develop theoretical analyses on the convergence and training dynamics of RLVR based on a simplified question-reason-answer model. We study the gradient flow and show that RLVR can indeed find the solution that selects the reasoning pattern with the highest success rate. In addition, our theoretical results reveal two distinct regimes regarding the convergence of RLVR training: (1) rapid convergence for models with relatively strong initial reasoning capabilities versus (2) slower optimization dynamics for weaker models. Furthermore, we show that the slower optimization for weaker models can be mitigated by applying supervised fine-tuning (SFT) before RLVR, provided a reasonably high-quality SFT dataset is available. We validate the theoretical findings through extensive experiments. This work advances our theoretical understanding of RL's role in LLM fine-tuning and offers insights for further enhancing reasoning capabilities.
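
The two convergence regimes can be illustrated with a toy model. The sketch below is not the paper's question-reason-answer model; it assumes a plain softmax policy over three hypothetical reasoning patterns with fixed success rates, and follows an Euler discretization of the gradient flow of the expected reward, $\dot\theta_i = \pi_i (r_i - \sum_j \pi_j r_j)$. It counts how many steps are needed before the best pattern is selected with probability at least 0.9, once from an initialization that already favors the best pattern and once from one that rarely uses it.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def steps_until(success_rates, theta, target=0.9, lr=0.5, max_steps=100_000):
    """Steps of gradient ascent until the best pattern has probability >= target."""
    theta = list(theta)
    best = max(range(len(success_rates)), key=success_rates.__getitem__)
    for step in range(max_steps):
        pi = softmax(theta)
        if pi[best] >= target:
            return step
        expected = sum(p * r for p, r in zip(pi, success_rates))
        # Exact gradient of the expected reward with respect to each logit.
        theta = [t + lr * p * (r - expected)
                 for t, p, r in zip(theta, pi, success_rates)]
    return max_steps


success_rates = [0.2, 0.5, 0.9]   # hypothetical per-pattern success rates
strong_steps = steps_until(success_rates, [0.0, 0.0, 1.0])   # best pattern already favored
weak_steps = steps_until(success_rates, [1.0, 0.0, -3.0])    # best pattern rarely used
print(strong_steps, weak_steps)
```

In this toy setting, shifting the weak initialization toward the strong one before running the loop plays the role that SFT before RLVR plays in the abstract.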


Article 2033

Title@2025-06-05 (4): Influence Functions for Edge Edits in Non-Convex Graph Neural Networks

Title: Influence Functions for Edge Edits in Non-Convex Graph Neural Networks Einflussfunktionen für Kantenbearbeitungen in nicht konvexen Graphen-Neuralnetzwerken 非Convex 图形神经网络中的边缘编辑的影响函数 2506.04694v1

Authors (6): Jaeseung Heo, Kyeongheung Yun, Seokwon Yoon, MoonJeong Park, Jungseul Ok, Dongwoo Kim

Understanding how individual edges influence the behavior of graph neural networks (GNNs) is essential for improving their interpretability and robustness. Graph influence functions have emerged as promising tools to efficiently estimate the effects of edge deletions without retraining. However, existing influence prediction methods rely on strict convexity assumptions, exclusively consider the influence of edge deletions while disregarding edge insertions, and fail to capture changes in message propagation caused by these modifications. In this work, we propose a proximal Bregman response function specifically tailored for GNNs, relaxing the convexity requirement and enabling accurate influence prediction for standard neural network architectures. Furthermore, our method explicitly accounts for message propagation effects and extends influence prediction to both edge deletions and insertions in a principled way. Experiments with real-world datasets demonstrate accurate influence predictions for different characteristics of GNNs. We further demonstrate that the influence function is versatile in applications such as graph rewiring and adversarial attacks.


Article 2034

Title@2025-06-05 (4): Towards Better Generalization via Distributional Input Projection Network

Title: Towards Better Generalization via Distributional Input Projection Network Auf dem Weg zu einer besseren Verallgemeinerung über das Distributional Input Projection Network 通过分配投入预测网络更好地实现普遍化 2506.04690v1

Authors (4): Yifan Hao, Yanxin Lu, Xinwei Shen, Tong Zhang

As overparameterized models become increasingly prevalent, training loss alone offers limited insight into generalization performance. While smoothness has been linked to improved generalization across various settings, directly enforcing smoothness in neural networks remains challenging. To address this, we introduce Distributional Input Projection Networks (DIPNet), a novel framework that projects inputs into learnable distributions at each layer. This distributional representation induces a smoother loss landscape with respect to the input, promoting better generalization. We provide theoretical analysis showing that DIPNet reduces both local smoothness measures and the Lipschitz constant of the network, contributing to improved generalization performance. Empirically, we validate DIPNet across a wide range of architectures and tasks, including Vision Transformers (ViTs), Large Language Models (LLMs), ResNet and MLPs. Our method consistently enhances test performance under standard settings, adversarial attacks, out-of-distribution inputs, and reasoning benchmarks. We demonstrate that the proposed input projection strategy can be seamlessly integrated into existing models, providing a general and effective approach for boosting generalization performance in modern deep learning.


Article 2035

Title@2025-06-05 (4): Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models

Title: Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models Recycling the Web: Eine Methode zur Verbesserung der Vorschulung von Daten Qualität und Menge für Sprachmodelle 网上再循环:提高语文模式培训前数据质量和数量的方法 2506.04689v1

Authors (7): Thao Nguyen, Yang Li, Olga Golovneva, Luke Zettlemoyer, Sewoong Oh, Ludwig Schmidt, Xian Li

Scaling laws predict that the performance of large language models improves with increasing model size and data size. In practice, pre-training has relied on massive web crawls, using almost all data sources publicly available on the internet so far. However, this pool of natural data does not grow at the same rate as the compute supply. Furthermore, the availability of high-quality texts is even more limited: data filtering pipelines often remove up to 99% of the initial web scrapes to achieve state-of-the-art performance. To address the "data wall" of pre-training scaling, our work explores ways to transform and recycle data discarded in existing filtering processes. We propose REWIRE, REcycling the Web with guIded REwrite, a method to enrich low-quality documents so that they could become useful for training. This in turn allows us to increase the representation of synthetic data in the final pre-training set. Experiments at 1B, 3B and 7B scales of the DCLM benchmark show that mixing high-quality raw texts and our rewritten texts leads to improvements of 1.0, 1.3 and 2.5 percentage points respectively across 22 diverse tasks, compared to training on only filtered web data. Training on the raw-synthetic data mix is also more effective than having access to 2x web data. Through further analysis, we demonstrate that about 82% of the mixed-in texts come from transforming lower-quality documents that would otherwise be discarded. REWIRE also outperforms related approaches for generating synthetic data, including Wikipedia-style paraphrasing, question-answer synthesizing and knowledge extraction. These results suggest that recycling web texts has the potential to be a simple and effective approach for scaling pre-training data.


Article 2036

Title@2025-06-05 (4): UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules

Title: UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules UniSim: Ein einheitlicher Simulator für zeitgesteuerte Dynamik von Biomolekülen UniSim: 生物分子时间分析分析动态的统一模拟器 2506.03157v2

Authors (3): Ziyang Yu, Wenbing Huang, Yang Liu

Molecular Dynamics (MD) simulations are essential for understanding the atomic-level behavior of molecular systems, giving insights into their transitions and interactions. However, classical MD techniques are limited by the trade-off between accuracy and efficiency, while recent deep learning-based improvements have mostly focused on single-domain molecules, lacking transferability to unfamiliar molecular systems. Therefore, we propose Unified Simulator (UniSim), which leverages cross-domain knowledge to enhance the understanding of atomic interactions. First, we employ a multi-head pretraining approach to learn a unified atomic representation model from a large and diverse set of molecular data. Then, based on the stochastic interpolant framework, we learn the state transition patterns over long timesteps from MD trajectories, and introduce a force guidance module for rapidly adapting to different chemical environments. Our experiments demonstrate that UniSim achieves highly competitive performance across small molecules, peptides, and proteins.


Article 2037

Title@2025-06-05 (4): Model Predictive Control is Almost Optimal for Restless Bandit

Title: Model Predictive Control is Almost Optimal for Restless Bandit Modell Predictive Control ist fast optimal für Restless Bandit 模型预测控制几乎是无休止强盗的最佳模式 2410.06307v2

Authors (2): Nicolas Gast, Dheeraj Narasimha

We consider the discrete-time, infinite-horizon, average-reward restless Markovian bandit (RMAB) problem. We propose a model predictive control based non-stationary policy with a rolling computational horizon $\tau$. At each time-slot, this policy solves a $\tau$ horizon linear program whose first control value is kept as a control for the RMAB. Our solution requires minimal assumptions and quantifies the loss in optimality in terms of $\tau$ and the number of arms, $N$. We show that its sub-optimality gap is $O(1/\sqrt{N})$ in general, and $\exp(-\Omega(N))$ under a local-stability condition. Our proof is based on a framework from dynamic control known as dissipativity. Our solution is easy to implement and performs very well in practice when compared to the state of the art. Further, both our solution and our proof methodology can easily be generalized to more general constrained MDP settings and should thus be of great interest to the burgeoning RMAB community.


Article 2038

Title@2025-06-05 (4): Neural Discovery in Mathematics: Do Machines Dream of Colored Planes?

Title: Neural Discovery in Mathematics: Do Machines Dream of Colored Planes? Neural Discovery in der Mathematik: Träumen Maschinen von farbigen Flugzeugen? 数学中的神经发现:机器梦到彩色平面吗? 2501.18527v3

Authors (5): Konrad Mundinger, Max Zimmer, Aldo Kiem, Christoph Spiegel, Sebastian Pokutta

We demonstrate how neural networks can drive mathematical discovery through a case study of the Hadwiger-Nelson problem, a long-standing open problem at the intersection of discrete geometry and extremal combinatorics that is concerned with coloring the plane while avoiding monochromatic unit-distance pairs. Using neural networks as approximators, we reformulate this mixed discrete-continuous geometric coloring problem with hard constraints as an optimization task with a probabilistic, differentiable loss function. This enables gradient-based exploration of admissible configurations, which, most significantly, led to the discovery of two novel six-colorings, providing the first improvement in thirty years to the off-diagonal variant of the original problem. Here, we establish the underlying machine learning approach used to obtain these results and demonstrate its broader applicability through additional numerical insights.
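
One plausible form of such a probabilistic loss can be sketched as follows; the exact loss used in the paper is not given in the abstract, so this is an assumption. If a coloring assigns each point a probability vector over colors, then the chance that two points independently receive the same color is the inner product of their vectors, and averaging this over sampled unit-distance pairs gives a quantity that is differentiable in the probabilities. In the paper a neural network would produce the probabilities; here two hand-written stand-ins, `uniform_coloring` and `single_color`, are used, and all names are hypothetical.

```python
import math
import random


def conflict_probability(p_x, p_y):
    """Probability that two points get the same color under independent sampling."""
    return sum(a * b for a, b in zip(p_x, p_y))


def unit_distance_loss(coloring, n_pairs=1000, box=5.0, seed=0):
    """Average conflict probability over random pairs at distance exactly one."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        x = (rng.uniform(-box, box), rng.uniform(-box, box))
        angle = rng.uniform(0.0, 2.0 * math.pi)
        y = (x[0] + math.cos(angle), x[1] + math.sin(angle))
        total += conflict_probability(coloring(x), coloring(y))
    return total / n_pairs


def uniform_coloring(point, n_colors=6):
    """Every point is equally likely to receive any of the colors."""
    return [1.0 / n_colors] * n_colors


def single_color(point, n_colors=6):
    """Every point receives the first color with certainty."""
    return [1.0] + [0.0] * (n_colors - 1)


uniform_loss = unit_distance_loss(uniform_coloring)
single_loss = unit_distance_loss(single_color)
print(uniform_loss, single_loss)
```

A valid coloring would drive this loss to zero, so gradient descent on a parametrized `coloring` searches for configurations with no monochromatic unit-distance pairs.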


Article 2039

Title@2025-06-05 (4): Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction

Title: Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction Diffusion mit einem sprachlichen Kompass: Steuerung der Generation klinisch plausibler Zukunft sMRI-Darstellungen für frühe MCI-Konversionsvorhersage 语言指南的传播:指导为早期MCI转换预测生成临床可视未来SMRI代表 2506.05428v1

Authors (4): Zhihao Tang, Chaozhuo Li, Litian Zhang, Xi Zhang

Early prediction of Mild Cognitive Impairment (MCI) conversion is hampered by a trade-off between immediacy (making fast predictions from a single baseline sMRI) and accuracy (leveraging longitudinal scans to capture disease progression). We propose MCI-Diff, a diffusion-based framework that synthesizes clinically plausible future sMRI representations directly from baseline data, achieving both real-time risk assessment and high predictive performance. First, a multi-task sequence reconstruction strategy trains a shared denoising network on interpolation and extrapolation tasks to handle irregular follow-up sampling and learn robust latent trajectories. Second, an LLM-driven "linguistic compass" is introduced for clinical plausibility sampling: generated feature candidates are quantized, tokenized, and scored by a fine-tuned language model conditioned on expected structural biomarkers, guiding autoregressive generation toward realistic disease patterns. Experiments on ADNI and AIBL cohorts show that MCI-Diff outperforms state-of-the-art baselines, improving early conversion accuracy by 5-12%.



Article 2040

Title@2025-06-05 (4): Urania: Differentially Private Insights into AI Use

Title: Urania: Differentially Private Insights into AI Use Urania: Unterschiedliche private Einblicke in die KI-Nutzung Urania: 差异私人透视转化为 AI 使用 2506.04681v1

Authors (11): Daogao Liu, Edith Cohen, Badih Ghazi, Peter Kairouz, Pritish Kamath, Alexander Knop, Ravi Kumar, Pasin Manurangsi, Adam Sealfon, Da Yu, Chiyuan Zhang

We introduce Urania, a novel framework for generating insights about LLM chatbot interactions with rigorous differential privacy (DP) guarantees. The framework employs a private clustering mechanism and innovative keyword extraction methods, including frequency-based, TF-IDF-based, and LLM-guided approaches. By leveraging DP tools such as clustering, partition selection, and histogram-based summarization, Urania provides end-to-end privacy protection. Our evaluation assesses lexical and semantic content preservation, pair similarity, and LLM-based metrics, benchmarking against a non-private Clio-inspired pipeline (Tamkin et al., 2024). Moreover, we develop a simple empirical privacy evaluation that demonstrates the enhanced robustness of our DP pipeline. The results show the framework’s ability to extract meaningful conversational insights while maintaining stringent user privacy, effectively balancing data utility with privacy preservation.



Article 2041

Title@2025-06-05 (4): The cost of ensembling: is it always worth combining?

Title: The cost of ensembling: is it always worth combining? Die Kosten für die Montage: lohnt es sich immer zu kombinieren? 集合的成本:总值得合并吗? 2506.04677v1

Authors (1): Marco Zanotti

Given the continuous increase in dataset sizes and the complexity of forecasting models, the trade-off between forecast accuracy and computational cost is emerging as an extremely relevant topic, especially in the context of ensemble learning for time series forecasting. To assess it, we evaluated ten base models and eight ensemble configurations across two large-scale retail datasets (M5 and VN1), considering both point and probabilistic accuracy under varying retraining frequencies. We showed that ensembles consistently improve forecasting performance, particularly in probabilistic settings. However, these gains come at a substantial computational cost, especially for larger, accuracy-driven ensembles. We found that reducing retraining frequency significantly lowers costs, with minimal impact on accuracy, particularly for point forecasts. Moreover, efficiency-driven ensembles offer a strong balance, achieving competitive accuracy with considerably lower costs compared to accuracy-optimized combinations. Most importantly, small ensembles of two or three models are often sufficient to achieve near-optimal results. These findings provide practical guidelines for deploying scalable and cost-efficient forecasting systems, supporting the broader goals of sustainable AI in forecasting. Overall, this work shows that careful ensemble design and retraining strategy selection can yield accurate, robust, and cost-effective forecasts suitable for real-world applications.



Article 2042

Title@2025-06-05 (4): Gen-n-Val: Agentic Image Data Generation and Validation

Title: Gen-n-Val: Agentic Image Data Generation and Validation Gen-n-Val: Gen-n-Val: Agentische Bilddatengenerierung und -validierung Gen-n-Val: 代理图像数据生成和校验 2506.04676v1

Authors (5): Jing-En Huang, I-Sheng Fang, Tzuhsuan Huang, Chih-Yu Wang, Jun-Cheng Chen

Recently, Large Language Models (LLMs) and Vision Large Language Models (VLLMs) have demonstrated impressive performance as agents across various tasks, while data scarcity and label noise remain significant challenges in computer vision tasks such as object detection and instance segmentation. A common solution for resolving these issues is to generate synthetic data. However, current synthetic data generation methods struggle with issues such as multiple objects per mask, inaccurate segmentation, and incorrect category labels, limiting their effectiveness. To address these issues, we introduce Gen-n-Val, a novel agentic data generation framework that leverages Layer Diffusion (LD), LLMs, and VLLMs to produce high-quality, single-object masks and diverse backgrounds. Gen-n-Val consists of two agents: (1) The LD prompt agent, an LLM, optimizes prompts for LD to generate high-quality foreground instance images and segmentation masks. These optimized prompts ensure the generation of single-object synthetic data with precise instance masks and clean backgrounds. (2) The data validation agent, a VLLM, filters out low-quality synthetic instance images. The system prompts for both agents are refined through TextGrad. Additionally, we use image harmonization to combine multiple instances within scenes. Compared to state-of-the-art synthetic data approaches like MosaicFusion, our approach reduces invalid synthetic data from 50% to 7% and improves performance by 1% mAP on rare classes in COCO instance segmentation with YOLOv9c and YOLO11m. Furthermore, Gen-n-Val shows significant improvements (7.1% mAP) over YOLO-Worldv2-M in open-vocabulary object detection benchmarks with YOLO11m. Moreover, Gen-n-Val improves the performance of YOLOv9 and YOLO11 families in instance segmentation and object detection.



Article 2043

Title@2025-06-05 (4): Psi-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models

Title: Psi-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models Psi-Sampler: Erste Partikel-Probenahme für SMC-basierte Inferenz-Zeit-Belohnung in Score-Modellen Psi-Sampler: 分数模型中基于 SMC 的推理-时间回报比对初始粒子抽样 2506.01320v2

Authors (4): Taehoon Yoon, Yunhong Min, Kyeongmin Yeo, Minhyuk Sung

We introduce $\Psi$-Sampler, an SMC-based framework incorporating pCNL-based initial particle sampling for effective inference-time reward alignment with a score-based generative model. Inference-time reward alignment with score-based generative models has recently gained significant traction, following a broader paradigm shift from pre-training to post-training optimization. At the core of this trend is the application of Sequential Monte Carlo (SMC) to the denoising process. However, existing methods typically initialize particles from the Gaussian prior, which inadequately captures reward-relevant regions and results in reduced sampling efficiency. We demonstrate that initializing from the reward-aware posterior significantly improves alignment performance. To enable posterior sampling in high-dimensional latent spaces, we introduce the preconditioned Crank-Nicolson Langevin (pCNL) algorithm, which combines dimension-robust proposals with gradient-informed dynamics. This approach enables efficient and scalable posterior sampling and consistently improves performance across various reward alignment tasks, including layout-to-image generation, quantity-aware generation, and aesthetic-preference generation, as demonstrated in our experiments. Project Webpage: https://psi-sampler.github.io/
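
To make the dimension-robust proposal concrete, here is a minimal sketch of the plain preconditioned Crank-Nicolson (pCN) sampler, which is the gradient-free relative of pCNL and not the authors' algorithm. It targets a density proportional to a standard Gaussian prior times exp(reward / alpha); the names `pcn_sample`, `toy_reward`, `beta`, and `alpha` are illustrative assumptions, and the quadratic reward is a toy stand-in for a learned reward model.

```python
import numpy as np

def pcn_sample(reward, dim, n_steps=500, beta=0.2, alpha=1.0, seed=0):
    """Plain pCN sampler for a target proportional to N(0, I) * exp(reward(x) / alpha).

    Simplified stand-in for pCNL: the proposal uses no gradient information.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)  # initialize from the Gaussian prior
    log_w = reward(x) / alpha
    accepted = 0
    for _ in range(n_steps):
        # pCN proposal: preserves the N(0, I) prior exactly, in any dimension
        prop = np.sqrt(1.0 - beta**2) * x + beta * rng.standard_normal(dim)
        log_w_prop = reward(prop) / alpha
        # The prior terms cancel, so only the reward enters the acceptance ratio
        if np.log(rng.uniform()) < log_w_prop - log_w:
            x, log_w = prop, log_w_prop
            accepted += 1
    return x, accepted / n_steps

def toy_reward(x):
    # Hypothetical reward that prefers points near the all-ones vector
    return -0.5 * np.sum((x - 1.0) ** 2)

sample, acc_rate = pcn_sample(toy_reward, dim=16)
```

Because the proposal leaves the prior invariant, the acceptance rule depends only on the change in reward, which is what keeps the acceptance rate from collapsing as the dimension grows. pCNL additionally shifts the proposal along the gradient of the log-likelihood term.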



Article 2044

Title@2025-06-05 (4): TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization

Title: TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization TANGO: Clustering mit Typicality-Aware Nonlocal Mode-Seeking und Graph-Cut-Optimierung TANGO:以非本地模式搜索和图集优化组合组合 2408.10084v2

Authors (3): Haowen Ma, Zhiguo Long, Hua Meng

Density-based mode-seeking methods generate a density-ascending dependency from low-density points towards higher-density neighbors. Current mode-seeking methods identify modes by breaking some dependency connections, but they rely heavily on local data characteristics and require case-by-case threshold settings or human intervention to be effective across different datasets. To address this issue, we introduce a novel concept called typicality, which examines the locally defined dependency from a global perspective to quantify how confidently a point can be regarded as a mode. We devise an algorithm that effectively and efficiently identifies modes with the help of the global-view typicality. To implement and validate our idea, we design a clustering method called TANGO, which not only leverages typicality to detect modes, but also utilizes graph-cut with an improved path-based similarity to aggregate data into the final clusters. Moreover, this paper also provides some theoretical analysis of the proposed algorithm. Experimental results on several synthetic and extensive real-world datasets demonstrate the effectiveness and superiority of TANGO. The code is available at https://github.com/SWJTU-ML/TANGO_code.



Article 2045

Title@2025-06-05 (4): MTPNet: Multi-Grained Target Perception for Unified Activity Cliff Prediction

Title: MTPNet: Multi-Grained Target Perception for Unified Activity Cliff Prediction MTPNet: Multi-Grained Target Perception für Unified Activity Cliff Prediction MTPNet: 统一活动悬崖预测的多重目标感知 2506.05427v1

Authors (5): Zishan Shu, Yufan Deng, Hongyu Zhang, Zhiwei Nie, Jie Chen

Activity cliff prediction is a critical task in drug discovery and material design. Existing computational methods are limited to handling single binding targets, which restricts the applicability of these prediction models. In this paper, we present the Multi-Grained Target Perception network (MTPNet) to incorporate the prior knowledge of interactions between the molecules and their target proteins. Specifically, MTPNet is a unified framework for activity cliff prediction, which consists of two components: Macro-level Target Semantic (MTS) guidance and Micro-level Pocket Semantic (MPS) guidance. In this way, MTPNet dynamically optimizes molecular representations through multi-grained protein semantic conditions. To our knowledge, this is the first work to employ the receptor proteins as guiding information to effectively capture critical interaction details. Extensive experiments on 30 representative activity cliff datasets demonstrate that MTPNet significantly outperforms previous approaches, achieving an average RMSE improvement of 18.95% on top of several mainstream GNN architectures. Overall, MTPNet internalizes interaction patterns through conditional deep learning to achieve unified predictions of activity cliffs, helping to accelerate compound optimization and design. Code is available at: https://github.com/ZishanShu/MTPNet.



Article 2046

Title@2025-06-05 (4): FedAPM: Federated Learning via ADMM with Partial Model Personalization

Title: FedAPM: Federated Learning via ADMM with Partial Model Personalization FedAPM: Federated Learning via ADMM mit Teilmodell-Personalisierung FedAPP:通过具有部分示范性人格化的ADMMD学习联合会 2506.04672v1

Authors (9): Shengkun Zhu, Feiteng Nie, Jinshan Zeng, Sheng Wang, Yuan Sun, Yuan Yao, Shangfeng Chen, Quanqing Xu, Chuanhui Yang

In federated learning (FL), the assumption that datasets from different devices are independent and identically distributed (i.i.d.) often does not hold due to user differences, and the presence of various data modalities across clients makes using a single model impractical. Personalizing certain parts of the model can effectively address these issues by allowing those parts to differ across clients, while the remaining parts serve as a shared model. However, we found that partial model personalization may exacerbate client drift (each client’s local model diverges from the shared model), thereby reducing the effectiveness and efficiency of FL algorithms. We propose an FL framework based on the alternating direction method of multipliers (ADMM), referred to as FedAPM, to mitigate client drift. We construct the augmented Lagrangian function by incorporating first-order and second-order proximal terms into the objective, with the second-order term providing fixed correction and the first-order term offering compensatory correction between the local and shared models. Our analysis demonstrates that FedAPM, by using explicit estimates of the Lagrange multiplier, is more stable and efficient in terms of convergence compared to other FL frameworks. We establish the global convergence of FedAPM training from arbitrary initial points to a stationary point, achieving three types of rates: constant, linear, and sublinear, under mild assumptions. We conduct experiments using four heterogeneous and multimodal datasets with different metrics to validate the performance of FedAPM. Specifically, FedAPM achieves faster and more accurate convergence, outperforming the SOTA methods with average improvements of 12.3% in test accuracy, 16.4% in F1 score, and 18.0% in AUC while requiring fewer communication rounds.
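
For orientation, the textbook consensus-ADMM augmented Lagrangian is shown below. This is a generic form under assumed notation ($f_i$ for client $i$'s local loss, $w_i$ for its local parameters, $w$ for the shared model, $\lambda_i$ for the multiplier, $\rho > 0$ for the penalty weight, $m$ clients); it is not FedAPM's exact objective, which the abstract does not spell out.

```latex
\mathcal{L}_\rho\bigl(\{w_i\}, w, \{\lambda_i\}\bigr)
  = \sum_{i=1}^{m} \Bigl[
      f_i(w_i)
      + \langle \lambda_i,\, w_i - w \rangle
      + \frac{\rho}{2}\,\lVert w_i - w \rVert_2^2
    \Bigr]
```

One plausible reading is that the linear multiplier term plays the role of the first-order correction and the quadratic penalty that of the second-order correction; both pull each local model $w_i$ back toward the shared model $w$, which is the mechanism that counteracts client drift.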



Article 2047

Title@2025-06-05 (4): Noise-Resistant Label Reconstruction Feature Selection for Partial Multi-Label Learning

Title: Noise-Resistant Label Reconstruction Feature Selection for Partial Multi-Label Learning Noise-Resistent Label Reconstruction Feature Selection for Partial Multi-Label Learning 部分多标签学习选择部分多标签学习的恢复标签功能 2506.04669v1

Authors (4): Wanfu Gao, Hanlin Pan, Qingqi Han, Kunpeng Liu

The “curse of dimensionality” is prevalent across various data patterns; it increases the risk of model overfitting and leads to a decline in classification performance. However, few studies have focused on this issue in Partial Multi-label Learning (PML), where each sample is associated with a set of candidate labels, at least one of which is correct. Existing PML methods addressing this problem are mainly based on the low-rank assumption. However, the low-rank assumption is difficult to satisfy in practical situations and may lead to loss of high-dimensional information. Furthermore, we find that existing methods have poor ability to identify positive labels, which is important in real-world scenarios. In this paper, a PML feature selection method is proposed that considers two important characteristics of the dataset: the noise-resistance of label relationships and label connectivity. Our proposed method utilizes the noise-resistance of label relationships to disambiguate labels. Then the learning process is designed through the reformed low-rank assumption. Finally, representative labels are found through label connectivity, and the weight matrix is reconstructed to select features with strong ability to identify these labels. The experimental results on benchmark datasets demonstrate the superiority of the proposed method.



Article 2048

Title@2025-06-05 (4): Mixture-of-Experts Meets In-Context Reinforcement Learning

Title: Mixture-of-Experts Meets In-Context Reinforcement Learning Mixture-of-Experts trifft auf In-Context-Verstärktes Lernen 混合专家会议 文中加强学习 2506.05426v1

Authors (7): Wenhao Wu, Fuhong Liu, Haoru Li, Zican Hu, Daoyi Dong, Chunlin Chen, Zhi Wang

In-context reinforcement learning (ICRL) has emerged as a promising paradigm for adapting RL agents to downstream tasks through prompt conditioning. However, two notable challenges remain in fully harnessing in-context learning within RL domains: the intrinsic multi-modality of the state-action-reward data and the diverse, heterogeneous nature of decision tasks. To tackle these challenges, we propose T2MIR (Token- and Task-wise MoE for In-context RL), an innovative framework that introduces architectural advances of mixture-of-experts (MoE) into transformer-based decision models. T2MIR substitutes the feedforward layer with two parallel layers: a token-wise MoE that captures distinct semantics of input tokens across multiple modalities, and a task-wise MoE that routes diverse tasks to specialized experts for managing a broad task distribution with alleviated gradient conflicts. To enhance task-wise routing, we introduce a contrastive learning method that maximizes the mutual information between the task and its router representation, enabling more precise capture of task-relevant information. The outputs of the two MoE components are concatenated and fed into the next layer. Comprehensive experiments show that T2MIR significantly improves in-context learning capacity and outperforms various types of baselines. We bring the potential and promise of MoE to ICRL, offering a simple and scalable architectural enhancement to advance ICRL one step closer toward achievements in language and vision communities. Our code is available at https://github.com/NJU-RL/T2MIR.



Article 2049

Title@2025-06-05 (4): Multiple Invertible and Partial-Equivariant Function for Latent Vector Transformation to Enhance Disentanglement in VAEs

Title: Multiple Invertible and Partial-Equivariant Function for Latent Vector Transformation to Enhance Disentanglement in VAEs Multiple Invertible und Partial-Equivariant Funktion für Latent Vector Transformation zur Verbesserung der Disentanglementierung in VAEs 用于中子矢量变异的多重可垂直和部分等同函数,以强化 VAEs 的分解 2502.03740v2

Authors (3): Hee-Jun Jung, Jaehyoung Jeong, Kangil Kim

Disentanglement learning is a core issue for understanding and re-using trained information in Variational AutoEncoder (VAE), and effective inductive bias has been reported as a key factor. However, the actual implementation of such bias is still vague. In this paper, we propose a novel method, called Multiple Invertible and partial-equivariant transformation (MIPE-transformation), to inject inductive bias by 1) guaranteeing the invertibility of latent-to-latent vector transformation while preserving a certain portion of equivariance of input-to-latent vector transformation, called Invertible and partial-equivariant transformation (IPE-transformation), 2) extending the form of prior and posterior in VAE frameworks to an unrestricted form through a learnable conversion to an approximated exponential family, called Exponential Family conversion (EF-conversion), and 3) integrating multiple units of IPE-transformation and EF-conversion, and their training. In experiments on 3D Cars, 3D Shapes, and dSprites datasets, MIPE-transformation improves the disentanglement performance of state-of-the-art VAEs.



Article 2050

Title@2025-06-05 (4): Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning

Title: Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning Latent Safety-Constrained Policy Approach für sicheres Offline-Verstärkungslernen 安全离线强化安全学习受安保约束的政策办法 2412.08794v2

Authors (4): Prajwal Koirala, Zhanhong Jiang, Soumik Sarkar, Cody Fleming

In safe offline reinforcement learning (RL), the objective is to develop a policy that maximizes cumulative rewards while strictly adhering to safety constraints, utilizing only offline data. Traditional methods often face difficulties in balancing these constraints, leading to either diminished performance or increased safety risks. We address these issues with a novel approach that begins by learning a conservatively safe policy through the use of Conditional Variational Autoencoders, which model the latent safety constraints. Subsequently, we frame this as a Constrained Reward-Return Maximization problem, wherein the policy aims to optimize rewards while complying with the inferred latent safety constraints. This is achieved by training an encoder with a reward-Advantage Weighted Regression objective within the latent constraint space. Our methodology is supported by theoretical analysis, including bounds on policy performance and sample complexity. Extensive empirical evaluation on benchmark datasets, including challenging autonomous driving scenarios, demonstrates that our approach not only maintains safety compliance but also excels in cumulative reward optimization, surpassing existing methods. Additional visualizations provide further insights into the effectiveness and underlying mechanisms of our approach.



Article 2051

Title@2025-06-05 (4): Can Artificial Intelligence Trade the Stock Market?

Title: Can Artificial Intelligence Trade the Stock Market? Kann Künstliche Intelligenz den Aktienmarkt handeln? 人工智能贸易 股市? 2506.04658v1

Authors (2): Jędrzej Maskiewicz, Paweł Sakowski

The paper explores the use of Deep Reinforcement Learning (DRL) in stock market trading, focusing on two algorithms, Double Deep Q-Network (DDQN) and Proximal Policy Optimization (PPO), and compares them with a Buy and Hold benchmark. It evaluates these algorithms across three currency pairs, the S&P 500 index, and Bitcoin, on daily data from 2019 to 2023. The results demonstrate DRL’s effectiveness in trading and its ability to manage risk by strategically avoiding trades in unfavorable conditions, providing a substantial edge in risk-adjusted returns over classical approaches based on supervised learning.



Article 2052

Title@2025-06-05 (4): The Oversmoothing Fallacy: A Misguided Narrative in GNN Research

Title: The Oversmoothing Fallacy: A Misguided Narrative in GNN Research Der überwältigende Fehler: Ein fehlgeleitetes Narrativ in der GNN-Forschung 《压倒性代价:GNN研究中的误导性叙述》 2506.04653v1

Authors (5): MoonJeong Park, Sunghyun Choi, Jaeseung Heo, Eunhyeok Park, Dongwoo Kim

Oversmoothing has been recognized as a main obstacle to building deep Graph Neural Networks (GNNs), limiting their performance. This position paper argues that the influence of oversmoothing has been overstated and advocates for further exploration of deep GNN architectures. Given the three core operations of GNNs (aggregation, linear transformation, and non-linear activation), we show that prior studies have confused oversmoothing with the vanishing gradient problem, which is caused by transformation and activation rather than aggregation. Our finding challenges prior beliefs about oversmoothing being unique to GNNs. Furthermore, we demonstrate that classical solutions such as skip connections and normalization enable the successful stacking of deep GNN layers without performance degradation. Our results clarify misconceptions about oversmoothing and shed new light on the potential of deep GNNs.



Article 2053

Title@2025-06-05 (4): Active Learning with Variational Quantum Circuits for Quantum Process Tomography

Title: Active Learning with Variational Quantum Circuits for Quantum Process Tomography Aktives Lernen mit unterschiedlichen Quantenkreisen für die Quantenprozesstomographie 使用量子过程地形图谱学量子变量电路进行主动学习 2412.20925v2

Authors (3): Jiaqi Yang, Xiaohua Xu, Wei Xie

Quantum process tomography (QPT) is a fundamental tool for fully characterizing quantum systems. It relies on querying a set of quantum states as input to the quantum process. Previous QPT methods typically employ a straightforward strategy for randomly selecting quantum states, overlooking differences in informativeness among them. In this work, we propose a general active learning (AL) framework that adaptively selects the most informative subset of quantum states for reconstruction. We design and evaluate various AL algorithms and provide practical guidelines for selecting suitable methods in different scenarios. In particular, we introduce a learning framework that leverages the widely-used variational quantum circuits (VQCs) to perform the QPT task and integrate our AL algorithms into the query step. We demonstrate our algorithms by reconstructing the unitary quantum processes resulting from random quantum circuits with up to seven qubits. Numerical results show that our AL algorithms achieve significantly improved reconstruction, and the improvement increases with the size of the underlying quantum system. Our work opens new avenues for further advancing existing QPT methods.



Article 2054

Title@2025-06-05 (4): Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction

Title: Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction Neural Network Reprogrammierbarkeit: Ein einheitliches Thema auf Modellumprogrammierung, Prompt-Tuning und Prompt-Instruktion 神经网络重组:关于模式重组、即时提款和即时指示的统一主题 2506.04650v1

Authors (7): Zesheng Ye, Chengyi Cai, Ruijiang Dong, Jianzhong Qi, Lei Feng, Pin-Yu Chen, Feng Liu

As large-scale pre-trained foundation models continue to expand in size and capability, efficiently adapting them to specific downstream tasks has become increasingly critical. Despite substantial progress, existing adaptation approaches have evolved largely in isolation, without a clear understanding of their interrelationships. This survey introduces neural network reprogrammability as a unifying framework that bridges mainstream model adaptation techniques (model reprogramming, prompt tuning, and prompt instruction), previously fragmented research areas that nevertheless converge on a shared principle: repurposing a pre-trained model by manipulating information at the interfaces while keeping the model parameters frozen. These methods exploit neural networks’ sensitivity to manipulation on different interfaces, be it through perturbing inputs, inserting tokens into intermediate layers, or providing task-specific examples in context, to redirect model behaviors towards desired outcomes. We then present a taxonomy that categorizes such information manipulation-based adaptation approaches across four key dimensions: manipulation format (fixed or learnable), location (interfaces where manipulations occur), operator (how they are applied), and output alignment requirement (post-processing needed to align outputs with downstream tasks). Notably, this framework applies consistently across data modalities, independent of specific model architectures. Moreover, viewing established techniques like in-context learning and chain-of-thought prompting through this lens reveals both their theoretical connections and practical distinctions. We further analyze remaining technical challenges and ethical considerations, positioning neural network reprogrammability as a fundamental paradigm for efficient model adaptation. Lastly, we identify promising research directions emerging from this integrative viewpoint.



Article 2055

Title@2025-06-05 (4): Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing

Title: Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing Nicht alle Optionen sind gleich: Textuelle Optionsgewichtung für Token-Efficient LLM-basierte Wissensverfolgung 并非所有选项都创建为等: Token- Efficient LLM 知识追踪的文本选项比重 2410.12872v2

Authors (4): JongWoo Kim, SeongYeub Chu, Bryan Wong, Mun Yi

Large Language Models (LLMs) have recently emerged as promising tools for knowledge tracing (KT) due to their strong reasoning and generalization abilities. While recent LLM-based KT methods have proposed new prompt formats, they struggle to represent the full interaction histories of example learners within a single prompt during in-context learning (ICL), resulting in limited scalability and high computational cost under token constraints. In this work, we present LLM-based Option-weighted Knowledge Tracing (LOKT), a simple yet effective framework that encodes the interaction histories of example learners in context as textual categorical option weights (TCOW). TCOW are semantic labels (e.g., “inadequate”) assigned to the options selected by learners when answering questions, enhancing the interpretability of LLMs. Experiments on multiple-choice datasets show that LOKT outperforms existing non-LLM and LLM-based KT models in both cold-start and warm-start settings. Moreover, LOKT enables scalable and cost-efficient inference, achieving strong performance even under strict token constraints. Our code is available at https://anonymous.4open.science/r/LOKT_model-3233.



Article 2056

Title@2025-06-05 (4): ActivePusher: Active Learning and Planning with Residual Physics for Nonprehensile Manipulation

Title: ActivePusher: Active Learning and Planning with Residual Physics for Nonprehensile Manipulation ActivePusher: Aktives Lernen und Planen mit Residual Physics für nichtprähensile Manipulation 主动推进器: 主动学习和规划,对非传染病操纵进行残余物理研究与规划 2506.04646v1

Authors (3): Zhuoyun Zhong, Seyedali Golestaneh, Constantinos Chamzas

Planning with learned dynamics models offers a promising approach toward real-world, long-horizon manipulation, particularly in nonprehensile settings such as pushing or rolling, where accurate analytical models are difficult to obtain. Although learning-based methods hold promise, collecting training data can be costly and inefficient, as it often relies on randomly sampled interactions that are not necessarily the most informative. To address this challenge, we propose ActivePusher, a novel framework that combines residual-physics modeling with kernel-based uncertainty-driven active learning to focus data acquisition on the most informative skill parameters. Additionally, ActivePusher seamlessly integrates with model-based kinodynamic planners, leveraging uncertainty estimates to bias control sampling toward more reliable actions. We evaluate our approach in both simulation and real-world environments and demonstrate that it improves data efficiency and planning success rates compared to baseline methods.



Article 2057

Title@2025-06-05 (4): Inference economics of language models

Title: Inference economics of language models Schlußfolgerung Wirtschaftlichkeit von Sprachmodellen 语言模式的推论经济学 2506.04645v1

Authors (1): Ege Erdil

We develop a theoretical model that addresses the economic trade-off between cost per token and serial token generation speed when deploying LLMs for inference at scale. Our model takes into account arithmetic, memory bandwidth, network bandwidth, and latency constraints, and optimizes over different parallelism setups and batch sizes to find those that maximize serial inference speed at a given cost per token. We use the model to compute Pareto frontiers of serial speed versus cost per token for popular language models.
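
As an illustration of the final step only, the sketch below extracts a Pareto frontier from a set of candidate deployment configurations, each summarized as a (cost per token, tokens per second) pair. The function name `pareto_frontier` and all numbers are made-up assumptions; the paper's cost and speed model is not reproduced here.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cost_per_token, tokens_per_second)

def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point.

    Lower cost is better and higher speed is better.
    """
    frontier: List[Point] = []
    best_speed = float("-inf")
    # Scan by increasing cost; among equal costs, visit the fastest first
    for cost, speed in sorted(points, key=lambda p: (p[0], -p[1])):
        # Keep a point only if it is faster than everything cheaper or equal
        if speed > best_speed:
            frontier.append((cost, speed))
            best_speed = speed
    return frontier

# Hypothetical configurations (e.g. different batch sizes and parallelism setups)
configs = [(1.0, 20.0), (2.0, 50.0), (2.5, 40.0), (4.0, 90.0), (4.0, 70.0), (6.0, 85.0)]
frontier = pareto_frontier(configs)
print(frontier)  # [(1.0, 20.0), (2.0, 50.0), (4.0, 90.0)]
```

Each surviving point answers the question the abstract poses: the fastest serial speed attainable at or below a given cost per token.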



Article 2058

Title@2025-06-05 (4): MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning

Title: MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning MoDoMoDo: Multi-Domain-Datenmischungen für multimodales LLM-Verstärkungslernen MoDoMoMoDoDo:多式LLM强化学习多功能数据混合体 2505.24871v2

Authors (10): Yiqing Liang, Jielin Qiu, Wenhao Ding, Zuxin Liu, James Tompkin, Mengdi Xu, Mengzhou Xia, Zhengzhong Tu, Laixi Shi, Jiacheng Zhu

Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a powerful paradigm for post-training large language models (LLMs), achieving state-of-the-art performance on tasks with structured, verifiable answers. Applying RLVR to Multimodal LLMs (MLLMs) presents significant opportunities but is complicated by the broader, heterogeneous nature of vision-language tasks that demand nuanced visual, logical, and spatial capabilities. As such, training MLLMs using RLVR on multiple datasets could be beneficial but creates challenges with conflicting objectives from interaction among diverse datasets, highlighting the need for optimal dataset mixture strategies to improve generalization and reasoning. We introduce a systematic post-training framework for Multimodal LLM RLVR, featuring a rigorous data mixture problem formulation and benchmark implementation. Specifically, (1) We developed a multimodal RLVR framework for multi-dataset post-training by curating a dataset that contains different verifiable vision-language problems and enabling multi-domain online RL learning with different verifiable rewards; (2) We proposed a data mixture strategy that learns to predict the RL fine-tuning outcome from the data mixture distribution, and consequently optimizes the best mixture. Comprehensive experiments showcase that multi-domain RLVR training, when combined with mixture prediction strategies, can significantly boost MLLM general reasoning capacities. Our best mixture improves the post-trained model’s accuracy on out-of-distribution benchmarks by an average of 5.24% compared to the same model post-trained with uniform data mixture, and by a total of 20.74% compared to the pre-finetuning baseline.


Article 2059

Title@2025-06-05 (4): Adversarial Combinatorial Semi-bandits with Graph Feedback

Title: Adversarial Combinatorial Semi-bandits with Graph Feedback Adversariale Kombinatoriale Halbbänder mit Graph Feedback 带有图图反馈的半斜面 2502.18826v4

Authors (1): Yuxiao Wen

In combinatorial semi-bandits, a learner repeatedly selects from a combinatorial decision set of arms, receives the realized sum of rewards, and observes the rewards of the individual selected arms as feedback. In this paper, we extend this framework to include graph feedback, where the learner observes the rewards of all neighboring arms of the selected arms in a feedback graph $G$. We establish that the optimal regret over a time horizon $T$ scales as $\widetilde{\Theta}(S\sqrt{T}+\sqrt{\alpha ST})$, where $S$ is the size of the combinatorial decisions and $\alpha$ is the independence number of $G$. This result interpolates between the known regrets $\widetilde\Theta(S\sqrt{T})$ under full information (i.e., $G$ is complete) and $\widetilde\Theta(\sqrt{KST})$ under the semi-bandit feedback (i.e., $G$ has only self-loops), where $K$ is the total number of arms. A key technical ingredient is to realize a convexified action using a random decision vector with negative correlations. We also show that online stochastic mirror descent (OSMD) that only realizes convexified actions in expectation is suboptimal.
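
As a quick check of the interpolation claim, the two extreme feedback graphs can be substituted into the stated rate. This is a worked substitution only, not part of the paper's proof. It uses three standard facts: a complete graph has independence number $\alpha = 1$, a graph with only self-loops has $\alpha = K$, and a decision of size $S$ uses at most $K$ arms, so $1 \le S \le K$.

```latex
\begin{align*}
\text{complete } G\ (\alpha = 1):\quad
  & S\sqrt{T} + \sqrt{1 \cdot ST} = S\sqrt{T} + \sqrt{ST} = \widetilde\Theta(S\sqrt{T})
  && \text{since } \sqrt{S} \le S, \\
\text{self-loops only } (\alpha = K):\quad
  & S\sqrt{T} + \sqrt{KST} = \widetilde\Theta(\sqrt{KST})
  && \text{since } S\sqrt{T} = \sqrt{S \cdot ST} \le \sqrt{KST}.
\end{align*}
```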


Article 2060

Title@2025-06-05 (4): An analytic theory of creativity in convolutional diffusion models

Title: An analytic theory of creativity in convolutional diffusion models Eine analytische Theorie der Kreativität in konvolutionären Diffusionsmodellen 革命扩散模型中的创造性分析理论 2412.20292v2

Authors (2): Mason Kamb, Surya Ganguli

We obtain an analytic, interpretable and predictive theory of creativity in convolutional diffusion models. Indeed, score-matching diffusion models can generate highly original images that lie far from their training data. However, optimal score-matching theory suggests that these models should only be able to produce memorized training examples. To reconcile this theory-experiment gap, we identify two simple inductive biases, locality and equivariance, that: (1) induce a form of combinatorial creativity by preventing optimal score-matching; (2) result in fully analytic, completely mechanistically interpretable, local score (LS) and equivariant local score (ELS) machines that, (3) after calibrating a single time-dependent hyperparameter, can quantitatively predict the outputs of trained convolution-only diffusion models (like ResNets and UNets) with high accuracy (median $r^2$ of $0.95, 0.94, 0.94, 0.96$ for our top model on CIFAR10, FashionMNIST, MNIST, and CelebA). Our model reveals a locally consistent patch mosaic mechanism of creativity, in which diffusion models create exponentially many novel images by mixing and matching different local training set patches at different scales and image locations. Our theory also partially predicts the outputs of pre-trained self-attention-enabled UNets (median $r^2 \sim 0.77$ on CIFAR10), revealing an intriguing role for attention in carving out semantic coherence from local patch mosaics.


Article 2061

Title@2025-06-05 (4): Composing Agents to Minimize Worst-case Risk

Title: Composing Agents to Minimize Worst-case Risk Komponieren von Agenten zur Minimierung des schlimmsten Risikos 尽量减少最坏情况风险的混合剂 2506.04632v1

Authors (2): Guruprerana Shabadi, Rajeev Alur

From software development to robot control, modern agentic systems decompose complex objectives into a sequence of subtasks and choose a set of specialized AI agents to complete them. We formalize an agentic workflow as a directed acyclic graph, called an agent graph, where edges represent AI agents and paths correspond to feasible compositions of agents. When deploying these systems in the real world, we need to choose compositions of agents that not only maximize the task success, but also minimize risk where the risk captures requirements like safety, fairness, and privacy. This additionally requires carefully analyzing the low-probability (tail) behaviors of compositions of agents. In this work, we consider worst-case risk minimization over the set of feasible agent compositions. We define worst-case risk as the tail quantile – also known as value-at-risk – of the loss distribution of the agent composition where the loss quantifies the risk associated with agent behaviors. We introduce an efficient algorithm that traverses the agent graph and finds a near-optimal composition of agents by approximating the value-at-risk via a union bound and dynamic programming. Furthermore, we prove that the approximation is near-optimal asymptotically for a broad class of practical loss functions. To evaluate our framework, we consider a suite of video game-like control benchmarks that require composing several agents trained with reinforcement learning and demonstrate our algorithm’s effectiveness in approximating the value-at-risk and identifying the optimal agent composition.
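
The idea of bounding a composition's value-at-risk with a union bound and then searching the agent graph by dynamic programming can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function names, the edge format, the integer node labels in topological order, and the choice to split the tail budget `delta` evenly over at most `max_path_len` agents are all hypothetical. The underlying fact is the union bound: if each agent's loss exceeds its own quantile with probability at most `delta / max_path_len`, the summed loss of a path exceeds the sum of those quantiles with probability at most `delta`. Empirical quantiles only approximate the true ones.

```python
import math

def empirical_quantile(samples, level):
    """Smallest sample v such that at least a `level` fraction of samples are <= v."""
    ordered = sorted(samples)
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]

def min_var_bound_path(edges, source, target, delta, max_path_len):
    """Find the path minimizing a union-bound estimate of value-at-risk.

    edges: list of (u, v, agent_name, loss_samples), with integer nodes
           numbered in topological order (u < v); each edge is one agent.
    Assumes no source-to-target path has more than max_path_len edges.
    Returns (bound, [agent names]) or None if the target is unreachable.
    """
    level = 1.0 - delta / max_path_len  # per-agent quantile level
    best = {source: (0.0, [])}          # node -> (best bound so far, agents used)
    # Visiting edges by increasing tail node means best[u] is final when used.
    for u, v, name, samples in sorted(edges, key=lambda e: e[0]):
        if u not in best:
            continue
        cost = best[u][0] + empirical_quantile(samples, level)
        if v not in best or cost < best[v][0]:
            best[v] = (cost, best[u][1] + [name])
    return best.get(target)

# Toy agent graph: two candidate agents for each of two subtasks.
toy_edges = [
    (0, 1, "a", [1, 2, 3, 10]),
    (0, 1, "b", [2, 2, 2, 3]),
    (1, 2, "c", [1, 1, 1, 1]),
    (1, 2, "d", [0, 0, 0, 5]),
]
result = min_var_bound_path(toy_edges, source=0, target=2, delta=0.5, max_path_len=2)
```

With these toy samples the per-agent level is 0.75, so the search compares the 75% empirical quantiles of each agent's losses rather than their means; an agent with a rare large loss is only penalized once that loss falls inside the chosen quantile.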


Article 2062

Title: PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models Vergifteter Papagei: Subtile Daten vergiften Angriffe auf Elicit Copyright-Verletzung Inhalt aus großen Sprachmodellen 有毒Parrot:大语言模型中侵犯版权内容的隐含数据毒害攻击 2503.07697v2

Authors (8): Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang

As the capabilities of large language models (LLMs) continue to expand, their usage has become increasingly prevalent. However, as reflected in numerous ongoing lawsuits regarding LLM-generated content, addressing copyright infringement remains a significant challenge. In this paper, we introduce PoisonedParrot: the first stealthy data poisoning attack that induces an LLM to generate copyrighted content even when the model has not been directly trained on the specific copyrighted material. PoisonedParrot integrates small fragments of copyrighted text into the poison samples using an off-the-shelf LLM. Despite its simplicity, evaluated in a wide range of experiments, PoisonedParrot is surprisingly effective at priming the model to generate copyrighted content with no discernible side effects. Moreover, we discover that existing defenses are largely ineffective against our attack. Finally, we make the first attempt at mitigating copyright-infringement poisoning attacks by proposing a defense: ParrotTrap. We encourage the community to explore this emerging threat model further.


Article 2063

Title@2025-06-05 (4): Constructive Symbolic Reinforcement Learning via Intuitionistic Logic and Goal-Chaining Inference

Title: Constructive Symbolic Reinforcement Learning via Intuitionistic Logic and Goal-Chaining Inference Konstruktives symbolisches Stärkungslernen durch intuitionistische Logik und zielführende Schlussfolgerung 通过神学逻辑和目标 – – 测深推断法进行建设性建构性符号强化学习 2506.05422v1

Authors (1): Andrei T. Patrascu

We introduce a novel learning and planning framework that replaces traditional reward-based optimisation with constructive logical inference. In our model, actions, transitions, and goals are represented as logical propositions, and decision-making proceeds by building constructive proofs under intuitionistic logic. This method ensures that state transitions and policies are accepted only when supported by verifiable preconditions – eschewing probabilistic trial-and-error in favour of guaranteed logical validity. We implement a symbolic agent operating in a structured gridworld, where reaching a goal requires satisfying a chain of intermediate subgoals (e.g., collecting keys to open doors), each governed by logical constraints. Unlike conventional reinforcement learning agents, which require extensive exploration and suffer from unsafe or invalid transitions, our constructive agent builds a provably correct plan through goal chaining, condition tracking, and knowledge accumulation. Empirical comparison with Q-learning demonstrates that our method achieves perfect safety, interpretable behaviour, and efficient convergence with no invalid actions, highlighting its potential for safe planning, symbolic cognition, and trustworthy AI. This work presents a new direction for reinforcement learning grounded not in numeric optimisation, but in constructive logic and proof theory.


Article 2064

Title@2025-06-05 (4): On The Sample Complexity Bounds In Bilevel Reinforcement Learning

Title: On The Sample Complexity Bounds In Bilevel Reinforcement Learning Auf der Probe Komplexität Grenzen im Zwei-Ebenen-Verstärkungs-Lernen 关于在双级强化学习中的抽样复杂度交错 2503.17644v3

Authors (5): Mudit Gaur, Utsav Singh, Amrit Singh Bedi, Raghu Pasupathu, Vaneet Aggarwal

Bilevel reinforcement learning (BRL) has emerged as a powerful framework for aligning generative models, yet its theoretical foundations, especially sample complexity bounds, remain underexplored. In this work, we present the first sample complexity bound for BRL, establishing a rate of $\mathcal{O}(\epsilon^{-3})$ in continuous state-action spaces. Traditional MDP analysis techniques do not extend to BRL due to its nested structure and non-convex lower-level problems. We overcome these challenges by leveraging the Polyak-Łojasiewicz (PL) condition and the MDP structure to obtain closed-form gradients, enabling tight sample complexity analysis. Our analysis also extends to general bi-level optimization settings with non-convex lower levels, where we achieve state-of-the-art sample complexity results of $\mathcal{O}(\epsilon^{-3})$, improving upon existing bounds of $\mathcal{O}(\epsilon^{-6})$. Additionally, we address the computational bottleneck of hypergradient estimation by proposing a fully first-order, Hessian-free algorithm suitable for large-scale problems.
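
For reference, the PL condition in its standard textbook form, for minimizing a differentiable function $f$ with minimum value $f^\star$, is written out here. The symbols $f$, $\mu$, and $f^\star$ are generic notation chosen for this note, not the paper's; the paper presumably applies a condition of this type to the lower-level problem, possibly with a different sign convention or normalization.

```latex
\frac{1}{2}\,\lVert \nabla f(x) \rVert^2 \;\ge\; \mu\,\bigl(f(x) - f^\star\bigr)
\quad \text{for all } x, \text{ for some constant } \mu > 0 .
```

The condition does not require convexity, yet it forces every stationary point to be a global minimizer (setting $\nabla f(x) = 0$ gives $f(x) \le f^\star$), which is why it is a common substitute for strong convexity in non-convex analyses.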


Article 2065

Title@2025-06-05 (4): MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation

Title: MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation MAVL: Ein mehrsprachiger Audio-Video-Text Datensatz für animierte Song-Übersetzung MAVL: 动动歌曲翻译多语种视听歌词数据集 2505.18614v2

Authors (4): Woohyun Cho, Youngmin Kim, Sunghyun Lee, Youngjae Yu

Lyrics translation requires both accurate semantic transfer and preservation of musical rhythm, syllabic structure, and poetic style. In animated musicals, the challenge intensifies due to alignment with visual and auditory cues. We introduce Multilingual Audio-Video Lyrics Benchmark for Animated Song Translation (MAVL), the first multilingual, multimodal benchmark for singable lyrics translation. By integrating text, audio, and video, MAVL enables richer and more expressive translations than text-only approaches. Building on this, we propose Syllable-Constrained Audio-Video LLM with Chain-of-Thought (SylAVL-CoT), which leverages audio-video cues and enforces syllabic constraints to produce natural-sounding lyrics. Experimental results demonstrate that SylAVL-CoT significantly outperforms text-based models in singability and contextual accuracy, emphasizing the value of multimodal, multilingual approaches for lyrics translation.


Article 2066

Title@2025-06-05 (4): Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos

Title: Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos Ungewissheitsbasiertes Extensibles Codebook für diskretes Federated Learning in heterogenen Datensilos 以不确定性为基础的异异种数据过滤器中分辨联邦学习扩展守则手册 2402.18888v4

Authors (3): Tianyi Zhang, Yu Cao, Dianbo Liu

Federated learning (FL), aimed at leveraging vast distributed datasets, confronts a crucial challenge: the heterogeneity of data across different silos. While previous studies have explored discrete representations to enhance model generalization across minor distributional shifts, these approaches often struggle to adapt to new data silos with significantly divergent distributions. In response, we have identified that models derived from FL exhibit markedly increased uncertainty when applied to data silos with unfamiliar distributions. Consequently, we propose an innovative yet straightforward iterative framework, termed Uncertainty-Based Extensible-Codebook Federated Learning (UEFL). This framework dynamically maps latent features to trainable discrete vectors, assesses the uncertainty, and specifically extends the discretization dictionary or codebook for silos exhibiting high uncertainty. Our approach aims to simultaneously enhance accuracy and reduce uncertainty by explicitly addressing the diversity of data distributions, all while maintaining minimal computational overhead in environments characterized by heterogeneous data silos. Extensive experiments across multiple datasets demonstrate that UEFL outperforms state-of-the-art methods, achieving significant improvements in accuracy (by 3%–22.1%) and uncertainty reduction (by 38.83%–96.24%). The source code is available at https://github.com/destiny301/uefl.


Article 2067

Title@2025-06-05 (4): Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

Title: Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? Können Wasserzeichen große Sprachmodelle die urheberrechtlich geschützte Textgenerierung verhindern und Trainingsdaten ausblenden? 水标记大语言模型能否防止有版权的文本生成和隐藏培训数据? 2407.17417v3

Authors (9): Michael-Andrei Panaitescu-Liess, Zora Che, Bang An, Yuancheng Xu, Pankayaraj Pathmanathan, Souradip Chakraborty, Sicheng Zhu, Tom Goldstein, Furong Huang

Large Language Models (LLMs) have demonstrated impressive capabilities in generating diverse and contextually rich text. However, concerns regarding copyright infringement arise as LLMs may inadvertently produce copyrighted material. In this paper, we first investigate the effectiveness of watermarking LLMs as a deterrent against the generation of copyrighted texts. Through theoretical analysis and empirical evaluation, we demonstrate that incorporating watermarks into LLMs significantly reduces the likelihood of generating copyrighted content, thereby addressing a critical concern in the deployment of LLMs. However, we also find that watermarking can have unintended consequences on Membership Inference Attacks (MIAs), which aim to discern whether a sample was part of the pretraining dataset and may be used to detect copyright violations. Surprisingly, we find that watermarking adversely affects the success rate of MIAs, complicating the task of detecting copyrighted text in the pretraining dataset. These results reveal the complex interplay between different regulatory measures, which may impact each other in unforeseen ways. Finally, we propose an adaptive technique to improve the success rate of a recent MIA under watermarking. Our findings underscore the importance of developing adaptive methods to study critical problems in LLMs with potential legal implications.


Article 2068

Title@2025-06-05 (4): Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning

Title: Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning Bedauern-Optimal Q-Lernen mit niedrigen Kosten für ein-Agenten und Federated Verstärkungs-Lernen 单一机构和联邦强化学习的低费用学习 2506.04626v1

Authors (3): Haochen Zhang, Zhong Zheng, Lingzhou Xue

Motivated by real-world settings where data collection and policy deployment – whether for a single agent or across multiple agents – are costly, we study the problem of on-policy single-agent reinforcement learning (RL) and federated RL (FRL) with a focus on minimizing burn-in costs (the sample sizes needed to reach near-optimal regret) and policy switching or communication costs. In parallel finite-horizon episodic Markov Decision Processes (MDPs) with $S$ states and $A$ actions, existing methods either require superlinear burn-in costs in $S$ and $A$ or fail to achieve logarithmic switching or communication costs. We propose two novel model-free RL algorithms – Q-EarlySettled-LowCost and FedQ-EarlySettled-LowCost – that are the first in the literature to simultaneously achieve: (i) the best near-optimal regret among all known model-free RL or FRL algorithms, (ii) low burn-in cost that scales linearly with $S$ and $A$, and (iii) logarithmic policy switching cost for single-agent RL or communication cost for FRL. Additionally, we establish gap-dependent theoretical guarantees for both regret and switching/communication costs, improving or matching the best-known gap-dependent bounds.


Article 2069

Title@2025-06-05 (4): Static Word Embeddings for Sentence Semantic Representation

Title: Static Word Embeddings for Sentence Semantic Representation Statische Wort-Einbettungen für Satz semantische Darstellung 判决语义代表的静态单词嵌入 2506.04624v1

Authors (5): Takashi Wada, Yuki Hirakawa, Ryotaro Shimizu, Takahiro Kawashima, Yuki Saito

We propose new static word embeddings optimised for sentence semantic representation. We first extract word embeddings from a pre-trained Sentence Transformer, and improve them with sentence-level principal component analysis, followed by either knowledge distillation or contrastive learning. During inference, we represent sentences by simply averaging word embeddings, which requires little computational cost. We evaluate models on both monolingual and cross-lingual tasks and show that our model substantially outperforms existing static models on sentence semantic tasks, and even rivals a basic Sentence Transformer model (SimCSE) on some data sets. Lastly, we perform a variety of analyses and show that our method successfully removes word embedding components that are irrelevant to sentence semantics, and adjusts the vector norms based on the influence of words on sentence semantics.
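
The inference step described above, representing a sentence as the plain average of its word vectors, can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the two-dimensional toy vocabulary are hypothetical, and the embeddings here are hand-picked rather than extracted from a Sentence Transformer.

```python
import math

def sentence_embedding(tokens, word_vectors):
    """Average the static vectors of all tokens found in the vocabulary."""
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        raise ValueError("no token of the sentence is in the vocabulary")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vocabulary with made-up 2-dimensional vectors (illustration only).
toy_vectors = {
    "cats": [1.0, 0.0],
    "dogs": [0.8, 0.2],
    "purr": [0.0, 1.0],
    "bark": [0.2, 0.8],
}
s1 = sentence_embedding(["cats", "purr"], toy_vectors)
s2 = sentence_embedding(["dogs", "bark"], toy_vectors)
similarity = cosine_similarity(s1, s2)
```

Because the only work at inference time is a table lookup and a mean, the cost grows linearly with sentence length and needs no forward pass through a neural network. Note that averaging gives every word equal weight, so any emphasis on informative words has to come from the vector norms themselves, which is consistent with the abstract's remark about adjusting norms.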


Article 2070

Title@2025-06-05 (4): Solving Inverse Problems via Diffusion-Based Priors: An Approximation-Free Ensemble Sampling Approach

Title: Solving Inverse Problems via Diffusion-Based Priors: An Approximation-Free Ensemble Sampling Approach Inverse Probleme durch Diffusionsbasierte Priorien lösen: Ein annäherungsfreier Ensemble-Probenansatz 通过以传播为基础的前例解决反向问题:一种近似-无聚合集成样样样的方法 2506.03979v2

Authors (5): Haoxuan Chen, Yinuo Ren, Martin Renqiang Min, Lexing Ying, Zachary Izzo

Diffusion models (DMs) have proven to be effective in modeling high-dimensional distributions, leading to their widespread adoption for representing complex priors in Bayesian inverse problems (BIPs). However, current DM-based posterior sampling methods proposed for solving common BIPs rely on heuristic approximations to the generative process. To exploit the generative capability of DMs and avoid the usage of such approximations, we propose an ensemble-based algorithm that performs posterior sampling without the use of heuristic approximations. Our algorithm is motivated by existing works that combine DM-based methods with the sequential Monte Carlo (SMC) method. By examining how the prior evolves through the diffusion process encoded by the pre-trained score function, we derive a modified partial differential equation (PDE) governing the evolution of the corresponding posterior distribution. This PDE includes a modified diffusion term and a reweighting term, which can be simulated via stochastic weighted particle methods. Theoretically, we prove that the error relative to the true posterior distribution can be bounded in terms of the training error of the pre-trained score function and the number of particles in the ensemble. Empirically, we validate our algorithm on several inverse problems in imaging to show that our method gives more accurate reconstructions compared to existing DM-based methods.


Article 2071

Title@2025-06-05 (4): Proactive Model Adaptation Against Concept Drift for Online Time Series Forecasting

Title: Proactive Model Adaptation Against Concept Drift for Online Time Series Forecasting Proaktive Modellanpassung gegen Konzept Drift für Online-Zeitreihenprognosen 针对在线时间序列预测的 “ 动态动态 “ 概念的前瞻性模型适应 2412.08435v4

Authors (2): Lifan Zhao, Yanyan Shen

Time series forecasting always faces the challenge of concept drift, where data distributions evolve over time, leading to a decline in forecast model performance. Existing solutions are based on online learning, which continually organizes recent time series observations as new training samples and updates model parameters according to the forecasting feedback on recent data. However, they overlook a critical issue: the ground-truth future values of each sample become available only after the forecast horizon has passed. This delay creates a temporal gap between the training samples and the test sample. Our empirical analysis reveals that the gap can introduce concept drift, causing forecast models to adapt to outdated concepts. In this paper, we present Proceed, a novel proactive model adaptation framework for online time series forecasting. Proceed first estimates the concept drift between the recently used training samples and the current test sample. It then employs an adaptation generator to efficiently translate the estimated drift into parameter adjustments, proactively adapting the model to the test sample. To enhance the generalization capability of the framework, Proceed is trained on diverse synthetic concept drifts. Extensive experiments on five real-world datasets across various forecast models demonstrate that Proceed brings more performance improvements than the state-of-the-art online learning methods, significantly improving forecast models’ resilience against concept drifts. Code is available at https://github.com/SJTU-DMTai/OnlineTSF.


Article 2072

Title@2025-06-05 (4): Efficiently Serving Large Multimodal Models Using EPD Disaggregation

Title: Efficiently Serving Large Multimodal Models Using EPD Disaggregation Effizientes Servieren großer multimodaler Modelle mit EPD-Disaggregation 利用EPD拆分有效服务大型多模式模式 2501.05460v3

Authors (12): Gursimran Singh, Xinglu Wang, Yifan Hu, Timothy Yu, Linzi Xing, Wei Jiang, Zhefeng Wang, Xiaolong Bai, Yi Li, Ying Xiong, Yong Zhang, Zhenan Fan

Large Multimodal Models (LMMs) extend Large Language Models (LLMs) by handling diverse inputs such as images, audio, and video, but at the cost of adding a multimodal encoding stage that increases both computational and memory overhead. This step negatively affects key Service Level Objectives (SLOs), such as time to first token (TTFT) and time per output token (TPOT). We introduce Encode-Prefill-Decode (EPD) Disaggregation, a novel framework that separates the encoding, prefill, and decode stages onto dedicated resources. Unlike current systems, which bundle encoding and prefill together, our approach decouples these steps, unlocking new opportunities and optimizations. These include a mechanism to cache multimedia tokens for efficient transfer, a novel way to parallelize the encoding load within a request, a module for optimal resource allocation for disaggregated serving, and a novel role-switching method to handle changing workload characteristics. Experimental evaluations with popular LMMs show substantial gains in memory efficiency (up to 15x lower peak memory utilization), batch sizes (up to 22x larger), 10x more images per request, and 2.2x larger KV caches. Furthermore, it leads to significant improvements in SLO attainment (up to 90-100% improvement) and TTFT (up to 71% reduction), compared to systems that do not disaggregate. The code is available at https://github.com/vbdi/epdserve.
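
The two latency metrics named above can be computed from per-request timestamps as in the sketch below. It assumes a common convention, not one stated in the abstract: TTFT is the delay from request arrival to the first output token, and TPOT is the mean gap between consecutive output tokens after the first. The function name and argument layout are hypothetical.

```python
def ttft_and_tpot(request_time, token_times):
    """Return (time to first token, mean time per output token), in the input's time unit.

    request_time: arrival time of the request.
    token_times:  emission times of the output tokens, in increasing order.
    """
    if not token_times:
        raise ValueError("at least one output token is required")
    ttft = token_times[0] - request_time
    if len(token_times) == 1:
        return ttft, 0.0  # no inter-token gap to average
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return ttft, tpot

# Request arrives at t = 10.0 s; three tokens are emitted afterwards.
ttft, tpot = ttft_and_tpot(10.0, [10.5, 10.75, 11.0])
```

Under this convention any time spent encoding images, audio, or video before prefill counts toward TTFT, which is why an added multimodal encoding stage affects that metric directly.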


Article 2073

Title@2025-06-05 (4): Exploring bidirectional bounds for minimax-training of Energy-based models

Title: Exploring bidirectional bounds for minimax-training of Energy-based models Erforschung bidirektionaler Grenzen für Minimax-Training von energiebasierten Modellen 探讨能源模型小型培训的双向双向界限 2506.04609v1

Authors (6): Cong Geng, Jia Wang, Li Chen, Zhiyong Gao, Jes Frellsen, Søren Hauberg

Energy-based models (EBMs) estimate unnormalized densities in an elegant framework, but they are generally difficult to train. Recent work has linked EBMs to generative adversarial networks, by noting that they can be trained through a minimax game using a variational lower bound. To avoid the instabilities caused by minimizing a lower bound, we propose to instead work with bidirectional bounds, meaning that we maximize a lower bound and minimize an upper bound when training the EBM. We investigate four different bounds on the log-likelihood derived from different perspectives. We derive lower bounds based on the singular values of the generator Jacobian and on mutual information. To upper bound the negative log-likelihood, we consider a gradient penalty-like bound, as well as one based on diffusion processes. In all cases, we provide algorithms for evaluating the bounds. We compare the different bounds to investigate the pros and cons of the different approaches. Finally, we demonstrate that the use of bidirectional bounds stabilizes EBM training and yields high-quality density estimation and sample generation.


Article 2074

Title@2025-06-05 (4): ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning

Title: ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning ADG: Ambient Diffusion-Guided Dataset Recovery für Korruption-Robust Offline-Verstärkung Lernen ADG: 为腐败 – – ROBT离线强化学习而进行渗透性传播、指导性数据集恢复 2505.23871v2

Authors (8): Zeyuan Liu, Zhihe Yang, Jiawei Xu, Rui Yang, Jiafei Lyu, Baoxiang Wang, Yunjian Xu, Xiu Li

Real-world datasets collected from sensors or human inputs are prone to noise and errors, posing significant challenges for applying offline reinforcement learning (RL). While existing methods have made progress in addressing corrupted actions and rewards, they remain insufficient for handling corruption in high-dimensional state spaces and for cases where multiple elements in the dataset are corrupted simultaneously. Diffusion models, known for their strong denoising capabilities, offer a promising direction for this problem, but their tendency to overfit noisy samples limits their direct applicability. To overcome this, we propose Ambient Diffusion-Guided Dataset Recovery (ADG), a novel approach that pioneers the use of diffusion models to tackle data corruption in offline RL. First, we introduce Ambient Denoising Diffusion Probabilistic Models (DDPM) from approximated distributions, which enable learning on partially corrupted datasets with theoretical guarantees. Second, we use the noise-prediction property of Ambient DDPM to distinguish between clean and corrupted data, and then use the clean subset to train a standard DDPM. Third, we employ the trained standard DDPM to refine the previously identified corrupted data, enhancing data quality for subsequent offline RL training. A notable strength of ADG is its versatility: it can be seamlessly integrated with any offline RL algorithm. Experiments on a range of benchmarks, including MuJoCo, Kitchen, and Adroit, demonstrate that ADG effectively mitigates the impact of corrupted data and improves the robustness of offline RL under various noise settings, achieving state-of-the-art results.


Article 2075

Title@2025-06-05 (4): Ignoring Directionality Leads to Compromised Graph Neural Network Explanations

Title: Ignoring Directionality Leads to Compromised Graph Neural Network Explanations Ignorieren der Richtungalität führt zu kompromissierten Graphen-Neural-Netzwerk-Erklärungen 忽略方向引导导致压缩的图形神经网络解释 2506.04608v1

Authors (3): Changsheng Sun, Xinke Li, Jin Song Dong

Graph Neural Networks (GNNs) are increasingly used in critical domains, where reliable explanations are vital for supporting human decision-making. However, the common practice of graph symmetrization discards directional information, leading to significant information loss and misleading explanations. Our analysis demonstrates how this practice compromises explanation fidelity. Through theoretical and empirical studies, we show that preserving directional semantics significantly improves explanation quality, ensuring more faithful insights for human decision-makers. These findings highlight the need for direction-aware GNN explainability in security-critical applications.
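
The information loss caused by symmetrization is easy to see on a two-node example. The sketch below is illustrative only: the `symmetrize` helper and the toy edge sets are hypothetical, and it shows the preprocessing step the abstract criticizes rather than any method from the paper.

```python
def symmetrize(edges):
    """Return the undirected version of a directed edge set by adding every reverse edge."""
    return set(edges) | {(v, u) for (u, v) in edges}

# Two different directed graphs over the same pair of nodes.
sends_to = {("alice", "bob")}        # alice -> bob
receives_from = {("bob", "alice")}   # bob -> alice

# After symmetrization the two graphs are indistinguishable.
same_after_symmetrization = symmetrize(sends_to) == symmetrize(receives_from)
```

Any explanation computed on the symmetrized graph therefore cannot say which node influenced which, because that distinction was removed before the model or the explainer ever saw the data.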


Article 2076

Title@2025-06-05 (4): MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball

Title: MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball MVP-Shapley: Featurebasierte Modellierung für die Bewertung des wertvollsten Spielers im Basketball MVP-Shaplay:评估篮球中最有价值的玩家的基于地物的模型模型 2506.04602v1

Authors (8): Haifeng Sun, Yu Xiong, Runze Wu, Kai Wang, Lan Zhang, Changjie Fan, Shaojie Tang, Xiang-Yang Li

The burgeoning growth of the esports and multiplayer online gaming community has highlighted the critical importance of evaluating the Most Valuable Player (MVP). Establishing an explainable and practical MVP evaluation method is very challenging. In our study, we specifically focus on play-by-play data, which records related events during the game, such as assists and points. We aim to address the challenges by introducing a new MVP evaluation framework, denoted as MVP-Shapley, which leverages Shapley values. This approach encompasses feature processing, win-loss model training, Shapley value allocation, and MVP ranking determination based on players’ contributions. Additionally, we optimize our algorithm to align with expert voting results from the perspective of causality. Finally, we substantiate the efficacy of our method through validation on the NBA dataset and the Dunk City Dynasty dataset, and we have deployed it online in the industry.
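
To make the Shapley value allocation step concrete, the sketch below computes exact Shapley values for a tiny cooperative game by averaging each player's marginal contribution over all join orders. It is a generic textbook computation, not the paper's pipeline: the function name, the player labels, and the toy value function (standing in for a trained win-loss model) are hypothetical, and exhaustive enumeration is only feasible for a handful of players.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all player orderings.

    players: list of hashable player identifiers.
    value:   function mapping a frozenset of players to a real-valued payoff.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}

# Toy game: individual contributions plus a synergy bonus when A and B play together.
solo = {"A": 3.0, "B": 1.0, "C": 0.5}

def toy_value(coalition):
    bonus = 2.0 if {"A", "B"} <= coalition else 0.0
    return sum(solo[p] for p in coalition) + bonus

scores = shapley_values(list(solo), toy_value)
mvp = max(scores, key=scores.get)
```

In this toy game the synergy bonus is split equally between A and B, and the scores sum to the value of the full team, which is the efficiency property that makes Shapley values attractive for attributing a shared outcome to individuals.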


Article 2077

Title@2025-06-05 (4): Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets

Title: Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets Skalierungsgesetze für einen robusten Vergleich von Open Foundation Language-Vision-Modellen und -Datensätzen 开放基础语言-视觉模型和数据集强力比较法 2506.04598v1

Authors (7): Marianna Nezhurina, Tomer Porian, Giovanni Pucceti, Tommie Kerssies, Romain Beaumont, Mehdi Cherti, Jenia Jitsev

In studies of transferable learning, scaling laws are obtained for various important foundation models to predict their properties and performance at larger scales. We show here how scaling law derivation can also be used for model and dataset comparison, allowing one to decide which procedure is to be preferred for pre-training. For the first time, full scaling laws based on dense measurements across a wide span of model and samples-seen scales are derived for two important language-vision learning procedures, CLIP and MaMMUT, that use either a contrastive-only loss or a contrastive plus captioning text-generative loss. Ensuring sufficient prediction accuracy for held-out points, we use the derived scaling laws to compare both models, obtaining evidence for MaMMUT’s stronger improvement with scale and better sample efficiency than standard CLIP. To strengthen the validity of the comparison, we show scaling laws for various downstream tasks (classification, retrieval, and segmentation) and for different open datasets (DataComp, DFN, and Re-LAION), consistently observing the same trends. We show that the comparison can also be performed when deriving scaling laws with a constant learning rate schedule, reducing compute cost. Accurate derivation of scaling laws thus provides a means to perform model and dataset comparison across scale spans, avoiding misleading conclusions based on measurements from single reference scales only, and paving the road for systematic comparison and improvement of open foundation models and the datasets used to create them. We release all the pre-trained models with their intermediate checkpoints, including openMaMMUT-L/14, which achieves $80.3\%$ zero-shot ImageNet-1k accuracy, trained on 12.8B samples from DataComp-1.4B. Code for reproducing experiments in the paper and raw experiment data can be found at https://github.com/LAION-AI/scaling-laws-for-comparison.
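
Comparing two training procedures through fitted scaling laws, rather than at one reference scale, can be illustrated with a deliberately simplified sketch. It assumes a pure power law, error = a * scale^(-b), fitted by least squares in log-log space; the paper's actual functional form, fitting procedure, and measurements are not reproduced here, and the function names and synthetic data points are hypothetical.

```python
import math

def fit_power_law(scales, errors):
    """Least-squares fit of log(error) = log(a) - b * log(scale); returns (a, b)."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

def predict(a, b, scale):
    """Error predicted by the fitted power law at a given scale."""
    return a * scale ** (-b)

# Synthetic measurements for two hypothetical procedures (not real data).
scales = [1e6, 1e7, 1e8, 1e9]
errors_proc_1 = [2.0 * s ** -0.10 for s in scales]
errors_proc_2 = [4.0 * s ** -0.13 for s in scales]

a1, b1 = fit_power_law(scales, errors_proc_1)
a2, b2 = fit_power_law(scales, errors_proc_2)
```

In this synthetic setup the second procedure is worse at every measured scale but has the steeper exponent, so the fitted curves cross beyond the measured range. A comparison at a single reference scale would pick the first procedure, while the extrapolated laws favor the second at sufficiently large scale, which is the kind of misleading conclusion the abstract warns about.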



Article 2078

Title@2025-06-05 (4): Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem

Title: Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem Lösen des vernünftigen Potenzials von vortrainierten LLMs durch Kritik Feinsteuerung auf ein Problem 通过Critique对一个问题的微调,释放预先培训的LLMs的理据潜力 2506.03295v2

Authors (5): Yubo Wang, Ping Nie, Kai Zou, Lijun Wu, Wenhu Chen

We have witnessed that strong LLMs like Qwen-Math, MiMo, and Phi-4 possess immense reasoning potential inherited from the pre-training stage. With reinforcement learning (RL), these models can improve dramatically on reasoning tasks. Recent studies have shown that even RL on a single problem can unleash these models’ reasoning capabilities. However, RL is not only expensive but also unstable. Even one-shot RL requires hundreds of GPU hours. This raises a critical question: Is there a more efficient way to unleash the reasoning potential of these powerful base LLMs? In this work, we demonstrate that Critique Fine-Tuning (CFT) on only one problem can effectively unleash the reasoning potential of LLMs. Our method constructs critique data by collecting diverse model-generated solutions to a single problem and using teacher LLMs to provide detailed critiques. We fine-tune Qwen and Llama family models, ranging from 1.5B to 14B parameters, on the CFT data and observe significant performance gains across diverse reasoning tasks. For example, with just 5 GPU hours of training, Qwen-Math-7B-CFT shows an average improvement of 15% on six math benchmarks and 16% on three logic reasoning benchmarks. These results are comparable to, or even surpass, those from RL while using 20x less compute. Ablation studies reveal the robustness of one-shot CFT across different prompt problems. These results highlight one-shot CFT as a simple, general, and compute-efficient approach to unleashing the reasoning capabilities of modern LLMs.



Article 2079

Title@2025-06-05 (4): Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable

Title: Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable Sicherheitssteuer: Sicherheitsausrichtung macht Ihre großen vernünftigen Modelle weniger vernünftig 安全税:安全协调使你的大理由模型降低合理性 2503.00555v2

Authors (7): Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Zachary Yahn, Yichang Xu, Ling Liu

Safety alignment is an important procedure before the official deployment of a Large Language Model (LLM). While safety alignment has been extensively studied for LLMs, there is still a large research gap for Large Reasoning Models (LRMs), which are equipped with improved reasoning capability. In this paper, we systematically examine a simplified pipeline for producing safety-aligned LRMs. From our evaluation of various LRMs, we deliver two main findings: i) Safety alignment can be applied to an LRM to restore its safety capability. ii) Safety alignment leads to a degradation of the reasoning capability of LRMs. The two findings show that there exists a trade-off between reasoning and safety capability with the sequential LRM production pipeline. The discovered trade-off, which we name Safety Tax, should shed light on future endeavors of safety research on LRMs. As a by-product, we curate a dataset called DirectRefusal, which might serve as an alternative dataset for safety alignment. Our source code is available at https://github.com/git-disl/Safety-Tax.



Article 2080

Title@2025-06-05 (4): Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification

Title: Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification Sicher: Mathematische Reasoning in großen Sprachmodellen durch retrospektive stufenweise Verifizierung verbessern 安全:通过回溯性逐步认识正式核查,加强大语言模型中的数学理由 2506.04592v1

Authors (10): Chengwu Liu, Ye Yuan, Yichun Yin, Yan Xu, Xin Xu, Zaoyu Chen, Yasheng Wang, Lifeng Shang, Qun Liu, Ming Zhang

Chain-of-Thought (CoT) prompting has become the de facto method to elicit reasoning capabilities from large language models (LLMs). However, to mitigate hallucinations in CoT that are notoriously difficult to detect, current methods such as process reward models (PRMs) or self-consistency operate as opaque boxes and do not provide checkable evidence for their judgments, possibly limiting their effectiveness. To address this issue, we draw inspiration from the idea that “the gold standard for supporting a mathematical claim is to provide a proof”. We propose a retrospective, step-aware formal verification framework $Safe$. Rather than assigning arbitrary scores, we strive to articulate mathematical claims in formal mathematical language Lean 4 at each reasoning step and provide formal proofs to identify hallucinations. We evaluate our framework $Safe$ across multiple language models and various mathematical datasets, demonstrating a significant performance improvement while offering interpretable and verifiable evidence. We also propose $FormalStep$ as a benchmark for step correctness theorem proving with $30,809$ formal statements. To the best of our knowledge, our work represents the first endeavor to utilize formal mathematical language Lean 4 for verifying natural language content generated by LLMs, aligning with the reason why formal mathematical languages were created in the first place: to provide a robust foundation for hallucination-prone human-written proofs.
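To make the step-level idea concrete, here is a small Lean 4 sketch. It assumes a hypothetical chain of thought computing 12 * 3 + 4; the theorem names and the choice of the `decide` tactic are illustrative assumptions and are not taken from the $Safe$ framework or the $FormalStep$ benchmark.

```lean
-- Hypothetical step claims from a chain of thought about 12 * 3 + 4.
-- Each step is stated as its own theorem so it can be checked in isolation.
theorem step_one : 12 * 3 = 36 := by decide
theorem step_two : 36 + 4 = 40 := by decide

-- A hallucinated step such as `12 * 3 = 38` would fail to check:
-- theorem bad_step : 12 * 3 = 38 := by decide  -- rejected by Lean
```

A step whose formal statement cannot be proved is then a candidate hallucination, and a proved step comes with checkable evidence rather than an opaque score.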



Article 2081

Title@2025-06-05 (4): Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret

Title: Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret Thompson Sampling und UCB verbinden: Auf dem Weg zu effizienteren Kompromissen zwischen Datenschutz und Bedauern 连接Thompson抽样和UCB:实现更高效的隐私和遗憾之间的权衡取舍 2505.02383v2

Authors (5): Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias Lécuyer, Nidhi Hegde

We address differentially private stochastic bandit problems by exploring the deep connections among Thompson Sampling with Gaussian priors, Gaussian mechanisms, and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private bandit algorithm that makes it possible to trade off privacy and regret. DP-TS-UCB satisfies $\tilde{O} \left(T^{0.25(1-\alpha)}\right)$-GDP and enjoys an $O \left(K\ln^{\alpha+1}(T)/\Delta \right)$ regret bound, where $\alpha \in [0,1]$ controls the trade-off between privacy and regret. Theoretically, our DP-TS-UCB relies on anti-concentration bounds of Gaussian distributions and links exploration mechanisms in Thompson Sampling-based algorithms and Upper Confidence Bound-based algorithms, which may be of independent interest.
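As a worked reading of the stated bounds, the two endpoints of $\alpha \in [0,1]$ can be substituted into the expressions from the abstract (nothing beyond those expressions is assumed):

$$
\alpha = 0:\quad \tilde{O}\left(T^{0.25}\right)\text{-GDP},\qquad O\left(K\ln(T)/\Delta\right)\ \text{regret}
$$

$$
\alpha = 1:\quad \tilde{O}\left(T^{0}\right) = \tilde{O}(1)\text{-GDP},\qquad O\left(K\ln^{2}(T)/\Delta\right)\ \text{regret}
$$

Moving $\alpha$ from 0 to 1 therefore shrinks the GDP parameter from polynomial in $T$ to constant (up to logarithmic factors), which is a stronger privacy guarantee, at the price of one extra $\ln(T)$ factor in the regret bound.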



Article 2082

Title@2025-06-05 (4): Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization

Title: Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization Adressierung von Konzept-Mislabeling in Konzept-Bottleneck-Modellen durch Preference-Optimierung 通过优先优化处理概念瓶颈模式中的概念误贴标签问题 2504.18026v3

Authors (4): Emiliano Penaloza, Tianyue H. Zhang, Laurent Charlin, Mateo Espinosa Zarlenga

Concept Bottleneck Models (CBMs) propose to enhance the trustworthiness of AI systems by constraining their decisions on a set of human-understandable concepts. However, CBMs typically assume that datasets contain accurate concept labels, an assumption that is often violated in practice and whose violation, as we show, can significantly degrade performance (by 25% in some cases). To address this, we introduce the Concept Preference Optimization (CPO) objective, a new loss function based on Direct Preference Optimization, which effectively mitigates the negative impact of concept mislabeling on CBM performance. We provide an analysis of key properties of the CPO objective, showing it directly optimizes for the concept’s posterior distribution, and contrast it against Binary Cross Entropy (BCE), demonstrating that CPO is inherently less sensitive to concept noise. We empirically confirm our analysis by finding that CPO consistently outperforms BCE on three real-world datasets, both with and without added label noise. We make our code available on GitHub.



Article 2083

Title@2025-06-05 (4): AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment

Title: AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment AdvBDGen: Adversarially Fortified Prompt-Spezifische Fuzzy Backdoor Generator gegen LLM Ausrichtung AdvBDGen:抗LLM对齐的反增强的快速特制烟雾后门发电机 2410.11283v3

Authors (4): Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess, Furong Huang

With the growing adoption of reinforcement learning with human feedback (RLHF) for aligning large language models (LLMs), the risk of backdoor installation during alignment has increased, leading to unintended and harmful behaviors. Existing backdoor triggers are typically limited to fixed word patterns, making them detectable during data cleaning and easily removable post-poisoning. In this work, we explore the use of prompt-specific paraphrases as backdoor triggers, enhancing their stealth and resistance to removal during LLM alignment. We propose AdvBDGen, an adversarially fortified generative fine-tuning framework that automatically generates prompt-specific backdoors that are effective, stealthy, and transferable across models. AdvBDGen employs a generator-discriminator pair, fortified by an adversary, to ensure the installability and stealthiness of backdoors. It enables the crafting and successful installation of complex triggers using as little as 3% of the fine-tuning data. Once installed, these backdoors can jailbreak LLMs during inference, demonstrate improved stability against perturbations compared to traditional constant triggers, and are more challenging to remove. These findings underscore an urgent need for the research community to develop more robust defenses against adversarial backdoor threats in LLM alignment.



Article 2084

Title@2025-06-05 (4): Learning Hyperparameters via a Data-Emphasized Variational Objective

Title: Learning Hyperparameters via a Data-Emphasized Variational Objective Hyperparameter über ein datenverstärktes Variationsziel lernen 通过数据-按单位分列的变异目标学习超超参数 2502.01861v2

Authors (3): Ethan Harvey, Mikhail Petrov, Michael C. Hughes

When training large flexible models on limited data, avoiding overfitting is a practical concern. Common grid search or smarter search methods rely on expensive separate runs at each candidate hyperparameter while carving out a validation set that reduces available training data. In this paper, we consider direct gradient-based learning of regularization hyperparameters on the full training set via the evidence lower bound (“ELBo”) objective from Bayesian variational methods. We focus on scenarios where the model is over-parameterized for flexibility while the approximate posterior is chosen to be Gaussian with isotropic covariance for tractability, even though it cannot match the true posterior exactly. In such scenarios, we find the ELBo prioritizes posteriors that match the prior variance, which leads to severely underfitting the data. Instead, we recommend a data-emphasized ELBo that upweights the influence of the data likelihood relative to the prior. In Bayesian transfer learning of classifiers for text and images, our method reduces 88+ hour grid searches of past work to under 3 hours while delivering comparable accuracy. We further demonstrate how our approach enables efficient yet accurate approximations of Gaussian processes with learnable length-scale kernels.
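The reweighting can be sketched as follows. This is an illustration under stated assumptions: the objective is taken to be `kappa * E_q[log-likelihood] - KL(q || prior)` with `kappa > 1`, both the approximate posterior and the prior are isotropic Gaussians (the prior zero-mean), and the function names and numbers are hypothetical. How the paper actually sets the weight is not specified in the abstract.

```python
import numpy as np

def kl_isotropic_gaussians(mean_q, var_q, var_p):
    """KL( N(mean_q, var_q * I) || N(0, var_p * I) ) in closed form."""
    d = mean_q.size
    return 0.5 * (
        d * var_q / var_p
        + float(mean_q @ mean_q) / var_p
        - d
        + d * np.log(var_p / var_q)
    )

def weighted_elbo(expected_log_lik, mean_q, var_q, var_p, kappa=1.0):
    """kappa = 1 recovers the standard ELBo; kappa > 1 upweights the data term."""
    return kappa * expected_log_lik - kl_isotropic_gaussians(mean_q, var_q, var_p)

# Hypothetical numbers for illustration only.
mean_q = np.array([1.0, 0.0])
expected_log_lik = -10.0  # assumed Monte Carlo estimate of E_q[log p(y | theta)]

standard = weighted_elbo(expected_log_lik, mean_q, var_q=1.0, var_p=1.0)
emphasized = weighted_elbo(expected_log_lik, mean_q, var_q=1.0, var_p=1.0, kappa=3.0)
print(standard, emphasized)
```

Because the KL term is unchanged, a larger `kappa` makes the likelihood dominate the gradient with respect to the posterior and prior variances, which is the mechanism the abstract describes for countering underfitting.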



Article 2085

Title@2025-06-05 (4): Contrastive Visual Data Augmentation

Title: Contrastive Visual Data Augmentation Kontrastive Bilddatenvergrößerung 对比视觉数据增强 2502.17709v2

Authors (9): Yu Zhou, Bingxuan Li, Mohan Tang, Xiaomeng Jin, Te-Lin Wu, Kuan-Hao Huang, Heng Ji, Kai-Wei Chang, Nanyun Peng

Large multimodal models (LMMs) often struggle to recognize novel concepts, as they rely on pre-trained knowledge and have limited ability to capture subtle visual details. Domain-specific knowledge gaps in training also make them prone to confusing visually similar, commonly misrepresented, or low-resource concepts. To help LMMs better align nuanced visual features with language, improving their ability to recognize and reason about novel or rare concepts, we propose a Contrastive visual Data Augmentation (CoDA) strategy. CoDA extracts key contrastive textual and visual features of target concepts against the known concepts they are misrecognized as, and then uses multimodal generative models to produce targeted synthetic data. Automatic filtering of extracted features and augmented images is implemented to guarantee their quality, as verified by human annotators. We show the effectiveness and efficiency of CoDA on low-resource concept and diverse scene recognition datasets including iNaturalist and SUN. We additionally collect NovelSpecies, a benchmark dataset consisting of newly discovered animal species that are guaranteed to be unseen by LMMs. LLaVA-1.6 1-shot updating results on these three datasets show that CoDA significantly improves over SOTA visual data augmentation strategies, with absolute accuracy gains of 12.3% (NovelSpecies), 5.1% (SUN), and 6.0% (iNat).



Article 2086

Title@2025-06-05 (4): StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation

Title: StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation StatsMerging: Statistik-geführtes Modell Zusammenführen über Task-Spezifische Lehrer-Destillation StatsMerging:通过特定任务教师蒸馏将统计指导模型合并 2506.04567v1

Authors (3): Ranjith Merugu, Bryan Bo Cao, Shubham Jain

Model merging has emerged as a promising solution to accommodate multiple large models within constrained memory budgets. We present StatsMerging, a novel lightweight learning-based model merging method guided by weight distribution statistics without requiring ground truth labels or test samples. StatsMerging offers three key advantages: (1) It uniquely leverages singular values from singular value decomposition (SVD) to capture task-specific weight distributions, serving as a proxy for task importance to guide task coefficient prediction; (2) It employs a lightweight learner StatsMergeLearner to model the weight distributions of task-specific pre-trained models, improving generalization and enhancing adaptation to unseen samples; (3) It introduces Task-Specific Teacher Distillation for merging vision models with heterogeneous architectures, a merging learning paradigm that avoids costly ground-truth labels by task-specific teacher distillation. Notably, we present two types of knowledge distillation, (a) distilling knowledge from task-specific models to StatsMergeLearner; and (b) distilling knowledge from models with heterogeneous architectures prior to merging. Extensive experiments across eight tasks demonstrate the effectiveness of StatsMerging. Our results show that StatsMerging outperforms state-of-the-art techniques in terms of overall accuracy, generalization to unseen tasks, and robustness to image quality variations.



Article 2087

Title@2025-06-05 (4): Clustering and Median Aggregation Improve Differentially Private Inference

Title: Clustering and Median Aggregation Improve Differentially Private Inference Clustering und Median Aggregation verbessern unterschiedliche private Schlussfolgerungen 群集和中中聚合改善差别私人推断 2506.04566v1

Authors (7): Kareem Amin, Salman Avestimehr, Sara Babakniya, Alex Bie, Weiwei Kong, Natalia Ponomareva, Umar Syed

Differentially private (DP) language model inference is an approach for generating private synthetic text. A sensitive input example is used to prompt an off-the-shelf large language model (LLM) to produce a similar example. Multiple examples can be aggregated together to formally satisfy the DP guarantee. Prior work creates inference batches by sampling sensitive inputs uniformly at random. We show that uniform sampling degrades the quality of privately generated text, especially when the sensitive examples concern heterogeneous topics. We remedy this problem by clustering the input data before selecting inference batches. Next, we observe that clustering also leads to more similar next-token predictions across inferences. We use this insight to introduce a new algorithm that aggregates next token statistics by privately computing medians instead of averages. This approach leverages the fact that the median has decreased local sensitivity when next token predictions are similar, allowing us to state a data-dependent and ex-post DP guarantee about the privacy properties of this algorithm. Finally, we demonstrate improvements in terms of representativeness metrics (e.g., MAUVE) as well as downstream task performance. We show that our method produces high-quality synthetic data at significantly lower privacy cost than a previous state-of-the-art method.
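The local-sensitivity argument can be illustrated numerically. The sketch below is not a differentially private mechanism (it adds no noise and gives no guarantee); it only measures how much the mean and the median of similar next-token probabilities can move when one input is replaced. The probability values, the [0, 1] range, and the function name are assumptions for illustration.

```python
import numpy as np

def max_change_under_replacement(values, aggregate, lo=0.0, hi=1.0):
    """Largest change in aggregate(values) when one entry is replaced by lo or hi."""
    base = aggregate(values)
    worst = 0.0
    for i in range(len(values)):
        for extreme in (lo, hi):
            neighbor = values.copy()
            neighbor[i] = extreme
            worst = max(worst, abs(aggregate(neighbor) - base))
    return float(worst)

# Hypothetical probabilities that five clustered (similar) inputs assign to one token.
probs = np.array([0.70, 0.71, 0.72, 0.73, 0.74])

mean_change = max_change_under_replacement(probs, np.mean)
median_change = max_change_under_replacement(probs, np.median)
print(f"mean can move by {mean_change:.3f}, median by {median_change:.3f}")
```

When the predictions are tightly clustered, replacing one input shifts the median only to a neighboring value, while the mean absorbs a share of the full replacement; this is the gap that a data-dependent guarantee can exploit.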



Article 2088

Title@2025-06-05 (4): ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration

Title: ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration ReCalKV: Low-Rank KV Cache-Kompression über Head Reordering und Offline-Kalibrierung ReCalKV: 通过头重排序和离线校准进行低Rank KV缓存压缩 2505.24357v2

Authors (6): Xianglong Yan, Zhiteng Li, Tianao Zhang, Linghe Kong, Yulun Zhang, Xiaokang Yang

Large language models (LLMs) have achieved remarkable performance, yet their capability on long-context reasoning is often constrained by the excessive memory required to store the Key-Value (KV) cache. This makes KV cache compression an essential step toward enabling efficient long-context reasoning. Recent methods have explored reducing the hidden dimensions of the KV cache, but many introduce additional computation through projection layers or suffer from significant performance degradation under high compression ratios. To address these challenges, we propose ReCalKV, a post-training KV cache compression method that reduces the hidden dimensions of the KV cache. We develop distinct compression strategies for Keys and Values based on their different roles and varying importance in the attention mechanism. For Keys, we propose Head-wise Similarity-aware Reordering (HSR), which clusters similar heads and applies grouped SVD to the key projection matrix, reducing additional computation while preserving accuracy. For Values, we propose Offline Calibration and Matrix Fusion (OCMF) to preserve accuracy without extra computational overhead. Experiments show that ReCalKV outperforms existing low-rank compression methods, achieving high compression ratios with minimal performance loss. The code and models will be available at: https://github.com/XIANGLONGYAN/ReCalKV.
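The basic low-rank idea behind reducing the hidden dimension of the cache can be sketched with a plain truncated SVD. This is only the generic building block, under the assumption of a single key projection matrix; it does not implement HSR's head reordering and grouped SVD or OCMF's calibration, and all names and shapes are hypothetical.

```python
import numpy as np

def low_rank_factors(weight, rank):
    """Split weight (d_in x d_out) into down (d_in x rank) and up (rank x d_out)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    down = u[:, :rank] * s[:rank]  # scale each kept left singular vector
    up = vt[:rank, :]
    return down, up

rng = np.random.default_rng(0)
# Hypothetical key projection that happens to have exact rank 8.
w_key = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 32))

down, up = low_rank_factors(w_key, rank=8)

hidden = rng.standard_normal((5, 64))  # five token representations
latent = hidden @ down                 # what would be cached: 8 values per token
keys = latent @ up                     # full 32-dim keys rebuilt when needed
print(latent.shape, np.allclose(keys, hidden @ w_key))
```

Caching `latent` instead of `keys` stores 8 numbers per token instead of 32 in this toy setting; with real weights the truncation is lossy, which is why the choice of grouping and calibration matters.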



Article 2089

Title@2025-06-05 (4): OpenThoughts: Data Recipes for Reasoning Models

Title: OpenThoughts: Data Recipes for Reasoning Models OpenThoughts: Datenrezepte für sinnvolle Modelle OpenThoughts: 用于解释模型的数据折录 2506.04178v2

Authors (50): Etash Guha, Ryan Marten, Sedrick Keh, Negin Raoof, Georgios Smyrnis, Hritik Bansal, Marianna Nezhurina, Jean Mercat, Trung Vu, Zayne Sprague, Ashima Suvarna, Benjamin Feuer, Liangyu Chen, Zaid Khan, Eric Frankel, Sachin Grover, Caroline Choi, Niklas Muennighoff, Shiye Su, Wanjia Zhao, John Yang, Shreyas Pimpalgaonkar, Kartik Sharma, Charlie Cheng-Jie Ji, Yichuan Deng, Sarah Pratt, Vivek Ramanujan, Jon Saad-Falcon, Jeffrey Li, Achal Dave, Alon Albalak, Kushal Arora, Blake Wulfe, Chinmay Hegde, Greg Durrett, Sewoong Oh, Mohit Bansal, Saadia Gabriel, Aditya Grover, Kai-Wei Chang, Vaishaal Shankar, Aaron Gokaslan, Mike A. Merrill, Tatsunori Hashimoto, Yejin Choi, Jenia Jitsev, Reinhard Heckel, Maheswaran Sathiamoorthy, Alexandros G. Dimakis, Ludwig Schmidt

Reasoning models have made rapid progress on many benchmarks involving math, code, and science. Yet, there are still many open questions about the best training recipes for reasoning since state-of-the-art models often rely on proprietary datasets with little to no public information available. To address this, the goal of the OpenThoughts project is to create open-source datasets for training reasoning models. After initial explorations, our OpenThoughts2-1M dataset led to OpenThinker2-32B, the first model trained on public reasoning data to match DeepSeek-R1-Distill-32B on standard reasoning benchmarks such as AIME and LiveCodeBench. We then improve our dataset further by systematically investigating each step of our data generation pipeline with 1,000+ controlled experiments, which led to OpenThoughts3. Scaling the pipeline to 1.2M examples and using QwQ-32B as teacher yields our OpenThoughts3-7B model, which achieves state-of-the-art results: 53% on AIME 2025, 51% on LiveCodeBench 06/24-01/25, and 54% on GPQA Diamond - improvements of 15.3, 17.2, and 20.5 percentage points compared to the DeepSeek-R1-Distill-Qwen-7B. All of our datasets and models are available on https://openthoughts.ai.



Article 2090

Title@2025-06-05 (4): Investigating Distributions of Telecom Adapted Sentence Embeddings for Document Retrieval

Title: Investigating Distributions of Telecom Adapted Sentence Embeddings for Document Retrieval Untersuchung der Verteilung von Telecom-adaptierten Satz-Einbindungen für Dokumentenwiederherstellung 用于文件检索的经调整的远程判刑嵌入表的调查分发情况 2406.12336v3

Authors (7): Sujoy Roychowdhury, Sumit Soman, Ranjani Hosakere Gireesha, Vansh Chhabra, Neeraj Gunda, Subhadip Bandyopadhyay, Sai Krishna Bala

A plethora of sentence embedding models makes it challenging to choose one, especially for technical domains rich with specialized vocabulary. In this work, we domain-adapt embeddings using telecom data for question answering. We evaluate embeddings obtained from publicly available models and their domain-adapted variants on both point retrieval accuracies and their 95% confidence intervals. We establish a systematic method to obtain thresholds for similarity scores for different embeddings. As expected, we observe that fine-tuning improves mean bootstrapped accuracies. We also observe that it results in tighter confidence intervals, which further improve when pre-training is preceded by fine-tuning. We introduce metrics which measure the distributional overlaps of top-$K$, correct, and random document similarities with the question. Further, we show that these metrics are correlated with retrieval accuracy and similarity thresholds. Recent literature shows conflicting effects of isotropy on retrieval accuracies. Our experiments establish that the isotropy of embeddings (as measured by two independent state-of-the-art isotropy metric definitions) is poorly correlated with retrieval performance. We show that embeddings for domain-specific sentences have little overlap with those for domain-agnostic ones, and fine-tuning moves them further apart. Based on our results, we provide recommendations for use of our methodology and metrics by researchers and practitioners.
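A bootstrapped accuracy with a 95% confidence interval can be computed along the following lines. This is a generic percentile bootstrap given as a sketch; the resampling count, the seed, the per-question hit data, and the function name are assumptions, and the paper's exact procedure may differ.

```python
import numpy as np

def bootstrap_accuracy_ci(correct, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap mean accuracy and confidence interval.

    `correct` holds one 0/1 entry per question (1 = right document retrieved).
    """
    rng = np.random.default_rng(seed)
    correct = np.asarray(correct, dtype=float)
    # Resample questions with replacement, one row per bootstrap replicate.
    idx = rng.integers(0, correct.size, size=(n_resamples, correct.size))
    accs = correct[idx].mean(axis=1)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(accs, [alpha, 1.0 - alpha])
    return float(accs.mean()), float(lower), float(upper)

# Hypothetical evaluation: 80 of 100 questions answered correctly.
hits = np.array([1] * 80 + [0] * 20)
mean_acc, lower, upper = bootstrap_accuracy_ci(hits)
print(f"accuracy {mean_acc:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

Comparing interval widths across embedding models, rather than only the mean accuracies, is what allows a statement such as "fine-tuning results in tighter confidence intervals".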



Article 2091

Title@2025-06-05 (4): LEDRO: LLM-Enhanced Design Space Reduction and Optimization for Analog Circuits

Title: LEDRO: LLM-Enhanced Design Space Reduction and Optimization for Analog Circuits LEDRO: LLM-Enhanced Design Raumreduzierung und Optimierung für analoge Schaltungen LEDRO: LLM-加强设计空间减少和模拟电路优化 2411.12930v2

Authors (4): Dimple Vijay Kochar, Hanrui Wang, Anantha Chandrakasan, Xin Zhang

Traditional approaches for designing analog circuits are time-consuming and require significant human expertise. Existing automation efforts using methods like Bayesian Optimization (BO) and Reinforcement Learning (RL) are sub-optimal and costly to generalize across different topologies and technology nodes. In our work, we introduce a novel approach, LEDRO, utilizing Large Language Models (LLMs) in conjunction with optimization techniques to iteratively refine the design space for analog circuit sizing. LEDRO is highly generalizable compared to other RL and BO baselines, eliminating the need for design annotation or model training for different topologies or technology nodes. We conduct a comprehensive evaluation of our proposed framework and baseline on 22 different Op-Amp topologies across four FinFET technology nodes. Results demonstrate the superior performance of LEDRO as it outperforms our best baseline by an average of 13% FoM improvement with 2.15x speed-up on low complexity Op-Amps and 48% FoM improvement with 1.7x speed-up on high complexity Op-Amps. This highlights LEDRO’s effective performance, efficiency, and generalizability.



Article 2092

Title@2025-06-05 (4): What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

Title: What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective Was in LLM-Schichten passiert ist, wenn es um schnelles gegen langsames Denken geht: Eine gradiente Perspektive 训练快速与慢思考:渐进视角时 LLM 图层中发生的情况 2410.23743v2

Authors (3): Ming Li, Yanhong Li, Tianyi Zhou

What makes a difference in the post-training of LLMs? We investigate the training patterns of different layers in large language models (LLMs) through the lens of the gradient. We are specifically interested in how fast vs. slow thinking affects the layer-wise gradients, given the recent popularity of training LLMs on reasoning paths such as chain-of-thoughts (CoT) and process rewards. In our study, fast thinking without CoT leads to larger gradients and larger differences of gradients across layers than slow thinking (Detailed CoT), indicating the learning stability brought by the latter. Additionally, we study whether the gradient patterns can reflect the correctness of responses when training different LLMs using slow vs. fast thinking paths. The results show that the gradients of slow thinking can distinguish correct and irrelevant reasoning paths. As a comparison, we conduct similar gradient analyses on non-reasoning knowledge learning tasks, on which, however, trivially increasing the response length does not lead to similar behaviors of slow thinking. Our study strengthens fundamental understandings of LLM training and sheds novel insights on its efficiency and stability, which pave the way towards building a generalizable System-2 agent. Our code, data, and gradient statistics can be found in: https://github.com/MingLiiii/Layer_Gradient.



Article 2093

Title@2025-06-05 (4): Non-linear Multi-objective Optimization with Probabilistic Branch and Bound

Title: Non-linear Multi-objective Optimization with Probabilistic Branch and Bound Nichtlineare Multiobjektive Optimierung mit probabilistischem Zweig und Bound 非线性多目标优化与概率分支和圆 2506.04554v1

Authors (2): Hao Huang, Zelda B. Zabinsky

A multiple objective simulation optimization algorithm named Multiple Objective Probabilistic Branch and Bound with Single Observation (MOPBnB(so)) is presented for approximating the Pareto optimal set and the associated efficient frontier for stochastic multi-objective optimization problems. MOPBnB(so) evaluates a noisy function exactly once at any solution and uses neighboring solutions to estimate the objective functions, in contrast to a variant that uses multiple replications at a solution to estimate the objective functions. A finite-time performance analysis for deterministic multi-objective problems provides a bound on the probability that MOPBnB(so) captures the Pareto optimal set. Asymptotic convergence of MOPBnB(so) on stochastic problems is derived, in that the algorithm captures the Pareto optimal set and the estimations converge to the true objective function values. Numerical results reveal that the variant with multiple replications is extremely intensive in terms of computational resources compared to MOPBnB(so). In addition, numerical results show that MOPBnB(so) outperforms a genetic algorithm NSGA-II on test problems.
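For readers unfamiliar with the target of the approximation, the Pareto optimal set of a finite list of evaluated solutions can be extracted as below. This sketch covers only the deterministic non-dominated filter for minimization; it does not implement the branching, the single-observation estimation from neighboring solutions, or any other part of MOPBnB(so), and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated points (all objectives are minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical objective vectors (f1, f2) for five candidate solutions.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.5)]
front = pareto_front(candidates)
print(front)
```

Here (3.0, 4.0) and (2.5, 3.5) are both dominated by (2.0, 3.0), so three points remain on the efficient frontier.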



Article 2094

Title@2025-06-05 (4): Unsupervised Machine Learning for Scientific Discovery: Workflow and Best Practices

Title: Unsupervised Machine Learning for Scientific Discovery: Workflow and Best Practices Unüberwachtes maschinelles Lernen für wissenschaftliche Entdeckung: Workflow und Best Practices 为科学发现进行无人监督的机器学习:工作流程和最佳做法 2506.04553v1

Authors (4): Andersen Chang, Tiffany M. Tang, Tarek M. Zikry, Genevera I. Allen

Unsupervised machine learning is widely used to mine large, unlabeled datasets to make data-driven discoveries in critical domains such as climate science, biomedicine, astronomy, chemistry, and more. However, despite its widespread utilization, there is a lack of standardization in unsupervised learning workflows for making reliable and reproducible scientific discoveries. In this paper, we present a structured workflow for using unsupervised learning techniques in science. We highlight and discuss best practices starting with formulating validatable scientific questions, conducting robust data preparation and exploration, using a range of modeling techniques, performing rigorous validation by evaluating the stability and generalizability of unsupervised learning conclusions, and promoting effective communication and documentation of results to ensure reproducible scientific discoveries. To illustrate our proposed workflow, we present a case study from astronomy, seeking to refine globular clusters of Milky Way stars based upon their chemical composition. Our case study highlights the importance of validation and illustrates how the benefits of a carefully-designed workflow for unsupervised learning can advance scientific discovery.



Article 2095

Title@2025-06-05 (4): Policy learning “without” overlap: Pessimism and generalized empirical Bernstein’s inequality

Title: Policy learning “without” overlap: Pessimism and generalized empirical Bernstein’s inequality Politisches Lernen “ohne” überschneidet sich: Pessimismus und generalisierte empirische Ungleichheit Bernsteins 政策学习“无重叠”重叠:悲观主义和普遍经验型伯恩斯坦的不平等 2212.09900v4

Authors (4): Ying Jin, Zhimei Ren, Zhuoran Yang, Zhaoran Wang

This paper studies offline policy learning, which aims at utilizing observations collected a priori (from either fixed or adaptively evolving behavior policies) to learn an optimal individualized decision rule that achieves the best overall outcomes for a given population. Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded. As one has no control over the data collection process, this assumption can be unrealistic in many situations, especially when the behavior policies are allowed to evolve over time with diminishing propensities for certain actions. In this paper, we propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) – instead of point estimates – of the policy values. The LCBs are constructed using knowledge of the behavior policies for collecting the offline data. Without assuming any uniform overlap condition, we establish a data-dependent upper bound for the suboptimality of our algorithm, which only depends on (i) the overlap for the optimal policy, and (ii) the complexity of the policy class we optimize over. As an implication, for adaptively collected data, we ensure efficient policy learning as long as the propensities for optimal actions are lower bounded over time, while those for suboptimal ones are allowed to diminish arbitrarily fast. In our theoretical analysis, we develop a new self-normalized type concentration inequality for inverse-propensity-weighting estimators, generalizing the well-known empirical Bernstein’s inequality to unbounded and non-i.i.d. data. We complement our theory with an efficient optimization algorithm via Majorization-Minimization and policy tree search, as well as extensive simulation studies and real-world applications that demonstrate the efficacy of PPL.
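The pessimism principle can be sketched with inverse-propensity-weighted (IPW) value estimates. This is an illustration under stated assumptions: the penalty is a simple multiple of the estimated standard error rather than the paper's generalized empirical Bernstein bound, the candidate policies are context-free, and the logged data, names, and `beta` value are hypothetical.

```python
import numpy as np

def ipw_terms(policy_actions, logged_actions, rewards, propensities):
    """Per-sample IPW terms: reward / propensity where the policy matches the log."""
    match = (policy_actions == logged_actions).astype(float)
    return match * rewards / propensities

def lower_confidence_bound(terms, beta):
    """Point estimate minus beta times the estimated standard error."""
    return float(terms.mean() - beta * terms.std(ddof=1) / np.sqrt(terms.size))

# Hypothetical log: action 0 was taken with propensity 0.9, action 1 with 0.1.
logged_actions = np.array([0, 0, 0, 0, 0, 0, 0, 1])
propensities = np.array([0.9] * 7 + [0.1])
rewards = np.array([0.5] * 7 + [1.0])

policies = {
    "always_0": np.zeros(8, dtype=int),
    "always_1": np.ones(8, dtype=int),
}
terms = {name: ipw_terms(acts, logged_actions, rewards, propensities)
         for name, acts in policies.items()}

greedy = max(terms, key=lambda name: terms[name].mean())
pessimistic = max(terms, key=lambda name: lower_confidence_bound(terms[name], beta=2.0))
print(greedy, pessimistic)
```

The greedy choice follows the rarely explored action because a single large weighted reward inflates its point estimate, while the pessimistic choice discounts that estimate for its high uncertainty and prefers the well-covered action.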



Article 2096

Title@2025-06-05 (4): Communication Efficient Adaptive Model-Driven Quantum Federated Learning

Title: Communication Efficient Adaptive Model-Driven Quantum Federated Learning Kommunikation Effizientes adaptives modellgetriebenes Quantum-Federiertes Lernen 适应性强、适应性强的示范模式-量子联邦学习 2506.04548v1

Authors (2): Dev Gurung, Shiva Raj Pokhrel

Training with huge datasets and a large number of participating devices leads to bottlenecks in federated learning (FL). Furthermore, the challenges of heterogeneity between multiple FL clients affect the overall performance of the system. In a quantum federated learning (QFL) context, we address these three main challenges: i) training bottlenecks from massive datasets, ii) the involvement of a substantial number of devices, and iii) non-IID data distributions. We introduce a model-driven quantum federated learning algorithm (mdQFL) to tackle these challenges. Our proposed approach is efficient and adaptable to various factors, including different numbers of devices. To the best of our knowledge, it is the first to explore training and update personalization, as well as test generalization within a QFL setting, which can be applied to other FL scenarios. We evaluated the efficiency of the proposed mdQFL framework through extensive experiments under diverse non-IID data heterogeneity conditions using various datasets within the Qiskit environment. Our results demonstrate a nearly 50% decrease in total communication costs while maintaining or, in some cases, exceeding the accuracy of the final model and consistently improving local model training compared to the standard QFL baseline. Moreover, our experimental evaluation thoroughly explores the QFL and mdQFL algorithms, along with several influencing factors. In addition, we present a theoretical analysis to clarify the complexities of the proposed algorithm. The experimental code is available at 1.



Article 2097

Title@2025-06-05 (4): $K$-MSHC: Unmasking Minimally Sufficient Head Circuits in Large Language Models with Experiments on Syntactic Classification Tasks

Title: $K$-MSHC: Unmasking Minimally Sufficient Head Circuits in Large Language Models with Experiments on Syntactic Classification Tasks $K$-MSHC: Entlarvung minimal ausreichender Kopfkreise in großen Sprachmodellen mit Experimenten zu syntaktischen Klassifizierungsaufgaben USK美元-MSHC:在以同步分类任务实验的大型语言模型中,以最小化的足够大语言电路总电路 2505.12268v2

Authors (3): Pratim Chowdhary, Peter Chin, Deepernab Chakrabarty

Understanding which neural components drive specific capabilities in mid-sized language models ($\leq$10B parameters) remains a key challenge. We introduce the $(\bm{K}, \epsilon)$-Minimum Sufficient Head Circuit ($K$-MSHC), a methodology to identify minimal sets of attention heads crucial for classification tasks as well as Search-K-MSHC, an efficient algorithm for discovering these circuits. Applying our Search-K-MSHC algorithm to Gemma-9B, we analyze three syntactic task families: grammar acceptability, arithmetic verification, and arithmetic word problems. Our findings reveal distinct task-specific head circuits, with grammar tasks predominantly utilizing early layers, word problems showing pronounced activity in both shallow and deep regions, and arithmetic verification demonstrating a more distributed pattern across the network. We discover non-linear circuit overlap patterns, where different task pairs share computational components at varying levels of importance. While grammar and arithmetic share many “weak” heads, arithmetic and word problems share more consistently critical “strong” heads. Importantly, we find that each task maintains dedicated “super-heads” with minimal cross-task overlap, suggesting that syntactic and numerical competencies emerge from specialized yet partially reusable head circuits.



Article 2098

Title@2025-06-05 (4): ATLaS: Agent Tuning via Learning Critical Steps

Title: ATLaS: Agent Tuning via Learning Critical Steps ATLaS: Agent Tuning über Learning Critical Steps 通过学习关键步骤进行测试的代理 2503.02197v2

Authors (6): Zhixun Chen, Ming Li, Yuxuan Huang, Yali Du, Meng Fang, Tianyi Zhou

Large Language Model (LLM) agents have demonstrated remarkable generalization capabilities across multi-domain tasks. Existing agent tuning approaches typically employ supervised finetuning on entire expert trajectories. However, behavior-cloning of full trajectories can introduce expert bias and weaken generalization to states not covered by the expert data. Additionally, critical steps, such as planning, complex reasoning for intermediate subtasks, and strategic decision-making, are essential to success in agent tasks, so learning these steps is the key to improving LLM agents. For more effective and efficient agent tuning, we propose ATLaS that identifies the critical steps in expert trajectories and finetunes LLMs solely on these steps with reduced costs. By steering the training’s focus to a few critical steps, our method mitigates the risk of overfitting entire trajectories and promotes generalization across different environments and tasks. In extensive experiments, an LLM finetuned on only 30% critical steps selected by ATLaS outperforms the LLM finetuned on all steps and recent open-source LLM agents. ATLaS maintains and improves base LLM skills as generalist agents interacting with diverse environments.



Article 2099

Title@2025-06-05 (4): General Intelligence Requires Reward-based Pretraining

Title: General Intelligence Requires Reward-based Pretraining Allgemeine Intelligenz erfordert eine prämienbasierte Vorausbildung 以奖励为基础的训练前训练 2502.19402v2

Authors (4): Seungwook Han, Jyothish Pari, Samuel J. Gershman, Pulkit Agrawal

Large Language Models (LLMs) have demonstrated impressive real-world utility, exemplifying artificial useful intelligence (AUI). However, their ability to reason adaptively and robustly – the hallmarks of artificial general intelligence (AGI) – remains fragile. While LLMs seemingly succeed in commonsense reasoning, programming, and mathematics, they struggle to generalize algorithmic understanding across novel contexts. Our experiments with algorithmic tasks in esoteric programming languages reveal that LLMs’ reasoning overfits to the training data and is limited in its transferability. We hypothesize that the core issue underlying such limited transferability is the coupling of reasoning and knowledge in LLMs. To transition from AUI to AGI, we propose disentangling knowledge and reasoning through three key directions: (1) pretraining to reason using RL from scratch as an alternative to the widely used next-token prediction pretraining, (2) using a curriculum of synthetic tasks to ease the learning of a reasoning prior for RL that can then be transferred to natural language tasks, and (3) learning more generalizable reasoning functions using a small context window to reduce exploiting spurious correlations between tokens. Such a reasoning system coupled with a trained retrieval system and a large external memory bank as a knowledge store can overcome several limitations of existing architectures at learning to reason in novel scenarios.



Article 2100

Title@2025-06-05 (4): Combinatorial Reinforcement Learning with Preference Feedback

Title: Combinatorial Reinforcement Learning with Preference Feedback Kombinatorisches Stärkungslernen mit Präferenz-Feedback 结合强化学习与优先反馈 2502.10158v3

Authors (2): Joongkyu Lee, Min-hwan Oh

In this paper, we consider combinatorial reinforcement learning with preference feedback, where a learning agent sequentially offers an action (an assortment of multiple items) to a user, whose preference feedback follows a multinomial logistic (MNL) model. This framework allows us to model real-world scenarios, particularly those involving long-term user engagement, such as in recommender systems and online advertising. However, this framework faces two main challenges: (1) the unknown value of each item, unlike traditional MNL bandits that only address single-step preference feedback, and (2) the difficulty of ensuring optimism while maintaining tractable assortment selection in the combinatorial action space with unknown values. In this paper, we assume a contextual MNL preference model, where the mean utilities are linear, and the value of each item is approximated by a general function. We propose an algorithm, MNL-VQL, that addresses these challenges, making it both computationally and statistically efficient. As a special case, for linear MDPs (with the MNL preference feedback), we establish the first regret lower bound in this framework and show that MNL-VQL achieves nearly minimax-optimal regret. To the best of our knowledge, this is the first work to provide statistical guarantees in combinatorial RL with preference feedback.
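
To make the contextual MNL preference model concrete, the sketch below computes choice probabilities for an offered assortment when mean utilities are linear in item features. The outside ("no choice") option with utility zero, the feature matrix, the parameter vector `theta`, and the function name are assumptions for illustration; the abstract does not state these details.

```python
import numpy as np

def mnl_choice_probs(features, theta, assortment):
    """Return (probabilities over offered items, probability of choosing nothing).

    features: array of shape (n_items, d); theta: array of shape (d,).
    Assumes an outside option with utility 0, a common MNL convention.
    """
    utilities = features[assortment] @ theta  # linear mean utilities
    weights = np.exp(utilities)
    denom = 1.0 + weights.sum()               # the 1.0 is the outside option
    return weights / denom, 1.0 / denom

# Hypothetical catalogue of four items with two features each.
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0],
                     [0.5, 0.5]])
theta = np.array([0.8, -0.2])
assortment = [0, 2, 3]

probs, p_none = mnl_choice_probs(features, theta, assortment)
print(dict(zip(assortment, probs.round(3))), "no choice:", round(p_none, 3))
```

Because every offered item shares the same denominator, adding an item to the assortment lowers the choice probability of all the others, which is what makes assortment selection a combinatorial problem.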



Article 2101

Title@2025-06-05 (4): hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation

Title: hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation hdl2v: Ein Code-Übersetzungsdatensatz für verbesserte LLM Verilog-Generierung hdl2v: 用于强化LLM Verilog 生成的代码翻译数据集 2506.04544v1

Authors (6): Charles Hong, Brendan Roberts, Huijae An, Alex Um, Advay Ratan, Yakun Sophia Shao

Large language models (LLMs) are playing an increasingly large role in domains such as code generation, including hardware code generation, where Verilog is the key language. However, the amount of publicly available Verilog code pales in comparison to the amount of code available for software languages like Python. In this work, we present hdl2v (“HDL-to-Verilog”), a dataset which seeks to increase the amount of available human-written Verilog data by translating or compiling three other hardware description languages - VHDL, Chisel, and PyMTL3 - to Verilog. Furthermore, we demonstrate the value of hdl2v in enhancing LLM Verilog generation by improving performance of a 32 billion-parameter open-weight model by up to 23% (pass@10) in VerilogEvalV2, without utilizing any data augmentation or knowledge distillation from larger models. We also show hdl2v’s ability to boost the performance of a data augmentation-based fine-tuning approach by 63%. Finally, we characterize and analyze our dataset to better understand which characteristics of HDL-to-Verilog datasets can be expanded upon in future work for even better performance.
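
For readers unfamiliar with the pass@10 metric mentioned above, the sketch below implements the widely used unbiased pass@k estimator: given `n` generated samples for a problem, of which `c` pass the tests, it estimates the probability that at least one of `k` randomly drawn samples passes. Whether the paper's evaluation computes the metric in exactly this way is an assumption; the function name is illustrative.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n samples with c correct ones."""
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 20 samples per problem, 5 of them pass.
print(pass_at_k(20, 5, 1))
print(pass_at_k(20, 5, 10))
```

Per-problem estimates are typically averaged over the benchmark to obtain the reported score.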



Article 2102

Title@2025-06-05 (4): Neural MJD: Neural Non-Stationary Merton Jump Diffusion for Time Series Prediction

Title: Neural MJD: Neural Non-Stationary Merton Jump Diffusion for Time Series Prediction Neurale MJD: Neurale nicht-stationäre Merton Jump Diffusion für Zeitreihenvorhersage 神经MJD: 用于时间序列预测的神经非立体Merton跳跃扩散 2506.04542v1

Authors (4): Yuanpei Gao, Qi Yan, Yan Leng, Renjie Liao

While deep learning methods have achieved strong performance in time series prediction, their black-box nature and inability to explicitly model underlying stochastic processes often limit their generalization to non-stationary data, especially in the presence of abrupt changes. In this work, we introduce Neural MJD, a neural network based non-stationary Merton jump diffusion (MJD) model. Our model explicitly formulates forecasting as a stochastic differential equation (SDE) simulation problem, combining a time-inhomogeneous Itô diffusion to capture non-stationary stochastic dynamics with a time-inhomogeneous compound Poisson process to model abrupt jumps. To enable tractable learning, we introduce a likelihood truncation mechanism that caps the number of jumps within small time intervals and provide a theoretical error bound for this approximation. Additionally, we propose an Euler-Maruyama with restart solver, which achieves a provably lower error bound in estimating expected states and reduced variance compared to the standard solver. Experiments on both synthetic and real-world datasets demonstrate that Neural MJD consistently outperforms state-of-the-art deep learning and statistical learning methods.
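
The combination of a time-inhomogeneous diffusion with a compound Poisson jump term can be sketched with a plain Euler-Maruyama simulation, shown below. This is the standard solver, not the paper's restart variant, and it uses hand-written coefficient functions where the paper uses neural networks; the function name, the Gaussian jump sizes, and all parameter values are assumptions for illustration.

```python
import numpy as np

def simulate_jump_diffusion(x0, mu, sigma, lam, jump_mean, jump_std,
                            horizon, n_steps, n_paths, seed=0):
    """Euler-Maruyama for dX = mu(t) dt + sigma(t) dW + dJ.

    J is a compound Poisson process with intensity lam(t) and
    N(jump_mean, jump_std^2) jump sizes (an assumed jump law).
    Returns an array of shape (n_steps + 1, n_paths).
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    x = np.full(n_paths, float(x0))
    path = [x.copy()]
    for k in range(n_steps):
        t = k * dt
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        n_jumps = rng.poisson(lam(t) * dt, size=n_paths)
        # The sum of m i.i.d. N(a, s^2) jumps is N(m * a, m * s^2).
        jumps = rng.normal(n_jumps * jump_mean, np.sqrt(n_jumps) * jump_std)
        x = x + mu(t) * dt + sigma(t) * dw + jumps
        path.append(x.copy())
    return np.stack(path)

# Hypothetical non-stationary coefficients: volatility and jump rate grow over time.
paths = simulate_jump_diffusion(
    x0=0.0,
    mu=lambda t: 0.05,
    sigma=lambda t: 0.1 + 0.2 * t,
    lam=lambda t: 1.0 + 2.0 * t,
    jump_mean=-0.1, jump_std=0.3,
    horizon=1.0, n_steps=250, n_paths=1000,
)
print(paths.shape, paths[-1].mean(), paths[-1].std())
```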



Article 2103

Title@2025-06-05 (4): Chronoamperometry with Room-Temperature Ionic Liquids: Sub-Second Inference Techniques

Title: Chronoamperometry with Room-Temperature Ionic Liquids: Sub-Second Inference Techniques Chronoamperometrie mit Ionenflüssigkeiten der Raumtemperatur: Sub-Second Inferenztechniken 与室-室-温度-不态液体:二次二次推断技术 2506.04540v1

Authors (1): Kordel K. France

Chronoamperometry (CA) is a fundamental electrochemical technique used for quantifying redox-active species. However, in room-temperature ionic liquids (RTILs), the high viscosity and slow mass transport often lead to extended measurement durations. This paper presents a novel mathematical regression approach that reduces CA measurement windows to under 1 second, significantly faster than previously reported methods, which typically require 1-4 seconds or longer. By applying an inference algorithm to the initial transient current response, this method accurately predicts steady-state electrochemical parameters without requiring additional hardware modifications. The approach is validated through comparison with standard chronoamperometric techniques and is demonstrated to maintain reasonable accuracy while dramatically reducing data acquisition time. The implications of this technique are explored in analytical chemistry, sensor technology, and battery science, where rapid electrochemical quantification is critical. Our technique is focused on enabling faster multiplexing of chronoamperometric measurements for rapid olfactory and electrochemical analysis.



Article 2104

Title@2025-06-05 (4): Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models

Title: Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models Intelligent4DSE: Optimierung der High-Level-Synthese Design Space Exploration mit Graph Neural Networks und großen Sprachmodellen 智能4DSE:利用图形神经网络和大语言模型优化高级合成设计空间探索 2504.19649v2

Authors (4): Lei Xu, Shanshan Wang, Emmanuel Casseau, Chenglong Xiao

High-level synthesis (HLS) design space exploration (DSE) is an optimization process in electronic design automation (EDA) that systematically explores high-level design configurations to achieve Pareto-optimal hardware implementations balancing performance, area, and power (PPA). To optimize this process, HLS prediction tasks often employ message-passing neural networks (MPNNs), leveraging complex architectures to achieve high accuracy. These predictors serve as evaluators in the DSE process, effectively bypassing the time-consuming estimations traditionally required by HLS tools. However, existing models often prioritize structural complexity and minimization of training loss, overlooking task-specific characteristics. Additionally, while evolutionary algorithms are widely used in DSE, they typically require extensive domain-specific knowledge to design effective crossover and mutation operators. To address these limitations, we propose CoGNNs-LLMEA, a framework that integrates a graph neural network with task-adaptive message passing and a large language model-enhanced evolutionary algorithm. As a predictive model, CoGNNs directly leverages intermediate representations generated from source code after compiler front-end processing, enabling prediction of quality of results (QoR) without invoking HLS tools. Due to its strong adaptability to tasks, CoGNNs can be tuned to predict post-HLS and post-implementation outcomes, effectively bridging the gap between high-level abstractions and physical implementation characteristics. CoGNNs achieves state-of-the-art prediction accuracy in post-HLS QoR prediction, reducing mean prediction errors by 2.8$\times$ for latency and 3.4$\times$ for resource utilization compared to baseline models.



Article 2105

Title@2025-06-05 (4): Olfactory Inertial Odometry: Sensor Calibration and Drift Compensation

Title: Olfactory Inertial Odometry: Sensor Calibration and Drift Compensation Olfactory Inertial Odometrie: Sensorkalibrierung und Driftkompensation 腐蚀性惰性测量:传感器校准和漂流赔偿 2506.04539v1

Authors (4): Kordel K. France, Ovidiu Daescu, Anirban Paul, Shalini Prasad

Visual inertial odometry (VIO) is a process for fusing visual and kinematic data to understand a machine’s state in a navigation task. Olfactory inertial odometry (OIO) is an analog to VIO that fuses signals from gas sensors with inertial data to help a robot navigate by scent. Gas dynamics and environmental factors introduce disturbances into olfactory navigation tasks that can make OIO difficult to facilitate. With our work here, we define a process for calibrating a robot for OIO that generalizes to several olfaction sensor types. Our focus is specifically on calibrating OIO for centimeter-level accuracy in localizing an odor source on a slow-moving robot platform to demonstrate use cases in robotic surgery and touchless security screening. We demonstrate our process for OIO calibration on a real robotic arm and show how this calibration improves performance over a cold-start olfactory navigation task.



Article 2106

Title@2025-06-05 (4): Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs

Title: Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs Projektionsbasierte Lyapunov-Methode für voll heterogene schwach gekoppelte MDPs 以投射为基础的Lyapunov 方法,用于完全多元的微软相混合的 MDP 2502.06072v4

Authors (3): Xiangcheng Zhang, Yige Hong, Weina Wang

Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, an efficiently computable policy achieves an $O(1/\sqrt{N})$ optimality gap in the long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our main technical innovation is the construction of projection-based Lyapunov functions that certify the convergence of rewards and costs to an optimal region, even under full heterogeneity.



Article 2107

Title@2025-06-05 (4): NOBLE – Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models

Title: NOBLE – Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models NOBLE – Neuronenoperator mit biologisch informierten Latent-Embeddings zur Erfassung experimenteller Variabilität in biologischen Neuronenmodellen NOBL – – 神经操作器,具有生物知情的中程嵌入装置,以捕获生物中子模型中的实验变异性 2506.04536v1

Authors (6): Luca Ghafourpour, Valentin Duruisseaux, Bahareh Tolooshams, Philip H. Wong, Costas A. Anastassiou, Anima Anandkumar

Characterizing the diverse computational properties of human neurons via multimodal electrophysiological, transcriptomic, and morphological data provides the foundation for constructing and validating bio-realistic neuron models that can advance our understanding of fundamental mechanisms underlying brain function. However, current modeling approaches remain constrained by the limited availability and intrinsic variability of experimental neuronal data. To capture variability, ensembles of deterministic models are often used, but are difficult to scale as model generation requires repeating computationally expensive optimization for each neuron. While deep learning is becoming increasingly relevant in this space, it fails to capture the full biophysical complexity of neurons, their nonlinear voltage dynamics, and variability. To address these shortcomings, we introduce NOBLE, a neural operator framework that learns a mapping from a continuous frequency-modulated embedding of interpretable neuron features to the somatic voltage response induced by current injection. Trained on data generated from biophysically realistic neuron models, NOBLE predicts distributions of neural dynamics accounting for the intrinsic experimental variability. Unlike conventional bio-realistic neuron models, interpolating within the embedding space offers models whose dynamics are consistent with experimentally observed responses. NOBLE is the first scaled-up deep learning framework validated on real experimental data, enabling efficient generation of synthetic neurons that exhibit trial-to-trial variability and achieve a $4200\times$ speedup over numerical solvers. To this end, NOBLE captures fundamental neural properties, opening the door to a better understanding of cellular composition and computations, neuromorphic architectures, large-scale brain circuits, and general neuroAI applications.



Article 2108

Title@2025-06-05 (4): On the Emergence of Position Bias in Transformers

Title: On the Emergence of Position Bias in Transformers Über die Entstehung der Position Bias in Transformatoren 变换者中立场偏见的出现 2502.01951v2

Authors (4): Xinyi Wu, Yifei Wang, Stefanie Jegelka, Ali Jadbabaie

Recent studies have revealed various manifestations of position bias in transformer architectures, from the “lost-in-the-middle” phenomenon to attention sinks, yet a comprehensive theoretical understanding of how attention masks and positional encodings shape these biases remains elusive. This paper presents a graph-theoretic framework for analyzing position bias in multi-layer attention. Modeling attention masks as directed graphs, we quantify how tokens interact with contextual information based on their sequential positions. We uncover two key insights: First, causal masking inherently biases attention toward earlier positions, as tokens in deeper layers attend to increasingly more contextualized representations of earlier tokens. Second, we characterize the competing effects of the causal mask and relative positional encodings, such as the decay mask and rotary positional encoding (RoPE): while both mechanisms introduce distance-based decay within individual attention maps, their aggregate effect across multiple attention layers, coupled with the causal mask, leads to a trade-off between the long-term decay effects and the cumulative importance of early sequence positions. Through controlled numerical experiments, we not only validate our theoretical findings but also reproduce position biases observed in real-world LLMs. Our framework offers a principled foundation for understanding positional biases in transformers, shedding light on the complex interplay of attention mechanism components and guiding more informed architectural design.
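
The first insight, that stacking causally masked attention layers shifts influence toward early positions, can be seen in a deliberately simplified numerical sketch. It assumes uniform attention over the allowed positions and ignores values, residual connections, and positional encodings, so it illustrates the mechanism only and is not the paper's framework; the function names are hypothetical.

```python
import numpy as np

def uniform_causal_attention(n_tokens):
    """Row-stochastic attention matrix: token i attends equally to tokens 0..i."""
    mask = np.tril(np.ones((n_tokens, n_tokens)))
    return mask / mask.sum(axis=1, keepdims=True)

def first_token_influence(n_tokens, depth):
    """Weight the last token places on the first token after `depth` stacked layers."""
    attention = uniform_causal_attention(n_tokens)
    return float(np.linalg.matrix_power(attention, depth)[-1, 0])

influence = [first_token_influence(16, depth) for depth in (1, 2, 4, 8)]
for depth, value in zip((1, 2, 4, 8), influence):
    print(f"depth {depth}: influence of token 0 on last token = {value:.3f}")
```

With one layer the last token weights all 16 positions equally; as layers are composed, the weight on the first position grows because every earlier token has itself already aggregated from the start of the sequence.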



Article 2109

Title@2025-06-05 (4): HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training

Title: HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training HALOS: Hierarchische asynchrone lokale SGD über langsame Netzwerke für geo-distributed Large Language Model Training HALOS:地理分布大语言模式培训的跨慢网的等级非同步本地 SGD 2506.04531v1

Authors (8): Geon-Woo Kim, Junbo Li, Shashidhar Gandham, Omar Baldonado, Adithya Gangidi, Pavan Balaji, Zhangyang Wang, Aditya Akella

Training large language models (LLMs) increasingly relies on geographically distributed accelerators, causing prohibitive communication costs across regions and uneven utilization of heterogeneous hardware. We propose HALoS, a hierarchical asynchronous optimization framework that tackles these issues by introducing local parameter servers (LPSs) within each region and a global parameter server (GPS) that merges updates across regions. This hierarchical design minimizes expensive inter-region communication, reduces straggler effects, and leverages fast intra-region links. We provide a rigorous convergence analysis for HALoS under non-convex objectives, including theoretical guarantees on the role of hierarchical momentum in asynchronous training. Empirically, HALoS attains up to 7.5x faster convergence than synchronous baselines in geo-distributed LLM training and improves upon existing asynchronous methods by up to 2.1x. Crucially, HALoS preserves the model quality of fully synchronous SGD (matching or exceeding accuracy on standard language modeling and downstream benchmarks) while substantially lowering total training time. These results demonstrate that hierarchical, server-side update accumulation and global model merging are powerful tools for scalable, efficient training of new-era LLMs in heterogeneous, geo-distributed environments.



Article 2110

Title@2025-06-05 (4): Self-Predictive Dynamics for Generalization of Vision-based Reinforcement Learning

Title: Self-Predictive Dynamics for Generalization of Vision-based Reinforcement Learning Selbstvorhersagbare Dynamiken für die Generalisierung des visionsbasierten Verstärkungslernens 普及基于愿景的强化学习的自我预测动态 2506.05418v1

Authors (3): Kyungsoo Kim, Jeongsoo Ha, Yusung Kim

Vision-based reinforcement learning requires efficient and robust representations of image-based observations, especially when the images contain distracting (task-irrelevant) elements such as shadows, clouds, and light. It becomes more important if those distractions are not exposed during training. We design a Self-Predictive Dynamics (SPD) method to extract task-relevant features efficiently, even in unseen observations after training. SPD uses weak and strong augmentations in parallel, and learns representations by predicting inverse and forward transitions across the two-way augmented versions. In a set of MuJoCo visual control tasks and an autonomous driving task (CARLA), SPD outperforms previous studies in complex observations, and significantly improves the generalization performance for unseen observations. Our code is available at https://github.com/unigary/SPD.



Article 2111

Title@2025-06-05 (4): Hierarchical Implicit Neural Emulators

Title: Hierarchical Implicit Neural Emulators Hierarchische Implizite Neuralemulatoren 等级隐性神经隐性模拟器 2506.04528v1

Authors (7): Ruoxi Jiang, Xiao Zhang, Karan Jakhar, Peter Y. Lu, Pedram Hassanzadeh, Michael Maire, Rebecca Willett

Neural PDE solvers offer a powerful tool for modeling complex dynamical systems, but often struggle with error accumulation over long time horizons and maintaining stability and physical consistency. We introduce a multiscale implicit neural emulator that enhances long-term prediction accuracy by conditioning on a hierarchy of lower-dimensional future state representations. Drawing inspiration from the stability properties of numerical implicit time-stepping methods, our approach leverages predictions several steps ahead in time at increasing compression rates for next-timestep refinements. By actively adjusting the temporal downsampling ratios, our design enables the model to capture dynamics across multiple granularities and enforce long-range temporal coherence. Experiments on turbulent fluid dynamics show that our method achieves high short-term accuracy and produces long-term stable forecasts, significantly outperforming autoregressive baselines while adding minimal computational overhead.



Article 2112

Title@2025-06-05 (4): Uncovering Memorization Effect in the Presence of Spurious Correlations

Title: Uncovering Memorization Effect in the Presence of Spurious Correlations Uncovering Memorization Effekt in der Gegenwart von Purous Correlations 在存在净腐蚀物时未覆盖的记忆化效应 2501.00961v3

Authors (6): Chenyu You, Haocheng Dai, Yifei Min, Jasjeet S. Sekhon, Sarang Joshi, James S. Duncan

Machine learning models often rely on simple spurious features – patterns in training data that correlate with targets but are not causally related to them, like image backgrounds in foreground classification. This reliance typically leads to imbalanced test performance across minority and majority groups. In this work, we take a closer look at the fundamental cause of such imbalanced performance through the lens of memorization, which refers to the ability to predict accurately on atypical examples (minority groups) in the training set while failing to achieve the same accuracy on the test set. This paper systematically shows the ubiquitous existence of spurious features in a small set of neurons within the network, providing the first-ever evidence that memorization may contribute to imbalanced group performance. Through three experimental sources of converging empirical evidence, we find that a small subset of neurons or channels memorizes minority group information. Inspired by these findings, we hypothesize that spurious memorization, concentrated within a small subset of neurons, plays a key role in driving imbalanced group performance. To further substantiate this hypothesis, we show that eliminating these unnecessary spurious memorization patterns via a novel framework during training can significantly affect the model performance on minority groups. Our experimental results across various architectures and benchmarks offer new insights on how neural networks encode core and spurious knowledge, laying the groundwork for future research in demystifying robustness to spurious correlation.



Article 2113

Title@2025-06-05 (4): Perturbative Gradient Training: A novel training paradigm for bridging the gap between deep neural networks and physical reservoir computing

Title: Perturbative Gradient Training: A novel training paradigm for bridging the gap between deep neural networks and physical reservoir computing Perturbative Gradient Training: Ein neuartiges Trainingsparadigma zur Überbrückung der Kluft zwischen tiefen neuronalen Netzwerken und physikalischen Reservoir Computing 循环渐进培训:弥合深神经网络与物理储油层计算之间的差距的新培训模式 2506.04523v1

Authors (3): Cliff B. Abbott, Mark Elo, Dmytro A. Bozhko

We introduce Perturbative Gradient Training (PGT), a novel training paradigm that overcomes a critical limitation of physical reservoir computing: the inability to perform backpropagation due to the black-box nature of physical reservoirs. Drawing inspiration from perturbation theory in physics, PGT uses random perturbations in the network’s parameter space to approximate gradient updates using only forward passes. We demonstrate the feasibility of this approach on both simulated neural network architectures, including a dense network and a transformer model with a reservoir layer, and on experimental hardware using a magnonic auto-oscillation ring as the physical reservoir. Our results show that PGT can achieve performance comparable to that of standard backpropagation methods in cases where backpropagation is impractical or impossible. PGT represents a promising step toward integrating physical reservoirs into deeper neural network architectures and achieving significant energy efficiency gains in AI training.
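
The idea of approximating gradient updates from forward passes alone can be sketched with a generic random-perturbation (zeroth-order) estimator, shown below on a toy quadratic loss. This is a standard finite-difference scheme chosen for illustration, not necessarily the update rule used in PGT; the loss, step sizes, and function names are all assumptions.

```python
import numpy as np

def perturbative_grad(loss_fn, params, rng, eps=1e-3, n_samples=32):
    """Estimate the gradient using only forward evaluations of loss_fn."""
    base = loss_fn(params)
    grad = np.zeros_like(params)
    for _ in range(n_samples):
        direction = rng.standard_normal(params.shape)
        # Finite-difference slope of the loss along a random direction.
        slope = (loss_fn(params + eps * direction) - base) / eps
        grad += slope * direction
    return grad / n_samples

def quadratic_loss(weights):
    """Toy stand-in for a black-box system whose internals cannot be differentiated."""
    target = np.array([1.0, -2.0, 0.5])
    return float(np.sum((weights - target) ** 2))

rng = np.random.default_rng(0)
weights = np.zeros(3)
initial_loss = quadratic_loss(weights)
for _ in range(200):
    weights = weights - 0.05 * perturbative_grad(quadratic_loss, weights, rng)
final_loss = quadratic_loss(weights)
print("initial loss:", initial_loss, "final loss:", final_loss)
```

Each update costs `n_samples + 1` forward evaluations and no backward pass, which is the property that matters when the model contains a physical component that can only be queried.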



Article 2114

Title@2025-06-05 (4): EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles

Title: EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles EnsemW2S: Verbesserung der Schwach-zu-Strong-Verallgemeinerung mit großsprachigen Modellensembles EnsemW2S:用大语言模型组合加强弱至强的通用化 2505.21959v2

Authors (9): Aakriti Agrawal, Mucong Ding, Zora Che, Chenghao Deng, Anirudh Satheesh, Bang An, Bayan Bruss, John Langford, Furong Huang

With Large Language Models (LLMs) rapidly approaching and potentially surpassing human-level performance, it has become imperative to develop approaches capable of effectively supervising and enhancing these powerful models using smaller, human-level models exposed to only human-level data. We address this critical weak-to-strong (W2S) generalization challenge by proposing a novel method aimed at improving weak experts, by training on the same limited human-level data, enabling them to generalize to complex, super-human-level tasks. Our approach, called EnsemW2S, employs a token-level ensemble strategy that iteratively combines multiple weak experts, systematically addressing the shortcomings identified in preceding iterations. By continuously refining these weak models, we significantly enhance their collective ability to supervise stronger student models. We extensively evaluate the generalization performance of both the ensemble of weak experts and the subsequent strong student model across in-distribution (ID) and out-of-distribution (OOD) datasets. For OOD, we specifically introduce question difficulty as an additional dimension for defining distributional shifts. Our empirical results demonstrate notable improvements, achieving 4% and 3.2% improvements on ID datasets and up to 6% and 2.28% on OOD datasets for experts and student models, respectively, underscoring the effectiveness of our proposed method in advancing W2S generalization.
